Scaling Playwright Tests: How I Solved CI Memory Leaks and Built an Open Source Solution

Engineering a robust solution for Playwright page lifecycle management that reduced CI memory usage by 85% and became an open source tool used by thousands of developers.

February 27, 2026•10 min read

#playwright #testing #ci-cd #automation #memory-management #open-source #engineering

#The Problem: When Test Isolation Becomes Resource Exhaustion

As a Senior QA Automation Engineer managing test suites at scale, I encountered a critical problem that many engineering teams face but few solve systematically: resource leaks in end-to-end testing. Our CI environment was crashing with out-of-memory errors, accumulating 800+ browser tabs during test execution, and turning 15-minute builds into 45-minute disasters.

This isn't just a technical curiosity—it's a production problem that affects development velocity, CI costs, and team confidence in automated testing.

#Why This Problem Matters in Real Systems

Testing conversational interfaces requires verifying multiple perspectives simultaneously. When User A sends a message to User B, you need to validate:

Sender experience: Message delivery confirmations, chat state updates, typing indicators
Recipient experience: Message arrival, notification badges, read receipts, UI state changes

This requires multiple browser contexts, and at scale, this becomes a resource management challenge that can cripple your development pipeline.

Our testing environment served a team of 20+ engineers with:

50+ development environments running nightly regression suites
1,800 tests per environment across desktop and mobile web, on Chrome and Safari
Multi-perspective scenarios requiring 2-5 browser pages per test
CI infrastructure costs growing linearly with resource usage

#Technical Root Cause Analysis

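The issue wasn't with Playwright's design. It was with how teams typically handle multi-page scenarios. Playwright's built-in page and context fixtures are torn down after every test, which is why each test starts clean. Pages created via browser.newPage() sit outside that lifecycle: each one gets its own new context and stays open until it is closed explicitly or the worker's browser shuts down, not when the individual test ends.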

This creates a resource accumulation pattern:

typescript

// The hidden resource leak pattern
test('conversation flow test 1', async ({ browser }) => {
  const senderPage = await browser.newPage() // Page 1: Never cleaned up
  const recipientPage = await browser.newPage() // Page 2: Never cleaned up
  // Test logic...
  // ❌ Pages remain in memory after test completion
})

test('conversation flow test 2', async ({ browser }) => {
  const senderPage = await browser.newPage() // Page 3: Accumulates
  const recipientPage = await browser.newPage() // Page 4: Accumulates
  // Test logic...
})
// Result: Linear memory growth = Test count × Pages per test

#The CI Infrastructure Impact

The resource leak manifested as a cascading failure in our development pipeline:

Memory Growth Pattern:

Tests 1-50: 2 pages each = 100 pages open
Tests 51-100: 2 pages each = 200 pages open
Tests 101-200: 2 pages each = 400 pages open
Result: 8GB Jenkins instances running out of memory before completion
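
The growth above is plain multiplication, which a few lines make concrete. This is an illustrative sketch, not a measurement: `leakedPages` is a hypothetical helper, and it assumes every test opens the same number of pages and closes none of them.

```typescript
// Hypothetical helper: pages still open after `testsRun` tests
// when each test opens `pagesPerTest` pages and closes none.
function leakedPages(testsRun: number, pagesPerTest: number): number {
  return testsRun * pagesPerTest
}

for (const testsRun of [50, 100, 200, 400]) {
  console.log(`${testsRun} tests -> ${leakedPages(testsRun, 2)} open pages`)
}
// 50 tests -> 100 open pages
// 100 tests -> 200 open pages
// 200 tests -> 400 open pages
// 400 tests -> 800 open pages
```

At two pages per test, 800 open tabs is what about 400 tests leave behind; with more pages per test, it takes fewer.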

Business Impact:

❌ Build reliability: 60% of CI builds failing due to resource exhaustion
❌ Developer productivity: 45-minute feedback cycles instead of 15 minutes
❌ Infrastructure costs: Scaling Jenkins instances horizontally without fixing the root cause
❌ Team confidence: Engineers avoiding comprehensive test coverage due to CI instability

#Engineering Solution Architecture

Rather than implementing manual cleanup everywhere (which creates maintenance overhead and error-prone code), I designed an automatic solution that handles page lifecycle management at the framework level.

#Design Principles

Zero Configuration: Should work out-of-the-box with existing Playwright tests
Automatic Tracking: Intercept browser.newPage() calls transparently
Reliable Cleanup: Use Playwright's fixture system for guaranteed cleanup
Graceful Failure Handling: Clean up pages even when tests fail or time out
Developer Experience: Maintain existing test code patterns

#The Cleanup Chaos

Our first attempt was the obvious one: manual cleanup.

typescript

test('messaging with manual cleanup', async ({ browser }) => {
  const senderPage = await browser.newPage()
  const recipientPage = await browser.newPage()

  try {
    // Test logic here...
  } finally {
    await senderPage.close()
    await recipientPage.close()
  }
})

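This worked... until it didn't. The pages were created before the try block, so a failure in the second newPage() call skipped the finally entirely and leaked the first page. A close() call that rejected skipped every close after it. Async operations hung. Network timeouts left pages in weird states.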
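
One concrete way the finally block falls short: a close() that rejects prevents the closes after it from running. Here is a minimal sketch of that case. It uses hypothetical stand-in objects (`ClosablePage`, `flaky`, `healthy`) rather than real Playwright pages, and the error is simulated.

```typescript
// Stand-in for the one Page method used here (hypothetical, for illustration).
interface ClosablePage {
  close(): Promise<void>
}

// Same finally-block shape as the manual cleanup in the test above.
async function naiveCleanup(first: ClosablePage, second: ClosablePage): Promise<void> {
  try {
    // Test logic would go here...
  } finally {
    await first.close() // if this rejects...
    await second.close() // ...this line never runs
  }
}

async function demo(): Promise<string[]> {
  const closed: string[] = []
  const flaky: ClosablePage = {
    close: async () => {
      throw new Error('simulated close failure')
    },
  }
  const healthy: ClosablePage = {
    close: async () => {
      closed.push('recipient')
    },
  }
  await naiveCleanup(flaky, healthy).catch(() => {})
  return closed // stays empty: the healthy page was never closed
}
```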

We ended up with:

typescript

test('messaging with paranoid cleanup', async ({ browser }) => {
  let senderPage, recipientPage

  try {
    senderPage = await browser.newPage()
    recipientPage = await browser.newPage()

    // Test logic...
  } finally {
    if (senderPage && !senderPage.isClosed()) {
      await senderPage.close().catch(console.error)
    }
    if (recipientPage && !recipientPage.isClosed()) {
      await recipientPage.close().catch(console.error)
    }
  }
})

Multiply this everywhere. Every test became a defensive programming exercise. Our test code was 40% actual testing, 60% memory management ceremony.

#The Jenkins Memory Battle

Meanwhile, our DevOps team was frantically adjusting Jenkins configurations:

yaml

# Increase memory allocation (spoiler: didn't help)
memory: 8GB → 16GB → 32GB

# Reduce parallel workers (slowed everything down)
workers: 8 → 4 → 2 → 1

# Add aggressive timeouts (killed valid long-running tests)
timeout: 30s → 10s

We were treating the symptoms, not the disease. More memory just meant we could accumulate more leaked pages before crashing. Fewer workers meant longer build times. Aggressive timeouts meant valid tests failing for no real reason.

The real problem wasn't memory—it was lifecycle management.

#The Eureka Moment

During one particularly frustrating debugging session, I noticed something odd. When I manually closed a browser window in headed mode, the test would suddenly start behaving normally. It wasn't a test logic problem—it was a resource management problem.

That's when it hit me: What if page cleanup was automatic and reliable, just like Playwright's context isolation?

I wanted the same confidence I had with contexts—knowing that every test starts clean and finishes clean, regardless of what happens in between.

#Building the Solution

I spent the next few days building what would become Playwright PageMan. The concept was elegantly simple:

Track every extra page created during test execution
Auto-close them after each test via Playwright's fixture lifecycle
Handle failures gracefully so cleanup always happens
Make it automatic for the common case (browser.newPage())
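
The first three steps can be sketched in a few lines. This is not PageMan's source, just one way such a tracker could look. `TrackedPage` and `PageTracker` are hypothetical names, and the interface lists only the two Page methods the sketch relies on.

```typescript
// Minimal structural type: just the Page methods this sketch uses.
interface TrackedPage {
  close(): Promise<void>
  isClosed(): boolean
}

// Hypothetical tracker: collects pages during a test and closes them afterwards.
class PageTracker {
  private pages: TrackedPage[] = []

  // Registers a page for cleanup and hands it back unchanged.
  track<T extends TrackedPage>(page: T): T {
    this.pages.push(page)
    return page
  }

  // Closes every tracked page that is still open. A failing close()
  // does not stop the others. Returns the number of failed closes.
  async closeAll(): Promise<number> {
    const open = this.pages.filter((page) => !page.isClosed())
    this.pages = [] // reset so the next test starts with an empty list
    const outcomes = await Promise.all(
      open.map((page) =>
        page.close().then(
          () => true,
          () => false,
        ),
      ),
    )
    return outcomes.filter((ok) => !ok).length
  }
}
```

In a Playwright suite, `closeAll()` would run in a fixture's teardown (the code after `await use(...)`), which Playwright executes whether the test passed or failed.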

Here's how the same test looks with PageMan:

typescript

import { test, expect } from 'playwright-pageman'

test('user can send and receive messages', async ({ browser }) => {
  // This page is automatically tracked!
  const senderPage = await browser.newPage()
  await senderPage.goto('/chat/sender-view')

  // This page is automatically tracked too!
  const recipientPage = await browser.newPage()
  await recipientPage.goto('/chat/recipient-view')

  // ... test both sides of the conversation ...

  // No cleanup needed! Pages auto-close after the test.
})

That's it. No try-finally blocks. No manual cleanup. No defensive programming. Just the test logic that actually matters.

#Auto-Tracking Magic

The secret sauce is automatic tracking. PageMan intercepts browser.newPage() calls and automatically adds them to a cleanup queue. When the test finishes (whether it passes, fails, or times out), Playwright's fixture system ensures all tracked pages get closed.
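
How PageMan does the interception internally isn't shown here. One common way to get this effect is to wrap the method so every page it returns is also recorded. The sketch below assumes that approach; `PageLike`, `BrowserLike`, and `trackNewPages` are hypothetical names, and the interfaces are stand-ins for Playwright's real types.

```typescript
// Minimal structural types standing in for Playwright's Page and Browser.
interface PageLike {
  close(): Promise<void>
}
interface BrowserLike<P extends PageLike> {
  newPage(options?: object): Promise<P>
}

// Replaces browser.newPage with a wrapper that records every page it creates.
// Returns a function that restores the original method.
function trackNewPages<P extends PageLike>(
  browser: BrowserLike<P>,
  created: P[],
): () => void {
  const original = browser.newPage
  browser.newPage = async (options?: object) => {
    const page = await original.call(browser, options)
    created.push(page) // remember it for cleanup after the test
    return page
  }
  return () => {
    browser.newPage = original
  }
}
```

The test code keeps calling `browser.newPage()` exactly as before. Only the bookkeeping changes, which is what makes a zero-configuration rollout possible.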

For the less common cases—pages created via context.newPage() or popup windows—you can manually track them:

typescript

test('handle popup messages', async ({ page, context, extraPages }) => {
  // Open a popup window
  const [popup] = await Promise.all([
    context.waitForEvent('page'),
    page.click('#open-conversation-popup'),
  ])

  // Track it for auto-cleanup
  extraPages.push(popup)

  // Test the popup interface...
  // Popup auto-closes after the test
})

#The CI Transformation

After rolling out PageMan, our Jenkins builds went from disaster to delight:

Before PageMan:

❌ Builds timing out after 45 minutes
❌ Out of memory errors
❌ 800+ leaked browser windows
❌ Tests failing due to resource exhaustion
❌ Manual cleanup everywhere

After PageMan:

✅ Reliable 15-minute build times
✅ Stable memory usage (under 2GB)
✅ Zero leaked pages
✅ Tests failing only when they should
✅ Clean, focused test code

#Real-World Benefits

The impact went beyond just CI stability:

For QA Engineers:

Write tests that focus on behavior, not cleanup
No more defensive try...finally everywhere
Reliable execution in both local and CI environments

For DevOps Teams:

Predictable resource usage in CI pipelines
Smaller Jenkins instances (saved actual money)
No more 3 AM alerts about crashed test runners

For Development Teams:

Faster feedback loops with stable CI
More confidence in test results
Multi-page testing patterns become trivial

#The Global Access Innovation

One of the most requested features came from page object models. Teams wanted to track pages created inside helper functions without passing fixtures around:

typescript

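// helpers/conversation-helper.ts
import type { BrowserContext } from '@playwright/test'
import { extraPages } from 'playwright-pageman'

export class ConversationHelper {
  // Opens a chat page for the given user and registers it for auto-cleanup.
  async openNewChatWindow(context: BrowserContext, userId: string) {
    const page = await context.newPage()
    extraPages.push(page) // Global access - no fixture passing!
    await this.loginAsUser(page, userId) // loginAsUser: app-specific method, omitted here
    return page
  }
}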

This made PageMan even more seamless—no architectural changes needed, just better lifecycle management.

#The Open Source Journey

After seeing the transformation in our own testing workflows, I realized this was a universal problem. Every team doing complex end-to-end testing faces the same page management challenges:

E-commerce sites: Testing both customer and admin interfaces
Collaboration tools: Multiple users interacting simultaneously
Real-time applications: Different viewport experiences
Multi-tenant platforms: Various user role perspectives

So I open-sourced Playwright PageMan with:

Zero configuration setup (just change your import)
Auto-tracking enabled by default (handles 80% of cases automatically)
Manual tracking for special cases (popups, context pages, etc.)
Configurable timeouts and logging (for debugging edge cases)
Full TypeScript support (because types prevent bugs)

#Configuration for Different Environments

PageMan adapts to your workflow:

typescript

// For local development - see what's happening
test.use({
  pageManOptions: {
    logCleanup: true, // Log cleanup actions
    closeTimeout: 5000, // Patient with slow pages
  },
})

// For CI - fast and silent
test.use({
  pageManOptions: {
    logCleanup: false, // No noise in CI logs
    closeTimeout: 1000, // Aggressive timeouts
  },
})

// For debugging - manual control
test.use({
  pageManOptions: {
    autoTrack: false, // Manual tracking only
  },
})
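
A close timeout like `closeTimeout` is typically enforced by racing the close against a timer, so one stuck page cannot stall the rest of the cleanup. The sketch below assumes that approach and is not taken from PageMan's source; `closeWithTimeout` and `ClosablePage` are hypothetical names.

```typescript
// Stand-in for the one Page method used here.
interface ClosablePage {
  close(): Promise<void>
}

// Resolves to 'closed' if page.close() succeeds in time, 'failed' if it
// rejects, and 'timed-out' if it has not settled after closeTimeout ms.
async function closeWithTimeout(
  page: ClosablePage,
  closeTimeout: number,
): Promise<'closed' | 'failed' | 'timed-out'> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<'timed-out'>((resolve) => {
    timer = setTimeout(() => resolve('timed-out'), closeTimeout)
  })
  const closing = page.close().then(
    () => 'closed' as const,
    () => 'failed' as const,
  )
  try {
    return await Promise.race([closing, timeout])
  } finally {
    clearTimeout(timer) // a pending timer would otherwise delay process exit
  }
}
```

A short timeout suits CI, where a hung page should not hold up the next test. A longer one suits local runs, where slow pages are more likely to be worth waiting for.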

#The Community Response

Since going open source, PageMan has attracted teams with even more creative use cases:

Visual regression testing: Comparing designs across multiple viewport sizes
Accessibility testing: Testing screen readers in multiple windows simultaneously
Load testing: Simulating multiple user sessions from a single test
Cross-browser workflows: Coordinating actions between different browser instances

The common thread? Every team was fighting the same page lifecycle battle, and PageMan made it disappear.

#Looking Forward

Today, PageMan manages page lifecycles for thousands of tests across dozens of CI environments. What started as a weekend fix for our Jenkins memory crisis has become a tool that makes multi-page testing humane.

The lesson? Sometimes the best tools aren't about adding new capabilities—they're about removing friction from what you already know how to do.

If your CI builds are mysteriously slow, if your Jenkins instances keep running out of memory, if you're writing more cleanup code than test logic—you're not alone. And maybe, just maybe, your test runner is hosting an involuntary browser tab convention too.

#Get Started

Want to stop the page leak crisis in your tests? PageMan takes 30 seconds to install:

bash

npm install playwright-pageman

typescript

// Replace this:
import { test, expect } from '@playwright/test'

// With this:
import { test, expect } from 'playwright-pageman'

// That's it! Auto-tracking is enabled by default.
test('your test', async ({ browser }) => {
  const page1 = await browser.newPage() // Automatically tracked
  const page2 = await browser.newPage() // Also automatically tracked
  // Both pages auto-close after the test
})

No more manual cleanup. No more leaked pages. No more Jenkins tab parties.

Sometimes the smartest thing you can do is let the computer handle the boring stuff, so you can focus on what actually matters—making sure your conversation works from both sides.

Fighting your own page leak crisis? Found PageMan useful for your multi-page testing scenarios? Share your experience or contribute to the project. Let's make testing cleaner, one auto-closed page at a time.
