The test pyramid is dead — what replaced it

The test pyramid hasn’t been replaced. It’s been misread for fifteen years, and the tooling finally made the misreading visible.

The specific advice that still holds: test at multiple granularity levels, and have fewer high-level tests than low-level ones. Everything else (the ratios, the tiers, the shape) is folk wisdom that accumulated around a metaphor. The “70/20/10” breakdown (70% unit, 20% integration, 10% E2E) widely attributed to Google does trace back to a 2015 Google Testing Blog post, but that post offers it only as “a good first guess” and says the exact mix will differ for each team. You’ve been optimizing for a number its own source never treated as a target.

In 2026, the more urgent conversation is this: Vitest v4.1.7 runs “unit tests” in real Chromium. Playwright v1.60.0 runs component tests in real browsers using the same MSW handlers as your integration tests. The boundary between unit and integration isn’t a philosophy anymore; it’s a config flag.

Who this is for

Frontend and full-stack developers who write tests and have absorbed the pyramid as a ratio target. If you’re already testing at multiple levels by instinct without worrying about percentages, you can skip to the tooling section.

Where the pyramid came from

Mike Cohn introduced the test pyramid around 2009 as a shape, not a formula. Tests at the bottom (unit) are fast and cheap. Tests at the top (UI/E2E) are slow and expensive. Have more of the fast ones.

Ham Vocke’s 2018 article “The Practical Test Pyramid,” published on Martin Fowler’s site and still the clearest primary source on the subject, distilled this to two sentences: “Write tests with different granularity. The more high-level you get the fewer tests you should have.” No ratio. No percentages. The pyramid was always a visual heuristic for relative counts, not a quantitative target.

The 70/20/10 figures entered the discourse as a “Google recommendation” through the Google Testing Blog post “Just Say No to More End-to-End Tests” (2015). That post is real and the core argument is solid: a suite dominated by E2E tests gets slower and flakier as it grows. The ratio appears there once, hedged: “As a good first guess, Google often suggests a 70/20/10 split,” followed by the caveat that the exact mix will be different for each team. It is not in Cohn’s writing or in Vocke’s article. It has been repeated as a prescription for a decade with the caveat stripped off.

Why the ratios broke anyway

Even if 70/20/10 had been a real prescription, the premises that made it reasonable have changed.

E2E flakiness is still the bottleneck

The Google 2015 argument holds in 2026. Industry data puts E2E flakiness at 15–21% across real-world test suites. A peer-reviewed industrial study found that flaky tests cost developers roughly 2.5% of total working time: debugging, reruns, and the slower feedback loop that makes people stop running tests locally. E2E tests that touch a real browser, a real network, and a real backend inherit the failure modes of all three, and those failure modes compound. You do not want most of your tests here.
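To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes every test flakes independently at the same rate, which real suites only approximate, and the 1% rate is an illustrative number, not a measurement. The function name suiteGreenProbability is hypothetical.

```typescript
// Probability that one suite run is entirely green when each test can
// flake independently. `perTestFlakeRate` is an assumed per-run probability.
function suiteGreenProbability(perTestFlakeRate: number, testCount: number): number {
  if (perTestFlakeRate < 0 || perTestFlakeRate > 1) {
    throw new RangeError('perTestFlakeRate must be between 0 and 1')
  }
  if (!Number.isInteger(testCount) || testCount < 0) {
    throw new RangeError('testCount must be a non-negative integer')
  }
  // Every test must pass, so the per-test pass probabilities multiply.
  return Math.pow(1 - perTestFlakeRate, testCount)
}

// Illustrative only: a 1% per-test flake rate at three suite sizes.
for (const count of [10, 50, 200]) {
  const green = suiteGreenProbability(0.01, count)
  console.log(`${count} tests: ${(green * 100).toFixed(1)}% chance of a clean run`)
}
```

Under those assumptions a 10-test suite runs clean roughly 90% of the time, while a 50-test suite drops to roughly 60%. The per-test rate did not change; only the count did.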

That’s still the pyramid’s actual lesson, and it’s still correct.

The tier boundary is a config flag now

The practical argument for “mostly unit tests” was environmental: unit tests ran in Node.js or jsdom, integration tests might spin up a database, E2E tests required a browser. Speed scaled inversely with environment fidelity.

Vitest v4.1.7 (released May 20, 2026) ships browser mode as a stable feature. You write a test that looks exactly like a Vitest unit test and it runs in Chromium, Firefox, or WebKit via the Playwright or WebdriverIO provider. The official rationale is direct: jsdom and happy-dom “only simulate a browser environment and not an actual browser, which may result in some discrepancies… Therefore, false positives or negatives in test results may occur.” Browser mode removes that class of false signal.

Playwright v1.60.0 (released May 11, 2026) runs component tests (React, Vue, Svelte) in real browsers with a mounting API that looks like your unit test setup. The component testing packages still carry the experimental- prefix, so call this production-capable but not formally stable. More concretely: Playwright’s component test runner exposes a router fixture that accepts MSW request handlers via router.use(...handlers). If you’re already using MSW to mock network requests in Vitest, you can reuse those handlers in Playwright component tests without modification.

The unit/integration/E2E tier distinction used to map to an environment distinction. Now it’s increasingly a question of scope — how much of the system does this test exercise? — rather than where it runs.
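As a rough sketch of what “a config flag” can mean in practice, a single Vitest config can declare one project that runs in Node and another that runs in a real browser. This assumes the projects option and the separately packaged Playwright provider found in recent Vitest releases; the file-name patterns are hypothetical conventions, so check the docs for your installed version before copying it.

```typescript
// vitest.config.ts: one suite, two environments (a sketch, not a canonical setup)
import { defineConfig } from 'vitest/config'
import { playwright } from '@vitest/browser-playwright'

export default defineConfig({
  test: {
    projects: [
      {
        test: {
          name: 'node',
          environment: 'node',
          // hypothetical naming convention for pure-logic tests
          include: ['src/**/*.node.test.ts'],
        },
      },
      {
        test: {
          name: 'browser',
          // hypothetical naming convention for DOM-dependent tests
          include: ['src/**/*.browser.test.{ts,tsx}'],
          browser: {
            enabled: true,
            provider: playwright(),
            instances: [{ browser: 'chromium' }],
          },
        },
      },
    ],
  },
})
```

Moving a test from one tier to the other is then a matter of which pattern its file name matches, which is the sense in which the environment stops defining the tier.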

What the alternative shapes say

Several frameworks have proposed alternatives to the pyramid. None has achieved the same canonical status, and primary-source verification is uneven. Here’s what’s solid enough to act on:

Kent C. Dodds’ Testing Trophy puts “integration” at the widest band. The underlying logic — that tests which exercise real module boundaries give more confidence per test than pure unit tests — is coherent with how modern tooling works. Worth reading the original post and Dodds’ 2025 update discussion directly; the specific claims about ratios require independent verification.

Spotify’s Testing Honeycomb (2018) reoriented testing around microservices: the natural integration point between services is a better test boundary than mocking internal modules. If you work in a microservice architecture, the 2018 Spotify engineering post is worth reading. The popular summary that it “inverts the pyramid” is only partly borne out by the post itself; the nuance may matter to you.

Google’s SMURF (October 2024, Testing Blog) characterizes test suites by behavioral properties rather than tier labels. The mnemonic stands for Speed, Maintainability, Utilization, Reliability, and Fidelity; some secondary sources expand it as “Small, Medium, Useful, Reliable, Fast,” which is not what the post says. Read it directly before citing it. The underlying move (classify by behavior, not by tier name) is compatible with where the industry is going regardless.

What all three share: they argue that the pyramid’s tier labels are less useful than clarity about what confidence you’re buying from each test. The “integration” band on the Trophy and the “medium” size in Google’s older small/medium/large test classification are doing similar conceptual work: exercising real dependencies without the full network stack.

Concrete strategy by project type

The evidence above supports this general posture: don’t optimize for a tier ratio; optimize for the confidence each test type buys relative to its flakiness in your specific environment.
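One way to turn that posture into something measurable is sketched below. It is an assumption, not an established metric: it uses regressions caught as a stand-in for confidence, and the TierStats shape, the function names, and every number in the sample data are hypothetical.

```typescript
// Hypothetical per-tier numbers a team might pull from its CI history.
interface TierStats {
  tier: string
  regressionsCaught: number // real bugs the tier caught (proxy for confidence)
  flakyFailures: number // failures that passed on rerun with no code change
}

// Flaky failures paid per regression caught; lower is better.
function flakinessPerConfidence(stats: TierStats): number {
  if (stats.regressionsCaught === 0) {
    // A tier that caught nothing is free only if it also never flaked.
    return stats.flakyFailures === 0 ? 0 : Infinity
  }
  return stats.flakyFailures / stats.regressionsCaught
}

// The tier to shrink first is the one with the worst ratio.
function tierToMinimize(all: TierStats[]): string | undefined {
  let worst: TierStats | undefined
  for (const stats of all) {
    if (worst === undefined || flakinessPerConfidence(stats) > flakinessPerConfidence(worst)) {
      worst = stats
    }
  }
  return worst?.tier
}

// Made-up sample data for illustration.
const lastQuarter: TierStats[] = [
  { tier: 'unit', regressionsCaught: 40, flakyFailures: 2 },
  { tier: 'integration', regressionsCaught: 30, flakyFailures: 5 },
  { tier: 'e2e', regressionsCaught: 12, flakyFailures: 30 },
]
console.log(tierToMinimize(lastQuarter))
```

The point of the sketch is the comparison, not the specific proxy: whatever you count as confidence, measure it per tier in your own pipeline instead of inheriting a percentage.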

Monolith with server-rendered templates

The pyramid applies most cleanly here. Unit test business logic. Integration-test database writes and service calls with a real database (a Docker container in CI is fine). E2E-test three to five critical user journeys — login, checkout, signup. Don’t grow the E2E layer because it feels thorough.

API service (REST or GraphQL)

Skip unit tests of trivial pure functions (one-line mappers, pass-through getters); the coverage isn’t worth the maintenance. Pure logic with real branching is still worth a fast unit test. Integration-test your endpoints against a real database. Contract-test the API boundary with consumers if you have them. You likely need almost no browser automation.
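A contract test can start very small. The sketch below hand-rolls a runtime check for a hypothetical user endpoint; the UserResponse shape and the isUserResponse name are assumptions for illustration, and a real project would more likely reach for a schema library or a dedicated contract-testing tool.

```typescript
// Shape the consumer relies on (a hypothetical contract for a user endpoint).
interface UserResponse {
  id: number
  name: string
}

// Runtime check that a parsed JSON body satisfies the contract.
function isUserResponse(body: unknown): body is UserResponse {
  if (typeof body !== 'object' || body === null) return false
  const candidate = body as Record<string, unknown>
  return typeof candidate['id'] === 'number' && typeof candidate['name'] === 'string'
}

// Provider side: run the guard against real endpoint output in an integration test.
// Consumer side: run the same guard against the payload your mocks return.
const mockedPayload: unknown = JSON.parse('{"id": 1, "name": "Test User"}')
console.log(isUserResponse(mockedPayload))
console.log(isUserResponse({ id: '1', name: 'Test User' }))
```

Sharing one guard between provider and consumer tests is the design choice that matters here: if either side drifts, the same check fails on that side.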

Frontend-heavy SPA

This is where Vitest browser mode and Playwright component testing change the calculus. The traditional argument for pure unit tests was that jsdom was fast and “good enough.” It’s not good enough for layout-dependent logic, CSS interactions, or anything that hits ResizeObserver or IntersectionObserver. Run component tests in a real browser via Vitest browser mode or Playwright CT. Use MSW for network boundaries — the same handlers work in both. Reserve Playwright E2E for end-to-end user journeys; keep the count small.

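// vitest.config.ts: browser mode (Vitest 4 provider API)
import { defineConfig } from 'vitest/config'
// Vitest 4 ships browser providers as separate packages
import { playwright } from '@vitest/browser-playwright'

export default defineConfig({
  test: {
    browser: {
      enabled: true,
      // v4 takes a provider factory rather than the string 'playwright'
      provider: playwright(),
      // `instances` replaces the older top-level `name` option
      instances: [{ browser: 'chromium' }],
    },
  },
})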

// playwright.config.ts — component testing
import { defineConfig } from '@playwright/experimental-ct-react'

export default defineConfig({
  use: {
    ctPort: 3100,
  },
})

The MSW handler reuse across Vitest and Playwright looks like this:

// handlers.ts — write once, use in both
import { http, HttpResponse } from 'msw'

export const handlers = [
  http.get('/api/user', () =>
    HttpResponse.json({ id: 1, name: 'Test User' })
  ),
]

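// vitest: setup.ts (Node or jsdom environment)
import { afterAll, afterEach, beforeAll } from 'vitest'
import { setupServer } from 'msw/node'
import { handlers } from './handlers'

// msw/node intercepts requests inside the Node process; in Vitest
// browser mode, use setupWorker from 'msw/browser' instead
const server = setupServer(...handlers)
beforeAll(() => server.listen())
afterEach(() => server.resetHandlers())
afterAll(() => server.close())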

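// playwright component test (a .tsx spec file)
import { test, expect } from '@playwright/experimental-ct-react'
import { handlers } from './handlers'
// hypothetical component path; adjust to your project
import { UserProfile } from './UserProfile'

test('loads user profile', async ({ mount, router }) => {
  // route the component's network requests through the shared MSW handlers
  await router.use(...handlers)
  const component = await mount(<UserProfile />)
  await expect(component.getByText('Test User')).toBeVisible()
})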

Verdict

The pyramid’s shape survives. The metaphor is still useful. The numerical ratios were never a prescription.

The actual prescription for 2026:

If you’re testing pure logic with no DOM or network: Vitest in Node mode. Fast, cheap, still valuable.
If you’re testing UI components: Vitest browser mode (v4.1.7) or Playwright component testing (v1.60.0, experimental prefix). Run in real browsers. The jsdom false signal problem is solved by switching environments, not by writing more E2E tests.
If you’re testing user journeys: Playwright E2E. Keep the count small. Flakiness compounds with coverage; ten reliable journeys beat fifty fragile ones.
If you’re testing API surfaces: Integration tests against real dependencies in CI. No browser needed.

The tier you should minimize — in any shape, pyramid or trophy or honeycomb — is the one with the highest flakiness rate per confidence unit delivered. In most projects that’s still E2E. The pyramid got that right. The specific numbers around it were always noise.