Flaky Tests: What Really Causes Them And How To Fix Them

Retries are not a fix. A practical breakdown of the five real causes of flaky tests, the prevention stack that kills 80% of them, and when AI self-healing handles the rest.

Every engineering team that has ever invested in end-to-end testing eventually runs into the same problem. A test that passed yesterday fails today. You rerun it and it passes. You ignore it. A week later it fails again. Nobody knows why. Somebody tags it @flaky, adds a retry, and moves on. Multiply that by a hundred tests and you have a CI pipeline nobody trusts.

Flaky tests are the single largest tax on automated testing. Google has reported that almost 16% of its tests show some level of flakiness. A survey by Microsoft put developer time lost to flaky tests at 4 to 10 hours per week per engineer. That is not a rounding error. At the upper end, it is a full workday every week lost to tests that do not do what they promise.

This post breaks down what actually causes flaky tests, why the usual fixes make things worse, and what a modern approach looks like. No hype, just the patterns we see repeat across thousands of real Playwright test runs.

The Five Real Causes of Flaky Tests

When we analyzed failure patterns across thousands of test runs, five categories accounted for over 94% of intermittent failures. If you can eliminate these five, you will eliminate most of your flakiness.

1. Race Conditions on Asynchronous UI

The test clicks a button before a spinner resolves. It asserts on text before an API response lands. It navigates before a redirect completes. These tests pass on your laptop, where the network is a loopback and the CPU is idle, and fail in CI, where three other test suites are fighting for the same runner. Playwright auto-waits for most interactions, but it cannot auto-wait for your app's internal state to settle. Only your code knows when that state is ready.
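
One way to picture the fix is to wait on the condition the test actually depends on instead of asserting immediately or sleeping for a fixed time. The TypeScript below is a minimal, framework-free sketch under stated assumptions: `waitFor` and `FakeApp` are hypothetical names, and the polling loop only approximates the idea behind Playwright's auto-retrying assertions; it is not Playwright's implementation. In a real Playwright test, the equivalent would usually be a web-first assertion such as `await expect(locator).toHaveText(...)`, or `page.waitForFunction(...)` against a readiness signal your app exposes.

```typescript
// Minimal sketch of condition-based waiting: poll a predicate until it holds
// or a timeout elapses. `waitFor` and `FakeApp` are hypothetical names for
// illustration; this is not how Playwright is implemented internally.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitFor(
  predicate: () => boolean,
  { timeoutMs = 2000, intervalMs = 20 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Stand-in for an app whose data arrives after an unpredictable delay.
class FakeApp {
  status: "loading" | "ready" = "loading";
  constructor(delayMs: number) {
    setTimeout(() => {
      this.status = "ready";
    }, delayMs);
  }
}

async function main(): Promise<void> {
  const app = new FakeApp(150);
  // Racy check: reading the state right away still sees "loading".
  const immediate = app.status;
  // Robust check: wait for the state the assertion depends on.
  await waitFor(() => app.status === "ready");
  console.log(immediate, app.status); // loading ready
}

void main();
```

The timeout matters as much as the polling: a wait that can never fail turns a real bug into a hung CI job, so the sketch throws once the deadline passes.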

2. Test Ordering and Shared State

Test A creates a record. Test B assumes it exists. Run them in order and both pass. Run them in parallel or shuffled, and Test B fails half the time. Shared state is the silent killer of test reliability. Databases, localStorage, cookies, in-memory caches, even the URL bar can leak between tests. If your test depends on any state not created inside the test itself, it will eventually fail for reasons unrelated to the code it is supposed to validate.
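
A common remedy is to have every test create the state it needs, using values unique to that test. The sketch below is a hypothetical, framework-free illustration: the in-memory `Store` stands in for a database, and `makeUniqueEmail`, `sharedTestA`, `sharedTestB`, and `isolatedTestB` are invented names, not part of any library.

```typescript
// Minimal sketch: an order-dependent test vs. a test that creates its own state.
// `Store` stands in for a database; all names here are hypothetical.
import { randomUUID } from "node:crypto";

class Store {
  private users = new Map<string, { email: string }>();

  create(email: string): string {
    const id = randomUUID();
    this.users.set(id, { email });
    return id;
  }

  findByEmail(email: string): { email: string } | undefined {
    for (const user of this.users.values()) {
      if (user.email === email) return user;
    }
    return undefined;
  }
}

// Anti-pattern: both tests share one store, and test B assumes test A ran first.
const sharedStore = new Store();
const sharedTestA = (): void => {
  sharedStore.create("alice@example.com");
};
const sharedTestB = (): boolean => sharedStore.findByEmail("alice@example.com") !== undefined;

// Isolated pattern: the test builds its own store and seeds the record it needs
// with a unique value, so run order and parallelism do not change the result.
const makeUniqueEmail = (): string => `user-${randomUUID()}@example.com`;
const isolatedTestB = (): boolean => {
  const store = new Store();
  const email = makeUniqueEmail();
  store.create(email);
  return store.findByEmail(email) !== undefined;
};

// Running B before A exposes the hidden dependency.
console.log(sharedTestB()); // false
sharedTestA();
console.log(sharedTestB()); // true
console.log(isolatedTestB()); // true
```

Against a real shared database the same idea holds: seed records inside the test (or a per-test fixture) with unique keys, and never look up data that some other test was expected to create.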
