Question 1

Is the 85% independently verified?

Accepted Answer

Not yet. The figure is self-reported on our internal benchmark suite, with the full methodology published here so it can be reproduced and challenged. An independent, public benchmark is in progress. In the meantime, because you own the generated Playwright code, you can run the same check on your own apps.

Question 2

85% of what, exactly?

Accepted Answer

Our valid-test rate: of the tests generated from a URL, the share that are correct and meaningful, meaning they pass for the right reason and assert something real. It is not a count of how many tests were created.
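
As a rough illustration of that definition, the rate can be sketched as a simple ratio. This is only a sketch under stated assumptions: the `GeneratedTest` shape, its field names, and the sample batch are hypothetical and are not taken from the published methodology.

```typescript
// Hypothetical record for one generated test; field names are illustrative assumptions.
interface GeneratedTest {
  name: string;
  passesForRightReason: boolean; // the pass reflects real app behavior
  assertsSomethingReal: boolean; // more than a trivial "page loaded" check
}

// Valid-test rate = valid tests / all generated tests (a share, not a raw count).
function validTestRate(tests: GeneratedTest[]): number {
  if (tests.length === 0) {
    return 0; // assumption: report 0 for an empty batch to avoid dividing by zero
  }
  const valid = tests.filter(
    (t) => t.passesForRightReason && t.assertsSomethingReal
  ).length;
  return valid / tests.length;
}

// Made-up sample batch: three of the four tests meet both conditions.
const sampleBatch: GeneratedTest[] = [
  { name: "login succeeds", passesForRightReason: true, assertsSomethingReal: true },
  { name: "cart total updates", passesForRightReason: true, assertsSomethingReal: true },
  { name: "search returns results", passesForRightReason: true, assertsSomethingReal: true },
  { name: "home page loads", passesForRightReason: true, assertsSomethingReal: false },
];

console.log(validTestRate(sampleBatch)); // 0.75
```

Note that generating more tests does not raise the ratio by itself; only tests that meet both conditions move it.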

Question 3

How do you stop the number from being gamed?

Accepted Answer

Two ways. First, trivial "page loaded" assertions don't count toward valid tests. Second, a test that fails because of a genuine app bug is quarantined and flagged rather than rewritten to pass, so hiding a defect would lower our bug-recall number, not raise our pass rate.
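
To make the second point concrete, here is a minimal sketch that assumes bug recall is defined in the usual way, as known bugs flagged divided by known bugs present. That definition, the `BenchmarkRun` shape, and the numbers are all assumptions for illustration, not the published methodology.

```typescript
// Hypothetical benchmark bookkeeping; every name and number here is an illustrative assumption.
interface BenchmarkRun {
  knownBugs: number;   // defects known to exist in the benchmark apps
  flaggedBugs: number; // known defects surfaced by a quarantined, flagged test
}

// Assumed definition: bug recall = known bugs flagged / known bugs present.
function bugRecall(run: BenchmarkRun): number {
  if (run.knownBugs === 0) {
    return 1; // assumption: with no known bugs, nothing can be missed
  }
  return run.flaggedBugs / run.knownBugs;
}

// Failing tests are quarantined and flagged, so each bug counts as found.
const quarantined: BenchmarkRun = { knownBugs: 10, flaggedBugs: 8 };

// If one failing test were rewritten to pass, that bug would drop out of the flagged count.
const rewritten: BenchmarkRun = { knownBugs: 10, flaggedBugs: 7 };

console.log(bugRecall(quarantined)); // 0.8
console.log(bugRecall(rewritten));   // 0.7
```

Under these assumptions, masking a defect shows up as a lower recall figure instead of disappearing.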

Question 4

What happens to the other 15%?

Accepted Answer

Tests that don't prove out are dropped rather than shipped, and tests flagged as false positives or low-confidence are surfaced for review. You always get native code you can edit, so nothing is hidden.
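
The handling described above can be sketched as a small triage function. The verdict and action labels are hypothetical names chosen for this example; the actual pipeline may categorize tests differently.

```typescript
// Hypothetical verdicts for a generated test; these labels are illustrative assumptions.
type Verdict = "proven" | "unproven" | "false_positive" | "low_confidence";
type Action = "ship" | "drop" | "review";

// Map each verdict to what happens to the test.
function triage(verdict: Verdict): Action {
  switch (verdict) {
    case "proven":
      return "ship"; // the test proved out, so it is delivered
    case "unproven":
      return "drop"; // the test did not prove out, so it is not shipped
    case "false_positive":
    case "low_confidence":
      return "review"; // surfaced for a human to inspect
  }
}

console.log(triage("unproven"));       // "drop"
console.log(triage("low_confidence")); // "review"
```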

Question 5

Will accuracy hold on my app?

Accepted Answer

It depends on your app: complex or unusual flows are harder than standard ones. The honest answer is to try it. It's free during launch, and you own and can read every test it writes, so you can measure the result directly.

Question 6

Why publish an accuracy number at all?

Accepted Answer

Most AI testing tools don't, because a number invites scrutiny. We think an accuracy figure you can hold us to, with the method in the open, is the honest way to earn trust in a category full of vague claims.