Engineering Deep Dive: Structured Snapshots From Accessibility Trees
How Validate.QA parses raw accessibility YAML into compact structured snapshots with ranked locators, readiness signals, and active-scope inference.
Raw accessibility trees are great debugging artifacts and mediocre model inputs. They contain the truth, but they do not expose it in the shape an agent actually needs when it is trying to fix a failing locator or understand whether a page is ready. The original browser_snapshot output in our MCP flows was often 50 or more lines of YAML, full of indentation, repeated containers, generic labels, and low-signal structural noise.
That matters because agents burn turns parsing before they can act. If the model spends half a turn reconstructing the form structure in its head, it is not spending that budget on exploration, diagnosis, or code generation. The fix was not to replace the accessibility tree with something invented. It was to parse the same tree server-side, rank the useful elements, infer scope and readiness, and then render a compact block that keeps the evidence but removes the decoding tax.
The Before And After
Here is the core transformation in miniature. The input is still the accessibility tree from MCP. The output is a smaller YAML block, but one that is semantically richer for an agent.
Parsing The Tree Without Pretending We Have DOM Access
The parser in snapshot-parser.ts is intentionally humble. It does not get bounding boxes, live handles, or CSS. It gets lines of YAML, so the implementation is line-oriented and mostly regex-driven. Every line that matches the element format is inspected for role, name, attributes, indentation depth, ref, and state flags such as required, disabled, checked, selected, pressed, expanded, and popup state.
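To make the line-oriented approach concrete, here is a minimal sketch of parsing a single snapshot line with a regex. It is not the code in snapshot-parser.ts. The line shape (`- role "name" [flag] [ref=e5]`), the two-space indentation step, and the names `ParsedElement`, `STATE_FLAGS`, and `parseLine` are all assumptions made for illustration.

```typescript
// Hypothetical shape of one parsed element; field names are illustrative.
interface ParsedElement {
  depth: number;                      // indentation level, assuming 2 spaces per level
  role: string;
  name: string | null;
  ref: string | null;
  flags: string[];                    // bare state flags such as "required"
  attributes: Record<string, string>; // everything else, e.g. level=1
}

// Assumed element line format: indent, "- ", role, optional quoted name, bracketed attributes.
const ELEMENT_LINE = /^(\s*)- ([A-Za-z]+)(?: "((?:[^"\\]|\\.)*)")?((?:\s*\[[^\]]*\])*)/;

const STATE_FLAGS = new Set([
  "required", "disabled", "checked", "selected", "pressed", "expanded",
]);

function parseLine(line: string): ParsedElement | null {
  const m = ELEMENT_LINE.exec(line);
  if (m === null) return null; // not an element line (e.g. a property like "- /url: ...")

  const indent = m[1];
  const role = m[2];
  const name: string | undefined = m[3];
  const attrBlock = m[4];

  const el: ParsedElement = {
    depth: Math.floor(indent.length / 2),
    role,
    name: name ?? null,
    ref: null,
    flags: [],
    attributes: {},
  };

  // Walk each [key] or [key=value] group in the attribute block.
  const attrPattern = /\[([^\]=]+)(?:=([^\]]*))?\]/g;
  let a: RegExpExecArray | null;
  while ((a = attrPattern.exec(attrBlock)) !== null) {
    const key = a[1].trim();
    const value: string | undefined = a[2];
    if (key === "ref") {
      el.ref = value ?? null;
    } else if (value === undefined && STATE_FLAGS.has(key)) {
      el.flags.push(key);
    } else {
      el.attributes[key] = value ?? "true";
    }
  }
  return el;
}
```

Lines that do not match the element pattern return null, so a caller can skip them and keep only what it recognizes.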
Two stacks do most of the structural work. A form stack tracks open form-like containers so the parser can group fields and infer the submit button label. A context stack tracks enclosing headings, dialogs, tab panels, and other labeled containers so elements inherit a meaningful container path. By the time a line becomes a parsed element, it already knows whether it likely submits, navigates, belongs to a dialog, or sits inside a panel that should outrank the rest of the page.
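The two-stack idea can be sketched as a single pass over elements that already carry an indentation depth. This is an illustrative approximation, not the actual implementation. The role sets, the rule that a heading labels its later siblings, and the "first button in a form is the submit button" heuristic are all assumptions, as are the names `annotate`, `FormGroup`, and `Annotated`.

```typescript
interface El { depth: number; role: string; name: string | null; }
interface Annotated extends El { containerPath: string[]; formName: string | null; }
interface FormGroup { name: string | null; depth: number; fields: string[]; submitLabel: string | null; }
interface Context { label: string; depth: number; isHeading: boolean; }

// Assumed role sets, chosen for illustration only.
const CONTAINER_ROLES = new Set(["dialog", "tabpanel", "region", "navigation"]);
const FIELD_ROLES = new Set(["textbox", "checkbox", "combobox", "radio"]);

function top<T>(stack: T[]): T | undefined {
  return stack.length > 0 ? stack[stack.length - 1] : undefined;
}

function annotate(elements: El[]): { annotated: Annotated[]; forms: FormGroup[] } {
  const formStack: FormGroup[] = [];
  const contextStack: Context[] = [];
  const forms: FormGroup[] = [];
  const annotated: Annotated[] = [];

  for (const el of elements) {
    // Close forms whose subtree has ended (current line is not deeper than the form).
    for (let f = top(formStack); f !== undefined && f.depth >= el.depth; f = top(formStack)) {
      formStack.pop();
    }
    // Close contexts. Containers end when we leave their subtree; headings label
    // following siblings, so they end when the parent closes or a sibling heading appears.
    for (let c = top(contextStack); c !== undefined; c = top(contextStack)) {
      const closed = c.isHeading
        ? el.depth < c.depth || (el.role === "heading" && el.depth === c.depth)
        : el.depth <= c.depth;
      if (!closed) break;
      contextStack.pop();
    }

    const openForm = top(formStack);
    annotated.push({
      ...el,
      containerPath: contextStack.map((c) => c.label),
      formName: openForm ? openForm.name : null,
    });

    if (openForm) {
      if (FIELD_ROLES.has(el.role) && el.name) openForm.fields.push(el.name);
      // Simplified heuristic: treat the first button in the form as its submit button.
      if (el.role === "button" && el.name && openForm.submitLabel === null) {
        openForm.submitLabel = el.name;
      }
    }

    // Open new scopes after annotating, so a container does not appear in its own path.
    if (el.role === "form") {
      const group: FormGroup = { name: el.name, depth: el.depth, fields: [], submitLabel: null };
      formStack.push(group);
      forms.push(group);
    } else if (el.role === "heading" && el.name) {
      contextStack.push({ label: el.name, depth: el.depth, isHeading: true });
    } else if (CONTAINER_ROLES.has(el.role)) {
      contextStack.push({ label: el.name ?? el.role, depth: el.depth, isHeading: false });
    }
  }
  return { annotated, forms };
}
```

Because both stacks are unwound by indentation depth alone, the pass needs no lookahead: each element is annotated with whatever scopes are open at the moment its line is read.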