Deterministic Browser Harness

Computer Vision Tests

Each page starts in a known state, exposes a visible task contract, and reveals a persistent completion code only after genuine success. The suite is tuned to pinpoint where an agent gets stuck, not to simulate a full product flow.

Total tests: 19
Buckets: 4
Success contract: In-page only

Global Contract

  • Visible task instruction, `TASK_ID`, and `TASK_VERSION` on every page.
  • Exactly one intended success path with deterministic reset behavior.
  • Persistent in-page success block with hidden-until-success completion code.
  • No redirect-based success and no dependency on other tests.
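The contract above can be sketched as a small state holder. This is a hypothetical TypeScript model, not the harness's own code: `TaskPage`, its methods, and the placeholder completion code are assumptions for illustration. It captures the listed properties: a factory-based `reset()` that always restores the same initial state, a single success predicate, and a completion code that stays hidden until that predicate passes and then persists on the page.

```typescript
// Hypothetical model of the per-page contract; names are illustrative, not the harness API.
interface TaskMeta {
  taskId: string;      // shown on the page as TASK_ID
  taskVersion: string; // shown on the page as TASK_VERSION
  instruction: string; // the visible task instruction
}

class TaskPage<S> {
  readonly meta: TaskMeta;
  private readonly initialState: () => S;
  private readonly checkSuccess: (s: S) => boolean;
  private readonly completionCode: string;
  private state: S;
  private succeeded = false;

  constructor(meta: TaskMeta, initialState: () => S, checkSuccess: (s: S) => boolean, completionCode: string) {
    this.meta = meta;
    this.initialState = initialState; // a factory, so every reset yields an identical fresh state
    this.checkSuccess = checkSuccess; // the one intended success path
    this.completionCode = completionCode;
    this.state = initialState();
  }

  // Deterministic reset: fresh initial state and the success block hidden again.
  reset(): void {
    this.state = this.initialState();
    this.succeeded = false;
  }

  update(mutate: (s: S) => S): void {
    this.state = mutate(this.state);
  }

  // Success is evaluated in-page (no redirect) and, once reached, persists until reset.
  submit(): boolean {
    if (!this.succeeded && this.checkSuccess(this.state)) {
      this.succeeded = true;
    }
    return this.succeeded;
  }

  // The completion code is null (hidden) until success.
  visibleCompletionCode(): string | null {
    return this.succeeded ? this.completionCode : null;
  }
}

// Example: form-single-input-001 with a placeholder (hypothetical) completion code.
const page = new TaskPage<{ accessCode: string }>(
  { taskId: "form-single-input-001", taskVersion: "v1.0.0", instruction: "Enter AURORA-17 in the Access code field, then submit the form." },
  () => ({ accessCode: "" }),
  (s) => s.accessCode === "AURORA-17",
  "PLACEHOLDER-CODE",
);

page.update(() => ({ accessCode: "aurora-17" }));
console.log(page.submit(), page.visibleCompletionCode()); // false null
page.update(() => ({ accessCode: "AURORA-17" }));
console.log(page.submit(), page.visibleCompletionCode()); // true PLACEHOLDER-CODE
page.reset();
console.log(page.visibleCompletionCode()); // null
```

Passing a state factory rather than a state object is what keeps `reset()` deterministic: no mutation from a previous attempt can leak into the next one.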

Suite Coverage

  • Inputs, selects, radios, checkboxes, buttons, Enter-to-submit, and textareas.
  • Scroll, wizards, modals, accordions, tabs, and table row actions.
  • Validation recovery, delayed enablement, async loading, and repeated UI.
  • Event logs, machine-readable outcomes, and reset controls on each test page.

Tier 1: Atomic Vision + Action

Focused pages that isolate a single visible control or interaction pattern.

Field targeting, typing, clicking, simple submit flows, and explicit success checks.

9 tests

form-single-input-001

Single Input Form

v1.0.0

One text field, one correct value, one success path.

Enter `AURORA-17` in the Access code field, then submit the form.

Tags: field targeting, typing, submit, success verification

form-multi-field-002

Multi-Field Basic Form

v1.0.0

Three required fields with inline validation and one optional decoy field.

Fill First name with `Nora`, Last name with `Stone`, Email with `nora.stone@gettandem.app`, then submit.

Tags: label mapping, input sequencing, validation awareness, multi-field submit

dropdown-basic-003

Dropdown / Select

v1.0.0

A native select with decoy controls nearby and one valid option.

Choose `Orion Ops` from the Team assignment dropdown, then submit.

Tags: dropdown targeting, option selection, submit, state confirmation

radio-basic-004

Radio Button Group

v1.0.0

A simple radio group with similar-looking labels.

Select the deployment window labeled `06:30 PM`, then submit.

Tags: label association, small target precision, grouped choice selection

checkbox-basic-005

Checkbox Checklist

v1.0.0

Two required checkboxes plus a select-all trap and a disabled decoy.

Check only `I have reviewed the release notes` and `I confirm the staging snapshot is current`, then submit.

Tags: multi-select behavior, state detection, precision, trap avoidance
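This page's success rule can be written as a predicate over the whole checkbox group: both required boxes checked and nothing else. In this sketch the select-all control is treated as an ordinary checkbox whose checked state fails the task (the description calls it a trap without saying exactly how it behaves), and the select-all and decoy labels are hypothetical.

```typescript
// Hypothetical state shape for checkbox-basic-005; only the two required labels come from the task.
type CheckboxState = Record<string, { checked: boolean; disabled: boolean }>;

const REQUIRED = [
  "I have reviewed the release notes",
  "I confirm the staging snapshot is current",
] as const;

const initial: CheckboxState = {
  "Select all": { checked: false, disabled: false },          // hypothetical trap label
  "I have reviewed the release notes": { checked: false, disabled: false },
  "I confirm the staging snapshot is current": { checked: false, disabled: false },
  "Purge old snapshots": { checked: false, disabled: true },  // hypothetical disabled decoy
};

// Clicking a disabled box is a no-op, as with a disabled native checkbox.
function click(state: CheckboxState, label: string): CheckboxState {
  const box = state[label];
  if (box === undefined || box.disabled) return state;
  return { ...state, [label]: { ...box, checked: !box.checked } };
}

// Solved only when exactly the required set is checked: no extras, including select-all.
function isSolved(state: CheckboxState): boolean {
  const checked = Object.keys(state).filter((label) => state[label]?.checked === true);
  return checked.length === REQUIRED.length && REQUIRED.every((label) => checked.includes(label));
}

const done = click(click(initial, REQUIRED[0]), REQUIRED[1]);
console.log(isSolved(done));                                  // true
console.log(isSolved(click(done, "Select all")));             // false
console.log(isSolved(click(initial, "Purge old snapshots"))); // false: the decoy cannot be checked
```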

button-target-006

Button Recognition

v1.0.0

A primary action surrounded by similar-sized decoy buttons.

Press the button labeled `Finalize Snapshot`.

Tags: button recognition, decoy avoidance, CTA parsing

enter-submit-007

Enter-to-Submit

v1.0.0

A keyboard-only submission path with no clickable submit button.

Type `delta search ready` in the search field and submit with Enter.

Tags: keyboard submission, typing, submit detection, hang prevention

textarea-basic-008

Textarea Entry

v1.0.0

A multiline input task that checks textarea targeting and formatting.

Enter the exact three-line note shown in the reference block into the textarea, then submit.

Tags: textarea recognition, multi-line typing, submit, content verification

slider-drag-019

Slider Drag

v1.0.0

A single slider that must be moved to an exact numeric value before submit.

Drag the slider until the current value is `72`, then submit.

Tags: drag interaction, value targeting, precision, submit
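For a vision-driven agent, the useful arithmetic is the mapping between a slider value and a pixel position on the track. The sketch below assumes a linear horizontal slider whose range, step, and track geometry have been measured from the page; the 0 to 100 range and the pixel numbers in the example are hypothetical, and the half-thumb inset at each end is a common browser behavior rather than something this page is known to use.

```typescript
// Geometry an agent might measure from a screenshot; all fields are assumptions for illustration.
interface SliderGeometry {
  trackLeft: number;  // px, left edge of the track in viewport coordinates
  trackWidth: number; // px, full track width
  thumbWidth: number; // px; the thumb centre is assumed to stay thumbWidth / 2 inside each end
  min: number;
  max: number;
  step: number;
}

// Viewport x where the thumb centre should be released for a given value (linear mapping).
function xForValue(g: SliderGeometry, value: number): number {
  const travel = g.trackWidth - g.thumbWidth;
  const fraction = (value - g.min) / (g.max - g.min);
  return g.trackLeft + g.thumbWidth / 2 + fraction * travel;
}

// Inverse mapping, clamped to the track and snapped to the step grid, to predict the landed value.
function valueForX(g: SliderGeometry, x: number): number {
  const travel = g.trackWidth - g.thumbWidth;
  const fraction = Math.min(1, Math.max(0, (x - g.trackLeft - g.thumbWidth / 2) / travel));
  const raw = g.min + fraction * (g.max - g.min);
  return g.min + Math.round((raw - g.min) / g.step) * g.step;
}

const geometry: SliderGeometry = { trackLeft: 100, trackWidth: 216, thumbWidth: 16, min: 0, max: 100, step: 1 };
const target = xForValue(geometry, 72); // 100 + 8 + 0.72 * 200, i.e. about 252 px
console.log(valueForX(geometry, target)); // 72
```

Reading the displayed current value after releasing the thumb, rather than trusting the computed position, remains the safer check before submitting.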

Tier 2: Structural Interaction

Layout-heavy tasks where the agent has to navigate sections, overlays, or repeated UI.

Scroll management, wizards, modals, accordions, tabs, and dense row actions.

6 tests

scroll-long-form-009

Long Scroll Form

v1.0.0

A deliberately tall page that separates the final submit area from the first fields.

Fill Project name with `Northwind`, Owner with `Ava`, Rollout zone with `Central`, then scroll to the bottom and submit.

Tags: viewport management, scrolling, state persistence, long-form completion

wizard-steps-010

Multi-Step Wizard

v1.0.0

A three-step flow where only the last step reveals success.

Step 1: enter `BETA-9`. Step 2: choose `Manual Review`. Step 3: check the final confirmation and submit.

Tags: step progression, state persistence, progress recognition, final-step submit
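The wizard's gating can be sketched as a tiny state machine in which each step must validate before the agent can move on, and success is only ever set on step 3. The field names and the rule that an invalid step simply stays put are assumptions; the expected values come from the task text.

```typescript
// Hypothetical wizard state; only the expected values come from the task text.
interface WizardState {
  step: 1 | 2 | 3;
  accessCode: string; // step 1 input
  reviewMode: string; // step 2 choice
  confirmed: boolean; // step 3 final confirmation checkbox
  succeeded: boolean;
}

function initialWizard(): WizardState {
  return { step: 1, accessCode: "", reviewMode: "", confirmed: false, succeeded: false };
}

// Whether the current step's input is valid.
function stepValid(w: WizardState): boolean {
  if (w.step === 1) return w.accessCode === "BETA-9";
  if (w.step === 2) return w.reviewMode === "Manual Review";
  return w.confirmed;
}

// "Next" on steps 1 and 2, "Submit" on step 3. Earlier answers are carried forward unchanged.
function advance(w: WizardState): WizardState {
  if (!stepValid(w)) return w; // invalid input: stay on the same step
  if (w.step === 3) return { ...w, succeeded: true };
  const next: 2 | 3 = w.step === 1 ? 2 : 3;
  return { ...w, step: next };
}

let wizard = initialWizard();
wizard = advance({ ...wizard, accessCode: "BETA-9" });
wizard = advance({ ...wizard, reviewMode: "Manual Review" });
console.log(wizard.step, wizard.succeeded); // 3 false
wizard = advance({ ...wizard, confirmed: true });
console.log(wizard.succeeded); // true
```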

modal-confirm-011

Modal Confirmation

v1.0.0

A centered overlay that requires the agent to ignore the background page.

Open the modal, type `CONFIRM` inside it, then press `Confirm action`.

Tags: modal focus shift, overlay interaction, text input, confirmation

accordion-reveal-012

Accordion / Expand-to-Reveal

v1.0.0

An input form hidden behind one of several similar accordion sections.

Expand `Credential Rotation`, enter `rotate-88`, then submit the revealed form.

Tags: hidden content discovery, collapsed state recognition, text-based navigation

tabs-target-013

Tabbed Interface

v1.0.0

A tabbed settings shell with one actionable form inside the correct tab.

Switch to the `Notifications` tab, choose `Hourly`, then save the preference.

Tags: tab selection, context switching, sub-view interaction, save confirmation

table-row-action-014

Table / Row Action

v1.0.0

A dense table with repeated row actions and only one valid target row.

Find the row for `Helio Labs` and press its `Mark reviewed` button.

Tags: row targeting, spatial association, repeated action disambiguation

Tier 3: State, Validation, and Recovery

Tasks that intentionally expose waiting, validation, and delayed state transitions.

Error handling, prerequisite gating, loading patience, and state re-checking.

3 tests

validation-recovery-015

Validation Error Recovery

v1.0.0

A recovery-specific task that requires noticing and correcting a validation failure.

Press submit once with the field empty to surface the validation error. Then enter `TANDEM-VALID` and submit again.

Tags: error detection, adaptive correction, state change recognition
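A sketch of the page-side rule, under two assumptions: any invalid submit (including the empty one the task asks for) shows the error, and submitting the correct value without first surfacing the error does not count. Both assumptions and the state names are hypothetical; the task text only fixes the order of actions and the value `TANDEM-VALID`.

```typescript
// Hypothetical model of validation-recovery-015.
interface RecoveryState {
  value: string;
  errorVisible: boolean; // the inline validation message currently on screen
  errorSeen: boolean;    // whether a failed submit has happened at least once
  succeeded: boolean;
}

const EXPECTED_VALUE = "TANDEM-VALID";

function submitRecovery(s: RecoveryState): RecoveryState {
  if (s.value !== EXPECTED_VALUE) {
    return { ...s, errorVisible: true, errorSeen: true }; // empty or wrong value: show the error
  }
  if (!s.errorSeen) return s; // assumed: skipping the failure step is not counted as success
  return { ...s, errorVisible: false, succeeded: true };
}

let form: RecoveryState = { value: "", errorVisible: false, errorSeen: false, succeeded: false };
form = submitRecovery(form);                               // empty submit surfaces the error
console.log(form.errorVisible);                            // true
form = submitRecovery({ ...form, value: EXPECTED_VALUE }); // corrected value, second submit
console.log(form.succeeded, form.errorVisible);            // true false
```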

delayed-enablement-016

Delayed Enablement

v1.0.0

A prerequisite gate where the final button becomes enabled after a short delay.

Check `I understand this is a simulated dry run`, wait for the continue button to enable, then click continue.

Tags: disabled-state recognition, waiting, prerequisite reasoning
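The gate can be modeled with an injected clock so the timing is deterministic. The class, its methods, and the 1500 ms delay are hypothetical; the point it illustrates is that a click issued while the button is still disabled is silently ignored, so the agent has to re-check the enabled state and click again rather than assume the first click registered.

```typescript
// Hypothetical model of delayed-enablement-016. DELAY_MS is assumed; the task only says "a short delay".
const DELAY_MS = 1500;

class DelayedGate {
  private checkedAt: number | null = null;
  private continued = false;

  // Checking the box starts the timer; unchecking cancels it.
  setChecked(checked: boolean, nowMs: number): void {
    this.checkedAt = checked ? nowMs : null;
  }

  isContinueEnabled(nowMs: number): boolean {
    return this.checkedAt !== null && nowMs - this.checkedAt >= DELAY_MS;
  }

  // A click on the disabled button is ignored, so an early click has to be repeated.
  clickContinue(nowMs: number): boolean {
    if (this.isContinueEnabled(nowMs)) this.continued = true;
    return this.continued;
  }
}

const gate = new DelayedGate();
gate.setChecked(true, 0);
console.log(gate.clickContinue(200));  // false: button still disabled
console.log(gate.clickContinue(1600)); // true: enabled after the delay
```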

async-spinner-017

Async Loading / Spinner

v1.0.0

A two-phase async interaction with a visible loading state and post-load action.

Press `Start analysis`, wait for loading to finish, then click `Publish result`.

Tags: loading awareness, patience, state re-checking, follow-up action
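On the agent side, the pattern that avoids both premature and repeated clicks is to choose each action from a fresh observation. The sketch below is a hypothetical observe-decide loop: the phase names, the `POLICY` table, and the simulated page that stays in `loading` for a fixed number of polls are all assumptions made so the example is deterministic.

```typescript
// Hypothetical phases an agent could read off the page for async-spinner-017.
type Phase = "idle" | "loading" | "ready" | "published";
type Action = { kind: "click"; label: string } | { kind: "wait" } | { kind: "done" };

// One action per observed phase; Record<Phase, ...> forces every phase to be handled.
const POLICY: Record<Phase, Action> = {
  idle: { kind: "click", label: "Start analysis" },
  loading: { kind: "wait" }, // spinner visible: click nothing, re-check later
  ready: { kind: "click", label: "Publish result" },
  published: { kind: "done" },
};

// Drive a simulated page whose loading phase lasts `loadingPolls` observations.
function runAgent(loadingPolls: number, maxSteps = 20): string[] {
  let phase: Phase = "idle";
  let remaining = loadingPolls;
  const log: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const action = POLICY[phase]; // decide from the current observation only
    if (action.kind === "done") break;
    if (action.kind === "wait") {
      log.push("wait");
      remaining -= 1;
      if (remaining <= 0) phase = "ready";
      continue;
    }
    log.push(`click ${action.label}`);
    phase = action.label === "Start analysis" ? "loading" : "published";
  }
  return log;
}

console.log(runAgent(2).join(" -> ")); // click Start analysis -> wait -> wait -> click Publish result
```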

Tier 4: Ambiguity and Adversarial Layout

Pages that force the agent to disambiguate repeated controls and local context.

Repeated modules, nearby decoys, full-label reading, and verifying the right target.

1 test

repeated-ui-018

Repeated UI Cards

v1.0.0

Three nearly identical cards, each with the same action button.

In the card labeled `North Annex`, press `Arm sensor`.

Tags: local context understanding, repeated structure disambiguation, button targeting
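Both this page and `table-row-action-014` reward the same approach: resolve the container first (card title or row name), then look for the action inside it, and treat anything other than exactly one match as an error instead of clicking the first lookalike. The flattened `Control` records, the `id` handles, and the other card names are hypothetical.

```typescript
// Hypothetical flattened view an agent might extract: each control paired with its container's label.
interface Control {
  containerLabel: string; // card title (or table row key)
  buttonLabel: string;
  id: string;             // hypothetical handle used to click
}

// Resolve by container first, then by button label; refuse missing or ambiguous matches.
function resolveScoped(controls: Control[], container: string, button: string): Control {
  const matches = controls.filter((c) => c.containerLabel === container && c.buttonLabel === button);
  const match = matches.length === 1 ? matches[0] : undefined;
  if (match === undefined) {
    throw new Error(`expected exactly one "${button}" in "${container}", found ${matches.length}`);
  }
  return match;
}

const cards: Control[] = [
  { containerLabel: "South Annex", buttonLabel: "Arm sensor", id: "card-1-arm" },
  { containerLabel: "North Annex", buttonLabel: "Arm sensor", id: "card-2-arm" },
  { containerLabel: "East Wing", buttonLabel: "Arm sensor", id: "card-3-arm" },
];

console.log(resolveScoped(cards, "North Annex", "Arm sensor").id); // card-2-arm
```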