Deterministic Browser Harness

Computer Vision Tests

Each page starts in a known state, exposes a visible task contract, and reveals a persistent completion code only after true success. The suite is tuned for debugging where an agent gets stuck, not for simulating a full product flow.

Total tests: 24
Buckets: 4
Success contract: In-page only

Global Contract

  • Visible task instruction, `TASK_ID`, and `TASK_VERSION` on every page.
  • Exactly one intended success path with deterministic reset behavior.
  • Persistent in-page success block with hidden-until-success completion code.
  • No redirect-based success and no dependency on other tests.
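The contract above can be expressed as a small checker. The sketch below is illustrative only: `PageSnapshot`, its fields, and `contract_violations` are hypothetical names, not part of the harness, and the snapshot is assumed to be filled in by whatever tooling inspects the page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageSnapshot:
    """Hypothetical view of what one test page shows at a given moment."""
    task_id: str
    task_version: str
    instruction: str
    success_visible: bool
    completion_code: Optional[str]  # None while the code is still hidden
    redirected: bool = False        # True if the URL changed on success


def contract_violations(snapshot: PageSnapshot) -> List[str]:
    """Return the global-contract rules that this snapshot breaks."""
    problems = []
    if not snapshot.instruction.strip():
        problems.append("missing task instruction")
    if not snapshot.task_id or not snapshot.task_version:
        problems.append("missing TASK_ID or TASK_VERSION")
    if snapshot.completion_code is not None and not snapshot.success_visible:
        problems.append("completion code exposed before success")
    if snapshot.success_visible and snapshot.completion_code is None:
        problems.append("success block shown without a completion code")
    if snapshot.redirected:
        problems.append("success must stay in-page, without a redirect")
    return problems
```

An empty list means the snapshot is consistent with the contract. The check covers only what is visible in one snapshot; reset behavior and independence from other tests would need separate checks.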

Suite Coverage

  • Inputs, selects, radios, checkboxes, buttons, Enter-to-submit, and textareas.
  • Scroll, wizards, modals, accordions, tabs, and table row actions.
  • Validation recovery, delayed enablement, async loading, and repeated UI.
  • Event logs, machine-readable outcomes, and reset controls on each test page.

Field targeting, typing, clicking, simple submit flows, and explicit success checks.

Tier 1: Atomic Vision + Action

Focused pages that isolate a single visible control or interaction pattern.

11 tests

form-single-input-001

Single Input Form

v1.0.0

One text field, one correct value, one success path.

Enter `AURORA-17` in the Access code field, then submit the form.

field targeting, typing, submit, success verification
Completion code hidden until success

form-multi-field-002

Multi-Field Basic Form

v1.0.0

Three required fields with inline validation and one optional decoy field.

Fill First name with `Nora`, Last name with `Stone`, Email with `nora.stone@gettandem.app`, then submit.

label mapping, input sequencing, validation awareness, multi-field submit
Completion code hidden until success

dropdown-basic-003

Dropdown / Select

v1.0.0

A native select with decoy controls nearby and one valid option.

Choose `Orion Ops` from the Team assignment dropdown, then submit.

dropdown targeting, option selection, submit, state confirmation
Completion code hidden until success

radio-basic-004

Radio Button Group

v1.0.0

A simple radio group with similar-looking labels.

Select the deployment window labeled `06:30 PM`, then submit.

label association, small target precision, grouped choice selection
Completion code hidden until success

checkbox-basic-005

Checkbox Checklist

v1.0.0

Two required checkboxes plus a select-all trap and a disabled decoy.

Check only `I have reviewed the release notes` and `I confirm the staging snapshot is current`, then submit.

multi-select behavior, state detection, precision, trap avoidance
Completion code hidden until success

button-target-006

Button Recognition

v1.0.0

A primary action surrounded by similar-sized decoy buttons.

Press the button labeled `Finalize Snapshot`.

button recognition, decoy avoidance, CTA parsing
Completion code hidden until success

enter-submit-007

Enter-to-Submit

v1.0.0

A keyboard-only submission path with no clickable submit button.

Type `delta search ready` in the search field and submit with Enter.

keyboard submission, typing, submit detection, hang prevention
Completion code hidden until success

textarea-basic-008

Textarea Entry

v1.0.0

A multiline input task that checks textarea targeting and formatting.

Enter the exact three-line note shown in the reference block into the textarea, then submit.

textarea recognition, multi-line typing, submit, content verification
Completion code hidden until success

slider-drag-019

Slider Drag

v1.0.0

A single slider that must be moved to an exact numeric value before submit.

Drag the slider until the current value is `72`, then submit.

drag interaction, value targeting, precision, submit
Completion code hidden until success

autocomplete-combobox-020

Autocomplete / Combobox

v1.0.0

A filtered suggestion list where typed text alone does not complete the task.

Type `or`, select `Orion Chen` from the suggestions, then submit.

autocomplete, combobox targeting, suggestion selection, state verification
Completion code hidden until success

file-upload-022

File Upload

v1.0.0

A native file input that accepts any local file and requires visible filename confirmation.

Upload any local file, confirm the filename appears, then submit.

file chooser, upload state detection, filename verification, submit
Completion code hidden until success

Scroll management, wizards, modals, accordions, tabs, and dense row actions.

Tier 2: Structural Interaction

Layout-heavy tasks where the agent has to navigate sections, overlays, or repeated UI.

9 tests

scroll-long-form-009

Long Scroll Form

v1.0.0

A deliberately tall page that separates the final submit area from the first fields.

Fill Project name with `Northwind`, Owner with `Ava`, Rollout zone with `Central`, then scroll to the bottom and submit.

viewport management, scrolling, state persistence, long-form completion
Completion code hidden until success

wizard-steps-010

Multi-Step Wizard

v1.0.0

A three-step flow where only the last step reveals success.

Step 1: enter `BETA-9`. Step 2: choose `Manual Review`. Step 3: check the final confirmation and submit.

step progression, state persistence, progress recognition, final-step submit
Completion code hidden until success

modal-confirm-011

Modal Confirmation

v1.0.0

A centered overlay that requires the agent to ignore the background page.

Open the modal, type `CONFIRM` inside it, then press `Confirm action`.

modal focus shift, overlay interaction, text input, confirmation
Completion code hidden until success

accordion-reveal-012

Accordion / Expand-to-Reveal

v1.0.0

A hidden input flow behind one of several similar accordion sections.

Expand `Credential Rotation`, enter `rotate-88`, then submit the revealed form.

hidden content discovery, collapsed state recognition, text-based navigation
Completion code hidden until success

tabs-target-013

Tabbed Interface

v1.0.0

A tabbed settings shell with one actionable form inside the correct tab.

Switch to the `Notifications` tab, choose `Hourly`, then save the preference.

tab selection, context switching, sub-view interaction, save confirmation
Completion code hidden until success

table-row-action-014

Table / Row Action

v1.0.0

A dense table with repeated row actions and only one valid target row.

Find the row for `Helio Labs` and press its `Mark reviewed` button.

row targeting, spatial association, repeated action disambiguation
Completion code hidden until success

date-picker-021

Date Picker

v1.0.0

A popover calendar that starts one month before the required date.

Open the date picker, move to April 2026, select `April 22, 2026`, then save.

popover interaction, calendar navigation, date targeting, save confirmation
Completion code hidden until success

filtered-table-action-023

Search + Filter + Row Action

v1.0.0

A table that must be filtered before the correct repeated row action can succeed.

Search for `Mira`, then click `Approve access` for the `Mira Vault` row.

search filtering, row targeting, repeated action disambiguation, state verification
Completion code hidden until success

drag-drop-reorder-024

Drag and Drop Reorder

v1.0.0

A reorderable list where the final item order must be verified before submit.

Drag `Compliance review` so it sits directly above `Release notes`, then submit.

drag and drop, spatial ordering, state verification, submit
Completion code hidden until success
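For the reorder task, "directly above" is an adjacency condition on the final list order. The following is a minimal sketch of that check; the helper names and the extra list items are hypothetical, and only `Compliance review` and `Release notes` come from the task text.

```python
from typing import List


def sits_directly_above(items: List[str], upper: str, lower: str) -> bool:
    """True when `upper` is the item immediately before `lower`."""
    if upper not in items or lower not in items:
        return False
    return items.index(lower) - items.index(upper) == 1


def move_directly_above(items: List[str], moved: str, target: str) -> List[str]:
    """Return a new list with `moved` placed immediately before `target`."""
    remaining = [item for item in items if item != moved]
    position = remaining.index(target)
    return remaining[:position] + [moved] + remaining[position:]


# Hypothetical starting order; the real page's list may differ.
start = ["Release notes", "Placeholder item A", "Compliance review", "Placeholder item B"]
final = move_directly_above(start, "Compliance review", "Release notes")
```

Checking adjacency after the drag, rather than assuming the drop landed where intended, matches the state-verification step the task asks for before submitting.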

Error handling, prerequisite gating, loading patience, and state re-checking.

Tier 3: State, Validation, and Recovery

Tasks that intentionally expose waiting, validation, and delayed state transitions.

3 tests

validation-recovery-015

Validation Error Recovery

v1.0.0

A recovery-specific task that requires noticing and correcting a validation failure.

Press submit once with the field empty to surface the validation error. Then enter `TANDEM-VALID` and submit again.

error detection, adaptive correction, state change recognition
Completion code hidden until success
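The recovery flow in validation-recovery-015 can be modeled as a small state machine. This is a sketch under one stated assumption: that success counts only after the empty-field error has been surfaced at least once. The page may or may not enforce that ordering, and the class and attribute names are hypothetical.

```python
class ValidationRecoveryModel:
    """Hypothetical model of the submit, error, correct, resubmit flow."""

    EXPECTED = "TANDEM-VALID"  # value taken from the task instruction

    def __init__(self) -> None:
        self.error_visible = False
        self.error_surfaced = False  # has the empty-field error ever appeared?
        self.success = False

    def submit(self, value: str) -> str:
        """Apply one submit and return the resulting state name."""
        if self.success:
            return "success"  # success is persistent once reached
        if value.strip() == "":
            self.error_visible = True
            self.error_surfaced = True
            return "error"
        if value != self.EXPECTED:
            self.error_visible = True
            return "error"
        if not self.error_surfaced:
            # Assumed rule: the correct value alone skips the recovery step.
            return "pending"
        self.error_visible = False
        self.success = True
        return "success"
```

Calling `submit("")` and then `submit("TANDEM-VALID")` walks the intended path; the model makes explicit that the error state has to be observed and then cleared.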

delayed-enablement-016

Delayed Enablement

v1.0.0

A prerequisite gate where the final button becomes enabled after a short delay.

Check `I understand this is a simulated dry run`, wait for the continue button to enable, then click continue.

disabled-state recognition, waiting, prerequisite reasoning
Completion code hidden until success
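Waiting for a control to become enabled is usually handled by polling with a deadline rather than a fixed sleep. The helper below is a generic sketch, not harness code: `wait_until` and its parameters are hypothetical, and the predicate stands in for whatever check reports that the continue button is enabled.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it is true or the timeout passes.

    Returns True as soon as the predicate holds, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested
    without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass something like `lambda: button_is_enabled()` (a hypothetical check) and click only when the function returns True. The same helper fits the loading phase in async-spinner-017, where the predicate would test that the spinner is gone.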

async-spinner-017

Async Loading / Spinner

v1.0.0

A two-phase async interaction with a visible loading state and post-load action.

Press `Start analysis`, wait for loading to finish, then click `Publish result`.

loading awareness, patience, state re-checking, follow-up action
Completion code hidden until success

Repeated modules, nearby decoys, full-label reading, and verifying the right target.

Tier 4: Ambiguity and Adversarial Layout

Pages that force the agent to disambiguate repeated controls and local context.

1 test

repeated-ui-018

Repeated UI Cards

v1.0.0

Three nearly identical cards, each with the same action button.

In the card labeled `North Annex`, press `Arm sensor`.

local context understanding, repeated structure disambiguation, button targeting
Completion code hidden until success
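Disambiguating repeated controls comes down to scoping the lookup: find the card by its label first, then look for the button inside that card only. The sketch below illustrates the idea on plain data. `Card`, `locate_button`, and the two extra card titles are hypothetical; only `North Annex` and `Arm sensor` come from the task text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Card:
    """Hypothetical stand-in for one repeated card on the page."""
    title: str
    buttons: Tuple[str, ...]


def locate_button(cards: List[Card], card_title: str, button_label: str) -> Tuple[int, int]:
    """Return (card index, button index) for the one matching button.

    Raises LookupError when the card or button is missing or ambiguous,
    so a caller never clicks a guess.
    """
    card_hits = [i for i, card in enumerate(cards) if card.title == card_title]
    if len(card_hits) != 1:
        raise LookupError(f"expected one card titled {card_title!r}, found {len(card_hits)}")
    card_index = card_hits[0]
    buttons = cards[card_index].buttons
    button_hits = [j for j, label in enumerate(buttons) if label == button_label]
    if len(button_hits) != 1:
        raise LookupError(f"expected one {button_label!r} button, found {len(button_hits)}")
    return card_index, button_hits[0]


# Three near-identical cards; the first and last titles are placeholders.
cards = [
    Card("Placeholder Annex A", ("Arm sensor",)),
    Card("North Annex", ("Arm sensor",)),
    Card("Placeholder Annex B", ("Arm sensor",)),
]
target = locate_button(cards, "North Annex", "Arm sensor")
```

Searching for `Arm sensor` across the whole page would return three candidates; scoping by card title first reduces that to one.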