WENYUFAN
Tech Project

Online Mind2Web × Agent TARS Testing Script

Tech · 2 GitHub Stars · HuggingFace + Agent TARS

The Brief

Automated testing pipeline that loads Mind2Web tasks from HuggingFace and executes them via Agent TARS for GUI agent evaluation.

Category: Computer Science / Tech
Role: Solo Developer
Year: 2025
Tech Stack: Python, Agent TARS, HuggingFace, Mind2Web

Problem

Evaluating GUI agents at scale usually means crafting test scenarios by hand and running them one by one, which is slow and error-prone.

Significance

Automated, reproducible GUI agent benchmarking accelerates research iteration on web-task agents like Agent TARS.

My Contribution

Built an end-to-end testing script that loads the Online Mind2Web dataset, formats tasks for Agent TARS, and captures structured results with screenshots. The script supports filtering tasks by difficulty, length, and website.
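
The filtering and formatting steps might look roughly like the sketch below. It is illustrative only and not taken from the project: the `Task` fields, the `filter_tasks` and `to_prompt` helpers, and the prompt wording are all hypothetical, and the sample tasks are made up rather than drawn from the dataset. Loading from HuggingFace and invoking Agent TARS are left out so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; field names are assumptions, not the dataset schema."""
    task_id: str
    description: str
    website: str
    difficulty: str  # assumed labels such as "easy", "medium", "hard"
    length: int      # assumed to be the number of reference steps


def filter_tasks(
    tasks: Iterable[Task],
    difficulty: Optional[str] = None,
    max_length: Optional[int] = None,
    website: Optional[str] = None,
) -> List[Task]:
    """Keep tasks matching every filter that is set; a filter left as None is ignored."""
    selected = []
    for task in tasks:
        if difficulty is not None and task.difficulty != difficulty:
            continue
        if max_length is not None and task.length > max_length:
            continue
        # Substring match, so "example.org" matches "https://example.org/hotels".
        if website is not None and website not in task.website:
            continue
        selected.append(task)
    return selected


def to_prompt(task: Task) -> str:
    """Turn a task into a plain-text instruction (the wording is an assumption)."""
    return f"Open {task.website} and complete this task: {task.description}"


# Made-up sample tasks for illustration only.
SAMPLE_TASKS = [
    Task("t1", "Find a one-way flight", "https://example.com", "easy", 3),
    Task("t2", "Book a hotel for two nights", "https://example.org", "hard", 12),
]

if __name__ == "__main__":
    for task in filter_tasks(SAMPLE_TASKS, difficulty="easy", max_length=5):
        print(task.task_id, "->", to_prompt(task))
```

Keeping each filter optional lets a run combine difficulty, length, and website constraints freely, and leaving all of them unset selects the whole task list.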