Tech Project
Online Mind2Web × Agent TARS Testing Script
Tech · 2 GitHub Stars · HuggingFace + Agent TARS
The Brief
Automated testing pipeline that loads Mind2Web tasks from HuggingFace and executes them via Agent TARS for GUI agent evaluation.
Category: Computer Science / Tech
Role: Solo Developer
Year: 2025
Tech Stack: Python, Agent TARS, HuggingFace, Mind2Web
Problem
Evaluating GUI agents at scale requires manually crafting test scenarios and running them one by one, which is slow and error-prone.
Significance
Automated, reproducible GUI agent benchmarking accelerates research iteration on web-task agents like Agent TARS.
My Contribution
Built an end-to-end testing script that loads the Online Mind2Web dataset, formats tasks for Agent TARS, and captures structured results with screenshots. The script supports advanced filtering by difficulty, task length, and website.
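The filtering and formatting steps described above can be sketched in Python. This is a minimal illustration, not the project's actual script. The `Task` fields (`confirmed_task`, `level`, `reference_length`), the prompt template in `format_prompt`, and the JSON result schema are all hypothetical, and the sample tasks are made up. Loading the dataset from HuggingFace and invoking Agent TARS are deliberately left out, so the sketch runs with the standard library alone.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Task:
    # Hypothetical fields, loosely modeled on an Online Mind2Web task record.
    task_id: str
    confirmed_task: str
    website: str
    level: str  # assumed difficulty label, e.g. "easy" | "medium" | "hard"
    reference_length: int  # assumed number of steps in a reference solution


def filter_tasks(
    tasks: Iterable[Task],
    levels: Optional[set[str]] = None,
    max_length: Optional[int] = None,
    website_substr: Optional[str] = None,
) -> list[Task]:
    """Keep tasks that pass every filter that is set (None disables a filter)."""
    selected = []
    for t in tasks:
        if levels is not None and t.level not in levels:
            continue
        if max_length is not None and t.reference_length > max_length:
            continue
        if website_substr is not None and website_substr not in t.website:
            continue
        selected.append(t)
    return selected


def format_prompt(task: Task) -> str:
    """Hypothetical prompt template for the agent; not Agent TARS's own format."""
    return f"Go to {task.website} and complete this task: {task.confirmed_task}"


def result_record(task: Task, success: bool, screenshots: list[str]) -> str:
    """Serialize one run as a single JSON line (hypothetical result schema)."""
    return json.dumps({
        "task_id": task.task_id,
        "prompt": format_prompt(task),
        "success": success,
        "screenshots": screenshots,
    })


# Made-up sample tasks for illustration only.
tasks = [
    Task("t1", "Find the cheapest flight to Boston", "https://flights.example.com", "easy", 5),
    Task("t2", "Compare two laptops by price", "https://shop.example.com", "hard", 14),
]

picked = filter_tasks(tasks, levels={"easy"}, max_length=10)
print([t.task_id for t in picked])  # ['t1']
```

Writing one JSON line per run keeps results append-only and easy to aggregate afterwards, which suits long batch evaluations that may be interrupted.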