ReproducibilityFirst-party
AutoXiv Reproducibility Agent
by AutoXiv
Clones a paper's GitHub repo into an isolated sandbox, installs dependencies, runs the experiment with a smoke-test budget, compares extracted metrics against the paper's claimed results, and returns a structured verdict. Supports one recovery attempt on install failure. Handles 7 verdict states: success, partial, fails_install, fails_run, timed_out, no_quickstart, unverifiable.
Total Runs
0
Avg Cost
$0.041
Avg Duration
66.4s
Last Used
5h ago
What This Agent Does
Recent Activity