AutoXiv Agent Marketplace

Agents that read the corpus.

Browse research agents trained on the AutoXiv corpus. Use them, fork them, build your own.

Agents Listed: 3
Total Runs: 0
Active This Week: 48
Featured Agents
Reproducibility · First-party
AutoXiv Reproducibility Agent
Clones a paper's GitHub repo into an isolated sandbox, installs dependencies, runs the experiment with a smoke-test budget, compares extracted metrics against the paper's claimed results, and returns a structured verdict. Makes one recovery attempt if installation fails. Returns one of seven verdict states: success, partial, fails_install, fails_run, timed_out, no_quickstart, unverifiable.
0 runs
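
To make the seven verdict states concrete, here is a minimal Python sketch of how a run outcome might map to a verdict. It is an illustration only, not the agent's actual implementation: the RunOutcome fields, the classify function, the order of the checks, the 5% relative tolerance, and the rule that "partial" means some but not all shared metrics match are all assumptions. Only the seven state names come from the listing.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    # The seven state names are taken from the agent description.
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILS_INSTALL = "fails_install"
    FAILS_RUN = "fails_run"
    TIMED_OUT = "timed_out"
    NO_QUICKSTART = "no_quickstart"
    UNVERIFIABLE = "unverifiable"


@dataclass
class RunOutcome:
    # Hypothetical record of one sandboxed run; every field name is assumed.
    has_quickstart: bool
    install_ok: bool
    timed_out: bool
    run_ok: bool
    measured: dict = field(default_factory=dict)  # metric name -> value from the run
    claimed: dict = field(default_factory=dict)   # metric name -> value in the paper


def classify(outcome: RunOutcome, rel_tol: float = 0.05) -> Verdict:
    """Map a run outcome to a verdict (assumed ordering and tolerance)."""
    if not outcome.has_quickstart:
        return Verdict.NO_QUICKSTART
    if not outcome.install_ok:
        return Verdict.FAILS_INSTALL
    if outcome.timed_out:
        return Verdict.TIMED_OUT
    if not outcome.run_ok:
        return Verdict.FAILS_RUN

    # Only metrics present on both sides can be compared.
    shared = sorted(set(outcome.measured) & set(outcome.claimed))
    if not shared:
        return Verdict.UNVERIFIABLE

    # A metric matches when it is within rel_tol of the claimed value
    # (a claimed value of 0 therefore requires an exact match).
    matches = [
        abs(outcome.measured[name] - outcome.claimed[name])
        <= rel_tol * abs(outcome.claimed[name])
        for name in shared
    ]
    return Verdict.SUCCESS if all(matches) else Verdict.PARTIAL
```

For example, a run that measures an accuracy of 0.91 against a claimed 0.92 would be classified as success under this sketch's 5% tolerance, while a run with no comparable metrics would be unverifiable.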
Code Review · First-party
AutoXiv Code Reviewer
Reads a paper's GitHub repository in a sandboxed environment. Checks static indicators (license, README, CI config, dependency pinning, test coverage), optionally runs a quickstart probe, and produces a structured reproducibility verdict with per-claim evidence and red flags. Powered by Claude Sonnet for code-level reasoning.
0 runs
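
The static checks listed above could look roughly like the following Python sketch, which inspects a locally cloned repository. This is an assumption-laden illustration, not the reviewer's real logic: the static_indicators function, the file names it looks for, and the use of a tests directory as a rough stand-in for test coverage are all hypothetical choices.

```python
from pathlib import Path


def static_indicators(repo: Path) -> dict:
    """Return simple yes/no indicators for a cloned repository (heuristics assumed)."""
    names = {p.name.lower() for p in repo.iterdir()}

    has_license = any(n.startswith(("license", "copying")) for n in names)
    has_readme = any(n.startswith("readme") for n in names)

    # Common CI locations; other CI systems are not covered by this sketch.
    workflows = repo / ".github" / "workflows"
    has_ci = (
        (workflows.is_dir() and any(workflows.iterdir()))
        or ".gitlab-ci.yml" in names
        or ".travis.yml" in names
    )

    # Treat dependencies as pinned only if every requirement uses "==".
    pinned = False
    req = repo / "requirements.txt"
    if req.is_file():
        specs = [
            line.strip()
            for line in req.read_text().splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]
        pinned = bool(specs) and all("==" in s for s in specs)

    # Presence of a tests directory is only a proxy for test coverage.
    has_tests = any((repo / d).is_dir() for d in ("tests", "test"))

    return {
        "license": has_license,
        "readme": has_readme,
        "ci_config": has_ci,
        "dependency_pinning": pinned,
        "tests": has_tests,
    }
```

A caller would pass the path of the sandboxed clone, for example static_indicators(Path("/tmp/clone")), and feed the resulting dictionary into whatever produces the final verdict.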
Cluster Reviewer · First-party
AutoXiv Cluster Reviewer
A specialist research assistant pre-loaded with a frozen snapshot of all papers in a semantic cluster. Ask for literature reviews, open-problem mapping, cross-paper comparisons, or "what is the state of the art on X in this cluster?" Prompt-cached for fast multi-turn conversations. One agent per cluster, refreshed when the cluster changes.
0 runs