AutoXiv
Coverage, Not Averages: Semantic Stratification for Trustworthy Retrieval Evaluation