The Sequence Radar #534: The Leaderboard Illusion: The Paper that Challenges Arena-Based AI Evaluations