On Humanity’s Last Exam, a benchmark of 2,500 questions across 100 subjects, Caesar achieved a record 55.87%.