Tencent-Hunyuan/AutoCodeBenchmark
Contribute to Tencent-Hunyuan/AutoCodeBenchmark development by creating an account on GitHub.
#benchmark #code · github.com · Dec 25, 2025
The Ars Technica AI coding agent test: Minesweeper edition - Ars Technica
How do four modern LLMs do at recreating a simple Windows gaming classic?
#code #agent #benchmark · arstechnica.com · Dec 23, 2025
R1+Sonnet set SOTA on aider’s polyglot benchmark
R1+Sonnet has set a new SOTA on the aider polyglot benchmark. At 14X less cost compared to o1.
#agent #code #benchmark · aider.chat · Mar 3, 2025