The Ars Technica AI coding agent test: Minesweeper edition - Ars Technica
How do four modern LLMs do at recreating a simple Windows gaming classic?
#code #agent #benchmark · arstechnica.com · Dec 23, 2025
R1+Sonnet set SOTA on aider’s polyglot benchmark
R1+Sonnet has set a new SOTA on the aider polyglot benchmark, at 14X less cost than o1.
#agent #code #benchmark · aider.chat · Mar 3, 2025