Search AI/ML

Found 2 bookmarks

The Ars Technica AI coding agent test: Minesweeper edition - Ars Technica

How do four modern LLMs do at recreating a simple Windows gaming classic?

#code #agent #benchmark

·arstechnica.com·Dec 23, 2025

R1+Sonnet set SOTA on aider’s polyglot benchmark

R1+Sonnet has set a new SOTA on the aider polyglot benchmark, at 14X lower cost than o1.

#agent #code #benchmark

·aider.chat·Mar 3, 2025
