Search AI/ML

Found 3 bookmarks

Tencent-Hunyuan/AutoCodeBenchmark

Contribute to Tencent-Hunyuan/AutoCodeBenchmark development by creating an account on GitHub.

·github.com·Dec 25, 2025

The Ars Technica AI coding agent test: Minesweeper edition - Ars Technica

How do four modern LLMs do at recreating a simple Windows gaming classic?

·arstechnica.com·Dec 23, 2025

R1+Sonnet set SOTA on aider’s polyglot benchmark

R1+Sonnet has set a new SOTA on the aider polyglot benchmark, at 14X less cost compared to o1.

·aider.chat·Mar 3, 2025