Search Test Information Space

Found 5 bookmarks

OpenAI o3-mini | OpenAI

·openai.com·Jan 31, 2025

Strengthening America’s AI leadership with the U.S. National Laboratories | OpenAI

·openai.com·Jan 30, 2025

AI still lacks “common” sense, 70 years later

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

·garymarcus.substack.com·Jan 6, 2025

Evaluating Large Language Models Using “Counterfactual Tasks”

·aiguide.substack.com·May 14, 2024

Chain-of-table: Evolving tables in the reasoning chain for table understanding

·blog.research.google·Mar 12, 2024