29/Nov/2023 - Google-proof, ultra-high-ceiling AI tests (BASIS, GAIA, GPQA) - LifeArchitect.ai LIVE
Supporting benchmarks for AI safety with MLCommons
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
New Records for the Biggest and Smallest AI Computers
[Own work] VALSE 💃: Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Is AI Training Outstripping Moore’s Law?