DafnyBench: A Benchmark for Formal Software Verification · #AI #Verification #Paper #PDF #Benchmark #Software Engineering #Machine Learning #Programming Languages · arxiv.org · Jun 14, 2024
29/Nov/2023 - Google-proof, ultra-high-ceiling AI tests (BASIS, GAIA, GPQA) - LifeArchitect.ai LIVE · #AI #Benchmark · youtube.com · Nov 29, 2023
Supporting benchmarks for AI safety with MLCommons · #AI #Benchmark #Safety #Google · blog.research.google · Oct 27, 2023