Rohan Paul on X: "GPT 5 Rumored Benchmark through Copilot. SimpleBench is a roughly 200-question multiple-choice benchmark that targets spatio-temporal, social, and adversarial reasoning. A 90% score is quite insane here, as it represents a human-level common-sense reasoning equivalent. https://t.co/X01PC3CDwC"
·x.com·