Testing platform for AI models. Gain control over AI risks with a holistic testing platform covering quality, security, and compliance for all AI models, from tabular ML models to LLMs.
An open-source research project developed by members of LMSYS and UC Berkeley SkyLab. Our mission is to build an open, crowdsourced platform to collect human feedback and evaluate LLMs in real-world scenarios.
Open and responsible development of LLMs for code. BigCode is an open scientific collaboration working on the responsible development of large language models for code.
WandB is a central dashboard for keeping track of your hyperparameters, system metrics, and predictions, so you can compare models live and share your findings.
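As a rough illustration, a training script typically initializes a run, records its config, and logs metrics as it goes; the project name and metric names below are placeholders, not part of any particular setup.

```python
import wandb

# Minimal sketch: "my-demo-project" and the logged metric names are illustrative.
run = wandb.init(project="my-demo-project", config={"lr": 1e-3, "epochs": 3})

for epoch in range(run.config["epochs"]):
    train_loss = 1.0 / (epoch + 1)  # stand-in for a real training metric
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()  # flush and close the run; the dashboard updates live as metrics are logged
```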
SuperGLUE is a new benchmark styled after the original GLUE benchmark, with a set of more difficult language understanding tasks, improved resources, and a new public leaderboard.
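The individual SuperGLUE tasks are also mirrored on the Hugging Face Hub; a minimal sketch of loading one of them (BoolQ), assuming the `datasets` library is installed and still serves this dataset:

```python
from datasets import load_dataset

# Minimal sketch: loads the BoolQ task of SuperGLUE from the Hugging Face Hub.
boolq = load_dataset("super_glue", "boolq")

print(boolq)               # train / validation / test splits with their sizes
print(boolq["train"][0])   # a single example with `question`, `passage`, and `label`
```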