Found 3 bookmarks
yet-another-applied-llm-benchmark
Nicholas Carlini introduced this personal LLM benchmark suite [back in February](https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html) as a collection of over 100 automated tests he runs against new LLMs to evaluate their performance against …
·simonwillison.net·
Introduction | Ragas
Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. Existing tools and frameworks help you build these pipelines, but evaluating them and quantifying their performance can be hard. This is where Ragas (RAG Assessment) comes in.
·docs.ragas.io·