Bay.Area.AI: DSPy: Prompt Optimization for LM Programs, Michael Ryan
ai.bythebay.io, Nov 2025, Oakland, full-stack AI conference. DSPy: Prompt Optimization for LM Programs. Michael Ryan, Stanford.

It has never been easier to build amazing LLM-powered applications. Unfortunately, engineering reliable and trustworthy LLMs remains challenging. Instead, practitioners should build LM Programs composed of several composable calls to LLMs, which can be rigorously tested, audited, and optimized like other software systems. In this talk I will introduce the idea of LM Programs in DSPy, the library for programming, not prompting, LMs. I will demonstrate how the LM Program abstraction allows the creation of automatic optimizers that can optimize both the prompts and the weights in an LM Program. I will conclude with an introduction to MIPROv2, our latest and highest-performing prompt optimization algorithm for LM Programs.

Michael Ryan is a master's student at Stanford University working on optimization for Language Model Programs in DSPy and on personalizing language models. His work has been recognized with a Best Social Impact award at ACL 2024 and an honorable mention for outstanding paper at ACL 2023. Michael co-led the creation of the MIPRO and MIPROv2 optimizers, DSPy's most performant optimizers for Language Model Programs. His prior work has showcased unintended cultural and global biases expressed in popular LLMs. He is currently a research intern at Snowflake.
·youtube.com·
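For reference, a minimal sketch of what the LM Program plus MIPROv2 workflow described in the talk might look like in DSPy. The model name, metric, and one-example training set are placeholder assumptions, not taken from the talk; check the DSPy docs for current optimizer arguments.

```python
import dspy

# Configure the underlying LM (model name is an assumption).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature declares input/output behaviour instead of a hand-written prompt.
class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# An LM Program: a composable module built around LM calls.
program = dspy.ChainOfThought(AnswerQuestion)

# A metric for the optimizer to maximize (exact match, purely illustrative).
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Placeholder training set; a real one would have many more examples.
trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

# MIPROv2 searches over instructions and few-shot demos for the program's modules.
optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)
```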
LLM as a Judge
Learn what LLM as a Judge is, how it works, its benefits, challenges, and best practices for automated evaluations in AI applications.
·programmatic-website.vercel.app·
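The pattern in a nutshell, as a minimal sketch: a separate "judge" model scores a candidate answer against a rubric so grading can be automated. The OpenAI client, model name, and 1-to-5 rubric here are my own assumptions, not taken from the linked article.

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a strict evaluator.
Question: {question}
Candidate answer: {answer}
Score the answer from 1 (wrong) to 5 (fully correct). Reply with the number only."""

def judge(question: str, answer: str) -> int:
    # Ask the judge model for a single numeric score.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

print(judge("What is the capital of France?", "Paris"))
```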
Pydantic Evals
A brand-new package from the Pydantic AI team which directly tackles what I consider to be the single hardest problem in AI engineering: building evals to determine if your LLM-based system is working correctly and getting better over time.
·simonwillison.net·
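A rough sketch of the workflow as I understand it from the docs: declare Cases, attach evaluators, then run the system under test over the Dataset. The package is young and its API is evolving, so verify the class and method names against the current Pydantic Evals documentation.

```python
from dataclasses import dataclass

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext

@dataclass
class ExactMatch(Evaluator):
    # Custom evaluator: full score only when output equals the expected output.
    def evaluate(self, ctx: EvaluatorContext[str, str]) -> float:
        return 1.0 if ctx.output == ctx.expected_output else 0.0

dataset = Dataset(
    cases=[
        Case(
            name="capital_of_france",
            inputs="What is the capital of France?",
            expected_output="Paris",
        )
    ],
    evaluators=[ExactMatch()],
)

async def answer(question: str) -> str:
    # Stand-in for the LLM-based system under test.
    return "Paris"

report = dataset.evaluate_sync(answer)
report.print(include_input=True, include_output=True)
```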
yet-another-applied-llm-benchmark
Nicholas Carlini introduced this personal LLM benchmark suite [back in February](https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html) as a collection of over 100 automated tests he runs against new LLM models to evaluate their performance against …
·simonwillison.net·
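The repo defines its own dataflow DSL for these tests; the sketch below is only a generic, self-contained illustration of the same idea (prompt the model for code, extract it, run it, assert on the output), with my own helper names and an assumed model, not Carlini's actual API.

```python
import re
import subprocess

from openai import OpenAI

client = OpenAI()

def llm_run(prompt: str) -> str:
    # Ask the model under test to write a program.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model under test
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def extract_code(text: str) -> str:
    # Pull the first fenced code block out of the reply, falling back to the raw text.
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

def python_run(code: str) -> str:
    # Execute the generated program and capture its stdout.
    proc = subprocess.run(["python", "-c", code], capture_output=True, text=True, timeout=30)
    return proc.stdout

# One automated "test": generate -> extract -> execute -> substring check.
reply = llm_run('Write a Python program that prints "hello world".')
assert "hello world" in python_run(extract_code(reply)).lower()
```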
Introduction | Ragas
Ragas is a framework that helps you evaluate your Retrieval-Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying your pipeline's performance can be hard. This is where Ragas (RAG Assessment) comes in.
·docs.ragas.io·
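A small sketch of the classic Ragas evaluation loop: collect question / answer / retrieved-contexts / ground-truth rows, then score them with built-in metrics. The column names and metric choices follow older Ragas examples and may differ in newer releases, so treat them as assumptions.

```python
from datasets import Dataset  # Hugging Face datasets
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# One evaluation row from a hypothetical RAG pipeline.
rows = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and most populous city of France."]],
    "ground_truth": ["Paris"],
}

# Score the pipeline output; each metric itself calls an LLM behind the scenes.
result = evaluate(Dataset.from_dict(rows), metrics=[faithfulness, answer_relevancy])
print(result)
```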