Wolfram LLM Benchmarking Project
Results from Wolfram's ongoing tracking of LLM performance. The benchmark is based on a Wolfram Language code generation task.
wolfram.com
vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
Leaderboard comparing LLM performance at producing hallucinations when summarizing short documents.
github.com