Found 2 bookmarks
Claude
Claude
(Claude 3.7 Sonnet looks at TIS. Nothing can see everything, so this relies on self-reporting. It is winging it here and there.)
·claude.ai·
Claude
Alex on X: "Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of… https://t.co/m7wWhhu6Fg" / X
·twitter.com·