LMSYS Chatbot Arena: Live and Community-Driven LLM Evaluation | LMSYS Org

AI/ML
apple/OpenELM · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
GitHub - truefoundry/cognita: RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Aviv Regev: The Revolution in Digital Biology
What happens when big advances in life science converge with A.I.
So you want to Scrape like the Big Boys? 🚀
What it really takes to scrape without getting detected.
(Another) New AI Biopharma Company
Snowflake Arctic Cookbook
Today's big model release was Snowflake Arctic, an enormous 480B model with a 128×3.66B MoE (Mixture of Experts) architecture. It's Apache 2 licensed and Snowflake state that "in addition, we …
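The headline figure is easy to sanity-check from the numbers in the description: 128 experts at 3.66B parameters each puts the MoE layers alone at roughly 468.5B, with the remainder of the ~480B presumably in shared components. A quick back-of-the-envelope in Python:

```python
# Sanity-check of Arctic's parameter count using only the figures
# quoted above (128 experts × 3.66B each); the split between MoE and
# shared/dense parameters is an inference, not a confirmed breakdown.
num_experts = 128
params_per_expert_b = 3.66  # billions

moe_params_b = num_experts * params_per_expert_b
print(f"MoE layers: ~{moe_params_b:.1f}B parameters")  # ~468.5B
print(f"Remainder of ~480B total: ~{480 - moe_params_b:.1f}B")
```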
What can LLMs never do?
On goal drift and lower reliability. Or, why can't LLMs play Conway's Game Of Life?
James Grimmelmann (@jtlg@mastodon.lawprofs.org)
Something exceptionally grim is happening on the Internet.
In the last few months, the constant flood of algorithmically generated junk content has kicked into an AI-powered overdrive, and it is cutting a swath of destruction as it overwhelms search engines, filters, and moderation systems.
Call it Gresham's Law 2.0: bad content drives out good.
I'm starting this thread to document it, because there is a *lot* happening all at once.
#greshamslaw20
The unspoken obnoxiousness of Google's Gemini improvements
Google's Gemini chatbot is seeing all sorts of upgrades on Android this week, but those advancements reveal a darker underlying reality.
Cleft Notes - Turn Voice Memos Into Shared Notes
Cleft AI turns messy ideas into organized content, instantly
The Rise of Large-Language-Model Optimization - Schneier on Security
GitHub - timpaul/form-extractor-prototype
This tool extracts the structure from an image of a form.
It uses the Claude 3 LLM from Anthropic.
A single extraction of an A4 form page costs about 10p.
It replicates the form structure in JSON, following the schema used by GOV.UK Forms.
It then uses that to generate a multi-page web form in the GOV.UK style.
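The extraction step described above can be sketched as a single call to the Anthropic Messages API: send the form image as a base64 content block alongside a prompt asking for structured JSON. This is a hypothetical illustration, not the repo's actual code; the model ID and the example JSON shape are assumptions (the real tool follows the GOV.UK Forms schema).

```python
import base64
import json

def build_extraction_request(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build an Anthropic Messages API payload asking Claude 3 to
    describe a scanned form page as structured JSON.

    The model ID and the JSON shape in the prompt are illustrative
    assumptions, not taken from the timpaul/form-extractor-prototype repo.
    """
    return {
        "model": "claude-3-opus-20240229",
        "max_tokens": 2048,
        "messages": [{
            "role": "user",
            "content": [
                # The form image travels as a base64-encoded content block.
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode()}},
                # The text prompt pins down the output structure.
                {"type": "text",
                 "text": "Extract every question on this form as JSON, e.g. "
                         '{"pages": [{"questions": [{"label": "...", "type": "..."}]}]}'},
            ],
        }],
    }

# Stand-in bytes; a real call would read the scanned A4 page from disk.
payload = build_extraction_request(b"\x89PNG fake image data")
print(json.dumps(payload)[:80])
```

The returned JSON would then drive a template that renders each question as a page of a GOV.UK-style multi-page form.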
GitHub - apple/corenet: CoreNet: A library for training deep neural networks
CoreNet: A library for training deep neural networks - apple/corenet
WHY AI Works - YouTube
Bertrand Serlet's thoughts on WHY LLMs, and AI in general, work so well nowadays.
LLMs and the Harry Potter problem
Large language models may have big context windows, but they still aren't good enough at using the information in big contexts, especially in high value use-cases.
openelm/README-pretraining.md
Apple released something big three hours ago, and I'm still trying to get my head around exactly what it is. The parent project is called CoreNet, described as "A library …
GitHub - haizelabs/llama3-jailbreak: A trivial programmatic Llama 3 jailbreak. Sorry Zuck!
A trivial programmatic Llama 3 jailbreak. Sorry Zuck! - haizelabs/llama3-jailbreak
Doug McIlroy and Bing Copilot
A knockout.
Stepify
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
By far the most detailed paper on prompt injection I've seen yet from OpenAI, published a few days ago and with six credited authors: Eric Wallace, Kai Xiao, Reimar Leike, …
A quote from Phi-3 Technical Report
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models …
The Illustrated Word2vec
“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern of its leaves.
We try to copy these patterns in our lives and our society, seeking the rhythms, the dances, the forms that comfort. Yet, it is possible to see peril in the finding of ultimate perfection. It is clear that the ultimate pattern contains its own fixity. In such perfection, all things move toward death.”
~ Dune (1965)
I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea, which has become central to Natural Language Processing models. Embeddings in neural models have developed considerably over the last couple of decades (recent developments include contextualized word embeddings, leading to cutting-edge models like BERT and GPT-2).
Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?
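The personality-vector idea can be made concrete with a few lines of Python: represent each person as a five-number vector and compare people with cosine similarity, the same measure used for word embeddings. The scores below are made up for illustration.

```python
import math

# Each person as a vector of five numbers (think Big Five personality
# scores scaled to [-1, 1]). The values here are invented examples.
jay     = [-0.4, 0.8, 0.5, -0.2, 0.3]
person1 = [-0.3, 0.2, 0.3, -0.4, 0.9]
person2 = [-0.5, 0.7, 0.4, -0.1, 0.2]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means
    pointing the same way, 0 means unrelated, -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# person2's scores track jay's more closely, so the similarity is higher.
print(f"jay vs person1: {cosine_similarity(jay, person1):.3f}")
print(f"jay vs person2: {cosine_similarity(jay, person2):.3f}")
```

Word2vec applies exactly this trick to words: learn a vector per word such that similar words end up with high cosine similarity.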
GPT-4 can exploit real vulnerabilities by reading advisories
While some other LLMs appear to flat-out suck
Inducing Unprompted Misalignment in LLMs — LessWrong
Emergent Instrumental Reasoning Without Explicit Goals TL;DR: LLMs can act and scheme without being told to do so. This is bad. …
AI for Data Journalism: demonstrating what we can do with this stuff right now
I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the …
Limitless
Go beyond your mind’s limitations: Personalized AI powered by what you’ve seen, said, and heard.
Stuff we figured out about AI in 2023
2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of …
OpenAI Begins Tackling ChatGPT Data Leak Vulnerability · Embrace The Red
Good news. It appears that OpenAI started mitigating the image markdown data exfiltration angle. It remains vulnerable, but it's great to see a few first actions being taken to mitigate the problem.