Introducing Llama 3.1: Our most capable models to date
Bringing open intelligence to all, our latest models expand context length, add support across eight languages, and include Meta Llama 3.1 405B, the first frontier-level open source AI model.
RoBERTa: An optimized method for pretraining self-supervised NLP systems
Facebook AI’s RoBERTa is a new training recipe that improves on BERT, Google’s self-supervised method for pretraining natural language processing systems. By training longer, on more data, and dropping BERT’s next-sentence prediction objective, RoBERTa topped the GLUE leaderboard.
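RoBERTa’s key change to the objective is masked language modelling only, with the mask re-sampled each time a batch is drawn (dynamic masking) rather than fixed at preprocessing time. A minimal PyTorch sketch of that idea; token IDs, masking probability, and tensor sizes below are illustrative, not the exact fairseq implementation:

```python
import torch

MASK_ID, PAD_ID, VOCAB_SIZE = 50264, 1, 50265    # RoBERTa vocabulary conventions

def dynamic_mask(input_ids, mlm_prob=0.15):
    """Re-sample the MLM mask every time a batch is drawn (dynamic masking)."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mlm_prob)
    probs[input_ids == PAD_ID] = 0.0
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100                        # only masked positions contribute to the loss
    # BERT/RoBERTa recipe: 80% become <mask>, 10% a random token, 10% are left unchanged
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[to_mask] = MASK_ID
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~to_mask
    input_ids[to_random] = torch.randint(VOCAB_SIZE, input_ids.shape)[to_random]
    return input_ids, labels

batch = torch.randint(4, VOCAB_SIZE, (8, 128))    # stand-in token IDs
masked_batch, labels = dynamic_mask(batch)        # a different mask on every call
```

There is no second sentence-pair input and no next-sentence prediction head: the loss is computed only over the masked positions.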
LG’s hyperscale AI EXAONE 2.0 to be launched for drug development this year
LG AI Research has unveiled EXAONE 2.0, a hyperscale artificial intelligence (AI) language model that can be used for expert applications in the development of new materials or medicines. During LG’s AI Talk Concert event...
Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language...
Meta Open-Sources 175 Billion Parameter AI Language Model OPT
Meta AI Research released Open Pre-trained Transformer (OPT-175B), a 175B-parameter AI language model. The model was trained on a dataset containing 180B tokens and exhibits performance comparable to GPT-3, while requiring only 1/7th of GPT-3's training carbon footprint.
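The released checkpoints can be loaded with standard tooling; a minimal sketch using Hugging Face Transformers with the smaller facebook/opt-1.3b checkpoint from the same release (the 175B weights themselves require a research access request), shown for illustration only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small public OPT checkpoint stands in for the 175B model here.
name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```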
In this article, we'll explore the architecture and mechanisms behind Google’s T5 Transformer model, from the unified text-to-text framework to a comparison of T5's results.
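The text-to-text framing means every task is cast as feeding in a task-prefixed string and generating a string back out, with no task-specific heads. A minimal sketch using the public t5-small checkpoint and Hugging Face Transformers; the prefixes follow the convention from the T5 paper:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks differ only in the text prefix; the model itself is unchanged.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: The T5 model casts every NLP problem as mapping input text to output text.",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```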
GLaM, the Generalist Language Model, is a mixture of experts (MoE) model, a type of model that can be thought of as having different submodels (or experts)...
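A toy illustration of the mixture-of-experts idea: a small gating network scores the experts for each token, and only the top-scoring few (top-2 in GLaM's case) are evaluated and combined. Layer sizes and expert count below are made up for illustration; this is not GLaM's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy top-2 mixture-of-experts feed-forward layer (dimensions are illustrative)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)    # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)     # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # only the chosen experts run per token
            for e, expert in enumerate(self.experts):
                sel = idx[:, k] == e
                if sel.any():
                    out[sel] += weights[sel, k : k + 1] * expert(x[sel])
        return out

tokens = torch.randn(16, 64)
print(TinyMoELayer()(tokens).shape)                  # torch.Size([16, 64])
```

The point of the design is that total parameter count grows with the number of experts while each token only pays for the few experts it is routed to.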
Switch Transformers by Google Brain
Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. In deep learning, models typically reuse the same parameters for all inputs. Mi...
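Where a dense layer applies the same parameters to every input, the Switch layer routes each token to exactly one expert (top-1 routing), so per-token compute stays roughly constant even as the expert count, and with it the parameter count, grows. A toy sketch of that routing, with made-up sizes rather than the paper's Mesh-TensorFlow implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts, n_tokens = 32, 4, 10             # illustrative sizes
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
router = nn.Linear(d_model, n_experts)

x = torch.randn(n_tokens, d_model)
probs = F.softmax(router(x), dim=-1)                  # router probabilities per token
gate, choice = probs.max(dim=-1)                      # top-1: a single expert per token

y = torch.zeros_like(x)
for e in range(n_experts):
    sel = choice == e
    if sel.any():                                     # only the selected tokens hit expert e
        y[sel] = gate[sel].unsqueeze(-1) * experts[e](x[sel])

print(y.shape)  # torch.Size([10, 32])
```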
A GPT-3 rival by DeepMind. Researchers at DeepMind have proposed a compute-optimal model called Chinchilla that uses the same compute budget...
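The claim can be sanity-checked with the common rough approximation that training cost is about 6 x parameters x tokens FLOPs: Chinchilla (70B parameters, ~1.4T training tokens) lands in roughly the same compute budget as Gopher (280B parameters, ~300B tokens) while seeing far more data. A back-of-the-envelope check, with the 6ND rule taken as an approximation rather than an exact cost model:

```python
# Rough training-compute estimate, C ~ 6 * N * D FLOPs (N parameters, D training tokens).
def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

gopher = train_flops(280e9, 300e9)        # ~5.0e23 FLOPs
chinchilla = train_flops(70e9, 1.4e12)    # ~5.9e23 FLOPs, a comparable budget

print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")
```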