Found 24 bookmarks
How we sped up transformer inference 100x for 🤗 API customers
We're on a journey to advance and democratize artificial intelligence through open source and open science.
Tokenization is often a bottleneck for efficiency during inference. We use the most efficient methods from the 🤗 Tokenizers library, leveraging the Rust implementation of the model tokenizer in combination with smart caching to get up to a 10x speedup in overall latency.
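The excerpt doesn't show code, so here is a minimal sketch of the idea, assuming the 🤗 Transformers fast (Rust-backed) tokenizer; the lru_cache wrapper merely stands in for the "smart caching" the post mentions and is not the actual implementation.

```python
# Minimal sketch: Rust-backed fast tokenizer plus a naive cache for repeated inputs.
# The cache (functools.lru_cache) is only an illustration of "smart caching",
# not the implementation described in the blog post.
from functools import lru_cache

from transformers import AutoTokenizer

# use_fast=True selects the Rust implementation from the 🤗 Tokenizers library.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

@lru_cache(maxsize=10_000)
def encode_cached(text: str) -> tuple[int, ...]:
    # Tuples are hashable, so identical requests skip tokenization entirely.
    return tuple(tokenizer.encode(text))

ids = encode_cached("How we sped up transformer inference 100x")
```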
Once the compute platform has been selected for the use case, we can go to work. Here are some CPU-specific techniques that can be applied with a static graph: optimizing the graph (removing unused flow), fusing layers (with specific CPU instructions), and quantizing the operations.
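The post names the techniques but not the tooling; the sketch below uses ONNX Runtime as one possible way to apply graph optimization, layer fusion, and quantization on CPU, and assumes a model that has already been exported to "model.onnx".

```python
# Sketch of the three CPU techniques using ONNX Runtime as one possible toolchain
# (the post does not name a specific library); assumes the transformer was already
# exported to ONNX as "model.onnx".
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

# 1) + 2) Graph optimization: dead-node elimination and operator/layer fusion.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# 3) Quantization: convert weights to int8 to use faster CPU integer instructions.
quantize_dynamic("model.onnx", "model-int8.onnx", weight_type=QuantType.QInt8)

session = ort.InferenceSession("model-int8.onnx", sess_options=opts,
                               providers=["CPUExecutionProvider"])
```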
huggingface.co
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/ysymyth/tree-of-thought-llm.
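As a rough illustration of the loop described in the abstract, here is a minimal breadth-first sketch; propose_thoughts and score_thought are hypothetical stand-ins for the LLM calls, and the authors' actual prompts and search variants live in the linked repository.

```python
# Minimal breadth-first sketch of the Tree of Thoughts loop: propose several
# candidate "thoughts" per state, let the model score them, and keep only the
# most promising partial solutions for the next step.
# propose_thoughts() and score_thought() are hypothetical stand-ins for LLM calls;
# see https://github.com/ysymyth/tree-of-thought-llm for the authors' code.
from typing import Callable, List

def tree_of_thoughts(
    problem: str,
    propose_thoughts: Callable[[str, str], List[str]],  # (problem, partial solution) -> candidate thoughts
    score_thought: Callable[[str, str], float],          # (problem, partial solution) -> self-evaluated value
    steps: int = 3,
    breadth: int = 5,
) -> str:
    frontier = [""]  # partial solutions, each a concatenation of thoughts
    for _ in range(steps):
        candidates = [
            state + thought + "\n"
            for state in frontier
            for thought in propose_thoughts(problem, state)
        ]
        # Self-evaluation prunes the tree: keep the `breadth` best partial solutions.
        candidates.sort(key=lambda s: score_thought(problem, s), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]
```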
arxiv.org
Emergent and Predictable Memorization in Large Language Models
Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models. In particular, it is vital to minimize a model's memorization of sensitive datapoints such as those containing personal identifiable information (PII). The prevalence of such undesirable memorization can pose issues for model trainers, and may even require discarding an otherwise functional model. We therefore seek to predict which sequences will be memorized before a large model's full train-time by extrapolating the memorization behavior of lower-compute trial runs. We measure memorization of the Pythia model suite, and find that intermediate checkpoints are better predictors of a model's memorization behavior than smaller fully-trained models. We additionally provide further novel discoveries on the distribution of memorization scores across models and data.
The paper "Emergent and Predictable Memorization in Large Language Models" by Stella Biderman et al. studies the problem of memorization in large language models and proposes a method to predict which sequences will be memorized before full training of the model, based on extrapolation of memorization behavior from lower-compute trial runs, and provides novel insights on the distribution of memorization scores across models and data. Key insights and lessons learned from the paper: Memorization is a key concern for deploying large language models safely, particularly for sensitive datapoints such as PII. Intermediate checkpoints are better predictors of memorization behavior than smaller fully-trained models. Memorization scores follow a power-law distribution across models and data, with some datapoints being more prone to memorization than others. Fine-tuning can mitigate memorization to some extent, but not completely.
arxiv.org
Recurrent Memory Transformer
Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations, and the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information and, with the help of recurrence, to pass information between segments of a long sequence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence; the model is then trained to control both memory operations and sequence representation processing. Experiments show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
The paper "Recurrent Memory Transformer" proposes a memory-augmented segment-level recurrent Transformer (RMT) model that stores and processes global and local information by adding memory tokens to the input or output sequence, and shows that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it for longer sequence processing tasks. Key insights and lessons learned: The self-attention mechanism in Transformer-based models has quadratic computational complexity for long sequences and limits the amount of global and local information that can be stored and processed. Adding memory tokens to the input or output sequence of a Transformer-based model allows for memory-augmentation and the storage and processing of global and local information, as well as the passing of information between segments of long sequences with the help of recurrence. The proposed RMT model performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it for longer sequence processing tasks. The RMT model can be applied to a wide range of tasks and domains, including natural language processing and image recognition.
arxiv.org
The Model That Changes Everything: Alpaca Breakthrough (ft. Apple's LLM, BritGPT, Ernie and AlexaTM)
8 years of cost reduction in 5 weeks: how Stanford's Alpaca model changes everything, including the economics of OpenAI and GPT-4. The breakthrough, using self-instruct, has big implications for Apple's secret large language model, Baidu's ErnieBot, Amazon's attempts, and even governmental efforts like the newly announced BritGPT. I will go through how Stanford put the model together, why it costs so little, and demonstrate it in action versus ChatGPT and GPT-4. And what are the implications of short-circuiting human annotation like this? With analysis of a tweet by Eliezer Yudkowsky, I delve into the workings of the model and the questions it raises.
Web Demo: https://alpaca-ai0.ngrok.io/
Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html
Ark Forecast: https://research.ark-invest.com/hubfs/1_Download_Files_ARK-Invest/Big_Ideas/ARK%20Invest_013123_Presentation_Big%20Ideas%202023_Final.pdf
Eliezer Tweet: https://twitter.com/ESYudkowsky/status/1635577836525469697 https://twitter.com/ESYudkowsky/status/1635667349792780288
Self-Instruct: https://arxiv.org/pdf/2212.10560.pdf
InstructGPT: https://openai.com/research/instruction-following
OpenAI Terms: https://openai.com/policies/terms-of-use
MMLU Test: https://arxiv.org/pdf/2009.03300.pdf
Apple LLM: https://www.nytimes.com/2023/03/15/technology/siri-alexa-google-assistant-artificial-intelligence.html
GPT 4 API: https://openai.com/pricing
Llama Models: https://arxiv.org/pdf/2302.13971.pdf
BritGPT: https://www.theguardian.com/technology/2023/mar/15/uk-to-invest-900m-in-supercomputer-in-bid-to-build-own-britgpt
Amazon: https://www.businessinsider.com/amazons-ceo-andy-jassy-on-chat-cpt-ai-2023-2?r=US&IR=T
AlexaTM: https://arxiv.org/pdf/2208.01448.pdf
Baidu Ernie: https://www.nytimes.com/2023/03/16/world/asia/china-baidu-chatgpt-ernie.html
PaLM API: https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html
https://www.patreon.com/AIExplained
youtube.com
Introducing LLaMA: A foundational, 65-billion-parameter language model
Today, we're releasing our LLaMA (Large Language Model Meta AI) foundational model with a gated release. LLaMA is more efficient and competitive with previously published models of a similar size on existing benchmarks.
ai.facebook.com