PatternBoost: Constructions in Mathematics with a Little Help from AI · #Mathematics #Transformers #AI #Paper #PDF · arxiv.org · Nov 6, 2024
STT: Stateful Tracking with Transformers for Autonomous Driving · #AVs #Transformers #Machine Learning #Computer Vision #Paper #PDF · arxiv.org · May 2, 2024
Retrieval Head Mechanistically Explains Long-Context Factuality · #Transformers #Machine Learning #Paper #PDF · arxiv.org · Apr 25, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention · #Transformers #Memory #Paper #PDF · arxiv.org · Apr 13, 2024
Does Transformer Interpretability Transfer to RNNs? · #RNN #Transformers #Large Language Models #Paper #PDF #EleutherAI · arxiv.org · Apr 10, 2024
DSI++: Updating Transformer Memory with New Documents · #Search #Transformers #Paper #PDF #Google #DeepMind · arxiv.org · Jan 29, 2024
Faith and Fate: Limits of Transformers on Compositionality · #Large Language Models #Machine Learning #Compositionality #Paper #PDF #Transformers · arxiv.org · Jun 5, 2023
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM · #Machine Learning #Transformers #Paper #PDF · arxiv.org · Jun 4, 2023
Bytes Are All You Need: Transformers Operating Directly On File Bytes · #Machine Learning #Transformers #Paper #PDF · arxiv.org · Jun 4, 2023
The Impact of Positional Encoding on Length Generalization in Transformers · #Machine Learning #Transformers #Paper #PDF · arxiv.org · Jun 4, 2023
Can Transformers Learn to Solve Problems Recursively? · #Transformers #Problem-Solving #Recursion #Paper #PDF #EleutherAI · arxiv.org · May 27, 2023
RWKV: Reinventing RNNs for the Transformer Era · #Transformers #RNN #Natural Language Processing #Paper #PDF · arxiv.org · May 23, 2023
Scaling Transformer to 1M tokens and beyond with RMT · #Transformers #BERT #Paper #PDF · arxiv.org · Apr 27, 2023
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale · #Transformers #Image Recognition #Google #Paper #PDF · arxiv.org · Apr 27, 2023
Scaling Vision Transformers to 22 Billion Parameters · #Transformers #Computer Vision #Paper #PDF #Google · arxiv.org · Apr 23, 2023
Exphormer: Sparse Transformers for Graphs · #Machine Learning #Transformers #Paper #PDF · arxiv.org · Mar 20, 2023