This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been in 2023. To conclude an eventful year in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read. My personal focus has been more on large language models (LLMs), so you'll find a heavier emphasis on LLM papers than computer vision papers.
Explore the intriguing history of Eliza, a pioneering chatbot, and learn how to implement a basic version in Go, unraveling the roots of conversational AI.
AI/ML has seen a rapid acceleration in model improvement over the last few years. The majority of the state-of-the-art models in the field are based on the Transformer architecture. Examples include BERT (which, when applied to Google Search, resulted in what Google calls "one of the biggest leaps forward in the history of Search") and OpenAI's GPT-2 and GPT-3 (which are able to generate coherent text and essays).
This video by the author of the popular "Illustrated Transformer" guide will introduce the Transformer architecture and its various applications. This is a visual presentation accessible to people with various levels of ML experience.
Intro (0:00)
The Architecture of the Transformer (4:18)
Model Training (7:11)
Transformer LM Component 1: FFNN (10:01)
Transformer LM Component 2: Self-Attention (12:27)
Tokenization: Words to Token Ids (14:59)
Embedding: Breathe meaning into tokens (19:42)
Projecting the Output: Turning Computation into Language (24:11)
Final Note: Visualizing Probabilities (25:51)
The Illustrated Transformer:
https://jalammar.github.io/illustrated-transformer/
Simple transformer language model notebook:
https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/Simple_Transformer_Language_Model.ipynb
Philosophers On GPT-3 (updated with replies by GPT-3):
https://dailynous.com/2020/07/30/philosophers-gpt-3/
-----
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
More videos by Jay:
Jay's Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works - Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ
Discussions:
Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments)
Translations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese
Watch: MIT’s Deep Learning State of the Art lecture referencing this post
Featured in courses at Stanford, Harvard, MIT, Princeton, CMU and others
In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. In fact, Google Cloud recommends The Transformer as a reference model for its Cloud TPU offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with a PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand for people without in-depth knowledge of the subject matter.
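To make the parallelization point concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described in Attention is All You Need. It is an illustration under stated assumptions, not the Tensor2Tensor or Harvard NLP code: the function names, the tiny dimensions, and the random projection matrices (which would be learned in a real model) are all chosen here for the example. It covers a single attention head and omits masking and the rest of the architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a whole sequence.

    x:             (seq_len, d_model) input vectors, one row per token.
    w_q, w_k, w_v: (d_model, d_k) projection matrices (hypothetical, random here).
    Returns the attended outputs and the attention weights.
    """
    q = x @ w_q  # queries for every position at once
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.shape[-1]
    # One matrix product scores every position against every other position.
    scores = q @ k.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ v, weights


# Toy sizes picked only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)           # (4, 8)
print(weights.sum(axis=-1))   # each row sums to 1, up to floating-point error
```

No output row depends on another output row, only on the shared inputs. A recurrent model, by contrast, has to process positions one after another, which is why this formulation maps well onto hardware built for large matrix multiplications.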
2020 Update: I’ve created a “Narrated Transformer” video which is a gentler approach to the topic:
A High-Level Look
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Nᴇᴜᴛʀᴀʟ.Nᴇᴡs is a tool that takes any news article URL and processes its content to remove potential biases, emotionally charged language, and other subjective elements. The end result is a set of unbiased, neutral summarized points that represent the core information of the article.
What I Learned Using Private LLMs to Write an Undergraduate History Essay
TL;DR · Context · Writing A 1996 Essay Again in 2023, This Time With Lots More Transistors · ChatGPT 3 · Gathering the Sources · PrivateGPT · Ollama (and Llama2:70b) · Hallucinations · What I Learned
TL;DR I used …
jasonjmcghee/rem: An open source approach to locally record and enable searching everything you view on your Apple Silicon.
An open source approach to locally record and enable searching everything you view on your Apple Silicon.