AI agents help explain other AI systems · #XAI #Explainability #Interpretability #Agents · news.mit.edu · Jan 8, 2024
Language models can explain neurons in language models · #Interpretability #Large Language Models #Paper #PDF · openaipublic.blob.core.windows.net · May 27, 2023
Unveiling the Mysteries of AI Neurons: How OpenAI's GPT-4 Automatically Writes and Scores Explanations for GPT-2 Neuron Behavior · #Interpretability #GPT-4 #GPT-2 · marktechpost.com · May 27, 2023
Interpretability Dreams · #Anthropic #Neural Networks #Interpretability · transformer-circuits.pub · May 25, 2023
Progress measures for grokking via mechanistic interpretability · #Machine Learning #Interpretability #Paper #PDF #Explainability #Deep Learning · arxiv.org · Feb 5, 2023
#92 - SARA HOOKER - Fairness, Interpretability, Language Models · #Cohere #Ethics #Research #Fairness #Interpretability #Large Language Models · youtube.com · Dec 26, 2022
Will You Find These Shortcuts? · #Machine Learning #Model #Integrity #Google Research #Interpretability · ai.googleblog.com · Dec 11, 2022