Search Test Information Space

Found 8 bookmarks

2309

#Interpretability #Agents #Explainability #Paper #PDF

·arxiv.org·Jan 8, 2024

AI agents help explain other AI systems

#XAI #Explainability #Interpretability #Agents

·news.mit.edu·Jan 8, 2024

Language models can explain neurons in language models

#Interpretability #Large Language Models #Paper #PDF

·openaipublic.blob.core.windows.net·May 27, 2023

Unveiling the Mysteries of AI Neurons: How OpenAI's GPT-4 Automatically Writes and Scores Explanations for GPT-2 Neuron Behavior

#Interpretability #GPT-4 #GPT-2

·marktechpost.com·May 27, 2023

Interpretability Dreams

#Anthropic #Neural Networks #Interpretability

·transformer-circuits.pub·May 25, 2023

Progress measures for grokking via mechanistic interpretability

#Machine Learning #Interpretability #Paper #PDF #Explainability #Deep Learning

·arxiv.org·Feb 5, 2023

#92 - SARA HOOKER - Fairness, Interpretability, Language Models

#Cohere #Ethics #Research #Fairness #Interpretability #Large Language Models

·youtube.com·Dec 26, 2022

Will You Find These Shortcuts?

#Machine Learning #Model #Integrity #Google Research #Interpretability

·ai.googleblog.com·Dec 11, 2022
