Language models can explain neurons in language models · #Interpretability #Large Language Models #Paper #PDF · openaipublic.blob.core.windows.net · May 27, 2023
Progress measures for grokking via mechanistic interpretability · #Machine Learning #Interpretability #Paper #PDF #Explainability #Deep Learning · arxiv.org · Feb 5, 2023