[Own work] On Measuring Faithfulness or Self-consistency of Natural Language Explanations
Scaling and evaluating sparse autoencoders
Cultural Bias in Explainable AI Research: A Systematic Analysis | Journal of Artificial Intelligence Research
AI agents help explain other AI systems
Diagnosing AI Explanation Methods with Folk Concepts of Behavior | Journal of Artificial Intelligence Research
Explainable Goal-driven Agents and Robots - A Comprehensive Review | ACM Computing Surveys
Progress measures for grokking via mechanistic interpretability
Towards Human-Centered Explainable AI: the journey so far
Interpretable Machine Learning
Does this artificial intelligence think like a human?
Software vendors are pushing "explainable A.I." that often isn't
How well do explanation methods for machine-learning models work?
"Knowledge Creation and its Risks" - David Deutsch on AGI - Centre for the Future of Intelligence
Even experts are too quick to rely on AI explanations, study finds
Best practices for ML product decisions (ML Tech Talks)