Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
Our first attempts at mechanistic interpretability of Transformers from the perspective of network science and graph theory! Check out our preprint: arxiv.org/abs/2502.12352
A wonderful collaboration with superstar MPhil students Batu El and Deepro Choudhury, as well as Pietro Lio', as part of the Geometric Deep Learning class last year at the University of Cambridge Department of Computer Science and Technology
We were motivated by Demis Hassabis calling AlphaFold and other AI systems for scientific discovery ‘engineering artifacts’. We need new tools to interpret their underlying mechanisms and advance our scientific understanding. Graph Transformers are a good place to start.
The key ideas are:
- Attention across multiple heads and layers can be seen as a heterogeneous, dynamically evolving graph.
- Attention graphs are complex systems that represent information flow in Transformers.
- We can use network science to extract mechanistic insights from them (toy sketch below)!
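To make the idea concrete, here is a minimal sketch (not the paper's actual pipeline): it assumes you already have per-layer, per-head attention matrices as a NumPy array, aggregates them into one directed graph by simple averaging (one of several possible aggregation choices), and applies an off-the-shelf network-science measure. The tensor shapes and random weights are purely illustrative.

```python
import numpy as np
import networkx as nx

# Illustrative stand-in: attention for 2 layers x 2 heads over 5 tokens.
# In practice these would come from a trained (Graph) Transformer.
rng = np.random.default_rng(0)
n_layers, n_heads, n_tokens = 2, 2, 5
attn = rng.random((n_layers, n_heads, n_tokens, n_tokens))
attn /= attn.sum(axis=-1, keepdims=True)  # row-normalise, like softmax

# Collapse heads and layers into a single directed "attention graph"
# by averaging -- an assumption for this sketch, not the only option.
combined = attn.mean(axis=(0, 1))
G = nx.from_numpy_array(combined, create_using=nx.DiGraph)

# Network-science lens: PageRank as a rough proxy for which tokens
# accumulate attention mass, i.e. dominate information flow.
print(nx.pagerank(G, weight="weight"))
```

Swapping in other graph measures (centralities, community detection, flow-based statistics) is where the network-science toolbox really opens up.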
More to come on the network science perspective on understanding LLMs next!