Graph Learning Will Lose Relevance Due To Poor Benchmarks
💣 Our spicy ICML 2025 position paper: "Graph Learning Will Lose Relevance Due To Poor Benchmarks".
Graph learning is less trendy in the ML world than it was in 2020-2022. We believe poor benchmarks are holding the field back - and we suggest ways to fix that!
We identified three problems:
#️⃣ P1: No transformative real-world applications - while LLMs and geometric generative models grow more powerful with every generation and solve increasingly complex tasks (from reasoning to protein folding), how transformative can a GNN trained on Cora or OGB really be?
P1 Remedies: The community is overlooking many significant and transformative applications, including chip design and broader ML for systems, combinatorial optimization, and relational data (as highlighted by RelBench). Each of these offers billions of dollars in potential impact.
#️⃣ P2: While everything can be modeled as a graph, it often should not be. In a simple experiment, we probed a vanilla DeepSet with no edges at all and a GNN on Cayley graphs (a fixed edge template for a given number of nodes) on molecular datasets, and both are quite competitive - see the sketch below.
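To make the P2 experiment concrete, here is a minimal sketch (not the paper's code) of the two edge-ablation baselines, assuming PyTorch and PyTorch Geometric; the fixed Cayley-graph wiring is assumed to be built elsewhere and passed in as `template_edge_index`.

```python
# Minimal sketch: probe how much molecular property prediction actually relies
# on the bond structure by (a) ignoring edges entirely (DeepSets-style) and
# (b) replacing them with a fixed template graph (e.g., a Cayley-graph wiring).
import torch
from torch import nn
from torch_geometric.nn import GCNConv, global_add_pool


class DeepSetBaseline(nn.Module):
    """Permutation-invariant baseline that never looks at edge_index."""

    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x, batch):
        # Encode each node independently, then sum-pool per graph.
        return self.rho(global_add_pool(self.phi(x), batch))


class TemplateGNN(nn.Module):
    """GNN that runs message passing on a fixed template graph, not the bonds."""

    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x, template_edge_index, batch):
        h = self.conv1(x, template_edge_index).relu()
        h = self.conv2(h, template_edge_index).relu()
        return self.head(global_add_pool(h, batch))
```

If these edge-agnostic baselines match a GNN run on the real molecular bonds, the graph structure is not doing much work on that benchmark.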
#️⃣ P3: Bad benchmarking culture (this one hits hard) - it's a mess :)
Small datasets (don't use Cora and MUTAG in 2025), no standard splits, and, in many cases, recent models that are clearly worse than the GCN / GraphSAGE baselines of years past. It gets even worse when evaluating generative models.
Remedies for P3: We need more holistic benchmarks that are harder to game and saturate - while this is a common problem across all of ML, standard graph learning benchmarks are egregiously old and largely irrelevant at the scale of problems that are feasible in 2025. A sketch of a saner evaluation protocol follows below.
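As an illustration of the benchmarking-culture fix (my assumption of reasonable practice, not code from the paper): run every model on the same published split, over several seeds, and report mean ± std alongside strong classic baselines such as GCN and GraphSAGE. The `model_factories`, `train_fn`, and `eval_fn` arguments below are placeholders for whatever models and training loops a benchmark provides.

```python
# Minimal sketch of a fair evaluation protocol: fixed splits, multiple seeds,
# aggregate statistics instead of a single lucky run.
import numpy as np
import torch


def benchmark(model_factories, split_idx, train_fn, eval_fn, seeds=(0, 1, 2, 3, 4)):
    """model_factories: dict mapping a name to a callable that builds a fresh model.
    train_fn(model, train_idx, valid_idx) trains the model in place;
    eval_fn(model, test_idx) returns a scalar test score."""
    results = {}
    for name, make_model in model_factories.items():
        scores = []
        for seed in seeds:
            torch.manual_seed(seed)                       # control initialization
            model = make_model()                          # fresh model per seed
            train_fn(model, split_idx["train"], split_idx["valid"])
            scores.append(eval_fn(model, split_idx["test"]))
        results[name] = (float(np.mean(scores)), float(np.std(scores)))
    return results  # report mean ± std over seeds for every model, old and new
```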
💡 As a result, it is hard to build a true foundation model for graphs. Instead of training each model on each dataset from scratch, we suggest using GNNs / graph transformers (GTs) as processors in the "encoder-processor-decoder" blueprint: train the processor at scale and only tune graph-specific encoders/decoders - see the sketch after the example below.
For example, we pre-trained several models on PCQM4M-v2, COCO-SP, and MalNet Tiny, and fine-tuned them on PascalVOC, Peptides-struct, and Stargazers, finding that graph transformers benefit from pre-training.
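A minimal sketch of the encoder-processor-decoder idea, assuming PyTorch (an illustration, not the paper's training code): the `encoder`, `processor`, and `decoder` modules are placeholders, where the processor stands for the pre-trained GNN / graph transformer, and fine-tuning simply freezes it while leaving the graph-specific encoder/decoder trainable.

```python
# Minimal sketch: a shared, pre-trained "processor" backbone reused across
# datasets with small dataset-specific encoders and decoders.
import torch
from torch import nn


class EncodeProcessDecode(nn.Module):
    def __init__(self, encoder, processor, decoder):
        super().__init__()
        self.encoder = encoder      # maps raw node/edge features into a shared hidden space
        self.processor = processor  # pre-trained GNN / graph transformer backbone
        self.decoder = decoder      # task-specific prediction head

    def forward(self, x, edge_index, batch):
        h = self.encoder(x)
        h = self.processor(h, edge_index, batch)
        return self.decoder(h)


def finetune_optimizer(model, lr=1e-4):
    # Freeze the shared processor; only the new encoder/decoder receive gradients.
    for p in model.processor.parameters():
        p.requires_grad = False
    tunable = list(model.encoder.parameters()) + list(model.decoder.parameters())
    return torch.optim.Adam(tunable, lr=lr)
```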
---
The project started around NeurIPS 2024, when Christopher Morris gathered us to discuss the pain points of graph learning and how to keep doing impactful research in this area. I believe the outcome is promising, and we can re-imagine graph learning in 2025 and beyond!
Massive work with 12 authors (everybody actually contributed): Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Müller, Jan Tönshoff, Antoine Siraudin, Viktor Zaverkin, Michael Bronstein, Mathias Niepert, Bryan Perozzi, and Christopher Morris (Chris, you should finally create a LinkedIn account ;)