Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
✨ #NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning. Combining contrastive learning and message passing markedly improves the features created by embedding graphs, and scales to huge graphs. It taught us a lot about graph feature learning 👇
Graphs can represent knowledge and have grown to huge sizes (115M entities in Wikidata). How do we distill them into good features for downstream machine learning? The challenge is to create feature vectors, and for this graph embeddings have been invaluable.
Our paper shows that message passing is a great tool to build feature vectors from graphs. Unlike contrastive learning, message passing helps embeddings capture the large-scale structure of the graph (it amounts to Arnoldi-type iterations).
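To make the Arnoldi connection concrete, here is a minimal sketch (not the paper's code, and the normalization is our assumption): each message-passing round multiplies the features by the adjacency matrix, expanding the same Krylov subspace that Arnoldi-type methods use to capture a graph's dominant, large-scale spectral structure.

```python
import numpy as np
import scipy.sparse as sp

def message_passing(adj: sp.csr_matrix, X: np.ndarray, n_rounds: int = 5):
    # Symmetric degree normalization (a common choice, assumed here)
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))
    A = sp.diags(d_inv_sqrt) @ adj @ sp.diags(d_inv_sqrt)
    for _ in range(n_rounds):
        # One round: X <- A @ X, exploring span{X, AX, A^2 X, ...}
        X = A @ X
        X /= np.linalg.norm(X, axis=0)  # keep iterates well-scaled
    return X
```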
Our approach uses contrastive learning on a core subset of entities to capture the graph's large-scale structure. Consistent with the knowledge-graph embedding literature, this step represents relations as operators on the embedding space. It also anchors the central entities.
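A hedged sketch of what this step can look like: a TransE-style translation operator (h + r ≈ t) with a margin-based contrastive loss over corrupted triples. The translation operator is purely illustrative; the paper's relational model may differ.

```python
import torch
import torch.nn as nn

class CoreKGE(nn.Module):
    def __init__(self, n_core_entities: int, n_relations: int, dim: int = 128):
        super().__init__()
        self.ent = nn.Embedding(n_core_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, h, r, t):
        # Higher score = more plausible triple (negative distance)
        return -(self.ent(h) + self.rel(r) - self.ent(t)).norm(dim=-1)

def contrastive_loss(model, h, r, t, n_neg: int = 16, margin: float = 1.0):
    pos = model.score(h, r, t)
    # Negatives: corrupt the tail with random core entities
    t_neg = torch.randint(0, model.ent.num_embeddings, (h.numel(), n_neg))
    neg = model.score(h.unsqueeze(1), r.unsqueeze(1), t_neg)
    return torch.relu(margin + neg - pos.unsqueeze(1)).mean()
```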
Knowledge graphs have long-tailed entity distributions, with many weakly connected entities on which contrastive learning is under-constrained. For these, we propagate embeddings via the relation operators, in a diffusion-like step, extrapolating from the central entities.
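An illustrative sketch of such a propagation step, under the same assumed translation operators as above: each not-yet-embedded entity averages the relation-transformed embeddings of its already-embedded neighbors (head-to-tail messages only, for simplicity).

```python
import torch

def propagate(ent_emb, rel_emb, edges, embedded_mask, n_steps: int = 3):
    # edges: LongTensor (n_edges, 3) of (head, relation, tail) ids
    h, r, t = edges.T
    dev = ent_emb.device
    for _ in range(n_steps):
        msg = ent_emb[h] + rel_emb[r]       # operator applied to heads
        agg = torch.zeros_like(ent_emb)
        cnt = torch.zeros(len(ent_emb), 1, device=dev)
        keep = embedded_mask[h]             # only messages from embedded nodes
        agg.index_add_(0, t[keep], msg[keep])
        cnt.index_add_(0, t[keep], torch.ones(int(keep.sum()), 1, device=dev))
        new = ~embedded_mask & (cnt.squeeze(1) > 0)
        ent_emb[new] = agg[new] / cnt[new]  # average incoming messages
        embedded_mask = embedded_mask | new
    return ent_emb, embedded_mask
```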
To have a very efficient algorithm, we split the graph into overlapping, highly connected blocks that fit in GPU memory. Propagation then reduces to simple in-memory iterations, and we embed huge graphs on a single GPU.
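A hedged sketch of the block-wise loop, reusing the `propagate` sketch above: each block's subgraph and embeddings move to the GPU, iterate entirely in memory, and the results are written back. The `blk.nodes` / `blk.edges` / `blk.core_mask` attributes are hypothetical names for illustration.

```python
import torch

def embed_blockwise(blocks, ent_emb, rel_emb, device: str = "cuda"):
    rel_gpu = rel_emb.to(device)
    for blk in blocks:
        # blk.nodes: global entity ids; blk.edges: local (h, r, t) triples
        sub = ent_emb[blk.nodes].to(device)
        mask = blk.core_mask.to(device)   # nodes already embedded
        sub, _ = propagate(sub, rel_gpu, blk.edges.to(device), mask)
        ent_emb[blk.nodes] = sub.cpu()    # write results back
    return ent_emb
```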
Splitting huge knowledge graphs into subparts is actually hard, because of the mix of very highly connected nodes and a huge long tail that is hard to reach. We introduce a procedure that allows overlap between blocks, which greatly relaxes the difficulty.
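One way to see why overlap helps, as an illustrative sketch (not the paper's exact procedure): grow blocks by BFS from high-degree seeds and let nodes belong to several blocks. Hubs then no longer force a clean cut, so balanced blocks become much easier to obtain.

```python
from collections import deque

def overlapping_blocks(adj_list, n_blocks: int, capacity: int):
    # adj_list: dict mapping node -> list of neighbor nodes
    seeds = sorted(adj_list, key=lambda n: len(adj_list[n]), reverse=True)
    blocks = []
    for seed in seeds[:n_blocks]:
        block, frontier = {seed}, deque([seed])
        while frontier and len(block) < capacity:
            node = frontier.popleft()
            for nb in adj_list[node]:
                if nb not in block:  # nodes may also join other blocks
                    block.add(nb)
                    frontier.append(nb)
                if len(block) >= capacity:
                    break
        blocks.append(block)
    return blocks
```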
Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs. It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable. Larger knowledge graphs yield feature vectors with more downstream value.
We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility. We believe this is because link prediction only needs local structure, unlike downstream tasks.
The paper is fully reproducible, and we hope it will unleash more progress in knowledge graph embedding.
We'll present at #NeurIPS2025 and #EurIPS