Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Scaling GraphRAG to Millions of Documents: Lessons from the SIGIR 2025 LiveRAG Challenge
👉 WHY THIS MATTERS
Retrieval-augmented generation (RAG) struggles with multi-hop questions that require connecting information across documents. While graph-based RAG methods like GEAR improve reasoning by structuring knowledge as entity-relationship triples, scaling these approaches to web-sized datasets of millions or even billions of documents remains a bottleneck. The culprit? Traditional methods rely heavily on LLMs to extract triples: a process too slow and expensive for large corpora.
👉 WHAT THEY DID
Researchers from Huawei and the University of Edinburgh reimagined GEAR to sidestep costly offline triple extraction.
Their solution:
- Pseudo-alignment: Link retrieved passages to existing triples in Wikidata via sparse retrieval (a BM25-based sketch follows this list).
- Iterative expansion: Use a lightweight LLM (Falcon-3B-Instruct) to iteratively rewrite queries and retrieve additional evidence through Wikidata’s graph structure.
- Multi-step filtering: Combine Reciprocal Rank Fusion (RRF) with prompt-based filtering to reconcile noisy alignments between Wikidata and document content (an RRF sketch also follows below).
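
To make the pseudo-alignment idea concrete, here is a minimal sketch of how one might link a retrieved passage to Wikidata-style triples with sparse retrieval. The toy triple store, the verbalization format, and the top-k cutoff are illustrative assumptions, not the paper's actual index or scoring setup; the `rank_bm25` package stands in for whatever sparse retriever the authors used.

```python
# Minimal sketch of pseudo-alignment: score Wikidata-style triples
# against a passage with BM25 and keep the best matches.
from rank_bm25 import BM25Okapi

# Toy (subject, relation, object) triples; a real system would index
# a large slice of Wikidata.
triples = [
    ("Panopea generosa", "taxon rank", "species"),
    ("Panopea generosa", "parent taxon", "Panopea"),
    ("Oyster", "subclass of", "bivalve mollusc"),
]

def verbalize(triple):
    """Flatten a triple into a string so BM25 can score it."""
    return " ".join(triple)

corpus = [verbalize(t).lower().split() for t in triples]
bm25 = BM25Okapi(corpus)

def align_passage(passage: str, k: int = 2):
    """Return the k triples whose verbalization best matches the passage."""
    scores = bm25.get_scores(passage.lower().split())
    ranked = sorted(range(len(triples)), key=lambda i: scores[i], reverse=True)
    return [triples[i] for i in ranked[:k]]

print(align_passage("reproduction of the Pacific geoduck, Panopea generosa"))
```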
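
And here is the standard Reciprocal Rank Fusion formula, score(d) = Σᵢ 1/(k + rankᵢ(d)), as a short function. The constant k=60 is the conventional default from the RRF literature, not necessarily the paper's choice, and the example inputs are made up.

```python
# Reciprocal Rank Fusion: merge several rankings (e.g., sparse alignment
# and dense retrieval) by summing reciprocal-rank scores per document.
from collections import defaultdict

def rrf(rankings, k: int = 60):
    """rankings: list of ranked lists of doc ids, best first.
    Returns doc ids sorted by their summed 1/(k + rank) score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two retrievers disagree; RRF rewards documents ranked well by both.
print(rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"]]))  # ['d2', 'd1', 'd3']
```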
This approach achieved 87.6% correctness and 53% faithfulness on the SIGIR 2025 LiveRAG benchmark, despite challenges in aligning Wikidata’s generic triples with domain-specific document content.
👉 KEY INSIGHTS
1. Trade-offs in alignment: Linking Wikidata triples to documents works best for general knowledge but falters with niche topics (e.g., "Pacific geoduck reproduction" mapped incorrectly to oyster biology).
2. Cost efficiency: Avoiding LLM-based triple extraction reduced computational overhead, enabling scalability.
3. The multi-step advantage: Query rewriting and iterative retrieval improved performance on complex questions requiring two or more reasoning hops (a minimal loop sketch follows this list).
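
The retrieve-and-rewrite loop behind insight 3 might look like the sketch below. `retrieve` and `llm_rewrite` are hypothetical stand-ins for a retriever and a small instruction-tuned LLM, and the stopping rule and hop budget are assumptions, not the paper's exact logic.

```python
# Hedged sketch of multi-step retrieval: rewrite the query each hop,
# accumulating fresh evidence until nothing new comes back.
from typing import Callable, List

def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm_rewrite: Callable[[str, List[str]], str],
    max_hops: int = 3,
) -> List[str]:
    """Gather evidence over several hops, rewriting the query each time."""
    evidence: List[str] = []
    query = question
    for _ in range(max_hops):
        passages = retrieve(query)
        new = [p for p in passages if p not in evidence]
        if not new:  # no fresh evidence: stop expanding
            break
        evidence.extend(new)
        # Ask the LLM to reformulate the query around the missing hop.
        query = llm_rewrite(question, evidence)
    return evidence

# Toy usage with stubbed components (a real system would plug in an
# actual retriever and LLM call here).
stub_retrieve = lambda q: [f"passage about {q.split()[0]}"]
stub_rewrite = lambda q, ev: q + " follow-up"
print(iterative_retrieve("geoduck reproduction", stub_retrieve, stub_rewrite))
```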
👉 OPEN QUESTIONS
- How can we build asymmetric semantic models to better align text and graph data?
- Can hybrid alignment strategies (e.g., blending domain-specific KGs with Wikidata) mitigate topic drift?
- Does graph expansion improve linearly with scale, or are diminishing returns inevitable?
Why read this paper?
It’s a pragmatic case study in balancing scalability with reasoning depth in RAG systems. The code and prompts are fully disclosed, offering a blueprint for adapting GraphRAG to real-world, large-scale applications.
Paper: "Millions of G∈AR-s: Extending GraphRAG to Millions of Documents" (Shen et al., SIGIR 2025). Preprint: arXiv:2507.17399.