Comparing LPG and RDF in Recent Graph RAG Architectures
As a follow-up to my previous posts and discussions, I would like to share three arXiv papers that show how wide the range of design choices is when combining labeled property graphs (LPG) and RDF in Graph RAG. Here’s a brief overview of each:
1. RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF
arXiv:2412.17690
This paper builds on RDF knowledge graphs. Rather than relying solely on SPARQL queries, it establishes two retrieval pathways: SQL queries over a database derived automatically from the KG, and text search over verbalised RDF facts. The results of the two branches are combined and passed to an LLM, and the pipeline can run further retrieval rounds when a branch's results look unsatisfactory. The insight: SPARQL alone is too brittle for conversational queries, while pairing SQL with text search improves coverage and resilience.
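To make the two-pathway idea concrete, here is a minimal sketch in Python. It is not RAGONITE's implementation: the toy triples, the function names, the naive keyword-overlap scoring, and the "widen the text search until there is enough evidence" loop are all assumptions chosen for illustration, using only the standard-library `sqlite3` module.

```python
import sqlite3

# Toy RDF-style triples (hypothetical data, for illustration only).
TRIPLES = [
    ("Alice", "worksFor", "Acme"),
    ("Bob", "worksFor", "Acme"),
    ("Acme", "locatedIn", "Berlin"),
]

def build_sql_db(triples):
    """Load the triples into an in-memory SQLite table (a stand-in for the induced database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", triples)
    return conn

def verbalize(triples):
    """Turn each triple into a plain sentence for text search."""
    return [f"{s} {p} {o}." for s, p, o in triples]

def sql_branch(conn, predicate, obj):
    """Structured pathway: an exact lookup, returned as verbalized evidence."""
    rows = conn.execute(
        "SELECT subject FROM facts WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    ).fetchall()
    return [f"{row[0]} {predicate} {obj}." for row in rows]

def text_branch(sentences, question, top_k):
    """Text pathway: rank verbalized facts by naive keyword overlap with the question."""
    query_tokens = set(question.lower().replace("?", "").split())
    scored = []
    for sentence in sentences:
        tokens = set(sentence.lower().rstrip(".").split())
        overlap = len(query_tokens & tokens)
        if overlap:
            scored.append((overlap, sentence))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, sentence in scored[:top_k]]

def retrieve(conn, sentences, question, predicate, obj, min_evidence=2, max_rounds=3):
    """Fuse both branches; if evidence is thin, widen the text search and try again."""
    top_k = 1
    evidence = []
    for _ in range(max_rounds):
        combined = sql_branch(conn, predicate, obj) + text_branch(sentences, question, top_k)
        evidence = list(dict.fromkeys(combined))  # drop duplicates, keep order
        if len(evidence) >= min_evidence:
            break
        top_k += 1  # another round with a wider text search
    return evidence

conn = build_sql_db(TRIPLES)
sentences = verbalize(TRIPLES)
evidence = retrieve(conn, sentences, "Who works for Acme?", "worksFor", "Acme")
```

In a real pipeline the `predicate` and `obj` arguments would come from an LLM-generated SQL query, and the fused `evidence` list would be placed in the answering prompt. Here they are passed in by hand to keep the sketch self-contained.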
2. GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes
arXiv:2312.09577
This paper addresses LPGs. It introduces a storage scheme that preserves LPG semantics on top of formats such as Parquet while making graph operations much faster. The reported speedups over baseline Parquet methods are large: roughly 4452× for neighbour retrieval, 14.8× for label filtering, and 29.5× for end-to-end workloads. Such optimisations matter for GraphRAG, where low-latency retrieval is essential.
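One way to see why layout matters so much for neighbour retrieval is a CSR-style arrangement: sort the edges by source vertex and keep an offsets array, so that a vertex's neighbours form one contiguous slice instead of requiring a scan over every edge. The sketch below uses plain Python lists as stand-ins for columnar files. It is an illustration of the general idea under that assumption, not GraphAr's API or file format, and all names are hypothetical.

```python
def build_csr(num_vertices, edges):
    """Sort edges by source and build an offsets array (CSR-style layout)."""
    sorted_edges = sorted(edges)
    targets = [dst for _, dst in sorted_edges]
    offsets = [0] * (num_vertices + 1)
    for src, _ in sorted_edges:
        offsets[src + 1] += 1          # count out-edges per source vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum turns counts into slice bounds
    return offsets, targets

def neighbors(offsets, targets, v):
    """Neighbour lookup is a single contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]

def neighbors_scan(edges, v):
    """Baseline: scan the whole unsorted edge list."""
    return sorted(dst for src, dst in edges if src == v)

# Hypothetical toy graph with 3 vertices.
edges = [(0, 1), (2, 0), (0, 2), (1, 2)]
offsets, targets = build_csr(3, edges)
```

With this layout, `neighbors(offsets, targets, 0)` reads only the slice it needs, while `neighbors_scan(edges, 0)` touches every edge. In a columnar file the same idea means reading a small row range rather than the full edge table.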
3. CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
arXiv:2412.18702
This work brings a benchmarking perspective, targeting Cypher queries over large-scale LPGs. It emphasises precise retrieval across full-scale graphs, which is crucial when LLMs are expected to interact with enterprise-scale knowledge. By formalising a benchmark, it encourages more rigorous evaluation of GraphRAG retrieval techniques and raises the bar for future architectures.
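As a rough illustration of what "rigorous evaluation" can mean for text-to-Cypher, the sketch below computes a simple execution-accuracy score: a generated query counts as correct when its result rows match the gold query's rows, ignoring order. This is a simplified stand-in written for this post, not the benchmark's own evaluation code. The function names and the sample rows are hypothetical, and the rows are assumed to have already been fetched from the graph database.

```python
from collections import Counter

def rows_match(predicted_rows, gold_rows):
    """Order-insensitive comparison of two result sets, treated as multisets of rows."""
    return Counter(map(tuple, predicted_rows)) == Counter(map(tuple, gold_rows))

def execution_accuracy(results):
    """results: list of (predicted_rows, gold_rows) pairs, one pair per question."""
    if not results:
        return 0.0
    return sum(rows_match(pred, gold) for pred, gold in results) / len(results)

# Hypothetical results for two questions: the first matches up to row order,
# the second returns the wrong answer.
sample = [
    ([("Alice",), ("Bob",)], [("Bob",), ("Alice",)]),
    ([("Berlin",)], [("Paris",)]),
]
score = execution_accuracy(sample)
```

Comparing executed results rather than query strings is the useful design choice here: two differently written Cypher queries can be equally correct if they return the same rows.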
Takeaway
Together, these works highlight the diverse strategies for bridging RDF and LPG in GraphRAG: hybrid retrieval pipelines, optimised storage, and precision-focused benchmarks. They show how research is steadily moving from demos to architectures that balance semantics, performance, and accuracy.