On the different roles of ontologies (& machine learning) | LinkedIn
In a previous post I touched on how ontologies are foundational to many data activities, yet "obscure". As a consequence, the different roles of ontologies are not always known to the people who use them, since they may focus only on the aspects relevant to specific use cases.
Lessons Learned from Evaluating NodeRAG vs Other RAG Systems
I recently dug into the NodeRAG paper (https://lnkd.in/gwaJHP94) and it was eye-opening not just for how it performed, but for what it revealed about the evolution of RAG (Retrieval-Augmented Generation) systems.
Some key takeaways for me:
NaiveRAG is stronger than you think.
Brute-force retrieval using simple vector search sometimes beats graph-based methods, especially when graph structures are too coarse or noisy.
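To make "brute-force retrieval using simple vector search" concrete, here is a minimal sketch: score every chunk against the query by cosine similarity and keep the top k. The chunk names and the 3-dimensional "embeddings" are made-up toy values, not anything from the NodeRAG paper; a real system would get its vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Score every chunk against the query and return the k best chunk ids."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy embeddings (assumption, for illustration only).
chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
result = top_k([1.0, 0.1, 0.0], chunks, k=2)
print(result)
```

There is no graph and no index here, which is the point: the baseline has very few moving parts that can go wrong.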
GraphRAG was an important step, but not the final answer.
While it introduced knowledge graphs and community-based retrieval, GraphRAG sometimes underperformed NaiveRAG because its communities could be too coarse, leading to irrelevant retrieval.
LightRAG reduced token cost, but at the expense of accuracy.
By focusing on retrieving just 1-hop neighbors instead of traversing globally, LightRAG made retrieval cheaper, but often missed important multi-hop reasoning paths, losing precision.
NodeRAG shows what mature RAG looks like.
NodeRAG redesigned the graph structure itself:
Instead of homogeneous graphs, it uses heterogeneous graphs with fine-grained semantic units, entities, relationships, and high-level summaries, all as nodes.
It combines dual search (exact match + semantic search) and shallow Personalized PageRank to precisely retrieve the most relevant context.
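As a rough illustration of the idea (not NodeRAG's actual implementation), the sketch below picks entry nodes by exact match against the query and then runs a shallow Personalized PageRank, meaning just a couple of power-iteration steps that keep restarting at those entry nodes. The toy graph, the `type:name` node naming, and the `alpha` and `iterations` values are all assumptions for the example; the semantic-search half of the dual search is left out.

```python
def personalized_pagerank(adjacency, seeds, alpha=0.5, iterations=2):
    """Run a few power-iteration steps of PageRank that restarts at `seeds`."""
    seeds = list(seeds)
    nodes = list(adjacency)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        spread = {n: 0.0 for n in nodes}
        for node, neighbors in adjacency.items():
            # Nodes without neighbors hand their score back to the seeds.
            targets = neighbors or seeds
            share = scores[node] / len(targets)
            for target in targets:
                spread[target] += share
        scores = {n: alpha * restart[n] + (1 - alpha) * spread[n] for n in nodes}
    return scores

# Hypothetical heterogeneous graph: entities, semantic units and summaries are all nodes.
graph = {
    "entity:Paris": ["unit:u1", "unit:u2"],
    "unit:u1": ["entity:Paris", "summary:s1"],
    "unit:u2": ["entity:Paris"],
    "summary:s1": ["unit:u1"],
    "unit:u3": ["entity:Berlin"],
    "entity:Berlin": ["unit:u3"],
}

query = "What is Paris known for?"
# Exact-match entry points: nodes whose name appears verbatim in the query.
seeds = [n for n in graph if n.split(":", 1)[1].lower() in query.lower()]
scores = personalized_pagerank(graph, seeds)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:4])
```

Because the walk is shallow and always restarts at the entry nodes, score stays concentrated near them, and the unrelated Berlin nodes never receive any.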
The result?
- Highest accuracy across multi-hop and open-ended benchmarks
- Lowest token retrieval (i.e., lower inference costs)
- Faster indexing and querying
Key takeaway:
In the RAG world, it's no longer about retrieving more; it's about retrieving better.
Fine-grained, explainable, efficient retrieval will define the next generation of RAG systems.
If you're working on RAG architectures, NodeRAG's design principles are well worth studying!
Would love to hear how others are thinking about the future of RAG systems.
#RAG #KnowledgeGraphs #AI #LLM #NodeRAG #GraphRAG #LightRAG #MachineLearning #GenAI
Choosing the Right Format: How Knowledge Graph Layouts Impact AI Reasoning
Why This Matters
Most AI systems blend knowledge graphs (structured data) with large language models (flexible reasoning). But there's a hidden variable: how you translate the graph into text for the AI. Researchers discovered that the formatting choice alone can swing performance by up to 17.5% on reasoning tasks. Imagine solving roughly 1 in 6 more problems correctly just by adjusting how you present data.
What They Built
KG-LLM-Bench is a new benchmark to test how language models reason with knowledge graphs.
It includes five tasks:
- Triple verification ("Does this fact exist?")
- Shortest path finding ("How are two concepts connected?")
- Aggregation ("How many entities meet X condition?")
- Multi-hop reasoning ("Which entities linked to A also have property B?")
- Global analysis ("Which node is most central?")
The team tested seven models (Claude, GPT-4o, Gemini, Llama, Nova) with five ways to "textualize" graphs, from simple edge lists to structured JSON and semantic web formats like RDF Turtle.
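To show what "textualizing" means in practice, here is a small sketch that renders the same triples two ways: as a flat edge list and as JSON grouped by entity. The post does not show the benchmark's exact layouts, so both formats and the sample facts below are illustrative assumptions.

```python
import json

# Hypothetical sample facts as (subject, predicate, object) triples.
triples = [
    ("France", "borders", "Spain"),
    ("France", "borders", "Italy"),
    ("France", "capital", "Paris"),
    ("Spain", "capital", "Madrid"),
]

def to_edge_list(triples):
    """One line per edge: compact, but an entity's facts are not grouped."""
    return "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)

def to_structured_json(triples):
    """Group edges by subject, then by predicate: longer, but pre-aggregated."""
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(grouped, indent=2)

edge_text = to_edge_list(triples)
json_text = to_structured_json(triples)
print(edge_text)
print(json_text)
```

Either string can be pasted into a prompt. The grouped JSON already collects everything France borders under one key, which lines up with the aggregation advantage reported in the post.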
Key Insights
1. Format matters more than assumed:
   - Structured JSON and edge lists performed best overall, but results varied by task.
   - For example, JSON excels at aggregation tasks (data is grouped by entity), while edge lists help identify central nodes (repeated mentions highlight connections).
2. Models don't cheat:
   - Replacing real entity names with fake ones (e.g., "France" → "Verdania") caused only a 0.2% performance drop, suggesting that models rely on the provided context rather than memorized knowledge.
3. Token efficiency:
   - Edge lists used ~2,600 tokens vs. JSON-LD's ~13,500. Shorter formats free up context space for complex reasoning.
   - But concise is not always better: structured formats improved accuracy for tasks requiring grouped data.
4. Models struggle with directionality:
   - Counting outgoing edges (e.g., "Which countries does France border?") is easier than incoming ones ("Which countries border France?"), likely due to formatting biases.
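One plausible reading of the "formatting bias" in the last point: when an edge list is ordered by subject, an entity's outgoing edges sit next to each other, while its incoming edges are scattered through the text. The sketch below makes that visible on a made-up edge list; it illustrates the hypothesis only and does not reproduce the benchmark's analysis.

```python
from collections import Counter

# Hypothetical edge list, ordered by subject as many textualizations are.
edges = [
    ("Belgium", "borders", "France"),
    ("France", "borders", "Spain"),
    ("France", "borders", "Italy"),
    ("Germany", "borders", "France"),
    ("Spain", "borders", "Portugal"),
]

out_degree = Counter(s for s, _, _ in edges)  # edges leaving each node
in_degree = Counter(o for _, _, o in edges)   # edges arriving at each node

def mention_positions(edges, node, role):
    """Line indexes where `node` appears as 'subject' or as 'object'."""
    idx = 0 if role == "subject" else 2
    return [i for i, edge in enumerate(edges) if edge[idx] == node]

outgoing_lines = mention_positions(edges, "France", "subject")
incoming_lines = mention_positions(edges, "France", "object")
print(out_degree["France"], outgoing_lines)
print(in_degree["France"], incoming_lines)
```

Both counts are the same size, but the outgoing edges are on adjacent lines while the incoming ones have to be collected from opposite ends of the list.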
Practical Takeaways
- Optimize for your task: Use JSON for aggregation, edge lists for centrality.
- Test your model: The best format depends on the LLM. Claude thrived with RDF Turtle, while Gemini preferred edge lists.
- Don't fear pseudonyms: Masking real names minimally impacts performance, which is useful for sensitive data.
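The pseudonym idea from the last takeaway can be sketched in a few lines: replace each entity with a stable alias before building the prompt, and keep the mapping so answers can be translated back. The `Entity_N` alias scheme and the sample triples are assumptions for illustration; the benchmark's own replacement names (such as "Verdania") were generated differently.

```python
def pseudonymize(triples, prefix="Entity"):
    """Replace every subject and object with a stable alias; predicates stay as-is."""
    mapping = {}

    def alias(name):
        # The first time a name is seen it gets the next numbered alias.
        if name not in mapping:
            mapping[name] = f"{prefix}_{len(mapping) + 1}"
        return mapping[name]

    masked = [(alias(s), p, alias(o)) for s, p, o in triples]
    return masked, mapping

# Hypothetical sample facts.
triples = [
    ("France", "borders", "Spain"),
    ("Spain", "capital", "Madrid"),
]
masked, mapping = pseudonymize(triples)

# Invert the mapping to translate a model's answer back to real names.
reverse = {alias: name for name, alias in mapping.items()}
print(masked)
print(reverse["Entity_2"])
```

Only the masked triples go into the prompt; the mapping stays on your side.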
The benchmark is openly available, inviting researchers to add new tasks, graphs, and models. As AI handles larger knowledge bases, choosing the right "data language" becomes as critical as the reasoning logic itself.
Paper: [KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs]
Authors: Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan
Is developing an ontology from an LLM really feasible?
It seems the answer to whether an LLM could replace the whole text-to-ontology pipeline is a resounding "no". If you're one of those who think it should be (or even is?) a "yes": why, and did you do the experiments that show it's as good as the alternatives (with the results available)? And I mean a proper ontology, not a knowledge graph with numerous duplications and contradictions and lacking constraints.
For a few gentle considerations (and pointers to longer arguments) and a summary figure of the processes the LLM supposedly would be replacing, see https://lnkd.in/dG_Xsv_6
Agentic Paranets just landed on the origin_trail DKG. A major paranet feature upgrade built for AI agents with enhanced knowledge graph read/write access control
Knowledge graphs for LLM grounding and avoiding hallucination
This blog post is part of a series that dives into various aspects of SAP's approach to Generative AI, and its technical underpinnings. In previous blog posts of this series, you learned about how to use large language models (LLMs) for developing AI applications in a trustworthy and reliable manner...
Build your hybrid-Graph for RAG & GraphRAG applications using the power of NLP | LinkedIn
Build a graph for a RAG application for the price of a chocolate bar! What is GraphRAG for you? What is GraphRAG? What does GraphRAG mean from your perspective? What if you could have a standard RAG and a GraphRAG as a combi-package, with just a query switch? The fact is, there is no concrete, universal
What is really Graph RAG? Inspired by "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" paper from Microsoft! How do you combine…
A zero-hallucination AI chatbot that answered over 10000 questions of students at the University of Chicago using GraphRAG
UChicago Genie is now open source! How we built a zero-hallucination AI chatbot that answered over 10000 questions of students at the University of…
Enhancing RAG-based apps by constructing and leveraging knowledge graphs with open-source LLMs
Graph Retrieval Augmented Generation (Graph RAG) is emerging as a powerful addition to traditional vector search retrieval methods. Graphs are great at repre...