Provenance-Enabled Explainable AI
GraphNews
Leveraging Knowledge Graphs and Large Language Models to Track and...
This study addresses the challenges of tracking and analyzing students' learning trajectories, particularly the issue of inadequate knowledge coverage in course assessments. Traditional assessment...
AutoSchemaKG: Autonomous Knowledge Graph Construction through...
We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously...
Alice enters the magical, branchy world of Graphs and Graph Neural Networks
The first draft 'G' chapter of the geometric deep learning book is live! 🚀
Alice enters the magical, branchy world of Graphs and Graph Neural Networks 🕸️ (Large Language Models are there too!)
I've spent 7+ years studying, researching & talking about graphs -- This text is my best attempt at conveying everything I've learnt 💎
You may read this chapter in the usual place (link in comments!)
Any and all feedback / thoughts / questions on the content, and/or words of encouragement for finishing this book (pretty please! 😇) are warmly welcomed!
Michael Bronstein Joan Bruna Taco Cohen
GDL Book
Grids, Groups, Graphs, Geodesics, and Gauges
Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine
In this position paper "Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine" my L3S Research Center and TIB – Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek colleagues around Maria-Esther Vidal have nicely laid out some research challenges on the way to interpretable hybrid AI systems in medicine. However, I think the conceptual framework is broadly applicable way beyond medicine.
For example, my former colleagues and PhD students at eccenca are working on operationalizing Neuro-Symbolic AI for Enterprise Knowledge Management with eccenca's Corporate Memory. The paper outlines a compelling architecture for combining sub-symbolic models (e.g., deep learning) with symbolic reasoning systems to enable AI that is interpretable, robust, and aligned with human values. eccenca implements these principles at scale through its neuro-symbolic Enterprise Knowledge Graph platform, Corporate Memory for real-world industrial settings:
1. Symbolic Foundation via Semantic Web Standards - Corporate Memory is grounded in W3C standards (RDF, RDFS, OWL, SHACL, SPARQL), enabling formal knowledge representation, inferencing, and constraint validation. This makes it possible to encode domain ontologies, business rules, and data governance policies in a machine-interpretable and human-verifiable manner.
2. Integration of Sub-symbolic Components - it integrates LLMs and ML models for tasks such as schema matching, natural language interpretation, entity resolution, and ontology population. These are linked to the symbolic layer via mappings and annotations, ensuring traceability and explainability.
3. Neuro-Symbolic Interfaces for Hybrid Reasoning - Hybrid workflows where symbolic constraints (e.g., SHACL shapes) guide LLM-based data enrichment. LLMs suggest schema alignments, which are verified against ontological axioms. Graph embeddings and path-based querying power semantic search and similarity.
4. Human-in-the-loop Interactions - Domain experts interact through low-code interfaces and semantic UIs that allow inspection, validation, and refinement of both the symbolic and neural outputs, promoting human oversight and continuous improvement.
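The hybrid workflow in item 3 can be sketched in a few lines of Python. This is an illustration only and not eccenca's API: the `SHAPE` dictionary is a hypothetical stand-in for a SHACL shape, and `llm_suggestions` is hard-coded where a real system would call a model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: object

# Hypothetical stand-in for a SHACL shape: the allowed predicates and the
# Python type each value must have.
SHAPE = {
    "ex:name": str,
    "ex:employeeCount": int,
}

def validate(triple: Triple) -> list[str]:
    """Return a list of violations; an empty list means the triple conforms."""
    expected = SHAPE.get(triple.predicate)
    if expected is None:
        return [f"unknown predicate {triple.predicate}"]
    if not isinstance(triple.obj, expected):
        return [f"{triple.predicate} expects {expected.__name__}"]
    return []

# Hard-coded suggestions standing in for LLM output.
llm_suggestions = [
    Triple("ex:acme", "ex:name", "ACME Corp"),
    Triple("ex:acme", "ex:employeeCount", "about 500"),  # wrong datatype
    Triple("ex:acme", "ex:mood", "optimistic"),          # predicate not in the shape
]

accepted = [t for t in llm_suggestions if not validate(t)]
rejected = {t: validate(t) for t in llm_suggestions if validate(t)}
```

Only the first suggestion reaches the graph; the other two come back with a reason attached, which is the kind of traceable rejection a domain expert could review.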
Such an approach can power industrial applications, e.g. digital thread integration in manufacturing, compliance automation in pharma and finance, and, in general, cross-domain interoperability in data mesh architectures. Corporate Memory is a practical instantiation of neuro-symbolic AI that meets industrial-grade requirements for governance, scalability, and explainability – key tenets of Human-Centric AI. Check it out here: https://lnkd.in/evyarUsR
#NeuroSymbolicAI #HumanCentricAI #KnowledgeGraphs #EnterpriseArchitecture #ExplainableAI #SemanticWeb #LinkedData #LLM #eccenca #CorporateMemory #OntologyDrivenAI #AI4Industry
Towards Multi-modal Graph Large Language Model
Multi-modal graphs are everywhere in the digital world.
Yet the tools used to understand them haven't evolved as much as one would expect.
What if the same model could handle your social network analysis, molecular discovery, AND urban planning tasks?
A new paper from Tsinghua University proposes Multi-modal Graph Large Language Models (MG-LLM) - a paradigm shift in how we process complex interconnected data that combines text, images, audio, and structured relationships.
Think of it as ChatGPT for graphs, but, metaphorically speaking, with eyes, ears, and structural understanding.
Their key insight? Treating all graph tasks as generative problems.
Instead of training separate models for node classification, link prediction, or graph reasoning, MG-LLM frames everything as transforming one multi-modal graph into another.
This unified approach means the same model that predicts protein interactions could also analyze social media networks or urban traffic patterns.
What makes this particularly exciting is the vision for natural language interaction with graph data. Imagine querying complex molecular structures or editing knowledge graphs using plain English, without learning specialized query languages.
The challenges remain substantial - from handling the multi-granularity of data (pixels to full images) to managing multi-scale tasks (entire graph input, single node output).
But if successful, this could fundamentally change the level of graph-based insights across industries that have barely scratched the surface of AI adoption.
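To make the "every task is a graph-to-graph transformation" framing concrete, here is a minimal Python sketch. The `Node` and `Graph` types and both functions are hypothetical and not taken from the paper; in MG-LLM the model would generate the output graph, whereas here the prediction is simply passed in as an argument.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Node:
    node_id: str
    modality: str                 # e.g. "text", "image", "audio"
    payload: str                  # raw content or a pointer to it
    label: Optional[str] = None

@dataclass(frozen=True)
class Graph:
    nodes: tuple
    edges: frozenset = frozenset()

def classify_node(graph: Graph, node_id: str, label: str) -> Graph:
    """Node classification: the output is the input graph with one node relabeled."""
    nodes = tuple(
        replace(n, label=label) if n.node_id == node_id else n
        for n in graph.nodes
    )
    return Graph(nodes, graph.edges)

def predict_link(graph: Graph, src: str, dst: str) -> Graph:
    """Link prediction: the output is the input graph plus one new edge."""
    return Graph(graph.nodes, graph.edges | {(src, dst)})

g = Graph(nodes=(
    Node("p1", "text", "protein A sequence"),
    Node("p2", "image", "protein_b.png"),
))
g = classify_node(g, "p1", "enzyme")
g = predict_link(g, "p1", "p2")
```

Both tasks share the signature `Graph -> Graph`, which is the property that would let a single generative model serve them.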
HippoRAG takes cues from the brain to improve LLM retrieval
HippoRAG is a technique inspired by the interactions between the cortex and hippocampus to improve knowledge retrieval for large language models (LLMs).
SPARQLLM
Contribute to GDD-Nantes/SPARQLLM development by creating an account on GitHub.
Multimodal for Knowledge Graphs (MM4KG)
A Practical Implementation
Optimizing the Interface Between Knowledge Graphs and LLMs for...
Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) results in complex systems with numerous hyperparameters that directly affect performance. While such systems are increasingly...
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas
👉 Why This Matters
Traditional knowledge graphs face a paradox: they require expert-crafted schemas to organize information, creating bottlenecks for scalability and adaptability. This limits their ability to handle dynamic real-world knowledge or cross-domain applications effectively.
👉 What Changed
AutoSchemaKG eliminates manual schema design through three innovations:
1. Dynamic schema induction: LLMs automatically create conceptual hierarchies while extracting entities/events
2. Event-aware modeling: Captures temporal relationships and procedural knowledge missed by entity-only approaches
3. Multi-level conceptualization: Organizes instances into semantic categories through abstraction layers
The system processed 50M+ documents to build ATLAS - a family of KGs with:
- 900M+ nodes (entities/events/concepts)
- 5.9B+ relationships
- 95% alignment with human-created schemas (zero manual intervention)
👉 How It Works
1. Triple extraction pipeline:
- LLMs identify entity-entity, entity-event, and event-event relationships
- Processes text at document level rather than sentence level for context preservation
2. Schema induction:
- Automatically groups instances into conceptual categories
- Creates hierarchical relationships between specific facts and abstract concepts
3. Scale optimization:
- Handles web-scale corpora through GPU-accelerated batch processing
- Maintains semantic consistency across 3 distinct domains (Wikipedia, academic papers, Common Crawl)
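The extraction and schema-induction steps above can be sketched as follows. This is a toy illustration, not the AutoSchemaKG code: the triples are hand-written, and `conceptualize` is a hypothetical lookup standing in for the LLM call that would name a concept for each instance.

```python
from collections import defaultdict

# Triples as they might come out of the extraction step (hand-written here).
triples = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("discovery of polonium", "happened_before", "Nobel Prize ceremony"),
]

def conceptualize(instance: str) -> str:
    """Hypothetical stand-in for the LLM call that abstracts an instance."""
    lookup = {
        "Marie Curie": "scientist",
        "polonium": "chemical element",
        "Nobel Prize in Physics": "award",
        "discovery of polonium": "scientific event",
        "Nobel Prize ceremony": "ceremony",
    }
    return lookup.get(instance, "thing")

def induce_schema(triples):
    """Group instances under concepts and derive concept-level relation signatures."""
    concept_members = defaultdict(set)
    signatures = set()
    for head, relation, tail in triples:
        head_concept, tail_concept = conceptualize(head), conceptualize(tail)
        concept_members[head_concept].add(head)
        concept_members[tail_concept].add(tail)
        signatures.add((head_concept, relation, tail_concept))
    return dict(concept_members), signatures

concept_members, signatures = induce_schema(triples)
```

The schema (`signatures`) falls out of the data instead of being designed up front, and the third triple shows an event-event relation sitting next to entity-entity ones.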
👉 Proven Impact
- Boosts multi-hop QA accuracy by 12-18% over state-of-the-art baselines
- Improves LLM factuality by up to 9% on specialized domains like medicine and law
- Enables complex reasoning through conceptual bridges between disparate facts
👉 Key Insight
The research demonstrates that billion-scale KGs with dynamic schemas can effectively complement parametric knowledge in LLMs when they reach critical mass (1B+ facts). This challenges the assumption that retrieval augmentation needs domain-specific tuning to be effective.
Question for Discussion
As autonomous KG construction becomes viable, how should we rethink the role of human expertise in knowledge representation? Should curation shift from schema design to validation and ethical oversight?
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through Evidence-based distillation and Graph-based structuring
Small Models, Big Knowledge: How DRAG Bridges the AI Efficiency-Accuracy Gap
👉 Why This Matters
Modern AI systems face a critical tension: large language models (LLMs) deliver impressive knowledge recall but demand massive computational resources, while smaller models (SLMs) struggle with factual accuracy and "hallucinations." Traditional retrieval-augmented generation (RAG) systems amplify this problem by requiring constant updates to vast knowledge bases.
👉 The Innovation
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through two key mechanisms:
1. Evidence-based distillation: Filters and ranks factual snippets from teacher LLMs
2. Graph-based structuring: Converts retrieved knowledge into relational graphs to preserve critical connections
This dual approach reduces model size requirements by 10-100x while improving factual accuracy by up to 27.7% compared to prior methods like MiniRAG.
👉 How It Works
1. Evidence generation: A large teacher LLM produces multiple context-relevant facts
2. Semantic filtering: Combines cosine similarity and LLM scoring to retain top evidence
3. Knowledge graph creation: Extracts entity relationships to form structured context
4. Distilled inference: SLMs generate answers using both filtered text and graph data
The process mimics how humans combine raw information with conceptual understanding, enabling smaller models to "think" like their larger counterparts without the computational overhead.
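A minimal sketch of the semantic filtering step, under loud assumptions: the post only says that cosine similarity and LLM scoring are combined, so the linear blend with weight `alpha` is a guess, the bag-of-words cosine stands in for embedding similarity, and the `llm_scores` values are made up.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a real system would use embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def rank_evidence(query, evidence, llm_scores, top_k=2, alpha=0.5):
    """Blend similarity with a pre-computed LLM relevance score; keep the top_k."""
    scored = [
        (alpha * cosine(query, text) + (1 - alpha) * llm_scores[text], text)
        for text in evidence
    ]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

query = "who discovered penicillin"
evidence = [
    "Alexander Fleming discovered penicillin in 1928",
    "Penicillin is an antibiotic",
    "Paris is the capital of France",
]
# Hypothetical teacher-LLM relevance scores in [0, 1].
llm_scores = {evidence[0]: 0.9, evidence[1]: 0.6, evidence[2]: 0.1}

kept = rank_evidence(query, evidence, llm_scores)
```

The off-topic sentence is dropped, and the two remaining snippets are what the graph-construction step and the small model would see.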
👉 Privacy Bonus
DRAG adds a privacy layer by:
- Local query sanitization before cloud processing
- Returning only de-identified knowledge graphs
Tests show 95.7% reduction in potential personal data leakage while maintaining answer quality.
👉 Why It’s Significant
This work addresses three critical challenges simultaneously:
- Makes advanced RAG capabilities accessible on edge devices
- Reduces hallucination rates through structured knowledge grounding
- Preserves user privacy in cloud-based AI interactions
The GitHub repository provides full implementation details, enabling immediate application in domains like healthcare diagnostics, legal analysis, and educational tools where accuracy and efficiency are non-negotiable.
Semantically Composable Architectures
I'm happy to share the draft of the "Semantically Composable Architectures" mini-paper.
It is the culmination of approximately four years' work, which began with Coreless Architectures and has now evolved into something much bigger.
LLMs are impressive, but a real breakthrough will occur once we surpass the cognitive capabilities of a single human brain.
Enabling autonomous large-scale system reverse engineering and large-scale autonomous transformation with minimal to no human involvement, while still making it understandable to humans if they choose to, is a central pillar of making truly groundbreaking changes.
We hope the ideas we shared will be beneficial to humanity and advance our civilization further.
It is not final and will require some clarification and improvements, but the key concepts are present. Happy to hear your thoughts and feedback.
Some of these concepts underpin the design of the Product X system.
Part of the core team + external contribution:
Andrew Barsukov Andrey Kolodnitsky Sapta Girisa N Keith E. Glendon Gurpreet Sachdeva Saurav Chandra Mike Diachenko Oleh Sinkevych
Leveraging Large Language Models for Realizing Truly Intelligent...
The number of published scholarly articles is growing at a significant rate, making scholarly knowledge organization increasingly important. Various approaches have been proposed to organize...
Building and navigating attribution graphs for Large Language Models
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Open-sourcing circuit tracing tools
A-MEM Transforms AI Agent Memory with Zettelkasten Method, Atomic Notes, Dynamic Linking & Continuous Evolution
🏯🏇 A-MEM Transforms AI Agent Memory with Zettelkasten Method, Atomic Notes, Dynamic Linking & Continuous Evolution!
This Novel Memory fixes rigid structures with adaptable, evolving, and interconnected knowledge networks, delivering 2x performance in complex reasoning tasks.
𝗧𝗵𝗶𝘀 𝗶𝘀 𝘄𝗵𝗮𝘁 𝗜 𝗹𝗲𝗮𝗿𝗻𝗲𝗱:
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》 𝗪𝗵𝘆 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗠𝗲𝗺𝗼𝗿𝘆 𝗙𝗮𝗹𝗹𝘀 𝗦𝗵𝗼𝗿𝘁
Most AI agents today rely on simplistic storage and retrieval but break down when faced with complex, multi-step reasoning tasks.
✸ Common Limitations:
☆ Fixed schemas: Conventional memory systems require predefined structures that limit flexibility.
☆ Limited adaptability: When new information arises, old memories remain static and disconnected, reducing an agent’s ability to build on past experiences.
☆ Ineffective long-term retention: AI agents often struggle to recall relevant past interactions, leading to redundant processing and inefficiencies.
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》𝗔-𝗠𝗘𝗠: 𝗔𝘁𝗼𝗺𝗶𝗰 𝗻𝗼𝘁𝗲𝘀 𝗮𝗻𝗱 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗹𝗶𝗻𝗸𝗶𝗻𝗴
A-MEM organizes knowledge in a way that mirrors how humans create and refine ideas over time.
✸ How it Works:
☆ Atomic notes: Information is broken down into small, self-contained knowledge units, ensuring clarity and easy integration with future knowledge.
☆ Dynamic linking: Instead of relying on static categories, A-MEM automatically creates connections between related knowledge, forming a network of interrelated ideas.
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》 𝗣𝗿𝗼𝘃𝗲𝗻 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗔𝗱𝘃𝗮𝗻𝘁𝗮𝗴𝗲
A-MEM delivers measurable improvements.
✸ Empirical results demonstrate:
☆ Over 2x performance improvement in complex reasoning tasks, where AI must synthesize multiple pieces of information across different timeframes.
☆ Superior efficiency across top foundation models, including GPT, Llama, and Qwen—proving its versatility and broad applicability.
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》 𝗜𝗻𝘀𝗶𝗱𝗲 𝗔-𝗠𝗘𝗠
✸ Note Construction:
☆ AI-generated structured notes that capture essential details and contextual insights.
☆ Each memory is assigned metadata, including keywords and summaries, for faster retrieval.
✸ Link Generation:
☆ The system autonomously connects new memories to relevant past knowledge.
☆ Relationships between concepts emerge naturally, allowing AI to recognize patterns over time.
✸ Memory Evolution:
☆ Older memories are continuously updated as new insights emerge.
☆ The system dynamically refines knowledge structures, mimicking the way human memory strengthens connections over time.
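The three stages above (note construction, link generation, memory evolution) can be sketched in plain Python. This is not the A-MEM implementation: keyword overlap is a simple stand-in for the LLM-driven linking the post describes, and `MemoryStore`, `Note` and `min_overlap` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    note_id: int
    content: str
    keywords: set
    links: set = field(default_factory=set)

class MemoryStore:
    def __init__(self, min_overlap: int = 1):
        self.notes = {}
        self.min_overlap = min_overlap

    def add(self, content: str, keywords: set) -> Note:
        # Note construction: an atomic note with its metadata.
        note = Note(len(self.notes), content, set(keywords))
        for other in self.notes.values():
            shared = note.keywords & other.keywords
            if len(shared) >= self.min_overlap:
                # Link generation: connect the notes in both directions.
                note.links.add(other.note_id)
                other.links.add(note.note_id)
                # Memory evolution: the older note absorbs the new keywords.
                other.keywords |= note.keywords
        self.notes[note.note_id] = note
        return note

store = MemoryStore()
a = store.add("User prefers vegetarian food", {"food", "preference"})
b = store.add("User booked a table at an Italian restaurant", {"food", "restaurant"})
c = store.add("User's flight lands at 9am", {"travel"})
```

After the second insert, the first note is both linked to the booking and enriched with the `restaurant` keyword, while the unrelated travel note stays isolated.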
CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented...
Retrieval-Augmented Generation (RAG) has significantly enhanced large language models (LLMs) in knowledge-intensive tasks by incorporating external knowledge retrieval. However, existing RAG...
Code: elakhatibi/CDF-RAG
Paper page - Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation
Join the discussion on this paper page
LLMs generate possibilities; knowledge graphs remember what works
LLMs generate possibilities; knowledge graphs remember what works. Together, they forge the recursive memory and creative engine that enables AI systems to truly evolve themselves.
Combining neural components (like large language models) with symbolic verification creates a powerful framework for self-evolution that overcomes limitations of either approach used independently.
AlphaEvolve demonstrates that self-evolving systems face a fundamental tension between generating novel solutions and ensuring those solutions actually work.
The paper shows how AlphaEvolve addresses this through a hybrid architecture where:
Neural components (LLMs) provide creative generation of code modifications by drawing on patterns learned from vast training data
Symbolic components (code execution) provide ground truth verification through deterministic evaluation
Without this combination, a system would either generate interesting but incorrect solutions (neural-only approach) or be limited to small, safe modifications within known patterns (symbolic-only approach).
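The generate-then-verify loop can be illustrated with a toy example. Nothing here is AlphaEvolve's code: `propose` uses random edits where the real system would prompt an LLM, and `evaluate` is a made-up deterministic scorer standing in for code execution.

```python
import random

def evaluate(candidate: list) -> int:
    """Deterministic evaluator: higher is better (negative distance to a target)."""
    target = [3, 1, 4, 1, 5]
    return -sum(abs(c - t) for c, t in zip(candidate, target))

def propose(parent: list, rng: random.Random) -> list:
    """Stand-in for the LLM: propose a small random modification of the parent."""
    child = parent.copy()
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

def evolve(generations: int = 500, seed: int = 0):
    rng = random.Random(seed)
    best = [0, 0, 0, 0, 0]
    best_score = evaluate(best)
    for _ in range(generations):
        child = propose(best, rng)
        score = evaluate(child)
        if score > best_score:  # only verified improvements survive
            best, best_score = child, score
    return best, best_score

best, best_score = evolve()
```

The generator is free to propose anything, because the evaluator, not the generator, decides what is kept.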
The system can operate at multiple levels of abstraction depending on the problem: raw solution evolution, constructor function evolution, search algorithm evolution, or co-evolution of intermediate solutions and search algorithms.
This capability emanates directly from the neurosymbolic integration, where:
Neural networks excel at working with continuous, high-dimensional spaces and recognizing patterns across abstraction levels
Symbolic systems provide precise representations of discrete structures and logical relationships
This enables AlphaEvolve to modify everything from specific lines of code to entire algorithmic approaches.
While AlphaEvolve currently uses an evolutionary database, a knowledge graph structure could significantly enhance self-evolution by:
Capturing evolutionary relationships between solutions
Identifying patterns of code changes that consistently lead to improvements
Representing semantic connections between different solution approaches
Supporting transfer learning across problem domains
Automated, objective evaluation is the core foundation enabling self-evolution:
The main limitation of AlphaEvolve is that it only handles problems for which it is possible to devise an automated evaluator.
This evaluation component provides the "ground truth" feedback that guides evolution, allowing the system to:
Differentiate between successful and unsuccessful modifications
Create selection pressure toward better-performing solutions
Avoid hallucinations or non-functional solutions that might emerge from neural components alone.
When applied to optimize Gemini's training kernels, the system essentially improved the very LLM technology that powers it.
Fine-tune an LLM for triplet extraction
Do you want to fine-tune an LLM model for triplet extraction?
These findings from a recently published paper (first comment) could save you much time.
✅ Does the choice of coding vs natural language prompts significantly impact performance? When fine-tuning these open weights and small LLMs, the choice between code and natural language prompts has a limited impact on performance.
✅ Does training fine-tuned models to include chain-of-thought (rationale) sections in their outputs improve KG construction (KGC) performance? It is ineffective at best and highly detrimental at worst for fine-tuned models. This performance decrease is observed regardless of the number of in-context learning examples provided. Attention analysis suggests this might be due to the model's attention being dispersed on redundant information when rationale is used. Without rationale lists occupying prompt space, the model's attention can focus directly on the ICL examples while extracting relations.
✅ How do the fine-tuned smaller, open-weight LLMs perform compared to the CodeKGC baseline, which uses larger, closed-source models (GPT-3.5)? The selected lightweight LLMs significantly outperform the much larger CodeKGC baseline after fine-tuning. The best fine-tuned models improve upon the CodeKGC baseline by as much as 15–20 absolute F1 points across the dataset.
✅ Does model size matter for KGC performance when fine-tuning with a small amount of training data? Yes, but not in a straightforward way. The 70B-parameter versions yielded worse results than the 1B, 3B, and 8B models when undergoing the same small amount of training. This implies that for KGC with limited fine-tuning, smaller models can perform better than much larger ones.
✅ For instruction-tuned models without fine-tuning, does prompt language or rationale help? For models without fine-tuning, using code prompts generally yields the best results for both code LLMs and the Mistral natural language model. In addition, using rationale generally seems to help these models, with most of the best results obtained when including rationale lists in the prompt.
✅ What do the errors made by the models suggest about the difficulty of the KGC task? They point to difficulty in predicting relations, entities, and their order, especially when dealing with specialized terminology or specific domain knowledge, which remains a challenge even after fine-tuning. Some errors include adding superfluous adjectives or mistaking entity instances for class names.
✅ What is the impact of the number of in-context learning (ICL) examples during fine-tuning? The greatest performance benefit is obtained when moving from 0 to 3 ICL examples. However, additional ICL examples beyond 3 do not lead to any significant performance delta and can even lead to worse results. This further indicates that the fine-tuning process itself is the primary driver of performance gain, allowing the model to learn the task from the input text and target output.
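For readers unfamiliar with the code-versus-natural-language distinction, the sketch below builds one training example in each style. Both templates are hypothetical illustrations and are not the prompt formats used in the paper.

```python
text = "Marie Curie was born in Warsaw."
triple = ("Marie Curie", "born_in", "Warsaw")

def natural_language_example(text, triple):
    """Prompt/target pair phrased as a plain instruction."""
    head, relation, tail = triple
    prompt = (
        "Extract all (head, relation, tail) triples from the text.\n"
        f"Text: {text}\n"
        "Triples:"
    )
    target = f"({head}, {relation}, {tail})"
    return prompt, target

def code_example(text, triple):
    """Prompt/target pair phrased as code to be completed."""
    head, relation, tail = triple
    prompt = (
        f'text = "{text}"\n'
        "# Extract the triples in the text\n"
        "triples = ["
    )
    target = f'Triple("{head}", "{relation}", "{tail}")]'
    return prompt, target

nl_prompt, nl_target = natural_language_example(text, triple)
code_prompt, code_target = code_example(text, triple)
```

A rationale (chain-of-thought) variant would add an explanation before the target, which per the findings above is the part to leave out when fine-tuning.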
NodeRAG restructures knowledge into a heterograph: a rich, layered, musical graph where each node plays a different role
NodeRAG restructures knowledge into a heterograph: a rich, layered, musical graph where each node plays a different role.
It’s not just smarter retrieval. It’s structured memory for AI agents.
》 Why NodeRAG?
Most Retrieval-Augmented Generation (RAG) methods retrieve chunks of text. Good enough — until you need reasoning, precision, and multi-hop understanding.
This is how NodeRAG solves these problems:
》 🔹Step 1: Graph Decomposition
NodeRAG begins by decomposing raw text into smart building blocks:
✸ Semantic Units (S): Little event nuggets ("Hinton won the Nobel Prize.")
✸ Entities (N): Key names or concepts ("Hinton", "Nobel Prize")
✸ Relationships (R): Links between entities ("awarded to")
✩ This is like teaching your AI to recognize the actors, actions, and scenes inside any document.
》 🔹Step 2: Graph Augmentation
Decomposition alone isn't enough. NodeRAG augments the graph by identifying important hubs:
✸ Node Importance: Using K-Core and Betweenness Centrality to find critical nodes
✩ Important entities get special attention — their attributes are summarized into new nodes (A).
✸ Community Detection: Grouping related nodes into communities and summarizing them into high-level insights (H).
✩ Each community gets a "headline" overview node (O) for quick retrieval.
It's like adding context and intuition to raw facts.
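As an illustration of the node-importance part of this step, here is a dependency-free k-core decomposition. It is the standard peeling algorithm and not NodeRAG's own code; the toy graph and the threshold of 2 are assumptions, and the adjacency dictionary must be symmetric (undirected).

```python
def core_numbers(adjacency: dict) -> dict:
    """Compute each node's core number by repeatedly peeling the lowest-degree node."""
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neigh in adjacency[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core

# Toy entity graph: a triangle of well-connected entities plus one pendant node.
adjacency = {
    "Hinton": {"Nobel Prize", "Physics", "Toronto"},
    "Nobel Prize": {"Hinton", "Physics"},
    "Physics": {"Hinton", "Nobel Prize"},
    "Toronto": {"Hinton"},
}

core = core_numbers(adjacency)
important = sorted(n for n, c in core.items() if c >= 2)
```

Nodes in the 2-core are the candidates that would receive attribute summaries; the pendant node is left as a plain entity.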
》 🔹 Step 3: Graph Enrichment
Knowledge without detail is brittle. So NodeRAG enriches the graph:
✸ Original Text: Full chunks are linked back into the graph (Text nodes, T)
✸ Semantic Edges: Using HNSW for fast, meaningful similarity connections
✩ Only smart nodes are embedded (not everything!) — saving huge storage space.
✩ Dual search (exact + vector) makes retrieval laser-sharp.
It’s like turning a 2D map into a 3D living world.
》 🔹 Step 4: Graph Searching
Now comes the magic.
✸ Dual Search: First find strong entry points (by name or by meaning)
✸ Shallow Personalized PageRank (PPR): Expand carefully from entry points to nearby relevant nodes.
✩ No wandering into irrelevant parts of the graph. The search is surgical.
✩ Retrieval includes fine-grained semantic units, attributes, high-level elements — everything you need, nothing you don't.
It’s like sending out agents into a city — and they return not with everything they saw, but exactly what you asked for, summarized and structured.
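The "shallow" part of the search can be shown with a few lines of power iteration. This is a generic personalized PageRank sketch, not NodeRAG's implementation; the restart weight `alpha=0.5`, the two iterations, and the chain-shaped toy graph are all assumptions chosen to make the effect visible.

```python
def shallow_ppr(adjacency, entry_points, alpha=0.5, iterations=2):
    """Personalized PageRank with few iterations, so mass stays near the entry points."""
    restart = {
        n: (1.0 / len(entry_points) if n in entry_points else 0.0)
        for n in adjacency
    }
    scores = dict(restart)
    for _ in range(iterations):
        spread = {n: 0.0 for n in adjacency}
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue
            share = scores[node] / len(neighbours)
            for neigh in neighbours:
                spread[neigh] += share
        scores = {
            n: alpha * restart[n] + (1 - alpha) * spread[n] for n in adjacency
        }
    return scores

# A chain a - b - c - d - e, searched from entry point "a".
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = shallow_ppr(adjacency, entry_points={"a"})
# scores == {"a": 0.625, "b": 0.25, "c": 0.125, "d": 0.0, "e": 0.0}
```

With two iterations, nodes more than two hops from the entry point receive no score at all, which is the "no wandering" behaviour described above.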
》 Results: NodeRAG's Performance
Compared to GraphRAG, LightRAG, NaiveRAG, and HyDE — NodeRAG wins across every major domain: Tech, Science, Writing, Recreation, and Finance.
NodeRAG isn’t just a better graph. NodeRAG is a new operating system for memory.
RAKG: Document-level Retrieval Augmented Knowledge Graph Construction
Contribute to LMMApplication/RAKG development by creating an account on GitHub.
Knowledge Graph of Thoughts
Official Implementation of "Affordable AI Assistants with Knowledge Graph of Thoughts" - spcl/knowledge-graph-of-thoughts
if you believe that LLMs need graphs to reason, you are right and now you have evidence: Claude answers questions by building and traversing a graph
To all the knowledge graph enthusiasts who've felt for a while that "graphs are the way to go" when it comes to enabling "intelligence," it was interesting to read Anthropic's "Tracing the thoughts of a large language model" - if you believe that LLMs need graphs to reason, you are right and now you have evidence: Claude answers questions by building and traversing a graph (in latent space) before it translates it back to language:
https://lnkd.in/eWFWwfN4
Affordable AI Assistants with Knowledge Graph of Thoughts
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face...
What if your LLM is… a graph?
A few days ago, Petar Veličković from Google DeepMind gave one of the most interesting and thought-provoking talks I've seen in a while, "Large Language Models as Graph Neural Networks". Once you start seeing an LLM as a graph neural network, many structural oddities suddenly fall into place.
For instance, OpenAI currently recommends putting instructions at the top of a long prompt. Why? Because of the geometry of attention graphs, LLMs are counter-intuitively biased in favor of the first tokens: they travel continuously through each generation step, are internally repeated a lot, and end up "over-squashing" the later ones. Models then use a variety of internal transforms such as softmax to moderate this bias and better weight the distribution, but this is a late patch that cannot fix long-range attention deficiencies, even more so with long contexts.
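One way to make this bias concrete is to count paths in the causal attention graph. The sketch below is my own illustration, not material from the talk: it assumes unweighted causal attention, where a token at each layer can read from every token at or before its position, and counts how many distinct routes carry each input token to the final position.

```python
def paths_to_last_token(n_tokens: int, n_layers: int) -> list:
    """Count paths from each input token to the last token at the top layer."""
    counts = []
    for source in range(n_tokens):
        reach = [1 if j == source else 0 for j in range(n_tokens)]
        for _ in range(n_layers):
            running, nxt = 0, []
            for j in range(n_tokens):
                running += reach[j]  # token j attends to every token k <= j
                nxt.append(running)
            reach = nxt
        counts.append(reach[-1])
    return counts

counts = paths_to_last_token(n_tokens=6, n_layers=3)
# counts == [21, 15, 10, 6, 3, 1]
```

The first token reaches the output along 21 routes and the last along a single one, and the gap widens with depth and sequence length.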
The most interesting aspect of the talk from an applied perspective: graph/geometric representations directly affect accuracy and robustness. As generated sequences grow and involve chains of complex reasoning steps, we cannot build solid expert systems when attention graphs have single points of failure. Or at least, not without extracting this information in the first place and providing more detailed accuracy metrics.
I do believe LLM explainability research is largely underexploited right now, despite apparently being a key component of LLM devops in big labs. If anything, this is literal "prompt engineering": seeing models as nearly physical structures under stress and providing the right feedback loops to make them more reliable.
Choosing the Right Format: How Knowledge Graph Layouts Impact AI Reasoning
👉 Why This Matters
Most AI systems blend knowledge graphs (structured data) with large language models (flexible reasoning). But there’s a hidden variable: how you translate the graph into text for the AI. Researchers discovered that the formatting choice alone can swing performance by up to 17.5% on reasoning tasks. Imagine solving 1 in 5 more problems correctly just by adjusting how you present data.
👉 What They Built
KG-LLM-Bench is a new benchmark to test how language models reason with knowledge graphs.
It includes five tasks:
- Triple verification (“Does this fact exist?”)
- Shortest path finding (“How are two concepts connected?”)
- Aggregation (“How many entities meet X condition?”)
- Multi-hop reasoning (“Which entities linked to A also have property B?”)
- Global analysis (“Which node is most central?”)
The team tested seven models (Claude, GPT-4o, Gemini, Llama, Nova) with five ways to “textualize” graphs, from simple edge lists to structured JSON and semantic web formats like RDF Turtle.
👉 Key Insights
1. Format matters more than assumed:
- Structured JSON and edge lists performed best overall, but results varied by task.
- For example, JSON excels at aggregation tasks (data is grouped by entity), while edge lists help identify central nodes (repeated mentions highlight connections).
2. Models don’t cheat:
Replacing real entity names with fake ones (e.g., “France” → “Verdania”) caused only a 0.2% performance drop, proving models rely on context, not memorized knowledge.
3. Token efficiency:
- Edge lists used ~2,600 tokens vs. JSON-LD’s ~13,500. Shorter formats free up context space for complex reasoning.
- But concise ≠ always better: structured formats improved accuracy for tasks requiring grouped data.
4. Models struggle with directionality:
Counting outgoing edges (e.g., “Which countries does France border?”) is easier than incoming ones (“Which countries border France?”), likely due to formatting biases.
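To show what "textualizing" a graph means in practice, here is a small sketch with two of the formats and the pseudonymization trick. The functions are illustrative and do not reproduce the benchmark's exact serializations; the toy triples and the name mapping are made up.

```python
import json
from collections import defaultdict

triples = [
    ("France", "borders", "Spain"),
    ("France", "borders", "Germany"),
    ("France", "capital", "Paris"),
    ("Spain", "capital", "Madrid"),
]

def to_edge_list(triples):
    """One line per edge; entities repeat once for every edge they take part in."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)

def to_structured_json(triples):
    """Edges grouped by head entity and relation."""
    grouped = defaultdict(lambda: defaultdict(list))
    for h, r, t in triples:
        grouped[h][r].append(t)
    return json.dumps(grouped, indent=2)

def pseudonymize(triples, mapping):
    """Swap real entity names for invented ones before textualizing."""
    return [(mapping.get(h, h), r, mapping.get(t, t)) for h, r, t in triples]

edge_text = to_edge_list(triples)
json_text = to_structured_json(triples)
masked = pseudonymize(triples, {"France": "Verdania"})
```

The JSON form puts everything France borders in one list, which suits aggregation questions, while the edge list mentions France once per edge and stays more compact.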
👉 Practical Takeaways
- Optimize for your task: Use JSON for aggregation, edge lists for centrality.
- Test your model: The best format depends on the LLM—Claude thrived with RDF Turtle, while Gemini preferred edge lists.
- Don’t fear pseudonyms: Masking real names minimally impacts performance, useful for sensitive data.
The benchmark is openly available, inviting researchers to add new tasks, graphs, and models. As AI handles larger knowledge bases, choosing the right “data language” becomes as critical as the reasoning logic itself.
Paper: [KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs]
Authors: Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
Our first attempts at mechanistic interpretability of Transformers from the perspective of network science and graph theory! Check out our preprint: arxiv.org/abs/2502.12352
A wonderful collaboration with superstar MPhil students Batu El, Deepro Choudhury, as well as Pietro Lio' as part of the Geometric Deep Learning class last year at University of Cambridge Department of Computer Science and Technology
We were motivated by Demis Hassabis describing AlphaFold and other AI systems for scientific discovery as ‘engineering artifacts’. We need new tools to interpret the underlying mechanisms and advance our scientific understanding. Graph Transformers are a good place to start.
The key ideas are:
- Attention across multiple heads and layers can be seen as a heterogeneous, dynamically evolving graph.
- Attention graphs are complex systems that represent information flow in Transformers.
- We can use network science to extract mechanistic insights from them!
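As a rough illustration of the idea, and not the method from the preprint, the sketch below turns one head's attention matrix into a directed graph by thresholding and then computes a basic network-science statistic, in-degree. The matrix values and the 0.2 threshold are made up.

```python
def attention_graph(attention, threshold=0.2):
    """Directed edges (query -> key, weight) for attention weights above a threshold."""
    return [
        (q, k, w)
        for q, row in enumerate(attention)
        for k, w in enumerate(row)
        if w >= threshold
    ]

def in_degree(edges, n_nodes):
    """How many queries attend to each key node."""
    counts = [0] * n_nodes
    for _, key, _ in edges:
        counts[key] += 1
    return counts

# Toy row-stochastic attention matrix for 3 nodes (made-up numbers).
attention = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.1, 0.4],
]

edges = attention_graph(attention)
hubs = in_degree(edges, n_nodes=3)
# hubs == [3, 2, 1]
```

Node 0 is attended to by every query, so it acts as a hub for information flow in this head; repeating the analysis per head and per layer gives the evolving graph described above.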
More to come on the network science perspective to understanding LLMs next!