Looking to improve the performance of Cypher queries or learn how to model graphs to support business use cases? A graph database like Neo4j can help. In fact, many enterprises are... - Selection from Neo4j: The Definitive Guide [Book]
Over two years ago, I wrote about the emerging synergy between LLMs and ontologies - and how, together, they could create a self-reinforcing loop of continuous improvement. That post struck a chord.
With GPT-5 now here, it’s the right moment to revisit the idea.
Back then, GPT-3.5 and GPT-4 could draft ontology structures, but there were limits in context, reasoning, and abstraction.
With GPT-5 (and other frontier models), that’s changing:
🔹 Larger context windows let entire ontologies sit in working memory at once.
🔹 Test-time compute enables better abstraction of concepts.
🔹 Multimodal input can turn diagrams, tables, and videos into structured ontology scaffolds.
🔹 Tool use allows ontologies to be validated, aligned, and extended in one flow.
But some fundamentals remain. GPT-5 is still curve-fitting to a training set - and that brings limits:
🔹 The flipside of flexibility is hallucination. OpenAI has reduced it, but GPT-5 still scores 0.55 on SimpleQA, with a 5% hallucination rate on its own public-question dataset.
🔹 The model is bound by the landscape of its training data. That landscape is vast, but it excludes your private, proprietary data - and increasingly, an organisation’s edge will track directly to the data it owns outside that distribution.
Fortunately, the benefits flow both ways. LLMs can help build ontologies, but ontologies and knowledge graphs can also help improve LLMs. The two systems can work in tandem.
Ontologies bring structure, consistency, and domain-specific context.
LLMs bring adaptability, speed, and pattern recognition that ontologies can’t achieve in isolation.
Each offsets the other’s weaknesses - and together they make both stronger.
The feedback loop is no longer theory - we’ve been proving it:
Better LLM → Better Ontology → Better LLM - in your domain.
There is a lot of hype around AI. GPT-5 is good, but not ground-breaking. Still, the progress over two years is remarkable. For the foreseeable future, we are living in a world where models keep improving - but where we must pair classic formal symbolic systems with these new probabilistic models.
For organisations, the challenge is to match growing model power with equally strong growth in the power of their proprietary symbolic formalisation. Not all formalisations are equal. We want fewer brittle IF statements buried in application code, and more rich, flexible abstractions embedded in the data itself. That’s what ontologies and knowledge graphs promise to deliver.
Two years ago, this was a hopeful idea.
Today, it’s looking less like a nice-to-have…
…and more like the only sensible way forward for organisations.
⭕ Neural-Symbolic Loop: https://lnkd.in/eJ7S22hF
🔗 Turn your data into a competitive edge: https://lnkd.in/eDd-5hpV
Palantir hit $175/share because they understand what 99% of AI companies don't: ontologies
palantir hit $175/share because they understand what 99% of AI companies don't:
ontologies.
in 2021, the word "ontology" appeared 0 times in their earnings calls. by Q3 2024? 9 times.
their US commercial revenue is growing 153% YoY.
why?
because LLMs are becoming the commodity, while ontologies are becoming the moat.
let me explain why most enterprise AI initiatives are failing without one:
every enterprise has the same problem:
47 different systems ❗️
19 definitions of "customer" ❗️
34 versions of "product" ❗️
business logic scattered across 100+ applications ❗️
you throw AI at something like this? it hallucinates. but if you build an ontology first? it gains the context and data depth to be able to reason.
palantir figured this out years ago.
but here's what palantir doesn't do: verticalize at scale.
they're brilliant at defense, government, contracting. but specialized industries need specialized ontologies.
take telecommunications. a telco's "customer" isn't just a record - it's:
➕ a subscriber with multiple services
➕ a hierarchy of accounts and sub-accounts
➕ real-time network states
➕ billing cycles across geographies
➕ regulatory compliance per jurisdiction
Orgs have tried to standardize this before. but standards aren't ontologies. they're just vocabularies.
this is why Totogi has spent so much time and effort building their telco-specific ontology layer
while palantir was perfecting horizontal enterprise ontologies, we went deep on telecom's unique semantic complexity.
now telcos can deploy AI that takes one action - 'activate new customer' - and correctly translates it across systems that call it 'create subscriber' (BSS), 'provision user' (network), 'establish account' (billing), and 'initialize profile' (CRM). No more manual steps, no more dropped handoffs between systems.
palantir proved the model. but they can't be everywhere.
the future belongs to industry-specific semantic platforms like Totogi's BSS Magic 🚀
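To make the 'activate new customer' example concrete, here is a minimal sketch of the lookup an ontology layer would perform. The four system names and their local terms come from the post; the dictionary layout and the translate_action function are hypothetical illustrations, not Totogi's implementation.

```python
from __future__ import annotations

# One canonical action mapped to each system's own vocabulary.
# System names and terms are taken from the post; the structure is assumed.
CANONICAL_ACTIONS = {
    "activate new customer": {
        "BSS": "create subscriber",
        "network": "provision user",
        "billing": "establish account",
        "CRM": "initialize profile",
    }
}


def translate_action(action: str) -> dict[str, str]:
    """Return the per-system operation names for one canonical action."""
    try:
        # Copy so callers cannot mutate the shared mapping.
        return dict(CANONICAL_ACTIONS[action])
    except KeyError:
        raise ValueError(f"no ontology entry for action: {action!r}") from None


plan = translate_action("activate new customer")
for system, operation in plan.items():
    print(f"{system}: {operation}")
```

An unknown action raises an error instead of guessing, which is the behaviour you want when the alternative is a dropped handoff between systems.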
Workshop from @FalkorDB and ZEP (Graphiti): Building Production Knowledge Graphs from Structured/Unstructured Data Sources. 👩‍💻 Google Colab for the demo:...
Hydra is a unique functional programming language based on the LambdaGraph data model.
In case you were wondering what I have been up to lately, Hydra is a large part of it. This is the open source graph programming language I alluded to last year at the Knowledge Graph Conference. Hydra is almost ready for its 1.0 release, and I am planning on making it into a community project, possibly through the Apache Incubator.
In this initial demo video, we take an arbitrary tabular dataset and use Hydra + Claude to map it into a property graph. More specifically, we use the LLM once to construct a pair of schemas and a mapping. From there, we apply the mapping deterministically and efficiently to each row of data, without additional calls to the LLM. The recording was a little too long for LinkedIn, so I broke it into two parts. I will post part 2 momentarily (edit: part 2 is here: https://lnkd.in/gZmHicXu). More videos will follow as we get closer to the release.
GitHub: https://lnkd.in/g8v2hvd5
Discord: https://bit.ly/lg-discord
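The "use the LLM once, then apply the mapping deterministically" idea can be sketched in plain Python. This is not Hydra syntax: the MAPPING structure, the column names, and apply_mapping are all hypothetical stand-ins for whatever schema pair and mapping the LLM would produce in the demo.

```python
# Hypothetical mapping of the kind an LLM might be asked to produce once.
# Plain Python for illustration only; not Hydra's actual representation.
MAPPING = {
    "nodes": [
        {"label": "Person", "id_column": "person_id", "properties": ["name"]},
        {"label": "Company", "id_column": "company_id", "properties": ["company_name"]},
    ],
    "edges": [
        {"type": "WORKS_AT", "from": "person_id", "to": "company_id"},
    ],
}


def apply_mapping(rows, mapping):
    """Turn tabular rows into property-graph nodes and edges, with no LLM calls."""
    nodes = {}  # (label, id) -> property dict; repeated ids are merged
    edges = []  # (source id, edge type, target id)
    for row in rows:
        for spec in mapping["nodes"]:
            key = (spec["label"], row[spec["id_column"]])
            props = {name: row[name] for name in spec["properties"]}
            nodes.setdefault(key, {}).update(props)
        for spec in mapping["edges"]:
            edges.append((row[spec["from"]], spec["type"], row[spec["to"]]))
    return nodes, edges


# Invented sample rows.
rows = [
    {"person_id": "p1", "name": "Ada", "company_id": "c1", "company_name": "Acme"},
    {"person_id": "p2", "name": "Bo", "company_id": "c1", "company_name": "Acme"},
]
nodes, edges = apply_mapping(rows, MAPPING)
```

Because the mapping is ordinary data, the per-row step costs no tokens and gives the same output on every run.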
Semantic Data in Medallion Architecture: Enterprise Knowledge Graphs at Scale | LinkedIn
Building Enterprise Knowledge Graphs Within Modern Data Platforms - Version 26 Louie Franco III Enterprise Architect - Knowledge Graph Architect - Semantics Architect August 3, 2025 In my previous article on Data Vault Medallion Architecture, I outlined how structured data flows through Landing, Bro
Jessica Talisman has been publishing a series of articles on Substack about how to develop more robust AI systems by leveraging vocabularies, thesauri, tax...
A gentle introduction to DSPy for graph data enrichment | Kuzu
📢 Check out our latest blog post by Prashanth Rao, where we introduce the DSPy framework to help you build composable pipelines with LLMs and graphs. In the post, we dive into a fascinating dataset of Nobel laureates and their mentorship networks for a data enrichment task. 👇🏽
✅ The source data that contains the tree structures is enriched with data from the official Nobel Prize API.
✅ We showcase a 2-step methodology that combines the benefits of Kuzu's vector search capabilities with DSPy's powerful primitives to build an LLM-as-a-judge pipeline that helps disambiguate entities in the data.
✅ The DSPy approach is scalable, low-cost and efficient, and is flexible enough to apply to a wide variety of domains and use cases.
PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its...
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Graph-R1
New RAG framework just dropped!
Combines agents, GraphRAG, and RL.
Here are my notes:
Introduces a novel RAG framework that moves beyond traditional one-shot or chunk-based retrieval by integrating graph-structured knowledge, agentic multi-turn interaction, and RL.
Graph-R1 is an agent that reasons over a knowledge hypergraph environment by iteratively issuing queries and retrieving subgraphs using a multi-step “think-retrieve-rethink-generate” loop.
Unlike prior GraphRAG systems that perform fixed retrieval, Graph-R1 dynamically explores the graph based on evolving agent state.
Retrieval is modeled as a dual-path mechanism: entity-based hyperedge retrieval and direct hyperedge similarity, fused via reciprocal rank aggregation to return semantically rich subgraphs. These are used to ground subsequent reasoning steps.
The agent is trained end-to-end using GRPO with a composite reward that incorporates structural format adherence and answer correctness. Rewards are only granted if reasoning follows the proper format, encouraging interpretable and complete reasoning traces.
On six RAG benchmarks (e.g., HotpotQA, 2WikiMultiHopQA), Graph-R1 achieves state-of-the-art F1 and generation scores, outperforming prior methods including HyperGraphRAG, R1-Searcher, and Search-R1. It shows particularly strong gains on harder, multi-hop datasets and under OOD conditions.
The authors find that Graph-R1’s performance degrades sharply without its three key components: hypergraph construction, multi-turn interaction, and RL.
Ablation study supports that graph-based and multi-turn retrieval improves information density and accuracy, while end-to-end RL bridges the gap between structure and language.
Paper: https://lnkd.in/eGbf4HhX
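One way to read the "rewards are only granted if reasoning follows the proper format" note is as a format gate in front of an answer score. The sketch below is an assumption about the general shape, not the paper's formula: the tag names, the regex, and the use of token-level F1 as the correctness term are all illustrative choices.

```python
import re
from collections import Counter

# Assumed trace format: a <think> block followed by an <answer> block.
TRACE_PATTERN = re.compile(
    r"\s*<think>.+?</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def gated_reward(trace, reference):
    """Return 0 for a malformed trace, otherwise the answer's F1 score."""
    match = TRACE_PATTERN.fullmatch(trace)
    if match is None:
        return 0.0  # format gate: no reward without a well-formed trace
    return token_f1(match.group(1).strip(), reference)
```

With this shape, a correct answer dropped into an unformatted response earns nothing, which pushes the policy toward complete, inspectable traces.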
Our SPARQL Notebook extension for Visual Studio Code makes it super easy to document SPARQL queries and run them, either against live endpoints or directly on local RDF files. I just (finally!) published a 15-minute walkthrough on our YouTube channel Giant Global Graph. It gives you a quick overview of how it works and how you can get started.
Link in the comments.
Fun fact: I recorded this two years ago and apparently forgot to hit publish. Since then, we've added new features like improved table renderers with pivoting support, so it's even more useful now. Check it out!
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
Instead of just pulling facts, the system samples multi-step paths within the graph, such as a causal chain from a disease to a symptom, and translates these paths into natural language reasoning tasks complete with a step-by-step thinking trace
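As a rough sketch of that idea, the snippet below walks a few hops through a toy graph and renders the path as a question with a step-by-step trace. The graph contents, function names, and question template are invented for illustration and are not taken from the paper.

```python
import random

# Toy graph, invented for illustration: node -> list of (relation, neighbour).
GRAPH = {
    "influenza": [("causes", "fever")],
    "fever": [("leads to", "dehydration")],
    "dehydration": [("is treated with", "oral rehydration")],
}


def sample_path(graph, start, hops, rng):
    """Random walk of up to `hops` edges, returned as (head, relation, tail) triples."""
    path = []
    node = start
    for _ in range(hops):
        edges = graph.get(node)
        if not edges:
            break  # dead end: stop the walk early
        relation, tail = rng.choice(edges)
        path.append((node, relation, tail))
        node = tail
    return path


def path_to_task(path):
    """Render a sampled path as a question, a thinking trace, and an answer."""
    steps = [f"Step {i}: {h} {r} {t}." for i, (h, r, t) in enumerate(path, 1)]
    question = f"How is {path[0][0]} connected to {path[-1][2]}?"
    return {"question": question, "thinking_trace": steps, "answer": path[-1][2]}


path = sample_path(GRAPH, "influenza", hops=2, rng=random.Random(0))
task = path_to_task(path)
```

Each generated task carries its own reasoning chain, so the graph supplies both the question and the supervision for how to answer it.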
Alhamdulillah, iText2KG v0.0.8 is finally out!
(Yes, I’ve been quite busy these past few months 😅)
... and it can now build dynamic knowledge graphs. The GIF below shows a dynamic KG generated from OpenAI tweets between June 18 and July 17.
(Note: Temporal/logical conflicts aren't handled yet in this version, but you can still resolve them with a post-processing filter.)
Here are the main updated features:
- iText2KG_Star: Introduced a simpler and more efficient version of iText2KG that eliminates the separate entity extraction step. Instead of extracting entities and relations separately, iText2KG_Star directly extracts triplets from text. This approach is more efficient as it reduces processing time and token consumption and does not need to handle invented/isolated entities.
- Facts-Based KG Construction: Enhanced the framework with facts-based knowledge graph construction using the Document Distiller to extract structured facts from documents, which are then used for incremental KG building. This approach provides more exhaustive and precise knowledge graphs.
- Dynamic Knowledge Graphs: iText2KG now supports building dynamic knowledge graphs that evolve over time. By leveraging the incremental nature of the framework and document snapshots with observation dates, users can track how knowledge changes and grows.
Check out the new version and an example of OpenAI Dynamic KG Construction in the first comment.
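For intuition about "document snapshots with observation dates", here is a minimal sketch of incremental construction. It does not use the iText2KG API: add_snapshot, graph_as_of, and the placeholder triples are hypothetical, and like the release described above it makes no attempt to resolve temporal or logical conflicts.

```python
from datetime import date


def add_snapshot(kg, triples, observed_on):
    """Merge one snapshot's triples into the graph, tracking observation dates."""
    for triple in triples:
        entry = kg.setdefault(
            triple, {"first_seen": observed_on, "last_seen": observed_on}
        )
        entry["first_seen"] = min(entry["first_seen"], observed_on)
        entry["last_seen"] = max(entry["last_seen"], observed_on)
    return kg


def graph_as_of(kg, day):
    """Return the set of triples that had been observed on or before `day`."""
    return {t for t, e in kg.items() if e["first_seen"] <= day}


# Placeholder triples, not real extracted facts.
kg = {}
add_snapshot(kg, [("OrgA", "released", "ModelX")], date(2025, 6, 18))
add_snapshot(
    kg,
    [("OrgA", "released", "ModelX"), ("OrgA", "partnered_with", "OrgB")],
    date(2025, 7, 17),
)
early_view = graph_as_of(kg, date(2025, 6, 30))
```

Keeping first and last observation dates per triple is enough to replay how the graph grew between snapshots.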
Why Businesses Must Ground Their AI in Knowledge Graphs | LinkedIn
Here, I clearly explain why businesses must transition from raw tabular data to RDF-based knowledge graphs, and why this is essential to ground AI in logic-driven, traceable inference rather than black-box prediction: 1. Your tabular data is dumb.
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Scaling GraphRAG to Millions of Documents: Lessons from the SIGIR 2025 LiveRAG Challenge
👉 WHY THIS MATTERS
Retrieval-augmented generation (RAG) struggles with multi-hop questions that require connecting information across documents. While graph-based RAG methods like GEAR improve reasoning by structuring knowledge as entity-relationship triples, scaling these approaches to web-sized datasets (millions/billions of documents) remains a bottleneck. The culprit? Traditional methods rely heavily on LLMs to extract triples—a process too slow and expensive for large corpora.
👉 WHAT THEY DID
Researchers from Huawei and the University of Edinburgh reimagined GEAR to sidestep costly offline triple extraction.
Their solution:
- Pseudo-alignment: Link retrieved passages to existing triples in Wikidata via sparse retrieval.
- Iterative expansion: Use a lightweight LLM (Falcon-3B-Instruct) to iteratively rewrite queries and retrieve additional evidence through Wikidata’s graph structure.
- Multi-step filtering: Combine Reciprocal Rank Fusion (RRF) and prompt-based filtering to reconcile noisy alignments between Wikidata and document content.
This approach achieved 87.6% correctness and 53% faithfulness on the SIGIR 2025 LiveRAG benchmark, despite challenges in aligning Wikidata’s generic triples with domain-specific document content.
👉 KEY INSIGHTS
1. Trade-offs in alignment: Linking Wikidata triples to documents works best for general knowledge but falters with niche topics (e.g., "Pacific geoduck reproduction" mapped incorrectly to oyster biology).
2. Cost efficiency: Avoiding LLM-based triple extraction reduced computational overhead, enabling scalability.
3. The multi-step advantage: Query rewriting and iterative retrieval improved performance on complex questions requiring 2+ reasoning hops.
👉 OPEN QUESTIONS
- How can we build asymmetric semantic models to better align text and graph data?
- Can hybrid alignment strategies (e.g., blending domain-specific KGs with Wikidata) mitigate topic drift?
- Does graph expansion improve linearly with scale, or are diminishing returns inevitable?
Why read this paper?
It’s a pragmatic case study in balancing scalability with reasoning depth in RAG systems. The code and prompts are fully disclosed, offering a blueprint for adapting GraphRAG to real-world, large-scale applications.
Paper: "Millions of G∈AR-s: Extending GraphRAG to Millions of Documents" (Shen et al., SIGIR 2025). Preprint: arXiv:2307.17399.
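For readers unfamiliar with the Reciprocal Rank Fusion step mentioned above, here is a minimal sketch of the standard formula, where each document scores the sum of 1 / (k + rank) across the input rankings. The value k=60 is the commonly used default and is an assumption here; the post does not say which value the authors used, and the document ids are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Invented example: two retrievers that partly agree.
sparse_hits = ["d1", "d2", "d3"]
graph_hits = ["d2", "d3", "d4"]
fused = reciprocal_rank_fusion([sparse_hits, graph_hits])
print(fused)
```

Documents that appear in both lists rise above documents that top only one, which is why RRF is a common choice for reconciling noisy retrieval paths.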
As the fifth most popular website on the Internet, keeping Wikipedia running smoothly is no small feat. The free encyclopedia hosts more than 65 million
This is the title of my upcoming book. And it’s all about the Shapes Constraint Language (SHACL). Expected release before November 1st 2025. The book is written and illustrated by Veronika He…
I've spent long, hard years learning how to talk about knowledge graphs and semantics with software engineers who have little training in linguistics. I feel quite fluent at this point, after investing huge amounts of effort into understanding statistics (I was a humanities undergrad) and into unpac