RDF, the Semantic Web Project, Linked Data, and Knowledge Graphs have always promised an Internet where data is richly interconnected, queryable, and semantically coherent. The vision was not flawed, but adoption has remained elusive.
T-Box: The secret sauce of knowledge graphs and AI
Ever wondered how knowledge graphs “understand” the world? Meet the T-Box, the part that tells your graph what exists and how it can relate.
Think of it like building a LEGO set:
T-Box (Terminological Box) = the instruction manual (defines the pieces and how they fit)
A-Box (Assertional Box) = the LEGO pieces you actually have (your data, your instances)
Why it’s important for RDF knowledge graphs:
- Gives your data structure and rules, so your graph doesn’t turn into spaghetti
- Enables reasoning, letting the system infer new facts automatically
- Keeps your graph consistent and maintainable, even as it grows
Why it’s better than other models:
- Traditional databases just store rows and columns; foreign keys enforce relationships but say nothing about what they mean
- RDF + T-Box = data that can explain itself and connect across domains
Why AI loves it:
- AI can reason over knowledge, not just crunch numbers
- Enables smarter recommendations, insights, and predictions based on structured knowledge
Quick analogy:
T-Box = blueprint/instruction manual (the ontology / what is possible)
A-Box = the real-world building (the facts / what is true)
Together = AI-friendly, smart knowledge graph
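To make the split concrete, here is a minimal sketch using Python and rdflib; the ex: namespace, the Dog/Animal/Person classes, and the data are invented for illustration:

```python
# A minimal T-Box / A-Box sketch with rdflib (pip install rdflib).
# The ex: namespace and all class/instance names are made up.
from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# T-Box: what can exist and how it can relate (the instruction manual)
ex:Dog     rdfs:subClassOf ex:Animal .
ex:ownsPet rdfs:domain ex:Person ;
           rdfs:range  ex:Animal .

# A-Box: the pieces you actually have (the facts)
ex:alice a ex:Person ;
         ex:ownsPet ex:rex .
ex:rex   a ex:Dog .
""")

# Reasoning: nothing states that ex:rex is an ex:Animal, but the T-Box lets
# us derive it. Here we follow rdfs:subClassOf with a SPARQL property path;
# a reasoner (e.g. the owlrl package) could materialise the same inference.
q = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?animal WHERE { ?animal a/rdfs:subClassOf* ex:Animal . }
"""
for row in g.query(q):
    print(row.animal)   # -> http://example.org/rex
```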
#KnowledgeGraph #RDF #AI #SemanticWeb #DataScience #GraphData
Comparing LPG and RDF in Recent Graph RAG Architectures
As a follow-up to my previous posts and discussions, I would like to share three papers on arXiv that demonstrate the wide range of design choices in combining LPG and RDF. Here’s a brief overview of each:
1. RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF
arXiv:2412.17690
This paper builds on RDF knowledge graphs. Rather than relying solely on SPARQL queries, it establishes two retrieval pathways: one from an SQL database generated from the KG, and another from text searches over verbalised RDF facts. A controller decides when to combine or switch between them, with results passed to an LLM. The insight: RDF alone is not robust enough for conversational queries, but pairing it with SQL and text dramatically improves coverage and resilience.
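This is not the paper's code, but a rough sketch of the controller idea, with sql_search, text_search, and ask_llm as assumed stand-ins for the real SQL engine, the text index over verbalised facts, and the LLM client:

```python
# Illustrative sketch only (not RAGONITE's implementation): a controller that
# combines a SQL pathway over a database induced from the KG with text search
# over verbalised RDF facts, iterating until the LLM is satisfied.
from typing import Callable

def hybrid_answer(question: str,
                  sql_search: Callable[[str], list[str]],
                  text_search: Callable[[str], list[str]],
                  ask_llm: Callable[[str, list[str]], str],
                  max_rounds: int = 3) -> str:
    evidence: list[str] = []
    for _ in range(max_rounds):
        # Crude routing heuristic: aggregation-style questions favour SQL,
        # everything else starts with text search over verbalised triples.
        if any(k in question.lower() for k in ("how many", "average", "total")):
            evidence += sql_search(question)
        else:
            evidence += text_search(question)
        answer = ask_llm(question, evidence)
        if answer != "INSUFFICIENT":          # LLM signals when to iterate
            return answer
        # Switch pathways on the next round to widen coverage.
        sql_search, text_search = text_search, sql_search
    return ask_llm(question, evidence)
```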
2. GraphAr: Efficient Storage for Property Graphs in Data Lakes
arXiv:2312.09577
This paper addresses LPGs. It introduces a storage scheme that preserves LPG semantics in formats such as Parquet, while significantly boosting performance. Reported gains are impressive: neighbour retrieval is ~4452× faster, label filtering 14.8× faster, and end-to-end workflows 29.5× faster compared to baseline Parquet methods. Such optimisations are critical for GraphRAG, where low-latency retrieval is essential.
3. CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
arXiv:2412.18702
This work brings a benchmarking perspective, targeting Cypher queries over large-scale LPGs. It emphasises precision retrieval across full-scale graphs, something crucial when LLMs are expected to interact with enterprise-scale knowledge. By formalising benchmarks, it encourages more rigorous evaluation of GraphRAG retrieval techniques and raises the bar for future architectures.
Takeaway
Together, these works highlight the diverse strategies for bridging RDF and LPG in GraphRAG — from hybrid retrieval pipelines to optimised storage and precision benchmarks. They show how research is steadily moving from demos to architectures that balance semantics, performance, and accuracy.
AI-Assisted Ontology Mapping
Ontology alignment, glossary mapping, semantic integration - none of these are new. For decades we have relied on TF-IDF, WordNet, property matching, and supervised models. They work - but they remain rule-bound.
The new Google + Harvard research (2025-09-08) signals a paradigm shift:
Ontologies are no longer static.
Every conceptual decision can be treated as a measurable task.
Ontologies as Living Systems
An ontology is not a document.
It is a formalized knowledge backbone, where:
- Concepts are expressed declaratively (OWL, RDF, OntoUML)
- Relations exist as axioms
- Every inference is machine-checkable
In this world, the semantic layer isn’t a BI artifact - it’s the formal contract of meaning: business glossaries, KPIs, and data attributes all refer to the same conceptual entities.
Measuring Ontological Precision
The Google–Harvard approach reframes ontology engineering as scorable tasks:
- Mapping-F1 → accuracy of mappings between glossaries and semantic layers.
- Alignment% → conceptual overlap between ontologies.
- Consistency → are KPI definitions aligned with their OWL/RDF axioms?
Once we define these metrics, semantic mappings stop being static deliverables. They become living quality signals - ontological KPIs.
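For intuition, here is a minimal way to compute a Mapping-F1 of this kind, treating each mapping as a (glossary term, semantic-layer target) pair; the KPI names below are invented:

```python
# A minimal sketch of "Mapping-F1": score a proposed glossary-to-semantic-layer
# mapping against a gold standard, where each mapping is a (term, target) pair.
def mapping_f1(predicted: set[tuple[str, str]],
               gold: set[tuple[str, str]]) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)                     # correct mappings
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("churn rate", "kpi:CustomerChurn"), ("net revenue", "kpi:NetRevenue")}
pred = {("churn rate", "kpi:CustomerChurn"), ("net revenue", "kpi:GrossRevenue")}
print(round(mapping_f1(pred, gold), 2))            # -> 0.5
```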
AI as a Sandbox Co-Scientist
The breakthrough is not automation. It’s the ability to generate, test, and validate conceptual hypotheses iteratively:
- LLM proposes alternative mapping strategies: embeddings, synonym discovery, definition-based similarity.
- Tree Search explores promising branches, sandbox-validating each.
- Research Injection pulls external knowledge - papers, books, benchmarks - into the loop.
In one small-scale ontology alignment task:
- Task: map 20 glossary terms into a semantic layer.
- Baseline: manual mapping → Mapping-F1 = 0.55.
- AI loop: hypotheses generated, sandbox-validated.
- Breakthrough: after 8 iterations, Mapping-F1 reached 0.91.
This isn’t “AI hallucination.”
It’s measured, validated ontology evolution.
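A hand-rolled sketch of that loop (greedy rather than tree search, reusing mapping_f1 from the sketch above; propose_mappings stands in for the LLM-backed hypothesis generator):

```python
# Greedy sketch of the generate -> sandbox-validate -> keep-the-best loop
# (not the paper's tree search). propose_mappings is an assumed helper that
# returns a candidate set of (term, target) pairs; mapping_f1 is reused from
# the sketch above as the sandbox score.
def refine(propose_mappings, gold, iterations: int = 8):
    best, best_f1 = set(), 0.0
    for step in range(iterations):
        candidate = propose_mappings(best)        # hypothesis generation
        f1 = mapping_f1(candidate, gold)          # sandbox validation
        if f1 > best_f1:                          # keep only improvements
            best, best_f1 = candidate, f1
        print(f"iteration {step + 1}: best Mapping-F1 = {best_f1:.2f}")
    return best, best_f1
```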
The Ontological Cockpit
An ontology cockpit tracks the health of your knowledge model:
- Mapping-F1 trends - how well glossaries and layers align.
- Alignment% by domain - where conceptual drift emerges.
- Consistency-break log - where KPI definitions diverge from formal models.
- Drift detection - alerts when semantics shift silently.
This cockpit is the dynamic mirror of formalism.
BI 2.0 dashboards can later inherit these metrics.
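As a toy illustration of drift detection, assuming you log Mapping-F1 per domain after every run:

```python
# Toy cockpit check: flag any domain whose Mapping-F1 drops by more than a
# threshold between consecutive runs. The history values are invented.
def drift_alerts(history: dict[str, list[float]],
                 threshold: float = 0.05) -> list[str]:
    alerts = []
    for domain, scores in history.items():
        if len(scores) >= 2 and scores[-2] - scores[-1] > threshold:
            alerts.append(
                f"{domain}: Mapping-F1 fell {scores[-2]:.2f} -> {scores[-1]:.2f}")
    return alerts

print(drift_alerts({"finance": [0.91, 0.84], "hr": [0.88, 0.89]}))
# -> ['finance: Mapping-F1 fell 0.91 -> 0.84']
```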
AI-Supported Formalism
Jessica Talisman - this is close to what you’ve been advocating:
formal knowledge models supported, not replaced, by AI.
- Sandbox validation ensures every hypothesis is tested and versioned.
- Research injection integrates state-of-the-art ontological heuristics.
- Ontologies evolve iteratively, without compromising formal rigor.
The Google + Harvard research shows us:
a semantic backbone that learns,
an ontology that continuously integrates new knowledge,
and a future where conceptual precision is measurable, auditable, and improvable.
Showrooms vs. Production Reality: Why is RDF still not widely used?
The debate around RDF never really goes away. Advocates highlight its strong foundations, interoperability and precision. However, critics point to its steep learning curve, unwieldy tools, and limited adoption beyond academia and government circles.
So why is RDF still a hard sell in enterprise settings? The answer lies less in ignorance and more in reality.
Enterprises operate in dynamic environments. Data is constantly being created, updated, versioned and retired. Complex CRUD operations, integration pipelines and governance processes are not exceptions, but part of the daily routine. RDF, with its emphasis on formal representation, often finds it difficult to keep up with this level of operational activity.
Performance matters, too. Systems that appear elegant in theory often encounter scaling and latency issues in practice. Enterprises cannot afford philosophical debates when customers expect instant results and compliance teams demand verifiable evidence.
Usability is another factor. While RDF tooling is powerful, it is geared towards specialists. Enterprises need platforms that are usable by architects, data stewards, analysts and developers, without requiring them to master semantic web standards.
Meanwhile, pragmatic approaches to GraphRAG — combining graph models with embeddings — are gaining traction. While they may lack the rigour of RDF, they offer faster integration, better performance and easier adoption. For many enterprises, 'good enough and working' is preferable to 'perfect but unused'.
This doesn’t mean that RDF has no place. It remains relevant in classical information systems where interoperability and formal semantics are essential, such as in the healthcare, government and regulated industries.
However, the centre of gravity has shifted. In today's LLM and GraphRAG pipelines, with all their complexity and pragmatic constraints, enterprises prioritise solutions that work, scale and can be trusted. Therefore, the real question may no longer be “Why don’t enterprises adopt RDF?”, but rather, “Can RDF remain relevant in the noisy, fast-moving world of enterprise AI?”
#KnowledgeGraphs #EnterpriseAI #GraphRAG #RDF #DataArchitecture #AIinEnterprise #LLM #AIAdoption
Ontology-driven vibe coding
Build a reliable app in a matter of minutes in just five steps:
1. Define concepts
2. Define relationships
3. Connect concepts through relationships
4. Define attributes
5. Connect attributes to concepts
Then click on 'Go to app' and you are ready to go!
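Hapsah itself is no-code, so you never write any of this, but for readers who think in RDF the five steps correspond roughly to the following conceptual-model statements (sketched with Python's rdflib; the Customer/Order names are invented and this is not Hapsah's internal representation):

```python
# Rough illustration only: what the five steps amount to in a conceptual model.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, XSD

ex = Namespace("http://example.org/")
g = Graph()
g.add((ex.Customer, RDF.type, RDFS.Class))       # 1. define concepts
g.add((ex.Order, RDF.type, RDFS.Class))
g.add((ex.places, RDF.type, RDF.Property))       # 2. define relationships
g.add((ex.places, RDFS.domain, ex.Customer))     # 3. connect concepts
g.add((ex.places, RDFS.range, ex.Order))
g.add((ex.orderDate, RDF.type, RDF.Property))    # 4. define attributes
g.add((ex.orderDate, RDFS.domain, ex.Order))     # 5. connect attributes
g.add((ex.orderDate, RDFS.range, XSD.date))
```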
What Hapsah.org provides:
- Business glossary with terms and definitions
- Conceptual modelling environment
- Business rule authoring tool
- App running environment
- Admin environment
- APIs for operational data access (for data manipulation)
- APIs for metadata access (glossary, conceptual model and business rules)
Making changes or additions to your app is just as easy.
There is no spaghetti codebase being generated and managed under the hood,
so you never end up in debugging hell - just a smoothly running app.
#vibecoding #nocode #ontology #semantic #app #development #businessrules
Citation metrics are widely used to assess academic impact but suffer from social biases, including institutional prestige and journal visibility. Here we introduce rXiv Semantic Impact (XSI), a...
Understanding ecological systems using knowledge graphs: an application to highly pathogenic avian influenza | Bioinformatics Advances | Oxford Academic
Abstract - Motivation: Ecological systems are complex. Representing heterogeneous knowledge about ecological systems is a pervasive challenge because data are
Earlier this week I came across a post by Miklós Molnár that sparked something I think the ontology community has needed to articulate for a long time. The post described a shift in how we might think about ontology mapping and alignment in the age of AI.
Semantics in use part 4: an interview with Michael Pool, Semantic Technology Product Leader @Bloomberg | LinkedIn
What is your role? I am a product manager in the Office of the CTO at Bloomberg, where I am responsible for developing products that help to deploy semantic solutions that facilitate our data integration and delivery. Bloomberg is a global provider of financial news and information, including real-t
What is an ontology? Well, it depends on who's talking.
Ontology talk has sprung up a lot in data circles the last couple of years. You may have read in the news that the Department of Defense adopted an ontology, Juan will tell you enterprise AI needs an ontology, Jessica will tell you how to build an ontology pipeline, and Palantir will gladly sell you one (🤦♂️). Few people actually spell out what they mean when talking about “ontology” and unsurprisingly they’re not all talking about the same thing.
Ontology is a borrowed word for information scientists, who took it from philosophy, where ontology is an account of the fundamental things around us. Some of you no doubt read Plato's Republic with the allegory of the cave, which introduces the theory of forms. Aristotle had two ontologies, one in the Categories and another in the Metaphysics. (My friend Jessica would call the former a Taxonomy.) When I talk about ontology as a philosopher, I'm interested in the fundamental nature of reality. Is it made up of medium-sized dry goods or subatomic wave functions?
Information scientists aren’t interested in the fundamental nature of reality, but they are interested in how we organize our data about reality. So when they talk about ontologies they actually mean one of several different technologies.
When Juan talks about ontologies, I know in my head he means knowledge graphs. This introduces a regress, because knowledge graphs can be implemented in a number of different ways, though the Resource Description Framework (RDF) is probably the most popular. If you've ever built a website, RDF will look familiar because it's simply URIs that represent subject-predicate-object triples. (Juan-works at-ServiceNow) Because we're technologists, there are a number of different ways to represent, store, and query a knowledge graph. (See XKCD 927)
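For readers who have never seen one, here is that parenthetical example as an actual triple, sketched with Python's rdflib (the ex: URIs are made up for illustration):

```python
# The "Juan works at ServiceNow" example as an RDF triple.
from rdflib import Graph, Namespace

ex = Namespace("http://example.org/")
g = Graph()
g.bind("ex", ex)
g.add((ex.Juan, ex.worksAt, ex.ServiceNow))   # subject, predicate, object

print(g.serialize(format="turtle"))
# roughly prints:
# @prefix ex: <http://example.org/> .
# ex:Juan ex:worksAt ex:ServiceNow .
```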
Knowledge graphs are cool and all, but they're not the only approach to ontologies. When the DoD went shopping for an ontology, they started with an upper formal ontology, specifically the Basic Formal Ontology. I think BFO is cool if only because it's highly influenced by philosophy through the work of philosopher Barry Smith (Buffalo). Formal ontologies can organize the concepts, relations, and axioms across large domains like healthcare, but they're best suited to slowly evolving industries. While BFO might be the most popular upper ontology, it's certainly not the only one on the market.
My own view is that in data we're all engaged in ontological work in a broad sense. If you're building a data model, you need a good account of "what there is" for the business domain. At what grain do we count inventory? Bottles, cases, pallets, etc.? The more specific we get about doing ontological work, the harder the deliverables become: knowledge graphs are harder to build than data models, and formal ontologies are harder to build than knowledge graphs. Most organizations need good data models more than they need formal ontologies.
A database tells you what is connected. A knowledge graph tells you why.
→ SQL hides semantics in schema logic. Foreign keys don’t explain relationships, they just enforce them.
→ Knowledge graphs make relationships explicit. Edges have meaning, context, synonyms, hierarchies.
→ Traversal in SQL = JOIN gymnastics. Traversal in a KG = natural multi-hop reasoning.
Benchmarks show LLMs answered enterprise questions correctly 16.7% of the time over SQL … vs. 54.2% over the same data in a KG. Same data, different representation.
Sure, you can bolt ontologies, synonyms, and metadata onto SQL. But at that point, you’ve basically reinvented a knowledge graph.
So the real question is:
Do you want storage, or do you want reasoning?
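As an illustration (over an invented retail schema, not a benchmark), here is the same two-hop question expressed as SQL joins versus a SPARQL property path:

```python
# Illustration only: "which suppliers are connected to customer 42 through
# their orders?" as SQL joins versus a SPARQL property path. The table and
# predicate names are invented.
sql = """
SELECT DISTINCT s.name
FROM customers c
JOIN orders o      ON o.customer_id = c.id
JOIN order_items i ON i.order_id    = o.id
JOIN suppliers s   ON s.id          = i.supplier_id
WHERE c.id = 42;
"""

sparql = """
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?supplier WHERE {
  ex:customer42 ex:placed/ex:contains/ex:suppliedBy ?supplier .
}
"""
# The joins enforce the links; the property path names them: placed, contains
# and suppliedBy are edges with explicit meaning, not anonymous foreign keys.
```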
#KnowledgeGraphs #AI #LLM #Agents #DataEngineering
Every knowledge system has to wrestle with a deceptively simple question: what do we assert, and what do we derive? That line between assertion and derivation is where Object-Role Modeling (ORM) and the Resource Description Framework (RDF) with the Web Ontology Language (OWL) go in radically different directions.
Why is ontology engineering such a mess?
There's a simple reason: proprietary data models, proprietary inference engines, and proprietary query engines.
Some ontology traditions have always been open standards: Prolog/Datalog, RDF, conceptual graphs all spring to mind.
However, startups in ontology often take government money, apparently under conditions that inspire them to close their standards. One notable closed-standard ontology vendor is Palantir. If you look into the world of graph databases, you will discover many more vendors operating on closed standards, as well as some vendors who've implemented less popular and in my view less user-friendly open standards.
My advice to ontology consultants and to their clients is to prioritize vendors that implement open standards. Given that this list includes heavyweights like Oracle and AWS, it isn't hard to remain within one's comfort zone while embracing open standards. Prolog and RDF are likely the most popular and widely known standards for automated inference, knowledge representation, and the like. More potential engineers, computer scientists, and modelers have trained on these standards than on any closed standard a vendor might name, and both come with prebuilt ontologies, query-rewriting approaches, inference-engine profiles, constraint-programming techniques, and more.
Oracle and AWS have chosen to go with open standards rather than inventing some new graph data model and yet another query processor to handle the same inference and business rule workloads we've been handling with various technologies since the 1950s. Learn from their example, and please quit wasting all of our time on Earth by reinventing the semantic network.
Debunking Urban Myths about RDF and Explaining How Ontologies Help GraphRAG | LinkedIn
I recently came across some misconceptions about why the LPG graph model is more effective than RDF for GraphRAG, and I wrote this article to debunk them. At the end, I also elaborate on two principal advantages of RDF when it comes to provision of context and grounding to LLMs (i) schema languages
Guy van den Broeck (UCLA) - Theoretical Aspects of Trustworthy AI
https://simons.berkeley.edu/talks/guy-van-den-broeck-ucla-2025-04-29
Today, many expect AI to ta...
Webinar: Semantic Graphs in Action - Bridging LPG and RDF Frameworks - Enterprise Knowledge
As organizations increasingly prioritize linked data capabilities to connect information across the enterprise, selecting the right graph framework to leverage has become more important than ever. In this webinar, graph technology experts from Enterprise Knowledge Elliot Risch, James Egan, David Hughes, and Sara Nash shared the best ways to manage and apply a selection of these frameworks to meet enterprise needs.
A new notebook exploring Semantic Entity Resolution & Extraction using DSPy and Google's new LangExtract library.
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.
Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).
On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text.
It’s an end-to-end flow tackling one of the most persistent data challenges.
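For readers who want the shape of the pipeline without opening the notebook, here is a bare-bones sketch of the same pattern using scikit-learn for the kNN step; embed() and llm_same_entity() are assumed stand-ins for a sentence-embedding model and an LLM matcher (DSPy or otherwise), and this does not use LangExtract's API:

```python
# Not the notebook's code: a minimal embeddings + kNN blocking + LLM matching loop.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def resolve(mentions: list[str], embed, llm_same_entity, k: int = 5):
    vecs = np.asarray(embed(mentions))                      # 1. embeddings
    nn = NearestNeighbors(n_neighbors=min(k, len(mentions)), metric="cosine")
    nn.fit(vecs)
    _, idx = nn.kneighbors(vecs)                            # 2. kNN blocking
    matches = set()
    for i, neighbours in enumerate(idx):
        for j in neighbours:
            if i < j and llm_same_entity(mentions[i], mentions[j]):
                matches.add((i, int(j)))                    # 3. LLM matching
    return matches                                          # pairs to merge
```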
Check it out, experiment with your own data, enjoy the summer and let me know your thoughts!
cc Paco Nathan you might like this 😉
https://wor.ai/8kQ2qa
by J Bittner. John Sowa once observed: "In logic, the existential quantifier ∃ is a notation for asserting that something exists. But logic itself has no vocabulary for describing the things that exist."
In the history of data standards, a recurring pattern should concern anyone working in semantics today. A new standard emerges, promises interoperability, gains adoption across industries or agencies, and for a time seems to solve the immediate need.