I used the o word last week and it hit a few nerves. Ontologies bring context.
But context engineering itself is very poorly understood. Agent engineers talk about it and assume everyone is doing it, yet almost everyone is winging it.
Here's what context engineering definitely is not: longer prompts.
What it actually is: the right information, with the right meaning, at the right time. Not more information, but the right information with the right meaning. Sounds super abstract.
That's why I made a brief video that actually breaks down how to load context.
Okay, not brief. But context needs context.
A Graph RAG (Retrieval-Augmented Generation) chat application that combines OpenAI GPT with knowledge graphs stored in GraphDB
After seeing yet another Graph RAG demo using Neo4j with no ontology, I decided to show what real semantic Graph RAG looks like.
The Problem with Most Graph RAG Demos:
Everyone's building Graph RAG with LPG databases (Neo4j, TigerGraph, ArangoDB, etc.) and calling it "knowledge graphs." But here's the thing:
Without formal ontologies, you don't have a knowledge graph—you just have a graph database.
The difference?
❌ LPG: Nodes and edges are just strings. No semantics. No reasoning. No standards.
✅ RDF/SPARQL: Formal ontologies (RDFS/OWL) that define domain knowledge. Machine-readable semantics. W3C standards. Built-in reasoning.
So I Built a Real Semantic Graph RAG
Using:
- Microsoft Agent Framework - AI orchestration
- Formal ontologies - RDFS/OWL knowledge representation
- Ontotext GraphDB - RDF triple store
- SPARQL - semantic querying
- GPT-5 - ontology-aware extraction
It's all on GitHub, a simple template you can use as boilerplate for your project:
The "Jaguar problem":
What does "Yesterday I was hit by a Jaguar" really mean? It is impossible to know without concept awareness. To demonstrate why ontologies matter, I created a corpus with mixed content:
🐆 Wildlife jaguars (Panthera onca)
🚗 Jaguar cars (E-Type, XK-E)
🎸 Fender Jaguar guitars
I fed this to GPT-5 along with a jaguar conservation ontology.
The result? The LLM automatically extracted ONLY wildlife-related entities—filtering out cars and guitars—because it understood the semantic domain from the ontology.
No post-processing. No manual cleanup. Just intelligent, concept-aware extraction.
This is impossible with LPG databases because they lack formal semantic structure. Labels like (:Jaguar) are just strings—the LLM has no way to know if you mean the animal, car, or guitar.
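To make the extraction step concrete, here is a minimal sketch of the pattern (not the repo's actual code): hand the ontology to the model as part of its instructions so it only extracts entities that fit the ontology's classes. The file name, prompt wording, and model string are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical ontology file; the repo ships its own jaguar conservation ontology
ontology_ttl = open("jaguar_conservation.ttl").read()

system_prompt = (
    "Extract entities from the user's text, but ONLY entities that fit the classes "
    "and properties defined in this ontology (Turtle). Ignore everything else.\n\n"
    + ontology_ttl
)

resp = client.chat.completions.create(
    model="gpt-5",  # model named in the post; any capable chat model works
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "The E-Type is a classic. Meanwhile, Panthera onca "
                                    "was spotted near the Pantanal wetlands yesterday."},
    ],
)
print(resp.choices[0].message.content)  # expect only the wildlife entities back
```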
Knowledge Graphs = "Data for AI"
LLMs don't need more data—they need structured, semantic data they can reason over.
That's what formal ontologies provide:
✅ Domain context
✅ Class hierarchies
✅ Property definitions
✅ Relationship semantics
✅ Reasoning rules
This transforms Graph RAG from keyword matching into true semantic retrieval.
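As a rough illustration of what "semantic retrieval" means in practice, here is a sketch of a SPARQL query against a GraphDB repository via SPARQLWrapper. The repository name, namespace, and property are hypothetical; the point is that retrieval keys off the ontology class, not the string "jaguar".

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical GraphDB repository endpoint; adjust host/repository to your setup
sparql = SPARQLWrapper("http://localhost:7200/repositories/jaguar-kg")
sparql.setReturnFormat(JSON)

sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/jaguar#>

SELECT ?animal ?habitat WHERE {
  ?animal rdf:type ex:Jaguar ;   # the species class from the ontology, not a label string
          ex:livesIn ?habitat .  # a property the ontology defines for that class
}
""")

for row in sparql.queryAndConvert()["results"]["bindings"]:
    print(row["animal"]["value"], "lives in", row["habitat"]["value"])
```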
Check out the full implementation; the repo includes:
Complete Graph RAG implementation with Microsoft Agent Framework
Working jaguar conservation knowledge graph
Jupyter notebook: ontology-aware extraction from mixed-content text
https://lnkd.in/dmf5HDRm
And if you have gotten this far, you realize that most of this post is written by Cursor ... That goes for the code too. 😁
Your Turn:
I know this is a contentious topic. Many teams are heavily invested in LPG-based Graph RAG. What are your thoughts on RDF vs. LPG for Graph RAG? Drop a comment below!
#GraphRAG #KnowledgeGraphs #SemanticWeb #RDF #SPARQL #AI #MachineLearning #LLM #Ontology #KnowledgeRepresentation #OpenSource #neo4j #graphdb #agentic-framework #ontotext #agenticai
Beyond RDF vs LPG: Operational Ontologies, Hybrid Semantics, and Why We Still Chose a Property Graph
How to stay sane about “semantic Graph RAG” when your job is shipping reliable systems, not winning ontology theology wars. You don’t wake up in the morning thinking about OWL profiles or SPARQL entailment regimes.
Your agents NEED a semantic layer 🫵
Traditional RAG systems embed documents, retrieve similar chunks, and feed them to LLMs. This works for simple Q&A. It fails catastrophically for agents that need to reason across systems.
Why? Because semantic similarity doesn't capture relationships.
Your vector database can tell you that two documents are "about bonds." It can't tell you that Document A contains the official pricing methodology, Document B is a customer complaint referencing that methodology, and Document C is an assembly guide that superseded both.
These relationships are invisible to embeddings.
What semantic layers provide:
Entity resolution across data silos. When "John Smith" in your CRM, "J. Smith" in email, and "john.smith@company.com" in logs all map to the same person node, agents can traverse the complete context.
Cross-domain entity linking through knowledge graphs. Products in your database connect to assembly guides, which link to customer reviews, which reference support tickets. Single-query traversal instead of application-level joins.
Provenance-tracked derivations. Every extracted entity, inferred relationship, and generated embedding maintains lineage to source data. Critical for regulatory compliance and debugging agent behavior.
Ontology-grounded reasoning. Financial instruments mapped to FIBO standards. Products mapped to domain taxonomies. Agents reason with structured vocabulary, not statistical word associations.
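A tiny sketch of what that entity resolution looks like once it lands in the graph: the three silo-specific identifiers become one logical node via owl:sameAs links (the identifiers below are made up).

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, FOAF

EX = Namespace("http://example.org/id/")
g = Graph()

crm_node  = EX["crm/john-smith"]              # "John Smith" in the CRM
mail_node = EX["mail/j-smith"]                # "J. Smith" in email
log_node  = EX["logs/john.smith-at-company"]  # "john.smith@company.com" in logs

g.add((crm_node, RDF.type, FOAF.Person))
g.add((crm_node, OWL.sameAs, mail_node))  # resolved: same real-world person
g.add((crm_node, OWL.sameAs, log_node))

# Any traversal that reaches one identifier can now pull context attached to all three
print(g.serialize(format="turtle"))
```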
The technical implementation pattern:
Layer 1: Unified graph database supporting vector, structured, and semi-structured data types in single queries.
Layer 2: Entity extraction pipeline with coreference resolution and deduplication across sources.
Layer 3: Relationship inference and cross-domain linking using both explicit identifiers and contextual signals.
Layer 4: Separation of first-party data from derived artifacts with clear tagging for safe regeneration.
The result: Agents can traverse "Product → described_in → AssemblyGuide → improved_by → CommunityTip → authored_by → Expert" in a single graph query instead of five API calls with application-level joins.
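In an RDF setup, that single traversal could be one SPARQL query along the lines of the sketch below (the vocabulary URIs are invented; the equivalent Cypher on an LPG store is just as short).

```python
from rdflib import Graph

g = Graph().parse("product_graph.ttl")  # hypothetical export of the unified graph

# One query walks the whole chain instead of five API calls with app-level joins
results = g.query("""
PREFIX ex: <http://example.org/vocab#>
SELECT ?product ?expert WHERE {
  ?product ex:described_in ?guide .
  ?guide   ex:improved_by  ?tip .
  ?tip     ex:authored_by  ?expert .
}
""")
for product, expert in results:
    print(product, "->", expert)
```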
Model Context Protocol is emerging as the open standard for semantic tool modeling. Not just describing APIs, but encoding what tools do, when to use them, and how outputs compose. This enables agents to discover and reason about capabilities dynamically.
The competitive moat isn't your model choice.
The moat is your knowledge graph architecture and the accumulated entity relationships that took years to build.
Following yesterday's announcements from OpenAI, brands are starting to have real ways to operate inside ChatGPT. At a very high level, this is the map for anyone considering entering (or expanding) into the ChatGPT ecosystem: Conversational Prompts / UX: optimize how ChatGPT “asks” for or surfaces brand se
New project makes Wikipedia data more accessible to AI | TechCrunch
Called the Wikidata Embedding Project, the system applies a vector-based semantic search to the existing data on Wikipedia and its sister platforms, consisting of nearly 120 million entries.
Automatic Ontology Generation Still Falls Short & Why Applied Ontologists Deliver the ROI
For all the excitement around large language models, the latest research from Simona-Vasilica Oprea and Georgiana Stănescu (Electronics 14:1313, 2025) offers a reality check. Automatic ontology generation, even with novel prompting techniques like Memoryless CQ-by-CQ and Ontogenia, remains a partial
Semantic Quality Is the Missing Risk Control in Financial AI and GraphRAG
by Timothy Coleman and J Bittner Picture this: an AI system confidently delivers a financial report, but it misclassifies $100M in assets as liabilities. Errors of this kind are already appearing in financial AI systems, and the stakes only grow as organizations adopt Retrieval-Augmented Generation
T-Box: The secret sauce of knowledge graphs and AI
Ever wondered how knowledge graphs “understand” the world? Meet the T-Box, the part that tells your graph what exists and how it can relate.
Think of it like building a LEGO set:
T-Box (Terminological Box) = the instruction manual (defines the pieces and how they fit)
A-Box (Assertional Box) = the LEGO pieces you actually have (your data, your instances)
Why it’s important for RDF knowledge graphs:
- Gives your data structure and rules, so your graph doesn’t turn into spaghetti
- Enables reasoning, letting the system infer new facts automatically
- Keeps your graph consistent and maintainable, even as it grows
Why it’s better than other models:
Traditional databases just store rows and columns; relationships have no meaning
RDF + T-Box = data that can explain itself and connect across domains
Why AI loves it:
- AI can reason over knowledge, not just crunch numbers
- Enables smarter recommendations, insights, and predictions based on structured knowledge
Quick analogy:
T-Box = blueprint/instruction manual (the ontology / what is possible)
A-Box = the real-world building (the facts / what is true)
Together = AI-friendly, smart knowledge graph
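For the curious, here is a toy T-Box/A-Box in Turtle, loaded with rdflib and closed with an RDFS reasoner via owlrl so the "infer new facts" part is visible. The names are invented to match the LEGO analogy.

```python
import owlrl
from rdflib import Graph, URIRef
from rdflib.namespace import RDF

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/lego#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# T-Box: the instruction manual (what exists and how pieces may relate)
ex:Brick      a rdfs:Class .
ex:RoofBrick  a rdfs:Class ; rdfs:subClassOf ex:Brick .
ex:connectsTo a rdf:Property ; rdfs:domain ex:Brick ; rdfs:range ex:Brick .

# A-Box: the pieces you actually have (your instances, your facts)
ex:brick42 a ex:RoofBrick ; ex:connectsTo ex:brick7 .
ex:brick7  a ex:Brick .
""", format="turtle")

# Reasoning: the RDFS closure infers that brick42 is also a Brick
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print((URIRef("http://example.org/lego#brick42"), RDF.type,
       URIRef("http://example.org/lego#Brick")) in g)  # True
```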
#KnowledgeGraph #RDF #AI #SemanticWeb #DataScience #GraphData
Guy van den Broeck (UCLA), "Theoretical Aspects of Trustworthy AI": https://simons.berkeley.edu/talks/guy-van-den-broeck-ucla-2025-04-29
Today, many expect AI to ta...
A new notebook exploring Semantic Entity Resolution & Extraction using DSPy and Google's new LangExtract library.
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.
Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).
On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text.
It’s an end-to-end flow tackling one of the most persistent data challenges.
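If you want the flavor of the pipeline before opening the notebook, here is a compressed sketch of the blocking-plus-LLM-matching idea. It is not the notebook's code, it leaves the LangExtract layer out, and the mentions and model names are placeholders.

```python
import dspy
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

mentions = ["Acme Corp.", "ACME Corporation", "Acme Inc", "Globex LLC"]

# 1) Embeddings + kNN blocking: only compare nearby candidates, not all pairs
encoder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = encoder.encode(mentions)
_, neighbor_idx = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vectors).kneighbors(vectors)

# 2) LLM matching with DSPy: a small signature judges each candidate pair
class SameEntity(dspy.Signature):
    """Do these two mentions refer to the same real-world entity?"""
    mention_a: str = dspy.InputField()
    mention_b: str = dspy.InputField()
    same: bool = dspy.OutputField()

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any DSPy-supported model
judge = dspy.Predict(SameEntity)

for i, neighbors in enumerate(neighbor_idx):
    for j in neighbors:
        if j != i:
            verdict = judge(mention_a=mentions[i], mention_b=mentions[j])
            print(mentions[i], "<->", mentions[j], "=>", verdict.same)
```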
Check it out, experiment with your own data, 𝐞𝐧𝐣𝐨𝐲 𝐭𝐡𝐞 𝐬𝐮𝐦𝐦𝐞𝐫 and let me know your thoughts!
cc Paco Nathan you might like this 😉
https://wor.ai/8kQ2qa
Jessica Talisman has been publishing a series of articles on Substack about how to develop more robust AI systems by leveraging vocabularies, thesauri, tax...
Why Businesses Must Ground Their AI in Knowledge Graphs
Here, I clearly explain why businesses must transition from raw tabular data to RDF-based knowledge graphs, and why this is essential to ground AI in logic-driven, traceable inference rather than black-box prediction: 1. Your tabular data is dumb.
I've spent long, hard years learning how to talk about knowledge graphs and semantics with software engineers who have little training in linguistics. I feel quite fluent at this point, after investing huge amounts of effort into understanding statistics (I was a humanities undergrad) and into unpac
how both OWL and SHACL can be employed during the decision-making phase for AI Agents when using a knowledge graph instead of relying on an LLM that hallucinates
𝙏𝙝𝙤𝙪𝙜𝙝𝙩 𝙛𝙤𝙧 𝙩𝙝𝙚 𝙙𝙖𝙮: I've been mulling over how both OWL and SHACL can be employed during the decision-making phase for AI Agents when using a knowledge graph instead of relying on an LLM that hallucinates. In this way, the LLM can still be used for assessment and sensory feedback, but it augments the graph, not the other way around. OWL and SHACL serve different roles. SHACL is not just a preprocessing validator; it can play an active role in constraining, guiding, or triggering decisions, especially when integrated into AI pipelines. However, OWL is typically more central to inferencing and reasoning tasks.
SHACL can actively participate in decision-making, especially when decisions require data integrity, constraint enforcement, or trigger-based logic. In complex agents, OWL provides the inferencing engine, while SHACL acts as the constraint gatekeeper and occasionally contributes to rule-based decision-making.
For example, an AI agent processes RDF data describing an applicant's skills, degree, and experience. SHACL validates the data's structure, ensuring required fields are present and correctly formatted. OWL reasoning infers that the applicant is qualified for a technical role and matches the profile of a backend developer. SHACL is then used again to check policy compliance. With all checks passed, the applicant is shortlisted, and a follow-up email is triggered.
In AI agent decision-making, OWL and SHACL often work together in complementary ways. SHACL is commonly used as a preprocessing step to validate incoming RDF data. If the data fails validation, it's flagged or excluded, ensuring only clean, structurally sound data reaches the OWL reasoner. In this role, SHACL acts as a gatekeeper.
They can also operate in parallel or in an interleaved manner within a pipeline. As decisions evolve, SHACL shapes may be checked mid-process. Some AI agents even use SHACL as a rule engine—to trigger alerts, detect actionable patterns, or constrain reasoning paths—while OWL continues to handle more complex semantic inferences, such as class hierarchies or property logic.
Finally, SHACL can augment decision-making by confirming whether OWL-inferred actions comply with specific constraints. OWL may infer that “A is a type of B, so do X,” and SHACL then determines whether doing X adheres to a policy or requirement. Because SHACL supports closed-world assumptions (which OWL does not), it plays a valuable role in enforcing policies or compliance rules during decision execution.
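A minimal sketch of the applicant example with pySHACL (the file names and shapes are hypothetical): SHACL gatekeeping and RDFS/OWL inference happen in one validation call before the agent acts on the result.

```python
from rdflib import Graph
from pyshacl import validate

data     = Graph().parse("applicant.ttl")      # hypothetical A-Box: skills, degree, experience
shapes   = Graph().parse("policy_shapes.ttl")  # hypothetical SHACL shapes encoding the policy
ontology = Graph().parse("hr_ontology.ttl")    # hypothetical OWL/RDFS T-Box for inference

conforms, _, report = validate(
    data_graph=data,
    shacl_graph=shapes,
    ont_graph=ontology,
    inference="rdfs",   # pySHACL can also run OWL-RL inference with inference="owlrl"
)

if conforms:
    print("Data passes the policy constraints; the agent proceeds to the shortlist step")
else:
    print(report)  # human-readable validation report to feed back to the agent
```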
When people discuss how LLMs "reason," you’ll often hear that they rely on transduction rather than abduction. It sounds technical, but the distinction matters - especially as we start wiring LLMs into systems that are supposed to think.
🔵 Transduction is case-to-case reasoning. It doesn’t build theories; it draws fuzzy connections based on resemblance. Think: “This metal conducts electricity, and that one looks similar - so maybe it does too.”
🔵 Abduction, by contrast, is about generating explanations. It’s what scientists (and detectives) do: “This metal is conducting - maybe it contains free electrons. That would explain it.”
The claim is that LLMs operate more like transducers - navigating high-dimensional spaces of statistical similarity, rather than forming crisp generalisations. But this isn’t the whole picture. In practice, it seems to me that LLMs also perform a kind of induction - abstracting general patterns from oceans of text. They learn the shape of ideas and apply them in novel ways. That’s closer to “All metals of this type have conducted in the past, so this one probably will.”
Now add tools to the mix - code execution, web search, Elon Musk's tweet history 😉 - and LLMs start doing something even more interesting: program search and synthesis. It's messy, probabilistic, and not at all principled or rigorous. But it’s inching toward a form of abductive reasoning.
Which brings us to a more principled approach for reasoning within an enterprise domain: the neuro-symbolic loop - a collaboration between large language models and knowledge graphs. The graph provides structure: formal semantics, ontologies, logic, and depth. The LLM brings intuition: flexible inference, linguistic creativity, and breadth. One grounds. The other leaps.
💡 The real breakthrough could come when the grounding isn’t just factual, but conceptual - when the ontology encodes clean, meaningful generalisations. That’s when the LLM’s leaps wouldn’t just reach further - they’d rise higher, landing on novel ideas that hold up under formal scrutiny. 💡
So where do metals fit into this new framing?
🔵 Transduction: “This metal conducts. That one looks the same - it probably does too.”
🔵 Induction: “I’ve tested ten of these. All conducted. It’s probably a rule.”
🔵 Abduction: “This metal is conducting. It shares properties with the ‘conductive alloy’ class - especially composition and crystal structure. The best explanation is a sea of free electrons.”
LLMs, in isolation, are limited in their ability to perform structured abduction. But when embedded in a system that includes a formal ontology, logical reasoning, and external tools, they can begin to participate in richer forms of reasoning. These hybrid systems are still far from principled scientific reasoners - but they hint at a path forward: a more integrated and disciplined neuro-symbolic architecture that moves beyond mere pattern completion.
S&P Global Unlocks the Future of AI-driven insights with AI-Ready Metadata on S&P Global Marketplace
🚀 When I shared our 2025 goals for the Enterprise Data Organization, one of the things I alluded to was machine-readable column-level metadata. Let’s unpack what that means—and why it matters.
🔍 What: For datasets we deliver via modern cloud distribution, we now provide human- and machine-readable metadata at the column level. Each column has an immutable URL (no auth, no CAPTCHA) that hosts name/value metadata - synonyms, units of measure, descriptions, and more - in multiple human languages. It’s semantic context that goes far beyond what a traditional data dictionary can convey. We can't embed it, so we link to it.
💡 Why: Metadata is foundational to agentic, precise consumption of structured data. Our customers are investing in semantic layers, data catalogs, and knowledge graphs - and they shouldn’t have to copy-paste from a PDF to get there. Use curl, Python, Bash - whatever works - to automate ingestion. (We support content negotiation and conditional GETs.)
🧠 Under the hood? It’s RDF. Love it or hate it, you don’t need to engage with the plumbing unless you want to.
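For example, pulling one column's metadata might look roughly like this (the URL is a placeholder, not a real S&P endpoint): content negotiation selects the RDF serialization, and a conditional GET avoids re-downloading unchanged metadata.

```python
import requests

# Placeholder column-metadata URL; real URLs come with the dataset delivery
url = "https://metadata.example.com/datasets/prices/columns/close_price"

# Content negotiation: ask for JSON-LD (Turtle would be another typical choice)
resp = requests.get(url, headers={"Accept": "application/ld+json"})
resp.raise_for_status()
etag = resp.headers.get("ETag")
print(resp.json())

# Conditional GET: only fetch again if the metadata actually changed
again = requests.get(url, headers={"Accept": "application/ld+json",
                                   "If-None-Match": etag})
print("unchanged" if again.status_code == 304 else "updated")
```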
✨ To our knowledge, this hasn’t been done before. This is our MVP. We’re putting it out there to learn what works - and what doesn’t. It’s vendor-neutral, web-based, and designed to scale across:
📊 Breadth of datasets across S&P
🧬 Depth of metadata
🔗 Choice of linking venue
🙏 It took a village to make this happen. I can’t name everyone without writing a book, but I want to thank our executive leadership for the trust and support to go build this.
Let us know what you think!
🔗 https://lnkd.in/gbe3NApH
Martina Cheung, Saugata Saha, Swamy Kocherlakota, Dave Ernsberger, Mark Eramo, Frank Tarsillo, Warren Breakstone, Hamish B., Erica Robeen, Laura Miller, Justine S Iverson
Building Truly Autonomous AI: A Semantic Architecture Approach
I've been working on autonomous AI systems, and wanted to share some thoughts on what I believe makes them effective. The challenge isn't just making AI that follows instructions well, but creating systems that can reason, and act independently.
LLMs already contain overlapping world models. You just have to ask them right.
Ontologists reply to an LLM output, “That’s not a real ontology—it’s not a formal conceptualization.”
But that’s just the No True Scotsman fallacy dressed up in OWL. Boring. Not growth-oriented. Look forward, angel.
A foundation model is a compression of human knowledge. The real problem isn't that we "lack a conceptualization". The real problem with FMs is that they contain too many. FMs contain conceptualizations—plural. Messy? Sure. But usable.
At Stardog, we’re turning this latent structure into real ontologies using symbolic knowledge distillation. Prompt orchestration → structure extraction → formal encoding. OWL, SHACL, and friends. Shake till mixed. Rinse. Repeat. Secret sauce simmered and reduced.
This isn't theoretical hard. We avoid that. It’s merely engineering hard. We LTF into that!
But the payoff means bootstrapping rich, new ontologies at scale: faster, cheaper, with lineage. It's the intersection of FM latent space, formal ontology, and user intent expressed via CQs. We call it the Symbolic Latent Layer (SLL). Cute eh?
The future of enterprise AI isn’t just documents. It’s distilling structured symbolic knowledge from LLMs and plugging it into agents, workflows, and reasoning engines.
You don’t need a priesthood to get a formal ontology anymore. You need a good prompt and a smarter pipeline and the right EKG platform.
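To show what such a pipeline can look like mechanically, here is a toy sketch (not Stardog's pipeline; the competency question, prompt, model string, and namespace are all invented): ask the FM for structure, then encode what comes back as OWL axioms.

```python
from openai import OpenAI
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/distilled#")
client = OpenAI()

# Prompt orchestration: a competency question drives what the FM is asked for
cq = "Which policies apply to a customer's insurance claim?"
resp = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model
    messages=[{"role": "user", "content":
               f"List the domain classes needed to answer: '{cq}'. "
               "Return one 'Child < Parent' subclass pair per line."}],
)

# Structure extraction -> formal encoding: OWL classes plus subclass axioms
g = Graph()
for line in resp.choices[0].message.content.splitlines():
    if "<" in line:
        child, parent = (s.strip().replace(" ", "") for s in line.split("<", 1))
        g.add((EX[child], RDF.type, OWL.Class))
        g.add((EX[parent], RDF.type, OWL.Class))
        g.add((EX[child], RDFS.subClassOf, EX[parent]))

print(g.serialize(format="turtle"))  # a first-draft T-Box with lineage back to the CQ
```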
There's a lot more to say about this so I said it at Stardog Labs https://lnkd.in/eY5Sibed