After a vivid conversation with Juan Sequeda and others at Connected Data London 2025 about how to start with Ontologies at business clients w/o relying on another KG platform, I have now started to roll (eh vibe coding 🤓) my own Ontology Builder as a simple Streamlit app! Have a look and collaborate if you like.
https://lnkd.in/egGZJHiP
Introducing the ONTO-TRON-5000. A personal project that allows users to build their ontologies right from their data
Introducing the ONTO-TRON-5000. A personal project that allows users to build their ontologies right from their data! The onto-tron is built with the Basic Formal Ontology (BFO) and Common Core Ontologies (CCO) as semantic frameworks for classification. This program emphasizes the importance of design patterns as best practices for ontology documentation and combines them with machine readability. Simply upload your CSV, set the semantic types of your columns and continuously build your ontology above. The program has 3 options for extraction: RDF, R2RML, and the Mermaid Live Editor syntax if you would like to further develop your design pattern there. Included is a BFO/CCO ontology viewer, allowing you to explore the hierarchy and understand how terms are used, no Protégé required. This is the alpha version and I would love feedback, as there is a growing list of features to be added. Included in the README are instructions for manual installation and Docker. Enjoy!
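To make the "upload a CSV, set the semantic type of each column" step concrete, here is a minimal sketch in Python. It is not the ONTO-TRON-5000's code: the `http://example.org/onto/` namespace, the `hasValue` property and the `Person`/`City` classes are hypothetical placeholders (real use would point at BFO/CCO class IRIs), and the literal escaping is deliberately simplified.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/onto/"  # hypothetical namespace, not a BFO/CCO IRI


def csv_to_triples(csv_text, column_types):
    """Turn every non-empty cell into a node typed with its column's class."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, class_iri in column_types.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue  # skip empty cells
            subject = f"{BASE}row{row_number}/{quote(column)}"
            triples.append((subject, RDF_TYPE, class_iri))
            triples.append((subject, BASE + "hasValue", value))  # hypothetical property
    return triples


def to_ntriples(triples):
    """Serialize as N-Triples-style lines (only backslash and quote are escaped)."""
    lines = []
    for s, p, o in triples:
        # Heuristic for this sketch: objects that look like IRIs are written as IRIs.
        if o.startswith("http://") or o.startswith("https://"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


sample = "name,city\nAda,London\n"
types = {"name": BASE + "Person", "city": BASE + "City"}
print(to_ntriples(csv_to_triples(sample, types)))
```

The column-to-class dictionary plays the role of the "semantic type" choice; an R2RML or Mermaid export would be a different serializer over the same mapping.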
https://lnkd.in/ehrDwVrf
A Graph RAG (Retrieval-Augmented Generation) chat application that combines OpenAI GPT with knowledge graphs stored in GraphDB
After seeing yet another Graph RAG demo using Neo4j with no ontology, I decided to show what real semantic Graph RAG looks like.
The Problem with Most Graph RAG Demos:
Everyone's building Graph RAG with LPG databases (Neo4j, TigerGraph, ArangoDB etc.) and calling it "knowledge graphs." But here's the thing:
Without formal ontologies, you don't have a knowledge graph—you just have a graph database.
The difference?
❌ LPG: Nodes and edges are just strings. No semantics. No reasoning. No standards.
✅ RDF/SPARQL: Formal ontologies (RDFS/OWL) that define domain knowledge. Machine-readable semantics. W3C standards. Built-in reasoning.
So I Built a Real Semantic Graph RAG
Using:
- Microsoft Agent Framework - AI orchestration
- Formal ontologies - RDFS/OWL knowledge representation
- Ontotext GraphDB - RDF triple store
- SPARQL - semantic querying
- GPT-5 - ontology-aware extraction
It's all on GitHub, a simple template you can use as boilerplate for your project:
The "Jaguar problem":
What does "Yesterday I was hit by a Jaguar" really mean? It is impossible to know without concept awareness. To demonstrate why ontologies matter, I created a corpus with mixed content:
🐆 Wildlife jaguars (Panthera onca)
🚗 Jaguar cars (E-Type, XK-E)
🎸 Fender Jaguar guitars
I fed this to GPT-5 along with a jaguar conservation ontology.
The result? The LLM automatically extracted ONLY wildlife-related entities—filtering out cars and guitars—because it understood the semantic domain from the ontology.
No post-processing. No manual cleanup. Just intelligent, concept-aware extraction.
This is impossible with LPG databases because they lack formal semantic structure. Labels like (:Jaguar) are just strings—the LLM has no way to know if you mean the animal, car, or guitar.
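As a rough illustration of what "concept awareness" buys you, the sketch below filters typed mentions against a small class hierarchy. It is a toy stand-in, not the repo's GPT-5 extraction step: the class names, the `SUBCLASS_OF` table and the pre-typed mentions are all hypothetical, and in the real pipeline the typing would come from the LLM guided by the ontology.

```python
# Hypothetical fragment of a wildlife ontology: child class -> parent class.
SUBCLASS_OF = {
    "Jaguar": "BigCat",
    "BigCat": "Animal",
    "Animal": "WildlifeEntity",
    "Habitat": "WildlifeEntity",
}


def is_a(cls, ancestor):
    """Walk up the subclass chain and report whether `ancestor` is reached."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == ancestor:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)  # None once we leave the ontology
    return False


def in_domain(mentions, root="WildlifeEntity"):
    """Keep only mentions whose class falls under the ontology's root class."""
    return [text for text, cls in mentions if is_a(cls, root)]


# Hypothetical typed mentions, as an extractor might produce them.
mentions = [
    ("a jaguar spotted in the Pantanal", "Jaguar"),
    ("Jaguar E-Type", "Car"),
    ("Fender Jaguar", "Guitar"),
]

print(in_domain(mentions))
```

`Car` and `Guitar` are simply not classes in this ontology, so those mentions drop out without any string matching on the word "Jaguar".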
Knowledge Graphs = "Data for AI"
LLMs don't need more data—they need structured, semantic data they can reason over.
That's what formal ontologies provide:
✅ Domain context
✅ Class hierarchies
✅ Property definitions
✅ Relationship semantics
✅ Reasoning rules
This transforms Graph RAG from keyword matching into true semantic retrieval.
Check Out the Full Implementation, the repo includes:
Complete Graph RAG implementation with Microsoft Agent Framework
Working jaguar conservation knowledge graph
Jupyter notebook: ontology-aware extraction from mixed-content text
https://lnkd.in/dmf5HDRm
And if you have gotten this far, you realize that most of this post is written by Cursor ... That goes for the code too. 😁
Your Turn:
I know this is a contentious topic. Many teams are heavily invested in LPG-based Graph RAG. What are your thoughts on RDF vs. LPG for Graph RAG? Drop a comment below!
#GraphRAG #KnowledgeGraphs #SemanticWeb #RDF #SPARQL #AI #MachineLearning #LLM #Ontology #KnowledgeRepresentation #OpenSource #neo4j #graphdb #agentic-framework #ontotext #agenticai
Announcing the formation of a Data Façades W3C Community Group
I am excited to announce the formation of a Data Façades W3C Community Group.
Façade-X, initially introduced at SEMANTICS 2021 and successfully implemented by the SPARQL Anything project, provides a simple yet powerful, homogeneous view over diverse and heterogeneous data sources (e.g., CSV, JSON, XML, and many others). With the recent v1.0.0 release of SPARQL Anything, the time was right to work on the long-term stability and widespread adoption of this approach by developing an open, vendor-neutral technology.
The Façade-X concept was born to allow SPARQL users to query data in any structured format in plain SPARQL. Therefore, the choice of a W3C community group to lead efforts on specifications is just natural. Specifications will enhance its reliability, foster innovation, and encourage various vendors and projects—including graph database developers — to provide their own compatible implementations.
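For readers new to the idea, here is a simplified Python sketch of the kind of uniform view Façade-X gives over a JSON document: a root container, object keys as properties, and array items as numbered slots (`rdf:_1`, `rdf:_2`, ...). This is an illustration under assumptions, not the SPARQL Anything implementation; the `_:cN` blank-node labels and the helper names are hypothetical, and details such as key escaping and datatype handling are left out.

```python
import itertools

FX = "http://sparql.xyz/facade-x/ns/"      # Façade-X vocabulary namespace
XYZ = "http://sparql.xyz/facade-x/data/"   # namespace for keys found in the data
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def facade_triples(data):
    """Flatten parsed JSON (a dict or list) into container-style triples."""
    counter = itertools.count(1)
    triples = []

    def new_node():
        return f"_:c{next(counter)}"  # hypothetical blank-node labels

    def add_value(subject, predicate, value):
        if isinstance(value, (dict, list)):
            node = new_node()  # nested structures become nested containers
            triples.append((subject, predicate, node))
            fill(node, value)
        else:
            triples.append((subject, predicate, value))

    def fill(node, value):
        if isinstance(value, dict):
            for key, item in value.items():
                add_value(node, XYZ + key, item)
        else:
            for index, item in enumerate(value, start=1):
                add_value(node, f"{RDF}_{index}", item)

    root = new_node()
    triples.append((root, RDF + "type", FX + "root"))
    fill(root, data)
    return triples


for triple in facade_triples({"title": "example", "tags": ["a", "b"]}):
    print(triple)
```

Because every format is reduced to the same containers-and-slots shape, one plain SPARQL query pattern can be written against it regardless of whether the source was CSV, JSON or XML.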
The primary goals of the Data Façades Community Group are to:
Define the core specification of the Façade-X method.
Define Standard Mappings: Formalize the required mappings and profiles for connecting Façade-X to common data formats.
Define the specification of the query dialect: Provide a reference for the SPARQL dialect, configuration conventions (like SERVICE IRIs), and the functions/magic properties used.
Establish Governance: Create a monitored, robust process for adding support for new data formats.
Foster Collaboration: Build connections with relevant W3C groups (e.g., RDF & SPARQL, Data Shapes) and encourage involvement from developers, businesses, and adopters.
Join us!
With Luigi Asprino Ivo Velitchkov Justin Dowdy Paul Mulholland Andy Seaborne Ryan Shaw ...
CG: https://lnkd.in/eSxuqsvn
Github: https://lnkd.in/dkHGT8N3
SPARQL Anything #RDF #SPARQL #W3C #FX
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs
Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically...
A new notebook exploring Semantic Entity Resolution & Extraction using DSPy and Google's new LangExtract library.
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.
Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).
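The blocking step in that list can be sketched in a few lines of plain Python. This is not the notebook's code: the three-dimensional vectors are made-up stand-ins for real embeddings, `knn_blocks` is a hypothetical helper, and the LLM matching step that would judge each candidate pair is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_blocks(embeddings, k=1):
    """Pair each record with its k most similar records (candidate matches)."""
    pairs = set()
    for name, vector in embeddings.items():
        ranked = sorted(
            (other for other in embeddings if other != name),
            key=lambda other: cosine(vector, embeddings[other]),
            reverse=True,
        )
        for other in ranked[:k]:
            pairs.add(tuple(sorted((name, other))))  # order-independent pair
    return sorted(pairs)


# Toy vectors standing in for real embeddings (hypothetical values).
embeddings = {
    "IBM": [0.9, 0.1, 0.0],
    "International Business Machines": [0.88, 0.12, 0.02],
    "Apple Inc.": [0.1, 0.9, 0.1],
    "Apple": [0.12, 0.88, 0.08],
}

print(knn_blocks(embeddings, k=1))
```

Blocking keeps the expensive matcher from comparing every record with every other one: only the pairs returned here would be sent on for a match/no-match decision.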
On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text.
It’s an end-to-end flow tackling one of the most persistent data challenges.
Check it out, experiment with your own data, enjoy the summer and let me know your thoughts!
cc Paco Nathan you might like this 😉
https://wor.ai/8kQ2qa
Ever wish you could make an ontology right from your spreadsheet? A lot of my ontology drafting work begins with a spreadsheet: a lexicon, a catalog of important concepts or subject-matter expert t…
Cellosaurus is now available in RDF format, with a triple store that supports SPARQL queries
If this sounds a bit abstract or unfamiliar…
1) RDF stands for Resource Description Framework. Think of RDF as a way to express knowledge using triples:
Subject – Predicate – Object.
Example: HeLa (subject) – is_transformed_by (predicate) – Human papillomavirus type 18 (object)
These triples are like little facts that can be connected together to form a graph of knowledge.
2) A triple store is a database designed specifically to store and retrieve these RDF triples. Unlike traditional databases (tables, rows), triple stores are optimized for linked data. They allow you to navigate connections between biological entities, like species, tissues, genes, diseases, etc.
3) SPARQL is a query language for RDF data. It lets you ask complex questions, such as:
- Find all cell lines with a *RAS (HRAS, NRAS, KRAS) mutation in p.Gly12
- Find all cell lines from animals belonging to the order "Carnivora"
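To make the three ideas above tangible, here is a toy Python sketch: triples as tuples, a "store" as a list, and a query as pattern matching with a join. Only the HeLa triple comes from the example above; the other names (`CellLineA`, `SpeciesX`, `derived_from_species`, `belongs_to_order`) are invented for illustration and are not Cellosaurus terms. A real triple store would answer the same question through SPARQL.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = [
    ("HeLa", "is_transformed_by", "Human papillomavirus type 18"),
    # Hypothetical facts, for illustration only:
    ("CellLineA", "derived_from_species", "SpeciesX"),
    ("CellLineB", "derived_from_species", "SpeciesY"),
    ("SpeciesX", "belongs_to_order", "Carnivora"),
    ("SpeciesY", "belongs_to_order", "Rodentia"),
]


def cell_lines_in_order(triples, order):
    """Join two patterns: species in the order, then cell lines from those species."""
    species = {s for s, _, _ in match(triples, p="belongs_to_order", o=order)}
    return sorted(
        s for s, _, o in match(triples, p="derived_from_species") if o in species
    )


print(match(triples, s="HeLa"))
print(cell_lines_in_order(triples, "Carnivora"))
```

The second function is the hand-written equivalent of a two-pattern query: the shared variable (the species) is what connects the little facts into a path through the graph.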
More specifically, we now offer 6 new options from the Tool - API submenu:
1) SPARQL Editor (https://lnkd.in/eF2QMsYR). The SPARQL Editor is a tool designed to assist users in developing their SPARQL queries.
2) SPARQL Service (https://lnkd.in/eZ-iN7_e). The SPARQL service is the web service that accepts SPARQL queries over HTTP and returns results from the RDF dataset.
3) Cellosaurus Ontology (https://lnkd.in/eX5ExjMe). An RDF ontology is a formal, structured representation of knowledge. It explicitly defines domain-specific concepts - such as classes and properties - enabling data to be described with meaningful semantics that both humans and machines can interpret. The Cellosaurus ontology is expressed in OWL.
4) Cellosaurus Concept Hopper (https://lnkd.in/e7CH5nj4). The Concept Hopper is a tool that provides an alternative view of the Cellosaurus ontology. It focuses on a single concept at a time - either a class or a property - and shows how that concept is linked to others within the ontology, as well as how it appears in the data.
5) Cellosaurus dereferencing service (https://lnkd.in/eSATMhGb). The RDF dereferencing service is the mechanism that, given a URI, returns an RDF description of the resource identified by that URI, enabling clients to retrieve structured, machine-readable data about the resource from the web in different formats.
6) Cellosaurus RDF files download (https://lnkd.in/emuEYnMD). This allows you to download the Cellosaurus RDF files in Turtle (ttl) format.
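For options 2 and 5, the following Python sketch shows how such requests are typically shaped; it only builds the requests and does not send them. The endpoint and resource addresses are placeholders (the real ones sit behind the short links above), and it is an assumption that the dereferencing service selects the format through the HTTP `Accept` header. The query-over-GET form follows the standard SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder addresses, not the real Cellosaurus URLs.
SPARQL_ENDPOINT = "https://example.org/sparql"
RESOURCE_IRI = "https://example.org/resource/123"


def sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def dereference_request(iri, media_type="text/turtle"):
    """Build a request for an RDF description of `iri` in the given format."""
    return Request(iri, headers={"Accept": media_type})


query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
req = sparql_request(SPARQL_ENDPOINT, query)
print(req.get_method(), req.full_url)
print(dereference_request(RESOURCE_IRI).get_header("Accept"))

# To actually run a request, pass it to urllib.request.urlopen(req)
# once the placeholders are replaced with real addresses.
```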
OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment
Ontology Alignment (OA) is fundamental for achieving semantic interoperability across diverse knowledge systems. We present OntoAligner, a comprehensive, modular, and robust Python toolkit for...
SousLesensVocables is a set of tools developed to manage thesaurus and ontology resources through SKOS, OWL and RDF standards and graph visualisation approaches
We contributed recently to the "awesome semantic shapes" repository. This is a community-curated list of RDF shape resources, be it validators, generators…
What if creating Linked Open Data was less like coding and more like writing? Could anyone extend the Semantic Web by sharing a document? Publish a knowledge…
Have you tried Croissant? If not, you are missing out. Using LLMs to generate knowledge graphs is an exciting area of exploration. My colleague Jesús Barrasa…