Summary of what I learned at Connected Data London (CDL2025). The winners are clear: the companies that have been quietly investing in knowledge graphs for the past decade, long before the current AI wave.
I just stepped off the stage at Connected Data London, where we talked about the black box of AI and the critical role of ontologies. We ended that talk by saying that we, as a community, have a responsibility to cut through the noise. To offer clarity.
That is why today, we're launching The Knowledge Graph Academy.
For too long, education in semantic technology has tended to sit at one of two extremes: highly abstract academic theory, or tool-focused training that fails to teach the underlying principles.
We are building something different. Where educational rigour meets real-world practice.
And I’m not doing this alone. If we are going to define the field, we need the leaders who are actually out there building it. I am incredibly proud to announce that I’ve teamed up with two of the sharpest minds in the industry to lead this programme with me:
🔵 Katariina Kari (Lead Ontologist): Katariina has spent years building KG teams at retail giants. She knows exactly how to capture business expertise to drive ROI. She’s the master of the philosophy: "a little semantics goes a long way."
🔵 Jessica Talisman (Senior KG Consultant): With 25+ years in data architecture, Jessica is a true veteran of the trenches. She’s a LinkedIn Top Voice, an expert on the W3C SKOS standard, and the creator of the 'Ontology Pipeline' framework.
This isn't just training. It’s a shift in mindset.
This course doesn't just teach you which syntax to use or which buttons to press. The Knowledge Graph Academy is designed to change how you think.
Whether you are a practitioner, a leader shaping AI strategy, or someone looking to pivot your career: this is your invitation.
Let’s turn ideas into understanding, and understanding into impact.
⭕ The Knowledge Graph Academy: https://lnkd.in/ecQBMCg3
StrangerGraphs is a fan theory prediction engine that applies graph database analytics to the chaotic world of Stranger Things fan theories on Reddit.
The company scraped 150,000 posts and ran community detection algorithms to identify which Stranger Things fan groups have the best track records for predictions. Theories were mapped as a graph (234k nodes and 1.5M relationships) tracking characters, plot points, and speculation, and natural language processing was then used to surface patterns across seasons. These predictions are then mapped out in a visualization for further analysis. Top theories include ■■■ ■■■■■ ■■■■, ■■■ ■■■■■■■■ ■■ and ■■■■ ■■■■■■■■ ■■■ ■■ ■■■■. (Editor note: these theories have been redacted to avoid any angry emails about spoilers.)
A series of posts about knowledge graphs and ontology design patterns
The first in a series of posts about knowledge graphs and the ontology design patterns I swear by. They will walk you through how we at Yale went from a challenge from leadership (build a system that allows discovery of cultural heritage objects across our libraries, archives, and museums) to a fully functioning, easy-to-use, easy-to-maintain, extremely robust, public knowledge graph.
*The 10 Design Principles to Live By*
1. Scope design through shared use cases
2. Design for international use
3. Make easy things easy, complex things possible
4. Avoid dependency on specific technologies
5. Use REST / Don’t break the web / Don’t fear the network
6. Design for JSON-LD, using LOD principles
7. Follow existing standards & best practices, when possible
8. Define success, not failure
9. Separate concerns, keep APIs & systems loosely coupled
10. Address concerns at the right level
You must first agree on your design principles and priorities. These are crucial because when the inevitable conflicts of opinion arise, you have a set of neutral requirements to compare the different options against.
(1) The first keeps you honest: good ideas remain just ideas if they don't advance your business and use cases. Keeping to scope is critical, as ontologies have a tendency to expand uncontrollably, reducing usability and maintainability.
(2) Internationalization of knowledge is important because your audience and community don't all speak your language or come from your culture. If you limit your language, you limit your potential.
(3) Ensure that your in-scope edge cases aren't lost, but that in solving them, you haven't made the core functionality more complicated than it needs to be. If your KG isn't usable, then it won't be used.
(4) Don't build for a specific software environment, because that environment is going to change, probably before you get to production. Locking yourself in is the quickest way to obsolescence and oblivion.
(5) Don't try to pack everything a consuming application might need into a single package; browsers and apps deal just fine with hundreds of HTTP requests, especially with web caches.
(6) JSON-LD is the serialization to use, as devs use JSON all the time, and those devs need to build applications that consume your knowledge. Usability first!
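For instance, a minimal JSON-LD record shows why plain-JSON developers can consume linked data without special tooling (the context, IRIs, and property names here are made up for illustration, not Yale's actual vocabulary):

```json
{
  "@context": {
    "Painting": "https://example.org/vocab/Painting",
    "title": "https://example.org/vocab/title",
    "creator": { "@id": "https://example.org/vocab/creator", "@type": "@id" }
  },
  "@id": "https://example.org/object/1",
  "@type": "Painting",
  "title": "Night Watch",
  "creator": "https://example.org/person/rembrandt"
}
```

To a JSON developer this is an ordinary object; to a linked-data consumer the `@context` turns the same keys into full IRIs.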
(7) Standards are great... especially as there are so many of them. Don't get tied up trying to follow a standard that isn't right for you, but don't reinvent the wheel unnecessarily.
(8) Define the ontology/API positively, but don't mandate errors for every other situation, or you'll make versioning impossible. Allow extensions to co-exist, as tomorrow they might be core.
(9) Don't require a single monolith if you can avoid it. If a consuming app only needs half of the functionality, don't make it implement everything.
(10) If there's a problem with the API, don't work around it in the ontology, or vice versa. Solve model problems in the model, vocabulary problems in the vocabulary, and API problems in the API.
After a lively conversation with Juan Sequeda and others at Connected Data London 2025 about how to start with ontologies at business clients without relying on yet another KG platform, I have now started rolling (eh, vibe coding 🤓) my own Ontology Builder as a simple Streamlit app! Have a look and collaborate if you like.
https://lnkd.in/egGZJHiP
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
✨ #NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning. Combining contrastive learning and message passing markedly improves features created from graph embeddings, and scales to huge graphs. It taught us a lot about graph feature learning 👇
Graphs can represent knowledge and have scaled to huge sizes (115M entities in Wikidata). How do we distill these into good downstream features, e.g. for machine learning? The challenge is to create feature vectors, and graph embeddings have been invaluable for this.
Our paper shows that message passing is a great tool to build feature vectors from graphs. As opposed to contrastive learning, message passing helps embeddings represent the large-scale structure of the graph (it yields Arnoldi-type iterations).
Our approach uses contrastive learning on a core subset of entities to capture large-scale structure. Consistent with the knowledge-graph embedding literature, this step represents relations as operators on the embedding space. It also anchors the central entities.
Knowledge graphs have long-tailed entity distributions, with many weakly-connected entities on which contrastive learning is under-constrained. For these, we propagate embeddings via the relation operators in a diffusion-like step, extrapolating from the central entities.
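The propagation idea can be sketched in a few lines. This is a conceptual illustration, not the paper's SEPAL implementation: it assumes TransE-style relation operators, where a relation acts as a translation on the embedding space (tail ≈ head + relation), and diffuses embeddings from the anchored core entities out to the long tail.

```python
# Conceptual sketch (not the paper's code): propagating embeddings to
# long-tail entities via translation-style relation operators.

def propagate(core_emb, rel_emb, edges, n_steps=2):
    """Diffuse embeddings from anchored core entities to the long tail.

    core_emb: dict entity -> embedding (list of floats), fixed anchors
    rel_emb:  dict relation -> translation vector (same dimension)
    edges:    list of (head, relation, tail) triples
    """
    emb = dict(core_emb)  # core entities stay anchored throughout
    for _ in range(n_steps):
        incoming = {}  # tail entity -> list of candidate embeddings
        for h, r, t in edges:
            if h in emb and t not in core_emb:
                # translate the head embedding along the relation operator
                cand = [a + b for a, b in zip(emb[h], rel_emb[r])]
                incoming.setdefault(t, []).append(cand)
        for t, cands in incoming.items():
            # average the propagated candidates (diffusion-like update)
            emb[t] = [sum(vals) / len(cands) for vals in zip(*cands)]
    return emb
```

With a single core entity `Paris` at `[1.0, 0.0]` and a `capital_of` operator `[0.0, 1.0]`, the triple `(Paris, capital_of, France)` places `France` at `[1.0, 1.0]` after propagation.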
To get a very efficient algorithm, we split the graph into overlapping, highly-connected blocks that fit in GPU memory. Propagation then reduces to simple in-memory iterations, and we can embed huge graphs on a single GPU.
Splitting huge knowledge graphs into sub-parts is actually hard because of the mix of very highly-connected nodes and a huge, hard-to-reach long tail. We introduce a procedure that allows overlap between the blocks, which greatly reduces the difficulty.
Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs. It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable. Larger knowledge graphs give feature vectors that provide downstream value.
We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility. We believe this is because link prediction only needs local structure, unlike downstream tasks.
The paper is fully reproducible, and we hope it will unleash more progress in knowledge graph embedding.
We'll present at #NeurIPS and #EurIPS
# ⚙️ Ontology Evolution: When New Knowledge Challenges Old Categories
When new entities don’t fit existing assumptions, your ontology must **evolve logically**, not patch reactively.
---
## 🧩 The Challenge: Going Beyond Binary Thinking
Early models often start simple:
**Appliance → ElectricityConsumer OR NonElectricAppliance**
But what happens when a *WindTurbine* appears?
It **produces** more electricity than it consumes.
---
## 🔧 Step 1: Extend the Energy Role Taxonomy
To reflect the real world:
```
EnergyRole
├─ ElectricityConsumer
├─ ElectricityProducer
├─ ElectricityProsumer (both)
└─ PassiveComponent (neither)
```
Now we can classify correctly:
- 🏠 HVAC → ElectricityConsumer
- ☀️ Solar Panel → ElectricityProducer
- 🔋 Battery → ElectricityProsumer
- 🪟 Window → PassiveComponent
This simple hierarchy shift restores consistency — every new entity has a logical home.
---
## 🧠 Step 2: Add Axioms for Automated Reasoning
Instead of manual assignment, let the reasoner decide.
Example rule set:
- If `producesPower > consumesPower` → ElectricityProducer
- If `consumesPower > producesPower` → ElectricityConsumer
- If both > 0 → ElectricityProsumer
- If both = 0 → PassiveComponent
💡 **Outcome:** The system adapts dynamically to new data while preserving logical harmony.
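The rule set above can be sketched in plain Python (an OWL reasoner would derive the same classes from axioms; the function and role names here are illustrative, and the prosumer check runs first to disambiguate entities that both produce and consume):

```python
# Minimal sketch of rule-based EnergyRole classification from
# quantitative features. Names are illustrative, not a standard API.

def classify_energy_role(produces_power: float, consumes_power: float) -> str:
    """Assign an EnergyRole from measured power flows (in kW)."""
    if produces_power > 0 and consumes_power > 0:
        return "ElectricityProsumer"   # both produces and consumes
    if produces_power > consumes_power:
        return "ElectricityProducer"
    if consumes_power > produces_power:
        return "ElectricityConsumer"
    return "PassiveComponent"          # neither produces nor consumes

# The examples from Step 1:
assert classify_energy_role(0.0, 3.5) == "ElectricityConsumer"    # HVAC
assert classify_energy_role(5.0, 0.0) == "ElectricityProducer"    # solar panel
assert classify_energy_role(2.0, 2.0) == "ElectricityProsumer"    # battery
assert classify_energy_role(0.0, 0.0) == "PassiveComponent"       # window
```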
---
## ⚡ Step 3: Support Evolution, Don’t Break History
When expanding ontologies:
1. Preserve backward compatibility — old data must still make sense.
2. Maintain logical consistency — define disjoint and equivalent classes clearly.
3. Enable gradual migration — version and document each model improvement.
4. Use reasoning — automate classification from quantitative features.
Evolution isn’t about tearing down—it’s about **strengthening the structure**.
---
## 🌍 Real-World Analogy
Think of this like upgrading an energy grid:
You don’t replace the whole system — you extend the schema to accommodate solar panels, batteries, and wind farms while ensuring the old consumers still work.
Ontology evolution works the same way — graceful adaptation ensures **stability + intelligence**.
---
## 💬 Key Takeaway
The *WindTurbine* example shows why **ontology evolution** is essential:
- Models must expand beyond rigid assumptions.
- Axiomatic rules make adaptation automatic.
- Logic-based flexibility sustains long-term scalability.
In short: **don’t model just the present — model the principles of change.**
#Ontology #KnowledgeEngineering #KnowledgeGraphs #ExplainableAI #OntologyEvolution #NeuroSymbolicAI #AITransformation #KnowledgeManagement
👉 Follow me for Knowledge Management and Neuro Symbolic AI daily nuggets.
👉 Join my group for more insights and community discussions [Join the Group](https://lnkd.in/d9Z8-RQd)
Introducing the ONTO-TRON-5000. A personal project that allows users to build their ontologies right from their data
Introducing the ONTO-TRON-5000. A personal project that allows users to build their ontologies right from their data! The onto-tron is built with the Basic Formal Ontology (BFO) and Common Core Ontologies (CCO) as semantic frameworks for classification. This program emphasizes the importance of design patterns as best practices for ontology documentation and combines them with machine readability. Simply upload your CSV, set the semantic types of your columns, and continuously build your ontology on top. The program has 3 options for extraction: RDF, R2RML, and the Mermaid Live Editor syntax if you would like to further develop your design pattern there. Included is a BFO/CCO ontology viewer, allowing you to explore the hierarchy and understand how terms are used, no Protégé required. This is the alpha version, and I would love feedback as there is a growing list of features to be added. Included in the README are instructions for manual installation and Docker. Enjoy!
https://lnkd.in/ehrDwVrf
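The core "CSV columns plus semantic types in, RDF out" idea can be sketched in a few lines. This is a hypothetical illustration, not ONTO-TRON-5000's actual code: the namespace and the column-to-predicate mapping are made up.

```python
# Hypothetical sketch: map CSV columns to predicates, emit RDF N-Triples.
import csv
import io

EX = "http://example.org/"  # illustrative namespace, not BFO/CCO IRIs

def rows_to_ntriples(csv_text, id_column, column_properties):
    """column_properties: dict mapping column name -> predicate IRI."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}{row[id_column]}>"
        for col, predicate in column_properties.items():
            # each non-id column becomes a literal-valued triple
            triples.append(f'{subject} <{predicate}> "{row[col]}" .')
    return "\n".join(triples)

data = "id,name\n42,Widget\n"
print(rows_to_ntriples(data, "id", {"name": EX + "label"}))
# → <http://example.org/42> <http://example.org/label> "Widget" .
```

A real tool would also classify each column against an upper ontology and emit typed literals; this only shows the shape of the transformation.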
A Survey of Graph Retrieval-Augmented Generation for Customized...
Large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks, yet their application to specialized domains remains challenging due to the need for deep...
Building a Biomedical GraphRAG: When Knowledge Graphs Meet Vector Search
A RAG system for biomedical research that uses both vector search and knowledge graphs.
Turns out, you need both.
Vector databases, such as Qdrant, are excellent at handling semantic similarity, but they struggle with relationship queries.
𝐓𝐡𝐞 𝐢𝐬𝐬𝐮𝐞: Author networks, citations, and institutional collaborations aren't semantic similarities. They're structured relationships that don't live in embeddings.
𝐓𝐡𝐞 𝐡𝐲𝐛𝐫𝐢𝐝 𝐚𝐩𝐩𝐫𝐨𝐚𝐜𝐡
I combined Qdrant for semantic retrieval with Neo4j for relationship queries, using OpenAI's tool-calling to orchestrate between them.
The workflow:
1️⃣ User asks a question
2️⃣ Qdrant retrieves semantically relevant papers
3️⃣ LLM analyzes the query and decides which graph enrichment tools to call
4️⃣ Neo4j returns structured relationship data
5️⃣ Both sources combine into one answer
The same query through the hybrid system returns 4 specific collaborators with paper counts, plus relevant research context.
𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐧𝐨𝐭𝐞𝐬
I initially tried having the LLM generate Cypher queries directly, but tool-calling worked much better. The LLM decides which pre-built tool to call; the tools themselves contain reliable Cypher queries, and LLMs are not yet good enough at Cypher query generation.
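The pre-built-tool pattern can be sketched as follows. This is an illustrative sketch, not the author's code: the tool names, descriptions, graph schema, and Cypher text are all hypothetical, and `session` is assumed to behave like a Neo4j driver session (`session.run(cypher, **params)`).

```python
# Illustrative sketch of tool-calling over vetted Cypher queries.
# Each tool wraps a fixed, tested Cypher query; the LLM only chooses
# which tool to call and with what arguments -- it never writes Cypher.

GRAPH_TOOLS = {
    "find_collaborators": {
        "description": "List co-authors of a given author with paper counts.",
        "cypher": (
            "MATCH (a:Author {name: $name})-[:AUTHORED]->(p:Paper)"
            "<-[:AUTHORED]-(c:Author) "
            "RETURN c.name AS collaborator, count(p) AS papers "
            "ORDER BY papers DESC"
        ),
    },
    "find_citations": {
        "description": "List papers that cite a given paper.",
        "cypher": "MATCH (p:Paper {id: $id})<-[:CITES]-(c:Paper) RETURN c.title",
    },
}

def run_tool(session, tool_name, **params):
    """Dispatch an LLM tool call to its vetted Cypher query."""
    tool = GRAPH_TOOLS[tool_name]
    return session.run(tool["cypher"], **params)
```

The tool descriptions double as the function schemas handed to the LLM's tool-calling API, so the model's only degrees of freedom are the tool choice and its parameters.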
For domains with complex relationships, such as biomedical research, legal documents, and enterprise knowledge, combining vector search with knowledge graphs gives you capabilities neither has alone.
The point of semantic modeling is to capture all of the detail, all of the knowledge, all of the information that becomes available to us
The point of semantic modeling is to capture all of the detail, all of the knowledge, all of the information that becomes available to us.
Here is an example of an extensive semantic model: an ontology plus taxonomies for greater depth.
This example is quite a comprehensive semantic model if you consider that it’s supported with nearly 100 sets of definitions and descriptions.
The model describes knowledge of many different terms, along with an understanding of how those terms are defined, described, and interrelated.
When it becomes difficult to absorb all at once, view it in layers:
- Begin with the simple knowledge graph—understand the nodes and the edges, the illustration of things and relationships among them.
- Then view the property graph to understand the facts that can be known about each thing and each relationship.
- Finally, extend it to include taxonomies to see classes and subclasses.
Another approach for layering might begin with the knowledge graph showing things and relationships, then add entity taxonomies to understand classes and subclasses of entities, and finally extend it to see properties and property taxonomies.
Don’t shy away from large or complex models! Simply plan to manage that detail and complexity by layering and segmenting the diagram. This provides the ability to look at subsets of the model without losing the comprehensive view of enterprise semantics.
Graphic sourced from the ‘Architecture and Design for Data Interoperability’ course by Dave Wells. https://lnkd.in/gtqThWdX