StrangerGraphs is a fan theory prediction engine that applies graph database analytics to the chaotic world of Stranger Things fan theories on Reddit.
The company scraped 150,000 posts and ran community detection algorithms to identify which Stranger Things fan groups have the best track records for predictions. Theories were mapped as a graph (234k nodes and 1.5M relationships) tracking characters, plot points, and speculation; natural language processing was then used to surface patterns across seasons. The resulting predictions are laid out in a visualization for further analysis. Top theories include ■■■ ■■■■■ ■■■■, ■■■ ■■■■■■■■ ■■ and ■■■■ ■■■■■■■■ ■■■ ■■ ■■■■. (Editor note: these theories have been redacted to avoid any angry emails about spoilers.)
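For intuition, community detection over a scraped post graph of this kind might look like the following networkx sketch; the usernames, edges, and weights are hypothetical, and the post does not say which algorithm StrangerGraphs actually used:

```python
import networkx as nx

# Hypothetical interaction graph: Reddit users are nodes; an edge means
# two users discussed the same theory thread, weighted by frequency.
G = nx.Graph()
G.add_weighted_edges_from([
    ("u/gate_watcher", "u/hawkins_lab", 3),
    ("u/hawkins_lab", "u/demodog", 5),
    ("u/demodog", "u/gate_watcher", 2),
    ("u/vecna_truther", "u/upside_clown", 4),
])

# Louvain community detection, one common choice at this scale.
communities = nx.community.louvain_communities(G, seed=42)
for i, group in enumerate(communities):
    print(f"theory camp {i}: {sorted(group)}")
```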
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
✨ #NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning. Combining contrastive learning and message passing markedly improves the features created by embedding graphs, and scales to huge graphs. It taught us a lot about graph feature learning 👇
Graphs can represent knowledge and have scaled to huge sizes (115M entities in Wikidata). How do we distill these into good downstream features, e.g. for machine learning? The challenge is to create feature vectors, and for this graph embeddings have been invaluable.
Our paper shows that message passing is a great tool to build feature vectors from graphs. As opposed to contrastive learning, message passing helps embeddings represent the large-scale structure of the graph (it gives Arnoldi-type iterations).
Our approach uses contrastive learning on a core subset of entities to capture the large-scale structure. Consistent with the knowledge-graph embedding literature, this step represents relations as operators on the embedding space. It also anchors the central entities.
Knowledge graphs have long-tailed entity distributions, with many weakly connected entities on which contrastive learning is under-constrained. For these, we propagate embeddings via the relation operators in a diffusion-like step, extrapolating from the central entities.
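A minimal sketch of such a propagation step, assuming TransE-style relation operators (translation vectors, so h + r ≈ t); the entities and the core/tail split are illustrative, not SEPAL's actual API:

```python
import numpy as np

dim = 8
rng = np.random.default_rng(0)
# Core entities already embedded by contrastive learning.
core_emb = {"Paris": rng.normal(size=dim), "France": rng.normal(size=dim)}
rel_emb = {"capital_of": rng.normal(size=dim), "located_in": rng.normal(size=dim)}
triples = [("Paris", "capital_of", "France"),
           ("Louvre", "located_in", "Paris")]   # "Louvre" is long-tail

emb = dict(core_emb)
for _ in range(3):  # a few diffusion-like sweeps
    updates = {}
    for h, r, t in triples:
        # TransE models h + r ≈ t, so relations act as translation operators.
        if h in emb and t not in core_emb:
            updates.setdefault(t, []).append(emb[h] + rel_emb[r])
        if t in emb and h not in core_emb:
            updates.setdefault(h, []).append(emb[t] - rel_emb[r])
    for e, vecs in updates.items():
        emb[e] = np.mean(vecs, axis=0)  # average incoming messages

print(sorted(emb))  # "Louvre" now has an embedding extrapolated from "Paris"
```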
To get a very efficient algorithm, we split the graph into overlapping, highly connected blocks that fit in GPU memory. Propagation then reduces to simple in-memory iterations, and we can embed huge graphs on a single GPU.
Splitting huge knowledge graphs into sub-parts is actually hard because of the mix of very highly connected nodes and a huge long tail that is hard to reach. We introduce a procedure that allows overlap between blocks, which relaxes the problem considerably.
Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs. It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable. Larger knowledge graphs give feature vectors with more downstream value.
We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility. We believe this is because link prediction only needs local structure, unlike downstream tasks.
The paper is fully reproducible, and we hope it will unleash more progress in knowledge-graph embedding.
We'll present at #NeurIPS and #EurIPS.
A Survey of Graph Retrieval-Augmented Generation for Customized...
Large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks, yet their application to specialized domains remains challenging due to the need for deep...
Building a Biomedical GraphRAG: When Knowledge Graphs Meet Vector Search
A RAG system for biomedical research that uses both vector search and knowledge graphs.
Turns out, you need both.
Vector databases, such as Qdrant, are excellent at handling semantic similarity, but they struggle with relationship queries.
The issue: Author networks, citations, and institutional collaborations aren't semantic similarities. They're structured relationships that don't live in embeddings.
The hybrid approach
I combined Qdrant for semantic retrieval with Neo4j for relationship queries, using OpenAI's tool-calling to orchestrate between them.
The workflow:
1️⃣ User asks a question
2️⃣ Qdrant retrieves semantically relevant papers
3️⃣ LLM analyzes the query and decides which graph enrichment tools to call
4️⃣ Neo4j returns structured relationship data
5️⃣ Both sources combine into one answer
The same query, run against the hybrid system, returns four specific collaborators with paper counts, plus relevant research context.
Implementation notes
I initially tried having the LLM generate Cypher queries directly, but tool-calling worked much better. The LLM decides which pre-built tool to call; the tools themselves contain reliable, hand-written Cypher, since LLMs are not yet good enough at Cypher query generation.
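A minimal sketch of this pattern, assuming the OpenAI Python SDK and the Neo4j driver; the tool name `get_collaborators`, its Cypher query, and the graph schema are illustrative, not the author's actual implementation:

```python
import json
from openai import OpenAI
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
client = OpenAI()

def get_collaborators(author: str) -> list[dict]:
    # Pre-built, hand-written Cypher: the LLM never writes queries itself.
    query = """
        MATCH (a:Author {name: $author})-[:WROTE]->(:Paper)<-[:WROTE]-(c:Author)
        RETURN c.name AS name, count(*) AS papers ORDER BY papers DESC LIMIT 5
    """
    with driver.session() as session:
        return [dict(record) for record in session.run(query, author=author)]

tools = [{
    "type": "function",
    "function": {
        "name": "get_collaborators",
        "description": "Find an author's top co-authors with paper counts.",
        "parameters": {
            "type": "object",
            "properties": {"author": {"type": "string"}},
            "required": ["author"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Who collaborates with Jane Doe?"}],
    tools=tools,
)
# The LLM picks the tool; we execute it and would feed the rows back
# alongside the Qdrant hits for the final combined answer.
for call in resp.choices[0].message.tool_calls or []:
    if call.function.name == "get_collaborators":
        print(get_collaborators(**json.loads(call.function.arguments)))
```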
For domains with complex relationships, such as biomedical research, legal documents, and enterprise knowledge, combining vector search with knowledge graphs gives you capabilities neither has alone.
I used the o word last week and it hit a few nerves. Ontologies bring context.
But context engineering is very poorly understood. Agent engineers talk about it and assume everyone is doing it, but almost everyone is winging it.
Here's what context engineering definitely is not: longer prompts.
What it actually is: the right information, with the right meaning, at the right time. Not more information, but the right information with the right meaning. Sounds super abstract.
That's why I made a brief video that actually breaks down how to load context.
Okay, not brief. But context needs context.
The Knowledge Graph Talent Shortage: Why Companies Can't Find the Skills They Desperately Need
In my previous posts, I showed how Google's Knowledge Graph gives them a major AI advantage (https://lnkd.in/d5ZpMYut), and how enterprises from IKEA to Siemens to AstraZeneca have been using knowledge graphs and now leverage them for GenAI applications (https://lnkd.in/dPhuUhFJ).
But here's the problem: we don't have enough people who know how to build them.
📊 The numbers tell the story. Job boards show thousands of open positions globally for ontology engineers, semantic web developers, and knowledge graph specialists. Yet these positions remain unfilled for months. Salaries for this expertise are rising, and technology vendors report inbound client calls instead of chasing business.
🤔 Why the shortage? The semantic web emerged in the early 2000s with technologies like RDF, OWL, and SPARQL. A small group of pioneers built this expertise.
I was part of that early wave. I contributed to the POSC Caesar Association oil and gas ontology, was certified as an ontology modeller, and participated in the W3C workshop hosted by Chevron in Houston in 2008. Later I led the Integrated Operations in the High North (IOHN) program with 23 companies, including ABB, Siemens, and Cisco, to increase semantic web knowledge within Equinor's vendor ecosystem. After IOHN, I stepped away for over a decade. The Knowledge Graph Alliance (KGA) drew me back.
Companies need people who can design ontologies, write SPARQL queries, map enterprise data to semantic standards, and integrate knowledge graphs with LLMs. These aren't skills you pick up in a weekend bootcamp.
🔄 What needs to change? Universities must integrate semantic knowledge graphs into core curriculum alongside AI and machine learning as requirements, not electives.
Here's something many don't realize: philosophy matters. Some of the best ontologists have philosophy degrees. Understanding how to represent knowledge requires training in logic and formal reasoning.
DAMA International®'s Data Management Body of Knowledge covers 11 knowledge areas, yet knowledge graphs remain absent. Adding them would legitimize the discipline.
Industry-academia bridges are critical. Organizations like the KGA bring together industry leaders with research organizations and academia. We need more such collaborations.
💡 The opportunity: If you're a data engineer or data scientist looking for a career differentiator, semantic web skills are your ticket.
🎯 The bottom line: Knowledge graphs aren't optional for industrial-scale GenAI. But you need the people who understand them.
While reports document tech talent shortages, the semantic web skills gap remains largely undocumented as companies struggle to fill thousands of positions.
What's your experience with the shortage? Are you hiring? Upskilling? Teaching this?
#KnowledgeGraphs #SemanticWeb #AI #GenAI #TalentShortage #SkillsGap #Ontology #DataScience #Philosophy #DigitalTransformation
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
Just as matter is formed from atoms, and galaxies are formed from stars, knowledge is likely to be formed from atomic knowledge graphs.
Atomic knowledge graphs were born from our intention to solve a common problem in LLM-based KG construction methods: exhaustivity and stability. Often, these methods produce unstable KGs that change when rerunning the construction process, even without changing anything. Moreover, they fail to capture all facts in the input documents and usually overlook the temporal and dynamic aspects of real-world data.
What is the solution? Atomic facts that are temporally aware.
Instead of constructing knowledge graphs from raw documents, we split them into atomic facts, which are self-contained and concise propositions. Temporal atomic KGs are constructed from each atomic fact. Then, we defined how the temporal atomic KGs would be merged at the atomic level and how the temporal aspects would be handled. We designed a binary merge algorithm that combines two TKGs and a parallel merge process that merges all TKGs simultaneously. The entire architecture operates in parallel.
ATOM employs dual-time modeling that distinguishes observation time from validity time and has 3 main modules:
- Module 1 (Atomic Fact Decomposition) splits input documents observed at time t into atomic facts using LLM-based prompting, where each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information.
- Module 2 (Atomic TKGs Construction) extracts 5-tuples in parallel from each atomic fact to construct atomic temporal KGs, while embedding nodes and relations and addressing temporal resolution during extraction.
- Module 3 (Parallel Atomic Merge) employs a binary merge algorithm to merge pairs of atomic TKGs through iterative pairwise merging in parallel until convergence, with three resolution phases: (1) entity resolution, (2) relation name resolution, and (3) temporal resolution that merges observation and validity time sets for relations with similar (e_s, r_p, e_o). The resulting TKG snapshot is then merged with the previous DTKG to yield the updated DTKG.
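A minimal sketch of the parallel pairwise-merge pattern; `merge_pair` is a hypothetical stand-in for ATOM's entity/relation/temporal resolution, and the dict-of-sets TKG representation is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def merge_pair(g1: dict, g2: dict) -> dict:
    """Hypothetical binary merge: union nodes and edges, then resolve
    entities, relation names, and observation/validity time sets."""
    merged = {"nodes": g1["nodes"] | g2["nodes"],
              "edges": g1["edges"] | g2["edges"]}
    # ... entity, relation-name, and temporal resolution would go here ...
    return merged

def parallel_merge(graphs: list[dict]) -> dict:
    # Iteratively merge pairs in parallel until one TKG snapshot remains.
    with ThreadPoolExecutor() as pool:
        while len(graphs) > 1:
            pairs = list(zip(graphs[0::2], graphs[1::2]))
            merged = list(pool.map(lambda p: merge_pair(*p), pairs))
            if len(graphs) % 2:          # odd graph out carries over
                merged.append(graphs[-1])
            graphs = merged
    return graphs[0]

atomic_tkgs = [{"nodes": {f"e{i}"}, "edges": set()} for i in range(5)]
snapshot = parallel_merge(atomic_tkgs)   # then merged with the previous DTKG
print(snapshot["nodes"])
```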
Results: Empirical evaluations show that ATOM achieves ~18% higher exhaustivity, ~17% better stability, and over 90% latency reduction compared to baseline methods (including iText2KG), demonstrating strong scalability potential for dynamic TKG construction.
Check out ATOM's architecture and code:
Preprint Paper: https://lnkd.in/dsJzDaQc
Code: https://lnkd.in/drZUyisV
Website: (coming soon)
Example use cases: (coming soon)
Special thanks to the dream team: Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau 📚📡
A Graph RAG (Retrieval-Augmented Generation) chat application that combines OpenAI GPT with knowledge graphs stored in GraphDB
After seeing yet another Graph RAG demo using Neo4j with no ontology, I decided to show what real semantic Graph RAG looks like.
The Problem with Most Graph RAG Demos:
Everyone's building Graph RAG with LPG databases (Neo4j, TigerGraph, ArangoDB, etc.) and calling it "knowledge graphs." But here's the thing:
Without formal ontologies, you don't have a knowledge graph—you just have a graph database.
The difference?
❌ LPG: Nodes and edges are just strings. No semantics. No reasoning. No standards.
✅ RDF/SPARQL: Formal ontologies (RDFS/OWL) that define domain knowledge. Machine-readable semantics. W3C standards. Built-in reasoning.
So I Built a Real Semantic Graph RAG
Using:
- Microsoft Agent Framework - AI orchestration
- Formal ontologies - RDFS/OWL knowledge representation
- Ontotext GraphDB - RDF triple store
- SPARQL - semantic querying
- GPT-5 - ontology-aware extraction
It's all on GitHub, a simple template to use as boilerplate for your project:
The "Jaguar problem":
What does "Yesterday I was hit by a Jaguar" really mean? It is impossible to know without concept awareness. To demonstrate why ontologies matter, I created a corpus with mixed content:
🐆 Wildlife jaguars (Panthera onca)
🚗 Jaguar cars (E-Type, XK-E)
🎸 Fender Jaguar guitars
I fed this to GPT-5 along with a jaguar conservation ontology.
The result? The LLM automatically extracted ONLY wildlife-related entities—filtering out cars and guitars—because it understood the semantic domain from the ontology.
No post-processing. No manual cleanup. Just intelligent, concept-aware extraction.
This is impossible with LPG databases because they lack formal semantic structure. Labels like (:Jaguar) are just strings—the LLM has no way to know if you mean the animal, car, or guitar.
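A minimal sketch of the difference, using rdflib; the ontology triples and entity names are illustrative, not the repo's actual data:

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/")
g = Graph()

# A tiny ontology: the conservation domain only knows about animals.
g.add((EX.Jaguar, RDFS.subClassOf, EX.Animal))
g.add((EX.jaguar_001, RDF.type, EX.Jaguar))
g.add((EX.jaguar_001, RDFS.label, Literal("Panthera onca individual")))
# A car gets a different class: "Jaguar" is no longer just a string label.
g.add((EX.EType, RDF.type, EX.Car))

# SPARQL with a property path: fetch only entities whose class is
# (a subclass of) Animal, so the car never matches.
q = """
SELECT ?e ?label WHERE {
  ?e a ?cls . ?cls rdfs:subClassOf* ex:Animal .
  OPTIONAL { ?e rdfs:label ?label }
}
"""
for row in g.query(q, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.e, row.label)   # only jaguar_001; the E-Type is filtered out
```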
Knowledge Graphs = "Data for AI"
LLMs don't need more data—they need structured, semantic data they can reason over.
That's what formal ontologies provide:
✅ Domain context
✅ Class hierarchies
✅ Property definitions
✅ Relationship semantics
✅ Reasoning rules
This transforms Graph RAG from keyword matching into true semantic retrieval.
Check out the full implementation; the repo includes:
Complete Graph RAG implementation with Microsoft Agent Framework
Working jaguar conservation knowledge graph
Jupyter notebook: ontology-aware extraction from mixed-content text
https://lnkd.in/dmf5HDRm
And if you have gotten this far, you realize that most of this post is written by Cursor ... That goes for the code too. 😁
Your Turn:
I know this is a contentious topic. Many teams are heavily invested in LPG-based Graph RAG. What are your thoughts on RDF vs. LPG for Graph RAG? Drop a comment below!
#GraphRAG #KnowledgeGraphs #SemanticWeb #RDF #SPARQL #AI #MachineLearning #LLM #Ontology #KnowledgeRepresentation #OpenSource #neo4j #graphdb #agentic-framework #ontotext #agenticai
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs
✅ Some state-of-the-art methods for knowledge graph (KG) construction that implement incrementality build a graph from around 3k atomic facts in 4–7 hours, while ATOM achieves the same in just 20 minutes using only 8 parallel threads and a batch size of 40 for asynchronous LLM API calls.
❓ What’s the secret behind this performance?
👉 The architecture. The parallel design.
❌ Incrementality in KG construction was key, but it significantly limits scalability: the method must first build the KG and compare it with the previous one before moving on to the next chunk. That's why we eliminated this constraint, which we had in iText2KG.
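A minimal sketch of what batched asynchronous LLM calls with bounded parallelism might look like; the 8-worker and batch-of-40 figures come from the post above, but `extract_atomic_tkg` is a hypothetical stand-in for the real extraction call:

```python
import asyncio

async def extract_atomic_tkg(fact: str) -> dict:
    # Hypothetical stand-in for an async LLM API call returning the
    # 5-tuples extracted from one atomic fact.
    await asyncio.sleep(0.01)
    return {"fact": fact, "tuples": []}

async def run_all(facts: list[str], workers: int = 8, batch: int = 40):
    sem = asyncio.Semaphore(workers)      # at most 8 calls in flight

    async def bounded(fact: str) -> dict:
        async with sem:
            return await extract_atomic_tkg(fact)

    results = []
    for i in range(0, len(facts), batch):  # fire off batches of 40
        chunk = facts[i:i + batch]
        results += await asyncio.gather(*(bounded(f) for f in chunk))
    return results

facts = [f"atomic fact {i}" for i in range(120)]
tkgs = asyncio.run(run_all(facts))
print(len(tkgs))   # 120 atomic TKGs, ready for the parallel merge
```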
❓ Why is scalability so important? The short answer: real-time analytics.
Fast dynamic TKG construction enables LLMs to reason over them and generate responses instantly, in real time.
Discover more secrets behind this parallel architecture by reading the full paper (link in the first comment).
Beyond RDF vs LPG: Operational Ontologies, Hybrid Semantics, and Why We Still Chose a Property Graph | LinkedIn
How to stay sane about “semantic Graph RAG” when your job is shipping reliable systems, not winning ontology theology wars. You don’t wake up in the morning thinking about OWL profiles or SPARQL entailment regimes.
Knowledge Graphs and GraphRAG have sorta taken over my life the last two months or so, so I thought I would share some very important books for learners and builders
Knowledge Graphs: I’m going to enjoy this KG book a lot more now. It’s simple reading, in my opinion.
Text as Data: if you work in Data Science and AI, just buy this book right now and then read it. You need to know this. This is my favorite NLP book.
Orange Book (Sorry, long title): that is the best builder book I have found so far. It shows how to build with GraphRAG, and you should check it out. I really enjoyed reading this book and use it all the time.
Just wanted to make some recommendations as I am looking at a lot of my books for ideas, lately. These are diamonds. Find them where you like to shop for books!
#100daysofnetworks
Pseudo-Knowledge Graphs for Better RAG | by Devashish Datt Mamgain | Oct, 2025 | Towards AI
Pseudo-Knowledge Graphs for Better RAG Retrieval-Augmented Generation (RAG) was supposed to give Large Language Models perfect memory: ask a question, fetch the exact facts, and generate a fluent and …
Turn Text Into a Knowledge Graph with 70B LLM on DGX Spark
Looking to run local GraphRAG or other graph analytics use cases? With DGX Spark, you can prepare your local text files for graph use cases at your desk. In t...
Text2KGBench-LettrIA: A Refined Benchmark for Text2Graph Systems
🚀 LLMs can be powerful tools for extracting information from texts and automatically populating knowledge graphs guided by ontologies given as inputs. BUT how good are they? To answer this question, we need benchmarks!
💡 With Lettria, we built the Text2KGBench-LettrIA benchmark covering 19 different ontologies in various domains (company, film, food, politician, sports, monument, etc.), consisting of nearly 5k sentences strictly annotated with triples conforming to these ontologies (208 classes, 426 properties) and yielding more than 17k triples.
What's more, we threw a lot of compute at comparing the performance and efficiency of numerous closed LLMs and variants (GPT-4, Claude 3, Gemini) and numerous fine-tuned open-weights models (Mistral 3, Qwen 3, Gemma 3, Phi 4).
✨ Key takeaway: when provided with high-quality data, fine-tuned open models largely outperform their larger, proprietary counterparts!
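For intuition, text-to-KG benchmarks of this kind typically score extracted triples against gold annotations with set-based precision/recall/F1. A minimal sketch of that convention (not necessarily the paper's exact protocol):

```python
def triple_f1(predicted: set[tuple], gold: set[tuple]) -> dict:
    # Exact-match scoring over (subject, property, object) triples.
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

gold = {("Lyon", "locatedIn", "France"), ("Lyon", "population", "522969")}
pred = {("Lyon", "locatedIn", "France"), ("Lyon", "mayor", "G. Doucet")}
print(triple_f1(pred, gold))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```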
📄 Curious about the detailed results?
Read our paper at https://lnkd.in/e-EZCjWm
See our presentation at https://lnkd.in/eEdCCpdA, which I just gave at the Knowledge Base Construction from Pre-Trained Language Models Workshop, colocated with ISWC, the International Semantic Web Conference.
Want to use these results in your operations? Sign up to use the newly released PERSEUS model: https://lnkd.in/e7exyJHc
Joint work with Julien PLU, Oscar Moreno Escobar, Edouard Trouillez, Axelle Gapin, Pasquale Lisena, Thibault Ehrhart
#iswc2025 #LLMs #KnowledgeGraphs #NLP #Research
EURECOM, Charles Borderie
The audiobook version of "Knowledge Graphs and LLMs in Action" is now available
🎧 Exciting news! The audiobook version of "Knowledge Graphs and LLMs in Action" is now available!
Are you busy but would love to learn how to build powerful and explainable AI solutions? No problem! Manning has just released the audio version of our book.
Now you can listen while you're:
- Running and training for your next marathon 🏃
- Commuting to the office 🚗
- Sitting in the parking lot waiting for your kids to finish their violin lesson 🎻
Your schedule is packed, but that shouldn't stop you from mastering these powerful AI techniques.
Get your copy here: https://hubs.la/Q03MVhhk0
And don't forget to use discount code: lagraphs40 for 40% off!
Clever solutions for smart people.
Knowledge Graphs in the Era of Large Language Models (KGELL)
Knowledge Graphs (KGs) have gained attention due to their ability to represent structured and interlinked information. KGs represent knowledge in the form of relations between entities, referred to as...
Cognee - AI Agents with LangGraph + cognee: Persistent Semantic Memory
Build AI agents with LangGraph and cognee: persistent semantic memory across sessions for cleaner context and higher accuracy. See the demo—get started now.