Alhamdulillah, ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
Just as matter is formed from atoms, and galaxies are formed from stars, knowledge is likely to be formed from atomic knowledge graphs.
Atomic knowledge graphs were born from our intention to solve two common problems in LLM-based KG construction methods: lack of exhaustivity and instability. These methods often produce unstable KGs that change when the construction process is rerun, even when nothing else has changed. Moreover, they fail to capture all facts in the input documents and usually overlook the temporal and dynamic aspects of real-world data.
What is the solution? Atomic facts that are temporally aware.
Instead of constructing knowledge graphs from raw documents, we split them into atomic facts: self-contained, concise propositions. A temporal atomic KG is constructed from each atomic fact. We then defined how temporal atomic KGs are merged at the atomic level and how temporal aspects are handled: a binary merge algorithm combines two TKGs, and a parallel merge process merges all TKGs simultaneously. The entire architecture operates in parallel.
ATOM employs dual-time modeling that distinguishes observation time from validity time and has 3 main modules:
- Module 1 (Atomic Fact Decomposition) splits input documents observed at time t into atomic facts using LLM-based prompting, where each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information.
- Module 2 (Atomic TKGs Construction) extracts 5-tuples in parallel from each atomic fact to construct atomic temporal KGs, while embedding nodes and relations and addressing temporal resolution during extraction.
- Module 3 (Parallel Atomic Merge) employs a binary merge algorithm to merge pairs of atomic TKGs through iterative pairwise merging in parallel until convergence, with three resolution phases: (1) entity resolution, (2) relation name resolution, and (3) temporal resolution that merges observation and validity time sets for relations with similar (e_s, r_p, e_o). The resulting TKG snapshot is then merged with the previous DTKG to yield the updated DTKG.
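To make the temporal resolution step in Module 3 concrete, here is a minimal Python sketch of a binary merge between two atomic TKGs followed by iterative pairwise merging. The tuple layout and class names are illustrative assumptions, and exact string matching stands in for ATOM's embedding-based entity and relation resolution, so treat this as a sketch rather than the paper's implementation.

```python
# Minimal sketch of a binary merge between two atomic temporal KGs (not ATOM's actual code).
# Facts sharing (subject, predicate, object) get their observation and validity time sets
# unioned; embedding-based entity/relation resolution is replaced by exact key matching.
from dataclasses import dataclass, field

@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    t_observed: set = field(default_factory=set)  # when the fact was observed (document timestamps)
    t_valid: set = field(default_factory=set)     # when the fact is stated to hold

def binary_merge(kg_a: list[TemporalFact], kg_b: list[TemporalFact]) -> list[TemporalFact]:
    merged = {(f.subject, f.predicate, f.obj): f for f in kg_a}  # reuses kg_a's facts for brevity
    for fact in kg_b:
        key = (fact.subject, fact.predicate, fact.obj)
        if key in merged:
            merged[key].t_observed |= fact.t_observed
            merged[key].t_valid |= fact.t_valid
        else:
            merged[key] = fact
    return list(merged.values())

def parallel_merge(atomic_kgs: list[list[TemporalFact]]) -> list[TemporalFact]:
    # Iterative pairwise merging until a single TKG snapshot remains; in a real system each
    # round of binary merges would be dispatched to parallel workers.
    while len(atomic_kgs) > 1:
        merged = [binary_merge(a, b) for a, b in zip(atomic_kgs[0::2], atomic_kgs[1::2])]
        if len(atomic_kgs) % 2:           # carry an unpaired TKG forward
            merged.append(atomic_kgs[-1])
        atomic_kgs = merged
    return atomic_kgs[0]
```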
Results: Empirical evaluations show that ATOM achieves ~18% higher exhaustivity, ~17% better stability, and over 90% latency reduction compared to baseline methods (including iText2KG), demonstrating strong scalability potential for dynamic TKG construction.
Check out ATOM's architecture and code:
Preprint Paper: https://lnkd.in/dsJzDaQc
Code: https://lnkd.in/drZUyisV
Website: (coming soon)
Example use cases: (coming soon)
Special thanks to the dream team: Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau 📚📡
Cognee - AI Agents with LangGraph + cognee: Persistent Semantic Memory
Build AI agents with LangGraph and cognee: persistent semantic memory across sessions for cleaner context and higher accuracy. See the demo—get started now.
Is OpenAI quietly moving toward knowledge graphs?
Yesterday’s OpenAI DevDay was all about new no-code tools to create agents. Impressive. But what caught my attention wasn’t what they announced… it was what they didn’t talk about.
During the summer, OpenAI released a Cookbook update introducing the concept of Temporal Agents (see below) and connecting it to Subject–Predicate–Object triples: the very foundation of a knowledge graph.
If you’ve ever worked with graphs, you know this means something big:
they’re not just building agents anymore; they’re building memory, relationships, and meaning.
When you see “London – isCapitalOf – United Kingdom” in their official docs, you realize they’re experimenting with how to represent knowledge itself.
And with any good knowledge graph… comes an ontology.
So here’s my prediction:
ChatGPT-6 will come with a built-in graph that connects everything about you.
The question is: do you want their AI to know everything about you?
Or do you want to build your own sovereign AI, one that you own, built from open-source intelligence and collective knowledge?
Would love to know what you think. Is that me hallucinating or is that a weak signal?👇
Algorithmic vs. Symbolic Reasoning: Is Graph Data Science a critical, transformative layer for GraphRAG?
Is Graph Data Science a critical, transformative layer for GraphRAG? The field of enterprise Artificial Intelligence (AI) is undergoing a significant architectural evolution. The initial enthusiasm for Large Language Models (LLMs) has matured into a pragmatic recognition of their limitations, particularly…
Today, I'd like to introduce the GitLab Knowledge Graph. This release includes a code indexing engine, written in Rust, that turns your codebase into a live, embeddable graph database for LLM RAG. You can install it with a simple one-line script, parse local repositories directly in your editor, and connect via MCP to query your workspace and over 50,000 files in under 100 milliseconds.
We also saw GKG agents scoring up to 10% higher on the SWE-Bench-lite benchmarks, with just a few tools and a small prompt added to opencode (an open-source coding agent). On average, we observed a 7% accuracy gain across our eval runs, and GKG agents were able to solve new tasks compared to the baseline agents. You can read more from the team's research here https://lnkd.in/egiXXsaE.
This release is just the first step: we aim for this local version to serve as the backbone of a Knowledge Graph service that enables you to query the entire GitLab Software Development Life Cycle—from an Issue down to a single line of code.
I am incredibly proud of the work the team has done. Thank you, Michael U., Jean-Gabriel Doyon, Bohdan Parkhomchuk, Dmitry Gruzd, Omar Qunsul, and Jonathan Shobrook. You can watch Bill Staples and me present this and more in the GitLab 18.4 release here: https://lnkd.in/epvjrhqB
Try today at: https://lnkd.in/eAypneFA
Roadmap: https://lnkd.in/eXNYQkEn
Watch more below for a complete, in-depth tutorial on what we've built:
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
Why Current AI Search Falls Short When You Need Real Answers
What happens when you ask an AI system a complex question that requires connecting multiple pieces of information? Most current approaches retrieve some relevant documents, generate an answer, and call it done. But this single-pass strategy often misses critical evidence.
👉 The Problem with Shallow Retrieval
Traditional retrieval-augmented generation (RAG) systems work like a student who only skims the first few search results before writing an essay. They grab what seems relevant on the surface but miss deeper connections that would lead to better answers.
When researchers tested these systems on complex multi-hop questions, they found a consistent pattern: the AI would confidently provide answers based on incomplete evidence, leading to logical gaps and missing key facts.
👉 A New Approach: Deep Searching with Dual Channels
Researchers from IDEA Research and Hong Kong University of Science and Technology developed GraphSearch, which works more like a thorough investigator than a quick searcher.
The system breaks down complex questions into smaller, manageable pieces, then searches through both text documents and structured knowledge graphs. Think of it as having two different research assistants: one excellent at finding descriptive information in documents, another skilled at tracing relationships between entities.
👉 How It Actually Works
Instead of one search-and-answer cycle, GraphSearch uses six coordinated modules:
Query decomposition splits complex questions into atomic sub-questions
Context refinement filters out noise from retrieved information
Query grounding fills in missing details from previous searches
Logic drafting organizes evidence into coherent reasoning chains
Evidence verification checks if the reasoning holds up
Query expansion generates new searches to fill identified gaps
The system continues this process until it has sufficient evidence to provide a well-grounded answer.
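To see how those six modules could fit together, here is a hedged, high-level Python sketch of the iterative loop; the object names, method signatures, and stopping rule are illustrative assumptions, not the paper's actual API.

```python
# Illustrative sketch of an agentic deep-search loop over dual retrieval channels.
# The `llm`, `text_retriever` and `graph_retriever` objects and their methods are
# assumed interfaces, not GraphSearch's real implementation.
def graph_search(question, llm, text_retriever, graph_retriever, max_rounds=3):
    sub_questions = llm.decompose(question)                # query decomposition
    evidence = []
    for _ in range(max_rounds):
        for sq in sub_questions:
            grounded = llm.ground(sq, evidence)            # query grounding: fill gaps from prior hops
            hits = text_retriever.search(grounded) + graph_retriever.search(grounded)
            evidence.extend(llm.refine(hits))              # context refinement: drop noisy passages
        draft = llm.draft_reasoning(question, evidence)    # logic drafting
        gaps = llm.verify(draft, evidence)                 # evidence verification
        if not gaps:                                       # enough evidence, stop searching
            break
        sub_questions = llm.expand_queries(gaps)           # query expansion targets missing evidence
    return llm.answer(question, evidence)
```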
👉 Real Performance Gains
Testing across six different question-answering benchmarks showed consistent improvements. On the MuSiQue dataset, for example, answer accuracy jumped from 35% to 51% when GraphSearch was integrated with existing graph-based systems.
The approach works particularly well under constrained conditions - when you have limited computational resources for retrieval, the iterative searching strategy maintains performance better than single-pass methods.
This research points toward more reliable AI systems that can handle the kind of complex reasoning we actually need in practice.
Paper: "GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation" by Yang et al.
HiRAG: Retrieval-Augmented Generation with Hierarchical Knowledge
Graph-based Retrieval-Augmented Generation (RAG) methods have significantly enhanced the performance of large language models (LLMs) in domain-specific tasks. However, existing RAG methods do not...
Youtu-GraphRAG: Vertically Unified Agents for Graph...
Graph retrieval-augmented generation (GraphRAG) has effectively enhanced large language models in complex reasoning by organizing fragmented knowledge into explicitly structured graphs. Prior...
If you could hire the smartest engineers and drop them into your code base, would you expect miracles overnight? No, of course not! Because even if they are the best of coders, they don’t have context on your project, engineering processes and culture, security and compliance rules, user personas, business priorities, etc. The same is true of the very best agents: they may know how to write (mostly) technically correct code, and have the context of your source code, but they’re still missing tons of context.
Building agents that can deliver high quality outcomes, faster, is going to require much more than your source code, rules and a few prompts. Agents need the same full lifecycle context your engineers gain after being months and years on the job. LLMs will never have access to your company’s engineering systems to train on, so something has to bridge the knowledge gap and it shouldn’t be you, one prompt at a time. This is why we're building what we call our Knowledge Graph at GitLab.
It's not just indexing files and code; it's mapping the relationships across your entire development environment. When an agent understands that a particular code block contains three security vulnerabilities, impacts two downstream services, and connects to a broader epic about performance improvements, it can make smarter recommendations and changes than just technically correct code.
This kind of contextual reasoning is what separates valuable AI agents from expensive, slow, LLM driven search tools. We're moving toward a world where institutional knowledge becomes portable and queryable. The context of a veteran engineer who knows "why we built it this way" or "what happened last time we tried this approach" can now be captured, connected, and made available to both human teammates and AI agents. See the awesome demos below and I look forward to sharing more later this month in our 18.4 beta update!
Enterprise Adoption of GraphRAG: The CRUD Challenge
GraphRAG and other retrieval-augmented generation (RAG) workflows are currently attracting a lot of attention. Their prototypes are impressive, with data ingestion, embedding generation, knowledge graph creation, and answer generation all functioning smoothly.
However, without proper CRUD (Create, Read, Update, Delete) support, these systems are limited to academic experimentation rather than becoming enterprise-ready solutions.
Update: knowledge is constantly evolving. Regulations change, medical guidelines are updated, and product catalogues are revised. If a system cannot reliably update its information, it will produce outdated answers and quickly lose credibility.
Delete: Incorrect or obsolete information must be deleted. In regulated industries such as healthcare, finance and law, retaining deleted data can lead to compliance issues. Without a deletion mechanism, incorrect or obsolete information can persist in the system long after it should have been removed.
This is an issue that many GraphRAG pilots face. Although the proof of concept looks promising, limitations become evident when someone asks, "What happens when the source of truth changes?"
While reading and creation are straightforward, updates and deletions determine whether a system remains a prototype or becomes a reliable enterprise tool. Most implementations stop at 'reading', and while retrieval and answer generation work, real-world enterprise systems never stand still.
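As a rough illustration of what "full CRUD" means for a graph-backed RAG system, here is a minimal sketch of an update- and delete-capable triple store; the class and its methods are hypothetical, not any particular GraphRAG framework's API.

```python
# Hypothetical minimal graph store with explicit update and delete, the two operations
# most GraphRAG prototypes skip.
class GraphStore:
    def __init__(self):
        self.triples = {}  # (subject, predicate) -> {"object": ..., "source": ...}

    def create(self, s, p, o, source):
        self.triples[(s, p)] = {"object": o, "source": source}

    def read(self, s, p):
        return self.triples.get((s, p))

    def update(self, s, p, new_o, source):
        # Overwrite the fact and record the new source so answers reflect the latest truth.
        self.triples[(s, p)] = {"object": new_o, "source": source}

    def delete(self, s, p):
        # Hard-delete obsolete or non-compliant facts so they cannot resurface in retrieval.
        self.triples.pop((s, p), None)

store = GraphStore()
store.create("Drug-X", "approved_dose_mg", 50, source="guideline-2023")
store.update("Drug-X", "approved_dose_mg", 25, source="guideline-2025")  # guideline revised
store.delete("Drug-X", "approved_dose_mg")                               # withdrawn entirely
```

In a real deployment the update and delete paths also need to propagate to anything derived from the graph (embeddings, community summaries, caches), which is exactly where most prototypes stop.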
In order for GraphRAG, and RAG in general, to transition from research labs to widespread enterprise adoption, support for CRUD must be a fundamental aspect of the design process.
#GraphRAG #RAG #KnowledgeGraph #EnterpriseAI #CRUD #EnterpriseAdoption #TrustworthyAI #DataManagement
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.
Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).
On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text.
It’s an end-to-end flow tackling one of the most persistent data challenges.
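For a feel of the flow, here is a rough sketch of the embeddings + kNN-blocking + LLM-matching pipeline; the model name, neighbour count, and the placeholder llm_match() are assumptions, and the real notebook delegates the match decision to a DSPy-driven LLM rather than the toy heuristic below.

```python
# Sketch of semantic entity resolution: embed mentions, block with kNN, then let an
# LLM (stubbed here) decide which candidate pairs refer to the same real-world entity.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

mentions = ["Acme Corp.", "ACME Corporation", "Acme Inc", "Globex LLC"]

model = SentenceTransformer("all-MiniLM-L6-v2")                # assumed embedding model
embeddings = model.encode(mentions, normalize_embeddings=True)

# kNN blocking: only nearby mentions become candidate pairs, avoiding O(n^2) LLM calls.
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(embeddings)
_, neighbor_idx = knn.kneighbors(embeddings)
candidate_pairs = {(i, j) for i, row in enumerate(neighbor_idx) for j in row if i < j}

def llm_match(a: str, b: str) -> bool:
    # Placeholder for the DSPy LLM judgment "do these mentions name the same entity?"
    return a.lower().split()[0].strip(".") == b.lower().split()[0].strip(".")

matches = [(mentions[i], mentions[j]) for i, j in candidate_pairs if llm_match(mentions[i], mentions[j])]
print(matches)
```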
Check it out, experiment with your own data, enjoy the summer and let me know your thoughts!
cc Paco Nathan you might like this 😉
https://wor.ai/8kQ2qa
Stop manually building your company's brain. ❌
Having reviewed the excellent DeepLearning.AI lecture on Agentic Knowledge Graph Construction by Andreas Kollegger, and while writing a book on agentic graph systems with Sam Julien, it has become clear to me that agentic systems represent a shift in how we build and maintain knowledge graphs (KGs).
Most organizations are sitting on a goldmine of data spread across CSVs, documents, and databases.
The dream is to connect it all into a unified Knowledge Graph, an intelligent brain that understands your entire business.
The reality? It's a brutal, expensive, and unscalable manual process.
But a new approach is changing everything.
Here’s the new playbook for building intelligent systems:
🧠 Deploy an AI Agent Workforce
Instead of rigid scripts, you use a cognitive assembly line of specialized AI agents. A Proposer agent designs the data model, a Critic refines it, and an Extractor pulls the facts.
This modular approach has been shown to reduce errors and improve the accuracy and coherence of the final graph.
🎨 Treat AI as a Designer, Not Just a Doer
The agents act as data architects. In discovery mode, they analyze unstructured data (like customer reviews) and propose a new logical structure from scratch.
In an enterprise with an existing data model, they switch to alignment mode, mapping new information to the established structure.
🏛️ Use a 3-Part Graph Architecture
This technique is key to managing data quality and uncertainty. You create three interconnected graphs:
The Domain Graph: Your single source of truth, built from trusted, structured data.
The Lexical Graph: The raw, original text from your documents, preserving the evidence.
The Subject Graph: An AI-generated bridge that connects them. It holds extracted insights that are validated before being linked to your trusted data.
Jaro-Winkler is a string comparison algorithm that measures the similarity between two strings, producing a score between 0 (completely different) and 1 (identical). It can be used here for entity resolution: the process of identifying and linking entities from the unstructured text (Subject Graph) to the official entities in the structured database (Domain Graph).
For example, the algorithm compares a product name extracted from a customer review (e.g., "the gothenburg table") with the official product names in the database. If the Jaro-Winkler similarity score is above a certain threshold, the system automatically creates a CORRESPONDS_TO relationship, effectively linking the customer's comment to the correct product in the supply chain graph.
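A small sketch of that threshold-based linking step, assuming the jellyfish library for the Jaro-Winkler score; the 0.9 cutoff and product names are made up for illustration.

```python
# Hypothetical linking of a Subject Graph mention to a Domain Graph product via Jaro-Winkler.
import jellyfish

official_products = ["Gothenburg Table", "Stockholm Chair", "Malmo Desk"]  # Domain Graph entities

def link_mention(mention: str, threshold: float = 0.9):
    """Return a CORRESPONDS_TO edge when the mention is similar enough to an official name."""
    scored = [(jellyfish.jaro_winkler_similarity(mention.lower(), name.lower()), name)
              for name in official_products]
    score, best = max(scored)
    if score >= threshold:
        return (mention, "CORRESPONDS_TO", best, round(score, 3))
    return None  # leave unlinked for human review

print(link_mention("the gothenburg table"))  # the leading "the" may push the score below the cutoff
```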
🤝 Augment Humans, Don't Replace Them
The workflow is Propose, then Approve. AI does the heavy lifting, but a human expert makes the final call.
This process is made reliable by tools like Pydantic and Outlines, which enforce a rigid contract on the AI's output, ensuring every piece of data is perfectly structured and consistent.
And once discovered and validated, a schema can be enforced.
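As a small example of that "rigid contract" idea (using Pydantic v2 syntax; the field names are illustrative, not from the lecture), the extractor's output can be forced to validate against a schema before it ever touches the graph:

```python
# Illustrative Pydantic v2 contract for extracted facts; validation fails loudly if the LLM drifts.
from pydantic import BaseModel, Field, ValidationError

class ExtractedFact(BaseModel):
    subject: str
    predicate: str
    obj: str = Field(alias="object")
    confidence: float = Field(ge=0.0, le=1.0)   # out-of-range values are rejected
    source_text: str                            # the Lexical Graph snippet that backs the fact

raw = {"subject": "gothenburg table", "predicate": "HAS_ISSUE", "object": "wobbly leg",
       "confidence": 0.87, "source_text": "the gothenburg table wobbles after a week"}

try:
    fact = ExtractedFact.model_validate(raw)    # only well-formed facts reach the Subject Graph
except ValidationError as err:
    print(err)
```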
FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
Sharing our recent research FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs. It is the largest financial knowledge graph built from unstructured data. The preprint of our article is out on arXiv now (link is in the comments). It is coauthored with Abhinav Arun | Fabrizio Dimino | Tejas Prakash Agrawal
While LLMs make it easier than ever to generate knowledge graphs, the real challenge lies in ensuring quality without hallucinations, with strong coverage, precision, comprehensiveness, and relevance. FinReflectKG tackles this through an iterative, evaluation-driven agentic approach, carefully optimized across multiple evaluation metrics to deliver a trustworthy and high-quality knowledge graph.
Designed to power use cases like entity search, question answering, signal generation, predictive modeling, and financial network analysis, FinReflectKG sets a new benchmark for building reliable financial KGs and showcases the potential of agentic workflows in LLM-driven systems.
We will be creating a suite of benchmarks using FinReflectKG for KG related tasks in financial services. More details to come soon.
Knowledge Graphs and LLMs in Action - Alessandro Negro with Vlastimil Kus, Giuseppe Futia and Fabio Montagna
Knowledge graphs help understand relationships between the objects, events, situations, and concepts in your data so you can readily identify important patterns and make better decisions. This book provides tools and techniques for efficiently labeling data, modeling a knowledge graph, and using it to derive useful insights.
In Knowledge Graphs and LLMs in Action you will learn how to:
Model knowledge graphs with an iterative top-down approach based in business needs
Create a knowledge graph starting from ontologies, taxonomies, and structured data
Use machine learning algorithms to hone and complete your graphs
Build knowledge graphs from unstructured text data sources
Reason on the knowledge graph and apply machine learning algorithms
Move beyond analyzing data and start making decisions based on useful, contextual knowledge. The cutting-edge knowledge graph (KG) approach puts that power in your hands. In Knowledge Graphs and LLMs in Action, you’ll discover the theory of knowledge graphs and learn how to build services that can demonstrate intelligent behavior. You’ll learn to create KGs from first principles and go hands-on to develop advisor applications for real-world domains like healthcare and finance.
Most people talk about AI agents like they’re already reliable. They aren’t.
They follow instructions. They spit out results. But they forget what they did, why it mattered, or how circumstances have changed. There’s no continuity. No memory. No grasp of unfolding context. Today’s agents can respond - but they can’t reflect, reason, or adapt over time.
OpenAI’s new cookbook Temporal Agents with Knowledge Graphs lays out just how limiting that is and offers a credible path forward. It introduces a new class of temporal agents: systems built not around isolated prompts, but around structured, persistent memory.
At the core is a knowledge graph that acts as an evolving world model - not a passive record, but a map of what happened, why it mattered, and what it connects to. This lets agents handle questions like:
“What changed since last week?”
“Why was this decision made?”
“What’s still pending and what’s blocking it?”
It’s an architectural shift that turns time, intent, and interdependence into first-class elements.
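A tiny, purely illustrative sketch (not the cookbook's implementation) of why that matters: when every edge carries the time it was asserted, "what changed since last week?" becomes a simple query.

```python
# Toy temporal edge list; timestamps record when each fact was observed/asserted.
from datetime import datetime, timedelta

edges = [
    ("ticket-42", "status", "open",        datetime(2024, 5, 1)),
    ("ticket-42", "status", "blocked",     datetime(2024, 5, 9)),
    ("ticket-42", "blocked_by", "infra-7", datetime(2024, 5, 9)),
]

def changed_since(graph, since):
    """Facts asserted at or after a given point in time."""
    return [(s, p, o) for s, p, o, t in graph if t >= since]

print(changed_since(edges, datetime(2024, 5, 10) - timedelta(days=7)))
# -> the 'blocked' status and the blocking dependency, i.e. what changed since last week
```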
This mirrors Tony Seale’s argument about enterprise data: most data products don’t fail because of missing pipelines - they fail because they don’t align with how the business actually thinks. Data lives in tables and schemas. Business lives in concepts like churn, margin erosion, customer health, or risk exposure.
Tony’s answer is a business ontology: a formal, machine-readable layer that defines the language of the business and anchors data products to it. It’s a shift from structure to semantics - from warehouse to shared understanding.
That’s the same shift OpenAI is proposing for agents.
In both cases, what’s missing isn’t infrastructure. It’s interpretation.
The challenge isn’t access. It’s alignment.
If we want agents that behave reliably in real-world settings, it’s not enough to fine-tune them on PDFs or dump Slack threads into context windows. They need to be wired into shared ontologies - concept-level scaffolding like:
Who are our customers?
What defines success?
What risks are emerging, and how are they evolving?
The temporal knowledge graph becomes more than just memory. It becomes an interface - a structured bridge between reasoning and meaning.
This goes far beyond another agent orchestration blueprint. It points to something deeper: Without time and meaning, there is no true delegation.
We don’t need agents that mimic tasks.
We need agents that internalise context and navigate change.
That means building systems that don’t just handle data, but understand how it fits into the changing world we care about.
OpenAI’s temporal memory graphs and Tony’s business ontologies aren’t separate ideas. They’re converging on the same missing layer:
AI that reasons in the language of time and meaning.
H/T Vin Vashishta for the pointer to the OpenAI cookbook, and image nicked from Tony (as usual).