How can we create general-purpose graph foundation models?
(by Dmitry Eremeev)
For a long time, we believed that general-purpose graph foundation models were impossible to create. Indeed, graphs are used to represent data across many different domains, and thus graph machine learning must handle tasks on extremely diverse datasets, such as social, information, transportation, and co-purchasing networks, or models of various physical, biological, or engineering systems. Given the vast differences in structure, features, and labels among these datasets, it seemed unlikely that a single model could achieve robust cross-domain generalization and perform well on all of them.
However, we noticed that tabular machine learning faces a similar challenge of working with diverse datasets containing different features and labels. And yet, this field has recently witnessed the emergence of the first successful foundation models, such as TabPFNv2, which are based on the prior-data fitted networks (PFNs) paradigm. Thus, we have…
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
What happens when you ask an AI about supply chain vulnerabilities and it misses the most critical dependencies?
Most AI systems treat business relationships like isolated facts in a database. They might know Apple uses lithium batteries, but they miss the web of connections that create real risk.
👉 The Core Problem
Standard AI retrieval treats every piece of information as a standalone point. But supply chain risk lives in the relationships between companies, products, and locations. When conflict minerals from the DRC affect smartphone production, it's not just about one supplier - it's about cascading effects through interconnected networks.
Vector similarity search finds related documents but ignores the structural dependencies that matter most for risk assessment.
👉 A Different Approach
New research from UC Berkeley and MSCI demonstrates how to solve this by treating supply chains as both networks and knowledge graphs simultaneously.
The key insight: economic relationships like "Company A produces Product B" are both structural network links and semantic knowledge graph triples. This duality lets you use network science to find the most economically important paths.
👉 How It Works
Instead of searching for similar text, the system:
- Maps supply chains as networks with companies, products, and locations as nodes
- Uses centrality measures to identify structurally important paths
- Wraps quantitative data in descriptive language so AI can reason about what numbers actually mean
- Retrieves specific relationship paths rather than generic similar content
When asked about cobalt risks, it doesn't just find articles about cobalt. It traces the actual path from DRC mines through battery manufacturers to final products, revealing hidden dependencies.
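To make the duality concrete, here is a minimal sketch in Python with networkx; the supply-chain triples, node names, and the cobalt example are illustrative assumptions, not the paper's actual data or API:

```python
import networkx as nx

G = nx.DiGraph()
# Each edge is simultaneously a network link and a knowledge-graph
# triple (subject, predicate, object), stored as an edge attribute.
triples = [
    ("DRC cobalt mine", "supplies", "Cobalt refiner"),
    ("Cobalt refiner", "supplies", "Battery manufacturer"),
    ("Graphite supplier", "supplies", "Battery manufacturer"),
    ("Battery manufacturer", "produces", "Li-ion battery"),
    ("Li-ion battery", "component_of", "Smartphone"),
]
for s, p, o in triples:
    G.add_edge(s, o, predicate=p)

# Centrality surfaces structurally important intermediaries
# (here, the battery-manufacturer chokepoint).
centrality = nx.betweenness_centrality(G)
print(max(centrality, key=centrality.get))  # Battery manufacturer

# Retrieval returns an explicit dependency path, not just documents
# that happen to mention "cobalt".
print(" -> ".join(nx.shortest_path(G, "DRC cobalt mine", "Smartphone")))
```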
The system generates risk narratives that connect operational disruptions to financial impacts without requiring specialized training or expensive graph databases.
This approach shows how understanding the structure of business relationships - not just their content - can make AI genuinely useful for complex domain problems.
Another awesome Graph RAG paper from February: Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG by Aditi Singh, Ph.D., Abul E., Saket Kumar, and Tala Talaei Khoei, Ph.D.
Section 3 is a complete breakdown of the entire RAG and agentic stack, from vector RAG to agentic RAG, which I highly recommend as an introduction... there are lots of figures and plain language, so it's useful even if you're not that technical. The paper then provides an Agentic RAG taxonomy and gives a thorough, high-level overview of the different forms of Agentic RAG.
It outlines the defining characteristics of Agentic RAG:
- Multiple, autonomous agents
- Dynamic decision-making
- Iterative refinement and workflow optimization
- Adaptable to real-time changes
- Scalable for multi-domain tasks
- High accuracy
What does that mean?
Traditional RAG systems, with their static workflows and limited adaptability, often struggle to handle dynamic, multistep reasoning and complex real-world tasks. These limitations have spurred the integration of agentic intelligence, resulting in Agentic RAG. By incorporating autonomous agents capable of dynamic decision-making, iterative reasoning, and adaptive retrieval strategies, Agentic RAG builds on the modularity of earlier paradigms while overcoming their inherent constraints. This evolution enables more complex, multi-domain tasks to be addressed with enhanced precision and contextual understanding, positioning Agentic RAG as a cornerstone for next-generation AI applications. In particular, Agentic RAG systems reduce latency through optimized workflows and refine outputs iteratively, tackling the very challenges that have historically hindered traditional RAG’s scalability and effectiveness.
That sounds cool... now how does it work? Agentic RAG is a combination of tool-calling agents iteratively accessing different kinds of data stores and APIs, in collaboration with other agents that may split tasks to run in parallel, check one another's work, or perform different parts of a chain of prompts.
Single-agent Agentic RAG, the simplest form, rocks these features... in summary, it can supplement RAG retrieval by routing queries to different forms of search ("[filesystem, relational DB, graph DB], semantic search, web search"), as well as external APIs such as DuckDuckGo, Serp, Splunk, Wikipedia, Salesforce, Outlook, Dropbox, Google Workspace, Slack, and Discord, or multiple, iterative tool calls via your favorite MCP servers. The possibilities to craft your own agentic RAG workflows are enormous!
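To ground that, here's a toy sketch of single-agent routing; the routing rules and tool stubs are illustrative assumptions, not the survey's code, and a real agent would use an LLM call to classify the query:

```python
# Toy single-agent router: inspect the query, pick a retrieval channel.
def semantic_search(q): return f"[vector-store hits for: {q}]"
def graph_db_query(q): return f"[graph paths for: {q}]"
def web_search(q): return f"[web results for: {q}]"

def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("connected", "relationship", "path between")):
        return graph_db_query(query)   # multi-hop / entity questions
    if any(w in q for w in ("latest", "today", "news")):
        return web_search(query)       # time-sensitive questions
    return semantic_search(query)      # default: semantic retrieval

print(route("What is the path between supplier X and product Y?"))
```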
The paper is on arXiv here: https://lnkd.in/gZ8ypXYf
Protocols move bits. Semantics move value.
The reports on agents are starting to sound samey: go vertical not horizontal; redesign workflows end-to-end; clean your data; stop doing pilots that automate inefficiencies; price for outcomes when the agent does the work.
All true. All necessary. All needing repetition ad nauseam.
So it’s refreshing to see a switch-up in Bain’s Technology Report 2025: the real leverage now sits with semantics. A shared layer of meaning.
Bain notes that protocols are maturing. MCP and A2A let agents pass tool calls, tokens, and results between layers. Useful plumbing. But there’s still no shared vocabulary that says what an invoice, policy, or work order is, how it moves through states, and how it maps to APIs, tables, and approvals. Without that, cross-vendor reliability will keep stalling.
They go further: whoever lands a pragmatic semantic layer first gets winner-takes-most network effects. Define the dictionary and you steer the value flow. This isn’t just a feature. It’s a control point.
Bain frames the stack clearly:
- Systems of record (data, rules, compliance)
- Agent operating systems (orchestration, planning, memory)
- Outcome interfaces (natural language requests, user-facing actions)
The bottleneck is semantics.
And there’s a pricing twist. If agents do the work, semantics define what “done” means. That unlocks outcome-based pricing, charging for tasks completed or value delivered, not log-ons.
Bain is blunt: the open, any-to-any agent utopia will smash against vendor incentives, messy data, IP, and security. Translation: walled gardens lead first. Start where governance is clear and data is good enough, then use that traction to shape the semantics others will later adopt.
This is where I’m seeing convergence. In practice, a knowledge graph can provide that shared meaning: identity, relationships, and policy. One workable pattern: the agent plans with an LLM, resolves entities and checks rules in the graph, then acts through typed APIs, writing back events the graph can audit.
That’s the missing vocabulary and the enforcement that protocols alone can’t cover.
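As a concrete illustration, here is a minimal sketch of that plan, ground, act, audit loop; every name in it is a hypothetical stand-in, not any vendor's actual API:

```python
from dataclasses import dataclass

@dataclass
class Event:
    actor: str
    action: str
    entity: str

# Toy "graph": identity, state, and policy live here, not in the LLM.
GRAPH = {"ACME-INV-17": {"type": "invoice", "state": "approved"}}
AUDIT_LOG: list[Event] = []

def plan_with_llm(request: str) -> dict:
    # Hypothetical stand-in for an LLM planning call.
    return {"action": "pay", "entity": "ACME-INV-17"}

def check_rules(entity_id: str, action: str) -> bool:
    # Policy enforced against the graph: only approved invoices get paid.
    node = GRAPH.get(entity_id)
    return bool(node and node["type"] == "invoice" and node["state"] == "approved")

def act(request: str) -> None:
    step = plan_with_llm(request)                    # 1. plan with an LLM
    if check_rules(step["entity"], step["action"]):  # 2. ground and check in the graph
        # 3. a typed API call would execute here
        AUDIT_LOG.append(Event("agent-1", step["action"], step["entity"]))  # 4. auditable write-back

act("Pay the ACME invoice")
print(AUDIT_LOG)
```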
Tony Seale puts it well: “Neural and symbolic systems are not rivals; they are complements… a knowledge graph provides the symbolic backbone… to ground AI in shared semantics and enforce consistency.”
To me, this is optimistic, because it moves the conversation from “make the model smarter” to “make the system understandable.”
Agents don’t need perfection if they are predictable, composable, and auditable. Semantics deliver that.
It’s also how smaller players compete with hyperscalers: you don’t need to win the model race to win the meaning race.
With semantics, agents become infrastructure.
The next few years won’t be won by whoever builds the biggest model.
It’ll be won by whoever defines the smallest shared meaning.
The single most undervalued fact of graph theory: Every board is a graph in disguise
Here’s the 3-step mapping that turns messy “rooms” into clean, countable components.
0/ You’re given a map of walls and floor tiles.
By eye, you see there are three rooms.
But how do you get a computer to see them too?
1/ Start by modeling the board as a graph.
Treat every floor tile as a node.
Define valid moves as edges.
In our case, moves are the four directions:
• Up
• Down
• Left
• Right
Walls simply remove edges because you can’t step through them.
2/ Number the floor tiles arbitrarily so you can reference nodes.
Now you’ve converted the board to an undirected graph.
Why do this?
Because two common board questions become standard graph problems.
1. “Shortest path between two tiles?” becomes “shortest path between two nodes.”
2. “How many rooms?” becomes “how many connected components?”
That second one is our target.
A “room” is just a maximal set of tiles reachable from each other without crossing walls.
In graph terms, that’s a connected component.
So the count of rooms equals the count of connected components.
Here’s the practical recipe I use:
• Nodes = all floor tiles.
• Edges = pairs of floor tiles one step apart (U/D/L/R).
• Walls = missing edges.
• Rooms = connected components.
• Answer = number of connected components.
3/ You can run a DFS or BFS from every unvisited node and mark all reachable tiles.
Each fresh start increments the room counter by one.
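Here's that recipe as runnable Python, using BFS flood fill; the '#'-for-wall, '.'-for-floor grid encoding is just one convention:

```python
from collections import deque

def count_rooms(grid: list[str]) -> int:
    rows, cols = len(grid), len(grid[0])
    seen = set()
    rooms = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "." or (r, c) in seen:
                continue
            rooms += 1                      # fresh start = new room
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                # Four valid moves: up, down, left, right.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "." and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return rooms

board = [
    "#########",
    "#..#....#",
    "#..#..#.#",
    "#########",
]
print(count_rooms(board))  # -> 2 rooms in this toy board
```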
That’s it.
No heuristics, no guesswork, just graph structure doing the heavy lifting.
Once you see boards as graphs, these problems stop feeling ad hoc.
They become repeatable templates you can code in minutes.
If this helped, repost so more people learn the “rooms = components” pattern.
G-REASONER: foundation models for unified reasoning over graph-structured knowledge
Why Graph-Enhanced AI Still Struggles with Complex Reasoning (And How G-REASONER Fixes It)
Ever wondered why current AI systems still fail at connecting the dots across complex knowledge domains? The answer lies in how they handle structured information.
👉 The Core Problem
Large language models excel at reasoning but hit a wall when dealing with interconnected knowledge. Traditional retrieval systems treat information as isolated fragments, missing the rich relationships that make knowledge truly useful.
Current graph-enhanced approaches face three critical limitations:
- They're designed for specific graph types only
- They rely on expensive agent-based reasoning
- They can't generalize across different domains
👉 What G-REASONER Brings to the Table
Researchers from Monash University and collaborating institutions introduce G-REASONER, a unified framework that bridges graph and language foundation models.
The key innovation is QuadGraph - a standardized four-layer structure that unifies diverse knowledge sources (see the sketch after this list):
- Community layer for global context
- Document layer for textual information
- Knowledge graph layer for factual relationships
- Attribute layer for common properties
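Here's a toy sketch of that four-layer structure using networkx node attributes; the example nodes and cross-layer edges are my own illustrative assumptions, not the paper's data:

```python
import networkx as nx

qg = nx.Graph()
# One node per layer; a real QuadGraph would hold many of each.
qg.add_node("community:medicine", layer="community")                    # global context
qg.add_node("doc:aspirin_leaflet", layer="document")                    # textual info
qg.add_node("kg:(aspirin, treats, headache)", layer="knowledge_graph")  # facts
qg.add_node("attr:dosage=500mg", layer="attribute")                     # properties

# Cross-layer edges let message passing mix context, text, facts,
# and properties in a single structure.
qg.add_edge("community:medicine", "doc:aspirin_leaflet")
qg.add_edge("doc:aspirin_leaflet", "kg:(aspirin, treats, headache)")
qg.add_edge("kg:(aspirin, treats, headache)", "attr:dosage=500mg")

by_layer = {}
for node, data in qg.nodes(data=True):
    by_layer.setdefault(data["layer"], []).append(node)
print(by_layer)
```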
👉 How It Works in Practice
G-REASONER employs a 34M-parameter graph foundation model that jointly processes graph topology and text semantics. Unlike previous approaches, it uses knowledge distillation to learn from large-scale datasets with weak supervision.
The system implements distributed message-passing across multiple GPUs, enabling efficient scaling. Mixed-precision training reduces memory usage by 17.5% while doubling training throughput.
Testing across six benchmarks shows consistent improvements over state-of-the-art baselines, with particularly strong performance on multi-hop reasoning tasks requiring complex knowledge connections.
The framework demonstrates remarkable generalization - the same model works effectively across medical records, legal documents, and encyclopedia data without domain-specific fine-tuning.
This represents a significant step toward AI systems that can reason over structured knowledge as fluidly as humans navigate interconnected concepts.
Announcing the formation of a Data Façades W3C Community Group
I am excited to announce the formation of a Data Façades W3C Community Group.
Façade-X, initially introduced at SEMANTICS 2021 and successfully implemented by the SPARQL Anything project, provides a simple yet powerful, homogeneous view over diverse and heterogeneous data sources (e.g., CSV, JSON, XML, and many others). With the recent v1.0.0 release of SPARQL Anything, the time was right to work on the long-term stability and widespread adoption of this approach by developing an open, vendor-neutral technology.
The Façade-X concept was born to allow SPARQL users to query data in any structured format in plain SPARQL. Therefore, the choice of a W3C community group to lead efforts on specifications is just natural. Specifications will enhance its reliability, foster innovation, and encourage various vendors and projects—including graph database developers — to provide their own compatible implementations.
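For a feel of what this looks like in practice, here's a hedged sketch of invoking SPARQL Anything from Python; the x-sparql-anything: SERVICE IRI follows the project's documented convention, but the jar name and data file below are assumptions, so check the SPARQL Anything docs for the exact invocation:

```python
import subprocess

# Plain SPARQL over a JSON file: the SERVICE IRI names the non-RDF
# source, and Facade-X presents it as triples.
query = """
SELECT ?slot ?value
WHERE {
  SERVICE <x-sparql-anything:location=./data.json> {
    ?root ?slot ?value .
  }
}
"""

with open("query.sparql", "w") as f:
    f.write(query)

# Invoke the SPARQL Anything CLI on the query file (jar path assumed).
subprocess.run(["java", "-jar", "sparql-anything.jar", "-q", "query.sparql"], check=True)
```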
The primary goals of the Data Façades Community Group are to:
Define the core specification of the Façade-X method.
Define Standard Mappings: Formalize the required mappings and profiles for connecting Façade-X to common data formats.
Define the specification of the query dialect: Provide a reference for the SPARQL dialect, configuration conventions (like SERVICE IRIs), and the functions/magic properties used.
Establish Governance: Create a monitored, robust process for adding support for new data formats.
Foster Collaboration: Build connections with relevant W3C groups (e.g., RDF & SPARQL, Data Shapes) and encourage involvement from developers, businesses, and adopters.
Join us!
With Luigi Asprino, Ivo Velitchkov, Justin Dowdy, Paul Mulholland, Andy Seaborne, Ryan Shaw ...
CG: https://lnkd.in/eSxuqsvn
Github: https://lnkd.in/dkHGT8N3
SPARQL Anything #RDF #SPARQL #W3C #FX
Today, I'd like to introduce the GitLab Knowledge Graph. This release includes a code indexing engine, written in Rust, that turns your codebase into a live, embeddable graph database for LLM RAG. You can install it with a simple one-line script, parse local repositories directly in your editor, and connect via MCP to query your workspace and over 50,000 files in under 100 milliseconds.
We also saw GKG agents scoring up to 10% higher on the SWE-Bench-lite benchmarks, with just a few tools and a small prompt added to opencode (an open-source coding agent). On average, we observed a 7% accuracy gain across our eval runs, and GKG agents were able to solve new tasks compared to the baseline agents. You can read more from the team's research here https://lnkd.in/egiXXsaE.
This release is just the first step: we aim for this local version to serve as the backbone of a Knowledge Graph service that enables you to query the entire GitLab Software Development Life Cycle—from an Issue down to a single line of code.
I am incredibly proud of the work the team has done. Thank you, Michael U., Jean-Gabriel Doyon, Bohdan Parkhomchuk, Dmitry Gruzd, Omar Qunsul, and Jonathan Shobrook. You can watch Bill Staples and me present this and more in the GitLab 18.4 release here: https://lnkd.in/epvjrhqB
Try today at: https://lnkd.in/eAypneFA
Roadmap: https://lnkd.in/eXNYQkEn
Watch more below for a complete, in-depth tutorial on what we've built.
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
Why Current AI Search Falls Short When You Need Real Answers
What happens when you ask an AI system a complex question that requires connecting multiple pieces of information? Most current approaches retrieve some relevant documents, generate an answer, and call it done. But this single-pass strategy often misses critical evidence.
👉 The Problem with Shallow Retrieval
Traditional retrieval-augmented generation (RAG) systems work like a student who only skims the first few search results before writing an essay. They grab what seems relevant on the surface but miss deeper connections that would lead to better answers.
When researchers tested these systems on complex multi-hop questions, they found a consistent pattern: the AI would confidently provide answers based on incomplete evidence, leading to logical gaps and missing key facts.
👉 A New Approach: Deep Searching with Dual Channels
Researchers from IDEA Research and Hong Kong University of Science and Technology developed GraphSearch, which works more like a thorough investigator than a quick searcher.
The system breaks down complex questions into smaller, manageable pieces, then searches through both text documents and structured knowledge graphs. Think of it as having two different research assistants: one excellent at finding descriptive information in documents, another skilled at tracing relationships between entities.
👉 How It Actually Works
Instead of one search-and-answer cycle, GraphSearch uses six coordinated modules:
- Query decomposition splits complex questions into atomic sub-questions
- Context refinement filters out noise from retrieved information
- Query grounding fills in missing details from previous searches
- Logic drafting organizes evidence into coherent reasoning chains
- Evidence verification checks if the reasoning holds up
- Query expansion generates new searches to fill identified gaps
The system continues this process until it has sufficient evidence to provide a well-grounded answer.
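Here's the loop paraphrased as plain Python; every function is a hypothetical stand-in for a module described above, not the authors' actual API:

```python
# Six stand-in modules, one per stage of the described workflow.
def decompose(q): return [q]                     # query decomposition
def retrieve(sq): return [f"evidence for {sq}"]  # dual-channel retrieval
def refine(ev): return ev                        # context refinement
def ground(sq, ev): return sq                    # query grounding
def draft(ev): return " ; ".join(ev)             # logic drafting
def verified(chain): return bool(chain)          # evidence verification
def expand(chain): return []                     # query expansion

def graph_search(question: str, max_rounds: int = 3) -> str:
    evidence: list[str] = []
    queries = decompose(question)
    for _ in range(max_rounds):
        for sq in queries:
            evidence += refine(retrieve(ground(sq, evidence)))
        chain = draft(evidence)
        if verified(chain):       # stop once the reasoning holds up
            return chain
        queries = expand(chain)   # otherwise, search to fill the gaps
    return draft(evidence)

print(graph_search("Which company makes the battery used in product X?"))
```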
👉 Real Performance Gains
Testing across six different question-answering benchmarks showed consistent improvements. On the MuSiQue dataset, for example, answer accuracy jumped from 35% to 51% when GraphSearch was integrated with existing graph-based systems.
The approach works particularly well under constrained conditions - when you have limited computational resources for retrieval, the iterative searching strategy maintains performance better than single-pass methods.
This research points toward more reliable AI systems that can handle the kind of complex reasoning we actually need in practice.
Paper: "GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation" by Yang et al.
Product management makes or breaks AI. The role of graph
Product management makes or breaks AI. That includes data.
The role of data PM is shifting.
For years, the focus was BI - dashboards, reports, warehouses.
But AI demands more: context, retrieval, real time, and integration into the flow of work.
Data PMs who understand AI requirements will define the next generation of enterprise success.
Here’s how my team thinks about BI-ready vs AI-ready data 👇
Learn how graph-wide scanning, a method that scans an entire network graph, offers a powerful solution to a major cybersecurity challenge: detecting hidden, low-signal threats like Advanced Persistent Threats (APTs) that traditional security tools often miss.
Cognee - Graph-Aware Embeddings by cognee: For Even Smarter Retrieval
Cognee introduces graph-aware embeddings: graph signals boost semantic search for faster and more precise retrievals in paid plans. Learn more and book a call.
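Cognee hasn't published its scoring, so the following is only a generic sketch of the "graph-aware" idea it describes: blend vector similarity with a graph-distance signal so structurally close items rank higher. All data and the blending formula are my own assumptions.

```python
import numpy as np
import networkx as nx

# Toy corpus: embeddings plus a relationship graph over the same items.
G = nx.Graph([("doc_a", "doc_b"), ("doc_b", "doc_c")])
emb = {"doc_a": np.array([1.0, 0.0]),
       "doc_b": np.array([0.9, 0.1]),
       "doc_c": np.array([0.0, 1.0])}

def score(query_vec, doc, anchor="doc_a", alpha=0.7):
    # Cosine similarity from the embedding channel...
    cos = emb[doc] @ query_vec / (np.linalg.norm(emb[doc]) * np.linalg.norm(query_vec))
    # ...blended with a graph-proximity signal to a known-relevant anchor.
    hops = nx.shortest_path_length(G, anchor, doc)
    return alpha * cos + (1 - alpha) / (1 + hops)

q = np.array([1.0, 0.0])
print(sorted(emb, key=lambda d: -score(q, d)))  # the graph signal pulls doc_b up
```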
Many teams adopt graph databases believing they need specialized tools for relationship data, adding unnecessary complexity to their stack. This session reveals that for most use cases, the performance benefits don't justify the overhead. You'll learn to evaluate whether you truly need graph DB capabilities and how to implement graph patterns using simpler alternatives.
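One such simpler alternative, sketched below under my own schema assumptions: store edges in a relational table and answer graph questions like reachability with a recursive CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

# Reachability ("which nodes can 'a' reach?") via a recursive CTE;
# no graph database required.
rows = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach
""").fetchall()
print(sorted(r[0] for r in rows))  # ['a', 'b', 'c', 'd']
```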
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs
Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically...
Snowflake Unites Industry Leaders to Unlock AI's Potential with the Open Semantic Interchange
So I am worried.
https://lnkd.in/gfpkjUNZ
A semantic exchange format in YAML?
Because there is nothing to build on already?
https://lnkd.in/gB-iEeXn
:(
City2graph is a Python library that turns urban datasets such as streets, buildings, transit networks, and mobility flows into graph structures ready for Graph Neural Networks.
city2graph is a Python library for converting geospatial datasets into graphs for GNNs, with an integrated interface to GeoPandas, NetworkX, and PyTorch Geometric across ...
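This is not city2graph's actual API; as a generic illustration of the kind of conversion it automates, the sketch below turns a GeoDataFrame of street segments into a networkx graph keyed by shared endpoints:

```python
import geopandas as gpd
import networkx as nx
from shapely.geometry import LineString

# Three toy street segments as a GeoDataFrame of LineStrings.
streets = gpd.GeoDataFrame(geometry=[
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (1, 1)]),
    LineString([(1, 1), (0, 1)]),
])

# Endpoints become nodes; each segment becomes an edge with its length.
G = nx.Graph()
for line in streets.geometry:
    u, v = line.coords[0], line.coords[-1]
    G.add_edge(u, v, length=line.length)

print(G.number_of_nodes(), G.number_of_edges())  # 4 3
```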