Turning Tabular Foundation Models into Graph Foundation Models
While foundation models have revolutionized such fields as natural language processing and computer vision, their application and potential within graph machine learning remain largely unexplored....
Blue Morpho: A new solution for building AI apps on top of knowledge bases
Blue Morpho helps you build AI agents that understand your business context, using ontologies and knowledge graphs.
Knowledge Graphs work great with LLMs. The problem is that building KGs from unstructured data is hard.
Blue Morpho promises a system that turns PDFs and text files into knowledge graphs. KGs are then used to augment LLMs with the right context to answer queries, make decisions, produce reports, and automate workflows.
How it works:
1. Upload documents (pdf or txt).
2. Define your ontology: concepts, properties, and relationships. (Coming soon: ontology generation via AI assistant.)
3. Extract a knowledge graph from documents based on that ontology. Entities are automatically deduplicated across chunks and documents, so every mention of “Walmart,” for example, resolves to the same node.
4. Build agents on top. Connect external ones via MCP, or use Blue Morpho: Q&A (“text-to-cypher”) and Dashboard Generation agents.
Blue Morpho differentiation:
- Strong focus on reliability. Guardrails in place to make sure LLMs follow instructions and the ontology.
- Entity deduplication, with AI reviewing edge cases.
- Easy to iterate on ontologies: they are versioned, extraction runs are versioned as well with all their parameters, and changes only trigger necessary recomputes.
- Vector embeddings are only used in very special circumstances, coupled with other techniques.
Link in comments. Jérémy Thomas
#KnowledgeGraph #AI #Agents #MCP #NewRelease #Ontology #LLMs #GenAI #Application
--
Connected Data London 2025 is coming! 20-21 November, Leonardo Royal Hotel London Tower Bridge
Join us for all things #KnowledgeGraph #Graph #analytics #datascience #AI #graphDB #SemTech #Ontology
🎟️ Ticket sales are open. Benefit from early bird prices with discounts up to 30%. https://lnkd.in/diXHEXNE
📺 Sponsorship opportunities are available. Maximize your exposure with early onboarding. Contact us at info@connected-data.london for more.
What Every Data Scientist Should Know About Graph Transformers and Their Impact on Structured Data
In my latest piece for Unite.AI, I dive into:
🔹 Why message passing alone isn’t enough
🔹 How Graph Transformers use attention to overcome GNN limitations
🔹 Real-world applications in drug discovery, supply chains, recommender systems, and cybersecurity
🔹 The exciting frontier where LLMs meet graphs
Box's Invisible Moat: The permission graph driving 28% operating margins
Everyone's racing to build AI agents.
Few are thinking about data permissions.
Box spent two decades building a boring moat: a detailed map of who can touch what document, when, why, and with what proof.
This invisible metadata layer is now their key moat against irrelevance.
Q2 FY26:
→ Revenue: $294M (+9% YoY)
→ Gross margin: 81.4%
→ Operating margin: 28.6%
→ Net retention: 103%
→ Enterprise Advanced: 10% of revenue (up from 5%)
Slow-growth, high-margin business at a crossroads.
The Permission Graph
Every document in Box has a shadow: its permission metadata. Who created it, modified it, can access it. What compliance rules govern it. Which systems can call it.
When an AI agent requests a contract, it needs more than the PDF. It needs proof it's allowed to see it, verification it's the right version, an audit trail.
Twenty years of accumulated governance that can't be easily replicated.
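Conceptually, that permission-plus-audit check is easy to sketch. Below is a toy Python illustration of the idea (all class and field names are hypothetical; this is not Box's actual data model or API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Document:
    doc_id: str
    version: int
    owner: str
    readers: set

@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, agent: str, doc_id: str, allowed: bool) -> None:
        # Every access attempt is logged, allowed or not.
        self.entries.append((datetime.now(timezone.utc), agent, doc_id, allowed))

def fetch_for_agent(doc: Document, agent: str, audit: AuditLog):
    """Hand the document to the agent only if it may read it; always audit."""
    allowed = agent in doc.readers or agent == doc.owner
    audit.record(agent, doc.doc_id, allowed)
    return doc if allowed else None

audit = AuditLog()
contract = Document("contract-42", version=3, owner="alice", readers={"bob"})
print(fetch_for_agent(contract, "mallory", audit))  # None: denied, but still audited
```

The point of the sketch: the answer to "can this agent see this document?" and the audit trail are produced by the same code path, so there is no way to fetch content without leaving evidence.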
Why This Matters Now
The CEO Aaron Levie recently told CNBC: "If you don't maintain access controls well, AI agents will find the wrong information - leading to wrong answers or security incidents."
Every enterprise faces the same AI crisis: scattered data with inconsistent permissions, no unified governance, one breach risking progress.
The permission graph solves this.
The Context Control Problem
Box recently launched Enterprise Advanced: AI agents, workflow automation, document generation. They are adding contextual layers because they see a future where AI agents call their API while users never see Box.
Microsoft owns the experience.
Box becomes plumbing.
This push is their attempt to stay visible. But it's still Product Rails, not Operating Rails. They're adding features to documents, not deepening their permission moat.
The Bull vs Bear Case
Bull: Enterprises will pay for bulletproof governance even if transformation happens elsewhere. The permission graph remains valuable.
Bear: Microsoft acquires or partners with Varonis + Cloudfuze to recreate the graph. The moat may not be deep enough.
Every SaaS Company's Dilemma
Box isn't alone. Every legacy SaaS faces the same question: how do you avoid becoming invisible infrastructure?
They're all trying the same failing playbook. Add AI features, claim "AI-native," hope the moat holds.
Box's advantage: the permission graph is genuinely hard to replicate.
Box's disadvantage: they still think like a document storage company.
Market's View
Box has 81% gross margins on commodity storage because of the permission graph. Yet the market values them at 24x forward P/E, not pricing in the graph premium.
The other factor is that Box is led by Aaron Levie. He's a founder who's spent two decades obsessing over one problem: enterprise content governance.
That obsession matters now more than ever.
The question isn't whether the permission graph has value. It's whether Box can deepen the moat before others make it irrelevant.
(Full version sent to subscribers)
Third edition of the Knowledge Graphs course at Ghent University.
In February 2026 I will start teaching the third edition of the Knowledge Graphs course at Ghent University. This is an elective course in which I teach everything I know about creating interoperable data ecosystems.
As in previous editions, we are also opening this elective course to professionals via a micro-credential. Feel like going back to school? We poured our heart and soul into this one.
🤓 👉 https://lnkd.in/euUiiEwJ
Co-teachers include Ruben Verborgh, Ben De Meester, Ruben Taelman and yourself (there’s a peer teaching assignment).
𝗖𝗮𝗻 𝗮 𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗕𝗲 𝗮 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗚𝗿𝗮𝗽𝗵?
Not all knowledge graphs are equal. A semantic KG (RDF/OWL/Stardog) isn’t the same as a property graph (Neo4j), and both differ from enforcing graph-like structures in a relational DB (CockroachDB/Postgres). Each has strengths and trade-offs:
𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗞𝗚𝘀 excel at reasoning and inference over ontologies.
𝗣𝗿𝗼𝗽𝗲𝗿𝘁𝘆 𝗴𝗿𝗮𝗽𝗵𝘀 shine when exploring relationships with intuitive query patterns.
𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵𝗲𝘀 enforce graph-like models via schema, FKs, indexes, recursive CTEs — with added benefits of scale, distributed TXs, and decades of maturity.
𝗚𝗿𝗮𝗽𝗵 𝗧𝗿𝗮𝘃𝗲𝗿𝘀𝗮𝗹𝘀 𝗶𝗻 𝗦𝗤𝗟
Recursive CTEs let SQL “walk the graph.” Start with a base case (movie + actors), then repeatedly join back to discover multi-hop paths (actors → movies → actors → movies). This simulates “friends-of-friends” traversals in a few lines of SQL.
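As a minimal runnable sketch of that pattern, here is a recursive CTE in SQLite walking an actor–movie graph (the table and toy data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acted_in (actor TEXT, movie TEXT);
INSERT INTO acted_in VALUES
  ('Keanu Reeves', 'The Matrix'),
  ('Carrie-Anne Moss', 'The Matrix'),
  ('Carrie-Anne Moss', 'Memento'),
  ('Guy Pearce', 'Memento');
""")

# Base case: the seed actor at depth 0. Recursive step: co-stars of anyone
# already reached, one hop further out. The depth guard bounds the recursion.
rows = conn.execute("""
WITH RECURSIVE reachable(actor, depth) AS (
  SELECT 'Keanu Reeves', 0
  UNION
  SELECT b.actor, r.depth + 1
  FROM reachable r
  JOIN acted_in a ON a.actor = r.actor
  JOIN acted_in b ON b.movie = a.movie
  WHERE r.depth < 3
)
SELECT actor, MIN(depth) FROM reachable GROUP BY actor ORDER BY 2, 1
""").fetchall()
print(rows)  # [('Keanu Reeves', 0), ('Carrie-Anne Moss', 1), ('Guy Pearce', 2)]
```

Each iteration of the recursive member is one more hop, so "Guy Pearce" surfaces at depth 2 via the shared Memento credit: exactly the friends-of-friends traversal described above, in plain SQL.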
𝗥𝗔𝗚, 𝗚𝗿𝗮𝗽𝗵𝗥𝗔𝗚, 𝗮𝗻𝗱 𝗟𝗟𝗠𝘀
RAG and GraphRAG give LLMs grounding in structured data, reducing hallucinations and injecting context. Whether via RDF triples, LPG edges, or SQL joins — the principle is the same: real relationships fuel better answers.
𝗧𝗵𝗲 𝟯-𝗛𝗼𝗽 𝗔𝗿𝗴𝘂𝗺𝗲𝗻𝘁
Some vendors claim SQL breaks down after 3 hops. In reality, recursive CTEs traverse arbitrary depth. SQL may not be as compact as Cypher or GQL, but it’s expressive and efficient — the “3-hop wall” is outdated FUD.
𝗟𝗼𝗮𝗱𝗶𝗻𝗴 𝗗𝗮𝘁𝗮 𝗮𝘁 𝗦𝗰𝗮𝗹𝗲
One graph DB is notorious for slow, resource-heavy CSV loads. Distributed RDBMS like CockroachDB can bulk ingest 100s of GB to TBs efficiently.
𝗡𝗼 𝗦𝘁𝗮𝗹𝗲 𝗗𝗮𝘁𝗮
Too often, data must move from transactional systems into a graph before use; by then, it's stale. For AI-driven apps, that lag means hallucinations, missed insights, and poor UX.
𝗪𝗵𝘆 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝗔𝗜
As AI apps go multi-regional and global, they demand low latency + strong consistency. Centralized graph DBs hit lag, hotspots, scaling pain. Distributed SQL delivers expressive queries and global consistency — exactly what AI workloads need.
You don’t need to pick “graph” or “relational” as religion. Choose the right model for scale, consistency, and AI grounding. Sometimes RDF. Sometimes LPG. And sometimes, graph-enforced in SQL.
#KnowledgeGraph #ArtificialIntelligence #GenerativeAI #DistributedSQL #CockroachDB
Build a knowledge graph from structured & unstructured data [Code Tutorial]
Looking into building knowledge graphs? Check out this code tutorial on how we built a knowledge graph of the latest 'La Liga' standings! ⚽️👩💻 Google Coll...
Webinar: Semantic Graphs in Action - Bridging LPG and RDF Frameworks - Enterprise Knowledge
As organizations increasingly prioritize linked data capabilities to connect information across the enterprise, selecting the right graph framework to leverage has become more important than ever. In this webinar, graph technology experts from Enterprise Knowledge (Elliot Risch, James Egan, David Hughes, and Sara Nash) shared the best ways to manage and apply a selection of these frameworks to meet enterprise needs.
Tackle the core challenges related to enterprise-ready graph representation and learning. With this hands-on guide, applied data scientists, machine learning engineers, and... - Selection from Scaling Graph Learning for the Enterprise [Book]
A new notebook exploring Semantic Entity Resolution & Extraction using DSPy and Google's new LangExtract library.
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.
Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).
On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text.
It’s an end-to-end flow tackling one of the most persistent data challenges.
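The blocking stage of that recipe can be sketched compactly. In this toy version, character trigrams stand in for real embeddings, and the DSPy LLM-matching step is omitted; all mention strings are illustrative:

```python
from collections import Counter
from math import sqrt

def char_trigrams(text: str) -> Counter:
    # Cheap stand-in for a real embedding model.
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_blocking(mentions: list, k: int = 2) -> set:
    """Keep only each mention's k nearest neighbours as candidate pairs,
    avoiding the O(n^2) all-pairs comparison before the LLM-matching step."""
    vecs = {m: char_trigrams(m) for m in mentions}
    pairs = set()
    for m in mentions:
        sims = sorted(((cosine(vecs[m], vecs[o]), o) for o in mentions if o != m),
                      reverse=True)
        for _, o in sims[:k]:
            pairs.add(tuple(sorted((m, o))))
    return pairs

print(knn_blocking(["Walmart Inc.", "Walmart", "Apple Inc.", "Apple"]))
```

Only the surviving candidate pairs would then be passed to the (more expensive) LLM matcher, which is what makes blocking worthwhile at scale.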
Check it out, experiment with your own data, 𝐞𝐧𝐣𝐨𝐲 𝐭𝐡𝐞 𝐬𝐮𝐦𝐦𝐞𝐫 and let me know your thoughts!
cc Paco Nathan you might like this 😉
https://wor.ai/8kQ2qa
Stop manually building your company's brain. ❌
Having reviewed the excellent DeepLearning.AI lecture on Agentic Knowledge Graph Construction by Andreas Kollegger, and while writing a book on agentic graph systems with Sam Julien, it is clear to me that agentic systems represent a shift in how we build and maintain knowledge graphs (KGs).
Most organizations are sitting on a goldmine of data spread across CSVs, documents, and databases.
The dream is to connect it all into a unified Knowledge Graph, an intelligent brain that understands your entire business.
The reality? It's a brutal, expensive, and unscalable manual process.
But a new approach is changing everything.
Here’s the new playbook for building intelligent systems:
🧠 Deploy an AI Agent Workforce
Instead of rigid scripts, you use a cognitive assembly line of specialized AI agents. A Proposer agent designs the data model, a Critic refines it, and an Extractor pulls the facts.
This modular approach is proven to reduce errors and improve the accuracy and coherence of the final graph.
🎨 Treat AI as a Designer, Not Just a Doer
The agents act as data architects. In discovery mode, they analyze unstructured data (like customer reviews) and propose a new logical structure from scratch.
In an enterprise with an existing data model, they switch to alignment mode, mapping new information to the established structure.
🏛️ Use a 3-Part Graph Architecture
This technique is key to managing data quality and uncertainty. You create three interconnected graphs:
The Domain Graph: Your single source of truth, built from trusted, structured data.
The Lexical Graph: The raw, original text from your documents, preserving the evidence.
The Subject Graph: An AI-generated bridge that connects them. It holds extracted insights that are validated before being linked to your trusted data.
Jaro-Winkler is a string comparison algorithm that measures the similarity between two strings (a score from 0 to 1, not an edit distance). It can be used here for entity resolution, the process of identifying and linking entities from the unstructured text (Subject Graph) to the official entities in the structured database (Domain Graph).
For example, the algorithm compares a product name extracted from a customer review (e.g., "the gothenburg table") with the official product names in the database. If the Jaro-Winkler similarity score is above a certain threshold, the system automatically creates a CORRESPONDS_TO relationship, effectively linking the customer's comment to the correct product in the supply chain graph.
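A minimal, self-contained Jaro-Winkler in Python might look like the following. It assumes the usual default parameters (scaling factor p = 0.1, prefix bonus capped at 4 characters); in practice you would likely reach for a library such as jellyfish instead:

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(len1, len2) // 2 - 1
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    # Count characters that match within the sliding window.
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among the matched characters.
    t, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:  # prefix bonus is capped at 4 chars
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611, the textbook example
```

In the scenario above you would lowercase and normalize both names first, then create the CORRESPONDS_TO edge only when the score clears your chosen threshold (e.g. 0.85).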
🤝 Augment Humans, Don't Replace Them
The workflow is Propose, then Approve. AI does the heavy lifting, but a human expert makes the final call.
This process is made reliable by tools like Pydantic and Outlines, which enforce a rigid contract on the AI's output, ensuring every piece of data is perfectly structured and consistent.
And once discovered and validated, a schema can be enforced.
by J Bittner. John Sowa once observed: "In logic, the existential quantifier ∃ is a notation for asserting that something exists. But logic itself has no vocabulary for describing the things that exist."
FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
Sharing our recent research 𝐅𝐢𝐧𝐑𝐞𝐟𝐥𝐞𝐜𝐭𝐊𝐆: 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐂𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐅𝐢𝐧𝐚𝐧𝐜𝐢𝐚𝐥 𝐊𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞 𝐆𝐫𝐚𝐩𝐡𝐬. It is the largest financial knowledge graph built from unstructured data. The preprint of our article is out on arXiv now (link is in the comments). It is coauthored with Abhinav Arun | Fabrizio Dimino | Tejas Prakash Agrawal
While LLMs make it easier than ever to generate knowledge graphs, the real challenge lies in ensuring quality without hallucinations, with strong coverage, precision, comprehensiveness, and relevance. FinReflectKG tackles this through an iterative, evaluation-driven agentic approach, carefully optimized across multiple evaluation metrics to deliver a trustworthy and high-quality knowledge graph.
Designed to power use cases like entity search, question answering, signal generation, predictive modeling, and financial network analysis, FinReflectKG sets a new benchmark for building reliable financial KGs and showcases the potential of agentic workflows in LLM-driven systems.
We will be creating a suite of benchmarks using FinReflectKG for KG-related tasks in financial services. More details to come soon.
barnard59 is a toolkit to automate extract, transform and load (ETL) tasks. It allows you to generate RDF out of non-RDF data sources
Reliability in data pipelines depends on knowing what went wrong before your users do. With the new OpenTelemetry integration in our RDF ETL framework barnard59, every pipeline and API integration is now fully traceable!
Errors, validation results and performance metrics are automatically collected and visualised in Grafana. Instead of hunting through logs, you immediately see where time was spent and where an error occurred. This makes RDF-based ETL pipelines far more transparent and easier to operate at scale.
SynaLinks is an open-source framework designed to make it easier to partner language models (LMs) with your graph technologies. Since most companies are not in a position to train their own language models from scratch, SynaLinks empowers you to adapt existing LMs on the market to specialized tasks.
In the history of data standards, a recurring pattern should concern anyone working in semantics today. A new standard emerges, promises interoperability, gains adoption across industries or agencies, and for a time seems to solve the immediate need.
MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
When AI Diagnoses Patients, Should Reasoning Be a Team Sport?
👉 Why Existing Approaches Fall Short
Medical question answering demands precision, but current AI methods struggle with two key issues:
1. Error Accumulation: Linear reasoning chains (like Chain-of-Thought) risk compounding mistakes—if the first step is wrong, the entire answer falters.
2. Flat Knowledge Retrieval: Traditional retrieval-augmented methods treat medical facts as unrelated text snippets, ignoring complex relationships between symptoms, diseases, and treatments.
This leads to unreliable diagnoses and opaque decision-making—a critical problem when patient outcomes are at stake.
👉 What MIRAGE Does Differently
MIRAGE transforms reasoning from a solo sprint into a coordinated team effort:
- Parallel Detective Work: Instead of one linear chain, multiple specialized "detectives" (reasoning chains) investigate different symptoms or entities in parallel.
- Structured Evidence Hunting: Retrieval operates on medical knowledge graphs, tracing connections between symptoms (e.g., "face pain → lead poisoning") rather than scanning documents.
- Cross-Check Consensus: Answers from parallel chains are verified against each other to resolve contradictions, like clinicians discussing differential diagnoses.
👉 How It Works (Without the Jargon)
1. Break It Down
- Splits complex queries ("Why am I fatigued with knee pain?") into focused sub-questions grounded in specific symptoms/entities.
- Example: "Conditions linked to fatigue" and "Causes of knee lumps" become separate investigation threads.
2. Graph-Guided Retrieval
- Each thread explores a medical knowledge graph like a map:
- Anchor Mode: Examines direct connections (e.g., diseases causing a symptom).
- Bridge Mode: Hunts multi-step relationships (e.g., toxin exposure → neurological symptoms → joint pain).
3. Vote & Verify
- Combines evidence from all threads, prioritizing answers supported by multiple independent chains.
- Discards conflicting hypotheses (e.g., ruling out lupus if only one chain suggests it without corroboration).
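The vote-and-verify step can be sketched as a simple consensus filter over the parallel chains' answers (a toy stand-in for the paper's aggregation; the diagnoses below are hypothetical):

```python
from collections import Counter

def consensus(chain_answers, min_support=2):
    """Keep only answers supported by at least `min_support` independent
    chains; return the best-supported one, or None if nothing corroborates."""
    votes = Counter(chain_answers)
    supported = {ans: n for ans, n in votes.items() if n >= min_support}
    return max(supported, key=supported.get) if supported else None

# Three chains converge on one diagnosis; the lone 'lupus' chain is discarded.
print(consensus(["lead poisoning", "lead poisoning", "lupus", "lead poisoning"]))
# lead poisoning
```

Requiring corroboration across independent chains is what limits error propagation: a single chain's mistake cannot win the vote on its own.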
👉 Why This Matters
Tested on three medical benchmarks (including real clinician queries), MIRAGE:
- Outperformed GPT-4 and Tree-of-Thought variants in accuracy (84.8% vs. 80.2%)
- Reduced error propagation by 37% compared to linear retrieval-augmented methods
- Produced answers with traceable evidence paths, critical for auditability in healthcare
The Big Picture
MIRAGE shifts AI reasoning from brittle, opaque processes to collaborative, structured exploration. By mirroring how clinicians synthesize information from multiple angles, it highlights a path toward AI systems that are both smarter and more trustworthy in high-stakes domains.
Paper: Wei et al. MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
𝗛𝗼𝘁 𝘁𝗮𝗸𝗲 𝗼𝗻 𝘁𝗵𝗲 “𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝗗𝗶𝗷𝗸𝘀𝘁𝗿𝗮” 𝗵𝗲𝗮𝗱𝗹𝗶𝗻𝗲𝘀:
The recent result in this paper (https://lnkd.in/dQSbqrhD) is a breakthrough for theory. It beats Dijkstra's classic worst-case bound for single-source shortest paths on directed graphs with non-negative weights. That's big for the research community.
𝗕𝘂𝘁 𝗶𝘁 𝗱𝗼𝗲𝘀𝗻’𝘁 “𝗿𝗲𝘄𝗿𝗶𝘁𝗲” 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗿𝗼𝘂𝘁𝗶𝗻𝗴.
In practice, large-scale systems (maps, logistics, ride-hailing) moved past plain Dijkstra years ago. They rely on heavy preprocessing. Contraction Hierarchies, Hub Labels and other methods are used to answer point-to-point queries in milliseconds, even on large, continental networks.
𝗪𝗵𝘆 𝘁𝗵𝗲 𝗱𝗶𝘀𝗰𝗼𝗻𝗻𝗲𝗰𝘁?
• Different goals: The paper targets single-source shortest paths; production prioritizes point-to-point queries at interactive latencies.
• Asymptotics vs. constants: Beating O(m + n log n) matters in principle, but real systems live and die by constants, cache behavior, and integration with traffic/turn costs.
• Preprocessing wins: Once you allow preprocessing, the speedups from hierarchical/labeling methods dwarf Dijkstra and likely any drop-in replacement without preprocessing.
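For context, the baseline these bullets compare against, plain binary-heap Dijkstra, fits in a few lines; preprocessing methods like Contraction Hierarchies layer enormous engineering on top of exactly this primitive (the graph below is a made-up toy):

```python
import heapq

def dijkstra(graph, source):
    """Classic single-source shortest paths with a binary heap:
    O((n + m) log n), the worst-case bound the new result improves on."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

Notice this answers distances from one source to everything, which is precisely the single-source setting the paper targets, not the point-to-point queries production routing cares about.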
We should celebrate the theoretical advance and keep an eye on practical implementations. Just don’t confuse a sorting-barrier result with an immediate upgrade for Google Maps.
𝗕𝗼𝘁𝘁𝗼𝗺 𝗹𝗶𝗻𝗲: Great theory milestone. Production routing already “changed the rules” years ago with preprocessing and smart graph engineering.