Semantic Data in Medallion Architecture: Enterprise Knowledge Graphs at Scale | LinkedIn
Building Enterprise Knowledge Graphs Within Modern Data Platforms - Version 26. Louie Franco III, Enterprise Architect - Knowledge Graph Architect - Semantics Architect. August 3, 2025. In my previous article on Data Vault Medallion Architecture, I outlined how structured data flows through Landing, Bronze…
Jessica Talisman has been publishing a series of articles on Substack about how to develop more robust AI systems by leveraging vocabularies, thesauri, and taxonomies…
A gentle introduction to DSPy for graph data enrichment | Kuzu
📢 Check out our latest blog post by Prashanth Rao, where we introduce the DSPy framework to help you build composable pipelines with LLMs and graphs. In the post, we dive into a fascinating dataset of Nobel laureates and their mentorship networks for a data enrichment task. 👇🏽
✅ The source data that contains the tree structures is enriched with data from the official Nobel Prize API.
✅ We showcase a 2-step methodology that combines the benefits of Kuzu's vector search capabilities with DSPy's powerful primitives to build an LLM-as-a-judge pipeline that helps disambiguate entities in the data (a rough sketch of this idea follows below).
✅ The DSPy approach is scalable, low-cost and efficient, and is flexible enough to apply to a wide variety of domains and use cases.
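To make the 2-step idea concrete, here is a minimal DSPy sketch of the judge step. It assumes an OpenAI-backed `dspy.LM` and an invented `JudgeSameEntity` signature; the candidate pair would come from Kuzu's vector search in step 1, and the blog post's actual signatures, prompts, and model choices may differ.

```python
import dspy

# Assumed model/provider; requires API credentials configured for the chosen backend.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class JudgeSameEntity(dspy.Signature):
    """Decide whether two laureate mentions refer to the same real-world person."""
    mention_a: str = dspy.InputField(desc="Name and context from the mentorship tree data")
    mention_b: str = dspy.InputField(desc="Candidate match returned by Kuzu vector search")
    same_entity: bool = dspy.OutputField(desc="True if both mentions are the same person")

# ChainOfThought makes the judge emit a short rationale before its verdict.
judge = dspy.ChainOfThought(JudgeSameEntity)

# Step 1 (assumed): Kuzu vector search proposes a candidate match for a mention.
# Step 2: the LLM-as-a-judge decides whether the two should be merged.
result = judge(mention_a="Marie Curie, Physics 1903",
               mention_b="Maria Sklodowska-Curie")
print(result.same_entity)
```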
PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its...
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Graph-R1
New RAG framework just dropped!
Combines agents, GraphRAG, and RL.
Here are my notes:
Introduces a novel RAG framework that moves beyond traditional one-shot or chunk-based retrieval by integrating graph-structured knowledge, agentic multi-turn interaction, and RL.
Graph-R1 is an agent that reasons over a knowledge hypergraph environment by iteratively issuing queries and retrieving subgraphs using a multi-step “think-retrieve-rethink-generate” loop.
Unlike prior GraphRAG systems that perform fixed retrieval, Graph-R1 dynamically explores the graph based on evolving agent state.
Retrieval is modeled as a dual-path mechanism: entity-based hyperedge retrieval and direct hyperedge similarity, fused via reciprocal rank aggregation to return semantically rich subgraphs. These are used to ground subsequent reasoning steps.
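The notes don't spell out the exact fusion formula, but reciprocal rank fusion itself is simple. A minimal Python sketch with made-up hyperedge IDs standing in for the two retrieval paths:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (best first) into one ranking by RRF score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

entity_based = ["h3", "h7", "h1"]       # entity-based hyperedge retrieval
direct_similarity = ["h7", "h2", "h3"]  # direct hyperedge similarity
print(reciprocal_rank_fusion([entity_based, direct_similarity]))
```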
The agent is trained end-to-end using GRPO with a composite reward that incorporates structural format adherence and answer correctness. Rewards are only granted if reasoning follows the proper format, encouraging interpretable and complete reasoning traces.
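As a rough illustration of that format-gated reward (not the paper's actual GRPO reward; the tags, weights, and exact-match scoring below are placeholders standing in for its structural check and F1-style correctness):

```python
import re

def composite_reward(trace: str, predicted: str, gold: str,
                     w_format: float = 0.2, w_answer: float = 0.8) -> float:
    """Format-gated reward: no credit at all unless the reasoning trace is well-formed."""
    well_formed = bool(re.search(r"<think>.*?</think>.*?<answer>.*?</answer>", trace, re.S))
    if not well_formed:
        return 0.0  # rewards are only granted when the reasoning format is respected
    correct = float(predicted.strip().lower() == gold.strip().lower())  # stand-in for F1
    return w_format + w_answer * correct
```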
On six RAG benchmarks (e.g., HotpotQA, 2WikiMultiHopQA), Graph-R1 achieves state-of-the-art F1 and generation scores, outperforming prior methods including HyperGraphRAG, R1-Searcher, and Search-R1. It shows particularly strong gains on harder, multi-hop datasets and under OOD conditions.
The authors find that Graph-R1’s performance degrades sharply without its three key components: hypergraph construction, multi-turn interaction, and RL.
The ablation study supports that graph-based, multi-turn retrieval improves information density and accuracy, while end-to-end RL bridges the gap between structure and language.
Paper: https://lnkd.in/eGbf4HhX
Our SPARQL Notebook extension for Visual Studio Code makes it super easy to document SPARQL queries and run them, either against live endpoints or directly on local RDF files. I just (finally!) published a 15-minute walkthrough on our YouTube channel Giant Global Graph. It gives you a quick overview of how it works and how you can get started.
Link in the comments.
Fun fact: I recorded this two years ago and apparently forgot to hit publish. Since then, we've added new features like improved table renderers with pivoting support, so it's even more useful now. Check it out!
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
Instead of just pulling facts, the system samples multi-step paths within the graph, such as a causal chain from a disease to a symptom, and translates these paths into natural language reasoning tasks, complete with a step-by-step thinking trace.
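A toy sketch of that idea, using a two-edge networkx graph and a random walk rather than the paper's actual pipeline (the graph, relation labels, and question template are invented for illustration):

```python
import random
import networkx as nx

# Tiny illustrative medical KG with labeled relations.
kg = nx.DiGraph()
kg.add_edge("Type 2 diabetes", "hyperglycemia", relation="causes")
kg.add_edge("hyperglycemia", "increased thirst", relation="manifests_as")

def sample_path_as_task(graph, start, hops=2):
    """Random-walk a multi-hop path and phrase it as a reasoning task with a trace."""
    path, node = [start], start
    for _ in range(hops):
        successors = list(graph.successors(node))
        if not successors:
            break
        node = random.choice(successors)
        path.append(node)
    steps = [f"{a} --{graph[a][b]['relation']}--> {b}" for a, b in zip(path, path[1:])]
    question = f"Why might a patient with {path[0]} experience {path[-1]}?"
    return {"question": question, "thinking_trace": steps, "answer": path[-1]}

print(sample_path_as_task(kg, "Type 2 diabetes"))
```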
Alhamdulillah, iText2KG v0.0.8 is finally out!
(Yes, I’ve been quite busy these past few months 😅)
.. and it can now build dynamic knowledge graphs. The GIF below shows a dynamic KG generated from OpenAI tweets between June 18 and July 17.
(Note: Temporal/logical conflicts aren't handled yet in this version, but you can still resolve them with a post-processing filter.)
Here are the main updated features:
- iText2KG_Star: Introduced a simpler and more efficient version of iText2KG that eliminates the separate entity extraction step. Instead of extracting entities and relations separately, iText2KG_Star directly extracts triplets from text. This approach is more efficient as it reduces processing time and token consumption and does not need to handle invented/isolated entities.
- Facts-Based KG Construction: Enhanced the framework with facts-based knowledge graph construction using the Document Distiller to extract structured facts from documents, which are then used for incremental KG building. This approach provides more exhaustive and precise knowledge graphs.
- Dynamic Knowledge Graphs: iText2KG now supports building dynamic knowledge graphs that evolve. By leveraging the incremental nature of the framework and document snapshots with observation dates, users can track how knowledge changes and grows.
Check out the new version and an example of OpenAI Dynamic KG Construction in the first comment.
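iText2KG_Star's actual interface isn't shown in the post, so as a rough illustration of the single-pass idea (triplets extracted directly from text, with no separate entity-extraction step), here is a generic sketch in which `llm_call`, the prompt, and the `Triple` type are all hypothetical stand-ins for whatever LLM client and schema you use:

```python
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str

PROMPT = (
    "Extract (subject, predicate, object) triples directly from the text below, "
    "one per line as 'subject | predicate | object':\n\n{text}"
)

def extract_triples(text: str, llm_call) -> list[Triple]:
    """llm_call is any function mapping a prompt string to a completion string."""
    raw = llm_call(PROMPT.format(text=text))
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(Triple(*parts))
    return triples
```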
Why Businesses Must Ground Their AI in Knowledge Graphs | LinkedIn
Here, I clearly explain why businesses must transition from raw tabular data to RDF-based knowledge graphs, and why this is essential to ground AI in logic-driven, traceable inference rather than black-box prediction: 1. Your tabular data is dumb.
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Scaling GraphRAG to Millions of Documents: Lessons from the SIGIR 2025 LiveRAG Challenge
👉 WHY THIS MATTERS
Retrieval-augmented generation (RAG) struggles with multi-hop questions that require connecting information across documents. While graph-based RAG methods like GEAR improve reasoning by structuring knowledge as entity-relationship triples, scaling these approaches to web-sized datasets (millions/billions of documents) remains a bottleneck. The culprit? Traditional methods rely heavily on LLMs to extract triples—a process too slow and expensive for large corpora.
👉 WHAT THEY DID
Researchers from Huawei and the University of Edinburgh reimagined GEAR to sidestep costly offline triple extraction.
Their solution:
- Pseudo-alignment: Link retrieved passages to existing triples in Wikidata via sparse retrieval.
- Iterative expansion: Use a lightweight LLM (Falcon-3B-Instruct) to iteratively rewrite queries and retrieve additional evidence through Wikidata’s graph structure.
- Multi-step filtering: Combine Reciprocal Rank Fusion (RRF) and prompt-based filtering to reconcile noisy alignments between Wikidata and document content.
This approach achieved 87.6% correctness and 53% faithfulness on the SIGIR 2025 LiveRAG benchmark, despite challenges in aligning Wikidata’s generic triples with domain-specific document content.
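To show the shape of the pseudo-alignment step above (the paper's released code contains the real retriever and prompts), here is a minimal sketch using the rank_bm25 package and a few toy verbalized Wikidata triples; both the triples and the tokenization are placeholders:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy stand-ins for verbalized Wikidata triples.
wikidata_triples = [
    "geoduck | parent taxon | Panopea",
    "oyster | reproduction | broadcast spawning",
    "Edinburgh | country | United Kingdom",
]
bm25 = BM25Okapi([t.lower().split() for t in wikidata_triples])

def pseudo_align(passage: str, top_k: int = 2):
    """Sparse-retrieve the Wikidata triples that best match a retrieved passage."""
    return bm25.get_top_n(passage.lower().split(), wikidata_triples, n=top_k)

print(pseudo_align("The Pacific geoduck reproduces by broadcast spawning."))
```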
👉 KEY INSIGHTS
1. Trade-offs in alignment: Linking Wikidata triples to documents works best for general knowledge but falters with niche topics (e.g., "Pacific geoduck reproduction" mapped incorrectly to oyster biology).
2. Cost efficiency: Avoiding LLM-based triple extraction reduced computational overhead, enabling scalability.
3. The multi-step advantage: Query rewriting and iterative retrieval improved performance on complex questions requiring 2+ reasoning hops.
👉 OPEN QUESTIONS
- How can we build asymmetric semantic models to better align text and graph data?
- Can hybrid alignment strategies (e.g., blending domain-specific KGs with Wikidata) mitigate topic drift?
- Does graph expansion improve linearly with scale, or are diminishing returns inevitable?
Why read this paper?
It’s a pragmatic case study in balancing scalability with reasoning depth in RAG systems. The code and prompts are fully disclosed, offering a blueprint for adapting GraphRAG to real-world, large-scale applications.
Paper: "Millions of G∈AR-s: Extending GraphRAG to Millions of Documents" (Shen et al., SIGIR 2025). Preprint: arXiv:2307.17399.
Keeping Wikipedia, the fifth most popular website on the Internet, running smoothly is no small feat. The free encyclopedia hosts more than 65 million articles…
This is the title of my upcoming book. And it’s all about the Shapes Constraint Language (SHACL). Expected release before November 1st 2025. The book is written and illustrated by Veronika He…
I've spent long, hard years learning how to talk about knowledge graphs and semantics with software engineers who have little training in linguistics. I feel quite fluent at this point, after investing huge amounts of effort into understanding statistics (I was a humanities undergrad) and into unpacking…
The future of trustworthy AI.
Powered by graphs.
data² has secured a groundbreaking patent for explainable AI powered by graphs.
🚨 AI hallucinations destroy trust.
That's not acceptable when lives and missions are at stake.
While others rush to patch traditional RAG systems, we've engineered a fundamentally different approach.
Our patented innovation delivers what leaders demand:
🔍 **Complete Transparency**
- Watch AI traverse relationship paths in real-time
- No more black box decisions
📊 **Evidence You Can Trust**
- Every conclusion links to source data
- Full citation trails for audit readiness
How did we build it?
🔗 **Graph-Based Architecture**
- Knowledge graphs capture critical relationships traditional RAG misses
- Every connection adds context and validates accuracy
This isn't just innovation for innovation's sake.
At data² we are solving critical challenges across:
↳ Intelligence operations requiring all-source validation
↳ Cyber threat analysis demanding instant verification
↳ Energy infrastructure decisions where safety is paramount
↳ Financial investigations tracking complex money flows
↳ Supply chain operations in contested environments
While others promise AI accuracy, we've patented how to prove it.
💬 Interested in learning more? Reach out directly.
🔔 Follow me Daniel Bukowski for daily insights about delivering transparent AI with graph technology.
Getting Started with the Graph Query Language (GQL): The complete guide to designing, querying, and managing graph databases with GQL: 9781836204015: Computer Science Books @ Amazon.com
GraphFaker: Instant Graphs for Prototyping, Teaching, and Beyond
I can't tell you how many times I've had a graph analytics idea, only to spend days trying to find decent data to test it on. 😤 Sound familiar?
That's why I'm excited about the talk next week by Dennis Irorere on GraphFaker - a free tool from the GraphGeeks Lab to help with the graph data problem.
Good graph data is ridiculously hard to come by. It's either locked behind privacy walls, messy beyond belief, or not really relationship-centric. I've been there, we've all been there.
Dennis will show us how to:
- Generate realistic social networks quickly
- Pull actual street network data without the headaches
- Access air travel networks, Wikipedia graphs, and more
🌐 Join us on July 29, or register for the recording.
https://lnkd.in/gBxjrWGS
Whether you're in research, prototyping new features, or teaching graph algorithms, this could shorten your workflow. And what really caught my attention is that it will let me focus on the fun part: testing ideas. 🤓
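GraphFaker's own generators and datasets are what the talk covers; as a quick stand-in for "realistic social networks for prototyping," here is a minimal networkx sketch (graph model, sizes, and attribute names are arbitrary choices, not GraphFaker's API):

```python
import networkx as nx

# Scale-free graph as a rough proxy for a social network's hub-heavy degree distribution.
social = nx.barabasi_albert_graph(n=200, m=3, seed=42)
nx.set_node_attributes(social, {i: f"user_{i}" for i in social.nodes}, name="handle")

print(social.number_of_nodes(), social.number_of_edges())
print(sorted(social.degree, key=lambda kv: kv[1], reverse=True)[:5])  # hub accounts
```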
What’s the difference between context engineering and ontology engineering?
We hear a lot about “context engineering” these days in AI wonderland. A lot of good things are being said, but it’s worth noting what’s missing.
Yes, context matters. But context without structure is narrative, not knowledge. And if AI is going to scale beyond demos and copilots into systems that reason, track memory, and interoperate across domains… then context alone isn’t enough.
We need ontology engineering.
Here’s the difference:
- Context engineering is about curating inputs: prompts, memory, user instructions, embeddings. It’s the art of framing.
- Ontology engineering is about modeling the world: defining entities, relations, axioms, and constraints that make reasoning possible.
In other words:
Context guides attention. Ontology shapes understanding.
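To make the ontology-engineering side tangible, here is a tiny rdflib sketch of what "entities, relations, and constraints" look like in practice; the example.org namespace and the Employee/Department model are invented for illustration:

```python
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://example.org/onto#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Entities (classes) and a relation (object property) with domain/range constraints.
g.add((EX.Employee, RDF.type, OWL.Class))
g.add((EX.Department, RDF.type, OWL.Class))
g.add((EX.worksIn, RDF.type, OWL.ObjectProperty))
g.add((EX.worksIn, RDFS.domain, EX.Employee))
g.add((EX.worksIn, RDFS.range, EX.Department))

print(g.serialize(format="turtle"))
```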
What’s dangerous is that many teams stop at context, assuming that if you feed the right words to an LLM, you’ll get truth, traceability, or decisions you can trust. This is what I call “hallucination of control”.
Ontologies provide what LLMs lack: grounding, consistency, and interoperability. But they are hard to build without the right methods, adapted from the discipline that began 20+ years ago with the Semantic Web; now it’s time to work them out for the LLM era.
If you’re serious about scaling AI across business processes or mission-critical systems, the real challenge is more than context: it’s shared meaning. And tech alone cannot solve this.
That’s why we need to put the ontology discussion in the boardroom, because integrating AI into organizations is much more complicated than just providing the right context in a prompt or a context window.
That’s it for today. More tomorrow!
I’m trying to get back to journaling here every day. 🤙 Hope you will find something useful in what I write.