GraphNews

#KnowledgeGraph
Building a Biomedical GraphRAG: When Knowledge Graphs Meet Vector Search

A RAG system for biomedical research that uses both vector search and knowledge graphs.

Turns out, you need both.

Vector databases, such as Qdrant, are excellent at handling semantic similarity, but they struggle with relationship queries.

𝐓𝐡𝐞 𝐢𝐬𝐬𝐮𝐞: Author networks, citations, and institutional collaborations aren't semantic similarities. They're structured relationships that don't live in embeddings.

𝐓𝐡𝐞 𝐡𝐲𝐛𝐫𝐢𝐝 𝐚𝐩𝐩𝐫𝐨𝐚𝐜𝐡

I combined Qdrant for semantic retrieval with Neo4j for relationship queries, using OpenAI's tool-calling to orchestrate between them.

The workflow:

1️⃣ User asks a question
2️⃣ Qdrant retrieves semantically relevant papers
3️⃣ LLM analyzes the query and decides which graph enrichment tools to call
4️⃣ Neo4j returns structured relationship data
5️⃣ Both sources combine into one answer

Same query with the hybrid system: Returns 4 specific collaborators with paper counts, plus relevant research context.

𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐧𝐨𝐭𝐞𝐬

  • I initially tried having the LLM generate Cypher queries directly, but tool-calling worked much better. The LLM only decides which pre-built tool to call; the tools themselves contain reliable Cypher queries, because LLMs are not yet good enough at generating Cypher.

  • For domains with complex relationships, such as biomedical research, legal documents, and enterprise knowledge, combining vector search with knowledge graphs gives you capabilities neither has alone.
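The tool-calling pattern from the notes above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the tool names, the graph schema (`Author`, `Paper`, `WROTE`, `CITES`) and the `run_cypher` callable are all assumptions. The point it shows is that the LLM only picks a tool name and arguments, while the Cypher text stays fixed and parameterized.

```python
import json

# Hypothetical pre-built tools. Each pairs a fixed, parameterized Cypher
# query with a JSON-schema description the LLM can choose from.
# The labels and relationship types below are assumed for illustration.
TOOLS = {
    "get_collaborators": {
        "description": "List co-authors of an author with shared paper counts.",
        "parameters": {
            "type": "object",
            "properties": {"author_name": {"type": "string"}},
            "required": ["author_name"],
        },
        "cypher": (
            "MATCH (a:Author {name: $author_name})-[:WROTE]->(p:Paper)"
            "<-[:WROTE]-(co:Author) "
            "RETURN co.name AS collaborator, count(p) AS papers "
            "ORDER BY papers DESC"
        ),
    },
    "get_citing_papers": {
        "description": "List papers that cite a given paper.",
        "parameters": {
            "type": "object",
            "properties": {"paper_id": {"type": "string"}},
            "required": ["paper_id"],
        },
        "cypher": (
            "MATCH (c:Paper)-[:CITES]->(p:Paper {id: $paper_id}) "
            "RETURN c.id AS paper_id, c.title AS title"
        ),
    },
}


def tool_specs():
    """Describe the tools in the function-calling shape passed to the LLM."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": tool["description"],
                "parameters": tool["parameters"],
            },
        }
        for name, tool in TOOLS.items()
    ]


def dispatch(tool_name, arguments_json, run_cypher):
    """Run the pre-built query for the tool the LLM selected.

    `arguments_json` is the JSON string of arguments chosen by the LLM.
    `run_cypher(query, params)` is a caller-supplied function (assumed here)
    that executes the query against the graph database.
    """
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    tool = TOOLS[tool_name]
    params = json.loads(arguments_json)
    unexpected = set(params) - set(tool["parameters"]["properties"])
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    return run_cypher(tool["cypher"], params)
```

Because the model never writes query text, a bad tool choice fails loudly in `dispatch` instead of producing malformed Cypher. In a real system `run_cypher` would presumably wrap a Neo4j session, and the Qdrant results would be passed to the LLM alongside the tool output.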

https://www.linkedin.com/posts/activity-7397237155716063232-0pku/

·aiechoes.substack.com·
The point of semantic modeling is to capture all of the detail, all of the knowledge, all of the information that becomes available to us
The point of semantic modeling is to capture all of the detail, all of the knowledge, all of the information that becomes available to us. Here is an example of an extensive semantic model: ontology plus taxonomies for greater depth. This example is quite a comprehensive semantic model if you consider that it’s supported with nearly 100 sets of definitions and descriptions. The model describes knowledge of many different terms, along with an understanding of how those terms are defined, described, and interrelated.

When it becomes difficult to absorb all at once, view it in layers:

- Begin with the simple knowledge graph: understand the nodes and the edges, the illustration of things and relationships among them.
- Then view the property graph to understand the facts that can be known about each thing and each relationship.
- Finally, extend it to include taxonomies to see classes and subclasses.

Another approach for layering might begin with the knowledge graph showing things and relationships, then add entity taxonomies to understand classes and subclasses of entities, and finally extend it to see properties and property taxonomies.

Don’t shy away from large or complex models! Simply plan to manage that detail and complexity by layering and segmenting the diagram. This provides the ability to look at subsets of the model without losing the comprehensive view of enterprise semantics.

Graphic sourced from the ‘Architecture and Design for Data Interoperability’ course by Dave Wells. https://lnkd.in/gtqThWdX
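The layering and segmenting idea can be sketched as a small data structure. This is only an illustration under assumptions: the `SemanticModel` class, its field names, and the example entities in the usage note are hypothetical and do not come from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticModel:
    # (subject, relationship, object) triples: the simple knowledge graph
    edges: list
    # thing -> {property name: description}: the property-graph layer
    properties: dict = field(default_factory=dict)
    # subclass -> superclass: the taxonomy layer
    taxonomy: dict = field(default_factory=dict)

    def layer(self, depth):
        """Return a view of the model up to the given layer.

        depth 1: things and relationships
        depth 2: adds properties
        depth 3: adds taxonomies
        """
        view = {"edges": list(self.edges)}
        if depth >= 2:
            view["properties"] = dict(self.properties)
        if depth >= 3:
            view["taxonomy"] = dict(self.taxonomy)
        return view

    def segment(self, things):
        """Return the subset of the model that touches only the given things."""
        keep = set(things)
        return SemanticModel(
            edges=[e for e in self.edges if e[0] in keep and e[2] in keep],
            properties={k: v for k, v in self.properties.items() if k in keep},
            taxonomy={c: p for c, p in self.taxonomy.items() if c in keep or p in keep},
        )
```

For example, a model with edges such as `("Customer", "PLACES", "Order")` can be viewed at `layer(1)` first, then at `layer(3)`, while `segment(["Customer", "Order"])` gives a smaller diagram. The full model stays intact, because both methods return new views.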
·linkedin.com·
Most agentic systems hardcode their capabilities. This does not scale. Ontologies as executable metadata for the four core agent capabilities can solve this.
·linkedin.com·
Visualizing Knowledge Graphs
A practical guide to visualizing and exploring knowledge graphs (RDF/OWL and property graphs) with yFiles: predicate-aware analysis, schema vs. instance views, appropriate layouts, semantic styling, and interaction patterns like predicate filters and progressive disclosure.
Visualizing Knowledge Graphs: A Comprehensive Guide
·yfiles.com·
Ontologies bring context
I used the o word last week and it hit a few nerves. Ontologies bring context. But context engineering is very poorly understood. Agent engineers talk about it and assume everyone is doing it, but almost everyone is winging it. Here's what context engineering is definitely not: longer prompts. What it actually is: the right information, with the right meaning, at the right time. Not more information, but the right information with the right meaning. Sounds super abstract. That's why there is a brief video that actually breaks down how to load context. Okay, not brief, but context needs context.
·linkedin.com·
Most companies think their knowledge graph or ontology will be built by extracting information from their data only to find out that their data doesn’t contain much information
Most companies think their knowledge graph or ontology will be built by extracting information from their data only to find out that their data doesn’t contain much information.

You’re taught the cycle of Data - Information - Knowledge - Wisdom, but the lesson usually stops before a fundamental concept of information theory: you can measure the information in a dataset. There’s an entire area of study around defining whether a dataset has sufficient information to answer a question and build a model. Run that evaluation on most enterprise data and business questions and you’ll see the extent of the problem.

No downstream process (cleaning, transformation, wrangling, etc.) can introduce information that a dataset doesn’t already contain. Said simply, you can’t clean the signal back into the data. If it wasn’t gathered contextually, the information was lost.

For almost a decade, I have had to give new clients the same sad story. Roughly 80% of the business’s data doesn’t contain enough information to be used for model training. LLMs don’t change that.

Agents need a lot more information to do their jobs reliably. An agent detects intent, then infers the desired outcome and all the steps required to deliver it. RAG over knowledge graphs is intended to provide all the supporting information required to do that reliably. However, if your datasets don’t contain enough information, no amount of AI can fix it.

Before building an agent, we must assess whether our data contains enough information to satisfy the range of intents our users will bring to it. That’s an even higher bar than just answering a question or predicting a single variable. Agents create an information problem on both sides of the equation:

- Do you have enough information to define the intent and outcome based on the user’s prompt?
- Do you have enough information to define the steps required to deliver the outcome and execute them reliably?

Information and knowledge management are the keys that unlock AI’s value, but businesses must curate datasets in new ways to succeed. The enterprise’s BI datasets and data warehouses rarely contain enough information to get the job done.
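One common way to put a number on "how much information does this column carry about the question" is mutual information. The sketch below is an assumption about what such a measurement could look like, not the evaluation the author runs: it uses a simple plug-in estimate for discrete columns, which is biased on small samples, and the function names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for two aligned columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

A column with near-zero mutual information against the target stays near zero after any deterministic cleaning step applied to that column alone, which is the "you can't clean the signal back into the data" point in numeric form. A lossy transform can only hold the value steady or lower it.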
·linkedin.com·
Ontologies transcend their traditional role as static schema documentation and emerge as dynamic, executable metadata that actively controls and defines the capabilities of AI agents
Ontologies transcend their traditional role as static schema documentation and emerge as dynamic, executable metadata that actively controls and defines the capabilities of AI agents. They're storing the instructions agents use to operate on that data.

Traditional software architectures separate code from data, with logic hardcoded in application layers while data resides in storage layers. The ontology-based approach fundamentally challenges this separation by storing behavioral rules and tool definitions as graph data that agents actively query during execution. Ontologies in these systems operate as runtime-queryable metadata rather than compile-time specifications. This is meta-programming at the database level, and the technical implications are profound:

Traditional approach: Your agent has hardcoded tools. Each tool is a Python function that knows exactly what query to run, which entity types to expect, and how to navigate relationships.

Ontology-as-meta-tool approach: Your agent has THREE generic tools that query the ontology at runtime to figure out how to operate.

Here's the technical breakdown:

- Tool 1 does semantic search and returns mixed entity types (could be Artist nodes, Subject nodes, whatever matches the vector similarity).
- Tool 2 queries the ontology: "For this entity type, what property serves as the unique identifier?" The ontology responds because properties are marked with "inverseFunctional" annotations. Now the agent knows how to retrieve specific instances.
- Tool 3 queries the ontology again: "Which relationships from this entity type are marked as contextualizing?" The ontology returns relationship types. The agent then constructs a dynamic Cypher query using those relationship types as parameters.

The breakthrough: The same three tools work for ANY domain. Swap the art gallery ontology for a medical ontology, and the agent adapts instantly because it's reading navigation rules from the graph, not from code.

This is self-referential architecture. The system queries its own structure to determine its own behavior. The ontology becomes executable metadata - not documentation about the system, but instructions that drive the system.

The technical pattern:

- Store tool definitions as (:Tool) nodes with Cypher implementations as properties
- Mark relationships with custom annotations (contextualizing: true/false)
- Mark properties with OWL annotations (inverseFunctional for identifiers)
- Agent queries these annotations at runtime to construct dynamic queries

Result: You move from procedural logic (IF entity_type == "Artist" THEN...) to declarative logic (query the ontology to learn the rules). The system can now analyze its own schema, identify missing capabilities, and propose new tool definitions. It's not just configurable - it's introspective.

What technical patterns have you found for making agent capabilities declarative rather than hardcoded?
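Tools 2 and 3 from the breakdown above can be sketched as follows. In the post the ontology lives in the graph and is queried at runtime; here a plain dictionary stands in for it so the example runs on its own. The dictionary layout, the function names and the example ontology in the usage note are assumptions for illustration, not the author's implementation.

```python
def identifier_property(ontology, entity_type):
    """Tool 2 (sketch): find the property annotated as inverseFunctional."""
    for prop, annotations in ontology[entity_type]["properties"].items():
        if annotations.get("inverseFunctional"):
            return prop
    raise LookupError(f"no identifier property declared for {entity_type}")


def contextualizing_relationships(ontology, entity_type):
    """Tool 3 (sketch): list relationship types annotated as contextualizing."""
    rels = ontology[entity_type]["relationships"]
    return sorted(r for r, ann in rels.items() if ann.get("contextualizing"))


def build_context_query(ontology, entity_type):
    """Assemble a Cypher query from what the ontology declares.

    The label and relationship types are interpolated from the ontology
    (trusted metadata), while the instance identifier stays a $id parameter.
    """
    id_prop = identifier_property(ontology, entity_type)
    rels = contextualizing_relationships(ontology, entity_type)
    if not rels:
        raise LookupError(f"no contextualizing relationships for {entity_type}")
    rel_pattern = "|".join(rels)
    return (
        f"MATCH (n:{entity_type} {{{id_prop}: $id}})-[r:{rel_pattern}]-(m) "
        "RETURN type(r) AS relationship, m"
    )
```

Given a hypothetical art ontology where `Artist` has `name` marked `inverseFunctional` and `CREATED` marked `contextualizing`, `build_context_query(ontology, "Artist")` yields a query keyed on `name` that follows `CREATED`. Passing a medical ontology instead changes the generated query without touching the three functions, which is the domain-swap property the post describes.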
·linkedin.com·
The Knowledge Graph Talent Shortage: Why Companies Can't Find the Skills They Desperately Need
In my previous posts, I showed how Google's Knowledge Graph gives them a major AI advantage (https://lnkd.in/d5ZpMYut), and how enterprises from IKEA to Siemens to AstraZeneca have been using knowledge graphs and now leverage them for GenAI applications (https://lnkd.in/dPhuUhFJ). But here's the problem: we don't have enough people who know how to build them.

📊 The numbers tell the story. Job boards show thousands of open positions globally for ontology engineers, semantic web developers, and knowledge graph specialists. Yet these positions remain unfilled for months. Salaries for this expertise are rising, and technology vendors report that clients now call them, rather than the vendors chasing business.

🤔 Why the shortage? The semantic web emerged in the early 2000s with technologies like RDF, OWL, and SPARQL. A small group of pioneers built this expertise. I was part of that early wave. I contributed to the POSC Caesar Association oil and gas ontology, was certified as an ontology modeller, and participated in the W3C workshop hosted by Chevron in Houston in 2008. Later I led the Integrated Operations in the High North (IOHN) program with 23 companies like ABB, Siemens, and Cisco to increase semantic web knowledge within Equinor's vendor ecosystem. After IOHN, I stepped away for over a decade. The Knowledge Graph Alliance (KGA) drew me back. Companies need people who can design ontologies, write SPARQL queries, map enterprise data to semantic standards, and integrate knowledge graphs with LLMs. These aren't skills you pick up in a weekend bootcamp.

🔄 What needs to change? Universities must integrate semantic knowledge graphs into core curriculum alongside AI and machine learning as requirements, not electives. Here's something many don't realize: philosophy matters. Some of the best ontologists have philosophy degrees. Understanding how to represent knowledge requires training in logic and formal reasoning. DAMA International®'s Data Management Body of Knowledge covers 11 knowledge areas, but knowledge graphs remain absent. Including them would legitimize the discipline. Industry-academia bridges are critical. Organizations like the KGA bring together industry leaders with research organizations and academia. We need more such collaborations.

💡 The opportunity: If you're a data engineer or data scientist looking for a career differentiator, semantic web skills are your ticket.

🎯 The bottom line: Knowledge graphs aren't optional for industrial-scale GenAI. But you need the people who understand them. While reports document tech talent shortages, the semantic web skills gap remains largely undocumented even as companies struggle to fill thousands of positions.

What's your experience with the shortage? Are you hiring? Upskilling? Teaching this?

#KnowledgeGraphs #SemanticWeb #AI #GenAI #TalentShortage #SkillsGap #Ontology #DataScience #Philosophy #DigitalTransformation
·linkedin.com·
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
Alhamdulillah, ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds. Just as matter is formed from atoms, and galaxies are formed from stars, knowledge is likely to be formed from atomic knowledge graphs.

Atomic knowledge graphs were born from our intention to solve a common problem in LLM-based KG construction methods: a lack of exhaustivity and stability. Often, these methods produce unstable KGs that change when rerunning the construction process, even without changing anything. Moreover, they fail to capture all facts in the input documents and usually overlook the temporal and dynamic aspects of real-world data.

What is the solution? Atomic facts that are temporally aware. Instead of constructing knowledge graphs from raw documents, we split them into atomic facts, which are self-contained and concise propositions. Temporal atomic KGs are constructed from each atomic fact. Then, we defined how the temporal atomic KGs would be merged at the atomic level and how the temporal aspects would be handled. We designed a binary merge algorithm that combines two TKGs and a parallel merge process that merges all TKGs simultaneously. The entire architecture operates in parallel.

ATOM employs dual-time modeling that distinguishes observation time from validity time and has 3 main modules:

- Module 1 (Atomic Fact Decomposition) splits input documents observed at time t into atomic facts using LLM-based prompting, where each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information.
- Module 2 (Atomic TKGs Construction) extracts 5-tuples in parallel from each atomic fact to construct atomic temporal KGs, while embedding nodes and relations and addressing temporal resolution during extraction.
- Module 3 (Parallel Atomic Merge) employs a binary merge algorithm to merge pairs of atomic TKGs through iterative pairwise merging in parallel until convergence, with three resolution phases: (1) entity resolution, (2) relation name resolution, and (3) temporal resolution that merges observation and validity time sets for relations with similar (e_s, r_p, e_o). The resulting TKG snapshot is then merged with the previous DTKG to yield the updated DTKG.

Results: Empirical evaluations demonstrate that ATOM achieves ~18% higher exhaustivity, ~17% better stability, and over 90% latency reduction compared to baseline methods (including iText2KG), demonstrating strong scalability potential for dynamic TKG construction.

Check out ATOM's architecture and code:
Preprint Paper: https://lnkd.in/dsJzDaQc
Code: https://lnkd.in/drZUyisV
Website: (coming soon)
Example use cases: (coming soon)

Special thanks to the dream team: Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau 📚📡
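The iterative pairwise merge in Module 3 can be illustrated with a toy sketch. This is not ATOM's code. It assumes each atomic TKG is a dictionary keyed by an `(e_s, r_p, e_o)` triple with sets of observation and validity times, and it replaces the entity, relation-name and similarity-based resolution phases with exact matching on the triple. Only the merge shape (pairs merged in parallel, round after round, until one graph remains) is taken from the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def binary_merge(a, b):
    """Merge two toy TKGs, unioning time sets for identical triples.

    Exact triple equality stands in for the similarity-based entity and
    relation resolution described in the post (an assumption of this sketch).
    """
    merged = {
        triple: {"obs": set(t["obs"]), "valid": set(t["valid"])}
        for triple, t in a.items()
    }
    for triple, times in b.items():
        slot = merged.setdefault(triple, {"obs": set(), "valid": set()})
        slot["obs"] |= times["obs"]
        slot["valid"] |= times["valid"]
    return merged


def parallel_merge(tkgs):
    """Merge all TKGs by repeated rounds of pairwise merging."""
    layer = list(tkgs)
    if not layer:
        return {}
    with ThreadPoolExecutor() as pool:
        while len(layer) > 1:
            # Pair up neighbours; an unpaired last element carries over.
            pairs = [(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
            merged = list(pool.map(lambda pair: binary_merge(*pair), pairs))
            if len(layer) % 2:
                merged.append(layer[-1])
            layer = merged
    return layer[0]
```

With n atomic graphs the loop finishes in about log2(n) rounds, and the merges within a round are independent of each other, which is what makes the step parallelizable. Merging the resulting snapshot into a previous graph could reuse `binary_merge` in the same way, though how ATOM handles that step in detail is not stated in the post.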
·linkedin.com·