Choosing the Right Format: How Knowledge Graph Layouts Impact AI Reasoning
👉 Why This Matters
Most AI systems blend knowledge graphs (structured data) with large language models (flexible reasoning). But there’s a hidden variable: how you translate the graph into text for the AI. The researchers found that the formatting choice alone can swing performance by up to 17.5% on reasoning tasks. Imagine solving roughly one in five more problems correctly just by adjusting how you present the data.
👉 What They Built
KG-LLM-Bench is a new benchmark to test how language models reason with knowledge graphs.
It includes five tasks:
- Triple verification (“Does this fact exist?”)
- Shortest path finding (“How are two concepts connected?”)
- Aggregation (“How many entities meet X condition?”)
- Multi-hop reasoning (“Which entities linked to A also have property B?”)
- Global analysis (“Which node is most central?”)
The team tested seven models (Claude, GPT-4o, Gemini, Llama, Nova) with five ways to “textualize” graphs, from simple edge lists to structured JSON and semantic web formats like RDF Turtle.
👉 Key Insights
1. Format matters more than assumed:
- Structured JSON and edge lists performed best overall, but results varied by task.
- For example, JSON excels at aggregation tasks (data is grouped by entity), while edge lists help identify central nodes (repeated mentions highlight connections).
2. Models don’t cheat:
Replacing real entity names with fake ones (e.g., “France” → “Verdania”) caused only a 0.2% performance drop, suggesting models rely on the provided context rather than memorized knowledge.
3. Token efficiency:
- Edge lists used ~2,600 tokens vs. JSON-LD’s ~13,500. Shorter formats free up context space for complex reasoning.
- But concise ≠ always better: structured formats improved accuracy for tasks requiring grouped data.
4. Models struggle with directionality:
Counting outgoing edges (e.g., “Which countries does France border?”) is easier than incoming ones (“Which countries border France?”), likely due to formatting biases.
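To make the format differences concrete, here is a small sketch that serializes the same toy graph two ways: a compact edge list and an entity-grouped JSON structure. These formats are illustrative approximations of the benchmark’s textualizers, not its exact serializations, and the triples are made up.

```python
# Sketch: two ways to textualize the same tiny knowledge graph.
# Hypothetical formats inspired by the benchmark's edge-list and JSON
# textualizers; not the paper's exact serializations.
import json

triples = [
    ("France", "borders", "Spain"),
    ("France", "borders", "Italy"),
    ("Spain", "capital", "Madrid"),
]

# 1) Edge list: one "subject predicate object" line per triple (compact,
#    so it leaves more context-window room for reasoning).
edge_list = "\n".join(f"{s} {p} {o}" for s, p, o in triples)

# 2) Structured JSON: facts grouped by subject entity, which makes
#    aggregation questions ("how many countries does France border?")
#    easier to answer at a glance.
grouped = {}
for s, p, o in triples:
    grouped.setdefault(s, {}).setdefault(p, []).append(o)
as_json = json.dumps(grouped, indent=2)

print(edge_list)
print(len(edge_list), len(as_json))  # the edge list is much shorter
```

Note how the JSON version already answers the outgoing-edge question for France by construction, while answering “which countries border France?” requires scanning every entity’s entry — one plausible reading of the directionality bias above.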
👉 Practical Takeaways
- Optimize for your task: Use JSON for aggregation, edge lists for centrality.
- Test your model: The best format depends on the LLM—Claude thrived with RDF Turtle, while Gemini preferred edge lists.
- Don’t fear pseudonyms: Masking real names minimally impacts performance, useful for sensitive data.
The benchmark is openly available, inviting researchers to add new tasks, graphs, and models. As AI handles larger knowledge bases, choosing the right “data language” becomes as critical as the reasoning logic itself.
Paper: KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
Authors: Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
Our first attempts at mechanistic interpretability of Transformers from the perspective of network science and graph theory! Check out our preprint: arxiv.org/abs/2502.12352
A wonderful collaboration with superstar MPhil students Batu El, Deepro Choudhury, as well as Pietro Lio' as part of the Geometric Deep Learning class last year at University of Cambridge Department of Computer Science and Technology
We were motivated by Demis Hassabis describing AlphaFold and other AI systems for scientific discovery as ‘engineering artifacts’. We need new tools to interpret their underlying mechanisms and advance our scientific understanding. Graph Transformers are a good place to start.
The key ideas are:
- Attention across multiple heads and layers can be seen as a heterogeneous, dynamically evolving graph.
- Attention graphs are complex systems that represent information flow in Transformers.
- We can use network science to extract mechanistic insights from them!
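The idea can be sketched in a few lines: treat one attention head’s weight matrix as a weighted directed graph over tokens, then apply a simple network-science measure. The tokens and weights below are invented for illustration; the paper builds these graphs across all heads and layers.

```python
# Toy sketch of the "attention graph" idea: one attention head's weight
# matrix viewed as a weighted directed graph over tokens.
tokens = ["The", "cat", "sat"]
# Hypothetical attention weights: attn[i][j] = attention from token i to j.
attn = [
    [0.1, 0.7, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.8, 0.1],
]

# Edges of the attention graph, keeping only strong connections.
edges = [(tokens[i], tokens[j], w)
         for i, row in enumerate(attn)
         for j, w in enumerate(row) if w >= 0.4]

# Weighted in-degree: how much attention each token *receives* --
# a crude proxy for its importance in the information flow.
in_degree = {t: 0.0 for t in tokens}
for _, dst, w in edges:
    in_degree[dst] += w

print(max(in_degree, key=in_degree.get))  # prints "cat"
```

From here, richer network-science tools (centrality, community detection, flow analysis) apply directly to the extracted graph.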
More to come on the network science perspective on understanding LLMs!
A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research
🚀 Thrilled to share our latest work published in Nature Machine Intelligence!
📄 "A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research"
In this study, we constructed iKraph, one of the most comprehensive biomedical knowledge graphs to date, using a human-level information extraction pipeline that won both the LitCoin NLP Challenge and the BioCreative Challenge. iKraph integrates insights from over 34 million PubMed abstracts and 40 public databases, enabling unprecedented scale and precision in automated knowledge discovery (AKD).
💡 What sets our work apart?
We developed a causal knowledge graph and a probabilistic semantic reasoning (PSR) algorithm to infer indirect entity relationships, such as drug-disease relationships. This time-aware framework allowed us to retrospectively and prospectively validate drug repurposing and drug target predictions, something rarely done in prior work.
✅ For COVID-19, we predicted hundreds of drug candidates in real-time, one-third of which were later supported by clinical trials or publications.
✅ For cystic fibrosis, we demonstrated our predictions were often validated up to a decade later, suggesting our method could significantly accelerate the drug discovery pipeline.
✅ Across diverse diseases and common drugs, we achieved benchmark-setting recall and positive predictive rates, pushing the boundaries of what's possible in drug repurposing.
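The flavor of path-based inference behind such predictions can be illustrated with a toy example: score an indirect drug→disease link by combining confidences along two-hop causal paths. This is a simplified sketch with invented entities and scores, not the paper’s actual PSR algorithm.

```python
# Toy illustration of path-based inference over a causal graph: score an
# indirect drug->disease relationship from direct-edge confidences.
# Simplified sketch; NOT the paper's PSR algorithm. All entities and
# scores below are hypothetical.

# Hypothetical direct relations with confidence scores.
drug_gene = {("drugA", "geneX"): 0.9, ("drugA", "geneY"): 0.6}
gene_disease = {("geneX", "diseaseZ"): 0.8, ("geneY", "diseaseZ"): 0.5}

def indirect_score(drug, disease):
    # Probability that at least one two-hop causal path is active:
    # 1 - prod(1 - p_path), where p_path is the product of edge scores.
    p_none = 1.0
    for (d, g), p1 in drug_gene.items():
        for (g2, dis), p2 in gene_disease.items():
            if d == drug and g == g2 and dis == disease:
                p_none *= 1.0 - p1 * p2
    return 1.0 - p_none

print(round(indirect_score("drugA", "diseaseZ"), 3))  # prints 0.804
```

Two independent paths (via geneX and geneY) reinforce each other, which is why combining evidence across paths can surface repurposing candidates that no single direct finding supports.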
We believe this study sets a new frontier in biomedical discovery and demonstrates the power of structured knowledge and interpretability in real-world applications.
📚 Read the full paper: https://lnkd.in/egYgbYT4?
📌 Access the platform: https://lnkd.in/ecxwHBK7
📂 Access the data and code: https://lnkd.in/eBp2GEnH
LitCoin NLP Challenge: https://lnkd.in/e-cBc6eR
Kudos to our incredible team and collaborators who made this possible!
#DrugDiscovery #AI #KnowledgeGraph #Bioinformatics #MachineLearning #NatureMachineIntelligence #DrugRepurposing #LLM #BiomedicalAI #NLP #COVID19 #Insilicom #NIH #NCI #NSF #ARPA-H
Uncover data's hidden connections using graph analytics in BigQuery. This session shows how to use BigQuery's scalable infrastructure for graph analysis directly in your data warehouse. Identify patterns, connections, and influences for fraud detection, drug discovery, social network analysis, and recommendation engines. Join us to explore the latest innovations in graphs and see real-world examples. Transform your data into actionable insights with BigQuery's powerful graph capabilities.
Graph Data Modeling Without Graph Databases: PostgreSQL and Hybrid Approaches for Agentic Systems 🖇️
Organizations implementing AI systems today face a practical challenge: maintaining multiple specialized databases (vector stores, graph databases, relational systems) creates significant operational complexity, increases costs, and introduces synchronization headaches.
Companies like Writer (insight from a recent Waseem Alshikh interview with Harrison Chase) have tackled this problem by implementing graph-like structures directly within PostgreSQL, eliminating the need for separate graph databases while maintaining the necessary functionality. This approach dramatically simplifies infrastructure management, reduces the number of systems to monitor, and eliminates error-prone synchronization processes that can cost thousands of dollars in wasted resources.
For enterprises focused on delivering business value rather than managing technical complexity, these PostgreSQL-based implementations offer a pragmatic path forward, though with important trade-offs when considering more sophisticated agentic systems.
Writer implemented a subject-predicate-object triple structure directly in PostgreSQL tables rather than using dedicated graph databases. This approach maintains the semantic richness of knowledge graphs while leveraging PostgreSQL's maturity and scalability. Writer kept the conceptual structure of triples that underpin knowledge graphs implemented through a relational schema design.
Instead of relying on native graph traversals, Writer developed a fusion decoder that reconstructs graph-like relationships at query time. This component serves as the bridge between the storage layer (PostgreSQL with its triple-inspired structure) and the language model, enabling sophisticated information retrieval without requiring a dedicated graph database's traversal capabilities. The approach focuses on query translation and result combination rather than storage structure optimization.
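The storage side of this design is easy to sketch: a single subject/predicate/object table plus a recursive query for traversal. The snippet below uses SQLite as a stand-in for PostgreSQL (the recursive-CTE pattern is the same); Writer’s actual schema is not public, so all table and entity names are illustrative.

```python
# Minimal sketch of graph-in-relational-DB: one triple table plus a
# recursive CTE for traversal. SQLite stands in for PostgreSQL here;
# the schema and data are hypothetical, not Writer's actual design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("acme", "owns", "subsidiaryA"),
        ("subsidiaryA", "owns", "subsidiaryB"),
        ("subsidiaryB", "employs", "alice"),
    ],
)

# Transitive "owns" closure -- the kind of traversal a native graph
# database would run, expressed as a recursive CTE over the triple table.
rows = conn.execute("""
    WITH RECURSIVE owned(entity) AS (
        SELECT object FROM triples
         WHERE subject = 'acme' AND predicate = 'owns'
        UNION
        SELECT t.object FROM triples t
          JOIN owned o ON t.subject = o.entity AND t.predicate = 'owns'
    )
    SELECT entity FROM owned ORDER BY entity
""").fetchall()
print([r[0] for r in rows])  # prints ['subsidiaryA', 'subsidiaryB']
```

Deep traversals get verbose and slower in SQL than in a native graph engine, which is exactly the trade-off the post describes: simpler infrastructure in exchange for doing graph work at query time.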
Complementing the triple-based approach, PostgreSQL with extensions (PG Vector and PG Vector Scale) can function effectively as a vector database. This challenges the notion that specialized vector databases are necessary: treating embeddings as derived data leads to a more natural and maintainable architecture, reframing the database's role from storing independent vector embeddings to managing derived data that automatically synchronizes with its source.
But a critical distinction needs to be drawn between retrieval systems and agentic systems. While PostgreSQL-based approaches excel at knowledge retrieval tasks, where the focus is on precision and relevance, agentic systems operate in dynamic environments where context evolves over time, previous actions influence future decisions, and contradictions need to be resolved. This distinction drives different architectural requirements and suggests potential complementary roles for different database approaches.
Why Labeled Property Graphs Break Reasoning — Even with RDF Interop
At first glance, it seems like LPGs can handle reasoning over RDF data if you just install…
LLMs as Graph Neural Networks | Petar Veličković @ GLOW
Join our Slack and come to the next Graph Learning on Wednesdays (GLOW) session: https://sites.google.com/view/graph-learning-on-weds. On March 26th, 2025, we h...
Is developing an ontology from an LLM really feasible?
It seems the answer to whether an LLM would be able to replace the whole text-to-ontology pipeline is a resounding ‘no’. If you’re one of those who think it should be (or even is?) a ‘yes’: why, and did you do the experiments that show it’s as good as the alternatives (with the results available)? And I mean a proper ontology, not a knowledge graph with numerous duplications and contradictions and lacking constraints.
For a few gentle considerations (and pointers to longer arguments) and a summary figure of processes the LLM supposedly would be replacing: see https://lnkd.in/dG_Xsv_6
What are the Different Types of Graphs? The Most Common Misconceptions and Understanding Their Applications - Enterprise Knowledge
Learn about different types of graphs and their applications in data management and AI, as well as common misconceptions, in this article by Lulit Tesfaye.
AI-Powered Databases Boost the Alzheimer’s Drug Discovery Process
Researchers studying Alzheimer’s disease are using artificial intelligence-powered databases to accelerate the drug discovery process by making it easier to sift through vast amounts of biomedical data.
Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics
Deep stuff! We uncovered a startling link between #entropy, a bedrock concept in #physics, and how #AI can discover new ideas without stagnating. In an era…
Building Flexible Virtual Knowledge Graphs with Ontop and Apache Iceberg | LinkedIn
What’s So Special About Apache Iceberg? Apache Iceberg is one of the most fascinating technologies when it comes to standardized access to large analytic tables. And Apache Iceberg combines very well with the idea of virtual knowledge graphs.
How the Ontology Pipeline Powers Semantic Knowledge Systems
The Need for a Structured Approach, Elements of the Ontology Pipeline, the Pipeline as a Framework for Developing Knowledge Management Systems, and More!
Digital evolution: Novo Nordisk’s shift to ontology-based data management - Journal of Biomedical Semantics
The amount of biomedical data is growing, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach for organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization’s digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data driven drug discovery.
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
LLMs are taking Graph Neural Networks to the next level:
While we've been discussing LLMs for natural language, they're quietly changing how we represent…
Agentic Paranets just landed on the origin_trail DKG. A major paranet feature upgrade built for AI agents with enhanced knowledge graph read/write access control
Knowledge graphs for LLM grounding and avoiding hallucination
This blog post is part of a series that dives into various aspects of SAP’s approach to Generative AI, and its technical underpinnings. In previous blog posts of this series, you learned about how to use large language models (LLMs) for developing AI applications in a trustworthy and reliable manner...
Enabling LLM development through knowledge graph visualization
Discover how to empower LLM development through effective knowledge graph visualization. Learn to leverage yFiles for intuitive, interactive diagrams that simplify debugging and optimization in AI applications.
"Knowledge Graphs Applied" becomes "Knowledge Graphs and LLMs in Action"
🎉🎉 🎉 "Knowledge Graphs Applied" becomes "Knowledge Graphs and LLMs in Action"
Four years ago, we embarked on writing "Knowledge Graphs Applied" with a clear mission: to guide practitioners in implementing production-ready knowledge graph solutions. Drawing from our extensive field experience across multiple domains, we aimed to share battle-tested best practices that transcend basic use cases.
Like fine wine, ideas, and concepts need time to mature. During these four years of careful development, we witnessed a seismic shift in the technological landscape. Large Language Models (LLMs) emerged not just as a buzzword, but as a transformative force that naturally converged with knowledge graphs.
This synergy unlocked new possibilities, particularly in simplifying complex tasks like unstructured data ingestion and knowledge graph-based question-answering.
We couldn't ignore this technological disruption. Instead, we embraced it, incorporating our hands-on experience in combining LLMs with graph technologies. The result is "Knowledge Graphs and LLMs in Action" – a thoroughly revised work with new chapters and an expanded scope.
Yet our fundamental goal remains unchanged: to empower you to harness the full potential of knowledge graphs, now enhanced by their increasingly natural companion, LLMs. This book represents the culmination of a journey that evolved alongside the technology itself. It delivers practical, production-focused guidance for the modern era, in which knowledge graphs and LLMs work in concert.
Now available in MEAP, with new LLM-focused chapters ready to be published.
#llms #knowledgegraph #graphdatascience
The SECI model for knowledge creation, collection, and distribution within the organization
💫 An 𝗲𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗼𝗻𝘁𝗼𝗹𝗼𝗴𝘆 is just a means, not an end.
👉 Transforming 𝘁𝗮𝗰𝗶𝘁 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 into 𝗲𝘅𝗽𝗹𝗶𝗰𝗶𝘁 𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 through an enterprise ontology remains a self-contained exercise unless framed within a broader process of knowledge creation, collection, and distribution within the organization.
👇 The 𝗦𝗘𝗖𝗜 𝗠𝗼𝗱𝗲𝗹 effectively describes the various steps of this process, going beyond mere collection and formalization. The SECI model outlines the following four phases that must be executed iteratively and continuously to properly manage organizational knowledge:
1️⃣ 𝗦𝗼𝗰𝗶𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: In this phase, tacit knowledge is shared through direct interaction, observation, or experiences. It emphasizes the transfer of personal knowledge between individuals and fosters mutual understanding through collaboration (tacit ➡️ tacit).
2️⃣ 𝗘𝘅𝘁𝗲𝗿𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: In this phase, tacit knowledge is articulated into explicit forms, such as an enterprise ontology. It helps to codify and communicate the personal knowledge that might otherwise remain unspoken or difficult to share (tacit ➡️ explicit).
3️⃣ 𝗖𝗼𝗺𝗯𝗶𝗻𝗮𝘁𝗶𝗼𝗻: In this phase, explicit knowledge is gathered from different sources, categorized, and synthesized to form new sets of knowledge. It involves the aggregation and reorganization of existing knowledge to create more structured and accessible forms (explicit ➡️ explicit).
4️⃣ 𝗜𝗻𝘁𝗲𝗿𝗻𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: In this phase, individuals internalize explicit knowledge, turning it back into tacit knowledge through practice, experience, and learning. It emphasizes the transformation of formalized knowledge into personal, actionable knowledge (explicit ➡️ tacit).
🎯 In a world where the only constant is change, it is no longer enough for an organization to know something; what matters most is how fast it learns by creating and redistributing new knowledge internally.
🧑🎓 To quote Nadella, organizations and the people within them should not be 𝘒𝘯𝘰𝘸-𝘐𝘵-𝘈𝘭𝘭𝘴 but rather 𝘓𝘦𝘢𝘳𝘯-𝘐𝘵-𝘈𝘭𝘭𝘴.
#TheDataJoy #KnowledgeMesh #KnowledgeManagement #Ontologies