Affordable AI Assistants with Knowledge Graph of Thoughts
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face...
The Dataverse Project: 750K FAIR Datasets and a Living Knowledge Graph
"I'm Ukrainian and I'm wearing a suit, so no complaints about me from the Oval Office" - that's the start of my lecture about building Artificial Intelligence with Croissant ML in the Dataverse data platform, for the Bio x AI Hackathon kick-off event in Berlin. https://lnkd.in/ePYHCfJt
* 750,000+ FAIR datasets across the world, driving innovation across the entire data landscape.
* A knowledge graph with 50M+ triples.
* AI-ready metadata exports.
* Qdrant as vector storage; Google, Meta, and Mistral AI as LLM providers.
* QLever (via Adrian Gschwend) as the fastest triple store for Dataverse knowledge graphs.
Multilingual, machine-readable, queryable scientific data at scale.
If you're interested, you can also apply for the 2-month #BioAgentHack online hackathon:
• $125K+ in prizes
• Mentorship from Biotech and AI leaders
• Build alongside top open-science researchers & devs
More info: https://lnkd.in/eGhvaKdH
Choosing the Right Format: How Knowledge Graph Layouts Impact AI Reasoning
👉 Why This Matters
Most AI systems blend knowledge graphs (structured data) with large language models (flexible reasoning). But there’s a hidden variable: how you translate the graph into text for the AI. Researchers discovered that the formatting choice alone can swing performance by up to 17.5% on reasoning tasks. Imagine solving 1 in 5 more problems correctly just by adjusting how you present data.
👉 What They Built
KG-LLM-Bench is a new benchmark to test how language models reason with knowledge graphs.
It includes five tasks:
- Triple verification (“Does this fact exist?”)
- Shortest path finding (“How are two concepts connected?”)
- Aggregation (“How many entities meet X condition?”)
- Multi-hop reasoning (“Which entities linked to A also have property B?”)
- Global analysis (“Which node is most central?”)
The team tested seven models (Claude, GPT-4o, Gemini, Llama, Nova) with five ways to “textualize” graphs, from simple edge lists to structured JSON and semantic web formats like RDF Turtle.
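To make the format comparison concrete, here is a minimal sketch of three of those textualizations applied to the same toy triples. The serializations below are illustrative assumptions, not the benchmark's exact serializers:

```python
import json
from collections import defaultdict

# Toy triples; KG-LLM-Bench uses real KG subsets, these are illustrative.
triples = [
    ("France", "borders", "Germany"),
    ("France", "capital", "Paris"),
    ("Germany", "capital", "Berlin"),
]

# 1. Edge list: one "subject predicate object" line per triple (most compact).
edge_list = "\n".join(f"{s} {p} {o}" for s, p, o in triples)

# 2. Structured JSON: facts grouped by subject entity (useful for aggregation).
grouped = defaultdict(dict)
for s, p, o in triples:
    grouped[s].setdefault(p, []).append(o)
structured_json = json.dumps(grouped, indent=2)

# 3. RDF Turtle: semantic-web serialization under an assumed ex: namespace.
turtle = "@prefix ex: <http://example.org/> .\n" + "\n".join(
    f"ex:{s} ex:{p} ex:{o} ." for s, p, o in triples
)

for name, text in (("edge list", edge_list), ("JSON", structured_json), ("Turtle", turtle)):
    print(f"--- {name} ---\n{text}\n")
```

Same facts, very different token footprints and groupings, which is exactly the variable the benchmark isolates.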
👉 Key Insights
1. Format matters more than assumed:
- Structured JSON and edge lists performed best overall, but results varied by task.
- For example, JSON excels at aggregation tasks (data is grouped by entity), while edge lists help identify central nodes (repeated mentions highlight connections).
2. Models don’t cheat:
Replacing real entity names with fake ones (e.g., “France” → “Verdania”) caused only a 0.2% performance drop, proving models rely on context, not memorized knowledge.
3. Token efficiency:
- Edge lists used ~2,600 tokens vs. JSON-LD’s ~13,500. Shorter formats free up context space for complex reasoning.
- But concise ≠ always better: structured formats improved accuracy for tasks requiring grouped data.
4. Models struggle with directionality:
Counting outgoing edges (e.g., “Which countries does France border?”) is easier than incoming ones (“Which countries border France?”), likely due to formatting biases.
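A toy illustration of that asymmetry: in an edge list, a node's outgoing edges sit on lines that begin with that node, while its incoming edges are scattered among other subjects' lines (example data invented):

```python
edge_list = """France borders Germany
France borders Spain
Belgium borders France
Germany borders Poland"""

lines = edge_list.splitlines()
outgoing = [l for l in lines if l.startswith("France ")]  # contiguous, easy to spot
incoming = [l for l in lines if l.endswith(" France")]    # requires scanning every line
print(len(outgoing), len(incoming))  # -> 2 1
```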
👉 Practical Takeaways
- Optimize for your task: Use JSON for aggregation, edge lists for centrality.
- Test your model: The best format depends on the LLM—Claude thrived with RDF Turtle, while Gemini preferred edge lists.
- Don’t fear pseudonyms: Masking real names minimally impacts performance, useful for sensitive data.
The benchmark is openly available, inviting researchers to add new tasks, graphs, and models. As AI handles larger knowledge bases, choosing the right “data language” becomes as critical as the reasoning logic itself.
Paper: KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
Authors: Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan
A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research
🚀 Thrilled to share our latest work published in Nature Machine Intelligence!
📄 "A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research"
In this study, we constructed iKraph, one of the most comprehensive biomedical knowledge graphs to date, using a human-level information extraction pipeline that won both the LitCoin NLP Challenge and the BioCreative Challenge. iKraph integrates insights from over 34 million PubMed abstracts and 40 public databases, enabling unprecedented scale and precision in automated knowledge discovery (AKD).
💡 What sets our work apart?
We developed a causal knowledge graph and a probabilistic semantic reasoning (PSR) algorithm to infer indirect entity relationships, such as drug-disease relationships. This time-aware framework allowed us to retrospectively and prospectively validate drug repurposing and drug target predictions, something rarely done in prior work.
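As a rough intuition for this kind of path-based inference, here is a minimal sketch that scores an indirect drug-disease link with a noisy-OR over causal paths. This is an illustrative toy under strong independence assumptions, not iKraph's actual PSR algorithm, and the edge confidences are invented:

```python
edges = {
    ("drugX", "geneA"): 0.9,     # drugX regulates geneA (extraction confidence)
    ("geneA", "diseaseY"): 0.8,  # geneA is implicated in diseaseY
    ("drugX", "geneB"): 0.6,
    ("geneB", "diseaseY"): 0.5,
}

def path_prob(path):
    """P(every edge on the path holds), assuming edge independence."""
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= edges[(a, b)]
    return p

def indirect_score(paths):
    """Noisy-OR over paths: P(at least one supporting path holds)."""
    miss = 1.0
    for path in paths:
        miss *= 1.0 - path_prob(path)
    return 1.0 - miss

paths = [("drugX", "geneA", "diseaseY"), ("drugX", "geneB", "diseaseY")]
print(f"indirect drugX -> diseaseY score: {indirect_score(paths):.2f}")  # 0.80
```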
✅ For COVID-19, we predicted hundreds of drug candidates in real-time, one-third of which were later supported by clinical trials or publications.
✅ For cystic fibrosis, we demonstrated our predictions were often validated up to a decade later, suggesting our method could significantly accelerate the drug discovery pipeline.
✅ Across diverse diseases and common drugs, we achieved benchmark-setting recall and positive predictive rates, pushing the boundaries of what's possible in drug repurposing.
We believe this study sets a new frontier in biomedical discovery and demonstrates the power of structured knowledge and interpretability in real-world applications.
📚 Read the full paper: https://lnkd.in/egYgbYT4
📌 Access the platform: https://lnkd.in/ecxwHBK7
📂 Access the data and code: https://lnkd.in/eBp2GEnH
LitCoin NLP Challenge: https://lnkd.in/e-cBc6eR
Kudos to our incredible team and collaborators who made this possible!
#DrugDiscovery #AI #KnowledgeGraph #Bioinformatics #MachineLearning #NatureMachineIntelligence #DrugRepurposing #LLM #BiomedicalAI #NLP #COVID19 #Insilicom #NIH #NCI #NSF #ARPA-H
Digital evolution: Novo Nordisk’s shift to ontology-based data management - Journal of Biomedical Semantics
The amount of biomedical data is growing, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach for organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization’s digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data-driven drug discovery.
MiniRAG Introduces Near-LLM Accurate RAG for Small Language Models with Just 25% of the Storage
🏆🚣MiniRAG Introduces Near-LLM Accurate RAG for Small Language Models with Just 25% of the Storage.
Achieving that by Semantic-Aware Heterogeneous Graph…
Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks
I love Markus J. Buehler's work, and his latest paper "Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks" does not disappoint, revealing…
KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs
Breaking LLM Hallucinations in a Smarter Way!
(It’s not about feeding more data)
Large Language Models (LLMs) still struggle with factual inaccuracies, but…
KET-RAG: Turbocharging AI Agents with 10x Cheaper, Smarter Knowledge Retrieval
This multi-granular graph framework uses PageRank and a keyword-chunk graph to hit the best cost-quality tradeoff
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》The Problem: Knowledge Graphs Are Expensive (and Clunky)
AI agents need context to answer complex questions—like connecting “COVID vaccines” to “myocarditis risks” across research papers. But today’s solutions face two nightmares:
✸ Cost: Building detailed knowledge graphs with LLMs can cost $33,000 for a 5GB legal case.
✸ Quality: Cheap methods (like KNN graphs) miss key relationships, leading to 32% worse answers.
☆ Imagine training an AI doctor that either bankrupts you or misdiagnoses patients. Ouch.
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》The Fix: KET-RAG’s Two-Layer Brain
KET-RAG merges precision (knowledge graphs) and efficiency (keyword-text maps) into one system (a sketch follows this list):
✸ Layer 1: Knowledge Graph Skeleton
☆ Uses PageRank to find core text chunks (like “vaccine side effects” in medical docs).
☆ Builds a sparse graph only on these chunks with LLMs—saving 80% of indexing costs.
✸ Layer 2: Keyword-Chunk Bipartite Graph
☆ Links keywords (e.g., “myocarditis”) to all related text snippets—no LLM needed.
☆ Acts as a “fast lane” for retrieving context without expensive entity extraction.
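A minimal sketch of that two-layer index, assuming networkx; `llm_extract_triples` is a hypothetical stand-in for the expensive LLM extraction step, not the authors' code:

```python
import networkx as nx
from collections import defaultdict

def build_index(chunks, chunk_links, llm_extract_triples, budget=0.2):
    """chunks: {chunk_id: text}; chunk_links: [(chunk_id, chunk_id), ...]."""
    # Layer 1: PageRank the chunk graph and spend LLM extraction only on the
    # top `budget` fraction of chunks (the knowledge-graph skeleton).
    g = nx.Graph(chunk_links)
    rank = nx.pagerank(g)
    k = max(1, int(budget * len(chunks)))
    core = sorted(rank, key=rank.get, reverse=True)[:k]
    skeleton = [t for cid in core for t in llm_extract_triples(chunks[cid])]

    # Layer 2: keyword -> chunk bipartite map, built with zero LLM calls.
    keyword_to_chunks = defaultdict(set)
    for cid, text in chunks.items():
        for token in set(text.lower().split()):
            keyword_to_chunks[token].add(cid)

    return skeleton, keyword_to_chunks
```

At query time the keyword map retrieves broad context cheaply while the skeleton answers precise multi-hop questions, which is where the claimed cost-quality tradeoff comes from.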
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》Results: Beating Microsoft’s Graph-RAG with Pennies
On HotpotQA and MuSiQue benchmarks, KET-RAG:
✸ Retrieves 81.6% of critical info vs. Microsoft’s 74.6%—with 10x lower cost.
✸ Boosts answer accuracy (F1 score) by 32.4% while cutting indexing bills by 20%.
✸ Scales to terabytes of data without melting budgets.
☆ Think of it as a Tesla Model 3 outperforming a Lamborghini at 1/10th the price.
﹌﹌﹌﹌﹌﹌﹌﹌﹌
》Why AI Agents Need This
AI agents aren’t just chatbots—they’re problem solvers for medicine, law, and customer service. KET-RAG gives them:
✸ Real-time, multi-hop reasoning: Connecting “drug A → gene B → side effect C” in milliseconds.
✸ Cost-effective scalability: Deploying agents across millions of documents without going broke.
✸ Adaptability: Mixing precise knowledge graphs (for critical data) with keyword maps (for speed).
Paper in comments
≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣
》Build Your Own Supercharged AI Agent?
🔮 Join My 𝐇𝐚𝐧𝐝𝐬-𝐎𝐧 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 TODAY!
and learn to build AI agents with LangGraph/LangChain, CrewAI, and OpenAI Swarm + RAG pipelines
𝐄𝐧𝐫𝐨𝐥𝐥 𝐍𝐎𝐖 [34% discount]:
👉 https://lnkd.in/eGuWr4CH
SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs
LLMs that automatically fill knowledge gaps - too good to be true?
Large Language Models (LLMs) often stumble in logical tasks due to hallucinations, especially when relying on incomplete Knowledge Graphs (KGs).
Current methods naively trust KGs as exhaustive truth sources - a flawed assumption in real-world domains like healthcare or finance where gaps persist.
SymAgent is a new framework that approaches this problem by making KGs active collaborators, not passive databases.
Its dual-module design combines symbolic logic with neural flexibility (a sketch follows the two modules):
1. Agent-Planner extracts implicit rules from KGs (e.g., "If drug X interacts with Y, avoid co-prescription") to decompose complex questions into structured steps.
2. Agent-Executor dynamically pulls external data when KG triples are missing, bypassing the "static repository" limitation.
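In spirit, the closed loop might look like this sketch; every function name here is a hypothetical placeholder rather than the paper's API:

```python
def answer(question, kg, web_search, llm):
    # Agent-Planner: decompose the question into knowledge-graph lookup steps,
    # guided by rules mined from the KG itself.
    steps = llm(f"Decompose into KG lookup steps: {question}")
    evidence = []
    for step in steps:
        triples = kg.match(step)        # try the knowledge graph first
        if not triples:                 # Agent-Executor: the KG has a gap,
            triples = web_search(step)  # so pull external evidence instead
            kg.flag_missing(step)       # and record the gap for self-learning
        evidence.extend(triples)
    return llm(f"Answer '{question}' using: {evidence}")
```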
Perhaps most impressively, SymAgent’s self-learning observes failed reasoning paths to iteratively refine its strategy and flag missing KG connections - achieving 20-30% accuracy gains over raw LLMs.
Equipped with SymAgent, even 7B models rival their much larger counterparts by leveraging this closed-loop system.
It would be great if LLMs were able to autonomously curate knowledge and adapt to domain shifts without costly retraining.
But are we there yet? Are hybrid architectures like SymAgent the future?
↓
Liked this post? Join my newsletter with 50k+ readers that breaks down all you need to know about the latest LLM research: llmwatch.com 💡
KAG: Boosting LLMs in Professional Domains via Knowledge Augmented...
The recently developed retrieval-augmented generation (RAG) technology has enabled the efficient construction of domain-specific applications. However, it also has limitations, including the gap...
Ontologies as Conceptualizations by Nicola Guarino
Nicola Guarino Keynote Address for the Ontology Summit 2025 on 22 January 2025 "Ontologies as specifications of conceptualizations: correctness, precision, a...
Terminology Augmented Generation (TAG)? Recently some fellow terminologists have proposed the new term "Terminology-Augmented Generation (TAG)" to refer to…
What really is Graph RAG? Inspired by the "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" paper from Microsoft! How do you combine…
Knowledge Graphs as a source of trust for LLM-powered enterprise question answering
Knowledge Graphs as a source of trust for LLM-powered enterprise question answering. That has been our position from the beginning when we started our research…
Graph contrastive learning (GCL) is a self-supervised learning technique for graphs that focuses on learning representations by contrasting different views of…
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large...
This paper presents OG-RAG, an Ontology-Grounded Retrieval Augmented Generation method designed to enhance LLM-generated responses by anchoring retrieval processes in domain-specific ontologies....
Large Language Models, Knowledge Graphs and Search Engines: A...
Much has been discussed about how Large Language Models, Knowledge Graphs and Search Engines can be combined in a synergistic manner. A dimension largely absent from current academic discourse is...
Background: The field of Artificial Intelligence has undergone cyclical periods of growth and decline, known as AI summers and winters. Currently, we are in the third AI summer, characterized by...
PG-Schema: Schemas for Property Graphs | Proceedings of the ACM on Management of Data
Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, ...
What if creating Linked Open Data was less like coding and more like writing? Could anyone extend the Semantic Web by sharing a document? Publish a knowledge…
SimGRAG is a novel method for knowledge-graph-driven RAG: it transforms queries into graph patterns and aligns them with candidate subgraphs using a graph semantic distance metric.
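As a rough sketch of what such a metric could look like: a greedy, embedding-based alignment that sums, over query-pattern edges, the distance to the nearest subgraph edge. This is an illustrative guess, not the paper's exact definition, and `embed` is a hypothetical text-embedding function:

```python
import numpy as np

def graph_semantic_distance(pattern_edges, subgraph_edges, embed):
    """Greedy alignment: for each query-pattern edge, take the distance to the
    closest candidate-subgraph edge in embedding space, and sum."""
    def edge_vec(edge):               # edge = (subject, predicate, object)
        return embed(" ".join(edge))  # embed the verbalized triple
    total = 0.0
    for pe in pattern_edges:
        pv = edge_vec(pe)
        total += min(np.linalg.norm(pv - edge_vec(se)) for se in subgraph_edges)
    return total  # smaller distance = better alignment; rank candidates by this
```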