Found 124 bookmarks
Newest
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
✨ #NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning Combining contrastive learning and message passing markedly improves features created from embedding graphs, scalable to huge graphs. It taught us a lot on graph feature learning 👇 Graphs can represent knowledge and have scaled to huge sizes (115M entities in Wikidata). How to distill these into good downstream features, eg for machine learning? The challenge is to create feature vectors, and for this graph embeddings have been invaluable. Our paper shows that message passing is a great tool to build feature vectors from graphs As opposed to contrastive learning, message passing helps embeddings represent the large-scale structure of the graph (it gives Arnoldi-type iterations). Our approach uses contrastive learning on a core subset of entities, to capture a large-scale structure. Consistent with knowledge-graph embedding literature, this step represents relations as operators on the embedding space. It also anchors the central entities. Knowledge graphs have long-tailed entity distributions, with many weakly-connected entities on which contrastive learning is under constrained. For these, we propagate embeddings via the relation operators, in a diffusion-like step, extrapolating from the central entities. To have a very efficient algorithm, we split the graph in overlapping highly-connected blocks that fit in GPU memory. Propagation is then simple in-memory iterations, and we embed huge graphs on a single GPU. Splitting huge knowledge graphs in sub-parts is actually hard because of the mix of very highly-connected nodes, and a huge long tail hard to reach. We introduce a procedure that allows for overlap in the blocks, relaxing a lot the difficulty. ‪Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs. It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable. Larger knowledge graphs give feature vectors that provide downstream value. We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility. We believe this is because link prediction only needs local structure, unlike downstream tasks The papier is well reproducible, and we hope it will unleash more progress in knowledge graph embedding. We'll present at #NeurIPS and #Eurips
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
·linkedin.com·
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
Graph-constrained Reasoning (GCR) is a novel framework that bridges structured knowledge in knowledge graphs (KG) with unstructured reasoning in LLMs
Graph-constrained Reasoning (GCR) is a novel framework that bridges structured knowledge in knowledge graphs (KG) with unstructured reasoning in LLMs
Graph-constrained Reasoning (GCR) is a novel framework that bridges structured knowledge in knowledge graphs (KG) with unstructured reasoning in LLMs.
Graph-constrained Reasoning (GCR) is a novel framework that bridges structured knowledge in knowledge graphs (KG) with unstructured reasoning in LLMs
·linkedin.com·
Graph-constrained Reasoning (GCR) is a novel framework that bridges structured knowledge in knowledge graphs (KG) with unstructured reasoning in LLMs
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
Alhamdulillah, ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds. Just as matter is formed from atoms, and galaxies are formed from stars, knowledge is likely to be formed from atomic knowledge graphs. Atomic knowledge graphs were born from our intention to solve a common problem in LLM-based KG construction methods: exhaustivity and stability. Often, these methods produce unstable KGs that change when rerunning the construction process, even without changing anything. Moreover, they fail to capture all facts in the input documents and usually overlook the temporal and dynamic aspects of real-world data. What is the solution? Atomic facts that are temporally aware. Instead of constructing knowledge graphs from raw documents, we split them into atomic facts, which are self-contained and concise propositions. Temporal atomic KGs are constructed from each atomic fact. Then, we defined how the temporal atomic KGs would be merged at the atomic level and how the temporal aspects would be handled. We designed a binary merge algorithm that combines two TKGs and a parallel merge process that merges all TKGs simultaneously. The entire architecture operates in parallel. ATOM employs dual-time modeling that distinguishes observation time from validity time and has 3 main modules: - Module 1 (Atomic Fact Decomposition) splits input documents observed at time t into atomic facts using LLM-based prompting, where each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information. - Module 2 (Atomic TKGs Construction) extracts 5-tuples in parallel from each atomic fact to construct atomic temporal KGs, while embedding nodes and relations and addressing temporal resolution during extraction. - Module 3 (Parallel Atomic Merge) employs a binary merge algorithm to merge pairs of atomic TKGs through iterative pairwise merging in parallel until convergence, with three resolution phases: (1) entity resolution, (2) relation name resolution, and (3) temporal resolution that merges observation and validity time sets for relations with similar (e_s, r_p, e_o). The resulting TKG snapshot is then merged with the previous DTKG to yield the updated DTKG. Results: Empirical evaluations demonstrate that ATOM achieves ~18% higher exhaustivity, ~17% better stability, and over 90% latency reduction compared to baseline methods (including iText2KG), demonstrating strong scalability potential for dynamic TKG construction. Check our ATOM's architecture and code: Preprint Paper: https://lnkd.in/dsJzDaQc Code: https://lnkd.in/drZUyisV Website: (coming soon) Example use cases: (coming soon) Special thanks to the dream team: Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau 📚📡 | 14 comments on LinkedIn
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
·linkedin.com·
ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs
✅ Some state-of-the-art methods for knowledge graph (KG) construction that implement incrementality build a graph from around 3k atomic facts in 4–7 hours, while ATOM achieves the same in just 20 minutes using only 8 parallel threads and a batch size of 40 for asynchronous LLM API calls. ❓ What’s the secret behind this performance? 👉 The architecture. The parallel design. ❌ Incrementality in KG construction was key, but it significantly limits scalability. This is because the method must first build the KG and compare it with the previous one before moving on to the next chunk. That’s why we eliminated this in iText2KG. ❓ Why is scalability so important? The short answer: real-time analytics. Fast dynamic TKG construction enables LLMs to reason over them and generate responses instantly, in real time. Discover more secrets behind this parallel architecture by reading the full paper (link in the first comment).
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs
·linkedin.com·
ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs
Text2KGBench-LettrIA: A Refined Benchmark for Text2Graph Systems
Text2KGBench-LettrIA: A Refined Benchmark for Text2Graph Systems
🚀 LLMs can be powerful tools to extract information from texts and automatically populate Knowledge Graphs guided by ontologies given as inputs. BUT how good are they? To reply to this question, we need benchmarks! 💡 With Lettria, we build the Text2KGBench-LettrIA benchmark covering 19 different ontologies in various domains (company, film, food, politician, sports, monument, etc.) and consisting of near 5k sentences strictly annotated with triples conforming to these ontologies (208 classes, 426 properties) yielding more than 17k triples. What's more? We throw a lot of compute to compare the performance and efficiency of numerous Closed LLMs models and variants (GPT4, Claude 3, Gemini) and numerous fine-tuned Open Weights models (Mistral 3, Qwen 3, Gemma 3, Phi 4). ✨Key take-away: when being provided with high quality data, fine-tuned open models largely outperform larger, proprietary counterparts! 📄 Curious about the detailed results? Read our paper at https://lnkd.in/e-EZCjWm See our presentation at https://lnkd.in/eEdCCpdA that I have just presented at the Knowledge Base Construction from Pre-Trained Language Models Workshop colocated with the ISWC - International Semantic Web Conference. You want to use these results in your operations? Sign-up for using the newly released PERSEUS model, https://lnkd.in/e7exyJHc Joint work with Julien PLU, Oscar Moreno Escobar, Edouard Trouillez, Axelle Gapin, Pasquale Lisena, Thibault Ehrhart #iswc2025 #LLMs #KnowledgeGraphs #NLP #Research EURECOM, Charles Borderie
·linkedin.com·
Text2KGBench-LettrIA: A Refined Benchmark for Text2Graph Systems
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis ... What happens when you ask an AI about supply chain vulnerabilities and it misses the most critical dependencies? Most AI systems treat business relationships like isolated facts in a database. They might know Apple uses lithium batteries, but they miss the web of connections that create real risk. 👉 The Core Problem Standard AI retrieval treats every piece of information as a standalone point. But supply chain risk lives in the relationships between companies, products, and locations. When conflict minerals from the DRC affect smartphone production, it's not just about one supplier - it's about cascading effects through interconnected networks. Vector similarity search finds related documents but ignores the structural dependencies that matter most for risk assessment. 👉 A Different Approach New research from UC Berkeley and MSCI demonstrates how to solve this by treating supply chains as both networks and knowledge graphs simultaneously. The key insight: economic relationships like "Company A produces Product B" are both structural network links and semantic knowledge graph triples. This duality lets you use network science to find the most economically important paths. 👉 How It Works Instead of searching for similar text, the system: - Maps supply chains as networks with companies, products, and locations as nodes - Uses centrality measures to identify structurally important paths - Wraps quantitative data in descriptive language so AI can reason about what numbers actually mean - Retrieves specific relationship paths rather than generic similar content When asked about cobalt risks, it doesn't just find articles about cobalt. It traces the actual path from DRC mines through battery manufacturers to final products, revealing hidden dependencies. The system generates risk narratives that connect operational disruptions to financial impacts without requiring specialized training or expensive graph databases. This approach shows how understanding the structure of business relationships - not just their content - can make AI genuinely useful for complex domain problems.
Exploring Network-Knowledge Graph Duality: A Case Study inAgentic Supply Chain Risk Analysis
·linkedin.com·
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
KG-R1, Why Knowledge Graph RAG Systems Are Too Expensive to Deploy (And How One Team Fixed It) ... What if I told you that most knowledge graph systems require multiple AI models just to answer a single question? That's exactly the problem plaguing current KG-RAG deployments. 👉 The Cost Problem Traditional knowledge graph retrieval systems use a pipeline approach: one model for planning, another for reasoning, a third for reviewing, and a fourth for responding. Each step burns through tokens and compute resources, making deployment prohibitively expensive for most organizations. Even worse? These systems are built for specific knowledge graphs. Change your data source, and you need to retrain everything. 👉 A Single-Agent Solution Researchers from MIT and IBM just published KG-R1, which replaces this entire multi-model pipeline with one lightweight agent that learns through reinforcement learning. Here's the clever part: instead of hardcoding domain-specific logic, the system uses four simple, universal operations: - Get relations from an entity - Get entities from a relation - Navigate forward or backward through connections These operations work on any knowledge graph without modification. 👉 The Results Are Striking Using just a 3B parameter model, KG-R1: - Matches accuracy of much larger foundation models - Uses 60% fewer tokens per query than existing methods - Transfers across different knowledge graphs without retraining - Processes queries in under 7 seconds on a single GPU The system learned to retrieve information strategically through multi-turn interactions, optimized end-to-end rather than stage-by-stage. This matters because knowledge graphs contain some of our most valuable structured data - from scientific databases to legal documents. Making them accessible and affordable could unlock entirely new applications.
https://arxiv.org/abs/2509.26383v1 Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
·linkedin.com·
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation ... Why Current AI Search Falls Short When You Need Real Answers What happens when you ask an AI system a complex question that requires connecting multiple pieces of information? Most current approaches retrieve some relevant documents, generate an answer, and call it done. But this single-pass strategy often misses critical evidence. 👉 The Problem with Shallow Retrieval Traditional retrieval-augmented generation (RAG) systems work like a student who only skims the first few search results before writing an essay. They grab what seems relevant on the surface but miss deeper connections that would lead to better answers. When researchers tested these systems on complex multi-hop questions, they found a consistent pattern: the AI would confidently provide answers based on incomplete evidence, leading to logical gaps and missing key facts. 👉 A New Approach: Deep Searching with Dual Channels Researchers from IDEA Research and Hong Kong University of Science and Technology developed GraphSearch, which works more like a thorough investigator than a quick searcher. The system breaks down complex questions into smaller, manageable pieces, then searches through both text documents and structured knowledge graphs. Think of it as having two different research assistants: one excellent at finding descriptive information in documents, another skilled at tracing relationships between entities. 👉 How It Actually Works Instead of one search-and-answer cycle, GraphSearch uses six coordinated modules: Query decomposition splits complex questions into atomic sub-questions Context refinement filters out noise from retrieved information Query grounding fills in missing details from previous searches Logic drafting organizes evidence into coherent reasoning chains Evidence verification checks if the reasoning holds up Query expansion generates new searches to fill identified gaps The system continues this process until it has sufficient evidence to provide a well-grounded answer. 👉 Real Performance Gains Testing across six different question-answering benchmarks showed consistent improvements. On the MuSiQue dataset, for example, answer accuracy jumped from 35% to 51% when GraphSearch was integrated with existing graph-based systems. The approach works particularly well under constrained conditions - when you have limited computational resources for retrieval, the iterative searching strategy maintains performance better than single-pass methods. This research points toward more reliable AI systems that can handle the kind of complex reasoning we actually need in practice. Paper: "GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation" by Yang et al.
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
·linkedin.com·
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
GoAI: Enhancing AI Students' Learning Paths and Idea Generation via Graph of AI Ideas
GoAI: Enhancing AI Students' Learning Paths and Idea Generation via Graph of AI Ideas
💡 Graph of Ideas -- LLMs paired with knowledge graphs can be great partners for ideation, exploration, and research. We've all seen the classic detective corkboard, with pinned notes and pictures, all strung together with red twine. 🕵️  The digital version could be a mind-map, but you still have to draw everything by hand. What if you could just build one from a giant pile of documents? Enter GoAI - a fascinating approach that just dropped on arXiv combining knowledge graphs with LLMs for AI research idea generation. While the paper focuses on a graph of research papers, the approach is generalizable. Here's what caught my attention: 🔗 It builds knowledge graphs from AI papers where nodes are papers/concepts and edges capture semantic citation relationships - basically mapping how ideas actually connect and build on each other 🎯 The "Idea Studio" feature gives you feedback on innovation, clarity, and feasibility of your research ideas - like having a research mentor in your pocket 📈 Experiments show it helps produce clearer, more novel, and more impactful research ideas compared to traditional LLM approaches The key insight? Current LLMs miss the semantic structure and prerequisite relationships in academic knowledge. This framework bridges that gap by making the connections explicit. As AI research accelerates, this approach can be be used for any situation where you're looking for what's missing, rather than answering a question about what exists. Read all the details in the paper... https://lnkd.in/ekGtCx9T
Graph of Ideas -- LLMs paired with knowledge graphs can be great partners for ideation, exploration, and research.
·linkedin.com·
GoAI: Enhancing AI Students' Learning Paths and Idea Generation via Graph of AI Ideas
FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
Sharing our recent research 𝐅𝐢𝐧𝐑𝐞𝐟𝐥𝐞𝐜𝐭𝐊𝐆: 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐂𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐅𝐢𝐧𝐚𝐧𝐜𝐢𝐚𝐥 𝐊𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞 𝐆𝐫𝐚𝐩𝐡𝐬. It is the largest financial knowledge graph built from unstructured data. The preprint of our article is out on arXiv now (link is in the comments). It is coauthored with Abhinav Arun | Fabrizio Dimino | Tejas Prakash Agrawal While LLMs make it easier than ever to generate knowledge graphs, the real challenge lies in ensuring quality without hallucinations, with strong coverage, precision, comprehensiveness, and relevance. FinReflectKG tackles this through an iterative, evaluation-driven agentic approach, carefully optimized across multiple evaluation metrics to deliver a trustworthy and high-quality knowledge graph. Designed to power use cases like entity search, question answering, signal generation, predictive modeling, and financial network analysis, FinReflectKG sets a new benchmark for building reliable financial KGs and showcases the potential of agentic workflows in LLM-driven systems. We will be creating a suite of benchmarks using FinReflectKG for KG related tasks in financial services. More details to come soon. | 15 comments on LinkedIn
·linkedin.com·
FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Graph-R1 New RAG framework just dropped! Combines agents, GraphRAG, and RL. Here are my notes: Introduces a novel RAG framework that moves beyond traditional one-shot or chunk-based retrieval by integrating graph-structured knowledge, agentic multi-turn interaction, and RL. Graph-R1 is an agent that reasons over a knowledge hypergraph environment by iteratively issuing queries and retrieving subgraphs using a multi-step “think-retrieve-rethink-generate” loop. Unlike prior GraphRAG systems that perform fixed retrieval, Graph-R1 dynamically explores the graph based on evolving agent state. Retrieval is modeled as a dual-path mechanism: entity-based hyperedge retrieval and direct hyperedge similarity, fused via reciprocal rank aggregation to return semantically rich subgraphs. These are used to ground subsequent reasoning steps. The agent is trained end-to-end using GRPO with a composite reward that incorporates structural format adherence and answer correctness. Rewards are only granted if reasoning follows the proper format, encouraging interpretable and complete reasoning traces. On six RAG benchmarks (e.g., HotpotQA, 2WikiMultiHopQA), Graph-R1 achieves state-of-the-art F1 and generation scores, outperforming prior methods including HyperGraphRAG, R1-Searcher, and Search-R1. It shows particularly strong gains on harder, multi-hop datasets and under OOD conditions. The authors find that Graph-R1’s performance degrades sharply without its three key components: hypergraph construction, multi-turn interaction, and RL. Ablation study supports that graph-based and multi-turn retrieval improves information density and accuracy, while end-to-end RL bridges the gap between structure and language. Paper: https://lnkd.in/eGbf4HhX | 15 comments on LinkedIn
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
·linkedin.com·
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
Instead of just pulling facts, the system samples multi-step paths within the graph, such as a causal chain from a disease to a symptom, and translates these paths into natural language reasoning tasks complete with a step-by-step thinking trace
·kg-bottom-up-superintelligence.github.io·
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Scaling GraphRAG to Millions of Documents: Lessons from the SIGIR 2025 LiveRAG Challenge 👉 WHY THIS MATTERS Retrieval-augmented generation (RAG) struggles with multi-hop questions that require connecting information across documents. While graph-based RAG methods like GEAR improve reasoning by structuring knowledge as entity-relationship triples, scaling these approaches to web-sized datasets (millions/billions of documents) remains a bottleneck. The culprit? Traditional methods rely heavily on LLMs to extract triples—a process too slow and expensive for large corpora. 👉 WHAT THEY DID Researchers from Huawei and the University of Edinburgh reimagined GEAR to sidestep costly offline triple extraction. Their solution: - Pseudo-alignment: Link retrieved passages to existing triples in Wikidata via sparse retrieval. - Iterative expansion: Use a lightweight LLM (Falcon-3B-Instruct) to iteratively rewrite queries and retrieve additional evidence through Wikidata’s graph structure. - Multi-step filtering: Combine Reciprocal Rank Fusion (RRF) and prompt-based filtering to reconcile noisy alignments between Wikidata and document content. This approach achieved 87.6% correctness and 53% faithfulness on the SIGIR 2025 LiveRAG benchmark, despite challenges in aligning Wikidata’s generic triples with domain-specific document content. 👉 KEY INSIGHTS 1. Trade-offs in alignment: Linking Wikidata triples to documents works best for general knowledge but falters with niche topics (e.g., "Pacific geoduck reproduction" mapped incorrectly to oyster biology). 2. Cost efficiency: Avoiding LLM-based triple extraction reduced computational overhead, enabling scalability. 3. The multi-step advantage: Query rewriting and iterative retrieval improved performance on complex questions requiring 2+ reasoning hops. 👉 OPEN QUESTIONS - How can we build asymmetric semantic models to better align text and graph data? - Can hybrid alignment strategies (e.g., blending domain-specific KGs with Wikidata) mitigate topic drift? - Does graph expansion improve linearly with scale, or are diminishing returns inevitable? Why read this paper? It’s a pragmatic case study in balancing scalability with reasoning depth in RAG systems. The code and prompts are fully disclosed, offering a blueprint for adapting GraphRAG to real-world, large-scale applications. Paper: "Millions of G∈AR-s: Extending GraphRAG to Millions of Documents" (Shen et al., SIGIR 2025). Preprint: arXiv:2307.17399.
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
·linkedin.com·
Millions of G∈AR-s: Extending GraphRAG to Millions of Documents
Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine
Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine
In this position paper "Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine" my L3S Research Center and TIB – Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek colleagues around Maria-Esther Vidal have nicely laid out some research challenges on the way to interpretable hybrid AI systems in medicine. However, I think the conceptual framework is broadly applicable way beyond medicine. For example, my former colleagues and PhD students at eccenca are working on operationalizing Neuro-Symbolic AI for Enterprise Knowledge Management with eccenca's Corporate Memory. The paper outlines a compelling architecture for combining sub-symbolic models (e.g., deep learning) with symbolic reasoning systems to enable AI that is interpretable, robust, and aligned with human values. eccenca implements these principles at scale through its neuro-symbolic Enterprise Knowledge Graph platform, Corporate Memory for real-world industrial settings: 1. Symbolic Foundation via Semantic Web Standards - Corporate Memory is grounded in W3C standards (RDF, RDFS, OWL, SHACL, SPARQL), enabling formal knowledge representation, inferencing, and constraint validation. This allows to encode domain ontologies, business rules, and data governance policies in a machine-interpretable and human-verifiable manner. 2. Integration of Sub-symbolic Components - it integrates LLMs and ML models for tasks such as schema matching, natural language interpretation, entity resolution, and ontology population. These are linked to the symbolic layer via mappings and annotations, ensuring traceability and explainability. 3. Neuro-Symbolic Interfaces for Hybrid Reasoning - Hybrid workflows where symbolic constraints (e.g., SHACL shapes) guide LLM-based data enrichment. LLMs suggest schema alignments, which are verified against ontological axioms. Graph embeddings and path-based querying power semantic search and similarity. 4. Human-in-the-loop Interactions - Domain experts interact through low-code interfaces and semantic UIs that allow inspection, validation, and refinement of both the symbolic and neural outputs, promoting human oversight and continuous improvement. Such an approach can power Industrial Applications, e.g. in digital thread integration in manufacturing, compliance automation in pharma and finance and in general, cross-domain interoperability in data mesh architectures. Corporate Memory is a practical instantiation of neuro-symbolic AI that meets industrial-grade requirements for governance, scalability, and explainability – key tenets of Human-Centric AI. Check it out here: https://lnkd.in/evyarUsR #NeuroSymbolicAI #HumanCentricAI #KnowledgeGraphs #EnterpriseArchitecture #ExplainableAI #SemanticWeb #LinkedData #LLM #eccenca #CorporateMemory #OntologyDrivenAI #AI4Industry
Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine
·linkedin.com·
Integrating Knowledge Graphs with Symbolic AI: The Path to Interpretable Hybrid AI Systems in Medicine
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas 👉 Why This Matters Traditional knowledge graphs face a paradox: they require expert-crafted schemas to organize information, creating bottlenecks for scalability and adaptability. This limits their ability to handle dynamic real-world knowledge or cross-domain applications effectively. 👉 What Changed AutoSchemaKG eliminates manual schema design through three innovations: 1. Dynamic schema induction: LLMs automatically create conceptual hierarchies while extracting entities/events 2. Event-aware modeling: Captures temporal relationships and procedural knowledge missed by entity-only approaches 3. Multi-level conceptualization: Organizes instances into semantic categories through abstraction layers The system processed 50M+ documents to build ATLAS - a family of KGs with: - 900M+ nodes (entities/events/concepts) - 5.9B+ relationships - 95% alignment with human-created schemas (zero manual intervention) 👉 How It Works 1. Triple extraction pipeline:   - LLMs identify entity-entity, entity-event, and event-event relationships   - Processes text at document level rather than sentence level for context preservation 2. Schema induction:   - Automatically groups instances into conceptual categories   - Creates hierarchical relationships between specific facts and abstract concepts 3. Scale optimization:   - Handles web-scale corpora through GPU-accelerated batch processing   - Maintains semantic consistency across 3 distinct domains (Wikipedia, academic papers, Common Crawl) 👉 Proven Impact - Boosts multi-hop QA accuracy by 12-18% over state-of-the-art baselines - Improves LLM factuality by up to 9% on specialized domains like medicine and law - Enables complex reasoning through conceptual bridges between disparate facts 👉 Key Insight The research demonstrates that billion-scale KGs with dynamic schemas can effectively complement parametric knowledge in LLMs when they reach critical mass (1B+ facts). This challenges the assumption that retrieval augmentation needs domain-specific tuning to be effective. Question for Discussion As autonomous KG construction becomes viable, how should we rethink the role of human expertise in knowledge representation? Should curation shift from schema design to validation and ethical oversight? | 15 comments on LinkedIn
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas
·linkedin.com·
AutoSchemaKG: Building Billion-Node Knowledge Graphs Without Human Schemas
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through Evidence-based distillation and Graph-based structuring
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through Evidence-based distillation and Graph-based structuring
Small Models, Big Knowledge: How DRAG Bridges the AI Efficiency-Accuracy Gap 👉 Why This Matters Modern AI systems face a critical tension: large language models (LLMs) deliver impressive knowledge recall but demand massive computational resources, while smaller models (SLMs) struggle with factual accuracy and "hallucinations." Traditional retrieval-augmented generation (RAG) systems amplify this problem by requiring constant updates to vast knowledge bases. 👉 The Innovation DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through two key mechanisms: 1. Evidence-based distillation: Filters and ranks factual snippets from teacher LLMs 2. Graph-based structuring: Converts retrieved knowledge into relational graphs to preserve critical connections This dual approach reduces model size requirements by 10-100x while improving factual accuracy by up to 27.7% compared to prior methods like MiniRAG. 👉 How It Works 1. Evidence generation: A large teacher LLM produces multiple context-relevant facts 2. Semantic filtering: Combines cosine similarity and LLM scoring to retain top evidence 3. Knowledge graph creation: Extracts entity relationships to form structured context 4. Distilled inference: SLMs generate answers using both filtered text and graph data The process mimics how humans combine raw information with conceptual understanding, enabling smaller models to "think" like their larger counterparts without the computational overhead. 👉 Privacy Bonus DRAG adds a privacy layer by: - Local query sanitization before cloud processing - Returning only de-identified knowledge graphs Tests show 95.7% reduction in potential personal data leakage while maintaining answer quality. 👉 Why It’s Significant This work addresses three critical challenges simultaneously: - Makes advanced RAG capabilities accessible on edge devices - Reduces hallucination rates through structured knowledge grounding - Preserves user privacy in cloud-based AI interactions The GitHub repository provides full implementation details, enabling immediate application in domains like healthcare diagnostics, legal analysis, and educational tools where accuracy and efficiency are non-negotiable.
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through two key mechanisms:1. Evidence-based distillation: Filters and ranks factual snippets from teacher LLMs2. Graph-based structuring: Converts retrieved knowledge into relational graphs to preserve critical connections
·linkedin.com·
DRAG introduces a novel distillation framework that transfers RAG capabilities from LLMs to SLMs through Evidence-based distillation and Graph-based structuring