ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.
Alhamdulillah, ATOM is finally here! A scalable and fast approach that can build and continuously update temporal knowledge graphs, inspired by atomic bonds.

Just as matter is formed from atoms, and galaxies are formed from stars, knowledge is likely to be formed from atomic knowledge graphs. Atomic knowledge graphs were born from our intention to solve two common problems of LLM-based KG construction methods: lack of exhaustivity and instability. These methods often produce unstable KGs that change when the construction process is rerun, even when nothing has changed. Moreover, they fail to capture all the facts in the input documents and usually overlook the temporal, dynamic aspects of real-world data.

What is the solution? Temporally aware atomic facts. Instead of constructing knowledge graphs from raw documents, we split them into atomic facts, which are self-contained and concise propositions, and construct a temporal atomic KG from each one. We then defined how temporal atomic KGs are merged at the atomic level and how the temporal aspects are handled: a binary merge algorithm that combines two TKGs, and a parallel merge process that merges all TKGs simultaneously. The entire architecture operates in parallel.

ATOM employs dual-time modeling that distinguishes observation time from validity time, and has 3 main modules:
- Module 1 (Atomic Fact Decomposition) splits input documents observed at time t into atomic facts using LLM-based prompting, where each temporal atomic fact is a short, self-contained snippet that conveys exactly one piece of information.
- Module 2 (Atomic TKGs Construction) extracts 5-tuples in parallel from each atomic fact to construct atomic temporal KGs, embedding nodes and relations and addressing temporal resolution during extraction.
- Module 3 (Parallel Atomic Merge) employs a binary merge algorithm that merges pairs of atomic TKGs through iterative pairwise merging, in parallel, until convergence, with three resolution phases: (1) entity resolution, (2) relation-name resolution, and (3) temporal resolution, which merges the observation and validity time sets of relations sharing a similar (e_s, r_p, e_o).

The resulting TKG snapshot is then merged with the previous DTKG to yield the updated DTKG.

Results: empirical evaluations demonstrate that ATOM achieves ~18% higher exhaustivity, ~17% better stability, and over 90% latency reduction compared to baseline methods (including iText2KG), demonstrating strong scalability potential for dynamic TKG construction.

Check out ATOM's architecture and code:
Preprint paper: https://lnkd.in/dsJzDaQc
Code: https://lnkd.in/drZUyisV
Website: (coming soon)
Example use cases: (coming soon)

Special thanks to the dream team: Ludovic Moncla, Khalid Benabdeslem, Rémy Cazabet, Pierre Cléau 📚📡
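To make the merge idea concrete, here is a minimal Python sketch of pairwise merging, under heavy assumptions: atomic TKGs are reduced to dicts keyed by (e_s, r_p, e_o) with observation/validity time sets, and entity/relation resolution is exact string match rather than the embedding-based resolution the paper describes. Illustrative only, not ATOM's implementation:

```python
# A toy sketch of ATOM-style merging (not the authors' code). Each atomic TKG
# maps (subject, relation, object) keys to (observation_times, validity_times).
from concurrent.futures import ThreadPoolExecutor

def binary_merge(g1, g2):
    """Merge two atomic TKGs; for matching (e_s, r_p, e_o) keys,
    union their observation and validity time sets."""
    merged = {k: (set(obs), set(val)) for k, (obs, val) in g1.items()}
    for key, (obs, val) in g2.items():
        if key in merged:
            merged[key][0].update(obs)
            merged[key][1].update(val)
        else:
            merged[key] = (set(obs), set(val))
    return merged

def parallel_merge(graphs):
    """Iteratively merge pairs of TKGs in parallel until one remains."""
    while len(graphs) > 1:
        pairs = [(graphs[i], graphs[i + 1]) for i in range(0, len(graphs) - 1, 2)]
        leftover = [graphs[-1]] if len(graphs) % 2 else []
        with ThreadPoolExecutor() as pool:
            graphs = list(pool.map(lambda p: binary_merge(*p), pairs)) + leftover
    return graphs[0]

# Two atomic TKGs for the same fact, observed at t=1 and t=2
g_a = {("ATOM", "introduced_by", "authors"): ({1}, {(2025, None)})}
g_b = {("ATOM", "introduced_by", "authors"): ({2}, {(2025, None)})}
print(parallel_merge([g_a, g_b]))  # observation times merge to {1, 2}
```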
·linkedin.com·
Is OpenAI quietly moving toward knowledge graphs?
Is OpenAI quietly moving toward knowledge graphs?

Yesterday's OpenAI DevDay was all about new no-code tools for creating agents. Impressive. But what caught my attention wasn't what they announced… it's what they didn't talk about.

During the summer, OpenAI released a Cookbook update introducing the concept of Temporal Agents (see below) and connecting it to Subject–Predicate–Object triples: the very foundation of a knowledge graph.

If you've ever worked with graphs, you know this means something big: they're not just building agents anymore; they're building memory, relationships, and meaning. When you see "London – isCapitalOf – United Kingdom" in their official docs, you realize they're experimenting with how to represent knowledge itself. And with any good knowledge graph… comes an ontology.

So here's my prediction: ChatGPT-6 will come with a built-in graph that connects everything about you.

The question is: do you want their AI to know everything about you? Or do you want to build your own sovereign AI, one that you own, built from open-source intelligence and collective knowledge?

Would love to know what you think. Is that me hallucinating, or is that a weak signal? 👇
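The Cookbook example boils down to representing knowledge as triples. A minimal, generic sketch of that idea (nothing OpenAI-specific; the in-memory store and `query` helper are illustrative assumptions):

```python
# Subject-Predicate-Object triples, the primitive the post refers to,
# stored in a tiny in-memory set and queried by pattern matching.
triples = {
    ("London", "isCapitalOf", "United Kingdom"),
    ("United Kingdom", "memberOf", "G7"),
}

def query(s=None, p=None, o=None):
    """Match triples against a (subject, predicate, object) pattern;
    None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(query(p="isCapitalOf"))  # [('London', 'isCapitalOf', 'United Kingdom')]
```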
·linkedin.com·
Some companies like Rippletide are getting agents to production using graphs as the orchestration layer and pushing LLMs to the edges
Most companies are building autonomous agents with LLMs at the center, making every decision. Here's the problem: each LLM call has a ~5% error rate, and chaining calls compounds it. ⁉️

Here's the math: a single LLM call = 95% accuracy. Chain 10 LLM calls for an agentic workflow = 0.95^10 ≈ 60% reliability. This compounds exponentially with complexity. Enterprise can't ship that.

Some companies getting agents to production, like Rippletide (Yann BILIEN), are doing something different: they're using graphs as the orchestration layer and pushing LLMs to the edges. The architectural solution is about removing LLMs from the orchestration loop entirely and using hypergraph-based reasoning substrates instead.

Why hypergraphs specifically? Regular graphs connect two nodes per edge. Hyperedges connect multiple nodes simultaneously - critical for representing complex state transitions. A single sales conversation turn involves speaker, utterance, topic, customer state, sentiment, outcome, and timestamp. A hyperedge captures all these relationships atomically in the reasoning structure.

The neurosymbolic integration is what makes this production-grade:
- Symbolic layer = business rules, ontologies, deterministic patterns. These are hard constraints that prevent policy violations (discount limits, required info collection, compliance rules).
- Neural layer = RL components that learn edge weights, validate patterns, and update confidences. It operates within the symbolic constraints.

Together they enable the "crystallization mechanism": patterns start probabilistic, validate through repeated success, then lock into deterministic rules at 95%+ confidence. The system becomes non-regressive: it learns and improves, but validated patterns never degrade.

Here's what this solves that LLM orchestration can't:
- Hallucinations with confidence - eliminated, because reasoning follows deterministic graph traversal through verified data, not generative token prediction.
- Goal drift - impossible, because goal hierarchies are encoded in graph topology and enforced mathematically by traversal algorithms.
- Data leakage across contexts - prevented through graph partitioning and structural access controls, not prompt instructions.
- Ignoring instructions - doesn't happen, because business rules are executable constraints, not natural-language hopes.

The LLM's role reduces to exactly two functions: (1) helping structure ontologies during the build phase, and (2) optionally formatting final outputs to natural language. Zero involvement in decision-making or orchestration.

Rippletide's architecture demonstrates this at scale:
- Hypergraph stores unified memory + reasoning (no RAG, no retrieval bottleneck)
- Reasoning engines execute graph-traversal algorithms for decisions
- Weighted edges encode relationship strength, recency, confidence, importance
- Temporal/spatial/causal relationships are explicit in the structure (what LLMs fundamentally lack)
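To ground the two quantitative claims, a small illustrative Python sketch: the compounding-reliability arithmetic, plus a toy hyperedge that "crystallizes" past a confidence threshold. The field names and the 0.95 threshold are assumptions for illustration, not Rippletide's actual schema:

```python
# Illustrative only: the post's compounding-error math, plus a toy hyperedge.
from dataclasses import dataclass

print(f"10 chained LLM calls at 95% each: {0.95 ** 10:.0%} end-to-end")  # ~60%

@dataclass
class Hyperedge:
    """One conversation turn captured atomically: a single edge connecting
    many nodes at once (vs. two nodes per edge in a regular graph)."""
    nodes: dict            # speaker, utterance, topic, sentiment, ...
    weight: float = 0.5    # learned by the neural (RL) layer
    deterministic: bool = False

    def crystallize(self, threshold: float = 0.95):
        # Patterns start probabilistic and lock into deterministic rules
        # once their validated confidence crosses the threshold.
        if self.weight >= threshold:
            self.deterministic = True

turn = Hyperedge(nodes={
    "speaker": "customer", "utterance": "Can I get 30% off?",
    "topic": "discount", "customer_state": "negotiating",
    "sentiment": "neutral", "outcome": "pending",
    "timestamp": "2025-10-06T10:12",
})
turn.weight = 0.97
turn.crystallize()
print(turn.deterministic)  # True: a validated pattern that never degrades
```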
·linkedin.com·
The document-to-knowledge-graph pipeline is fundamentally broken
The market is obsessed with the sexy stuff: autonomous agents, reasoning engines, sophisticated orchestration. Meanwhile, the unsexy foundation layer is completely broken. ⭕ And that foundation layer? It's the only thing that determines whether your agent actually works.

Here's the technical problem killing agentic-AI reliability, and the one a great company like Lettria solves: the document-to-knowledge-graph pipeline is fundamentally broken.

Layer 1: Document Parsing Hell
You can't feed a 400-page PDF with mixed layouts into a vision-language model and expect consistent structure. Here's why:
- Reading-order detection fails on multi-column layouts, nested tables, and floating elements
- Vision LLMs hallucinate cell boundaries on complex tables (financial statements, technical specs)
- You need bbox-level segmentation with preserved coordinate metadata for traceability
- Traditional CV models (Doctr, Detectron2, YOLO) outperform transformers on layout detection and run on CPU
- The optimal approach requires model routing: PDF Plumber for text extraction, specialized table parsers for structured data, VLMs only as fallback
- Without preserving the document_id → page_num → bbox_coords → chunk_id mapping, you lose provenance permanently

Layer 2: Ontology Generation Collapse
RDF/OWL ontology creation isn't prompt engineering. It's semantic modeling (see the sketch after this post):
- You need 5-6 levels of hierarchical abstraction (not flat entity lists)
- Object properties require explicit domain/range specifications (rdfs:domain, rdfs:range)
- Data properties need typed constraints (xsd:string, xsd:integer, xsd:date)
- Relationships must follow semantic-web standards (owl:ObjectProperty, owl:DatatypeProperty)
- An LLM might output syntactically valid Turtle that violates semantic consistency
- The proper approach: 8-9 specialized LLM calls with constraint validation, reasoner checks, and ontologist-in-the-loop verification
- Without this, your knowledge graph has edges connecting semantically incompatible nodes

Layer 3: Text-to-RDF Extraction Failure
Converting natural language to structured triples while maintaining schema compliance is where frontier models crater:
- GPT-4/Claude achieve ~60-70% F1 on entity extraction and ~50-60% on relation extraction (measured on Text2KGBench)
- They hallucinate entities not in your ontology
- They create relations violating domain/range constraints
- Context-window limitations force truncation (32K tokens ≈ 10-15 pages with a full ontology)
- A specialized 600M-parameter model fine-tuned on 14K annotated triples across 19 domain ontologies hits 85%+ F1. Why? Task-specific loss functions, schema-aware training, constrained decoding.

The compounding effect destroys reliability. Your agent's reasoning is irrelevant when it's operating on a knowledge graph where 73% of nodes/edges are wrong, incomplete, or unverifiable. Without bidirectional traceability (SPARQL query → triple → chunk_id → bbox → source PDF), you can't deploy in regulated environments. Period.
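A minimal sketch of the Layer 2 point about explicit domain/range constraints, using rdflib; the EX namespace and the class/property names are invented for illustration:

```python
# Object properties need explicit rdfs:domain/rdfs:range, and data properties
# need typed ranges; otherwise the graph can contain semantically incompatible
# edges. Namespace and names below are hypothetical.
from rdflib import Graph, Namespace, RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/finance#")
g = Graph()
g.bind("ex", EX)

# Classes (a real ontology would carry 5-6 levels of hierarchy)
g.add((EX.Company, RDF.type, OWL.Class))
g.add((EX.FinancialStatement, RDF.type, OWL.Class))

# Object property with explicit domain/range: an issuedBy edge pointing
# at anything other than a Company violates the schema.
g.add((EX.issuedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.issuedBy, RDFS.domain, EX.FinancialStatement))
g.add((EX.issuedBy, RDFS.range, EX.Company))

# Data property with a typed constraint
g.add((EX.fiscalYear, RDF.type, OWL.DatatypeProperty))
g.add((EX.fiscalYear, RDFS.domain, EX.FinancialStatement))
g.add((EX.fiscalYear, RDFS.range, XSD.integer))

print(g.serialize(format="turtle"))
```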
·linkedin.com·
Building Intelligent AI Memory Systems with Cognee: A Python Development Knowledge Graph
Building AI agents that can synthesize scattered knowledge like expert developers 🧠

I have a tutorial about building intelligent AI memory systems with Cognee in my 'Agents Towards Production' repo that solves a critical problem: developers navigate between documentation, community practices, and personal experience, but traditional approaches treat these as isolated resources. This tutorial shows how to build a unified knowledge graph that connects Python's design philosophy, real-world implementations from its creator, and your specific development patterns.

The tutorial covers 3 key capabilities:
- Knowledge Graph Construction: building interconnected networks from Guido van Rossum's actual commits, PEP guidelines, and personal conversations
- Temporal Analysis: understanding how solutions evolved over time, with time-aware queries
- Dynamic Memory Layer: inferring implicit rules and discovering non-obvious connections across knowledge domains

The cross-domain discovery is particularly impressive: it connects your validation issues from January 2024 with Guido van Rossum's actual solutions from mypy and CPython. Rather than keyword matching, it understands semantic relationships between your type-hinting challenges and historical solutions, even when the terminology differs.

Tech stack:
- Cognee for knowledge graph construction
- OpenAI GPT-4o-mini for entity extraction
- Graph algorithms for pattern recognition
- Vector embeddings for semantic search

The system uses semantic graph traversal with deep relationship understanding for contextually aware responses. It includes working Python code, a complete Jupyter notebook with interactive visualizations, and production-ready patterns. Part of the collection of practical guides for building production-ready AI systems.

Direct link to the tutorial: https://lnkd.in/eSsjwbuh

Ever wish you could query all your development knowledge as one unified intelligent system?

♻️ Repost to let your network learn about this too!
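For flavor, a hedged sketch of Cognee's documented add → cognify → search flow; exact import paths, search signatures, and defaults vary by version, so treat this as an outline rather than the tutorial's actual code:

```python
# Sketch of the core Cognee flow the tutorial builds on (assumptions noted):
# ingest heterogeneous knowledge, build the graph, then query semantically.
import asyncio
import cognee

async def main():
    # Ingest docs, commit notes, and personal conversations as plain text
    await cognee.add("PEP 484 introduced optional type hints for Python.")
    await cognee.add("Jan 2024 note: validation failed on Optional fields.")

    # Build the knowledge graph + embeddings over everything added
    await cognee.cognify()

    # Query across domains; semantic traversal rather than keyword matching.
    # (Keyword argument name is an assumption; check your cognee version.)
    results = await cognee.search(query_text="How were Optional type hint issues solved?")
    for r in results:
        print(r)

asyncio.run(main())
```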
·linkedin.com·
Algorithmic vs. Symbolic Reasoning: Is Graph Data Science a critical, transformative layer for GraphRAG?
Is Graph Data Science a critical, transformative layer for GraphRAG? The field of enterprise Artificial Intelligence (AI) is undergoing a significant architectural evolution. The initial enthusiasm for Large Language Models (LLMs) has matured into a pragmatic recognition of their limitations, particularly…
·linkedin.com·
Flexible-GraphRAG
Flexible GraphRAG or RAG is now flexing to the max using LlamaIndex: 7 graph databases, 10 vector databases, 13 data sources, LLMs, Docling doc processing, auto-created KGs, GraphRAG, Hybrid Search, and AI Chat (shown with Hyland products web-page data source). Apache 2.0 Open Source.

Graph: Neo4j, ArcadeDB, FalkorDB, Kuzu, NebulaGraph (powered by Vesoft); coming: Memgraph and Amazon Neptune

Vector: Qdrant, Elastic, OpenSearch Project, Neo4j vector, Milvus (created by Zilliz); coming: Weaviate, Chroma, Pinecone, PostgreSQL + pgvector, LanceDB

Docling document processing

Data Sources (using LlamaIndex readers) - working: Web Pages, Wikipedia, YouTube; untested: Google Drive, Msft OneDrive, S3, Azure Blob, GCS, Box, SharePoint; previous: filesystem, Alfresco, CMIS

LLMs: LlamaIndex LLMs (OpenAI, Ollama, Claude, Gemini, etc.)

React, Vue, Angular UIs, MCP server, FastAPI server

GitHub stevereiner/flexible-graphrag: https://lnkd.in/eUEeF2cN
X.com post on Flexible GraphRAG or RAG max flexing: https://lnkd.in/gHpTupAr
Integrated Semantics Blog: https://lnkd.in/ehpjTV7d
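As a rough idea of the kind of LlamaIndex wiring such a stack involves, a hedged sketch (not the project's actual code) that loads documents, auto-creates a KG in Neo4j via PropertyGraphIndex, and queries it; the connection details, directory, and query are placeholders:

```python
# Documents -> auto-extracted property graph in Neo4j + node embeddings,
# then hybrid (graph + vector) querying. Credentials are placeholders.
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

graph_store = Neo4jPropertyGraphStore(
    username="neo4j", password="password", url="bolt://localhost:7687",
)

documents = SimpleDirectoryReader("./docs").load_data()

# LLM-driven KG extraction into the graph store, with embedded KG nodes
# so retrieval can combine graph traversal and vector similarity
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    embed_kg_nodes=True,
)

query_engine = index.as_query_engine(include_text=True)
print(query_engine.query("Which products integrate with SharePoint?"))
```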
·linkedin.com·
Introducing the GitLab Knowledge Graph
Today, I'd like to introduce the GitLab Knowledge Graph. This release includes a code-indexing engine, written in Rust, that turns your codebase into a live, embeddable graph database for LLM RAG. You can install it with a simple one-line script, parse local repositories directly in your editor, and connect via MCP to query your workspace and over 50,000 files in under 100 milliseconds.

We also saw GKG agents scoring up to 10% higher on the SWE-Bench-lite benchmarks, with just a few tools and a small prompt added to opencode (an open-source coding agent). On average, we observed a 7% accuracy gain across our eval runs, and GKG agents were able to solve new tasks compared to the baseline agents. You can read more from the team's research here: https://lnkd.in/egiXXsaE

This release is just the first step: we aim for this local version to serve as the backbone of a Knowledge Graph service that enables you to query the entire GitLab Software Development Life Cycle, from an issue down to a single line of code.

I am incredibly proud of the work the team has done. Thank you, Michael U., Jean-Gabriel Doyon, Bohdan Parkhomchuk, Dmitry Gruzd, Omar Qunsul, and Jonathan Shobrook.

You can watch Bill Staples and me present this and more in the GitLab 18.4 release here: https://lnkd.in/epvjrhqB
Try it today at: https://lnkd.in/eAypneFA
Roadmap: https://lnkd.in/eXNYQkEn
Watch more below for a complete, in-depth tutorial on what we've built.
·linkedin.com·
GraphSearch: An Agentic Deep‑Search Workflow for Graph Retrieval‑Augmented Generation
GraphSearch: An Agentic Deep-Search Workflow for Graph Retrieval-Augmented Generation

Why does current AI search fall short when you need real answers? What happens when you ask an AI system a complex question that requires connecting multiple pieces of information? Most current approaches retrieve some relevant documents, generate an answer, and call it done. But this single-pass strategy often misses critical evidence.

👉 The Problem with Shallow Retrieval
Traditional retrieval-augmented generation (RAG) systems work like a student who only skims the first few search results before writing an essay. They grab what seems relevant on the surface but miss deeper connections that would lead to better answers. When researchers tested these systems on complex multi-hop questions, they found a consistent pattern: the AI would confidently provide answers based on incomplete evidence, leading to logical gaps and missing key facts.

👉 A New Approach: Deep Searching with Dual Channels
Researchers from IDEA Research and Hong Kong University of Science and Technology developed GraphSearch, which works more like a thorough investigator than a quick searcher. The system breaks complex questions down into smaller, manageable pieces, then searches through both text documents and structured knowledge graphs. Think of it as having two different research assistants: one excellent at finding descriptive information in documents, another skilled at tracing relationships between entities.

👉 How It Actually Works
Instead of one search-and-answer cycle, GraphSearch uses six coordinated modules:
- Query decomposition splits complex questions into atomic sub-questions
- Context refinement filters out noise from retrieved information
- Query grounding fills in missing details from previous searches
- Logic drafting organizes evidence into coherent reasoning chains
- Evidence verification checks whether the reasoning holds up
- Query expansion generates new searches to fill identified gaps

The system continues this process until it has sufficient evidence to provide a well-grounded answer (see the sketch after this post).

👉 Real Performance Gains
Testing across six different question-answering benchmarks showed consistent improvements. On the MuSiQue dataset, for example, answer accuracy jumped from 35% to 51% when GraphSearch was integrated with existing graph-based systems. The approach works particularly well under constrained conditions: when you have limited computational resources for retrieval, the iterative searching strategy maintains performance better than single-pass methods.

This research points toward more reliable AI systems that can handle the kind of complex reasoning we actually need in practice.

Paper: "GraphSearch: An Agentic Deep Searching Workflow for Graph Retrieval-Augmented Generation" by Yang et al.
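The six-module loop can be summarized schematically. In this sketch the `llm`, `retrieve_text`, and `retrieve_graph` collaborators are placeholders you would supply (the paper's modules are LLM-driven prompts), so the names and control flow are assumptions about the described workflow, not the authors' code:

```python
# Schematic of GraphSearch's iterative deep-search loop as described above.
# All collaborator methods (decompose, ground, refine, ...) are hypothetical
# interfaces standing in for the paper's LLM-driven modules.

def graph_search(question, retrieve_text, retrieve_graph, llm, max_rounds=4):
    sub_questions = llm.decompose(question)            # query decomposition
    evidence = []
    for _ in range(max_rounds):
        for sq in sub_questions:
            sq = llm.ground(sq, evidence)              # query grounding
            hits = retrieve_text(sq) + retrieve_graph(sq)  # dual channels
            evidence += llm.refine(hits)               # context refinement
        draft = llm.draft_logic(question, evidence)    # logic drafting
        ok, gaps = llm.verify(draft, evidence)         # evidence verification
        if ok:                                         # enough evidence: stop
            break
        sub_questions = llm.expand(gaps)               # query expansion
    return llm.answer(question, draft, evidence)
```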
·linkedin.com·
The rise of Context Engineering
The field is evolving from Prompt Engineering, which treats context as a single static string, to Contextual Engineering, which views context as a dynamic system of structured components (instructions, tools, memory, knowledge) orchestrated to solve complex tasks.

🔎 Nearly all innovation here is a response to the primary limitation of Transformer models: the quadratic (O(n²)) computational cost of the self-attention mechanism as the context length n increases. The techniques for managing this challenge can be organized into three areas:

1. Context Generation & Retrieval (Sourcing Ingredients)
- Advanced reasoning: Chain-of-Thought (CoT), Tree-of-Thoughts (ToT)
- External knowledge: advanced Retrieval-Augmented Generation (RAG), like GraphRAG, which uses knowledge graphs for more structured retrieval

2. Context Processing (Cooking the Ingredients)
- Refinement: using the LLM to iterate on and improve its own output (Self-Refine)
- Architectural changes: exploring models beyond Transformers (e.g., Mamba) to escape the quadratic bottleneck

3. Context Management (The Pantry System)
- Memory: creating stateful interactions using hierarchical memory systems (e.g., MemGPT) that manage information between the active context window and external storage
- Key distinction: RAG is stateless I/O to the world; memory is the agent's stateful internal history (see the sketch after this post)

The most advanced applications integrate these pillars to create sophisticated agents, with an added layer of dynamic adaptation:
- Tool-Integrated Reasoning: empowering LLMs to use external tools (APIs, databases, code interpreters) to interact with the real world
- Multi-Agent Systems: designing "organizations" of specialized LLM agents that communicate and collaborate to solve multi-faceted problems, mirroring the structure of human teams
- Adaptive Context Optimization: leveraging Reinforcement Learning (RL) to dynamically optimize context selection and construction for specific environments and tasks, ensuring efficient and effective performance

Contextual Engineering is the emerging science of building robust, scalable, and stateful applications by systematically managing the flow of information to and from an LLM.
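The memory-vs-RAG distinction in area 3 is easy to show in miniature. Below is a toy, MemGPT-inspired sketch (not the MemGPT API): a bounded active context window backed by external storage, with eviction and recall; the token counting is a crude word-count proxy:

```python
# Toy hierarchical memory: the window is what goes into the prompt; the
# archive is external storage paged in on demand. Budget and matching are
# deliberately simplistic assumptions for illustration.

class HierarchicalMemory:
    def __init__(self, window_budget=2048):
        self.window = []          # active context (sent to the LLM)
        self.archive = []         # external storage (searched on demand)
        self.window_budget = window_budget

    def _tokens(self, msgs):
        return sum(len(m.split()) for m in msgs)  # word count as token proxy

    def add(self, message):
        self.window.append(message)
        # Evict oldest messages to the archive when over budget: the agent
        # stays stateful without unbounded quadratic attention cost.
        while self._tokens(self.window) > self.window_budget:
            self.archive.append(self.window.pop(0))

    def recall(self, query):
        # Stateful internal history: search the archive and page the most
        # recent matches back into the active window.
        hits = [m for m in self.archive if query.lower() in m.lower()]
        self.window.extend(hits[-2:])
        return hits

memory = HierarchicalMemory(window_budget=16)
memory.add("User prefers TypeScript examples.")
memory.add("User is building a RAG pipeline over PDF contracts.")
memory.add("Discussed chunking strategies at length " + "tokens " * 20)
print(memory.recall("TypeScript"))  # evicted fact recovered from the archive
```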
·linkedin.com·