GraphNews

#GraphAI
Simplify graph embeddings
Simplify graph embeddings ↙️↙️↙️

Developing a fast vector indexing datastore engine 🚂 at `arrowspace` led me to define a fast way of doing graph embeddings. What I came up with is a process categorised as inductive graph embedding: inferring the embedding of an added node without retraining on the graph. `arrowspace` works similarly to Laplacian Eigenmaps, with some relevant tweaks to achieve performance, as described in https://lnkd.in/eGgeKbdM

This method is a sequence of linear operations; compared to similar algorithms it uses spectral properties instead of random walks to achieve faster training 🚄 How much faster will be the subject of a future blog post.

Practical comparison summary (a sketch of the spectral pipeline follows the list):
* Inductiveness: `arrowspace` (spectral operator on features) and GraphSAGE are inductive; DeepWalk/node2vec are typically transductive
* Online cost: `arrowspace`’s operator application is lightweight; GraphSAGE requires model inference; node2vec/DeepWalk usually require rerunning or approximations to add nodes
* Quality: Laplacian embeddings benchmark strongly against node2vec and are competitive with deep methods (VGAE) depending on graph properties and metrics, suggesting `arrowspace`’s embeddings will be solid baselines or better for community-structured retrieval tasks
* Integration: `arrowspace` emphasizes Rust/native vector indexing with spectral augmentation, complementing external training stacks rather than replacing them

This simplifies this kind of process compared to deep learning and random-walk approaches. Please follow for more updates. #graphembeddings #graphs #embeddings #search #algorithm
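To make the inductive idea concrete, here is a minimal sketch of a Laplacian-Eigenmaps-style pipeline with a cheap linear out-of-sample step. This illustrates the general technique only, not `arrowspace`'s actual implementation; the neighbor-averaging extension is one standard assumption for the "no retraining" step.

```python
# Sketch of Laplacian-Eigenmaps-style embeddings with an inductive,
# linear out-of-sample step. Illustrative only -- NOT arrowspace's code.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_embeddings(adj: sp.csr_matrix, dim: int = 16) -> np.ndarray:
    """Offline step: eigenvectors of the symmetric normalized Laplacian."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    # Smallest eigenvectors; the trivial constant one (eigenvalue 0) is dropped.
    _, vecs = eigsh(lap.tocsc(), k=dim + 1, which="SM")
    return vecs[:, 1:]

def embed_new_node(emb: np.ndarray, neighbor_ids: list[int]) -> np.ndarray:
    """Online step: one linear operation, no retraining -- place the new
    node at the mean of its neighbors' embeddings."""
    return emb[neighbor_ids].mean(axis=0)
```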
·linkedin.com·
GraphLand: Evaluating Graph Machine Learning Models on Diverse...

Recently, there has been a lot of criticism of existing popular graph ML benchmark datasets: lacking practical relevance, low structural diversity that leaves most of the possible graph-structure space unrepresented, low application-domain diversity, graph structure not being beneficial for the considered tasks, and potential bugs in the data collection processes. Some of these criticisms previously appeared on this channel.

To provide the community with better benchmarks, we present GraphLand: a collection of 14 graph datasets for node property prediction coming from diverse real-world industrial applications of graph ML. What makes this benchmark stand out?

Diverse application domains: social networks, web graphs, road networks, and more. Importantly, half of the datasets feature node-level regression tasks that are currently underrepresented in graph ML benchmarks, but are often encountered in real-world applications.

Range of sizes: from thousands to millions of nodes, providing opportunities for researchers with different computational resources.

Rich node attributes that contain numerical and categorical features — these are more typical for industrial applications than textual descriptions that are standard for current benchmarks.

Different learning scenarios. For all datasets, we provide two random data splits with low and high label rate. Further, many of our networks are evolving over time, and for them we additionally provide more challenging temporal data splits and an opportunity to evaluate models in the inductive setting where only an early snapshot of the evolving network is available at train time.

We evaluated a range of models on our datasets and found that, while GNNs achieve strong performance on industrial datasets, they can sometimes be rivaled by gradient-boosted decision trees, which are popular in industry, when these are provided with additional graph-based input features (a sketch of this kind of baseline follows below).
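As an illustration of that GBDT baseline, here is a minimal sketch of graph-based feature augmentation; the library choices (networkx, scikit-learn) and the specific features are assumptions, not the paper's exact setup.

```python
# Sketch of a GBDT baseline with graph-derived input features.
# Feature set and libraries are illustrative assumptions.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def graph_augmented_features(G: nx.Graph, X: np.ndarray) -> np.ndarray:
    """Append degree, PageRank, and mean-of-neighbor features to X."""
    nodes = list(G.nodes())
    idx = {n: i for i, n in enumerate(nodes)}
    pr = nx.pagerank(G)
    deg = np.array([G.degree(n) for n in nodes], dtype=float)
    pagerank = np.array([pr[n] for n in nodes])
    neigh_mean = np.vstack([
        X[[idx[m] for m in G.neighbors(n)]].mean(axis=0)
        if G.degree(n) > 0 else np.zeros(X.shape[1])
        for n in nodes
    ])
    return np.hstack([X, deg[:, None], pagerank[:, None], neigh_mean])

# model = GradientBoostingRegressor().fit(graph_augmented_features(G, X), y)
```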

Further, we evaluated several graph foundation models (GFMs). Despite much attention being paid to GFMs recently, we found that there are currently only a few GFMs that can handle arbitrary node features (which is required for true generalization between different graphs) and that these GFMs produce very weak results on our benchmark. So it seemed like the problem of developing general-purpose graph foundation models was far from being solved, which motivated our research in this direction (see the next post).

·arxiv.org·
How can we create general-purpose graph foundation models?
How can we create general-purpose graph foundation models? (by Dmitry Eremeev)

For a long time, we believed that general-purpose graph foundation models were impossible to create. Indeed, graphs are used to represent data across many different domains, and thus graph machine learning must handle tasks on extremely diverse datasets, such as social, information, transportation, and co-purchasing networks, or models of various physical, biological, or engineering systems. Given the vast differences in structure, features, and labels among these datasets, it seemed unlikely that a single model could achieve robust cross-domain generalization and perform well on all of them.

However, we noticed that tabular machine learning faces a similar challenge of working with diverse datasets containing different features and labels. And yet, this field has recently witnessed the emergence of the first successful foundation models such as TabPFNv2, which are based on the prior-data fitted networks (PFNs) paradigm. Thus, we have…
·t.me·
G-REASONER: foundation models for unified reasoning over graph-structured knowledge
G-REASONER: foundation models for unified reasoning over graph-structured knowledge

Why Graph-Enhanced AI Still Struggles with Complex Reasoning (And How G-REASONER Fixes It)

Ever wondered why current AI systems still fail at connecting the dots across complex knowledge domains? The answer lies in how they handle structured information.

👉 The Core Problem
Large language models excel at reasoning but hit a wall when dealing with interconnected knowledge. Traditional retrieval systems treat information as isolated fragments, missing the rich relationships that make knowledge truly useful. Current graph-enhanced approaches face three critical limitations:
- They're designed for specific graph types only
- They rely on expensive agent-based reasoning
- They can't generalize across different domains

👉 What G-REASONER Brings to the Table
Researchers from Monash University and collaborating institutions introduce G-REASONER, a unified framework that bridges graph and language foundation models. The key innovation is QuadGraph, a standardized four-layer structure that unifies diverse knowledge sources:
- Community layer for global context
- Document layer for textual information
- Knowledge graph layer for factual relationships
- Attribute layer for common properties

👉 How It Works in Practice
G-REASONER employs a 34M-parameter graph foundation model that jointly processes graph topology and text semantics. Unlike previous approaches, it uses knowledge distillation to learn from large-scale datasets with weak supervision. The system implements distributed message-passing across multiple GPUs, enabling efficient scaling. Mixed-precision training reduces memory usage by 17.5% while doubling training throughput.

Testing across six benchmarks shows consistent improvements over state-of-the-art baselines, with particularly strong performance on multi-hop reasoning tasks requiring complex knowledge connections. The framework demonstrates remarkable generalization: the same model works effectively across medical records, legal documents, and encyclopedia data without domain-specific fine-tuning.

This represents a significant step toward AI systems that can reason over structured knowledge as fluidly as humans navigate interconnected concepts.
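The post does not reproduce the paper's exact QuadGraph schema, but a hedged sketch of a four-layer structure with typed cross-layer edges conveys the shape of the idea; all layer and relation names here are illustrative assumptions.

```python
# Illustrative four-layer "QuadGraph"-style container with typed edges.
# Layer and relation names are assumptions based on the summary above,
# not the paper's actual schema.
from dataclasses import dataclass, field

@dataclass
class QuadGraph:
    communities: dict[str, str] = field(default_factory=dict)  # global context
    documents: dict[str, str] = field(default_factory=dict)    # text passages
    entities: dict[str, str] = field(default_factory=dict)     # KG facts
    attributes: dict[str, str] = field(default_factory=dict)   # shared properties
    # Typed cross-layer edges, e.g. ("doc:12", "mentions", "ent:aspirin").
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def neighbors(self, node_id: str, relation: str | None = None):
        """Traverse typed edges; message passing would operate over these."""
        for src, rel, dst in self.edges:
            if src == node_id and (relation is None or rel == relation):
                yield dst
```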
·linkedin.com·
G-REASONER: foundation models for unified reasoning over graph-structured knowledge
City2graph is a Python library that turns urban datasets such as streets, buildings, transit networks, and mobility flows into graph structures ready for Graph Neural Networks.
city2graph is a Python library for converting geospatial datasets into graphs for GNNs, with an integrated interface to GeoPandas, NetworkX, and PyTorch Geometric across ...
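As a rough illustration of the kind of conversion the library automates, here is a generic GeoDataFrame-to-PyG sketch. This is not city2graph's actual API; it only shows the underlying idea, under the assumption that geopandas, networkx, and torch_geometric are available.

```python
# Generic sketch of turning street segments into a PyTorch Geometric graph.
# NOT city2graph's API -- just the conversion idea it packages up.
import geopandas as gpd
import networkx as nx
from torch_geometric.utils import from_networkx

def streets_to_pyg(streets: gpd.GeoDataFrame):
    G = nx.Graph()
    for _, row in streets.iterrows():
        # Segment endpoints become nodes; length becomes an edge feature.
        u, v = row.geometry.coords[0], row.geometry.coords[-1]
        G.add_edge(u, v, length=row.geometry.length)
    # from_networkx relabels nodes to integers and tensorizes attributes.
    return from_networkx(G, group_edge_attrs=["length"])
```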
·city2graph.net·
Stanford Graph Learning Workshop 2025
🚀 I’m excited to announce Stanford Graph Learning Workshop 2025, happening Tuesday, October 14, 2025 at Stanford University (with online livestream). Free registration! Submit a talk/poster.

📍 This year’s workshop will spotlight three fast-moving frontiers in AI & data science:
-- Agents: Autonomous systems reshaping how we interact with tech
-- Relational Foundation Models: Unlocking structure and meaning in complex data
-- Fast LLM Inference: Pushing the boundaries of speed & scalability for large language models

We’re bringing together researchers, innovators, and practitioners for a full day of cutting-edge talks, interactive sessions, and collaborative discussions. Whether you’re working in industry, academia, or startup land, there will be something to spark your curiosity and drive your work forward.

🔍 Want to share your work? We have a Call for Contributed Talks and Posters/Demos open now.
✅ Register now (free): https://lnkd.in/dm9JUnH6
📅 Save the date: Oct 14, 2025
·linkedin.com·
Graph training: Graph Tech Demystified
Calling all data scientists, developers, and managers! 📢 Looking to level up your team's knowledge of graph technology? We're excited to share the recorded 2-part training series, "Graph Tech Demystified" with the amazing Paco Nathan. This is your chance to get up to speed on graph fundamentals:

In Part 1: Intro to Graph Technologies, you'll learn:
- Core concepts in graph tech.
- Common pitfalls and what graph technology won't solve.
- Focus of graph analytics and measuring quality.
🎥 Recording https://lnkd.in/gCtCCZH5
📖 Slides https://lnkd.in/gbCnUjQN

In Part 2: Advanced Topics in Graph Technologies, we explore:
- Sophisticated graph patterns like motifs and probabilistic subgraphs.
- Intersection of Graph Neural Networks (GNNs) and Reinforcement Learning.
- Multi-agent systems and Graph RAG.
🎥 Recording https://lnkd.in/g_5B8nNC
📖 Slides https://lnkd.in/g6iMbJ_Z

Insider tip: the resources alone are enough to keep you busy far longer than the time it takes to watch the training!
·linkedin.com·
What Every Data Scientist Should Know About Graph Transformers and Their Impact on Structured Data

In my latest piece for Unite.AI, I dive into:
🔹 Why message passing alone isn’t enough
🔹 How Graph Transformers use attention to overcome GNN limitations
🔹 Real-world applications in drug discovery, supply chains, recommender systems, and cybersecurity
🔹 The exciting frontier where LLMs meet graphs
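For readers who want the first two bullets in code form, here is a minimal sketch of graph-transformer-style attention with a structural bias term (shortest-path distance, in the style of Graphormer). This illustrates the mechanism generically, not any specific model from the article.

```python
# Minimal sketch of Graph Transformer attention: every node attends to every
# other node (unlike message passing, which sees only direct neighbors),
# with graph structure injected as a learned distance bias. Illustrative only.
import torch
import torch.nn.functional as F

def graph_attention(x: torch.Tensor, spd: torch.LongTensor,
                    dist_bias: torch.nn.Embedding) -> torch.Tensor:
    """x: [N, d] node features; spd: [N, N] shortest-path distances."""
    d = x.size(-1)
    scores = x @ x.t() / d ** 0.5                  # dense pairwise attention
    scores = scores + dist_bias(spd).squeeze(-1)   # structural bias term
    return F.softmax(scores, dim=-1) @ x           # global information flow

# dist_bias = torch.nn.Embedding(max_distance + 1, 1) would be learned jointly.
```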

·unite.ai·
Scaling Graph Learning for the Enterprise
Tackle the core challenges related to enterprise-ready graph representation and learning. With this hands-on guide, applied data scientists, machine learning engineers, and...
·oreilly.com·
A new notebook exploring Semantic Entity Resolution & Extraction using DSPy and Google's new LangExtract library.
Just released a new notebook exploring Semantic Entity Resolution & Extraction using DSPy (Community) and Google's new LangExtract library.

Inspired by Russell Jurney’s excellent work on semantic entity resolution, this demo follows his approach of combining:
✅ embeddings,
✅ kNN blocking,
✅ and LLM matching with DSPy (Community).

On top of that, I added a general extraction layer to test-drive LangExtract, a Gemini-powered, open-source Python library for reliable structured information extraction. The goal? Detect and merge mentions of the same real-world entities across text. It’s an end-to-end flow tackling one of the most persistent data challenges.

Check it out, experiment with your own data, enjoy the summer and let me know your thoughts! cc Paco Nathan you might like this 😉 https://wor.ai/8kQ2qa
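A compressed sketch of the embeddings-plus-kNN-blocking stage described above; the embedding model and the DSPy matching signature mentioned in the comments are illustrative assumptions, not the notebook's exact code.

```python
# Sketch of the blocking stage: embed records, then use kNN so the LLM only
# judges likely matches instead of all O(n^2) pairs. Model choice and the
# DSPy signature mentioned below are assumptions, not the notebook's code.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

records = ["Acme Corp, NYC", "ACME Corporation, New York", "Globex Ltd, London"]
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(records)

nn = NearestNeighbors(n_neighbors=2).fit(emb)
_, idx = nn.kneighbors(emb)
candidate_pairs = {tuple(sorted((i, j)))
                   for i, row in enumerate(idx) for j in row if i != j}

# Each candidate pair then goes to an LLM matcher, e.g. a DSPy module with a
# signature along the lines of "record_a, record_b -> same_entity: bool".
```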
·linkedin.com·
MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains

When AI Diagnoses Patients, Should Reasoning Be a Team Sport?

👉 Why Existing Approaches Fall Short
Medical question answering demands precision, but current AI methods struggle with two key issues:
1. Error Accumulation: Linear reasoning chains (like Chain-of-Thought) risk compounding mistakes: if the first step is wrong, the entire answer falters.
2. Flat Knowledge Retrieval: Traditional retrieval-augmented methods treat medical facts as unrelated text snippets, ignoring complex relationships between symptoms, diseases, and treatments.
This leads to unreliable diagnoses and opaque decision-making, a critical problem when patient outcomes are at stake.

👉 What MIRAGE Does Differently
MIRAGE transforms reasoning from a solo sprint into a coordinated team effort:
- Parallel Detective Work: Instead of one linear chain, multiple specialized "detectives" (reasoning chains) investigate different symptoms or entities in parallel.
- Structured Evidence Hunting: Retrieval operates on medical knowledge graphs, tracing connections between symptoms (e.g., "face pain → lead poisoning") rather than scanning documents.
- Cross-Check Consensus: Answers from parallel chains are verified against each other to resolve contradictions, like clinicians discussing differential diagnoses.

👉 How It Works (Without the Jargon)
1. Break It Down
- Splits complex queries ("Why am I fatigued with knee pain?") into focused sub-questions grounded in specific symptoms/entities.
- Example: "Conditions linked to fatigue" and "Causes of knee lumps" become separate investigation threads.
2. Graph-Guided Retrieval
- Each thread explores a medical knowledge graph like a map:
  - Anchor Mode: Examines direct connections (e.g., diseases causing a symptom).
  - Bridge Mode: Hunts multi-step relationships (e.g., toxin exposure → neurological symptoms → joint pain).
3. Vote & Verify
- Combines evidence from all threads, prioritizing answers supported by multiple independent chains.
- Discards conflicting hypotheses (e.g., ruling out lupus if only one chain suggests it without corroboration).

👉 Why This Matters
Tested on three medical benchmarks (including real clinician queries), MIRAGE:
- Outperformed GPT-4 and Tree-of-Thought variants in accuracy (84.8% vs. 80.2%)
- Reduced error propagation by 37% compared to linear retrieval-augmented methods
- Produced answers with traceable evidence paths, critical for auditability in healthcare

The Big Picture
MIRAGE shifts AI reasoning from brittle, opaque processes to collaborative, structured exploration. By mirroring how clinicians synthesize information from multiple angles, it highlights a path toward AI systems that are both smarter and more trustworthy in high-stakes domains.

Paper: Wei et al. MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
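The parallel-chains-plus-consensus pattern is easy to see in miniature. In this sketch the retrieval/reasoning function is a placeholder, and the voting rule (keep answers corroborated by at least two chains) is an assumption drawn from the summary, not the paper's exact aggregation.

```python
# Schematic of parallel reasoning chains with cross-check voting.
# run_chain is a placeholder, not MIRAGE's implementation.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_chain(sub_question: str) -> str:
    """Placeholder: traverse a medical KG (anchor or bridge mode) and
    return this chain's candidate answer."""
    raise NotImplementedError

def answer(sub_questions: list[str]) -> str:
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(run_chain, sub_questions))
    # Consensus: prefer the answer supported by the most independent chains;
    # uncorroborated singleton hypotheses are discarded.
    best, support = Counter(candidates).most_common(1)[0]
    return best if support >= 2 else "insufficient corroboration"
```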
·linkedin.com·
GraphCogent: Overcoming LLMs’ Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding
Unlocking LLMs' Graph Reasoning Potential Through Cognitive-Inspired Collaboration

👉 Why This Matters
Large language models often falter when analyzing transportation networks, social connections, or citation graphs: not for lack of intelligence, but because of working memory constraints. Imagine solving a 1,000-node shortest path problem while simultaneously memorizing every connection. Like humans juggling too many thoughts, LLMs lose accuracy as graph complexity increases.

👉 What GraphCogent Solves
This new framework addresses three core limitations:
1. Representation confusion: Mixed graph formats (adjacency lists, symbols, natural language)
2. Memory overload: Context window limitations for large-scale graphs
3. Execution fragility: Error-prone code generation for graph algorithms
Drawing inspiration from human cognition's working memory model, GraphCogent decomposes graph reasoning into specialized processes mirroring how our brains handle complex tasks.

👉 How It Works
Sensory Module
- Acts as an LLM's "eyes," standardizing diverse graph inputs through subgraph sampling
- Converts web links, social connections, or traffic routes into uniform adjacency lists
Buffer Module
- Functions as a "mental workspace," integrating graph data across formats (NetworkX/PyG/NumPy)
- Maintains persistent memory beyond standard LLM context limits
Execution Module
- Combines two reasoning modes:
  - Tool calling for common tasks (pathfinding, cycle detection)
  - Model generation for novel problems using preprocessed data

👉 Proven Impact
- Achieves 98.5% accuracy on real-world graphs (social networks, transportation systems) using Llama3.1-8B
- Outperforms 671B parameter models by 50% while using 80% fewer tokens
- Handles graphs 10x larger than previous benchmarks through efficient memory management

The framework's secret sauce? Treating graph reasoning as a team effort rather than a single AI's task, much like how human experts collaborate on complex problems.

Key Question for Discussion
As multi-agent systems become more sophisticated, how might we redesign LLM architectures to better emulate human cognitive processes for specific problem domains?

Paper: "GraphCogent: Overcoming LLMs’ Working Memory Constraints via Multi-Agent Collaboration in Complex Graph Understanding" (Wang et al., 2025)
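A small sketch of the Sensory/Execution split described above: normalize whatever edge format arrives into one adjacency-list representation, then route common queries to an exact graph tool rather than asking the LLM to simulate the algorithm. Function names are illustrative, not GraphCogent's interface.

```python
# Sketch of the Sensory/Execution idea: one canonical adjacency-list format,
# plus exact tool calls for common tasks. Names are illustrative only.
import networkx as nx

def to_adjacency_list(edges) -> dict:
    """Sensory-module idea: whatever the input looked like (web links,
    social ties, routes), emit a uniform {node: [neighbors]} mapping."""
    G = nx.Graph(list(edges))
    return {n: sorted(G.neighbors(n)) for n in G.nodes()}

def shortest_path_tool(adj: dict, src, dst) -> list:
    """Execution-module idea: pathfinding goes to a library, not the LLM."""
    G = nx.Graph((u, v) for u, nbrs in adj.items() for v in nbrs)
    return nx.shortest_path(G, src, dst)

adj = to_adjacency_list([("a", "b"), ("b", "c"), ("c", "d")])
print(shortest_path_tool(adj, "a", "d"))  # ['a', 'b', 'c', 'd']
```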
·linkedin.com·
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
Graph-R1: new RAG framework just dropped! Combines agents, GraphRAG, and RL. Here are my notes:

Introduces a novel RAG framework that moves beyond traditional one-shot or chunk-based retrieval by integrating graph-structured knowledge, agentic multi-turn interaction, and RL. Graph-R1 is an agent that reasons over a knowledge hypergraph environment by iteratively issuing queries and retrieving subgraphs using a multi-step “think-retrieve-rethink-generate” loop. Unlike prior GraphRAG systems that perform fixed retrieval, Graph-R1 dynamically explores the graph based on evolving agent state.

Retrieval is modeled as a dual-path mechanism: entity-based hyperedge retrieval and direct hyperedge similarity, fused via reciprocal rank aggregation to return semantically rich subgraphs. These are used to ground subsequent reasoning steps. The agent is trained end-to-end using GRPO with a composite reward that incorporates structural format adherence and answer correctness. Rewards are only granted if reasoning follows the proper format, encouraging interpretable and complete reasoning traces.

On six RAG benchmarks (e.g., HotpotQA, 2WikiMultiHopQA), Graph-R1 achieves state-of-the-art F1 and generation scores, outperforming prior methods including HyperGraphRAG, R1-Searcher, and Search-R1. It shows particularly strong gains on harder, multi-hop datasets and under OOD conditions.

The authors find that Graph-R1’s performance degrades sharply without its three key components: hypergraph construction, multi-turn interaction, and RL. The ablation study supports that graph-based and multi-turn retrieval improve information density and accuracy, while end-to-end RL bridges the gap between structure and language.

Paper: https://lnkd.in/eGbf4HhX
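The dual-path fusion step is a standard technique worth seeing concretely. Below is a minimal reciprocal rank fusion sketch; k=60 is the conventional constant from the RRF literature and an assumption here, not a value taken from the paper.

```python
# Minimal reciprocal rank fusion: score(d) = sum_i 1 / (k + rank_i(d)).
# k=60 is the conventional RRF default -- an assumption, not from the paper.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse the two retrieval paths over hypothetical hyperedge IDs.
fused = reciprocal_rank_fusion([
    ["he3", "he1", "he7"],   # entity-based hyperedge ranking
    ["he1", "he7", "he2"],   # direct similarity ranking
])
print(fused[0])  # "he1": ranked near the top of both paths, so it wins
```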
·linkedin.com·
Use Graph Machine Learning to detect fraud with Amazon Neptune Analytics and GraphStorm | Amazon Web Services
Every year, businesses and consumers lose billions of dollars to fraud, with consumers reporting $12.5 billion lost to fraud in 2024, a 25% increase year over year. People who commit fraud often work together in organized fraud networks, running many different schemes that companies struggle to detect and stop. In this post, we discuss how to use Amazon Neptune Analytics, a memory-optimized graph database engine for analytics, and GraphStorm, a scalable open source graph machine learning (ML) library, to build a fraud analysis pipeline with AWS services.
·aws.amazon.com·
Alice enters the magical, branchy world of Graphs and Graph Neural Networks
The first draft 'G' chapter of the geometric deep learning book is live! 🚀

Alice enters the magical, branchy world of Graphs and Graph Neural Networks 🕸️ (Large Language Models are there too!)

I've spent 7+ years studying, researching & talking about graphs -- this text is my best attempt at conveying everything I've learnt 💎

You may read this chapter in the usual place (link in comments!) Any and all feedback / thoughts / questions on the content, and/or words of encouragement for finishing this book (pretty please! 😇) are warmly welcomed!

Michael Bronstein Joan Bruna Taco Cohen
·linkedin.com·
Multi-modal Graph Large Language Models (MG-LLM)
Multi-modal graphs are everywhere in the digital world. Yet the tools used to understand them haven't evolved as much as one would expect. What if the same model could handle your social network analysis, molecular discovery, AND urban planning tasks?

A new paper from Tsinghua University proposes Multi-modal Graph Large Language Models (MG-LLM), a paradigm shift in how we process complex interconnected data that combines text, images, audio, and structured relationships. Think of it as ChatGPT for graphs, but, metaphorically speaking, with eyes, ears, and structural understanding.

Their key insight? Treating all graph tasks as generative problems. Instead of training separate models for node classification, link prediction, or graph reasoning, MG-LLM frames everything as transforming one multi-modal graph into another. This unified approach means the same model that predicts protein interactions could also analyze social media networks or urban traffic patterns.

What makes this particularly exciting is the vision for natural language interaction with graph data. Imagine querying complex molecular structures or editing knowledge graphs using plain English, without learning specialized query languages.

The challenges remain substantial, from handling the multi-granularity of data (pixels to full images) to managing multi-scale tasks (entire graph input, single node output). But if successful, this could fundamentally change the level of graph-based insights across industries that have barely scratched the surface of AI adoption.

↓ Want to keep up? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
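To make the "everything is graph-to-graph generation" framing concrete, here is a toy sketch; the types and task encodings are illustrative assumptions, not the paper's formalism.

```python
# Toy illustration of the unifying idea: every task maps a source
# multi-modal graph to a target graph. Types and task framings are
# assumptions for illustration, not the paper's formalism.
from dataclasses import dataclass

@dataclass
class MMGraph:
    nodes: dict[str, dict]            # node_id -> {"text": ..., "image": ...}
    edges: list[tuple[str, str]]

def node_classification(g: MMGraph, node_id: str, label: str) -> MMGraph:
    """'Predict a label' becomes 'generate the same graph with the label
    attached as a new attribute on the target node'."""
    nodes = dict(g.nodes)
    nodes[node_id] = {**nodes[node_id], "label": label}
    return MMGraph(nodes, list(g.edges))

def link_prediction(g: MMGraph, u: str, v: str) -> MMGraph:
    """'Does edge (u, v) exist?' becomes 'generate the graph containing it'."""
    return MMGraph(dict(g.nodes), g.edges + [(u, v)])
```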
·linkedin.com·
Relational Graph Transformers: A New Frontier in AI for Relational Data - Kumo
Relational Graph Transformers represent the next evolution in Relational Deep Learning, allowing AI systems to seamlessly navigate and learn from data spread across multiple tables. By treating relational databases as the rich, interconnected graphs they inherently are, these models eliminate the need for extensive feature engineering and complex data pipelines that have traditionally slowed AI adoption. In this post, we'll explore how Relational Graph Transformers work, why they're uniquely suited for enterprise data challenges, and how they're already revolutionizing applications from customer analytics and recommendation systems to fraud detection and demand forecasting.
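The "databases are already graphs" move is easy to sketch: rows become typed nodes and foreign keys become typed edges. The table and column names below, and the use of PyTorch Geometric's HeteroData, are illustrative assumptions, not Kumo's implementation.

```python
# Sketch: rows -> typed nodes, foreign keys -> typed edges. Table and column
# names are illustrative; this is not Kumo's implementation.
import pandas as pd
import torch
from torch_geometric.data import HeteroData

def tables_to_hetero_graph(customers: pd.DataFrame,
                           orders: pd.DataFrame) -> HeteroData:
    data = HeteroData()
    data["customer"].num_nodes = len(customers)
    data["order"].num_nodes = len(orders)
    # The foreign key orders.customer_id -> customers.id becomes typed edges.
    cust_index = {cid: i for i, cid in enumerate(customers["id"])}
    src = [cust_index[c] for c in orders["customer_id"]]
    dst = list(range(len(orders)))
    data["customer", "places", "order"].edge_index = torch.tensor([src, dst])
    return data  # ready for a heterogeneous GNN or relational graph transformer
```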
·kumo.ai·