The Dataverse Project: 750K FAIR Datasets and a Living Knowledge Graph

"I'm Ukrainian and I'm wearing a suit, so no complaints about me from the Oval Office" - that's how I started my lecture on building Artificial Intelligence with Croissant ML in the Dataverse data platform, at the Bio x AI Hackathon kick-off event in Berlin. https://lnkd.in/ePYHCfJt
* 750,000+ FAIR datasets across the world, driving innovation across the whole data landscape.
* A knowledge graph with 50M+ triples.
* AI-ready metadata exports.
* Qdrant as vector storage; Google, Meta, and Mistral AI as LLM providers.
* Adrian Gschwend Qlever as the fastest triple store for Dataverse knowledge graphs.
Multilingual, machine-readable, queryable scientific data at scale.
If you're interested, you can also apply for the 2-month #BioAgentHack online hackathon:
• $125K+ in prizes
• Mentorship from Biotech and AI leaders
• Build alongside top open-science researchers & devs
More info: https://lnkd.in/eGhvaKdH

#KnowledgeGraph #research #open source #technical #open data #science

·linkedin.com·Apr 11, 2025

OpenFact: Factuality Enhanced Open Knowledge Extraction | Transactions of the Association for Computational Linguistics | MIT Press

Abstract. We focus on the factuality property during the extraction of an OpenIE corpus named OpenFact, which contains more than 12 million high-quality knowledge triplets. We break down the factuality property into two important aspects, expressiveness and groundedness, and we propose a comprehensive framework to handle both aspects. To enhance expressiveness, we formulate each knowledge piece in OpenFact based on a semantic frame. We also design templates, extra constraints, and adopt human efforts so that most OpenFact triplets contain enough details. For groundedness, we require the main arguments of each triplet to contain linked Wikidata entities. A human evaluation suggests that the OpenFact triplets are much more accurate and contain denser information compared to OPIEC-Linked (Gashteovski et al., 2019), one recent high-quality OpenIE corpus grounded to Wikidata. Further experiments on knowledge base completion and knowledge base question answering show the effectiveness of OpenFact over OPIEC-Linked as supplementary knowledge to Wikidata as the major KG.

#KnowledgeGraph #open data #open source

·direct.mit.edu·Jul 13, 2023
