Most companies think their knowledge graph or ontology will be built by extracting information from their data, only to find out that their data doesn’t contain much information.
You’re taught the cycle of Data - Information - Knowledge - Wisdom, but the lesson stops before a fundamental concept of information theory: the information in a dataset can be measured.
There’s an entire field of study around determining whether a dataset contains sufficient information to answer a question or train a model. Run that evaluation against most enterprise data and business questions and you’ll see the extent of the problem.
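To make "you can measure the information in a dataset" concrete, here is a minimal sketch of that kind of evaluation, assuming a tabular dataset with candidate feature columns and a target that encodes the business question. It uses scikit-learn's mutual information estimator; the file name and column names are illustrative, not from any real schema.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical dataset and business question ("will this customer churn?").
df = pd.read_csv("customer_events.csv")
target = df["churned"]
features = df[["plan_type", "region", "tenure_months"]]

# Encode every column as integer codes so we can treat them as discrete.
encoded = features.apply(lambda col: col.astype("category").cat.codes)

# Mutual information (in nats) between each feature and the target.
# Scores near zero mean the column carries almost no signal about the
# question, and no downstream processing will change that.
mi = mutual_info_classif(encoded, target, discrete_features=True)
for name, score in zip(features.columns, mi):
    print(f"{name}: {score:.3f} nats")
```

If every candidate column scores near zero, the dataset cannot answer the question, no matter how it is modeled.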
No downstream process (cleaning, transformation, wrangling, etc.) can introduce information that a dataset doesn’t already contain. Said simply, you can’t clean the signal back into the data.
If the context wasn’t captured when the data was gathered, the information is gone.
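That’s not a metaphor; it’s the data processing inequality from information theory. If Y is the quantity you care about, X is your raw data, and g is any cleaning or transformation step, then:

```latex
% Data processing inequality: for the Markov chain Y -> X -> g(X),
% processing X can never increase the information it carries about Y.
I\bigl(Y;\, g(X)\bigr) \;\le\; I(Y;\, X)
```

Cleaning can discard noise, but it can only preserve or destroy signal, never create it.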
For almost a decade, I’ve had to give new clients the same sad story: roughly 80% of the business’s data doesn’t contain enough information to be used for model training. LLMs don’t change that.
Agents need a lot more information to do their jobs reliably. An agent detects intent, then infers the desired outcome and all the steps required to deliver it.
RAG over knowledge graphs is intended to provide all the supporting information required to do that reliably. However, if your datasets don’t contain enough information, no amount of AI can fix it.
Before building an agent, we must assess whether our data contains enough information to satisfy the range of intents our users will bring to it. That’s an even higher bar than just answering a question or predicting a single variable.
Agents create an information problem on both sides of the equation (a rough coverage check is sketched after these two questions):
Do you have enough information to infer the intent and desired outcome from the user’s prompt?
Do you have enough information to define the steps required to deliver that outcome, and to execute them reliably?
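Before committing to a build, you can turn that into a concrete coverage check. The sketch below assumes three things supplied by your own stack: a sample of representative user prompts, an intent classifier, and a retriever over your knowledge graph. Every name in it (classify_intent, retrieve_support, the intents and fact kinds) is hypothetical.

```python
from collections import Counter

# Hypothetical: the minimum supporting facts each intent needs before an
# agent can reliably plan and execute the steps to deliver the outcome.
REQUIRED_FACTS_PER_INTENT = {
    "cancel_subscription": {"account_status", "contract_terms", "refund_policy"},
    "upgrade_plan": {"current_plan", "available_plans", "billing_cycle"},
}

def coverage_report(prompts, classify_intent, retrieve_support):
    """For each sampled prompt, check whether retrieval over the knowledge
    graph surfaces every fact the detected intent requires."""
    gaps = Counter()
    for prompt in prompts:
        intent = classify_intent(prompt)
        required = REQUIRED_FACTS_PER_INTENT.get(intent, set())
        found = {fact.kind for fact in retrieve_support(prompt)}
        for missing in required - found:
            gaps[(intent, missing)] += 1
    return gaps  # nonzero counts mark information your data doesn't contain
```

Gaps that show up here can’t be prompted away; the missing information has to be curated into the source data first.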
Information and knowledge management are the keys that unlock AI’s value, but businesses must curate datasets in new ways to succeed. The enterprise’s BI datasets and data warehouses rarely contain enough information to get the job done.