In this article, I explain Copilot in Fabric and Copilot in Power BI, walking through what they are, how they work, and three common scenarios where they might be used. More importantly, I evaluate whether using them over other, non-AI approaches makes sense, and I assess their current state.
2. To walk through three Copilot scenarios: generating code, answering data questions, and generating reports.
Copilot is a generative AI tool, and generative AI is a form of “narrow” or “weak” artificial intelligence. A simple explanation is that tools like Copilot provide a chatbot interface to interact with pre-trained foundation models like GPT-4, which might have additional tuning on top. Inputs to these large language models (LLMs) might get additional pre-processing with contextual data (which is called grounding) or post-processing that applies filtering or constraints (for example, for responsible AI, or AI safety). These foundation models are trained on large amounts of data to learn patterns, and they can then generate new data that should resemble the inputs while also being coherent and contextually appropriate.
They work in different ways: the different Copilots might have subtly different architectures, use different foundation models (like GPT-4 or DALL-E 3), and return different results.
To paraphrase what Copilot does in steps, it:

1. Takes your prompt and collects grounding data, such as the metadata of your semantic model.
2. Pre-processes the prompt and grounding data and submits them to the LLM.
3. Post-processes the LLM output, applying filters or constraints for responsible AI.
4. Returns the result, such as generated code, an answer, or a report.
The quality of that grounding data affects the results you get. One example of preparing your model is setting the row label and key column properties for its tables. These properties help determine how to aggregate data when one column contains duplicate values that are uniquely identified by another column. The example from Chris Webb’s blog is avoiding grouping customers only by name, since some customers have the same name but are identified uniquely by their customer ID.
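To make this concrete, here is a minimal DAX sketch of the problem, assuming a hypothetical Customer table and a [Total Sales] measure (these names are illustrative, not from the article’s model):

```dax
// Two different customers are both named "John Smith".
// Grouping by name alone merges them into one row,
// silently combining their sales.
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Customer Name],
    "Total Sales", [Total Sales]
)

// Grouping by the key column keeps them separate. Setting the
// key column and row label properties tells tools like Copilot
// to aggregate by Customer ID while displaying Customer Name.
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Customer ID],
    Customer[Customer Name],
    "Total Sales", [Total Sales]
)
```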
To produce its output, Copilot takes your prompt and grounding data to use with the LLM. The term grounding data is perhaps a misnomer, because it’s not necessarily the data in your data items. Rather, it refers to the metadata that defines, for example, your semantic model and how you’ve set it up to work. For instance, Copilot looks at the names of the tables, columns, and measures in your model, as well as their properties, including the linguistic schema (such as synonyms), among others.
Generative AI is good at use cases with soft accuracy, where a range of outputs can be acceptable. Unfortunately, in business intelligence, almost nothing fits in that box. When it comes to a BI solution, the business expects only one answer – the correct one. An incorrect or even a misleading output can lead to wrong business decisions and disastrous results.
They aren’t generally deterministic: when you submit a prompt multiple times to Copilot or ChatGPT, you aren’t guaranteed to get the same answer back. However, in other tools that allow more tuning of LLM responses, you can set the temperature property lower (or “colder”), which makes results more deterministic.
Remember that Copilot is not deterministic. If we run this exact same prompt again in another query, we may get a completely different result with different issues. This can also depend on other grounding data that Copilot takes from that session, for example, if you submitted other prompts before.
Additionally, Copilot introduces new problems on top of the flawed prompt: it computes the difference between Order Date and the Date from the Date table, which has a relationship… to Order Date. So even if we correct the filter to “Express Order”, the result will still be incorrect; it will be 0.
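To illustrate why it returns 0, here is a sketch of the flawed pattern described above; the table, column, and measure names are my assumptions for illustration, not Copilot’s actual output:

```dax
// Flawed pattern: 'Date'[Date] is related to Orders[Order Date],
// so in a row context over Orders, RELATED ( 'Date'[Date] )
// returns the order date itself. The difference is always 0,
// regardless of how the filter is spelled.
Delivery Days (Flawed) =
AVERAGEX (
    FILTER ( Orders, Orders[Delivery Type] = "Express Order" ),
    DATEDIFF ( Orders[Order Date], RELATED ( 'Date'[Date] ), DAY )
)
```

Presumably, the intent was to compare two dates stored on the order itself, such as the order date and a delivery date, rather than comparing the order date with its own related calendar date.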
Put another way, the more you know about a topic, the better you can tell what’s right from what’s wrong. When you know more, you’re also more aware of what you don’t know, what’s possible, and what makes sense. When using generative AI, this is extremely helpful, because you can have a more productive sparring session and fall victim less often to its errors or hallucinations.
What this means is that novices and beginners are more susceptible to wasting time and resources when using AI. That’s because they’re more likely to encounter “unknown unknowns”, outputs they can neither fully understand nor correctly apply, and they’ll also struggle more to identify bullshit like errors, hallucinations, or logical flaws.
In this scenario, the user asks Copilot for the MTD Sales for August 2021. This initially returns an error; MTD Sales is a synonym for the actual measure, Turnover MTD, so the user repeats the question using the measure name as it appears in the model. Copilot then returns the Turnover MTD, but for August 2022 instead of August 2021. The current report page shows MTD Sales, where the month is August (from a slicer selection) and the year is 2022 (from the filter pane). For its answer, Copilot refers to a matrix visual that shows this number.
EXAMPLE 1: USING COPILOT TO HELP WRITE DAX CODE
EXAMPLE 2: USING COPILOT TO ANSWER DATA QUESTIONS
When submitting another prompt after changing the report page, Copilot then creates a card stating that Turnover MTD is 5bn for August 2021. This is not only incorrect, but also inexplicable; I have no idea how it came up with this number, because clicking “Show reasoning” simply shows Copilot re-phrasing the prompt.
In the previous example, where we generated code, the user could iterate toward a better result. Here, the prompt already seems quite specific and descriptive for such a simple question. However, even if the user gets extremely specific, referencing the exact column names and values, Copilot still returns an incorrect result.
The user slightly adjusts the prompt, removing possible ambiguities from the column names and including examples. But again, the result is wrong; Copilot incorrectly returns the turnover for August 2022 instead of 2021.
Obviously, at this point, any normal user would give up. The damage has been done. But for the sake of argument, let’s press forward. The problem might not be the prompt. Could it be the model? The data?
Additionally, Copilot seems to be referencing the report page and grabbing a result from a visual, when I actually want it to query the model.
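By contrast, querying the model directly is deterministic and ignores the report page entirely. Here is a minimal DAX query sketch that answers the original question, reusing the Turnover MTD measure from the scenario and assuming Year and Month columns on the Date table:

```dax
// Ask the model, not the report: the same query returns
// the same answer every time it runs.
EVALUATE
ROW (
    "Turnover MTD (Aug 2021)",
    CALCULATE (
        [Turnover MTD],
        'Date'[Year] = 2021,
        'Date'[Month] = "August"
    )
)
```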
EXAMPLE 3: GENERATING REPORTS WITH COPILOT
At this point, however, I realized that I actually intended for Copilot to use the on-time delivery percentage and not the absolute values. I should have specified this explicitly in the prompt, but now I wondered: do I generate a new report page, or replace these visuals myself? I wasn’t interested in consuming more of my available capacity.
IF THE REPORTS LOOK NICE, IT’S BECAUSE OF A HUMAN AND NOT COPILOT
The example reports from Copilot demonstrations look nice at first glance, before closer scrutiny. However, it’s important to emphasize that the initial aesthetic appeal of these reports is not an AI output, but the result of some simple design controls put in place by a human.
Copilot reports look nice because a human followed some good design practices when setting the default properties for Copilot reports.
A report page generated by Copilot. The prompt was: “Title: On-Time Delivery by Key Account. Description: An overview of the total On-Time Delivery percentage in lines, value, and quantity, the trend, and a breakdown by Key account for these three metrics”.
The result when asking Copilot to create a matrix of OTD by Key Account. Copilot adds the conditional formatting without asking.
COPILOT PROBABLY ISN’T A DECIDING FACTOR FOR YOU TO GET AN F64
Copilot in Fabric and Copilot in Power BI are evolving over time. Undoubtedly, they will eventually grow to encompass much more functionality and scope, and they will improve. However, in their current state, I don’t consider them a deciding factor for purchasing Fabric or an F64 SKU. Of course, this might change, and I’ll re-evaluate as their capabilities evolve.
One final remark is not about Copilot specifically, but about generative AI technology as a whole.