'I am not a robot' isn't what you think.
What went (methodologically) wrong with the ChatGPT in education meta-studies
This continues my post that analyzed the Weng & Fan meta-analysis of the impacts of ChatGPT on "learning performance," "learning perception," and "critical thinking." Weng & Fan and the earlier Deng et al. …
There's a tendency to write about technological change as an "all of a sudden" occurrence – even if you try to offer some background, some precursors, some concurrent events, or a longer, broader perspective, people still often read "all of a sudden" into any discussion about the arrival of a new technology. That it even has something like an arrival. A recent oral history, published in Quanta Magazine, of what happened to the field of natural language processing with the release of GPT-3 is, ar…
I invited 31 researchers to test AI research synthesis by running the exact same prompt. They learned LLM analysis is overhyped, but evaluating it is something you can do yourself.
Last month I ran an #AI for #userresearch workshop with Rosenfeld Media. Our first cohort was full of smart, thoughtful researchers (if you participated in the workshop, I hope you’ll tag yourself and weigh in in the comments!).
A major limitation of a lot of AI for UXR “thought leadership” right now is that too much of it is anecdotal: researchers run datasets a few times through a commercial tool and decide whether or not the output is good enough based on only a handful of results.
But for nondeterministic systems like generative AI, repeated testing under controlled conditions is the only way to know how well they actually work. So that’s what we did in the workshop.
Our workshop participants produced a lot of interesting findings about qualitative research synthesis with AI:
1️⃣ LLMs can produce vastly different output even with the exact same prompt and data. The number of themes alone ranged from 5 to 18, with a median of about 10.5. (A minimal sketch of this kind of repeated-run check follows the list below.)
2️⃣ Our AI-generated themes mapped pretty well to human-generated themes, but there were some notable differences. This led to a discussion of whether mapping to human themes is even the right metric to use to evaluate AI synthesis (how are we evaluating whether the human-generated themes were right in the first place?).
3️⃣ The bigger concern for the researchers in the workshop was the lack of supporting evidence for themes. The supporting quotes the LLM provided looked okay superficially, but on closer investigation *every single participant* found examples of data being misquoted or entirely fabricated. One person commented that validating the output was ultimately more work than performing the analysis themselves.
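If you want to try a repeated-run check like this yourself, here is a minimal sketch of the idea. To be clear, this is not the workshop's actual harness: the prompt, the dataset file, the choice of the OpenAI Python client, the run count, and the theme-counting regex are all illustrative assumptions; only the model name comes from the post.

```python
# Minimal repeatability check for LLM qualitative synthesis.
# Assumptions (not from the workshop): OpenAI Python client, a
# placeholder prompt/dataset, and themes returned as a numbered list.
import re
import statistics

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Synthesize the interview notes below into a numbered list of themes.\n\n"
NOTES = open("interview_notes.txt").read()  # hypothetical dataset file

def count_themes(text: str) -> int:
    """Count lines that look like numbered list items, e.g. '3. Theme'."""
    return len(re.findall(r"^\s*\d+[.)]\s", text, flags=re.MULTILINE))

counts = []
for run in range(20):  # same prompt, same data, 20 independent runs
    resp = client.chat.completions.create(
        model="gpt-4o-2024-11-20",
        messages=[{"role": "user", "content": PROMPT + NOTES}],
    )
    counts.append(count_themes(resp.choices[0].message.content))

print(f"themes per run: min={min(counts)} max={max(counts)} "
      f"median={statistics.median(counts)}")
```

Twenty runs is an arbitrary choice; the point is simply that a range and a median over repeated runs tell you far more about a nondeterministic system than any single output does.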
Now, I want to acknowledge that this is one dataset, one prompt (although a carefully vetted one, written by an industry expert), and one model (GPT-4o, 2024-11-20). Some researchers claim that GPT-4o is especially prone to hallucination in research tasks (and perhaps it is), but it is still a heavily used model in current off-the-shelf AI research tools (and if you're using off-the-shelf tools, you won't always know which models they're using unless you read a whole lot of fine print).
But the point is: I think this is exactly the level at which we should be scrutinizing the output of *all* LLMs in research.
AI absolutely has its place in the modern researcher’s toolkit. But until we systematically evaluate its strengths and weaknesses, we're rolling the dice every time we use it.
We'll be running a second round of my workshop in June as part of Rosenfeld Media’s Designing with AI conference (ticket prices go up tomorrow; register with code PAINE-DWAI2025 for a discount). Or, to hear about other upcoming workshops and events from me, sign up for my mailing list (links below).
The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
More than 160 new AI data centers have been built across the US in the past three years in regions already grappling with scarce water resources, a Bloomberg News analysis finds.
I believe that nature has incontrovertibly told us that diversity survives and monocultures fail. The same goes for the scientific method: diverse thinking and approaches get us closer to quality truth; and…
The 60% Problem — How AI Search Is Draining Your Traffic
Roughly 60% of searches now yield no clicks at all, as AI-generated answers satisfy them directly on the search results page. Here are some insights on how to win back that traffic.
A whole lot of people – including computer scientists who should know better and academics who are usually thoughtful – are caught up in fanciful, magical beliefs about chatbots. Any su…
Time for a Pause: Without Effective Public Oversight, AI in Schools Will Do More Harm Than Good.
Ignoring their own well-publicized calls to regulate AI development and to pause implementation of its applications, major technology companies such as Google, Microsoft, and Meta are racing to fend off regulation and integrate artificial intelligence (AI) into their platforms. The weight of the available evidence suggests that the current wholesale adoption of unregulated AI applications in schools poses a grave danger to democratic civil society and to individual freedom and liberty. Years of warnings and precedents have highlighted the risks posed by the widespread use of pre-AI digital technologies in education, which have obscured decision-making and enabled student data exploitation. Without effective public oversight, the introduction of opaque and unproven AI systems and applications will likely exacerbate these problems. This policy brief explores the harms that are likely if lawmakers and others do not step in with carefully considered measures to prevent these extensive risks. The authors urge school leaders to pause the adoption of AI applications until policymakers have had sufficient time to thoroughly educate themselves and develop legislation and policies ensuring effective public oversight and control of school applications.

Suggested citation: Williamson, B., Molnar, A., & Boninger, F. (2024). Time for a pause: Without effective public oversight, AI in schools will do more harm than good. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/ai
Meta Stole Millions of Books to Train AI—Then Called Them Worthless. Now They’re Suing to Silence One.
Meta stole millions of books to build its AI empire—then declared them worthless, profited from every word, moved to silence the whistleblower, and is now trying to outlaw the very theft it perfected.

Meta’s Great AI Heist
Meta scraped over 7 million pirated books to train its LLaMA models—including…
I Tested The AI That Calls Your Elderly Parents If You Can't Be Bothered
inTouch says on its website "Busy life? You can’t call your parent every day—but we can." My own mum said she would feel terrible if her child used it.
LinkedIn’s AI action figure fad is ‘obviously unsustainable,’ warns UK tech mogul
If you’ve been scrolling social media over the past week, you may have noticed miniature action figure versions of friends, family, or colleagues neatly wrapped in a blister pack.
Big tech’s water-guzzling data centers are draining some of the world’s driest regions
Amazon, Google, and Microsoft are expanding data centers in areas already struggling with drought, raising concerns about their use of local water supplies for cooling massive server farms. Luke Barratt and Costanza Gambarini report for The Guardian. In short: The three largest cloud companies are buil…