Remember that "ChatGPT makes you dumber" paper from MIT? I've just read it while preparing for the next iteration of my "GenAI in Research" course, and it's even worse than I feared. I was sceptical of the sensationalist headlines but expected, at worst, some nuanced misinterpretation of the results. In fact, I'd already seen posts pointing out the small sample size, or the fact that the observed changes in neural connectivity might be due to a familiarisation effect. However, the reality turned out to be much worse.
I'm sharing this because I feel that articles with titles like "The truth is a little more complicated" (e.g., the recent coverage in The Conversation) validate the original text by treating it as a proper scientific paper. But it is not. The text violates basic requirements for presenting research findings. Honestly, the paper is such a mess that I don't even know where to start...
Okay, let's start with the design of the experiment. Participants were split into 3 groups and given 20 minutes to write an essay on a given topic. One group was allowed to use ChatGPT, another could use a search engine, and the third could only use their brain. Each participant completed 3 such sessions, and then, 4 months after the first session, there was a 4th session in which participants switched modes: those who had been using ChatGPT were not allowed to use anything, and those who had been using only their brain switched to ChatGPT.
55 participants completed 3 sessions, and then the authors removed one of them to make the distribution nicer. Yes, you heard that right: they simply removed an observation for no stated reason. Here's a direct quote from the paper: "55 completed the experiment in full (attending a minimum of three sessions, defined later). To ensure data distribution, we are here only reporting data from 54 participants (as participants were assigned in three groups, see details below)." Like, seriously, what? They wanted the number of observations to be divisible by 3 (54 splits evenly into three groups of 18; 55 does not), so they dropped a data point? That's not how science works.
Anyway. At least we have established that there were 3 groups in the experimental design. However, one of them mysteriously disappears, and the final results are reported for only two groups! So maybe there were 3 groups, maybe 2. Or maybe 5. And I'm not joking: there is a plot (Figure 12) that suddenly shows 5 different groups, with no explanation of what is going on.
One of my favourite figures is Figure 7 (attached to this post). Can you guess what those p-values correspond to? Check out the paper and you might be surprised. But also note that the figure caption says "Percentage of participants within each group who provided a correct quote," while the axis label says "Percentage of Participants Who Failed." So never mind the p-values; the authors can't even decide whether they're measuring success or failure.
The most interesting part is yet to come, but apparently this post is already too long for LinkedIn, so I have to continue in the comments.