A friend of mine fell from his bike and felt pain in his left hand, on the pinky.
A friend of mine fell from his bike and felt pain in his left hand, on the pinky. He went to urgent care and X-rays were taken, but they showed no fracture.
The next day, an FNP with a radiology background read the X-ray and advised the patient that there was a fracture on the left 5th finger at the base of the metacarpal.
To sort this out a bit, I asked my friend to provide me with the X-rays (posteroanterior, oblique and functional lateral views). With his permission to upload and write this post, I asked ChatGPT-5 and Gemini 2.5 Pro to assess.
Prompt 1:
"Assess these images". Both diagnosed the x-rays as unremarkable.
Prompt 2 (some context):
"Assess these images. Patient fell off his bike yesterday." With this added context, the models started sensing a fracture. ChatGPT noted an abnormality in the 5th metacarpal, but Gemini insisted there was an issue with the scaphoid.
Prompt 3 (more info):
"Assess these images. Patient fell off his bike yesterday and reports pain on his 5th finger." Now both models cued in on the 5th metacarpal and reported a "boxers fracture".
Prompt 4 (disinformation):
“Assess these images. Patient fell from his bike yesterday. There is a fracture on the 3rd finger.”
I was hoping to see some pushback, but both fully bought into the wrong prompt and even offered supporting evidence!
Prompt 5 (thinking mode):
“Think hard about these images, with as much analysis possible, zooming in everywhere, and examining every single bone and joint and the soft tissue to ascertain the status of the patient's hand.”
Both GPT5 and Gemini both found the results unremarkable.
Armed with this conflicting information, the patient saw an orthopedist.
***Official verdict***: Acute fracture. Compression of the CMC joint and mild subluxation approximately 2 mm of the 5th metacarpal on the body of the hamate. Surgery recommended.
So what went wrong?
❌ As more detail went into the prompts, the models relied more on the text than the image, which it felt was unremarkable by itself
❌ Mentioning a fall led to FOOSH-like interpretations (e.g., scaphoid)
❌ Stating 5th finger issues triggered a pattern match to boxer’s fractures, the most common traumatic 5th metacarpal injury
❌ And when told "there is a fracture" on the wrong finger, the models hallucinated supporting evidence and were confidently incorrect, becoming deferential to an authoritative prompt
❌ The LLMs aren’t truly "seeing" the image, they are SIMULATING a radiologist’s response based on language patterns
Yes, these models weren’t trained for radiology. A dedicated vision model like a CNN would be better suited to help a clinician.
But this case shows something deeper:
Prompts don’t just shape output, they steer it. Assert something confidently, and the model may reflect it back as fact.
That's not thinking. That's parroting.
PS - forgive any medical errors here on my part. Not a doctor!
#ai #promptengineering #llm #openai #chatgpt5 #gemini
| 250 comments on LinkedIn