Your doctor's AI notetaker may be making things up, Ontario audit finds
Source: Ars Technica
In recent years, many overworked doctors have turned to so-called AI medical scribes to help automatically summarize patient conversations, diagnoses, and care decisions into structured notes for health record logging. But a recent audit by the auditor general of Ontario found that AI scribes recommended by the provincial government regularly generated incorrect, incomplete, and hallucinated information that could result in inadequate or harmful treatment plans and, ultimately, worse patient health outcomes.
In a recent report on Use of Artificial Intelligence in the Ontario Government, the auditor general reviewed transcription tests of two simulated patient-doctor conversations performed across 20 AI scribe vendors that were approved and pre-qualified by the provincial government for purchase by healthcare providers. All 20 of those vendors showed some issue with accuracy or completeness in at least one of these simple tests, including nine that hallucinated patient information, 12 that recorded information incorrectly, and 17 that missed key details about discussed mental health issues.
In the report, the auditor general points out multiple concerning examples of mistakes in those summaries that could have a direct and negative impact on a patient's subsequent care. That includes situations where an AI scribe hallucinated nonexistent referrals for blood tests or therapy, incorrectly transcribed the names of prescription medications, or missed key details of mental health issues discussed in the simulated conversations.
Across all approved vendors, the average tested AI scribe scored only a 12 out of 20 on the "accuracy of medical notes generated" section of Supply Ontario's evaluation rubric. But that seemingly key accuracy metric was responsible for only about 4 percent of a vendor's overall score, making it easy to meet the minimum threshold for approval even if an AI scribe scored a zero on the accuracy metric (a separate metric measuring domestic presence in Ontario was worth 30 percent of the overall scoring).
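To make that weighting concrete, here's a minimal sketch of the arithmetic. Only the ~4 percent accuracy weight and the 30 percent domestic-presence weight come from the report; the remaining categories, their weights, and the approval threshold are hypothetical placeholders invented for illustration.

```python
# Hypothetical weighted rubric: only "accuracy" (~4%) and
# "domestic_presence" (30%) come from the audit; the rest are
# made-up placeholders that fill out the remaining 66%.
WEIGHTS = {
    "accuracy": 0.04,
    "domestic_presence": 0.30,
    "security_privacy": 0.25,   # hypothetical
    "integration": 0.21,        # hypothetical
    "cost": 0.20,               # hypothetical
}

def overall_score(raw: dict[str, float]) -> float:
    """Weighted sum of per-category scores, each normalized to 0..1."""
    return sum(WEIGHTS[cat] * raw.get(cat, 0.0) for cat in WEIGHTS)

# A vendor that scores 0/20 on accuracy but full marks everywhere else.
vendor = {
    "accuracy": 0 / 20,
    "domestic_presence": 1.0,
    "security_privacy": 1.0,
    "integration": 1.0,
    "cost": 1.0,
}

print(f"overall: {overall_score(vendor):.0%}")  # 96% -- clears a 70% bar easily
```

Under any weighting in this ballpark, the accuracy section is arithmetically incapable of sinking a vendor's overall score on its own.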
-snip-
Read more: https://arstechnica.com/health/2026/05/your-doctors-ai-notetaker-may-be-making-things-up-ontario-audit-finds/
hlthe2b
(114,602 posts)AI COULD be useful in aiding difficult diagnoses--compiling rule-out lists and developing an appropriate diagnostic pathway, whether with bloodwork, routine or more complex scans, and work-up of other body systems based on the initial (thorough) physical exam and history.
But what I fear (and there is reason to believe this may already be occurring, especially with less experienced physicians in ERs and other high-stress environments) is that AI is used to go backward to create the history and physical findings to support a "knee-jerk" diagnosis. While imaging and labs may be self-correcting to some extent, that's one hell of an expensive price tag, and without an accurate early assessment of actual findings, the right tests may not all have been ordered in the first place. And what CRITICAL finding might they MISS in doing this?
AI in the hands of some can be a real scourge for many disciplines--especially when it encourages shortcuts and reliance without validation.
ToxMarz
(3,052 posts)It should all be reviewed and approved by the appropriate entities who are SOLEY resposible for the results and any liability or repercussions. You can't punish AI, the only way is to have real accountabilty for the users who have been entrusted. If they won't vouch for the results produced with AI personally, they shouldn't be using it.
hlthe2b
(114,602 posts)up against a horrifically understaffed and overworked medical system--whether it be at the medical school academic level, the nation's emergency departments (with waits quite often approaching 8-12 hours), the best experienced staff so burnt out after COVID or the atrocities wrought by the Trump/RFK Jr /Oz administration, not to mention the verbal and increasing physical violence from the angry public, the "should do" level of supervision is damned near impossible.
So, I dream of the day when I could once again jump up on my high horse, wag my finger, and castigate, but the truth is there are reasons why this is occurring, and it will continue unless major changes come to health care. Once upon a time, the desire to deliver the best care and the pride that came from it (as well as the learning experiences) were enough to prevent technology like AI from being abused. For a few, it was only the fear of a lawsuit. But now, ERs and ICUs are the "walking wounded" in terms of their still-dedicated staff, as are many areas of academic medical training. We can only ask so much before only the ultra-rich who can afford concierge-style medical care will receive what most of us believe should be provided for all. But I think my point has been made... sigh
Pobeka
(5,009 posts)I understand the rationale -- the physician can focus on the conversation with the patient instead of typing notes.
The pressure from management is probably about getting one extra patient per day in the door.
I never had an issue with the appointment notes -- I have a very good cardiologist -- she has always got it right.
... until the AI-assisted write-up. AI has no idea about context; it merely flagged a keyword that it "heard," completely out of context, and indicated we had a discussion that we didn't actually have. My cardiologist didn't catch it. My view is that she wouldn't have written the erroneous comment in the first place had she been typing the notes directly.
I worry about health professionals, young or old. Proofreading is hard. It's so easy to get distracted and lose focus while the words presented to the brain make enough sense to sound correct, yet are actually incorrect.
hlthe2b
(114,602 posts)sans any consideration. This will be a wide problem, but most serious among those without the additional training and experience to know the pitfalls.
Not to mention how AI selects (or fails to select) the source publications and meta-analyses it uses to respond on any issue: the overemphasis on readily accessible but unvalidated or less reliable published studies, or the total failure to give consensus guidelines due consideration. Having spent time on several big issues comparing Claude, ChatGPT, and one other I won't name, I can say this is a real issue.
UpInArms
(55,319 posts)It should not be used to replace human beings. It is an awful buggy program.
Captain Zero
(8,951 posts)Kind of as an aside, that their data may contain AI errors, and, that this employee was correcting some of it. Aetna uses Signify.
Did the employee correct it all?
Did the employee use any AI measures themselves? Do Aetna and Signify capitalize any AI collection with data brokers? Do AI measures get internally validated?
UpInArms
(55,319 posts)told me that her workplace insisted that she use AI
She said that it was wrong more than right, and that she spent more time correcting it than it saved.
spinbaby
(15,404 posts)Years ago, I sneezed during a CT scan. The technician decided that meant I was allergic to contrast. I was not. Took years to get that out of my record. Now AI can insert even more errors that will live forever.