Credit: AI-generated image
If you've been to a medical appointment recently, you may have already interacted with AI. As you describe your symptoms to the doctor, they may ask your permission to use an "AI scribe" to convert the audio into medical notes in real time.
Or maybe you have typed your symptoms into ChatGPT to get a possible diagnosis, sometimes reassuring, sometimes alarming.
Artificial intelligence (AI) for health care is increasingly being trialed in hospitals, clinics and even on our phones.
Chatbots powered by large language models are being promoted as a way to fill gaps in health care, especially where doctors are scarce.
But our new research has found that while AI chatbots such as ERNIE Bot, ChatGPT and DeepSeek show promise, they also pose significant risks, ranging from overtreatment to reinforcing inequality. The findings are published in the journal npj Digital Medicine.
Global tools, local risks
AI already plays a role in many areas of health care, from reading X-rays to powering triage chatbots.
Over 10% of Australian adults reported using ChatGPT for health-related questions in the first half of 2024, with many seeking clinical advice rather than basic information, highlighting AI's growing influence in health decision-making.
But most research has focused on how accurate these tools are in theory, not how they behave with patients in practice.
Our study is among the first to rigorously test chatbot performance in simulated real-world consultations, making the findings particularly relevant as governments and hospitals race to adopt AI solutions.
We tested ERNIE Bot, a widely used Chinese chatbot, alongside OpenAI's ChatGPT and DeepSeek, two of the most advanced global models.
We compared their performance with that of human primary care doctors using simulated patient cases.
We also tested for disparity by systematically varying patient characteristics, including age, gender, income, residence and insurance status, in standardized patient profiles, and then analyzing whether the chatbot's quality of care changed across these groups.
We presented common everyday symptoms such as chest pain or breathing difficulties. For example, a middle-aged patient reports chest tightness and shortness of breath after light activity.
The bot or doctor is expected to ask about risk factors, order an ECG, and consider angina as a possible diagnosis.
A younger patient complains of wheezing and difficulty breathing that worsens with exercise. The expected response is to confirm asthma and prescribe appropriate inhalers.
The same symptoms were presented with different patient profiles, for example an older versus a younger patient, or a patient with higher versus lower income, to see whether the chatbot's recommendations changed.
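To make the design concrete, here is a minimal sketch of how standardized vignettes like these can be generated, holding the symptoms fixed while systematically varying patient attributes. This is an illustration only, not the study's actual code; the attribute names, values and prompt wording are hypothetical.

```python
from itertools import product

# Hypothetical example: one fixed symptom description, crossed with
# every combination of demographic attributes, yields a set of
# otherwise-identical vignettes whose responses can be compared.
SYMPTOMS = "wheezing and difficulty breathing that worsens with exercise"

ATTRIBUTES = {
    "age": ["younger", "older"],
    "income": ["lower-income", "higher-income"],
    "insurance": ["insured", "uninsured"],
}

def build_vignettes(symptoms, attributes):
    """Return (profile, prompt) pairs, one per attribute combination."""
    keys = list(attributes)
    vignettes = []
    for combo in product(*(attributes[k] for k in keys)):
        profile = dict(zip(keys, combo))
        prompt = (
            f"A {profile['age']}, {profile['income']}, {profile['insurance']} "
            f"patient presents with {symptoms}. "
            "What is your diagnosis and management plan?"
        )
        vignettes.append((profile, prompt))
    return vignettes

vignettes = build_vignettes(SYMPTOMS, ATTRIBUTES)
print(len(vignettes))  # 2 * 2 * 2 = 8 profile variants of the same case
```

Because only the demographic fields differ between prompts, any systematic difference in the recommended tests or prescriptions can be attributed to the patient's profile rather than the clinical picture.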
Accuracy meets overuse and inequality
All three AI chatbots, ERNIE Bot, ChatGPT and DeepSeek, were highly accurate at reaching a correct diagnosis, outperforming the human doctors.
However, the AI chatbots were far more likely than doctors to suggest unnecessary tests and medications.
In fact, they recommended unnecessary tests in more than 90% of cases and prescribed inappropriate medications in more than half.
For example, when presented with a patient wheezing from asthma, the chatbot often recommended antibiotics or ordered expensive CT scans, neither of which is supported by clinical guidelines.
AI performance also varied by patient background.
For example, older and wealthier patients were more likely to receive extra tests and prescriptions.
Our findings show that while AI chatbots could help broaden health care access, especially in countries where many people lack reliable primary care, without oversight they could also drive up costs, expose patients to harm and make inequality worse.
Health care systems need to design safeguards, such as fairness checks, clear audit trails and mandatory human oversight for high-stakes decisions, before these tools are widely adopted.
Our research is timely, given the global excitement and concern around AI.
While chatbots could help fill critical gaps in health care, especially in low- and middle-income countries, we need to carefully balance innovation with safety and fairness.
Co-designing AI for safety and justice
There is an urgent need to co-design safe and responsible AI chatbots for use in daily life, particularly in delivering reliable health information.
AI is coming to health care whether we are ready or not.
By identifying both its strengths and risks, our study provides evidence to guide how we use these powerful new tools safely, fairly and responsibly.
We hope to continue this important area of research in Australia to ensure AI technologies are developed with equity and trust at their core and are beneficial for our community.
More information:
Yafei Si et al, Quality, safety and disparity of an AI chatbot in managing chronic diseases: simulated patient experiments, npj Digital Medicine (2025). DOI: 10.1038/s41746-025-01956-w
Provided by
University of Melbourne
This article was first published on Pursuit. Read the original article here.
Citation:
AI chatbots often outperform doctors in diagnosis, but need safeguards to avoid overprescribing (2025, October 3)
retrieved 3 October 2025
from https://medicalxpress.com/news/2025-10-ai-chatbots-outperform-doctors-diagnosis.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

