In Harvard study, AI offered more accurate diagnoses than emergency room doctors
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians.
In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two attending physicians to those generated by OpenAI’s o1 and 4o models. These diagnoses were assessed by two other attending physicians, who did not know which ones came from humans and which came from AI.
“At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o,” the study said, adding that the differences “were especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.”
In Harvard Medical School’s press release about the study, the researchers emphasized that they did not “pre-process the data at all” — the AI models were presented with the same information that was available in the electronic medical records at the time of each diagnosis.
With that information, the o1 model managed to offer “the exact or very close diagnosis” in 67% of triage cases, compared to one physician who had the exact or close diagnosis 55% of the time, and to the other who hit the mark 50% of the time.
“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” said Arjun Manrai, who heads an AI lab at Harvard Medical School and is one of the study’s lead authors, in the press release.
To be clear, the study didn’t claim that AI is ready to make real life-or-death decisions in the emergency room. Instead, it said the findings show an “urgent need for prospective trials to evaluate these technologies in real-world patient care settings.”
The researchers also noted that they only studied how models performed when provided with text-based information, and that “existing studies suggest that current foundation models are more limited in reasoning over nontext inputs.”
Adam Rodman, a Beth Israel doctor who’s also one of the study’s lead authors, told the Guardian that there’s “no formal framework right now for accountability” around AI diagnoses, and that patients still “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.”
Anthony Ha is TechCrunch’s weekend editor. Previously, he worked as a tech reporter at Adweek, a senior editor at VentureBeat, a local government reporter at the Hollister Free Lance, and vice president of content at a VC firm. He lives in New York City.
You can contact or verify outreach from Anthony by emailing anthony.ha@techcrunch.com.
Key takeaways
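- In one experiment on 76 emergency room patients, OpenAI's o1 model offered the exact or a very close diagnosis in 67% of triage cases, compared to 55% and 50% for the two attending physicians.
- The researchers said they did not pre-process the data: the models were given the same electronic medical record information that was available at the time of each diagnosis.
- The study did not claim that AI is ready to make real decisions in the emergency room; it called for prospective trials in real-world patient care settings.
- Adopting AI in healthcare may require investments in technology and training, but it could improve the quality of care.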
Editorial analysis
The study, led by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center, highlights a significant advance in applying artificial intelligence in medical contexts, especially in critical settings such as emergency rooms. For the Brazilian tech sector, this represents an opportunity for innovation, as the country has an emerging ecosystem of startups focused on digital health and AI. The accuracy demonstrated by OpenAI's language models suggests that, with proper regulation and oversight, AI could become a valuable tool for assisting doctors with diagnoses, particularly in environments where time is a critical factor.
Moreover, the researchers stressed that they did not pre-process the data: the models worked from the same electronic medical record information that was available at the time of each diagnosis. This raises questions about the need for adequate infrastructure for collecting and managing health data in Brazil, where many hospitals still face challenges in digitizing records. Adopting AI in diagnostics may require investments in technology and training, but it could also lead to significant improvements in the quality of care.
It is crucial to observe how Brazilian health institutions will respond to these advancements. The integration of AI into healthcare services can not only optimize processes but also democratize access to quality diagnostics. However, implementation must be done cautiously, considering ethical implications and the need for human oversight. The future of medicine may well depend on the collaboration between humans and machines, and Brazil has the chance to lead this transformation in Latin America.
Original source: TechCrunch AI
About this article
This article was curated and published by AIDaily as part of our editorial coverage of artificial intelligence developments. The content is based on the original source cited below, enriched with editorial context and analysis. Automated tools may assist with translation and initial structuring, but publication decisions, factual review, and contextual framing remain editorial responsibilities.