
AI chatbots are mostly correct, but incomplete, on endometriosis

UTSW study evaluates three increasingly popular sources for medical information

A study led by UT Southwestern researchers found that information about endometriosis from three different chatbots was mostly accurate but incomplete. (Photo Credit: Getty Images)

DALLAS – Feb. 20, 2025 – Three of the leading chatbots can provide basic information about endometriosis, a painful gynecologic condition that affects up to 1 in 10 women, but their responses are not as comprehensive as the guidance from health care providers, according to a study by UT Southwestern Medical Center researchers. Their findings, published in the American Journal of Obstetrics and Gynecology, sound a cautionary note for patients who turn to generative artificial intelligence (AI) for medical information.

Kimberly Kho, M.D., is Professor of Obstetrics and Gynecology and holds the Helen J. and Robert S. Strauss and Diana K. and Richard C. Strauss Chair in Women’s Health at UT Southwestern.

“We did this study because we wanted to know what patients are learning from these chatbots. Is it accurate? Is it reliable? Is it aligning with updated clinical recommendations and what we know from current research?” asked study leader Kimberly Kho, M.D., Professor of Obstetrics and Gynecology at UT Southwestern. “Our results affirm that responses from a chatbot cannot replace a proper evaluation and management by skilled experts for this and other diseases.”

AI chatbots have attracted significant attention since OpenAI’s release of ChatGPT in November 2022. Several other chatbots are built on similar large language models, including Claude (developed by Anthropic) and Gemini (developed by Google and formerly known as Bard). Each of these chatbots generates responses based on a wealth of publicly available data. Over the last few years, they have permeated many industries, including medicine.

Patients are increasingly turning to chatbots for medical information, either directly or through their incorporation into search engines, such as Google. However, the quality of answers delivered by these sources has been unclear, Dr. Kho explained. Studies designed to evaluate their output have largely focused on information about cancer, she added, while benign gynecologic conditions haven’t been well explored. These include endometriosis, a common disease in which tissue similar to the uterine lining grows outside the uterus, often causing pain, inflammation, and infertility.

To determine how well popular chatbots answer questions about endometriosis, Dr. Kho and her colleagues collected answers from ChatGPT-4, Claude, and Gemini after posing 10 questions patients often ask about this disease. Examples include: “What is endometriosis?” “How common is endometriosis?” and “How is endometriosis treated?” They then asked nine board-certified gynecologists to rate the accuracy and completeness of the answers based on current evidence-based guidelines.

The medical experts found that answers generated by all three chatbots were mostly accurate, with more correct answers about symptoms and disease processes than about treatment or risk of recurrence. However, Dr. Kho said, the physicians determined that some answers were incomplete. This inadequacy might be due to several factors, she explained, including a lack of patient-specific context in the questions, not enough chatbot training data reflecting the most recent advances in clinical practice, and a lack of consensus among experts in the field. Among the three chatbots studied, ChatGPT delivered the most comprehensive and correct responses.

Based on these results, Dr. Kho said, chatbots could serve as a useful starting point for medical information, but patients should still see their physicians to address questions and concerns. Medical experts need to be consulted and involved in the quality control process for health care-specific chatbots currently in development, she added.

Dr. Kho holds the Helen J. and Robert S. Strauss and Diana K. and Richard C. Strauss Chair in Women’s Health.

Other UTSW researchers who contributed to this study include first author Natalie D. Cohen, M.D., Assistant Instructor of Obstetrics and Gynecology; Donald McIntire, Ph.D., Professor of Obstetrics and Gynecology; Katherine Smith, M.D., Assistant Professor of Obstetrics and Gynecology; and Milan Ho, B.S., medical student.

About UT Southwestern Medical Center

UT Southwestern, one of the nation’s premier academic medical centers, integrates pioneering biomedical research with exceptional clinical care and education. The institution’s faculty members have received six Nobel Prizes and include 25 members of the National Academy of Sciences, 23 members of the National Academy of Medicine, and 14 Howard Hughes Medical Institute Investigators. The full-time faculty of more than 3,200 is responsible for groundbreaking medical advances and is committed to translating science-driven research quickly to new clinical treatments. UT Southwestern physicians provide care in more than 80 specialties to more than 120,000 hospitalized patients, attend to more than 360,000 emergency room cases, and oversee nearly 5 million outpatient visits a year.