by UT Southwestern Medical Center
Three of the leading chatbots can provide basic information about endometriosis, a painful gynecologic condition that affects up to 1 in 10 women, but their responses are not as comprehensive as the guidance from health care providers, according to a study by UT Southwestern Medical Center researchers. Their findings, published in AJOG Global Reports, sound a cautionary note for patients who turn to generative artificial intelligence (AI) for medical information.
"We did this study because we wanted to know what patients are learning from these chatbots. Is it accurate? Is it reliable? Is it aligning with updated clinical recommendations and what we know from current research?" asked study leader Kimberly Kho, M.D., Professor of Obstetrics and Gynecology at UT Southwestern.
"Our results affirm that responses from a chatbot cannot replace a proper evaluation and management by skilled experts for this and other diseases."
AI chatbots have attracted significant attention since OpenAI's release of ChatGPT in November 2022. Several other chatbots are built on similar large language models, including Claude (developed by Anthropic) and Gemini (developed by Google and formerly known as Bard). Each of these chatbots generates its responses by drawing on a wealth of publicly available data. Over the last few years, they have spread into many industries, including medicine.
Figure: Average score for each model, by question. Credit: AJOG Global Reports (2024). DOI: 10.1016/j.xagr.2024.100405
Patients are increasingly turning to chatbots for medical information, either directly or through their incorporation into search engines, such as Google. However, the quality of answers delivered by these sources has been unclear, Dr. Kho explained.
Studies designed to evaluate their output have largely focused on information about cancer, she added, while benign gynecologic conditions haven't been well explored. These include endometriosis, a common disease in which tissue similar to the uterine lining grows outside the uterus, often causing pain, inflammation, and infertility.
To determine how well popular chatbots answer questions about endometriosis, Dr. Kho and her colleagues collected answers from ChatGPT-4, Claude, and Gemini after posing 10 questions patients often ask about this disease. Examples include: "What is endometriosis?" "How common is endometriosis?" and "How is endometriosis treated?" They then asked nine board-certified gynecologists to rate the accuracy and completeness of the answers based on current evidence-based guidelines.
The medical experts found that answers generated by all three chatbots were mostly accurate, with more correct answers about symptoms and disease processes than about treatment or risk of recurrence. However, Dr. Kho said, the physicians determined that some answers were incomplete.
These gaps might be due to several factors, she explained, including a lack of patient-specific context in the questions, too little training data reflecting the most recent advances in clinical practice, and a lack of consensus among experts in the field. Among the three chatbots studied, ChatGPT delivered the most comprehensive and correct responses.
Based on these results, Dr. Kho said chatbots could serve as a useful starting point for medical information, but patients should still see their physicians to address questions and concerns. Medical experts need to be consulted and involved in the quality control process for health care-specific chatbots currently in development, she added.
More information: Natalie D. Cohen et al, A comparative analysis of generative artificial intelligence responses from leading chatbots to questions about endometriosis, AJOG Global Reports (2024). DOI: 10.1016/j.xagr.2024.100405