by Anna Zarra Aldrich, University of Connecticut

Since its debut in late 2022, people have experimented with using the AI (artificial intelligence) chatbot ChatGPT for everything from recipe planning to answering trivia to helping with homework. But ChatGPT has been mired in issues concerning its accuracy.

Regular physical activity is critical for health and disease prevention, yet only 25% of U.S. adults meet national physical activity guidelines. Since the public premiere of ChatGPT, people have been using the tool to generate physical activity plans.

A team of researchers from UConn's College of Agriculture, Health and Natural Resources and Hartford Hospital recently investigated the accuracy of ChatGPT's exercise recommendations. They published their findings in JMIR Medical Education, in a special issue dedicated to ChatGPT and Generative Language Models in Medical Education. The team was led by Amanda Zaleski '08, '14, '18 (CAHNR), a senior scientist in the Department of Preventive Cardiology at Hartford Hospital.

"It's a big topic and nobody knows what to do with it," says Distinguished Professor of Kinesiology Linda Pescatello.

The research team developed a formal grading rubric to score the AI-generated exercise recommendations. The rubric included 10 categories that comprise a "gold-standard" exercise recommendation according to the American College of Sports Medicine (ACSM) Guidelines for Exercise Testing and Prescription. Pescatello was already very familiar with these recommendations. She was a member of the 2018 Physical Activity Guidelines Advisory Committee, which established them.

The AI chatbot was then prompted to provide individualized exercise recommendations for all 26 clinical populations for which there exists an evidence base in the ACSM Guidelines. The team then compared AI-generated exercise recommendations to the gold-standard reference and evaluated their comprehensiveness, accuracy, and readability.

The researchers found that ChatGPT's output provided only 41% of the content expected in a gold-standard exercise recommendation, indicating poor comprehensiveness.

The chatbot was able to provide general recommendations, such as getting 150 minutes of exercise per week, but failed to provide guidance on other key elements such as the frequency, intensity, time, and type of physical activity, together known as the FITT principle.

However, of the content provided, ChatGPT output demonstrated high accuracy, around 91%.
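The difference between the two scores is easier to see with a small worked example. The Python sketch below is only an illustration of how a rubric could yield low comprehensiveness alongside higher accuracy. The item names, the `RubricItem` class, and the five-item rubric are hypothetical and are not taken from the study, whose rubric had 10 categories and whose scoring procedure is not described here.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One expected element of a reference recommendation (hypothetical)."""
    name: str
    provided: bool         # did the chatbot output address this element at all?
    correct: bool = False  # if addressed, did it agree with the reference?


def comprehensiveness(items):
    """Share of expected elements that the output addressed."""
    return sum(item.provided for item in items) / len(items)


def accuracy(items):
    """Share of the addressed elements that agree with the reference."""
    addressed = [item for item in items if item.provided]
    if not addressed:
        return 0.0
    return sum(item.correct for item in addressed) / len(addressed)


# Made-up scoring of a single response, for illustration only.
example = [
    RubricItem("weekly_volume", provided=True, correct=True),
    RubricItem("medical_clearance", provided=True, correct=False),
    RubricItem("frequency", provided=False),
    RubricItem("intensity", provided=False),
    RubricItem("special_considerations", provided=False),
]

print(f"comprehensiveness: {comprehensiveness(example):.0%}")
print(f"accuracy: {accuracy(example):.0%}")
```

In this made-up case the response covers two of five expected elements, and one of those two matches the reference. Accuracy is computed only over what was actually said, which is why a response can score well on accuracy while leaving most of a recommendation out.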

The most common source of misinformation was the recommendation to seek medical clearance prior to engaging in any exercise, which is generally not necessary except when a person is having signs and symptoms of disease.

The AI chatbot also failed to account for special considerations, such as how the medications a person is taking may interact with a new exercise regimen.

In addition, the ChatGPT output was written at a college reading level. This is well above the American Medical Association's recommendation that health-related educational material be written at a level a sixth grader could understand.

The researchers recommend caution when following exercise recommendations from ChatGPT, with the understanding that it does not provide a complete physical activity program.

Compounding these findings is another paper from Pescatello's team, led by master's student Shiqi Chen, which analyzed existing mobile apps for exercise recommendations. That work was published in the Journal of Cardiovascular Development and Disease.

They determined that no app currently on the market offers evidence-based exercise recommendations for people with cardiovascular disease risk factors in line with ACSM and American Heart Association guidelines.

"We did not find a single app on the market that did that," Pescatello says.

The 219 apps they studied were highly rated with more than 1,000 reviews, free to download, and not gender specific. A mere 0.5% of the apps were evidence-based. Only 3.7% included a preparticipation screening, and fewer than a third built cardiovascular disease risk profiles.

A majority of the apps (64.8%) focused on body image and/or athletic performance rather than health.

Within this environment of uncertainty and shortcomings, Pescatello has developed her own tool, P3-EX LLC, which offers evidence-based personalized exercise recommendations for those with cardiovascular disease risk factors.

Clinicians can use the tool to generate an exercise recommendation in less than five minutes. This is a critical resource as most doctors do not receive training on generating exercise prescriptions in medical school.

"Our research shows that we're touching on something and identifying an unmet clinical need," Pescatello says.

While the tool is currently only available to clinicians, Pescatello hopes to scale P3-EX so that one day it may be available to the public.

The tool is used in the exercise prescription program at UConn.

"Our mission is to get people to realize the value of exercise and provide that value with tools that can make exercise more accessible and increase adherence," Pescatello says.

More information: Amanda L Zaleski et al, Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study, JMIR Medical Education (2024). DOI: 10.2196/51308

Shiqi Chen et al, Evaluation of Exercise Mobile Applications for Adults with Cardiovascular Disease Risk Factors, Journal of Cardiovascular Development and Disease (2023). DOI: 10.3390/jcdd10120477

Provided by University of Connecticut