A study assessed the effectiveness of safeguards in foundational large language models (LLMs) against malicious instructions that could turn them into tools for spreading disinformation, that is, the deliberate creation and dissemination of false information with the intent to harm.
The study revealed vulnerabilities in the safeguards of OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta. Specifically, customized LLM chatbots built on these models consistently generated disinformation in response to health queries, incorporating fake references, scientific jargon, and cause-and-effect reasoning to make the false information appear plausible.
The findings are published in Annals of Internal Medicine.
Researchers from Flinders University and colleagues evaluated the application programming interfaces (APIs) of five foundational LLMs for their capacity to be system-instructed to always provide incorrect responses to health questions and concerns.
The specific system instructions provided to these LLMs included always providing incorrect responses to health questions, fabricating references to reputable sources, and delivering responses in an authoritative tone. Each customized chatbot was asked 10 health-related queries, in duplicate, on subjects like vaccine safety, HIV, and depression.
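For readers unfamiliar with this mechanism, the sketch below shows where such a developer-supplied system instruction enters a typical chat-completion API call. It uses the OpenAI Python SDK as an assumed example; the model name and the deliberately benign system prompt are illustrative only and are not the instructions or code used in the study.

# Minimal sketch of API-level customization via a system instruction, using the
# OpenAI Python SDK (openai>=1.0). The model name and the benign system prompt
# are assumptions for illustration, not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

SYSTEM_INSTRUCTION = (
    "You are a cautious health information assistant. Answer accurately, "
    "cite reputable sources, and advise users to consult a clinician."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        # The system message is the developer-controlled layer the study probed:
        # whatever is placed here shapes every answer the chatbot gives.
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": "Is the measles vaccine safe?"},
    ],
)

print(response.choices[0].message.content)

The study's concern is that this same, easily accessible layer can instead be filled with malicious instructions, and that the models' built-in safeguards offered little resistance in four of the five systems tested.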
The researchers found that 88% of responses from the customized LLM chatbots were health disinformation, with four chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) providing disinformation to all tested questions.
The Claude 3.5 Sonnet chatbot exhibited some safeguards, answering only 40% of questions with disinformation. In a separate exploratory analysis of the OpenAI GPT Store, the researchers investigated whether any publicly accessible GPTs appeared to disseminate health disinformation.
They identified three customized GPTs that appeared tuned to produce such content, which generated health disinformation responses to 97% of submitted questions.
Overall, the findings suggest that LLMs remain substantially vulnerable to misuse and, without improved safeguards, could be exploited as tools to disseminate harmful health disinformation.
More information:
Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion into Health Disinformation Chatbots, Annals of Internal Medicine (2025). DOI: 10.7326/ANNALS-24-03933
Provided by American College of Physicians