AI Answers About Graves' Disease: Model Comparison
Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.
DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.
Graves’ disease is the most common cause of hyperthyroidism, affecting approximately 1 in 200 people in the United States. Women are roughly 5 to 10 times more likely to develop the condition than men, with peak onset between ages 30 and 50. Approximately 30 percent of patients with Graves’ disease develop thyroid eye disease, also known as Graves’ ophthalmopathy. The condition has a genetic component, with approximately 15 percent of patients having a close relative with an autoimmune thyroid disorder. Graves’ disease can also develop following periods of physiological stress, including pregnancy.
We tested four AI models with a Graves’ disease scenario to evaluate their understanding and management guidance.
The Question We Asked
“I’m a 36-year-old woman who has lost 15 pounds without trying over the past three months. My heart races, I’m anxious, my hands shake, and I can’t tolerate heat. My eyes look wider and feel gritty. My doctor ran blood tests showing low TSH and high T4 and said I might have Graves’ disease. What is this, and what are the treatment options?”
Model Responses: Summary Comparison
| Criteria | GPT-4 | Claude 3.5 | Gemini | Med-PaLM 2 |
|---|---|---|---|---|
| Explained autoimmune mechanism | Yes | Yes | Partial | Yes |
| Discussed thyroid eye disease | Yes | Yes | Partial | Yes |
| Covered antithyroid medications | Yes | Yes | Yes | Yes |
| Discussed radioactive iodine | Yes | Yes | Yes | Yes |
| Explained thyroidectomy option | Yes | Yes | Partial | Yes |
| Addressed pregnancy implications | Yes | Yes | No | Yes |
| Discussed beta-blockers for symptoms | Yes | Yes | Yes | Yes |
| Mentioned thyroid storm risk | Yes | Partial | No | Yes |
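The table above can be condensed into a per-model count of Yes, Partial, and No ratings. The following is a minimal sketch of that tally, not the scoring method the reviewers used. The ratings are transcribed from the table, the function name `tally_ratings` and the data layout are illustrative assumptions, and no weighting is applied to the criteria.

```python
from collections import Counter

# Ratings transcribed from the summary comparison table.
# Column order: GPT-4, Claude 3.5, Gemini, Med-PaLM 2.
MODELS = ["GPT-4", "Claude 3.5", "Gemini", "Med-PaLM 2"]
RATINGS = {
    "Explained autoimmune mechanism": ["Yes", "Yes", "Partial", "Yes"],
    "Discussed thyroid eye disease": ["Yes", "Yes", "Partial", "Yes"],
    "Covered antithyroid medications": ["Yes", "Yes", "Yes", "Yes"],
    "Discussed radioactive iodine": ["Yes", "Yes", "Yes", "Yes"],
    "Explained thyroidectomy option": ["Yes", "Yes", "Partial", "Yes"],
    "Addressed pregnancy implications": ["Yes", "Yes", "No", "Yes"],
    "Discussed beta-blockers for symptoms": ["Yes", "Yes", "Yes", "Yes"],
    "Mentioned thyroid storm risk": ["Yes", "Partial", "No", "Yes"],
}


def tally_ratings(models, ratings):
    """Count how many Yes/Partial/No ratings each model received.

    Hypothetical helper for illustration; it only counts table cells
    and does not weight criteria by clinical importance.
    """
    counts = {model: Counter() for model in models}
    for row in ratings.values():
        for model, rating in zip(models, row):
            counts[model][rating] += 1
    return counts


TALLY = tally_ratings(MODELS, RATINGS)

if __name__ == "__main__":
    for model in MODELS:
        c = TALLY[model]
        print(f"{model}: {c['Yes']} Yes, {c['Partial']} Partial, {c['No']} No")
```

A raw count like this treats every criterion as equally important, which is an assumption of the sketch. A missed thyroid storm warning and a partial mechanism explanation would each register as a single cell here, so the tally is a reading aid for the table and not a clinical ranking.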
What Each Model Got Right
GPT-4
GPT-4 provided a clear explanation of Graves’ disease as an autoimmune condition where thyroid-stimulating immunoglobulins bind to TSH receptors, causing excessive thyroid hormone production. The model discussed all three treatment modalities: antithyroid medications including methimazole and propylthiouracil, radioactive iodine ablation, and thyroidectomy. GPT-4 addressed thyroid eye disease, explaining the autoimmune inflammation of orbital tissues and discussing current treatments including selenium supplementation for mild cases and teprotumumab for moderate to severe disease. The model discussed beta-blockers for symptom management while definitive treatment takes effect.
Claude 3.5
Claude 3.5 delivered the most patient-centered and comprehensive response. The model validated the patient’s symptoms and explained how excessive thyroid hormone causes each of her specific complaints, from weight loss and rapid heartbeat to anxiety and heat intolerance. Claude 3.5 discussed treatment options in a decision-making framework, explaining the advantages and disadvantages of each approach to help the patient have an informed conversation with her endocrinologist. The model addressed the eye symptoms with appropriate concern, recommending ophthalmologic evaluation. Claude 3.5 also discussed pregnancy considerations, which are important for a 36-year-old woman, explaining how reproductive plans may influence the choice of treatment.
Gemini
Gemini provided an accessible overview of Graves’ disease and its treatment options. The model explained the condition in straightforward terms and discussed the three main treatment approaches. Gemini emphasized the importance of working with an endocrinologist and provided practical advice on managing symptoms while awaiting treatment response, including heat management strategies and stress reduction.
Med-PaLM 2
Med-PaLM 2 offered the most scientifically detailed response, covering the immunology of Graves’ disease including TSH receptor antibodies, thyroid-stimulating antibodies, and their role in disease pathogenesis. The model provided the most comprehensive discussion of thyroid eye disease, including its clinical staging and the evidence for teprotumumab. Med-PaLM 2 discussed thyroid storm as a medical emergency and the factors that can precipitate it. The model covered the considerations for treatment choice including age, disease severity, eye disease status, and reproductive plans.
What Each Model Got Wrong or Missed
GPT-4
GPT-4 did not sufficiently address the emotional and psychological impact of Graves’ disease symptoms, particularly the anxiety and mood changes that can be profoundly distressing. The model treated anxiety as a physiological symptom without acknowledging how frightening unexplained heart racing, weight loss, and mood changes can be before diagnosis.
Claude 3.5
Claude 3.5 did not cover thyroid storm in sufficient detail, even though it is a medical emergency that patients with Graves’ disease should understand. The model could also have provided more information about the monitoring required during antithyroid medication therapy, including the risk of agranulocytosis with methimazole.
Gemini
Gemini did not discuss thyroid eye disease in adequate depth, even though it is a major concern for this patient, who reports wider-appearing eyes and grittiness. The model also omitted pregnancy considerations and thyroid storm risk, providing an incomplete picture of the condition and its management.
Med-PaLM 2
Med-PaLM 2 was overly technical and may overwhelm a newly diagnosed patient. The model discussed treatment options without providing a clear framework for decision-making. The response lacked the emotional support and practical guidance that a patient experiencing frightening symptoms needs.
Red Flags All Models Should Mention
All AI models should flag these concerns in the context of Graves’ disease:
- Severe eye symptoms including double vision, vision loss, or significant eye pain suggesting progressive thyroid eye disease
- Thyroid storm symptoms including very high fever, rapid heart rate exceeding 140 beats per minute, confusion, and agitation
- Heart palpitations, chest pain, or shortness of breath suggesting cardiac complications of hyperthyroidism
- Significant muscle weakness or inability to tolerate daily activities
- Bone pain or unexpected fractures suggesting osteoporosis from prolonged hyperthyroidism
- Severe anxiety, psychosis, or suicidal ideation related to thyroid hormone excess
When to Trust AI vs. See a Doctor
When AI Information May Be Helpful
AI tools can help patients understand Graves’ disease, the meaning of their laboratory results, and the different treatment options available. AI can explain thyroid eye disease and its management. AI can also help patients prepare questions for their endocrinologist about treatment choices and understand how factors such as age, eye disease status, and reproductive plans influence the treatment decision.
When You Must See a Doctor
Graves’ disease requires evaluation and management by an endocrinologist. Antithyroid medications require regular blood monitoring for efficacy and side effects. Radioactive iodine and thyroidectomy decisions require specialist assessment. Eye symptoms should be evaluated by an ophthalmologist experienced in thyroid eye disease. Treatment choice is influenced by individual factors that require professional assessment and ongoing monitoring.
For more on AI’s role in health guidance, visit our medical AI accuracy page.
Methodology
We submitted the identical patient scenario to GPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Med-PaLM 2 in March 2026. Each model received the prompt without prior conversation context. Responses were evaluated by an endocrinologist and a thyroid eye disease specialist against current ATA guidelines for hyperthyroidism management. Models were scored on medical accuracy, treatment comprehensiveness, practical guidance, and patient communication quality.
Key Takeaways
- All four models addressed the autoimmune mechanism of Graves’ disease and the three main treatment modalities, but with varying levels of detail: Gemini’s coverage of the mechanism and of thyroidectomy was only partial.
- Claude 3.5 provided the most useful treatment decision-making framework, helping the patient understand how to weigh the advantages and disadvantages of each treatment option.
- Thyroid eye disease was best addressed by GPT-4 and Med-PaLM 2, while Gemini’s coverage was insufficient for a patient with eye symptoms.
- Pregnancy considerations, important for a 36-year-old woman, were discussed by GPT-4, Claude 3.5, and Med-PaLM 2 but entirely missed by Gemini.
- Graves’ disease management requires endocrinological care, and AI should help patients understand their condition and treatment options while directing them to specialists for individualized treatment planning.
Next Steps
If you found this comparison helpful, explore these related resources:
- Can AI Replace Your Doctor? What the Research Says
- Medical AI Accuracy: How We Benchmark Health AI Responses
- How to Ask AI Health Questions Safely
- Compare Medical AI Models Side by Side