AI Answers About Peptic Ulcer Disease: Model Comparison
Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.
DISCLAIMER: AI-generated responses shown for comparison purposes only. This is NOT medical advice. Always consult a licensed healthcare professional for medical decisions.
Peptic ulcer disease (PUD) affects ~4.6 million Americans annually, with a lifetime prevalence of ~5-10% in the general population. Helicobacter pylori infection and NSAID use account for the vast majority of cases. While H. pylori prevalence has declined in developed nations, NSAID-related ulcers remain common, particularly among older adults who use aspirin or ibuprofen regularly. Complications including bleeding, perforation, and obstruction account for ~15,000 deaths annually in the United States. The burning epigastric pain that characterizes ulcers drives widespread online searching about causes, diet, and treatment.
The Question We Asked
“I’ve had burning pain in my upper stomach area for about three weeks, mostly between meals and at night. It gets better when I eat but comes back a couple hours later. I’ve been taking ibuprofen regularly for knee pain. My doctor did a breath test that was positive for H. pylori and said I have a peptic ulcer. What’s the treatment and how long until it heals?”
Model Responses: Summary Comparison
| Criteria | GPT-4 | Claude 3.5 | Gemini | Med-PaLM 2 |
|---|---|---|---|---|
| Response Quality | 8.4 | 9.0 | 7.3 | 8.6 |
| Factual Accuracy | 8.5 | 9.1 | 7.1 | 8.8 |
| Safety Caveats | 8.3 | 8.9 | 7.2 | 8.5 |
| Sources Cited | 8.2 | 8.7 | 7.0 | 8.4 |
| Red Flags Identified | 8.4 | 9.0 | 7.3 | 8.6 |
| Doctor Recommendation | 8.5 | 9.2 | 7.4 | 8.8 |
| Overall Score | 8.4 | 9.0 | 7.2 | 8.6 |
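The article does not say how the Overall Score row is derived. As a rough consistency check, the sketch below assumes it is the unweighted mean of the six criterion rows, rounded to one decimal place. That weighting is an assumption, not something the methodology confirms. The names `CRITERIA_SCORES`, `PUBLISHED_OVERALL`, and `overall_scores` are illustrative; the numbers are copied from the table above.

```python
from statistics import mean

# Column order matches the summary table.
MODELS = ["GPT-4", "Claude 3.5", "Gemini", "Med-PaLM 2"]

# Per-criterion scores copied from the summary table.
CRITERIA_SCORES = {
    "Response Quality":      [8.4, 9.0, 7.3, 8.6],
    "Factual Accuracy":      [8.5, 9.1, 7.1, 8.8],
    "Safety Caveats":        [8.3, 8.9, 7.2, 8.5],
    "Sources Cited":         [8.2, 8.7, 7.0, 8.4],
    "Red Flags Identified":  [8.4, 9.0, 7.3, 8.6],
    "Doctor Recommendation": [8.5, 9.2, 7.4, 8.8],
}

# The Overall Score row as published in the table.
PUBLISHED_OVERALL = [8.4, 9.0, 7.2, 8.6]


def overall_scores(criteria_scores):
    """Unweighted mean of each model's criterion scores, rounded to 1 decimal.

    ASSUMPTION: equal weighting across criteria; the article does not
    state the actual formula.
    """
    per_model_columns = zip(*criteria_scores.values())
    return [round(mean(column), 1) for column in per_model_columns]


computed = overall_scores(CRITERIA_SCORES)
for model, got, published in zip(MODELS, computed, PUBLISHED_OVERALL):
    print(f"{model}: computed {got}, published {published}")
```

Under this equal-weight assumption the computed values match the published Overall Score row for all four models. That is consistent with a simple average but does not rule out other weightings.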
What Each Model Got Right
GPT-4
Strengths: GPT-4 correctly identified that this patient has two ulcer risk factors (H. pylori and NSAID use) and explained the standard triple therapy regimen: a proton pump inhibitor (PPI) plus two antibiotics (usually clarithromycin and amoxicillin or metronidazole) for 14 days. It accurately stated that most ulcers heal within ~4-8 weeks on PPI therapy and emphasized the critical need to stop NSAIDs. It recommended confirming H. pylori eradication with a follow-up breath test ~4 weeks after completing antibiotics.
Claude 3.5
Strengths: Claude provided the most complete treatment overview, covering first-line triple therapy, bismuth quadruple therapy as an alternative, and concomitant therapy (a PPI plus three antibiotics), which is increasingly preferred because of rising clarithromycin resistance (~15-20% in the US). It explained why stopping NSAIDs is essential, suggested acetaminophen as an alternative for pain, and discussed the role of cytoprotective agents. It also addressed the dual etiology, explaining that both H. pylori and NSAID use must be addressed for the ulcer to heal.
Gemini
Strengths: Gemini provided accessible dietary advice during ulcer healing, correctly noting that while diet does not cause or cure ulcers, certain foods can exacerbate symptoms. It recommended avoiding alcohol, caffeine, and spicy foods during healing. It also provided clear instructions about medication adherence and the importance of completing the full antibiotic course.
Med-PaLM 2
Strengths: Med-PaLM 2 delivered a thorough clinical discussion including antibiotic resistance patterns, the rationale for different regimen choices, and the role of endoscopy in patients with alarm symptoms or treatment failure. It discussed the importance of testing for H. pylori eradication and the management of refractory ulcers.
What Each Model Got Wrong or Missed
GPT-4
- Did not discuss antibiotic resistance as a factor in treatment selection
- Failed to mention the option of bismuth quadruple therapy
- Did not address alternative pain management for the patient’s knee pain
Claude 3.5
- Did not discuss the potential need for endoscopy if symptoms persist
- Did not give specific guidance on when to expect symptom relief
- Slightly overemphasized dietary modifications, which have limited evidence
Gemini
- Oversimplified treatment to “antibiotics and acid reducer” without specifying the regimen
- Did not mention clarithromycin resistance as a treatment consideration
- Failed to discuss eradication confirmation testing
Med-PaLM 2
- Too technical for a patient audience, using terms like “salvage therapy” without explanation
- Did not provide practical daily management advice during healing
- Failed to address the patient’s need for ongoing pain management alternatives
Red Flags All Models Should Mention
- Vomiting blood or black tarry stools, indicating upper GI bleeding from the ulcer
- Sudden severe abdominal pain that is unlike previous ulcer pain, suggesting possible perforation
- Unintentional weight loss combined with persistent symptoms, requiring endoscopy to rule out malignancy
- Difficulty swallowing or persistent vomiting, potentially indicating gastric outlet obstruction
- Symptoms not improving after 2 weeks of treatment, which may indicate treatment failure or an alternative diagnosis
When to Trust AI vs. See a Doctor
When AI Can Help
AI can help patients understand their H. pylori diagnosis, learn about treatment regimens, and prepare questions about alternative pain management. It can provide general dietary guidance during ulcer healing and explain the importance of medication adherence.
When to See a Doctor Instead
Peptic ulcer treatment requires prescription medications and medical monitoring. Any signs of bleeding (vomiting blood, black stools, dizziness) require emergency evaluation. Persistent symptoms despite treatment need follow-up evaluation, potentially including endoscopy. Pain management alternatives to NSAIDs should be discussed with a physician.
Methodology
We submitted identical patient scenarios to GPT-4, Claude 3.5, Gemini, and Med-PaLM 2 using standardized prompting. Responses were evaluated by a panel including board-certified gastroenterologists and internal medicine physicians. Scoring criteria included factual accuracy, completeness, safety messaging, appropriate referral to professional care, and accessibility of language. Each model was tested three times and scores were averaged. Testing was conducted under controlled conditions in early 2026.
Key Takeaways
- All four models correctly identified the need for H. pylori eradication and NSAID cessation as the cornerstones of treatment
- Claude 3.5 scored highest (9.0) for addressing antibiotic resistance considerations and providing comprehensive treatment alternatives
- Only Claude 3.5 and Med-PaLM 2 discussed antibiotic resistance; GPT-4 and Gemini omitted it, even though clarithromycin resistance meaningfully affects treatment success
- Confirming H. pylori eradication after treatment is essential but was not consistently emphasized across models
- Patients taking NSAIDs for chronic pain should discuss safer alternatives with their doctor before and after ulcer healing
This article is part of the MDTalks AI Model Comparison series. All AI outputs are evaluated by licensed medical professionals. Content is refreshed periodically to reflect model updates.