When AI sees a blank: models can invent medical findings without an image

March 30, 2026

Stanford, United States

Stanford's report reveals leading multimodal AI models achieve 70 to 80 percent of their standard benchmark scores on the Phantom-0 set of 200 questions without any image. This phenomenon, dubbed 'mirage in vision,' isn't just an academic curiosity: in medical or safety applications, fabricated diagnoses can lead to serious consequences. Stanford's test covered 20 categories, and models didn't just describe nonexistent details but offered convincing explanations for their 'perception.' It's a fundamental vulnerability in input credibility assessment that existing benchmarks completely miss.

Image: Anthropic Claude Opus 4.5 (Wikimedia Commons, © Прикли)

Author: Nexus Vale, AI editor. “Treats every model release like a courtroom transcript.”

★ Stanford's Phantom-0 set of 200 imageless questions shows models retain 70-80% of standard benchmark scores, inventing anatomical details and clinical narratives.
★ The 'mirage in vision' phenomenon isn't just an academic curiosity: in medical and safety applications, fabricated diagnoses can have serious consequences.
★ Existing evaluation frameworks lag behind model sophistication, failing to detect a fundamental vulnerability in input credibility assessment.

Leading multimodal AI models are now diagnosing images that were never shown to them—and doing it with unsettling confidence. Stanford research reveals that GPT-5, Google's Gemini 3 Pro, and Anthropic's Claude Opus 4.5 generate detailed medical interpretations and visual descriptions even when fed zero pixels. The kicker? Existing benchmarks designed to catch such failures completely miss the behavior, letting these systems pass evaluations while fabricating nonexistent content.

The Stanford team constructed Phantom-0, a set of 200 imageless prompts about medical scans—X-rays, MRIs, dermatology cases. Models retained 70-80% of their standard benchmark scores despite having no visual input whatsoever. They invented anatomical details, constructed clinical narratives, and deployed convincing medical terminology. One might expect stammering uncertainty or explicit refusals. Instead, the AI produced elaborate, plausible-sounding reports complete with specific observations about structures it never observed.
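
To make the 70-80% figure concrete, here is a minimal sketch of how a score-retention ratio could be computed. It assumes "retained" means imageless accuracy divided by standard accuracy, which is one reading of the claim and not a formula confirmed by the Stanford report. The function name and the example numbers are hypothetical.

```python
def score_retention(standard_accuracy: float, imageless_accuracy: float) -> float:
    """Share of the standard benchmark score that survives when images are withheld.

    Both arguments are accuracies in the range 0..1. This ratio is an assumed
    definition of "retention", not one taken from the Stanford report.
    """
    if not 0 < standard_accuracy <= 1:
        raise ValueError("standard_accuracy must be in (0, 1]")
    if not 0 <= imageless_accuracy <= 1:
        raise ValueError("imageless_accuracy must be in [0, 1]")
    return imageless_accuracy / standard_accuracy


# Hypothetical numbers for illustration only, not results from the study.
retention = score_retention(standard_accuracy=0.80, imageless_accuracy=0.60)
print(f"Retention without images: {retention:.0%}")
```

A model that truly depended on the image would be expected to fall toward chance level on imageless prompts, so a high ratio here is the warning sign.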

This "mirage in vision" phenomenon exposes a critical gap between model sophistication and evaluation rigor. The benchmarks that certify these systems for real-world deployment were built to test whether models understand what they do see, not whether they can recognize when they see nothing. It's a distinction with consequences: in medical workflows, a fabricated diagnosis from corrupted or missing image data isn't a curiosity—it's a liability.

Stanford research reveals leading multimodal models confidently fabricate medical diagnoses with zero visual input

Image: Stanford University (Wikimedia Commons, © Frank Schulenburg)

The root cause likely traces to training data saturated with image-text pairs, where the model learns to generate coherent visual descriptions without firm anchoring to actual pixel input. Under uncertainty, the system defaults to plausible inference rather than honest admission of ignorance. This isn't overconfidence in the human sense; it's a structural feature of how current multimodal architectures process ambiguous inputs.

For developers, the implication is stark: traditional validation pipelines need fundamental redesign. Input credibility assessment—verifying that visual data actually arrived and was processed—must become as standard as output quality checks. Healthcare integrators face the more immediate problem: how to build human-in-the-loop safeguards that catch fabricated interpretations without negating the efficiency gains that attracted them to AI in the first place.
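
What an input credibility check might look like in practice can be sketched in a few lines. The example below is an illustration under stated assumptions, not a method from the Stanford report: `check_image_input`, `interpret_scan`, the 64-byte minimum, and the restriction to PNG and JPEG signatures are all hypothetical choices, and `model_call` stands in for whatever model client a pipeline actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# File signatures for two common formats. Real medical pipelines often use
# DICOM or other formats; limiting the check to these two is an assumption.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


@dataclass
class InputCheck:
    ok: bool
    reason: str


def check_image_input(image_bytes: Optional[bytes], min_bytes: int = 64) -> InputCheck:
    """Verify that an image payload arrived and looks like an image file."""
    if image_bytes is None:
        return InputCheck(False, "no image attached")
    if len(image_bytes) < min_bytes:
        return InputCheck(False, "image payload empty or too small")
    if not image_bytes.startswith((PNG_MAGIC, JPEG_MAGIC)):
        return InputCheck(False, "unrecognized image format")
    return InputCheck(True, "image present")


def interpret_scan(
    image_bytes: Optional[bytes],
    question: str,
    model_call: Callable[[bytes, str], str],
) -> str:
    """Call the model only when the image input passes the credibility check."""
    check = check_image_input(image_bytes)
    if not check.ok:
        # Refuse up front instead of letting the model answer from text alone.
        return f"REFUSED: {check.reason}"
    return model_call(image_bytes, question)
```

A guard like this only confirms that plausible image bytes reached the pipeline. It cannot show that the model actually used them, which is why imageless probes such as Phantom-0 are still needed on the evaluation side.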

The broader pattern matters beyond medicine. Any domain where multimodal models interpret sensor data, surveillance feeds, or diagnostic imagery carries similar risk. Evaluation frameworks that lag behind model capabilities don't merely underperform—they create dangerous certification gaps, blessing systems with safety credentials they haven't earned. Phantom-0 won't be the last probe of this vulnerability, but it should be the wake-up call.

The Decoder
