June 4, 2026
We're excited to share that our paper on diagnostic benchmarking, “MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings,” has been accepted to the Structured Data 4 Health Workshop at ICML 2026.
Most clinical AI benchmarks test models on highly curated plain-text case narratives. In the real world, however, healthcare systems run on structured, interoperable data standards such as HL7 FHIR, where medically relevant information sits inside records that also carry the noise typical of real health data.
Our paper introduces a pipeline that converts unstructured medical case narratives into validated FHIR R4 bundles, providing a much more realistic substrate for building AI models. The findings are striking: diagnostic accuracy consistently drops when models move from plain-text to structured FHIR inputs, highlighting the importance of deployment-aligned evaluation.
We look forward to sharing more, including the dataset, in the coming weeks.