System’s Evidence Synthesis Leads in Accuracy

Patrick Wedlock

October 2, 2025

Clinicians are increasingly turning to AI to synthesize vast amounts of literature and guidelines to support clinical decisions. It is therefore critical to continuously evaluate these solutions for their scientific rigor and medical trustworthiness.

System’s Synthesize API takes in clinical and biomedical queries and returns highly accurate, fully cited, and extremely flexible natural language responses based on the relevant literature and guidelines. Powered by the proprietary System Graph and novel agent-based retrieval and rerank algorithms, our Synthesize API is currently integrated into clinical decision support (CDS) applications at leading healthcare providers and health technology companies.

Today we’re sharing the latest results of our ongoing benchmarking of this solution. In this blind head-to-head study, 31 clinicians contributed 93 real-world clinical questions and scored the accuracy of the responses generated by System and OpenEvidence.

Results

Overall, clinicians scored System higher on accuracy than OpenEvidence

Clinicians strongly agreed with the accuracy of System’s response more often than with OpenEvidence’s

Clinicians strongly disagreed with the accuracy of OpenEvidence’s response ~5% of the time

There was no difference between System and OpenEvidence in overall preference

Methodology

Participants

We recruited 31 clinicians through the User Interviews platform to participate in the evaluation. Participants included physicians, physician assistants, and nurse practitioners working across academic medical centers, family practice, and other settings.

Data Collection

As part of the recruitment process, we asked clinicians to identify their medical specialty (or specialties) and board certifications, and to share 2-4 questions relevant to their clinical practice that required or could require guidelines and literature to answer. A total of 93 clinical questions were evaluated.

For each question, we generated two responses, one from System’s Synthesize API and one from OpenEvidence. Each response was copied into its own Google Doc including the entire text output, tables, figures, and references, and labeled as Synthesis 1 or Synthesis 2. Participants were blinded to the source of the response.

A single Google Form was created for each participant containing the responses to each of their questions and a set of criteria for evaluating responses. We asked reviewers to score the accuracy of the answer on a scale of 1 (strongly disagree) to 5 (strongly agree).

Participants were also asked to select which response they preferred overall, with the option to select either of the syntheses or mark ‘no preference’.

We collected data from September 7 to September 28.

Analysis

At the end of the data collection period, we counted the responses in each score category (1-5) for both System and OpenEvidence, calculated the average accuracy score across all participant responses, and tallied the overall preference selections.
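
As a rough illustration of the tally described above, here is a minimal Python sketch. It is not the code used in the study. The ratings and preferences are made-up placeholders, not the study's data, and the function names summarize and preference_shares are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical placeholder ratings, NOT the study's data.
# Each record is (source, accuracy score on the 1-5 agreement scale).
ratings = [
    ("System", 5), ("System", 4), ("System", 5),
    ("OpenEvidence", 4), ("OpenEvidence", 2), ("OpenEvidence", 5),
]

# Hypothetical placeholder overall-preference selections, one per question.
preferences = ["System", "OpenEvidence", "No preference"]


def summarize(ratings):
    """Return per-source counts for each score category (1-5) and the mean score."""
    summary = {}
    for source in sorted({src for src, _ in ratings}):
        scores = [score for src, score in ratings if src == source]
        counts = Counter(scores)
        summary[source] = {
            # Include every category so empty ones show up as 0.
            "counts": {category: counts.get(category, 0) for category in range(1, 6)},
            "mean": mean(scores),
        }
    return summary


def preference_shares(preferences):
    """Return the share of selections for each overall-preference option."""
    total = len(preferences)
    return {choice: n / total for choice, n in Counter(preferences).items()}


summary = summarize(ratings)
shares = preference_shares(preferences)

for source, stats in summary.items():
    print(source, stats["counts"], round(stats["mean"], 2))
print(shares)
```

Reporting the full 1-5 distribution alongside the mean keeps the "strongly agree" and "strongly disagree" shares visible, which a single average would hide.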

About System

System builds knowledge infrastructure to transform decision-making from silos to systems — starting in healthcare. System’s APIs are used today by leading healthcare providers in the US and Europe to power groundbreaking clinical decision support systems (CDSS). At the core of System is the System Graph, a patented, large-scale, statistical graph of the world modeled as one interconnected system, based on trusted sources of evidence that are updated daily. System Inc. is a Public Benefit Corporation committed to advancing systems thinking in the world.
