The New Era of Evidence Search, Generation, and Interpretation – Gustavo Monnerat

Clinicians treating cancer, policymakers shaping access to therapies, and pharma teams designing the next trial all depend on the same thing: timely access to the best available evidence. Comprehensive literature search, rigorous quality evaluation, and reliable data synthesis support clinical guidelines, regulatory submissions, and investment decisions. Yet the infrastructure supporting this process is increasingly under strain from its own success. Global scientific output now exceeds 3 million articles per year, and the volume indexed in major databases grew significantly in the past decade. Add to this the expanding universe of clinical trial registries, preprint servers, real-world evidence platforms, and conference abstracts, and the challenge becomes clear: the evidence base that should inform decisions is growing faster than any individual or team can manage.

Artificial intelligence is emerging as a powerful response to this challenge. But it also introduces new risks that require careful governance. The evidence lifecycle, from searching and screening to extraction, synthesis, and interpretation, is being reshaped by AI tools at every stage, and oncology is at the center of this transformation.

Opportunities across the evidence lifecycle

The most immediate opportunity lies in evidence search. AI-powered semantic search tools can now query clinical trial databases and peer-reviewed literature simultaneously, identifying relevant studies even when they do not match traditional keyword strategies. Unlike Boolean searches that depend on precise terminology, these tools use natural language understanding to surface conceptually related findings, including studies published in less commonly indexed journals or deposited as preprints. For oncology, where treatment paradigms may evolve rapidly and trial data accumulates across different tumor types, this capability is significant.

Beyond search, large language models (LLMs) are being deployed for screening abstracts, extracting data from full-text publications, and summarizing findings. Recent evaluations suggest that LLM-assisted screening can achieve sensitivity above 96% and specificity approaching 98%, compared to roughly 82% sensitivity for traditional dual-human review workflows. In data extraction, accuracy rates above 93% have been reported in head-to-head comparisons with human reviewers, whose own accuracy typically falls in the 66–86% range.
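To make these metrics concrete, here is a small worked example. The counts below are hypothetical, chosen only to illustrate what sensitivity and specificity mean in a screening context; they are not data from any cited evaluation.

```python
# Illustrative only: hypothetical screening counts, not figures from any study.
# Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

def sensitivity(tp, fn):
    # Share of truly eligible studies that the screen correctly included.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Share of ineligible studies that the screen correctly excluded.
    return tn / (tn + fp)

# Suppose a screen of 1,000 abstracts, 100 of them truly eligible:
tp, fn = 96, 4      # eligible abstracts correctly included vs. missed
tn, fp = 882, 18    # ineligible abstracts correctly excluded vs. wrongly included

print(f"sensitivity = {sensitivity(tp, fn):.2%}")  # 96.00%
print(f"specificity = {specificity(tn, fp):.2%}")  # 98.00%
```

Note the asymmetry that matters for systematic reviews: a false negative (a missed eligible study) silently biases the evidence base, while a false positive only costs reviewer time, which is why screening workflows typically tune for high sensitivity first.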

Perhaps the most transformative application is in the production of living systematic reviews and living evidence documents. Traditional systematic reviews may become outdated within months of publication, especially in fast-moving fields like oncology. AI-assisted pipelines can continuously monitor new publications, screen for eligibility, extract data, and flag when accumulated evidence may shift a conclusion. This creates a path toward evidence synthesis that is genuinely responsive to the pace of research, rather than frozen at the moment of the last manual update.
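The monitor–screen–extract–flag cycle described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: every function here is a hypothetical stand-in for what would in practice be database queries, LLM services, and human verification steps.

```python
# A minimal sketch of one living-review update cycle. All helpers are
# hypothetical stand-ins; a real pipeline would call literature databases,
# LLM screening/extraction services, and route flags to human reviewers.

def fetch_new_records(since):
    # Stand-in for querying literature databases and trial registries
    # for records published after `since`.
    return [{"id": "rec-1", "eligible": True, "effect": 0.82}]

def screen(record):
    # Stand-in for LLM-assisted eligibility screening (human-verified).
    return record["eligible"]

def extract(record):
    # Stand-in for structured data extraction from the full text.
    return {"study": record["id"], "effect": record["effect"]}

def evidence_may_shift(extracted, pooled_effect=0.90, threshold=0.05):
    # Flag for human review when newly extracted data could move the
    # current pooled estimate by more than a preset margin.
    return any(abs(row["effect"] - pooled_effect) > threshold
               for row in extracted)

def update_cycle(since):
    included = [extract(r) for r in fetch_new_records(since) if screen(r)]
    return included, evidence_may_shift(included)

included, flag = update_cycle("2024-01-01")
print(included, flag)  # one included study; flag=True (0.82 vs. pooled 0.90)
```

The design point is the last step: the pipeline never updates a conclusion on its own, it only flags that accumulated evidence may shift one, keeping the final judgment with human reviewers.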

Risks that demand attention

The same capabilities that make AI tools powerful also make their failures consequential. Hallucination, the generation of plausible but fabricated information, remains a critical vulnerability. In medical contexts, studies report that even state-of-the-art LLMs hallucinate at significant rates on clinical tasks. When tasked with generating literature reviews, one study found that roughly one in five citations produced by a leading model were entirely fabricated, with fabrication rates reaching nearly 30% for less-studied conditions. Among fabricated citations that included digital object identifiers (DOIs), 64% linked to real but completely unrelated papers, making errors particularly difficult to detect without manual verification.
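One reason DOI-bearing fabrications slip through is that the check must compare the cited title against what the DOI actually resolves to. A simple heuristic sketch follows; in practice the resolved title would come from a DOI lookup service (for example, the Crossref REST API), but here it is passed in directly so the example stays self-contained, and the titles shown are invented for illustration.

```python
from difflib import SequenceMatcher

def titles_match(cited_title, resolved_title, threshold=0.8):
    """Heuristic check that a cited title matches the title its DOI resolves to.

    `resolved_title` would normally be fetched via a DOI lookup service;
    it is supplied directly here to keep the sketch self-contained.
    """
    ratio = SequenceMatcher(
        None, cited_title.lower().strip(), resolved_title.lower().strip()
    ).ratio()
    return ratio >= threshold

# A fabricated citation whose DOI resolves to an unrelated paper fails:
print(titles_match(
    "Immunotherapy outcomes in metastatic melanoma",
    "Soil microbiome diversity under drought stress",
))  # False

# A genuine citation passes:
print(titles_match(
    "Immunotherapy outcomes in metastatic melanoma",
    "Immunotherapy Outcomes in Metastatic Melanoma",
))  # True
```

Even a crude check like this catches the "real DOI, wrong paper" failure mode described above, which is invisible to a reviewer who only confirms that the DOI exists.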

Data extraction errors present a related risk. While average accuracy is high, errors in extracting numerical outcomes, dosing regimens, or subgroup results can propagate through meta-analyses and alter pooled estimates. Algorithmic bias is another concern: models trained on unrepresentative literature can systematically favor certain populations, study designs, or publication types. Limited transparency in many AI tools, where users cannot inspect the model’s reasoning or verify which sources informed a given output, undermines the auditability that evidence-based medicine depends on.

Beyond a single human in the loop

The standard recommendation for AI-assisted evidence work is to maintain a “human in the loop.” This is necessary but not sufficient. A single reviewer checking AI outputs may catch obvious errors but miss subtle ones: hallucinated citations that look plausible, extraction inaccuracies in unfamiliar therapeutic areas, or systematic omissions that only become visible in aggregate. What is needed is not one human but a structured, multistakeholder oversight system.

Clinicians with domain expertise should verify that AI-generated summaries accurately represent treatment effects and clinical relevance. Epidemiologists and methodologists should audit search strategies, data extraction accuracy, and statistical outputs for systematic bias. Patient representatives and community advocates should assess whether the evidence selected and synthesized reflects the populations and outcomes that matter most to those affected by the disease. Regulatory and governance specialists should evaluate whether the process meets the standards required for guideline development, health technology assessment, or regulatory submission.

A shared responsibility

The transition is already underway. AI tools are being integrated into systematic review workflows, clinical trial monitoring, health technology assessments, and real-time evidence surveillance. The question is no longer whether these tools will be used, but whether they will be used in ways that produce evidence that is effective, representative, unbiased, auditable, and continuously updated.

Answering that question affirmatively requires more than technical solutions. It requires governance structures, multidisciplinary oversight, transparent reporting standards, and a shared commitment from researchers, clinicians, industry, and regulatory bodies. AI can accelerate evidence synthesis in oncology to a degree that was unimaginable a few years ago. Ensuring it does so reliably is a responsibility that belongs to all of us.

Written by Gustavo Monnerat, PhD, MBA.