AI in mammography screening is rapidly transforming how breast cancer detection is approached, offering a potential solution to one of the biggest challenges in modern screening programs: maintaining high diagnostic accuracy while managing increasing radiologist workload.

In this prospective paired noninferiority trial published in Nature Medicine, investigators evaluated whether artificial intelligence could safely identify low-risk screening exams and exclude them from human reading. Among 31,301 women, the AI-supported strategy reduced radiologist workload by 63.6% and increased the cancer detection rate from 6.3 to 7.3 per 1,000 examinations, highlighting the growing role of AI in optimizing breast cancer screening pathways.

Title: AI-based triage and decision support in mammography and digital tomosynthesis for breast cancer screening: a paired, noninferiority trial

Authors: Esperanza Elías-Cabot, Sara Romero-Martín, José Luis Raya-Povedano, Alejandro Rodríguez-Ruiz, Marina Álvarez-Benito

Background

Breast cancer screening continues to balance two competing priorities: detecting as many cancers as possible while limiting unnecessary recalls and keeping radiologist workload manageable. This challenge has become even more important as screening programs expand, digital breast tomosynthesis becomes more common, and many health systems face shortages of experienced breast imagers. In this prospective paired noninferiority trial published in Nature Medicine, Elías-Cabot and colleagues evaluated whether an artificial intelligence-based screening workflow could safely exclude low-risk mammograms from human reading while preserving screening performance.

The study tested a partially autonomous model in which examinations categorized by AI as low risk were automatically labeled normal, while higher-risk cases underwent double reading with AI support. The central question was clinically important: can AI reduce workload substantially without compromising cancer detection in routine breast cancer screening?

Methods

The trial was conducted within the Córdoba Breast Cancer Screening Unit in Spain as part of a population-based screening program. Women aged 50 to 71 years who attended routine biennial screening between 15 March 2022 and 11 January 2024 were invited to participate. After informed consent and exclusions, 31,301 women were included in the final analysis. Among them, 17,333 underwent digital mammography and 13,968 underwent digital breast tomosynthesis.

Each participant’s screening examination was assessed using two parallel strategies. The first was the standard strategy, consisting of double human reading without AI support. The second was the AI strategy. In this intervention arm, the commercially available Transpara version 1.7 AI system first assigned each exam a risk category from 1 to 10. Cases scored 1 to 7 were considered low risk and were automatically classified as normal without radiologist review. Cases scored 8 to 10 were then read by two radiologists with AI support.
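The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Transpara's software or the trial's actual pipeline. Only the 1 to 10 score range and the cut-off between categories 7 and 8 come from the study; the names `route_exam`, `radiologist_reads`, and the returned labels are hypothetical.

```python
# Hypothetical sketch of the AI-strategy routing rule described in the trial.
# Only the score range (1-10) and the 7/8 cut-off come from the article;
# function and label names are illustrative.

LOW_RISK_MAX = 7  # scores 1-7: labeled normal without radiologist review


def route_exam(ai_score: int) -> str:
    """Return the reading pathway for one exam given its AI risk category (1-10)."""
    if not 1 <= ai_score <= 10:
        raise ValueError(f"risk category must be between 1 and 10, got {ai_score}")
    if ai_score <= LOW_RISK_MAX:
        return "auto_normal"          # excluded from human reading
    return "double_read_with_ai"      # two radiologists, with AI support


def radiologist_reads(scores: list[int]) -> int:
    """Count human readings: two per exam routed to double reading."""
    return sum(2 for s in scores if route_exam(s) == "double_read_with_ai")


if __name__ == "__main__":
    for score in (1, 7, 8, 10):
        print(score, route_exam(score))
```

Because every exam that reaches a human is read twice, the number of radiologist reads under this rule is simply twice the number of exams scored 8 to 10.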

The primary outcomes were radiologist workload, cancer detection rate, and recall rate. Workload was measured as the absolute number of screening readings. Cancer detection rate was defined as the number of screen-detected cancers per 1,000 examinations. Recall rate reflected the proportion of women referred for further assessment. Secondary outcomes included positive predictive value of recalls and false positive rate. The investigators also performed subgroup analyses according to imaging modality.
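The outcome definitions above reduce to simple ratios of counts. The sketch below is illustrative only: the function names and the example counts are hypothetical, and because the article does not state the denominator for the false positive rate, the code assumes recalls without cancer divided by all screened examinations.

```python
# Hypothetical helpers for the screening metrics named in the trial.
# Assumption: false positive rate = recalls without cancer / all screened exams.

def cancer_detection_rate(cancers: int, exams: int) -> float:
    """Screen-detected cancers per 1,000 examinations."""
    return 1000 * cancers / exams


def recall_rate(recalls: int, exams: int) -> float:
    """Percentage of screened women referred for further assessment."""
    return 100 * recalls / exams


def positive_predictive_value(cancers: int, recalls: int) -> float:
    """Percentage of recalls that led to a cancer diagnosis."""
    return 100 * cancers / recalls


def false_positive_rate(cancers: int, recalls: int, exams: int) -> float:
    """Percentage of screened women recalled without cancer (assumed definition)."""
    return 100 * (recalls - cancers) / exams


if __name__ == "__main__":
    # Illustrative round numbers, not trial data.
    exams, recalls, cancers = 10_000, 450, 60
    print(cancer_detection_rate(cancers, exams))                   # 6.0
    print(recall_rate(recalls, exams))                             # 4.5
    print(round(positive_predictive_value(cancers, recalls), 2))   # 13.33
    print(false_positive_rate(cancers, recalls, exams))            # 3.9
```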

Study Design

This was a prospective, paired, noninferiority trial, an important design choice because each woman served as her own control. Rather than comparing two separate screening populations, the investigators applied both reading strategies to the same examination. This approach reduced between-group variability and allowed a more direct comparison of screening performance.

The study was designed to test whether the AI-supported strategy was noninferior to standard double reading in terms of cancer detection and recall, while also reducing workload. The noninferiority margin was set at 5% relative difference for the key endpoints. If noninferiority was established, superiority testing was then performed. The sample size calculation estimated that approximately 27,000 women would be needed, and the final enrolled population exceeded that threshold with 31,301 participants.


This design is especially relevant because it reflects a real-world screening environment rather than an isolated retrospective simulation. It also included both digital mammography and digital breast tomosynthesis, which makes the findings more clinically meaningful for contemporary screening programs.

Results

The standard strategy generated 62,602 radiologist readings and detected 198 screen-detected cancers. This corresponded to a cancer detection rate of 6.3 per 1,000 examinations and a recall rate of 4.8%.

By contrast, the AI strategy required radiologist reading for only 36.4% of all screening exams. In practical terms, 19,917 low-risk examinations were automatically labeled normal without human review, and only 11,384 exams were sent for AI-supported double reading. This reduced the number of radiologist reads to 22,768, representing a 63.6% reduction in workload.
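The workload figures follow directly from the reported counts, because both strategies use two readers for every exam that reaches a human. A short Python check using only numbers from the article (variable names are illustrative):

```python
# Workload arithmetic for the paired design, using counts reported in the article.
total_exams = 31_301
auto_normal = 19_917                      # AI categories 1-7, no human read
double_read = total_exams - auto_normal   # AI categories 8-10

standard_reads = 2 * total_exams          # double reading of every exam
ai_reads = 2 * double_read                # double reading of flagged exams only

share_read = double_read / total_exams
workload_reduction = 1 - ai_reads / standard_reads

print(double_read, standard_reads, ai_reads)   # 11384 62602 22768
print(f"{share_read:.1%} read, {workload_reduction:.1%} fewer reads")
# 36.4% read, 63.6% fewer reads
```

Since the factor of two cancels, the workload reduction equals the share of exams auto-labeled normal (19,917 of 31,301).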

Importantly, this reduction in workload was not associated with lower cancer detection. The AI strategy detected 228 cancers, compared with 198 under the standard approach. Cancer detection rate increased from 6.3 per 1,000 to 7.3 per 1,000, a relative increase of 15.2% and an absolute increase of 1.0 cancer per 1,000 women screened. This difference was statistically significant with P < 0.001, and the AI strategy was both noninferior and superior for cancer detection.
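Because both strategies were applied to the same 31,301 examinations, the denominator is shared, and the relative increase in detection rate is simply the ratio of cancer counts. A quick check with the reported figures:

```python
# Cancer detection rates from the reported counts (same exams in both arms).
standard_cancers, ai_cancers, exams = 198, 228, 31_301

cdr_standard = 1000 * standard_cancers / exams   # per 1,000 exams
cdr_ai = 1000 * ai_cancers / exams
relative_increase = ai_cancers / standard_cancers - 1
absolute_increase = cdr_ai - cdr_standard

print(f"{cdr_standard:.1f} -> {cdr_ai:.1f} per 1,000")   # 6.3 -> 7.3 per 1,000
print(f"relative +{relative_increase:.1%}, absolute +{absolute_increase:.2f} per 1,000")
# relative +15.2%, absolute +0.96 per 1,000
```

The unrounded absolute difference is about 0.96 per 1,000, which the article reports as 1.0 after rounding both rates.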

The tradeoff was in recall. Recall rate increased from 4.8% with standard double reading to 5.5% with the AI strategy, a relative increase of 14.8% and an absolute increase of 0.7 percentage points. Because of this increase, recall rate did not meet the predefined criterion for noninferiority.

Secondary outcomes helped clarify the picture further. Positive predictive value of recalls remained essentially unchanged, at 13.19% in the standard arm and 13.23% in the AI arm. However, the false positive rate rose from 4.2% to 4.8%.
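Recall rate, positive predictive value, and false positive rate are linked through recalls = cancers / PPV. The sketch below back-calculates approximate recall counts from the reported PPVs (these counts are derived estimates, not figures stated in the article) and shows that they reproduce the reported rates. It also applies one assumed reading of the 5% relative noninferiority margin to the recall point estimate; the trial's formal test would rely on a confidence bound, not the point estimate alone.

```python
# Back-calculate recall counts from reported cancers and PPV, then check
# that they reproduce the reported recall and false positive rates.
exams = 31_301
arms = {
    # arm: (screen-detected cancers, reported PPV of recall)
    "standard": (198, 0.1319),
    "ai": (228, 0.1323),
}

results = {}
for arm, (cancers, ppv) in arms.items():
    recalls = round(cancers / ppv)   # approximate, derived from rounded PPV
    results[arm] = {
        "recalls": recalls,
        "recall_rate": 100 * recalls / exams,                      # %
        "false_positive_rate": 100 * (recalls - cancers) / exams,  # %, assumed definition
    }

relative_recall_increase = results["ai"]["recalls"] / results["standard"]["recalls"] - 1
absolute_recall_increase = results["ai"]["recall_rate"] - results["standard"]["recall_rate"]

# Assumed margin: AI recall may exceed standard recall by at most 5% (relative).
# An upper confidence bound is never below the point estimate, so exceeding
# the margin here already rules out noninferiority for recall.
MARGIN = 0.05
within_margin = relative_recall_increase <= MARGIN

print(results)
print(f"relative +{relative_recall_increase:.1%}, "
      f"absolute +{absolute_recall_increase:.1f} percentage points, "
      f"within margin: {within_margin}")
```

The derived counts (about 1,501 and 1,723 recalls) give recall rates of 4.8% and 5.5% and false positive rates of 4.2% and 4.8%, matching the figures reported above.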

Subgroup analyses showed notable differences by modality. In digital mammography, the AI strategy improved cancer detection substantially: cancer detection rate increased by an absolute 1.6 per 1,000, while recall rate rose by 1.3 percentage points. In digital breast tomosynthesis, however, cancer detection and recall remained largely stable between strategies, even though workload still fell by 65.5%.

The histopathologic profile of detected cancers is also relevant. Across the whole cohort, the AI strategy identified 10.1% more invasive carcinomas and 35% more carcinomas in situ than the standard strategy. It also detected a higher proportion of grade I invasive tumors, T1 tumors, and node-negative cancers, suggesting that the additional cancers found by AI may include earlier-stage disease.

A total of 252 cancers were identified across both strategies. Twenty-four were found only by the standard strategy, and 54 only by the AI strategy. Of the 24 cancers missed by the AI strategy, 11 had been categorized by the system as low risk and were therefore never reviewed by radiologists. Most of these were subtle cases, and 9 of the 11 occurred in tomosynthesis exams. No adverse events were reported.
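The overlap figures are consistent with the per-strategy totals reported earlier (198 and 228 cancers). A quick inclusion-exclusion check in Python; the split of the standard-only cancers follows from the workflow rule that every exam not triaged as low risk went to double reading, and the variable names are illustrative:

```python
# Overlap of screen-detected cancers between the two strategies (counts from the article).
total_cancers = 252
standard_only = 24   # missed by the AI strategy
ai_only = 54         # missed by the standard strategy
ai_low_risk_misses = 11   # triaged as low risk, never read by radiologists

both = total_cancers - standard_only - ai_only   # detected by both strategies
standard_total = both + standard_only
ai_total = both + ai_only

# By the routing rule, the remaining AI misses scored 8-10 and reached
# AI-supported double reading but were not detected there.
missed_after_human_read = standard_only - ai_low_risk_misses

print(both, standard_total, ai_total, missed_after_human_read)   # 174 198 228 13
```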

Key Findings

This trial shows that a partially autonomous AI screening workflow can dramatically reduce radiologist workload in breast cancer screening. A reduction of nearly two thirds is clinically meaningful, especially for programs facing workforce constraints.

At the same time, the AI-based strategy did not simply preserve performance. It improved cancer detection overall, raising the screening cancer detection rate from 6.3 to 7.3 per 1,000 examinations. This is one of the most important findings of the study, because it suggests that workload reduction does not necessarily require sacrificing sensitivity.

The main limitation of the AI workflow was the increase in recalls. Although positive predictive value remained stable, more women were called back for additional assessment. This issue was particularly evident in digital mammography, whereas the tomosynthesis subgroup showed a more balanced result with stable detection and recall.

Another notable observation is that the AI strategy appeared to identify more favorable-pathology cancers, including smaller and node-negative invasive tumors. That pattern raises the possibility that AI-supported screening could shift detection toward earlier-stage disease, although interval cancer outcomes were not assessable in this paired design.

Conclusion

The AITIC trial provides strong prospective evidence that AI-based triage and decision support can meaningfully reshape breast cancer screening workflows. In more than 31,000 women, the AI strategy reduced radiologist workload by 63.6%, increased cancer detection from 6.3 to 7.3 per 1,000 screenings, and maintained positive predictive value, although at the cost of a higher recall rate.

These findings suggest that partially autonomous AI screening may be a practical and effective option for modern screening programs, particularly where radiologist capacity is limited. At the same time, the increase in recalls and the unanswered questions around implementation, safety oversight, and generalizability make clear that adoption should remain careful, data-driven, and program-specific.


