For more than a decade, PD-L1 tumor proportion score (TPS) has served as the cornerstone biomarker guiding immune checkpoint inhibitor therapy in non-small cell lung cancer (NSCLC). While patients with higher PD-L1 expression generally derive greater benefit from immunotherapy, clinicians have long recognized a frustrating reality: PD-L1 is far from a perfect predictor. Some patients with PD-L1–negative tumors experience remarkable responses, whereas others with very high PD-L1 expression fail to benefit.
In the neoadjuvant setting, this discrepancy has become even more apparent. Numerous studies have reported only weak associations between pretreatment PD-L1 expression and the degree of pathological tumor regression following chemoimmunotherapy. One possible explanation has been that PD-L1 scoring itself is subjective, with different pathologists assigning different TPS values, particularly around clinically important cutoffs.
A new international study led by Konrad Steinestel and colleagues asked a simple but highly important question:
Is the limited predictive value of PD-L1 simply caused by variability among pathologists—or does the biomarker itself have fundamental biological limitations?
The findings strongly support the latter.

Why This Question Matters
PD-L1 TPS currently influences treatment decisions across multiple stages of NSCLC. In metastatic disease, it determines eligibility for pembrolizumab monotherapy and several combination regimens. More recently, PD-L1 has also been explored as a biomarker for predicting pathological response after neoadjuvant immunotherapy.
Unlike metastatic disease, where response is evaluated radiographically over time, neoadjuvant therapy allows direct examination of the resected tumor. The amount of residual viable tumor (RVT) remaining after treatment serves as an early indicator of therapeutic efficacy and has emerged as an important surrogate marker for long-term survival.
However, previous studies—including the original ReGraDE study—found only a weak relationship between pretreatment PD-L1 expression and residual viable tumor. The investigators wanted to determine whether inconsistent pathological scoring could explain this disappointing performance.
Study Design
The investigators selected 30 NSCLC cases from the German ReGraDE study, representing the full spectrum of PD-L1 expression and pathological response after neoadjuvant chemoimmunotherapy.
Thirty pathologists from 11 countries independently reviewed digitized pathology slides without access to clinical information or each other’s assessments.
Each participant evaluated:
- PD-L1 tumor proportion score (TPS) on pretreatment biopsy specimens.
- Residual viable tumor (RVT) on corresponding surgical specimens after neoadjuvant treatment.
The study included both experienced thoracic pathologists and residents, allowing the investigators to assess whether professional experience or routine case volume influenced scoring consistency.
How Consistent Were Pathologists?
The study confirmed what many thoracic pathologists already experience in routine practice: PD-L1 assessment is not perfectly reproducible.
When individual assessments were compared, agreement between observers was moderate for both PD-L1 TPS and RVT. When scores from all 30 observers were averaged, agreement became almost perfect, which suggests that much of the variability reflected random differences in interpretation rather than systematic error.
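The jump in agreement after averaging is roughly what the Spearman-Brown formula predicts when rater error is random. The sketch below is illustrative and not part of the study's analysis: it assumes the reported single-observer ICC of 0.74 can be treated as a single-rater reliability, and the function name is hypothetical.

```python
def spearman_brown(single_rater_icc: float, n_raters: int) -> float:
    """Predicted reliability of the mean of n_raters independent scores.

    Assumes rater errors are random and uncorrelated (classical test theory).
    """
    return n_raters * single_rater_icc / (1 + (n_raters - 1) * single_rater_icc)


# Reported single-observer ICC and number of pathologists in the study.
averaged_icc = spearman_brown(0.74, 30)
print(round(averaged_icc, 2))  # 0.99
```

Under these assumptions, the predicted value matches the reported averaged ICC of 0.99, which is consistent with mostly random rather than systematic disagreement.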
The greatest disagreement occurred in clinically challenging borderline cases.
For PD-L1, variability was highest around the 1% TPS cutoff, where even small differences in interpretation can determine whether a patient is classified as PD-L1 positive or negative. In contrast, tumors with very high PD-L1 expression (≥50%) were scored much more consistently across observers.
A similar pattern was observed for pathological response. Distinguishing pathological complete response (0% RVT) from minimal residual disease proved substantially more difficult than recognizing tumors with larger amounts of viable cancer. These findings illustrate that some degree of observer variability is unavoidable whenever semi-quantitative pathological biomarkers rely on subjective interpretation.
Experience Improved Agreement—but Didn’t Change the Overall Picture
Professional experience mattered, particularly in borderline cases.
Pathologists with greater experience or higher PD-L1 case volumes demonstrated better agreement when evaluating tumors close to the 1% threshold. Likewise, experienced thoracic pathologists were more consistent when distinguishing complete pathological response from minimal residual viable tumor. These observations support ongoing efforts to improve PD-L1 interpretation through dedicated training, standardized scoring methods, digital pathology, and potentially artificial intelligence.
However, even after eliminating much of the observer-related variability by averaging assessments, the fundamental problem remained unchanged.

Better Scoring Did Not Improve PD-L1 Performance
This represents the central finding of the study.
If observer variability were responsible for PD-L1’s poor predictive performance, averaging the scores from 30 pathologists should have strengthened the relationship between PD-L1 expression and pathological response.
It did not.
The correlation between PD-L1 TPS and residual viable tumor remained weak and not statistically significant, regardless of whether TPS was derived from a single observer or from the averaged assessment of all observers.
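One way to see why averaging could not rescue the correlation is the classical correction for attenuation, which estimates how much measurement noise can shrink an observed correlation. The sketch below is a back-of-the-envelope illustration, not the study's method. It assumes the reported ICCs (0.74 for single observers, 0.99 after averaging) can stand in for the reliabilities of both TPS and RVT, and the function name is hypothetical.

```python
import math


def disattenuated_r(observed_r: float, reliability_x: float, reliability_y: float) -> float:
    """Estimate the correlation that would be seen with error-free measurements.

    Classical test theory: observed_r = true_r * sqrt(reliability_x * reliability_y).
    """
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Reported correlations and ICCs; treating an ICC as a reliability is an assumption.
from_single = disattenuated_r(-0.17, 0.74, 0.74)
from_averaged = disattenuated_r(-0.16, 0.99, 0.99)

print(round(from_single, 2))    # -0.23
print(round(from_averaged, 2))  # -0.16
```

Even under the generous assumption that all single-observer noise is removed, the estimated correlation stays in the weak range, at roughly -0.2.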
Key Results
- 30 pathologists from 11 countries independently assessed 30 NSCLC cases.
- Interobserver agreement was moderate for both PD-L1 TPS and RVT (ICC = 0.74 for each).
- Averaging scores increased agreement to ICC = 0.99, indicating near-perfect reproducibility.
- Despite this improvement, the correlation between PD-L1 TPS and pathological response remained minimal (r = –0.17 for single observers and r = –0.16 after averaging; both non-significant).
- Statistical testing found no evidence that reducing observer variability improved PD-L1's predictive value (p = 0.96).
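To put the reported correlations in context, the sketch below checks how large a correlation would need to be to reach significance with 30 cases. It is illustrative only: it assumes a Pearson correlation tested with a two-sided t-test at the 5% level, which the article does not specify, and the function names are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 28 degrees of freedom (table value).
T_CRIT_DF28 = 2.048


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit: float, n: int) -> float:
    """Smallest absolute correlation that reaches significance for n cases."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)


n_cases = 30
print(round(t_statistic(-0.17, n_cases), 2))      # -0.91
print(round(t_statistic(-0.16, n_cases), 2))      # -0.86
print(round(critical_r(T_CRIT_DF28, n_cases), 2)) # 0.36
```

Under these assumptions, a correlation would need an absolute value of about 0.36 to reach significance in 30 cases, more than twice the values reported.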
The Real Limitation Is Biological
The findings suggest that the weakness of PD-L1 is not primarily technical—it is biological.
PD-L1 expression represents only one component of an extraordinarily complex interaction between tumor cells and the immune system. A single biopsy captures only a small portion of a heterogeneous tumor, and PD-L1 expression itself is dynamic, changing over time and in response to therapy.
Multiple additional factors influence response to immune checkpoint blockade, including:
- Tumor mutational burden
- Antigen presentation and HLA expression
- STK11 and KEAP1 alterations
- Spatial immune-cell infiltration
- The composition of the tumor microenvironment
- Dynamic immune remodeling during treatment
Consequently, two tumors with identical PD-L1 expression may possess entirely different immune landscapes and respond very differently to immunotherapy.
This may explain why PD-L1 alone cannot reliably predict pathological response.
Clinical Implications
For practicing oncologists and pathologists, this study provides an important message.
Improving pathological training and standardizing PD-L1 scoring remain worthwhile goals because they reduce diagnostic variability and improve consistency in routine practice. However, these efforts alone are unlikely to transform PD-L1 into a highly accurate predictive biomarker.
Instead, future biomarker development will likely require multidimensional approaches that integrate PD-L1 with molecular, genomic, and immune microenvironment features rather than relying on a single immunohistochemical marker.

Conclusion
Steinestel and colleagues provide compelling evidence that the limited predictive value of PD-L1 in the neoadjuvant treatment of NSCLC cannot be explained simply by differences among pathologists. Even when observer variability was almost completely eliminated, PD-L1 remained only weakly associated with pathological response.
Rather than exposing a problem with pathological scoring, this study highlights the intrinsic biological limitations of PD-L1 as a standalone biomarker. As neoadjuvant immunotherapy becomes increasingly integrated into early-stage NSCLC, these findings reinforce the need for next-generation biomarker strategies that combine pathology with genomic, molecular, and immune profiling to better identify the patients most likely to benefit from treatment.