Integration Of A Deep Learning Algorithm Into The Clinically Established PanCan Model For Malignancy Risk Estimation Of Screen-detected Pulmonary Nodules In First Screening CT

K. Venkadesh, A. Schreuder, E. Scholten, S. Atkar-Khattra, J. Mayo, Z. Saghir, M. Wille, B. van Ginneken, S. Lam, M. Prokop and C. Jacobs

Annual Meeting of the Radiological Society of North America 2021.

PURPOSE: To quantify the added value of integrating the output of a deep learning algorithm (DLA) into the existing Pan-Canadian Early Detection of Lung Cancer Study (PanCan) models for estimating the malignancy risk of screen-detected pulmonary nodules.

METHODS AND MATERIALS: Our DLA was trained on a cohort of 14,828 benign and 1,249 malignant nodules from the National Lung Screening Trial. In the present study, we derived a new multivariable logistic regression model on the PanCan data. The model included the DLA risk score and the original variables from the PanCan model 2b, except for "nodule type" and "spiculation", as these are already encoded in the DLA risk score. The new model was externally validated on baseline nodules from the Danish Lung Cancer Screening Trial (DLCST). For comparison, the performance of the existing PanCan model 2b and of the stand-alone DLA was also calculated.
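The general shape of such a model can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it fits a logistic regression by plain gradient descent on a handful of synthetic rows. The feature names (`dla_risk_score`, `nodule_size_cm`), the toy values, and the helper functions `fit_logistic` and `predict_risk` are all assumptions made for the example, and a real analysis would include the remaining PanCan model 2b variables and use an established statistics package.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the mean log-loss.

    Returns [intercept, coef_1, ..., coef_k]. Hypothetical helper for illustration.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_risk(w, x):
    # Predicted malignancy probability for one nodule.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Synthetic toy rows (NOT study data): [dla_risk_score, nodule_size_cm]
X = [[0.05, 0.4], [0.10, 0.5], [0.20, 0.6], [0.15, 0.8],
     [0.70, 1.2], [0.85, 1.5], [0.60, 0.9], [0.90, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = malignant

weights = fit_logistic(X, y)
low = predict_risk(weights, [0.05, 0.4])
high = predict_risk(weights, [0.90, 2.0])
```

In this sketch the DLA risk score enters the model as one more covariate, which is how it can stand in for reader-assessed variables such as "nodule type" and "spiculation".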

RESULTS: The development set comprised 6,024 benign and 86 malignant nodules from the PanCan data, and the validation set comprised 818 benign and 34 malignant nodules from DLCST. In the PanCan cohort, the areas under the receiver operating characteristic curve (AUCs) for the DLA, PanCan model 2b, and the new model were 0.944 (95% confidence interval: 0.917-0.968), 0.941 (0.908-0.969), and 0.944 (0.909-0.975), respectively. In the DLCST cohort, the corresponding AUCs were 0.917 (0.851-0.968), 0.896 (0.841-0.944), and 0.927 (0.878-0.969).
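For readers unfamiliar with how an AUC and its confidence interval can be obtained, the following is a minimal sketch. The abstract does not state how the intervals were computed; a percentile bootstrap is assumed here purely for illustration, and the scores and labels are synthetic toy values, not study data. The function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random malignant nodule outscores a
    random benign one (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one nodule of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample nodules with replacement
        ls = [labels[i] for i in sample]
        if 0 < sum(ls) < len(ls):  # skip resamples containing a single class
            stats.append(auc([scores[i] for i in sample], ls))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic toy values (NOT study data): 1 = malignant, 0 = benign
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.8, 0.7, 0.9, 0.25, 0.6]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

point = auc(scores, labels)
ci_low, ci_high = bootstrap_ci(scores, labels)
```

With only 34 malignant nodules in the validation set, intervals of this kind are necessarily wide, which is why overlapping intervals between models should be read cautiously.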

CONCLUSIONS: Incorporating our DLA risk score into a new multivariable logistic regression model derived on the PanCan data does not appear to significantly improve predictive performance in high-risk screening participants. However, the DLA risk score may serve as a replacement for the "nodule type" and "spiculation" parameters, which are known to have substantial interobserver variability.

CLINICAL RELEVANCE / APPLICATION: Our DLA estimates nodule malignancy risk with performance comparable to that of the PanCan models. This may make the computation of nodule risk scores easier and less subjective.