A full pipeline to analyze lung histopathology images

L. Borras Ferris, S. Püttmann, N. Marini, S. Vatrano, F. Fragetta, A. Caputo, F. Ciompi, M. Atzori and H. Müller

Medical Imaging 2024: Digital and Computational Pathology 2024.

Histopathology involves the analysis of tissue samples to diagnose several diseases, such as cancer. This analysis is a time-consuming procedure performed manually by medical experts, namely pathologists. Computational pathology aims to develop automatic methods to analyze Whole Slide Images (WSIs), which are digitized histopathology images, and such methods have shown accurate performance in image analysis. Although the number of available WSIs is increasing, the capacity of medical experts to manually analyze samples is not expanding proportionally. This paper presents a fully automatic pipeline to classify lung cancer WSIs into four classes: Small Cell Lung Cancer (SCLC), non-small cell lung cancer divided into LUng ADenocarcinoma (LUAD) and LUng Squamous cell Carcinoma (LUSC), and normal tissue. The pipeline includes a self-supervised algorithm for pre-training the model and Multiple Instance Learning (MIL) for WSI classification. The model is trained on 2,226 WSIs and obtains an AUC of 0.8558 ± 0.0051 and a weighted F1-score of 0.6537 ± 0.0237 for the 4-class classification on the test set. The capability of the model to generalize was evaluated by testing it on the public dataset of The Cancer Genome Atlas (TCGA) for LUAD versus LUSC classification. In this task, the model obtained an AUC of 0.9433 ± 0.0198 and a weighted F1-score of 0.7726 ± 0.0438.
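
To make the MIL step concrete, the following is a minimal sketch of how a slide can be classified from a bag of patch embeddings. The abstract does not state which aggregation the pipeline uses, so attention-based pooling is assumed here purely for illustration. The function name `attention_mil_forward`, the embedding size, and the randomly drawn parameters are all hypothetical; in a real pipeline the embeddings would come from the self-supervised pre-trained encoder and the parameters would be learned.

```python
import math
import random

CLASSES = ["SCLC", "LUAD", "LUSC", "normal"]  # the four classes named in the abstract


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def attention_mil_forward(bag, attn_vector, class_weights):
    """Classify one slide, represented as a bag of patch embeddings.

    bag:           list of patch embeddings, each a list of floats
    attn_vector:   hypothetical attention parameters, same length as an embedding
    class_weights: one hypothetical weight vector per class
    """
    # 1. Score each patch and normalise the scores into attention weights.
    attn = softmax([dot(attn_vector, patch) for patch in bag])
    # 2. Pool the patches into a single slide-level embedding.
    dim = len(bag[0])
    slide = [sum(a * patch[i] for a, patch in zip(attn, bag)) for i in range(dim)]
    # 3. Map the slide embedding to class probabilities.
    probs = softmax([dot(w, slide) for w in class_weights])
    return attn, probs


# Toy data: 5 patches with 8-dimensional embeddings (sizes are assumptions).
rng = random.Random(0)
DIM = 8
bag = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
attn_vector = [rng.gauss(0, 1) for _ in range(DIM)]
class_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in CLASSES]

attn, probs = attention_mil_forward(bag, attn_vector, class_weights)
prediction = CLASSES[probs.index(max(probs))]
print(prediction, [round(p, 3) for p in probs])
```

Only the slide-level label is needed for training in this setting, which is why MIL suits WSIs: patch-level annotations are not required, and the attention weights indicate which patches contributed most to the slide-level prediction.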