Reliability of Automated RECIST 1.1 and Volumetric RECIST Target Lesion Response Evaluation in Follow-Up CT--A Multi-Center, Multi-Observer Reading Study

I. Dahm, M. Kolb, S. Altmann, K. Nikolaou, S. Gatidis, A. Othman, A. Hering, J. Moltz and F. Peisen

Cancers 2024;16:4009.

Objectives: To evaluate the performance of a custom-made convolutional neural network (CNN) algorithm for fully automated lesion tracking and segmentation, as well as RECIST 1.1 evaluation, in longitudinal computed tomography (CT) studies compared to a manual Response Evaluation Criteria in Solid Tumors (RECIST 1.1) evaluation performed by three radiologists. Methods: Baseline and follow-up CTs of patients with stage IV melanoma (n = 58) was investigated in a retrospective reading study. Three radiologists performed manual measurements of metastatic lesions. Fully automated segmentations were generated, and diameters and volumes were computed from the segmentation results, with subsequent RECIST 1.1 evaluation. We measured (1) the intra- and inter-reader variability in the manual diameter measurements, (2) the agreement between manual and automated diameter measurements, as well as the resulting RECIST 1.1 categories, and (3) the agreement between the RECIST 1.1 categories derived from automated diameter measurement compared to automated volume measurements. Results: In total, 114 target lesions were measured at baseline and follow-up. The intraclass correlation coefficients (ICCs) for the intra- and inter-reader reliability of the diameter measurements were excellent, being >0.90 for all readers. There was moderate to almost perfect agreement when comparing the timepoint response category derived from the mean manual diameter measurements from all three readers with those derived from automated diameter measurements (Cohen's k 0.67-0.76). The agreement between the manual and automated volumetric timepoint responses was substantial (Fleiss' k 0.66-0.68) and that between the automated diameter and volume timepoint responses was substantial to almost perfect (Cohen's k 0.81). Conclusions: The automated diameter measurement of preselected target lesions in follow-up CT is reliable and can potentially help to accelerate RECIST evaluation.