Vision Language Foundation Models for Scoring Tumor-Infiltrating Lymphocytes in Breast Cancer through Text Prompting

M. Stegeman, G. Bogina, E. Munari, J. van der Laak and F. Ciompi

European Congress on Digital Pathology 2024.

Introduction

We explored the potential of PLIP, a generalist vision-language foundation model for pathology, to quantify tumor-infiltrating lymphocytes (TILs) in breast cancer via text prompting. In contrast to task-specific deep learning models, which are trained with manual annotations for tasks such as tissue segmentation or cell detection, we used textual prompts to tailor a single model to assess multiple morphological features.

Material and methods

We prompted PLIP with the strings "An HE image containing tumor associated stroma" and "An HE image containing high amount of lymphocytes", and used the cosine similarity between the text embeddings and the embeddings of non-overlapping patches to obtain likelihood maps for these features. The cosine similarity for lymphocytes served as a surrogate for TIL density, assessed only in patches in which the tumor-associated stroma likelihood exceeded a threshold computed on the validation set. We used two datasets with slide-level TILs scores assessed by a pathologist: we optimized on 82 biopsies and resections from the WSITILS training subset of the TIGER challenge, and evaluated on 56 external biopsies from the Verona hospital. We also compared our approach with the TIGER submission scoring highest on segmentation and detection.
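
The following minimal Python sketch illustrates this pipeline. It assumes the publicly released PLIP checkpoint on HuggingFace ("vinid/plip") loaded through the transformers CLIP interface, and it averages lymphocyte similarity over stroma-positive patches to produce a slide-level score; that aggregation is our assumption, as the abstract does not specify how per-patch similarities are reduced.

    import torch
    from transformers import CLIPModel, CLIPProcessor

    # Load PLIP; "vinid/plip" is the assumed HuggingFace checkpoint name.
    model = CLIPModel.from_pretrained("vinid/plip")
    processor = CLIPProcessor.from_pretrained("vinid/plip")
    model.eval()

    PROMPTS = [
        "An HE image containing tumor associated stroma",
        "An HE image containing high amount of lymphocytes",
    ]

    @torch.no_grad()
    def prompt_similarities(patches):
        # Cosine similarity between each patch embedding and each prompt
        # embedding; `patches` is a list of PIL images of non-overlapping
        # patches cut from the whole-slide image.
        inputs = processor(text=PROMPTS, images=patches,
                           return_tensors="pt", padding=True)
        out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        # Shape (num_patches, 2): column 0 = stroma, column 1 = lymphocytes.
        return img @ txt.T

    def slide_til_score(patches, stroma_threshold):
        # Slide-level TIL surrogate: lymphocyte similarity read out only in
        # patches whose stroma likelihood exceeds the threshold tuned on the
        # validation set. Averaging over those patches is our assumption.
        sims = prompt_similarities(patches)
        stroma_mask = sims[:, 0] > stroma_threshold
        if not stroma_mask.any():
            return 0.0
        return sims[stroma_mask, 1].mean().item()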

Results and discussion

Our approach yielded TILs scores with a Pearson correlation of 0.57 with the pathologist's assessment on the external biopsies, on which the TIGER algorithm achieved a Pearson correlation of 0.74.
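
For reference, the reported agreement is a standard Pearson correlation between slide-level scores, computed as below; the arrays hold placeholder values, not data from the study.

    from scipy.stats import pearsonr

    # Placeholder values, not study data: model-derived TIL scores and the
    # pathologist's slide-level TILs assessments for the same biopsies.
    model_scores = [0.12, 0.34, 0.08, 0.51]
    pathologist_scores = [10, 40, 5, 60]

    r, p_value = pearsonr(model_scores, pathologist_scores)
    print(f"Pearson r = {r:.2f} (p = {p_value:.3g})")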

Conclusion

While our approach showed a lower Pearson correlation than the TIGER algorithm, which was specifically tuned for this problem, we demonstrated a viable strategy that circumvents the need for extensive data collection, manual annotation, and training of task-specific deep learning models for individual tasks.