QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

H. Bran, F. Navarro, I. Ezhov, A. Bayat, D. Das, F. Kofler, S. Shit, D. Waldmannstetter, J. Paetzold, X. Hu, B. Wiestler, L. Zimmer, T. Amiranashvili, C. Prabhakar, C. Berger, J. Weidner, M. Alonso-Basant, A. Rashid, U. Baid, W. Adel, D. Ali, B. Baheti, Y. Bai, I. Bhatt, S. Cetindag, W. Chen, L. Cheng, P. Dutand, L. Dular, M. Elattar, M. Feng, S. Gao, H. Huisman, W. Hu, S. Innani, W. Jiat, D. Karimi, H. Kuijf, J. Kwak, H. Le, X. Lia, H. Lin, T. Liu, J. Ma, K. Ma, T. Ma, I. Oksuz, R. Holland, A. Oliveira, J. Pal, X. Pei, M. Qiao, A. Saha, R. Selvan, L. Shen, J. Silva, Z. Spiclin, S. Talbar, D. Wang, W. Wang, X. Wang, Y. Wang, R. Xia, K. Xu, Y. Yan, M. Yergin, S. Yu, L. Zeng, Y. Zhang, J. Zhao, Y. Zheng, M. Zukovec, R. Do, A. Becker, A. Simpson, E. Konukoglu, A. Jakab, S. Bakas, L. Joskowicz and B. Menze

arXiv:2405.18435 2024.

Uncertainty in medical image segmentation, especially inter-rater variability arising from differences in how experts interpret and annotate images, presents a significant challenge to achieving consistent and reliable segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2020 and 2021. The challenge focuses on uncertainty quantification in medical image segmentation, taking into account the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations covers various modalities, such as MRI and CT; various organs, such as the brain, prostate, kidney, and pancreas; and both 2D and 3D images. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The results indicate the importance of ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.