1Department of Surgery, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Korea
2Medical Artificial Intelligence Center, Hallym University Medical Center, Anyang, Korea
3Department of Radiology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang, Korea
© 2024 The Korean Society of Coloproctology
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Conflict of interest
Bo Young Oh and Il Tae Son are Editorial Board members of Annals of Coloproctology, but were not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflict of interest relevant to this article was reported.
Funding
None.
Author contributions
Conceptualization: ITS, BJC; Formal analysis: MK, PT; Investigation: MK, PT, MJK, BYO; Methodology: PT; Project administration: ITS, BJC; Supervision: MJK, BYO; Validation: MJK, BYO; Writing–original draft: MK; Writing–review & editing: all authors. All authors read and approved the final manuscript.
Table 1. Deep learning studies on tumor and lymph node segmentation in rectal cancer

Study | Country | Modality | Architecture | Internal dataset | External dataset | Automation | Performance | Comparison |
---|---|---|---|---|---|---|---|---|
Tumor segmentation ||||||||
Trebeschi et al. [26] (2017) | The Netherlands | T2WI, DWI | CNN | Training, 56; validation, 62; test, 14 | None | Manual annotation | DSC, 0.70; AUROC, 0.99 | None |
Wang et al. [27] (2018) | China | T2WI | CNN (2D U-Net) | Training, 84; validation, 9; test, 20 | None | Manual annotation | DSC, 0.74; Hausdorff distance, 20.44; average surface distance, 3.25; Jaccard index, 0.60 | None |
Kim et al. [28] (2019) | Korea | T2WI | CNN (U-Net, FCN-8, SegNet) | 133 | None | Manual annotation | U-Net: DSC, 0.81; sensitivity, 0.79; specificity, 0.98 | U-Net is superior to the other models |
Pang et al. [29] (2021) | China | T2WI | CNN (U-Net) | Training, 88; test, 46 | 34 | Manual annotation | DSC, 0.95; sensitivity, 0.97; specificity, 0.96 | None |
Knuth et al. [30] (2022) | Norway | T2WI | CNN (2D U-Net) | 109 | 83 | Manual annotation | DSC, 0.78 | None |
DeSilvio et al. [31] (2023) | USA | T2WI | CNN (region-specific U-Net, multiclass U-Net) | Training, 44; validation, 44; test, 49 | 11 | Manual annotation | Region-specific U-Net: DSC, 0.91; Hausdorff distance, 2.45 | Region-specific U-Net is superior to multiclass U-Net |
Zhang et al. [32] (2021) | China | T2WI, DWI | CNN (3D V-Net) | Training, 108; test, 94 | None | Manual annotation | DSC, 0.96 | None |
Jian et al. [34] (2018) | China | T2WI | CNN (VGG-16) | Training, 410; test, 102 | None | Fully automatic | DSC, 0.84; PPV, 0.83; sensitivity, 0.88; specificity, 0.97; Hammoude distance, 0.27; Hausdorff distance, 8.2 | VGG-16 is superior to U-Net |
Zhu et al. [35] (2021) | China | DWI | CNN (3D U-Net) | Training, 180; validation, 60; test, 60 | None | Fully automatic | DSC, 0.68 | Automatic model is superior to semiautomatic model |
LN segmentation ||||||||
Zhao et al. [33] (2020) | China | T2WI, DWI | CNN (Mask R-CNN) | 293 | 50 | Manual annotation | Detection: sensitivity, 0.63; PPV, 0.65; false-positive rate per case, 8.2. Segmentation: DSC, 0.81-0.82 | Mask R-CNN is superior to junior radiologists in LN detection |
T2WI, T2-weighted image; DWI, diffusion-weighted image; CNN, convolutional neural network; DSC, Dice similarity coefficient; AUROC, area under the receiver operating characteristic curve; 2D, 2-dimensional; FCN-8, fully convolutional network (8 pixels); 3D, 3-dimensional; LN, lymph node; PPV, positive predictive value.
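As a point of reference for the overlap metrics reported in Table 1, the following is a minimal sketch, assuming Python with NumPy and SciPy, of how the DSC, Jaccard index, and Hausdorff distance are computed for a pair of binary segmentation masks; the toy masks are purely illustrative and are not taken from any of the cited studies.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """DSC = 2|A n B| / (|A| + |B|) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

def jaccard_index(pred: np.ndarray, truth: np.ndarray) -> float:
    """Jaccard index = |A n B| / |A u B|; equivalently DSC / (2 - DSC)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return np.logical_and(pred, truth).sum() / np.logical_or(pred, truth).sum()

def hausdorff_distance(pred: np.ndarray, truth: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the voxel coordinates of two masks."""
    pred_pts, truth_pts = np.argwhere(pred), np.argwhere(truth)
    return max(directed_hausdorff(pred_pts, truth_pts)[0],
               directed_hausdorff(truth_pts, pred_pts)[0])

# Toy example: two partially overlapping 5x5 squares on a 10x10 grid.
pred = np.zeros((10, 10), dtype=bool); pred[2:7, 2:7] = True
truth = np.zeros((10, 10), dtype=bool); truth[3:8, 3:8] = True
print(f"DSC = {dice_coefficient(pred, truth):.2f}")          # 0.64
print(f"Jaccard = {jaccard_index(pred, truth):.2f}")         # 0.47
print(f"Hausdorff = {hausdorff_distance(pred, truth):.2f}")  # 1.41
```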
Table 2. Deep learning studies on assessment of T stage, N stage, and circumferential resection margin in rectal cancer

Study | Country | Modality | Architecture | Internal dataset | External dataset | Performance | Comparison |
---|---|---|---|---|---|---|---|
T stage |||||||
Kim et al. [28] (2019) | Korea | T2WI | CNN (AlexNet, Inception) | 133 | None | Inception: accuracy, 0.94; sensitivity, 0.88; specificity, 1.0 | Inception is superior to AlexNet |
Wu et al. [43] (2021) | China | T2WI | CNN (Faster R-CNN) | 183 | None | Coronal view: AUROC (T1), 0.96; AUROC (T2), 0.97; AUROC (T3), 0.97; AUROC (T4), 0.97 | None |
N stage |||||||
Lu et al. [44] (2018) | China | T2WI, DWI | CNN (Faster R-CNN) | 351 | 414 | AUROC, 0.91; diagnostic time, 20 sec vs. 600 sec | No difference in accuracy compared with radiologists; faster than radiologists |
Ding et al. [45] (2019) | China | T2WI, DWI | CNN (Faster R-CNN) | 414 | None | κ coefficient: AI vs. pathologist, 0.57; radiologist vs. pathologist, 0.47 | AI model is superior to radiologists |
Zhou et al. [46] (2019) | China | Pelvic HD MRI | CNN | Training, 201; test, 100 | None | AUROC, 0.89; diagnostic time, 10 sec vs. 600 sec | No difference in accuracy compared with radiologists; faster than radiologists |
Li et al. [47] (2021) | China | T2WI | CNN (InceptionV3) | 129 | None | Sensitivity, 0.95; specificity, 0.95; PPV, 0.95; NPV, 0.95; AUROC, 0.99 | AI is superior to radiologists |
Circumferential resection margin |||||||
Wang et al. [48] (2020) | China | T2WI | CNN (Faster R-CNN) | Training, 192; test, 48 | None | Accuracy, 0.93; sensitivity, 0.84; specificity, 0.96; AUROC, 0.95 | None |
Xu et al. [49] (2020) | China | T2WI | CNN (Faster R-CNN) | Training, 300; test, 50 | None | Accuracy, 0.88; sensitivity, 0.86; specificity, 0.90; AUROC, 0.93 | None |

AI, artificial intelligence; T2WI, T2-weighted image; CNN, convolutional neural network; AUROC, area under the receiver operating characteristic curve; DWI, diffusion-weighted image; MRI, magnetic resonance imaging; PPV, positive predictive value; NPV, negative predictive value.
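To make the N-stage metrics in Table 2 concrete, here is a minimal sketch, assuming Python with scikit-learn, of how a κ coefficient against pathology and an AUROC would be computed from per-patient predictions; the labels and scores below are hypothetical and serve only to illustrate the calculations.

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical per-patient nodal status: 1 = node-positive, 0 = node-negative.
pathology = [1, 0, 1, 1, 0, 0, 1, 0]  # reference standard
ai_calls  = [1, 0, 1, 0, 0, 0, 1, 1]  # model's binary predictions
ai_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6]  # model probabilities

# Agreement with pathology beyond chance (the comparison in Ding et al. [45]).
print(f"kappa = {cohen_kappa_score(pathology, ai_calls):.2f}")  # 0.50
# Threshold-free discrimination (the metric in Lu et al. [44]).
print(f"AUROC = {roc_auc_score(pathology, ai_scores):.2f}")     # 0.94
```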
Table 3. Deep learning studies on prediction of microsatellite instability and KRAS mutation

Study | Country | Modality | Architecture | Internal dataset | External dataset | Performance | Comparison |
---|---|---|---|---|---|---|---|
Microsatellite instability |||||||
Zhang et al. [61] (2021) | China | T2WI | CNN (3D MobileNetV2) | Training, 395; validation, 395; test, 96 | None | Image model: sensitivity, 0.89; specificity, 0.74; AUROC, 0.82. Clinical model: sensitivity, 1.00; specificity, 0.31; AUROC, 0.61 | Image model is superior to clinical model |
Cao et al. [62] (2023) | China | Enhanced APCT | CNN (ResNet101) | Training, 1,124; test, 482 | 206 | Internal validation: accuracy, 0.99; sensitivity, 1.0; specificity, 0.97; AUROC, 0.99. External validation: accuracy, 0.91; sensitivity, 0.90; specificity, 0.93; AUROC, 0.92 | None |
KRAS mutation |||||||
He et al. [63] (2020) | China | Enhanced APCT | CNN (ResNet) | Training, 117; test, 40 | None | AI model: sensitivity, 0.59; specificity, 1.0; AUROC, 0.93. Radiomics model: sensitivity, 0.70; specificity, 0.85; AUROC, 0.82 | AI model is superior to radiomics model |

AI, artificial intelligence; T2WI, T2-weighted image; CNN, convolutional neural network; 3D, 3-dimensional; AUROC, area under the receiver operating characteristic curve; APCT, abdominopelvic computed tomography.
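For clarity on the accuracy, sensitivity, and specificity values reported in Table 3, the following is a minimal sketch in plain NumPy of how they derive from a confusion matrix; the labels are hypothetical (1 = MSI-high, 0 = microsatellite stable) and do not reproduce any study's data.

```python
import numpy as np

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, sensitivity (TPR), and specificity (TNR) from binary labels."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # proportion of MSI-high cases detected
        "specificity": tn / (tn + fp),  # proportion of stable cases correctly ruled out
    }

# Hypothetical test-set labels: 1 = MSI-high, 0 = microsatellite stable.
y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 0])
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'sensitivity': 0.667, 'specificity': 0.8} (rounded)
```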
Table 4. Deep learning studies on prediction of response to chemoradiotherapy in rectal cancer

Study | Country | Modality | Architecture | Internal dataset | External dataset | Performance | Comparison |
---|---|---|---|---|---|---|---|
Shi et al. [79] (2019) | China | T2WI (pre- and mid-CRT) | CNN | 51 | None | pCR: AUROC (CNN), 0.83; AUROC (radiomics), 0.81. GR: AUROC (CNN), 0.74; AUROC (radiomics), 0.92 | No difference in predicting pCR; CNN model is inferior to radiomics model in predicting GR |
Zhang et al. [80] (2020) | China | T2WI (pre- and post-CRT) | CNN (models A, B, C) | Training, 290 | 93 | pCR: accuracy, 0.98; sensitivity, 1.00; specificity, 0.97; AUROC, 0.99 | Model A is superior to the other models |
Zhu et al. [81] (2020) | China | T2WI (pre-CRT) | CNN | Training, 400; validation, 100; test, 200 | None | GR: sensitivity, 0.93; specificity, 0.62; AUROC, 0.81 | None |
Jang et al. [78] (2021) | Korea | T2WI (post-CRT) | CNN (ShuffleNet) | Training, 303; validation, 46; test, 117 | None | pCR: accuracy, 0.85; sensitivity, 0.30; specificity, 0.96. GR: accuracy, 0.72; sensitivity, 0.54; specificity, 0.81 | AI model is superior to human readers in predicting pCR and GR |

AI, artificial intelligence; CRT, chemoradiotherapy; T2WI, T2-weighted image; CNN, convolutional neural network; pCR, pathologic complete response; AUROC, area under the receiver operating characteristic curve; GR, good response.
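Several of the studies in Table 4 fine-tune an ImageNet-pretrained classifier on restaging MRI. As an illustration only, and not the actual pipeline of Jang et al. [78], here is a minimal sketch, assuming PyTorch and torchvision, of adapting a pretrained ShuffleNetV2 to binary pCR prediction from single-channel T2WI slices.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ShuffleNetV2 backbone.
model = models.shufflenet_v2_x1_0(weights="IMAGENET1K_V1")

# T2WI slices are grayscale, so swap the 3-channel stem for a 1-channel one,
# and replace the 1,000-class head with a 2-class head (pCR vs. non-pCR).
model.conv1[0] = nn.Conv2d(1, 24, kernel_size=3, stride=2, padding=1, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)

# Dummy batch of four 224x224 grayscale slices to confirm the shapes line up.
x = torch.randn(4, 1, 224, 224)
logits = model(x)
print(logits.shape)  # torch.Size([4, 2])
```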