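Literature Digest: Deep Learning for Disease Prognosis -- Accurate diagnosis and prognosis prediction of gastric cancer using deep learning on digital pathological images: a retrospective multicentre study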

Title

Accurate diagnosis and prognosis prediction of gastric cancer using deep learning on digital pathological images: A retrospective multicentre study

Introduction

Gastric cancer (GC) is the fifth most common malignant disease and the third leading cause of cancer-related death worldwide. For patients with early GC, the 5-year survival rate can exceed 90%. However, approximately half of patients with GC have already progressed to an advanced stage at the time of diagnosis, at which point the 5-year survival rate drops below 30%. To reduce GC mortality, early detection and appropriate treatment are crucial, and precise, efficient pathology services are indispensable to that goal.

Pathological evaluation remains the gold standard for the diagnosis of GC. Conventionally carried out by pathologists, it is labor-intensive, tedious, and time-consuming. A severe shortage of pathologists and a heavy diagnostic workload are widespread problems globally, and both negatively affect diagnostic accuracy.

Accordingly, a new method is needed to diagnose GC conveniently and accurately from pathological images.

Surgery is the main treatment for GC, followed by adjuvant treatments including chemoradiotherapy and molecular targeted therapy.

Results

Patient characteristics

A total of 871 GC patients were initially screened from the RHWU cohort, and 588 with tumour tissue blocks were eligible for the study. In the TCGA cohort, 449 GC patients with digital H&E-stained pathological images were eligible, as were 91 in the NHGRP cohort. A total of 1276 images from the RHWU cohort and 1057 images from the TCGA cohort were obtained for the development of the GastroMIL model. After data augmentation, 3221 pictures (malignant:normal = 1574:1647) were finally enrolled in the GastroMIL model; 70% (N = 2261) were randomly assigned to the training set and the remaining 30% (N = 960) to the internal validation set. The 175 pictures from the independent NHGRP cohort were used as the external validation set. The detailed data distribution is shown in Supplementary Table 1.
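The cohort figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-uses numbers stated in the text; the variable names are illustrative and not taken from the paper.

```python
# Counts reported in the text (variable names are illustrative only).
rhwu_images, tcga_images = 1276, 1057      # images before augmentation
rhwu_patients, tcga_patients = 588, 449    # eligible patients per cohort
malignant, normal = 1574, 1647             # pictures after augmentation
train_n, val_n = 2261, 960                 # training / internal validation

total_augmented = malignant + normal
development_images = rhwu_images + tcga_images
development_patients = rhwu_patients + tcga_patients

train_fraction = train_n / total_augmented
val_fraction = val_n / total_augmented

print(development_images)        # 2333
print(development_patients)      # 1037
print(total_augmented)           # 3221
print(round(train_fraction, 3))  # 0.702
print(round(val_fraction, 3))    # 0.298
```

The development totals (2333 pictures, 1037 patients) agree with the Methods summary, and the reported split is a 70/30 partition to within rounding.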

Methods

A total of 2333 hematoxylin and eosin-stained pathological pictures from 1037 GC patients were collected from two cohorts, Renmin Hospital of Wuhan University (RHWU) and the Cancer Genome Atlas (TCGA), to develop our algorithms. Additionally, we obtained 175 digital pictures from 91 GC patients from the National Human Genetic Resources Sharing Service Platform (NHGRP), which served as the independent external validation set. Two models were developed using artificial intelligence (AI): GastroMIL for diagnosing GC, and MIL-GC for predicting the outcome of GC.

Figures

Fig. 1. Flow chart of the developed models. The framework of GastroMIL is shown in a-b, and that of MIL-GC in a-c. Pathological images are input, and tiles of 224 × 224 pixels are generated from each image (a). The CNN classifier of the MIL model outputs the probability that each tile is malignant. A heat map visualizes the ROIs identified by the model. Feature vectors of dimension 608 are extracted from the most suspicious tiles. The feature vectors of the K most suspicious tiles are input to the second layer of MIL and aggregated by an RNN, which generates the final diagnosis prediction for the input image; in this study K = 32 (b). The feature vectors of the S most suspicious tiles are input to the prognosis model (in this study S = 128). In the MIL-GC model, each feature vector yields a probability value through an MLP. The probability values of the 128 most suspicious tiles of the input picture are averaged to give the output risk score (c). CNN, convolutional neural network; RNN, recurrent neural network; MIL, multiple instance learning; MLP, multilayer perceptron; ROI, region of interest.
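The tile-level steps in the Fig. 1 caption (tiling, ranking tiles by malignancy probability, averaging per-tile outputs into a risk score) can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the CNN, RNN and MLP are left out, the function names are hypothetical, and non-overlapping tiling that drops partial edge tiles is an assumption the caption does not confirm.

```python
from typing import List, Sequence, Tuple

TILE = 224         # tile edge in pixels (from the caption)
K_DIAGNOSIS = 32   # tiles passed to the RNN aggregator (from the caption)
S_PROGNOSIS = 128  # tiles averaged into the risk score (from the caption)


def tile_origins(width: int, height: int, tile: int = TILE) -> List[Tuple[int, int]]:
    """Top-left corners of tiles that fit entirely inside the image.

    Assumption: tiles do not overlap and partial edge tiles are dropped.
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def top_k_indices(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest tile probabilities, most suspicious first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def risk_score(tile_probs: Sequence[float],
               tile_risks: Sequence[float],
               s: int = S_PROGNOSIS) -> float:
    """Mean per-tile risk over the s most suspicious tiles.

    tile_probs: malignancy probability per tile (stand-in for the CNN output).
    tile_risks: per-tile probability from the prognosis head (stand-in for the MLP).
    """
    chosen = top_k_indices(tile_probs, s)
    if not chosen:
        raise ValueError("no tiles to aggregate")
    return sum(tile_risks[i] for i in chosen) / len(chosen)


# Toy usage with made-up numbers.
origins = tile_origins(1000, 500)
print(len(origins))                                  # 8
ranked = top_k_indices([0.1, 0.9, 0.5, 0.7], 2)
print(ranked)                                        # [1, 3]
score = risk_score([0.1, 0.9, 0.5, 0.7], [0.0, 1.0, 0.5, 0.5], s=2)
print(score)                                         # 0.75
```

When an image yields fewer than S tiles, this sketch simply averages over all of them; how the paper handles that case is not stated in the caption.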

Fig. 2. Diagnostic abilities of GastroMIL at different magnifications in the training and internal validation sets. a-c, ROC curves in the training set for images at 5×, 10× and 20× magnification, respectively; e-g, ROC curves in the internal validation set for images at 5×, 10× and 20× magnification, respectively. The AUC, Acc, Sen and Spe for the training and internal validation sets are shown in d and h, respectively. ROC, receiver operating characteristic; AUC, area under the curve; Acc, accuracy; Sen, sensitivity; Spe, specificity.
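For readers less familiar with the four metrics named in Fig. 2, the sketch below computes them from labels and predicted probabilities using their standard definitions. The 0.5 decision threshold and the toy data are assumptions made for illustration; the caption does not state which threshold the study used.

```python
from typing import Sequence, Tuple


def acc_sen_spe(labels: Sequence[int], scores: Sequence[float],
                threshold: float = 0.5) -> Tuple[float, float, float]:
    """Accuracy, sensitivity and specificity at a fixed threshold.

    labels: 1 for malignant, 0 for normal. The 0.5 default is an assumption.
    """
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
acc, sen, spe = acc_sen_spe(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(round(acc, 3), round(sen, 3), round(spe, 3))  # 0.667 0.667 0.667
print(round(toy_auc, 3))                            # 0.889
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable on large sets.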

Fig. 3. Heat maps of the RHWU cohort. a-d, pathological images and corresponding heat maps for pathological TNM stage I, II, III, and IV from the RHWU cohort, respectively. The actual tumor regions annotated by expert pathologists are outlined in yellow.

Fig. 4. Prognostic significance of the risk score generated by MIL-GC in the internal validation set. HRs for prediction of survival by the MIL-GC model and other clinicopathological indexes based on univariate (a) and multivariate (b) analyses. The output score was converted into a binary score (high or low risk), using the median value of the training set as the threshold. KM survival curves for the internal validation set (c) and selected subgroups: age ≤ 60 (d); age > 60 (e); histologic grade 1-2 (f); histologic grade 3-4 (g); pT stage 3-4 (h); pN stage 0-1 (i); pN stage 2-3 (j); pTNM stage 1-2 (k) and pTNM stage 3-4 (l). Asterisks denote significance: **, P < 0.01; *, P < 0.05; the strongest level marked is P < 0.0001. The P-value of each Kaplan-Meier survival curve was evaluated by the log-rank test. The P-value of each HR was calculated by Cox analysis.
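The two analysis steps named in the Fig. 4 caption, dichotomising the risk score at the training-set median and estimating Kaplan-Meier survival, can be sketched as follows. This is an illustrative sketch with made-up numbers: the function names are hypothetical, treating a score equal to the median as low risk is an assumption, and the log-rank test and Cox analysis are not reproduced here.

```python
from statistics import median
from typing import List, Sequence, Tuple


def dichotomise(scores: Sequence[float], threshold: float) -> List[str]:
    """Label each score 'high' or 'low' risk relative to a threshold.

    Assumption: scores equal to the threshold count as low risk.
    """
    return ["high" if s > threshold else "low" for s in scores]


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival probability) at each event time.

    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data, not from the study.
train_scores = [0.2, 0.4, 0.6, 0.9]
cutoff = median(train_scores)                       # threshold from the training set
groups = dichotomise([0.3, 0.7, 0.5], cutoff)
print(cutoff, groups)                               # 0.5 ['low', 'high', 'low']

km = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in km])            # [(5, 0.8), (8, 0.6), (12, 0.3)]
```

Fixing the threshold on the training set and reusing it unchanged on the validation sets, as the caption describes, keeps the cut-off from being tuned on the data it is evaluated on.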

Fig. 5. Diagnostic and prognostic performance in the external validation set. The ROC curve (a) and HRs based on univariate (b) and multivariate (d) analyses are shown. KM survival curves for the external validation set (c) and selected subgroups: age > 60 (e); tumour size ≤ 5 (f); histologic grade 3 (g); pT stage 3 (h); pN stage 0 (i); pN stage 3 (j); pM stage 0 (k); pTNM stage 2 (l) and pTNM stage 3 (m). ***, P < 0.001; **, P < 0.01; *, P < 0.05. ROC, receiver operating characteristic; AUC, area under the curve. The P-value of each Kaplan-Meier survival curve was evaluated by the log-rank test. The P-value of each HR was calculated by Cox analysis.

Fig. 6. Representative predictive tiles produced by our model. These tiles show obvious tumour heterogeneity, including necrosis (a), nerve invasion (b), signet ring cells (c), intravasated cancer cells (d), muscularis propria invasion (e), and mucous secretion (f).

Tables

Table 1. Baseline characteristics in the prognostic model (MIL-GC).

Table 2. Accuracy, sensitivity and specificity of the diagnostic model (GastroMIL) and human pathologists.