NPJ_Digital_Medicine_2024(1-141)

NPJ Digital Medicine 2024 Article Compilation (1)

  • Automatically scraped data; not manually verified.

Articles included: 358

Contents

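  1. A simulation based digital twin approach to assessing the organization of response to emergency calls.
  2. A deep learning based smartphone application for early detection of nasopharyngeal carcinoma using endoscopic images.
  3. Aligning knowledge concepts to whole slide images for precise histopathology image analysis.
  4. A novel sweat sensor detects inflammatory differential rhythmicity patterns in inpatients and outpatients with cirrhosis.
  5. Contrasting rule and machine learning based digital self triage systems in the USA.
  6. Fast and accurate prediction of drug induced proarrhythmic risk with sex specific cardiac emulators.
  7. Ecologically sustainable benchmarking of AI models for histopathology.
  8. Building a modular and multi-cellular virtual twin of the synovial joint in Rheumatoid Arthritis.
  9. An umbrella review on how digital health intervention co-design is conducted and described.
  10. Adaptive spatiotemporal encoding network for cognitive assessment using resting state EEG.
  11. A Novel method for quantifying fluctuations in wearable derived daily cardiovascular parameters across the menstrual cycle.
  12. Improving access to cardiovascular care for 1.4 billion people in China using telehealth.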
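  13. Ethical data acquisition for LLMs and AI algorithms in healthcare.
  14. Human-AI collaboration for unsupervised categorization of live surgical feedback.
  15. Mitigating AI adoption bias through an improved autonomous AI system for diabetic retinal disease.
  16. AI technologies supporting adaptive functioning in neurodevelopmental conditions in everyday environments: a systematic review.
  17. Probabilistic medical predictions of large language models.
  18. A paradigm for optimizing human-AI collaborative clinical coding.
  19. Prospective comparison of two computer-aided detection systems with different false positive rates in colonoscopy.
  20. The emergence of medical futures studies reveals the untapped potential of medicine and healthcare.
  21. Hospital admission prediction in the emergency department using computer vision AI and short patient video clips.
  22. Advancing digital sensing in mental health research.
  23. SurgeryLLM: a retrieval-augmented generation large language model framework for surgical decision support and workflow enhancement.
  24. Estimation of minimal dataset sizes for machine learning predictions in digital mental health interventions.
  25. A systematic review and meta-analysis of adverse events in clinical trials of mental health apps.
  26. Autonomous medical evaluation of guideline adherence of large language models.
  27. Predicting control of cardiovascular disease risk factors in South Asia using machine learning.
  28. Deep learning biomarker of chronometric and biological ischemic stroke lesion age from non-contrast CT.
  29. Real-world associations between digital markers of circadian disruption and mental health risks.
  30. Predicting future fallers in Parkinson's disease using kinematic data over a period of 5 years.
  31. AI-enabled wearable cameras for assisting dietary assessment in African populations.
  32. A scoping review of pediatric sepsis prediction technologies in healthcare.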
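  33. Leveraging natural language processing to aggregate field safety notices of medical devices across the EU.
  34. Exploring the limitations and capabilities of diffusion models for anatomical editing in digital twins.
  35. AI-related safety issues associated with FDA medical device reports.
  36. Reimbursement in the era of radiology AI.
  37. Developing guidelines for the responsible use of AI: a comprehensive case study for healthcare institutions.
  38. Identifying Parkinson's disease and its stages using static standing balance.
  39. Phenotyping people with a history of injecting drug use within electronic medical records using an interactive machine learning approach.
  40. Impact of human and AI collaboration on workload reduction in medical image interpretation.
  41. A machine learning model to predict weight loss success using weight change features early in treatment.
  42. A framework for optimal antibiotic selection for sepsis patients using AI.
  43. Natural language processing in a mixed-methods evaluation of a digital sleep-alcohol intervention for young adults.
  44. Semi-automated quantification of REM sleep without atonia in natural sleep environments.
  45. A virtual scalable model of the hepatic lobule for predicting acetaminophen hepatotoxicity.
  46. The path forward for large language models in medicine is open.
  47. A primer on reinforcement learning in medicine for clinicians.
  48. AI awarded two Nobel Prizes for innovations shaping the future of medicine.
  49. Enhancing spatial resolution with deep learning improves thoracic disease diagnosis based on thick-slice CT.
  50. A systematic review of user perspectives on AI decision aids to inform shared decision-making.
  51. A data-driven framework for identifying patient subgroups on which an AI/machine learning model may underperform.
  52. Phenotype-driven molecular genetic test recommendation for diagnosing pediatric rare disorders.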
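  53. Learning from the EHR to implement AI in healthcare.
  54. Quality and safety of patient-centered discharge instructions generated using generative AI.
  55. An iterative approach for estimating domain-specific cognitive abilities from large-scale online cognitive data.
  56. An explainable machine learning model for digital lung cancer prescreening in a Chinese population with missing data.
  57. Strategies for cost-effective use of large language models at health system scale.
  58. Simulated A/B testing versus SMART designs for LLM-driven patient engagement to close preventive care gaps.
  59. Accurate prediction of mood episodes in patients with mood disorders using wearable sleep and circadian rhythm features.
  60. Developing a Canadian AI medical curriculum using a Delphi study.
  61. A reinforcement learning model for optimizing dexmedetomidine dosing to prevent delirium in critically ill patients.
  62. Cost-effectiveness analysis of mHealth applications for depression in Germany using a Markov cohort simulation.
  63. Effect of digital health interventions on patient activation in chronic kidney disease.
  64. Multi-source representation learning for pediatric knowledge extraction from electronic health records.
  65. Simulated misuse of large language models and clinical credit systems.
  66. Accuracy and efficiency of augmented reality drilling trajectories versus conventional navigation in a randomized crossover trial.
  67. Post-marketing surveillance of anticancer drugs using natural language processing of electronic medical records.
  68. AI-assisted surgical anatomy recognition in endoscopic pituitary surgery.
  69. Interactive eHealth interventions to promote safer sex among men who have sex with men.
  70. Automated decision-making for Barrett's oesophagus: development and deployment of a natural language processing tool.
  71. Transforming a single-institution diabetes care platform into a nationally available turnkey solution.
  72. Bridging organ transcriptomics to advance multi-organ toxicity assessment via generative AI approaches.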
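  73. Deep learning-based high-precision thickness measurement of transplanted bioengineered corneal equivalents using optical coherence tomography.
  74. AI applied to coronary artery calcium scans (AI-CAC) significantly improves cardiovascular event prediction.
  75. The R.O.A.D. to precision medicine.
  76. Predicting dengue deterioration using continuous clinical monitoring with low-cost wearables.
  77. A digital, decentralized trial of exercise therapy in patients with cancer.
  78. PRISM: Patient Records Interpretation for Semantic clinical trial Matching system using large language models.
  79. Clinically applicable optimized periprosthetic joint infection diagnosis via AI-based pathology.
  80. Publicly available evidence on AI products for digital pathology.
  81. Transmission line model as a digital twin for abdominal aortic aneurysm patients.
  82. Process mining in mHealth data analysis.
  83. Medical large language models are susceptible to targeted misinformation attacks.
  84. Clinical effectiveness of digital twin-guided virtual amiodarone testing in patients undergoing atrial fibrillation ablation.
  85. Development and evaluation of a machine learning tool for predicting emergency admission in Scotland.
  86. Predicting total and regional body composition from 3D body shape.
  87. Detecting clinical medication errors with AI-enabled wearable cameras.
  88. Evaluation and mitigation of cognitive biases in medical language models.
  89. Finding Long-COVID: temporal topic modeling of electronic health records from the N3C and RECOVER programs.
  90. Early detection of dementia through retinal imaging and trustworthy AI.
  91. Privacy-enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis.
  92. Feasibility of snapshot testing using wearable sensors to detect cardiorespiratory illness (COVID-19 infection in India).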
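  93. Biologically informed deep neural networks provide quantitative assessment of intratumoral heterogeneity in post-treatment glioblastoma.
  94. Guidance for unbiased predictive information for healthcare decision-making and equity (GUIDE): considerations when race may be a prognostic factor.
  95. Addressing fairness issues in deep learning-based medical image analysis: a systematic review.
  96. A randomized clinical trial testing a digital mindset intervention for knee osteoarthritis pain and activity improvement.
  97. Federated learning based on knowledge abstraction and filtering across heterogeneous data views in healthcare.
  98. India's evolving digital health strategy.
  99. Prospective clinical evaluation of deep learning for ultrasonographic screening of abdominal aortic aneurysms.
  100. Privacy-friendly evaluation of patient data with secure multiparty computation in a European pilot study.
  101. A scoping review of advances in non-invasive wearable technology for heart failure management.
  102. A systematic review and meta-analysis of standalone digital interventions for cognitive symptoms in people without dementia.
  103. Personalized decision-making for on-scene resuscitation time in out-of-hospital cardiac arrest using reinforcement learning.
  104. Screening chronic kidney disease through deep learning utilizing ultra-wide-field fundus images.
  105. Enabling data linkage for rare diseases in a resilient environment with the SERDIF framework.
  106. Improving prognostic accuracy in lung transplantation using unique features of isolated human lung radiographs.
  107. A scoping review of reporting gaps in FDA-approved AI medical devices.
  108. Algorithmovigilance, lessons from pharmacovigilance.
  109. Ethical guidance for reporting and evaluating claims of AI outperforming human doctors.
  110. Harmonizing the evaluation and reimbursement of digital medical devices in the EU through mutual learning.
  111. Machine learning explains response variability of deep brain stimulation on Parkinson's disease quality of life.
  112. The Olympic Games open up to athletes partnering in the construction of AI systems.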
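  113. Overcoming biases of individual-level shopping history data in health research.
  114. Impact of AI implementation on efficiency in medical imaging: a systematic literature review and meta-analysis.
  115. Talking about diseases; developing a model of patient- and public-prioritised disease phenotypes.
  116. Using machine learning models to identify those unlikely to benefit from total knee arthroplasty.
  117. Indirect treatment comparison meta-analysis of digital versus face-to-face cognitive behavioral therapy for headache.
  118. Merging space and time to talk about cancer in extended reality.
  119. Deep learning for identifying personal and family history of suicidal thoughts and behaviors from electronic medical records.
  120. Comparison of NLP machine learning models with human physicians for ASA Physical Status classification.
  121. A framework for human evaluation of large language models in healthcare derived from a literature review.
  122. Privacy-preserving large language models for structured medical information retrieval.
  123. Zero-shot health trajectory prediction using transformers.
  124. Regulatory responses and approval status of AI medical devices with a focus on China.
  125. Drug combination and dosing decision algorithm for individualized type 2 diabetes management.
  126. A systematic review and meta-analysis of digital mental health interventions in inpatient settings.
  127. Deep behavioral representation learning reveals risk profiles for malignant ventricular arrhythmias.
  128. How can hospital at home achieve a greener and healthier future?
  129. Variational Bayes machine learning for risk adjustment of general outcome indicators, with examples from urology.
  130. Results and implications of generative AI in a large introductory biomedical and health informatics course.
  131. A randomized trial testing a digital medicine support model for mild-to-moderate alcohol use disorder.
  132. Long-term changes in wearable sensor data in people with and without Long COVID.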
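  133. Can social media encourage diabetes self-screening? A randomized controlled trial among Facebook users in Indonesia.
  134. A randomized controlled trial investigating experiential virtual reality communication on prudent antibiotic use.
  135. Ethical debates amidst flawed healthcare AI metrics.
  136. Understanding activity and physiology at scale: the Apple Heart and Movement Study.
  137. Transatlantic transferability and replicability of machine learning algorithms to predict mental health crises.
  138. Adherence to non-pharmaceutical interventions following COVID-19 vaccination: a federated cohort study.
  139. Closing the gap between open-source and commercial large language models for medical evidence summarization.
  140. Navigating the EU AI Act: implications for regulated digital medical products.
  141. Establishing a longitudinal hemodynamic mapping framework for wearable-driven coronary digital twins.
  142. MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images.
  143. Integrating digital gait data with metabolomics and clinical data to predict outcomes in Parkinson's disease.
  144. Derivation, external and clinical validation of a deep learning approach for detecting intracranial hypertension.
  145. AI-estimated electrocardiographic age as a predictor of recurrence after catheter ablation of atrial fibrillation.
  146. Regulatory considerations for developing remote measurement technologies for Alzheimer's disease research.
  147. Developing, deploying, and scaling operating room-ready AI to support real-time surgical decision-making.
  148. Mapping the regulatory landscape for AI in health within the European Union.
  149. A trust-based framework for medical AI.
  150. Personalised dose selection for the first Waldenström macroglobulinemia patient in the PRECISE CURATE.AI trial.
  151. Digital twins in dermatology, current status, and the road ahead.
  152. A scoping review of large language model-based approaches for information extraction from radiology reports.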
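  153. Multimodal fusion learning for long QT syndrome pathogenic genotypes in a racially diverse population.
  154. Deep learning prediction of antibiotic-induced Clostridioides difficile infection using longitudinal electronic health records.
  155. Cost-effectiveness of incorporating self-imaging optical coherence tomography into fundus photography-based diabetic retinopathy screening.
  156. A portable and efficient dementia screening tool using eye tracking, machine learning, and virtual reality.
  157. Digital biomarkers for precision diagnosis and monitoring in Parkinson's disease.
  158. FDA launches a home healthcare initiative to drive equity in digital healthcare.
  159. Bridging the gap: addressing telehealth disparities across specialties in the sustained pandemic era.
  160. EU-US data transfers: an enduring challenge for health research collaborations.
  161. Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling.
  162. Dose-response relationship between computerized cognitive training and cognitive improvement.
  163. Robust automated calcification meshing for personalized cardiovascular biomechanics.
  164. Lessons learned from an unsuccessful decentralized clinical trial in oncology.
  165. Core elements of national policy for digital health technology evidence and access.
  166. Biometrics of complete human pregnancy recorded by wearable devices.
  167. Navigating the EU AI Act for healthcare.
  168. Disparities in clinical studies of AI applications from a global perspective.
  169. Responsible development of clinical speech AI: bridging the gap between clinical research and technology.
  170. Evaluating multimodal AI in medical diagnostics.
  171. Deep learning system for myopia onset prediction and intervention effectiveness evaluation in children.
  172. The METRIC framework for assessing data quality for trustworthy AI in medicine: a systematic review.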
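  173. Eye tracking insights into physician behaviour with safe and unsafe explainable AI recommendations.
  174. AI-enhanced reconstruction of the 12-lead electrocardiogram from 3 leads with accurate clinical assessment.
  175. Joint AI-driven event prediction and longitudinal modeling in newly diagnosed and relapsed multiple myeloma.
  176. Expert gaze as a usability indicator of medical AI decision support systems: a preliminary study.
  177. Estimating ventilatory thresholds during exercise using respiratory wearable sensors.
  178. Clinical phenotypes and short-term outcomes based on prehospital point-of-care testing and on-scene vital signs.
  179. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
  180. Orchestrating explainable AI for multimodal and longitudinal data in medical imaging.
  181. Autonomous AI for diabetic eye disease can increase access and health equity in underserved populations.
  182. Estimating household secondary attack rate and serial interval of COVID-19 using social media.
  183. Large language models outperform mental and medical health care professionals in identifying obsessive-compulsive disorder.
  184. A systematic review of the impacts of remote patient monitoring (RPM) interventions on safety, adherence, quality-of-life and cost-related outcomes.
  185. A survey of skin tone assessment in prospective research.
  186. From virtual patients to digital twins in immuno-oncology: lessons learned from mechanistic quantitative systems pharmacology modeling.
  187. Digital biomarkers for non-motor symptoms in Parkinson's disease: the state of the art.
  188. UCHealth's virtual health center: how Colorado's largest health system creates and integrates technology into patient care.
  189. Identifying Parkinson's disease PACE subtypes and repurposing treatments through integrative analysis of multimodal data.
  190. From silos to synergy: integrating academic health informatics with operational IT for healthcare transformation.
  191. The ethics of ChatGPT in medicine and healthcare: a systematic review on large language models (LLMs).
  192. Accuracy of dental implant placement using different dynamic navigation and robotic systems: an in vitro study.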
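  193. Deep learning for multi-type infectious keratitis diagnosis: a nationwide, cross-sectional, multicenter study.
  194. Quantifying impairment and disease severity using AI models trained on healthy subjects.
  195. A systematic umbrella review and meta-analysis of eHealth and mHealth interventions for improving lifestyle behaviours.
  196. Competing interests: digital health and indigenous data sovereignty.
  197. Pediatric sex estimation using AI-enabled ECG analysis: the influence of pubertal development.
  198. How can regulation and reimbursement better accommodate flexible suites of digital health technologies?
  199. Recommendations to promote equity in digital health: a systematic review of qualitative studies.
  200. Do engagement and behavioural mechanisms underpin the effectiveness of the Drink Less app?
  201. Deep learning to quantify care manipulation activities in neonatal intensive care units.
  202. A multi-center study on the adaptability of a shared foundation model for electronic health records.
  203. Wearable sensor-based quantitative gait analysis in Parkinson's disease patients with different motor subtypes.
  204. AI-enhanced electrocardiography-derived body mass index as a predictor of future cardiometabolic disease.
  205. A scoping review assessing the usability of digital health technologies targeting people with multiple sclerosis.
  206. Can AI improve the uncomfortable relationship between medicine and mathematics?
  207. Validation and application of computer vision algorithms for video-based tremor analysis.
  208. Development of an effective predictive screening tool for prostate cancer using the ClarityDX machine learning platform.
  209. Development and validation of a smartphone-based deep learning system for detecting middle-ear conditions in otoscopic images.
  210. Five million nights: temporal dynamics in human sleep phenotypes.
  211. PatchSorter: a high-throughput deep learning digital pathology tool for object labeling.
  212. From wearable sensor data to digital biomarker development: ten lessons learned and a framework proposal.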
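  213. Head movement dynamics in dystonia: a multi-centre retrospective study using visual perceptive deep learning.
  214. Digital health technologies need regulation and reimbursement that enable flexible interactions and groupings.
  215. Exergaming and cognitive function in people with mild cognitive impairment and dementia: a meta-analysis.
  216. Effectiveness of telehealth versus in-person care during the COVID-19 pandemic: a systematic review.
  217. Lessons learned from the first FDA-authorized AI skin cancer device for primary care.
  218. Capturing relationships between suturing sub-skills to improve automatic suturing assessment.
  219. Computationally derived transition points across phases of clinical care.
  220. Evaluating calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system.
  221. Deployment and validation of the CLL treatment infection model adjoined to an EHR system.
  222. Circadian rhythm analysis using wearable-based accelerometry as a digital biomarker of aging and healthspan.
  223. Effectiveness of digital twins in promoting precision health across the entire population: a systematic review.
  224. Automated estimation of home sleep apnea using deep learning.
  225. Deep learning assessment of microsatellite instability directly from histopathological whole slide images in endometrial cancer.
  226. Wearable sensors and machine learning can estimate step length in older adults and patients with neurological disorders.
  227. Three-year evolution of Germany's digital therapeutics reimbursement program and its path forward.
  228. A digital twin model incorporating generalized metabolic fluxes to identify and predict chronic kidney disease in type 2 diabetes mellitus.
  229. Association of physical activity pattern with the risk of Parkinson's disease.
  230. Evaluating stenosis using an AI video model applied to coronary angiography.
  231. Early adverse physiological event detection using commercial wearables: challenges and opportunities.
  232. The EU passes the AI Act and its implications for digital medicine are unclear.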
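  233. Generalization: a key challenge for responsible AI in patient-facing clinical applications.
  234. A virtual fitness buddy ecosystem: a mixed reality precision health physical activity intervention for children.
  235. Self-supervised learning of accelerometer data provides new insights for sleep and its association with mortality.
  236. Predicting recurrent chat contact in a psychological intervention for youth using natural language processing.
  237. A validated web application (GFDC) for automatic classification of glaucomatous visual field defects using Hodapp-Parrish-Anderson criteria.
  238. Development and validation of ECG-based machine learning algorithms for population-level cardiovascular diagnostics.
  239. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records.
  240. An operational guide to translational clinical machine learning in academic medical centers.
  241. Patient centricity in digital measure development: co-evolution of best practice and regulatory guidance.
  242. An in-depth evaluation of federated learning on biomedical natural language processing for information extraction.
  243. A systematic review and meta-analysis of AI versus clinicians for skin cancer diagnosis.
  244. Shortcut learning in medical AI hinders generalization: a method for estimating AI model generalization without external data.
  245. FDA-approved home sleep apnea testing devices.
  246. Generalized sleep decoding with basal ganglia signals in multiple movement disorders.
  247. Application of machine learning techniques in post-traumatic stress disorder: a systematic review and meta-analysis.
  248. Distribution shift detection for the postmarket surveillance of medical AI algorithms: a retrospective simulation study.
  249. Charting a new course in healthcare: early-stage AI algorithm registration to enhance trust and transparency.
  250. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study.
  251. Online cognitive monitoring technology for people with Parkinson's disease and REM sleep behaviour disorder.
  252. Exploring the impact of cognitive reserve: unveiling the dynamics of digital telerehabilitation in Parkinson's disease resilience.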
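  253. AI in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
  254. Choroidal enhancement of optical coherence tomography using generative deep learning.
  255. Sync fast and solve things: best practices for responsible digital health.
  256. A physiologically based digital twin for predicting real-life drinking responses and long-term plasma PEth.
  257. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question answering.
  258. Robust language-based mental health assessments in time and space through social media.
  259. Building a personalized signature of structural brain aberrations in dementia patients using explainable AI.
  260. A critical assessment of using ChatGPT for extracting structured data from clinical notes.
  261. RETFound-enhanced community-based fundus disease screening: real-world evidence and decision curve analysis.
  262. Video clips help patients understand atrial fibrillation and deep vein thrombosis in emergency care: a randomized clinical trial.
  263. Evidence-based health information increases willingness to cope with loneliness in Germany: a randomized controlled online trial.
  264. Effectiveness of DialBetesPlus, a self-management support system for diabetic kidney disease: a randomized controlled trial.
  265. Levels of autonomy in FDA-approved surgical robots: a systematic review.
  266. Augmented non-hallucinating large language models as medical information curators.
  267. Optimizing liver disease clinical guideline interpretation with large language models: a retrieval-augmented generation-based framework.
  268. The dichotomy of diagnostics: exploring the value for consumers, clinicians, and care pathways.
  269. Real-time near-infrared AI using scalable non-expert crowdsourcing in colorectal surgery.
  270. Predicting non-muscle invasive bladder cancer outcomes using AI: a systematic review using APPRAISE-AI.
  271. Visualized fine-grained MRI grading of meniscus injury for intelligent assisted diagnosis and treatment.
  272. Deep learning evaluation of echocardiograms to identify occult atrial fibrillation.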
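  273. Establishing a common European ethical and legal framework for conducting clinical research: the GATEKEEPER experience.
  274. International perspectives on measuring national digital public health system maturity through a multidisciplinary Delphi study.
  275. Exploring the US regulatory landscape for neurological digital health technologies.
  276. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data.
  277. Whole-heart electromechanical simulations using latent neural ordinary differential equations.
  278. Understanding the errors made by AI algorithms in histopathology in terms of patient impact.
  279. The potential for AI to transform healthcare: perspectives from international health leaders.
  280. Human-machine interaction in skin cancer diagnosis: a systematic review and meta-analysis.
  281. Algorithmic journey maps: a tangible approach to implementing AI solutions in healthcare.
  282. Evaluating large language models as clinical agents.
  283. Understanding contemporary attitudes and beliefs about coronary artery calcium from social media using AI.
  284. Foundation metrics for evaluating the effectiveness of healthcare conversations powered by generative AI.
  285. Antifibrillatory and profibrillatory effects of pulmonary vein isolation gaps in human atrial fibrillation digital twins.
  286. A remote digital memory composite to detect cognitive impairment in memory clinic samples in unsupervised settings using mobile devices.
  287. The clinician-AI interface: intended use and explainability in FDA-approved AI devices for medical image interpretation.
  288. Digital twins for health: a scoping review.
  289. Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative AI.
  290. Can we learn lessons from an imagined ransomware attack on a hospital at home platform?
  291. Improving mental health in young people using automated conversational agents: a scoping review.
  292. Evaluating the reliability of sleep staging by wearable devices.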
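  293. Technology-supported behavior change interventions for reducing sodium intake in adults: a systematic review and meta-analysis.
  294. Effect of clinical decision support for severe hypercholesterolemia on low-density lipoprotein cholesterol levels.
  295. Dynamic associations between glucose and ecological momentary cognition in type 1 diabetes.
  296. Changes in children's physical activity and sleep during the COVID-19 pandemic.
  297. A consistent framework for actively collected and passively monitored clinical outcome assessments (COAs) for measure selection.
  298. Applying Indigenous data governance approaches in research using routinely collected health data: a scoping review.
  299. The transparent yet opaque challenge of regulating AI in radiology.
  300. Informing immunotherapy with multi-omics driven machine learning.
  301. Characterizing user engagement with mHealth for chronic disease self-management and its impact on machine learning performance.
  302. Modeling multiple sclerosis using mobile and wearable sensor data.
  303. Bridging the literacy gap in surgical consent: a collaborative approach between AI and human experts.
  304. Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks.
  305. Multifaceted implementation evaluation is needed to warrant clinical adoption of AI models.
  306. Remote fall risk contextualization: video data capture and implementing ethical AI.
  307. Why we should not mistake accuracy of medical AI for efficiency.
  308. The promise of AI-personalized assisted reproductive technology.
  309. Implications of predicting suicidal ideation from smartphone keyboard dynamics.
  310. Why probabilistic clinical models fail to transport between sites.
  311. From ether to ethernet: ensuring ethical policy in the digital transformation of waitlist triage for cardiovascular procedures.
  312. Internet- and mobile-based psychological interventions for post-traumatic stress symptoms in youth: a systematic review and meta-analysis.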
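  313. Personalized mood prediction from patterns of behavior collected with smartphones.
  314. Understanding inherent influencing factors of digital health adoption in general practices through a mixed-methods analysis.
  315. Hospital at home in the United States: current status and future prospects.
  316. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer's disease with real-world clinical validation.
  317. Walk, talk, think, see and feel: harnessing the power of digital biomarkers in healthcare.
  318. Assessing ownership of smart devices and the acceptability of digital health data sharing.
  319. Achieving reliable seizure detection using workflow annotations.
  320. Economic evaluation for medical AI: accuracy versus cost-effectiveness in a diabetic retinopathy screening case.
  321. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs.
  322. CancerGPT for few-shot drug pair synergy prediction using large pretrained language models.
  323. Impact of personalized communication using reinforcement learning on medication adherence: results of the REINFORCE trial.
  324. Automated speech-based assessment to distinguish Parkinson's disease from essential tremor with a cross-language approach.
  325. Regular snoring is associated with uncontrolled hypertension.
  326. Quantifying the impact of telemedicine and patient medical advice request messages on physicians' work-outside-work.
  327. Citizen data sovereignty is key to wearables and health data reuse for the common good.
  328. Translating color fundus photography to indocyanine green angiography using deep learning for age-related macular degeneration screening.
  329. Digital health technologies and machine learning augment patient-reported outcomes to remotely characterise rheumatoid arthritis.
  330. FastEval Parkinsonism: an instant deep learning-assisted video-based online system for Parkinsonian motor symptom evaluation.
  331. Optimizing skin disease diagnosis: harnessing online community data with contrastive learning and clustering techniques.
  332. Beyond the 510(k): regulation of novel moderate-risk medical devices, intellectual property considerations, and innovation incentives in the FDA's De Novo pathway.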
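  333. Digital interventions to promote psychological resilience: a systematic review and meta-analysis.
  334. Uncertainty-aware deep learning model for predicting supratentorial hematoma expansion from admission non-contrast head computed tomography scans.
  335. AI-driven virtual rehabilitation for people living in the community: a scoping review.
  336. AI-derived epicardial fat measurements improve cardiovascular risk prediction from myocardial perfusion imaging.
  337. CMS has an opportunity to improve healthcare access and equity by advancing tech-enabled startups and digital health innovation.
  338. Feasibility of combining spatial computing and AI for mental health support in anxiety and depression.
  339. Transparency of AI/machine learning-enabled medical devices.
  340. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine.
  341. An intriguing vision for transatlantic collaborative health data use and AI development.
  342. Impact of a deep learning sepsis prediction model on quality of care and survival.
  343. Diagnostic performance of AI-assisted PET imaging for Parkinson's disease: a systematic review and meta-analysis.
  344. DRG-LLaMA: tuning the LLaMA model to predict diagnosis-related groups for hospitalized patients.
  345. Using digital technologies for diagnostics at home: recommendations from a Delphi panel.
  346. Deep learning based on histopathological images predicts prognosis and treatment response in small cell lung cancer.
  347. Wireless facial biosensing system for monitoring facial palsy with flexible microneedle electrode arrays.
  348. Implementing cloud computing in the German healthcare system.
  349. Digital remote monitoring for screening and early detection of urinary tract infections.
  350. Next-generation research databases need FAIR, EHR-integrated, and scalable electronic data capture for medical documentation and decision support.
  351. Performance of a deep learning system in assisting junior ophthalmologists to diagnose 13 major fundus diseases: a prospective multi-center clinical trial.
  352. Multimodal digital phenotyping of diet, physical activity, and glycemia in Hispanic/Latino adults with or at risk of type 2 diabetes.
  353. Large language models to identify social determinants of health in electronic health records.
  354. UK funder launches digital health hubs: a new catalyst for change?
  355. A machine learning-based clinical decision support system for treatment recommendation and overall survival prediction of hepatocellular carcinoma: a multi-center study.
  356. AI-enabled electrocardiogram for measuring left ventricular diastolic function and filling pressure.
  357. An explainable graph learning-based model for diagnosing Parkinson's disease from speech-related EEG.
  358. Computerized cognitive training for memory functions in mild cognitive impairment or dementia: a systematic review and meta-analysis.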

1. A simulation based digital twin approach to assessing the organization of response to emergency calls.

PMID: 39741218 | DOI: 10.1038/s41746-024-01392-2 | Date: 2024-12-31

Abstract: In emergency situations, timely contact with emergency medical communication centers (EMCCs) is critical for patient outcomes. Increasing call volumes and economic constraints are challenging many countries, necessitating organizational changes in EMCCs. This study uses a simulation-based digital twin approach, creating a virtual model of EMCC operations to assess the impact of different organizational scenarios on accessibility. Specifically, we explore two decompartmentalized scenarios where traditionally isolated call centers are reorganized to enable more flexible call distribution. The primary measure of accessibility was service quality within 30 s of call reception. Our results show that decompartmentalization improves service quality by 17% to 21%. This study demonstrates that reducing regional isolation in EMCCs can enhance performance and accessibility with a simulation-based digital twin approach providing a clear and objective method to quantify the benefits.
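The accessibility measure in this abstract (share of calls answered within 30 s) and the effect of pooling isolated call centers can be illustrated with a tiny deterministic queue model. This is only a sketch under stated assumptions, not the authors' digital twin: the function name `service_level`, the first-come-first-served rule, and all call data are hypothetical.

```python
import heapq


def service_level(calls, n_agents, threshold=30.0):
    """Fraction of calls answered within `threshold` seconds.

    calls: list of (arrival_time, handling_time) tuples, in seconds.
    Calls are answered first-come, first-served by `n_agents` agents.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    if not calls:
        return 1.0
    # Min-heap holding the time at which each agent next becomes free.
    free_at = [0.0] * n_agents
    heapq.heapify(free_at)
    answered_in_time = 0
    for arrival, handling in sorted(calls):
        agent_free = heapq.heappop(free_at)
        start = max(arrival, agent_free)  # the caller waits if no agent is free
        if start - arrival <= threshold:
            answered_in_time += 1
        heapq.heappush(free_at, start + handling)
    return answered_in_time / len(calls)


# Hypothetical toy data: centre A receives a burst, centre B is quiet.
centre_a = [(0, 120), (5, 120), (10, 120), (15, 120)]
centre_b = [(60, 120)]

# Compartmentalized: each centre answers only its own calls with 2 agents.
n_calls = len(centre_a) + len(centre_b)
separate = (
    service_level(centre_a, 2) * len(centre_a)
    + service_level(centre_b, 2) * len(centre_b)
) / n_calls

# Decompartmentalized: one shared queue served by all 4 agents.
pooled = service_level(centre_a + centre_b, 4)

print(f"separate: {separate:.2f}, pooled: {pooled:.2f}")
```

In this toy case the shared queue answers more calls within the threshold because idle agents at the quiet centre absorb the burst. A realistic study would draw arrivals and handling times from fitted distributions and repeat the run many times.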


2. A deep learning based smartphone application for early detection of nasopharyngeal carcinoma using endoscopic images.

PMID: 39738998 | DOI: 10.1038/s41746-024-01403-2 | Date: 2024-12-31

Abstract: Nasal endoscopy is crucial for the early detection of nasopharyngeal carcinoma (NPC), but its accuracy relies heavily on the clinician's expertise, posing challenges for primary healthcare providers. Here, we retrospectively analysed 39,340 nasal endoscopic white-light images from three high-incidence NPC centres, utilising eight advanced deep learning models to develop an Internet-enabled smartphone application, "Nose-Keeper", that can be used for early detection of NPC and five prevalent nasal diseases and assessment of healthy individuals. Our app demonstrated a remarkable overall accuracy of 92.27% (95% Confidence Interval (CI): 90.66%-93.61%). Notably, its sensitivity and specificity in NPC detection achieved 96.39% and 99.91%, respectively, outperforming nine experienced otolaryngologists. Explainable artificial intelligence was employed to highlight key lesion areas, improving Nose-Keeper's decision-making accuracy and safety. Nose-Keeper can assist primary healthcare providers in diagnosing NPC and common nasal diseases efficiently, offering a valuable resource for people in high-incidence NPC regions to manage nasal cavity health effectively.
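For readers less familiar with the metrics quoted in this abstract, the sketch below shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix (here NPC versus everything else). The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: diseased cases flagged / missed; tn/fp: healthy cases cleared / flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=20, tn=180, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

For a multi-class classifier such as the one described, per-disease sensitivity and specificity are typically computed one class at a time in this one-versus-rest fashion.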


3. Aligning knowledge concepts to whole slide images for precise histopathology image analysis.

PMID: 39738468 | DOI: 10.1038/s41746-024-01411-2 | Date: 2024-12-30

Abstract: Due to the large size and lack of fine-grained annotation, Whole Slide Images (WSIs) analysis is commonly approached as a Multiple Instance Learning (MIL) problem. However, previous studies only learn from training data, posing a stark contrast to how human clinicians teach each other and reason about histopathologic entities and factors. Here, we present a novel knowledge concept-based MIL framework, named ConcepPath, to fill this gap. Specifically, ConcepPath utilizes GPT-4 to induce reliable disease-specific human expert concepts from medical literature and incorporate them with a group of purely learnable concepts to extract complementary knowledge from training data. In ConcepPath, WSIs are aligned to these linguistic knowledge concepts by utilizing the pathology vision-language model as the basic building component. In the application of lung cancer subtyping, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitive subtyping tasks, ConcepPath significantly outperformed previous SOTA methods, which lacked the guidance of human expert knowledge.


4. A novel sweat sensor detects inflammatory differential rhythmicity patterns in inpatients and outpatients with cirrhosis.

PMID: 39733165 | DOI: 10.1038/s41746-024-01404-1 | Date: 2024-12-28

Abstract: Patients with cirrhosis have high systemic inflammation (TNFα, CRP, and IL-6) that is associated with poor outcomes. These biomarkers need continuous non-invasive monitoring, which is difficult with blood. We studied the AWARE sweat-sensor to measure these in passively expressed sweat in healthy people (N = 12) and cirrhosis (N = 32, 10 outpatients/22 inpatients) for 3 days. Blood CRP, TNFα, IL6, levels, and liver function and quality of life were measured. We found that CRP, TNFα, and IL6 were correlated in sweat and serum among both groups and were evaluated in inpatients versus outpatients/controls. IL6 is associated with lower transplant-free survival. Sweat monitoring nocturnal CRP/IL6 elevations in cirrhosis versus controls. Outpatients with cirrhosis had inflammation levels that elevated during the evening and peaked towards the early night periods. The levels start to fall much later at night and early morning. These data suggest that further investigation of continuous measurement of sweat biomarkers in cirrhosis is warranted.


5. Contrasting rule and machine learning based digital self triage systems in the USA.

PMID: 39725711 | DOI: 10.1038/s41746-024-01367-3 | Date: 2024-12-27

Abstract: Patient smart access and self-triage systems have been in development for decades. As of now, no LLM for processing self-reported patient data has been published by health systems. Many expert systems and computational models have been released to millions. This review is the first to summarize progress in the field including an analysis of the exact self-triage solutions available on the websites of 647 health systems in the USA.


6. Fast and accurate prediction of drug induced proarrhythmic risk with sex specific cardiac emulators.

PMID: 39725693 | DOI: 10.1038/s41746-024-01370-8 | Date: 2024-12-26

Abstract: In silico trials for drug safety assessment require many high-fidelity 3D cardiac simulations to predict drug-induced QT interval prolongation, which is often computationally prohibitive. To streamline this process, we developed sex-specific emulators for a fast prediction of QT interval, trained on a dataset of 900 simulations. Our results show significant differences between 3D and 0D single-cell models as risk levels increase, underscoring the ability of 3D modeling to capture more complex cardiac responses. The emulators demonstrated an average error of 4% compared to simulations, allowing for efficient global sensitivity analysis and fast replication of in silico clinical trials. This approach enables rapid, multi-dose drug testing on standard hardware, addressing critical industry challenges around trial design, assay variability, and cost-effective safety evaluations. By integrating these emulators into drug development, we can improve preclinical reliability and advance the practical application of digital twins in biomedicine.


7. Ecologically sustainable benchmarking of AI models for histopathology.

PMID: 39719527 | DOI: 10.1038/s41746-024-01397-x | Date: 2024-12-24

Abstract: Deep learning (DL) holds great promise to improve medical diagnostics, including pathology. Current DL research mainly focuses on performance. DL implementation potentially leads to environmental consequences but approaches for assessment of both performance and carbon footprint are missing. Here, we explored an approach for developing DL for pathology, which considers both diagnostic performance and carbon footprint, calculated as CO2 or equivalent emissions (CO2eq). We evaluated various DL architectures used in computational pathology, including a large foundation model, across two diagnostic tasks of low and high complexity. We proposed a metric termed 'environmentally sustainable performance' (ESPer), which quantitatively integrates performance and operational CO2eq during training and inference. While some DL models showed comparable diagnostic performance, ESPer enabled prioritizing those with less carbon footprint. We also investigated how data reduction approaches can improve the ESPer of individual models. This study provides an approach facilitating the development of environmentally friendly, sustainable medical AI.


8. Building a modular and multi-cellular virtual twin of the synovial joint in Rheumatoid Arthritis.

PMID: 39719524 | DOI: 10.1038/s41746-024-01396-y | Date: 2024-12-24

Abstract: Rheumatoid arthritis is a complex disease marked by joint pain, stiffness, swelling, and chronic synovitis, arising from the dysregulated interaction between synoviocytes and immune cells. Its unclear etiology makes finding a cure challenging. The concept of digital twins, used in engineering, can be applied to healthcare to improve diagnosis and treatment for complex diseases like rheumatoid arthritis. In this work, we pave the path towards a digital twin of the arthritic joint by building a large, modular biochemical reaction map of intra- and intercellular interactions. This network, featuring over 1000 biomolecules, is then converted to one of the largest executable Boolean models for biological systems to date. Validated through existing knowledge and gene expression data, our model is used to explore current treatments and identify new therapeutic targets for rheumatoid arthritis.
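To make the idea of an "executable Boolean model" concrete, here is a minimal sketch of a synchronous Boolean network run to a steady state with and without a simulated treatment. The four nodes and their rules are a hypothetical toy chosen for illustration; they are not taken from the paper's network of more than 1000 biomolecules, and the paper's update scheme may differ.

```python
def step(state, rules):
    """One synchronous update: every node reads the previous state."""
    return {node: bool(rule(state)) for node, rule in rules.items()}


def run_to_fixed_point(state, rules, max_steps=100):
    """Iterate until the state stops changing (a fixed-point attractor)."""
    for _ in range(max_steps):
        nxt = step(state, rules)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (the network may cycle)")


# Hypothetical toy rules: TNF persists unless blocked, and drives NFkB, then IL6.
rules = {
    "TNF": lambda s: s["TNF"] and not s["anti_TNF"],
    "anti_TNF": lambda s: s["anti_TNF"],  # treatment input, held constant
    "NFkB": lambda s: s["TNF"],
    "IL6": lambda s: s["NFkB"],
}

untreated = run_to_fixed_point(
    {"TNF": True, "anti_TNF": False, "NFkB": False, "IL6": False}, rules
)
treated = run_to_fixed_point(
    {"TNF": True, "anti_TNF": True, "NFkB": True, "IL6": True}, rules
)

print("untreated:", untreated)
print("treated:  ", treated)
```

Comparing the steady states reached under different inputs is one simple way such models are used to explore treatment effects; large models usually rely on dedicated tools for attractor analysis rather than brute-force iteration.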


9. An umbrella review on how digital health intervention co-design is conducted and described.

PMID: 39715947 | DOI: 10.1038/s41746-024-01385-1 | Date: 2024-12-23

Abstract: Co-design has been suggested to improve intervention effectiveness and sustainability. However, digital health intervention co-design is inconsistently reported. This umbrella review aims to synthesize what is known about co-design of digital health interventions. We searched five databases from inception. Reviews which reported on co-design methodologies used in digital health were eligible. Information on review type, health conditions, and reported specifics of co-design were extracted and synthesized. Methodological quality was assessed using the AMSTAR2 tool. We included 21 reviews published between 2015 and 2023. Co-design participants included patients, caregivers and healthcare professionals. The frequency and breadth of participant involvement in co-design activities were reported in less than half of reviews. Participants evaluated intervention co-design as a positive process. All reviews were rated as critically low quality. This umbrella review highlights the inconsistent reporting of co-design in digital health. Here, we emphasize the importance of creating guidelines to direct co-design activities.


10. Adaptive spatiotemporal encoding network for cognitive assessment using resting state EEG.

PMID: 39715883 | DOI: 10.1038/s41746-024-01384-2 | Date: 2024-12-23

Abstract: Cognitive impairment, marked by neurodegenerative damage, leads to diminished cognitive function decline. Accurate cognitive assessment is crucial for early detection and progress evaluation, yet current methods in clinical practice lack objectivity, precision, and convenience. This study included 743 participants, including healthy individuals, mild cognitive impairment (MCI), and dementia patients, with collected resting-state EEG data and cognitive scale scores. An adaptive spatiotemporal encoding framework was developed based on resting-state EEG, achieving an MAE of 3.12% (95% CI: 2.9034, 3.3975) in testing (sensitivity: 0.97, 95% CI: 0.779,1; specificity: 0.97, 95% CI: 0.779,1). The model's effectiveness was also validated on the neurofeedback (sensitivity: 0.867, 95% CI: 0.621, 0.963; specificity: 1, 95% CI: 0.439, 1.0) and TMS datasets (sensitivity: 0.833, 95% CI: 0.608, 0.942), which effectively reflect the participants' cognitive changes. The model effectively extracted repetitive spatiotemporal patterns from resting-state EEG, aiding in cognitive disease diagnosis and assessment in various scenarios.
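The abstract reports a mean absolute error (MAE) as a percentage without saying how it was normalized. The sketch below computes a plain MAE between true and predicted cognitive scale scores and, as one possible reading only, expresses it relative to the scale's maximum. The scores and the 30-point scale are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical scores on an assumed 30-point cognitive scale.
true_scores = [30, 24, 18]
predicted_scores = [28, 25, 21]
scale_max = 30

mae = mean_absolute_error(true_scores, predicted_scores)
print(f"MAE: {mae:.2f} points ({mae / scale_max:.1%} of the scale maximum)")
```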


11. A Novel method for quantifying fluctuations in wearable derived daily cardiovascular parameters across the menstrual cycle.

PMID: 39715818 | DOI: 10.1038/s41746-024-01394-0 | Date: 2024-12-23

摘要: Currently, knowledge of changes in cardiovascular function across the menstrual cycle and how these changes may inform upon underlying health is limited. Utilizing wrist-worn biometric data we developed a novel measure to quantify and investigate the cardiovascular fluctuation (i.e. cardiovascular amplitude) within resting heart rate (RHR) and heart rate variability (RMSSD) across 11,590 participants and 45,811 menstrual cycles. Within participants, RHR and RMSSD fluctuated in a regular pattern throughout the menstrual cycle, with population RHRmin and RMSSDmax at cycle day 5, RHRmax at day 26, and RMSSDmin at day 27. Cardiovascular amplitude was attenuated (p < 0.05) in older participants and participants using birth control, suggesting the novel metric may mirror differences in hormonal fluctuations in these cohorts. Longitudinal tracking of cardiovascular amplitude may offer accessible non-invasive monitoring of female physiology and underlying health across the menstrual cycle.

中文摘要: 目前,关于整个月经周期中心血管功能的变化,以及这些变化如何反映潜在健康状况的认识还很有限。利用腕戴式生物识别数据,我们开发了一种新指标,用于量化和研究 11,590 名参与者、45,811 个月经周期中静息心率 (RHR) 和心率变异性 (RMSSD) 的心血管波动(即心血管振幅)。在参与者个体内,RHR 和 RMSSD 在整个月经周期中以规律的模式波动,人群水平的 RHRmin 和 RMSSDmax 出现在周期第 5 天,RHRmax 出现在第 26 天,RMSSDmin 出现在第 27 天。年龄较大的参与者和使用避孕措施的参与者的心血管振幅减弱(p < 0.05),表明这一新指标可能反映了这些人群中激素波动的差异。对心血管振幅进行纵向跟踪,可为整个月经周期中的女性生理和潜在健康状况提供便捷的非侵入性监测。


12. Improving access to cardiovascular care for 1.4 billion people in China using telehealth.

利用远程医疗改善中国 14 亿人获得心血管护理的机会。

PMID: 39715810 | DOI: 10.1038/s41746-024-01381-5 | 日期: 2024-12-23

摘要: Cardiovascular diseases (CVDs) pose a significant health burden in China, where the large population and vast geography limit access to care. Telehealth (tHealth) services provide a virtual model of care that can enhance CVD management. This study aims to describe the trajectory of tHealth services for cardiovascular care between 2016 and 2020 in China, assess their utilization, and discuss their implications for improving access to care in resource-scarce regions. Data were collected on patient-facing, operational tHealth apps in Mainland China. In 2016, 45.8% of tertiary hospitals were accessible via tHealth apps, with a 10.7% annual growth rate. Wealthier regions had better tHealth coverage, irrespective of CVD burden. In 2016 and 2020, 34% and 67% of patients, respectively, consulted doctors located outside of their provinces, primarily in wealthier areas. The most common CVDs managed were hypertension, coronary artery disease, and arrhythmia. These findings suggest that tHealth services improve care access, especially in underdeveloped regions, but widespread technology adoption remains crucial.

中文摘要: 心血管疾病 (CVD) 在中国造成了巨大的健康负担,而人口众多、地域广阔限制了就医机会。远程医疗 (tHealth) 服务提供了一种虚拟诊疗模式,可以加强 CVD 管理。本研究旨在描述 2016 年至 2020 年间中国心血管诊疗 tHealth 服务的发展轨迹,评估其利用情况,并讨论其对改善资源稀缺地区医疗可及性的意义。数据收集自中国大陆面向患者且正在运营的 tHealth 应用程序。2016 年,45.8% 的三级医院可通过 tHealth 应用访问,年增长率为 10.7%。无论 CVD 负担如何,较富裕地区的 tHealth 覆盖率都更高。2016 年和 2020 年,分别有 34% 和 67% 的患者向省外(主要是较富裕地区)的医生咨询。最常管理的 CVD 是高血压、冠状动脉疾病和心律失常。这些发现表明,tHealth 服务改善了医疗可及性,特别是在欠发达地区,但技术的广泛采用仍然至关重要。


13. Ethical data acquisition for LLMs and AI algorithms in healthcare.

医疗保健领域的大型语言模型 (LLM) 和人工智能算法的道德数据采集。

PMID: 39715803 | DOI: 10.1038/s41746-024-01399-9 | 日期: 2024-12-24

摘要: Artificial intelligence (AI) algorithms will become increasingly integrated into our healthcare systems in the coming decades. These algorithms require large volumes of data for development and fine-tuning. Patient data is typically acquired for AI algorithms through an opt-out system in the United States, while others support an opt-in model. We argue that ethical principles around autonomy, patient ownership of data, and privacy should be prioritized in the data acquisition paradigm.

中文摘要: 未来几十年,人工智能 (AI) 算法将越来越多地集成到我们的医疗保健系统中。这些算法需要大量数据来进行开发和微调。在美国,人工智能算法通常通过选择退出系统获取患者数据,而其他国家则支持选择加入模型。我们认为,在数据采集范式中应优先考虑围绕自主性、患者数据所有权和隐私的道德原则。


14. Human AI collaboration for unsupervised categorization of live surgical feedback.

人类人工智能协作对实时手术反馈进行无监督分类。

PMID: 39706895 | DOI: 10.1038/s41746-024-01383-3 | 日期: 2024-12-20

摘要: Formative verbal feedback during live surgery is essential for adjusting trainee behavior and accelerating skill acquisition. Despite its importance, understanding optimal feedback is challenging due to the difficulty of capturing and categorizing feedback at scale. We propose a Human-AI Collaborative Refinement Process that uses unsupervised machine learning (Topic Modeling) with human refinement to discover feedback categories from surgical transcripts. Our discovered categories are rated highly for clinical clarity and are relevant to practice, including topics like "Handling and Positioning of (tissue)" and "(Tissue) Layer Depth Assessment and Correction [during tissue dissection]." These AI-generated topics significantly enhance predictions of trainee behavioral change, providing insights beyond traditional manual categorization. For example, feedback on "Handling Bleeding" is linked to improved behavioral change. This work demonstrates the potential of AI to analyze surgical feedback at scale, informing better training guidelines and paving the way for automated feedback and cueing systems in surgery.

中文摘要: 实时手术中的形成性口头反馈对于调整学员行为和加速技能获取至关重要。尽管它很重要,但由于难以大规模捕获和分类反馈,理解何为最佳反馈仍具有挑战性。我们提出了一种人机协作细化流程,该流程使用无监督机器学习(主题建模)和人工细化来从手术记录中发现反馈类别。我们发现的类别在临床清晰度方面得到了高度评价,并且与实践相关,包括"(组织)的处理和定位"和"[组织解剖期间](组织)层深度评估和校正"等主题。这些人工智能生成的主题显著增强了对学员行为变化的预测,提供了超越传统手动分类的见解。例如,关于"处理出血"的反馈与行为改变的改善有关。这项工作展示了人工智能大规模分析手术反馈的潜力,可为制定更好的培训指南提供依据,并为手术中的自动反馈和提示系统铺平了道路。


15. Mitigation of AI adoption bias through an improved autonomous AI system for diabetic retinal disease.

通过改进糖尿病视网膜疾病的自主人工智能系统,减轻人工智能采用偏差。

PMID: 39702673 | DOI: 10.1038/s41746-024-01389-x | 日期: 2024-12-19

摘要: Where adopted, Autonomous artificial Intelligence (AI) for Diabetic Retinal Disease (DRD) resolves longstanding racial, ethnic, and socioeconomic disparities, but AI adoption bias persists. This preregistered trial determined sensitivity and specificity of a previously FDA authorized AI, improved to compensate for lower contrast and smaller imaged area of a widely adopted, lower cost, handheld fundus camera (RetinaVue700, Baxter Healthcare, Deerfield, IL) to identify DRD in participants with diabetes without known DRD, in primary care. In 626 participants (1252 eyes) 50.8% male, 45.7% Hispanic, 17.3% Black, DRD prevalence was 29.0%, all prespecified non-inferiority endpoints were met and no racial, ethnic or sex bias was identified, against a Wisconsin Reading Center level I prognostic standard using widefield stereoscopic photography and macular Optical Coherence Tomography. Results suggest this improved autonomous AI system can mitigate AI adoption bias, while preserving safety and efficacy, potentially contributing to rapid scaling of health access equity. ClinicalTrials.gov NCT05808699 (3/29/2023).

中文摘要: 在已采用的地方,用于糖尿病视网膜疾病 (DRD) 的自主人工智能 (AI) 消除了长期存在的种族、民族和社会经济差异,但人工智能采用偏差仍然存在。这项预先注册的试验评估了一款此前获 FDA 授权的 AI 的敏感性和特异性;该 AI 经过改进,以补偿一款广泛采用的低成本手持眼底相机(RetinaVue700,Baxter Healthcare,Deerfield,IL)较低的对比度和较小的成像区域,用于在初级保健中识别无已知 DRD 的糖尿病参与者的 DRD。在 626 名参与者(1252 只眼睛)中,50.8% 为男性,45.7% 为西班牙裔,17.3% 为黑人,DRD 患病率为 29.0%;以采用宽场立体摄影和黄斑光学相干断层扫描的威斯康星读片中心 I 级预后标准为参照,所有预先指定的非劣效性终点均得到满足,且未发现种族、民族或性别偏倚。结果表明,这种改进的自主人工智能系统可以减轻人工智能采用偏差,同时保持安全性和有效性,可能有助于快速扩大医疗可及性的公平。ClinicalTrials.gov NCT05808699 (3/29/2023)。


16. AI technology to support adaptive functioning in neurodevelopmental conditions in everyday environments: a systematic review.

支持日常环境中神经发育条件适应性功能的人工智能技术:系统评价。

PMID: 39702672 | DOI: 10.1038/s41746-024-01355-7 | 日期: 2024-12-19

摘要: Support for adaptive functioning in individuals with neurodevelopmental conditions (NDCs) is of utmost importance to long-term outcomes. Artificial intelligence (AI)-assistive technologies have enormous potential to offer efficient, cost-effective, and personalized solutions to address these challenges, particularly in everyday environments. This systematic review examines the existing evidence for using AI-assistive technologies to support adaptive functioning in people with NDCs in everyday settings. Searches across six databases yielded 15 studies meeting inclusion criteria, focusing on robotics, phones/computers and virtual reality. Studies most frequently recruited children diagnosed with autism and targeted social skills (47%), daily living skills (26%), and communication (16%). Despite promising results, studies addressing broader transdiagnostic needs across different NDC populations are needed. There is also an urgent need to improve the quality of evidence-based research practices. This review concludes that AI holds enormous potential to support adaptive functioning for people with NDCs and for personalized health support. This review underscores the need for further research studies to advance AI technologies in this field.

中文摘要: 为神经发育疾病 (NDC) 患者的适应性功能提供支持,对其长期结局至关重要。人工智能 (AI) 辅助技术具有巨大的潜力,可以提供高效、经济且个性化的解决方案来应对这些挑战,特别是在日常环境中。这项系统综述考察了在日常环境中使用人工智能辅助技术支持 NDC 患者适应性功能的现有证据。对六个数据库的检索得到 15 项符合纳入标准的研究,主要涉及机器人、手机/计算机和虚拟现实。这些研究最常招募被诊断患有自闭症的儿童,干预目标为社交技能 (47%)、日常生活技能 (26%) 和沟通能力 (16%)。尽管结果令人鼓舞,但仍需要针对不同 NDC 人群更广泛的跨诊断需求开展研究。此外,还迫切需要提高循证研究实践的质量。本综述的结论是,人工智能在支持 NDC 患者的适应性功能和提供个性化健康支持方面具有巨大潜力,并强调需要进一步研究以推进该领域的人工智能技术。


17. Probabilistic medical predictions of large language models.

大型语言模型的概率医学预测。

PMID: 39702641 | DOI: 10.1038/s41746-024-01366-4 | 日期: 2024-12-19

摘要: Large Language Models (LLMs) have shown promise in clinical applications through prompt engineering, allowing flexible clinical predictions. However, they struggle to produce reliable prediction probabilities, which are crucial for transparency and decision-making. While explicit prompts can lead LLMs to generate probability estimates, their numerical reasoning limitations raise concerns about reliability. We compared explicit probabilities from text generation to implicit probabilities derived from the likelihood of predicting the correct label token. Across six advanced open-source LLMs and five medical datasets, explicit probabilities consistently underperformed implicit probabilities in discrimination, precision, and recall. This discrepancy is more pronounced with smaller LLMs and imbalanced datasets, highlighting the need for cautious interpretation, improved probability estimation methods, and further research for clinical use of LLMs.

中文摘要: 大型语言模型 (LLM) 通过提示工程在临床应用中显示出前景,可实现灵活的临床预测。然而,它们难以给出可靠的预测概率,而这对于透明度和决策至关重要。虽然明确的提示可以引导 LLM 生成概率估计,但其数值推理的局限性引起了人们对可靠性的担忧。我们将文本生成得到的显式概率,与根据预测正确标签词元 (token) 的似然得出的隐式概率进行了比较。在六个先进的开源 LLM 和五个医学数据集上,显式概率在区分度、精确率和召回率方面始终逊于隐式概率。这种差异在较小的 LLM 和不平衡的数据集上更为明显,突出表明需要谨慎解读、改进概率估计方法,并就 LLM 的临床使用开展进一步研究。
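
To make the distinction between the two kinds of probability concrete, here is a minimal sketch under stated assumptions. It assumes the model exposes next-token logits for a small set of candidate label tokens (for example "yes" and "no") and that an explicit estimate appears as a number in the generated text. The function names, the logit values, and the parsing rule are hypothetical and are not the authors' implementation.

```python
import math
import re
from typing import Dict, Optional


def implicit_probability(label_logits: Dict[str, float], positive_label: str) -> float:
    """Softmax restricted to the candidate label tokens; returns P(positive_label)."""
    peak = max(label_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in label_logits.items()}
    return exps[positive_label] / sum(exps.values())


def explicit_probability(generated_text: str) -> Optional[float]:
    """Parse the first number in the generated answer as a probability.

    Accepts a percentage ("70%") or a fraction ("0.35"); returns None when no
    number is found or the value falls outside [0, 1].
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(%?)", generated_text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2) == "%":
        value /= 100.0
    return value if 0.0 <= value <= 1.0 else None


# Hypothetical logits for the two label tokens at the answer position.
p_implicit = implicit_probability({"yes": 2.0, "no": 0.0}, "yes")
# Hypothetical generated answer containing a stated probability.
p_explicit = explicit_probability("I estimate a 70% chance of readmission.")
print(round(p_implicit, 3), p_explicit)
```

The implicit value is read directly from the model's token distribution, whereas the explicit value depends on the model writing a sensible number, which is the step the abstract flags as unreliable.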


18. Optimising the paradigms of human AI collaborative clinical coding.

优化人类人工智能协作临床编码的范式。

PMID: 39702575 | DOI: 10.1038/s41746-024-01363-7 | 日期: 2024-12-19

摘要: Automated clinical coding (ACC) has emerged as a promising alternative to manual coding. This study proposes a novel human-in-the-loop (HITL) framework, CliniCoCo. Using deep learning capacities, CliniCoCo focuses on how such ACC systems and human coders can work effectively and efficiently together in real-world settings. Specifically, it implements a series of collaborative strategies at annotation, training and user interaction stages. Extensive experiments are conducted using real-world EMR datasets from Chinese hospitals. With automatically optimised annotation workloads, the model can achieve F1 scores around 0.80-0.84. For an EMR with 30% mistaken codes, CliniCoCo can suggest halving the annotations from 3000 admissions with an ignorable 0.01 F1 decrease. In human evaluations, compared to manual coding, CliniCoCo reduces coding time by 40% on average and significantly improves the correction rates on EMR mistakes (e.g., three times better on missing codes). Senior professional coders' performances can be boosted to more than 0.93 F1 score from 0.72.

中文摘要: 自动临床编码 (ACC) 已成为手动编码的一种有前景的替代方案。本研究提出了一种新颖的人在回路 (HITL) 框架 CliniCoCo。CliniCoCo 利用深度学习能力,关注此类 ACC 系统和人类编码员如何在现实环境中有效且高效地协同工作。具体来说,它在标注、训练和用户交互阶段实现了一系列协作策略。我们使用来自中国医院的真实电子病历 (EMR) 数据集进行了大量实验。通过自动优化标注工作量,该模型可以获得约 0.80-0.84 的 F1 分数。对于含 30% 错误编码的 EMR,CliniCoCo 可以建议将 3000 次入院的标注量减半,而 F1 仅下降可忽略的 0.01。在人工评估中,与手动编码相比,CliniCoCo 平均减少 40% 的编码时间,并显著提高了对 EMR 错误的纠正率(例如,对遗漏编码的纠正率达到三倍)。资深专业编码员的 F1 分数可以从 0.72 提升到 0.93 以上。


19. A prospective comparison of two computer aided detection systems with different false positive rates in colonoscopy.

结肠镜检查中具有不同假阳性率的两种计算机辅助检测系统的前瞻性比较。

PMID: 39702474 | DOI: 10.1038/s41746-024-01334-y | 日期: 2024-12-19

摘要: This study evaluated the impact of differing false positive (FP) rates in two computer-aided detection (CADe) systems on the clinical effectiveness of artificial intelligence (AI)-assisted colonoscopy. The primary outcomes were adenoma detection rate (ADR) and adenomas per colonoscopy (APC). The ADR in the control, system A (3.2% FP rate), and system B (0.6% FP rate) groups were 44.3%, 43.4%, and 50.4%, respectively, with system B showing a significantly higher ADR than the control group. The APC for the control, A, and B groups were 0.75, 0.83, and 0.90, respectively, with system B also showing a higher APC than the control. The non-true lesion resection rates were 23.8%, 29.2%, and 21.3%, with system B having the lowest. The system with lower FP rates demonstrated improved ADR and APC without increasing the resection of non-neoplastic lesions. These findings suggest that higher FP rates negatively affect the clinical performance of AI-assisted colonoscopy.

中文摘要: 本研究评估了两种计算机辅助检测 (CADe) 系统中不同的假阳性 (FP) 率对人工智能 (AI) 辅助结肠镜检查临床有效性的影响。主要结局是腺瘤检出率 (ADR) 和每次结肠镜检查的腺瘤数 (APC)。对照组、系统 A(3.2% FP 率)组和系统 B(0.6% FP 率)组的 ADR 分别为 44.3%、43.4% 和 50.4%,其中系统 B 的 ADR 显著高于对照组。对照组、A 组和 B 组的 APC 分别为 0.75、0.83 和 0.90,系统 B 的 APC 也高于对照组。非真性病变切除率分别为 23.8%、29.2% 和 21.3%,其中系统 B 最低。FP 率较低的系统改善了 ADR 和 APC,且没有增加非肿瘤性病变的切除。这些发现表明,较高的 FP 率会对 AI 辅助结肠镜检查的临床效能产生负面影响。


20. The emergence of medical futures studies uncovers medicine and healthcare's untapped potential.

医学未来研究的出现揭示了医学和医疗保健尚未开发的潜力。

PMID: 39702406 | DOI: 10.1038/s41746-024-01365-5 | 日期: 2024-12-19

摘要: Analyzing the future of medicine and healthcare, especially during the rise of digital health and artificial intelligence, should rely on established futures methods that the discipline of futures studies has been using for decades. By employing such methods, healthcare professionals, policymakers and patient leaders could better navigate the complexities of modern healthcare, anticipate emerging challenges, and shape a future that is not just awaited but actively constructed.

中文摘要: 分析医学和医疗保健的未来,特别是在数字健康和人工智能兴起的时期,应当依靠未来学这一学科几十年来一直使用的成熟方法。通过采用这些方法,医疗保健专业人员、政策制定者和患者领袖可以更好地应对现代医疗保健的复杂性,预见新出现的挑战,并塑造一个不只是被动等待、而是主动构建的未来。


21. Hospitalization prediction from the emergency department using computer vision AI with short patient video clips.

急诊科使用计算机视觉人工智能和简短的患者视频剪辑进行住院预测。

PMID: 39702364 | DOI: 10.1038/s41746-024-01375-3 | 日期: 2024-12-19

摘要: In this study, we investigate the performance of computer vision AI algorithms in predicting patient disposition from the emergency department (ED) using short video clips. Clinicians often use "eye-balling" or clinical gestalt to aid in triage, based on brief observations. We hypothesize that AI can similarly use patient appearance for disposition prediction. Data were collected from adult patients at an academic ED, with mobile phone videos capturing patients performing simple tasks. Our AI algorithm, using video alone, showed better performance in predicting hospital admissions (AUROC = 0.693 [95% CI 0.689, 0.696]) compared to models using triage clinical data (AUROC = 0.678 [95% CI 0.668, 0.687]). Combining video and triage data achieved the highest predictive performance (AUROC = 0.714 [95% CI 0.709, 0.719]). This study demonstrates the potential of video AI algorithms to support ED triage and alleviate healthcare capacity strains during periods of high demand.

中文摘要: 在这项研究中,我们考察了计算机视觉 AI 算法利用短视频片段预测急诊科 (ED) 患者处置去向的性能。临床医生经常根据简短的观察,凭"目测"或临床整体印象 (gestalt) 来辅助分诊。我们假设人工智能同样可以利用患者的外观来预测处置去向。数据收集自一家学术型急诊科的成年患者,用手机视频记录患者执行简单任务的过程。与使用分诊临床数据的模型 (AUROC = 0.678 [95% CI 0.668, 0.687]) 相比,我们的 AI 算法仅使用视频,在预测住院方面表现更好 (AUROC = 0.693 [95% CI 0.689, 0.696])。将视频与分诊数据结合可获得最高的预测性能 (AUROC = 0.714 [95% CI 0.709, 0.719])。这项研究证明了视频人工智能算法在支持急诊分诊、缓解高需求时期医疗承载压力方面的潜力。


22. Advancing digital sensing in mental health research.

推进心理健康研究中的数字传感。

PMID: 39695319 | DOI: 10.1038/s41746-024-01343-x | 日期: 2024-12-18

摘要: Digital sensing tools, like smartphones and wearables, offer transformative potential for mental health research by enabling scalable, longitudinal data collection. Realizing this promise requires overcoming significant challenges including limited data standards, underpowered studies, and a disconnect between research aims and community needs. This report, based on the 2023 Workshop on Advancing Digital Sensing Tools for Mental Health, articulates strategies to address these challenges to ensure rigorous, equitable, and impactful research.

中文摘要: 智能手机和可穿戴设备等数字传感工具能够实现可扩展的纵向数据收集,为心理健康研究带来了变革潜力。要兑现这一潜力,需要克服重大挑战,包括数据标准有限、研究统计功效不足,以及研究目标与社区需求之间的脱节。本报告以 2023 年推进心理健康数字传感工具研讨会为基础,阐述了应对这些挑战的策略,以确保研究严谨、公平且有影响力。


23. SurgeryLLM: a retrieval-augmented generation large language model framework for surgical decision support and workflow enhancement.

SurgeryLLM:用于手术决策支持和工作流程增强的检索增强生成大型语言模型框架。

PMID: 39695316 | DOI: 10.1038/s41746-024-01391-3 | 日期: 2024-12-18

摘要: SurgeryLLM, a large language model framework using Retrieval Augmented Generation demonstrably incorporated domain-specific knowledge from current evidence-based surgical guidelines when presented with patient-specific data. The successful incorporation of guideline-based information represents a substantial step toward enabling greater surgeon efficiency, improving patient safety, and optimizing surgical outcomes.

中文摘要: SurgeryLLM 是一个使用检索增强生成的大型语言模型框架,经证实,它在获得患者特定数据时能够整合当前循证手术指南中的特定领域知识。基于指南的信息的成功整合,是朝着提高外科医生效率、改善患者安全和优化手术结局迈出的实质性一步。


24. Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions.

数字心理健康干预中机器学习预测的最小数据集大小的估计。

PMID: 39695276 | DOI: 10.1038/s41746-024-01360-w | 日期: 2024-12-18

摘要: Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100-3654), feature groups (F = 2-129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750-1500. Consequently, we propose minimum dataset sizes of N = 500-1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.

中文摘要: 人工智能有望彻底改变精神卫生保健,但数据集规模小和缺乏稳健的方法引起了人们对结果可推广性的担忧。为了就最小必要数据集规模提供参考,我们基于一项研究(ISRCTN13716228,26/02/2016)中的 3654 位用户,探索了数字干预脱落 (dropout) 预测的特定领域学习曲线。我们根据数据集大小 (N = 100-3654)、特征组 (F = 2-129) 和算法选择(从朴素贝叶斯到神经网络)分析预测性能。结果证实了小数据集 (N ≤ 300) 会高估预测能力的担忧。对于无信息的特征组,样本内预测性能与数据集大小呈负相关。复杂模型在小数据集上过拟合,但在较大数据集上取得了最高的留出测试结果。虽然 N = 500 缓解了过拟合,但性能直到 N = 750-1500 才收敛。因此,我们建议最小数据集大小为 N = 500-1000。本研究可为设计或解读数字心理健康干预数据人工智能研究的研究人员提供实证参考。


25. Systematic review and meta-analysis of adverse events in clinical trials of mental health apps.

对心理健康应用程序临床试验中不良事件的系统回顾和荟萃分析。

PMID: 39695173 | DOI: 10.1038/s41746-024-01388-y | 日期: 2024-12-18

摘要: Mental health apps are efficacious, yet they may pose risks in some. This review (CRD42024506486) examined adverse events (AEs) from mental health apps. We searched (May 2024) the Medline, PsycINFO, Web of Science, and ProQuest databases to identify clinical trials of mental health apps. The risk of bias was assessed using the Cochrane Risk of Bias tool. Only 55 of 171 identified clinical trials reported AEs. AEs were more likely to be reported in trials sampling schizophrenia and delivering apps with symptom monitoring technology. The meta-analytic deterioration rate from 13 app conditions was 6.7% (95% CI = 4.3, 10.1, I2 = 75%). Deterioration rates did not differ between app and control groups (OR = 0.79, 95% CI = 0.62-1.01, I2 = 0%). Reporting of AEs was heterogeneous, in terms of assessments used, events recorded, and detail provided. Overall, few clinical trials of mental health apps report AEs. Those that do often provide insufficient information to properly judge risks related to app use.

中文摘要: 心理健康应用程序是有效的,但可能给部分人带来风险。本综述 (CRD42024506486) 考察了心理健康应用程序的不良事件 (AE)。我们于 2024 年 5 月检索了 Medline、PsycINFO、Web of Science 和 ProQuest 数据库,以确定心理健康应用程序的临床试验。使用 Cochrane 偏倚风险工具评估偏倚风险。在 171 项已确定的临床试验中,只有 55 项报告了 AE。以精神分裂症患者为样本的试验,以及提供带症状监测技术的应用程序的试验,更有可能报告 AE。13 个应用条件的荟萃分析恶化率为 6.7%(95% CI = 4.3, 10.1;I2 = 75%)。应用组和对照组之间的恶化率没有差异(OR = 0.79,95% CI = 0.62-1.01,I2 = 0%)。在所用评估、记录的事件和提供的细节方面,AE 的报告存在异质性。总体而言,很少有心理健康应用程序的临床试验报告 AE;即使报告了,所提供的信息也往往不足以恰当判断与应用程序使用相关的风险。


26. Autonomous medical evaluation for guideline adherence of large language models.

对大型语言模型的指南遵守情况进行自主医学评估。

PMID: 39668168 | DOI: 10.1038/s41746-024-01356-6 | 日期: 2024-12-12

摘要: Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate large language models' adherence to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It includes an evaluation framework and methodology to assess models' capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. It includes 135 questions and 1337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest with 41.9/50, followed closely by Llama-3 70B and WizardLM-2-8x22B. For comparison, a recent medical graduate scored 25.8/50. The benchmark introduces novel content to avoid the issue of LLMs memorizing existing medical data. AMEGA's publicly available code supports further research in AI-assisted clinical decision-making, aiming to enhance patient care by aiding clinicians in diagnosis and treatment under time constraints.

中文摘要: 指南遵守情况自主医学评估 (AMEGA) 是一个综合基准,旨在评估大型语言模型在涵盖 13 个专科的 20 个诊断场景中对医学指南的遵守情况。它包括一个评估框架和方法,使用反映真实临床交互的开放式问题来评估模型在医学推理、鉴别诊断、治疗计划和指南遵守方面的能力。它包括 135 个问题和 1337 个加权评分要素,旨在评估综合医学知识。在对 17 个 LLM 的测试中,GPT-4 得分最高,为 41.9/50,紧随其后的是 Llama-3 70B 和 WizardLM-2-8x22B。相比之下,一名新近毕业的医学生得分为 25.8/50。该基准引入了全新的内容,以避免 LLM 记忆现有医学数据的问题。AMEGA 公开可用的代码支持人工智能辅助临床决策的进一步研究,旨在通过帮助临床医生在时间受限的情况下进行诊断和治疗来改善患者护理。


27. Predicting control of cardiovascular disease risk factors in South Asia using machine learning.

使用机器学习预测南亚心血管疾病危险因素的控制。

PMID: 39658561 | DOI: 10.1038/s41746-024-01353-9 | 日期: 2024-12-10

摘要: A substantial share of patients at risk of developing cardiovascular disease (CVD) fail to achieve control of CVD risk factors, but clinicians lack a structured approach to identify these patients. We applied machine learning to longitudinal data from two completed randomized controlled trials among 1502 individuals with diabetes in urban India and Pakistan. Using commonly available clinical data, we predict each individual's risk of failing to achieve CVD risk factor control goals or meaningful improvements in risk factors at one year after baseline. When classifying those in the top quartile of predicted risk scores as at risk of failing to achieve goals or meaningful improvements, the precision for not achieving goals was 73% for HbA1c, 30% for SBP, and 24% for LDL, and for not achieving meaningful improvements 88% for HbA1c, 87% for SBP, and 85% for LDL. Such models could be integrated into routine care and enable efficient and targeted delivery of health resources in resource-constrained settings.

中文摘要: 很大一部分有心血管疾病 (CVD) 风险的患者未能控制 CVD 危险因素,但临床医生缺乏识别这些患者的结构化方法。我们将机器学习应用于两项已完成的随机对照试验的纵向数据,该试验涉及印度和巴基斯坦城市的 1502 名糖尿病患者。使用常用的临床数据,我们预测每个人在基线后一年未能实现 CVD 风险因素控制目标或风险因素有意义改善的风险。当将预测风险评分前四分之一的人分类为有无法实现目标或有意义的改善的风险时,HbA1c 未实现目标的精确度为 73%,SBP 为 30%,LDL 为 24%,未实现有意义的改善的精确度为 HbA1c 88%,SBP 87%,LDL 85%。这些模型可以融入日常护理中,并能够在资源有限的环境中高效、有针对性地提供卫生资源。
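
The precision figures above come from flagging the top quartile of predicted risk as "at risk". A minimal sketch of that evaluation step follows, assuming each patient has one predicted risk score and one observed binary outcome. The function name `top_quartile_precision` and the sample data are hypothetical, and the tie handling is a simplification.

```python
from typing import Sequence


def top_quartile_precision(risk_scores: Sequence[float], failed: Sequence[bool]) -> float:
    """Flag the top 25% by predicted risk; return the share of flagged cases that truly failed."""
    if len(risk_scores) != len(failed) or not risk_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    n_flagged = max(1, len(risk_scores) // 4)
    # Sort patients from highest to lowest predicted risk.
    ranked = sorted(zip(risk_scores, failed), key=lambda pair: pair[0], reverse=True)
    flagged = ranked[:n_flagged]
    return sum(1 for _, did_fail in flagged if did_fail) / n_flagged


# Hypothetical scores and outcomes for eight patients (True = goal not achieved).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
outcomes = [True, False, True, True, False, False, True, False]
precision = top_quartile_precision(scores, outcomes)
print(f"precision in top quartile: {precision:.2f}")
```

Precision computed this way depends on how common the outcome is, which may partly explain why the reported values differ so much between HbA1c, SBP, and LDL goals.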


28. Deep learning biomarker of chronometric and biological ischemic stroke lesion age from unenhanced CT.

通过平扫 CT 深度学习计时和生物缺血性中风病变年龄的生物标志物。

PMID: 39643604 | DOI: 10.1038/s41746-024-01325-z | 日期: 2024-12-06

摘要: Estimating progression of acute ischemic brain lesions - or biological lesion age - holds huge practical importance for hyperacute stroke management. The current best method for determining lesion age from non-contrast computerised tomography (NCCT) measures Relative Intensity (RI), termed Net Water Uptake (NWU). We optimised lesion age estimation from NCCT using a convolutional neural network - radiomics (CNN-R) model trained upon chronometric lesion age (Onset Time to Scan: OTS), while validating against chronometric and biological lesion age in external datasets (N = 1945). Coefficients of determination (R2) for OTS prediction using the CNN-R and RI models were 0.58 and 0.32, respectively, while CNN-R-estimated OTS showed stronger associations with ischemic core:penumbra ratio than RI and chronometric OTS (ρ2 = 0.37, 0.19, 0.11), and with early lesion expansion (regression coefficients >2x for CNN-R versus others) (all comparisons: p < 0.05). In conclusion, deep-learning analytics of NCCT lesions is approximately twice as accurate as NWU for estimating chronometric and biological lesion ages.

中文摘要: 估计急性缺血性脑病变的进展(即生物学病变年龄)对于超急性期中风管理具有重大的实际意义。目前根据非增强计算机断层扫描 (NCCT) 确定病变年龄的最佳方法是测量相对强度 (RI),称为净水摄取 (NWU)。我们使用以计时病变年龄(发病至扫描时间:OTS)训练的卷积神经网络-放射组学 (CNN-R) 模型优化了基于 NCCT 的病变年龄估计,并在外部数据集 (N = 1945) 中针对计时和生物学病变年龄进行了验证。CNN-R 和 RI 模型预测 OTS 的决定系数 (R2) 分别为 0.58 和 0.32;与 RI 和计时 OTS 相比,CNN-R 估计的 OTS 与缺血核心:半暗带比值的相关性更强 (ρ2 = 0.37, 0.19, 0.11),与早期病变扩大的相关性也更强(CNN-R 的回归系数是其他方法的 2 倍以上)(所有比较:p < 0.05)。总之,在估计计时和生物学病变年龄方面,NCCT 病变的深度学习分析的准确度大约是 NWU 的两倍。


29. The real-world association between digital markers of circadian disruption and mental health risks.

昼夜节律紊乱的数字标记与心理健康风险之间的现实关联。

PMID: 39639100 | DOI: 10.1038/s41746-024-01348-6 | 日期: 2024-12-05

摘要: While circadian disruption is recognized as a potential driver of depression, its real-world impact is poorly understood. A critical step to addressing this is the noninvasive collection of physiological time-series data outside laboratory settings in large populations. Digital tools offer promise in this endeavor. Here, using wearable data, we first quantify the degrees of circadian disruption, both between different internal rhythms and between each internal rhythm and the sleep-wake cycle. Our analysis, based on over 50,000 days of data from over 800 first-year training physicians, reveals bidirectional links between digital markers of circadian disruption and mood both before and after they began shift work, while accounting for confounders such as demographic and geographic variables. We further validate this by finding clinically relevant changes in the 9-item Patient Health Questionnaire score. Our findings validate a scalable digital measure of circadian disruption that could serve as a marker for psychiatric intervention.

中文摘要: 虽然昼夜节律紊乱被认为是抑郁症的潜在驱动因素,但其对现实世界的影响却知之甚少。解决这个问题的关键一步是在实验室环境之外对大量人群进行无创生理时间序列数据收集。数字工具为这一努力提供了希望。在这里,使用可穿戴数据,我们首先量化不同内部节律之间以及每个内部节律与睡眠-觉醒周期之间的昼夜节律破坏程度。我们的分析基于 800 多名第一年培训医生的 50,000 多天的数据,揭示了他们开始轮班工作前后昼夜节律紊乱和情绪的数字标记之间的双向联系,同时考虑了人口和地理变量等混杂因素。我们通过在 9 项患者健康问卷评分中发现临床相关的变化来进一步验证这一点。我们的研究结果验证了昼夜节律紊乱的可扩展数字测量,可以作为精神干预的标志。


30. Predicting future fallers in Parkinson's disease using kinematic data over a period of 5 years.

使用 5 年期间的运动学数据预测未来跌倒的帕金森病患者。

PMID: 39638907 | DOI: 10.1038/s41746-024-01311-5 | 日期: 2024-12-05

摘要: Parkinson's disease (PD) increases fall risk, leading to injuries and reduced quality of life. Accurate fall risk assessment is crucial for effective care planning. Traditional assessments are subjective and time-consuming, while recent assessment methods based on wearable sensors have been limited to 1-year follow-ups. This study investigated whether a short sensor-based assessment could predict falls over up to 5 years. Data from 104 people with PD without prior falls were collected using six wearable sensors during a 2-min walk and a 30-s postural sway task. Five machine learning classifiers analysed the data. The Random Forest classifier performed best, achieving 78% accuracy (AUC = 0.85) at 60 months. Most models showed excellent performance at 24 months (AUC > 0.90, accuracy 84-92%). Walking and postural variability measures were key predictors. Adding clinicodemographic data, particularly age, improved model performance. Wearable sensors combined with machine learning can effectively predict fall risk, enhancing PD management and prevention strategies.

中文摘要: 帕金森病 (PD) 会增加跌倒风险,导致受伤并降低生活质量。准确的跌倒风险评估对于有效的护理计划至关重要。传统的评估主观且耗时,而最近基于可穿戴传感器的评估方法仅限于一年的随访。这项研究调查了一次简短的基于传感器的评估是否可以预测长达 5 年内的跌倒情况。使用六个可穿戴传感器收集了 104 名没有跌倒史的 PD 患者在 2 分钟步行和 30 秒姿势摇摆任务中的数据。五个机器学习分类器分析了数据。随机森林分类器表现最好,在 60 个月时达到 78% 的准确率 (AUC = 0.85)。大多数模型在 24 个月时表现出优异的性能(AUC > 0.90,准确率 84-92%)。步行和姿势变异性指标是关键的预测因素。添加临床人口统计学数据,特别是年龄,可以提高模型性能。可穿戴传感器与机器学习相结合,可以有效预测跌倒风险,改进 PD 的管理和预防策略。


31. AI-enabled wearable cameras for assisting dietary assessment in African populations.

支持人工智能的可穿戴摄像头,用于协助非洲人群的饮食评估。

PMID: 39638852 | DOI: 10.1038/s41746-024-01346-8 | 日期: 2024-12-05

摘要: We have developed a population-level method for dietary assessment using low-cost wearable cameras. Our approach, EgoDiet, employs an egocentric vision-based pipeline to learn portion sizes, addressing the shortcomings of traditional self-reported dietary methods. To evaluate the functionality of this method, field studies were conducted in London (Study A) and Ghana (Study B) among populations of Ghanaian and Kenyan origin. In Study A, EgoDiet's estimations were contrasted with dietitians' assessments, revealing a performance with a Mean Absolute Percentage Error (MAPE) of 31.9% for portion size estimation, compared to 40.1% for estimates made by dietitians. We further evaluated our approach in Study B, comparing its performance to the traditional 24-Hour Dietary Recall (24HR). Our approach demonstrated a MAPE of 28.0%, showing a reduction in error when contrasted with the 24HR, which exhibited a MAPE of 32.5%. This improvement highlights the potential of using passive camera technology to serve as an alternative to the traditional dietary assessment methods.

中文摘要: 我们开发了一种使用低成本可穿戴相机进行饮食评估的人群水平方法。我们的方法 EgoDiet 采用以自我为中心的基于视觉的管道来学习份量大小,解决了传统自我报告饮食方法的缺点。为了评估该方法的功能,在伦敦(研究 A)和加纳(研究 B)对加纳和肯尼亚裔人群进行了实地研究。在研究 A 中,EgoDiet 的估计与营养师的评估进行了对比,结果显示份量估计的平均绝对百分比误差 (MAPE) 为 31.9%,而营养师的估计为 40.1%。我们进一步评估了研究 B 中的方法,将其性能与传统的 24 小时饮食回忆 (24HR) 进行了比较。我们的方法的 MAPE 为 28.0%,与 24HR 的 MAPE 为 32.5% 相比,误差减少了。这一改进凸显了使用被动摄像技术作为传统饮食评估方法替代方案的潜力。
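
The comparisons above are expressed as Mean Absolute Percentage Error (MAPE), that is, the mean of |estimate − true| / |true| expressed as a percentage. The sketch below shows the calculation on made-up portion weights; the function name `mape` and the numbers are hypothetical and unrelated to the study data.

```python
from typing import Sequence


def mape(true_values: Sequence[float], estimates: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent, relative to the true values."""
    if len(true_values) != len(estimates) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in true_values):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs(e - t) / abs(t) for t, e in zip(true_values, estimates))
    return 100.0 * total / len(true_values)


# Hypothetical weighed portions (grams) and the corresponding estimates.
weighed = [200.0, 150.0, 100.0]
estimated = [220.0, 120.0, 100.0]
print(f"MAPE: {mape(weighed, estimated):.1f}%")
```

Because each error is divided by the true value, small portions contribute large percentage errors, which is worth keeping in mind when comparing MAPE across foods of very different sizes.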


32. A scoping review on pediatric sepsis prediction technologies in healthcare.

对医疗保健中儿科脓毒症预测技术的范围审查。

PMID: 39633080 | DOI: 10.1038/s41746-024-01361-9 | 日期: 2024-12-04

摘要: This scoping review evaluates recent advancements in data-driven technologies for predicting non-neonatal pediatric sepsis, including artificial intelligence, machine learning, and other methodologies. Of the 27 included studies, 23 (85%) were single-center investigations, and 16 (59%) used logistic regression. Notably, 20 (74%) studies used datasets with a low prevalence of sepsis-related outcomes, with area under the receiver operating characteristic scores ranging from 0.56 to 0.99. Prediction time points varied widely, and development characteristics, performance metrics, implementation outcomes, and considerations for human factors-especially workflow integration and clinical judgment-were inconsistently reported. The variations in endpoint definitions highlight the potential significance of the 2024 consensus criteria in future development. Future research should strengthen the involvement of clinical users to enhance the understanding and integration of human factors in designing and evaluating these technologies, ultimately aiming for safe and effective integration in pediatric healthcare.

中文摘要: 本范围综述评估了用于预测非新生儿期儿科脓毒症的数据驱动技术的最新进展,包括人工智能、机器学习和其他方法。在纳入的 27 项研究中,23 项 (85%) 为单中心研究,16 项 (59%) 使用逻辑回归。值得注意的是,20 项 (74%) 研究使用的数据集中脓毒症相关结局的发生率较低,受试者工作特征曲线下面积为 0.56 至 0.99。预测时间点差异很大,开发特征、性能指标、实施结果以及对人为因素(尤其是工作流程整合和临床判断)的考虑,其报告并不一致。终点定义的差异凸显了 2024 年共识标准在未来开发中的潜在意义。未来的研究应加强临床用户的参与,以便在设计和评估这些技术时更好地理解和整合人为因素,最终实现其在儿科医疗保健中安全有效的整合。


33. Leveraging natural language processing to aggregate field safety notices of medical devices across the EU.

利用自然语言处理来汇总整个欧盟医疗器械的现场安全通知。

PMID: 39632973 | DOI: 10.1038/s41746-024-01337-9 | 日期: 2024-12-04

摘要: The European Union (EU) Medical Device Regulation and In Vitro Medical Device Regulation have introduced more rigorous regulatory requirements for medical devices, including new rules for post-market surveillance. However, EU market vigilance is limited by the absence of harmonized reporting systems, languages and nomenclatures among Member States. Our aim was to develop a framework based on Natural Language Processing capable of automatically collecting publicly available Field Safety Notices (FSNs) reporting medical device problems by applying web scraping to EU authority websites, to attribute the most suitable device category based on the European Medical Device Nomenclature (EMDN), and to display processed FSNs in an aggregated way to allow multiple queries. 65,036 FSNs published up to 31/12/2023 were retrieved from 16 EU countries, of which 40,212 (61.83%) were successfully assigned the proper EMDN. The framework's performance was successfully tested, with accuracies ranging from 87.34% to 98.71% for EMDN level 1 and from 64.15% to 85.71% even for level 4.

中文摘要: 欧盟 (EU) 医疗器械法规和体外医疗器械法规对医疗器械提出了更严格的监管要求,包括上市后监督的新规则。然而,由于成员国之间缺乏统一的报告系统、语言和命名体系,欧盟的市场警戒工作受到限制。我们的目标是开发一个基于自然语言处理的框架:通过对欧盟各主管机构网站进行网络抓取,自动收集报告医疗器械问题的公开现场安全通知 (FSN);根据欧洲医疗器械命名法 (EMDN) 为其匹配最合适的器械类别;并以聚合方式展示处理后的 FSN,以支持多种查询。从 16 个欧盟国家检索到截至 2023 年 12 月 31 日发布的 65,036 份 FSN,其中 40,212 份 (61.83%) 被成功分配了正确的 EMDN。该框架的性能测试取得了成功,EMDN 1 级的准确率为 87.34% 至 98.71%,即使在 4 级也达到 64.15% 至 85.71%。


34. Probing the limits and capabilities of diffusion models for the anatomic editing of digital twins.

探索用于数字孪生解剖编辑的扩散模型的局限性和功能。

PMID: 39632966 | DOI: 10.1038/s41746-024-01332-0 | 日期: 2024-12-05

摘要: Numerical simulations of cardiovascular device deployment within digital twins of patient-specific anatomy can expedite and de-risk the device design process. Nonetheless, the exclusive use of patient-specific data constrains the anatomic variability that can be explored. We study how Latent Diffusion Models (LDMs) can edit digital twins to create digital siblings. Siblings can serve as the basis for comparative simulations, which can reveal how subtle anatomic variations impact device deployment, and augment virtual cohorts for improved device assessment. Using a case example centered on cardiac anatomy, we study various methods to generate digital siblings. We specifically introduce anatomic variation at different spatial scales or within localized regions, demonstrating the existence of bias toward common anatomic features. We furthermore leverage this bias for virtual cohort augmentation through selective editing, addressing issues related to dataset imbalance and diversity. Our framework delineates the capabilities of diffusion models in synthesizing anatomic variation for numerical simulation studies.

中文摘要: 在患者特定解剖结构的数字孪生中对心血管器械植入进行数值模拟,可以加快器械设计过程并降低风险。然而,仅使用患者特定数据限制了可以探索的解剖变异性。我们研究潜在扩散模型 (LDM) 如何编辑数字孪生以创建“数字兄弟姐妹”(digital siblings)。数字兄弟姐妹可以作为比较模拟的基础,从而揭示细微的解剖变异如何影响器械植入,并扩充虚拟队列以改进器械评估。通过一个以心脏解剖为中心的案例,我们研究了生成数字兄弟姐妹的多种方法。我们专门在不同空间尺度或局部区域内引入解剖变异,证明了模型存在偏向常见解剖特征的偏倚。我们进一步通过选择性编辑利用这种偏倚来扩充虚拟队列,以解决数据集不平衡和多样性方面的问题。我们的框架刻画了扩散模型在为数值模拟研究合成解剖变异方面的能力。


35. Artificial intelligence related safety issues associated with FDA medical device reports.

与 FDA 医疗器械报告相关的人工智能相关安全问题。

PMID: 39627534 | DOI: 10.1038/s41746-024-01357-5 | 日期: 2024-12-03

摘要: The Biden 2023 Artificial Intelligence (AI) Executive Order calls for the creation of a patient safety program. Patient safety reports are a natural starting point for identifying issues. We examined the feasibility of this approach by analyzing reports associated with AI/Machine Learning (ML)-enabled medical devices. Of the 429 reports reviewed, 108 (25.2%) were potentially AI/ML related, with 148 (34.5%) containing insufficient information to determine an AI/ML contribution. A more comprehensive approach is needed.

中文摘要: 拜登 2023 年人工智能 (AI) 行政命令要求建立一项患者安全计划。患者安全报告是识别问题的自然起点。我们通过分析与支持人工智能/机器学习 (ML) 的医疗器械相关的报告来检验这种方法的可行性。在审查的 429 份报告中,108 份 (25.2%) 可能与 AI/ML 相关,另有 148 份 (34.5%) 所含信息不足以判断 AI/ML 是否起了作用。需要一种更全面的方法。
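
下面用一小段 Python 核对摘要中两个百分比所用的分母。这只是基于摘要数字的演算草稿,函数名 `percent_of` 是为说明而取的假设名称,并非原文内容:

```python
def percent_of(part: int, total: int, digits: int = 1) -> float:
    """返回 part 占 total 的百分比,保留 digits 位小数。"""
    if total <= 0:
        raise ValueError("total 必须为正数")
    return round(100.0 * part / total, digits)


reviewed = 429            # 摘要中审查的报告总数
potentially_ai = 108      # 可能与 AI/ML 相关的报告数
insufficient_info = 148   # 信息不足、无法判断的报告数

# 两个比例都以全部 429 份报告为分母
print(percent_of(potentially_ai, reviewed))
print(percent_of(insufficient_info, reviewed))
```

只有当两个数字都以 429 份报告为分母时,结果才与摘要给出的 25.2% 和 34.5% 一致。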


36. Reimbursement in the age of generalist radiology artificial intelligence.

放射学人工智能时代的报销。

PMID: 39622981 | DOI: 10.1038/s41746-024-01352-w | 日期: 2024-12-02

摘要: We argue that generalist radiology artificial intelligence (GRAI) challenges current healthcare reimbursement frameworks. Unlike narrow AI tools, GRAI's multi-task capabilities render existing pathways inadequate. This perspective examines key questions surrounding GRAI reimbursement, including issues of coding, valuation, and coverage policies. We aim to catalyze dialogue among stakeholders about how reimbursement might evolve to accommodate GRAI, potentially influencing AI reimbursement strategies in radiology and beyond.

中文摘要: 我们认为,通用放射学人工智能 (GRAI) 对当前的医疗保健报销框架提出了挑战。与窄域人工智能工具不同,GRAI 的多任务能力使现有的报销途径不再适用。这篇观点文章探讨了围绕 GRAI 报销的关键问题,包括编码、估值和覆盖政策。我们希望推动利益相关者就报销方式如何演变以适应 GRAI 展开对话,这可能影响放射学及其他领域的人工智能报销策略。


37. Establishing responsible use of AI guidelines: a comprehensive case study for healthcare institutions.

制定负责任地使用人工智能指南:针对医疗机构的综合案例研究。

PMID: 39616269 | DOI: 10.1038/s41746-024-01300-8 | 日期: 2024-11-30

摘要: This report presents a comprehensive case study for the responsible integration of artificial intelligence (AI) into healthcare settings. Recognizing the rapid advancement of AI technologies and their potential to transform healthcare delivery, we propose a set of guidelines emphasizing fairness, robustness, privacy, safety, transparency, explainability, accountability, and benefit. Through a multidisciplinary collaboration, we developed and operationalized these guidelines within a healthcare system, highlighting a case study on ambient documentation to demonstrate the practical application and challenges of implementing generative AI in clinical environments. Our proposed framework ensures continuous monitoring, evaluation, and adaptation of AI technologies, addressing ethical considerations and enhancing patient care. This work contributes to the discourse on responsible AI use in healthcare, offering a blueprint for institutions to navigate the complexities of AI integration responsibly and effectively, thus promoting better, more equitable healthcare outcomes.

中文摘要: 本报告介绍了将人工智能 (AI) 负责任地整合到医疗保健环境中的综合案例研究。鉴于人工智能技术的快速发展及其改变医疗保健服务的潜力,我们提出了一套强调公平性、稳健性、隐私、安全性、透明度、可解释性、问责制和效益的指导方针。通过多学科合作,我们在一个医疗保健系统中制定并落实了这些指南,并重点介绍了一个关于环境感知式文档记录 (ambient documentation) 的案例研究,以展示在临床环境中实施生成式人工智能的实际应用和挑战。我们提出的框架确保对人工智能技术进行持续监控、评估和调整,解决伦理问题并改善患者护理。这项工作为医疗保健领域负责任使用人工智能的讨论做出了贡献,为机构负责任且有效地应对人工智能整合的复杂性提供了蓝图,从而促进更好、更公平的医疗保健结果。


38. Identifying Parkinson's disease and its stages using static standing balance.

使用静态站立平衡识别帕金森病及其阶段。

PMID: 39616268 | DOI: 10.1038/s41746-024-01351-x | 日期: 2024-11-30

摘要: The current assessment of Parkinson's disease (PD) relies on dynamic motor tasks, limiting accessibility. This study aimed to propose an innovative approach to identifying PD and its stages using static standing balance and machine learning. A total of 210 participants were recruited, including a control group and five PD groups categorized by stage. Each participant completed a 10-s static standing balance task in which center of pressure trajectory data in the medial-lateral and anterior-posterior directions were collected. Features were extracted from these trajectory data and the data derived from them using both representation learning and handcrafting methods. A Transformer encoder-based classifier was trained on these features and achieved an F1-score of 0.963 in classifying the six study groups. This approach enhances the accessibility of PD assessment, enabling earlier detection and timely intervention. The novel data mining framework introduced in this study heralds a new era of time-series data-driven digital healthcare.

中文摘要: 目前对帕金森病 (PD) 的评估依赖于动态运动任务,限制了可及性。本研究旨在提出一种利用静态站立平衡和机器学习来识别帕金森病及其分期的创新方法。共招募了 210 名参与者,包括一个对照组和按分期划分的 5 个 PD 组。每位参与者完成一项 10 秒的静态站立平衡任务,期间采集内外侧和前后方向的压力中心轨迹数据。我们使用表示学习和手工设计两种方法,从这些轨迹数据及其衍生数据中提取特征。基于 Transformer 编码器的分类器在这些特征上进行训练,在对六个研究组进行分类时获得了 0.963 的 F1 分数。这种方法提高了帕金森病评估的可及性,有助于更早发现和及时干预。本研究引入的新颖数据挖掘框架预示着时间序列数据驱动的数字医疗保健新时代的到来。


39. Phenotyping people with a history of injecting drug use within electronic medical records using an interactive machine learning approach.

使用交互式机器学习方法对电子病历中具有注射吸毒史的人进行表型分析。

PMID: 39616266 | DOI: 10.1038/s41746-024-01318-y | 日期: 2024-11-30

摘要: People with a history of injecting drug use are a priority for eliminating blood-borne viruses and sexually transmissible infections. Identifying them for disease surveillance in electronic medical records (EMRs) is challenged by sparsity of predictors. This study introduced a novel approach to phenotype people who have injected drugs using structured EMR data and interactive human-in-the-loop methods. We iteratively trained random forest classifiers removing important features and adding new positive labels each time. The initial model achieved 92.7% precision and 93.5% recall. Models maintained >90% precision and recall after nine iterations, revealing combinations of less obvious features influencing predictions. Applied to approximately 1.7 million patients, the final model identified 128,704 (7.7%) patients as potentially having injected drugs, beyond the 50,510 (2.9%) with known indicators of injecting drug use. This process produced explainable models that revealed otherwise hidden combinations of predictors, offering an adaptive approach to addressing the inherent challenge of inconsistently missing data in EMRs.

中文摘要: 有注射吸毒史的人是消除血源性病毒和性传播感染的重点人群。在电子病历 (EMR) 中识别这一人群以进行疾病监测,面临预测变量稀疏的挑战。本研究引入了一种新方法,利用结构化 EMR 数据和交互式“人在回路”方法对曾注射毒品的人进行表型分析。我们迭代训练随机森林分类器,每次移除重要特征并添加新的阳性标签。初始模型的精确率为 92.7%,召回率为 93.5%。经过九次迭代后,模型的精确率和召回率仍保持在 90% 以上,揭示了影响预测的、不太明显的特征组合。将最终模型应用于约 170 万名患者时,除已知有注射吸毒指征的 50,510 名患者 (2.9%) 之外,还识别出 128,704 名患者 (7.7%) 可能曾注射毒品。这一过程产生了可解释的模型,揭示了原本隐藏的预测变量组合,为应对 EMR 中数据缺失不一致这一固有挑战提供了一种自适应方法。


40. Impact of human and artificial intelligence collaboration on workload reduction in medical image interpretation.

人类和人工智能协作对减少医学图像判读工作量的影响。

PMID: 39616244 | DOI: 10.1038/s41746-024-01328-w | 日期: 2024-11-30

摘要: Clinicians face increasing workloads in medical imaging interpretation, and artificial intelligence (AI) offers potential relief. This meta-analysis evaluates the impact of human-AI collaboration on image interpretation workload. Four databases were searched for studies comparing reading time or quantity for image-based disease detection before and after AI integration. The Quality Assessment of Studies of Diagnostic Accuracy was modified to assess risk of bias. Workload reduction and relative diagnostic performance were pooled using a random-effects model. Thirty-six studies were included. AI concurrent assistance reduced reading time by 27.20% (95% confidence interval, 18.22%-36.18%). The reading quantity decreased by 44.47% (40.68%-48.26%) and 61.72% (47.92%-75.52%) when AI served as the second reader and pre-screening, respectively. Overall relative sensitivity and specificity are 1.12 (1.09, 1.14) and 1.00 (1.00, 1.01), respectively. Despite these promising results, caution is warranted due to significant heterogeneity and uneven study quality.

中文摘要: 临床医生在医学影像判读方面面临着越来越大的工作量,而人工智能 (AI) 有望缓解这一压力。这项荟萃分析评估了人类与人工智能协作对图像判读工作量的影响。我们检索了四个数据库,查找比较人工智能整合前后基于图像的疾病检测的阅片时间或阅片数量的研究。采用修改后的诊断准确性研究质量评估工具评估偏倚风险。使用随机效应模型汇总工作量减少幅度和相对诊断性能。共纳入三十六项研究。AI 同步辅助使阅片时间减少了 27.20%(95% 置信区间,18.22%-36.18%)。当 AI 作为第二阅片者和用于预筛查时,阅片数量分别减少了 44.47% (40.68%-48.26%) 和 61.72% (47.92%-75.52%)。总体相对敏感性和特异性分别为 1.12 (1.09, 1.14) 和 1.00 (1.00, 1.01)。尽管这些结果令人鼓舞,但由于存在显著的异质性且研究质量参差不齐,仍需谨慎对待。


41. A machine-learned model for predicting weight loss success using weight change features early in treatment.

一种机器学习模型,用于在治疗早期使用体重变化特征来预测减肥成功。

PMID: 39613928 | DOI: 10.1038/s41746-024-01299-y | 日期: 2024-11-29

摘要: Stepped-care obesity treatments aim to improve efficiency by early identification of non-responders and adjusting interventions but lack validated models. We trained a random forest classifier to improve the predictive utility of a clinical decision rule (>0.5 lb weight loss/week) that identifies non-responders in the first 2 weeks of a stepped-care weight loss trial (SMART). From 2009 to 2021, 1058 individuals with obesity participated in three studies: SMART, Opt-IN, and ENGAGED. The model was trained on 80% of the SMART data (224 participants), and its in-distribution generalizability was tested on the remaining 20% (remaining 57 participants). The out-of-distribution generalizability was tested on the ENGAGED and Opt-IN studies (472 participants). The model predicted weight loss at month 6 with an 84.5% AUROC and an 86.3% AUPRC. SHAP identified predictive features: weight loss at week 2, ranges/means and ranges of weight loss, slope, and age. The SMART-trained model showed generalizable performance with no substantial difference across studies.

中文摘要: 阶梯式肥胖治疗旨在通过早期识别无反应者并调整干预措施来提高效率,但缺乏经过验证的模型。我们训练了一个随机森林分类器,以提高一条临床决策规则(每周体重减轻 > 0.5 lb)的预测效用,该规则用于在阶梯式减重试验 (SMART) 的前 2 周内识别无反应者。2009 年至 2021 年间,1058 名肥胖症患者参与了三项研究:SMART、Opt-IN 和 ENGAGED。该模型使用 80% 的 SMART 数据(224 名参与者)进行训练,并在其余 20% 的数据(其余 57 名参与者)上测试其分布内泛化性。分布外泛化性在 ENGAGED 和 Opt-IN 研究(472 名参与者)上进行了测试。该模型预测第 6 个月体重减轻的 AUROC 为 84.5%,AUPRC 为 86.3%。SHAP 确定的预测特征包括:第 2 周的体重减轻、体重减轻的范围/平均值和范围、斜率以及年龄。经 SMART 训练的模型表现出可泛化的性能,各研究之间没有实质性差异。
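
摘要提到的临床决策规则可以粗略写成下面的草稿。这里假设规则的含义是“前 2 周平均每周减重不超过 0.5 lb 即视为无反应者”;函数名、参数和示例体重都是为说明而设的假设,不代表作者的实现,也不涉及文中的随机森林模型:

```python
def is_non_responder(baseline_lb: float, week2_lb: float,
                     threshold_lb_per_week: float = 0.5) -> bool:
    """按假设的规则判断是否为无反应者。

    baseline_lb: 基线体重(磅);week2_lb: 第 2 周体重(磅)。
    前 2 周平均每周减重 <= 阈值时返回 True。
    """
    weekly_loss = (baseline_lb - week2_lb) / 2.0
    return weekly_loss <= threshold_lb_per_week


# 假设的示例:每周减重 0.25 lb 与每周减重 1.5 lb
print(is_non_responder(220.0, 219.5))
print(is_non_responder(220.0, 217.0))
```

这样的单阈值规则只看前 2 周的平均减重速度;摘要中的分类器则在此基础上加入了更多体重变化特征。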


42. An optimal antibiotic selection framework for Sepsis patients using Artificial Intelligence.

使用人工智能为脓毒症患者提供最佳抗生素选择框架。

PMID: 39613924 | DOI: 10.1038/s41746-024-01350-y | 日期: 2024-11-29

摘要: In this work we present OptAB, the first completely data-driven online-updateable antibiotic selection model based on Artificial Intelligence for Sepsis patients accounting for side-effects. OptAB performs an iterative optimal antibiotic selection for real-world Sepsis patients focussing on minimizing the Sepsis-related organ failure score (SOFA-Score) as treatment success while accounting for nephrotoxicity and hepatotoxicity as serious antibiotic side-effects. OptAB provides disease progression forecasts for (combinations of) the antibiotics Vancomycin, Ceftriaxone and Piperacillin/Tazobactam and learns realistic treatment influences on the SOFA-Score and the laboratory values creatinine, bilirubin total and alanine-transaminase indicating possible side-effects. OptAB is based on a hybrid neural network differential equation algorithm and can handle the special characteristics of patient data including irregular measurements, a large amount of missing values and time-dependent confounding. OptAB's selected optimal antibiotics exhibit faster efficacy than the administered antibiotics.

中文摘要: 在这项工作中,我们提出了 OptAB,这是第一个完全数据驱动、可在线更新、基于人工智能并考虑副作用的脓毒症患者抗生素选择模型。OptAB 为真实世界的脓毒症患者迭代地选择最优抗生素,以最大限度地降低脓毒症相关器官衰竭评分 (SOFA-Score) 作为治疗成功的标准,同时将肾毒性和肝毒性作为严重的抗生素副作用加以考虑。OptAB 可针对抗生素万古霉素、头孢曲松和哌拉西林/他唑巴坦(及其组合)给出疾病进展预测,并学习治疗对 SOFA 评分以及提示可能副作用的实验室指标(肌酐、总胆红素和丙氨酸转氨酶)的真实影响。OptAB 基于一种神经网络与微分方程相结合的混合算法,能够处理患者数据的特殊性质,包括测量不规则、大量缺失值和时间依赖性混杂。OptAB 选出的最优抗生素比实际使用的抗生素起效更快。


43. Natural language processing in mixed-methods evaluation of a digital sleep-alcohol intervention for young adults.

对年轻人进行数字睡眠酒精干预的混合方法评估中的自然语言处理。

PMID: 39613828 | DOI: 10.1038/s41746-024-01321-3 | 日期: 2024-11-29

摘要: We used natural language processing (NLP) in convergent mixed methods to evaluate young adults' experiences with Call it a Night (CIAN), a digital personalized feedback and coaching sleep-alcohol intervention. Young adults with heavy drinking (N = 120) were randomized to CIAN or controls (A + SM: web-based advice + self-monitoring or A: advice; clinicaltrials.gov, 8/31/18, #NCT03658954). Most CIAN participants (72.0%) preferred coaching to control interventions. Control participants found advice more helpful than CIAN participants (χ² = 27.34, p < 0.001). Most participants were interested in sleep factors besides alcohol and appreciated increased awareness through monitoring. NLP corroborated generally positive sentiments (M = 15.07(10.54)) and added critical insight that sleep (40%), not alcohol use (12%), was a main participant motivator. All groups had high adherence, satisfaction, and feasibility. CIAN (Δ = 0.48, p = 0.008) and A + SM (Δ = 0.55, p < 0.001) had higher reported effectiveness than A (F(2, 115) = 8.45, p < 0.001). Digital sleep-alcohol interventions are acceptable, and improving sleep and wellness may be important motivations for young adults.

中文摘要: 我们在融合式混合方法中使用自然语言处理 (NLP),评估年轻人使用 Call it a Night (CIAN) 的体验;CIAN 是一种提供个性化反馈和辅导的数字化睡眠-饮酒干预。重度饮酒的年轻人 (N = 120) 被随机分配到 CIAN 组或对照组(A + SM:基于网络的建议 + 自我监测;或 A:仅建议;clinicaltrials.gov,8/31/18,#NCT03658954)。大多数 CIAN 参与者 (72.0%) 相比对照干预更喜欢辅导。对照组参与者比 CIAN 参与者更认为建议有帮助 (χ² = 27.34, p < 0.001)。大多数参与者对酒精以外的睡眠影响因素感兴趣,并认可监测带来的意识提升。NLP 印证了总体积极的情绪 (M = 15.07(10.54)),并补充了一项关键见解:参与者的主要动机是睡眠 (40%),而不是饮酒 (12%)。所有组的依从性、满意度和可行性都很高。CIAN (Δ = 0.48, p = 0.008) 和 A + SM (Δ = 0.55, p < 0.001) 报告的有效性高于 A (F(2, 115) = 8.45, p < 0.001)。数字化睡眠-饮酒干预是可接受的,改善睡眠和健康可能是年轻人的重要动机。


44. Semi automatic quantification of REM sleep without atonia in natural sleep environment.

自然睡眠环境中无张力的快速眼动睡眠半自动量化。

PMID: 39609533 | DOI: 10.1038/s41746-024-01354-8 | 日期: 2024-11-28

摘要: Polysomnography, the gold standard diagnostic tool in sleep medicine, is performed in an artificial environment. This might alter sleep and may not accurately reflect typical sleep patterns. While macro-structures are sensitive to environmental effects, micro-structures remain more stable. In this study we applied semi-automated algorithms to capture REM sleep without atonia (RSWA) and sleep spindles, comparing lab and home measurements. We analyzed 107 full-night recordings from 55 subjects: 24 healthy adults, 28 Parkinson's disease patients (15 RBD), and three with isolated REM sleep behavior disorder (RBD). Sessions were manually scored. An automatic algorithm for quantifying RSWA was developed and tested against manual scoring. RSWAi showed a 60% correlation between home and lab. RBD detection achieved 83% sensitivity, 79% specificity, and 81% balanced accuracy. The algorithm accurately quantified RSWA, enabling the detection of RBD patients. These findings could facilitate more accessible sleep testing, and provide a possible alternative for screening RBD.

中文摘要: 多导睡眠图是睡眠医学的金标准诊断工具,但它是在人工环境中进行的。这可能会改变睡眠,因而未必能准确反映典型的睡眠模式。宏观结构对环境影响敏感,而微观结构则相对稳定。在这项研究中,我们应用半自动算法捕捉无肌张力缺失的快速眼动睡眠 (RSWA) 和睡眠纺锤波,并比较实验室与家庭测量结果。我们分析了 55 名受试者的 107 份整夜记录:24 名健康成年人、28 名帕金森病患者(其中 15 名有 RBD)和 3 名孤立性快速眼动睡眠行为障碍 (RBD) 患者。各次记录均经人工评分。我们开发了一种量化 RSWA 的自动算法,并以人工评分为参照进行了测试。RSWAi 在家庭与实验室之间的相关性为 60%。RBD 检测的灵敏度为 83%,特异性为 79%,平衡准确率为 81%。该算法能够准确量化 RSWA,从而可以检出 RBD 患者。这些发现有助于让睡眠检测更易获得,并为 RBD 筛查提供一种可能的替代方案。
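
摘要中的三个检测指标之间有一个简单关系:对二分类而言,平衡准确率通常定义为灵敏度与特异性的算术平均。下面的草稿用摘要给出的数值做一次核对;函数名 `balanced_accuracy` 是为说明而取的假设名称,原文并未给出计算代码:

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """二分类的平衡准确率:灵敏度与特异性的算术平均。"""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("灵敏度和特异性必须在 0 到 1 之间")
    return (sensitivity + specificity) / 2.0


# 摘要中的 RBD 检测结果:灵敏度 83%,特异性 79%
print(round(balanced_accuracy(0.83, 0.79), 2))
```

由 83% 和 79% 取平均得到的数值与摘要报告的 81% 平衡准确率相符。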


45. A virtual scalable model of the Hepatic Lobule for acetaminophen hepatotoxicity prediction.

用于预测对乙酰氨基酚肝毒性的肝小叶虚拟可扩展模型。

PMID: 39604584 | DOI: 10.1038/s41746-024-01349-5 | 日期: 2024-11-28

摘要: Addressing drug-induced liver injury is crucial in drug development, as it often causes Phase III trial failures and market withdrawals. Traditional animal models fail to predict human liver toxicity accurately. Virtual twins of human organs present a promising solution. We introduce the Virtual Hepatic Lobule, a foundational element of the Living Liver, a multi-scale liver virtual twin. This model integrates blood flow dynamics and an acetaminophen-induced injury model to predict hepatocyte injury patterns specific to patients. By incorporating metabolic zonation, our predictions align with clinical zonal hepatotoxicity observations. This methodology advances the development of a human liver virtual twin, aiding in the prediction and validation of drug-induced liver injuries.

中文摘要: 药物性肝损伤常常导致 III 期试验失败和药物退市,因此应对这一问题在药物开发中至关重要。传统的动物模型无法准确预测人类肝脏毒性。人体器官的虚拟孪生提供了一种有前景的解决方案。我们介绍虚拟肝小叶 (Virtual Hepatic Lobule),它是多尺度肝脏虚拟孪生 Living Liver 的基础组成部分。该模型整合了血流动力学和对乙酰氨基酚诱导的损伤模型,以预测患者特异的肝细胞损伤模式。通过纳入代谢分区,我们的预测与临床观察到的分区性肝毒性一致。该方法推动了人类肝脏虚拟孪生的开发,有助于药物性肝损伤的预测和验证。


46. The path forward for large language models in medicine is open.

医学中大型语言模型的前进道路是开放的。

PMID: 39604549 | DOI: 10.1038/s41746-024-01344-w | 日期: 2024-11-27

摘要: Large language models (LLMs) are increasingly applied in medical documentation and have been proposed for clinical decision support. We argue that the future for LLMs in medicine must be based on transparent and controllable open-source models. Openness enables medical tool developers to control the safety and quality of underlying AI models, while also allowing healthcare professionals to hold these models accountable. For these reasons, the future is open.

中文摘要: 大语言模型 (LLM) 越来越多地应用于医学文档,并已被提议用于临床决策支持。我们认为,LLM 在医学领域的未来必须建立在透明且可控的开源模型之上。开放性使医疗工具开发人员能够控制底层人工智能模型的安全性和质量,同时也使医疗保健专业人员能够对这些模型进行问责。基于这些原因,未来是开放的。


47. A Primer on Reinforcement Learning in Medicine for Clinicians.

临床医生医学强化学习入门。

PMID: 39592855 | DOI: 10.1038/s41746-024-01316-0 | 日期: 2024-11-26

摘要: Reinforcement Learning (RL) is a machine learning paradigm that enhances clinical decision-making for healthcare professionals by addressing uncertainties and optimizing sequential treatment strategies. RL leverages patient-data to create personalized treatment plans, improving outcomes and resource efficiency. This review introduces RL to a clinical audience, exploring core concepts, potential applications, and challenges in integrating RL into clinical practice, offering insights into efficient, personalized, and effective patient care.

中文摘要: 强化学习 (RL) 是一种机器学习范式,可通过解决不确定性和优化序贯治疗策略来增强医疗保健专业人员的临床决策。 RL 利用患者数据创建个性化治疗计划,从而改善结果和资源效率。这篇综述向临床受众介绍了强化学习,探讨了将强化学习融入临床实践的核心概念、潜在应用和挑战,提供了对高效、个性化和有效的患者护理的见解。


48. Artificial Intelligence awarded two Nobel Prizes for innovations that will shape the future of medicine.

人工智能因塑造医学未来的创新而荣获两项诺贝尔奖。

PMID: 39587223 | DOI: 10.1038/s41746-024-01345-9 | 日期: 2024-11-25

摘要: John J. Hopfield and Geoffrey E. Hinton were awarded the 2024 Nobel Prize in Physics for developing machine learning technology using artificial neural networks. In Chemistry it was awarded to Demis Hassabis and John M. Jumper for developing an AI algorithm that solved the 50-year protein structure prediction challenge. This highlights AI's impact on science, medicine and society; however, the winners acknowledge ethical aspects of AI that must be considered.

中文摘要: 约翰·J·霍普菲尔德 (John J. Hopfield) 和杰弗里·E·辛顿 (Geoffrey E. Hinton) 因开发利用人工神经网络的机器学习技术而荣获 2024 年诺贝尔物理学奖。在化学领域,该奖项授予 Demis Hassabis 和 John M. Jumper,以表彰他们开发的人工智能算法解决了 50 年蛋白质结构预测挑战。这凸显了人工智能对科学、医学和社会的影响;然而,获奖者承认必须考虑人工智能的道德问题。


49. Spatial resolution enhancement using deep learning improves chest disease diagnosis based on thick slice CT.

使用深度学习增强空间分辨率可改善基于厚层 CT 的胸部疾病诊断。

PMID: 39580609 | DOI: 10.1038/s41746-024-01338-8 | 日期: 2024-11-23

摘要: CT is crucial for diagnosing chest diseases, with image quality affected by spatial resolution. Thick-slice CT remains prevalent in practice due to cost considerations, yet its coarse spatial resolution may hinder accurate diagnoses. Our multicenter study develops a deep learning synthetic model with Convolutional-Transformer hybrid encoder-decoder architecture for generating thin-slice CT from thick-slice CT on a single center (1576 participants) and assess the synthetic CT on three cross-regional centers (1228 participants). The qualitative image quality of synthetic and real thin-slice CT is comparable (p = 0.16). Four radiologists' accuracy in diagnosing community-acquired pneumonia using synthetic thin-slice CT surpasses thick-slice CT (p < 0.05), and matches real thin-slice CT (p > 0.99). For lung nodule detection, sensitivity with synthetic thin-slice CT outperforms thick-slice CT (p < 0.001) and is comparable to real thin-slice CT (p > 0.05). These findings indicate the potential of our model to generate high-quality synthetic thin-slice CT as a practical alternative when real thin-slice CT is preferred but unavailable.

中文摘要: CT 对于诊断胸部疾病至关重要,其图像质量受空间分辨率的影响。出于成本考虑,厚层 CT 在实践中仍然很普遍,但其较粗的空间分辨率可能妨碍准确诊断。我们的多中心研究开发了一种采用卷积-Transformer 混合编码器-解码器架构的深度学习合成模型,在单个中心(1576 名参与者)的数据上由厚层 CT 生成薄层 CT,并在三个跨区域中心(1228 名参与者)评估合成 CT。合成薄层 CT 与真实薄层 CT 的定性图像质量相当 (p = 0.16)。四名放射科医生使用合成薄层 CT 诊断社区获得性肺炎的准确性超过厚层 CT (p < 0.05),并与真实薄层 CT 相当 (p > 0.99)。对于肺结节检测,合成薄层 CT 的灵敏度优于厚层 CT (p < 0.001),并与真实薄层 CT 相当 (p > 0.05)。这些发现表明,当首选真实薄层 CT 但无法获得时,我们的模型有潜力生成高质量的合成薄层 CT 作为实用的替代方案。


50. Systematic review to understand users perspectives on AI-enabled decision aids to inform shared decision making.

进行系统审查,了解用户对人工智能决策辅助的看法,为共同决策提供信息。

PMID: 39572838 | DOI: 10.1038/s41746-024-01326-y | 日期: 2024-11-21

摘要: Artificial intelligence (AI)-enabled decision aids can contribute to the shared decision-making process between patients and clinicians through personalised recommendations. This systematic review aims to understand users' perceptions on using AI-enabled decision aids to inform shared decision-making. Four databases were searched. The population, intervention, comparison, outcomes and study design tool was used to formulate eligibility criteria. Titles, abstracts and full texts were independently screened and PRISMA guidelines followed. A narrative synthesis was conducted. Twenty-six articles were included, with AI-enabled decision aids used for screening and prevention, prognosis, and treatment. Patients found the AI-enabled decision aids easy to understand and user-friendly, fostering a sense of ownership and promoting better adherence to recommended treatment. Clinicians expressed concerns about how up-to-date the information was and the potential for over- or under-treatment. Despite users' positive perceptions, they also acknowledged certain challenges relating to the usage and risk of bias that would need to be addressed. Registration: PROSPERO database: (CRD42020220320).

中文摘要: 支持人工智能 (AI) 的决策辅助工具可以通过个性化建议,促进患者和临床医生之间的共同决策过程。本系统综述旨在了解用户对使用人工智能决策辅助工具为共同决策提供信息的看法。检索了四个数据库。使用人群、干预、比较、结果和研究设计工具制定纳入标准。标题、摘要和全文均经过独立筛选,并遵循 PRISMA 指南。进行了叙述性综合。共纳入 26 篇文章,其中人工智能决策辅助工具用于筛查和预防、预后以及治疗。患者认为人工智能决策辅助工具易于理解且用户友好,能培养主人翁意识并促进更好地依从推荐的治疗。临床医生对信息是否及时更新以及过度治疗或治疗不足的可能性表示担忧。尽管用户持积极看法,但他们也承认在使用和偏倚风险方面存在一些需要解决的挑战。注册:PROSPERO 数据库 (CRD42020220320)。


51. A data-driven framework for identifying patient subgroups on which an AI/machine learning model may underperform.

一种数据驱动框架,用于识别人工智能/机器学习模型可能表现不佳的患者亚组。

PMID: 39572755 | DOI: 10.1038/s41746-024-01275-6 | 日期: 2024-11-21

摘要: A fundamental goal of evaluating the performance of a clinical model is to ensure it performs well across a diverse intended patient population. A primary challenge is that the data used in model development and testing often consist of many overlapping, heterogeneous patient subgroups that may not be explicitly defined or labeled. While a model's average performance on a dataset may be high, the model can have significantly lower performance for certain subgroups, which may be hard to detect. We describe an algorithmic framework for identifying subgroups with potential performance disparities (AFISP), which produces a set of interpretable phenotypes corresponding to subgroups for which the model's performance may be relatively lower. This could allow model evaluators, including developers and users, to identify possible failure modes prior to wide-scale deployment. We illustrate the application of AFISP by applying it to a patient deterioration model to detect significant subgroup performance disparities, and show that AFISP is significantly more scalable than existing algorithmic approaches.

中文摘要: 评估临床模型性能的基本目标是确保其在多样化的目标患者群体中表现良好。主要挑战在于,模型开发和测试所用的数据通常由许多相互重叠、异质的患者亚组组成,这些亚组可能没有被明确定义或标记。虽然模型在数据集上的平均性能可能很高,但它在某些亚组上的性能可能显著更低,而这往往难以察觉。我们描述了一种用于识别存在潜在性能差异的亚组的算法框架 (AFISP),该框架会生成一组可解释的表型,对应于模型性能可能相对较低的亚组。这可以让模型评估者(包括开发人员和用户)在大规模部署之前识别可能的失效模式。我们将 AFISP 应用于一个患者病情恶化模型,检测出显著的亚组性能差异,以此说明其用法,并表明 AFISP 的可扩展性明显优于现有算法方法。


52. Phenotype driven molecular genetic test recommendation for diagnosing pediatric rare disorders.

表型驱动的分子遗传学测试建议用于诊断儿科罕见疾病。

PMID: 39572625 | DOI: 10.1038/s41746-024-01331-1 | 日期: 2024-11-21

摘要: Patients with rare diseases often experience prolonged diagnostic delays. Ordering appropriate genetic tests is crucial yet challenging, especially for general pediatricians without genetic expertise. Recent American College of Medical Genetics (ACMG) guidelines embrace early use of exome sequencing (ES) or genome sequencing (GS) for conditions like congenital anomalies or developmental delays while still recommending gene panels for patients exhibiting strong manifestations of a specific disease. Recognizing the difficulty in navigating these options, we developed a machine learning model trained on 1005 patient records from Columbia University Irving Medical Center to recommend appropriate genetic tests based on the phenotype information. The model achieved a remarkable performance with an AUROC of 0.823 and AUPRC of 0.918, aligning closely with decisions made by genetic specialists, and demonstrated strong generalizability (AUROC: 0.77, AUPRC: 0.816) in an external cohort, indicating its potential value for general pediatricians to expedite rare disease diagnosis by enhancing genetic test ordering.

中文摘要: 罕见病患者常常经历长时间的诊断延误。开具合适的基因检测至关重要但又颇具挑战,对于没有遗传学专业知识的普通儿科医生尤其如此。美国医学遗传学学院 (ACMG) 的最新指南支持对先天性异常或发育迟缓等情况尽早使用外显子组测序 (ES) 或基因组测序 (GS),同时仍建议对表现出某种特定疾病明显特征的患者使用基因包 (gene panel) 检测。鉴于在这些选项之间做出选择并不容易,我们开发了一个机器学习模型,使用哥伦比亚大学欧文医学中心的 1005 份患者记录进行训练,根据表型信息推荐合适的基因检测。该模型的 AUROC 为 0.823,AUPRC 为 0.918,与遗传学专家的决策高度一致,并在外部队列中表现出较强的泛化能力 (AUROC: 0.77, AUPRC: 0.816),表明它有望帮助普通儿科医生改进基因检测的开具,从而加快罕见病诊断。


53. Learning from the EHR to implement AI in healthcare.

向 EHR 学习,在医疗保健领域实施人工智能。

PMID: 39567723 | DOI: 10.1038/s41746-024-01340-0 | 日期: 2024-11-21

摘要: The introduction of the electronic health record was heralded as a technology solution to improve care quality and efficiency, but these tools have contributed to increased administrative burden and burnout for clinicians. Today, artificial intelligence is receiving much of the same attention and promises as electronic health records. Can healthcare learn from the failures of electronic health records to maximize the potential of artificial intelligence?

中文摘要: 电子健康记录的引入被誉为提高护理质量和效率的技术解决方案,但这些工具却增加了临床医生的管理负担和倦怠。如今,人工智能正受到与电子健康记录同样的关注和承诺。医疗保健能否从电子健康记录的失败中吸取教训,以最大限度地发挥人工智能的潜力?


54. The quality and safety of using generative AI to produce patient-centred discharge instructions.

使用生成式人工智能生成以患者为中心的出院指令的质量和安全性。

PMID: 39567722 | DOI: 10.1038/s41746-024-01336-w | 日期: 2024-11-20

摘要: Patient-centred instructions on discharge can improve adherence and outcomes. Using GPT-3.5 to generate patient-centred discharge instructions, we evaluated responses for safety, accuracy and language simplification. When tested on 100 discharge summaries from MIMIC-IV, potentially harmful safety issues attributable to the AI tool were found in 18%, including 6% with hallucinations and 3% with new medications. AI tools can generate patient-centred discharge instructions, but careful implementation is needed to avoid harms.

中文摘要: 以患者为中心的出院说明可以改善依从性和结局。我们使用 GPT-3.5 生成以患者为中心的出院说明,并评估了其输出的安全性、准确性和语言简化程度。在 MIMIC-IV 的 100 份出院小结上测试时,18% 的输出存在可归因于该人工智能工具的潜在有害安全问题,包括 6% 含有幻觉内容、3% 出现了新增药物。人工智能工具可以生成以患者为中心的出院说明,但需要谨慎实施以避免造成伤害。


55. An iterative approach for estimating domain-specific cognitive abilities from large scale online cognitive data.

一种从大规模在线认知数据估计特定领域认知能力的迭代方法。

PMID: 39562825 | DOI: 10.1038/s41746-024-01327-x | 日期: 2024-11-19

摘要: Online cognitive tasks are gaining traction as scalable and cost-effective alternatives to traditional supervised assessments. However, variability in people's home devices, visual and motor abilities, and speed-accuracy biases confound the specificity with which online tasks can measure cognitive abilities. To address these limitations, we developed IDoCT (Iterative Decomposition of Cognitive Tasks), a method for estimating domain-specific cognitive abilities and trial-difficulty scales from task performance timecourses in a data-driven manner while accounting for device and visuomotor latencies, unspecific cognitive processes and speed-accuracy trade-offs. IDoCT can operate with any computerised task where cognitive difficulty varies across trials. Using data from 388,757 adults, we show that IDoCT successfully dissociates cognitive abilities from these confounding factors. The resultant cognitive scores exhibit stronger dissociation of psychometric factors, improved cross-participant distributions, and meaningful demographic associations. We propose that IDoCT can enhance the precision of online cognitive assessments, especially in large scale clinical and research applications.

中文摘要: 在线认知任务作为传统监督评估的可扩展且具有成本效益的替代方案越来越受到关注。然而,人们的家庭设备、视觉和运动能力的差异以及速度准确性偏差混淆了在线任务测量认知能力的特异性。为了解决这些限制,我们开发了 IDoCT(认知任务的迭代分解),这是一种以数据驱动的方式根据任务表现时间过程估计特定领域认知能力和试验难度的方法,同时考虑设备和视觉运动延迟、非特定认知过程和速度准确性权衡。 IDoCT 可以处理任何认知难度因试验而异的计算机化任务。使用 388,757 名成年人的数据,我们表明 IDoCT 成功地将认知能力与这些混杂因素分开。由此产生的认知得分表现出更强的心理测量因素分离、改善的跨参与者分布以及有意义的人口统计学关联。我们认为 IDoCT 可以提高在线认知评估的精度,特别是在大规模临床和研究应用中。


56. Interpretable machine learning model for digital lung cancer prescreening in Chinese populations with missing data.

可解释的机器学习模型,用于在缺失数据的中国人群中进行数字化肺癌预筛查。

PMID: 39562681 | DOI: 10.1038/s41746-024-01309-z | 日期: 2024-11-19

摘要: We developed an interpretable model, BOUND (Bayesian netwOrk for large-scale lUng caNcer Digital prescreening), using a comprehensive EHR dataset from the China to improve lung cancer detection rates. BOUND employs Bayesian network uncertainty inference, allowing it to predict lung cancer risk even with missing data and identify high-risk factors. Developed using data from 905,194 individuals, BOUND achieved an AUC of 0.866 in internal validation, with time- and geography-based external validations yielding AUCs of 0.848 and 0.841, respectively. In datasets with 10%-70% missing data, AUC ranged from 0.827 - 0.746. The model demonstrates strong calibration, clinical utility, and robust performance in both balanced and imbalanced datasets. A risk scorecard was also created, improving detection rates up to 6.8 times, available free online ( https://drzhang1.aiself.net/ ). BOUND enables non-radiative, cost-effective lung cancer prescreening, excels with missing data, and addresses treatment inequities in resource-limited primary healthcare settings.

中文摘要: 我们开发了一个可解释的模型 BOUND(用于大规模肺癌数字预筛查的贝叶斯网络),使用来自中国的综合 EHR 数据集来提高肺癌检出率。 BOUND 采用贝叶斯网络不确定性推理,即使在数据缺失的情况下也能预测肺癌风险,并识别高风险因素。 BOUND 使用来自 905,194 名个体的数据进行开发,在内部验证中实现了 0.866 的 AUC,基于时间和地理的外部验证的 AUC 分别为 0.848 和 0.841。在缺失数据比例为 10%-70% 的数据集中,AUC 范围为 0.827 至 0.746。该模型在平衡和不平衡数据集中均表现出良好的校准、临床实用性和稳健的性能。还创建了风险记分卡,可将检出率最多提高 6.8 倍,可免费在线获取(https://drzhang1.aiself.net/)。 BOUND 可实现无辐射、经济高效的肺癌预筛查,擅长处理缺失数据,并解决资源有限的初级医疗机构中的治疗不平等问题。


57. A strategy for cost-effective large language model use at health system-scale.

在卫生系统范围内使用具有成本效益的大型语言模型的策略。

PMID: 39558090 | DOI: 10.1038/s41746-024-01315-1 | 日期: 2024-11-18

摘要: Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at the health system scale are underexplored. We evaluated how concatenating queries with multiple clinical notes and tasks simultaneously affects model performance under increasing computational loads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments of various task sizes and configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions and notes increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but performance deteriorated after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models can concatenate up to 50 simultaneous tasks effectively, with validation on a public medical question-answering dataset. An economic analysis demonstrated up to a 17-fold cost reduction at 50 tasks using concatenation. These results identify the limits of LLMs for effective utilization and highlight avenues for cost-efficiency at the enterprise scale.

中文摘要: 大语言模型(LLM)可以优化临床工作流程;然而,在卫生系统范围内利用它们所面临的经济和计算挑战尚未得到充分探讨。我们评估了在计算负载增加的情况下,将针对多个临床记录和任务的查询拼接在一起同时提交如何影响模型性能。我们利用真实世界的患者数据评估了十个不同能力和规模的 LLM。我们对各种任务规模和配置进行了超过 300,000 次实验,测量问答的准确性以及正确格式化输出的能力。随着问题和临床记录数量的增加,性能下降。 Llama-3-70b 等高容量模型失败率低且准确度高。 GPT-4-turbo-128k 在不同任务负担下具有类似的稳健性,但在大提示规模下超过 50 个任务后性能下降。在解决可缓解的失败后,这两个模型可以有效地拼接多达 50 个同时进行的任务,并在公共医学问答数据集上得到验证。经济分析表明,在 50 个任务时使用拼接可将成本降低多达 17 倍。这些结果确定了 LLM 有效利用的限度,并指出了在企业规模上提高成本效益的途径。
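
The cost argument behind query concatenation can be illustrated with a small sketch. It assumes one simplified reading of the setup: several questions are asked about a single shared clinical note, so a concatenated prompt pays for the note's tokens once instead of once per task. The function names, token counts, and per-token prices below are hypothetical and are not taken from the study; the sketch does not attempt to reproduce the reported 17-fold figure.

```python
def separate_cost(note_tokens, question_tokens, answer_tokens,
                  n_tasks, in_price, out_price):
    """Cost when each task is its own request, so the note is re-sent every time."""
    input_tokens = n_tasks * (note_tokens + question_tokens)
    output_tokens = n_tasks * answer_tokens
    return input_tokens * in_price + output_tokens * out_price


def concatenated_cost(note_tokens, question_tokens, answer_tokens,
                      n_tasks, in_price, out_price):
    """Cost when all tasks share one request, so the note is sent only once."""
    input_tokens = note_tokens + n_tasks * question_tokens
    output_tokens = n_tasks * answer_tokens
    return input_tokens * in_price + output_tokens * out_price


# Hypothetical numbers chosen only for illustration.
params = dict(note_tokens=4000, question_tokens=30, answer_tokens=20,
              in_price=1e-5, out_price=3e-5)

for n in (1, 10, 50):
    ratio = separate_cost(n_tasks=n, **params) / concatenated_cost(n_tasks=n, **params)
    print(f"{n:>2} tasks: separate/concatenated cost ratio = {ratio:.1f}")
```

In this toy model the saving grows with the number of tasks and with the size of the shared note relative to each question; it ignores the accuracy loss at high task counts that the abstract reports.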


58. Simulating A/B testing versus SMART designs for LLM-driven patient engagement to close preventive care gaps.

模拟 A/B 测试与 SMART 设计,用于大语言模型驱动的患者参与,以缩小预防性护理差距。

PMID: 39558021 | DOI: 10.1038/s41746-024-01330-2 | 日期: 2024-11-18

摘要: Population health initiatives often rely on cold outreach to close gaps in preventive care, such as overdue screenings or immunizations. Tailoring messages to diverse patient populations remains challenging, as traditional A/B testing requires large sample sizes to test only two alternative messages. With increasing availability of large language models (LLMs), programs can utilize tiered testing among both LLM and manual human agents, presenting the dilemma of identifying which patients need different levels of human support to cost-effectively engage large populations. Using microsimulations, we compared both the statistical power and false positive rates of A/B testing and Sequential Multiple Assignment Randomized Trials (SMART) for developing personalized communications across multiple effect sizes and sample sizes. SMART showed better cost-effectiveness and net benefit across all scenarios, but superior power for detecting heterogeneous treatment effects (HTEs) only in later randomization stages, when populations were more homogeneous and subtle differences drove engagement differences.

中文摘要: 人口健康举措通常依靠主动陌生联络(cold outreach)来缩小预防保健方面的差距,例如逾期的筛查或免疫接种。针对不同患者群体定制消息仍然具有挑战性,因为传统的 A/B 测试需要大量样本才能仅测试两种备选消息。随着大型语言模型 (LLM) 的可用性不断增加,项目可以在 LLM 和人工客服之间采用分层测试,由此带来一个难题:如何确定哪些患者需要何种程度的人工支持,才能经济有效地触达大量人群。使用微观模拟,我们比较了 A/B 测试和序贯多重分配随机试验 (SMART) 在多种效应大小和样本量下开发个性化沟通时的统计功效和假阳性率。 SMART 在所有情景下都显示出更好的成本效益和净收益,但仅在后期随机化阶段才对检测异质性治疗效应 (HTE) 具有更高的功效,此时人群更加同质,细微的差异驱动了参与度差异。


59. Accurately predicting mood episodes in mood disorder patients using wearable sleep and circadian rhythm features.

使用可穿戴睡眠和昼夜节律特征准确预测情绪障碍患者的情绪发作。

PMID: 39557997 | DOI: 10.1038/s41746-024-01333-z | 日期: 2024-11-18

摘要: Wearable devices enable passive collection of sleep, heart rate, and step-count data, offering potential for mood episode prediction in mood disorder patients. However, current models often require various data types, limiting real-world application. Here, we develop models that predict future episodes using only sleep-wake data, which are easily gathered through smartphones and wearables, when trained on an individual's sleep-wake history and past mood episodes. Applying mathematical modeling to longitudinal data from 168 patients (587 days average clinical follow-up, 267 days wearable data), we derived 36 sleep and circadian rhythm features. These features enabled accurate next-day predictions for depressive, manic, and hypomanic episodes (AUCs: 0.80, 0.98, 0.95). Notably, daily circadian phase shifts were the most significant predictors: delays were linked to depressive episodes, advances to manic episodes. This prospective observational cohort study (ClinicalTrials.gov: NCT03088657, 2017-3-23) shows that sleep-wake data, combined with prior mood episode history, can effectively predict mood episodes, enhancing mood disorder management.

中文摘要: 可穿戴设备能够被动收集睡眠、心率和步数数据,为情绪障碍患者的情绪发作预测提供了潜力。然而,当前的模型通常需要多种数据类型,限制了实际应用。在这里,我们开发了仅使用睡眠-觉醒数据来预测未来发作的模型;这些数据可以通过智能手机和可穿戴设备轻松收集,模型则基于个人的睡眠-觉醒历史和既往情绪发作进行训练。通过对 168 名患者的纵向数据(平均 587 天临床随访,267 天可穿戴数据)进行数学建模,我们得出了 36 个睡眠和昼夜节律特征。这些特征可以准确预测次日的抑郁、躁狂和轻躁狂发作(AUC:0.80、0.98、0.95)。值得注意的是,每日昼夜节律相位变化是最重要的预测因素:相位延迟与抑郁发作有关,相位提前与躁狂发作有关。这项前瞻性观察队列研究(ClinicalTrials.gov:NCT03088657,2017-3-23)显示,睡眠-觉醒数据与既往情绪发作史相结合,可以有效预测情绪发作,增强情绪障碍管理。


60. Developing a Canadian artificial intelligence medical curriculum using a Delphi study.

使用德尔菲研究开发加拿大人工智能医学课程。

PMID: 39557985 | DOI: 10.1038/s41746-024-01307-1 | 日期: 2024-11-18

摘要: The integration of artificial intelligence (AI) education into medical curricula is critical for preparing future healthcare professionals. This research employed the Delphi method to establish an expert-based AI curriculum for Canadian undergraduate medical students. A panel of 18 experts in health and AI across Canada participated in three rounds of surveys to determine essential AI learning competencies. The study identified key curricular components across ethics, law, theory, application, communication, collaboration, and quality improvement. The findings demonstrate substantial support among medical educators and professionals for the inclusion of comprehensive AI education, with 82 out of 107 curricular competencies being deemed essential to address both clinical and educational priorities. It additionally provides suggestions on methods to integrate these competencies within existing dense medical curricula. The endorsed set of objectives aims to enhance AI literacy and application skills among medical students, equipping them to effectively utilize AI technologies in future healthcare settings.

中文摘要: 将人工智能(AI)教育融入医学课程对于培养未来的医疗保健专业人员至关重要。本研究采用德尔菲法为加拿大医学本科生建立了基于专家的人工智能课程。由加拿大各地 18 名健康和人工智能专家组成的小组参与了三轮调查,以确定基本的人工智能学习能力。该研究确定了道德、法律、理论、应用、沟通、协作和质量改进的关键课程组成部分。调查结果表明,医学教育工作者和专业人士大力支持纳入全面的人工智能教育,107 项课程能力中的 82 项被认为对于解决临床和教育优先事项至关重要。它还提供了有关将这些能力整合到现有密集医学课程中的方法的建议。批准的一系列目标旨在提高医学生的人工智能素养和应用技能,使他们能够在未来的医疗保健环境中有效利用人工智能技术。


61. Reinforcement learning model for optimizing dexmedetomidine dosing to prevent delirium in critically ill patients.

用于优化右美托咪定剂量以预防危重患者谵妄的强化学习模型。

PMID: 39557970 | DOI: 10.1038/s41746-024-01335-x | 日期: 2024-11-18

摘要: Delirium can result in undesirable outcomes, including increased length of stay and mortality, in patients admitted to the intensive care unit (ICU). Dexmedetomidine has emerged for delirium prevention in these patients; however, optimal dosing is challenging. A reinforcement learning-based Artificial Intelligence model for Delirium prevention (AID) is proposed to optimize dexmedetomidine dosing. The model was developed and internally validated using 2416 patients (2531 ICU admissions) and externally validated on 270 patients (274 ICU admissions). The estimated performance return of the AID policy was higher than that of the clinicians' policy in both the derivation (0.390, 95% confidence interval [CI] 0.361 to 0.420 vs. -0.051, 95% CI -0.077 to -0.025) and external validation (0.186, 95% CI 0.139 to 0.236 vs. -0.436, 95% CI -0.474 to -0.402) cohorts. Our findings indicate that AID might support clinicians' decision-making regarding dexmedetomidine dosing to prevent delirium in ICU patients, but further off-policy evaluation is required.

中文摘要: 谵妄可能会导致不良后果,包括入住重症监护病房 (ICU) 的患者住院时间延长和死亡率增加。右美托咪定已被用于预防这些患者的谵妄;然而,确定最佳剂量具有挑战性。本文提出了一种基于强化学习的谵妄预防人工智能模型 (AID) 来优化右美托咪定剂量。该模型使用 2416 名患者(2531 次 ICU 入院)进行开发和内部验证,并在 270 名患者(274 次 ICU 入院)上进行外部验证。在推导队列(0.390,95% CI 0.361 至 0.420 对比 -0.051,95% CI -0.077 至 -0.025)和外部验证队列(0.186,95% CI 0.139 至 0.236 对比 -0.436,95% CI -0.474 至 -0.402)中,AID 策略的估计回报均高于临床医生的策略。我们的研究结果表明,AID 可能支持临床医生关于右美托咪定剂量的决策,以预防 ICU 患者的谵妄,但需要进一步的离策略评估。


62. Cost-effectiveness analysis of mHealth applications for depression in Germany using a Markov cohort simulation.

使用马尔可夫队列模拟对德国抑郁症的移动医疗应用进行成本效益分析。

PMID: 39551808 | DOI: 10.1038/s41746-024-01324-0 | 日期: 2024-11-17

摘要: Regulated mobile health applications are called digital health applications ("DiGA") in Germany. To qualify for reimbursement by statutory health insurance companies, DiGA have to prove positive care effects in scientific studies. Since the empirical exploration of DiGA cost-effectiveness remains largely uncharted, this study pioneers the methodology of cohort-based state-transition Markov models to evaluate DiGA for depression. As health states, we define mild, moderate, severe depression, remission and death. Comparing a future scenario where 50% of patients receive supplementary DiGA access with the current standard of care reveals a gain of 0.02 quality-adjusted life years (QALYs) per patient, which comes at additional direct costs of ~1536 EUR per patient over a five-year timeframe. Influencing factors determining DiGA cost-effectiveness are the DiGA cost structure and individual DiGA effectiveness. Under Germany's existing cost structure, DiGA for depression are yet to demonstrate the ability to generate overall savings in healthcare expenditures.

中文摘要: 受监管的移动健康应用程序在德国称为数字健康应用程序("DiGA")。为了获得法定健康保险公司报销的资格,DiGA 必须在科学研究中证明积极的护理效果。由于 DiGA 成本效益的实证探索在很大程度上仍然未知,因此本研究开创了基于队列的状态转移马尔可夫模型的方法来评估 DiGA 治疗抑郁症的效果。作为健康状态,我们定义轻度、中度、重度抑郁、缓解和死亡。将未来 50% 的患者接受补充 DiGA 访问与当前护理标准的情况进行比较,发现每位患者的质量调整生命年 (QALY) 增加了 0.02 个,而在五年的时间范围内,每位患者的额外直接成本约为 1536 欧元。决定 DiGA 成本效益的影响因素是 DiGA 成本结构和个体 DiGA 有效性。根据德国现有的成本结构,针对抑郁症的 DiGA 尚未证明能够总体节省医疗支出。
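
The mechanics of a cohort-based state-transition Markov model can be sketched as follows, using the five health states named in the abstract. Every number here (transition probabilities, utility weights, cycle length, starting distribution) is a hypothetical placeholder rather than a value from the study, and the sketch omits costs, discounting, and the DiGA-versus-standard-care comparison.

```python
STATES = ["mild", "moderate", "severe", "remission", "death"]

# Hypothetical per-cycle transition probabilities; each row sums to 1.
TRANSITIONS = {
    "mild":      {"mild": 0.60, "moderate": 0.15, "severe": 0.00, "remission": 0.24, "death": 0.01},
    "moderate":  {"mild": 0.20, "moderate": 0.55, "severe": 0.14, "remission": 0.10, "death": 0.01},
    "severe":    {"mild": 0.05, "moderate": 0.25, "severe": 0.63, "remission": 0.05, "death": 0.02},
    "remission": {"mild": 0.10, "moderate": 0.03, "severe": 0.01, "remission": 0.85, "death": 0.01},
    "death":     {"mild": 0.00, "moderate": 0.00, "severe": 0.00, "remission": 0.00, "death": 1.00},
}

# Hypothetical utility weights (quality of life per year spent in each state).
UTILITIES = {"mild": 0.75, "moderate": 0.60, "severe": 0.40, "remission": 0.90, "death": 0.0}


def run_cohort(start, transitions, utilities, n_cycles, cycle_years):
    """Propagate a cohort distribution through the model and accumulate QALYs."""
    dist = dict(start)
    qalys = 0.0
    for _ in range(n_cycles):
        # QALYs accrued during this cycle, weighted by state occupancy.
        qalys += sum(dist[s] * utilities[s] for s in STATES) * cycle_years
        # Move the cohort one cycle forward.
        nxt = {s: 0.0 for s in STATES}
        for src in STATES:
            for dst, prob in transitions[src].items():
                nxt[dst] += dist[src] * prob
        dist = nxt
    return dist, qalys


# Hypothetical starting cohort and 20 quarterly cycles (five years).
start = {"mild": 0.4, "moderate": 0.4, "severe": 0.2, "remission": 0.0, "death": 0.0}
final_dist, total_qalys = run_cohort(start, TRANSITIONS, UTILITIES, n_cycles=20, cycle_years=0.25)
print(final_dist, round(total_qalys, 3))
```

A cost-effectiveness comparison would run such a model twice, once per scenario with scenario-specific transition probabilities and costs, and compare the resulting QALYs and costs per patient.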


63. The effects of a digital health intervention on patient activation in chronic kidney disease.

数字健康干预对慢性肾脏病患者激活的影响。

PMID: 39533053 | DOI: 10.1038/s41746-024-01296-1 | 日期: 2024-11-12

摘要: My Kidneys & Me (MK&M), a digital health intervention delivering specialist health and lifestyle education for people with CKD, was developed and its effects tested (SMILE-K trial, ISRCTN18314195, 18/12/2020). 420 adult patients with CKD stages 3-4 were recruited and randomised 2:1 to intervention (MK&M) (n = 280) or control (n = 140) groups. Outcomes, including Patient Activation Measure (PAM-13), were collected at baseline and 20 weeks. Complete case (CC) and per-protocol (PP) analyses were conducted. 210 (75%) participants used MK&M more than once. PAM-13 increased at 20 weeks compared to control (CC: +3.1 (95%CI: -0.2 to 6.4), P = 0.065; PP: +3.6 (95%CI: 0.2 to 7.0), P = 0.041). In those with low activation at baseline, significant between-group differences were observed (CC: +6.6 (95%CI: 1.3 to 11.9), P = 0.016; PP: +9.2 (95%CI: 4.0 to 14.6), P < 0.001) favouring MK&M group. MK&M improved patient activation in those who used the resource compared to standard care, although the overall effect was non-significant. The greatest benefits were seen in those with low activation.

中文摘要: My Kidneys & Me (MK&M) 是一种数字健康干预措施,为 CKD 患者提供专业健康和生活方式教育;本研究开发了该干预措施并对其效果进行了测试(SMILE-K 试验,ISRCTN18314195,2020 年 12 月 18 日)。招募了 420 名 CKD 3-4 期成年患者,并以 2:1 的比例随机分为干预组 (MK&M) (n = 280) 或对照组 (n = 140)。在基线和 20 周时收集结局指标,包括患者激活度量表 (PAM-13)。进行了完整病例 (CC) 和符合方案 (PP) 分析。 210 名 (75%) 参与者使用 MK&M 超过一次。与对照组相比,PAM-13 在第 20 周时升高(CC:+3.1(95%CI:-0.2 至 6.4),P = 0.065;PP:+3.6(95%CI:0.2 至 7.0),P = 0.041)。在基线激活度较低的患者中,观察到显著的组间差异(CC:+6.6(95%CI:1.3 至 11.9),P = 0.016;PP:+9.2(95%CI:4.0 至 14.6),P < 0.001),有利于 MK&M 组。与标准护理相比,MK&M 提高了实际使用该资源的患者的激活度,尽管总体效果并不显著。获益最大的是基线激活度较低的患者。


64. Multisource representation learning for pediatric knowledge extraction from electronic health records.

从电子健康记录中提取儿科知识的多源表示学习。

PMID: 39533050 | DOI: 10.1038/s41746-024-01320-4 | 日期: 2024-11-13

摘要: Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts. MUGS integrates graphical data from both pediatric and general EHR systems, along with hierarchical medical ontologies, to create embeddings that adaptively capture both the homogeneity and heterogeneity between hospital systems. These embeddings enable refined EHR feature engineering and nuanced patient profiling, proving particularly effective in identifying pediatric patients similar to specific profiles, with a focus on pulmonary hypertension (PH). MUGS embeddings, resistant to negative transfer, outperform other benchmark methods in multiple applications, advancing evidence-based pediatric research.

中文摘要: 由于临床研究的高障碍,电子健康记录 (EHR) 系统在儿科中特别有价值,但儿科 EHR 数据往往内容密度低。现有的针对一般患者群体定制的 EHR 代码嵌入无法满足儿科患者的独特需求。为了弥补这一差距,我们引入了一种迁移学习方法,MUltisource Graph Synthesis (MUGS),旨在儿科背景下准确的知识提取和关系检测。 MUGS 集成了来自儿科和普通 EHR 系统的图形数据以及分层医学本体,以创建嵌入,自适应地捕获医院系统之间的同质性和异质性。这些嵌入可以实现精细的 EHR 特征工程和细致入微的患者分析,事实证明在识别与特定特征相似的儿科患者方面特别有效,重点关注肺动脉高压 (PH)。 MUGS 嵌入具有抗负迁移能力,在多种应用中优于其他基准方法,从而推进了基于证据的儿科研究。


65. Simulated misuse of large language models and clinical credit systems.

模拟大型语言模型和临床信用系统的滥用。

PMID: 39528596 | DOI: 10.1038/s41746-024-01306-2 | 日期: 2024-11-11

摘要: In the future, large language models (LLMs) may enhance the delivery of healthcare, but there are risks of misuse. These methods may be trained to allocate resources via unjust criteria involving multimodal data - financial transactions, internet activity, social behaviors, and healthcare information. This study shows that LLMs may be biased in favor of collective/systemic benefit over the protection of individual rights and could facilitate AI-driven social credit systems.

中文摘要: 未来,大语言模型(LLM)可能会改善医疗保健的提供,但存在滥用的风险。这些方法可能被训练为依据涉及多模态数据(金融交易、互联网活动、社会行为和医疗保健信息)的不公正标准来分配资源。这项研究表明,LLM 可能会偏向集体/系统利益而非保护个人权利,并可能助长人工智能驱动的社会信用体系。


66. 增强现实与传统导航钻孔轨迹的准确性和效率:随机交叉试验。

PMID: 39523443 | DOI: 10.1038/s41746-024-01314-2 | 日期: 2024-11-10

摘要: Conventional navigation systems (CNS) in surgery require strong spatial cognitive abilities and hand-eye coordination. Augmented Reality Navigation Systems (ARNS) provide 3D guidance and may overcome these challenges, but their accuracy and efficiency compared to CNS have not been systematically evaluated. In this randomized crossover study with 36 participants from different professional backgrounds (surgeons, students, engineers), drilling accuracy, time and perceived workload were evaluated using ARNS and CNS. For the first time, this study provides compelling evidence that ARNS and CNS have comparable accuracy in translational error. Differences in angle and depth error with ARNS were likely due to limited stereoscopic vision, hardware limitations, and design. Despite this, ARNS was preferred by most participants, including surgeons with prior navigation experience, and demonstrated a significantly better overall user experience. Depending on accuracy requirements, ARNS could serve as a viable alternative to CNS for guided drilling, with potential for future optimization.

中文摘要: 手术中的传统导航系统(CNS)需要强大的空间认知能力和手眼协调能力。增强现实导航系统 (ARNS) 提供 3D 引导并可能克服这些挑战,但与 CNS 相比,其准确性和效率尚未得到系统评估。在这项由来自不同专业背景(外科医生、学生、工程师)的 36 名参与者参与的随机交叉研究中,使用 ARNS 和 CNS 评估了钻孔精度、时间和感知工作量。这项研究首次提供了令人信服的证据,证明 ARNS 和 CNS 在平移误差方面具有相当的准确性。 ARNS 的角度和深度误差差异可能是由于有限的立体视觉、硬件限制和设计造成的。尽管如此,ARNS 仍受到大多数参与者的青睐,包括具有导航经验的外科医生,并且表现出显著更好的整体用户体验。根据精度要求,ARNS 可以作为 CNS 在引导钻孔中的可行替代方案,并具有未来优化的潜力。


67. Post-marketing surveillance of anticancer drugs using natural language processing of electronic medical records.

使用电子病历的自然语言处理进行抗癌药物的上市后监测。

PMID: 39521935 | DOI: 10.1038/s41746-024-01323-1 | 日期: 2024-11-09

摘要: This study demonstrates that adverse events (AEs) extracted using natural language processing (NLP) from clinical texts reflect the known frequencies of AEs associated with anticancer drugs. Using data from 44,502 cancer patients at a single hospital, we identified cases prescribed anticancer drugs (platinum, PLT; taxane, TAX; pyrimidine, PYA) and compared them to non-treatment (NTx) group using propensity score matching. Over 365 days, AEs (peripheral neuropathy, PN; oral mucositis, OM; taste abnormality, TA; appetite loss, AL) were extracted from clinical text using an NLP tool. The hazard ratios (HRs) for the anticancer drugs were: PN, 1.15-1.95; OM, 3.11-3.85; TA, 3.48-4.71; and AL, 1.98-3.84; the HRs were significantly higher than that of the NTx group. Sensitivity analysis revealed that the HR for TA may have been underestimated; however, the remaining three types of AEs extracted from clinical text by NLP were consistently associated with the three anticancer drugs.

中文摘要: 这项研究表明,使用自然语言处理 (NLP) 从临床文本中提取的不良事件 (AE) 反映了与抗癌药物相关的 AE 的已知频率。利用来自单家医院 44,502 名癌症患者的数据,我们确定了处方抗癌药物(铂类,PLT;紫杉烷类,TAX;嘧啶类,PYA)的病例,并使用倾向评分匹配将其与未治疗 (NTx) 组进行比较。在 365 天的时间里,使用 NLP 工具从临床文本中提取 AE(周围神经病变,PN;口腔粘膜炎,OM;味觉异常,TA;食欲不振,AL)。抗癌药物的风险比 (HR) 为:PN,1.15-1.95;OM,3.11-3.85;TA,3.48-4.71;AL,1.98-3.84;这些 HR 均显著高于 NTx 组。敏感性分析显示,TA 的 HR 可能被低估;然而,通过 NLP 从临床文本中提取的其余三类 AE 与三种抗癌药物一致相关。


68. Artificial intelligence assisted operative anatomy recognition in endoscopic pituitary surgery.

人工智能辅助内窥镜垂体手术中的手术解剖识别。

PMID: 39521895 | DOI: 10.1038/s41746-024-01273-8 | 日期: 2024-11-09

摘要: Pituitary tumours are surrounded by critical neurovascular structures and identification of these intra-operatively can be challenging. We have previously developed an AI model capable of sellar anatomy segmentation. This study aims to apply this model, and explore the impact of AI-assistance on clinician anatomy recognition. Participants were tasked with labelling the sella on six images, initially without assistance, then augmented by AI. Mean DICE scores and the proportion of annotations encompassing the centroid of the sella were calculated. Six medical students, six junior trainees, six intermediate trainees and six experts were recruited. There was an overall improvement in sella recognition from a DICE score of 70.7% without AI assistance to 77.5% with AI assistance (+6.7; p < 0.001). Medical students used and benefitted from AI assistance the most, improving from a DICE score of 66.2% to 78.9% (+12.8; p = 0.02). This technology has the potential to augment surgical education and eventually be used as an intra-operative decision support tool.

中文摘要: 垂体肿瘤被关键的神经血管结构包围,术中识别这些结构可能具有挑战性。我们之前开发了一种能够进行鞍区解剖分割的人工智能模型。本研究旨在应用该模型,探讨人工智能辅助对临床医生解剖识别的影响。参与者的任务是在六张图像上标记蝶鞍,最初没有帮助,然后通过人工智能进行增强。计算平均 DICE 分数和包含蝶鞍质心的注释比例。招收医学生6名、初级实习生6名、中级实习生6名、专家6名。鞍区识别率整体提高,从没有 AI 辅助的 DICE 得分 70.7% 提高到有 AI 辅助的 77.5%(+6.7;p<0.001)。医学生使用人工智能辅助并从中受益最多,DICE 分数从 66.2% 提高到 78.9% (+12.8;p = 0.02)。该技术有潜力增强外科教育并最终用作术中决策支持工具。
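
The DICE score used to compare annotations is the standard overlap measure 2|A∩B| / (|A| + |B|) between two binary masks. The following is a minimal sketch on small nested-list masks; the function name and the toy masks are illustrative assumptions, and the study's actual evaluation pipeline is not described in the abstract.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two same-shaped binary masks given as nested lists of 0/1."""
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a:
                size_a += 1
            if b:
                size_b += 1
    if size_a + size_b == 0:
        return 1.0  # convention chosen here: two empty masks count as a perfect match
    return 2.0 * intersection / (size_a + size_b)


# Toy example: a reference mask and an annotation shifted one column to the right.
reference = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
annotation = [[0, 0, 1, 1],
              [0, 0, 1, 1]]
print(dice_score(reference, annotation))
```

Here each mask covers four pixels and two of them overlap, so the score is 2 × 2 / (4 + 4) = 0.5.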


69. Ehealth interactive intervention in promoting safer sex among men who have sex with men.

电子健康互动干预措施促进男男性行为者的安全性行为。

PMID: 39516336 | DOI: 10.1038/s41746-024-01313-3 | 日期: 2024-11-09

摘要: Men who have sex with men (MSM) who use dating applications (apps) have higher rates of engaging in condomless anal sex than those who do not. Therefore, we conducted a two-arm randomized controlled trial to evaluate the effectiveness of an interactive web-based intervention in promoting safer sex among this population. The intervention was guided by the Theory of Planned Behavior and co-designed by researchers, healthcare providers, and MSM participants. The primary outcome was the frequency of condomless anal sex in past three months. Secondary outcomes included five other behavioral outcomes and two psychological outcomes. This trial was registered on ISRCTN (ISRCTN16681863) on 2020/04/28. A total of 480 MSM were enrolled and randomly assigned to the intervention or control group. Our findings indicate that the intervention significantly reduced condomless anal sex behaviors by enhancing self-efficacy and attitudes toward condom use among MSM dating app users, with the effects sustained at both three and six months.

中文摘要: 使用约会应用程序 (app) 的男男性行为者 (MSM) 发生无套肛交的比例高于不使用此类应用程序的 MSM。因此,我们进行了一项双组随机对照试验,以评估基于网络的交互式干预措施在促进该人群安全性行为方面的有效性。该干预措施以计划行为理论为指导,由研究人员、医疗保健提供者和 MSM 参与者共同设计。主要结果是过去三个月无套肛交的频率。次要结果包括其他五种行为结果和两种心理结果。该试验于 2020 年 4 月 28 日在 ISRCTN(ISRCTN16681863)上注册。总共 480 名 MSM 被招募并随机分配到干预组或对照组。我们的研究结果表明,干预措施通过提高 MSM 约会应用程序用户的自我效能和对安全套使用的态度,显著减少了无套肛交行为,效果在三个月和六个月时均得以维持。


70. Automated decision making in Barrett's oesophagus: development and deployment of a natural language processing tool.

巴雷特食道的自动决策:自然语言处理工具的开发和部署。

PMID: 39511374 | DOI: 10.1038/s41746-024-01302-6 | 日期: 2024-11-07

摘要: Manual decisions regarding the timing of surveillance endoscopy for premalignant Barrett's oesophagus (BO) are error-prone. This leads to inefficient resource usage and safety risks. To automate decision-making, we fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models to categorize BO length (EndoBERT) and worst histopathological grade (PathBERT) on 4,831 endoscopy and 4,581 pathology reports from Guy's and St Thomas' Hospital (GSTT). The accuracies for EndoBERT test sets from GSTT, King's College Hospital (KCH), and Sandwell and West Birmingham Hospitals (SWB) were 0.95, 0.86, and 0.99, respectively. Average accuracies for PathBERT were 0.93, 0.91, and 0.92, respectively. A retrospective analysis of 1640 GSTT reports revealed a 27% discrepancy between endoscopists' decisions and model recommendations. This study underscores the development and deployment of NLP-based software in BO surveillance, demonstrating high performance at multiple sites. The analysis emphasizes the potential efficiency of automation in enhancing precision and guideline adherence in clinical decision-making.

中文摘要: 关于癌前巴雷特食管 (BO) 监测内窥镜检查时间的手动决定很容易出错。这导致资源利用效率低下和安全风险。为了实现决策自动化,我们对来自 Transformers (BERT) 模型的双向编码器表示进行了微调,以对盖伊圣托马斯医院 (GSTT) 的 4,831 份内窥镜检查和 4,581 份病理报告中的 BO 长度 (EndoBERT) 和最差组织病理学分级 (PathBERT) 进行分类。 GSTT、国王学院医院 (KCH) 以及桑德韦尔和西伯明翰医院 (SWB) 的 EndoBERT 测试集的准确度分别为 0.95、0.86 和 0.99。 PathBERT 的平均准确度分别为 0.93、0.91 和 0.92。对 1640 份 GSTT 报告的回顾性分析显示,内窥镜医师的决定与模型建议之间存在 27% 的差异。这项研究强调了基于 NLP 的软件在 BO 监测中的开发和部署,在多个地点展示了高性能。该分析强调了自动化在提高临床决策的准确性和指南遵守方面的潜在效率。


71. The adaptation of a single institution diabetes care platform into a nationally available turnkey solution.

将单一机构糖尿病护理平台改造为全国可用的交钥匙解决方案。

PMID: 39506045 | DOI: 10.1038/s41746-024-01319-x | 日期: 2024-11-06

摘要: Digital decision support and remote patient monitoring may improve outcomes and efficiency, but rarely scale beyond a single institution. Over the last 5 years, the platform Timely Interventions for Diabetes Excellence (TIDE) has been associated with reduced care provider screen time and improved, equitable type 1 diabetes care and outcomes for 268 patients in a heterogeneous population as part of the Teamwork, Targets, Technology, and Tight Control (4T) Study (NCT03968055, NCT04336969). Previous efforts to deploy TIDE at other institutions continue to face delays. In partnership with the diabetes technology non-profit, Tidepool, we developed Tidepool-TIDE, a clinic-agnostic, turnkey solution available to any clinic in the United States. We present how we overcame common technical and operational barriers specific to scaling digital health technology from one site to many. The concepts described are broadly applicable for institutions interested in facilitating broader adoption of digital technology for population-level management of chronic health conditions.

中文摘要: 数字决策支持和远程患者监测可以改善结果和效率,但很少扩展到单个机构之外。在过去的 5 年里,作为团队合作、目标、技术和严格控制 (4T) 研究 (NCT03968055、NCT04336969) 的一部分,及时干预糖尿病卓越 (TIDE) 平台与减少医护人员的屏幕使用时间以及为异质人群中的 268 名患者改善、公平的 1 型糖尿病护理和结果相关。之前在其他机构部署 TIDE 的努力仍然面临延误。我们与糖尿病技术非营利组织 Tidepool 合作开发了 Tidepool-TIDE,这是一种与诊所无关的交钥匙解决方案,可供美国任何诊所使用。我们介绍了我们如何克服将数字医疗技术从一个站点扩展到多个站点时所特有的常见技术和运营障碍。所描述的概念广泛适用于有兴趣促进更广泛地采用数字技术来对慢性健康状况进行人口层面管理的机构。


72. Bridging organ transcriptomics for advancing multiple organ toxicity assessment with a generative AI approach.

桥接器官转录组学,通过生成人工智能方法推进多器官毒性评估。

PMID: 39501092 | DOI: 10.1038/s41746-024-01317-z | 日期: 2024-11-05

摘要: Translational research in toxicology has significantly benefited from transcriptomic profiling, particularly in drug safety. However, its application has predominantly focused on limited organs, notably the liver, due to resource constraints. This paper presents TransTox, an innovative AI model using a generative adversarial network (GAN) method to facilitate the bidirectional translation of transcriptomic profiles between the liver and kidney under drug treatment. TransTox demonstrates robust performance, validated across independent datasets and laboratories. First, the concordance between real experimental data and synthetic data generated by TransTox was demonstrated in characterizing toxicity mechanisms compared to real experimental settings. Second, TransTox proved valuable in gene expression predictive models, where synthetic data could be used to develop gene expression predictive models or serve as "digital twins" for diagnostic applications. The TransTox approach holds the potential for multi-organ toxicity assessment with AI and advancing the field of precision toxicology.

中文摘要: 毒理学的转化研究极大地受益于转录组学分析,特别是在药物安全方面。然而,由于资源限制,其应用主要集中在有限的器官,特别是肝脏。本文介绍了 TransTox,这是一种创新的人工智能模型,使用生成对抗网络 (GAN) 方法来促进药物治疗下肝脏和肾脏之间转录组图谱的双向翻译。 TransTox 展示了强大的性能,并经过独立数据集和实验室的验证。首先,与真实实验设置相比,在表征毒性机制方面证明了真实实验数据和 TransTox 生成的合成数据之间的一致性。其次,TransTox 被证明在基因表达预测模型中很有价值,其中合成数据可用于开发基因表达预测模型或作为诊断应用的"数字双胞胎"。 TransTox 方法具有利用人工智能进行多器官毒性评估和推进精准毒理学领域的潜力。


73. Deep learning based highly accurate transplanted bioengineered corneal equivalent thickness measurement using optical coherence tomography.

使用光学相干断层扫描基于深度学习的高精度移植生物工程角膜等效厚度测量。

PMID: 39501083 | DOI: 10.1038/s41746-024-01305-3 | 日期: 2024-11-05

摘要: Corneal transplantation is the primary treatment for irreversible corneal diseases, but due to limited donor availability, bioengineered corneal equivalents are being developed as a solution, with biocompatibility, structural integrity, and physical function considered key factors. Since conventional evaluation methods may not fully capture the complex properties of the cornea, there is a need for advanced imaging and assessment techniques. In this study, we proposed a deep learning-based automatic segmentation method for transplanted bioengineered corneal equivalents using optical coherence tomography to achieve a highly accurate evaluation of graft integrity and biocompatibility. Our method provides quantitative individual thickness values, detailed maps, and volume measurements of the bioengineered corneal equivalents, and has been validated through 14 days of monitoring. Based on the results, it is expected to have high clinical utility as a quantitative assessment method for human keratoplasties, including automatic opacity area segmentation and implanted graft part extraction, beyond animal studies.

中文摘要: 角膜移植是不可逆角膜疾病的主要治疗方法,但由于供体有限,生物工程角膜等效物正在开发作为解决方案,其中生物相容性、结构完整性和物理功能被视为关键因素。由于传统的评估方法可能无法完全捕捉角膜的复杂特性,因此需要先进的成像和评估技术。在本研究中,我们提出了一种基于深度学习的移植生物工程角膜等效物的自动分割方法,使用光学相干断层扫描来实现移植物完整性和生物相容性的高精度评估。我们的方法提供了生物工程角膜等效物的定量个体厚度值、详细图和体积测量值,并通过 14 天的监测进行了验证。基于这些结果,除了动物研究之外,预计它作为人类角膜移植术的定量评估方法具有很高的临床实用性,包括自动不透明区域分割和植入移植物部分提取。


74. Artificial intelligence applied to coronary artery calcium scans (AI-CAC) significantly improves cardiovascular events prediction.

应用于冠状动脉钙化扫描(AI-CAC)的人工智能显著改善了心血管事件的预测。

PMID: 39501071 | DOI: 10.1038/s41746-024-01308-0 | 日期: 2024-11-05

摘要: Coronary artery calcium (CAC) scans contain valuable information beyond the Agatston Score, which is currently reported for predicting coronary heart disease (CHD) only. We examined whether new artificial intelligence (AI) applied to CAC scans can predict non-CHD events, including heart failure, atrial fibrillation, and stroke. We applied AI-enabled automated cardiac chambers volumetry and calcified plaque characterization to CAC scans (AI-CAC) of 5830 asymptomatic individuals (52.2% women, age 61.7 ± 10.2 years) in the Multi-Ethnic Study of Atherosclerosis. During 15 years of follow-up, 1773 CVD events accrued. The AUC at 1-, 5-, 10-, and 15-year follow-up for AI-CAC vs. Agatston score was (0.784 vs. 0.701), (0.771 vs. 0.709), (0.789 vs. 0.712) and (0.816 vs. 0.729) (p < 0.0001 for all), respectively. AI-CAC plaque characteristics, including number, location, density, plus number of vessels, significantly improved CHD prediction in the CAC 1-100 cohort vs. Agatston Score. AI-CAC significantly improved on the Agatston score for predicting all CVD events.

中文摘要: 冠状动脉钙化 (CAC) 扫描包含 Agatston 评分之外的有价值信息,而目前报告的 Agatston 评分仅用于预测冠心病 (CHD)。我们研究了应用于 CAC 扫描的新人工智能 (AI) 是否可以预测非 CHD 事件,包括心力衰竭、心房颤动和中风。我们对动脉粥样硬化多种族研究中 5830 名无症状个体(52.2% 女性,年龄 61.7±10.2 岁)的 CAC 扫描应用了人工智能支持的自动心腔容积测定和钙化斑块表征 (AI-CAC)。在 15 年的随访期间,共发生 1773 起 CVD 事件。 AI-CAC 与 Agatston 评分在 1 年、5 年、10 年和 15 年随访时的 AUC 分别为(0.784 vs. 0.701)、(0.771 vs. 0.709)、(0.789 vs. 0.712)和(0.816 vs. 0.729)(所有 p < 0.0001)。与 Agatston 评分相比,AI-CAC 斑块特征(包括数量、位置、密度以及血管数量)显著改善了 CAC 1-100 队列中的 CHD 预测。在预测所有 CVD 事件方面,AI-CAC 显著优于 Agatston 评分。


75. The R.O.A.D. to precision medicine.

通往精准医疗之路(R.O.A.D.)。

PMID: 39489814 | DOI: 10.1038/s41746-024-01291-6 | 日期: 2024-11-03

摘要: We propose a novel framework that addresses the deficiencies of Randomized clinical trial data subgroup analysis while it transforms ObservAtional Data to be used as if they were randomized, thus paving the road for precision medicine. Our approach counters the effects of unobserved confounding in observational data through a two-step process that adjusts predicted outcomes under treatment. These adjusted predictions train decision trees, optimizing treatment assignments for patient subgroups based on their characteristics, enabling intuitive treatment recommendations. Implementing this framework on gastrointestinal stromal tumors (GIST) data, including genetic sub-cohorts, showed that our tree recommendations outperformed current guidelines in an external cohort. Furthermore, we extended the application of this framework to RCT data from patients with extremity sarcomas. Despite initial trial indications of universal treatment necessity, our framework identified a subset of patients who may not require treatment. Once again, we successfully validated our recommendations in an external cohort.

中文摘要: 我们提出了一种新颖的框架,既能解决随机临床试验数据亚组分析的缺陷,又能转换观察性数据,使其可以像随机化数据一样使用,从而为精准医疗铺平道路。我们的方法通过调整治疗下预测结果的两步过程来抵消观察数据中未观察到的混杂因素的影响。这些调整后的预测用于训练决策树,根据患者亚组的特征优化治疗分配,从而实现直观的治疗建议。在胃肠道间质瘤 (GIST) 数据(包括遗传子队列)上实施此框架表明,我们的决策树建议在外部队列中优于当前指南。此外,我们将该框架的应用扩展到四肢肉瘤患者的随机对照试验数据。尽管试验最初提示所有患者均需治疗,但我们的框架确定了一部分可能不需要治疗的患者。我们再次在外部队列中成功验证了我们的建议。


76. Predicting deterioration in dengue using a low cost wearable for continuous clinical monitoring.

使用低成本可穿戴设备进行连续临床监测来预测登革热恶化情况。

PMID: 39488652 | DOI: 10.1038/s41746-024-01304-4 | 日期: 2024-11-02

摘要: Close vital signs monitoring is crucial for the clinical management of patients with dengue. We investigated performance of a non-invasive wearable utilising photoplethysmography (PPG), to provide real-time risk prediction in hospitalised individuals. We performed a prospective observational clinical study in Vietnam between January 2020 and October 2022: 153 patients were included in analyses, providing 1353 h of PPG data. Using a multi-modal transformer approach, 10-min PPG waveform segments and basic clinical data (age, sex, clinical features on admission) were used as features to continuously forecast clinical state 2 h ahead. Prediction of low-risk states (17,939/80,843; 22.1%), defined by NEWS2 and mSOFA < 6, was associated with an area under the precision-recall curve of 0.67 and an area under the receiver operator curve of 0.83. Implementation of such interventions could provide cost-effective triage and clinical care in dengue, offering opportunities for safe ambulatory patient management.

中文摘要: 密切监测生命体征对于登革热患者的临床管理至关重要。我们研究了一种采用光电容积描记法 (PPG) 的无创可穿戴设备在为住院患者提供实时风险预测方面的性能。我们于 2020 年 1 月至 2022 年 10 月期间在越南进行了一项前瞻性观察性临床研究:分析纳入了 153 名患者,提供了 1353 小时的 PPG 数据。采用多模态 Transformer 方法,以 10 分钟 PPG 波形片段和基本临床数据(年龄、性别、入院时的临床特征)作为特征,连续预测 2 小时后的临床状态。对由 NEWS2 和 mSOFA < 6 定义的低风险状态 (17,939/80,843; 22.1%) 的预测,精确率-召回率曲线下面积为 0.67,受试者工作特征曲线下面积为 0.83。实施此类干预措施可以为登革热提供具有成本效益的分诊和临床护理,为安全的门诊患者管理提供机会。
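
When reading the precision-recall figure in this abstract, it helps to recall that a classifier with no skill has an expected area under the precision-recall curve equal to the prevalence of the positive class. The sketch below uses only the counts and the AUPRC quoted in the abstract; the function name is illustrative and does not come from the paper.

```python
def positive_prevalence(n_positive: int, n_total: int) -> float:
    """Fraction of samples in the positive class (here: low-risk states)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_positive / n_total


# Counts quoted in the abstract: 17,939 low-risk states out of 80,843.
baseline_auprc = positive_prevalence(17_939, 80_843)

# AUPRC value reported in the abstract.
reported_auprc = 0.67

# Ratio of the reported AUPRC to the no-skill baseline.
lift_over_baseline = reported_auprc / baseline_auprc

print(f"no-skill AUPRC ~ {baseline_auprc:.2f}, lift ~ {lift_over_baseline:.1f}x")
```

The ratio is only a rough reading aid; it says nothing about calibration or about performance at any particular decision threshold.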


77. A digital, decentralized trial of exercise therapy in patients with cancer.

对癌症患者进行数字化、去中心化的运动疗法试验。

PMID: 39468290 | DOI: 10.1038/s41746-024-01288-1 | 日期: 2024-10-28

摘要: We developed and evaluated the Digital Platform for Exercise (DPEx): a decentralized, patient-centric approach designed to enhance all aspects of clinical investigation of exercise therapy. DPEx integrated provision of a treadmill with telemedicine and remote biospecimen collection permitting all study procedures to be conducted in patient's homes. Linked health biodevices enabled high-resolution monitoring of lifestyle and physiological response. Here we describe the rationale and development of DPEx as well as feasibility evaluation in three different cohorts of patients with cancer: a phase 0a development study among three women with post-treatment primary breast cancer; a phase 0b proof-of-concept trial of neoadjuvant exercise therapy in 13 patients with untreated solid tumors; and a phase 1a level-finding trial of neoadjuvant exercise therapy in 53 men with localized prostate cancer. Collectively, our study demonstrates the utility of a fully digital, decentralized approach to conduct clinical trials of exercise therapy in a clinical population.

中文摘要: 我们开发并评估了运动数字平台 (DPEx):一种去中心化、以患者为中心的方法,旨在改进运动疗法临床研究的各个环节。DPEx 将跑步机的提供与远程医疗和远程生物样本采集相结合,使所有研究程序都可以在患者家中进行。互联的健康生物设备能够对生活方式和生理反应进行高分辨率监测。本文介绍了 DPEx 的基本原理和开发过程,以及在三个不同癌症患者队列中的可行性评估:针对三名治疗后原发性乳腺癌女性的 0a 期开发研究;对 13 名未经治疗的实体瘤患者进行新辅助运动疗法的 0b 期概念验证试验;以及对 53 名局限性前列腺癌男性进行新辅助运动疗法的 1a 期水平探索试验。总的来说,我们的研究证明了完全数字化、去中心化的方法在临床人群中开展运动疗法临床试验的实用性。


78. PRISM: Patient Records Interpretation for Semantic clinical trial Matching system using large language models.

PRISM:使用大型语言模型的语义临床试验匹配系统的患者记录解释。

PMID: 39468259 | DOI: 10.1038/s41746-024-01274-7 | 日期: 2024-10-28

摘要: Clinical trial matching is the task of identifying trials for which patients may be eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process also results in many patients missing out on potential therapeutic options. Recent advancements in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, the current approaches are confined to constrained, often synthetic, datasets that do not adequately mirror the complexities encountered in real-world medical data. In this study, we present an end-to-end large-scale empirical evaluation of a clinical trial matching system and validate it using real-world EHRs. We perform comprehensive experiments with proprietary LLMs and our custom fine-tuned model called OncoLLM and show that OncoLLM outperforms GPT-3.5 and matches the performance of qualified medical doctors for clinical trial matching.

中文摘要: 临床试验匹配是指找出患者可能有资格参加的试验的任务。通常,这项任务是劳动密集型的,需要对照临床试验严格的纳入和排除标准,对患者电子健康记录 (EHR) 进行详细核查。这一过程还导致许多患者错过了潜在的治疗选择。大型语言模型 (LLM) 的最新进展使得自动化患者-试验匹配成为可能,多项同期研究已表明了这一点。然而,当前的方法局限于受限的、通常是合成的数据集,这些数据集不能充分反映真实世界医疗数据的复杂性。在这项研究中,我们对临床试验匹配系统进行了端到端的大规模实证评估,并使用真实世界的 EHR 对其进行了验证。我们使用专有 LLM 和名为 OncoLLM 的定制微调模型进行了全面的实验,结果表明 OncoLLM 的性能优于 GPT-3.5,并且在临床试验匹配上达到了合格医生的水平。


79. Clinically applicable optimized periprosthetic joint infection diagnosis via AI based pathology.

通过基于人工智能的病理学进行临床适用的优化假体周围关节感染诊断。

PMID: 39462052 | DOI: 10.1038/s41746-024-01301-7 | 日期: 2024-10-26

摘要: Periprosthetic joint infection (PJI) is a severe complication after joint replacement surgery that demands precise diagnosis for effective treatment. We enhanced PJI diagnostic accuracy through three steps: (1) developing a self-supervised PJI model with DINO v2 to create a large dataset; (2) comparing multiple intelligent models to identify the best one; and (3) using the optimal model for visual analysis to refine diagnostic practices. The self-supervised model generated 27,724 training samples and achieved a perfect AUC of 1, indicating flawless case differentiation. EfficientNet v2-S outperformed CAMEL2 at the image level, while CAMEL2 was superior at the patient level. By using the weakly supervised PJI model to adjust diagnostic criteria, we reduced the required high-power field diagnoses per slide from five to three. These findings demonstrate AI's potential to improve the accuracy and standardization of PJI pathology and have significant implications for infectious disease diagnostics.

中文摘要: 假体周围关节感染 (PJI) 是关节置换手术后的严重并发症,需要精确诊断才能有效治疗。我们通过三个步骤提高了 PJI 诊断的准确性:(1) 使用 DINO v2 开发自监督 PJI 模型以创建大型数据集;(2) 比较多个智能模型,找出最佳模型;(3) 使用最佳模型进行可视化分析,以完善诊断实践。自监督模型生成了 27,724 个训练样本,AUC 达到 1,表明其能够完全区分病例。EfficientNet v2-S 在图像级别上优于 CAMEL2,而 CAMEL2 在患者级别上更胜一筹。通过使用弱监督 PJI 模型调整诊断标准,我们将每张切片所需的高倍视野诊断数从 5 个减少到 3 个。这些发现表明人工智能有潜力提高 PJI 病理诊断的准确性和标准化程度,并对感染性疾病的诊断具有重要意义。


80. Public evidence on AI products for digital pathology.

有关数字病理学人工智能产品的公开证据。

PMID: 39455883 | DOI: 10.1038/s41746-024-01294-3 | 日期: 2024-10-25

摘要: Novel products applying artificial intelligence (AI)-based methods to digital pathology images are touted to have many uses and benefits. However, publicly available information for products can be variable, with few sources of independent evidence. This review aimed to identify public evidence for AI-based products for digital pathology. Key features of products on the European Economic Area/Great Britain (EEA/GB) markets were examined, including their regulatory approval, intended use, and published validation studies. There were 26 AI-based products that met the inclusion criteria and, of these, 24 had received regulatory approval via the self-certification route as General in vitro diagnostic (IVD) medical devices. Only 10 of the products (38%) had peer-reviewed internal validation studies and 11 products (42%) had peer-reviewed external validation studies. To support transparency an online register was developed using identified public evidence ( https://osf.io/gb84r/ ), which we anticipate will provide an accessible resource on novel devices and support decision making.

中文摘要: 将基于人工智能 (AI) 的方法应用于数字病理图像的新产品被宣传为具有许多用途和益处。然而,公开可得的产品信息参差不齐,独立证据来源很少。本综述旨在梳理数字病理学 AI 产品的公开证据。我们考察了欧洲经济区/英国 (EEA/GB) 市场上产品的主要特征,包括其监管批准、预期用途和已发表的验证研究。共有 26 种基于 AI 的产品符合纳入标准,其中 24 种已通过自我认证途径作为通用体外诊断 (IVD) 医疗器械获得监管批准。只有 10 种产品 (38%) 有经同行评审的内部验证研究,11 种产品 (42%) 有经同行评审的外部验证研究。为了提高透明度,我们利用已识别的公开证据建立了一个在线登记册 (https://osf.io/gb84r/),预计它将成为了解新型器械的便捷资源,并为决策提供支持。


81. Transmission line model as a digital twin for abdominal aortic aneurysm patients.

传输线模型作为腹主动脉瘤患者的数字孪生。

PMID: 39455823 | DOI: 10.1038/s41746-024-01303-5 | 日期: 2024-10-25

摘要: We investigated the potential of the transmission line model as a digital twin of aneurysmal aorta by comparatively analyzing how a uniform lossless tube-load model was fitted to the carotid and femoral artery tonometry waveforms pertaining to (i) 79 abdominal aortic aneurysm (AAA) patients vs their matched controls (CON) and (ii) 35 AAA patients before vs after endovascular aneurysm repair (EVAR). The uniform lossless tube-load model fitted the tonometry waveforms pertaining to AAA as well as CON and EVAR. In addition, the parameters in the tube-load model exhibited physiologically explainable changes: when normalized, both pulse transit time and reflection coefficient increased with AAA and decreased after EVAR, which can be explained by the increase in arterial compliance and the decrease in arterial inertance due to the aortic expansion associated with AAA. In sum, the tube-load model may have the potential as a digital twin to enable personalized AAA monitoring.

中文摘要: 我们通过比较分析均匀无损管负载模型对颈动脉和股动脉张力测量波形的拟合情况,研究了传输线模型作为动脉瘤主动脉数字孪生的潜力,这些波形来自 (i) 79 名腹主动脉瘤 (AAA) 患者及其匹配对照 (CON),以及 (ii) 35 名 AAA 患者在血管内动脉瘤修复 (EVAR) 之前和之后。均匀无损管负载模型能够拟合 AAA、CON 和 EVAR 的张力测量波形。此外,管负载模型中的参数表现出生理上可解释的变化:归一化后,脉搏传导时间和反射系数均随 AAA 增加,并在 EVAR 后减少,这可以用 AAA 相关的主动脉扩张所导致的动脉顺应性增加和动脉惯性减少来解释。总之,管负载模型可能具有作为数字孪生实现个性化 AAA 监测的潜力。
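
The physiological argument in this abstract can be illustrated with the textbook relations for a uniform lossless transmission line. The abstract itself gives no equations, so the sketch below is an assumption-laden illustration rather than the authors' model: it assumes a purely resistive terminal load, and every parameter value is hypothetical and unitless.

```python
import math


def characteristic_impedance(inertance: float, compliance: float) -> float:
    """Z_c = sqrt(L / C) for a uniform lossless line."""
    return math.sqrt(inertance / compliance)


def pulse_transit_time(inertance: float, compliance: float) -> float:
    """PTT = sqrt(L * C), with L and C the total inertance and compliance."""
    return math.sqrt(inertance * compliance)


def reflection_coefficient(load_resistance: float, z_c: float) -> float:
    """Gamma = (R - Z_c) / (R + Z_c), assuming a purely resistive load."""
    return (load_resistance - z_c) / (load_resistance + z_c)


# Hypothetical values chosen only for illustration (not from the paper).
LOAD_R = 3.0
control = {"inertance": 1.0, "compliance": 1.0}
aneurysm = {"inertance": 0.8, "compliance": 1.5}  # lower L, higher C

for label, params in (("control", control), ("aneurysm", aneurysm)):
    z_c = characteristic_impedance(**params)
    ptt = pulse_transit_time(**params)
    gamma = reflection_coefficient(LOAD_R, z_c)
    print(f"{label}: Z_c={z_c:.3f} PTT={ptt:.3f} Gamma={gamma:.3f}")
```

Under these assumptions the reflection coefficient rises whenever the characteristic impedance falls, while the transit time rises only if the relative gain in compliance outweighs the relative loss in inertance.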


82. Process mining in mHealth data analysis.

移动医疗数据分析中的流程挖掘。

PMID: 39443677 | DOI: 10.1038/s41746-024-01297-0 | 日期: 2024-10-23

摘要: This perspective article explores how process mining can extract clinical insights from mobile health data and complement data-driven techniques like machine learning. Despite technological advances, challenges such as selection bias and the complex dynamics of health data require advanced approaches. Process mining focuses on analyzing temporal process patterns and provides complementary insights into health condition variability. The article highlights the potential of process mining for analyzing mHealth data and beyond.

中文摘要: 这篇观点文章探讨了流程挖掘如何从移动健康数据中提取临床见解,并补充机器学习等数据驱动技术。尽管技术取得了进步,但选择偏倚和健康数据的复杂动态等挑战仍然需要先进的方法。流程挖掘侧重于分析时间过程模式,并为健康状况的变异性提供补充性见解。本文强调了流程挖掘在分析移动健康数据及其他领域的潜力。


83. Medical large language models are susceptible to targeted misinformation attacks.

医学大语言模型容易受到有针对性的错误信息攻击。

PMID: 39443664 | DOI: 10.1038/s41746-024-01282-7 | 日期: 2024-10-23

摘要: Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains, holding promising potential for diverse medical applications in the near future. In this study, we demonstrate a concerning vulnerability of LLMs in medicine. Through targeted manipulation of just 1.1% of the weights of the LLM, we can deliberately inject incorrect biomedical facts. The erroneous information is then propagated in the model's output while maintaining performance on other biomedical tasks. We validate our findings in a set of 1025 incorrect biomedical facts. This peculiar susceptibility raises serious security and trustworthiness concerns for the application of LLMs in healthcare settings. It accentuates the need for robust protective measures, thorough verification mechanisms, and stringent management of access to these models, ensuring their reliable and safe use in medical practice.

中文摘要: 大型语言模型 (LLM) 拥有广泛的医学知识,能够对多个领域的医学信息进行推理,在不久的将来有望用于各种医学应用。在这项研究中,我们揭示了 LLM 在医学领域的一个令人担忧的漏洞。只需有针对性地操纵 LLM 1.1% 的权重,我们就可以故意注入错误的生物医学事实。这些错误信息随后会在模型的输出中传播,而模型在其他生物医学任务上的性能保持不变。我们在一组 1025 个错误的生物医学事实上验证了我们的发现。这种特殊的易感性给 LLM 在医疗保健环境中的应用带来了严重的安全性和可信度问题。它凸显了采取强有力的保护措施、建立全面的验证机制以及严格管理模型访问权限的必要性,以确保这些模型在医疗实践中得到可靠、安全的使用。


84. Clinical usefulness of digital twin guided virtual amiodarone test in patients with atrial fibrillation ablation.

数字孪生引导的虚拟胺碘酮测试在房颤消融患者中的临床实用性。

PMID: 39443659 | DOI: 10.1038/s41746-024-01298-z | 日期: 2024-10-23

摘要: It would be clinically valuable if the efficacy of antiarrhythmic drugs could be simulated in advance. We developed a digital twin to predict amiodarone efficacy in high-risk atrial fibrillation (AF) patients post-ablation. Virtual left atrium models were created from computed tomography and electroanatomical maps to simulate AF and evaluate its response to varying amiodarone concentrations. As the amiodarone concentration increased in the virtual setting, action potential duration lengthened, peak upstroke velocities decreased, and virtual AF termination became more frequent. Patients were classified into effective (those with virtually terminated AF at therapeutic doses) and ineffective groups. The one-year clinical outcomes after AF ablation showed significantly better results in the effective group compared to the ineffective group, with AF recurrence rates of 20.8% vs. 45.1% (log-rank p = 0.031, adjusted hazard ratio, 0.37 [0.14-0.98]; p = 0.046). This study highlights the potential of a digital twin-guided approach in predicting amiodarone's effectiveness and improving personalized AF management. Clinical Trial Registration Name: The Evaluation for Prognostic Factors After Catheter Ablation of Atrial Fibrillation: Cohort Study, Registration number: NCT02138695. The date of registration: 2014-05. URL: https://www.clinicaltrials.gov ; Unique identifier: NCT02138695.

中文摘要: 如果能够提前模拟抗心律失常药物的疗效,将具有临床价值。我们开发了一个数字孪生来预测胺碘酮对消融术后高危心房颤动 (AF) 患者的疗效。根据计算机断层扫描和电解剖图创建虚拟左心房模型,以模拟 AF 并评估其对不同胺碘酮浓度的反应。随着虚拟环境中胺碘酮浓度的增加,动作电位时程延长,峰值上升速度降低,虚拟 AF 终止变得更加频繁。患者被分为有效组(在治疗剂量下 AF 在虚拟模拟中终止的患者)和无效组。AF 消融后一年的临床结局显示,有效组明显优于无效组,AF 复发率分别为 20.8% 和 45.1%(对数秩 p = 0.031;调整后的风险比 0.37 [0.14-0.98],p = 0.046)。这项研究强调了数字孪生引导方法在预测胺碘酮有效性和改善个性化 AF 管理方面的潜力。临床试验注册名称:心房颤动导管消融术后预后因素的评估:队列研究,注册号:NCT02138695。注册日期:2014 年 5 月。网址:https://www.clinicaltrials.gov;唯一标识符:NCT02138695。


85. Development and assessment of a machine learning tool for predicting emergency admission in Scotland.

开发和评估用于预测苏格兰紧急入院的机器学习工具。

PMID: 39443624 | DOI: 10.1038/s41746-024-01250-1 | 日期: 2024-10-23

摘要: Emergency admissions (EA), where a patient requires urgent in-hospital care, are a major challenge for healthcare systems. The development of risk prediction models can partly alleviate this problem by supporting primary care interventions and public health planning. Here, we introduce SPARRAv4, a predictive score for EA risk that will be deployed nationwide in Scotland. SPARRAv4 was derived using supervised and unsupervised machine-learning methods applied to routinely collected electronic health records from approximately 4.8M Scottish residents (2013-18). We demonstrate improvements in discrimination and calibration with respect to previous scores deployed in Scotland, as well as stability over a 3-year timeframe. Our analysis also provides insights about the epidemiology of EA risk in Scotland, by studying predictive performance across different population sub-groups and reasons for admission, as well as by quantifying the effect of individual input features. Finally, we discuss broader challenges including reproducibility and how to safely update risk prediction models that are already deployed at population level.

中文摘要: 紧急入院 (EA) 是指患者需要紧急住院治疗的情况,是医疗保健系统面临的一项重大挑战。开发风险预测模型可以支持初级保健干预措施和公共卫生规划,从而部分缓解这一问题。在这里,我们介绍 SPARRAv4,这是一种将在苏格兰全国范围内部署的 EA 风险预测评分。SPARRAv4 是将监督和非监督机器学习方法应用于约 480 万苏格兰居民的常规收集的电子健康记录 (2013-18) 而得到的。我们展示了它相对于此前在苏格兰部署的评分在区分度和校准度方面的改进,以及在 3 年时间范围内的稳定性。我们的分析还研究了不同人群亚组和不同入院原因下的预测表现,并量化了各个输入特征的影响,从而提供了有关苏格兰 EA 风险流行病学的见解。最后,我们讨论了更广泛的挑战,包括可重复性,以及如何安全地更新已在人群层面部署的风险预测模型。


86. Prediction of total and regional body composition from 3D body shape.

根据 3D 体型预测总体和局部身体成分。

PMID: 39443585 | DOI: 10.1038/s41746-024-01289-0 | 日期: 2024-10-23

摘要: Accurate assessment of body composition is essential for evaluating the risk of chronic disease. 3D body shape, obtainable using smartphones, correlates strongly with body composition. We present a novel method that fits a 3D body mesh to a dual-energy X-ray absorptiometry (DXA) silhouette (emulating a single photograph) paired with anthropometric traits, and apply it to the multi-phase Fenland study comprising 12,435 adults. Using baseline data, we derive models predicting total and regional body composition metrics from these meshes. In Fenland follow-up data, all metrics were predicted with high correlations (r > 0.86). We also evaluate a smartphone app which reconstructs a 3D mesh from phone images to predict body composition metrics; this analysis also showed strong correlations (r > 0.84) for all metrics. The 3D body shape approach is a valid alternative to medical imaging that could offer accessible health parameters for monitoring the efficacy of lifestyle intervention programmes.

中文摘要: 准确评估身体成分对于评估慢性病风险至关重要。可通过智能手机获得的 3D 体型与身体成分密切相关。我们提出了一种新颖的方法,将 3D 身体网格拟合到双能 X 射线吸收法 (DXA) 轮廓(模拟单张照片)及与之配对的人体测量特征上,并将其应用于包含 12,435 名成年人的多阶段 Fenland 研究。利用基线数据,我们建立了根据这些网格预测总体和局部身体成分指标的模型。在 Fenland 随访数据中,所有指标的预测值都具有高度相关性 (r > 0.86)。我们还评估了一款智能手机应用程序,该应用程序根据手机图像重建 3D 网格以预测身体成分指标;该分析同样显示所有指标都具有很强的相关性 (r > 0.84)。3D 体型方法是医学成像的有效替代方案,可以提供易于获取的健康参数,用于监测生活方式干预计划的效果。


87. Detecting clinical medication errors with AI enabled wearable cameras.

使用支持人工智能的可穿戴摄像头检测临床用药错误。

PMID: 39438764 | DOI: 10.1038/s41746-024-01295-2 | 日期: 2024-10-22

摘要: Drug-related errors are a leading cause of preventable patient harm in the clinical setting. We present the first wearable camera system to automatically detect potential errors, prior to medication delivery. We demonstrate that using deep learning algorithms, our system can detect and classify drug labels on syringes and vials in drug preparation events recorded in real-world operating rooms. We created a first-of-its-kind large-scale video dataset from head-mounted cameras comprising 4K footage across 13 anesthesiology providers, 2 hospitals and 17 operating rooms over 55 days. The system was evaluated on 418 drug draw events in routine patient care and a controlled environment and achieved 99.6% sensitivity and 98.8% specificity at detecting vial swap errors. These results suggest that our wearable camera system has the potential to provide a secondary check when a medication is selected for a patient, and a chance to intervene before a potential medical error.

中文摘要: 药物相关错误是临床环境中可预防的患者伤害的主要原因。我们推出了首个可在给药之前自动检测潜在错误的可穿戴摄像头系统。我们证明,借助深度学习算法,我们的系统可以在真实手术室记录的药物配制事件中检测并分类注射器和药瓶上的药物标签。我们利用头戴式摄像机创建了首个同类大规模视频数据集,其中包含 55 天内来自 13 名麻醉医务人员、2 家医院和 17 间手术室的 4K 影像。该系统在常规患者护理和受控环境中的 418 次抽药事件上进行了评估,在检测药瓶调换错误方面实现了 99.6% 的灵敏度和 98.8% 的特异性。这些结果表明,我们的可穿戴摄像头系统有可能在为患者选择药物时提供二次核查,并有机会在潜在的医疗错误发生之前进行干预。


88. Evaluation and mitigation of cognitive biases in medical language models.

医学语言模型中认知偏差的评估和缓解。

PMID: 39433945 | DOI: 10.1038/s41746-024-01283-6 | 日期: 2024-10-21

摘要: Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient-doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions as compared to unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs' robustness to cognitive biases, in order to achieve more reliable applications of LLMs in healthcare.

中文摘要: 人们对将大型语言模型 (LLM) 应用于医学的兴趣日益浓厚,部分原因在于它们在医学考试题目上的出色表现。然而,由于患者依从性、经验和认知偏差等因素,这些考试无法反映真实医患互动的复杂性。我们假设,与无偏见的问题相比,LLM 在面对带有临床偏见的问题时会给出准确性更低的回答。为了检验这一点,我们开发了 BiasMedQA 数据集,其中包含 1273 道 USMLE 试题,这些试题经过修改以复现常见的临床相关认知偏差。我们在 BiasMedQA 上评估了 6 个 LLM,发现 GPT-4 因其对偏差的抵抗力而脱颖而出,而 Llama 2 70B-chat 和 PMC Llama 13B 的性能则大幅下降。此外,我们引入了三种偏差缓解策略,这些策略提高了准确性,但未能使其完全恢复。我们的研究结果强调需要提高 LLM 对认知偏差的鲁棒性,以便在医疗保健领域实现更可靠的 LLM 应用。


89. Finding Long-COVID: temporal topic modeling of electronic health records from the N3C and RECOVER programs.

寻找 Long-COVID:来自 N3C 和 RECOVER 计划的电子健康记录的时间主题建模。

PMID: 39433942 | DOI: 10.1038/s41746-024-01286-3 | 日期: 2024-10-21

摘要: Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.

中文摘要: SARS-CoV-2 感染急性期后遗症 (PASC),也称为长新冠 (Long-COVID),包括 COVID-19 感染后各种复杂多样的结局,目前人们对此仍知之甚少。我们对国家新冠队列协作组织 (N3C) 提供的 1400 万患者的 6 亿多条疾病诊断进行了聚类,生成了数百种高度详细的临床表型。利用这些聚类评估患者的临床轨迹,使我们能够识别出在急性感染后显著增加的个体疾病和表型。我们发现与对照组相比,COVID-19 患者的许多疾病有所增加;此外,借助一种将患者与聚类随时间关联起来的新方法,我们还发现了与患者性别、年龄、感染波次和 PASC 诊断状态相关的特异性表型。虽然其中许多结果反映了已知的 PASC 症状,但这种前所未有的数据规模所提供的分辨率,为改进这种多方面疾病的诊断和机制理解指明了途径。


90. Early detection of dementia through retinal imaging and trustworthy AI.

通过视网膜成像和值得信赖的人工智能及早发现痴呆症。

PMID: 39428420 | DOI: 10.1038/s41746-024-01292-5 | 日期: 2024-10-20

摘要: Alzheimer's disease (AD) is a global healthcare challenge lacking a simple and affordable detection method. We propose a novel deep learning framework, Eye-AD, to detect Early-onset Alzheimer's Disease (EOAD) and Mild Cognitive Impairment (MCI) using OCTA images of retinal microvasculature and choriocapillaris. Eye-AD employs a multilevel graph representation to analyze intra- and inter-instance relationships in retinal layers. Using 5751 OCTA images from 1671 participants in a multi-center study, our model demonstrated superior performance in EOAD (internal data: AUC = 0.9355, external data: AUC = 0.9007) and MCI detection (internal data: AUC = 0.8630, external data: AUC = 0.8037). Furthermore, we explored the associations between retinal structural biomarkers in OCTA images and EOAD/MCI, and the results align well with the conclusions drawn from our deep learning interpretability analysis. Our findings provide further evidence that retinal OCTA imaging, coupled with artificial intelligence, will serve as a rapid, noninvasive, and affordable dementia detection.

中文摘要: 阿尔茨海默病 (AD) 是一项全球性的医疗保健挑战,目前缺乏简单且负担得起的检测方法。我们提出了一种新颖的深度学习框架 Eye-AD,利用视网膜微血管和脉络膜毛细血管的 OCTA 图像来检测早发性阿尔茨海默病 (EOAD) 和轻度认知障碍 (MCI)。Eye-AD 采用多级图表示来分析视网膜各层中的实例内和实例间关系。使用一项多中心研究中 1671 名参与者的 5751 幅 OCTA 图像,我们的模型在 EOAD 检测(内部数据:AUC = 0.9355,外部数据:AUC = 0.9007)和 MCI 检测(内部数据:AUC = 0.8630,外部数据:AUC = 0.8037)中均表现出优异的性能。此外,我们探索了 OCTA 图像中的视网膜结构生物标志物与 EOAD/MCI 之间的关联,结果与我们的深度学习可解释性分析得出的结论非常吻合。我们的研究结果进一步证明,视网膜 OCTA 成像与人工智能相结合,将成为一种快速、无创且经济实惠的痴呆症检测方法。


91. Privacy enhancing and generalizable deep learning with synthetic data for mediastinal neoplasm diagnosis.

利用合成数据进行隐私增强和泛化深度学习,用于纵隔肿瘤诊断。

PMID: 39427092 | DOI: 10.1038/s41746-024-01290-7 | 日期: 2024-10-20

摘要: The success of deep learning (DL) relies heavily on training data from which DL models encapsulate information. Consequently, the development and deployment of DL models expose data to potential privacy breaches, which are particularly critical in data-sensitive contexts like medicine. We propose a new technique named DiffGuard that generates realistic and diverse synthetic medical images with annotations, even indistinguishable for experts, to replace real data for DL model training, which cuts off their direct connection and enhances privacy safety. We demonstrate that DiffGuard enhances privacy safety with much less data leakage and better resistance against privacy attacks on data and model. It also improves the accuracy and generalizability of DL models for segmentation and classification of mediastinal neoplasms in multi-center evaluation. We expect that our solution would enlighten the road to privacy-preserving DL for precision medicine, promote data and model sharing, and inspire more innovation on artificial-intelligence-generated-content technologies for medicine.

中文摘要: 深度学习 (DL) 的成功在很大程度上依赖于训练数据,DL 模型正是从中提取并封装信息。因此,DL 模型的开发和部署会使数据面临潜在的隐私泄露风险,这在医学等数据敏感的场景中尤为关键。我们提出了一种名为 DiffGuard 的新技术,它可以生成逼真且多样化的带标注合成医学图像(连专家也难以分辨),用以取代真实数据进行 DL 模型训练,从而切断模型与真实数据的直接联系并增强隐私安全。我们证明,DiffGuard 能够大幅减少数据泄露,并更好地抵御针对数据和模型的隐私攻击,从而增强隐私安全。在多中心评估中,它还提高了用于纵隔肿瘤分割和分类的 DL 模型的准确性和泛化能力。我们期望该解决方案能够为精准医疗中保护隐私的 DL 指明道路,促进数据和模型共享,并激发医学领域人工智能生成内容技术的更多创新。


92. Feasibility of snapshot testing using wearable sensors to detect cardiorespiratory illness (COVID infection in India).

使用可穿戴传感器进行快照测试来检测心肺疾病(印度的新冠肺炎感染)的可行性。

PMID: 39427067 | DOI: 10.1038/s41746-024-01287-2 | 日期: 2024-10-19

摘要: The COVID-19 pandemic has challenged the current paradigm of clinical and community-based disease detection. We present a multimodal wearable sensor system paired with a two-minute, movement-based activity sequence that successfully captures a snapshot of physiological data (including cardiac, respiratory, temperature, and percent oxygen saturation). We conducted a large, multi-site trial of this technology across India from June 2021 to April 2022 amidst the COVID-19 pandemic (Clinical trial registry name: International Validation of Wearable Sensor to Monitor COVID-19 Like Signs and Symptoms; NCT05334680; initial release: 04/15/2022). An Extreme Gradient Boosting algorithm was trained to discriminate between COVID-19 infected individuals (n = 295) and COVID-19 negative healthy controls (n = 172) and achieved an F1-Score of 0.80 (95% CI = [0.79, 0.81]). SHAP values were mapped to visualize feature importance and directionality, yielding engineered features from core temperature, cough, and lung sounds as highly important. The results demonstrated potential for data-driven wearable sensor technology for remote preliminary screening, highlighting a fundamental pivot from continuous to snapshot monitoring of cardiorespiratory illnesses.

中文摘要: COVID-19 大流行对当前临床和社区疾病检测的范式提出了挑战。我们提出了一种多模态可穿戴传感器系统,配合一套两分钟的基于运动的活动序列,可成功采集生理数据快照(包括心脏、呼吸、体温和血氧饱和度百分比)。2021 年 6 月至 2022 年 4 月,在 COVID-19 大流行期间,我们在印度各地对该技术进行了大规模多中心试验(临床试验注册名称:用于监测 COVID-19 样体征和症状的可穿戴传感器的国际验证;NCT05334680;首次发布:2022 年 4 月 15 日)。我们训练了极限梯度提升算法来区分 COVID-19 感染者 (n = 295) 和 COVID-19 阴性健康对照 (n = 172),F1 分数达到 0.80 (95% CI = [0.79, 0.81])。通过绘制 SHAP 值来可视化特征重要性和方向性,结果显示源自核心体温、咳嗽和肺音的工程特征非常重要。结果证明了数据驱动的可穿戴传感器技术在远程初步筛查方面的潜力,凸显了心肺疾病监测从连续监测到快照监测的根本性转变。
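
The F1-score quoted in this abstract is the harmonic mean of precision and recall. As a reading aid, the sketch below computes the F1-score of a trivial rule that labels every participant as infected, using the cohort sizes from the abstract. It assumes the evaluation data had the same class mix as the reported cohort, which the abstract does not state; the function and variable names are illustrative.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        raise ValueError("F1 is undefined when TP, FP and FN are all zero")
    return 2 * true_pos / denominator


# Cohort sizes quoted in the abstract.
N_INFECTED = 295
N_CONTROLS = 172

# Reference point: predict "infected" for everyone, so every control
# becomes a false positive and there are no false negatives.
always_positive_f1 = f1_score(
    true_pos=N_INFECTED, false_pos=N_CONTROLS, false_neg=0
)

print(f"always-positive F1 = {always_positive_f1:.3f}")
```

Because F1 ignores true negatives, it is sensitive to class balance, so a reference point of this kind is useful context when comparing scores across studies.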


93. Biologically informed deep neural networks provide quantitative assessment of intratumoral heterogeneity in post treatment glioblastoma.

融合生物学知识的深度神经网络可对治疗后胶质母细胞瘤的瘤内异质性进行定量评估。

PMID: 39427044 | DOI: 10.1038/s41746-024-01277-4 | 日期: 2024-10-19

摘要: Intratumoral heterogeneity poses a significant challenge to the diagnosis and treatment of recurrent glioblastoma. This study addresses the need for non-invasive approaches to map heterogeneous landscape of histopathological alterations throughout the entire lesion for each patient. We developed BioNet, a biologically-informed neural network, to predict regional distributions of two primary tissue-specific gene modules: proliferating tumor (Pro) and reactive/inflammatory cells (Inf). BioNet significantly outperforms existing methods (p < 2e-26). In cross-validation, BioNet achieved AUCs of 0.80 (Pro) and 0.81 (Inf), with accuracies of 80% and 75%, respectively. In blind tests, BioNet achieved AUCs of 0.80 (Pro) and 0.76 (Inf), with accuracies of 81% and 74%. Competing methods had AUCs lower or around 0.6 and accuracies lower or around 70%. BioNet's voxel-level prediction maps reveal intratumoral heterogeneity, potentially improving biopsy targeting and treatment evaluation. This non-invasive approach facilitates regular monitoring and timely therapeutic adjustments, highlighting the role of ML in precision medicine.

中文摘要: 瘤内异质性对复发性胶质母细胞瘤的诊断和治疗提出了重大挑战。本研究针对的需求是:用非侵入性方法为每位患者绘制整个病灶内组织病理学改变的异质性图谱。我们开发了 BioNet,一种融合生物学知识的神经网络,用于预测两个主要组织特异性基因模块的区域分布:增殖性肿瘤 (Pro) 和反应性/炎症细胞 (Inf)。BioNet 显著优于现有方法 (p < 2e-26)。在交叉验证中,BioNet 的 AUC 为 0.80 (Pro) 和 0.81 (Inf),准确率分别为 80% 和 75%。在盲测中,BioNet 的 AUC 为 0.80 (Pro) 和 0.76 (Inf),准确率分别为 81% 和 74%。对比方法的 AUC 约为或低于 0.6,准确率约为或低于 70%。BioNet 的体素级预测图揭示了瘤内异质性,有望改善活检靶向和治疗评估。这种非侵入性方法有利于定期监测和及时调整治疗,凸显了机器学习在精准医疗中的作用。


94. Guidance for unbiased predictive information for healthcare decision-making and equity (GUIDE): considerations when race may be a prognostic factor.

医疗保健决策和公平的无偏见预测信息指南 (GUIDE):种族可能成为预后因素时的考虑因素。

PMID: 39427028 | DOI: 10.1038/s41746-024-01245-y | 日期: 2024-10-19

摘要: Clinical prediction models (CPMs) are tools that compute the risk of an outcome given a set of patient characteristics and are routinely used to inform patients, guide treatment decision-making, and resource allocation. Although much hope has been placed on CPMs to mitigate human biases, CPMs may potentially contribute to racial disparities in decision-making and resource allocation. While some policymakers, professional organizations, and scholars have called for eliminating race as a variable from CPMs, others raise concerns that excluding race may exacerbate healthcare disparities and this controversy remains unresolved. The Guidance for Unbiased predictive Information for healthcare Decision-making and Equity (GUIDE) provides expert guidelines for model developers and health system administrators on the transparent use of race in CPMs and mitigation of algorithmic bias across contexts developed through a 5-round, modified Delphi process from a diverse 14-person technical expert panel (TEP). Deliberations affirmed that race is a social construct and that the goals of prediction are distinct from those of causal inference, and emphasized: the importance of decisional context (e.g., shared decision-making versus healthcare rationing); the conflicting nature of different anti-discrimination principles (e.g., anticlassification versus antisubordination principles); and the importance of identifying and balancing trade-offs in achieving equity-related goals with race-aware versus race-unaware CPMs for conditions where racial identity is prognostically informative. The GUIDE, comprising 31 key items in the development and use of CPMs in healthcare, outlines foundational principles, distinguishes between bias and fairness, and offers guidance for examining subgroup invalidity and using race as a variable in CPMs. This GUIDE presents a living document that supports appraisal and reporting of bias in CPMs to support best practice in CPM development and use.

中文摘要: 临床预测模型 (CPM) 是根据一组患者特征计算某一结局风险的工具,通常用于告知患者、指导治疗决策和资源分配。尽管人们对 CPM 减轻人为偏见寄予厚望,但 CPM 也可能加剧决策和资源分配方面的种族差异。一些政策制定者、专业组织和学者呼吁从 CPM 中剔除种族这一变量,另一些人则担心排除种族可能会加剧医疗保健不平等,这一争议仍未得到解决。《医疗保健决策和公平的无偏见预测信息指南》(GUIDE) 为模型开发人员和卫生系统管理人员提供了专家指南,内容涉及在 CPM 中透明地使用种族以及在不同情境下减轻算法偏差;该指南由一个 14 人组成的多元化技术专家小组 (TEP) 通过 5 轮改良 Delphi 流程制定。审议确认种族是一种社会建构,且预测的目标不同于因果推断的目标,并强调:决策情境的重要性(例如,共同决策与医疗配给);不同反歧视原则之间的冲突(例如,反分类原则与反从属原则);以及在种族身份具有预后信息价值的疾病中,使用考虑种族与不考虑种族的 CPM 实现公平相关目标时,识别并平衡各种权衡的重要性。GUIDE 包含医疗保健领域 CPM 开发和使用方面的 31 个关键条目,概述了基本原则,区分了偏差与公平,并就检查亚组无效性和在 CPM 中使用种族作为变量提供了指导。GUIDE 是一份动态更新的文件,支持对 CPM 中的偏差进行评估和报告,以促进 CPM 开发和使用的最佳实践。


95. Addressing fairness issues in deep learning-based medical image analysis: a systematic review.

解决基于深度学习的医学图像分析中的公平性问题:系统评价。

PMID: 39420149 | DOI: 10.1038/s41746-024-01276-5 | 日期: 2024-10-17

摘要: Deep learning algorithms have demonstrated remarkable efficacy in various medical image analysis (MedIA) applications. However, recent research highlights a performance disparity in these algorithms when applied to specific subgroups, such as exhibiting poorer predictive performance in elderly females. Addressing this fairness issue has become a collaborative effort involving AI scientists and clinicians seeking to understand its origins and develop solutions for mitigation within MedIA. In this survey, we thoroughly examine the current advancements in addressing fairness issues in MedIA, focusing on methodological approaches. We introduce the basics of group fairness and subsequently categorize studies on fair MedIA into fairness evaluation and unfairness mitigation. Detailed methods employed in these studies are presented too. Our survey concludes with a discussion of existing challenges and opportunities in establishing a fair MedIA and healthcare system. By offering this comprehensive review, we aim to foster a shared understanding of fairness among AI researchers and clinicians, enhance the development of unfairness mitigation methods, and contribute to the creation of an equitable MedIA society.

中文摘要: 深度学习算法在各种医学图像分析 (MedIA) 应用中表现出了显著的效果。然而,最近的研究表明,这些算法在应用于特定亚组时存在性能差异,例如在老年女性中的预测性能较差。解决这一公平性问题已成为人工智能科学家和临床医生的共同努力,他们力求了解其成因,并在 MedIA 领域开发缓解方案。在这篇综述中,我们深入考察了当前在解决 MedIA 公平性问题方面取得的进展,重点关注方法层面。我们介绍了群体公平性的基础知识,随后将公平 MedIA 的研究分为公平性评估和不公平缓解两类,并介绍了这些研究采用的详细方法。综述最后讨论了建立公平的 MedIA 和医疗保健系统所面临的挑战和机遇。通过这项全面的综述,我们希望促进人工智能研究人员和临床医生对公平性形成共同理解,推动不公平缓解方法的发展,并为建设一个公平的 MedIA 社会做出贡献。
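
As an illustration of what a group fairness evaluation can look like, the sketch below computes two widely used gap metrics, demographic parity difference and equal opportunity difference, for a binary classifier and two subgroups. The abstract does not say which metrics the review covers, so the choice of metrics, the function names and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def _rate(flags: Sequence[int]) -> float:
    """Mean of a non-empty sequence of 0/1 flags."""
    if not flags:
        raise ValueError("cannot compute a rate for an empty group")
    return sum(flags) / len(flags)


def demographic_parity_difference(
    y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute gap in positive-prediction rate between groups 0 and 1."""
    rates = [
        _rate([p for p, g in zip(y_pred, group) if g == label])
        for label in (0, 1)
    ]
    return abs(rates[0] - rates[1])


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute gap in true-positive rate between groups 0 and 1."""
    tprs = [
        _rate([p for t, p, g in zip(y_true, y_pred, group) if g == label and t == 1])
        for label in (0, 1)
    ]
    return abs(tprs[0] - tprs[1])


# Hypothetical toy data: four patients in each of two subgroups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

dp_gap = demographic_parity_difference(y_pred, group)
eo_gap = equal_opportunity_difference(y_true, y_pred, group)
print(f"demographic parity gap = {dp_gap:.2f}, equal opportunity gap = {eo_gap:.2f}")
```

A gap of zero means the two subgroups are treated identically under that metric; which metric matters most depends on the clinical task.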


96. A randomized clinical trial testing digital mindset intervention for knee osteoarthritis pain and activity improvement.

一项随机临床试验,检验数字心态干预对膝骨关节炎疼痛和活动改善的效果。

PMID: 39414999 | DOI: 10.1038/s41746-024-01281-8 | 日期: 2024-10-17

摘要: This randomized clinical trial evaluated the effectiveness of short, digital interventions in improving physical activity and pain for individuals with knee osteoarthritis. We compared a digital mindset intervention, focusing on adaptive mindsets (e.g., osteoarthritis is manageable), to a digital education intervention and a no-intervention group. 408 participants with knee osteoarthritis completed the study online in the US. The mindset intervention significantly improved mindsets compared to both other groups (P < 0.001) and increased physical activity levels more than the no-intervention group (mean = 28.6 points, P = 0.001), but pain reduction was not significant. The mindset group also showed significantly greater improvements in the perceived need for surgery, self-imposed physical limitations, fear of movement, and self-efficacy than the no-intervention and education groups. This trial demonstrates the effectiveness of brief digital interventions in educating about osteoarthritis and further highlights the additional benefits of improving mindsets to transform patients' approach to disease management. The study was prospectively registered (ClinicalTrials.gov: NCT05698368, 2023-01-26).

中文摘要: 这项随机临床试验评估了简短的数字干预措施在改善膝骨关节炎患者身体活动和疼痛方面的有效性。我们将侧重于适应性心态(例如,骨关节炎是可控的)的数字心态干预与数字教育干预和无干预组进行了比较。408 名膝骨关节炎患者在美国在线完成了这项研究。与其他两组相比,心态干预显著改善了心态(P < 0.001),并且比无干预组更多地提高了身体活动水平(平均值 = 28.6 分,P = 0.001),但疼痛减轻并不显著。与无干预组和教育组相比,心态组在感知的手术需求、自我施加的身体限制、运动恐惧和自我效能方面也表现出显著更大的改善。该试验证明了简短的数字干预措施在骨关节炎教育方面的有效性,并进一步强调了改善心态对转变患者疾病管理方式的额外益处。该研究已进行前瞻性注册(ClinicalTrials.gov:NCT05698368,2023-01-26)。


97. Knowledge abstraction and filtering based federated learning over heterogeneous data views in healthcare.

医疗保健领域异构数据视图上基于知识抽象和过滤的联邦学习。

PMID: 39414980 | DOI: 10.1038/s41746-024-01272-9 | 日期: 2024-10-16

摘要: Robust data privacy regulations hinder the exchange of healthcare data among institutions, crucial for global insights and developing generalised clinical models. Federated learning (FL) is ideal for training global models using datasets from different institutions without compromising privacy. However, disparities in electronic healthcare records (EHRs) lead to inconsistencies in ML-ready data views, making FL challenging without extensive preprocessing and information loss. These differences arise from variations in services, care standards, and record-keeping practices. This paper addresses data view heterogeneity by introducing a knowledge abstraction and filtering-based FL framework that allows FL over heterogeneous data views without manual alignment or information loss. The knowledge abstraction and filtering mechanism maps raw input representations to a unified, semantically rich shared space for effective global model training. Experiments on three healthcare datasets demonstrate the framework's effectiveness in overcoming data view heterogeneity and facilitating information sharing in a federated setup.

中文摘要: 严格的数据隐私法规阻碍了机构之间的医疗数据交换,而这种交换对于获得全局洞察和开发通用临床模型至关重要。联邦学习 (FL) 非常适合在不损害隐私的前提下,使用来自不同机构的数据集训练全局模型。然而,电子医疗记录 (EHR) 的差异导致可用于机器学习的数据视图不一致,若不进行大量预处理并承受信息损失,FL 就难以开展。这些差异源于服务、护理标准和记录保存做法的不同。本文通过引入基于知识抽象和过滤的 FL 框架来解决数据视图异构性问题,该框架允许在异构数据视图上进行 FL,而无需手动对齐,也不会丢失信息。知识抽象和过滤机制将原始输入表示映射到统一的、语义丰富的共享空间,以进行有效的全局模型训练。在三个医疗数据集上的实验证明了该框架在克服数据视图异构性和促进联邦环境中信息共享方面的有效性。
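
To make the federated learning setting in this abstract concrete, here is a minimal sketch of the standard FedAvg aggregation step. It is a baseline illustration only, not the paper's knowledge abstraction and filtering framework; the function name `federated_average` and the example numbers are hypothetical. The shape check shows why heterogeneous data views are a problem: plain averaging only works when every site produces parameters with the same layout.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Aggregate client parameter vectors, weighting each by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        # Plain FedAvg assumes every site trains the same model on the same
        # feature layout; heterogeneous data views violate this assumption.
        raise ValueError("all clients must share one parameter shape")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical example: two hospitals with 100 and 300 local records.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_params)  # [2.5, 3.5]
```

Only parameters leave each site in this scheme, never patient records, which is the privacy property the abstract relies on.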


98. India's evolving digital health strategy.

印度不断发展的数字医疗战略。

PMID: 39414924 | DOI: 10.1038/s41746-024-01279-2 | 日期: 2024-10-16

摘要: India's evolving digital health strategy leverages innovative technologies to enhance access to healthcare services. This paper explores the key components of India's digital health transformation, including the Ayushman Bharat Digital Mission (ABDM) and India's integration of biometric identification and digital infrastructure to improve healthcare delivery. The lessons learned from India's large-scale implementation of digital health provide valuable insights for global health markets and digital transformations in healthcare systems.

中文摘要: 印度不断发展的数字健康战略利用创新技术来增强医疗保健服务的可及性。本文探讨了印度数字医疗转型的关键组成部分,包括 Ayushman Bharat 数字使命 (ABDM) 以及印度整合生物识别和数字基础设施以改善医疗保健服务。印度大规模实施数字医疗的经验教训为全球医疗市场和医疗保健系统的数字化转型提供了宝贵的见解。


99. Prospective clinical evaluation of deep learning for ultrasonographic screening of abdominal aortic aneurysms.

深度学习用于腹主动脉瘤超声筛查的前瞻性临床评估。

PMID: 39406888 | DOI: 10.1038/s41746-024-01269-4 | 日期: 2024-10-15

摘要: Abdominal aortic aneurysm (AAA) often remains undetected until rupture due to limited access to diagnostic ultrasound. This trial evaluated a deep learning (DL) algorithm to guide AAA screening by novice nurses with no prior ultrasonography experience. Ten nurses performed 15 scans each on patients over 65, assisted by a DL object detection algorithm, and compared against physician-performed scans. Ultrasound scan quality, assessed by three blinded expert physicians, was the primary outcome. Among 184 patients, DL-guided novices achieved adequate scan quality in 87.5% of cases, comparable to the 91.3% by physicians (p = 0.310). The DL model predicted AAA with an AUC of 0.975, 100% sensitivity, and 97.8% specificity, with a mean absolute error of 2.8 mm in predicting aortic width compared to physicians. This study demonstrates that DL-guided POCUS has the potential to democratize AAA screening, offering performance comparable to experienced physicians and improving early detection.

中文摘要: 由于超声诊断的机会有限,腹主动脉瘤 (AAA) 通常直到破裂才被发现。该试验评估了深度学习 (DL) 算法,以指导没有超声检查经验的新手护士进行 AAA 筛查。 10 名护士在 DL 对象检测算法的辅助下,每人对 65 岁以上的患者进行了 15 次扫描,并与医生进行的扫描进行了比较。主要结果是由三名盲法专家医师评估的超声扫描质量。在 184 名患者中,DL 引导的新手在 87.5% 的病例中达到了足够的扫描质量,与医生的 91.3% 相当 (p = 0.310)。 DL 模型预测 AAA 的 AUC 为 0.975,敏感性为 100%,特异性为 97.8%,与医生相比,预测主动脉宽度的平均绝对误差为 2.8mm。这项研究表明,DL 引导的 POCUS 有潜力使 AAA 筛查大众化,提供与经验丰富的医生相当的性能并改善早期检测。


100. Privacy-friendly evaluation of patient data with secure multiparty computation in a European pilot study.

在欧洲试点研究中通过安全多方计算对患者数据进行隐私友好的评估。

PMID: 39397162 | DOI: 10.1038/s41746-024-01293-4 | 日期: 2024-10-14

摘要: In multicentric studies, data sharing between institutions might negatively impact patient privacy or data security. An alternative is federated analysis by secure multiparty computation. This pilot study demonstrates an architecture and implementation addressing both technical challenges and legal difficulties in the particularly demanding setting of clinical research on cancer patients within the strict European regulation on patient privacy and data protection: 24 patients from LMU University Hospital in Munich, Germany, and 24 patients from Policlinico Universitario Fondazione Agostino Gemelli, Rome, Italy, were treated for adrenal gland metastasis with typically 40 Gy in 3 or 5 fractions of online-adaptive radiotherapy guided by real-time MR. High local control (21% complete remission, 27% partial remission, 40% stable disease) and low toxicity (73% reporting no toxicity) were observed. Median overall survival was 19 months. Federated analysis was found to improve clinical science through privacy-friendly evaluation of patient data in the European health data space.

中文摘要: 在多中心研究中,机构之间的数据共享可能会对患者隐私或数据安全产生负面影响。一种替代方案是通过安全多方计算进行联邦分析。这项试点研究展示了一种架构和实现,在欧洲严格的患者隐私和数据保护法规下,针对癌症患者临床研究这一要求特别苛刻的场景,同时应对技术挑战和法律困难:来自德国慕尼黑 LMU 大学医院的 24 名患者和来自意大利罗马 Policlinico Universitario Fondazione Agostino Gemelli 的 24 名患者因肾上腺转移接受了实时磁共振 (MR) 引导的在线自适应放射治疗,剂量通常为 40 Gy,分 3 次或 5 次给予。观察到较高的局部控制率(21% 完全缓解、27% 部分缓解、40% 疾病稳定)和较低的毒性(73% 报告无毒性)。中位总生存期为 19 个月。研究发现,联邦分析可以通过对欧洲健康数据空间中的患者数据进行隐私友好的评估来促进临床科学。


101. A scoping review on advancements in noninvasive wearable technology for heart failure management.

用于心力衰竭管理的无创可穿戴技术进展的范围综述。

PMID: 39396094 | DOI: 10.1038/s41746-024-01268-5 | 日期: 2024-10-12

摘要: Wearables offer a promising solution for enhancing remote monitoring (RM) of heart failure (HF) patients by tracking key physiological parameters. Despite their potential, their clinical integration faces challenges due to the lack of rigorous evaluations. This review aims to summarize the current evidence and assess the readiness of wearables for clinical practice using the Medical Device Readiness Level (MDRL). A systematic search identified 99 studies from 3112 retrieved articles, with only eight being randomized controlled trials. Accelerometry was the most used measurement technique. Consumer-grade wearables, repurposed for HF monitoring, dominated the studies, with most of them in the feasibility testing stage (MDRL 6). Only two of the described wearables were specifically designed for HF RM and received FDA approval. Consequently, the actual impact of wearables on HF management remains uncertain due to limited robust evidence, posing a significant barrier to their integration into HF care.

中文摘要: 可穿戴设备通过跟踪关键生理参数,为加强心力衰竭 (HF) 患者的远程监测 (RM) 提供了一种有前景的解决方案。尽管具有潜力,但由于缺乏严格的评估,其临床整合仍面临挑战。本综述旨在总结当前证据,并使用医疗设备就绪级别 (MDRL) 评估可穿戴设备用于临床实践的准备情况。系统检索从检出的 3112 篇文章中确定了 99 项研究,其中只有 8 项是随机对照试验。加速度测量是最常用的测量技术。被改用于 HF 监测的消费级可穿戴设备在研究中占主导地位,其中大多数处于可行性测试阶段(MDRL 6)。所描述的可穿戴设备中只有两款是专门为 HF RM 设计的,并获得了 FDA 的批准。因此,由于可靠证据有限,可穿戴设备对 HF 管理的实际影响仍不确定,这对其融入 HF 护理构成了重大障碍。


102. Systematic review and meta-analysis of standalone digital interventions for cognitive symptoms in people without dementia.

对非痴呆症患者认知症状的独立数字干预措施的系统回顾和荟萃分析。

PMID: 39390236 | DOI: 10.1038/s41746-024-01280-9 | 日期: 2024-10-10

摘要: Cognitive symptoms are prevalent across neuropsychiatric disorders, increase distress and impair quality of life. Self-guided digital interventions offer accessibility, scalability, and may overcome the research-to-practice treatment gap. Seventy-six trials with 5214 participants were identified. A random-effects meta-analysis investigated the effects of all digital self-guided interventions, compared to controls, at post-treatment. We found a small-to-moderate positive pooled effect on cognition (k = 71; g = -0.51, 95%CI -0.64 to -0.37; p < 0.00001) and mental health (k = 30; g = -0.41, 95%CI -0.60 to -0.22; p < 0.0001). Positive treatment effects on fatigue (k = 8; g = -0.27, 95%CI -0.53 to -0.02; p = 0.03) and quality of life (k = 22; g = -0.17, 95%CI -0.34 to -0.00; p = 0.04) were only marginally significant. No significant benefit was found for performance on activities of daily living. Results were independent of control groups, treatment duration, risk of bias and delivery format. Self-guided digital transdiagnostic interventions may benefit at least a subset of patients in the short run, yet their impact on non-cognitive outcomes remains uncertain.

中文摘要: 认知症状在各类神经精神疾病中普遍存在,会增加痛苦并损害生活质量。自我引导的数字干预措施具有可及性和可扩展性,并且可能弥合研究与实践之间的治疗差距。共确定了 76 项试验,涉及 5214 名参与者。一项随机效应荟萃分析考察了所有数字自我引导干预措施与对照组相比在治疗后的效果。我们发现其对认知(k = 71;g = -0.51,95%CI -0.64 至 -0.37;p < 0.00001)和心理健康(k = 30;g = -0.41,95%CI -0.60 至 -0.22;p < 0.0001)有小到中等的正向合并效应。对疲劳(k = 8;g = -0.27,95%CI -0.53 至 -0.02;p = 0.03)和生活质量(k = 22;g = -0.17,95%CI -0.34 至 -0.00;p = 0.04)的正向治疗效果仅达到边缘显著。未发现对日常生活活动表现有显著益处。结果与对照组类型、治疗持续时间、偏倚风险和实施形式无关。自我引导的数字跨诊断干预措施可能在短期内至少使一部分患者受益,但其对非认知结局的影响仍不确定。
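
The pooled effects above come from a random-effects meta-analysis. As a rough illustration of how such a pooled estimate and its 95% CI can be computed, the sketch below uses the DerSimonian-Laird estimator. This is an assumption: the abstract does not say which estimator the authors used, and the function name `random_effects_pool` and the input values are hypothetical, not data from the review.

```python
import math
from typing import Dict, Sequence


def random_effects_pool(effects: Sequence[float],
                        variances: Sequence[float]) -> Dict[str, object]:
    """Pool per-study effect sizes with the DerSimonian-Laird estimator."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q quantifies between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical effect sizes (Hedges' g) and variances for three studies.
result = random_effects_pool([-0.6, -0.4, -0.5], [0.04, 0.02, 0.03])
print(result["pooled"], result["ci95"])
```

When the studies agree closely, `tau2` is floored at zero and the result matches the fixed-effect estimate; larger heterogeneity widens the confidence interval.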


103. Individualized decision making in on-scene resuscitation time for out-of-hospital cardiac arrest using reinforcement learning.

使用强化学习在院外心脏骤停的现场复苏时间中做出个性化决策。

PMID: 39384897 | DOI: 10.1038/s41746-024-01278-3 | 日期: 2024-10-09

摘要: On-scene resuscitation time is associated with out-of-hospital cardiac arrest (OHCA) outcomes. We developed and validated reinforcement learning models for individualized on-scene resuscitation times, leveraging nationwide Korean data. Adult OHCA patients with a medical cause of arrest were included (N = 73,905). The optimal policy was derived from conservative Q-learning to maximize survival. The on-scene return of spontaneous circulation hazard rates estimated from the Random Survival Forest were used as intermediate rewards to handle sparse rewards, while patients' historical survival was reflected in the terminal rewards. The optimal policy increased the survival to hospital discharge rate from 9.6% to 12.5% (95% CI: 12.2-12.8) and the good neurological recovery rate from 5.4% to 7.5% (95% CI: 7.3-7.7). The recommended maximum on-scene resuscitation times for patients demonstrated a bimodal distribution, varying with patient, emergency medical services, and OHCA characteristics. Our survival analysis-based approach generates explainable rewards, reducing subjectivity in reinforcement learning.

中文摘要: 现场复苏时间与院外心脏骤停 (OHCA) 的结局相关。我们利用韩国全国数据,开发并验证了用于个体化现场复苏时间的强化学习模型。纳入了因内科原因发生心脏骤停的成年 OHCA 患者 (N = 73,905)。最优策略由保守 Q 学习得出,以最大化生存率。由随机生存森林估计的现场自主循环恢复风险率被用作中间奖励,以应对奖励稀疏的问题,而患者的历史生存情况则反映在终端奖励中。最优策略将存活出院率从 9.6% 提高到 12.5%(95% CI:12.2-12.8),神经功能良好恢复率从 5.4% 提高到 7.5%(95% CI:7.3-7.7)。推荐的最长现场复苏时间呈双峰分布,随患者、紧急医疗服务和 OHCA 特征的不同而变化。我们基于生存分析的方法可产生可解释的奖励,减少强化学习中的主观性。


104. Screening chronic kidney disease through deep learning utilizing ultra-wide-field fundus images.

利用超广角眼底图像通过深度学习筛查慢性肾脏疾病。

PMID: 39375513 | DOI: 10.1038/s41746-024-01271-w | 日期: 2024-10-07

摘要: To address challenges in screening for chronic kidney disease (CKD), we devised a deep learning-based CKD screening model named UWF-CKDS. It utilizes ultra-wide-field (UWF) fundus images to predict the presence of CKD. We validated the model with data from 23 tertiary hospitals across China. Retinal vessels and retinal microvascular parameters (RMPs) were extracted to enhance model interpretability, which revealed a significant correlation between renal function and RMPs. UWF-CKDS, utilizing UWF images, RMPs, and relevant medical history, can accurately determine CKD status. Importantly, UWF-CKDS exhibited superior performance compared to CTR-CKDS, a model developed using the central region (CTR) cropped from UWF images, underscoring the contribution of the peripheral retina in predicting renal function. The study presents UWF-CKDS as a highly implementable method for large-scale and accurate CKD screening at the population level.

中文摘要: 为了应对慢性肾脏病 (CKD) 筛查中的挑战,我们设计了一种基于深度学习的 CKD 筛查模型,名为 UWF-CKDS。它利用超广角 (UWF) 眼底图像来预测是否存在 CKD。我们利用中国 23 家三级医院的数据验证了该模型。我们提取了视网膜血管和视网膜微血管参数 (RMP) 以增强模型的可解释性,结果显示肾功能与 RMP 之间存在显著相关性。UWF-CKDS 利用 UWF 图像、RMP 和相关病史,可以准确判断 CKD 状态。重要的是,UWF-CKDS 的性能优于 CTR-CKDS(后者是使用从 UWF 图像裁剪出的中心区域 (CTR) 开发的模型),这凸显了周边视网膜对预测肾功能的贡献。该研究表明,UWF-CKDS 是一种高度可实施的方法,可在人群层面进行大规模、准确的 CKD 筛查。


105. Enabling data linkages for rare diseases in a resilient environment with the SERDIF framework.

通过 SERDIF 框架在弹性环境中实现罕见疾病的数据链接。

PMID: 39367112 | DOI: 10.1038/s41746-024-01267-6 | 日期: 2024-10-04

摘要: Environmental factors amplified by climate change contribute significantly to the global burden of disease, disproportionately impacting vulnerable populations, such as individuals with rare diseases. Researchers require innovative, dynamic data linkage methods to enable the development of risk prediction models, particularly for diseases like vasculitis with unknown aetiology but potential environmental triggers. In response, we present the Semantic Environmental and Rare Disease Data Integration Framework (SERDIF). SERDIF was evaluated with researchers studying climate-related health hazards of vasculitis disease activity across European countries (NP1 = 10, NP2 = 17, NP3 = 23). Usability metrics consistently improved, indicating SERDIF's effectiveness in linking complex environmental and health datasets. Furthermore, SERDIF enabled epidemiologists to study environmental factors in a pregnancy cohort in Lombardy, showcasing its versatility beyond rare diseases. This framework offers for the first time a user-friendly, FAIR-compliant design for environment-health data linkage with export capabilities enabling data analysis to mitigate health risks posed by climate change.

中文摘要: 气候变化加剧的环境因素极大地加重了全球疾病负担,对弱势群体(例如罕见病患者)产生了不成比例的影响。研究人员需要创新的动态数据链接方法来开发风险预测模型,特别是针对血管炎这类病因不明但可能存在环境触发因素的疾病。为此,我们提出了语义环境和罕见疾病数据集成框架(SERDIF)。我们邀请研究欧洲各国血管炎疾病活动相关气候健康危害的研究人员对 SERDIF 进行了评估(NP1 = 10、NP2 = 17、NP3 = 23)。可用性指标持续改善,表明 SERDIF 在链接复杂的环境和健康数据集方面是有效的。此外,SERDIF 还使流行病学家能够研究伦巴第一个妊娠队列中的环境因素,展示了其在罕见疾病之外的通用性。该框架首次为环境与健康数据链接提供了用户友好、符合 FAIR 原则的设计,并具有导出功能,使数据分析能够用于减轻气候变化带来的健康风险。


106. Improving prognostic accuracy in lung transplantation using unique features of isolated human lung radiographs.

利用离体人肺 X 线片的独特特征提高肺移植的预后准确性。

PMID: 39363013 | DOI: 10.1038/s41746-024-01260-z | 日期: 2024-10-03

摘要: Ex vivo lung perfusion (EVLP) enables advanced assessment of human lungs for transplant suitability. We developed a convolutional neural network (CNN)-based approach to analyze the largest cohort of isolated lung radiographs to date. CNNs were trained to process 1300 longitudinal radiographs from n = 650 clinical EVLP cases. Latent features were transformed into principal components (PC) and correlated with known radiographic findings. PCs were combined with physiological data to classify clinical outcomes: (1) recipient time to extubation of <72 h, (2) ≥72 h, and (3) lungs unsuitable for transplantation. The top PC was significantly correlated with infiltration (Spearman R: 0.72, p < 0.0001), and adding radiographic PCs significantly improved the discrimination for clinical outcomes (Accuracy: 73 vs 78%, p = 0.014). CNN-derived radiographic lung features therefore add substantial value to the current assessments. This approach can be adopted by EVLP centers worldwide to harness radiographic information without requiring real-time radiological expertise.

中文摘要: 离体肺灌注 (EVLP) 可以对人肺的移植适宜性进行高级评估。我们开发了一种基于卷积神经网络 (CNN) 的方法,用于分析迄今为止最大的离体肺 X 线片队列。我们训练 CNN 处理来自 n = 650 例临床 EVLP 病例的 1300 张纵向 X 线片。潜在特征被转换为主成分 (PC),并与已知的影像学表现进行相关分析。PC 与生理数据相结合,用于对临床结局进行分类:(1) 受者拔管时间 <72 小时,(2) ≥72 小时,(3) 肺不适合移植。排名第一的 PC 与浸润显著相关(Spearman R:0.72,p < 0.0001),加入影像学 PC 后对临床结局的判别能力显著提高(准确率:73% vs 78%,p = 0.014)。因此,CNN 提取的肺部影像学特征为现有评估增添了重要价值。世界各地的 EVLP 中心都可以采用这种方法来利用影像学信息,而无需实时的放射学专业知识。


107. A scoping review of reporting gaps in FDA-approved AI medical devices.

对 FDA 批准的人工智能医疗设备报告缺口的范围综述。

PMID: 39362934 | DOI: 10.1038/s41746-024-01270-x | 日期: 2024-10-03

摘要: Machine learning and artificial intelligence (AI/ML) models in healthcare may exacerbate health biases. Regulatory oversight is critical in evaluating the safety and effectiveness of AI/ML devices in clinical settings. We conducted a scoping review on the 692 FDA-approved AI/ML-enabled medical devices approved from 1995-2023 to examine transparency, safety reporting, and sociodemographic representation. Only 3.6% of approvals reported race/ethnicity, 99.1% provided no socioeconomic data. 81.6% did not report the age of study subjects. Only 46.1% provided comprehensive detailed results of performance studies; only 1.9% included a link to a scientific publication with safety and efficacy data. Only 9.0% contained a prospective study for post-market surveillance. Despite the growing number of market-approved medical devices, our data shows that FDA reporting data remains inconsistent. Demographic and socioeconomic characteristics are underreported, exacerbating the risk of algorithmic bias and health disparity.

中文摘要: 医疗保健领域的机器学习和人工智能 (AI/ML) 模型可能会加剧健康方面的偏差。监管监督对于评估临床环境中 AI/ML 设备的安全性和有效性至关重要。我们对 1995 年至 2023 年间获 FDA 批准的 692 种支持 AI/ML 的医疗设备进行了范围综述,以考察透明度、安全性报告和社会人口学代表性。只有 3.6% 的批准文件报告了种族/民族,99.1% 未提供社会经济数据,81.6% 未报告研究对象的年龄。只有 46.1% 提供了全面详细的性能研究结果;只有 1.9% 包含指向载有安全性和有效性数据的科学出版物的链接。只有 9.0% 包含用于上市后监测的前瞻性研究。尽管获批上市的医疗设备数量不断增加,但我们的数据显示 FDA 的报告数据仍不一致。人口统计学和社会经济特征报告不足,加剧了算法偏差和健康差异的风险。


108. Algorithmovigilance, lessons from pharmacovigilance.

算法警戒,药物警戒的教训。

PMID: 39358559 | DOI: 10.1038/s41746-024-01237-y | 日期: 2024-10-02

摘要: Artificial Intelligence (AI) systems are increasingly being deployed across various high-risk applications, especially in healthcare. Despite significant attention to evaluating these systems, post-deployment incidents are not uncommon, and effective mitigation strategies remain challenging. Drug safety has a well-established history of assessing, monitoring, understanding, and preventing adverse effects in real-world usage, known as pharmacovigilance. Drawing inspiration from pharmacovigilance methods, we discuss concepts that can be adapted for monitoring AI systems in healthcare. This discussion aims to improve responses to adverse effects and potential incidents and risks associated with AI deployment in healthcare but also beyond.

中文摘要: 人工智能 (AI) 系统越来越多地部署在各种高风险应用中,尤其是在医疗保健领域。尽管对评估这些系统给予了极大的关注,但部署后事件并不少见,有效的缓解策略仍然具有挑战性。药物安全在评估、监测、理解和预防现实世界使用中的不良反应(称为药物警戒)方面有着悠久的历史。从药物警戒方法中汲取灵感,我们讨论了可用于监测医疗保健中人工智能系统的概念。本次讨论旨在改善对医疗保健及其他领域与人工智能部署相关的不良影响、潜在事件和风险的响应。


109. Ethical guidance for reporting and evaluating claims of AI outperforming human doctors.

报告和评估人工智能优于人类医生的说法的道德指南。

PMID: 39358556 | DOI: 10.1038/s41746-024-01255-w | 日期: 2024-10-02

摘要: Claims of AI outperforming medical practitioners are under scrutiny, as the evidence supporting many of these claims is not convincing or transparently reported. These claims often lack specificity, contextualization, and empirical grounding. In this comment, we offer constructive ethical guidance that can benefit authors, journal editors, and peer reviewers when reporting and evaluating findings in studies comparing AI to physician performance. The guidance provided here forms an essential addition to current reporting guidelines for healthcare studies using machine learning.

中文摘要: 人工智能优于医生的说法正受到审视,因为支持其中许多说法的证据并不令人信服,或者报告不够透明。这些说法往往缺乏具体性、情境化和实证基础。在这篇评论中,我们提供了建设性的伦理指导,可供作者、期刊编辑和同行评审员在报告和评估比较人工智能与医生表现的研究结果时参考。此处提供的指导是对当前使用机器学习的医疗保健研究报告指南的重要补充。


110. "Towards harmonizing assessment and reimbursement of digital medical devices in the EU through mutual learning".

"通过相互学习协调欧盟数字医疗设备的评估和报销"。

PMID: 39354125 | DOI: 10.1038/s41746-024-01263-w | 日期: 2024-10-01

摘要: Digital medical devices (DMDs) present unique opportunities in their regulation and reimbursement. A dynamic landscape of DMD assessment frameworks is emerging within the European Union, with five clusters of prevailing approaches identified. Despite notable gaps in maturity levels, cross-country learning effects are becoming prevalent. We expect more countries, both within the EU and beyond, to follow the steps of current frontrunners, hence expediting the harmonization process.

中文摘要: 数字医疗设备 (DMD) 在监管和报销方面带来了独特的机遇。欧盟内部正在形成一个动态变化的 DMD 评估框架格局,目前已识别出五类主流方法。尽管各国成熟度存在明显差距,但跨国学习效应正日益普遍。我们预计欧盟内外会有更多国家追随当前领跑者的步伐,从而加快协调进程。


111. Machine learning explains response variability of deep brain stimulation on Parkinson's disease quality of life.

机器学习解释了深部脑刺激对帕金森病生活质量的反应变异性。

PMID: 39354049 | DOI: 10.1038/s41746-024-01253-y | 日期: 2024-10-02

摘要: Improving health-related quality of life (QoL) is crucial for managing Parkinson's disease. However, QoL outcomes after deep brain stimulation (DBS) of the subthalamic nucleus (STN) vary considerably. Current approaches lack integration of demographic, patient-reported, neuroimaging, and neurophysiological data to understand this variability. This study used explainable machine learning to analyze multimodal factors affecting QoL changes, measured by the Parkinson's Disease Questionnaire (PDQ-39) in 63 patients, and quantified each variable's contribution. Results showed that preoperative PDQ-39 scores and upper beta band activity (>20 Hz) in the left STN were key predictors of QoL changes. Lower initial QoL burden predicted worsening, while improvement was associated with higher beta activity. Additionally, electrode positions along the superior-inferior axis, especially relative to the z = -7 coordinate in standard space, influenced outcomes, with improved and worsened QoL above and below this marker. This study emphasizes a tailored, data-informed approach to optimize DBS treatment and improve patient QoL.

中文摘要: 改善与健康相关的生活质量 (QoL) 对于帕金森病的管理至关重要。然而,丘脑底核 (STN) 深部脑刺激 (DBS) 后的 QoL 结局差异很大。目前的方法未能整合人口统计学、患者报告、神经影像和神经生理学数据来理解这种变异性。本研究使用可解释的机器学习分析影响 QoL 变化的多模态因素(在 63 名患者中以帕金森病问卷 PDQ-39 测量),并量化了每个变量的贡献。结果显示,术前 PDQ-39 评分和左侧 STN 的高频 β 频段活动 (>20 Hz) 是 QoL 变化的关键预测因素。较低的初始 QoL 负担预示着恶化,而改善则与较高的 β 活动相关。此外,电极沿上下轴的位置,特别是相对于标准空间中 z = -7 坐标的位置,会影响结局:位于该标记上方时 QoL 改善,位于其下方时 QoL 恶化。这项研究强调采用量身定制的、基于数据的方法来优化 DBS 治疗并提高患者的 QoL。


112. Games Wide Open to athlete partnership in building artificial intelligence systems.

奥运会向运动员合作构建人工智能系统开放。

PMID: 39354047 | DOI: 10.1038/s41746-024-01261-y | 日期: 2024-10-01

摘要: The integration of artificial intelligence (AI) in sports medicine is opening new frontiers for athlete health and performance, aligning with the spirit of the Paris 2024 Olympic Games slogan, "Games Wide Open."

中文摘要: 人工智能 (AI) 与运动医学的融合正在为运动员的健康和表现开辟新领域,这与 2024 年巴黎奥运会口号"Games Wide Open"的精神不谋而合。


113. Overcoming biases of individual level shopping history data in health research.

克服健康研究中个人层面的购物历史数据的偏差。

PMID: 39349949 | DOI: 10.1038/s41746-024-01231-4 | 日期: 2024-09-30

摘要: Novel sources of population data, especially administrative and medical records, as well as the digital footprints generated through interactions with online services, present a considerable opportunity for advancing health research and policymaking. An illustrative example is shopping history records that can illuminate aspects of population health by scrutinizing extensive sets of everyday choices made in the real world. However, like any dataset, these sources possess specific limitations, including sampling biases, validity issues, and measurement errors. To enhance the applicability and potential of shopping data in health research, we advocate for the integration of individual-level shopping data with external datasets containing rich repositories of longitudinal population cohort studies. This strategic approach holds the promise of devising innovative methodologies to address inherent data limitations and biases. By meticulously documenting biases, establishing validated associations, and discerning patterns within these amalgamated records, researchers can extrapolate their findings to encompass population-wide datasets derived from national supermarket chains. The validation and linkage of population health data with real-world choices pertaining to food, beverages, and over-the-counter medications, such as pain relief, present a significant opportunity to comprehend the impact of these choices, and of the behavioural patterns associated with them, on public health.

中文摘要: 人口数据的新来源,特别是行政和医疗记录,以及通过与在线服务交互产生的数字足迹,为推进卫生研究和政策制定提供了巨大的机会。一个说明性的例子是购物历史记录,它可以通过仔细检查现实世界中所做的大量日常选择来阐明人口健康的各个方面。然而,与任何数据集一样,这些来源也具有特定的局限性,包括抽样偏差、有效性问题和测量误差。为了增强购物数据在健康研究中的适用性和潜力,我们主张将个人购物数据与包含丰富的纵向人群队列研究存储库的外部数据集相整合。这种战略方法有望设计出创新的方法来解决固有的数据限制和偏见。通过仔细记录偏见、建立经过验证的关联并辨别这些合并记录中的模式,研究人员可以推断他们的发现,以涵盖来自全国连锁超市的全人群数据集。将人口健康数据与有关食品、饮料和非处方药物(例如止痛药)的现实世界选择进行验证和联系,为理解这些选择和与之相关的行为模式对公共卫生的影响提供了重要的机会。


114. Effects of artificial intelligence implementation on efficiency in medical imaging: a systematic literature review and meta-analysis.

人工智能实施对医学影像效率的影响:系统文献综述和荟萃分析。

PMID: 39349815 | DOI: 10.1038/s41746-024-01248-9 | 日期: 2024-09-30

摘要: In healthcare, integration of artificial intelligence (AI) holds strong promise for facilitating clinicians' work, especially in clinical imaging. We aimed to assess the impact of AI implementation for medical imaging on efficiency in real-world clinical workflows and conducted a systematic review searching six medical databases. Two reviewers double-screened all records. Eligible records were evaluated for methodological quality. The outcomes of interest were workflow adaptation due to AI implementation, changes in time for tasks, and clinician workload. After screening 13,756 records, we identified 48 original studies to be included in the review. Thirty-three studies measured time for tasks, with 67% reporting reductions. Yet, three separate meta-analyses of 12 studies did not show significant effects after AI implementation. We identified five different workflows adapting to AI use. Most commonly, AI served as a secondary reader for detection tasks. Alternatively, AI was used as the primary reader for identifying positive cases, resulting in reorganizing worklists or issuing alerts. Only three studies scrutinized workload calculations based on the time saved through AI use. This systematic review and meta-analysis represents an assessment of the efficiency improvements offered by AI applications in real-world clinical imaging, predominantly revealing enhancements across the studies. However, considerable heterogeneity in available studies renders robust inferences regarding overall effectiveness in imaging tasks difficult. Further work is needed on standardized reporting, evaluation of system integration, and real-world data collection to better understand the technological advances of AI in real-world healthcare workflows. Systematic review registration: Prospero ID CRD42022303439, International Registered Report Identifier (IRRID): RR2-10.2196/40485.

中文摘要: 在医疗保健领域,人工智能 (AI) 的集成有望便利临床医生的工作,尤其是在临床影像方面。我们旨在评估医学影像中实施 AI 对真实世界临床工作流程效率的影响,并检索六个医学数据库进行了系统综述。两名评审员对所有记录进行了双重筛选,并对符合条件的记录进行了方法学质量评估。关注的结局是 AI 实施带来的工作流程调整、任务用时的变化以及临床医生的工作量。在筛选了 13,756 条记录后,我们确定了 48 项原始研究纳入综述。33 项研究测量了任务用时,其中 67% 报告用时减少。然而,针对 12 项研究进行的三项独立荟萃分析并未显示 AI 实施后有显著效果。我们确定了五种适应 AI 使用的不同工作流程。最常见的是,AI 充当检测任务的第二阅片者。另一种情况是,AI 被用作识别阳性病例的首要阅片者,从而重新排序工作列表或发出警报。只有三项研究根据使用 AI 节省的时间仔细考察了工作量计算。这项系统综述和荟萃分析评估了 AI 应用在真实世界临床影像中带来的效率提升,各项研究总体上显示出改善。然而,现有研究存在相当大的异质性,难以就影像任务的整体有效性得出可靠推论。需要在标准化报告、系统集成评估和真实世界数据收集方面开展进一步工作,以更好地了解 AI 在真实世界医疗保健工作流程中的技术进步。系统综述注册:Prospero ID CRD42022303439,国际注册报告标识符(IRRID):RR2-10.2196/40485。


115. Talking about diseases; developing a model of patient and public-prioritised disease phenotypes.

谈论疾病;开发患者和公众优先考虑的疾病表型模型。

PMID: 39349692 | DOI: 10.1038/s41746-024-01257-8 | 日期: 2024-09-30

摘要: Deep phenotyping describes the use of standardised terminologies to create comprehensive phenotypic descriptions of biomedical phenomena. These characterisations facilitate secondary analysis, evidence synthesis, and practitioner awareness, thereby guiding patient care. The vast majority of this knowledge is derived from sources that describe an academic understanding of disease, including academic literature and experimental databases. Previous work indicates a gulf between the priorities, perspectives, and perceptions held by different healthcare stakeholders. Using social media data, we develop a phenotype model that represents a public perspective on disease and compare this with a model derived from a combination of existing academic phenotype databases. We identified 52,198 positive disease-phenotype associations from social media across 311 diseases. We further identified 24,618 novel phenotype associations not shared by the biomedical and literature-derived phenotype model across 304 diseases, of which we considered 14,531 significant. Manifestations of disease affecting quality of life, and concerning endocrine, digestive, and reproductive diseases were over-represented in the social media phenotype model. An expert clinical review found that social media-derived associations were considered similarly well-established to those derived from literature, and were seen significantly more in patient clinical encounters. The phenotype model recovered from social media presents a significantly different perspective than existing resources derived from biomedical databases and literature, providing a large number of associations novel to the latter dataset. We propose that the integration and interrogation of these public perspectives on the disease can inform clinical awareness, improve secondary analysis, and bridge understanding and priorities across healthcare stakeholders.

中文摘要: 深度表型分析是指使用标准化术语对生物医学现象进行全面的表型描述。这些描述有助于二次分析、证据综合和提高从业者的认识,从而指导患者护理。这些知识绝大多数来自反映学术界对疾病理解的来源,包括学术文献和实验数据库。先前的研究表明,不同医疗保健利益相关者的优先事项、观点和认知之间存在鸿沟。我们使用社交媒体数据构建了一个代表公众疾病视角的表型模型,并将其与由多个现有学术表型数据库组合得到的模型进行比较。我们从社交媒体中识别出涉及 311 种疾病的 52,198 个正向疾病-表型关联。我们进一步在 304 种疾病中发现了 24,618 个未见于生物医学数据库和文献衍生表型模型的新表型关联,其中 14,531 个被我们认为具有显著性。影响生活质量的疾病表现,以及与内分泌、消化和生殖疾病有关的表现,在社交媒体表型模型中占比偏高。一项专家临床评审发现,社交媒体衍生的关联被认为与文献衍生的关联同样成熟可靠,并且在临床接诊中明显更为常见。从社交媒体得到的表型模型与源自生物医学数据库和文献的现有资源相比,呈现出显著不同的视角,并提供了大量后者所没有的新关联。我们认为,整合并审视这些公众对疾病的看法,可以提高临床认识、改进二次分析,并促进医疗保健利益相关者之间的相互理解和优先事项的协调。


116. Identifying who are unlikely to benefit from total knee arthroplasty using machine learning models.

使用机器学习模型确定哪些人不太可能从全膝关节置换术中受益。

PMID: 39349593 | DOI: 10.1038/s41746-024-01265-8 | 日期: 2024-09-30

摘要: Identifying and preventing patients who are not likely to benefit long-term from total knee arthroplasty (TKA) would decrease healthcare expenditure significantly. We trained machine learning (ML) models (image-only, clinical-data only, and multimodal) among 5720 knee OA patients to predict postoperative dissatisfaction at 2 years. Dissatisfaction was defined as not achieving a minimal clinically important difference in postoperative Knee Society knee and function scores (KSS), Short Form-36 Health Survey [SF-36, divided into a physical component score (PCS) and mental component score (MCS)], and Oxford Knee Score (OKS). Compared to image-only models, both clinical-data only and multimodal models achieved superior performance at predicting dissatisfaction measured by AUC, clinical-data only model: KSS 0.888 (0.866-0.909), SF-PCS 0.836 (0.812-0.860), SF-MCS 0.833 (0.812-0.854), and OKS 0.806 (0.753-0.859); multimodal model: KSS 0.891 (0.870-0.911), SF-PCS 0.832 (0.808-0.857), SF-MCS 0.835 (0.811-0.856), and OKS 0.816 (0.768-0.863). Our findings highlighted that ML models using clinical or multimodal data were capable to predict post-TKA dissatisfaction.

中文摘要: 识别不太可能从全膝关节置换术 (TKA) 中长期获益的患者并避免此类手术,将显著减少医疗支出。我们在 5720 名膝关节骨关节炎 (OA) 患者中训练了机器学习 (ML) 模型(仅图像、仅临床数据和多模态),以预测术后 2 年的不满意。不满意被定义为术后膝关节协会膝关节和功能评分 (KSS)、Short Form-36 健康调查 [SF-36,分为生理健康总评分 (PCS) 和心理健康总评分 (MCS)] 以及牛津膝关节评分 (OKS) 未达到最小临床重要差异。与仅图像模型相比,仅临床数据模型和多模态模型在预测不满意方面(以 AUC 衡量)均表现更优。仅临床数据模型:KSS 0.888 (0.866-0.909)、SF-PCS 0.836 (0.812-0.860)、SF-MCS 0.833 (0.812-0.854) 和 OKS 0.806 (0.753-0.859);多模态模型:KSS 0.891 (0.870-0.911)、SF-PCS 0.832 (0.808-0.857)、SF-MCS 0.835 (0.811-0.856) 和 OKS 0.816 (0.768-0.863)。我们的研究结果表明,使用临床数据或多模态数据的机器学习模型能够预测 TKA 术后的不满意。


117. An indirect treatment comparison meta-analysis of digital versus face-to-face cognitive behavior therapy for headache.

数字与面对面认知行为疗法治疗头痛的间接治疗比较荟萃分析。

PMID: 39343978 | DOI: 10.1038/s41746-024-01264-9 | 日期: 2024-09-29

摘要: Cognitive behavioral therapy (CBT) is effective for headache disorders. However, it is unclear whether the emerging digital CBT is noninferior to face-to-face CBT. An indirect treatment comparison (ITC) meta-analysis was conducted to assess the relative effects between them using standard mean differences (SMDs). Effective sample size (ESS) and required sample size (RSS) were calculated to demonstrate the robustness of the results. Our study found that digital CBT had a similar effect on headache frequency reduction (SMD, 0.12; 95%CI, -2.45 to 2.63) compared with face-to-face CBT. The ESS had 84 participants, while the RSS had 466 participants to achieve the same power as a non-inferior head-to-head trial. Digital CBT is as effective as face-to-face CBT in preventing headache disorders. Due to the heterogeneity (I2 = 94.5%, τ2 = 1.83) and the fact that most of the included studies were on migraine prevention, further head-to-head trials are warranted.

中文摘要: 认知行为疗法 (CBT) 对头痛疾病有效。然而,尚不清楚新兴的数字 CBT 是否非劣效于面对面 CBT。本研究进行了间接治疗比较 (ITC) 荟萃分析,使用标准化均数差 (SMD) 评估两者之间的相对效果,并计算有效样本量 (ESS) 和所需样本量 (RSS) 以说明结果的稳健性。我们的研究发现,与面对面 CBT 相比,数字 CBT 在降低头痛频率方面效果相似(SMD,0.12;95%CI,-2.45 至 2.63)。ESS 为 84 名参与者,而要达到与非劣效性头对头试验相同的检验效能,RSS 为 466 名参与者。在预防头痛疾病方面,数字 CBT 与面对面 CBT 同样有效。鉴于异质性较高(I2 = 94.5%,τ2 = 1.83),且纳入的研究大多针对偏头痛预防,有必要开展进一步的头对头试验。
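
As a rough illustration of how an indirect treatment comparison works, the sketch below applies the Bucher adjusted indirect comparison, a common ITC technique: two interventions are each compared with a shared comparator, the indirect effect is the difference of the two SMDs, and their variances add. The abstract does not state which ITC method the authors used, so the choice of method, the function name `bucher_indirect`, and all input numbers are assumptions made purely for illustration.

```python
import math

def bucher_indirect(smd_a, se_a, smd_b, se_b, z=1.96):
    """Indirect effect of A vs. B through a common comparator (Bucher method).

    smd_a, se_a: SMD and standard error of A vs. the comparator
    smd_b, se_b: SMD and standard error of B vs. the comparator
    Returns (indirect SMD, lower 95% bound, upper 95% bound).
    """
    diff = smd_a - smd_b
    # The two direct estimates are assumed independent, so variances add.
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - z * se, diff + z * se

# Hypothetical inputs, NOT taken from the paper.
effect, low, high = bucher_indirect(smd_a=-0.50, se_a=0.30, smd_b=-0.60, se_b=0.40)
print(f"indirect SMD = {effect:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Because the variances add, an indirect estimate is less precise than either direct comparison it is built from, which is one reason indirect comparisons tend to produce wide confidence intervals.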


118. Blending space and time to talk about cancer in extended reality.

融合空间和时间,在扩展现实中谈论癌症。

PMID: 39343807 | DOI: 10.1038/s41746-024-01262-x | 日期: 2024-09-29

摘要: We introduce a proof-of-concept extended reality (XR) environment for discussing cancer, presenting genomic information from multiple tumour sites in the context of 3D tumour models generated from CT scans. This tool enhances multidisciplinary discussions. Clinicians and cancer researchers explored its use in oncology, sharing perspectives on XR's potential for use in molecular tumour boards, clinician-patient communication, and education. XR serves as a universal language, fostering collaborative decision-making in oncology.

中文摘要: 我们引入了一个用于讨论癌症的概念验证扩展现实 (XR) 环境,在 CT 扫描生成的 3D 肿瘤模型的背景下呈现来自多个肿瘤部位的基因组信息。该工具增强了多学科讨论。临床医生和癌症研究人员探讨了 XR 在肿瘤学中的应用,分享了 XR 在分子肿瘤委员会、临床医生与患者沟通和教育方面的潜力。 XR 作为一种通用语言,促进肿瘤学领域的协作决策。


119. Deep learning for identifying personal and family history of suicidal thoughts and behaviors from EHRs.

深度学习用于从电子病历中识别个人和家族的自杀想法和行为史。

PMID: 39341983 | DOI: 10.1038/s41746-024-01266-7 | 日期: 2024-09-28

摘要: Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with suicides. Research is limited in automatic identification of such data from clinical notes in Electronic Health Records. This study developed deep learning (DL) tools utilizing transformer models (Bio_ClinicalBERT and GatorTron) to detect PSH and FSH in clinical notes derived from three academic medical centers, and compared their performance with a rule-based natural language processing tool. For detecting PSH, the rule-based approach obtained an F1-score of 0.75 ± 0.07, while the Bio_ClinicalBERT and GatorTron DL tools scored 0.83 ± 0.09 and 0.84 ± 0.07, respectively. For detecting FSH, the rule-based approach achieved an F1-score of 0.69 ± 0.11, compared to 0.89 ± 0.10 for Bio_ClinicalBERT and 0.92 ± 0.07 for GatorTron. Across sites, the DL tools identified more than 80% of patients at elevated risk for suicide who remain undiagnosed and untreated.

中文摘要: 个人和家族的自杀想法与行为史(分别为 PSH 和 FSH)是与自杀相关的重要危险因素。目前关于从电子健康记录的临床记录中自动识别此类数据的研究有限。本研究开发了基于 Transformer 模型(Bio_ClinicalBERT 和 GatorTron)的深度学习 (DL) 工具,用于检测来自三个学术医疗中心的临床记录中的 PSH 和 FSH,并将其性能与基于规则的自然语言处理工具进行比较。在检测 PSH 方面,基于规则的方法获得的 F1 分数为 0.75 ± 0.07,而 Bio_ClinicalBERT 和 GatorTron DL 工具的 F1 分数分别为 0.83 ± 0.09 和 0.84 ± 0.07。在检测 FSH 方面,基于规则的方法获得的 F1 分数为 0.69 ± 0.11,而 Bio_ClinicalBERT 为 0.89 ± 0.10,GatorTron 为 0.92 ± 0.07。在各个中心,DL 工具识别出了 80% 以上仍未得到诊断和治疗的自杀高风险患者。


120. Comparison of NLP machine learning models with human physicians for ASA Physical Status classification.

NLP 机器学习模型与人类医生进行 ASA 身体状况分类的比较。

PMID: 39341936 | DOI: 10.1038/s41746-024-01259-6 | 日期: 2024-09-28

摘要: The American Society of Anesthesiologist's Physical Status (ASA-PS) classification system assesses comorbidities before sedation and analgesia, but inconsistencies among raters have hindered its objective use. This study aimed to develop natural language processing (NLP) models to classify ASA-PS using pre-anesthesia evaluation summaries, comparing their performance to human physicians. Data from 717,389 surgical cases in a tertiary hospital (October 2004-May 2023) was split into training, tuning, and test datasets. Board-certified anesthesiologists created reference labels for tuning and test datasets. The NLP models, including ClinicalBigBird, BioClinicalBERT, and Generative Pretrained Transformer 4, were validated against anesthesiologists. The ClinicalBigBird model achieved an area under the receiver operating characteristic curve of 0.915. It outperformed board-certified anesthesiologists with a specificity of 0.901 vs. 0.897, precision of 0.732 vs. 0.715, and F1-score of 0.716 vs. 0.713 (all p <0.01). This approach will facilitate automatic and objective ASA-PS classification, thereby streamlining the clinical workflow.

中文摘要: 美国麻醉医师协会身体状况 (ASA-PS) 分级系统用于在镇静和镇痛前评估合并症,但评估者之间的不一致阻碍了其客观使用。本研究旨在开发自然语言处理 (NLP) 模型,利用麻醉前评估摘要对 ASA-PS 进行分级,并将其表现与人类医生进行比较。来自一家三级医院的 717,389 例手术病例(2004 年 10 月至 2023 年 5 月)的数据被划分为训练、调优和测试数据集。获得专科认证的麻醉医师为调优和测试数据集创建了参考标签。NLP 模型(包括 ClinicalBigBird、BioClinicalBERT 和 Generative Pretrained Transformer 4)以麻醉医师为对照进行了验证。ClinicalBigBird 模型的受试者工作特征曲线下面积为 0.915。它的表现优于获得专科认证的麻醉医师:特异性为 0.901 vs. 0.897,精确率为 0.732 vs. 0.715,F1 分数为 0.716 vs. 0.713(均 p < 0.01)。这种方法将有助于实现自动、客观的 ASA-PS 分级,从而简化临床工作流程。


121. A framework for human evaluation of large language models in healthcare derived from literature review.

来自文献综述的医疗保健中大型语言模型的人类评估框架。

PMID: 39333376 | DOI: 10.1038/s41746-024-01258-7 | 日期: 2024-09-28

摘要: With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.

中文摘要: 随着生成式人工智能 (GenAI),特别是大型语言模型 (LLM) 在医疗保健领域不断取得进展,通过人工评估来评价 LLM 对于确保安全性和有效性至关重要。本研究回顾了各医学专科中有关医疗保健 LLM 人工评估方法的现有文献,并讨论了评估维度、样本类型和规模、评估者的选择和招募、框架和指标、评估过程以及统计分析类型等因素。我们对 142 项研究的文献综述表明,当前的人工评估实践在可靠性、可推广性和适用性方面存在不足。为了克服医疗保健 LLM 开发和部署中的这些重大障碍,我们提出了 QUEST,这是一个全面而实用的 LLM 人工评估框架,涵盖工作流程的三个阶段:规划、实施与裁决、评分与审查。QUEST 的设计基于五项评估原则:信息质量、理解与推理、表达风格与角色、安全与危害、信任与信心。


122. Privacy-preserving large language models for structured medical information retrieval.

用于结构化医疗信息检索的隐私保护大型语言模型。

PMID: 39304709 | DOI: 10.1038/s41746-024-01233-2 | 日期: 2024-09-20

摘要: Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) "Llama 2" to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.

中文摘要: 大多数临床信息以自由文本形式记录,无法直接用于定量分析。本研究提出了一个开源流程,使用本地大型语言模型 (LLM)"Llama 2"从临床文本中提取定量信息,并评估其在识别失代偿期肝硬化特征方面的性能。该 LLM 以零样本和单样本方式从 MIMIC IV 数据集中的 500 份患者病史中识别五个关键临床特征。我们比较了三种规模的 LLM 和多种提示工程方法,并将预测结果与三位设盲医学专家给出的真实标注进行比较。我们的流程实现了高准确度,检测肝硬化的灵敏度为 100%,特异性为 96%。使用 700 亿参数模型检测腹水(95%、95%)、意识模糊(76%、94%)、腹痛(84%、97%)和呼吸短促(87%、97%)时也获得了较高的灵敏度和特异性,该模型的性能优于较小的版本。我们的研究成功证明,本地部署的 LLM 能够在较低硬件要求下从自由文本中提取临床信息。
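
The sensitivity and specificity figures quoted above follow from a standard confusion-matrix calculation. The sketch below shows that arithmetic for a single binary feature; the function name `sensitivity_specificity` and the label vectors are hypothetical and are not drawn from the study's data or code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = feature present).

    Assumes at least one positive and one negative reference label;
    otherwise one of the divisions below would be by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: expert reference labels vs. model output for one feature.
expert = [1, 1, 1, 0, 0, 0, 0, 1]
model = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(expert, model)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Reporting both numbers per feature, as the abstract does, separates missed findings (low sensitivity) from spurious ones (low specificity).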


123. Zero shot health trajectory prediction using transformer.

使用 Transformer 进行零样本健康轨迹预测。

PMID: 39300208 | DOI: 10.1038/s41746-024-01235-0 | 日期: 2024-09-19

摘要: Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare's increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained using Patient Health Timelines (PHTs; detailed, tokenized records of health events) to predict future health trajectories, leveraging a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS' capabilities to incorporate a wider range of data types and data sources. Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.

中文摘要: 将现代机器学习和临床决策相结合对于缓解医疗保健日益增加的成本和复杂性具有巨大的希望。我们介绍了用于健康结果模拟的增强型 Transformer (ETHOS),这是 Transformer 深度学习架构的一种新颖应用,用于分析高维、异构和情景健康数据。 ETHOS 使用患者健康时间线 (PHT)(健康事件的详细、标记化记录)进行训练,以利用零样本学习方法来预测未来的健康轨迹。 ETHOS 代表了医疗保健分析基础模型开发的重大进步,消除了对标记数据和模型微调的需要。 ETHOS 能够模拟各种治疗途径并考虑患者特定因素,使其成为优化护理和解决医疗保健服务偏差的工具。未来的发展将扩展 ETHOS 的功能,以纳入更广泛的数据类型和数据源。我们的工作展示了一条加速医疗保健领域人工智能开发和部署的途径。


124. Regulatory responses and approval status of artificial intelligence medical devices with a focus on China.

以中国为重点的人工智能医疗器械的监管应对和审批状况。

PMID: 39294318 | DOI: 10.1038/s41746-024-01254-x | 日期: 2024-09-18

摘要: This paper focuses on how regulatory bodies respond to artificial intelligence (AI)-enabled medical devices. To achieve this, we present a comparative overview of the United States (USA), European Union (EU), and China. Our search in the governmental database identified 59 AI medical devices approved in China as of July 2023. In comparison to the rules-based regulatory approach in China, the approaches in the USA and EU are more standards-oriented.

中文摘要: 本文重点讨论监管机构如何应对人工智能 (AI) 医疗设备。为了实现这一目标,我们对美国 (USA)、欧盟 (EU) 和中国进行了比较概述。我们在政府数据库中的搜索发现,截至 2023 年 7 月,中国批准了 59 种人工智能医疗器械。与中国基于规则的监管方法相比,美国和欧盟的方法更加以标准为导向。


125. A drug mix and dose decision algorithm for individualized type 2 diabetes management.

用于个体化 2 型糖尿病管理的药物组合和剂量决策算法。

PMID: 39289474 | DOI: 10.1038/s41746-024-01230-5 | 日期: 2024-09-17

摘要: Pharmacotherapy guidelines for type 2 diabetes (T2D) emphasize patient-centered care, but applying this approach effectively in outpatient practice remains challenging. Data-driven treatment optimization approaches could enhance individualized T2D management, but current approaches cannot account for drug-specific and dose-dependent variations in safety and efficacy. We developed and evaluated an AI Drug mix and dose Advisor (AIDA) for glycemic management, using electronic medical records from 107,854 T2D patients in the SingHealth Diabetes Registry. Given a patient's medical profile, AIDA leverages a predict-then-optimize approach to identify the minimal drug mix and dose changes required to optimize glycemic control, subject to clinical knowledge-based guidelines. On unseen data from large internal, external, and temporal validation sets, AIDA recommendations were estimated to improve post-visit glycated hemoglobin (HbA1c) by an average of 0.40-0.68% over standard of care (P < 0.0001). In qualitative evaluations on 60 diverse cases by a panel of three endocrinologists, AIDA recommendations were mostly rated as reasonable and precise. Finally, AIDA's ability to account for drug-dose specifics offered several advantages over competing methods, including greater consistency with practice preferences and clinical guidelines for practical but effective options, indication-based treatments, and renal dosing. As AIDA provides drug-dose recommendations to improve outcomes for individual T2D patients, it could be used for clinical decision support at point-of-care, especially in resource-limited settings.

中文摘要: 2 型糖尿病 (T2D) 的药物治疗指南强调以患者为中心的护理,但在门诊实践中有效应用这种方法仍然具有挑战性。数据驱动的治疗优化方法可以加强个体化 T2D 管理,但目前的方法无法考虑安全性和有效性方面因药物而异、随剂量变化的差异。我们利用 SingHealth 糖尿病登记库中 107,854 名 T2D 患者的电子病历,开发并评估了用于血糖管理的 AI 药物组合与剂量顾问 (AIDA)。给定患者的医疗档案,AIDA 采用先预测后优化的方法,在遵循基于临床知识的指南的前提下,确定优化血糖控制所需的最小药物组合和剂量调整。在来自大型内部、外部和时间验证集的未见数据上,据估计 AIDA 的建议可使就诊后糖化血红蛋白 (HbA1c) 较标准治疗平均多改善 0.40-0.68% (P < 0.0001)。在由三名内分泌科医生组成的小组对 60 个不同病例进行的定性评估中,AIDA 的建议大多被评为合理且精确。最后,AIDA 能够考虑药物剂量的具体细节,这使其相比其他方法具有多项优势,包括在实用而有效的方案、基于适应症的治疗和肾功能剂量调整方面与实践偏好和临床指南更为一致。由于 AIDA 提供药物剂量建议以改善个体 T2D 患者的结局,它可用于诊疗现场的临床决策支持,特别是在资源有限的环境中。


126. A systematic review and meta analysis on digital mental health interventions in inpatient settings.

对住院环境中数字心理健康干预措施的系统回顾和荟萃分析。

PMID: 39289463 | DOI: 10.1038/s41746-024-01252-z | 日期: 2024-09-17

摘要: E-mental health (EMH) interventions gain increasing importance in the treatment of mental health disorders. Their outpatient efficacy is well-established. However, research on EMH in inpatient settings remains sparse and lacks a meta-analytic synthesis. This paper presents a meta-analysis on the efficacy of EMH in inpatient settings. Searching multiple databases (PubMed, ScienceGov, PsycInfo, CENTRAL, references), 26 randomized controlled trial (RCT) EMH inpatient studies (n = 6112) with low or medium assessed risk of bias were included. A small significant total effect of EMH treatment was found (g = 0.3). The effect was significant both for blended interventions (g = 0.42) and post-treatment EMH-based aftercare (g = 0.29). EMH treatment yielded significant effects across different patient groups and types of therapy, and the effects remained stable post-treatment. The results show the efficacy of EMH treatment in inpatient settings. The meta-analysis is limited by the small number of included studies.

中文摘要: 电子心理健康 (EMH) 干预措施在心理健康障碍的治疗中日益重要。其在门诊环境中的疗效已得到充分证实。然而,关于住院环境中 EMH 的研究仍然很少,并且缺乏荟萃分析综合。本文对 EMH 在住院环境中的疗效进行了荟萃分析。通过检索多个数据库(PubMed、ScienceGov、PsycInfo、CENTRAL、参考文献),纳入了 26 项经评估偏倚风险为低或中等的 EMH 住院患者随机对照试验 (RCT) 研究 (n = 6112)。结果发现 EMH 治疗具有较小但显著的总体效应 (g = 0.3)。混合干预 (g = 0.42) 和治疗后基于 EMH 的后续护理 (g = 0.29) 的效应均显著。EMH 治疗在不同的患者群体和治疗类型中均产生了显著效应,并且治疗后效应保持稳定。结果表明 EMH 治疗在住院环境中有效。本荟萃分析的局限在于纳入研究的数量较少。
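
The effect sizes above are reported as g, which conventionally denotes Hedges' g: a standardized mean difference with a small-sample bias correction. The sketch below shows the usual per-study calculation; the abstract does not describe how the authors computed or pooled their effect sizes, so the function name `hedges_g` and the example values are assumptions for illustration only.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' g for a treatment group vs. a control group."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd   # Cohen's d
    correction = 1 - 3 / (4 * df - 1)   # common approximation of the bias factor J
    return d * correction

# Hypothetical study, NOT from the meta-analysis: improvement scores where
# higher is better, 50 participants per arm.
g = hedges_g(mean_t=12.0, sd_t=5.0, n_t=50, mean_c=10.0, sd_c=5.0, n_c=50)
print(f"g = {g:.3f}")
```

With 50 participants per arm the correction shrinks Cohen's d by less than 1%; it matters mainly for small trials.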


127. Deep behavioural representation learning reveals risk profiles for malignant ventricular arrhythmias.

深度行为表征学习揭示了恶性室性心律失常的风险概况。

PMID: 39284923 | DOI: 10.1038/s41746-024-01247-w | 日期: 2024-09-16

摘要: We aimed to identify and characterise behavioural profiles in patients at high risk of SCD, by using deep representation learning of day-to-day behavioural recordings. We present a pipeline that employed unsupervised clustering on low-dimensional representations of behavioural time-series data learned by a convolutional residual variational neural network (ResNet-VAE). Data from the prospective, observational SafeHeart study conducted at two large tertiary university centers in the Netherlands and Denmark were used. Patients received an implantable cardioverter-defibrillator (ICD) between May 2021 and September 2022 and wore wearable devices using accelerometer technology during 180 consecutive days. A total of 272 patients (mean age of 63.1 ± 10.2 years, 81% male) were eligible with a total sampling of 37,478 days of behavioural data (138 ± 47 days per patient). Deep representation learning identified five distinct behavioural profiles: Cluster A (n = 46) had very low physical activity levels and a disturbed sleep pattern. Cluster B (n = 70) had high activity levels, mainly at light-to-moderate intensity. Cluster C (n = 63) exhibited a high-intensity activity profile. Cluster D (n = 51) showed above-average sleep efficiency. Cluster E (n = 42) had frequent waking episodes and poor sleep. Annual risks of malignant ventricular arrhythmias ranged from 30.4% in Cluster A to 9.8% and 9.5% for Clusters D-E, respectively. Compared to low-risk profiles (D-E), Cluster A demonstrated a three-to-four fold increased risk of malignant ventricular arrhythmias adjusted for clinical covariates (adjusted HR 3.63, 95% CI 1.54-8.53, p < 0.001). These behavioural profiles may guide more personalised approaches to ventricular arrhythmia and SCD prevention.

中文摘要: 我们旨在通过对日常行为记录进行深度表征学习,识别并刻画心源性猝死 (SCD) 高风险患者的行为特征。我们提出了一个流程,对卷积残差变分神经网络 (ResNet-VAE) 学习到的行为时间序列数据的低维表示进行无监督聚类。所用数据来自在荷兰和丹麦两家大型三级大学医疗中心开展的前瞻性观察性 SafeHeart 研究。患者在 2021 年 5 月至 2022 年 9 月期间接受了植入式心律转复除颤器 (ICD),并连续 180 天佩戴采用加速度计技术的可穿戴设备。共有 272 名患者(平均年龄 63.1 ± 10.2 岁,81% 为男性)符合条件,共采集了 37,478 天的行为数据(每位患者 138 ± 47 天)。深度表征学习识别出五种不同的行为特征:A 组 (n = 46) 的体力活动水平非常低,且睡眠模式紊乱。B 组 (n = 70) 的活动水平较高,主要为轻度至中等强度。C 组 (n = 63) 表现出高强度的活动特征。D 组 (n = 51) 的睡眠效率高于平均水平。E 组 (n = 42) 频繁醒来且睡眠质量差。恶性室性心律失常的年风险从 A 组的 30.4% 到 D 组和 E 组的 9.8% 和 9.5% 不等。与低风险特征(D-E 组)相比,经临床协变量调整后,A 组发生恶性室性心律失常的风险增加了三至四倍(调整后 HR 3.63,95% CI 1.54-8.53,p < 0.001)。这些行为特征可为室性心律失常和 SCD 的预防提供更个性化的指导。


128. How might Hospital at Home enable a greener and healthier future?

家庭医院如何实现更绿色、更健康的未来?

PMID: 39284871 | DOI: 10.1038/s41746-024-01249-8 | 日期: 2024-09-16

摘要: Traditional healthcare delivery models face mounting pressure from rising costs, increasing demand, and a growing environmental footprint. Hospital at Home (HaH) has been proposed as a potential solution, offering care at home through in-person, virtual, or hybrid approaches. Despite focus on expanding HaH provision and capacity, research has primarily explored patient care outcomes, patient satisfaction, and economic costs, with a key gap in its environmental impact. By reducing this evidence gap, HaH may be better placed as a positive enabler in delivering a healthier planet and population. This article explores the environmental opportunities and challenges associated with HaH compared to traditional hospital care and reinforces the case for further research to comprehensively quantify the environmental impact including any co-benefits. Our aim for this article is to spark conversation, and begin to help prioritise future research and analysis.

中文摘要: 传统的医疗保健服务模式面临着成本上升、需求增加和环境足迹不断扩大带来的日益沉重的压力。家庭医院 (HaH) 被提议作为一种潜在的解决方案,通过面对面、虚拟或混合的方式在家中提供护理。尽管各方关注扩大 HaH 的供给和能力,但研究主要探讨了患者护理结果、患者满意度和经济成本,在其环境影响方面存在关键空白。通过缩小这一证据差距,HaH 可能更有条件成为实现更健康的地球和人群的积极推动力。本文探讨了与传统医院护理相比 HaH 所带来的环境机遇和挑战,并进一步论证了开展更多研究以全面量化其环境影响(包括任何协同效益)的必要性。我们写这篇文章的目的是引发讨论,并着手帮助确定未来研究和分析的优先顺序。


129. Variational Bayes machine learning for risk adjustment of general outcome indicators with examples in urology.

变分贝叶斯机器学习用于一般结果指标的风险调整,并以泌尿科为例。

PMID: 39277683 | DOI: 10.1038/s41746-024-01244-z | 日期: 2024-09-14

摘要: Risk adjustment is often necessary for outcome quality indicators (QIs) to provide fair and accurate feedback to healthcare professionals. However, traditional risk adjustment models are generally oversimplified and not equipped to disentangle complex factors influencing outcomes that are out of a healthcare professional's control. We present VIRGO, a novel variational Bayes model trained on routinely collected, large administrative datasets to risk-adjust outcome QIs. VIRGO uses detailed demographics, diagnosis, and procedure codes to provide individualized risk adjustment and explanations on patient factors affecting outcomes. VIRGO achieves state-of-the-art on external datasets and features capabilities of uncertainty expression, explainable features, and counterfactual analysis capabilities. VIRGO facilitates risk adjustment by explaining how patient factors led to adverse outcomes and expresses the uncertainty of each prediction, allowing healthcare professionals to not only explore patient factors with unexplained variance that are associated with worse outcomes but also reflect on the quality of their clinical practice.

中文摘要: 结果质量指标 (QI) 通常需要进行风险调整,才能为医疗保健专业人员提供公平、准确的反馈。然而,传统的风险调整模型通常过于简化,无法厘清那些影响结果但超出医疗保健专业人员控制范围的复杂因素。我们提出了 VIRGO,这是一种新的变分贝叶斯模型,在常规收集的大型管理数据集上训练,用于对结果 QI 进行风险调整。VIRGO 使用详细的人口统计学信息、诊断和操作编码,提供个体化的风险调整,并对影响结果的患者因素给出解释。VIRGO 在外部数据集上达到了最先进的水平,并具备不确定性表达、可解释特征和反事实分析能力。VIRGO 通过解释患者因素如何导致不良结果并表达每个预测的不确定性来辅助风险调整,使医疗保健专业人员不仅能够探究与较差结果相关、具有未解释方差的患者因素,还能够反思自身临床实践的质量。


130. Results and implications for generative AI in a large introductory biomedical and health informatics course.

大型生物医学和健康信息学入门课程中生成式人工智能的结果和影响。

PMID: 39271955 | DOI: 10.1038/s41746-024-01251-0 | 日期: 2024-09-13

摘要: Generative artificial intelligence (AI) systems have performed well at many biomedical tasks, but few studies have assessed their performance directly compared to students in higher-education courses. We compared student knowledge-assessment scores with prompting of 6 large-language model (LLM) systems as they would be used by typical students in a large online introductory course in biomedical and health informatics that is taken by graduate, continuing education, and medical students. The state-of-the-art LLM systems were prompted to answer multiple-choice questions (MCQs) and final exam questions. We compared the scores for 139 students (30 graduate students, 85 continuing education students, and 24 medical students) to the LLM systems. All of the LLMs scored between the 50th and 75th percentiles of students for MCQ and final exam questions. The performance of LLMs raises questions about student assessment in higher education, especially in courses that are knowledge-based and online.

中文摘要: 生成式人工智能 (AI) 系统在许多生物医学任务中表现良好,但很少有研究将其表现与高等教育课程中的学生直接比较。在一门面向研究生、继续教育学员和医学生的大型生物医学与健康信息学在线入门课程中,我们按照典型学生的使用方式对 6 个大型语言模型 (LLM) 系统进行提示,并将其结果与学生的知识评估分数进行比较。这些最先进的 LLM 系统被提示回答多项选择题 (MCQ) 和期末考试题。我们将 139 名学生(30 名研究生、85 名继续教育学员和 24 名医学生)的分数与 LLM 系统进行了比较。所有 LLM 在 MCQ 和期末考试题上的得分都位于学生的第 50 至第 75 百分位之间。LLM 的表现引发了关于高等教育中学生评估的问题,尤其是在以知识为基础的在线课程中。


131. A randomized trial testing digital medicine support models for mild-to-moderate alcohol use disorder.

一项针对轻度至中度酒精使用障碍测试数字医学支持模式的随机试验。

PMID: 39271938 | DOI: 10.1038/s41746-024-01241-2 | 日期: 2024-09-14

摘要: This paper reports the results of a hybrid effectiveness-implementation randomized trial that systematically varied levels of human oversight required to support the implementation of a digital medicine intervention for persons with mild-to-moderate alcohol use disorder (AUD). Participants were randomly assigned to three groups representing possible digital health support models within a health system: self-monitored use (SM; n = 185), peer-supported use (PS; n = 186), or a clinically integrated model (CI; n = 187). Across all three groups, the percentage of self-reported heavy drinking days dropped from 38.4% at baseline (95% CI [35.8%, 41%]) to 22.5% (19.5%, 25.5%) at 12 months. The clinically integrated group showed significant improvements in mental health and quality of life compared to the self-monitoring group (p = 0.011). However, higher attrition rates in the clinically integrated group warrant consideration in interpreting this result. Results suggest that making a self-guided digital intervention available to patients may be a viable option for health systems looking to promote alcohol risk reduction. This study was prospectively registered at clinicaltrials.gov on 7/03/2019 (NCT04011644).

中文摘要: 本文报告了一项有效性-实施混合型随机试验的结果,该试验系统地改变了为轻度至中度酒精使用障碍 (AUD) 患者实施数字医学干预所需的人工监督水平。参与者被随机分配到三组,分别代表卫生系统内可能采用的数字健康支持模式:自我监测使用(SM;n = 185)、同伴支持使用(PS;n = 186)或临床整合模式(CI;n = 187)。在所有三组中,自我报告的重度饮酒日百分比从基线时的 38.4%(95% CI [35.8%,41%])下降至 12 个月时的 22.5%(19.5%,25.5%)。与自我监测组相比,临床整合组的心理健康和生活质量有显著改善 (p = 0.011)。然而,在解读这一结果时需要考虑临床整合组较高的脱落率。结果表明,向患者提供自助式数字干预可能是希望推动降低酒精风险的卫生系统的一个可行选择。该研究于 2019 年 7 月 3 日在 clinicaltrials.gov 前瞻性注册 (NCT04011644)。


132. Long-term changes in wearable sensor data in people with and without Long Covid.

长新冠患者与非长新冠者可穿戴传感器数据的长期变化。

PMID: 39271927 | DOI: 10.1038/s41746-024-01238-x | 日期: 2024-09-13

摘要: To better understand the impact of Long COVID on an individual, we explored changes in daily wearable data (step count, resting heart rate (RHR), and sleep quantity) for up to one year in individuals relative to their pre-infection baseline among 279 people with and 274 without Long COVID. Participants with Long COVID, defined as symptoms lasting for 30 days or longer following a SARS-CoV-2 infection, had significantly different RHR and activity trajectories than those who did not report Long COVID and were also more likely to be women, younger, unvaccinated, and report more acute-phase (first 2 weeks) symptoms than those without Long COVID. Demographic, vaccine, and acute-phase sensor data differences could be used for early identification of individuals most likely to develop Long COVID complications and track objective evidence of the therapeutic efficacy of any interventions. Trial Registration: https://classic.clinicaltrials.gov/ct2/show/NCT04336020.

中文摘要: 为了更好地了解长新冠 (Long COVID) 对个人的影响,我们在 279 名长新冠患者和 274 名非长新冠者中,研究了每日可穿戴数据(步数、静息心率 (RHR) 和睡眠量)相对于感染前基线在长达一年内的变化。长新冠定义为感染 SARS-CoV-2 后症状持续 30 天或更长时间。与未报告长新冠的参与者相比,长新冠参与者的 RHR 和活动轨迹显著不同,并且更可能是女性、更年轻、未接种疫苗,在急性期(前 2 周)报告的症状也更多。人口统计学、疫苗和急性期传感器数据的差异可用于早期识别最有可能出现长新冠并发症的个体,并追踪任何干预措施治疗效果的客观证据。试验注册:https://classic.clinicaltrials.gov/ct2/show/NCT04336020。


133. Can social media encourage diabetes self-screenings? A randomized controlled trial with Indonesian Facebook users.

社交媒体可以鼓励糖尿病自我筛查吗?一项针对印度尼西亚 Facebook 用户的随机对照试验。

PMID: 39271847 | DOI: 10.1038/s41746-024-01246-x | 日期: 2024-09-13

摘要: Nudging individuals without obvious symptoms of non-communicable diseases (NCDs) to undergo a health screening remains a challenge, especially in middle-income countries, where NCD awareness is low but the incidence is high. We assess whether an awareness campaign implemented on Facebook can encourage individuals in Indonesia to undergo an online diabetes self-screening. We use Facebook's advertisement function to randomly distribute graphical ads related to the risk and consequences of diabetes. Depending on their risk score, participants receive a recommendation to undergo a professional screening. We were able to reach almost 300,000 individuals in only three weeks. More than 1400 individuals completed the screening, inducing costs of about US$0.75 per person. The two ads labeled "diabetes consequences" and "shock" outperform all other ads. A follow-up survey shows that many high-risk respondents have scheduled a professional screening. A cost-effectiveness analysis suggests that our campaign can diagnose an additional person with diabetes for about US$9.

中文摘要: 促使没有明显非传染性疾病(NCD)症状的个人接受健康筛查仍然是一个挑战,特别是在中等收入国家,这些国家对非传染性疾病的认识较低,但发病率却很高。我们评估在 Facebook 上实施的宣传活动是否可以鼓励印度尼西亚个人进行在线糖尿病自我筛查。我们利用 Facebook 的广告功能随机分发与糖尿病风险和后果相关的图形广告。根据他们的风险评分,参与者会收到接受专业筛查的建议。我们在短短三周内就接触到了近 300,000 人。超过 1400 人完成了筛查,每人费用约为 0.75 美元。标有"糖尿病后果"和"震惊"的两个广告优于所有其他广告。后续调查显示,许多高危受访者已经安排了专业筛查。成本效益分析表明,我们的活动可以花费约 9 美元诊断一名额外的糖尿病患者。


134. A randomized controlled trial investigating experiential virtual reality communication on prudent antibiotic use.

一项随机对照试验,调查谨慎使用抗生素的体验式虚拟现实交流。

PMID: 39266716 | DOI: 10.1038/s41746-024-01240-3 | 日期: 2024-09-12

摘要: Antimicrobial resistance (AMR) is a global health threat. This randomized controlled trial evaluates the impact of experiential virtual reality (VR) versus information provision via VR or leaflet on prudent antibiotic use. A total of 249 (239 analyzed) participants were randomized into three conditions: VR Information + Experience, VR Information, or Leaflet Information. All participants received AMR information, while those in the VR Information + Experience condition additionally engaged in a game, making treatment decisions for their virtual avatar's infection. Participants in the VR Information + Experience condition showed a significant increase in prudent use intentions from baseline (d = 1.48). This increase was significantly larger compared to the VR Information (d = 0.50) and Leaflet Information (d = 0.79) conditions. The increase in intentions from baseline remained significant at follow-up in the VR Information + Experience condition (d = 1.25). Experiential VR communication shows promise for promoting prudent antibiotics use.

中文摘要: 抗菌药物耐药性 (AMR) 是一个全球性的健康威胁。这项随机对照试验评估了体验式虚拟现实 (VR) 与通过 VR 或传单提供信息相比,对谨慎使用抗生素的影响。共有 249 名参与者(分析了 239 名)被随机分配到三种条件:VR 信息 + 体验、VR 信息或传单信息。所有参与者都获得了 AMR 信息,而 VR 信息 + 体验条件下的参与者还参与了一个游戏,为其虚拟化身的感染做出治疗决策。VR 信息 + 体验条件下的参与者的谨慎使用意向较基线显著增加 (d = 1.48)。与 VR 信息 (d = 0.50) 和传单信息 (d = 0.79) 条件相比,这一增幅显著更大。在 VR 信息 + 体验条件下,随访时意向相对于基线的增加仍然显著 (d = 1.25)。体验式 VR 交流有望促进抗生素的谨慎使用。


135. Ethical debates amidst flawed healthcare artificial intelligence metrics.

有缺陷的医疗保健人工智能指标中的道德争论。

PMID: 39261642 | DOI: 10.1038/s41746-024-01242-1 | 日期: 2024-09-11

摘要: Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial. To address these fundamental shortcomings, a paradigm shift in AI assessment is needed, prioritizing actual patient outcomes over conventional benchmarking.

中文摘要: 医疗保健人工智能在选择性部署和公平部署之间面临着伦理困境,而有缺陷的性能指标则加剧了这一困境。这些指标不足以捕捉现实世界的复杂性和偏差,导致过早断言有效性。改进评估实践,包括持续监控和静默评估期,至关重要。为了解决这些根本缺陷,需要对人工智能评估进行范式转变,将实际患者结局置于传统基准测试之上。


136. Understanding activity and physiology at scale: The Apple Heart & Movement Study.

大规模了解活动和生理学:Apple 心脏和运动研究。

PMID: 39256546 | DOI: 10.1038/s41746-024-01187-5 | 日期: 2024-09-10

摘要: Physical activity or structured exercise is beneficial in a wide range of circumstances. Nevertheless, individual-level data on differential responses to various types of activity are not yet sufficient in scale, duration or level of annotation to understand the mechanisms of discrete outcomes nor to support personalized recommendations. The Apple Heart & Movement Study was designed to passively collect the dense physiologic data accessible on Apple Watch and iPhone from a large real-world cohort distributed across the US in order to address these knowledge gaps.

中文摘要: 体力活动或结构化锻炼在很多情况下都是有益的。然而,关于对各种类型活动的差异反应的个人层面的数据在规模、持续时间或注释水平上还不足以理解离散结果的机制,也不足以支持个性化建议。 Apple 心脏与运动研究旨在被动地从分布在美国各地的大量现实世界队列中收集 Apple Watch 和 iPhone 上可访问的密集生理数据,以弥补这些知识差距。


137. Transatlantic transferability and replicability of machine-learning algorithms to predict mental health crises.

机器学习算法预测心理健康危机的跨大西洋可转移性和可复制性。

PMID: 39251868 | DOI: 10.1038/s41746-024-01203-8 | 日期: 2024-09-09

摘要: Transferring and replicating predictive algorithms across healthcare systems constitutes a unique yet crucial challenge that needs to be addressed to enable the widespread adoption of machine learning in healthcare. In this study, we explored the impact of important differences across healthcare systems and the associated Electronic Health Records (EHRs) on machine-learning algorithms to predict mental health crises, up to 28 days in advance. We evaluated both the transferability and replicability of such machine learning models, and for this purpose, we trained six models using features and methods developed on EHR data from the Birmingham and Solihull Mental Health NHS Foundation Trust in the UK. These machine learning models were then used to predict the mental health crises of 2907 patients seen at the Rush University System for Health in the US between 2018 and 2020. The best one was trained on a combination of US-specific structured features and frequency features from anonymized patient notes and achieved an AUROC of 0.837. A model with comparable performance, originally trained using UK structured data, was transferred and then tuned using US data, achieving an AUROC of 0.826. Our findings establish the feasibility of transferring and replicating machine learning models to predict mental health crises across diverse hospital systems.

中文摘要: 在不同医疗保健系统之间转移和复制预测算法是一个独特而关键的挑战,必须加以解决,才能实现机器学习在医疗保健领域的广泛采用。在这项研究中,我们探讨了医疗保健系统及其相关电子健康记录 (EHR) 之间的重要差异对用于预测心理健康危机(最多提前 28 天)的机器学习算法的影响。我们评估了此类机器学习模型的可转移性和可复制性,为此,我们使用基于英国伯明翰和索利哈尔心理健康 NHS 基金会信托 (Birmingham and Solihull Mental Health NHS Foundation Trust) 的 EHR 数据开发的特征和方法训练了 6 个模型。然后,这些机器学习模型被用来预测 2018 年至 2020 年间在美国拉什大学健康系统 (Rush University System for Health) 就诊的 2907 名患者的心理健康危机。表现最好的模型使用美国特有的结构化特征与匿名患者笔记中的频率特征的组合进行训练,AUROC 达到 0.837。另一个性能相当的模型最初使用英国结构化数据训练,转移后再使用美国数据进行调优,AUROC 达到 0.826。我们的研究结果确立了在不同医院系统之间转移和复制机器学习模型以预测心理健康危机的可行性。


138. Adherence to non-pharmaceutical interventions following COVID-19 vaccination: a federated cohort study.

接种 COVID-19 疫苗后对非药物干预措施的依从性:一项联邦队列研究。

PMID: 39251821 | DOI: 10.1038/s41746-024-01223-4 | 日期: 2024-09-10

摘要: In pandemic mitigation, strategies such as social distancing and mask-wearing are vital to prevent disease resurgence. Yet, monitoring adherence is challenging, as individuals might be reluctant to share behavioral data with public health authorities. To address this challenge and demonstrate a framework for conducting observational research with sensitive data in a privacy-conscious manner, we employ a privacy-centric epidemiological study design: the federated cohort. This approach leverages recent computational advances to allow for distributed participants to contribute to a prospective, observational research study while maintaining full control of their data. We apply this strategy here to explore pandemic intervention adherence patterns. Participants (n = 3808) were enrolled in our federated cohort via the "Google Health Studies" mobile application. Participants completed weekly surveys and contributed empirically measured mobility data from their Android devices between November 2020 to August 2021. Using federated analytics, differential privacy, and secure aggregation, we analyzed data in five 6-week periods, encompassing the pre- and post-vaccination phases. Our results showed that participants largely utilized non-pharmaceutical intervention strategies until they were fully vaccinated against COVID-19, except for individuals without plans to become vaccinated. Furthermore, this project offers a blueprint for conducting a federated cohort study and engaging in privacy-preserving research during a public health emergency.

中文摘要: 在缓解大流行病方面,保持社交距离和戴口罩等策略对于防止疾病卷土重来至关重要。然而,监测依从性具有挑战性,因为个人可能不愿意与公共卫生当局分享行为数据。为了应对这一挑战,并展示一个以注重隐私的方式利用敏感数据开展观察性研究的框架,我们采用了以隐私为中心的流行病学研究设计:联邦队列 (federated cohort)。这种方法利用最新的计算进展,使分布在各处的参与者能够为前瞻性观察性研究做出贡献,同时保持对其数据的完全控制。我们在这里应用这一策略来探索大流行干预措施的依从模式。参与者 (n = 3808) 通过"Google Health Studies"移动应用程序加入我们的联邦队列。在 2020 年 11 月至 2021 年 8 月期间,参与者完成了每周调查,并提供了由其 Android 设备实际测量的出行数据。我们使用联邦分析、差分隐私和安全聚合,分析了五个为期 6 周的时段的数据,涵盖疫苗接种前和接种后阶段。我们的结果显示,除了没有接种计划的个人外,参与者在完全接种 COVID-19 疫苗之前大多采用非药物干预策略。此外,该项目还为在突发公共卫生事件期间开展联邦队列研究和隐私保护研究提供了蓝图。


139. Closing the gap between open source and commercial large language models for medical evidence summarization.

缩小用于医学证据总结的开源和商业大型语言模型之间的差距。

PMID: 39251804 | DOI: 10.1038/s41746-024-01239-w | 日期: 2024-09-09

摘要: Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to the proprietary ones. In this study, we investigated to what extent fine-tuning open-source LLMs can further improve their performance. Utilizing a benchmark dataset, MedReview, consisting of 8161 pairs of systematic reviews and summaries, we fine-tuned three broadly-used, open-sourced LLMs, namely PRIMERA, LongT5, and Llama-2. Overall, the performance of open-source models was all improved after fine-tuning. The performance of fine-tuned LongT5 is close to GPT-3.5 with zero-shot settings. Furthermore, smaller fine-tuned models sometimes even demonstrated superior performance compared to larger zero-shot models. The above trends of improvement were manifested in both a human evaluation and a larger-scale GPT4-simulated evaluation.

中文摘要: 大型语言模型(LLM)在总结医学证据方面具有广阔的前景。近期的研究大多侧重于专有 LLM 的应用。使用专有 LLM 会带来多种风险因素,包括缺乏透明度和供应商依赖。虽然开源 LLM 具有更好的透明度和可定制性,但其性能不及专有 LLM。在这项研究中,我们调查了微调开源 LLM 可以在多大程度上进一步提高其性能。利用由 8161 对系统评价和摘要组成的基准数据集 MedReview,我们对三个广泛使用的开源 LLM 进行了微调,即 PRIMERA、LongT5 和 Llama-2。总体来看,开源模型经过微调后性能均得到提升。微调后的 LongT5 的性能接近零样本设置下的 GPT-3.5。此外,较小的微调模型有时甚至优于较大的零样本模型。上述改进趋势在人工评估和更大规模的 GPT4 模拟评估中都得到了体现。


140. Navigating the EU AI Act: implications for regulated digital medical products.

应对欧盟人工智能法案:对受监管数字医疗产品的影响。

PMID: 39242831 | DOI: 10.1038/s41746-024-01232-3 | 日期: 2024-09-06

摘要: The newly adopted EU AI Act represents a pivotal milestone that heralds a new era of AI regulation across industries. With its broad territorial scope and applicability, this comprehensive legislation establishes stringent requirements for AI systems. In this article, we analyze the AI Act's impact on digital medical products, such as medical devices: How does the AI Act apply to AI/ML-enabled medical devices? How are they classified? What are the compliance requirements? And, what are the obligations of 'providers' of these AI systems? After addressing these foundational questions, we discuss the AI Act's broader implications for the future of regulated digital medical products.

中文摘要: 新通过的欧盟《人工智能法案》是一个关键的里程碑,预示着跨行业人工智能监管新时代的到来。凭借其广泛的地域范围和适用性,这项全面的立法对人工智能系统提出了严格的要求。在本文中,我们分析了《人工智能法案》对医疗器械等数字医疗产品的影响:《人工智能法案》如何适用于支持人工智能/机器学习的医疗器械?它们是如何分类的?合规要求是什么?这些人工智能系统的"提供者"又有哪些义务?在解决这些基本问题后,我们讨论了《人工智能法案》对受监管数字医疗产品未来的更广泛影响。


141. Establishing the longitudinal hemodynamic mapping framework for wearable-driven coronary digital twins.

为可穿戴设备驱动的冠状动脉数字孪生建立纵向血流动力学映射框架。

PMID: 39242829 | DOI: 10.1038/s41746-024-01216-3 | 日期: 2024-09-06

摘要: Understanding the evolving nature of coronary hemodynamics is crucial for early disease detection and monitoring progression. We require digital twins that mimic a patient's circulatory system by integrating continuous physiological data and computing hemodynamic patterns over months. Current models match clinical flow measurements but are limited to single heartbeats. To this end, we introduced the longitudinal hemodynamic mapping framework (LHMF), designed to tackle critical challenges: (1) computational intractability of explicit methods; (2) boundary conditions reflecting varying activity states; and (3) accessible computing resources for clinical translation. We show negligible error (0.0002-0.004%) between LHMF and explicit data of 750 heartbeats. We deployed LHMF across traditional and cloud-based platforms, demonstrating high-throughput simulations on heterogeneous systems. Additionally, we established LHMFC, where hemodynamically similar heartbeats are clustered to avoid redundant simulations, accurately reconstructing longitudinal hemodynamic maps (LHMs). This study captured 3D hemodynamics o