NPJ Digital Medicine 2024年文章汇总(3)
共收录文章:358 篇
261. RETFound-enhanced community-based fundus disease screening: real-world evidence and decision curve analysis.
RETFound 增强的基于社区的眼底疾病筛查:真实世界证据和决策曲线分析。
PMID: 38693205 | DOI: 10.1038/s41746-024-01109-5 | 日期: 2024-04-30
摘要: Visual impairments and blindness are major public health concerns globally. Effective eye disease screening aided by artificial intelligence (AI) is a promising countermeasure, although it is challenged by practical constraints such as poor image quality in community screening. The recently developed ophthalmic foundation model RETFound has shown higher accuracy in retinal image recognition tasks. This study developed an RETFound-enhanced deep learning (DL) model for multiple-eye disease screening using real-world images from community screenings. Our results revealed that our DL model improved the sensitivity and specificity by over 15% compared with commercial models. Our model also shows better generalisation ability than AI models developed using traditional processes. Additionally, decision curve analysis underscores the higher net benefit of employing our model in both urban and rural settings in China. These findings indicate that the RETFound-enhanced DL model can achieve a higher net benefit in community-based screening, advocating its adoption in low- and middle-income countries to address global eye health challenges.
中文摘要: 视力障碍和失明是全球主要的公共卫生问题。人工智能(AI)辅助下的有效眼病筛查是一种很有前途的对策,尽管它受到社区筛查中图像质量差等实际限制的挑战。最近开发的眼科基础模型 RETFound 在视网膜图像识别任务中表现出更高的准确性。本研究开发了一种 RETFound 增强型深度学习 (DL) 模型,使用社区筛查的真实图像进行多种眼病筛查。我们的结果表明,与商业模型相比,我们的 DL 模型将灵敏度和特异性提高了 15% 以上。我们的模型还表现出比使用传统流程开发的人工智能模型更好的泛化能力。此外,决策曲线分析表明,在中国城市和农村环境中采用我们的模型均可获得更高的净效益。这些发现表明,RETFound 增强的 DL 模型可以在基于社区的筛查中实现更高的净效益,支持在低收入和中等收入国家采用该模型来应对全球眼健康挑战。
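The abstract reports a higher net benefit from decision curve analysis but does not state how net benefit is computed. As a minimal sketch, the block below uses the standard textbook definition (true-positive rate minus false-positive rate weighted by the odds of the threshold probability). The function name and all counts are hypothetical illustrations, not values from the study.

```python
def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """Standard decision-curve net benefit at one threshold probability.

    true_pos / false_pos: screened-positive counts split by true disease status.
    n: total number of people screened.
    threshold: risk threshold at which a referral is considered worthwhile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # False positives are weighted by the odds of the threshold probability.
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


# Hypothetical screening of 1000 people with 10% disease prevalence.
model_nb = net_benefit(true_pos=80, false_pos=100, n=1000, threshold=0.10)
# "Refer everyone" strategy: all 100 cases are true positives, 900 are false positives.
refer_all_nb = net_benefit(true_pos=100, false_pos=900, n=1000, threshold=0.10)
print(round(model_nb, 4), round(refer_all_nb, 4))
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the "refer everyone" and "refer no one" (net benefit 0) strategies.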
262. Video clips for patient comprehension of atrial fibrillation and deep vein thrombosis in emergency care. A randomised clinical trial.
视频剪辑帮助患者了解急诊护理中的心房颤动和深静脉血栓形成。一项随机临床试验。
PMID: 38688958 | DOI: 10.1038/s41746-024-01107-7 | 日期: 2024-04-30
摘要: Integrating video clips in the discharge process may enhance patients' understanding and awareness of their condition. To determine the effect of video clip-integrated discharge discussion on patient comprehension of atrial fibrillation (AF) and deep vein thrombosis (DVT), and their main complications (stroke and pulmonary embolism), we designed a multicentre, pragmatic, parallel groups, randomised clinical trial, that was conducted at two Emergency Units in Italy. A convenience sample of 144 adult patients (or their caregivers) discharged home with either AF or DVT were randomised to receive standard verbal instructions (control) or video clip-integrated doctor-patient discharge discussion. Participants were guided by the discharging physician through the clip. Mean score for primary outcome (knowledge of the diagnosis and its potential complication) (range 0-18) was 5.87 (95% CI, 5.02-6.72) in the control group and 8.28 (95% CI, 7.27-9.31) in the intervention group (mean difference, -2.41; 95% CI, -3.73 to -1.09; p < 0.001). Among secondary outcomes, mean score for knowledge of the prescribed therapy (range 0-6) was 2.98 (95% CI, 2.57-3.39) in the control group and 3.20 (95% CI, 2.73-3.67) in the study group (mean difference, -0.22; 95% CI, -0.84 to 0.39). Mean score for satisfaction (range 0-12) was 7.34 (95% CI, 6.45-8.23) in the control arm and 7.97 (95% CI, 7.15-8.78) in the intervention arm (mean difference, -0.625; 95% CI, -1.82 to 0.57). Initiation rate of newly prescribed anticoagulants was 80% (36/45) in the control group and 90.2% (46/51) in the intervention group. Among 109 patients reached at a median follow up of 21 (IQR 16-28) months, 5.55% (3/54) in the control arm and 1.82% (1/55) in the intervention arm had developed stroke or pulmonary embolism. In this trial, video clip-integrated doctor-patient discharge discussion improved participants' comprehension of AF and DVT and their main complications. Physicians should consider integrating these inexpensive tools during the discharge process of patients with AF or DVT. Trial Registration: ClinicalTrials.gov Identifier "NCT03734406".
中文摘要: 在出院过程中整合视频片段可以增强患者对其病情的理解和认识。为了确定视频片段整合出院讨论对患者理解心房颤动 (AF) 和深静脉血栓 (DVT) 及其主要并发症(中风和肺栓塞)的影响,我们设计了一项多中心、务实、平行组、随机临床试验,在意大利的两个急诊室进行。便利抽样纳入的 144 名因 AF 或 DVT 出院回家的成年患者(或其护理人员)被随机分配接受标准口头指示(对照)或视频片段整合的医患出院讨论。出院医生引导参与者观看视频片段。对照组的主要结局(诊断及其潜在并发症的知识)(范围 0-18)平均得分为 5.87(95% CI,5.02-6.72),干预组为 8.28(95% CI,7.27-9.31)(平均差,-2.41;95% CI,-3.73 至 -1.09;p < 0.001)。在次要结局中,对照组的处方治疗知识平均得分(范围 0-6)为 2.98(95% CI,2.57-3.39),研究组为 3.20(95% CI,2.73-3.67)(平均差,-0.22;95% CI,-0.84 至 0.39)。对照组的满意度平均得分(范围 0-12)为 7.34(95% CI,6.45-8.23),干预组为 7.97(95% CI,7.15-8.78)(平均差,-0.625;95% CI,-1.82 至 0.57)。新开抗凝剂的起始率在对照组为 80%(36/45),在干预组为 90.2%(46/51)。在中位随访 21 (IQR 16-28) 个月时联系到的 109 名患者中,对照组有 5.55%(3/54)、干预组有 1.82%(1/55)发生了中风或肺栓塞。在这项试验中,视频片段整合的医患出院讨论提高了参与者对 AF 和 DVT 及其主要并发症的理解。医生应考虑在 AF 或 DVT 患者出院过程中整合这些廉价工具。试验注册:ClinicalTrials.gov 标识符"NCT03734406"。
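The mean differences in this abstract are negative even though the intervention group scored higher, which is consistent with a "control minus intervention" sign convention. The short check below recomputes the differences and event rates from the rounded figures quoted in the abstract; the variable names are illustrative, and small mismatches with the published values (for example -0.625 versus -0.63) are expected because the abstract rounds its inputs.

```python
# Group means as quoted in the abstract: (control, intervention).
scores = {
    "knowledge of diagnosis (0-18)": (5.87, 8.28),
    "knowledge of therapy (0-6)": (2.98, 3.20),
    "satisfaction (0-12)": (7.34, 7.97),
}

# Assumed sign convention: control minus intervention, so a negative value
# means the intervention group scored higher.
differences = {name: round(ctrl - interv, 2) for name, (ctrl, interv) in scores.items()}

# Event counts as quoted in the abstract: (events, group size).
counts = {
    "anticoagulant initiation, control": (36, 45),
    "anticoagulant initiation, intervention": (46, 51),
    "stroke or PE, control": (3, 54),
    "stroke or PE, intervention": (1, 55),
}
percentages = {name: round(100 * k / n, 2) for name, (k, n) in counts.items()}

for name, value in differences.items():
    print(f"{name}: mean difference {value}")
for name, value in percentages.items():
    print(f"{name}: {value}%")
```

Recomputing 3/54 gives 5.56% when rounded to two decimals, so the abstract's 5.55% appears to be truncated rather than rounded.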
263. Evidence-based health messages increase intention to cope with loneliness in Germany: a randomized controlled online trial.
基于证据的健康信息增加了德国应对孤独感的意愿:一项随机对照在线试验。
PMID: 38684903 | DOI: 10.1038/s41746-024-01096-7 | 日期: 2024-04-29
摘要: Loneliness poses a formidable global health challenge in our volatile, post-pandemic world. Prior studies have identified promising interventions to alleviate loneliness, however, little is known about their effectiveness. This study measured the effectiveness of educational entertainment ("edutainment") and/or evidence-based, written health messages in alleviating loneliness and increasing intention to cope with loneliness. We recruited 1639 German participants, aged 18 years or older. We compared three intervention groups who received: (A) edutainment and written health messages, (B) only edutainment, or (C) only written health messages, against (D) a control group that received nothing. The primary outcomes were loneliness and intention to cope with loneliness. Participants were also invited to leave comments about the interventions or about their perception or experiences with loneliness. We found a small (d = 0.254) but significant effect of the written messages on increased intention to cope with loneliness (b = 1.78, t(1602) = 2.91, P = 0.004), while a combination of edutainment and written messages significantly decreased loneliness scores (b = -0.25, t(1602) = -2.06, P = 0.04) when compared with the control, even after adjusting for covariables including baseline values, self-esteem, self-efficacy, and hope. We also observed significantly higher self-esteem scores after exposure to a combination of edutainment and written messages (b = 0.821, t(1609) = 1.76, one-tailed P = 0.039) and significantly higher hope scores after exposure to edutainment-only (b = 0.986, t(1609) = 1.85, one-tailed P = 0.032) when compared with the control group. Our study highlights the benefits of using written messages for increasing intention to cope with loneliness and a combination of edutainment and written messages for easing loneliness. Even in small "doses" (less than 6 min of exposure), edutainment can nurture hope, and edutainment combined with written messages can boost self-esteem.
中文摘要: 在我们动荡的大流行后世界中,孤独构成了巨大的全球健康挑战。先前的研究已经确定了缓解孤独感的有希望的干预措施,但对其有效性知之甚少。这项研究衡量了教育娱乐("寓教于乐")和/或基于证据的书面健康信息在缓解孤独感和增强应对孤独感的意愿方面的有效性。我们招募了 1639 名年龄在 18 岁或以上的德国参与者。我们将三个干预组与(D)未收到任何信息的对照组进行了比较,三个干预组分别收到:(A)寓教于乐和书面健康信息,(B)仅寓教于乐,或(C)仅书面健康信息。主要结局是孤独感和应对孤独感的意愿。参与者还被邀请对干预措施或他们对孤独的看法或经历发表评论。我们发现,书面信息对增加应对孤独感的意愿有微小(d = 0.254)但显著的影响(b = 1.78,t(1602) = 2.91,P = 0.004);与对照组相比,寓教于乐和书面信息的结合显著降低了孤独感得分(b = -0.25,t(1602) = -2.06,P = 0.04),即使在调整了包括基线值、自尊、自我效能和希望在内的协变量之后也是如此。与对照组相比,我们还观察到,在接触寓教于乐和书面信息的组合后,自尊得分显著较高(b = 0.821,t(1609) = 1.76,单尾 P = 0.039),并且在仅接触寓教于乐后,希望得分显著较高(b = 0.986,t(1609) = 1.85,单尾 P = 0.032)。我们的研究强调了使用书面信息来增强应对孤独感的意愿以及结合寓教于乐和书面信息来缓解孤独感的好处。即使是小"剂量"(接触时间少于 6 分钟),寓教于乐也可以培养希望,而寓教于乐与书面信息相结合可以增强自尊。
264. Effectiveness of DialBetesPlus, a self-management support system for diabetic kidney disease: Randomized controlled trial.
DialBetesPlus(糖尿病肾病自我管理支持系统)的有效性:随机对照试验。
PMID: 38678094 | DOI: 10.1038/s41746-024-01114-8 | 日期: 2024-04-27
摘要: We evaluated the effectiveness of a mobile health (mHealth) intervention for diabetic kidney disease patients by conducting a 12-month randomized controlled trial among 126 type 2 diabetes mellitus patients with moderately increased albuminuria (urinary albumin-to-creatinine ratio (UACR): 30-299 mg/g creatinine) recruited from eight clinical sites in Japan. Using a Theory of Planned Behavior (TPB) behavior change theory framework, the intervention provides patients detailed information in order to improve patient control over exercise and dietary behaviors. In addition to standard care, the intervention group received DialBetesPlus, a self-management support system allowing patients to monitor exercise, blood glucose, diet, blood pressure, and body weight via a smartphone application. The primary outcome, change in UACR after 12 months (used as a surrogate measure of renal function), was 28.8% better than the control group's change (P = 0.029). Secondary outcomes also improved in the intervention group, including a 0.32-point better change in HbA1c percentage (P = 0.041). These improvements persisted when models were adjusted to account for the impacts of coadministration of drugs targeting albuminuria (GLP-1 receptor agonists, SGLT-2 inhibitors, ACE inhibitors, and ARBs) (UACR: -32.3% [95% CI: -49.2%, -9.8%] between-group difference in change, P = 0.008). Exploratory multivariate regression analysis suggests that the improvements were primarily due to levels of exercise. This is the first trial to show that a lifestyle intervention via mHealth achieved a clinically-significant improvement in moderately increased albuminuria.
中文摘要: 我们通过对来自日本八个临床中心的 126 名白蛋白尿中度升高(尿白蛋白肌酐比 (UACR):30-299mg/g 肌酐)的 2 型糖尿病患者进行为期 12 个月的随机对照试验,评估了移动健康 (mHealth) 干预对糖尿病肾病患者的有效性。该干预措施利用计划行为理论 (TPB) 行为改变理论框架,为患者提供详细信息,以改善患者对运动和饮食行为的控制。除了标准护理外,干预组还接受了 DialBetesPlus,这是一种自我管理支持系统,允许患者通过智能手机应用程序监测运动、血糖、饮食、血压和体重。主要结局,12 个月后 UACR 的变化(用作肾功能的替代指标),比对照组的变化好 28.8%(P = 0.029)。干预组的次要结果也有所改善,包括 HbA1c 百分比改善了 0.32 点 (P = 0.041)。当调整模型以考虑联合给药针对白蛋白尿的药物(GLP-1受体激动剂、SGLT-2抑制剂、ACE抑制剂和ARB)的影响时,这些改善仍然存在(UACR:-32.3%[95%CI:-49.2%,-9.8%]组间变化差异,P = 0.008)。探索性多元回归分析表明,这些改善主要归因于运动水平。这是第一项表明通过移动健康进行生活方式干预对白蛋白尿中度增加有临床显着改善的试验。
265. Levels of autonomy in FDA-cleared surgical robots: a systematic review.
FDA 批准的手术机器人的自主水平:系统评价。
PMID: 38671232 | DOI: 10.1038/s41746-024-01102-y | 日期: 2024-04-26
摘要: The integration of robotics in surgery has increased over the past decade, and advances in the autonomous capabilities of surgical robots have paralleled that of assistive and industrial robots. However, classification and regulatory frameworks have not kept pace with the increasing autonomy of surgical robots. There is a need to modernize our classification to understand technological trends and prepare to regulate and streamline surgical practice around these robotic systems. We present a systematic review of all surgical robots cleared by the United States Food and Drug Administration (FDA) from 2015 to 2023, utilizing a classification system that we call Levels of Autonomy in Surgical Robotics (LASR) to categorize each robot's decision-making and action-taking abilities from Level 1 (Robot Assistance) to Level 5 (Full Autonomy). We searched the 510(k), De Novo, and AccessGUDID databases in December 2023 and included all medical devices fitting our definition of a surgical robot. 37,981 records were screened to identify 49 surgical robots. Most surgical robots were at Level 1 (86%) and some reached Level 3 (Conditional Autonomy) (6%). 2 surgical robots were recognized by the FDA to have machine learning-enabled capabilities, while more were reported to have these capabilities in their marketing materials. Most surgical robots were introduced via the 510(k) pathway, but a growing number via the De Novo pathway. This review highlights trends toward greater autonomy in surgical robotics. Implementing regulatory frameworks that acknowledge varying levels of autonomy in surgical robots may help ensure their safe and effective integration into surgical practice.
中文摘要: 过去十年来,机器人技术在手术中的集成度不断提高,手术机器人自主能力的进步与辅助机器人和工业机器人的进步相当。然而,分类和监管框架并没有跟上手术机器人日益增强的自主性。有必要对我们的分类进行现代化改造,以了解技术趋势,并准备规范和简化围绕这些机器人系统的手术实践。我们对 2015 年至 2023 年美国食品和药物管理局 (FDA) 批准的所有手术机器人进行了系统回顾,利用我们称为手术机器人自主级别 (LASR) 的分类系统,将每个机器人的决策和行动能力从 1 级(机器人辅助)到 5 级(完全自主)进行分类。我们于 2023 年 12 月检索了 510(k)、De Novo 和 AccessGUDID 数据库,纳入了符合我们对手术机器人定义的所有医疗设备。筛选了 37,981 条记录,识别出 49 台手术机器人。大多数手术机器人处于 1 级(86%),有些达到 3 级(有条件自主)(6%)。 FDA 认可 2 款手术机器人具有机器学习功能,据报道还有更多手术机器人在其营销材料中具备这些功能。大多数手术机器人是通过 510(k) 途径引入的,但越来越多的手术机器人是通过 De Novo 途径引入的。这篇评论强调了手术机器人更大自主性的趋势。实施承认手术机器人不同程度自主权的监管框架可能有助于确保其安全有效地融入手术实践。
266. Augmented non-hallucinating large language models as medical information curators.
增强非幻觉大型语言模型作为医疗信息管理者。
PMID: 38654142 | DOI: 10.1038/s41746-024-01081-0 | 日期: 2024-04-23
摘要: Reliably processing and interlinking medical information has been recognized as a critical foundation to the digital transformation of medical workflows, and despite the development of medical ontologies, the optimization of these has been a major bottleneck to digital medicine. The advent of large language models has brought great excitement, and maybe a solution to medicine's 'communication problem' is in sight, but how can the known weaknesses of these models, such as hallucination and non-determinism, be tempered? Retrieval Augmented Generation, particularly through knowledge graphs, is an automated approach that can deliver structured reasoning and a model of truth alongside LLMs, relevant to information structuring and therefore also to decision support.
中文摘要: 可靠地处理和互连医疗信息已被认为是医疗工作流程数字化转型的关键基础,尽管医学本体不断发展,但其优化一直是数字医学的主要瓶颈。大语言模型的出现带来了极大的兴奋,也许医学"沟通问题"的解决方案就在眼前,但如何缓和这些模型的已知弱点,例如幻觉和非确定性?检索增强生成,特别是通过知识图谱,是一种自动化方法,可以与大语言模型 (LLM) 一起提供结构化推理和真值模型,与信息结构化相关,因此也与决策支持相关。
267. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework.
通过大语言模型优化肝病临床指南解释:基于检索增强生成的框架。
PMID: 38654102 | DOI: 10.1038/s41746-024-01091-y | 日期: 2024-04-23
摘要: Large language models (LLMs) can potentially transform healthcare, particularly in providing the right information to the right provider at the right time in the hospital workflow. This study investigates the integration of LLMs into healthcare, specifically focusing on improving clinical decision support systems (CDSSs) through accurate interpretation of medical guidelines for chronic Hepatitis C Virus infection management. Utilizing OpenAI's GPT-4 Turbo model, we developed a customized LLM framework that incorporates retrieval augmented generation (RAG) and prompt engineering. Our framework involved guideline conversion into the best-structured format that can be efficiently processed by LLMs to provide the most accurate output. An ablation study was conducted to evaluate the impact of different formatting and learning strategies on the LLM's answer generation accuracy. The baseline GPT-4 Turbo model's performance was compared against five experimental setups with increasing levels of complexity: inclusion of in-context guidelines, guideline reformatting, and implementation of few-shot learning. Our primary outcome was the qualitative assessment of accuracy based on expert review, while secondary outcomes included the quantitative measurement of similarity of LLM-generated responses to expert-provided answers using text-similarity scores. The results showed a significant improvement in accuracy from 43 to 99% (p < 0.001), when guidelines were provided as context in a coherent corpus of text and non-text sources were converted into text. In addition, few-shot learning did not seem to improve overall accuracy. The study highlights that structured guideline reformatting and advanced prompt engineering (data quality vs. data quantity) can enhance the efficacy of LLM integrations to CDSSs for guideline delivery.
中文摘要: 大语言模型 (LLM) 可以潜在地改变医疗保健,特别是在医院工作流程中在正确的时间向正确的提供者提供正确的信息。本研究调查了 LLM 与医疗保健的整合,特别侧重于通过准确解释慢性丙型肝炎病毒感染管理的医疗指南来改善临床决策支持系统(CDSS)。利用 OpenAI 的 GPT-4 Turbo 模型,我们开发了一个定制的 LLM 框架,其中结合了检索增强生成 (RAG) 和提示工程。我们的框架涉及将指南转换为最佳结构的格式,该格式可以由 LLM 有效处理,以提供最准确的输出。进行了一项消融研究,以评估不同格式和学习策略对 LLM 答案生成准确性的影响。将基线 GPT-4 Turbo 模型的性能与复杂程度不断增加的五种实验设置进行了比较:包含上下文指南、指南重新格式化和实施少样本学习。我们的主要结局是根据专家评审对准确性进行定性评估,而次要结局包括使用文本相似性分数对 LLM 生成的回复与专家提供的答案的相似性进行定量测量。结果显示,当在连贯的文本语料库中提供指南并将非文本源转换为文本时,准确率从 43% 显著提高到 99% (p<0.001)。此外,少样本学习似乎并没有提高整体准确性。该研究强调,结构化指南重新格式化和先进的提示工程(数据质量与数据数量)可以提高 LLM 与 CDSS 集成的指南交付的效率。
268. The dichotomy of diagnostics: exploring the value for consumers, clinicians and care pathways.
诊断的二分法:探索消费者、临床医生和护理途径的价值。
PMID: 38654071 | DOI: 10.1038/s41746-024-01087-8 | 日期: 2024-04-23
摘要: Diagnostics play a crucial role in screening, detecting, and stratifying patients, yet can account for only 2-3% of healthcare spending. With advancements in wearable technology and direct-to-consumer testing, the market for consumer health continues to rise. The potential benefits of more holistic and continuous measurement offer a promising opportunity for earlier disease detection and proactive health management. Many health systems are in a parallel transition from legacy analogue approaches to digitally enabled infrastructures. The evolving role of the clinical workforce, including medical ethics, regulation, will be closely coupled and a critical lever in success. This includes on a patient and clinician level, balancing the benefits and risks of interventions, and care pathway level, promoting responsible data utilisation with greater contextualisation based on the latest evidence of clinical efficacy. Moving forward a balance may need to be struck between increased data capture, analysis and reuse, with proportionate ethics, regulation, trust and governance.
中文摘要: 诊断在筛查、检测和患者分层方面发挥着至关重要的作用,但仅占医疗保健支出的 2-3%。随着可穿戴技术和直接面向消费者的测试的进步,消费者健康市场持续增长。更全面和连续测量的潜在好处为早期疾病检测和主动健康管理提供了有希望的机会。许多卫生系统正在从传统的模拟方法并行过渡到数字化基础设施。临床人员不断变化的角色,包括医学伦理、监管,将紧密结合在一起,成为成功的关键杠杆。这包括在患者和临床医生层面,平衡干预措施的益处和风险,以及护理路径层面,根据临床疗效的最新证据,通过更大的情境化促进负责任的数据利用。向前发展可能需要在增加数据捕获、分析和重用与相应的道德、监管、信任和治理之间取得平衡。
269. Real-time near infrared artificial intelligence using scalable non-expert crowdsourcing in colorectal surgery.
在结直肠手术中使用可扩展的非专家众包的实时近红外人工智能。
PMID: 38649447 | DOI: 10.1038/s41746-024-01095-8 | 日期: 2024-04-22
摘要: Surgical artificial intelligence (AI) has the potential to improve patient safety and clinical outcomes. To date, training such AI models to identify tissue anatomy requires annotations by expensive and rate-limiting surgical domain experts. Herein, we demonstrate and validate a methodology to obtain high quality surgical tissue annotations through crowdsourcing of non-experts, and real-time deployment of multimodal surgical anatomy AI model in colorectal surgery.
中文摘要: 外科人工智能 (AI) 有潜力改善患者安全和临床结果。迄今为止,训练此类人工智能模型来识别组织解剖结构需要由昂贵且限制速率的外科领域专家进行注释。在此,我们演示并验证了一种通过非专家众包获得高质量手术组织注释的方法,以及在结直肠手术中实时部署多模式手术解剖人工智能模型。
270. Predicting non-muscle invasive bladder cancer outcomes using artificial intelligence: a systematic review using APPRAISE-AI.
使用人工智能预测非肌层浸润性膀胱癌的结局:使用 APPRAISE-AI 进行的系统评价。
PMID: 38637674 | DOI: 10.1038/s41746-024-01088-7 | 日期: 2024-04-18
摘要: Accurate prediction of recurrence and progression in non-muscle invasive bladder cancer (NMIBC) is essential to inform management and eligibility for clinical trials. Despite substantial interest in developing artificial intelligence (AI) applications in NMIBC, their clinical readiness remains unclear. This systematic review aimed to critically appraise AI studies predicting NMIBC outcomes, and to identify common methodological and reporting pitfalls. MEDLINE, EMBASE, Web of Science, and Scopus were searched from inception to February 5th, 2024 for AI studies predicting NMIBC recurrence or progression. APPRAISE-AI was used to assess methodological and reporting quality of these studies. Performance between AI and non-AI approaches included within these studies were compared. A total of 15 studies (five on recurrence, four on progression, and six on both) were included. All studies were retrospective, with a median follow-up of 71 months (IQR 32-93) and median cohort size of 125 (IQR 93-309). Most studies were low quality, with only one classified as high quality. While AI models generally outperformed non-AI approaches with respect to accuracy, c-index, sensitivity, and specificity, this margin of benefit varied with study quality (median absolute performance difference was 10 for low, 22 for moderate, and 4 for high quality studies). Common pitfalls included dataset limitations, heterogeneous outcome definitions, methodological flaws, suboptimal model evaluation, and reproducibility issues. Recommendations to address these challenges are proposed. These findings emphasise the need for collaborative efforts between urological and AI communities paired with rigorous methodologies to develop higher quality models, enabling AI to reach its potential in enhancing NMIBC care.
中文摘要: 准确预测非肌层浸润性膀胱癌 (NMIBC) 的复发和进展对于告知管理和临床试验资格至关重要。尽管人们对在 NMIBC 中开发人工智能 (AI) 应用抱有浓厚兴趣,但其临床准备情况仍不清楚。本系统综述旨在严格评估预测 NMIBC 结果的人工智能研究,并找出常见的方法和报告缺陷。在 MEDLINE、EMBASE、Web of Science 和 Scopus 中检索了从起始时间到 2024 年 2 月 5 日预测 NMIBC 复发或进展的 AI 研究。 APPRAISE-AI 用于评估这些研究的方法学和报告质量。对这些研究中包含的人工智能和非人工智能方法的性能进行了比较。总共纳入 15 项研究(5 项关于复发,4 项关于进展,6 项关于两者)。所有研究均为回顾性研究,中位随访时间为 71 个月 (IQR 32-93),中位队列规模为 125 人 (IQR 93-309)。大多数研究质量较低,只有一项研究质量较高。虽然 AI 模型在准确性、c 指数、敏感性和特异性方面通常优于非 AI 方法,但这种获益幅度随研究质量的不同而变化(低质量研究的中位绝对性能差异为 10,中等质量研究为 22,高质量研究为 4)。常见的陷阱包括数据集限制、异构结果定义、方法缺陷、次优模型评估和再现性问题。提出了应对这些挑战的建议。这些发现强调,泌尿外科和人工智能社区之间需要合作,并结合严格的方法来开发更高质量的模型,使人工智能能够发挥其在增强 NMIBC 护理方面的潜力。
271. Visual interpretable MRI fine grading of meniscus injury for intelligent assisted diagnosis and treatment.
半月板损伤可视化MRI精细分级,用于智能辅助诊断和治疗。
PMID: 38622284 | DOI: 10.1038/s41746-024-01082-z | 日期: 2024-04-15
摘要: Meniscal injury represents a common type of knee injury, accounting for over 50% of all knee injuries. The clinical diagnosis and treatment of meniscal injury heavily rely on magnetic resonance imaging (MRI). However, accurately diagnosing the meniscus from a comprehensive knee MRI is challenging due to its limited and weak signal, significantly impeding the precise grading of meniscal injuries. In this study, a visual interpretable fine grading (VIFG) diagnosis model has been developed to facilitate intelligent and quantified grading of meniscal injuries. Leveraging a multilevel transfer learning framework, it extracts comprehensive features and incorporates an attributional attention module to precisely locate the injured positions. Moreover, the attention-enhancing feedback module effectively concentrates on and distinguishes regions with similar grades of injury. The proposed method underwent validation on FastMRI_Knee and Xijing_Knee dataset, achieving mean grading accuracies of 0.8631 and 0.8502, surpassing the state-of-the-art grading methods notably in error-prone Grade 1 and Grade 2 cases. Additionally, the visually interpretable heatmaps generated by VIFG provide accurate depictions of actual or potential meniscus injury areas beyond human visual capability. Building upon this, a novel fine grading criterion was introduced for subtypes of meniscal injury, further classifying Grade 2 into 2a, 2b, and 2c, aligning with the anatomical knowledge of meniscal blood supply. It can provide enhanced injury-specific details, facilitating the development of more precise surgical strategies. The efficacy of this subtype classification was evidenced in 20 arthroscopic cases, underscoring the potential enhancement brought by intelligent-assisted diagnosis and treatment for meniscal injuries.
中文摘要: 半月板损伤是膝关节损伤的一种常见类型,占所有膝关节损伤的 50% 以上。半月板损伤的临床诊断和治疗很大程度上依赖于磁共振成像(MRI)。然而,由于综合膝关节 MRI 信号有限且微弱,准确诊断半月板具有挑战性,严重阻碍了半月板损伤的精确分级。在这项研究中,开发了一种视觉可解释精细分级(VIFG)诊断模型,以促进半月板损伤的智能和量化分级。利用多级迁移学习框架,提取综合特征并结合归因注意模块来精确定位受伤位置。此外,注意力增强反馈模块有效地集中并区分具有相似损伤等级的区域。所提出的方法在 FastMRI_Knee 和 Xijing_Knee 数据集上进行了验证,平均分级精度达到 0.8631 和 0.8502,超过了最先进的分级方法,特别是在容易出错的 1 级和 2 级情况下。此外,VIFG 生成的视觉可解释热图可准确描述超出人类视觉能力的实际或潜在半月板损伤区域。在此基础上,针对半月板损伤的亚型引入了一种新的精细分级标准,将 2 级进一步分为 2a、2b 和 2c,与半月板血液供应的解剖学知识保持一致。它可以提供增强的损伤特定细节,促进更精确的手术策略的制定。该亚型分类的有效性已在20例关节镜病例中得到证实,凸显了半月板损伤智能辅助诊断和治疗带来的潜在增强。
272. Deep learning evaluation of echocardiograms to identify occult atrial fibrillation.
超声心动图的深度学习评估以识别隐匿性心房颤动。
PMID: 38615104 | DOI: 10.1038/s41746-024-01090-z | 日期: 2024-04-13
摘要: Atrial fibrillation (AF) often escapes detection, given its frequent paroxysmal and asymptomatic presentation. Deep learning of transthoracic echocardiograms (TTEs), which have structural information, could help identify occult AF. We created a two-stage deep learning algorithm using a video-based convolutional neural network model that (1) distinguished whether TTEs were in sinus rhythm or AF and then (2) predicted which of the TTEs in sinus rhythm were in patients who had experienced AF within 90 days. Our model, trained on 111,319 TTE videos, distinguished TTEs in AF from those in sinus rhythm with high accuracy in a held-out test cohort (AUC 0.96 (0.95-0.96), AUPRC 0.91 (0.90-0.92)). Among TTEs in sinus rhythm, the model predicted the presence of concurrent paroxysmal AF (AUC 0.74 (0.71-0.77), AUPRC 0.19 (0.16-0.23)). Model discrimination remained similar in an external cohort of 10,203 TTEs (AUC of 0.69 (0.67-0.70), AUPRC 0.34 (0.31-0.36)). Performance held across patients who were women (AUC 0.76 (0.72-0.81)), older than 65 years (0.73 (0.69-0.76)), or had a CHA2DS2VASc ≥2 (0.73 (0.79-0.77)). The model performed better than using clinical risk factors (AUC 0.64 (0.62-0.67)), TTE measurements (0.64 (0.62-0.67)), left atrial size (0.63 (0.62-0.64)), or CHA2DS2VASc (0.61 (0.60-0.62)). An ensemble model in a cohort subset combining the TTE model with an electrocardiogram (ECGs) deep learning model performed better than using the ECG model alone (AUC 0.81 vs. 0.79, p = 0.01). Deep learning using TTEs can predict patients with active or occult AF and could be used for opportunistic AF screening that could lead to earlier treatment.
中文摘要: 心房颤动 (AF) 常呈阵发性且无症状,因此常常难以被发现。经胸超声心动图 (TTE) 包含结构信息,对其进行深度学习有助于识别隐匿性 AF。我们使用基于视频的卷积神经网络模型创建了一个两阶段深度学习算法,该算法 (1) 区分 TTE 是窦性心律还是 AF,然后 (2) 预测哪些窦性心律的 TTE 来自 90 天内发生过 AF 的患者。我们的模型经过 111,319 个 TTE 视频的训练,在留出测试队列中以高精度区分 AF 中的 TTE 和窦性心律中的 TTE(AUC 0.96 (0.95-0.96)、AUPRC 0.91 (0.90-0.92))。在窦性心律的 TTE 中,该模型预测存在并发阵发性 AF(AUC 0.74 (0.71-0.77)、AUPRC 0.19 (0.16-0.23))。在 10,203 个 TTE 的外部队列中,模型区分度仍然相似(AUC 为 0.69 (0.67-0.70),AUPRC 0.34 (0.31-0.36))。女性 (AUC 0.76 (0.72-0.81))、65 岁以上 (0.73 (0.69-0.76)) 或 CHA2DS2VASc ≥2 (0.73 (0.79-0.77)) 的患者的表现均保持不变。该模型的表现优于使用临床危险因素 (AUC 0.64 (0.62-0.67))、TTE 测量值 (0.64 (0.62-0.67))、左心房大小 (0.63 (0.62-0.64)) 或 CHA2DS2VASc (0.61 (0.60-0.62))。将 TTE 模型与心电图 (ECG) 深度学习模型相结合的队列子集中的集成模型比单独使用 ECG 模型表现更好(AUC 0.81 vs. 0.79,p = 0.01)。使用 TTE 的深度学习可以预测患有活动性或隐匿性 AF 的患者,并可用于机会性 AF 筛查,从而实现早期治疗。
273. Towards a common European ethical and legal framework for conducting clinical research: the GATEKEEPER experience.
建立一个进行临床研究的欧洲共同道德和法律框架:GATEKEEPER 经验。
PMID: 38615054 | DOI: 10.1038/s41746-024-01092-x | 日期: 2024-04-13
摘要: This paper examines the ethical and legal challenges encountered during the GATEKEEPER Project and how these challenges informed the development of a comprehensive framework for future Large-Scale Pilot (LSP) projects. GATEKEEPER is a LSP Project with 48 partners conducting 30 implementation studies across Europe with 50,000 target participants grouped into 9 Reference Use Cases. The project underscored the complexity of obtaining ethical approval across various jurisdictions with divergent regulations and procedures. Through a detailed analysis of the issues faced and the strategies employed to navigate these challenges, this study proposes an ethical and legal framework. This framework, derived from a comparative analysis of ethical application forms and regulations, aims to streamline the ethical approval process for future LSP research projects. By addressing the hurdles encountered in GATEKEEPER, the proposed framework offers a roadmap for more efficient and effective project management, ensuring smoother implementation of similar projects in the future.
中文摘要: 本文探讨了 GATEKEEPER 项目期间遇到的道德和法律挑战,以及这些挑战如何为未来大规模试点 (LSP) 项目的综合框架的开发提供信息。 GATEKEEPER 是一个 LSP 项目,有 48 个合作伙伴在欧洲开展了 30 项实施研究,将 50,000 名目标参与者分为 9 个参考用例。该项目强调了在法规和程序各异的各个司法管辖区获得道德批准的复杂性。通过对所面临的问题和应对这些挑战所采用的策略的详细分析,本研究提出了一个道德和法律框架。该框架源自对伦理申请表和法规的比较分析,旨在简化未来 LSP 研究项目的伦理审批流程。通过解决 GATEKEEPER 中遇到的障碍,拟议的框架为更高效和有效的项目管理提供了路线图,确保未来类似项目的顺利实施。
274. International perspectives on measuring national digital public health system maturity through a multidisciplinary Delphi study.
通过多学科德尔菲研究衡量国家数字公共卫生系统成熟度的国际视角。
PMID: 38609458 | DOI: 10.1038/s41746-024-01078-9 | 日期: 2024-04-12
摘要: Unlocking the full potential of digital public health (DiPH) systems requires a comprehensive tool to assess their maturity. While the World Health Organization and the International Telecommunication Union released a toolkit in 2012 covering various aspects of digitalizing national healthcare systems, a holistic maturity assessment tool has been lacking ever since. To bridge this gap, we conducted a pioneering Delphi study, to which 54 experts from diverse continents and academic fields actively contributed to at least one of three rounds. 54 experts participated in developing and rating multidisciplinary quality indicators to measure the maturity of national digital public health systems. Participants established consensus on these indicators with a threshold of 70% agreement on indicator importance. Eventually, 96 indicators were identified and agreed upon by experts. Notably, 48% of these indicators were found to align with existing validated tools, highlighting their relevance and reliability. However, further investigation is required to assess the suitability and applicability of all the suggestions put forward by our participants. Nevertheless, this Delphi study is an essential initial stride toward a comprehensive measurement tool for DiPH system maturity. By working towards a standardized assessment of DiPH system maturity, we aim to empower decision-makers to make informed choices, optimize resource allocation, and drive innovation in healthcare delivery. The results of this study mark a significant milestone in advancing DiPH on a global scale.
中文摘要: 释放数字公共卫生 (DiPH) 系统的全部潜力需要一个全面的工具来评估其成熟度。尽管世界卫生组织和国际电信联盟于 2012 年发布了涵盖国家医疗保健系统数字化各个方面的工具包,但自那以后一直缺乏全面的成熟度评估工具。为了弥补这一差距,我们进行了一项开创性的德尔菲研究,来自不同大洲和学术领域的 54 名专家为三轮中的至少一轮做出了积极贡献。 54名专家参与制定和评级多学科质量指标,以衡量国家数字公共卫生系统的成熟度。参与者就这些指标达成了共识,对指标重要性达成了 70% 的共识。最终,96项指标被专家确定并达成一致。值得注意的是,其中 48% 的指标与现有经过验证的工具相一致,凸显了它们的相关性和可靠性。然而,还需要进一步调查来评估参与者提出的所有建议的适当性和适用性。尽管如此,这项 Delphi 研究是迈向 DiPH 系统成熟度综合测量工具的重要的第一步。通过致力于对 DiPH 系统成熟度进行标准化评估,我们的目标是帮助决策者做出明智的选择、优化资源分配并推动医疗保健服务的创新。这项研究的结果标志着 DiPH 在全球范围内推进的一个重要里程碑。
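The consensus rule in this abstract (an indicator is retained when at least 70% of experts agree on its importance) can be sketched as a simple share calculation. The abstract does not specify the rating scale or what counts as "agreement", so the 5-point scale and the cut-off `important_from=4` below are assumptions for illustration only, as are the function name and the sample ratings.

```python
from typing import Sequence


def consensus_reached(
    ratings: Sequence[int],
    important_from: int = 4,   # assumed: ratings of 4 or 5 on a 5-point scale count as "important"
    threshold: float = 0.70,   # consensus threshold stated in the abstract
) -> bool:
    """Return True when the share of experts rating an indicator as important meets the threshold."""
    if not ratings:
        raise ValueError("at least one rating is required")
    share = sum(r >= important_from for r in ratings) / len(ratings)
    return share >= threshold


# Hypothetical ratings from ten experts for two candidate indicators.
print(consensus_reached([5, 4, 4, 3, 5, 4, 2, 5, 4, 4]))  # 8 of 10 rate it important
print(consensus_reached([5, 4, 3, 3, 2, 4, 3, 5, 2, 1]))  # 4 of 10 rate it important
```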
275. Navigating the U.S. regulatory landscape for neurologic digital health technologies.
探索美国神经数字健康技术的监管环境。
PMID: 38609447 | DOI: 10.1038/s41746-024-01098-5 | 日期: 2024-04-12
摘要: Digital health technologies (DHTs) can transform neurological assessments, improving quality and continuity of care. In the United States, the Food & Drug Administration (FDA) oversees the safety and efficacy of these technologies, employing a detailed regulatory process that classifies devices based on risk and requires rigorous review and post-market surveillance. Following FDA approval, DHTs enter the Current Procedural Terminology, Relative Value Scale Update Committee, and Centers for Medicare & Medicaid Services coding and valuation processes leading to coverage and payment decisions. DHT adoption is challenged by rapid technologic advancements, an inconsistent evidence base, marketing discrepancies, ambiguous coding guidance, and variable health insurance coverage. Regulators, policymakers, and payers will need to develop better methods to evaluate these promising technologies and guide their deployment. This includes striking a balance between patient safety and clinical effectiveness versus promotion of innovation, especially as DHTs increasingly incorporate artificial intelligence. Data validity, cybersecurity, risk management, societal, and ethical responsibilities should be addressed. Regulatory advances can support adoption of these promising tools by ensuring DHTs are safe, effective, accessible, and equitable.
中文摘要: 数字健康技术 (DHT) 可以改变神经评估,提高护理质量和连续性。在美国,食品和药物管理局 (FDA) 负责监督这些技术的安全性和有效性,采用详细的监管流程,根据风险对设备进行分类,并要求进行严格的审查和上市后监督。 FDA 批准后,DHT 进入当前程序术语、相对价值量表更新委员会以及医疗保险和医疗补助服务中心的编码和评估流程,从而做出承保和支付决策。 DHT 的采用面临着快速的技术进步、不一致的证据基础、营销差异、模糊的编码指导和可变的健康保险覆盖范围的挑战。监管机构、政策制定者和付款人需要开发更好的方法来评估这些有前景的技术并指导其部署。这包括在患者安全和临床有效性与促进创新之间取得平衡,特别是在 DHT 越来越多地融入人工智能的情况下。应解决数据有效性、网络安全、风险管理、社会和道德责任。监管进步可以确保 DHT 的安全、有效、可访问和公平,从而支持这些有前途的工具的采用。
276. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data.
使用 700,000 人日的可穿戴数据进行人类活动识别的自我监督学习。
PMID: 38609437 | DOI: 10.1038/s41746-024-01062-3 | 日期: 2024-04-12
摘要: Accurate physical activity monitoring is essential to understand the impact of physical activity on one's physical health and overall well-being. However, advances in human activity recognition algorithms have been constrained by the limited availability of large labelled datasets. This study aims to leverage recent advances in self-supervised learning to exploit the large-scale UK Biobank accelerometer dataset (a 700,000 person-days unlabelled dataset) in order to build models with vastly improved generalisability and accuracy. Our resulting models consistently outperform strong baselines across eight benchmark datasets, with an F1 relative improvement of 2.5-130.9% (median 24.4%). More importantly, in contrast to previous reports, our results generalise across external datasets, cohorts, living environments, and sensor devices. Our open-sourced pre-trained models will be valuable in domains with limited labelled data or where good sampling coverage (across devices, populations, and activities) is hard to achieve.
中文摘要: 准确的身体活动监测对于了解身体活动对身体健康和整体福祉的影响至关重要。然而,人类活动识别算法的进步受到大型标记数据集的有限可用性的限制。本研究旨在利用自我监督学习的最新进展来利用大规模的英国生物银行加速度计数据集(700,000 人日的未标记数据集),以便构建泛化能力和准确性大幅提高的模型。我们得到的模型在八个基准数据集上始终优于强大的基线,F1 相对改进为 2.5-130.9%(中位数 24.4%)。更重要的是,与此前的报道不同,我们的结果可推广到外部数据集、队列、生活环境和传感器设备。我们的开源预训练模型在标记数据有限或难以实现良好采样覆盖(跨设备、人群和活动)的领域将非常有价值。
277. Whole-heart electromechanical simulations using Latent Neural Ordinary Differential Equations.
使用潜在神经常微分方程进行全心脏机电模拟。
PMID: 38605089 | DOI: 10.1038/s41746-024-01084-x | 日期: 2024-04-11
摘要: Cardiac digital twins provide a physics and physiology informed framework to deliver personalized medicine. However, high-fidelity multi-scale cardiac models remain a barrier to adoption due to their extensive computational costs. Artificial Intelligence-based methods can make the creation of fast and accurate whole-heart digital twins feasible. We use Latent Neural Ordinary Differential Equations (LNODEs) to learn the pressure-volume dynamics of a heart failure patient. Our surrogate model is trained from 400 simulations while accounting for 43 parameters describing cell-to-organ cardiac electromechanics and cardiovascular hemodynamics. LNODEs provide a compact representation of the 3D-0D model in a latent space by means of an Artificial Neural Network that retains only 3 hidden layers with 13 neurons per layer and allows for numerical simulations of cardiac function on a single processor. We employ LNODEs to perform global sensitivity analysis and parameter estimation with uncertainty quantification in 3 hours of computations, still on a single processor.
中文摘要: 心脏数字孪生提供了一个基于物理和生理学知识的框架来实现个性化医疗。然而,高保真多尺度心脏模型由于其高昂的计算成本仍然是采用的障碍。基于人工智能的方法可以使快速、准确的全心脏数字孪生的创建成为可能。我们使用潜在神经常微分方程 (LNODE) 来学习一名心力衰竭患者的压力-容积动态。我们的代理模型由 400 次模拟训练而成,同时考虑了描述细胞到器官心脏机电和心血管血流动力学的 43 个参数。LNODE 通过人工神经网络在潜在空间中提供 3D-0D 模型的紧凑表示,该网络仅保留 3 个隐藏层,每层 13 个神经元,并允许在单个处理器上对心脏功能进行数值模拟。我们使用 LNODE 在 3 小时的计算中执行全局灵敏度分析和带不确定性量化的参数估计,而且仍然在单个处理器上进行。
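To make the architecture in entry 277 concrete, the following is a minimal sketch of a latent neural ODE: a small fully connected network with 3 hidden layers of 13 neurons gives the time derivative of a latent state, which is then integrated numerically. Only the layer sizes and the 43 parameters come from the abstract. The latent dimension, the tanh activation, the forward Euler integrator, the random untrained weights and all function names are assumptions made for illustration, and this is not the authors' implementation.

```python
import math
import random

LATENT_DIM = 2          # assumed latent state size (not given in the abstract)
N_PARAMS = 43           # number of model parameters, per the abstract
HIDDEN = [13, 13, 13]   # 3 hidden layers with 13 neurons each, per the abstract


def init_mlp(sizes, seed=0):
    """Create random (untrained) weights and zero biases for a dense network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def simulate(layers, z0, params, t_end=1.0, n_steps=100):
    """Integrate dz/dt = f(z, t, params) with forward Euler (assumed scheme)."""
    dt = t_end / n_steps
    z, t = list(z0), 0.0
    trajectory = [list(z)]
    for _ in range(n_steps):
        dz = mlp(layers, z + [t] + list(params))
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        t += dt
        trajectory.append(list(z))
    return trajectory


# Network input: latent state, time, and the 43 parameters.
layers = init_mlp([LATENT_DIM + 1 + N_PARAMS] + HIDDEN + [LATENT_DIM])
params = [0.5] * N_PARAMS   # placeholder parameter values
trajectory = simulate(layers, z0=[0.0] * LATENT_DIM, params=params)
```

A network this small is cheap to evaluate, which fits the abstract's statement that simulations, sensitivity analysis and parameter estimation run on a single processor.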
278. Understanding the errors made by artificial intelligence algorithms in histopathology in terms of patient impact.
了解人工智能算法在组织病理学中所犯的错误对患者的影响。
PMID: 38600151 | DOI: 10.1038/s41746-024-01093-w | 日期: 2024-04-10
摘要: An increasing number of artificial intelligence (AI) tools are moving towards the clinical realm in histopathology and across medicine. The introduction of such tools will bring several benefits to diagnostic specialities, namely increased diagnostic accuracy and efficiency, however, as no AI tool is infallible, their use will inevitably introduce novel errors. These errors made by AI tools are, most fundamentally, misclassifications made by a computational algorithm. Understanding of how these translate into clinical impact on patients is often lacking, meaning true reporting of AI tool safety is incomplete. In this Perspective we consider AI diagnostic tools in histopathology, which are predominantly assessed in terms of technical performance metrics such as sensitivity, specificity and area under the receiver operating characteristic curve. Although these metrics are essential and allow tool comparison, they alone give an incomplete picture of how an AI tool's errors could impact a patient's diagnosis, management and prognosis. We instead suggest assessing and reporting AI tool errors from a pathological and clinical stance, demonstrating how this is done in studies on human pathologist errors, and giving examples where available from pathology and radiology. Although this seems a significant task, we discuss ways to move towards this approach in terms of study design, guidelines and regulation. This Perspective seeks to initiate broader consideration of the assessment of AI tool errors in histopathology and across diagnostic specialities, in an attempt to keep patient safety at the forefront of AI tool development and facilitate safe clinical deployment.
中文摘要: 越来越多的人工智能 (AI) 工具正在走向组织病理学和医学领域的临床领域。此类工具的引入将为诊断专业带来多种好处,即提高诊断准确性和效率,但是,由于没有任何人工智能工具是万无一失的,它们的使用将不可避免地引入新的错误。人工智能工具所犯的这些错误从根本上来说是计算算法造成的错误分类。通常缺乏对这些如何转化为对患者的临床影响的了解,这意味着人工智能工具安全性的真实报告是不完整的。在本视角中,我们考虑组织病理学中的人工智能诊断工具,这些工具主要根据技术性能指标进行评估,例如灵敏度、特异性和接受者操作特征曲线下的面积。尽管这些指标很重要并且可以进行工具比较,但它们本身并不能完整地说明人工智能工具的错误如何影响患者的诊断、管理和预后。相反,我们建议从病理和临床立场评估和报告人工智能工具错误,展示在人类病理学家错误的研究中如何做到这一点,并给出病理学和放射学中可用的示例。尽管这似乎是一项重要的任务,但我们讨论了在研究设计、指南和监管方面实现这一方法的方法。本视角旨在对组织病理学和跨诊断专业的人工智能工具错误评估进行更广泛的考虑,试图将患者安全置于人工智能工具开发的最前沿,并促进安全的临床部署。
279. The potential for artificial intelligence to transform healthcare: perspectives from international health leaders.
人工智能改变医疗保健的潜力:国际卫生领袖的观点。
PMID: 38594477 | DOI: 10.1038/s41746-024-01097-6 | 日期: 2024-04-09
摘要: Artificial intelligence (AI) has the potential to transform care delivery by improving health outcomes, patient safety, and the affordability and accessibility of high-quality care. AI will be critical to building an infrastructure capable of caring for an increasingly aging population, utilizing an ever-increasing knowledge of disease and options for precision treatments, and combatting workforce shortages and burnout of medical professionals. However, we are not currently on track to create this future. This is in part because the health data needed to train, test, use, and surveil these tools are generally neither standardized nor accessible. There is also universal concern about the ability to monitor health AI tools for changes in performance as they are implemented in new places, used with diverse populations, and over time as health data may change. The Future of Health (FOH), an international community of senior health care leaders, collaborated with the Duke-Margolis Institute for Health Policy to conduct a literature review, expert convening, and consensus-building exercise around this topic. This commentary summarizes the four priority action areas and recommendations for health care organizations and policymakers across the globe that FOH members identified as important for fully realizing AI's potential in health care: improving data quality to power AI, building infrastructure to encourage efficient and trustworthy development and evaluations, sharing data for better AI, and providing incentives to accelerate the progress and impact of AI.
中文摘要: 人工智能 (AI) 有潜力通过改善健康结果、患者安全以及高质量护理的可负担性和可及性来改变护理服务。人工智能对于建设能够照顾日益老龄化的人口的基础设施、利用不断增长的疾病知识和精准治疗选择、解决劳动力短缺和医疗专业人员的倦怠问题至关重要。然而,我们目前还没有走上创造这个未来的轨道。部分原因是训练、测试、使用和监视这些工具所需的健康数据通常既不标准化也不可访问。人们还普遍担心健康人工智能工具在新的地方实施、在不同人群中使用以及随着时间的推移健康数据可能发生变化,监控其性能变化的能力。健康未来 (FOH) 是一个由高级医疗保健领导者组成的国际社区,与杜克-马戈利斯健康政策研究所合作,围绕这一主题进行了文献综述、专家召集和建立共识活动。本评论总结了 FOH 成员认为对充分发挥人工智能在医疗保健领域的潜力非常重要的四个优先行动领域和对全球医疗保健组织和政策制定者的建议:提高数据质量以推动人工智能,建设基础设施以鼓励高效和值得信赖的开发和评估,共享数据以实现更好的人工智能,以及提供激励措施以加速人工智能的进步和影响。
280. Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis.
皮肤癌诊断中的人机交互:系统评价和荟萃分析。
PMID: 38594408 | DOI: 10.1038/s41746-024-01031-w | 日期: 2024-04-09
摘要: The development of diagnostic tools for skin cancer based on artificial intelligence (AI) is increasing rapidly and will likely soon be widely implemented in clinical use. Even though the performance of these algorithms is promising in theory, there is limited evidence on the impact of AI assistance on human diagnostic decisions. Therefore, the aim of this systematic review and meta-analysis was to study the effect of AI assistance on the accuracy of skin cancer diagnosis. We searched PubMed, Embase, IEEE Xplore, Scopus and conference proceedings for articles from 1/1/2017 to 11/8/2022. We included studies comparing the performance of clinicians diagnosing at least one skin cancer with and without deep learning-based AI assistance. Summary estimates of sensitivity and specificity of diagnostic accuracy with versus without AI assistance were computed using a bivariate random effects model. We identified 2983 studies, of which ten were eligible for meta-analysis. For clinicians without AI assistance, pooled sensitivity was 74.8% (95% CI 68.6-80.1) and specificity was 81.5% (95% CI 73.9-87.3). For AI-assisted clinicians, the overall sensitivity was 81.1% (95% CI 74.4-86.5) and specificity was 86.1% (95% CI 79.2-90.9). AI benefitted medical professionals of all experience levels in subgroup analyses, with the largest improvement among non-dermatologists. No publication bias was detected, and sensitivity analysis revealed that the findings were robust. AI in the hands of clinicians has the potential to improve diagnostic accuracy in skin cancer diagnosis. Given that most studies were conducted in experimental settings, we encourage future studies to further investigate these potential benefits in real-life settings.
中文摘要: 基于人工智能(AI)的皮肤癌诊断工具的开发正在迅速增长,并且可能很快就会广泛应用于临床。尽管这些算法的性能在理论上很有希望,但关于人工智能辅助对人类诊断决策的影响的证据有限。因此,本次系统评价和荟萃分析的目的是研究人工智能辅助对皮肤癌诊断准确性的影响。我们检索了 PubMed、Embase、IEEE Xplore、Scopus 和会议记录中 2017 年 1 月 1 日至 2022 年 8 月 11 日的文章。我们纳入的研究比较了临床医生在有或没有基于深度学习的人工智能辅助的情况下诊断至少一种皮肤癌的表现。使用双变量随机效应模型计算了有人工智能辅助与无人工智能辅助时诊断准确性的敏感性和特异性的汇总估计。我们确定了 2983 项研究,其中 10 项适合进行荟萃分析。对于没有 AI 辅助的临床医生,汇总敏感性为 74.8% (95% CI 68.6-80.1),特异性为 81.5% (95% CI 73.9-87.3)。对于人工智能辅助的临床医生,总体敏感性为 81.1% (95% CI 74.4-86.5),特异性为 86.1% (95% CI 79.2-90.9)。在亚组分析中,人工智能使所有经验水平的医疗专业人员受益,其中非皮肤科医生的进步最大。没有发现发表偏倚,敏感性分析表明研究结果是稳健的。临床医生手中的人工智能有可能提高皮肤癌诊断的准确性。鉴于大多数研究都是在实验环境中进行的,我们鼓励未来的研究进一步研究现实生活中的这些潜在好处。
281. The algorithm journey map: a tangible approach to implementing AI solutions in healthcare.
算法旅程图:在医疗保健领域实施人工智能解决方案的切实方法。
PMID: 38594344 | DOI: 10.1038/s41746-024-01061-4 | 日期: 2024-04-09
摘要: When integrating AI tools in healthcare settings, complex interactions between technologies and primary users are not always fully understood or visible. This deficient and ambiguous understanding hampers attempts by healthcare organizations to adopt AI/ML, and it also creates new challenges for researchers to identify opportunities for simplifying adoption and developing best practices for the use of AI-based solutions. Our study fills this gap by documenting the process of designing, building, and maintaining an AI solution called SepsisWatch at Duke University Health System. We conducted 20 interviews with the team of engineers and scientists that led the multi-year effort to build the tool, integrate it into practice, and maintain the solution. This "Algorithm Journey Map" enumerates all social and technical activities throughout the AI solution's procurement, development, integration, and full lifecycle management. In addition to mapping the "who?" and "what?" of the adoption of the AI tool, we also show several 'lessons learned' throughout the algorithm journey maps including modeling assumptions, stakeholder inclusion, and organizational structure. In doing so, we identify generalizable insights about how to recognize and navigate barriers to AI/ML adoption in healthcare settings. We expect that this effort will further the development of best practices for operationalizing and sustaining ethical principles in algorithmic systems.
中文摘要: 在医疗保健环境中集成人工智能工具时,技术和主要用户之间复杂的交互并不总是能被完全理解或看到。这种不充分且模糊的理解阻碍了医疗保健组织采用人工智能/机器学习的尝试,也为研究人员寻找简化采用和开发使用基于人工智能的解决方案的最佳实践的机会带来了新的挑战。我们的研究通过记录杜克大学医疗系统设计、构建和维护名为 SepsisWatch 的人工智能解决方案的过程来填补这一空白。我们对工程师和科学家团队进行了 20 次采访,他们领导了多年来构建该工具、将其集成到实践中并维护解决方案的工作。这张"算法旅程图"列举了整个人工智能解决方案的采购、开发、集成和全生命周期管理的所有社会和技术活动。除了梳理采用该人工智能工具过程中的"谁"和"什么"之外,我们还展示了整个算法旅程图中的一些"经验教训",包括建模假设、利益相关者包容性和组织结构。在此过程中,我们确定了有关如何识别和克服医疗保健环境中采用人工智能/机器学习的障碍的普遍见解。我们期望这项工作将进一步发展在算法系统中实施和维护道德原则的最佳实践。
282. Evaluating large language models as agents in the clinic.
评估大型语言模型作为临床代理。
PMID: 38570554 | DOI: 10.1038/s41746-024-01083-y | 日期: 2024-04-03
摘要: Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent "agents" that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model's ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as "Artificial Intelligence Structured Clinical Examinations" ("AI-SCE"), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings.
中文摘要: 大语言模型 (LLM) 的最新发展为医疗保健带来了从信息合成到临床决策支持的机遇。这些 LLM 不仅能够对语言进行建模,还可以充当智能"代理",在开放式对话中与利益相关者互动,甚至影响临床决策。LLM 代理可以在临床环境的高保真模拟中进行建模,并应评估其对临床工作流程的影响,而不是依赖于衡量模型处理临床数据或回答标准化测试问题的能力的基准。这些评估框架,我们称之为"人工智能结构化临床检查"("AI-SCE"),可以借鉴类似技术,其中机器在具有多个利益相关者的动态环境中以不同程度的自治运行,例如自动驾驶汽车。开发这些强大的、真实的临床评估对于在医疗环境中部署 LLM 代理至关重要。
283. Contemporary attitudes and beliefs on coronary artery calcium from social media using artificial intelligence.
使用人工智能从社交媒体了解当代对冠状动脉钙的态度和信念。
PMID: 38555387 | DOI: 10.1038/s41746-024-01077-w | 日期: 2024-03-30
摘要: Coronary artery calcium (CAC) is a powerful tool to refine atherosclerotic cardiovascular disease (ASCVD) risk assessment. Despite its growing interest, contemporary public attitudes around CAC are not well-described in literature and have important implications for shared decision-making around cardiovascular prevention. We used an artificial intelligence (AI) pipeline consisting of a semi-supervised natural language processing model and unsupervised machine learning techniques to analyze 5,606 CAC-related discussions on Reddit. A total of 91 discussion topics were identified and were classified into 14 overarching thematic groups. These included the strong impact of CAC on therapeutic decision-making, ongoing non-evidence-based use of CAC testing, and the patient perceived downsides of CAC testing (e.g., radiation risk). Sentiment analysis also revealed that most discussions had a neutral (49.5%) or negative (48.4%) sentiment. The results of this study demonstrate the potential of an AI-based approach to analyze large, publicly available social media data to generate insights into public perceptions about CAC, which may help guide strategies to improve shared decision-making around ASCVD management and public health interventions.
中文摘要: 冠状动脉钙 (CAC) 是完善动脉粥样硬化性心血管疾病 (ASCVD) 风险评估的强大工具。尽管人们对 CAC 的兴趣日益浓厚,但当代公众对 CAC 的态度并没有在文献中得到很好的描述,并且对围绕心血管预防的共同决策具有重要影响。我们使用由半监督自然语言处理模型和无监督机器学习技术组成的人工智能 (AI) 管道来分析 Reddit 上的 5,606 个与 CAC 相关的讨论。总共确定了 91 个讨论主题,并分为 14 个总体主题组。其中包括 CAC 对治疗决策的强烈影响、CAC 测试的持续非循证使用以及患者感知到的 CAC 测试的缺点(例如辐射风险)。情绪分析还显示,大多数讨论的情绪是中性(49.5%)或负面(48.4%)。这项研究的结果表明,基于人工智能的方法可以分析大量公开的社交媒体数据,从而深入了解公众对 CAC 的看法,这可能有助于指导改善 ASCVD 管理和公共卫生干预措施的共同决策的策略。
284. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI.
用于评估由生成人工智能支持的医疗保健对话有效性的基础指标。
PMID: 38553625 | DOI: 10.1038/s41746-024-01074-z | 日期: 2024-03-29
摘要: Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, dynamic scheduling of follow-ups, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.
中文摘要: 生成式人工智能将通过将传统的患者护理转变为更加个性化、高效和主动的流程来彻底改变医疗保健服务。聊天机器人作为交互式对话模型,可能会推动医疗保健领域以患者为中心的转型。通过提供各种服务,包括诊断、个性化生活方式建议、动态安排随访和心理健康支持,目标是大幅改善患者的健康结果,同时减轻医疗保健提供者的工作负担。医疗保健应用程序的生命攸关性需要为对话模型建立一套统一且全面的评估指标。针对各种通用大语言模型(LLM)提出的现有评估指标缺乏对医疗和健康概念及其在促进患者福祉方面的重要性的理解。此外,这些指标忽视了以用户为中心的关键方面,包括信任建设、道德、个性化、同理心、用户理解和情感支持。本文的目的是探索最先进的基于 LLM 的评估指标,这些指标特别适用于医疗保健中交互式对话模型的评估。随后,我们提出了一套全面的评估指标,旨在从最终用户的角度全面评估医疗保健聊天机器人的性能。这些指标包括对语言处理能力的评估、对现实世界临床任务的影响以及用户交互式对话的有效性。最后,我们讨论了与定义和实施这些指标相关的挑战,特别强调了目标受众、评估方法和评估过程中涉及的提示技术等混杂因素。
285. Anti- and pro-fibrillatory effects of pulmonary vein isolation gaps in human atrial fibrillation digital twins.
人类心房颤动数字双胞胎中肺静脉隔离间隙的抗颤动和促颤动作用。
PMID: 38532181 | DOI: 10.1038/s41746-024-01075-y | 日期: 2024-03-26
摘要: Although pulmonary vein isolation (PVI) gaps and extrapulmonary vein triggers contribute to recurrence after atrial fibrillation (AF) ablation, their precise mechanisms remain unproven. Our study assessed the impact of PVI gaps on rhythm outcomes using a human AF digital twin. We included 50 patients (76.0% with persistent AF) who underwent catheter ablation with a realistic AF digital twin by integrating computed tomography and electroanatomical mapping. We evaluated the final rhythm status, including AF and atrial tachycardia (AT), across 600 AF episodes, considering factors including PVI level, PVI gap number, and pacing locations. Our findings revealed that antral PVI had a significantly lower ratio of AF at the final rhythm (28% vs. 56%, p = 0.002) than ostial PVI. Increasing PVI gap numbers correlated with an increased ratio of AF at the final rhythm (p < 0.001). Extra-PV induction yielded a higher ratio of AF at the final rhythm than internal PV induction (77.5% vs. 59.0%, p < 0.001). In conclusion, our human AF digital twin model helped assess AF maintenance mechanisms. Clinical trial registration: https://www.clinicaltrials.gov ; Unique identifier: NCT02138695.
中文摘要: 尽管肺静脉隔离(PVI)间隙和肺静脉外触发因素导致房颤(AF)消融后复发,但其确切机制尚未得到证实。我们的研究使用人类 AF 数字孪生评估了 PVI 间隙对心律结果的影响。我们纳入了 50 名接受导管消融的患者(76.0% 患有持续性 AF),并通过整合计算机断层扫描和电解剖标测为其构建了逼真的 AF 数字孪生。我们评估了 600 次 AF 发作的最终心律状态,包括 AF 和房性心动过速 (AT),同时考虑了 PVI 水平、PVI 间隙数和起搏位置等因素。我们的研究结果表明,窦部 PVI 在最终心律时的 AF 比例显着低于窦口 PVI(28% vs. 56%,p = 0.002)。PVI 间隙数的增加与最终心律时 AF 比率的增加相关 (p < 0.001)。肺静脉外诱导在最终心律下产生的 AF 比例高于肺静脉内诱导(77.5% vs. 59.0%,p < 0.001)。总之,我们的人类 AF 数字孪生模型有助于评估 AF 维持机制。临床试验注册:https://www.clinicaltrials.gov;唯一标识符:NCT02138695。
286. A remote digital memory composite to detect cognitive impairment in memory clinic samples in unsupervised settings using mobile devices.
一种远程数字记忆综合指标,可使用移动设备在无人监督的环境中检测记忆门诊样本中的认知障碍。
PMID: 38532080 | DOI: 10.1038/s41746-024-00999-9 | 日期: 2024-03-26
摘要: Remote monitoring of cognition holds the promise to facilitate case-finding in clinical care and the individual detection of cognitive impairment in clinical and research settings. In the context of Alzheimer's disease, this is particularly relevant for patients who seek medical advice due to memory problems. Here, we develop a remote digital memory composite (RDMC) score from an unsupervised remote cognitive assessment battery focused on episodic memory and long-term recall and assess its construct validity, retest reliability, and diagnostic accuracy when predicting MCI-grade impairment in a memory clinic sample and healthy controls. A total of 199 participants were recruited from three cohorts and included as healthy controls (n = 97), individuals with subjective cognitive decline (n = 59), or patients with mild cognitive impairment (n = 43). Participants performed cognitive assessments in a fully remote and unsupervised setting via a smartphone app. The derived RDMC score is significantly correlated with the PACC5 score across participants and demonstrates good retest reliability. Diagnostic accuracy for discriminating memory impairment from no impairment is high (cross-validated AUC = 0.83, 95% CI [0.66, 0.99]) with a sensitivity of 0.82 and a specificity of 0.72. Thus, unsupervised remote cognitive assessments implemented in the neotiv digital platform show good discrimination between cognitively impaired and unimpaired individuals, further demonstrating that it is feasible to complement the neuropsychological assessment of episodic memory with unsupervised and remote assessments on mobile devices. This contributes to recent efforts to implement remote assessment of episodic memory for case-finding and monitoring in large research studies and clinical care.
中文摘要: 认知的远程监测有望促进临床护理中的病例发现以及临床和研究环境中认知障碍的个体检测。在阿尔茨海默病的背景下,这对于因记忆问题而寻求医疗建议的患者尤其重要。在这里,我们基于一套侧重情景记忆和长期回忆的无监督远程成套认知测验开发了远程数字记忆综合(RDMC)分数,并在预测记忆门诊样本和健康对照中的 MCI 级损伤时评估其结构效度、重测信度和诊断准确性。从三个队列中总共招募了 199 名参与者,包括健康对照 (n = 97)、主观认知能力下降的个体 (n = 59) 或轻度认知障碍患者 (n = 43)。参与者通过智能手机应用程序在完全远程和无人监督的环境中进行认知评估。得出的 RDMC 分数与参与者的 PACC5 分数显着相关,并表现出良好的重测信度。区分记忆障碍和无记忆障碍的诊断准确性很高(交叉验证的 AUC = 0.83,95% CI [0.66,0.99]),敏感性为 0.82,特异性为 0.72。因此,neotiv 数字平台中实施的无监督远程认知评估显示出认知受损和未受损个体之间的良好区分,进一步证明通过移动设备上的无监督远程评估来补充情景记忆的神经心理学评估是可行的。这有助于最近在大型研究和临床护理中实施情景记忆远程评估以进行病例发现和监测的努力。
287. The clinician-AI interface: intended use and explainability in FDA-cleared AI devices for medical image interpretation.
临床医生-AI 接口:经 FDA 批准的用于医学图像解读的 AI 设备的预期用途和可解释性。
PMID: 38531952 | DOI: 10.1038/s41746-024-01080-1 | 日期: 2024-03-26
摘要: As applications of AI in medicine continue to expand, there is an increasing focus on integration into clinical practice. An underappreciated aspect of this clinical translation is where the AI fits into the clinical workflow, and in turn, the outputs generated by the AI to facilitate clinician interaction in this workflow. For instance, in the canonical use case of AI for medical image interpretation, the AI could prioritize cases before clinician review or even autonomously interpret the images without clinician review. A related aspect is explainability - does the AI generate outputs to help explain its predictions to clinicians? While many clinical AI workflows and explainability techniques have been proposed, a summative assessment of the current scope in clinical practice is lacking. Here, we evaluate the current state of FDA-cleared AI devices for medical image interpretation assistance in terms of intended clinical use, outputs generated, and types of explainability offered. We create a curated database focused on these aspects of the clinician-AI interface, where we find a high frequency of "triage" devices, notable variability in output characteristics across products, and often limited explainability of AI predictions. Altogether, we aim to increase transparency of the current landscape of the clinician-AI interface and highlight the need to rigorously assess which strategies ultimately lead to the best clinical outcomes.
中文摘要: 随着人工智能在医学中的应用不断扩大,人们越来越关注与临床实践的结合。这种临床转化的一个未被充分重视的方面是人工智能在临床工作流程中所处的位置,以及人工智能为促进临床医生在此工作流程中的互动而生成的输出。例如,在人工智能用于医学图像解读的典型用例中,人工智能可以在临床医生审查之前对病例进行优先级排序,甚至可以在没有临床医生审查的情况下自主解读图像。一个相关的方面是可解释性:人工智能是否会生成输出来帮助向临床医生解释其预测?尽管已经提出了许多临床人工智能工作流程和可解释性技术,但缺乏对当前临床实践范围的总结性评估。在这里,我们从预期临床用途、生成的输出以及提供的可解释性类型方面评估了 FDA 批准的用于医学图像解读辅助的人工智能设备的当前状态。我们创建了一个专门针对临床医生与人工智能接口这些方面的精选数据库,其中我们发现"分诊"设备的频率很高,不同产品的输出特征存在显着差异,并且人工智能预测的可解释性通常有限。总而言之,我们的目标是提高临床医生与人工智能接口当前状况的透明度,并强调需要严格评估哪些策略最终会带来最佳的临床结果。
288. Digital twins for health: a scoping review.
健康数字孪生:范围界定审查。
PMID: 38519626 | DOI: 10.1038/s41746-024-01073-0 | 日期: 2024-03-22
摘要: The use of digital twins (DTs) has proliferated across various fields and industries, with a recent surge in the healthcare sector. The concept of digital twin for health (DT4H) holds great promise to revolutionize the entire healthcare system, including management and delivery, disease treatment and prevention, and health well-being maintenance, ultimately improving human life. The rapid growth of big data and continuous advancement in data science (DS) and artificial intelligence (AI) have the potential to significantly expedite DT research and development by providing scientific expertise, essential data, and robust cybertechnology infrastructure. Although various DT initiatives have been underway in the industry, government, and military, DT4H is still in its early stages. This paper presents an overview of the current applications of DTs in healthcare, examines consortium research centers and their limitations, and surveys the current landscape of emerging research and development opportunities in healthcare. We envision the emergence of a collaborative global effort among stakeholders to enhance healthcare and improve the quality of life for millions of individuals worldwide through pioneering research and development in the realm of DT technology.
中文摘要: 数字孪生 (DT) 的使用已在各个领域和行业激增,最近在医疗保健领域的应用激增。健康数字孪生(DT4H)的概念有望彻底改变整个医疗保健系统,包括管理和交付、疾病治疗和预防以及健康福祉维护,最终改善人类生活。大数据的快速增长以及数据科学(DS)和人工智能(AI)的不断进步有可能通过提供科学专业知识、基本数据和强大的网络技术基础设施来显着加快数字孪生的研究和开发。尽管行业、政府和军队已经开展了各种 DT 计划,但 DT4H 仍处于早期阶段。本文概述了 DT 在医疗保健领域的当前应用,研究了联盟研究中心及其局限性,并调查了医疗保健领域新兴研究和开发机会的现状。我们设想利益相关者之间将展开全球协作,通过 DT 技术领域的开拓性研究和开发,增强全球数百万人的医疗保健并改善生活质量。
289. Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence.
使用生成人工智能模拟合成急性髓系白血病患者的临床试验。
PMID: 38509224 | DOI: 10.1038/s41746-024-01076-x | 日期: 2024-03-20
摘要: Clinical research relies on high-quality patient data, however, obtaining big data sets is costly and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence - CTAB-GAN+ and normalizing flows (NFlow) - to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, that were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.
中文摘要: 临床研究依赖于高质量的患者数据,然而,获取大数据集的成本高昂,而且对现有数据的访问常常受到隐私和监管问题的阻碍。合成数据生成有望有效绕过这些边界,从而简化数据访问和合成控制队列的前景。我们采用两种不同的生成人工智能方法 - CTAB-GAN+ 和标准化流 (NFlow) - 来合成来自 1606 名急性髓性白血病(一种异质性血液恶性肿瘤)患者的患者数据,这些患者在四项多中心临床试验中接受治疗。两种生成模型都准确地捕获了人口统计、实验室、分子和细胞遗传学变量的分布,以及患者结果,在两个合成队列的保真度和可用性方面产生了高绩效得分(每个队列 n = 1606)。生存分析表明原始队列和合成队列之间的生存曲线非常相似。单变量结果分析中保留了变量间的关系,从而可以在我们的合成数据中进行探索性分析。此外,训练样本的隐私受到保护,减少了可能的患者重新识别,我们使用汉明距离对其进行了量化。我们不仅为罕见疾病的多模式临床数据中的合成数据生成提供概念验证,还为公众提供对合成数据集的全面访问,以促进进一步的研究。
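The abstract of entry 289 states that re-identification risk was quantified with Hamming distances but does not describe the procedure. One plausible reading, sketched below under that assumption, is to compute for each synthetic record the Hamming distance to its nearest real record: a distance of 0 flags an exact copy of a training patient. The function names and the toy categorical records are hypothetical and are not taken from the study.

```python
def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of fields")
    return sum(x != y for x, y in zip(a, b))


def nearest_real_distance(synthetic_row, real_rows):
    """Hamming distance from one synthetic record to its closest real record."""
    return min(hamming(synthetic_row, real_row) for real_row in real_rows)


# Toy, made-up categorical records (sex, age band, two mutation flags).
real = [
    ("M", "60-69", "NPM1+", "FLT3-"),
    ("F", "50-59", "NPM1-", "FLT3+"),
]
synthetic = [
    ("M", "60-69", "NPM1+", "FLT3+"),   # differs from the first real record in one field
    ("F", "50-59", "NPM1-", "FLT3+"),   # identical to the second real record
]

distances = [nearest_real_distance(row, real) for row in synthetic]
exact_copies = sum(d == 0 for d in distances)
print(distances, exact_copies)
```

Continuous variables would need binning or a different metric before a comparison like this makes sense; the sketch covers only categorical fields.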
290. Can we learn from an imagined ransomware attack on a hospital at home platform?
我们可以从想象中的针对家庭医院平台的勒索软件攻击中吸取教训吗?
PMID: 38509171 | DOI: 10.1038/s41746-024-01044-5 | 日期: 2024-03-20
摘要: The hospital at home concept integrates key digital medicine technologies and concepts in a single platform approach, with telemedicine, wearables, and sensors. It could bring benefits to patients, who face lower risks from hospital infections and who want to be at home with their loved ones. Moreover, it may lead to efficiency savings, through its seamless integration of data flows, and therefore is likely to be an increasingly implemented model. But what happens when the platform succumbs to exploited platform/infrastructure vulnerabilities or cyber attacks like ransomware that have been weaponized to bring networked systems crashing down? Exploring the attack modes and their consequences could help prioritize adequate safeguards.
中文摘要: 家庭医院概念将关键的数字医学技术和概念与远程医疗、可穿戴设备和传感器集成在一个平台方法中。它可以给患者带来好处,因为他们面临医院感染的风险较低,并且希望与亲人待在家里。此外,它还可以通过数据流的无缝集成来提高效率,因此很可能成为越来越多实施的模型。但是,当平台屈服于被利用的平台/基础设施漏洞或勒索软件等网络攻击(这些攻击已被武器化以导致网络系统崩溃)时,会发生什么?探索攻击模式及其后果有助于优先考虑适当的防护措施。
291. Use of automated conversational agents in improving young population mental health: a scoping review.
使用自动会话代理改善年轻人心理健康:范围界定审查。
PMID: 38503909 | DOI: 10.1038/s41746-024-01072-1 | 日期: 2024-03-19
摘要: Automated conversational agents (CAs) emerged as a promising solution in mental health interventions among young people. Therefore, the objective of this scoping review is to examine the current state of research into fully automated CAs mediated interventions for the emotional component of mental health among young people. Selected databases were searched in March 2023. Included studies were primary research, reporting on development, feasibility/usability, or evaluation of fully automated CAs as a tool to improve the emotional component of mental health among young population. Twenty-five studies were included (N = 1707). Most automated CAs applications were standalone preventions targeting anxiety and depression. Automated CAs were predominantly AI-based chatbots, using text as the main communication channel. Overall, the results of the current scoping review showed that automated CAs mediated interventions for emotional problems are acceptable, engaging and with high usability. However, the results for clinical efficacy are far less conclusive, since almost half of evaluation studies reported no significant effect on emotional mental health outcomes. Based on these findings, it can be concluded that there is a pressing need to improve the existing automated CAs applications to increase their efficacy as well as conducting more rigorous methodological research in this area.
中文摘要: 自动对话代理(CA)成为年轻人心理健康干预的一个有前途的解决方案。因此,本次范围审查的目的是考察全自动 CA 介导的年轻人心理健康情绪成分干预措施的研究现状。选定的数据库于 2023 年 3 月进行了检索。纳入的研究包括初步研究、开发报告、可行性/可用性或对全自动 CA 作为改善年轻人心理健康情绪成分的工具的评估。纳入了 25 项研究 (N = 1707)。大多数自动化 CA 应用程序都是针对焦虑和抑郁的独立预防措施。自动化 CA 主要是基于人工智能的聊天机器人,使用文本作为主要通信渠道。总体而言,当前范围界定审查的结果表明,自动化 CA 介导的情绪问题干预措施是可以接受的、有吸引力的且具有高可用性。然而,临床疗效的结果远没有那么确定,因为几乎一半的评估研究报告对情绪心理健康结果没有显着影响。基于这些发现,可以得出结论,迫切需要改进现有的自动化 CA 应用程序以提高其效率,并在该领域进行更严格的方法学研究。
292. Evaluating reliability in wearable devices for sleep staging.
评估可穿戴设备睡眠分期的可靠性。
PMID: 38499793 | DOI: 10.1038/s41746-024-01016-9 | 日期: 2024-03-18
摘要: Sleep is crucial for physical and mental health, but traditional sleep quality assessment methods have limitations. This scoping review analyzes 35 articles from the past decade, evaluating 62 wearable setups with varying sensors, algorithms, and features. Our analysis indicates a trend towards combining accelerometer and photoplethysmography (PPG) data for out-of-lab sleep staging. Devices using only accelerometer data are effective for sleep/wake detection but fall short in identifying multiple sleep stages, unlike those incorporating PPG signals. To enhance the reliability of sleep staging wearables, we propose five recommendations: (1) Algorithm validation with equity, diversity, and inclusion considerations, (2) Comparative performance analysis of commercial algorithms across multiple sleep stages, (3) Exploration of feature impacts on algorithm accuracy, (4) Consistent reporting of performance metrics for objective reliability assessment, and (5) Encouragement of open-source classifier and data availability. Implementing these recommendations can improve the accuracy and reliability of sleep staging algorithms in wearables, solidifying their value in research and clinical settings.
中文摘要: 睡眠对于身心健康至关重要,但传统的睡眠质量评估方法存在局限性。这份范围审查分析了过去十年的 35 篇文章,评估了具有不同传感器、算法和功能的 62 种可穿戴设备。我们的分析表明了将加速度计和光电体积描记法 (PPG) 数据结合起来进行实验室外睡眠分期的趋势。仅使用加速度计数据的设备对于睡眠/唤醒检测有效,但在识别多个睡眠阶段方面存在不足,这与结合 PPG 信号的设备不同。为了提高睡眠分期可穿戴设备的可靠性,我们提出了五项建议:(1)考虑公平性、多样性和包容性的算法验证,(2)跨多个睡眠阶段的商业算法的性能比较分析,(3)探索功能对算法准确性的影响,(4)为客观可靠性评估提供一致的性能指标报告,以及(5)鼓励开源分类器和数据可用性。实施这些建议可以提高可穿戴设备中睡眠分期算法的准确性和可靠性,从而巩固其在研究和临床环境中的价值。
293. Technology-supported behavior change interventions for reducing sodium intake in adults: a systematic review and meta-analysis.
技术支持的减少成人钠摄入量的行为改变干预措施:系统评价和荟萃分析。
PMID: 38499729 | DOI: 10.1038/s41746-024-01067-y | 日期: 2024-03-18
摘要: The effects of technology-supported behavior change interventions for reducing sodium intake on health outcomes in adults are inconclusive. Effective intervention characteristics associated with sodium reduction have yet to be identified. A systematic review and meta-analysis were conducted, searching randomized controlled trials (RCTs) published between January 2000 and April 2023 across 5 databases (PROSPERO: CRD42022357905). Meta-analyses using random-effects models were performed on 24-h urinary sodium (24HUNa), systolic blood pressure (SBP), and diastolic blood pressure (DBP). Subgroup analysis and meta-regression of 24HUNa were performed to identify effective intervention characteristics. Eighteen RCTs involving 3505 participants (51.5% female, mean age 51.6 years) were included. Technology-supported behavior change interventions for reducing sodium intake significantly reduced 24HUNa (mean difference [MD] -0.39 gm/24 h, 95% confidence interval [CI] -0.50 to -0.27; I2 = 24%), SBP (MD -2.67 mmHg, 95% CI -4.06 to -1.29; I2 = 40%), and DBP (MD -1.39 mmHg, 95% CI -2.31 to -0.48; I2 = 31%), compared to control conditions. Interventions delivered more frequently (≤weekly) were associated with a significantly larger effect size in 24HUNa reduction compared to less frequent interventions (>weekly). Other intervention characteristics, such as intervention delivery via instant messaging and participant-family dyad involvement, were associated with larger, albeit non-significant, effect sizes in 24HUNa reduction when compared to other subgroups. Technology-supported behavior change interventions aimed at reducing sodium intake were effective in reducing 24HUNa, SBP, and DBP at post-intervention. Effective intervention characteristics identified in this review should be considered to develop sodium intake reduction interventions and tested in future trials, particularly for its long-term effects.
中文摘要: 技术支持的减少钠摄入量的行为改变干预措施对成人健康结果的影响尚无定论。与减钠相关的有效干预特征尚未确定。我们进行了系统回顾和荟萃分析,检索了 2000 年 1 月至 2023 年 4 月期间在 5 个数据库中发表的随机对照试验 (RCT)(PROSPERO:CRD42022357905)。使用随机效应模型对 24 小时尿钠 (24HUNa)、收缩压 (SBP) 和舒张压 (DBP) 进行荟萃分析。对 24HUNa 进行亚组分析和荟萃回归以确定有效的干预特征。纳入了 18 项随机对照试验,涉及 3505 名参与者(51.5% 为女性,平均年龄 51.6 岁)。与对照条件相比,技术支持的减少钠摄入量的行为改变干预措施显著降低了 24HUNa(平均差 [MD] -0.39 gm/24 h,95% 置信区间 [CI] -0.50 至 -0.27;I2 = 24%)、SBP(MD -2.67 mmHg,95% CI -4.06 至 -1.29;I2 = 40%)和 DBP(MD -1.39 mmHg,95% CI -2.31 至 -0.48;I2 = 31%)。与频率较低的干预措施(>每周)相比,频率较高(≤每周)的干预措施与 24HUNa 减少效果显著更大相关。与其他亚组相比,其他干预特征,例如通过即时消息传递和参与者家庭二人参与进行干预,与 24HUNa 减少的较大(尽管不显著)效应大小相关。旨在减少钠摄入量的技术支持的行为改变干预措施可有效降低干预后的 24HUNa、SBP 和 DBP。应考虑本次综述中确定的有效干预措施特征来制定减少钠摄入量的干预措施,并在未来的试验中进行测试,特别是其长期效果。
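As an illustration of the random-effects pooling mentioned in entry 293, the following minimal sketch uses the DerSimonian-Laird estimator. The abstract does not say which estimator or software the authors used, so the estimator choice, the function name `random_effects_pool`, and the 1.96 normal quantile for the 95% interval are all assumptions made for illustration.

```python
import math


def random_effects_pool(effects, std_errors):
    """Pool per-study mean differences with the DerSimonian-Laird estimator.

    Hypothetical helper for illustration; not the review authors' code.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures the observed between-study dispersion.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }
```

Calling `random_effects_pool` with a list of study-level mean differences and their standard errors returns the pooled difference, an approximate 95% interval, and the heterogeneity statistics (tau² and I²) of the kind reported in the abstract.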
294. Effect of clinical decision support for severe hypercholesterolemia on low-density lipoprotein cholesterol levels.
严重高胆固醇血症的临床决策支持对低密度脂蛋白胆固醇水平的影响。
PMID: 38499608 | DOI: 10.1038/s41746-024-01069-w | 日期: 2024-03-18
摘要: Severe hypercholesterolemia/possible familial hypercholesterolemia (FH) is relatively common but underdiagnosed and undertreated. We investigated whether implementing clinical decision support (CDS) was associated with lower low-density lipoprotein cholesterol (LDL-C) in patients with severe hypercholesterolemia/possible FH (LDL-C ≥ 190 mg/dL). As part of a pre-post implementation study, a CDS alert was deployed in the electronic health record (EHR) in a large health system comprising 3 main sites, 16 hospitals and 53 clinics. Data were collected for 3 months before ('silent mode') and after ('active mode') its implementation. Clinicians were only able to view the alert in the EHR during active mode. We matched individuals 1:1 in both modes, based on age, sex, and baseline lipid lowering therapy (LLT). The primary outcome was difference in LDL-C between the two groups and the secondary outcome was initiation/intensification of LLT after alert trigger. We identified 800 matched patients in each mode (mean ± SD age 56.1 ± 11.8 y vs. 55.9 ± 11.8 y; 36.0% male in both groups; mean ± SD initial LDL-C 211.3 ± 27.4 mg/dL vs. 209.8 ± 23.9 mg/dL; 11.2% on LLT at baseline in each group). LDL-C levels were 6.6 mg/dL lower (95% CI, -10.7 to -2.5; P = 0.002) in active vs. silent mode. The odds of high-intensity statin use (OR, 1.78; 95% CI, 1.41-2.23; P < 0.001) and LLT initiation/intensification (OR, 1.30, 95% CI, 1.06-1.58, P = 0.01) were higher in active vs. silent mode. Implementation of a CDS was associated with lowering of LDL-C levels in patients with severe hypercholesterolemia/possible FH, likely due to higher rates of clinician led LLT initiation/intensification.
中文摘要: 严重高胆固醇血症/可能的家族性高胆固醇血症(FH)相对常见,但诊断和治疗不足。我们研究了实施临床决策支持 (CDS) 是否与重度高胆固醇血症/可能 FH (LDL-C ≥ 190 mg/dL) 患者的低密度脂蛋白胆固醇 (LDL-C) 降低相关。作为实施前后研究的一部分,CDS 警报被部署在由 3 个主要站点、16 家医院和 53 家诊所组成的大型卫生系统的电子健康记录 (EHR) 中。数据收集于实施前("静默模式")和实施后("主动模式")3 个月。临床医生只能在主动模式下查看 EHR 中的警报。我们根据年龄、性别和基线降脂治疗 (LLT) 在两种模式下对个体进行 1:1 匹配。主要结局是两组之间 LDL-C 的差异,次要结局是警报触发后 LLT 的启动/强化。我们在每种模式下确定了 800 名匹配的患者(平均±SD 年龄为 56.1±11.8 岁 vs 55.9±11.8 岁;两组中男性均为 36.0%;平均±SD 初始 LDL-C 为 211.3±27.4 mg/dL vs. 209.8±23.9mg/dL;每组基线时 LLT 为 11.2%)。与静默模式相比,主动模式下 LDL-C 水平降低 6.6mg/dL(95% CI,-10.7 至 -2.5;P = 0.002)。与静默模式相比,高强度他汀类药物使用(OR,1.78;95% CI,1.41-2.23;P < 0.001)和 LLT 启动/强化(OR,1.30,95% CI,1.06-1.58,P = 0.01)的几率较高。 CDS 的实施与严重高胆固醇血症/可能 FH 患者的 LDL-C 水平降低相关,这可能是由于临床医生主导的 LLT 启动/强化率较高。
295. Dynamic associations between glucose and ecological momentary cognition in Type 1 Diabetes.
1 型糖尿病中葡萄糖与生态瞬时认知之间的动态关联。
PMID: 38499605 | DOI: 10.1038/s41746-024-01036-5 | 日期: 2024-03-18
摘要: Type 1 diabetes (T1D) is a chronic condition characterized by glucose fluctuations. Laboratory studies suggest that cognition is reduced when glucose is very low (hypoglycemia) and very high (hyperglycemia). Until recently, technological limitations prevented researchers from understanding how naturally-occurring glucose fluctuations impact cognitive fluctuations. This study leveraged advances in continuous glucose monitoring (CGM) and cognitive ecological momentary assessment (EMA) to characterize dynamic, within-person associations between glucose and cognition in naturalistic environments. Using CGM and EMA, we obtained intensive longitudinal measurements of glucose and cognition (processing speed, sustained attention) in 200 adults with T1D. First, we used hierarchical Bayesian modeling to estimate dynamic, within-person associations between glucose and cognition. Consistent with laboratory studies, we hypothesized that cognitive performance would be reduced at low and high glucose, reflecting cognitive vulnerability to glucose fluctuations. Second, we used data-driven lasso regression to identify clinical characteristics that predicted individual differences in cognitive vulnerability to glucose fluctuations. Large glucose fluctuations were associated with slower and less accurate processing speed, although slight glucose elevations (relative to person-level means) were associated with faster processing speed. Glucose fluctuations were not related to sustained attention. Seven clinical characteristics predicted individual differences in cognitive vulnerability to glucose fluctuations: age, time in hypoglycemia, lifetime severe hypoglycemic events, microvascular complications, glucose variability, fatigue, and neck circumference. Results establish the impact of glucose on processing speed in naturalistic environments, suggest that minimizing glucose fluctuations is important for optimizing processing speed, and identify several clinical characteristics that may exacerbate cognitive vulnerability to glucose fluctuations.
中文摘要: 1 型糖尿病 (T1D) 是一种以血糖波动为特征的慢性疾病。实验室研究表明,当血糖非常低(低血糖)和非常高(高血糖)时,认知能力会降低。直到最近,技术限制使研究人员无法了解自然发生的葡萄糖波动如何影响认知波动。这项研究利用连续血糖监测 (CGM) 和认知生态瞬时评估 (EMA) 的进步来表征自然环境中血糖与认知之间的动态、人体内关联。使用 CGM 和 EMA,我们对 200 名 1 型糖尿病成人进行了血糖和认知(处理速度、持续注意力)的密集纵向测量。首先,我们使用分层贝叶斯模型来估计葡萄糖和认知之间的动态、人体内关联。与实验室研究一致,我们假设认知能力在低血糖和高血糖下都会降低,反映了认知对血糖波动的脆弱性。其次,我们使用数据驱动的套索回归来识别临床特征,这些特征可以预测血糖波动认知脆弱性的个体差异。较大的葡萄糖波动与较慢且不太准确的处理速度相关,尽管轻微的葡萄糖升高(相对于个人水平的平均值)与较快的处理速度相关。血糖波动与持续注意力无关。七个临床特征预测了对血糖波动的认知脆弱性的个体差异:年龄、低血糖时间、终生严重低血糖事件、微血管并发症、血糖变异性、疲劳和颈围。结果确定了自然环境中葡萄糖对处理速度的影响,表明最小化葡萄糖波动对于优化处理速度很重要,并确定了可能加剧葡萄糖波动认知脆弱性的几种临床特征。
296. Physical activity and sleep changes among children during the COVID-19 pandemic.
COVID-19 大流行期间儿童的体力活动和睡眠发生变化。
PMID: 38493216 | DOI: 10.1038/s41746-024-01041-8 | 日期: 2024-03-16
摘要: Daily routines, including in-person school and extracurricular activities, are important for maintaining healthy physical activity and sleep habits in children. The COVID-19 pandemic significantly disrupted daily routines as in-person school and activities closed to prevent spread of SARS-CoV-2. We aimed to examine and assess differences in objectively measured physical activity levels and sleep patterns from wearable sensors in children with obesity before, during, and after a period of school and extracurricular activity closures associated with the COVID-19 pandemic. We compared average step count and sleep patterns (using the Mann-Whitney U test) before and during the pandemic-associated school closures by using data from activity tracker wristbands (Garmin VivoFit 3). Data were collected from 94 children (aged 5-17) with obesity, who were enrolled in a randomized controlled trial testing a community-based lifestyle intervention for a duration of 12 months. During the period that in-person school and extracurricular activities were closed due to the COVID-19 pandemic, children with obesity experienced objectively measured decreases in physical activity and sleep duration. From March 15, 2020 to March 31, 2021, corresponding with local school closures, average daily step count decreased by 1655 steps. Sleep onset and wake time were delayed by about an hour and 45 min, respectively, while sleep duration decreased by over 12 min as compared with the pre-closure period. Step counts increased with the resumption of in-person activities. These findings provide objective evidence for parents, clinicians, and public health professionals on the importance of in-person daily activities and routines for health behaviors, particularly for children with pre-existing obesity. Trial registration: NCT03339440.
中文摘要: 日常生活,包括现场学校和课外活动,对于保持儿童健康的身体活动和睡眠习惯非常重要。 COVID-19 大流行严重扰乱了日常生活,为防止 SARS-CoV-2 传播,学校和活动都关闭了。我们的目的是检查和评估可穿戴传感器客观测量的肥胖儿童体力活动水平和睡眠模式在与 COVID-19 大流行相关的学校和课外活动关闭之前、期间和之后的差异。我们使用活动追踪腕带 (Garmin VivoFit 3) 的数据,比较了与大流行相关的学校停课之前和期间的平均步数和睡眠模式(使用 Mann-Whitney U 测试)。数据收集自 94 名肥胖儿童(5-17 岁),他们参加了一项随机对照试验,测试为期 12 个月的基于社区的生活方式干预措施。在因 COVID-19 大流行而关闭学校和课外活动期间,肥胖儿童的体力活动和睡眠时间出现了客观测量的减少。从2020年3月15日到2021年3月31日,随着当地学校停课,日均步数减少了1655步。与关闭前相比,入睡时间和醒来时间分别延迟约1小时和45分钟,睡眠时间减少12分钟以上。随着面对面活动的恢复,步数有所增加。这些发现为家长、临床医生和公共卫生专业人员提供了客观证据,证明了日常活动和常规对健康行为的重要性,特别是对于已有肥胖的儿童。试验注册:临床试验注册号:NCT03339440。
297. An aligned framework of actively collected and passively monitored clinical outcome assessments (COAs) for measure selection.
用于措施选择的主动收集和被动监测的临床结果评估 (COA) 的一致框架。
PMID: 38493202 | DOI: 10.1038/s41746-024-01068-x | 日期: 2024-03-16
摘要: Regulators increasingly require clinical outcome assessment (COA) data for approval. COAs can be collected via questionnaires or digital health technologies (DHTs), yet no single resource provides a side-by-side comparison of tools that collect complementary or related COA measures. We propose how to align ontologies for actively collected and passively monitored COAs into a single framework to allow for rapid, evidence-based, and fit-for-purpose measure selection.
中文摘要: 监管机构越来越需要临床结果评估 (COA) 数据进行批准。 COA 可以通过问卷或数字健康技术 (DHT) 收集,但没有任何单一资源可以对收集补充或相关 COA 措施的工具进行并列比较。我们提出如何将主动收集和被动监控的 COA 本体整合到一个框架中,以实现快速、基于证据和适合目的的措施选择。
298. Indigenous data governance approaches applied in research using routinely collected health data: a scoping review.
使用常规收集的健康数据在研究中应用本土数据治理方法:范围界定审查。
PMID: 38491156 | DOI: 10.1038/s41746-024-01070-3 | 日期: 2024-03-15
摘要: Globally, there is a growing acknowledgment of Indigenous Peoples' rights to control data related to their communities. This is seen in the development of Indigenous Data Governance standards. As health data collection increases, it's crucial to apply these standards in research involving Indigenous communities. Our study, therefore, aims to systematically review research using routinely collected health data of Indigenous Peoples, understanding the Indigenous Data Governance approaches and the associated advantages and challenges. We searched electronic databases for studies from 2013 to 2022, resulting in 85 selected articles. Of these, 65 (77%) involved Indigenous Peoples in the research, and 60 (71%) were authored by Indigenous individuals or organisations. While most studies (93%) provided ethical approval details, only 18 (21%) described Indigenous guiding principles, 35 (41%) reported on data sovereignty, and 28 (33%) addressed consent. This highlights the increasing focus on Indigenous Data Governance in utilising health data. Leveraging existing data sources in line with Indigenous data governance principles is vital for better understanding Indigenous health outcomes.
中文摘要: 在全球范围内,人们越来越认识到原住民有权控制与其社区相关的数据。这可以从本土数据治理标准的制定中看出。随着健康数据收集的增加,将这些标准应用于涉及土著社区的研究至关重要。因此,我们的研究旨在系统地回顾使用常规收集的原住民健康数据的研究,了解原住民数据治理方法以及相关的优势和挑战。我们检索了 2013 年至 2022 年研究的电子数据库,筛选出 85 篇文章。其中,65 项 (77%) 涉及原住民参与研究,60 项 (71%) 由原住民个人或组织撰写。虽然大多数研究 (93%) 提供了伦理批准细节,但只有 18 项 (21%) 描述了原住民指导原则,35 项 (41%) 报告了数据主权,28 项 (33%) 涉及同意。这凸显了在利用健康数据方面对本土数据治理的日益关注。根据土著数据治理原则利用现有数据源对于更好地了解土著健康结果至关重要。
299. The lucent yet opaque challenge of regulating artificial intelligence in radiology.
放射学领域人工智能监管的透明但不透明的挑战。
PMID: 38491126 | DOI: 10.1038/s41746-024-01071-2 | 日期: 2024-03-15
300. Informing immunotherapy with multi-omics driven machine learning.
通过多组学驱动的机器学习为免疫治疗提供信息。
PMID: 38486092 | DOI: 10.1038/s41746-024-01043-6 | 日期: 2024-03-14
摘要: Progress in sequencing technologies and clinical experiments has revolutionized immunotherapy for solid and hematologic malignancies. However, the benefits of immunotherapy are limited to specific patient subsets, posing challenges for broader application. To improve its effectiveness, identifying biomarkers that can predict patient response is crucial. Machine learning (ML) plays a pivotal role in harnessing multi-omic cancer datasets and unlocking new insights into immunotherapy. This review provides an overview of cutting-edge ML models applied to omics data for immunotherapy analysis, including immunotherapy response prediction and immunotherapy-relevant tumor microenvironment identification. We elucidate how ML leverages diverse data types to identify significant biomarkers, enhance our understanding of immunotherapy mechanisms, and optimize the decision-making process. Additionally, we discuss current limitations and challenges of ML in this rapidly evolving field. Finally, we outline future directions aimed at overcoming these barriers and improving the efficiency of ML in immunotherapy research.
中文摘要: 测序技术和临床实验的进步彻底改变了实体瘤和血液恶性肿瘤的免疫治疗。然而,免疫疗法的好处仅限于特定的患者亚群,这给更广泛的应用带来了挑战。为了提高其有效性,识别可以预测患者反应的生物标志物至关重要。机器学习 (ML) 在利用多组学癌症数据集和解锁免疫治疗新见解方面发挥着关键作用。本综述概述了应用于免疫治疗分析组学数据的前沿机器学习模型,包括免疫治疗反应预测和免疫治疗相关的肿瘤微环境识别。我们阐明机器学习如何利用不同的数据类型来识别重要的生物标志物,增强我们对免疫治疗机制的理解并优化决策过程。此外,我们还讨论了机器学习在这个快速发展的领域中当前的局限性和挑战。最后,我们概述了旨在克服这些障碍并提高免疫治疗研究中机器学习效率的未来方向。
301. Characterising user engagement with mHealth for chronic disease self-management and impact on machine learning performance.
描述用户与移动健康的互动,以进行慢性病自我管理以及对机器学习性能的影响。
PMID: 38472270 | DOI: 10.1038/s41746-024-01063-2 | 日期: 2024-03-12
摘要: Mobile Health (mHealth) has the potential to be transformative in the management of chronic conditions. Machine learning can leverage self-reported data collected with apps to predict periods of increased health risk, alert users, and signpost interventions. Despite this, mHealth must balance the treatment burden of frequent self-reporting against predictive performance and safety. Here we report how user engagement with a widely used and clinically validated mHealth app, myCOPD (designed for the self-management of Chronic Obstructive Pulmonary Disease), directly impacts the performance of a machine learning model predicting an acute worsening of condition (i.e., exacerbations). We classify how users typically engage with myCOPD, finding that 60.3% of users engage frequently; however, less frequent users can show transitional engagement (18.4%), becoming more engaged immediately (<21 days) before exacerbating. Machine learning performed better for users who engaged the most; however, this performance decrease can be mostly offset for less frequent users who engage more near exacerbation. We conduct interviews and focus groups with myCOPD users, highlighting digital diaries and disease acuity as key factors for engagement. Users of mHealth can feel overburdened when self-reporting data necessary for predictive modelling, and confidence in recognising exacerbations is a significant barrier to accurate self-reported data. We demonstrate that users of mHealth should be encouraged to engage when they notice changes to their condition (rather than clinically defined symptoms) to achieve data that is still predictive for machine learning, while reducing the likelihood of disengagement through desensitisation.
中文摘要: 移动医疗(mHealth)有可能在慢性病的管理方面带来变革。机器学习可以利用应用程序收集的自我报告数据来预测健康风险增加的时期、提醒用户和路标干预措施。尽管如此,移动医疗必须平衡频繁自我报告和预测性能与安全性的治疗负担。在这里,我们报告了用户对广泛使用且经过临床验证的移动健康应用程序 myCOPD(专为慢性阻塞性肺疾病的自我管理而设计)的参与如何直接影响机器学习模型预测病情急性恶化(即病情加重)的性能。我们对用户通常如何参与 myCOPD 进行了分类,发现 60.3% 的用户频繁参与,然而,频率较低的用户可以表现出过渡性参与(18.4%),在恶化之前立即变得更加参与( < 21 天)。对于参与最多的用户来说,机器学习的表现更好,但是,对于参与程度更接近恶化的不太频繁的用户来说,这种性能下降可以在很大程度上得到抵消。我们对 myCOPD 用户进行访谈和焦点小组讨论,强调数字日记和疾病敏锐度是参与的关键因素。当预测建模所需的自我报告数据和识别病情加重的信心成为准确自我报告数据的重大障碍时,移动医疗的用户可能会感到负担过重。我们证明,当移动医疗用户注意到自己的病情(而不是临床定义的症状)发生变化时,应该鼓励他们参与进来,以获得仍然可以预测机器学习的数据,同时通过脱敏来减少脱离的可能性。
302. Modeling multiple sclerosis using mobile and wearable sensor data.
使用移动和可穿戴传感器数据对多发性硬化症进行建模。
PMID: 38467710 | DOI: 10.1038/s41746-024-01025-8 | 日期: 2024-03-11
摘要: Multiple sclerosis (MS) is a neurological disease of the central nervous system that is the leading cause of non-traumatic disability in young adults. Clinical laboratory tests and neuroimaging studies are the standard methods to diagnose and monitor MS. However, due to infrequent clinic visits, it is fundamental to identify remote and frequent approaches for monitoring MS, which enable timely diagnosis, early access to treatment, and slowing down disease progression. In this work, we investigate the most reliable, clinically useful, and available features derived from mobile and wearable devices as well as their ability to distinguish people with MS (PwMS) from healthy controls, recognize MS disability and fatigue levels. To this end, we formalize clinical knowledge and derive behavioral markers to characterize MS. We evaluate our approach on a dataset we collected from 55 PwMS and 24 healthy controls for a total of 489 days conducted in free-living conditions. The dataset contains wearable sensor data - e.g., heart rate - collected using an arm-worn device, smartphone data - e.g., phone locks - collected through a mobile application, patient health records - e.g., MS type - obtained from the hospital, and self-reports - e.g., fatigue level - collected using validated questionnaires administered via the mobile application. Our results demonstrate the feasibility of using features derived from mobile and wearable sensors to monitor MS. Our findings open up opportunities for continuous monitoring of MS in free-living conditions and can be used to evaluate and guide the effectiveness of treatments, manage the disease, and identify participants for clinical trials.
中文摘要: 多发性硬化症 (MS) 是一种中枢神经系统神经系统疾病,是导致年轻人非创伤性残疾的主要原因。临床实验室测试和神经影像学研究是诊断和监测多发性硬化症的标准方法。然而,由于诊所就诊频率较低,因此确定远程和频繁监测多发性硬化症的方法至关重要,这样可以及时诊断、及早获得治疗并减缓疾病进展。在这项工作中,我们研究了源自移动和可穿戴设备的最可靠、临床上最有用和可用的功能,以及它们区分多发性硬化症患者 (PwMS) 与健康对照者、识别多发性硬化症残疾和疲劳程度的能力。为此,我们将临床知识形式化并推导行为标记来表征多发性硬化症。我们使用从 55 个 PwMS 和 24 个健康对照中收集的数据集来评估我们的方法,该数据集在自由生活条件下进行总共 489 天。该数据集包含使用手臂佩戴设备收集的可穿戴传感器数据(例如心率)、通过移动应用程序收集的智能手机数据(例如手机锁)、从医院获得的患者健康记录(例如 MS 类型)以及自我报告(例如疲劳程度)(使用通过移动应用程序管理的经过验证的问卷收集)。我们的结果证明了使用移动和可穿戴传感器的功能来监测多发性硬化症的可行性。我们的研究结果为在自由生活条件下持续监测多发性硬化症提供了机会,可用于评估和指导治疗的有效性、管理疾病以及确定临床试验的参与者。
303. Bridging the literacy gap for surgical consents: an AI-human expert collaborative approach.
缩小手术同意的文化差距:人工智能与人类专家的协作方法。
PMID: 38459205 | DOI: 10.1038/s41746-024-01039-2 | 日期: 2024-03-08
摘要: Despite the importance of informed consent in healthcare, the readability and specificity of consent forms often impede patients' comprehension. This study investigates the use of GPT-4 to simplify surgical consent forms and introduces an AI-human expert collaborative approach to validate content appropriateness. Consent forms from multiple institutions were assessed for readability and simplified using GPT-4, with pre- and post-simplification readability metrics compared using nonparametric tests. Independent reviews by medical authors and a malpractice defense attorney were conducted. Finally, GPT-4's potential for generating de novo procedure-specific consent forms was assessed, with forms evaluated using a validated 8-item rubric and expert subspecialty surgeon review. Analysis of 15 academic medical centers' consent forms revealed significant reductions in average reading time, word rarity, and passive sentence frequency (all P < 0.05) following GPT-4-facilitated simplification. Readability improved from an average college freshman to an 8th-grade level (P = 0.004), matching the average American's reading level. Medical and legal sufficiency consistency was confirmed. GPT-4 generated procedure-specific consent forms for five varied surgical procedures at an average 6th-grade reading level. These forms received perfect scores on a standardized consent form rubric and withstood scrutiny upon expert subspecialty surgeon review. This study demonstrates the first AI-human expert collaboration to enhance surgical consent forms, significantly improving readability without sacrificing clinical detail. Our framework could be extended to other patient communication materials, emphasizing clear communication and mitigating disparities related to health literacy barriers.
中文摘要: 尽管知情同意在医疗保健中很重要,但同意书的可读性和特异性常常妨碍患者的理解。本研究调查了使用 GPT-4 来简化手术同意书,并引入了人工智能-人类专家协作方法来验证内容的适当性。使用 GPT-4 评估多个机构的同意书的可读性并进行简化,并使用非参数测试对简化前和简化后的可读性指标进行比较。医学作者和医疗事故辩护律师进行了独立审查。最后,评估了 GPT-4 生成从头手术特定同意书的潜力,并使用经过验证的 8 项标准和专家亚专科外科医生审查来评估表格。对 15 个学术医疗中心同意书的分析显示,经过 GPT-4 简化后,平均阅读时间、单词稀有度和被动句频率显着减少(所有 P< 0.05)。可读性从大学新生的平均水平提高到八年级的水平(P = 0.004),与美国人的平均阅读水平相当。医疗和法律充分性的一致性得到了确认。 GPT-4 以六年级平均阅读水平为五种不同的外科手术生成了特定于手术的同意书。这些表格在标准化同意书评分标准上获得了满分,并经受住了专家亚专科外科医生审查的严格审查。这项研究展示了人工智能与人类专家的首次合作,以增强手术同意书的质量,在不牺牲临床细节的情况下显着提高可读性。我们的框架可以扩展到其他患者沟通材料,强调清晰的沟通并减少与健康素养障碍相关的差异。
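To make the grade-level readability comparison in entry 303 concrete, here is a minimal sketch of the Flesch-Kincaid grade-level formula. The abstract does not name the readability formula or tool the authors used, so choosing Flesch-Kincaid is an assumption; the helper names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Scoring a consent form before and after simplification with `flesch_kincaid_grade` gives the kind of paired grade-level values that a nonparametric test could then compare.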
304. Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks.
从多层共病网络中揭示从摇篮到坟墓的疾病轨迹。
PMID: 38454004 | DOI: 10.1038/s41746-024-01015-w | 日期: 2024-03-07
摘要: We aim to comprehensively identify typical life-spanning trajectories and critical events that impact patients' hospital utilization and mortality. We use a unique dataset containing 44 million records of almost all inpatient stays from 2003 to 2014 in Austria to investigate disease trajectories. We develop a new, multilayer disease network approach to quantitatively analyze how cooccurrences of two or more diagnoses form and evolve over the life course of patients. Nodes represent diagnoses in age groups of ten years; each age group makes up a layer of the comorbidity multilayer network. Inter-layer links encode a significant correlation between diagnoses (p < 0.001, relative risk > 1.5), while intra-layers links encode correlations between diagnoses across different age groups. We use an unsupervised clustering algorithm for detecting typical disease trajectories as overlapping clusters in the multilayer comorbidity network. We identify critical events in a patient's career as points where initially overlapping trajectories start to diverge towards different states. We identified 1260 distinct disease trajectories (618 for females, 642 for males) that on average contain 9 (IQR 2-6) different diagnoses that cover over up to 70 years (mean 23 years). We found 70 pairs of diverging trajectories that share some diagnoses at younger ages but develop into markedly different groups of diagnoses at older ages. The disease trajectory framework can help us to identify critical events as specific combinations of risk factors that put patients at high risk for different diagnoses decades later. Our findings enable a data-driven integration of personalized life-course perspectives into clinical decision-making.
中文摘要: 我们的目标是全面识别影响患者医院利用率和死亡率的典型生命周期轨迹和关键事件。我们使用包含 2003 年至 2014 年奥地利几乎所有住院患者的 4400 万条记录的独特数据集来调查疾病轨迹。我们开发了一种新的多层疾病网络方法来定量分析两种或多种诊断的同时发生如何在患者的生命过程中形成和演变。节点代表十年年龄组的诊断;每个年龄组构成共病多层网络的一层。层间链接编码诊断之间的显着相关性(p < 0.001,相对风险 > 1.5),而层内链接编码不同年龄组诊断之间的相关性。我们使用无监督聚类算法来检测典型疾病轨迹作为多层合并症网络中的重叠聚类。我们将患者职业生涯中的关键事件确定为最初重叠的轨迹开始转向不同状态的点。我们确定了 1260 种不同的疾病轨迹(女性 618 种,男性 642 种),平均包含 9 种 (IQR 2-6) 不同的诊断,涵盖长达 70 年(平均 23 年)。我们发现了 70 对不同的轨迹,它们在年轻时有一些共同的诊断,但在老年时却发展成明显不同的诊断组。疾病轨迹框架可以帮助我们将关键事件识别为风险因素的特定组合,这些风险因素使患者在几十年后面临不同诊断的高风险。我们的研究结果能够以数据驱动的方式将个性化生命历程观点整合到临床决策中。
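The relative-risk criterion for linking diagnoses in entry 304 can be sketched as follows. The abstract gives only the thresholds (p < 0.001, relative risk > 1.5), so this sketch assumes relative risk is defined as observed over expected co-occurrence, omits the significance test entirely, and uses a hypothetical function name `comorbidity_links`; the diagnosis codes in any call are made-up toy input, not data from the study.

```python
from collections import Counter
from itertools import combinations


def comorbidity_links(patients, rr_threshold=1.5):
    """Return diagnosis pairs whose co-occurrence relative risk exceeds the threshold.

    patients: list of sets of diagnosis codes, one set per patient within a
    single age group. Assumed definition: RR = (N_ab * N) / (N_a * N_b).
    The p-value filter mentioned in the abstract is not implemented here.
    """
    n = len(patients)
    single = Counter()
    pair = Counter()
    for dx in patients:
        single.update(dx)                          # patients per diagnosis
        pair.update(combinations(sorted(dx), 2))   # patients per diagnosis pair
    links = {}
    for (a, b), n_ab in pair.items():
        rr = (n_ab * n) / (single[a] * single[b])
        if rr > rr_threshold:
            links[(a, b)] = rr
    return links
```

Running this once per ten-year age group would produce one layer of such a network; how the authors connect layers and test significance is not specified in the abstract.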
305. To warrant clinical adoption AI models require a multi-faceted implementation evaluation.
为了保证临床采用人工智能模型,需要进行多方面的实施评估。
PMID: 38448743 | DOI: 10.1038/s41746-024-01064-1 | 日期: 2024-03-06
摘要: Although artificial intelligence (AI) technology is progressing at an unprecedented rate, our ability to translate these advancements into clinical value and adoption at the bedside remains comparatively limited. This paper reviews the current use of implementation outcomes in randomized controlled trials evaluating AI-based clinical decision support and finds limited adoption. To advance trust and clinical adoption of AI, there is a need to bridge the gap between traditional quantitative metrics and implementation outcomes to better grasp the reasons behind the success or failure of AI systems and improve their translation into clinical value.
中文摘要: 尽管人工智能(AI)技术以前所未有的速度进步,但我们将这些进步转化为临床价值和床边采用的能力仍然相对有限。本文回顾了评估基于人工智能的临床决策支持的随机对照试验中实施结果的当前使用情况,发现采用率有限。为了促进人工智能的信任和临床应用,需要弥合传统定量指标和实施结果之间的差距,以更好地掌握人工智能系统成功或失败背后的原因,并提高其向临床价值的转化。
306. Contextualizing remote fall risk: Video data capture and implementing ethical AI.
远程跌倒风险情境化:视频数据捕获和实施道德人工智能。
PMID: 38448611 | DOI: 10.1038/s41746-024-01050-7 | 日期: 2024-03-06
摘要: Wearable inertial measurement units (IMUs) are being used to quantify gait characteristics that are associated with increased fall risk, but the current limitation is the lack of contextual information that would clarify IMU data. Use of wearable video-based cameras would provide a comprehensive understanding of an individual's habitual fall risk, adding context to clarify abnormal IMU data. Generally, suggesting the use of wearable cameras to capture real-world video is taboo, with clinical and patient apprehension due to ethical and privacy concerns. This perspective proposes that routine use of wearable cameras could be realized within digital medicine through AI-based computer vision models to obfuscate/blur/shade sensitive information while preserving helpful contextual information for a comprehensive patient assessment. Specifically, no person sees the raw video data to understand context; rather, AI interprets the raw video data first to blur sensitive objects and uphold privacy. That may be more routinely achieved than one imagines, as contemporary resources exist. Here, to showcase the potential, an exemplar model is suggested that uses off-the-shelf methods to detect and blur sensitive objects (e.g., people) with an accuracy of 88%. The benefit of the proposed approach includes a more comprehensive understanding of an individual's free-living fall risk (from free-living IMU-based gait) without compromising privacy. More generally, the video and AI approach could be used beyond fall risk to better inform habitual experiences and challenges across a range of clinical cohorts. Medicine is becoming more receptive to wearables as a helpful toolbox, and camera-based devices should be plausible instruments.
中文摘要: 可穿戴惯性测量单元 (IMU) 用于量化与跌倒风险增加相关的步态特征,但目前的限制是缺乏可以澄清 IMU 数据的上下文信息。使用可穿戴视频摄像头可以全面了解个人的习惯性跌倒风险,并添加背景信息以澄清异常的 IMU 数据。一般来说,出于道德和隐私方面的考虑,建议使用可穿戴摄像头捕捉真实世界的视频、临床和患者的忧虑是有禁忌的。这种观点提出,可通过基于人工智能的计算机视觉模型在数字医学中实现可穿戴相机的常规使用,以混淆/模糊/遮蔽敏感信息,同时保留有用的上下文信息以进行全面的患者评估。具体来说,没有人看到原始视频数据来理解上下文,而是人工智能首先解释原始视频数据以模糊敏感对象并维护隐私。由于当代资源的存在,这一目标可能比人们想象的更容易实现。在这里,为了展示/展示潜力,建议通过现成的方法来检测和模糊敏感对象(例如人),准确度为 88%。在这里,所提出的方法的好处包括在不损害隐私的情况下更全面地了解个人的自由生活跌倒风险(来自基于 IMU 的自由生活步态)。更一般地说,视频和人工智能方法可以在跌倒风险之外使用,以更好地告知一系列临床队列的习惯经历和挑战。医学界越来越接受可穿戴设备作为有用的工具箱,基于摄像头的设备应该是可行的工具。
307. Why we should not mistake accuracy of medical AI for efficiency.
为什么我们不应该将医疗人工智能的准确性误认为效率。
PMID: 38438477 | DOI: 10.1038/s41746-024-01047-2 | 日期: 2024-03-04
摘要: In the medical literature, promising results regarding accuracy of medical AI are presented as claims for its potential to increase efficiency. This elision of concepts is misleading and incorrect. First, the promise that AI will reduce human workload rests on a too narrow assessment of what constitutes workload in the first place. Human operators need new skills and must deal with new responsibilities, and these systems need an elaborate infrastructure and support system, all of which contribute to an increased amount of human work; short-term efficiency wins may become sources of long-term inefficiency. Second, for the realization of increased efficiency, the human side of technology implementation is the determining factor. Human knowledge, competencies and trust can foster or undermine efficiency. We conclude that it is important to remain conscious and critical about how we talk about expected benefits of AI, especially when referring to systemic changes based on single studies.
中文摘要: 在医学文献中,关于医疗人工智能准确性的有希望的结果常被表述为其具有提高效率的潜力。这种概念上的混同具有误导性,也是不正确的。首先,人工智能将减少人类工作量的承诺基于对工作量构成的过于狭隘的评估。人类操作员需要新技能并承担新职责,这些系统需要复杂的基础设施和支持系统,所有这些都会增加人类工作量,短期效率的提高可能会成为长期低效率的根源。其次,要实现效率的提高,技术实施中人的因素起决定性作用。人类的知识、能力和信任可以提高或降低效率。我们的结论是,对于如何谈论人工智能的预期收益保持清醒和批判性非常重要,特别是在基于单一研究谈论系统性变化时。
308. The prospect of artificial intelligence to personalize assisted reproductive technology.
人工智能个性化辅助生殖技术的前景。
PMID: 38429464 | DOI: 10.1038/s41746-024-01006-x | 日期: 2024-03-01
摘要: Infertility affects 1-in-6 couples, with repeated intensive cycles of assisted reproductive technology (ART) required by many to achieve a desired live birth. In ART, typically, clinicians and laboratory staff consider patient characteristics, previous treatment responses, and ongoing monitoring to determine treatment decisions. However, the reproducibility, weighting, and interpretation of these characteristics are contentious, and highly operator-dependent, resulting in considerable reliance on clinical experience. Artificial intelligence (AI) is ideally suited to handle, process, and analyze large, dynamic, temporal datasets with multiple intermediary outcomes that are generated during an ART cycle. Here, we review how AI has demonstrated potential for optimization and personalization of key steps in a reproducible manner, including: drug selection and dosing, cycle monitoring, induction of oocyte maturation, and selection of the most competent gametes and embryos, to improve the overall efficacy and safety of ART.
中文摘要: 不孕不育症影响着六分之一的夫妇,许多人需要重复强化的辅助生殖技术 (ART) 周期才能实现理想的活产。在 ART 中,临床医生和实验室工作人员通常会考虑患者特征、既往治疗反应和持续监测来确定治疗决策。然而,这些特征的再现性、权重和解释是有争议的,并且高度依赖于操作者,导致对临床经验的相当大的依赖。人工智能 (AI) 非常适合处理、处理和分析大型、动态、时间数据集,以及 ART 周期中生成的多个中间结果。在这里,我们回顾人工智能如何以可重复的方式展示关键步骤的优化和个性化潜力,包括:药物选择和剂量、周期监测、诱导卵母细胞成熟以及选择最有能力的配子和胚胎,以提高 ART 的整体功效和安全性。
309. Smartphone keyboard dynamics predict affect in suicidal ideation.
智能手机键盘动态可预测有自杀意念者的情感状态。
PMID: 38429434 | DOI: 10.1038/s41746-024-01048-1 | 日期: 2024-03-01
摘要: While digital phenotyping provides opportunities for unobtrusive, real-time mental health assessments, the integration of its modalities is not trivial due to high dimensionalities and discrepancies in sampling frequencies. We provide an integrated pipeline that solves these issues by transforming all modalities to the same time unit, applying temporal independent component analysis (ICA) to high-dimensional modalities, and fusing the modalities with linear mixed-effects models. We applied our approach to integrate high-quality, daily self-report data with BiAffect keyboard dynamics derived from a clinical suicidality sample of mental health outpatients. Applying the ICA to the self-report data (104 participants, 5712 days of data) revealed components related to well-being, anhedonia, and irritability and social dysfunction. Mixed-effects models (55 participants, 1794 days) showed that less phone movement while typing was associated with more anhedonia (β = -0.12, p = 0.00030). We consider this method to be widely applicable to dense, longitudinal digital phenotyping data.
中文摘要: 虽然数字表型分析为不引人注目的实时心理健康评估提供了机会,但由于高维度和采样频率的差异,其各模态的整合并非易事。我们提供了一个集成的管道,通过将所有模态转换为同一时间单位、将时间独立分量分析(ICA)应用于高维模态以及将模态与线性混合效应模型融合来解决这些问题。我们应用我们的方法将高质量的每日自我报告数据与 BiAffect 键盘动态数据相结合,数据来自心理健康门诊患者中有自杀倾向的临床样本。将 ICA 应用到自我报告数据(104 名参与者,5712 天的数据)中,揭示了与幸福感、快感缺失以及易怒和社交功能障碍相关的成分。混合效应模型(55 名参与者,1794 天)表明,打字时手机移动较少与更严重的快感缺失相关(β = -0.12,p = 0.00030)。我们认为这种方法广泛适用于密集的纵向数字表型数据。
310. Why do probabilistic clinical models fail to transport between sites.
为什么概率临床模型无法在不同医疗机构之间迁移。
PMID: 38429353 | DOI: 10.1038/s41746-024-01037-4 | 日期: 2024-03-01
摘要: The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we argue that we should typically expect this failure to transport, and we present common sources for it, divided into those under the control of the experimenter and those inherent to the clinical data-generating process. Of the inherent sources we look a little deeper into site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.
中文摘要: 人工智能在医疗保健领域的日益普及凸显了这样一个问题:在训练机构实现超人类临床表现的计算模型在新机构的表现可能会差很多。在这篇观点文章中,我们认为通常应该预料到这种迁移失败,并且我们提出了它的常见来源,分为实验者可控制的来源和临床数据生成过程固有的来源。在固有来源中,我们更深入地研究了可能影响数据分布的各机构特有的临床实践,并提出了一种潜在的解决方案,旨在将这些实践在数据上留下的印记与疾病因果模式区分开来,而疾病因果模式是概率临床模型的通常目标。
311. From ether to ethernet: ensuring ethical policy in digital transformation of waitlist triage for cardiovascular procedures.
从乙醚到以太网:确保心血管手术候补分诊数字化转型中的道德政策。
PMID: 38424267 | DOI: 10.1038/s41746-024-01019-6 | 日期: 2024-02-29
312. Internet- and mobile-based psychological interventions for post-traumatic stress symptoms in youth: a systematic review and meta-analysis.
基于互联网和移动设备的针对青少年创伤后应激症状的心理干预:系统评价和荟萃分析。
PMID: 38424186 | DOI: 10.1038/s41746-024-01042-7 | 日期: 2024-02-29
摘要: Psychological interventions can help reduce posttraumatic stress symptoms (PTSS) in youth, but many do not seek help. Internet- and mobile-based interventions (IMIs) show promise in expanding treatment options. However, the overall evidence on IMIs in reducing PTSS among youth remains unclear. This systematic review and meta-analysis investigated the efficacy of IMIs in PTSS reduction for youth exposed to traumatic events. A comprehensive literature search was conducted in January 2023 including non-randomized and randomized-controlled trials (RCT) investigating the effects of IMIs on PTSS in youth aged ≤25 years. Six studies were identified with five providing data for the meta-analysis. The majority of studies included youth with different types of trauma irrespective of PTSS severity at baseline (k = 5). We found a small within-group effect in reducing PTSS from baseline to post-treatment (g = -0.39, 95% CrI: -0.67 to -0.11, k = 5; n = 558; 9 comparisons). No effect emerged when comparing the effect of IMIs to control conditions (g = 0.04; 95%-CrI: -0.52 to 0.6, k = 3; n = 768; k = 3; 4 comparisons). Heterogeneity was low between and within studies. All studies showed at least some concerns in terms of risk of bias. Current evidence does not conclusively support the overall efficacy of IMIs in addressing youth PTSS. This review revealed a scarcity of studies investigating IMIs for youth exposed to traumatic events, with most being feasibility studies rather than adequately powered RCTs and lacking a trauma focus. This underscores the demand for more high-quality research.
中文摘要: 心理干预可以帮助减少青少年的创伤后应激症状(PTSS),但许多人并不寻求帮助。基于互联网和移动设备的干预措施(IMIs)有望扩大治疗选择。然而,关于 IMI 减少青少年 PTSS 的总体证据仍不清楚。这项系统回顾和荟萃分析调查了 IMI 在减少遭受创伤事件的青少年的 PTSS 方面的功效。 2023 年 1 月进行了一项全面的文献检索,包括非随机和随机对照试验 (RCT),调查 IMI 对 25 岁以下青少年 PTSS 的影响。确定了六项研究,其中五项为荟萃分析提供数据。大多数研究都包括患有不同类型创伤的青少年,无论基线时 PTSS 的严重程度如何 (k = 5)。我们发现从基线到治疗后减少 PTSS 的组内效应很小(g = -0.39,95% CrI:-0.67 至 -0.11,k = 5;n = 558;9 个比较)。当比较 IMI 与对照条件的效果时,没有出现效果(g = 0.04;95%-CrI:-0.52 至 0.6,k = 3;n = 768;k = 3;4 次比较)。研究之间和研究内部的异质性较低。所有研究都至少显示出一些关于偏倚风险的担忧。目前的证据并未最终支持 IMI 在解决青少年 PTSS 方面的整体功效。该综述揭示了针对遭受创伤事件的青少年调查 IMI 的研究很少,其中大多数是可行性研究,而不是充分有力的随机对照试验,并且缺乏创伤焦点。这凸显了对更多高质量研究的需求。
313. Personalized mood prediction from patterns of behavior collected with smartphones.
根据智能手机收集的行为模式进行个性化情绪预测。
PMID: 38418551 | DOI: 10.1038/s41746-024-01035-6 | 日期: 2024-02-28
摘要: Over the last ten years, there has been considerable progress in using digital behavioral phenotypes, captured passively and continuously from smartphones and wearable devices, to infer depressive mood. However, most digital phenotype studies suffer from poor replicability, often fail to detect clinically relevant events, and use measures of depression that are not validated or suitable for collecting large and longitudinal data. Here, we report high-quality longitudinal validated assessments of depressive mood from computerized adaptive testing paired with continuous digital assessments of behavior from smartphone sensors for up to 40 weeks on 183 individuals experiencing mild to severe symptoms of depression. We apply a combination of cubic spline interpolation and idiographic models to generate individualized predictions of future mood from the digital behavioral phenotypes, achieving high prediction accuracy of depression severity up to three weeks in advance (R2 ≥ 80%) and a 65.7% reduction in the prediction error over a baseline model which predicts future mood based on past depression severity alone. Finally, our study verified the feasibility of obtaining high-quality longitudinal assessments of mood from a clinical population and predicting symptom severity weeks in advance using passively collected digital behavioral data. Our results indicate the possibility of expanding the repertoire of patient-specific behavioral measures to enable future psychiatric research.
中文摘要: 在过去的十年中,利用从智能手机和可穿戴设备被动持续捕获的数字行为表型来推断抑郁情绪方面取得了相当大的进展。然而,大多数数字表型研究的可重复性较差,常常无法检测到临床相关事件,并且使用未经验证或不适合收集大量纵向数据的抑郁症测量方法。在这里,我们报告了对 183 名经历轻度至重度抑郁症状的个体进行长达 40 周的高质量纵向验证抑郁情绪评估,这些评估是通过计算机自适应测试与智能手机传感器的连续数字行为评估相结合而得出的。我们应用三次样条插值和具体模型的组合,从数字行为表型生成未来情绪的个性化预测,提前三周实现抑郁严重程度的高预测精度(R2 ≥ 80%),与仅根据过去抑郁严重程度预测未来情绪的基线模型相比,预测误差减少了 65.7%。最后,我们的研究验证了从临床人群获得高质量纵向情绪评估并使用被动收集的数字行为数据提前几周预测症状严重程度的可行性。我们的结果表明有可能扩大患者特定行为测量的范围,以实现未来的精神病学研究。
314. Understanding inherent influencing factors to digital health adoption in general practices through a mixed-methods analysis.
通过混合方法分析了解全科诊所采用数字医疗的内在影响因素。
PMID: 38413767 | DOI: 10.1038/s41746-024-01049-0 | 日期: 2024-02-27
摘要: Extensive research has shown the potential value of digital health solutions and highlighted the importance of clinicians' adoption. As general practitioners (GPs) are patients' first point of contact, understanding influencing factors to their digital health adoption is especially important to derive personalized practical recommendations. Using a mixed-methods approach, this study broadly identifies adoption barriers and potential improvement strategies in general practices, including the impact of GPs' inherent characteristics - especially their personality - on digital health adoption. Results of our online survey with 216 GPs reveal moderate overall barriers on a 5-point Likert-type scale, with required workflow adjustments (M = 4.13, SD = 0.93), inadequate reimbursement (M = 4.02, SD = 1.02), and high training effort (M = 3.87, SD = 1.01) as substantial barriers. Improvement strategies are considered important overall, with respondents especially wishing for improved interoperability (M = 4.38, SD = 0.81), continued technical support (M = 4.33, SD = 0.91), and improved usability (M = 4.20, SD = 0.88). In our regression model, practice-related characteristics, the expected future digital health usage, GPs' digital affinity, several personality traits, and digital maturity are significant predictors of the perceived strength of barriers. For the perceived importance of improvement strategies, only demographics and usage-related variables are significant predictors. This study provides strong evidence for the impact of GPs' inherent characteristics on barriers and improvement strategies. Our findings highlight the need for comprehensive approaches integrating personal and emotional elements to make digitization in practices more engaging, tangible, and applicable.
中文摘要: 广泛的研究表明了数字健康解决方案的潜在价值,并强调了临床医生采用的重要性。由于全科医生 (GP) 是患者的第一接触点,因此了解影响他们采用数字医疗的因素对于得出个性化的实用建议尤为重要。这项研究采用混合方法,广泛确定了全科诊所中的采用障碍和潜在的改进策略,包括全科医生的固有特征(尤其是他们的个性)对数字医疗采用的影响。我们对 216 名全科医生进行的在线调查结果显示,按照 5 点李克特量表,整体障碍中等,其中需要调整工作流程 (M = 4.13,SD = 0.93)、报销不足 (M = 4.02,SD = 1.02) 以及较高的培训工作量 (M = 3.87,SD = 1.01) 是实质性障碍。总体而言,改进策略被认为很重要,受访者特别希望提高互操作性(M = 4.38,SD = 0.81)、获得持续的技术支持(M = 4.33,SD = 0.91)和提高可用性(M = 4.20,SD = 0.88)。在我们的回归模型中,与诊所相关的特征、预期的未来数字医疗使用、全科医生的数字亲和力、一些个性特征和数字成熟度是感知障碍强度的显著预测因素。对于改进策略的感知重要性,只有人口统计和与使用相关的变量是显著预测因素。这项研究为全科医生的固有特征对障碍和改进策略的影响提供了有力的证据。我们的研究结果强调需要采用整合个人和情感元素的综合方法,以使诊所的数字化更具吸引力、更切实、更适用。
315. The hospital at home in the USA: current status and future prospects.
美国的居家医院(Hospital at Home):现状与未来展望
PMID: 38413704 | DOI: 10.1038/s41746-024-01040-9 | 日期: 2024-02-27
摘要: The annual cost of hospital care services in the US has risen to over $1 trillion despite relatively worse health outcomes compared to similar nations. These trends accentuate a growing need for innovative care delivery models that reduce costs and improve outcomes. HaH-a program that provides patients acute-level hospital care at home-has made significant progress over the past two decades. Technological advancements in remote patient monitoring, wearable sensors, health information technology infrastructure, and multimodal health data processing have contributed to its rise across hospitals. More recently, the COVID-19 pandemic brought HaH into the mainstream, especially in the US, with reimbursement waivers that made the model financially acceptable for hospitals and payors. However, HaH continues to face serious challenges to gain widespread adoption. In this review, we evaluate the peer-reviewed evidence and discuss the promises, challenges, and what it would take to tap into the future potential of HaH.
中文摘要: 尽管与类似国家相比,美国的健康结局相对较差,但每年的医院护理服务费用已上升至超过 1 万亿美元。这些趋势凸显了对降低成本和改善结果的创新护理服务模式的日益增长的需求。居家医院(HaH)是一项在家中为患者提供急性期住院级别护理的计划,在过去二十年中取得了重大进展。远程患者监测、可穿戴传感器、健康信息技术基础设施和多模态健康数据处理方面的技术进步促进了其在各医院的推广。最近,COVID-19 大流行使 HaH 成为主流,尤其是在美国,报销豁免使该模式在财务上为医院和付款人所接受。然而,HaH 要想获得广泛采用,仍然面临着严峻的挑战。在这篇综述中,我们评估了同行评审的证据,并讨论了其前景、挑战以及如何挖掘 HaH 的未来潜力。
316. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer's disease with real-world clinical validation.
利用生成式人工智能,通过现实世界的临床验证,优先考虑阿尔茨海默病的药物再利用候选药物。
PMID: 38409350 | DOI: 10.1038/s41746-024-01038-3 | 日期: 2024-02-26
摘要: Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: (1) Vanderbilt University Medical Center and (2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
中文摘要: 药物再利用是一种有吸引力的替代方案,可以替代昂贵且耗时的新药开发过程,特别是对于有效治疗有限的严重、广泛的疾病,例如阿尔茨海默氏病(AD)。 ChatGPT 等新兴的生成人工智能 (GAI) 技术有望加快科学知识的审查和总结。为了检验使用 GAI 识别药物再利用候选者的可行性,我们反复委托 ChatGPT 提出 20 种最有希望在 AD 中再利用的药物,并在两个大型临床数据集中测试了 65 岁以上暴露和未暴露个体发生 AD 风险的前 10 种药物:(1) 范德比尔特大学医学中心和 (2) All of Us 研究计划。在 ChatGPT 推荐的候选药物中,荟萃分析显示二甲双胍、辛伐他汀和氯沙坦与较低的 AD 风险相关。这些发现表明,GAI 技术可以从广泛的互联网搜索空间中吸收科学见解,有助于确定药物再利用候选者的优先顺序并促进疾病的治疗。
317. Walk, talk, think, see and feel: harnessing the power of digital biomarkers in healthcare.
行走、交谈、思考、观察和感受:利用医疗保健中数字生物标记的力量。
PMID: 38396034 | DOI: 10.1038/s41746-024-01023-w | 日期: 2024-02-24
318. Assessment of ownership of smart devices and the acceptability of digital health data sharing.
评估智能设备的所有权和数字健康数据共享的可接受性。
PMID: 38388660 | DOI: 10.1038/s41746-024-01030-x | 日期: 2024-02-22
摘要: Smart portable devices- smartphones and smartwatches- are rapidly being adopted by the general population, which has brought forward an opportunity to use the large volumes of physiological, behavioral, and activity data continuously being collected by these devices in naturalistic settings to perform research, monitor health, and track disease. While these data can serve to revolutionize health monitoring in research and clinical care, minimal research has been conducted to understand what motivates people to use these devices and their interest and comfort in sharing the data. In this study, we aimed to characterize the ownership and usage of smart devices among patients from an expansive academic health system in the southeastern US and understand their willingness to share data collected by the smart devices. We conducted an electronic survey of participants from an online patient advisory group around smart device ownership, usage, and data sharing. Out of the 3021 members of the online patient advisory group, 1368 (45%) responded to the survey, with 871 female (64%), 826 and 390 White (60%) and Black (29%) participants, respectively, and a slight majority (52%) age 58 and older. Most of the respondents (98%) owned a smartphone and the majority (59%) owned a wearable. In this population, people who identify as female, Hispanic, and Generation Z (age 18-25), and those completing higher education and having full-time employment, were most likely to own a wearable device compared to their demographic counterparts. 50% of smart device owners were willing to share and 32% would consider sharing their smart device data for research purposes. The type of activity data they are willing to share varies by gender, age, education, and employment. Findings from this study can be used to design both equitable and cost-effective digital health studies, leveraging personally-owned smartphones and wearables in representative populations, ultimately enabling the development of equitable digital health technologies.
中文摘要: 智能便携式设备(智能手机和智能手表)正在迅速被大众所采用,这为利用这些设备在自然环境中不断收集的大量生理、行为和活动数据来进行研究、监测健康和跟踪疾病提供了机会。虽然这些数据可以彻底改变研究和临床护理中的健康监测,但很少有研究来了解人们使用这些设备的动机以及他们对共享数据的兴趣和舒适度。在这项研究中,我们的目的是描述美国东南部广阔的学术医疗系统中患者对智能设备的拥有和使用情况,并了解他们分享智能设备收集的数据的意愿。我们对在线患者咨询小组的参与者进行了一项关于智能设备所有权、使用和数据共享的电子调查。在在线患者咨询小组的 3021 名成员中,有 1368 名 (45%) 人回复了调查,其中 871 名女性 (64%)、826 名白人 (60%) 和 390 名黑人 (29%) 参与者,略多 (52%) 年龄在 58 岁及以上。大多数受访者 (98%) 拥有智能手机,大多数 (59%) 拥有可穿戴设备。在这一人群中,与其他人群相比,女性、西班牙裔和 Z 世代(18-25 岁)以及完成高等教育并有全职工作的人最有可能拥有可穿戴设备。 50% 的智能设备所有者愿意分享,32% 的人会考虑出于研究目的分享他们的智能设备数据。他们愿意分享的活动数据类型因性别、年龄、教育和就业而异。这项研究的结果可用于设计公平且具有成本效益的数字健康研究,利用代表性人群中的个人智能手机和可穿戴设备,最终实现公平数字健康技术的发展。
319. Towards trustworthy seizure onset detection using workflow notes.
使用工作流程注释实现可靠的癫痫发作检测。
PMID: 38383884 | DOI: 10.1038/s41746-024-01008-9 | 日期: 2024-02-21
摘要: A major barrier to deploying healthcare AI is trustworthiness. One form of trustworthiness is a model's robustness across subgroups: while models may exhibit expert-level performance on aggregate metrics, they often rely on non-causal features, leading to errors in hidden subgroups. To take a step closer towards trustworthy seizure onset detection from EEG, we propose to leverage annotations that are produced by healthcare personnel in routine clinical workflows-which we refer to as workflow notes-that include multiple event descriptions beyond seizures. Using workflow notes, we first show that by scaling training data to 68,920 EEG hours, seizure onset detection performance significantly improves by 12.3 AUROC (Area Under the Receiver Operating Characteristic) points compared to relying on smaller training sets with gold-standard labels. Second, we reveal that our binary seizure onset detection model underperforms on clinically relevant subgroups (e.g., up to a margin of 6.5 AUROC points between pediatrics and adults), while having significantly higher FPRs (False Positive Rates) on EEG clips showing non-epileptiform abnormalities (+19 FPR points). To improve model robustness to hidden subgroups, we train a multilabel model that classifies 26 attributes other than seizures (e.g., spikes and movement artifacts) and significantly improve overall performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points) and decreasing false positives on non-epileptiform abnormalities (by 8 FPR points). Finally, we find that our multilabel model improves clinical utility (false positives per 24 EEG hours) by a factor of 2×.
中文摘要: 部署医疗人工智能的一个主要障碍是可信度。可信度的一种形式是模型在子组中的稳健性:虽然模型可能在聚合指标上表现出专家级的性能,但它们通常依赖于非因果特征,导致隐藏子组中出现错误。为了更进一步实现脑电图可靠的癫痫发作检测,我们建议利用医护人员在常规临床工作流程中生成的注释(我们称之为工作流程注释),其中包括癫痫发作之外的多个事件描述。使用工作流程注释,我们首先表明,通过将训练数据扩展到 68,920 EEG 小时,与依赖带有黄金标准标签的较小训练集相比,癫痫发作检测性能显着提高了 12.3 AUROC(接收器操作特征下的面积)点。其次,我们发现我们的二元癫痫发作检测模型在临床相关亚组上表现不佳(例如,儿科和成人之间的差距高达 6.5 AUROC 点),而在显示非癫痫样异常的 EEG 剪辑上具有显着较高的 FPR(假阳性率)(+19 FPR 点)。为了提高模型对隐藏子组的鲁棒性,我们训练了一个多标签模型,该模型对癫痫发作以外的 26 个属性(例如尖峰和运动伪影)进行分类,并显着提高整体性能(+5.9 AUROC 点),同时大大提高子组之间的性能(高达 +8.3 AUROC 点)并减少非癫痫样异常的误报(8 FPR 点)。最后,我们发现我们的多标签模型将临床效用(每 24 小时脑电图假阳性)提高了 2 倍。
320. Economic evaluation for medical artificial intelligence: accuracy vs. cost-effectiveness in a diabetic retinopathy screening case.
医疗人工智能的经济评估:糖尿病视网膜病变筛查案例的准确性与成本效益。
PMID: 38383738 | DOI: 10.1038/s41746-024-01032-9 | 日期: 2024-02-21
摘要: Artificial intelligence (AI) models have shown great accuracy in health screening. However, for real-world implementation, high accuracy may not guarantee cost-effectiveness. Improving AI's sensitivity finds more high-risk patients but may raise medical costs while increasing specificity reduces unnecessary referrals but may weaken detection capability. To evaluate the trade-off between AI model performance and the long-running cost-effectiveness, we conducted a cost-effectiveness analysis in a nationwide diabetic retinopathy (DR) screening program in China, comprising 251,535 participants with diabetes over 30 years. We tested a validated AI model in 1100 different diagnostic performances (presented as sensitivity/specificity pairs) and modeled annual screening scenarios. The status quo was defined as the scenario with the most accurate AI performance. The incremental cost-effectiveness ratio (ICER) was calculated for other scenarios against the status quo as cost-effectiveness metrics. Compared to the status quo (sensitivity/specificity: 93.3%/87.7%), six scenarios were cost-saving and seven were cost-effective. To achieve cost-saving or cost-effective, the AI model should reach a minimum sensitivity of 88.2% and specificity of 80.4%. The most cost-effective AI model exhibited higher sensitivity (96.3%) and lower specificity (80.4%) than the status quo. In settings with higher DR prevalence and willingness-to-pay levels, the AI needed higher sensitivity for optimal cost-effectiveness. Urban regions and younger patient groups also required higher sensitivity in AI-based screening. In real-world DR screening, the most accurate AI model may not be the most cost-effective. Cost-effectiveness should be independently evaluated, which is most likely to be affected by the AI's sensitivity.
中文摘要: 人工智能(AI)模型在健康筛查方面表现出了极高的准确性。然而,对于现实世界的实施,高精度可能并不能保证成本效益。提高人工智能的灵敏度可以发现更多的高危患者,但可能会增加医疗成本,而提高特异性可以减少不必要的转诊,但可能会削弱检测能力。为了评估 AI 模型性能和长期运行成本效益之间的权衡,我们对中国全国范围的糖尿病视网膜病变 (DR) 筛查项目进行了成本效益分析,该项目包括 251,535 名患有 30 年以上糖尿病的参与者。我们在 1100 种不同的诊断性能(以敏感性/特异性对表示)中测试了经过验证的 AI 模型,并模拟了年度筛查场景。现状被定义为AI性能最准确的场景。增量成本效益比 (ICER) 是根据现状计算的其他场景的成本效益指标。与现状(敏感性/特异性:93.3%/87.7%)相比,有 6 种方案是节省成本的,7 种方案是具有成本效益的。为了实现节省成本或具有成本效益,AI模型的最低灵敏度应达到88.2%,特异性应达到80.4%。最具成本效益的人工智能模型比现状表现出更高的灵敏度(96.3%)和更低的特异性(80.4%)。在 DR 患病率和支付意愿水平较高的环境中,人工智能需要更高的灵敏度才能实现最佳成本效益。城市地区和年轻患者群体也需要基于人工智能的筛查具有更高的灵敏度。在现实世界的 DR 筛查中,最准确的 AI 模型可能并不是最具成本效益的。成本效益应该独立评估,这最有可能受到人工智能敏感性的影响。
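Illustrative note on entry 320: the abstract compares each screening scenario with the status quo through the incremental cost-effectiveness ratio (ICER), i.e. the difference in cost divided by the difference in health effect. The Python sketch below only shows that arithmetic. The function names, the example figures, the willingness-to-pay threshold, and the labelling rules are assumptions made for illustration; they are not taken from the paper.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost per unit of health effect versus a reference scenario."""
    d_cost = cost_new - cost_ref
    d_effect = effect_new - effect_ref
    if d_effect == 0:
        raise ValueError("equal effects: ICER is undefined")
    return d_cost / d_effect


def classify(d_cost, d_effect, wtp):
    """Label a scenario relative to the reference (assumed, simplified rules).

    - "cost-saving": cheaper and at least as effective as the reference.
    - "cost-effective": more effective, with an ICER at or below the
      willingness-to-pay threshold `wtp`.
    - anything else is labelled "not cost-effective" here; cheaper-but-worse
      scenarios are deliberately not analysed further in this sketch.
    """
    if d_cost < 0 and d_effect >= 0:
        return "cost-saving"
    if d_effect > 0 and d_cost / d_effect <= wtp:
        return "cost-effective"
    return "not cost-effective"


# Hypothetical numbers: costs in currency units, effects in QALYs.
example_icer = icer(cost_new=1200.0, effect_new=10.5, cost_ref=1000.0, effect_ref=10.0)
example_label = classify(d_cost=200.0, d_effect=0.5, wtp=500.0)
```

With these made-up inputs the scenario costs 200 more and gains 0.5 QALY, so the ICER is 400 per QALY, which falls under the assumed threshold of 500.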
321. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs.
提示工程:大语言模型(LLM)回答与循证指南的一致性及其可靠性。
PMID: 38378899 | DOI: 10.1038/s41746-024-01029-4 | 日期: 2024-02-20
摘要: The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their application in clinical medicine is crucial. Prompt engineering has shown potential as an effective method in this regard. To explore the application of prompt engineering in LLMs and to examine the reliability of LLMs, different styles of prompts were designed and used to ask different LLMs about their agreement with the American Academy of Orthopedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with guidelines across different evidence levels for different prompts and assessed the reliability of different prompts by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and a significant performance for strong recommendations, with a total consistency of 77.5%. The reliability of the different LLMs for different prompts was not stable (Fleiss kappa ranged from -0.002 to 0.984). This study revealed that different prompts had variable effects across various models, and the gpt-4-Web with ROT prompt was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
中文摘要: 目前,大语言模型(LLM)在临床医学中的应用正在蓬勃发展。将 LLM 的相关理论知识从计算机科学有效地转移到临床医学中的应用至关重要。提示工程已显示出作为这方面有效方法的潜力。为了探索提示工程在 LLM 中的应用并检验 LLM 的可靠性,研究设计了不同风格的提示,并用其询问不同的 LLM 是否同意美国骨科医师学会(AAOS)骨关节炎(OA)循证指南。每个问题被问了 5 次。我们比较了不同提示在不同证据水平下的回答与指南的一致性,并通过询问同一问题 5 次来评估不同提示的可靠性。带 ROT 提示的 gpt-4-Web 总体一致性最高(62.9%),在强推荐上表现突出,一致性为 77.5%。不同 LLM 在不同提示下的可靠性并不稳定(Fleiss kappa 范围为 -0.002 至 0.984)。这项研究表明,不同的提示在不同的模型中具有不同的效果,其中带 ROT 提示的 gpt-4-Web 一致性最高。适当的提示可以提高对专业医疗问题的回答的准确性。
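Illustrative note on entry 321: the abstract reports reliability as Fleiss' kappa over five repeated askings of the same question. The sketch below implements the standard Fleiss' kappa formula in plain Python, treating each question as a subject and each repeated answer as one rating. The data layout and the example labels are assumptions for illustration; the paper's own data and code are not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated the same number of times.

    ratings: list of lists; ratings[i] holds the n categorical answers given
    to question i (for example, five repeated model responses).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    n_subjects = len(ratings)
    categories = sorted({label for r in ratings for label in r})
    counts = [Counter(r) for r in ratings]

    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of each category.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n) for cat in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("only one category was ever used: kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical answers: two questions, five repeated responses each.
stable = [["agree"] * 5, ["disagree"] * 5]
unstable = [["a", "a", "a", "b", "b"], ["a", "b", "a", "b", "a"]]
```

In this toy data, `stable` has perfect within-question agreement (kappa of 1), while `unstable` agrees less often than chance would predict, which gives a negative kappa, consistent with the lower end of the range quoted in the abstract being slightly below zero.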
322. CancerGPT for few shot drug pair synergy prediction using large pretrained language models.
CancerGPT:利用大型预训练语言模型进行少样本药物对协同作用预测。
PMID: 38374445 | DOI: 10.1038/s41746-024-01024-9 | 日期: 2024-02-19
摘要: Large language models (LLMs) have been shown to have significant potential in few-shot learning across various fields, even with minimal training data. However, their ability to generalize to unseen tasks in more complex fields, such as biology and medicine has yet to be fully evaluated. LLMs can offer a promising alternative approach for biological inference, particularly in cases where structured data and sample size are limited, by extracting prior knowledge from text corpora. Here we report our proposed few-shot learning approach, which uses LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. Our experiments, which involved seven rare tissues from different cancer types, demonstrate that the LLM-based prediction model achieves significant accuracy with very few or zero samples. Our proposed model, the CancerGPT (with ~ 124M parameters), is comparable to the larger fine-tuned GPT-3 model (with ~ 175B parameters). Our research contributes to tackling drug pair synergy prediction in rare tissues with limited data, and also advancing the use of LLMs for biological and medical inference tasks.
中文摘要: 大型语言模型 (LLM) 已被证明在各个领域的少样本学习中具有巨大潜力,即使训练数据很少。然而,它们推广到更复杂领域(例如生物学和医学)中未见过的任务的能力尚未得到充分评估。LLM 可以通过从文本语料库中提取先验知识,为生物推理提供一种有前景的替代方法,特别是在结构化数据和样本量有限的情况下。在这里,我们报告了我们提出的少样本学习方法,该方法使用 LLM 来预测缺乏结构化数据和特征的稀有组织中药物对的协同作用。我们的实验涉及来自不同癌症类型的七种稀有组织,表明基于 LLM 的预测模型只需很少或零样本即可达到相当高的准确性。我们提出的模型 CancerGPT(约 1.24 亿参数)与更大的微调 GPT-3 模型(约 1750 亿参数)性能相当。我们的研究有助于解决数据有限的稀有组织中的药物对协同预测问题,并推进 LLM 在生物和医学推理任务中的使用。
323. The impact of using reinforcement learning to personalize communication on medication adherence: findings from the REINFORCE trial.
使用强化学习进行个性化沟通对药物依从性的影响:REINFORCE 试验的结果。
PMID: 38374424 | DOI: 10.1038/s41746-024-01028-5 | 日期: 2024-02-19
摘要: Text messaging can promote healthy behaviors, like adherence to medication, yet its effectiveness remains modest, in part because message content is rarely personalized. Reinforcement learning has been used in consumer technology to personalize content but with limited application in healthcare. We tested a reinforcement learning program that identifies individual responsiveness ("adherence") to text message content and personalizes messaging accordingly. We randomized 60 individuals with diabetes and glycated hemoglobin A1c [HbA1c] ≥ 7.5% to reinforcement learning intervention or control (no messages). Both arms received electronic pill bottles to measure adherence. The intervention improved absolute adjusted adherence by 13.6% (95%CI: 1.7%-27.1%) versus control and was more effective in patients with HbA1c 7.5- < 9.0% (36.6%, 95%CI: 25.1%-48.2%, interaction p < 0.001). We also explored whether individual patient characteristics were associated with differential response to tested behavioral factors and unique clusters of responsiveness. Reinforcement learning may be a promising approach to improve adherence and personalize communication at scale.
中文摘要: 短信可以促进健康行为,例如坚持用药,但其效果仍然有限,部分原因是短信内容很少是个性化的。强化学习已在消费技术中用于个性化内容,但在医疗保健领域的应用有限。我们测试了一个强化学习程序,该程序可以识别个人对短信内容的响应(即"依从性")并相应地个性化消息传递。我们将 60 名糖化血红蛋白 A1c [HbA1c] ≥ 7.5% 的糖尿病患者随机分为强化学习干预组或对照组(无消息)。两组均获得电子药瓶以测量依从性。与对照组相比,干预措施使调整后的依从性绝对值提高了 13.6%(95%CI:1.7%-27.1%),并且对 HbA1c 为 7.5% 至 < 9.0% 的患者更有效(36.6%,95%CI:25.1%-48.2%,交互作用 p < 0.001)。我们还探讨了患者的个体特征是否与对所测试行为因素的差异反应以及独特的反应性聚类相关。强化学习可能是提高依从性和大规模个性化沟通的一种有前途的方法。
324. Automatic speech-based assessment to discriminate Parkinson's disease from essential tremor with a cross-language approach.
基于语音的自动评估,通过跨语言方法区分帕金森病和原发性震颤。
PMID: 38368458 | DOI: 10.1038/s41746-024-01027-6 | 日期: 2024-02-17
摘要: Parkinson's disease (PD) and essential tremor (ET) are prevalent movement disorders that mainly affect elderly people, presenting diagnostic challenges due to shared clinical features. While both disorders exhibit distinct speech patterns-hypokinetic dysarthria in PD and hyperkinetic dysarthria in ET-the efficacy of speech assessment for differentiation remains unexplored. Developing technology for automatic discrimination could enable early diagnosis and continuous monitoring. However, the lack of data for investigating speech behavior in these patients has inhibited the development of a framework for diagnostic support. In addition, phonetic variability across languages poses practical challenges in establishing a universal speech assessment system. Therefore, it is necessary to develop models robust to the phonetic variability present in different languages worldwide. We propose a method based on Gaussian mixture models to assess domain adaptation from models trained in German and Spanish to classify PD and ET patients in Czech. We modeled three different speech dimensions: articulation, phonation, and prosody and evaluated the models' performance in both bi-class and tri-class classification scenarios (with the addition of healthy controls). Our results show that a fusion of the three speech dimensions achieved optimal results in binary classification, with accuracies up to 81.4 and 86.2% for monologue and /pa-ta-ka/ tasks, respectively. In tri-class scenarios, incorporating healthy speech signals resulted in accuracies of 63.3 and 71.6% for monologue and /pa-ta-ka/ tasks, respectively. Our findings suggest that automated speech analysis, combined with machine learning is robust, accurate, and can be adapted to different languages to distinguish between PD and ET patients.
中文摘要: 帕金森病 (PD) 和特发性震颤 (ET) 是常见的运动障碍,主要影响老年人,由于共同的临床特征,给诊断带来了挑战。虽然这两种疾病表现出不同的言语模式(PD 为运动减少型构音障碍,ET 为运动过多型构音障碍),但言语评估用于鉴别两者的效果仍有待探索。开发自动鉴别技术可以实现早期诊断和持续监测。然而,缺乏调查这些患者言语行为的数据阻碍了诊断支持框架的开发。此外,不同语言之间的语音变异对建立通用语音评估系统提出了实际挑战。因此,有必要开发对全球不同语言中存在的语音变异具有稳健性的模型。我们提出了一种基于高斯混合模型的方法,用于评估用德语和西班牙语训练的模型的域适应,以对捷克语的 PD 和 ET 患者进行分类。我们对三个不同的语音维度进行了建模:构音、发声和韵律,并评估了模型在二分类和三分类场景(加入健康对照)中的性能。我们的结果表明,三个语音维度的融合在二分类中取得了最佳结果,独白和 /pa-ta-ka/ 任务的准确率分别高达 81.4% 和 86.2%。在三分类场景中,加入健康人的语音信号后,独白和 /pa-ta-ka/ 任务的准确率分别为 63.3% 和 71.6%。我们的研究结果表明,自动语音分析与机器学习相结合是稳健、准确的,并且可以适应不同的语言来区分 PD 和 ET 患者。
325. Regular snoring is associated with uncontrolled hypertension.
经常打鼾与不受控制的高血压有关。
PMID: 38368445 | DOI: 10.1038/s41746-024-01026-7 | 日期: 2024-02-17
摘要: Snoring may be a risk factor for cardiovascular disease independent of other co-morbidities. However, most prior studies have relied on subjective, self-report, snoring evaluation. This study assessed snoring prevalence objectively over multiple months using in-home monitoring technology, and its association with hypertension prevalence. In this study, 12,287 participants were monitored nightly for approximately six months using under-the-mattress sensor technology to estimate the average percentage of sleep time spent snoring per night and the estimated apnea-hypopnea index (eAHI). Blood pressure cuff measurements from multiple daytime assessments were averaged to define uncontrolled hypertension based on mean systolic blood pressure≥140 mmHg and/or a mean diastolic blood pressure ≥90 mmHg. Associations between snoring and uncontrolled hypertension were examined using logistic regressions controlled for age, body mass index, sex, and eAHI. Participants were middle-aged (mean ± SD; 50 ± 12 y) and most were male (88%). There were 2467 cases (20%) with uncontrolled hypertension. Approximately 29, 14 and 7% of the study population snored for an average of >10, 20, and 30% per night, respectively. A higher proportion of time spent snoring (75th vs. 5th; 12% vs. 0.04%) was associated with a ~1.9-fold increase (OR [95%CI]; 1.87 [1.63, 2.15]) in uncontrolled hypertension independent of sleep apnea. Multi-night objective snoring assessments and repeat daytime blood pressure recordings in a large global consumer sample, indicate that snoring is common and positively associated with hypertension. These findings highlight the potential clinical utility of simple, objective, and noninvasive methods to detect snoring and its potential adverse health consequences.
中文摘要: 打鼾可能是心血管疾病的一个独立于其他合并症的危险因素。然而,大多数先前的研究都依赖于主观的、自我报告的打鼾评估。这项研究使用家庭监测技术客观评估了几个月的打鼾患病率及其与高血压患病率的关系。在这项研究中,使用床垫下传感器技术每晚对 12,287 名参与者进行了大约六个月的监测,以估计每晚打鼾的睡眠时间的平均百分比以及估计的呼吸暂停低通气指数 (eAHI)。对多次白天评估的血压袖带测量值进行平均,根据平均收缩压≥140mmHg和/或平均舒张压≥90mmHg来定义不受控制的高血压。使用年龄、体重指数、性别和 eAHI 控制的逻辑回归来检查打鼾和不受控制的高血压之间的关联。参与者均为中年人(平均±SD;50±12岁),大多数为男性(88%)。有2467例(20%)高血压未得到控制。大约 29%、14% 和 7% 的研究人群每晚平均打鼾时间占比分别 >10%、20% 和 30%。打鼾时间比例较高(第 75 百分位 vs. 第 5 百分位;12% vs. 0.04%)与不受控制的高血压(独立于睡眠呼吸暂停)增加约 1.9 倍(OR [95%CI];1.87 [1.63,2.15])相关。在全球大量消费者样本中进行的多晚客观打鼾评估和重复的日间血压记录表明,打鼾很常见,并且与高血压呈正相关。这些发现强调了简单、客观和非侵入性方法在检测打鼾及其潜在不良健康后果方面的潜在临床实用性。
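The blood-pressure rule in this abstract (average several daytime cuff readings, then flag uncontrolled hypertension when the mean systolic value is ≥140 mmHg and/or the mean diastolic value is ≥90 mmHg) can be sketched in a few lines of Python. This is only an illustration of the stated definition: the function name and the sample readings are hypothetical and do not come from the study.

```python
from statistics import mean

def is_uncontrolled_hypertension(systolic_readings, diastolic_readings):
    """Apply the abstract's definition to a participant's daytime readings (mmHg).

    Hypothetical helper: readings are averaged first, then compared with the
    140/90 mmHg cut-offs quoted in the abstract.
    """
    mean_systolic = mean(systolic_readings)
    mean_diastolic = mean(diastolic_readings)
    return mean_systolic >= 140 or mean_diastolic >= 90

# Illustrative (made-up) readings for one participant.
flag = is_uncontrolled_hypertension([150, 138], [80, 82])
```

Note that a single high reading does not trigger the flag by itself; only the averaged values are compared with the thresholds.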
326. Quantifying the impact of telemedicine and patient medical advice request messages on physicians' work-outside-work.
量化远程医疗和患者医疗建议请求消息对医生工作外工作的影响。
PMID: 38355913 | DOI: 10.1038/s41746-024-01001-2 | 日期: 2024-02-14
摘要: The COVID-19 pandemic has boosted digital health utilization, raising concerns about increased physicians' after-hours clinical work ("work-outside-work"). The surge in patients' digital messages and additional time spent on work-outside-work by telemedicine providers underscores the need to evaluate the connection between digital health utilization and physicians' after-hours commitments. We examined the impact on physicians' workload from two types of digital demands: patients' messages requesting medical advice (PMARs) sent to physicians' inbox (inbasket), and telemedicine. Our study included 1716 ambulatory-care physicians in New York City regularly practicing between November 2022 and March 2023. Regression analyses assessed primary and interaction effects of PMARs and telemedicine on work-outside-work. The study revealed a significant effect of PMARs on physicians' work-outside-work and that this relationship is moderated by physicians' specialties. Non-primary care physicians or specialists experienced a more pronounced effect than their primary care peers. Analysis of their telemedicine load revealed that primary care physicians received fewer PMARs and spent less time in work-outside-work with more telemedicine. Specialists faced increased PMARs and did more work-outside-work as telemedicine visits increased, which could be due to the difference in patient panels. Reducing PMAR volumes and adopting efficient inbasket management strategies are needed to reduce physicians' work-outside-work. Policymakers need to be cognizant of potential disruptions to physicians' carefully balanced workload caused by digital health services.
中文摘要: COVID-19 大流行促进了数字医疗的利用,引发了人们对医生下班后临床工作("工作外工作")增加的担忧。患者数字信息的激增以及远程医疗提供商在工作外工作上花费的额外时间强调了评估数字医疗利用与医生下班后承诺之间的联系的必要性。我们研究了两种类型的数字需求对医生工作量的影响:发送到医生收件箱(inbasket)的患者请求医疗建议的消息(PMAR)和远程医疗。我们的研究包括 2022 年 11 月至 2023 年 3 月期间定期执业的纽约市 1716 名门诊医生。回归分析评估了 PMAR 和远程医疗对工作外工作的主效应和交互效应。该研究揭示了 PMAR 对医生的工作外工作有显著影响,并且这种关系受到医生专业的调节。非初级保健医生(即专科医生)比初级保健同行受到更明显的影响。对远程医疗负荷的分析显示,随着远程医疗的增加,初级保健医生收到的 PMAR 较少,花在工作外工作上的时间也较少。随着远程医疗就诊次数的增加,专科医生面临着 PMAR 的增加,并且做了更多的工作外工作,这可能是由于患者群体的差异所致。要减少医生的工作外工作,需要减少 PMAR 数量并采用高效的收件箱(inbasket)管理策略。政策制定者需要认识到数字医疗服务对医生谨慎平衡的工作量可能造成的干扰。
327. Citizen data sovereignty is key to wearables and wellness data reuse for the common good.
公民数据主权是可穿戴设备和健康数据重用以实现共同利益的关键。
PMID: 38347159 | DOI: 10.1038/s41746-024-01004-z | 日期: 2024-02-12
摘要: Smartphones, smartwatches, linked wearables, and associated wellness apps have had rapid uptake. These tools become ever 'smarter' in sensing intimate aspects of our surroundings and physiology over time, including activity, metabolites, electrical signals, blood pressure and oxygenation. Proposed EU law stipulates the 'involuntary donation' of depersonalized health and wellness data. There has been pushback against the ever-increasing gathering and sharing of wellness data in this context, increasing with every app purchased or updated. Is the potential of this data now lost to research? Consent-led COVID-19 data donation projects signpost a participative, standardized, and scalable approach to data sharing.
中文摘要: 智能手机、智能手表、联网可穿戴设备和相关健康应用程序已迅速普及。随着时间的推移,这些工具在感知我们周围环境和生理机能的亲密方面变得越来越"聪明",包括活动、代谢物、电信号、血压和氧合。拟议的欧盟法律规定非个性化健康和保健数据的"非自愿捐赠"。在这种情况下,越来越多的健康数据收集和共享遭到了抵制,随着购买或更新的每个应用程序的增加,这种抵制也在增加。这些数据的潜力现在是否已经失去了研究意义?以同意为主导的 COVID-19 数据捐赠项目标志着一种参与性、标准化和可扩展的数据共享方法。
328. Translating color fundus photography to indocyanine green angiography using deep-learning for age-related macular degeneration screening.
利用深度学习将彩色眼底摄影转化为吲哚菁绿血管造影,用于年龄相关性黄斑变性筛查。
PMID: 38347098 | DOI: 10.1038/s41746-024-01018-7 | 日期: 2024-02-12
摘要: Age-related macular degeneration (AMD) is the leading cause of central vision impairment among the elderly. Effective and accurate AMD screening tools are urgently needed. Indocyanine green angiography (ICGA) is a well-established technique for detecting chorioretinal diseases, but its invasive nature and potential risks impede its routine clinical application. Here, we innovatively developed a deep-learning model capable of generating realistic ICGA images from color fundus photography (CF) using generative adversarial networks (GANs) and evaluated its performance in AMD classification. The model was developed with 99,002 CF-ICGA pairs from a tertiary center. The quality of the generated ICGA images underwent objective evaluation using mean absolute error (MAE), peak signal-to-noise ratio (PSNR), structural similarity measures (SSIM), etc., and subjective evaluation by two experienced ophthalmologists. The model generated realistic early, mid and late-phase ICGA images, with SSIM ranging from 0.57 to 0.65. The subjective quality scores ranged from 1.46 to 2.74 on the five-point scale (1 refers to the real ICGA image quality, Kappa 0.79-0.84). Moreover, we assessed the application of translated ICGA images in AMD screening on an external dataset (n = 13887) by calculating area under the ROC curve (AUC) in classifying AMD. Combining generated ICGA with real CF images improved the accuracy of AMD classification, with the AUC increasing from 0.93 to 0.97 (P < 0.001). These results suggested that CF-to-ICGA translation can serve as a cross-modal data augmentation method to address the data hunger often encountered in deep-learning research, and as a promising add-on for population-based AMD screening. Real-world validation is warranted before clinical usage.
中文摘要: 年龄相关性黄斑变性(AMD)是老年人中央视力损害的主要原因。迫切需要有效、准确的 AMD 筛查工具。吲哚菁绿血管造影(ICGA)是一种成熟的脉络膜视网膜疾病检测技术,但其侵入性和潜在风险阻碍了其常规临床应用。在这里,我们创新性地开发了一种深度学习模型,能够使用生成对抗网络(GAN)从彩色眼底摄影(CF)生成逼真的 ICGA 图像,并评估其在 AMD 分类中的性能。该模型是使用来自三级中心的 99,002 对 CF-ICGA 开发的。生成的 ICGA 图像的质量使用平均绝对误差 (MAE)、峰值信噪比 (PSNR)、结构相似性度量 (SSIM) 等进行客观评估,并由两位经验丰富的眼科医生进行主观评估。该模型生成了真实的早期、中期和晚期 ICGA 图像,SSIM 范围为 0.57 至 0.65。主观质量分数在五分制上从 1.46 到 2.74 不等(1 指真实的 ICGA 图像质量,Kappa 0.79-0.84)。此外,我们通过计算 AMD 分类中的 ROC 曲线下面积 (AUC),评估了翻译后的 ICGA 图像在外部数据集 (n = 13887) 上的 AMD 筛查中的应用。将生成的 ICGA 与真实 CF 图像相结合,提高了 AMD 分类的准确性,AUC 从 0.93 增加到 0.97(P < 0.001)。这些结果表明,CF 到 ICGA 的转换可以作为一种跨模式数据增强方法来解决深度学习研究中经常遇到的数据匮乏问题,并作为基于人群的 AMD 筛查的一个有前景的附加功能。在临床使用之前需要进行真实世界的验证。
329. Digital health technologies and machine learning augment patient reported outcomes to remotely characterise rheumatoid arthritis.
数字健康技术和机器学习增强了患者报告的结果,以远程表征类风湿性关节炎。
PMID: 38347090 | DOI: 10.1038/s41746-024-01013-y | 日期: 2024-02-12
摘要: Digital measures of health status captured during daily life could greatly augment current in-clinic assessments for rheumatoid arthritis (RA), to enable better assessment of disease progression and impact. This work presents results from weaRAble-PRO, a 14-day observational study, which aimed to investigate how digital health technologies (DHT), such as smartphones and wearables, could augment patient reported outcomes (PRO) to determine RA status and severity in a study of 30 moderate-to-severe RA patients, compared to 30 matched healthy controls (HC). Sensor-based measures of health status, mobility, dexterity, fatigue, and other RA specific symptoms were extracted from daily iPhone guided tests (GT), as well as actigraphy and heart rate sensor data, which were passively recorded from patients' Apple smartwatch continuously over the study duration. We subsequently developed a machine learning (ML) framework to distinguish RA status and to estimate RA severity. It was found that daily wearable sensor-outcomes robustly distinguished RA from HC participants (F1, 0.807). Furthermore, by day 7 of the study (half-way), a sufficient volume of data had been collected to reliably capture the characteristics of RA participants. In addition, we observed that the detection of RA severity levels could be improved by augmenting standard patient reported outcomes with sensor-based features (F1, 0.833) in comparison to using PRO assessments alone (F1, 0.759), and that the combination of modalities could reliably measure continuous RA severity, as determined by the clinician-assessed RAPID-3 score at baseline (r2, 0.692; RMSE, 1.33). The ability to measure the impact of the disease during daily life, through objective and remote digital outcomes, paves the way forward to enable the development of more patient-centric and personalised measurements for use in RA clinical trials.
中文摘要: 日常生活中获取的健康状况数字测量可以极大地增强当前类风湿性关节炎(RA)的临床评估,从而更好地评估疾病进展和影响。这项工作展示了 weaRAble-PRO 的结果,这是一项为期 14 天的观察性研究,旨在调查智能手机和可穿戴设备等数字健康技术 (DHT) 如何增强患者报告的结果 (PRO),以确定 RA 状态和严重程度,对 30 名中度至重度 RA 患者进行研究,并与 30 名匹配的健康对照 (HC) 进行比较。基于传感器的健康状况、活动能力、灵活性、疲劳和其他 RA 特定症状的测量数据是从每日 iPhone 引导测试 (GT) 以及体动记录仪和心率传感器数据中提取的,这些数据在研究期间从患者的 Apple 智能手表中连续被动记录。我们随后开发了一个机器学习 (ML) 框架来区分 RA 状态并估计 RA 严重程度。结果发现,日常可穿戴传感器结果可以将 RA 与 HC 参与者区分开来(F1,0.807)。此外,到研究的第 7 天(中途),已经收集了足够的数据来可靠地捕捉 RA 参与者的特征。此外,我们观察到,与单独使用 PRO 评估(F1,0.759)相比,通过使用基于传感器的特征(F1,0.833)增强标准患者报告的结果可以改善 RA 严重程度的检测,并且结合多种方式可以可靠地测量持续的 RA 严重程度,由临床医生评估的基线 RAPID-3 评分确定(r2,0.692;RMSE, 1.33)。通过客观和远程数字结果来测量疾病对日常生活的影响的能力,为开发更多以患者为中心和个性化的测量用于 RA 临床试验铺平了道路。
330. FastEval Parkinsonism: an instant deep learning-assisted video-based online system for Parkinsonian motor symptom evaluation.
FastEval Parkinsonism:一种用于帕金森运动症状评估的即时深度学习辅助视频在线系统。
PMID: 38332372 | DOI: 10.1038/s41746-024-01022-x | 日期: 2024-02-08
摘要: The Movement Disorder Society's Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is designed to assess bradykinesia, the cardinal symptom of Parkinson's disease (PD). However, it cannot capture the all-day variability of bradykinesia outside the clinical environment. Here, we introduce FastEval Parkinsonism ( https://fastevalp.cmdm.tw/ ), a deep learning-driven video-based system that enables users to capture keypoints, estimate the severity, and summarize the results in a report. Leveraging 840 finger-tapping videos from 186 individuals (103 patients with Parkinson's disease (PD), 24 participants with atypical parkinsonism (APD), 12 elderly with mild parkinsonism signs (MPS), and 47 healthy controls (HCs)), we employ a dilated convolution neural network with two data augmentation techniques. Our model achieves acceptable accuracies (AAC) of 88.0% and 81.5%. The frequency-intensity (FI) value of thumb-index finger distance was indicated as a pivotal hand parameter to quantify the performance. Our model also shows the usability for multi-angle videos, tested in an external database enrolling over 300 PD patients.
中文摘要: 运动障碍协会的统一帕金森病评定量表 (MDS-UPDRS) 旨在评估运动迟缓,这是帕金森病 (PD) 的主要症状。然而,它无法捕捉临床环境之外的运动迟缓的全天变化。在这里,我们介绍 FastEval Parkinsonism ( https://fastevalp.cmdm.tw/ ),这是一个深度学习驱动的基于视频的系统,为用户提供捕获关键点、估计严重性并在报告中进行总结的功能。利用来自 186 个人(103 名帕金森病 (PD) 患者、24 名非典型帕金森病 (APD) 参与者、12 名患有轻度帕金森病症状 (MPS) 的老年人和 47 名健康对照 (HC))的 840 个手指敲击视频,我们采用了具有两种数据增强技术的扩张卷积神经网络。我们的模型达到了 88.0% 和 81.5% 的可接受精度 (AAC)。拇指-食指距离的频率强度(FI)值被表示为量化性能的关键手部参数。我们的模型还展示了多角度视频的可用性,并在招募了 300 多名 PD 患者的外部数据库中进行了测试。
331. Optimizing skin disease diagnosis: harnessing online community data with contrastive learning and clustering techniques.
优化皮肤病诊断:通过对比学习和聚类技术利用在线社区数据。
PMID: 38332257 | DOI: 10.1038/s41746-024-01014-x | 日期: 2024-02-08
摘要: Skin diseases pose significant challenges in China. Internet health forums offer a platform for millions of users to discuss skin diseases and share images for early intervention, leaving a large amount of valuable dermatology images. However, data quality and annotation challenges limit the potential of these resources for developing diagnostic models. In this study, we proposed a deep-learning model that utilized unannotated dermatology images from diverse online sources. We adopted a contrastive learning approach to learn general representations from unlabeled images and fine-tuned the model on coarsely annotated images from Internet forums. Our model classified 22 common skin diseases. To improve annotation quality, we used a clustering method with a small set of standardized validation images. We tested the model on images collected by 33 experienced dermatologists from 15 tertiary hospitals and achieved a 45.05% top-1 accuracy, outperforming the published baseline model by 3%. Accuracy increased with additional validation images, reaching 49.64% with 50 images per category. Our model also demonstrated transferability to new tasks, such as detecting monkeypox, with a 61.76% top-1 accuracy using only 50 additional images in the training process. We also tested our model on benchmark datasets to show the generalization ability. Our findings highlight the potential of unannotated images from online forums for future dermatology applications and demonstrate the effectiveness of our model for early diagnosis and potential outbreak mitigation.
中文摘要: 皮肤病在中国构成重大挑战。互联网健康论坛为数百万用户提供了讨论皮肤病、分享图像以进行早期干预的平台,留下了大量有价值的皮肤病图像。然而,数据质量和注释挑战限制了这些资源开发诊断模型的潜力。在这项研究中,我们提出了一种深度学习模型,该模型利用来自不同在线来源的未注释的皮肤科图像。我们采用对比学习方法来从未标记的图像中学习一般表示,并根据互联网论坛上的粗略注释图像对模型进行微调。我们的模型对 22 种常见皮肤病进行了分类。为了提高注释质量,我们使用了具有一小组标准化验证图像的聚类方法。我们在来自 15 家三级医院的 33 名经验丰富的皮肤科医生收集的图像上测试了该模型,取得了 45.05% 的 top-1 准确率,比已发布的基线模型高出 3%。随着额外的验证图像的增加,准确率有所提高,每个类别有 50 张图像时,准确率达到 49.64%。我们的模型还展示了对新任务的可迁移性,例如检测猴痘,在训练过程中仅使用 50 个额外图像,top-1 准确率达到 61.76%。我们还在基准数据集上测试了我们的模型,以显示泛化能力。我们的研究结果强调了来自在线论坛的未注释图像在未来皮肤病学应用中的潜力,并证明了我们的模型对于早期诊断和潜在的疫情缓解的有效性。
332. Beyond the 510(k): The regulation of novel moderate-risk medical devices, intellectual property considerations, and innovation incentives in the FDA's De Novo pathway.
超越 510(k):FDA De Novo 途径中新型中等风险医疗器械的监管、知识产权考虑和创新激励。
PMID: 38332182 | DOI: 10.1038/s41746-024-01021-y | 日期: 2024-02-08
摘要: Moderate-risk medical devices constitute 99% of those that have been regulated by the U.S. Food and Drug Administration (FDA) since it gained authority to regulate medical technology nearly five decades ago. This article presents an analysis of the interaction between the 510(k) process (the historically dominant path to market for most medical devices) and the De Novo pathway, a more recent alternative that targets more novel devices, including those involving new technologies, diagnostics, hardware, and software. The De Novo pathway holds significant potential for innovators seeking to define new categories of medical devices, as it represents a less burdensome approach than would have otherwise been needed historically. Moreover, it supports the FDA in its effort to modernize the long-established 510(k) pathway by promoting the availability of up-to-date device "predicates" upon which subsequent device applications can be based, reflecting positive spillovers that are likely to encourage manufacturers to adopt current state-of-the-art technologies and modern standards of safety and effectiveness. We analyze the characteristics of all the De Novo classification requests to date, including the submission type, trends, FDA review times, and device types. After characterizing how the De Novo process has been used over time, we discuss its unique challenges and opportunities with respect to medical device software and AI-enabled devices, including considerations for intellectual property, innovation, and competition economics.
中文摘要: 自美国食品和药物管理局 (FDA) 近 50 年前获得监管医疗技术的权力以来,在其监管的医疗器械中,中等风险医疗器械占 99%。本文分析了 510(k) 流程(历史上大多数医疗器械进入市场的主导途径)与 De Novo 途径(一种更新的替代方案,针对更新颖的器械,包括涉及新技术、诊断、硬件和软件的器械)之间的相互作用。对于寻求定义新类别医疗器械的创新者来说,De Novo 途径具有巨大的潜力,因为它代表了一种比以往所需途径负担更轻的方法。此外,它还通过促进最新的对比器械(predicate)的可用性(后续器械申请可以此为基础),支持 FDA 对长期沿用的 510(k) 途径进行现代化,这反映了积极的溢出效应,可能会鼓励制造商采用当前最先进的技术和现代安全性和有效性标准。我们分析了迄今为止所有 De Novo 分类请求的特征,包括提交类型、趋势、FDA 审查时间和器械类型。在描述 De Novo 流程随着时间的推移如何使用之后,我们讨论了其在医疗器械软件和人工智能器械方面的独特挑战和机遇,包括知识产权、创新和竞争经济学的考虑。
333. Digital interventions to promote psychological resilience: a systematic review and meta-analysis.
促进心理弹性的数字干预:系统评价和荟萃分析。
PMID: 38332030 | DOI: 10.1038/s41746-024-01017-8 | 日期: 2024-02-08
摘要: Societies are exposed to major challenges at an increasing pace. This underscores the need for preventive measures such as resilience promotion that should be available in time and without access barriers. Our systematic review summarizes evidence on digital resilience interventions, which have the potential to meet these demands. We searched five databases for randomized-controlled trials in non-clinical adult populations. Primary outcomes were mental distress, positive mental health, and resilience factors. Multilevel meta-analyses were performed to compare intervention and control groups at post-intervention and follow-up assessments. We identified 101 studies comprising 20,010 participants. Meta-analyses showed small favorable effects on mental distress, SMD = -0.24, 95% CI [-0.31, -0.18], positive mental health, SMD = 0.27, 95% CI [0.13, 0.40], and resilience factors, SMD = 0.31, 95% CI [0.21, 0.41]. Among middle-aged samples, older age was associated with more beneficial effects at follow-up, and effects were smaller for active control groups. Effects were comparable to those of face-to-face interventions and underline the potential of digital resilience interventions to prepare for future challenges.
中文摘要: 社会正以越来越快的速度面临重大挑战。这强调了预防措施的必要性,例如应及时提供且没有准入障碍的心理韧性促进措施。我们的系统综述总结了数字化心理韧性干预措施的证据,这些干预措施有可能满足这些需求。我们检索了五个数据库,以查找针对非临床成人人群的随机对照试验。主要结局是精神困扰、积极的心理健康和复原力因素。进行多层次荟萃分析,以在干预后和随访评估中比较干预组和对照组。我们确定了 101 项研究,涉及 20,010 名参与者。荟萃分析显示,干预对精神困扰(SMD = -0.24,95% CI [-0.31,-0.18])、积极心理健康(SMD = 0.27,95% CI [0.13,0.40])和复原力因素(SMD = 0.31,95% CI [0.21,0.41])均有较小的有利影响。在中年样本中,年龄越大,随访时的有益效果越明显,而采用积极对照组时效应较小。其效果与面对面干预措施相当,并突显了数字化心理韧性干预措施为应对未来挑战做好准备的潜力。
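As a reading aid for the pooled effects quoted in this abstract, the sketch below recovers an approximate standard error from a reported 95% confidence interval. It assumes a symmetric normal-approximation interval (estimate ± 1.96 × SE); that is an assumption made here for illustration, since the abstract does not state how the multilevel models computed their intervals, and the function name is hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error implied by a symmetric 95% CI.

    Assumes the interval was built as estimate +/- z * SE (an assumption,
    not something the abstract states).
    """
    return (upper - lower) / (2 * z)

# Mental distress: SMD = -0.24, 95% CI [-0.31, -0.18] as quoted in the abstract.
se_distress = se_from_ci(-0.31, -0.18)
z_distress = -0.24 / se_distress  # rough z-statistic under the same assumption
```

Because the reported bounds are rounded to two decimals, the recovered standard error is only a rough figure.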
334. Uncertainty-aware deep-learning model for prediction of supratentorial hematoma expansion from admission non-contrast head computed tomography scan.
用于预测入院非造影头部计算机断层扫描的幕上血肿扩张的不确定性感知深度学习模型。
PMID: 38321131 | DOI: 10.1038/s41746-024-01007-w | 日期: 2024-02-06
摘要: Hematoma expansion (HE) is a modifiable risk factor and a potential treatment target in patients with intracerebral hemorrhage (ICH). We aimed to train and validate deep-learning models for high-confidence prediction of supratentorial ICH expansion, based on admission non-contrast head Computed Tomography (CT). Applying Monte Carlo dropout and entropy of deep-learning model predictions, we estimated the model uncertainty and identified patients at high risk of HE with high confidence. Using the receiver operating characteristics area under the curve (AUC), we compared the deep-learning model prediction performance with multivariable models based on visual markers of HE determined by expert reviewers. We randomly split a multicentric dataset of patients (4-to-1) into training/cross-validation (n = 634) versus test (n = 159) cohorts. We trained and tested separate models for prediction of ≥6 mL and ≥3 mL ICH expansion. The deep-learning models achieved an AUC = 0.81 for high-confidence prediction of HE ≥6 mL and AUC = 0.80 for prediction of HE ≥3 mL, which were higher than visual marker models AUC = 0.69 for HE ≥6 mL (p = 0.036) and AUC = 0.68 for HE ≥3 mL (p = 0.043). Our results show that fully automated deep-learning models can identify patients at risk of supratentorial ICH expansion based on admission non-contrast head CT, with high confidence, and more accurately than benchmark visual markers.
中文摘要: 血肿扩张(HE)是脑出血(ICH)患者的一个可改变的危险因素和潜在的治疗目标。我们的目的是训练和验证深度学习模型,以基于入院非造影头部计算机断层扫描 (CT) 对幕上 ICH 扩张进行高置信度预测。应用蒙特卡洛 dropout 和深度学习模型预测的熵,我们估计了模型的不确定性,并以高置信度识别了 HE 高风险患者。使用受试者工作特征曲线下面积 (AUC),我们将深度学习模型的预测性能与基于专家评审员确定的 HE 视觉标记的多变量模型进行比较。我们将多中心患者数据集(4 对 1)随机分为训练/交叉验证 (n = 634) 和测试 (n = 159) 队列。我们训练和测试了预测 ≥6 mL 和 ≥3 mL ICH 扩张的单独模型。深度学习模型在高置信度预测 HE ≥6 mL 时实现了 AUC = 0.81,在预测 HE ≥3 mL 时实现了 AUC = 0.80,高于视觉标记模型在 HE ≥6 mL 时的 AUC = 0.69 (p = 0.036) 和 HE ≥3 mL 时的 AUC = 0.68 (p = 0.043)。我们的结果表明,全自动深度学习模型可以根据入院非造影头部 CT 识别有幕上 ICH 扩张风险的患者,置信度高,并且比基准视觉标记更准确。
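The uncertainty step mentioned in this abstract (Monte Carlo dropout combined with the entropy of the model's predictions) can be illustrated with a small, framework-free sketch. It assumes the network has already been run several times with dropout left active, producing one predicted probability of expansion per pass; the function names, the sample probabilities, and the 0.5-bit entropy cut-off are hypothetical choices for illustration, not the authors' settings.

```python
import math

def binary_predictive_entropy(pass_probabilities):
    """Mean probability and its binary entropy (in bits) over dropout passes.

    pass_probabilities: predicted P(expansion) from each stochastic forward
    pass (assumed to be computed elsewhere with dropout enabled).
    """
    p = sum(pass_probabilities) / len(pass_probabilities)
    if p <= 0.0 or p >= 1.0:
        return p, 0.0  # a fully certain prediction has zero entropy
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return p, entropy

def is_high_confidence(pass_probabilities, max_entropy=0.5):
    """Hypothetical rule: keep predictions whose entropy is below a cut-off."""
    _, entropy = binary_predictive_entropy(pass_probabilities)
    return entropy < max_entropy

# Made-up outputs of three dropout passes for one scan.
confident = is_high_confidence([0.90, 0.95, 0.92])
uncertain = is_high_confidence([0.45, 0.60, 0.50])
```

Entropy is largest when the averaged probability is near 0.5, so thresholding it keeps only the cases where the passes agree on a prediction far from the decision boundary.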
335. Artificial intelligence-driven virtual rehabilitation for people living in the community: A scoping review.
人工智能驱动的社区居民虚拟康复:范围界定审查。
PMID: 38310158 | DOI: 10.1038/s41746-024-00998-w | 日期: 2024-02-03
摘要: Virtual Rehabilitation (VRehab) is a promising approach to improving the physical and mental functioning of patients living in the community. The use of VRehab technology results in the generation of multi-modal datasets collected through various devices. This presents opportunities for the development of Artificial Intelligence (AI) techniques in VRehab, namely the measurement, detection, and prediction of various patients' health outcomes. The objective of this scoping review was to explore the applications and effectiveness of incorporating AI into home-based VRehab programs. PubMed/MEDLINE, Embase, IEEE Xplore, Web of Science databases, and Google Scholar were searched from inception until June 2023 for studies that applied AI for the delivery of VRehab programs to the homes of adult patients. After screening 2172 unique titles and abstracts and 51 full-text studies, 13 studies were included in the review. A variety of AI algorithms were applied to analyze data collected from various sensors and make inferences about patients' health outcomes, most involving evaluating patients' exercise quality and providing feedback to patients. The AI algorithms used in the studies were mostly fuzzy rule-based methods, template matching, and deep neural networks. Despite the growing body of literature on the use of AI in VRehab, very few studies have examined its use in patients' homes. Current research suggests that integrating AI with home-based VRehab can lead to improved rehabilitation outcomes for patients. However, further research is required to fully assess the effectiveness of various forms of AI-driven home-based VRehab, taking into account its unique challenges and using standardized metrics.
中文摘要: 虚拟康复 (VRehab) 是改善社区患者身心功能的一种有前景的方法。VRehab 技术的使用可以生成通过各种设备收集的多模式数据集。这为 VRehab 中人工智能 (AI) 技术的发展提供了机会,即测量、检测和预测各种患者的健康结果。本次范围综述的目的是探索将人工智能纳入家庭 VRehab 计划的应用和有效性。我们检索了 PubMed/MEDLINE、Embase、IEEE Xplore、Web of Science 数据库和 Google Scholar(从建库至 2023 年 6 月),以查找应用人工智能向成年患者家中提供 VRehab 项目的研究。在筛选了 2172 项独特的标题和摘要以及 51 项全文研究后,13 项研究纳入了综述。应用各种人工智能算法来分析从各种传感器收集的数据并推断患者的健康结果,其中大多数涉及评估患者的运动质量并向患者提供反馈。研究中使用的人工智能算法主要是基于模糊规则的方法、模板匹配和深度神经网络。尽管有关在 VRehab 中使用人工智能的文献越来越多,但很少有研究考察其在患者家中的使用。目前的研究表明,将人工智能与家庭 VRehab 相结合可以改善患者的康复结果。然而,需要进一步的研究来充分评估各种形式的人工智能驱动的家庭 VRehab 的有效性,同时考虑到其独特的挑战并使用标准化指标。
336. AI-derived epicardial fat measurements improve cardiovascular risk prediction from myocardial perfusion imaging.
AI 衍生的心外膜脂肪测量可改善心肌灌注成像的心血管风险预测。
PMID: 38310123 | DOI: 10.1038/s41746-024-01020-z | 日期: 2024-02-03
摘要: Epicardial adipose tissue (EAT) volume and attenuation are associated with cardiovascular risk, but manual annotation is time-consuming. We evaluated whether automated deep learning-based EAT measurements from ungated computed tomography (CT) are associated with death or myocardial infarction (MI). We included 8781 patients from 4 sites without known coronary artery disease who underwent hybrid myocardial perfusion imaging. Of those, 500 patients from one site were used for model training and validation, with the remaining patients held out for testing (n = 3511 internal testing, n = 4770 external testing). We modified an existing deep learning model to first identify the cardiac silhouette, then automatically segment EAT based on attenuation thresholds. Deep learning EAT measurements were obtained in <2 s compared to 15 min for expert annotations. There was excellent agreement between EAT attenuation (Spearman correlation 0.90 internal, 0.82 external) and volume (Spearman correlation 0.90 internal, 0.91 external) by deep learning and expert segmentation in all 3 sites (Spearman correlation 0.90-0.98). During median follow-up of 2.7 years (IQR 1.6-4.9), 565 patients experienced death or MI. Elevated EAT volume and attenuation were independently associated with an increased risk of death or MI after adjustment for relevant confounders. Deep learning can automatically measure EAT volume and attenuation from low-dose, ungated CT with excellent correlation with expert annotations, but in a fraction of the time. EAT measurements offer additional prognostic insights within the context of hybrid perfusion imaging.
中文摘要: 心外膜脂肪组织(EAT)体积和衰减与心血管风险相关,但手动注释非常耗时。我们评估了非门控计算机断层扫描 (CT) 中基于深度学习的自动 EAT 测量是否与死亡或心肌梗塞 (MI) 相关。我们纳入了来自 4 个地点的 8781 名没有已知冠状动脉疾病的患者,他们接受了混合心肌灌注成像。其中,来自一个地点的 500 名患者用于模型训练和验证,其余患者则用于测试(n = 3511 名内部测试,n = 4770 名外部测试)。我们修改了现有的深度学习模型,首先识别心脏轮廓,然后根据衰减阈值自动分割 EAT。深度学习 EAT 测量只需不到 2 秒即可获得,而专家注释则需要 15 分钟。通过深度学习和专家分割,所有 3 个站点的 EAT 衰减(Spearman 相关性 0.90 内部,0.82 外部)和体积(Spearman 相关性 0.90 内部,0.91 外部)之间存在极好的一致性(Spearman 相关性 0.90-0.98)。在中位随访 2.7 年(IQR 1.6-4.9)期间,565 名患者经历了死亡或心肌梗死。调整相关混杂因素后,EAT 容量和衰减升高与死亡或 MI 风险增加独立相关。深度学习可以自动测量低剂量、非门控 CT 的 EAT 体积和衰减,与专家注释具有良好的相关性,但所需时间却很短。 EAT 测量在混合灌注成像的背景下提供了额外的预后见解。
337. Opportunities for CMS to improve healthcare access and equity through advancing technology-enabled startups and digital health innovations.
CMS 有机会通过推进技术支持的初创企业和数字健康创新来改善医疗保健的可及性和公平性。
PMID: 38291101 | DOI: 10.1038/s41746-024-00997-x | 日期: 2024-01-30
摘要: Historically, the Centers for Medicare and Medicaid Services (CMS) has formed partnerships with select private sector entities, including large traditional hospital and health system networks, nursing homes, and payer groups. However, innovations from technology-enabled services companies and digital technology companies are uniquely poised to aid CMS in addressing key barriers toward advancing its mission of improving healthcare access and equity. There are four pivotal opportunity areas where partnerships with technology businesses and tools would enhance the work of CMS: (1) improving consumer awareness about CMS programs, (2) mitigating access gaps through virtual care programs, (3) streamlining the complexity of different payer plan models, and (4) using technology-enabled services to address social risk factors without imposing additional burdens on providers. We offer examples of digital and technology-enabled solutions that improve patient access to care and close equity gaps, as well as propose specific recommendations for CMS to advance and expand the reach and impact of these solutions. Namely, these recommendations include partnerships with private sector companies that can educate and support consumers about their benefits, the extension of telehealth reimbursement parity for virtual care solutions, allowing for cross-state licensure across plans and reimbursement for care coordination services that alleviate provider burden to screen and address patients' social determinants of health needs. We argue that CMS has an imperative role in leveraging the innovations of technology-enabled services and digital health technologies to lower healthcare access barriers, mitigate provider burden, stimulate innovation, and close equity gaps at the patient, provider, and innovator levels.
中文摘要: 从历史上看,医疗保险和医疗补助服务中心 (CMS) 已与选定的私营部门实体建立了合作伙伴关系,包括大型传统医院和卫生系统网络、疗养院和付款人团体。然而,技术支持的服务公司和数字技术公司的创新具有独特的优势,可以帮助 CMS 解决关键障碍,以推进其改善医疗保健可及性和公平性的使命。与技术企业和工具的合作将在四个关键机会领域加强 CMS 的工作:(1) 提高消费者对 CMS 计划的认识,(2) 通过虚拟护理计划缩小获取差距,(3) 简化不同付款人计划模型的复杂性,以及 (4) 使用技术支持的服务来解决社会风险因素,而不会给提供者带来额外的负担。我们提供了数字和技术支持的解决方案的示例,这些解决方案可以改善患者获得护理的机会并缩小公平差距,并为 CMS 提出具体建议,以推进和扩大这些解决方案的范围和影响。也就是说,这些建议包括与私营部门公司建立合作伙伴关系,这些公司可以教育和支持消费者了解他们的好处,扩大虚拟护理解决方案的远程医疗报销平价,允许跨州许可跨计划和报销护理协调服务,从而减轻提供者筛选和解决患者健康需求的社会决定因素的负担。我们认为,CMS 在利用技术支持的服务和数字医疗技术的创新来降低医疗保健准入障碍、减轻提供者负担、刺激创新以及缩小患者、提供者和创新者层面的公平差距方面发挥着至关重要的作用。
338. Feasibility of combining spatial computing and AI for mental health support in anxiety and depression.
将空间计算和人工智能相结合以支持焦虑和抑郁心理健康的可行性。
PMID: 38279034 | DOI: 10.1038/s41746-024-01011-0 | 日期: 2024-01-26
摘要: The increasing need for mental health support and a shortage of therapists have led to the development of the eXtended-reality Artificial Intelligence Assistant (XAIA). This platform combines spatial computing, virtual reality (VR), and artificial intelligence (AI) to provide immersive mental health support. Utilizing GPT-4 for AI-driven therapy, XAIA engaged participants with mild-to-moderate anxiety or depression in biophilic VR environments. Speaking with an AI therapy avatar in VR was considered acceptable, helpful, and safe, with participants observed to engage genuinely with the program. However, some still favored human interaction and identified shortcomings with using a digital VR therapist. The study provides initial evidence of the acceptability and safety of AI psychotherapy via spatial computing, warranting further research on technical enhancements and clinical impact.
中文摘要: 对心理健康支持的日益增长的需求和治疗师的短缺导致了扩展现实人工智能助手(XAIA)的开发。该平台结合了空间计算、虚拟现实(VR)和人工智能(AI),提供沉浸式心理健康支持。 XAIA 利用 GPT-4 进行人工智能驱动的治疗,让患有轻度至中度焦虑或抑郁的参与者在亲生命的 VR 环境中进行治疗。在 VR 中与人工智能治疗虚拟人物交谈被认为是可以接受的、有帮助的、安全的,并且观察到参与者真正参与了该项目。然而,一些人仍然喜欢人际互动,并指出使用数字 VR 治疗师的缺点。该研究提供了通过空间计算进行人工智能心理治疗的可接受性和安全性的初步证据,值得进一步研究技术改进和临床影响。
339. Transparency of artificial intelligence/machine learning-enabled medical devices.
支持人工智能/机器学习的医疗设备的透明度。
PMID: 38273098 | DOI: 10.1038/s41746-023-00992-8 | 日期: 2024-01-26
340. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine.
诊断推理提示揭示了大语言模型可解释性在医学中的潜力。
PMID: 38267608 | DOI: 10.1038/s41746-024-01010-1 | 日期: 2024-01-24
摘要: One of the major barriers to using large language models (LLMs) in medicine is the perception that they use uninterpretable methods to make clinical decisions that are inherently different from the cognitive processes of clinicians. In this manuscript we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLM's response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the "black box" limitations of LLMs, bringing them one step closer to safe and effective use in medicine.
中文摘要: 在医学中使用大语言模型 (LLM) 的主要障碍之一是人们认为它们使用不可解释的方法来做出临床决策,而这些方法与临床医生的认知过程有本质上的不同。在这篇手稿中,我们开发了诊断推理提示,以研究 LLM 是否可以在准确形成诊断的同时模仿临床推理。我们发现 GPT-4 可以被提示模仿临床医生常见的临床推理过程,而不会牺牲诊断准确性。这一点很重要,因为能够模仿临床推理并提供可解释依据的 LLM,为医生提供了一种方法来评估 LLM 的回答是否可能正确以及是否可以用于患者护理。使用诊断推理的提示方法有可能减轻 LLM 的"黑匣子"局限,使其更接近在医学中安全有效地使用。
341. An intriguing vision for transatlantic collaborative health data use and artificial intelligence development.
跨大西洋协作健康数据使用和人工智能开发的有趣愿景。
PMID: 38263436 | DOI: 10.1038/s41746-024-01005-y | 日期: 2024-01-23
摘要: Our traditional approach to diagnosis, prognosis, and treatment can no longer process and transform the enormous volume of information into therapeutic success, innovative discovery, and health economic performance. Precision health, i.e., the right treatment, for the right person, at the right time in the right place, is enabled through a learning health system, in which medicine and multidisciplinary science, economic viability, diverse culture, and empowered patients' preferences are digitally integrated and conceptually aligned for continuous improvement and maintenance of health, wellbeing, and equity. Artificial intelligence (AI) has been successfully evaluated in risk stratification, accurate diagnosis, and treatment allocation, and to prevent health disparities. There is one caveat though: dependable AI models need to be trained on population-representative, large and deep data sets by multidisciplinary and multinational teams to avoid developer, statistical and social bias. Such applications and models can neither be created nor validated with data at the country, let alone institutional, level and require a new dimension of collaboration, a cultural change with the establishment of trust in a precompetitive space. The Data for Health (#DFH23) conference in Berlin and the Follow-Up Workshop at Harvard University in Boston hosted a representative group of stakeholders in society, academia, industry, and government. With the momentum #DFH23 created, the European Health Data Space (EHDS) as a solid and safe foundation for consented collaborative health data use and the G7 Hiroshima AI process in place, we call on citizens and their governments to fully support digital transformation of medicine, research and innovation including AI.
中文摘要: 我们传统的诊断、预后和治疗方法无法再处理大量信息并将其转化为治疗成功、创新发现和健康经济绩效。精准健康,即在正确的时间、正确的地点为正确的人提供正确的治疗,是通过学习型医疗系统实现的,在该系统中,医学和多学科科学、经济可行性、多元文化和获得赋能的患者的偏好以数字方式整合并在概念上保持一致,以持续改善和维护健康、福祉和公平。人工智能 (AI) 已在风险分层、准确诊断、治疗分配以及预防健康差异方面得到成功评估。但有一点需要注意:可靠的人工智能模型需要由多学科和跨国团队在具有人口代表性的大型和深度数据集上进行训练,以避免开发人员、统计和社会偏见。此类应用程序和模型无法仅凭国家层面的数据创建或验证,更不用说机构层面的数据,并且需要新的协作维度,即在竞争前空间中建立信任的文化变革。在柏林举行的健康数据 (#DFH23) 会议和在波士顿哈佛大学举行的后续研讨会汇聚了来自社会、学术界、工业界和政府的具有代表性的利益相关者。借助 #DFH23 创造的势头,并以欧洲健康数据空间 (EHDS) 作为经同意的协作健康数据使用的坚实、安全的基础,加之七国集团广岛人工智能进程已经到位,我们呼吁公民及其政府全力支持包括人工智能在内的医学、研究和创新的数字化转型。
342. Impact of a deep learning sepsis prediction model on quality of care and survival.
深度学习脓毒症预测模型对护理质量和生存的影响。
PMID: 38263386 | DOI: 10.1038/s41746-023-00986-6 | 日期: 2024-01-23
摘要: Sepsis remains a major cause of mortality and morbidity worldwide. Algorithms that assist with the early recognition of sepsis may improve outcomes, but relatively few studies have examined their impact on real-world patient outcomes. Our objective was to assess the impact of a deep-learning model (COMPOSER) for the early prediction of sepsis on patient outcomes. We completed a before-and-after quasi-experimental study at two distinct Emergency Departments (EDs) within the UC San Diego Health System. We included 6217 adult septic patients from 1/1/2021 through 4/30/2023. The exposure tested was a nurse-facing Best Practice Advisory (BPA) triggered by COMPOSER. In-hospital mortality, sepsis bundle compliance, 72-h change in sequential organ failure assessment (SOFA) score following sepsis onset, ICU-free days, and the number of ICU encounters were evaluated in the pre-intervention period (705 days) and the post-intervention period (145 days). The causal impact analysis was performed using a Bayesian structural time-series approach with confounder adjustments to assess the significance of the exposure at the 95% confidence level. The deployment of COMPOSER was significantly associated with a 1.9% absolute reduction (17% relative decrease) in in-hospital sepsis mortality (95% CI, 0.3%-3.5%), a 5.0% absolute increase (10% relative increase) in sepsis bundle compliance (95% CI, 2.4%-8.0%), and a 4% (95% CI, 1.1%-7.1%) reduction in 72-h SOFA change after sepsis onset in causal inference analysis. This study suggests that the deployment of COMPOSER for early prediction of sepsis was associated with a significant reduction in mortality and a significant increase in sepsis bundle compliance.
中文摘要: 脓毒症仍然是全世界死亡和发病的主要原因。有助于早期识别脓毒症的算法可能会改善结果,但相对较少的研究检验了它们对现实世界患者结果的影响。我们的目标是评估用于脓毒症早期预测的深度学习模型 (COMPOSER) 对患者预后的影响。我们在加州大学圣地亚哥分校卫生系统内的两个不同的急诊科 (ED) 完成了一项前后对照的准实验研究。我们纳入了 2021 年 1 月 1 日至 2023 年 4 月 30 日期间的 6217 名成年脓毒症患者。所检验的暴露是由 COMPOSER 触发的面向护士的最佳实践提示 (BPA)。在干预前(705天)和干预后(145天)评估院内死亡率、脓毒症集束依从性、脓毒症发作后序贯器官衰竭评估(SOFA)评分72小时变化、无ICU天数以及ICU就诊次数。使用贝叶斯结构时间序列方法进行因果影响分析,并进行混杂因素调整,以评估暴露在 95% 置信水平下的显著性。COMPOSER 的部署与以下结果显著相关:院内脓毒症死亡率绝对降低 1.9%(相对降低 17%)(95% CI,0.3%-3.5%)、脓毒症集束依从性绝对增加 5.0%(相对增加 10%)(95% CI,2.4%-8.0%),以及因果推断分析中脓毒症发作后 72 小时 SOFA 变化降低 4%(95% CI,1.1%-7.1%)。这项研究表明,部署 COMPOSER 进行脓毒症早期预测与死亡率的显著降低和脓毒症集束依从性的显著增加相关。
343. Diagnostic performance of artificial intelligence-assisted PET imaging for Parkinson's disease: a systematic review and meta-analysis.
人工智能辅助 PET 成像对帕金森病的诊断性能:系统评价和荟萃分析。
PMID: 38253738 | DOI: 10.1038/s41746-024-01012-z | 日期: 2024-01-22
摘要: Artificial intelligence (AI)-assisted PET imaging is emerging as a promising tool for the diagnosis of Parkinson's disease (PD). We aim to systematically review the diagnostic accuracy of AI-assisted PET in detecting PD. The Ovid MEDLINE, Ovid Embase, Web of Science, and IEEE Xplore databases were systematically searched for related studies that developed an AI algorithm in PET imaging for diagnostic performance from PD and were published by August 17, 2023. Binary diagnostic accuracy data were extracted for meta-analysis to derive outcomes of interest: area under the curve (AUC). 23 eligible studies provided sufficient data to construct contingency tables that allowed the calculation of diagnostic accuracy. Specifically, 11 studies were identified that distinguished PD from normal control, with a pooled AUC of 0.96 (95% CI: 0.94-0.97) for presynaptic dopamine (DA) and 0.90 (95% CI: 0.87-0.93) for glucose metabolism (18F-FDG). 13 studies were identified that distinguished PD from the atypical parkinsonism (AP), with a pooled AUC of 0.93 (95% CI: 0.91 - 0.95) for presynaptic DA, 0.79 (95% CI: 0.75-0.82) for postsynaptic DA, and 0.97 (95% CI: 0.96-0.99) for 18F-FDG. Acceptable diagnostic performance of PD with AI algorithms-assisted PET imaging was highlighted across the subgroups. More rigorous reporting standards that take into account the unique challenges of AI research could improve future studies.
中文摘要: 人工智能 (AI) 辅助 PET 成像正在成为诊断帕金森病 (PD) 的一种有前景的工具。我们的目标是系统评价人工智能辅助 PET 在检测 PD 方面的诊断准确性。系统地搜索了 Ovid MEDLINE、Ovid Embase、Web of Science 和 IEEE Xplore 数据库中的相关研究,这些研究开发了用于 PD 诊断的 PET 成像 AI 算法,并于 2023 年 8 月 17 日之前发表。提取二元诊断准确性数据进行荟萃分析,以得出感兴趣的结果:曲线下面积 (AUC)。23 项符合条件的研究提供了足够的数据来构建列联表,以便计算诊断准确性。具体而言,确定了 11 项研究将 PD 与正常对照区分开来,突触前多巴胺 (DA) 的汇总 AUC 为 0.96 (95% CI: 0.94-0.97),葡萄糖代谢 (18F-FDG) 的汇总 AUC 为 0.90 (95% CI: 0.87-0.93)。确定了 13 项研究区分 PD 和非典型帕金森综合征 (AP),突触前 DA 的汇总 AUC 为 0.93 (95% CI: 0.91-0.95),突触后 DA 为 0.79 (95% CI: 0.75-0.82),18F-FDG 为 0.97 (95% CI: 0.96-0.99)。各个亚组都显示出人工智能算法辅助 PET 成像对 PD 具有可接受的诊断性能。考虑到人工智能研究的独特挑战的更严格的报告标准可以改善未来的研究。
344. DRG-LLaMA : tuning LLaMA model to predict diagnosis-related group for hospitalized patients.
DRG-LLaMA:调整LLaMA模型以预测住院患者的诊断相关组。
PMID: 38253711 | DOI: 10.1038/s41746-023-00989-3 | 日期: 2024-01-22
摘要: In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRGs assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA's performance correlates with increased model parameters and input context lengths.
中文摘要: 在美国住院支付系统中,诊断相关组(DRG)至关重要,但其分配流程效率低下。该研究引入了 DRG-LLaMA,这是一种先进的大语言模型 (LLM),根据临床记录进行微调,以增强 DRG 分配。利用 LLaMA 作为基础模型,并通过低秩适应 (LoRA) 在 236,192 份 MIMIC-IV 出院小结上进行优化,我们的 DRG-LLaMA-7B 模型在最大输入 token 长度为 512 时,表现出值得注意的宏平均 F1 分数 0.327、top-1 预测准确率 52.0% 以及宏平均曲线下面积 (AUC) 0.986。该模型在 DRG 预测方面超越了先前领先模型的性能,与 ClinicalBERT 和 CAML 相比,宏平均 F1 分数分别相对提高了 40.3% 和 35.7%。应用于基础 DRG 和并发症或合并症 (CC)/主要并发症或合并症 (MCC) 预测时,DRG-LLaMA 分别实现了 67.8% 和 67.5% 的 top-1 预测准确率。此外,我们的研究结果表明 DRG-LLaMA 的性能与模型参数量和输入上下文长度的增加相关。
345. Using digital technologies to diagnose in the home: recommendations from a Delphi panel.
使用数字技术在家中进行诊断:德尔菲专家组的建议。
PMID: 38253682 | DOI: 10.1038/s41746-024-01009-8 | 日期: 2024-01-22
摘要: Rapid advances in digital technology have expanded the availability of diagnostic tools beyond traditional medical settings. Previously confined to clinical environments, these many diagnostic capabilities are now accessible outside the clinic. This study utilized the Delphi method, a consensus-building approach, to develop recommendations for the development and deployment of these innovative technologies. The study findings present the 29 consensus-based recommendations generated through the Delphi process, providing valuable insights and guidance for stakeholders involved in the implementation and utilization of these novel diagnostic solutions. These recommendations serve as a roadmap for navigating the complexities of integrating digital diagnostics into healthcare practice outside traditional settings like hospitals and clinics.
中文摘要: 数字技术的快速进步将诊断工具的可用性扩展到传统医疗环境之外。以前仅限于临床环境,现在可以在诊所外使用这些诊断功能。本研究利用德尔菲法(一种建立共识的方法)为这些创新技术的开发和部署提出建议。研究结果提出了通过德尔菲流程生成的 29 项基于共识的建议,为参与实施和利用这些新型诊断解决方案的利益相关者提供了宝贵的见解和指导。这些建议可作为解决将数字诊断集成到医院和诊所等传统环境之外的医疗保健实践中的复杂性的路线图。
346. Histopathology images-based deep learning prediction of prognosis and therapeutic response in small cell lung cancer.
基于组织病理学图像的深度学习预测小细胞肺癌的预后和治疗反应。
PMID: 38238410 | DOI: 10.1038/s41746-024-01003-0 | 日期: 2024-01-18
摘要: Small cell lung cancer (SCLC) is a highly aggressive subtype of lung cancer characterized by rapid tumor growth and early metastasis. Accurate prediction of prognosis and therapeutic response is crucial for optimizing treatment strategies and improving patient outcomes. In this study, we conducted a deep-learning analysis of Hematoxylin and Eosin (H&E) stained histopathological images using contrastive clustering and identified 50 intricate histomorphological phenotype clusters (HPCs) as pathomic features. We identified two of 50 HPCs with significant prognostic value and then integrated them into a pathomics signature (PathoSig) using the Cox regression model. PathoSig showed significant risk stratification for overall survival and disease-free survival and successfully identified patients who may benefit from postoperative or preoperative chemoradiotherapy. The predictive power of PathoSig was validated in independent multicenter cohorts. Furthermore, PathoSig can provide comprehensive prognostic information beyond the current TNM staging system and molecular subtyping. Overall, our study highlights the significant potential of utilizing histopathology images-based deep learning in improving prognostic predictions and evaluating therapeutic response in SCLC. PathoSig represents an effective tool that aids clinicians in making informed decisions and selecting personalized treatment strategies for SCLC patients.
中文摘要: 小细胞肺癌(SCLC)是一种高度侵袭性的肺癌亚型,其特点是肿瘤生长快速和早期转移。准确预测预后和治疗反应对于优化治疗策略和改善患者预后至关重要。在这项研究中,我们使用对比聚类对苏木精和曙红 (H&E) 染色的组织病理学图像进行了深度学习分析,并确定了 50 个复杂的组织形态学表型簇 (HPC) 作为病理特征。我们确定了 50 个 HPC 中的两个具有显着的预后价值,然后使用 Cox 回归模型将它们集成到病理组学特征 (PathoSig) 中。 PathoSig 显示了总体生存率和无病生存率的显着风险分层,并成功识别出可能受益于术后或术前放化疗的患者。 PathoSig 的预测能力在独立的多中心队列中得到了验证。此外,PathoSig 可以提供超越当前 TNM 分期系统和分子亚型的全面预后信息。总的来说,我们的研究强调了利用基于组织病理学图像的深度学习在改善 SCLC 的预后预测和评估治疗反应方面的巨大潜力。 PathoSig 是一种有效的工具,可帮助临床医生为 SCLC 患者做出明智的决策并选择个性化的治疗策略。
347. Wireless facial biosensing system for monitoring facial palsy with flexible microneedle electrode arrays.
采用柔性微针电极阵列监测面瘫的无线面部生物传感系统。
PMID: 38225423 | DOI: 10.1038/s41746-024-01002-1 | 日期: 2024-01-15
摘要: Facial palsy (FP) profoundly influences interpersonal communication and emotional expression, necessitating precise diagnostic and monitoring tools for optimal care. However, current electromyography (EMG) systems are limited by their bulky nature, complex setups, and dependence on skilled technicians. Here we report an innovative biosensing approach that utilizes a PEDOT:PSS-modified flexible microneedle electrode array (P-FMNEA) to overcome the limitations of existing EMG devices. Supple system-level mechanics ensure excellent conformality to the facial curvilinear regions, enabling the detection of targeted muscular ensemble movements for facial paralysis assessment. Moreover, our apparatus adeptly captures each electrical impulse in response to real-time direct nerve stimulation during neurosurgical procedures. The wireless conveyance of EMG signals to medical facilities via a server augments access to patient follow-up evaluation data, fostering prompt treatment suggestions and enabling the access of multiple facial EMG datasets during typical 6-month follow-ups. Furthermore, the device's soft mechanics alleviate issues of spatial intricacy, diminish pain, and minimize soft tissue hematomas associated with traditional needle electrode positioning. This groundbreaking biosensing strategy has the potential to transform FP management by providing an efficient, user-friendly, and less invasive alternative to the prevailing EMG devices. This pioneering technology enables more informed decision-making in FP-management and therapeutic intervention.
中文摘要: 面瘫 (FP) 深刻影响人际沟通和情绪表达,需要精确的诊断和监测工具才能实现最佳护理。然而,当前的肌电图 (EMG) 系统因其体积庞大、设置复杂以及对熟练技术人员的依赖而受到限制。在这里,我们报告了一种创新的生物传感方法,该方法利用 PEDOT:PSS 修饰的柔性微针电极阵列 (P-FMNEA) 来克服现有 EMG 设备的局限性。柔软的系统级机械装置确保面部曲线区域具有出色的保形性,从而能够检测目标肌肉整体运动以进行面部麻痹评估。此外,我们的设备能够熟练地捕获神经外科手术过程中响应实时直接神经刺激的每个电脉冲。通过服务器将 EMG 信号无线传输到医疗机构,增强了对患者随访评估数据的访问,促进及时的治疗建议,并在典型的 6 个月随访期间能够访问多个面部 EMG 数据集。此外,该设备的软机械结构缓解了空间复杂性问题,减轻了疼痛,并最大限度地减少了与传统针电极定位相关的软组织血肿。这种突破性的生物传感策略有潜力改变 FP 管理,为流行的 EMG 设备提供高效、用户友好且侵入性较小的替代方案。这项开创性技术可以在 FP 管理和治疗干预方面做出更明智的决策。
348. Implementation of cloud computing in the German healthcare system.
在德国医疗保健系统中实施云计算。
PMID: 38218892 | DOI: 10.1038/s41746-024-01000-3 | 日期: 2024-01-13
摘要: With the advent of artificial intelligence and Big Data projects, the necessity for a transition from analog medicine to modern-day solutions such as cloud computing becomes unavoidable. Even though this need is now common knowledge, the process is not always easy to start. Legislative changes, for example at the level of the European Union, are helping the respective healthcare systems to take the necessary steps. This article provides an overview of how a German university hospital is dealing with European data protection laws on the integration of cloud computing into everyday clinical practice. By describing our model approach, we aim to identify opportunities and possible pitfalls to sustainably influence digitization in Germany.
中文摘要: 随着人工智能和大数据项目的出现,从模拟医学过渡到云计算等现代解决方案的必要性变得不可避免。尽管这种需求现在已是众所周知,但这个过程并不总是那么容易开始。立法变革(例如欧盟层面的立法变革)正在帮助各自的医疗保健系统采取必要的措施。本文概述了德国大学医院如何处理有关将云计算集成到日常临床实践中的欧洲数据保护法。通过描述我们的模型方法,我们的目标是识别可持续影响德国数字化的机会和可能的陷阱。
349. Digital remote monitoring for screening and early detection of urinary tract infections.
用于筛查和早期发现尿路感染的数字远程监测。
PMID: 38218738 | DOI: 10.1038/s41746-023-00995-5 | 日期: 2024-01-13
摘要: Urinary Tract Infections (UTIs) are one of the most prevalent bacterial infections in older adults and a significant contributor to unplanned hospital admissions in People Living with Dementia (PLWD), with early detection being crucial due to the predicament of reporting symptoms and limited help-seeking behaviour. The most common diagnostic tool is urine sample analysis, which can be time-consuming and is only employed where UTI clinical suspicion exists. In this method development and proof-of-concept study, participants living with dementia were monitored via low-cost devices in the home that passively measure activity, sleep, and nocturnal physiology. Using 27828 person-days of remote monitoring data (from 117 participants), we engineered features representing symptoms used for diagnosing a UTI. We then evaluate explainable machine learning techniques in passively calculating UTI risk and perform stratification on scores to support clinical translation and allow control over the balance between alert rate and sensitivity and specificity. The proposed UTI algorithm achieves a sensitivity of 65.3% (95% Confidence Interval (CI) = 64.3-66.2) and specificity of 70.9% (68.6-73.1) when predicting UTIs on unseen participants and after risk stratification, a sensitivity of 74.7% (67.9-81.5) and specificity of 87.9% (85.0-90.9). In addition, feature importance methods reveal that the largest contributions to the predictions were bathroom visit statistics, night-time respiratory rate, and the number of previous UTI events, aligning with the literature. Our machine learning method alerts clinicians of UTI risk in subjects, enabling earlier detection and enhanced screening when considering treatment.
中文摘要: 尿路感染 (UTI) 是老年人中最常见的细菌感染之一,也是导致痴呆症患者 (PLWD) 意外入院的重要原因,由于报告症状的困境和寻求帮助行为有限,早期发现至关重要。最常见的诊断工具是尿液样本分析,该工具可能非常耗时,并且仅在临床怀疑存在尿路感染的情况下使用。在这项方法开发和概念验证研究中,通过家中的低成本设备对患有痴呆症的参与者进行监测,这些设备被动测量活动、睡眠和夜间生理机能。使用 27828 人日的远程监控数据(来自 117 名参与者),我们设计了代表用于诊断 UTI 的症状的特征。然后,我们评估被动计算尿路感染风险的可解释机器学习技术,并对分数进行分层,以支持临床转化并控制警报率与敏感性和特异性之间的平衡。所提出的 UTI 算法在预测未见过的参与者的 UTI 时,灵敏度为 65.3%(95% 置信区间 (CI) = 64.3-66.2),特异性为 70.9%(68.6-73.1),在风险分层后,灵敏度为 74.7%(67.9-81.5),特异性为 87.9% (85.0-90.9)。此外,特征重要性方法显示,对预测贡献最大的是上厕所统计数据、夜间呼吸频率和之前 UTI 事件的数量,与文献一致。我们的机器学习方法提醒临床医生受试者的尿路感染风险,从而在考虑治疗时实现早期检测和加强筛查。
350. Next-generation study databases require FAIR, EHR-integrated, and scalable Electronic Data Capture for medical documentation and decision support.
下一代研究数据库需要符合 FAIR 原则、EHR 集成且可扩展的电子数据采集,以用于医疗记录和决策支持。
PMID: 38216645 | DOI: 10.1038/s41746-023-00994-6 | 日期: 2024-01-12
摘要: Structured patient data play a key role in all types of clinical research. They are often collected in study databases for research purposes. In order to describe characteristics of a next-generation study database and assess the feasibility of its implementation a proof-of-concept study in a German university hospital was performed. Key characteristics identified include FAIR access to electronic case report forms (eCRF), regulatory compliant Electronic Data Capture (EDC), an EDC with electronic health record (EHR) integration, scalable EDC for medical documentation, patient generated data, and clinical decision support. In a local case study, we then successfully implemented a next-generation study database for 19 EDC systems (n = 2217 patients) that linked to i.s.h.med (Oracle Cerner) with the local EDC system called OpenEDC. Desiderata of next-generation study databases for patient data were identified from ongoing local clinical study projects in 11 clinical departments at Heidelberg University Hospital, Germany, a major tertiary referral hospital. We compiled and analyzed feature and functionality requests submitted to the OpenEDC team between May 2021 and July 2023. Next-generation study databases are technically and clinically feasible. Further research is needed to evaluate if our approach is feasible in a multi-center setting as well.
中文摘要: 结构化患者数据在所有类型的临床研究中都发挥着关键作用。它们通常被收集在研究数据库中用于研究目的。为了描述下一代研究数据库的特征并评估其实施的可行性,在一家德国大学医院进行了概念验证研究。确定的关键特征包括对电子病例报告表 (eCRF) 的 FAIR 访问、符合监管要求的电子数据采集 (EDC)、与电子健康记录 (EHR) 集成的 EDC、用于医疗文档的可扩展 EDC、患者生成的数据和临床决策支持。在本地案例研究中,我们成功地为 19 个 EDC 系统(n = 2217 名患者)实施了下一代研究数据库,该数据库通过名为 OpenEDC 的本地 EDC 系统链接到 i.s.h.med (Oracle Cerner)。下一代患者数据研究数据库的需求来自德国海德堡大学医院(一家大型三级转诊医院)11 个临床科室正在进行的本地临床研究项目。我们整理并分析了 2021 年 5 月至 2023 年 7 月期间提交给 OpenEDC 团队的特性和功能请求。下一代研究数据库在技术和临床上都是可行的。需要进一步的研究来评估我们的方法在多中心环境中是否同样可行。
351. The performance of a deep learning system in assisting junior ophthalmologists in diagnosing 13 major fundus diseases: a prospective multi-center clinical trial.
深度学习系统辅助初级眼科医生诊断13种主要眼底疾病的表现:前瞻性多中心临床试验。
PMID: 38212607 | DOI: 10.1038/s41746-023-00991-9 | 日期: 2024-01-11
摘要: Artificial intelligence (AI)-based diagnostic systems have been reported to improve fundus disease screening in previous studies. This multicenter prospective self-controlled clinical trial aims to evaluate the diagnostic performance of a deep learning system (DLS) in assisting junior ophthalmologists in detecting 13 major fundus diseases. A total of 1493 fundus images from 748 patients were prospectively collected from five tertiary hospitals in China. Nine junior ophthalmologists were trained and annotated the images with or without the suggestions proposed by the DLS. The diagnostic performance was evaluated among three groups: DLS-assisted junior ophthalmologist group (test group), junior ophthalmologist group (control group) and DLS group. The diagnostic consistency was 84.9% (95%CI, 83.0% ~ 86.9%), 72.9% (95%CI, 70.3% ~ 75.6%) and 85.5% (95%CI, 83.5% ~ 87.4%) in the test group, control group and DLS group, respectively. With the help of the proposed DLS, the diagnostic consistency of junior ophthalmologists improved by approximately 12% (95% CI, 9.1% ~ 14.9%) with statistical significance (P < 0.001). For the detection of 13 diseases, the test group achieved significant higher sensitivities (72.2% ~ 100.0%) and comparable specificities (90.8% ~ 98.7%) comparing with the control group (sensitivities, 50% ~ 100%; specificities 96.7 ~ 99.8%). The DLS group presented similar performance to the test group in the detection of any fundus abnormality (sensitivity, 95.7%; specificity, 87.2%) and each of the 13 diseases (sensitivity, 83.3% ~ 100.0%; specificity, 89.0 ~ 98.0%). The proposed DLS provided a novel approach for the automatic detection of 13 major fundus diseases with high diagnostic consistency and assisted to improve the performance of junior ophthalmologists, resulting especially in reducing the risk of missed diagnoses. ClinicalTrials.gov NCT04723160.
中文摘要: 据既往研究报道,基于人工智能 (AI) 的诊断系统可以改善眼底疾病筛查。这项多中心前瞻性自我对照临床试验旨在评估深度学习系统(DLS)辅助初级眼科医生检测13种主要眼底疾病的诊断性能。前瞻性收集了中国五家三级医院的 748 名患者的 1493 幅眼底图像。九名初级眼科医生接受了培训,并在有或没有 DLS 提出的建议的情况下对图像进行了标注。在三组中评估诊断性能:DLS辅助的初级眼科医生组(试验组)、初级眼科医生组(对照组)和DLS组。试验组、对照组和DLS组的诊断一致性分别为84.9%(95%CI,83.0% ~ 86.9%)、72.9%(95%CI,70.3% ~ 75.6%)和85.5%(95%CI,83.5% ~ 87.4%)。在所提出的DLS的帮助下,初级眼科医生的诊断一致性提高了约12%(95% CI,9.1% ~ 14.9%),具有统计学意义(P < 0.001)。对于13种疾病的检测,与对照组(敏感性 50% ~ 100%;特异性 96.7 ~ 99.8%)相比,试验组的敏感性显著更高(72.2% ~ 100.0%),特异性相当(90.8% ~ 98.7%)。DLS组在检测任何眼底异常(敏感性 95.7%;特异性 87.2%)和13种疾病中的每一种(敏感性 83.3% ~ 100.0%;特异性 89.0 ~ 98.0%)方面与试验组表现相似。所提出的DLS为自动检测13种主要眼底疾病提供了一种具有高诊断一致性的新方法,有助于提高初级眼科医生的表现,尤其是降低漏诊风险。ClinicalTrials.gov NCT04723160。
352. Multimodal digital phenotyping of diet, physical activity, and glycemia in Hispanic/Latino adults with or at risk of type 2 diabetes.
对患有 2 型糖尿病或有 2 型糖尿病风险的西班牙裔/拉丁裔成年人的饮食、体力活动和血糖进行多模式数字表型分析。
PMID: 38212415 | DOI: 10.1038/s41746-023-00985-7 | 日期: 2024-01-11
摘要: Digital phenotyping refers to characterizing human bio-behavior through wearables, personal devices, and digital health technologies. Digital phenotyping in populations facing a disproportionate burden of type 2 diabetes (T2D) and health disparities continues to lag compared to other populations. Here, we report our study demonstrating the application of multimodal digital phenotyping, i.e., the simultaneous use of CGM, physical activity monitors, and meal tracking in Hispanic/Latino individuals with or at risk of T2D. For 14 days, 36 Hispanic/Latino adults (28 female, 14 with non-insulin treated T2D) wore a continuous glucose monitor (CGM) and a physical activity monitor (Actigraph) while simultaneously logging meals using the MyFitnessPal app. We model meal events and daily digital biomarkers representing diet, physical activity choices, and corresponding glycemic response. We develop a digital biomarker for meal events that differentiates meal events into normal and elevated categories. We examine the contribution of daily digital biomarkers of elevated meal event count and step count on daily time-in-range 54-140 mg/dL (TIR54-140) and average glucose. After adjusting for step count, a change in elevated meal event count from zero to two decreases TIR54-140 by 4.0% (p = 0.003). An increase in 1000 steps in post-meal step count also reduces the meal event glucose response by 641 min mg/dL (p = 0.0006) and reduces the odds of an elevated meal event by 55% (p < 0.0001). The proposed meal event digital biomarkers may provide an opportunity for non-pharmacologic interventions for Hispanic/Latino adults facing a disproportionate burden of T2D.
中文摘要: 数字表型分析是指通过可穿戴设备、个人设备和数字健康技术来表征人类生物行为。与其他人群相比,面临不成比例的 2 型糖尿病 (T2D) 负担和健康差异的人群的数字表型分析仍然滞后。在这里,我们报告了我们的研究,展示了多模式数字表型分析的应用,即在患有或有 T2D 风险的西班牙裔/拉丁裔个体中同时使用 CGM、体力活动监测器和膳食跟踪。在 14 天的时间里,36 名西班牙裔/拉丁裔成年人(28 名女性,14 名患有未经胰岛素治疗的 T2D)佩戴连续血糖监测仪 (CGM) 和体力活动监测仪 (Actigraph),同时使用 MyFitnessPal 应用程序记录膳食。我们对进餐事件以及代表饮食、体力活动选择和相应血糖反应的每日数字生物标志物进行建模。我们开发了一种用于进餐事件的数字生物标志物,可将进餐事件分为正常和升高两类。我们研究了升高型进餐事件计数和步数这两项每日数字生物标志物对每日目标范围内时间 54-140 mg/dL (TIR54-140) 和平均血糖的贡献。调整步数后,升高型进餐事件计数从 0 增加到 2 会使 TIR54-140 降低 4.0% (p = 0.003)。餐后步数增加 1000 步还会使进餐事件葡萄糖反应降低 641 min mg/dL (p = 0.0006),并将出现升高型进餐事件的几率降低 55% (p < 0.0001)。所提出的进餐事件数字生物标志物可能为面临不成比例的 T2D 负担的西班牙裔/拉丁裔成年人提供非药物干预的机会。
353. Large language models to identify social determinants of health in electronic health records.
用于识别电子健康记录中健康的社会决定因素的大型语言模型。
PMID: 38200151 | DOI: 10.1038/s41746-023-00970-0 | 日期: 2024-01-11
摘要: Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71), and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). Adding LLM-generated synthetic data to training varied across models and architecture, but improved the performance of smaller Flan-T5 models (delta F1 + 0.12 to +0.23). Our best-fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models in the zero- and few-shot setting, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
中文摘要: 健康的社会决定因素 (SDoH) 在患者治疗结果中发挥着至关重要的作用,但其记录在电子健康记录 (EHR) 的结构化数据中经常缺失或不完整。大型语言模型 (LLM) 可以从 EHR 中高通量提取 SDoH,以支持研究和临床护理。然而,类别不平衡和数据限制给这些记录稀疏但关键的信息带来了挑战。在这里,我们研究了使用 LLM 从 EHR 叙述文本中提取六个 SDoH 类别的最佳方法:就业、住房、交通、父母状况、关系和社会支持。性能最佳的模型是针对任何 SDoH 提及(宏平均 F1 0.71)进行微调的 Flan-T5 XL,以及针对不利的 SDoH 提及(宏平均 F1 0.70)进行微调的 Flan-T5 XXL。在训练中加入 LLM 生成的合成数据,其效果因模型和架构而异,但提高了较小 Flan-T5 模型的性能(F1 增量 +0.12 至 +0.23)。我们经过最佳微调的模型优于 ChatGPT 系列模型在零样本和少样本设置下的性能,但在不利 SDoH 上采用 10 样本提示的 GPT4 除外。当将种族/民族和性别描述符添加到文本中时,微调模型比 ChatGPT 更不可能改变其预测,这表明算法偏差较小 (p < 0.05)。我们的模型识别出 93.8% 的存在不利 SDoH 的患者,而 ICD-10 代码仅识别出 2.0%。这些结果证明了 LLM 在改善 SDoH 的真实世界证据和帮助识别可以从资源支持中受益的患者方面的潜力。
354. UK funding agency launches digital health hubs: a new catalyst for change?
英国资助机构推出数字健康中心:变革的新催化剂?
PMID: 38184701 | DOI: 10.1038/s41746-023-00990-w | 日期: 2024-01-06
355. Machine learning-based clinical decision support system for treatment recommendation and overall survival prediction of hepatocellular carcinoma: a multi-center study.
基于机器学习的肝细胞癌治疗推荐和总生存预测的临床决策支持系统:一项多中心研究。
PMID: 38182886 | DOI: 10.1038/s41746-023-00976-8 | 日期: 2024-01-05
摘要: The treatment decisions for patients with hepatocellular carcinoma are determined by a wide range of factors, and there is a significant difference between the recommendations of widely used staging systems and the actual initial treatment choices. Herein, we propose a machine learning-based clinical decision support system suitable for use in multi-center settings. We collected data from nine institutions in South Korea for training and validation datasets. The internal and external datasets included 935 and 1750 patients, respectively. We developed a model with 20 clinical variables consisting of two stages: the first stage which recommends initial treatment using an ensemble voting machine, and the second stage, which predicts post-treatment survival using a random survival forest algorithm. We derived the first and second treatment options from the results with the highest and the second-highest probabilities given by the ensemble model and predicted their post-treatment survival. When only the first treatment option was accepted, the mean accuracy of treatment recommendation in the internal and external datasets was 67.27% and 55.34%, respectively. The accuracy increased to 87.27% and 86.06%, respectively, when the second option was included as the correct answer. Harrell's C index, integrated time-dependent AUC curve, and integrated Brier score of survival prediction in the internal and external datasets were 0.8381 and 0.7767, 91.89 and 86.48, 0.12, and 0.14, respectively. The proposed system can assist physicians by providing data-driven predictions for reference from other larger institutions or other physicians within the same institution when making treatment decisions.
中文摘要: 肝细胞癌患者的治疗决策由多种因素决定,广泛使用的分期系统的建议与实际的初始治疗选择之间存在显着差异。在此,我们提出了一种基于机器学习的临床决策支持系统,适合在多中心环境中使用。我们从韩国的九个机构收集了数据用于训练和验证数据集。内部和外部数据集分别包括 935 名和 1750 名患者。我们开发了一个包含 20 个临床变量的模型,由两个阶段组成:第一阶段建议使用整体投票机进行初始治疗,第二阶段使用随机生存森林算法预测治疗后生存。我们根据集成模型给出的最高和第二高概率的结果得出第一和第二治疗方案,并预测他们的治疗后生存率。当仅接受第一种治疗方案时,内部和外部数据集中治疗推荐的平均准确度分别为 67.27% 和 55.34%。当第二个选项被包含为正确答案时,准确率分别增加到 87.27% 和 86.06%。内部和外部数据集中的 Harrell's C 指数、集成的时间依赖性 AUC 曲线和生存预测的集成 Brier 评分分别为 0.8381 和 0.7767、91.89 和 86.48、0.12 和 0.14。所提出的系统可以通过提供数据驱动的预测来帮助医生,以供其他较大机构或同一机构内的其他医生在做出治疗决策时参考。
356. Artificial intelligence-enabled ECG for left ventricular diastolic function and filling pressure.
支持人工智能的心电图,用于测量左心室舒张功能和充盈压。
PMID: 38182738 | DOI: 10.1038/s41746-023-00993-7 | 日期: 2024-01-06
摘要: Assessment of left ventricular diastolic function plays a major role in the diagnosis and prognosis of cardiac diseases, including heart failure with preserved ejection fraction. We aimed to develop an artificial intelligence (AI)-enabled electrocardiogram (ECG) model to identify echocardiographically determined diastolic dysfunction and increased filling pressure. We trained, validated, and tested an AI-enabled ECG in 98,736, 21,963, and 98,763 patients, respectively, who had an ECG and echocardiographic diastolic function assessment within 14 days with no exclusion criteria. It was also tested in 55,248 patients with indeterminate diastolic function by echocardiography. The model was evaluated using the area under the curve (AUC) of the receiver operating characteristic curve, and its prognostic performance was compared to echocardiography. The AUC for detecting increased filling pressure was 0.911. The AUCs to identify diastolic dysfunction grades ≥1, ≥2, and 3 were 0.847, 0.911, and 0.943, respectively. During a median follow-up of 5.9 years, 20,223 (20.5%) died. Patients with increased filling pressure predicted by AI-ECG had higher mortality than those with normal filling pressure, after adjusting for age, sex, and comorbidities in the test group (hazard ratio (HR) 1.7, 95% CI 1.645-1.757) similar to echocardiography and in the indeterminate group (HR 1.34, 95% CI 1.298-1.383). An AI-enabled ECG identifies increased filling pressure and diastolic function grades with a good prognostic value similar to echocardiography. AI-ECG is a simple and promising tool to enhance the detection of diseases associated with diastolic dysfunction and increased diastolic filling pressure.
中文摘要: 左心室舒张功能的评估在心脏病(包括射血分数保留的心力衰竭)的诊断和预后中起着重要作用。我们的目标是开发一种支持人工智能 (AI) 的心电图 (ECG) 模型,以识别超声心动图确定的舒张功能障碍和充盈压升高。我们分别在 98,736、21,963 和 98,763 名患者中训练、验证和测试了人工智能心电图,这些患者在 14 天内进行了心电图和超声心动图舒张功能评估,未设排除标准。还在 55,248 名超声心动图舒张功能不确定的患者中进行了测试。使用受试者工作特征曲线的曲线下面积(AUC)评估该模型,并将其预后性能与超声心动图进行比较。检测充盈压升高的 AUC 为 0.911。识别舒张功能障碍等级≥1、≥2和3级的AUC分别为0.847、0.911和0.943。在中位随访 5.9 年期间,有 20,223 人 (20.5%) 死亡。在调整年龄、性别和合并症后,AI-ECG 预测充盈压升高的患者死亡率高于充盈压正常的患者,这一结果在测试组(风险比 (HR) 1.7,95% CI 1.645-1.757,与超声心动图相似)和不确定组(HR 1.34,95% CI 1.298-1.383)中均成立。支持人工智能的心电图可识别充盈压升高和舒张功能等级,具有与超声心动图类似的良好预后价值。AI-ECG 是一种简单且有前景的工具,可增强与舒张功能障碍和舒张充盈压升高相关疾病的检测。
357. An interpretable model based on graph learning for diagnosis of Parkinson's disease with voice-related EEG.
基于图学习的可解释模型,用于通过语音相关脑电图诊断帕金森病。
PMID: 38182737 | DOI: 10.1038/s41746-023-00983-9 | 日期: 2024-01-05
摘要: Parkinson's disease (PD) exhibits significant clinical heterogeneity, presenting challenges in the identification of reliable electroencephalogram (EEG) biomarkers. Machine learning techniques have been integrated with resting-state EEG for PD diagnosis, but their practicality is constrained by the interpretable features and the stochastic nature of resting-state EEG. The present study proposes a novel and interpretable deep learning model, graph signal processing-graph convolutional networks (GSP-GCNs), using event-related EEG data obtained from a specific task involving vocal pitch regulation for PD diagnosis. By incorporating both local and global information from single-hop and multi-hop networks, our proposed GSP-GCNs models achieved an averaged classification accuracy of 90.2%, exhibiting a significant improvement of 9.5% over other deep learning models. Moreover, the interpretability analysis revealed discriminative distributions of large-scale EEG networks and topographic map of microstate MS5 learned by our models, primarily located in the left ventral premotor cortex, superior temporal gyrus, and Broca's area that are implicated in PD-related speech disorders, reflecting our GSP-GCN models' ability to provide interpretable insights identifying distinctive EEG biomarkers from large-scale networks. These findings demonstrate the potential of interpretable deep learning models coupled with voice-related EEG signals for distinguishing PD patients from healthy controls with accuracy and elucidating the underlying neurobiological mechanisms.
中文摘要: 帕金森病 (PD) 表现出显著的临床异质性,给识别可靠的脑电图 (EEG) 生物标志物带来了挑战。机器学习技术已与静息态脑电图相结合用于帕金森病诊断,但其实用性受到静息态脑电图的可解释特征和随机性的限制。本研究提出了一种新颖且可解释的深度学习模型,即图信号处理-图卷积网络 (GSP-GCNs),该模型使用从涉及嗓音音高调节的特定任务中获得的事件相关脑电图数据来进行 PD 诊断。通过结合来自单跳和多跳网络的局部和全局信息,我们提出的 GSP-GCNs 模型实现了 90.2% 的平均分类准确率,比其他深度学习模型显著提高了 9.5%。此外,可解释性分析揭示了我们的模型学习到的大规模脑电图网络和微状态 MS5 地形图的判别性分布,主要位于与 PD 相关言语障碍有关的左腹侧前运动皮层、颞上回和布罗卡区,这反映了我们的 GSP-GCN 模型能够提供可解释的见解,从大规模网络中识别独特的脑电图生物标志物。这些发现证明了可解释的深度学习模型与嗓音相关脑电图信号相结合的潜力,可以准确地区分帕金森病患者与健康对照,并阐明潜在的神经生物学机制。
358. Computerized cognitive training for memory functions in mild cognitive impairment or dementia: a systematic review and meta-analysis.
针对轻度认知障碍或痴呆症记忆功能的计算机认知训练:系统评价和荟萃分析。
PMID: 38172429 | DOI: 10.1038/s41746-023-00987-5 | 日期: 2024-01-03
摘要: Dementia is a common medical condition in the ageing population, and cognitive intervention is a non-pharmacologic strategy to improve cognitive functions. This meta-analysis evaluated the benefits of computerized cognitive training (CCT) on memory functions in individuals with MCI or dementia. The study was registered prospectively with PROSPERO under CRD42022363715 and received no funding. The search was conducted on MEDLINE, Embase, and PsycINFO on Sept 19, 2022, and Google Scholar on May 9, 2023, to identify randomized controlled trials that examined the effects of CCT on memory outcomes in individuals with MCI or dementia. Mean differences and standard deviations of neuropsychological assessment scores were extracted to derive standardized mean differences. Our search identified 10,678 studies, of which 35 studies were included. Among 1489 participants with MCI, CCT showed improvements in verbal memory (SMD (95%CI) = 0.55 (0.35-0.74)), visual memory (0.36 (0.12-0.60)), and working memory (0.37 (0.10-0.64)). Supervised CCT showed improvements in verbal memory (0.72 (0.45-0.98)), visual memory (0.51 (0.22-0.79)), and working memory (0.33 (0.01-0.66)). Unsupervised CCT showed improvement in verbal memory (0.21 (0.04-0.38)) only. Among 371 participants with dementia, CCT showed improvement in verbal memory (0.64 (0.02-1.27)) only. Inconsistency due to heterogeneity (as indicated by I² values) is observed, which reduces our confidence in MCI outcomes to a moderate level and dementia outcomes to a low level. The results suggest that CCT is efficacious on various memory domains in individuals with MCI. Although the supervised approach showed greater effects, the unsupervised approach can improve verbal memory while allowing users to receive CCT at home without engaging as many healthcare resources.
中文摘要: 痴呆症是老龄化人群中的常见疾病,认知干预是一种改善认知功能的非药物策略。这项荟萃分析评估了计算机认知训练 (CCT) 对轻度认知障碍 (MCI) 或痴呆症患者记忆功能的益处。该研究已在 PROSPERO 进行前瞻性注册,注册编号为 CRD42022363715,且未获得任何资助。检索于 2022 年 9 月 19 日在 MEDLINE、Embase 和 PsycINFO 上进行,并于 2023 年 5 月 9 日在 Google Scholar 上进行,以确定检验 CCT 对 MCI 或痴呆症患者记忆结局影响的随机对照试验。提取神经心理学评估分数的平均差和标准差以得出标准化平均差 (SMD)。我们的检索发现了 10,678 项研究,其中纳入了 35 项研究。在 1489 名 MCI 参与者中,CCT 显示言语记忆 (SMD (95%CI) = 0.55 (0.35-0.74))、视觉记忆 (0.36 (0.12-0.60)) 和工作记忆 (0.37 (0.10-0.64)) 有所改善。有监督 CCT 显示言语记忆 (0.72 (0.45-0.98))、视觉记忆 (0.51 (0.22-0.79)) 和工作记忆 (0.33 (0.01-0.66)) 有所改善。无监督 CCT 仅显示言语记忆有所改善 (0.21 (0.04-0.38))。在 371 名痴呆症参与者中,CCT 仅显示言语记忆有所改善 (0.64 (0.02-1.27))。观察到由异质性(如 I² 值所示)导致的不一致,这使我们对 MCI 结果的信心降至中等水平,对痴呆症结果的信心降至低水平。结果表明,CCT 对 MCI 患者的多个记忆领域有效。尽管有监督的方法显示出更大的效果,但无监督的方法可以改善言语记忆,同时允许用户在家中接受 CCT,而无需占用同样多的医疗资源。