Medical Ontology Recognition: Mapping to UMLS

| Dimension | scispaCy + UMLS | MedCAT |
| --- | --- | --- |
| Concept source | UMLS | UMLS |
| NER | Statistical (fixed) | Integrated |
| Disambiguation | String similarity | Neural context model |
| Abbreviations | Weak | Strong |
| Domain adaptation | | ✅ (trainable) |
| Transparency | High | Medium |
| Speed | Fast | Moderate |
| Reproducibility | High | Medium |
| Explainability alignment | Excellent | Moderate |
| Setup complexity | Low | High |

Code

Clinical sentence → scispaCy NER (en_core_sci_md) → String-based candidate generation → UMLS Metathesaurus lookup → Similarity-based linking (CUI + score)
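The string-based candidate generation step can be illustrated with a toy sketch. To my understanding, scispaCy matches mentions against UMLS aliases using TF-IDF-weighted character 3-grams and an approximate nearest-neighbour index; the version below is a simplification that uses raw 3-gram counts and a brute-force comparison. The `ALIASES` table, the `candidates` function and the 0.5 threshold are hypothetical and exist only for illustration; the CUIs are the ones that appear in the output of this post.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical mini alias table (alias string -> CUI); a real linker indexes
# millions of UMLS aliases instead.
ALIASES = {
    "eye": "C0015392",
    "redness of eye": "C0235267",
    "closed": "C0587267",
    "resistant": "C0332325",
}


def candidates(mention: str, threshold: float = 0.5):
    """Return (cui, alias, score) tuples scoring at or above the threshold, best first."""
    mention_vec = char_ngrams(mention)
    scored = [
        (cui, alias, cosine(mention_vec, char_ngrams(alias)))
        for alias, cui in ALIASES.items()
    ]
    kept = [item for item in scored if item[2] >= threshold]
    return sorted(kept, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for mention in ("eyes", "closed"):
        print(mention, candidates(mention))
```

Because the score depends only on the surface string, two aliases with identical spelling get identical scores no matter what the surrounding sentence says. That is the "String similarity" disambiguation row in the table above.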

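# UMLS concept mapping for a clinical sentence using Python
# This example uses scispaCy with its UMLS entity linker
import spacy
from scispacy.abbreviation import AbbreviationDetector  # noqa: F401 (import registers "abbreviation_detector")
from scispacy.linking import EntityLinker  # noqa: F401 (import registers "scispacy_linker")

# Load a biomedical NLP model
nlp = spacy.load("en_core_sci_md")

# resolve_abbreviations only takes effect when an abbreviation detector runs before the linker
nlp.add_pipe("abbreviation_detector")

# Add the entity linker to the pipeline; linker_name="umls" selects the UMLS knowledge base
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})

# Get a reference to the linker component
linker = nlp.get_pipe("scispacy_linker")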

# Input sentence
sentence = "he has a family history of seizure and eyes often closed and resistant to passive opening"

# Process the sentence
doc = nlp(sentence)

# Extract and display UMLS concepts
results = []
for ent in doc.ents:
    # kb_ents holds (CUI, score) pairs; the list is empty when nothing was linked
    if ent._.kb_ents:
        for umls_ent in ent._.kb_ents:
            cui, score = umls_ent
            concept = linker.kb.cui_to_entity[cui]
            results.append({
                "text_span": ent.text,
                "cui": cui,
                "preferred_name": concept.canonical_name,
                "semantic_types": concept.types,
                "confidence_score": score
            })

# Print results
if results:
    for r in results:
        print(r)
else:
    print("No UMLS concepts found.")

Output

{'text_span': 'family history of seizure', 'cui': 'C0241889', 'preferred_name': 'Family history (finding)', 'semantic_types': ['T033'], 'confidence_score': 0.8135724663734436}

{'text_span': 'family history of seizure', 'cui': 'C5238701', 'preferred_name': 'Family History of Myocardial Infarction', 'semantic_types': ['T033'], 'confidence_score': 0.7316329479217529}

{'text_span': 'family history of seizure', 'cui': 'C2317524', 'preferred_name': 'Family history of coronary arteriosclerosis', 'semantic_types': ['T033'], 'confidence_score': 0.7181885242462158}

{'text_span': 'family history of seizure', 'cui': 'C0260515', 'preferred_name': 'Family history of cancer', 'semantic_types': ['T033'], 'confidence_score': 0.7104339003562927}

{'text_span': 'eyes', 'cui': 'C0015392', 'preferred_name': 'Eye', 'semantic_types': ['T023'], 'confidence_score': 0.9724115133285522}

{'text_span': 'eyes', 'cui': 'C0235267', 'preferred_name': 'Redness of eye', 'semantic_types': ['T184'], 'confidence_score': 0.8306822180747986}

{'text_span': 'eyes', 'cui': 'C0266574', 'preferred_name': 'Ablepharon', 'semantic_types': ['T019'], 'confidence_score': 0.7846059799194336}

{'text_span': 'eyes', 'cui': 'C0885957', 'preferred_name': 'Eye care (regime/therapy)', 'semantic_types': ['T058'], 'confidence_score': 0.783681333065033}

{'text_span': 'eyes', 'cui': 'C1268161', 'preferred_name': 'Eye part', 'semantic_types': ['T023'], 'confidence_score': 0.7718889713287354}

{'text_span': 'closed', 'cui': 'C0587267', 'preferred_name': 'Closed', 'semantic_types': ['T169'], 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C1548219', 'preferred_name': 'Bed Status - Closed', 'semantic_types': ['T078'], 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C5669719', 'preferred_name': 'Closed Captioning', 'semantic_types': ['T170'], 'confidence_score': 0.709942638874054}

{'text_span': 'resistant', 'cui': 'C0332325', 'preferred_name': 'Resistant (qualifier value)', 'semantic_types': ['T169'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C1550464', 'preferred_name': 'resistant - Observation Interpretation Susceptibility', 'semantic_types': ['T078'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2827757', 'preferred_name': 'Antimicrobial Resistance Result', 'semantic_types': ['T034'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2986418', 'preferred_name': 'Resistant Starch', 'semantic_types': ['T109', 'T121'], 'confidence_score': 0.7441271543502808}

{'text_span': 'resistant', 'cui': 'C1514892', 'preferred_name': 'Resistance Process', 'semantic_types': ['T039'], 'confidence_score': 0.7311867475509644}

{'text_span': 'passive opening', 'cui': 'C0175566', 'preferred_name': 'Open', 'semantic_types': ['T082'], 'confidence_score': 0.7032682299613953}

{'text_span': 'passive opening', 'cui': 'C1882151', 'preferred_name': 'Opening', 'semantic_types': ['T082'], 'confidence_score': 0.7032682299613953}
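The output lists several candidates per span, and some of them tie exactly: "closed" scores 0.9921 for both 'Closed' (C0587267) and 'Bed Status - Closed' (C1548219), while "family history of seizure" never reaches a seizure-specific concept. A small post-processing step makes this easier to review. The sketch below keeps the best candidate per span, flags exact ties as ambiguous, and drops spans whose best score is weak. The function name `best_per_span` and the 0.8 cut-off are assumptions chosen for illustration, not part of scispaCy; the rows are copied from the output above with scores rounded to four decimals.

```python
# A few rows copied from the output above (scores rounded to four decimals).
results = [
    {"text_span": "eyes", "cui": "C0015392", "preferred_name": "Eye",
     "semantic_types": ["T023"], "confidence_score": 0.9724},
    {"text_span": "eyes", "cui": "C0235267", "preferred_name": "Redness of eye",
     "semantic_types": ["T184"], "confidence_score": 0.8307},
    {"text_span": "closed", "cui": "C0587267", "preferred_name": "Closed",
     "semantic_types": ["T169"], "confidence_score": 0.9921},
    {"text_span": "closed", "cui": "C1548219", "preferred_name": "Bed Status - Closed",
     "semantic_types": ["T078"], "confidence_score": 0.9921},
    {"text_span": "passive opening", "cui": "C0175566", "preferred_name": "Open",
     "semantic_types": ["T082"], "confidence_score": 0.7033},
]


def best_per_span(rows, min_score=0.8):
    """Keep the top candidate per span, flag exact ties, drop weakly linked spans.

    min_score is a hypothetical cut-off; tune it against annotated data.
    """
    grouped = {}
    for row in rows:
        grouped.setdefault(row["text_span"], []).append(row)

    selected = []
    for span, cands in grouped.items():
        top_score = max(c["confidence_score"] for c in cands)
        if top_score < min_score:
            continue  # best match is too weak to trust
        tied = [c for c in cands if c["confidence_score"] == top_score]
        selected.append({
            "text_span": span,
            "cui": tied[0]["cui"],
            "preferred_name": tied[0]["preferred_name"],
            "confidence_score": top_score,
            "ambiguous": len(tied) > 1,  # string similarity could not separate them
        })
    return selected


if __name__ == "__main__":
    for row in best_per_span(results):
        print(row)
```

Spans flagged as ambiguous are the ones where a context-aware linker, or a filter on `semantic_types`, would have to make the final choice.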
