Medical Ontology Recognition: Mapping to UMLS

| Dimension | scispaCy + UMLS | MedCAT |
|---|---|---|
| Concept source | UMLS | UMLS |
| NER | Statistical (fixed) | Integrated |
| Disambiguation | String similarity | Neural context model |
| Abbreviations | Weak | Strong |
| Domain adaptation | | ✅ (trainable) |
| Transparency | High | Medium |
| Speed | Fast | Moderate |
| Reproducibility | High | Medium |
| Explainability alignment | Excellent | Moderate |
| Setup complexity | Low | High |
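The Disambiguation row is the key difference between the two tools. scispaCy ranks candidates by how closely the mention string matches a concept's aliases, while MedCAT also uses the words around the mention. The toy sketch below illustrates only the context idea and is not MedCAT's implementation: MedCAT learns dense context embeddings, whereas here plain word counts and two hand-written, hypothetical context profiles (`CONCEPT_CONTEXTS`) stand in for them. The two CUIs are candidates for "resistant" taken from the output in this post.

```python
import math
from collections import Counter

# HYPOTHETICAL context profiles for two UMLS candidates of the word "resistant".
# MedCAT learns dense context embeddings; simple word counts stand in for them here.
CONCEPT_CONTEXTS = {
    "C0332325": "passive movement opening limb tone eyes examination",  # Resistant (qualifier value)
    "C2827757": "antibiotic bacteria culture susceptibility isolate penicillin strain",  # Antimicrobial Resistance Result
}


def bag_of_words(text):
    """Lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(tokens, index, window=4, contexts=CONCEPT_CONTEXTS):
    """Pick the CUI whose profile best matches the words around tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = bag_of_words(" ".join(left + right))
    scores = {cui: cosine(context, bag_of_words(profile)) for cui, profile in contexts.items()}
    return max(scores, key=scores.get), scores


tokens = "eyes often closed and resistant to passive opening".split()
best, scores = disambiguate(tokens, tokens.index("resistant"))
print(best, scores)
```

Even this crude context signal separates two readings that string similarity scores identically: in the output below, both C0332325 and C2827757 receive 0.9937 for "resistant".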

Code

Clinical sentence
→ scispaCy NER (en_core_sci_md)
→ String-based candidate generation
→ UMLS Metathesaurus lookup
→ Similarity-based linking (CUI + score)
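The candidate-generation and linking steps can be sketched in plain Python. As far as we know, scispaCy builds character 3-gram TF-IDF vectors for every UMLS alias and searches them with an approximate nearest-neighbour index. The sketch below does an exact, brute-force cosine search instead, over a tiny hypothetical alias table (`ALIASES`) whose CUIs and names are copied from the output in this post. The `k` and `threshold` values are illustrative, not scispaCy's settings.

```python
import math
from collections import Counter

# HYPOTHETICAL mini alias table: (alias, CUI, preferred name).
# Real UMLS has millions of aliases; these CUIs/names come from the output in this post.
ALIASES = [
    ("eye", "C0015392", "Eye"),
    ("eyes", "C0015392", "Eye"),
    ("redness of eye", "C0235267", "Redness of eye"),
    ("closed", "C0587267", "Closed"),
    ("family history", "C0241889", "Family history (finding)"),
    ("family history of cancer", "C0260515", "Family history of cancer"),
]


def char_ngrams(text, n=3):
    """Character n-grams of the lower-cased string, padded with a space at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec


def build_index(aliases):
    """Return one unit-length TF-IDF vector per alias plus the IDF table."""
    counts = [Counter(char_ngrams(alias)) for alias, _, _ in aliases]
    doc_freq = Counter(g for c in counts for g in c)
    n_docs = len(counts)
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in c.items()}) for c in counts]
    return vectors, idf


def link(mention, aliases, vectors, idf, k=3, threshold=0.3):
    """Up to k (CUI, name, cosine score) candidates at or above threshold, best first."""
    grams = Counter(char_ngrams(mention))
    query = normalise({g: tf * idf.get(g, 0.0) for g, tf in grams.items()})  # unseen n-grams weigh 0
    best = {}
    for (_, cui, name), vec in zip(aliases, vectors):
        score = sum(w * vec.get(g, 0.0) for g, w in query.items())
        # several aliases can map to one CUI: keep its best-matching alias score
        if score >= threshold and score > best.get(cui, ("", -1.0))[1]:
            best[cui] = (name, score)
    ranked = sorted(((cui, name, s) for cui, (name, s) in best.items()), key=lambda t: t[2], reverse=True)
    return ranked[:k]


vectors, idf = build_index(ALIASES)
print(link("eyes", ALIASES, vectors, idf))
```

Like any purely string-based linker, this only measures how similar two strings are. When several concepts share the same alias (as "Closed" and "Bed Status - Closed" appear to in the output), they receive the same score.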

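# UMLS concept mapping for a clinical sentence using Python
# This example uses scispaCy + the UMLS EntityLinker
import spacy
from scispacy.abbreviation import AbbreviationDetector  # registers the "abbreviation_detector" pipe
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" pipe

# Load a biomedical NLP model
nlp = spacy.load("en_core_sci_md")

# resolve_abbreviations=True only takes effect if the abbreviation detector runs before the linker
nlp.add_pipe("abbreviation_detector")

# Add the UMLS entity linker to the pipeline ("umls" selects the UMLS knowledge base).
# The first run downloads and caches the knowledge base, which can take a while.
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})

# Get a reference to the linker component (used below to look up concept details)
linker = nlp.get_pipe("scispacy_linker")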

# Input sentence
sentence = "he has a family history of seizure and eyes often closed and resistant to passive opening"

# Process the sentence
doc = nlp(sentence)

# Extract and display UMLS concepts
results = []
for ent in doc.ents:
    # ent._.kb_ents is a list of (CUI, score) pairs, highest score first
    # (older scispaCy releases exposed this as ent._.umls_ents)
    if ent._.kb_ents:
        for cui, score in ent._.kb_ents:
            concept = linker.kb.cui_to_entity[cui]
            results.append({
                "text_span": ent.text,
                "cui": cui,
                "preferred_name": concept.canonical_name,
                "semantic_types": concept.types,
                "confidence_score": score
            })

# Print results
if results:
    for r in results:
        print(r)
else:
    print("No UMLS concepts found.")
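The loop above keeps every candidate the linker returns, so the output below lists several CUIs per mention. A small post-processing step can reduce this to one answer per span while flagging ties that string similarity cannot break. This is a minimal sketch: `best_per_span` and the `min_score` cut-off of 0.85 are hypothetical choices, and the sample rows are copied from the output in this post with scores rounded to four decimals.

```python
from collections import defaultdict

# Sample rows in the same format as `results` above (copied from this post's output, scores rounded).
results = [
    {"text_span": "eyes", "cui": "C0015392", "preferred_name": "Eye", "semantic_types": ["T023"], "confidence_score": 0.9724},
    {"text_span": "eyes", "cui": "C0235267", "preferred_name": "Redness of eye", "semantic_types": ["T184"], "confidence_score": 0.8307},
    {"text_span": "closed", "cui": "C0587267", "preferred_name": "Closed", "semantic_types": ["T169"], "confidence_score": 0.9921},
    {"text_span": "closed", "cui": "C1548219", "preferred_name": "Bed Status - Closed", "semantic_types": ["T078"], "confidence_score": 0.9921},
    {"text_span": "resistant", "cui": "C0332325", "preferred_name": "Resistant (qualifier value)", "semantic_types": ["T169"], "confidence_score": 0.9937},
    {"text_span": "resistant", "cui": "C1550464", "preferred_name": "resistant - Observation Interpretation Susceptibility", "semantic_types": ["T078"], "confidence_score": 0.9937},
    {"text_span": "resistant", "cui": "C2827757", "preferred_name": "Antimicrobial Resistance Result", "semantic_types": ["T034"], "confidence_score": 0.9937},
    {"text_span": "resistant", "cui": "C2986418", "preferred_name": "Resistant Starch", "semantic_types": ["T109", "T121"], "confidence_score": 0.7441},
    {"text_span": "passive opening", "cui": "C0175566", "preferred_name": "Open", "semantic_types": ["T082"], "confidence_score": 0.7033},
]


def best_per_span(rows, min_score=0.85):
    """Group candidates by text span and keep the top-scoring CUI(s) at or above min_score.

    Tied candidates are kept together and flagged, since string similarity alone cannot rank them.
    """
    by_span = defaultdict(list)
    for row in rows:
        if row["confidence_score"] >= min_score:
            by_span[row["text_span"]].append(row)
    summary = {}
    for span, candidates in by_span.items():
        top = max(c["confidence_score"] for c in candidates)
        winners = [c["cui"] for c in candidates if c["confidence_score"] == top]
        summary[span] = {"cuis": winners, "score": top, "ambiguous": len(winners) > 1}
    return summary


for span, info in best_per_span(results).items():
    print(span, info)
```

Grouping by `text_span` merges repeated mentions of the same words. In a real pipeline it is safer to key on the entity's character offsets (`ent.start_char`, `ent.end_char`).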

Output

{'text_span': 'family history of seizure', 'cui': 'C0241889', 'preferred_name': 'Family history (finding)', 'semantic_types': ['T033'], 'confidence_score': 0.8135724663734436}

{'text_span': 'family history of seizure', 'cui': 'C5238701', 'preferred_name': 'Family History of Myocardial Infarction', 'semantic_types': ['T033'], 'confidence_score': 0.7316329479217529}

{'text_span': 'family history of seizure', 'cui': 'C2317524', 'preferred_name': 'Family history of coronary arteriosclerosis', 'semantic_types': ['T033'], 'confidence_score': 0.7181885242462158}

{'text_span': 'family history of seizure', 'cui': 'C0260515', 'preferred_name': 'Family history of cancer', 'semantic_types': ['T033'], 'confidence_score': 0.7104339003562927}

{'text_span': 'eyes', 'cui': 'C0015392', 'preferred_name': 'Eye', 'semantic_types': ['T023'], 'confidence_score': 0.9724115133285522}

{'text_span': 'eyes', 'cui': 'C0235267', 'preferred_name': 'Redness of eye', 'semantic_types': ['T184'], 'confidence_score': 0.8306822180747986}

{'text_span': 'eyes', 'cui': 'C0266574', 'preferred_name': 'Ablepharon', 'semantic_types': ['T019'], 'confidence_score': 0.7846059799194336}

{'text_span': 'eyes', 'cui': 'C0885957', 'preferred_name': 'Eye care (regime/therapy)', 'semantic_types': ['T058'], 'confidence_score': 0.783681333065033}

{'text_span': 'eyes', 'cui': 'C1268161', 'preferred_name': 'Eye part', 'semantic_types': ['T023'], 'confidence_score': 0.7718889713287354}

{'text_span': 'closed', 'cui': 'C0587267', 'preferred_name': 'Closed', 'semantic_types': ['T169'], 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C1548219', 'preferred_name': 'Bed Status - Closed', 'semantic_types': ['T078'], 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C5669719', 'preferred_name': 'Closed Captioning', 'semantic_types': ['T170'], 'confidence_score': 0.709942638874054}

{'text_span': 'resistant', 'cui': 'C0332325', 'preferred_name': 'Resistant (qualifier value)', 'semantic_types': ['T169'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C1550464', 'preferred_name': 'resistant - Observation Interpretation Susceptibility', 'semantic_types': ['T078'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2827757', 'preferred_name': 'Antimicrobial Resistance Result', 'semantic_types': ['T034'], 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2986418', 'preferred_name': 'Resistant Starch', 'semantic_types': ['T109', 'T121'], 'confidence_score': 0.7441271543502808}

{'text_span': 'resistant', 'cui': 'C1514892', 'preferred_name': 'Resistance Process', 'semantic_types': ['T039'], 'confidence_score': 0.7311867475509644}

{'text_span': 'passive opening', 'cui': 'C0175566', 'preferred_name': 'Open', 'semantic_types': ['T082'], 'confidence_score': 0.7032682299613953}

{'text_span': 'passive opening', 'cui': 'C1882151', 'preferred_name': 'Opening', 'semantic_types': ['T082'], 'confidence_score': 0.7032682299613953}
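Several lower-ranked candidates above are clearly off-topic for a clinical note, for example "Closed Captioning", "Resistant Starch" and "Bed Status - Closed". One common mitigation is to keep only candidates whose UMLS semantic types (the TUIs in `semantic_types`) fit the task. The sketch below assumes a hypothetical whitelist for clinical findings; the choice of TUIs is illustrative, and the rows are copied from the output above.

```python
# HYPOTHETICAL whitelist of UMLS semantic types (TUI -> name) for a clinical-findings task.
ALLOWED_TUIS = {
    "T023": "Body Part, Organ, or Organ Component",
    "T033": "Finding",
    "T047": "Disease or Syndrome",
    "T184": "Sign or Symptom",
}

# (span, CUI, preferred name, semantic types) copied from the output above.
CANDIDATES = [
    ("eyes", "C0015392", "Eye", ["T023"]),
    ("eyes", "C0235267", "Redness of eye", ["T184"]),
    ("eyes", "C0885957", "Eye care (regime/therapy)", ["T058"]),
    ("closed", "C1548219", "Bed Status - Closed", ["T078"]),
    ("closed", "C5669719", "Closed Captioning", ["T170"]),
    ("resistant", "C2986418", "Resistant Starch", ["T109", "T121"]),
    ("resistant", "C0332325", "Resistant (qualifier value)", ["T169"]),
]


def filter_by_semantic_type(candidates, allowed=ALLOWED_TUIS):
    """Keep candidates with at least one semantic type in the whitelist."""
    return [c for c in candidates if any(tui in allowed for tui in c[3])]


for span, cui, name, tuis in filter_by_semantic_type(CANDIDATES):
    print(span, cui, name, [ALLOWED_TUIS[t] for t in tuis if t in ALLOWED_TUIS])
```

A whitelist trades recall for precision. Here it also drops "Resistant (qualifier value)" (T169, Functional Concept), which is the intended reading of "resistant". It also cannot reassemble the finding "eyes often closed and resistant to passive opening", which the NER step split into separate spans.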
