Medical Ontology Recognition: Mapping to UMLS

| Dimension | scispaCy + UMLS | MedCAT |
| --- | --- | --- |
| Concept source | UMLS | UMLS |
| NER | Statistical (fixed) | Integrated |
| Disambiguation | String similarity | Neural context model |
| Abbreviations | Weak | Strong |
| Domain adaptation | | ✅ (trainable) |
| Transparency | High | Medium |
| Speed | Fast | Moderate |
| Reproducibility | High | Medium |
| Explainability alignment | Excellent | Moderate |
| Setup complexity | Low | High |

Code

Clinical sentence
→ scispaCy NER (en_core_sci_md)
→ String-based candidate generation
→ UMLS Metathesaurus lookup
→ Similarity-based linking (CUI + score)
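
The candidate generation and similarity-based linking steps can be illustrated with a toy sketch. It assumes a tiny hand-written alias table (three CUIs that also appear in the output further down, paired with illustrative alias strings) and scores mentions with character-trigram cosine similarity in pure Python. scispaCy's actual linker works over the full UMLS alias set with TF-IDF character n-grams and approximate nearest-neighbour search, so this only shows the idea. `TOY_ALIASES`, `trigrams`, `cosine`, and `link` are hypothetical names, not scispaCy APIs.

```python
from collections import Counter
from math import sqrt

# Toy alias table (assumption): CUI -> alias strings. This is NOT the real UMLS.
TOY_ALIASES = {
    "C0015392": ["eye", "eyes"],
    "C0587267": ["closed"],
    "C0332325": ["resistant"],
}

def trigrams(text):
    """Count character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def link(mention, threshold=0.7, max_candidates=5):
    """Return (cui, score) pairs, best first; each CUI is scored by its best alias."""
    query = trigrams(mention)
    scored = []
    for cui, aliases in TOY_ALIASES.items():
        best = max(cosine(query, trigrams(alias)) for alias in aliases)
        if best >= threshold:
            scored.append((cui, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_candidates]

print(link("eyes"))                      # exact alias match
print(link("resistent", threshold=0.5))  # a misspelling still shares most trigrams
```

Because the score depends only on surface strings, an exact alias match scores near 1.0 regardless of context, which is why purely string-based linking tends to be transparent but weak at disambiguation.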

# UMLS concept mapping for a clinical sentence using Python
# This example uses scispaCy + UMLS Entity Linker
import spacy
from scispacy.linking import EntityLinker

# Load a biomedical NLP model
nlp = spacy.load("en_core_sci_md")

# Add the UMLS entity linker to the pipeline.
# "linker_name": "umls" selects the UMLS knowledge base; "resolve_abbreviations"
# only has an effect if an AbbreviationDetector is also in the pipeline.
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})

# Get reference to the linker component
linker = nlp.get_pipe("scispacy_linker")

# Input sentence
sentence = "he has a family history of seizure and eyes often closed and resistant to passive opening"

# Process the sentence
doc = nlp(sentence)

# Extract and display UMLS concepts
results = []
for ent in doc.ents:
    # kb_ents holds (CUI, similarity score) pairs for each linked candidate
    if ent._.kb_ents:
        for cui, score in ent._.kb_ents:
            concept = linker.kb.cui_to_entity[cui]
            results.append({
                "text_span": ent.text,
                "cui": cui,
                "preferred_name": concept.canonical_name,
                "semantic_types": concept.types,
                "confidence_score": score
            })

# Print results
if results:
    for r in results:
        print(r)
else:
    print("No UMLS concepts found.")

Output

{'text_span': 'family history of seizure', 'cui': 'C0241889', 'preferred_name': 'Family history (finding)', 'semantic_types': 'T033', 'confidence_score': 0.8135724663734436}

{'text_span': 'family history of seizure', 'cui': 'C5238701', 'preferred_name': 'Family History of Myocardial Infarction', 'semantic_types': 'T033', 'confidence_score': 0.7316329479217529}

{'text_span': 'family history of seizure', 'cui': 'C2317524', 'preferred_name': 'Family history of coronary arteriosclerosis', 'semantic_types': 'T033', 'confidence_score': 0.7181885242462158}

{'text_span': 'family history of seizure', 'cui': 'C0260515', 'preferred_name': 'Family history of cancer', 'semantic_types': 'T033', 'confidence_score': 0.7104339003562927}

{'text_span': 'eyes', 'cui': 'C0015392', 'preferred_name': 'Eye', 'semantic_types': 'T023', 'confidence_score': 0.9724115133285522}

{'text_span': 'eyes', 'cui': 'C0235267', 'preferred_name': 'Redness of eye', 'semantic_types': 'T184', 'confidence_score': 0.8306822180747986}

{'text_span': 'eyes', 'cui': 'C0266574', 'preferred_name': 'Ablepharon', 'semantic_types': 'T019', 'confidence_score': 0.7846059799194336}

{'text_span': 'eyes', 'cui': 'C0885957', 'preferred_name': 'Eye care (regime/therapy)', 'semantic_types': 'T058', 'confidence_score': 0.783681333065033}

{'text_span': 'eyes', 'cui': 'C1268161', 'preferred_name': 'Eye part', 'semantic_types': 'T023', 'confidence_score': 0.7718889713287354}

{'text_span': 'closed', 'cui': 'C0587267', 'preferred_name': 'Closed', 'semantic_types': 'T169', 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C1548219', 'preferred_name': 'Bed Status - Closed', 'semantic_types': 'T078', 'confidence_score': 0.9921370148658752}

{'text_span': 'closed', 'cui': 'C5669719', 'preferred_name': 'Closed Captioning', 'semantic_types': 'T170', 'confidence_score': 0.709942638874054}

{'text_span': 'resistant', 'cui': 'C0332325', 'preferred_name': 'Resistant (qualifier value)', 'semantic_types': 'T169', 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C1550464', 'preferred_name': 'resistant - Observation Interpretation Susceptibility', 'semantic_types': 'T078', 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2827757', 'preferred_name': 'Antimicrobial Resistance Result', 'semantic_types': 'T034', 'confidence_score': 0.9936941862106323}

{'text_span': 'resistant', 'cui': 'C2986418', 'preferred_name': 'Resistant Starch', 'semantic_types': 'T109, T121', 'confidence_score': 0.7441271543502808}

{'text_span': 'resistant', 'cui': 'C1514892', 'preferred_name': 'Resistance Process', 'semantic_types': 'T039', 'confidence_score': 0.7311867475509644}

{'text_span': 'passive opening', 'cui': 'C0175566', 'preferred_name': 'Open', 'semantic_types': 'T082', 'confidence_score': 0.7032682299613953}

{'text_span': 'passive opening', 'cui': 'C1882151', 'preferred_name': 'Opening', 'semantic_types': 'T082', 'confidence_score': 0.7032682299613953}
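
Since the linker returns several candidates per span, a natural follow-up is to keep only the best-scoring candidate for each span and drop weak matches. The sketch below is one possible way to do that in plain Python over result dictionaries shaped like the ones above. `best_per_span` and the 0.9 cut-off are illustrative assumptions, not part of scispaCy. (The scispaCy linker config also accepts `threshold` and `max_entities_per_mention`, which limit candidates earlier in the pipeline.)

```python
def best_per_span(results, min_score=0.9):
    """Keep the highest-scoring candidate per text span, if it clears min_score."""
    best = {}
    for r in results:
        span = r["text_span"]
        # Strict '>' keeps the first candidate seen when scores tie.
        if span not in best or r["confidence_score"] > best[span]["confidence_score"]:
            best[span] = r
    return [r for r in best.values() if r["confidence_score"] >= min_score]

# A few rows copied from the output above (scores rounded for readability).
sample = [
    {"text_span": "family history of seizure", "cui": "C0241889", "confidence_score": 0.8136},
    {"text_span": "family history of seizure", "cui": "C5238701", "confidence_score": 0.7316},
    {"text_span": "eyes", "cui": "C0015392", "confidence_score": 0.9724},
    {"text_span": "eyes", "cui": "C0235267", "confidence_score": 0.8307},
    {"text_span": "closed", "cui": "C0587267", "confidence_score": 0.9921},
    {"text_span": "closed", "cui": "C1548219", "confidence_score": 0.9921},
]

for row in best_per_span(sample):
    print(row["text_span"], row["cui"], row["confidence_score"])
```

With this cut-off, "family history of seizure" is dropped entirely because its best candidate (Family history, about 0.81) is only a partial match and none of the candidates mentions seizure. Tied candidates, such as the two for "closed", are resolved by input order, so string similarity alone cannot tell them apart.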
