Computer Integrated Manufacturing Systems, 2019, Vol. 25, Issue 4: 1010-1016. DOI: 10.13196/j.cims.2019.04.025


Research on vectorized representation of clinical activities

周梦颖,金涛+,王瀛,王建民   

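  1. School of Software, Tsinghua University
  • Publication date: 2019-04-30 Release date: 2019-04-30
  • Funding:
    National Key Technology R&D Program of China (2015BAH14F02); National Natural Science Foundation of China (61325008).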

Clinical activity representation learning

  • Online: 2019-04-30 Published: 2019-04-30
  • Supported by:
    Project supported by the National Key Technology R&D Program, China (No. 2015BAH14F02), and the National Natural Science Foundation, China (No. 61325008).

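Abstract: Clinical activities are the basic elements of the clinical process, and clinical activity vectors can be applied to tasks such as clinical activity clustering and patient clustering. Starting from the need to capture and exploit the "locally unordered, globally ordered" character of clinical activity data, and incorporating prior medical knowledge, a vectorized representation method for clinical activities, CA2Vec, is proposed. A data processing framework for clinical activity representation learning is proposed; information on the clinical activities of the next clinical day and on the diagnosis is added, so that the proposed model obtains richer context information than classical models; and a negative sampling method for clinical activities, constrained by the diagnosis, is proposed. Evaluation experiments were designed on the basis of the SNOMED-CT ontology and ICD-10 coded medical knowledge, and comparative experiments were conducted on five tasks: clustering based on specific clinical activities, ontology-based similarity measurement, correlation measurement based on clinical activity type, classification accuracy measurement for specific diseases, and clustering accuracy measurement based on patient vectors. The experiments show that, compared with other existing advanced word vector learning models, CA2Vec effectively captures the relationships among clinical activities and achieves higher accuracy overall.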

Keywords: vectorized representation, clinical activity clustering, semantic similarity, neural network

Abstract: Clinical activity is the basic element of the clinical process, and its vector representation can be applied to both clinical activity clustering and patient clustering. Exploiting the characteristic of clinical activity data of "being disordered locally but relatively ordered globally", together with medical prior knowledge, a representation learning method called CA2Vec was proposed for clinical activities. In this method, an overall data processing framework for the representation learning task on clinical activity data was proposed; the diagnosis and the next clinical day's information were added to enrich the training pairs for the neural network; and a diagnosis-constraint-based negative sampling method was created. With the SNOMED CT ontology and the ICD-10 coding system, evaluation experiments were designed for the clinical activity representation learning results, including clinical activity clustering, similarity measurement compared with the ontology, correlation measurement of clinical activity categories, classification accuracy measurement among specific diseases, and patient clustering accuracy measurement. Experimental results showed that CA2Vec effectively captured the correlation between clinical activities and, compared with existing word representation learning models, achieved higher overall accuracy.

Key words: representation learning, clinical activity clustering, semantic similarity, neural network
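
The abstract names two ingredients without giving their details: training pairs enriched with next-clinical-day and diagnosis information, and negative sampling constrained by the diagnosis. The Python sketch below is one possible reading of those two ideas, not the authors' implementation. The function names, the `DX:` prefix for diagnosis tokens, the activity codes, and the rule "draw negatives only from activities never observed under the same diagnosis" are all assumptions made for illustration.

```python
import random


def build_training_pairs(days, diagnosis):
    """Build (target, context) pairs for one patient.

    days: chronologically ordered list of sets of activity codes,
          one set per clinical day (hypothetical input layout).
    diagnosis: diagnosis code for the patient, e.g. an ICD-10 code.
    """
    pairs = []
    for i, today in enumerate(days):
        next_day = days[i + 1] if i + 1 < len(days) else set()
        for target in sorted(today):
            # Same-day activities form an unordered bag ("locally unordered").
            for ctx in sorted(today - {target}):
                pairs.append((target, ctx))
            # Next-day activities carry the day-to-day order ("globally ordered").
            for ctx in sorted(next_day):
                pairs.append((target, ctx))
            # The diagnosis is added as an extra context token (assumed encoding).
            pairs.append((target, "DX:" + diagnosis))
    return pairs


def sample_negatives(diagnosis, activities_by_diagnosis, vocabulary, k, rng):
    """Draw k negatives from activities never observed under this diagnosis.

    This constraint is an assumed interpretation of "diagnosis-constrained".
    """
    seen = activities_by_diagnosis.get(diagnosis, set())
    candidates = sorted(set(vocabulary) - seen)
    if not candidates:
        return []
    return [rng.choice(candidates) for _ in range(k)]


# Toy data with made-up activity codes.
example_days = [{"blood_test", "ct_scan"}, {"surgery"}]
example_pairs = build_training_pairs(example_days, "I21")
example_negatives = sample_negatives(
    "I21",
    {"I21": {"blood_test", "ct_scan", "surgery"}},
    ["blood_test", "ct_scan", "surgery", "dialysis"],
    k=3,
    rng=random.Random(0),
)
print(len(example_pairs), example_negatives)
```

Pairs and negatives produced this way could feed a standard skip-gram trainer with negative sampling; the choice of trainer and its hyperparameters is left open here because the abstract does not specify them.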
