Journal of Nanjing University (Natural Science) ›› 2019, Vol. 55 ›› Issue (6): 942-951. doi: 10.13232/j.cnki.jnju.2019.06.007
Pu Han, Yizhuo Liu, Xiaoyan Li
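Abstract:
Entity recognition in electronic medical records (EMRs) is a key foundational task for artificial intelligence and medical information services in healthcare. To mine the entity semantic knowledge in EMRs more fully and improve Chinese medical entity recognition, this paper proposes a Chinese EMR entity recognition model that incorporates external semantic features. The model first uses the word2vec language model to generate character-level vectors with semantic features from large-scale unlabeled text. It then builds a medical entity and feature lexicon by integrating medical semantic resources and analyzing entity boundary features, and concatenates these features with the character-level vectors to better exploit sequence information. Finally, an improved Voting algorithm combines the deep learning results with those of conditional random fields (CRF) to correct label bias. Experiments show that the improved model with external semantic features reaches an F-score of 94.06%, which is 1.55% higher than CRF. The parameter settings that give the model its best performance are also reported.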
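The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature layout or the improved Voting algorithm, so the code below assumes a three-value lexicon feature per character (inside, begins, ends a lexicon entity) and substitutes a plain confidence-based vote between the two taggers. The names `LEXICON`, `lexicon_features`, `concat_features`, and `vote` are hypothetical, and the character vectors stand in for word2vec output.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon mapping a known entity string to its type.
# The paper builds its lexicon from medical semantic resources; this is a stand-in.
LEXICON: Dict[str, str] = {"头痛": "SYMPTOM", "阿司匹林": "DRUG"}


def lexicon_features(text: str, lexicon: Dict[str, str]) -> List[List[float]]:
    """Per character: [inside a lexicon entity, begins one, ends one] (assumed layout)."""
    feats = [[0.0, 0.0, 0.0] for _ in text]
    for entity in lexicon:
        start = text.find(entity)
        while start != -1:
            end = start + len(entity) - 1
            for i in range(start, end + 1):
                feats[i][0] = 1.0
            feats[start][1] = 1.0  # left boundary
            feats[end][2] = 1.0    # right boundary
            start = text.find(entity, start + 1)
    return feats


def concat_features(
    char_vecs: Sequence[Sequence[float]], feats: Sequence[List[float]]
) -> List[List[float]]:
    """Concatenate each character vector with its lexicon features."""
    return [list(vec) + list(feat) for vec, feat in zip(char_vecs, feats)]


def vote(
    neural: Sequence[Tuple[str, float]], crf: Sequence[Tuple[str, float]]
) -> List[str]:
    """Merge two taggers' (label, confidence) outputs character by character.

    Assumed rule: keep the label when both agree; otherwise take the more
    confident tagger, with ties going to the CRF.
    """
    merged = []
    for (n_label, n_conf), (c_label, c_conf) in zip(neural, crf):
        if n_label == c_label or c_conf >= n_conf:
            merged.append(c_label)
        else:
            merged.append(n_label)
    return merged


if __name__ == "__main__":
    text = "头痛服用阿司匹林"
    # Placeholder 2-dimensional vectors in place of trained word2vec embeddings.
    char_vecs = [[0.1, 0.2] for _ in text]
    inputs = concat_features(char_vecs, lexicon_features(text, LEXICON))
    print(len(inputs), len(inputs[0]))
```

In a setup like this, the concatenated vectors would feed the deep learning tagger, and `vote` would run on its output alongside the CRF output for the same sentence.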