Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (2): 147–153.


 Chinese name recognition based on HowNet and Bayesian classifier*

 Jiang Cai-Zhi**, Wang Hao, Yao Hong-Liang

  • Online: 2015-05-23 Published: 2015-05-23
  • About author: (School of Computer and Information, Hefei University of Technology, Hefei, 230009, China)
  • Supported by:
     National Natural Science Foundation of China (61070131, 61175051), National Basic Research Program of China (973 Program) (2009CB326203)

Abstract:  Chinese personal names are the most frequent type of unknown word in Chinese text. The accuracy of Chinese name recognition affects applications such as syntactic analysis, machine translation, information retrieval and extraction, and automatic question answering, which makes it both a key problem and a difficult one.
The difficulty is that names come in many varieties, carry no morphological markers, and sometimes contain uncommon characters. Despite these obstacles, the characters in a name are relatively independent of one another, except for a small number of characters that can also form words. This property fits the naive Bayes assumption well, and Bayesian classifiers do give good recognition results. In complex contexts, however, the results are not satisfactory for applications, because name boundaries are hard to determine and boundary errors occur easily. To solve this problem, this paper builds on the naive Bayes classifier and incorporates HowNet semantics to construct a Chinese name recognition model that combines statistics with semantics. The basic idea is to first locate and roughly recognize Chinese names with the Bayesian classifier, and then to refine the result using HowNet semantics. The model retains the simple formulation and learning ability of the Bayesian algorithm, avoids heavy use of name rules, and overcomes the difficulty that statistical methods have in delimiting name boundaries. Experimental results show that precision and recall were 95.67% and 97.78%, respectively.
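
The two-stage idea in the abstract (rough recognition with a naive Bayes classifier, then boundary correction with a semantic resource) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy training lists, the position-tagged character features, and the `RIGHT_CONTEXT` set are all invented here, because the abstract does not give the paper's features or correction rules. In particular, `RIGHT_CONTEXT` is only a hypothetical stand-in for a HowNet lookup.

```python
import math
from collections import Counter

# Toy training data, invented for illustration (not from the paper).
NAMES = ["王浩", "王伟", "李明", "张伟", "李伟明", "张明浩"]
OTHERS = ["我们", "今天", "学习", "明天", "他们说", "学校"]

# Hypothetical stand-in for a HowNet lookup: words treated as context that
# follows a person mention (speech or motion verbs, etc.), not part of a name.
RIGHT_CONTEXT = {"说", "去", "在", "和"}


def features(s):
    # Position-tagged characters: slot 0 is the surname slot, the rest are
    # given-name slots. Naive Bayes treats these features as independent.
    return [("first" if i == 0 else "rest", ch) for i, ch in enumerate(s)]


class NaiveBayes:
    def __init__(self, samples_by_class):
        self.counts = {c: Counter() for c in samples_by_class}
        self.totals, self.priors = {}, {}
        n = sum(len(v) for v in samples_by_class.values())
        vocab = set()
        for c, samples in samples_by_class.items():
            for s in samples:
                self.counts[c].update(features(s))
            self.totals[c] = sum(self.counts[c].values())
            self.priors[c] = len(samples) / n
            vocab.update(self.counts[c])
        self.vocab_size = len(vocab)

    def log_score(self, s, c):
        # log P(c) + sum of log P(feature | c), with add-one smoothing.
        score = math.log(self.priors[c])
        denom = self.totals[c] + self.vocab_size + 1
        for f in features(s):
            score += math.log((self.counts[c][f] + 1) / denom)
        return score

    def is_name(self, s):
        return self.log_score(s, "name") > self.log_score(s, "other")


def refine(candidate):
    # Stage 2: trim trailing characters that the semantic stand-in marks as
    # context words, keeping at least two characters.
    while len(candidate) > 2 and candidate[-1] in RIGHT_CONTEXT:
        candidate = candidate[:-1]
    return candidate


def find_names(text, model, max_len=3):
    found, i = [], 0
    while i < len(text):
        # Stage 1: longest span starting at i that the classifier accepts.
        span = next((text[i:i + n] for n in range(max_len, 1, -1)
                     if i + n <= len(text) and model.is_name(text[i:i + n])),
                    None)
        if span is None:
            i += 1
            continue
        name = refine(span)
        found.append(name)
        i += len(name)
    return found


model = NaiveBayes({"name": NAMES, "other": OTHERS})
print(find_names("王浩说今天学习", model))  # ['王浩']
```

In this toy run the classifier alone accepts the over-long span "王浩说", which is the kind of boundary error the abstract describes. The second stage trims it to "王浩". A real system would replace `RIGHT_CONTEXT` with queries against HowNet and train on a large annotated corpus.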











