Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (1): 55–62.


 Study on printed Tibetan character recognition technology*

 Li Yong-Zhong**, Wang Yu-Lei, Liu Zhen-Zhen

  • Online: 2015-05-15 Published: 2015-05-15
  • About author: (School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China)
  • Supported by:
     National Natural Science Foundation of China (69973038), Natural Science Foundation of Jiangsu Higher Education Institutions (05KJD52006), Jiangsu University of Science and Technology research funding project (2005DX006J)

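Summary:  Building on an analysis of the two existing feature extraction methods for Tibetan characters, the image projection method and the directional line element method, this paper applies fractal moment theory and the rough grid method to implement a fractal-moment-based feature extraction method and an improved rough grid feature extraction method for Tibetan characters. Features extracted with the fractal moment method effectively capture both the local and the global properties of a Tibetan character, and they reduce the loss of recognition rate caused by changes in pixel position within the image. Features extracted with the improved rough grid method likewise reduce the loss of recognition rate caused by changes in pixel position and, to some extent, overcome the high misrecognition rate caused by the large number of Tibetan characters. Experimental comparison shows that, at the same recognition rate, the fractal moment and improved rough grid methods run faster than directional line element feature extraction.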
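
As an illustration of the rough grid idea named above, the following is a minimal sketch of a plain (unimproved) rough grid feature: the binary character image is divided into cells and each cell contributes its foreground-pixel density. The function name `rough_grid_features`, the list-of-lists image format and the 2 × 2 grid are assumptions made for this example; the authors' improved variant is not specified on this page and is not reproduced here.

```python
def rough_grid_features(image, rows, cols):
    """Return the foreground density of each grid cell, in row-major order.

    `image` is a binary image given as a list of equal-length rows of 0/1.
    This is the plain rough (coarse) grid baseline, not the paper's
    improved method.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for r in range(rows):
        # Integer cell boundaries, so the cells tile the whole image even
        # when its size is not divisible by the grid size.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4 x 4 glyph used only for illustration.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(rough_grid_features(glyph, 2, 2))  # [1.0, 0.0, 0.0, 0.75]
```

Because each feature averages over a whole cell, moving a stroke by one pixel inside a cell leaves the vector unchanged, which is one plausible reading of why grid features tolerate small changes in pixel position.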

Abstract:  The Tibetan character set is composed of 30 Tibetan letters, 4 vowel signs, 4 subscripts, 3 superscripts, 10 digits and some punctuation marks. Written Tibetan uses the syllable as its word-building unit. Syllables are separated by syllable nodes, and each horizontal unit in the spelling of a syllable is called a Tibetan character; a syllable contains at least one such character and at most four. Tibetan character recognition takes the Tibetan character as the basic recognition unit, so the difficulty of printed Tibetan character recognition lies in selecting and extracting appropriate features to represent a character. At present, there are two main feature extraction methods for printed Tibetan characters: the image projection approach and the directional line element approach. The image projection approach obtains the feature vector of a character by projecting the pixels of the character image along a given direction (such as the vertical, horizontal or diagonal direction). Its matching and classification algorithms are simple and easy to implement, and it is robust to interference, but it distinguishes similar characters poorly. The directional line element approach quantizes each edge pixel of the character into one of four directions, horizontal (0°), vertical (90°), oblique (45°) and anti-oblique (135°), and takes the quantization result as the direction attribute of that point. This approach captures the structural information of characters as well as their statistical properties, and its recognition performance is better than that of the image projection approach. However, the dimension of its feature vector is too high, so the method needs a compression algorithm, which complicates recognition and makes the matching process considerably more complex. This paper presents two feature extraction approaches for printed Tibetan characters, one based on fractal moments and one based on an improved rough grid. After analyzing the existing approaches (the image projection approach and the directional line element approach), the paper applies fractal moment theory and the rough grid method in the proposed approaches. The extracted features not only describe the local and global properties of a character but also reduce the drop in recognition rate caused by changes in pixel position in the image. The proposed methods both reduce the high probability of misrecognition caused by the large number of Tibetan characters and avoid the low running speed caused by high-dimensional feature vectors.
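
The image projection approach described in the abstract can be illustrated with a small sketch. It assumes a binary image stored as a list of rows of 0/1 and uses only the horizontal and vertical projections; the function name `projection_features` and the toy glyphs are hypothetical and are not taken from the paper.

```python
def projection_features(image):
    """Concatenate the row sums and column sums of a binary image.

    `image` is a list of equal-length rows of 0/1. The row sums are the
    projection onto the vertical axis, the column sums the projection onto
    the horizontal axis.
    """
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile + col_profile


# A small cross-shaped glyph (illustrative only).
cross = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(projection_features(cross))  # [1, 3, 1, 1, 3, 1]

# Two different shapes with identical projections: a falling and a rising
# diagonal stroke.
falling = [[1, 0], [0, 1]]
rising = [[0, 1], [1, 0]]
print(projection_features(falling) == projection_features(rising))  # True
```

The second example shows the weakness the abstract mentions: distinct shapes can share the same projection profiles, so similar characters are hard to tell apart from projections alone.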

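
The four-direction quantization used by the directional line element approach can be sketched in a similar way. The version below is a simplified assumption, not the published algorithm: it marks a foreground pixel as an edge pixel when one of its 4-neighbours is background, then counts, for each of the four directions, the edge pixels that have a foreground neighbour along that direction. The names `DIRECTIONS`, `pixel_at`, `is_edge` and `direction_histogram` are hypothetical.

```python
# Neighbour offsets (dy, dx) for each stroke direction. Image rows grow
# downward, so "up-right" is (-1, +1).
DIRECTIONS = {
    0: [(0, -1), (0, 1)],      # horizontal
    90: [(-1, 0), (1, 0)],     # vertical
    45: [(-1, 1), (1, -1)],    # oblique
    135: [(-1, -1), (1, 1)],   # anti-oblique
}


def pixel_at(image, y, x):
    """Return the pixel value, treating everything outside the image as 0."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return 0


def is_edge(image, y, x):
    """A foreground pixel is an edge pixel if any 4-neighbour is background."""
    if image[y][x] != 1:
        return False
    return any(pixel_at(image, y + dy, x + dx) == 0
               for dy, dx in ((0, -1), (0, 1), (-1, 0), (1, 0)))


def direction_histogram(image):
    """Count edge pixels that continue along each of the four directions."""
    counts = {angle: 0 for angle in DIRECTIONS}
    for y in range(len(image)):
        for x in range(len(image[0])):
            if not is_edge(image, y, x):
                continue
            for angle, offsets in DIRECTIONS.items():
                if any(pixel_at(image, y + dy, x + dx) == 1
                       for dy, dx in offsets):
                    counts[angle] += 1
    return counts


bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
diagonal = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(direction_histogram(bar))       # {0: 3, 90: 0, 45: 0, 135: 0}
print(direction_histogram(diagonal))  # {0: 0, 90: 0, 45: 3, 135: 0}
```

If such a histogram were computed separately for every zone of the image, the feature vector would have four entries per zone, which is consistent with the abstract's remark that the dimension of this feature grows large.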





 


