Journal of Nanjing University (Natural Science Edition), 2010, Vol. 46, Issue 5: 501–506.


A New Dynamic Sample Recognition Algorithm Based on Rough Sets*

Yi Xinghui**, Wang Guoyin, Hu Feng

  • Online: 2015-04-02  Published: 2015-04-02
  • About author: (Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China)
  • Funding:
     National Natural Science Foundation of China (60573068, 60773113), Natural Science Foundation of Chongqing (2008BA201, 2008BA2041), Science and Technology Research Project of Chongqing Municipal Education Commission (KJ090512)



Abstract: Sample recognition is the ultimate application of knowledge acquisition and an important element of data mining research. Many mining algorithms already exist, so choosing an optimal algorithm with strong generalization ability and a high recognition rate has become a main research focus. This paper exploits the ability of rough sets to handle incomplete and imprecise data and combines it with support vector machine (SVM) and decision tree methods: by analyzing the characteristics of the data, it uses a sample's coverage of the rule union, together with a distance threshold, to select the optimal classification method dynamically, so that a comparatively good classifier is chosen for each sample at the outset. The algorithm has four steps. First, rough set methods are used to derive the rule union. Second, by analyzing the relation between a sample and the rule union, the sample's coverage of the rule union is used to judge whether rough sets are suitable for recognizing the sample. The coverage reflects the number of rules that match the sample: when the coverage is greater (or less) than 1/n, where n is the number of rules obtained, more than one rule matches the sample (or no rule matches it), so the sample may be misclassified (or rejected) and needs further analysis. Third, for the samples left over from the second step, the distance between each sample and the support vectors is computed; when this distance is greater than a certain threshold, the SVM can classify the sample well, so the SVM method is used. Fourth, if the distance in the third step is smaller than the threshold, the decision tree algorithm is used instead. To verify the effectiveness of the algorithm, eight data sets from the UCI repository were chosen for the experiments. For each data set, 50 percent of the data was selected randomly as the training set and the remaining 50 percent as the test set. The results show that the proposed algorithm achieves a recognition rate comparable to that of the best current algorithm, which verifies its effectiveness.
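The four-step cascade described in the abstract (rough-set rule coverage, then SVM margin, then decision tree) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionary rule representation, the `svm_margin` callable, and the threshold value are hypothetical stand-ins for the components the paper derives from the data.

```python
def matches(rule, sample):
    """A rule is a dict of attribute -> required value; it matches when
    every listed attribute agrees with the sample."""
    return all(sample.get(attr) == val for attr, val in rule.items())

def coverage(sample, rules):
    """Fraction of rules in the rough-set rule union that match the sample."""
    return sum(matches(r, sample) for r in rules) / len(rules)

def select_classifier(sample, rules, svm_margin, margin_threshold):
    """Pick a classifier following the cascade in the abstract:
    1. coverage equal to 1/n (exactly one rule fires) -> rough set;
    2. otherwise, if the sample lies far from the SVM boundary -> SVM;
    3. else -> decision tree."""
    n = len(rules)
    if abs(coverage(sample, rules) - 1.0 / n) < 1e-12:
        return "rough_set"
    if svm_margin(sample) > margin_threshold:
        return "svm"
    return "decision_tree"
```

A sample matched by exactly one rule is handled by the rough-set rules directly; ambiguous or unmatched samples fall through to the margin test and, failing that, to the decision tree.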
