Divide and conquer algorithm for minimal cost feature selection with measurement error

Huang Weiting1*, Zhao Hong2

Journal of Nanjing University(Natural Sciences) ›› 2016, Vol. 52 ›› Issue (5) : 890.

Abstract

Minimal cost feature selection is one of the most important problems at the current stage of data mining research. Its optimization goal is to obtain an attribute subset with the minimal total cost. When real data are collected, measurement error is inevitable, so errors exist in the actual data. Several minimal cost feature selection algorithms that account for measurement error already exist, but their efficiency is unsatisfactory, especially on large-scale datasets. To address this critical problem, this paper proposes a divide and conquer algorithm for minimal cost feature selection with measurement error. Following the divide and conquer idea, the dataset is partitioned column-wise into several disjoint sub-datasets. The feature selection algorithm is applied to each sub-dataset to obtain its optimal attribute subset, and these optimal attribute subsets are then merged to yield an attribute subset with the minimal total cost. The size and number of the sub-datasets are adapted to the scale of the dataset rather than fixed, and the computational efficiency of the algorithm is enhanced by reducing the scale of the problem. Experimental results on six UCI datasets of different scales show that the proposed method is effective. Compared with the classical backtracking algorithm, its efficiency improves by at least thirty percent while the quality of the result is preserved. Therefore the proposed method can well meet the needs of practical problems.
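
To make the divide-and-conquer procedure concrete, the following Python sketch partitions the attribute set column-wise, solves each block independently, and merges the per-block selections. The per-block solver shown here (an exhaustive search with a simple neighborhood consistency test over an assumed measurement-error range), the parameter names (costs, error, block_size), and the fixed block size are illustrative assumptions; in the paper the per-block solver is the cost-sensitive feature selection algorithm with measurement error, and the block size and number of blocks are chosen adaptively from the dataset scale.

from itertools import combinations
import numpy as np

def neighborhood_consistent(X, y, attrs, error):
    # Assumed consistency test: objects indistinguishable on `attrs`
    # within the measurement-error range must share the same decision class.
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            close = all(abs(X[i, a] - X[j, a]) <= 2 * error for a in attrs)
            if close and y[i] != y[j]:
                return False
    return True

def min_cost_subset(X, y, costs, attrs, error):
    # Exhaustive search (feasible on a small block) for the cheapest
    # consistent attribute subset; falls back to the whole block.
    best, best_cost = list(attrs), sum(costs[a] for a in attrs)
    for r in range(1, len(attrs) + 1):
        for cand in combinations(attrs, r):
            c = sum(costs[a] for a in cand)
            if c < best_cost and neighborhood_consistent(X, y, cand, error):
                best, best_cost = list(cand), c
    return best

def divide_and_conquer(X, y, costs, error, block_size=4):
    # Split the attributes into disjoint column blocks, solve each block,
    # and merge the per-block selections into one attribute subset.
    selected = []
    for start in range(0, X.shape[1], block_size):
        block = list(range(start, min(start + block_size, X.shape[1])))
        selected.extend(min_cost_subset(X, y, costs, block, error))
    return selected

# Tiny usage example with random data and unit test costs.
X = np.random.rand(20, 8)
y = np.random.randint(0, 2, size=20)
print(divide_and_conquer(X, y, costs=[1.0] * 8, error=0.01, block_size=4))

Because each block is solved independently, the cost of the search grows with the block size rather than with the full attribute count, which is the source of the efficiency gain reported in the abstract.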

Cite this article

Huang Weiting, Zhao Hong. Divide and conquer algorithm for minimal cost feature selection with measurement error[J]. Journal of Nanjing University(Natural Sciences), 2016, 52(5): 890.
