南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (4): 714–.

• • 上一篇    下一篇

截断式鲁棒非负矩阵分解算法

卢文凯*,景丽萍*,杨 柳   

  • 出版日期:2016-07-24 发布日期:2016-07-24
  • 作者简介:北京交通大学计算机与信息技术学院,北京,100044
  • 基金资助:
    基金项目:国家自然科学基金(61370129,61375062),中央高校基本科研业务费(2014JBM029),CCF­Tencent RAGR(20150116),教育部-中国移动科研基金(MCM20513) 收稿日期:2016-04-06 *通讯联系人,E­mail:kai624904302@126.com,lpjing@bjtu.edu.cn

Capped robust nonnegative matrix factorization algorithm

Lu Wenkai*,Jing Liping*,Yang Liu   

  • Online:2016-07-24 Published:2016-07-24
  • About author:Institute of Computer and Information Technology,Beijing Jiaotong University,Beijing,100044,China

摘要: 非负矩阵分解算法(Nonnegative Matrix Factorization Algorithm,NMF)已经广泛地应用于诸多领域,但它容易受到异常点的影响.各种针对这个问题的改进方法中,使用L2,1范数的鲁棒非负矩阵算法(Robust Nonnegative Matrix Factorization Algorithm,RNMF)取得了较好的改进效果,但是该算法不能很好的适应数据集异常点比例的变化.针对这一缺点,提出了截断式鲁棒非负矩阵分解算法(Capped Robust Nonnegative Matrix Factorization Algorithm,CRNMF),将去噪比例ε值引入到目标函数中,降低异常点对整体算法的影响.该算法的主要步骤是:在矩阵分解迭代更新的每一步中,计算输入数据与分解因子重构值之间的误差,将误差大于预先设定参数值ε的数据点对应的误差截断为零,重复以上步骤直到收敛.通过ε截断操作,降低基矩阵F和系数矩阵G受异常点的影响.给出了CRNMF的算法描述,并且在模拟数据集和真实数据集进行了实验,实验表明提出的算法与传统的NMF和RNMF相比,可以在一定程度上提高聚类的准确度,减少了异常点对聚类准确度的影响,提高了算法的鲁棒性.

Abstract: Nonnegative Matrix Factorization Algorithm(NMF)has been widely applied in various areas,but it is easily influenced by outliers.In order to solve this problem,researchers have proposed Robust Nonnegative Matrix Factorization Algorithm(RNMF),which uses L2,1 norm to make the normal points be approximate as much as possible and reduce the residual of the outliers by using its absolute rather than square.However,RNMF is still sensitive to the proportion of outliers,i.e.,in some datasets,RNMF can handle outliers well,but its performance in other datasets is not satisfactory.Every real dataset has its own structure,that is,it contains a different proportion of outliers.Because of this,RNMF is limited in practical application.In this article,we present a Capped Robust Nonnegative Matrix Factorization Algorithm(CRNMF)by adding a denoising rate ε into the objective function of RNMF.To achieve better controlling of the outliers,we use this algorithm to handle the situation which the real dataset outlier ratio is different.The main idea of CRNMF is evaluating the residual for each data point according to the input data and the factors during the iterative procedure,if the residual is larger than the given denoising rate ε,we will set the residual as 0,i.e.,the corresponding data point is taken as outlier and not considered in the computing process.By introducing ε truncation,the algorithm reduced the influence of outliers on matrix F and matrix G.This paper gives the description of CRNMF and experiments on real world and synthetic data sets.Experimental results show that the proposed algorithm can improve the clustering accuracy,reduce the impact of outliers and then improve the robustness of the algorithm to some extent,compared with the traditional NMF and RNMF.

[1] Lee D D,Seung H S.Algorithms for non­negative matrix factorization.In:The 13rd Annual Conference on Neural Information Processing System.Denver:MIT Press,2000:556-562. [2] Jolliffe I T.Principal component analysis.Springer,2002,87(100):41-64. [3] Dan K.A singularly valuable decomposition:The SVD of a matrix.College Mathematics Journal,1996,27(1):2-23. [4] Kohonen T.Learning vector quantization.Springer,1995,30:537-540. [5] Ding C,He X,Simon H.On the equivalence of nonnegative matrix factorization and spectral clustering.In:SIAM International Conference on Data Mining.California:SIAM Press,2005. [6] Ding C,Li T,Peng W,et al.Orthogonal nonnegative matrix tri­factorizations for clustering.ACM Knowledge Discovery and Data Mining,Philadelphia:ACM Press,2006:126-135. [7] Li S,Hou X,Zhang H,et al.Learning spatially localized,parts­based representation.Computer Vision and Pattern Recognition,2001:207-212. [8] Guillamet D,Vitria J.Non­negative matrix factorization for face recognition.In:Catalonian Conference on AI:Topics in Artificial Intelligence.London:Springer Press,2002:336-344. [9] Cho Y C,Choi S.Nonnegative features of spectro­temporal sounds for classification.Pattern Recognition Letters,2005,26(9):1327-1336. [10] Bucak S S,Gunsel B.Video content representation by incremental non­negative matrix factorization.In:IEEE International Conference on Image Processing.Texas:IEEE Press,2007,Ⅱ-113-Ⅱ-116. [11] Pauca V P,Shahnaz F,Berry M,et al.Text mining using non­negative matrix factorization.In:SIAM International Conference on Data Mining.Florida:SIAM Press,2004. [12] Xu W,Liu X,Gong Y.Document clustering based on non­negative matrix factorization.In:Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Toronto:ACM Press,2003:267-273. [13] Koren Y,Bell R,Volinsky C.Matrix factorization techniques for recommender systems.Computer,2009,42(8):30-37. [14] Kong D,Ding C,Huang H.Robust nonnegative matrix factorization using L21­norm.In:ACM International Conference on Information & Knowledge Management.Glasgow:ACM Press,2011:673-682. [15] Chandola V,Banerjee A,Kumar V.Anomaly detection:A survey.ACM Computing Surveys,2009,41(3):75-79. [16] Chatterjee S,Hadi A S.Influential observation,high leverage points,and outliers in linear regression.Statistical Science,1986,1(3):379-393. [17] Velleman P F,Welsch R E.Efficient computing of regression diagnostics.The American Statistician,1981,35(4):234-242. [18] Pitsoulis L,Zioutas G.A fast algorithm for robust regression with penalized trimmed squares.Computational Statistics,2010,25(4):663-689. [19] Jiang W,Nie F,Huang H.Robust dictionary learning with capped L1­norm.In:International Joint Conference on Artificial Intelligence(IJCAI).Buenos Aires:Morgan Kaufmann Press,2015:199. [20] Sun Q,Xiang S,Ye J.Robust principal component analysis via capped norms.ACM Knowledge Discovery and Data Mining,Chicago:ACM Press,2013:311-319. [21] Georghiades A S,Belhumeur P N,Kriegman D J.From few to many:Illumination cone models for face recognition under variable lighting and pose.IEEE Transactions on Pattern Analysis and Machine Intelligence,2001,23:643-660.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!