南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (4): 682–.

• • 上一篇    下一篇

一种改进的FastMap算法

屈太国*,蔡自兴   

  • 出版日期:2016-07-24 发布日期:2016-07-24
  • 作者简介:中南大学信息科学与工程学院,长沙,410083
  • 基金资助:
    基金项目:国家自然科学基金(61175064,61273314,91220301) 收稿日期:2016-04-21 *通讯联系人,E­mail:qutaiguo88@sohu.com

An improved FastMap algorithm

Qu Taiguo*,Cai Zixing   

  • Online:2016-07-24 Published:2016-07-24
  • About author:School of Information Science & Engineering,Central South University,Changsha,410083,China

摘要: FastMap是经典多维标度法(classical multidimensional scaling,CMDS)的一种快速算法,它包含一系列的投影.在每次投影中,两个相距较远的点被选为枢轴点,连接枢轴点得到一个枢轴;然后将各样本投影到枢轴上;最后,修改所有样本间的距离.FastMap的不足在于只能得到CMDS的近似解.对FastMap进行了深入分析,指出FastMap的本质就是把各样本投影到由枢轴确定的一组正交向量上.由于这组向量通常不同于样本集的主轴,使得FastMap只能得到CMDS的近似解;并指出FastMap算法的最大投影次数等于样本集的内在维数.在此理论分析的基础上,提出了一种改进的FastMap 算法—iFastMap(improved FastMap)算法.通过对FastMap坐标进行主成分分析,iFastMap得到了与CMDS完全一致的解;此外,从样本集中选取一个内在维数等于整个样本集内在维数的子集,将枢轴点的选取限定在这个子集上,并在每次投影后只修改枢轴点与各样本间的距离,iFastMap的速度得到进一步提高.实验验证了iFastMap与CMDS解的完全一致性及其高效性.

Abstract: Classical multidimensional scaling(CMDS)is a very common method for dimensionality reduction,data visualization,machine learning,pattern recognition,etc.When the distance is Euclidean,the CMDS solution can be viewed as the projections of the samples onto the principal axes of the sample set.The problem of CMDS is that its time increases rapidly with the increase of data size.As one of the fast variants of CMDS,the FastMap algorithm has been widely used in many fields.It includes some recursive projections.And each projection consists of three steps.Firstly,a pivot is obtained by passing through two far apart points(referred to as pivot points from now on);then the samples are projected onto the pivot and the coordinates of the samples in a low­dimensional Euclidean space are obtained;and finally,the distances between all samples are modified.The shortcoming of the FastMap algorithm is that it can only find approximate solution of CMDS.In this paper,the FastMap algorithm is analyzed in detail.It is found that the essence of the FastMap algorithm is to project the samples onto a set of mutually orthogonal directions determined by the pivots.Since these directions are usually different from the principal axes of the sample set,the FastMap algorithm can only get the approximate solution of CMDS.It is also found that the pivots can be selected from a subset provided that its intrinsic dimension is equal to that of the whole sample set.Last but not least,it is found that only the distances between the pivot points and the samples are used to obtain the FastMap coordinates.That is to say,it’s unnecessary to modify the distances between all the samples.Based on the theoretical analysis,an improved algorithm called iFastMap(improved FastMap)is put forward in this paper.By introducing principal component analysis,the FastMap coordinates can be aligned with the CMDS coordinates.As a result,the iFastMap algorithm can find exactly the same solution as that of CMDS.In addition,by selecting a subset with the same intrinsic dimension as that of the whole sample set,choosing the pivots only from this subset,and modifying only the distances between the pivot points and all the samples after each projection,the speed of the iFastMap algorithm is further improved.The experimental results verify the complete consistency of the solutions of the iFastMap algorithm and CMDS and high efficiency of the iFastMap algorithm.

[1] Cox T F,Cox M A A.Multidimensional scaling.Boca Raton:Chapman & Hall/CRC,2001,9-56. [2] 吴纯青,任沛阁,王小峰.基于语义的网络大数据组织与搜索.计算机学报,2015,38(1):1-17.(Wu C Q,Ren P G,Wang X F.Survery on semantic­based organization and search technologies for network big data.Chinese Journal of Computers,2015,38(1):1-17.) [3] Ingram S,Munzner T.Dimensionality reduction for documents with nearest neighbor queries.Neurocomputing,2015,150,Part B:557-569. [4] Mjolsness E,DeCoste D.Machine learning for science:State of the art and future prospects.Science,2001,293(5537):2051-2055. [5] Patel A P,Tirosh I,Trombetta J J,et al.Single­cell RNA­seq highlights intratumoral heterogeneity in primary glioblastoma.Science,2014,344(6190):1396-1401. [6] 明 亮,赵 刚,谢桂海等.面向智能空间的位置感知方法研究.软件学报,2009,20(3):671-681.(Ming L,Zhao G,Xie G H,et al.Research on smart space oriented location awareness method.Journal of Software,2009,20(3):671-681.) [7] Wang S,Zhuo Z,Yang H,et al.An approach to facial expression recognition integrating radial basis function kernel and multidimensional scaling analysis.Soft Computing,2014,18(7):1363-1371. [8] Vogelstein J T,Park Y,Ohyama T,et al.Discovery of brainwide neural­behavioral maps via multiscale unsupervised structure learning.Science,2014,344(6182):386-392. [9] Arkin A,Shen P,Ross J.A test case of correlation metric construction of a reaction pathway from measurements.Science,1997,277(5330):1275-1279. [10] Li S S M,Wang C C L,Hui K C.Bending­invariant correspondence matching on 3­D human bodies for feature point extraction.IEEE Transactions on Automation Science and Engineering,2011,8(4):805-814. [11] 徐江山,路海明,马卫民等.基于MDS变换聚类算法的数字电视用户特征提取.电子学报,2008,36(9):1786-1789.(Xu J S,Lu H M,Ma W M,et al.The extraction of digital TV user feature based on MDS clustering.Acta Electronica Sinica,2008,36(9):1786-1789.) [12] Tenenbaum J B,Silva V D,Langford J C.A global geometric framework for nonlinear dimensionality reduction.Science,2000,290(5500):2319-2323. [13] Chalmers M.A linear iteration time layout algorithm for visualizing high dimensional data.In:Proceedings of the 7th IEEE Visualization Conference.Los Alamitos,California,USA:IEEE Computer Society Press,1996:127-132. [14] Morrison A,Ross G,Chalmers M.Fast multidimensional scaling through sampling,Springs and Interpolation.Information Visualization,2003,2(1):68-77. [15] Williams M,Munzner T.Steerable,progressive multidimensional scaling.In:IEEE Symposium on Information Visualization 2004.Austin,TX,USA:IEEE Press,2004:57-64. [16] Wang J T L,Wang X,Shasha D,et al.Metricmap:An embedding technique for processing distance­based queries in metric spaces.IEEE Transactions on Systems,Man,and Cybernetics,Part B:Cybernetics,2005,35(5):973-987. [17] Silva V D,Tenenbaum J B.Sparse multidimensional scaling using landmark points.Technical Report.Department of Mathematics,Stanford University,Palo Alto,June 2004. [18] Faloutsos C,Lin K.FastMap:A fast algorithm for indexing,data­mining,and visualization.In:Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data.San Jose,California,USA:ACM Press,1995,24(2):163-174. [19] Mignotte M.A bicriteria­optimization­approach­based dimensionality­reduction model for the color display of hyperspectral images.IEEE Transactions on Geoscience and Remote Sensing,2012,50(2):501-513. [20] Wang X,Zhang L,Wang Y,et al.3D model features co­clustering based on heterogeneous semantic network.In:The 4th IEEE International Conference on Information Science and Technology(ICIST 2014).Shenzhen,Guangdong,China:IEEE Press,2014:75-78. [21] Tejada E,Minghim R,Nonato L G.On improved projection techniques to support visual exploration of multi­dimensional data sets.Information Visualization,2003,2(4):218-231. [22] Latsoudas G,Sidiropoulos N D.A fast and effective multidimensional scaling approach for node localization in wireless sensor networks.IEEE Transactions on Signal Processing,2007,55(10):5121-5127. [23] Zhang D Y,Bai X,Liu P N.multidimensional scaling localization algorithm in wireless sensor networks.Sensors & Transducers,2014,164(2):71-79. [24] 张培珍,付 平,肖 军等.基于快速聚类索引的图像检索系统.吉林大学学报(信息科学版),2004,22(6):638-642.(Zhang P Z,Fu P,Xiao J,et al.Image retrieval system based on fast clustering indexing.Journal of Jilin University(Information Science Edition),2004,22(6):638-642.) [25] Liu Y,Kender J R.Fast video segment retrieval by Sort­Merge feature selection,boundary refinement,and lazy evaluation.Computer Vision and Image Understanding,2003,92(2-3):147-175. [26] 王 菁,杨寿保,高鹰等.FCAN:一种基于快速映射的内容访问网络.系统仿真学报,2007,19(17):3955-3960.(Wang J,Yang S B,Gao Y,et al.FCAN:Fastmap­based content addressable network.Journal of System Simulation,2007,19(17):3955-3960.) [27] Khan I,Huang J Z,Tung N T,et al.Ensemble clustering of high dimensional data with FastMap projection.In:Peng W C.PAKDD 2014 Workshops.Tainan,Taiwan:Springer International Publishing,2014:483-493. [28] Touati R,Mignotte M.MDS­based multi­axial dimensionality reduction model for human action recognition.In:2014 Canadian Conference on Computer and Robot Vision(CRV).Montreal,QC,Canada:IEEE Press,2014:262-267. [29] Zheng Y,Xiao L,Tang W,et al.A music recommendation method for large­scale music library on a heterogeneous platform.In:Sun X H,Qu W Y,Ivan Stojmenovic I,et al.The 14th International Conference on Algorithms and Architectures for Parallel Processing.Dalian,China:Springer International Publishing,2014:472-482. [30] Moevus A,Mignotte M,de Guise J A,et al.Evaluating perceptual maps of asymmetries for gait symmetry quantification and pathology detection.In:The 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society(EMBC 2014).Chicago,Illinois:IEEE Press,2014:3317-3320. [31] Platt J.FastMap,MetricMap,and Landmark MDS are all nystrom algorithms.Technical Report.Microsoft Research,Microsoft Corporation,New York,MSR-TR-2004-26. [32] Abu­Khzam F N,Samatova N F,Ostrouchov G,et al.Distributed dimension reduction algorithms for widely dispersed data.In:Gonzales T,Akl S G.The 14th IASTED International Conference on Parallel and Distributed Computer Systems.Cambridge,USA:Acta Press,2002:167-174. [33] Ostrouchov G,Samatova N F.On FastMap and the convex hull of multivariate data:Toward fast and robust dimension reduction.IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(8):1340-1343. [34] An J Y,Yu J X,Ratanamahatana C A,et al.A dimensionality reduction algorithm and its application to interactive visualization.Journal of Visual Languages and Computing,2007,18(1):48-70. [35] Young G,Householder A S.Discussion of a set of points in terms of their mutual distances.Psychometrika,1938,3(1):19-22. [36] 张润楚.多元统计分析.北京:科学出版社,2006,288-311.(Zhang R C.Multivariate statistical analysis.Beijing:Science Press,2006,288-311.) [37] Pearson K.On lines and planes of closest fit to system of points in space.Philosophical Magazine,1901,2:559-572. [38] Hotelling H.Analysis of a complex of statistical variables into principal components.Journal of Educational Psychology,1933,24(6):417-441. [39] http://archive.ics.uci.edu/ml/datasets.html [40] http://www.cs.toronto.edu/~roweis/data.html  [41] Kruskal J B.Nonmetric multidimensional scaling:A numerical method.Psychometrika,1964,29(2):115-129.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!