Classical multidimensional scaling(CMDS)is a very common method for dimensionality reduction,data visualization,machine learning,pattern recognition,etc.When the distance is Euclidean,the CMDS solution can be viewed as the projections of the samples onto the principal axes of the sample set.The problem of CMDS is that its time increases rapidly with the increase of data size.As one of the fast variants of CMDS,the FastMap algorithm has been widely used in many fields.It includes some recursive projections.And each projection consists of three steps.Firstly,a pivot is obtained by passing through two far apart points(referred to as pivot points from now on);then the samples are projected onto the pivot and the coordinates of the samples in a lowdimensional Euclidean space are obtained;and finally,the distances between all samples are modified.The shortcoming of the FastMap algorithm is that it can only find approximate solution of CMDS.In this paper,the FastMap algorithm is analyzed in detail.It is found that the essence of the FastMap algorithm is to project the samples onto a set of mutually orthogonal directions determined by the pivots.Since these directions are usually different from the principal axes of the sample set,the FastMap algorithm can only get the approximate solution of CMDS.It is also found that the pivots can be selected from a subset provided that its intrinsic dimension is equal to that of the whole sample set.Last but not least,it is found that only the distances between the pivot points and the samples are used to obtain the FastMap coordinates.That is to say,it’s unnecessary to modify the distances between all the samples.Based on the theoretical analysis,an improved algorithm called iFastMap(improved FastMap)is put forward in this paper.By introducing principal component analysis,the FastMap coordinates can be aligned with the CMDS coordinates.As a result,the iFastMap algorithm can find exactly the same solution as that of CMDS.In addition,by selecting a subset with the same intrinsic dimension as that of the whole sample set,choosing the pivots only from this subset,and modifying only the distances between the pivot points and all the samples after each projection,the speed of the iFastMap algorithm is further improved.The experimental results verify the complete consistency of the solutions of the iFastMap algorithm and CMDS and high efficiency of the iFastMap algorithm.
Qu Taiguo*,Cai Zixing .
An improved FastMap algorithm[J]. Journal of Nanjing University(Natural Sciences), 2016, 52(4): 682
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] Cox T F,Cox M A A.Multidimensional scaling.Boca Raton:Chapman & Hall/CRC,2001,9-56. [2] 吴纯青,任沛阁,王小峰.基于语义的网络大数据组织与搜索.计算机学报,2015,38(1):1-17.(Wu C Q,Ren P G,Wang X F.Survery on semanticbased organization and search technologies for network big data.Chinese Journal of Computers,2015,38(1):1-17.) [3] Ingram S,Munzner T.Dimensionality reduction for documents with nearest neighbor queries.Neurocomputing,2015,150,Part B:557-569. [4] Mjolsness E,DeCoste D.Machine learning for science:State of the art and future prospects.Science,2001,293(5537):2051-2055. [5] Patel A P,Tirosh I,Trombetta J J,et al.Singlecell RNAseq highlights intratumoral heterogeneity in primary glioblastoma.Science,2014,344(6190):1396-1401. [6] 明 亮,赵 刚,谢桂海等.面向智能空间的位置感知方法研究.软件学报,2009,20(3):671-681.(Ming L,Zhao G,Xie G H,et al.Research on smart space oriented location awareness method.Journal of Software,2009,20(3):671-681.) [7] Wang S,Zhuo Z,Yang H,et al.An approach to facial expression recognition integrating radial basis function kernel and multidimensional scaling analysis.Soft Computing,2014,18(7):1363-1371. [8] Vogelstein J T,Park Y,Ohyama T,et al.Discovery of brainwide neuralbehavioral maps via multiscale unsupervised structure learning.Science,2014,344(6182):386-392. [9] Arkin A,Shen P,Ross J.A test case of correlation metric construction of a reaction pathway from measurements.Science,1997,277(5330):1275-1279. [10] Li S S M,Wang C C L,Hui K C.Bendinginvariant correspondence matching on 3D human bodies for feature point extraction.IEEE Transactions on Automation Science and Engineering,2011,8(4):805-814. [11] 徐江山,路海明,马卫民等.基于MDS变换聚类算法的数字电视用户特征提取.电子学报,2008,36(9):1786-1789.(Xu J S,Lu H M,Ma W M,et al.The extraction of digital TV user feature based on MDS clustering.Acta Electronica Sinica,2008,36(9):1786-1789.) [12] Tenenbaum J B,Silva V D,Langford J C.A global geometric framework for nonlinear dimensionality reduction.Science,2000,290(5500):2319-2323. [13] Chalmers M.A linear iteration time layout algorithm for visualizing high dimensional data.In:Proceedings of the 7th IEEE Visualization Conference.Los Alamitos,California,USA:IEEE Computer Society Press,1996:127-132. [14] Morrison A,Ross G,Chalmers M.Fast multidimensional scaling through sampling,Springs and Interpolation.Information Visualization,2003,2(1):68-77. [15] Williams M,Munzner T.Steerable,progressive multidimensional scaling.In:IEEE Symposium on Information Visualization 2004.Austin,TX,USA:IEEE Press,2004:57-64. [16] Wang J T L,Wang X,Shasha D,et al.Metricmap:An embedding technique for processing distancebased queries in metric spaces.IEEE Transactions on Systems,Man,and Cybernetics,Part B:Cybernetics,2005,35(5):973-987. [17] Silva V D,Tenenbaum J B.Sparse multidimensional scaling using landmark points.Technical Report.Department of Mathematics,Stanford University,Palo Alto,June 2004. [18] Faloutsos C,Lin K.FastMap:A fast algorithm for indexing,datamining,and visualization.In:Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data.San Jose,California,USA:ACM Press,1995,24(2):163-174. [19] Mignotte M.A bicriteriaoptimizationapproachbased dimensionalityreduction model for the color display of hyperspectral images.IEEE Transactions on Geoscience and Remote Sensing,2012,50(2):501-513. [20] Wang X,Zhang L,Wang Y,et al.3D model features coclustering based on heterogeneous semantic network.In:The 4th IEEE International Conference on Information Science and Technology(ICIST 2014).Shenzhen,Guangdong,China:IEEE Press,2014:75-78. [21] Tejada E,Minghim R,Nonato L G.On improved projection techniques to support visual exploration of multidimensional data sets.Information Visualization,2003,2(4):218-231. [22] Latsoudas G,Sidiropoulos N D.A fast and effective multidimensional scaling approach for node localization in wireless sensor networks.IEEE Transactions on Signal Processing,2007,55(10):5121-5127. [23] Zhang D Y,Bai X,Liu P N.multidimensional scaling localization algorithm in wireless sensor networks.Sensors & Transducers,2014,164(2):71-79. [24] 张培珍,付 平,肖 军等.基于快速聚类索引的图像检索系统.吉林大学学报(信息科学版),2004,22(6):638-642.(Zhang P Z,Fu P,Xiao J,et al.Image retrieval system based on fast clustering indexing.Journal of Jilin University(Information Science Edition),2004,22(6):638-642.) [25] Liu Y,Kender J R.Fast video segment retrieval by SortMerge feature selection,boundary refinement,and lazy evaluation.Computer Vision and Image Understanding,2003,92(2-3):147-175. [26] 王 菁,杨寿保,高鹰等.FCAN:一种基于快速映射的内容访问网络.系统仿真学报,2007,19(17):3955-3960.(Wang J,Yang S B,Gao Y,et al.FCAN:Fastmapbased content addressable network.Journal of System Simulation,2007,19(17):3955-3960.) [27] Khan I,Huang J Z,Tung N T,et al.Ensemble clustering of high dimensional data with FastMap projection.In:Peng W C.PAKDD 2014 Workshops.Tainan,Taiwan:Springer International Publishing,2014:483-493. [28] Touati R,Mignotte M.MDSbased multiaxial dimensionality reduction model for human action recognition.In:2014 Canadian Conference on Computer and Robot Vision(CRV).Montreal,QC,Canada:IEEE Press,2014:262-267. [29] Zheng Y,Xiao L,Tang W,et al.A music recommendation method for largescale music library on a heterogeneous platform.In:Sun X H,Qu W Y,Ivan Stojmenovic I,et al.The 14th International Conference on Algorithms and Architectures for Parallel Processing.Dalian,China:Springer International Publishing,2014:472-482. [30] Moevus A,Mignotte M,de Guise J A,et al.Evaluating perceptual maps of asymmetries for gait symmetry quantification and pathology detection.In:The 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society(EMBC 2014).Chicago,Illinois:IEEE Press,2014:3317-3320. [31] Platt J.FastMap,MetricMap,and Landmark MDS are all nystrom algorithms.Technical Report.Microsoft Research,Microsoft Corporation,New York,MSR-TR-2004-26. [32] AbuKhzam F N,Samatova N F,Ostrouchov G,et al.Distributed dimension reduction algorithms for widely dispersed data.In:Gonzales T,Akl S G.The 14th IASTED International Conference on Parallel and Distributed Computer Systems.Cambridge,USA:Acta Press,2002:167-174. [33] Ostrouchov G,Samatova N F.On FastMap and the convex hull of multivariate data:Toward fast and robust dimension reduction.IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(8):1340-1343. [34] An J Y,Yu J X,Ratanamahatana C A,et al.A dimensionality reduction algorithm and its application to interactive visualization.Journal of Visual Languages and Computing,2007,18(1):48-70. [35] Young G,Householder A S.Discussion of a set of points in terms of their mutual distances.Psychometrika,1938,3(1):19-22. [36] 张润楚.多元统计分析.北京:科学出版社,2006,288-311.(Zhang R C.Multivariate statistical analysis.Beijing:Science Press,2006,288-311.) [37] Pearson K.On lines and planes of closest fit to system of points in space.Philosophical Magazine,1901,2:559-572. [38] Hotelling H.Analysis of a complex of statistical variables into principal components.Journal of Educational Psychology,1933,24(6):417-441. [39] http://archive.ics.uci.edu/ml/datasets.html [40] http://www.cs.toronto.edu/~roweis/data.html [41] Kruskal J B.Nonmetric multidimensional scaling:A numerical method.Psychometrika,1964,29(2):115-129.