南京大学学报(自然科学版) ›› 2017, Vol. 53 ›› Issue (3): 483.
李兴亮1,毛 睿2*
Li Xingliang1,Mao Rui2*
摘要: 度量空间数据管理分析方法把数据抽象成度量空间中的点,具有高度的通用性,是应对大数据多样性挑战的有效手段之一.由于度量空间没有坐标,很多数学工具无法直接使用,一般以数据到参考点(也称作支撑点)的距离作为坐标.支撑点的好坏对于度量空间数据管理分析的性能发挥着关键性的影响.最远优先遍历(Farthest First Traversal,FFT)可以选出数据拐角的点,具有线性的时间复杂度和空间复杂度,是使用最广泛的支撑点选取算法之一.但是,实验表明最好的支撑点往往不是最拐角的点,故FFT很难选出最好的支撑点.提出近期最远遍历(Recent Farthest Traversal,RFT)算法,只以近期的几个支撑点来选择下一个支撑点,能够更快地选出性能更优的支撑点.同时,实验表明FFT还可以在数据内部均匀抽样.提出支撑点集合选择算法(Pivot Set Selection,PSS),可以一次性选出所有支撑点.以RFT选择候选集,以FFT选择评价集,选出支撑点并构建相似性索引,PSS使得索引构建代价大大降低,索引性能得到一定提升.实验表明,RFT选出好的支撑点的速度远快于FFT,准确率高于FFT,而FFT的抽样效果良好.
[1] Roman S.Advanced linear algebra.The 2nd Edition.New York,USA:Springer,2005,283-286. [2] Iqbal Q,Aggarwal J K.Image retrieval via isotropic and anisotropic mappings.Pattern Recognition Journal,2002,35(12):2673-2686. [3] Chavez E,Navarro G,BaezaYates R,et al.Searching in metric spaces.ACM Computing Surveys(CSUR),2001,33:273-321. [4] Ramakrishnan S R,Mao R,Nakorchevskiy A A,et al.A fast coarse filtering method for protein identification by mass spectrometry.Bioinformatics,2006,22(12):1524-1531. [5] Ciaccia P,Patella M,Zezula P.Mtree:An efficient access method for similarity search in metric spaces.In:The 23rd International Conference on Very Large Data Bases(VLDB’97).Athens,Greece:Morgan Kaufmann Publishers Inc.,1997:426-435. [6] Navarro G.Searching in metric spaces by spatial approximation.VLDB Journal,1999,11(1):141-148. [7] Amato G,Esuli A,Falchi F.Pivot selection strategies for permutationbased similarity search.Springer,2013,8199:91-102. [8] Mao R,Miranker W L,Miranker D P.Pivot selection:Dimension reduction for distancebased indexing.Journal of Discrete Algorithm,2012,13:32-46. [9] Ma K W,Liu Y J,Xu H L,et al.Pivot selection methods based on covariance and correlation for metricspace indexing.2012 National Conference on Information Technology and Computer Science (CITCS2012).Lanzhou,China:Atlantis Press,2012,672-677. [10] Bustos B,Navarro G,Chavez E.Pivot selection techniques for proximity searching in metric spaces.Pattern Recognition Letters,2003,24(14):2357-2366. [11] Gonzalez T F.Clustering to minimize the maximum intercluster distance.Theoretical Computer Science,1985,38:293-306. [12] Berman A,Shapiro L G.Selecting good keys for triangleinequalitybased pruning algorithms.In:Proceedings of the 1998 International Workshop on Contentbased Access of Image and Video Databases(CAIVD’98).Bombay,India:IEEE,1998:12-19. [13] Vleugels J,Veltkamp R C.Efficient image retrieval through vantage objects.Pattern Recognition,2002,35(1):69-80. [14] Mao R,Xu W,Singh N,et al.An assessment of a metric space database index to support sequence homology.IEEE Symposium on Bioinformatics & Bioengineering,2003,14(5):375-382. [15] Henniga C,Lateckic L J.The choice of vantage objects for image retrieval.Pattern Recognition,2003,36(9):2187-2196. [16] Hochbaum D S,Shmoys D B.A best possible heuristic for the kcenter problem.Mathematics of Operational Research,1985,10(2):180-184. [17] Yianilos P N.Data structures and algorithms for nearest neighbor search in general metric spaces.ACMSIAM Symposium on Discrete Algorithms,1993:311-321. [18] Mao R,Zhang P H,Li X L,et al.Pivot selection for metricspace indexing.International Journal of Machine Learning and Cybernetics,2016,7(2):311-323. [19] The address of UMAD project:https://github.com/ruimao/UMAD,2015-11. [20] Mao R,Iqbal Q,Liu W,et al.Case Study:Distancebased image retrieval in the MoBIoS DBMS.In:The Proceedings of The 5th International Conference on Computer and Information Technology(CIT2005).Shanghai,China:IEEE,2005:49-55. [21] Needleman S B,Wunsch C D.A general method applicable to the search for similarities in the amino acid sequence of two proteins.Journal of Molecular Biology,2015,48(3):443-453. [22] Xu W,Miranker D P.A metric model of amino acid substitution.Bioinformatics,2004,20(8):1214-1221. [23] Navarro G.Analyzing metric space indexes:What for?In:The Proceedings of the 2nd International Conference on Similarity Search & Applications(SISAP2009).Washington D C,USA:IEEE Computer Society,2009:3-10. |
No related articles found! |
|