南京大学学报(自然科学版) ›› 2013, Vol. 49 ›› Issue (2): 266234.
马晓文1,胡学钢1**,谢飞1·2,郭丹1
Ma Xiao-Wen1,Hu Xue-Gang1,Xie Fei1’2,Guo Dan1
摘要: 带有通配符的多序列模式挖掘在文木检索、网络安全、生物科学等领域中具有很重要的作用.
通过挖掘多序列模式,能够透彻的了解序列之间的联系,在各个领域中具有重要的现实意义.在己有的
工作中,随着多序列集长度的增大,挖掘的规模呈现指数级增长.研究这样一个问题:给定多条序列s1,
…,sn,支持度阂值和间隔约束,从多序列中挖掘所有出现次数不小于给定支持度阂值的频繁序列模式,
并且要求模式中任意两个相邻元素在序列中的出现位置满足用户定义的间隔约束.设计了一个有效的
算法M-OneOffMine,模式在序列中的出现满足one-off条件.在生物DNA序列上的实验结果表明,
M-OneOffMine算法比相关的序列模式挖掘算法具有更好的时间性能.
[1]Oaten T,Cohen P. Searching for structure in mul tiple streams of data. Proceedings of 13th lnterna tional Conference on Machine Learning. Bari,Ita 1y,1996,346一354. [2]Herzel H,Weiss O,Trifonov E. 10一11 bp perio- dicities in complete genomes reflect protein strur ture and DNA folding. Bioinformatics, 1999,15 (3).187一193. [3]Jonassen I. Efficient discovery of conserved pat- terns using a pattern graph. Computer Applica- tions in the Biosciences:CABIOS, 1997,13:509 ~522. [4]Rigoutsos I,Floratos A. Combinatorial pattern dis covery in biological sequences;The TFIRESIAS al gorithm. Bioinlormatics, 1998,14(1):55一67. [5]Xie F, Wu X D, Hu X G, et al. Sequential pattern mining with wildcards. Proceedings of the 22nd In- ternational Conference on Tools with Artificial Intelligence. Arras,France,2010,241一247. [6]Belkum A Van,Scherer S,Leeuwen W Van. Vari- able number of tandem repeats in clinical strains of Hacmophilus influenzae. Infection and Immuni- ty,1997,5017一5027. [7]Huang Y M, Wu X D, Hu X G, et al. Mining fre quent patterns with gaps and one-off condition. Pro- ceedings of the international Conference on Compu- tational Science and Engineering. Vancouver, BC, USA,2009,180一186. [8]Gao Y. Press of data mining in China. Journal of Nanjing University(Natural Sciences),2011,47(4): 351 -353.(高阳.中国数据挖掘研究进展.南京大学学报(自然科学),2011,47(4):351-353). [9]Agrawal R,Srikant R. Mining sequential pat terns. Proceedings of the 11th international Con ference on Data Engineering. Taipei,Taiwan 1995,3一14. [10]Zou X, Zhang W, Liu Y, et al. Study on distrihuted sequential pattern discovery algorithm. Journal of Software,2005,16(7):1262一1269.(邹祥,张巍,刘洋等.分布式序列模式发现算法的研究. 软件学报,2005,16(7):1262 -1269). [11]Agrawal R, Srikant R. Fast algorithms for mining association rules. Proceedings of the international Conference on Very Large Databases. Santiago,Chil- e; Morgan Kaufmann Publishing lnc, San Matco, CA,USA, 1994 ,487一499. [12]Han J,Pei J,Mortazavi-Asl B, et al. FreeSpan;Frequent pattern-projected sequential pattern mining. Proceedings of the international Confer once on Knowledge Discovery and Data Mining Boston, MA,2000,355一359. [13]Pei J,Han J,Mortazavi-Asl B,et al. PrefixSpan; Mining sequential patterns efficiently by prefix- projected pattern growth. Proceedings of the 17th IEEE international Conference on Data Engineer- ing. Heidelberg, Germany; institute of Electrical and Electronics Engineers Computer Society, 2001,215一224. [14]Zhang M H,Kao B,Cheung D,et al. Minim peri odic patterns with gap requirement from se- quences. Proceedings of the ACM SIUMOD In ternational Conference on Management of Data. Baltimore,MD,Unitcd States,2005,623一633. [15]He Y, Wu X D, Zhu X Q. Mining frequent patterns with wildcards from biological sequences. Proceed- ings of the IEEE international Conference on lnfor- mation Reuse and lntcgration,Las Vegas,NV,USA, 2007,329一334. [16]Zhu X Q, Wu X D. Mining complex patterns across sequences with gap requirements. Proceeding of the 20th International Joint Conference on Artificial In telligence. Hyderabad,2007,726一735. [17]Ding B, Lo D, Han J,et al. Efficient minim of closed repetitive gapped subsequences from a se quence database. Proceedings of the 25th Interna tional Conference on Data Engineering. Shanghai China,2009,1024一1035. [18]Chen G,Wu X D,Zhu X Q. Efficient string matc-hing with wildcards and length constraints. Knowledge and Information Systems,2006,10(4):339~419. [19]http//www. ncbi. nlm. nih. gov. |
No related articles found! |
|