南京大学学报(自然科学版) ›› 2011, Vol. 47 ›› Issue (5): 532–543.

• • 上一篇    下一篇

 基于对娇gSpan改进的有向频繁子图挖掘算法*

 周溜溜,业 宁**   

  • 出版日期:2015-04-27 发布日期:2015-04-27
  • 作者简介: (南京林业大学信息技术学院,南京,210037)
  • 基金资助:
     国家自然科学基金(30671639),江苏省自然科学基金(BK2009393),江苏省青蓝工程学术带头人项日

 Digraph frequent subgraph mining based on gSpan

 Zhou Liu Liu,Ye Ning   

  • Online:2015-04-27 Published:2015-04-27
  • About author: (College of Computer Science and Technology, Nanjing Forestry University, Nanjing, 210037,China)

摘要:  提出的新算法对gSpan算法做了适用性改进,算法所采用的图编码技术与传统的频繁子图挖掘(FSG),快速频繁子图挖掘(FFSM),基于先验的图挖掘(AGM)等算法对图结构的编码均不同,由于对有向图
进行了新的二维特征定义,因此可使算法适用范围有效地扩展至对有向图的学习,称之为基于对gSpan改进的有向频繁子图挖掘算法((DFSS);因目前为II.,一系列频繁子图的挖掘大都是基于无向图上的知识发现,对直接
作用于有向图的挖掘尚且很少.并且所设计算法较先前基于Apriori思想的FSG, AGM等一系列频繁图挖掘算法,在时间复杂度方面有了一定程度的改进,使得挖掘效率得以提升;实验结果表明在不损失挖掘完整度的前
提卜,其效率是FFSM算法的70一80倍.

Abstract:  With graph data generated from various sources, meaningful pattern mining on this kind of data set becomes more and more urgent, especially along with the development of life science, yielding a considerable amount of directed graphs.
However, existed related algorithms arc all designed for undirected graphs, like Frequent Subgraph Discovery(FSG),Apriori-based Uraph Mining(AUM),Fast Frequent Subgraph Mining(FFSM) and so on. Except for FFSM, there is not such
an algorithm specially designed for directed graphs. Meanwhile, FFSM algorithm itself also needs some modifications ahead of utilizing on directed graphs, In the paper, we propose a new algorithm called DFSS after the research based on some pattern
mining algorithms of graphs, which detects frequent substructures directly in directed graphs. DFSS constructs a new data model to indicate each graph and maps each one into a unique minimum array code.Two new concepts, link degrees and level
degree arc presented in the paper for the special operation of directed graph pattern mining. We also analyze the efficiency of  FFSM and DFSS.The time complexity of DFSS is O(N3/N2),which improves O(m4/n2=)times compared to FFSM, where n
is the number of frequent edges and m is the number of frequent vertices in the graph data set. Experiments showed that DFSS achieves orders of magnitude speedup in comparison with FFSM, and demonstrate theoretical analysis mentioned above. In
addition, the algorithm proposed in this paper plays the role as the first to directly operate on the directed graph data set, which will surely have vital significance for future related works.

[1]Li X T,Li J Z,Gao H. An efficient algorithm for mining frequent subgraphs. Journal of Soft- ware, 2007, 18(10): 2469~2480.(李先通,李建中,高宏.一种高效频繁子图挖掘算法.软件学报,2007,18(10);2469~2480).
[2]Fan W , Li X M.The challenge of thins’char- acteristics for modeling and data mining. China Computer Federation Communications, 2010,55 (9):42~47.(范伟,李晓明.物联网数据特性对建模和挖掘的挑战.计算机学会通讯,2010, 55 (9):42一47) .
[3]Borgclt C, Berhold M R. Mining molecular fragm ents; Finding relevant substructures of molecular. Proceedings of the IEEE lnternation- al Conference on Data Mining. Macbashi,2002,176一187.
[4]Holder L B, Cook D J,Djoko S. Substructures discovery in the subdue system. Proceedings of the AAAI’94 Workshop Knowledge Discovery in Databascs,l994,169一180.
[5]Inokuchi A,WashioT,OkadaT,et al. Apply- ing algebraic mining method of graph substrur tures to mutagenicsis data analysis. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Kyoto,Japan, 2000,554一562.
[6]Inokuchi A,WashioT,OkadaT.An apriori based algorithm for mining frequent substruc tures from graph data. Proceedings of the Prac tice of Knowledge Discovery in Databases. Ly on. France. 2000,13一23.
[7]Kuramochi M,Karypis G. Frequent subgraph discovery. Proceedings of the IEEE lnternation- al Conference on Data Mining. California, USA,2001,759一772.
[8]Yan Y,H an J, gSpan; Graph-Based substruc ture pattern mining. Proceedings of the 2002 In ternational Conference on Data Mining. Mac bashi,Japan, 2002,548一562.
[9]Yan X, Han J. CloseGraph; Mining closed fre- qucnt graph patterns. Proceedings of the 9th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining. Wash- ington, USA,2003,1548一1568.
[10]Han J,Wang W,Prins J. Efficient mining of frequent subgraphs in the presence of isomor- phism. Proceedings of the IEEE international Conference on Data Mining. Florida, USA,2003,24l4一2136.
[11]WashioT,Motoda H. State of the art of graph- based data mining. Proceedings of the 15thACM SIUKDD Conference on Knowledge Discovery and Data Mining. Paris, France 2009 1274~1281
[12]Li X T,Li J Z. An efficient graph query algo- rithm based on important vertices and decision features. international Journal of information Technology,2009,5(1):26一33.
[13]Li X T,Li J Z. A decomposition based algo- rithm of graph contaiment query.Information Technology, 2009,8(2):214一219.
[14]Liu J L Zhang W, W L. L. Algebra property and application of coalition structure graph. Pattern Recognition and Artificial intelligence,2009,22(6):841}-847.(刘惊需,张伟, 王玲玲.联盟结构图的代数性质及应用.模式识别与人工智能,2009,22(6):841}-847).
[15]Hu S L, Li S F, Shi C Y. Coalition structure generation with worst case finite bound from the optimal guarantees. Journal of Computer Re search and Development,2009,16(8):1356一
1357.(胡山立,李少芳,石纯一最坏情况具有限界的联盟结构生成.计算机研究与发展,2009,46(8):1356一1357).
[16]Ye N,Zhu D M,Zhang Q Q. A fast algorithm of constrained longest common subsequence. Journal of Nanjing University(Natural Sci- ences),2009,45(5):576~584.(业宁,朱大
铭,张倩倩等.带约束最长公共子序列快速算法.南京大学学报(自然科学),2009,45(5): 576一584).

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!