
基于主动学习的模式兴趣评估方法
Pattern interestingness evaluation based on active learning
频繁模式挖掘(Frequent Pattern Mining,FPM)是图数据分析、挖掘领域的核心问题之一,其目的是从大规模图数据中发现支持度不低于指定阈值的模式.传统的频繁模式挖掘算法依赖支持度进行剪枝,返回结果往往包含大量“冗余”模式;top⁃k模式挖掘算法虽然仅返回k个频繁模式,但该类算法主要依据“客观”指标,如支持度等,对模式进行评估,难以充分反映用户的主观兴趣偏好.针对上述问题,提出一种基于主动学习的模式兴趣评估方法(Pattern Interestingness Evaluation with Active Learning,PIEAL),通过主动学习策略,从采样图上挖掘的频繁模式中选择代表性模式,并利用有限次人机交互收集用户对这些模式的偏好,进而预测模式的兴趣分数,指导算法发现用户感兴趣的模式.在人机交互环节,PIEAL采用基于成对比较的策略来收集用户对模式的偏好反馈,有效降低了用户的主观评价难度.在真实数据集上的实验结果表明,PIEAL仅需要少量的人机交互便可发现用户感兴趣的模式,其测试集准确率最高可达95%.
Frequent Pattern Mining (FPM) is crucial in graph data analysis and mining,aiming to discover patterns in large⁃scale graph data with support above a specified threshold. Traditional FPM algorithms rely on the support threshold for search space pruning,often resulting in excessive redundant
主动学习 / 交互式 / 频繁模式挖掘 / 兴趣度 {{custom_keyword}} /
active learning / interaction / frequent pattern mining / interestingness {{custom_keyword}} /
1 | Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases∥Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco,CA,USA:Morgan Kaufmann Publishers Inc.,1994:487-499. |
2 | Han J W, Pei J, Yin Y W,et al. Mining frequent patterns without candidate generation:A frequent?pattern tree approach. Data Mining and Knowledge Discovery,2004,8(1):53-87. |
3 | Wang X, Shi J H, Zou J J,et al. Supports estimation via graph sampling. Expert Systems with Applications,2024,240:122554. |
4 | Wang X, Lan Z, He Y A,et al. A cost?effective approach for mining near?optimal top?k patterns. Expert Systems with Applications,2022,202:117262. |
5 | Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science,2014,344(6191):1492-1496. |
6 | Wu Y X, Meng Y F, Li Y,et al. COPP?Miner:Top?k contrast order?preserving pattern mining for time series classification. IEEE Transactions on Knowledge and Data Engineering,2024,36(6):2372-2387. |
7 | Fournier?Viger P, Wang Y, Yang P,et al. TSPIN:Mining top?k stable periodic patterns. Applied Intelligence,2022,52(6):6917-6938. |
8 | 邹杰军,王欣,石俊豪,等. 面向大图的Top?Rank?K频繁模式挖掘算法. 南京大学学报(自然科学),2024,60(1):38-52. |
Zou J J, Wang X, Shi J H,et al. Top?Rank?K frequent pattern mining algorithm for large graphs. Journal of Nanjing University (Natural Science),2024,60(1):38-52. | |
9 | Lee C, Kim H, Cho M,et al. Incremental Top?k high utility pattern mining and analyzing over the entire accumulated dynamic database. IEEE Access,2024,12:77605-77620. |
10 | Wang X, Tang L, Liu Y,et al. Diversified pattern mining on large graphs∥The 32nd International Conference on Database and Expert Systems Appli?cations. Virtual Event:Springer Cham Germany,2021:171-184. |
11 | Gal Y, Islam R, Ghahramani Z. Deep Bayesian active learning with image data∥Proceedings of the 34th International Conference on Machine Learning. Sydney,Australia:JMLR.org,2017:1183-1192. |
12 | Raj A, Bach F. Convergence of uncertainty sampling for active learning∥Proceedings of the 39th Inter?national Conference on Machine Learning. Balti?more,MD,USA:PMLR,2022:18310-18331. |
13 | Buchert F, Navab N, Kim S T. Toward label?efficient neural network training:Diversity?based sampling in semi?supervised active learning. IEEE Access,2023,11:5193-5205. |
14 | Wu T H, Liu Y C, Huang Y K,et al. ReDAL:Region?based and diversity?aware active learning for point cloud semantic segmentation∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal,Canada:IEEE,2021:15510-15519. |
15 | Liu J C, Wang Y W, Hooi B,et al. LSCALE:Latent space clustering?based active learning for node classification∥Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Turin,Italy:Springer Grenoble France,2023:55-70. |
16 | Yan X Y, Nazmi S, Gebru B,et al. A clustering?based active learning method to query informative and representative samples. Applied Intelligence,2022,52(11):13250-13267. |
17 | Zhou P, Zhang T X, Zhao L W,et al. Pre?clustering active learning method for automatic classification of building structures in urban areas. Engineering Applications of Artificial Intelligence,2023,123:106382. |
18 | Yan X F, Han J W. gSpan:Graph?based substructure pattern mining∥Proceedings of the IEEE Inter?national Conference on Data Mining. Maebashi City,Japan:IEEE,2002:721-724. |
19 | Abdelhamid E, Abdelaziz I, Kalnis P,et al. ScaleMine:Scalable parallel frequent subgraph mining in a single large graph∥Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis. Salt Lake City,UT,USA:IEEE,2016:716-727. |
20 | Yang J, Leskovec J. Defining and evaluating network communities based on ground?truth∥2012 IEEE 12th International Conference on Data Mining. Brussels,Belgium:IEEE,2012:745-754. |
21 | Leskovec J, Huttenlocher D, Kleinberg J. Predicting positive and negative links in online social networks∥Proceedings of the 19th International Conference on World Wide Web. Raleigh North,CA,USA:ACM,2010:641-650. |
22 | Takac L, Zabovsky M. Data analysis in public social networks∥International Scientific Conference & International Workshop Present Day Trends of Innovations. ?om?a,Poland:DTI,2012. |
23 | Ikotun A M, Ezugwu A E, Abualigah L,et al. K?means clustering algorithms:A comprehensive review,variants analysis,and advances in the era of big data. Information Sciences,2023,622:178-210. |
24 | Ricaud B, Aspert N, Miz V. Spikyball sampling:Exploring large networks via an inhomogeneous filtered diffusion. Algorithms,2020,13(11):275. |
25 | Elseidy M, Abdelhamid E, Skiadopoulos S,et al. GraMi:Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment,2014,7(7):517-528. |
26 | Shi Y F, Yu Z W, Cao W M,et al. Fast and effective active clustering ensemble based on density peak. IEEE Transactions on Neural Networks and Learning Systems,2021,32(8):3593-3607. |
/
〈 |
|
〉 |