基于主动学习的模式兴趣评估方法

doi:10.13232/j.cnki.jnju.2025.02.006

PDF(1272 KB)

南京大学学报(自然科学版) ›› 2025, Vol. 61 ›› Issue (2) : 249-260. DOI: 10.13232/j.cnki.jnju.2025.02.006

基于主动学习的模式兴趣评估方法

王璐, 付勋, 沈玲珍, 蒋星, 王欣()

作者信息 +

Pattern interestingness evaluation based on active learning

Lu Wang, Xun Fu, Lingzhen Shen, Xing Jiang, Xin Wang()

Author information +

文章历史 +

摘要

频繁模式挖掘（Frequent Pattern Mining，FPM）是图数据分析、挖掘领域的核心问题之一，其目的是从大规模图数据中发现支持度不低于指定阈值的模式.传统的频繁模式挖掘算法依赖支持度进行剪枝，返回结果往往包含大量“冗余”模式；top⁃k模式挖掘算法虽然仅返回k个频繁模式，但该类算法主要依据“客观”指标，如支持度等，对模式进行评估，难以充分反映用户的主观兴趣偏好.针对上述问题，提出一种基于主动学习的模式兴趣评估方法（Pattern Interestingness Evaluation with Active Learning，PIEAL），通过主动学习策略，从采样图上挖掘的频繁模式中选择代表性模式，并利用有限次人机交互收集用户对这些模式的偏好，进而预测模式的兴趣分数，指导算法发现用户感兴趣的模式.在人机交互环节，PIEAL采用基于成对比较的策略来收集用户对模式的偏好反馈，有效降低了用户的主观评价难度.在真实数据集上的实验结果表明，PIEAL仅需要少量的人机交互便可发现用户感兴趣的模式，其测试集准确率最高可达95%.

Abstract

Frequent Pattern Mining (FPM) is crucial in graph data analysis and mining，aiming to discover patterns in large⁃scale graph data with support above a specified threshold. Traditional FPM algorithms rely on the support threshold for search space pruning，often resulting in excessive redundantpatterns. While algorithms for the top⁃k patterns mining only return k frequent patterns，their results often fail to fully reflect users' interests and preferences as these patterns are picked based on objectiveindicators such as support. To address these issues，we propose a Pattern Interestingness Evaluation with Active Learning (PIEAL) method. PIEAL employs an active learning strategy to select representative patterns from frequent patterns that are mined from a sample graph and leverages limited human⁃computer interaction to gather users’ preferences on representative patterns，therefore enabling the prediction of pattern interestingness and the discovery of patterns of interest to users. During the interaction phase，PIEAL adopts a pairwise comparison strategy to collect users' feedback on the representative patterns，effectively reducing the difficulty of subjective evaluation. Experimental results on real life graphs show that with few interactions PIEAL can discover patterns of interest to users，achieving an accuracy of up to 95% on the test set.

导出引用

王璐, 付勋, 沈玲珍, 蒋星, 王欣. 基于主动学习的模式兴趣评估方法[J]. 南京大学学报(自然科学版), 2025, 61(2): 249-260 https://doi.org/10.13232/j.cnki.jnju.2025.02.006

Lu Wang, Xun Fu, Lingzhen Shen, Xing Jiang, Xin Wang. Pattern interestingness evaluation based on active learning[J]. Journal of Nanjing University(Natural Sciences), 2025, 61(2): 249-260 https://doi.org/10.13232/j.cnki.jnju.2025.02.006

中图分类号： TP311

参考文献

1	Agrawal R， Srikant R. Fast algorithms for mining association rules in large databases∥Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco，CA，USA：Morgan Kaufmann Publishers Inc.，1994：487-499.
2	Han J W， Pei J， Yin Y W，et al. Mining frequent patterns without candidate generation：A frequent?pattern tree approach. Data Mining and Knowledge Discovery，2004，8(1)：53-87.
3	Wang X， Shi J H， Zou J J，et al. Supports estimation via graph sampling. Expert Systems with Applications，2024，240：122554.
4	Wang X， Lan Z， He Y A，et al. A cost?effective approach for mining near?optimal top?k patterns. Expert Systems with Applications，2022，202：117262.
5	Rodriguez A， Laio A. Clustering by fast search and find of density peaks. Science，2014，344(6191)：1492-1496.
6	Wu Y X， Meng Y F， Li Y，et al. COPP?Miner：Top?k contrast order?preserving pattern mining for time series classification. IEEE Transactions on Knowledge and Data Engineering，2024，36(6)：2372-2387.
7	Fournier?Viger P， Wang Y， Yang P，et al. TSPIN：Mining top?k stable periodic patterns. Applied Intelligence，2022，52(6)：6917-6938.
8	邹杰军，王欣，石俊豪，等. 面向大图的Top?Rank?K频繁模式挖掘算法. 南京大学学报(自然科学)，2024，60(1)：38-52.
	Zou J J， Wang X， Shi J H，et al. Top?Rank?K frequent pattern mining algorithm for large graphs. Journal of Nanjing University (Natural Science)，2024，60(1)：38-52.
9	Lee C， Kim H， Cho M，et al. Incremental Top?k high utility pattern mining and analyzing over the entire accumulated dynamic database. IEEE Access，2024，12：77605-77620.
10	Wang X， Tang L， Liu Y，et al. Diversified pattern mining on large graphs∥The 32nd International Conference on Database and Expert Systems Appli?cations. Virtual Event：Springer Cham Germany，2021：171-184.
11	Gal Y， Islam R， Ghahramani Z. Deep Bayesian active learning with image data∥Proceedings of the 34th International Conference on Machine Learning. Sydney，Australia：JMLR.org，2017：1183-1192.
12	Raj A， Bach F. Convergence of uncertainty sampling for active learning∥Proceedings of the 39th Inter?national Conference on Machine Learning. Balti?more，MD，USA：PMLR，2022：18310-18331.
13	Buchert F， Navab N， Kim S T. Toward label?efficient neural network training：Diversity?based sampling in semi?supervised active learning. IEEE Access，2023，11：5193-5205.
14	Wu T H， Liu Y C， Huang Y K，et al. ReDAL：Region?based and diversity?aware active learning for point cloud semantic segmentation∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal，Canada：IEEE，2021：15510-15519.
15	Liu J C， Wang Y W， Hooi B，et al. LSCALE：Latent space clustering?based active learning for node classification∥Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Turin，Italy：Springer Grenoble France，2023：55-70.
16	Yan X Y， Nazmi S， Gebru B，et al. A clustering?based active learning method to query informative and representative samples. Applied Intelligence，2022，52(11)：13250-13267.
17	Zhou P， Zhang T X， Zhao L W，et al. Pre?clustering active learning method for automatic classification of building structures in urban areas. Engineering Applications of Artificial Intelligence，2023，123：106382.
18	Yan X F， Han J W. gSpan：Graph?based substructure pattern mining∥Proceedings of the IEEE Inter?national Conference on Data Mining. Maebashi City，Japan：IEEE，2002：721-724.
19	Abdelhamid E， Abdelaziz I， Kalnis P，et al. ScaleMine：Scalable parallel frequent subgraph mining in a single large graph∥Proceedings of the International Conference for High Performance Computing，Networking，Storage and Analysis. Salt Lake City，UT，USA：IEEE，2016：716-727.
20	Yang J， Leskovec J. Defining and evaluating network communities based on ground?truth∥2012 IEEE 12th International Conference on Data Mining. Brussels，Belgium：IEEE，2012：745-754.
21	Leskovec J， Huttenlocher D， Kleinberg J. Predicting positive and negative links in online social networks∥Proceedings of the 19th International Conference on World Wide Web. Raleigh North，CA，USA：ACM，2010：641-650.
22	Takac L， Zabovsky M. Data analysis in public social networks∥International Scientific Conference & International Workshop Present Day Trends of Innovations. ?om?a，Poland：DTI，2012.
23	Ikotun A M， Ezugwu A E， Abualigah L，et al. K?means clustering algorithms：A comprehensive review，variants analysis，and advances in the era of big data. Information Sciences，2023，622：178-210.
24	Ricaud B， Aspert N， Miz V. Spikyball sampling：Exploring large networks via an inhomogeneous filtered diffusion. Algorithms，2020，13(11)：275.
25	Elseidy M， Abdelhamid E， Skiadopoulos S，et al. GraMi：Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment，2014，7(7)：517-528.
26	Shi Y F， Yu Z W， Cao W M，et al. Fast and effective active clustering ensemble based on density peak. IEEE Transactions on Neural Networks and Learning Systems，2021，32(8)：3593-3607.