基于事务型滑动窗口的数据流中高效用项集挖掘算法

南京大学学报(自然科学版) ›› 2014, Vol. 50 ›› Issue (4): 494–.

基于事务型滑动窗口的数据流中高效用项集挖掘算法

宋威,刘明渊,李晋宏

出版日期:2014-08-23 发布日期:2014-08-23
作者简介: 北方工业大学信息工程学院，北京，100144
基金资助:
国家自然科学基金(61105045),北方工业大学科研人才提升计划(CCXZ201303)

High utility itemsets mining algorithm over data stream based on transaction-sensitive sliding window

Song Wei, Liu Mingyuan, Li Jinhong

Online:2014-08-23 Published:2014-08-23
About author: College of Information Engineering, North China University of Technology, Beijing, 100144, China

摘要/Abstract

摘要： 由于能反映用户的偏好，可以弥补传统频繁项集挖掘仅由支持度来衡量项集重要性的不足，高效用项集正在成为当前数据挖掘研究的热点。为使高效用项集挖掘更好地适应数据流环境，提出了一种基于事务型滑动窗口的数据流中高效用项集挖掘算法MHUIDS。首先在二进制向量的基础上，提出了高事务加权效用项集树(HTWUI-树)的结构。进而分别给出了事务型滑动窗口初始化与滑动的算法描述。最后，提出了高效用项集的剪枝策略与挖掘算法。实验结果表明，MHUIDS算法具有较高的挖掘效率及较低的存储开销。

Abstract: Traditional frequent itemset mining methods consider an equal profit/weight for all items and only binary occurrences (0/1) of the items in transactions. By considering the non-binary frequency values of items in transactions and different profit values for each item, high utility itemset mining becomes a very important research issue in the field of data mining. Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility like profits. Although a number of relevant algorithms have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets degrades the mining performance in terms of execution time and space requirement. The situation may become worse when the database contains lots of long transactions or long high utility itemsets. Meanwhile, data stream mining is an emerging research topic in the data mining and knowledge discovery community. Finding frequent itemsets is one of the most important tasks in data stream mining with wide applications like online e-business and web click-stream analysis. However, there are few works on how to discover high utility itemsets in data stream. To meet the scenario of data stream, an algorithm based on transaction-sensitive sliding window, called MHUIDS (Mine High Utility Itemsets over Data Stream), for mining high utility itemsets over data stream is proposed. Firstly, the problem of high utility itemsets mining in transaction-sensitive sliding window is defined formally. Secondly, a tree structure, called the High Transaction-Weighted Utilization Itemset Tree (HTWUI-Tree), is introduced based on binary vector. By recording itemset as well as utility and bit vectors, HTWUI-Tree can describe the searching space effectively, which lays foundation for quickening the process of high utility itemset mining. Then, initialization and sliding algorithms of transaction-sensitive sliding window are described respectively, which are used for constructing and modifying HTWUI-Tree. Finally, pruning strategies and mining algorithm are proposed, which can reduce the number of candidate itemsets and lower the cost of database scanning. Experimental results show that MHUIDS algorithm is efficient and consumes low storage cost.

宋威,刘明渊,李晋宏. 基于事务型滑动窗口的数据流中高效用项集挖掘算法[J]. 南京大学学报(自然科学版), 2014, 50(4): 494–.

Song Wei, Liu Mingyuan, Li Jinhong. High utility itemsets mining algorithm over data stream based on transaction-sensitive sliding window[J]. Journal of Nanjing University(Natural Sciences), 2014, 50(4): 494–.

参考文献

[1] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In: Bocca J B, Jarke M, Zaniolo C. The 20th International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann, 1994: 487~499.
[2] 宋威,李晋宏,徐章艳等.一种新的频繁项集精简表示方法及其挖掘算法的研究.计算机研究与发展,2010,47(2): 277~285.
[3] Liu Y, Liao W K, Choudhary A N. A two-phase algorithm for fast discovery of high utility itemsets. In: Ho T B, Cheung D W L, Liu H. The 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin: Springer, 2005: 689~695.
[4] Lan G C, Hong T P, Tseng V S. An efficient projection-based indexing approach for mining high utility itemsets. Knowledge and Information Systems, 2014, 38(1): 85~107.
[5] 高阳.中国数据挖掘研究进展.南京大学学报(自然科学), 2011, 47(4): 351~353.
[6] Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In: Berry M W, Dayal U, Kamath C, et al. The 4th SIAM International Conference on Data Mining. Philadelphia: SIAM Press, 2004, 482~486.
[7] Yao H, Hamilton H J. Mining itemset utilities from transaction databases. Data & Knowledge Engineering, 2006, 59(3): 603~626.
[8] Li Y C, Yeh J S, Chang C C. Isolated items discarding strategy for discovering high utility itemsets. Data & Knowledge Engineering, 2008, 64(1): 198~217.
[9] Han J, Pei J, Yin Y, et al. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53~87.
[10] Tseng V S, Wu CW, Shie BE, et al. UP-growth: An efficient algorithm for high utility itemset mining. In: Rao B, Krishnapuram B, Tomkins A, et al. The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2010, 253~262.
[11] Tseng V S, Shie B E, Wu C W, et al. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(8): 1772~1786.
[12] Ahmed C F, Tanbeer S K, Jeong B-S, et al. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(12): 1708~1721.
[13] Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Applied Intelligence, 2014, 40(1): 29~43.
[14] 李海峰,章宁,朱建明等.时间敏感数据流上的频繁项集挖掘算法.计算机学报,2012,35(11):2283~2293.
[15] 吴枫,仲妍,吴泉源.基于时间衰减模型的数据流频繁模式挖掘.自动化学报,2010,36(5):674~684.
[16] Chu C J, Tseng V S, Liang T. An efficient algorithm for mining temporal high utility itemsets from data streams. Journal of Systems and Software, 2008, 81(7):1105~1117.
[17] Li H F, Huang H Y, Chen Y C, et al. Fast and memory efficient mining of high utility itemsets in data streams. In: The 8th IEEE International Conference on Data Mining.New York: IEEE Press, 2008: 881~886.
[18] Ahmed C F, Tanbeer S K, Jeong B S. Efficient mining of high utility patterns over data streams with a sliding window method. In: Chowdhury M U, Ray S, Lee R Y. The 11th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing. New York: IEEE Press, 2010: 99~113.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed