Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (3): 351–359.


 A sliding-window-based Top-K query algorithm for uncertain data streams*

 Tang Ke-Ming 1,2, Dai Cai-Yan 1, Chen Ling 3,4**
  

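  • Publication date: 2015-06-18 Release date: 2015-06-18
  • About the authors: (1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016; 2. College of Information Science and Technology, Yancheng Teachers University, Yancheng, 224002;
    3. Department of Computer Science, Yangzhou University, Yangzhou, 225009; 4. State Key Lab of Novel Software Technology, Nanjing University, Nanjing, 210093)
  • Funding:
     National Natural Science Foundation of China (61070047), Natural Science Foundation of Jiangsu Province (BK2008206)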

 A Top-K query algorithm for uncertain data streams based on a sliding window

 Tang Ke-Ming 1,2, Dai Cai-Yan 1, Chen Ling 3,4
  

  • Online:2015-06-18 Published:2015-06-18
  • About author: (1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing,
    210016, China; 2. College of Information Science and Technology, Yancheng Teachers University, Yancheng,
    224002, China; 3. Department of Computer Science, Yangzhou University, Yangzhou, 225009, China; 4. State
    Key Lab of Novel Software Technology, Nanjing University, Nanjing, 210093, China)

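Abstract:  Uncertain data streams are widespread in practical applications such as mobile computing, radio frequency identification (RFID) and sensor networks, so processing queries quickly within limited storage is an important problem in uncertain data stream management. This paper studies Top-K queries over uncertain data streams under the sliding-window model and proposes a corresponding algorithm. The algorithm stores the uncertain stream data in a sliding-window data model and builds three synopsis tables, in which the tuples of the current window are kept in order of arrival, in order of score, and in order of existential probability, respectively. The algorithm repeatedly takes the highest-scoring first several tuples, selects from them the set of k tuples with the highest probabilities, and computes the probability of that set. We prove theoretically that the most probable of these k-tuple sets is the Top-K query result. Experimental results show that the proposed query algorithm outperforms other similar algorithms in both time and space complexity.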

Abstract:  Uncertain data streams arise in a wide spectrum of real-world applications, such as mobile computing, radio frequency identification (RFID) and wireless sensor networks, so uncertain data stream management has become an important problem in stream data mining. This paper tackles the problem of answering maximal probabilistic Top-K tuple set (MPTopKTS) queries on uncertain data streams under a sliding-window model, and presents an algorithm for processing such queries. Based on the sliding-window model, we design three synopsis tables to process each tuple, which contains a data item X, a score f(X), and an existential probability p(X). The tuples are stored in the tables according to their arrival times, their scores, and their probabilities, respectively. For each set formed by some number of the highest-scoring tuples, the algorithm selects the k tuples with the highest probabilities. It then computes the existential probability of each such Top-K tuple set and chooses the one with the highest probability as the answer to the MPTopKTS query. We prove the correctness of the algorithm theoretically. Our experimental results show that the algorithm needs less time and space than other similar algorithms.
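
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it recomputes the answer from scratch for one window instead of maintaining the three synopsis tables incrementally, and it assumes that tuples exist independently and have distinct scores. The function name `mp_topk_tuple_set` and the `(item, score, prob)` tuple layout are hypothetical, chosen only for this example.

```python
import heapq
from math import prod


def mp_topk_tuple_set(window, k):
    """Return (best_set, best_prob) for one window of uncertain tuples.

    window: list of (item, score, prob) triples (hypothetical layout).
    Assumptions: tuples exist independently, scores are distinct,
    and len(window) >= k.
    """
    # Order the window by score, highest first (the "score" view).
    by_score = sorted(window, key=lambda t: t[1], reverse=True)
    best_set, best_prob = None, 0.0
    # Try every prefix made of the i highest-scoring tuples.
    for i in range(k, len(by_score) + 1):
        prefix = by_score[:i]
        # Within the prefix, keep the k tuples with the highest probability.
        chosen = heapq.nlargest(k, prefix, key=lambda t: t[2])
        chosen_items = {t[0] for t in chosen}
        # The chosen tuples must all exist and the rest of the prefix
        # must all be absent for the chosen set to be the Top-K here.
        p = prod(t[2] for t in chosen) * prod(
            1.0 - t[2] for t in prefix if t[0] not in chosen_items
        )
        if p > best_prob:
            best_set, best_prob = chosen, p
    return best_set, best_prob


window = [("a", 9.0, 0.3), ("b", 8.0, 0.9), ("c", 7.0, 0.8), ("d", 6.0, 0.5)]
best_set, best_prob = mp_topk_tuple_set(window, k=2)
# best set is {b, c}, with probability about 0.504
print(sorted(t[0] for t in best_set), round(best_prob, 3))
```

In this example the highest-scoring tuple `a` is unlikely to exist, so the most probable Top-2 set is `{b, c}`: both must exist and `a` must be absent, which gives 0.9 × 0.8 × 0.7.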














