Journal of Nanjing University (Natural Science), 2018, Vol. 54, Issue (1): 85–.


 UCM-PPM: Multi-parameter Web prediction model based on user classification

 Wang Zhuojun, Shen Derong*, Nie Tiezheng, Kou Yue, Yu Ge

  • Online: 2018-01-31 Published: 2018-01-31
  • About author: School of Computer Science and Engineering, Northeastern University, Shenyang, 110000, China
  • Supported by:
     National Natural Science Foundation of China (61472070, 61672142)
    Received: 2017-12-09
    *Corresponding author, E-mail: shenderong@cse.neu.edu.cn

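Summary:  The Web has developed rapidly over the past few decades, and low latency and fast response have become increasingly important. To meet this demand, files that a user is about to access are usually prefetched into a cache and served from the proxy server cache, which avoids network congestion and improves Web access efficiency. An effective prediction model is therefore essential to prefetching. Current cache-prefetching work pays too little attention to differences among users and relies on a single metric. To address these weaknesses, we propose a Web prediction model based on user grading that adjusts multiple parameters dynamically as Web requests arrive. The model analyzes the trend in the distribution of user accesses on the proxy server and divides the user set into several grades of differing importance; where appropriate, it uses sequence similarity to cluster the sessions generated by low-contribution users. Then, building on the Prediction by Partial Match model and drawing on cache replacement strategies, it constructs a multi-parameter objective function for the nodes of the prediction tree and enables the constructed model to adjust itself adaptively. Finally, experiments show that the model effectively improves cache prefetching performance.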

Abstract:  Over the past few decades the Web has developed rapidly, and the demand for low latency and fast response has become increasingly urgent. To meet this demand, prefetching techniques are widely used: documents are fetched into caches before they are requested, which helps avoid network congestion and raises access efficiency. An effective prediction model is therefore essential to prefetching. For the sake of accuracy and practicability, we use the Prediction by Partial Match (PPM) suffix tree as the underlying model for predicting web pages. We identify two deficiencies in current cache-prefetching work: it neglects differences among users, and it relies on a single metric. We then present a multi-parameter web prediction model, based on a user hierarchy, with self-adaptive adjustment. The main contributions are as follows. First, we propose a user classification model based on the historical access log. User behaviors are analyzed to obtain the user permutation distribution, and the model then classifies users into categories according to the distribution of user contribution degree. Users with different contribution degrees should carry different weights. In addition, for users with very low contribution, we align their page access sequences and cluster them. Second, we present a method for constructing the prediction model that assigns each node an objective function of multiple parameters. The objective function is built from elements related to cache replacement strategies, such as page access heat and the accumulated user classification weight based on access frequency. We regard the node with the maximum value as the one with the strongest predictive ability. We also establish an adjustment mechanism that operates while the prediction tree is in use, so that the model can learn continuously and adjust dynamically. Finally, we compare our model with several existing models through experiments. Our model performs better in prediction accuracy and cache hit ratio, and better results can be obtained by adjusting the model parameters.
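
The abstract does not state the grading thresholds or the form of the node objective function, so the following Python sketch is only an illustration of the general idea, not the authors' method. It grades users by their cumulative share of requests and then trains a small PPM-style predictor in which each transition count is scaled by the weight of the user's grade. The names `grade_users`, `WeightedPPM`, `high_share`, and the weights 3.0 and 1.0 are all hypothetical choices made for this example.

```python
from collections import defaultdict


def grade_users(request_counts, high_share=0.5):
    """Grade users 'high' until the busiest users cover high_share of all requests.

    Hypothetical two-grade rule; the source does not give its actual thresholds.
    """
    total = sum(request_counts.values())
    grades, covered = {}, 0
    for user, count in sorted(request_counts.items(), key=lambda kv: -kv[1]):
        grades[user] = "high" if covered < high_share * total else "low"
        covered += count
    return grades


class WeightedPPM:
    """Order-limited PPM-style predictor with user-weighted transition counts."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # context (tuple of pages) -> next page -> accumulated weight
        self.table = defaultdict(lambda: defaultdict(float))

    def train(self, session, user_weight=1.0):
        """Record every context of length 1..max_order that precedes each page."""
        for i, page in enumerate(session):
            for order in range(1, self.max_order + 1):
                if i - order < 0:
                    break
                context = tuple(session[i - order:i])
                self.table[context][page] += user_weight

    def predict(self, recent):
        """Try the longest matching context first, then fall back to shorter ones."""
        for order in range(min(self.max_order, len(recent)), 0, -1):
            followers = self.table.get(tuple(recent[-order:]))
            if followers:
                return max(followers, key=followers.get)
        return None


# Toy data (made up for illustration): requests per user and one session each.
request_counts = {"u1": 60, "u2": 30, "u3": 10}
sessions = {"u1": ["A", "B", "C"], "u2": ["A", "B", "D"], "u3": ["A", "B", "D"]}
grade_weight = {"high": 3.0, "low": 1.0}  # assumed weights, not from the paper

grades = grade_users(request_counts)
model = WeightedPPM(max_order=2)
for user, session in sessions.items():
    model.train(session, grade_weight[grades[user]])

print(model.predict(["A", "B"]))
```

In this toy data two low-grade users follow A, B with D while one high-grade user follows it with C, so the weighting changes which page wins. A fuller version would replace the single weight with an objective function that also accounts for page access heat, as the abstract describes.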
