Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (2): 123–132.


 An interactive model based on regression tree and K-nearest neighbor
for storage device performance prediction

 Guo Chang-Hui 1,2, Liu Gui-Quan 1,2**, Zhang Lei 1,2
  

  • Online: 2015-05-22 Published: 2015-05-22
  • About author: (1. School of Computer Science and Technology, University of Science and Technology of China,
    Hefei, 230027, China; 2. Anhui Province Key Laboratory for Computing and Communication Software,
    University of Science and Technology of China, Hefei, 230027, China)
  • Funding:
     Fundamental Research Funds for the Central Universities

Abstract:  Storage device performance prediction is a significant element of self-managed storage systems and
application planning tasks such as data assignment. Traditional approaches, namely accurate simulations and
analytic models, require substantial expert knowledge of the storage device, and they become impractical as
storage devices grow more high-end and complex. Machine learning methods instead treat the storage device as a
black box and need no information about its internal components or scheduling algorithms, which makes them better
suited to the current trend in storage device development; their drawback is that prediction accuracy is not yet
high enough. Modeling storage devices with classification and regression trees (CART) is simple. This work
proposes an interactive model that combines the regression tree with the K-nearest neighbor (KNN) algorithm, two
methods with potentially complementary strengths and weaknesses, to obtain higher prediction accuracy. Experiments
show that the proposed hybrid model achieves higher prediction accuracy and better stability than either the
regression tree or KNN alone. In addition, the workload characterization is designed to account for a very
important feature, the caching effect, which substantially improves prediction accuracy.
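
The abstract does not spell out how the regression tree and KNN interact, so the following is only an illustrative sketch of one possible combination, not the authors' algorithm: a small regression tree partitions the workload feature space, and the prediction for a request is the mean target of its k nearest training samples inside the matching leaf. The feature layout (request size in KB, a random-access flag), the synthetic latency values, and all function names are assumptions made for this example.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree over rows = [(features, target), ...].

    Leaves keep their training rows so that KNN can run inside them.
    """
    if depth >= max_depth:
        return {"leaf": rows}
    best = None
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no split satisfies the minimum leaf size
        return {"leaf": rows}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(left, depth + 1, max_depth, min_leaf),
        "right": build_tree(right, depth + 1, max_depth, min_leaf),
    }


def find_leaf(tree, x):
    """Follow the split rules down to the leaf that contains x."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def predict(tree, x, k=3):
    """Average the targets of the k nearest training rows in x's leaf."""
    leaf = find_leaf(tree, x)
    by_distance = sorted(
        leaf, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x))
    )
    return mean([y for _, y in by_distance[:k]])


# Hypothetical training data: (request size in KB, random-access flag) -> latency in ms.
rows = [
    ((size, rnd), 1.0 + 0.1 * size + 5.0 * rnd)
    for size in (4, 8, 16, 32, 64)
    for rnd in (0, 1)
]
tree = build_tree(rows)
print(predict(tree, (16, 1), k=1))
print(predict(tree, (24, 0), k=2))
```

Keeping the training rows in each leaf is the design choice that lets the two models cooperate in this sketch: the tree narrows the search to samples with similar workload characteristics, and KNN then refines the estimate locally instead of returning a single constant per leaf.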











