Performance impact analysis of front side bus competition on memory-intensive programs on the Intel Bensley platform*

Mao Xiao-Wei 1**, Tao Xian-Ping 1, He Wan-Qing 2

Journal of Nanjing University (Natural Sciences), 2010, Vol. 46, Issue 2: 149-158.


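Summary

Symmetric multiprocessor (SMP) cluster systems have become the mainstream architecture of high performance computing thanks to their favorable cost-performance ratio and good scalability. Within such clusters, the Intel 2-way quad-core platform has gradually become the mainstream single-node platform for high performance computing servers. Because the four cores of one CPU share a single front side bus, and the two front side buses are not completely independent, front side bus competition has a large effect on the performance of memory-intensive programs. Targeting the characteristics of the Intel Bensley 2-way quad-core platform, this paper presents a computational model of the performance impact of front side bus competition on memory-intensive message passing interface (MPI) programs, and validates the model both with a purpose-written test program and with real application examples.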

Abstract

Symmetric multiprocessor (SMP) clusters are the mainstream architecture in high performance computing (HPC) because of their good cost-performance ratio and excellent scalability, and the Intel 2-way Quad-Core platform is the mainstream single-node platform. However, on the popular Intel 2-way Quad-Core platform named Bensley, front side bus (FSB) competition heavily affects the performance of memory-intensive applications, because the four cores in each CPU share a single FSB and the two FSBs are not completely independent. Message Passing Interface (MPI) is both a specification and an implementation that allows many computers to communicate with one another. It is widely accepted in the parallel computing community because of its high performance, scalability, and portability.

This paper gives a model to predict the performance impact of FSB competition on memory-intensive MPI applications on the Intel Bensley 2-way Quad-Core platform. To discuss the issue, we introduce a new variable called Speeddown to describe the performance decline caused by FSB competition. Generally, a complex HPC MPI application can be divided into a number of basic blocks, within each of which bus utilization is continuous and balanced. By analyzing the address bus utilization and data bus utilization of the system when a single basic block process runs bound to core 0, together with the relationship between bus utilization and the amount of data read from and written back to memory, we derive equations that predict the Speeddown when 2, 4, or 8 basic block processes run bound to different cores. For complex memory-intensive MPI applications, we focus on computing time to study the performance impact of FSB competition. Since computing time can be divided into serial time and parallel time, we analyze the Speeddown of each separately when 4 or 8 processes are created and bound to particular cores. A method is then introduced to merge them into the final performance impact model.

A test application was written to validate the performance impact model for basic blocks. The test results show that the predicted and measured Speeddown are highly consistent. We also use five benchmarks (BT, EP, IS, LU and MG) from the NAS Parallel Benchmarks (NPB) as real HPC applications to validate the performance impact model for memory-intensive MPI applications under FSB competition, and the test results meet our expectations.
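
The abstract names Speeddown and a step that merges the serial and parallel parts, but it does not state either formula. As a rough illustration only, the sketch below assumes that Speeddown is the ratio of runtime under FSB competition to runtime alone, and that the two parts are merged by weighting each Speeddown with its share of the uncontended computing time. Both assumptions, and the function names `speeddown` and `merged_speeddown`, are hypothetical and are not taken from the paper.

```python
def speeddown(contended_time: float, solo_time: float) -> float:
    """Assumed definition: runtime under competition divided by runtime alone.

    A value of 1.0 means no slowdown; larger values mean a larger decline.
    """
    if solo_time <= 0:
        raise ValueError("solo_time must be positive")
    return contended_time / solo_time


def merged_speeddown(serial_time: float, parallel_time: float,
                     serial_sd: float, parallel_sd: float) -> float:
    """Assumed merge rule: time-weighted average of the two Speeddown values.

    serial_time and parallel_time are uncontended durations in the same unit.
    """
    total = serial_time + parallel_time
    if total <= 0:
        raise ValueError("total computing time must be positive")
    contended_total = serial_time * serial_sd + parallel_time * parallel_sd
    return contended_total / total


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    parallel_sd = speeddown(contended_time=12.0, solo_time=8.0)
    print(merged_speeddown(2.0, 8.0, 1.0, parallel_sd))
```

Under these assumptions the merged value always lies between the serial and parallel Speeddown, and it moves toward the parallel value as the parallel share of computing time grows.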
 

Cite this article

Mao Xiao-Wei, Tao Xian-Ping, He Wan-Qing. Performance impact analysis of memory-intensive application by front side bus competition on Intel Bensley platform [J]. Journal of Nanjing University (Natural Sciences), 2010, 46(2): 149-158


Funding

National "863" Program (2007AA01Z178), Natural Science Foundation of Jiangsu Province (BK2006712)
