Advanced Computer Architecture, Chapter 3

§3.3 Basic Performance Metrics
Workload Metrics
– Execution time: depending on algorithm, data structure, input data, platform, and language
– Instruction count: depending on input data, platform (RISC, CISC), compiler
– Floating-point count: normally it is independent
§3.1.1 Micro Benchmarks
Name                 Measuring
LINPACK (Top 500)    Numerical computing (linear algebra)
LMBENCH              System calls and data movement operations in Unix
STREAM               Memory bandwidth
1) Execution Time
The most direct approach is to run the user's application on the target machine and measure the elapsed wall-clock time, but this approach is sometimes difficult to apply.
Execution time is critical to some applications, such as real-time applications.
Execution time alone does not give much clue to the true performance of the machine.
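The wall-clock measurement described above can be sketched as follows. This is a minimal illustration, not a full benchmarking harness: `measure_wall_clock` and `sample_workload` are hypothetical names, and taking the best of several repeats is an assumption made here to reduce noise.

```python
import time

def measure_wall_clock(workload, repeats=3):
    """Run `workload` (a zero-argument callable) `repeats` times and
    return the smallest elapsed wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()      # monotonic, high-resolution clock
        workload()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best

def sample_workload():
    # Hypothetical stand-in for the user's application.
    return sum(i * i for i in range(100_000))

if __name__ == "__main__":
    print(f"elapsed: {measure_wall_clock(sample_workload):.6f} s")
```

The number printed depends entirely on the machine and its current load, which is one reason execution time alone says little about the machine's true performance.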
§3.1.4 SPEC Benchmark Family
SPEC (Standard Performance Evaluation Corporation) emphasizes developing real-application benchmarks that closely reflect actual workloads.
SPEC defines a few metrics (2 in many cases) that measure the overall performance of the entire system.
A benchmark suite = a set of benchmark programs + a set of specific rules governing the test conditions and procedures, including
– the tested platform environment,
– the input data,
– the output results, and
– the performance metrics
SPEC89, SPEC92, SPEC95 (CPU-intensive applications):
– The SPEC95 CPU benchmarks are the most famous SPEC benchmarks, widely used by vendors and users
– They measure the CPU speed, the cache/memory system, and the compiler as a whole
2) Processing Speed
For many applications, users are interested in achieving a certain processing speed rather than meeting an execution time limit.
The application may process different data inputs with different workloads, thus taking different execution times.
However, the speed requirement should be maintained across these inputs.
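A small sketch of the idea, assuming the usual definition speed = workload / execution time (the slide does not spell the formula out). The workloads and times below are made-up values chosen only to show two inputs with different execution times but the same speed.

```python
def processing_speed(workload, execution_time_s):
    """Speed = workload / execution time, e.g. floating-point operations per second."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return workload / execution_time_s

# Hypothetical runs: (workload in flop, execution time in seconds).
runs = [(2.0e9, 4.0), (5.0e9, 10.0)]
speeds = [processing_speed(w, t) for w, t in runs]

for (w, t), s in zip(runs, speeds):
    print(f"workload={w:.1e} flop, time={t:.1f} s, speed={s / 1e6:.0f} Mflop/s")
```

Both runs sustain 500 Mflop/s even though the second takes 2.5 times longer, so a speed requirement can hold where a fixed time limit would not.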
SPEC2000
– 12 integer programs (C: 11, C++: 1) → SPECint2000
– 14 floating-point programs (Fortran 77: 6, Fortran 90: 4, C: 4) → SPECfp2000
§3.2 Performance versus cost
SPEC95:
– 8 integer programs → SPECint95
– 10 floating-point programs → SPECfp95
– All SPEC95 results are expressed as ratios relative to the reference machine, a Sun SPARCstation 10/40
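The ratio-to-reference idea for SPEC95 results can be sketched as below. Two assumptions go beyond the slide: each ratio is taken as reference time divided by measured time, and the per-program ratios are combined with a geometric mean, which is how SPEC composite scores are generally described. All run times are invented for illustration.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """Ratio relative to the reference machine; larger means faster."""
    return reference_time_s / measured_time_s

def geometric_mean(values):
    """Geometric mean computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds for three benchmark programs.
reference_times = [100.0, 200.0, 400.0]   # on the reference machine
measured_times = [50.0, 50.0, 50.0]       # on the machine under test

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
composite = geometric_mean(ratios)
print(ratios, round(composite, 3))
```

With ratios of 2, 4 and 8, the geometric mean is 4, whereas the arithmetic mean would be about 4.67 and would weight the fastest program more heavily.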
(2) According to macro or micro:
– Macro benchmark → measures the performance of the system as a whole
– Micro benchmark → measures the performance from a specific aspect, such as CPU speed, memory access time, I/O speed, OS performance, or networking
Chapter 3. Performance Metrics and Benchmarks
§3.1 System and Application Benchmarks
1. Definition of benchmark: A benchmark is a performance testing program that supposedly captures the processing and data movement characteristics of a class of applications
A benchmark family = a set of benchmark suites
2. The main goals of benchmark
Benchmarks are used to measure and predict the performance of computer systems, and to reveal their architectural weaknesses and strengths
4) Utilization
Utilization is the ratio of the achieved speed to the peak speed of a given computer.
– A sequential application executing on a single MPP processor has a utilization ranging from 5% to 40%, typically 8% to 35%
– A parallel application executing on multiple processors has a utilization ranging from 1% to 35%, typically 4% to 20%
– Some benchmarks can reach higher utilization, for example: ASCI White Pacific IBM SP POWER3 (375MHz) reaches U = 7.226/12.3 = 58.7%, and the NEC Earth Simulator reaches U = 35.8/40.96 = 87.4%
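The two utilization figures quoted above can be checked with a few lines. The helper name `utilization` is illustrative; the speeds are the Tflop/s values given in the text.

```python
def utilization(achieved_speed, peak_speed):
    """Utilization U = achieved speed / peak speed (both in the same unit)."""
    if peak_speed <= 0:
        raise ValueError("peak speed must be positive")
    return achieved_speed / peak_speed

# Values from the text, in Tflop/s.
asci_white = utilization(7.226, 12.3)
earth_simulator = utilization(35.8, 40.96)

print(f"ASCI White:      U = {asci_white:.1%}")       # 58.7%
print(f"Earth Simulator: U = {earth_simulator:.1%}")  # 87.4%
```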
5) Cost-effectiveness
Cost-effectiveness refers to a high percentage of the CPU-hours being used for useful computing, instead of being wasted on load imbalance, communication overheads, etc.
A good indicator of cost-effectiveness is the utilization, which is the ratio of the achieved speed to the peak speed of a given computer.
Nov. 2006 Top500
June 2003: first four of the Top 500
Rank  Computer                              Number of processors  Rmax (Tflops)  Rpeak (Tflops)  Nmax     Country
1     NEC Earth-Simulator                   5120                  35.8           40.96           1075200  Japan
2     HP SC ES 45                           8192                  13.88          20.48           633000   USA
3     MCR Linux Cluster Xeon 2.4GHz         2304                  7.634          11.06           350000   USA
4     IBM ASCI White SP Power3              8192                  7.304          12.288          518096   USA
4'    Shenzhou IV (神州 IV) Alpha 800MHz     8192                  N.A            13.107          N.A      China
52    Lenovo DeepComp 1800 (联想 深腾1800)   512 (Xeon 2GHz)       1.046          2.048           153600   China

3. Classification of benchmarks
(1) According to application classes:
– scientific computing
– commercial applications
– network services
– multimedia applications
– signal processing

6) Performance/cost
It is defined as the ratio of the speed to the purchase price, expressed in Gflop/s per $M (million dollars).
Sustained performance/cost should be used, not peak performance/cost.
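The performance/cost metric from item 6), Gflop/s per $M, can be sketched as follows. The machine below is entirely hypothetical: its sustained speed, peak speed and price are made-up numbers used only to show why the sustained figure is the one to quote.

```python
def performance_per_cost(speed_gflops, price_million_usd):
    """Performance/cost in Gflop/s per $M (million dollars)."""
    if price_million_usd <= 0:
        raise ValueError("price must be positive")
    return speed_gflops / price_million_usd

# Hypothetical machine (all three values are illustrative assumptions).
sustained_gflops = 500.0
peak_gflops = 1000.0
price_million_usd = 2.0

sustained_ratio = performance_per_cost(sustained_gflops, price_million_usd)
peak_ratio = performance_per_cost(peak_gflops, price_million_usd)
print(f"sustained: {sustained_ratio:.0f} Gflop/s per $M")
print(f"peak:      {peak_ratio:.0f} Gflop/s per $M")
```

For this made-up machine the peak-based figure is twice the sustained one, so quoting peak performance/cost would overstate what a buyer actually gets.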
§3.1.2 Parallel Computing Benchmarks
1. The NPB Suite: NAS Parallel Benchmarks
2. PARKBENCH: PARallel Kernels and BENCHmarks
3. The Parallel STAP Suite: Space-Time Adaptive Processing
§3.1.3 Business and TPC Benchmarks
TPC: Transaction Processing Performance Council
The most popular benchmark for commercial applications is the TPC-C benchmark
3) System Throughput
Throughput is defined as the number of jobs processed per unit time.
Throughput is usually used when multiple jobs are executed simultaneously.
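As a minimal sketch of the throughput definition, assume each job is recorded as a (start, finish) pair in seconds and that the measurement window runs from the earliest start to the latest finish. The function name and the job list are hypothetical.

```python
def system_throughput(jobs):
    """Jobs completed per second over the window that spans all jobs.

    `jobs` is a list of (start_s, finish_s) pairs; jobs may overlap in time.
    """
    if not jobs:
        raise ValueError("at least one job is required")
    window_start = min(start for start, _ in jobs)
    window_end = max(finish for _, finish in jobs)
    if window_end <= window_start:
        raise ValueError("measurement window must have positive length")
    return len(jobs) / (window_end - window_start)

# Hypothetical schedule: three overlapping jobs.
jobs = [(0.0, 4.0), (1.0, 5.0), (2.0, 10.0)]
throughput = system_throughput(jobs)
print(f"{throughput:.2f} jobs/s")
```

Here three jobs finish within a 10-second window, giving 0.3 jobs/s, even though the individual execution times are 4, 4 and 8 seconds; this is why throughput and execution time can rank systems differently.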
Performance: how to measure
1) Execution Time
2) Processing Speed
3) System Throughput
4) Utilization
5) Cost-effectiveness
6) Performance/cost
• These performance requirements could lead to quite different conclusions for the same application on the same computer platform