Performance Measurement 1
Performance
- Execution time (latency): the time between the start and the completion of an event. Performance ∝ 1/(Execution time): performance is inversely proportional to execution time.
- Throughput (bandwidth): the total amount of work done in a given time.
Amdahl’s Law 1
Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
Amdahl's Law can also be applied to compare two CPU design alternatives. For example: implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. Compare these two design alternatives.
Make the Common Case Fast
Improving the frequent event, rather than the rare event, will obviously help performance.
- Example: the overflow case and the no-overflow case in addition.
Answer: we can compare these two alternatives by comparing the speedups:

\[ \text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22 \]

\[ \text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23 \]

Improving the performance of the FP operations overall is slightly better because of the higher frequency.
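The comparison of the two alternatives can be checked numerically. The following is a minimal Python sketch; the function name `amdahl_speedup` and its parameter names are illustrative choices, not part of the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR accounts for 20% of execution time and becomes 10x faster.
speedup_fpsqr = amdahl_speedup(0.2, 10.0)

# Alternative 2: all FP instructions account for 50% and become 1.6x faster.
speedup_fp = amdahl_speedup(0.5, 1.6)

print(f"FPSQR: {speedup_fpsqr:.2f}")  # FPSQR: 1.22
print(f"FP:    {speedup_fp:.2f}")     # FP:    1.23
```

The smaller per-operation gain (1.6 versus 10) wins here only because it applies to a larger share of the execution time.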
Amdahl’s Law 3
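\[ \text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - f_E) + \frac{f_E}{s_E} \right) \]

where
- f_E: the fraction of the original execution time that can use the enhancement
- s_E: the improvement (speedup) gained by the enhancement mode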
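As a rough illustration of the execution-time relation, the sketch below splits the old execution time into the part that cannot use the enhancement and the part that can. The function name and the 100-second baseline are hypothetical values chosen only for the example.

```python
def new_execution_time(old_time: float, f_e: float, s_e: float) -> float:
    """New execution time after enhancing a fraction f_e of the work by s_e."""
    unenhanced_part = old_time * (1.0 - f_e)  # runs at the original speed
    enhanced_part = old_time * f_e / s_e      # runs s_e times faster
    return unenhanced_part + enhanced_part


old_time = 100.0  # hypothetical baseline, in seconds
new_time = new_execution_time(old_time, f_e=0.2, s_e=10.0)
print(f"{new_time:.1f} s")  # 82.0 s
```

Even with a tenfold improvement, 80 of the original 100 seconds are untouched, which is why the total drops only to 82 seconds.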
Make the Common Case Fast
Perhaps the most important and pervasive principle of computer design is to make the common case fast: in making a design trade-off, favor the frequent case over the infrequent case.
Performance Measurement 2
Example: Machine A runs a program in 10 seconds and Machine B runs the same program in 15 seconds, so A is __% faster than B.

\[ 1 + \frac{n}{100} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1 + \frac{50}{100} \;\Rightarrow\; n = 50 \]
Amdahl’s Law 2
\[
\begin{aligned}
\text{Speedup} &= \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}} \\
&= \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}
\end{aligned}
\]
Amdahl’s Law 3
\[ \text{Speedup} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}} \]

That is, the speedup is the original execution time divided by the new execution time.
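The step from the execution-time relation to the speedup formula can be spelled out as follows. This is a short derivation sketch using the same symbols f_E and s_E; the old execution time cancels, so the speedup depends only on those two quantities.

\[
\begin{aligned}
\text{Speedup} &= \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} \\
&= \frac{\text{Execution time}_{old}}{\text{Execution time}_{old} \times \left( (1 - f_E) + \dfrac{f_E}{s_E} \right)} \\
&= \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}
\end{aligned}
\]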
Performance Measurement 1
Machine X is n% faster than Machine Y:

\[ 1 + \frac{n}{100} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} \]
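Solving the relation above for n gives n = 100 × (Execution time_Y / Execution time_X − 1). A minimal Python sketch of that calculation follows; the function and parameter names are illustrative, and the inputs are the 10-second and 15-second run times from the Machine A / Machine B example.

```python
def percent_faster(time_slower: float, time_faster: float) -> float:
    """Return n such that the faster machine is n% faster than the slower one."""
    return (time_slower / time_faster - 1.0) * 100.0


# Machine A takes 10 s, Machine B takes 15 s for the same program.
n = percent_faster(time_slower=15.0, time_faster=10.0)
print(n)  # 50.0
```

Note that the ratio is taken against the faster machine's time, so "A is 50% faster than B" does not mean that B's time is 50% longer than a common baseline.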
Amdahl’s Law 4
Example: An enhancement runs 10 times faster than the original machine, but it is usable only 40% of the time. The speedup = __.

Solution: f_E = 0.4, s_E = 10

\[ \text{Speedup} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56 \]
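To see how the usable fraction limits the gain, the sketch below keeps f_E = 0.4 from this example and varies s_E. The function name and the chosen s_E values are assumptions made for illustration. As s_E grows, the f_E/s_E term shrinks toward zero, so the speedup approaches 1/(1 − f_E) but never reaches it.

```python
def amdahl_speedup(f_e: float, s_e: float) -> float:
    """Overall speedup for enhanced fraction f_e and enhancement factor s_e."""
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


f_e = 0.4  # fraction of time the enhancement is usable (from the example)

for s_e in (2, 10, 100, 1000):
    print(f"s_E = {s_e:>4}: speedup = {amdahl_speedup(f_e, s_e):.4f}")

# Bound obtained by letting the enhanced part take no time at all.
upper_bound = 1.0 / (1.0 - f_e)
print(f"upper bound: {upper_bound:.4f}")
```

With s_E = 10 this gives 1.5625, matching the solution above; raising s_E a hundredfold beyond that adds only about 0.1 to the overall speedup.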