Computer Architecture Worked Examples
1.5 Quantitative Principles of Computer Design
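ANSWER: We can use the CPU performance formula:
$\text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$
$\text{CPU time}_A = \text{IC}_A \times 1.2 \times \text{Clock cycle time}_A$
$\text{Clock cycle time}_B = 1.25 \times \text{Clock cycle time}_A$
Compares are not executed on CPU B, so 20%/80% = 25% of its instructions are branches:
$\text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$
Because $\text{IC}_B = 0.8 \times \text{IC}_A$:
$\text{CPU time}_B = \text{IC}_B \times 1.25 \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times \text{IC}_A \times \text{Clock cycle time}_A$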
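The arithmetic above can be checked with a minimal Python sketch. It normalises CPU A's instruction count and clock cycle time to 1, which is an assumption made only for illustration; the helper name `cpu_time` is hypothetical.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time

# Normalise to CPU A: IC_A = 1 and clock cycle time_A = 1 (illustrative units).
cpi_a = 0.20 * 2 + 0.80 * 1           # 20% branches at 2 cycles, the rest at 1
time_a = cpu_time(1.0, cpi_a, 1.0)

# CPU B drops the compares, so it executes 80% of A's instructions,
# and branches become 0.20 / 0.80 = 25% of what remains.
branch_fraction_b = 0.20 / 0.80
cpi_b = branch_fraction_b * 2 + (1 - branch_fraction_b) * 1
time_b = cpu_time(0.8, cpi_b, 1.25)   # B's clock cycle is 1.25 times longer

print(f"CPU time A = {time_a:.2f}, CPU time B = {time_b:.2f}")
```

Both times are in units of IC_A × Clock cycle time_A, so they can be compared directly.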
3.2 The Basic Pipeline for DLX
ANSWER:
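* The average instruction execution time on the unpipelined machine is:
$\text{Average instruction time} = \text{Clock cycle} \times \text{Average CPI} = 10\,\text{ns} \times (0.4 \times 4 + 0.2 \times 4 + 0.4 \times 5) = 10\,\text{ns} \times 4.4 = 44\,\text{ns}$
* In the pipelined machine, the clock must run at the speed of the slowest stage plus overhead: 10 + 1 = 11 ns. This is the average instruction execution time.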
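ANSWER: We can compare the two alternatives by comparing their speedups:
$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$
$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{2.0}} = \frac{1}{0.75} \approx 1.33$
Improving the performance of the FP operations overall is slightly better because of their higher frequency.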
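The comparison can be reproduced with a short Python sketch of Amdahl's Law. The function name `amdahl_speedup` is a hypothetical helper; the fractions and factors are those of the FPSQR example (20% of execution time sped up 10 times, versus 50% sped up 2 times).

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Alternative 1: FPSQR hardware, 20% of execution time, 10 times faster.
speedup_fpsqr = amdahl_speedup(0.20, 10.0)

# Alternative 2: all FP instructions, 50% of execution time, 2 times faster.
speedup_fp = amdahl_speedup(0.50, 2.0)

print(f"FPSQR: {speedup_fpsqr:.2f}, all FP: {speedup_fp:.2f}")
```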
3.2 The Basic Pipeline for DLX
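* Thus, the speedup from pipelining is:
$\text{Speedup} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{44\,\text{ns}}{11\,\text{ns}} = 4$
* The 1 ns overhead essentially establishes a limit on the effectiveness of pipelining.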
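As a rough check, the speedup can be computed in a few lines of Python. The values come from the DLX example (10 ns clock, the given instruction mix, 1 ns of pipelining overhead); the variable names are illustrative only.

```python
clock_ns = 10.0
overhead_ns = 1.0

# Instruction mix: name -> (frequency, cycles on the unpipelined machine).
mix = {"alu": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}

# Unpipelined: average cycles per instruction times the clock cycle.
avg_cycles = sum(freq * cycles for freq, cycles in mix.values())
unpipelined_ns = clock_ns * avg_cycles

# Pipelined: one instruction completes per (slowest stage + overhead).
pipelined_ns = clock_ns + overhead_ns

speedup = unpipelined_ns / pipelined_ns
print(f"{unpipelined_ns:.0f} ns / {pipelined_ns:.0f} ns = {speedup:.1f}x")
```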
1.5 Quantitative Principles of Computer Design
EXAMPLE: Suppose we have the following measurements:
* Frequency of FP operations = 25%
* Average CPI of FP operations = 4.0
* Average CPI of other instructions = 1.33
* Frequency of FPSQR = 2%
* CPI of FPSQR = 20
Assume that the two design alternatives are to reduce the CPI of FPSQR to 2 or to reduce the average CPI of all FP operations to 2. Compare these two design alternatives using the CPU performance equation.
3.3 The Major Hurdle of Pipelining – Pipeline Hazards
EXAMPLE:
* See how much the load structural hazard might cost.
* Suppose data references make up 40% of the instruction mix and the ideal CPI is 1.
* Assume the clock rate of the machine with the structural hazard is 1.05 times higher than that of the machine without the hazard, because the overhead is reduced.
* Is the pipeline with or without the structural hazard faster, and by how much?
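The answer to this example is not included here, so the following Python sketch is only one way to work it out. It assumes that every data reference stalls the pipeline for exactly one cycle, which the example does not state, and it normalises the clock cycle of the hazard-free machine to 1.

```python
data_ref_fraction = 0.40
ideal_cpi = 1.0
clock_rate_ratio = 1.05           # machine with the hazard clocks 1.05 times faster
stall_cycles_per_data_ref = 1     # ASSUMPTION: not given in the example

# Without the hazard: ideal CPI, clock cycle normalised to 1.
time_without_hazard = ideal_cpi * 1.0

# With the hazard: each data reference adds a stall, but the cycle is shorter.
cpi_with_hazard = ideal_cpi + data_ref_fraction * stall_cycles_per_data_ref
time_with_hazard = cpi_with_hazard * (1.0 / clock_rate_ratio)

ratio = time_with_hazard / time_without_hazard
print(f"Machine without the hazard is about {ratio:.2f} times faster")
```

Under that one-cycle-stall assumption, the higher clock rate does not make up for the extra stall cycles.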
* CPU A: A condition code is set by a compare instruction and followed by a branch that tests the condition code.
* CPU B: A compare is included in the branch.
1.5 Quantitative Principles of Computer Design
Under these assumptions, CPU A (with the shorter clock cycle time) is faster than CPU B (which executes fewer instructions).
If CPU A were only 1.1 times faster, then $\text{Clock cycle time}_B = 1.10 \times \text{Clock cycle time}_A$, and the performance of CPU B is:
$\text{CPU time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.10 \times \text{Clock cycle time}_A) = 1.10 \times \text{IC}_A \times \text{Clock cycle time}_A$
With this improvement, CPU B, which executes fewer instructions, is faster.
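The two cases suggest a break-even point for the clock cycle ratio. The Python sketch below is an illustration rather than part of the original answer: it expresses CPU B's time relative to CPU A's as a function of the ratio of their clock cycle times, using the CPI values and the 0.8 instruction count factor from the example. The function name `time_b_over_time_a` is hypothetical.

```python
CPI_A = 0.20 * 2 + 0.80 * 1   # CPU A: 20% branches at 2 cycles
CPI_B = 0.25 * 2 + 0.75 * 1   # CPU B: 25% branches at 2 cycles
IC_RATIO_B = 0.8              # CPU B executes 80% of A's instructions

def time_b_over_time_a(cycle_ratio):
    """CPU time of B divided by CPU time of A, where
    cycle_ratio = clock cycle time of B / clock cycle time of A."""
    return (IC_RATIO_B * CPI_B * cycle_ratio) / CPI_A

# The two CPUs tie when the ratio above equals 1.
break_even_ratio = CPI_A / (IC_RATIO_B * CPI_B)

for cycle_ratio in (1.25, 1.10):
    relative = time_b_over_time_a(cycle_ratio)
    winner = "A" if relative > 1 else "B"
    print(f"cycle ratio {cycle_ratio:.2f}: time_B / time_A = {relative:.3f}, CPU {winner} is faster")
print(f"break-even cycle ratio = {break_even_ratio:.2f}")
```

With these numbers, CPU A wins only while its clock cycle advantage exceeds the break-even ratio.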
1.5 Quantitative Principles of Computer Design
EXAMPLE: Implementations of floating-point square root (FPSQR) vary significantly in performance. Suppose FPSQR is responsible for 20% of the execution time of a critical benchmark. One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP instructions run two times faster with the same effort as required for the fast square root. Compare these two design alternatives.
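Since the CPI of the overall FP enhancement is lower, its performance will be better. Specifically, the speedup for the overall FP enhancement is:
$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new FP}}} = \frac{2.0}{1.5} \approx 1.33$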
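We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
$\text{CPI}_{\text{new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{new FPSQR only}}) = 2.0 - 0.02 \times (20 - 2) = 1.64$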
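We compute the CPI for the enhancement of all FP instructions in the same way as the original CPI, with 2.0 in place of the FP CPI of 4.0:
$\text{CPI}_{\text{new FP}} = 0.75 \times 1.33 + 0.25 \times 2.0 \approx 1.5$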
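The CPI calculations for both alternatives can be reproduced with a small Python sketch using the measurements from the FPSQR example. Variable names are illustrative, and the unrounded results differ slightly from the rounded figures usually quoted (for instance, the original CPI comes out as 1.9975 rather than exactly 2.0).

```python
freq_fp, cpi_fp = 0.25, 4.0        # all FP operations
cpi_other = 1.33                   # everything else
freq_fpsqr, cpi_fpsqr = 0.02, 20.0
target_cpi = 2.0                   # both alternatives reduce a CPI to 2

# Original CPI with neither enhancement.
cpi_original = freq_fp * cpi_fp + (1 - freq_fp) * cpi_other

# Alternative 1: only FPSQR drops from 20 cycles to 2.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - target_cpi)

# Alternative 2: the average CPI of all FP operations drops from 4 to 2.
cpi_new_fp = freq_fp * target_cpi + (1 - freq_fp) * cpi_other

# IC and clock cycle are unchanged, so speedup is a ratio of CPIs.
speedup_new_fp = cpi_original / cpi_new_fp

print(f"CPI original = {cpi_original:.2f}")
print(f"CPI new FPSQR = {cpi_new_fpsqr:.2f}, CPI new FP = {cpi_new_fp:.2f}")
print(f"Speedup of the FP enhancement = {speedup_new_fp:.2f}")
```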
1.5 Quantitative Principles of Computer Design
EXAMPLE: Suppose that we are considering an enhancement that runs 10 times faster than the original machine, but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?
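The answer to this example is not included here. One way to work it out is with Amdahl's Law, taking the enhanced fraction as 0.4 and the enhanced speedup as 10; the subscripted symbol names below are notation chosen for this illustration.

```latex
\text{Speedup}_{\text{overall}}
  = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
      + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
  = \frac{1}{0.6 + \dfrac{0.4}{10}}
  = \frac{1}{0.64}
  \approx 1.56
```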
1.5 Quantitative Principles of Computer Design
On both CPUs, the conditional branch instruction takes 2 cycles, and all other instructions take 1 clock cycle. On CPU A, 20% of all instructions executed are conditional branches. Since every branch needs a compare, another 20% of the instructions are compares. Because CPU A does not have the compare included in the branch, assume that its clock cycle time is 1.25 times faster than that of CPU B. Which CPU is faster? What if CPU A's clock cycle time were only 1.1 times faster?
1.5 Quantitative Principles of Computer Design
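ANSWER: First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by finding the original CPI with neither enhancement:
$\text{CPI}_{\text{original}} = 4.0 \times 0.25 + 1.33 \times 0.75 \approx 2.0$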
1.5 Quantitative Principles of Computer Design
Using the CPU Performance Equations: More Examples
EXAMPLE: We are considering two alternatives for our conditional branch instructions, as follows:
3.2 The Basic Pipeline for DLX
EXAMPLE:
* Consider the unpipelined machine described earlier.
* Assume: clock cycle: 10 ns; ALU operations: 4 cycles, 40% of instructions; branches: 4 cycles, 20%; memory operations: 5 cycles, 40%.
* Suppose that, due to clock skew and setup, pipelining the machine adds 1 ns of overhead to the clock.
* How much speedup will we gain from pipelining?