Superscalar Pipelines
$$\text{Speedup} = \frac{1}{\frac{1-f}{s} + \frac{f}{v}}$$
Limits on Instruction Level Parallelism (ILP)
Weiss and Smith [1984] 1.58
Sohi and Vajapeyam [1987]
Tjaden and Flynn [1970]
Tjaden and Flynn [1973]
Uht [1986]
Smith et al. [1989]
Jouppi and Wall [1988]
Limits of Pipelining
IBM RISC Experience (p. 91; Tilak Agerwala and John Cocke, 1987) (a fundamental issue)
Control and data dependences add 15%
Best case CPI of 1.15, IPC of 0.87
Deeper pipelines (higher frequency) magnify dependence penalties
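As a quick check of these figures, assuming an ideal scalar pipeline has a CPI of 1 and the 15% dependence overhead is added on top of it:

```latex
\[
\mathrm{CPI} = 1 + 0.15 = 1.15, \qquad
\mathrm{IPC} = \frac{1}{\mathrm{CPI}} = \frac{1}{1.15} \approx 0.87
\]
```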
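Three phases:
First: N instructions enter the pipeline.
Second: the pipeline is full. Assuming no stalls caused by pipeline interference, this is where the pipeline delivers its best performance.
Third: the pipeline drains. No new instructions enter, and the instructions already in the pipeline complete execution.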
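The three phases can be sketched numerically. This is a minimal sketch under the idealized assumption that one instruction enters per cycle and nothing stalls; the names `pipeline_cycles` and `pipeline_speedup` and their parameters are hypothetical, not taken from the slides.

```python
def pipeline_cycles(num_instructions: int, depth: int) -> int:
    """Cycles to run num_instructions through a depth-stage pipeline.

    Idealized assumption: one instruction enters per cycle and there are
    no stalls. The first instruction needs `depth` cycles; every later
    instruction completes one cycle after its predecessor.
    """
    if num_instructions <= 0 or depth <= 0:
        raise ValueError("num_instructions and depth must be positive")
    return depth + num_instructions - 1


def pipeline_speedup(num_instructions: int, depth: int) -> float:
    """Speedup over an unpipelined machine that takes `depth` cycles per instruction."""
    unpipelined_cycles = num_instructions * depth
    return unpipelined_cycles / pipeline_cycles(num_instructions, depth)


if __name__ == "__main__":
    # Fill and drain overhead matters less as the instruction count grows.
    for n in (5, 100, 10_000):
        print(n, pipeline_cycles(n, 5), round(pipeline_speedup(n, 5), 3))
```

With a depth of 5, the speedup approaches 5 only as the instruction count grows, because the fill and drain phases are amortized over more instructions.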
Pipelined Performance Model
[Figure: pipeline depth (1 to N) over time, with fraction g filled and fraction 1-g stalled]
Tyranny of Amdahl's Law [Bob Colwell]
When g is even slightly below 100%, a big performance hit will result
Stalled cycles are the key adversary and must be minimized as much as possible
$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{N}}$$
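The sensitivity to g can be illustrated with a small sketch. It assumes, by analogy with Amdahl's Law, that speedup = 1 / ((1 - g) + g / N) for an N-stage pipeline that is filled for a fraction g of the time; the function name `pipelined_speedup` is hypothetical.

```python
def pipelined_speedup(g: float, depth: int) -> float:
    """Speedup of a depth-stage pipeline that is filled a fraction g of the time.

    Assumed model (Amdahl-style): the filled fraction runs `depth` times
    faster, the stalled fraction (1 - g) runs at unpipelined speed.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must be between 0 and 1")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return 1.0 / ((1.0 - g) + g / depth)


if __name__ == "__main__":
    # Small drops in g cost a large share of the ideal speedup.
    for g in (1.0, 0.99, 0.95, 0.90):
        print(f"g={g:.2f}  speedup={pipelined_speedup(g, 10):.2f}")
```

Under this assumed model with N = 10, lowering g from 1.0 to 0.9 cuts the speedup from 10 to about 5.26, which is the point of the "tyranny" remark.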
Superscalar Proposal
Moderate the tyranny of Amdahl's Law
Ease the sequential bottleneck
More generally applicable
Robust (less sensitive to f)
Revised Amdahl's Law: $\text{Speedup} = \frac{1}{\frac{1-f}{s} + \frac{f}{v}}$
Motivation for Superscalar [Agerwala and Cocke] (p. 23)
For N = 6 and f = 0.8, speedup jumps from 3 to 4.3 when s = 2 instead of s = 1 (scalar)
[Figure: speedup plot with the typical range marked]
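The quoted jump from 3 to 4.3 can be reproduced with a short sketch. It assumes the revised form Speedup = 1 / ((1 - f) / s + f / v), with v set to N = 6 for this example; the function name `revised_amdahl_speedup` is hypothetical.

```python
def revised_amdahl_speedup(f: float, v: float, s: float = 1.0) -> float:
    """Revised Amdahl's Law (assumed form): 1 / ((1 - f) / s + f / v).

    f: fraction of the work that is vectorizable
    v: speedup applied to the vectorizable fraction
    s: speedup applied to the remaining (1 - f) fraction; s = 1 is scalar
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if v <= 0 or s <= 0:
        raise ValueError("v and s must be positive")
    return 1.0 / ((1.0 - f) / s + f / v)


if __name__ == "__main__":
    scalar = revised_amdahl_speedup(f=0.8, v=6, s=1)
    superscalar = revised_amdahl_speedup(f=0.8, v=6, s=2)
    print(round(scalar, 1), round(superscalar, 1))
```

Doubling s halves the cost of the non-vectorizable 20%, which is why a modest s = 2 already recovers a large part of the lost speedup.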
Pipelining to Superscalar
Forecast
Limits of pipelining
The case for superscalar
Instruction-level parallel machines
Superscalar pipeline organization
Superscalar pipeline design
Sequential bottleneck:
$$\lim_{v \to \infty} \frac{1}{(1-f) + \frac{f}{v}} = \frac{1}{1-f}$$
Even if v is infinite, performance is limited by the nonvectorizable portion (1-f)
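For instance, taking f = 0.8 purely as an illustrative value (an assumption, not a measurement), the bound works out as follows:

```latex
\[
f = 0.8: \qquad
\lim_{v \to \infty} \frac{1}{(1-f) + \frac{f}{v}} = \frac{1}{1 - 0.8} = 5
\]
```

No amount of speedup on the vectorizable 80% can push the overall speedup past 5.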
[Figure: number of processors (1 to N) over time, marking the fractions h, 1-h, 1-f, and f]
h = fraction of time in serial code
f = fraction that is vectorizable
v = speedup for f
Overall speedup:
$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{v}}$$
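A minimal sketch of the overall speedup, using the definitions of f and v given here. The function name `amdahl_speedup` and the sample values are hypothetical.

```python
def amdahl_speedup(f: float, v: float) -> float:
    """Amdahl's Law: overall speedup when a fraction f is sped up by v."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if v <= 0:
        raise ValueError("v must be positive")
    return 1.0 / ((1.0 - f) + f / v)


if __name__ == "__main__":
    # Same v, different vectorizable fractions: the serial part dominates quickly.
    for f in (0.5, 0.8, 0.95, 1.0):
        print(f"f={f:.2f}  speedup={amdahl_speedup(f, v=8):.2f}")
```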
Revisit Amdahl’s Law
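Sohi and Vajapeyam [1987] 1.81
Tjaden and Flynn [1970] 1.86 (Flynn's bottleneck)
Tjaden and Flynn [1973] 1.96
Uht [1986] 2.00
Smith et al. [1989] 2.00
Jouppi and Wall [1988] 2.40
Johnson [1991]
Acosta et al. [1986]
Wedig [1982]
Butler et al. [1991]
Melvin and Patt [1991]
Wall [1991]
Kuck et al. [1972]
Riseman and Foster [1972]
Nicolau and Fisher [1984]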
This analysis assumes 100% cache hit rates (a memory-system issue)
Hit rates approach 100% for some programs
Many important programs have much worse hit rates
More on this later.
Pipelined Performance Model (Harold Stone, 1987, p. 19)
[Figure: pipeline depth (1 to N) over time, with fraction g filled and fraction 1-g stalled]
g = fraction of time pipeline is filled
1-g = fraction of time pipeline is not filled (stalled)
In the 1980s (decade of pipelining): CPI: 5.0 => 1.15
In the 1990s (decade of superscalar): CPI: 1.15 => 0.5 (best case)
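To put these CPI figures in perspective, the sketch below converts CPI to IPC and computes the implied speedup of each transition. It assumes instruction count and cycle time stay fixed, which the slides do not state; the function names are hypothetical.

```python
def cpi_to_ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return 1.0 / cpi


def cpi_speedup(old_cpi: float, new_cpi: float) -> float:
    """Speedup from a CPI change, assuming equal instruction count and cycle time."""
    if old_cpi <= 0 or new_cpi <= 0:
        raise ValueError("CPI values must be positive")
    return old_cpi / new_cpi


if __name__ == "__main__":
    for label, old, new in (("1980s", 5.0, 1.15), ("1990s", 1.15, 0.5)):
        print(f"{label}: IPC {cpi_to_ipc(old):.2f} -> {cpi_to_ipc(new):.2f}, "
              f"speedup {cpi_speedup(old, new):.2f}x")
```

A CPI below 1 (IPC above 1) is only reachable when more than one instruction can complete per cycle, which is what separates the superscalar decade from the pipelining decade.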
Amdahl's Law (p. 18)
[Figure: number of processors (1 to N) versus time, marking the fractions h, 1-h, 1-f, and f]
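Processor Performance (p. 17)
$$\text{Processor Performance} = \frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
Instructions/Program is the code size, Cycles/Instruction is the CPI, and Time/Cycle is the cycle time.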
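The three-factor product can be evaluated directly. This is a minimal sketch; the function name `execution_time` and the sample numbers (one billion instructions, a 1 ns cycle) are hypothetical values chosen for illustration.

```python
def execution_time(instructions: int, cpi: float, cycle_time_s: float) -> float:
    """Time per program = instructions x cycles per instruction x seconds per cycle."""
    if instructions < 0 or cpi <= 0 or cycle_time_s <= 0:
        raise ValueError("instructions must be non-negative; cpi and cycle time positive")
    return instructions * cpi * cycle_time_s


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions on a 1 GHz machine (1 ns cycle).
    for cpi in (5.0, 1.15, 0.5):
        seconds = execution_time(1_000_000_000, cpi, 1e-9)
        print(f"CPI={cpi}: {seconds:.2f} s")
```

Because the factors multiply, an improvement in one term only pays off if it does not inflate another, for example a deeper pipeline that shortens the cycle time but raises CPI through larger dependence penalties.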