Computer Organization and Design 22-ddg-pipelineII


CS61C L29 CPU Design : Pipelining to Improve Performance II (8)
Garcia, Spring 2008 © UCB
Control Hazard: Branching (2/8)
We put the branch decision-making hardware in the ALU stage
INTEL’S NEW CHIP: THE ATOM PROCESSOR!
Designed for the “mobile internet” (i.e., handheld devices), it has 45 million transistors, is as fast as the CPU in a 4-year-old laptop, but uses only ~0.2 watts! The chips in today’s laptops use 35 watts. The key is its adaptive power states and the ability to adjust the clock speed and CPU voltage depending on usage. Very cool!
Read same memory twice in same clock cycle
Structural Hazard #1: Single Memory (2/2)
[Figure: pipeline timing diagram, time (clock cycles) vs. instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, ALU, D$, Reg]
Control Hazard: Branching (5/8)
[Figure: pipeline timing diagram, time (clock cycles) vs. instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, D$, Reg]
Branch comparator moved to Decode stage.
Pipelining is a BIG idea
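Each cycle executes one stage of an instruction, and one instruction completes in every clock cycle.
On average, execution is much faster.
What makes this work?
Similarities between instructions allow all of them to use the same stages (in general).
Each stage takes about the same amount of time: less time is wasted.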
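The throughput claim above can be checked with a little arithmetic. The following is a minimal sketch, assuming an idealized pipeline with equal-length stages and no hazards; the function names and the choice of 5 stages in the usage line are illustrative assumptions, not part of the slides.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction runs through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline (assumption: no hazards, equal-length stages).

    The first instruction needs num_stages cycles to reach the end of the
    pipeline; after that, one instruction completes in every cycle.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))


if __name__ == "__main__":
    # 5 instructions on an assumed 5-stage pipeline: 25 cycles vs. 9 cycles.
    print(unpipelined_cycles(5, 5), pipelined_cycles(5, 5))
    # For long instruction streams the speedup approaches the stage count.
    print(round(speedup(1000, 5), 2))
```

For long instruction streams the speedup approaches the number of stages, which is why roughly equal stage times matter: the clock period is set by the slowest stage.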
designated clock cycle
Structural hazards: HW cannot support some combination of instructions (single person to fold and put clothes away)
Control hazards: Pipelining of branches causes later instruction fetches to wait for the result of the branch
Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
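To make the data-hazard definition concrete, here is a rough sketch that flags an instruction reading a register written by a recent earlier instruction. The `Instr` type, the `find_data_hazards` helper, the sample program, and the `window` parameter (how many earlier instructions are assumed to be still in the pipeline) are all hypothetical; a real pipeline's window depends on its stages and on forwarding.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def find_data_hazards(program: List[Instr],
                      window: int = 2) -> List[Tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) triples.

    Assumption: a result is not yet visible to the next `window`
    instructions. The check is conservative and ignores forwarding.
    """
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((j, i, producer.dest))
    return hazards


program = [
    Instr("add $t0, $t1, $t2", "$t0", ("$t1", "$t2")),
    Instr("sub $t4, $t0, $t3", "$t4", ("$t0", "$t3")),  # reads $t0 right away
    Instr("and $t5, $t6, $t7", "$t5", ("$t6", "$t7")),
    Instr("or  $t8, $t0, $t9", "$t8", ("$t0", "$t9")),  # reads $t0 three later
]

if __name__ == "__main__":
    for producer, consumer, reg in find_data_hazards(program):
        print(f"{program[consumer].text!r} needs {reg} "
              f"from {program[producer].text!r}")
```

With the default window only the `sub` is flagged; the `or` is far enough behind the `add` that, under this assumption, the value has already been written back.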
identifies it as a branch), immediately make a decision and set the new value of the PC
Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed
Side Note: This means that branches are idle in Stages 3, 4 and 5.
Control Hazard: Branching (4/8)
Optimization #1: insert special branch comparator in Stage 2; as soon as instruction is decoded (Opcode
Control Hazard: Branching (3/8)
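Initial solution: Stall until decision is made
Insert “no-op” instructions (those that accomplish nothing, just take time) or hold up the fetch of the next instruction (for 2 cycles).
Drawback: each branch takes 3 clock cycles (assuming the comparison is done in the ALU stage)
This creates pipeline stalls or “bubbles”.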
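The cost of stalling can be sketched as follows. This is an illustrative model only: `insert_noops` mimics padding each branch with no-ops, and `effective_cpi` estimates the average cycles per instruction. The helper names, the set of branch mnemonics, and the 20% branch fraction in the usage lines are assumptions, not figures from the lecture.

```python
from typing import List

BRANCH_OPS = {"beq", "bne"}  # assumed set of branch mnemonics


def insert_noops(program: List[str], stall_cycles: int = 2) -> List[str]:
    """Pad every branch with `stall_cycles` no-ops (the stall solution)."""
    padded = []
    for instr in program:
        padded.append(instr)
        if instr.split()[0] in BRANCH_OPS:
            padded.extend(["nop"] * stall_cycles)
    return padded


def effective_cpi(branch_fraction: float, stall_cycles: int) -> float:
    """Average cycles per instruction when every branch stalls.

    Assumes an otherwise ideal pipeline (1 cycle per instruction).
    """
    return 1.0 + branch_fraction * stall_cycles


if __name__ == "__main__":
    print(insert_noops(["add $t0, $t1, $t2",
                        "beq $t0, $zero, done",
                        "sub $t3, $t4, $t5"]))
    # Hypothetical workload: 20% branches, 2 stall cycles per branch.
    print(effective_cpi(0.2, 2))
```

With 2 stall cycles each branch effectively costs 3 cycles, matching the drawback stated above; passing a smaller `stall_cycles` models a design that resolves the branch earlier.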
Structural Hazard #1: Single Memory (1/2)
Control Hazard: Branching (1/8)
[Figure: pipeline timing diagram, time (clock cycles) vs. instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, D$, Reg]
Where do we do the compare for the branch?
Computer Organization and Design
Lecture 22 – CPU Design : Pipelining to Improve Performance II 2008-05-13
Hi to Kyle Ledoux from Worcester, MA!
Lecturer SOE Dan Garcia
1) RegFile access is VERY fast: takes less than half the time of ALU stage
Write to Registers during first half of each clock cycle
Read from Registers during second half of each clock cycle
2) Build RegFile with independent read and write ports
Result: can perform Read and Write during same clock cycle
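The first solution (write in the first half of the cycle, read in the second half) can be modelled in a few lines. This is a behavioural sketch, not a hardware description; the `RegFile` class, its `clock_cycle` method, and the treatment of register 0 as hard-wired zero (as in MIPS) are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


class RegFile:
    """Toy register file: write happens before read within one clock cycle."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs: List[int] = [0] * num_regs

    def clock_cycle(self,
                    write: Optional[Tuple[int, int]] = None,
                    reads: Sequence[int] = ()) -> Tuple[int, ...]:
        # First half of the cycle: the write-back stage stores its result.
        if write is not None:
            reg, value = write
            if reg != 0:  # assumption: register 0 always reads as zero
                self.regs[reg] = value
        # Second half of the cycle: the decode stage reads its operands,
        # so it already sees the value written in the first half.
        return tuple(self.regs[r] for r in reads)


if __name__ == "__main__":
    rf = RegFile()
    # One instruction writes register 8 while another reads registers 8 and 9
    # in the same cycle; the reader sees the new value of register 8.
    print(rf.clock_cycle(write=(8, 42), reads=(8, 9)))
```

Because the write is ordered before the read, an instruction in its register-read stage never sees a stale value from an instruction writing back in the same cycle.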
/Infotech/20525/
Review
Optimal Pipeline
Correction to the Pipelining Demo
At one point I asked ALU what he was doing (after the pipeline had drained) and he said “something”, and I made him say “nothing”. David was right; the ALU is always doing something, it’s hardware!
(a temporary smaller [of usually most recently used] copy of memory)
have both an L1 Instruction Cache and
Control Hazard: Branching (6a/8)
User inserting no-op instruction
[Figure: pipeline timing diagram, time (clock cycles) vs. instruction order (add, beq); each instruction passes through I$, Reg, D$, Reg]
Problems for Pipelining CPUs
Limits to pipelining: Hazards prevent next instruction from executing during its
an L1 Data Cache
need more complex hardware to control when both caches miss
Structural Hazard #2: Registers (1/2)
[Figure: pipeline timing diagram, time (clock cycles) vs. instruction order (sw, Instr 1, Instr 2, Instr 3, Instr 4); each instruction passes through I$, Reg, ALU, D$, Reg]
Can we read and write to registers simultaneously?
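Therefore, two more instructions after the branch will always be fetched, whether or not the branch is taken
Desired functionality of a branch
If we do not take the branch, don’t waste any time and continue executing normally
If we take the branch, don’t execute any instructions after the branch; just go to the desired label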
Structural Hazard #2: Registers (2/2)
Two different solutions have been used:
Solution:
infeasible and inefficient to create second memory (We’ll learn about this more Friday/next week)
...so simulate this by having two Level 1 Caches
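The single-memory hazard and the two-cache fix can be illustrated with a small cycle-counting sketch. It assumes a 5-stage pipeline in which instruction i is fetched in cycle i and, if it is a load or store, touches data memory three cycles later; the `memory_conflicts` helper and this stage timing are assumptions made for the example.

```python
from typing import List

MEMORY_OPS = {"lw", "sw"}  # instructions that access data memory


def memory_conflicts(program: List[str], split_caches: bool) -> List[int]:
    """Return the cycles in which one memory would be accessed twice.

    Assumed timing: instruction i is fetched in cycle i and reaches its
    data-memory stage in cycle i + 3 (fetch, register read, ALU, memory).
    """
    if split_caches:
        # Separate instruction and data caches serve both accesses at once.
        return []
    fetch_cycles = set(range(len(program)))
    conflicts = []
    for i, op in enumerate(program):
        mem_cycle = i + 3
        if op in MEMORY_OPS and mem_cycle in fetch_cycles:
            conflicts.append(mem_cycle)
    return conflicts


if __name__ == "__main__":
    program = ["lw", "add", "sub", "or", "and"]
    # The load's data access collides with the fetch of the fourth instruction.
    print(memory_conflicts(program, split_caches=False))
    print(memory_conflicts(program, split_caches=True))
```

With a single memory, the load's data access in cycle 3 coincides with the fetch of the instruction three slots behind it; with separate instruction and data caches both accesses proceed in the same cycle.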