Computer Architecture: Assignment 1

Computer Architecture
Chapter 1
1.11 Availability is the most important consideration for designing servers, followed closely by scalability and throughput.
a. We have a single processor with a failures in time (FIT) of 100. What is the mean time to failure (MTTF) for this system?
b. If it takes 1 day to get the system running again, what is the availability of the system?
c. Imagine that the government, to cut costs, is going to build a supercomputer out of inexpensive computers rather than expensive, reliable computers. What is the MTTF for a system with 1000 processors? Assume that if one fails, they all fail.
Answer:
a. Mean time to failure (MTTF) is a reliability metric; its reciprocal is the failure rate, which is usually reported as failures per billion (10^9) hours of operation (FIT).

By that definition, 1/MTTF = FIT/10^9, so MTTF = 10^9/100 = 10^7 hours.

b. Availability = MTTF/(MTTF + MTTR), where MTTR is the mean time to repair, which in this problem is the time needed to get the system running again (1 day = 24 hours).

Availability = 10^7/(10^7 + 24) ≈ 0.9999976, i.e. essentially 1.

c. Because the failure of any one processor takes the whole system down, the failure rate is 1000 times that of a single processor, so the system MTTF is 1/1000 of a single processor's MTTF: 10^7/1000 = 10^4 hours.
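
A minimal Python sketch of the arithmetic above (the variable names are my own, not part of the exercise):

FIT = 100                          # failures per 10^9 hours for one processor
mttf_single = 1e9 / FIT            # a. MTTF = 10^9 / FIT = 10^7 hours
mttr = 24                          # b. one day of repair time, in hours
availability = mttf_single / (mttf_single + mttr)   # roughly 0.9999976
mttf_1000 = mttf_single / 1000     # c. any of 1000 processors failing brings the system down
print(mttf_single, availability, mttf_1000)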

1.14 In this exercise, assume that we are considering enhancing
a machine by adding vector hardware to it. When a computation is run in vector mode on the vector hardware, it is 10 times faster than the normal mode of execution. We call the percentage of time that could be spent using vector mode the percentage of vectorization.
a. Draw a graph that plots the speedup as a percentage of the computation performed in vector mode. Label the y-axis “Net speedup” and label the x-axis “Percent vectorization”.
b. What percentage of vectorization is needed to achieve a speedup of 2?
c. What percentage of the computation run time is spent in vector mode if a speedup of 2 is achieved?
d. What percentage of vectorization is needed to achieve one-half the maximum speedup attainable from using vector mode?
e. Suppose you have measured the percentage of vectorization of the program to be 70%. The hardware design group estimates it can speed up the vector hardware even more with significant additional investment. You wonder whether the compiler crew could increase the percentage of vectorization instead. What percentage of vectorization would the compiler team need to achieve in order to equal an additional 2× speedup in the vector unit (beyond the initial 10×)?
Answer:
a. By the definition of speedup, the speedup of the enhanced (vector) mode is 10. Let x be the fraction of the computation that is vectorized and y the net speedup; then y = 1/((1 - x) + x/10).

x ranges over [0, 1]; y ranges over [1, 10].

(The plot of Net speedup on the y-axis versus Percent vectorization on the x-axis is omitted here.)

b. Setting y = 1/((1 - x) + x/10) = 2 gives x = 5/9 ≈ 55.6%.
c. The fraction of the sped-up run time spent in vector mode is ((5/9)/10) / (1/2) = 1/9 ≈ 11.1%.
d. The maximum attainable speedup is 10, so half the maximum is 5; setting y = 1/((1 - x) + x/10) = 5 gives x = 8/9 ≈ 88.9%.
e. With x = 70%, y = 1/((1 - 0.7) + 0.7/10) ≈ 2.7.
To equal an additional 2× speedup in the vector unit we need y = 2 × 2.7 = 5.4; solving 1/((1 - x) + x/10) = 5.4 gives x ≈ 0.91, i.e. about 91% vectorization.
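
A short Python sketch of the Amdahl's-law algebra used in parts b through e (the function names are illustrative, not from the text):

def speedup(x, s=10):
    # net speedup when a fraction x of the time runs in vector mode with speedup s
    return 1.0 / ((1.0 - x) + x / s)

def vector_fraction(y, s=10):
    # invert the formula: fraction x needed to reach a net speedup of y
    return (1.0 - 1.0 / y) / (1.0 - 1.0 / s)

print(vector_fraction(2))                  # b. about 0.556
print(vector_fraction(5))                  # d. about 0.889
print(speedup(0.7))                        # e. about 2.70
print(vector_fraction(2 * speedup(0.7)))   # e. about 0.906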
Chapter 2
2.8 The following questions investigate the impact of small and simple caches using CACTI, and assume a 65 nm (0.065 μm) technology.
a. Compare the access times of 64KB caches with 64-byte blocks
and a single bank. What are the relative access times of two-way and four-way set associative caches in comparison to a direct mapped organization?
b. Compare the access times of four-way set associative caches with 64-byte blocks and a single bank. What are the relative access times of 32KB and 64KB caches in comparison to a 16KB cache?
c. For a 64KB cache, find the cache associativity between 1 and 8 with the lowest average memory access time, given that misses per instruction for a certain workload suite are 0.00664 for direct mapped, 0.00366 for two-way set associative, 0.000987 for four-way set associative, and 0.000266 for eight-way set associative caches. Overall, there are 0.3 data references per instruction. Assume cache misses take 10 ns in all models. To calculate the hit time in cycles, assume the cycle time output using CACTI, which corresponds to the maximum frequency a cache can operate at without any bubbles in the pipeline.
Answer:
a. Direct mapped: 0.86 ns; two-way set associative: 1.12 ns; four-way set associative: 1.37 ns.

Relative to the direct-mapped organization, the two-way access time is 1.12/0.86 = 1.30 times as long, and the four-way access time is 1.37/0.86 = 1.59 times as long.

b. The access time is 1.27 ns for the 16KB cache, 1.35 ns for the 32KB cache, and 1.37 ns for the 64KB cache.

The 32KB access time is 1.35/1.27 = 1.06 times that of the 16KB cache; the 64KB access time is 1.37/1.27 = 1.08 times that of the 16KB cache.

c. Average memory access time = hit rate × hit time + miss rate × miss penalty.
Miss rates (per data reference): direct mapped 0.00664/0.3 = 2.2%; 2-way 0.00366/0.3 = 1.2%; 4-way 0.000987/0.3 = 0.33%; 8-way 0.000266/0.3 = 0.09%.
Hit times in cycles (access time / cycle time, rounded up): direct mapped 0.86 ns/0.5 ns → 2; 2-way 1.12 ns/0.5 ns → 3; 4-way 1.37 ns/0.83 ns → 2; 8-way 2.03 ns/0.79 ns → 3.
Miss penalties: direct mapped 10 ns/0.5 ns = 20 cycles; 2-way 10 ns/0.5 ns = 20 cycles; 4-way 10 ns/0.83 ns ≈ 13 cycles; 8-way 10 ns/0.79 ns ≈ 13 cycles.
Direct-mapped AMAT = ((1 - 0.022) × 2 + 0.022 × 20) × 0.5 ns = 2.396 × 0.5 ns = 1.2 ns;
2-way AMAT = ((1 - 0.012) × 3 + 0.012 × 20) × 0.5 ns = 3.2 × 0.5 ns = 1.6 ns;
4-way AMAT = ((1 - 0.0033) × 2 + 0.0033 × 13) × 0.83 ns = 2.036 × 0.83 ns = 1.69 ns;
8-way AMAT = ((1 - 0.0009) × 3 + 0.0009 × 13) × 0.79 ns ≈ 3 × 0.79 ns = 2.37 ns.
So the direct-mapped cache has the lowest average memory access time.
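
The part-c arithmetic can be reproduced with a small Python sketch (the access-time and cycle-time values are the CACTI numbers quoted above; rounding up mirrors the hand calculation):

import math

configs = {
    # associativity: (misses per instruction, access time ns, cycle time ns)
    "1-way": (0.00664,  0.86, 0.50),
    "2-way": (0.00366,  1.12, 0.50),
    "4-way": (0.000987, 1.37, 0.83),
    "8-way": (0.000266, 2.03, 0.79),
}
refs_per_instr = 0.3   # data references per instruction
miss_time_ns = 10.0    # miss service time in all models

for name, (mpi, access_ns, cycle_ns) in configs.items():
    miss_rate = mpi / refs_per_instr
    hit_cycles = math.ceil(access_ns / cycle_ns)
    miss_cycles = math.ceil(miss_time_ns / cycle_ns)
    amat_ns = ((1 - miss_rate) * hit_cycles + miss_rate * miss_cycles) * cycle_ns
    print(name, round(amat_ns, 2))   # the direct-mapped cache comes out lowest, about 1.2 ns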

2.11 Consider the usage of critical word first and early restart on L2 cache misses. Assume a 1MB L2 cache with 64 byte blocks and a refill path that is 16 bytes wide. Assume that the L2 can be written with 16 bytes every 4 processor cycles, the time to receive the first 16 byte block from the memory controller is 120 cycles, each additional 16 byte block from main memory requires 16 cycles, and data can be bypassed directly into the read port of the L2 cache. Ignore any cycles to transfer the miss request to the L2 cache and the requested data to the L1 cache.
a. How many cycles would it take to service an L2 cache miss with and without critical word first and early restart?
b. Do you think critical word first and early restart would be more important for L1 caches or L2 caches, and what factors would contribute to their relative importance?
Answer:
a. With critical word first and early restart: the requested 16-byte chunk arrives from the memory controller after 120 cycles and can be bypassed directly to the L2 read port, so the miss is serviced in 120 cycles.
Without critical word first and early restart: the first 16-byte chunk takes 120 cycles, and fetching the rest of the block takes 16 × (64B - 16B)/16B = 48 cycles, so servicing the miss takes 120 + 48 = 168 cycles in total.
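
A one-off check of these cycle counts in Python (plain arithmetic, nothing assumed beyond the problem statement):

block_bytes, chunk_bytes = 64, 16
first_chunk_cycles = 120     # first 16-byte chunk from the memory controller
extra_chunk_cycles = 16      # each additional 16-byte chunk

with_cwf = first_chunk_cycles   # critical word is bypassed straight to the L2 read port
without_cwf = first_chunk_cycles + (block_bytes // chunk_bytes - 1) * extra_chunk_cycles   # 120 + 3*16 = 168
print(with_cwf, without_cwf)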

b. Two factors matter: how much L1 and L2 misses each contribute to the average memory access time, and how much of the miss service time critical word first and early restart can remove.

2.21 Virtual machines can lose performance from a number of events, such as the execution of privileged instructions, TLB misses, traps, and I/O. These events are usually handled in system code. Thus, one way of estimating the slowdown when running under a VM is the percentage of application execution time in system versus user mode. For example, an application spending 10% of its execution in system mode might slow down by 60% when running on a VM. Figure 2.32 lists the early performance of various system calls under native execution, pure virtualization, and paravirtualization for LMbench using Xen on an Itanium system, with times measured in microseconds.
a. What types of programs would be expected to have smaller slowdowns when running under VMs?
b. If slowdowns were linear as a function of system time, given the slowdown above, how much slower would a program spending 20% of its execution in system time be expected to run?
c. What is the median slowdown of the system calls in the table above under pure virtualization and paravirtualization?
d. Which functions in the table above have the largest slowdowns?
What do you think the cause of this could be?
Answer:
a. Programs that do little I/O and make few system calls.

b. If the slowdown is linear in the fraction of time spent in system mode, then a program spending 20% of its execution in system time would be expected to slow down by 2 × 60% = 120%.
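
A minimal sketch of the linear extrapolation in part b (the 60% slowdown at 10% system time is the data point given in the problem statement):

slowdown_at_10pct = 0.60
system_fraction = 0.20
estimated_slowdown = slowdown_at_10pct * (system_fraction / 0.10)   # 2 x 60% = 120%
print(estimated_slowdown)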

c. Under pure virtualization the median slowdown is 19.26; under paravirtualization the median slowdown is 4.14.
d. Null call and Null I/O have the largest slowdowns.

These are the calls that the virtual machine handles least efficiently relative to their very short native execution times, so when they run under the VM their relative slowdown becomes very large.
