深入理解计算机系统chapter4

合集下载

计算机系统结构课件第4章

计算机系统结构课件第4章

在多任务或多用户系统中,可以把不同的任务分配在不同的存 储器中来提高存储器的访问速度。
第4章 存贮体系
低位交叉访问
MDR0 低位部分:存储体体号 b=log2m 高位部分:体内地址 a=log2n 0 m 2m 3m … (n-1)m
W
MDR1 1 m+1 2m+1 3m+1 … (n-1)m+1
第4章 存贮体系
第 4 章 存储系统
4.1 存贮体系的形成与性能 4.2 虚拟存贮器 4.3 高速缓冲存贮器(Cache) 4.4 主存保护
第4章 存贮体系
4.1 存储系统的形成与性能
冯诺依曼机的改进:
运算器为中心 存储器为中心
目的:存放计算机系统中所需要处理的程序与数据 。
主存储器:用以存放正在运行的程序与数据 。
m--并行工作的存贮体个数。
第4章 存贮体系 存贮器的速度可以用访问时间TA、存贮周期TM和频宽(也称 带宽)Bm来描述。
TA 存贮器从接到访存读申请,到信息被读到数据总线上所需的
时间。 TM 连续启动一个存贮体所需要的间隔时间,它一般总比TA大。 存贮器频宽 Bm: 存贮器可提供的数据传送速率,一般用每秒钟
第4章 存贮体系
交叉访问存储器
模M主存储器:分为M个存储体的主存储器。
同时访问:采取同时启动,完全并行工作的方式;
交叉访问:分时启动,互相错开一个存储体存储周期的 1/M,交叉进行工作。
图 4.3 多体(m=4)交叉存贮器
第4章 存贮体系
高位交叉访问
MDR
0
W MDR
1
数据总线
MDRm-1 n(m-1) n(m-1)+1 n(m-1)+2 n(m-1)+3 … n(m-1) MARm-1

计算机系统结构第4章精品PPT课件

计算机系统结构第4章精品PPT课件


4/1344.1 指令来自并行1. 循环级并行:使一个循环中的不同循环体并行执行。 ➢ 开发循环体中存在的并行性
最常见、最基本
➢ 是指令级并行研究的重点之一 ➢ 例如,考虑下述语句:
for (i=1; i<=500; i=i+1) a[i]=a[i]+s; 每一次循环都可以与其他的循环重叠并行执行; 在每一次循环的内部,却没有任何的并行性。
(ILP:Instruction-Level Parallelism)
➢ 本章研究:如何通过各种可能的技术,获得更多 的指令级并行性。
硬件+软件技术 必须要硬件技术和软件技术互相配合,才能够最大 限度地挖掘出程序中存在的指令级并行。

3/134
4.1 指令级并行
1. 流水线处理机的实际CPI ➢ 理想流水线的CPI加上各类停顿的时钟周期数:

5/134
4.1 指令级并行
1. 最基本的开发循环级并行的技术 ➢ 循环展开(loop unrolling)技术 ➢ 采用向量指令和向量数据表示
2. 相关与流水线冲突 ➢ 相关有三种类型:
数据相关、名相关、控制相关
➢ 流水线冲突是指对于具体的流水线来说,由于相关 的存在,使得指令流中的下一条指令不能在指定的 时钟周期执行。
➢ 读操作数(Read Operands,RO):等待数据冲 突消失,然后读操作数。
(out of order execution)
IS
RO
检测结构冲突 检测数据冲突

16/134
4.2 指令的动态调度
1. 在前述5段流水线中,是不会发生WAR冲突和WAW冲突 的。但乱序执行就使得它们可能发生了。
第4章 指令级并行

计算机操作系统第四章

计算机操作系统第四章

• 划分分区的方法 –分区大小相等 –分区大小不相等
缺将乏用灵户活区性划:分作成业具小有,多造个成较内小存的空分间区的、浪适费量;的 作中业等大分,区一及个少分量区的装大不分下区,,作以业满无足法不运同行作。业因的 此需,求常用于工控系统中。
• 内存分配管理 – 分区使用表(MBT) –分区按大小排队
第四章 存储器管理
存储器的层次结构 程序的装入和链接 连续分配方式 基本分页存储管理方式 基本分段存储管理方式 虚拟存储器 请求分页存储管理方式 页面置换算法 请求分段存储管理方式
目的:
内存有限,有效地对内存进行 管理
4.1 存储器的层次结构
一 多级存储器结构
• 在CPU的控制部件中,包含的寄存器有指令寄 存器和程序计数器;在中央处理器的算术及 逻辑部件中,包含的寄存器有累加器。
• 寄存器是内存阶层中的最顶端,也是系统获 得操作资料的最快速途径,完全能与CPU协调 工作。寄存器的长度通常以“字”作为单位。
2 主存储器
• 内存指的就是主板上的存储部件,用于保存 进程运行时的程序和数据。
系统区 作业 3 作业 1 分区 3 分区 4 作业 2
• 存储保护与重定位(地址转换) – 每个分区(一道程序)对应一对界地址寄 存器:上/下限寄存器。 – 采用静态重定位方式,即由链接/装入程序 完成。
• 优缺点 – 优点:简单,要求的硬件支持少(一对 R); – 缺点:存在大的碎片(内碎片),主存利 用率低。
CPU寄存器 高速缓存 主存 磁盘缓存 磁盘
可移动存储介质
可执行速存度储越器,快是 OS存价储管格理越的高对象。 信对息它掉的容电访量即问越丢,失通小。过
OS指令即可,耗费 时间少,访问速度快

操作系统第4章ppt课件

操作系统第4章ppt课件

THANKS
感谢观看
P/V操作
对信号量进行加减操作,实现进程同 步与互斥。
经典同步问题及其解决方法
1 2
生产者-消费者问题
通过两个信号量分别控制生产者和消费者进程, 确保生产者和消费者之间的同步与互斥。
哲学家进餐问题
通过引入资源分级法或信号量集机制,避免死锁 的发生,确保哲学家进餐过程中的同步与互斥。
3
读者-写者问题

多线程模型比较分析
01
多对一模型Leabharlann 将多个用户级线程映射到一个内核级线程上。该模型下,线程管理在用
户空间完成,线程的调度采用非抢占式调度,由线程库负责。
02
一对一模型
将每个用户级线程都映射到一个内核级线程上。该模型下,线程的创建
、撤销和同步等都在内核中实现,线程的调度由内核完成。
03
多对多模型
将多个用户级线程映射到少数但不止一个内核级线程上。该模型结合了
前两种模型的优点,允许多个用户级线程映射到不同的内核级线程上运
行。
线程同步与互斥机制
互斥锁
采用互斥对象机制,只有拥有互斥对象的线程才有访问公共 资源的权限。因为互斥对象只有一个,所以能保证公共资源 不会同时被多个线程同时访问。
信号量
信号量是一个整型变量,可以对其执行down和up操作,也 就是常见的P和V操作。信号量初始化为一个正数,表示并发 执行的线程数量。
死锁避免:银行家算法是一种典型的 死锁避免算法。该算法通过检查请求 资源的进程对资源的最大需求量是否 超过系统可用资源量来判断是否分配 资源给该进程。如果分配后系统剩余 资源量仍然能够满足其他进程的最大 需求量,则分配资源,否则不分配资 源。
死锁检测:通过定期运行死锁检测算 法来检测系统中是否存在死锁。常见 的死锁检测算法有资源分配图算法和 银行家算法等。如果检测到死锁发生 ,则需要采取相应措施来解除死锁, 例如通过撤销部分进程或抢占部分资 源来打破死锁状态。

自考《计算机系统结构》第4章精讲

自考《计算机系统结构》第4章精讲

第四章 指令系统的设计原理和风格 本章属重点章。

指令系统是计算机外特性的重要内容,本章主要介绍了两种不同风格的指令系统:RISC和CISC.在学习这两种指令系统之前,我们先了解⼀下什么是指令系统。

⼀、指令系统的设计(领会) 指令系统是指机器所具有的全部指令的集合。

它反映了计算机所拥有的基本功能。

它是机器语⾔程序员所看到的机器的主要属性之⼀。

通常我们说的加法指令、传输数据指令等等就是计算机的指令,这些指令就是告诉计算机从事某⼀特殊运算的代码,⼀种计算机系统确定的这些指令的集合我们就说它是这种机器的指令系统。

那么指令系统的设计要做什么?就是要确定它的指令格式(就是指令有多少位长,哪⼏位表⽰地址,哪⼏位表⽰操作等)、类型(如堆栈型、寄存器型等分类)、操作(⽐如运算、数据传送啊什么的都是指令中要确定的操作)以及操作数的访问⽅式(⼀个指令要访问数据,是按其地址访问还是按内容访问等也要由指令设计来解决)。

我们知道,由多条指令构成的程序是要以⼆进制的形式放到存储器中的,早期的存储器很昂贵,因此导致指令设计者尽量增强⼀条指令的复杂性以减少程序的长度。

还⽤微程序(就是保存在专⽤的存储器中的⼀⼩段程序,运⾏时只要⽤⼀条指令来启动它就可⽤来代替好多条指令)来改进代码密度。

这样的设计倾向形成了⼀种传统的指令设计风格,即认为计算机系统性能的提⾼主要依靠增加指令复杂性及其功能来获取。

这就是称为复杂指令系统(CISC)的设计风格。

我们现在⽤的PC机多是⽤这种设计风格的指令系统,⽐如MMX多媒体扩展指令等,都是增加进去的指令,是复杂指令。

后来,通过测试,这种不断增加指令复杂度的办法并不能使系统性能得到很⼤提⾼,反倒使指令系统实现更困难和费时。

所以在70年代中期⼜出现了另⼀种称为"简化指令系统(RISC)"的设计风格。

它的基本思想是,简单的指令能执⾏得更快以及指令系统只需由使且频率⾼的指令组成。

(插话) 指令系统在设计时,应特别注意的是如何能使编译系统⾼效、简易地将源程序翻译成⽬标代码。

计算机操作系统第四章-50页PPT资料

计算机操作系统第四章-50页PPT资料

CriCticoalpsyecrtigonh; t 2019-2019 CAristipcaol sseectiPont;y Ltd.
Flag[1] = false;
Flag[2] = false;
……
……
5.2
操 作 系
操 end.
end.



4
二十一世纪计算机本科教育

作 系
•第三次尝试
统 在第二次尝试的基础上,将其中的两个语句顺序交换,形 操 成了新的算法如下:
6

作 正确的算法
二十一世纪计算机本科教育

为了避免过度互斥礼让的问题,Dekker使用第一种尝试中的轮盘
统 变量turn,表示哪个进程有权进入临界区。让它在两个进程的活动中规

定一个强制性的顺序。算法如下:
Process P (1)
Process P (2)
作 Begin
begin
系 ……
Evaluation……only.
统te统d
Flag[1] = true;
witWhhiAle (sFplago[2s])ed.oSlides
{flag[1]=false;
{flag[2]=false;

系 flag[1]=true};
flag[2]=true};

统 Critical section;
Critical section;

操 Flag[1] = false;
Flag[2] = false;

作 ……
……
系 end.
end.
– 第二次尝试
5.2

计算机方法论-chapter 4 计算学科中的三个学科形态

计算机方法论-chapter 4  计算学科中的三个学科形态
有一组初始的、专门的符号集; 有一组精确定义的,由初始的、专门的符号组成的符号
串转换成另一个符号串的规则。 在形式语言中,不允许出现根据形成规则无法确定的符
号串。
15
形式语言的语法
形式语言的语法:形式语言中的转换规则。
语法不包含语义。
在一个给定的形式语言中,可以根据需要,通 过赋值或模型对其进行严格的语义解释,从而 构成形式语言的语义。
的。
30
3、预备知识——冯·诺依曼计算机 P64
1946年2月14日,世界上第一台数字电子计算机ENIAC在美 国宾夕法尼亚大学研制成功。
ENIAC的结构在很大程度上是依照机电系统设计的,还存在 重大的线路结构等问题。
在图灵等人工作的影响下,1946年6月,美国杰出的数学家 冯·诺依曼(Von Neumann)及其同事完成了关于“电子计 算装置逻辑结构设计”的研究报告, 具体介绍了制造电子计算机和程序设计的新思想 至今为止,大多数计算机采用的仍然是冯·诺依曼型计算机 的组织结构,只是作了一些改进而已。因此,冯·诺依曼被 人们誉为“计算机器之父”。
1、抽象形态
科学抽象是指在思维中对同类事物去除其现象 的、次要的方面,抽取其共同的、主要的方面,从 而做到从个别中把握一般,从现象中把握本质的认 知过程和思维方法。
学科中的抽象形态包含着具体的内容,它们是 学科中所具有的科学概念、科学符号和思想模型。
3
抽象形态——源于现实世界(建立对客观事物 进行抽象描述的方法,建立概念模型)
状态
q
控制器
23
图灵机
写在带子上的符号为一个有穷字母表: {S0,S1,S2,…,Sp}
可以认为这个有穷字母表仅有S0、S1两个字符 其中S0可以看作是“0”,S1可以看作是“1” 由 “0”和“1”组成的字母表可以表示任何一个数

深入理解计算机操作系统笔记

深入理解计算机操作系统笔记

深入理解计算机系统第一章计算机系统漫游信息就是位+上下文源程序实际上就是一个由0和1组成的位序列,这些位被组织成8个一组,称为字节。

每个字节都表示程序中某个文本字符。

不仅如此,系统中所有的信息,包括磁盘文件、存储器中的程序、存储器中存放的用户数据以及网络上传送的数据,都是一串比特表示的。

C与Unix操作系统关系密切/C是一个小而简单的语言/Unix操作系统由C编写文件的C语句必须被转换为一系列的低级机器语言指令按照一种称为可执行目标程序的格式(不同系统格式不一样)打好包并以二进制磁盘形式存放起来。

目标文件(二进制可执行文件可执行目标文件->可重定向目标文件)预处理器:hello.c→hello.i(.i结尾是C语言程序)编译器: hello.i→hello.s(.s结尾是汇编语言程序)汇编器(编译过程可以归结到编译过程中): hello.s→hello.o(将hello.s翻译成机器语言指令指令打包成为一种可重定向(目标程序)的格式将结果保存在hello.o中)链接器: hello.o→hello标准C库中的函数并入到hello.o程序中GUN环境包括EMACS编辑器GCC编译器GDB调试器汇编器链接器处理二进制文件的工具其他一些部件一个典型系统的硬件组织(图)总线:贯穿整个系统的一组电子管道,它携带信息字节并负责在各个部件间传递。

传送定长的字节块,也就是字。

字长是一个基本的系统参数,各个系统中不尽相同。

物理上来说贮存是由一组DRAM(动态随机存取存储器)芯片组成处理器解释(或执行)存储在主存中指令的引擎。

处理器的核心是一个被称为程序计数器(PC)的字长大小的存储设备(或寄存器)。

在任何一个时间点上,PC都指向主存中的某条机器语言指令(内含其地址)。

主存、寄存器文件(Resister file)、算术逻辑单元(ALU)之间循环高速缓存用来作为暂时的集结区域,存放存储器在不久的将来可能会需要的信息。

计算机系统结构第4章

计算机系统结构第4章

第4章 存贮体系 图4-1主存-辅存存储层次
第4章 存贮体系
虚拟存储器是从主存容量满足不了要求提出来的。在主 存和辅存之间,增设辅助的软、硬件设备,让它们构成一个 整体,所以也称为主存-辅存存储层次。如图4-1所示。从 CPU看,速度是接近于主存的,容量是辅存的,每位价格是 接近于辅存的。从CPU看,速度是接近于主存的,容量是辅 存的,每位价格是接近于辅存的。从速度上看,主存的访问 时间约为磁盘的访问时间的10-5,即快10万倍。从价格上看, 主存的每位价格约为磁盘的每位价格的103,即贵1000倍。如 果存储层次能以接近辅存的每位价格去构成等于辅存容量的 快速主存,就会大大提高存储器系统的性能价格比。
序(多用户)环境,而Cache存储器既可以是单用户环境也可以 是多用户环境。
第4章 存贮体系
4.1.3
为简单起见,以图4-4的二级体系(M1,M2)为例来分析。 设ci为Mi的每位价格,SMi为Mi的以位计算的存储容量,TAi为 CPU访问到Mi中的信息所需的时间。为评价存储层次性能, 引入存储层次的每位价格c、命中率H和等效访问时间TA。
第4章 存贮体系
为了进行段式管理,每道程序在系统中都有一个段(映 像)表来存放该道程序各段装入主存的状况信息。参看图4-6, 段表中的每一项(对应表中的每一行)描述该道程序一个段的 基本状况,由若干个字段提供。段名字段用于存放段的名称, 段名一般是有其逻辑意义的,也可以转换成用段号指明。由 于段号从0开始顺序编号,正好与段表中的行号对应,如2段 必是段表中的第3行,这样,段表中就可不设段号(名)字段。 装入位字段用来指示该段是否已经调入主存,“1”表示已装 入,“0”表示未装入。在程序的执行过程中,各段的装入位 随该段是否活跃而动态变化。

深入理解计算机系统答案(超高清电子版)

深入理解计算机系统答案(超高清电子版)

Computer Systems:A Programmer’s PerspectiveInstructor’s Solution Manual1Randal E.BryantDavid R.O’HallaronDecember4,20031Copyright c2003,R.E.Bryant,D.R.O’Hallaron.All rights reserved.2Chapter1Solutions to Homework ProblemsThe text uses two different kinds of exercises:Practice Problems.These are problems that are incorporated directly into the text,with explanatory solutions at the end of each chapter.Our intention is that students will work on these problems as they read the book.Each one highlights some particular concept.Homework Problems.These are found at the end of each chapter.They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.This document gives the solutions to the homework problems.1.1Chapter1:A Tour of Computer Systems1.2Chapter2:Representing and Manipulating InformationProblem2.40Solution:This exercise should be a straightforward variation on the existing code.2CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS1011void show_double(double x)12{13show_bytes((byte_pointer)&x,sizeof(double));14}code/data/show-ans.c 1int is_little_endian(void)2{3/*MSB=0,LSB=1*/4int x=1;56/*Return MSB when big-endian,LSB when little-endian*/7return(int)(*(char*)&x);8}1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION3 There are many solutions to this problem,but it is a little bit tricky to write one that works for any word size.Here is our solution:code/data/shift-ans.c The above code peforms a right shift of a word in which all bits are set to1.If the shift is arithmetic,the resulting word will still have all bits set to1.Problem2.45Solution:This problem illustrates some of the challenges of writing portable code.The fact that1<<32yields0on some32-bit machines and1on others is common source of bugs.A.The C standard does not define the effect of a shift by32of a32-bit datum.On the SPARC(andmany other machines),the expression x<<k shifts by,i.e.,it ignores all but the least significant5bits of the shift amount.Thus,the expression1<<32yields1.pute beyond_msb as2<<31.C.We cannot shift by more than15bits at a time,but we can compose multiple shifts to get thedesired effect.Thus,we can compute set_msb as2<<15<<15,and beyond_msb as set_msb<<1.Problem2.46Solution:This problem highlights the difference between zero extension and sign extension.It also provides an excuse to show an interesting trick that compilers often use to use shifting to perform masking and sign extension.A.The function does not perform any sign extension.For example,if we attempt to extract byte0fromword0xFF,we will get255,rather than.B.The following code uses a well-known trick for using shifts to isolate a particular range of bits and toperform sign extension at the same time.First,we perform a left shift so that the most significant bit of the desired byte is at bit position31.Then we right shift by24,moving the byte into the proper position and peforming sign extension at the same time.4CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 3int left=word<<((3-bytenum)<<3);4return left>>24;5}Problem2.48Solution:This problem lets students rework the proof that complement plus increment performs negation.We make use of the property that two’s complement addition is associative,commutative,and has additive ing C notation,if we define y to be x-1,then we have˜y+1equal to-y,and hence˜y equals -y+1.Substituting gives the expression-(x-1)+1,which equals-x.Problem2.49Solution:This problem requires a fairly deep understanding of two’s complement arithmetic.Some machines only provide one form of multiplication,and hence the trick shown in the code here is actually required to perform that actual form.As seen in Equation2.16we have.Thefinal term has no effect on the-bit representation of,but the middle term represents a correction factor that must be added to the high order bits.This is implemented as follows:code/data/uhp-ans.c Problem2.50Solution:Patterns of the kind shown here frequently appear in compiled code.1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION5A.:x+(x<<2)B.:x+(x<<3)C.:(x<<4)-(x<<1)D.:(x<<3)-(x<<6)Problem2.51Solution:Bit patterns similar to these arise in many applications.Many programmers provide them directly in hex-adecimal,but it would be better if they could express them in more abstract ways.A..˜((1<<k)-1)B..((1<<k)-1)<<jProblem2.52Solution:Byte extraction and insertion code is useful in many contexts.Being able to write this sort of code is an important skill to foster.code/data/rbyte-ans.c Problem2.53Solution:These problems are fairly tricky.They require generating masks based on the shift amounts.Shift value k equal to0must be handled as a special case,since otherwise we would be generating the mask by performing a left shift by32.6CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1unsigned srl(unsigned x,int k)2{3/*Perform shift arithmetically*/4unsigned xsra=(int)x>>k;5/*Make mask of low order32-k bits*/6unsigned mask=k?((1<<(32-k))-1):˜0;78return xsra&mask;9}code/data/rshift-ans.c 1int sra(int x,int k)2{3/*Perform shift logically*/4int xsrl=(unsigned)x>>k;5/*Make mask of high order k bits*/6unsigned mask=k?˜((1<<(32-k))-1):0;78return(x<0)?mask|xsrl:xsrl;9}.1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION7B.(a)For,we have,,code/data/floatge-ans.c 1int float_ge(float x,float y)2{3unsigned ux=f2u(x);4unsigned uy=f2u(y);5unsigned sx=ux>>31;6unsigned sy=uy>>31;78return9(ux<<1==0&&uy<<1==0)||/*Both are zero*/10(!sx&&sy)||/*x>=0,y<0*/11(!sx&&!sy&&ux>=uy)||/*x>=0,y>=0*/12(sx&&sy&&ux<=uy);/*x<0,y<0*/13},8CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS This exercise is of practical value,since Intel-compatible processors perform all of their arithmetic in ex-tended precision.It is interesting to see how adding a few more bits to the exponent greatly increases the range of values that can be represented.Description Extended precisionValueSmallest denorm.Largest norm.Problem2.59Solution:We have found that working throughfloating point representations for small word sizes is very instructive. Problems such as this one help make the description of IEEEfloating point more concrete.Description8000Smallest value4700Largest denormalized———code/data/fpwr2-ans.c1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS91/*Compute2**x*/2float fpwr2(int x){34unsigned exp,sig;5unsigned u;67if(x<-149){8/*Too small.Return0.0*/9exp=0;10sig=0;11}else if(x<-126){12/*Denormalized result*/13exp=0;14sig=1<<(x+149);15}else if(x<128){16/*Normalized result.*/17exp=x+127;18sig=0;19}else{20/*Too big.Return+oo*/21exp=255;22sig=0;23}24u=exp<<23|sig;25return u2f(u);26}10CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS int decode2(int x,int y,int z){int t1=y-z;int t2=x*t1;int t3=(t1<<31)>>31;int t4=t3ˆt2;return t4;}Problem3.32Solution:This code example demonstrates one of the pedagogical challenges of using a compiler to generate assembly code examples.Seemingly insignificant changes in the C code can yield very different results.Of course, students will have to contend with this property as work with machine-generated assembly code anyhow. They will need to be able to decipher many different code patterns.This problem encourages them to think in abstract terms about one such pattern.The following is an annotated version of the assembly code:1movl8(%ebp),%edx x2movl12(%ebp),%ecx y3movl%edx,%eax4subl%ecx,%eax result=x-y5cmpl%ecx,%edx Compare x:y6jge.L3if>=goto done:7movl%ecx,%eax8subl%edx,%eax result=y-x9.L3:done:A.When,it will computefirst and then.When it just computes.B.The code for then-statement gets executed unconditionally.It then jumps over the code for else-statement if the test is false.C.then-statementt=test-expr;if(t)goto done;else-statementdone:D.The code in then-statement must not have any side effects,other than to set variables that are also setin else-statement.1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS11Problem3.33Solution:This problem requires students to reason about the code fragments that implement the different branches of a switch statement.For this code,it also requires understanding different forms of pointer dereferencing.A.In line29,register%edx is copied to register%eax as the return value.From this,we can infer that%edx holds result.B.The original C code for the function is as follows:1/*Enumerated type creates set of constants numbered0and upward*/2typedef enum{MODE_A,MODE_B,MODE_C,MODE_D,MODE_E}mode_t;34int switch3(int*p1,int*p2,mode_t action)5{6int result=0;7switch(action){8case MODE_A:9result=*p1;10*p1=*p2;11break;12case MODE_B:13*p2+=*p1;14result=*p2;15break;16case MODE_C:17*p2=15;18result=*p1;19break;20case MODE_D:21*p2=*p1;22/*Fall Through*/23case MODE_E:24result=17;25break;26default:27result=-1;28}29return result;30}Problem3.34Solution:This problem gives students practice analyzing disassembled code.The switch statement contains all the features one can imagine—cases with multiple labels,holes in the range of possible case values,and cases that fall through.12CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1int switch_prob(int x)2{3int result=x;45switch(x){6case50:7case52:8result<<=2;9break;10case53:11result>>=2;12break;13case54:14result*=3;15/*Fall through*/16case55:17result*=result;18/*Fall through*/19default:20result+=10;21}2223return result;24}code/asm/varprod-ans.c 1int var_prod_ele_opt(var_matrix A,var_matrix B,int i,int k,int n) 2{3int*Aptr=&A[i*n];4int*Bptr=&B[k];5int result=0;6int cnt=n;78if(n<=0)9return result;1011do{12result+=(*Aptr)*(*Bptr);13Aptr+=1;14Bptr+=n;15cnt--;1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS13 16}while(cnt);1718return result;19}code/asm/structprob-ans.c 1typedef struct{2int idx;3int x[4];4}a_struct;14CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1/*Read input line and write it back*/2/*Code will work for any buffer size.Bigger is more time-efficient*/ 3#define BUFSIZE644void good_echo()5{6char buf[BUFSIZE];7int i;8while(1){9if(!fgets(buf,BUFSIZE,stdin))10return;/*End of file or error*/11/*Print characters in buffer*/12for(i=0;buf[i]&&buf[i]!=’\n’;i++)13if(putchar(buf[i])==EOF)14return;/*Error*/15if(buf[i]==’\n’){16/*Reached terminating newline*/17putchar(’\n’);18return;19}20}21}An alternative implementation is to use getchar to read the characters one at a time.Problem3.38Solution:Successfully mounting a buffer overflow attack requires understanding many aspects of machine-level pro-grams.It is quite intriguing that by supplying a string to one function,we can alter the behavior of another function that should always return afixed value.In assigning this problem,you should also give students a stern lecture about ethical computing practices and dispell any notion that hacking into systems is a desirable or even acceptable thing to do.Our solution starts by disassembling bufbomb,giving the following code for getbuf: 1080484f4<getbuf>:280484f4:55push%ebp380484f5:89e5mov%esp,%ebp480484f7:83ec18sub$0x18,%esp580484fa:83c4f4add$0xfffffff4,%esp680484fd:8d45f4lea0xfffffff4(%ebp),%eax78048500:50push%eax88048501:e86a ff ff ff call8048470<getxs>98048506:b801000000mov$0x1,%eax10804850b:89ec mov%ebp,%esp11804850d:5d pop%ebp12804850e:c3ret13804850f:90nopWe can see on line6that the address of buf is12bytes below the saved value of%ebp,which is4bytes below the return address.Our strategy then is to push a string that contains12bytes of code,the saved value1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS15 of%ebp,and the address of the start of the buffer.To determine the relevant values,we run GDB as follows:1.First,we set a breakpoint in getbuf and run the program to that point:(gdb)break getbuf(gdb)runComparing the stopping point to the disassembly,we see that it has already set up the stack frame.2.We get the value of buf by computing a value relative to%ebp:(gdb)print/x(%ebp+12)This gives0xbfffefbc.3.Wefind the saved value of register%ebp by dereferencing the current value of this register:(gdb)print/x*$ebpThis gives0xbfffefe8.4.Wefind the value of the return pointer on the stack,at offset4relative to%ebp:(gdb)print/x*((int*)$ebp+1)This gives0x8048528We can now put this information together to generate assembly code for our attack:1pushl$0x8048528Put correct return pointer back on stack2movl$0xdeadbeef,%eax Alter return value3ret Re-execute return4.align4Round up to125.long0xbfffefe8Saved value of%ebp6.long0xbfffefbc Location of buf7.long0x00000000PaddingNote that we have used the.align statement to get the assembler to insert enough extra bytes to use up twelve bytes for the code.We added an extra4bytes of0s at the end,because in some cases OBJDUMP would not generate the complete byte pattern for the data.These extra bytes(plus the termininating null byte)will overflow into the stack frame for test,but they will not affect the program behavior. Assembling this code and disassembling the object code gives us the following:10:6828850408push$0x804852825:b8ef be ad de mov$0xdeadbeef,%eax3a:c3ret4b:90nop Byte inserted for alignment.5c:e8ef ff bf bc call0xbcc00000Invalid disassembly.611:ef out%eax,(%dx)Trying to diassemble712:ff(bad)data813:bf00000000mov$0x0,%edi16CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS From this we can read off the byte sequence:6828850408b8ef be ad de c390e8ef ff bf bc ef ff bf00000000Problem3.39Solution:This problem is a variant on the asm examples in the text.The code is actually fairly simple.It relies on the fact that asm outputs can be arbitrary lvalues,and hence we can use dest[0]and dest[1]directly in the output list.code/asm/asmprobs-ans.c Problem3.40Solution:For this example,students essentially have to write the entire function in assembly.There is no(apparent) way to interface between thefloating point registers and the C code using extended asm.code/asm/fscale.c1.4.CHAPTER4:PROCESSOR ARCHITECTURE17 1.4Chapter4:Processor ArchitectureProblem4.32Solution:This problem makes students carefully examine the tables showing the computation stages for the different instructions.The steps for iaddl are a hybrid of those for irmovl and OPl.StageFetchrA:rB M PCvalP PCExecuteR rB valEPC updateleaveicode:ifun M PCDecodevalB RvalE valBMemoryWrite backR valMPC valPProblem4.34Solution:The following HCL code includes implementations of both the iaddl instruction and the leave instruc-tions.The implementations are fairly straightforward given the computation steps listed in the solutions to problems4.32and4.33.You can test the solutions using the test code in the ptest subdirectory.Make sure you use command line argument‘-i.’18CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1####################################################################2#HCL Description of Control for Single Cycle Y86Processor SEQ#3#Copyright(C)Randal E.Bryant,David R.O’Hallaron,2002#4####################################################################56##This is the solution for the iaddl and leave problems78####################################################################9#C Include’s.Don’t alter these#10#################################################################### 1112quote’#include<stdio.h>’13quote’#include"isa.h"’14quote’#include"sim.h"’15quote’int sim_main(int argc,char*argv[]);’16quote’int gen_pc(){return0;}’17quote’int main(int argc,char*argv[])’18quote’{plusmode=0;return sim_main(argc,argv);}’1920####################################################################21#Declarations.Do not change/remove/delete any of these#22#################################################################### 2324#####Symbolic representation of Y86Instruction Codes#############25intsig INOP’I_NOP’26intsig IHALT’I_HALT’27intsig IRRMOVL’I_RRMOVL’28intsig IIRMOVL’I_IRMOVL’29intsig IRMMOVL’I_RMMOVL’30intsig IMRMOVL’I_MRMOVL’31intsig IOPL’I_ALU’32intsig IJXX’I_JMP’33intsig ICALL’I_CALL’34intsig IRET’I_RET’35intsig IPUSHL’I_PUSHL’36intsig IPOPL’I_POPL’37#Instruction code for iaddl instruction38intsig IIADDL’I_IADDL’39#Instruction code for leave instruction40intsig ILEAVE’I_LEAVE’4142#####Symbolic representation of Y86Registers referenced explicitly##### 43intsig RESP’REG_ESP’#Stack Pointer44intsig REBP’REG_EBP’#Frame Pointer45intsig RNONE’REG_NONE’#Special value indicating"no register"4647#####ALU Functions referenced explicitly##### 48intsig ALUADD’A_ADD’#ALU should add its arguments4950#####Signals that can be referenced by control logic####################1.4.CHAPTER4:PROCESSOR ARCHITECTURE195152#####Fetch stage inputs#####53intsig pc’pc’#Program counter54#####Fetch stage computations#####55intsig icode’icode’#Instruction control code56intsig ifun’ifun’#Instruction function57intsig rA’ra’#rA field from instruction58intsig rB’rb’#rB field from instruction59intsig valC’valc’#Constant from instruction60intsig valP’valp’#Address of following instruction 6162#####Decode stage computations#####63intsig valA’vala’#Value from register A port64intsig valB’valb’#Value from register B port 6566#####Execute stage computations#####67intsig valE’vale’#Value computed by ALU68boolsig Bch’bcond’#Branch test6970#####Memory stage computations#####71intsig valM’valm’#Value read from memory727374####################################################################75#Control Signal Definitions.#76#################################################################### 7778################Fetch Stage################################### 7980#Does fetched instruction require a regid byte?81bool need_regids=82icode in{IRRMOVL,IOPL,IPUSHL,IPOPL,83IIADDL,84IIRMOVL,IRMMOVL,IMRMOVL};8586#Does fetched instruction require a constant word?87bool need_valC=88icode in{IIRMOVL,IRMMOVL,IMRMOVL,IJXX,ICALL,IIADDL};8990bool instr_valid=icode in91{INOP,IHALT,IRRMOVL,IIRMOVL,IRMMOVL,IMRMOVL,92IIADDL,ILEAVE,93IOPL,IJXX,ICALL,IRET,IPUSHL,IPOPL};9495################Decode Stage################################### 9697##What register should be used as the A source?98int srcA=[99icode in{IRRMOVL,IRMMOVL,IOPL,IPUSHL}:rA;20CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 101icode in{IPOPL,IRET}:RESP;1021:RNONE;#Don’t need register103];104105##What register should be used as the B source?106int srcB=[107icode in{IOPL,IRMMOVL,IMRMOVL}:rB;108icode in{IIADDL}:rB;109icode in{IPUSHL,IPOPL,ICALL,IRET}:RESP;110icode in{ILEAVE}:REBP;1111:RNONE;#Don’t need register112];113114##What register should be used as the E destination?115int dstE=[116icode in{IRRMOVL,IIRMOVL,IOPL}:rB;117icode in{IIADDL}:rB;118icode in{IPUSHL,IPOPL,ICALL,IRET}:RESP;119icode in{ILEAVE}:RESP;1201:RNONE;#Don’t need register121];122123##What register should be used as the M destination?124int dstM=[125icode in{IMRMOVL,IPOPL}:rA;126icode in{ILEAVE}:REBP;1271:RNONE;#Don’t need register128];129130################Execute Stage###################################131132##Select input A to ALU133int aluA=[134icode in{IRRMOVL,IOPL}:valA;135icode in{IIRMOVL,IRMMOVL,IMRMOVL}:valC;136icode in{IIADDL}:valC;137icode in{ICALL,IPUSHL}:-4;138icode in{IRET,IPOPL}:4;139icode in{ILEAVE}:4;140#Other instructions don’t need ALU141];142143##Select input B to ALU144int aluB=[145icode in{IRMMOVL,IMRMOVL,IOPL,ICALL,146IPUSHL,IRET,IPOPL}:valB;147icode in{IIADDL,ILEAVE}:valB;148icode in{IRRMOVL,IIRMOVL}:0;149#Other instructions don’t need ALU1.4.CHAPTER4:PROCESSOR ARCHITECTURE21151152##Set the ALU function153int alufun=[154icode==IOPL:ifun;1551:ALUADD;156];157158##Should the condition codes be updated?159bool set_cc=icode in{IOPL,IIADDL};160161################Memory Stage###################################162163##Set read control signal164bool mem_read=icode in{IMRMOVL,IPOPL,IRET,ILEAVE};165166##Set write control signal167bool mem_write=icode in{IRMMOVL,IPUSHL,ICALL};168169##Select memory address170int mem_addr=[171icode in{IRMMOVL,IPUSHL,ICALL,IMRMOVL}:valE;172icode in{IPOPL,IRET}:valA;173icode in{ILEAVE}:valA;174#Other instructions don’t need address175];176177##Select memory input data178int mem_data=[179#Value from register180icode in{IRMMOVL,IPUSHL}:valA;181#Return PC182icode==ICALL:valP;183#Default:Don’t write anything184];185186################Program Counter Update############################187188##What address should instruction be fetched at189190int new_pc=[191#e instruction constant192icode==ICALL:valC;193#Taken e instruction constant194icode==IJXX&&Bch:valC;195#Completion of RET e value from stack196icode==IRET:valM;197#Default:Use incremented PC1981:valP;199];22CHAPTER 1.SOLUTIONS TO HOMEWORK PROBLEMSME DMispredictE DM E DM M E D E DMGen./use 1W E DM Gen./use 2WE DM Gen./use 3W Figure 1.1:Pipeline states for special control conditions.The pairs connected by arrows can arisesimultaneously.code/arch/pipe-nobypass-ans.hcl1.4.CHAPTER4:PROCESSOR ARCHITECTURE232#At most one of these can be true.3bool F_bubble=0;4bool F_stall=5#Stall if either operand source is destination of6#instruction in execute,memory,or write-back stages7d_srcA!=RNONE&&d_srcA in8{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||9d_srcB!=RNONE&&d_srcB in10{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||11#Stalling at fetch while ret passes through pipeline12IRET in{D_icode,E_icode,M_icode};1314#Should I stall or inject a bubble into Pipeline Register D?15#At most one of these can be true.16bool D_stall=17#Stall if either operand source is destination of18#instruction in execute,memory,or write-back stages19#but not part of mispredicted branch20!(E_icode==IJXX&&!e_Bch)&&21(d_srcA!=RNONE&&d_srcA in22{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||23d_srcB!=RNONE&&d_srcB in24{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE});2526bool D_bubble=27#Mispredicted branch28(E_icode==IJXX&&!e_Bch)||29#Stalling at fetch while ret passes through pipeline30!(E_icode in{IMRMOVL,IPOPL}&&E_dstM in{d_srcA,d_srcB})&&31#but not condition for a generate/use hazard32!(d_srcA!=RNONE&&d_srcA in33{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}||34d_srcB!=RNONE&&d_srcB in35{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE})&&36IRET in{D_icode,E_icode,M_icode};3738#Should I stall or inject a bubble into Pipeline Register E?39#At most one of these can be true.40bool E_stall=0;41bool E_bubble=42#Mispredicted branch43(E_icode==IJXX&&!e_Bch)||44#Inject bubble if either operand source is destination of45#instruction in execute,memory,or write back stages46d_srcA!=RNONE&&47d_srcA in{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE}|| 48d_srcB!=RNONE&&49d_srcB in{E_dstM,E_dstE,M_dstM,M_dstE,W_dstM,W_dstE};5024CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 52#At most one of these can be true.53bool M_stall=0;54bool M_bubble=0;code/arch/pipe-full-ans.hcl 1####################################################################2#HCL Description of Control for Pipelined Y86Processor#3#Copyright(C)Randal E.Bryant,David R.O’Hallaron,2002#4####################################################################56##This is the solution for the iaddl and leave problems78####################################################################9#C Include’s.Don’t alter these#10#################################################################### 1112quote’#include<stdio.h>’13quote’#include"isa.h"’14quote’#include"pipeline.h"’15quote’#include"stages.h"’16quote’#include"sim.h"’17quote’int sim_main(int argc,char*argv[]);’18quote’int main(int argc,char*argv[]){return sim_main(argc,argv);}’1920####################################################################21#Declarations.Do not change/remove/delete any of these#22#################################################################### 2324#####Symbolic representation of Y86Instruction Codes#############25intsig INOP’I_NOP’26intsig IHALT’I_HALT’27intsig IRRMOVL’I_RRMOVL’28intsig IIRMOVL’I_IRMOVL’29intsig IRMMOVL’I_RMMOVL’30intsig IMRMOVL’I_MRMOVL’31intsig IOPL’I_ALU’32intsig IJXX’I_JMP’33intsig ICALL’I_CALL’34intsig IRET’I_RET’1.4.CHAPTER4:PROCESSOR ARCHITECTURE25 36intsig IPOPL’I_POPL’37#Instruction code for iaddl instruction38intsig IIADDL’I_IADDL’39#Instruction code for leave instruction40intsig ILEAVE’I_LEAVE’4142#####Symbolic representation of Y86Registers referenced explicitly##### 43intsig RESP’REG_ESP’#Stack Pointer44intsig REBP’REG_EBP’#Frame Pointer45intsig RNONE’REG_NONE’#Special value indicating"no register"4647#####ALU Functions referenced explicitly##########################48intsig ALUADD’A_ADD’#ALU should add its arguments4950#####Signals that can be referenced by control logic##############5152#####Pipeline Register F##########################################5354intsig F_predPC’pc_curr->pc’#Predicted value of PC5556#####Intermediate Values in Fetch Stage###########################5758intsig f_icode’if_id_next->icode’#Fetched instruction code59intsig f_ifun’if_id_next->ifun’#Fetched instruction function60intsig f_valC’if_id_next->valc’#Constant data of fetched instruction 61intsig f_valP’if_id_next->valp’#Address of following instruction 6263#####Pipeline Register D##########################################64intsig D_icode’if_id_curr->icode’#Instruction code65intsig D_rA’if_id_curr->ra’#rA field from instruction66intsig D_rB’if_id_curr->rb’#rB field from instruction67intsig D_valP’if_id_curr->valp’#Incremented PC6869#####Intermediate Values in Decode Stage#########################7071intsig d_srcA’id_ex_next->srca’#srcA from decoded instruction72intsig d_srcB’id_ex_next->srcb’#srcB from decoded instruction73intsig d_rvalA’d_regvala’#valA read from register file74intsig d_rvalB’d_regvalb’#valB read from register file 7576#####Pipeline Register E##########################################77intsig E_icode’id_ex_curr->icode’#Instruction code78intsig E_ifun’id_ex_curr->ifun’#Instruction function79intsig E_valC’id_ex_curr->valc’#Constant data80intsig E_srcA’id_ex_curr->srca’#Source A register ID81intsig E_valA’id_ex_curr->vala’#Source A value82intsig E_srcB’id_ex_curr->srcb’#Source B register ID83intsig E_valB’id_ex_curr->valb’#Source B value84intsig E_dstE’id_ex_curr->deste’#Destination E register ID。

深入理解计算机系统答案(超高清电子版)

深入理解计算机系统答案(超高清电子版)

深⼊理解计算机系统答案(超⾼清电⼦版)Computer Systems:A Programmer’s Perspective Instructor’s Solution Manual1Randal E.BryantDavid R.O’HallaronDecember4,20032Chapter1Solutions to Homework ProblemsThe text uses two different kinds of exercises:Practice Problems.These are problems that are incorporated directly into the text,with explanatory solutions at the end of each chapter.Our intention is that students will work on these problems as they read the book.Each one highlights some particular concept.Homework Problems.These are found at the end of each chapter.They vary in complexity from simple drills to multi-week labs and are designed for instructors to give as assignments or to use as recitation examples.This document gives the solutions to the homework problems.1.1Chapter1:A Tour of Computer Systems1.2Chapter2:Representing and Manipulating InformationProblem2.40Solution:This exercise should be a straightforward variation on the existing code.2CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS1011void show_double(double x)12{13show_bytes((byte_pointer)&x,sizeof(double));14}code/data/show-ans.c 1int is_little_endian(void)3/*MSB=0,LSB=1*/4int x=1;56/*Return MSB when big-endian,LSB when little-endian*/7return(int)(*(char*)&x);8}1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION3 There are many solutions to this problem,but it is a little bit tricky to write one that works for any word size.Here is our solution:code/data/shift-ans.c The above code peforms a right shift of a word in which all bits are set to1.If the shift is arithmetic,the resulting word will still have all bits set to1.Problem2.45Solution:This problem illustrates some of the challenges of writing portable code.The fact that1<<32yields0on some32-bit machines and1on others is common source of bugs.A.The C standard does not de?ne the effect of a shift by32of a32-bit datum.On the SPARC(andmany other machines),the expression x</doc/dde1f034f111f18583d05a59.html pute beyond_msb as2<<31.C.We cannot shift by more than15bits at a time,but we can compose multiple shifts to get thedesired effect.Thus,we can compute set_msb as2<<15<<15,and beyond_msb as set_msb<<1.Problem2.46Solution:This problem highlights the difference between zero extension and sign extension.It also provides an excuse to show an interesting trick that compilers often use to use shifting to perform masking and sign extension.A.The function does not perform any sign extension.For example,if we attempt to extract byte0fromword0xFF,we will get255,rather than.B.The following code uses a well-known trick for using shifts to isolate a particular range of bits and toperform sign extension at the same time.First,we perform a left shift so that the most signi?cant bit of the desired byte is at bit position31.Then we right shift by24,moving the byte into the proper position and peforming sign extension at the same time. 4CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 3int left=word<<((3-bytenum)<<3);4return left>>24;5}Problem2.48Solution:This problem lets students rework the proof that complement plus increment performs negation.We make use of the property that two’s complement addition is associative,commutative,and has additive/doc/dde1f034f111f18583d05a59.html ing C notation,if we de?ne y to be x-1,then we have?y+1equal to-y,and hence?y equals -y+1.Substituting gives the expression-(x-1)+1,which equals-x.Problem2.49Solution:This problem requires a fairly deep understanding of two’s complement arithmetic.Some machines only provide one form of multiplication,and hence the trick shown in the code here is actually required to perform that actual form.As seen in Equation2.16we have.The?nal term has no effect on the-bit representation of,but the middle term represents acode/data/uhp-ans.c Problem2.50Solution:1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION5A.:x+(x<<2)B.:x+(x<<3)C.:(x<<4)-(x<<1)D.:(x<<3)-(x<<6)Problem2.51Solution:Bit patterns similar to these arise in many applications.Many programmers provide them directly in hex-adecimal,but it would be better if they could express them in more abstract ways.A..((1<B..((1<Problem2.52Solution:Byte extraction and insertion code is useful in many contexts.Being able to write this sort of code is an important skill to foster.code/data/rbyte-ans.c Problem2.53Solution:These problems are fairly tricky.They require generating masks based on the shift amounts.Shift value k equal to0must be handled as a special case,since otherwise we would be generating the mask by performing a left shift by32.6CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1unsigned srl(unsigned x,int k)2{3/*Perform shift arithmetically*/4unsigned xsra=(int)x>>k;5/*Make mask of low order32-k bits*/6unsigned mask=k?((1<<(32-k))-1):?0;78return xsra&mask;9}code/data/rshift-ans.c 1int sra(int x,int k)2{3/*Perform shift logically*/4int xsrl=(unsigned)x>>k;5/*Make mask of high order k bits*/6unsigned mask=k??((1<<(32-k))-1):0;78return(x<0)?mask|xsrl:xsrl;1.2.CHAPTER2:REPRESENTING AND MANIPULATING INFORMATION7B.(a)For,we have,,code/data/?oatge-ans.c 1int float_ge(float x,float y)2{3unsigned ux=f2u(x);4unsigned uy=f2u(y);5unsigned sx=ux>>31;6unsigned sy=uy>>31;78return9(ux<<1==0&&uy<<1==0)||/*Both are zero*/10(!sx&&sy)||/*x>=0,y<0*/11(!sx&&!sy&&ux>=uy)||/*x>=0,y>=0*/12(sx&&sy&&ux<=uy);/*x<0,y<0*/13},8CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS This exercise is of practical value,since Intel-compatible processors perform all of their arithmetic in ex-tended precision.It is interesting to see how adding a few more bits to the exponent greatly increases the range of values that can be represented.Description Extended precisionValueSmallest denorm.Largest norm.Problem2.59Solution:We have found that working through?oating point representations for small word sizes is very instructive. Problems such as this one help make the description of IEEE?oating point more concrete.Description8000Smallest value4700Largest denormalized———1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS91/*Compute2**x*/2float fpwr2(int x){4unsigned exp,sig;5unsigned u;67if(x<-149){8/*Too small.Return0.0*/9exp=0;10sig=0;11}else if(x<-126){12/*Denormalized result*/13exp=0;14sig=1<<(x+149);15}else if(x<128){16/*Normalized result.*/17exp=x+127;18sig=0;19}else{20/*Too big.Return+oo*/21exp=255;22sig=0;23}24u=exp<<23|sig;25return u2f(u);26}10CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS int decode2(int x,int y,int z){int t1=y-z;int t2=x*t1;int t3=(t1<<31)>>31;int t4=t3?t2;return t4;}Problem3.32Solution:This code example demonstrates one of the pedagogical challenges of using a compiler to generate assembly code examples.Seemingly insigni?cant changes in the C code can yield very different results.Of course, students will have to contend with this property as work with machine-generated assembly code anyhow. They will need to be able to decipher many different code patterns.This problem encourages them to think in abstract terms about one such pattern.1movl8(%ebp),%edx x2movl12(%ebp),%ecx y3movl%edx,%eax4subl%ecx,%eax result=x-y5cmpl%ecx,%edx Compare x:y6jge.L3if>=goto done:7movl%ecx,%eax8subl%edx,%eax result=y-x9.L3:done:A.When,it will compute?rst and then.When it just computes.B.The code for then-statement gets executed unconditionally.It then jumps over the code for else-statement if the test is false.C.then-statementt=test-expr;if(t)goto done;else-statementdone:D.The code in then-statement must not have any side effects,other than to set variables that are also set1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS11Problem3.33Solution:This problem requires students to reason about the code fragments that implement the different branches of a switch statement.For this code,it also requires understanding different forms of pointer dereferencing.A.In line29,register%edx is copied to register%eax as the return value.From this,we can infer that%edx holds result.B.The original C code for the function is as follows:1/*Enumerated type creates set of constants numbered0and upward*/2typedef enum{MODE_A,MODE_B,MODE_C,MODE_D,MODE_E}mode_t;34int switch3(int*p1,int*p2,mode_t action)5{6int result=0;7switch(action){8case MODE_A:12case MODE_B:13*p2+=*p1;14result=*p2;15break;16case MODE_C:17*p2=15;18result=*p1;19break;20case MODE_D:21*p2=*p1;22/*Fall Through*/23case MODE_E:24result=17;25break;26default:27result=-1;28}29return result;30}Problem3.34Solution:This problem gives students practice analyzing disassembled code.The switch statement contains all the features one can imagine—cases with multiple labels,holes in the range of possible case values,and cases that fall through.12CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1int switch_prob(int x)2{3int result=x;45switch(x){6case50:7case52:8result<<=2;9break;10case53:11result>>=2;15/*Fall through*/16case55:17result*=result;18/*Fall through*/19default:20result+=10;21}2223return result;24}code/asm/varprod-ans.c 1int var_prod_ele_opt(var_matrix A,var_matrix B,int i,int k,int n) 2{3int*Aptr=&A[i*n];4int*Bptr=&B[k];5int result=0;6int cnt=n;78if(n<=0)9return result;1011do{12result+=(*Aptr)*(*Bptr);13Aptr+=1;14Bptr+=n;1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS13 16}while(cnt); 1718return result;19}code/asm/structprob-ans.c 1typedef struct{2int idx;3int x[4];4}a_struct;14CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS 1/*Read input line and write it back*/ 2/*Code will work for any buffer size.Bigger is more time-efficient*/ 3#define BUFSIZE644void good_echo()5{6char buf[BUFSIZE];7int i;8while(1){9if(!fgets(buf,BUFSIZE,stdin))10return;/*End of file or error*/11/*Print characters in buffer*/12for(i=0;buf[i]&&buf[i]!=’\n’;i++)13if(putchar(buf[i])==EOF)14return;/*Error*/15if(buf[i]==’\n’){16/*Reached terminating newline*/17putchar(’\n’);18return;19}20}21}An alternative implementation is to use getchar to read the characters one at a time.Problem3.38Solution:Successfully mounting a buffer over?ow attack requires understanding many aspects of machine-level pro-grams.It is quite intriguing that by supplying a string to one function,we can alter the behavior of another function that should always return a? xed value.In assigning this problem,you should also give students a stern lecture about ethical computing practices and dispell any notion that hacking into systems is a desirable or even acceptable thing to do.Our solution starts by disassembling bufbomb,giving the following code for getbuf: 1080484f4:280484f4:55push%ebp380484f5:89e5mov%esp,%ebp480484f7:83ec18sub$0x18,%esp580484fa:83c4f4add$0xfffffff4,%esp680484fd:8d45f4lea0xfffffff4(%ebp),%eax78048500:50push%eax88048501:e86a ff ff ff call804847098048506:b801000000mov$0x1,%eax10804850b:89ec mov%ebp,%esp11804850d:5d pop%ebp12804850e:c3retWe can see on line6that the address of buf is12bytes below the saved value of%ebp,which is4bytes1.3.CHAPTER3:MACHINE LEVEL REPRESENTATION OF C PROGRAMS15 of%ebp,and the address of the start of the buffer.To determine the relevant values,we run GDB as follows:1.First,we set a breakpoint in getbuf and run the program to that point:(gdb)break getbuf(gdb)runComparing the stopping point to the disassembly,we see that it has already set up the stack frame.2.We get the value of buf by computing a value relative to%ebp:(gdb)print/x(%ebp+12)This gives0xbfffefbc.3.We?nd the saved value of register%ebp by dereferencing the current value of this register:(gdb)print/x*$ebpThis gives0xbfffefe8.4.We?nd the value of the return pointer on the stack,at offset4relative to%ebp:(gdb)print/x*((int*)$ebp+1)This gives0x8048528We can now put this information together to generate assembly code for our attack:1pushl$0x8048528Put correct return pointer back on stack2movl$0xdeadbeef,%eax Alter return value3ret Re-execute return4.align4Round up to125.long0xbfffefe8Saved value of%ebp6.long0xbfffefbc Location of buf7.long0x00000000PaddingNote that we have used the.align statement to get the assembler to insert enough extra bytes to use up twelve bytes for the code.We added an extra4bytes of0s at the end,because in some cases OBJDUMP would not generate the complete byte pattern for the data.These extra bytes(plus the termininating null byte)will over?ow into the stack frame for test,but they will not affect the program behavior. Assembling this code and disassembling the object code gives us the following:10:6828850408push$0x804852825:b8ef be ad de mov$0xdeadbeef,%eax3a:c3ret4b:90nop Byte inserted for alignment.5c:e8ef ff bf bc call0xbcc00000Invalid disassembly.611:ef out%eax,(%dx)Trying to diassemble712:ff(bad)data16CHAPTER1.SOLUTIONS TO HOMEWORK PROBLEMS From this we can read off the byte sequence:Problem3.39Solution:This problem is a variant on the asm examples in the text.The code is actually fairly simple.It relies on the fact that asm outputs can be arbitrary lvalues,and hence we can use dest[0]and dest[1]directly in the output list.code/asm/asmprobs-ans.c Problem3.40Solution:For this example,students essentially have to write the entire function in assembly.There is no(apparent) way to interface between the?oating point registers and the C code using extended asm.code/asm/fscale.c1.4.CHAPTER4:PROCESSOR ARCHITECTURE17 1.4Chapter4:Processor ArchitectureProblem4.32Solution:This problem makes students carefully examine the tables showing the computation stages for the different instructions.The steps for iaddl are a hybrid of those for irmovl and OPl.StageFetchrA:rB M PCvalP PCExecuteR rB valEPC updateleaveicode:ifun M PCDecodevalB RvalE valBMemoryWrite backR valMPC valPProblem4.34Solution:The following HCL code includes implementations of both the iaddl instruction and the leave instruc-tions.The implementations are fairly straightforward given the computation steps listed in the solutions to problems4.32and4.33.You can test the solutions using the test code in the ptest subdirectory.Make sure you use command line argument‘-i.’。

深入理解计算机系统chapter

深入理解计算机系统chapter

科学计算
用于解决复杂的数学问题和模 拟实验,如天气预报、核爆炸 模拟等。
人工智能
用于模拟人类智能行为,如语 音识别、图像识别等。
网络通信
用于实现计算机之间的数据传 输和通信,如互联网、物联网 等。
02
计算机硬件系统
中央处理器
运算器
执行算术和逻辑运算,处理数据。
控制器
控制计算机各部件协调工作,保证计算机按 照程序设定的步骤执行。
复杂指令集(CISC)和精简指令集(RISC)
CISC指令丰富,功能强大,但复杂度高;RISC指令精简,设计思路简洁,效率高。
程序的执行过程
程序的加载
将程序从外存加载到内存,为执行做好准备。
指令的取指与执行
根据程序计数器(PC)的值从内存中取出指令,解码并执行。
数据的存取与运算
根据指令对数据进行存取和运算,结果存回寄存器或内存。
深入理解计算机系统 chapter
目录
• 计算机系统概述 • 计算机硬件系统 • 计算机软件系统 • 计算机系统的工作原理 • 计算机系统的性能评价 • 计算机系统的安全与可靠性
01
计算机系统概述
计算机系统的组成
硬件
包括中央处理器(CPU)、内存、输入输出设备 等,提供计算能力和数据存储。
软件
输出设备
将计算机处理后的结果转换为人类可读的形式,如显 示器、打印机、音响等。
输入/输出接口
连接输入/输出设备与计算机主机的桥梁,实现数据 的传输和控制。
03
计算机软件系统
系统软件
操作系统
提供计算机硬件与应用程序之间的接口, 控制和管理计算机的硬件及软件资源。
数据库管理系统
用于存储、检索、定义和管理大量数 据的软件,提供数据组织和访问的方

深入理解计算机系统chapter4

深入理解计算机系统chapter4

4.1.3 Instruction Encoding
One important property of any instruction set is that the byte encodings must have a unique interpretation.An arbitrary sequence of bytes either encodes a unique instruction sequence or is not a legal byte sequence.This property holds for Y86, because every instruction has a unique combination of code and function in its initial byte, and given this byte, we can determine the length and meaning of any additional bytes.This property ensures that a processor can execute an object-code program without any ambiguity about the meaning of the code. Even if the code is embedded within other bytes in the program, we can readily determine the instruction sequence as long as we start from the first byte in the sequence. On the other hand,if we do not know the starting position of a code sequence, we cannot reliably determine how to split the sequence into individual instructions. This causes problems for disassemblers and other tools that attempt to extract machine-level programs directly from object-code byte sequences.
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

4.1.2 Y86
Instructions
• The seven jump instructions (shown in Figure 4.2 as jXX) are jmp, jle, jl, je, jne, jge, and jg. Branches are taken according to the type of branch and the settings of the condition codes.
4.1.2 Y86
பைடு நூலகம்
Instructions
There are four integer operation instructions, shown in Figure 4.2 as OP1. These are add1, sub1, and1, and xor1. They operate only on register data. These instructions set the three condition codes ZF, SF, and OF (zero, sign, and overflow).
4.1.2 Y86
Instructions
• The call instruction pushes the return address on the stack and jumps to the destination address. • The ret instruction returns from such a call. • The push1 and pop1 instructions implement push and pop, just as they do in IA32. • The halt instruction stops instruction execution. For Y86, executing the halt instruction causes the processor to stop, with the status code set to HLT. (See Section 4.1.4.)
• 4.1.1 Programmer-Visible State
The memory is conceptually a large array of bytes, holding both program and data. Y86 programs reference memory locations using virtual addresses. A combination of hardware and operating system software translates these into the actual, or physical, addresses indicating where the values are actually stored in memory.we can think of the virtual memory system as providing Y86 programs with an image of a monolithic byte array.
• 4.1.1 Programmer-Visible State
There are three single-bit condition codes, ZF, SF, and OF, storing information about the effect of the most recent arithmetic or logical instruction. The program counter (PC) holds the address of the instruction currently being executed.
4.1.3 Instruction Encoding
Figure 4.2 also shows the byte-level encoding of the instructions. Each instruction requires between 1 and 6 bytes, depending on which fields are required. Every instruction has an initial byte identifying the instruction type. This byte is split into two 4-bit parts: the high-order, or code, and the low-order, or function.As you can see in Figure 4.2, code values range from 0 to 0xB. The function values are significant only for the cases where a group of related instructions share a common code. These are given in Figure 4.3,showing the specific encodings of the integer operation, conditional move, and branch instructions. Observe that rrmovl has the same instruction code as the conditional moves. It can be viewed as an “unconditional move” just as the jmp instruction is an unconditional jump, both having function code 0.
4.1.3 Instruction Encoding
One important property of any instruction set is that the byte encodings must have a unique interpretation.An arbitrary sequence of bytes either encodes a unique instruction sequence or is not a legal byte sequence.This property holds for Y86, because every instruction has a unique combination of code and function in its initial byte, and given this byte, we can determine the length and meaning of any additional bytes.This property ensures that a processor can execute an object-code program without any ambiguity about the meaning of the code. Even if the code is embedded within other bytes in the program, we can readily determine the instruction sequence as long as we start from the first byte in the sequence. On the other hand,if we do not know the starting position of a code sequence, we cannot reliably determine how to split the sequence into individual instructions. This causes problems for disassemblers and other tools that attempt to extract machine-level programs directly from object-code byte sequences.
4.1.2 Y86
Instructions
There are six conditional move instructions (shown in Figure 4.2 as cmovXX): cmov1e, cmov1, cmove, cmovne, cmovge, and cmovg. These have the same format as the register-register move instruction rrmovl, but the destination register is updated only if the condition codes satisfy the required constraints.
• 4.1.1 Programmer-Visible State
A final part of the program state is a status code Stat, indicating the overall state of program execution.It will indicate either normal operation, or that some sort of exception has occurred, such as when an instruction attempts to read from an invalid memory address.
Chapter4:Processor Architecture
4.1 The Y86 Instruction Set Architecture 4.2 Logic Design and Harware Control Language HCT
相关文档
最新文档