[2011-paper]Multicore OS benchmarks--we can do better
Multicore OS benchmarks: we can do better

Ihor Kuz ∗, Zachary Anderson, Pravin Shinde †, Timothy Roscoe
Systems Group, Department of Computer Science, ETH Zurich

Abstract

Current multicore OS benchmarks do not provide workloads that sufficiently reflect real-world use: they typically run a single application, whereas real workloads consist of multiple concurrent programs. In this paper we show that this lack of mixed workloads leads to benchmarks that do not fully exercise the OS and are therefore inadequate at predicting real-world behavior. This implies that effective multicore OS benchmarks must include mixed workloads, but the main design challenge is choosing an appropriate mix. We present a principled approach which treats benchmark design as an optimization problem. Our solution leads to a workload mix that uses as much of a system's resources as possible, while also selecting applications whose performance is most sensitive to the availability of those resources.

1 Introduction

We argue that benchmarks used in the Operating Systems literature for evaluating new designs and techniques are fundamentally unrealistic: they ignore the common case of running multiple applications (or subsystems) on the same machine. Bluntly, we are measuring the wrong thing. We show, using existing OS benchmarks running concurrently, how traditional benchmarks lead to unrealistic results, and propose composing benchmarks so as to obtain more useful information about how well an OS can multiplex the machine among competing programs.

The purpose of an OS is to allocate and share machine resources between applications in a controlled way. The mismatch between what an OS should do, and which properties we currently measure about it, becomes more [...]

OS                S+H+D    Perf
Tornado [8]       0+0+0    4
HeliOS [10]       1+0+1    6
Corey [4]         1+1+0    6
fos [12, 13]      1+0+0    7
Barrelfish [2]    1+2+0    6
Linux [5]         4+2+1    0

w1       cores        w2       cores
[...]    1,2,4,6,8    [...]    6
[...]    1,2,4,6,8    [...]    6
[...]    1,2,4,6,8    [...]    6

Table 2: Configurations of MOSBENCH workloads used

[...] mixes that best exercise the OS.

Overall, despite the fact that a multicore OS should provide isolation between running applications, most research does not use benchmarks which evaluate this.

2.2 Case Study: MOSBENCH

To provide a concrete example of why a mixed workload is necessary, we modified the public version of the MOSBENCH [5] suite to run multiple instances at once, and compared the result of running a mixed workload with a single workload. MOSBENCH is a benchmark suite for multicore OSes that includes a wide variety of applications, but only runs one program at a time.

We modified the MOSBENCH harness to start and monitor two workloads at once. MOSBENCH divides a workload into a startup stage, a waiting and collecting stage, and a stopping stage. We ensured that both workloads would run through the stages in synchrony (i.e., both would execute the start stage in parallel, then the wait stage in parallel, and then the stop stage). Throughout the runs, we pinned the workloads to a disjoint set of cores, to reduce interference due to contention for cores.
This is not strictly necessary (part of a multicore OS's job is to schedule applications on cores) but it simplifies interpreting the results. We also added a dummy workload that performs no work, to compare the results of a mixed workload to a single workload. We used a 16-core, 4-socket AMD Shanghai machine with 16 GB RAM running Linux 2.6.32.

We present a principled approach to workload selection in the next section of this paper, but for this experiment we tried a number of arbitrary combinations of programs from the MOSBENCH suite, and we present a subset of the results in Table 2. In all experiments, workload 2 uses a fixed 6 cores, while we vary workload 1 from its minimum to maximum core count. For each such two-load configuration we also ran with workload 2 replaced by the dummy workload, providing both "mixed" and "non-mixed" results.

Figure 1 shows the slowdown of the mixed workload relative to the corresponding non-mixed run (calculated as (nonmixed − mixed)/nonmixed, where nonmixed and mixed denote jobs per second). For some workloads there is little or no slowdown, but for others resource contention significantly impacts performance. Note that while the differences may be modest, they point to an isolation problem that none of the individual workloads uncovered, and one that would be unlikely to be uncovered by a different single-application load.

Figure 1: Slowdown for mixed MOSBENCH workloads (x-axis: cores, 1 to 8; y-axis: percent slowdown, -5 to 40; series: Psearchy, Gmake, Postgres)

In summary, both performance isolation and scalability are affected by mixed workloads. This means that single-workload benchmarks alone are unlikely to provide sufficient insight into the working of the OS. Next, we determine what kinds of workload mixes yield the most information when used to evaluate a multicore OS.

3 A better way

We have argued and shown evidence that a good multicore OS benchmark should provide a mixed workload. The problem, however, is that it is not directly obvious what kind of mix should be used, since as we saw previously, not all workload combinations provide interesting results. The key questions that must be answered when choosing a workload mix include:

• Which applications to choose?
• Which application workloads and configurations to choose?
• Which combinations of applications to run?

Here we discuss an approach to answering these questions. Our work is inspired by work on the DaCapo benchmark suite [3], and the vector-based approach to benchmarking developed by Seltzer et al. [11]. Note that, while our approach may seem complex at first, much of it can be automated, greatly simplifying its application.

Our goal is to design mixed workloads that can reveal information about the scalability and performance isolation provided by an OS. Since such information is gained by pushing the OS to its limits, an effective mix should use as much of a system's resources as possible, and devote those resources to applications whose performance is sensitive to their allocation. In this way, any effect of the OS on those resources will be highlighted by the benchmark, making it easier to trace anomalous application performance back to the OS, or to interactions among OS subsystems.

Because there is no single metric of performance when multiple applications are run concurrently, evaluating the results of a mixed workload is also a problem. Therefore, we must incorporate into our approach application-specific measures of goodness, which we use both to evaluate benchmark results, and to guide the choice of a workload mix itself.

3.1 Optimal mix selection

In our approach, we solve an optimization problem where the constraints derive from the resources consumed by benchmark applications when run alone, along with the sensitivity of their performance to changes in resource availability. We explain how we derive the constraints, and how we use the solutions to compose mixed workloads. We also discuss how to evaluate scalability, performance isolation, and performance degradation in the face of resource overcommitment. Finally, we identify conditions under which the technique is valid.

To show why this is a plausible approach, consider a hypothetical mixed workload composed of typical desktop applications: a game, a web browser, and an anti-virus scanner, a common desktop scenario. Each application accepts many possible inputs, but for our approach we need only consider the set of inputs for which the proportion of system resources used varies as much as possible. We can also force the resources used by a benchmark to vary by placing external limits on an application. We assume that a suitable range of inputs and constraints is supplied by the benchmark designer.

Furthermore, we assume that the benchmark designer provides a way to score the results of a run according to some goodness function. For a game this might be a combination of graphics fidelity and frame rate. For a browser, it might be a function of the average page load latency, and for an anti-virus scanner, a function of the number of files scanned in some fixed time period.

With a variety of inputs and a function for scoring the results for each of the benchmarks, we derive the constraints for our optimization problem in two steps. First, we run the benchmark applications alone on all the provided inputs, measuring resource usage, and scoring the results with the goodness functions. Then, for each application we perform a sensitivity analysis to determine which resources were important for performance.

For example, suppose that our example benchmarks are provided with inputs that result in the resource consumption and performance as indicated in Table 3 (this is hypothetical data and not based on measured results). In this table, the rows give the proportion of a resource used by a benchmark on one of N different inputs. For example, the mem entry for game1 is 0.25, indicating that the game uses a quarter of the system's memory with input 1. The score column of the table gives the application-specific goodness score, which is calculated for each of the runs. When the game uses 0.25 of the CPU, 0.25 of the cache, 0.25 of main memory, 0.1 of the disk, and 0.1 of the network, it achieves a goodness score of 0.25.

mix       cache    disk
game1     0.25     0.1
gameN     1.0      0.5
webb1     0.25     0.0
webbN     1.0      0.0
antiv1    0.1      0.6
antivN    0.1      0.8

Table 3: [...]

Using this data we can now perform a sensitivity analysis for each of the benchmark applications. The result of the analysis is a sensitivity score for each resource that shows, on a scale of 0 to 1, how sensitive an application's performance is to changes in that resource. Example results of a sensitivity analysis are given in Table 4. This table shows hypothetical sensitivities to resource allocations of our example applications. For example, the CPU (at 0.8) is more important for the game's performance than the network (at 0.1).

          CPU     mem     netwk
game      0.8     0.6     0.1
webb      0. [...]

Table 4: [...] results of a sensitivity analysis.

We now have all the data necessary to compose the optimization constraints. We phrase the optimization problem as an integer linear program. The solution to the optimization problem tells us which benchmark applications running on which inputs should compose the mixed workload. Let $x_i$ be the integer variable for the $i$'th benchmark/input pair. The solution to the optimization problem will be an assignment of the $x_i$'s indicating how many of each benchmark/input pair should be run as part of the mixed workload.

For each pair, we know the resource usage. Let $r_{ij}$ be the proportion of the $j$'th resource used by benchmark/input pair $i$. We also know the sensitivity of each benchmark to changes in resources. Let $\sigma_{ij}$ be the sensitivity of the benchmark in benchmark/input pair $i$ to changes in resource $j$. The problem is as follows:

\[ \text{maximize} \quad \sum_j \sum_i x_i \, r_{ij} \, \sigma_{ij} \tag{1} \]

\[ \text{subject to} \quad \forall j.\ \sum_i x_i \, r_{ij} \le 1 \tag{2} \]

Intuitively, what this means is that, without overcommitting the system, solutions will devote as many system resources as possible to benchmark applications that are sensitive to their allocation. In (1), $r_{ij}\sigma_{ij}$ is a heuristic that can be thought of as the sensitivity of a benchmark to a resource, written in terms of the amount of a resource that the benchmark productively uses. It is large if a benchmark is sensitive to and uses a lot of a resource, moderate if it is sensitive to the small amount it uses or is not sensitive to the large amount it uses, and small if it is neither sensitive to nor needs very much of a resource.

We sum over all of the resources in the maximization condition. If the potential constituent benchmarks are sensitive to each of the system resources, then this maximization condition will result in solutions that use every resource as much as possible. Inspecting the resulting solution will indicate whether or not the set of constituent benchmarks is complete enough.

Finally, using this optimization problem we create a mixed workload that uses only the 6 runs listed explicitly in Table 3. Using this sensitivity data to generate the optimization problem yields the following mixed workload: {game1, game1, webb1, antivN}. This uses 85% of CPU, 85% of cache, 70% of memory, 100% of disk, and 70% of the network, and includes benchmarks that together are sensitive to all of the resources.

3.2 Interpreting Results

Once we have composed a good mixed workload, we can compare the results of individual benchmarks in the non-mixed and mixed settings. Also, we can examine aggregate results in order to expose OS performance issues.

Identifying Bottlenecks: Ideally, a mixed workload should consume the same resources as the sum of those consumed by each constituent benchmark running alone. Deviations from this ideal may indicate subsystems, or interactions among subsystems, for which the OS is having trouble allocating resources when under load.
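
As a concrete illustration of the optimization problem in Section 3.1, the following minimal Python sketch maximizes objective (1) under constraint (2). It is not the authors' tool: it swaps in a plain exhaustive search for an integer linear programming solver, which is only practical for a handful of benchmark/input pairs. The names `best_mix`, `caps`, and `max_copies` are hypothetical, only two resources (CPU and disk) are modeled, and apart from the game's figures quoted in the text the numbers are made up for illustration.

```python
from itertools import product

def best_mix(usage, sensitivity, caps=None, max_copies=4):
    """Exhaustively maximize sum_j sum_i x_i * r_ij * sigma_ij
    subject to sum_i x_i * r_ij <= caps[j] for every resource j.

    usage[name]       -> list of r_ij (fraction of each resource used)
    sensitivity[name] -> list of sigma_ij (0..1) for the same resources
    caps              -> per-resource upper bounds (defaults to 1.0 each)
    max_copies        -> assumed search bound on each x_i (not in the paper)
    """
    names = list(usage)
    n_res = len(usage[names[0]])
    caps = caps if caps is not None else [1.0] * n_res
    # Objective contribution of one copy of each benchmark/input pair.
    weight = {n: sum(r * s for r, s in zip(usage[n], sensitivity[n]))
              for n in names}
    best_counts, best_score = None, float("-inf")
    for counts in product(range(max_copies + 1), repeat=len(names)):
        totals = [sum(c * usage[n][j] for c, n in zip(counts, names))
                  for j in range(n_res)]
        if any(t > cap + 1e-9 for t, cap in zip(totals, caps)):
            continue  # this assignment violates constraint (2)
        score = sum(c * weight[n] for c, n in zip(counts, names))
        if score > best_score:
            best_counts, best_score = counts, score
    return dict(zip(names, best_counts)), best_score

# Resources are (CPU, disk). Illustrative numbers only.
usage = {"game1": [0.25, 0.1], "webb1": [0.25, 0.0], "antivN": [0.1, 0.8]}
sensitivity = {"game1": [0.8, 0.1], "webb1": [0.5, 0.0], "antivN": [0.2, 0.9]}

mix, score = best_mix(usage, sensitivity)
print(mix)
```

With these inputs the search selects two copies of game1 and one each of webb1 and antivN, which consumes 85% of the CPU and 100% of the disk. Passing a different `caps` list changes the right-hand side of (2) per resource.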

Performance isolation: Since we know how well each of these benchmarks performed when running alone on the system, we can compare against the performance when they are run all together. In particular, we can calculate the percent difference between the sum of performance scores of the benchmarks run alone, and run as part of the mix. The smaller the percent difference, the better the performance isolation provided by the OS.

Scalability: It is also useful to see how an individual benchmark application scales up when others are running at the same time. To accomplish this, we can use the same optimization problem, with the additional constraint that one of the applications chosen must be the one we care about. If we choose inputs that show scalability when the benchmark is run alone, the same inputs should also scale when run as part of a mixed workload.

Resource overcommitment: We can also construct a sequence of mixed workloads in which system resources become increasingly overcommitted. In particular, in (2) above, we can replace the requirement that the sum of resources used by all the benchmarks is at most one with a more general constraint. That is, instead of using 1 as the upper bound of resource usage, we can use other values, even different values for different resources.

Validity of this approach: We also propose a test for determining whether or not this technique will yield consistent, meaningful results. Given a sufficiently large set of benchmark/input pairs, the optimization problems we described above will have several solutions with similar, near-optimal values of the objective function. If our approach is valid, then these solutions will give similar results. In particular, we can perform the performance isolation test for each mix, and obtain a set of percent differences in performance scores. If the variance of this set is small, then we can have confidence in our approach.
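
The performance isolation and validity checks above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' definition: it assumes the percent difference is normalized by the sum of the run-alone scores (mirroring the slowdown formula used for Figure 1), and the threshold `max_variance` is a hypothetical parameter, since the text only asks that the variance be "small". The function names and sample scores are invented for the example.

```python
from statistics import pvariance

def percent_difference(alone_scores, mixed_scores):
    """Percent drop in total goodness score when benchmarks run as a mix.

    Assumption: normalized by the run-alone total.
    """
    alone_total = sum(alone_scores)
    mixed_total = sum(mixed_scores)
    return 100.0 * (alone_total - mixed_total) / alone_total

def consistent_across_mixes(differences, max_variance):
    """Validity test: near-optimal mixes should give similar differences."""
    return pvariance(differences) <= max_variance

# Hypothetical goodness scores for three benchmarks, alone and in a mix.
alone = [0.25, 0.25, 0.5]
mixed = [0.20, 0.25, 0.45]
diff = percent_difference(alone, mixed)
print(round(diff, 2))

# Hypothetical percent differences measured for three near-optimal mixes.
print(consistent_across_mixes([10.0, 12.0, 11.0], max_variance=1.0))
```

A smaller `diff` indicates better isolation under this reading; the variance check would be repeated over every near-optimal mix the optimizer returns.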

Discussion: If the variance in performance differences across mixes is small, then we will have also shown that, so long as a mix is near-optimal, its precise composition is not important: our approach has the potential to obviate the need for "standard" workload mixes, which may be biased toward particular architectures or systems. In the future we wish to show that this technique can tailor mixed workloads for particular systems in such a way that we can both obtain useful diagnostic results for a single system while comparing results across systems.

We can mitigate the complexity of this approach using a tool we are presently building that automates the entire process. Additionally, we can rely on previous work in IO benchmarking, e.g. the work on self-scaling workloads [6], to guide our interpretation of the results of sampling a large parameter space.

4 Conclusion

We claim that current benchmarks for multicore OSes do not reflect a realistic workload. In particular, they neglect mixed workloads consisting of several applications running concurrently. However, the difficulty with designing mixed workload benchmarks is in choosing an appropriate mix. We propose a principled approach to designing good mixes based on treating it as an optimization problem. The key advantages are that we can target specific resources of interest and gain a better understanding of how the mix is expected to behave.

In the future, we intend to further develop and evaluate our approach. Choosing a good mix is, however, only part of the problem, and we will address other problems, such as portability of benchmarks, burstiness, and dynamic workloads, as well. It is our intention to work together with others from the OS community to further develop this work, in particular to develop a framework for producing multicore OS benchmark suites, and to produce a standard suite that can be used for further OS research.

Acknowledgments

We thank Jan Rellermeyer, Tim Harris, and Simon Peter for their contributions.

References

[1] Asanovic, K., Bodik, R., Catanzaro, B. C., Gebis, J. J., Husbands, P., Keutzer, K., Patterson, D. A., Plishker, W. L., Shalf, J., Williams, S. W., and Yelick, K. A. The landscape of parallel computing research: A view from Berkeley. Tech. Rep. UCB/EECS-2006-183, EECS Department, University of California, Berkeley, Dec 2006.
[2] Baumann, A., Barham, P., Dagand, P., [...] The Multikernel: A new OS architecture for scalable multicore systems. In SOSP '09.
[3] Blackburn, S. M., Garner, R., Hoffman, C., Khan, A. M., McKinley, K. S., Bentzur, R., Diwan, A., Feinberg, D., et al. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06.
[4] Boyd-Wickizer, S., Chen, H., Chen, R., Mao, Y., Kaashoek, M. F., Morris, R., Pesterev, A., Stein, L., Wu, M., hua Dai, Y., Zhang, Y., and Zhang, Z. Corey: An operating system for many cores. In OSDI '08.
[5] Boyd-Wickizer, S., Clements, A. T., Mao, Y., Pesterev, A., Kaashoek, M. F., Morris, R., and Zeldovich, N. An analysis of Linux scalability to many cores. In OSDI '10.
[6] Chen, P. M., and Patterson, D. A. A new approach to I/O performance evaluation - self-scaling I/O benchmarks, predicted I/O performance. In SIGMETRICS '93.
[7] Frachtenberg, E., and Etsion, Y. Hardware parallelism: Are operating systems ready? (case studies in mis-scheduling). In Workshop on the Interaction between Operating System and Computer Architecture (June 2006).
[8] Gamsa, B., Krieger, O., Appavoo, J., and Stumm, M. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In OSDI '99.
[9] Mogul, J. Brittle metrics in operating systems research. In HotOS-VII (Mar. 1999).
[10] Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. Helios: Heterogeneous multiprocessing with satellite kernels. In SOSP '09.
[11] Seltzer, M., Krinsky, D., and Smith, K. The case for application-specific benchmarking. In HotOS-VII (Mar. 1999).
[12] Wentzlaff, D., and Agarwal, A. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review 43, 2 (Apr. 2009).
[13] Wentzlaff, D., III, C. G., and Beckmann, N. An operating system for multicore and clouds: Mechanisms and implementation. In ACM Symposium on Cloud Computing (June 2010).