CHARMY:An Extensible Tool for Architectural Analysis Paola Inverardi,Henry Muccini and Patrizio PelliccioneDipartimento di InformaticaUniversity of L’AquilaVia Vetoio-L’Aquila,Italy[inverard,muccini,pellicci]@di.univaq.itABSTRACTCharmy is a framework for designing and validating ar-chitectural specifications.In the early stages of the soft-ware development process,the Charmy framework assists the software architect in the design and validation phases. To increase its usability in an industrial context,the tool allows the use of UML-like notations to graphically design the system.Once the design is done,a formal prototype is automatically created for simulation and analysis purposes. The framework provides extensibility mechanisms to enable the introduction of new design and analysis features. Categories and Subject DescriptorsD.2.11[Software Architectures];I.6.4[Model Valida-tion and Analysis]General TermsDesign,VerificationKeywordsModel checking,Software Architectures1.INTRODUCTIONNowadays industries are increasing their interests in ana-lyzing and validating architectural choices,both behavioral and quantitative.Software Architecture(SA)-based analy-sis methods have been introduced to provide several value-added benefits,such as system deadlock detection,perfor-mance analysis,component validation and much more[9]. Despite theflourishing of research work on architectural analysis,very few tools have been proposed to support SA-level analysis,and many of them are not anymore supported or difficult to be introduced in an industrial context.Thus, how to automate the SA-based analysis process in a way use-ful for current industrial needs is a topic which requires a careful investigation.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.ESEC-FSE’05,September5–9,2005,Lisbon,Portugal.Copyright2005ACM1-59593-014-0/05/0009...$5.00.This paper introduces Charmy,a tool which allows the specification of a software system SA through diagrammatic, UML-based notations,and the validation of the architec-tural specification conformance with respect to certain func-tional requirements.Charmy offers a graphical user inter-face to draw state diagrams and scenarios,used to specify the SA behavior and the functional requirements,respec-tively.A translation engine automatically derives formal specifications out of the diagrammatic notations,and the SPIN[11]model-checker is used for automatic verification on such specifications.XMI is the output format of Charmy. 
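The functional requirements checked by Charmy are temporal properties over the messages exchanged by the architectural components. As a purely illustrative example (the property and the message names are invented here for exposition, not taken from the Charmy case studies), a scenario stating that every request issued by a client component is eventually answered by the server corresponds to the LTL formula

$\Box(\mathit{sendRequest} \rightarrow \Diamond\,\mathit{receiveReply})$

which SPIN can check against the Promela prototype generated from the state diagrams.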
Moreover,the framework provides extensibility mechanisms (via a plugin-based architecture)to enable the introduction of new features and to help the integration with other exist-ing analysis tools.The tool main benefits are that it is UML-based(thus eas-ily integrable in industrial development processes),it auto-matically produces a formal prototype of the SA and model-checks it with SPIN(without requiring formal languages skills),and it is extensible,due to its plugin architecture.2.CHARMY FEATURESCharmy allows the specification of a software architecture by means of both a topological(static)description and a be-havioral(dynamic)one[10].To increase the acceptability of our tool in industrial contexts we use a UML-based nota-tion(stereotyped class diagrams for the topology and state diagrams for the behavior).The tool has been used in real case studies both industrial and academic as summarized in Section6.Charmy allows the specification of the SA topology in terms of components,connectors and relationships among them,where components represent abstract computational subsystems and connectors formalize the interactions among components.The internal behavior of each component is specified in terms of state machines.The Charmy tool performs several checks,at the SA spec-ification level,in order tofind static specification errors:a) in a state diagram it is not possible to introduce two states with the same name;b)each state diagram must contain one and only one initial state;c)for each send(receive)message in a component,there must exist a receive(send)message in another component;d)sequence diagrams can contain only messages already present into the state diagrams of the involved components;e)the sender and the receiver of a message must be the same(components)in the sequence diagrams and in the state diagrams;f)messages with the same name must have the same number of parameters.Promela CodeSA TopologySA DynamicsBuchi AutomataStep 1Step 2Step 3Charmy specificcheckStep 3Simulation and Standard Verificationa) b)c)Figure 1:The Charmy ToolOnce the SA specification is available,the translation fea-ture is used to obtain from the model-based SA specification,a formal executable prototype in Promela (the specification language of SPIN)[11].On the generated Promela code,we can use the SPIN standard features to find,for example,deadlocks or parts of states machines that are unreachable.Temporal properties are modeled using Property Sequence Charts PSC ,sequence diagram notation used to specify properties [3].Each sequence diagram represents a desired behavioral property we want to check in the Promela (archi-tectural)prototype.The Psc2Ba algorithm automatically translates PSC into B¨u chi automata (the automata repre-sentation for LTL formulae)while the SPIN model-checker is used to validate the temporal properties on the Promela code.Note that this translation process is fully automated.The aforementioned features are implemented through a graphical editor component which allows the specification of both the topology and the behavior of the software archi-tecture.This component is composed by the topology editor which allows the specification of the SA topology (Figure 1.a)and the thread editor which allows the specification of the internal behavior of each component (Figure 1.b).The sequence editor allows the draw of sequence diagram repre-senting desired behavioral properties we want to check (Fig-ure 1.c).A translator utility converts state diagrams into Promela (Figure 1.step1).The translation 
algorithm (Fig-ure 1.step2),described in [3],allows an automatic transla-tion of the sequence diagrams into the B¨u chi automata for-malism,comprehensible by the model checking engine sup-ported by SPIN.More details on the Charmy features can be found in [12].3.THE CHARMY PLUGIN SAThe Charmy plugin Software Architecture is driven by requirements of easy extensibility of the initial core in several directions.Notably we want to be able to extend the kind of analysis we may perform on a software architecture and we want Charmy to be easily integrated with other existing tools .The Charmy tool architecture is shown in Figure 2.Tak-ing a look at the Charmy Core macro-component,it is composed by the Data Structure component,the Plugin Manager which allows the handling of the plug of a new component in the core system,the GUI which receives stim-uli by the users,and activates the Action Manager and the Event Handler .The Core Plugin meta-component contains a set of core plugs to edit the software architecture topology,the state machines and the scenarios respectively.The Standard Plugin contains a set of standard plugs to implement the translation from sequence diagrams to B¨u chi automata and from state machines to Promela code.More-over,this component will contain others future plugs.The Charmy core handles the plugin management by specifying:i )how a new plug should be implemented,ii )how the core system has to recognize the plug and use it,and iii )how the core and plug components should interact.Figure 2:The Charmy Plugin Architecture Figure 3graphically summarizes these aspects.i )Implementing a new plug:when a new component needs to become a plugin,it has to implement two interfaces:the“IMainTabPane”and the“IFilePlug”.The IMainTabPane interface handles the data information related to the win-dows.Here we have methods which allow the reception of information from the editor components(Topology,Se-quence and Thread components).The IFilePlug interface, instead,needs to be implemented when the plug requires to save or open afile.ii)Recognizing the new plug:when a new plug is created and wants to be inserted,the core system needs to be in-formed about this.The solution we adopted is to create an .xmlfile(called plugin.xml)which contains all information needed.iii)Interaction:when a data is modified inside the core system,an event is sent by the Event Handler component to the plug.This event informs the plug of which kind of modification has been made over the data(e.g.,insert,mod-ify,delete)and sends a clone of the data itself to the plug.A plug,in order to receive the event,has to be registered as a listener of the event itself.Figure3:Plug and Core4.THE IMPLEMENTED PLUGINSIn this section we briefly describe the Charmy standard plugins implemented so far and illustrated in Figure2.4.1Psc2Ba:From PSC to B¨uchi Automata Plu-ginThe Psc2Ba algorithm translates Property Sequence Char-ts PSC[3]into B¨u chi automata.A scenario editor is used to draw PSC,while the Psc2Ba plugin translates them into B¨u chi automata.Such automata is used in Step3(see Fig-ure1)for Charmy specific checks.4.2The Promela Translation PluginThe Promela code generation plugin allows the transla-tion of components’state machines into Promela code.The generated Promela code is used for running SPIN standard features tofind,for example,deadlocks or parts of states machines that are unreachable.The translation algorithm is described in[12]while the plugin is downloadable with Charmy.4.3The TeStor PluginTeStor[15](TEst Sequence 
generaTOR)is an algorithm for extracting test sequences from model-based specifica-tions as produced by practitioners.A behavioral model of each component is provided in the form of UML state di-agrams.Such models explicitly specify which components interact and how.Sequence diagrams(inSD)are used as an abstract or even incomplete specification of what the test should include,by representing test directives.TeStor takes in input UML state and sequence diagrams and syn-thesizes more detailed sequence diagrams(outSD)(conform-ing to the inSD)by recovering missing information from state diagrams.The output represents test sequences spec-ified in the form of more informative scenarios.The TeStor algorithm has been implemented has a plu-gin component for Charmy and a beta version is currently available[6].More details on the TeStor plugin may be found in[15,6].4.4Compositional Analysis of Middleware-based SA PluginSummarizing the approach in[5],our aim is to model check middleware-based SA with respect to a subset of LTL system properties by means of Compositional Reasoning. Our approach exploits the structure imposed on the system, by the SA.The idea is to decompose the verification of a global property,into the verification of a number of proper-ties that hold locally on the architectural components.The architectural structure helps in deriving the validity of the whole system from the validity of the local properties.This plugin realizes and partially automatizes the theory presented in[5].A graphical editor allows the design of the Software Architecture without detailing how the com-ponents interact.Thus the communication between each pair of components is represented with a simple link con-nection,distinguishing between invocation of a service and results retrieval.To obtain a refined SA,the software archi-tect can select a defined middleware.The proxies’models (used to bridge the application with the middleware)are automatically generated.More details on the compositional verification plugin may be found in[6].4.5JSpin PluginThe rationale is to introduce in Charmy a graphical in-terface for the SPIN model checker.To accomplish this task we exploit JSpin,a Java GUI for Spin developed by Moti Ben-Ari[13].5.RELATED WORKWhen dealing with tools for functional analysis of software architectures,we may distinguish between proposed tools, still supported tools and tools usable in industrial contexts. In thefirst class,i.e.the proposed ones,we may list all of those ones(mainly academic)introduced in the’90s to model and analyze specific Architecture Description Lan-guages(ADLs)(e.g.,Aesop,ArTek,C2,Darwin,LILEANNA, MetaH,Rapide,SADL,UniCon,Weaves,Wright)[14]. Currently only some ADLs seem to be still supported and in use.All such still-in-use tools are somehow easy to use, even if none of them makes use of UML-like notations.Their main limitation is that each of them focusses on a particu-lar analysis technique,leaving other techniques unexplored. 
Moreover,each of them uses a different notation for SA spec-ification,thus making any integration difficult.6.SOME CONSIDERATIONSCharmy has been thought in order to be easily integrated in industrial projects:the model checker engine complexity is hidden,providing the software engineer an automated,easy to use tool which takes in input the architectural mod-els in a UML-based notation,creates the prototype and au-tomatically analyzes the prototype reducing as much as pos-sible human intervention.Charmy supports the software architect in the design pro-cess in order to produce space efficient models.In fact,Charmy provides guidelines on how to model the system and automatically generates an optimized Promela code,thus allowing an exhaustive analysis through model-checking.The experience shows that there is generally a considerable difference in efficiency and memory size be-tween models developed by a“casual”user and models de-veloped by an“expert”user.Then by using Charmy,the usual problems of state explosion and model memory size are mitigated,without requiring particular knowledge to users. Charmy has been used in several case studies both indus-trial and academic:NICE a joint work with Marconi Mobile Lab.NMS C2(L’Aquila-Italy)that operates in a naval com-munication environment[8].Siena and CoMETA a pub-lish/subscribe middleware and its extension to handle mo-bility[4].Engineering Order Wire(EOW)a joint work with Siemens C.N.X.S.p.A.,R.&D.(L’Aquila-Italy).EOW is an application that supports a telephone link between mul-tiple equipments by using dedicated voice link channels[2].7.FUTURE WORKInteresting extensions are planned to take place:Verification Engine:since we are not tied to use the model checker SPIN,we are currently investigating the use of SMV or Bogor as model-checking engines.In the case of Bogor,it is very interesting to take advantage of its plugin structure in order to define a customized search algorithm for SA.Time and Space savings techniques:recently much ef-fort focusses in techniques that operate on the input of the model checker(models)in order to improve time and space efficiency:abstraction,symmetry and compositional reason-ing[7]are the currently evaluated solutions.The plugin SA of Charmy will allow the introduction of new features to handle these new techniques.Architecture Description Language:the SA topology editor will be extended by following the representation pro-vided by common architecture description languages and by taking into consideration our experience in SA-based mod-eling and analysis in industrial contexts.In particular,con-cepts and formalisms coming from both architecture descrip-tion languages,UML and XML representation of SA[17, 1]will be taken into consideration.By using existing archi-tectural languages,we plan also to be able to reuse existing dependence analysis[16]and architectural slicing[18]tech-niques,already automated by other tools.8.REFERENCES[1]ADML:Architecture Description Markup Language./architecture/adml/adml home.htm,Last Modified:December2002.Open Group.[2]A.Bucchiarone,H.Muccini,P.Pelliccione,and P.Pierini.Model-Checking plus Testing:from SoftwareArchitecture Analysis to Code Testing.In Proc.Int.Workshop on Integration of Testing Methodologies,ITM’04.LNCS n.3236[3]M.Autili,P.Inverardi,P.Pelliccione.GraphicalScenarios for Specifying Temporal Properties:anAutomatic Approach.Technical report,University ofL’Aquila,April2005.[4]M.Caporuscio,P.Inverardi,and P.Pelliccione.Formal analysis of architectural patterns.In 
First European Workshop on Software Architecture (EWSA 2004), 21-22 May 2004, St Andrews, Scotland, UK. LNCS 3047.
[5] M. Caporuscio, P. Inverardi, and P. Pelliccione. Compositional verification of middleware-based software architecture descriptions. In Proceedings of the International Conference on Software Engineering (ICSE 2004), Edinburgh, 2004. IEEE Computer Society Press.
[6] Charmy Project. Charmy web site. http://www.di.univaq.it/charmy, February 2004.
[7] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. The MIT Press, 2001.
[8] D. Compare, P. Inverardi, P. Pelliccione, and A. Sebastiani. Integrating model-checking architectural analysis and validation in a real software life-cycle. In FME 2003, LNCS 2805, pages 114-132, Pisa, 2003.
[9] Formal Methods for Software Architectures. Tutorial book on Software Architectures and Formal Methods. Eds. M. Bernardo and P. Inverardi, LNCS 2804, 2003.
[10] D. Garlan. Software Architecture: a Roadmap. In A. Finkelstein (Ed.), ACM ICSE 2000, The Future of Software Engineering, pp. 91-101, 2000.
[11] G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, September 2003.
[12] P. Pelliccione, P. Inverardi, and H. Muccini. Charmy: A framework for Designing and Validating Architectural Specifications. Submitted for publication. Technical report, University of L'Aquila, April 2005.
[13] jSpin - A Java GUI for Spin. http://stwww.weizmann.ac.il/g-cs/benari/jspin/.
[14] N. Medvidovic and R. Taylor. A classification and comparison framework for software architecture description languages. IEEE Transactions on Software Engineering, 26(1):70-93, 2000.
[15] P. Pelliccione, H. Muccini, A. Bucchiarone, and F. Facchini. TeStor: Deriving Test Sequences from Model-based Specifications. 8th International SIGSOFT Symposium on Component-based Software Engineering, May 2005, St. Louis, Missouri, USA. LNCS 3489, pp. 267-282.
[16] J. A. Stafford and A. L. Wolf. Architecture-level dependence analysis in support of software maintenance. In Third International Software Architecture Workshop (Orlando, Florida, November 1998), pp. 129-132.
[17] xADL 2.0 Architecture Description Language. /projects/xarchuci/, 2005.
[18] J. Zhao. Software Architecture Slicing. In Proceedings of the 14th Annual Conference of Japan Society for Software Science and Technology, 1997.



A Biologically Inspired Programming Modelfor Self-Healing SystemsSelvin George Department of Computer Science University of VirginiaCharlottesville, VAselvin@David EvansDepartment of Computer ScienceUniversity of VirginiaCharlottesville, VAevans@Lance DavidsonDepartment of BiologyUniversity of VirginiaCharlottesville, VAlance_davidson@ABSTRACTThere is an increasing need for software systems to be able to adapt to changing conditions of resource variability, component malfunction and malicious intrusion. Such self-healing systems can prove extremely useful in situations where continuous serviceis critical or manual repair is not feasible. Human efforts to engineer self-healing systems have had limited success, but nature has developed extraordinary mechanisms for robustness and self-healing over billions of years. Nature’s programs are encoded in DNA and exhibit remarkable density and expressiveness. We argue that the software engineering community can learn a great deal about building systems from the broader concepts surrounding biological cell programs and the strategies they use to robustly accomplish complex tasks such as development, healing and regeneration. We present a cell-based programming model inspired from biology and speculate on biologically inspired strategies for producing robust, scalable and self-healing software systems.Categories and Subject DescriptorsD.1.0 [Programming Techniques]: General; D.2.4 [Software/Program Verification] – reliability; F.1.1 [Models of Computation].General TermsDesign, Reliability, Experimentation, Security, Languages KeywordsBiological programming; self-healing systems; amorphous computing.1. INTRODUCTIONBiology is replete with examples of systems with remarkable robustness and self-healing properties. These include morphogenesis, wound healing and regeneration: Morphogenesis. A single cell develops into a full organism following a program encoded in its DNA that evolved over billions of years. Cells perform various actions like division, deformation and growth based on gene actions. The actions of the genes are dictated by the presence of chemical substances. Gene actions coupled with physical forces acting on a cell from its neighboring cells and external environment lead to a developed organism. Even in simple organisms, development is robust to many kinds of local failures and adapts to a wide range of environments. For example, when a cell dies, the neighboring cells sense changes in the environment and adapt their own development to correct the problem [6].Wound Healing. Almost all complex organisms have some sort of mechanism for healing simple wounds. In humans, when a minor injury happens, an inflammatory response occurs and the cells below the dermis (the deepest skin layer) begin to increase collagen (connective tissue) production. Later, the epithelial tissue (the outer skin layer) is regenerated. The interesting point here is the apparent level of awareness of the cells. Also, cells around the injury are able to adapt to a different function based on the new circumstances [2].Regeneration. Many organisms can regenerate new heads, limbs, internal organs or other body parts if the originals are lost or damaged. Organisms take two approaches to replacing a lost body part. Some, such as flatworms and the polyp Hydra, retain populations of stem cells throughout their lives, which are mobilized when needed. These stem cells retain the ability to regrow many of the body’s tissues. 
Other organisms, including newts, segmented worms and zebrafish, convert differentiated adult cells that have stopped dividing and form part of the skin, muscle or another tissue back into stem cells. When a newt’s leg, tail or eye is amputated or damaged, cells near the stump begin an extraordinary change. They revert from specialized skin, muscle and nerve cells into blank progenitor cells. These progenitors multiply quickly to about 80,000 cells and then grow into specialized cells to regenerate the missing part [5].We observe that nature’s approach to programming has the following properties:1. Environmental Awareness. Though the cells may havelimited communication capabilities, they act differently in response to sensed properties of the surrounding environment. This enables cells to react to changes in nearbycells, as well as the surrounding environment.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.WOSS '02, Nov 18-19, 2002, Charleston, SC, USA.Copyright 2002 ACM 1-58113-609-9/02/0011 ...$5.002. Adaptation. Many cells have a great amount of adaptability.In many organisms, at the beginning of morphogenesis, if one of the initial cells obtained by the first division of the germ cell dies then the surviving cell is often able to complete the development of the organism. This indicates that enough information is preserved to be able to “backtrack” to a previous state of development. This is due to the fact that all cells run the same cell-program and can hence respond to aberrant behavior from neighbors3. Redundancy. A non-redundant organism would have everycell assigned a fixed role in the development process. Any failure during the development process would produce a defective organism. Typical organisms have many cells devoted to the same function throughout development, so that failures of individual cells are inconsequential.Biological systems also exhibit redundancy of function, where several distinct mechanisms evolve for the same purpose in a single organism.4. Decentralization. There is no global coordination andlimited communication for most of the development process.Cells sense properties of their environment, and are most affected by nearby cells. Cells can induce neighboring cells to do a particular action, but there is no centralized control and limited long-distance communication.2. CELL-BASED PROGRAMMINGInspired by biological systems, we propose a cell-based programming model that can be used for software systems operation and healing. Our model is similar to the cellular automata that have been studied extensively since von Neumann’s early work [4], but differs in that it is more closely related to biological processes. In particular, we support a notion of cell division, a communication model based on chemical diffusion, and a rudimentary model of the physical forces involved. By developing a programming model more like nature’s, we believe we will produce more robust programs with natural self-healing properties.A related approach is amorphous computing, which considers approaches for programming a medium of randomly distributed computing particles. 
The Growing Point Language [1] and Origami Shape Language [3] both illustrate mechanisms for global self-organization using simple local communication of the agents. Self-healing properties are also being studied using GPL. As with our work, the challenge is to produce programs that generate predictable behavior with a locally unpredictable and non-traditional programming model. Because the underlying execution environment is inherently redundant and decentralized, robustness is practically inevitable if programs are constructed in the right way.We represent a cell program as an automaton containing discrete states and transitions between these states. Every cell comprising the program is in one of these states. The input to each cell state is the sensed properties of the local environment and the output is a transition to another state, or a division into two (possibly different) states. States are represented by circles and state transitions by directional arrows. Dots represent cell divisions.Our cell programming model incorporates: 1. Cell Division. A cell can divide into two daughter cells thatmay be dissimilar in orientation and chemical composition but have the same program (DNA). A cell has an axis calledthe apical-basal axis. Divisions can be either perpendicular tothis cell axis or along the plane containing the axis. The difference in chemical composition and also the different chemicals on their cell walls causes the two daughter cells tobehave differently from that point onwards. Cell division ismodeled by using a transition from one state to two states.2. Cell Actions. Cells can produce proteins and signalingchemicals depending on what genes are active. Chemicals produced this way affect the environment and neighboring cells through chemical diffusion.3. Gene Actions. Genes can activate or deactivate dependingon the presence or absence of a particular protein or a certaindegree of chemical concentration. Activation or deactivationof a gene results in cell actions like production of chemicals. The varying degrees of concentration produced by earlier cell actions (both by the cell and its neighbors) cause gene actions and gene actions cause cell actions; this results in a powerful programming paradigm. Cell actions such as production of chemicals are modeled using messages. Gene actions are modeled using cell state transitions; these are a result of received messages.A cell program begins with cells in an initial configuration, and all the cells follow transition rules like a finite state machine. Between steps, an environment simulator determines changes in external stimuli. The changes to the environment can be due to operations of the software system, expected input conditions or failure conditions. Since cells can sense their local environment it is possible for them to be able to perform failure recovery (healing) or re-composition of appropriate components (regeneration). Our simulator also provides opportunities to conduct experiments involving random and catastrophic failures. Two simple examples of cell programs are shown in Figure 1. Automaton A produces a line of cells as long as the input condition a exists. The condition a may represent the presence of food for growth. AutomatonB produces cells to combat intruders as long as it detects unfavorable conditions. This approach creates excess cells so that some may survive the malicious action.Figure 1. Example Cell ProgramsA. Creating a line of Cellsa – Condition favorableto continued cellproductionB. 
Robustness through replicationa – Favorable conditionsb – Unfavorable conditionsNumbered Circle – StateArrow – TransitionDot – Cell division3. SIMULATING CELL PROGRAMSUsing a simulator, we have conducted simulations of different cell programs. The simulator simulates a cell program on a simulation configuration. The simulation configuration is used to introduce new cells, chemical concentrations or failures. Our simulator is available at /cellsim .A sample program for creation of a self-healing blastula is shown below. A blastula is a spherical structure that is the first stage of development of many large organisms. A sufficient number of cells are needed before organism development proceeds to the next stage.state s1 {emits (sig, 0.1)transitions(0 <= sig <= 0.375) -> (s2, s2) axis; -> (s1); }state s2 {emits (sig, 0.1) transitions(0 <= sig <= 0.375) -> (s3, s3) normal-X; -> (s2); }state s3 {emits (sig, 0.1) transitions(0 <= sig <= 0.375) -> (s1, s1) normal-Y; -> (s3); }In the above cell automaton there are three cell-states – s1, s2 and s3. They are similar in that they emit the same signaling chemical sig and divide into two cells each if they sense that the concentration of sig is less than 0.375. The cell remains in its current state if the concentration of sig is above 0.375.(a)(b)(c)Figure 2. Simulated Blastula Program. (a) Blastula in 8-cell stage – starting from one cell; (b) damaged blastula – after killing one cell (c) after the blastula regenerates.This self-healing blastula has the property that if a few cells are killed, it will automatically heal itself by producing the required number of cells. Figure 2 shows a simulation of the blastula program for four steps, after which one of the cells was killed to observe the self-healing behavior. The surviving cells regenerate additional cells to continue the process. The principle behind this type of healing is that the once a cell was killed, it stopped producing the particular chemical that was being sensed by its neighbors. Note that nothing in the cell program explicitly deals with healing and regeneration. The neighbors of a failed cell just follow a different path in the cell-program due to the changed environmental conditions.4. TOWARDS SELF-HEALING SYSTEMSAlthough our initial experiments have focused on mimicking simple biological processes and generating basic geometric structures, our long-term goal is to develop techniques that can be used to produce robust, self-healing systems designed to perform a complex task. Developing complex programs using state diagrams, however, is infeasible. A high-level programming abstraction for cell-based programs is needed in which a programmer can describe desired processes at a high-level and a cell-program compiler will produce the steps of the automaton. An important design issue is which operations or abstractions should be part of the language and which can be composed of the elementary operations and hence can be kept outside the language. We are currently working on programming abstractions based on the biological cell model. If successful, programs described in this way will have intrinsic robustness, scaling and self-healing properties. We hope that our experiments with biological programs will provide insights into how to build more robust computer systems.5. ACKNOWLEDGMENTSThis work was funded in part by grants from the National Science Foundation (CCR-0092945 and EIA-0205327) and NASA Langley Research Center.6. REFERENCES[1] H. Abelson, D. Allen, D. Coore, C. Hanson, G. 
Homsy, T. Knight, R. Nagpal, E. Rauch, G. Sussman, and R. Weiss. Amorphous Computing. Communications of the ACM, 43(5):74-83, May 2000.
[2] Mary Y. Mazzotta. Nutrition and wound healing. Journal of the American Podiatric Medical Association, 84(9):456-462, September 1994.
[3] Radhika Nagpal. Programmable Self-Assembly: Constructing Global Shape using Biologically-inspired Local Interactions and Origami Mathematics. PhD Thesis, MIT Department of Electrical Engineering and Computer Science, June 2001.
[4] John von Neumann. Theory of Self-Reproducing Automata. University of Illinois Press, 1966 (originally published in 1953).
[5] Helen Pearson. The regeneration gap. Nature Science Update, 22 November 2001.
[6] Lewis Wolpert, Rosa Beddington, Peter Lawrence, and Thomas M. Jessell. Principles of Development. Oxford University Press, 2002.


Intermediately Executed Code is the Key to Find Refactorings that Improve Temporal Data LocalityKristof BeylsElectronics and Information Systems(ELIS),Ghent University,Sint-Pietersnieuwstraat41,B-9000Gent,Belgiumkristof.beyls@elis.UGent.beErik H.D’Hollander Electronics and Information Systems(ELIS)Ghent University,Sint-Pietersnieuwstraat41,B-9000Gent,Belgiumerik.dhollander@elis.UGent.beABSTRACTThe growing speed gap between memory and processor makes an efficient use of the cache ever more important to reach high performance.One of the most important ways to im-prove cache behavior is to increase the data locality.While many cache analysis tools have been developed,most of them only indicate the locations in the code where cache misses occur.Often,optimizing the program,even after pin-pointing the cache bottlenecks in the source code,remains hard with these tools.In this paper,we present two related tools that not only pinpoint the locations of cache misses,but also suggest source code refactorings which improve temporal locality and thereby eliminate the majority of the cache misses.In both tools, the key tofind the appropriate refactorings is an analysis of the code executed between a data use and the next use of the same data,which we call the Intermediately Executed Code (IEC).Thefirst tool,the Reuse Distance VISualizer(RD-VIS),performs a clustering on the IECs,which reduces the amount of work tofind required refactorings.The second tool,SLO(short for“Suggestions for Locality Optimiza-tions”),suggests a number of refactorings by analyzing the call graph and loop structure of the ing these tools, we have pinpointed the most important optimizations for a number of SPEC2000programs,resulting in an average speedup of2.3on a number of different platforms. Categories and Subject DescriptorsD.3.4[Programming Languages]:Processors—Compil-ers,Debuggers,Optimization; D.2.8[Software Engineer-ing]:Metrics—Performance measuresGeneral TermsPerformance,Measurement,LanguagesPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.CF’06,May3–5,2006,Ischia,Italy.Copyright2006ACM1-59593-302-6/06/0005...$5.00.KeywordsTemporal Data Locality,Program Analysis,Refactoring, Program Optimizations,Performance Debugger,Loop Trans-formations1.INTRODUCTIONThe widening speed gap between processor and main mem-ory makes low cache miss rates ever more important.The major classes of cache misses are conflict and capacity misses. 
While conflict misses are caused by conflicts in the internal cache structure,capacity misses are caused by poor tempo-ral or spatial locality.In this paper,we propose two tools that help to identify the underlying reason of poor temporal data locality in the source code.1.1Related WorkIn recent years,compiler methods have been devised to automatically increase spatial data locality,by transform-ing the data layout of arrays and structures,so that data accessed close together in time also lays close together in the address space[9,11,17,19,22,33].On the other hand, temporal locality can only be improved by reordering the memory accesses so that the same addresses are accessed closer together.Advanced compiler methods to do this all target specific code patterns such as affine array expressions in regular loop nests[11,18,22],or specific sparse matrix computations[14,15,24,27].For more general program constructs,fully-automatic optimization seems to be very hard,mainly due to the difficulty of the required dependence analysis.Therefore,cache and data locality analysis tools and visualizers are needed to help programmers to refactor their programs for improved temporal locality.void ex(double*X,double*Y,int len,int N){int i,j,k;for(i=0;i<N;i++){for(j=1;j<len;j++)Y[j]=Y[j]*X[i];//39%of cache misses for(k=1;k<len;k+=2)Y[k]=(Y[k]+Y[k-1])/2.0;//61%of cache misses }}Figure1:First motivating example,view on cache misses given by traditional tools.N=10,len=100001.(c) ACM, 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Computing Frontiers, May 2006. /10.1145/1128022.1128071(a)view of reference pairs with long-distance reuse inRDVIS.(b)histogram of long-distance reuses.Gray scales cor-respond to the arrows in(a).(c)graphical view of intermediately executed code in RDVIS,and associated clusteranalysis.(d)view of Intermediately Executed Code of resp.the light and the dark gray cluster in (c).Figure 2:First motivating example,views produced by RDVIS.The colors were manually changed to gray scale,to make the results readable in this black-and-white copy of the paper.Most existing cache and data locality analyzers measure the locality or the cache misses and indicate at which lo-cations in the source code,or for which data structures,most cache misses occur [2,4,5,8,12,13,20,21,23,28,29,32].While this information is helpful in identifying the main bottlenecks in the program,it can still be difficult to deduce a suitable program transformation from it.In this regard,a few of these tools provide additional support for finding the underlying cause of conflict misses (e.g.CVT[28],CacheVIZ[32],YACO[25])or the underlying cause of poor spatial locality (e.g.SIP[4]).In contrast,we present a method to help identify the un-derlying causes of poor temporal data locality.Basically,poor temporal locality results when a large amount of other data is accessed between two consecutive uses of the same data.Improving the locality requires diminishing the vol-ume of data accessed between use and reuse.The source code executed between use and reuse is responsible for ac-cessing the large data volume,resulting in a long reuse dis-tance .That source code is called the Intermediately Exe-cuted Code (IEC)of that reuse.Consequently,to improve the temporal data locality,a refactoring of the IEC is re-quired.In this paper,we present two tools that analyze the IEC in different ways to pinpoint the required 
refactorings:RD-VIS (Reuse Distance VISualizer),which has been discussed earlier in [7],and SLO (Suggestions for Locality Optimiza-tions).RDVIS represents the IEC as a set of basic blocks ex-ecuted between long-distance reuses.In a typical program,there are a huge number of data reuses,and consequently a huge number of corresponding IECs.RDVIS applies a cluster analysis to the IECs so that the main patterns of poor locality-generating source code are revealed.Based on a visual representation of the resulting clusters and high-lighting of the corresponding source code,the programmer can deduce the necessary program optimizations.In SLO,the loop structure and the call graph of the IEC is also taken into account,allowing it to go one step further than RDVIS.SLO pinpoints the exact source code refactorings that are needed to improve locality.Examples of such refac-torings are loop tiling,and computation fusion,which are demonstrated in the motivating examples in section 2.In section 3,reuse distances and associated terms are defined.Section 4describes how RDVIS analyzes the IEC.Section 5presents the analyses performed by SLO on the IEC to find the appropriate source code refactorings.In section 6,we provide a few case studies where these tools have been used to identify the required refactorings for a number of real-world programs from the SPEC2000benchmarks.For two of them,we applied the necessary transformations,leading to an average cross-platform speedup of about 2.3.Con-cluding remarks are given in section 7.2.MOTIV ATING EXAMPLESWe start by showing two small code examples where the indication of cache misses with traditional tools does not clearly reveal how to optimize the programs.Furthermore,we show how RDVIS and SLO visualize the intermediately executed code of long-distance reuses,and how that makes it easier to find source code refactorings that improve temporal locality.2.1Example 1:Intra-Procedural Loop ReusesThe code in figure 1shows a small piece of code,where a traditional tool would show that the first statement is responsible for about 39%of all cache misses and the second statement produces 61%of them.While this information indicates where cache misses occur,it is not directly clear how the locality of the program can be improved to diminish the number of cache misses.The views produced by our tools are shown in figure 2for RDVIS,and in figure 3for SLO.For each pair of refer-ences that generate many long-distance reuses,an arrow is drawn,starting at the reference that accesses the data first,and pointing to the reference that reuses that data after a long time.Figure 2(a)shows the four pairs of references that generate the majority of long-distance reuses:(Y[k],Y[j]),(Y[k-1],Y[j]),(Y[j],Y[k])and (Y[j],Y[k-1]).Figure 2(b)shows that each of those 4pairs generate about the same amount of long-distance reuses at distance 217,meaning that about 217other elements are accessed between those reuses.(In this example,N =10and len =100001).When the cache can contain 210elements (as indicated by the background),all the reuses at a larger distance lead to cache misses.So,the reuses at distance 217must be made smaller than 210,in other words,largely diminishing the amount of data accessed between use and reuse.To optimize each of the four arrows in figure 2(a),the first step is to pinpoint which code is responsible for generating the accesses between use and reuse.The second step is to refactor the code so that fewer data elements are accessed between use and reuse.RDVIS records the 
basic blocks executed between each use and reuse,and allows to visu-ally indicate the corresponding source code for each arrow.Besides an evaluation examining each arrow separately,RD-VIS also contains a cluster analysis.The arrows with similar IEC are put in the same cluster.As an example,figure 2(c)shows how RDVIS graphically represents the result of the cluster analysis.On the left hand side,the code executed(a)5different optimizations indicated by gray scale,with respect to the reuse distance of the reuses they optimize,as shown by SLO(b)Indication of the two optimizations for the reuses at distance 217,as indicated by SLO.The light gray op-timization indicates fusion of the two inner loops.The dark gray optimization requires tiling the outer loop.Figure 3:First motivating example,SLO view.The colors were manually altered to gray scale,to make the results readable in this black-and-white copy of the paper.between use and reuse is graphically represented.There are four horizontal bands,respectively representing the IEC of the four arrows in figure 2(a).In each band,the basic blocks in the program are represented,left to right.If a basic block is executed between use and reuse,it is colored in a shade of gray,otherwise it is nearly white.Originally,RDVIS pro-duces a colored view with higher contrast.Here,the colors were converted to enhance readability in black-and-white.Figure 2(c)shows that the code executed between use and reuse of arrows 1and 2are identical.Also the code exe-Figure4:Code and left-over long reuse distance after loop fusion.cuted between use and reuse of arrow3and4are identical. On the right hand side,the cluster dendrogram graphically indicates how“similar”the IEC is for each arrow.In this example,the user has manually selected two subclusters.It shows that52.6%of the long distance reuses are generated by the light gray cluster,while47.4%are generated by the dark gray cluster.Furthermore,infigure2(d),the IEC for the two clusters has been highlighted in the source code by RDVIS.The code that is executed between use and reuse is highlighted in bold.This shows that for the light gray cluster,the uses occur in the j-loop,while the reuses occur in the k-loop.Both the use and the reuse occur in the same iteration of the i-loop,since the loop control code:i<N; i++is not highlighted.These two arrows can be optimized by loop fusion,as is discussed in detail below.In the dark gray cluster,it shows that the control of loop i:i<N;i++ is executed between use and reuse.Hence the use and reuse occur in different iterations of the outer i-loop.The expe-rienced RDVIS-user recognizes from this pattern that loop tiling needs to be applied,as discussed in more detail below. 
In contrast to RDVIS,where the programmer needs to examine the Intermediately Executed Code to pinpoint op-timizations,SLO analyzes the IEC itself,and interactively indicates the optimizations that are needed.For example, infigure3(b),the required loop fusion and loop tiling are indicated by a bar on the left hand side.Furthermore,the histogram produced by SLO indicates which reuses can be optimized by which optimization in different colors,e.g.see figure3(a).The upper histogram shows the absolute num-ber of reuses at a given distance.The bottom histogram Figure5:Code and left-over long reuse distance after loop tiling.shows the fraction of reuses at a given distance that can be optimized by each transformation.Below,we explain how loop fusion and loop tiling can be used to improve the locality and performance.These two transformations are the most important optimizations for improving temporal locality in loops.2.1.1Optimizing Pattern1:Loop FusionFrom both the views produced by RDVIS(fig.2(d)at the top)and SLO(fig.3(b)at the top),it shows that about half of the long-distance reuses occur because element Y[j] is used in thefirst loop,and it is reused by references Y[k] and Y[k-1]in the second loop.The distance is long because between the reuses,all other elements of array Y are accessed by the same loops.For this pattern,the reuse distance can be reduced by loop fusion:instead of running over array Y twice,the computations from both loops are performed in one run over the array.In order to fuse the loops,thefirst loop is unrolled twice,after which they are fused,under the assumption that variable len is odd,resulting in the code in figure4.The histogram in thefigure shows that the long-distance reuses targeted have all been shortened to distances smaller than25.This results in a speedup of about1.9ona Pentium4system,due to fewer cache misses,see table1.2.1.2Optimizing Pattern2:Loop TilingAfter fusing the inner loops,the code can be analyzed again for the causes of the remaining long reuse distance patterns.Figure4shows how SLO indicates that all left-version exec.time speeduporig0.183sfused0.098s 1.87fused+tiled0.032s 5.72Table1:Running times and speedups of the code before and after optimizations,on a2.66Ghz Pen-tium4,for N=10,len=1000001.1double inproduct(double*X,double*Y,int len){ int i;double result=0.0;for(i=0;i<len;i++)result+=X[i]*Y[i];//50%of cache misses 5return result;}double sum(double*X,int len){int i;double result=0.0;10for(i=0;i<len;i++)result+=X[i];//50%of cache missesreturn result;}15double prodsum(double*X,double*Y,int len){ double inp=inproduct(X,Y,len);double sumX=sum(X,len);double sumY=sum(Y,len);return inp+sumX+sumY;20}Figure6:View on cache misses as provided by most traditional tools for the second example.over long reuse distances occur because the use is in one iteration of the i-loop,and the reuse is in a later iteration. Consequently,the tool indicates that the i-loop should be tiled,by displaying a bar to the left of the loop source code. 
Loop tiling is applied when the long-distance reuses occur between different iterations of a single outer loop.When this occurs,it means that in a single iteration of that loop,more data is accessed than canfit in the cache.The principle idea behind loop tiling is to process less data in one iteration of the loop,so that data can be retained in the cache between several iterations of the loop.Figure5shows the code after tiling.Now,the inner j-loop executes at most50iterations (see variable tilesize),and hence the amount of data ac-cessed in the inner loop is limited.As a result,the reuses between different iterations of the i-loop are shortened from a distance of217to a distance between27and29,see the histograms in Figures4and5.Note that some reuses have increased in size:1in50reuses between iterations of the j-loop infigure4have increased from24–25to29–210(see dark bars infigure5).This is because1in50reuses in the original j-loop are now between iterations of the outer jj-loop.The end result is the removal of all long-distance reuses.As a result,the overall measured program speedup is5.7,see table1.2.2Example2:Inter-Procedural ReusesThe second example is shown infigure6.The code in function prodsumfirst calculates the inproduct of two arrays by calling inproduct,after which the sum of all elements in both arrays is computed by calling function sum.Most existing tools would show,in one way or another,that halfof the misses occur on line4,and the other half are causedby the code on line11.In contrast,RDVIS shows two reference pairs,indicatedby arrows,that lead to long distance reuses,seefigure7.By examining the highlighted code carefully,the program-mer canfind that uses occur in the call to inproduct,while reuses occur in one of the two calls to sum.Here,the pro-grammer must perform an interprocedural analysis of the IEC.SLO,on the other hand,performs the interprocedu-ral analysis for the programmer,and visualizes the result as shown infigure8.It clearly identifies that for half of the long-distances reuses,inproduct must be fused with thefirst call to sum,and for the other half inproduct must be fused with the second call to sum.3.BASIC DEFINITIONSIn this section,we review the basic terms and definitions that are used to characterize reuses in a program.Definition1.A memory access a x is a single access to memory,that accesses address x.A memory reference r is the source code construct that leads to a memory instructionat compile-time,which in turn generates memory accessesat run-time.The reference that generates memory access a xis denoted by ref(a x).The address accessed be a memory access is denoted by addr(a x),i.e.addr(a x)=x.Definition2.A memory access trace T is a sequence of memory accesses,indexed by a logical time.The differencein time between consecutive accesses in a trace is1.The time of an access a x is denoted by T[a x].Definition3.A reuse pair a x,a x is a pair of memory accesses in a trace such that both accesses address the same data,and there are no intervening accesses to that data. The use of a reuse pair is thefirst access in the pair;the reuse is the second access.A reference pair(r1,r2)is a pair of memory references. 
The reuse pairs associated with a reference pair(r1,r2)is the set of reuse pairs for which the use is generated by r1and the reuse is generated by r2,and is denoted by reuses(r1,r2).Definition4.The Intermediately Executed Code(IEC) of a reuse pair a x,a x is the code executed between T[a x] and T[a x].Definition5.The reuse distance of a reuse pair froma trace,is the number of unique memory addresses in that trace between use and reuse.Cache misses are identified by the reuses that have a dis-tance larger than the cache size[6].4.RDVIS:IEC ANALYSIS BY BASIC BLOCKVECTOR CLUSTERINGIn RDVIS,the Intermediately Executed Code is repre-sented by a basic block vector:Definition6.The basic block vector of a reuse paira x,a x ,denoted by BBV( a x,a x )is a vector∈{0,1}n, where n is the number of basic blocks in the program.Whena basic block is executed between use and reuse,the corre-sponding vector element is1,otherwise it is0.(a)IEC for first referencepair.(b)IEC for second reference pair.Figure 7:Indication of intermediately executed code byRDVIS.(a)Two required fusions of functions indicated byarrows.(b)The reuse distance histogram for the reuses opti-mized by the two arrows in (a),for len =1000000.Figure 8:Indication of locality optimizations by SLO.The basic block vector of a reference pair (r 1,r 2),denoted by BBV ((r 1,r 2))is a vector ∈[0,1]n .The value of a vector element is the fraction of reuse pairs in reuses(r 1,r 2)for which the basic block is executed between use and reuse.More formally:BBV ((r 1,r 2))=Pa x ,a x ∈reuses(r 1,r 2)BBV( a x ,a x )#reuses(r 1,r 2)In RDVIS,reference pairs are visually represented by ar-rows drawn on top of the source code,e.g.figure 2.The tool allows to highlight the code executed between use and reuse for each individual arrow.Additionally,RDVIS clusters ar-rows according to the similarity of their IEC.The similarity (or rather dissimilarity)of the code exe-cuted between two reference pairs is computed as the Man-hattan distance of the corresponding basic block vectors in the vector space [0,1]n .When exactly the same code is exe-cuted between the reuses,the distance is 0;when the code is completely dissimilar,the distance is n .Based on the Man-hattan distance,an agglomerative clustering is performed,which proceeds as follows.First,each reference pair forms a separate cluster.Then,iteratively,the two closest clustersare merged into a single cluster.The basic block vector cor-responding with the new cluster is the average of the two basic block vectors that represent the merged clusters.The clustering stops when all reference pairs are combined into one large cluster.The distances between different subclus-ters are shown graphically in the dendrogram,and the user selects “interesting-looking”or “tight”subclusters. 
E.g.in figure 2(c),the user selected two very tight subclusters:the light gray and the dark gray subcluster.Since similar code is executed between use and reuse in a tight subcluster,it is likely that the long-distance reference pairs can be optimized by the same refactoring,e.g.see figure 2(d).5.SLO:IEC ANALYSIS BY INTERPROCE-DURAL CONTROL FLOW INSPECTIONSLO aims to improve on RDVIS by analyzing the IEC further and automatically pinpoint the refactorings that are necessary to improve temporal locality,even in an interpro-cedural context.To make this possible,SLO tracks the loop headers (i.e.the basic blocks that control whether a loop body is executed [1])and the functions that are executed between use and reuse,using the following framework.5.1Step1:Determining the Least CommonAncestor FunctionFigure9:The Least Common Ancestor Frame (LCAF)of a reuse,indicated in the activation tree. The activation tree represents a given time during the execution of the code infigure6,assuming that the use occurs inside function inproduct,and the reuse occurs inside sum.SLO proceeds byfirst determining the function in which the refactoring must be applied.In a second step,the exact refactoring on which part of that function’s code is com-puted.The refactoring must be applied in the“smallest”function in which both the use and the reuse can be seen. This is formalized by the following definitions,and illus-trated infigure9.Definition7.The activation tree[1]of a running pro-gram is a tree with a node for every function call at run-time and edges pointing from callers to callees.The use site of a reuse pair a x,a x is the node cor-responding to the function invocation in which access a x occurs.The reuse site is the node where access a x occurs. The Least Common Ancestor Frame(LCAF)of a reuse pair a x,a x is the least common ancestor in the acti-vation tree of the use site and the reuse site of a x,a x .The Least Common Ancestor Function is the function that corresponds to the least common ancestor frame.The LCAF is the function where some refactoring is needed to bring use and reuse closer together.Once the LCAF has been determined,the loop structure of the LCAF is exam-ined,and the basic blocks in the LCAF executed between use and reuse.Definition8.The basic block in the LCAF,in which the use occurred(directly or indirectly through a function call), is called the Use Basic Block(UseBB)of a x,a x ;the basic block that contains the reuse is called the Reuse Ba-sic Block(ReuseBB)of a x,a x .5.2Step2:Analyzing the Control FlowStructure in the Least Common AncestorFunctionThe key to the analysis isfinding the loops that“carry”the reuses.These loops are found by determining the Non-nested Use and Non-nested Reuse Basic Blocks,as defined below(illustrated infigure10):Definition9.The Nested Loop Forest of a function is a graph,where each node represents a basic block in the function,and there are edges from a loop header to each basic block directly controlled by that loop header.The Outermost Executed Loop Header(OELH)ofa basic block BB with respect to a given reuse pair a x,a x is the unique ancestor of BB in the nested loop forest that has been executed between use a x and reuse a x, but does not have ancestors itself that are executed between use and reuse.The Non-nested Use Basic Block(NNUBB)of a x,a x is the OELH of the use basic block of a x,a x .The Non-nested Reuse Basic Block(NNRBB)of a x,a x is the OELH of the reuse basic block of a x,a x .5.3Step3:Determining the RequiredRefactoringRefactorings are 
5.3 Step 3: Determining the Required Refactoring

Refactorings are determined by analyzing the NNUBB and NNRBB. We subdivide into 3 different patterns:

Pattern 1: Reuse occurs between iterations of a single loop. This occurs when NNUBB = NNRBB, and they are loop headers. Consequently, a single loop carries the reuses. This pattern arises when the loop traverses a "data structure"¹ in every iteration of the loop. The distance of reuses across iterations can be made smaller by ensuring that only a small part of the data structure is traversed in any given iteration. As such, reuses of data elements between consecutive iterations are separated by only a small amount of data, instead of the complete data structure.

A number of transformations have been proposed to increase temporal locality in this way, e.g. loop tiling [26, 30], data shackling [18], time skewing [31], loop chunking [3], data tiling [16] and sparse tiling [27]. We call these transformations tiling-like optimizations. An extreme case of such a tiling-like optimization is loop permutation [22], where inner and outer loops are swapped, so that the long-distance accesses in different iterations of the outer loop become short-distance accesses between iterations of the inner loop. Examples of occurrences of this pattern are indicated by bars with the word "TILE L..." in figures 3, 4 and 5.

Pattern 2: Use is in one loop nest, the reuse in another. When NNUBB and NNRBB are different loop headers, reuses occur between different loops. The code traverses a data structure in the loop indicated by the NNUBB. The data structure is retraversed in the NNRBB-loop. The reuses can be brought closer together by doing only a single traversal, performing computations from both loops at the same time. This kind of optimization is known as loop fusion. We call the required transformation a fusion-like optimization. Examples of this pattern are indicated by bars with the word "FUSE L..." in figure 3.

Pattern 3: NNUBB and NNRBB are not both loop headers. When one of NNUBB or NNRBB is not a loop header, it means that either the use or the reuse is not inside a loop in the LCAF. It indicates that data is accessed in one basic block (possibly indirectly through a function call), and the other access may or may not be in a loop. So, the reused data structure is traversed twice by two separate code pieces. In this case, bringing use and reuse closer together requires that the computations done in the NNUBB and in the NNRBB are "fused" so that the data structure is traversed only once.

¹The data structure could be as small as a single scalar variable or as large as all the data in the program.
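As a minimal illustration of the fusion-like optimization named in pattern 2 (a generic example of ours, not one of the paper's benchmark loops), the sketch below replaces two traversals of the same array by a single fused traversal, shrinking the reuse distance of each element from roughly the array size to a constant.

```python
# Fusion-like optimization sketch: the array is invented example data.
N = 1000
a = [float(i) for i in range(N)]

# Before: two separate traversals of the same array (pattern 2).
# The reuse distance between a[i] in loop 1 and a[i] in loop 2 is ~N.
s1 = 0.0
for i in range(N):
    s1 += a[i] * 2.0
s2 = 0.0
for i in range(N):
    s2 += a[i] * a[i]

# After fusion: one traversal; a[i] is reused immediately, so the
# reuse distance drops to a small constant.
f1 = f2 = 0.0
for i in range(N):
    x = a[i]
    f1 += x * 2.0
    f2 += x * x

assert s1 == f1 and s2 == f2  # same results, better temporal locality
```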


Accelerating CUDA Graph Algorithms at Maximum WarpSungpack Hong Sang Kyun Kim Tayo Oguntebi Kunle OlukotunComputer Systems LaboratoryStanford University{hongsup,skkim38,tayo,kunle}@AbstractGraphs are powerful data representations favored in many compu-tational domains.Modern GPUs have recently shown promising re-sults in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular,as most real-world graphs tend to be.In this study, wefirst observe that the poor performance is caused by work imbal-ance and is an artifact of a discrepancy between the GPU program-ming model and the underlying GPU architecture.We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users.Our method signif-icantly improves the performance of applications with heavily im-balanced workloads,and enables trade-offs between workload im-balance and ALU underutilization forfine-tuning the performance.Our evaluation reveals that our method exhibits up to9x speedup over previous GPU algorithms and12x over single thread CPU execution on irregular graphs.When properly configured,it also yields up to30%improvement over previous GPU algorithms on regular graphs.In addition to performance gains on graph algo-rithms,our programming method achieves1.3x to15.1x speedup on a set of GPU benchmark applications.Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.Categories and Subject Descriptors D.1.3[Programming Tech-niques]:Concurrent Programming–Parallel programming; D.3.3 [Programming Languages]:Language Constructs and Features–PatternsGeneral Terms Algorithms,PerformanceKeywords Parallel graph algorithms,CUDA,GPGPU1.IntroductionGraphs are widely-used data structures that describe a set of ob-jects,referred to as nodes,and the connections between them, called edges.Certain graph algorithms,such as breadth-first search, minimum spanning tree,and shortest paths,serve as key compo-nents to a large number of applications[4,5,15–17,22,25]and have thus been heavily explored for potential improvement.Despite the considerable research conducted on making these algorithms Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.PPoPP’11,Feb12–16,2011,San Antonio,Texas,USA.Copyright©2011ACM978-1-4503-01190-0/11/02...$10.00efficient,and the significant performance benefits they have reaped due to ever-increasing computational power,processing large irreg-ular graphs quickly and effectively remains an immense challenge today.Unfortunately,many real-world applications involve large irregular graphs.It is therefore important to fully exploit thefine-grain parallelism in these algorithms,especially as parallel compu-tation resources are abundant in modern CPUs and GPUs.The Parallel Random Access Machine(PRAM)abstraction has often been used to investigate theoretical parallel performance of graph algorithms[18].The PRAM abstraction assumes an infinite number of processors and unit latency to shared memory from any of the processors.Actual 
hardware approximations of PRAM,how-ever,have been rare.Conventional CPUs lack in number of pro-cessors,and clusters of commodity general-purpose processors are poorly-suited as PRAM approximations due to their large inter-node communication latencies.In addition,clusters which span multiple address spaces impose the added difficulty of partitioning the graphs.In the supercomputer domain,several accurate approx-imations of the PRAM,such as the Cray XMT[13],have demon-strated impressive performance executing sophisticated graph al-gorithms[5,17].Unfortunately,such machines are prohibitively costly for many organizations.GPUs have recently become popular as general computing de-vices due to their relatively low costs,massively parallel architec-tures,and improving accessibility provided by programming envi-ronments such as the Nvidia CUDA framework[23].It has been observed that GPU architectures closely resemble supercomputers, as they implement the primary PRAM characteristic of utilizing a very large number of hardware threads with uniform memory la-tency.PRAM algorithms involving irregular graphs,however,fail to perform well on GPUs[15,16]due to the workload imbalance between threads caused by the irregularity of the graph instance.In this paper,wefirst observe that the significant performance drop of GPU programs from irregular workloads is an artifact of a discrepancy between the GPU hardware architecture and direct application of PRAM-based algorithms(Section2).We propose a novel virtual warp-centric programming method that reduces the inefficiency in an intuitive but effective way(Section3).We apply our programming method to graph algorithms,and show significant speedup against previous GPU implementations as well as a multi-threaded CPU implementation.We discuss why graph algorithms can execute faster on GPUs than on multi-threaded CPUs.We also demonstrate that our programming method can benefit other GPU applications which suffer from irregular workloads(Section4).This work makes the following contributions:•We present a novel virtual warp-centric programming method which addresses the problem of workload imbalance,a general issue in GPU ing our method,we improve upon previous implementations of GPU graph algorithms,by several factors in the case of irregular graphs.Thread BlocksGraphics MemoryMemory Control UnitSM Unit Instr UnitShared Mem Reg FileALUSM Unit Instr UnitShared MemReg File ALUSM UnitInstr UnitShared MemReg File ALUWARPThreadFigure 1.GPU architecture and thread execution model in CUDA.•Our method provides a generalized,systematic scheme of warp-wise task allocation and improves the performance of GPU ap-plications which feature heavy branch divergence or (unneces-sary)scattering of memory accesses.Notably,it enables users to easily make the necessary trade-off between SIMD under-utilization and workload imbalance with a single parameter.Our method boosts the performance of a set of benchmark ap-plications suffering from these issues by 1.3x –15.1x.•We provide a comparative analysis featuring a GPU and twodifferent CPUs that examines which architectural traits are crit-ical to the performance of graph algorithms.In doing so,we show that GPUs can outperform other architectures by provid-ing sufficient random access memory bandwidth to exploit the abundant parallelism in the algorithm.2.Background2.1GPU Architectures and the CUDA Programming Model In this section,we briefly review the microarchitecture of modern graphics processors and the CUDA programming model approach 
to using them.We then provide a sample graph algorithm and illus-trate how the conventional CUDA programming model can result in low performance despite abundant parallelism available in the graph algorithm.In this paper,we focus on Nvidia graphics archi-tectures and terminology specific to their products.The concepts discussed here,however,are relatively general and apply to any similar GPU architecture.2.2Graph Algorithms on GPUFigure 1depicts a simplified block diagram of a modern GPU archi-tecture,only displaying modules related to general purpose com-putation.As seen in the diagram,typical general-purpose graphics processors consist of multiple identical instances of computation units called Stream Multiprocessors (SM).An SM is the unit of computation to which a group of threads,called thread blocks,are assigned by the runtime for parallel execution.Each SM has one (or more)unit to fetch instructions,multiple ALUs (i.e.,stream proces-sors or CUDA cores)for parallel execution,a shared memory ac-cessible by all threads in the SM,and a large register file which contains private register sets for each of the hardware threads.Each thread of a thread block is processed on an ALU in the SM.Since ALUs are grouped to share a single instruction unit,threads mapped on these ALUs execute the same instruction each cycle,but on different data.Each logical group of threads sharing instructions is called a warp.1Moreover,threads belonging to different warps can execute different instructions on the same ALUs,but in a dif-ferent time slot.In effect,ALUs are time-shared between warps.1Notethat the number of ALUs sharing an instruction unit (e.g.8)can be smaller than the warp size.In such cases,the ALUs are time-shared between threads in a warp;this resembles vector processors whose vector length is larger than the number of vector lanes.The following summarizes the discussion above:from the ar-chitectural standpoint,a group of threads in a warp performs as a SIMD(Single Instruction Multiple Data)unit,each warp in a thread block as a SMT(Simultaneous Multithreading)unit,and a thread block as a unit of multiprocessing.That said,modern GPU architectures relax SIMD constraints by allowing threads in a given warp to execute different instruc-tions.Since threads in a warp share an instruction unit,however,these varying instructions cannot be executed concurrently and are serialized in time,severely degrading performance.This advanced feature,so called SIMT (Single Instruction Multiple Threads),pro-vides increased programming flexibility by deviating from SIMD at the cost of performance.Threads executing different instructions in a warp are said to diverge;if-then-else statements and loop-termination conditions are common sources of divergence.Another characteristic of a graphics processor which greatly im-pacts performance is its handling of different simultaneous mem-ory requests from multiple threads in a warp.Depending on the accessed addresses,the concurrent memory requests from a warp can exhibit three possible behaviors:1.Requests targeting the same address are merged to be one unless they are atomic operations.In the case of write operations,the value actually written to memory is nondeterministically chosen from among merged requests.2.Requests exhibiting spatial locality are maximally coalesced.For example,accesses to addresses i and i +1are served by a single memory fetch,as long as they are aligned.3.All other memory requests (including atomic ones)are serial-ized in a nondeterministic order.This 
last behavior,often called the scattering access pattern,greatly reduces memory throughput,since each memory request utilizes only a few bytes from each memory fetch.To best utilize the aforementioned graphics processors for gen-eral purpose computation,the CUDA programming model was in-troduced recently by Nvidia [23].CUDA has gained great popu-larity among developers,engineers,and scientists due to its easily accessible compiler and the familiar C-like constructs of its API ex-tension.It provided a method of programming a graphics processor without thinking in the context of pixels or textures.There is a direct mapping between CUDA’s thread model and the PRAM abstraction;each thread is identified by its thread ID and is assigned to a different job.External memory access takes a unit amount of time in a massive threading environment,and no con-cept of memory coherence is enforced among executing threads.The CUDA programming model extends the PRAM abstraction to include the notion of shared memory and thread blocks,a reflec-tion of the underlying hardware architecture as shown in Figure 1.All threads in a thread block can access the same shared memory,which provides lower latency and higher bandwidth access than global GPU memory but is limited in size.Threads in a thread block may also communicate with each other via this shared memory.This widely-used programming model efficiently maps computa-tion kernels onto GPU hardware for numerous applications such as matrix multiplication.The PRAM-like CUDA’s thread model,however,exhibits cer-tain discrepancies with the GPU microarchitecture that can signifi-cantly degrade performance.Especially,it provides no explicit no-tion of warps;they are transparent to the programmers due to the SIMT ability of the processors to handle divergent threads.As a result,applications written according to the PRAM paradigm will likely suffer from unnecessary path divergence,particularly when each task assigned to a thread is completely independent from other tasks.One example is parallel graph algorithms,where the irregular nature of real-world graph instances often induce extreme branch1struct graph {2int nodes[N+1];//start index of edges from nth node 3int edges[M];//destination node of mth edge4int levels[N];//will contatin BFS level of nth node 5};67void bfs_main(graph *g,int root){8initialize_levels(g->levels,root);9curr =0;finished =false ;10do {11finished =true ;12launch_gpu_bfs_kernel(g,curr++,&finished);13}while (!finished);14}15__kernel__16void baseline_bfs_kernel(int N,int curr,int *levels,17int *nodes,int *edges,bool *finished){18int v =THREAD_ID;19if (levels[v]==curr){20//iterate over neighbors21int num_nbr =nodes[v+1]-nodes[v];22int *nbrs =&edges[nodes[v]];23for (int i =0;i <num_nbr;i++){24int w =nbrs[i];25if (levels[w]==INF){//if not visited yet 26*finished =false ;27levels[w]=curr +1;28}}}}Figure 2.The baseline GPU implementation of BFS algorithm.divergence problems and scattering memory access patterns,as will be explained in the next section.In Section 3,we introduce a new generalized programming method that uses its awareness of the warp concept to address this problem.Figure 2is an example of a graph algorithm written in CUDA using the conventional PRAM-style programming from a previous work [15].2This algorithm performs a breadth-first search (BFS)on the graph instance,starting from a given root node.More accu-rately,it assigns a “BFS level”to every (connected)vertex in the graph;the level represents the minimum number of hops to reach this 
node from the root node.Figure 2also describes the graph data structure used in the BFS,which is the same as the data structures used in other related work [6,15,16].This data structure consists of an array of nodes and edges,where each element in the nodes array stores the start index (in the edges array)of the edges outgoing from each node.The edges array stores the destination nodes of each edge.The last element of the nodes array serves as a marker to indicate the length of the edges array.Figure 3.(a)visualizes the data structure.3For this algorithm,the level of each node is set to ∞,except for the root which is set to zero.The kernel (code to be executed on the GPU)is called multiple times until all reachable nodes are visited,incrementing the current level by one upon each call.At each invo-cation,each thread visits a node which has the same current_level and marks all unvisited neighbors of the node with current_level+1.Nodes may be marked multiple time within a kernel invocation,since updates are not immediately visible to all threads.This does not affect correctness,as all updates will use the same correct value.This paper mainly focuses on the BFS algorithm,but our discussion can be applied to many similar parallel graph algorithms that pro-cess multiple nodes in parallel while exploring neighboring nodes from each.We will discuss some of these algorithms in Section 4.3.The baseline BFS implementation shown in Figure 2suffers a severe performance penalty when the graph is highly irregular,i.e.when the distribution of degrees (number of edges per node)is highly skewed.As we will show in Section 4,the baseline al-gorithm yields only a 1.5x speedup over a single-threaded CPU when the graph is very irregular.Performance degradation comes from execution path divergence at lines 19,23,and 25in Fig-ure 2.Specifically,a thread that processes a high-degree node will iterate the loop at line 23many more times than other threads,stalling other threads in its warp.Additional performance degrada-tion comes from non-coalesced memory operations at lines 21,22,25,and 27since their addresses exhibit no spatial locality across the threads.In addition,a repeated single-threaded access over con-2Thealgorithm presented actually contains additional optimizations we made to the original version [15];we eliminated unnecessary memory accesses and also eliminated an entire secondary kernel,which resulted in more than 20%improvement.We use this optimized version as our baseline.3This data-structure is also known as compressed sparse row (CSR)in sparse-matrix computation domain [9].………02599189…NodesEdges799250189…………78…(a)Degree# N o d e s(b)Figure 3.(a)A visualization of the graph data structure used in theBFS algorithm.(b)A degree distribution of a real-world graph instance (LiveJournal),which we used for our evaluation in Section 4.secutive memory addresses (i.e.,at line 21)actually wastes memory bandwidth by failing to exploit spatial locality in memory accesses.Unfortunately,the nature of most real-world graph instances is known to be irregular [24].Figure 3.(b)displays the degree distri-bution from one such example.Note that the plot is presented in log-log format.The distribution shows that although the average degree is small (about 17),there are many nodes which have de-grees 10x ∼100x (and some even 1000x)larger than the average.3.Addressing Irregular Workloads using GPUs3.1Virtual Warp-centric Programming MethodWe introduce a novel virtual warp-centric programming method which explicitly exposes 
the underlying SIMD nature of the GPU architecture to achieve better performance under irregular work-loads.Generalized warp-based task allocationInstead of assigning a different task to each thread as is typical in PRAM-style programming,our approach allocates a chunk of tasks to each warp and executes distinct tasks as serial.We uti-lize multiple threads in a warp for explicit SIMD operations only,thereby preventing branch-divergence altogether.More specifi-cally,the kernel in our programming model alternates between two phases:the SISD (Single Instruction Single Data)phase,which is the default serial execution mode,and the SIMD phase,the parallel execution mode.When the kernel is in the SISD phase,only a sin-gle stream of instructions is executed by each warp.In this phase,each warp is identified by a unique warp ID and works on an inde-pendent set of tasks.The degree of parallelism is thus maintained by utilizing multiple warps.In contrast,the SIMD phase begins by entering a special function explicitly invoked by the user.Once in the SIMD phase,each thread in the warp follows the same in-struction sequence,but on different data based on its warp offset,or the lane ID within the given SIMD width.Unlike the classi-cal (CPU-based)SIMD programming model,however,our SIMD threads are allowed more flexibility in executing instructions;they29template<int W_SZ>__device__30void memcpy_SIMD31(int W_OFF,int cnt,int*dest,int*src){32for(int IDX=W_OFF;IDX<cnt;IDX+=W_SZ) 33dest[IDX]=src[IDX];34__threadfence_block();}3536template<int W_SZ>__device__37void expand_bfs_SIMD38(int W_SZ,int W_OFF,int cnt,int*edges,39int*levels,int curr,bool*finished){40for(int IDX=W_OFF;IDX<cnt;IDX+=W_SZ){ 41int v=edges[IDX];42if(levels[v]==INF){43levels[v]=curr+1;44*finished=false;45}}46__threadfence_block();}4748struct warpmem_t{49int levels[CHUNK_SZ];50int nodes[CHUNK_SZ+1];51int scratch;52};53template<int W_SZ>__kernel__54void warp_bfs_kernel55(int N,int curr,int*levels,56int*nodes,int*edges,bool*finished){57int W_OFF=THREAD_ID%W_SZ;58int W_ID=THREAD_ID/W_SZ;59int NUM_WARPS=NUM_THREADS/W_SZ;60extern__shared__warp_mem_t SMEM[];61warpmem_t*MY=SMEM+(LOCAL_THREAD_ID/W_SZ); 6263//copy my work to local64int v_=W_ID*CHUNK_SZ;65memcpy_SIMD<W_SZ>(W_OFF,CHUNK_SZ,66MY->levels,&levels[v_]);67memcpy_SIMD<W_SZ>(W_OFF,CHUNK_SZ+1,68MY->nodes,&nodes[v_]);6970//iterate over my work71for(int v=0;v<CHUNK_SZ;v++){72if(MY->levels[v]==curr){73int num_nbr=MY->nodes[v+1]-MY->nodes[v]; 74int*nbrs=&edges[MY->nodes[v]];75expand_bfs_SIMD<W_SZ>(W_OFF,num_nbr,76nbrs,levels,curr,finished)77}}}Figure4.BFS kernel written in virtual warp-centric programming modelcan perform scattering/gathering memory accesses,execute condi-tional operations independently,and process dynamic data width. 
This is all done while taking advantage of the underlying hardware SIMT feature.The proposed programming method has several advantages: 1.Unless explicitly intended by the user,this approach never en-counters execution-path divergence issues.Intra-warp workload imbalance is therefore never unaware.2.Memory access patterns can be more coalesced than the con-ventional thread-level task allocation in applications where con-current memory accesses within a task exhibit much higher spa-tial locality than across different tasks.3.Many developers are already familiar with our approach,sinceit resembles,in many ways,the traditional SIMD programming model for CPU architectures.However,the proposed approach is even simpler and more powerful than SIMD programming for CPU,since CUDA allows users to describe custom SIMD operations with C-like syntax.4.This method allows for each task to allocate a substantialamount of privately-partitioned shared memory per task.This is because there are fewer warps than threads in a thread block.In order to generally apply our programming method within current GPU hardware and compiler environments,we take simple means of replicated computation:during the SISD phase,every thread in a warp executes exactly the same instruction on exactly the same data.We enforce this by assigning the same warp ID to all threads in a warp.Note that this does not waste memory bandwidth since accesses from the same warp to the same destination address are merged into one by the underlying hardware.Virtual Warp SizeAlthough naive warp-granular task allocation provides several merits aforementioned,it suffers from two potential drawbacks, where in both cases,unused ALUs within a warp limit the parallel performance of kernel execution:1.If the native SIMD width of the user application is small,theunderlying hardware will be under-utilized.2.The ratio of the SIMD phase duration to the SISD phase dura-tion imposes an Amdahl’s limit on performance.We address these issues by logically partitioning a warp into multiple virtual warps.Specifically,instead of setting the warp size parameter value to be the actual physical warp size of32,we use a divisor(i.e.4,8,and16).Multiple virtual warps are then co-located in one physical warp,with each virtual warp processing a different task.Note that all previous assumptions on a warp’s execution behavior–synchronized execution and merged memory accesses for the threads inside a warp–are still valid within virtual warps.Thus,the parallelism of the SISD phase increases as a result of having multiple virtual warps for each physical warp,and the ALU utilization improves as well due to the logically narrower SIMD width.Using virtual warps leads to the possibility of execution path divergence among different virtual warps,which in turn serializes different instruction streams among the warps.The degree of diver-gence among virtual warps,however,is most likely much less than among threads in a conventional PRAM warp.In essence,the vir-tual warp scheme can be viewed as a trade-off between execution-path divergence and ALU underutilization by varying a single pa-rameter,the virtual warp size.BFS in the Virtual Warp-centric Programming MethodFigure4displays the implementation of the BFS algorithm using our virtual warp-centric method.While the underlying BFS algorithm is fundamentally identical to the baseline implementation in Figure2,the new implementation divides into SISD and SIMD phases.The main kernel(lines54-77)executes the same instruction and data pattern for 
every thread in the warp,thus operating in the SISD phase.Functions in lines30-46operate in the SIMD phase, since distinct partitions of data are processed.Each warp also uses a private partition of shared memory;the data structure in lines48-51 illustrates the layout of each private partition.Lines57-61of the main kernel define several utility variables. The virtual-warp size(W_SZ)is given as a template parameter;the warp ID(W_ID)of the current warp and the warp offset(W_OFF)of each thread is computed using the warp size.Warp-private memory space is allocated by setting the pointer(MY)to the appropriate location in the shared memory space.The virtual warp-centric implementation copies its portion of work to the private memory space(lines64-68)before executing the main loop.As the function name implies,the memory copy operation is performed in a SIMD manner.After the memory copy operationfinishes,the kernel executes the iterative BFS algorithm78BEGIN_SIMD_DEF(memcpy,int*dest,int*src) 79{dest[IDX]=src[IDX];}END_SIMD_DEF8081BEGIN_SIMD_DEF(expand_bfs,int*edges,82int*level,int curr,bool*finished)83{int v=edges[IDX];84if(level[v]==INF){85level[v]=curr+1;86*finished=false;87}}END_SIMD_DEF8889BEGIN_WARP_KERNEL(warp_bfs_kernel,90int N,int curr,int*level,91int*nodes,int*edges,bool*finished){92USE_PRIV_MEM(warp_mem_t);93//copy my_work94int v_=N/NUM_WARPS*W_ID;95DO_SIMD(mempcy,CHUNK_SZ,MY->levels,&level[v_]); 96DO_SIMD(mempcy,CHUNK_SZ+1,MY->nodes,&nodes[v_]); 9798//iterate over my_work99for(int v=0;v<CHUNK_SZ;v++){100if(level[v]==curr){101int num_nbr=MY->nodes[v+1]-MY->nodes[v];102int*nbrs=&edges[MY->nodes[v]];103DO_SIMD(expand_bfs,num_nbr,104nbrs,begin,level,curr,finished)105}}}END_WARP_KERNELFigure5.Same code as Figure4using macro-expansion.Type definition of warp_mem_t is same as before and omitted.sequentially(lines71-77),with the exception of explicitly-called SIMD functions.The expansion of BFS neighbors(line75)is an explicit SIMD function call to the one defined at line37,whose functionality is equivalent to lines23-27of the baseline algorithm Figure2.For a detailed explanation of how SIMD functions are imple-mented,consider the simple memcpy function in line30.Each thread in a warp enters the function with a distinct warp offset(W_OFF), which leads to a different range of indices(IDX)of the data to be copied.The SIMT feature of CUDA enables the width of the SIMD operation to be determined dynamically.Although the SIMT fea-ture guarantees synchronous execution of all threads at the end of the memcpy function,__threadfence_block()at line34is still re-quired for intra-warp visibility of any pending writes before re-turning to SISD phase.4The second SIMD function,expand_bfs (line37),is structured similarly to memcpy.The if-then-else state-ment in line42is an example of a conditional SIMD operation, automatically handled by SIMT hardware.Using the virtual warp-centric method,the BFS code exhibits no execution-path divergence other than intended dynamic widths and conditional operations,as shown in Figure4.Moreover,memory accesses are coalesced except thefinal scattering at line42and43, which are inherent to the nature of the BFS algorithm.Abstracting the Virtual Warp-centric Programming Method As evident in the BFS example,the virtual warp-centric pro-gramming method is intuitive enough to be manually applied by GPU programmers.Closer inspection of the code in Figure4,how-ever,reveals some structural repetition in patterns that serve the programming method itself,rather than the user 
algorithm.Thus, providing an appropriate abstraction for the model can further re-duce programmer effort as well as potential for error in the struc-tural part of the program.To this end,we introduce a small set of syntactic constructs in-tended to facilitate use of the programming model.Figure5illus-trates how these constructs can simplify our previous warp-centric BFS implementation.For example,the SIMD function memcpy(line 30-33)in Figure4can be concisely expressed as line78-79in Fig-ure5.The constructs BEGIN_SIMD_DEF and END_SIMD_DEF automat-ically generate the function definition and outer-loop for work dis-tribution.The user invokes the SIMD function using the DO_SIMD construct(line95),where the function name,dynamic width,and other arguments are specified.Similarly,the BEGIN_WARP_KERNEL and END_WARP_KERNEL constructs indicate and generate the begin-ning and end of a warp-centric kernel,while the USE_PRIV_MEM con-struct allocates a private partition of shared memory.4Although intra-warp visibility is attainable without the fence in some GPU generations(e.g.GT200),it is not guaranteed in general by the CUDA specification.Also note that the fence guarantees threadblock-wide visibility,which is larger than required;however,the performance impact of the overhead is negligible.The current set of constructs are implemented as C-macros, which is adequate to demonstrate how these constructs can gen-erate desired routines and simplify programming.However,future compiler support of such virtual warp-centric constructs,or simi-lar,could provide further benefits.For example,the compiler may choose to generate codes for SISD regions such that only a sin-gle thread in a warp is actually activated,rather than replicating computation.This eliminates unnecessary allocation of duplicated registers which are used only in the SISD phase and can also save power wasted by replicated computation.3.2Other TechniquesIn this subsection,we discuss two other general techniques for addressing work imbalance.These techniques do not necessarily rely on the new programming model but can accompany it.Deferring OutliersThefirst technique is deferring execution of exceptionally large-sized tasks,which we term’outliers’.Since there are a limited number of such tasks which induce load imbalance,we identify these tasks during main-kernel execution and defer their processing by placing them in a globally-shared queue,rather than processing them on-line.In subsequent kernel calls,each of the deferred tasks is executed individually,with its work parallelized across multiple threads.Figure6illustrates this idea.In the BFS algorithm,the amount of work is proportional to the degree of each node,which is obtainable in0(1)time given our data structure.For this technique,therefore,we simply defer processing of any node having degree greater than a predetermined threshold. 
Results in Section4explore the effects on performance when one varies this threshold.This optimization technique requires the implementation of a global queue,a challenging task on a GPU in general.It is rela-tively simple,however,to implement a queue that always grows (or shrinks)during a kernel’s execution.The code below exempli-fies such an implementation using a single atomic operation: AddQueue(int*q_idx,type_t*q,type_t item){ int old_idx=AtomicAdd(q_idx,1);q[old_idx]=item;}In our case,the overhead of the atomic operations is negligible compared to overall execution time,since queuing of deferred out-liers is rare.However,this technique presents additional overhead via subsequent kernel invocations to process the deferred outliers.Dynamic Workload DistributionThe virtual warp-centric programming method addresses the problem of workload imbalance inside a warp.However,there still exists the possibility of workload imbalance between warps:a sin-gle warp processing an exceptionally large task can stall the entire thread block(mapped to an SM),wasting computational resources. To solve this problem,we apply a dynamic workload distribution。

What Every Computer Scientist Should Know About Floating-Point Arithmetic


《医学主题词表》 (Medical Subject Headings, MeSH)


Source: recommended by the medical editorial department of the Innovation Medicine Network (创新医学网).

(1) The Medical Subject Headings (MeSH) is the authoritative subject heading thesaurus compiled by the U.S. National Library of Medicine.

It is a standardized, extensible, and dynamic thesaurus.

The U.S. National Library of Medicine uses it as the basis for biomedical indexing, to compile Index Medicus, and to build MEDLINE, its online computerized literature retrieval system.

MeSH collects some 18,000 medical subject headings.

The importance of MeSH in literature retrieval shows mainly in two respects: accuracy (precisely revealing the subject of a document's content) and specificity.

Subject headings serve as the standard vocabulary in both processes: indexing (analyzing the subject of a document and converting it from natural language into a standardized retrieval language), in which indexers enter information into the retrieval system, and searching, in which users exploit the information held in the system. This keeps the terms used in indexing and searching consistent and yields the best retrieval results.

(2) When searching, after the user enters a subject heading, the system automatically displays the subheadings that can be combined with that heading.

MeSH includes a list of subheadings: Index Medicus used 77 subheadings in 1989-1990 and 80 in 1991-1994, with slight changes every year.

Index Medicus currently uses 92 subheadings.

Subheadings, also called Qualifiers, are combined with subject headings to restrict or subdivide the concept of a heading, giving the heading greater specificity.

Examples include Diagnosis (DI), Drug Therapy (DT), and Blood Supply (BS).

Choosing the right subheading is also critical.

For example, for pulmonary hypoplasia, after entering the subject heading "Lung", select "abnormalities" from the subheading menu to express the hypoplasia; likewise, uterus didelphys is searched as Uterus/abnormalities.

(3) The Annotated Alphabetic List of MeSH (MeSHAAL) strictly specifies, for each category, the rules for combining subject headings with subheadings, and combinations must follow those rules.

For example, the subheading therapy, combined with a disease heading, can be used for combined therapy.

For example, psychotherapy for peptic ulcer is expressed as Peptic Ulcer/therapy; Psychotherapy.
3 Natural Language Question Answering over RDF - A Graph Data Driven Approach


Natural Language Question Answering over RDF — A Graph Data Driven ApproachLei ZouPeking University Beijing, ChinaRuizhe HuangPeking University Beijing, ChinaHaixun Wang ∗Microsoft Research Asia Beijing, Chinazoulei@ Jeffrey Xu YuThe Chinese Univ. of Hong Kong Hong Kong, Chinahuangruizhe@ Wenqiang HePeking University Beijing, Chinahaixun@ Dongyan ZhaoPeking University Beijing, Chinayu@.hk ABSTRACThewenqiang@zhaody@RDF question/answering (Q/A) allows users to ask questions in natural languages over a knowledge base represented by RDF. To answer a national language question, the existing work takes a twostage approach: question understanding and query evaluation. Their focus is on question understanding to deal with the disambiguation of the natural language phrases. The most common technique is the joint disambiguation, which has the exponential search space. In this paper, we propose a systematic framework to answer natural language questions over RDF repository (RDF Q/A) from a graph data-driven perspective. We propose a semantic query graph to model the query intention in the natural language question in a structural way, based on which, RDF Q/A is reduced to subgraph matching problem. More importantly, we resolve the ambiguity of natural language questions at the time when matches of query are found. The cost of disambiguation is saved if there are no matching found. We compare our method with some state-of-theart RDF Q/A systems in the benchmark dataset. Extensive experiments confirm that our method not only improves the precision but also speeds up query performance greatly.and predicates are edge labels. Although SPARQL is a standard way to access RDF data, it remains tedious and difficult for end users because of the complexity of the SPARQL syntax and the RDF schema. An ideal system should allow end users to profit from the expressive power of Semantic Web standards (such as RDF and SPARQLs) while at the same time hiding their complexity behind an intuitive and easy-to-use interface [13]. Therefore, RDF question/answering (Q/A) systems have received wide attention in both NLP (natural language processing) [29, 2] and database areas [30].1.1MotivationCategories and Subject DescriptorsH.2.8 [Database Management]: Database Applications—RDF, Graph Database, Question Answering1.INTRODUCTIONAs more and more structured data become available on the web, the question of how end users can access this body of knowledge becomes of crucial importance. As a de facto standard of a knowledge base, RDF (Resource Description Framework) repository is a collection of triples, denoted as subject, predicate, object , and can be represented as a graph, where subjects and objects are vertices∗ Haixun Wang is currently with Google Research, Mountain View, CA.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@. SIGMOD’14, June 22–27, 2014, Snowbird, UT, USA. Copyright 2014 ACM 978-1-4503-2376-5/14/06 ...$15.00. /10.1145/2588555.2610525 ...$.There are two stages in RDF Q/A systems: question understanding and query evaluation. 
Existing systems in the first stage translate a natural language question N into SPARQLs [29, 6, 13], and in the second stage evaluate all SPARQLs translated in the first stage. The focus of the existing solutions is on query understanding. Let us consider a running example in Figure 1. The RDF dataset is given in Figure 1(a). For the natural language question N “Who was married to an actor that played in Philadelphia ? ”, Figure 1(b) illustrates the two stages done in the existing solutions. The inherent hardness in RDF Q/A is the ambiguity of natural language. In order to translate N into SPARQLs, each phrase in N should map to a semantic item (i.e, an entity or a class or a predicate) in RDF graph G. However, some phrases have ambiguities. For example, phrase “Philadelphia” may refer to entity Philadelphia(film) or Philadelphia_76ers . Similarly, phrase “play in” also maps to predicates starring or playForTeam . Although it is easy for humans to know the mapping from phrase “Philadelphia” (in question N ) to Philadelphia_76ers is wrong, it is uneasy for machines. Disambiguating one phrase in N can influence the mapping of other phrases. The most common technique is the joint disambiguation [29]. Existing disambiguation methods only consider the semantics of a question sentence N . They have high cost in the query understanding stage, thus, it is most likely to result in slow response time in online RDF Q/A processing. In this paper, we deal with the disambiguation in RDF Q/A from a different perspective. We do not resolve the disambiguation problem in understanding question sentence N , i.e., the first stage. We take a lazy approach and push down the disambiguation to the query evaluation stage. The main advantage of our method is it can avoid the expensive disambiguation process in the question understanding stage, and speed up the whole performance. Our disambiguation process is integrated with query evaluation stage. More specifically, we allow that phrases (in N ) correspond313Subject Antonio_Banderas Antonio_Banderas Antonio_Banderas Philadelphia_(film) Jonathan_Demme Philadelphia Aaron_McKie James_Anderson Constantin_Stanislavski Philadelphia_76ers An_Actor_Prepares c1 actor <type> u2 Antonio_Banderas <starring> <spouse> u1 Melanie_Griffith c2Predicate type spouse starring type director type bornIn create type type c3Object actor Melanie_Griffith Philadelphia_(film) film Philadelphia_(film) city Philadelphia An_Actor_Prepares Basketball_team Book Philadelphia city actor play in Disambiguation be married to SPARQL GenerationWho was married to an actor that play in Philadelphia ? Generating a Semantic Query Graph <spouse> <actor> <An_Actor_Prepares> <starring> <playedForTeam> <Philadelphia> <Philadelphia(film)> <Philadelphia_76ers> ?who v1 Who be married to “that” v2 play in actor <spouse, 1.0> <playForTeam, 1.0> <actor, 1.0> <Philadelphia, 1.0> <starring, 0.9> <Philadelphia(film), 0.9> <An_Actor_Prepares, 0.9> <director, 0.5> <Philadelphia_76ers, 0.8> Philadelphia v3 Semantic Query GraphplayedForTeam Philadelphia_76ersfilm <type> u3 Philadelphia_(film) <director> u4 Jonathan_Demme <type> u5 Philadelphia <bornIn > u6 Aaron_McKie c4 Basketball_team <type>SELECT ?y WHERE { ?x starring Philadelphia_ ( film ) . ?x type actor . ?x spouse ?y. 
} Query EvaluationFinding Top-k Subgraph Matches c1 actor u2 <spouse> u1 Melanie_Griffith <type> Antonio_Banderasu7 James_Anderson<playedForTeam>u10 Philadelphia_76ers u8 c5 Book Constantin_Stanislavski <create> <type> u10 An_Actor_Prepares (a) RDF Dataset and RDF GraphSPARQL Query Engine<starring> u3 Philadelphia_(film)?y: Melanie_Griffith (b) SPARQL Generation-and-Query Framework(c) Our FrameworkFigure 1: Question Answering Over RDF Dataset to multiple semantic items (e.g., subjects, objects and predicates) in RDF graph G in the question understanding stage, and resolve the ambiguity at the time when matches of the query are found. The cost of disambiguation is saved if there are no matching found. In our problem, the key problem is how to define a “match” of question N in RDF graph G and how to find matches efficiently. Intuitively, a match is a subgraph (of RDF graph G) that can fit the semantics of question N . The formal definition of the match is given in Definition 3 (Section 2). We illustrate the intuition of our method by an example. Consider a subgraph of graph G in Figure 1(a) (the subgraph induced → by vertices u1 , u2 , u3 and c1 ). Edge − u− 2 c1 says that “Antonio Ban− − → deras is an actor”. Edge u2 u1 says that “Melanie Griffith is mar→ ried to Antonio Banderas”. Edge − u− 2 u3 says that “Antonio Banderas starred in a film Philadelphia(film) ”. The natural language question N is “Who was married to an actor that played in Philadel→ − − → phia”. Obviously, the subgraph formed by edges − u− 2 c1 , u2 u1 and − − → u2 u3 is a match of N . “Melanie Griffith” is a correct answer. On the other hand, we cannot find a match (of N ) containing Philadelphia _76ers in RDF graph G. Therefore, the phrase “Philadelphia” (in N ) cannot map to Philadelphia_76ers . This is the basic idea of our data-driven approach. Different from traditional approaches, we resolve the ambiguity problem in the query evaluation stage. A challenge of our method is how to define a “match” between a subgraph of G and a natural language question N . Because N is unstructured data and G is graph structure data, we should fill the gap between two kinds of data. Therefore, we propose a semantic query graph QS to represent the question semantics of N . We formally define QS in Definition 2. An example of QS is given in Figure 1(c), which represents the semantic of the question N . Each edge in QS denotes a semantic relation. For example, edge v1 v2 denotes that “who was married to an actor”. Intuitively, a match of question N over RDF graph G is a subgraph match of QS over G (formally defined in Definition 3). N . The coarse-grained framework is given in Figure 1(c). In the question understanding stage, we interpret a natural language question N as a semantic query graph QS (see Definition 2). Each edge in QS denotes a semantic relation extracted from N . A semantic relation is a triple rel,arg 1,arg 2 , where rel is a relation phrase, and arg 1 and arg 2 are its associated arguments. For example, “play in”,“actor”,“Philadelphia” is a semantic relation. The edge label is the relation phrase and the vertex labels are the associated arguments. In QS , two edges share one common endpoint if the two corresponding relations share one common argument. For example, there are two extracted semantic relations in N , thus, we have two edges in QS . Although they do not share any argument, arguments “actor” and “that” refer to the same thing. This phenomenon is known as “coreference resolution” [25]. 
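The following is a minimal sketch of how extracted semantic relations could be assembled into a semantic query graph QS for the running example; the hard-coded relation triples, the coreference map, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Building a semantic query graph QS from extracted semantic relations
# <rel, arg1, arg2>, merging arguments that corefer. The relation triples
# and the coreference map are hard-coded for the running example rather
# than produced by a real extractor.

relations = [
    ("be married to", "who", "actor"),
    ("play in", "that", "Philadelphia"),
]
coref = {"that": "actor"}  # "that" refers back to "actor"

vertices = {}  # argument phrase -> vertex id
edges = []     # (vertex id, vertex id, relation phrase)

def vertex_of(arg):
    arg = coref.get(arg, arg)          # resolve coreference first
    if arg not in vertices:
        vertices[arg] = len(vertices) + 1  # v1, v2, ...
    return vertices[arg]

for rel, arg1, arg2 in relations:
    edges.append((vertex_of(arg1), vertex_of(arg2), rel))

print(vertices)  # {'who': 1, 'actor': 2, 'Philadelphia': 3}
print(edges)     # [(1, 2, 'be married to'), (2, 3, 'play in')]
```

The two relations end up sharing the vertex for "actor"/"that", giving the three-vertex, two-edge graph QS shown in Figure 2; each vertex and edge label is later expanded into its ranked candidate list Cvi or Cvivj.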
The phrases in edges and vertices of QS can map to multiple semantic items (such as entities, classes and predicates) in RDF graph G. We allow the ambiguity in this stage. For example, the relation phrase “play in” (in edge v2 v3 ) corresponds to three different predicates. The argument “Philadelphia” in v3 also maps to three different entities, as shown in Figure 1(c). In the query evaluation stage, we find subgraph matches of QS over RDF graph G. For each subgraph match, we define its matching score (see Definition 6) that is based on the semantic similarity of the matching vertices and edge in QS and the subgraph match in G. We find the top-k subgraph matches with the largest scores. For example, the subgraph induced by u1 , u2 and u3 matches query QS , as shown in Figure 1(c). u2 matches v2 (“actor”), since u2 ( Antonio_Banderas ) is a type-constraint entity and u2 ’s type is actor . u3 ( Philadelphia(film) ) matches v3 (“Philadelphia”) and u1 ( Melanie_Griffith ) matches v1 (“who”). The result to question N is Melanie_Griffith . Also based on the subgraph match query, we cannot find a subgraph containing u10 ( Philadelphia_76ers ) to match QS . It means that the mapping from “Philadelphia” to u10 is a false alarm. We deal with disambiguation in query evaluation based on the matching result. Pushing down disambiguation to the query evaluation stage not only improves the precision but also speeds up the whole query response time. Take the up-to-date DEANNA [20] as an example. DEANNA [29] proposes a joint disambiguation technique. It mod-1.2Our ApproachAlthough there are still two stages “question understanding” and “query evaluation” in our method, we do not adopt the existing framework, i.e., SPARQL generation-and-evaluation. We propose a graph data-driven solution to answer a natural language question314Table 1: NotationsNotation G(V, E ) N Q Y D T rel vi /ui Cvi /Cvi vj d Definition and Description RDF graph and vertex and edge sets A natural language question A SPARQL query The dependency tree of qN L The paraphrase dictionary A relation phrase dictionary A relation phrase A vertex in query graph/RDF graph Candidate mappings of vertex vi /edge vi vj Candidate mappings of vertex vi /edge vi vjĂĂFigure 3: Paraphrase Dictionary D Philadelphia(film) and Philadelphia_76ers . We need to know which one is users’ concern. In order to address the first challenge, we extract the semantic relations (Definition 1) implied by the question N , based on which, we build a semantic query graph QS (Definition 2) to model the query intention in N . D EFINITION 1. (Semantic Relation). A semantic relation is a triple rel, arg 1, arg 2 , where rel is a relation phrase in the paraphrase dictionary D, arg 1 and arg 2 are the two argument phrases. In the running example, “be married to”, “who”,“actor” is a semantic relation, in which “be married to” is a relation phrase, “who” and “actor” are its associated arguments. We can also find another semantic relation “play in”, “that”,“Philadelphia” in N . D EFINITION 2. (Semantic Query Graph) A semantic query graph is denoted as QS , in which each vertex vi is associated with an argument and each edge vi vj is associated with a relation phrase, 1 ≤ i, j ≤ |V (QS )| . Actually, each edge in QS together with the two endpoints represents a semantic relation. We build a semantic query graph QS as follows. We extract all semantic relations in N , each of which corresponds to an edge in QS . 
If the two semantic relations have one common argument, they share one endpoint in QS . In the running example, we get two semantic relations, i.e., “be married to”, “who”,“actor” and “play in”, “that”,“Philadelphia” , as shown in Figure 2. Although they do not share any argument, arguments “actor” and “that” refer to the same thing. This phenomenon is known as “coreference resolution” [25]. Therefore, the two edges also share one common vertex in QS (see Figure 2(c)). We will discuss more technical issues in Section 4.1. To deal with the ambiguity issue (the second challenge), we propose a data-driven approach. The basic idea is: for a candidate mapping from a phrase in N to an entity (i.e., vertex) in RDF graph G, if we can find the subgraph containing the entity that fits the query intention in N , the candidate mapping is correct; otherwise, this is a false positive mapping. To enable this, we combine the disambiguation with the query evaluation in a single step. For example, although “Philadelphia” can map three different entities, in the query evaluation stage, we can only find a subgraph containing Philadelphia_film that matches the semantic query graph QS . Note that QS is a structural representation of the query intention in N . The match is based on the subgraph isomorphism between QS and RDF graph G. The formal definition of match is given in Definition 3. For the running example, we cannot find any subgraph match containing Philadelphia or Philadelphia_76ers of QS . The answer to question N is “Melanie_Griffith” according to the resulting subgraph match. Generally speaking, there are offline and online phases in our solution.els the disambiguation as an ILP (integer liner programming) problem, which is an NP-hard problem. To enable the disambiguation, DEANNA needs to build a disambiguation graph. Some phrases in the natural language question map to some candidate entities or predicates in RDF graph as vertices. In order to introduce the edges in the disambiguation graph, DEANNA needs to compute the pairwise similarity and semantic coherence between every two candidates on the fly. It is very costly. However, our method avoids the complex disambiguation algorithms, and combines the query evaluation and the disambiguation in a single step. We can speed up the whole performance greatly. In a nutshell, we make the following contributions in this paper. 1. We propose a systematic framework (see Section 2) to answer natural language questions over RDF repositories from a graph data-driven perspective. To address the ambiguity issue, different from existing methods, we combine the query evaluation and the disambiguation in a single step, which not only improves the precision but also speed up query processing time greatly. 2. In the offline processing, we propose a graph mining algorithm to map natural language phrases to top-k possible predicates (in a RDF dataset) to form a paraphrase dictionary D, which is used for question understanding in RDF Q/A. 3. In the online processing, we adopt two-stage approach. In the query understanding stage, we propose a semantic query graph QS to represent the users’ query intention and allow the ambiguity of phrases. Then, we reduce RDF Q/A into finding subgraph matches of QS over RDF graph G in the query evaluation stage. We resolve the ambiguity at the time when matches of the query are found. The cost of disambiguation is saved if there are no matching found. 4. 
We conduct extensive experiments over several real RDF datasets (including QALD benchmark) and compare our system with some state-of-the-art systems. Experiment results show that our solution is not only more effective but also more efficient.2.FRAMEWORKThe problem to be addressed in this paper is to find the answers to a natural language question N over an RDF graph G. Table 1 lists the notations used throughout this paper. There are two key challenges in this problem. The first one is how to represent the query intention of the natural language question N in a structural way. The underlying RDF repository is a graph structured data, but, the natural language question N is unstructured data. To enable query processing, we need a graph representation of N . The second one is how to address the ambiguity of natural language phrases in N . In the running example, “Philadelphia” in the question N may refer to different entities, such as315(a) Natural Language Question1 Who actor 23 1 2 3(b) Semantic Relations Extraction and Building Semantic Query Graph3 5 2?who <spouse, 1.0> 1 <actor, 1.0> 7469 8An_Actor_Prepares5 10Figure 2: Natural Language Question Answering over Large RDF Graphs2.1OfflineTo enable the semantic relation extraction from N , we build a paraphrase dictionary D, which records the semantic equivalence between relation phrases and predicates. For example, in the running example, natural language phrases “be married to” and “play in” have the similar semantics with predicates spouse and starring , respectively. Some existing systems, such as Patty [18] and ReVerb [10], provide a rich relation phrase dataset. For each relation phrase, they also provide a support set with entity pairs, such as ( Antonio_Banderas , Philadelphia(film) ) for the relation phrase “play in”. Table 2 shows two sample relation phrases and their supporting entity pairs. The intuition of our method is as follows: for each relation phrase reli , let Sup(reli ) denotes a set of supporting entity pairs. We assume that these entity pairs also occur in RDF graph. Experiments show that more than 67% entity pairs in the Patty relation phrase dataset occur in DBpedia RDF graph. The frequent predicates (or predicate paths) connecting the entity pairs in Sup(reli ) have the semantic equivalence with the relation phrase reli . Based on this idea, we propose a graph mining algorithm to find the semantic equivalence between relation phrases and predicates (or predicate paths).2.2OnlineThere are two stages in RDF Q/A: question understanding and query evaluation. 1) Question Understanding. The goal of the question understanding in our method is to build a semantic query graph QS for representing users’ query intention in N . We first apply Stanford Parser to N to obtain the dependency tree Y of N . Then, we extract the semantic relations from Y based on the paraphrase dictionary D. The basic idea is to find a minimum subtree (of Y ) that contains all words of rel, where rel is a relation phrase in D. The subtree is called an embedding of rel in Y . Based on the embedding position in Y , we also find the associated arguments according to some linguistics rules. The relation phrase rel together with the two associated arguments form a semantic relation, denoted as a triple rel,arg 1,arg 2 . Finally, we build a semantic query graphQS by connecting these semantic relations. We will discuss more technical issues in Section 4.1. 2) Query Evaluation. 
2) Query Evaluation. As mentioned earlier, a semantic query graph QS is a structural representation of N. In order to answer N, we need to find a subgraph (in RDF graph G) that matches QS. The match is defined according to subgraph isomorphism (formally defined in Definition 3).

First, each argument in a vertex vi of QS is mapped to some entities or classes in the RDF graph. Given an argument argi (in vertex vi of QS) and an RDF graph G, entity linking [31] retrieves all entities and classes (in G) that possibly correspond to argi, denoted as Cvi. Each item in Cvi is associated with a confidence probability. In Figure 2, argument “Philadelphia” is mapped to three different entities, Philadelphia, Philadelphia(film) and Philadelphia_76ers, while argument “actor” is mapped to a class Actor and an entity An_Actor_Prepares. We can distinguish a class vertex from an entity vertex according to RDF syntax: if a vertex has an incoming adjacent edge with predicate rdf:type or rdf:subclass, it is a class vertex; otherwise, it is an entity vertex. Furthermore, if an argument is a wh-word, we assume that it can match all entities and classes in G. Therefore, each vertex vi in QS has a ranked list Cvi containing candidate entities or classes. Each relation phrase relvivj (on edge vivj of QS) is mapped to a list of candidate predicates and predicate paths, denoted as Cvivj. The candidates in the list are ranked by their confidence probabilities. It is important to note that we do not resolve the ambiguity issue in this step. For example, we allow “Philadelphia” to map to three possible entities, Philadelphia_76ers, Philadelphia and Philadelphia(film). We push the disambiguation down to the query evaluation step.

Second, a subgraph in the RDF graph matches QS if and only if its structure is isomorphic to QS. We have the following definition of a match.

DEFINITION 3. (Match) Consider a semantic query graph QS with n vertices {v1, ..., vn}. Each vertex vi has a candidate list Cvi, i = 1, ..., n. Each edge vivj also has a candidate list Cvivj, where 1 ≤ i ≠ j ≤ n. A subgraph M containing n vertices {u1, ..., un} in RDF graph G is a match of QS if and only if the following conditions hold:
1. If vi is mapped to an entity ui, i = 1, ..., n, then ui must be in list Cvi; and
2. If vi is mapped to a class ci, i = 1, ..., n, then ui is an entity whose type is ci (i.e., there is a triple ⟨ui, rdf:type, ci⟩ in the RDF graph) and ci must be in Cvi; and
3. For each edge vivj ∈ QS, either ui→uj ∈ G or uj→ui ∈ G; furthermore, the predicate Pij associated with ui→uj (or uj→ui) must be in Cvivj, 1 ≤ i, j ≤ n.

Each subgraph match has a score, which is derived from the confidence probabilities of each edge and vertex mapping. Definition 6 defines the score, which we will discuss later. Our goal is to find all subgraph matches with the top-k scores. A TA-style algorithm [11] is proposed in Section 4.2.2 to address this issue. Each subgraph match of QS implies an answer to the natural language question N; meanwhile, the ambiguity is resolved.
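To make the conditions of Definition 3 concrete, here is a minimal sketch of the match test for a single candidate assignment. It assumes the RDF graph is given as a set of (subject, predicate, object) triples and the candidate lists as plain dicts; it is only an illustration, not the TA-style top-k algorithm of Section 4.2.2.

def is_match(assignment, qs_edges, rdf_triples, C_vertex, C_edge, rdf_types):
    """Check Definition 3 for one candidate assignment.

    assignment:  dict vertex_of_QS -> entity u_i in the RDF graph
    qs_edges:    dict (v_i, v_j) -> relation phrase on that edge of Q_S
    rdf_triples: set of (subject, predicate, object) triples of G
    C_vertex:    dict v_i -> set of candidate entities/classes C_{v_i}
    C_edge:      dict (v_i, v_j) -> set of candidate predicates C_{v_i v_j}
    rdf_types:   dict entity -> set of its classes (from rdf:type triples)
    """
    # Conditions 1 and 2: every u_i is a candidate of v_i, either directly
    # (entity candidate) or via one of its classes (class candidate).
    for v, u in assignment.items():
        if u not in C_vertex[v] and not (rdf_types.get(u, set()) & C_vertex[v]):
            return False

    # Condition 3: every edge of Q_S is backed by a triple of G (in either
    # direction) whose predicate is a candidate of that edge.
    for (vi, vj) in qs_edges:
        ui, uj = assignment[vi], assignment[vj]
        ok = any(
            (s, o) in ((ui, uj), (uj, ui)) and p in C_edge[(vi, vj)]
            for (s, p, o) in rdf_triples
        )
        if not ok:
            return False
    return True

A real implementation would not test assignments one by one but enumerate them from the ranked candidate lists, which is exactly what the top-k evaluation stage is for.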
For example, in Figure 2, although “Philadelphia” can map to three different entities, in the query evaluation stage we can only find a subgraph (formed by vertices u1, u2, u3 and c1 in G) containing Philadelphia(film) that matches the semantic query graph QS. According to the subgraph match, we know that the result is “Melanie_Griffith”; meanwhile, the ambiguity is resolved. Mapping the phrase “Philadelphia” to Philadelphia or Philadelphia_76ers is a false positive for the question N, since there is no data to support it.

[Figure 4: Mapping Relation Phrases to Predicates or Predicate Paths; panel (a) shows predicate paths for “play in” and panel (b) for “uncle of” between supporting entity pairs in the RDF graph.]

Although mapping these relation phrases into canonicalized representations is a core challenge in relation extraction [17], none of the prior approaches considers mapping a relation phrase to a sequence of consecutive predicate edges in the RDF graph. The Patty demo [17] only finds the equivalence between a relation phrase and a single predicate. However, some relation phrases cannot be interpreted as a single predicate. For example, “uncle of” corresponds to a length-3 predicate path in RDF graph G, as shown in Figure 3. In order to address this issue, we propose the following approach. Given a relation phrase reli, its corresponding support set of entity pairs that occur in the RDF graph is denoted as Sup(reli) = {(vi1, vi1′), ..., (vim, vim′)}. Considering each pair (vij, vij′), j = 1, ..., m, we find all simple paths between vij and vij′ in RDF graph G, denoted as Path(vij, vij′). Let PS(reli) = ∪j=1,...,m Path(vij, vij′). For example, given the entity pair (Ted_Kennedy, John_F._Kennedy,_Jr.), we locate the two entities in RDF graph G and find the simple paths between them (as shown in Figure 4). If a path L is frequent in PS(“uncle of”), L is a good candidate to represent the semantics of the relation phrase “uncle of”. For efficiency considerations, we only find simple paths no longer than a threshold¹. We adopt a bi-directional BFS (breadth-first search) from vertices vij and vij′ to find Path(vij, vij′). Note that we ignore edge directions (in the RDF graph) during the BFS. For each relation phrase reli with m supporting entity pairs, we thus have the collection of all path sets, PS(reli) = ∪j=1,...,m Path(vij, vij′). Intuitively, if a predicate path is frequent in PS(reli), it is a good candidate that has semantic equivalence with relation phrase reli.

However, this simple intuition may introduce noise. For example, we find that (hasGender, hasGender) is the most frequent predicate path in PS(“uncle of”) (as shown in Figure 4). Obviously, it is not a good predicate path to represent the semantics of the relation phrase “uncle of”. In order to eliminate such noise, we borrow the intuition of the tf-idf measure [15]. Although (hasGender, hasGender) is frequent in PS(“uncle of”), it is also frequent in the path sets of other relation phrases, such as PS(“is parent of”), PS(“is advisor of”) and so on. Thus, (hasGender, hasGender) is not an important feature for PS(“uncle of”). This is exactly analogous to measuring the importance of a word w with regard to a document: if a word w is frequent in lots of documents in a corpus, it is not a good feature. A word has a high tf-idf, a numerical statistic measuring how important a word is to a document in a corpus, if it occurs frequently in the document but rarely in the whole corpus.
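Returning to the path-collection step above, a small sketch may help picture it: an undirected breadth-first enumeration of simple predicate paths between a supporting entity pair, cut off at a length threshold. This is only an illustration of the idea (the paper uses a bi-directional BFS for efficiency), and the toy triples below, written in the spirit of Figure 4(b), are assumptions.

from collections import defaultdict, deque

def predicate_paths(triples, source, target, max_len=4):
    """Collect predicate paths (tuples of predicates) between two entities,
    ignoring edge direction, visiting each vertex at most once (simple
    paths), and stopping at max_len edges."""
    # Undirected adjacency: neighbor -> predicate used to reach it.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((o, p))
        adj[o].append((s, p))

    paths = []
    queue = deque([(source, (), frozenset([source]))])
    while queue:
        v, preds, visited = queue.popleft()        # breadth-first order
        if v == target and preds:
            paths.append(preds)
            continue
        if len(preds) == max_len:
            continue
        for nxt, p in adj[v]:
            if nxt not in visited:
                queue.append((nxt, preds + (p,), visited | {nxt}))
    return paths

# Finds both the (hasGender, hasGender) noise path and the length-3
# hasChild path that actually encodes "uncle of".
g = [("Joseph_P._Kennedy,_Sr.", "hasChild", "Ted_Kennedy"),
     ("Joseph_P._Kennedy,_Sr.", "hasChild", "John_F._Kennedy"),
     ("John_F._Kennedy", "hasChild", "John_F._Kennedy,_Jr."),
     ("Ted_Kennedy", "hasGender", "Male"),
     ("John_F._Kennedy,_Jr.", "hasGender", "Male")]
print(predicate_paths(g, "Ted_Kennedy", "John_F._Kennedy,_Jr."))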
In our problem, for each relation phrase reli, i = 1, ..., n, we deem PS(reli) as a virtual document. All predicate paths in PS(reli) are regarded as virtual words. The corpus contains all PS(reli), i = 1, ..., n. Formally, we define the tf-idf value of a predicate path L in the following definition. Note that if L is a length-1 predicate path, L is a predicate P. (¹ We set the threshold to 4 in our experiments. More details about the parameter setting will be discussed in Section 6.)

3. OFFLINE

The semantic relation extraction relies on a paraphrase dictionary D. A relation phrase is a surface string that occurs between a pair of entities in a sentence [17], such as “be married to” and “play in” in the running example. We need to build a paraphrase dictionary D, such as the one in Figure 3, to map relation phrases to candidate predicates or predicate paths. Table 2 shows two sample relation phrases and their supporting entity pairs.

Table 2: Relation Phrases and Supporting Entity Pairs
Relation Phrase    Supporting Entity Pairs
“play in”          (Antonio_Banderas, Philadelphia(film)), (Julia_Roberts, Runaway_Bride), ...
“uncle of”         (Ted_Kennedy, John_F._Kennedy,_Jr.), (Peter_Corr, Jim_Corr), ...

In this paper, we do not discuss how to extract relation phrases along with their corresponding entity pairs. A large body of NLP literature on relation extraction studies this problem, e.g., Patty [18] and ReVerb [10]. For example, Patty [18] utilizes the dependency structure of sentences and ReVerb [10] adopts n-grams to find relation phrases and the corresponding support sets. In this work, we assume that the relation phrases and their support sets are given. The task in the offline processing is to find the semantic equivalence between relation phrases and the corresponding predicates (and predicate paths) in RDF graphs, i.e., to build a paraphrase dictionary D like the one in Figure 3. Suppose that we have a dictionary T = {rel1, ..., reln}, where each reli is a relation phrase, i = 1, ..., n. Each reli has a support set of entity pairs that occur in the RDF graph, i.e., Sup(reli) = {(vi1, vi1′), ..., (vim, vim′)}. For each reli, i = 1, ..., n, the goal is to mine the top-k predicates or predicate paths (formed by consecutive predicate edges in the RDF graph) that have semantic equivalence with relation phrase reli.
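The tf-idf analogy above can be sketched directly: each PS(reli) is a bag of predicate paths, and a path scores high for reli if it is frequent there but rare across the path sets of other relation phrases. The exact weighting used in the paper's definition is not reproduced here; the standard tf-idf formulation and the toy path sets below are stand-in assumptions.

import math
from collections import Counter

def path_tfidf(path_sets):
    """path_sets: dict rel_phrase -> list of predicate paths (tuples), i.e.
    the multiset PS(rel_i) treated as a virtual document of virtual words.
    Returns dict rel_phrase -> dict path -> tf-idf score."""
    n_docs = len(path_sets)
    # Document frequency: in how many PS(rel_i) does each path occur?
    df = Counter()
    for paths in path_sets.values():
        df.update(set(paths))

    scores = {}
    for rel, paths in path_sets.items():
        tf = Counter(paths)
        total = len(paths)
        scores[rel] = {
            path: (count / total) * math.log(n_docs / df[path])
            for path, count in tf.items()
        }
    return scores

# Toy example: the hasChild^3 path is specific to "uncle of", while
# (hasGender, hasGender) shows up for many relation phrases and is
# therefore down-weighted.
ps = {
    "uncle of":      [("hasChild", "hasChild", "hasChild")] * 3 + [("hasGender", "hasGender")] * 5,
    "is parent of":  [("hasChild",)] * 4 + [("hasGender", "hasGender")] * 4,
    "is advisor of": [("advisor",)] * 2 + [("hasGender", "hasGender")] * 3,
}
top = max(path_tfidf(ps)["uncle of"].items(), key=lambda kv: kv[1])
print(top)   # the hasChild^3 path wins, not the hasGender path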


Searching the Web Using Composed Pages
Ramakrishna Varadarajan, Vagelis Hristidis, Tao Li
Florida International University
{ramakrishna,vagelis,taoli}@
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]
General Terms: Algorithms, Performance
Keywords: Composed pages

1. INTRODUCTION

Given a user keyword query, current Web search engines return a list of pages ranked by their “goodness” with respect to the query. However, this technique misses results whose contents are distributed across multiple physical pages and are connected via hyperlinks and frames [3]. That is, it is often the case that no single page contains all query keywords. Li et al. [3] make a first step towards this problem by returning a tree of hyperlinked pages that collectively contain all query keywords. The limitation of this approach is that it operates at the page-level granularity, which ignores the specific context where the keywords are found within the pages. More importantly, it is cumbersome for the user to locate the most desirable tree of pages due to the amount of data in each page tree and the large number of page trees.

We propose a technique called composed pages that, given a keyword query, generates new pages containing all query keywords on-the-fly. We view a web page as a set of interconnected text fragments. The composed pages are generated by stitching together appropriate fragments from hyperlinked Web pages, and retain links to the original Web pages. To rank the composed pages we consider both the hyperlink structure of the original pages and the associations between the fragments within each page. In addition, we propose heuristic algorithms to efficiently generate the top composed pages. Experiments are conducted to empirically evaluate the effectiveness of the proposed algorithms. In summary, our contributions are as follows: (i) we introduce composed pages to improve the quality of search; composed pages are designed in a way that they can be viewed as a regular page but also describe the structure of the original pages and have links back to them; (ii) we rank the composed pages based on both the hyperlink structure of the original pages and the associations between the text fragments within each page; and (iii) we propose efficient heuristic algorithms to compute top composed pages using the uniformity factor. The effectiveness of these algorithms is shown and evaluated experimentally.

2. FRAMEWORK

Let D = {d1, d2, …, dn} be a set of web pages. Also let size(di) be the length of di in number of words. The term frequency tf(d, w) of term (word) w in a web page d is the number of occurrences of w in d. The inverse document frequency idf(w) is the inverse of the number of web pages containing term w. The web graph GW(VW, EW) of a set of web pages d1, d2, …, dn is defined as follows: a node vi ∈ VW is created for each web page di in D, and an edge e(vi, vj) ∈ EW is added between nodes vi, vj ∈ VW if there is a hyperlink between vi and vj. Figure 1 shows a web graph. The hyperlinks between pages are depicted in the web graph as edges. The nodes in the graph represent the web pages, and inside those nodes the text fragments into which each web page has been split (using HTML tag parsing) are displayed (see [5]).

In contrast to previous works on web search [3, 4], we go beyond the page granularity. To do so, we view each page as a set of text fragments connected through semantic associations.
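Before the page graph is defined next, a minimal sketch of the structures introduced so far may help. The HTML-fragment splitting is abstracted away and the pages are plain strings; both are assumptions for illustration only.

import math
from collections import Counter

def build_web_graph(pages, links):
    """pages: dict page_id -> text; links: set of (page_id, page_id) hyperlinks.
    Returns the web graph as (nodes, undirected edges) plus tf and idf tables."""
    nodes = list(pages)
    edges = {frozenset((u, v)) for u, v in links if u in pages and v in pages}

    # tf(d, w): occurrences of w in d;  idf(w): inverse of #pages containing w.
    tf = {d: Counter(text.lower().split()) for d, text in pages.items()}
    doc_freq = Counter(w for counts in tf.values() for w in counts)
    idf = {w: 1.0 / n for w, n in doc_freq.items()}
    return nodes, edges, tf, idf

pages = {
    "p1": "graduate research scholarships and admissions",
    "p2": "research groups and graduate courses",
}
nodes, edges, tf, idf = build_web_graph(pages, {("p1", "p2")})
print(tf["p1"]["research"], round(idf["research"], 2))   # 1 0.5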
The page graph GD(VD, ED) of a web page d is defined as follows: (a) d is split into a set of non-overlapping text fragments t(v), each corresponding to a node v ∈ VD; (b) an edge e(u, v) ∈ ED is added between nodes u, v ∈ VD if there is an association between t(u) and t(v) in d. Figure 2 shows the page graph for Page 1 of Figure 1. As denoted in Figure 1, Page 1 is split into 7 text fragments, each represented by a node. An edge between two nodes denotes a semantic association; higher weights denote greater association. In this work, nodes and edges of the page graph are assigned weights using both query-dependent and query-independent factors (see [5]). The semantic association between the nodes is used to compute the edge weights (query-independent), while the relevance of a node to the query is used to define the node weight (query-dependent).

A keyword query Q is a set of keywords Q = {w1, …, wm}. A search result of a keyword query is a subtree of the web graph, consisting of pages d1, …, dl, where a subtree si of the page graph GDi of di is associated with each di. A result is total (all query keywords are contained in the text fragments) and minimal (removing any text fragment causes a query keyword to be missed). For example, Table 1 shows the top-3 search results for the query “Graduate Research Scholarships” on the web graph of Figure 1.

3. RANKING SEARCH RESULTS

Problem 1 (Find Top-k Search Results). Given a web graph GW, the page graphs GD for all pages in GW, and a keyword query Q, find the k search results R with maximum Score(R).

The computation of Score(R) is based on the following principles. First, search results R involving fewer pages are ranked higher [3]. Second, the scores of the subtrees of the page graphs of the constituting pages are combined using a monotonic aggregate function to compute the score of the search result.

[Figure 2: A page graph of Page 1 of Figure 1.]
[Table 1: Top-3 search results for the query “Graduate Research Scholarships”.]

A modification of the expanding search algorithm of [1] is used, where a heuristic value combining the Information Retrieval (IR) score, the PageRank score [4], and the inverse of the uniformity factor (uf) of a page is used to determine the next expansion page. The uf is high for pages that focus on a single topic or a few topics, and low for pages with many topics. The uf is computed using the edge weights of the page graph of a page (high average edge weights imply high uf). The intuition behind expanding according to the inverse uf is that, among pages with similar IR scores, pages with low uf are more likely to contain a short, focused text fragment relevant to the query keywords. Figure 3 shows the quality of the results of our heuristic search vs. the quality of the results of the non-heuristic expanding search [1] (a random page is chosen for expansion since hyperlinks are un-weighted), compared to the optimal exhaustive search. The modified Spearman’s rho metric [2] is used to compare two top-k lists.

[Figure 3: Quality Experiments using Spearman’s rho.]

4. REFERENCES
[1] G. Bhalotia, C. Nakhe, A. Hulgeri, S. Chakrabarti and S. Sudarshan: Keyword Searching and Browsing in Databases using BANKS. ICDE, 2002.
[2] Ronald Fagin, Ravi Kumar, and D. Sivakumar: Comparing top-k lists. SODA, 2003.
[3] W.S. Li, K. S. Candan, Q. Vu and D. Agrawal: Retrieving and Organizing Web Pages by "Information Unit". WWW, 2001.
[4] L. Page, S. Brin, R. Motwani, and T.
Winograd: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.
[5] R. Varadarajan, V. Hristidis: Structure-Based Query-Specific Document Summarization. CIKM, 2005.
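The expansion heuristic described in Section 3 can be read as a priority computation over candidate pages. The sketch below is one plausible way to combine the three signals (IR score, PageRank, inverse uniformity factor); the averaging of edge weights and the product form are assumptions for illustration, not the paper's exact formulas.

def uniformity_factor(page_graph_edge_weights):
    """uf is high when the page's fragments are strongly associated,
    i.e. the page focuses on one or a few topics (assumed: mean weight)."""
    w = list(page_graph_edge_weights)
    return sum(w) / len(w) if w else 0.0

def expansion_priority(ir_score, pagerank, uf, eps=1e-6):
    """Heuristic value used to pick the next page to expand: prefer pages
    with good IR score and PageRank but low uniformity factor, since a
    low-uf page is more likely to hold a short focused fragment."""
    return ir_score * pagerank * (1.0 / (uf + eps))

# Two candidate pages with similar IR scores: the less uniform one wins.
p_focused = expansion_priority(ir_score=0.8, pagerank=0.3, uf=uniformity_factor([0.9, 0.8]))
p_mixed   = expansion_priority(ir_score=0.8, pagerank=0.3, uf=uniformity_factor([0.2, 0.3, 0.1]))
print(p_mixed > p_focused)   # True: the multi-topic page is expanded first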

Consistency and Stability of Wide-Area Storage: The COPS Theory

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky†, and David G. Andersen‡
Princeton University, † Intel Labs, ‡ Carnegie Mellon University
To appear in Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11)
Categories and Subject Descriptors
C.2.4 [Computer Systems Organization]: Distributed Systems
General Terms
Design, Experimentation, Performance
1. INTRODUCTION
Distributed data stores are a fundamental building block of modern Internet services. Ideally, these data stores would be strongly consistent, always available for reads and writes, and able to continue operating during network partitions. The CAP Theorem, unfortunately, proves it impossible to create a system that achieves all three [13, 23]. Instead, modern web services have chosen overwhelmingly to embrace availability and partition tolerance at the cost of strong consistency [16, 20, 30]. This is perhaps not surprising, given that this choice also enables these systems to provide low latency for client operations and high scalability. Further, many of the earlier high-scale Internet services, typically focusing on web search, saw little reason for stronger consistency, although this position is changing with the rise of interactive services such as social networking applications [46]. We refer to systems with these four properties—Availability, low Latency, Partition-tolerance, and high Scalability—as ALPS systems. Given that ALPS systems must sacrifice strong consistency (i.e., linearizability), we seek the strongest consistency model that is achievable under these constraints. Stronger consistency is desirable because it makes systems easier for a programmer to reason about. In this paper, we consider causal consistency with convergent conflict handling, which we refer to as causal+ consistency. Many previous systems believed to implement the weaker causal consistency [10, 41] actually implement the more useful causal+ consistency, though none do so in a scalable manner. The causal component of causal+ consistency ensures that the data store respects the causal dependencies between operations [31]. Consider a scenario where a user uploads a picture to a web site, the picture is saved, and then a reference to it is added to that user’s album. The reference “depends on” the picture being saved. Under causal+ consistency, these dependencies are always satisfied. Programmers never have to deal with the situation where they can get the reference to the picture but not the picture itself, unlike in systems with weaker guarantees, such as eventual consistency. The convergent conflict handling component of causal+ consistency ensures that replicas never permanently diverge and that conflicting updates to the same key are dealt with identically at all sites. When combined with causal consistency, this property ensures that clients see only progressively newer versions of keys. In comparison, eventually consistent systems may expose versions out of order. By combining causal consistency and convergent conflict handling, causal+ consistency ensures clients see a causally-correct, conflict-free, and always-progressing data store. Our COPS system (Clusters of Order-Preserving Servers) provides causal+ consistency and is designed to support complex online applications that are hosted from a small number of large-scale datacenters, each of which is composed of front-end servers (clients of
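The photo-then-album example above is exactly the kind of dependency a causal+ store has to respect. As a rough illustration only (not COPS's actual API or protocol), a client library could track, per session, the versions it has read or written and attach them as explicit dependencies to each write, so that a remote replica applies the album update only after the photo it depends on is visible. The store interface and names below are assumptions.

import itertools

class LocalStore:
    """Minimal in-memory stand-in for one local cluster."""
    def __init__(self):
        self.data = {}                       # key -> (version, value, deps)
        self._clock = itertools.count(1)

    def write(self, key, value, dependencies):
        version = next(self._clock)
        self.data[key] = (version, value, dependencies)
        return version

    def read(self, key):
        version, value, _ = self.data[key]
        return version, value

class CausalClient:
    """Toy client-side dependency tracking in the spirit of causal+
    consistency; structure is illustrative, not the COPS interface."""
    def __init__(self, store):
        self.store = store
        self.deps = {}                       # key -> version seen in this session

    def put(self, key, value):
        # The new write causally depends on everything read or written so far.
        version = self.store.write(key, value, dependencies=dict(self.deps))
        self.deps[key] = version
        return version

    def get(self, key):
        version, value = self.store.read(key)
        self.deps[key] = version             # reads join the causal context
        return value

# Save the picture, then add a reference to it: the album write carries an
# explicit dependency on the photo's version, so a replica can delay applying
# it until the photo itself is visible.
c = CausalClient(LocalStore())
c.put("photo:123", "...jpeg bytes...")
c.put("album:alice", ["photo:123"])
print(c.store.data["album:alice"][2])        # {'photo:123': 1}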

How to Read a Paper


How to Read a PaperS.KeshavDavid R.Cheriton School of Computer Science,University of WaterlooWaterloo,ON,Canadakeshav@uwaterloo.caABSTRACTResearchers spend a great deal of time reading research pa-pers.However,this skill is rarely taught,leading to much wasted effort.This article outlines a practical and efficient three-pass method for reading research papers.I also de-scribe how to use this method to do a literature survey. Categories and Subject Descriptors:A.1[Introductory and Survey]General Terms:Documentation.Keywords:Paper,Reading,Hints.1.INTRODUCTIONResearchers must read papers for several reasons:to re-view them for a conference or a class,to keep current in theirfield,or for a literature survey of a newfield.A typi-cal researcher will likely spend hundreds of hours every year reading papers.Learning to efficiently read a paper is a critical but rarely taught skill.Beginning graduate students,therefore,must learn on their own using trial and error.Students waste much effort in the process and are frequently driven to frus-tration.For many years I have used a simple approach to efficiently read papers.This paper describes the‘three-pass’approach and its use in doing a literature survey.2.THE THREE-PASS APPROACHThe key idea is that you should read the paper in up to three passes,instead of starting at the beginning and plow-ing your way to the end.Each pass accomplishes specific goals and builds upon the previous pass:The first pass gives you a general idea about the paper.The second pass lets you grasp the paper’s content,but not its details.The third pass helps you understand the paper in depth.2.1Thefirst passThefirst pass is a quick scan to get a bird’s-eye view of the paper.You can also decide whether you need to do any more passes.This pass should take aboutfive to ten minutes and consists of the following steps:1.Carefully read the title,abstract,and introduction2.Read the section and sub-section headings,but ignoreeverything else3.Read the conclusions4.Glance over the references,mentally ticking offtheones you’ve already readAt the end of thefirst pass,you should be able to answer thefive Cs:1.Category:What type of paper is this?A measure-ment paper?An analysis of an existing system?A description of a research prototype?2.Context:Which other papers is it related to?Whichtheoretical bases were used to analyze the problem?3.Correctness:Do the assumptions appear to be valid?4.Contributions:What are the paper’s main contribu-tions?5.Clarity:Is the paper well written?Using this information,you may choose not to read fur-ther.This could be because the paper doesn’t interest you, or you don’t know enough about the area to understand the paper,or that the authors make invalid assumptions.The first pass is adequate for papers that aren’t in your research area,but may someday prove relevant.Incidentally,when you write a paper,you can expect most reviewers(and readers)to make only one pass over it.Take care to choose coherent section and sub-section titles and to write concise and comprehensive abstracts.If a reviewer cannot understand the gist after one pass,the paper will likely be rejected;if a reader cannot understand the high-lights of the paper afterfive minutes,the paper will likely never be read.2.2The second passIn the second pass,read the paper with greater care,but ignore details such as proofs.It helps to jot down the key points,or to make comments in the margins,as you read.1.Look carefully at thefigures,diagrams and other illus-trations in the paper.Pay special attention 
to graphs.Are the axes properly labeled?Are results shown with error bars,so that conclusions are statistically sig-nificant?Common mistakes like these will separate rushed,shoddy work from the truly excellent.2.Remember to mark relevant unread references for fur-ther reading(this is a good way to learn more about the background of the paper).The second pass should take up to an hour.After this pass,you should be able to grasp the content of the paper. You should be able to summarize the main thrust of the pa-per,with supporting evidence,to someone else.This level of detail is appropriate for a paper in which you are interested, but does not lie in your research speciality.Sometimes you won’t understand a paper even at the end of the second pass.This may be because the subject matter is new to you,with unfamiliar terminology and acronyms. Or the authors may use a proof or experimental technique that you don’t understand,so that the bulk of the pa-per is incomprehensible.The paper may be poorly written with unsubstantiated assertions and numerous forward ref-erences.Or it could just be that it’s late at night and you’re tired.You can now choose to:(a)set the paper aside,hoping you don’t need to understand the material to be successful in your career,(b)return to the paper later,perhaps after reading background material or(c)persevere and go on to the third pass.2.3The third passTo fully understand a paper,particularly if you are re-viewer,requires a third pass.The key to the third pass is to attempt to virtually re-implement the paper:that is, making the same assumptions as the authors,re-create the work.By comparing this re-creation with the actual paper, you can easily identify not only a paper’s innovations,but also its hidden failings and assumptions.This pass requires great attention to detail.You should identify and challenge every assumption in every statement. Moreover,you should think about how you yourself would present a particular idea.This comparison of the actual with the virtual lends a sharp insight into the proof and presentation techniques in the paper and you can very likely add this to your repertoire of tools.During this pass,you should also jot down ideas for future work.This pass can take about four orfive hours for beginners, and about an hour for an experienced reader.At the end of this pass,you should be able to reconstruct the entire structure of the paper from memory,as well as be able to identify its strong and weak points.In particular,you should be able to pinpoint implicit assumptions,missing citations to relevant work,and potential issues with experimental or analytical techniques.3.DOING A LITERATURE SURVEYPaper reading skills are put to the test in doing a literature survey.This will require you to read tens of papers,perhaps in an unfamiliarfield.What papers should you read?Here is how you can use the three-pass approach to help.First,use an academic search engine such as Google Scholar or CiteSeer and some well-chosen keywords tofind three to five recent papers in the area.Do one pass on each pa-per to get a sense of the work,then read their related work sections.You willfind a thumbnail summary of the recent work,and perhaps,if you are lucky,a pointer to a recent survey paper.If you canfind such a survey,you are done. Read the survey,congratulating yourself on your good luck. 
Otherwise,in the second step,find shared citations and repeated author names in the bibliography.These are the key papers and researchers in that area.Download the key papers and set them aside.Then go to the websites of the key researchers and see where they’ve published recently.That will help you identify the top conferences in thatfield because the best researchers usually publish in the top con-ferences.The third step is to go to the website for these top con-ferences and look through their recent proceedings.A quick scan will usually identify recent high-quality related work. These papers,along with the ones you set aside earlier,con-stitute thefirst version of your survey.Make two passes through these papers.If they all cite a key paper that you did notfind earlier,obtain and read it,iterating as neces-sary.4.EXPERIENCEI’ve used this approach for the last15years to read con-ference proceedings,write reviews,do background research, and to quickly review papers before a discussion.This dis-ciplined approach prevents me from drowning in the details before getting a bird’s-eye-view.It allows me to estimate the amount of time required to review a set of papers.More-over,I can adjust the depth of paper evaluation depending on my needs and how much time I have.5.RELATED WORKIf you are reading a paper to do a review,you should also read Timothy Roscoe’s paper on“Writing reviews for sys-tems conferences”[1].If you’re planning to write a technical paper,you should refer both to Henning Schulzrinne’s com-prehensive web site[2]and George Whitesides’s excellent overview of the process[3].6.A REQUESTI would like to make this a living document,updating it as I receive comments.Please take a moment to email me any comments or suggestions for improvement.You can also add comments at CCRo,the online edition of CCR[4]. 7.ACKNOWLEDGMENTSThefirst version of this document was drafted by my stu-dents:Hossein Falaki,Earl Oliver,and Sumair Ur Rahman. My thanks to them.I also benefited from Christophe Diot’s perceptive comments and Nicole Keshav’s eagle-eyed copy-editing.This work was supported by grants from the National Science and Engineering Council of Canada,the Canada Research Chair Program,Nortel Networks,Microsoft,Intel Corporation,and Sprint Corporation.8.REFERENCES[1]T.Roscoe,“Writing Reviews for SystemsConferences,”http://people.inf.ethz.ch/troscoe/pubs/review-writing.pdf.[2]H.Schulzrinne,“Writing Technical Articles,”/hgs/etc/writing-style.html.[3]G.M.Whitesides,“Whitesides’Group:Writing aPaper,”http://www.che.iitm.ac.in/misc/dd/writepaper.pdf. [4]ACM SIGCOMM Computer Communication ReviewOnline,/ccr/drupal/.。

How to Read a Paper (Chinese translation)


How to Read a Paper*
Author: S. Keshav†; Translator: 计军平‡

ABSTRACT. Researchers spend a great deal of time reading papers. However, this skill is rarely taught, so beginners waste a great deal of time and effort. This article presents an efficient and practical method for reading papers, the "three-pass approach", and also describes how to use this method to do a literature survey.

1. OVERVIEW

Researchers read papers for many reasons: to prepare for a conference or a class, to keep up with progress in their own field, or to survey the literature of a new field. In general, a researcher spends hundreds of hours every year reading papers. Reading papers efficiently is a critically important skill that is rarely taught, so beginners have to learn it through their own trial and error. As a result, they waste much effort in the process and are often driven to deep frustration. For many years I have used a simple and effective approach to reading papers. This article explains this "three-pass approach" and describes its use in literature surveys.

2. THE THREE-PASS APPROACH

The key idea is to read a paper in up to three passes, rather than reading it carefully from beginning to end. Each pass builds on the previous one and accomplishes a specific goal: the first pass gives you a general idea of the paper, the second pass lets you grasp its main content (but not the details), and the third pass helps you understand the paper in depth.

* S. Keshav, 2007. How to Read a Paper. ACM SIGCOMM Computer Communication Review, 37(3):83-84.
† David R. Cheriton School of Computer Science, University of Waterloo.
‡ School of Environment and Energy, Peking University Shenzhen Graduate School.

2.1 The First Pass

The first pass is a quick, bird's-eye scan of the paper, from which you decide whether any further passes are needed. This pass takes five to ten minutes and consists of four steps: 1. carefully read the title, abstract, and introduction; 2. read the section headings, but skip everything else; 3. read the conclusions; 4. glance over the references, noting the ones you have already read.

At the end of the first pass, you should be able to answer the following five questions: 1. Category: What type of paper is this? An empirical quantitative analysis? An improvement of an existing method? Or a new theory? 2. Context: Which other papers is it related to? What theoretical bases were used for the analysis? 3. Correctness: Judging from experience, do the paper's assumptions appear to be valid? 4. Contributions: What are the paper's main contributions? 5. Clarity: Is the paper clearly written? Based on this information, you may decide not to read the paper any further.

The Hebrew University Jerusalem

Processes are increasingly being used to make complex application logic explicit. Programming using processes has significant advantages, but it poses a difficult problem from the system point of view in that the interactions between processes cannot be controlled using conventional techniques. In terms of recovery, the steps of a process are different from operations within a transaction. Each one has its own termination semantics and there are dependencies among the different steps. Regarding concurrency control, the flow of control of a process is more complex than in a flat transaction. A process may, e.g., partially roll back its execution or may follow one of several alternatives. In this paper, we deal with the problem of atomicity and isolation in the context of processes. We propose a unified model for concurrency control and recovery for processes and show how this model can be implemented in practice, thereby providing a complete framework for developing middleware applications using processes. Categories and Subject Descriptors: D.2.2 [Software Engineering]: Design Tools/Techniques; D.2.4 [Software Engineering]: Software/Program Verification - Correctness proofs; Reliability; H.2.4 [Database Management]: Systems - Concurrency; Distributed databases; Transaction processing; H.2.7 [Database Management]: Database Administration - Logging and recovery; H.4.1 [Information Systems Applications]: Office Automation - Workflow management. General Terms: Algorithms, Design, Reliability. Additional Key Words and Phrases: Advanced transaction models, business process management, electronic commerce, execution guarantees, locking, processes, semantically rich transactions, transactional workflows, unified theory of concurrency control and recovery.

Graduation Project: English-Language References on Sensors


DiMo:Distributed Node Monitoring in WirelessSensor NetworksAndreas Meier†,Mehul Motani∗,Hu Siquan∗,and Simon Künzli‡†Computer Engineering and Networks Lab,ETH Zurich,Switzerland∗Electrical&Computer Engineering,National University of Singapore,Singapore‡Siemens Building T echnologies,Zug,SwitzerlandABSTRACTSafety-critical wireless sensor networks,such as a distributed fire-or burglar-alarm system,require that all sensor nodes are up and functional.If an event is triggered on a node, this information must be forwarded immediately to the sink, without setting up a route on demand or having tofind an alternate route in case of a node or link failure.Therefore, failures of nodes must be known at all times and in case of a detected failure,an immediate notification must be sent to the network operator.There is usually a bounded time limit,e.g.,five minutes,for the system to report network or node failure.This paper presents DiMo,a distributed and scalable solution for monitoring the nodes and the topology, along with a redundant topology for increased robustness. Compared to existing solutions,which traditionally assume a continuous data-flow from all nodes in the network,DiMo observes the nodes and the topology locally.DiMo only reports to the sink if a node is potentially failed,which greatly reduces the message overhead and energy consump-tion.DiMo timely reports failed nodes and minimizes the false-positive rate and energy consumption compared with other prominent solutions for node monitoring.Categories and Subject DescriptorsC.2.2[Network Protocols]:Wireless Sensor NetworkGeneral TermsAlgorithms,Design,Reliability,PerformanceKeywordsLow power,Node monitoring,Topology monitoring,WSN 1.INTRODUCTIONDriven by recent advances in low power platforms and protocols,wireless sensor networks are being deployed to-day to monitor the environment from wildlife habitats[1] Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.MSWiM’08,October27–31,2008,Vancouver,BC,Canada.Copyright2008ACM978-1-60558-235-1/08/10...$5.00.to mission-criticalfire-alarm systems[5].There are,how-ever,still some obstacles in the way for mass application of wireless sensor networks.One of the key challenges is the management of the wireless sensor network itself.With-out a practical management system,WSN maintenance will be very difficult for network administrators.Furthermore, without a solid management plan,WSNs are not likely to be accepted by industrial users.One of the key points in the management of a WSN is the health status monitoring of the network itself.Node failures should be captured by the system and reported to adminis-trators within a given delay constraint.Due to the resource constraints of WSN nodes,traditional network management protocols such as SNMP adopted by TCP/IP networks are not suitable for sensor networks.In this paper,we con-sider a light-weight network management approach tailored specifically for WSNs and their unique constraints. 
Currently,WSN deployments can be categorized by their application scenario:data-gathering applications and event-detection applications.For data-gathering systems,health status monitoring is quite straight forward.Monitoring in-formation can be forwarded to the sink by specific health status packets or embedded in the regular data packets.Ad-ministrators can usually diagnose the network with a helper program.NUCLEUS[6]is one of the network management systems for data-gathering application of WSN.Since event-detection deployments do not have regular traffic to send to the sink,the solutions for data-gathering deployments are not suitable.In this case,health status monitoring can be quite challenging and has not been discussed explicitly in the literature.In an event-detection WSN,there is no periodic data trans-fer,i.e.,nodes maintain radio silence until there is an event to report.While this is energy efficient,it does mean that there is no possibility for the sink to decide whether the net-work is still up and running(and waiting for an event to be detected)or if some nodes in the network have failed and are therefore silent.Furthermore,for certain military ap-plications or safety-critical systems,the specifications may include a hard time constraint for accomplishing the node health status monitoring task.In an event-detection WSN,the system maintains a net-work topology that allows for forwarding of data to a sink in the case of an event.Even though there is no regular data transfer in the network,the network should always be ready to forward a message to the sink immediately when-ever necessary.It is this urgency of data forwarding that makes it undesirable to set up a routing table and neighborlist after the event has been detected.The lack of regular data transfer in the network also leads to difficulty in de-tecting bad quality links,making it challenging to establish and maintain a stable robust network topology.While we have mentioned event-detection WSNs in gen-eral,we accentuate that the distributed node monitoring problem we are considering is inspired by a real-world ap-plication:a distributed indoor wireless alarm system which includes a sensor for detection of a specific alarm such as fire(as studied in[5]).To illustrate the reporting require-ments of such a system,we point out that regulatory speci-fications require afire to be reported to the control station within10seconds and a node failure to be reported within 5minutes[9].This highlights the importance of the node-monitoring problem.In this paper,we present a solution for distributed node monitoring called DiMo,which consists of two functions: (i)Network topology maintenance,introduced in Section2, and(ii)Node health status monitoring,introduced in Sec-tion3.We compare DiMo to existing state-of-the-art node monitoring solutions and evaluate DiMo via simulations in Section4.1.1Design GoalsDiMo is developed based on the following design goals:•In safety critical event monitoring systems,the statusof the nodes needs to be monitored continuously,allow-ing the detection and reporting of a failed node withina certain failure detection time T D,e.g.,T D=5min.•If a node is reported failed,a costly on-site inspectionis required.This makes it of paramount interest todecrease the false-positive rate,i.e.,wrongly assuminga node to have failed.•In the case of an event,the latency in forwarding theinformation to the sink is crucial,leaving no time toset up a route on demand.We require the system tomaintain a topology at all times.In 
order to be robustagainst possible link failures,the topology needs toprovide redundancy.•To increase efficiency and minimize energy consump-tion,the two tasks of topology maintenance(in par-ticular monitoring of the links)and node monitoringshould be combined.•Maximizing lifetime of the network does not necessar-ily translate to minimizing the average energy con-sumption in the network,but rather minimizing theenergy consumption of the node with the maximal loadin the network.In particular,the monitoring shouldnot significantly increase the load towards the sink.•We assume that the event detection WSN has no reg-ular data traffic,with possibly no messages for days,weeks or even months.Hence we do not attempt to op-timize routing or load balancing for regular data.Wealso note that approaches like estimating links’perfor-mance based on the ongoing dataflow are not possibleand do not take them into account.•Wireless communications in sensor networks(especially indoor deployments)is known for its erratic behav-ior[2,8],likely due to multi-path fading.We assumesuch an environment with unreliable and unpredictablecommunication links,and argue that message lossesmust be taken into account.1.2Related WorkNithya et al.discuss Sympathy in[3],a tool for detect-ing and debugging failures in pre-and post-deployment sen-sor networks,especially designed for data gathering appli-cations.The nodes send periodic heartbeats to the sink that combines this information with passively gathered data to detect failures.For the failure detection,the sink re-quires receiving at least one heartbeat from the node every so called sweep interval,i.e.,its lacking indicates a node fail-ure.Direct-Heartbeat performs poorly in practice without adaptation to wireless packet losses.To meet a desired false positive rate,the rate of heartbeats has to be increased also increasing the communication cost.NUCLEUS[6]follows a very similar approach to Sympathy,providing a manage-ment system to monitor the heath status of data-gathering applications.Rost et al.propose with Memento a failure detection sys-tem that also requires nodes to periodically send heartbeats to the so called observer node.Those heartbeats are not directly forwarded to the sink node,but are aggregated in form of a bitmask(i.e.,bitwise OR operation).The ob-server node is sweeping its bitmask every sweep interval and will forward the bitmask with the node missing during the next sweep interval if the node fails sending a heartbeat in between.Hence the information of the missing node is disseminated every sweep interval by one hop,eventually arriving at the sink.Memento is not making use of ac-knowledgements and proactively sends multiple heartbeats every sweep interval,whereas this number is estimated based on the link’s estimated worst-case performance and the tar-geted false positive rate.Hence Memento and Sympathy do both send several messages every sweep interval,most of them being redundant.In[5],Strasser et al.propose a ring based(hop count)gos-siping scheme that provides a latency bound for detecting failed nodes.The approach is based on a bitmask aggre-gation,beingfilled ring by ring based on a tight schedule requiring a global clock.Due to the tight schedule,retrans-missions are limited and contention/collisions likely,increas-ing the number of false positives.The approach is similar to Memento[4],i.e.,it does not scale,but provides latency bounds and uses the benefits of acknowledgements on the link layer.2.TOPOLOGY MAINTENANCEForwarding a detected event 
without any delay requires maintaining a redundant topology that is robust against link failures.The characteristics of such a redundant topology are discussed subsequently.The topology is based on so called relay nodes,a neighbor that can provide one or more routes towards the sink with a smaller cost metric than the node itself has.Loops are inherently ruled out if packets are always forwarded to relay nodes.For instance,in a simple tree topology,the parent is the relay node and the cost metric is the hop count.In order to provide redundancy,every node is connected with at least two relay nodes,and is called redundantly con-nected.Two neighboring nodes can be redundantly con-nected by being each others relay,although having the same cost metric,only if they are both connected to the sink. This exception allows the nodes neighboring the sink to be redundantly connected and avoids having a link to the sinkas a single point of failure.In a(redundantly)connected network,all deployed nodes are(redundantly)connected.A node’s level L represents the minimal hop count to the sink according to the level of its relay nodes;i.e.,the relay with the least hop count plus one.The level is infinity if the node is not connected.The maximal hop count H to the sink represents the longest path to the sink,i.e.,if at every hop the relay node with the highest maximal hop count is chosen.If the node is redundantly connected,the node’s H is the maximum hop count in the set of its relays plus one, if not,the maximal hop count is infinity.If and only if all nodes in the network have afinite maximal hop count,the network is redundantly connected.The topology management function aims to maintain a redundantly connected network whenever possible.This might not be possible for sparsely connected networks,where some nodes might only have one neighbor and therefore can-not be redundantly connected by definition.Sometimes it would be possible tofind alternative paths with a higher cost metric,which in turn would largely increase the overhead for topology maintenance(e.g.,for avoiding loops).For the cost metric,the tuple(L,H)is used.A node A has the smaller cost metric than node B ifL A<L B∨(L A=L B∧H A<H B).(1) During the operation of the network,DiMo continuously monitors the links(as described in Section3),which allows the detection of degrading links and allows triggering topol-ogy adaptation.Due to DiMo’s redundant structure,the node is still connected to the network,during this neighbor search,and hence in the case of an event,can forward the message without delay.3.MONITORING ALGORITHMThis section describes the main contribution of this paper, a distributed algorithm for topology,link and node monitor-ing.From the underlying MAC protocol,it is required that an acknowledged message transfer is supported.3.1AlgorithmA monitoring algorithm is required to detect failed nodes within a given failure detection time T D(e.g.,T D=5min).A node failure can occur for example due to hardware fail-ures,software errors or because a node runs out of energy. 
Furthermore,an operational node that gets disconnected from the network is also considered as failed.The monitoring is done by so called observer nodes that monitor whether the target node has checked in by sending a heartbeat within a certain monitoring time.If not,the ob-server sends a node missing message to the sink.The target node is monitored by one observer at any time.If there are multiple observer nodes available,they alternate amongst themselves.For instance,if there are three observers,each one observes the target node every third monitoring time. The observer node should not only check for the liveliness of the nodes,but also for the links that are being used for sending data packets to the sink in case of a detected event. These two tasks are combined by selecting the relay nodes as observers,greatly reducing the network load and maximiz-ing the network lifetime.In order to ensure that all nodes are up and running,every node is observed at all times. The specified failure detection time T D is an upper bound for the monitoring interval T M,i.e.,the interval within which the node has to send a heartbeat.Since failure detec-tion time is measured at the sink,the detection of a missing node at the relay needs to be forwarded,resulting in an ad-ditional maximal delay T L.Furthermore,the heartbeat can be delayed as well,either by message collisions or link fail-ures.Hence the node should send the heartbeat before the relay’s monitoring timer expires and leave room for retries and clock drift within the time window T R.So the monitor-ing interval has to be set toT M≤T D−T L−T R(2) and the node has to ensure that it is being monitored every T M by one of its observers.The schedule of reporting to an observer is only defined for the next monitoring time for each observer.Whenever the node checks in,the next monitoring time is announced with the same message.So for every heartbeat sent,the old monitoring timer at the observer can be cancelled and a new timer can be set according the new time.Whenever,a node is newly observed or not being observed by a particular observer,this is indicated to the sink.Hence the sink is always aware of which nodes are being observed in the network,and therefore always knows which nodes are up and running.This registration scheme at the sink is an optional feature of DiMo and depends on the user’s requirements.3.2Packet LossWireless communication always has to account for possi-ble message losses.Sudden changes in the link quality are always possible and even total link failures in the order of a few seconds are not uncommon[2].So the time T R for send-ing retries should be sufficiently long to cover such blanks. Though unlikely,it is possible that even after a duration of T R,the heartbeat could not have been successfully for-warded to the observer and thus was not acknowledged,in spite of multiple retries.The node has to assume that it will be reported miss-ing at the sink,despite the fact it is still up and running. 
Should the node be redundantly connected,a recovery mes-sage is sent to the sink via another relay announcing be-ing still alive.The sink receiving a recovery message and a node-missing message concerning the same node can neglect these messages as they cancel each other out.This recov-ery scheme is optional,but minimizes the false positives by orders of magnitudes as shown in Section4.3.3Topology ChangesIn the case of a new relay being announced from the topol-ogy management,a heartbeat is sent to the new relay,mark-ing it as an observer node.On the other hand,if a depre-cated relay is announced,this relay might still be acting as an observer,and the node has to check in as scheduled.How-ever,no new monitor time is announced with the heartbeat, which will release the deprecated relay of being an observer.3.4Queuing PolicyA monitoring buffer exclusively used for monitoring mes-sages is introduced,having the messages queued according to a priority level,in particular node-missing messagesfirst. Since the MAC protocol and routing engine usually have a queuing buffer also,it must be ensured that only one single monitoring message is being handled by the lower layers atthe time.Only if an ACK is received,the monitoring mes-sage can be removed from the queue(if a NACK is received, the message remains).DiMo only prioritizes between the different types of monitoring messages and does not require prioritized access to data traffic.4.EV ALUATIONIn literature,there are very few existing solutions for mon-itoring the health of the wireless sensor network deployment itself.DiMo is thefirst sensor network monitoring solution specifically designed for event detection applications.How-ever,the two prominent solutions of Sympathy[3]and Me-mento[4]for monitoring general WSNs can also be tailored for event gathering applications.We compare the three ap-proaches by looking at the rate at which they generate false positives,i.e.,wrongly inferring that a live node has failed. False positives tell us something about the monitoring pro-tocol since they normally result from packet losses during monitoring.It is crucial to prevent false positives since for every node that is reported missing,a costly on-site inspec-tion is required.DiMo uses the relay nodes for observation.Hence a pos-sible event message and the regular heartbeats both use the same path,except that the latter is a one hop message only. The false positive probability thus determines the reliability of forwarding an event.We point out that there are other performance metrics which might be of interest for evaluation.In addition to false positives,we have looked at latency,message overhead, and energy consumption.We present the evaluation of false positives below.4.1Analysis of False PositivesIn the following analysis,we assume r heartbeats in one sweep for Memento,whereas DiMo and Sympathy allow sending up to r−1retransmissions in the case of unac-knowledged messages.To compare the performance of the false positive rate,we assume the same sweep interval for three protocols which means that Memento’s and Sympa-thy’s sweep interval is equal to DiMo’s monitoring interval. 
In the analysis we assume all three protocols having the same packet-loss probability p l for each hop.For Sympathy,a false positive for a node occurs when the heartbeat from the node does not arrive at the sink in a sweep interval,assuming r−1retries on every hop.So a node will generate false positive with a possibility(1−(1−p r l)d)n,where d is the hop count to the sink and n the numbers of heartbeats per sweep.In Memento,the bitmask representing all nodes assumes them failed by default after the bitmap is reset at the beginning of each sweep interval. If a node doesn’t report to its parent successfully,i.e.,if all the r heartbeats are lost in a sweep interval,a false positive will occur with a probability of p l r.In DiMo the node is reported missing if it fails to check in at the observer having a probability of p l r.In this case,a recovery message is triggered.Consider the case that the recovery message is not kept in the monitoring queue like the node-missing messages, but dropped after r attempts,the false positive rate results in p l r(1−(1−p l r)d).Table1illustrates the false positive rates for the three protocols ranging the packet reception rate(PRR)between 80%and95%.For this example the observed node is in afive-hop distance(d=5)from the sink and a commonPRR80%85%90%95% Sympathy(n=1) 3.93e-2 1.68e-2 4.99e-3 6.25e-4 Sympathy(n=2) 1.55e-3 2.81e-4 2.50e-5 3.91e-7 Memento8.00e-3 3.38e-3 1.00e-3 1.25e-4 DiMo 3.15e-4 5.66e-5 4.99e-67.81e-8Table1:False positive rates for a node with hop count5and3transmissions under different packet success rates.number of r=3attempts for forwarding a message is as-sumed.Sympathy clearly suffers from a high packet loss, but its performance can be increased greatly sending two heartbeats every sweep interval(n=2).This however dou-bles the message load in the network,which is especially substantial as the messages are not aggregated,resulting in a largely increased load and energy consumption for nodes next to the paring DiMo with Memento,we ob-serve the paramount impact of the redundant relay on the false positive rate.DiMo offers a mechanism here that is not supported in Sympathy or Memento as it allows sending up to r−1retries for the observer and redundant relay.Due to this redundancy,the message can also be forwarded in the case of a total blackout of one link,a feature both Memento and Sympathy are lacking.4.2SimulationFor evaluation purposes we have implemented DiMo in Castalia1.3,a state of the art WSN simulator based on the OMNet++platform.Castalia allows evaluating DiMo with a realistic wireless channel(based on the empiricalfindings of Zuniga et al.[8])and radio model but also captures effects like the nodes’clock drift.Packet collisions are calculated based on the signal to interference ratio(SIR)and the radio model features transition times between the radio’s states (e.g.,sending after a carrier sense will be delayed).Speck-MAC[7],a packet based version of B-MAC,with acknowl-edgements and a low-power listening interval of100ms is used on the link layer.The characteristics of the Chipcon CC2420are used to model the radio.The simulations are performed for a network containing80 nodes,arranged in a grid with a small Gaussian distributed displacement,representing an event detection system where nodes are usually not randomly deployed but rather evenly spread over the observed area.500different topologies were analyzed.The topology management results in a redun-dantly connected network with up to5levels L and a max-imum hop count H of6to8.A 
false positive is triggered if the node fails to check in, which is primarily due to packet errors and losses on the wireless channel.In order to understand false positives,we set the available link’s packet reception rate(PRR)to0.8, allowing us to see the effects of the retransmission scheme. Furthermore,thisfixed PRR also allows a comparison with the results of the previous section’s analysis and is shown in Figure1(a).The plot shows on the one hand side the monitoring based on a tree structure that is comparable to the performance of Memento,i.e.,without DiMo’s possibil-ity of sending a recovery message using an alternate relay. On the other hand side,the plot shows the false positive rate of DiMo.The plot clearly shows the advantage of DiMo’s redundancy,yet allowing sending twice as many heartbeats than the tree approach.This might not seem necessarily fair atfirst;however,in a real deployment it is always possible(a)Varying number of retries;PRR =0.8.(b)Varying link quality.Figure 1:False positives:DiMo achieves the targeted false positive rate of 1e-7,also representing the reliability for successfully forwarding an event.that a link fails completely,allowing DiMo to still forward the heartbeat.The simulation and the analysis show a slight offset in the performance,which is explained by a simulation artifact of the SpeckMAC implementation that occurs when the receiver’s wake-up time coincides with the start time of a packet.This rare case allows receiving not only one but two packets out of the stream,which artificially increases the link quality by about three percent.The nodes are observed every T M =4min,resulting in being monitored 1.3e5times a year.A false positive rate of 1e-6would result in having a particular node being wrongly reported failed every 7.7years.Therefore,for a 77-node net-work,a false positive rate of 1e-7would result in one false alarm a year,being the targeted false-positive threshold for the monitoring system.DiMo achieves this rate by setting the numbers of retries for both the heartbeat and the recov-ery message to four.Hence the guard time T R for sending the retries need to be set sufficiently long to accommodate up to ten messages and back-offtimes.The impact of the link quality on DiMo’s performance is shown in Figure 1(b).The tree topology shows a similar performance than DiMo,if the same number of messages is sent.However,it does not show the benefit in the case of a sudden link failure,allowing DiMo to recover immedi-ately.Additionally,the surprising fact that false positives are not going to zero for perfect link quality is explained by collisions.This is also the reason why DiMo’s curve for two retries flattens for higher link qualities.Hence,leaving room for retries is as important as choosing good quality links.5.CONCLUSIONIn this paper,we presented DiMo,a distributed algorithm for node and topology monitoring,especially designed for use with event-triggered wireless sensor networks.As a de-tailed comparative study with two other well-known moni-toring algorithm shows,DiMo is the only one to reach the design target of having a maximum error reporting delay of 5minutes while keeping the false positive rate and the energy consumption competitive.The proposed algorithm can easily be implemented and also be enhanced with a topology management mechanism to provide a robust mechanism for WSNs.This enables its use in the area of safety-critical wireless sensor networks.AcknowledgmentThe work presented in this paper was supported by CTI grant number 
8222.1and the National Competence Center in Research on Mobile Information and Communication Sys-tems (NCCR-MICS),a center supported by the Swiss Na-tional Science Foundation under grant number 5005-67322.This work was also supported in part by phase II of the Embedded and Hybrid System program (EHS-II)funded by the Agency for Science,Technology and Research (A*STAR)under grant 052-118-0054(NUS WBS:R-263-000-376-305).The authors thank Matthias Woehrle for revising a draft version of this paper.6.REFERENCES[1] A.Mainwaring et al.Wireless sensor networks for habitatmonitoring.In 1st ACM Int’l Workshop on Wireless Sensor Networks and Application (WSNA 2002),2002.[2] A.Meier,T.Rein,et al.Coping with unreliable channels:Efficient link estimation for low-power wireless sensor networks.In Proc.5th Int’l worked Sensing Systems (INSS 2008),2008.[3]N.Ramanathan,K.Chang,et al.Sympathy for the sensornetwork debugger.In Proc.3rd ACM Conf.Embedded Networked Sensor Systems (SenSys 2005),2005.[4]S.Rost and H.Balakrishnan.Memento:A health monitoringsystem for wireless sensor networks.In Proc.3rd IEEE Communications Society Conf.Sensor,Mesh and Ad Hoc Communications and Networks (IEEE SECON 2006),2006.[5]M.Strasser,A.Meier,et al.Dwarf:Delay-aware robustforwarding for energy-constrained wireless sensor networks.In Proceedings of the 3rd IEEE Int’l Conference onDistributed Computing in Sensor Systems (DCOSS 2007),2007.[6]G.Tolle and D.Culler.Design of an application-cooperativemanagement system for wireless sensor networks.In Proc.2nd European Workshop on Sensor Networks (EWSN 2005),2005.[7]K.-J.Wong et al.Speckmac:low-power decentralised MACprotocols for low data rate transmissions in specknets.In Proc.2nd Int’l workshop on Multi-hop ad hoc networks:from theory to reality (REALMAN ’06),2006.[8]M.Zuniga and B.Krishnamachari.Analyzing thetransitional region in low power wireless links.In IEEE SECON 2004,2004.[9]Fire detection and fire alarm systems –Part 25:Componentsusing radio links.European Norm (EN)54-25:2008-06,2008.。
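The closed-form false-positive expressions and the dimensioning argument above are easy to check numerically. The following Python sketch is an illustrative back-of-the-envelope calculation, not part of the original evaluation; p_l, r, d and n are used exactly as defined in the analysis, and the script reproduces the values of Table 1 as well as the one-false-alarm-per-year target discussed in the simulation section.

```python
# Sketch: false-positive rates of Sympathy, Memento and DiMo (Table 1)
# and the yearly false-alarm budget from the simulation section.

def sympathy_fp(p_l, r, d, n):
    # a heartbeat is lost end-to-end with prob. 1 - (1 - p_l^r)^d;
    # a false positive requires all n heartbeats of a sweep to be lost
    return (1 - (1 - p_l**r) ** d) ** n

def memento_fp(p_l, r):
    # all r transmissions to the parent are lost
    return p_l**r

def dimo_fp(p_l, r, d):
    # check-in at the observer fails and the recovery message
    # (r attempts per hop over d hops) also fails to reach the sink
    return p_l**r * (1 - (1 - p_l**r) ** d)

d, r = 5, 3                                   # five hops, three attempts
for prr in (0.80, 0.85, 0.90, 0.95):
    p_l = 1 - prr                             # per-hop packet-loss probability
    print(f"PRR={prr:.2f}  Sympathy(n=1)={sympathy_fp(p_l, r, d, 1):.2e}  "
          f"Sympathy(n=2)={sympathy_fp(p_l, r, d, 2):.2e}  "
          f"Memento={memento_fp(p_l, r):.2e}  DiMo={dimo_fp(p_l, r, d):.2e}")

# Target false-positive rate: nodes are observed every T_M = 4 min,
# i.e. about 1.3e5 times a year; for a 77-node network a rate of 1e-7
# yields roughly one false alarm per year.
checks_per_year = 365 * 24 * 60 / 4
for fp_rate in (1e-6, 1e-7):
    per_node = fp_rate * checks_per_year
    print(f"rate {fp_rate:.0e}: one false report per node every "
          f"{1 / per_node:.1f} years, {77 * per_node:.2f} alarms/year for 77 nodes")
```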

Foreign-language translation -- An Index System for Quality Synthesis Evaluation of BtoC Business Websites


原文2An Index System for Quality Synthesis Evaluation of BtoC Business Website ABSTRACTIt is important for successful electronic business to have a hi-quality business website. So we need an accurate and effective index system to evaluate and analyses the quality of the business website. In this paper, the evaluation index system following the ‘grey box’ principle is proposed which considers both efficiency of business website and performance of electronic business system. Using R-Hierarchical clustering method to extract the typical indexes from sub-indexes is theoretically proved to have a rationality and effectiveness. Finally, the evaluation method is briefly discussed.Categories and Subject DescriptorsTP393.4KeywordsBusiness website; Evaluation system; R-Hierarchical clustering; System performance1. INTRODUCTIONBusiness website is an online media between buyer and seller. A hi-quality website is crucial to a company for a successful e-business. What is a hi-quality business website? In terms of maintaining the website, what do we focus on so that the quality meets the users’ needs? Apparently, using click-through rate to assess the popularity cannot objectively and accurately evaluate the quality of the business websites. Instead, we need to rely on scientific evaluation index system and methods.At present, there are many methods available for business website comparison or ranking, such as Usage Ranking, Purchase Comparison, Expert Opinion and Synthesis Evaluation etc. Y ou can find both official authority and non-governmental organization that issue their power ranking. The former one is to monitor and regulate the market, such as CNNIC, which organized the competition for the Top Ten Websites in domestic. The latter one, such as Consumerreports (www.consumerreports. org), BizRate(), Forrester Research etc., is mainly to guide the web users’ activity. These kinds of comparison or ranking have special value in getting reputation and increasing recognition of the business websites among the users, however,e-business enterprise can not improve the quality of their websites directlybased on the results of these kinds of assessments.The main purpose of this paper is to develop an index system for quantitative evaluation of the BtoC websites, which dose not emphasize the income of the website but focus on evaluating of its synthesis quality. We hope that the applying of this index system will provide the technique developers and maintainers some references for designing, appraising and diagnosing their e-business system to improve its quality level, and to support managers to make decisions for operation of the websites.2. OVERVIEW OF PREVIOUS STUDIESComparing to the fast growing of e-business websites in the world, currently we can rarely find the particular research on he evaluation index system of business website. QEM (The website quality evaluation method) proposed by Olsina and Godoy etc. in 1999 can be considered as one of the representative approaches. It based on the main factors to evaluate the quality of the websites, including functionality (global search, navigability, and content relativity), usability (website map, addresses directory), efficiency and reliability. In 2000, American researcher, Panla Solaman, presented e-SERVQUAL model based on the conventional service quality evaluation model SERVQUAL. It contains some factors like efficiency, deal completeness, reliability, privacy protection, responsiveness, recompense and contact etc. 
In the same year, another American researcher, Hauler, introduced an e-QUAL model which includes the factors of content, accessibility, navigability, design and presentation, responsiveness, background, personalization and customization, etc. In 2004, F.J. Miranda Gonzalez and T.M.Banegil Palac ios developed an universal evaluation index system WIS (Web Assessment Index) that can be employed to assess websites by different organizations. It consists of four indexes of accessibility, navigability, speed and content. However, the universal index system cannot measure a website exactly and absolutely due to the industry specialty, organizational characteristics and different usages. One of the representative researches is Mr. It assesses the business websites by testing if the design of the website coincides with the shopping process of online consumers. This standard has five factors, such as search and browse, merchandise information,shopping cart, register and pay, service and support. Another index system for small and medium business websites covers the factors of general features, design, promotion,information and the others.Here we list our major findings from the previous researches:2.1 Unreasonable Selection of the IndexSome research consider not only the original design but also thefactors such as promotion and income of business website.Some evaluation systems have correlative or contradictiveindexes. For example, it considers the download speed, at thesame time, it requires the web designers not to excessively useflash and sounds to slow down the speed.2.2 Unilateral EvaluationMost of the research takes the users’ view to evaluate the function and design of website. It treats the business system as a ‘black box’ and ignores the impact of system performan ce on the websites quality. But considering the factors of system performance alone is also not a complete evaluation for improving service quality of website.2.3 Lack of a Complete Set of QualitySynthesis Evaluation System A complete set of tool to evaluate the websites must include the following important elements: categories, factors, weights, rankings standard and assessment model. So far, we have not seen any literature discussing complete set of evaluation index system aiming at the quality of BtoC websites.3. PRINCIPLE FOR THE QUALITYSYNTHESIS EV ALUA TIONFirst, the three fundamental principles we need to follow are to be comprehensive, to be scientific and to be feasible. We should evaluate all the facets of the website from different dimensions and avoid missing value of important factors. Moreover, thedefinition of the evaluation index should be accurate,objective and logical so it can eliminate the impact on the evaluation result brought by the correlative indexes. Concurrently, we need reduce the quantity of indexes or adopt the simple ones which data is easier to be collected, and prevent from complicated calculation due to the excessive indexes.The main purpose of improving business websites is to serve the users better. They are concerned only about the websites’ e xternal attributes, such as content, function, presentation and browse speed, etc. So, evaluating only by taking their views cannot directly guide to develop,maintain and administrate the website. Just like treating the patient’s symptom but not the disease itself, the technique developer or maintainer cannot radically improve the quality of their websites by correcting system structure and web design according to the evaluation result. 
Only after we adopt the ‘grey box’ index system that considers both e fficiency of business website and performance of e-business system, we can establish a quality synthesis evaluation index system to benefit the management of BtoC websites.4. QUALITY EV ALUA TION INDEXESFOR BUSINESS WEBSITESelection of index items lays down the foundation for constructing evaluation index system. After we thoroughly analyze the evaluation objectives based on the characteristics ofbusiness website, we propose an initial index system includes 5 categories and totally 28 index items shown in the following Table 1.Website self-adaptability refers to capability of e-business system intelligently providing personalized service and dynamic optimizing system performance. System Efficiency refers to the ability that the system response quickly to the requests of numbers of web users. It can be measured through values of some quantitative indexes, such as response time, throughput or utilization rate, etc.5. OPTIMIZING THE EV ALUATION INDEXEIt is necessary for our initial evaluation system to optimize if it can be applied in practice. First, the indexes are more or less correlative which will affects the objectiveness of the evaluation. Second, there are too more indexes that will result in lower efficiency. Therefore, we try to extract and simplify the indexes by using R-Hierarchical clustering method.Generally, R indicates the coefficient of correlation between two items. R-Hierarchical clustering method is usually applied to cluster the indexes. The steps are described as following.5.1 Calculate Coefficient of Correlation and ClusteringIt firstly treats every index as one cluster. So, we have 28 clusters. Then, coefficient of correlation is calculated between every two clusters by minimum-distance method. Next, the two clusters with the maximal coefficient of correlation areclustered into a new one. The same process is repeated until all the indexes are clustered into one.5.2 Analyze the Clustering Process and Determine ClustersWe analyze the variation of minimum coefficient of correlation during the clustering processto find the leap points. According to the number of leap points and the knowledge of special field, we can eventually determine how many clusters we need. The whole process is illustrated in the following Figure 1.Following the principle of simplification and feasibility and considering the characteristics of BtoC website, we cluster the 28 index items into 10. The precision rate is over 0.75.5.3 Calculate Correlation Index and Extract theRepresentative Indexes First, we calculate the correlation index that is the average of R between one index and every other index in the same cluster.一个公式mi in this formula is the number of the indexes in the cluster that index Xj belongs to.Then, we select the index with the maximal correlation index in the total 10 clusters individually and identify 10 of them as the most representative indexes.Finally, the weights of the indexes are derived by the expert grade method. The final indexes and their weights are shown in the following table 2.6. CONCLUSIONIn this paper, we have proposed an index system for quality synthesis evaluation and diagnos is of the BtoC websites following the ‘Grey Box’ evaluation principle, and scientifically determined and simplified the index items.Usually, factor analysis or principal component analysis is used to solve the problem of common-factor and multiple indexes. 
But these methods are only suitable for the quantitative indexes, and the evaluation process is not truly simplified. Because the new index is the linear function of some original ones, it still needs to calculate the value of new indexes by collecting all the values of the original ones.In our index system, most of index is descriptive one. So we have finalized the indexes by using the R-Hierarchical clustering method. It really has reduced the number of the evaluation indexes without losing the major information from the original indexes. Furthermore, it has effectively avoided the impact of common-factors on the evaluation result.Only the index of system efficiency can be measured through quantitative sub-indexes such as response time, etc. Most of depictive indexes are subjective and fuzzy. In view of this, we should use fuzzy comprehensive analysis method to evaluate to get more efficiency result.In our future work we are intended to propose an evaluation model and conduct evaluation to some famous domestic BtoC websites to prove if this index system is scientific and feasible. Moreover, we will improve this set of index system including evaluation model to make the whole set of index system more feasibility.译文2质量BotC企业网站的综合评估的一个索引系统摘要:对于成功的电子商业来说,拥有一个高质量的商务网站是很重要的。
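The R-hierarchical clustering and representative-index extraction described in Section 5 above can be prototyped with standard tools. The sketch below is illustrative only: the 50-by-28 score matrix is random stand-in data, the function name select_representative_indexes is invented for the example, and the correlation index is computed as the average correlation of an index with the other members of its cluster, as the text describes (the paper's own formula is not legible in this excerpt).

```python
# Sketch of the R-hierarchical clustering step: cluster the 28 candidate
# indexes by their mutual correlation and keep one representative per cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def select_representative_indexes(scores: np.ndarray, n_clusters: int):
    """scores: (n_samples, n_indexes) matrix of index measurements."""
    r = np.corrcoef(scores, rowvar=False)      # correlation between indexes
    dist = 1.0 - np.abs(r)                     # highly correlated = close
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="single")  # minimum distance
    labels = fcluster(z, t=n_clusters, criterion="maxclust")

    representatives = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # correlation index: average |r| of an index with the other
        # indexes of its own cluster; keep the index maximizing it
        avg_r = []
        for i in members:
            others = members[members != i]
            avg_r.append(np.abs(r[i, others]).mean() if len(others) else 1.0)
        representatives.append(int(members[int(np.argmax(avg_r))]))
    return labels, sorted(representatives)

# Example with random stand-in data: 50 evaluated websites, 28 indexes
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 28))
labels, reps = select_representative_indexes(data, n_clusters=10)
print("representative index columns:", reps)
```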


We present a software toolbox for symmetric band reduction, together with a set of testing and timing drivers. The toolbox contains routines for the reduction of full symmetric matrices to banded form and of banded matrices to narrower banded or tridiagonal form, with optional accumulation of the orthogonal transformations, as well as repacking routines for storage rearrangement. The functionality and the calling sequences of the routines are described, with a detailed discussion of the "control" parameters that allow adaptation of the codes to particular machine and matrix characteristics. We also briefly describe the testing and timing drivers included in the toolbox. Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra - Bandwidth reduction, orthogonal transformations, eigenvalues; G.4 [Mathematical Software]. Additional Key Words and Phrases: symmetric matrices, tridiagonalization, blocked Householder transformations



Layered Peer-to-Peer Streaming∗Yi Cui,Klara NahrstedtDepartment of Computer Science University of Illinois at Urbana-Champaign{yicui,klara }@ABSTRACTIn this paper,we propose a peer-to-peer streaming solu-tion to address the on-demand media distribution problem.We identify two issues,namely the asynchrony of user re-quests and heterogeneity of peer network bandwidth.Our key techniques to address these two issues are cache-and-relay and layer-encoded streaming.A unique challenge of layered peer-to-peer streaming is that the bandwidth and data availability (number of layers received)of each receiv-ing peer are constrained and heterogeneous,which further limits the bandwidth and data availability of its downstream node when it acts as the supplying peer.This challenge dis-tinguishes our work from existing studies on layered multi-cast.Our experiments show that our solution is efficient at utilizing bandwidth resource of supplying peers,scalable at saving server bandwidth consumption,and optimal at max-imizing streaming qualities of all peers.Categories and Subject DescriptorsC.2.2[Network Protocols ]:Applications; C.2.5[Localand Wide-Area Networks ]:Internet;D.4.4[Communications Management ]:Network Communication;D.4.8[Performance ]:SimulationGeneral TermsAlgorithms,Design,PerformanceKeywordsPeer-to-Peer,Layered Streaming,OverlayThis work was supported by the National Science Founda-tion under contract number 9870736,the Air Force Grant under contract number F30602-97-2-0121,National Science Foundation Career Grant under contract number NSF CCR 96-23867,NSF CISE Infrastructure grant under contract number NSF EIA 99-72884,and NASA grant under con-tract number NASA NAG 2-1250.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.NOSSDAV’03,June 1–3,2003,Monterey,California,USA.Copyright 2003ACM 1-58113-694-3/03/0006...$5.00.1.INTRODUCTIONLarge-scale on-demand multimedia distribution has been shown as one of the killer applications in current and next-generation Internet.In existing solutions,multicast has been extensively employed since it can effectively deliver media data to multiple receivers,while minimizing server and network overhead.However,the nature of multicast is intrinsically in conflict with some important features of media distribution,namely the asynchrony of user requests and heterogeneity of client resource capabilities.For asyn-chrony,a user may request media data at any time,which is against the synchronous transmission manner of multicast.For heterogeneity,clients may request stream of different qualities due to their own resource constraints,especially network bandwidth.Therefore,no single multicast stream can meet everyone’s requirement.These issues are illus-trated in Fig.1.Figure 1:Asynchrony and Heterogeneity in On-Demand Media DistributionDespite the above conflicts and numerous solutions to resolve them[2][9][11],the most serious problem faced by multicast today is the deficiency of its deployment in the wide-area network infrastructure.Moreover,this problem is likely to persist in the foreseeable future.As an alter-native,application-layer multicast [14]is proposed.In this approach,end systems,instead of routers,are 
organized into an overlay to relay data to each other in a peer-to-peer fash-ion.Besides its initial success in addressing the absence of IP multicast,application-layer overlay is now believed to of-fer us more,because end system is flexible enough to provide more functionalities than simply forwarding packets,such as caching[16],data/service indirection[6],resilient routing[1],peer-to-peer streaming[5][4][13],etc.Bearing this in mind,we now revisit the aforementioned issues of asynchrony and heterogeneity in peer-to-peer streaming.First,recent studies on cache-and-relay[10][16]show promis-ing solutions to resolve the conflict between asynchronous requests and synchronous multicast transmission in peer-to-peer streaming.By delaying the received stream through caching,an end host could relay it to another peer,who re-quests the same data at a later time.Our previous study[16] further showed that this approach introduces less server/network overhead than IP-multicast-based solutions.Second,the layer-encoded streaming approach[11][9][12] has the potential to address the issue of heterogeneity.For example,in layered multicast[11][12],a stream is encoded into multiple layers,then fed into different IP multicast ses-sions.A receiver with constrained bandwidth only needsto receive a subset of all layers to decode the stream with certain degraded quality.However,existing layered streaming solutions cannot be directly applied to peer-to-peer system.The fundamental reason roots at the dual role of the end host as both sup-plying peer and receiving peer.First,as a receiving peer,an end host may receive only a subset of all layers,due toits limited inbound bandwidth or processing capabilities.In peer-to-peer system,this means that its data availability asa supplying peer is also constrained,which further limitsthe data availability of its downstream peers.Second,the outbound bandwidth of relaying nodes are limited and het-erogeneous.This means that as a supplying peer,it has con-strained bandwidth supply to the downstream nodes.These problems never arise in layered multicast,whereby the end host is always the sink of data path.To summarize,these unique challenges distinguish our study from previous onesin layered streaming.The rest of this paper is organized as follows.In Sec.2,we formulate the problem of layerd peer-to-peer streaming and show its NP-complete complexity.In Sec.3,we present our solution.Sec.4evaluates its performance.Sec.5reviewsthe related work.Finally,Sec.6concludes.2.PROBLEM FORMULATIONWe consider the distribution of a layer-encoded stream across a set of peers with heterogeneous inbound and out-bound bandwidths.As shown in Fig.2,a peer can retrievethe stream at any time,requiring arbitrary quality,i.e.,arbi-trary number of layers.Each peer caches the received streamin a local circular buffer for a certain amount of time.In other words,a buffer window is kept along the playbackof the stream,say5minutes.In this way,any later peer (within5minutes)requesting the same stream can receivethe buffered data from the current host.Furthermore,it can retrieve different layers from multiple peers in parallel. 
For example,in Fig.2,H3is3minutes later than H1and2 minutes later than H2.Thus,H3can stream from H1andH2.On the other hand,H4can only stream from H3sinceit is already outside the buffer windows of H1and H2.We consider the media server(H0)as a special peer,which stays online permanently and has all layers of the stream.A good peer-to-peer streaming solution should consider two factors.Thefirst factor is the overall streaming qualityof all peers,regarded as the system benefit.The second factor is the server bandwidth consumption,regarded as the system cost.Therefore,our goal is to maximize the net benefit(system benefit excluding system cost),under each00:04Figure2:Layered Peer-to-peer Streaming peer’s bandwidth constraint.We introduce the following definitions.•Layer-encoded Stream{l0,l1,l2,...,l L},with l0as the base layer and others as enhancement layers.These layers are accumulative,i.e.,l i can be only decoded if layers l0through l i−1are available.For now,we as-sume that all layers have identical streaming rate.In Sec.3.4,we will relax this assumption.•Peers{H0,H1,H2,...,H N},with H0as the server.H1through H N are sorted based on their requesting times.Furthermore,we use I k and O k to denote the inbound and outbound bandwidths of H k.For the purpose of simplicity,I k and O k are measured as the number of layers H k can receive and send.•Streaming Quality Q k,the streaming quality re-ceived at H k,measured as number of received lay-ers.In particular,Q m k is used to denote the number of layers received from the supplying peer H m.Con-sequently,Q0k is the number of layers received from server H0.•Layer Availability A k,number of layers available at the cache of H k.In order to save cache space,H k is allowed to discard some received layers(always the highest ones,i.e.,the most enhanced ones)after play-back.Therefore,A k≤Q k.•Buffer Length B k,measured as the time length ofH k’s buffer.Note that with the same cache space,B kmay vary depending on how many layers H k caches.•Request Time t k,the time at which H k starts to playback the stream.For two peers H k and H m,if t k<t m and t m−t k≤B k,then H k is qualified asH m’s supplying peer.In other words,H m falls withinthe buffer window of H k,denoted as H k→H m.Given that the server H0can stream to all peers at any given time,we have H0→H k(k=1,...,N).•Supplying Peer Constraint C k,which sets the max-imum number of supplying peers H k can stream from in parallel.We do so to lower the synchronization com-plexity of parallel streaming.If the streaming qualityQ k still cannot be met by all the supplying peers due to their limited outbound bandwidth or layer availabil-ity,we allow the server H 0to send the missing layers to H k .Given above definitions,we can now formalize our goal asmaximize P N k =1(Q k −Q 0k )subject to (1)Q k ≤I k (1≤k ≤N )(2)P H k →H m Q km ≤O k (1≤k ≤N )The first constraint states that the streaming quality (num-ber of layers received)of each peer H k should not exceedits inbound bandwidth.The second constraint is that as a supplying peer,H k cannot output more layers than its out-bound bandwidth allows to send.Theorem 1:Layered Peer-to-peer Streaming problem is NP-complete.Proof is provided in Appendix A.YERED PEER-TO-PEER STREAMINGSOLUTIONWe now present our solution.We have following design objectives.First,each peer H k should configure its own streaming session (what layers to stream from which peer)locally .As Theorem 1showed that having global view and control does not help to reduce the problem complexity,this 
approach is more practical and cost-effective 1.Second,as the client requests come at different times,our solution should be incremental ,i.e.,the streaming quality of existing peers must be preserved when admitting new peers.As such,a new client H k can only utilize the residual outbound bandwidth of its supplying peers.3.1Basic AlgorithmWe first present the basic algorithm,which assumes that H k is allowed to receive data from unlimited number of senders.The algorithm is a greedy approach,in that it always maximally utilizes the outbound bandwidth of the peer with the smallest number of layers.Executed by H k ,the algorithm takes the following inputs:(1)I k ,the inbound bandwidth of H k ;(2)S ={H 1,...,H M },a set of hosts qualified as the supplying peers of H k .This means that H k falls within the buffer windows of H 1through H M .These peers are sorted such that A 1≤A 2≤...≤A M .The algorithm returns the streaming quality Q k ,and collects the selected supplying peers into set P .Fig.3illustrates our algorithm.H k ’s inbound bandwidth allows to receive 11layers,as represented by the white bar.It requests individual layers from supplying peers H 1through H 4.H 1has 3layers in its cache,as represented by its bar.Likewise,H 2has 4layers and so on.The shadowed squares 1In order to do so,H k needs to know who are qualified as its supplying peers and their layer availability and cur-rent outbound bandwidth.The dissemination and acqui-sition of such information can be done through any pub-lish/subscribe service,which is considered orthogonal and not covered in this paper.Interested readers are also refer-eed to our previous study[15],which proposed a time-based publish/subscribe solution to address this problem.UnconstrainedSenders (I k ,H 1,...,H M )/*Calculation of Q k */1Q k ←02P ←φ3m =14repeat 5Q m k ←min (O m ,A m −Q k ,I k −Q k )6enque (P,H m )7Q k ←Q k +Q m k 8m ←m +19until Q k =I k or m >M /*Layer Allocation */10index ←011repeat12H m ←deque (P )13allocate layers P index +Q m k −1i =index l i to H m14O m ←O m −Q mk15index ←index +Q m k 16until P =φTable 1:Basic Algorithmof H 1through H 4represent the number of layers their out-bound bandwidths allow to send.Note that these shadowed squares only indicates how many layers onecan send.For example,H 3has six layers available in its cache,and is able to output any five of them.Black squares represent layers which are already allocated.(d)(e)(f)Figure 3:Sample Illustrating the Basic Algorithm Initially,Q k at H k is 0(Fig.3(a)).H k first uses up the outbound bandwidth of H 1and increases Q k to 2(Fig.3(b)).Then,although H 2is able to output four layers,H k only needs two layers from it,since the lower two are already allocated to H 1(Fig.3(c)).For the same reason,H 3only needs to output two layers.Note that its allocated layers shift up,since H 3is free to output any layers available in its cache (Fig.3(d)).After H 4is done,S becomes empty and Q k is still two layers lower than the expected quality of H k(Fig.3(e)).Finally,we ask the server to send the missinglayers to H k(Fig.3(f)).Theorem2:The basic algorithm can allocate maximum number of layers for H k.We leave the proof in Appendix B.3.2Enhanced Algorithm under Supplying PeerConstraintWe now turn to a more constrained scenario,where H k can only stream fromfinite number of supplying peers.As-suming the existence of S={H1,...,H M}defined in Sec.3.1, let us denote Q∗k(M,1)as the optimal solution if H k can only choose one supplying peer from H1through H M.This solution is 
straightforward:we only need tofind the peer which can send the largest number of layers.Then for Q∗k(M,2),assuming Q∗k(m,1)is already known,we canfind the one from H m+1to H M that is able to further contribute the most layers,denoted as Q max(H m+1,...,H M).Then Q∗k(m,1)+Q max(H m+1,...,H M)is a candidate answer to Q∗k(M,2).Repeating this procedure from m=1to M−1, the best solution can be found asQ∗k(M,2)=max[Q∗k(m,1)+Q max(H m+1...H M)]1≤m<MIn general,we haveQ∗k(M,C k)=max[Q∗k(m,C k−1)+Q max(H m+1...H M)]C k−1≤m<MThis is a typical dynamic programming problem.We show our algorithm in Tab.2.Theorem3:Under constraint C k,the enhanced algo-rithm can allocate maximum number of layers for H k.We leave the proof in Appendix C.The algorithm com-plexity is O(C k M2).3.3Node DepartureNode departure can happen frequently,due to user logout, machine crash or network failure.Upon losing a supplying peer H m,the receiving peer H k should reconfigure its ses-sion by rerunning the layer allocation algorithm.However, during this transient period,its streaming quality may de-grade.Here we discuss how our solution can adapt to this situation.First,if H m departs normally,it will notify H k to recon-figure its session.Meanwhile,H m continues to stream data remained in its buffer to H k as normal.Therefore,as long as the reconfiguration of H k’s sessionfinishes before H m’s buffer is drained,H k will stay unaffected.Otherwise,H m can be regarded as failed,which will be addressed below.If H m fails,upon detecting it,H k has two options.First, It can temporally request from the server the layers which were allocated to H m,until its session reconfiguration isfin-ished.In other words,during this time,the server acts as H m.Second,if the server bandwidth is already fully occu-pied,then the streaming quality of H k has to be degraded gracefully.As an example,in Fig.4,H k initially received data from supplying peers H1through H4.When H2fails, H k asks other peers to stream as usual.However,H3’s lay-ers are shifted down to meet the gap left by H2.H4’s layersConstrainedSenders(I k,C k,H1,...,H M)/*Initialization*/1for c←0to C k2for m←0to M3Q∗k(m,c)←04P∗(m,c)←φ/*Calculation of Q∗(M,C k)*/6for c←1toC k7for m←c to M−C k+c8for t←m to M−C k+c9Q ←min(O t,A t−Q∗k(m−1,c−1),I k−Q∗k(m−1,c−1))10if Q∗k(t−1,c)>Q∗k(t,c)or11Q∗k(m−1,c−1)+Q >Q∗k(t,c)12begin13if Q∗k(t−1,c)<Q∗k(m−1,c−1)+Q14then15for i←1to m−116Q i∗k(t,c)←Q i∗k(m−1,c−1)17Q t∗k(t,c)←Q18P∗(t,c)←P∗(m−1,c−1)∪{H t}19Q∗k(t,c)←Q∗k(m−1,c−1)+Q t∗k(t,c) 20else21for i←1to t−122Q i∗k(t,c)←Q i∗k(t−1,c)23P∗(t,c)←P∗(t−1,c)24Q∗k(t,c)←Q∗k(t−1,c)25end/*Layer Allocation*/26index←027repeat28H m←deque(P∗(M,C k))29Q m k←Q m∗k(M,C k)30allocate layersP index+Q mk−1i=indexl i to H m 31O m←O m−Q m k32index←index+Q m∗k(M,C k)33until P∗(M,C k)=φTable2:Enhanced AlgorithmHHHHFailedFigure4:Graceful Degradation of Streaming Qual-ity(when server bandwidth is fully occupied)are shifted down likewise.Thus,H k ’s quality only drops by H 2’s share.Another concern is that the quality degradation of H k could further cause the quality degradation of its children.In this case,H k can be regarded as normal departure.As explained earlier in this subsection,the streaming quality of H k ’s children will not be affected until H k ’s buffer is drained.Therefore,buffering can effectively absorb the propagation of quality degradation.3.4Layer Rate HeterogeneitySo far in this paper,we have assumed that all layers have identical streaming rate.In practice,however,this is often not the case[12].To show the complexity 
of this problem,we first go through the following (re)definitions.•r i ,the streaming rate of a layer l i (Kbps).•I k and O k ,the inbound and outbound bandwidth of H k ,measured as raw data rate (Kbps).We define the Heterogeneous-Rate Layer Allocation Prob-lem as follows.Given a set of layers {l 0,...,l L }with differ-ent streaming rates {r 0,...,r L },the receiving peer H k ,and a set of supplying peers S ={H 1,...,H M },find an optimal solution,which allocates maximum number of layers for H k .Theorem 4:Heterogeneous-Rate Layer Allocation Prob-lem is NP-complete.We leave the proof in Appendix D.We modify existing algorithms (Tab.1and Tab.2)to accommodate the layer rate heterogeneity.They are shown in Tab.3and Tab.4,respectively.To save space,we only show the modified part of each algorithm.UnconstrainedSenders (I k ,H 1,...,H M )...4repeat5Q m k ←max (n |P Q k +n i =Q kr i ≤O m ,P Q k +ni =0r i ≤I k Q k +n ≤A m )6enque (P,H m )7Q k ←Q k +Q m k 8m ←m +19until P Q ki =0r i >I k or m >M...Table 3:Modified Basic Algorithm for Layer Rate HeterogeneityConstrainedSenders (I k ,C k ,H 1,...,H M )...9Q ←max (n |P Q ∗k (m −1,c −1)+ni =Q ∗k (m −1,c −1)r i ≤O t ,P Q ∗k (m −1,c −1)+ni =0r i ≤I k ,O ∗k (m −1,c −1)+n ≤A t )...Table 4:Modified Enhanced Algorithm for Layer Rate Heterogeneity4.PERFORMANCE EV ALUATIONWe simulate a peer-to-peer streaming system of total 40000peers.We categorize peers into three classes:(1)Modem/ISDN peers,which take 50%of the population with maximum to-tal bandwidth of 112Kbps ;(2)Cable Modem/DSL peers,which take 35%of the population with maximum total band-width of 1Mbps ;and (3)Ethernet peers,which take rest of the population with maximum total bandwidth of 10Mbps .Each peer requests the 60-minute video at different times during the 24-hour run.The layer rate of the video is 20Kbps .Its full-quality streaming rate is 1Mbps ,which consists of 50layers.4.1Overall Streaming Quality and ScalabilityWe compare our solution with versioned streaming ,an-other commonly used solution to address the end host het-erogeneity.In our experiment,the video is encoded into 50versions with different streaming rates.Each version is distributed using an independent application-layer multicast tree.0.10.20.30.40.50.60.70.80.910.60.81 1.2 1.4 1.6 1.82a v e r a g e q u a l i t y s a t i s f a c t i o naverage outbound/inbound bandwidth ratiolayered versionedFigure 5:Overall Streaming Quality (Request Rate=120req/hr ,Buffer Length=5min )We first test these two solutions at utilizing the outbound bandwidth of supplying peers.Since each peer may ex-pect different streaming qualities,we propose a new met-ric Streaming Quality Satisfaction ,which is defined as the ratio of received quality and expected quality of a peer H k ,namely Q k /I k .The maximum value is 1.As shown in Fig.5,when the average ratio of each peer’s outbound/inbound bandwidth is greater or equal to 1,the average quality sat-isfaction of layered approach is almost 1,which means that the peers can virtually self-support.On the other hand,the curve of versioned approach never goes over 0.7.More-over,when the outbound/inbound ratio is below 1,the per-formance of layered approach degrades linearly,which in-dicates that it is always able to fully utilize the marginal outbound bandwidth of supplying peers.In comparison,the curve of versioned approach drops suddenly since most supplying peers’outbound bandwidth cannot send out the entire video.This reveals that the layered approach is more adapted to the bandwidth asymmetricity (outbound 
band-width less than inbound bandwidth),which is often the case for Cable Modem and ADSL users.We then test the scalability of these two solutions at sav-ing server bandwidth.Fig.6shows that,when client request rate grows,the server bandwidth consumption of layered approach actually drops.The main reason is that,when a requesting peer joins,it also acts as a supplying peer to the01002003004005006007000102030405060708090100s e r v e r b a n d w i d t h (K B y t e s /h o u r )rate (requests/hour)layered versionedFigure 6:Server Bandwidth Consumption (Aver-age Outbound/Inbound Bandwidth Ratio=1,Buffer Length=5min )following requesting peer,which in turn forms a chain.This chain gets longer when the average interarrival time of dif-ferent peers shortens.Such a chain effect also happens in the case of versioned streaming.However,since this approach always requires enough outbound bandwidth to output the entire video,only few supplying peers qualify,which causes the chain to be easily broken.4.2Impact of Design Parameters0.840.860.880.90.920.940.960.9812030405060708090100a v e r a g e q u a l i t y s a t i s f a c t i o nrate (req/hr)No Constraint Constraint=4Constraint=2Figure 7:Impact of Supplying Peer Constraint (Average Outbound/Inbound Bandwidth Ratio=1,Buffer Length=5min )For a receiving peer,limiting the number of supplying peers can help lowering the operation and synchronization complexity.On the other hand,it does not guarantee to maximally utilize the outbound bandwidth of all supplying peers,compared to the unconstrained case.Fig.7shows that constraining the number of senders to 4already ac-quires nearly identical performance to the unconstrained case,in terms of average quality satisfaction.Another important design parameter is the buffer length of each supplying peer.Apparently,longer buffer enables a supplying peer to help more later-coming peers,thus im-proving the overall streaming quality.As revealed in Fig.8,this is true when request rate is low.This can help keep the peers chain (Sec.4.1)from broken.Further increasing buffer size has very little improvement space,since it can help lit-tle at prolonging the chain.This finding suggests that with small-to-medium sized cache space (3or 5minutes out of an 1-hour video),the system can acquire great performance gain.0.450.50.550.60.650.70.750.80.850.90.95020040060080010001200a v e r a g e q u a l i t y s a t i s f a c t i o nbuffer time length (s)60 req/hr 120 req/hr 240 req/hrFigure 8:Impact of Buffer Length (Average Out-bound/Inbound Bandwidth Ratio=0.8)4.3FairnessAs revealed in Fig.5,when the average ratio of each peer’s outbound/inbound bandwidth is below 1,the average qual-ity satisfaction drops correspondingly,i.e.,not every peer’s streaming quality can be as good as expected.As such,a fair solution should ensure that such deviation does not greatly vary from one peer to another.c u m u l a t i v e p e r c e n t a g e o f r e q u e s t squality satisfaction(a)Different Outbound/Inbound Bandwidth Ratios0.20.40.60.8100.20.40.60.81c u m u l a t i v e p e r c e n t a g e o f r e q u e s t squality satisfactionModem/ISDN Cable Modem/DSLEthernet(b)Different Peer Classes (Outbound/Inbound Ratio=1)Figure 9:Cumulative Distribution of Qual-ity Satisfaction (Request Rate=120req/hr ,Buffer Length=5min )We plot the cumulative distribution of peers with differ-ent quality satisfaction in Fig.9(a).When the average out-bound/inbound ratio is 0.5,about 50%of the peers acquire the expected streaming quality.Then the quality satisfac-tion 
decreases almost linearly among the rest of the peers.We observe the similar trend when increasing the average outbound/inbound ratio.However,when the ratio becomes greater or equal than 1,only 90%of the population receivethe full quality satisfaction.Furthermore,this percentage stays unimproved when we further enlarge the outbound bandwidth of supplying peers.We find the answer in Fig.9(b).In Fig.9(b),we study the distribution of quality sat-isfaction over different peer classes when the average out-bound/inbound bandwidth ratio is 1.Although all Mo-dem/ISDN peers receive the full quality satisfaction,this is not the case for 5%of Cable Modem/DSL peers.For Ethernet peers,over 40%of them do not receive the stream quality as expected.The main reason is that when a peer of higher class (e.g.,Ethernet)joins,it can happen that all its supplying peers belong to the lower classes (e.g.,Cable Modem or ISDN).Therefore,even when these peers have available outbound bandwidth,they still do not have higher stream layers,which are requested by the peer of higher class.c u m u l a t i v e p e r c e n t a g e o f r e q u e s t sbandwidth contribution(a)Different Outbound/Inbound Bandwidth Ratios0.20.40.60.8100.20.40.60.81c u m u l a t i v e p e r c e n t a g e o f r e q u e s t sbandwidth contributionModem/ISDN Cable Modem/DSLEthernet(b)Different Peer Classes (Outbound/Inbound Ratio=1)Figure 10:Cumulative Distribution of Outbound Bandwidth Contribution(Request Rate=120req/hr ,Buffer Length=5min )We then evaluate whether our solution enables each peer to fairly contribute its outbound bandwidth.As shown in Fig.10(a),when the outbound/inbound ratio is 0.5,each peer contributes all of its bandwidth.When the ratio is 1,only 90%of all peers contribute all of its bandwidth.The contribution decreases linearly among the rest of the peers.Again,this can be explained when we plot the distribution of bandwidth contribution over different peer classes in Fig.10(b).Fig.10(b)exhibits the similar pattern with Fig.9(b).All Modem/ISDN peers contribute all of their bandwidths.This is mainly due to the greedy nature of our layer allocation algorithm,which always first exploit the peers with smallest number of layers.Almost 40%of the Ethernet peers only partially contribute their bandwidth.This is mainly becausethat they mostly stream to the lower-class peers,who always first request layers from supplying peers of the same class,if any.To this end,we conclude that both data availability constraint and bandwidth availability constraint of supplying peers have impact on the issue of fairness.4.4RobustnessTo test the robustness of our solution,we inject random node departures/failures into our simulation.We are mainly interested with the ability of our solution at absorbing the transient failure during stream session reconfiguration via buffering (Recall Sec.3.3).In our experiment,50%of the supplying peers depart early before the playback is finished.These peers are further categorized into normal departure peers and failed peers .A normal departure peer notifies its children upon leaving,but continues to stream until its buffer is drained.The children will stay unaffected if they can finish reconfiguring their sessions before the buffer is drained.Otherwise,they have to experience temporal qual-ity degradation,as depicted in Fig.4.On the other hand,if a peer fails,its children will be definitely affected.We use Failure Ratio to denote the percentage of failed ones among all departure peers.0.20.40.60.815101520253035404550p e 
r c e n t a g e o f a f f e c t e d p e e r sreconfiguration time (s)No Constraint, Failure Ratio 1.0Constraint=2, Failure Ratio 1.0No Constraint, Failure Ratio 0.5Constraint=2, Failure Ratio 0.5Figure 11:Percentage of Affected Peers (Request Rate=120req/hr ,Buffer Length=5min ,Average Out-bound/Inbound Bandwidth Ratio=1)As shown in Fig.11,buffering can effectively “mask”more than half of the peer departures,when the average session reconfiguration time is small (5seconds).The effect of buffering diminishes as the failure ratio grows.Eventu-ally,it is rendered useless when all departure peers are failed ones.Also,one can impose supplying peer constraint to ef-fectively lower the percentage of affected peers.However,as a side effect,the average quality degradation is higher than the unconstrained case (Fig.12).The reason is that when the number of supplying peers is constrained,in order to maximize the streaming quality,the enhanced layer allo-cation algorithm (Sec.3.2)always chooses supplying peers that can contribute most number of layers.Therefore,when one of them departs or fails,it is likely to incur more quality degradation.4.5Layer Rate HeterogeneityEncoding a stream into more layers can help us better utilize the marginal inbound/outbound bandwidth of peers,therefore increases the average streaming quality and helps save server cost.However,the price is that we have to put re-dundant information into each layer.Such inefficiency adds up as we increase the number of layers.。
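Restated in conventional notation (using the definitions of Section 2, where $Q_k^m$ denotes the number of layers $H_k$ receives from supplier $H_m$, so $Q_m^k$ is what $H_k$ sends to $H_m$), the optimization goal formulated in Section 2 reads:

$$
\max \sum_{k=1}^{N} \bigl(Q_k - Q_k^0\bigr)
\quad\text{subject to}\quad
Q_k \le I_k,\qquad
\sum_{m \,:\, H_k \to H_m} Q_m^k \le O_k,
\qquad 1 \le k \le N .
$$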
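A runnable counterpart to the basic algorithm of Table 1 is sketched below in Python. The supplier values for H1 through H3 follow the description of Fig. 3; the values for H4 are assumed, since the excerpt does not state them, and the names Peer and allocate_unconstrained are illustrative rather than taken from the original work.

```python
# Sketch of the basic (unconstrained-senders) algorithm of Table 1.
# Suppliers must be pre-sorted by layer availability A_1 <= ... <= A_M.
from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    avail: int    # A_m: number of layers in the cache
    out_bw: int   # O_m: layers the peer can still send

def allocate_unconstrained(in_bw, suppliers):
    """Return (Q_k, allocation) where allocation maps a supplier name
    to the half-open range of layer indexes it will send."""
    q_k, allocation = 0, {}
    for peer in suppliers:                       # sorted by peer.avail
        if q_k >= in_bw:
            break
        # line 5 of Table 1: min(O_m, A_m - Q_k, I_k - Q_k)
        share = min(peer.out_bw, peer.avail - q_k, in_bw - q_k)
        if share > 0:
            allocation[peer.name] = (q_k, q_k + share)   # layers l_qk .. l_(qk+share-1)
            peer.out_bw -= share
            q_k += share
    return q_k, allocation

# Example loosely following Fig. 3: I_k = 11; H4's values are assumed.
suppliers = [Peer("H1", 3, 2), Peer("H2", 4, 4), Peer("H3", 6, 5), Peer("H4", 9, 3)]
print(allocate_unconstrained(11, suppliers))
# -> (9, {'H1': (0, 2), 'H2': (2, 4), 'H3': (4, 6), 'H4': (6, 9)})
```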
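The enhanced algorithm of Table 2 is, at its core, a take-or-skip dynamic program: the best quality using at most c senders among the first t suppliers is the maximum of skipping supplier t or adding its contribution min(O_t, A_t - Q, I_k - Q) on top of the best (c-1)-sender solution. The sketch below implements that recurrence with a simple backtrack; it is a compact reinterpretation of Table 2, not a line-by-line transcription, and the supplier tuples in the example are assumed values.

```python
# Sketch of the dynamic-programming recurrence behind Table 2.
def allocate_constrained(in_bw, suppliers, max_senders):
    """suppliers: list of (avail, out_bw) tuples, sorted by avail."""
    M = len(suppliers)
    # q[c][t]: best quality with at most c senders among the first t peers
    q = [[0] * (M + 1) for _ in range(max_senders + 1)]
    choice = [[None] * (M + 1) for _ in range(max_senders + 1)]
    for c in range(1, max_senders + 1):
        for t in range(1, M + 1):
            avail, out_bw = suppliers[t - 1]
            base = q[c - 1][t - 1]
            gain = max(0, min(out_bw, avail - base, in_bw - base))
            skip, take = q[c][t - 1], base + gain
            q[c][t] = max(skip, take)
            choice[c][t] = ("take", gain) if take >= skip else ("skip", 0)
    # backtrack to recover which peers send how many layers
    plan, c, t = [], max_senders, M
    while c > 0 and t > 0:
        kind, gain = choice[c][t]
        if kind == "take":
            if gain > 0:
                plan.append((t - 1, gain))    # (supplier index, number of layers)
            c -= 1
        t -= 1
    return q[max_senders][M], plan[::-1]

# Example: I_k = 11, at most 2 senders, suppliers given as (A_m, O_m)
print(allocate_constrained(11, [(3, 2), (4, 4), (6, 5), (9, 3)], 2))
# -> (8, [(2, 5), (3, 3)]): H3 sends 5 layers, H4 sends 3
```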



If Your Bug Database Could Talk...Adrian Schr¨oter·Thomas Zimmermann·Rahul Premraj·Andreas ZellerSaarland UniversitySaarbr¨ucken,Germany{schroeter|zimmerth|premraj|zeller}@st.cs.uni-sb.deABSTRACTWe have mined the Eclipse bug and version databases to map fail-ures to Eclipse components.The resulting data set lists the defect density of all Eclipse components.As we demonstrate in three sim-ple experiments,the bug data set can be easily used to relate code, process,and developers to defects.The data set is publicly avail-able for download.Categories and Subject DescriptorsD.2.7[Software Engineering]:Distribution,Maintenance,and Enhancement—version control; D.2.8[Software Engineering]: Metrics—Complexity measures,Process metrics,Product metrics;D.2.9[Software Engineering]:Management—Software quality assurance(SQA)General TermsManagement,Measurement,Reliability1.INTRODUCTIONWhy is it that some programs are more failure-prone than others? This is one of the central questions of software engineering.To an-swer it,we mustfirst know which programs are more failure-prone than others.With this knowledge,we can search for properties of the program or its development process that commonly correlate with defect density;in other words,once we can measure the ef-fect,we can search for its causes.One of the most abundant,widespread,and reliable sources for fail-ure information is a bug database,listing all the problems that oc-curred during the software life time.Unfortunately,bug databases frequently do not directly record how,where,and by whom the problem in question wasfixed.This information is hidden in the version database,recording all changes to the software source code.In recent years,a number of techniques have been devel-oped to relate bug reports tofixes[6,3,2].Since we thus can relate bugs tofixes,andfixes to the locations they apply to,we can easily determine the defect density of a component—simply by counting the appliedfixes.We have conducted such a work on the code base of the Eclipse programming environment.In particular,we have computed the mapping of classes to the number of defects that were reported in thefirst six months before and after release,respectively.We have made this Eclipse bug data set freely available,such that anyone can use it for research purposes.Figure1shows an excerpt of the data set in XML format.The file Plugin.java had5failures(and thus defects)before release3.0 (“pre”);it had one failure after release(“post”).The enclosing package org.eclipse.core.runtime contains43files(“points”)and encountered16failures before and one failure after release3.0; on average eachfile in this package had0.609failures before and 0.022failures after release(“avg”).1What can one do with such data?In this paper,we illustrate how the data set can be used to address simple research questions:•Can one predict failure-proneness from metrics like code complexity?(Section3)•What does a high number of bugs found during testing mean for the number of bugs found after release?(Section4)•Do some developers write more failure-prone code than oth-ers?(Section5)This paper does not attempt to give definitive answers on these questions,but merely highlights the potential of bug data when it comes to answer these questions.We hope that the public avail-ability of data sets like ours will foster empirical research in soft-ware engineering,just like the public availability of open source programs fostered research in program analysis.2.GETTING BUG DATAHow do we know which components failed and which did 
not? This data can be collected from version archives like CVS and bug tracking systems like BUGZILLA in two steps:1.We identify corrections(orfixes)in version archives:Withinthe messages that describe changes,we search for references to bug reports such as“Fixed42233”or“bug#23444”.Ba-sically every number is a potential reference to a bug report, however such references have a low trust atfirst.We increase the trust level when a message contains keywords such as “fixed”or“bug”or matches patterns like“#and a number”.This approach was previously used in research[3,2].1Since one failure can affect severalfiles in one package,the counts on package level cannot be aggregated fromfile level and therefore are provided separately.<defects project=”eclipse”release=”3.0”><package name=”org.eclipse.core.runtime”><counts><count id=”pre”value=”16”avg=”0.609”points=”43”max=”5”> <count id=”post”value=”1”avg=”0.022”points=”43”max=”1”> </counts><compilationunit name=”Plugin.java”><counts><count id=”pre”value=”5”><count id=”post”value=”1”></counts></compilationunit><compilationunit name=”Platform.java”><counts><count id=”pre”value”1”><count id=”post”value=”0”></counts></compilationunit>...</package>...</defects>Figure1:The Eclipse bug data set(excerpt).2.We use the bug tracking system to map bug reports to re-leases.The bug database versionfield lists the release for which the bug was reported;however,since thefield value may change during the life-cycle of a bug,we only use the first reported release.We distinguish two different kinds of failures:pre-release failures are observed during develop-ment and testing of a program,while post-release failures are observed after the program has been deployed to its users. Since we know the location of every failure that has beenfixed,it is easy to count the number of defects per location and release—resulting in the data set of Figure1.3.THE CODE FACTORSo where do these bugs come from?One hypothesis is that some code is more failure-prone than other because it is more complex. 
Complexity metrics attempt to quantify this complexity,mapping code to metric values.In earlier work on mining Microsoft bug databases[4],we could notfind a single metric that would correlate with bug density across multiple ing the Eclipse bug data set,we can easily check this result by correlating,for each class,complexity metrics with the number of bugs.Chidamber and Kemerer[1]proposed several code metrics that capture the complexity of a class.Table1lists the correlations of each of these metrics(gathered using the tool ckjm[7])with pre-release and post-release failures.Albeit weak,the most strongly correlated features2to pre-release and post-release failures include RFC(Response for a Class),CBO(Coupling Between Object classes)and WMC(Weighted Methods per Class).These results are in line with our previous research at Microsoft[4], thus suggesting that either new or a combination of existing metrics need to be explored to study the relationship between the complex-ity of code to the presence of bugs in a given class.One important predictor might be the domain of a component—in related work, we could predict the failure-proneness of an Eclipse package from its imports alone[5].2For detailed explanations of these code metrics,the reader is re-quested to refer to[1].Number of Pre-release failures Post-release failuresPearson Spearman Pearson Spearman Pre-release failures 1.00 1.000.260.19 Post-release failures0.260.19 1.00 1.00 WMC0.320.310.160.11 DIT0.070.110.000.01 NOC0.000.040.000.02 CBO0.360.400.230.12 RFC0.390.380.210.11 LCOM0.130.230.030.07 CA0.090.050.020.04 NPM0.200.180.110.09 Table1:Correlation of pre-release and post-eelease failures with code metricsNumber of Pre-release failures Post-release failuresPearson Spearman Pearson Spearman Pre-release failures 1.00 1.000.300.20 Post-release failures0.300.20 1.00 1.00 Changes0.340.440.140.15 Changes since2.10.470.560.190.17 Authors0.300.300.150.13 Authors since2.10.410.490.210.17 Table2:Correlation of process measurements with failures [Eclipse3.0].4.THE PROCESS FACTORAny problem that raises after product release indicates a defect not only in the product,but also in its process:Clearly,the de-fect should have been caught by quality assurancefirst.In practice, this may mean that the product was not tested enough.Therefore, we could turn to the testing process as a cause for the problem. Failures during testing are recorded as pre-release failures in bug tracking systems.Other measures for the development process are the number of changes and authors of afile.Tables2shows how these measurements correlate with each other.For pre-release fail-ures the correlation is highest for the number of changes(0.47)and authors(0.41)since release2.1.This is not surprising,since every pre-release failure also resulted in at least one change(namely the fix).Post-release failures show almost now correlation with process measurements,except for pre-release failures where the correlation is0.30.To summarize,it is difficult to predict post-release failures solely from process measurements.5.THE HUMAN FACTORAs a third andfinal example of using the Eclipse bug data set,let us turn to the ultimate cause of errors:humans.Unfortunately,data from one project alone is not enough to judge managerial decisions. However,we can turn to the developers and examine whether spe-cific developers are more likely to produce bugs than others. 
Tables3and4summarize pre-release and post-release bug patterns introduced by developers.In both tables,thefirst column lists the names of developers3and the second column lists the number of files owned by the developer.The latter was derived by attributing 3Names have been changed to maintain anonymity.Failure-densities Developer No.of Files PrRF/1000lines Avg.PrRF/File Frederick32016.42 2.81Peter9714.70 1.96Isaac1789.95 1.69Mary3929.35 1.84 London639.18 1.41David888.77 1.64Harry55 2.55 1.18 Tommy92 2.200.35King162 2.180.36 Charles63 1.820.43Nellie60 1.140.32 Robert580.470.17Table3:Pre-release failures by developerthefile to the developer(s)that owned most number of lines of code in afile and only those developers that owned50or morefiles were included in the analysis.Columns3and4record the number of pre-release and post-release failures per1000lines of code and the average number of pre-release and post-release failures perfile.For brevity,only thefirst and last six entries of each table are reported. In Table3,one observes substantial differences in pre-release fail-ure densities infiles(indicated by Columns3and4)between dif-ferent developers.However,such results should be carefully inter-preted.We suspect that the results do not indicate developer com-petency but instead,reflect the complexity of code they are work-ing on.Hence,developers with lesser pre-release or post-release failures are not necessarily better developers that the others.Our stance is further supported by there being no clear relation between the number offiles owned by a developer and the corresponding failure densities observed since experienced and better program-mers may own morefiles.Likewise,Table4again indicates a high variance in failure den-sity infiles owned by different developers,although the densities are smaller in comparison to pre-releasure failures.It is note-worthy that developer Frederick lists in Table3as the owner of thefiles with highest pre-release failure density,while in Table4, the same developer is the owner of nearly failure free post-release files.In contrast to Frederick,files owned by Tommy are less pre-release failure prone while the post-release failures are consider-ably higher.Hence,different developers are likely to introduce different num-ber of failures into the code for manifold possible reasons.We con-sider such information to be only the tip of the iceberg indicating directions for future investigations pertaining to the human factor in software development.6.CONCLUSION AND CONSEQUENCES Where do bugs come from?By mapping failures to components, the Eclipse bug data set offers the opportunity to research these questions.Our initial studies,as shown in this paper,do not give a definitive answer.However,they raise obvious follow-up ques-tions and indicate the potential of future empirical research based on such bug data.To support this very research,we are happy to make the bug data set publicly available.Failure-densities Developer No.of Files PoRF/1000lines Avg.PoRF/File Jack540.710.13 London630.520.08 Queen1110.510.20 Edward550.410.04 Samuel670.390.12 Tommy920.340.05 Alfred1520.030.01 Oliver1060.030.02 Frederick3200.020.00King1620.000.00 Benjamin1190.000.00 George520.000.00Table4:Post-release failures by developer Overall,we would like this set to become both a challenge and a benchmark:Which factors in programs and processes are the ones that predict future bugs,and which approach gives the best prediction results?The more we learn about past mistakes,the better are our 
chances to avoid these mistakes in the future—and build better software at lower cost.For access to the Eclipse bug data set,as well as for ongoing infor-mation on the project,seehttp://www.st.cs.uni-sb.de/softevo/ Acknowledgments.Our work on mining software reposito-ries is funded by the Deutsche Forschungsgemeinschaft,grant Ze509/1-1.Thomas Zimmermann is additionally funded by the DFG-Graduiertenkolleg“Leistungsgarantien f¨u r Rechnersysteme”.7.REFERENCES[1]S.R.Chidamber and C.F.Kemerer.A metrics suite for objectoriented design.IEEE Trans.Software Eng.,20(6):476–493, 1994.[2]D.Cubranic,G.C.Murphy,J.Singer,and K.S.Booth.Hipikat:A project memory for software development.IEEE Transactions on Software Engineering,31(6):446–465,June 2005.[3]M.Fischer,M.Pinzger,and H.Gall.Analyzing and relatingbug report data for feature tracking.In Proc.10th WorkingConference on Reverse Engineering(WCRE2003),Victoria, British Columbia,Canada,Nov.2003.IEEE.[4]N.Nagappan,T.Ball,and A.Zeller.Mining metrics to predictcomponent failures.In Proceedings of the InternationalConference on Software Engineering(ICSE2006).ACM,May2006.[5]A.Schr¨o ter,T.Zimmermann,and A.Zeller.Predictingfailure-prone components at design time.In Proceedings ofthe5th International Symposium on Empirical SoftwareEngineering(ISESE2006).ACM,Sept.2006.[6]J.´Sliwerski,T.Zimmermann,and A.Zeller.When do changesinducefixes?In Proc.International Workshop on MiningSoftware Repositories(MSR),St.Louis,Missouri,U.S.,May 2005.[7]D.Spinellis.Code Quality:The Open Source Perspective.Addison Wesley,2006.。

Stanford looks back at the evolution of SDN


Fabric:A Retrospective on Evolving SDNMartín CasadoNicira Teemu KoponenNiciraScott ShenkerICSI†,UC BerkeleyAmin TootoonchianUniversity of T oronto,ICSI†AbstractMPLS was an attempt to simplify network hardware while improving theflexibility of network control.Software-Defined Networking (SDN)was designed to make further progress along both of these dimensions.While a significant step forward in some respects,it was a step backwards in others.In this paper we discuss SDN’s shortcomings and propose how they can be overcome by adopting the insight underlying MPLS.We believe this hybrid approach will enable an era of simple hardware andflexible control. Categories and Subject DescriptorsC.2.5[Computer-Communication Networks]:Local and Wide-Area Networks—Internet;C.2.1[Computer-Communication Net-works]:Network Architecture and DesignGeneral TermsDesignKeywordsNetwork architecture1IntroductionThe advent of the Internet,and networking more generally,has been a transformative event,changing our lives along many dimensions: socially,societally,economically,and technologically.While the overall architecture is an undeniable success,the state of the networking industry and the nature of networking infrastructure is a less inspiring story.It is widely agreed that current networks are too expensive,too complicated to manage,too prone to vendor-lockin, and too hard to change.Moreover,this unfortunate state-of-affairs has remained true for well over a decade.Thus,while much of the research community has been focusing on“clean-slate”designs of the overall Internet architecture,a more pressing set of problems remain in the design of the underlying network infrastructure.That is,in addition to worrying about the global Internet protocols,the research community should also devote effort to how one could improve the infrastructure over which these protocols are deployed. This infrastructure has two components:(i)the underlying hardware and(ii)the software that controls the overall behavior of the network.An ideal network design would involve hardware that is:†International Computer Science InstitutePermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.HotSDN’12,August13,2012,Helsinki,Finland.Copyright2012ACM978-1-4503-1477-0/12/08...$15.00.•Simple:The hardware should be inexpensive to build andoperate.•Vendor-neutral:Users should be able to easily switchbetween hardware vendors without forklift upgrades.•Future-proof:The hardware should,as much as possible, accommodate future innovation,so users need not upgradetheir hardware unnecessarily.The ideal software“control plane”coordinating the forwarding behavior of the underlying hardware must meet a single but broad criterion:•Flexible:The software control plane should be structured sothat it can support the wide variety of current requirements(such isolation,virtualization,traffic engineering,accesscontrol,etc.)and,to the extent possible,be capable of meetingfuture requirements as they arise.Today’s networking infrastructure does not satisfy any of these goals,which is the cause of significant pain for network operators. 
In fact,in terms of impact on user experience,the inadequacies in these infrastructural aspects are probably more problematic than the Internet’s architectural deficiencies.The inability to meet these goals is not for lack of trying: the community has repeatedly tried new approaches to network infrastructure.Some of these attempts,such as Active Networking [22],focused more onflexibility than practicality,while others, such as ATM[7],had the opposite emphasis;out of a long list of ephemeral and/or ineffective proposals,by far the most successful approach has been MPLS[19].MPLS is now widely deployed and plays a crucial role in VPN deployment and traffic engineering. While originally decried by some as an architectural abomination, we will argue later that MPLS embodies an important insight that we must incorporate in future network designs.However,MPLS did not meet all the goals of an ideal network,so more recently the community has made another attempt at reaching networking nirvana:Software-Defined Networking(SDN)[10,13, 14,17].There has been a staggering level of hype about SDN,and some knee-jerk opposition;the shallowness of the discussion(which occurs mostly in trade magazines and blogs)has largely overlooked SDN’s more fundamental limitations.In this paper we discuss these limitations and explain how SDN, by itself,would fall short of the goals listed above.We then describe how we might create a better form of SDN by retrospectively leveraging the insights underlying MPLS.While OpenFlow has been used to build MPLS LSRs[12],we propose drawing architectural lessons from MPLS that apply to SDN more broadly.This modified approach to SDN revolves around the idea of network fabrics1which introduces a new modularity in networking that we feel is necessary 1We use this term in a general sense of a contiguous and coherently controlled portion of the network infrastructure,and do not limit its meaning to current commercial fabric offerings.if we hope to achieve both a simple,vendor-neutral,and future-proof hardware base and a sufficientlyflexible control plane.We hasten to note that fabrics are already well established in the academic and commercial arenas(see,for example,[2,11,16]). 
However,while it would be easy to dismiss what we write here as “nothing new”,the direction we propose for SDN is quite different from what is being pursued by current ONF standardization efforts and what is being discussed in the academic SDN literature.Thus, our paper should be read not as a deep technical contribution but as a“call to arms”for the SDN community to look backwards towards MPLS as they plan for SDN’s future.We begin this paper(Section2)by reviewing the basics of traditional network design,MPLS,and SDN.We then,in Section 3,introduce a hybrid approach that combines SDN and MPLS.We end with a discussion of the implications of this approach.2Background on Network Designs2.1OverviewNetwork infrastructure design is guided by network requirements and network work requirements come from two sources:hosts and operators.Hosts(or,more accurately,the users of that host)want their packets to travel to a particular destination,and they may also have QoS requirements about the nature of the service these packets receive en route to that work operators have a broader set of requirements—such as traffic engineering, virtualization,tunneling and isolation—some of which are invisible and/or irrelevant to the hosts.As we observe below,the control mechanisms used to meet these two sources of requirements are quite different.Like any system,networks can be thought of in terms of inter-faces;here we use that term not to refer to a formal programmatic interface,but to mean more generally and informally places where control information must be passed between network entities.There are three relevant interfaces we consider here:•Host—Network:Thefirst interface is how the hosts informthe network of their requirements;this is typically done inthe packet header(for convenience,in the following we willfocus on L3,but our comments apply more generally),whichcontains a destination address and(at least theoretically)someToS bits.However,in some designs(such as IntServ),there isa more explicit interface for specifying service requirements.•Operator—Network:The second interface is how operatorsinform the network of their requirements;traditionally,thishas been through per-box(and often manual)configuration, but SDN(as we discuss later)has introduced a programmaticinterface.•Packet—Switch:The third interface is how a packet identifiesitself to a switch.To forward a packet,a router uses somefields from the packet header as an index to its forwardingtable;the third interface is the nature of this index.We now turn to how the original Internet,MPLS,and SDN deal with requirements and implement these interfaces.2.2Original InternetIn the original Internet design,there were no operator requirements; the goal of the network was to merely carry the packet from source to destination(for convenience,we will ignore the ToS bits), and routing algorithms computed the routing tables necessary to achieve that goal.At each hop,the router would use the destination address as the key for a lookup in the routing table;that is,in our conceptual terms,every router would independently interpret the host requirements and take the appropriate forwarding action.Thus, the Host-Network and Packet-Switch interfaces were identical,and there was no need for the Operator-Network interface.2.3MPLSMPLS introduced an explicit distinction between the network edge and the network core.Edge routers inspect the incoming packet headers(which express the host’s requirements as to where to deliver the packet)and then attach a label onto the packet 
which is used for all forwarding within the core.The label-based forwarding tables in core routers are built not just to deliver packets to the destination, but also to address operator requirements such as VPNs(tunnels)or traffic engineering.MPLS labels have meaning only within the core, and are completely decoupled from the host protocol(e.g.,IPv4or IPv6)used by the host to express its requirement to the network. Thus,the interface for specifying host requirements is still IP,while the interface for packets to identify themselves is an MPLS label. However,MPLS did not formalize the interface by which operators specified their control requirements.Thus,MPLS distinguished between the Host-Network and Packet-Switch interfaces,but did not develop a general Operator-Network interface.2.4SDNIn contrast to MPLS,SDN focuses on the control plane.In particular, SDN provides a fully programmatic Operator-Network interface, which allows it to address a wide variety of operator requirements without changing any of the lower-level aspects of the network. SDN achieves thisflexibility by decoupling the control plane from the topology of the data plane,so that the distribution model of the control plane need not mimic the distribution of the data plane. While the term SDN can apply to many network designs with decoupled control and data planes[3,5,6,9,20],we will frame this discussion around its canonical instantiation:OpenFlow.2In OpenFlow each switch within the network exports an interface that allows a remote controller to manage its forwarding state. This managed state is a set of forwarding tables that provide a mapping between packet headerfields and actions to execute on matching packets.The set offields that can be matched on is roughly equivalent to what forwarding ASICs can match on today, namely standard Ethernet,IP,and transport protocolfields;actions include the common packet operations of sending the packet to a port as well as modifying the protocolfields.While OpenFlow is a significant step towards making the control plane moreflexible, it suffers from a fundamental problem that it does not distinguish between the Host-Network interface and the Packet-Switch interface. Much like the original Internet design,each switch must consider the host’s original packet header when making forwarding decisions. 
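As a rough illustration of this match/action abstraction—not the OpenFlow wire protocol or any particular controller API—a flow table can be sketched as a prioritized list of entries mapping header-field values to actions; all class, field, and action names below are hypothetical.

```python
# Sketch: a flow table as a prioritized list of match/action entries.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class FlowEntry:
    match: Dict[str, object]               # e.g. {"eth_type": 0x0800, "ip_dst": "10.0.0.1"}
    actions: List[Callable[[dict], dict]]  # e.g. rewrite a field, output to a port
    priority: int = 0

class FlowTable:
    def __init__(self):
        self.entries: List[FlowEntry] = []

    def install(self, entry: FlowEntry):   # what a remote controller would do
        self.entries.append(entry)
        self.entries.sort(key=lambda e: -e.priority)

    def process(self, packet: dict) -> Optional[dict]:
        for e in self.entries:             # highest priority first
            if all(packet.get(k) == v for k, v in e.match.items()):
                for act in e.actions:
                    packet = act(packet)
                return packet
        return None                        # table miss: punt to the controller or drop
```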
Granted, the flexibility of the SDN control plane allows the flow entries, when taken collectively, to implement sophisticated network services (such as isolation); however, each switch must still interpret the host header.³ This leads to three problems:

•First, it does not fulfill the promise of simplified hardware. In fairness, OpenFlow was intended to strike a balance between practicality (support matching on standard headers) and generality (match on all headers). However, this requires switch hardware to support lookups over hundreds of bits; in contrast, core forwarding with MPLS needs to match only over some tens of bits. Thus, with respect to the forwarding hardware alone, an OpenFlow switch is clearly far from the simplest design achievable.

•Second, it does not provide sufficient flexibility. We expect host requirements to continue to evolve, leading to more generality in the Host-Network interface, which in turn requires increasing the generality in the matching allowed and the actions supported. In the current OpenFlow design, this additional generality must be present on every switch. It is inevitable that, in OpenFlow's attempt to strike a balance in the practicality vs. generality tradeoff, needing every feature to be present on every switch will bias the decision towards a more limited feature set, reducing OpenFlow's generality.

•Third, it unnecessarily couples the host protocols to the network core behavior. This point is similar to, but more general than, the point above. If there is a change in the external network protocols (e.g., switching from IPv4 to IPv6) which necessitates a change in the matching behavior (because the matching must be done over different fields), this requires a change in the packet matching even in the network core.

Thus, our goal is to extend the SDN model in a way that overcomes these limitations yet still retains SDN's great control plane flexibility. To this end, it must retain its programmatic control plane interface (so that it provides a general Operator-Network interface) while cleanly distinguishing between the Host-Network and Packet-Switch interfaces (as is done in MPLS). We now describe such a design.

²We note that there has been a wealth of recent work on SDN, including [1, 4, 8, 15, 18, 21, 23, 24, 25], which extends SDN in one or more directions, but all of them are essentially orthogonal to the issues we are discussing here.

³One could, of course, use SDN to implement MPLS. However, each switch must be prepared to deal with full host headers.

3 Extending SDN

3.1 Overview

In this section we explore how the SDN architectural framework might be extended to better meet the goals listed in the introduction. Our proposal is centered on the introduction of a new conceptual component which we call the "network fabric". While a common term, for our purposes we limit the definition to refer to a collection of forwarding elements whose primary purpose is packet transport.
Under this definition,a network fabric does not provide more complex network services such asfiltering or isolation.The network then has three kinds of components(see Figure 1):hosts,which act as sources and destinations of packets;edge switches,which serve as both ingress and egress elements;and the core fabric.The fabric and the edge are controlled by(logically) separate controllers,with the edge responsible for complex network services while the fabric only provides basic packet transport.The edge controller handles the Operator-Network interface;the ingress edge switch,along with its controller,handle the Host-Network interface;and the switches in the fabric are where the Packet-Switch interface is exercised.The idea of designing a network around a fabric is well understood within the community.In particular,there are many examples of limiting the intelligence to the network edge and keeping the core simple.4Thus,our goal is not to claim that a network fabric is 4This is commonly done for example in datacenters where connectivity is provided by a CLOS topology running an IGP and ECMP.It is also reflected in WANs where interdomain policies areimplemented at the provider edge feeding packets into a simpler MPLS core providing connectivity across the operator network.FabricElementsFabric ControllerSrcHostDstHostEdge ControllerIngress Egressthe edge is responsible for providing more semantically rich services such as network security,isolation,and mobility.Separating the control planes allows them each to evolve separately,focusing on the specifics of the problem.Indeed,a good fabric should be able to support any number of intelligent edges(even concurrently)and vice versa.Note that fabrics offer some of the same benefits as SDN. In particular,if the fabric interfaces are clearly defined and standardized,then fabrics offer vendor independence,and(as we describe in more detail later)limiting the function of the fabric to forwarding enables simpler switch implementations.3.2Fabric Service ModelUnder our proposed model,a fabric is a system component which roughly represents raw forwarding capacity.In theory,a fabric should be able to support any number of edge designs including different addressing schemes and policy models.The reverse should also be true;that is,a given edge design should be able to take advantage of any fabric regardless of how it was implemented internally.The design of a modern router/switch chassis is a reasonably good analogy for an SDN architecture that includes a fabric.In a chassis,the line cards contain most of the intelligence and they are interconnected by a relatively dumb,but very high bandwidth, backplane.Likewise,in an SDN architecture with a fabric,the edge will implement the network policy and manage end-host addressing, while the fabric will effectively interconnect the edge as fast and cheaply as possible.The chassis backplane therefore provides a reasonable startingPrimitive DescriptionAttach(P)Attach a fabric port P to the fabric.Send(P,pkt)Send a packet to a single fabric port.Send(G,pkt)Send a packet to a multicast group G.Join(G,P)Attach a port to a multicast group G.Leave(G,P)De-attach a port from a multicast group G. Table1:Fabric service model.Note that the ToS bits in the packetfor QoS within the fabric are not included above.point for a fabric service model.Generally,a backplane supports point-to-point communication,point-to-multipoint communication, and priorities to make intelligent drop decisions under contention. 
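A minimal sketch of how the service model of Table 1 might be exposed as an interface: the method names mirror the table's primitives, while the class itself and its signatures are illustrative rather than part of any proposed standard, and the ToS bits mentioned in the table caption are omitted.

```python
# Sketch: the fabric service model of Table 1 as an abstract interface.
from abc import ABC, abstractmethod

class Fabric(ABC):
    @abstractmethod
    def attach(self, port):            # Attach(P): attach a fabric port to the fabric
        ...

    @abstractmethod
    def send(self, port, pkt):         # Send(P, pkt): point-to-point delivery
        ...

    @abstractmethod
    def send_group(self, group, pkt):  # Send(G, pkt): point-to-multipoint delivery
        ...

    @abstractmethod
    def join(self, group, port):       # Join(G, P): add a port to multicast group G
        ...

    @abstractmethod
    def leave(self, group, port):      # Leave(G, P): remove a port from group G
        ...
```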
In our experience,this minimal set is sufficient for most common deployment scenarios.More complex network functions,such as filtering,isolation,statefulflow tracking,or port spanning can be implemented at the edge.Table1summarizes this high-level service model offered by the fabric.3.3Fabric Path SetupAnother consideration is path setup.In the“wild”two methods are commonly used today.In the datacenter,a common approach is to use a standard IGP(like OSPF)and ECMP to build a fabric.In this case,all paths are calculated and stored in the fabric.MPLS, on the other hand,requires the explicit provisioning of an LSP by the provider.The primary difference between the two is that when all forwarding state is precalculated,it is normally done with the assumption that any point at the edge of the fabric can talk to any other point.On the other hand,provisioned paths provide an isolated forwarding context between end points that is dictated by network operator(generally from the provider edge).We believe that either model works in practice depending on the deployment environment.If both the edge and the fabric are part of the same administrative domain,then precalculating all routes saves operational overhead.However,if the edge and fabric have a customer-provider relationship,then an explicit provisioning step may be warranted.3.4Addressing and Forwarding in the FabricAs we have described it,a forwarding element in the fabric differs from traditional network forwarding elements in two ways.First, they are not required to use end-host addresses for forwarding, and second,they are only responsible for delivering a packet to its destination(s),and not enforcing any policy.As a result,the implementation of the fabric forwarding element can be optimized around relatively narrow requirements.Two current approaches exemplify the options available:•One option would be to follow MPLS and limit network addresses to opaque labels and the forwarding actions toforward,push,pop and swap.This would provide a very general fabric that could be used by multiple control planes toprovide either path-based provisioning or destination-basedforwarding with label-aggregation.•Another option would be to limit the packet operations to adestination address lookup with a longest prefix match withECMP-based forwarding.It is unlikely that this would besuitable for path-based provisioning,but it would likely resultis a simpler control plane and higher port densities.Our preference is to use labels similar to MPLS because it supports a more general forwarding model.However,the main point of this paper is that the SDN architecture should incorporate the notion of a fabric,but SDN need not be concerned with the specifics of the fabric forwarding model(indeed,that is the point of having a fabric!).Because the fabric can evolve independently of the edge,multiple forwarding models can exist simultaneously. 
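To make the first (MPLS-like) option concrete, the sketch below shows a fabric element forwarding on an opaque label stack with forward/push/pop/swap operations; the table layout and the dictionary-based packet representation are assumptions made only for this example.

```python
# Sketch: label-based fabric forwarding with forward/push/pop/swap.
def fabric_forward(packet, label_table):
    """label_table maps the outermost incoming label -> (operation, argument, out_port)."""
    top = packet["labels"][-1]
    op, arg, out_port = label_table[top]
    if op == "swap":
        packet["labels"][-1] = arg       # replace the outermost label
    elif op == "push":
        packet["labels"].append(arg)     # enter a nested tunnel
    elif op == "pop":
        packet["labels"].pop()           # leave a tunnel (e.g. near the egress edge)
    # op == "forward": label stack left unchanged
    return out_port, packet
```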
3.5Mapping the Edge Context to the FabricThe complexity in an edge/fabric architecture lies in mapping the edge context to network addresses or paths.By“mapping”we simply meanfiguring out which network address or label to use for a given packet.That is,when a packet crosses from the edge to the fabric,something in the network must decide with which fabric-internal network address to associate with the packet.There are two primary mechanisms for this:Address translation.Address translation provides the mapping by swapping out addresses in situ.For example,when the packet crosses from the edge to the network,the edge addresses are replaced with fabric internal addresses,and then these addresses are translated back into appropriate edge addresses at the destination. The downside of this approach is that it unnecessarily couples the edge and network addressing schemes(since they would need to be of the same size,and map one-to-one with each other). Encapsulation.A far more popular,and we believe more general, approach to mapping an edge address space to the fabric-internal address space is encapsulation.With encapsulation,once a packet crosses from the edge to the network,it is encapsulated with another header that carries the network-level identifiers.On the receiving side,the outer header is removed.In either case(address translation or encapsulation),a lookup table at the edge must map edge addresses to network addresses to get packets across the fabric.However,unlike the fabric forwarding problem,this lookup may include any of the headersfields that are used by the edge.This is because more sophisticated functions such asfiltering,isolation,or policy routing(for example,based on the packet source)must be implemented at the edge.There are many practical(but well understood)challenges in implementing such a mapping that we will not cover in this paper. These include the control plane(which must maintain the edge mappings),connectivity fault management across the fabric,and the impact of addressing to basic operations and management.4Questions and ImplicationsIsn’t this just another approach to layering?To some extent, one could view the edge and the core as different layers,with the edge layer running“over”the core layer;to that extent this is indeed just another approach to layering.And layering provides some of the same benefits we claim:it decouples the protocols in different layers(thereby increasing innovation)and allows for different layers to have different scopes(which is important for scaling).However, current dataplane layering can be thought of as“vertical”,making distinctions based on how close to the hardware a protocol is,and each layer goes all the way to the host.When a layer is exposed to the host,it becomes part of the host-network interface.What we are proposing here is more of a“horizontal”layering, where the host-network interface occurs only at the edge,and the general packet-switch interface exists only in the core.This is a very different decoupling than provided by traditional layering.We not only want to decouple one layer from another,we want to decouple various pieces of the infrastructure from the edge layer entirely. 
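Returning to the encapsulation mechanism of Section 3.5, one way to picture the edge-to-fabric mapping is the sketch below: an ingress lookup (which may examine arbitrary edge header fields) selects a fabric-internal address and wraps the packet, and the egress strips the outer header again. The edge_policy object, its lookup method, and the dictionary-based packet format are purely illustrative.

```python
# Sketch: mapping edge context to the fabric by encapsulation.
def ingress(packet, edge_policy):
    """edge_policy.lookup may use any edge header fields (addresses, ports, tenant, ...)."""
    fabric_addr = edge_policy.lookup(packet)        # edge-level policy decision
    if fabric_addr is None:
        return None                                 # e.g. filtered at the edge
    return {"fabric_dst": fabric_addr, "payload": packet}   # outer header + original packet

def egress(encapsulated):
    return encapsulated["payload"]                  # strip the outer header at the far edge
```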
What does this mean for OpenFlow?This approach would require an“edge”version of OpenFlow,which is much more generalthan today’s OpenFlow,and a“core”version of OpenFlow which is little more than MPLS-like label-based forwarding.One can think of the current OpenFlow as an unhappy medium between these two extremes:not general enough for the edge,and not simple enough for the core.One might argue that OpenFlow’s lack of generality is appropri-ately tied to current hardware limitations,and that proposing a more general form of OpenFlow is doomed to fail.But at present much edge forwarding in datacenters is done in software by the host’s general-purpose CPU.Moreover,the vast majority of middleboxes are now implemented using general-purpose CPUs,so they too could implement this edge version of OpenFlow.More generally,any operating system supporting OpenvSwitch or the equivalent could perform the necessary edge processing as dictated by the network controller.Thus,we believe that the edge version of OpenFlow should aggressively adopt the assumption that it will be processed in software,and be designed with that freedom in mind.Why is simplicity so important?Even if one buys the arguments about the edge needing to become moreflexible(to accommodate the generality needed in the host-network interface)and able to become moreflexible(because of software forwarding),this doesn’t imply that it is important that the core become simpler.There are two goals to simplicity,reduced cost and vendor-neutrality,and the latter is probably more important than the former.Even if the additional complexity were not a great cost factor,if one is striving for vendor-neutrality then one needs an absolutely minimal set of features.Once one starts adding additional complexity,some vendors will adopt it (seeking a competitive advantage on functionality)and others won’t (seeking a competitive advantage on cost),thereby lessening the chance of true vendor-neutrality.We believe this is likely to happen with the emerging OpenFlow specifications,but would not apply to simple label-switching boxes.5What does this mean for networking more generally?If indeed we arrive at a point where the edge processing is done in software and the core in simple hardware,then the entire infrastructure becomes much more evolvable.Consider the change from IPv4 to IPv6;if all IP processing were done at the edge in software,then simple software updates to hosts(and to the relevant controllers) would be sufficient to change over to this new protocols.While we started the paper focusing on infrastructure over architecture,this is one way in which an improved infrastructure would help deal with architectural issues.Isn’t all this obvious?Yes,we think so.But we also think it is important,and is not being sufficiently addressed within the SDN community.We hope that HotSDN will be a forum for a discussion of these topics.References[1]Beacon:A java-based OpenFlow control platform..[2]Brocade VCS Fabric./downloads/documents/white_papers/Introducing_Brocade_VCS_WP.pdf.[3]M.Caesar,D.Caldwell,N.Feamster,J.Rexford,A.Shaikh,and K.van der Merwe.Design and Implementation of aRouting Control Platform.In Proc.of NSDI,2005.[4]M.Canini,D.Venzano,P.Peresini,D.Kostic,and J.Rexford.A NICE Way to Test OpenFlow Applications.In Proc.ofNSDI,2012.5Of course,vendors will always compete in terms of various quantitative measures(e.g.,size of TCAM,amount of memory), but the basic interfaces should be vendor-neutral.[5]M.Casado,M.J.Freedman,J.Pettit,J.Luo,N.McKeown,and 
S.Shenker.Ethane:Taking Control of the Enterprise.In Proc.of SIGCOMM,2007.[6]M.Casado,T.Garfinkel,A.Akella,M.J.Freedman,D.Boneh,N.McKeown,and S.Shenker.SANE:A ProtectionArchitecture for Enterprise Networks.In Proc.of UsenixSecurity,2006.[7]M.de Prycker.Asynchronous Transfer Mode:Solution forBroadband ISDN.Ellis Horwood,1991.[8]N.Foster,R.Harrison,M.J.Freedman,C.Monsanto,J.Rexford,A.Story,and D.Walker.Frenetic:a NetworkProgramming Language.In Proc.of SIGPLAN ICFP,2011.[9]A.Greenberg,G.Hjalmtysson,D.A.Maltz,A.Myers,J.Rexford,G.Xie,H.Yan,J.Zhan,and H.Zhang.A CleanSlate4D Approach to Network Control and Management.SIGCOMM CCR,35(5):41–54,2005.[10]N.Gude,T.Koponen,J.Pettit,B.Pfaff,M.Casado,N.McKeown,and S.Shenker.NOX:Towards an OperatingSystem for Networks.SIGCOMM CCR,38,2008.[11]Juniper QFabric./QFabric.[12]J.Kempf et al.OpenFlow MPLS and the Open Source LabelSwitched Router.In Proc.of ITC,2011.[13]T.Koponen,M.Casado,N.Gude,J.Stribling,L.Poutievski,M.Zhu,R.Ramanathan,Y.Iwata,H.Inoue,T.Hama,andS.Shenker.Onix:A Distributed Control Platform forLarge-scale Production Networks.In Proc.of OSDI,2010. [14]N.McKeown,T.Anderson,H.Balakrishnan,G.Parulkar,L.Peterson,J.Rexford,S.Shenker,and J.Turner.OpenFlow: Enabling Innovation in Campus Networks.SIGCOMM CCR, 38(2):69–74,2008.[15]A.K.Nayak,A.Reimers,N.Feamster,and R.J.Clark.Resonance:Dynamic Access Control for Enterprise Networks.In Proc.of SIGCOMM WREN,2009.[16]R.Niranjan Mysore,A.Pamboris,N.Farrington,N.Huang,P.Miri,S.Radhakrishnan,V.Subramanya,and A.Vahdat.PortLand:A Scalable Fault-tolerant Layer2Data CenterNetwork Fabric.In Proc.of SIGCOMM,2009.[17]B.Pfaff,J.Pettit,T.Koponen,M.Casado,and S.Shenker.Extending Networking into the Virtualization Layer.In Proc.of HotNets,2009.[18]M.Reitblatt,N.Foster,J.Rexford,and D.Walker.ConsistentUpdates for Software-Defined Networks:Change You CanBelieve In!In Proc.of HotNets,2011.[19]E.Rosen,A.Viswanathan,and R.Callon.MultiprotocolLabel Switching Architecture.RFC3031,IETF,2001. [20]S.Shenker.The Future of Networking,the Past of Protocols./watch?v=YHeyuD89n1Y.[21]R.Sherwood,G.Gibb,K.-K.Yap,G.Appenzeller,M.Casado,N.McKeown,and G.Parulkar.Can theProduction Network Be the Testbed?In Proc.of OSDI,2010.[22]D.L.Tennenhouse and D.J.Wetherall.Towards an ActiveNetwork Architecture.In Proc.of DANCE,2002.[23]Trema:Full-Stack OpenFlow Framework in Ruby and C./trema.[24]A.V oellmy and tle:Taking the Sting Out ofProgramming Network Routers.In Proc.of PADL,2011. [25]M.Yu,J.Rexford,M.J.Freedman,and J.Wang.ScalableFlow-based Networking with DIFANE.In Proc.ofSIGCOMM,2010.。

TU Munich


A Compositional Approach to Statecharts SemanticsGerald L¨uttgen ICASE,NASA Langley Research Center Hampton,VA23681–2199USA luettgen@ Michael von der BeeckDepartment of Comp.Sc.TU MunichD–80290M¨unchenGermanybeeck@in.tum.deRance CleavelandDepartment of Comp.Sc.SUNY at Stony BrookStony Brook,NY11794–4400USArance@ABSTRACTStatecharts is a visual language for specifying reactive system be-havior.The formalism extends traditionalfinite–state machines with notions of hierarchy and concurrency,and it is used in many popular software design notations.A large part of the appeal of Statecharts derives from its basis in state machines,with their in-tuitive operational interpretation.The classical semantics of State-charts,however,suffers from a serious defect:it is not composi-tional,meaning that the behavior of system descriptions cannot be inferred from the behavior of their positionality is a prerequisite for exploiting the modular structure of Statecharts for simulation,verification,and code generation,and it also provides the necessary foundation for reusability.This paper suggests a new compositional approach to formaliz-ing Statecharts semantics asflattened labeled transition systems in which transitions represent system steps.The approach builds on ideas developed for timed process calculi and employs structural operational rules to define the transitions of a Statecharts expres-sion in terms of the transitions of its subexpressions.It isfirst pre-sented for a simple dialect of Statecharts,with respect to a vari-ant of Pnueli and Shalev’s semantics,and is illustrated by means of a small example.To demonstrate itsflexibility,the proposed approach is then extended to deal with practically useful features available in many Statecharts variants,namely state references,his-tory states,and priority concepts along state hierarchies. 
Categories and Subject DescriptorsD.2.1[Requirements/Specifications]:[languages];D.3.1[Pro-gramming Languages]:Formal Definitions and Theory;F.3.2[Se-mantics of Programming Languages]:[operational semantics]The aim of this paper is to present a new approach to defining State-charts semantics which combines all three abovementioned fea-tures in a formal,yet operationally intuitive,fashion.Our semantic account borrows ideas from timed process calculi[13],which also employ the synchrony hypothesis[2]and which allow one to rep-resent ordinary system behavior and clock ticks using labeled tran-sition systems.These transition systems are defined via structural operational rules[28]—i.e.,rules in SOS format—along the state hierarchy of the Statechart under consideration.Our semantics ex-plicitly represents macro steps as sequences of micro steps which begin and end with the ticking of a global clock.Thereby,compo-sitionality is achieved on the explicit micro–step level and causality and synchrony on the implicit macro–step level.The current work builds on previous research by the authors[20],which developed a compositional timed process algebra that was then used to em-bed a simple variant of Statecharts introduced in[21].That work indirectly yielded a compositional operational semantics for State-charts.In this paper,we re–develop the semantics of[20]with-out reference to a process algebra,thereby eliminating the rather complicated indirection.Our intention is to make the underlying semantic issues and design decisions for Statecharts more apparent and comprehensible.The paper also argues for theflexibility and elegance of our approach by extending our semantics to cope with popular Statecharts features used in practice,such as state refer-ences,history states,and priority concepts.Organization of this paper.The next section gives a brief overview of Statecharts,including our notation and its classical semantics. Section3presents our new compositional approach to Statecharts semantics.It also establishes a coincidence result with respect to the traditional step semantics and illustrates our approach by means of an example.Section4shows how our framework can be ex-tended to include various features employed in many Statecharts dialects.Finally,Section5discusses related work,while Section6 contains our concluding comments as well as some directions for future research.2.AN OVERVIEW OF STATECHARTS Statecharts is a specification language for reactive systems,i.e., systems characterized by their ongoing interaction with their en-vironment.The notation enriches basicfinite–state machines with concepts of hierarchy,concurrency,and priority.In particular,one Statechart may be embedded within the state of another Statechart, and a Statechart may be composed of several simultaneously ac-tive sub–Statecharts which communicate via broadcasting events. 
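As a preview of the term-based syntax formalized in Section 2.1, such hierarchical terms—basic states, or-states with named transitions, and and-states of parallel components—could be represented roughly as the following data type; constructor and field names are illustrative and not part of the formal definition.

```python
# Sketch: Statecharts terms (basic, or-, and-states) as a data type.
from dataclasses import dataclass
from typing import List, Set, Union

@dataclass
class Transition:
    name: str
    source: int            # index of the source sub-state
    trigger_pos: Set[str]  # events that must be offered by the environment
    trigger_neg: Set[str]  # events that must be absent
    action: Set[str]       # events generated when the transition fires
    target: int            # index of the target sub-state

@dataclass
class Basic:
    name: str

@dataclass
class Or:
    name: str
    substates: List["State"]
    transitions: List[Transition]
    default: int           # index of the default sub-state
    active: int            # index of the currently active sub-state

@dataclass
class And:
    name: str
    substates: List["State"]   # simultaneously active parallel components

State = Union[Basic, Or, And]
```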
Transitions are labeled by pairs of event sets,where thefirst com-ponent is referred to as trigger and may include negated events, and the second is referred to as action.Intuitively,if the environ-ment offers all the positive but none of the negated events of the trigger,then the transition is enabled and can be executed,thereby generating the events in the label’s action.As an example,consider the Statechart depicted in Figure1.It consists of an and–state,labeled by,which denotes the paral-lel composition of the two Statecharts labeled by and,both of which are or–states describing a sequential state machine.Or–state is further refined by or–state and basic state,which are connected via transition labeled by.The label specifies that is triggered by the occurrence of event;its execution does not generate any new event as its action is empty.Or–state con-tains the basic states and,connected by transition with trigger and empty action;hence,is enabled if event occurs but event does not.Or–state consists of two basicFigure1:Example Statechart.states and connected via transition with label,so that upon occurrence of trigger event,transition can be executed and generate event.In this paper,wefirst consider a simple dialect of Statecharts that supports a basic subset of the popular features present in many Statecharts variants.In particular,it considers hierarchy and con-currency.However,it ignores interlevel transitions(i.e.,transitions crossing borderlines of states),state references(i.e.,triggers of the form in,where is the name of a state),and history states(re-membering the last active sub–state of an or–state).In addition, it does not attach any implicit priorities to transitions at different levels in the state hierarchy.To illustrate theflexibility of our approach,we show in Section4how it can be extended to deal with state references,history states,and the abovementioned pri-ority concepts.Interlevel transitions,however,cannot be brought in accordance with a compositional semantics,as they represent an unstructured“goto”behavior(cf.Section5).2.1Term–based SyntaxFor our purposes it is convenient to represent Statecharts not visu-ally but by terms,as is done in[20,21].Formally,let be a count-able set of names for Statecharts states,be a countable set of names for Statecharts transitions,and be a countable set of State-charts events.For technical convenience,we assume that and are disjoint.With every event we associate a negated coun-terpart and define df as well as df, for.The set SC of Statecharts terms is then defined by the following inductive rules.1.Basic state:If,then is a Statecharts term.2.Or–state:Suppose that and that are State-charts terms for,with df.Also letdf and,with.Then is a Statecharts term.Hereare the sub–states of,set contains the tran-sitions connecting these states,is the default state of, and is the currently active sub–state of.The transitions in are of the form df,where(a)is the name of,(b)source df is the source state of,(c)trg df is the trigger of,(d)act df is the action of,and(e)target df is the target state of.In the sequel,trg stands for trg and trg for trg.Since we assume that all state names and tran-sition names are mutually disjoint,we may uniquely referto states and transitions by using their names,e.g.,we may write for.We also assume that no transition produces an event which appears negated in its trigger.3.And–state:If,if are Statecharts terms for,and if df,then is a State-charts term,where are the(parallel)sub–states of.The Statecharts term 
corresponding to the Statechart depicted in Figure1is term which is defined as follows.dfdfdfdfdfdfdfdfdfNote that components two andfive of a transition in some or–state refer to the indexes of the source and target state in the sequence,respectively,and not to the states’names.2.2Classical SemanticsIn this section,we sketch the semantics of Statecharts terms adopted in[21],which is a slight variant of the“classical”Statecharts se-mantics as proposed by Pnueli and Shalev[29].We refer the reader to[21]for a detailed discussion of the underlying semantic issues.As mentioned before,a Statechart reacts to the arrival of some ex-ternal events by triggering enabled micro steps in a chain–reaction manner.When this chain reaction comes to a halt,a complete macro step has been performed.More precisely,a macro step comprises a maximal set of micro steps,or transitions,that(i)are relevant,(ii)are mutually consistent,(iii)are triggered by events offered by the environment or generated by other mi-cro steps,(iv)are mutually compatible,and(v)obey the princi-ple of causality.These notions may be defined as follows.Let SC,let be a transition in,let be a set of transitions in,and let.Transition is relevant for Statecharts term,in signs relevant,if the source state of is cur-rently active.Transition is consistent with all transitions in,in signs consistent,if is not in the same parallel com-ponent as any transition in.Transition is triggered by event set,in signs triggered,if the positive but not the negative trigger events of are in.Transition is compatible with all transitions in,in signs compatible,if no event produced by appears negated in a trigger of a transition in.Finally,we say that transition is enabled in with re-spect to event set and transition set,if enabled, where enabled df relevant consistent triggered act compatible.A macro step is a subset of transitions in enabled that is causally well-founded.Technically,causality holds if there exists an order-ing among the transitions in a macro step such that no transition depends on events generated by transitions occurring after in the macro step.In[21],an operational approach for causally justifying the triggering of each transition of a macro step is given.It em-ploys the nondeterministic step–construction function presented inTable1:Step–construction functionTable1,which is adapted from Pnueli and Shalev[29].Given aStatecharts term and a set of environment events,the step–construction function nondeterministically computes a set of transitions.In this case,Statecharts term may evolve in the single macro step to Statecharts term,thereby executing the transitions in and producing the events df act.Term can be derived from by updating the index in everyor–state of satisfying for some .Observe that once one has constructed a macro step,all information about how the macro step was derived at is discarded.This is the source for the compositionality defect of this semanticsfor Statecharts:when two Statecharts are composed in parallel,thecombination of the causality orderings may introduce newly en-abled transitions(cf.[19]).Let us illustrate a couple of macro steps of the example State-chart depicted in Figure1.For convenience,we abbreviate a State-charts term by its active basic states,e.g.,term is abbreviatedby.Moreover,we let df and assume that the environment only offers event.Then,both transitions and are enabled,and the execution of results in macro step,i.e.,a macro step in which only a single tran-sition takes part.Although is also enabled,it 
cannot be executed together with in the same macro step.Otherwise,global con-sistency is violated since generates event whose negated coun-terpart is contained in the trigger of.However,transitions and can take part in the same macro step,as is located in a dif-ferent parallel component than and is triggered by event which is generated by.This leads to macro step. All macro steps of our example Statechart can be found in Figure3.3.A COMPOSITIONAL SEMANTICSWhile Pnueli and Shalev’s semantics has the advantage of simplic-ity,it is not compositional:it relies on a global analysis of an entire Statecharts expression in order to infer its macro steps.In particu-lar,it does not compute the macro steps of a Statechart in terms of the macro steps of its subcomponents.When the Statechart in ques-tion is large,the procedure can therefore be slow,since the macro steps of subcomponents cannot be“precomputed”and reused.It also means that the semantics cannot be used as a basis for modu-lar reasoning about Statecharts.In the following,we present our approach to defining a composi-tional semantics for Statecharts.Our framework is based onflatlabeled transition systems and is defined in the SOS style,i.e.,via structural operational rules[28].Each such rule is of the formnamepremisesynchrony hypothesismacro stepFigure 2:Illustration of our operational semantics.and should be read as follows:Rule (name )is applicable if boththe statements in its premise and its side condition hold;in this case,one might infer the conclusion .The benefits of employing SOS –style semantics are manifold.Most tools for the formal analy-sis and verification of systems rely on compilers which translate a textual system specification,e.g.,a Statechart term,into a la-beled transition system or a Kripke structure [5].Examples of such tools are those offering equivalence checkers and/or model check-ers,including the CONCURRENCY WORKBENCH [6],SPIN [14],and SMV [24].In addition,several meta–theories regarding SOS –style semantics have been developed.Of most relevance to our work are results which infer the compositionality of a semantics from the syntactical structure of the premises,conclusions,and side conditions of the SOS rules defining it [35].3.1Our approachIn contrast to related work,we develop an operational semantics on the micro–step level rather than the macro–step level and represent macro steps as sequences of micro steps.Within such a setting,compositionality is easy to achieve.The challenge is to identify the states at which macro steps start and end so that Statecharts’tradi-tional,non–compositional macro–step semantics can be recovered.Our solution is based on the observation that since Statecharts is a synchronous language,ideas from timed process calculi may be adapted.In particular,we use explicit global clock ticks to denote the boundaries of macro steps.Our flat labeled transition systems therefore possess two kinds of transitions:those representing the execution of a Statecharts tran-sition and those representing global clock ticks.In timed pro-cess calculi such transitions are referred to as action transitions and clock transitions ,respectively.The ideas behind our seman-tics are illustrated in Figure 2,where clock transitions are labeled by .The other transitions are action transitions and actually carry pairs of event sets as labels.An action transition stands for a single Statechart transition which is enabled if the system en-vironment offers all events in but none in .The states of our transition systems 
are annotated with (extended)Statecharts terms from which one may infer the events generated at any point of execution of the considered Statechart.Accordingly,the clas-sical macro–step semantics of Statecharts can be recovered from our semantics as follows:assume that the global clock ticks,sym-bolizing the beginning of a macro step,when the system environ-ment offers the events in .Starting from a clock transition,followTable 2:Functions out (top)and default (bottom)default df default df defaultdefaultdfdefaultSCSC andSCSC ,are defined via SOS rules.Table3:Operational rules:action transitionstrgANDoutcBASdefaulttarget cOR3The operational rules for action transitions are given in Table3,where the subscript of the transition relation should be ignored fornow;the subscript will only be needed in Section4.3.For conve-nience,we write.More-over,we let stand for the sequence and write for.Rule(OR1)states that or–state can evolve to if transition is enabled,i.e.,if(i)the source state of is the currently active state,(ii)all its positive trigger events trg are offered by the environment,(iii)the positive counterparts of all its negated trigger events trg are not offered by the envi-ronment,and(iv)the negated events corresponding to act are not offered by the environment,i.e.,no transition within the same macro step has alreadyfired due to the absence of such an event. The latter is necessary for implementing global consistency in our semantics.Rules(OR2)and(OR3)deal with the case that an in-ner transition of the active sub–state of the considered or–state is executed.Hence,sub–state needs to be updated accordingly. The resulting micro term also reflects—via the double colons—that a transition originating within the or–state has been executed,in which case the or–state may no longer en-gage in a transition in during the same macro step,i.e.,before executing the next clock transition.Finally,Rule(AND)deals with and–states.If sub–statefires a transitionholds, which stands forholds—,a clock tick can be accepted and does not result in any change of state(cf.Rule(cOR2)).Rule(cOR3)formalizes the be-havior that an or–state can engage in a clock transition if its active sub–state can engage in one.Finally,Rule(cAND)states that an and–state can engage in a clock transition if all its sub–states can, provided that there is no action transition whose execution cannot be prevented,i.e.,provided that6t (6)69n , t 29(75n (6n ), t 5n 2ba 6)(5n ab σFigure 3:Our semantics (left)and the macro–step semantics (right)for the Statechart in Figure 1.with inputand outputto,if there existSC ,,and,for some,such that 1.,,and.While Conds.(2)and (3)guarantee that all considered action tran-sitions are enabled by the environment,Cond.(5)ensures the max-imality of the macro step,i.e.,it implements the synchrony hy-pothesis.Now,we can establish the desired result,namely that our macro–step semantics coincides with the classical macro–step se-mantics of Statecharts.T HEOREM 3.1(C OINCIDENCE RESULT ).Let SC aswell as.Then,if and only if.P ROOF .(Sketch .)Consider the following construction.Ifis a sequence of Statecharts transitions of SCgenerated by the step–construction function relative to environmentand satisfying out ,then there exists a se-quence ofaction transitions as described in Def.1,such that the -th action transition corresponds to the execution of in .Vice versa,assume that the conditions of Def.1are satisfied for some and that is the sequence of State-charts transitions which can be identified with the considered 
se-quence of action transitions starting in .Then,can be generated by the step–construction function relative to and ,where the transitions fire in the order indicated by sequence .3.4ExampleWe now return to our example Statechart of Figure 1.Our seman-tics of this Statechart and its classical macro–step semantics are depicted on the left and right in Figure 3,respectively.In both dia-grams,we represent a transition of the formout df inout df out in out df outinout df outinoutdfacttrgin,whereout,and the second macro step is encoded bythe sequence,where out.4.EXTENSIONSWe now illustrate the flexibility of our approach by adapting it to incorporate features offered by many popular Statecharts variants,namely state references ,history mechanisms ,and priority concepts along the or–state hierarchy.4.1State ReferencesMany Statecharts variants permit trigger events of the form in ,for,which are satisfied whenever state is active.In our setting,we may encode this feature via the employed communica-tion scheme.To do so,we first extend the set of events by the dis-tinguished events in ,for all.Moreover,the sets out ,for SC ,need to be re–defined —as shown in Table 5—such that they include the events in ,for any active state in .It is easy to see that the resulting semantics handles state references as expected.4.2History StatesUpon entering or–states,their initial states are activated.However,in practice it is often convenient to have the option to return to the sub–state which was active when last exiting an or–state,e.g.,aftercompleting an interrupt routine.In Statecharts’visual syntax thisis done by permitting distinguished history states in or–states to which transitions from the outside of the considered or–states may point.Such history states can have twoflavors:deep and shallow. Deep means that the‘old’active state of the or–state and the‘old’active states of all its sub–states are restored.Shallow means that only the active state of the or–state is restored and that its sub–statesare reinitialized as usual.In our term–based setting,we may model history states and transitions traversing to history states as follows. 
For each transition pointing to some or–state,we additionally record a historyflag none deep shallow.If none, then transition is interpreted in the standard way,otherwise it is interpreted to point to the deep—if deep—or shallow—if shallow—history state in.In the light of this formalization,it is easy to integrate a history mechanism in our operational semantics.One just has to replace function default in Rule(cOR1)by function default, where none deep shallow is the historyflag of transi-tion.The terms default none and default deep are sim-ply defined by default and,respectively.The definition of default shallow can be done along the structure of Statecharts terms as follows1.default shallow df,2.default shallow dfdefault, 3.default shallow df default shallow.Here,default shallow,where,stands for default shallow default shallow.Observe that default needs only to be defined for Statecharts terms SC and not also for micro terms SC SC.4.3Priority ConceptsSome Statecharts dialects consider an implicit priority mechanism along the hierarchy of or–states.In UML Statecharts[3],for exam-ple,inner transitions of an or–state have priority over outer transi-tions,while this is the other way around in STATEMATE[10].Let us provide aflexible scheme for encoding both priority concepts,for which we introduce the notion of addresses which are built accord-ing to the BNF,for.The set of all such addresses is denoted by ddr.Each action transi-tion is then labeled with an address pointing to the sub–term of the considered Statecharts term,from which the transition originates (cf.the subscripts of the transitions in Table3).Intuitively,the symbol encodes that the transition originates from the considered state,i.e.,this state must be an or–state and the transition leaves the or–state’s active sub–state.Address also requires the state to be an or–state and the transition to originate from address of the currently active sub–state of the or–state.Finally,address indicates that the considered state is an and–state having at least sub–states and that the transition originates from address of the -th sub–state.Given an address ddr,we can now define the set MI of addresses which are considered more important than according to the chosen priority concept.The definitions of MI for the priority concepts of UML Statecharts and STATEMATE can be done straightforwardly along the structure of and are given in Table6. They do not require any extra explanation.Now,we can define a Table6:Priority Structure`a la UML(top)and`a la STATEMATE (bottom)MI dfMI df MIMI df MIfor action transitions,which coincides with the original transition relation given in Section3,except that low–priority action transitions arefiltered out.PrioThis rule states that an action transition located at address may be executed if there exists no action transition at some more important address,which cannot be prevented in any system environment. 
The justification for the fact that only action transitions with empty sets as labels have pre–emptive power over lower prioritized ac-tion transition is similar to the one regarding the pre–emption of clock transitions in Section3.One might wonder why this“two–level”definition of Statecharts semantics is still compositional,as the above side condition concerns a global property.In order to see this,one may again employ meta–theoretic results about the compositionality of semantics defined via SOS rules[35].Alterna-tively,one can distribute the side condition among the original rules for action transitions,such that compositionality becomes obvious, as is done in approaches to priority in process algebras[4].Some details on this issue can be found in the appendix.5.RELATED WORKWe categorize related work along the three dimensions of State-charts semantics:causality,synchrony,and compositionality.This classification wasfirst considered by Huizing and Gerth[15]who demonstrated that these dimensions cannot be trivially combined within a simple semantic framework.The original Statecharts semantics,as presented by Harel et al.[11], obeys causality and synchrony.However,it ignores compositional-ity and the concept of global ter,Huizing et al.[16] provided a compositional denotational semantics for this variant, while Pnueli and Shalev[29]suggested the introduction of global consistency for improving the practicality of the variant.How-ever,Pnueli and Shalev’s formalization is again not compositional. Its compositionality defect has recently been analyzed by adapting ideas from intuitionistic logic,as it can be shown that the logic un-derlying Pnueli and Shalev’s semantics does not respect the Law of the Excluded Middle[19].Other researchers have developed languages whose semantics obey the synchrony hypothesis and compositionality but violate causal-ity.Prominent representatives of such languages include Berry’s ESTEREL[2],to which recently some dialect of Statecharts has been interfaced as graphical front–end[32],and Maraninchi’s AR-GOS[22].Both languages are deterministic and treat causalityrather conservatively in a pre–processing step,before determining the semantics of the considered program as Mealy automaton via structural operational rules[23].Moreover,ARGOS semantics sig-nificantly differs from Statecharts semantics by allowing sequential components tofire more than once within a macro step.Another approach to formalizing Statecharts,whichfits into this category, is the one of Scholz[30]who uses streams as semantic domain for defining a non–causalfixed point semantics.The popular synchronous version of STATEMATE[10]neglects the synchrony hypothesis.Events generated in one step may not be consumed within the same step but in the next step only.The oper-ational semantics of this dialect has been compositionally formal-ized by Damm et al.[7].It was also considered by Mikk et al.[25] who translated STATEMATE specifications to input languages of model–checking tools by using hierarchical automata[26]as in-termediate language.This intermediate language was employed by Latella et al.[17],too,for formalizing the semantics of UML Statecharts[3]in terms of Kripke structures.However,UML State-charts discard not only the synchrony hypothesis but additionally negated events and,thereby,make the notion of global consistency obsolete.Their semantics was also investigated by Paltor and Lil-ius[27],who developed a semantic framework on the basis of a term–rewriting system.Our work is,however,most 
closely related to approaches which aim at combining all three dimensions (causality, synchrony, and compositionality) within a single formalism. These approaches may be split into two classes. The first class adapts a process-algebraic approach, where Statecharts languages are embedded in process algebras, for which structured operational semantics based on labeled transition systems are defined. Uselton and Smolka [34] have pioneered this approach, which was then refined by Levi [18]. Their notion of transition system involves complex labels consisting of a set of events together with a transitive, irreflexive order on that set encoding causality. The second class is characterized by following essentially the same ideas but avoiding the indirection of process algebra. Research by Uselton and Smolka [33] again employs the mentioned partial order, whereas Maggiolo-Schettini et al. [21] require even more complex and intricate information about causal orderings, global consistency, and negated events. While our present work also fits into this class, although it originated in the former [20], it avoids complex labels by representing causality via micro-step sequences and by adding explicit clock transitions to retrieve macro-step information. Thereby, our semantics is not only simple and concise but also comprehensible and suited for interfacing Statecharts to existing analysis and verification tools. In addition, our approach is very flexible, as we demonstrated by adding several prominent features, namely state references, history states, and priority concepts, to our initially rather primitive Statecharts dialect.

An alternative compositional semantics for Statecharts is presented by von der Beeck in [37]. That account is based on the one presented here, although its practical utility is compromised by the inclusion of unnecessary syntactic information in the semantic relation. In particular, the inclusion of transition names in the labels of execution steps allows Statecharts exhibiting the same observational behavior but differing in the naming of transitions to be distinguished.

Finally, we briefly comment on interlevel transitions, which prohibit a compositional Statecharts semantics as they are based on the idea of "goto-programming." First of all, interlevel transitions jeopardize a strictly structural definition of Statecharts terms, which is a prerequisite for deriving any compositional semantics. Hence, for modeling interlevel transitions, the syntax of Statecharts must be changed in such a way that interlevel transitions may be represented by several intralevel transitions which are connected via dedicated ports. This can be done either explicitly, as in the Communicating Hierarchical State Machine language introduced by Alur et al. [1], or implicitly via a synchronization scheme along the hierarchy of or-states, as in Maraninchi's ARGOS [22].

6. CONCLUSIONS

This paper presented a new approach to formalizing Statecharts semantics, which is centered around the principle of compositionality and borrows from ideas developed for timed process algebras. In contrast to related work, our approach combines all desired features of Statecharts semantics, namely causality, synchrony, and compositionality, within a single formalism, while still being simple and comprehensible. Its foundation on structural operational rules guarantees that our semantics is easy to implement in specification and verification tools and that it can be adapted to several Statecharts dialects. The proposed semantic framework also permits the integration of many features desired
in practice, as we demonstrated by extending it to deal with state references, history states, and priority concepts. Last, but not least, we hope that this paper testifies to the utility of applying knowledge from the field of Concurrency Theory to formalizing practical specification languages rigorously yet clearly.

7. ACKNOWLEDGMENTS

We would like to thank the anonymous referees for their valuable comments and suggestions.

8. REFERENCES

[1] R. Alur, S. Kannan, and M. Yannakakis. Communicating hierarchical state machines. In ICALP'99, volume 1644 of LNCS, pages 169–178. Springer-Verlag, 1999.
[2] G. Berry and G. Gonthier. The ESTEREL synchronous programming language: Design, semantics, implementation. SCP, 19(2):87–152, 1992.
[3] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley, 1998.
[4] R. Cleaveland and M. Hennessy. Priorities in process algebras. Inform. & Comp., 87(1/2):58–77, 1990.
[5] R. Cleaveland, E. Madelaine, and S. Sims. Generating front-ends for verification tools. In TACAS'95, volume 1019 of LNCS, pages 153–173. Springer-Verlag, 1995.
[6] R. Cleaveland and S. Sims. The NCSU Concurrency Workbench. In CAV'96, volume 1102 of LNCS, pages 394–397. Springer-Verlag, 1996.
[7] W. Damm, B. Josko, H. Hungar, and A. Pnueli. A compositional real-time semantics of STATEMATE designs. In de Roever et al. [8], pages 186–238.
[8] W.-P. de Roever, H. Langmaack, and A. Pnueli, editors. Compositionality: The Significant Difference, volume 1536 of LNCS. Springer-Verlag, 1997.


From Simulink to SCADE/Lustre to TTA:a layered approach for distributed embedded applications∗Paul Caspi,Adrian Curic,Aude Maignan,Christos Sofronis,Stavros Tripakis†Peter Niebert‡ABSTRACTWe present a layered end-to-end approach for the design and implementation of embedded software on a distributed platform.The approach comprises a high-level modeling and simulation layer(Simulink),a middle-level program-ming and validation layer(SCADE/Lustre)and a low-level execution layer(TTA).We provide algorithms and tools to pass from one layer to the next.First,a translator from Simulink to Lustre.Second,a set of real-time and code-distribution extensions to Lustre.Third,implementation techniques for decomposing a Lustre program into tasks and messages,scheduling the tasks and messages on the proces-sors and the bus,distributing the Lustre code on the execu-tion platform,and generating the necessary“glue”code. Categories and Subject DescriptorsD.2.2[Software Engineering]:Design Tools and Tech-niquesGeneral TermsDesign,LanguagesKeywordsEmbedded software,Simulink,Synchronous languages,Lus-tre,Code distribution,Scheduling†VERIMAG,Centre Equation,2,avenue de Vignate,38610 Gi`e res,France,www-verimag.imag.fr.‡Laboratoire d’Informatique Fondamentale(LIF),CMI,39 rue Joliot-Curie,13453Marseille,France,www.lif.univ-mrs.fr.∗Matlab and Simulink are Registered Trademarks of Math-Works,Inc.SCADE and Simulink Gateway are Registered Trademarks of Esterel Technologies,SA.This work has been supported in part by European IST projects“NEXT TTA”under project No IST-2001-32111and“RISE”under project No IST-2001-38117.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.LCTES’03,June11–13,2003,San Diego,California,USA.Copyright2003ACM1-58113-647-1/03/0006...$5.00.1.INTRODUCTIONDesigning safety-critical control systems requires a seam-less cooperation of tools at several levels—modeling and de-sign tools at the control level,development tools at the soft-ware level and implementation tools at the platform level. When systems are distributed,the choice of the platform is even more important and the implementation tools must be chosen accordingly.A tool-box achieving such a cooperation would allow important savings in design and development time as well as safety increases and cost effectiveness.In the course of several European IST projects(SafeAir, Next-TTA and Rise),such a goal has been progressively ap-proached and partially prototyped.This paper reports the achievements up to now.The developments were based on the following choice of tools at the different levels:Simulink at the control design level,SCADE/Lustre at the software design level and TTA at the distributed platform level.Why such a choice?SimulinkSCADE/LustreTTAFigure1:A design approach in three layers. 
The choice of Simulink is very natural,since it is consid-ered a de-facto standard in control design,in domains such as automotive or avionics.SCADE(Safety Critical Application Development Envi-ronment)is a tool-suite based on the synchronous paradigm and the Lustre[6]language.Its graphical modeling envi-ronment is endowed with a DO178B-level-A automatic code generator which makes it able to be used in highest criti-cality applications.Besides,a simulator and model checkers come along with the tool as plug-ins.It has been used in im-portant European avionic projects(Airbus A340-600,A380, Eurocopter)and is also becoming a de-facto standard in this field.The Time Triggered Architecture[11](TTA)supports distributed implementations built upon a synchronous bus delivering to every computing unit a global fault-tolerant clock.It is currently used in a number of automotive and avionics applications.Furthermore,it ideally matches the synchronous paradigm and can be seen as well adapted to our framework.Although SCADE/Lustre can be seen as a strict sub-set of Simulink(the discrete-time part)and a number of code generators for Simulink exist(e.g.,Real-Time Work-shop,dSpace),we still believe it is important to add the extra layer(SCADE/Lustre)between Simulink and TTA. One simple reason is that important companies already use SCADE/Lustre in their tool-chain.The level-A qualified code generator is also a crucial aspect that makes certifi-cation considerably easier.Another reason is that powerful analysis tools such as model-checkers and test generators are available for SCADE/Lustre,but not for Simulink.Fi-nally,Simulink was initially conceived as a simulation tool, whereas SCADE/Lustre was initially conceived as a pro-gramming tool.These different origins become apparent when we examine the(weak)typing features of Simulink, its lack of modularity sometimes,and its multiplicity of se-mantics(depending on user-controlled“switches”such as the simulation method).On the contrary,SCADE/Lustre was designed from the very beginning as a programming lan-guage and has the above features.Therefore,it can serve as a reliable middle layer whichfilters-out Simulink ambi-guities and enforces a strict programming discipline,much needed in safety critical applications.In the rest of the paper we describe the work done at each of the three layers.First,the translation of Simulink to SCADE/Lustre.Second,extensions to SCADE/Lustre for specifying code distribution and real-time constraints. Third,implementation of SCADE/Lustre on the distributed time-triggered platform TTA.Related workA number of approaches exist at various levels of the de-sign chain,but very few are end-to-end.[18]report on an approach to co-simulate discrete controllers modeled in the synchronous language Signal[5]along with continuous plants modeled in Simulink.[17]use a model-checker to verify a Simulink/Stateflow model from the automotive do-main,however,they translate their model manually to the input language of the model-checker.[2]present tools for co-simulation of process dynamics,control task execution and network communication in a distributed real-time control system.[8]report on translating Simulink to the SPI model,a model of concurrent processes communicating with FIFO queues or registers.The focus seems to be the preservation of value over-writing which can occur in multi-rate systems when a“slower”node receives input from a“fast”one. 
Giotto[7]is a time-triggered programming language,sim-ilar in some aspects to SCADE/Lustre.The main differ-ences is the logical-time semantics of SCADE/Lustre ver-sus real-time semantics of Giotto,and the fact that Giotto compilation is parameterized by a run-time scheduler,while in SCADE/Lustre scheduling is done once and for all at compile-time.MetaH[1]is an architecture description lan-guage and associated tool-suite.It uses an“asynchronous”model based on the Ada language,and real-time scheduling techniques such as rate-monotonic scheduling[14]to analyze the properties of the implementation.Annotations of programming languages with external in-formation,as we propose here for Lustre,is also sometimes undertaken in aspect-oriented programming approaches(e.g., as in[15]).Naturally,our work on scheduling is related with the vast literature on job-shop-like scheduling or real-time schedul-ing.However,we could notfind scheduling techniques that can deal with relative deadlines,as we do here.Another originality of our scheduling problem is the periodicity con-straints imposed by the TTA bus.2.OVERVIEW OF THE THREE LAYERS 2.1A short description of SimulinkSimulink is a module of Matlab for modeling dataflow transfer laws.The Simulink notation and interface are close to the control engineering culture and knowledge.The user does not need any particular knowledge of software engi-neering.The control laws are designed with mathematical tools.The validation is made through frequency analysis and simulation.For more details,the reader can look at the MathWorks web site().Most impor-tant features of Simulink,which also affect the translation to SCADE/Lustre,are described in Section3.2.2A short description of SCADE/Lustre SCADE is a graphical environment commercialized by Es-terel Technologies.It is based on the synchronous language Lustre.A Lustre program essentially defines a set of equa-tions:x1=f1(x1,...,x n,u1,...,u m)x2=f2(x1,...,x n,u1,...,u m)...where x i are internal or output variables,and u i are input variables.The variables in Lustre denoteflows.Aflow is a pair(v,τ),where v is an infinite sequence of values and τis an infinite sequence of instants.A value has a type. All values of v have the same type,which is the type of the flow.Basic types in Lustre are boolean,integer and real. Composite types are defined by tuples of variables of the same type(e.g.,(x,y),where x and y are integers).An instant is a natural number.Ifτ=0,2,4,···,then the understanding is that theflow is“alive”(and,therefore,its value needs to be computed)only on the even instants.The sequenceτcan be equivalently represented as a booleanflow, b0,b1,···,with the understanding thatτis the sequence of indices i such that b i=true.The functions f i are made up of usual arithmetic oper-ations,control-flow operators(e.g.,if then else),plus a few more operators,namely,pre,->,when and current. 
pre is used to give memory (state) to a program. More precisely, pre(x) defines a flow y such that the value of y at instant i is equal to the value of x at instant i−1 (for the first instant, the value is undefined).

-> initializes a flow. If z = x -> y, then the value of z is equal to the value of x at the first instant, and equal to the value of y thereafter. The operator -> is typically used to initialize a flow whose value is undefined at the first instant (e.g., a flow obtained by pre). For example, a counter of instants is defined by the equation x = 0 -> (pre(x) + 1), instead of x = pre(x) + 1, which leaves it undefined.

when is used to sample a flow, creating a flow which lives less frequently than the original flow: x when b, where x is a flow and b is a boolean flow which lives at the same instants as x, defines a flow y which lives only when b is true, and has the same value as x on those instants.

current is used to extend the life of a sampled flow y to the instants of the flow which originally gave birth to y, by the usual sample-and-hold rule: current(y), where y is a flow sampled from x, is a flow which lives on the same instants as x, has the value of y on the instants when y is alive, and keeps its previous value during the instants when y is not alive.

Structure is given to a Lustre program by declaring and calling Lustre nodes, in much the same way as, say, C functions are declared and called. Here is an example of node declaration in Lustre:

node A(b: bool; i: int; x: real) returns (y: real);
var j: int; z: real;
let
  j = if b then 0 else i;
  z = B(j, x);
  y = if b then pre(z) else C(z);
tel.

A is a node taking as inputs a boolean flow b, an integer flow i and a real flow x, and returning a real flow y. A uses internal flows j and z (with usual scope rules). The body of A is declared between the let and tel keywords. A calls node B to compute z and node C to compute y (conditionally). B and C are declared elsewhere.

Clock calculus

Given a flow x = (v, τ), τ is called the clock of x, and is denoted clock(x). The clock calculus is a typing mechanism which ensures that the Lustre program has a well-defined meaning. For example, we cannot add two variables x and y unless they have the same clock (i.e., they are alive at the same instants): otherwise, what would the result of x + y be on an instant where x is alive and y is not alive?

All input variables have the same clock, which is called the basic clock, denoted basic, and given by the sequence of instants 0, 1, 2, ··· or, equivalently, the boolean flow true, true, true, ···.

A simplified version of the clock calculus of Lustre is shown in Table 1. By convention, clock(basic) = basic. If the constraints on clocks are not satisfied, then the program is rejected by the Lustre compiler.

Partial order

A Lustre program defines a partial order on the variables x_i (denoted by →), expressing variable dependencies at each instant: if x_i → x_j, then x_j depends on x_i, that is, in order to compute the value of x_j at a given instant, we need the value of x_i at that instant. The partial order is well-defined, since the compiler ensures that there are no cyclic dependencies on the variables x_i: all cycles must contain at least one pre operator, which means that the dependency is not on the same instant, but on previous instants. Programs with cyclic dependencies on the same instant are rejected by the Lustre compiler.

2.3 A short description of TTA

TTA [11] (Time Triggered Architecture) is a distributed, synchronous, fault-tolerant architecture. It includes a set of computers (TTA nodes) connected through a bus. Each TTA node is equipped with a network card implementing the time triggered protocol [12] (TTP).
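To make the interplay of pre, ->, when and current more concrete, the following small node is our own illustrative sketch (not an example from the paper): it derives an alternating boolean flow from the basic clock, samples its input on the even instants, and extends the sampled flow back to the basic clock by sample-and-hold.

node sample_and_hold(x: int) returns (held: int);
var even: bool;   -- true, false, true, false, ... on the basic clock
    slow: int;    -- x sampled on the even instants
let
  even = true -> not pre(even);   -- pre gives memory, -> supplies the initial value
  slow = x when even;             -- clock(slow) = even: slow lives only when even is true
  -- note: an expression such as x + slow would be rejected by the clock calculus,
  -- since x and slow do not live on the same instants
  held = current(slow);           -- back on the clock of x, holding the last sampled value
tel.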
TTP provides a number of services to the nodes, including clock synchronization, group membership and faulty node isolation.

The programs running on each TTA node use the TTP controller to communicate with programs running on other nodes. Communication is time triggered. This means that there is an a-priori global schedule specifying which TTA node will transmit which message at what time. This schedule, called the message description list (MEDL), is constructed off-line and loaded on each TTP controller before operation starts. The MEDL ensures that no two TTA nodes transmit at the same time (given the correct functioning of the clock synchronization protocol). Therefore, no on-line arbitration is required (contrary, for example, to the CAN bus). The TTP controller and the CPU of the TTA node are linked through the computer-network interface (CNI): this is essentially a shared memory where the contents of each message are stored. The programs running on a TTA node read/write on the CNI independently of the TTP controller, which reads and writes according to the MEDL (i.e., when it is time for this node to send/receive a message, the TTP controller will read/write the corresponding part of the CNI).

The MEDL describes the operation of the bus on the global, common time axis produced by the clock synchronization protocol. Time is divided into cycles, rounds and slots (Figure 2). Cycles are repeated as long as the system runs, in exactly the same way. Each cycle contains a number of rounds and each round a number of slots. Rounds have the same duration, whereas slots within a round may have different durations. Each slot is assigned to a TTA node, meaning that (only) this node transmits during that slot. Within a slot, a node transmits a frame, which contains one or more messages. The messages are broadcast, meaning every other node can read them. The difference between rounds is that the frames of a given slot need not be the same among different rounds of a cycle. For example, if slot 1 is assigned to node A, A may transmit frame X in slot 1 of round 1 and frame Y in slot 1 of round 2. However, operation among different cycles is identical (i.e., A will transmit X in slot 1 of round 1 of cycle 1, of cycle 2, of cycle 3, and so on).

3. FROM SIMULINK TO SCADE/LUSTRE

In our approach, we start with a Simulink model consisting of two parts: a discrete-time part describing the controller and a continuous-time (or perhaps discrete-time) part containing the environment in which the controller is supposed to operate. Modeling both the controller and the environment is of course essential for studying the properties of the controller by simulation. Once the designer is happy with the results, the implementation of the controller can start. The first step in the implementation process is translating the controller part of the Simulink model to SCADE/Lustre. In this section we describe this translation. We begin by pointing out the main differences of the two languages and which subset of Simulink can be handled by our translation.
Then we present the main principles of the translation.

Let us fix some terminology. For Simulink, we will use the term block for a basic block (e.g., an adder, a discrete filter, a transfer function, etc.) and the term subsystem for a composite (a set of blocks or subsystems linked by signals). The term system is used for the root subsystem. For SCADE/Lustre, we will use the term operator for a basic operator (e.g., +, pre, etc.) and the term node for a composite.

expression e   | clock(e)        | constraints                     | comments
input x        | basic           |                                 | similarly for constants
x + y          | clock(x)        | clock(x) = clock(y)             | similarly for -, ->, if then else, etc.
pre(x)         | clock(x)        |                                 |
x when b       | b               | clock(x) = clock(b), b boolean  |
current(x)     | clock(clock(x)) |                                 |
Table 1: Clock calculus of Lustre.

Figure 2: Operation of a TTA bus in time. [two cycles, each containing two rounds; each round divided into slots 1–4 assigned to nodes]

3.1 Simulink and SCADE/Lustre

Both Simulink and SCADE/Lustre allow the representation of signals and systems, more precisely, multi-periodic sampled systems. The two languages share strong similarities, such as a data-flow language¹, similar abstraction mechanisms (basic and composite components) and graphical description. However, there are several differences:

(1) SCADE/Lustre has a discrete-time semantics, whereas Simulink has a continuous-time semantics. It is important to note that even the "discrete-time library" Simulink blocks produce piece-wise constant continuous-time signals².

(2) SCADE/Lustre has a unique, precise semantics. The semantics of Simulink depends on the choice of a simulation method. For instance, some models are accepted if one chooses a variable-step integration solver and rejected with a fixed-step solver.

(3) SCADE/Lustre is a strongly-typed system with an explicit type set on each flow. In Simulink, explicit types are not mandatory. A type-checking mechanism exists in Simulink (some models are rejected due to type errors) but, as with the execution semantics, it can be modified by the user by setting some "flags".

(4) SCADE/Lustre is modular in certain aspects, whereas Simulink is not: for instance, a Simulink model may contain implicit inputs (the sampling periods of a system and its sub-systems, which are not always inherited).

Given the above differences, the goals and limitations of our translation are described below.

3.2 Translation goals and limitations

(1) We only translate a discrete-time, non-ambiguous part of Simulink. In particular, we do not translate blocks of the continuous-time library, S-functions, or Matlab functions. The Simulink model to be translated is assumed to be (part of) the controller embedded in a larger model (including the environment).

(2) The translation is faithful only with respect to the following simulation method: "solver: fixed-step, discrete" and "mode: auto".

(3) The SCADE/Lustre program must be run at the time period the Simulink model was simulated. Thus, an outcome of the translation must be the period at which the SCADE/Lustre program shall be run (i.e., the period of the basic clock). To know the period at which the Simulink model was simulated, we assume that for every external input of the model to be translated the sampling time is explicitly specified.

(4) We assume that the Simulink model to be translated has the "Boolean logic signals" flag on³.

¹ The foundations of data-flow models were laid by Kahn [9]. Various such models are studied in [13].
² Thus, in general, it is possible to feed the output of a continuous-time block into the input of a discrete-time block and vice-versa.
Then, a requirement on the translator is to perform exactly the same type inference as Simulink. In particular, every model that is accepted by Simulink must also be accepted by the translator and vice versa.

(5) Simulink has an "algebraic loop" detection mechanism, but it allows the user to disable it (no detection performed) or to partially enable it (produce a warning in case a loop is detected, but accept the model). We assume that the user has set this flag so that models with algebraic loops are rejected. This corresponds to Lustre restrictions, where cyclic variable dependencies on the same logical instant are not allowed.

(6) For reasons of traceability, the translation must preserve the hierarchy of the Simulink model as much as possible.

It should also be noted that Simulink is a product evolving in time. This evolution has an impact on the semantics of the tool. For instance, earlier versions of Simulink had weaker type-checking rules than current ones. We have developed and tested our translation method and tool with Simulink 4.1 (Matlab 6, release 12). All examples given in this report refer to this version as well.

³ This flag yields a stricter type checking in Simulink; for instance, logical blocks accept and produce only booleans.

3.3 Translation scheme

The translation is done in three steps. The first two steps are type inference and clock inference. They are independent and can thus be performed in any order. The third step is the translation per se. It is performed hierarchically in a bottom-up manner. Due to space limitations, we only give an overall description of each step. Details will be given in subsequent reports.

Constant_α : α,                              α ∈ SimNum   (1)
Adder      : α × ··· × α → α,                α ∈ SimNum   (2)
Relation   : α × α → boolean,                α ∈ SimNum   (3)
Log. Op.   : boolean × ··· × boolean → boolean            (4)
DTF        : double → double                              (5)
DTC_α      : β → α,                          α, β ∈ SimT  (6)
Figure 3: Types of some Simulink blocks.

3.3.1 Type inference

There are three basic types in SCADE/Lustre: bool, int and real. Each flow has a declared type and operations between different types are not allowed: for example, we cannot add an int with a real⁴. In Simulink, types need not be explicitly declared. However, Simulink does have typing rules: some models are rejected because of type errors. The objective of the type inference step is to find the type of each Simulink signal, which will then be used as the type of the corresponding SCADE/Lustre flow.

Simulink provides the following data types: boolean, double, single, int8, uint8, int16, uint16, int32, uint32. Informally, the type system of Simulink can be described as follows. By default, all signals are double, except when: either the user explicitly sets the type of a signal to another type (e.g., by a Data Type Converter block or by an expression such as single(23.4)); or a signal is used in a block which demands another type (e.g., all inputs and outputs of Logical Operator blocks are boolean).

We can formalize the above type system as follows. Denote by SimT the set of all Simulink types. Let SimNum = SimT − {boolean}. Then, every Simulink block has a (polymorphic) type, according to the rules shown in Figure 3. "Log. Op." stands for Logical Operator blocks, "DTF" for Discrete Transfer Function and "DTC" for Data Type Converter. The type of a Simulink subsystem (or the root system) A is defined given the types of the subsystems or blocks composing A, using a standard function composition rule.
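To illustrate the effect of the type mapping, consider a hypothetical subsystem (not one of the paper's examples) whose inputs are a signal feeding a Logical Operator block (hence boolean), an int32 counter, and an unannotated signal (hence double by default). After type inference, its generated interface could look along these lines; the names and the body are our own assumptions.

node Controller(enable: bool;   -- Simulink boolean (drives a Logical Operator block)
                count:  int;    -- Simulink int32
                speed:  real)   -- Simulink double (the default type)
returns (cmd: real);
let
  cmd = if enable and (count > 0) then speed else 0.0;
tel.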
Type inference is done using a standard fix-point computation on an appropriate type lattice. Once the types of Simulink signals are inferred, they are mapped to SCADE/Lustre types: boolean is mapped to bool; int8, uint8, int16, uint16, int32 and uint32 are mapped to int; single and double are mapped to real⁴.

3.3.2 Clock inference

As mentioned above, a SCADE/Lustre program has a unique basic clock. "Slower" clocks are obtained from the basic clock using the when operator. The clock of every signal in SCADE/Lustre is implicitly calculated by the compiler, which ensures that operations involve only flows of the same clock⁵. Thus, clocks can be seen as extra typing information.

Discrete-time Simulink signals may also contain timing information, called "sample time", consisting of a period and an initial phase. The sample time of a signal specifies when the signal is updated. A signal x with period π and initial phase θ is updated only at times kπ + θ, for k = 0, 1, 2, ..., that is, it remains constant during the intervals [kπ + θ, (k+1)π + θ). Sample times can be set in input signals and discrete-time blocks and they also serve as an extra type system in Simulink: some models are rejected because of timing errors.

Another timing mechanism of Simulink is by means of "triggers". Only subsystems (not basic blocks) can be triggered. A subsystem can be triggered by a signal x (of any type) in three ways, namely "rising", "falling" or "either", which specify the moment the trigger occurs w.r.t. the direction with which x "crosses" zero (with boolean true identified with 1 and false with 0). The sample time of blocks and subsystems inside a triggered subsystem cannot be set by the user: it is "inherited" from the sample time T of the triggering signal. The sample times of the input signals must all be equal to T. The sample time of all outputs is also T. Thus, in the example shown in Figure 4, the sample times of s, x1, x2 and y are all equal.

Figure 4: A triggered subsystem.

In what concerns triggered subsystems, Simulink is as modular as Lustre, where a node B called inside a node A cannot construct a "faster" clock than the basic clock of A (i.e., the clock of its first input). However, Simulink allows the sample time of a subsystem B embedded into a subsystem A to be anything. For instance, the period of A can be 2 while the period of B is 1 (thus, although B is "called" within A, B is "faster" than A). We consider this a non-modular feature of Simulink.

The objective of clock inference is to compute the period and phase of each Simulink signal, block and subsystem, and use this information when creating the corresponding Lustre flows and nodes and when defining the period at which the SCADE/Lustre program must be run. We now give some examples on how this is done.

Figure 5: A Zero-order Hold block modifying the period of its input.

Consider the Simulink model of Figure 5 and assume that the period of input x is 1 and that the period set on the Zero-order Hold block is 2⁶. Then, the output y has period 2 and, in the generated SCADE/Lustre program, it will be defined as y = x when b1/2, where b1/2 is the boolean flow true false true false ···. Now, if 1 is the smallest period in the entire Simulink model, this will also be the period at which the generated SCADE/Lustre program must be run. In the SCADE/Lustre program, clock(x) = basic and clock(y) = b1/2.

⁴ Predefined casting operators int2real or real2int can be used if necessary.
⁵ Since checking whether two boolean flows are equal is generally undecidable, clock checking in SCADE/Lustre is syntactic.
⁶ Unless otherwise mentioned, we assume that phases are 0.
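A sketch of the code that could result from the Zero-order Hold example above follows; the node shape and the definition of b1/2 as an alternating flow are our own illustration, and the translator's actual output may differ in naming and structure.

node ZOH_example(x: real) returns (y: real);
var b1_2: bool;   -- b1/2: true, false, true, false, ... on the basic (period-1) clock
let
  b1_2 = true -> not pre(b1_2);   -- one instant out of two
  y    = x when b1_2;             -- y has period 2: clock(y) = b1_2
tel.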
As another example, consider a subsystem A with two inputs x1 and x2, with periods 2 and 3, respectively. If these are the only periods in the Simulink model, then the period of the SCADE/Lustre program must be the greatest common divisor (GCD) of 2 and 3, that is, 1. This will also be the period of the outputs of A. This is because the shortest delay between two changes of the inputs (thus, activations of A) is 1, and the outputs must also be periodic.

3.3.3 Hierarchical translation

Logically, a Simulink model is organized as a tree, where the children of a subsystem are the subsystems (or blocks) directly embedded in it. The translation is performed following this hierarchy in a bottom-up fashion (i.e., starting from the basic blocks). For traceability, naming conventions are used, such as suffixing by an index or using the name path along the tree.

Simple basic Simulink blocks (e.g., adders, multipliers, the 1/z transfer function) are translated into basic SCADE/Lustre operators. For example, an adder is simply translated into + and 1/z is the Lustre pre operator. More complex Simulink blocks (e.g., discrete filters) are translated into SCADE/Lustre nodes. For example, the transfer function (z + 2)/(z² + 3z + 1) is translated into the Lustre code:

node Transfer_Function_3(E: real) returns (S: real);
var Em_1, Em_2, Sm_1, Sm_2: real;
let
  S    = 1.0*Em_1 + 2.0*Em_2 - 3.0*Sm_1 - 1.0*Sm_2;
  Em_1 = 0.0 -> pre(E);
  Em_2 = 0.0 -> pre(Em_1);
  Sm_1 = 0.0 -> pre(S);
  Sm_2 = 0.0 -> pre(Sm_1);
tel.

A Simulink subsystem is translated into a SCADE/Lustre node, possibly containing calls to other nodes. For example, the Simulink model shown in Figure 6 (Simulink system A with subsystem B) will be translated into two Lustre nodes, A and B, where node A calls node B.
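Since the block contents of system A and subsystem B in Figure 6 are not shown, the following is only a schematic sketch, with invented block contents, of how the subsystem hierarchy becomes a node call.

node B(u: real) returns (v: real);
let
  v = 0.0 -> pre(u);    -- invented body: e.g., a unit delay inside subsystem B
tel.

node A(x: real) returns (y: real);
let
  y = B(x + 1.0);       -- the boundary of subsystem B becomes a call to node B
tel.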
4. EXTENSIONS OF LUSTRE

Before SCADE/Lustre programs can be implemented on TTA, a fundamental problem must be solved: how to relate the semantics of the SCADE/Lustre program with the semantics of TTA. This is necessary, since the SCADE/Lustre program has a logical-time semantics, whereas the TTA implementation operates in real-time. The set of extensions to Lustre that we propose in this section aims at bridging the two layers. The extensions allow the user to express what it means for an implementation to be correct. They can also be used by the compiler as directives for generating correct implementations. Thus, they facilitate both analysis (checking whether an implementation is correct) and synthesis (automatically building correct implementations). Currently, the extensions are being prototyped in Lustre.

The extensions do not change the high-level (logical-time) semantics of Lustre. To ensure backward-compatibility, they are provided mainly as annotations (pragmas) which can be taken into account or ignored, depending on the version of the compiler used. The extensions follow the declarative style of Lustre.

4.1 Extensions

A set of code distribution primitives allows the user to specify which parts of the Lustre program are assigned to which TTA node. A set of timing assumption primitives allows the user to specify known facts about the implementation, such as the period of an external clock, the worst-case execution time (WCET) of a code block, or the transmission time of a message. A set of timing requirement primitives allows the user to specify properties that the implementation must satisfy, such as relative deadlines of the form "from the moment input x is read until the moment output y is written, at most k time units elapse". We give some examples of primitives and their usage in what follows.

4.1.1 Code distribution

The annotation location = P, where P is the name of a node in the distributed platform, is used to specify that a particular code block must be executed on P (at every instant). For example,

x = f(y) (location = P)
y = g(z) (location = Q)

says that x must be computed on P and y on Q. Note that this implies that y must be transmitted from Q to P before the computation of x can start.

4.1.2 Basic clock period and periodic clocks

The annotation

(hyp) basic period = p

declares that the period of the basic clock is p time units. This is a timing assumption. Time units are implicit, but they have to be consistent throughout all declarations of timing assumptions.
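As a recap, the two kinds of annotations introduced above could be combined as in the following sketch. The nodes f and g, the platform nodes P and Q, the numeric period, and the exact placement of the declarations are our own assumptions; no timing-requirement annotation is shown because its concrete syntax has not been introduced at this point.

node dist_example(z: int) returns (x: int);
var y: int;
let
  x = f(y)  (location = P);   -- x is computed on platform node P
  y = g(z)  (location = Q);   -- y is computed on Q and must be sent to P
tel.

(hyp) basic period = 10
-- timing assumption: the basic clock ticks every 10 time units; the unit is
-- implicit but must be consistent across all timing assumptions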
