Towards a Logical Schema Integrating Software Process Modeling and Software Measurement


Autodesk Open AEC Ecosystem Guide


Keys to an open AEC ecosystem3 4 6 7 12121“ M ore than ever, we need to work together across teams, tools, and industries to tackle the challenges of our collective future. This is why Autodesk is committed to an open and interoperable software ecosystem defined by seamless data connection.”– A my Bunszel, EVP AEC Design Solutions, AutodeskScreen dataset courtesy of BNIMAs BIM mandates mark the transformation of the AEC industry, the prospect of eliminating data-sharing bottlenecks and creating more seamless ways of collaborating comes closer to reality.Autodesk has a long history of developing more open ways of working through BIM, chief among them an embrace of open data standards for better software interoperability and project team collaboration.Back in 1994, Autodesk was part of a founding group of companies that prioritized the creation of an industry collective to define and progressively advance open, vendor-neutral datastandards for working collaboratively in BIM. Today, buildingSMART International ® supports the advancement of openBIM ® and the implementation of open standards through a focused set of services and programs, from advocacy and awareness to training and software certification to thought and technical leadership.Now, as a member of the buildingSMART International ® Strategic Advisory Council, Autodesk is active in the technical debates that shape the evolution of openBIM ® from a file-based method for data exchange toward a modern, cloud-based data management infrastructure.LEARN MORE >Committed to open data standardsData in a common languageAs part of our long-standing commitment to cross-platform interoperability, we continue to ensure that our portfolio of products meets the rigorous certification standards defined by the openBIM® process.IFC4 Export CertificationAutodesk Revit has received dual IFC4 Export Certification for architecture and structural exports, making it the first BIM platform to earn both certifications. We are committed to supporting IFC across all disciplines, including the IFC 4.3 schema, now in pilot implementation for infrastructure. The buildingSMART International® StrategicAdvisory CouncilAs a member of the council, we help support openBIM®standards and adoption through technical andstrategic guidance and in conversation with the globalcommunity of openBIM® adopters and advocates.Open Design AllianceOur partnership with Open Design Alliance gives usaccess to ODA’s IFC toolkit, allowing us to integratenew versions as they become ratified.Helping AEC BIM workflows with free Autodesk add-insIn addition to open data standards, Autodesk provides and maintains free add-ins to support better data exchange between architects, engineers, contractors, and owners working in BIM.LEARN MORE >COMMON DATA ENVIRONMENTS Common data for all As the AEC industry becomes increasingly complex and data-driven, managing complexity through effective collaboration within project teams is key to streamlining design and delivery.Common data environments harness the full collaborative potential and productivity of AEC project teams from design to construction.A CDE ensures that project and design data are available, accessible, and interchangeable to project stakeholders and contributors by unifying and standardizing BIM processes within a framework of rules and best practices. 
And not only can a CDE improve data and communication flows for project teams, but it can also assist owners and facility managers by providing a comprehensive record of the project at handoff and a rich dataset for the building, bridge, or road starting the next chapter in operation.Autodesk Docs provides a cloud-based common data environment that can support standard information management processes such as ISO-19650 across the complete project lifecycle. ISO19650 defines effective information management for working in BIM collaborative processes for multi-disciplinary project teams and owners.LEARN MORE ABOUT CDE IN AUTODESK DOCS >“ F orge’s interoperability means everything to us. It saved us the many months it would have taken to find workarounds for so many data formats and accelerated time to market for our product.”- Zak MacRunnels, CEO, ReconstructLINK TO STORY >APIs extend BIM innovation An ever-growing community of product experts and professional programmers customize Autodesk products by creating add-ins that enhance productivity. Even writing just a few simple utilities to automate common tasks can greatly increase team or individual productivity. Both the APIs for developing add-ins and extensions and the resources for using them are public and available for anyone to use.THE AUTODESK DEVELOPER NETWORKMany professional software developers rely on the Autodesk Developer Network (ADN) to support software development and testing and help market their solutions. The ADN, moderated by Autodesk software engineers, offers blogs, forums, and events to support the growing app developer ecosystem. The Autodesk App Store features content libraries, e-books, training videos, standalone applications, and other CAD and BIM tools built by this professional development community.LEARN MORE >AUTODESK AEC INDUSTRY PARTNERSA key benefit of Autodesk’s support for developers is the emergence of a vibrant community of Autodesk AEC Industry Partners. Autodesk AEC Industry Partners are third-party technology and service providers that work with Autodesk to deliver discipline-specific regional solutions, extending out-of-the-box software capabilitiesto help solve targeted business challenges.LEARN MORE >Dynamo is a visual programming language that democratizes access to powerful development tools. It empowers its users by allowing them to build job-, industry-, and practice-specific computational design tools through a visual programming language that can be less daunting to learn than others. It brings automation to CAD and BIM processes and builds connections between workflows, both within and outside the Autodesk portfolio of solutions. DynamoPlayer, available with Revit and Civil 3D, allows for the sharing of computational design scripts for use by non-coders. Dynamo is powered by the ingenuity and passion of its user community. Their contributions of code and documentation and their embrace of an open-source ethos have expanded the horizon of what is possible in BIM computation.LEARN MORE ABOUT DYNAMO >Open source in actionFor better interoperability, there is no going it alone. Partnerships allow bonds to build, ideas to get tested, prototypes to launch, innovations to accelerate, industries to converge, and people to work collectively to make an impact. Collaboration across platforms and industriesNVIDIA OMNIVERSEWe’ve joined forces with leaders across design, business, and technology to explore and create within NVIDIA’sOmniverse. 
Built on Pixar's open-source Universal Scene Description format, it provides real-time simulations and cross-industry collaboration in design and engineering production pipelines. LEARN MORE >

UNITY
By integrating Unity's 2D, 3D, VR, and AR technologies with Autodesk design tools like Revit, 3ds Max, and Maya, AEC professionals can quickly create, collaborate, and launch real-time simulations from desktop, mobile, and hand-held devices. LEARN MORE >

ESRI
We're working with ESRI to integrate BIM and GIS processes, enabling a more efficient exchange of information between horizontal and vertical workflows, minimizing data loss, and enhancing productivity with real-time project insights. LEARN MORE >

BRIEF HISTORY AND RESOURCES
Milestones in Autodesk's open-ecosystem history include:
- Autodesk and Bentley sign interoperability agreement
- Autodesk makes Revit's IFC import/export toolkit available as open source
- IFC4 is released and integrated into Revit
- Announces partnership with Unity to better integrate design and simulation
- Autodesk Docs extends support for ISO 19650 Common Data Environment (CDE) workflows
- Autodesk and others pilot implementation of IFC 4.3 for infrastructure workflows
- Autodesk develops DXF, an early open file format
- Acquires Revit and begins developing the predecessor to IFC
- Co-founds buildingSMART International® in partnership with other industry leaders*
- buildingSMART International® establishes openBIM®
- Adds STL export in Revit and releases open-source STL plugin
- Revit adds COBie Extension
- IFC is integrated into Autodesk Inventor®
- Joins Open Design Alliance
- Autodesk and Trimble® sign interoperability agreement
- Receives IFC4 export certification for Revit for Architecture and Structure
- Autodesk Navisworks adds COBie Extension
- Announces collaboration with NVIDIA on Omniverse
- Announces partnership with ESRI, integrating GIS and BIM processes

*Founded as "Industry Alliance for Interoperability" and renamed "International Alliance for Interoperability" in 1996 before becoming buildingSMART International® in 2006.

Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, specifications, and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. ©2021 Autodesk, Inc. All rights reserved.

Database Principles: Basic Concepts Explained in English


Database principles refer to the fundamental concepts that define the structure, functionality, and management of a database system. These principles are essential for designing, implementing, and maintaining a reliable and efficient database. In this essay, I will discuss the basic concepts and principles of databases in detail, including data modeling, data integrity, normalization, indexing, and database transactions.

Data Modeling:
Data modeling is the process of defining the structure and relationships of the data in a database. It involves identifying and organizing the various entities, attributes, and relationships that exist within the domain of an application. There are different types of data models, such as the conceptual, logical, and physical data models. The conceptual data model describes the high-level view of the data, the logical data model represents the data structure using entities, attributes, and relationships, and the physical data model maps the logical data model to a specific database management system.

Data Integrity:
Data integrity ensures the accuracy, consistency, and reliability of data stored in a database. It ensures that the data values conform to defined rules or constraints. There are four types of data integrity: entity integrity, referential integrity, domain integrity, and user-defined integrity. Entity integrity ensures that each row in a table has a unique identifier. Referential integrity ensures that relationships between tables are maintained. Domain integrity ensures that data values are within certain predefined ranges. User-defined integrity ensures that additional business rules or constraints are enforced.

Normalization:
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves breaking down larger tables into smaller, more manageable entities and establishing relationships between them. The normalization process follows a series of normal forms, such as First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Each normal form has a set of rules that must be satisfied to ensure data integrity and eliminate data anomalies, such as update, insertion, and deletion anomalies.

Indexing:
Indexing is a technique used to improve the efficiency of data retrieval operations in a database. It involves creating an index on one or more columns of a table, which allows the database system to locate specific rows quickly using the indexed column(s). Indexes are typically implemented as B-trees and provide a faster search mechanism by reducing the number of disk I/O operations required to locate data. Indexes should be carefully designed and maintained to balance the trade-off between query performance and the overhead of maintaining the index.

Database Transactions:
A database transaction is a sequence of one or more operations executed as a single logical unit of work: either all of its operations take effect, or none of them do. Transactions are expected to satisfy the ACID properties of atomicity, consistency, isolation, and durability. Atomicity guarantees all-or-nothing execution, consistency guarantees that a transaction moves the database from one valid state to another, isolation ensures that concurrently executing transactions do not interfere with one another, and durability ensures that committed changes survive system failures. Databases typically expose transactions through commands such as BEGIN, COMMIT, and ROLLBACK.
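The integrity constraints, indexing, and transaction concepts described above can be made concrete with a few lines of code. The following is a minimal sketch using Python's built-in sqlite3 module; the tables, columns, and values are invented for illustration only, and any relational database would express the same ideas in its own SQL dialect.

```python
import sqlite3

# In-memory database for illustration; a file path works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity in SQLite

# Entity integrity: PRIMARY KEY. Referential integrity: FOREIGN KEY.
# Domain integrity: CHECK constraint on the amount column.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL CHECK (amount >= 0)
);
""")

# Indexing: a secondary index on orders.customer_id speeds up lookups
# at the cost of extra maintenance work on every insert and update.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# Transaction: both inserts commit together or not at all (atomicity).
try:
    with conn:  # the connection context manager issues COMMIT or ROLLBACK
        conn.execute("INSERT INTO customers(customer_id, name) VALUES (1, 'Ada')")
        conn.execute("INSERT INTO orders(customer_id, amount) VALUES (1, 42.0)")
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

print(conn.execute(
    "SELECT name, amount FROM orders JOIN customers USING(customer_id)").fetchall())
```

Running the sketch prints [('Ada', 42.0)]; replacing the amount with a negative value trips the CHECK constraint and the whole transaction is rolled back, leaving both tables unchanged.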

The relationship between reading comprehension and critical thinking


The relationship between reading comprehension ORIGINAL ARTICLEand critical thinking:A theoretical studyAbdulmohsen S.AloqailiKing Saud University,College of Education,Department of Curriculum and Instruction,Saudi Arabia Received 11November 2010;accepted 11January 2011Available online 31October 2011KEYWORDSSchema theory;Reading comprehension;Critical thinking;Cognitive development processesAbstract The main purpose of the present study is to review and analyze the relationship between reading comprehension and critical thinking.The specific theatrical issues being discussed include schema theory as a rational premise for the connection between reading comprehension and critical thinking,cognitive development processes,critical thinking:its nature and definitions,critical thinking:skills and dispositions,and critical thinking and reading comprehension.The results revealed that:(1)there is well established relationship between reading comprehension and critical thinking,(2)schema theory provides a rational premise for that relation,and (3)there is no con-sensus regarding the definition of critical thinking which might be interpreted as a lack of an accepted framework for critical thinking.ª2011King Saud University.Production and hosting by Elsevier B.V.All rights reserved.1.IntroductionIn recent years,the field of research on reading comprehension and critical thinking has received much attention and became a popular area in cognitive psychology.Modern cognitivists have developed new trends and theories that provide theoretical models for explaining and conceptualizing reading comprehen-sion by utilizing a set of related concepts,such as criticalthinking,prior knowledge,inference-making,and metacogni-tive skills (Limbach and Waugh,2010;Zabit,2010).Among these trends is schema theory,which is considered to be a theory about knowledge:how knowledge is represented and or-ganized,and how that representation and organization facili-tates the use of a reader’s prior knowledge to improve reading comprehension.A schema is the organized knowledge that one has about people,objects,places,events,processes,concepts,and virtually everything that provides a basis for learning (Rumelhart,1984).Bos and Anders (1990)stated that,‘‘Schema theory explains how knowledge is structured in mem-ory and how these structures affect incoming information’’(p.49).Anderson and Wilson (1986)indicated that schema theory explains how people’s existing knowledge affects comprehension.Critical thinking can be considered as means to activate or construct schema.Norris and Phillips (1987)indicate that crit-ical thinking provides an explanation for activating existing schemata and for constructing new ones by contrasting ideas and engaging in reflective thinking.Moreover,Collins et al.E-mail address:aloqaili@.sa2210-8319ª2011King Saud University.Production and hosting by Elsevier B.V.All rights reserved.Peer review under responsibility of King Saud University.doi:10.1016/j.jksult.2011.01.001(1980)count inference-making as a way to activate schemata in terms offilling in the missing connections between the surface structure fragments of the text by recourse to content and knowledge about the world.McNeil(1992)asserts that schema theory has special rele-vance for teachers of reading comprehension in that it ques-tions the traditional view that students should learn to reproduce the statements being read in the text.In contrast to this older view of reading comprehension,schema theory stresses an interactive approach that views 
teaching reading comprehension as a process,meaning that students are taught techniques for processing text,such as making inference,acti-vating prior knowledge,and using critical thinking(McNeil, 1992;Aloqaili,2005a;Orbea and Villabeitia,2010).Tierney and Pearson(1986)explain that schema theory has the major influence on new views of reading and reading com-prehension.They stated that:New views have forced us to rethink the act of reading.Fora long time we thought reading was the reproduction of theideas on the page;our goal was to have students produce a ‘‘photocopy’’of the page.Schema theory has moved us away from a reproductive view to a constructive view.In that view,the reader,rather than the text moves to the cen-ter of the construction process(p.3).According to schema theory,there are no definitive orfinal conclusions that can be reached for the text(Norris and Phillips,1987;Yu-hui et al.,2010).That is,schema theory deals with the reading comprehension as an interactive process between readers’prior knowledge and the text being read. Sometimes a reader may end up with a different understand-ing,based on his or her total previous experiences:their rich-ness or paucity.Therefore a reader with a rich background will comprehend better than one who has a poorer back-ground.In short,schema theory believes in open text or con-text.The interpretation is relative(Aloqaili,2005b).For the purpose of the study,reading comprehension can be defined as the meaning constructed as a result of the complex and interactive processes relating a reader’s critical thinking,prior knowledge,and inference-making.2.Cognitive development processesPiaget(1952)presented three cognitive processes which he used to explain how and why cognitive/concept development occurs.These processes are assimilation,accommodation, and equilibrium.Marshall(1995)believes that Piaget made a key contribution to schema theory with his focus on how sche-mata develop and change.So,the following section will be devoted to a brief explanation of Piaget’s work related to cog-nitive development processes.Piaget(1970)proposed that cognitive growth occurs when the learner establishes mental categories(schemata)comprised of concepts about subjects and events sharing some general or spe-cific features.He views schemata as cognitive structures by which individuals intellectually adapt to and organize the environment.Piaget(1952)provides three cognitive mechanisms which interpret how children develop,acquire,classify,or organize their schemata or cognitive structures.These cognitive pro-cesses or mechanisms are assimilation,accommodation,and equilibration.2.1.AssimilationAccording to Piaget(1952)assimilation is a continuous process that helps the individual to integrate new,incoming stimuli into the existing schemata or concepts.That is,assimilation includes adding new information to old schemata.To illustrate the assimilation processes,Rubin(1997)presents an example of young children who tend to classify all similar four-footed animals as dogs;the children are assimilating.What they do is that they have assimilated all four-food animals into their existing schemata.Wadsworth(1996)points out that the assimilation process elaborates the size or growth of schemata,however,it does not alter them,and it is simply like adding air into a balloon.He stated the following:Assimilation theoretically does not result in a change of schemata,but it does affect the growth of schemata and is thus a part of development.One might compare a schema to a balloon 
and assimilation to putting more air in the bal-loon.The balloon gets larger(assimilation growth),but it does not change its shape.Assimilation is a part of the pro-cess by which the individual cognitively adapts to and orga-nizes the environment’’(p.17).Thus,assimilation allows for the growth of schemata by adding or taking in new information to old.However,the assimilation process does not change or create new schemata. Change and creation of schemata are the functions of another cognitive development process,which is accommodation.2.2.AccommodationPiaget(1952)indicates that accommodation is the process of developing new categories by a child rather than integrating them into existing ones.That is,accommodation is the way by which children create new schemata or change old ones with new information.Wadsworth(1996)explains that if the child meets with new stimulus that cannot be assimilated because there are no schemata into which the stimulus would fit,the alternative is either to construct new schemata in which to place the stimulus(a new index card in thefile),or change or modify the existing ones tofit with new stimulus.That is, accommodation has two aspects or forms:creation of new schemata or modification of old schemata with new ones.Wadsworth(1996)clarifies the difference between assimila-tion and accommodation by stating that‘‘accommodation accounts for development(a qualitative change)and assimila-tion accounts for growth(a quantitative change;together these processes account for intellectual adaptation and the develop-ment of intellectual structures’’(p.19).Rubin(1997)asserts that in spite of the importance of both assimilation and accommodation as a cognitive process devel-opment,children should be aware of making a balance between these two processes.Therefore,balancing between assimilation and accommodation is the function of the third cognitive mechanism,which is equilibrium.A brief explana-tion of equilibrium is provided below.2.3.EquilibriumAccording to Piaget(1952)equilibrium is a balance between the assimilation and accommodation processes.Wadsworth36 A.S.Aloqaili(1996)indicates that if a child overassimilates,he or she will end up with a few too large schemata,and will be unable to find out the differences in things,because most things seem similar to him or her.In contrast,if a child overaccommo-dates,he or she will have too many small schemata.This over-accommodation would prevent him or her from detecting similarities,because all things seem different to him or her.Rubin(1997)explains that a child with equilibrium process would be able to see similarities between stimuli and thus assimilate them,and also would be able to determine when new schemata are needed for adequate accommodation of a surplus of categories or schemata.3.Critical thinking:its nature and definitionsThe literature indicates that there is no consensus regarding the definition of critical thinking.A multiplicity and variation of definitions of critical thinking are reflective of the way in which educators and scholars define it(Aloqaili,2001;Minter, 2010).Romeo(2010)explains that there is currently a lack of an accepted framework for critical thinking,so that there is not a widely acknowledged and accepted theoretical definition. 
Some educators and psychologists deal with critical thinking as a narrow concept,whereas others view critical thinking as a broad concept.For example,Beyer(1987)defined critical thinking in a narrow sense as convergent thinking.He stated clearly that‘‘critical thinking is convergent’’(p.35),in contrast to creative thinking which is divergent.Beyer(1985)has argued that‘‘critical thinking is not a pro-cess at least not in the sense that problem solving or decision making are processes;that is,critical thinking is not a unified operation consisting of a number of operations through which one proceeds in sequence’’(p.303).Mcpeck(1981)has offered this broad definition for critical thinking,‘‘The propensity and skill to engage in an activity with reflective skepticism’’(p.8).Ennis(1993)criticizes Mcpeck’s definition because it focuses on‘‘reflective skepti-cism,’’and according to Ennis,‘‘critical thinking must get beyond skepticism’’(p.180).Ennis(1962)has dealt with crit-ical thinking with a narrow sense.He stated that critical think-ing is‘‘the correct assessing of statements’’(p.6).However, Ennis(1985)has replaced his narrower definition with the broader one which viewed critical thinking as‘‘reasonable, reflective thinking that is focused on deciding what to believe or do’’(p.46).One of the main differences between Ennis’definitions of critical thinking is that the broader definition includes creative elements,but the narrower one tried to exclude them.Ennis (1987)explains and analyzes his broader definition of critical thinking as follows:Critical thinking,as I think the term is generally used,is a practical reflective activity that has reasonable belief or action as its goal.There arefive key ideas here:practical,reflective, reasonable,belief,and action.They combine into the following working definition:Critical thinking is reasonable reflective thinking that is focused on deciding what to believe or do. Note that this definition does not exclude creative thinking. 
Formulating hypotheses,alternative ways of viewing a prob-lem,questions,possible solutions,and plans for investigating something are creative acts that come under this definition (p.10).Lewis and Smith(1993)indicate that although Ennis does not use the term problem solving in his definition of critical thinking,he refers to the usual steps in problem solving as cre-ative acts which are a part of his definition.In other words, Ennis separates critical thinking and problem solving while pointing out their interdependence in practice.Ennis(1987)explains that he abandoned his narrower def-inition of critical thinking‘‘because,although it provides more elegance in theorizing,it does not seem to be in accord with current usage’’(p.11).Kennedy et al.(1991)point out that current usage of the term‘‘critical thinking’’generally reflects Ennis’broad definition.According to Ennis(1993),for a person to reasonably and reflectively go about deciding what to believe or do,most of the following things characteristically must be done interdependently:Judge the credibility of sources.Identify conclusions,reasons,and assumptions.Judge the quality of an argument,including the acceptabil-ity of its reasons,assumptions,and evidence.Develop and defend a position on an issue.Ask appropriate clarifying questions.Plan experiments and judge experimental designs.Define terms in a way appropriate for the context.Be open-minded.Try to be well informed.Draw conclusions when warranted,but with caution.Another scholar who has provided a broad definition for critical thinking is Facione(1984)who developed a definition of critical thinking that incorporates evaluation and problem solving.Facione indicates that it is possible to evaluate critical thinking by evaluating the adequacy of the arguments that ex-press that thinking.He stated that‘‘critical thinking is the development and evaluation of arguments’’(p.259).Lewis and Smith(1993)point out that what is new in Facione’s definition is that he views critical thinking as an active process which involves constructing arguments,not just evaluating them.According to Facione(1984)construct-ing arguments include the usual steps of problem solving which are:(1)determining background knowledge,(2)gener-ating initially plausible hypotheses,(3)developing procedures to test these hypotheses,(4)articulating an argument from the results of these testing procedures,(5)evaluating the arguments,and(6),where appropriate,revising the initial hypotheses.Facione(1984)stated that‘‘Learning argument construc-tion means learning the methodologies that generations of researchers have refined for the specific needs of each disci-pline’’(p.259).In this study,critical thinking refers to the pro-cess by which the reader thinks reasonably and reflectively for the purpose of meaning construction.4.Critical thinking:skills and dispositionsThere is an argument between educators regarding whether critical thinking involves both skills and dispositions.If so, which skills and which dispositions?Skills(or abilities)are the more cognitive aspect of critical thinking,however,dispo-sitions(or attitudes)are the more affective aspect.The relationship between reading comprehension and critical thinking:A theoretical study37Beyer(1984)views critical thinking as a set of nine discrete skills,including:(1)distinguishing between verifiable facts and value claims,(2)determining the reliability of a source,(3) determining the factual accuracy of a statement,(4)distin-guishing relevant from irrelevant information,claims 
or rea-sons,(5)detecting bias,(6)identifying ambiguous or equivocal claims or arguments,(7)recognizing logical incon-sistencies or fallacies in a line of reasoning,(8)distinguishing between warranted or unwarranted claims,and(9)determin-ing the strength of an argument.A number of researchers in critical thinking disagree that critical thinking is only a set of skills,and they maintain that critical thinking also involves dispositions.So in the literature the importance of dispositions has been heavily stressed (Ennis,1987;Norris,1985;Baum and Newbill,2010;Facione, 2010;Zori et al.,2010;Sternberg,1985).Paul(1984)makes a useful distinction regarding the dispo-sitions of the thinker.He deals with critical thinking in two dif-ferent ways:critical thinking in the weak sense and critical thinking in the strong sense.He asserts:In a weak sense,critical thinking skills are understood as a set of discrete micrological skills ultimately extrinsic to the character of the person;skills that can be tacked onto other learning.In the strong sense,critical thinking skills are understood as a set of integrated macro-logical skills ultimately intrinsic to the character of the person and to insight into one’s own cognitive and affective processes (p.5).Paul(1991)indicates that critical thinking in the strong sense involves approaching issues from multiple perspectives and demands open-mindedness to understanding points of view with which one disagrees.Among those who advocated skills and dispositions were Ennis(1985),who defined critical thinking as‘‘reasonable, reflective thinking that is focused on deciding what to believe or do’’(p.46).Based on his broad and working definition of critical thinking,Ennis(1987)developed a taxonomy of critical thinking skills which includes thirteen dispositions and twelve abilities that together make up critical thinking.For example, some of the dispositions of a critical thinker,as mentioned by Ennis(1987)are:(1)Seek a clear statement of the thesis or question.(2)Take into account the total situation.(3)Keep in mind the original and/or basic concern.(4)Look for alternatives.(5)Use one’s critical thinking abilities.(6)Be sensitive to the feelings,level of knowledge,anddegree of sophistication of others.(7)Be open-minded.In addition to these dispositions,there are some abilities, such as:(1)focusing on a question,(2)analyzing arguments, (3)asking and answering questions of clarification and/or challenge,(4)judging the credibility of a source,(5)deducing and judging deductions,(6)inferring explanatory conclusions and hypotheses,and(7)identifying assumptions.Each of these abilities contains a large number of sub-abilities(Ennis, 1987).5.Critical thinking and reading comprehensionThe relationship between critical thinking and reading is well established in the literature.For example,Norris and Phillips (1987)point out that reading is more than just saying what is on the page;it is thinking.Moreover,Beck(1989)asserts ‘‘there is no reading without reasoning’’(p.677).Also,among those researchers and theoreticians who recognize that reading involves thinking is Ruggiero(1984).He indicates that reading is reasoning.Yu-hui et al.(2010)stated clearly that reading is a thinking process to construct meaning.Utilizing and combining schema theory with principles of critical thinking are one of the effective ways of enhancing the concept of reading comprehension(Norris and Phillips, 1987).They explain that critical thinking provides a means of explaining the ability to work out ambiguous 
text by gener-ating alternative interpretations,considering them in light of experience and world knowledge,suspending decision until further information is available,and accepting alternative explanations.They conclude that critical thinking is the pro-cess which the reader uses to comprehend.Schema theory provides powerful rationales for making links between students’individual backgrounds,specific sub-ject area knowledge,and critical thinking(Marzano et al., 1988;Aloqaili,2005c).According to Anderson(1994),there are six ways in which schemata function in thinking and in remembering text information.These six ways are:(1)Most new knowledge is gained by assimilating newinformation into existing structure;therefore,subject matter learning should build on prior knowledge when-ever possible.(2)The students’existing schemata help to allocate atten-tion by focusing on what is pertinent and important in newly presented materials.(3)Schemata allow and direct the inferential elaboration ofincoming information and experience.(4)Schemata allow orderly searches of memory by provid-ing learners with a guide to the types of information that should be recalled.(5)Schemata facilitate the thinking skills of summarizingand editing.(6)Schemata permit inferential reconstruction when thereare gaps in memory,which means that they help the learner generate hypotheses about missing information.It is obvious,based on the previous six schemata functions, that prior knowledge plays a significant role regarding estab-lishing connections between thinking critically and processing text information.This connection consequently leads the read-ers to reach the critical comprehension level.In accordance with this notion(the relationship between prior knowledge and critical thinking),the literature reveals an agreement between researchers concerning the idea that an individual’s familiarity with the subject matter of a text plays an important part in the person’s performance on think-ing tasks in that area(Glaser,1984;Norris,1985;Sternberg and Baron,1985).Knowledge and thinking skills can be viewed as interdependent(Nickerson et al.,1985).Comprehension itself has been seen as a critical thinking process.For instance,from a schema theory description of38 A.S.Aloqailireading,comprehension can be conceptualized as a critical thinking act(Anderson and Pearson,1984;Collins et al., 1980;Norris and Phillips,1987;Rumelhart,1980;Aloaili, 2005d).Lewis(1991)argues that viewing reading as a critical thinking act becomes more tenable when some of the compo-nents of the reading process are accepted as automatic and necessary(automatic processes like word identification,deriva-tion of meaning for most words,and assignment of impor-tance),but not sufficient for constructing text understanding.According to schema theory,the understanding and inter-pretation of the text are relative,which means that definitive conclusions cannot be reached.However,the readers should seek to arrive at a coherent and consistent understanding of the text being read.Lewis(1991)stated the following: Schema theory posits that there is no absolute meaning on the page to be interpreted the same by all-that is,there is no ‘‘correct’’comprehension.The goal of reading extended text is to arrive at a coherent representation of the text.This goal is achieved by readers’weighing and comparing data from their schemata,the text,and the context in which the act occurs(p.421).In order to enhance readers’ability to achieve and practice comprehension as a critical thinking 
act,researchers have shown that‘‘the critical thinker uses his or her metacognitive knowledge and applies metacognitive strategies in a planful, purposeful way throughout the critical thinking process’’(French and Rhoder,1992,p.191).Gallo,1987)uses metacognitive strategies to develop criti-cal thinking.She suggests that improved critical thinking re-quires developing the processes of observation,analysis, inference,and evaluation.Broek and Kremer(2000)made connections between infer-ence-making and critical thinking to promote reading compre-hension.They presented the idea that inferential and reasoning skills are closely related to other readers’characteristics and skills that affect text comprehension.Broek and Kremer (2000)state that:To be successful,readers must have the inferential and rea-soning skills to establish meaningful connections between information in the text and relevant background knowledge. Central to these skills is knowing what constitutes an inferen-tial or causal/logical relation and being able to recognize or construct one when needed in order to form a coherent mental representation of the text(pp.11–12).Ennis(1987)classified inference as critical thinking ability which includes three somewhat overlapping and interdepen-dent kinds of inference:deductive inference,inductive infer-ence,and inference to value judgments.According to Albrecht(1980),deduction is referred to as‘‘top-down think-ing’’because the conclusion or result is known and the search is for specific evidence that led to that particular conclusion. However,Clarke,1990pointed out that induction is often called‘‘bottom-up thinking’’because conclusions are drawn from specific instances,such as building on another unit the conclusion is reachedEnnis(1987)presented subskills or abilities under each of these three kinds of inference:deductive inference,inductive inference,and inference to value judgments.For example, deductive inference includes(1)class logic,(2)conditional logic,and(3)interpretations of statements.Also,inductive inference involves(1)generalizing,(2)inferring explanatory conclusions and hypotheses,and(3)giving reasonable assumptions.Moreover,inference to value judgments requires (1)background facts,(2)considering alternatives,and(3)bal-ancing,weighing,and deciding.Bizar and Hyde(1989)argued that inferential thinking con-tains two types:drawing inferences and drawing conclusion. Regarding thefirst one(drawing inferences),Bizar and Hyde (1989)stated the following:Inferential thinking involves putting together individual bits of information to derive a greater meaning than what one might expect from merely focusing on the bits themselves. When reading a passage,we infer a great deal;that is,we derive much more meaning than a literal interpretation of words’’(p.35).Another kind of inferential thinking,drawing a conclusion, involves taking pieces of information and synthesizing them into a meaningful idea which is greater than the separate pieces (Bizar and Hyde,1989).They concluded that drawing infer-ences and conclusions depend heavily on students’schemata. 
That is,if the student does not have the requisite knowledge or accurate schemata,he or she will not be able to build mean-ing from the materials being read.6.ConclusionThe literature reveals an agreement between theorists and researchers that there is a strong relationship among reading comprehension,critical thinking,and prior knowledge.This relation is interdependent,which means that prior knowledge serves as a foundation for critical thinking and inference-mak-ing.Critical thinking and inference-making work as effective means to activate prior knowledge.Prior knowledge and thinking skills can be viewed as interdependent.Schema theory provides powerful rational and theoretical premises of building an interactive model for interpreting how reading comprehen-sion develops by utilizing the connections between reading comprehension and critical thinking.Schema theory is consid-ered to be one of the most effective current theories that has had a major influence in terms of changing the face of reading instruction and reading comprehension.ReferencesAlbrecht,K.,1980.Brain Power:Learn to Improve Your Thinking Skills.Englewood Cliffs,Prentice Hall,NJ.Aloqaili,A.S.,2001.Perceptions of Saudi Arabian reading teachers of selected concepts related to schema theory.Unpublished Doctoral Dissertation,Ohio University,USA.Aloqaili,A.S.,2005a.An evaluation of Arabic teachers’education program in Teachers’Colleges in Saudi Arabia.Journal of the Faculty of Education29(4),301–382.Aloqaili,A.S.,2005b.Toward a modern standard of reading skills for elementary schools.Journal of Reading and Literacy49,77–146. Aloqaili,A.S.,2005c.The role of school in reading development of young learners.Paper presented at the International Conference Titled(Toward a Literate Arabic Society:Reading Supporting Policies:International Experiences),Casablanca,Morocco. Aloaili,A.S.,2005d.Arabic teachers beliefs and practices in Riyadh city in relation to constructivism.The Educational Journal19(76), 253–310.Anderson,R.C.,1994.Role of the reader’s schema in comprehension, learning,and memory.In:Ruddell,R.B.,Ruddell,M.R.,Singer,H.(Eds.),Theoretical Models and Processes of Reading,fourth ed.International Reading Association,Newark,DE,pp.469–82.The relationship between reading comprehension and critical thinking:A theoretical study39。

USB Type-C Specification 1.2 (Chinese Edition)

INTELLECTUAL PROPERTY DISCLAIMER
THIS SPECIFICATION IS PROVIDED TO YOU “AS IS” WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE AUTHORS OF THIS SPECIFICATION DISCLAIM ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF ANY PROPRIETARY RIGHTS, RELATING TO USE OR IMPLEMENTATION OF INFORMATION IN THIS SPECIFICATION. THE PROVISION OF THIS SPECIFICATION TO YOU DOES NOT PROVIDE YOU WITH ANY LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS.
Pre-release industry review: participating companies provide feedback.
LIMITED COPYRIGHT LICENSE: The USB 3.0 Promoters grant a conditional copyright license under the copyrights embodied in the USB Type-C Cable and Connector Specification to use and reproduce the Specification for the sole purpose of, and solely to the extent necessary for, evaluating whether to implement the Specification in products that would comply with the specification.

Foreign Language Teaching Methodology Self-Study Exam: Mock Test 35


Foreign Language Teaching Methodology Self-Study Exam: Mock Test 35 (Total: 100.00 points; time allowed: 90 minutes)

Part I. Multiple Choice (20 questions, 20.00 points)

1. The first Berlitz School was established in ______. (1.00 point)
A. 1778   B. 1828   C. 1878 √   D. 1928
Explanation: Berlitz founded the first Berlitz School of languages in the United States in 1878.

2. Before the 16th century, Latin was taught and learned for ______. (1.00 point)
A. reading literature in Latin   B. spoken and written communication √   C. mastering grammar   D. learning fine arts
Explanation: In the 16th century, Latin was still the language used in Western countries for spoken and written communication.

3. Memorizing grammar rules and bilingual word lists tends to give the learners ______ to actively communicate in the target language. (1.00 point)
A. high motivation   B. good motivation   C. low motivation √   D. favorable motivation
Explanation: One shortcoming of the Grammar-Translation Method is that memorizing grammar rules and bilingual word lists does not motivate students to communicate actively in the target language.

4. In the Oral Approach, accuracy in both pronunciation and grammar is regarded as ______. (1.00 point)
A. permittable   B. crucial √   C. useless   D. acceptable
Explanation: In the Oral Approach, accurate pronunciation and grammar are crucial, and every effort is made to avoid errors.

Gradient-based learning applied to document recognition


Gradient-Based Learning Appliedto Document RecognitionYANN LECUN,MEMBER,IEEE,L´EON BOTTOU,YOSHUA BENGIO,AND PATRICK HAFFNER Invited PaperMultilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique.Given an appropriate network architecture,gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns,such as handwritten characters,with minimal preprocessing.This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task.Convolutional neural networks,which are specifically designed to deal with the variability of two dimensional(2-D)shapes,are shown to outperform all other techniques.Real-life document recognition systems are composed of multiple modules includingfield extraction,segmentation,recognition, and language modeling.A new learning paradigm,called graph transformer networks(GTN’s),allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure.Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training,and theflexibility of graph transformer networks.A graph transformer network for reading a bank check is also described.It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks.It is deployed commercially and reads several million checks per day. Keywords—Convolutional neural networks,document recog-nition,finite state transducers,gradient-based learning,graphtransformer networks,machine learning,neural networks,optical character recognition(OCR).N OMENCLATUREGT Graph transformer.GTN Graph transformer network.HMM Hidden Markov model.HOS Heuristic oversegmentation.K-NN K-nearest neighbor.Manuscript received November1,1997;revised April17,1998.Y.LeCun,L.Bottou,and P.Haffner are with the Speech and Image Processing Services Research Laboratory,AT&T Labs-Research,Red Bank,NJ07701USA.Y.Bengio is with the D´e partement d’Informatique et de Recherche Op´e rationelle,Universit´e de Montr´e al,Montr´e al,Qu´e bec H3C3J7Canada. Publisher Item Identifier S0018-9219(98)07863-3.NN Neural network.OCR Optical character recognition.PCA Principal component analysis.RBF Radial basis function.RS-SVM Reduced-set support vector method. SDNN Space displacement neural network.SVM Support vector method.TDNN Time delay neural network.V-SVM Virtual support vector method.I.I NTRODUCTIONOver the last several years,machine learning techniques, particularly when applied to NN’s,have played an increas-ingly important role in the design of pattern recognition systems.In fact,it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition. 
The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics. This is made possible by recent progress in machine learning and computer technology. Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images. Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called GTN's, which allows training all the modules to optimize a global performance criterion.

Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand. Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.

Fig. 1. Traditional pattern recognition is performed with two modules: a fixed feature extractor and a trainable classifier.

The usual method of recognizing individual patterns consists in dividing the system into two main modules shown in Fig. 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that: 1) can be easily matched or compared and 2) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature. The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand crafted. The classifier, on the other hand, is often general purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.

Historically, the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes [1]. A combination of three factors has changed this vision over the last decade. First, the availability of low-cost machines with fast arithmetic units allows for reliance on more brute-force "numerical" methods than on algorithmic refinements. Second, the availability of large databases for problems with a large market and wide interest, such as handwriting recognition, has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems.
The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets.It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets.As evidence of this fact,a large proportion of modern commercial OCR systems use some form of multilayer NN trained with back propagation.In this study,we consider the tasks of handwritten character recognition(Sections I and II)and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition(Section III). While more automatic learning is beneficial,no learning technique can succeed without a minimal amount of prior knowledge about the task.In the case of multilayer NN’s, a good way to incorporate knowledge is to tailor its archi-tecture to the task.Convolutional NN’s[2],introduced in Section II,are an example of specialized NN architectures which incorporate knowledge about the invariances of two-dimensional(2-D)shapes by using local connection patterns and by imposing constraints on the weights.A comparison of several methods for isolated handwritten digit recogni-tion is presented in Section III.To go from the recognition of individual characters to the recognition of words and sentences in documents,the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV.Recognizing variable-length objects such as handwritten words using multimodule systems is best done if the modules manipulate directed graphs.This leads to the concept of trainable GTN,also introduced in Section IV. Section V describes the now classical method of HOS for recognizing words or other character strings.Discriminative and nondiscriminative gradient-based techniques for train-ing a recognizer at the word level without requiring manual segmentation and labeling are presented in Section VI. 
Section VII presents the promising space-displacement NN approach that eliminates the need for segmentation heuristics by scanning a recognizer at all possible locations on the input. In Section VIII, it is shown that trainable GTN's can be formulated as multiple generalized transductions based on a general graph composition algorithm. The connections between GTN's and HMM's, commonly used in speech recognition, are also treated. Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer. This problem is known as "online" handwriting recognition since the machine must produce immediate feedback as the user writes. The core of the system is a convolutional NN. The results clearly demonstrate the advantages of training a recognizer at the word level, rather than training it on presegmented, hand-labeled, isolated characters. Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks. The core of the system is the convolutional NN called LeNet-5, which is described in Section II. This system is in commercial use in the NCR Corporation line of check recognition systems for the banking industry. It is reading millions of checks per month in several banks across the United States.

A. Learning from Data

There are several approaches to automatic machine learning, but one of the most successful approaches, popularized in recent years by the NN community, can be called "numerical" or gradient-based learning. The learning machine computes a function $Y^p = F(Z^p, W)$, where $Z^p$ is the $p$th input pattern and $W$ represents the collection of adjustable parameters in the system. A loss function measures the discrepancy between the desired output for pattern $Z^p$ and the output produced by the system, and learning consists in finding the value of $W$ that minimizes the average training loss $E_{\mathrm{train}}(W)$. The gap between the expected error rate on the test set, $E_{\mathrm{test}}$, and the error rate on the training set, $E_{\mathrm{train}}$, decreases with the number of training samples approximately as $E_{\mathrm{test}} - E_{\mathrm{train}} = k(h/P)^{\alpha}$, where $P$ is the number of training samples, $h$ is a measure of the effective capacity of the machine, $\alpha$ is a number between 0.5 and 1.0, and $k$ is a constant. As the capacity $h$ increases, $E_{\mathrm{train}}$ decreases; therefore, when increasing the capacity $h$, there is a trade-off between the decrease of $E_{\mathrm{train}}$ and the increase of the gap, with an optimal value of $h$ that achieves the lowest generalization error $E_{\mathrm{test}}$. Most learning algorithms attempt to minimize $E_{\mathrm{train}}$ as well as some estimate of the gap. A formal version of this is called structural risk minimization [6], [7], and it is based on defining a sequence of learning machines of increasing capacity, corresponding to a sequence of subsets of the parameter space such that each subset is a superset of the previous subset. In practical terms, structural risk minimization is implemented by minimizing $E_{\mathrm{train}} + \beta H(W)$, where the regularization function $H(W)$ takes large values on parameters $W$ that belong to high-capacity subsets of the parameter space and $\beta$ is a constant; minimizing $H(W)$ in effect limits the capacity of the accessible subset of the parameter space. In gradient-based learning, the loss $E(W)$ is minimized by iteratively adjusting the real-valued parameter vector $W$, with respect to which $E(W)$ is continuous and differentiable almost everywhere: $W_k = W_{k-1} - \epsilon \,\partial E / \partial W$. In the stochastic version of this procedure, $W$ is updated on the basis of a single sample: $W_k = W_{k-1} - \epsilon \,\partial E^{p_k} / \partial W$. Such gradients can be computed efficiently in a nonlinear system composed of several layers of processing, i.e., by the back-propagation algorithm, and the demonstration that back propagation applied to multilayer NN's with sigmoidal units can solve complicated learning tasks did much to establish its widespread use.
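The two update rules just given translate directly into code. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of the batch rule $W_k = W_{k-1} - \epsilon \,\partial E / \partial W$ and its stochastic, single-sample variant, applied to an invented linear model with a squared-error loss on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: patterns Z (P x d) and desired outputs D (P,).
P, d = 200, 5
Z = rng.normal(size=(P, d))
true_w = rng.normal(size=d)
D = Z @ true_w + 0.1 * rng.normal(size=P)

def loss_grad(W, Zb, Db):
    """Squared-error loss E and its gradient dE/dW for the linear machine F(Z, W) = Z W."""
    err = Zb @ W - Db
    E = 0.5 * np.mean(err ** 2)
    grad = Zb.T @ err / len(Db)
    return E, grad

eps = 0.1            # learning rate, the epsilon in the update rule
W = np.zeros(d)

# Batch gradient descent: one update per pass over the full training set.
for k in range(100):
    _, g = loss_grad(W, Z, D)
    W = W - eps * g
print("batch-trained loss:", loss_grad(W, Z, D)[0])

# Stochastic gradient descent: update on the basis of a single sample per step.
W = np.zeros(d)
for k in range(2000):
    p = rng.integers(P)                       # pick one training pattern
    _, g = loss_grad(W, Z[p:p+1], D[p:p+1])
    W = W - eps * g
print("SGD-trained loss:", loss_grad(W, Z, D)[0])
```

Both loops converge to nearly the same loss here; the stochastic version simply trades noisier individual steps for much cheaper updates, which is why it scales better to large training sets.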
The basic idea of back propagation is that gradients can be computed efficiently by propagation from the output to the input.This idea was described in the control theory literature of the early1960’s[16],but its application to ma-chine learning was not generally realized then.Interestingly, the early derivations of back propagation in the context of NN learning did not use gradients but“virtual targets”for units in intermediate layers[17],[18],or minimal disturbance arguments[19].The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation[20]and for deriving generalizations of back propagation to recurrent networks[21]and networks of heterogeneous modules[22].A simple derivation for generic multilayer systems is given in Section I-E.The fact that local minima do not seem to be a problem for multilayer NN’s is somewhat of a theoretical mystery. It is conjectured that if the network is oversized for the task(as is usually the case in practice),the presence of “extra dimensions”in parameter space reduces the risk of unattainable regions.Back propagation is by far the most widely used neural-network learning algorithm,and probably the most widely used learning algorithm of any form.D.Learning in Real Handwriting Recognition Systems Isolated handwritten character recognition has been ex-tensively studied in the literature(see[23]and[24]for reviews),and it was one of the early successful applications of NN’s[25].Comparative experiments on recognition of individual handwritten digits are reported in Section III. They show that NN’s trained with gradient-based learning perform better than all other methods tested here on the same data.The best NN’s,called convolutional networks, are designed to learn to extract relevant features directly from pixel images(see Section II).One of the most difficult problems in handwriting recog-nition,however,is not only to recognize individual charac-ters,but also to separate out characters from their neighbors within the word or sentence,a process known as seg-mentation.The technique for doing this that has become the“standard”is called HOS.It consists of generating a large number of potential cuts between characters using heuristic image processing techniques,and subsequently selecting the best combination of cuts based on scores given for each candidate character by the recognizer.In such a model,the accuracy of the system depends upon the quality of the cuts generated by the heuristics,and on the ability of the recognizer to distinguish correctly segmented characters from pieces of characters,multiple characters, or otherwise incorrectly segmented characters.Training a recognizer to perform this task poses a major challenge because of the difficulty in creating a labeled database of incorrectly segmented characters.The simplest solution consists of running the images of character strings through the segmenter and then manually labeling all the character hypotheses.Unfortunately,not only is this an extremely tedious and costly task,it is also difficult to do the labeling consistently.For example,should the right half of a cut-up four be labeled as a one or as a noncharacter?Should the right half of a cut-up eight be labeled as a three?Thefirst solution,described in Section V,consists of training the system at the level of whole strings of char-acters rather than at the character level.The notion of gradient-based learning can be used for this purpose.The system is trained to minimize an 
overall loss function which measures the probability of an erroneous answer. Section V explores various ways to ensure that the loss function is differentiable and therefore lends itself to the use of gradient-based learning methods. Section V introduces the use of directed acyclic graphs whose arcs carry numerical information as a way to represent the alternative hypotheses and introduces the idea of GTN.

The second solution, described in Section VII, is to eliminate segmentation altogether. The idea is to sweep the recognizer over every possible location on the input image and to rely on the "character spotting" property of the recognizer, i.e., its ability to correctly recognize a well-centered character in its input field, even in the presence of other characters besides it, while rejecting images containing no centered characters [26], [27]. The sequence of recognizer outputs obtained by sweeping the recognizer over the input is then fed to a GTN that takes linguistic constraints into account and finally extracts the most likely interpretation. This GTN is somewhat similar to HMM's, which makes the approach reminiscent of classical speech recognition [28], [29]. While this technique would be quite expensive in the general case, the use of convolutional NN's makes it particularly attractive because it allows significant savings in computational cost.

E. Globally Trainable Systems
As stated earlier, most practical pattern recognition systems are composed of multiple modules. For example, a document recognition system is composed of a field locator (which extracts regions of interest), a field segmenter (which cuts the input image into images of candidate characters), a recognizer (which classifies and scores each candidate character), and a contextual postprocessor, generally based on a stochastic grammar (which selects the best grammatically correct answer from the hypotheses generated by the recognizer). In most cases, the information carried from module to module is best represented as graphs with numerical information attached to the arcs. For example, the output of the recognizer module can be represented as an acyclic graph where each arc contains the label and the score of a candidate character, and where each path represents an alternative interpretation of the input string. Typically, each module is manually optimized, or sometimes trained, outside of its context. For example, the character recognizer would be trained on labeled images of presegmented characters. Then the complete system is assembled, and a subset of the parameters of the modules is manually adjusted to maximize the overall performance.
This last step is extremely tedious, time consuming, and almost certainly suboptimal. A better alternative would be to somehow train the entire system so as to minimize a global error measure such as the probability of character misclassifications at the document level. Ideally, we would want to find a good minimum of this global loss function with respect to all the parameters in the system. If the loss function $E$ measuring that performance can be made differentiable with respect to the system's tunable parameters $W$, we can find a local minimum of $E$ using gradient-based learning. However, at first glance, it appears that the sheer size and complexity of the system would make this intractable. To ensure that the global loss function $E^p(Z^p, W)$ is differentiable, the overall system is built as a feedforward network of differentiable modules: the function implemented by each module must be continuous and differentiable almost everywhere with respect to the internal parameters of the module and with respect to the module's inputs. If this is the case, a simple generalization of the well-known back-propagation procedure can be used to efficiently compute the gradients of the loss function with respect to all the parameters in the system [22]. For a cascade of modules $X_n = F_n(W_n, X_{n-1})$, knowing $\partial E^p/\partial X_n$ is enough to obtain both derivatives by the backward recurrence

$$\frac{\partial E^p}{\partial W_n} = \frac{\partial F}{\partial W}(W_n, X_{n-1})\,\frac{\partial E^p}{\partial X_n}, \qquad \frac{\partial E^p}{\partial X_{n-1}} = \frac{\partial F}{\partial X}(W_n, X_{n-1})\,\frac{\partial E^p}{\partial X_n},$$

so the gradients propagate backward from the output module to the input.

Fig. 2. Architecture of LeNet-5, a convolutional NN, here used for digit recognition. Each plane is a feature map, i.e., a set of units whose weights are constrained to be identical.

Before being sent to the fixed-size input layer of an NN, character images, or other 2-D or one-dimensional (1-D) signals, must be approximately size normalized and centered in the input field. Unfortunately, no such preprocessing can be perfect: handwriting is often normalized at the word level, which can cause size, slant, and position variations for individual characters. This, combined with variability in writing style, will cause variations in the position of distinctive features in input objects. In principle, a fully connected network of sufficient size could learn to produce outputs that are invariant with respect to such variations. However, learning such a task would probably result in multiple units with similar weight patterns positioned at various locations in the input so as to detect distinctive features wherever they appear on the input. Learning these weight configurations requires a very large number of training instances to cover the space of possible variations. In convolutional networks, as described below, shift invariance is automatically obtained by forcing the replication of weight configurations across space.
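The backward recurrence through a cascade of modules can be written out directly. The toy two-module system below (an affine map followed by tanh, then a second affine map, with a squared-error loss) is an assumed example used only to show the gradient flowing from ∂E/∂X2 back to ∂E/∂W1; it ends with a finite-difference check on one weight.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two differentiable modules in cascade: X1 = F1(W1, X0), X2 = F2(W2, X1).
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
X0 = rng.normal(size=3)                     # input pattern Z^p
target = np.array([1.0, -1.0])              # desired output D^p

# Forward pass, keeping intermediate states.
A1 = W1 @ X0
X1 = np.tanh(A1)
X2 = W2 @ X1
E = 0.5 * np.sum((X2 - target) ** 2)        # loss E^p

# Backward recurrence: from dE/dX2 compute dE/dW2, dE/dX1, then dE/dW1.
dE_dX2 = X2 - target
dE_dW2 = np.outer(dE_dX2, X1)               # Jacobian of F2 w.r.t. W2 applied to dE/dX2
dE_dX1 = W2.T @ dE_dX2                      # Jacobian of F2 w.r.t. its input
dE_dA1 = dE_dX1 * (1.0 - X1 ** 2)           # through the tanh nonlinearity
dE_dW1 = np.outer(dE_dA1, X0)

# Finite-difference check on one weight confirms the analytic gradient.
i, j, h = 2, 1, 1e-6
W1p = W1.copy()
W1p[i, j] += h
Ep = 0.5 * np.sum((W2 @ np.tanh(W1p @ X0) - target) ** 2)
print(dE_dW1[i, j], (Ep - E) / h)           # the two numbers should agree closely
```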
Secondly,a deficiency of fully connected architectures is that the topology of the input is entirely ignored.The input variables can be presented in any(fixed)order without af-fecting the outcome of the training.On the contrary,images (or time-frequency representations of speech)have a strong 2-D local structure:variables(or pixels)that are spatially or temporally nearby are highly correlated.Local correlations are the reasons for the well-known advantages of extracting and combining local features before recognizing spatial or temporal objects,because configurations of neighboring variables can be classified into a small number of categories (e.g.,edges,corners,etc.).Convolutional networks force the extraction of local features by restricting the receptive fields of hidden units to be local.A.Convolutional NetworksConvolutional networks combine three architectural ideas to ensure some degree of shift,scale,and distortion in-variance:1)local receptivefields;2)shared weights(or weight replication);and3)spatial or temporal subsampling.A typical convolutional network for recognizing characters, dubbed LeNet-5,is shown in Fig.2.The input plane receives images of characters that are approximately size normalized and centered.Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer.The idea of connecting units to local receptivefields on the input goes back to the perceptron in the early1960’s,and it was almost simultaneous with Hubel and Wiesel’s discovery of locally sensitive,orientation-selective neurons in the cat’s visual system[30].Local connections have been used many times in neural models of visual learning[2],[18],[31]–[34].With local receptive fields neurons can extract elementary visual features such as oriented edges,endpoints,corners(or similar features in other signals such as speech spectrograms).These features are then combined by the subsequent layers in order to detect higher order features.As stated earlier,distortions or shifts of the input can cause the position of salient features to vary.In addition,elementary feature detectors that are useful on one part of the image are likely to be useful across the entire image.This knowledge can be applied by forcing a set of units,whose receptivefields are located at different places on the image,to have identical weight vectors[15], [32],[34].Units in a layer are organized in planes within which all the units share the same set of weights.The set of outputs of the units in such a plane is called a feature map. 
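As an illustration of local receptive fields and weight sharing, the sketch below computes one feature map by sliding a single shared 5x5 kernel over an image; the kernel values, bias, and image contents are placeholders, not LeNet-5's trained weights.

```python
import numpy as np

rng = np.random.default_rng(2)

image = rng.normal(size=(32, 32))      # toy input plane
kernel = rng.normal(size=(5, 5))       # one shared set of 25 weights
bias = 0.1                             # one shared trainable bias

def feature_map(img, k, b):
    """Every unit applies the *same* weights to its own 5x5 receptive field."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    fmap = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            receptive_field = img[i:i + kh, j:j + kw]
            fmap[i, j] = np.tanh(np.sum(receptive_field * k) + b)
    return fmap

fm = feature_map(image, kernel, bias)
print(fm.shape)   # (28, 28): a 32x32 input and a 5x5 kernel give a 28x28 feature map
```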
Units in a feature map are all constrained to perform the same operation on different parts of the image. A complete convolutional layer is composed of several feature maps (with different weight vectors), so that multiple features can be extracted at each location. A concrete example of this is the first layer of LeNet-5 shown in Fig. 2. Units in the first hidden layer of LeNet-5 are organized in six planes, each of which is a feature map. A unit in a feature map has 25 inputs connected to a 5x5 area in the input, called the receptive field of the unit. In the case of LeNet-5, at each input location six different types of features are extracted by six units in identical locations in the six feature maps. A sequential implementation of a feature map would scan the input image with a single unit that has a local receptive field and store the states of this unit at corresponding locations in the feature map. This operation is equivalent to a convolution, followed by an additive bias and squashing function, hence the name convolutional network. The kernel of the convolution is the set of connection weights used by the units in the feature map.

Once a feature has been detected, its exact location becomes less important. Only its approximate position relative to other features is relevant. For example, once we know that the input image contains the endpoint of a roughly horizontal segment in the upper left area, a corner in the upper right area, and the endpoint of a roughly vertical segment in the lower portion of the image, we can tell the input image is a seven. Not only is the precise position of each of those features irrelevant for identifying the pattern, it is potentially harmful because the positions are likely to vary for different instances of the character. A simple way to reduce the precision with which the position of distinctive features is encoded in a feature map is to reduce the spatial resolution of the feature map. This can be achieved with a so-called subsampling layer, which performs a local averaging and a subsampling, thereby reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions. The second hidden layer of LeNet-5 is a subsampling layer. This layer comprises six feature maps, one for each feature map in the previous layer. The receptive field of each unit is a 2x2 area in the previous layer's corresponding feature map, and each unit averages its four inputs, multiplies the result by a trainable coefficient, adds a trainable bias, and passes the result through a sigmoid function.

B. LeNet-5
The input to LeNet-5 is a 32x32 pixel image. This is significantly larger than the largest character in the database (at most 20x20 pixels centered in a 28x28 field). The reason is that it is desirable that potential distinctive features such as stroke endpoints or a corner can appear in the center of the receptive field of the highest level feature detectors. In LeNet-5, the set of centers of the receptive fields of the last convolutional layer (C3, see below) forms a 20x20 area in the center of the 32x32 input. The values of the input pixels are normalized so that the background level (white) corresponds to a value of -0.1 and the foreground (black) corresponds to 1.175. This makes the mean input roughly zero and the variance roughly one, which accelerates learning. In the following, convolutional layers are labeled Cx, subsampling layers are labeled Sx, and fully connected layers are labeled Fx, where x is the layer index.

Layer C1 is a convolutional layer with six feature maps. Each unit in each feature map is connected to a 5x5 neighborhood in the input. The size of the feature maps is 28x28, which prevents connections from the input from falling off the boundary. C1 contains 156 trainable parameters and 122,304 connections. Layer S2 is a subsampling layer with six feature maps of size 14x14. Each unit in each feature map is connected to a 2x2 neighborhood in the corresponding feature map in C1. The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and then added to a trainable bias. The result is passed through a sigmoidal function.

Table 1. Each column indicates which feature maps in S2 are combined by the units in a particular feature map of C3.

Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature map is connected to several 5x5 neighborhoods at identical locations in a subset of S2's feature maps. Table 1 shows the set of S2 feature maps combined by each C3 feature map. Why not connect every S2 feature map to every C3 feature map? The reason is twofold. First, a noncomplete connection scheme keeps the number of connections within reasonable bounds. More importantly, it forces a break of symmetry in the network. Different feature maps are forced to extract different (hopefully complementary) features because they get different sets of inputs. The rationale behind the connection scheme in Table 1 is the following. The first six C3 feature maps take inputs from every contiguous subset of three feature maps in S2. The next six take input from every contiguous subset of four. The next three take input from some discontinuous subsets of four. Finally, the last one takes input from all S2 feature maps. Layer C3 has 1516 trainable parameters and 156,000 connections.

Layer S4 is a subsampling layer with 16 feature maps of size 5x5. Each unit in each feature map is connected to a 2x2 neighborhood in the corresponding feature map in C3, in a similar way as C1 and S2. Layer S4 has 32 trainable parameters and 2000 connections. Layer C5 is a convolutional layer with 120 feature maps. Each unit is connected to a 5x5 neighborhood on all 16 of S4's feature maps. Because the size of S4 is also 5x5, the size of C5's feature maps is 1x1; this amounts to a full connection between S4 and C5, but C5 is still labeled a convolutional layer because, if the LeNet-5 input were made bigger with everything else kept constant, the feature map dimension would be larger than 1x1. This process of dynamically increasing the size of a convolutional network is described in Section VII. Layer C5 has 48,120 trainable connections. Layer F6 contains 84 units (the reason for this number comes from the design of the output layer, explained below) and is fully connected to C5. It has 10,164 trainable parameters.

As in classical NN's, units in layers up to F6 compute a dot product between their input vector and their weight vector, to which a bias is added. This weighted sum, denoted $a_i$ for unit $i$, is then passed through a sigmoid squashing function to produce the state of the unit, $x_i = f(a_i)$. The squashing function is a scaled hyperbolic tangent

$$f(a) = A \tanh(Sa) \qquad (6)$$

where $A$ is the amplitude of the function and $S$ determines its slope at the origin. The amplitude $A$ is chosen to be 1.7159. The rationale for this choice of a squashing function is given in Appendix A. Finally, the output layer is composed of Euclidean RBF units, one for each class, with 84 inputs each. The output of each RBF unit $y_i$ is computed as follows:

$$y_i = \sum_j (x_j - w_{ij})^2 \qquad (7)$$

In other words, each output RBF unit computes the Euclidean distance between its input vector and its parameter vector. The further away the input is from the parameter vector, the larger the RBF output. The output of a particular RBF can be interpreted as a penalty term measuring the fit between the input pattern and a model of the class associated with the RBF. In probabilistic terms, the RBF output can be interpreted as the unnormalized negative log-likelihood of a Gaussian distribution in the space of configurations of layer F6. Given an input pattern, the loss function should be designed so as to get the configuration of F6 as close as possible to the parameter vector of the RBF that corresponds to the pattern's desired class. The parameter vectors of these units were chosen by hand and kept fixed (at least initially). The components of those parameter vectors were set to -1 or +1. While they could have been chosen at random with equal probabilities for -1 and +1, or even chosen to form an error-correcting code as suggested by [47], they were instead designed to represent a stylized image of the corresponding character class drawn on a 7x12 bitmap (hence the number 84).
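For concreteness, the sketch below evaluates the scaled hyperbolic tangent of (6) and the Euclidean RBF outputs of (7). The slope value S = 2/3 is an assumption (the text above only fixes A = 1.7159 and defers the rationale to Appendix A), and random +/-1 vectors stand in for the hand-designed 7x12 prototype bitmaps.

```python
import numpy as np

rng = np.random.default_rng(3)

A, S = 1.7159, 2.0 / 3.0          # amplitude from the text; slope S assumed

def squash(a):
    """Scaled hyperbolic tangent f(a) = A * tanh(S * a), eq. (6)."""
    return A * np.tanh(S * a)

# An 84-dimensional F6 state (here random), and one +/-1 prototype per class,
# standing in for the stylized 7x12 character bitmaps mentioned in the text.
f6_state = squash(rng.normal(size=84))
prototypes = rng.choice([-1.0, 1.0], size=(10, 84))

def rbf_outputs(x, w):
    """Euclidean RBF units, eq. (7): y_i = sum_j (x_j - w_ij)^2."""
    return np.sum((x - w) ** 2, axis=1)

y = rbf_outputs(f6_state, prototypes)
print("predicted class:", int(np.argmin(y)))   # smallest penalty = best-fitting class
```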

A Summary of Ruth Wodak's Key Ideas on Critical Discourse Analysis


Critical Discourse Analysis Course Paper
School: School of Foreign Languages, Wenzhou University; Major: English Language and Literature; Name: ***; Student ID: ***********; June 25, 2011

Ruth Wodak and Her Main Idea of CDA

1 Introduction
This essay is mainly a summary of Ruth Wodak's background information and her general idea of CDA, in which her notion of the Discourse Historical Approach (DHA) in CDA receives the most attention. DHA, as she elaborates, is interdisciplinary, problem-oriented, and analyzes the change of discursive practices over time and in various genres. This essay is composed of four parts: along with the introduction and the conclusion, chapter two introduces the background information of Ruth Wodak, and chapter three presents Ruth Wodak's general idea of CDA, within which the Discourse Historical Approach is mainly dealt with.

2 Background Information of Ruth Wodak
This opening chapter introduces the background information of the scholar Ruth Wodak, drawing on information from the internet, for the purpose of a better understanding of her positions, her prizes, her research and so on.

Ruth Wodak is Distinguished Professor and Chair in Discourse Studies at Lancaster University. She moved from Vienna, Austria, where she had been full professor of Applied Linguistics since 1991. She has remained co-director of the Austrian National Focal Point (NFP) of the European Monitoring Centre for Racism, Xenophobia and Anti-Semitism. She is a member of the editorial board of a range of linguistics journals, co-editor of the journal Discourse and Society, and editor of Critical Discourse Studies (with Norman Fairclough, Phil Graham and Jay Lemke) and of the Journal of Language and Politics (with Paul Chilton). Together with Greg Myers, she edits the book series DAPSAC (Benjamins). She was also section editor of "Language and Politics" for the second edition of the Elsevier Encyclopedia of Language and Linguistics.

Besides various other prizes, she was awarded the Wittgenstein Prize for Elite Researchers in 1996, which made six years of continuous interdisciplinary team research possible. Her main projects focus on "Discourses on Un/employment in EU Organizations; Debates on NATO and Neutrality in Austria and Hungary; The Discursive Construction of European Identities; Attitudes towards EU Enlargement; Racism at the Top: Parliamentary Debates on Immigration in Six EU Countries; The Discursive Construction of the Past - Individual and Collective Memories of the German Wehrmacht and the Second World War." In October 2006, she was awarded the Woman's Prize of the City of Vienna. Ruth chaired the Humanities and Social Sciences Panel for the EURYI award in the European Science Foundation, 2006-2008. She has held visiting professorships at Uppsala, Stanford University, the University of Minnesota and Georgetown University, Washington, D.C. In spring 2004, she held a Leverhulme Visiting Professorship at the University of East Anglia, Norwich, UK. Recently, she was awarded the Kerstin Hesselgren Chair of the Swedish Parliament and stayed at the University of Örebro, Sweden, from March to June 2008. She has co-authored works with Teun A. van Dijk. Her research is mainly located in Discourse Studies (DS) and in Critical Discourse Analysis (CDA).
Together with her former colleagues and Ph.D students in Vienna (Rudolf de Cillia, Gertraud Benke, Helmut Gruber, Florian Menz, Martin Reisigl, Usama Suleiman, Christine Anthonissen), she elaborated the Discourse Historical Approach in CDA (DHA) which is interdisciplinary, problem-oriented, and analyzes the change of discursive practices over time and in various genres. Her main research agenda focuses the development of theoretical approaches in discourse studies (combining ethnography, argumentation theory, rhetoric and functional systemic linguistics); gender studies; language and/in politics; prejudice and discrimination.Currently, Ruth is studying “everyday politics” in the European Parliament and communicative decision making procedures while contrasting these - on the one hand - withresearch in Political Sciences, on the other hand - with TV series, such as The West Wing and Im Kanzleramt (book in preparation: Politics as Usual: The Discursive Construction and Representation of Politics in Action, Palgrave). Moreover, four edited books are in press: The Construction of History. Remembering the War of Annilihation (with H. Heer, W. Manoschek, A. Pollak; Palgrave); Migration, Identity, and Belonging(with G. Delanty, P. Jones; Liverpool Univ. Press); Qualitative Discourse Analysis for the Social Sciences(with M. Krzyzanowski; Palgrave);v Communication in the Public Sphere. Handbook of Applied Linguistics, Vol. 4v(with V. Koller; De Gruyter). Michal Krzyzanowski and Ruth Wodak are also finalizing the book manuscript on Institutional and Every day Discrimination - The Austrian Case(Transaction Press) which summarizes findings of the EU project XENOPHOB (5th framework), for Austria. These books are quite valuable for the study of CDA. She has also published many other books and articles, thus is a very prolific scholar in the research area of DS and CDA.3 Ruth Wodak’s General Idea of CDAAfter a general schema of her career and research area and other related information, this essay moves to the main focus of her idea about the CDA. This chapter is also divided into several parts, which all emphasizes her tenets of CDA. 3.1 The Kernel of CDA and her contribution to CDAIn Ruth Wodak’s mind, “Critical” means not taking things for granted, opening up complexity, challenging reductionism, dogmatism and dichotomies, being self-reflective in research, and through these processes, making opaque structures of power relations and ideologies manifest. “Critical”, thus, does not imply the common sense meaning of “being negative”—rather “skeptical”. Proposing alternatives is also part of being “critical” (Reisigl & Wodak, 2001).Wodak conveyed that there is no one CDA approach. All CDA approaches have their own theoretical position combined with a specific methodology and methods (Wodak & Meyer, 2001; Wodak, 2004). She emphases that, every theoreticalapproach in CDA is inherently interdisciplinary because it aims at investigating complex social phenomena which are inherently inter- or transdisciplinary and certainly not to be studied by linguistics alone. “Critical” (as mentioned above) is not to be understood in the common sense of the word, i.e. criticizing, or being negative. 
Thus, “positive” is in no way to be understood as the counterpart of critical research as recently proposed by Jim Martin in his version of “Positive Discourse Analysis”The notion of critic stems from the Frankfurt School, for example, but also from other philosophical and epistemological backgrounds, and means not taking anything for granted, opening up alternative readings (justifiable through cues in the texts); self-reflection of the research process; making ideological positions manifested in the respective text transparent, etc.In Reisigl and Wodak (2001), they distinguish between three dimensions of critique: textimmanent critique, socio-diagnostic critique, and prospective (retrospective) critique. These dimensions also imply integrating the many layers of context into the in-depth analysis (where they have presented very clear steps in the methodology which are implemented in a recursive manner: from text to context to text, etc.). Critical self-reflection must accompany the research process continuously: from the choice of the object under investigation to the choice of methods (categories) of analysis, the sampling, the construction of a theoretical framework designed for the object under investigation (middle range theories), to the interpretation of the results and possible recommendations for practice following the study. When involved in teamwork, this process can also be institutionalized through joint reflective team sessions at various points of the respective research project. In some cases, it has also been very useful to ask outside experts to comment on such reflection processes (for example, they had an international advisory board for her research centre “Discourse, Politics, Identity”at the University of Vienna 1996-2003, which fulfilled this function). For Wodak, Discourse Studies is a separate field; of course, many other disciplines (such as history, sociology, psychology, etc.) study texts, but not in detailed, systematic and retroductable ways; moreover, discourse analysis is not only to be perceived as a “method”or “methodology”but also as theories about text production, and text reception. Moreover, social processes are inherently and dialectically linked to language (text and discourse). In this way, discourse analysis is both a theoretical and empirical enterprise. “Retroductable” means that such analyses should be transparent so that any reader can trace and understand the detailed in-depth textual analysis. In any case, all criteria which are usually applied to social science research apply to CDA as well.Her specific contribution is most probably the focus on interdisciplinary and implementing interdisciplinarity; this is also one of the most important characteristics of the "Discourse-Historical Approach" in CDA. Moreover, in contrast to other CDA scholars (and probably because she was trained as a sociolinguist), she combine theoretical research strongly with empirical research, the analysis of large data corpora and ethnography. She has also been very influenced by the teamwork with historians and sociologists. She learnt a lot from such collaborations and by taking their contributions seriously and attempting integrative approaches. This fed into toher theory of context (Wodak, 2000). 
Another important characteristic of her work is the primary focus on text analysis, argumentation theory and rhetoric, more than on Functional Systemic Linguistics (FSL) and other grammar theories (however, she has also collaborated very fruitfully with Theo Van Leeuwen and other scholars in FSL, for example). She has recently become very interested in the function of social fields and genres in various social fields, while applying Bourdieo (and Luhmann) as macro approaches to much interdisciplinary research primarily to the political field (Muntigl, Weiss & Wodak, 2000; Reisigl & Wodak, 2001). She thinks that it is very important for herself not to stay in the "ivory tower"—in Austria, she was perceived somewhat as a "public intellectual" and positioned herself explicitly with her research on anti-Semitism and racism, as well as on right-wing populist rhetoric. She applied some of her research in guidelines and seminars with teachers, doctors, lawyers, and so forth.3.2Ruth Wodak’s Discourse Historical ApproachWhen we mention Wodak’s main idea and perspectives of CDA, her Discourse Historical Approach is worth being pay much attention to. Discourse Historical Approach interprets the relationship between discourse and social structure from the perspective of cognition and also combines with the research method of human culturology.The theoretical framework of DHA put forward by Wodak is divided into two dimensions: textual production and textual interpretation. As for textual production, Wodak holds that, other than speech act, nonlinguistic elements such as the situation of utterance, the position of addresser, the time, space of speech, and all social variants like the socialization of discourse organizer’s age, occupation, together with some psychologically determined factors like experience, custom, etc, play a very important role in the process of textual production. The process of text production, they are cognitive dimension, social psychological dimension and linguistic dimension. Among them, cognitive dimension is also called intellectual and experimental dimension, which includes people’s cognition, frame, schema, script and so on. Social psychological dimension contains primary conditions like culture, gender, class member, speech situation, individuality, etc. People’s frame and schema of concept and structure the reality all come into being in such primary conditions. Linguistic dimension is the final linguistic form of text. As for textual interpretation, Wodak points out that it is manipulated by social psychological elements as well. Addressees and readers should classify the text at the very beginning, then apply all kinds of strategies to the comprehension of the original text, the next step, according to Wodak, is to explain the text in order to construct “textual basis”and finally interpret the text. The process of textual production and textual interpretation is circulated and interlinked. In these processes, people’s mental model of epistemology and long-term memory gets constant feedbacks of knowledge and experience and also updates frequently. And the process of updating is completed within the systemic, conscious and unconscious strategies.The analytical model of Discourse Historical Approach can also be divided intothree steps: content, discourse strategy and linguistic form of text. Content dimension can be defined as the concrete content or topics of certain discourse determined to be studied. 
Discourse strategy studies all kinds of strategies (including argumentation methods) used in the text. It is the intermediary between the intention and the realization of different communicators at different levels of communication. The linguistic form of text is spread across each layer of discourses, sentences and lexicons in order to discuss the "prototypical" realization of linguistics and the measure of language.

4 Conclusion
This essay mainly presents the background information and the main CDA thought of Ruth Wodak, which helps us form a whole image of this famous scholar. As for her contributions to CDA, this paper mainly introduces her representative approach, that is, the Discourse Historical Approach. As far as we are concerned, the Discourse Historical Approach has its unparalleled advantage in Critical Discourse Analysis. Its most outstanding features are the research method of human culturology and the sufficiency of corpus. However, this model is not consummate. Wodak made beneficial trials in utilizing achievements in social psychology to expound the process of textual production and interpretation. But these trials are limited and lack sufficient explanation of social changes. Therefore, when using cognitive science to illustrate the constructive function of language on society, more efforts still need to be made.

References
Martin, Jim and Wodak, Ruth. Re/Reading the Past [M]. Amsterdam: Benjamins, 2003.
Muntigl, Peter; Weiss, Gilbert and Wodak, Ruth. EU Discourses on Un/employment [M]. Amsterdam: Benjamins, 2001.
Reisigl, Martin and Wodak, Ruth. Discourse and Discrimination [M]. London: Routledge, 2001.
Weiss, Gilbert and Wodak, Ruth. CDA: Theory and Interdisciplinarity [M]. London: Palgrave/Macmillan, 2001.
Wodak, Ruth. The discourse-historical approach. In Ruth Wodak and Michael Meyer (eds.), Methods of Critical Discourse Analysis [M]. London: Sage, 2001: 63-95.
Wodak, Ruth. Mediation between discourse and society: Assessing cognitive approaches. Discourse Studies [J], 2006: 179-190.
Lancaster University, "Linguistics and English Language: Professor Ruth Wodak," /fass/projects/ndcc/download/rw.htm (accessed June 20, 2011).
Lancaster University, "Department of Linguistics and English Language: Dr. Dr. h.c., Dr. Hab. Ruth Wodak," /profiles/265 (accessed June 21, 2011).
Wikipedia, the free encyclopedia, "Ruth Wodak," /wiki/Ruth_Wodak (accessed June 21, 2011).

les AS gR Student Handbook (English)

Deploying with Oracle JDeveloper
To deploy an application with JDeveloper, perform the following steps: 1. Create the deployment profile. 2. Configure the deployment profile. 3. Create an application server connection to the target application server.
Planning the Deployment Process
The deployment process includes:
1. Packaging code as J2EE applications or modules
2. Selecting a parent application
including those in a cluster
Deploying with admin_client.jar
The admin_client.jar tool: • Is a command-line utility • Is executed by using the following basic command:
– defaultDataSource to select the data source used by the application for management of data by EJB entities
– dataSourcesPath to specify an application-specific data sources file

Multi-source and Heterogeneous Data Integration Model for Big Data Analytics in Power DCS


Multi-source and Heterogeneous Data Integration Model for Big Data Analytics in Power DCS

Wengang Chen, Jincheng Power Supply Company, Jincheng, China, jcchenwangang@
Ruijie Wang, Runze Wu, Liangrui Tang, North China Electric Power University, Beijing, China, wang_ruijie2015@ wurz@ tangliangrui@
Junli Fan, Beijing Guodiantong Network Technology Co. Ltd., Beijing, China, fanjunli1@

Abstract—Big data analytics technologies are of vital significance for the strong and smart grid when applied in the power system. Multi-source and heterogeneous data integration technology based on a big data platform is one of the indispensable components. As the dispatching and control system suffers from data heterogeneity and data islands, a multi-source and heterogeneous data integration model is proposed for big data analytics. This model forms the data integration layer of the big data analytics platform. The model improves the Extract-Transform-Load (ETL) process in the big data platform according to extracting rules and transform rules, which are derived from the uniform data model of the panoramic dispatching and control system. Research shows that the integration model developed here is efficient for establishing panoramic data and can adapt to various data sources by building a uniform data model in the power dispatching and control system. With the development of big data technology, it is expected that the data integration model will be improved and used in more electric power applications.

Keywords- power dispatching and control system; big data analysis; data integration model; uniform data model

I. INTRODUCTION
Big data was referenced earliest by the open source project Nutch of the Apache Software Foundation, where it was used to describe the analysis of large data sets in web search applications [1]. Different industries share some consensus but have no unified definition of big data. In 2012, Gartner updated its definition in [2] as follows: "Big data is high volume, high velocity, and high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." The representative features of big data are the 3Vs: volume, variety, velocity. International Data Corporation considers the 4th V of big data to be value with sparse density, whereas IBM regards veracity as the 4th V [3].

As it has the big data characteristics of large volume, rapid growth and rich variety, the data generated in the power system is representative big data [4], [5]. Reference [4] describes the features of electric power big data as 3V and 3E: the 3V stands for volume, velocity and variety, and the 3E indicates that electric power data is energy, exchange and empathy. For the construction of the strong and smart grid, research on big data technologies is needed to enable electric power big data analytics.

Big data in the electric power system spans many areas, including power generation, transmission, distribution, utilization, and scheduling, all of which are encompassed by the big data platform of the State Grid Corporation of China [6]. In the operation of the power grid, the dispatching and control system (DCS) collects vast and various data, which grows rapidly. Hence, the data collected by the DCS has big data characteristics and belongs to big data. However, the big data of the dispatching and control system suffers from the problems of data islands and information heterogeneity.
In order to manage uniformly the data from the various applications of the dispatching and control system, the multi-source and heterogeneous data in the system needs to be integrated in the big data platform to build the panoramic data of the DCS. This can promote big data analytics and data mining for the electric power dispatching and control system.

Extract-Transform-Load (ETL) is one of the more popular approaches to data integration, as shown in [7]. The authors of that work describe the working modules in ETL and introduce a framework that uses a workflow approach to design ETL processes. Other groups have used various approaches such as UML and data mapping diagrams for representing ETL processes, quality-metrics-driven design for ETL, and scheduling of ETL processes [8]-[11]. In order to fuse the heterogeneous data in the DCS and build panoramic data, a multi-source and heterogeneous data integration model is proposed to analyze the big data in the power dispatching and control system. The model adopts improved ETL processes in the big data analytics platform. Meanwhile, the improved ETL processes are based on the uniform data model of the panoramic DCS. The built panoramic data is stored in the big data platform in the form of a distributed data warehouse, which can provide the data for users' data queries and data analysis.

II. BIG DATA OF DCS
A. Uniform Data Model of Panoramic DCS
The data of the dispatching and control system is distributed across application systems such as Supervisory Control and Data Acquisition (SCADA), Energy Management System (EMS), Wide Area Measurement System (WAMS), Project Management Information System (PMIS), etc. Some problems are caused by differences in data storage structures and attribute names among these application systems. To solve these problems, the uniform data model is built based on the IEC TC57 CIM, which is reduced, supplemented and expanded according to the global ontology of the DCS. Core data models of the panoramic data cover the integrated information from SCADA, WAMS, EMS, etc. The power system logic model, which is one of the core data models, is divided into two branches, equipment container and equipment, as shown in Figure 1. Subclasses of equipment include a variety of conducting equipment and auxiliary equipment. The logical grid topology is formed when conducting equipment is connected.

Figure 1. Power System Logic Model

After getting the uniform data model, mapping relationships and transform rules need to be established between the uniform data model of the panoramic DCS and the data model of each application system, which facilitates the data extraction and data transformation.

B. The Platform of Big Data Analytics in DCS
As mentioned before, there are massive data and the problem of heterogeneous information in the power dispatching and control system. Therefore, the data from different sources needs to be fused by big data technologies to build the panoramic data of the DCS in the big data platform. Then, data analysis, data mining and data visualization can be realized on the basis of the panoramic data. The architecture of the big data platform is shown in Figure 2; it includes seven layers, five background processing processes and two interface layers. This framework elaborates the responsibilities of each layer in a left-right and up-down view approach.
The function realized by each layer is as follows.
• Heterogeneous Data Sources: there are more than 10 application systems in the electric power dispatching and control center, such as SCADA, WAMS, EMS, PMIS and the operation management system. These systems provide the massive amounts of original heterogeneous data, which cannot be directly used for data analysis and data mining in the big data field.
• Data Access Layer: in order to speed up data processing, data integration and data analysis are run in a distributed processing system based on Hadoop. Hence, the data access layer supplies a data channel that transmits the data between the data sources and the Hadoop-based distributed processing system.

Figure 2. Architecture of the Big Data Platform (heterogeneous data sources such as SCADA, WAMS, EMS and PMIS; the data access layer; the data integration layer with extract, transform and load steps driven by the uniform data model of the panoramic DCS; a distributed processing system based on Hadoop with Spark and Spark Streaming; storage in Hive, HBase and HDFS; and data queries, data analysis, data mining, reports and data exploration on top)

• Data Integration Layer: in this layer we present our conceptual model by describing ETL processing based on the uniform data model of the panoramic DCS. This layer aims at building the panoramic data of the dispatching and control system and is one of the key modules in the big data analytics platform.
• Data Storage Layer: the panoramic data built in the data integration layer is stored in this layer, which provides the basic data for data queries, data analysis and data mining. The big data platform stores the data in the distributed data warehouse Hive and in the distributed storage systems HDFS and HBase.
• Data Analytics Layer: this layer realizes functions including data queries, data analysis and data mining by processing the data called from the data storage layer. The data mining technology is composed of Bayesian methods, neural network algorithms, results analysis, etc.
• Visual Data Layer: the results of data analysis and data mining are presented in tabular or graph form to provide effective decision-making support information for dispatchers.
• Interactive Interface: the interactive interface can manage the big data platform by accessing the data layers (except the visual data layer) and adjusting the parameters during operation of the platform, which helps the normal and efficient operation of the system.

From the above, the integrated panoramic data is the data center of the big data analytics for the dispatching and control system. Hence, it is important that complete and effective panoramic data is built by a rational data integration method. For obtaining the panoramic data of the DCS, a multi-source and heterogeneous data integration model is proposed for the data integration layer.

III. MULTI-SOURCE AND HETEROGENEOUS DATA INTEGRATION MODEL
In this section, an integration framework for multi-source heterogeneous data is proposed by describing the big data ETL process. The proposed uniform data model and the data integration ETL process are used in this framework.

A. Integration Model Overview
In the data integration layer, we present the big data ETL framework BDETL, which uses Hadoop to parallelize ETL processes and to load the panoramic data into Hive. BDETL employs the Hadoop Distributed File System (HDFS) as the ETL execution platform and Hive as the distributed data warehouse system (see Figure 3).
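As a simplified illustration of the transform rules that the uniform data model induces in this layer, the sketch below renames source-specific bus-table attributes onto the uniform names later shown in Figure 4; the rule tables and record values are illustrative assumptions, not the paper's actual rule set.

```python
# Minimal sketch of an attribute-standardization transform rule, assuming the
# per-source rename mappings implied by Figure 4 (illustrative values only).
RENAME_RULES = {
    "SCADA": {"CHANGZHAN": "STATION", "DIANYALEIXING": "VOLTAGECLASS"},
    "EMS":   {"CHANGZHAN": "STATION", "DIANYADENGJI":  "VOLTAGECLASS"},
}

def to_uniform_model(source: str, record: dict) -> dict:
    """Map one source record onto the uniform BUS-table schema."""
    rules = RENAME_RULES[source]
    return {rules.get(name, name): value for name, value in record.items()}

scada_row = {"CHANGZHAN": "Jincheng-1", "DIANYALEIXING": "110kV"}
ems_row = {"CHANGZHAN": "Jincheng-1", "DIANYADENGJI": "110kV"}

print(to_uniform_model("SCADA", scada_row))  # {'STATION': 'Jincheng-1', 'VOLTAGECLASS': '110kV'}
print(to_uniform_model("EMS", ems_row))      # same uniform schema from a different source
```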
BDETL has a quantity of components, including the application programming interfaces (APIs) used by the user’s or development professionals’ ETL programs, ETL processing engine performing the data extracting and transforming process, and a job manager that controls the execution of the jobs toFigure 3. BDETL ArchitectureThere are three sequential steps in BDETL’s workflow: data extracting process, data transforming process and data loading process. The data warehouse is built by Hive based on the uniform data model of panorama dispatching and control system, therefore, rules of data executions and transformations are specified by users or research staff. So the three steps of BDETL are described as follows:•Data extracting process: this is the first step of ETL that involves data extraction from appropriate datasources. The transformed data must be present inHDFS (including Hive files and HBase files) Whenthe MapReduce (MR) jobs are started (see the left ofFig.3). So the data from heterogeneous sources, suchas SCADA, WAMS and EMS, needs to be uploadedinto HDFS. The source data executed by mappersand reducers is extracted from the HDFS by user-specified rules. Here, a graphical interface isprovided to create a workflow of ETL activities andautomate their execution.•Data transforming process: the data transforming process involves a number of transformationsprocessing data, such as normalizing data, lookups,removing duplicates, checking data integrity andconsistency by constraint from uniform data model,etc. BDETL allows processing of data into multipletables within a job. BDETL’s job manager submitsjobs to Hadoop’s Job-Tracker in sequential order.The jobs for transformation are to be run by thetransforming rules based on uniform data model.•Data loading process: this step involves the propagation of the data into a data warehouse likeHive that serves Big Data. Hive employs HDFS forphysical data storage but presents data in HDFS filesas logical tables. The data can be written directlyinto files which can be used by Hive.B.Executing ProcessAs the ETL process is executed in the big data analysis platform, the heterogeneous data of sources must be update into HDFS. The source data in HDFS is split by Hadoop and assigned to the map tasks. The records from a file split are processed by specified rules in the mappers. A mapper can process data from HDFS that will go to different dimensions. What is more, a dimension must have a key that distinguishes its members. Then, in the shuffle and sort, the mapper output is sent to different reducers by specified rules. The reducer output is written to the HDFS. When the attribute values of a dimension member are overwritten by new values, we also need to update the versions that have already been loaded into Hive.Listing 1 shows pseudocode for the mapper. In this code, Γis a sequence of transformation process as defined by users or designers. r=⊥ says that r is the smallest element. Meanwhile, a transformation can be followed by other transformations. According to the uniform data model, the data in HDFS is extracted into mapper process by the method GetTransformationData() (lines 3). 
The first transformation defines the schematic information of the data source such as the names of attributes, the data types, and the attributes for sorting of versions.

Listing 1 Mapper
1 class Mapper
2   method Initialize()
3     Γ ← GetTransformationData()
4   method Map(Record r)
5     for all t ∈ Γ do
6       r ← t.ProcessRecord(r)
7       if r = ⊥ then
8         return
9       else
10        key ← CreateCompositeKey(r, t.targetDimension)
11        value ← CreateValue(r, t.targetDimension)
12        return (key, value)
13      End if
14    End for

A specialized map-only method handles big dimensions. The input is automatically grouped by the composite key values and sorted before it is fed to the Reduce method. We keep the dimension values temporarily in a buffer (lines 5-10), assign a sequential number to the key of a new dimension record (line 9), and update the attribute values (line 11), including the validity dates and the version number. The method MakeDimensionRecord extracts the dimension's business key from the composite key given to the mapper and combines it with the remaining values. Finally, we write the reduce output with the name of the dimension table as the key and the dimension data as the value (line 12).

Listing 2 Reducer
1 class Reducer
2   method Reduce(CompositeKey key, values[0…n])
3     name ← GetNameofDimensionTable(key)
4     if DimType(name) = TypeList then
5       L ← new List()
6       for i ← 0, n do
7         r ← MakeDimensionRecord(key, values[i])
8         if r[id] = ⊥ then
9           r[id] ← GetDimensionId(name)
10        End if
11        L.add(r)
12        UpdateAttributeValues(L)
13      End for
14      for all r ∈ L do
15        return (name, r)
16      End for
17    End if

IV. IMPLEMENTATION
In this section, we show how to use BDETL to build the panoramic data of the dispatching and control system on a Hadoop-based data analysis platform. This platform includes a local cluster of three computers: two computers are used as the DataNodes and TaskTrackers, each with a quad-core N3700 processor (1.6 GHz) and 4 GB RAM; one machine is used as the NameNode and JobTracker, with two quad-core G3260 (3.3 GHz) processors and 4 GB RAM. All machines are connected via a 20 Mbit/s Ethernet switch. We use Hadoop 0.20.0 and Hive 0.13.0. Two map tasks or reduce tasks are configured on each node in the Hadoop platform.

Data sets for the running example are uploaded into HDFS from SCADA, WAMS, EMS, etc. In the experiments, the data dimension is scaled from 20 GB to 80 GB. The BDETL implementation consists of four steps: 1) upload the source data into HDFS and define the data source on request, 2) set up the transforming rules, a sample model of which is provided in Figure 4, 3) define the target table on the basis of the uniform data model, and 4) add the sequence of jobs.

Figure 4. Standardized Attributes of Heterogeneous Data (the BUS table in the uniform data model, with attributes such as STATION and VOLTAGECLASS, against the bus tables from SCADA, with attributes such as CHANGZHAN and DIANYALEIXING, and from EMS, with attributes such as CHANGZHAN and DIANYADENGJI)

In the process of data integration, we compare BDETL with ETLMR in [12], which is a parallel ETL programming framework using MapReduce. ETLMR is selected because BDETL has the same goal as ETLMR. The performance is studied by comparing the running time of the integration processes in the ETL.

Figure 5 shows the total time of the ETL processes. ETLMR is efficient for processing relatively small data sets, e.g., 20 GB, but its time grows fast when the data is scaled up, and the total time is up to about 23% higher than the time used by BDETL for 100 GB. So BDETL outperforms ETLMR when the data is bigger.

Figure 5. The Total Time of the ETL Process

The time of the initial load is shown in Figure 6.
For BDETL, the initial loads are tested using data with and without co-location. The result shows that the performance is improved significantly by data co-location and about 60% more time is used when there is no co-location. This is because the co-located data can be processed by a map-only job to save time. BDETL is better than ETLMR. For example, when 100Gb is used to test, ETLMR uses up to 3.2 times as long for the load. Meanwhile, the processing time used by ETLMR grows faster with the increase of the data.The time of processes in mappers and reducers is tested. The results are shown in Figure 7. The MapReduce process of BDETL is faster than that of ETLMR. The reason is that the transforming rules based on uniform data model can reduce the complexity of mappers and reducers. So ETLMR takes 1.5 times longer than BDETL.Figure 6.The Time of Initial LoadFigure 7. The Time of Processes in Mappers and ReducersV. C ONCLUSIONS AND F UTURE W ORKThe data from various heterogeneous sources in dispatching and control system need to be integrated into uniform data model of panoramic dispatching and control system, which allows intelligent querying, data analysis and data mining in the big data platform. That is an important open issue in the area of Big Data. For the big data of power dispatching and control system, we proposed the architecture of big data analysis platform. As the data in dispatching and control system exist heterogeneous, this paper proposes an efficient ETL framework BDETL to integrate the data form various heterogeneous sources in a distributed environment. This framework uses uniform data model around dataintegration. Meanwhile, the transforming rules based on uniform data model are conducive to reduce the time of ETL process. We conducted a number of experiments to evaluate BDETL and compared with ETLMR. The results showed that BDETL achieves better performance than ETLMR when processing the integration of heterogeneous data.There are numerous future research directions for ETL process in big data platform. For example, it would be better to establish graphical user interface where users or designers can make an ETL flow by using visual transformation operators; and that we plan to make BDETL support more ETL transformations in power systems.R EFERENCES[1] Gartner. Top ten strategic technology trend for 2012 [EB/OL]. (2011-11-05) [2014-08-17]. .[2] Big data, from Wikipedia, the free encyclopedia./wiki/Big data.[3] Xin Luna Dong and Divesh Srivastava. 2013. Big data integration.Proc. VLDB Endow. 6, 11 (Aug 2013), 1188-89.[4] C hinese Society for Electrical Engineering Information C ommittee,"Chinese Electric Power Big Data Development White Paper (2013)," Chinese Society for Electrical Engineering, Beijing, China, 2013.. [5] China Computer Federation Big Data Experts Committee, "China BigData Technology and Industry Development White Paper (2013)," China Computer Federation, Beijing, China, 2013.[6] Y. Huang and X. Zhou, "Knowledge model for electric power bigdata based on ontology and semantic web," in CSEE Journal of Power and Energy Systems, vol. 1, no. 1, pp. 19-27, March 2015..[7] P. Vassiliadis, A. Simitsis, and E. Baikousi, “A Taxonomy of ETLActivities,” in Proceedings of the AC M Twelfth International Workshop on Data Warehousing and OLAP, New York, NY, USA, 2009, pp. 25–32.[8] S. K. Bansal, "Towards a Semantic Extract-Transform-Load (ETL)Framework for Big Data Integration," 2014 IEEE International Congress on Big Data, Anchorage, AK, 2014, pp. 522-529.[9] A. 
Simitsis, K. Wilkinson, M. C astellanos, and U. Dayal, “QoX-driven ETL design: reducing the cost of ETL consulting engagements,” in Proceedings of the 2009 AC M SIGMOD International Conference on Management of data, 2009, pp. 953–960. [10] Liu, X., Thomsen, C ., and Pedersen, B. T. C loudETL: scalabledimensional ETL for hive. In Proc. of IDEAS, pp. 195–206, 2014.[11] A. Karagiannis, P. Vassiliadis, and A. Simitsis, “Macro-levelScheduling of ETL Workflows,” Submitted for publication, 2009. [12] Liu, X., Thomsen, C., and Pedersen, B. T. The ETLMR MapReduce-based Dimensional ETL Made Easy. PVLDB, 5(12):1882–1885, 2012.。

English Cover Letter (Software Engineering Major)


Dear Hiring Manager,I am writing to express my interest in the Software Engineer position at your esteemed company. With a solid background in software engineering and a passion for developing innovative solutions, I believe I would be a valuable asset to your team.I have recently completed my degree in Software Engineering from XYZ University, where I gained a comprehensive understanding of the principles and practices of software development. During my studies, I had the opportunity to work on various projects that allowed me to develop a strong foundation in programming languages such as Java, C++, and Python. I have also gained experience with software development frameworks and tools such as React, Node.js, and Git.Throughout my academic journey, I have consistently demonstrated a commitment to excellence. I have received numerous awards and recognitions for my academic achievements, including the prestigious XYZ Scholarship for excellence in software engineering. Additionally, I have actively participated in extracurricular activities, such as coding competitions and hackathons, where I have honed my problem-solving and teamwork skills.In addition to my academic accomplishments, I have had the opportunity to gain practical experience through internships and freelance projects. During my internship at ABC Technologies, I worked on developing a web application for managing customer data. I was responsible for designing the database schema, implementing the backend logic, and integrating the frontend interface. This experience not only allowed me to apply my programming skills in a real-world setting but also taught me the importance of collaboration and effective communication in a team environment.As a freelancer, I have completed several projects, including the development of a mobile application for a local startup and the implementation of a machine learning algorithm for data analysis. These projects have given me the flexibility to work on a diverse range oftechnologies and have allowed me to develop a strong sense of ownership and accountability for the solutions I deliver.I am confident that my strong technical skills, coupled with my passion for innovation and commitment to excellence, make me a suitable candidate for the Software Engineer position at your company. I am eager to contribute to the success of your team and help drive the development of cutting-edge software solutions.Thank you for considering my application. I would welcome the opportunity to discuss how my skills and experiences align with the requirements of the position further. I have attached my resume for your review, and I look forward to the possibility of meeting with you to discuss my qualifications in more detail.Yours sincerely,[Your Name]。

Chip Failure Analysis System: Avalon Software System Manual


DATASHEET Overview Avalon software system is the next-generation CAD navigation standard for failure analysis, design debug and low-yield analysis. Avalon is a power packed product with tools, features, options and networking capability that provides a complete system for fast, efficient and accurate investigation of inspection, test and analysis jobs. Avalon optimizes the equipment and personnel resources of design and semiconductor failure analysis (FA) labs by providing an easy-to-use software interface and navigation capabilities for almost every type of test and analytical failure analysis equipment.Avalon enables closer collaboration of product and design groups with FA labs, dramatically improving time to yield and market. Avalon can import CAD design data from all key design tools and several user-proprietary formats while providing visual representations of circuits that can be annotated, exploded, searched and linked with ease.Benefits • Improves failure analysis productivity through a common software platform for various FA equipment • Significantly decreases time to market with reduced FA cycle time • Faster problem solving by cross-mapping between device nodes to view all three design domains (layout, netlist and schematic) simultaneously • Increases accuracy of FA root cause analysis using advanced debug tools • Single application that overlays images from various FA equipment on to design layout • Secure access to all FA information using KDB™ database • Design independent system that supports all major layout versus schematic (LVS)• Complete access to all debug tools critical to failure trace, circuit debug and killer defect source analysis • Simple deployment setup with support for Linux and Windows • Seamless integration with legacy Camelot™ and Merlin™ databases • Ease of conversion for layout, netlist and schematic data and establishes cross-mapping links between each data entityCAD Navigation andDebug Solutions forFailure AnalysisAvalonFigure 1: Avalon CAD-navigation system integrating layout, signal tracing and 3D viewSupporting all CAD Design DataSynopsys is committed to being the leading provider of software solutions that links all CAD design data. Avalon is a comprehensive package that reads all EDA tools and design data from verification systems and several user-proprietary formats. The KDB™database is designed to interface with all key design formats.Today, there are more EDA developers and more verification package choices; Synopsys is the only company thatsupports all of them.• LVS Conversions: Cadence (Assura, DIVA), Mentor Graphics (CheckMate, Calibre), Synopsys (Hercules, ICV)• Netlist Conversion: SPICE, EDIF, OpenAccess• Layout Conversion: GDSII, OASIS®The highest priorities for Avalon users are faster data accessibility, support diverse failure analysis equipment and availability of debug tools. Avalon provides the optimal solution for both small and continually-expanding FA labs and design debug teams. The Avalon database is design independent and offers a superior level of data consistency and security. The unique design of the internal database schema guarantees compatibility with decades-old databases. 
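As a rough illustration of how inspection-file defect coordinates can be turned into die-relative positions for stage navigation, the sketch below assumes a simple wafer map with a fixed die pitch and origin; the pitch, origin, and defect records are hypothetical and not taken from the Avalon documentation.

```python
# Hypothetical sketch: convert wafer-level defect coordinates (in mm) into a
# die index plus die-local coordinates, roughly what a CAD-navigation tool
# needs before driving a stage to the defect. Pitch/origin values are assumed.
DIE_PITCH_X, DIE_PITCH_Y = 10.0, 12.0            # die step sizes in mm
WAFER_ORIGIN_X, WAFER_ORIGIN_Y = -75.0, -75.0    # lower-left corner of the die grid

def locate_defect(x_mm: float, y_mm: float):
    """Return (die column, die row, local x, local y) for one defect."""
    dx, dy = x_mm - WAFER_ORIGIN_X, y_mm - WAFER_ORIGIN_Y
    col, row = int(dx // DIE_PITCH_X), int(dy // DIE_PITCH_Y)
    local_x, local_y = dx - col * DIE_PITCH_X, dy - row * DIE_PITCH_Y
    return col, row, local_x, local_y

defects = [(-12.3, 4.7), (33.0, -18.5)]          # defect coordinates from an inspection file
for x, y in defects:
    print(locate_defect(x, y))
```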
This is an indispensable feature for all failure analysis, QAand manufacturing organizations especially in the automotive industry.Figure 2: Avalon SchemView and NetView provide an easy way to navigate inside circuit schematicsProviding Critical Analysis FunctionsIn addition to its CAD navigation and database capabilities, Avalon’s analysis features have become indispensable to the FA lab. Different viewing options are critical in tracking potential failures and determining the source and origin of killer defects. Avalon includes special schematic capabilities and layout features that are invaluable to FA engineers as they debug chips manufactured using new processes.Avalon View Only Client consists of maskview, netview, schemview, i-schemview, K-EDIT, defect wafermap and 3D-SAA. The list below details some of the most commonly used applications.Defect Wafer Map integrates defect inspection data with the device CAD design using the defect coordinates to navigate an equipment stage and pinpoint the defect for closer inspection and characterization. Avalon sorts defects by size, location or class, as well as layout location and allows the user to define custom wafer maps. Additionally, users can classify defects, attach images and write updated information to the defect files.Figure 3: Defect Wafer Map pinpoints defects for closer inspectionSchemView provides tracking of potential failures through visualization of the chip logic. Cross-mapping of nets and instances to the device layout and netlist, SchemView helps determine the source and origin of chip failures. SchemView helps determine the source and origin of chip failures. The entire design is displayed in cell hierarchy format, allowing push-down to a transistor level.Figure 4: K-Edit allows collaboration between design, fab and labI-Schem (Interactive Schematic) creates a schematic from a netlist in a net-oriented format allowing forward and backward tracking to locate a fault. Features like Add Driver or Add Input Cone allow for quick analysis and verification of diagnostic resultsin scan chains.Figure 5: I-Schem creates a schematic from a netlistK-Bitmap allows equipment CAD navigation when analyzing memory chips by identifying the physical location of failingmemory cells. It eliminates tedious screen counting by converting the logical addresses, or row and column coordinates, to thephysical location.Figure 6: K-Bitmap identifies the physical location of bit addresses in memory devices3D Small-Area Analysis provides a three-dimensional cross- section capability to FA engineers, enabling faster localization of circuit failures to accelerate IC manufacturing yield improvement.Figure 7: 3D Small-Area Analysis enables faster localization of circuit failuresHot-Spot Analyzer allows user to draw regions on the layout that correspond to hot-spot regions (emission spots) to detect the crucial nets. It finds the nets in each hot-spot region and plots a pareto graph of nets crossing one or more hotspots which helps to easily locate the killer net.Figure 8: Hot-Spot Analyzer displays number of nets in a hot spotUser-Defined Online Search (UDOS) allows users to search a small area of a die for unique polygon features, repeated features or lack of features. 
Applications include, but are not limited to, FIB-able regions, repeaters, pattern fidelity and lithographic applications.Figure 9: User-Defined Online Search (UDOS) finds easy-to-access tracesPassive Voltage Contrast Checker (PVC) quickly and accurately validates the integrity of a circuit’s conductivity and provides detailed information for identifying suspect faults at via or metal tracesFigure 10: Passive Voltage Contrast (PVC) Checker identifies suspect vias or metal tracesElectronic Virtual Layer marks objects to represent net connectivity during a FIB deposit or cut using KEdit. The online trace will simulate the new connectivity to the virtual layer. PVC checker could be used on this virtual layer to simulate the crack or short.Check Adjacent Nets allows logical analysis of nets. This command line tool finds the adjacent nets which are within user-specified threshold distance to find shorts.Export Partial Layout enables the customer to share partial layout data with service labs without compromising the IP of the product.Image Mapper automates the image alignment process in Avalon Maskview and saves a lot of time and effort spent inmanual alignment.Advanced 3D Viewer displays real time 3D view of the selected layout area. It shows each process step in the 3D view for which it uses the process data along with design data. It zooms into smaller details and helps to minimize unintended consequences during FIB cuts due to underneath high density structure.Avalon SolutionAvalon brings all the advantages of enterprise-wide computing for FA of the chip. Avalon is an open architecture system that connects users over local and wide area networks for seamless integration and database sharing. Instrument integration throughout the fab and other locations throughout the enterprise enables viewing, modifying, characterizing and testing the same wafer location with different instruments, or the same location on wafers at different facilities using the same chip design.Figure 11: Avalon’s open architecture integrates with Synopsys’ Yield ExplorerIC DesignToolsFigure 12: Avalon server solutionComprehensive Library of FA Tool DriversAvalon provides navigation with almost every equipment used in the FA lab. With a continued commitment to support drivers for all types of test and analysis equipment, Synopsys will continue to develop driver interfaces for new tools as they are introduced to the market, as well as the next generation of existing tools.Equipment Supported by Avalon• Analytical Probe Stations• Atomic Force Microscopes• E-Beam Probers• IR Imaging• Mechanical Stage Controllers• Emission Microscopes• Microanalysis Systems• FIB Workstation• Laser Voltage Probe• LSM• EDA LVS• Microchemical Lasers• OBIC Instruments• Optical Review• SEM Tools• Photon Emission Microscopes• Laser Scan Microscopes©2018 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks isavailable at /copyright.html . All other names mentioned herein are trademarks or registered trademarks of their respective owners.。
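As a rough illustration of the kind of computation the Hot-Spot Analyzer description implies (count, for each net, how many user-drawn hot-spot regions it crosses, then rank the nets to surface the likely killer net), here is a small sketch. It is not Synopsys code; the rectangle-based net segments, the hot-spot regions, and the overlap test are all invented for illustration.

```python
from collections import Counter

# Hypothetical inputs: each net is a list of axis-aligned bounding boxes for its
# layout segments, and hot spots are user-drawn (xmin, ymin, xmax, ymax) regions.
nets = {
    "net_a": [(0, 0, 10, 2), (10, 0, 12, 8)],
    "net_b": [(5, 5, 30, 7)],
    "net_c": [(50, 50, 60, 52)],
}
hot_spots = [(8, 1, 14, 6), (25, 4, 35, 9)]

def overlaps(a, b):
    """Axis-aligned rectangle intersection test."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Count, for each net, how many hot-spot regions any of its segments cross.
counts = Counter()
for name, segments in nets.items():
    counts[name] = sum(
        any(overlaps(seg, hs) for seg in segments) for hs in hot_spots
    )

# Pareto-style ranking: the net crossing the most hot spots is the prime suspect.
for name, n in counts.most_common():
    print(f"{name}: crosses {n} hot-spot region(s)")
```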

Database Principles: Multiple-Choice Questions (in English)

数据库原理英文选择题1. Which of the following is NOT a characteristic of a relational database?A. Data is stored in tablesB. Data is accessed through SQLC. Data redundancy is encouragedD. Data integrity is maintained2. What is the primary function of a primary key in a database table?A. To ensure data uniquenessB. To establish relationships between tablesC. To provide a means of data encryptionD. To improve query performance3. In a relational database, which of the following represents a relationship between two tables?A. Primary keyB. Foreign keyC. IndexD. Trigger4. Which of the following SQL statements is used to retrieve data from a database?A. SELECTB. INSERTC. UPDATED. DELETE5. What is the purpose of normalization in database design?A. To improve data redundancyB. To eliminate data anomaliesC. To increase data storage spaceD. To decrease query performanceA. A primary key consisting of a single columnB. A foreign key referencing a primary keyC. A primary key consisting of multiple columnsD. A unique key that allows null values7. In SQL, what is the difference between a WHERE clause and a HAVING clause?A. WHERE clause filters rows before grouping, while HAVING clause filters groups after groupingB. WHERE clause is used with SELECT statements, while HAVING clause is used with UPDATE statementsC. WHERE clause is used to sort data, while HAVING clause is used to filter dataD. WHERE clause is used with JOIN operations, while HAVING clause is used with subqueriesA. CREATE TABLEB. ALTER TABLEC. DROP TABLED. SELECT TABLE9. What is the purpose of an index in a database?A. To improve data redundancyB. To enhance data securityC. To speed up query executionD. To reduce data integrity10. Which of the following is NOT a type of database constraint?A. PRIMARY KEYB. FOREIGN KEYC. UNIQUED. VIEW数据库原理英文选择题(续)11. When designing a database, which of the following isa key principle to ensure data consistency?A. Data duplicationB. Data isolationC. Data abstractionD. Data normalization12. In a database, what is the term used to describe the process of converting a query into an execution plan?A. ParsingB. OptimizationC. CompilationD. Execution13. Which of the following SQL statements is used to modify existing data in a database table?A. SELECTB. INSERTC. UPDATED. DELETE14. What is the purpose of a transaction in a database system?A. To store data permanentlyB. To ensure data consistencyC. To improve query performanceD. To create new tables15. Which of the following is a type of join that returns rows when there is at least one match in both tables?A. INNER JOINB. LEFT JOINC. RIGHT JOIND. FULL OUTER JOIN16. In a database, what is the term used to describe the process of retrieving only distinct (unique) values from a column?A. GROUP BYB. ORDER BYC. DISTINCTD. COUNTA. DROP TABLEB. DELETE TABLEC. TRUNCATE TABLED. ALTER TABLE18. What is the purpose of a stored procedure in a database?A. To store temporary dataB. To perform a series of SQL operationsC. To create a new databaseD. To delete a database19. Which of the following is a characteristic of a NoSQL database?A. It uses a fixed schemaB. It is optimized for structured dataC. It is horizontally scalableD. It only supports SQL as a query language20. In a database, what is the term used to describe a collection of related data organized in rows and columns?A. TableB. ViewC. SchemaD. Database数据库原理英文选择题(续二)21. What is the difference between a database and a data warehouse?A. A database stores current data, while a data warehouse stores historical dataB. 
A database is used for transactional purposes, while a data warehouse is used for analytical purposesC. A database is small in size, while a data warehouse is large in sizeD. A database is structured, while a data warehouse is unstructuredB. To enforce referential integrityC. To format data before it is displayedA. UnionB. JoinC. IntersectionD. Concatenation24. Which of the following SQL keywords is used to limit the number of rows returned a query?A. LIMITB. FETCHC. OFFSETD. ROWS25. What is the purpose of a database schema?A. To define the physical storage of dataB. To define the logical structure of a databaseC. To define the security permissions for usersD. To define the backup and recovery procedures26. Which of the following is NOT a type of database management system (DBMS)?A. Relational DBMSB. Document DBMSC. Hierarchical DBMSD. Sequential DBMS27. In a database, what is the term used to describe a collection of data that is treated as a single unit?A. TupleB. AttributeC. RelationD. Entity28. Which of the following SQL statements is used to create a view in a database?A. CREATE VIEWB. ALTER VIEWC. DROP VIEWD. SELECT VIEW29. What is the purpose of a database index?A. To sort data in ascending or descending orderB. To improve the speed of data retrievalC. To enforce uniqueness of dataD. To hide sensitive data from users30. Which of the following is a characteristic of a distributed database?A. Data is stored in a single locationB. Data is replicated across multiple locationsC. Data access is limited to a single user at a timeD. Data consistency is not maintained across locations。
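Several of the concepts quizzed above, the WHERE versus HAVING distinction in particular, are easier to see in a runnable query. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in; the orders table and its rows are invented for illustration, and the same SQL behaves the same way on most relational databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("alice", 80.0), ("bob", 40.0), ("bob", 15.0), ("carol", 300.0)],
)

# WHERE filters individual rows BEFORE grouping;
# HAVING filters the groups AFTER aggregation.
cur.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 20            -- row-level filter (drops bob's 15.0 order)
    GROUP BY customer
    HAVING SUM(amount) > 100     -- group-level filter (keeps big spenders only)
    ORDER BY total DESC
""")
print(cur.fetchall())   # [('carol', 300.0), ('alice', 200.0)]
conn.close()
```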

15544 English for Automation Majors (Reference Answers to Exercises)

English for Automation Majors (chief editor: Jiang Shuyan; compiled by Zhang Changhua, Xu Xinhao, and He Fang)
Reference Answers to Exercises

Unit 1
A. Basic Laws of Electrical Networks

[EX.1] Comprehension
1. KCL: The algebraic sum of the currents entering any node is zero.
   KVL: The algebraic sum of the voltages around any closed path is zero.
2. Node: a point at which two or more elements have a common connection is called a node.
   Branch: a single path in a network, composed of one simple element and the node at each end of that element.
   Path: if no node is encountered more than once, then the set of nodes and elements that we have passed through is defined as a path.
   Loop: if the node at which we started is the same as the node at which we ended, then the path is, by definition, a closed path or a loop. A path is a particular collection of branches.
3. 4, 5; we can form a path but not a loop.
4. v_R2 = 32 V, V_x = 6 V

[EX.2] Translation from English to Chinese (reference translation, given here in English)
1. If the node with the largest number of connected branches is defined as the reference node, the resulting equations are relatively simple.
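The two laws in [EX.1] are easiest to check on a concrete loop. The short worked example below is illustrative only: it uses an assumed circuit (a 12 V source in series with 2 Ω and 4 Ω resistors), not the circuit from the textbook exercise, so its numbers differ from answer 4 above.

```latex
% Assumed circuit: a 12 V source driving R_1 = 2\,\Omega and R_2 = 4\,\Omega in series.
% KVL around the single closed loop (algebraic sum of voltages is zero):
\begin{aligned}
-12\,\mathrm{V} + i R_1 + i R_2 &= 0
  &&\Rightarrow\; i = \frac{12\,\mathrm{V}}{2\,\Omega + 4\,\Omega} = 2\,\mathrm{A},\\
v_{R_2} &= i R_2 = 8\,\mathrm{V}.
\end{aligned}
% KCL at the node joining R_1 and R_2: the current entering equals the current leaving,
% so the algebraic sum of currents at that node is 2\,\mathrm{A} - 2\,\mathrm{A} = 0.
```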

2024 Gaokao (National College Entrance Examination) English Paper (Beijing) with Explanations

[Answers]
11. to rest
12. self-awareness
13. gives
14. boundaries

[Explanations]
[Overview] This is an expository passage. It mainly explains why slowing down matters for personal growth.

[Question 11]
This item tests non-finite verbs. Sentence meaning: taking time to rest allows us to develop a deeper level of self-awareness. "take (the) time to do sth." is a fixed expression meaning "to spend time doing something", so the blank requires an infinitive. The answer is therefore "to rest".

9. A. whispering  B. arguing  C. clapping  D. stretching
10. A. funnier  B. fairer  C. cleverer  D. braver

[Answers] 1. C  2. B  3. D  4. A  5. B  6. D  7. B  8. A  9. C  10. D

[Explanations]
[Overview] This is a narrative passage. It tells how the author, deciding simply to give it a try, auditioned for a musical, unexpectedly won the lead role, and discovered through the experience the joy of trying new things.

[Question 5]
This item tests verb meaning. Sentence meaning: then they tested my singing skills and asked what role I wanted to play. A. advertised; B. tested; C. challenged; D. polished. From the preceding sentence "I entered the room and the teachers made me say some lines from the musical." and the following sentence "The teachers were smiling and praising me.", we can infer that after the author entered the drama room, the teachers had the author read some lines, tested the author's singing, and were pleased with the performance. The answer is B.

[Question 3]
This item tests noun meaning. Sentence meaning: at 1:10 a line had formed outside the drama room. A. game; B. show; C. play; D. line. From the following sentences "Everyone looked energetic. I hadn't expected I'd be standing there that morning." we know that a line had formed outside the drama room. The answer is D.

ACM-GIS 2006: A Peer-to-Peer Spatial Cloaking Algorithm for Anonymous Location-based Services

A Peer-to-Peer Spatial Cloaking Algorithm for AnonymousLocation-based Services∗Chi-Yin Chow Department of Computer Science and Engineering University of Minnesota Minneapolis,MN cchow@ Mohamed F.MokbelDepartment of ComputerScience and EngineeringUniversity of MinnesotaMinneapolis,MNmokbel@Xuan LiuIBM Thomas J.WatsonResearch CenterHawthorne,NYxuanliu@ABSTRACTThis paper tackles a major privacy threat in current location-based services where users have to report their ex-act locations to the database server in order to obtain their desired services.For example,a mobile user asking about her nearest restaurant has to report her exact location.With untrusted service providers,reporting private location in-formation may lead to several privacy threats.In this pa-per,we present a peer-to-peer(P2P)spatial cloaking algo-rithm in which mobile and stationary users can entertain location-based services without revealing their exact loca-tion information.The main idea is that before requesting any location-based service,the mobile user will form a group from her peers via single-hop communication and/or multi-hop routing.Then,the spatial cloaked area is computed as the region that covers the entire group of peers.Two modes of operations are supported within the proposed P2P spa-tial cloaking algorithm,namely,the on-demand mode and the proactive mode.Experimental results show that the P2P spatial cloaking algorithm operated in the on-demand mode has lower communication cost and better quality of services than the proactive mode,but the on-demand incurs longer response time.Categories and Subject Descriptors:H.2.8[Database Applications]:Spatial databases and GISGeneral Terms:Algorithms and Experimentation. Keywords:Mobile computing,location-based services,lo-cation privacy and spatial cloaking.1.INTRODUCTIONThe emergence of state-of-the-art location-detection de-vices,e.g.,cellular phones,global positioning system(GPS) devices,and radio-frequency identification(RFID)chips re-sults in a location-dependent information access paradigm,∗This work is supported in part by the Grants-in-Aid of Re-search,Artistry,and Scholarship,University of Minnesota. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.ACM-GIS’06,November10-11,2006,Arlington,Virginia,USA. Copyright2006ACM1-59593-529-0/06/0011...$5.00.known as location-based services(LBS)[30].In LBS,mobile users have the ability to issue location-based queries to the location-based database server.Examples of such queries include“where is my nearest gas station”,“what are the restaurants within one mile of my location”,and“what is the traffic condition within ten minutes of my route”.To get the precise answer of these queries,the user has to pro-vide her exact location information to the database server. 
With untrustworthy servers,adversaries may access sensi-tive information about specific individuals based on their location information and issued queries.For example,an adversary may check a user’s habit and interest by knowing the places she visits and the time of each visit,or someone can track the locations of his ex-friends.In fact,in many cases,GPS devices have been used in stalking personal lo-cations[12,39].To tackle this major privacy concern,three centralized privacy-preserving frameworks are proposed for LBS[13,14,31],in which a trusted third party is used as a middleware to blur user locations into spatial regions to achieve k-anonymity,i.e.,a user is indistinguishable among other k−1users.The centralized privacy-preserving frame-work possesses the following shortcomings:1)The central-ized trusted third party could be the system bottleneck or single point of failure.2)Since the centralized third party has the complete knowledge of the location information and queries of all users,it may pose a serious privacy threat when the third party is attacked by adversaries.In this paper,we propose a peer-to-peer(P2P)spatial cloaking algorithm.Mobile users adopting the P2P spatial cloaking algorithm can protect their privacy without seeking help from any centralized third party.Other than the short-comings of the centralized approach,our work is also moti-vated by the following facts:1)The computation power and storage capacity of most mobile devices have been improv-ing at a fast pace.2)P2P communication technologies,such as IEEE802.11and Bluetooth,have been widely deployed.3)Many new applications based on P2P information shar-ing have rapidly taken shape,e.g.,cooperative information access[9,32]and P2P spatio-temporal query processing[20, 24].Figure1gives an illustrative example of P2P spatial cloak-ing.The mobile user A wants tofind her nearest gas station while beingfive anonymous,i.e.,the user is indistinguish-able amongfive users.Thus,the mobile user A has to look around andfind other four peers to collaborate as a group. 
In this example,the four peers are B,C,D,and E.Then, the mobile user A cloaks her exact location into a spatialA B CDEBase Stationregion that covers the entire group of mobile users A ,B ,C ,D ,and E .The mobile user A randomly selects one of the mobile users within the group as an agent .In the ex-ample given in Figure 1,the mobile user D is selected as an agent.Then,the mobile user A sends her query (i.e.,what is the nearest gas station)along with her cloaked spa-tial region to the agent.The agent forwards the query to the location-based database server through a base station.Since the location-based database server processes the query based on the cloaked spatial region,it can only give a list of candidate answers that includes the actual answers and some false positives.After the agent receives the candidate answers,it forwards the candidate answers to the mobile user A .Finally,the mobile user A gets the actual answer by filtering out all the false positives.The proposed P2P spatial cloaking algorithm can operate in two modes:on-demand and proactive .In the on-demand mode,mobile clients execute the cloaking algorithm when they need to access information from the location-based database server.On the other side,in the proactive mode,mobile clients periodically look around to find the desired number of peers.Thus,they can cloak their exact locations into spatial regions whenever they want to retrieve informa-tion from the location-based database server.In general,the contributions of this paper can be summarized as follows:1.We introduce a distributed system architecture for pro-viding anonymous location-based services (LBS)for mobile users.2.We propose the first P2P spatial cloaking algorithm for mobile users to entertain high quality location-based services without compromising their privacy.3.We provide experimental evidence that our proposed algorithm is efficient in terms of the response time,is scalable to large numbers of mobile clients,and is effective as it provides high-quality services for mobile clients without the need of exact location information.The rest of this paper is organized as follows.Section 2highlights the related work.The system model of the P2P spatial cloaking algorithm is presented in Section 3.The P2P spatial cloaking algorithm is described in Section 4.Section 5discusses the integration of the P2P spatial cloak-ing algorithm with privacy-aware location-based database servers.Section 6depicts the experimental evaluation of the P2P spatial cloaking algorithm.Finally,Section 7con-cludes this paper.2.RELATED WORKThe k -anonymity model [37,38]has been widely used in maintaining privacy in databases [5,26,27,28].The main idea is to have each tuple in the table as k -anonymous,i.e.,indistinguishable among other k −1tuples.Although we aim for the similar k -anonymity model for the P2P spatial cloaking algorithm,none of these techniques can be applied to protect user privacy for LBS,mainly for the following four reasons:1)These techniques preserve the privacy of the stored data.In our model,we aim not to store the data at all.Instead,we store perturbed versions of the data.Thus,data privacy is managed before storing the data.2)These approaches protect the data not the queries.In anonymous LBS,we aim to protect the user who issues the query to the location-based database server.For example,a mobile user who wants to ask about her nearest gas station needs to pro-tect her location while the location information of the gas station is not protected.3)These approaches 
guarantee the k -anonymity for a snapshot of the database.In LBS,the user location is continuously changing.Such dynamic be-havior calls for continuous maintenance of the k -anonymity model.(4)These approaches assume a unified k -anonymity requirement for all the stored records.In our P2P spatial cloaking algorithm,k -anonymity is a user-specified privacy requirement which may have a different value for each user.Motivated by the privacy threats of location-detection de-vices [1,4,6,40],several research efforts are dedicated to protect the locations of mobile users (e.g.,false dummies [23],landmark objects [18],and location perturbation [10,13,14]).The most closed approaches to ours are two centralized spatial cloaking algorithms,namely,the spatio-temporal cloaking [14]and the CliqueCloak algorithm [13],and one decentralized privacy-preserving algorithm [23].The spatio-temporal cloaking algorithm [14]assumes that all users have the same k -anonymity requirements.Furthermore,it lacks the scalability because it deals with each single request of each user individually.The CliqueCloak algorithm [13]as-sumes a different k -anonymity requirement for each user.However,since it has large computation overhead,it is lim-ited to a small k -anonymity requirement,i.e.,k is from 5to 10.A decentralized privacy-preserving algorithm is proposed for LBS [23].The main idea is that the mobile client sends a set of false locations,called dummies ,along with its true location to the location-based database server.However,the disadvantages of using dummies are threefold.First,the user has to generate realistic dummies to pre-vent the adversary from guessing its true location.Second,the location-based database server wastes a lot of resources to process the dummies.Finally,the adversary may esti-mate the user location by using cellular positioning tech-niques [34],e.g.,the time-of-arrival (TOA),the time differ-ence of arrival (TDOA)and the direction of arrival (DOA).Although several existing distributed group formation al-gorithms can be used to find peers in a mobile environment,they are not designed for privacy preserving in LBS.Some algorithms are limited to only finding the neighboring peers,e.g.,lowest-ID [11],largest-connectivity (degree)[33]and mobility-based clustering algorithms [2,25].When a mo-bile user with a strict privacy requirement,i.e.,the value of k −1is larger than the number of neighboring peers,it has to enlist other peers for help via multi-hop routing.Other algorithms do not have this limitation,but they are designed for grouping stable mobile clients together to facil-Location-based Database ServerDatabase ServerDatabase ServerFigure 2:The system architectureitate efficient data replica allocation,e.g.,dynamic connec-tivity based group algorithm [16]and mobility-based clus-tering algorithm,called DRAM [19].Our work is different from these approaches in that we propose a P2P spatial cloaking algorithm that is dedicated for mobile users to dis-cover other k −1peers via single-hop communication and/or via multi-hop routing,in order to preserve user privacy in LBS.3.SYSTEM MODELFigure 2depicts the system architecture for the pro-posed P2P spatial cloaking algorithm which contains two main components:mobile clients and location-based data-base server .Each mobile client has its own privacy profile that specifies its desired level of privacy.A privacy profile includes two parameters,k and A min ,k indicates that the user wants to be k -anonymous,i.e.,indistinguishable among k users,while A min 
specifies the minimum resolution of the cloaked spatial region.The larger the value of k and A min ,the more strict privacy requirements a user needs.Mobile users have the ability to change their privacy profile at any time.Our employed privacy profile matches the privacy re-quirements of mobiles users as depicted by several social science studies (e.g.,see [4,15,17,22,29]).In this architecture,each mobile user is equipped with two wireless network interface cards;one of them is dedicated to communicate with the location-based database server through the base station,while the other one is devoted to the communication with other peers.A similar multi-interface technique has been used to implement IP multi-homing for stream control transmission protocol (SCTP),in which a machine is installed with multiple network in-terface cards,and each assigned a different IP address [36].Similarly,in mobile P2P cooperation environment,mobile users have a network connection to access information from the server,e.g.,through a wireless modem or a base station,and the mobile users also have the ability to communicate with other peers via a wireless LAN,e.g.,IEEE 802.11or Bluetooth [9,24,32].Furthermore,each mobile client is equipped with a positioning device, e.g.,GPS or sensor-based local positioning systems,to determine its current lo-cation information.4.P2P SPATIAL CLOAKINGIn this section,we present the data structure and the P2P spatial cloaking algorithm.Then,we describe two operation modes of the algorithm:on-demand and proactive .4.1Data StructureThe entire system area is divided into grid.The mobile client communicates with each other to discover other k −1peers,in order to achieve the k -anonymity requirement.TheAlgorithm 1P2P Spatial Cloaking:Request Originator m 1:Function P2PCloaking-Originator (h ,k )2://Phase 1:Peer searching phase 3:The hop distance h is set to h4:The set of discovered peers T is set to {∅},and the number ofdiscovered peers k =|T |=05:while k <k −1do6:Broadcast a FORM GROUP request with the parameter h (Al-gorithm 2gives the response of each peer p that receives this request)7:T is the set of peers that respond back to m by executingAlgorithm 28:k =|T |;9:if k <k −1then 10:if T =T then 11:Suspend the request 12:end if 13:h ←h +1;14:T ←T ;15:end if 16:end while17://Phase 2:Location adjustment phase 18:for all T i ∈T do19:|mT i .p |←the greatest possible distance between m and T i .pby considering the timestamp of T i .p ’s reply and maximum speed20:end for21://Phase 3:Spatial cloaking phase22:Form a group with k −1peers having the smallest |mp |23:h ←the largest hop distance h p of the selected k −1peers 24:Determine a grid area A that covers the entire group 25:if A <A min then26:Extend the area of A till it covers A min 27:end if28:Randomly select a mobile client of the group as an agent 29:Forward the query and A to the agentmobile client can thus blur its exact location into a cloaked spatial region that is the minimum grid area covering the k −1peers and itself,and satisfies A min as well.The grid area is represented by the ID of the left-bottom and right-top cells,i.e.,(l,b )and (r,t ).In addition,each mobile client maintains a parameter h that is the required hop distance of the last peer searching.The initial value of h is equal to one.4.2AlgorithmFigure 3gives a running example for the P2P spatial cloaking algorithm.There are 15mobile clients,m 1to m 15,represented as solid circles.m 8is the request originator,other black circles represent the mobile clients 
received the request from m 8.The dotted circles represent the commu-nication range of the mobile client,and the arrow represents the movement direction.Algorithms 1and 2give the pseudo code for the request originator (denoted as m )and the re-quest receivers (denoted as p ),respectively.In general,the algorithm consists of the following three phases:Phase 1:Peer searching phase .The request origina-tor m wants to retrieve information from the location-based database server.m first sets h to h ,a set of discovered peers T to {∅}and the number of discovered peers k to zero,i.e.,|T |.(Lines 3to 4in Algorithm 1).Then,m broadcasts a FORM GROUP request along with a message sequence ID and the hop distance h to its neighboring peers (Line 6in Algorithm 1).m listens to the network and waits for the reply from its neighboring peers.Algorithm 2describes how a peer p responds to the FORM GROUP request along with a hop distance h and aFigure3:P2P spatial cloaking algorithm.Algorithm2P2P Spatial Cloaking:Request Receiver p1:Function P2PCloaking-Receiver(h)2://Let r be the request forwarder3:if the request is duplicate then4:Reply r with an ACK message5:return;6:end if7:h p←1;8:if h=1then9:Send the tuple T=<p,(x p,y p),v maxp ,t p,h p>to r10:else11:h←h−1;12:Broadcast a FORM GROUP request with the parameter h 13:T p is the set of peers that respond back to p14:for all T i∈T p do15:T i.h p←T i.h p+1;16:end for17:T p←T p∪{<p,(x p,y p),v maxp ,t p,h p>};18:Send T p back to r19:end ifmessage sequence ID from another peer(denoted as r)that is either the request originator or the forwarder of the re-quest.First,p checks if it is a duplicate request based on the message sequence ID.If it is a duplicate request,it sim-ply replies r with an ACK message without processing the request.Otherwise,p processes the request based on the value of h:Case1:h= 1.p turns in a tuple that contains its ID,current location,maximum movement speed,a timestamp and a hop distance(it is set to one),i.e.,< p,(x p,y p),v max p,t p,h p>,to r(Line9in Algorithm2). Case2:h> 1.p decrements h and broadcasts the FORM GROUP request with the updated h and the origi-nal message sequence ID to its neighboring peers.p keeps listening to the network,until it collects the replies from all its neighboring peers.After that,p increments the h p of each collected tuple,and then it appends its own tuple to the collected tuples T p.Finally,it sends T p back to r (Lines11to18in Algorithm2).After m collects the tuples T from its neighboring peers, if m cannotfind other k−1peers with a hop distance of h,it increments h and re-broadcasts the FORM GROUP request along with a new message sequence ID and h.m repeatedly increments h till itfinds other k−1peers(Lines6to14in Algorithm1).However,if mfinds the same set of peers in two consecutive broadcasts,i.e.,with hop distances h and h+1,there are not enough connected peers for m.Thus, m has to relax its privacy profile,i.e.,use a smaller value of k,or to be suspended for a period of time(Line11in Algorithm1).Figures3(a)and3(b)depict single-hop and multi-hop peer searching in our running example,respectively.In Fig-ure3(a),the request originator,m8,(e.g.,k=5)canfind k−1peers via single-hop communication,so m8sets h=1. 
Since h=1,its neighboring peers,m5,m6,m7,m9,m10, and m11,will not further broadcast the FORM GROUP re-quest.On the other hand,in Figure3(b),m8does not connect to k−1peers directly,so it has to set h>1.Thus, its neighboring peers,m7,m10,and m11,will broadcast the FORM GROUP request along with a decremented hop dis-tance,i.e.,h=h−1,and the original message sequence ID to their neighboring peers.Phase2:Location adjustment phase.Since the peer keeps moving,we have to capture the movement between the time when the peer sends its tuple and the current time. For each received tuple from a peer p,the request originator, m,determines the greatest possible distance between them by an equation,|mp |=|mp|+(t c−t p)×v max p,where |mp|is the Euclidean distance between m and p at time t p,i.e.,|mp|=(x m−x p)2+(y m−y p)2,t c is the currenttime,t p is the timestamp of the tuple and v maxpis the maximum speed of p(Lines18to20in Algorithm1).In this paper,a conservative approach is used to determine the distance,because we assume that the peer will move with the maximum speed in any direction.If p gives its movement direction,m has the ability to determine a more precise distance between them.Figure3(c)illustrates that,for each discovered peer,the circle represents the largest region where the peer can lo-cate at time t c.The greatest possible distance between the request originator m8and its discovered peer,m5,m6,m7, m9,m10,or m11is represented by a dotted line.For exam-ple,the distance of the line m8m 11is the greatest possible distance between m8and m11at time t c,i.e.,|m8m 11|. Phase3:Spatial cloaking phase.In this phase,the request originator,m,forms a virtual group with the k−1 nearest peers,based on the greatest possible distance be-tween them(Line22in Algorithm1).To adapt to the dynamic network topology and k-anonymity requirement, m sets h to the largest value of h p of the selected k−1 peers(Line15in Algorithm1).Then,m determines the minimum grid area A covering the entire group(Line24in Algorithm1).If the area of A is less than A min,m extends A,until it satisfies A min(Lines25to27in Algorithm1). Figure3(c)gives the k−1nearest peers,m6,m7,m10,and m11to the request originator,m8.For example,the privacy profile of m8is(k=5,A min=20cells),and the required cloaked spatial region of m8is represented by a bold rectan-gle,as depicted in Figure3(d).To issue the query to the location-based database server anonymously,m randomly selects a mobile client in the group as an agent(Line28in Algorithm1).Then,m sendsthe query along with the cloaked spatial region,i.e.,A,to the agent(Line29in Algorithm1).The agent forwards thequery to the location-based database server.After the serverprocesses the query with respect to the cloaked spatial re-gion,it sends a list of candidate answers back to the agent.The agent forwards the candidate answer to m,and then mfilters out the false positives from the candidate answers. 
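Phases 2 and 3 above amount to a small geometric computation once the peer replies are in hand. The sketch below is a simplified, single-process illustration of those two phases, not the authors' implementation: it applies the conservative distance bound |mp'| = |mp| + (t_c - t_p) * v_max_p, picks the k-1 nearest peers, and snaps the covering rectangle to grid cells, growing it until it reaches A_min. The peer data, the grid-cell size, and the symmetric way the region is enlarged are assumptions for illustration, and the Phase 1 peer-discovery messaging is omitted.

```python
import math
from dataclasses import dataclass

@dataclass
class PeerTuple:
    """Reply from a peer: id, reported position, max speed, reply timestamp, hop distance."""
    pid: str
    x: float
    y: float
    v_max: float
    t_p: float
    h_p: int

def greatest_possible_distance(mx, my, peer, t_c):
    # Phase 2: |mp'| = |mp| + (t_c - t_p) * v_max_p (conservative bound on peer movement)
    d = math.hypot(mx - peer.x, my - peer.y)
    return d + (t_c - peer.t_p) * peer.v_max

def cloak(mx, my, peers, k, a_min_cells, cell, t_c):
    """Phase 3: choose the k-1 nearest peers by the conservative distance, then return
    the minimum grid-aligned rectangle covering the group, grown to at least A_min cells."""
    chosen = sorted(peers, key=lambda p: greatest_possible_distance(mx, my, p, t_c))[:k - 1]
    xs = [mx] + [p.x for p in chosen]
    ys = [my] + [p.y for p in chosen]
    # Snap the bounding box outward to grid-cell boundaries.
    l, r = math.floor(min(xs) / cell), math.ceil(max(xs) / cell)
    b, t = math.floor(min(ys) / cell), math.ceil(max(ys) / cell)
    # Extend the area (here symmetrically, for simplicity) until it covers A_min cells.
    while (r - l) * (t - b) < a_min_cells:
        l, r, b, t = l - 1, r + 1, b - 1, t + 1
    return (l, b), (r, t)   # left-bottom and right-top cell indices

# Illustrative values only: k = 5, A_min = 20 cells, 10 m grid cells.
peers = [PeerTuple(f"m{i}", 100 + 15 * i, 200 - 10 * i, 2.0, 9.5, 1) for i in range(6)]
print(cloak(105.0, 195.0, peers, k=5, a_min_cells=20, cell=10.0, t_c=10.0))
```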
4.3Modes of OperationsThe P2P spatial cloaking algorithm can operate in twomodes,on-demand and proactive.The on-demand mode:The mobile client only executesthe algorithm when it needs to retrieve information from the location-based database server.The algorithm operatedin the on-demand mode generally incurs less communica-tion overhead than the proactive mode,because the mobileclient only executes the algorithm when necessary.However,it suffers from a longer response time than the algorithm op-erated in the proactive mode.The proactive mode:The mobile client adopting theproactive mode periodically executes the algorithm in back-ground.The mobile client can cloak its location into a spa-tial region immediately,once it wants to communicate withthe location-based database server.The proactive mode pro-vides a better response time than the on-demand mode,but it generally incurs higher communication overhead and giveslower quality of service than the on-demand mode.5.ANONYMOUS LOCATION-BASEDSERVICESHaving the spatial cloaked region as an output form Algo-rithm1,the mobile user m sends her request to the location-based server through an agent p that is randomly selected.Existing location-based database servers can support onlyexact point locations rather than cloaked regions.In or-der to be able to work with a spatial region,location-basedservers need to be equipped with a privacy-aware queryprocessor(e.g.,see[29,31]).The main idea of the privacy-aware query processor is to return a list of candidate answerrather than the exact query answer.Then,the mobile user m willfilter the candidate list to eliminate its false positives andfind its exact answer.The tighter the spatial cloaked re-gion,the lower is the size of the candidate answer,and hencethe better is the performance of the privacy-aware query processor.However,tight cloaked regions may represent re-laxed privacy constrained.Thus,a trade-offbetween the user privacy and the quality of service can be achieved[31]. Figure4(a)depicts such scenario by showing the data stored at the server side.There are32target objects,i.e., gas stations,T1to T32represented as black circles,the shaded area represents the spatial cloaked area of the mo-bile client who issued the query.For clarification,the actual mobile client location is plotted in Figure4(a)as a black square inside the cloaked area.However,such information is neither stored at the server side nor revealed to the server. 
The privacy-aware query processor determines a range that includes all target objects that are possibly contributing to the answer given that the actual location of the mobile client could be anywhere within the shaded area.The range is rep-resented as a bold rectangle,as depicted in Figure4(b).The server sends a list of candidate answers,i.e.,T8,T12,T13, T16,T17,T21,and T22,back to the agent.The agent next for-(a)Server Side(b)Client SideFigure4:Anonymous location-based services wards the candidate answers to the requesting mobile client either through single-hop communication or through multi-hop routing.Finally,the mobile client can get the actualanswer,i.e.,T13,byfiltering out the false positives from thecandidate answers.The algorithmic details of the privacy-aware query proces-sor is beyond the scope of this paper.Interested readers are referred to[31]for more details.6.EXPERIMENTAL RESULTSIn this section,we evaluate and compare the scalabilityand efficiency of the P2P spatial cloaking algorithm in boththe on-demand and proactive modes with respect to the av-erage response time per query,the average number of mes-sages per query,and the size of the returned candidate an-swers from the location-based database server.The queryresponse time in the on-demand mode is defined as the timeelapsed between a mobile client starting to search k−1peersand receiving the candidate answers from the agent.On theother hand,the query response time in the proactive mode is defined as the time elapsed between a mobile client startingto forward its query along with the cloaked spatial regionto the agent and receiving the candidate answers from theagent.The simulation model is implemented in C++usingCSIM[35].In all the experiments in this section,we consider an in-dividual random walk model that is based on“random way-point”model[7,8].At the beginning,the mobile clientsare randomly distributed in a spatial space of1,000×1,000square meters,in which a uniform grid structure of100×100cells is constructed.Each mobile client randomly chooses itsown destination in the space with a randomly determined speed s from a uniform distribution U(v min,v max).When the mobile client reaches the destination,it comes to a stand-still for one second to determine its next destination.Afterthat,the mobile client moves towards its new destinationwith another speed.All the mobile clients repeat this move-ment behavior during the simulation.The time interval be-tween two consecutive queries generated by a mobile client follows an exponential distribution with a mean of ten sec-onds.All the experiments consider one half-duplex wirelesschannel for a mobile client to communicate with its peers with a total bandwidth of2Mbps and a transmission range of250meters.When a mobile client wants to communicate with other peers or the location-based database server,it has to wait if the requested channel is busy.In the simulated mobile environment,there is a centralized location-based database server,and one wireless communication channel between the location-based database server and the mobile。

PostGIS with Protocol Buffers (protobuf)

PostGIS is a spatial database extender for PostgreSQL. It allows you to store, query, and manipulate geographic data in a SQL environment. Protocol Buffers (protobuf) is a language-neutral, platform-neutral, extensible mechanism for serializing structured data, used primarily for inter-service communication and data storage.

Integrating PostGIS with protobuf typically involves the following steps:

1. Define your data structure using a protobuf schema (`.proto` files).
2. Generate data-access code or client libraries in your desired programming language(s) from the `.proto` files.
3. Use the generated libraries to encode and decode your protobuf-formatted data.
4. To store or retrieve spatial data in PostgreSQL via PostGIS, convert between the protobuf representation and the spatial formats PostGIS supports (such as Well-Known Text [WKT] or Well-Known Binary [WKB]).

Here's a high-level example of how you might approach this integration:

**Step 1: Define your protobuf schema**

```protobuf
syntax = "proto3";

message Point {
  float x = 1;
  float y = 2;
}

message SpatialData {
  Point point = 1;
}
```

**Step 2: Generate DAOs or client libraries**

You would use the `protoc` command-line tool to generate code for your chosen language(s). For example, if you're using Python, you might run:

```sh
protoc --python_out=./path/to/output ./path/to/schema.proto
```

**Step 3: Interact with protobuf-formatted data**

Using the generated Python code, you can now encode and decode spatial data to and from the protobuf wire format:

```python
# The module name depends on your .proto file name (schema.proto generates schema_pb2).
from your_generated_module import Point as PbPoint, SpatialData

# Create a spatial data object in protobuf format
spatial_data = SpatialData(point=PbPoint(x=1.0, y=2.0))

# Serialize to binary format
serialized_data = spatial_data.SerializeToString()

# Deserialize from binary format
deserialized_data = SpatialData()
deserialized_data.ParseFromString(serialized_data)
```

**Step 4: Convert between protobuf and PostGIS formats**

To store or retrieve spatial data in PostgreSQL using PostGIS, convert the protobuf point to WKB for storage and back again on retrieval. Shapely can handle the WKB encoding:

```python
from shapely.geometry import Point
from shapely import wkb

# Convert the protobuf point to a Shapely Point
shapely_point = Point(spatial_data.point.x, spatial_data.point.y)

# Convert the Shapely Point to WKB
wkb_data = wkb.dumps(shapely_point)

# Insert wkb_data into PostgreSQL/PostGIS
# ...

# Retrieve WKB data from PostgreSQL/PostGIS, then:
retrieved_point = wkb.loads(wkb_data)

# Convert the retrieved geometry back to a protobuf Point
protobuf_point = PbPoint(x=retrieved_point.x, y=retrieved_point.y)
```

Please note that this is a simplified example; real-world use cases may require more complex handling, especially around format conversion, data integrity, and performance. You also need the PostGIS extension installed in your PostgreSQL database, and a client-side geometry library such as Shapely to handle the WKB/WKT conversions.
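To complete the round trip into the database itself, one option is to hand the WKB bytes to PostGIS through a parameterized query and let ST_GeomFromWKB and ST_AsBinary do the geometry conversion on the server side. The sketch below shows one way this might look with psycopg2; the connection string, table name, and SRID are assumptions for illustration, and it presumes the PostGIS extension is already enabled in the database.

```python
import psycopg2
from shapely import wkb
from shapely.geometry import Point

# Hypothetical connection settings and table layout (requires CREATE EXTENSION postgis).
conn = psycopg2.connect("dbname=gisdb user=gis password=secret host=localhost")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS places (
        id   serial PRIMARY KEY,
        geom geometry(Point, 4326)
    )
""")

# Store: Shapely Point -> WKB bytes -> PostGIS geometry
wkb_data = wkb.dumps(Point(1.0, 2.0))
cur.execute(
    "INSERT INTO places (geom) VALUES (ST_GeomFromWKB(%s, 4326)) RETURNING id",
    (psycopg2.Binary(wkb_data),),
)
row_id = cur.fetchone()[0]

# Retrieve: PostGIS geometry -> WKB bytes -> Shapely Point
cur.execute("SELECT ST_AsBinary(geom) FROM places WHERE id = %s", (row_id,))
retrieved = wkb.loads(bytes(cur.fetchone()[0]))
print(retrieved.x, retrieved.y)

conn.commit()
cur.close()
conn.close()
```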

odin: ODE Generation and Integration (Package Documentation)

Package‘odin’October2,2023Title ODE Generation and IntegrationVersion1.2.5Description Generate systems of ordinary differential equations (ODE)and integrate them,using a domain specific language(DSL).The DSL uses R's syntax,but compiles to C in order toefficiently solve the system.A solver is not provided,butinstead interfaces to the packages'deSolve'and'dde'aregenerated.With these,while solving the differential equations,no allocations are done and the calculations remain entirely incompiled code.Alternatively,a model can be transpiled to R for use in contexts where a C compiler is not present.Aftercompilation,models can be inspected to return information about parameters and outputs,or intermediate values after calculations.'odin'is not targeted at any particular domain and is suitablefor any system that can be expressed primarily as mathematicalexpressions.Additional support is provided for working withdelays(delay differential equations,DDE),using interpolatedfunctions during interpolation,and for integrating quantitiesthat represent arrays.License MIT+file LICENSEURL https:///mrc-ide/odinBugReports https:///mrc-ide/odin/issues Imports R6,cinterpolate(>=1.0.0),deSolve,digest,glue,jsonlite, ring,withrSuggests dde(>=1.0.0),jsonvalidate(>=1.1.0),knitr,mockery, pkgbuild,pkgload,rlang,rmarkdown,testthat VignetteBuilder knitrRoxygenNote7.1.1Encoding UTF-8Language en-GBNeedsCompilation no12can_compile Author Rich FitzJohn[aut,cre],Thibaut Jombart[ctb],Imperial College of Science,Technology and Medicine[cph]Maintainer Rich FitzJohn<***********************>Repository CRANDate/Publication2023-10-0213:40:11UTCR topics documented:can_compile (2)odin (3)odin_build (5)odin_ir (6)odin_ir_deserialise (7)odin_options (7)odin_package (9)odin_parse (10)odin_validate (11)Index13 can_compile Test if compilation is possibleDescriptionTest if compilation appears possible.This is used in some examples,and tries compiling a trivial C program with pkgbuild.Results are cached between runs within a session so this should be fast to rely on.Usagecan_compile(verbose=FALSE,refresh=FALSE)Argumentsverbose Be verbose when running commands?refresh Try again to compile,skipping the cached value?DetailsWe use pkgbuild in order to build packages,and it includes a set of heuristics to locate and organise your C compiler.The most likely people affected here are Windows users;if you get this ensure that you have rtools ing pkgbuild::find_rtools()with debug=TRUE may be helpful for diagnosing compiler issues.odin3ValueA logical scalarExamplescan_compile()#will take~0.1s the first timecan_compile()#should be basically instantaneousodin Create an odin modelDescriptionCreate an odin model from afile,text string(s)or expression.The odin_version is a"standard evaluation"escape hatch.Usageodin(x,verbose=NULL,target=NULL,workdir=NULL,validate=NULL,pretty=NULL,skip_cache=NULL,compiler_warnings=NULL,no_check_unused_equations=NULL,options=NULL)odin_(x,verbose=NULL,target=NULL,workdir=NULL,validate=NULL,pretty=NULL,skip_cache=NULL,compiler_warnings=NULL,no_check_unused_equations=NULL,options=NULL)Argumentsx Either the name of afile to read,a text string(if length is greater than1elements will be joined with newlines)or an expression.verbose Logical scalar indicating if the compilation should be verbose.Defaults to the value of the option odin.verbose or FALSE otherwise.target Compilation target.Options are"c"and"r",defaulting to the option odin.target or"c"otherwise.workdir Directory to use for any generatedfiles.This is only relevant for 
the"c"target.Defaults to the value of the option odin.workdir or tempdir()otherwise.validate Validate the model’s intermediate representation against the included schema.Normally this is not needed and is intended primarily for development use.De-faults to the value of the option odin.validate or FALSE otherwise.pretty Pretty-print the model’s intermediate representation.Normally this is not needed and is intended primarily for development use.Defaults to the value of theoption odin.pretty or FALSE otherwise.skip_cache Skip odin’s cache.This might be useful if the model appears not to compile when you would expect it to.Hopefully this will not be needed often.Defaultsto the option odin.skip_cache or FALSE otherwise.4odin compiler_warningsPreviously this attempted detection of compiler warnings(with some degree ofsuccess),but is currently ignored.This may become supported again in a futureversion depending on underlying support in pkgbuild.no_check_unused_equationsIf TRUE,then don’t print messages about unused variables.Defaults to the optionodin.no_check_unused_equations or FALSE otherwise.options Named list of options.If provided,then all other options are ignored.DetailsDo not use odin::odin in a package;you almost certainly want to use odin_package instead.A generated model can return information about itself;odin_irValueAn odin_generator object(an R6class)which can be used to create model instances.User parametersIf the model accepts user parameters,then the parameter to the constructor or the$set_user() method can be used to control the behaviour when unknown user actions are passed into the model.Possible values are the strings stop(throw an error),warning(issue a warning but keep go-ing),message(print a message and keep going)or ignore(do nothing).Defaults to the option odin.unused_user_action,or warning otherwise.Delay equations with ddeWhen generating a model one must chose between using the dde package to solve the system or the default deSolve.Future versions may allow this to switch when using run,but for now this requires tweaking the generated code to a point where one must decide at generation.dde implements only the Dormand-Prince5th order dense output solver,with a delay equation solver that may perform better than the solvers in deSolve.For non-delay equations,deSolve is very likely to outperform the simple solver implemented.Author(s)Rich FitzJohnExamples##Compile the model;exp_decay here is an R6ClassGenerator and will##generate instances of a model of exponential decay:exp_decay<-odin::odin({deriv(y)<--0.5*yinitial(y)<-1},target="r")##Generate an instance;there are no parameters here so all instances##are the same and this looks a bit pointless.But this step isodin_build5 ##required because in general you don t want to have to compile the##model every time it is used(so the generator will go in a##package).mod<-exp_decay$new()##Run the model for a series of times from0to10:t<-seq(0,10,length.out=101)y<-mod$run(t)plot(y,xlab="Time",ylab="y",main="",las=1)odin_build Build an odin model generator from its IRDescriptionBuild an odin model generator from its intermediate representation,as generated by odin_parse.This function is for advanced use.Usageodin_build(x,options=NULL)Argumentsx An odin ir(json)object or output from odin_validate.options Options to pass to the build stage(see odin_optionsDetailsIn applications that want to inspect the intermediate representation rather before compiling,ratherthan directly using odin,use either odin_parse or odin_validate and then 
pass the result to odin::odin_build.The return value of this function includes information about how long the compilation took,if itwas successful,etc,in the same style as odin_validate:success Logical,indicating if compilation was successfulelapsed Time taken to compile the model,as a proc_time object,as returned by proc.time.output Any output produced when compiling the model(only present if compiling to C,and if the cache was not hit.model The model itself,as an odin_generator object,as returned by odin.ir The intermediate representation.error Any error thrown during compilationSee Alsoodin_parse,which creates intermediate representations used by this function.6odin_irExamples#Parse a model of exponential decayir<-odin::odin_parse({deriv(y)<--0.5*yinitial(y)<-1})#Compile the model:options<-odin::odin_options(target="r")res<-odin::odin_build(ir,options)#All results:res#The model:mod<-res$model$new()mod$run(0:10)odin_ir Return detailed information about an odin modelDescriptionReturn detailed information about an odin model.This is the mechanism through which coef works with odin.Usageodin_ir(x,parsed=FALSE)Argumentsx An odin_generator function,as created by odin::odinparsed Logical,indicating if the representation should be parsed and converted into an R object.If FALSE we return a json string.WarningThe returned data is subject to change for a few versions while I work out how we’ll use it. Examplesexp_decay<-odin::odin({deriv(y)<--0.5*yinitial(y)<-1},target="r")odin::odin_ir(exp_decay)coef(exp_decay)odin_ir_deserialise7 odin_ir_deserialise Deserialise odin’s IRDescriptionDeserialise odin’s intermediate model representation from a json string into an R object.Unlike the json,there is no schema for this representation.This function provides access to the same deserialisation that odin uses internally so may be useful in applications.Usageodin_ir_deserialise(x)Argumentsx An intermediate representation as a json stringValueA named listSee Alsoodin_parseExamples#Parse a model of exponential decayir<-odin::odin_parse({deriv(y)<--0.5*yinitial(y)<-1})#Convert the representation to an R objectodin::odin_ir_deserialise(ir)odin_options Odin optionsDescriptionFor lower-level odin functions odin_parse,odin_validate we only accept a list of options rather than individually named options.8odin_options Usageodin_options(verbose=NULL,target=NULL,workdir=NULL,validate=NULL,pretty=NULL,skip_cache=NULL,compiler_warnings=NULL,no_check_unused_equations=NULL,rewrite_dims=NULL,rewrite_constants=NULL,substitutions=NULL,options=NULL)Argumentsverbose Logical scalar indicating if the compilation should be verbose.Defaults to the value of the option odin.verbose or FALSE otherwise.target Compilation target.Options are"c"and"r",defaulting to the option odin.target or"c"otherwise.workdir Directory to use for any generatedfiles.This is only relevant for the"c"target.Defaults to the value of the option odin.workdir or tempdir()otherwise.validate Validate the model’s intermediate representation against the included schema.Normally this is not needed and is intended primarily for development use.De-faults to the value of the option odin.validate or FALSE otherwise.pretty Pretty-print the model’s intermediate representation.Normally this is not needed and is intended primarily for development use.Defaults to the value of theoption odin.pretty or FALSE otherwise.skip_cache Skip odin’s cache.This might be useful if the model appears not to compile when you would expect it to.Hopefully this will not be needed 

Towards a Logical Schema Integrating Software Process Modelling and Software Measurement

Richard Webby* and Ulrike Becker
Fraunhofer Institute for Experimental Software Engineering
Sauerwiesen 6, D-67661 Kaiserslautern, Germany
*On sabbatical until June 1997 from the Centre for Advanced Empirical Software Research, University of New South Wales, Sydney 2052, Australia

Abstract

This paper introduces a logical schema for the integration of software process modelling and software measurement. The schema promotes a common understanding of concepts and terminology, serving as a bridge across the fields of process modelling and software metrics. It presents a formal unified view of the major information entities and their inter-relationships. The schema is designed to be general enough to be used with a variety of different process modelling formalisms and metrics approaches. It incorporates several abstraction mechanisms to support the handling of complex process information and multiple evolving versions of that information. The schema should be useful for both researchers and practitioners in empirical process modelling studies of large-scale systems development and evolution. This paper concludes by contrasting our position with the conceptual schema developed at the Software Engineering Institute. The SEI schema has been a major influence on our work, but the paper also outlines some significant differences between their schema and our own position.

Keywords: software process modelling, software measurement.

1. Background

This paper presents a logical schema which aims to promote the integration of software process modelling and software measurement. The schema is designed to enable the modelling and measurement of software development and evolution processes in a manner independent of specific metrics approaches and process modelling languages.

We anticipate that this work will have several benefits to both practitioners and researchers:
• The explicit formulation of a logical schema or "meta-model" bridging the two fields of process modelling and software metrics will lead to more precisely defined terminology and better understanding of common concepts and their inter-relationships.
• The abstraction of the schema from the specifics of process representation formalisms and measurement methods enables a high level of independence between the data stored and the views of the data [CK92]. This moves us closer to the goal of organisational software experience bases [Ba94, KP93].
• The definition of the interfaces between process models and measurement will enable tighter coupling of modelling tools and measurement technology. In so doing, the vision of an integrated software engineering environment [Lo93] is more achievable.

The ideas that we present in our schema are based extensively on existing work from both the fields of software process modelling and software metrics. From the process modelling perspective, our work borrows heavily from the conceptual schema for process definitions and models developed at the Software Engineering Institute (SEI) by Kellner and his colleagues [AK94]. Our work extends that schema, moving towards a more precise logical model of software process information.
We present a comparison of the schema of the SEI group with our approach in section 3 of this paper. Other influences on our work from the process modelling perspective include the experiences gained with the process modelling language MVP-L [BL95], strategic dependency models [YM94], the E3 project [Mo95] and numerous interviews with, and feedback from, practitioners of software process engineering.

From the measurement side, we have been influenced by many sources, particularly the work on the TAME resource model [JB88] within the context of the GQM measurement paradigm and the recent work related to the SQUATter project [Wa96] within the Model-Measure-Manage paradigm (M3P) [OJ97].

2. Position

To graphically and formally present the schema, we have adopted the recently released Unified Modelling Language (UML) developed by Booch, Jacobsen and Rumbaugh [Ra97]. The object-oriented notation of the UML was chosen as it is likely to become a standard in the field.

2.1 Relationships among classes

The schema incorporates support for a variety of relationships among the classes used to represent real-world entities in the domains of software process modelling and measurement. The general types of relationships supported are common to those found in object-oriented modelling. We have chosen to present the relationships prior to a detailed discussion of the classes because much of the power and succinctness in the schema comes from the application of these relationships.

2.1.1 Instantiation

A major challenge in designing the schema was to allow modellers to specify abstract "types" (eg. an IEEE standard, a code complexity metric) as well as actual real-world "instances", or values, of those types. The goal was that the schema would be useful for both:
• prescriptive modelling, in which the activities that should be performed, the artefacts that should be produced and the measurements that should be taken can be indicated for a given type of project, and
• descriptive modelling, in which observations of the actual activities performed, artefacts produced and measurements taken are made for a specific actual project.

For the purpose of clearly separating type and instance data without creating a host of additional classes, a class called Enactment was designed to serve as the intermediary between any Element (eg. Activity, Artefact) and the Values assigned to its Attributes (see figure 3). The Enactment class also has the benefit of providing formal support for tracking multiple versions and measurements at different points in time during process performance. This allows flexible maintenance of historical, planned and current measurement data and models.

2.1.2 Aggregation

With complex process models, there is frequently the need to aggregate activities and artefacts into higher level entities. Aggregation is shown by the diamond symbol in the figures. Where possible, measurements should be taken for the lowest level entities ("elementary processes"), but this is not always possible in practice, so is not enforced in the schema. Measurement at the low level allows propagation of these values to the higher levels of aggregation in the model.

2.1.3 Specialisation

Any Entity class may be sub-typed (see figure 1) to allow specialisation of its attributes and associations. For example, attributes such as "effort" could be specified at the root level of the Activity class and inherited by all other Activities.
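To make these three relationship types concrete, the following sketch shows one possible rendering of instantiation, aggregation and specialisation in plain Python. The class names (Element, Enactment, Attribute, Activity) follow the schema, but the code itself, the ReviewActivity sub-type and the example effort figures are our own illustrative assumptions, not part of the published schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

@dataclass
class Attribute:
    """A named, typed attribute definition (e.g. "effort" of type float)."""
    name: str
    type_name: str
    default: Optional[object] = None

@dataclass
class Enactment:
    """One state of an Element at a point in time, holding Values for its Attributes."""
    when: date
    values: Dict[str, object] = field(default_factory=dict)

@dataclass
class Element:
    """Common superclass: a 'type' plus an ordered list of Enactments (its 'instances')."""
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    enactments: List[Enactment] = field(default_factory=list)

    def enact(self, when: date, **values) -> Enactment:
        # Instantiation: record a new state without creating extra classes.
        state = Enactment(when, dict(values))
        self.enactments.append(state)
        return state

@dataclass
class Activity(Element):
    """Aggregation: an Activity may contain sub-activities."""
    sub_activities: List["Activity"] = field(default_factory=list)

    def total(self, attr: str) -> float:
        # Propagate values measured on elementary processes up the aggregation.
        own = sum(e.values.get(attr, 0) for e in self.enactments)
        return own + sum(a.total(attr) for a in self.sub_activities)

class ReviewActivity(Activity):
    """Specialisation: a sub-type inherits attributes such as "effort"."""
    pass

# Descriptive modelling: record observed effort for a review and its sub-activity.
review = ReviewActivity("design review")
review.attributes["effort"] = Attribute("effort", "float", default=0.0)
prep = Activity("review preparation")
review.sub_activities.append(prep)
prep.enact(date(1997, 3, 1), effort=4.0)
review.enact(date(1997, 3, 2), effort=6.0)
print(review.total("effort"))  # 10.0
```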
2.1.4 Other Associations

Other domain-specific associations are described in the context of discussing the classes in the next section.

2.2 Classes

Note that for space reasons, only the major classes in the schema are presented in this paper.

2.2.1 Project-related classes (fig. 1)

Figure 1 shows the major schematic entities that are related to the Project class. The Project class is an abstraction of a real-world project or a type of project. This is accomplished in the schema by instantiation (section 2.1.1). A Project may contain activities, artefacts and roles and it may also contain sub-projects through the aggregation relationship (section 2.1.2). The Project class also can be specialised (section 2.1.3), allowing attributes to be inherited by more specialised projects (eg. a hypermedia development project being a specialised form of development project).

Activities are abstractions of the "things that are done" in the real world of software development and evolution. Our schema allows activities to be broken down into many sub-activities and a sub-activity to be aggregated into more than one activity. This handles the case where, for example, the activity of testing a system component could be viewed as a sub-activity of "testing all components" or a sub-activity of "producing that specific component".

Artefacts are abstractions of the "things" used or produced in the real world. These may be documents or files in physical or electronic form. Artefacts and Activities are related through "consumes" and "produces" relationships.

Roles represent logical groupings of functions associated with the Resources involved in the software process. Some examples of Roles include "author", "review team", and "compiler". Roles act as a link between Activities and Resources (people, teams, and tools). The separation of Resources, Roles and Activities provides flexibility in assigning an Activity to an individual, a team, or an intelligent agent.

2.2.2 Resource-related classes (fig. 2)

Resources can be modelled independently of Projects. A Resource may be a Tool, an Organisation (which includes teams) or a Person. Organisations and people are related through a many-to-many relationship - many people may be in many different organisations - and people may assume a Position within the Organisation.

Activities can be divided up into Tasks. A task is the specific part of a given activity performed by one specific Resource acting in one specific Role. This enables attributes such as "effort" to be measured at the necessary level of granularity. The TAME resource model's [JB88] notion of "Resource-Use" is very similar to our notion of Task.
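The sketch below is a minimal, hypothetical rendering of the project- and resource-related classes of Figures 1 and 2. Only a few associations are shown, and the attribute choices (e.g. effort_hours on Task), the example artefacts and the person "Alice" are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Artefact:
    name: str

@dataclass
class Resource:  # specialised in the schema into Tool, Organisation and Person
    name: str

@dataclass
class Role:
    name: str

@dataclass
class Task:
    """The part of one Activity performed by one Resource acting in one Role."""
    activity: "Activity"
    role: Role
    resource: Resource
    effort_hours: float = 0.0  # measured at this fine level of granularity

@dataclass
class Activity:
    name: str
    consumes: List[Artefact] = field(default_factory=list)
    produces: List[Artefact] = field(default_factory=list)
    sub_activities: List["Activity"] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Project:
    name: str
    activities: List[Activity] = field(default_factory=list)
    sub_projects: List["Project"] = field(default_factory=list)

# A review activity consumes a design document and produces a review report;
# the "reviewer" role is assumed by a person, and effort is recorded on the Task.
design = Artefact("design document")
report = Artefact("review report")
review = Activity("design review", consumes=[design], produces=[report])
reviewer = Resource("Alice")  # hypothetical person
review.tasks.append(Task(review, Role("reviewer"), reviewer, effort_hours=3.5))
project = Project("example project", activities=[review])
print(sum(t.effort_hours for a in project.activities for t in a.tasks))  # 3.5
```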
2.2.3 Element associations (fig. 3)

Figure 3 shows how the instantiation relationship is supported in our schema. Each Element (project, activity, artefact, task, role, resource) can be enacted and measured at any point in the software process. The Enactments, or states of the elements, are stored in an ordered list to enable retrieval of previous states. Each Enactment contains any number of Values related to the Attributes of the Element.

Events may cause a change in state, affecting Attributes through the evaluation of an Expression, which could include simple assignment or entry of a value by a programmer, or could involve the calculation of a complex metric. Some events may have Constraints which relate to other Attributes in the model. The major event categories that we have identified include:
• Scheduled events, such as a milestone which may indicate a measurement point;
• Requested events, where an element changes state due to the request of a process performer or due to the triggering of entry/exit criteria;
• Selection events, where some path is chosen among competing alternatives (e.g. further testing is warranted due to the outcome of a product defect metric); or
• Unexpected events, such as the loss of a key person, which may be modelled descriptively, perhaps for the purpose of subsequent risk analysis.

2.2.4 Attribute associations (fig. 4)

Attributes are modelled separately from the actual entities in the model, following the lead of MVP-L. An Attribute has a type (string, integer, etc.) and may be optionally assigned a default Value. An Attribute may be calculated using a Metric and fed as input to other Metrics. A Metric is a collection of Expressions, which may be Logical or Arithmetic Expressions, for instance.

2.3 Summary of our position

The logical schema affords a number of benefits to both practitioners and researchers. For practitioners, it helps unify the fields of software process modelling and software measurement, presenting common concepts in an explicit familiar representation (the UML). The schema provides the groundwork for tools to integrate software measurement and process modelling. For researchers, the flexibility in the design of this schema permits the extension of classes such as Metric and Event to incorporate more specific types.

[Figure 1. Project-related classes (UML class diagram not reproduced)]
[Figure 2. Resource-related classes (UML class diagram not reproduced)]
[Figure 3. Major element associations (UML class diagram not reproduced)]
[Figure 4. Attribute associations (UML class diagram not reproduced)]
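As a rough illustration of how the element and attribute associations of Figures 3 and 4 might fit together, the following sketch lets an Event evaluate a Metric's expression and record the result as a new Enactment. The single-callable Metric, the fire method and the productivity example with its input values are simplifying assumptions for the purpose of illustration, not part of the schema definition.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List

@dataclass
class Metric:
    """A named collection of expressions; reduced here to a single callable over inputs."""
    name: str
    expression: Callable[[Dict[str, float]], float]

@dataclass
class Enactment:
    when: date
    values: Dict[str, float] = field(default_factory=dict)

@dataclass
class Element:
    name: str
    enactments: List[Enactment] = field(default_factory=list)  # ordered history of states

@dataclass
class Event:
    """A scheduled, requested, selection or unexpected event that alters an Element's state."""
    category: str
    metric: Metric

    def fire(self, element: Element, when: date, inputs: Dict[str, float]) -> Enactment:
        # Evaluating the metric's expression yields a new Value for the Attribute,
        # stored as a fresh Enactment so that earlier states remain retrievable.
        state = Enactment(when, dict(inputs))
        state.values[self.metric.name] = self.metric.expression(inputs)
        element.enactments.append(state)
        return state

# A scheduled milestone triggers computation of a simple productivity metric.
productivity = Metric("productivity", lambda v: v["size_loc"] / v["effort_hours"])
coding = Element("coding activity")
milestone = Event("scheduled", productivity)
milestone.fire(coding, date(1997, 4, 1), {"size_loc": 1200.0, "effort_hours": 80.0})
print(coding.enactments[-1].values["productivity"])  # 15.0
```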
3. Comparison

The work of the SEI group [AK94, AB94] is compared in this section with our position. Although, as stated, our work is based heavily on the SEI schema, there are a number of practical differences between their schema and ours.

First, our schema is closer to being a logical schema than the SEI schema. We have attempted to be more precise and formal by using the UML [Ra97] to specify our schema. In so doing, it is possible that we may have lost some of the generality of the SEI approach. The SEI schema is strong in defining a rich set of process elements and a comprehensive library of possible interconnections between them. We have excluded a number of elements from our schema. For example, the SEI defines an additional entity called Procedure, to separate the "how it is done" information from the "what is done" information in the Activity. We have not yet included Procedure as a distinct entity in our schema, because we desired to achieve a simple initial version of our schema and regarded Procedure as non-essential to the schema's structure.

Second, although we have not represented all the SEI schema in our work, we have also gone further in some areas, so in this sense our work is broader in scope. For example, in the area of measurement, we have defined how metrics and expressions relate to attributes, making the integration with enactment and measurement technology more explicit in our schema.

The third major difference is that the SEI schema defines a set of relationship and behaviour classes, whereas our approach was to explicitly specify relationships and behaviours by using the UML's object-oriented notation. Our schema is suited to implementation in an object-oriented database or equivalent, while the SEI schema is better suited as an introduction to the concepts and as a checklist of the contents of a software process definition.

One strong similarity between our position and that of the SEI work is that the goal of producing a "common denominator schema" [CK92] or "canonical form" [AK94], whereby the schema can be used for translation among different modelling languages and tools, is a common one. We hope that we have contributed in the progression of research toward that goal in this paper.

4. References

[AK94] Armitage, JW and Kellner, MI. "A Conceptual Schema for Process Definitions and Models". In Proc. 3rd Int. Conf. on the Software Process, October 1994, pp 153-165.
[AB94] Armitage, JW, Briand, L, Kellner, MI, Over, JW, Phillips, RW. "Software Process Definition Guide: Content of Enactable Software Process Representations", Special Report CMU/SEI-94-SR-21, December 1994.
[Ba94] Basili, VR, Caldiera, G and Rombach, HD. "Experience Factory." In Encyclopedia of Software Engineering (Marciniak ed.), pp 469-476. Wiley, New York.
[BL95] Bröckers, A, Lott, CM, Rombach, HD, Verlage, M. MVP-L Language Report Version 2. Interner Bericht 265/95, Fachbereich Informatik, Universität Kaiserslautern, 1995.
[CK92] Curtis, B, Kellner, M and Over, J. "Process Modeling", Communications of the ACM 35(9), Sept 1992, pp 75-90.
[KP93] Kellner, MI and Phillips, RW. "Practical technology for process assets". In Proc. 8th Int. Soft. Process Workshop, Wadern, Germany, March 1993, pp 107-112.
[OJ97] Offen, RJ and Jeffery, DR. "A model-based approach to establishing and maintaining a software measurement program." To be published in IEEE Software, March 1997.
[JB88] Jeffery, DR and Basili, VR. "Validating the TAME Resource Data Model". In Proc. 10th Int. Conf. on Soft. Eng., Singapore, 1988, pp 187-200.
[Lo93] Lott, CM. "Process and measurement support in SEEs", Software Engineering Notes, 18(4), Oct 1993, pp 83-93.
[Mo95] Morisio, M. "A methodology to measure the software process". In Proceedings of the 7th Annual Oregon Workshop on Software Metrics, Silver Falls, OR, 1995.
[Ra97] Rational Software Corporation, Unified Modelling Language, version 1.0, January 1997, downloadable from
[Wa96] Walkerden, F. "A design for a software metrics repository". CAESAR Technical Report #96/7, University of NSW, Sydney, Australia.
[YM94] Yu, ESK and Mylopoulos, J. "Understanding 'why' in software process modelling analysis and design". In Proc. 16th Int. Conf. on Soft. Eng., Sorrento, Italy, May 1994, pp 159-168.

5. Biography

Richard Webby is a Lecturer in the School of Information Systems, and a Researcher in the Centre for Advanced Empirical Software Research (CAESAR), at the University of New South Wales. During 1996-97 he is on sabbatical for one year as a Visiting Researcher at the Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern, Germany. His research interests lie in the fields of software engineering and decision support systems.
Specifically, he seeks to explore the software processes underlying graphical user interface development, and to empirically measure and assess software end-products. His PhD research involved the development of a graphical interactive decision support system and empirical assessment of its effectiveness in terms of decision quality. His recent research has been in the design of a prototype process modelling tool called Spearmint.

Ulrike Becker received her Master's degree from the University of Kaiserslautern in 1996. She is a member of the Process Modelling group at the Fraunhofer Institute for Experimental Software Engineering. Her main research interests are in software process modelling, especially multi-view modelling, and how to integrate data collected on a role-specific basis. She is currently also involved in the development of the Spearmint process modelling tool.
