General Terms
A Tagging Approach for Bundling Annotations

Yamin Htun, Joanna McGrenere, Kellogg S. Booth
Department of Computer Science, University of British Columbia
{yhtun, joanna, ksbooth}@cs.ubc.ca

ABSTRACT
In a paper presented at CHI 2006 we introduced structured annotations, called bundles, to support co-authors in the edit-review-comment document lifecycle, and we reported a study showing that bundles facilitate workflow by improving reviewing accuracy and efficiency. Bundles are a “top down” way to organize annotations. We demonstrate an enhanced prototype that also supports “bottom up” organization using tagging techniques, new automated bundle creation options, and the reviewing features and manual bundle creation present in the first prototype.

Categories and Subject Descriptors
H.5.3. [Information Interfaces and Presentation, HCI] Group and Organizational Interfaces – Asynchronous interaction, Computer-supported collaborative work

General Terms
Design, human factors

Keywords
Asynchronous collaboration, collaborative writing, tagging, structured annotation

EXTENDED ABSTRACT
Asynchronous collaborative writing is common, and annotations play an important role as a central communication medium connecting co-authors with evolving artifacts in the process [7]. However, the lack of support for rich annotations in most word processing systems often forces valuable communication to happen outside the shared document in the bodies of emails, to which the document is an attachment. These messages are separate from the document, making the establishment of a shared reference for discussion difficult [2]. Co-authors often copy and paste referenced content of the document into email or type explicit navigation statements such as “Clarify my questions on the third and last paragraphs,” which can be time consuming and error-prone.
Significant overhead is required to reconstruct the context of the communication [4]: workflow requires navigating between email messages and the document itself [4] and information is likely to be lost or ignored [1]. At best, in order to keep track of the workflow and progress in the task, collaborators need to maintain not only document files but also the email messages [8]. Information overload and workflow inefficiencies can result with increasing numbers of annotations after only a few reviewing cycles.

To facilitate the workflow management involved in collaborative writing, we previously identified user-centered requirements for annotation support and developed a comprehensive model of annotations [8] in which each annotation has a set of attributes such as the creator of the annotation, a timestamp, reviewing status (read/unread and accepted/rejected), and one or more anchors to material in the document. Annotations can have optional attributes such as a list of recipients, a comment, replacements for the anchored material, a name, and substructure.

A bundled annotation (or bundle) represents a structured group of annotations with various anchors into the document. There are no restrictions on structuring annotations other than that they be acyclic; an annotation can be associated with more than one bundle. Changes in an annotation’s status will be automatically synchronized across different bundles to which it belongs.

We previously described a user study that investigated the effect of structured annotations on reviewing workload and quality [8]. Participants were asked to review a set of annotations with a Simple Editor containing only basic annotations (edits and comments) with high-level communication taking place in a separate email message window, and with a Bundle Editor in which annotations are structured into bundles with high-level communication integrated as generalized annotations.
Participants performed faster and more accurately with the Bundle Editor and they found bundles innovative and intuitive. We did not investigate the usability and consequences of bundles in the annotation-creation stage. We are now examining this.

In our model, bundles can be created in four ways: (1) manually, (2) automatically, (3) as a result of filtering operations and queries, and (4) as a result of editing commands. While annotating the document, co-authors manually create bundles by explicitly selecting and grouping annotations into bundles. At the end of each reviewing session, a bundle is created automatically with all the new annotations made during the session. Every time a user filters the annotations based on specified attributes, a temporary bundle is created, which can be saved as a permanent bundle with a single click. Moreover, when a user performs normal editing commands such as “Find/Replace” or “Spell Check”, a bundle will be created with all the edits from the command gathered into sub-bundles such as “replaced,” “skipped,” and “ignored”.

Although automatic bundle creation does not require extra effort from reviewers, we doubt that automation can fully capture the richness and complexity of the annotations used in discussions. Hence, our goal is to minimize the effort required by reviewers when manually creating bundles and managing annotations.

While exploring different approaches we were inspired by recent successes with tagging, in which users assign meta-data or keywords to information resources. Traditionally meta-data is created by professionals (catalogers or authors) [5], but systems like flickr and delicious allow ordinary users to describe and organize content with any vocabulary they choose. Tagging facilitates the organization of information within personal or shared information spaces. Browsing and searching tags attached to information resources by other users encourage collaboration. Compared to traditional folder-based hierarchical information management models, collaborative tagging is believed to reduce the cognitive workload experienced by users [6]. A major drawback for tagging is the ambiguity and imprecision of tags and the lack of control for synonyms and homonyms [3].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CSCW 2006, November 4-8, 2006, Banff, Alberta, Canada.
Copyright 2006 ACM …$5.00.

In our top-down approach, a user associates an annotation with a bundle by manually dragging the annotation into the bundle. When an annotation is in multiple bundles the work increases linearly with the number of associated bundles. Tagging is a bottom-up approach that reduces effort and achieves a more seamless workflow. An annotation can be easily associated with more than one bundle simply by tagging it with appropriate keywords; bundles are created through filtering that recognizes tags as filterable attributes. Because co-authors have their document as a shared context, we believe tags will be consistent and scalable across users, alleviating the ambiguity and imprecision seen in more general contexts while providing flexibility in classifying information into more than one category. Bottom-up tagging captures multiple semantic concepts that are inherent in most information resources through a light-weight and intuitive means of organizing and sharing information in a collaborative setting.

The core interface to the “Bundle Editor” prototype consists of a document pane and a reviewing pane (Figure 1).
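The tag-based bundling described above (bundles produced by filtering on tags, with status shared across bundles) can be illustrated with a minimal Python sketch. It is not the Bundle Editor's implementation: the `Annotation` class, its field names, the offset-pair anchor encoding, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Annotation:
    """One annotation with a few of the attributes named in the model (hypothetical layout)."""
    creator: str
    comment: str
    anchors: List[Tuple[int, int]]           # (start, end) document offsets; assumed encoding
    tags: Set[str] = field(default_factory=set)
    status: str = "unread"                   # e.g. unread / read / accepted / rejected


def bundle_by_tag(annotations: List[Annotation], tag: str) -> List[Annotation]:
    """Create a bundle by filtering on a tag; the bundle holds the same annotation objects."""
    return [a for a in annotations if tag in a.tags]


# Toy data: three annotations, one of which carries two tags.
notes = [
    Annotation("reviewer1", "Tighten this claim", [(10, 42)], {"intro", "wording"}),
    Annotation("reviewer2", "Cite the earlier study", [(80, 95)], {"intro", "citations"}),
    Annotation("reviewer3", "Check reference format", [(300, 320)], {"citations"}),
]

intro_bundle = bundle_by_tag(notes, "intro")
citation_bundle = bundle_by_tag(notes, "citations")

# The second note belongs to both bundles, so a status change made through one
# bundle is visible through the other without any copying.
citation_bundle[0].status = "accepted"
```

Because each bundle is only a filtered view over shared objects, adding an annotation to a second bundle costs one extra tag rather than one extra drag operation.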
The main component of the document pane is the document editor, which has typical functionality (insert, delete, comment, etc.). The reviewing pane is a multi-tabbed pane with each tab displaying a specific group of annotations. The reviewing pane supports creating new bundles, adding and removing annotations from a specific bundle, and sorting and filtering annotations based on particular attributes.

Tagging, which is bottom up, is appropriate for unknown workflows where structure emerges and serendipity needs to be supported. During more precise workflow, top-down structuring through manual or automated bundle creation is likely to be the preferred approach. We will demonstrate both top-down and bottom-up structuring in the Bundle Editor to illustrate the advantages of each. We expect to report results from preliminary studies of how co-authors use these two approaches. The studies will compare ease of use across the two approaches, examine the semantic categories within annotations for a shared document, and investigate the role of bundles in facilitating problem decomposition strategies involved in co-authoring workflow.

REFERENCES
[1] Cadiz, J., Gupta, A., Grudin, J. (2000). Using web annotations for asynchronous collaboration around documents. ACM CSCW ’00. pp 309-318.
[2] Churchill, E., Trevor, J., Bly, S., Nelson, L., and Cubranic, D. (2000). Anchored conversations: chatting in the context of a document. ACM CHI ’00. pp 454-461.
[3] Guy, M., and Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12, 1.
[4] Hee-Cheol, K., and Eklundh K. (2001). Reviewing practices in collaborative writing. In Computer Supported Cooperative Work, 10, 2. pp 247-259.
[5] Mathes, A. (2004). Folksonomies – cooperative classification and communication through shared metadata. http://www.…/academic/computer-mediated-communication/folksonomies.html (accessed 07/2006).
[6] Sinha, R. (2005). A cognitive analysis of tagging. In Rashmi Sinha’s weblog, …/archives/05_09/tagging-cognitive.html (accessed 07/2006).
[7] Weng, C., and Gennari, J. (2004). Asynchronous collaborative writing through annotations. ACM CSCW ’04. pp 578-581.
[8] Zheng, Q., Booth, K.S., and McGrenere, J. (2006). Co-authoring with structured annotations. ACM CHI ’06. pp 131-140.

Figure 1. Bundle Editor with document and reviewing panes.
Two Supervised Learning Approaches for Name Disambiguation in Author Citations

Hui Han
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802
hhan@

Lee Giles
School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, 16802
giles@

Hongyuan Zha
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802
zha@

Cheng Li
Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115
cli@

Kostas Tsioutsiouliklis
NEC Laboratories America, Inc., 4 Independence Way, Princeton, NJ 08540
kt@

ABSTRACT
Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in the citations¹. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) [39] and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceeding. We illustrate these two approaches on two types of data, one collected from the web, mainly publication lists from homepages, the other collected from the DBLP citation databases.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Algorithms

Keywords
Naive Bayes, Name Disambiguation, Support Vector Machine

¹ “Citations” refer to an author’s publication list in the citation format.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
JCDL’04, June 7–11, 2004, Tucson, Arizona, USA.
Copyright 2004 ACM 1-58113-832-6/04/0006...$5.00.

1. INTRODUCTION
Due to name variation, identical names, name misspellings, and pseudonyms, we observe two types of name ambiguities in research papers or bibliographies (citations). The first type is that an author has multiple name labels. For example, the author “David S. Johnson” may appear in multiple publications under different name abbreviations such as “David Johnson”, “D. Johnson”, or “D. S. Johnson”, or a misspelled name such as “Davad Johnson”. The second type is that multiple authors may share the same name label. For example, “D. Johnson” may refer to “David B. Johnson” from Rice University, “David S. Johnson” from AT&T research lab, or “David E. Johnson” from Utah University (assuming the authors still have these affiliations).

Name ambiguity can affect the quality of scientific data gathering, can decrease the performance of information retrieval and web search, and can cause the incorrect identification of and credit attribution to authors. For example, identical names cause the ambiguity of the “author page” in the web DBLP (Digital Bibliography & Library Project)². The author page of “Yu Chen” in the DBLP contains citations from three different people with the same name: Yu Chen from University of California, Los Angeles; Yu Chen from Microsoft Beijing; Yu Chen as the senior professor from Renmin University of China. Such name ambiguity causes the incorrect identification of authors. For example, the author page of “Jia Li” in the DBLP refers to the “Jia Li” from the Department of Statistics at the Pennsylvania State University. However, the “Home Page” link in her author page directs to the professor with the identical name in the Department of Mathematical Sciences at the University of Alabama in Huntsville. We observe from CiteSeer [18] the incorrect attribution to the authors due to similar ambiguity. “D. Johnson” is the most cited author in Computer Science according to CiteSeer’s statistics in May 2003 (/mostcited.html). However, the citation number that “D. Johnson” obtained in CiteSeer’s statistics is actually the sum of several different authors such as “David B. Johnson”, “David S. Johnson”, and even “Joel T. Johnson”.

² rmatik.Uni-Trier.DE/∼ley/db/index.html

This paper investigates the name disambiguation in the context of citations. We propose the idea of a canonical name, i.e. a name that is the minimal invariant and complete name entity for disambiguation. Such a name may have more than just the name of the individual as constituents. A possible example of a canonical name would be a name entity that has all the characteristics of a name including abbreviations and AKA’s. “Authorized name” is a similar concept in library practice. Getty’s ULAN (Union List of Artist’s Names) [1] and the Library of Congress name authority file [2] are good demonstrations of the canonical, or authoritative form of names.

Name ambiguity is a special case of the general problem of identity uncertainty, where objects are not labeled with unique identifiers [30]. Much research has been done to address the identity uncertainty problem in different fields using different methods, such as record linkage [17], duplicate record detection and elimination [8, 26, 29], merge/purge [22], data association [6], database hardening [11], citation matching [28], name matching [7, 37, 9], and name authority work in library cataloging practice [40, 15, 19].
Citation matching, name matching and name authority work are the work most similar to ours. Citation matching and name matching are similar to our method in citation context and the choice of citation attributes for computation. However, the records identified by related work are actually duplicate records in different syntactic formats, while what we identify are different records authored by the same name entity. Another difference is that many name matching algorithms are string-based [7, 37, 9]. Bilenko et al. show in their work that the string-based similarity computation works better than token-based methods, which may be due to many misspellings in their datasets [7]. We expect token-based methods to better fit the name disambiguation task because the problem can be treated at the token level, and misspellings and abbreviations are not the main source of the citation differences for these authors. Name authority is the process through which librarians for the past century have intellectually provided disambiguation for personal and corporate names in the world’s bibliographic output. However, much name authority work is conducted manually. DiLauro et al. [40, 15] propose a semi-automatic algorithm using Bayes probabilities to disambiguate composers and artists in the Levy music collection. However, their algorithm largely depends on the Library of Congress name authority file.

We study two machine learning approaches for name disambiguation, one based on a generative model and the other based on a discriminative model. A generative model can create other examples of the data, usually provides good insight into the nature of the data and facilitates easy incorporation of domain knowledge [4]. We observe that an author’s citations usually contain the information of the author’s research area and his or her individual patterns of coauthoring. Therefore, we propose a naive Bayes model, a generative statistical model frequently used in word sense disambiguation tasks [16, 38], to capture all authors’ writing patterns. Discriminative models such as Support Vector Machines are basically classifiers. Other differences are that the naive Bayes model uses only positive training citations to model an author’s writing patterns, while the SVMs learn from both positive and negative training citations the distinction between different authors’ citations. Also, the naive Bayes model classifies a citation to an author based on the probabilities, while the SVMs use a distance measure [36]. In addition, a probability model allows us to systematically combine different models [23], and is easily extensible to more information; the vector space representation of citations in classification approaches usually needs to tune weights for different attributes [7, 37].

Our approaches assume the existence of a citation database (training data) indexed by the canonical name entities. Such a citation database can be constructed in several ways: from existing databases such as DBLP; by collecting publication lists from researchers’ home pages (usually these publication lists are in the citation format); or by clustering citations according to the name entities, as shown by our previous work [21].

Given a full citation with the query name implicitly omitted, our name disambiguation is to predict the most likely canonical name from the citation database. For example, “[J. Anderson], S. Baruah, K. Jeffay. Parallel Switching in Connection-Oriented Networks. IEEE Real-Time Systems Symposium 1999: 200-209” is a test citation. “J. Anderson” is the omitted query name. The naive Bayes approach estimates the author-specific probabilities, such as the prior probability of each author, and his/her probabilities of coauthoring with coauthors, using certain keywords in the title of the paper, and publishing papers in certain places, as described in detail in Section 2. Given a new citation and its query author name, name disambiguation is to search the database and choose the canonical name entry with the highest posterior probability of producing this citation. The SVM approach considers each author as a class, and classifies a new citation to the closest author class. With the SVM approach, we represent each citation in a vector space; each coauthor name and keyword in paper/journal title is a feature of the vector.

Both approaches use three attributes of the citations associated with each canonical name entry in the citation database: coauthor names, paper titles, and journal titles. By “journal titles”, we actually refer to the titles of all the publication sources, such as proceedings and journals. Author names in citations are represented by the first name initial and last name, the minimal name information seen in citations. Citation attributes can be extracted by methods such as regular expression matching, rule-based system [10], hidden Markov models [33, 34, 35], or Support Vector Machines [20]. To minimize the effect of inaccurate citation parsing on the study of two approaches, we use regular expression matching and manual correction to parse the citations in “J Anderson” and “J Smith” datasets, as discussed in Section 4.1. The DBLP citation datasets are already in the XML format with parsed attributes.

The rest of the paper is organized as follows: Section 2 describes the naive Bayes approach; Section 3 describes the SVM approach; Section 4 reports experiments and results; Section 5 concludes and discusses future work.

2. THE NAIVE BAYES MODEL
We assume that each author’s citation data is generated by the naive Bayes model, and use his/her past citations as the training data to estimate the model parameters. Based on the parameter estimates, we use the Bayes rule to calculate the probability that each name entry $X_i$ ($i \in [1, N]$, where $N$ is the total number of candidate name entries in the citation database) would have generated the input citation.

2.1 Model Overview
Given an input test citation $C$ with the omission of the query author, the target function is to find a name entry $X_i$ in the citation database with the maximal posterior probability of producing the citation $C$, i.e.,

$$\max_i P(X_i \mid C) \tag{1}$$

Using the Bayes rule, the problem becomes finding

$$\max_i \frac{P(C \mid X_i)\, P(X_i)}{P(C)} \tag{2}$$

where $P(X_i)$ denotes the prior probability of $X_i$ authoring papers, and is estimated from the training data as the proportion of the papers of $X_i$ among all the citations. The prior is useful to incorporate knowledge, for example that a prolific author has a large $P(X_i)$. $P(C)$ denotes the probability of the citation $C$ and is omitted since it does not depend on $X_i$. Then Function 2 becomes

$$\max_i P(C \mid X_i)\, P(X_i) \tag{3}$$

We assume that coauthors, paper titles, and journal titles are independent citation attributes, and different elements in an attribute type are also independent from each other. The different attribute element here refers to the individual coauthor, the individual keyword in the paper title, and the individual keyword in the journal title. By “keyword”, we mean the remaining words after filtering out the stop words (such as “a”, “the”, “of”, etc.). Therefore, we decompose $P(C \mid X_i)$ in Function 3 as

$$P(C \mid X_i) = \prod_j P(A_j \mid X_i) = \prod_j \prod_k P(A_{jk} \mid X_i) \tag{4}$$

where $A_j$ denotes the different type of attribute; that is, $A_1$ is the coauthor names, $A_2$ the paper title, and $A_3$ the journal title. Each attribute is decomposed into independent elements represented by $A_{jk}$ ($k \in [1, K(j)]$), where $K(j)$ is the total number of elements in attribute $A_j$. For example, $A_1 = (A_{11}, A_{12}, \ldots, A_{1k}, \ldots, A_{1K(1)})$, where $A_{1k}$ indicates the $k$th coauthor in $C$. To avoid underflow, we store log probabilities in our implementation. Because the logarithm is monotonic, the same $X_i$ maximizes the log of Function 3, and the target function becomes:

$$\max_i \Big[ \sum_j \sum_k \log P(A_{jk} \mid X_i) + \log P(X_i) \Big] \tag{5}$$

where $j \in [1, 3]$ and $k \in [1, K(j)]$. The above attribute independence assumption may not hold for real-world data, since there exist cases such as multiple coauthors always appearing together.
However, empirical evidence shows that naive Bayes often performs well in spite of such violation. Friedman, Domingos and Pazzani show that the violation of the word independence assumption may sometimes slightly affect the classification accuracy (Friedman 1997; Domingos and Pazzani 1996).

2.2 Model Parameters and Estimation
Next we describe the decomposition and estimation of the coauthor conditional probability $P(A_1 \mid X_i)$ from the training citations, where $A_1 = (A_{11}, A_{12}, \ldots, A_{1k}, \ldots, A_{1K(1)})$. The probability estimation is the maximum likelihood estimation for parameters of multinomial distributions. A pseudo-count of 1 is added in parameter estimation to avoid zero probability in the estimation results.

$P(A_1 \mid X_i)$ is decomposed into the following conditional probabilities.

• $P(N \mid X_i)$: the probability of $X_i$ writing a future paper alone conditioned on the event of $X_i$, estimated as the proportion of the papers that $X_i$ authors alone among all the papers of $X_i$. ($N$ stands for “No coauthor”, and “Co” below stands for “Has coauthor”.)

• $P(Co \mid X_i)$: the probability of $X_i$ writing a future paper with coauthors conditioned on the event of $X_i$. $P(Co \mid X_i) = 1 - P(N \mid X_i)$.

• $P(Seen \mid Co, X_i)$: the probability of $X_i$ writing a future paper with previously seen coauthors conditioned on the event that $X_i$ writes a future paper with coauthors. We regard the coauthors who coauthor a paper with $X_i$ at least twice in the training citations as the “seen coauthors”; the other coauthors, who coauthor a paper with $X_i$ only once in the training citations, are considered the “unseen coauthors”. Therefore, we estimate $P(Seen \mid Co, X_i)$ as the proportion of the number of times that $X_i$ coauthors with “seen coauthors” among the total number of times that $X_i$ coauthors with any coauthor. Note that if $X_i$ has $n$ coauthors in a training citation $C$, we count that $X_i$ coauthors $n$ times in citation $C$.

• $P(Unseen \mid Co, X_i)$: the probability of $X_i$ writing a future paper with “unseen coauthors” conditioned on the event that $X_i$ writes a paper with coauthors. $P(Unseen \mid Co, X_i) = 1 - P(Seen \mid Co, X_i)$.

• $P(A_{1k} \mid Seen, Co, X_i)$: the probability of $X_i$ writing a future paper with a particular coauthor $A_{1k}$ conditioned on the event that $X_i$ writes a paper with previously seen coauthors. We estimate it as the proportion of the number of times that $X_i$ coauthors with $A_{1k}$ among the total number of times $X_i$ coauthors with any coauthor.

• $P(A_{1k} \mid Unseen, Co, X_i)$: the probability of $X_i$ writing a future paper with a particular coauthor $A_{1k}$ conditioned on the event that $X_i$ writes a paper with unseen coauthors. Considering all the names in the training citations as the population and assuming that $X_i$ has equal probability to coauthor with an unseen author, we estimate $P(A_{1k} \mid Unseen, Co, X_i)$ as 1 divided by the total number of author (or coauthor) names in the training citations minus the number of coauthors of $X_i$. However, the small citation size may underestimate the population of new coauthors that $X_i$ will coauthor with in the real world. This may in turn underestimate the probability of an author coauthoring with previously seen coauthors. In this case we can set a larger population size.

• $P(A_1 \mid X_i) = P(N \mid X_i)$ if $K(1) = 0$.

• $P(A_1 \mid X_i) = P(A_{11} \mid X_i) \cdots P(A_{1k} \mid X_i) \cdots P(A_{1K(1)} \mid X_i)$ if $K(1) > 0$, where

$$
\begin{aligned}
P(A_{1k} \mid X_i) &= P(A_{1k}, N \mid X_i) + P(A_{1k}, Co \mid X_i) \\
&= 0 + P(A_{1k}, Co \mid X_i) \\
&= P(A_{1k}, Seen, Co \mid X_i) + P(A_{1k}, Unseen, Co \mid X_i) \\
&= P(A_{1k} \mid Seen, Co, X_i)\, P(Seen \mid Co, X_i)\, P(Co \mid X_i) \\
&\quad + P(A_{1k} \mid Unseen, Co, X_i)\, P(Unseen \mid Co, X_i)\, P(Co \mid X_i)
\end{aligned}
$$

The above decomposition is motivated by the following hypotheses: (1) Different authors $X_i$ have different probabilities of writing papers alone, writing papers with previously seen coauthors or previously unseen coauthors. (2) Each author $X_i$ has his/her own list of previously seen coauthors, and a unique probability distribution on these previously seen coauthors to write papers with. If the above hypotheses hold, we expect these conditional probabilities to capture the coauthoring history and pattern of $X_i$, and to help disambiguate the omitted author from the rest of a citation $C$.
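To make the coauthor decomposition concrete, the following Python sketch estimates the quantities listed above for each candidate author and scores a test citation with log probabilities, using the coauthor attribute alone. It is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, the exact form of the pseudo-count smoothing, the `population_size` value, and the toy training data are all hypothetical.

```python
import math
from collections import Counter


def train_coauthor_model(papers, population_size):
    """Estimate coauthor probabilities for one author X_i.

    papers: one list of coauthor names per training citation of X_i.
    population_size: assumed number of distinct names in all training citations.
    """
    n_alone = sum(1 for coauthors in papers if not coauthors)
    counts = Counter(name for coauthors in papers for name in coauthors)
    total = sum(counts.values())                      # total coauthoring events
    seen = {name: c for name, c in counts.items() if c >= 2}
    # Pseudo-count of 1 per outcome on the two binary estimates (assumed form).
    p_alone = (n_alone + 1) / (len(papers) + 2)       # P(N | X_i)
    p_seen = (sum(seen.values()) + 1) / (total + 2)   # P(Seen | Co, X_i)
    # Uniform over names that are not already coauthors of X_i.
    p_unseen_each = 1.0 / max(population_size - len(counts), 1)
    return {"p_alone": p_alone, "p_co": 1.0 - p_alone, "p_seen": p_seen,
            "seen": seen, "total": total, "p_unseen_each": p_unseen_each}


def log_p_coauthors(model, coauthors):
    """log P(A_1 | X_i) for the coauthor list of a test citation."""
    if not coauthors:                                 # K(1) = 0 case
        return math.log(model["p_alone"])
    log_p = 0.0
    for name in coauthors:
        p_given_seen = model["seen"].get(name, 0) / model["total"] if model["total"] else 0.0
        p = model["p_co"] * (p_given_seen * model["p_seen"]
                             + model["p_unseen_each"] * (1.0 - model["p_seen"]))
        log_p += math.log(p)
    return log_p


def most_likely_author(models, priors, coauthors):
    """Pick the author maximizing log prior + log coauthor likelihood."""
    return max(models, key=lambda a: math.log(priors[a]) + log_p_coauthors(models[a], coauthors))


# Toy training data for two hypothetical candidates sharing one name label.
papers_a = [["S Baruah", "K Jeffay"], ["S Baruah"], []]
papers_b = [["P Example"], ["P Example", "Y Kim"]]
models = {"A": train_coauthor_model(papers_a, 50),
          "B": train_coauthor_model(papers_b, 50)}
priors = {"A": 3 / 5, "B": 2 / 5}                     # share of training citations
best = most_likely_author(models, priors, ["S Baruah", "K Jeffay"])
```

In this toy run, "K Jeffay" appears only once with candidate A and so falls under the "unseen" branch, while "S Baruah" is a seen coauthor; the paper-title and journal-title attributes would add further log terms in the same way.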
Similarly, we can estimate the conditional probability $P(A_2 \mid X_i)$ that an author writes a paper title, and the conditional probability $P(A_3 \mid X_i)$ that he or she publishes in a particular journal. Taking each title word of the paper and journal as an independent element, we estimate the probabilities that $X_i$ uses a certain word for a future paper title, and publishes a future paper in a journal with a particular word in the journal title. Here the goal is to use author-specific probabilities to capture information such as the research field, keywords in the research direction, and the preference of title word usage from past citations of $X_i$.

2.3 Computational Complexity
Suppose a citation database consists of $N$ canonical authors, where each author has an average of $M$ training citations, and each citation has an average of $K$ attribute elements. The computational complexity for training (estimating the probabilities) the above model is $O(MNK)$; the computational complexity for the query step using coauthor information alone is $O(NK)$ for each query citation. This complexity indicates the scalability of our algorithm to real-world applications.

3. SUPPORT VECTOR MACHINES
This approach considers each author as a class, and trains the classifier for each author class. Given a full citation with the omission of the query name, the goal of name disambiguation is to classify this citation to the closest author class. Each citation is represented by a feature vector, with each coauthor name and keyword in the paper/journal title as a feature and its frequency in the citation as the feature weight. We use the $\|X\|_\infty$ norm to normalize the weight of features with different ranges of values, which was shown to improve the classification performance [20].

We choose Support Vector Machines [39, 12] as classifiers because of their good generalization performance and ability in handling high dimensional data. All experiments use $SVM^{light}$ [24].

3.1 Support Vector Machine Classification and Feature Selection
The SVM is designed for two-class classification problems. Let $\{(\vec{x}_1, y_1), \ldots, (\vec{x}_N, y_N)\}$ be a two-class training dataset, with $\vec{x}_i$ a training feature vector and $y_i \in \{-1, +1\}$ its label. The SVM attempts to find an optimal separating hyperplane to maximally separate two classes of training data. The corresponding decision function is called a classifier. In the case where the training data is linearly separable, computing an SVM for the data corresponds to minimizing $\|\vec{w}\|$ such that

$$y_i(\vec{w} \cdot \vec{x}_i + w_0) - 1 \ge 0, \quad \forall i \tag{6}$$

The linear decision function is

$$f(\vec{x}) = \mathrm{sgn}\{(\vec{w} \cdot \vec{x}) + w_0\} = \mathrm{sgn}\Big\{\sum_i^n \alpha_i^* y_i (\vec{x}_i \cdot \vec{x}) + w_0^*\Big\} \tag{7}$$

If $f(\vec{x}) > 0$, the data $\vec{x}$ belongs to class 1; otherwise, $\vec{x}$ belongs to class 2. The absolute value of $f(\vec{x})$ indicates the distance of $\vec{x}$ from the other class. In the final decision function $f(\vec{x})$, the training samples with nonzero coefficients $\alpha_i^*$ lie closest to the hyperplane, and are called support vectors. As Equation 7 shows, $f(\vec{x})$ is a weighted sum of all features, plus a constant term as the threshold. $n$ is the number of support vectors. Zhang et al. [43] propose to rank the features according to their contribution in separating the differences between two classes. We formalize such a contribution of a feature by Expression 8, where $x_{ij}$ is the weight of the feature $j$ in support vector $i$. We use such ranking of features to analyze the classification performance by SVMs (Section 4.1).

$$\sum_i^n \alpha_i^* y_i x_{ij} \tag{8}$$

We extend SVMs to multi-class classification using the “one class versus all others” approach, i.e., one class is positive and the remaining classes are negative.

4. EXPERIMENTS
4.1 Datasets and Experiment Design
We apply both approaches on two types of data. The first type of data is publication lists collected from the web, mostly from researchers’ homepages. This type of data contains two datasets, one from 15 different “J Anderson”s, shown in Table 1, the other from 11 different “J Smith”s³. Both “J Anderson” and “J Smith” are ambiguous names in the database of our EbizSearch system, a CiteSeer-like search engine specializing in the E-Business area [32]. We query “Google” using name information such as “J Anderson”, or the full name information available in our EbizSearch databases such as “James Anderson”, and the keyword “publications”. We manually check the returned links, recognize each researcher under the same first name initial and last name, and collect their publication web pages to construct our datasets.

The other type of data is downloaded from the DBLP website, which contains more than 300,000 bibliographic XML citation records with parsed citation attributes. We form the three attributes in each citation as a string. We then cluster author names with the same first name initial and the same last name; each name is associated with the citations where the name appears. We sort the formed name datasets by the number of citations in each set. Nine large name datasets, each having more than 10 name variations, are chosen for experiments, as shown in Table 7. We observe that many names in the DBLP have complete name information. To avoid tedious manual checking, we choose from each name dataset the full names that have more than five citations, and consider each such name to represent a canonical name entity.

We prepare the training/testing datasets, preprocess the data, and construct the citation databases in the same way for all datasets.
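The DBLP grouping step described above (cluster names by first initial and last name, then keep full names with more than five citations as canonical entities) could be sketched roughly as follows. The function names, the input format of `(author_full_name, citation_string)` pairs, and the sample records are assumptions for illustration; the sketch also does not check whether a name actually carries complete name information, which the text relies on.

```python
from collections import defaultdict


def name_key(full_name):
    """Reduce a name to 'first initial + last name', e.g. 'Yong-Jik Kim' -> 'Y Kim'."""
    parts = full_name.replace(".", " ").split()
    return f"{parts[0][0].upper()} {parts[-1]}"


def build_name_datasets(records, min_citations=6):
    """Group citations by ambiguous name key, then keep well-attested full names.

    records: iterable of (author_full_name, citation_string) pairs (assumed format).
    min_citations=6 encodes "more than five citations".
    Returns {name_key: {full_name: [citation_string, ...]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for full_name, citation in records:
        grouped[name_key(full_name)][full_name].append(citation)
    return {
        key: {name: cites for name, cites in names.items() if len(cites) >= min_citations}
        for key, names in grouped.items()
    }


# Toy records: one author with six citations, two with a single citation each.
records = [("James H. Anderson", f"citation {i}") for i in range(6)]
records += [("James B. Anderson", "citation x"), ("Yong-Jik Kim", "citation y")]
datasets = build_name_datasets(records)
```

Here `datasets["J Anderson"]` keeps only "James H. Anderson", since the other full name under that key has too few citations to serve as a canonical entity.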
We preprocess the datasets on author names, paper title words and journal title words as follows. All the author names in the citations are simplified to first name initial and last name. For example, “Yong-Jik Kim” is simplified to “Y Kim”. One reason for the simplification is that the first name initial and last name format is popular in bibliographic records. Since more name information usually helps name entity disambiguation, we consider the reduced name information in the simplified format a good test for evaluating our algorithm. Moreover, the simplified name format may avoid some cases of name misspellings. We stem the words of paper titles and journal titles using Krovetz’s stemmer [25], and remove stop words such as “a” and “the”. We also replace conference or journal title abbreviations by their full names to obtain more information. The full names of the conference or journal titles are obtained from the DBLP websites⁴.

Each name dataset is randomly split, with half of the citations used for training and the other half for testing. For example, the “J Anderson” dataset contains 117 citations for training and 112 citations for testing; the “J Smith” dataset contains 172 training citations and 166 testing citations. A citation database is then constructed for each name dataset, based on the parsed and preprocessed training citations. For example, the citation database of “J Anderson” contains 15 canonical name entries for 15 different “J Anderson”s, with each name entry associated with available identity information, such as full name, affiliation and research area, as well as authored citations.

³ /users/h/x/hxh190/projects/name pro ject.htm
⁴ rmatik.uni-trier.de/∼ley/db/conf/indexa.html and rmatik.uni-trier.de/∼ley/db/journals/index.html

J Anderson  Full name                Affiliation                         Research area                     Training size  Test size
1           James Nicholas Anderson  UK Edinburgh                        Communication interface research  4              4
2           James E. Anderson        Boston College                      Economics                         7              7
3           James A. Anderson        Brown Univ.                         Neural network                    2              1
4           James B. Anderson        Penn. State Univ.                   Chemistry                         3              3
5           James B. Anderson        Univ. of Toronto                    Biologist                         11             10
6           James B. Anderson        Univ. of Florida                    Entomology                        9              8
7           James H. Anderson        U. of North Carolina at Chapel Hill Computer processors               27             27
8           James H. Anderson        Stanford Univ.                      Robot                             2              2
9           James D. Anderson        Univ. of Toronto                    Dentistry                         3              2
10          James P. Anderson        N/A                                 Computer Security                 2              1
11          James M. Anderson        N/A                                 Pathology                         3              2
12          James Anderson           UK                                  Robot vision and philosophy       9              10
13          James W. Anderson        Univ. of KY                         Medicine                          5              5
14          Jim Anderson             Univ. of Southampton                Mathematician                     10             10
15          Jim V. Anderson          Virginia Tech Univ.                 Plant pathology                   20             20

Table 1: The citation dataset of 15 “J Anderson”s. Columns 2, 3 and 4 show the available “identification information” of a “J Anderson”, e.g., the full name of each “J Anderson”, his or her affiliation and research area. “Training/test size” lists the number of citations used for training/testing. Because of space limitations, we do not list the web sites from which we downloaded the citations.

Scheme    Coauthor          Paper title       Journal title     Hybrid I          Hybrid II
Approach  Bayes    SVM      Bayes    SVM      Bayes    SVM      Bayes    SVM      Bayes
Mean      71.3%    64.4%    77.9%    82.9%    72.1%    74.4%    91.3%    95.6%    93.5%
StdDev    2.1%     3.8%     3.3%     1.9%     2.1%     3.0%     1.6%     1.7%     1.8%
P value   1.38E-05          0.003             0.012             0.0003

Table 2: The mean and the standard deviation (StdDev) of the 10 name disambiguation accuracy trials on the “J Anderson” dataset, with both the naive Bayes approach (Bayes) and the SVM approach (SVM); and the statistical significance (two-tail P value) of the performance difference between the two approaches.

With each approach, we conduct 10 experiments, each with a randomly split dataset. In each experiment, we explore multiple schemes based on different combinations of the utilized citation attributes. The motivation is to study the contributions of different citation attributes to name disambiguation. Both approaches use three schemes that each use a single citation attribute, and at least one of two “Hybrid” schemes that combine aspects of all three attributes. In the naive Bayes model approach, “Hybrid I” computes the equal joint probability
of different attributes. In the SVM approach, “Hybrid I” combines different attributes in the same feature space. The “Hybrid II” scheme is specific to the naive Bayes model and uses the coauthor attribute alone when a coauthor relationship exists between a coauthor in the test citation and a candidate name entry in the citation database; otherwise, “Hybrid II” uses the equal joint probability of all three attributes. Flexibility in manipulating attributes is an advantage of using the probability model: the absence of a particular attribute can be handled by omitting the corresponding probabilities. “Hybrid II” is motivated by the experimental observation that, with the “J Anderson” dataset, adding title words decreases the number of names disambiguated relative to using only the coauthor information. We observe that the coauthor information is valuable for name disambiguation, and design the “Hybrid II” scheme to preserve the names disambiguated by using coauthor information alone.

We evaluate the experiment performance by “accuracy”, defined as the percentage of the query names correctly predicted. The next section shows experiment results and analysis on all the name datasets.

4.2 Name Disambiguation on the First Type of Data

Table 2 shows the mean and the standard deviation (StdDev) of the 10 name disambiguation accuracy trials on the “J Anderson” name dataset, using both approaches. Table 3 shows the corresponding trials on the “J Smith” name dataset. The experiment results on these two name datasets are similar, most likely because the two name datasets have similar probability distributions, since most citations in both datasets are derived from labeled homepages. We analyze the experiment results in detail as follows:

(1) Different attributes have different contributions for name disambiguation

Consider the “J Anderson” dataset as an example. Table 2 shows that using paper title words alone achieves higher average accuracy (77.9%, 82.9%) than using either coauthor (71.3%, 64.4%) or journal information
alone (72.1%, 74.4%) with both approaches. Table 4 shows in detail one experiment using the naive Bayes approach; the other nine experiments show similar results. We observe that authors in this dataset have higher probabilities of reusing title words than of collaborating with previously seen coauthors. Table 4 shows an example of the probability distribution of each attribute. For example, Row 4 in Columns 2 and 3 (with header “Seen”) shows that 92.0% ((86+17) out of 112) of test citations reuse words in paper titles; Row 5 in Columns 2 and 3 shows that 84.8% ((79+16) out of 112) of test citations reuse words in journal titles; and Row 3 in Columns 2 and 3 shows that only 57.1% (64 out of 112) of test citations have a previously seen coauthor relationship.

The above probability distribution indicates that authors in this dataset tend to use the same words for multiple papers, probably because multiple papers are about the same project. Authors in some research areas, such as Biology or Plant pathology, also tend to have a few venues to which they prefer to submit papers. For example, J. Anderson 15 (Jim V. Anderson; J. Anderson 15 refers to the 15th table entry) publishes 37.5% (15 out of 40) of his papers in the same journal, “Plant physiology”. Such consistent information contained in the journal title helps name entity disambiguation more than the paper title words, especially when the name entities to be disambiguated have diverse research areas.

(2) Bayes model better captures the coauthoring patterns of an author than the SVM approach

Table 2 and Table 3 show that the naive Bayes model (71.3%, 75.2% average accuracy) outperforms the SVM approach (64.4%, 60.0% average accuracy) when using coauthor information alone in
General Terms 一般词汇
General Terms 一般词汇manager 经纪人instructor 教练,技术指导guide 领队trainer 助理教练referee, umpire (网球.棒球)裁判linesman, touch judge (橄榄球)裁判contestant, competitor, player 运动员professional 职业运动员amateur 业余运动员,爱好者enthusiast, fan 迷,爱好者favourite 可望取胜者(美作:favorite) outsider 无取胜希望者championship 冠军赛,锦标赛champion 冠军record 纪录record holder 纪录创造者ace 网球赛中的一分Olympic Games, Olympics 奥林匹克运动会Winter Olympics 冬季奥林匹克运动会Universiade 世界大学生运动会stadium 运动场track 跑道ring 圈ground, field 场地pitch (足球、橄榄球)场地court 网球场team, side 队Football 足球football, soccer, Association football 足球field, pitch 足球场midfied 中场kick-off circle 中圈half-way line 中线football, eleven 足球队football player 足球运动员goalkeeper, goaltender, goalie 守门员back 后卫left 左后卫right back 右后卫centre half back 中卫half back 前卫left half back 左前卫right half back 右前卫forward 前锋centre forward, centre 中锋inside left forward, inside left 左内锋inside right forward, inside right 右内锋outside left forward, outside left 左边锋outside right forward, outside right 右边锋kick-off 开球bicycle kick, overhead kick 倒钩球chest-high ball 平胸球corner ball, corner 角球goal kick 球门球ground ball, grounder 地面球hand ball 手触球header 头球penalty kick 点球spot kick 罚点球free kick 罚任意球throw-in 掷界外球ball handling 控制球block tackle 正面抢截body check 身体阻挡bullt 球门前混战fair charge 合理冲撞chesting 胸部挡球close-marking defence 钉人防守close pass, short pass 短传consecutive passes 连续传球deceptive movement 假动作diving header 鱼跃顶球flying headar 跳起顶球dribbling 盘球finger-tip save (守门员)托救球clean catching (守门员)跳球抓好flank pass 边线传球high lobbing pass 高吊传球scissor pass 交叉传球volley pass 凌空传球triangular pass 三角传球rolling pass, ground pass 滚地传球slide tackle 铲球clearance kick 解除危险的球to shoot 射门grazing shot 贴地射门close-range shot 近射long drive 远射mishit 未射中offside 越位to pass the ball 传球to take a pass 接球spot pass 球传到位to trap 脚底停球to intercept 截球to break through, to beat 带球过人to break loose 摆脱to control the midfield 控制中场to disorganize the defence 破坏防守to fall back 退回to set a wall 筑人墙to set the pace 掌握进攻节奏to ward off an assault 击退一次攻势to break up an attack 破坏一次攻势ball playing skill 控球技术total football 
全攻全守足球战术open football 拉开的足球战术off-side trap 越位战术wing play 边锋战术shoot-on-sight tactics 积极的抢射战术time wasting tactics 拖延战术Brazilian formation 巴西阵式,4-2-4 阵式four backs system 四后卫制four-three-three formation 4-3-3 阵式four-two-four formation 4-2-4 阵式red card 红牌(表示判罚出场)yellow card 黄牌(表示警告)Tennis 网球tennis 网球运动lawn tennis 草地网球运动grass court 草地网球场racket 球拍racket press 球拍夹gut, string (球拍的)弦line ball 触线球baseline ball 底线球sideline ball 边线球straight ball 直线球down-the-line shot 边线直线球crosscourt 斜线球high ball, lob 高球low ball 低球long shot 长球short shot 短球cut 削球smash 抽球jump smash 跃起抽球spin 旋转球low drive 抽低球volley 截击空中球low volley 低截球deep ball 深球heavy ball 重球net 落网球flat stroke 平击球flat drive 平抽球let 重发球fluke, set-up, easy 机会球ground stroke 击触地球wide 打出边线的球overhead smash, overhand smash 高球扣杀game 局set 盘fifteen all 一平thirty all 二平forty all 三平deuce 局末平分, 盘末平局love game 一方得零分的一局double fault 双误, 两次发球失误‘not up’, 两跳,还击前球着地两次service line 发球线fore court 前场back court 后场centre mark 中点server 发球员receiver 接球员Athletics 竞技race 跑middle-distance race 中长跑long-distance runner 长跑运动员sprint 短跑(美作:dash)the 400 metre hurdles 400米栏marathon 马拉松decathlon 十项cross-country race 越野跑jump 跳跃jumping 跳跃运动high jump 跳高long jump 跳远(美作:broad jump) triple jump, hop step and jump 三级跳pole vault 撑竿跳throw 投掷throwing 投掷运动putting the shot, shot put 推铅球throwing the discus 掷铁饼throwing the hammer 掷链锤throwing the javelin 掷标枪walk 竞走Individual Sports 体育项目gymnastics 体操gymnastic apparatus 体操器械horizontal bar 单杠parallel bars 双杠rings 吊环trapeze 秋千wall bars 肋木side horse, pommelled horse 鞍马weight-lifting 举重weights 重量级boxing 拳击Greece-Roman wrestling 古典式摔跤hold, lock 揪钮judo 柔道fencing 击剑winter sports 冬季运动skiing 滑雪ski 滑雪板downhill race 速降滑雪赛,滑降slalom 障碍滑雪ski jumping competition 跳高滑雪比赛ski jump 跳高滑雪ice skating 滑冰figure skating 花样滑冰roller skating 滑旱冰bobsleigh, bobsled 雪橇Games and Competitions 球类运动Football 足球football, soccer, Association football 足球field, pitch 足球场midfied 中场kick-off circle 中圈half-way line 中线football, eleven 足球队football player 足球运动员goalkeeper, goaltender, 
goalie 守门员back 后卫left 左后卫right back 右后卫centre half back 中卫half back 前卫left half back 左前卫right half back 右前卫forward 前锋centre forward, centre 中锋inside left forward, inside left 左内锋inside right forward, inside right 右内锋outside left forward, outside left 左边锋outside right forward, outside right 右边锋kick-off 开球bicycle kick, overhead kick 倒钩球chest-high ball 平胸球corner ball, corner 角球goal kick 球门球ground ball, grounder 地面球hand ball 手触球header 头球penalty kick 点球spot kick 罚点球free kick 罚任意球throw-in 掷界外球ball handling 控制球block tackle 正面抢截body check 身体阻挡bullt 球门前混战fair charge 合理冲撞chesting 胸部挡球close-marking defence 钉人防守close pass, short pass 短传consecutive passes 连续传球deceptive movement 假动作diving header 鱼跃顶球flying headar 跳起顶球dribbling 盘球finger-tip save (守门员)托救球clean catching (守门员)跳球抓好flank pass 边线传球high lobbing pass 高吊传球scissor pass 交叉传球volley pass 凌空传球triangular pass 三角传球rolling pass, ground pass 滚地传球slide tackle 铲球clearance kick 解除危险的球to shoot 射门grazing shot 贴地射门close-range shot 近射long drive 远射mishit 未射中offside 越位to pass the ball 传球to take a pass 接球spot pass 球传到位to trap 脚底停球to intercept 截球to break through, to beat 带球过人to break loose 摆脱to control the midfield 控制中场to disorganize the defence 破坏防守to fall back 退回to set a wall 筑人墙to set the pace 掌握进攻节奏to ward off an assault 击退一次攻势to break up an attack 破坏一次攻势ball playing skill 控球技术total football 全攻全守足球战术open football 拉开的足球战术off-side trap 越位战术wing play 边锋战术shoot-on-sight tactics 积极的抢射战术time wasting tactics 拖延战术Brazilian formation 巴西阵式,4-2-4 阵式four backs system 四后卫制four-three-three formation 4-3-3 阵式four-two-four formation 4-2-4 阵式red card 红牌(表示判罚出场)yellow card 黄牌(表示警告)rugby 橄榄球basketball 篮球volleyball 排球Tennis 网球tennis 网球运动lawn tennis 草地网球运动grass court 草地网球场racket 球拍racket press 球拍夹gut, string (球拍的)弦line ball 触线球baseline ball 底线球sideline ball 边线球straight ball 直线球down-the-line shot 边线直线球crosscourt 斜线球high ball, lob 高球low ball 低球long shot 长球short shot 短球cut 削球smash 抽球jump smash 跃起抽球spin 旋转球low drive 抽低球volley 截击空中球low volley 低截球deep ball 深球heavy 
ball 重球net 落网球flat stroke 平击球flat drive 平抽球let 重发球fluke, set-up, easy 机会球ground stroke 击触地球wide 打出边线的球overhead smash, overhand smash 高球扣杀game 局set 盘fifteen all 一平thirty all 二平forty all 三平deuce 局末平分, 盘末平局love game 一方得零分的一局double fault 双误, 两次发球失误‘not up’, 两跳,还击前球着地两次service line 发球线fore court 前场back court 后场centre mark 中点server 发球员receiver 接球员baseball 垒球handball 手球hockey 曲棍球golf 高尔夫球cricket 板球ice hockey 冰球goalkeeper 球门员centre kick 中线发球goal kick 球门发球throw in, line-out 边线发球to score a goal 射门得分to convert a try 对方球门线后触地得分batsman 板球运动员batter 击球运动员men's singles 单打运动员in the mixed doubles 混合双打Water Sports 水上运动swimming pool 游泳池swimming 游泳medley relay 混合泳crawl 爬泳breaststroke 蛙式backstroke 仰式freestyle 自由式butterfly (stroke) 蝶泳diving competition 跳水water polo 水球water skiing 水橇rowing 划船canoe 划艇boat race 赛艇yacht 游艇kayak 皮船sailing 帆船运动outboard boat 船外马达Bicycle Motorcycle 自行车,摩托车car 车类运动velodrome, cycling stadium 自行车赛车场road race 公路赛race 计时赛chase 追逐赛motorcycle, motorbike 摩托车racing car 赛车racing driver 赛车驾驶员rally 汽车拉力赛Riding and Horse Riding 赛马riding 骑马racecourse, racetrack 跑马场,赛马场jockey, polo 马球rider 马球运动员show jumping competition 跳跃赛steeplechase 障碍赛fence 障碍trotter 快跑的马其它体育英语词汇和术语之三:拳击Boxing 拳击boxer 拳击运动员boxing glove 拳击手套boxing shoe 拳击鞋infighting 近战straight punch 直拳uppercut 上钩拳right hook 右钩拳foul 犯规punch bag 沙袋punch ball 沙球boxing match 拳击比赛referee 裁判员boxing ring 拳击台rope 围绳winner 胜利者loser by a knockout 被击败出局者timekeeper 计时员boxing weights 拳击体重级别light flyweight 48公斤级, 次特轻量级flyweight 51公斤级, 特轻量级bantamweight 54公斤级, 最轻量级featherweight 57公斤级, 次轻量级lightweight 60公斤级, 轻量级light welterweight 63.5公斤级, 轻中量级welterweight 67公斤级, 次中量级light middleweight 71公斤级, 中量级middleweight 75公斤级, 次重量级light heavyweight 81公斤级, 重量级heavyweight 81以上公斤级, 最重量级acrobatic gymnastics---技巧运动athletics/track & field---田径beach---海滩boat race---赛艇bobsleigh, bobsled---雪橇boxing---拳击canoe slalom---激流划船canoe---赛艇chess---象棋cricket---板球cycling---自行车diving---跳水downhill race---速降滑雪赛,滑降dragon-boat 
racing---赛龙船dressage---盛装舞步equestrian---骑马fencing---击剑figure skating---花样滑冰football(英语)/soccer(美语)---足球freestyle----自由式gliding; sailplaning---滑翔运动golf----高尔夫球Greece-Roman wrestling----古典式摔跤gymnastic apparatus----体操器械gymnastics----体操handball-----手球hockey----曲棍球hold, lock-----揪钮horizontal bar-----单杠hurdles; hurdle race----跨栏比赛huttlecock kicking---踢毽子ice skating---滑冰indoor---室内item Archery---箭术judo---柔道jumping----障碍kayak----皮划艇mat exercises---垫上运动modern pentathlon---现代五项运动mountain bike---山地车parallel bars---双杠polo---马球qigong; breathing exercises---气功relative work---造型跳伞relay race; relay---接力rings----吊环roller skating----滑旱冰rowing-----划船rugby---橄榄球sailing--帆船shooting---射击side horse, pommelled horse---鞍马ski jump---跳高滑雪ski jumping competition---跳高滑雪比赛ski---滑雪板skiing---滑雪slalom---障碍滑雪softball---垒球surfing---冲浪swimming----游泳table tennis---乒乓球taekwondo---跆拳道tennis----网球toxophily---射箭track---赛道trampoline---蹦床trapeze---秋千triathlon---铁人三项tug-of-war---拔河volleyball---排球badminton---羽毛球baseball---棒球basketball---篮球walking; walking race---竞走wall bars---肋木water polo----水球weightlifting ---举重weights ---重量级winter sports -----冬季运动wrestling --- 摔交yacht --- 游艇Men's 10m Platform 男子10米跳台Women's Taekwondo Over 67kg 女子67公斤级以上跆拳道Women's Athletics 20km Walk 女子20公里竟走Men's Diving Synchronized 3m Springboard 男子3米跳板Women's Diving 3m Springboard 女子3米跳板Women's Diving Synchronized 10m Platform 女子10米跳台Men's Wrestling Greco-Roman 58kg 男子58公斤古典摔交Men's Diving 3m Springboard 男子3米跳板Men's Artistic Gymnastics Parallel Bars 竞技体操男子双杠Women's Artistic Gymnastics Beam 竞技体操女子自由体操Men's Table Tennis Singles 男子乒乓单打Women's Diving 10m Platform 女子10米跳台Women's Artistic Gymnastics Uneven Bars 竞技体操女子跳马Women's Table Tennis Singles 女子乒乓单打Men's Badminton Singles 男子羽毛球单打Women's Badminton Doubles 女子羽毛球双打Men's Diving Synchronized 10m Platform 跳水男子10米跳台Women's Diving Synchronized 3m Springboard 跳水女子3米跳板Men's Table Tennis Doubles 男子乒乓球双打Women's Badminton Singles 女子羽毛球单打Men's Fencing Team Foil 击剑男子团体花剑Women's Judo Heavyweight +78kg 
柔道女子重量级78公斤Men's Shooting 10m Running Target 射击男子10米移动靶Women's Shooting 25m Pistol 射击女子25米运动手枪Women's Table Tennis Doubles 女子乒乓球双打Men's Weightlifting 77kg 举重男子77公斤级抓举Women's Weightlifting 75+ kg 举重女子75公斤以上级抓举Mixed Badminton Doubles 羽毛球男子双打Women's Artistic Gymnastics All-Around Finals 竞技体操女子个人全能决赛Women's Judo Half-Heavywt 78kg 女子次重量级78公斤级柔道Men's Artistic Gymnastics All-Around Finals 竞技体操男子个人全能Women's Fencing Team Epee 击剑女子团体重剑Women's Artistic Gymnastics Team Finals 竞技体操女子团体Women's Judo Half-Middlewt 63kg 女子次中量级63公斤级柔道Women's Weightlifting 63kg 女子63公斤级挺举举重Women's Weightlifting 69kg 女子69公斤级抓举举重Men's Artistic Gymnastics Team Finals 男子团体竞技体操Men's Shooting 10m Air Rifle 射击男子10米气步枪Women's Shooting Trap 射击女子多向飞碟Women's Weightlifting 53kg 举重女子53公斤级抓举Women's Judo Half-Lightwt 52kg 女子次轻量级52公斤柔道Women's Shooting 10m Air Pistol 女子10米汽枪Women's Cycling Track 500m Time Trial 运动场自行车赛女子500米计时赛Men's Shooting 10m Air Pistol 男子10米气手枪Women's Shooting 10m Air Rifle 女子10米气步枪Men's Weightlifting 56kg 男子56公斤级挺举
GENERAL-TERMS(合同翻译)
GENERAL TERMS & CONDITIONS FOR THE PURCHASE OF SERVICES
ECLIPSE有限公司购买服务的一般条款和条件

The original English language version of these Terms and Conditions is the legally-binding version. The translated version is for information only.
此合同条款和条件的英文原文版为具法律效应版本。翻译版本仅供参考。

1 Interpretation 解释

1.1 In these Conditions:
在这些条件下:

‘Company’ means Eclipse Translations Limited whose registered office is at European Translation Centre, Lionheart Enterprise Park, Alnwick, Northumberland NE66 2HT (Company Number: 03290358)
“公司” 指Eclipse翻译有限公司,公司注册地址:诺森伯兰郡Alnwick,Lionheart企业园,欧洲翻译中心,邮编:NE66 2HT (European Translation Centre, Lionheart Enterprise Park, Alnwick, Northumberland NE66 2HT)(公司编号:03290358)

‘charges’ means the charge for the Services
“收费” 指提供服务的收费

‘conditions’ means the standard conditions of purchase set out in this document and (unless the context otherwise requires) includes any special conditions agreed in Writing between the Company and the Translator
“条件” 指本文陈述的购买标准条件,以及包括公司和翻译者之间书面同意的任何特殊条件(除非文中另有要求)

‘contract’ means the agreement constituted by the acceptance of these Conditions by the Translator
“合同” 指翻译者接受这些条件达成的协议

‘Delivery Address’ means that address stated on the Order
“交付地址” 指定单上所述的地址

‘Order’ means the Company’s purchase order to which these Conditions are annexed
“定单” 指符合这些条件的公司购买定单

‘Translator’ means the person or company so described in the Order
“翻译者” 指定单中所述的人员或公司

‘Services’ means the services (if any) described in the Order
“服务” 指定单中所描述的服务(如有)

‘Specification’ includes any plans, drawings, data, description or other information relating to the Services
“说明” 包括任何规划、设计图、绘图、资料、描述或其它与服务有关的资讯

‘terms’ means the standard terms of purchase set out in this document and (unless the context otherwise requires) includes any special terms agreed in Writing between the Company and Translator
“条款” 指本文陈述的购买标准条款,以及包括公司和翻译者之间书面同意的任何特殊条款(除非文中另有要求)

‘work’ means a translation produced by the Translator in the course of performing the Services
“作品” 指翻译者通过提供服务所作的译文

‘writing’ includes telex, facsimile transmission and comparable means of communication.
“书面” 包括电报、传真和类似的通讯手段。
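General Terms and Conditions

Article 1: Definitions and Interpretation

1.1 In this contract, unless the context otherwise requires, the following words and expressions have the following meanings:
"we", "our": the provider of these Terms and Conditions, i.e. the supplier of the Services;
"you", "your": the individual or entity accepting these Terms and Conditions, i.e. the user of the Services;
"Services": any product, service or feature that we provide to you, including but not limited to online platforms, applications, software, content or other related services.

1.2 Headings in these Terms and Conditions are for ease of reading only and do not affect their interpretation.

Article 2: Acceptance of the Terms and Conditions

2.1 By using our Services, you indicate that you have read, understood and agree to be bound by these Terms and Conditions. If you do not agree to these Terms and Conditions, you may not use our Services.

2.2 We may modify these Terms and Conditions at any time. Any modification takes effect when it is published and applies to all subsequent use of the Services. You should review these Terms and Conditions regularly to stay informed of any modification.

Article 3: Use of the Services

3.1 You must comply with all applicable laws, regulations and rules, and must not use our Services for any illegal, fraudulent or harmful purpose.

3.2 You must not interfere with or disrupt our Services, including but not limited to by using any virus, malware, worm, Trojan horse or other harmful code.

3.3 You are responsible for protecting the security of your account, including but not limited to safeguarding your username and password and preventing unauthorized access and use.

Article 4: Intellectual Property

4.1 All content contained in our Services, such as text, graphics, images, audio, video, software and data, is protected by copyright, trademark, patent or other intellectual property laws.

4.2 Unless expressly permitted by these Terms and Conditions or by applicable law, you may not copy, distribute, modify, display, publicly perform, transmit or otherwise use any content in our Services.

Article 5: Disclaimer and Limitation of Liability

5.1 Our Services are provided on an "as is" and "as available" basis, without warranties of any kind, whether express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose or non-infringement.

5.2 To the extent permitted by law, we accept no liability for any direct, indirect, incidental, special, consequential or punitive damages arising from the use of, or the inability to use, our Services, even if we have been advised of the possibility of such damages.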
GENERAL TERMS
Applicants failing to meet the ICTI CARE Process standard within the twelve (12) months registration period shall be required to re-register with ICTI CARE as a new applicant no sooner than [illegible] months from the expiration of the original registration period, including any extension period granted.

ICFAL may at its own discretion issue a specific type of seal to the Applicant to reflect the audit data available. The seals include Class A, B, C or probation, depending on the level of compliance verified. The criteria for these seals follow the terms and conditions listed in the Wages and Working Hours Guidelines and Implementation Plan available on the website of the ICFAL, www.icti-care.org. The Applicant understands and agrees that ICFAL is not obliged to provide evidence to the Applicant to justify a certain level of compliance, and the Applicant agrees that the listing on the ICTI CARE Process website will reflect these different levels of certified compliance as described in Article 5.3 below.

The Applicant's violation of the conditions shall invalidate its contract with ICFAL, and any granted seal of compliance shall be invalidated and recalled from the Applicant.
General Terms
Java Bytecode as a Typed Term Calculus

Tomoyuki Higuchi
School of Information Science
Japan Advanced Institute of Science and Technology
Tatsunokuchi, Ishikawa, 923-1292 Japan
thiguchi@jaist.ac.jp

Atsushi Ohori
School of Information Science
Japan Advanced Institute of Science and Technology
Tatsunokuchi, Ishikawa, 923-1292 Japan
ohori@jaist.ac.jp

ABSTRACT

We propose a type system for the Java bytecode language, prove the type soundness, and develop a type inference algorithm. In contrast to the existing proposals, our type system yields a typed term calculus similar to type systems of lambda calculi. This enables us to transfer existing techniques and results of type theory to a JVM-style bytecode language. We show that ML-style let polymorphism and recursive types can be used to type JVM subroutines, and that there is an ML-style type inference algorithm. The type inference algorithm has been implemented. The ability to verify type soundness is a simple corollary of the existence of the type inference algorithm. Moreover, our type-theoretical approach opens up various type-safe extensions including higher-order methods, flexible polymorphic typing through polymorphic type inference, and type-preserving compilation.

Categories and Subject Descriptors

D.3.1 [Programming Languages]: Formal Definitions and Theory; D.3.2 [Programming Languages]: Language Classifications - Macro and assembly languages, Object-oriented languages

General Terms

Languages, Theory, Verification

Keywords

Java bytecode, bytecode verifier, type system, type inference

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PPDP’02, October 6-8, 2002, Pittsburgh, Pennsylvania, USA. Copyright 2002 ACM 1-58113-528-9/02/0010...$5.00.

1. INTRODUCTION

Type safety of executable code is becoming increasingly important due to recently emerging network computing, where pieces of executable code are dynamically exchanged over the Internet and used under the user’s own privileges. An important achievement in this direction is the development of the Java bytecode language [13], which is the target language of the Java programming language [5]. A distinguishing feature is its typing constraint. The system can ensure type-correct execution of a given bytecode by checking type constraints before execution.

Type verification of JVM bytecode is essentially static type checking, customarily done in high-level typed programming languages. Since JVM is a powerful and complex system, the development of a correct and reliable type checking system requires a formal framework for semantics and typing derivation of the JVM bytecode language. This problem has recently attracted the attention of programming language researchers, and several static type systems have been developed. Stata and Abadi [17] propose a static type system for a subset of the JVM bytecode language including subroutines. In this work, a type system checks the satisfiability of constraints on memory states induced by the set of instructions in the given code. This paradigm has been successfully used in subsequent proposals on type-checking the JVM bytecode language.

Freund and Mitchell [4] use this framework to analyze a subtle problem of object initialization and propose a refined type-checking scheme. They further show in [3] that their approach extends to various features of JVM including objects, classes, interfaces, and exceptions. O’Callahan [15] also extends the approach of [17] to allow more flexible typing for subroutines by giving an explicit return type of a subroutine. In addition to this, he introduces polymorphic typing to achieve flexibility in subroutine usage. Hagiya and Tozawa [6] give an alternative approach for subroutines based on dataflow
analysis, which yields a simpler soundness proof. Iwama and Kobayashi [7] propose another type system based on [17] to verify the correctness of the usage of lock primitives, where the type of an object has the information about the order in which the object is locked/unlocked. Leroy [12] presents a light-weight verification method based on dataflow analysis.

In those proposals, a type system is designed to check type consistency of memory states imposed by instructions in a given bytecode. While this paradigm is useful in establishing type safety of the JVM bytecode language, some important questions remain. For example, how does this paradigm extend to other useful programming features, or how does this paradigm relate to the existing theory of type systems? As we comment later in the next section, O’Callahan’s system uses the notion of continuation, which represents more information about the behavior of a program than a set of memory usage constraints, but its relationship to existing notions in type theory is not entirely clear.

One obstacle in answering these questions appears to be the fact that in this paradigm, types do not represent the behavior of bytecode. This is in contrast with type systems of programming languages where types provide a static view of the behavior of a program: a typing $\Gamma \rhd M : \tau$ implies that $M$ yields a value of type $\tau$ when it is executed in an environment of type $\Gamma$. This view yields a clean type soundness theorem and has been the basis for incorporating various advanced features such as polymorphism and type inference in a language.

Jones [8] suggests representing each JVM instruction as a function and sequencing as function composition. Although the semantics of functions is general enough to represent JVM instructions, it does not seem to reflect machine execution directly. As a result, it is not obvious that this approach could be a basis for establishing type soundness or for developing a feasible type-checking algorithm for bytecode including subroutines.

Our goal is to
develop a typed calculus for a JVM-style bytecode language, where types represent behavior of code. We base our development on the Curry-Howard isomorphism for low-level code [16], where it is shown that a low-level code language corresponds to a sequent style proof system of the intuitionistic propositional logic. Katsumata and Ohori [9] sketch that this idea can be applied to represent JVM-style bytecode. The purpose of [9] is to present a proof-directed decompilation method, and its treatment of JVM bytecode is only on syntactical typing and is quite limited; it does not consider subroutines or inheritance, and it does not discuss type soundness. In the present paper, we refine the logical approach of [16] and develop a typed calculus of JVM bytecode including most of its features, establish type soundness, and develop a type inference algorithm. We believe that the proposed calculus provides a type-theoretical account for JVM, and serves as a framework for extending JVM-style bytecode languages with various advanced features including higher-order methods, flexible polymorphic typing, polymorphic type inference, and type-preserving compilation.

The rest of the paper is organized as follows. Section 2 outlines our approach. Section 3 gives a typed calculus for JVM, and establishes type soundness. Section 4 extends the typed calculus with polymorphism and develops a type inference algorithm. We have implemented a prototype bytecode verifier based on the type inference algorithm, which is described in Section 5. Section 6 discusses some extensions to the type systems, and Section 7 concludes the paper.

2. TYPE-THEORETICAL INTERPRETATION OF BYTECODE

We follow the logical interpretation of low-level code [16] and interpret a bytecode program as a typing derivation. In JVM, a program consists of a collection of labeled blocks ending with a return instruction or a branch instruction.
We let $I$ and $B$ range over (non-branching) instructions and code blocks, respectively. A non-branching instruction operates on a local environment and a stack. We use $\Gamma$ and $\Delta$ for types of local environments and stacks respectively. Code blocks, environment types, and stack types are finite sequences, for which we use the following notations. $e \cdot S$ is the sequence obtained by adding an element $e$ at the beginning of a sequence $S$, and $S\{n \leftarrow e\}$ is the sequence obtained by changing the $n$-th element of $S$ to $e$. The empty sequence is denoted by $\phi$.

Applying the idea of [16], we interpret a JVM block $B$ as a judgment of the form $\Gamma, \Delta \triangleright B : \tau$ in a sequent style proof system. A return instruction corresponds to an initial sequent in the proof system. For example, $\Gamma, \mathrm{int} \cdot \Delta \triangleright \mathrm{ireturn} : \mathrm{int}$ indicates that ireturn is a complete program returning the top element of the current stack. A goto(l) refers to an existing block $B$ named $l$. So we type $\Gamma, \Delta \triangleright \mathrm{goto}(l) : \tau$ if $\Gamma, \Delta \triangleright B : \tau$. An ordinary non-branching instruction $I$ that changes the machine state of type $\Gamma_1, \Delta_1$ to that of type $\Gamma_2, \Delta_2$, written as $I : \Gamma_1, \Delta_1 \Longrightarrow \Gamma_2, \Delta_2$, is interpreted as a left-rule of the form:

$$\frac{\Gamma_2, \Delta_2 \triangleright B : \tau}{\Gamma_1, \Delta_1 \triangleright I \cdot B : \tau}$$

which can be read "backward" saying that the execution of $I$ transforms a bigger proof of $I \cdot B$ to a smaller proof of $B$. Most of the non-branching instructions including one for method invocation can be interpreted in this way.

An important exception is jsr for calling a subroutine. A subroutine block (ranged over by $SB$) does not return a value but returns a modified environment and a modified stack to be used by the block that follows jsr. Let $l$ be a label of a subroutine block $SB$ that changes a machine state of type $\Gamma_1, \Delta_1$ to that of type $\Gamma_2, \Delta_2$, and let $B$ be a block such that $\Gamma_2, \Delta_2 \triangleright B : \tau$. We interpret a subroutine call $\mathrm{jsr}(l) \cdot B$ as a function to transform a judgment of type $\Gamma_2, \Delta_2 \triangleright \tau$ to that of type $\Gamma_1, \Delta_1 \triangleright \tau$. To represent this intuitive static semantics, we type $\mathrm{jsr}(l) \cdot B$ as $\Gamma_1, \Delta_1 \triangleright \mathrm{jsr}(l) \cdot B : \tau$ and assign to $SB$ a type of the form $\langle \Gamma_2, \Delta_2 \triangleright \tau \mathbin{//} \Gamma_1, \alpha_l \cdot \Delta_1 \triangleright \tau \rangle$ where $\alpha_l$ denotes the type of a return address.

The use of
continuation in O'Callahan's work [15] corresponds to introducing the part $\Gamma_1, \Delta_1$ in the above type. Capturing this part of typing is enough to ensure type safety of a program consisting only of blocks and subroutine blocks. When methods are added, however, some additional machinery seems to be necessary. Our system can serve as a type-theoretical framework for representing both methods (or other higher-order objects) and collections of blocks and subroutines.

Further refinement is needed for typing the ret(i) instruction. This instruction transfers control back to the block whose address is stored in the $i$-th element of a local environment. This means that $\Gamma_2(i)$ is the type of the block $B$ to which it returns, and therefore the equation $\Gamma_2(i) = \Gamma_2, \Delta_2 \triangleright \tau$ must hold. We solve this problem by introducing recursive type equations. For each subroutine identified by its entry label $l$, we introduce a type variable $\alpha_l$ representing the return address with an associated equation, and assign to a subroutine $l$ a type of the form $(\alpha_l = \Gamma_2, \Delta_2 \triangleright \tau \ \mathrm{in}\ \langle \alpha_l \mathbin{//} \Gamma_1, \Delta_1 \triangleright \tau \rangle)$. We treat a recursive type equation $\alpha_l = \Gamma, \Delta \triangleright \tau$ as a global declaration similar to ML's datatype. When the equation is irrelevant, we simply write $\langle \alpha_l \mathbin{//} \Gamma, \Delta \triangleright \tau \rangle$. Figure 1 shows an example of type derivation for a bytecode block using a subroutine. Later in Section 4 we show that there is an algorithm to infer a polymorphic typing for blocks and subroutine blocks.

Bytecode program:

    L0: jsr L2
    L1: iload_1
        ireturn
    L2: astore_2
        iload_1
        ifeq L3
        iload_1
        ireturn
    L3: iconst_1
        istore_1
        ret 2

Typing results:

    $\alpha_{L2} = \{c, \mathrm{int}, \alpha_{L2}\}, \Delta \triangleright \mathrm{int}$ in
    L0: $\{c, \mathrm{int}, \tau\}, \Delta \triangleright \mathrm{int}$
    L1: $\{c, \mathrm{int}, \alpha_{L2}\}, \mathrm{int} \cdot \Delta \triangleright \mathrm{int}$
    L2: $\langle \alpha_{L2} \mathbin{//} \{c, \mathrm{int}, \tau\}, \alpha_{L2} \cdot \Delta \triangleright \mathrm{int} \rangle$
    L3: $\langle \alpha_{L3} \mathbin{//} \{c, \mathrm{int}, \alpha_{L2}\}, \Delta \triangleright \mathrm{int} \rangle$

Figure 1: Example of Type Derivation

3. THE TYPED JVM CALCULUS

This section defines a typed calculus, Jvmc, for the JVM bytecode language. Since a set of classes and an associated subclass relation are explicitly declared, we define Jvmc relative to a given fixed set of class names (ranged over by $c$), and a given fixed subclass relation on class names. We write $c_1 <: c_2$ if $c_1$ is a subclass of $c_2$.

3.1 The
Syntax of JvmcA JVM program is a collection of classes.We regards a collection of JVM classes as a pair(Θ,Π)of type specifica-tionsΘand method definitionsΠ.Θis a function assigning each class name a set of typedfield names(ranged over by f)and a set of typed method names(ranged over by m). Figure2gives the syntax ofΘ.{τ1,...,τn}⇒τis a methodΘ:={c=spec,...,c=sepc}spec:={methods={m:{τ1,...,τn}⇒τ,···}fields={f:τ,···}}τ:=int|void|c|αl|Figure2:Syntax ofΘtype with argument types{τ1,...,τn}and the return typeτ. is a special type representing unused environment entry. In this abstract syntax,we understand that the subclass re-lation is already incorporated so that the set offield names and the set of method names of a class contain all those defined in its super classes.We also assume that if some commonfiled names are used in super classes,then those names are properly disambiguated.Πassign each class name its method definitions.In an actual JVM classfile,each method body is a sequence of JVM instructions some of which have labels(ranged over by l)for branch instructions.We regard such a sequence as a labeled collection of basic blocks.Furthermore,we divide basic blocks into code blocks(ranged over by B)and sub-routine blocks(ranged over by SB).Subroutine blocks are those that are(originally)invoked by a subroutine call in-struction.We identify each subroutine block with its entry label and write SB(l s)for a subroutine block whose(orig-inal)entry point is l s.With this refinement,the syntax of method definitionsΠis given in given in Figure3,where i and n represent a local variabel and an integer value respec-tively.Some comments are in order.It is straightforward toΠ:={c=methods,...,c=methods} methods:={m=M,...,m=M}M:={l b:B,···|l s:SB(l s),···}B:=return|ireturn|areturn|jsr(l s,l b)|goto(l b)|I·BSB:=return|ireturn|areturn|jsr(l s,l s)|ret(i)|goto(l 
s)|I·SBI:=iconst(n)|iload(i)|aload(i)|istore(i)|astore(i)|dup|pop|iadd|ifeq(l)|new(c)|invoke(c,m)|getfield(c,f)|putfield(c,f)Figure3:Syntax ofΠconstruct a labeled collection of code blocks.Construction of subroutine blocks associated with an entry label can be done by traversing instruction sequence starting from a label l appearing in some jsr(l,l )and collecting all the reach-able basic blocks.For those basic blocks that are associated with more than one entry labels,we consider that separate copies exist for each entry labels.Since code is not mutable, any potions of B i and SB i in the result of this construction can be shared.So there is no danger of code size explosion.In JVM,subroutine call has the form jsr(l)·B.JVM pushes on the stack the address of B to be used as the return address by the callee.In order to develop a type system,we need to type a return address explicitly.For this reason,we regards jsr(l)·B as shorthand for jsr(l,l )with the introduction of a new labeled block l :B.3.2The Type SystemThe basic typing judgments are those for code blocks and subroutine blocks.As we have outlined in Section2,they are judgments of the form:•Γ,∆£B:τ•SB(l):(αl=Γ2,∆2£τin αl//Γ1,∆1£τ )Since these blocks contain labels for branch instructions, their derivation are defined relative to a label environment L of the formL={l b:Γ,∆£τ,...|l s:(αls=Γ2,∆2£τin αls//Γ1,∆1£τ ),...}specifying a typing of the code blocks and subroutine blocks of a given method.If S is a sequence,we write S.i for thei th element in S.Similarly,if S is a mapping and e is anelement in its domain,then S.e denotes the element assigned to e by S.Using these notations,static semantics of non-branching instructions is given in Figure4,and the typing rules for blocks are given in Figure5.In this definition,we simply assume that new(c)creates a complete object of class c.In an actual JVM,object creation is done in two stages byfirst creating a container by new and then initializing theirfields by the 
constructors.As observed in[4],there is subtle issues associated with this process.We believe that the mechanism proposed in[4]is orthogonal to3iconst(n):Γ,∆=⇒Γ,int·∆iload(i):Γ,∆=⇒Γ,int·∆(ifΓ(i)=int)aload(i):Γ,∆=⇒Γ,c·∆(ifΓ(i)=c)istore(i):Γ,int·∆=⇒Γ{i←int},∆astore(i):Γ,c·∆=⇒Γ{i←c},∆astore(i):Γ,αl·∆=⇒Γ{i←αl},∆dup:Γ,τ·∆=⇒Γ,τ·τ·∆iadd:Γ,int·int·∆=⇒Γ,int·∆pop:Γ,τ·∆=⇒Γ,∆new(c):Γ,∆=⇒Γ,c·∆getfield(c0,f):Γ,c1·∆=⇒Γ,τ·∆(ifΘ.c0.fields.f=τand c1<:c0)putfield(c0,f):Γ,τ·c1·∆=⇒Γ,∆(ifΘ.c0.fields.f=τand c1<:c0)invoke(c0,m):Γ,τn·...τ1·c1·∆=⇒Γ,τ0·∆(ifΘ.c0.methods.m={τ 1,···,τ n}=⇒τ0,τ0=void and c1<:c0∧τi<:τ i for all1≤i≤n)invoke(c0,m):Γ,τn·...·τ1·c1·∆=⇒Γ,∆(ifΘ.c0.methods.m={τ 1,···,τ n}=⇒voidand c1<:c0∧τi<:τ i for all1≤i≤n)Figure4:Static Semantics of Non-branching In-structionsour approach,and can be adopted in our type system as well.Another simplification we made is that all the neces-sary classfiles are available at the time of verification.This is reflected in the typing rules of getfield,putfield,and invoke,which refer toΘ.In an actual JVM,classes are dy-namically loaded.It is not hard to modify our type system to model dynamic class loading.Since those instructions in-clude the static type of the method orfield,we can simply use the specified type to verify the code block containing these instruction without referring to the classfiles.At the time of executing one of these instructions,which causes a class containing the method to be loaded,we infer the type of the method and verify that it is indeed equal to the one specified in the instruction.This process is easily formalized by using the mechanism of dynamic typing[1].Typing of a method M is then defined as follows.M:L⇔∀l b∈dom(M).L(l b)=Γ,∆£τ∧L Γ,∆£M(l b):τand∀l s∈dom(M).L M(l s):L(l s).We assume that a method M contains a block having a unique special label e indicating its entry point,and define the typing of a method as follows:M:{τ1,...,τn}⇒τ⇔∃L such that M:L andL Γ{0←c,1←τ1,···n←τn},φ£M.e:τ(Γ=Top(max M))where Top(n)is 
the type of environment of size nfilled with a meaningless type and max M is the number of local vari-ables used in the method M.We can now define the type correctness of a Jvmc programTyping rules for code blocks:L Γ,∆£ireturn:intL Γ,c·∆£areturn:cL Γ,∆£return:voidL Γ,∆£goto(l):τ(if L(l)=Γ,∆£τ)L Γ2,∆2£B:τL Γ1,∆1£I·B:τ(if I:Γ1,∆1=⇒Γ2,∆2)L Γ,∆£B:τL Γ,int·∆£ifeq(l)·B:τ(if L(l)=Γ,∆ τ) L Γ1,∆1£jsr(l1,l2):τ(if L(l1)=(αl1=Γ2,∆2£τin αl1//Γ1,αl1·∆1£τ ) and L(l2)=Γ2,∆2£τ)Typing rules for subroutine blocks:ret(x):(αl=Γ,∆£τin αl//Γ,∆£τ )(ifΓ(x)=αl)goto(l): αl//Γ,∆£τ (if L(l)= αl//Γ,∆£τ )return: αl//Γ,∆£voidireturn: αl//Γ,int·∆£intareturn: αl//Γ,c·∆£cjsr(l1,l2): αl//Γ,∆£τ(if L(l1)=(αl1=Γ ,∆ £τin αl1//Γ,αl1·∆£τ ) and L(l2)= αl//Γ ,∆ £τ )SB: αl//Γ2,∆2£τI·SB: αl//Γ1,∆1£τ(if I:Γ1,∆1=⇒Γ2,∆2) SB: αl//Γ,∆£τifeq(l)·SB: αl//Γ,int·∆£τ(if L(l)= αl//Γ,∆£τ )Figure5:Typing Rules for Blocks(Π,Θ)as the following property:Π:Θ⇔Dom(Π)=Dom(Θ),and∀c∈Dom(Π). Π.c.m:Θ.c.methods.m3.3Operational SemanticsWe establish that our type system is correct by formally proving the type soundness theorem with respect to an op-erational semantics of Jvmc.We let S range over runtime stacks,E range over runtime variable environments,and h range over heaps.Both S and E arefinite sequences of run-time values(ranged over by v).A heap h maps an address (ranged over by r)to a runtime representation of an object of the form f1=v1,...,f n=v n c where c is the class of the object.Possible runtime values are either natural num-bers n,an address r in a heap,or a return address adrs(l) which represents the entry address of a block named l and is used by a subroutine.We writeI:(S,E),h=⇒(S ,E ),hto indicate that I changes the machine state(S,E),h to (S ,E ),h .Fig.6gives this relation.Update(h,r,f,v)in the rule for putfield updates the ffield of a runtime repre-sentation of an object in the heap h pointed by r to v.The object created by new consists of default values⊥τof type τ.We assume that these values behave as ordinary 
values4of typeτin subsequent operation.This reflects our simpli-fying assumption mentioned earlier that new(c)creates an object of class c and we do not treat the issues of two stage object creation mechanism in JVM.iconst(n):(S,E),h=⇒(n·S,E),hiload(i):(S,E),h=⇒(E(i)·S,E),haload(i):(S,E),h=⇒(E(i)·S,E),histore(i):(n·S,E),h=⇒(S,E{i←n}),hastore(i):(r·S,E),h=⇒(S,E{i←r}),hastore(i):(adrs(l)·S,E),h=⇒(S,E{i←adrs(l)}),h dup:(v·S,E),h=⇒(v·v·S,E),hiadd:(n1·n2·S,E),h=⇒((n1+n2)·S,E),hpop:(v·S,E),h=⇒(S,E),hnew(c):(S,E),h=⇒(r·S,E),h(if h =h{r← f1=⊥τ1,...,f n=⊥τnc}Θ.c.fields={f1:τ1,...,f n:τn},and r/∈dom(h))getfield(c,f):(r·S,E),h=⇒(h(r).f·S,E),hputfield(c,f):(v·r·S,E),h=⇒(S,E),h(if h =h{r←Update(h,r,f,v)})Figure6:Dynamic Semantics of Non-branching In-structionsAn operational semantics of Jvmc is given through a set of rules similar to those of SECD machine[10]of the form (S,E,C,D),h−→(S ,E ,C ,D ),hC is a code of the form M{B}or M{SB}indicating that the machine executes the top instruction of code block B (or subroutine block SB)in a method body M.D is a dump,which is either emptyφ,or a sequence of saved ex-ecution frames of the form(S,E,C)·D.Note that these rules are taken with respect to a given type specificationsΘand method definitionsΠ.Fig.7gives the set of transition rules.In the rule for invoke,TopEnv(max(c,m))denotes an environment of size max(c,m)whose elements are spe-cial constant⊥ of a meaningless value having type ,and max(c,m)is the maximal local variable index used in the method m defined in the class c.3.4Type SoundnessWe are now in the position to prove type soundness the-orem.To do this,we define typing relations for various runtime objects used in Jvmc.Runtime values may form cycles and sharing through object pointers.To define value typing without resorting to co-induction,we follow[11]and define types of values relative to a heap type(ranged over by H)specifying the structure of a heap,which is a function from afinite set of heap addresses to types.Runtime 
values may also contain return addresses of the form adrs(l)which should be typed with a block type or a subroutine block type.This requires us to define value typing relative to a label environment L as well.We use the following typing relations.•L|=h:H h has a heap type H•L;H|=v:τv has typeτunder H•L;H|=S:∆S has a stack type∆under H(S,E,M{I·B},D),h−→(S ,E ,M{B},D),h (if I:(S,E),h=⇒(S ,E ),h ) (n·S,E,M{ireturn},(S0,E0,M0{B0})·D0)),h−→(n·S0,E0,M0{B0},D0),h(r·S,E,M{areturn},(S0,E0,M0{B0})·D0)),h−→(r·S0,E0,M0{B0},D0),h(S,E,M{return},(S0,E0,M0{B0})·D0)),h−→(S0,E0,M0{B0},D0),h(0·S,E,M{ifeq(l)·B},D),h−→(S,E,M{M(l)},D),h(n·S,E,M{ifeq(l)·B},D),h−→(S,E,M{B},D),h(if n=0)(S,E,M{goto(l)},D),h−→(S,E,M{M(l)},D),h(v n·...·v1·r·S,E,M{invoke(c,m)·B},D),h−→(φ,E ,M {M .entry},(S,E,M{B})·D)),h(if E =TopEnv(max(c,m)){0←c,1←τ1,···n←τn},Θ.c.methods.m={τ1,...,τn}⇒τ,andΠ.c.m=M )(S,E,M{ret(i)},D),h−→(S,E,M{M(E(i))},D),h(S,E,M{jsr(l1,l2)},D),h−→(adrs(l2)·S,E,M{M(l1)},D),hFigure7:Transition Rules of the Jvmc •L;H|=E:ΓE has an environment typeΓunder H These relations are given in Figure8.We note that,in theL;H|=n:intL;H|=r:τ(if H(r)<:τ)L;H|=adrs(l):αl(if L(l)=αlor L(l)= αl //Γ,∆£τ andαl =Γ,∆£τ) L|=h:H⇔dom(h)=dom(H)and∀r∈dom(h).if h(r)= f1=v1,...,f n=v n cthen c<:H(r)∧L;H|=v i:Θ.c.fields.f i for each i.L;H|=S:∆⇔dom(S)=dom(∆)∧L;H|=S.i:∆.i for each i.L;H|=E:Γ⇔dom(E)=dom(Γ)and L;H|=E.i:Γ.i for each i.Figure8:Typing of Runtime Values case of JVM,runtime objects are explicitly typed,and there-fore one can take H h for h such that H h(r)is the runtime tag of h(r).The resulting tying relation is the same as the one defined in[3].A key to establish type soundness with respect to SECD-style operational semantics is to define typing relation on dumps.This technique isfirst used in[16].We write H|= D:τto indicate that D has typeτunder H.Its intuitive meaning is that D accepts a value of typeτand resumes the saved computation.This relation is defined inductively onD in ing this relation,we define well 
typedness 5•H|=φ:τfor anyτ•H|=(S,E,M{B})·D:τ⇔∃Γ,∆,L,τ . M:L,L;H|=S:∆ ,L;H|=E:Γ, L Γ,∆£B:τ ,and H|=D:τ .where∆ =∆ifτ=void otherwise∆ =τ·∆•H|=(S,E,M{SB})·D:τ⇔∃Γ,∆,L,τ . M:L,L;H|=S:∆ ,L;H|=E:Γ, L SB: αl//Γ;∆£τ and H|=D:τ .where∆ =∆ifτ=void otherwise∆ =τ·∆Figure9:Typing of Dump Dof a machine state including a dump as follows.H (S,E,M{B},D),h⇔∃L,Γ,∆such that M:L,L|=h:H,L;H|=S:∆, L;H|=E:Γ,L Γ,∆£B:τ,and H|=D:τH (S,E,M{SB},D),h⇔∃L,Γ,∆such that M:L,L|=h:H,L;H|=S:∆, L;H|=E:Γ,L SB: αl//Γ,∆£τ ,and H|=D:τWe can now formally state type soundness as the following theorem.Theorem 1.Consider a Jvmc program(Π,Θ)such that Π:Θ.If H (S,E,M{C},D),h then either(1)C is one of return,ireturn,areturn,and D=φ,or(2)there are some S ,E ,M {C },D ,h ,and H such that H is an extension of H,(S,E,M{C},D),h−→(S ,E ,M {C },D ),h ,and H (S ,E ,M {C },D ),h .where C is B or SB.Proof.The proof uses following simmple lemma,which can be proved by simple case analysis.Lemma 1.If H|=v:τand H is an extension of H then H |=v:τ.Wefirst show the cases where C is a block B.Since |=Π:Θ,there is some L such that M:L.If H (S,E,M{B},D),h then there is someΓ,∆such that L|= h:H,L,H|=E:Γ,L,H|=S:∆,L Γ,∆£τand H|=D:τ.The proof proceeds by cases in terms of the first instruction of B.Case B=return.D=φor there is some S1,E1,M1,B1,D1 such that D=(S1,E1,M1{B1})·D1.The case for D=φis trivial.We assume that D=(S1,E1,M1{B1})·D1.By the transition rule,(S,E,M{return},(S1,E1,M1{B1})·D1),h −→(p·S1,E1,M1{B1},D1),h.By the type system,τ= void.By the typing rule for D,there are someΓ1,∆1,L1,τ1 such that M1:L1,L1,H|=S1:∆1,L1 Γ1,∆1£B1:τ1 and H|=D1:τ1.Case B=goto(l).By the transition rule,(S,E,M{goto(l)},D),h−→(S,E,M{M(l)},D),h.Since L(l)=Γ;∆£τby the definition of type system,L Γ,∆£M(l):τ.Case B=jsr(l1,l2).By the transition rule,(S,E,M{jsr(l1,l2)},D),h−→(adrs(l2)·S,E,M{M(l1)},D ),h.By the definition of the type system,since there aresomeΓ1,∆1such that L(l1)=(αl1=Γ1,∆1£τin αl1//Γ,αl1·∆£τ ),L(l2)=Γ1,∆1£τ,we have M(l1)=(αl1=Γ1;∆1£τin 
αl1//Γ,αl1·∆£τ ).Therefore we have only toshow L;H|=adrs(l2):αl1.Since L;H|=adrs(l2):L(l2)andαl1=L(l2),L;H|=adrs(l2):αl1.Case B=new(c)·B1.By the transition rule,(S,E,M{new(c)·B1},D),h−→(r·S,E,M{B1},D),h{r←f1=⊥τ1,...,f n=⊥τnc},Θ.c.fields={f1:τ1,···,f n:τn}and r/∈h.If we take H =H{r→c}then sincer is fresh H is a extension of H.Since L;H |=⊥τi:Θ.c.fieldes.f i(1≤i≤n),|=h :H .Then the resultfollows from Lemma1.Case B=getfield(c,f)·B1.By the definition of typesystem,there are someτ0,c0,∆ such that∆=τ0·c0·∆ ,c0<:c1,τ0<:Θ.c.fields.f.Consequently,there are somer0,s such that S=r0·S ,L;H|=r0:c0,L;H|=S :∆ .By the transition rule,(r0·S ,E,M{getfield(c,f)B1·},D)−→(v1·S ,E,M{B1},D1),h and v1=h(r0).f.By thedefinition of the type system,there is someτ1such thatL Γ;τ1·∆ £B1:τandτ1=Θ.c.fields.f.Since c0<:c,h(r0).f=Θ.c.fields.f,and therefore L;H|=v1:τ1.Case B=invoke(c,m)·B1.By the definition of thetype system,there are someτ1,···,τn,c0,∆ such that∆=τn·...·τ1·c0·∆ .Also,there are some v n,···,v n,v0,S suchthat L;H|=τn·...·τ1·c0·∆ :v n·...·v1·r·S ∧S=v n·...·v1·r·S Then we have:(v n·...·v1·r·S ,E,M{invoke(c,m)·B1},D),h−→(φ,E1,M1{M1.e},(S ,E,M{B1})·D),h such thatE1=T opEnv(max(c,m)){0→r,1→v1,···,n→v n},andΘ.c.methods.m={τ1,···,τn}⇒τ .We distinguishcases whetherτ is void or not.Here we only show that se forτ =void.The other case is similar.Since L Γ,∆ £B1:τby the definition of type system,H|=(S ,E,M{B1})·D:void.By Π:Θ,there is some L1such that M1:L1,L1T op(max e){0→c0,1→τ1,···,n→τn},φ£M1.entry:void.Due to the definition,L1;H1|=T opEnv(max(c,m)){0→r,1→v1,···,n→v n}:T op(max e){0→c,1→τ1,···,n→τn}.The other cases for blocks are simpler.The cases when C is a subroutine blocks can be shownsimilary by case analysis in terms of thefirst instruction.We only show the case for SB=jsr(l1,l2).By the def-inition of type system,L jsr(l1,l2): αl//Γ;∆£τ andL(l1)=(αl1=Γ1;∆1£τin αl1//Γ;αl1·∆£τ ),L(l2)= αl//Γ1;∆1£τ .If M:L,L|=h:H,L;H|=S:∆,L;H|=E:Γ,and H|=D:τthen we haveH (S,E,M{jsr(l1,l2)},D),h.By 
transition rule, $(S, E, M\{\mathrm{jsr}(l_1, l_2)\}, D), h \longrightarrow (\mathrm{adrs}(l_2) \cdot S, E, M\{M(l_1)\}, D), h$. Since $L \vdash M(l_1) : L(l_1)$, we have only to show $L; H \models \mathrm{adrs}(l_2) : \alpha_{l_1}$. This follows from the above equations for $L(l_2)$ and $L(l_1)$.

This theorem implies that a well typed machine state is either the halting state or a state such that the machine can execute one step transition and produce another well typed machine state. This immediately guarantees that a well typed program never goes wrong, and when the machine terminates, the top element of the stack is a value of correct type specified by the type of the program.

4. POLYMORPHISM AND TYPE INFERENCE

In order to use the type system we have developed as a framework for static verification of type safety of Jvmc code, two further extensions are necessary. One is polymorphic
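
As a rough illustration of how the static semantics of non-branching instructions (Figure 4) drives the typing of a block (Figure 5), the following Python sketch checks a straight-line block over a small subset of instructions. It is not the authors' verifier: the tuple encoding of instructions, the names `step`, `type_of_block` and `TOP`, and the restriction to int-only instructions without labels, subroutines or objects are all assumptions made for this example. For ireturn it follows the Section 2 reading, which requires int on top of the stack.

```python
from typing import Tuple

Env = Tuple[str, ...]    # Gamma: types of the local variables
Stack = Tuple[str, ...]  # Delta: types of the stack entries, top of stack first

TOP = "top"  # hypothetical name for the type of an unused environment entry


def step(instr: tuple, env: Env, stack: Stack) -> Tuple[Env, Stack]:
    """Return (Gamma2, Delta2) with instr : Gamma1, Delta1 ==> Gamma2, Delta2.

    Raises TypeError when the side condition of the rule does not hold.
    """
    op = instr[0]
    if op == "iconst":
        return env, ("int",) + stack
    if op in ("iload", "istore"):
        i = instr[1]
        if not 0 <= i < len(env):
            raise TypeError(f"{op}: local variable {i} out of range")
        if op == "iload":
            if env[i] != "int":
                raise TypeError(f"iload: local {i} has type {env[i]}, not int")
            return env, ("int",) + stack
        if not stack or stack[0] != "int":
            raise TypeError("istore: int expected on top of the stack")
        # Gamma{i <- int}: replace the i-th entry of the environment type
        return env[:i] + ("int",) + env[i + 1:], stack[1:]
    if op == "dup":
        if not stack:
            raise TypeError("dup: empty stack")
        return env, (stack[0],) + stack
    if op == "pop":
        if not stack:
            raise TypeError("pop: empty stack")
        return env, stack[1:]
    if op == "iadd":
        if stack[:2] != ("int", "int"):
            raise TypeError("iadd: two ints expected on top of the stack")
        return env, ("int",) + stack[2:]
    raise TypeError(f"unsupported instruction: {op}")


def type_of_block(block: list, env: Env, stack: Stack) -> str:
    """Compute tau such that Gamma, Delta |> block : tau for a straight-line block."""
    for instr in block:
        if instr[0] == "ireturn":
            if not stack or stack[0] != "int":
                raise TypeError("ireturn: int expected on top of the stack")
            return "int"
        if instr[0] == "return":
            return "void"
        # Left-rule read backward: typing I.B reduces to typing B in the new state
        env, stack = step(instr, env, stack)
    raise TypeError("block does not end with a return instruction")
```

For instance, `type_of_block([("iconst", 1), ("istore", 1), ("iload", 1), ("ireturn",)], (TOP, TOP), ())` yields `"int"`, while loading from a local variable that still has type `TOP` is rejected with a `TypeError`.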
General Terms and Conditions Manual
CGS-CIMB Securities (Singapore) Pte Ltd – General Terms and Conditions 银河-联昌证券(新加坡)私人有限公司–一般条款和条件THIS DOCUMENT states the terms and conditions which govern the relationship between CGS-CIMB Securities (Singapore) Pte. Ltd. (“CGS-CIMB ”) and the applicant or applicants for the Account (as hereafter defined) (the “Client ”). 本文件阐述了银河-联昌证券(新加坡)私人有限公司(“CGS-CIMB ”)与申请人或账户申请人(如下文所定义)(简称“客户”)的关系的条款和条件。
P art A: Definition A 章:定义 1. Definitions 定义1.1 Unless the context otherwise requires or if specifically defined in the relevant part of these terms and conditions,the following words or expressions in these terms and conditions shall have the following meanings:除非上下文另有规定,或在这些条款和条件的相关部分中明确定义,否则这些术语或条件下的下列单词或表达式应具有以下含义:“Account ” means such account, including any sub-account, as may be necessary and expedient for the performance of Transactional Services, including but not limited to the Cash Trading Account, the Margin Trading Account, the Securities Borrowing Account, the Securities Lending Account, the CFD Account (as defined in Clause 62.1), the Investment Advisory Account, and the Multi-currency Trust Account;“账户”是指此类账户,包括任何子账户,以交易服务的性能是必要的和适当的,包括但不限于现金交易账户,保证金交易账户,证券借入账户,证券借出账户,CFD 账户(如第62.1条文定义)、投资咨询账户、和多币种信托账户;“Affiliate” means (i) a related corporation (as defined in the Companies Act (Cap 50)) of CGS-CIMB; (ii) CGS-CIMB Securities Sdn. Bhd. and its related corporations (as defined in the Companies Act (Cap 50)); (iii) a member of the CGI Group; and/or (iv) a member of the CIMB Group;“关联公司”指的是(i )与CGS-CIMB 相关的企业(如公司法规定(第50章));(ii )CGS-CIMB Securities Sdn. 
Bhd.及其关联公司(根据公司法令(第50章)的定义);(iii )CGI 集团成员;和/或(iv )CIMB 集团股东;“Amount Financed ” means the amount owed by the Client in the Margin Trading Account and shall include (a) amounts financed by CGS-CIMB in respect of outstanding purchases made for the Margin Trading Account net of the Cash Collateral and sales proceeds receivable from outstanding sales made in the Margin Trading Account of the Client; (b) all commission charges, interest expenses and all other related expenses; and (c) such other amount as CGS-CIMB may include for the purpose of determining the amount financed;“资金数额”是指由客户所欠的金额在保证金交易账户和金额应包括(a )由CGS-CIMB 的购买为保证金交易账户净应收现金抵押品和销售收入的优秀销售在客户的保证金交易账户;(b )中所有佣金,利息费用及其他相关费用;(c )其他金额如CGS-CIMB 可能包括用于确定融资金数额;“Authorised Person ” means a person authorised in writing by the Client to provide instructions to CGS-CIMB in relation to Transactions on behalf of the Client, and whose instructions will be accepted by CGS-CIMB and are binding on the Client;“授权人”指的是委托人在书面授权的情况下,为客户提供有关交易的指示,其指示将由CGS-CIMB 接受,并对客户有约束力;“Authority ” means the Monetary Authority of Singapore; “管理局”指的是新加坡的金融管理局;“Base Currency ” means Singapore Dollars; “基础货币”指的是新加坡元;“Business Day ” means any day on which CGS-CIMB is open for business in Singapore; “营业日”指银河-联昌在新加坡营业的任何一天;“CAR ” means Client Account Review; “CAR ”系指客户账户审核;“Cash Collateral ” means Collateral that takes the form of a deposit of cash; “现金抵押品”是指以保证金形式支付的抵押品;Contents 目录 Part A : A 章 : Definition 定义2 Part B : B 章 : Terms Applicable Generally 一般条款的适用 9 Part C : C 章 : Trading In Securities 证券交易22 Part D : D 章 : Financial Advisory Services 财务咨询服务24 Part E : E 章 : Custodian And Nominee Services 保管人和提名人服务26 Part F : F 章 : Securities Borrowing And Lending 证券借入和借出 29 Part G : G 章 : Margin Trading Account 保证金交易账户39 Part H : H 章 : Contracts For Difference 差价合约44 Part I : I 章 : Multi-currency Trust Account 多币种信托账户70 Part J : J 章 : Transactions In Foreign Exchanges 外汇交易71 Part K : K 章 : Electronic Communications 电子通信 71 Part L : L 章 : Online Services 
网上服务73 Part M : M 章 : Electronic Payment For Securities 证券电子支付 79 Part N : N 章 : Personal Data 个人资料80 Part O : O 章: Miscellaneous Provisions 杂项规定83 Schedule I : 附表1 : Risk Disclosure Statement 风险披露声明91 Schedule II : 附表2 : Guide And Caution Note: Applying/Maintaining A Trading Account 指引和注意事项:申请/维持交易账户106“Cash Trading Account” means the Account (other than the CFD Account, the Margin Trading Account, the Securities Borrowing Account and the Securities Lending Account) designated by CGS-CIMB through which the Transactions are to be effected;“现金交易账户”指的是由CGS-CIMB指定的账户(CFD账户,保证金交易账户,证券借入账户和证券借出账户除外);“CDP” means The Central Depository (Pte) Limited;“CDP”指中央存管(私人)有限公司;“CFD” means contracts for difference;“CFD”是指差价合约;“Charged Securities” means the Collateral or marketable Securities provided by the Client (and which CGS-CIMB agrees to accept as security for the availability of or continued availability of the Margin Financing Facility) including, without limitation, all or any securities, rights, moneys and properties whatsoever which may at any time after the date hereof be derived from, accrued on or be offered in respect of, any of the Charged Securities; “抵押证券”系指客户提供的抵押品或有价证券(以及CGS-CIMB作为可用的安全或保证金融资设施的持续可用性),包括但不限于所有或任何证券,权利,款项和任何在该日期之后的任何时间可从任何已抵押证券而衍生,计提或提供的财产;“CGI Group” means China Galaxy International Financial Holdings Limited and its related corporations (as defined in the Companies Act (Cap 50));“CGI集团”指的是中国银河国际金融控股有限公司及其相关公司(如公司法规定(第50章));“CIMB Group” means CIMB Group Sdn. Bhd. and its related corporations (as defined in the Companies Act (Cap 50));“CIMB集团”指的是联昌国集团限公司。
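Set Out General Terms and Conditions: A Reply
In this article of 1,500 to 2,000 words, we answer, step by step, questions related to terms and conditions.
Terms and conditions are part of an agreement or contract; they set out the rights and responsibilities of the parties.
They usually play an important role when goods are purchased or services are used.
Below, we cover what terms and conditions are, why they matter, how to draft and amend them, and how to respond when they are breached.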
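I. What are terms and conditions?
Terms and conditions are a document that sets out the rules and provisions both parties must follow during a transaction.
They usually exist in written form and can be found in purchase contracts, website terms of use, service agreements, and other commercial contracts.
Terms and conditions usually include the following:
1. Definitions and interpretation: terms and conditions usually give definitions and explanations so that both parties share the same understanding of the terms and concepts used.
2. Fees and payment: terms and conditions usually specify the price of the goods or services, as well as the payment method and deadline.
3. Rights and responsibilities: terms and conditions set out each party's rights and responsibilities, such as ownership of the goods, the scope of the services provided, and quality requirements.
4. Term and termination of service: where a subscription service or long-term contract is involved, terms and conditions specify the term and conditions of the service and how it may be terminated early.
5. Breach and dispute handling: terms and conditions usually specify the measures taken upon breach and the dispute resolution procedure, such as the governing law and court jurisdiction.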
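II. Why are terms and conditions important?
Terms and conditions play a vital role in commercial transactions. They serve several important purposes:
1. Ensuring fairness and transparency: terms and conditions make a transaction fair and transparent by setting out each party's rights and responsibilities, preventing one party from treating the other unfairly.
2. Reducing disputes: terms and conditions lay down the rules and conditions of the transaction, which reduces the likelihood of disputes.
When a dispute does arise, the parties can resolve it according to the terms and conditions.
3. Protecting both parties' interests: terms and conditions ensure that each party's interests are protected during the transaction, for example regarding the quality of goods and the scope of services.
Each party can hold the other liable according to the terms and conditions.
4. Providing legal protection: terms and conditions usually contain clauses on governing law and jurisdiction, so that the parties can seek protection under the law when a dispute arises.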
Woodpulp General terms (ChineseTranslation-1)
GENERAL TERMS AND CONDITIONS OF WOOD PULP木浆贸易通用条款1. PREAMBLE导言These General Trade Rules shall apply, except when altered by express agreement accepted in writing by both the seller and the buyer.合同之买卖双方除另有彼此接受的书面协议外,其余皆应以本通用条款为准。
2. QUANTITY: WEIGHT AND MOISTURE数量:重量和水分Unless otherwise stated, the word tonne or ton in this contract shall mean 1,000 kilogrammes air-dry weight gross for net. The term air-dry shall mean ninety per cent (90 %) absolutely dry pulp and ten per cent (10 %) water.除非另有说明,合同中所指之“吨”应为1000千克净空干总重量。
条款“空干”所指应为百分之九十绝干木浆和百分之十的水分。
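
As a worked illustration of the air-dry definition above (90 % absolutely dry pulp, 10 % water), the following Python sketch converts a weighed quantity to air-dry tonnes. It is only an arithmetic illustration under stated assumptions: the function name `air_dry_tonnes`, its parameters, and the sample figures in the usage note are hypothetical and are not taken from the contract.

```python
AIR_DRY_FRACTION = 0.90  # air-dry pulp = 90 % absolutely dry pulp + 10 % water


def air_dry_tonnes(weighed_kg: float, dry_content: float) -> float:
    """Convert a weighed quantity of pulp to air-dry tonnes.

    weighed_kg:  weight as weighed, in kilogrammes (hypothetical input)
    dry_content: fraction of that weight that is absolutely dry pulp, 0..1
    """
    if not 0.0 <= dry_content <= 1.0:
        raise ValueError("dry_content must be a fraction between 0 and 1")
    absolutely_dry_kg = weighed_kg * dry_content
    # Scale the absolutely dry weight up to the 90/10 air-dry basis,
    # then convert kilogrammes to tonnes of 1,000 kg.
    return absolutely_dry_kg / AIR_DRY_FRACTION / 1000.0
```

Under these assumptions, 1,000 kg of pulp at exactly 90 % dry content corresponds to about one air-dry tonne, while the same weight at 81 % dry content corresponds to about 0.9 air-dry tonnes.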
The pulp shall be packed in bales of declared uniform weight and air-dry content or a specification to be given stating the weight and air-dry content and number of each bale. Each bale shall bear a number or other identification mark to enable the time of manufacture to be determined by the seller in case of need.木浆采用标准打包包装,注明均一重量和空干度。
General Terms一般词汇
General Terms一般词汇:manager 经纪人instructor 教练,技术指导guide 领队trainer 助理教练referee, umpire (网球.棒球)裁判linesman, touch judge (橄榄球)裁判contestant, competitor, player 运动员professional 职业运动员amateur 业余运动员,爱好者enthusiast, fan 迷,爱好者favourite 可望取胜者(美作:favorite)outsider 无取胜希望者championship 冠军赛,锦标赛champion 冠军record 纪录record holder 纪录创造者ace 网球赛中的一分Olympic Games, Olympics 奥林匹克运动会Winter Olympics 冬季奥林匹克运动会stadium 运动场track 跑道ring 圈ground, field 场地pitch (足球、橄榄球)场地court 网球场team, side 队竞技性运动competitive sport用粉笔记下(分数等);达到,得到chalk up 出名make one's mark体育项目(尤指重要比赛)event体育PE (Physical Education)体格、体质physique培训groom余的,带零头的odd年少者junior残疾人the handicapped/disabled学龄前儿童preschool全体;普通;一般at large平均寿命life expectancy复兴revitalize使有系统;整理systemize历史悠久的time-honored跳板spring-board秋千swing石弓,弩crossbow(比赛等的)观众spectator取得进展make headway体育大国/强国sporting/sports power与...有关系,加入be affiliated to/with落后lag behind武术martial arts五禽戏five-animal exercises体育运动physical culture and sports增强体质to strengthen one's physique可喜的,令人满意的gratifying称号,绰号label涌现出来to come to the fore源源不断a steady flow of队伍contingent又红又专/思想好,业务精to be both socialist-minded and vocationally proficient 体育界sports circle(s)承担义务to undertake obligation黑马dark horse冷门an unexpected winner; dark horse爆冷门to produce an unexpected winner发展体育运动,增强人民体质Promote physical culture and build up the people's health锻炼身体,保卫祖国Build up a good physique to defend the country为祖国争光to win honors for the motherland胜不骄,败不馁Do not become cocky/be dizzy with success, nor downcast over/discouraged by defeat.体育道德sportsmanship打出水平,打出风格up to one's best level in skill and style of play竞技状态好in good form失常to lose one's usual form比分领先to outscore打成平局to draw/to tie/to play even/to level the score失利to lose中华人民共和国运动委员会(国家体委)Physical Culture and Sports Commission of the PRC (State Physical Culture and Sports Commission)中华全国体育总会All-China Sports Federation国际奥林匹克委员会International Olympic Committee少年业余体育学校youth spare-time sports school, youth amateur athletic school 辅导站coaching center体育中心sports 
center/complex竞赛信息中心competition information center运动会sports meet; athletic meeting; games全国运动会National Games世界大学生运动会World University Games; Universiade比赛地点competition/sports venue(s)国际比赛international tournament邀请赛invitational/invitational tournament锦标赛championship东道国host country/nation体育场stadium; sports field/ground体育馆gymnasium, gym; indoor stadium比赛场馆competition gymnasiums and stadiums练习场馆training gymnasiums操场playground; sports ground; drill ground体育活动sports/sporting activities体育锻炼physical training体育锻炼标准standard for physical training体育疗法physical exercise therapy; sports therapy广播操setting-up exercises to music课/工间操physical exercises during breaks体育工作者physical culture workers, sports organizer运动爱好者sports fan/enthusiast观众spectator啦啦队cheering-section啦啦队长cheer-leader国家队national team种子队seeded team主队home team客队visiting team教练员coach裁判员referee, umpire裁判长chief referee团体项目team event单项individual event男子项目men's event女子项目women's event冠军champion; gold medalist全能冠军all-round champion亚军running-up; second; silver medalist第三名third; bronze medalist世界纪录保持者world-record holder运动员athlete; sportsman种子选手seeded player; seed优秀选手top-ranking/topnotch athlete田径运动track and field; athletics田赛field events竞赛track events跳高high jump撑杆跳高pole jump; polevault跳远long/broad jump三级跳远hop, step and jump; triple jump标枪javelin throw铅球shot put铁饼discus throw链球hammer throw马拉松赛跑Marathon (race)接力relay race; relay跨栏比赛hurdles; hurdle race竞走walking; walking race体操gymnastics自由体操floor/free exercises技巧运动acrobatic gymnastics垫上运动mat exercises单杠horizontal bar双杠parallel bars高低杠uneven bars; high-low bars吊环rings跳马vaulting horse鞍马pommel horse平衡木balance beam球类运动ball games足球football; soccer足球场field; pitch篮球basketball篮球场basketball court排球volleyball乒乓球table tennis; ping pong乒乓球拍racket; bat羽毛球运动badminton羽毛球shuttlecock; shuttle球拍racket网球tennis棒球baseball垒球softball棒/垒球场baseball(soft ball)field/ground手球handball手球场handball field曲棍球hockey; field hockey冰上运动ice sports冰球运动ice hockey冰球场rink冰球puck; rubber速度滑冰speed skating花样滑冰figure 
skating冰场skating rink; ice rink人工冰场artificial ice stadium滑雪skiing速度滑雪cross country ski racing高山滑雪alpine skiing水上运动water/aquatic sports水上运动中心aquatic sports center水球(运动)water polo水球场playing pool滑水water-skiing冲浪surfing游泳swimming游泳池swimming pool游泳馆natatorium自由泳freestyle; crawl (stroke)蛙泳breaststroke侧泳sidestroke蝶泳butterfly (stroke)海豚式dolphin stroke/kick蹼泳fin swimming跳水diving跳台跳水platform diving跳板跳水springboard diving赛艇运动rowing划艇/皮艇canoeing帆船运动yachting; sailing赛龙船dragon-boat racing室内运动indoor sports举重weightlifting重量级heavyweight中量级middleweight轻量级lightweight拳击boxing摔跤wrestling击剑fencing射击shooting靶场shooting range射箭archery拳术quanshu; barehanded exercise; Chinese boxing气功qigong; breathing exercises自行车运动cycling; cycle racing赛车场(自行车等的)倾斜赛车场cycling track室内自行车赛场indoor velodrome摩托运动motorcycling登山运动mountaineering; mountain-climbing骑术horsemanship赛马场equestrian park国际象棋(international) chess特级大师grandmaster象棋xiangqi; Chinese chess。
英语词汇库{英语专题分类词汇(地理Geography)}
1. General Terms 一般词汇:physical geography 自然地理;economic geography 经济地理;geopolitics 地理政治论;geology 地质学;ethnography 民族志;cosmography 宇宙志;cosmology 宇宙论;toponymy 地名学;oceanography 海洋学;meteorology 气象学;orography 山志学;hydrography 水文学;vegetation 植被;relief 地形/地貌;climate 气候;earth 地球/大地;Universe/cosmos 宇宙;world 世界;globe 地球仪;earth's crust 地壳;continent 大陆;terra firma 陆地;coast 海岸;archipelago 群岛;peninsula 半岛;island 岛;plain 平原;valley 谷地;meadow (小)草原;prairie (大)草原;lake 湖泊;pond 池塘;marsh/bog/swamp 沼泽;small lake 小湖;lagoon 泻湖;moor/moorland 荒原;desert 沙漠;dune 沙丘;oasis 绿洲;savanna (南美)大草原;virgin forest 原始森林;steppe 大草原;tundra 冻原。
2. Cartography 地图绘制法:map 地图;map of the world 世界地图;planisphere 平面地形图;wall map 挂图;skeleton map 示意图;map/plan/chart 平面图;atlas 地图集;chart 海图;cadastre 地籍簿;topography 测绘学;photogrammetry 摄影测量学;world political 世界政区;world physical 世界地形;world communication 世界交通;world time zone 世界时区;geomorphy 地貌;scale 比例尺;legend 图例;city map 城市图;populated localities 居民点;capital 首都/首府;capital of first-order political unit 一级行政中心;main city/major city 主要城市;common city 一般城市;village 村;town 镇/乡;city boundary 市界;boundary 境界;continental boundary 洲界;international boundary 国界;undefined international boundary 未定国界;regional boundary 地区界;boundary of first-order political unit 一级政区界;communication 交通;railroad/railway 铁路;expressway/motorway 高速公路;highway 公路;shipping route 航海线;airport 机场;port 港口;stream system 水系;river 河流;waterfall 瀑布;salt lake 咸水湖/盐湖;lake 湖泊;seasonal lake 时令湖;seasonal river 时令河;reservoir 水库;dam 水坝;water pipeline 输水管;well 井;spring 泉;hot spring 温泉;swamp 沼泽;desert 沙漠;peak 山峰;volcano 火山;coral reefs 珊瑚礁;historic site 古迹;pass 山口/关隘;the Great Wall 长城;pyramid 金字塔;magnetic pole 地磁极;city blocks 城市街区;main street/major street 主要街道;secondary street 次要街道;important building 重要建筑物;building 独立建筑;college or university 高等院校;national park 国家公园;military installation 军用设施;space centre or scientific research centre 航天或科研中心;lighthouse 灯塔;monument 纪念碑;stadium 体育场;gymnasium 体育馆;park 公园;bridge 桥梁;green land 绿地;cemetery 墓地。
General Terms & Conditions
一般条款和条件
1. Interpretation 解释
“Acceptance” means the process whereby Seller demonstrates to Buyer that the Supplies meet the requirements of the Agreement by successful completion of the Acceptance Tests; “验收”是指卖方通过成功完成验收测试向买方展示供应品符合本协议要求的过程。
“Acceptance Date” means the date on which Buyer agrees that the Supplies have successfully completed the Acceptance Tests as defined in the “Acceptance Tests and Forms” Appendix. The Supplies shall be deemed installed and accepted upon successful completion of the Acceptance Tests, as notified in writing by Buyer to Seller, or upon first use of the Supplies by Buyer, whichever occurs first; “验收日期”是指买方同意供应品成功完成附件“验收测试和表格”中定义的验收测试的日期。
买方书面通知卖方供应品验收测试成功完成时,或者买方首次使用供应品时,应视作供应品安装完毕并被接受,以先发生者为准。
“Acceptance Tests” are the tests listed in the “Acceptance Tests and Forms” Appendix; “验收测试”是指附件“验收测试和表格”中列出的测试。
经济学英语词汇1、General-terms-一般术语
经济学英语词汇1、General-terms-一般术语经济学英语词汇1、General terms 一般术语economist 经济学家socialist economy 社会主义经济capitalist economy 资本主义经济collective economy 集体经济planned economy 计划经济controlled economy 管制经济rural economics 农村经济liberal economy 自由经济mixed economy 混合经济political economy 政治经济学protectionism 保护主义autarchy 闭关自守primary sector 初级成分private sector 私营成分,私营部门public sector 公共部门,公共成分economic channels 经济渠道economic balance 经济平衡economic fluctuation 经济波动economic depression 经济衰退economic stability 经济稳定economic policy 经济政策economic recovery 经济复原understanding 约定concentration 集中holding company 控股公司trust 托拉斯cartel 卡特尔rate of growth 增长economic trend 经济趋势economic situation 经济形势infrastructure 基本建设standard of living 生活标准,生活水平purchasing power, buying power 购买力scarcity 短缺stagnation 停滞,萧条,不景气underdevelopment 不发达underdeveloped 不发达的developing 发展中的2、Capital 资本initial capital 创办资本frozen capital 冻结资金frozen assets 冻结资产fixed assets 固定资产real estate 不动产,房地产circulating capital, working capital 流动资本available capital 可用资产capital goods 资本货物reserve 准备金,储备金calling up of capital 催缴资本allocation of funds 资金分配contribution of funds 资金捐献working capital fund 周转基金revolving fund 循环基金,周转性基金contingency fund 意外开支,准备金reserve fund 准备金buffer fund 缓冲基金,平准基金sinking fund 偿债基金investment 投资,资产investor 投资人self-financing 自筹经费,经费自给bank 银行current account 经常帐户(美作:checking account) current-account holder 支票帐户(美作:checking-account holder)cheque 支票 (美作:check) bearer cheque, cheque payable to bearer 无记名支票,来人支票crossed cheque 划线支票traveller's cheque 旅行支票chequebook 支票簿,支票本(美作:checkbook) endorsement 背书transfer 转让,转帐,过户money 货币issue 发行ready money 现钱cash 现金ready money business, no credit given 现金交易,概不赊欠change 零钱banknote, note 钞票,纸币(美作:bill)to pay (in) cash 付现金domestic currency, local currency] 本国货币convertibility 可兑换性convertible currencies 可自由兑换货币exchange rate 汇率,兑换率foreign exchange 外汇floating exchange rate 浮动汇率free exchange rates 自由汇兑市场foreign exchange certificate 外汇兑换券hard currency 硬通货speculation 投机saving 储装,存款depreciation 减价,贬值devaluation (货币)贬值revaluation 
重估价runaway inflation 无法控制的通货膨胀deflation 通货紧缩capital flight 资本外逃securities business 证券市场stock exchange 股票市场stock exchange corporation 证券交易所stock exchange 证券交易所,股票交易所quotation 报价,牌价share 股份,股票shareholder, stockholder 股票持有人,股东dividend 股息,红利cash dividend 现金配股stock investment 股票投资investment trust 投资信托stock-jobber 股票经纪人stock company, stock brokerage firm 证券公司securities 有价证券share, common stock 普通股preference stock 优先股income gain 股利收入issue 发行股票par value 股面价格, 票面价格bull 买手, 多头bear 卖手, 空头assigned 过户opening price 开盘closing price 收盘hard times 低潮business recession 景气衰退doldrums 景气停滞dull 盘整ease 松弛raising limit 涨停板break 暴跌bond, debenture 债券Wall Street 华尔街3、Credit 信贷short term loan 短期贷款long term loan 长期贷款medium term loan 中期贷款lender 债权人creditor 债权人debtor 债务人,借方borrower 借方,借款人borrowing 借款interest 利息rate of interest 利率discount 贴现,折扣rediscount 再贴现annuity 年金maturity 到期日,偿还日amortization 摊销,摊还,分期偿付redemption 偿还insurance 保险mortgage 抵押allotment 拨款short term credit 短期信贷consolidated debt 合并债务funded debt 固定债务,长期债务floating debt 流动债务drawing 提款,提存aid 援助allowance, grant, subsidy 补贴,补助金,津贴4、Pruduction 生产output 产出,产量producer 生产者,制造者productive, producing 生产的products, goods 产品consumer goods 消费品article 物品,商品manufactured goods, finished goods 制成品,产成品raw product 初级产品semifinished goods 半成品by-product 副产品foodstuffs 食品raw material 原料supply 供应,补给input 投入productivity 生产率productiveness 赢利性overproduction 生产过剩5、Expenses 耗费cost 成本,费用expenditure, outgoings 开支,支出fixed costs 固定成本overhead costs 营业间接成本overheads 杂项开支,间接成本operating costs 生产费用,营业成本operating expenses 营业费用running expenses 日常费用,经营费用miscellaneous costs 杂项费用overhead expenses 间接费用,管理费用upkeep costs, maintenance costs 维修费用,养护费用transport costs 运输费用social charges 社会负担费用contingent expenses, contingencies 或有费用apportionment of expenses 分摊费用6、Profit 利润income 收入,收益earnings 利润,收益gross income, gross earnings 总收入,总收益gross profit, gross benefit毛利,总利润,利益毛额net income 纯收益,净收入,收益净额average income 平均收入national income 国民收入profitability, profit earning capacity 利润率,赢利率yield 
产量收益,收益率increase in value, appreciation 增值,升值7、taxes 税duty 税taxation system 税制taxation 征税,纳税fiscal charges 财务税收progressive taxation 累进税制graduated tax 累进税value added tax 增值税income tax 所得税land tax 地租,地价税excise tax 特许权税basis of assessment 估税标准taxable income 须纳税的收入fiscality 检查tax-free 免税的tax exemption 免税taxpayer 纳税人tax collector 收税员8、Internal economic and trade orgnization 国际经济与贸易组织China Council for the Promotion of International Trade, C.C.P.I.T. 中国国际贸易促进委员会National Council for US-China Trade 美中贸易全国理事会Japan-China Economic Association 日中经济协会Association for the Promotion of International Trade,Japan 日本国际贸易促进会British Council for the Promotion of International Trade 英国国际贸易促进委员会International Chamber of Commerce 国际商会International Union of Marine Insurance 国际海洋运输保险协会International Alumina Association 国际铝矾土协会Universal Postal Union, UPU 万国邮政联盟Customs Co-operation Council, CCC 关税合作理事会United Nations Trade and Development Board 联合国贸易与发展理事会Organization for Economic cooperation and Development, DECD 经济合作与开发组织European Economic Community, EEC, European Common Market 欧洲经济共同体European Free Trade Association, EFTA 欧洲自由贸易联盟European Free Trade Area, EFTA 欧洲自由贸易区Council for Mutual Economic Aid, CMEA 经济互助委员会Eurogroup 欧洲集团Group of Ten 十国集团Committee of Twenty(Paris Club) 二十国委员会Coordinating Committee, COCOM 巴黎统筹委员会Caribbean Common Market, CCM, Caribbean Free-Trade Association, CARIFTA 加勒比共同市场(加勒比自由贸易同盟)Andeans Common Market, ACM, Andeans TreatyOrganization, ATO 安第斯共同市场Latin American Free Trade Association, LAFTA 拉丁美洲自由贸易联盟Central American Common Market, CACM 中美洲共同市场African and Malagasy Common Organization, OCAM 非洲与马尔加什共同组织East African Common Market, EACM 东非共同市场Central African Customs and Economic Union, CEUCA 中非关税经济同盟West African Economic Community, WAEC 西非经济共同体Organization of the Petroleum Exporting Countries, OPEC 石油输出国组织Organization of Arab Petroleum Exporting Countries, OAPEC 阿拉伯石油输出国组织Commonwealth Preference Area 英联邦特惠区Centre National du Commerce Exterieur, National Center of External Trade 
法国对外贸易中心People's Bank of China 中国人民银行Bank of China 中国银行International Bank for Reconstruction and development, IBRD 国际复兴开发银行World Bank 世界银行International Development association, IDA 国际开发协会International Monetary Found Agreement 国际货币基金协定International Monetary Found, IMF 国际货币基金组织European Economic and Monetary Union 欧洲经济与货币同盟European Monetary Cooperation Fund 欧洲货币合作基金Bank for International Settlements, BIS 国际结算银行African Development Bank, AFDB 非洲开发银行Export-Import Bank of Washington 美国进出口银行National city Bank of New York 花旗银行American Oriental Banking Corporation 美丰银行American Express Co. Inc. 美国万国宝通银行The Chase Bank 大通银行Inter-American Development Bank, IDB 泛美开发银行European Investment Bank, EIB 欧洲投资银行Midland Bank,Ltd. 米兰银行United Bank of Switzerland 瑞士联合银行Dresden Bank A.G. 德累斯敦银行Bank of Tokyo,Ltd. 东京银行Hongkong and Shanghai Corporation 香港汇丰银行International Finance Corporation, IFC 国际金融公司La Communaute Financieve Africane 非洲金融共同体Economic and Social Council, ECOSOC 联合国经济及社会理事会United Nations Development Program, NUDP 联合国开发计划署United Nations Capital Development Fund, UNCDF 联合国资本开发基金United Nations Industrial Development Organization, UNIDO 联合国工业发展组织United Nations Conference on Trade and Development, UNCTAD 联合国贸易与发展会议Food and Agricultural Organization, FAO 粮食与农业组织, 粮农组织Economic Commission for Europe, ECE 欧洲经济委员会Economic Commission for Latin America, ECLA 拉丁美洲经济委员会Economic Commission for Asia and Far East, ECAFE 亚洲及远东经济委员会Economic Commission for Western Asia, ECWA 西亚经济委员会Economic Commission for Africa, ECA 非洲经济委员会Overseas Chinese Investment Company 华侨投资公司New York Stock Exchange, NYSE 纽约证券交易所London Stock Market 伦敦股票市场Baltic Mercantile and Shipping Exchange 波罗的海商业和航运交易所经济学常用英语词汇Aaccounting 会计accounting cost 会计成本accounting profit 会计利润adverse selection 逆向选择allocation 配置allocation of resources 资源配置allocative efficiency 配置效率antitrust legislation 反托拉斯法arc elasticity 弧弹性Arrow's impossibility theorem 阿罗不可能定理Assumption 假设asymetric information 非对称性信息average 平均average cost 平均成本average cost pricing 
平均成本定价法average fixed cost 平均固定成本average product of capital 资本平均产量average product of labour 劳动平均产量average revenue 平均收益average total cost 平均总成本average variable cost 平均可变成本Bbarriers to entry 进入壁垒base year 基年bilateral monopoly 双边垄断benefit 收益black market 黑市bliss point 极乐点boundary point 边界点break even point 收支相抵点budget 预算budget constraint 预算约束budget line 预算线budget set 预算集Ccapital 资本capital stock 资本存量capital output ratio 资本产出比率capitalism 资本主义cardinal utility theory 基数效用论cartel 卡特尔ceteris puribus assumption “其他条件不变”的假设ceteris puribus demand curve 其他因素不变的需求曲线Chamberlin model 张伯伦模型change in demand 需求变化change in quantitydemanded 需求量变化change in quantity supplied 供给量变化change in supply 供给变化choice 选择closed set 闭集Coase theorem 科斯定理Cobb—Douglas production function 柯布--道格拉斯生产函数cobweb model 蛛网模型collective bargaining 集体协议工资collusion 合谋command economy 指令经济commodity 商品commodity combination 商品组合commodity market 商品市场commodity space 商品空间common property 公用财产comparative static analysis 比较静态分析compensated budget line 补偿预算线compensated demand function 补偿需求函数compensation principles 补偿原则compensating variation in income 收入补偿变量competition 竞争competitive market 竞争性市场complement goods 互补品complete information 完全信息completeness 完备性condition for efficiency in exchange 交换的最优条件condition for efficiency in production 生产的最优条件concave 凹concave function 凹函数concave preference 凹偏好consistence 一致性constant cost industry 成本不变产业constant returns to scale 规模报酬不变constraints 约束consumer 消费者consumer behavior 消费者行为consumer choice 消费者选择consumer equilibrium 消费者均衡consumer optimization 消费者优化consumer preference 消费者偏好consumer surplus 消费者剩余consumer theory 消费者理论consumption 消费consumption bundle 消费束consumption combination 消费组合consumption possibility curve 消费可能曲线consumption possibility frontier 消费可能性前沿consumption set 消费集consumption space 消费空间continuity 连续性continuous function 连续函数contract curve 契约曲线convex 凸convex function 凸函数convex preference 凸偏好convex set 凸集corporatlon 公司cost 成本cost benefit analysis 成本收益分cost function 成本函数cost minimization 
成本极小化Cournot equilihrium 古诺均衡Cournot model 古诺模型Cross—price elasticity 交叉价格弹性Ddead—weights loss 重负损失decreasing cost industry 成本递减产业decreasing returns to scale 规模报酬递减deduction 演绎法demand 需求demand curve 需求曲线demand elasticity 需求弹性demand function 需求函数demand price 需求价格demand schedule 需求表depreciation 折旧derivative 导数derive demand 派生需求difference equation 差分方程differential equation 微分方程differentiated good 差异商品differentiated oligoply 差异寡头diminishing marginal substitution 边际替代率递减diminishing marginal return 收益递减diminishing marginal utility 边际效用递减direct approach 直接法direct taxes 直接税discounting 贴税、折扣diseconomies of scale 规模不经济disequilibrium 非均衡distribution 分配division of labour 劳动分工distribution theory of marginal productivity 边际生产率分配论duoupoly 双头垄断、双寡duality 对偶durable goods 耐用品dynamic analysis 动态分析dynamic models 动态模型EEconomic agents 经济行为者economic cost 经济成本economic efficiency 经济效率economic goods 经济物品economic man 经济人economic mode 经济模型economic profit 经济利润economic region of production 生产的经济区域economic regulation 经济调节economic rent 经济租金exchange 交换economics 经济学exchange efficiency 交换效率economy 经济exchange contract curve 交换契约曲线economy of scale 规模经济Edgeworth box diagram 埃奇沃思图exclusion 排斥性、排他性Edgeworth contract curve 埃奇沃思契约线Edgeworth model 埃奇沃思模型efficiency 效率,效益efficiency parameter 效率参数elasticity 弹性elasticity of substitution 替代弹性endogenous variable 内生变量endowment 禀赋endowment of resources 资源禀赋Engel curve 恩格尔曲线entrepreneur 企业家entrepreneurship 企业家才能entry barriers 进入壁垒entry/exit decision 进出决策envolope curve 包络线equilibrium 均衡equilibrium condition 均衡条件equilibrium price 均衡价格equilibrium quantity 均衡产量eqity 公平equivalent variation in income 收入等价变量excess—capacity theorem 过度生产能力定理excess supply 过度供给exchange 交换exchange contract curve 交换契约曲线exclusion 排斥性、排他性exclusion principle 排他性原则existence 存在性existence of general equilibrium 总体均衡的存在性exogenous variables 外生变量expansion paths 扩展径expectation 期望expected utility 期望效用expected value 期望值expenditure 支出explicit cost 显性成本external benefit 外部收益external cost 外部成本external economy 
外部经济external diseconomy 外部不经济externalities 外部性FFactor 要素factor demand 要素需求factor market 要素市场factors of production 生产要素factor substitution 要素替代factor supply 要素供给fallacy of composition 合成谬误final goods 最终产品firm 企业firms’demand curve for labor 企业劳动需求曲线firm supply curve 企业供给曲线first-degree price discrimination 第一级价格歧视first—order condition 一阶条件fixed costs 固定成本fixed input 固定投入fixed proportions production function 固定比例的生产函数flow 流量fluctuation 波动for whom to produce 为谁生产free entry 自由进入free goods 自由品,免费品free mobility of resources 资源自由流动free rider 搭便车,免费搭车function 函数future value 未来值Ggame theory 对策论、博弈论general equilibrium 总体均衡general goods 一般商品Giffen goods 吉芬晶收入补偿需求曲线Giffen's Paradox 吉芬之谜Gini coefficient 吉尼系数goldenrule 黄金规则goods 货物government failure 政府失败government regulation 政府调控grand utility possibility curve 总效用可能曲线grand utility possibility frontier 总效用可能前沿Hheterogeneous product 异质产品Hicks—kaldor welfare criterion 希克斯一卡尔多福利标准homogeneity 齐次性homogeneous demand function 齐次需求函数homogeneous product 同质产品homogeneous production function 齐次生产函数horizontal summation 水平和household 家庭how to produce 如何生产human capital 人力资本hypothesis 假说Iidentity 恒等式imperfect competion 不完全竞争implicitcost 隐性成本income 收入income compensated demand curveincome constraint 收入约束income consumption curve 收入消费曲线income distribution 收入分配income effect 收入效应income elasticity of demand 需求收入弹性increasing cost industry 成本递增产业increasing returns to scale 规模报酬递增inefficiency 缺乏效率index number 指数indifference 无差异indifference curve 无差异曲线indifference map 无差异族indifference relation 无差异关系indifference set 无差异集indirect approach 间接法individual analysis 个量分析individual demand curve 个人需求曲线individual demand function 个人需求函数induced variable 引致变量induction 归纳法industry 产业industry equilibrium 产业均衡industry supply curve 产业供给曲线inelastic 缺乏弹性的inferior goods 劣品inflection point 拐点information 信息information cost 信息成本initial condition 初始条件initial endowment 初始禀赋innovation 创新input 投入input—output 投入—产出institution 制度institutional economics 制度经济学insurance 保险intercept 
截距interest 利息interest rate 利息率intermediate goods 中间产品internatization of externalities 外部性内部化invention 发明inverse demand function 逆需求函数investment 投资invisible hand 看不见的手isocost line 等成本线,isoprofit curve 等利润曲线isoquant curve 等产量曲线isoquant map 等产量族Kkinded—demand curve 弯折的需求曲线Llabour 劳动labour demand 劳动需求labour supply 劳动供给labour theory of value 劳动价值论labour unions 工会laissez faire 自由放任Lagrangian function 拉格朗日函数Lagrangian multiplier 拉格朗乘数,land 土地law 法则law of demand and supply 供需法law of diminishing marginal utility 边际效用递减法则law of diminishing marginal rate of substitution 边际替代率递减法则law of diminishing marginal rate of technical substitution 边际技术替代率law of increasing cost 成本递增法则law of one price 单一价格法则leader—follower model 领导者--跟随者模型least—cost combination of inputs 最低成本的投入组合leisure 闲暇Leontief production function 列昂节夫生产函数licenses 许可证linear demand function 线性需求函数linear homogeneity 线性齐次性linear homogeneous production function 线性齐次生产函数long run长期long run average cost 长期平均成本long run equilibrium 长期均衡long run industry supply curve 长期产业供给曲线long run marginal cost 长期边际成本long run total cost 长期总成本Lorenz curve 洛伦兹曲线loss minimization 损失极小化1ump sum tax 一次性征税luxury 奢侈品Mmacroeconomics 宏观经济学marginal 边际的marginal benefit 边际收益marginal cost 边际成本marginal cost pricing 边际成本定价marginal cost of factor 边际要素成本marginal period 市场期marginal physical productivity 实际实物生产率marginal product 边际产量marginal product of capital 资本的边际产量marginal product of1abour 劳动的边际产量marginal productivity 边际生产率marginal rate of substitution 边替代率marginal rate of transformation 边际转换率marginal returns 边际回报marginal revenue 边际收益marginal revenue product 边际收益产品marginal revolution 边际革命marginal social benefit 社会边际收益marginal social cost 社会边际成本marginal utility 边际效用marginal value products 边际价值产品market 市场market clearance 市场结清,市场洗清market demand 市场需求market economy 市场经济market equilibrium 市场均衡market failure 市场失败market mechanism 市场机制market structure 市场结构market separation 市场分割market regulation 市场调节market share 市场份额markup pricing 加减定价法Marshallian demand function 
马歇尔需求函数maximization 极大化microeconomics 微观经济学minimum wage 最低工资misallocation of resources 资源误置mixed economy 混合经济model 模型money 货币monopolistic competition 垄断竞争monopolisticexploitation 垄断剥削monopoly 垄断,卖方垄断monopoly equilibrium 垄断均衡monopoly pricing 垄断定价monopoly regulation 垄断调控monopoly rents 垄断租金monopsony 买方垄断NNash equilibrium 纳什均衡Natural monopoly 自然垄断Natural resources 自然资源Necessary condition 必要条件necessities 必需品net demand 净需求nonconvex preference 非凸性偏好nonconvexity 非凸性nonexclusion 非排斥性nonlinear pricing 非线性定价nonrivalry 非对抗性nonprice competition 非价格竞争nonsatiation 非饱和性non--zero—sum game 非零和对策normal goods 正常品normal profit 正常利润normative economics 规范经济学Oobjective function 目标函数oligopoly 寡头垄断oligopoly market 寡头市场oligopoly model 寡头模型opportunity cost 机会成本optimal choice 最佳选择optimal consumption bundle 消费束perfect elasticity 完全有弹性optimal resourceallocation 最佳资源配置optimal scale 最佳规模optimal solution 最优解optimization 优化ordering of optimization(social) preference (社会)偏好排序ordinal utility 序数效用ordinary goods 一般品output 产量、产出output elasticity 产出弹性output maximization 产出极大化Pparameter 参数Pareto criterion 帕累托标准Pareto efficiency 帕累托效率Pareto improvement 帕累托改进Pareto optimality 帕累托优化Pareto set 帕累托集partial derivative 偏导数partial equilibrium 局部均衡patent 专利pay off matrix 收益矩阵、支付矩阵perceived demand curve 感觉到的需求曲线perfect competition 完全竞争perfect complement 完全互补品perfect monopoly 完全垄断perfect price discrimination 完全价格歧视perfect substitution 完全替代品perfect inelasticity 完全无弹性perfectly elastic 完全有弹性perfectly inelastic 完全无弹性plant size 工厂规模point elasticity 点弹性positive economics 实证经济学post Hoc Fallacy 后此谬误prediction 预测preference 偏好preference relation 偏好关系present value 现值price 价格price adjustment model 价格调整模型price ceiling 最高限价price consumption curve 价格费曲线price control 价格管制price difference 价格差别price discrimination 价格歧视price elasticity of demand 需求价格弹性price elasticity of supply 供给价格弹性price floor 最低限价price maker 价格制定者price rigidity 价格刚性price seeker 价格搜求者price taker 价格接受者price tax 从价税private benefit 私人收益principal—agent issues 
委托--代理问题private cost 私人成本private goods 私人用品private property 私人财产producer equilibrium 生产者均衡producer theory 生产者理论product 产品product transformation curve 产品转换曲线product differentiation 产品差异product group 产品集团production 生产production contract curve 生产契约曲线production efficiency 生产效率production function 生产函数production possibility curve 生产可能性曲线productivity 生产率productivity of capital 资本生产率productivity of labor 劳动生产率profit 利润profit function 利润函数profit maximization 利润极大化property rights 产权property rights economics 产权经济学proposition 定理proportional demand curve 成比例的需求曲线public benefits 公共收益public choice 公共选择public goods 公共商品pure competition 纯粹竞争rivalry 对抗性、竞争pure exchange 纯交换pure monopoly 纯粹垄断Qquantity—adjustment model 数量调整模型quantity tax 从量税quasi—rent 准租金Rrate of product transformation 产品转换率rationality 理性reaction function 反应函数regulation 调节,调控relative price 相对价格rent 租金rent control 规模报酬rent seeking 寻租rent seeking economics 寻租经济学resource 资源resource allocation 资源配置returns 报酬、回报returns to scale 规模报酬revealed preference 显示性偏好revenue 收益revenue curve 收益曲线revenue function 收益函数revenue maximization 收益极大化ridge line 脊线risk 风险Ssatiation 饱和,满足saving 储蓄scarcity 稀缺性law of scarcity 稀缺法则second—degree price discrimination 二级价格歧视second derivative --阶导数second—order condition 二阶条件service 劳务set 集shadow prices 影子价格short—run 短期short—run cost curve 短期成本曲线short—run equilibrium 短期均衡short—run supply curve 短期供给曲线shut down decision 关闭决策shortage 短缺shut down point 关闭点single price monopoly 单一定价垄断slope 斜率social benefit 社会收益social cost 社会成本social indifference curve 社会无差异曲线social preference 社会偏好social security 社会保障social welfare function 社会福利函数socialism 社会主义solution 解space 空间stability 稳定性stable equilibrium 稳定的均衡Stackelberg model 斯塔克尔贝格模型static analysis 静态分析stock 存量stock market 股票市场strategy 策略subsidy 津贴substitutes 替代品substitution effect 替代效应substitution parameter 替代参数sufficient condition 充分条件supply 供给supply curve 供给曲线supply function 供给函数supply schedule 供给表Sweezy model 斯威齐模型symmetry 对称性symmetry of information 信息对称Ttangency 
相切taste 兴致technical efficiency 技术效率technological constraints 技术约束technological progress 技术进步technology 技术third—degree price discrimination 第三级价格歧视total cost 总成本total effect 总效应total expenditure 总支出total fixed cost 总固定成本total product 总产量total revenue 总收益total utility 总效用total variable cost 总可变成本traditional economy 传统经济transitivity 传递性transaction cost 交易费用Uuncertainty 不确定性uniqueness 唯一性unit elasticity 单位弹性unstable equilibrium 不稳定均衡utility 效用utility function 效用函数utility index 效用指数utility maximization 效用极大化utility possibility curve 效用可能性曲线utility possibility frontier 效用可能性前沿Vvalue 价值value judge 价值判断value of marginal product 边际产量价值variable cost 可变成本variable input 可变投入variables 变量vector 向量visible hand 看得见的手vulgur economics 庸俗经济学Wwage 工资wage rate 工资率Walras general equilibrium 瓦尔拉斯总体均衡Walras's law 瓦尔拉斯法则Wants 需要Welfare criterion 福利标准Welfare economics 福利经学Welfare loss triangle 福利损失三角形welfare maximization 福利极大化Zzero cost 零成本zero elasticity 零弹性zero homogeneity 零阶齐次性zero economic profit 零利润GRE词汇精选abandon v./n.放弃;放纵abash v.使害羞,使尴尬abate v.减轻,减少abbreviate v.缩短;缩写abdicate v.退位,辞职,放弃aberrant adj.越轨的;异常的aberrantion n.离开正路,脱离正常;变形abet v.教唆,鼓励帮助abeyance n.中止,搁置abhor v.憎恨,嫌恶abhorrent adj.可恨的,讨厌的abide v.容忍,忍受abject adj.极可怜的abjure adj.发誓放弃;弃绝ablution n.净礼,沐浴abnegate v.否认,放弃abolish v.废止,废除abolition n.废除,革除abominate v.痛恨,厌恶aboveboard adj.光明正大的abrade v.磨损,磨小abrasion n.表面磨损abrasive adj.磨损的;生硬粗暴的abreast adv.并列地,并排地abridge v.删减;缩短abrogate v.废止,废除abscission n.切除,截去;脱离abscond v.潜逃,逃亡absenteeism n.旷课,旷工absolute adj.绝对的,完全的;限制的absolve v.赦免,免除absorb v.吸收;同化;吸引...的注意abstain v.禁绝,放弃abstemious adj.有节制的,节俭的abstention n.节制abstentious adj.节制的abstract n.摘要abstruse adj.难懂的,深奥的absurd adj.荒谬的,可笑的abundance n.充裕,多量abuse v.辱骂;滥用abusive adj.漫骂的;毁谤的;虐待的abut v.接界,毗邻abysmal adj.极深的;糟透的academic adj.学院的,学术的;理论的academician n.院士;学会会员accede v.同意accelerate v.加速;促进accentuate v.重读;强调access n.通路;途径accessiable adj.易达到的;易受影响的accessory adj.附属的,次要的acclaim v.欢呼,称赞acclimate v.使服水土;使适应accolade n.推崇;赞扬accommodate 
v.与...一直;提供住宿accommodating adj.乐于助人的accompany v.伴随,陪伴accomplice n.同谋者,帮凶accomplish v.完成,做成功accomplished adj.完成了的;有技巧的,有造诣的accord v./n.同意;一致accost v.搭话accountability n.负有责任的accrete v.逐渐增长;添加生长;连生accretion n.自然的增长;增加物accrue v.增大;增多accumulate v.积聚,积累accuracy n.精确,准确accurate adj.精确的,准确的accuse v.谴责,指责acerbic adj.苦涩的;刻薄的acknowledge v.承认;致谢acme n.顶点,极点acolyte n.助手,侍僧acme n.橡子,橡果acoustic adj.听觉的,有关声音的acquaint v.使...熟知;通知acquaintance n.熟知;熟人acquainted adj.对某事物熟悉的,对某人认识的acquiesce v.勉强同意,默许acquired adj.后天习得的acquisitive adj.渴望得到的,贪婪的acquit v.宣告无罪;脱卸义务和责任;还清acquittal n.宣告无罪,开释acrid adj.辛辣的,刻薄的acrimonious adj.尖刻的,严厉的acrimony n.尖刻,刻薄acrobat n.特技演员,杂技演员acrophobia n.恐高症acuity n.敏锐acumen n.敏锐,精明acute adj.灵敏的;急性的adage n.格言,古训adamant adj.强硬的;固执的adapt v.使...适应;修改adaptable adj.有适应能力的;可改编的addendum n.补充,附录addict v./n.沉溺;上瘾addition n.增加,附加additive n.添加剂address v.处理,对付,着手解决adept adj.老练的,精通的adequate adj.足够的adhere v.粘着adherent n.拥护者,信徒adhesive adj.带粘性的,胶粘的adjacent adj.接近的,毗连的adjourn v.使延期,推迟;休会adjunct n.附加物,附件adjust v.整顿,整理admire v.钦佩,赞赏admission n.许可;入会费;承认admonish v.训诫;警告adobe n.泥砖,土坯adolescent adj.青春期的,青少年adopt v.收养adore v.崇拜;热爱adorn v.装饰adroit adj.熟练的,灵巧的adulate v.谄媚,奉承adulterate v.掺假adumbrate v.预示advent n.到来,来临adventtious adj.偶然的adverse adj.不利的,相反的;敌对的advertise v.做广告;通知advisable adj.适当的,可行的advocacy n.拥护,支持advocate v.拥护,支持,鼓吹;n.支持者,拥护者aegis n.盾;保护,庇护aerate v.充气,让空气进入aerial adj.空中的,空气中的aesthete n.审美家aesthetic adj.美学的,有审美感的affable adj.易于交谈的;和蔼的affectation n.做作,虚假affected adj.不自然的;假装的affection n.爱affidavit n.宣誓书affiliate v.加入affiliation n.联系,联合affinity n.密切关系affirm v.确认affic v.粘上,贴上afflict v.使痛苦,折磨affliction n.悲痛,受难的起因affluence n.充裕,富足affluent adj.富裕的,丰富的【结束语】It is love that makes the world go round.爱令世界生生不息。
General Terms
If Your Bug Database Could Talk...
Adrian Schröter · Thomas Zimmermann · Rahul Premraj · Andreas Zeller
Saarland University, Saarbrücken, Germany
{schroeter|zimmerth|premraj|zeller}@st.cs.uni-sb.de

ABSTRACT
We have mined the Eclipse bug and version databases to map failures to Eclipse components. The resulting data set lists the defect density of all Eclipse components. As we demonstrate in three simple experiments, the bug data set can be easily used to relate code, process, and developers to defects. The data set is publicly available for download.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - version control; D.2.8 [Software Engineering]: Metrics - Complexity measures, Process metrics, Product metrics; D.2.9 [Software Engineering]: Management - Software quality assurance (SQA)

General Terms
Management, Measurement, Reliability

1. INTRODUCTION
Why is it that some programs are more failure-prone than others? This is one of the central questions of software engineering. To answer it, we must first know which programs are more failure-prone than others. With this knowledge, we can search for properties of the program or its development process that commonly correlate with defect density; in other words, once we can measure the effect, we can search for its causes.
One of the most abundant, widespread, and reliable sources for failure information is a bug database, listing all the problems that occurred during the software life time. Unfortunately, bug databases frequently do not directly record how, where, and by whom the problem in question was fixed. This information is hidden in the version database, recording all changes to the software source code. In recent years, a number of techniques have been developed to relate bug reports to fixes [6,3,2]. Since we thus can relate bugs to fixes, and fixes to the locations they apply to, we can easily determine the defect density of a component simply by counting the applied fixes.
We have conducted such work on the code
base of the Eclipse programming environment. In particular, we have computed the mapping of classes to the number of defects that were reported in the first six months before and after release, respectively. We have made this Eclipse bug data set freely available, such that anyone can use it for research purposes.
Figure 1 shows an excerpt of the data set in XML format. The file Plugin.java had 5 failures (and thus defects) before release 3.0 (“pre”); it had one failure after release (“post”). The enclosing package org.eclipse.core.runtime contains 43 files (“points”) and encountered 16 failures before and one failure after release 3.0; on average each file in this package had 0.609 failures before and 0.022 failures after release (“avg”).¹
What can one do with such data? In this paper, we illustrate how the data set can be used to address simple research questions:
• Can one predict failure-proneness from metrics like code complexity? (Section 3)
• What does a high number of bugs found during testing mean for the number of bugs found after release? (Section 4)
• Do some developers write more failure-prone code than others? (Section 5)
This paper does not attempt to give definitive answers to these questions, but merely highlights the potential of bug data when it comes to answering them. We hope that the public availability of data sets like ours will foster empirical research in software engineering, just like the public availability of open source programs fostered research in program analysis.

2. GETTING BUG DATA
How do we know which components failed and which did not?
This data can be collected from version archives like CVS and bug tracking systems like BUGZILLA in two steps:
1. We identify corrections (or fixes) in version archives: Within the messages that describe changes, we search for references to bug reports such as “Fixed 42233” or “bug #23444”. Basically every number is a potential reference to a bug report; however, such references have a low trust at first. We increase the trust level when a message contains keywords such as “fixed” or “bug” or matches patterns like “# and a number”. This approach was previously used in research [3,2].
2. We use the bug tracking system to map bug reports to releases. The bug database version field lists the release for which the bug was reported; however, since the field value may change during the life-cycle of a bug, we only use the first reported release. We distinguish two different kinds of failures: pre-release failures are observed during development and testing of a program, while post-release failures are observed after the program has been deployed to its users.

    <defects project="eclipse" release="3.0">
      <package name="org.eclipse.core.runtime">
        <counts>
          <count id="pre" value="16" avg="0.609" points="43" max="5"/>
          <count id="post" value="1" avg="0.022" points="43" max="1"/>
        </counts>
        <compilationunit name="Plugin.java">
          <counts>
            <count id="pre" value="5"/>
            <count id="post" value="1"/>
          </counts>
        </compilationunit>
        <compilationunit name="Platform.java">
          <counts>
            <count id="pre" value="1"/>
            <count id="post" value="0"/>
          </counts>
        </compilationunit>
        ...
      </package>
      ...
    </defects>

Figure 1: The Eclipse bug data set (excerpt).

¹ Since one failure can affect several files in one package, the counts on package level cannot be aggregated from file level and therefore are provided separately.
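Step 1 above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword list, the numeric trust scores, the `min_trust` threshold, and the function names `bug_references` and `count_fixes` are all assumptions made for this example. Step 2 (mapping each bug to its first reported release) is not covered here.

```python
import re
from collections import Counter

# Assumed heuristics: the keyword list and trust scores are illustrative only.
KEYWORDS = re.compile(r"\b(?:fix(?:e[ds])?|bugs?)\b", re.IGNORECASE)
NUMBER = re.compile(r"(#\s*)?(\d+)")


def bug_references(message):
    """Return (bug_id, trust) pairs for one change message.

    Every number starts with trust 0; a keyword anywhere in the message
    adds 1, and a leading '#' on that particular number adds 1.
    """
    base = 1 if KEYWORDS.search(message) else 0
    refs = []
    for match in NUMBER.finditer(message):
        trust = base + (1 if match.group(1) else 0)
        refs.append((int(match.group(2)), trust))
    return refs


def count_fixes(changes, min_trust=1):
    """Count distinct trusted bug ids per file.

    `changes` is an iterable of (file_name, message) pairs, e.g. taken
    from a version archive log.
    """
    seen = set()
    counts = Counter()
    for file_name, message in changes:
        for bug_id, trust in bug_references(message):
            if trust >= min_trust and (file_name, bug_id) not in seen:
                seen.add((file_name, bug_id))
                counts[file_name] += 1
    return counts
```

With a threshold of 1, a bare number such as the year in "Updated copyright to 2004" is ignored, while "Fixed 42233" and "bug #23444" are counted once each per file.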
Since we know the location of every failure that has been fixed, it is easy to count the number of defects per location and release, resulting in the data set of Figure 1.

3. THE CODE FACTOR

So where do these bugs come from? One hypothesis is that some code is more failure-prone than other code because it is more complex. Complexity metrics attempt to quantify this complexity, mapping code to metric values. In earlier work on mining Microsoft bug databases [4], we could not find a single metric that would correlate with bug density across multiple projects. Using the Eclipse bug data set, we can easily check this result by correlating, for each class, complexity metrics with the number of bugs.

Chidamber and Kemerer [1] proposed several code metrics that capture the complexity of a class. Table 1 lists the correlations of each of these metrics (gathered using the tool ckjm [7]) with pre-release and post-release failures. Albeit weak, the most strongly correlated features [footnote 2] to pre-release and post-release failures include RFC (Response for a Class), CBO (Coupling Between Object classes) and WMC (Weighted Methods per Class). These results are in line with our previous research at Microsoft [4], thus suggesting that either new metrics or a combination of existing metrics need to be explored to study the relationship between the complexity of code and the presence of bugs in a given class. One important predictor might be the domain of a component: in related work, we could predict the failure-proneness of an Eclipse package from its imports alone [5].

Footnote 2: For detailed explanations of these code metrics, the reader is requested to refer to [1].

                          Pre-release failures     Post-release failures
Number of                 Pearson    Spearman      Pearson    Spearman
Pre-release failures      1.00       1.00          0.26       0.19
Post-release failures     0.26       0.19          1.00       1.00
WMC                       0.32       0.31          0.16       0.11
DIT                       0.07       0.11          0.00       0.01
NOC                       0.00       0.04          0.00       0.02
CBO                       0.36       0.40          0.23       0.12
RFC                       0.39       0.38          0.21       0.11
LCOM                      0.13       0.23          0.03       0.07
CA                        0.09       0.05          0.02       0.04
NPM                       0.20       0.18          0.11       0.09

Table 1: Correlation of pre-release and post-release failures with code metrics

                          Pre-release failures     Post-release failures
Number of                 Pearson    Spearman      Pearson    Spearman
Pre-release failures      1.00       1.00          0.30       0.20
Post-release failures     0.30       0.20          1.00       1.00
Changes                   0.34       0.44          0.14       0.15
Changes since 2.1         0.47       0.56          0.19       0.17
Authors                   0.30       0.30          0.15       0.13
Authors since 2.1         0.41       0.49          0.21       0.17

Table 2: Correlation of process measurements with failures [Eclipse 3.0].

4. THE PROCESS FACTOR

Any problem that arises after product release indicates a defect not only in the product, but also in its process: Clearly, the defect should have been caught by quality assurance first. In practice, this may mean that the product was not tested enough. Therefore, we could turn to the testing process as a cause for the problem. Failures during testing are recorded as pre-release failures in bug tracking systems. Other measures for the development process are the number of changes and authors of a file. Table 2 shows how these measurements correlate with each other. For pre-release failures the correlation is highest for the number of changes (0.47) and authors (0.41) since release 2.1. This is not surprising, since every pre-release failure also resulted in at least one change (namely the fix). Post-release failures show almost no correlation with process measurements, except for pre-release failures where the correlation is 0.30. To summarize, it is difficult to predict post-release failures solely from process measurements.

5. THE HUMAN FACTOR

As a third and final example of using the Eclipse bug data set, let us turn to the ultimate cause of errors: humans. Unfortunately, data from one project alone is not enough to judge managerial decisions. However, we can turn to the developers and examine whether specific developers are more likely to produce bugs than others.
Tables 3 and 4 summarize pre-release and post-release bug patterns introduced by developers. In both tables, the first column lists the names of developers [footnote 3] and the second column lists the number of files owned by the developer. The latter was derived by attributing the file to the developer(s) that owned the largest number of lines of code in the file, and only those developers that owned 50 or more files were included in the analysis. Columns 3 and 4 record the number of pre-release and post-release failures per 1000 lines of code and the average number of pre-release and post-release failures per file. For brevity, only the first and last six entries of each table are reported.

                           Failure-densities
Developer    No. of Files  PrRF/1000 lines  Avg. PrRF/File
Frederick    320           16.42            2.81
Peter        97            14.70            1.96
Isaac        178           9.95             1.69
Mary         392           9.35             1.84
London       63            9.18             1.41
David        88            8.77             1.64
Harry        55            2.55             1.18
Tommy        92            2.20             0.35
King         162           2.18             0.36
Charles      63            1.82             0.43
Nellie       60            1.14             0.32
Robert       58            0.47             0.17

Table 3: Pre-release failures by developer

Footnote 3: Names have been changed to maintain anonymity.
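The two density columns described above can be computed along the lines of the following minimal sketch. The input layout (one record per file with its owner, line count and failure count), the aggregation of lines over all files of a developer, and all names are assumptions for illustration; they are not taken from the Eclipse data set.

```python
from collections import defaultdict


def failure_densities(files, min_files=50):
    """Per developer: (files owned, failures per 1000 lines, failures per file).

    `files` is an iterable of (owner, lines_of_code, failures) tuples, where
    `owner` is the developer who owns most lines of that file.
    """
    owned = defaultdict(list)
    for owner, loc, failures in files:
        owned[owner].append((loc, failures))

    result = {}
    for owner, entries in owned.items():
        if len(entries) < min_files:
            continue  # skip developers owning fewer than `min_files` files
        total_loc = sum(loc for loc, _ in entries)
        total_failures = sum(f for _, f in entries)
        result[owner] = (
            len(entries),
            1000.0 * total_failures / total_loc,
            total_failures / len(entries),
        )
    return result
```

The `min_files` parameter mirrors the cut-off of 50 files used in the analysis; it is lowered in small experiments so that toy inputs produce output.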
In Table 3, one observes substantial differences in pre-release failure densities in files (indicated by Columns 3 and 4) between different developers. However, such results should be carefully interpreted. We suspect that the results do not indicate developer competency but instead reflect the complexity of the code they are working on. Hence, developers with fewer pre-release or post-release failures are not necessarily better developers than the others. Our stance is further supported by there being no clear relation between the number of files owned by a developer and the corresponding failure densities observed, since experienced and better programmers may own more files.

Likewise, Table 4 again indicates a high variance in failure density in files owned by different developers, although the densities are smaller in comparison to pre-release failures. It is noteworthy that developer Frederick is listed in Table 3 as the owner of the files with the highest pre-release failure density, while in Table 4, the same developer is the owner of nearly failure-free post-release files. In contrast to Frederick, files owned by Tommy are less pre-release failure-prone while the post-release failures are considerably higher.

Hence, different developers are likely to introduce different numbers of failures into the code for manifold possible reasons. We consider such information to be only the tip of the iceberg indicating directions for future investigations pertaining to the human factor in software development.

                           Failure-densities
Developer    No. of Files  PoRF/1000 lines  Avg. PoRF/File
Jack         54            0.71             0.13
London       63            0.52             0.08
Queen        111           0.51             0.20
Edward       55            0.41             0.04
Samuel       67            0.39             0.12
Tommy        92            0.34             0.05
Alfred       152           0.03             0.01
Oliver       106           0.03             0.02
Frederick    320           0.02             0.00
King         162           0.00             0.00
Benjamin     119           0.00             0.00
George       52            0.00             0.00

Table 4: Post-release failures by developer

6. CONCLUSION AND CONSEQUENCES

Where do bugs come from? By mapping failures to components, the Eclipse bug data set offers the opportunity to research these questions. Our initial studies, as shown in this paper, do not give a definitive answer. However, they raise obvious follow-up questions and indicate the potential of future empirical research based on such bug data. To support this very research, we are happy to make the bug data set publicly available.

Overall, we would like this set to become both a challenge and a benchmark: Which factors in programs and processes are the ones that predict future bugs, and which approach gives the best prediction results? The more we learn about past mistakes, the better are our chances to avoid these mistakes in the future, and to build better software at lower cost.

For access to the Eclipse bug data set, as well as for ongoing information on the project, see

    http://www.st.cs.uni-sb.de/softevo/

Acknowledgments. Our work on mining software repositories is funded by the Deutsche Forschungsgemeinschaft, grant Ze509/1-1. Thomas Zimmermann is additionally funded by the DFG-Graduiertenkolleg “Leistungsgarantien für Rechnersysteme”.

7. REFERENCES

[1] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Trans. Software Eng., 20(6):476–493, 1994.
[2] D. Cubranic, G. C. Murphy, J. Singer, and K. S. Booth. Hipikat: A project memory for software development. IEEE Transactions on Software Engineering, 31(6):446–465, June 2005.
[3] M. Fischer, M. Pinzger, and H. Gall. Analyzing and relating bug report data for feature tracking. In Proc. 10th Working Conference on Reverse Engineering (WCRE 2003), Victoria, British Columbia, Canada, Nov. 2003. IEEE.
[4] N. Nagappan, T. Ball, and A. Zeller. Mining metrics to predict component failures. In Proceedings of the International Conference on Software Engineering (ICSE 2006). ACM, May 2006.
[5] A. Schröter, T. Zimmermann, and A. Zeller. Predicting failure-prone components at design time. In Proceedings of the 5th International Symposium on Empirical Software Engineering (ISESE 2006). ACM, Sept. 2006.
[6] J. Śliwerski, T. Zimmermann, and A. Zeller. When do changes induce fixes? In Proc. International Workshop on Mining Software Repositories (MSR), St. Louis, Missouri, U.S., May 2005.
[7] D. Spinellis. Code Quality: The Open Source Perspective. Addison Wesley, 2006.
Towards Safe Composition of Product Lines
Don Batory and Sahil Thaker
Department of Computer Sciences
University of Texas at Austin
Austin, Texas, 78712 U.S.A.
{batory,sahilt}@

A specification of a feature model is a grammar and its cross-tree constraints. A model of our automotive product line is listed in Figure 3. A sentence of this grammar that satisfies all cross-tree constraints defines a unique product and the set of all legal sentences is a language, i.e., a product line [11].

We recently showed that feature models are compact representations of propositional formulas [11]. Rules for translating grammar productions into formulas are listed in Figure 2. (The atmost1(A,B,C) predicate in Figure 2 means at most one of A, B, or C is true. See [21] p. 278.) The propositional formula of a grammar is the conjunction of the formulas for each production, each cross-tree constraint, and the formula that selects the root feature (i.e., all products have the root feature). Thus, all constraints except ordering constraints of a feature model can be mapped to a propositional formula. This relationship of feature models and propositional formulas is essential to results on safe composition.

3 AHEAD

AHEAD is a theory of program synthesis that merges feature models with additional ideas [10]. First, each feature is implemented by a distinct module. Second, program synthesis is compositional: complex programs are built by composing feature modules. Third, program designs are algebraic expressions. The following summarizes the ideas of AHEAD that are relevant to safe composition.

3.1 Algebras and Step-Wise Development

An AHEAD model of a domain is an algebra that consists of a set of operations, where each operation implements a feature. We write M = {f, h, i, j} to mean model M has operations (or features) f, h, i, and j.
One or more features of a model are constants that represent base programs:

    f       // a program with feature f
    h       // a program with feature h

The remaining operations are functions, which are program refinements or extensions:

    i • x   // adds feature i to program x
    j • x   // adds feature j to program x

where • denotes function composition and i • x is read as “feature i refines program x” or equivalently “feature i is added to program x”. The design of an application is a named expression (i.e., composition of features) called an equation:

    prog1 = i • f        // prog1 has features i and f
    prog2 = j • h        // prog2 has features j and h
    prog3 = i • j • h    // prog3 has features i, j, h

AHEAD is based on step-wise development [52]: one begins with a simple program (e.g., constant feature h) and builds a more complex program by progressively adding features (e.g., adding features i and j to h in prog3).

The relationship between feature models and AHEAD is simple: the operations of an AHEAD algebra are the primitive features of a feature model; compound features (i.e., non-leaf features of a feature diagram) are AHEAD expressions. Each sentence of a feature model defines an AHEAD expression which, when evaluated, synthesizes that product. The AHEAD model Auto of the automotive product line is:

    Auto = {Body, Electric, Gasoline, Automatic, Manual, Cruise}

where Body is the lone constant. Some products (i.e., legal expressions or sentences) of this product line are:

    c1 = Automatic • Electric • Body
    c2 = Cruise • Automatic • Electric • Gasoline • Body

c1 is a car with an electric engine and automatic transmission. And c2 is a car with both electric and gasoline engines, automatic transmission, and cruise control.

    concept                  grammar                          propositional formula
    and                      S : A [B] C ;                    (S ⇔ A) ∧ (B ⇒ S) ∧ (C ⇔ S)
    alternative (choose 1)   ... S ...    S : A | B | C ;     (S ⇔ A ∨ B ∨ C) ∧ atmost1(A,B,C)
    or (choose 1+)           ... S+ ...   S : A | B | C ;     S ⇔ A ∨ B ∨ C

    [diagram notation column: feature-tree drawings with root S and children A, B, C]

Figure 2 Feature Diagrams, Grammars, and Propositional Formulas

    // grammar of our automotive product line
    Car : [Cruise] Transmission Engine+ Body ;
    Transmission : Automatic | Manual ;
    Engine : Electric | Gasoline ;
    // cross-tree constraints
    Cruise ⇒ Automatic ;

Figure 3 A Feature Model Specification

3.2 Feature Implementations

Features are implemented as program refinements. Consider the following example. Let the BASE feature encapsulate an elementary buffer class with set and get methods. Let RESTORE denote a “backup” feature that remembers the previous value of a buffer. Figure 4a shows the buffer class of BASE and Figure 4b shows the buffer class of RESTORE • BASE. The underlined code indicates the changes RESTORE makes to BASE. Namely, RESTORE adds to the buffer class two members, a back variable and a restore method, and modifies the existing set method.

While this example is simple, it is typical of features. Adding a feature means adding new members to existing classes and modifying existing methods. As programs and features get larger, features can add new classes and packages to a program as well.

Features can be implemented in many ways. The way it is done in AHEAD is to write program refinements in the Jak language, a superset of Java [10]. The changes RESTORE makes to the buffer class are a refinement that adds the back and restore members and refines the set method. This is expressed in Jak as:

    refines class buffer {
        int back = 0;
        void restore() { buf = back; }
        void set(int x) { back = buf; Super.set(x); }
    }                                                   (1)

Method refinement in AHEAD is accomplished by inheritance; Super.set(x) indicates a call to (or substitution of) the prior definition of method set(x). By composing the refinement of (1) with the class of Figure 4a, a class that is equivalent to that in Figure 4b is produced.
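Since method refinement is accomplished by inheritance, the effect of composing refinement (1) with the BASE buffer can be mimicked in Python with a subclass. This is only an analogy under that assumption, not how the AHEAD tools compose Jak code; the class names are chosen for illustration.

```python
class Buffer:
    """BASE: the elementary buffer of Figure 4a."""

    def __init__(self):
        self.buf = 0

    def get(self):
        return self.buf

    def set(self, x):
        self.buf = x


class RestoreBuffer(Buffer):
    """RESTORE applied to BASE: behaves like the class of Figure 4b."""

    def __init__(self):
        super().__init__()
        self.back = 0

    def restore(self):
        self.buf = self.back

    def set(self, x):
        self.back = self.buf  # remember the previous value
        super().set(x)        # plays the role of Super.set(x) in Jak
```

After two calls to `set`, calling `restore` brings back the value stored by the first call, which is the behavior the RESTORE feature adds.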
See [10] for further details.

AspectJ could also be used to implement features. As the refinement capabilities of AspectJ are more general than that of method refinement in AHEAD, we delay further discussion of aspect implementations of features until Section 7.

4 “BIG INHALE” COMPILATION

The first step in testing safe composition properties is to provide a global analysis of the feature modules that can be composed. The analysis (a) determines how each class, method, and variable reference in every module binds to a definition, and (b) eliminates or identifies ambiguities and other problems related to module compilation. We used a variation of a technique that was pioneered in Hyper/J for compiling hyperslices (i.e., Hyper/J modules) [40]. As an approximation, an AHEAD feature module is a hyperslice. To compile a hyperslice, stubs are created for all classes and members that are not introduced by that hyperslice. This makes them declaratively complete. Once stubs are available, the Java classes of a hyperslice can be compiled into bytecode. Hyper/J then uses bytecode composition tools to compose independently compiled hyperslices. We follow a similar approach.

We know of no tool support for automatic stub creation in Hyper/J; stubs must be created manually [36]. An advantage of AHEAD and many product lines is that the source or binaries for all features are available. By analyzing the feature code base (which we call the “big inhale”), we can automatically generate stubs for all classes that could appear in a synthesized product [9]. For every class, we create a stub that contains the union of the signatures of all variables, methods, and declarations that could appear in that class.
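The union step of stub generation can be sketched as follows. The data layout (a mapping from feature name to the classes and member signatures it introduces) and all names are assumptions for illustration; real stubs are Java class declarations derived from the feature code base.

```python
def build_stubs(feature_modules):
    """Return, per class, the union of member signatures over all features.

    `feature_modules` maps a feature name to {class name: set of signatures}.
    """
    stubs = {}
    for classes in feature_modules.values():
        for class_name, signatures in classes.items():
            # A member introduced by several features appears once in the stub.
            stubs.setdefault(class_name, set()).update(signatures)
    return stubs
```

Applied to the BASE and RESTORE features of Section 3.2, the stub for the buffer class would list buf, back, get, set and restore, whichever feature introduces them.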
The same applies to interfaces that a class could implement.

Remember a Java class C in feature module M encapsulates a fragment of a class P.C that could appear in a synthesized program P. When we compile module M, we bind all references in class C of M to the variables, methods, and classes of our generated stubs. Only at module composition time do we rebind each variable, method, etc. reference in C of M to a definition, where the definition of a variable, method, or class may be supplied by any one of the features that comprise P.

An important point of feature module compilation is that it provides a global consistency check on modules, without dealing with the feature combinatorics that is the subject of safe composition discussed in the next section. An example of a consistency violation is a feature module F that references a method that is not defined in any feature module. We catch this error because module F fails to compile.

Ambiguities are another source of errors that our compilation technique catches. Consider the base program BaseP in Figure 5a, which consists of two interfaces (I, J) and three classes (X, A, B). BaseP seems consistent in isolation: the foo(x) call in Figure 5a binds to the foo(I) method of class A. Now consider feature module ExtendX of Figure 5b that makes class X also implement interface J.
This global knowledge is exposed by our class stubs, and module BaseP fails to compile as a consequence: the foo(x) call is ambiguous as it could be bound to either the foo(I) or foo(J) methods of class A.

In the following sections, assume we have the bytecodes of each feature module from which we can extract variable, method, and class and interface references.

    (a)
    class buffer {
        int buf = 0;
        int get() { return buf; }
        void set(int x) { buf = x; }
    }

    (b)
    class buffer {
        int buf = 0;
        int get() { return buf; }
        int back = 0;
        void set(int x) { back = buf; buf = x; }
        void restore() { buf = back; }
    }

Figure 4 Buffer Variations

    (a)
    interface I {}
    interface J {}
    class X implements I {}
    class A {
        void foo(I b) {}
        void foo(J d) {}
    }
    class B {
        void bar(A a, X x) { a.foo(x); }
    }

    (b)
    refines class X implements J {}

Figure 5 Uncompilable Feature Modules

5 SAFE COMPOSITION

The AHEAD tool suite has multiple ways to compose feature modules to build a product. We can compile feature modules as discussed in the last section and let AHEAD compose their bytecodes to produce the binary of a product directly. A problem that can arise is that there may be references to classes or members that are undefined. Alternatively, the primary way in which features are composed in AHEAD is by composing source files. But now, the same errors (i.e., reference to undefined elements) are discovered at program compilation time. In short, we need to ensure that all variables, methods, and classes that are referenced in a program are indeed defined. And we want to ensure this property for all programs in a product line, regardless of the specific approach to synthesize products. This is the essence of safe composition.

The core problem is illustrated in the following example. Let PL be a product line with three features: base, addD, and refC. Figure 6 shows their modules. base is a base feature that encapsulates class C with method foo(). Feature addD introduces class D and leaves class C unchanged.
Feature refC refines method foo() of class C and references the constructor of class D. Now suppose the feature model of PL is a single production with no cross-tree constraints:

    PL : [refC] [addD] base ;    // feature model

The product line of PL has four programs that represent all possible combinations of the presence/absence of the refC and addD features. All programs in PL use the base feature. Question: are there programs in PL that have type errors? As PL is so simple, it is not difficult to see that there is such a program: it has the AHEAD expression refC • base. Class D is referenced in refC, but there is no definition of D in the program itself. This means one of several possibilities: the feature model is wrong, feature implementations are wrong, or both. Designers need to be alerted to such errors. In the following, we define some general compositional constraints (i.e., properties) that product lines must satisfy.

5.1 Properties of Safe Composition

Refinement Constraint. Suppose a member or class m is introduced in features X, Y, and Z, and is refined by feature F. Products in a product line that contain feature F must satisfy the following constraints to be type safe: (i) X, Y, and Z must appear prior to F in the product’s AHEAD expression (i.e., m must be defined prior to being refined), and (ii) at least one of X, Y, or Z must appear in every product that contains feature F.

Property (i) can be verified by examining the feature model, as it linearizes features. Property (ii) requires the feature model (or rather its propositional formula) to satisfy the constraint:

    F ⇒ X ∨ Y ∨ Z        (2)

By examining the code base of feature modules, it is possible to identify and collect such constraints. These constraints, called implementation constraints, are a consequence of feature implementations, and may not arise if different implementations are used.
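For the small product line PL above, such a constraint can be checked by simply enumerating all products. The sketch below hard-codes the feature model PL : [refC] [addD] base and the implementation constraint refC ⇒ addD (refC references class D, which only addD introduces). The function names are illustrative, and plain enumeration like this is only practical for tiny feature models.

```python
from itertools import product


def products_of_pl():
    """Enumerate the products of PL : [refC] [addD] base ;"""
    for ref_c, add_d in product([False, True], repeat=2):
        yield {"base": True, "refC": ref_c, "addD": add_d}


def violates(config):
    """True if the product breaks the constraint refC => addD."""
    return config["refC"] and not config["addD"]


def unsafe_products():
    """Return the sorted feature names of every product that violates it."""
    return [
        sorted(name for name, chosen in config.items() if chosen)
        for config in products_of_pl()
        if violates(config)
    ]
```

Of the four products, only the one consisting of base and refC is reported, which corresponds to the expression refC • base identified in the text.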
Implementation constraints can be added to the existing cross-tree constraints of a feature model and obeying these additional constraints will guarantee safe composition. That is, only programs that satisfy domain and implementation constraints will be synthesized. Of course, the number of implementation constraints may be huge for large programs. However, a majority of constraints will be redundant. Theorem provers, such as Otter [5], could be used to prove that implementation constraints are implied by the feature model and thus can be discarded.

Czarnecki [17] recently observed the following: Let PL_f be the propositional formula of product line PL. If there is a constraint R that is to be satisfied by all members of PL, then the formula (PL_f ∧ ¬R) cannot be satisfiable. If it is, we know that there is a product of PL that violates R. To make our example concrete, to verify that a product line PL satisfies property (2), we want to prove that all products of PL that use feature F also use X, Y, or Z. A satisfiability (SAT) solver can verify if (PL_f ∧ F ∧ ¬X ∧ ¬Y ∧ ¬Z) is satisfiable. If it is, there exists a product that uses F without X, Y, or Z. The variable bindings that are returned by a solver identify the offending product. In this manner, we can verify that all products of PL satisfy (2).

Note: We are inferring composition constraints for each feature module; these constraints lie at the module’s “requires-and-provides interface” [18]. When we compose feature modules, we must verify that their “interface” constraints are satisfied by a composition. If composition is a linking process, we are guaranteeing that there will be no linking errors.

    (a) base
    class C {
        void foo() {..}
    }

    (b) addD
    class D {...}

    (c) refC
    refines class C {
        void foo() {
            ... new D() ...
            Super.foo();
        }
    }

Figure 6 Three Feature Modules

Superclass Constraint. Super has multiple meanings in the Jak language. The original intent was that Super would refer to the method that was being refined. Once a method void m() in a class C is defined, it is refined by a specification of the form:

    void m() {... Super.m(); ... }        (3)

(In AOP-speak, (3) is an around method for an execution pointcut containing the single joinpoint of the m() method.) However, if no method m() exists in class C, then (3) is interpreted as a method introduction that invokes its corresponding superclass method. That is, method m() is added to C and Super.m() invokes C’s inherited method m(). To test the existence of a superclass method requires a more complex constraint.

Let feature F introduce a method m into class C and let m invoke m() of its superclass. Let H_n be a superclass of C, where n indicates the position of H_n by the number of ancestors above C. Thus H_0 is class C, H_1 is the superclass of C, H_2 is the super superclass of C, etc. Let Sup_n(m) denote the predicate that is the disjunction of all features that define method m in H_n (i.e., m is defined with a method body and is not abstract). If features X and Y define m in H_1, then Sup_1(m) = X ∨ Y. If features Q and R define m in H_2, then Sup_2(m) = Q ∨ R. And so on. The constraint that m is defined in some superclass is:

    F ⇒ Sup_1(m) ∨ Sup_2(m) ∨ Sup_3(m) ∨ ...        (4)

In short, if feature F is in a product, then there must also be some feature that defines m in a superclass of C. The actual predicate that is used depends on C’s position in the inheritance hierarchy.

Note: it is common for a method n() of a class C to invoke a different method m() of its superclass via Super.m(). Constraint (4) is also used to verify that m() is defined in a superclass of C.

Reference Constraint. Let feature F reference member m of class C. This means that some feature must introduce m in C or m is introduced in some superclass of C.
The constraint to verify is:

    F ⇒ Sup_0(m) ∨ Sup_1(m) ∨ Sup_2(m) ∨ ...        (5)

Note: By treating Super calls as references, (5) subsumes constraints (2) and (4).

Note: a special case of (5) is the following. Suppose C is a direct subclass of class S. If C is introduced in a product then S must also be introduced. Let c be the default constructor of C which invokes the default constructor m of S. If feature F introduces C and features X, Y, and Z introduce S, then (5) simplifies to:

    F ⇒ Sup_0(m)        // same as F ⇒ X ∨ Y ∨ Z        (6)

Single Introduction Constraint. More complicated properties can be verified in the same manner. An example is when the same member or class is introduced multiple times in a composition, which we call replacing. While not necessarily an error, replacing a member or class can invalidate the feature that first introduced this class or member. For example, suppose feature A introduces the Value class, which contains an integer member and a get() method (Figure 7a). Feature B replaces (not refines) the get() method by returning double the integer member (Figure 7b). Both A and B introduce method get(). Their composition, B • A, causes A’s get method to be replaced by B’s get (see Figure 7c). If subsequent features depend on the get() method of A, the resulting program may not work correctly.

It is possible for multiple introductions to be correct; in fact, we carefully used such designs in building AHEAD. More often, such designs are symptomatic of inadvertent captures [31]: a member is inadvertently named in one feature identically to that of a member in another feature, and both members have different meanings. In general, these are “bad” designs that could be avoided with a more structured design where each member or class is introduced precisely once in a product.
Testing for multiple introductions can either alert designers to actual errors or to designs that “smell bad”. We note that this problem was first recognized by Flatt et al. in mixin compositions [19], and has resurfaced elsewhere in object delegation [30] and aspect implementations [4].

Suppose member or class m is introduced by features X, Y, and Z. The constraint that no product has multiple introductions of m is:

    atmost1(X,Y,Z)        // at most one of X,Y,Z is true        (7)

The actual constraint used depends on the features that introduce m.

Abstract Class Constraint. An abstract class can define abstract methods (i.e., methods without a body). Each concrete subclass C that is a descendant of an abstract class A must implement all of A’s abstract methods. To make this constraint precise, let feature F declare an abstract method m in abstract class A. (F could refine A by introducing m, or F could introduce A with m.) Let feature X introduce concrete class C, a descendant of A. If F and X are compatible (i.e., they can appear together in the same product) then C must implement m or inherit an implementation of m. Let C.m denote method m of class C. The constraint is:

    F ∧ X ⇒ Sup_0(C.m) ∨ Sup_1(C.m) ∨ Sup_2(C.m) ∨ ...        (8)

That is, if abstract method m is declared in abstract class A and C is a concrete class descendant of A, then some feature must implement m in C or an ancestor of C.

Note: to minimize the number of constraints to verify, we only need to verify (8) on concrete classes whose immediate superclass is abstract; A need not be C’s immediate superclass.

Note: Although this does not arise in the product lines we examine later, it is possible for a method m that is abstract in class A to override a concrete method m in a superclass of A. (8) would have to be modified to take this possibility into account.

Interface Constraint. Let feature F either refine interface I by introducing method m or introduce I, which contains m.
Let feature X either introduce class C that implements I or refine class C to implement I (i.e., a refinement that adds I to C’s list of implemented interfaces). If features F and X are compatible, then C must implement or inherit m. Let C.m denote method m of class C. The constraint is:

    F ∧ X ⇒ Sup_0(C.m) ∨ Sup_1(C.m) ∨ Sup_2(C.m) ∨ ...        (9)

This constraint is identical in form to (8), although the parameters F, X, and m may assume different values.

    (a) A
    class Value {
        int v;
        int get() { return v; }
    }

    (b) B
    refines class Value {
        int get() { return 2*v; }
    }

    (c) B • A
    class Value {
        int v;
        int get() { return 2*v; }
    }

Figure 7 Overriding Member

5.2 Perspective

We identified six properties ((2),(4)-(9)) that are essential to safe composition. We believe these are the primary properties to check. We know that there are other constraints that are particular to AHEAD that could be included; some are discussed in Section 7.1. Further, using a different compilation technology may introduce even more constraints to be checked (see Section 7.2). To determine if we have a full complement of constraints requires a theoretical result on the soundness of the type system of the Jak language, which is a superset of Java. To place such a result into perspective, we are not aware of a proof of the soundness of the entire Java language. A standard approach for soundness proofs is to study a representative subset of Java, such as Featherweight Java [26] or ClassicJava [20]. Given a soundness proof, it should be possible to determine if any constraints are missing for that language subset. To do this for Jak is a topic of future work.

5.3 Beyond Code Artifacts

The ideas of safe composition transcend code artifacts [17]. Consider an XML document; it may reference other XML documents in addition to referencing internal elements. If an XML document is synthesized by composing feature modules [10], we need to know if there are references to undefined elements or files in these documents.
Exactly the same techniques that we outlined in earlier sections could be used to verify safe composition properties of a product line of XML documents. We believe the same holds for product lines of other artifacts (grammars, makefiles, etc.) as well. The reason is that we are performing analyses on structures that are common to all kinds of synthesized documents; herein lies the generality and power of our approach.

6 RESULTS

We have analyzed the safe composition properties of many different AHEAD product lines. Table 1 summarizes the key size statistics for several of the product lines that we analyzed. For lack of space in this paper, we report the specifics of the first two product lines listed (PPL and BPL). Note that the size of the code base and the average size of a generated program are listed both in Jak LOC and translated Java LOC.

The properties that we verified are grouped into five categories:

• Refinement (2),
• Reference to Member or Class, which includes (4) and (5),
• Single Introduction (7),
• Abstract Class (8),
• Interface (9).

For each constraint, we generate a theorem to verify that all products in a product line satisfy that constraint. We report the number of theorems generated in each category. Note that duplicate theorems can be generated. Consider features Y and ExtendY of Figure 8. Method m in ExtendY references method o in Y, method p in ExtendY references field i in Y, and method p in ExtendY refines method p defined in Y. We create a theorem for each constraint; all theorems are of the form ExtendY ⇒ Y. We eliminate duplicate theorems, and report only the number of failures per category. If a theorem fails, we report all (in Figure 8, all three) sources of errors. Finally, we note that very few abstract methods and interfaces were used in the product lines of Table 1. So the numbers reported in the last two categories are small.

We conducted our experiments on a Mobile Intel Pentium 2.8 GHz PC with 1GB memory running Windows XP.
We used J2SDK version 1.5.0_04 and the SAT4J Solver version 1.0.258RC [45].

6.1 Prevayler Product Line

Prevayler is an open source application written in Java that maintains an in-memory database and supports plain Java object persistence, transactions, logging, snapshots, and queries [43]. We refactored Prevayler into the Prevayler Product Line (PPL) by giving it a feature-oriented design. That is, we refactored Prevayler into a set of feature modules, some of which could be removed to produce different versions of Prevayler with a subset of its original capabilities. Note that the analyses and errors we report in this section are associated with our refactoring of Prevayler into PPL, and not the original Prevayler source.¹

¹ We presented a different feature refactoring of Prevayler in [37]. The refactoring we report here is similar to an aspect refactoring of Godil and Jacobsen [22].

The code base of the PPL is 2029 Jak LOC with seven features:

• Core: This is the base program of the Prevayler framework.
• Clock: Provides timestamps for transactions.
• Persistent: Logs transactions.
• Snapshot: Writes and reads database snapshots.
• Censor: Rejects transactions by certain criteria.
• Replication: Supports database duplication.
• Thread: Provides multiple threads to perform transactions.

A feature model for Prevayler is shown in Figure 9. Note that its constraints preclude some combinations of features.

    Product Line   # of Features   # of Programs   Code Base Jak/Java LOC   Program Jak/Java LOC
    PPL            7               20              2000/2000                1K/1K
    BPL            17              8               12K/16K                  8K/12K
    GPL            18              80              1800/1800                700/700
    JPL            70              56              34K/48K                  22K/35K

    Table 1: Product Line Statistics

    (a) Y
        class D {
            static int i;
            static void o() {..}
            void p() {..}
        }

    (b) ExtendY
        class C {
            void m() { D.o(); }
        }
        refines class D {
            void p() {
                Super.p();
                D.i=2;
            }
        }

    Figure 8. Sources of ExtendY ⇒ Y

    // grammar
    PREVAYLER : [Thread] [Replication] [Censor]
                [Snapshot] [Persistent] [Clock] Core ;
    // constraints
    Censor ⇒ Snapshot;
    Replication ⇒ Snapshot;

    Figure 9. Prevayler Feature Model

Results. The statistics of our PPL analysis are shown in Table 2. We generated a total of 882 theorems, of which 791 were duplicates. Analyzing the PPL feature module bytecodes, generating theorems and removing duplicates, and running the SAT solver to prove the 91 unique theorems took 8 seconds.

We performed two sets of safe composition tests on Prevayler. In the first test, we found 15 reference constraint violations, of which 8 were unique errors, and 12 multiple-introduction constraint errors. These failures revealed an omission in our feature model: we were missing a constraint “Replication ⇒ Snapshot”. After changing the model (to that shown in Figure 9) we found 11 reference failures, of which 4 were unique errors, and still had 12 multiple-introduction failures. These are the results in Table 2.

Two reference failures were due to yet another error in the feature model that went undetected. Feature Clock must not be optional because all other features depend on its functionality. We fixed this by removing Clock’s optionality.

A third failure was an implementation error. It revealed that a code fragment had been misplaced: it was placed in Snapshot when it should have been placed in Replication. The last failure was similar. A field member that only the Thread feature relied upon was defined in the Persistent feature, essentially making Persistent non-optional if Thread is selected. The error was corrected by moving the field member into the Thread feature.

Making the above-mentioned changes resolved all reference constraint failures, but 12 multiple-introduction failures remained. They were not errors but rather “bad-smell” warnings. Here is a typical example. Core has the method:

    public TransactionPublisher publisher(..) {
        return new CentralPublisher(null, ...);
    }

Clock replaces this method with:

    public TransactionPublisher publisher(..) {
        return new CentralPublisher(new Clock(), ...);
    }

Alternatively, the same effect could be achieved by altering Core to:

    ClockInterface c = null;
    public TransactionPublisher publisher(..) {
        return new CentralPublisher(c, ...);
    }

and changing Clock to refine publisher():

    public TransactionPublisher publisher(..) {
        c = new Clock();
        return Super.publisher(..);
    }

Our safe composition checks allowed us to confirm by inspection that the replacements were performed with genuine intent.

6.2 Bali

The Bali Product Line (BPL) is a set of AHEAD tools that manipulate, transform, and compose AHEAD grammar specifications [10]. The feature model of Bali is shown in Figure 10. It consists of 17 primitive features and a code base of 8K Jak (12K Java) LOC plus a grammar file from which a parser can be generated. Although the number of programs in BPL is rather small (8), each program is about 8K Jak LOC or 12K Java LOC that includes a generated parser. The complexity of the feature model of Figure 10 is due to the fact that our feature modelling tools preclude the replication of features in a grammar specification, and several (but not all) Bali tools use the same set of features.

The statistics of our BPL analysis are shown in Table 3. We generated a total of 3453 theorems, of which 3358 were duplicates. Analyzing the BPL feature module bytecodes, generating theorems and removing duplicates, and running the SAT solver to prove the 95 unique theorems took 4 seconds.

We found several failures, some of which were due to duplicate theorems failing, and the underlying cause boils down to two errors.
The first was an unrecognized dependency between the requireBali2jcc feature and the require feature, namely

    Constraint                        # of Theorems   Failures
    Refinement                        39              0
    Reference to Member or a Class    830             11
    Single Introduction               12              12
    Abstract Class                    0               0
    Interface                         1               0

    Table 2: Prevayler Statistics

    Bali : Tool [codegen] Base ;
    Base : [require] [requireSyntax] collectvisitor bali syntax kernel;
    Tool : [requireBali2jak] bali2jak
         | [requireBali2jcc] bali2jcc
         | [requireComposer] composer
         | bali2layerGUI bali2layer bali2layerOptions ;
    %%
    composer ⇒ ¬codegen;
    bali2jak ∨ bali2layer ∨ bali2javacc ⇔ codegen;
    bali2jak ∧ require ⇒ requireBali2jak;   // 1
    bali2jcc ∧ require ⇒ requireBali2jcc;   // 2
    composer ∧ require ⇒ requireComposer;   // 3
    require ⇒ requireSyntax;

    Figure 10. Bali Feature Model

    Constraint                        # of Theorems   Failures
    Refinement                        42              0
    Reference to Member or a Class    3334            7
    Single Introduction               18              7
    Abstract Class                    41              0
    Interface                         18              0

    Table 3: Bali Product Line Statistics
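The theorem checks reported for the Prevayler Product Line in Section 6.1 can be illustrated with a small sketch. The experiments above prove theorems with the SAT4J solver; the Java code below swaps in exhaustive enumeration instead, which is practical only because the Figure 9 model has six optional features. The class name PplModelCheck, the bit encoding of features, and the helper methods are illustrative assumptions and are not part of AHEAD or SAT4J. A theorem of the form A ⇒ B is taken to hold when no feature selection permitted by the model selects A without B.

```java
import java.util.function.IntPredicate;

/** Hypothetical brute-force stand-in for the SAT-based theorem check (not the AHEAD tooling). */
public class PplModelCheck {
    // One bit per optional feature of Figure 9; Core is always present and needs no bit.
    public static final int THREAD = 1, REPLICATION = 2, CENSOR = 4,
                            SNAPSHOT = 8, PERSISTENT = 16, CLOCK = 32;
    static final int SELECTIONS = 64; // 2^6 candidate feature selections

    static boolean has(int selection, int feature) {
        return (selection & feature) != 0;
    }

    static boolean implies(int selection, int premise, int conclusion) {
        return !has(selection, premise) || has(selection, conclusion);
    }

    // Model before the fix: only Censor => Snapshot is stated.
    public static final IntPredicate BEFORE_FIX = s -> implies(s, CENSOR, SNAPSHOT);
    // Model of Figure 9: adds Replication => Snapshot.
    public static final IntPredicate FIGURE_9 =
            BEFORE_FIX.and(s -> implies(s, REPLICATION, SNAPSHOT));
    // Figure 9 plus the later correction that Clock must not be optional.
    public static final IntPredicate CLOCK_MANDATORY = FIGURE_9.and(s -> has(s, CLOCK));

    /** Number of feature selections the model permits. */
    public static int countProducts(IntPredicate model) {
        int count = 0;
        for (int s = 0; s < SELECTIONS; s++) {
            if (model.test(s)) count++;
        }
        return count;
    }

    /** True if every product of the model satisfies premise => conclusion. */
    public static boolean holds(IntPredicate model, int premise, int conclusion) {
        for (int s = 0; s < SELECTIONS; s++) {
            if (model.test(s) && !implies(s, premise, conclusion)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(BEFORE_FIX, REPLICATION, SNAPSHOT)); // false
        System.out.println(holds(FIGURE_9, REPLICATION, SNAPSHOT));   // true
        System.out.println(holds(FIGURE_9, THREAD, PERSISTENT));      // false
        System.out.println(countProducts(FIGURE_9));                  // 40
        System.out.println(countProducts(CLOCK_MANDATORY));           // 20
    }
}
```

Under these assumptions, the theorem Replication ⇒ Snapshot fails on the model that lacks that constraint and holds on the model of Figure 9, while Thread ⇒ Persistent still fails, mirroring the misplaced field member described in Section 6.1. The Figure 9 model permits 40 feature selections, and making Clock mandatory reduces this to 20, which coincides with the number of PPL programs listed in Table 1.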
Negotiated Rhythms of Mobile Work: Time, Place, and Work Schedules

Magnus Nilsson and Morten Hertzum
Computer Science, Roskilde University
Bldg. 42.1, P.O. Box 260, DK-4000 Roskilde, Denmark
magnusn@ruc.dk, mhz@ruc.dk

ABSTRACT

This study investigates the role of rhythms in the collaborative coordination of mobile work as well as in the individual actors’ comprehension and command of their work. Drawing on an ethnographic study of home-care work, we examine the ways in which temporal regularities or rhythms are formed and reinforced. Further, we analyse how the major temporal rhythms are configured and furnished by individual, collective, and social rhythms, and how these rhythms contribute to the collaborative flow of activities. Finally, we discuss how the concept of rhythms adds to an understanding of alignment and coordination in mobile and distributed work settings.

Categories and Subject Descriptors

H.5.3. [Information interfaces and presentation (e.g., HCI)]: Group and Organization Interfaces – asynchronous interaction, computer-supported cooperative work, synchronous interaction.

General Terms

Experimentation, Human Factors, Theory.

Keywords

Rhythms, mobility, temporal coordination, distributed collaboration, home-care work, field study.

1. INTRODUCTION

The widely used classification of CSCW systems by means of a two by two matrix of time (same or different) and place (same or different) reflects both a recognition of time as a category fundamental to collaborative work and a crudely simplistic notion of time. Studies of the temporal aspects of collaborative work provide plenty of evidence that time and place are intricately interrelated [2, 3, 9, 23]. Indeed, effective collaboration among spatially distributed co-workers seems to require that considerable efforts are put into deliberating and articulating when who will be doing what.
The increased mobility and connectivity of “anytime, anywhere” technologies mean that temporal coordination must increasingly be accomplished through social and organizational arrangements such as schedules [9], conventions [21], rhythms [30], and ad hoc activities [27].

This study investigates the role of rhythms in the collaborative coordination of mobile work as well as in the individual actors’ comprehension and command of their work. As the concept of rhythms in social life is not new (cf. e.g. the seminal work of Zerubavel [37]), the main contribution of this paper is an in-depth analysis of the role of rhythms in meshing time and place. We will present an ethnographic description of home-care work, a mobile work setting in which temporal and spatial alignment of distributed activities is of paramount importance. Home-care work is specified in considerable temporal detail and requires that home-care professionals are at specific, distributed places for specific, tightly spaced periods of time to perform their work. Thus, a persistent awareness and negotiation of time and place is pivotal to competent performance of home-care work. This requires that overlapping, competing, and partly conflicting time regimes be continuously meshed into a meaningful temporal structuring of work. We will specifically analyse:

• How rhythms are relied upon by the home-care professionals in reading their work schedules and adding structure and predictability to their own activities and those of their colleagues.
• How rhythms also facilitate effective rearrangement and modification of work schedules to make collaboration smoother, handle exceptions with minimal disruption to other activities, and even out differences in workload.
• How rhythms are formed and reinforced by temporal and spatial patterns, by artefacts, and by home-care professionals’ relationships with their longstanding clients.

Mobility is still a vague category.
This is mainly because it is dominated by several residual traits trailing from its definition as a conceptual contrast to stationary, office-based work (e.g., [5]). Whereas space has been at the centre of discussions of mobile work, its temporal dimension has not been addressed to a similar extent. Part of the reason for this is, we surmise, that objective clock time has rendered people’s rich and diverse means of accomplishing temporal structure and alignment largely invisible (see also [19, 24]). An initial observation illustrating the richness and effectiveness of these means is that while the home-care professionals’ time is specified in chunks as short as five minutes, they rarely look at their watches.

The next section outlines the background for our work in terms of previous research on temporal alignment and coordination of mobile work. Section 3 briefly introduces home-care work, and Section 4 describes the methodology of our ethnographic study. Section 5 analyses how rhythms contribute to the accomplishment of home-care work. Finally, in Section 6, we discuss how the concept of rhythms adds to an understanding of alignment and coordination in mobile work settings.

Copyright ACM (2005). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work (GROUP 2005), (Sanibel Island, FL, November 6-9, 2005). ACM Press, New York, pp. 148-157. /10.1145/1099203.1099233

2. BACKGROUND: TIME AND MOBILITY

Alignment and coordination are constituent elements of the ways in which collaborative work is perceived and performed. Work involving distributed locations poses distinctive challenges because it makes it necessary to also align and coordinate where people are, for example to obtain simultaneous presence at one location or a balanced spread of actors across locations.
In domains such as air-traffic control [8], ambulance control [22], and line control [14] these challenges have been met by establishing centres of collocated people that control and coordinate the distributed activities. This has, to some extent, rendered alignment and coordination among the distributed – and typically mobile – actors invisible. Dedicated studies of alignment and coordination among mobile actors tend to focus on relatively autonomous actors or professionals expected to exercise discretion in how they accomplish their tasks, such as businesspeople [27], hospital staff [3], real-estate agents [32], and ecologists on field research [26]. We are, however, particularly interested in alignment and coordination among mobile actors in tightly and dynamically regulated settings. One example of such work is Juhlin and Weilenmann’s [16] study of snow-clearance crews in airports. They show how these crews, in their radio communication with the control tower, repair misunderstandings, discuss the task at hand, and negotiate next actions, but they focus on the formal and hierarchical aspects of the communication, rather than on temporal alignment and coordination. Discussions of mobile work tend to construe mobility in spatial terms. For example, Luff and Heath [20] distinguish three types of mobility – micro, local, and remote mobility – by the amount of distance covered by the movements of artefacts and actors. Bardram and Bossen [3] introduce mobility work as a concept complementary to articulation work. They define mobility work as “the work needed to achieve the right configuration of people, resources, knowledge, and place in order to carry out tasks”. While this definition emphasizes that mobility is concerned with assembling distributed entities for combined use, temporality is only implicitly part of the definition. 
In contrast, Kakihara and Sørensen [17] propose that mobility involves three interrelated aspects – spatial, temporal, and contextual mobility – and argue that all three aspects are necessary to appreciate the full relationship between mobility and human action.Bardram [2] emphasizes that temporal coordination is itself work and that it is mediated by artefacts. The work involved in temporal coordination takes place at multiple levels of abstraction and includes continuous synchronization, planned scheduling, and allocation of temporal resources to various activities. Thus, people’s activities are intertwined with an ongoing process of temporal structuring [1]. According to Orlikowski and Yates [24] it is through this process people in organizations experience time; that is, time is experienced through organizational practice rather than as objective clock time or as a subjective phenomenon. Among the analytic concepts that have been employed in discussing the temporal structures people create and use in the course of performing their work, we want to emphasize trajectories and rhythms, two related but subtly different concepts: Trajectories[35] structure events by providing temporal sequencing. Being on the waiting list for an operation informs medical staff that the patient has previously been diagnosed and a treatment decided upon and that the patient is now to be considered when patients are scheduled for this operation [9]. The waiting list also informs medical staff about upcoming needs for post-operation facilities and activities. Trajectories focus on one actor (or work object) and describe the sequence of events and activities this actor contributes to (or this object is subjected to) in the course of the collaborative activity. An example of the use of trajectories in healthcare settings is Strauss et al. [35].Rhythms structure events by providing temporal cycles [37]. 
Nursing shifts and morning rounds are examples of large-scale rhythms central to the daily regularity of hospital work. At a finer-grained level, the time between successive refills of intravenously administered drugs is known to experienced nurses without necessarily checking the medication orders and adds temporal rhythm to their treatment of patients [30]. Rhythms punctuate the continuous flow of activities with periodically recurring events and thereby offer ways of condensing myriad individual events into patterns exhibiting at least some regularity and predictability. Rhythms have, for example, been used in predicting when people are available for interruptions [4, 15].In complex work settings trajectories and rhythms are, to a large extent, instigated by conventions and procedures and enforced by artefacts such as coordination mechanisms [33] and classification schemes [6]. This way, appreciation of trajectories and rhythms mediates articulation work and thereby contributes to reducing the complexity of work and alleviating the need for ad hoc negotiation and planning. Reddy and Dourish [30] illustrate how rhythms often make it unnecessary for healthcare professionals to actively get hold of each other because they, for example, know they both attend bed management meetings, which take place three times during the day, and will thus meet ‘automatically’. In highly regulated or specialized settings, work may be serialized or otherwise decoupled into trajectories where actors seldom meet. Fitzpatrick et al. [11] report a case where technological artefacts rendered the collaborative aspects of work nearly invisible and made work appear to take place in isolation. The studied people had separate offices, preferred to keep their door closed, and spent a large portion of their working day “alone in front of a computer screen, typing in silence at a keyboard”. 
The apparent isolation masked collaborative rhythms, defined by regular patterns of daily activities, and extensive virtual communication and collaboration.

A similar masking of collaboration and rhythms characterizes home-care work.

3. HOME-CARE WORK

Home-care work consists of providing care to elderly people in their homes. Although some care-related tasks may appear mundane and deprived of complexity – for example cleaning, doing the laundry, preparing food, and socializing – these home-care activities must be performed in specific time-spans, at specific places, and in accordance with the elderly persons’ preferences and special needs. This introduces two distinct, but interrelated sources of complexity.

First, since home-care professionals have a distributed, location-based work setting, they are on the move most of the day and generally spend a minimum of time in office settings with other home-care professionals. Consequently, opportunities for face-to-face communication and informal coordination of activities are restricted to short periods of time, for example during lunch. Communication and coordination are, however, crucial to home-care work, because the elderly are often receiving treatment and care from various home-care professionals, such as occupational therapists, physical therapists, social workers, nurses, case managers, and home-care workers. The mobile and location-based nature of home-care work introduces an organizational complexity that must be dealt with in short-term treatment coordination as well as long-term treatment planning.

Second, in addition to the organizational complexity, home-care work is also complex because it is context-dependent. The exact nature of the care-giving activities varies with, and must be adapted to, the individual elderly person’s home, activities, preferences, and special needs. Further, activities, preferences, and needs change dynamically as the elderly person’s condition improves or deteriorates.
Adapting to these changes, which may happen over night, is pivotal to competent home-care work. While the organizational complexity concerns the interdependencies and coordination among distributed home-care professionals each responsible for their specific activities, the source of the context-induced complexity is the individual elderly person’s dynamically changing condition, presenting different needs for adaptation. Studies of home-care work within CSCW have primarily examined the organizational complexity. In particular, Pinelle and Gutwin [28, 29] investigated loosely coupled home-care work in a setting characterized by worker autonomy and remote mobility. The Danish home-care setting we have investigated is, in contrast, tightly regulated and characterized by local mobility.In Denmark 68,000 full-time equivalent (FTE) employees are working as home-care professionals, and approximately 200,000 elderly people are receiving home-care services [10]. To make the amount of resources spent on home care more transparent, a protocol-based taxonomy was introduced in 1998. In essence, the taxonomy is a conceptual tool that describes the different types of care an elderly can acquire. An elderly is scheduled for home-care by a visiting nurse who performs an assessment of the care to be administered. The outcome of such an assessment is a list of items from the taxonomy and the time allocated to each task. These items are fed into a scheduling system used by municipal authorities in producing the home-care professionals’ daily work schedules, which specify the elderly to be visited, the care to be administered, and the time allocated to each task. In some home-care regions, the scheduling system has recently been extended with personal digital assistants (PDAs), through which the home-care professionals access their daily work schedules. 
This is for example the case in the home-care region we studied.

4. RESEARCH SITE AND METHOD

The fieldwork reported in this paper is part of a longitudinal study of how home-care professionals use mobile technologies in their daily work. We studied one of the seven home-care regions in the greater Copenhagen area, Greenwood (a pseudonym). This home-care region employs approximately 70 home-care workers and 10 nurses who deliver home care to about 620 elderly people. On a daily basis, 675 visits are arranged and undertaken. This amounts to a total of more than 230,000 home-care visits a year.

The home-care professionals in Greenwood are divided into teams of 12, and each team shoulders responsibility for delivering care to the elderly in a sub area of Greenwood. As Greenwood is a suburban area of 3-8 storey buildings, the sub areas are of limited size and the means of transportation from visit to visit is the bicycle. In the scheduling system the home-care professionals are allocated seven minutes of travel time to get from one visit to the next. A home-care professional typically has 10-13 visits a day.

The first author was engaged in fieldwork in Greenwood during October and December 2004 to investigate how work schedules were being used, managed, adapted, and configured by home-care professionals. Multiple representations of the work schedules were of interest in that they were discussed orally, inspected on the whiteboard in the main office, accessed on the PDAs, and held in the heads of individual home-care professionals. To investigate these aspects of home-care work, three different qualitative techniques were employed: participant observations, contextual interviews, and workshop seminars.

Participant observation. The first author observed five home-care professionals during entire working days – from 7:30 am, when the home-care professionals arrive at the main office, to 2:30 pm or 3:30 pm when their shift ends.
Four of the home-care professionals were home-care workers and one was a nurse originally trained as a home-care worker. Two of the home-care workers were young and educated recently; the three others had been employed in home-care services for five or more years. The fieldwork was an eminently mobile affair comprised of following the home-care professionals from the main office to the homes of the elderly, entering their homes, observing home-care activities, and taking part in some casual activities, like vacuum cleaning. In sum the fieldwork amounted to 70 hours of observing – and sometimes participating in – the daily work of home-care professionals.Contextual interviews. Since home-care professionals’ time is highly regulated and their activities cannot be interrupted due to the private nature of giving care, contextual interviews were added to the participant observations. In this context, the contextual interviews consisted of biking alongside the home-care professionals, in-between visits, while interviewing them about what had happened during the visit they had just completed. As it turned out, these brief, recurrent interviews provided excellent opportunities for grounding interviews in concrete events and getting details about the background reasons and on-site reflections shaping the visits.Workshop seminars. While the participant observation and contextual interviews took place during the fieldwork, the workshop seminars were held subsequently. The seminars consisted of presenting observations and preliminary analyses of the field data to the home-care professionals and their management, followed by discussion. This served to inform and corroborate our understanding of the alignment and coordination of home-care work, specifically the use of the work schedules and the role of rhythms. The first author conducted three workshop seminars.The observations, most of the interviews, and the seminars were documented in field notes. 
The remaining interviews were audio recorded and transcribed. The major reason for using field notes as the primary means of documentation was ethical. Field notes were chosen in collaboration with home-care management in Greenwood as appropriate for use in the elderly people’s homes and given the personal nature of many home-care activities. In analysing the field notes and transcribed interviews we aimed at developing a grounded understanding of temporal coordination in home-care work, conceptualized in terms of rhythmic structures.

5. RHYTHMS AND HOME-CARE WORK

In this section we present our analysis of the fieldwork. First, we describe the major temporal rhythm that structures home-care work in Greenwood. Second, we analyse how the major temporal rhythm is configured and essentially furnished by other types of rhythms, and how the rhythms are formed and reinforced by artefacts, and by temporal and spatial patterns.

5.1 Clock-time Rhythms

In contrast to other types of mobile work (cf. Section 2), the mobility of home-care work in Greenwood is structured around a rigid specification of exactly when, where, and for how long activities take place. This clock-time specification forms the backbone of home-care work: The home-care workers arrive at the main office at 7:30 am and download their daily schedule on a PDA. At 7:45 am, the home-care workers leave the office to deliver care to typically 5-7 clients; then at 11:00 am they return to the office for lunch for approximately half an hour before returning to the care-related work and administering care to typically 4-5 clients. At approximately 2:15 pm or 3:15 pm, depending on whether they are working short or long shifts, they once again return to the main office. While engaged in work, the home-care workers use the PDA to record time; when entering a new visit, the home-care worker presses ‘start’ on the PDA, and when the visit is over the home-care worker presses ‘stop’ on the PDA.
In between visits a standard bicycle time of seven minutes is allocated. Pressing stop on the PDA automatically triggers the seven-minute bicycle time. On returning to the main office, the home-care workers upload their schedule to the main server. The uploaded schedule reflects the de facto order in which visits have been carried out and the exact time spent on the different visits.

The schedules downloaded to the home-care workers’ PDAs in the morning are based on a standard plan. A designated team member is responsible for the local coordination of the weekly schedules, detailing the clients to be visited and the care to be administered. While the weekly schedule is arranged so that home-care workers primarily have the same clients, there are always exceptions to the pre-planned schedule. Sometimes home-care workers call in sick; at other times clients have been hospitalized during the night. The management at Greenwood handles these day-to-day exceptions by generating an updated version of the schedule before home-care workers arrive at the main office in the morning. This, however, amounts to home-care workers not knowing their exact schedule before they download it. Attuning to the exceptions of the updated schedule and understanding the temporal and spatial implications is one of the decisive activities that take place before the home-care workers leave the main office at 7:45 am.

The temporal specification of home-care work constitutes the major clock-time temporal rhythm that structures home-care work. In addition, this rhythm is inherently cyclical: the same event types (for instance, arriving at the main office at 7:30 am, downloading the schedule to the PDA, administering care to a relatively stable group of clients, and uploading the schedule) take place over and over again.
On the one hand, this major work rhythm catches many of the features related to home-care work; it describes the stationary and the mobile aspects, and it gives an impression of why time and place are of crucial importance in home-care work in Greenwood. On the other hand, the major temporal rhythm is based on objective, quantifiable, and precise clock time [1]. The home-care workers’ comprehension of time and command of their work is, however, constituted by other rhythms in addition to the clock-time rhythm.

5.2 Social Aspects of Rhythms

During our fieldwork, we increasingly noticed types of rhythms other than the ones associated with the temporal structuring of clock time, in particular rhythms pertaining to home-care workers’ co-presence in the main office, their presence in the homes of the clients, and the mobility of home-care work. An initial observation indicating that the cyclical, clock-time rhythm did not capture all facets of home-care work was that home-care workers did not constantly check the PDA to learn whether they were on schedule. Still, they managed to keep track of time. Additionally, we observed that keeping track of time and the visits to be carried out was a collaborative effort. Being aware of the schedules of other home-care workers proved to be a major activity. A final observation suggesting the existence of other types of rhythms was the different temporal and spatial patterns associated with the different places of home-care work. In short, the observations indicated that home-care workers relied on other types of rhythms than the ones associated with clock time.
These fieldwork observations, which indicate the existence of other types of rhythms, were substantiated by the contextual interviews and the workshop seminars.

In the following, we analyse how home-care workers, on the one hand, rely on the major clock-time rhythm that at first glance structures their work, and how home-care workers, on the other hand, furnish the major clock-time structuring with several other types of rhythms.

5.2.1 Structure and Predictability

Since the schedule is the focal point of home-care work in Greenwood, we continue our analysis with an excerpt describing the activities associated with downloading the schedule to the PDA¹:

¹ All names in the excerpts, which are based on the fieldwork notes and the transcribed interviews, have been disguised.

Figure 1. The standard plan (displayed on the screen). The map in front of the screen illustrates the different sub areas of the Greenwood district.

It’s 7:32 am; Christine, a home-care worker in her mid-twenties, enters the main office. She notices the message on the whiteboard: “Peter, Susan – October 22, ill.” The office is sizzling from the activities of her colleagues. Two of them, Kate and Simon, are engaged in conversation: One of Simon’s regulars has to be picked up early this morning because he is going to the elder centre [a note in the paper-based team calendar specifies this]. Also, Simon has two of Susan’s clients this morning. Kate accepts to take Simon’s first visit, a visit consisting of preparing food. [7:37 am] Christine downloads her schedule to her PDA and briefly runs her eyes over it. During this, she immediately observes two of Peter’s regulars on her schedule. “Look,” she says [addressed to the first author], “today, we are going to Hill Street [a place on the outskirts of the Greenwood district].” Shortly after this, Christine asks one of her co-workers if she is acquainted with Peter’s regulars, and if she knows how to get there from the centre of the Greenwood district.
The co-worker replies, “they are fine, a bit moody, though”, and explains the route she has taken on previous visits. The two new visits are due at 12:00 noon and 12:50 pm. All of Christine’s regulars live in the centre of Greenwood. During the day, Christine does not take the visits in the order specified. As the added visits are not health-critical [e.g., giving medicine], Christine first takes her regular visits at the centre of Greenwood, and finally Peter’s regulars on Hill Street.

The home-care workers are thoroughly familiar with their regulars, who constitute a recurrent structure that forms the backbone of their weekly schedules. In addition, they know their route through the district, and when and where visits are normally scheduled. Accordingly, the weekly schedule and the temporal and spatial rhythms associated with the weekly schedule serve as a background structuring against which changes are readily noticed. When entering the team-room, the home-care workers immediately get a sense of the challenges they must face. If the whiteboard stipulates that someone from the team has called in sick, there are bound to be newly added visits to their schedules; and if the team calendar specifies that a client must visit the elder centre, the morning visits must be rearranged. Therefore, the artefacts situated in the main office make potential changes to the schedules of the home-care workers’ activities during the day highly visible. As the excerpt illustrates, the home-care workers quickly spot changes to their downloaded schedule, and accordingly grasp the implications of added visits. The implications of added visits concern the mobility of the work, that is, how home-care workers travel through the district. This route is not accidental; the home-care workers have consciously configured their route to move efficiently through the district in accomplishing their weekly schedule.
Often it involves taking shortcuts through back alleys or crossing playing fields wheeling their bicycles. In reading through their schedule, the experienced home-care workers rely on this route. In contrast, the inexperienced home-care workers and the home-care workers just employed at Greenwood are still struggling with the topography of Greenwood. They rely on printouts of maps and every morning they spend some time figuring out the most suitable route through the district.

For the experienced home-care workers, the major clock-time work rhythm and the fact that they have a stable base of regulars provide a recurrent structure that adds predictability to their work. As the excerpt indicates, the fifteen minutes of co-presence in the main office, from 7:30 am to 7:45 am, is not only a specific time span conceived of as clock time. It is also a recurring event time. Sometimes the fifteen minutes are very busy for all the home-care workers, sometimes only for a few of them. The excerpt also indicates that there are several social aspects to the major temporal rhythm that structures the work. Home-care workers do not just download their schedule, notice the added visits, and then leave the main office. They engage in conversation, asking their colleagues about clients and locations; they notice who is talking to whom, and who is busy and who is not. Thus, the main office is both a place for the individual, sequential rhythms related to entering the office, downloading the schedule to the PDA, and leaving the office, and a place for establishing collaborative work rhythms, that is, exchanging visits and asking clarifying questions about clients and locations. In this fashion, being aware of the other home-care workers’ schedules and activities during the morning meeting is crucial in order to know who is available for ad hoc support during the day and who may need a hand to complete their schedule.

The social aspects also include the mobility of the work.
When leaving the main office, home-care workers do not bicycle individually to the first scheduled visit. Instead, they bicycle in group formation the first half mile and then split up. Whereas the communication in the main office is mostly work-related and centred around coordinative problems, the conversations during the five-minute bicycle ride to the centre of Greenwood are informal and tend to focus on either stories from the home-care workers’ everyday life, or stories from recent visits. The stories of recent visits are somewhat similar to the ‘war stories’ of Orr’s study of copy-technicians [25]. Like the copy-technicians, the home-care workers also tell each other stories about significant episodes from their work practice, and through the stories pass on knowledge about how to handle critical things (e.g., dealing with clients with dementia). Additionally, the mobility of the work also includes casual encounters where home-care workers meet en route, in between visits. While these encounters at the beginning of our fieldwork appeared to be very informal, mundane moments of socializing, an in-depth analysis of the fieldwork notes and contextual interviews showed that the casual encounters are not banana time [31]. The analysis revealed that the social interaction and questions asked depended on the location of the other home-care worker or nurse. If the other home-care worker or nurse was even slightly out of synch relative to where he or she would normally be at that point in time, it immediately triggered questions probing into the cause of their unusual location.

A final social aspect of the major work rhythm is that it includes the clients. The clients rely on the home-care workers to visit them at more or less the same time every day, and therefore structure their everyday life accordingly. Moreover, the long-standing relationship between clients and home-care workers (sometimes lasting more than a decade) changes the nature of home-care work.
In short, it is not only work, but also a social and friendly meeting. Furthermore, this relationship establishes a fine-grained awareness of the elderly person’s habits, preferences, and special needs. This includes such minutiae as precisely how to season an oatmeal porridge with a pinch of salt. Finally, as the elderly person’s preferences and needs change, for example, due to arthritis, dementia, or diabetes, the home-care workers aptly sense these changes and adjust appropriately.

The social aspects are essentially furnishing the major temporal rhythm with several other kinds of rhythms. These rhythms are
OpenDHT: A Public DHT Service and Its Uses

Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu
UC Berkeley and Intel Research
opendht@

ABSTRACT

Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually untrusting clients and among a variety of applications, poses two distinct challenges. First, there must be adequate control over storage allocation so that greedy or malicious clients do not use more than their fair share. Second, the interface to the DHT should make it easy to write simple clients, yet be sufficiently general to meet a broad spectrum of application requirements. In this paper we describe our solutions to these design challenges. We also report our early deployment experience with OpenDHT and describe the variety of applications already using the system.

Categories and Subject Descriptors
C.2 [Computer Communication Networks]: Distributed Systems

General Terms
Algorithms, Design, Experimentation, Performance, Reliability

Keywords
Peer-to-peer, distributed hash table, resource allocation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCOMM’05, August 21–26, 2005, Philadelphia, Pennsylvania, USA. Copyright 2005 ACM 1-59593-009-4/05/0008...$5.00.

1. MOTIVATION

Large-scale distributed systems are notoriously difficult to design, implement, and debug. Consequently, there is a long history of research that aims to ease the construction of such systems by providing simple primitives on which more sophisticated functionality can be built. One such primitive is provided by distributed hash tables, or DHTs, which support a traditional hash table’s simple put/get interface, but offer increased capacity and availability by partitioning the key space across a set of cooperating peers and replicating stored data.

While the DHT field is far from mature, we have learned a tremendous amount about how to design and build them over the past few years, and several well-debugged DHT implementations [1–3] are now readily available. Furthermore, several fully deployed DHT applications are now in daily use [15, 20, 26], and dozens more have been proposed and/or prototyped.

Maintaining a running DHT requires non-trivial operational effort. As DHT-based applications proliferate, a natural question to ask is whether every such application needs its own DHT deployment, or whether a shared deployment could amortize this operational effort across many different applications. While some applications do in fact make extremely sophisticated use of DHTs, many more access them through such a narrow interface that it is reasonable to expect they might benefit from a shared infrastructure.

In this paper, we report on our efforts to design and build OpenDHT (formerly named OpenHash [19]), a shared DHT deployment. Specifically, our goal is to provide a free, public DHT service that runs on PlanetLab [5] today. Longer-term, as we consider later in this paper, we envision that this free service could evolve into a competitive commercial market in DHT service.

Figure 1: OpenDHT Architecture.

Figure 1 shows the high-level architecture of OpenDHT. Infrastructure nodes run the OpenDHT server code. Clients are nodes outside the set of infrastructure nodes; they run application code that invokes the OpenDHT service using RPC. Besides participating in the DHT’s routing and storage, each OpenDHT node also acts as a gateway through which it accepts RPCs from clients.
Because OpenDHT operates on a set of infrastructure nodes, no application need concern itself with DHT deployment, but neither can it run application-specific code on these infrastructure nodes. This is quite different from most other uses of DHTs, in which the DHT code is invoked as a library on each of the nodes running the application. The library approach is very flexible, as one can put application-specific functionality on each of the DHT nodes, but each application must deploy its own DHT. The service approach adopted by OpenDHT offers the opposite tradeoff: less flexibility in return for less deployment burden. OpenDHT provides a home for applications more suited to this compromise.

The service approach not only offers a different tradeoff; it also poses different design challenges. Because of its shared nature, building OpenDHT is not the same as merely deploying an existing DHT implementation on PlanetLab. OpenDHT is shared in two different senses: there is sharing both among applications and among clients, and each raises a new design problem.

First, for OpenDHT to be shared effectively by many different applications, its interface must balance the conflicting goals of generality and ease-of-use. Generality is necessary to meet the needs of a broad spectrum of applications, but the interface should also be easy for simple clients to use. Ease-of-use argues for a fairly simple primitive, while generality (in the extreme) suggests giving raw access to the operating system (as is done in PlanetLab).¹ It is hard to quantify both ease-of-use and generality, so we rely on our early experience with OpenDHT applications to evaluate our design decisions. Not knowing what applications are likely to emerge, we can only conjecture about the required degree of generality.
Second, for OpenDHT to be shared by many mutually untrusting clients without their unduly interfering with each other, system resources must be allocated with care. While ample prior work has investigated bandwidth and CPU allocation in shared settings, storage allocation has been studied less thoroughly. In particular, there is a delicate tradeoff between fairness and flexibility: the system shouldn’t unnecessarily restrict the behavior of clients by imposing arbitrary and strict quotas, but it should also ensure that all clients have access to their fair share of service. Here we can evaluate prospective designs more quantitatively, and we do so with extensive simulations.

We summarize our solutions to these two design problems in Section 2. We then address in significantly more detail the OpenDHT interface (Section 3) and storage allocation algorithm (Section 4). Section 5 describes our early deployment experience, both in terms of raw performance and availability numbers, and the variety of applications currently using the system. Section 6 concludes with a discussion of various economic concerns that may affect the design and deployment of services like OpenDHT.

2. OVERVIEW OF DESIGN

Before delving into the details of OpenDHT in subsequent sections, we first describe the fundamental rationale for the designs we chose for the system’s interface and storage allocation mechanism.

2.1 Interface

In designing OpenDHT, we have the conflicting goals of generality and ease-of-use (which we also refer to as simplicity). There are three broad classes of interfaces in the DHT literature, and they each occupy very different places on the generality/simplicity spectrum (a slightly different taxonomy is described in [11]). Given a key, these interfaces provide three very different capabilities:

routing: Provides general access to the DHT node responsible for the input key, and to each node along the DHT routing path.
lookup: Provides general access to the DHT node responsible for the input key.
storage: Directly supports the put(key, value) and get(key) operations by routing them to the DHT node responsible for the input key, but exposes no other interface.

The routing model is the most general interface of the three; a client is allowed to invoke arbitrary code at the endpoint and at every node along the DHT path towards that endpoint (either through upcalls or iterative routing). This interface has been useful in implementing DHT-based multicast [7] and anycast [34].

¹ One might argue that PlanetLab solves the problems we are posing by providing extreme resource control and a general interface. But PlanetLab is hard for simple clients to use, in that every application must install software on each host and ensure its continued operation. For many of the simple applications we describe in Section 5.3, this effort would be inappropriately burdensome.

The lookup model is somewhat less general, only allowing code invocation on the endpoint. This has been used for query processing [17], file systems [9, 23], and packet forwarding [31].

The true power of the routing and lookup interfaces lies in the application-specific code running on the DHT nodes. While the DHT provides routing to the appropriate nodes, it is the application-specific code that does the real work, either at each hop en route (routing) or only at the destination (lookup). For example, such code can handle forwarding of packets (e.g., multicast and i3 [31]) or data processing (e.g., query processing).

The storage model is by far the least flexible, allowing no access to application-specific code and only providing the put/get primitives. This lack of flexibility greatly limits the spectrum of applications it can support, but in return this interface has two advantages: it is simple for the service to support, in that the DHT infrastructure need not deal with the vagaries of application-specific code running on each of its nodes, and it is also simple for
application developers and deployers to use, freeing them from the burden of operating a DHT when all they want is a simple put/get interface.

In the design of OpenDHT, we place a high premium on simplicity. We want an infrastructure that is simple to operate, and a service that simple clients can use. Thus the storage model, with its simple put/get interface, seems most appropriate. To get around its limited functionality, we use a novel client library, Recursive Distributed Rendezvous (ReDiR), which we describe in detail in Section 3.2. ReDiR, in conjunction with OpenDHT, provides the equivalent of a lookup interface for any arbitrary set of machines (inside or outside OpenDHT itself). Thus clients using ReDiR achieve the flexibility of the lookup interface, albeit with a small loss of efficiency (which we describe later).

Our design choice reflects our priorities, but one can certainly imagine other choices. For instance, one could run a shared DHT on PlanetLab, with the DHT providing the routing service and PlanetLab allowing developers to run application-specific code on individual nodes. This would relieve these developers of operating the DHT, and still provide them with all the flexibility of the routing interface, but require careful management of the application-specific code introduced on the various PlanetLab nodes. We hope others explore this portion of the design space, but we are primarily interested in facilitating simple clients with a simple infrastructure, and so we chose a different design.

While there are no cut-and-dried metrics for simplicity and generality, early evidence suggests we have navigated the tradeoff between the two well. As we describe in greater detail in Section 5.1, OpenDHT is highly robust, and we firmly believe that the relative simplicity of the system has been essential to achieving such robustness. While generality is similarly difficult to assess, in Table 4 we offer a catalog of the diverse applications built on OpenDHT as evidence of the system’s broad
utility.

2.2 Storage Allocation

OpenDHT is essentially a public storage facility. As observed in [6, 30], if such a system offers the persistent storage semantics typical of traditional file systems, the system will eventually fill up with orphaned data. Garbage collection of this unwanted data seems difficult to do efficiently. To frame the discussion, we consider the solution to this problem proposed as part of the Palimpsest shared public storage system [30]. Palimpsest uses a novel revolving-door technique in which, when the disk is full, new stores push out the old. To keep their data in the system, clients re-put frequently enough so that it is never flushed; the required re-put rate depends on the total offered load on that storage node. Palimpsest uses per-put charging, which in this model becomes an elegantly simple form of congestion pricing to provide fairness between users (those willing to pay more get more).

While we agree with the basic premise that public storage facilities should not provide unboundedly persistent storage, we are reluctant to require clients to monitor the current offered load in order to know how often to re-put their data. This adaptive monitoring is complicated and requires that clients run continuously. In addition, Palimpsest relies on charging to enforce some degree of fairness; since OpenDHT is currently deployed in an environment where such charging is both impractical and impolitic, we wanted a way to achieve fairness without an explicit economic incentive.

Our goals for the OpenDHT storage allocation algorithm are as follows. First, to simplify life for its clients, OpenDHT should offer storage with a definite time-to-live (TTL). A client should know exactly when it must re-store its puts in order to keep them stored, so rather than adapting (as in Palimpsest), the client can merely set simple timers or forget its data altogether (if, for instance, the application’s need for the data will expire before the data itself).
Second, the allocation of storage across clients should be “fair” without invoking explicit charging. By fair we mean that, upon overload, each client has “equal” access to storage.² Moreover, we also mean fair in the work-conserving sense; OpenDHT should allow for full utilization of the storage available (thereby precluding quota-like policies), and should restrict clients only when it is overloaded.

Finally, OpenDHT should prevent starvation by ensuring a minimal rate at which puts can be accepted at all times. Without such a requirement, the system could allocate all its storage (fairly) for an arbitrarily long TTL, and then reject all storage requests for the duration of that TTL. Such “bursty” availability of storage would present an undue burden on OpenDHT clients.

In Section 4 we present an algorithm that meets the above goals. The preceding was an overview of our design. We next consider the details of the OpenDHT client interface, and thereafter, the details of storage allocation in OpenDHT.

3. INTERFACE

One challenge to providing a shared DHT infrastructure is designing an interface that satisfies the needs of a sufficient variety of applications to justify the shared deployment. OpenDHT addresses this challenge in two ways. First, a put/get interface makes writing simple applications easy yet still supports a broad range of storage applications. Second, the use of a client-side library called ReDiR allows more sophisticated interfaces to be built atop the base put/get interface. In this section we discuss the design of these interfaces. Section 5 presents their performance and use.

3.1 The put/get API

The OpenDHT put/get interface supports a range of application needs, from storage in the style of the Cooperative File System (CFS) [9] to naming and rendezvous in the style of the Host Identity Protocol (HIP) [21] and instant messaging.

The design goals behind the put/get interface are as follows. First, simple OpenDHT applications should be simple to write.
The value of a shared DHT rests in large part on how easy it is to use. OpenDHT can be accessed using either Sun RPC over TCP or XML RPC over HTTP; as such it is easy to use from most programming languages and works from behind most firewalls and NATs. A Python program that reads a key and value from the console and puts them into the DHT is only nine lines long; the complementary get program is only eleven.

² As in fair queuing, we can of course impose weighted fairness, where some clients receive a larger share of storage than others, for policy or contractual reasons. We do not pursue this idea here, but it would require only minor changes to our allocation mechanism.

Procedure | Functionality
put(k, v, H(s), t) | Write (k, v) for TTL t; can be removed with secret s
get(k) returns {(v, H(s), t)} | Read all v stored under k; returned value(s) unauthenticated
remove(k, H(v), s, t) | Remove (k, v) put with secret s; t > than TTL remaining for put
put-immut(k, v, t) | Write (k, v) for TTL t; immutable (k = H(v))
get-immut(k) returns (v, t) | Read v stored under k; returned value immutable
put-auth(k, v, n, t, K_P, σ) | Write (k, v), expires at t; public key K_P; private key K_S; can be removed using nonce n; σ = {H(k, v, n, t)}_{K_S}
get-auth(k, H(K_P)) returns {(v, n, t, σ)} | Read v stored under (k, H(K_P)); returned value authenticated
remove-auth(k, H(v), n, t, K_P, σ) | Remove (k, v) with nonce n; parameters as for put-auth

Table 1: The put/get interface. H(x) is the SHA-1 hash of x.

Second, OpenDHT should not restrict key choice. Previous schemes for authentication of values stored in a DHT require a particular relationship between the value and the key under which it is stored (e.g., [9, 14]). Already we know of applications that have key choice requirements that are incompatible with such restrictions; the prefix hash tree (PHT) [25] is one example. It would be unwise to impose similar restrictions on future applications.
Third, OpenDHT should provide authentication for clients that need it. A client may wish to verify that an authorized entity wrote a value under a particular key or to protect its own values from overwriting by other clients. As we describe below, certain attacks cannot be prevented without support for authentication in the DHT. Of course, our simplicity goal demands that authentication be only an option, not a requirement.

The current OpenDHT deployment meets the first two of these design goals (simplicity and key choice) and has some support for the third (authentication). In what follows, we describe the current interface in detail, then describe two planned interfaces that better support authentication. Table 1 summarizes all three interfaces.

Throughout, we refer to OpenDHT keys by k; these are 160-bit values, often the output of the SHA-1 hash function (denoted by H), though applications may assign keys in whatever fashion they choose. Values, denoted v, are variable-length, up to a maximum of 1 kB in size. All values are stored for a bounded time period only; a client specifies this period either as a TTL or an expiration time, depending on the interface.

Finally, we note that under all three interfaces, OpenDHT provides only eventual consistency. In the case of network partitions or excessive churn, the system may fail to return values that have been put or continue to return values that have been removed. Imperfect clock synchronization in the DHT may also cause values to expire at some replicas before others, leaving small windows where replicas return different results. While such temporary inconsistencies in theory limit the set of applications that can be built on OpenDHT, they have not been a problem to date.

3.1.1 The Current Interface

A put in OpenDHT is uniquely identified by the triple of a key, a value, and the SHA-1 hash of a client-chosen random secret up to 40 bytes in length. If multiple puts have the same key and/or value, all are stored by the DHT. A put with the same key, value, and
secret hash as an existing put refreshes its TTL. A get takes a key and returns all values stored under that key, along with their associated secret hashes and remaining TTLs. An iterator interface is provided in case there are many such values.

To remove a value, a client reveals the secret whose hash was provided in the put. A put with an empty secret hash cannot be removed. OpenDHT stores removes like puts, but a DHT node discards a put (k, v, H(s)) for which it has a corresponding remove. To prevent the DHT’s replication algorithms from recovering this put when the remove’s TTL expires, clients must ensure that the TTL on a remove is longer than the TTL remaining on the corresponding put. Once revealed in a remove, a secret should not be reused in subsequent puts. To allow other clients to remove a put, a client may include the encrypted secret as part of the put’s value.

To change a value in the DHT, a client simply removes the old value and puts a new one. In the case where multiple clients perform this operation concurrently, several new values may end up stored in the DHT. In such cases, any client may apply an application-specific conflict resolution procedure to decide which of the new values to remove. So long as this procedure is a total ordering of the possible input values, it does not matter which client performs the removes (or even if they all do); the DHT will store the same value in the end in all cases. This approach is similar to that used by Bayou [24] to achieve eventual consistency.

Since OpenDHT stores all values put under a single key, puts are robust against squatting, in that there is no race to put first under a valuable key (e.g., H(“”)). To allow others to authenticate their puts, clients may digitally sign the values they put into the DHT. In the current OpenDHT interface, however, such values remain vulnerable to a denial-of-service attack we term drowning: a malicious client may put a vast number of values under a key, all of which will be stored, and thereby force other
clients to retrieve a vast number of such chaff values in the process of retrieving legitimate ones.

3.1.2 Planned Interfaces

Although the current put/get interface suffices for the applications built on OpenDHT today, we expect that as the system gains popularity developers will value protection against the drowning attack. Since this attack relies on forcing legitimate clients to sort through chaff values put into the DHT by malicious ones, it can only be thwarted if the DHT can recognize and reject such chaff. The two interfaces below present two different ways for the DHT to perform such access control.

Immutable puts: One authenticated interface we plan to add to OpenDHT is the immutable put/get interface used in CFS [9] and Pond [28], for which the DHT only allows puts where k = H(v). Clearly, such puts are robust against squatting and drowning. Immutable puts will not be removable; they will only expire. The main limitation of this model is that it restricts an application’s ability to choose keys.

Signed puts: The second authenticated interface we plan to add to OpenDHT is one where values put are certified by a particular public key, as used for root blocks in CFS. In these puts, a client employs a public/private key pair, denoted K_P and K_S, respectively. We call H(K_P) the authenticator.

Procedure | Functionality
join(host, id, namespace) | adds (host, id) to the list of hosts providing functionality of namespace
lookup(key, namespace) | returns (host, id) in namespace whose id most immediately follows key

Table 2: The lookup interface provided using ReDiR.

In addition to a key and value, each put includes: a nonce n that can be used to remove the value later; an expiration time t in seconds since the epoch; K_P itself; and σ = {H(k, v, n, t)}_{K_S}, where {X}_{K_S} denotes the digital signing of X with K_S. OpenDHT checks that the digital signature verifies using K_P; if not, the put is rejected.
This invariant ensures that the client that sent a put knows K_S.

A get for an authenticated put specifies both k and H(K_P), and returns only those values stored that match both k and H(K_P). In other words, OpenDHT only returns values signed by the private key matching the public key whose hash is in the get request. Clients may thus protect themselves against the drowning attack by telling the DHT to return only values signed by an entity they trust.

To remove an authenticated put with (k, v, n), a client issues a remove request with (k, H(v), n). As with the current interface, clients must take care that a remove expires after the corresponding put. To re-put a value, a client may use a new nonce n′ ≠ n. We use expiration times rather than TTLs to prevent expired puts from being replayed by malicious clients. As with the current interface, puts with the same key and authenticator but different values will all be stored by the DHT, and a new put with the same key, authenticator, value, and nonce as an existing put refreshes its TTL.
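As a rough illustration of the current put/get/remove semantics of Section 3.1.1 (not the planned authenticated interfaces), the following single-process Python sketch models how puts are identified by the triple (key, value, secret hash), how a repeated put refreshes the TTL, and how a remove that reveals the secret hides the matching put. It is not OpenDHT code: the class name `ToyDHT`, its method signatures, and the injectable `clock` parameter are assumptions made for this example, and replication, storage allocation, and the RPC layer are omitted.

```python
import hashlib
import time


def sha1(data: bytes) -> bytes:
    """H(x): the SHA-1 hash of x, as used in Table 1."""
    return hashlib.sha1(data).digest()


class ToyDHT:
    """Hypothetical single-node model of put/get/remove (no replication)."""

    MAX_VALUE_BYTES = 1024  # values are limited to 1 kB

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be exercised in tests
        self._puts = {}       # (key, value, secret_hash) -> expiry time
        self._removes = {}    # (key, value_hash, secret_hash) -> expiry time

    def put(self, key: bytes, value: bytes, secret_hash: bytes, ttl: float) -> None:
        if len(value) > self.MAX_VALUE_BYTES:
            raise ValueError("value exceeds 1 kB")
        # A put with the same (key, value, secret hash) triple refreshes its TTL;
        # any other triple is stored alongside the existing ones.
        self._puts[(key, value, secret_hash)] = self._clock() + ttl

    def remove(self, key: bytes, value_hash: bytes, secret: bytes, ttl: float) -> None:
        # The client reveals the secret; the remove is stored like a put.
        # A put made with an empty secret hash can never match sha1(secret).
        self._removes[(key, value_hash, sha1(secret))] = self._clock() + ttl

    def get(self, key: bytes):
        """Return [(value, secret_hash, remaining_ttl)] for all live puts under key."""
        now = self._clock()
        results = []
        for (k, value, secret_hash), expiry in self._puts.items():
            if k != key or expiry <= now:
                continue
            remove_expiry = self._removes.get((k, sha1(value), secret_hash))
            if remove_expiry is not None and remove_expiry > now:
                continue  # hidden by a live remove
            results.append((value, secret_hash, expiry - now))
        return results
```

In this model a put becomes visible again if its remove expires first, which mirrors the reason the text asks clients to give a remove a TTL longer than the time remaining on the corresponding put.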
Authenticated puts in OpenDHT are similar to those used for public-key blocks in CFS [9], for sfrtags in SFR [33], for fileIds in PAST [14], and for AGUIDs in Pond [28]. Like SFR and PAST, OpenDHT allows multiple data items to be stored using the same public key. Unlike CFS, SFR, and PAST, OpenDHT gives applications total freedom over key choice (a particular requirement in a generic DHT service).

3.2 ReDiR

While the put/get interface is simple and useful, it cannot meet the needs of all applications. Another popular DHT interface is lookup, which is summarized in Table 2. In this interface, nodes that wish to provide some service (packet forwarding, for example) join a DHT dedicated to that service. In joining, each node is associated with an identifier id chosen from a key space, generally [0 : 2^160). To find a service node, a client performs a lookup, which takes a key chosen from the identifier space and returns the node whose identifier most immediately follows the key; lookup is thus said to implement the successor relation.

For example, in i3 [31], service nodes provide a packet forwarding functionality to clients. Clients create (key, destination) pairs called triggers, where the destination is either another key or an IP address and port. A trigger (k, d) is stored on the service node returned by lookup(k), and this service node forwards all packets it receives for key k to d. Assuming, for example, that the nodes A through F in Figure 2 are i3 forwarding nodes, a trigger with key B ≤ k < C would be managed by service node C.

The difficulty with lookup for a DHT service is the functionality implemented by those nodes returned by the lookup function.
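The successor relation that lookup implements can be sketched in a few lines of Python. This is only an illustration of the relation itself, not of ReDiR or any OpenDHT code: the function name `lookup`, the `(id, host)` tuple layout, the small integer identifiers, and the wrap-around to the lowest identifier when no identifier follows the key are all assumptions of this example.

```python
import bisect


def lookup(key, joined):
    """Return the (id, host) pair whose id most immediately follows key.

    `joined` is an iterable of (id, host) pairs for one namespace. If no id
    is greater than key, this sketch wraps around to the smallest id
    (an assumption; the text does not spell out the boundary case).
    """
    ordered = sorted(joined)
    if not ordered:
        raise ValueError("no hosts have joined this namespace")
    ids = [node_id for node_id, _host in ordered]
    # bisect_right finds the first id strictly greater than key, so a key
    # equal to B's id is handled by the next node C, matching B <= k < C.
    index = bisect.bisect_right(ids, key)
    return ordered[index % len(ordered)]
```

With hypothetical nodes `[(10, "A"), (20, "B"), (30, "C")]`, keys 20 through 29 map to C and a key of 35 wraps to A. A real deployment would draw identifiers from the 160-bit key space rather than small integers.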
Rather than install application-specific functionality into the service, thereby certainly increasing its complexity and possibly reducing its robustness, we prefer that such functionality be supported outside the DHT, while leveraging the DHT itself to perform lookups. OpenDHT accomplishes this separation through the use of a client-side library called ReDiR. (An alternative approach, where application-specific code may only be placed on subsets of nodes within the DHT, is described in [18].) By using the ReDiR library, clients can use OpenDHT to route by key among these application-specific nodes. However, because ReDiR interacts with OpenDHT only through the put/get API, the OpenDHT server-side implementation retains the simplicity of the put/get interface.

Figure 2: An example ReDiR tree with branching factor b = 2. Each tree node is shown as a contiguous line representing the node’s interval of the keyspace, and the two intervals associated with each node are separated by a tick. The names of registered application hosts (A through F) are shown above the tree nodes at which they would be stored. (Figure labels: Level 0 through Level 3, client keys, client addresses.)

A DHT supporting multiple separate applications must distinguish them somehow; ReDiR identifies each application by an arbitrary identifier, called its namespace. Client nodes providing application-specific functionality join a namespace, and other clients performing lookups do so within a namespace. A ReDiR lookup on identifier k in namespace n returns the node that has joined n whose identifier most immediately follows k.

A simple implementation of lookup could be achieved by storing the IP addresses and ports of all nodes that have joined a namespace n under key n; lookups could then be performed by getting all the nodes under key n and searching for the successor to the key looked up. This implementation, however, scales linearly in the number of nodes that join. To implement lookup more
efficiently, ReDiR builds a two-dimensional quad-tree of the nodes that have joined and embeds it in OpenDHT using the put/get interface. (Footnote 3: The implementation of ReDiR we describe here is an improvement on our previous algorithm [19], which used a fixed tree height.) Using this tree, ReDiR performs lookup in a logarithmic number of get operations with high probability, and by estimating the tree’s height based on past lookups, it reduces the average lookup to a constant number of gets, assuming uniform-random client IDs.

The details are as follows: each tree node is a list of (IP, port) pairs for a subset of the clients that have joined the namespace. An example embedding is shown in Figure 2. Each node in the tree has a level, where the root is at level 0, its immediate children are at level 1, etc. Given a branching factor of b, there are thus at most b^i nodes at level i. We label the nodes at any level from left to right, such that a pair (i, j) uniquely identifies the jth node from the left at level i, and 0 ≤ j < b^i. This tree is then embedded in OpenDHT node by node, by putting the value(s) of node (i, j) at key H(ns, i, j). The root of the tree for the i3 application, for example, is stored at H(“i3”, 0, 0). Finally, we associate with each node (i, j) in the tree b intervals of the DHT keyspace

$$\left[\, 2^{160}\, b^{-i} \left( j + \frac{b'}{b} \right),\; 2^{160}\, b^{-i} \left( j + \frac{b'+1}{b} \right) \right) \quad \text{for } 0 \le b' < b.$$

We sketch the registration process here. Define I(ℓ, k) to be the (unique) interval at level ℓ that encloses key k. Starting at some level ℓ_start that we define later, a client with identifier v_i does an OpenDHT get to obtain the contents of the node associated with I(ℓ_start, v_i). If after adding v_i to the list of (IP, port) pairs, v_i is now the numerically lowest or highest among the keys stored in that node, the client continues up the tree towards the root, getting the contents and performing an OpenDHT put in the nodes associated with each interval I(ℓ_start − 1, v_i), I(ℓ_start − 2, v_i), ..., until it reaches either the root (level 0) or a level at
which v_i is not the lowest or highest in the interval. It also walks down the tree through the tree nodes associated with the intervals I(ℓ_start, v_i), I(ℓ_start + 1, v_i), ..., at each step getting the current contents, and putting its address if v_i is the lowest or highest in the interval. The downward walk ends when it reaches a level in which it is the only client in the interval. Finally, since all state is soft (with TTLs of 60 seconds in our tests), the entire registration process is repeated periodically until the client leaves the system.

A lookup(ns, k) is similar. We again start at some level ℓ = ℓ_start. At each step we get the current interval I(ℓ, k) and determine where to look next as follows:

1. If there is no successor of k stored in the tree node associated with I(ℓ, k), then its successor must occur in a larger range of the keyspace, so we set ℓ ← ℓ − 1 and repeat, or fail if ℓ = 0.
2. If k is sandwiched between two client entries in I(ℓ, k), then the successor must lie somewhere in I(ℓ, k). We set ℓ ← ℓ + 1 and repeat.
3. Otherwise, there is a client s stored in the node associated with I(ℓ, k) whose identifier v_s succeeds k, and there are no clients with IDs between k and v_s. Thus, v_s must be the successor of k, and the lookup is done.

A key point in our design is the choice of starting level ℓ_start. Initially ℓ_start is set to a hard-coded constant (2 in our implementation). Thereafter, for registrations, clients take ℓ_start to be the lowest level at which registration last completed. For lookups, clients record the levels at which the last 16 lookups completed and take ℓ_start to be the mode of those depths. This technique allows us to adapt to any number of client nodes while usually hitting the correct depth (Case 3 above) on the first try.

We present a performance analysis of ReDiR on PlanetLab in Section 5.2.

4. STORAGE ALLOCATION

In Section 2.2, we presented our design goals for the OpenDHT storage allocation algorithm: that it provide storage with a definite time-to-live (TTL), that it
allocate that storage fairly between clients and with high utilization, and that it avoid long periods in which no space is available for new storage requests. In this section we describe an algorithm, Fair Space-Time (FST), that meets these design goals. Before doing so, though, we first consider two choices we made while defining the storage allocation problem.

First, in this initial incarnation of OpenDHT, we equate “client” with an IP address (spoofing is prevented by TCP’s three-way handshake). This technique is clearly imperfect: clients behind the same NAT or firewall compete with each other for storage, mobile clients can acquire more storage than others, and some clients (e.g., those that own class A address spaces) can acquire virtually unlimited storage. To remedy this situation, we could clearly use a more sophisticated notion of client (person, organization, etc.) and require each put to be authenticated at the gateway. However, to be completely secure against the Sybil attack [13], this change would require formal identity allocation policies and mechanisms. In order to make early use of OpenDHT as easy as possible, and to prevent administrative hassles for ourselves, we chose to start with the much more primitive per-IP-address allocation model, and we hope