Evaluation Plan of the 5th China Workshop on Machine Translation (CWMT), 2009-06-25 (Chinese, final draft)


Research on Evaluation Methods in Machine Translation

Machine translation (MT) is an important research direction in computer science and artificial intelligence, aiming to automatically translate text from one natural language into another.

As the demand for multilingual communication has grown, machine translation technology has steadily matured.

However, because natural language is complex and ambiguous, MT systems still exhibit a degree of inaccuracy and error.

Reliable methods and metrics for evaluating machine translation quality are therefore essential.

This article introduces the main methods and metrics for MT evaluation and discusses their strengths and weaknesses.

Approaches to evaluating MT quality fall into two broad categories: human evaluation and automatic evaluation.

I. Human Evaluation

Human evaluation assesses machine translation output through human judgment.

Typically, professional linguists or translators are invited to judge and score the translations.

Common human evaluation methods include:

1. Reference-based evaluation: the MT output is compared against one or more reference translations produced by professional translators.

Assessors judge the output against a fixed scoring rubric and assign a score.

The advantage of this method is that its results are dependable and it yields an accurate assessment.

However, reference-based human evaluation demands substantial time and labor, and its results are influenced by the assessors' subjectivity, which limits objectivity.

2. Source-based evaluation: this method does not rely on reference translations; assessors judge the MT output directly against the source text.

Assessors judge and score the MT output according to how accurately and completely it conveys the source.

Its advantage is that it avoids the cost of producing reference translations, but the results remain subject to assessor subjectivity.

3. Adversarial evaluation: this method assesses the robustness of an MT system by simulating adversarial conditions that arise in real translation scenarios.

Assessors apply deliberate, meaning-preserving perturbations to the input and check how sensitive the system's output is to them.

This approach can assess the stability and robustness of an MT system, but it is comparatively complex and time-consuming.
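A sensitivity probe of this kind can be sketched as follows. The `translate` function is a hypothetical stand-in for the system under test, and adjacent-word swaps stand in for whatever meaning-preserving perturbations an assessor would actually use:

```python
# Sketch of an adversarial-evaluation probe. `translate` is a hypothetical
# callable for the MT system under test, not a real API.
import difflib

def perturb(sentence):
    """Yield small perturbations of the input: swaps of adjacent words."""
    tokens = sentence.split()
    for i in range(len(tokens) - 1):
        swapped = tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
        yield " ".join(swapped)

def sensitivity(sentence, translate):
    """Average output divergence (1 - string similarity) across perturbed inputs.
    A robust system should produce similar translations for similar inputs."""
    base = translate(sentence)
    scores = []
    for variant in perturb(sentence):
        out = translate(variant)
        sim = difflib.SequenceMatcher(None, base, out).ratio()
        scores.append(1.0 - sim)
    return sum(scores) / len(scores) if scores else 0.0
```

A system whose output is unchanged by the perturbations scores 0.0; larger values indicate higher sensitivity.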

The strength of human evaluation is that its results are accurate and dependable, providing a fairly precise assessment of machine translation quality.

Wuxi 5th Translation Competition: Final-Round Questions and Reference Answers

(Undergraduate Group) English-to-Chinese, Question 1

Is e-mail a blessing or a curse? Last month, after a week's vacation, I discovered 1,218 unread e-mail messages waiting in my IN box. I pretended to be dismayed, but secretly I was pleased. This is how we measure our wired worth in the late 1990s — if you aren't overwhelmed by e-mail, you must be doing something wrong.

E-mail is enabling radically new forms of worldwide human collaboration. Those 225 million people who can send and receive it represent a network of potentially cooperating individuals dwarfing anything that even the mightiest corporation or government can muster. Mailing-list discussion groups and online conferencing allow us to gather together to work on a multitude of projects that are interesting or helpful to us — to pool our collective efforts in a fashion never before possible. The most obvious place to see this collaboration right now is in the world of software. For decades, programmers have used e-mail to collaborate on projects. With increasing frequency, this collaboration is occurring across company lines, and often without even the spur of commercial incentives. It's happening largely because it can — it's relatively easy for a thousand programmers to collectively contribute to a project using e-mail and the Internet. Perhaps each individual contribution is small, but the scale of the Internet multiplies all efforts dramatically.

Reference answer (the Chinese translation begins): 电子邮件是福是祸?上个月,在一周休假之后,我在收件箱中发现了1,218封未读邮件。

Automatic Evaluation Metrics in Interactive Machine Translation

Interactive machine translation (IMT) is an approach in which a human and a machine collaborate in real time to complete the translation task.

Automatic evaluation metrics are the tools an IMT system uses to measure translation quality.

This article discusses the application of automatic evaluation metrics in interactive machine translation in detail.

I. Introduction

Machine translation (MT) is an important natural language processing task and has made remarkable progress.

However, conventional MT systems still suffer from errors and inaccuracies.

Interactive machine translation has been proposed as a new approach to improving translation quality.

In IMT, the machine only supplies candidate translations, and the human selects the best one from among them.

As tools for measuring translation quality, automatic evaluation metrics play an important role in IMT.

II. Definition and Classification of Automatic Evaluation Metrics

Automatic evaluation metrics measure translation quality algorithmically, without human judgment.

Common automatic metrics fall into two classes: reference-based metrics and reference-free metrics.

Reference-based metrics assess quality by comparing the MT output with one or more reference translations.

The most widely used such metric is BLEU (Bilingual Evaluation Understudy).

BLEU is built on n-gram matching: it computes the n-gram overlap between the MT output and the references.

A higher BLEU score generally indicates better translation quality.
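The n-gram overlap computation can be sketched as follows. This is a simplified sentence-level BLEU against a single reference; real BLEU is usually computed at the corpus level and handles zero n-gram counts with smoothing rather than the small floor used here:

```python
# Simplified sentence-level BLEU sketch (single reference, no smoothing).
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Geometric mean of modified (clipped) n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty that punishes short candidates."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Small floor so that a zero precision does not zero out the whole score.
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An output identical to the reference scores 1.0; missing or repeated n-grams pull the geometric mean down sharply.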

There is also METEOR (Metric for Evaluation of Translation with Explicit ORdering), which takes word forms, word order, and semantic information into account and evaluates quality by the degree of match between the output and the references.

Reference-free metrics, by contrast, estimate translation quality from properties of the output itself, without consulting a reference.

Two further widely used metrics are TER (Translation Edit Rate) and PER (Position-independent Error Rate). Strictly speaking, both are also computed against a reference: they count the edit operations needed to turn the MT output into the reference, with PER ignoring word order.
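Both metrics are edit-based; minimal sketches of simplified versions follow. True TER also allows phrase shifts, which this word-level Levenshtein approximation omits, and PER is shown in one common simplified bag-of-words form:

```python
# Simplified PER and TER sketches (single reference, word-level).
from collections import Counter

def per(candidate, reference):
    """Position-independent error rate: fraction of reference words not
    matched by the candidate, ignoring word order entirely."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    matched = sum(min(c, ref[w]) for w, c in cand.items())
    return 1.0 - matched / max(sum(ref.values()), 1)

def ter(candidate, reference):
    """Approximate TER: word-level Levenshtein distance (insertions,
    deletions, substitutions) divided by the reference length."""
    c, r = candidate.split(), reference.split()
    d = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(len(c) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if c[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(c)][len(r)] / max(len(r), 1)
```

Lower is better for both: a perfect match scores 0.0, and reordering a sentence leaves PER unchanged but raises TER.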

Research on Evaluation Metrics in Interactive Machine Translation

Evaluation metrics are a key research topic in interactive machine translation; they are used to assess the performance and quality of MT systems.

This article surveys research progress on evaluation metrics for IMT and describes and discusses the commonly used metrics in detail.

I. Background

Machine translation (MT) is an important research direction in computational linguistics and artificial intelligence.

With advances in technology and society, MT systems have made breakthrough progress toward practical and commercial use.

However, because language is complex and diverse, MT systems still exhibit problems such as syntactic errors, semantic errors, and contextual incoherence, which hinder their application and adoption.

To address these problems, interactive machine translation (IMT) has emerged as a new direction within the field.

IMT combines human knowledge with an MT system: through interaction, the human corrects and improves the machine's output, raising both translation quality and system performance.

Evaluation metrics play a crucial role in IMT: they objectively assess the quality and performance of the output and guide system improvement.

The commonly used metrics are described and discussed below.

II. Common Evaluation Metrics

1. BLEU

BLEU (Bilingual Evaluation Understudy) is a widely used MT evaluation metric that scores translation quality by the similarity between a candidate translation (the MT output) and reference translations (correct human translations).

BLEU is computed from n-gram match rates and phrase coverage; varying the n-gram order lets it flexibly weigh the accuracy and fluency of the output.

2. METEOR

METEOR (Metric for Evaluation of Translation with Explicit ORdering) is another common MT evaluation metric, computed from word-level matches and sequence-level alignment.

METEOR accounts for the importance of word order, capturing the ordering and fluency of the output, and handles synonym and near-synonym matches well.
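The core of METEOR, a recall-weighted harmonic mean of unigram precision and recall with a fragmentation penalty, can be sketched as follows. This simplified version uses exact matches and a greedy alignment only, with the constants from the original METEOR paper; the full metric also matches stems and synonyms:

```python
# Simplified METEOR sketch: exact unigram matches, greedy alignment.
def meteor(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    # Greedy left-to-right alignment on exact matches (a simplification of
    # METEOR's alignment search).
    ref_used = [False] * len(ref)
    align = []
    for i, w in enumerate(cand):
        for j, r in enumerate(ref):
            if not ref_used[j] and w == r:
                ref_used[j] = True
                align.append((i, j))
                break
    m = len(align)
    if m == 0:
        return 0.0
    precision, recall = m / len(cand), m / len(ref)
    # Recall-weighted harmonic mean (original METEOR constants).
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # Chunks: maximal runs of matches contiguous in both candidate and reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(align, align[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)
```

The fragmentation penalty is what makes the metric order-sensitive: a scrambled candidate matches the same unigrams but splits into more chunks and scores lower.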

ACM Distinguished Scientist

(Awarded to roughly the top 10% of members; the second-highest grade of ACM honor.) Format: name, degree and institution, current affiliation, year named.

University of Science and Technology of China (3):
- 熊辉, USTC B.S. 1994, Rutgers University (USA), 2014
- 李宁辉, USTC B.S. 1993, Professor, Purdue University (USA), 2012
- 胡禹, USTC B.S. 1989, Professor, Purdue University (USA), 2010

Peking University (2):
- 杨强, PKU B.S. 1982, Professor, Hong Kong University of Science and Technology, 2011
- 周源源, PKU B.S. 1992, Chair Professor, UCSD (USA), 2011

Tsinghua University (2):
- 李向阳, Tsinghua B.S. 1995, Professor, Illinois Institute of Technology, 2014
- 刘杰, Tsinghua B.S. 1996, Microsoft Research, 2011

Nanjing University (2):
- 周志华, NJU B.S. 1996, Professor, Nanjing University, 2013
- 翟成祥, NJU B.S. 1984, Department of Computer Science, UIUC (USA), 2009

Zhejiang University (2):
- 诸葛海, ZJU Ph.D. 1992, Institute of Computing Technology, Chinese Academy of Sciences, 2010
- 王飞跃, ZJU M.S. 1984, Institute of Automation, Chinese Academy of Sciences, 2007

Shanghai Jiao Tong University (2):
- 刘欢, SJTU B.S. 1983, Arizona State University, 2010
- 赵峰, SJTU B.S. (year unknown), Microsoft, 2006

Southeast University (2):
- 朱强, SEU B.S. 1982, University of Michigan (USA), 2013
- 芮勇, SEU B.S. 1991, Deputy Director, Microsoft Research Asia, 2009

Beijing University of Technology (2):
- 李利, BJUT B.S. 1993, Bell Labs (USA), 2014
- 刘晓文, BJUT B.S. 1993, Florida International University (USA), 2014

Xidian University (2):
- 张良杰, Xidian B.S. 1990, IBM Research headquarters (USA), 2009
- 翟树民, Xidian B.S. 1982, Google, 2006

Fudan University (1):
- 周雪, Fudan B.S. 1989, IBM Research headquarters (USA), 2009

National University of Defense Technology (1):
- 朱文武, NUDT B.S. 1985, Professor, Tsinghua University, 2012

Others:
- 王飞跃, Qingdao University of Science and Technology B.S. 1982, Institute of Automation, Chinese Academy of Sciences, 2007
- 张良杰, Xi'an Jiaotong University M.S. 1992, IBM Research headquarters (USA), 2008
- 陈建二, Central South University B.S. 1982, University of Notre Dame (USA), 2014
- 陈子仪, Wuhan University (entered 1980, did not graduate), University of Notre Dame (USA), 2014
- 徐常胜, Tsinghua University Ph.D. 1996 (undergraduate institution unknown), Institute of Automation, Chinese Academy of Sciences, 2012

(In the original document, members currently employed at universities in mainland China were all marked in red.)

Baidu Inc., Beijing, China


Generating Recommendation Evidence Using Translation Model

Jizhou Huang†,‡,*, Shiqi Zhao‡, Shiqiang Ding‡, Haiyang Wu‡, Mingming Sun‡, Haifeng Wang‡
†Harbin Institute of Technology, Harbin, China
‡Baidu Inc., Beijing, China
{huangjizhou01, zhaoshiqi, dingshiqiang01, wuhaiyang, sunmingming01, wanghaifeng}@
(*Corresponding author.)

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

Abstract

Entity recommendation, which provides entity suggestions relevant to the query that a user is searching for, has become a key feature of today's web search engines. Despite the fact that related entities are relevant to users' search queries, users sometimes cannot easily understand the recommended entities without evidences. This paper proposes a statistical model consisting of four sub-models to generate evidences for entities, which can help users better understand each recommended entity and figure out the connections between the recommended entities and a given query. The experiments show that our method is domain independent and can generate catchy and interesting evidences in the application of entity recommendation.

1 Introduction

Over the past few years, major commercial web search engines have enriched and improved users' experiences of information retrieval by presenting recommended entities related to their search queries beside the regular search results. Figure 1 shows an example of the Baidu web search engine's recommendation results for the query "Obama". On the panel, a ranked list of celebrities related to "Obama" is presented, providing users quick access to entities closely attached to their interests and enhancing their information discovery experience.

(Figure 1: An example of the Baidu web search engine's recommendations for the query "Obama". The evidences are presented under each entity.)

Related entity recommendations can increase users' engagement by evoking their interest in these entities and thereby extending their search sessions [Aggarwal et al., 2015]. However, if users have no background knowledge of a recommended entity, they may be confused and leave without exploring these entities. To help users quickly understand whether, and why, the entities can meet their interests, it is important to provide evidences for the recommended entities. As depicted in Figure 1, the phrase under each entity, which we refer to as the "recommendation evidence" ("evidence" for short later in this paper), provides users a quick overview of the representative features of each entity.[1] Using the third recommended entity "Oh Han Ma" as an example, its evidence is "Obama's Korean name". Without this evidence, users may get confused and think it is an unrelated recommendation, because this entity is unfamiliar. Therefore, presenting evidence for a recommendation can help build and promote trust between the users and the recommendation system [Voskarides et al., 2015]. In this paper, we focus on generating evidences from sentences. Selecting appropriate evidences for each recommended entity related to a search query is left for future work.

Although there is some previous work on entity recommendation systems, e.g., [Blanco et al., 2013; Yu et al., 2014; Bi et al., 2015], the problem of providing evidence for recommendation has not been well addressed. To the best of our knowledge, little research has been published on automatically generating evidences for recommended entities. Although extracting evidences from structured data is feasible, the coverage is insufficient. Take "Obama" as an example: the disambiguation texts of entities in Wikipedia,[2] which provide useful and key information to help disambiguate different mentions of an entity, can be used as evidences for recommendation. For example, from the following disambiguation texts for Obama (EXAMPLE 1 and 2), we could extract "the highest point in Antigua and Barbuda" as an evidence for the entity "Mount Obama" and "the 44th and current President of the United States" for "Barack Obama". But the method does not work for the entity "Oh Han Ma", since no such information is available in Wikipedia.

EXAMPLE 1. Barack Obama (born 1961) is the 44th and current President of the United States
EXAMPLE 2. Mount Obama, the highest point in Antigua and Barbuda

[1] We translate both Chinese entities and evidences into English to make them more understandable.
[2] https:///wiki/Obama(disambiguation)

We also investigated the largest Chinese online encyclopedia, Baidu Baike,[3] to examine the coverage of disambiguation texts in Chinese. Results show that only 3.4% of entities in Wikipedia and 5.9% of entities in Baike have such disambiguation information, which shows that directly extracting evidences from such online resources is not sufficient. To provide users with a consistent experience, it is important to generate evidences for all recommended entities. In this paper, we propose to use statistical machine translation (SMT) techniques to generate evidences for the entities recommended in web search. We train a translation model on a query-title aligned corpus derived from the clickthrough data of the Baidu web search engine. Two additional feature functions are introduced into the translation model to produce attractive evidences.

The major contributions of this paper are summarized as follows:
1. We study the novel issue of evidence generation (EG) for entity recommendation.
2. We propose an SMT-based approach to generate evidence candidates for entities, and introduce two additional feature functions in the model to produce attractive evidences. The experimental results show that our approach is very promising.
3. Large-scale monolingual parallel data is essential for training EG models. We propose an efficient method to mine aligned sentence-evidence pairs from the clickthrough data of a search engine.

2 Problem Statement

In this paper, we generate recommendation evidences for an entity from sentences that contain the entity. To provide users a quick overview of the representative features of a given entity, we define entity evidence as follows: (1) the evidence must correctly describe the entity; (2) the evidence must be concise, so that it can be presented in a limited space;[4] (3) the evidence should be informative and attractive, so as to attract users to browse and click the recommended entity.

From this definition, we can see that the EG task requires shortening a sentence by deleting some less important words and/or replacing some phrases with other more concise and attractive phrases, and organizing the generated evidences in an accurate, fluent, and catchy manner. An example is depicted in Figure 2.

(Figure 2: An example of evidence generation.)
(Figure 3: Overview of the EG method.)

[3] /
[4] In this paper, we constrain the evidence to be no longer than 10 Chinese characters.

Given a sentence s, our task is to generate a ranked list of evidences E = {e1, e2, ..., ek} for s. Figure 3 shows an overview of our method. The EG method contains two components, i.e., sentence preprocessing and evidence generation. Sentence preprocessing mainly includes Chinese word segmentation [Peng et al., 2004], POS tagging [Gimenez and Marquez, 2004], and dependency parsing [McDonald et al., 2006] for the input sentences, as POS tags and dependency information are necessary for the following stages. Evidence generation (described in Section 3.2) generates evidences for the input sentences with a statistical machine translation model. The evidence generation model needs three data sources. Firstly, a sentence-evidence parallel corpus (S-E Pairs) is used to train the "translation" model and the language model (described in Sections 3.1 and 3.2). Furthermore, headlines of news articles (Headlines) and manually labeled evidences are used to train an attraction model (described in Sections 3.2 and 3.3) to increase the attraction of the generated evidences.

3 Evidence Generation Model

Despite the similarity between evidence generation (EG) and machine translation (MT), the statistical model used in SMT cannot be directly applied to EG, since there are some clear differences between them: (1) bilingual parallel data for SMT are easy to collect, whereas the large monolingual parallel data needed for EG are difficult to acquire; (2) SMT has a unique purpose, i.e., producing high-quality translations of the inputs, while EG aims at generating attractive and concise descriptions as evidences; (3) in SMT there is no limitation on the length of translations, whereas in EG the length of an evidence is strictly limited due to the layout constraint.

3.1 Sentence-evidence Parallel Data

To train the EG model, we need large-scale monolingual parallel data of sentence-evidence aligned pairs. Take the entity "Ilham Anas" in Figure 1 as an example; Table 1 shows several evidences (E) of this entity and the aligned sentences (S) from which the evidences are generated.

Table 1: Examples of aligned sentence-evidence pairs.
S: Ilham Anas is Indonesia's Obama look alike
E: the Obama's Indonesian look-alike
S: Ilham Anas shares a striking resemblance with US President Barack Obama
E: a striking resemblance to Obama

From the examples shown in Table 1, we can see that the evidences and sentences may use different language styles and vocabularies, so the generation model is required to bridge the gap between them. We view the title-query pairs derived from the clickthrough data of a search engine as sentence-evidence pairs, and use these data to construct the training corpus for the evidence generation model, for the following reasons:
1. The queries and evidences have similar language styles and vocabularies: the perplexity of a language model trained on 12 million random queries and tested on 200,000 sample evidences is 578, which indicates high language similarity between them according to the metric described in [Gao et al., 2010].
2. A query is parallel to the titles of documents clicked on for that query [Gao et al., 2010]; thus the query-title aligned corpus is a good source of monolingual parallel data.

The task of evidence identification from query logs can be viewed as a binary classification problem of distinguishing evidence from non-evidence. To construct the data for training the classifier, we collected the disambiguation texts of entities from Baidu Baike as seed evidences, obtaining 488,001 evidences in this way. To enrich the evidence data, we used a pattern-based method [Muslea, 1999] to extract more evidences from the main texts in Baidu Baike using the pattern "<entity> is <evidence>". Altogether, we collected 1,040,897 evidences as positive instances. We then extracted an equal number of random queries as negative instances, with the same length range as the positive instances. To construct the test set for the classifier, we randomly sampled 10% of the data from the positive and negative instances separately; the remaining 90% were used for training.

Maximum Entropy is selected as the classification model because it is an effective technique for text classification [Nigam et al., 1999]. The features used for training the evidence classifier are unigrams (U), bigrams (B), POS tags (P), and dependency parses (D). Table 2 shows the performance of the evidence classifier.

Table 2: Performance of the evidence classifier.
           U       U+B     U+B+P   U+B+P+D
Precision  87.9%   88.4%   90.0%   91.6%
Recall     90.2%   90.6%   90.8%   91.2%

To extract candidate sentence-evidence pairs, we used the evidence classifier to identify evidences from queries in six months of clickthrough data from the Baidu web search engine. To enrich the data, each title was segmented into multiple sub-titles using punctuation, forming multiple title-query pairs. Finally, similar to [Quirk et al., 2004], pairs were filtered out if they met any of the following rules:
- Title and query have no word overlap;
- Title and query are identical;
- The length[5] of the query is greater than 10 or less than 6;
- Title and query have significantly different lengths (the shorter is less than two-thirds the length of the longer).

A total of 55,149,076 title-query aligned pairs were obtained, which we used as the sentence-evidence
parallel corpus to train our evidence generation model. The mean edit distance [Levenshtein, 1966] over Chinese characters was 5.7; the mean lengths of sentences and evidences were 10.7 and 7.9, respectively.

[5] Throughout this paper, the length of sentences, evidences, and queries is the number of Chinese characters in them.

3.2 Evidence Generation Model

Our EG model contains four sub-models: a translation model, a language model, a length model, and an attraction model, which control the adequacy, fluency, length, and attraction of the evidences, respectively.[6]

[6] The EG model applies monotone decoding; it does not contain the reordering sub-model that is often used in SMT.

Translation Model (M1)

Evidence generation is a decoding process. Similar to [Zhao et al., 2009], the input sentence s is first segmented into a sequence of units s̄_1, ..., s̄_l, which are then "translated" into a sequence of units ē_1, ..., ē_l. Let (s̄_i, ē_i) be a pair of translation units, whose translation likelihood is computed with a score function tm(s̄_i, ē_i). The translation score between s and e then decomposes into:

  p_tm(s̄_1..l, ē_1..l) = [ Π_{i=1..l} tm(s̄_i, ē_i) ]^λ_tm

where λ_tm is the weight of the translation model. It is defined similarly to the translation model in SMT [Koehn et al., 2003].

Language Model (M2)

We use a trigram language model in this work. The language-model score of an evidence e is computed as:

  p_lm(e) = [ Π_{j=1..J} p(e_j | e_{j-2} e_{j-1}) ]^λ_lm

where J is the number of words of e, e_j is the j-th word of e, and λ_lm is the weight of the language model.

Length Model (M3)

We use a length-penalty function to generate short evidences whenever possible. To meet the requirement that an evidence be no longer than 10 Chinese characters, the length score of an evidence e is a piecewise function that is neutral up to the limit and penalizes longer outputs:

  p_lf(e) = 1 if N <= 10;  p_lf(e) = 1/(N - 10) if N > 10

where N is the number of Chinese characters of e.

Attraction Model (M4)

The attraction model prefers evidences that better achieve the requirements of entity recommendation described in Section 2. After analyzing a set of manually labeled entity evidences, we found that the attraction of e depends on three aspects: the vocabulary used, the language style, and the sentence structure. We use two sub-models to capture these aspects. The first is a special language model trained on headlines of news articles (M4-1 for short), which tries to generate catchy and interesting evidences with vocabularies and styles similar to headlines; the motivation is that news editors usually try their best to use attractive expressions when writing headlines. The second is a sentence structure model trained on human-annotated evidences (M4-2), which tries to generate evidences with the popular syntactic styles that users might prefer. Hence the attraction model is decomposed into:

  p_am(e) = p_hl(e)^λ_hl · p_ss(e)^λ_ss

where p_hl(e) is the headline language model and p_ss(e) is the sentence structure model. p_hl(e) is similar to p_lm(e), but trained on headlines. p_ss(e) is computed as:

  p_ss(e) = max_i K(T_e, T_{t_i})

where T_x is the dependency tree of sentence x, the t_i are the human-annotated evidences, and K(·,·) is the dependency tree kernel described in [Culotta and Sorensen, 2004], which measures the structural similarity between sentences.

We combine the four sub-models in a log-linear framework and obtain the EG model:

  log p(e|s) = λ_tm Σ_{i=1..l} log tm(s̄_i, ē_i) + λ_lm Σ_{j=1..J} log p(e_j | e_{j-2} e_{j-1}) + λ_lf log p_lf(e) + λ_hl Σ_{l=1..L} log p_hl(e_l | e_{l-2} e_{l-1}) + λ_ss log p_ss(e)

3.3 Resources for Training the Attraction Model

To train the attraction model, we need headlines and human-annotated evidences. We first extracted all headlines from three major Chinese news websites,[7] ranked the headlines by their click counts in the query logs, and kept the top-ranked 10 million headlines. To guide the EG model toward generating evidences with structures similar to human-composed evidences, we need a set of high-quality human-annotated evidences. We used a crowdsourcing method [Hsueh et al., 2009] to collect this set. We asked annotators to compose evidences for each sentence, then asked 5 different annotators to vote on each evidence with two options, acceptable or not; finally, evidences voted acceptable by more than 4 of the 5 annotators were kept. A total of 104,775 excellent evidences were obtained.

[7] (1) /, (2) /, and (3) /

3.4 Parameter Estimation

To estimate the parameters λ_tm, λ_lm, λ_lf, λ_hl, and λ_ss, we adopt minimum error rate training (MERT), which is popular in SMT [Och, 2003]. In SMT, the optimization objective function in MERT is usually BLEU [Papineni et al., 2002], which requires human references. To provide human-annotated evidences as references for each sentence, we asked 5 annotators to compose evidences for each sentence separately. We then invited 3 other judges to vote on each evidence, and evidences with at least two agreements were kept. A total of 7,822 sentences with human-annotated evidences were obtained, and only the first evidence of a sentence was used as the reference. We estimate the parameters for each model separately. The parameters that resulted in the highest BLEU score on the development set were finally selected.

4 Experimental Setup

We use the method proposed in [Che et al., 2015] as the baseline, which used a CRF model to compress sentences by dropping certain less important words. For the EG method proposed in this paper, we trained three models. The first EG model combines M1, M2, and M3, and is used to evaluate the performance of the default features (named EG-D). The second combines M1, M2, M3, and M4-1, and is used to examine whether headlines can improve performance (named EG-H). The third uses all sub-models M1, M2, M3, and M4 (named EG-F).

4.1 Experimental Data

Our method is not restricted by domain or language, since the translation models and features employed here are language independent. Thus sentences in different languages, or containing entities of different categories, can be used for testing. In this paper, all EG models are trained on a Chinese corpus. Furthermore, to evaluate whether our method can generate evidences for sentences of different lengths and categories, we manually selected 1,000 Chinese sentences as a test set according to the following rules: (1) the sentence contains descriptive information about an entity in a randomly selected entity set, and (2) the length of the sentence is in the range 5 to 20. The average length of the sentences is 11.8. Finally, to check whether our method can generate evidences for different types of entities, we classified the entities described by the collected sentences, obtaining 104 categories in total. Table 3 shows the percentages of the classification results, in which the top 6 categories are listed and all other categories are combined into "Others".

Table 3: Categories of test entities.
ID   Category      Percentage
C1   People        31.6%
C2   Terminology   8.7%
C3   Organization  8.2%
C4   Animal        6.3%
C5   Place         5.9%
C6   Movie         4.8%
C7   Others        34.5%

4.2 Evaluation Metrics

The evaluation metrics for EG are similar to the human evaluation for MT [Callison-Burch et al., 2007]. The generated evidences are manually evaluated on three criteria, i.e., adequacy, fluency, and attraction, each on a three-point scale from 1 to 3. Here is a brief description of the scales for each criterion:

Adequacy
1: The meaning is obviously changed.
2: The meaning is generally preserved.
3: The meaning is completely preserved.

Fluency
1: The evidence e is incomprehensible.
2: e is comprehensible.
3: e is a flawless phrase.

Attraction
1: The attraction is obviously decreased.
2: The attraction is generally retained.
3: The attraction is increased.

To make attraction understood consistently by raters in practice, we define it in detail using three aspects: the evidence should be more concise, informative, and/or interesting than the sentence. BLEU is widely used for automatic evaluation in MT; it measures the similarity between a translation and human references. To further assess the quality of the generated evidences, we compute BLEU scores for each method, with 3 human references provided for each test sentence.

5 Results and Analysis

We use the baseline and the three EG models to generate evidences. Results show that the percentages of test sentences for which an evidence can be generated ("coverage" for short later in this paper) are 99.9%, 88.8%, 87.3%, and 87.3% for the baseline, EG-D, EG-H, and EG-F, respectively. The coverage of the baseline is much higher because it is easy to delete words to match the required length without considering the other constraints on evidence. The reason the last two coverages are lower than the second is that, after adding the sub-models to EG-D, several extremely long sentences cannot find suitably short phrase replacements or perform phrase deletion, so the method fails to generate evidences within the maximum allowed length. This indicates that generating evidences for extremely long sentences is more difficult than for shorter ones. In our experiments, the first evidence generated by each model is used for evaluation.

5.1 Evaluation

We asked two raters to label the evidences generated by the baseline and the three models based on the criteria defined in Section 4.2. The labeling results, averaged between the two raters, are shown in Table 5. We can see that for adequacy, fluency, and attraction, the EG-F model gets the highest scores. The percentages of label "3" are 50.9% and 69.9% for adequacy and fluency, which is
promising for our EG task.But the percentage of label“3”for attraction is17.6%,the main reason is that it is difficult to increase much attraction due to the strict length limitation of EG task.This motivates us to further improve the attraction model in the future work.We compute the kappa statistic between the raters.Kappais defined as K=P(A) P(E)1 P(E)[Carletta,1996],where P(A)Table4:The generated evidences of some sentences.\indi-cates that the evidence has grammatical errors.is the proportion of times that the labels agree,and P(E)isthe proportion of times that they may agree by chance.We de-fine P(E)=1/3,as the labeling is based on three point scales.The results show that the kappa statistics for adequacy,flu-ency,and attraction are0.6598,0.6833,and0.6763,respec-tively,which indicates a substantial agreement(K:0.61-0.8)according to[Landis and Koch,1977].Table4shows an example of the generated evidences.Ev-idences E of baseline,EG-D,EG-H,and EG-F are listed withtheir source sentences S.5.2ComparisonWe tune the parameters for the three EG models using thedevelopment data as described in Section3.4and evaluatethem with the test data as described in Section4.1.As can be seen from the test results in Table5,the EG-H and EG-F models significantly outperform baseline andEG-D in both human and automated evaluation.Althoughthe coverage of EG-F(87.3%)is lower compared to base-line(99.9%),the usability of EG-F is much higher than thatof baseline:(1)the BLEU score is improved by15.01(from53.63to68.64),and(2)the overall improvements of labels“2”and“3”are higher for adequacy,fluency,and attraction.Compared with baseline,the EG-F achieves a better balancebetween coverage and usability.The baseline is not readilyapplicable to the application of entity recommendation due toits low usability.The overall percentages of labels“2”and“3”of EG-D,baseline,EG-H,and EG-F,in all three evalua-tion metrics,are largely consistent with movements in BLEUscores,which verifies that BLEU 
tracks human evaluationwell.As shown in Table5,the EG-H model outperforms the EG-D model with noticeable improvements,as the percentagesof labels“2”and“3”are much higher for all three evaluationmetrics.This shows that the model M4-1can contribute togenerating evidences of higher quality.Table5also showsthat the EG-F model improves the performance comparedwith EG-H model.This verifies the effectiveness of bringingBaseline EG-D EG-H EG-FAdequacy(%)144.345.425.524.8 222.127.024.224.3 333.627.650.350.9Fluency (%)125.631.917.717.6 229.414.413.312.5 345.053.769.069.9Attraction(%)150.251.930.429.3 248.638.552.553.1 3 1.29.617.117.6BLEU53.6340.5967.3968.64 Table5:The evaluation results of baseline and EG models.C1C2C3C4C5C6C7A 121.534.838.222.324.525.022.4 222.227.928.522.314.220.027.2 356.337.333.355.461.355.050.4F 115.323.423.618.717.012.517.4 29.018.411.817.011.321.212.8 375.758.264.664.371.766.369.8T 124.039.943.731.325.527.528.8 259.043.739.646.455.656.253.3 317.016.416.722.318.916.317.9Table6:The evaluation results of each category of the EG-F model.In which,A=Adequacy(%),F=Fluency(%),and T =Attraction(%).C1to C7are defined in Table3.model M4-2with human annotated data set.It also indicates that more human annotated evidences can be adopted to better guide the EG model to generate evidences similar to human composed evidences.We further compare the phrase replacements and deletions performed by each model.Experimental results show that the average number of phrase replacements/deletions in sen-tences are0/2.2,0.8/1.9,0.5/1.7,and0.5/1.6for baseline, EG-D,EG-H,and EG-F,and the average lengths of evidences generated by these models are7.9,7.2,7.7,and7.8,respec-tively.The baseline conducts more deletions than the other three models,but makes no replacement.As we can see from Table5,the adequacy,fluency,and attraction of baseline and EG-D are lower than that of EG-H and EG-F models,which demonstrates that some key phrase replacements or deletions are inadequately or 
incorrectly performed in baseline and EG-D.This motivates us that more efficient models for replacing and deleting can be explored in future to get better results. Table6shows the evaluation results of each category in-volved in our experiments.Except C2(terminology)and C3 (organization),the performance of all the other categories match up with or outperform the overall performance of EG-F (shown in Table5).This verifies that EG-F can generate do-main independent evidences which achieve our applications. The percentages of label“1”of C2and C3for all three eval-uation metrics are higher than that of the overall performance of EG-F.The main reason is that the average lengths of sen-tences of C2and C3are12.1and13.3,which are larger than the overall average length11.8.Another reason is the“mis-translation”of the infrequent numbers that occur more fre-quently in C2and C3(account for24.2%of bad cases),e.g., the“ranked27th”is wrongly“translated”to“ranked1st”.It is thus more necessary to bring a new model to“translate”numbers in correct ways to improve the EG performance.6Related WorkIn this section,we review the related work.A research topic closely related to our work is the task of mining evidences for entities.[Li et al.,2013]proposed a method to mine evidences for named entity disambiguation task.The evi-dences consist of multiple words related to an entity.Our work is different in that we aim at generating comprehensi-ble and human-readable sentences as evidences.Our work is also related to the task of sentence compression.[Turner and Charniak,2005;Galley and McKeown,2007;Nomoto,2007; Che et al.,2015]proposed methods to compress an original sentence by deleting words or constituents.However,these extractive methods are restricted to word deletion,and there-fore are not readily applicable to the more complex EG task. 
[Cohn and Lapata, 2013] proposed an abstractive method to compress an original sentence by reordering, substituting, inserting, and removing its words. This method cannot be directly transplanted to the EG task due to the specificity of the entity evidence, since the EG task requires generating evidences within specified length limits by using attractive expressions and vocabularies from a sentence, rather than simply compressing a sentence.

Our work is also closely related to the studies in sentential paraphrase generation using monolingual machine translation. Although these studies share the same idea of translating a source sentence into a target sentence in the same language by using a monolingual parallel corpus, there are some differences from our work. [Wubben et al., 2012] built a monolingual machine translation system to convert complex sentences into their simpler variants, while our work aims at generating concise, informative, and interesting evidences from sentences rather than just simplifying sentences. [Quirk et al., 2004; Zhao et al., 2009] viewed paraphrase generation as monolingual machine translation, which aims to generate a paraphrase for a source sentence in a certain application.
The three major distinctions between evidence generation and these studies are: (1) we consider language styles and vocabularies in evidences; (2) we add a length model to ensure the generation of evidences within the maximum allowed length; (3) we introduce two attraction measures and features in the EG model to produce more attractive evidences from sentences.

7 Conclusion and Future Work

In this paper, we study the problem of generating evidences for the recommended entities in web search. We propose a translation model to generate evidences from sentences. The experiments show that our method can generate domain independent evidences with high usability. As future work, we plan to dynamically select appropriate evidences for each recommended entity related to the search query. We are also interested in generating evidence using multiple sentences, rather than relying on a single sentence.

Composite subscriptions in content-based publish/subscribe systems


Composite Subscriptions in Content-based Publish/Subscribe Systems

Guoli Li and Hans-Arno Jacobsen
Middleware Systems Research Group, University of Toronto, Toronto, ON, Canada

Abstract. Distributed publish/subscribe systems are naturally suited for processing events in distributed systems. However, support for expressing patterns about distributed events and algorithms for detecting correlations among these events are still largely unexplored. Inspired by the requirements of decentralized, event-driven workflow processing, we design a subscription language for expressing correlations among distributed events. We illustrate the potential of our approach with a workflow management case study. The language is validated and implemented in PADRES. In this paper we present an overview of PADRES, highlighting some of its novel features, including the composite subscription language, the coordination patterns, the composite event detection algorithms, the rule-based router design, and a detailed case study illustrating the decentralized processing of workflows. Our experimental evaluation shows that rule-based brokers are a viable and powerful alternative to existing, special-purpose, content-based routing algorithms. The experiments also show that the use of composite subscriptions in PADRES significantly reduces the load on the network. Complex workflows can be processed in a decentralized fashion with a gain of 40% in message dissemination cost. All processing is realized entirely in the publish/subscribe paradigm.

1 Introduction

In distributed applications large numbers of events occur. In isolation these events are often not too interesting or useful. However, as correlations over time, for example, these events may represent interesting and useful information. This information is important for coordinating activities in a distributed system.
Workflow processing and business process execution, where different stages of the flow or process execute on distributed nodes, are examples of distributed applications generating potentially huge numbers of events. The efficient correlation of these events reveals information about the status of the workflow. Events in a workflow could be the initiation, the termination, or the status of a task.

Distributed publish/subscribe systems are well-suited to handle large numbers of events. A publish/subscribe system is comprised of information producers who publish and information consumers who subscribe to information.

[In: ACM/IFIP/USENIX 6th International Middleware Conference, Grenoble, France, November 2005]

The key benefit of publish/subscribe for distributed event-based processing is the natural decoupling of publishing and subscribing clients. This decoupling can enable the design of large, distributed, loosely coupled systems that interoperate through simple publish and subscribe-style operations.

However, current publish/subscribe approaches lack the ability to address event correlation and enable the coordination of activities associated with disparate clients in the content-based network. In order to allow publish/subscribe to support such distributed applications, first, an appropriate subscription language needs to be designed which offers a suitable view over available events to enable coordination. Second, event correlation requires the detection of distributed events. In publish/subscribe this is based on routing subscriptions and publications throughout the broker network and on efficient composite event detection algorithms realized on a single publish/subscribe broker.

Some work on detecting composite events in distributed publish/subscribe systems is starting to appear [21, 22, 5]. However, these approaches are mainly focusing on the design of the subscription language and do not address the event correlation problem central to our approach. We have
developed an expressive content-based subscription language that is derived from the requirements of event-driven, decentralized workflow management and business process execution scenarios. To validate our approach we have implemented the language in PADRES (Publish/subscribe Applied to Distributed REsource Scheduling), a novel distributed, content-based publish/subscribe messaging system, and have built all the necessary infrastructure to support the deployment, monitoring, and execution of workflows and business processes. In essence, we have realized a decentralized workflow management and execution environment that builds directly on top of a standard publish/subscribe interface.

PADRES's subscription language is fully content-based, includes notions to express time, and supports variable bindings, coordination patterns, and composite subscriptions. Composite subscriptions offer a higher level view for subscribers by enriching the expressiveness of the subscription language. A composite subscription consists of several atomic subscriptions linked by logical or temporal operators. An atomic subscription refers to the traditional notion of a subscription in publish/subscribe and is matched by a single publication event; a composite subscription is matched by a set of independent events potentially occurring at different locations and times. PADRES is based on a rule-based broker that implements composite event detection and introduces a novel distributed algorithm for composite subscription routing.

Support for composite subscriptions is essential for applications where it is impossible to detect a particular condition from isolated atomic events. For example, in workflow management systems, tasks can only be executed if certain conditions are met. A given task may require that two other tasks have successfully completed and a certain timing constraint is met. We will show experimentally that supporting composite subscriptions in content-based publish/subscribe systems has two key
advantages. First, subscribers receive fewer messages and network traffic is reduced. Without composite subscriptions, the subscriber must subscribe to all the corresponding atomic events in order to receive the necessary information. The subscriber would be overwhelmed by an excessive amount of atomic events, most of which may be irrelevant and could be filtered out before reaching the subscriber. Second, the overall performance of the publish/subscribe system is improved by detecting composite events in the network, rather than at the edge of the network. Moreover, composite subscriptions reduce the complexity of subscriber components.

[Lecture Notes in Computer Science]

The rest of this paper is organized as follows. Section 2 presents background material and related work. An overview of PADRES is given in Section 3. Section 4 presents the PADRES subscription language, composite subscription routing and composite event detection in detail. A workflow management system case study built on PADRES is discussed in Section 5. An experimental evaluation of PADRES and its potential for workflow management is presented in Section 6.
2 Background and Related Work

Content-based Routing – Content-based publish/subscribe systems typically utilize content-based routing in lieu of the standard address-based routing. Since publishers and subscribers are decoupled, a publication is routed towards the interested subscribers without knowing specifically where the subscribers are and how many subscribers exist. The content-based address of a subscriber is the set of subscriptions issued by the subscriber. There are several interesting projects dealing with content-based routing, such as SIENA [3], REBECA [18], JEDI [6], Hermes [20] and Gryphon [19]. Covering and merging-based routing, which are optimizations for content-based routing, are discussed in SIENA [3], JEDI [6], REBECA [18], and PADRES [15]. In addition to publications and subscriptions, content-based routing can use advertisements [18, 3], which are indications of the data that publishers will publish in the future. Advertisements are used to form routing paths along which subscriptions are propagated. Without advertisements, subscriptions must be flooded throughout the network. PADRES adopts the publication-subscription-advertisement model for content-based routing and suggests several novel features not realized in existing approaches. The novel features of PADRES discussed in this paper include a rule-based router design, algorithms to support composite subscription routing, composite event detection, coordination patterns for expressing workflows and business processes, and support for the decentralized deployment and execution of workflows and business processes.

Composite Events – An event is defined as a state transition. In the publish/subscribe literature, events describe state transitions of interest to subscribers. Events are often synonymously referred to as publications (see footnote 1). A subscription captures the interest of a subscriber to be informed about possible events. We generically refer to subscriptions, publications, and advertisements as messages, if no distinction is required.

Footnote 1: One
could further distinguish between the state transition (i.e., the event) and the published information that reports on the transition (i.e., the publication).

A composite event refers to a pattern of event occurrences of interest to a subscriber. These patterns may express temporal or causal relationships between different events. A pattern is matched if the specified events have occurred, subject to optional timing constraints. Since several events are involved in the matching of a single subscription pattern, the matching engine has to store partial matching states. In the literature, the term composite event has been used to refer to a subscription that expresses the pattern defining a composite event. To make the difference between the state transitions (i.e., the events) and the actual interest specification clearer, when discussing our work, we use the term composite subscription to refer to the pattern and use composite event to mean the distributed state transitions of relevance for the subscriber of the composite subscription. Also, to distinguish composite subscriptions from traditional, non-composite subscriptions, we refer to the latter as atomic subscriptions.

The earliest approaches for enabling the processing of composite events were rule-based production systems established in artificial intelligence. One of the most widely used matching algorithms, the Rete algorithm, is used in many expert systems today [9]. Rete compiles rules into a network. The design of Rete trades off space for processing efficiency. The Java Expert System Shell (Jess) [10] is a rule-based matching engine based on the Rete algorithm. Our PADRES broker is based on Jess. The Publication Routing Table (PRT) and Subscription Routing Table (SRT) are two Jess engines. We show how content-based publish/subscribe messages (i.e., subscriptions, composite subscriptions, publications, and advertisements) can be mapped to rules and facts processed by Rete-type rule engines.

Many early approaches for
composite event processing relate to active databases and are based on centralized evaluation schemes [12, 11, 16, 13, 17, 4]. These projects differ primarily in the mechanism used for event detection. Ode [12] uses a finite automaton and SAMOS [11] uses a Petri Net. Other approaches use trees as the data structure for representing and detecting composite events. The main reason for adopting trees is that they are simple and intuitive for representing composition. The traversal and manipulation of trees have been thoroughly studied in the past, and a large number of efficient algorithms have been developed [16, 13, 1, 17]. GEM [16] and READY [13] are projects using tree-based approaches to process incoming events. Atomic events are leaf nodes and operators are inner nodes in the tree structure. The composite event is represented by the root of the tree. The main limitation of GEM is that each composite event has its own tree, and identical subtrees cannot be shared among composite event trees. Similar to GEM and READY, EPS (Event Processing Service) [17] provides a tree-based event specification language. EPS alleviates the limitation of GEM by using a shared subscription tree to process incoming events. Snoop [4], also a tree-based approach, provides an expressive composite event specification language with temporal support. Snoop introduces the notion of consumption policies called contexts. They are used to capture application semantics by resolving which events are consumed from the event history for composite event detection. Composite subscriptions in PADRES are also represented by trees. Unique to PADRES is the mapping of atomic and composite subscriptions to rules and the support of full content-based, composite subscriptions. Rule-based processing has been thoroughly studied, leading to a large number of efficient algorithms for rule/fact matching. The rule-based approach employed in PADRES takes advantage of the existing research for the PADRES broker
design. PADRES also supports a tree decomposition algorithm for composite subscription routing.

The specification and detection of composite events in the context of publish/subscribe systems has recently become an important research area [21, 22, 5]. Hermes [20] and Gryphon [19] provide parameterized atomic events to enrich the expressiveness of subscriptions. Courtenage [5] specifies composite events based on the λ-calculus; the approach lacks support for temporal constraints. CEA [21] proposes a Core Composite Event Language to express event patterns that occur concurrently. CEA constitutes a composite event detection framework built as an extension of an existing publish/subscribe middleware platform. The CEA language is compiled into automata for distributed event detection supporting regular expression-type patterns. CEA employs policies to ensure that mobile event detectors perform distributed event detection at favorable locations, such as close to event sources. REBECA [22] describes composite events using composite event filter expressions, which can be mapped to expressions of the Core Composite Event Language [21]. The subscription language design of PADRES has been inspired by requirements set forth by workflow and business process description languages and the requirements of distributed execution of these processes. Unique to PADRES is the use of variables in subscriptions to join atomic events. PADRES also supports language elements to express dependencies and condition-based repetition relationships of activities (i.e., while loops). Architecturally different from existing approaches, PADRES builds the composite subscription processing and composite event detection capability into the publish/subscribe layer.

3 PADRES System Description

The PADRES system consists of a set of brokers connected by a peer-to-peer overlay network. Clients connect to brokers using various binding interfaces such as Java Remote Method Invocation (RMI) and Java Messaging Service (JMS).
Each PADRES broker employs a rule-based engine to route and match publish/subscribe messages, and the engine is also used for composite event detection. An overview of PADRES is provided in [8]. This paper focuses on the specification, detection, and use of composite events. PADRES provides four other novel features as well: monitoring support, historic query capability, fault detection and repair, and load balancing. A monitor module, which is an administrative client in PADRES, can display the broker network topology, trace messages, and measure the performance of the broker network. The historic data access module allows clients to subscribe to both future and historic publications. The fault tolerance module detects failures in the publish/subscribe layer and initiates failure recovery. The load balancing module handles the scenarios in which a broker is overloaded by a large number of publishers or subscribers. The details of these features go beyond the scope of this paper.

Fig. 1. Broker Network    Fig. 2. Broker Architecture

Fig. 10 shows the protocol stack of PADRES. This section discusses the architecture of PADRES for the processing of atomic subscriptions. The extension of PADRES to process composite subscriptions and the case study applying composite subscription processing to workflow management are discussed later.

3.1 Message Format

The PADRES subscription language is based on the traditional [attribute, operator, value] predicates used in several existing content-based publish/subscribe systems [3, 18, 19, 7]. An atomic subscription is a conjunction of predicates. For example, an atomic subscription in workflow management may be ([class,=,job-status],[appl,=,payroll],[job-name,isPresent,*]).
The comma between predicates indicates the conjunction relation. This subscription is matched by publications of all jobs involved in the application payroll. We support operators such as =, >, <, ≥, ≤, and isPresent. The special operator isPresent means an attribute can take any value in a given range. Each subscription message has a mandatory tuple describing the class of the message. The class attribute provides a guaranteed selective predicate for matching, similar to the topic in topic-based publish/subscribe systems (see footnote 2). Other predicates are constraints on particular attributes. Advertisements have the same format as atomic subscriptions. Publications are sets of [attribute, value] pairs. There is a match between a subscription and a publication if each predicate in the subscription is satisfied by a corresponding [attribute, value] pair in the publication. A match between a subscription and an advertisement means that the sets of publications matching the advertisement and the subscription overlap.

Footnote 2: The PADRES language is fully content-based, based on a rich predicate language.

3.2 Network Architecture

The overlay network connecting the brokers is a set of connections that form the basis for message routing. The overlay routing data is stored in Overlay Routing Tables (ORT) at each broker. Specifically, each broker knows its neighbors from the ORT. Message routing in PADRES is based on the publication-subscription-advertisement model established by the SIENA project [3]. We assume that publications are the most common messages, and advertisements are the least common ones. A publisher issues an advertisement before it publishes.
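The matching semantics described above can be illustrated with a short sketch. Note that this is only an illustration of the semantics, not PADRES's implementation: the real broker compiles subscriptions into Jess/Rete rules, and the function and table names below are ours, not part of any PADRES API.

```python
# Illustrative sketch of atomic subscription matching (hypothetical names;
# PADRES itself performs this matching via Rete rules inside Jess).

OPS = {
    "=":  lambda a, b: a == b,
    ">":  lambda a, b: a > b,
    "<":  lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "isPresent": lambda a, b: True,   # the attribute only has to exist
}

def matches(subscription, publication):
    """subscription: list of (attribute, operator, value) predicates.
    publication: dict mapping attribute -> value.
    A match requires every predicate to hold (conjunction)."""
    for attr, op, value in subscription:
        if attr not in publication:
            return False
        if not OPS[op](publication[attr], value):
            return False
    return True

# The subscription from the example above, against a matching publication
sub = [("class", "=", "job-status"),
       ("appl", "=", "payroll"),
       ("job-name", "isPresent", "*")]
pub = {"class": "job-status", "appl": "payroll", "job-name": "A"}
print(matches(sub, pub))   # True
```

A publication for a different application (e.g. appl = "hr") fails the second predicate and therefore does not match.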
An advertisement allows the publisher to publish a set of publications matching this advertisement. Advertisements are effectively flooded to all brokers along the overlay network. A subscriber may subscribe at any time. The subscriptions are routed according to the Subscription Routing Table (SRT), which is built based on the knowledge of advertisements. The SRT is essentially a list of [advertisement, last hop] tuples. If a subscription overlaps an advertisement in the SRT, it will be forwarded to the last hop broker the advertisement came from. Subscriptions are routed hop by hop to the publisher, who advertises information of interest to the subscriber. Meanwhile, the subscription will be used to construct the Publication Routing Table (PRT). Like the SRT, the PRT is logically a list of [subscription, last hop] tuples, which is used to route publications. If a publication matches a subscription in the PRT, it will be forwarded to the last hop broker of that subscription until it reaches the subscriber. A diagram showing the overlay network, SRT and PRT is provided in Fig. 1. In this figure, in step 1) an advertisement is propagated from B1. In step 2) a matching subscription enters from B2; since the subscription overlaps the advertisement at broker B3, it is sent to B1. In step 3) a publication is routed along the path established by the subscription to B2. A subscription/advertisement covering and merging scheme [15] is used to optimize content-based routing by reducing network traffic and routing table size, especially for applications with highly clustered data.

3.3 Broker Architecture

The PADRES brokers are modular software components built on a set of queues: one input queue and multiple output queues. Each output queue represents a unique message destination. A diagram of the broker architecture is provided in Fig. 2. The matching engine between the input queue and output queues is built using Jess. It maintains the SRT and PRT, which are Rete trees [9]. For example, in the PRT, subscriptions are mapped to
rules, and publications are mapped to facts, as shown in Fig. 3. An atomic subscription message is mapped to the antecedent of a rule; the actions to be taken if the subscription is matched are mapped to the consequent of the rule. The antecedent encodes the message filter condition and the consequent encodes the notification semantic. The matching between subscriptions and publications is thereby transformed into the matching between rules and facts, which is performed by the rule-based broker. When a new message is received by the broker, it is placed in the input queue. The matching engine takes the message from the input queue. If the message is a publication, it is inserted into the PRT as a fact. When a publication matches a subscription in the PRT, its next hop destination is set to the last hop of the subscription, and it is placed into the corresponding output queue(s).

Fig. 3. Mapping Subscriptions/Publications to Rules/Facts

If the message is a subscription, the matching engine first routes it according to the SRT, and, if there is an advertisement overlapping the subscription, the subscription will be inserted into the PRT as a rule. Essentially, the rule-based broker performs matching and decides the next hop destinations of the messages as a router.
This novel rule-based approach allows for a powerful subscription language and notification semantics and naturally enables composite subscriptions.

4 Composite Subscription Processing

4.1 Composite Subscription Language

The composite subscription language is inspired by the requirements of workflow management and business process execution. The language should be powerful enough to eventually describe workflows defined using the Business Process Execution Language (BPEL4WS) [14], which is a standard language for business processes. PADRES supports parallelization, alternation, sequence and repetition compositions. PADRES also supports variable bindings that serve to correlate and aggregate publications by specifying constraints on attribute values between different atomic subscriptions. A composite subscription is represented by a subscription tree, where the internal nodes are operators and the leaf nodes are atomic subscriptions, as shown in Figure 4(b).

The operator representing the parallelization pattern is AND, denoted by the symbol (&). The composite subscription (s1 & s2) is matched when both s1 and s2 are matched, irrespective of their matching order. The operator & connects two or more subscriptions; it is different from the conjunction operator between predicates in an atomic subscription, which must be matched by a single publication. The alternation pattern represents the matching of either of two specified subscriptions using the operator OR, denoted as (||). The composite subscription (s1 || s2) is satisfied when either s1 or s2 is matched by a publication.
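The semantics of the & and || operators over a subscription tree can be sketched as follows. This is purely an illustrative model under an assumed tuple encoding of the tree; PADRES actually evaluates such patterns inside its Rete-based matching engine rather than with explicit recursion.

```python
# Illustrative evaluation of a composite subscription tree.
# Leaves are atomic subscriptions; inner nodes are the operators & and ||.

def satisfied(node, matched):
    """node: ('atom', name) | ('&', left, right) | ('||', left, right).
    matched: set of atomic subscription names already matched by publications."""
    kind = node[0]
    if kind == "atom":
        return node[1] in matched
    left = satisfied(node[1], matched)
    right = satisfied(node[2], matched)
    return (left and right) if kind == "&" else (left or right)

# ((s1 & s2) || s3), the composite subscription used in Section 4.2
cs = ("||", ("&", ("atom", "s1"), ("atom", "s2")), ("atom", "s3"))
print(satisfied(cs, {"s1"}))        # False: s2 still missing, s3 unmatched
print(satisfied(cs, {"s1", "s2"}))  # True: the & branch is complete
print(satisfied(cs, {"s3"}))        # True: the || alternative fires
```

The partial matching states mentioned in Section 2 correspond to the `matched` set growing as atomic publications arrive over time.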
Furthermore, composite subscriptions in PADRES can have variables bound to values in the publications. Variables are represented by $ in subscription predicates. Parentheses are used to specify the priority of operators. In the example below, the composite subscription consists of three atomic subscriptions, linked using & and ||, and requires the values of the attribute appl in the matching publications to be equal. This is expressed using the variable symbol $X.

  {Rule(((job-status(appl=$X)(job-name=A)(state=succ)) &
         (job-status(appl=$X)(job-name=B)(state=succ)))
        || (job-status(appl=$X)(job-name=C)(state=succ)))
   => (forward a notification to proper destinations)}

Events in applications may have sequential relations, that is, one event happens before the occurrence of another event. The sequence pattern describes this kind of event relation. The composite subscription (s1 ;[timespan:ts] s2)[within:wi] is matched when a publication p2 matching s2 occurs, provided a publication p1 matching s1 has already occurred. The timespan parameter specifies the minimum time span between the two publications; the within parameter limits the maximum time span between them. In the sequence pattern, a time predicate is added to standard subscriptions. Suppose s1 and s2 subscribe to job A and job B respectively, as in the previous example. The composite subscription is mapped to a rule as described below. This pattern requires that the time at which p2 is published is greater than that of p1.

  {Rule((job-status...(job-name=A)(time=$Y)...) &
        (job-status...(job-name=B)(time>$Y+ts)(time<$Y+wi)))
   => (forward a notification to proper destinations)}

The repetition pattern describes an aperiodic or periodic event. PADRES describes repetition events as Repetition(S, n, attr, v). This means publications matching S happen n times, and attribute attr increases by step v, or decreases if v is negative. The iteration is controlled by the value of attr with step v. A repetition pattern can be mapped to a rule as
below.

  {Rule((job-status...(job-name=A)(attr=$Z)...) &
        (job-status...(job-name=A)(attr=$Z+v)...) &
        ... &
        (job-status...(job-name=A)(attr=$Z+(n-1)v)...))
   => (forward a notification to proper destinations)}

Composite subscriptions can be composed in a nested fashion using the above operators to create more complex composite subscriptions. Mapping composite subscriptions to rules consists of three steps: first, each atomic subscription is mapped to a part of the antecedent. Second, the parts of the antecedent are connected using logical operators and variables. Third, the activities to be taken after matching are mapped to the consequent of the rule. In the PADRES broker, both atomic and composite subscriptions are mapped to rules. That is, extending the subscription language does not require significant changes in the matching engine.

4.2 Composite Subscription Routing

In a large-scale publish/subscribe system, publications are issued at geographically dispersed sites. A centralized composite event detection scheme constitutes a potential bottleneck and a single point of failure. All atomic publications have to be centrally collected in order to detect an occurrence of a composite event. Our distributed solution consists of detecting parts of an event pattern and aggregating the parts. A notification message signifying the occurrence of the composite event is sent to the subscriber only after all the parts are detected.

Fig. 4. Composite Subscription Routing

The main difficulties of distributed event detection are routing composite subscriptions, including where and how to decompose a composite subscription, and routing the individual parts of the subscription. The location of detection should be as close to the publishers as possible to ensure that the publications contributing to a given composite subscription are not unnecessarily disseminated throughout the broker network. In other words, the composite subscription should be forwarded to the publishers within the broker
network as far as possible before it is decomposed. As a result, bandwidth usage is reduced.

Following the example in Fig. 4(a), suppose a composite subscription ((s1 & s2) || s3) arrives from broker 1, and its matching publications arrive from brokers 3, 5, and 6. The composite subscription is split into parts along the routing path, since the matching publications may arrive from different brokers. Atomic subscriptions s1 and s2 are detected at brokers 5 and 6 respectively, and the detection results are combined at broker 4 for (s1 & s2). Moreover, the detection results can be shared among subscribers that have common subexpressions of composite subscriptions in order to save bandwidth and computational effort.

Each atomic subscription in a composite subscription can find its destination(s) from the SRT. If all atomic subscriptions have the same next hop destination, a broker should forward the composite subscription as a whole to that destination; otherwise the composite subscription should be split into parts according to the different destinations, and each part should be forwarded to its own destination.
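The split decision just described can be sketched in a few lines: group the atomic parts of a composite subscription by their next hop in the SRT, forward the subscription whole when every part shares one hop, and split it otherwise. All names here are illustrative; the actual PADRES algorithm splits the subscription tree itself, not a flat list.

```python
# Illustrative sketch of the forward-whole vs. split decision in composite
# subscription routing (hypothetical, simplified data structures).

from collections import defaultdict

def route(atomic_subs, srt):
    """atomic_subs: names of the atomic parts of one composite subscription.
    srt: dict mapping each atomic subscription to its next-hop broker.
    Returns a mapping next-hop -> list of atomic parts to forward there."""
    by_hop = defaultdict(list)
    for s in atomic_subs:
        by_hop[srt[s]].append(s)
    # A single next hop means the whole subscription travels together;
    # multiple hops mean the subscription is decomposed at this broker.
    return dict(by_hop)

# Broker 2 in Fig. 4(b): s1 and s2 come from broker 4, s3 from broker 3
parts = route(["s1", "s2", "s3"], {"s1": "B4", "s2": "B4", "s3": "B3"})
print(parts)   # {'B4': ['s1', 's2'], 'B3': ['s3']}
```

At broker 1, where every part maps to broker 2, the same call returns a single-entry mapping and the composite subscription is forwarded intact.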
In Fig. 4(b), since all matching publications are coming from broker 2, broker 1 routes the composite subscription as a whole. At broker 2, publications matching s1 and s2 arrive from broker 4 according to the SRT, while s3's publications will arrive from broker 3. As a result, the composite subscription is split into two parts: (s1 & s2) and s3. The first part is sent to broker 4, where it is split into s1 and s2, which are sent to brokers 5 and 6 respectively. The second part s3 is routed to broker 3. The routing scheme detects the event pattern matching a composite subscription at a location which is as close as possible to the data sources.

A composite subscription is mapped to a rule, and a publication is mapped to a fact at a single broker. The rule-based broker matches facts against rules and decides where to route the notification if there is a match. Therefore, the broker acts as both a message router and a composite event detector. The advantage of using a rule-based matching engine is that it enables composite subscriptions naturally without significant changes to the broker.

Information technology - Artificial intelligence - Classified assessment for machine translation capabilities (specification)


ICS 35.240
L70/84
Group standard T/CESA 1039—2019

Information technology - Artificial intelligence - Classified assessment for machine translation capabilities

Issued 2019-04-01; implemented 2019-04-01

Contents
Foreword
1 Scope
2 Terms and definitions
3 Abbreviations
4 General model and requirements for machine translation systems
4.1 Overview
4.2 System input and output requirements
4.3 System service engine requirements
5 Capability metrics of machine translation systems and their computation
5.1 Capability metric system
5.2 Metric evaluation methods
5.3 Capability computation method
6 Capability level classification of machine translation systems
7 Requirements for capability level assessment of machine translation systems
7.1 Determining the assessment plan
7.2 Delimiting the machine translation system
7.3 Computing the evaluation metric scores
7.4 Level classification of the assessed object
7.5 Assessment report and its use
Appendix A (informative) Evaluation of machine translation adequacy and fluency
Appendix B (normative) Machine translation system response time
Appendix C (normative) Computation of the machine translation comprehensive error rate

Foreword
This standard was drafted in accordance with the rules given in GB/T 1.1—2009, Directives for standardization - Part 1: Structure and drafting of standards.

Note that some content of this document may be covered by patents.

The issuing body of this document assumes no responsibility for identifying such patents.

This standard was proposed by, and is under the jurisdiction of, the China Electronics Standardization Institute.

Drafting organizations: China Electronics Standardization Institute, iFLYTEK Co., Ltd. (科大讯飞), Tencent Technology (Beijing) Co., Ltd., NetEase Youdao Information Technology (Beijing) Co., Ltd., China Telecom Group Co., Ltd., Weifang Beida Jade Bird Huaguang Phototypesetting Co., Ltd., Beijing Baidu Netcom Science and Technology Co., Ltd., Huaxia General Processor Technologies (Beijing) Co., Ltd., GRG Banking Equipment Co., Ltd. (广州广电运通), Anhui Tingjian Technology Co., Ltd., Hangzhou Fangde Intelligent Technology Co., Ltd., and Haier Uhome Intelligent Technology (Beijing) Co., Ltd.

Research on quality-evaluation methods in machine translation

Machine translation (MT) is the process of using computer technology to convert text in one natural language into text in another. As MT technology has developed, evaluating the quality of its output has become an important research topic. This article surveys progress on MT quality-evaluation methods.

1. Manual evaluation

The most direct way to evaluate MT quality is manual evaluation, in which human experts judge and score the translation output. Manual evaluation takes two forms: comparative evaluation, which compares the MT output against human reference translations; and direct quality assessment, in which experts evaluate the MT output comprehensively and assign an overall quality score.

Manual evaluation reflects MT quality accurately and is especially suitable for critical translation tasks, but it is time-consuming, labor-intensive, and costly. To address these problems, researchers have proposed automatic evaluation methods.

2. Automatic evaluation

Automatic evaluation uses computer algorithms to assess MT quality. The methods fall into two broad classes: reference-based methods and language-model-based methods.

2.1 Reference-based methods

Reference-based methods evaluate MT quality by comparing the system output with human reference translations. Common metrics include BLEU (Bilingual Evaluation Understudy), NIST (a BLEU variant developed for evaluations run by the US National Institute of Standards and Technology), and METEOR (Metric for Evaluation of Translation with Explicit ORdering).

BLEU is one of the most widely used MT metrics. It scores translation accuracy by the n-gram overlap between the MT output and the reference translations; BLEU ranges from 0 to 1, and values closer to 1 indicate better translations.

NIST weights n-gram matches by their informativeness, following ideas from information retrieval. Unlike BLEU, NIST scores are not confined to the interval from 0 to 1, but in both cases a higher score indicates higher translation quality.
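The n-gram overlap idea behind BLEU can be illustrated with a toy sentence-level implementation (a sketch only: real BLEU is computed over a whole corpus, conventionally with n up to 4, clipped counts, and smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty, against a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipping: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(1, sum(cand.values()))
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(map(math.log, precisions)) / max_n)

ref = "the cat is on the mat".split()
score = bleu("the cat on the mat".split(), ref)
# an exact match scores 1.0; this near-match lands between 0 and 1
```

Clipping is what stops a degenerate output like "the the the" from earning a high unigram precision.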

MT-Level III

Overview. MT-Level III denotes an advanced class of machine translation system. Built on neural networks and machine learning, such a system automatically translates input source-language text into target-language text. MT-Level III systems offer high accuracy and fluency and are used across a wide range of translation scenarios, including cross-lingual translation and domain-specific translation.

Characteristics:

1. Neural network technology. An MT-Level III system uses advanced neural techniques, including deep learning and reinforcement learning. Trained on large corpora, the neural models automatically learn the mapping between the source and target languages, handle more complex linguistic structure and semantic information, and thereby raise translation accuracy and fluency.

2. Self-learning capability. An MT-Level III system can optimize itself from user feedback. As users translate with it, the system records their translation choices and ratings, analyzes them with machine learning algorithms, and improves its output accordingly; through continuous optimization it gradually raises translation quality and better meets user needs.

3. Multi-domain support. An MT-Level III system supports translation in many domains, both general and specialized. Its training takes into account the linguistic characteristics and translation needs of different domains, and training on domain-specific corpora improves accuracy and terminology handling; whether in law, medicine, technology, or finance, it can deliver high-quality translations.

4. Multilingual support. An MT-Level III system supports many languages, including but not limited to Chinese, English, French, German, Japanese, and Korean, and can be configured and trained for the language pairs a user needs, whether Chinese-English, Chinese-French, Chinese-German, or other combinations.

Usage scenarios. MT-Level III systems are applied in many settings, including but not limited to:
- Business meetings: real-time translation of speakers in multinational meetings, improving communication efficiency and accuracy.
- Cross-lingual communication: fast language conversion in international cooperation and exchange activities.
- Academic research: helping researchers read and analyze papers and materials in other languages, promoting academic exchange and collaboration.

Expert commentary on the reference translation, French group, 5th CASIO Translation Contest

Chaque homme est seul et tous se fichent de tous et nos douleurs sont une île déserte. Ce n'est pas une raison pour ne pas se consoler, ce soir, dans les bruits finissants de la rue, se consoler, ce soir, avec des mots. Oh, le pauvre perdu qui, devant sa table, se console avec des mots, devant sa table et le téléphone décroché, car il a peur du dehors, et le soir, si le téléphone est décroché, il se sent tout roi et défendu contre les méchants du dehors, si vite méchants, méchants pour rien. Quel étrange petit bonheur, triste et boitillant mais doux comme un péché ou une boisson clandestine, quel bonheur tout de même d'écrire en ce moment, seul dans mon royaume et loin des salauds. Qui sont les salauds ? Ce n'est pas moi qui vous le dirai. Je ne veux pas d'histoires avec les gens du dehors. Je ne veux pas qu'on vienne troubler ma fausse paix et m'empêcher d'écrire quelques pages par dizaines ou centaines selon que ce cœur de moi qui est mon destin décidera. J'ai résolu notamment de dire à tous les peintres qu'ils ont du génie, sans ça ils vous mordent. Et, d'une manière générale, je dis à chacun que chacun est charmant. Telles sont mes mœurs diurnes. Mais dans mes nuits et mes aubes je n'en pense pas moins.

Somptueuse, toi, ma plume d'or, va sur la feuille, va au hasard tandis que j'ai quelque jeunesse encore, va ton lent cheminement irrégulier, hésitant comme en rêve, cheminement gauche mais commandé. Va, je t'aime, ma seule consolation, va sur les pages où tristement je me complais et dont le strabisme morosement me délecte. Oui, les mots, ma patrie, les mots, ça console et ça venge. Mais ils ne me rendront pas ma mère. Si remplis de sanguin passé battant aux tempes et tout odorant qu'ils puissent être, les mots que j'écris ne me rendront pas ma mère morte. Sujet interdit dans la nuit.

Arrière, image de ma mère vivante lorsque je la vis pour la dernière fois en France, arrière, maternel fantôme. Soudain, devant ma table de travail, parce que tout y est en ordre et que j'ai du café chaud et une cigarette à peine commencée et que j'ai un briquet qui fonctionne et que ma plume marche bien et que je suis près du feu et de ma chatte, j'ai un moment de bonheur si grand qu'il m'émeut. J'ai pitié de moi, de cette enfantine capacité d'immense joie qui ne présage rien de bon. Que j'ai pitié de me voir si content à cause d'une plume qui marche bien, pitié de ce pauvre bougre de cœur qui veut s'arrêter de souffrir et s'accrocher à quelque raison d'aimer pour vivre. Je suis, pour quelques minutes, dans une petite oasis bourgeoise que je savoure. Mais un malheur est dessous, permanent, inoubliable. Oui, je savoure d'être, pour quelques minutes, un bourgeois, comme eux. On aime être ce qu'on n'est pas. Il n'y a pas plus artiste qu'une vraie bourgeoise qui écume devant un poème ou entre en transe, une mousse aux lèvres, à la vue d'un Cézanne et prophétise en son petit jargon, chipé çà et là et même pas compris, et elle parle de masses et de volumes et elle dit que ce rouge est si sensuel. Et ta sœur, est-ce qu'elle est sensuelle ? Je ne sais plus où j'en suis. Faisons donc en marge un petit dessin appeleur d'idées, un dessin réconfort, un petit dessin neurasthénique, un dessin lent, où l'on met des décisions, des projets, un petit dessin, île étrange et pays de l'âme, triste oasis des réflexions qui en suivent les courbes, un petit dessin à peine fou, soigné, enfantin, sage et filial. Chut, ne la réveillez pas, filles de Jérusalem, ne la réveillez pas pendant qu'elle dort. Qui dort ? demande ma plume. Qui dort, sinon ma mère éternellement, qui dort, sinon ma mère qui est ma douleur ?

Ne la réveillez pas, filles de Jérusalem, ma douleur qui est enfouie au cimetière d'une ville dont je ne dois pas prononcer le nom, car ce nom est synonyme de ma mère enfouie dans de la terre. Va, plume, redeviens cursive et non hésitante, et sois raisonnable, redeviens ouvrière de clarté, trempe-toi dans la volonté et ne fais pas d'aussi longues virgules, cette inspiration n'est pas bonne. Âme, ô ma plume, sois vaillante et travailleuse, quitte le pays obscur, cesse d'être folle, presque folle et guidée, guindée morbidement. Et toi, mon seul ami, toi que je regarde dans la glace, réprime les sanglots secs et, puisque tu veux oser le faire, parle de ta mère morte avec un faux cœur de bronze, parle calmement, feins d'être calme, qui sait, ce n'est peut-être qu'une habitude à prendre ? Raconte ta mère à leur calme manière, sifflote un peu pour croire que tout ne va pas si mal que ça, et surtout souris, n'oublie pas de sourire. Souris pour escroquer ton désespoir, souris pour continuer de vivre, souris dans ta glace et devant les gens, et même devant cette page. Souris avec ton deuil plus haletant qu'une peur. Souris pour croire que rien n'importe, souris pour te forcer à feindre de vivre, souris sous l'épée suspendue de la mort de ta mère, souris toute ta vie à en crever et jusqu'à ce que tu en crèves de ce permanent sourire.

(Extrait du Livre de ma mère, Albert Cohen)

French section: "My Mother". The Chinese reference translation begins: "Everyone is alone, no one cares about anyone else, and our pain is a desert island."

Performance and effect analysis of large language models on machine translation tasks

(slide outline; presenter: XXX)

Contents:
01 Overview of large language models
02 Overview of the machine translation task
03 Applications of large language models in machine translation
04 Performance analysis of large language models on machine translation tasks
05 Effect analysis of large language models on machine translation tasks
06 (continued)

Translation-quality performance analysis:
- Translation accuracy: assessing how accurate the model's translations are
- Translation fluency: assessing how fluent the translations are
- Translation efficiency: assessing how efficiently the model translates
- Translation diversity: assessing the diversity of the translations produced

Translation-quality effect analysis:
- Accuracy: large language models achieve high accuracy on machine translation and convey the source meaning faithfully
- Fluency: they keep translations fluent, so the output reads naturally
- Diversity: they can offer multiple candidate translations to suit different users' needs
- Efficiency: they complete translation tasks quickly

Translation-accuracy effect analysis:
- Metrics for assessing the accuracy of large language models on machine translation
- Accuracy comparison across different large language models
- Strengths and limitations of large language models on machine translation
- Strategies and methods for improving their accuracy

Improved translation efficiency:
- Applying large language models to machine translation
- Why large language models raise translation efficiency
- Concrete manifestations of the efficiency gain
- Case studies of the efficiency gain

Improved translation quality:
- Fewer translation errors: strong language-understanding ability yields more accurate translations
- Preserved source semantics: the source meaning and information are better retained
- Better fluency and readability: the models generate more natural, fluent output

Applications of large language models in machine translation (full paper)

Outline:
I. Introduction: 1. Research background; 2. Purpose and significance; 3. Methods and paper structure
II. Overview of large language models: 1. Definition; 2. Characteristics; 3. Development history
III. Overview of machine translation: 1. Definition; 2. Development history; 3. Main challenges
IV. Applications of large language models in machine translation: 1. Pretrained language models (a. pretraining techniques; b. their use in machine translation); 2. Generative translation models (a. definition; b. their use in machine translation); 3. Semi-supervised and zero-shot translation; 4. Applications in specific scenarios (a. cross-lingual information retrieval; b. text summarization and generation; c. sentiment analysis and opinion monitoring)
V. Challenges and outlook: 1. Challenges (a. model size and efficiency; b. data quality and annotation; c. interpretability and safety); 2. Trends (a. cross-modal and multilingual support; b. personalized and adaptive translation; c. deeper integration of deep learning and natural language processing)
VI. Conclusion: 1. Summary of findings; 2. Future directions and recommendations

I. Introduction

1. Research background. With the spread of the Internet and accelerating globalization, the need to communicate across language barriers has become ever more pressing.

Machine translation, a key means of cross-lingual communication, has become a research hotspot in computer science.

In recent years, deep learning has transformed machine translation, and large language models (LLMs) are among the field's major breakthroughs.

An LLM is a deep-learning-based natural language processing model that learns from massive language data and has strong language understanding and generation abilities.

In machine translation, LLMs can be used to build multilingual neural machine translation systems for cross-lingual translation and understanding.

Research on applying LLMs to machine translation covers, among other things: 1. Model design. Applying an LLM to machine translation first requires designing a neural network model that can map source-language text to target-language text.

A review of Machine Translation

Machine Translation (Bhattacharyya, 2015), by Professor Pushpak Bhattacharyya, head of a computer science and engineering department in India, was published by the Taylor & Francis Group in 2015. Distilling more than a decade of the author's teaching and practice in machine translation (MT), the book systematically surveys MT principles, linguistic disambiguation, alignment models, the mathematics of MT, and evaluation models, with examples drawn from Hindi and other languages. Using the Vauquois triangle (pyramid), the author analyzes the "analysis-transfer-generation" process of bilingual MT and the techniques it involves: word translation, translation alignment, lexical fertility management, and phrase-alignment management. Word translation and translation alignment are treated as an iterative process: over a given parallel corpus, an Expectation-Maximization (EM) algorithm lowers the average entropy and computes the translation probabilities. EM proceeds by assigning initial hypotheses, computing alignment probabilities, constructing the mathematical and likelihood expressions, and estimating parameters; data can overturn a hypothesis, can limit translation capability and efficiency, and can also account for the translation probabilities as fully as possible. Of the three models discussed, Model 1 focuses mainly on modeling translation probabilities. The book also observes that instrumental differences between languages lead to structural differences in ascending transfer and descending transfer during MT, which distort the output. It identifies the likely challenges and open problems MT faces, helps scholars and translation students grasp the theoretical foundations of MT and its difficulties, and points the way for machine translation's future.

A literature review on machine translation

Machine translation (MT), also called automatic translation, is the process of using a computer to convert text in one natural source language into another natural target language. It is a branch of natural language processing (NLP) and is inseparably linked to computational linguistics and natural language understanding.

Since the turn of the 21st century, with the rapid growth of the Internet and accelerating global economic integration, online information has exploded and international exchange has become increasingly frequent; overcoming the language barrier has become a problem the international community faces together.

Because human translation alone cannot possibly meet the demand, using machine translation to help people obtain information quickly has become an inevitable trend.

1. Foundations of machine translation. MT research rests on three disciplines: linguistics, mathematics, and computer science.

Linguists provide dictionaries and grammar rules suitable for computer processing; mathematicians formalize and encode the material the linguists provide; computer scientists supply the software and hardware for MT and write the programs.

Without any one of these, machine translation cannot be realized, and the quality of MT depends entirely on the joint efforts of all three.

2. A brief history of machine translation. MT research dates back to the 1940s and 1950s.

In 1946 the first modern electronic computer, ENIAC, was built; soon afterwards the American information-theory pioneer W. Weaver and the British engineer A. D. Booth, discussing the possible applications of electronic computers, proposed using them for automatic language translation.

In 1949, W. Weaver published his "Translation" memorandum, formally proposing the idea of machine translation.

Over the six decades since, machine translation has traveled a long and winding road, which scholars conventionally divide into four periods. 1. The pioneering period (1946-1964): in 1954, Georgetown University, working with IBM, completed the first English-Russian MT experiment on an IBM-701 computer, demonstrating the feasibility of machine translation to the public and the scientific community and opening the curtain on MT research.

Machine translation evaluation outline

2004 Machine Translation Evaluation Outline

1. Evaluation targets

The evaluation covers the core technology of Chinese-English, English-Chinese, Chinese-Japanese, Japanese-Chinese, Chinese-French, and French-Chinese machine translation systems.

2. Evaluation content

Two kinds of corpora are evaluated: a document corpus and a dialogue corpus.

The domains are the general domain and Olympics-related domains, including sports events, weather forecasts, transport and accommodation, and tourism and catering.

The evaluation metrics are translation quality and translation speed.

3. Evaluation method

1. Mode. The evaluation is conducted on site.

Results are assessed mainly by human evaluation, supplemented by automatic evaluation.

Human evaluation uses a comprehensibility-rate metric.

The procedure is: the organizers pool all submitted translations, shuffle the order of the translated sentences randomly by computer, and hand all translated sentences to several experts for comprehensibility assessment.

The experts' judgments are then collected, restored by computer to the original order, and the overall comprehensibility rate is computed.

Automatic evaluation uses the n-gram-based BLEU and NIST methods.

2. Procedure. (1) Install the system under test in the unified evaluation environment provided by the organizers, in the designated directory.

(2) The organizers release the evaluation data, stored in a designated directory. After the evaluation data are released, participants may not change any system parameters.

(3) Participants run their systems and submit the results. Participants should teach the evaluators how to operate the system; all operations are performed by the evaluators, and the participants' staff must leave the room while the system runs. The system must run in batch mode: it reads a script file (format described below) listing input file names and the corresponding output file names.

(4) The organizers carry out the human evaluation afterwards.

(5) The evaluation results are published.

3. Scoring. (1) Human scoring: each sentence is scored from 0.0 to 5.0, with one decimal place allowed, and the result is converted to a percentage:

overall comprehensibility rate = (sum of all sentence scores) / (number of sentences) / 5 × 100%. (2) Translation speed: evaluation staff record the translation time on site; each system displays the elapsed time from the start of translating the first sentence to the completion of the last sentence (system initialization time is excluded; only the time from the start of translation to completion of all sentences is counted).
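The percentage conversion above can be written out directly (the sentence scores below are made-up illustrative values, not real evaluation data):

```python
def comprehensibility_rate(scores):
    """Overall comprehensibility rate: each sentence is scored on a
    0.0-5.0 scale; the rate is the mean score divided by the maximum
    score of 5, expressed as a percentage."""
    return sum(scores) / len(scores) / 5 * 100

# Four sentences as scored by the experts (illustrative only).
rate = comprehensibility_rate([5.0, 4.0, 3.5, 2.5])
# (5.0 + 4.0 + 3.5 + 2.5) / 4 / 5 * 100 = 75.0
```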

4. Input/output file format. The system first accepts a script file as input; the script file gives the file paths of a series of source-language and target-language files for machine translation.
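The excerpt cuts off before the script-file format is specified, so the sketch below assumes the simplest plausible layout, one tab-separated input/output pair per line; the real format may well differ:

```python
# Hypothetical parser for a batch-mode script file that pairs each
# source-language input file with its target-language output file.
# The one-pair-per-line, tab-separated layout is an assumption.

def read_script(text):
    """Parse a script listing input/output file-name pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split("\t")
        pairs.append((src, dst))
    return pairs

script = "doc1.src.txt\tdoc1.out.txt\ndoc2.src.txt\tdoc2.out.txt\n"
pairs = read_script(script)
# [('doc1.src.txt', 'doc1.out.txt'), ('doc2.src.txt', 'doc2.out.txt')]
```

A batch-mode system would then translate each input file into its paired output file, with no interactive operation.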


6. Evaluation metrics

This evaluation uses several automatic metrics: BLEU-SBP, BLEU-IBM, BLEU-NIST, NIST, GTM, mWER, mPER, ICT, and WoodPecker. BLEU-SBP is the primary automatic metric. The automatic scoring is case-sensitive. Chinese is evaluated at the character level rather than the word level, so participants need not insert spaces between the Chinese characters of their translations. Before automatic scoring, the organizers convert every full-width character in the GB2312 A3 region of the Chinese translations into its half-width counterpart; participants need not perform this conversion themselves.

WoodPecker is a new machine translation evaluation metric developed by Microsoft Research Asia. Running on the WoodPecker system platform, it automatically evaluates a translation system's ability to translate various kinds of linguistic knowledge, called check-points. Based on the word alignment between the test data set and the reference translation data set and on their parse trees, the WoodPecker system first automatically extracts check-point types from the source and target languages, covering dozens of linguistic categories such as noun phrases, verb-object collocations, prepositional phrases, and new words. It then evaluates the translation system's ability on each linguistic phenomenon by measuring how well the system's translations of the check-points match the reference translations; see Appendix 6. To improve the accuracy of the WoodPecker evaluation, the check-point test set will this time be proofread manually.
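Among the listed metrics, mWER is commonly defined as the multi-reference word error rate: the minimum length-normalized edit distance between the system output and any of the reference translations. The following is a sketch of that common definition, not the evaluation's official implementation:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance via a rolling 1-D DP table."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = min(dp[j] + 1,                        # deletion
                      dp[j - 1] + 1,                    # insertion
                      prev + (a[i - 1] != b[j - 1]))    # substitution
            prev, dp[j] = dp[j], cur
    return dp[len(b)]

def mwer(hypothesis, references):
    """Multi-reference word error rate: the minimum edit distance to
    any reference, normalized by that reference's length."""
    return min(edit_distance(hypothesis, r) / len(r) for r in references)

hyp = "the cat sat on mat".split()
refs = ["the cat sat on the mat".split(), "a cat sat on a mat".split()]
# one deletion against the first reference: 1/6
```

mPER is the position-independent variant, which ignores word order when counting errors.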
Shi Xiaodong (Xiamen University), Sun Le (Institute of Software, Chinese Academy of Sciences), Wang Huilin (Institute of Scientific and Technical Information of China), Yang Muyun (Harbin Institute of Technology), Yu Jingsong (Peking University), Zhang Dongdong (Microsoft Research Asia), Zhao Hongmei (Institute of Computing Technology, Chinese Academy of Sciences), Zhou Yu (Institute of Automation, Chinese Academy of Sciences). For more information about the workshop and the evaluation, see: /cwmt2009. A CWMT forum for this workshop and evaluation has been set up on Google Groups; you are welcome to join the discussion: /group/cwmt
Within one month after the CWMT2009 workshop ends, the organizers will provide the reference translations for the "English-Chinese scientific machine translation" and "Chinese-Mongolian daily-expression machine translation" evaluation tasks to the participants for research use. The reference translations for the "Chinese-English news single-system" and "English-Chinese news machine translation" tracks will not be released to participants; they will be kept for reuse in the next evaluation, so that each participant's technical progress over the interval can be measured. Participants should therefore avoid tuning against these two test sets during this period, lest the objectivity of the next evaluation be compromised.
5. Training data

The organizers will designate language resources for the participants to use in training their systems; see Appendix 5 for the resource list.

1. "Machine translation" tracks. The primary system in a "machine translation" track may use only the data designated by the organizers; no external data may be used for training. Data constructed by hand during system development are within the permitted range, so rule-based machine translation systems may also participate as primary systems. If a participating primary system is a rule-based system that incorporates example-based or statistical machine translation techniques, the data used by those techniques must likewise stay within the data designated by the organizers; no external data are allowed.

3. "System combination" track. The goal of a participating "system combination" system is to take the translation outputs of the "single system" track and recombine them to obtain better results. Such systems may work only from the existing translation outputs: they may not use the source-language sentences, and may not use any bilingual resources for training.
3. Test data

1. Test data for the "machine translation" tracks. Each "machine translation" track follows the evaluation practice now common internationally: the organizers provide source-language test data, already segmented into sentences. See Appendix 2 for the data format.

2. Test data for the "single system" track: similar to the "machine translation" tracks. In addition to running on this evaluation's CWMT2009 test data, participants in the "Chinese-English news single-system" track must also run on the SSMT2007 test data and submit both sets of results to the organizers.

3. Test data for the "system combination" track. After the "Chinese-English news single-system" evaluation finishes, the organizers will send all participants' translation outputs (N-best translations) on both the CWMT2009 and SSMT2007 test data to every participant in the "Chinese-English news system-combination" track as test data. Those participants perform system combination over the multiple translations of the CWMT2009 test data and submit the combined results; the translations of the SSMT2007 test data are used to assess the individual input systems.

4. Distractor data. Besides the real test data, the test data released to participants contain a certain proportion of distractor data, which are not actually used in scoring.

5. Cut-off date. To ensure that the training data and test data do not overlap, the organizers define a cut-off date separating them; for this evaluation the cut-off date is January 1, 2009. All training and development data, whether provided by the organizers or collected by the participants themselves, must have been produced before the cut-off date (exclusive). The test data provided by the organizers will have been produced on or after the cut-off date (inclusive).
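The cut-off rule can be stated as a pair of date checks (the function and field names are illustrative only):

```python
from datetime import date

# The evaluation's cut-off date: training data must predate it
# (exclusive); test data must not (inclusive).
CUT_OFF = date(2009, 1, 1)

def usable_for_training(produced_on):
    """Training/development data must be produced before the cut-off."""
    return produced_on < CUT_OFF

def usable_for_testing(produced_on):
    """Test data must be produced on or after the cut-off."""
    return produced_on >= CUT_OFF
```

The exclusive/inclusive split guarantees that no document can legally appear on both sides of the boundary.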
The contrast systems of the "machine translation" tracks may be trained on any data. If such a system uses data beyond what the organizers provide, the participant must state whether those data are publicly available: if they are, the participant should state their source; if not, the participant should describe their content and size. All external data must have been produced before the cut-off date.

2. "Single system" track. All participating systems in the "Chinese-English news single-system" track, both primary and contrast, may use only the data designated by the organizers; no external data may be used for training (note that this differs from the "machine translation" tracks). Hand-constructed data produced during system development are again within the permitted range, so rule-based systems may participate; if a participating rule-based system mixes in example-based or statistical machine translation techniques, the data those techniques use must also stay within the designated range, with no external data allowed. Note in particular that, because "Chinese-English news single-system" participants must run on and submit results for the SSMT2007 test data, the training data for this track must not include the SSMT2007 test set or its reference translations.

3. "System combination" track. Participating systems in the "system combination" track may use only the target-language data provided by the organizers; no external data and no bilingual data may be used for training.
This evaluation applies WoodPecker only to the "Chinese-English news single-system" and "Chinese-English news system-combination" tracks.
7. Technical report and workshop participation

After the evaluation, each participant must submit a detailed technical report to the CWMT2009 workshop and send at least one person to attend the workshop. See Appendix 4 for the report requirements.
8. Reference translations provided by the organizers to participants
The evaluation tracks above fall into three classes:

1. "Machine translation" tracks. Participating systems translate the source-language sentences provided by the organizers into target-language sentences. Any machine translation technology may be used, including rule-based, example-based, or statistical machine translation; there is no restriction on the techniques employed.

2. "Single system" track. Participating systems likewise translate the provided source-language sentences into the target language, but each must be a single system. System-combination techniques that merge the results of several systems into one are not allowed; rescoring over multiple translation results is not allowed, since rescoring can be regarded as a special case of system combination; and multi-model decoding, in which several different probability models jointly guide the decoder, is likewise not allowed, since it too can be viewed as a form of system combination. Beyond this, there is no restriction on the techniques employed.
9. Evaluation calendar

1. July 15, 2009: registration deadline
2. July 1, 2009: the organizers release the training data
3. August 10, 2009, 9:00 a.m.: the organizers release the "Chinese-English news single-system" test data
(the remaining releases and submissions are likewise due at 9:00 a.m.; the system-combination and Chinese-Mongolian test data are released on August 24, 2009)
2. Evaluation tracks

The tracks in this evaluation are as follows:

Track code       | Track name                                             | Language pair     | Domain            | Technology
ZH-EN-NEWS-SINGL | Chinese-English news single system                     | Chinese→English   | news              | single system
ZH-EN-NEWS-COMBI | Chinese-English news system combination                | Chinese→English   | news              | system combination
EN-ZH-NEWS-TRANS | English-Chinese news machine translation               | English→Chinese   | news              | machine translation
EN-ZH-SCIE-TRANS | English-Chinese scientific machine translation         | English→Chinese   | scientific        | machine translation
ZH-MN-DAIL-TRANS | Chinese-Mongolian daily-expression machine translation | Chinese→Mongolian | daily expressions | machine translation
4. August 14, 2009: "Chinese-English news single-system" participants submit their results and system descriptions
5. August 17, 2009: the organizers release the "English-Chinese news machine translation" and "English-Chinese scientific machine translation" test data
6. August 21, 2009: "English-Chinese news machine translation" and "English-Chinese scientific machine translation" participants submit their results and system descriptions
7. The organizers release the "Chinese-English news system-combination" test data (i.e., the pooled translation outputs submitted by the "Chinese-English news single-system" participants) and the "Chinese-Mongolian daily-expression machine translation" test data; those participants then submit their results and system descriptions; the organizers complete the assessment and notify all participants of the results; all participants submit their technical reports; and the results are discussed at the workshop.
4. Submission of results

After receiving the test data, participants must return their translation results within the allotted time.

Every submitted result must be accompanied by a detailed system description; see Appendix 3 for the requirements.

1. "Machine translation" tracks. Results use the 1-best result file format; see Appendix 2 for details. Each participant must submit one primary result, produced by its primary system, and may submit up to three contrast results, produced by its contrast systems.

2. "Single system" track. Results use the N-best result file format; see Appendix 2 for details. Each participant must submit one primary result from its primary system, and may submit up to three contrast results from its contrast systems. If possible, the N-best output should give a score for each hypothesis and be sorted by score in descending order, with N at most 20. Only the first hypothesis is used for scoring; the remaining hypotheses are passed as input data to the "Chinese-English news system-combination" participants. If, for reasons of the technology employed, a system cannot provide per-hypothesis scores or N-best output, it may omit them; in that case the 1-best result file format should be used directly rather than the N-best format. Besides their results on the CWMT2009 test data, participants must also submit the same system's results on the SSMT2007 test data.

3. "System combination" track. The submission requirements are identical to those of the "machine translation" tracks.
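The N-best convention described above (a score per hypothesis, sorted best-first, N capped at 20, and only the first hypothesis scored) can be sketched as follows; the tuple layout is made up here, since the actual file format is specified in Appendix 2 and not reproduced in this excerpt:

```python
# Illustrative handling of an N-best submission: hypotheses carry
# scores, are sorted best-first, and N is capped at 20.

MAX_N = 20

def prepare_nbest(hypotheses):
    """Sort (score, translation) pairs best-first and cap the list at
    the maximum permitted N of 20."""
    ranked = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    return ranked[:MAX_N]

def one_best(hypotheses):
    """Only the top-ranked hypothesis is used for automatic scoring."""
    return prepare_nbest(hypotheses)[0][1]

hyps = [(-2.3, "translation B"), (-1.1, "translation A"), (-4.0, "translation C")]
best = one_best(hyps)
# 'translation A' (the least-negative score ranks first)
```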