Using the Web for Anaphora Resolution

合集下载

中文专业名词或术语大全

中文专业名词或术语大全

中文专业名词或术语大全English Answer:Literary Terms.Allegory: A story with a deeper meaning than its surface level.Anaphora: Repetition of a word or phrase at the beginning of successive lines or clauses.Characterization: The development of characters in a story.Climax: The turning point in a story.Conflict: The struggle between opposing forces in a story.Denouement: The resolution of a story.Epigraph: A short quotation at the beginning of a work that provides insight into its theme or meaning.Exposition: The introduction of characters, setting, and background information in a story.Falling Action: The events that follow the climax and lead to the resolution.Figurative Language: Language that uses figures of speech, such as metaphors, similes, and personification.Flashback: A scene in a story that interrupts the chronological order to show an event that happened in the past.Foreshadowing: Hints or clues about events that will happen later in the story.Imagery: The use of vivid language to create a mental picture in the reader's mind.Irony: A contrast between what is expected and what actually happens.Metaphor: A comparison between two things that are not literally alike, but share some common qualities.Motif: A recurring element or theme in a story.Narrator: The person who tells the story.Plot: The sequence of events in a story.Point of View: The perspective from which a story is told.Resolution: The outcome of the conflict in a story.Rising Action: The events that lead up to the climaxin a story.Setting: The time and place in which a story occurs.Simile: A comparison between two things that are literally alike, using the words "like" or "as"Symbolism: The use of symbols to represent abstract ideas or concepts.Theme: The central idea or message of a story.Grammatical Terms.Adjective: A word that describes a noun or pronoun.Adverb: A word that modifies a verb, adjective, or other adverb.Conjunction: A word that connects words, phrases, or clauses.Determiner: A word that comes before a noun to specify its quantity or identity.Interjection: A word or phrase that expresses strong emotion.Noun: A word that names a person, place, thing, or idea.Preposition: A word that shows the relationship between a noun or pronoun and another word in the sentence.Pronoun: A word that takes the place of a noun.Verb: A word that describes an action, occurrence, or state of being.Literary Genres.Biography: A written account of a person's life.Drama: A work of literature that is intended to be performed on stage.Essay: A written work that expresses the author'sthoughts and ideas on a particular topic.Fiction: A story that is invented rather than based on real events.Memoir: A written account of a person's own life.Novel: A long work of fiction that tells a story.Poem: A work of literature that uses language in a creative and expressive way.Short Story: A brief work of fiction that tells a complete story.Other Literary Terms.Author: The person who writes a work of literature.Character: A person, animal, or other being in a work of literature.Epilogue: A concluding section of a work of literature that provides additional information or commentary.Genre: A category of literature that shares certain characteristics.Protagonist: The main character in a work of literature.Reader: A person who reads a work of literature.Text: The written or printed words of a work of literature.中文回答:文学术语。

基于栈式降噪自编码和词嵌入表示的维吾尔语零指代消解

基于栈式降噪自编码和词嵌入表示的维吾尔语零指代消解
2018年 5月
JOURNAL OF CH INESE INFORM ATION PR0CESSING
文 章 编 号 :1003-0077(2018)05—0056—09
May,2018
Байду номын сангаас
基 于栈 式 降噪 自编 码 和词 嵌 入表 示 的维 吾 尔语 零 指代 消 解
秦 越 ,禹 龙 ,田生 伟。,冯 冠 军 ,吐 尔 根 ·依 布 拉 音 ,艾 斯 卡 尔 ·艾 木 都 拉 ,赵 建 国
QIN Yue ,YU Long。,TIAN Shengwei。,FENG Guanj un , Turgun Ibrahim ,A skar H am dulla ,ZH AO Jianguo
(1.School of Information Science and Engineering,Xi ̄iang University,Urum qi,Xingjiang 830046,China; 2.Network Center,Xinjiang University,Urumqi,Xin iang 830046,China;
0 引言
指代 (anaphora)是 常 见 的 自然 语 言 现象 ,它 是 指在 语篇 中用 一个 指代 词 回指 前文 出现 过 的某一 语 言单 位 。在维 吾 尔 语语 篇 中 ,能 够通 过 上 下 文 语 境
判 断 出 的部 分 经常 被 省 略 ,被 省 略 的部 分在 语 句 中 承担 相应 的句法 成 分 ,且 指 代 前 文 中 出 现过 的某 一 语 言 单位 ,这一 现 象 称 为 零 指 代 。被 省 略 的 部分 称 为零 指代 项 ,被 指 代 的语 言 单 位 称 为 先 行语 (ante— cedent)。如例 句 所 示 ,其 中“ ”代 表 零 代 词 出现 的 位 置 (维吾 尔语 书写 习惯 为从 右 向左 )。

专业英语八级英语语言学知识分类模拟题1

专业英语八级英语语言学知识分类模拟题1

专业英语八级英语语言学知识分类模拟题1专业英语八级英语语言学知识分类模拟题1单项选择题1. Which of the following does NOT state how the linguist discovers the nature and the rules of the underlying language system?A.He has to collect and observe language facts.B.He has to display and then generalize some similarities of the language facts.C.He has to formulate some hypotheses about the language structure.D.He has to deal with the basic concepts, theories, descriptions, models and methods applicable in any linguistic study.答案:D语言学家为了找出潜在的语言系统中的实质和规则,他须收集和观察语言事实,找出某些相似性并对其作出概括;然后,对语言结构进行某种假设,再对照所观察到的事实进行反复验证以充分证明它们的有效性。

2. Which of the following is NOT a distinctive feature of human language?A.Arbitrariness.B.Productivity.C.Cultural transmission.D.Finiteness.答案:D语言的区别性特征有五个:arbitrariness(任意性),productivity(能产性),duality(双层性),displacement(不受时空限制性),cultural transmission(文化传递性)。

The real-time interpretation of pronouns and reflexives Structural and non-structural infor

The real-time interpretation of pronouns and reflexives Structural and non-structural infor

The real-time interpretation of pronouns and reflexives: Structural and non-structural informationElsi Kaiser, University of Southern CaliforniaJeffrey T. Runner, Rachel S. Sussman & Michael K. Tanenhaus, University of RochesterThe observation that English pronouns and reflexives have a (nearly) complementary distribution is central to traditional Binding Theory (BT, ex.1). Picture NPs (PNPs) are a well-known exception, since both pronouns and reflexives are possible. Existing research suggests that semantic/pragmatic factors influence the use of pronouns and reflexives especially in possessorless PNPs (ex.2a), but also in PNPs with possessors (ex.2b), e.g.[14],[8],[15], see also [20],[1]. Here we focus on two complementary claims regarding semantic/pragmatic effects with possessorless PNPs: (i) According to [19], pronouns have a preference for antecedents that are perceivers-of-information (see [5],[2] for related discussion of implicit arguments and control). (ii) According to [11], reflexives prefer source-of-information antecedents (see also [16],[5],[2]). We approach these claims from an experimental perspective, with three main aims. First, we aim to complement existing work by investigating the hypothesized source/perceiver preferences experimentally. Our second aim is to investigate how different kinds of information are integrated during real-time processing of PNP pronouns and reflexives. Does source/perceiver information have an effect only after an initial ‘syntax-only’ stage, as two-stage parsing models might predict (e.g.[7], see also [17]), or does it have an impact from the earliest moments onwards, in the spirit of multiple-constraint processing models (e.g.[12],[3])? Our third aim is to test the hypothesis, argued for by [9], that anaphoric forms can differ in degree of sensitivity to syntactic and semantic/pragmatic information. According to this approach--based on cross-sentential pronouns and demonstratives--anaphor resolution is guided by multiple factors which are weighted differently for different anaphoric forms. To investigate the time-course of processing and to probe the validity of [9]’s hypothesis for intra-sentential PNP pronouns and reflexives, we used the visual-world eye-tracking paradigm [18] to obtain both fine-grained time-course data and action-based reference resolution information.Exp.1--Eye-movements were recorded as participants (n=16) listened to sentences with possessorless PNPs while looking at displays with two characters and a picture of each character (ex.3). Verb type (heard/told) and anaphoric form (pro/refl) were crossed to create four conditions. The task was to click on the picture mentioned in the sentence. Results: With pronouns, participants show a perceiver preference, choosing the subject picture more with heard than told (p<0.01). In contrast, there is no significant verb effect with reflexives; participants usually chose the subject picture (>85% with both verbs)--a pattern compatible with BT. Eye-movements show that with pronouns, a clear perceiver preference emerges 200ms after pronoun-onset (p<0.05), the earliest point at which input-driven eye-movements are expected (given the time it takes to program and launch an eye-movement [13]). With reflexives, eye-movements reveal an early but more subtle sensitivity to source, starting 200ms after onset; participants look more at the object-picture when it is the source (with heard) than when it is the perceiver (with tell) (p<0.05). In sum, pronouns and reflexives are sensitive to the verb manipulation early on; there is no sign of an initial syntax-only stage. However, pronouns show a stronger susceptibility to the verb manipulation than reflexives, supporting [9].Exp.2--Using the same methodology, we further probed the strength of source/perceiver effects by testing possessed PNPs (PPNPs, ex.4). In PPNPs, both Subj and Obj are BT-compatible antecedents for pronouns, but according to traditional BT, PPNP reflexives are constrained to refer to possessors [4]. Results: Pronouns again prefer perceivers, as shown by picture choices (p<0.01) and eye-movements (p<0.05). Reflexives strongly prefer possessors, but there are also some subject-picture choices (~8%). Neither picture choices nor eye-movements show clear verb effects for reflexives; the (overt) possessor seems to create a syntactic configuration that prevents reflexives from being sensitive to source-of-information.Conclusions--In possessorless PNPs, reflexives show an early preference for sources and pronouns a strong early sensitivity for perceivers, indicating that semantic/pragmatic factors influence anaphor resolution from the first moments onwards. In PPNPs, the perceiver preference persists for pronouns, but reflexives show no source bias and mostly refer to what BT predicts, i.e. possessors. Overall, pronouns aremore susceptible to the verb manipulation than reflexives, indicating that in the intra-sentential domain, as well as in cross-sentential reference resolution [9], anaphoric forms can differ in how sensitive they are to different kinds of information. As a whole, these results suggest a theory of anaphor resolution must be fine-grained enough to allow for multiple factors, weighted differently for different anaphoric forms.(1) Principles A, B and C of Binding Theory (based on [5])A. A reflexive is bound in a local domain.B. A pronoun is free in a local domain.C. An R-expression is free.Binding: A binds B iff A c-commands B, and A and B are coindexed.C-command: A node A c-commands a node B iff the first branching node dominating A also dominates B, and A does not dominate B.(2a) PNP without possessor: Lisa likes the picture of herself/her.(2b) PNP with possessor: Lisa likes Mary’s picture of herself/her.(3) Exp.1 Scene = Peter, Andrew, picture of Peter, picture of Andrew(3a) Peter source told Andrew perceiver about the picture of {him/himself} on the wall.(3b) Peter perceiver heard from Andrew source about the picture of {him/himself} on the wall.(4) Exp.2 Peter {told/heard from} Andrew about Greg’s picture of {him/himself} on the wall. References1. Asudeh, A. & F. Keller. 2001. Experimental Evidence for a Predication-based Binding Theory. In Andronis, Ball, Elston & Neuvel, eds., Proceedings of the 37th Meeting of the Chicago Linguistic Society.2. Bhatt, R. & Pancheva, R. 2006. Implicit Arguments. The Blackwell Companion to Syntax v.II, 554-584.3. Badecker, W. & K. Straub. 2002. The processing role of structural constraints on the interpretation of pronouns and anaphors. Journal of Experimental Psychology: LMC, 28:748-769.4. Chomsky, N. 1981. Lectures on Government and Binding. Foris. Dordrecht.5. Chomsky, N. 1986. Knowledge of language. New York: Praeger.6. Davies, W. & Dubinsky, S. 2003. On extraction from NPs. NLLT, 21, 1-37.7. Frazier, L., & Fodor, J. D. 1978. The sausage machine. Cognition 6, 291–325.8. Jaeger, T.F. 2004. Binding in picture NPs revisited: Evidence for a semantic principle of extended argumenthood. In Butt, M. & King, T.H. (eds.) Proceedings of LFG04, 268-288. Stanford: CSLI.9. Kaiser, E. & Trueswell, J. In press. Investigating the interpretation of pronouns and demonstratives in Finnish:Going beyond salience. Gibson & Pearlmutter, Processing & acquisition of reference. MIT Press.10. Keller, F. & Asudeh, A. 2001. Constraints on linguistic coreference. 23rd Cog Sci Soc, 483-488.11. Kuno, S. 1987. Functional Syntax: Anaphora, discourse & empathy. Chicago.12. MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703.13. Matin, E., Shao, K. C., & Boff, K. R. 1993. Saccadic overhead: Information processing time with and without saccades. Perception & Psychophysics, 53(4), 372–380.14. Pollard, C. & Sag, I. 1992. Anaphors in English and the scope of BT. LI 23:261-303.15. Runner, J., Sussman, R. & Tanenhaus, M. (2006). Assigning Reference to Reflexives and Pronouns in Picture Noun Phrases, Cognitive Science 30, 1-49.16. Sells, P. 1987. Aspects of Logophoricity. LI 18 (3):445-479.17. Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language, 48, 542-562.18. Tanenhaus, M.K., Spivey-Knowlton, M.J., Eberhard, K.M. & Sedivy, J.E. (l995). Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632-1634.19. Tenny, C. 2004. Short Distance Pronouns in RNPs and a Grammar of Sentience. Ms.20. Zribi-Hertz, A. 1989. Anaphor Binding and Narrative Point of View English Reflexive Pronouns in Sentence and Discourse. Language 65(4): 695-727.。

7.现代英语词汇学(第七章)

7.现代英语词汇学(第七章)

with the stimulus. Stated in terms of semantic features, the rule would go as follows: “Change the sign of only one feature.” The word man, for example, hason with the word woman: [+Noun], [+Human], [+Adult]. But these two words differ in the last feature [+ Male]. If [+Male] is changed to [- Male], the stimulus man yields the response woman. The antonymous adjectives longshort elicit each other with a change of the feature [+ Polar]. Other frequent single-feature contrasts include [+ Plural] in verbs (is - are, was – were, has – have, etc.); [ + Past] among strong verbs (is – was, are- were, has- had, take- took, etc.) [+ Nominative] among pronouns (he- him, she- her, we- us, they- them, etc.); and [+ Proximal] among the deictic /daitik/ words (here-

教师资格考试《英语学科知识与教学能力》考试试卷(1286)

教师资格考试《英语学科知识与教学能力》考试试卷(1286)

教师资格考试《英语学科知识与教学能力》课程试卷(含答案)__________学年第___学期考试类型:(闭卷)考试考试时间:90 分钟年级专业_____________学号_____________ 姓名_____________1、单项选择题(36分,每题1分)1. Which of the following description of the sound segmentsis NOT correct? _____A. [h] glottal fricativeB. [m] bilabial nasalC. [m] alveolar approximantD. [l] alveolar lateral答案:C解析:[m]是双唇鼻音,而不是齿龈近音。

2. Which of the following sets of English sounds differsonly in one distinctive feature? _____A. [v][e][k][e]B. [i][i][e][b]C. [p][t][f][s]D. [p][i][b][s]答案:C解析:项都是清辅音。

3. By nine o’clock, all the Olympic torch bearers had reached the top of Mount Qomolangma, _____ appeared a rare rainbow soon.A. above whichB. of whichC. on whichD. from which答案:A解析:which指代top of Mount Qomolangma,彩虹应当在山顶之上,故用above which。

4. Which of the following statements about the Audiolingual Method is wrong? _____A. The method involves praising the correct response or publishing incorrect response until the right one is given.B. The method involves giving the learner stimuli in the form of prompts.C. Emphasis is laid upon using oral language in the classroom some reading and writing might be done as homework.D. Mother tongue is accepted in the classroom just as the target language.答案:D解析:udiolingual Method(听说教学法)的基本特点是:先听说、后读写;教学以句型为中心;将外语与母语对比,确定教学难点;大量练习和反复实践;及时提出批评任何错误;尽量不用母语;广泛利用田晓兰手段。

语篇分析——精选推荐

语篇分析——精选推荐

语篇分析语篇的衔接⼿段cohesion and coherence 衔接与连贯他们系统地将衔接分为五⼤类:照应(reference)、替代(substitution)、省略(ellipsis)、连接(conjunction)及词汇衔接(lexical cohesion)。

其中前三类属于语法⼿段,第四类属于逻辑⼿段,最后⼀类属于词汇衔接⼿段。

照应(reference)是⼀些起信号作⽤的词项。

它们不能像⼤多数词项那样本⾝可作出语义理解,⽽只能通过照应别的词项来说明信息(Halliday & Hasan, 1976:31)。

照应分为外照应(exophora)和内照应(endophora)。

内照应⼜可分为下照应(或称后照应)(anaphora)和上照应(或称前照应)(cataphora)。

外照应指独⽴于上下⽂之外的词项。

内照应指意义依赖于上下⽂的词项。

下照应(后照应)指意义依赖于前述词项的词项、上照应(前照应)指意义依赖于后述词项的词项。

The snail is considered a great delicacy.As the child grows, he learns to be independent.It never should have happened. She went out and left the door open.替代(substitution)指⽤⼀个词项去代替另⼀个或⼏个词项,是词项之间的⼀种代替关系。

英语中常⽤的替代词one(s), do, same。

Halliday和Hasan将其分为名词性替代、动词性替代和从句性替代。

由于前述句⼦或上下⽂使得意义明确⽽省去句⼦的⼀部分称作为省略。

它可以视为“零替代”(Halliday & Hasan, 1976:142),省去⼀些上下⽂可使之意义明确的成分。

Compare the new dictionary with the old one(s).We rent a house, but they own one.A: Black coffee, please.B: The same for me.They do not buy drinks at the supermarket, but we do.I think so.■省略EllipsisHe prefers Dutch cheese and I prefer Danish.---Do you understand?---I tried to.---Y ou haven’t told him yet.---Not yet.连接(conjunction)是句际间意义相互联系的⼀种衔接⼿段,常⽤的有递进、转折、因果、时间。

统计语言模型(fandywang 20121106)

统计语言模型(fandywang 20121106)
I can see Alcatraz from the ws is up The S&P500 jumped Housing prices rose
Economy is good
Einstein met with UN officials in Princeton
17
Shannon Game: Word Prediction
• Claude E.Shannon. “Prediction and Entropy of Printed English”, Bell System Technical Journal 30:30-64. 1951 • How well can the next letter(or word) of a text be predicted when the preceding letters(or words) are known
P(W) or P(wn|w1,w2…wn-1) is called a language model
20
How to compute P(W)
• How to compute this joint probability:
– P(its, water, is, so, transparent, that, the)
✓ ✗
The
13th
Shanghai International Film Festival…
Party May 27 add
Information extraction (IE)
You’re invited to our dinner party, Friday May 27 at 8:30
9
什么是自然语言处理
– Word Segment
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Using the Web for Anaphora ResolutionKatja Markert,Malvina Nissim School of InformaticsUniversity of Edinburgh markert@ mnissim@Natalia N.ModjeskaSchool of Informatics University of Edinburgh and Department of Computer Science University of Toronto natalia@AbstractWe present a novel method for resolv-ing non-pronominal anaphora.Insteadof using handcrafted lexical resources,we search the Web with shallow patternswhich can be predetermined for the typeof anaphoric phenomenon.In experi-ments for other-anaphora and bridging,our shallow,almost knowledge-free andunsupervised method achieves state-of-the-art results.1IntroductionAfter having focussed on pronominal anaphora, researchers are now devoting attention to other nominal anaphors as well(Harabagiu and Maio-rano,1999;Vieira and Poesio,2000;Bierner, 2001;Modjeska,2002;Ng and Cardie,2002). These comprise such diverse phenomena as coref-erence,bridging(Clark,1975)(see also Exam-ple(1)),and other-anaphora(Example(2)).1 (1)The apartment she shares with a12-year-old daughter and her sister was rattled,books and crystal hit thefloor,[...] (2)You either believe Seymour can do itagain or you don’t.Beside the designer’sage,other risk factors for Mr.Cray’scompany include the Cray-3’s[...]chiptechnology.and Poesio,2000)).Moreover,manually built resources are expensive and time-consuming to build and maintain.There have been efforts to extract missing lex-ical relationships from corpora in order to build new knowledge sources and enrich existing ones (Hearst,1992;Berland and Charniak,1999;Poe-sio et al.,2002).In our view there are two main problems with these approaches.Firstly,the size of the used corpora still leads to data sparseness(Berland and Charniak,1999) and the extraction procedure can therefore require extensive smoothing.Secondly,it is not clear how much and which knowledge to include in a fixed context-independent ontology,whether man-ually built or derived from a corpus.Thus,should metonymy,underspecified and point-of-view de-pendent hyponymy relations(Hearst,1992)be in-cluded?Should age,for example,be classified as a hyponym of risk factor independent of context? To solve thefirst problem,we propose using the Web,which with approximately968M pages2 is the largest corpus available to the NLP ing the web has proved successful in severalfields of NLP,e.g.,machine translation (Grefenstette,1999)and bigram frequency estima-tion(Keller et al.,2002).In particular,(Keller et al.,2002)have shown that using the Web handles data sparseness better than smoothing.However, to our knowledge,the Web has not been used for anaphora resolution yet.We do not offer a solution to the second prob-lem,but instead claim that,for our task,we do not need a predeterminedfixed ontology at all.In Example(2),we do not need to have andfix the knowledge that age is always a risk factor,but only that,among the possible NP antecedents“Sey-mour”,“designer”,and“(designer’s)age”,the lat-ter is the most likely to be viewed as a risk factor. In the next section,we introduce a method that uses shallow lexico-syntactic patterns and their web frequencies instead of afixed ontology to achieve this comparison between several possible antecedents.We then present two experiments onother-anaphora and bridging,respectively.Our shallow technique,used on a noisy and unpro-3From now on,we will often use“anaphor/antecedent”instead of the more cumbersome“lexical heads of the anaphor/antecedent”.4These simplified instantiations serve as an example and are not thefinal instantiations we use;see Section3.2.tern can be searched in any corpus to de-termine its frequency.We now follow therationale that the most frequent of these in-stantiated patterns determines the correct an-tecedent.4.As the patterns can be quite elaborate,most corpora will be too small to deter-mine the corresponding frequencies reli-ably.The instantiation age and other riskfactors,for example,does not occur at all inthe British National Corpus(BNC),a100Mwords corpus of British English.5Thereforewe use the largest corpus available,the Web.6We submit all instantiated patterns as queriesto the Web making use of the GOOGLEAPI technology and,as afirst approximation,select the instantiation yielding the highestnumber of hits.Here,age and other riskfactors yields over400hits,whereas theother two instantiations for this example yield0hits each.3Experiment I:Other-anaphoraHere we restrict other-anaphora to referential lexi-cal NPs with the modifiers other or another andnon-structurally given antecedents,as in Exam-ple(2).7The distance between an other-anaphorand its antecedent can be large;(Modjeska,2002)observed a dependency that spans over17sen-tences.3.1Data Collection and PreparationWe tested our method on120samples of other-anaphors from the Wall Street Journal corpus(Penn Treebank release2,first three sections).These samples are part of the dataset reported in(Modjeska,2002).We used the samples in which8and ORGANIZATION,whenever possible.We clas-sified LOCATIONS into COUNTRY,(US)STATE,CITY,RIVER,LAKE and OCEAN,using mainly gazetteers.9If an entity classified by GATE asORGANIZATION contained an indication of the or-ganization type,we used this as a subclassifica-tion;therefore“Bank of America”is classified as BANK.No further distinctions were developed for the category PERSON.For numeric and times enti-ties we used simple heuristics to classify them fur-ther into DAY,MONTH,YEAR as well as DOLLAR or simply NUMBER.Disregarding numeric and time entities,our dataset included262possible NE antecedents. Our method recognised216as proper names(82% recall).Of these,202(93%precision)were cor-rectly classified.Finally,all elements of A were lemmatized. For Example(2),this results in A={person [=Seymour],designer,age}and ana=factor.3.2Pattern Selection and Query Generation We use the following pattern for other-anaphora:10 (O1)(N1{sg}OR N1{pl})and other N2{pl} For common noun antecedents,we instantiate the pattern by substituting N1with a possible an-tecedent,an element of A,and N2with ana,as normally N1is a hyponym of N2in(O1),and the antecedent is a hyponym of the anaphor.An instantiated pattern for Example(2)is(age OR ages)and other factors(see I c1in Table1).11 For NE antecedents we instantiate(O1)by sub-stituting N1with the NE category of the an-tecedent,and N2with ana.An instantiated pat-tern for Example(4)is(person OR persons) and other shareholders(see I p1in Table1).In this instantiation,N1(“person”)is not a hyponym of N2(“shareholder”),instead N2is a hyponym of N1.This is a consequence of the substitution of the antecedent(“Mr.Pickens”)with its NE categoryTable1:Patterns and Instantiations for other-anaphoracommon noun(O1):(N1{sg}OR N1{pl}and other N2{pl}I c1:“(age OR ages)and other factors”P r(Imax ant)=M antnumber of GOOGLE pages P r(Y ant)=freq(Y ant)P r(X ant)P r(Y ant)We resolve to the antecedent with the highest MI ant.In both methods,if two antecedents achieve the same score,a recency based tie-breaker chooses the antecedent closest to the anaphor in the text.3.4Results and Error AnalysisWe postulate three categories for classifying the results:(i)correct,when the antecedent selected is the correct one;(ii)lenient,when the antecedent selected refers to the same entity as the correct an-tecedent;(iii)wrong in all other cases.In Table2,we compare our results with those obtained by the algorithm LEX(Modjeska,2002). Although LEX makes extensive use of WordNet, our algorithm achieves comparable results.Our algorithm’s mistakes are due to several fac-tors.NEs.As58(48.3%)out of the total120 anaphors(48.3%)have NE antecedents,NE res-olution is crucial for our algorithm.The low re-call of our NE recognition module has a signif-icant impact on our algorithm’s performance asTable2:Results for other-anaphora corr505854len757it leads to missing instantiations.Moreover,in-correct NE classifications yield incorrect instanti-ations,although this problem is not very frequent as the precision of our NE resolution module is relatively high(93%).Vague Anaphors.Anaphors that are semanti-cally vague,such as“issue”or“problem”,are not informative enough for a purely semantic-oriented method.Split Antecedents.Our algorithm cannot han-dle split antecedents(6cases in our dataset).In Example(5),the antecedent of“other contract months”is a set of referents consisting of“May”and“July”.(5)The May contract,which also is withoutrestraints,ended with a gain of0.45centto14.26cents.The July delivery roseits daily permissible limit of0.50cent apound to14.00cent,while other contractmonths showed near-limit advances. Our algorithm is not able to distinguish between proper cases of split antecedents(e.g.Exam-ple(5)),where a tie-breaker should not be applied, and cases where two similar entities are mentioned but only one is the actual antecedent.It is normally necessary to resort to sophisticated inference tech-niques to make this distinction.As we always ap-ply a tie-breaker,examples such as(5)cannot be handled:only“July”(the most recent antecedent) is selected and the result counts as wrong. Pronouns.Our algorithm cannot handle pro-noun antecedents as they are lexically empty.Con-trary to intuition,this does not constitute a signif-icant limitation as only2anaphors in our dataset have pronominal antecedents that refer to an en-tity that is not additionally mentioned by a full NP within the2sentence window.In contrast,in Ex-ample(4),the referent of the pronoun“he”is also mentioned by the full NP“Mr.Pickens”,and the algorithm is able to resolve the anaphor to“Mr. Pickens”,thus yielding a lenient correct result.4Experiment II:BridgingFor the scope of this paper,a bridging anaphor is a definite NP that can be felicitously used only because it refers to an entity which stands in a meronymic relation with an already explicitly in-troduced entity(see Example(1)).We use the corpus described in(Poesio et al., 2002),restricting ourselves to the examples clas-sified as meronymy.Unfortunately,this yields only12examples so that the current experiment is only a very small pilot study to explore the ex-tension of our method to other nominal anaphora. In addition,we profit from the a priori knowledge that we are dealing with an instance of meronymy (one of the many relations bridging can express), so that we do not operate in a completely realis-tic scenario.This contrasts with our experiment on other-anaphora,where the modifier other(to-gether with the absence of a structurally given antecedent)reliably signals the presence of an anaphor and the lexical relations expressed are more constrained.For each anaphor,all possible NP antecedents in a5-sentence window have been already extracted by Renata Vieira and Massimo Poesio,who also had already deleted NEs from the original dataset. Again,we stripped modification so that only the heads of possible antecedents and of the anaphors were used.Meronymy is often explicitly expressed by the following patterns:(B1)(N1{sg}OR N1{pl})of(a OR an OR the OR each OR every OR any)*N2{sg}Table3:Results for bridgingcorr738(B2)N1{pl}of(the OR all)*N2{pl}We instantiate both patterns by equating N1 with the anaphor and N2with the antecedent.In Example(1)the instantiations for the antecedent “apartment”are(floor OR floors)of(a OR an OR the OR each OR every OR any)* apartment and floors of(the OR all)* apartments.For each antecedent ant we obtain the raw fre-quencies of all instantiations it occurs in from the Web.We then compute the maximum M ant over these frequencies and select the antecedent with the highest M ant as the correct one.We will use mutual information for bridging on a larger dataset.Table3compares our results with those ob-tained by the algorithms used in(Poesio et al., 2002),of which one relies on WordNet and the other on knowledge a priori extracted from a parsed version of the BNC.In this preliminary study,our results outper-form the Wordnet method and are comparable to those obtained from corpus-based knowledge ex-traction,although we do not linguistically process the web pages returned by our search.5Related WorkMost of the current resolution algorithms for non-pronominal anaphora make heavy use of hand-crafted ontologies.The COCKTAIL system for coreference resolution(Harabagiu and Maiorano, 1999)combines sortal constraints and concep-tual glosses from WordNet with co-occurence in-formation from a treebank.(Vieira and Poesio, 2000)’s system for definite descriptions(covering, inter alia,coreference and bridging)also makes use of WordNet,as well as handcrafted constraints and consistency checks.LEX,a resolution al-gorithm developed for other-anaphors(Modjeska, 2002),employs lexical information from WordNetand heuristics for resolving anaphora with NE an-tecedents,presupposing they have previously been classified into MUC-7categories.12All these ap-proaches suffer from the shortcomings that we outlined in Section1.In contrast,we do not use any external hand-crafted knowledge.In addition,we use only shal-low search patterns and do not process the pages that GOOGLE returns in any way.13We achieve results comparable to knowledge-or processing-intensive methods.Moreover,we only com-pare the likelihood of several given antecedents to cooccur in a given pattern with a given anaphor in-stead of assuming context-independent lexical re-lations.Thus,we can resolve anaphor/antecedent relations that might or should not be included in a lexical hierarchy(e.g.risk factor/age).Our results confirm results by(Keller et al., 2002)and(Grefenstette,1999),who use the Web successfully for other NLP applications(for an overview of successful usage of the Web in NLP, see(Keller et al.,2002)).In line with these re-sults,ours also show that the large amount of data available on the Web overcomes its intrinsic noise as well as the lack of linguistic processing.To our knowledge,ours is thefirst attempt to tackle anaphora resolution using the Web.6Contributions and Future WorkWe have proposed a novel method for non-pronominal anaphora resolution,using simple Web searches with shallow linguistic patterns.We show that the large amount of data available on the Web makes anaphora resolution without hand-crafted lexical knowledge feasible.In particular,we have described two experi-ments carried out on two different anaphoric phe-nomena,namely other-anaphora and bridging.Ex-ploiting free text achieves results comparable to those obtained when using rich and structured handcrafted resources.Given the shallow techniques used and the state-of-the-art results obtained,our method is promis-Sanda Harabagiu and Steven Maiorano.1999.Knowledge-lean coreference resolution and its rela-tion to textual cohesion and coherence.In Proc.of the ACL-99Workshop on the Relation of Discourse and Dialogue Structure and Reference,pages29–38. Marti Hearst.1992.Automatic acquisition of hy-ponyms from large text corpora.In Proc.of COLING-92.Frank Keller,Maria Lapata,and Olga Ourioupina.ing the Web to overcome data sparseness.In Proc.of EMNLP-02,pages230–237.Natalia N.Modjeska.2002.Lexical and grammati-cal role constraints in resolving other-anaphora.In Proc.of DAARC-02.Vincent Ng and Claire Cardie.2002.Improving ma-chine learning approaches to coreference resolution.In Proc.of ACL-02,pages104–111.Massimo Poesio,Tomonori Ishikawa,Sabine Schulte im Walde,and Renata Viera.2002.Acquiring lexical knowledge for anaphora resolu-tion.In Proc.of LREC-02,pages1220-1224. Michael Strube and Udo Hahn.1999.Functional centering—grounding referential coherence in information putational Lingustics, 25(3):309–344.Renata Vieira and Massimo Poesio.2000.An empirically-based system for processing definite putational Linguistics,26(4).。

相关文档
最新文档