Automated Question Answering in Webclopedia - A Demonstration


Open-domain QA systems

AnswerBus, LCC ([7]), QuASM, IONAUT ([1]), START ([11]) and Webclopedia ([10]).

AnswerBus: sentence-level, multilingual support
• functional words deletion (prepositions, determiners/pronouns, conjunctions, interjections, and discourse particles)
• use of word frequency table (delete frequently used words)
• special words deletion
• word form modification

Candidate answer extraction
... words, then an answer candidate sentence should have at least two of them. When a sentence meets the condition as indicated by the above formula, it will receive a primary score based on the number of matching words it contains. Otherwise, it will receive a score of "0."

Candidate answer ranking
• question type → answer type (who → name)
• question type → keyword expansion (how far → kilometers)
• named entity extraction
• coreference resolution (he → 何靖). AnswerBus only solves the coreferences in adjacent sentences. When this type of coreference is detected, the later sentence receives part of the score from its previous sentence.
• order returned by the search engine
• answer sentence scoring

Webclopedia
Previous work in automated question answering has often categorized questions by question word alone or by a mixture of question word and the semantic class of the answer (Srihari and Li, 2000; Moldovan et al., 2000). To ensure full coverage of all forms of simple question and answer, we have been developing a QA Typology as a taxonomy of QA types, becoming increasingly specific as one moves from root downward. To create the QA Typology, we analyzed 17,384 questions and their answers (downloaded from); see (Gerber, 2001).
The Typology contains 94 nodes, of which 47 are leaf nodes; a section of it appears in Figure 2.

By CONTEXT

Naturally, this forces the patterns to contain not only surface forms (words and punctuation), but also type markers (Date, NumericalAmount, MoneyAmount...).

A Question/Answer Typology with Surface Text Patterns
• question classification tree
• automatic pattern extraction (suffix tree, precision)
• (NAME_OF_PERSON BIRTHYEAR), pattern extraction query
• evaluate the precision of each pattern; query: 银平

Patterns of Potential Answer Expressions as Clues to the Right Answers
TextRoller searches for candidate answers using key words (from the question text) and chooses the most probable answer using patterns.

In the literature we find approaches attempting to distinguish between the main (primary) and additional (secondary) query words. In (Sneiders, 1998) this distinction is discussed as applied to searching for answers to FAQs, where the answers are represented as sentences. Primary keywords are the words that convey the essence of the sentence. They cannot be ignored. Secondary keywords are the less-relevant words for a particular sentence. They help to convey the meaning of the sentence but can be omitted without changing the essence of the meaning.

Answer Extraction
Ranking
1. In most cases, the matching is boolean.
2. A couple of special cases where finer distinctions are made. For the question "How many lives were lost in the Lockerbie air crash", entities such as "270 lives" or "almost 300 lives" would be ranked above entities such as "200 pumpkins" or "150".
3. The frequency and position of occurrences of a given entity within the retrieved passages.
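
The AnswerBus notes on candidate answer extraction describe a threshold on matching query words followed by a primary score equal to the number of matches. The formula itself is not reproduced in these notes, so the sketch below uses a stand-in threshold (`min_matches`, a hypothetical helper) chosen only so that a three-word query requires at least two matches, as in the example given. The tokenizer and all function names are likewise assumptions made for illustration, not AnswerBus code.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def min_matches(query_len):
    # Stand-in threshold (an assumption, not taken from the notes):
    # floor(sqrt(query_len - 1)) + 1, which gives 2 for a three-word query.
    return math.isqrt(max(query_len - 1, 0)) + 1


def primary_score(query_words, sentence, threshold=min_matches):
    """Return the number of matched query words, or 0 if below the threshold."""
    query = {w.lower() for w in query_words}
    matched = query & tokenize(sentence)
    if len(matched) >= threshold(len(query)):
        return len(matched)
    return 0


query = ["lockerbie", "crash", "lives"]
good = "270 lives were lost in the Lockerbie air crash"
weak = "The crash was loud"
print(primary_score(query, good), primary_score(query, weak))
```

Passing the threshold as a parameter keeps the scoring step separate from the exact cutoff, which can then be swapped for whatever condition the original formula specifies.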


ACL-2002 Demonstration, Philadelphia, PA, U.S.A., July 7-12, 2002.

Automated Question Answering in Webclopedia - A Demonstration

Ulf Hermjakob, Eduard Hovy and Chin-Yew Lin
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
{ulf,hovy,cyl}@
Tel: 310-448-8476, 310-448-8731 and 310-448-8711

In this demonstration we present Webclopedia, a semantics-based question answering system accessible via the web (Hovy et al. 2002, 2001, 2000). Through a live interface (Figure 1), users can type in their questions or select a predefined question. The system returns its top 5 candidate answers, drawn from NIST's TREC corpus, a collection of 1 million newspaper texts.

Some key points:

Webclopedia integrates IR and NLP components. Both symbolic and statistical techniques are employed. For example, the CONTEX parser's grammar, the intermediate result ranking rules, and answer matching patterns are created by machine learning; the answer pinpointer uses hand-crafted matching rules.

Like almost all modern QA systems, Webclopedia uses a taxonomy of question/answer types.
The QA Typology (Hovy et al. 2002), one of the most extensive used in the literature, contains over 180 types, and is based on an analysis of 17,384 questions, plus subsequent extensions. The typology is at /natural-language/projects/Webclopedia/Taxonomy/taxonomy_toplevel.html.

Webclopedia took part in NIST's TREC QA evaluations, achieving MRR (mean reciprocal rank) scores of 31% in TREC9 (tied for second place) and 45% in TREC10.

Recent work at ISI has focused on developing Korean and Mandarin Chinese versions of Webclopedia, allowing the user to ask English questions and receive English answers from foreign-language text sources.

Rather than using only the TREC corpus as source, Webclopedia is being extended to also query the web, using commercial web search engines to provide documents with likely answer candidates.

The system works as follows:
• Question parsing: Using BBN's IdentiFinder (Bikel et al., 1999), the CONTEX parser (Hermjakob 1997, 2001) produces a syntactic-semantic analysis of the question and determines the QA type(s) sought.
• Query formation: Single- and multi-word units (content words) are extracted from the analysis, and WordNet synsets are used for query expansion. A series of Boolean queries is formed.
• IR: The IR engine MG (Witten et al., 1994) returns the top-ranked N documents.
• Selecting and ranking sentences: For each document, the most promising K << N sentences are located and scored using a formula that rewards word and phrase overlap with the question and its expanded query words. Results are ranked.
• Parsing sentences: CONTEX parses the top-ranked 300 sentences.
• Pinpointing: Each candidate answer sentence parse tree is matched against the parse of the question, with particular attention to the QA type(s) sought.
The matching patterns were built by hand; additional patterns are learned off the web (Ravichandran and Hovy, 2002). As a fallback the window method is used.
• Ranking of answers: The candidate answers' scores are computed and the topmost 5 are output as final answers.

Figure 1. Webclopedia web interface (answers in red, matched portions in blue).

References

Hermjakob, U. 1997. Learning Parse and Translation Decisions from Examples with Rich Context. Ph.D. dissertation, University of Texas, Austin. file:///pub/~mooney/papers/hermjakob-dissertation-

Hermjakob, U. 2001. Parsing and Question Classification for Question Answering. Proceedings of the ACL Workshop on Question Answering. Toulouse, France.

Hovy, E.H., L. Gerber, U. Hermjakob, M. Junk, and C.-Y. Lin. 2000. Question Answering in Webclopedia. Proceedings of the TREC-9 Conference. NIST, Gaithersburg, MD.

Hovy, E.H., L. Gerber, U. Hermjakob, C.-Y. Lin, and D. Ravichandran. 2001. Toward Semantics-Based Answer Pinpointing. Proceedings of the DARPA Human Language Technology Conference (HLT). San Diego, CA.

Hovy, E.H., U. Hermjakob, and D. Ravichandran. 2002. A Question/Answer Typology with Surface Text Patterns. Poster in Proceedings of the Human Language Technology Conference. San Diego, CA.

Ravichandran, D. and E.H. Hovy. 2002. Learning Surface Text Patterns for a Question Answering System. Proceedings of the ACL Conference. Philadelphia, PA.

Witten, I.H., A. Moffat, and T. Bell. 1994. Managing Gigabytes. New York: Van Nostrand Reinhold.
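
As a note on the MRR figures quoted in the key points: in the TREC QA setting, each question is scored by the reciprocal of the rank of the first correct answer among the top 5 returned, or 0 if none of them is correct, and the scores are averaged over all questions. The following is a minimal sketch of that computation; the function names, the `cutoff` parameter, and the toy data are illustrative assumptions and are not part of Webclopedia.

```python
def reciprocal_rank(ranked_answers, gold_answers, cutoff=5):
    """Return 1/rank of the first correct answer within the cutoff, else 0.0."""
    for rank, answer in enumerate(ranked_answers[:cutoff], start=1):
        if answer in gold_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, cutoff=5):
    """Average the reciprocal rank over (ranked_answers, gold_answers) pairs."""
    if not runs:
        return 0.0
    total = sum(reciprocal_rank(ranked, gold, cutoff) for ranked, gold in runs)
    return total / len(runs)


# Toy data (invented for illustration): three questions with their ranked
# system answers and the set of answers judged correct.
runs = [
    (["a", "b"], {"a"}),       # correct at rank 1 -> 1.0
    (["x", "y", "z"], {"z"}),  # correct at rank 3 -> 1/3
    (["p"], {"q"}),            # no correct answer -> 0.0
]
print(mean_reciprocal_rank(runs))
```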
