Automatic Verb Classification Using Multilingual Resources
Suzanne Stevenson
Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON, Canada M5S 3G4
We propose the use of multilingual corpora in the automatic classification of verbs. We extend the work of (Merlo and Stevenson, 2001), in which statistics over simple syntactic features extracted from textual corpora were used to train an automatic classifier for three lexical semantic classes of English verbs. We hypothesize that some lexical semantic features that are difficult to detect superficially in English may manifest themselves as easily extractable surface syntactic features in another language. Our experimental results combining English and Chinese features show that a small bilingual corpus may provide a useful alternative to using a large monolingual corpus for verb classification.
1 Introduction

Recently, a number of researchers have devised corpus-based approaches for automatically learning the lexical semantic class of verbs (e.g., (McCarthy and Korhonen, 1998; Lapata and Brew, 1999; Schulte im Walde, 2000; Merlo and Stevenson, 2001)). Automatic verb classification yields important potential benefits for the creation of lexical resources. Lexical semantic classes incorporate both syntactic and semantic information about verbs, such as the general sense of the verb (e.g., change-of-state or manner-of-motion) and the allowable mapping of verbal arguments to syntactic positions (e.g., whether an experiencer argument can appear as the subject or the object of the verb) (Levin, 1993). By automatically learning the assignment of verbs to lexical semantic classes, each verb inherits a great deal of information about its possible usage in an NLP system, without that information having to be explicitly hand-coded.

In this paper, we explore the use of multilingual corpora in the automatic learning of verb classification. We extend the work of (Merlo and Stevenson, 2001), in which statistics over simple syntactic features extracted from syntactically annotated corpora were used to train an automatic classifier for a set of sample lexical semantic classes of English verbs. This work had two potential limitations: first, only a small number (five) of syntactic features that correlate with semantic class were proposed; second, a very large corpus was needed (65M words) to extract sufficiently discriminating statistics. We address both of these issues in the current study by exploiting the use of a parallel English-Chinese corpus. Our motivating hypothesis is that some lexical semantic features that are difficult to detect superficially in English may manifest themselves as surface syntactic features in another language. If this is indeed the case, then we should be able to augment the initial set of English features with features over the translated verbs in the other language (in our case, Chinese).

Our hypothesis that a non-English verb feature set can be useful in English verb classification is inspired by SLA (Second Language Acquisition) research on learning English verbs. As the name suggests, SLA research studies how humans acquire a second language. "Transfer effects", the impact of one's native language when learning a second language (Ellis, 1997), are of particular interest to us. Recent research has shown that properties of a non-English native lexicon can influence human learning of English verb class distinctions (e.g., (Helms-Park, 1997; Inagaki, 1997; Juffs, 2000)). Carrying this idea of "transfer" over to the machine learning setting, we hypothesize that features from a second language may provide an additional source of information that complements the English features, making it possible that a smaller corpus (a bitext) can be a useful alternative to using a large monolingual corpus for verb classification.

2 The Verb Classes and English Features

Merlo and Stevenson (2001) tested their approach on the major classes of optionally intransitive verbs in English. All the classes allow the same subcategorizations (transitive and intransitive), entailing that they cannot be discriminated by subcategorization alone. Thus, successful classification demonstrates the induction of semantic information from syntactic features.

In our work, we focus on two of these classes, the change-of-state verbs, such as open, and the verbs of creation and transformation, such as perform (classes 45 and 26, respectively, from (Levin, 1993)). Both classes are optionally intransitive, but differ in the alternation between the transitive and intransitive forms. The transitive form of a change-of-state verb is a causative form of the intransitive (the door opened / the cat opened the door), while the transitive/intransitive alternates of a creation/transformation verb arise from simple object optionality (the actors performed the skit / the actors performed).

Merlo and Stevenson (2001) used 5 numeric features that encoded summary statistics over the usage of each verb across the corpus (65M words of Wall Street Journal, WSJ). The features captured subcategorization and aspectual frequencies (of transitivity, passive voice, and VBN POS tag), as well as statistics that approximated thematic properties of NP arguments (animacy and causativity) from simple syntactic indicators. We adopt these same features in our work, and augment them with Chinese features as described next.

3 Chinese Features

We selected the following Chinese features for our task, based on the properties of the change-of-state and creation/transformation classes. Each numbered item refers to a collection of related features. We describe how we expect each type of feature to vary across the two classes.

1. Chinese POS tags for Verbs: We used the CKIP (Chinese Knowledge Information Processing Group) POS-tagger to assign one of 15 verb tags to each verb. Additionally, each of these tags can be mapped into the UPenn Chinese Treebank standard (Fei Xia, email communication), which characterizes each verb as "active" or "stative".

   We note that change-of-state verbs are more likely to be adjectivized than creation/transformation verbs; furthermore, this adjectival property is not unlike the stative property in Chinese. We expect then to see the Chinese translation of the English change-of-state verbs to be more likely assigned a stative verb tag.

2. Passive Particles: The adjectival nature of change-of-state verbs may also be reflected in a higher proportion of passive use, since the adjectival use is a passive use. In Chinese, a passive construction is indicated by a passive particle preceding the main verb. For example, the passive sentence:

      This store is closed.

   can be translated as:

      Zhe4 ge4 (this) shang1 dian4 (store) bei4 (passive particle) guan1 bi4 (closed).

   We thus expect to find that translations of change-of-state verbs have a higher frequency of occurrence with a passive particle in Chinese.

3. Periphrastic (Causative) Particles: In Chinese, some causative sentences use an external (periphrastic) particle to indicate that the subject is the causal agent of the event specified by the verb. For example, one possible translation for

      I cracked an egg.

   can be

      Wo3 (I) jiang1 (made, periphrastic particle) dan4 (egg) da3 lan4 (crack).

   Since change-of-state verbs have a causative alternate, and creation/transformation verbs do not,
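As a rough illustration of how the Chinese passive-particle and periphrastic-particle features could be turned into per-verb statistics, the following minimal sketch computes the fraction of a verb's occurrences that have a given particle shortly before it. Everything here is an assumption made for illustration and is not taken from the paper: the function name `particle_ratio`, the fixed look-back `window`, the pinyin tokenization, and the toy sentences (the second sentence is constructed only to provide a non-passive occurrence).

```python
def particle_ratio(sentences, verb, particles, window=3):
    """Fraction of occurrences of `verb` that have one of `particles`
    among the `window` tokens immediately before it.

    Hypothetical helper: the window-based test is an assumed stand-in
    for "particle preceding the verb", not the paper's extraction method.
    """
    total = 0
    hits = 0
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != verb:
                continue
            total += 1
            start = max(0, i - window)
            # Look only at the tokens that precede this verb occurrence.
            if any(t in particles for t in tokens[start:i]):
                hits += 1
    return hits / total if total else 0.0


# Toy, pre-segmented sentences in pinyin (illustrative data only).
sentences = [
    ["zhe4ge4", "shang1dian4", "bei4", "guan1bi4"],  # passive particle bei4
    ["shang1dian4", "guan1bi4"],                     # no particle
    ["wo3", "jiang1", "dan4", "da3lan4"],            # periphrastic particle jiang1
]

PASSIVE = {"bei4"}
PERIPHRASTIC = {"jiang1"}

passive_ratio = particle_ratio(sentences, "guan1bi4", PASSIVE)
causative_ratio = particle_ratio(sentences, "da3lan4", PERIPHRASTIC)
print(passive_ratio, causative_ratio)
```

Under the expectations stated for these features, translations of change-of-state verbs would tend to show higher values of both ratios than translations of creation/transformation verbs; a real extraction would operate on tagged bitext rather than a fixed token window.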