Using WordNet for Building WordNets


Artificial Intelligence: Mock Exam


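Artificial Intelligence Mock Exam 1 (150 minutes)

1. Fill in the blanks (12 points, 2 points each)

1) The performance of a knowledge representation should be evaluated from the following two aspects: __________ and __________; the latter is further divided into two aspects: __________ and __________.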

2) Property inheritance in a frame system can be implemented flexibly by combining three facets of a slot. They are: __________.

3) A KB (knowledge-based) system usually consists of the following three parts: __________; development tools and environments for KB systems fall into the following three categories: __________.

4) By the basic learning strategy they use, machine learning methods can be divided into the following categories: __________.

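5) The subjective Bayes method represents an inference rule in the form P ⇒ Q; __________ is called the prior likelihood ratio, __________ the conditional likelihood ratio, and __________ the sufficiency factor of the rule.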

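6) In natural language understanding, understanding a single sentence proceeds in two stages: __________ and __________; the latter is further divided into two steps: __________ and __________.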

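2. Short-answer questions (20 points, 5 points each)

1) Describe the stepwise specialization strategy used in learning from examples, and explain the roles of positive and negative examples in the learning process.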

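2) Why should compatible matching be used in a frame system? How is it implemented?

3) Describe the conflict resolution strategy and the inference engine of Xps, and explain the role that the time tags of fact elements in the global database play in conflict resolution.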

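4) What is problem reduction? How do the operators of problem reduction differ from those of general graph search? What is the admissibility condition for the AND/OR graph heuristic search algorithm AO*?

3. Simple calculation problems (35 points, 7 points each)

1) Using the grammar rules given in Figure 8.5 of the textbook, plus two additional rules, N → football and V → play, draw the parse tree for the English sentence "The boy play little football".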

Programming and Problem Solving with C++ (Chinese edition)


Programming and Problem Solving with C++ (Chinese edition: 《C++程序设计教程》, "A Course in C++ Programming") is a textbook by the American authors Nell Dale and Chip Weems.

The main contents of the book are as follows:
1. Basic elements of programming: basic concepts such as data types, control structures, and variables, and how to use them to write programs.

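2. Functions and program structure: how to use functions to organize a program, including defining, declaring, and calling functions, and handling function parameters and return values.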

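3. Object-oriented programming: how to use classes and objects to organize a program, including defining classes, creating and using objects, and applying object-oriented techniques such as inheritance and polymorphism.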

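4. Generic programming: how to use templates to write generic programs, including defining and using function templates, and using the containers and algorithms of the Standard Template Library (STL).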

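5. Exception handling: how to use exception handling to deal with errors and exceptional conditions in a program, including throwing, catching, and handling exceptions.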

6. Files and streams: how to use files and streams to read and write data, including opening, reading, writing, and closing files.

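7. Advanced topics: an introduction to several advanced topics, including multithreaded programming, network programming, and concurrent programming.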

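Overall, the book is a comprehensive textbook on the C++ programming language, suitable both for beginners and for programmers with some experience.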

An Improved Algorithm for WordNet-Based Semantic Similarity


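Author: Tian Shan. Source: Digital Technology & Application (《数字技术与应用》), 2013, No. 08.

Abstract: With the rapid development of information technology, computing the semantic similarity of words has been widely applied and studied in many fields, including information retrieval, information extraction, word sense disambiguation, example-based machine translation, and text classification.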

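Building on related work, this paper proposes an improved WordNet-based semantic similarity algorithm that considers not only the path between nodes but also the depth and width of the nodes in the tree.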

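Keywords: WordNet; semantic distance; semantic similarity
CLC number: TP391. Document code: A. Article ID: 1007-9416(2013)08-0113-01

Semantic similarity computation is widely used in many fields, such as natural language processing, information retrieval, word sense disambiguation, text classification, and example-based machine translation.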

With the rapid development of Internet technology, semantic similarity has become an important part of information retrieval research.

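Current methods for computing semantic similarity fall roughly into two categories. The first computes similarity from world knowledge or a classification system, mainly using semantic dictionaries organized by the hierarchical relations between concepts: word similarity is computed from the hypernym/hyponym and sibling relations between concepts in such linguistic resources. The second category is statistical and mainly uses the probability distribution of contextual information as the reference for lexical semantic similarity.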

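Some existing studies compute semantic similarity from the shortest path formed by hypernym/hyponym links between word nodes; references [1-2] compute it from the maximum information content of the two words' common ancestor nodes; and references [3-5] combine several factors, such as the path length between nodes, the depth of the concept hierarchy, and the local density of the concept hierarchy.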

Many researchers outside China compute semantic similarity using WordNet's tree-like hierarchy of synonym sets (synsets).
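The three families of measures mentioned above (shortest path, depth of the hierarchy, and information content of the common ancestor) can be made concrete on a small taxonomy. This is only an illustrative sketch: the tree, the frequency counts, and all function names are hypothetical, the code assumes a single-inheritance tree (WordNet itself allows several hypernyms per synset), and it implements the standard path-length, Wu-Palmer, and Resnik measures rather than the improved algorithm this paper proposes, whose formula is not given in this excerpt.

```python
import math

# Hypothetical toy taxonomy (child -> parent). It is a single-inheritance tree,
# whereas real WordNet synsets may have several hypernyms.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "puppy": "dog",
    "car": "vehicle",
}

# Hypothetical corpus counts, already propagated upward
# (each node's count includes the counts of all its descendants).
FREQ = {"entity": 100, "animal": 40, "vehicle": 30,
        "dog": 15, "cat": 10, "puppy": 5, "car": 20}


def ancestors(node):
    """Return [node, parent, ..., root], nearest first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of both a and b (or a/b itself)."""
    ancestors_of_b = set(ancestors(b))
    for node in ancestors(a):
        if node in ancestors_of_b:
            return node
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def path_length(a, b):
    """Edges on the shortest path, which in a tree runs through the LCS."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))


def path_similarity(a, b):
    """Shortest-path measure: 1 for identical nodes, smaller as the path grows."""
    return 1.0 / (1 + path_length(a, b))


def wu_palmer(a, b):
    """Depth-aware measure: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2.0 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))


def resnik(a, b):
    """Information content of the LCS, log(1 / P(LCS)), estimated from FREQ."""
    lcs = lowest_common_subsumer(a, b)
    return math.log(FREQ["entity"] / FREQ[lcs])


print(path_similarity("puppy", "cat"))      # 1 / (1 + 3) = 0.25
print(round(wu_palmer("puppy", "cat"), 3))  # 2*2 / (4+3), about 0.571
```

In this toy tree, `puppy` and `cat` meet at `animal`, while `dog` and `car` meet only at the root, so `resnik("dog", "car")` is 0: the root occurs in every count and therefore carries no information.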

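1 Introduction to WordNet

WordNet is an English dictionary grounded in cognitive linguistics, designed jointly by psychologists, linguists, and computer engineers at Princeton University. Rather than simply listing words in alphabetical order, it organizes them into a "network" according to their meanings.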

Because it contains semantic information, WordNet differs from a dictionary in the usual sense.

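The objects WordNet describes include compound words, phrasal verbs, collocations, idioms, and single words, of which the single word is the most basic unit.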

Secondary School English Teaching Methodology: Second Guided-Study Session 4


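Lecturer: Chen Daoming (School of Foreign Studies, South China Normal University), chendm@

Study suggestions
1. Use the online courseware.
2. Attend the four guided-study sessions online, or download the session recordings through your study center (they can also be downloaded from the public mailbox I have set up for you, gdchendm@), and rewatch them.
3. Download the session PPT slides from the "Resources" area of the BBS (discussion forum) and review their content.
4. Study the relevant chapters of A Course in English Language Teaching (《英语教学法教程》).
5. Download the self-test questions from the BBS, unzip them, and work through them, making sure you understand what each question asks.
6. Visit the BBS often, ask questions, and take part in discussions.
7. Complete the online assignments on time.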

Contents of the second session
- Task-based Language Teaching
- Teaching Pronunciation
- Teaching Grammar
- Teaching Vocabulary

Task-based Language Teaching (TBLT)

Approach and method
Approach → Method 1, Method 2, ..., Method X
Communicative Approach → CLT, TBLT/TBL

What is a "task"?
According to M. H. Long (1985: 89), a task is "a piece of work undertaken for oneself or for others, freely or for some reward", e.g. painting a fence; dressing a child; filling out a form; buying a pair of shoes; making an airline reservation; borrowing a library book; taking a driving test; typing a letter; weighing a patient; sorting letters; taking a hotel reservation; writing a cheque; finding a street destination; helping someone across a road; etc.

Pedagogical tasks as defined by David Nunan (1989: 8): "... a piece of classroom work which involves learners in comprehending, manipulating, producing or interacting in the target language while their attention is principally focused on meaning rather than form."

Willis, J. (1996: 23): Tasks are activities where the target language is used by the learner for a communicative purpose (goal) in order to achieve an outcome.

Clark, Scarino and Brownell (1994: 40): four main components of a task
- A purpose: a reason for undertaking the task.
- A context: real, simulated or imaginary (location, participants, time, etc.).
- A process: using learning strategies (problem solving, reasoning, inquiring, conceptualising, communicating, etc.).
- A product: some form of outcome, visible (a written plan, a play, a letter, etc.) or invisible (enjoying a story, learning about another country, etc.).

Exercises, exercise-tasks, and tasks
- Tasks: focus on the complete act of communication.
- Exercises: focus on individual aspects of language, such as vocabulary, grammar or individual skills.
- Exercise-tasks: halfway between tasks and exercises.

A task: A dangerous moment
Student A: Have you ever been in a situation where you felt your life was in danger? Describe the situation to your partner. Tell him/her what happened. Give an account of how you felt when you were in danger and afterwards.
Student B: Listen to your partner's narration about a dangerous moment in his/her life. Draw a picture to show what happened to your partner. Show him/her your picture when you have finished it.

An exercise: Going shopping
Look at Mary's shopping list. Then look at the list of items in Abdullah's store.
Mary's shopping list: 1. oranges 2. eggs 3. flour 4. powdered milk 5. biscuits 6. jam
Abdullah's store: 1. bread 2. salt 3. apples 4. Coca Cola 5. tins of fish 6. flour 7. chocolate 8. sugar 9. curry powder 10. biscuits 11. powdered milk 12. dried beans
Work with a partner. One person is Mary and the other is Abdullah. Make conversations like this:
Mary: Good morning. Do you have any flour?
Abdullah: Yes, I do.
Or:
Mary: Good morning. Do you have any jam?
Abdullah: No, I'm sorry. I don't have any.

PPP and TBLT

Jane Willis' (1996) TBL framework: task cycle, then language focus.

Task cycle (Ss = students, T = teacher)
- Task: Ss do the task, in pairs or small groups. T monitors from a distance.
- Planning: Ss prepare to report to the whole class (orally or in writing) how they did the task and what they decided or discovered.
- Report: Some groups present their reports to the class, or exchange written reports, and compare results. Ss may then hear a recording of others doing a similar task and compare how they all did it.

Language focus
- Analysis: Ss examine and discuss specific features of the text or transcript of the recording.
- Practice: T conducts practice of new words, phrases and patterns occurring in the data, either during or after the analysis.

TBL compared with PPP
- TBL: task cycle (task, planning, report; Ss hear a task recording or read a text) → language focus (analysis and practice; review and repeat the task).
- PPP: Presentation of a single "new" item → Practice of the new item (drills, exercises, dialogue practice) → Production (activity, role play or task to encourage "free" use of language).

Teaching Pronunciation (Unit 6)
- Components of pronunciation
- The goal of teaching pronunciation
- Practising pronunciation

Components of pronunciation
1. Simple sounds
2. Stress
3. Intonation
4. Rhythm

What should we teach when teaching pronunciation?
- We should pay attention to the distinction between pronunciation and phonetics.
- The teaching of pronunciation should focus on the students' ability to identify and produce English sounds themselves.
- Students, especially young students, should NOT be led to focus on reading and writing phonetic transcriptions of words.
- Introduction to phonetic rules should be avoided at the beginning stage.
- Stress and intonation should be taught from the very beginning.

The goal of teaching pronunciation
The realistic goals:
1. Consistency (fluency): be smooth and natural.
2. Intelligibility: be understandable.
3. Communicative efficiency: convey the meaning that is intended.

Practising pronunciation
- Mechanical practice and meaningful practice
- Perception practice and production practice

Mechanical practice
- Pronunciation is difficult to teach without drills on sounds.
- However, drilling an individual sound for more than a few minutes at a time may be boring and demotivating.
- Sometimes we can make mechanical practice, i.e. drilling, more interesting and motivating, e.g. by playing games.

Meaningful practice
- It is important to combine pronunciation drills with more meaningful exercises, e.g.:
  1. A polliwog looks for his mom.
  2. A card game: What can you see?

Perception practice
Aim: to develop the ability to identify and distinguish between different sounds.
Ways of perception practice:
- Using minimal pairs: will, well; till, tell; fill, fell
- Which order? 1. bear 2. tear 3. ear
- Same or different? met, meet; well, well; well, will
- Odd man out: bit, bit, bit, pit
- Completion: ate, ate, ate, ate, ate, ...

Production practice
Aim: to develop the ability to produce sounds.
Ways of production practice:
- Listen and repeat.
- Fill in the blanks by saying words containing certain sounds (p. 55).
- Make up sentences, e.g. with last, fast, calm, dark...
- Use meaningful context, e.g. role-play the dialogue.
- Use pictures (p. 56): This is old Jack. He has a black cat...
- Use tongue twisters (p. 56): She sells seashells on the seashore. Five wives drank five bottles of fine wine.

Some essentials of teaching pronunciation
- Create a pleasant, relaxed, and dynamic classroom.
- Use gestures.
- Build up students' confidence.
- Bring variety to the classroom, e.g. British and American English.
- Use demonstration rather than explanation.
- Use visual aids.

Teaching Grammar (Unit 7)
- Grammar presentation methods
- Grammar practice

Grammar presentation methods
- The deductive method
- The inductive method

The deductive method
The deductive method relies on reasoning, analysing and comparing. It is criticized because:
- grammar is taught in an isolated way;
- little attention is paid to meaning;
- the practice is often mechanical.

Merits of the deductive method
- It can be very successful with selected and motivated students.
- It can save time when students are confronted with a grammar rule which is complex but which has to be learned.
- It may help to increase students' confidence in examinations that use accuracy as the main criterion of success.

The inductive method
- In the inductive method, the teacher induces the learners to realise grammar rules without any form of explicit explanation.
- It is believed that the rules will become evident if the students are given enough appropriate examples.
- It is believed that the inductive method is more effective because students discover the grammar rules themselves while engaged in language use.

Distinction between deduction and induction in grammar teaching
- Deductive teaching: rule → e.g., e.g., e.g.
- Inductive teaching: e.g., e.g., e.g. → rule

Usually no clear-cut distinction
In practice, the distinction between the deductive method and the inductive method is not always apparent.

Grammar practice
According to Ur, six factors contribute to successful practice:
- Pre-learning: learners benefit from clear perception and short-term memory of the new language.
- Volume and repetition: the more exposure to or production of language the learners have, the more likely they are to learn.
- Success-orientation: practice is most effective when based on successful practice.
- Heterogeneity: practice should be able to elicit different sentences and generate different levels of answers from different learners.
- Teacher assistance: the teacher should provide suggestions, hints and prompts.
- Interest: an essential feature that is closely related to concentration.

Types of grammar practice
- Mechanical practice
- Meaningful practice

Mechanical practice
Mechanical practice involves activities aimed at accuracy of form, e.g.:
- substitution drills;
- transformation drills.

Meaningful practice
In meaningful practice the focus is on the production, comprehension or exchange of meaning, though the students "keep an eye on" the way newly learned structures are used in the process. For example:
Pair work: Look at the table below. Rank the items in the left column according to the criteria listed at the top.

Item         Cheap   Healthy   Tasty   Fattening   Important
Beer
Water
Fruit
Cigarettes
Alcohol
Milk

There is no clear-cut distinction between mechanical practice and meaningful practice, e.g. a chain of events:
- If I went for a sail, there might be a storm.
- If there were a storm, my yacht would sink.
- If my yacht sank, I would die.
- If I died, my parents would cry.
- ...

Some forms of meaningful practice
- Using prompts for practice: pictures, mime or gestures, information sheets, key phrases or key words, chained phrases for storytelling.
- Using created situations for simulated communication (role-play), e.g.:
  - You are a stranger in this town. ...
  - There was a robbery yesterday in the neighbourhood. ...

Some suggestions about teaching grammar
1. Teach only those rules that are simple and typical.
2. Teach useful and important grammar points.
3. Teach grammar in context.
4. Use visible instruments such as charts, tables, diagrams, maps, drawings, and realia (plural of realis) to aid understanding.
5. Avoid difficult grammatical terminology as much as possible.
6. Allow enough opportunities for practice.
7. Live with the students' mistakes and errors.

Teaching Vocabulary (Unit 8)
- Presenting new words
- Consolidating vocabulary
- Developing vocabulary-building strategies

Presenting new words
Some suggestions:
- Provide creative examples.
- Elicit meaning from the students before telling them.
- Use related words such as synonyms, antonyms, etc. to show the meaning.
- Think about how to check students' understanding.
- Relate the new word(s) to real-life context(s).
- Predict possible misunderstanding or confusion.

Some more suggested ways
- Use pictures, diagrams and maps to show the meaning.
- Use realia (plural of realis).
- Use pantomimes or actions.
- Use lexical sets, e.g. cook, fry, boil, bake, grill, roast.
- Translate and exemplify, especially with technical or abstract words.
- Use word formation rules and common affixes, e.g. deduction, induction.

How do we teach the new words, e.g. 20 new words, in a unit of a textbook?
- Do we teach all 20 words at a time in an isolated way, i.e. without context? Or:
- Do we use context and allow the new words to occur in a natural way?

A possible way
Before reading the text:
T: We are going to read a story about Nelson Mandela, the first black president of South Africa. Which of the following words do you think may be used in the story?
prison, rights, violence, lawyer, youth, league, position, matter, fact, president; vote, accept, continue; black, equal, poor, young, wrong, worried
Make a guess.

WordNet. An electronic lexical database.


WordNet. An electronic lexical database. Edited by Christiane Fellbaum, with a preface by George Miller. Cambridge, MA: MIT Press; 1998. 422 p. $50.00

This is a landmark book. For anyone interested in language, in dictionaries and thesauri, or in natural language processing, the introduction, Chapters 1-4, and Chapter 16 are must reading. (Select other chapters according to your special interests; see the chapter-by-chapter review.) These chapters provide a thorough introduction to the preeminent electronic lexical database of today in terms of accessibility and usage in a wide range of applications. But what does that have to do with digital libraries? Natural language processing is essential for dealing efficiently with the large quantities of text now available online: fact extraction and summarization, automated indexing and text categorization, and machine translation. Another essential function is helping the user with query formulation through synonym relationships between words and hierarchical and other relationships between concepts. WordNet supports both of these functions and thus deserves careful study by the digital library community.

The introduction and Part I, which take up almost a third of the book, give a very clear and readable overview of the content, structure, and implementation of WordNet, of what is in WordNet and what is not; these chapters are meant to replace Five papers on WordNet (ftp:///pub/WordNet/5papers.ps), which are by now partially outdated. However, I did not throw out my copy of the Five papers; they give more detail and interesting discussions not found in the book.
Chapter 16 provides a very useful complement; it includes a very good overview of WordNet relations (with examples and statistics) and describes possible extensions of content and structure.

Part II, about 15% of the book, describes "extensions, enhancements, and new perspectives on WordNet", with chapters on the automatic discovery of lexical and semantic relations through analysis of text, on the inclusion of information on the syntactic patterns in which verbs occur, and on formal mathematical analysis of the WordNet structure.

Part III, about half the book, deals with representative applications of WordNet, from creating a "semantic concordance" (a text corpus in which words are tagged with their proper sense), to automated word sense disambiguation, to information retrieval, to conceptual modeling. These are good examples of purely knowledge-based approaches and of approaches where statistical processing is informed by knowledge from WordNet.

As one might expect, the papers in this collection (which are reviewed individually below) are of varying quality; Chapters 5, 12, and 16 stand out. Many of the authors pay insufficient heed to the simple principle that the reader needs to understand the overall purpose of the work being discussed in order to best assimilate the detail. Repeatedly, one finds small differences in performance discussed as if they meant something, when it is clear that they are not statistically significant (or at least it is not shown that they are), a problem that afflicts much of information retrieval and related research. The application papers demonstrate the many uses of WordNet but also make a number of suggestions for expanding the types of information included in WordNet to make it even more useful. They also uncover weaknesses in the content of WordNet by tracing performance problems to their ultimate causes.

The papers are tied together into a coherent whole, and that unity is further enhanced by the presence of an index, which is often missing from such collections. One would wish the index a bit more detailed. The reader wishing for a comprehensive bibliography on WordNet can find it at /~josephr/wn-biblio.html (and more information on WordNet can be found at /~wn).

The book deals with WordNet 1.5, while WordNet 1.6 was released at just about the same time. For the papers on applications it could not be otherwise, but the chapters describing WordNet might have been updated to include the 1.6 enhancements. One would have wished for a chapter on EuroWordNet (www.let.uva.nl/~ewn), or at least a prominent mention of it; EuroWordNet is a multilingual lexical database using WordNet's concept hierarchy.

It would have been useful to enforce common use of at least the key terms throughout the papers. For example, the introduction says "WordNet makes the commonly accepted distinction between conceptual-semantic relations, which link concepts, and lexical relations, which link individual words" [emphasis added], yet in Chapter 1 George Miller states that "the basic semantic relation in WordNet is synonymy" [emphasis added], and in Chapter 5 Marti Hearst uses the term lexical relationships to refer to both semantic relationships and lexical relationships. Priss (p. 186) confuses the issue further in the following statements: "Semantic relations, such as meronymy, synonymy, and hyponymy, are according to WordNet terminology relations that are defined among synsets. They are distinguished from lexical relations, such as antonymy, which are defined among words and not among synsets." [Emphasis added]. Synonymy is, of course, a lexical relation, and terminology relations are relations between words or between words and concepts, but not relations between concepts.

This terminological confusion is indicative of a deeper problem. The authors of WordNet vacillate somewhat between a position that stays entirely in the plane of words and deals with conceptual relationships in terms of the relationships between words, and the position, commonly adopted in information science, of separating the conceptual plane from the terminological plane.

What follows is a chapter-by-chapter review. By necessity, this often gets into a discussion of how things are done in WordNet, rather than just staying with the quality of the description per se.

Part I. The lexical database (p. 1-127)

George Miller's preface is well worth reading for its account of the history and background of WordNet. It also illustrates the lack of communication between fields concerned with language: thesaurus makers could learn much from WordNet, and WordNet could learn much from thesaurus makers. The introduction gives a concise overview of WordNet and a preview of the book. It might, however, have made the structure of WordNet more explicit: the smallest unit is the word/sense pair identified by a sense key (p. 107); word/sense pairs are linked through WordNet's basic relation, synonymy, which is expressed by grouping word/sense pairs into synonym sets or synsets; each synset represents a concept, which is often explained through a brief gloss (definition). Put differently, WordNet includes the relationship word W designates concept C, which is coded implicitly by including W in the synset for C. (This is made explicit in Table 16.1.) Two words are synonyms if they have a designates relationship to the same concept. Synsets (concepts) are the basic building blocks for hierarchies and other conceptual structures in WordNet.

The following three chapters each deal with a type of word.
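Before turning to those chapters, the structure summarized above (word/sense pairs grouped into synsets, with synonymy defined through a shared concept) can be modelled in a few lines. This is a minimal, hypothetical sketch: the class names, synset identifiers, and glosses are invented for illustration and do not reflect WordNet's actual database files or any library API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WordSense:
    """A word/sense pair: the smallest unit in the structure described above."""
    lemma: str
    sense: int


@dataclass
class Synset:
    """One concept: the word/sense pairs that designate it, plus a short gloss."""
    synset_id: str
    gloss: str
    members: set = field(default_factory=set)


class MiniWordNet:
    """Toy lexicon in which synonymy is derived from shared synset membership."""

    def __init__(self):
        self.synsets = {}     # synset_id -> Synset
        self.designates = {}  # WordSense -> synset_id ("word W designates concept C")

    def add_synset(self, synset_id, gloss, *members):
        self.synsets[synset_id] = Synset(synset_id, gloss, set(members))
        for word_sense in members:
            self.designates[word_sense] = synset_id

    def are_synonyms(self, a, b):
        """True if both word/sense pairs designate the same concept."""
        concept = self.designates.get(a)
        return concept is not None and concept == self.designates.get(b)

    def senses_of(self, lemma):
        """Sense numbers recorded for a lemma, i.e. its polysemy."""
        return sorted(ws.sense for ws in self.designates if ws.lemma == lemma)


# Hypothetical entries; the IDs and glosses are invented for illustration.
wn = MiniWordNet()
wn.add_synset("S1", "a motor vehicle with four wheels",
              WordSense("car", 1), WordSense("automobile", 1))
wn.add_synset("S2", "a wheeled vehicle that runs on rails",
              WordSense("car", 2), WordSense("railcar", 1))

print(wn.are_synonyms(WordSense("car", 1), WordSense("automobile", 1)))  # True
print(wn.are_synonyms(WordSense("car", 2), WordSense("automobile", 1)))  # False
print(wn.senses_of("car"))  # [1, 2]
```

The design makes the reviewer's point visible: synonymy is never stored as a word-to-word link; it falls out of the implicit designates mapping, and polysemy is simply one lemma appearing in several synsets.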
In Chapter 1, Nouns in WordNet, George Miller presents a cogent discussion of the relations between nouns: synonymy, a lexical relation, and hyponymy (has narrower concept) and meronymy (has part), semantic relations. Hyponymy gives rise to the WordNet hierarchy. Unfortunately, it is not easy to get a well-designed display of that hierarchy. Meronymy (has part) / holonymy (is part of) is always subject to a confusion, not completely avoided in this chapter, that stems from ignoring the fact that airplane has-part wing really means "an individual object 1 which belongs to the class airplane has-part individual object 2 which belongs to the class wing". While it is true that bird has-part wing, it is by no means the same wing or even the same type of wing. Thus the strength of a relationship established indirectly between two concepts through a has-part relationship to the same concept may vary widely. Miller sees meronymy and hyponymy as structurally similar, both going down. But it is equally reasonable to consider a wing as something more general than an airplane and thus construe the has-part relationship as going up, as is done in Chapter 13.

Chapter 2, Modifiers in WordNet, by Katherine J. Miller, describes how adjectives and adverbs are handled in WordNet. It is claimed that nothing like hyponymy/hypernymy is available for adjectives. However, especially for information retrieval purposes, the following definition makes sense: Adj A has narrower term (NT) Adj B if A applies whenever B applies and A and B are not synonyms (the scope of A includes the scope of B and more). For example, red NT ruby (and conversely, ruby BT red) or large NT huge. To the extent that WordNet includes specific color values, their hierarchy is expressed only for the noun form of the color names. Hierarchical relationships between adjectives are expressed as similarity. The treatment of antonymy is somewhat tortured to take account of the fact that large vs. small and big vs. little are antonym pairs, but *large vs. little or *prompt vs. slow are not. In general, when speakers express semantic opposition, they do not use just any term for each of the opposite concepts; they apply lexical selection rules. The simplest way to deal with this problem is to establish an explicit semantic relationship of opposition between concepts and separately record antonymy as a lexical relationship between words (in their appropriate senses). Participial adjectives are maintained in a separate file and have a pointer to the verb, unless they fit easily into the cluster structure of descriptive adjectives, in which case they are placed in the cluster "rather than" having a pointer to the verb. Why a participial adjective in the cluster structure cannot also have a pointer to the verb is not explained.

In Chapter 3, A semantic network of English verbs, Christiane Fellbaum presents a carefully reasoned analysis of the semantic relations between verbs (as presented in WordNet and in alternative approaches) and of the other information given for verbs in WordNet, especially sentence structures. Much of this discussion is based on results from psycholinguistics. The relationships between verbs are all forms of a broadly defined entailment relationship: troponymy, which establishes a verb hierarchy (verb A has the troponym verb B if B is a particular way of doing A, corresponding to hyponymy between nouns; the reverse relation is verb B has hypernym verb A); entailment in a more specific sense (to snore entails to sleep); cause; and backward presupposition (one must know before one can forget; not given in WordNet). More refined relationship types could be defined but are not used in WordNet, since they do not seem to be psychologically salient and would complicate the interface. The last is not a good reason, since detailed relationship types could be coded internally and mapped to a more general set to keep the interface simple; the additional information would be useful for specific applications.
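The noun hierarchy that hyponymy produces in Chapter 1 (and the parallel troponymy hierarchy for verbs in Chapter 3) is, computationally, a directed acyclic graph of hypernym links, and displaying it well is a recurring complaint in this review. The sketch below, with a hypothetical concept list and hypothetical function names, computes the transitive closure of "has hypernym" and prints the kind of simple indented outline the reviewer asks for. It assumes the links contain no cycles and is not WordNet's own search software.

```python
from collections import deque

# Hypothetical hypernym links between concepts (hyponym -> list of hypernyms).
# A list is used because a WordNet synset may have more than one hypernym.
HYPERNYMS = {
    "robin": ["bird"],
    "bird": ["animal"],
    "animal": ["entity"],
    "airplane": ["aircraft"],
    "aircraft": ["vehicle"],
    "vehicle": ["entity"],
}


def all_hypernyms(concept):
    """Transitive closure of 'has hypernym', nearest concepts first (breadth-first)."""
    seen, order = set(), []
    queue = deque(HYPERNYMS.get(concept, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            order.append(node)
            queue.extend(HYPERNYMS.get(node, []))
    return order


def is_hyponym_of(a, b):
    """True if b lies anywhere above a in the hierarchy."""
    return b in all_hypernyms(a)


def hyponyms(concept):
    """Direct hyponyms, found by inverting the hypernym links."""
    return sorted(child for child, parents in HYPERNYMS.items() if concept in parents)


def outline(root, level=0):
    """Indented outline of the hierarchy below root (assumes no cycles)."""
    lines = ["  " * level + root]
    for child in hyponyms(root):
        lines.extend(outline(child, level + 1))
    return lines


print("\n".join(outline("entity")))
```

Storing only one direction (hypernyms) and deriving the other (hyponyms) by inversion is a deliberate choice here: it keeps the two directions reciprocal by construction, so they cannot drift out of sync.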
In Section 3.3.1.3 on basic level categories in verbs, the level numbers are garbled (talk is both level L+1 and level L), making the argument unclear.

In Chapter 4, Design and implementation of the WordNet lexical database and searching software, Randee I. Tengi lays out the technical issues. While it is hard to argue with a successful system, one might observe that this is not the most user-friendly system this reviewer has seen. The lexicographers have to memorize a set of special characters that encode relationship types (for example, ~ for hyponym, @ for hypernym); {apple, edible_fruit, @} means: apple has the hypernym edible_fruit. The interface is on a par with most online thesaurus interfaces, that is to say, poor. In particular, one cannot see a simple outline of the concept hierarchy; for many thesauri such an outline is available in printed form. P. 108 speaks of "many relations being reflexive" when it should say reciprocal (as with hyponym / hypernym); a reflexive relation has a precise definition in mathematics (for all a, a R a holds).

Chapter 16, Knowledge processing on an extended WordNet, by Sanda M. Harabagiu and Dan I. Moldovan, is discussed here because it should be read here. Table 16.1 gives the best overview of the relations in WordNet, with a formal specification of the entities linked, examples, properties (symmetry, transitivity, reverse of), and statistics. Table 16.2 gives further types of relationships that can be derived from the glosses (definitions) provided in WordNet. Still further relationships can be derived by inferences on relationship chains. The application of this extended WordNet to knowledge processing (for example, marker propagation, testing a text for coherence) is less well explained.

Part II. Extensions, enhancements, and new perspectives on WordNet, p. 129-196

Chapter 5, Automated discovery of WordNet relations, by Marti A. Hearst, is a well-written account of using Lexico-Syntactic Pattern Extraction (LSPE) to mine text for hyponym relationships. This is important for reducing the enormous workload of compiling, purely by intellectual effort, the large lexical databases needed for NLP and retrieval tasks. The following examples illustrate two sample patterns (the implied hyponym relationships should be obvious):

red algae, such as Gelidium
bruises, broken bones, or other injuries

Patterns can be hand-coded and/or identified automatically. The results presented are promising. In the discussion of automatic acquisition from corpora (p. 146, under Related work), only fairly recent work from computational linguistics is cited. Resnik, in Chapter 10, cites work back to 1957. There is also early work done in the context of information retrieval, for example, Giuliano 1965; for more references on early work see Soergel 1974, Chapter H.

Chapter 6, Representing verb alternations in WordNet, by Karen T. Kohl, Douglas A. Jones, Robert C. Berwick, and Naoyuki Nomura, discusses a format for representing syntactic patterns of verbs in WordNet. It is clearly written for linguists and a heavy read for others. It is best to look at the appendix first to get an idea of the ultimate purpose.

Chapter 7, The formalization of WordNet by methods of relational concept analysis, by Uta E. Priss, is something of an overkill in formalization. Formal concept analysis might be a useful method, but the starkly abbreviated presentation given here is not enough to really understand it. Once one penetrates the formalism and the notation, the paper does reveal a proper analysis of meronymy. The WordNet problems identified on p. 191-195 are quite obvious to the experienced thesaurus builder once the hierarchy is presented in a clear format, and could in any event be detected automatically by simple componential analysis.

Part III. Applications of WordNet, p. 197-406

Chapter 8, Building semantic concordances, by Shari Landes, Claudia Leacock, and Randee I. Tengi, and Chapter 9, Performance and confidence in a semantic annotation task, by Christiane Fellbaum, Joachim Grabowski, and Shari Landes, both deal with manually tagging each word in a corpus (in the example, 103 passages from the Brown corpus and the complete text of Crane's The red badge of courage) with the appropriate WordNet sense. This is useful for many purposes: detecting new senses, obtaining statistics on sense occurrence, testing the effectiveness of correct disambiguation for retrieval or clustering of documents, and training disambiguation algorithms. The tagged corpus is available with WordNet.

Chapter 10, WordNet and class-based probabilities, by Philip Resnik, presents an interesting and discerning discourse on "how does one work with corpus-based statistical methods in the context of a taxonomy?" or, put differently, how does one exploit the knowledge inherent in a taxonomy to make statistical approaches to language analysis and processing (including retrieval) more effective. However, the real usefulness of the statistical approach does not become clear until Section 10.3, where the approach is applied to the study of selectional preferences of verbs (another example of mining semantic and lexical information from text), and thus the reader lacks the motivation to work through the sophisticated statistical modeling. The reader deeply interested in these matters might prefer to turn to the fuller version in Resnik's thesis (access from /~Resnik).

Chapters 11, Combining local context and WordNet similarity for word sense identification, by Claudia Leacock and Martin Chodorow, and 13, Lexical chains as representations of context for the detection and correction of malapropisms, by Graeme Hirst and David St-Onge, present methods for sense disambiguation.
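Chapter 11's title refers to WordNet similarity. One path-based measure commonly attributed to Leacock and Chodorow scales the shortest IS-A path between two concepts by the depth of the taxonomy: sim(a, b) = -log(len(a, b) / 2D). The review itself does not give this formula, so treat it as an assumption. The sketch below computes the measure over a tiny hypothetical taxonomy. It counts path length in nodes, so a concept compared with itself has length 1. It also assumes a tree (one parent per node), whereas WordNet allows multiple hypernyms. The names `Taxonomy`, `path_length_nodes`, and `lch_similarity` are illustrative only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy IS-A tree: child -> parent (hypernym); the root maps to "".
// Hypothetical data for illustration, not real WordNet content.
using Taxonomy = std::map<std::string, std::string>;

// Nodes from `node` up to the root, starting with `node` itself (assumes no cycles).
std::vector<std::string> ancestors(const Taxonomy& t, std::string node) {
    std::vector<std::string> path;
    while (!node.empty()) {
        path.push_back(node);
        auto it = t.find(node);
        node = (it == t.end()) ? "" : it->second;
    }
    return path;
}

// Path through the lowest common ancestor, counted in nodes; -1 if none is shared.
int path_length_nodes(const Taxonomy& t, const std::string& a, const std::string& b) {
    const auto pa = ancestors(t, a);
    const auto pb = ancestors(t, b);
    for (std::size_t i = 0; i < pa.size(); ++i)
        for (std::size_t j = 0; j < pb.size(); ++j)
            if (pa[i] == pb[j]) return static_cast<int>(i + j + 1);
    return -1;
}

// Depth D of the taxonomy: the longest node-to-root path, in nodes.
int max_depth(const Taxonomy& t) {
    int d = 0;
    for (const auto& kv : t)
        d = std::max(d, static_cast<int>(ancestors(t, kv.first).size()));
    return d;
}

// -log(len / 2D): larger means more similar; 0 when the concepts are unconnected.
double lch_similarity(const Taxonomy& t, const std::string& a, const std::string& b) {
    const int len = path_length_nodes(t, a, b);
    if (len < 0) return 0.0;
    return -std::log(static_cast<double>(len) / (2.0 * max_depth(t)));
}
```

Consider a toy tree in which car and bicycle sit under vehicle, vehicle under artifact, person under organism, and both artifact and organism under entity. Here D = 4. The pair car/bicycle has a path of 3 nodes, giving log(8/3) ≈ 0.98. The pair car/person has a path of 6 nodes, giving log(4/3) ≈ 0.29. Scaling by depth lets scores from taxonomies of different heights be compared.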
Chapter 11 uses a statistical approach. From a sense-tagged corpus (the training corpus) it computes three distributions that measure the associations of a word in a given sense with part-of-speech tags, open-class words, and closed-class words. These distributions are then used to predict the sense of a word in arbitrary text. The key idea of the paper is very close to the basic idea of Chapter 10: exploit knowledge to improve statistical methods. In this case, knowledge of word sense similarity gleaned from WordNet extends the usefulness of the training corpus: the distributions for a sense A1 of word A are used to disambiguate a word B, one of whose senses is similar to A1. The results of the experiments are inconclusive (they lack statistical significance), but the idea seems definitely worth pursuing.

Hirst and St-Onge take an entirely different approach. Using semantic relations from WordNet, they build what they call lexical chains (semantic chain would be a better term) of words that occur in the text. Initially there are many competing chains, corresponding to different senses of the words occurring in the text, but eventually some chains grow long and others do not, and the word senses in the long chains are selected. (Instead of chains one might use semantic networks.) Various patterns of direct and indirect relationships are defined as allowable chain links, and each pattern has a given strength. In the paper, this method is used to detect and correct malapropisms, defined here as misspellings that are valid words (and therefore not detected by a spell checker), for example, “Much of the data is available toady electronically” or “Lexical relations very in number within the text”. (Incidentally, this sense of malapropism is not found in standard dictionaries, including WordNet; in a book on WordNet, one should use language carefully.)
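The malapropism check can be illustrated with a deliberately simplified sketch. It is not Hirst and St-Onge's algorithm, which scores WordNet relation paths of different strengths. Instead, a hypothetical hand-written relatedness table stands in for WordNet. A word is replaced by a spelling variant only when the variant is related to strictly more of the surrounding words than the word actually written. The function names `related`, `support`, and `best_form` are illustrative.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical relatedness table standing in for WordNet relation paths.
using Relations = std::map<std::string, std::set<std::string>>;

// Symmetric lookup: a and b are related if either entry lists the other.
bool related(const Relations& rel, const std::string& a, const std::string& b) {
    auto ia = rel.find(a);
    if (ia != rel.end() && ia->second.count(b) > 0) return true;
    auto ib = rel.find(b);
    return ib != rel.end() && ib->second.count(a) > 0;
}

// Number of context words related to w (the context should exclude the checked word).
int support(const Relations& rel, const std::string& w,
            const std::vector<std::string>& context) {
    int n = 0;
    for (const auto& c : context)
        if (related(rel, w, c)) ++n;
    return n;
}

// Keep the observed word unless a spelling variant fits the context strictly better.
std::string best_form(const Relations& rel, const std::string& observed,
                      const std::vector<std::string>& variants,
                      const std::vector<std::string>& context) {
    std::string best = observed;
    int best_score = support(rel, observed, context);
    for (const auto& v : variants) {
        const int s = support(rel, v, context);
        if (s > best_score) { best = v; best_score = s; }
    }
    return best;
}
```

Suppose the toy table relates today to available and toady to flatterer. Then `best_form(rel, "toady", {"today"}, {"much", "data", "available", "electronically"})` returns "today". With the context {"the", "flatterer"}, "toady" is kept. Requiring a strict improvement keeps the written word whenever the evidence is tied.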
The problem of detecting misspellings that are themselves words is the same as the homonym problem. Suppose one accepts both today and toady as variants of the word today and also of the word toady. Then [today, toady] becomes a homonym having all the senses of the two words that are misspellings of each other; that is to say, the occurrence of either word in the text may indicate any of the senses of both words. Finding the sense that fits the context selects the form correctly associated with that sense. This is an ingenious application of sense disambiguation methods to spell checking.

Chapter 14, Temporal indexing through lexical chaining, by Reem Al-Halimi and Rick Kazman, uses a very similar approach for indexing text, an idea that seems quite good (as opposed to the writing). The texts in question are transcripts of video conferences, but that is tangential, and their temporal nature is not considered in the paper. Instead of chains they build lexical trees (again, semantic tree would be a better name; and why not use more general semantic networks?). A text is represented by one or more trees, presumably each tree representing a logical unit of the text. Retrieval is then based on these trees as target objects. A key point of this method, not stressed in the paper, is the word sense disambiguation that comes about in building the semantic trees. The method appears promising, but no retrieval test results are reported.

In Chapter 12, Using WordNet for text retrieval, Ellen M. Voorhees reports on retrieval experiments using WordNet for query term expansion and word sense disambiguation. No conclusions as to the usefulness of these methods should be drawn from the results. The algorithms for query term expansion and for word sense disambiguation performed poorly, partly because of problems in the algorithms themselves and partly because of problems in WordNet, so it is not surprising that the retrieval results were poor.
Later work on improved algorithms, including work by Voorhees, is cited.

In Chapter 15, COLOR-X: Using knowledge from WordNet for conceptual modeling [in software engineering], J. F. M. Burg and R. P. van de Riet describe the use of WordNet to clarify word senses in software requirement documents and to detect semantic relationships between entity types and relationship types in an E-R model. The case they make is not very convincing. They also bring the art of making acronym soup to new heights, misleading the reader in the process (the chapter has nothing to do with colors).

All in all, this is a useful collection of papers and a rich source of ideas.

References

Giuliano, Vincent E. 1965. The interpretation of word associations. In: Stevens, Mary E.; Giuliano, Vincent E.; Heilprin, Lawrence B., eds. Statistical association methods for mechanical documentation. Washington, DC: Government Printing Office; 1965 December. 261 p. (National Bureau of Standards Miscellaneous Publication 269)

Soergel, Dagobert. 1974. Indexing languages and thesauri: Construction and maintenance. Los Angeles, CA: Melville; 1974. 632 p., 72 fig., ca 850 ref. (Wiley Information Science Series)

C++ 04737, Chapter 9 End-of-Chapter Exercises: Complete Answers


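Chapter 9
I. Multiple choice
1. B; file operations require including the header <fstream>.
2. A; textbook p. 194.
3. B; textbook p. 196.
4. B; textbook p. 203.
5. D; textbook p. 18.
6. D; the keyword virtual can be used to declare a virtual base class.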

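II. Fill in the blanks
1. The output data is right-aligned within the output field (textbook p. 196).
2. cin.ignore(3);
3. ofstream fout("Text.txt"); (key point)

III. Program analysis (the programs can be copied directly into VC++ 6.0 and run)
1. Analyze the output of the following program.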

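#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    cout << oct << 15 << " ";   // octal: prints 17
    cout << hex << 15 << " ";   // hexadecimal: prints f; hex stays in effect afterwards
    // The output field is 10 characters wide; positions not used by the data are filled with 'a'.
    // Because hex is still active, 256 prints as 100 (3 characters), so 7 'a's are added.
    cout << setfill('a') << setw(10);
    cout << 256 << " OK" << endl;   // setw applies only to this next item
    return 0;
}

The output is:
17 f aaaaaaa100 OK

2. Analyze the function of the following program.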

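#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    // setfill() is not used here, so each field is padded with the default fill character (a space).
    for (int i = 0; i < 10; i++)
        cout << endl << setw(10 - i) << '*' << setw(10) << '*';
    return 0;
}

Function and output: the program prints two parallel diagonal lines of asterisks that slant down to the left. Row i (i = 0 to 9) consists of 9 - i spaces, an asterisk, 9 spaces, and a second asterisk. Because endl is written before each row, the output begins with an empty line.

IV. Program completion (unless otherwise noted, the programs can be copied directly into VC++ 6.0 and run)
1. Complete the following main program so that its output is: -2.589000e+001 +2.589000e+001. The main program is:
#include <iostream>
#include <iomanip>
void main(){
Refer to textbook p. 197; this uses // scientific notation.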
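Since the original main() is cut off, here is one possible completion. It is a hedged sketch, not the textbook's official answer. It relies only on the standard std::scientific and std::showpos manipulators. The helper name `format_pair` is hypothetical; it returns the text instead of printing it so the result can be checked. The expected output in the exercise shows three exponent digits (e+001), as VC++ 6.0 prints them, whereas a conforming modern library prints at least two (e+01).

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: writes -x and then +x in scientific notation, using the
// default precision of 6 digits after the decimal point.
std::string format_pair(double x) {
    std::ostringstream out;
    out << std::scientific << -x;      // e.g. -2.589000e+01
    out << ' ' << std::showpos << x;   // showpos forces the leading '+'
    return out.str();
}
```

In the exercise itself the same effect comes from `cout << scientific << -25.89 << ' ' << showpos << 25.89 << endl;` (with `using namespace std;`). Both manipulators are sticky, so the second one must come after the negative value has been written.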

Introduction to WordNet


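(VI) WordNet as a dictionary: like a traditional dictionary, WordNet gives definitions and example sentences for its synonym sets (synsets).

Each synonym set contains a definition of the synonyms in it.

Each word in a synonym set is given its own example sentences, which distinguish it from the other words in the set.

(VII) Relations in WordNet: different syntactic categories have different types of semantic relations. For example, nouns and verbs both organize the semantic relations between words hierarchically. Among nouns, however, the hierarchical relation is hyponymy, whereas among verbs it is troponymy. The entailment relation among verbs is somewhat similar to the meronymy (part-whole) relation among nouns.

Noun meronymy is further divided into three subtypes: part, member, and substance meronymy (see the section "Nouns in WordNet").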
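The pointer relations described above come in reciprocal pairs. If A has hypernym B, then B has hyponym A, and a part meronym is matched by a part holonym. A minimal sketch records the inverse pointer automatically whenever a link is added. It assumes a hypothetical in-memory store, not WordNet's actual file format. Only part meronymy is modeled; member and substance meronymy would be added the same way.

```cpp
#include <set>
#include <string>
#include <tuple>

// Simplified, hypothetical pointer types (WordNet's source files use symbols such as
// "@" for hypernym and "~" for hyponym).
enum class Rel { Hypernym, Hyponym, PartMeronym, PartHolonym };

// The reciprocal (inverse) of each relation type.
Rel inverse(Rel r) {
    switch (r) {
        case Rel::Hypernym:    return Rel::Hyponym;
        case Rel::Hyponym:     return Rel::Hypernym;
        case Rel::PartMeronym: return Rel::PartHolonym;
        case Rel::PartHolonym: return Rel::PartMeronym;
    }
    return r;  // not reached: every enumerator is handled above
}

class RelationStore {
public:
    // Adding a -r-> b also stores the reciprocal pointer b -inverse(r)-> a.
    void add(const std::string& a, Rel r, const std::string& b) {
        links_.insert(std::make_tuple(a, r, b));
        links_.insert(std::make_tuple(b, inverse(r), a));
    }
    bool has(const std::string& a, Rel r, const std::string& b) const {
        return links_.count(std::make_tuple(a, r, b)) > 0;
    }
private:
    std::set<std::tuple<std::string, Rel, std::string>> links_;
};
```

After `add("apple", Rel::Hypernym, "edible_fruit")`, the store also answers `has("edible_fruit", Rel::Hyponym, "apple")` with true. Reciprocal is not the same as reflexive: no concept is stored as its own hypernym.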

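(VIII) The tennis problem: WordNet describes the various types of semantic relations between words and concepts in terms of synonymy and antonymy (opposition).

WordNet does not set out to describe the semantics of words and concepts at the level of text and discourse. It therefore contains no relations indicating which concepts belong together within a particular discourse topic.

For example, WordNet does not link racquet, ball, and net in any way.

In a personal letter, Roger Chaffin called this kind of problem the "tennis problem": how to link racquet, ball, net, and court game, or physician and hospital.

This is a challenge for electronic dictionaries.

Some research has already explored how to derive topical information from the semantic relations between words and concepts contained in WordNet.

Hirst and St-Onge describe an application of what they call a "lexical chain".

A lexical chain is a sequence of nouns in a context, linked by semantic relations between the nouns.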
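A greedy version of chain building can be sketched as follows. This is a simplification under stated assumptions, not Hirst and St-Onge's procedure. Relatedness comes from a hypothetical hand-written table that stands in for WordNet relations such as synonymy, hypernymy, or meronymy. Each noun joins the first chain that contains a related noun, and there are no relation strengths or distance limits.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical relatedness table standing in for WordNet relations.
using Related = std::map<std::string, std::set<std::string>>;
using Chain = std::vector<std::string>;

// Two nouns are linked if they are identical or either table entry lists the other.
bool linked(const Related& rel, const std::string& a, const std::string& b) {
    if (a == b) return true;  // a repeated noun always continues its chain
    auto ia = rel.find(a);
    if (ia != rel.end() && ia->second.count(b) > 0) return true;
    auto ib = rel.find(b);
    return ib != rel.end() && ib->second.count(a) > 0;
}

// Greedy chaining in text order: append each noun to the first chain with a linked
// member; otherwise start a new chain.
std::vector<Chain> build_chains(const Related& rel, const std::vector<std::string>& nouns) {
    std::vector<Chain> chains;
    for (const auto& noun : nouns) {
        bool placed = false;
        for (auto& chain : chains) {
            bool fits = false;
            for (const auto& member : chain)
                if (linked(rel, noun, member)) { fits = true; break; }
            if (fits) { chain.push_back(noun); placed = true; break; }
        }
        if (!placed) chains.push_back(Chain{noun});
    }
    return chains;
}
```

Take the toy table {car: vehicle, wheel; doctor: physician} and the nouns car, doctor, wheel, physician, vehicle. The sketch yields two chains, [car, wheel, vehicle] and [doctor, physician]. In a fuller version, the longer chain would be taken as evidence for the senses of its members.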

People's Education Press Grade 9 English Test, First Semester (with Answers)


Section: Overview of Principles

I. Multiple choice
1. Which of the following words is a derivative of "invent"? A. find B. venture C. prevent D. inventor
2. Which sentence uses the correct form of the verb? A. He don't like playing football. B. She doesn't can speak French. C. We doesn't go to school on Sundays. D. They don't have any money.
3. What is the plural form of "child"? A. childs B. childes C. children D. childs'
4. Which word is an antonym of "expensive"? A. cheap B. costly C. affordable D. valuable
5. Which sentence uses the correct form of the past tense? A. I seen a movie last night. B. She went to the store yesterday. C. He eat breakfast this morning. D. We done our homework already.

II. True or false
1. "Cook" and "book" rhyme with each other. ( )
2. "Write" is a transitive verb. ( )
3. An adjective describes a noun. ( )
4. The word "cat" is a noun. ( )
5. "Run" is the past tense of "running". ( )

III. Fill in the blanks
1. I _______ (be) tired.
2. She _______ (have) three brothers.
3. We _______ (go) to the park last weekend.
4. They _______ (watch) a movie last night.
5. He _______ (do) his homework every day.

IV. Short answer
1. What is the difference between "affect" and "effect"?
2. What is a conjunction?
3. What is the past participle of "go"?
4. What is the opposite of "hot"?
5. What is a reflexive pronoun?

V. Application
1. Write a sentence using the word "amazing".
2. Write a sentence using the word "quickly".
3. Write a sentence using the word "beautiful".
4. Write a sentence using the word "exciting".
5. Write a sentence using the word "difficult".

VI. Analysis
1. Analyze the following sentence: "The sun sets in the west."
2. Analyze the following sentence: "She can play the piano."

VII. Practical exercises
1. Translate the following sentence into English: "El gato come pescado."
2. Translate the following sentence into English: "Ellos van al cine."

VIII. Design tasks
1. Design a poster for an English club at your school. Include the club's name, meeting times, and activities. (2 points)
2. Create a schedule for a one-day English festival. Include different events such as storytelling, poetry reading, and a spelling bee. (2 points)
3. Plan a lesson about the differences between American and British English. Include vocabulary, pronunciation, and grammar points. (2 points)
4. Design a board game that helps students practice irregular verbs. Include instructions and a sample board layout. (2 points)
5. Create a worksheet for practicing subject-verb agreement. Include various sentences with different subjects and verbs. (2 points)

IX. Concept explanation
1. Explain the difference between a simile and a metaphor. (2 points)
2. Define the term "homophone" and provide two examples. (2 points)
3. Explain the difference between a coordinating conjunction and a subordinating conjunction. (2 points)
4. Define the term "past perfect tense" and provide an example sentence. (2 points)
5. Explain the difference between "affect" and "effect". (2 points)

X. Discussion
1. How can learning a second language benefit students in their future careers? (2 points)
2. Why is it important to study grammar in English class? (2 points)
3. How can technology be used to enhance English language learning? (2 points)
4. What are some strategies for improving vocabulary skills? (2 points)
5. How can students practice their English skills outside of the classroom? (2 points)

XI. Extension
1. Research a famous English author and write a brief biography about their life and works. (3 points)
2. Explore the history of the English language and write about how it has evolved over time. (3 points)
3. Investigate the influence of social media on the English language and provide examples of new vocabulary or grammar usage. (3 points)
4. Write about the importance of cultural awareness in language learning and provide examples of cultural differences in English-speaking countries. (3 points)
5. Research a current event in an English-speaking country and write a summary of the event, including its impact on the local community or the world. (3 points)

Answer key and summary of knowledge points:

I. Multiple choice: 1. D 2. D 3. C 4. A 5. B
II. True or false: 1. √ 2. √ 3. √ 4. √ 5. ×
III. Fill in the blanks: 1. am 2. has 3. went 4. watched 5. does
IV. Short answer:
1. "Affect" is a verb that means to influence or make a difference to something, while "effect" is a noun that refers to a result or outcome of an action.
2. A conjunction is a word that connects words, phrases, clauses, or sentences.
3. The past participle of "go" is "gone".
4. The opposite of "hot" is "cold".
5. Reflexive pronouns are pronouns that refer back to the subject of a sentence, such as "myself", "yourself", "himself", "herself", "itself", "ourselves", "yourselves", and "themselves".
V. Application:
1. I saw an amazing performance at the theater last night.
2. She ran quickly to catch the bus.
3. The sunset was beautiful.
4. The game was exciting.
5. The math test was difficult.
VI. Analysis:
1. The sentence "The sun sets in the west" is a statement of fact. It describes a natural occurrence that happens every day.
2. The sentence "She can play the piano" is a statement about someone's ability. It indicates that the person has the skill to play the piano.
VII. Practical exercises:
1. The cat eats fish.
2. They go to the movies.

Summary of knowledge points and detailed explanation of what each question type tests:
I. Multiple choice: mainly tests students' understanding and memory of English vocabulary.

wordnet


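WordNet: a conceptual knowledge base

WordNet is an English lexical-semantic knowledge base, or conceptual knowledge base, developed at Princeton University in the United States.

This wiki covers only the noun and verb concepts in WordNet and the main relations between them.

Readers interested in adjective and adverb concepts can consult the WordNet manual or related papers.

WordNet has been under development for nearly twenty years; the current version is 3.0, and FreeBSD has a port of it.

WordNet was originally developed by a group of psychologists at Princeton University.

Later, driven by the needs of computational linguistics (natural language processing), WordNet became one of the most authoritative knowledge bases for semantics research.

Concepts are expressed and constructed through natural language. Differences in culture and history can lead to differences in concepts, so the lexical semantics of different languages do not correspond one to one.

For example, Chinese distinguishes 叔叔 (father's younger brother), 伯父 (father's elder brother), 姨夫 (husband of mother's sister), and 舅舅 (mother's brother), whereas English has only uncle.

Even so, human concepts are shared to a large extent, and such small differences can be ignored.

Add to this the role of English as a global lingua franca, and this is probably why WordNet has become popular worldwide.

Concepts in WordNet

In WordNet, a "concept" is abstracted as a synonym set (synset). The synset is WordNet's basic unit and the basic object that WordNet describes.

For example, "computer" has two senses: "computing machine" and "a person who computes".

IOU@~$ wn "computer" -synsn
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun computer

2 senses of computer

Sense 1
computer, computing machine, computing device, data processor, electronic computer, information processing system
       => machine

Sense 2
calculator, reckoner, figurer, estimator, computer
       => expert

Hypernymy and hyponymy of concepts

Between noun and verb concepts (that is, synsets) there are two basic relations: hypernymy and hyponymy.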
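The `wn` output above can be mirrored with a very small data structure. In this sketch, each synset stores its member words and the head word of its direct hypernym. This is a hypothetical in-memory representation, not the WordNet database format. Looking up a word returns one direct hypernym per sense, in stored order. The names `Synset` and `direct_hypernyms` are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A minimal synset record (simplified: one hypernym, stored as a plain word).
struct Synset {
    std::vector<std::string> words;  // member lemmas, as listed by wn
    std::string hypernym;            // head word of the direct hypernym
};

// Direct hypernym of every synset that contains `word`, in stored (sense) order.
std::vector<std::string> direct_hypernyms(const std::vector<Synset>& db, const std::string& word) {
    std::vector<std::string> result;
    for (const auto& s : db)
        if (std::find(s.words.begin(), s.words.end(), word) != s.words.end())
            result.push_back(s.hypernym);
    return result;
}
```

Loading the two senses shown above makes `direct_hypernyms(db, "computer")` return {"machine", "expert"}, and `direct_hypernyms(db, "reckoner")` return {"expert"}. The real database links synsets by offset rather than by word, so a hypernym there is a whole synset, not a single string.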

paragraphformat: Mixing Chinese and English in Paragraphs


In the realm of written communication, the practice of mixing English and Chinese in paragraph formatting presents a unique set of challenges and opportunities. This linguistic blend, often seen in international business documents, academic papers, and even personal correspondence, reflects the increasingly globalized nature of our world. However, it also demands deft handling of language and format to ensure clarity and coherence.

One of the primary challenges in mixing English and Chinese in paragraph formatting is maintaining a consistent flow of ideas. The two languages have distinct syntactic structures and word orders, which can disrupt the reading experience if not handled carefully. For instance, while English generally favors a subject-verb-object order, Chinese often employs a topic-comment structure. This difference can lead to disjointed paragraphs if not properly reconciled.

Another challenge lies in punctuation. English and Chinese punctuation marks follow different conventions. For example, the English comma separates both clauses and the items in a list. Chinese, by contrast, uses the full-width comma (,) for pauses within a sentence and a separate enumeration comma (、) for list items, and its punctuation marks are full-width. Mixing these conventions can lead to confusion and misinterpretation.

To overcome these challenges, it is essential to have a solid understanding of both languages and their respective formatting rules. This includes being aware of the differences in syntax, word order, and punctuation, as well as how these differences can affect the flow and clarity of a mixed-language paragraph.

One strategy for effective paragraph formatting is to establish a clear structure for the text. This can be done by using headings, subheadings, and bullet points to organize the content and make it easier to follow. Additionally, transition words and phrases can help bridge the gaps between English and Chinese sentences, ensuring a smooth transition from one language to the other.

Another strategy is to be mindful of the audience and context. The language and format should be tailored to the reader's background and expectations. For instance, in a business document aimed at a primarily English-speaking audience, it may be appropriate to use English as the primary language and to incorporate Chinese terms or phrases as needed for clarity or cultural sensitivity. Conversely, in a document intended for a Chinese audience, the reverse may be true.

Overall, mixing English and Chinese in paragraph formatting requires a careful balancing act. It involves understanding the strengths and limitations of both languages, as well as the challenges posed by their differences. Useful strategies include establishing a clear structure, using transition words, and tailoring the language and format to the audience and context. With them, it is possible to create coherent and effective mixed-language paragraphs that convey ideas and information clearly and accurately.

Word roots


Word Root
Aero (air/aircraft): aerogram (radiogram)
Agri (field/farm): agriculture
Ante (before): antebellum (prewar)
Anthropo (man/human): anthropology (the study of humankind)
Anti (against): antifreeze (a substance that prevents freezing)
Aqua (water): aquatic (living in water)
Arch (most important): archenemy (chief enemy)
Astro (space, star): astronaut, astronomy
Audio (hear/sound): audiology (the study of hearing)
Auto (self): automobile (car)
Bene (good): benefit
Biblio (books): bibliography (list of references)
Bio (life): biology
By (on the side/less important): bystander (onlooker)
Cardio (heart): cardiac
Cede (go): precede
Centi (hundred): centimeter
Chromo (color): chromatic (colorful)
Chrono (time): chronology (the ordering of events in time)
Circum (around): circumference (perimeter of a circle)
Co (together with): cooperate
Contra (against): contrast (comparison)
Counter (against): counterforce (opposing force)
Crypto (secret): cryptography (the study of codes)
Cyber (Internet): cybercafé (internet café)
De (lower): decrease
Demi (half/partly): demigod (minor deity)
Demos (people): democracy
Derma (skin): dermatology (the study of skin)
Dis (apart): dismiss
Dyna (power): dynamic (energetic, changing)
Dys (bad/wrong): dysfunctional (not working properly)
Eco (environment): ecology
Ecto (outside): ectoparasite (a parasite living on the outside of its host)
Electro (electric): electronic
En (put into a condition): endanger
Endo (inside): endocardiac (inside the heart)
Equi (equal): equivalent (equal in value)
Ex (out): exit
Extra (beyond): extraordinary (exceptional)
Fore (before): foreground
Giga (billion): gigabytes (e.g., 8 GB)
Haemo/Hemo (blood): hemophilia
Hecto (hundred): hectogram (100 grams)
Helio (sun): heliology (the study of the sun)
Hetero (different): heterodox (unorthodox)
Homo (same): homosexuality
Hydro (water): hydropower (water power)
Hyper (over): hyperactive (abnormally active)
Hypo (under): hypodermic (beneath the skin; hypodermic syringe)
Infra (below): infrared (infrared radiation)
Intra (inside, within): intranet (internal network)
Inter (between): interact (act on each other)
Intro (into): introduction
Ject (throw): eject
Kilo (thousand): kilometer
Macro (big): macroeconomics
Magni (big): magnify (enlarge)
Mal (bad): malfunction (failure)
Manu (hand): manufacture
Mega (huge): megastore
Meta (change of position or state): metabolism
Meter (measure): diameter
Micro (small): microeconomics
Milli (thousandth): millimeter
Mini (small): miniskirt
Mis (bad/wrong): mistake
Mono (one): monatomic (consisting of single atoms)
Multi (many): multiply
Nano (billionth): nanosecond (one billionth of a second)
Neo (new): neo-impressionism
Neuro (nerves): neuron (nerve cell)
Non (not): non-smoker
Off (away): off work
-ology (study of): biology
Omni (all things): omnipotent (all-powerful)
-onomy (science of): economy
Ortho (true, tradition): orthodox (conventional)
Osteo (bones): osteopath
Out (better/longer/further): outskirts (outlying areas)
Over (too much): oversexed
Paed (children): pediatric (relating to medical care of children)
Palaeo (ancient): Paleolithic (Old Stone Age)
Pan (all of sth./whole of sth.): pan-American
Para (beyond): paranormal (supernatural)
Ped (foot): pedal
Petro (rocks/gas/petrol): petrology (the study of rocks)
Pre (before): previous
Phono (sound): phonics
Photo (light): photography
Phyto (plant): phytochemical
Pod (foot): tripod (three-legged stand)
Poly (many): polygon
Post (later): postmodern
Pseudo (false): pseudo-science
Psycho (mind): psychology
Pyro (fire): pyrometer (high-temperature thermometer)
Radio (broadcast): radio waves
Retro (backward): retrospect (looking back)
Script (writing): manuscript (handwritten document)
Semi (half): semicircle
Hemi (half): hemisphere
Socio (society): sociology
Sub (under): subway (underground railway)
Super (above): superman
Techno (machine): technology
Tele (far): telephone
Terra (earth): terrace (house, balcony)
Thermo (heat, temperature): thermometer
Trans (across): transport
Ultra (extremely): ultraviolet
Un (opposite): unable
Vice (next in rank below somebody): vice-president
Zoo (animal): zoology

Computer Science English: Test Questions and Answers


1. Multiple choice
1. Which of the following is not a programming language? a) Java b) HTML c) Python d) CSS
Answer: b) HTML (strictly speaking, CSS, a style-sheet language, is not a programming language either)
2. Which protocol is used for sending and receiving email? a) HTTPS b) FTP c) SMTP d) DNS
Answer: c) SMTP (SMTP sends and relays mail; retrieving it from a mailbox normally uses POP3 or IMAP)
3. What does the acronym CPU stand for? a) Central Processing Unit b) Computer Processing Unit c) Control Processing Unit d) Central Power Unit
Answer: a) Central Processing Unit
4. Which programming language is commonly used for web development? a) C++ b) Java c) JavaScript d) Swift
Answer: c) JavaScript
5. What does HTML stand for? a) Hyperlinks and Text Markup Language b) Hyper Text Markup Language c) Home Tool Markup Language d) Hyper Text Modeling Language
Answer: b) Hyper Text Markup Language

2. Fill in the blanks
1. The process of converting high-level programming code into machine code is called ___________. Answer: compilation
2. HTTP stands for ___________ Transfer Protocol. Answer: Hypertext
3. The process of testing software by executing it is called ___________. Answer: dynamic testing (debugging is the separate activity of locating and fixing the defects that testing reveals)
4. Java is an object-_____________ programming language. Answer: oriented
5. DNS stands for Domain Name ___________. Answer: System

3. Short answer
1. What is the difference between TCP and UDP?
Answer: TCP (Transmission Control Protocol) is a connection-oriented protocol. It establishes a connection between the sender and the receiver before transferring data, ensures that all packets are received in the correct order, and provides error checking. UDP (User Datagram Protocol), on the other hand, is a connectionless protocol that does not establish a connection before transmitting data. It does not guarantee delivery or ordering, but it has less overhead, which makes it faster and better suited to time-sensitive applications.
2. What is the purpose of an operating system?
Answer: An operating system (OS) is software that manages computer hardware and software resources and provides common services for computer programs. Its primary purpose is to enable the user to interact with the computer and to provide a platform for running applications. It manages memory, file systems, input/output devices, and multitasking. The OS also handles system security and resource allocation to ensure good performance.

4. Problems
Refer to the text below and give your own answers.

DB33/T 1136-2017 Code for Design of Building Foundations


5 Foundation soil calculations
5.1 Bearing capacity calculation
5.2 Deformation calculation
5.3 Stability calculation
Principal drafters: 施祖元 刘兴旺 潘秋元 陈云敏 王立忠 李冰河 (the following are listed in pinyin order of surname) 蔡袁强 陈青佳 陈仁朋 陈威文 陈舟 樊良本 胡凌华 胡敏云 蒋建良 李建宏 王华俊 刘世明 楼元仓 陆伟国 倪士坎 单玉川 申屠团兵 陶琨 叶军 徐和财 许国平 杨桦 杨学林 袁静
Principal reviewers: 益德清 龚晓南 顾国荣 钱力航 黄茂松 朱炳寅 朱兆晴 赵竹占 姜天鹤 赵宇宏 童建国
浙江大学
Participating organizations (in no particular order): 浙江工业大学 温州大学 华东勘测设计研究院有限公司 浙江大学建筑设计研究院有限公司 杭州市建筑设计研究院有限公司 浙江省建筑科学设计研究院 汉嘉设计集团股份有限公司 杭州市勘测设计研究院 宁波市建筑设计研究院有限公司 温州市建筑设计研究院 温州市勘察测绘院 中国联合工程公司 浙江省电力设计院 浙江省省直建筑设计院 浙江省水利水电勘测设计院 浙江省工程勘察院 大象建筑设计有限公司 浙江东南建筑设计有限公司 湖州市城市规划设计研究院 浙江省工业设计研究院 浙江工业大学工程设计集团有限公司 中国美术学院风景建筑设计研究院 华汇工程设计集团股份有限公司

Computer Science English: Multiple-Choice Questions


1. A user interface as we use the term here is ___ (ABC). A. a text-based user interface or GUI B. an interface between a computer and its peripheral device C. an interaction between an operating system and a user D. an interaction between an application program and a user
2. ___ (A) provides transparent transfer of data between end users, providing reliable data transfer services to the upper layers. A. The Transport Layer B. Session Layer C. Network Layer D. Application Layer E. Presentation Layer
3. Many viruses do harmful things such as (ABCD). A. deleting files B. slowing your PC down C. simulating typos D. changing random data on your disk
4. We can classify programming languages under two types: (AB) languages and ( ) languages. A. high-level B. low-level C. advanced-level D. basic-level
5. Some of the basic services available over an Internet connection are ___ (ABCD). A. E-mail B. Telnet C. FTP D. Usenet news
6. A general-purpose computer has four main sections: (ABCE). A. the control unit B. the memory C. the input and output devices D. the CPU E. the arithmetic and logic unit (ALU)
7. Windows 2000 has the following key technologies: (ABCD). A. security B. Active Directory C. flat directory D. enterprise management
8. The register file is ___ (ACD). A. addressed by much shorter addresses B. physically large C. physically small D. on the same chip as the CPU
9. A stack protocol can be used for (A). A. removing the latest element inserted B. removing the earliest element inserted C. subroutine calls D. operation of arithmetic expressions
10. The end equipment in a communication system includes (ABCD). A. printers B. computers C. CRTs D. keyboards
11. Microsoft Office Professional 2000 includes ___ (ABCD). A. Excel 2000 B. PowerPoint 2000 C. Word 2000 D. Outlook 2000
12. A general-purpose computer has four main sections: ___ (ABCE). A. the input and output devices B. the memory C. the arithmetic and logic unit (ALU) D. the CPU E. the control unit
13. The two most common types of scanners are (BC) and ( ). A. hand-held scanners B. flatbed scanners C. auto scanners D. handler scanners
14. Some viruses use (CD) and ( ) techniques to hide their existence. A. quickly spread B. replace a part of system software C. stealth D. polymorphic
15. The Windows 2000 product line includes ___ (ABCD). A. Windows 2000 Datacenter Server B. Windows 2000 Professional C. Windows 2000 Server D. Windows 2000 Advanced Server
16. Similar to viruses, you can also find malicious code in (ABC). A. Trojan horses B. logic bombs C. worms D. Microsoft Word documents
17. Viruses all have two phases to their execution, the ( ) and the ( ) (BD). A. create phase B. attack phase C. delete phase D. infection phase
18. Active Directory can help you (ACD). A. get past the limits of down-level networks B. deliver complete enterprise security by itself C. build a complex international network D. manage every resource with a single logon
19. High-level languages are commonly classified as (ACDE). A. object-oriented B. automatic C. functional D. logic languages E. procedure-oriented
20. (CD) is a type of executable file. A. TXT file B. JPG file C. EXE file D. COM file
21. (ABCD) may be the trigger that sets off some viruses. A. an external event on your PC B. a day C. a counter within the virus D. a time
22. (BC) is a type of executable file. A. TXT file B. EXE file C. COM file D. JPG file
23. Commonly used web browsers include (ABCE). A. Firefox B. Internet Explorer C. Opera D. ICQ E. Apple Safari
24. Newer ideas in computing such as (ABDE) have radically altered the traditional concepts that once determined program form and function. A. artificial intelligence B. distributed computing C. software engineering D. parallel computing E. data mining
25. Microsoft Windows currently supports the ___ and ___ file systems (AC). A. NTFS B. OCFS C. FAT D. ext2 E. NILFS
26. A modem is ___ (ACD). A. a modulator/demodulator B. a data set C. a demodulator D. a modulator
27. The equipment ___ (AB). A. transfers the number of bits in serial form B. manipulates digital information internally in word units C. transfers the number of bits in parallel D. manipulates digital information internally in serial form
28. Electronic commerce that is conducted between businesses is referred to as business-to-business or (D). A. C2C B. C2B C. e-commerce D. B2B
29. The World Wide Web also subsumes previous Internet information systems such as (AC). A. Gopher B. Ftp C. FTP D. Telnet
30. ... relies on the services of .NET data providers. There are (ABCD): A. Connection B. DataAdapter C. DataReader D. Command
31. The development process in the software life cycle involves four phases: analysis, design, implementation, and ___ (ACDE). A. analysis B. audit C. implementation D. design E. testing
32. The end equipment in a communication system includes ___ (ABCD). A. printers B. CRTs C. computers D. keyboards
33. In electronic commerce, information search and discovery services include (ABCDE). A. search engines B. information filters C. software agents D. directories E. electronic catalogs
34. GIS work with two fundamentally different types of geographic models. They are the (BD). A. geography model B. vector model C. mathematic model D. raster model E. data model
35. The two most common types of scanners are ___ and ___ (AC). A. flatbed scanners B. hand-held scanners C. auto scanners D. handler scanners
36. Windows 2000 has the following key technologies: (ABCD). A. Active Directory B. flat directory C. enterprise management D. security
37. Computer software, or just software, is a general term used to describe a collection of computer programs, procedures, and documentation that perform some tasks on a computer system. The term includes (ABC): A. firmware, which is software programmed resident to electrically programmable memory devices on mainboards or other types of integrated hardware carriers B. application software, such as word processors, which performs productive tasks for users C. middleware, which controls and coordinates distributed systems
38. Software engineering is related to the disciplines of ___ (ADE). A. project management B. natural language C. neural networks D. systems engineering E. computer science
39. What makes it difficult to agree on how to count viruses? (ABCD) A. some viruses can create different versions when they infect other programs B. just a trivial change may create a new virus C. some viruses can use polymorphic techniques D. new viruses arise from existing viruses
40. A virus is a program that reproduces its own code by (ABC). A. inserting itself into the middle of a file B. simply placing a pointer C. adding itself to the end of a file D. replacing another program
41. Input devices include ___ (ABCD). A. the keyboard B. microphone C. touch screen D. the mouse
42. Viruses all have two phases to their execution, the ( ) and the ( ) (AD). A. attack phase B. create phase C. delete phase D. infection phase
43. The equipment (BC). A. manipulates digital information internally in serial form B. manipulates digital information internally in word units C. transfers the number of bits in serial form D. transfers the number of bits in parallel
44. Office automation is ___ (CD). A. the computer B. communications technology C. the application of computers D. used to improve the productivity of people
45. The types (classes, structs, enums, and so on) associated with each .NET data provider are located in their own namespaces: (ABCDE) A. System.Data.SqlClient: contains the SQL Server .NET Data Provider types. B. System.Data.Odbc: contains the ODBC .NET Data Provider types. C. System.Data: contains provider-independent types such as the DataSet and DataTable. D. System.Data.OracleClient: contains the Oracle .NET Data Provider types. E. System.Data.OleDb: contains the OLE DB .NET Data Provider types.
46. C++ is ___ (ACD). A. extended from C B. a superset of C C. object-oriented D. procedure-oriented
47. Some viruses with no attack phase often damage the programs or disks they infect because they (AD). A. have bugs in them B. show messages on your screen C. steal storage D. contain poor-quality code
48. Windows 2000 is (A). A. an inventive technology B. used for building a scalable network C. the same as Windows NT D. a new release of Windows
49. Some common applications related to electronic commerce are the following: A. teleconferencing B. online banking C. email D. instant messaging E. enterprise content management F. newsgroups G. shopping cart software
50. A program is a sequence of ( ) that can be executed by a computer. It can either be built into the hardware or exist independently in the form of ( ) (BC). A. hardware B. software C. instructions D. data
51. Electronic payments include ___ (ABCD). A. credit card payments B. electronic checks C. digital currencies D. cash payment
52. Commonly used web browsers include ___ (ABCD). A. Opera B. Internet Explorer C. Firefox D. Apple Safari E. ICQ
53. Computer programs fall into two major classes: ___ and ___ (AC). A. application programs B. application suite C. operating systems D. database application
54. Database connection (B) allows an application to reuse an existing connection from a pool instead of repeatedly establishing a new connection with the database. A. pond B. pooling C. link D. connection
55. The development process in the software life cycle involves four phases: analysis, design, implementation, and (ABCE). A. implementation B. design C. analysis D. audit E. testing
56. Hypermedia include (ABCDEF). A. video clips B. images C. text D. flash E. video F. sounds
57. An asleep state is ___ (ABD). A. used to lessen wear and tear on the computer B. used for saving energy C. indicated by the indicator light going out D. a low-power standby mode
58. Electronic payments include (ABCD). A. digital currencies B. electronic checks C. credit card payments D. cash payment
59. Trying to (BCD) may wake up a virus that is resident in memory. A. delete a file B. access a diskette C. execute a program D. copy a file
60. Before you turn the power on with a new computer, you should make sure ___ (ABCD). A. the computer system has been set up B. the computer is already out of the box C. appropriate software has been installed D. appropriate cables are correctly connected
61. Security is usually enforced through ___ (ABE). A. access control B. encryption C. data retrieving D. data storing E. auditing
62. Computer programming is the process of (ABCD) the source code of computer programs. A. testing B. maintaining C. debugging D. writing
63. Queues that occur in everyday life can be seen (ABCD). A. as automobiles waiting for a traffic light B. as people waiting for service at a bank C. in certain societies lacking equality D. in an emergency room of a hospital
64. Static graphics include ___ (AB). A. animators B. photographs C. movies D. pictures
66. Which of the following are threats to computer security? (ABCD) A. computer criminals B. human errors C. computer crime D. earthquake
65. The attributes of the stack are ___ (D). A. queue B. FIFO C. built into their circuitry D. LIFO
66. If a virus simply reproduces and has no cause for an attack phase, it will still ( ) without your permission (BD). A. play music B. steal storage C. delete files D. pilfer CPU cycles
67. According to the text, modern digital computers can be divided into four major categories on the basis of cost and performance. They are ___ (ABDE). A. minicomputers B. mainframes C. notebook D. workstation E. microcomputers F. Lenovo
68. The Application layer in the TCP/IP model corresponds to (ABD) in the OSI model. A. Presentation Layer B. Session Layer C. Transport Layer D. Application Layer E. Network Layer
69. A computer system user generally cares more about ___ (ABD). A. speed of computation B. storage size C. physical size of the computer D. efficiency of the computer
71. Cache is ___ (BCD). A. slow B. high cost C. fast D. relatively small
72. We can say a bus is simply ___ (ABC). A. a wire B. a 16-bit bus C. a group of wires D. an 8-bit bus
73. Viruses can delay their attack for (ABCD). A. years B. months C. weeks D. days
74. In order to increase our computer's performance we need to ___ (BCD). A. buy an L1 cache B. have a much larger main memory C. have an L2 cache D. buy an L2 cache
75. The software that controls the interaction between the input and output hardware is called BIOS, which stands for ___ (A). A. Basic Input Output System B. Classic Input Output System C. Advanced Input Output System D. Junior Input Output System
76. To enhance the performance of a computer system we should ___ (ACD). A. improve the pattern of referencing operands B. optimize the simple movement of data C. optimize the basic sequence control mechanism D. use IF and LOOP instructions as much as possible
77. Their company used international lawyers to prosecute a crime ring involving software ___ (A) in Thailand. A. piracy B. copying C. duplication D. cloning
78. The software that controls the interaction between the input and output hardware is called BIOS, which stands for (B). A. Advanced Input Output System B. Basic Input Output System C. Classic Input Output System D. Junior Input Output System
79. Some viruses use (AC) and ( ) techniques to hide their existence. A. stealth B. quickly spread C. polymorphic D.
replace a part of system software80.Middleware lies in______ACD____A.the middle of interactions between different application programsB.the top of the layering vertical stackC.the top of an operating systemD.the middle of the layering vertical stack81.Software includes ( ACDE) etcA.video gamesB. all kinds of filesC. programsD. websitesE. mobile application82.The major functional components of an office automation system include: ___ABCD__A.electronic mailB. personal assistance featuresC. information storage and retrievalD. text processing83.The Internet carries various information resources and services, such as (ACDEF ) and the inter-linked hypertext documentsA.online chatB. talkingC. electronic mailD. file transferE. online gamingF. file sharing84. A processor is composed of:____ABCD______.A.an arithmeticB. a control unitC. RegistersD. logic unit85.Functions of the compiler used in RISC are ___ABC_______A.to optimize register usageB.to maximize register usageC.to allocate registers to those variables that will be used the most in a given time periodD.to compile a high level language program86. A digital computer is generally made up of five dstinct elements: a central processing unit,(ABCD).A.a busB. input devicesC. memeory storage devicesD. output devicesE. crt screen87.There are AB (CD)_between the DTEs.A.digital-to-analog converterB. the modemC. communications equipmentD. will be replaced by an upd_ated standard88.What make it is difficult to agree on how to count viruses? ABCDA.just a trivial change may create a new virusB.some viruses can use polymorphic techniqueC.some viruses can create different versions when they infect other programsD.new virus arise from an existing virus89.which aspect have to be considered in the design of a piece of software. ABCDEFGA.Fault-toleranceB. ExtensibilityC. ModularityD. CompatibilityE.MarketabilityF. PackagingG. Maintainability90.Active Directory can help you (ACD ).A.build a complex international networkB. 
deliver complete enterprise security by itselfC.manage every resource with a single logonD. get off the limits of down level networks91.Early computer solved_____CD_____ problems.A.controlB. engineeringC. mathematicalD. business applications92.The tools which Programming software usually provides include: ABCDEA.debuggersB. text editorsC. linkersD. compilersE. interpreters93.DTE is ( AB).A.data terminal equipmentB.the last piece of equipment that belonged to the subscriber in a data link systemC.satelliteD. Digital T-carrier94.According to the text,modern digital computers can be divided into four major categories on the basis of cost and performance.They are ( BDEF).A.note bookB. microcomputersC. lenovoD. minicomputersE. workstationF. mainframes95.which is the type of electronic commerce in the following choice ACA.B2BB. C2CC. B2C96.The operations of a structured data type might act on (ABCD ).A.a stackB. the values of the data typeC. component elements of the data structureD. a queue97.Types of media include__ACD________.A.textB. animationC. audioD. full-motion video98. A virus is a program that reproduces its own code by (ABC ).A.simply placing a pointerB. adding to the end of a fileC. ins( )erting into the middle of a fileD. replacing another program99.According to the text,the author mentions three of the most commonly used types of printer.They are (BDE ).A.belt printerB. dot-matrix printers;C. array printerD. laser printerE. inkjet printers100.The end equipment in a communication system includes ___ABD_______A.keyboardsB. DCEC. CRTsD. computers101.Software includes _____ACDE________etcA.programsB. all kinds of filesC. video gamesD. websitesE. mobile application102.With .NET, Microsoft is opening up a channel both to ( ) in other programming languages and to ( BC). (developers; components)A.coderB. developersC. componentsD. architecturemon contemporary operating systems include (ABCD ).A.LinuxB. Microsoft WindowsC. SolarisD. 
Mac OS104.A mechanism for translating Internet hostnames into IP addresses is___BCD_______A.equipped into the general-purpose operating systemB.typically inside of operating system kernelC.as a middleware by author’s definitionD.typically outside of operating system kernel105.RISC is____ABC______ed for many computer manufacturers nowadaysB.guided to be built from studying the execution behavior of high-level language programsC.abbreviation of reduced instruction set computerD.abbreviation of complex instruction set computer106.With .NET, Microsoft is opening up a channel both to _BC_______in other programming languages and to ________. (developers; components)A.coderB. componentsC. developersD. architecture107.The tools which Programming software usually provides include: ABCDEpilersB. interpretersC. text editorsD. linkersE. debuggers108.The following products of software are belong to middleware____BCD______A.OracleB. IBM’s Web Sphere MQC. Java 2 PlatformD. J2EE109.The system manager used by a fast processor can____BCD______A.connect a networkB. monitor processor’s core temperatureC. monitor processor’s supply voltagesD. reset a system110.Queues that occur everyday life can be seen (ABCD ).A.as automobiles waiting for a traffic lightB. as people waiting for service at a bankC. in an emergency room of a hospitalD. in certain societies lacking equality111.C++ include the following pillars: ____ABCD______.A.data hidingB. polymorphismC. encapsulationD. inheritance112.Windows 2000 is____ACD______A.new lease of WindowsB. an inventive technologyC. the same as Windows NTD. used for building a scalable network113.We use paged virtual memory to___ABCD_______A.extend the size of memoryB. reduce latency of the diskC. store large program and data setD. increase bandwidth of the disk114.According to the physical size of computers we can classify the __ABCD____ computers into A. supercomputer B. minicomputer C. microcomputer D. 
mainframe115.Some common applications related to electronic commerce are the following: ABCDEFGA.EmailB. TeleconferencingC. Instant messagingD. Shopping cart softwareE.NewsgroupsF. Enterprise content managementG. Online banking116.One machine cycle in RISC has _B_________A.two machine instructionsB. one machine instructionC. four machine instructionsD. three machine instructions117.The function of computer hardware is typically divided into three main categories.They are____ADE_____.A.inputB. motherboardC. cpuD. storageE. output118.Active Directory supports ( ABCD).A.granular access controlB. inheritanceC. encapsulationD. delegation of administrative task119.The core of SQL is formed by a command language that allows the (ACDE ) and performing management and administrative functions.A.deletion of dataB. process of dataC. updating of dataD. retrieval of dataE. ins( )ertion of data120.Some commentators say the outcome of the information revolution is likely to be as profound as the shift in (ABCD )A.industrialB. agriculturalC. Service IndustryD. handicraft industry。

GESP C++ Level 4 Sample Paper


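(Full marks: 100 points; time allowed: 90 minutes)
School: ____________ Name: ______________________
Section: I | II | III | Total
Score:

I. Single-choice questions (2 points each, 30 points in total)
Answer key (questions 1-15): D D D D A C C A B B A B B B C

1. In C++, the size of a pointer variable (in bytes) is ( )
A. 2 B. 4 C. 8 D. It depends on the compiler
2. Which option correctly defines a two-dimensional array? ( )
A. int a[][]; B. char b[][4]; C. double c[3][]; D. bool d[3][4];
3. In C++, which of the following cannot be used to pass arguments to a function? ( )
A. Pass by value B. Pass by reference C. Pass by pointer D. Pass by template
4. Which statement about the formal and actual parameters of a C++ function is correct? ( )
A. Formal parameters are aliases of the actual parameters B. Actual parameters are aliases of the formal parameters C. Formal and actual parameters are exactly the same D. Formal parameters are used in the function declaration; actual parameters are used in the function call
5. The stability of a sorting algorithm means ( )
A. Equal elements keep their relative order after sorting B. The algorithm's performance is stable C. The algorithm works well on any input D. The algorithm is easy to implement
6. Given the following two-dimensional array definition, the value of a[0][3] is ( )
int a[2][2]={{0,1},{2,3}};
A. Compilation error B. 1 C. 3 D. 0
7. Which option correctly accesses an element of the two-dimensional array array? ( )
A. array[1,2] B. array(1)(2) C. array[1][2] D. array{1}{2}
8. Which option is a correct pointer variable declaration in C++? ( )
A. int*p; B. int p*; C. *int p; D. int*p*;
9. In C++, which keyword or symbol is used to declare a reference? ( )
A. pointer B. & C. * D. reference
10. Which recurrence relation represents the Fibonacci sequence? ( )
A. F(n)=F(n-1)+F(n-2)+F(n-3) B. F(n)=F(n-1)+F(n-2) C. F(n)=F(n-1)*F(n-2) D. F(n)=F(n-1)/F(n-2)
11. Which function declaration can be called with the name of a two-dimensional array as its argument?
A. void BubbleSort(int a[3][4]); B. void BubbleSort(int a[][]); C. void BubbleSort(int*a[]); D. void BubbleSort(int**a);
12. In C++, which keyword is used to catch exceptions? ( )
A. throw B. catch C. try D. finally
13. Filling in ( ) at the blank in the code below makes the output "2010".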

NLP (1): Corpora and WordNet


Accessing corpora
Installing the NLTK data:
List of NLTK corpora:

Accessing a built-in corpus (the Reuters corpus as an example):

```python
import nltk
from nltk.corpus import reuters

# Download the Reuters corpus
nltk.download('reuters')

# List the files in the corpus
files = reuters.fileids()
print(files)

# Read the words of one file
words14826 = reuters.words(['test/14826'])
print(words14826[:20])

# Print the topics (90 in total)
reutersGenres = reuters.categories()
print(reutersGenres)

# Read one topic and print it one sentence per line
for w in reuters.words(categories=['tea']):
    print(w + ' ', end='')
    if w == '.':  # compare strings with ==, not "is"
        print()
```

Downloading and accessing an external corpus (a movie-review dataset as an example)
Download the dataset: this example uses 1,000 positive and 1,000 negative movie reviews.

```python
from random import randint
from nltk.corpus import CategorizedPlaintextCorpusReader

# Load the corpus: root folder, file pattern, and a pattern whose first group is the category
reader = CategorizedPlaintextCorpusReader(
    r'D:\PyCharm 5.0.3\WorkSpace\2.NLP\语料库\1.movie_review_data_1000\txt_sentoken',
    r'.*\.txt',
    cat_pattern=r'(\w+)/*')
print(reader.categories())
print(reader.fileids())

# The corpus is split into two categories
posFiles = reader.fileids(categories='pos')
negFiles = reader.fileids(categories='neg')

# Pick one random file from posFiles and one from negFiles
fileP = posFiles[randint(0, len(posFiles) - 1)]
fileN = negFiles[randint(0, len(negFiles) - 1)]

# Print the randomly chosen files sentence by sentence
for fileid in (fileP, fileN):
    for w in reader.words(fileid):
        print(w + ' ', end='')
        if w == '.':
            print()
```

Through these parameters, the CategorizedPlaintextCorpusReader class loads the samples internally and assigns each file to its category.

Word-frequency counting and distribution analysis in a corpus
Using the Brown Corpus as an example (compiled at Brown University: 500 texts in 15 categories):

```python
import nltk
from nltk.corpus import brown
nltk.download('brown')

# List the categories in Brown
print(brown.categories())

# Pick three categories and the wh-words to count in them
genres = ['fiction', 'humor', 'romance']
whwords = ['what', 'which', 'how', 'why', 'when', 'where', 'who']

# Analyse each of the three categories in turn
for genre in genres:
    print()
    print("Analysing '" + genre + "' wh words")
    genre_text = brown.words(categories=genre)
    print(genre_text)
    # Frequency distribution of the category's words, then the count of each wh-word
    fdist = nltk.FreqDist(genre_text)
    for wh in whwords:
        print(wh + ':', fdist[wh], end=' ')
    print()
```

Output:

```
['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']

Analysing 'fiction' wh words
['Thirty-three', 'Scotty', 'did', 'not', 'go', 'back', ...]
what: 128 which: 123 how: 54 why: 18 when: 133 where: 76 who: 103

Analysing 'humor' wh words
['It', 'was', 'among', 'these', 'that', 'Hinkle', ...]
what: 36 which: 62 how: 18 why: 9 when: 52 where: 15 who: 48

Analysing 'romance' wh words
['They', 'neither', 'liked', 'nor', 'disliked', 'the', ...]
what: 121 which: 104 how: 60 why: 34 when: 126 where: 54 who: 89
```

Word-frequency distributions of web text and chat text

```python
import nltk
from nltk.corpus import webtext
# nltk.download('webtext')
print(webtext.fileids())

# Pick one data file and compute its frequency distribution (FreqDist object fdist)
fileid = 'singles.txt'  # personal ads
wbt_words = webtext.words(fileid)
fdist = nltk.FreqDist(wbt_words)

# The most frequent token and its count
print('Most frequent word "', fdist.max(), '" :', fdist[fdist.max()])
# Total number of tokens
print(fdist.N())
# The 10 most common words
print(fdist.most_common(10))
# Tabulate the 5 most common words (tabulate prints the table itself and returns None)
fdist.tabulate(5)
# Plot the cumulative counts (recent NLTK versions also accept percents=True)
fdist.plot(cumulative=True)
```

Output:

```
['firefox.txt', 'grail.txt', 'overheard.txt', 'pirates.txt', 'singles.txt', 'wine.txt']
Most frequent word " , " : 539
4867
[(',', 539), ('.', 353), ('/', 110), ('for', 99), ('and', 74), ('to', 74), ('lady', 68), ('-', 66), ('seeks', 60), ('a', 52)]
  ,   .   / for and
539 353 110  99  74
```

(Figure: cumulative count distribution plot)

Using WordNet to get the different senses of a word

```python
# import nltk
# nltk.download('wordnet')
from nltk.corpus import wordnet as wn

chair = 'chair'
# Print all senses (synsets) of "chair"
chair_synsets = wn.synsets(chair)
print('Senses of chair:', chair_synsets, '\n\n')

# For each sense, print the synset, its definition, its synonymous lemmas and its examples
for synset in chair_synsets:
    print(synset, ': ')
    print('Definition: ', synset.definition())
    print('Lemmas/Synonymous words: ', synset.lemma_names())
    print('Example: ', synset.examples(), '\n')
```

Output:

```
Senses of chair: [Synset('chair.n.01'), Synset('professorship.n.01'), Synset('president.n.04'), Synset('electric_chair.n.01'), Synset('chair.n.05'), Synset('chair.v.01'), Synset('moderate.v.01')]

Synset('chair.n.01') :
Definition: a seat for one person, with a support for the back
Lemmas/Synonymous words: ['chair']
Example: ['he put his coat over the back of the chair and sat down']

Synset('professorship.n.01') :
Definition: the position of professor
Lemmas/Synonymous words: ['professorship', 'chair']
Example: ['he was awarded an endowed chair in economics']

Synset('president.n.04') :
Definition: the officer who presides at the meetings of an organization
Lemmas/Synonymous words: ['president', 'chairman', 'chairwoman', 'chair', 'chairperson']
Example: ['address your remarks to the chairperson']

Synset('electric_chair.n.01') :
Definition: an instrument of execution by electrocution; resembles an ordinary seat for one person
Lemmas/Synonymous words: ['electric_chair', 'chair', 'death_chair', 'hot_seat']
Example: ['the murderer was sentenced to die in the chair']

Synset('chair.n.05') :
Definition: a particular seat in an orchestra
Lemmas/Synonymous words: ['chair']
Example: ['he is second chair violin']

Synset('chair.v.01') :
Definition: act or preside as chair, as of an academic department in a university
Lemmas/Synonymous words: ['chair', 'chairman']
Example: ['She chaired the department for many years']

Synset('moderate.v.01') :
Definition: preside over
Lemmas/Synonymous words: ['moderate', 'chair', 'lead']
Example: ['John moderated the discussion']
```

Hypernyms and hyponyms
Hyponyms are more specific and hypernyms are more general (generalizations). Taking bed.n.01 and woman.n.01 as examples:

```python
from nltk.corpus import wordnet as wn

woman = wn.synset('woman.n.01')
bed = wn.synset('bed.n.01')

# Synsets in a direct relation: the hypernyms
print(woman.hypernyms())
woman_paths = woman.hypernym_paths()

# Print every path from the root node to woman.n.01
for idx, path in enumerate(woman_paths):
    print('\n\nHypernym Path :', idx + 1)
    for synset in path:
        print(synset.name(), ',', end='')

# More specific terms: the hyponyms
types_of_bed = bed.hyponyms()
print('\n\nTypes of beds(Hyponyms): ', types_of_bed)

# Print the more readable lemma names
print('\n', sorted(set(lemma.name() for synset in types_of_bed for lemma in synset.lemmas())))
```

Output:

```
[Synset('adult.n.01'), Synset('female.n.02')]

Hypernym Path : 1
entity.n.01 ,physical_entity.n.01 ,causal_agent.n.01 ,person.n.01 ,adult.n.01 ,woman.n.01 ,

Hypernym Path : 2
entity.n.01 ,physical_entity.n.01 ,object.n.01 ,whole.n.02 ,living_thing.n.01 ,organism.n.01 ,person.n.01 ,adult.n.01 ,woman.n.01 ,

Hypernym Path : 3
entity.n.01 ,physical_entity.n.01 ,causal_agent.n.01 ,person.n.01 ,female.n.02 ,woman.n.01 ,

Hypernym Path : 4
entity.n.01 ,physical_entity.n.01 ,object.n.01 ,whole.n.02 ,living_thing.n.01 ,organism.n.01 ,person.n.01 ,female.n.02 ,woman.n.01 ,

Types of beds(Hyponyms): [Synset('berth.n.03'), Synset('built-in_bed.n.01'), Synset('bunk.n.03'), Synset('bunk_bed.n.01'), Synset('cot.n.03'), Synset('couch.n.03'), Synset('deathbed.n.02'), Synset('double_bed.n.01'), Synset('four-poster.n.01'), Syn
 ['Murphy_bed', 'berth', 'built-in_bed', 'built_in_bed', 'bunk', 'bunk_bed', 'camp_bed', 'cot', 'couch', 'deathbed', 'double_bed', 'four-poster', 'hammock', 'marriage_bed', 'plank-bed', 'platform_bed', 'sack', 'sickbed', 'single_bed', 'sleigh_bed', 'truckle', 'truckle
```

Computing the polysemy of a part of speech with WordNet
Using nouns (n) as an example:

```python
from nltk.corpus import wordnet as wn

pos = 'n'  # 'v' for verbs, 'r' for adverbs, 'a' for adjectives

# All synsets of this part of speech in WordNet
synsets = wn.all_synsets(pos)

# Collect the lemma names of all synsets into one big list
lemmas = []
for synset in synsets:
    for lemma in synset.lemmas():
        lemmas.append(lemma.name())

# Remove duplicate lemmas (list -> set)
lemmas = set(lemmas)

# Add up, over all lemmas, the number of senses each lemma has for this part of speech
count = 0
for lemma in lemmas:
    count = count + len(wn.synsets(lemma, pos))  # all senses of lemma with this POS

# Print the totals
print('%s total lemmas: ' % pos, len(lemmas))
print('%s total senses: ' % pos, count)
print('%s average polysemy: ' % pos, count / len(lemmas))
```

Output:

```
n total lemmas:  119034
n total senses:  152763
n average polysemy:  1.2833560159282222
```

Counting the words in Word documents with ComputeStatistics


```vb
' Requires "Imports System.IO" at the top of the file and a reference to Microsoft.Office.Interop.Word.
' TextBox1, probar, tsCount1, dgv1 and tools1.getfileinfo belong to the original form/project.
Dim fp As String = TextBox1.Text
Dim fl() As String = Directory.GetFiles(fp, "*.docx", SearchOption.AllDirectories)

Dim dt As New DataTable
dt.Columns.Add("Path")
dt.Columns.Add("FileName")
dt.Columns.Add("Chars_NoSpaces_InclNotes")
dt.Columns.Add("Chars_NoSpaces")
dt.Columns.Add("Chars_WithSpaces_InclNotes")
dt.Columns.Add("Chars_WithSpaces")
dt.Columns.Add("Words_InclNotes")
dt.Columns.Add("Words")

probar.Maximum = fl.Length
probar.Value = 0
probar.Step = 1   ' the ProgressBar default step is 10

For Each fn As String In fl
    Dim fnl() As String = tools1.getfileinfo(fn)
    Dim wdApp As Microsoft.Office.Interop.Word.Application = Nothing
    Dim doc As Microsoft.Office.Interop.Word.Document = Nothing
    ' Initialise explicitly: VB locals in a loop keep the previous iteration's value otherwise
    Dim c1 As String = "", c2 As String = "", c3 As String = "", c4 As String = "", c5 As String = "", c6 As String = ""
    Try
        wdApp = CType(CreateObject("Word.Application"), Microsoft.Office.Interop.Word.Application)
        doc = wdApp.Documents.Open(fn)
        ' The second argument of ComputeStatistics says whether footnotes and endnotes ("notes") are counted.
        ' (doc.BuiltInDocumentProperties, or Aspose.Words, are alternative ways to read these figures.)
        c1 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticCharacters, True).ToString()
        c2 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticCharacters, False).ToString()
        c3 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticCharactersWithSpaces, True).ToString()
        c4 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticCharactersWithSpaces, False).ToString()
        c5 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticWords, True).ToString()
        c6 = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticWords, False).ToString()
    Catch ex As Exception
        ' Unreadable file: leave its counts empty ('MsgBox(fn & ex.Message) to report it)
    Finally
        ' Always release Word, even if opening or counting failed
        If doc IsNot Nothing Then doc.Close()
        If wdApp IsNot Nothing Then wdApp.Quit()
    End Try

    Dim dr As DataRow = dt.NewRow
    dr(0) = fnl(0)
    dr(1) = fnl(1)
    dr(2) = c1
    dr(3) = c2
    dr(4) = c3
    dr(5) = c4
    dr(6) = c5
    dr(7) = c6
    dt.Rows.Add(dr)

    tsCount1.Text = fn
    probar.PerformStep()
    Application.DoEvents()
Next

dgv1.DataSource = dt
```

What "ingredient–target network construction" means


Outline:
I. Introduction
II. What is ingredient–target network construction?
III. Practical value of ingredient–target network construction
IV. How to perform ingredient–target network construction
V. Case study
VI. Advantages of ingredient–target network construction in practice
VII. Conclusion

Main text:
I. Introduction
With the rapid progress of science and technology, ingredient–target network construction has become an active topic in both research and practice.

This article explains what ingredient–target network construction is and how it is used in practice.

II. What is ingredient–target network construction?
Ingredient–target network construction is a method that links chemical ingredients with the biological targets on which they act.

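It builds a network that connects the various chemical ingredients to specific biological targets, and uses that network to reveal the mechanisms by which the ingredients act in the organism.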
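The network described above can be made concrete with a small Python sketch. It builds a bipartite ingredient–target network from (ingredient, target) pairs and lists the targets that several ingredients share. This is a minimal illustration under stated assumptions. The ingredient and target names are hypothetical placeholders. In a real study the pairs would come from the collected bioactivity data, not from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical placeholder data: (ingredient, target) links, e.g. from bioactivity records.
PAIRS = [
    ("ingredient_A", "TARGET_1"),
    ("ingredient_A", "TARGET_2"),
    ("ingredient_B", "TARGET_1"),
    ("ingredient_B", "TARGET_2"),
    ("ingredient_C", "TARGET_1"),
    ("ingredient_C", "TARGET_3"),
]


def build_network(pairs):
    """Return the bipartite network as two adjacency maps, one per node type."""
    ingredient_to_targets = defaultdict(set)
    target_to_ingredients = defaultdict(set)
    for ingredient, target in pairs:
        ingredient_to_targets[ingredient].add(target)
        target_to_ingredients[target].add(ingredient)
    return dict(ingredient_to_targets), dict(target_to_ingredients)


def hub_targets(target_to_ingredients, min_degree=2):
    """Targets linked to at least `min_degree` ingredients, most connected first."""
    hubs = [(t, len(ings)) for t, ings in target_to_ingredients.items()
            if len(ings) >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    i2t, t2i = build_network(PAIRS)
    print(hub_targets(t2i))
    # Targets shared by two ingredients may hint at overlapping mechanisms of action.
    print(sorted(i2t["ingredient_A"] & i2t["ingredient_B"]))
```

Two plain dictionaries of sets keep the sketch dependency-free. A graph library would add layout and centrality measures, but the idea is the same.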

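III. Practical value of ingredient–target network construction
1. Supporting drug discovery: analysing the ingredients and mechanisms of action of known drugs can suggest research directions and potential effects for new drugs.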

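2. Aiding disease diagnosis: the relationships between chemical ingredients and biological targets can reveal potential biomarkers, which opens new approaches to diagnosis.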

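3. Optimizing drug design: analysing the interactions between drug ingredients and biological targets helps optimize drug structures to improve efficacy and safety.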

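IV. How to perform ingredient–target network construction
1. Data collection: gather information on the chemical ingredients, their biological activity data and their biological targets.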


cmp-lg/9806016, 23 Jun 1998

Using WordNet for Building WordNets

Xavier Farreres, German Rigau, Horacio Rodríguez
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain.
{farreres, horacio, g.rigau}@lsi.upc.es

Abstract
This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. In this approach, the English WordNet is used as a backbone for the Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks.

1 Motivation and Introduction
One of the main issues of recent years in NLP has been the increasingly fast development of generic language resources. Many such resources, including both software and lingware items (lexicons, lexical databases, grammars, corpora marked in several ways), have been made available for research and industrial applications.
For knowledge-based NLP tasks, the availability of wide-coverage ontologies is of special interest. The best-known ontologies (such as GUM, CYC, ONTOS, MICROKOSMOS, EDR or WORDNET; see [Gomez 98] for an extensive survey) differ greatly in several characteristics (e.g. broad coverage vs. domain specific, lexically oriented vs. conceptually oriented, granularity, kind of information placed in nodes, kind of relations, way of building, etc.).
It is clear, however, that for a wide range of applications WordNet (WN) [Miller 90] has become a de-facto standard. The success of WordNet has led to several projects that aim at building WordNets for languages other than English (e.g., [Hamp & Feldweg 97], [Artale et al. 97]) or at developing multilingual WordNets (the most important project in this line is EuroWordNet (EWN)¹). The aim of the EWN project is to build a multilingual database with WordNets for several European languages (in the first phase, Dutch, Italian and Spanish in addition to English).

¹ http://www.let.uva.nl/~ewn/

The construction of a WN for a language Lg (LgWN) can be tackled in different ways according to the lexical sources available. Manual construction can of course be undertaken quite straightforwardly and leads to the best results in terms of accuracy, but it has the important drawback of its cost. Other approaches have therefore been carried out, taking advantage of available resources in fully automatic or semi-automatic ways.

Which are these lexical resources? Basically four kinds of resources have been used:
1) the English WN (EnWN), as an initial skeleton to which the words of Lg are attached;
2) already existing taxonomies of Lg (both at word and at sense level);
3) bilingual (English and Lg) dictionaries;
4) monolingual (Lg) dictionaries.
All the approaches using EnWN as a skeleton are based on the assumption of a close conceptual similarity between English and Lg, such that most of the structure (relations) in EnWN can be maintained for LgWN.

In the case of bilingual dictionaries, the usual approach is to try to link the English counterpart of an entry to synsets in EnWN and to assume that the entry can be linked to the same synset. Monolingual dictionaries have been used basically as a source for extracting taxonomic (hypernym) links between words (or senses [Bruce & Guthrie 92], [Rigau et al. 97]) and, to a lesser extent, for extracting other kinds of semantic relations [Richardson 97] (e.g. meronymic links).

Once a taxonomy of Lg (already existing or built from a monolingual MRD) is available, the task can consist of 1) enriching the taxonomic structure with other semantic links (manually or automatically), as is the case when building individual WNs, or 2) merging this structure with other already existing ontologies (such as EnWN or EWN).

This paper presents our approach to the construction of WNs for two languages, Spanish and Catalan, and to linking the first one to EWN. We have developed a methodology that uses EnWN² as its core source. The methodology implies:
1) the use of EnWN for guiding the selection of the basic concepts of our WNs;
2) the use of EnWN as a skeleton for linking Spanish and Catalan words to English synsets using bilingual dictionaries;
3) the use of EnWN, together with bilingual and monolingual dictionaries, for building taxonomies (at sense level) of our languages;
4) the use of EnWN, together with already built fragments of SpWN and CtWN, for merging and incorporating these taxonomies into our WNs.

² We have used WordNet 1.5.

Section 2 gives an overall description of our approach. Sections 3, 4 and 5 focus on the procedures for extracting connections between words/senses/synsets: section 3 is devoted to procedures based on the use of bilinguals, section 4 to the construction of taxonomies and section 5 to the merging method. In all these sections we will emphasize the role played by EnWN as a knowledge source. Finally, section 6 presents some conclusions of our work.

2 Our way of building WordNets
As we have pointed out in the introduction, our aim has been to design a methodology (and a software environment supporting it) that facilitates the task of building WNs from our sources.
As we are involved in the EWN project (covering the Spanish part), the methodology has been defined to be compatible with the general approach, guidelines and landmarks of the whole project, but also to allow a parallel development of the CtWN.

The general approach for building EWN is described in [Vossen et al. 97]. Roughly speaking, the approach follows a top-down strategy. It tries to ensure a high level of overlap between languages, at least in the highest levels of the hierarchy, while reflecting the language-specific lexicalizations and providing maximum freedom and flexibility for building the individual WordNets. Basically, it consists of three major steps:
1) construction of core WordNets for a set of common base concepts (around 800 nouns and 200 verbs);
2) enrichment of these sets by providing relational links and incorporating their direct semantic contexts;
3) top-down extension of these core WordNets.

In our case, two different approaches have been followed for dealing with nouns and with verbs³.

³ Although other categories can be included in EWN (and cross-category relations can be established), so far only nouns and verbs have been introduced in our WordNets, except for demonstration purposes.

In the case of verbs, most of the work has been performed manually. The main source of information has been the Pirapides database [Castellón et al. 97]. It consists of 3,600 English verb forms organized around Levin's Semantic Classes and connected to WN1.5 senses. For each verb, the database contains the theta-grid specification (its semantic structure in terms of cases or thematic roles), its translations into Spanish and Catalan forms⁴ and diathesis information.
The connections extracted from this database were cross-validated with the information provided by bilingual dictionaries in order to improve their accuracy.

In the case of nouns, we have followed the EWN strategy as follows:
1) The two highest levels of EnWN (top concepts and their direct hyponyms) were manually translated into Spanish (including variants). The results were filtered by dropping words that appear fewer than five times as genus terms in our monolingual dictionary [DGILE 87], or that occur fewer than 50 times in the DGILE definition corpus⁵ and fewer than 100 times in the LEXESP corpus⁶.
This initial set (the Spanish core concepts, 361 synsets) was then compared with the base concept sets of the other EWN sites. Roughly, the union of the pairwise intersections between languages was taken as the common base concept set. The concepts missing in Spanish were added manually and extended vertically, bottom-up. This led to the common Base Concept set (around 800 synsets). The Catalan Base Concept set was then built to cover the Spanish Base Concept set.
2) The enrichment of the BC set has been performed in two steps: first using bilinguals as the main lexical source, and then using other sources (mainly taxonomies). These processes are described below.

⁴ Spanish and Catalan are close enough to allow a simultaneous development of lexical sources.
⁵ I.e. the set of all definitions included in DGILE (1 million words).
⁶ A balanced corpus of Spanish (5 million words).

3 Using English WordNet with bilinguals
When building a lexical taxonomy from scratch, we can take advantage of a preexisting lexical taxonomy, EnWN in our case. Assuming it is well formed, it serves as the skeleton of a taxonomy into which we fill the lexical data. This has several advantages: it speeds up the construction of a large lexicon, as the only problem left is deciding where to attach the lexical data.
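There are also some problems. Nobody ensures that the well-formedness of a lexical taxonomy for one language holds for another language, so there must be semantic closeness between both languages. We have therefore assumed that the structure of the WN taxonomy would suffice in the earlier stages of the construction of our WNs.

So, we need to choose Spanish synonyms⁷ for the English words present in the original synsets of WN. One way to fulfil this requirement is to use bilingual dictionaries (see [Knight & Luk 94], [Okumura & Hovy 94]). However, we have to perform a sense disambiguation task in order to know which sense of each word (the Spanish and the English one) is being referred to. In other words, we have to decide for which sense of the Spanish word, and for which synset in WordNet, a synonymy relation is being defined.

⁷ Although we illustrate the methodology considering only Spanish, we performed the whole process for both Catalan and Spanish (and we provide results for both).

There is also another, minor problem to overcome: the two directions of the bilingual dictionary are symmetrical in only a few cases, so they must be unified to collect all translations together. Unifying both directions does lose potentially important information (e.g. the order in which translations are written is relevant). But the lack of systematic work in the construction of the bilinguals makes this information of very doubtful utility.

Thus, we have processed the bilinguals to create what we call the homogeneous bilingual, which is a bilingual with both directions mixed. Then, for each Spanish word, we have collected all the words given as correct translations. This has been the source for our work of attaching Spanish words to WordNet synsets.

Having collected all the translations of a Spanish word together, we have then classified the words into classes depending on their behaviour. They can be classified in three dimensions: polysemy, structural and conceptual.

In the polysemy dimension, we classify the words depending on the number and kind of their translations. For example, all entries that have only one translation fall in the same class when this translation is monosemous in WN terms. All entries that have several translations fall in another class when these translations are polysemous.

In the structural dimension, we classify the words depending on the relations that their translations hold in WN. For example, all entries that have several translations, some of which share a common synset in WN, fall in the same category. All entries in which one translation is a direct hyponym of another translation fall in the same category, and so on.

In the conceptual dimension, we apply the conceptual distance formula (explained in section 4.2.1) to elements of the entries. For example, all entries with a low conceptual distance between the synsets of their translations fall in the same class.

Each of these classes defines a set of entries with the same behaviour. A confidence score has been assigned to each class by manually validating a significant sample extracted from it. We decided to include in the first version of SpWN the words of the classes with a precision of 85% or more.

Bilinguals can be taken a step further on the following supposition: when several methods give the same result for the same Spanish word, the confidence in this attachment increases. We have carried out an experiment checking the classes in pairs and evaluating the precision of the set of intersections; in all cases the precision increased. We have removed the cases where the precision was over 85%, the threshold applied in the previous experiment. This caused an increase of 40% over the original set of attachments.

Furthermore, it is clear that if we merge more bilinguals, the resulting homogeneous bilingual will be larger and will therefore generate larger classes. Even more importantly, the classes become more precise, because some bilinguals lack some translations for some words. Table 1 shows the current figures of both CtWN and SpWN following this approach (see [Atserias et al. 97] and [Benítez et al. 98] for further details of the whole process and the tools used).

Table 1: Current volumes of CtWN and SpWN.

| Category | Language | Words | Synsets | Connections |
|---|---|---|---|---|
| Nouns | Spanish | 23,217 | 18,578 | 41,293 |
| Nouns | Catalan | 5,231 | 4,723 | 7,193 |
| Verbs | Spanish | 3,087 | 3,219 | 7,960 |
| Verbs | Catalan | 3,337 | 3,219 | 9,078 |

The last point to address is the extension of the intersection method to a larger number of classes. If intersecting two classes increased the confidence, an equivalent increase can be expected when intersecting larger numbers of classes. In fact, extending the intersection method would amount to performing a multivariate statistical analysis in which each class is a factor.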
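The bilingual procedure of section 3 can be sketched in Python. The sketch mixes both dictionary directions into a homogeneous bilingual, labels entries along the polysemy dimension, keeps classes whose validated precision reaches 85%, and intersects the attachment sets proposed by different methods. It is a minimal illustration under loud assumptions, not the authors' tooling. The toy dictionaries and the miniature synset table are hypothetical. The class labels are an invented simplification that covers only the polysemy dimension, and they omit the structural and conceptual ones.

```python
from collections import defaultdict

# Hypothetical toy data: both directions of a Spanish-English bilingual dictionary.
ES_EN = {"banco": ["bank", "bench"], "perro": ["dog"]}
EN_ES = {"bank": ["banco", "orilla"], "dog": ["perro", "can"]}
# Hypothetical miniature WordNet: English word -> ids of the synsets it belongs to.
SYNSETS = {
    "bank": ["bank.n.01", "depository.n.01"],
    "bench": ["bench.n.01"],
    "dog": ["dog.n.01"],
}


def homogeneous_bilingual(es_en, en_es):
    """Mix both directions into one map: Spanish word -> sorted English translations."""
    merged = defaultdict(set)
    for es, translations in es_en.items():
        merged[es].update(translations)
    for en, spanish_words in en_es.items():
        for es in spanish_words:
            merged[es].add(en)
    return {es: sorted(ens) for es, ens in merged.items()}


def polysemy_class(translations, synsets):
    """Hypothetical polysemy-dimension label: one vs. several translations,
    all monosemous ("mono") or not ("poly"). A translation missing from the
    synset table counts as not monosemous."""
    arity = "n" if len(translations) > 1 else "1"
    monosemous = all(len(synsets.get(t, [])) == 1 for t in translations)
    return f"{arity}-{'mono' if monosemous else 'poly'}"


def candidate_attachments(es_word, translations, synsets):
    """Every (Spanish word, synset) pair reachable through some translation."""
    return {(es_word, s) for t in translations for s in synsets.get(t, [])}


def accepted_classes(precision_by_class, threshold=0.85):
    """Classes whose manually validated precision reaches the threshold."""
    return {c for c, p in precision_by_class.items() if p >= threshold}


def intersect(*attachment_sets):
    """Attachments proposed by every method given (expected to be more precise)."""
    return set.intersection(*attachment_sets)


if __name__ == "__main__":
    bilingual = homogeneous_bilingual(ES_EN, EN_ES)
    for es, ens in bilingual.items():
        print(es, ens, polysemy_class(ens, SYNSETS))
```

With invented validation scores such as {"1-mono": 0.92, "n-poly": 0.60}, `accepted_classes` keeps only "1-mono", which mirrors the 85% acceptance rule. Intersecting the attachment sets of two methods keeps only the (word, synset) pairs that both propose.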
This has been the source for our work of attaching Spanish words to WordNet synsets.

Having collected all the translations of a Spanish word together, we then classified the words into classes depending on their behaviour. They can be classified along three dimensions: polysemy, structural and conceptual.

In the polysemy dimension, we classify the words depending on the number and kind of translations. For example, all entries that have only one translation fall in the same class when this translation is monosemous in WN terms; all entries that have several translations fall in another class when these translations are polysemous.

In the structural dimension, we classify the words depending on the relation that the translations hold in WN. For example, all entries that have several translations, some of which share a common synset in WN, fall in the same category; all entries in which one translation is a direct hyponym of another translation fall in the same category, etc.

In the conceptual dimension, we apply the conceptual distance formula (explained in Section 4.2.1) to elements of the entries. For example, all entries with a low conceptual distance between the synsets of their translations fall in the same class.

Each of these classes defines a set of entries with the same behaviour. A confidence score has been assigned to each class by manually validating a significant sample extracted from it.

Footnote 7: Although we illustrate the methodology considering only Spanish, we performed the whole process for both Catalan and Spanish (and we provide results for both).
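The polysemy and structural dimensions, plus the precision threshold applied to each class, can be illustrated with a toy WordNet index. This is a sketch under stated assumptions: `wn_index` (English word to synset ids) and the class labels are hypothetical, not the paper's actual class inventory, and class precision is assumed to come from the manual validation described above.

```python
def polysemy_class(translations, wn_index):
    """Assign an entry to a class along the polysemy dimension.

    translations: set of English words for one Spanish entry
    wn_index: English word -> list of synset ids (hypothetical toy index)
    The entry is 'polysemous' if any translation has more than one synset.
    """
    senses = [wn_index.get(t, []) for t in translations]
    senses = [s for s in senses if s]  # ignore words missing from WN
    if not senses:
        return "no-wn-translation"
    mono = all(len(s) == 1 for s in senses)
    if len(senses) == 1:
        return "one-translation-monosemous" if mono else "one-translation-polysemous"
    return "several-translations-monosemous" if mono else "several-translations-polysemous"


def shared_synsets(translations, wn_index):
    """Structural dimension: synsets shared by at least two translations."""
    counts = {}
    for t in translations:
        for syn in set(wn_index.get(t, [])):
            counts[syn] = counts.get(syn, 0) + 1
    return {syn for syn, n in counts.items() if n >= 2}


def accepted(class_precision, threshold=0.85):
    """Keep the classes whose validated precision reaches the threshold."""
    return {c for c, p in class_precision.items() if p >= threshold}
```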
We decided to accept the classes with a precision of 85% or more as classes of words to include in the first version of SpWN.

Bilinguals can be used a step further by stating a supposition: when several methods give the same result for the same Spanish word, the confidence in this attachment increases. We have carried out an experiment checking the classes in pairs and evaluating the precision of the set of intersections; in all cases the precision increased. We have removed the cases where the precision was over 85%, the threshold applied in the previous experiment. This caused an increase of 40% over the original set of attachments.

Furthermore, it is clear that if we merge more bilinguals, the resulting homogeneous bilingual will be larger and will therefore generate larger classes. But, even more importantly, the classes become more precise, because some bilinguals omit some translations for some words. Table 1 shows the current figures of both CtWN and SpWN following this approach (see [Atserias et al. 97] and [Benítez et al. 98] for further details of the whole process and the tools used).

                  Words     Synsets   Connections
Nouns  Spanish    23,217    18,578    41,293
       Catalan     5,231     4,723     7,193
Verbs  Spanish     3,087     3,219     7,960
       Catalan     3,337     3,219     9,078

Table 1: Current volumes of CtWN and SpWN.

The last point to address is the extension of the intersection method to a larger number of classes. If intersecting two classes increased the confidence, an equivalent increase can be expected when intersecting larger numbers of classes. As a matter of fact, extending the intersection method would be nothing more than performing a multivariate statistical analysis, where each class would be a factor.
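The pairwise intersection experiment can be sketched as follows, assuming each class is a set of (Spanish word, synset) attachments and that a manually judged sample is available as a hypothetical `judged` mapping. Precision is estimated only over the judged attachments in each intersection.

```python
from itertools import combinations


def pairwise_intersections(classes):
    """classes: class name -> set of (spanish_word, synset_id) attachments."""
    return {(a, b): classes[a] & classes[b]
            for a, b in combinations(sorted(classes), 2)}


def precision(attachments, judged):
    """Share of judged attachments marked correct; None if none were judged.

    judged: attachment -> bool, a (hypothetical) manually validated sample.
    """
    sample = [judged[x] for x in attachments if x in judged]
    return sum(sample) / len(sample) if sample else None


def accepted_attachments(classes, judged, threshold=0.85):
    """Union of the pairwise intersections whose precision reaches the threshold."""
    result = set()
    for items in pairwise_intersections(classes).values():
        p = precision(items, judged)
        if p is not None and p >= threshold:
            result |= items
    return result
```

Extending this to intersections of three or more classes would replace `combinations(..., 2)` with larger combinations; the text suggests a multivariate analysis as the principled generalization.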
The interesting result of this multivariate analysis would be a formula which could be used to calculate the confidence value of an attachment, depending on the number of classes in which it occurs.

4 Building Taxonomies using WordNet

4.1 Exploiting taxonomies from MRDs

A straightforward way of obtaining a LgWN is to acquire taxonomic relations from conventional dictionaries following a purely bottom-up strategy, that is: 1) parsing each definition to obtain the genus, 2) performing a genus disambiguation procedure, and 3) building a natural classification of the concepts as a concept taxonomy with several tops. Following this purely descriptive methodology, the semantic primitives of the LgWN could be obtained by collecting the dictionary senses appearing at the top of the complete taxonomies derived from the dictionary. By characterizing each of these tops, the complete LgWN could be produced. For DGILE, the complete noun taxonomy was derived using the automatic method described by [Rigau et al. 97]⁸.

However, several problems arise due to a) the source (i.e., circularity, errors, inconsistencies, omitted genus, etc.) and b) the limitations of the genus sense disambiguation techniques applied (e.g., [Bruce et al. 92] report 80% accuracy using automatic techniques, while [Rigau et al. 97] report 83%). Furthermore, the top dictionary senses do not usually represent the semantic subsets that the LgWN needs to characterize in order to represent useful knowledge for NLP systems. In other words, there is a mismatch between the knowledge directly derived from an MRD and the knowledge needed by a LgWN.

To illustrate the problem we are facing, let us suppose we plan to place the FOOD concepts in the LgWN.
Neither by collecting the taxonomies derived from the top dictionary sense (or subset of top dictionary senses of DGILE) closest to the FOOD concepts (e.g., substancia, 'substance'), nor by collecting the subtaxonomies starting from closely related senses (e.g., bebida, 'drinkable liquids', and alimento, 'food'), are we able to collect exactly the FOOD concepts present in the MRD. The former are too general (they would cover non-FOOD concepts) and the latter are too specific (they would not cover all FOOD dictionary senses, because FOODs are described in many ways).

All these problems can be solved using a mixed methodology, that is, by attaching selected top concepts (and their derived taxonomies) to prescribed semantic primitives represented in the LgWN. First, we prescribe a minimal ontology (represented by the semantic primitives of the LgWN) able to represent the whole lexicon derived from the MRD; second, following a descriptive approach, we collect, for every semantic primitive placed in the LgWN, its subtaxonomies. Finally, the subtaxonomies selected for a semantic primitive are attached to the corresponding LgWN semantic category.

We used as semantic primitives the 24 lexicographer's files (or semantic files) into which the 60,557 noun synsets (87,641 nouns) of WN are classified⁹. Thus, we considered the 24 semantic tags of WN as the main LgWN semantic primitives to which all dictionary senses must be attached.

Footnote 8: This taxonomy contains 111,624 dictionary senses, of which only 832 are tops of the taxonomy (these top dictionary senses have no hypernyms) and 89,458 are leaves (they have no hyponyms). That is, 21,334 definitions are placed between the top nodes and the leaves.
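Two operations on a genus taxonomy discussed in Section 4.1 can be sketched together: splitting the senses into tops, leaves and intermediate nodes (the counts in footnote 8), and the mixed methodology's step of attaching the subtaxonomy under each selected sense to a prescribed semantic primitive. This is a minimal sketch, assuming one disambiguated genus per sense; the sense names, the `hypernym` map and the rule that the first label found wins are illustrative assumptions. "noun.food" is the WordNet lexicographer file name for food synsets.

```python
from collections import deque


def taxonomy_profile(hypernym):
    """Split the senses of a genus taxonomy into tops, leaves and the rest.

    hypernym: sense -> its disambiguated genus sense (None for a top).
    A sense with neither a hypernym nor hyponyms counts as both top and leaf.
    """
    has_hyponym = {g for g in hypernym.values() if g is not None}
    tops = {s for s, g in hypernym.items() if g is None}
    leaves = {s for s in hypernym if s not in has_hyponym}
    middle = {s for s in hypernym if s not in tops and s not in leaves}
    return tops, leaves, middle


def attach_subtaxonomies(hypernym, top_to_primitive):
    """Label every sense below a selected sense with that sense's primitive.

    top_to_primitive: selected senses -> semantic primitive (e.g. a WN file).
    A sense reachable from two selected senses keeps the first label found.
    """
    hyponyms = {}
    for sense, genus in hypernym.items():
        if genus is not None:
            hyponyms.setdefault(genus, []).append(sense)
    label = {}
    for top, primitive in top_to_primitive.items():
        queue = deque([top])
        while queue:
            sense = queue.popleft()
            if sense in label:  # already labelled; also guards against cycles
                continue
            label[sense] = primitive
            queue.extend(hyponyms.get(sense, []))
    return label
```

In a toy taxonomy where pan ('bread') hangs directly from alimento, selecting only bebida as a FOOD top labels zumo and vino but misses pan, which mirrors the "too specific" problem described above. For DGILE, the footnote's figures are consistent with this split: 111,624 − 832 − 89,458 = 21,334.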
In order to overcome the language gap, we also used a bilingual Spanish/English dictionary.

4.2 Attaching DGILE dictionary senses to semantic primitives

In order to classify all nominal DGILE senses with respect to the WordNet semantic files, we used an approach similar to that suggested by [Yarowsky 92]. This task is divided into three fully automatic consecutive subtasks. First, we tag a subset of the DGILE dictionary senses (only a subset, due to the difference in size between the monolingual and the bilingual dictionaries) by means of a process that uses the conceptual distance formula (see 4.2.1); second, we collect salient words for each semantic file; and third, we enrich each DGILE dictionary sense with a semantic tag by collecting evidence from the previously computed salient words.

4.2.1 Attaching WordNet synsets to DGILE headwords

For each DGILE definition, the conceptual distance between headword and genus has been computed using WN1.5 as a semantic net. We obtained results only for those definitions having English translations (using a bilingual dictionary) for both headword and genus. By computing the conceptual distance between two words (w1, w2), we are also selecting the concepts (c1i, c2j) which represent them and seem to be closest with respect to the semantic net used.

Footnote 9: One could use other semantic classifications, such as Roget's Thesaurus [Yarowsky 92], the LDOCE semantic or pragmatic codes [Slator 91], or, even better, a Spanish semantic classification such as the "Diccionario Ideológico de la Lengua Española J. Casares" (DILEC). In fact, when using this methodology, a minimal set of informed seeds is needed. These seeds can be collected from MRDs, thesauri or even by introspection (see [Yarowsky 95]).
Conceptual distance is computed using formula (1):

$$\mathrm{dist}(w_1, w_2) = \min_{c_{1i} \in w_1,\; c_{2j} \in w_2} \sum_{c_k \in \mathrm{path}(c_{1i}, c_{2j})} \frac{1}{\mathrm{depth}(c_k)} \tag{1}$$

That is, the conceptual distance between two concepts depends on the length of the shortest path¹⁰ that connects them and on the specificity of the concepts in the path.

In this way, we obtained a preliminary version of 29,205 semantically labelled dictionary definitions¹¹ (that is, labelled with WN lexicographer's files) with an accuracy of 64% (61% at the sense level): a corpus (collection of dictionary senses) classified into 24 partitions, each corresponding to a semantic category.

4.2.2 Collecting the salient words for every semantic primitive

We can then collect the salient words (that is, the words representative of a particular category) using a Mutual Information-like formula (2), where w means word and SC semantic class:

$$AR(w, SC) = \Pr(w \mid SC)\,\log_2 \frac{\Pr(w \mid SC)}{\Pr(w)} \tag{2}$$

Intuitively, a salient word¹² appears significantly more often in the context of a semantic category than at other points in the whole corpus, and hence is a better than average indicator for that semantic category. The words selected are those most relevant to the semantic category, where relevance is defined as the product of salience and local frequency. That is to say, important words should be distinctive and frequent.

We performed the training process considering only the content word forms from dictionary definitions¹³, and we discarded the salient words with a negative score.

Footnote 10: We only consider hypo/hypernym relations.
Footnote 11: Due to the different sizes of the dictionaries used, we only compute the conceptual distance for 31% of the noun dictionary senses.
Footnote 12: Instead of word lemmas, this study has been carried out using word forms, because word forms rather than lemmas are representative of typical usages of the sublanguage used in dictionaries.
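Formula (1) from Section 4.2.1 can be sketched on a toy hypernym tree. This is an illustration under explicit assumptions, not the authors' implementation: each concept has a single hypernym (WordNet allows several), the path runs through the lowest common ancestor and includes both endpoints, and the root has depth 1 so that 1/depth is always defined. The concept names and the `senses` map are hypothetical.

```python
def ancestors(c, hypernym):
    """Chain from c up to its root: [c, parent, ..., root]."""
    chain = [c]
    while hypernym.get(chain[-1]) is not None:
        chain.append(hypernym[chain[-1]])
    return chain


def depth(c, hypernym):
    return len(ancestors(c, hypernym))  # assumed: the root has depth 1


def path(c1, c2, hypernym):
    """Concepts on the hypo/hypernym path from c1 to c2 (tree assumed)."""
    up1, up2 = ancestors(c1, hypernym), ancestors(c2, hypernym)
    common = next((c for c in up1 if c in up2), None)  # lowest common ancestor
    if common is None:
        return None  # different roots: no path
    down = list(reversed(up2[:up2.index(common)]))
    return up1[:up1.index(common) + 1] + down


def concept_distance(c1, c2, hypernym):
    p = path(c1, c2, hypernym)
    if p is None:
        return float("inf")
    return sum(1 / depth(c, hypernym) for c in p)


def conceptual_distance(w1, w2, senses, hypernym):
    """Formula (1): minimum over sense pairs; also returns the chosen pair."""
    best = (float("inf"), None)
    for c1 in senses[w1]:
        for c2 in senses[w2]:
            d = concept_distance(c1, c2, hypernym)
            if d < best[0]:
                best = (d, (c1, c2))
    return best
```

Returning the minimizing pair reflects the remark in Section 4.2.1 that computing the distance also selects the concepts that best represent both words.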
Thus, we derived a lexicon of 23,418 salient words (one word can be a salient word for many semantic categories).

4.2.3 Enriching DGILE definitions with WordNet semantic primitives

Using the salient words per category (or semantic class) gathered in the previous step, we labelled the DGILE dictionary definitions again. When any of the salient words appears in a definition, there is evidence that the headword being defined belongs to the category indicated. If several of these words appear, the evidence grows. We add together their weights over all words in the definition and determine the category for which the sum is greatest, using formula (3):

$$W(SC) = \sum_{w \in \mathrm{definition}} AR(w, SC) \tag{3}$$

Thus, we obtained a second semantically labelled version of DGILE. This version has 86,759 labelled definitions (covering more than 93% of all noun definitions) with an accuracy rate of 80% (since the previous labelled version, we have gained 62 percentage points of coverage and 16 points of accuracy).

Although we used the 24 lexicographer's files of WordNet as semantic primitives, a more fine-grained classification could be made. For example, all FOOD synsets are classified under the <food, nutrient> synset in file 13. However, FOOD concepts are themselves classified into 11 subclasses (e.g., <yolk>, <gastronomy>, <comestible, edible, eatable, ...>, etc.).
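Formulas (2) and (3) can be sketched together: estimate AR(w, SC) from a corpus of definitions already labelled in the first pass, keep the positive scores, and relabel a definition by summing AR over its words. This is a minimal sketch, assuming maximum-likelihood probabilities from raw counts and that function words were already removed; the input format and labels are hypothetical.

```python
import math
from collections import Counter


def salient_words(labelled):
    """labelled: list of (semantic_class, [content word forms]) pairs.

    Returns {class: {word: AR}} with AR(w, SC) = Pr(w|SC) * log2(Pr(w|SC) / Pr(w)),
    estimated from raw counts; non-positive scores are discarded.
    """
    by_class, total = {}, Counter()
    for sc, words in labelled:
        by_class.setdefault(sc, Counter()).update(words)
        total.update(words)
    n_total = sum(total.values())
    ar = {}
    for sc, counts in by_class.items():
        n_sc = sum(counts.values())
        for w, c in counts.items():
            p_w_sc = c / n_sc
            score = p_w_sc * math.log2(p_w_sc / (total[w] / n_total))
            if score > 0:
                ar.setdefault(sc, {})[w] = score
    return ar


def label_definition(words, ar):
    """Formula (3): sum AR over the definition's words and pick the best class."""
    scores = {sc: sum(table.get(w, 0.0) for w in words) for sc, table in ar.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

A word spread across categories, such as "agua" in a toy corpus where it is proportionally rarer in FOOD definitions than overall, gets a negative score for FOOD and is dropped there, while it remains salient for the category where it is concentrated.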
Thus, if the LgWN we are planning to build needs to represent <beverage, drink, potable> separately from the concepts <comestible, edible, eatable, ...>, a finer set of semantic primitives should be chosen: for instance, by also considering each direct hyponym of a synset belonging to a semantic file as a new semantic primitive, or even by selecting for each semantic file the level of abstraction we need.

Footnote 13: After discarding functional words.

4.3 Selecting the main top beginners for a semantic primitive

This section is devoted to locating the main top dictionary senses for a given semantic primitive, in order to attach all their subtaxonomies to the correct semantic primitive in the LgWN.

To illustrate this process, we will locate the main top beginners for the FOOD dictionary senses. However, we must consider that many of these top beginners are structured. That is, some of them belong to taxonomies derived from other ones, and therefore cannot be placed directly within the FOOD type. This is the case of vino (wine), which is a zumo (juice): both are top beginners for FOOD, and one is a hyponym of the other.

First, we collect all genus terms from the whole set of DGILE dictionary senses labelled with the FOOD tag in the previous section (2,614 senses), producing a lexicon of 958 different genus terms (only 309, i.e., 32%, appear more than once in the FOOD subset of dictionary senses).

As the automatic dictionary sense labelling is not free of errors (around 80% accuracy)¹⁴, we can discard some senses by using filtering criteria.

• Filter 1 (F1) removes all FOOD genus terms not assigned to the FOOD semantic file during the mapping process between the bilingual dictionary and WN.
• Filter 2 (F2) selects only those genus terms which appear more often as genus terms in the FOOD category than in any other.
That is, those genus terms which appear more frequently in dictionary definitions belonging to other semantic tags are discarded.
• Filter 3 (F3) discards those genus terms which appear with a low frequency as genus terms in the FOOD semantic category. That is, infrequent genus terms (given a certain threshold) are removed. Thus, F3>1 means that the filtering criteria have discarded the genus terms appearing fewer than twice in the FOOD subset of dictionary definitions.

At the same level of genus frequency, filter 2 (removing genus terms which are more frequent in other semantic categories) is more accurate than filter 1 (removing all genus terms whose translation cannot be FOOD). For instance, no error appears when selecting the genus terms which appear 10 or more times (F3) and are more frequent in that category than in any other (F2), discarding only 3% of correct genus terms (see [Rigau et al. 98] for complete figures).

4.4 Automatically building large scale taxonomies from DGILE

The automatic Genus Sense Disambiguation task in DGILE has been performed following [Rigau et al. 97]. This method reports 83% accuracy when selecting the correct hypernym, combining eight different heuristics that use several methods and types of knowledge (two of the heuristics use WN).

Once the main top beginners (relevant genus terms) of a semantic category are selected and every dictionary definition has been disambiguated, we collect all the pairs labelled with the semantic category we are working on that have one of the selected genus terms. Using these pairs, we finally build up the complete taxonomy for a given semantic primitive.

Footnote 14: Most of them are not really errors. For instance, all fishes must be ANIMALs, but some of them are edible (that is, FOODs). Nevertheless, all fishes labelled as FOOD have been considered mistakes.
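The F2 + F3 genus selection of Section 4.3 and the pair collection of Section 4.4 can be sketched as below. This is a minimal sketch under stated assumptions: each labelled definition is reduced to a hypothetical (semantic class, genus term) record, ties under F2 are kept (the text only discards genus terms that are *more* frequent elsewhere), and "F3>k" is expressed as `min_freq = k + 1`, so F2+(F3>9) corresponds to `min_freq=10`. F1 is omitted because it needs the bilingual-to-WN mapping.

```python
from collections import Counter


def select_genus_terms(labelled_genus, category, min_freq):
    """F2 + F3 over (semantic_class, genus_term) records, one per definition."""
    per_genus = {}
    for sc, genus in labelled_genus:
        per_genus.setdefault(genus, Counter())[sc] += 1
    selected = set()
    for genus, counts in per_genus.items():
        in_cat = counts[category]
        # F2: no other category uses this genus term more often (ties kept)
        f2 = all(in_cat >= n for sc, n in counts.items() if sc != category)
        # F3: the genus term is frequent enough inside the category
        f3 = in_cat >= min_freq
        if f2 and f3:
            selected.add(genus)
    return selected


def collect_pairs(disambiguated, category, selected):
    """Keep (sense, genus_sense) pairs of `category` whose genus term was selected.

    disambiguated: (sense, genus_sense, genus_term, semantic_class) tuples
    produced after genus sense disambiguation (hypothetical format).
    """
    return [(sense, genus_sense)
            for sense, genus_sense, genus_term, sc in disambiguated
            if sc == category and genus_term in selected]
```

In a toy run, pescado ('fish') appears as a genus in FOOD definitions but more often in ANIMAL ones, so F2 discards it, in line with footnote 14's treatment of fishes labelled as FOOD.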
That is, in order to build the complete taxonomy for a semantic primitive, we fit the lower senses using the second labelled lexicon and the genus terms selected from this labelled lexicon.

Although both final taxonomic structures are flatter than taxonomies built manually, a few arrangements could be made at the top level of the automatic taxonomies. By studying the main top beginners, we can easily discover an internal structure among them (for FOOD, 18 or 48, depending on the criteria selected).

Performing the process for the whole dictionary, we obtained for F2+(F3>9) a taxonomic structure of 35,099 definitions, and for F2+(F3>4) the size grows to 40,754. Testing the results on FOOD taxonomies, we achieved 99% accuracy with the first criterion and 96% with the second.

5 Extending and Filling Gaps

Up to now we have described a methodology to connect words from a language to a WN skeleton, and another methodology to build taxonomies. Apart from the precision threshold, the words finally connected in the first process follow no other criterion: they are neither the most important concepts nor the topmost or lowermost ones in the hierarchy; the connections are scattered all over the skeleton. The final set of words connected to the skeleton is random, and we do not have any control over it.
