Hu Zhuanglin, Linguistics: A Course Book: Test Questions and Answers
Hu Zhuanglin, Linguistics: A Course Book (Revised Edition): Test Questions

Chapter 1: Introduction to Linguistics

I. Choose the best answer. (20%)
1. Language is a system of arbitrary vocal symbols used for human __________.
A. contact  B. communication  C. relation  D. community
2. Which of the following words is entirely arbitrary?
A. tree  B. typewriter  C. crash  D. bang
3. The function of the sentence "Water boils at 100 degrees Centigrade." is __________.
A. interrogative  B. directive  C. informative  D. performative
4. In Chinese, when someone breaks a bowl or a plate, the host or the people present are likely to say "碎碎(岁岁)平安" (a pun: "broken" sounds like "year", giving "peace year after year") as a means of controlling the forces which they believe might affect their lives. Which function does it perform?
A. Interpersonal  B. Emotive  C. Performative  D. Recreational
5. Which of the following properties of language enables language users to overcome the barriers caused by time and place, so that speakers of a language are free to talk about anything in any situation?
A. Transferability  B. Duality  C. Displacement  D. Arbitrariness
6. Study the following dialogue. What function does it play according to the functions of language?
- A nice day, isn't it?
- Right! I really enjoy the sunlight.
A. Emotive  B. Phatic  C. Performative  D. Interpersonal
7. __________ refers to the actual realization of the ideal language user's knowledge of the rules of his language in utterances.
A. Performance  B. Competence  C. Langue  D. Parole
8. When a dog is barking, you assume it is barking for something or at someone that exists here and now. It couldn't be sorrowful for some lost love or lost bone. This indicates the design feature of __________.
A. cultural transmission  B. productivity  C. displacement  D. duality
9. __________ answers such questions as how we as infants acquire our first language.
A. Psycholinguistics  B. Anthropological linguistics  C. Sociolinguistics  D. Applied linguistics
10. __________ deals with language application to other fields, particularly education.
A. Linguistic theory  B. Practical linguistics  C. Applied linguistics  D. Comparative linguistics

II. Decide whether the following statements are true or false. (10%)
11. Language is a means of verbal communication. Therefore, the means of communication used by the deaf-mute is not language.
12. Language change is universal, ongoing and arbitrary.
13. Speaking is the quickest and most efficient of the human communication systems.
14. Language is written because writing is the primary medium for all languages.
15. We were all born with the ability to acquire language, which means the details of any language system can be genetically transmitted.
16. Only human beings are able to communicate.
17. De Saussure, who made the distinction between langue and parole in the early 20th century, was a French linguist.
18. A study of the features of the English used in Shakespeare's time is an example of the diachronic study of language.
19. Speech and writing came into being at much the same time in human history.
20. All the languages in the world today have both spoken and written forms.

III. Fill in the blanks. (10%)
21. Language, broadly speaking, is a means of __________ communication.
22. In any language words can be used in new ways to mean new things and can be combined into innumerable sentences based on limited rules. This feature is usually termed __________.
23. Language has many functions. We can use language to talk about itself. This function is __________.
24. The theory that primitive man made involuntary vocal noises while performing heavy work has been called the __________ theory.
25. Linguistics is the __________ study of language.
26. Modern linguistics is __________ in the sense that the linguist tries to discover what language is rather than lay down some rules for people to observe.
27. One general principle of linguistic analysis is the primacy of __________ over writing.
28. The description of a language as it changes through time is a __________ study.
29. Saussure put forward two important concepts. __________ refers to the abstract linguistic system shared by all members of a speech community.
30. Linguistic potential is similar to Saussure's langue and Chomsky's __________.

IV. Explain the following terms, using examples. (20%)
31. Design feature
32. Displacement
33. Competence
34. Synchronic linguistics

V. Answer the following questions. (20%)
35. Why do people take duality as one of the important design features of human language? Can you tell us what language would be if it had no such design feature? (Nankai University, 2004)
36. Why is it difficult to define language? (Beijing International Studies University, 2004)

VI. Analyze the following situation. (20%)
37. How can a linguist make his analysis scientific? (Ocean University of Qingdao, 1999)

Chapter 2: Speech Sounds

I. Choose the best answer. (20%)
1. Pitch variation is known as __________ when its patterns are imposed on sentences.
A. intonation  B. tone  C. pronunciation  D. voice
2. Conventionally a __________ is put in slashes (/ /).
A. allophone  B. phone  C. phoneme  D. morpheme
3. An aspirated p, an unaspirated p and an unreleased p are __________ of the p phoneme.
A. analogues  B. tagmemes  C. morphemes  D. allophones
4. The opening between the vocal cords is sometimes referred to as __________.
A. glottis  B. vocal cavity  C. pharynx  D. uvula
5. The diphthongs that are made with a movement of the tongue towards the center are known as __________ diphthongs.
A. wide  B. closing  C. narrow  D. centering
6. A phoneme is a group of similar sounds called __________.
A. minimal pairs  B. allomorphs  C. phones  D. allophones
7. Which branch of phonetics concerns the production of speech sounds?
A. Acoustic phonetics  B. Articulatory phonetics  C. Auditory phonetics  D. None of the above
8. Which one is different from the others according to places of articulation?
A. [n]  B. [m]  C. [b]  D. [p]
9. Which vowel is different from the others according to the characteristics of vowels?
A. [i:]  B. [u]  C. [e]  D. [i]
10. What kind of sounds can we make when the vocal cords are vibrating?
A. Voiceless  B. Voiced  C. Glottal stop  D. Consonant

II. Decide whether the following statements are true or false.
(10%)11.Suprasegmental phonology refers to the study of phonological properties of units larger than the segment-phoneme, such as syllable, word and sentence.12.The air stream provided by the lungs has to undergo a number of modification to acquire the quality of aspeech sound.13. Two sounds are in free variation when they occur in the same environment and donot contrast, namely,the substitution of one for the other does not produce adifferent word, but merely a different pronunciation.14. [p] is a voiced bilabial stop.15. Acoustic phonetics is concerned with the perception of speech sounds.16.All syllables must have a nucleus but not all syllables contain an onset and a coda.17.When pure vowels or monophthongs are pronounced, no vowel glides take place.18.According to the length or tenseness of the pronunciation, vowels can be divided into tense vs. lax orlong vs. short.19.Received Pronunciation is the pronunciation accepted by most people.20.The maximal onset principle states that when there is a choice as to where toplace a consonant, it is put into the coda rather than the onset.III.Fill in the blanks. (20%)21.Consonant sounds can be either __________ or __________, while all vowelsounds are __________.22. Consonant sounds can also be made when two organs of speech in the mouth are brought close together so that the air is pushed out between them, causing__________.23. The qualities of vowels depend upon the position of the __________ and the lips.24.One element in the description of vowels is the part of the tongue which is at the highest point in the mouth. A second element is the __________ to which that part of the tongue is raised.25.Consonants differ from vowels in that the latter are produced without__________.26.In phonological analysis the words fail / veil are distinguishable simplybecause of the two phonemes /f/ - /v/. 
This is an example for illustrating__________.27.In English there are a number of __________, which are produced by movingfrom one vowel position to another through intervening positions.28. __________ refers to the phenomenon of sounds continually show the influenceof their neighbors.29.__________ is the smallest linguistic unit.30.Speech takes place when the organs of speech move to produce patterns of sound. These movements have an effect on the __________ coming from the lungs.IV. Explain the following terms, using examples. (20%)31.Sound assimilation32.Suprasegmental feature33. Complementary distribution34.Distinctive featuresV. Answer the following questions. (20%)35. What is acoustic phonetics?(中国人民大学,2003)36.What are the differences between voiced sounds and voiceless sounds in terms of articulation?(南开 04)VI. Analyze the following situation. (20%)37.Write the symbol that corresponds to each of the following phonetic descriptions; then give an English word that contains this sound. Example: voiced alveolar stop [d] dog. (青岛海洋大学, 1999 )(1)voiceless bilabial unaspirated stop(2)low front vowel(3)lateral liquid(4)velar nasal(5)voiced interdental fricative第三章:词汇I. Choose the best answer. (20%)1. Nouns, verbs and adjectives can be classified as __________.A. lexical wordsB. grammatical wordsC. functionwords D. form words2. Morphemes that represent tense, number, gender and case are called __________ morpheme.A. inflectional C. boundB. freeD. derivational3. There are __________ morphemes in the word denationalization.A. threeB. fourC. fiveD. six4.In English –ise and –tion are called __________.A. prefixesB. suffixesC. infixesD. stems5. The three subtypes of affixes are: prefix, suffix and__________. A. derivational affix B. inflectional affix C. infixD. back-formation6. __________ is a way in which new words may be formed from already existing words by subtracting an affix which is thought to be part of the old word.A. affixationB. back-formationC. insertionD. 
addition7.The word TB is formed in the way of __________.A. acronymyB. clippingC. initialismD. blending8. The words like comsat and sitcom are formed by __________. A.blending B. clipping C. back-formation D. acronymy9. The stem of disagreements is __________A. agreementB. agreeC. disagreeD. disagreement10.All of them are meaningful except for __________.A. lexemeB. phonemeC. morphemeD. allomorphII. Decide whether the following statements are true or false. (10%)11. Phonetically, the stress of a compound always falls on the first element, while the second element receives secondary stress.12.Fore as in foretell is both a prefix and a bound morpheme.13. Base refers to the part of the word that remains when all inflectionalaffixes are removed.14.In most cases, prefixes change the meaning of the base whereas suffixes change the word-class of the base.15.Conversion from noun to verb is the most productive process of a word.16. Reduplicative compound is formed by repeating the same morpheme of a word.17. The words whimper, whisper and whistle are formed in the way of onomatopoeia.18. In most cases, the number of syllables of a word corresponds to the number of morphemes.19. Back-formation is a productive way of word-formations.20. Inflection is a particular way of word-formations.III. Fill in the blanks. (20%)21. An __________ is pronounced letter by letter, while an __________ is pronounced as a word.22. Lexicon, in most cases, is synonymous with __________.23.Orthographically, compounds are written in three ways: __________, __________ and __________.24. All words may be said to contain a root __________.25. 
A small set of conjunctions, prepositions and pronouns belong to __________ class, while the largest part of nouns, verbs, adjectives and adverbs belongs to __________ class.26.__________ is a reverse process of derivation, and therefore is a processof shortening.27.__________ is extremely productive, because English had lost most of its inflectional endings by the end of Middle English period, which facilitated the use of words interchangeably as verbs or nouns, verbs or adjectives, and vice versa.28.Words are divided into simple, compound and derived words on the __________ level.29. A word formed by derivation is called a __________, and a word formed by compounding is called a__________.30.Bound morphemes are classified into two types: __________ and __________.IV. Explain the following terms, using examples. (20%)31.Blending32.Allomorph33.Closed-class word34. Morphological ruleV. Answer the following questions. (20%)35.How many types of morphemes are there in the English language? What are they?(厦门大学, 2003 )36. What are the main features of the English compounds?VI. Analyze the following situation. (20%)37. Match the terms under COLUMN I with the underlined forms from COLUMN II(武汉大学, 2004 )I II(1)acronym a. foe(2)free morpheme b. subconscious(3)derivational morpheme c. UNESCO(4)inflectional morpheme d. overwhelmed(5)prefix e. calculation第四章:句法I. Choose the best answer. (20%)1.The sentence structure is ________.A. only linearB. only hierarchicalC. complexD. both linear and hierarchical2.The syntactic rules of any language are ____ in number.A. largeB. smallC. finiteD. infinite3.The ________ rules are the rules that group words and phrases to form grammatical sentences.A. lexicalB. morphologicalC. linguisticD. combinational4.A sentence is considered ____ when it does not conform to the grammatical knowledge in the mind of native speakers.A. rightB. wrongC. grammaticalD. 
ungrammatical5.A __________ in the embedded clause refers to the introductory word that introduces the embedded clause.A. coordinatorB. particleC. prepositionD. subordinator6. Phrase structure rules have ____ properties.A. recursiveB. grammaticalC. socialD. functional7.Phrase structure rules allow us to better understand _____________.A.how words and phrases form sentences.B.what constitutes the grammaticality of strings of wordsC.how people produce and recognize possible sentencesD.all of the above.8. The head of the phrase “ the city RomeA. the cityB. RomeC.city ” is __________.D. the city Rome9.The phrase “ on the shelflongs”tobe__________ construction.A. endocentricB. exocentricC. subordinateD. coordinate10.The sentence “ They were wanted to remain quiet and not to expose themselves. is a __________sentence.A. simpleB. coordinateC. compoundD. complexWORD格式II. Decide whether the following statements are true or false. (10%)11.Universally found in the grammars of all human languages, syntactic rules that comprise the system of internalized linguistic knowledge of a language speaker are known as linguistic competence.12.The syntactic rules of any language are finite in number, but there is nolimit to the number of sentences native speakers of that language are able toproduce and comprehend.13.In a complex sentence, the two clauses hold unequal status, one subordinatingthe other.14.Constituents that can be substituted for one another without loss ofgrammaticality belong to the same syntactic category.15. 
Minor lexical categories are open because these categories are not fixed and new members are allowed for.16.In English syntactic analysis, four phrasal categories are commonly recognizedand discussed, namely, noun phrase, verb phrase, infinitive phrase, and auxiliary phrase.17.In English the subject usually precedes the verb and the direct objectusually follows the verb.18.What is actually internalized in the mind of a native speaker is a completelist of words and phrases rather than grammatical knowledge.19.A noun phrase must contain a noun, but other elements are optional.20.It is believed that phrase structure rules, with the insertion of the lexicon, generate sentences at the level of D-structure.III. Fill in the blanks. (20%)21.A __________ sentence consists of a single clause which contains a subject and a predicate and stands alone as its own sentence.22.A __________ is a structurally independent unit that usually comprises a numberof words to form a complete statement, question or command.23.A __________ may be a noun or a noun phrase in a sentence that usually precedes the predicate.24.The part of a sentence which comprises a finite verb or a verb phrase and which says something about the subject is grammatically called __________.25.A __________ sentence contains two, or more, clauses, one of which isincorporated into the other.26.In the complex sentence, the incorporated or subordinate clause is normallycalled an __________clause.27.Major lexical categories are __________ categories in the sense that new wordsare constantly added.28.__________ condition on case assignment states that a case assignor and a case recipient should stay adjacent to each other.29.__________ are syntactic options of UG that allow general principles to operatein one way or another and contribute to significant linguistic variations between andWORD格式among natural languages.30. 
The theory of __________ condition explains the fact that noun phrases appearonly in subject and object positions.IV. Explain the following terms, using examples. (20%)31.Syntax32.IC analysis33. Hierarchical structure34. Trace theoryV. Answer the following questions. (20%)35.What are endocentric construction and exocentric construction?(武汉大学,2004)36. Distinguish the two possible meanings of “ more beautiful flowers ”by means IC analysis. (北京二外国语大学,2004)VI. Analyze the following situation. (20%)37.Draw a tree diagram according to the PS rules to show the deep structure of thesentence:The student wrote a letter yesterday.第五章:意义I. Choose the best answer. (20%)1. The naming theory is advanced by ________.A. PlatoB. BloomfieldC. Geoffrey LeechD. Firth2. “We shall know a word by the company it keeps. ” This statement represents _______.A. the conceptualist viewB. contexutalismC. the naming theoryD. behaviorism3.Which of the following is NOT true?A.Sense is concerned with the inherent meaning of the linguistic form.B.Sense is the collection of all the features of the linguistic form.C.Sense is abstract and decontextualized.D.Sense is the aspect of meaning dictionary compilers are not interested in.4.“Can I borrow your bike?A. is synonymous with C.entails ” _______ “ You have a bike. ”B. is inconsistent withD. presupposes5. ___________ is a way in which the meaning of a word can be dissected into meaning components,called semantic features.A. Predication analysisB. Componential analysisC. Phonemic analysisD. Grammatical analysis6.“Alive”and “ dead” are ______________.A. gradable antonymsB. relational antonymsC. complementary antonymsD. None of the above7. _________ deals with the relationship between the linguistic element and thenon-linguistic world of experience.A. ReferenceB. ConceptC. SemanticsD. Sense8.___________ refers to the phenomenon that words having different meanings have the same form.A. PolysemyB. SynonymyC. HomonymyD. Hyponymy9. 
Words that are close in meaning are called ______________.A. homonymsB. polysemiesC. hyponymsD. synonyms10.The grammaticality of a sentence is governed by _______.A. grammatical rulesB. selectional restrictionsC. semanticrules D. semantic featuresII.Decide whether the following statements are true or false. (10%)11. Dialectal synonyms can often be found in different regional dialects such as British English and American English but cannot be found within the varietyitself, for example, within British English or American English.12. Sense is concerned with the relationship between the linguistic element andthe non-linguistic world of experience, while the reference deals with theinherent meaning of the linguistic form.13. Linguistic forms having the same sense may have different references indifferent situations.14. In semantics, meaning of language is considered as the intrinsic andinherent relation to the physical world of experience.15. Contextualism is based on the presumption that one can derive meaning from or reduce meaning to observable contexts.16. Behaviorists attempted to define the meaning of a language form as thesituation in which the speaker utters it and the response it calls forth in the hearer.17. The meaning of a sentence is the sum total of the meanings of allits components.18. Most languages have sets of lexical items similar in meaning but rankeddifferently according to their degree of formality.19.“It is hot. ”-placeisnopredication because it contains no argument.20.In grammatical analysis, the sentence is taken to be the basic unit, but in semantic analysis of a sentence, the basic unit is predication, which is the abstraction of the meaning of a sentence.III. Fill in the blanks. (20%)21. __________ can be defined as the study of meaning.22. 
The conceptualist view holds that there is no __________ link between alinguistic form and what it refers to.23.__________ means what a linguistic form refers to in the real, physical world;it deals with the relationship between the linguistic element and the non-linguistic world of experience.24. Words that are close in meaning are called __________.25. When two words are identical in sound, but different in spelling and meaning, they are called__________.26.__________ opposites are pairs of words that exhibit the reversal of a relationship between the two items.27.__________ analysis is based upon the belief that the meaning of a word can be divided into meaning components.28. Whether a sentence is semantically meaningful is governed by rules called__________ restrictions, which are constraints on what lexical items can go with what others.29. A(n) __________ is a logical participant in a predication, largely identical with the nominal element(s)in a sentence.30.According to the __________ theory of meaning, the words in a language aretaken to be labels of the objects they stand for.IV. Explain the following terms, using examples. (20%)31.Entailment32.Propositionponential analysis34.ReferenceV. Answer the following questions. (20%)35. What are the sense relations between the following groups of words?Dogs, cats, pets, parrots; trunk, branches, tree, roots(青岛海洋大学,1999 )36.What are the three kinds of antonymy?(武汉大学, 2004 )VI. Analyze the following situation. (20%)37. For each group of words given below, state what semantic property or properties are shared by the (a) words and the (b) words, and what semantic property or properties distinguish between the classes of (a)words and (b) words.(1) a. bachelor, man, son, paperboy, pope, chiefb. bull, rooster, drake, ram(2) a. table, stone, pencil, cup, house, ship, carb. milk, alcohol, rice, soup(3) a. book, temple, mountain, road, tractorb. 
idea, love, charity, sincerity, bravery, fear (青岛海洋大学,1999)第七章:语言、文化和社会[注:第六章无测试题]I.Choose the best answer. (20%)1._______ is concerned with the social significance of language variation and language use in different speech communities.A. PsycholinguisticsB. SociolinguisticsC. Applied linguisticsD. General linguistics2.The most distinguishable linguistic feature of a regional dialect is its __________.A. use of wordsB. use of structuresC. accentD.morphemes3. __________ is speech variation according to the particular area where a speaker comes from.A. Regional variation C. Social variationB. Language variation D. Register variation4._______ are the major source of regional variation of language.A. Geographical barriersB. Loyalty to and confidence in one ’ s native speechC. Physical discomfort and psychological resistance to changeD. Social barriers5. _________ means that certain authorities, such as the government choose, a particular speech variety, standardize it and spread the use of it across regional boundaries.A. Language interference C. Language planningB. Language changes D. Language transfer6._________ in a person ’ s speech or writing usually ranges on a continuum from casual or colloquial to formal or polite according to the type of communicative situation.A. Regional variationB. Changes in emotionsC. Variation in connotationsD. Stylistic variation7.A ____ is a variety of language that serves as a medium of communication among groups of people for diverse linguistic backgrounds.A. lingua francaB. registerC. CreoleD. national language8.Although _______ are simplified languages with reduced grammatical features, they are rule-governed, like any human language.A. vernacular languagesB. creolesC. pidginsD. sociolects9.In normal situations, ____ speakers tend to use more prestigious forms than their____ counterparts with the same social background.A. female; maleB. male; femaleC. old; youngD. 
young; old10.A linguistic _______ refers to a word or expression that is prohibited bythe “ polite ” society generalfrom use.A. slangB. euphemismC. jargonD. tabooII.Decide whether the following statements are true or false. (10%)11. Language as a means of social communication is a homogeneous system with a homogeneous group of speakers.12. The goal of sociolinguistics is to explore the nature of language variation and language use among a variety of speech communities and in different social situations. 13. From the sociolinguistic perspective, the term“speechotbevarietyused to”can n refer to standard language, vernacular language, dialect or pidgin.14. The most distinguishable linguistic feature of a regional dialect is its grammarand uses of vocabulary.15. A person’s social backgrounds do not exert a shaping influence on his choiceof linguistic features.16. Every speaker of a language is, in a stricter sense, a speaker of a distinct idiolect.17.A lingua franca can only be used within a particular country forcommunication among groups of people with different linguistic backgrounds.18.A pidgin usually reflects the influence of the higher, or dominant, language inits lexicon and that of the lower language in their phonology and occasionallysyntax.19. Bilingualism and diglossia mean the same thing.20. The use of euphemisms has the effect of removing derogatory overtones and the disassociative effect as such is usually long-lasting.III. Fill in the blanks. (20%)21. The social group isolated for any given study is called the speech __________.22. 
Speech __________ refers to any distinguishable form of speech used by aspeaker or group of speakers.23.From the sociolinguistic perspective, a speech variety is no more than a__________ variety of a language.nguage standardization is also called language __________.25.Social variation gives rise to __________ which are subdivisible intosmaller speech categories that reflect their socioeconomic, educational,occupational background, etc.26. __________ variation in a person ’ s speech or writing usuallyon range continuum from casual or colloquial to formal or polite according to the typeof communicative situation.27. A regional dialect may gain status and become standardized as the national or。
Linguistics: A Course Book, 02 Chapter 2_sound (2)
If the sound becomes more like the following sound, as in the case of lamb, it is known as anticipatory coarticulation. If the sound shows the influence of the preceding sound, it is perseverative coarticulation, as in the case of map.
In phonetic terms, phonemic transcriptions represent the 'broad' transcriptions.
3.3 Allophones
Allophones: the different phones which can represent a phoneme in different phonetic contexts.
Velarization: clear l and dark l
/l/ → [l] / _____ V
/l/ → [ɫ] / V _____
Think about tell and telling!
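The clear l / dark l distribution can be sketched as a small rule function. This is only an illustration under stated assumptions: the `VOWELS` set is a simplified, hypothetical inventory, words are given as hand-segmented lists, and the rule is read as "clear before a vowel, dark elsewhere", which is how the tell / telling contrast is usually explained.

```python
# Hypothetical, simplified vowel inventory used only for this illustration.
VOWELS = {"i:", "ɪ", "e", "æ", "ɑ:", "ɒ", "ɔ:", "ʊ", "u:", "ʌ", "ɜ:", "ə"}


def realize_l(segments):
    """Render /l/ as clear [l] before a vowel and as dark [ɫ] elsewhere."""
    out = []
    for i, seg in enumerate(segments):
        if seg == "l":
            next_is_vowel = i + 1 < len(segments) and segments[i + 1] in VOWELS
            out.append("l" if next_is_vowel else "ɫ")
        else:
            out.append(seg)
    return out


print(realize_l(["t", "e", "l"]))            # tell: /l/ is word-final
print(realize_l(["t", "e", "l", "ɪ", "ŋ"]))  # telling: /l/ precedes a vowel
```

In tell the /l/ follows a vowel and nothing comes after it, so the sketch yields the dark variant; in telling the same /l/ now precedes a vowel, so the clear variant appears.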
Phonetic similarity: the allophones of a phoneme must bear some phonetic resemblance.
The word 'phoneme' simply refers to a 'unit of explicit sound contrast': the existence of a minimal pair automatically grants phonemic status to the sounds responsible for the contrasts.
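The minimal-pair test can be made concrete with a short sketch. It assumes words are already segmented into lists of sounds (the segmentation and the function names are illustrative, not part of the slides) and treats two words as a minimal pair when they have the same length and differ in exactly one position.

```python
def is_minimal_pair(a, b):
    """True if two segment lists have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def contrasting_sounds(a, b):
    """Return the pair of sounds responsible for the contrast, or None if not a minimal pair."""
    if not is_minimal_pair(a, b):
        return None
    return next((x, y) for x, y in zip(a, b) if x != y)


fail = ["f", "eɪ", "l"]
veil = ["v", "eɪ", "l"]
print(contrasting_sounds(fail, veil))
```

For fail and veil the only differing position is the first, so the sounds returned are f and v, which is the evidence for treating /f/ and /v/ as separate phonemes.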
A New Concise Course in Linguistics for Students of English (Dai Weidong edition), Units 1-6: Final Exam Notes
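● Linguists:
1. F. de Saussure (p. 4): Swiss linguist. He distinguished langue from parole in the early 20th century and wrote Course in General Linguistics. He stressed the study of langue (what the linguist should do is abstract langue from parole).
2. N. Chomsky: American linguist. He distinguished competence from performance in the late 1950s and stressed the study of competence, which is where he resembles Saussure.
● How Saussure and Chomsky differ: Saussure takes a sociological view, and his notion of language belongs to the realm of social conventions; Chomsky takes a psychological view and regards linguistic competence as a property of the mind of each individual.
3. Modern linguistics is mostly descriptive; traditional grammar is prescriptive.
4. In modern linguistics, synchronic study takes priority.

Phonetics and Phonology
● Speech organs (cavities): 1. pharyngeal cavity 2. oral cavity 3. nasal cavity
● Speech and writing are the two media or substances of natural language (speech is more basic than writing).
● From which three angles does phonetics study speech sounds?
(1) The speaker's angle: articulatory phonetics (the longest-established branch)
(2) The hearer's angle: auditory phonetics
(3) The way sounds are transmitted: acoustic phonetics
● Sounds are now mainly transcribed in IPA, but linguists also use narrow transcription. The book gives two examples: [l] in leap, feel, health; [p] in pit, spit (aspirated vs. unaspirated), with a superscript h marking aspiration.
● Classification of speech sounds: vowels (voiced sounds) and consonants (voiced or voiceless).
● Classification of vowels:
(1) By which part of the tongue is highest: front, central, back.
(2) By the openness of the mouth: close, semi-close, semi-open, open vowels.
(3) By lip shape: unrounded (all front and central vowels, plus [a:]) and rounded (the remaining back vowels).
● Segment and syllable: to count segments, count the vowels and consonants; to count syllables, count the vowels.
● The difference between phonetics and phonology:
(1) A phonetician is interested in how [l] is pronounced: clear l versus dark l.
(2) A phonologist is interested in the distribution of /l/, i.e. in which position each variant occurs. After a vowel or before a consonant, /l/ is dark, as in feel and quilt; before a vowel, /l/ is clear, as in leap.
Note: Phonology is concerned with the sound system of a particular language. Linguistics is the scientific study of human languages in general.

I. Distinguishing phone, phoneme and allophone
● Phone:
(1) The words feel [fi:ɫ], leaf [li:f], tar [tʰa:] and star [sta:] contain eight phones in total: [f], [i:], [ɫ], [l], [tʰ], [t], [s], [a:].
(2) English has 48 phones: 20 vowels and 28 consonants.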
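The segment / syllable counting rule from these notes can be sketched as follows. The sketch assumes hand-segmented input, a hypothetical and incomplete `VOWELS` set in which long vowels and diphthongs each count as one segment, and it ignores syllabic consonants.

```python
# Hypothetical, incomplete vowel inventory; diphthongs count as single segments.
VOWELS = {"i:", "ɪ", "e", "æ", "a:", "ɒ", "ɔ:", "ʊ", "u:", "ʌ", "ɜ:", "ə", "eɪ", "aɪ", "əʊ"}


def count_segments(segments):
    """Number of segments: every vowel and consonant counts once."""
    return len(segments)


def count_syllables(segments):
    """Number of syllables, approximated as the number of vowel segments."""
    return sum(seg in VOWELS for seg in segments)


star = ["s", "t", "a:"]
telling = ["t", "e", "l", "ɪ", "ŋ"]
print(count_segments(star), count_syllables(star))
print(count_segments(telling), count_syllables(telling))
```

Under these assumptions star has three segments and one syllable, while telling has five segments and two syllables.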
Festival Multisyn Voices for the 2007 Blizzard Challenge
Korin Richmond, Volker Strom, Robert Clark, Junichi Yamagishi and Sue Fitt
Centre for Speech Technology Research
University of Edinburgh, Edinburgh, United Kingdom
(korin|vstrom|robert|jyamagis|sue)@

Abstract

This paper describes selected aspects of the Festival Multisyn entry to the Blizzard Challenge 2007. We provide an overview of the process of building the three required voices from the speech data provided. This paper focuses on new features of Multisyn which are currently under development and which have been employed in the system used for this Blizzard Challenge. These differences are the application of a more flexible phonetic lattice representation during forced alignment labelling and the use of a pitch accent target cost component. Finally, we also examine aspects of the speech data provided for this year's Blizzard Challenge and raise certain issues for discussion concerning the aim of comparing voices made with differing subsets of the data provided.

1. Introduction

Multisyn is a waveform synthesis module which has recently been added to the Festival speech synthesis system [1]. It provides a flexible, general implementation of unit selection and a set of associated voice building tools. Strong emphasis is placed on flexibility as a research tool on one hand, and a high level of automation using default settings during "standard" voice building on the other.

This paper accompanies the Festival Multisyn entry to the Blizzard Challenge 2007. Similar to the Blizzard Challenges of the previous two years ([2, 3]), the 2007 Blizzard Challenge required entrants to build three voices from the speech data provided by speaker "EM001", then submit a set of synthesised test sentences for evaluation. The first voice, labelled voice "A", used the entire voice database. Two smaller voices, "B" and "C", used subsections of the database. Voice "B" used the set of sentences from the ARCTIC database [4] which were recorded by the EM001 speaker. For
voice "C", entrants were invited to perform their own text selection on the voice database prompts to select a subset of sentences no larger than the ARCTIC data set in terms of total duration of speech in seconds. Voices "B" and "C" are intended as a means to compare different text selection algorithms, as well as to evaluate the performance of synthesis systems when using more limited amounts of speech data.

Multisyn and the process of building voices for Multisyn are described in detail in [1]. In addition, entrants to the Blizzard Challenge this year have been asked to provide a separate system description in the form of a template questionnaire. For the reader's convenience this paper will provide a brief overview of Multisyn and the voices built. To limit redundancy, however, we will not repeat all details comprehensively. Instead, we aim to focus here on areas where the use of Multisyn differs from [1]. Those significant differences are two-fold. First, we will introduce a new technique we have been developing to help in forced alignment labelling. Next, we describe a target cost component which uses a simple pitch accent prediction model. Finally, we will discuss our experience of building voice "C", and highlight some issues we believe may complicate comparison of entrants' voices "B" and "C".

2. Multisyn voice building

We use our own Unisyn lexicon and phone set [5], so only used the prompts and associated wave files from the distributed data, performing all other processing for voice building from scratch.
The first step of voice building involved some brief examination of the text prompts to find missing words and add some of them to our lexicon, fix gross text normalisation problems and so on. Next, we used an automatic script to reduce the duration of any single silence found in a wave file to a maximum of 50 msec. From this point, the process for building Multisyn voices “A”, “B” and “C” described in the remainder of this section was repeated separately for the relevant utterance subset for each voice.

We used HTK tools in a scripted process to perform forced alignment using frames of 12 MFCCs plus log energy (utterance based energy normalisation switched off) computed with a 10 msec window and 2 msec frame shift. The process began with single mixture monophone models with three emitting states, trained from a “flat start”. Initial labelling used a single phone sequence predicted by the Festival Multisyn front end. However, as the process progressed with further iterations of reestimation, realignment, mixing up, adding a short pause tee model, and so on, we switched to using a phone lattice for alignment, as described in Section 3. Once labelling was completed, we used it to perform a waveform power factor normalisation of all waveforms in the database. This process looks at the energy in the vowels of each utterance to compute a single factor to scale its waveform. The power normalised waveforms were then used throughout the remainder of the voice building process, which began with repeating the whole labelling process.

Once the labelling had been completed, it was used to build utterance structures (see footnote 1), which are used as part of the internal representation within a final Multisyn voice. At this stage, the text prompts were run through a simple pitch accent prediction model (see Section 4), and this information was stored in the utterance structures. Additional information was also added to the utterance structures at this stage; for example, phones with a duration more than 2 standard deviations from the mean
were flagged. Such information could be used later at unit selection time in the target cost function.

In addition to labelling and linguistic information stored in utterance files, Multisyn requires join cost coefficients and RELP synthesis parameters. To create the synthesis parameters, we first performed pitchmarking using a custom script which makes use of Entropic’s epochs, get_resid, get_f0 and refcof programs. We then used the sig2fv and sigfilter programs from the Edinburgh Speech Tools for LPC analysis and residual signal generation respectively. The Multisyn join cost uses three equally weighted components: spectral, f0 and log energy. The spectral and log energy join cost coefficients were taken from the MFCC files calculated by HTK’s HCopy used for labelling. The f0 contours were provided by the ESPS program get_f0. All three of these feature streams were globally normalised and saved in the appropriate voice data structure.

Footnote 1: A data structure defined in the Edinburgh Speech Tools library.

During unit selection, Multisyn does not use any acoustic prosodic targets in terms of pitch or duration. Instead, the target cost is a weighted normalised sum of a series of components which consider the following: lexical stress, syllable position, word position, phrase position, part of speech, left and right phonetic context, “bad duration” and “bad f0”. As mentioned above, “bad duration” is a flag which is set on a phone within a voice database utterance during voice building and suggests a segment should not be used. Similarly, the “bad f0” target cost component looks at a candidate unit’s f0 at concatenation points, considering voicing status rather than a specific target f0 value. We have also used an additional target cost component for the presence or absence of a pitch accent on a vowel. This is described further in Section 4.

Finally, we stress that during concatenation of the best candidate unit sequence, Multisyn does not currently employ any signal processing apart from a simple overlap-add
windowing at unit boundaries. No prosodic modification of candidate units is attempted and no spectral, amplitude or f0 interpolation is performed across concatenation boundaries.

3. Finite state phonetic lattice labelling

For all three voices for this Blizzard Challenge we employed a forced alignment system we have been developing which makes use of a finite state representation of the predicted phonetic realisation of the recorded prompts. The advantage of the finite state phonetic representation is that it makes it possible to elegantly encode and process a wide variety of pronunciation variation during labelling of speech data. In the following two sections we first give a general introduction to how our phonetic lattice labelling works, and then give some more specific details of how the system was applied to building voices for this Blizzard Challenge.

3.1. General implementation

If we consider how forced alignment is standardly performed using HTK, for example, the user is required to provide, among other things, a pronunciation lexicon and word level transcription. The pronunciation lexicon contains a mapping between a given word and a corresponding sequence of phone model labels. During forced alignment, the HTK recognition engine loads the word level transcription and expands this into a recognition network, or “lattice”, of phone models using the pronunciation dictionary. This lattice is then used to align against the sequence of acoustic parameter vectors. The predominant way to include pronunciation variation within this system is to use multiple entries in the lexicon for the same word. This approach generally suits speech recognition, but in the case of labelling for building a unit selection voice, we could perhaps profit from more flexibility. Complete flexibility is achieved if we compose the phone lattice directly and pass that to the recognition engine.

To build the phone lattice for a given prompt sentence, we first look up each word in the lexicon and convert the phone string to a
simple finite state structure. When a word is not found in the lexicon, we use the CART letter-to-sound rules the final Festival voice would use to generate a phone string. Where multiple pronunciations for a word are found, we can combine these into a single finite state representation using the union operation. The finite state machines for the separate words are then concatenated in sequence to give a single representation of the sentence. The top finite state acceptor (FSA) in Figure 1 gives a simplified example of the result of this process for a phrase fragment “...wider economic...”.

At this stage, there is little advantage over the standard HTK method, which would internally arrive at the same result. However, once we have a predicted phonetic realisation for a recording prompt in a finite state form, it is then straightforward to process this representation further in an elegant and robust way. This is useful to help perform simple tasks, such as splitting stops and affricates into separate symbols for their stop and release parts during forced alignment (done to identify a suitable concatenation point). More significantly, though, we can also robustly apply more complex context dependent postlexical rules, for example optional “r” epenthesis intervocalically across word boundaries for certain British English accents. This is indicated in the bottom FSA of Figure 1.

This may be conveniently achieved by writing rules in the form of context dependent regular expressions. It is then possible to automatically compile these rules into an equivalent finite state transducer which can operate on the input lattice which resulted from lexical lookup (e.g. top FSA in Figure 1). Several variations of compilation methods have been previously described to convert a system of handwritten context dependent mapping rules into an equivalent FST machine to perform the transduction, e.g. [6, 7, 8]. Note that the use of context dependent modifications is more flexible and powerful than the standard HTK
methods. For example, a standard way to implement optional “r” epenthesis pronunciation variation using a pronunciation lexicon alone would be to include multiple entries for “wider”, one of which contains the additional “r”. However, this introduces a number of problems. The most significant problem is the absence of any mechanism to disallow “r” epenthesis in environments where a vowel does not follow.

The phonetic lattice alignment code has been implemented as a set of Python modules which underlyingly use and extend the MIT Finite State Transducer Toolkit [9]. We use CSTR’s Unisyn lexicon [5] to build voices and within the running synthesis system. For forced alignment, we use scripts which underlyingly make use of the HTK speech recognition library [10]. Finally, we are planning to make this labelling system publicly available once it reaches a more mature state of development.

3.2. Application to EM001 voice

Speaker EM001 exhibits a rather careful and deliberate approach to pronunciation during the recordings and uses a relatively slow rate of speech. This in fact tends to limit the applicability and usefulness of postlexical rules for the Blizzard Challenge voices somewhat. Postlexical rules are more usefully applied to the processes of more fluent and rapid connected speech. Thus, in building the three voices for the 2007 Blizzard Challenge, the sole postlexical rule we used was a “tap” rule. Under this rule, alveolar stops in an intervocalic cross word environment could undergo optional transformation to a tap. Specifically, the left phonetic context for this rule comprised the set of vowels together with /r, l, n/ (central and lateral approximants and alveolar nasal stop), while the right context contained just the set of vowels.

4. Pitch accent prediction

In this year’s system, we have experimented with a simple pitch accent target cost function component. Using pitch accent prediction in the voices built for the Blizzard Challenge required three changes. First, we ran a pitch accent predictor
on the text prompts and flagged words with a predicted accent as such in the voice data structures. Next, at synthesis time, our front end linguistic processor was modified to run the accent predictor on the input sentence to be synthesised, and words with a predicted accent were similarly flagged. Finally, an additional target cost component compared the values of the pitch accent flag for the word associated with each target vowel and returned a suitable cost depending on whether they match or not.

Figure 1: Toy example finite state phonetic lattices for the phrase fragment “wider economic”: a) after lexical lookup, the lattice encodes multiple pronunciation variants for “economic”; b) after additional “r” insertion postlexical rule, the input lattice (top) is modified to allow optional insertion of “r” (instead of short pause “sp”).

The method for pitch accent prediction we used here is very simple. It is centred on a look-up table of probabilities that a word will be accented, or “accent ratios”, along the lines of the approach described in [11]. The accent predictor simply looks up a word in this list. If the word is found and its probability for being accented is less than the threshold of 0.28, it is not accented. Otherwise it will receive an accent. These accent ratios are based on the BU Radio Corpus and six Switchboard dialogues. The list contains 157 words with an accent ratio of less than 0.28 (see footnote 2). The pitch accent target cost component has recently been evaluated in a large scale listening test and was found to be beneficial [12].

5. Voice “C” and text selection

Entrants to the 2007 Blizzard Challenge were encouraged to enter a third voice with a voice database size equal to that of the ARCTIC subset, but with a freely selected subset of utterances.
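Looking back at Section 4, the accent-ratio lookup and the matching target cost component might be sketched as follows. This is only an illustration under stated assumptions: the table entries, the function names and the 0/1 cost values are hypothetical and are not taken from the Multisyn implementation; only the threshold of 0.28 comes from the text.

```python
# Hypothetical accent-ratio table: word -> probability of being accented.
# These example values are illustrative only, not the table used in the paper.
ACCENT_RATIOS = {"the": 0.05, "of": 0.07, "and": 0.12, "in": 0.10}
THRESHOLD = 0.28  # threshold quoted in the text


def is_accented(word, ratios=ACCENT_RATIOS, threshold=THRESHOLD):
    """A word is unaccented only if it is listed with a ratio below the threshold."""
    ratio = ratios.get(word.lower())
    return not (ratio is not None and ratio < threshold)


def accent_target_cost(target_accented, candidate_accented, mismatch_cost=1.0):
    """Return 0.0 when the two accent flags match, mismatch_cost otherwise (assumed values)."""
    return 0.0 if target_accented == candidate_accented else mismatch_cost


# Flag each word of an input sentence, as the front end would at synthesis time.
flags = [is_accented(w) for w in "the wider economic outlook".split()]
```

Words absent from the table default to accented, which mirrors the “otherwise it will receive an accent” rule in the text.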
The purpose of this voice is to probe the performance of each team’s text selection process, as well as to provide some insight into the suitability of the ARCTIC data set itself.

5.1. Text selection process

Ordinarily, when designing a prompt set for recording a unit selection voice database, we would seek to avoid longer sentences. They are generally harder to read, which means they are more taxing on the speaker and are more likely to slow down the recording process. In this case, however, since the sentences had been recorded already, we decided to relax this constraint.

In a simple greedy text selection process, sentences were chosen in an iterative way. First, the diphones present in the EM001 text prompts were subcategorised to include certain contextual features. The features we included were lexical stress, pitch accent and proximity to word boundary. Syllable boundary information was not used in the specification of diphone subtypes.

Next, sentences were ranked according to the number of context dependent diphones contained. The top ranking sentence was selected, then the ranking of the remaining sentences was recomputed to reflect the diphones now present in the subset of selected sentences. Sentences were selected one at a time in this way until the total time of the selected subset reached the prescribed threshold. This resulted in a subset comprising 431 utterances, with a total duration of 2908.75 seconds.

Footnote 2: Using the accent ratio table in this way is essentially equivalent to using an (incomplete) list of English function words.

Figure 2: Histogram of counts of unique context dependent diphone types present in the full EM001 set which are missing from the selected subset used to build voice “C”. (x-axis: count of diphone type in full EM001 set; y-axis: count of counts of missing diphone types.)

Our definition of context dependent diphones implied a total of 6,199 distinct diphones with context in the entire EM001 corpus. Our selected subset for voice “C” contained 4,660 of these, which meant 1,539 were
missing. Figure 2 shows a histogram of the missing diphone types in terms of their counts in the full EM001 data set. We see that the large majority of the missing diphone types only occur 1–5 times in the full EM001 data set. For example, 773 of the diphone types which are missing from the selected subset only occur once in the full EM001 set, while only one diphone type which is missing occurred as many as 26 times in the full data set.

5.2. Evaluation problems

Although it is certainly interesting to compare different text selection algorithms against the ARCTIC sentence set, we suggest the way it has been performed this year could potentially confuse this comparison. The first issue to which we would like to draw attention concerns the consistency of the recorded speech material throughout the database. The second issue concerns the question of how far the full EM001 data set satisfies the selection criteria used by arbitrary text selection algorithms.

5.2.1. Consistency of recorded utterances

Figures 3–5 show plots of MFCC parameter means from the EM001 database taken in alphabetical file ordering.

Figure 3: Mean value for 9th MFCC channel for each file of the EM001 voice database. (x-axis: EM001 file, alphabetical sorting; y-axis: mean for 9th MFCC channel.)

Figure 4: Mean value for 7th MFCC channel for each file of the EM001 voice database. (x-axis: EM001 file, alphabetical sorting; y-axis: mean for 7th MFCC channel.)

Figure 5: Mean value for 11th MFCC channel for each file of the EM001 voice database. (x-axis: EM001 file, alphabetical sorting; y-axis: mean for 11th MFCC channel.)

To produce these plots we have taken all files in the EM001 data set in alphabetical ordering (along the x-axis) and calculated the mean MFCC parameters (see footnote 3) for each file. In calculating these means, we have omitted the silence at the beginning and end of files using the labelling provided by the forced alignment we conducted during voice building. A single selected dimension of this mean vector is then plotted in each of the Figures
3–5.

From these figures, we notice that there seem to be three distinct sections of the database, which correspond to the “ARCTIC”, “BTEC” and “NEWS” file labels as indicated in the plots. Within each of these blocks, the MFCC mean varies randomly, but apparently uniformly so. Between these three sections, however, we observe marked differences. For example, compare the distributions of per-file means of the 9th (Fig. 3) and 7th (Fig. 4) MFCC parameters within the “NEWS” section with those from the other two sections of the database.

We naturally expect the MFCC means to vary “randomly” from file to file according to the phonetic content of the utterance contained. However, an obvious trend such as that exhibited in these plots suggests the influence of something more than phonetic variation alone. Specifically, we suspect this situation has arisen due to the significant difficulty of ensuring consistency throughout the many days necessary to record a speech corpus of this size. We have observed similar effects of inconsistency within other databases, both those we have recorded at CSTR, as well as other commercially recorded databases.

Recording a speech corpus over time allows the introduction of variability, with potential sources ranging from the acoustic recording environment (e.g. microphone placement relative to speaker) to the quality of the speaker’s own voice, which of course can vary over a very short space of time [13]. In addition, even the genre and nature of the prompts themselves can influence a speaker’s reading style and voice characteristics.

Note that although we do not see any trends within each of the three sections of the EM001 data set, and that they appear relatively homogeneous, this does not imply that these subsections are free of the same variability and inconsistency. These plots have been produced by taking the files in alphabetical, and hence numerical, order. But it is not necessarily the case that the files were recorded in this order. In fact, it is likely the file
ordering within the subsections has been randomised, which has the effect of disguising inconsistency within the three sections. The inconsistency between the sections is evident purely because the genre identity tag has maintained three distinct groups.

Therefore, despite the probable randomisation of file order within sections, we infer from the patterns evident in Figures 3–5 that the speech data corresponding to the ARCTIC prompt set was recorded all together, and constitutes a reasonably consistent “block” of data. Meanwhile, the rest of the data seems to have been recorded at different times. This introduces inconsistency throughout the database, which a selection algorithm based entirely upon text features will not take account of. This means that unless it is explicitly and effectively dealt with by the synthesis system which uses the voice data, both at voice building time (e.g. using cepstral mean normalisation during forced alignment) and at synthesis time, voice “C” stands a high chance of being disadvantaged by selecting data indiscriminately from inconsistent subsections of the database. The forced alignment labelling may suffer because of the increased variance of the speech data. Unit selection may suffer because the spectral component of the join cost may result in a nonuniform probability of making joins across sections of the database, compared with joins within a single section. This has the effect of “partitioning” the voice database.

Footnote 3: Extracted using HTK’s HCopy as part of our forced alignment processing, and also subsequently used in the Multisyn join cost.

The Multisyn voice building process currently takes account of amplitude inconsistency, and attempts waveform power normalisation on a per-utterance basis. However, other sources of inconsistency, most notably spectral inconsistency, are not currently addressed. This means that Multisyn voice “C” is potentially affected by database inconsistency, which introduces uncertainty and confusion in any comparison
between voices “B” and “C”. Within the subset of 431 sentences we selected to build voice “C”, 261 came from the “NEWS” section, 169 came from the “BTEC” section, and the remaining 36 came from the “ARCTIC” section.

This issue of inconsistency can potentially affect the comparison between the “C” voices from different entrants. For example, according to our automatic phonetic transcriptions of the EM001 sentence set, the minimum number of phones contained in a single sentence within the “NEWS” section is 52. Meanwhile, the “BTEC” section contains 1,374 sentences with fewer than 52 phones. Although we have not done so here, it is not unreasonable for a text selection strategy to favour short sentences, in which case a large majority may be selected from the “BTEC” section. This would result in avoiding the large discontinuity we observe in Figures 3 and 4 and could potentially confer an advantage which is in fact unrelated to the text selection algorithm per se.

The problem has the potential, however, to introduce most confusion into the comparison between entrants’ voices “B” and “C”, as there is most likely to be a bias in favour of the ARCTIC subset, which seems to have been recorded as a single block.
We suggest there are at least two ways of avoiding this bias in future challenges. One way would be to provide a database without the inconsistency we observe here, for example through post-processing. This is likely to be rather difficult to realise, and our own previous attempts have failed to find a satisfactory solution, although [14] reported some success. A second, simpler way would be to record the set of ARCTIC sentences randomly throughout the recording of a future Blizzard Challenge corpus.

5.2.2. Selection criteria coverage

The second problem inherent in attempting to compare text selection processes in this way arises from differing selection criteria. It is usual to choose text selection criteria (i.e. which diphone context features to consider) which complement the synthesis system’s target cost function. Hence the criteria may vary between systems.

The set of ARCTIC sentences was selected from a very large amount of text, and so the possibility for the algorithm to reach its optimal subset in terms of the selection criteria it used is maximised. In contrast, the text selection required for voice “C” was performed on a far smaller set of sentences. Although, admittedly, it is likely to be phonetically much richer than if the same number of sentences had been selected randomly from a large corpus, it is possible that the initial set of sentences does not contain a sufficient variety of material to satisfy the selection criteria of arbitrary text selection systems. This again may tend to accord an inherent advantage to voice “B”.

6. Conclusion

We have introduced two new features of the Multisyn unit selection system. We have also raised issues for discussion concerning the comparison of voices built with differing subsets of the provided data. Finally, we note that, as in previous years, participating in this Blizzard Challenge has proved both interesting and useful.

7. Acknowledgments

Korin Richmond is currently supported by EPSRC grant EP/E027741/1. Many thanks to Lee
Hetherington for making the MIT FST toolkit available under a BSD-style license, and for other technical guidance. Thanks to A. Nenkova for processing the Blizzard text prompts for pitch accent prediction.

8. References

[1] R. A. J. Clark, K. Richmond, and S. King, “Multisyn: Open-domain unit selection for the Festival speech synthesis system,” Speech Communication, vol. 49, no. 4, pp. 317–330, 2007.
[2] R. Clark, K. Richmond, V. Strom, and S. King, “Multisyn voice for the Blizzard Challenge 2006,” in Proc. Blizzard Challenge Workshop (Interspeech Satellite), Pittsburgh, USA, Sept. 2006, (/blizzard/blizzard2006.html).
[3] R. A. Clark, K. Richmond, and S. King, “Multisyn voices from ARCTIC data for the Blizzard challenge,” in Proc. Interspeech 2005, Sept. 2005.
[4] J. Kominek and A. Black, “The CMU ARCTIC speech databases,” in 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004, pp. 223–224.
[5] S. Fitt and S. Isard, “Synthesis of regional English using a keyword lexicon,” in Proc. Eurospeech ’99, vol. 2, Budapest, 1999, pp. 823–826.
[6] M. Mohri and R. Sproat, “An efficient compiler for weighted rewrite rules,” in Proc. 34th annual meeting of Association for Computational Linguistics, 1996, pp. 231–238.
[7] R. Kaplan and M. Kay, “Regular models of phonological rule systems,” Computational Linguistics, vol. 20, no. 3, pp. 331–378, Sep 1994.
[8] L. Karttunen, “The replace operator,” in Proc. 33rd annual meeting of Association for Computational Linguistics, 1995, pp. 16–23.
[9] L. Hetherington, “The MIT finite-state transducer toolkit for speech and language processing,” in Proc. ICSLP, 2004.
[10] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK version 3.2), Cambridge University Engineering Department, 2002.
[11] J. Brenier, A. Nenkova, A. Kothari, L. Whitton, D. Beaver, and D. Jurafsky, “The (non)utility of linguistic features for predicting prominence on spontaneous speech,” in IEEE/ACL 2006 Workshop on Spoken Language Technology, 2006.
[12] V. Strom, A. Nenkova, R. Clark, Y. Vazquez-Alvarez, J. Brenier, S. King, and D. Jurafsky, “Modelling prominence and
emphasis improves unit-selection synthesis,” in Proc. Interspeech, Antwerp, 2007.
[13] H. Kawai and M. Tsuzaki, “Study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis,” in Proc. IEEE Workshop on Speech Synthesis, 2002, pp. 15–18.
[14] Y. Stylianou, “Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis,” in Proc. ICASSP-99, Phoenix, Arizona, Mar. 1999, pp. 377–380.
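As a supplementary illustration, the greedy text selection described in Section 5.1 might be sketched as below. This is a minimal sketch under stated assumptions rather than the authors’ script: the data layout, the unit label strings, the tie-breaking by sentence id and the exact stopping rule (stop when the next pick would exceed the duration budget) are all hypothetical choices made for the example.

```python
from collections import Counter


def greedy_select(sentences, max_duration):
    """Greedily pick sentences that add the most not-yet-covered unit types.

    sentences maps an id to {"duration": seconds, "units": set of
    context-dependent diphone labels, "section": corpus section name}.
    """
    remaining = dict(sentences)
    selected, covered, total = [], set(), 0.0
    while remaining:
        # Re-rank after every pick: only units not yet covered are counted.
        best_id = max(sorted(remaining),
                      key=lambda s: len(remaining[s]["units"] - covered))
        best = remaining.pop(best_id)
        # Assumed stopping rule: stop once the next pick would exceed the budget.
        if total + best["duration"] > max_duration:
            break
        selected.append(best_id)
        covered |= best["units"]
        total += best["duration"]
    return selected, covered, total


# Toy corpus with made-up ids, durations and unit labels.
toy = {
    "arctic_0001": {"duration": 3.0, "units": {"a-b", "b-c"}, "section": "ARCTIC"},
    "btec_0001": {"duration": 2.0, "units": {"b-c"}, "section": "BTEC"},
    "news_0001": {"duration": 6.0, "units": {"a-b", "c-d", "d-e", "e-f"},
                  "section": "NEWS"},
}
chosen, covered, total = greedy_select(toy, max_duration=9.0)
per_section = Counter(toy[s]["section"] for s in chosen)
```

A per-section tally such as `per_section` is the kind of breakdown that makes any imbalance between the “ARCTIC”, “BTEC” and “NEWS” sections visible after selection.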
Modern Linguistics: Answers to the End-of-Chapter Exercises for Chapters 1–5

Chapter 1 Introduction

1. Explain the following definition of linguistics: Linguistics is the scientific study of language.
Linguistics investigates not any particular language but languages in general. Linguistic study is scientific because it is based on the systematic investigation of authentic language data. No serious linguistic conclusion is reached until after the linguist has done the following three things: observing the way language is actually used, formulating some hypotheses, and testing these hypotheses against linguistic facts to prove their validity.
2. What are the major branches of linguistics? What does each of them study?

Phonetics: how speech sounds are produced and classified
Phonology: how sounds form systems and function to convey meaning
Morphology: how morphemes are combined to form words
Syntax: how morphemes and words are combined to form sentences
Semantics: the study of meaning (in abstraction)
Pragmatics: the study of meaning in context of use
Sociolinguistics: the study of language with reference to society
Psycholinguistics: the study of language with reference to the workings of the mind
Applied linguistics: the application of linguistic principles and theories to language teaching and learning

3. What makes modern linguistics different from traditional grammar?

Modern linguistics is descriptive; its investigations are based on authentic and mainly spoken language data.
Client World Model Synchronous Alignment for Speaker Verification
password phonetic structure is similar for the speaker and the impostors. This motivates the study of a synchronous alignment approach where the hidden process (i.e. the sequence of states) is assumed to be identical for both client and non-client. Only the output distributions differ between the two hypotheses. The synchronous alignment approach is depicted and compared to the classical one in Figure 1.
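The contrast between the classical and the synchronous approach can be illustrated with a small sketch. It is not the method of the source: the joint criterion used to find the shared path (the sum of client and world frame log-likelihoods), the shared transition matrix in the classical case, and all names and toy values are assumptions made only for this example.

```python
import math

NEG_INF = -math.inf


def viterbi(log_init, log_trans, frame_ll):
    """Return (best log score, best state path) for per-frame state log-likelihoods."""
    n_states = len(log_init)
    delta = [log_init[j] + frame_ll[0][j] for j in range(n_states)]
    back = []
    for t in range(1, len(frame_ll)):
        # Best predecessor for each state j at frame t.
        pointers = [max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
                    for j in range(n_states)]
        delta = [delta[pointers[j]] + log_trans[pointers[j]][j] + frame_ll[t][j]
                 for j in range(n_states)]
        back.append(pointers)
    state = max(range(n_states), key=lambda j: delta[j])
    best = delta[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return best, path[::-1]


def classical_score(log_init, log_trans, client_ll, world_ll):
    """Each model is aligned separately; the score is the difference of the best paths."""
    client_best, _ = viterbi(log_init, log_trans, client_ll)
    world_best, _ = viterbi(log_init, log_trans, world_ll)
    return client_best - world_best


def synchronous_score(log_init, log_trans, client_ll, world_ll):
    """One shared path, found under an ASSUMED joint criterion (sum of both models)."""
    joint = [[c + w for c, w in zip(cf, wf)] for cf, wf in zip(client_ll, world_ll)]
    _, path = viterbi(log_init, log_trans, joint)
    llr = sum(client_ll[t][s] - world_ll[t][s] for t, s in enumerate(path))
    return llr, path


# Toy two-state left-to-right model with made-up log-likelihoods for three frames.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
client_ll = [[-1.0, -5.0], [-4.0, -1.0], [-5.0, -1.0]]
world_ll = [[-2.0, -5.0], [-4.0, -3.0], [-5.0, -2.0]]
llr, path = synchronous_score(log_init, log_trans, client_ll, world_ll)
classical = classical_score(log_init, log_trans, client_ll, world_ll)
```

Because both hypotheses are scored along the same state sequence, the transition terms cancel in the synchronous score and only the output distributions contribute to it.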
The main idea of synchronous alignment is to make the two models share the same topology and differ only in their output distributions. In order to compute the optimal path in the shared model, a global criterion is defined. Two possible criteria are proposed in this section. Specific decoding and training algorithms for both criteria are derived. The convergence properties of these algorithms are studied and the results are presented here.

2.1 Criteria for synchronous alignment
Chapter 2 Speech Sounds
2.1 How Speech Sounds Are Made? The Nasal Cavity
● When the vocal cords are apart, the air can pass through easily and the sound produced is said to be voiceless, e.g. [p, s, t].
● When they are close together, the airstream causes them to vibrate and produces voiced sounds, e.g. [b, z, d].
● When they are totally closed, no air can pass between them, which produces the glottal stop [ʔ] (none in English).
2.1 How Speech Sounds Are Made? The Oral Cavity
The oral cavity provides the greatest source of modification.
● Tongue: the most flexible
● Uvula, the teeth and the lips
● Hard palate, soft palate (velum)
● Alveolar ridge: the rough, bony ridge immediately behind the upper teeth
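Fanting Middle School, Yuanping City, Shanxi Province: Senior 2 English Monthly Exam, April, 2024-2025 School Year

This paper consists of Part I (multiple choice) and Part II (non-multiple choice), 150 points in total. Time allowed: 120 minutes.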
Part I (100 points)
Section 1: Reading Comprehension (two subsections, 60 points)
Subsection 1 (15 questions, 3 points each, 45 points)

A

A new app aims to help parents interpret what their baby wants based on the sound of their cry. The free app ChatterBaby, which was released last month, analyzes the acoustic features of a baby’s cry, to help parents understand whether their child might be hungry, fussy or in pain. While critics say caregivers should not rely too much on their smartphone, others say it’s a helpful tool for new or tired parents.

Ariana Anderson, a mother of four, developed the app. She originally designed the technology to help deaf parents better understand why their baby was upset, but soon realized it could be a helpful tool for all new parents.

To build a database, Anderson and her team uploaded 2,000 audio samples of infant cries. She used cries recorded during ear piercings and vaccinations to distinguish pain cries. And to create a baseline for the other two categories, a group of moms had to agree on whether the cry was either hungry or fussy.

Anderson’s team continues to collect data and hopes to make the app more accurate by asking parents to get specific about what certain sounds mean.

Pediatrician Eric Ball pointed out that evaluating cries can never be an exact science. “I think that all of the apps and technology that new parents are using now can be helpful but need to be taken seriously,” Ball said. “I do worry that some parents will get stuck in big data and turn their parenting into basically a spreadsheet, which I think will take away the love and caring that parents are supposed to be providing for the children.”

But Anderson said the aim of the app is to have parents interpret the results, not to provide a yes or no answer. The Bells, a couple using this app, say it’s a win-win. They believe they are not only helping their baby now but potentially others in the future.

1. How does the app judge what babies want?
A. By collecting data.
B. By recording all the sounds.
C. By analyzing the sound of their cries.
D. By asking parents about specific messages.

2. What was the app designed for in the beginning?
A. All new parents.
B. Deaf parents.
C. Ariana Anderson.
D. Crying babies.

3. What is Ball’s opinion about the app?
A. Parents should use the app wisely.
B. The app can create an accurate result.
C. Parents and babies are addicted to the app.
D. The app makes babies lose love and caring.

4. What is the text mainly about?
A. Parents should not rely too much on their smartphones.
B. A new app helps parents figure out why their babies are crying.
C. Parents can deal with babies’ hunger with the help of a new app.
D. A new app called ChatterBaby can prevent babies from crying.

B

Many people spend more than four hours per day on WeChat, and it is redefining the word “friend.” Does friending someone on social media make him or her your friend in real life?

Robin Dunbar, a professor at Oxford University, found that only 15 of the 150 Facebook friends the average user has could be counted as actual friends and only five as close friends. WeChat may show a similar pattern.

Those with whom you attended a course together, applied for the same part-time job, went to a party and intended to cooperate but failed take up most of your WeChat friends. In chat records, the only message may be a system notice, “You have accepted somebody’s friend request”. Sometimes when seeing some photos shared on “Moments”, you even need several minutes to think about when you became friends. Also, you may be disturbed by mass messages sent from your unfamiliar “friends”, including requests for voting for their children or friends, links from Pinduoduo (a Chinese e-commerce platform that allows users to buy items at lower prices if they purchase in groups) and cookie-cutter blessings on holidays.

You would have thought about deleting this type of “friends” and sorting out your connections.
But actually you did not do that as you were taught that social networking is valuable to one’s success. Besides, it would be really awkward if they found that you had unfriended them already. Then, you keep increasing your “friends” on social media and click “like” on some pictures that you are not really interested in. But the fact is that deep emotional connections do not come with the increasing number of your friends on social media.

If the number of your friends reaches 150, maintaining these relationships can be tough for you, and sometimes will even make you anxious. According to Robin Dunbar, 150 is the limit of the number of people with whom one can maintain stable social relationships.

5. What can we learn from Robin Dunbar's finding in Paragraph 2?
A. A Facebook user has 250 friends on average.
B. Most of the social media friends can be actual friends.
C. Among our social media friends, only a few people matter.
D. Only 15 people of a person’s Facebook friends can be close friends.

6. What does the third paragraph tell us about most of your WeChat friends?
A. You have deep communication with them.
B. You benefit a lot from their mass messages.
C. You just have a nodding acquaintance with them.
D. You become friends with them on important occasions.

7. What does the underlined word “that” in Paragraph 4 refer to?
A. Removing unfamiliar friends in WeChat.
B. Strengthening ties with your WeChat friends.
C. Keeping increasing your friends in social media.
D. Clicking “like” on pictures posted by your friends.

8. What can we infer from the last paragraph?
A. We will be anxious if we make friends online.
B. We should avoid making any friends in social media.
C. We should make as many friends as possible in social media.
D. We have difficulty managing relationships with over 150 people.

C

Last week, Vodafone started a test of the UK's first full 5G service, available for use by businesses in Salford. It is part of its plan to trial the technology in seven UK cities.
But what can we expect from the next generation of mobile technology?

One thing we will see in the preparation for the test is lots of tricks with the new tech. Earlier this year, operators paid almost £1.4 billion for the 5G wavelengths, and to compensate for that cash, they will need to catch the eye of consumers. In September, Vodafone used its bit of the range to display the UK's first hologram (holographic image) call. The Manchester City captain Steph Houghton appeared as a hologram in Newbury. It isn't all holograms, however: 5G will offer faster internet access, with Ofcom (the UK communications regulator) suggesting that video that takes a minute to download on 4G will be available in just a second.

The wider application is to support connected equipment on the "internet of things": not just the internet-enabled fridge that can reorder your milk for you, but the network that will enable driverless cars and delivery drones (unmanned aerial vehicles) to communicate with each other.

Prof William Webb has warned that the technology could be a case of the emperor's new clothes. Much of the speed increase, he claims, could have been achieved by putting more money into the 4G network, rather than a new technology. Other dissenting voices have suggested that a focus on rolling out wider rural broadband access and addressing current network coverage would be more beneficial to the UK as a whole.

Obviously, 5G will also bring a cost to consumers. It requires a handset for both 5G and 4G, and the first 5G-enabled smart phones are expected in the coming year.
With the slow pace of network rollout so far, it is likely that consumers will end up upgrading to a new 5G phone well before 5G becomes widely available in the next couple of years.

9. Why does Prof William Webb say "the technology could be a case of the emperor's new clothes"?
A. He is in favor of the application of the new technology.
B. 5G will bring a cost to consumers in their daily life.
C. 5G helps people communicate better with each other.
D. He prefers more money to be spent on 4G networks.

10. The underlined word "addressing" in the fourth paragraph has the closest meaning to ________
A. making a speech to
B. trying to solve
C. managing to decrease
D. responding to

11. The last paragraph indicates that
A. it'll take several years to make 5G accessible to the public in the UK
B. 5G service shows huge development potential and a broad market
C. customers are eager to use 5G smart phones instead of 4G ones
D. it's probable that 5G network rollout is speeding up in Britain

D

Zebra crossings (pedestrian crossings), the alternating dark and light stripes on the road surface, are meant to remind drivers that pedestrians may be trying to get across. Unfortunately, they are not very effective. A 1998 study done by the Department of Traffic Planning and Engineering at Sweden’s Lund University showed that three out of four drivers kept the same speed or even sped up as they were approaching a crossing. Even worse? Only 5% stopped even when they saw someone trying to get across.

Now a mother-daughter team in Ahmedabad, India has come up with a clever way to get drivers to pay more attention: a 3D zebra crossing with an optical illusion (visual illusion). Artists Saumya Pandya Thakkar and Shakuntala Pandya were asked to paint the crosswalks by IL&FS, an Indian company that manages the highways in Ahmedabad. The corporation was looking for a creative solution to help the city’s residents to cross the busy accident-prone (likely to have accidents) roads safely.
Thakkar and Pandya, who had previously seen images of 3D zebra crossings that gave drivers the illusion of logs of wood on the streets in Taizhou, China, decided to test if a similar way would work in India.

Sure enough, in the six months since the 3D crosswalks were painted across four of the city’s most dangerous highways, there have been no accidents reported! The artists say that while it may appear that the zebra crossing could cause the drivers to brake suddenly and endanger the vehicles behind, such is not the case. Because of the way the human eye works, the illusion is only visible from a distance. As they get closer, the painting looks just like any other ordinary zebra crossing. The creators hope that their smart design will become increasingly common throughout India and perhaps even the world. So let’s look forward to it.

12. What can we learn from the first paragraph?
A. Most drivers will slow down at zebra crossings.
B. Common zebra crossings don’t function well.
C. Drivers have to stop when approaching zebra crossings.
D. About 95% of the drivers choose to speed up when approaching zebra crossings.

13. Why do drivers seeing the 3D zebra crossings slow down according to Para. 2?
A. Because the drivers consider the safety of pedestrians.
B. Because the drivers mistake them for logs of wood on the streets.
C. Because the drivers are afraid of being fined for breaking the traffic rules.
D. Because the drivers don’t want to brake suddenly and endanger the vehicles behind.

14. The last paragraph is mainly about ________.
A. the theory of the 3D zebra crossings
B. the popularity of the 3D zebra crossings
C. the shortcoming of the 3D zebra crossings
D. the positive effect of the 3D zebra crossings

15. What is the author’s attitude towards the 3D zebra crossings?
A. Cautious.
B. Doubtful.
C. Approving.
D. Disapproving.

Section 2 (5 questions; 3 points each, 15 points in total)
Based on the passage, choose the best option from those given after the passage to fill in each blank.
International Conference Best Student Paper Award (Best student paper)
Review Questions and Answers: Basic Knowledge and Skills in Linguistics
Chapter One Introduction

I. What is linguistics?
Linguistics is generally defined as the scientific study of language. It studies not any particular language but languages in general. It is a scientific study because it is based on the systematic investigation of linguistic data, conducted with reference to some general theory of language structure.

II. The scope of linguistics
1. Phonetics: The study of sounds used in linguistic communication led to the establishment of phonetics.
2. Phonology: Phonology deals with how sounds are put together and used to convey meaning in communication.
3. Morphology: The study of the way in which morphemes are arranged and combined to form words has constituted the branch of study called morphology.
4. Syntax: The combination of words to form grammatically permissible sentences in languages is governed by rules. The study of these rules constitutes a major branch of linguistic studies called syntax.
5. Semantics: The study of meaning is known as semantics.
6. Pragmatics: When the study of meaning is conducted, not in isolation, but in the context of language use, it becomes another branch of linguistic study called pragmatics.
7. Sociolinguistics: The study of the social aspects of language and its relation with society forms the core of the branch called sociolinguistics.
8. Psycholinguistics: Psycholinguistics relates the study of language to psychology.
9. Applied linguistics: Findings in linguistic studies can often be applied to the solution of such practical problems as the recovery of speech ability. The study of such applications is generally known as applied linguistics.

III. Some important distinctions in linguistics
1. Prescriptive vs. descriptive
If a linguistic study aims to describe and analyze the language people actually use, it is said to be descriptive; if the linguistic study aims to lay down rules for “correct and standard” behaviour in using language, it is said to be prescriptive.
2. Synchronic vs. diachronic
The description of a language at some point of time in history is a synchronic study; the description of a language as it changes through time is a diachronic study.
3. Speech and writing
Speech and writing are the two major media of linguistic communication. Modern linguistics regards the spoken language as the natural or the primary medium of human language for some obvious reasons. From the point of view of linguistic evolution, speech is prior to writing. The writing system of any language is always “invented” by its users to record speech when the need arises.
4. Langue and parole
The distinction between langue and parole was made by the Swiss linguist F. de Saussure in the early 20th century. Langue and parole are French words. Langue refers to the abstract linguistic system shared by all the members of a speech community, and parole refers to the realization of language in actual use.
5. Competence and performance
The distinction between competence and performance was proposed by the American linguist N. Chomsky in the late 1950s. Chomsky defines competence as the ideal user’s knowledge of the rules of his language, and performance as the actual realization of this knowledge in linguistic communication.
6. Traditional grammar and modern linguistics
Traditional grammar refers to the studies of language before the publication of F. de Saussure’s book Course in General Linguistics in 1916. Modern linguistics differs from traditional grammar in several basic ways.
First, linguistics is descriptive while traditional grammar is prescriptive.
Second, modern linguistics regards the spoken language, not the written, as primary.
Third, modern linguistics differs from traditional grammar in that it does not force languages into a Latin-based framework.

IV. What is language?
Language is a system of arbitrary vocal symbols used for human communication.
1. Design features
1) Arbitrariness
Language is arbitrary.
This means that there is no logical connection between meanings and sounds.
2) Productivity
Language is productive or creative in that it makes possible the construction and interpretation of new signals by its users.
3) Duality
Language is a system which consists of two sets of structures, or two levels. At the lower or basic level there is a structure of sounds, which are meaningless by themselves. But the sounds of language can be grouped and regrouped into a large number of units of meaning, which are found at the higher level of the system. This duality of structure or double articulation of language enables its users to talk about anything within their knowledge.
4) Displacement
Language can be used to refer to contexts removed from the immediate situations of the speaker. This is what “displacement” means. This property provides speakers with an opportunity to talk about a wide range of things, free from barriers caused by separation in time and place.
5) Cultural transmission
Human capacity for language has a genetic basis while the details of any language system are not genetically transmitted, but instead have to be taught and learned. This shows that language is culturally transmitted. It is passed from one generation to the next through teaching and learning, rather than by instinct.
2. Functions of language
1) Informative
It is the major role of language.
The use of language to record facts is a prerequisite of social development.
2) Interpersonal function
It is the most important sociological use of language, by which people establish and maintain their status in a society. Attached to the interpersonal function of language is its function of the expression of identity.
3) Performative
This concept originates from the philosophical study of language presented by Austin and Searle, whose theory now forms the backbone of pragmatics. The performative function of language is primarily to change the social status of persons, as in marriage ceremonies, the blessing of children and the naming of a ship at a launching ceremony. The kind of language employed in performative verbal acts is usually quite formal and even ritualized.
4) Emotive function
The emotive function of language is one of the most powerful uses of language because it is so crucial in changing the emotional status of an audience for or against someone or something, e.g. God, my, Damn it...
5) Phatic communion
The term originates from Malinowski’s study of the functions of language performed by Trobriand Islanders. It refers to the social interaction of language. We all use small, seemingly meaningless expressions such as Good morning, God bless you, Nice day to maintain a comfortable relationship between people.
6) Recreational function
No one will deny the use of language for the sheer joy of using it, such as a baby’s babbling.
7) Metalingual function
Our language can be used to talk about itself. For example, we can use the word “book” to talk about a book, and we can use the expression “the word book” to talk about the word itself.

Chapter Two Phonology

I. Speech production and perception
A speech sound goes through a three-step process. Naturally, the study of sounds is divided into three areas, each dealing with one part of the process.
1. Articulatory phonetics
It is the study of the production of speech sounds.
2. Acoustic phonetics
It is the study of the physical properties of the sounds produced in speech.
3. Auditory phonetics
It is concerned with the perception of the sounds produced in speech.

II. Speech organs
Speech organs are also known as vocal organs. They are those parts of the human body involved in the production of speech.
Speech organs mainly consist of the vocal cords and three cavities, which are the pharynx, the oral cavity and the nasal cavity.
The vocal cords are in the larynx, the front part of which is called “the Adam’s Apple.”

III. Consonants
Classification of English consonants
English consonants can be classified in two ways: one is in terms of manner of articulation and the other is in terms of place of articulation.

IV. Vowels
Classification of English vowels
Vowels may be distinguished as front, central, and back according to which part of the tongue is held highest. Vowels can also be distinguished according to the openness of the mouth: close vowels, semi-close vowels, semi-open vowels, and open vowels.

Nouns            Adjectives
Lips             Labial / Bilabial
Teeth            Dental
Alveolar ridge   Alveolar
Hard palate      Palatal
Soft palate      Velar
Uvula            Uvular
Pharynx          Pharyngeal
Tip              Apical
Blade            Laminal
Front            Dorsal
Back             Dorsal

Consonants    Place
/p/ /b/       Bilabial
/t/ /d/       Tip-alveolar
/k/ /g/       Back-velar
/tʃ/ /dʒ/     Blade/front – palato-alveolar
/m/           Bilabial
/n/           Tip-alveolar
/ŋ/           Back-velar

V. Phonology and phonetics
1. Phonetics is concerned with the general nature of speech sounds while phonology aims to discover how speech sounds in a language form patterns and how these sounds are used to convey meaning in linguistic communication.
2. Phone, phoneme, and allophone
– A phone is a phonetic unit or segment.
The speech sounds we hear and produce during linguistic communication are all phones.
– A phoneme is a phonological unit; it is a unit that is of distinctive value. It is an abstract unit. It is not any particular sound, but rather it is represented or realized by a certain phone in a certain phonetic context.
– The different phones which can represent a phoneme in different phonetic environments are called the allophones of that phoneme. For example, the phoneme /l/ in English can be realized as dark /l/, clear /l/, etc., which are allophones of the phoneme.
3. Phonemic contrast, complementary distribution, and minimal pair
If the phonetically similar sounds are two distinctive phonemes, they are said to form a phonemic contrast, e.g. /p/ and /b/ in /pit/ and /bit/.
If they are allophones of the same phoneme, then they do not distinguish meaning, but complement each other in distribution. For instance, the clear /l/ always occurs before a vowel while the dark /l/ always occurs between a vowel and a consonant, or at the end of a word. So the allophones are said to be in complementary distribution.
When two different forms are identical in every way except for one sound segment which occurs in the same place in the strings, the two sound combinations are said to form a minimal pair. So in English, pill and bill are a minimal pair.
4. Some rules in phonology
Sequential rules, assimilation rule, deletion rule
5. Supra-segmental features: stress, tone, intonation
Stress:
Depending on the context in which stress is considered, there are two kinds of stress: word stress and sentence stress.
The location of stress in English distinguishes meaning.
Sentence stress refers to the relative force given to the components of a sentence.
The parts of speech that are normally stressed in an English sentence are nouns, main verbs, adjectives, adverbs, numerals and demonstrative pronouns; the other categories of words like articles, personal pronouns, auxiliary verbs, prepositions, and conjunctions are usually not stressed.
Tone:
Tones are pitch variations, which are caused by the differing rates of vibration of the vocal cords. Pitch variation can distinguish meaning just like phonemes; therefore, tone is a supra-segmental feature. The meaning-distinctive function of tone is especially important in what we call tone languages, e.g. Chinese.
Intonation:
When pitch, stress and sound length are tied to the sentence rather than the word in isolation, they are collectively known as intonation. Intonation plays an important role in the conveyance of meaning in almost every language, especially in a language like English.

Chapter Three Morphology

I. Open class and closed class
In English, nouns, verbs, adjectives and adverbs make up the largest part of the vocabulary. They are the content words of a language, which are sometimes called open class words, since we can regularly add new words to these classes.
The other syntactic categories include “grammatical” or “functional” words. Conjunctions, prepositions, articles and pronouns consist of relatively few words and have been referred to as closed class words since new words are not usually added to them.

II. Internal structure of words and rules for word formation
Morphology refers to the study of the internal structure of words, and the rules by which words are formed.
e.g. like → dislike, order → disorder, appear → disappear, approve → disapprove, agree → disagree
“dis-” is a prefix meaning “not” that is placed before a root word.

III. Morphemes: the minimal units of meaning
Some words are formed by combining a number of distinct units of meaning.
The most basic element of meaning is traditionally called a morpheme.
The following list shows that in English a single word may consist of one or more morphemes.
One morpheme: desire
Two morphemes: desire + able
Three morphemes: desire + able + ity
Four morphemes: un + desire + able + ity
In fact every word in every language is composed of one or more morphemes.
Prefixes occur only before other morphemes while suffixes occur only after other morphemes.

IV. Derivational and inflectional morphemes
In English there are morphemes which change the category or grammatical class of words. A verb, for example, is formed by adding -en to the adjective black (blacken), or by adding -ize to the noun computer (computerize).
More examples (noun → adjective): affection + ate, alcohol + ic
-en, -ate, and -ic are thus called derivational morphemes, because when they are conjoined to other morphemes (or words) a new word is derived, or formed.
Similarly, there are bound morphemes which are for the most part purely grammatical markers, signifying such concepts as tense, number, case, aspect and so on.
Such bound morphemes are referred to as inflectional morphemes.

V. Morphological rules of word formation
The ways words are formed are called morphological rules. These rules determine how morphemes combine to form words.
Some of the morphological rules can be used quite freely to form new words. We call them productive morphological rules.
un + accept + able = un + adjective = not + adjective

VI. Compounds
Another way to form new words, or compound words, to be exact, is by stringing words together, as shown in the examples below:

Chapter Four Syntax

I. What is syntax?
Syntax is a branch of linguistics that studies how words are combined to form sentences and the rules that govern the formation of sentences.

II. Categories
Category refers to a group of linguistic items which fulfill the same or similar functions in a particular language, such as a sentence, a noun phrase or a verb.
A fundamental fact about words in all human languages is that they can be grouped together into a relatively small number of classes, called syntactic categories.
1. Word level categories are divided into two kinds: major lexical categories and minor lexical categories.
2. Phrase categories and their structures
Syntactic units that are built around a certain word category are called phrases, the category of which is determined by the word category around which the phrase is built. In English syntactic analysis, the most commonly recognized and discussed phrasal categories are noun phrase (NP), verb phrase (VP), adjective phrase (AP) and prepositional phrase (PP).
Whether formed of one or more than one word, they consist of two levels, phrase level and word level, as exemplified below.
NP   VP   AP   PP   ← phrase level
|    |    |    |
N    V    A    P    ← word level
Phrases that are formed of more than one word usually contain the following elements: head, specifier and complement. The word around which a phrase is formed is termed the head. The words on the left side of the head are said to function as specifiers. The words on the right side of the head are complements.
3. Phrase structure rule
The special type of grammatical mechanism that regulates the arrangement of elements that make up a phrase is called a phrase structure rule. The phrase structure rules for NP, VP, AP, and PP can be written as follows:
NP → (Det) N (PP) …
VP → (Qual) V (NP) …
AP → (Deg) A (PP) …
PP → (Deg) P (NP) …
The arrow can be read as “consists of” or “branches into”. The parentheses mean that the element in them can be omitted, and the three dots in each rule indicate that other complement options are available.
4. XP rule
The XP rule: XP → (Specifier) X (Complement)
5. X̄ Theory
a. XP → (Specifier) X̄
b. X̄ → X (Complement)
The first rule stipulates that XP categories such as NP and VP consist of an optional specifier (a determiner, a qualifier, and so forth) and an X̄. The second rule states that an X̄ consists of a head, X, and any complements.
6. Phrase elements
Specifiers
Specifiers have both special semantic and syntactic roles. Semantically, they help make the meaning of the head more precise. Syntactically, they typically mark a phrase boundary. Specifiers can be determiners, qualifiers and degree words.
Complements
As we have seen, complements are themselves phrases and provide information about entities and locations whose existence is implied by the meaning of the head. They are attached to the right of the head in English.
The XP Rule (revised): XP → (Specifier) X (Complement*)
This rule also captures the simple but important fact that complements, however many there are, occur to the right of the head in English.
Modifiers
Modifiers specify optionally expressible properties of heads.
Table 4-2 Modifier position in English
Modifier   Position                        Example
AP         Precedes the head               a very careful girl
PP         Follows the head                open with care
AdvP       Precedes or follows the head    read carefully; carefully read
To make modifiers fit into phrase structure, we can expand our original XP rule into the following so that it allows the various options.
The Expanded XP rule: XP → (Spec) (Mod) X (Complement*) (Mod)
This rule allows a modifier to occur either before the head or after it. Where there is a complement, a modifier that occurs after the head will normally occur to the right of the complement as well.
7. Sentences (The S rule)
The S rule: S → NP VP
This rule combines an NP (often called the subject) with a VP to yield a sentence such as the one below.
Many linguists nowadays believe that sentences, like other phrases, also have their own heads. They take an abstract category inflection (dubbed “Infl”) as their head, which indicates the sentence’s tense and agreement.
8. Deep structure and surface structure
There are two levels of syntactic structure. The first, formed by the XP rule in accordance with the head’s subcategorization properties, is called deep structure (or D-structure). The second, corresponding to the final syntactic form of the sentence which results from appropriate transformations, is called surface structure (or S-structure).
The organization of the syntactic component of the grammar can be depicted below.
The XP Rule
↓
DEEP STRUCTURE ← (Sub-categorization restricts choice of complements)
↓
Transformations
↓
SURFACE STRUCTURE

Chapter Five Semantics

I. What is semantics?
Semantics can be simply defined as the study of meaning. In our discussion, we will limit ourselves to the study of meaning from a linguistic point of view.

II. Some views concerning the study of meaning
1. The naming theory
The naming theory was proposed by the ancient Greek scholar Plato, according to which the linguistic forms or symbols, in other words, the words used in a language, are simply labels of the objects they stand for.
2. The conceptualist view
The conceptualist view relates words and things through the mediation of concepts of the mind. This view holds that there is no direct link between a linguistic form and what it refers to; rather, in the interpretation of meaning they are linked through the mediation of concepts in the mind. This is best illustrated by the classic semantic triangle or triangle of significance suggested by Ogden and Richards:
3. Contextualism
The contextualist view of meaning is based on the presumption that one can derive meaning from or reduce meaning to observable contexts. Two kinds of context are recognized: the situational context and the linguistic context. The representative linguist of the view is Firth, who was influenced by Malinowski and Wittgenstein.
4. Behaviorism
Behaviorists attempted to define the meaning of a language form as the “situation in which the speaker utters it and the response it calls forth in the hearer.” (Bloomfield, 1933) Behaviorism in linguistics holds that children learn language through a chain of “Stimulus-Response reinforcement” and the adult’s use of language is also a process of Stimulus-Response. For the theory, Bloomfield put forward the well-known formula:
S → r ......... s → R
Here S stands for practical stimulus, r stands for the substitute reaction of speech, s stands for the substitute stimulus, and R stands for external practical reaction.

III. Lexical meaning
1. Sense and reference
Sense and reference are two terms often encountered in the study of word meaning. They are two related but different aspects of meaning.
Sense is concerned with the inherent meaning of the linguistic form. It is the collection of all the features of the linguistic form; it is abstract and de-contextualized. It is the aspect of meaning dictionary compilers are interested in.
Reference means what a linguistic form refers to in the real physical world; it deals with the relationship between the linguistic element and the non-linguistic world of experience.
2. Major sense relations
Synonymy
Synonymy refers to the sameness or close similarity of meaning. Words that are close in meaning are called synonyms.
Polysemy
While different words may have the same or similar meaning, the same word may have more than one meaning. This is what we call polysemy.
Homonymy
Homonymy refers to the phenomenon that words having different meanings have the same form, i.e., different words are identical in sound or spelling, or in both.
Hyponymy
Hyponymy refers to the sense relation between a more general, more inclusive word and a more specific word.
Antonymy
The term antonymy is used for oppositeness of meaning; words that are opposite in meaning are antonyms.
i. Gradable antonyms; ii. Complementary antonyms; iii. Relational opposites
3. Sense relations between sentences
i. X is synonymous with Y.
ii. X is inconsistent with Y.
iii. X entails Y. (Y is an entailment of X)
iv. X presupposes Y. (Y is a prerequisite of X)
v. X is a contradiction.
vi. X is semantically anomalous.
4. Analysis of meaning
Componential analysis: a way to analyze lexical meaning
Componential analysis is a way proposed by the structural semanticists to analyze word meaning. By componential analysis, a linguist looks at each word as a bundle of different features or components.
Predication analysis: a way to analyze sentence meaning
Whether a sentence is semantically meaningful is governed by rules called selectional restrictions, i.e., constraints on what lexical items can go with what others.

Chapter Six Pragmatics

I. Definition
Pragmatics can be defined in various ways. A general definition is that it is the study of how speakers of a language use sentences to effect successful communication. As the process of communication is essentially a process of conveying and understanding meaning in a certain context, pragmatics can also be regarded as a kind of meaning study.

II. Context
The notion of context is essential to the pragmatic study of language. Context determines the speaker’s use of language and also the hearer’s interpretation of what is said to him.

III. Sentence meaning vs. utterance meaning
While the meaning of a sentence is abstract and decontextualized, that of an utterance is concrete and context-dependent. The meaning of an utterance is based on sentence meaning; it is the realization of the abstract meaning of a sentence in a real situation of communication, or simply in a context.

IV. Speech act theory
1. Austin’s model of speech acts
Speech act theory is an important theory in the pragmatic study of language. It originated with the British philosopher John Austin in the late 1950s. This is a philosophical explanation of the nature of linguistic communication.
It aims to answer the question “What do we do when using language?”
According to speech act theory, we are performing actions when we are speaking. A speaker might be performing three acts simultaneously when speaking: locutionary act, illocutionary act, and perlocutionary act.
2. Searle’s classification of speech acts
According to Searle, an American philosopher, speech acts fall into five general categories, i.e., there are five general types of things we do with language. Specific acts that fall into each type share the same illocutionary point, but differ in their strength.
1) representatives: stating or describing, saying what the speaker believes to be true
2) directives: trying to get the hearer to do something
3) commissives: committing the speaker himself to some future course of action
4) expressives: expressing feelings or attitude towards an existing state
5) declarations: bringing about immediate changes by saying something
3. Principle of conversation
Paul Grice’s idea is that in making conversation, the participants must first of all be willing to cooperate; otherwise, it would not be possible for them to carry on the talk. This general principle is called the Cooperative Principle.
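Supplementary sketch for Chapter Four (phrase structure rules). The four rules NP → (Det) N (PP), VP → (Qual) V (NP), AP → (Deg) A (PP) and PP → (Deg) P (NP) can be tried out mechanically. The Python sketch below is only an illustration under stated assumptions: the names RULES and matches are hypothetical, and the "…" in each rule (further complement options) is deliberately not modeled. It checks whether a sequence of daughter categories fits the rule for a given phrase, treating the parenthesized elements as optional and the head as obligatory.

```python
# Hypothetical encoding of the four phrase structure rules from the notes.
# Each rule is an ordered list of (symbol, is_optional) pairs; the "..."
# (other complement options) in the original rules is not modeled here.
RULES = {
    "NP": [("Det", True), ("N", False), ("PP", True)],
    "VP": [("Qual", True), ("V", False), ("NP", True)],
    "AP": [("Deg", True), ("A", False), ("PP", True)],
    "PP": [("Deg", True), ("P", False), ("NP", True)],
}


def matches(phrase, daughters):
    """Return True if the daughter categories fit the rule for `phrase`."""
    position = 0
    for symbol, is_optional in RULES[phrase]:
        if position < len(daughters) and daughters[position] == symbol:
            position += 1          # this rule element is present
        elif not is_optional:
            return False           # the obligatory head is missing
    # Every daughter must have been consumed by some rule element.
    return position == len(daughters)


if __name__ == "__main__":
    print(matches("NP", ["Det", "N"]))   # specifier + head
    print(matches("NP", ["N", "PP"]))    # head + complement
    print(matches("NP", ["Det"]))        # no head
```

The optional flags mirror the parentheses in the rules, so a phrase consisting of the head alone is accepted while one lacking its head is rejected.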
english accent
A COMPARATIVE ANALYSIS OF UK AND US ENGLISH ACCENTS IN RECOGNITION AND SYNTHESIS
Qin Yan, Saeed Vaseghi
Dept of Electronic and Computer Engineering
Brunel University, Uxbridge, Middlesex, UK UB8 3PH
Qin.Yan@, Saeed.Vaseghi@

ABSTRACT
In this paper, we present a comparative study of the acoustic speech features of two major English accents: British English and American English. Experiments examined the deterioration in speech recognition resulting from a mismatch between the English accent of the input speech and that of the speech models. A mismatch in accents can increase the error rate by more than 100%. Hence a detailed study of the acoustic correlates of accent, using intonation pattern and pitch characteristics, was performed. Accent differences are acoustic manifestations of differences in duration, pitch and intonation pattern, as well as differences in phonetic transcription. In particular, British speakers show much steeper pitch rise and fall patterns and lower average pitch in most vowels. Finally, a possible means of converting English accents is suggested based on the above analysis.

1. INTRODUCTION
In recent years, there have been significant advances in speech recognition systems, resulting in reductions in the error rate. Two of the most important remaining obstacles to reliable high-performance speech recognition systems are noise and speaker variation. An important aspect of speaker variation is accent. However, current speech recognisers are trained on a specific national accent group (e.g. UK or US English accents), and their performance may deteriorate significantly when processing accents unseen in the training data. An understanding of the causes and acoustic properties of English accents can also be quite useful in several areas such as speech synthesis and voice conversion.
In [3] J.C.
Wells described the term accent as a pattern of pronunciation used by a speaker for whom English is the native language or, more generally, by the community or social grouping to which he or she belongs. Linguistically, accent variation lies not only in phonetic characteristics but also in prosody.
There has been considerable research on understanding the causes and the acoustic correlates of native English accents. A study in [3] examined a variety of native English accents from a linguistic point of view. Recently, more focused studies have been made of the acoustic characteristics of English accents. In [4], a method is described that decreases the recognition error rate by automatically generating an accent dictionary through comparison of the standard transcription with the decoded phone sequence. In [1], rather than using phonetic symbols, different regional accents are synthesized from an accent-independent keyword lexicon. During synthesis, input text is first transcribed using the keyword lexicon. Only at the post-lexical stage are accent-dependent allophonic rules applied to deal with such features as /t/ and /d/ tapping in US English or r-linking in British English. The advantage of this method is that it avoids applying different phonetic symbols to represent various accents. In addition, [2] established a voice conversion system between British and US English accents by HMM-based spectral mapping, with a set of rules for mapping between the two different phone sets. However, the converted result still retains some residual characteristics of the original source accent.
In this paper, experiments began with cross-accent recognition to quantify the effect of accent between the British accent (BrA) and the American accent (GenAm) on speech recognition. A further detailed study of the acoustic features of English accents, using duration, intonation and frequency characteristics, was then performed.

2. CROSS ACCENT RECOGNITION
At first, a set of simple experiments was carried out to quantify the effect of accents on speech recognisers with accent-specific dictionaries. The model training and recogniser used here are based on HTK [9]. The British accent speech recogniser was trained on the Continuous Speech Recognition Corpus (WSJCAM0). The American accent speech recogniser was trained on WSJ. The test sets used are WSJ si_dt_05 si_et_05 and WSJCAM si_dt5b, each containing 5k words. Both recognisers employ 3-state left-to-right HMMs. The features used in the experiments were 39-dimensional vectors comprising MFCCs with energy and their delta (differentiation) and acceleration coefficients.

Table 1: % word error rate of cross-accent speech recognition between British and American accents

Input             British model   American model
British input          12.8            29.3
American input         30.6             8.8
Average                21.7            19.1

Table 1 shows that, for this database, American English achieves 31% less error than British English in matched accent conditions. A mismatch between the accent of the speaker and that of the recognition system deteriorates performance. The word error rate increased by 139% when recognizing British English with American models and by 232% when recognizing American English with British models. The results are based on word models compiled from triphone HMMs with three states per model and 20 Gaussian mixture components per state.
The next section examines the acoustic features of both English accents in an attempt to identify where the main differences lie, in addition to the variation in pronunciation.

3. ANALYSIS OF ACOUSTIC FEATURES OF US AND UK ENGLISH ACCENTS
3.1 Duration
Figure 1 shows that the vowel durations at the start and the end of sentences in BrA are shorter than those in GenAm. A possible reason is that British speakers tend to pronounce the last syllable quickly, especially its consonants, whereas Americans tend to realize a more acoustically complete pronunciation.
Table 2 compares the two databases in terms of speaking rate.
The word rate of Wsjcam0 is 7.8% higher than that of Wsj (3.04 versus 2.82 words per second). This is in accordance with the comparison of phone durations in Figure 1. Note that results are only presented for models common to the phone sets of both systems.

Table 2: Speaking rate (units per second) in phones and words for Wsjcam0 and Wsj

            Phone    Word
Wsjcam0      9.77    3.04
Wsj         10.39    2.82

Figure 1: Difference of vowel duration of GenAm and BrA at utterance starts and ends

3.2 Pitch Characteristics
Tables 3 and 4 list the numbers of speakers and the average pitch values from both databases. Figure 2 displays the difference in average vowel pitch frequency between male speakers of the two accents, while Figure 3 shows the corresponding comparison for female speakers. BrA has lower average pitch than GenAm over the whole phone set, and for the common vowels the average pitch in BrA is also much lower than in GenAm. It is interesting to note that, for most vowels, British speakers have lower pitch than their American counterparts. For the common vowel set, British female speakers are on average 118% lower than American female speakers, while the difference drops to 7.7% between British and American male speakers. In accordance with [4], diphthongs such as ay, uw and er display more difference than other vowels. Furthermore, the average pitch frequency of the last word of sentences from male speakers of both accents shows a similar result: British speakers generally speak at a lower pitch than their counterparts. It can also be noted that British male speakers have a higher average pitch in three vowels: uh, ih and ae.

Table 3: Number of speakers in Wsjcam0 and Wsj

            Male   Female
Wsjcam0      112      93
Wsj           37      41

Table 4: Average pitch of Wsjcam0 and Wsj

              Male        Female
Wsjcam0      115.8 Hz    196.2 Hz
Wsj          127.8 Hz    208.9 Hz
Difference     9.4%        5.7%

Figure 2: Difference of average pitch value of vowels of GenAm and BrA (male speakers)
Figure 3: Difference of average pitch value of vowels of GenAm and BrA (female speakers)
Figure 4(a): Average of rise and fall patterns from British and American speakers
Figure 4(b): Average of rise and fall patterns of the last word of the sentences
(Figure 4 axes: x-axis, uniform duration (1.812 ms); y-axis, frequency)

3.3 Prosody
Prosody is usually made up of intonation groups, pitch events and pitch accents. Intonation groups are composed of a sequence of pitch events within a phrase. A pitch event is a combination of a pitch rise and a pitch fall. A pitch accent, either a pitch rise or a pitch fall, is the most elementary unit of intonation.
In [6], a rise fall connection (RFC) model was applied to model the pitch contour by a Legendre polynomial function [a1, a2, a3], where a1, a2 and a3, called discrete Legendre polynomial coefficients, are related to the average contour, the average contour slope and the average trend of the slope within that pitch accent. Rises and falls are detected from the f0 contour. On this basis, the average pattern of pitch accents (fall and rise only in this case) was computed to explore the numerical difference in intonation between the two accents. Figure 4(a) illustrates the average rise and fall patterns from both male and female speakers. It is noticeable that British speakers tend to have steeper rises and falls than American speakers. In particular, for the rise pattern the difference in pitch change rate reaches 34% on average, while the fall pattern gives only a 21% difference.
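The three-coefficient description of a pitch accent can be illustrated with a short sketch. This is not the implementation of [6]: it simply projects an f0 contour, resampled to a uniform grid on [-1, 1], onto the first three Legendre polynomials using a midpoint-rule approximation. The function name `legendre_coefficients` and the sampling scheme are assumptions made for illustration.

```python
def legendre_coefficients(f0):
    """Project an f0 contour onto the first three Legendre polynomials.

    The samples are assumed to be uniformly spaced; they are mapped to the
    midpoints of n equal cells on [-1, 1], so the coefficients do not depend
    on the duration of the pitch accent. Returns (a1, a2, a3), loosely
    corresponding to average level, average slope and trend of the slope.
    """
    n = len(f0)
    if n < 3:
        raise ValueError("need at least three f0 samples")
    xs = [-1.0 + (2 * j + 1) / n for j in range(n)]
    basis = (
        lambda x: 1.0,                        # P0: level
        lambda x: x,                          # P1: slope
        lambda x: 0.5 * (3.0 * x * x - 1.0),  # P2: curvature
    )
    coeffs = []
    for k, p in enumerate(basis):
        # Midpoint-rule estimate of (2k+1)/2 * integral of f(x) * P_k(x) on [-1, 1].
        coeffs.append((2 * k + 1) / n * sum(f * p(x) for f, x in zip(f0, xs)))
    return tuple(coeffs)
```

With such a representation, a steeper rise shows up as a larger second coefficient, so averaging the coefficients over all detected rises (or falls) of each accent group gives one way to compare pitch change rates numerically.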
In addition, it is also noticeable that the pitch range narrows towards the end of an utterance, as in [8].
Further to the result that American speakers tend to speak lower in the final words of sentences, Figure 4(b) indicates that the BrA rise pattern in the last words is much steeper than that of GenAm, with pitch change rates of 48% and 32% respectively. In contrast, the fall pattern is almost the same in both figures. British speakers thus possess much steeper pitch accents than American speakers.

5. DISCUSSIONS AND CONCLUSION
We have presented a detailed study of the acoustic features of two major English accents: BrA and GenAm. In addition to the significant difference in phonetics, the slopes of the rise and fall accents also exhibit a great difference. British speakers tend to speak with lower pitch but a higher pitch change rate, especially in the rise accent. Future experiments are to be extended to other context-dependent pitch pattern analyses besides the utterance end.
In general, accent conversion/synthesis can be simplified into two aspects: phonetics and acoustics. The BEEP dictionary and the CMU dictionary explicitly display the phonetic differences between the two accents in terms of phone substitution, deletion and insertion. In this paper, we began the exploration of the acoustic differences between the two accents in terms of duration, pitch and intonation pattern.
Accent synthesis is therefore planned to proceed in two steps in future experiments.
1) Pronunciation modelling, by transcribing GenAm with BrA phones to map the phonetic differences between the two accents [4], or vice versa.
2) Prosody modification [7] [8]. By applying the Tilt model based on decision-tree HMMs, tilt parameters are changed according to the above analysis. The advantage of the Tilt model lies in its continuous tilt parameters, which describe the intonation pattern better than RFC models or Fujisaki models [7].
A new pitch contour is then synthesized after changing the tilt parameters according to the above study.

6. ACKNOWLEDGEMENTS
This research has been supported by the Department of Computing and Electronic Engineering, Brunel University, UK. We thank Ching-Hsiang Ho for the program for detecting the pitch accents.

7. REFERENCES
[1] Susan Fitt, Stephen Isard, Synthesis of Regional English Using a Keyword Lexicon, Proceedings Eurospeech 99, Vol. 2, pp. 823-6.
[2] Ching-Hsiang Ho, Saeed Vaseghi, Aimin Chen, Voice Conversion between UK and US Accented English, Eurospeech 99.
[3] J.C. Wells, Accents of English, Volumes 1, 2, Cambridge University Press, 1982.
[4] Jason John Humphries, Accent Modelling and Adaptation in Automatic Speech Recognition, PhD thesis, Cambridge University Engineering Department.
[5] Alan Cruttenden, Intonation, Second Edition, 1997.
[6] Ching-Hsiang Ho, Speaker Modelling for Voice Conversion, PhD thesis, Department of Computing and Electronic Engineering, Brunel University.
[7] Thierry Dutoit, Introduction to Text-to-Speech Synthesis, Kluwer, 1997.
[8] Paul Taylor, Analysis and Synthesis of Intonation using the Tilt Model, Journal of the Acoustical Society of America, Vol. 107, No. 3, pp. 1697-1714.
[9] Steve Young, Dan Kershaw, Julian Odell, Dave Ollason, Valtcho Valtchev, Phil Woodland, The HTK Book, V2.2.
Speech-to-text and speech-to-speech summarization of spontaneous speech
Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech
Sadaoki Furui, Fellow, IEEE, Tomonori Kikuchi, Yousuke Shinnaka, and Chiori Hori, Member, IEEE

Abstract: This paper presents techniques for speech-to-text and speech-to-speech automatic summarization based on speech unit extraction and concatenation. For the former case, a two-stage summarization method consisting of important sentence extraction and word-based sentence compaction is investigated. Sentence and word units which maximize the weighted sum of linguistic likelihood, amount of information, confidence measure, and grammatical likelihood of concatenated units are extracted from the speech recognition results and concatenated for producing summaries. For the latter case, sentences, words, and between-filler units are investigated as units to be extracted from original speech. These methods are applied to the summarization of unrestricted-domain spontaneous presentations and evaluated by objective and subjective measures. It was confirmed that the proposed methods are effective in spontaneous speech summarization.
Index Terms: Presentation, speech recognition, speech summarization, speech-to-speech, speech-to-text, spontaneous speech.

Manuscript received May 6, 2003; revised December 11, 2003. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Julia Hirschberg.
S. Furui, T. Kikuchi, and Y. Shinnaka are with the Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan (e-mail: furui@furui.cs.titech.ac.jp; kikuchi@furui.cs.titech.ac.jp; shinnaka@furui.cs.titech.ac.jp).
C. Hori is with the Intelligent Communication Laboratory, NTT Communication Science Laboratories, Kyoto 619-0237, Japan (e-mail: chiori@cslab.kecl.ntt.co.jp).
Digital Object Identifier 10.1109/TSA.2004.828699

I. INTRODUCTION
One of the key applications of automatic speech recognition is to transcribe speech documents such as talks, presentations, lectures, and broadcast news [1]. Although speech is the most natural and effective method of communication between human beings, it is not easy to quickly review, retrieve, and reuse speech documents if they are simply recorded as audio signals. Therefore, transcribing speech is expected to become a crucial capability for the coming IT era. Although high recognition accuracy can be easily obtained for speech read from a text, such as anchor speakers’ broadcast news utterances, the technological ability to recognize spontaneous speech is still limited [2]. Spontaneous speech is ill-formed and very different from written text. It usually includes redundant information such as disfluencies, fillers, repetitions, repairs, and word fragments. In addition, irrelevant information included in a transcription because of recognition errors is usually inevitable. Therefore, an approach in which all words are simply transcribed is not an effective one for spontaneous speech. Instead, speech summarization, which extracts important information and removes redundant and incorrect information, is ideal for recognizing spontaneous speech. Speech
summarization is expected to save time for reviewing speech documents and improve the efficiency of document retrieval.
Summarization results can be presented by either text or speech. The former method has advantages in that: 1) the documents can be easily looked through; 2) the parts of the documents that are interesting for users can be easily extracted; and 3) information extraction and retrieval techniques can be easily applied to the documents. However, it has disadvantages in that wrong information due to speech recognition errors cannot be avoided and prosodic information, such as the emotion of speakers, that is conveyed only in speech cannot be presented. On the other hand, the latter method does not have such disadvantages, and it can preserve all the acoustic information included in the original speech.
Methods for presenting summaries by speech can be classified into two categories: 1) presenting simply concatenated speech segments that are extracted from the original speech or 2) synthesizing summarization text by using a speech synthesizer. Since state-of-the-art speech synthesizers still cannot produce completely natural speech, the former method can easily produce better quality summarizations, and it does not have the problem of synthesizing wrong messages due to speech recognition errors. The major problem in using extracted speech segments is how to avoid unnatural noisy sound caused by the concatenation.
There has been much research in the area of summarizing written language (see [3] for a comprehensive overview). So far, however, very little attention has been given to the question of how to create and evaluate spoken language summarization based on automatically generated transcription from a speech recognizer. One fundamental problem with the summaries produced is that they contain recognition errors and disfluencies. Summarization of dialogues within limited domains has been attempted within the context of the VERBMOBIL project [4].
Zechner and Waibel have investigated how the accuracy of the summaries changes when methods for word error rate reduction are applied in summarizing conversations in television shows [5]. Recent work on spoken language summarization in unrestricted domains has focused almost exclusively on Broadcast News [6], [7]. Koumpis and Renals have investigated the transcription and summarization of voice mail speech [8]. Most of the previous research on spoken language summarization has used relatively long units, such as sentences or speaker turns, as minimal units for summarization.
1063-6676/04$20.00 © 2004 IEEE
This paper investigates automatic speech summarization techniques with the two presentation methods in unrestricted domains. In both cases, the most appropriate sentences, phrases or word units/segments are automatically extracted from the original speech and concatenated to produce a summary, under the constraint that extracted units cannot be reordered or replaced. Only when the summary is presented as text is the transcription modified into a written editorial article style by certain rules. When the summary is presented by speech, a waveform concatenation-based method is used.
Although prosodic features such as accent and intonation could be used for the selection of important parts, reliable methods for automatic and correct extraction of prosodic features from spontaneous speech, and for modeling them, have not yet been established. Therefore, in this paper, input speech is automatically recognized and important segments are extracted based only on the textual information.
Evaluation experiments are performed using spontaneous presentation utterances in the Corpus of Spontaneous Japanese (CSJ) made by the Spontaneous Speech Corpus and Processing Project [9]. The project began in 1999 and is being conducted over a five-year period with the following three major targets.
1) Building a large-scale spontaneous speech corpus (CSJ) consisting of roughly 7M words with a total speech length of 700 h. This mainly
records monologues such as lectures, presentations and news commentaries. Recordings with low spontaneity, such as those from read text, are excluded from the corpus. The utterances are manually transcribed orthographically and phonetically. One-tenth of them, called the Core, are tagged manually and used for training a morphological analysis and part-of-speech (POS) tagging program for automatically analyzing all of the 700-h utterances. The Core is also tagged with para-linguistic information including intonation.
2) Acoustic and language modeling for spontaneous speech understanding using linguistic, as well as para-linguistic, information in speech.
3) Investigating spontaneous speech summarization technology.

II. SUMMARIZATION WITH TEXT PRESENTATION
A. Two-Stage Summarization Method
Fig. 1 shows the two-stage summarization method consisting of important sentence extraction and sentence compaction [10]. Using speech recognition results, the score for important sentence extraction is calculated for each sentence. After removing all the fillers, a set of relatively important sentences is extracted, and sentence compaction using our proposed method [11], [12] is applied to the set of extracted sentences. The ratio of sentence extraction and compaction is controlled according to a summarization ratio initially determined by the user.

Fig. 1. A two-stage automatic speech summarization system with text presentation.

Speech summarization has a number of significant challenges that distinguish it from general text summarization. Applying text-based technologies to speech is not always workable, and often they are not equipped to capture speech-specific phenomena. Speech contains a number of spontaneous effects which are not present in written language, such as hesitations, false starts, and fillers. Speech is, to some extent, always distorted by ungrammatical and various redundant expressions. Speech is also a continuous phenomenon that comes without unambiguous sentence boundaries. In addition, errors in the transcriptions of automatic speech recognition engines can be quite substantial.
Sentence extraction methods, on which most text summarization methods [13] are based, cannot cope with the problems of distorted information and redundant expressions in speech. Although several sentence compression methods have also been investigated in text summarization [14], [15], they rely on discourse and grammatical structures of the input text. Therefore, it is difficult to apply them to spontaneous speech with ill-formed structures. The method proposed in this paper is suitable for ill-formed speech recognition results, since it simultaneously uses various statistical features, including a confidence measure of the speech recognition results. The principle of the speech-to-text summarization method is also used in the speech-to-speech summarization, which will be described in the next section. Speech-to-speech summarization is a comparatively much younger discipline and has not yet been investigated in the same framework as speech-to-text summarization.
1) Important Sentence Extraction: Important sentence extraction is performed according to the following score for each sentence $W = w_1, w_2, \ldots, w_N$, obtained as a result of speech recognition:

$$S(W) = \frac{1}{N}\sum_{i=1}^{N}\left\{ L(w_i) + \lambda_I I(w_i) + \lambda_C C(w_i) \right\} \qquad (1)$$

where $N$ is the number of words in the sentence $W$, and $L(w_i)$, $I(w_i)$, and $C(w_i)$ are the linguistic score, the significance score, and the confidence score of word $w_i$, respectively.
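The sentence score can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it averages, over the words of a sentence, the linguistic score plus the weighted significance and confidence scores, and then keeps the highest-scoring sentences in their original order. The names `WordScores`, `sentence_score`, `extract_sentences`, and the default weights of 1.0 are assumptions introduced here; the per-word scores are taken as given inputs.

```python
from dataclasses import dataclass


@dataclass
class WordScores:
    linguistic: float    # L(w_i), e.g. a log n-gram probability
    significance: float  # I(w_i), amount of information carried by the word
    confidence: float    # C(w_i), e.g. a log posterior from the recognizer


def sentence_score(words, lambda_i=1.0, lambda_c=1.0):
    """Average of L + lambda_i * I + lambda_c * C over the words of a sentence."""
    if not words:
        raise ValueError("sentence must contain at least one word")
    total = sum(
        w.linguistic + lambda_i * w.significance + lambda_c * w.confidence
        for w in words
    )
    return total / len(words)


def extract_sentences(sentences, keep, lambda_i=1.0, lambda_c=1.0):
    """Indices of the `keep` best-scoring sentences, in original order.

    Returning indices in ascending order respects the constraint that
    extracted units are not reordered.
    """
    ranked = sorted(
        range(len(sentences)),
        key=lambda k: sentence_score(sentences[k], lambda_i, lambda_c),
        reverse=True,
    )
    return sorted(ranked[:keep])
```

Normalizing by the number of words keeps long sentences from being penalized merely for accumulating more negative log-probability terms; how many sentences to keep would in practice follow from the target summarization ratio.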
Although sentence boundaries can be estimated using linguistic and prosodic information [16], they are manually given in the experiments in this paper. The three scores are a subset of the scores originally used in our sentence compaction method and are considered to be useful also as measures indicating the appropriateness of including the sentence in the summary. $\lambda_I$ and $\lambda_C$ are weighting factors for balancing the scores. Details of the scores are as follows.

FURUI et al.: SPEECH-TO-TEXT AND SPEECH-TO-SPEECH SUMMARIZATION 403

Linguistic score: The linguistic score $L(w_i)$ indicates the linguistic likelihood of word strings in the sentence and is measured by n-gram probability:

$$L(w_i) = \log P(w_i \mid w_{i-n+1} \cdots w_{i-1}) \qquad (2)$$

In our experiment, trigram probability calculated using transcriptions of presentation utterances in the CSJ consisting of 1.5M morphemes (words) is used. This score de-weights linguistically unnatural word strings caused by recognition errors.
Significance score: The significance score $I(w_i)$ indicates the significance of each word $w_i$ in the sentence and is measured by the amount of information. The amount of information contained in each word is calculated for content words, including nouns, verbs, adjectives and out-of-vocabulary (OOV) words, based on word occurrence in a corpus as shown in (3). The POS information for each word is obtained from the recognition result, since every word in the dictionary is accompanied by a unique POS tag. A flat score is given to other words.

$$I(w_i) = f_i \log \frac{F_A}{F_i} \qquad (3)$$

where $f_i$ is the number of occurrences of $w_i$ in the recognized utterances, $F_i$ is the number of occurrences of $w_i$ in a large-scale corpus, and $F_A$ is the number of all content words in that corpus, that is, $F_A = \sum_i F_i$.
For measuring the significance score, the number of occurrences of 120 000 kinds of words is calculated in a corpus consisting of transcribed presentations (1.5M words), proceedings of 60 presentations, presentation records obtained from the World-Wide Web (WWW) (2.1M words), NHK (Japanese broadcast company) broadcast news text (22M words), Mainichi newspaper text (87M words) and text from a
speech textbook “Speech Information Processing” (51 000 words). Important keywords are weighted and the words unrelated to the original content, such as recognition errors, are de-weighted by this score.
Confidence score: The confidence score $C(w_i)$ is incorporated to weight acoustically as well as linguistically reliable hypotheses. Specifically, a logarithmic value of the posterior probability for each transcribed word, which is the ratio of a word hypothesis probability to that of all other hypotheses, is calculated using a word graph obtained by a decoder and used as a confidence score.
2) Sentence Compaction: After removing relatively less important sentences, the remaining transcription is automatically modified into a written editorial article style to calculate the score for sentence compaction. All the sentences are concatenated while preserving sentence boundaries, and a linguistic score $L(w_i)$, a significance score $I(w_i)$, and a confidence score $C(w_i)$ are given to each transcribed word. A word concatenation score for every combination of words within each transcribed sentence is also given to weight a word concatenation between words. This score is a measure of the dependency between two words and is obtained by a phrase structure grammar, the stochastic dependency context-free grammar (SDCFG). A set of words that maximizes a weighted sum of these scores is selected according to a given compression ratio and connected to create a summary using a two-stage dynamic programming (DP) technique. Specifically, each sentence is summarized according to all possible compression ratios, and then the best combination of summarized sentences is determined according to a target total compression ratio.
Ideally, the linguistic score should be calculated using a word concatenation model based on a large-scale summary corpus. Since such a summary corpus is not yet available, the transcribed presentations used to calculate the word trigrams for the important sentence extraction are automatically modified into a written editorial
article style and used together with the proceedings of 60 presentations to calculate the trigrams.
The significance score is calculated using the same corpus as that used for calculating the score for important sentence extraction. The word-dependency probability is estimated by the Inside-Outside algorithm, using a manually parsed Mainichi newspaper corpus having 4M sentences with 68M words. For the details of the SDCFG and dependency scores, readers should refer to [12].

B. Evaluation Experiments
1) Evaluation Set: Three presentations, M74, M35, and M31, in the CSJ by male speakers were summarized at summarization ratios of 70% and 50%. The summarization ratio was defined as the ratio of the number of characters in the summaries to that in the recognition results. Table I shows features of the presentations, that is, length, mean word recognition accuracy, number of sentences, number of words, number of fillers, filler ratio, and number of disfluencies including repairs of each presentation. They were manually segmented into sentences before recognition. The table shows that the presentation M35 has a significantly large number of disfluencies and a low recognition accuracy, and M31 has a significantly high filler ratio.
2) Summarization Accuracy: To objectively evaluate the summaries, correctly transcribed presentation speech was manually summarized by nine human subjects to create targets. Devising meaningful evaluation criteria and metrics for speech summarization is a problematic issue. Speech does not have explicit sentence boundaries, in contrast with text input. Therefore, speech summarization results cannot be evaluated using the F-measure based on sentence units. In addition, since words (morphemes) within sentences are extracted and concatenated in the summarization process, variations of target summaries made by human subjects are much larger than those using the sentence level method. In almost all cases, an “ideal” summary does not exist. For these reasons, variations of the manual
summarization results were merged into a word network as shown in Fig. 2, which is considered to approximately express all possible correct summaries covering subjective variations. Word accuracy of the summary is then measured in comparison with the closest word string extracted from the word network as the summarization accuracy [5].

404 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 12, NO. 4, JULY 2004

TABLE I. Evaluation set
Fig. 2. Word network made by merging manual summarization results.

3) Evaluation Conditions: Summarization was performed under the following nine conditions: single-stage summarization without applying the important sentence extraction (NOS); two-stage summarization using the seven possible combinations of scores for important sentence extraction (L, I, C, L_I, I_C, L_C, L_I_C); and summarization by random word selection. The weighting factors for the significance and confidence scores ($\lambda_I$ and $\lambda_C$) were set at optimum values for each experimental condition.

C. Evaluation Results
1) Summarization Accuracy: Results of the evaluation experiments are shown in Figs. 3 and 4. In all the automatic summarization conditions, both the one-stage method without sentence extraction and the two-stage method including sentence extraction achieve better results than random word selection. In both the 70% and 50% summarization conditions, the two-stage method achieves higher summarization accuracy than the one-stage method. The two-stage method is more effective in the condition of the smaller summarization ratio (50%), that is, where there is a higher compression ratio, than in the condition of the larger summarization ratio (70%). In the 50% summarization condition, the two-stage method is effective for all three presentations. The two-stage method is especially effective for avoiding one of the problems of the one-stage method, that is, the production of short unreadable and/or incomprehensible sentences.
Comparing the three scores for sentence extraction, the significance score (I) is more effective than the linguistic score (L) and the confidence score (C). The
summarization score can be increased by using a combination of two scores (L_I, I_C, L_C), and even more by combining all three scores (L_I_C). The differences are, however, statistically insignificant in these experiments, due to the limited size of the data.

Fig. 3. Results of the summarization with text presentation at 50% summarization ratio.
Fig. 4. Results of the summarization with text presentation at 70% summarization ratio.

2) Effects of the Ratio of Compression by Sentence Extraction: Figs. 5 and 6 show the summarization accuracy as a function of the ratio of compression by sentence extraction for the total summarization ratios of 50% and 70%. The left and right ends of the figures correspond to summarization by only sentence compaction and by only sentence extraction, respectively. These results indicate that although the best summarization accuracy of each presentation is obtained at a different ratio of compression by sentence extraction, there is a general tendency: the smaller the summarization ratio becomes, the larger the optimum ratio of compression by sentence extraction becomes. That is, sentence extraction becomes more effective when the summarization ratio gets smaller.
Comparing results at the left and right ends of the figures, summarization by word extraction (i.e., sentence compaction) is more effective than sentence extraction for the M35 presentation. This presentation includes a relatively large amount of redundant information, such as disfluencies and repairs, and has a significantly low recognition accuracy. These results indicate that the optimum division of the compression ratio into the two summarization stages needs to be estimated according to the specific summarization ratio and features of the presentation in question, such as the frequency of disfluencies.

III. SUMMARIZATION WITH SPEECH PRESENTATION

A. Unit Selection and Concatenation

1) Units for Extraction: The following issues need to be addressed in extracting and concatenating speech segments for making summaries.
1) Units for extraction: sentences, phrases, or words.
2) Criteria for measuring the importance of units for extraction.
3) Concatenation methods for making summary speech.

The following three units are investigated in this paper: sentences, words, and between-filler units. All the fillers automatically detected as the result of recognition are removed before extracting important segments.

Sentence units: The method described in Section II-A.1 is applied to the recognition results to extract important sentences. Since sentences are basic linguistic as well as acoustic units, it is easy to maintain acoustical smoothness by using sentences as units, and therefore the concatenated speech sounds natural. However, since the units are relatively long, they tend to include unnecessary words. Since fillers are automatically removed even if they are included within sentences as described above, the sentences are cut and shortened at the position of fillers.

Word units: Word sets are extracted and concatenated by applying the method described in Section II-A.2 to the recognition results. Although this method has the advantage that important parts can be precisely extracted in small units, it tends to cause acoustical discontinuity since many small units of speech need to be concatenated. Therefore, summarization speech made by this method sometimes sounds unnatural.

Between-filler units: Speech segments between fillers as well as sentence boundaries are extracted using speech recognition results. The same method as that used for extracting sentence units is applied to evaluate these units. These units are introduced as intermediate units between sentences and words, in anticipation of both reasonably precise extraction of important parts and naturalness of speech with acoustic continuity.

Fig. 5. Summarization accuracy as a function of the ratio of compression by sentence extraction for the total summarization ratio of 50%.

Fig. 6. Summarization accuracy as a function of the ratio of compression by sentence extraction for the total summarization ratio of 70%.

2) Unit Concatenation: Units for building summarization speech are extracted from the original speech by using segmentation boundaries obtained from speech recognition results. When units are concatenated inside sentences, noise may be produced by the difference in amplitude between the speech waveforms.
In order to avoid this problem, the amplitudes over approximately 20 ms at the unit boundaries are gradually attenuated before the concatenation. Since this gives the impression of an increased speaking rate and thus creates an unnatural sound, a short pause is inserted. The length of the pause is set empirically between 50 and 100 ms according to the concatenation conditions. Each summarization speech made by this method is hereafter referred to as a "summarization speech sentence," and the text corresponding to its speech period is referred to as a "summarization text sentence."

The summarization speech sentences are further concatenated to create summarized speech for the whole presentation. Speech waveforms at sentence boundaries are gradually attenuated and pauses are inserted between the sentences in the same way as for the unit concatenation within sentences. Short and long pauses of 200 and 700 ms are used as pauses between sentences. Long pauses are inserted after sentence-ending expressions; otherwise short pauses are used. In the case of summarization by word-unit concatenation, long pauses are always used, since many sentences terminate with nouns and need relatively long pauses to make them sound natural.

TABLE II. SUMMARIZATION ACCURACY AND NUMBER OF UNITS FOR THE THREE KINDS OF SUMMARIZATION UNITS

B. Evaluation Experiments

1) Experimental Conditions: The three presentations, M74, M35, and M31, were automatically summarized with a summarization ratio of 50%. Summarization accuracies for the three presentations using sentence units, between-filler units, and word units are given in Table II. Manual summaries made by nine human subjects were used for the evaluation. The table also shows the number of automatically detected units in each condition. For the between-filler units, the number of detected fillers is also shown.

Using the summarization text sentences, speech segments were extracted and concatenated to build summarization speech, and a subjective evaluation by 11 subjects was performed in terms of ease of understanding and appropriateness as a summarization, on five levels: 1 (very bad), 2 (bad), 3 (normal), 4 (good), and 5 (very good). The subjects were instructed to read the transcriptions of the presentations and understand the contents before hearing the summarization speech.

Fig. 7. Evaluation results for the summarization with speech presentation in terms of the ease of understanding.

Fig. 8. Evaluation results for the summarization with speech presentation in terms of the appropriateness as a summary.

2) Evaluation Results and Discussion: Figs. 7 and 8 show the evaluation results. Averaging over the three presentations, the sentence units show the best results whereas the word units show the worst. For the two presentations M74 and M35, the between-filler units achieve almost the same results as the sentence units. The word units, which show slightly better summarization accuracy in Table II, show the worst subjective evaluation results here because of the unnatural sound caused by concatenating short speech units. The relatively large number of fillers included in the presentation M31 produced many short units when the between-filler unit method was applied. This is why between-filler units show worse subjective results than the sentence units for M31. If the summarization ratio is set lower than 50%, between-filler units are expected to achieve better results than sentence units, since sentence units cannot remove redundant expressions within sentences.

IV. CONCLUSION

In this paper, we have presented techniques for compaction-based automatic speech summarization and evaluation results for summarizing spontaneous presentations. The summarization results are presented by either text or speech. In the former case, the speech-to-text summarization, we proposed a two-stage automatic speech summarization method consisting of important sentence extraction and word-based sentence compaction. In this method, inadequate sentences including recognition errors and less important information are automatically removed before sentence compaction. It was confirmed that in spontaneous presentation speech summarization at 70% and 50% summarization ratios, combining sentence extraction with sentence compaction is effective; this method achieves better summarization performance than our previous one-stage method. It was also confirmed that three scores, the linguistic score, the word significance score, and the word confidence score, are effective for extracting important sentences. The best division of the summarization ratio into the ratios of sentence extraction and sentence compaction depends on the summarization ratio and features of the presentation utterances.

For the case of presenting summaries by speech, the speech-to-speech summarization, three kinds of units (sentences, words, and between-filler units) were investigated as units to be extracted from the original speech and concatenated to produce the summaries. A set of units is automatically extracted using the same measures used in the speech-to-text summarization, and the speech segments corresponding to the extracted units are concatenated to produce the summaries.
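The concatenation procedure of Section III-A.2 can be illustrated with a short sketch. The 20-ms boundary attenuation, the 50 to 100 ms pause inside sentences, and the 200/700-ms pauses between sentences come from the text; the linear shape of the fade, the 75-ms default, and all function names are assumptions made only for illustration, and waveforms are represented as plain lists of samples.

```python
def fade_edges(samples, sample_rate, fade_ms=20.0):
    """Attenuate roughly fade_ms at both ends of a unit (linear ramp assumed)."""
    out = list(samples)
    n = min(int(sample_rate * fade_ms / 1000.0), len(out) // 2)
    for i in range(n):
        gain = i / n                      # 0.0 at the boundary, rising inward
        out[i] *= gain                    # fade in
        out[len(out) - 1 - i] *= gain     # fade out
    return out


def concatenate_units(units, sample_rate, pause_ms=75.0):
    """Join faded units, inserting a short silent pause between neighbours.

    pause_ms is a hypothetical default inside the 50-100 ms range that the
    paper says was chosen empirically.
    """
    silence = [0.0] * int(sample_rate * pause_ms / 1000.0)
    result = []
    for k, unit in enumerate(units):
        if k > 0:
            result.extend(silence)
        result.extend(fade_edges(unit, sample_rate))
    return result


def sentence_pause_ms(ends_with_sentence_final_expression, word_unit_mode=False):
    """Pause between summarization speech sentences, following the text:
    700 ms after sentence-ending expressions (and always for word units),
    otherwise 200 ms."""
    if word_unit_mode or ends_with_sentence_final_expression:
        return 700
    return 200


if __name__ == "__main__":
    rate = 1000                           # toy sample rate: 1 sample per ms
    unit = [1.0] * 100
    joined = concatenate_units([unit, unit], rate, pause_ms=50.0)
    print(len(joined), joined[0], joined[50], sentence_pause_ms(False))
```

A linear ramp is only one possible attenuation curve; the paper says the amplitudes are "gradually attenuated" without specifying the shape.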
Amplitudes of speech waveforms at the boundaries are gradually attenuated and pauses are inserted before concatenation to avoid acoustic discontinuity. Subjective evaluation results for the 50% summarization ratio indicated that sentence units achieve the best subjective evaluation score. Between-filler units are expected to achieve good performance when the summarization ratio becomes smaller.

As stated in the introduction, speech summarization technology can be applied to any kind of speech document and is expected to play an important role in building various speech archives including broadcast news, lectures, presentations, and interviews. Summarization and question answering (QA) perform a similar task, in that they both map an abundance of information to a (much) smaller piece to be presented to the user [17]. Therefore, speech summarization research will help the advancement of QA systems using speech documents. By condensing important points of long presentations and lectures, speech-to-speech summarization can provide the listener with a valuable means for absorbing much information in a much shorter time.

Future research includes evaluation using a large number of presentations at various summarization ratios including smaller ratios, investigation of other information/features for important unit extraction, methods for automatically segmenting a presentation into sentence units [16], the effects of those methods on summarization accuracy, and automatic optimization of the division of the compression ratio into the two summarization stages according to the summarization ratio and features of the presentation.

ACKNOWLEDGMENT

The authors would like to thank NHK (Japan Broadcasting Corporation) for providing the broadcast news database.

REFERENCES

[1] S. Furui, K. Iwano, C. Hori, T. Shinozaki, Y. Saito, and S. Tamura, "Ubiquitous speech processing," in Proc. ICASSP 2001, vol. 1, Salt Lake City, UT, 2001, pp. 13–16.
[2] S. Furui, "Recent advances in spontaneous speech recognition and understanding," in Proc. ISCA-IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, Japan, 2003.
[3] I. Mani and M. T. Maybury, Eds., Advances in Automatic Text Summarization. Cambridge, MA: MIT Press, 1999.
[4] J. Alexandersson and P. Poller, "Toward multilingual protocol generation for spontaneous dialogues," in Proc. INLG-98, Niagara-on-the-Lake, Canada, 1998.
[5] K. Zechner and A. Waibel, "Minimizing word error rate in textual summaries of spoken language," in Proc. NAACL, Seattle, WA, 2000.
[6] J. S. Garofolo, E. M. Voorhees, C. G. P. Auzanne, and V. M. Stanford, "Spoken document retrieval: 1998 evaluation and investigation of new metrics," in Proc. ESCA Workshop: Accessing Information in Spoken Audio, Cambridge, MA, 1999, pp. 1–7.
[7] R. Valenza, T. Robinson, M. Hickey, and R. Tucker, "Summarization of spoken audio through information extraction," in Proc. ISCA Workshop on Accessing Information in Spoken Audio, Cambridge, MA, 1999, pp. 111–116.
[8] K. Koumpis and S. Renals, "Transcription and summarization of voicemail speech," in Proc. ICSLP 2000, 2000, pp. 688–691.
[9] K. Maekawa, H. Koiso, S. Furui, and H. Isahara, "Spontaneous speech corpus of Japanese," in Proc. LREC 2000, Athens, Greece, 2000, pp. 947–952.
[10] T. Kikuchi, S. Furui, and C. Hori, "Two-stage automatic speech summarization by sentence extraction and compaction," in Proc. ISCA-IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo, Japan, 2003.
[11] C. Hori and S. Furui, "Advances in automatic speech summarization," in Proc. Eurospeech 2001, 2001, pp. 1771–1774.
[12] C. Hori, S. Furui, R. Malkin, H. Yu, and A. Waibel, "A statistical approach to automatic speech summarization," EURASIP J. Appl. Signal Processing, pp. 128–139, 2003.
[13] K. Knight and D. Marcu, "Summarization beyond sentence extraction: A probabilistic approach to sentence compression," Artific. Intell., vol. 139, pp. 91–107, 2002.
[14] H. Daume III and D. Marcu, "A noisy-channel model for document compression," in Proc. ACL-2002, Philadelphia, PA, 2002, pp. 449–456.
[15] C.-Y. Lin and E. Hovy, "From single to multi-document summarization: A prototype system and its evaluation," in Proc. ACL-2002, Philadelphia, PA, 2002, pp. 457–464.
[16] M. Hirohata, Y. Shinnaka, and S. Furui, "A study on important sentence extraction methods using SVD for automatic speech summarization," in Proc. Acoustical Society of Japan Autumn Meeting, Nagoya, Japan, 2003.
[17] K. Zechner, "Spoken language condensation in the 21st Century," in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 1989–1992.
IEEE Signal Processing Letters: journal listing, page 19 (50 entries)

1. Robust Video Hashing Based on Double-Layer Embedding
2. Removal of High Density Salt and Pepper Noise Through Modified Decision Based Unsymmetric Trimmed Median Filter
3. Performance Comparison of Feature-Based Detectors for Spectrum Sensing in the Presence of Primary User Traffic
4. An Optimal FIR Filter With Fading Memory
5. Piecewise-and-Forward Relaying in Wireless Relay Networks
6. Non-Shift Edge Based Ratio (NSER): An Image Quality Assessment Metric Based on Early Vision Features
7. Joint Optimization of the Worst-Case Robust MMSE MIMO Transceiver
8. A New Initialization Method for Frequency-Domain Blind Source Separation Algorithms
9. A Method For Fine Resolution Frequency Estimation From Three DFT Samples
10. Position-Patch Based Face Hallucination Using Convex Optimization
11. Signal Fitting With Uncertain Basis Functions
12. Optimal Filtering Over Uncertain Wireless Communication Channels
13. The Student's t-Hidden Markov Model With Truncated Stick-Breaking Priors
14. IEEE Signal Processing Society Information
15. Acoustic Model Adaptation Based on Tensor Analysis of Training Models
16. On Estimating the Number of Co-Channel Interferers in MIMO Cellular Systems
17. Period Estimation in Astronomical Time Series Using Slotted Correntropy
18. Multidimensional Shrinkage-Thresholding Operator and Group LASSO Penalties
19. Enhanced Seam Carving via Integration of Energy Gradient Functionals
20. Backtracking-Based Matching Pursuit Method for Sparse Signal Reconstruction
21. Performance Bounds of Network Coding Aided Cooperative Multiuser Systems
22. Table of Contents
23. Bayesian Estimation With Imprecise Likelihoods: Random Set Approach
24. Low-Complexity Channel-Estimate Based Adaptive Linear Equalizer
25. Tensor Versus Matrix Completion: A Comparison With Application to Spectral Data
26. Joint DOD and DOA Estimation for MIMO Array With Velocity Receive Sensors
27. Regularized Subspace Gaussian Mixture Models for Speech Recognition
28. Handoff Optimization Using Hidden Markov Model
29. Standard Deviation for Obtaining the Optimal Direction in the Removal of Impulse Noise
30. Energy Detection Limits Under Log-Normal Approximated Noise Uncertainty
31. Joint Subspace Learning for View-Invariant Gait Recognition
32. GMM-Based KLT-Domain Switched-Split Vector Quantization for LSF Coding
33. Complexity Reduced Face Detection Using Probability-Based Face Mask Prefiltering and Pixel-Based Hierarchical-Feature Adaboosting
34. RLS Algorithm With Convex Regularization
35. Solvability of the Zero-Pinning Technique to Orthonormal Wavelet Design
36. Power Spectrum Blind Sampling
37. Noise Folding in Compressed Sensing
38. Fast Maximum Likelihood Scale Parameter Estimation From Histogram Measurements
39. Elastic-Transform Based Multiclass Gaussianization
40. Improving Detection of Acoustic Signals by Means of a Time and Frequency Multiple Energy Detector
41. Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection
42. Performance Analysis of Dual-Hop AF Systems With Interference in Nakagami-$m$ Fading Channels
43. Illumination Normalization Based on Weber's Law With Application to Face Recognition
44. A Robust Replay Detection Algorithm for Soccer Video
45. Regularized Adaptive Algorithms-Based CIR Predictors for Time-Varying Channels in OFDM Systems
46. A Novel Semi-Blind Selected Mapping Technique for PAPR Reduction in OFDM
47. Widely Linear Simulation of Continuous-Time Complex-Valued Random Signals
48. A Generalized Poisson Summation Formula and its Application to Fast Linear Convolution
49. Multiple-Symbol Differential Sphere Detection Aided Differential Space-Time Block Codes Using QAM Constellations
50. Low Rank Language Models for Small Training Sets
Linguistics: Answers to the Exercises, Chapter 2
1.
phonetics: the study of how speech sounds are produced, transmitted, and perceived. It can be divided into three main areas of study: articulatory phonetics, acoustic phonetics, and perceptual/auditory phonetics.
articulatory phonetics: the study of the production of speech sounds, that is, of how speech sounds are made.
phonology: the study of the sound patterns and sound systems of languages. It aims to discover the principles that govern the way sounds are organized in languages, and to explain the variations that occur.
speech organs: those parts of the human body involved in the production of speech, also known as 'vocal organs'.
voicing: the vibration of the vocal folds. When the vocal folds are close together, the airstream causes them to vibrate against each other and the resultant sound is said to be 'voiced'. When the vocal folds are apart and the air can pass through easily, the sound produced is said to be 'voiceless'.
International Phonetic Alphabet: a set of standard phonetic symbols in the form of a chart (the IPA chart), developed by the International Phonetic Association since 1888. It has been revised from time to time to include new discoveries and changes in phonetic theory and practice. The latest version was revised in 1993 and updated in 1996.
consonant: a major category of sound segments, produced by a closure in the vocal tract, or by a narrowing which is so marked that air cannot escape without producing audible friction.
vowel: a major category of sound segments, produced without obstruction of the vocal tract so that air escapes in a relatively unimpeded way through the mouth or the nose.
manner of articulation: ways in which articulation of consonants can be accomplished: (a) the articulators may close off the oral tract for an instant or a relatively long period; (b) they may narrow the space considerably; or (c) they may simply modify the shape of the tract by approaching each other.
place of articulation: the point where an obstruction to the flow of air is made in producing a consonant.
Cardinal Vowels: a set of vowel qualities arbitrarily defined, fixed and unchanging, intended to provide a frame of reference for the description of the actual vowels of existing languages.
semi-vowel: segments that are neither consonants nor vowels, e.g. [j] and [w].
vowel glide: vowels that involve a change of quality, including diphthongs, in which a single movement of the tongue is made, and triphthongs, in which a double movement is perceived.
coarticulation: simultaneous or overlapping articulations, as when the nasal quality of a nasal sound affects the preceding or following sound so that the latter becomes nasalized. If the affected sound becomes more like the following sound, it is known as 'anticipatory coarticulation'; if the sound shows the influence of the preceding sound, it is 'perseverative coarticulation'.
phoneme: a unit of explicit sound contrast. If two sounds in a language make a contrast between two different words, they are said to be different phonemes.
allophone: variants of the same phoneme. If two or more phonetically different sounds do not make a contrast in meaning, they are said to be allophones of the same phoneme.
To be allophones, they must be in complementary distribution and bear phonetic similarity.
assimilation: a process by which one sound takes on some or all of the characteristics of a neighboring sound, a term often used synonymously with 'coarticulation'. If a following sound influences a preceding sound, it is called 'regressive assimilation'; the converse process, in which a preceding sound influences a following sound, is known as 'progressive assimilation'.
Elsewhere Condition: the more specific rule applies first. It is invoked when two or more rules are involved in deriving the surface form from the underlying form.
distinctive features: a means of working out a set of phonological contrasts or oppositions to capture particular aspects of language sounds, first suggested by Roman Jakobson in the 1940s and then developed by numerous other people.
syllable: an important unit in the study of suprasegmentals. A syllable must have a nucleus or peak, which is often the task of a vowel or possibly that of a syllabic consonant, and often involves an optional set of consonants before and/or after the nucleus.
Maximal Onset Principle: a principle for dividing the syllables when there is a cluster of consonants between two vowels, which states that when there is a choice as to where to place a consonant, it is put into the onset rather than the coda.
stress: the degree of force used in producing a syllable. When a syllable is produced with more force and is therefore more 'prominent', it is a 'stressed' syllable in contrast to a less prominent, 'unstressed' syllable.
intonation: the occurrence of recurring fall-rise patterns, each of which is used with a set of relatively consistent meanings, either on single words or on groups of words of varying length.
tone: a set of fall-rise patterns affecting the meanings of individual words.

8.
In Old English, there are no voiced fricative phonemes.
All voiced variants, which appear only between voiced sounds, are allophones of their voiceless counterparts.
The rule can be stated as follows:
fricatives → [+voice] / [+voice] _____ [+voice]
             [–voice] in other places

2.
1) voiced dental fricative
2) voiceless postalveolar fricative
3) velar nasal
4) voiced alveolar stop/plosive
5) voiceless bilabial stop/plosive
6) voiceless velar stop/plosive
7) (alveolar) lateral
8) high front unrounded lax vowel
9) high back rounded tense vowel
10) low back rounded lax vowel

3.
1) [f]
2) [ʒ]
3) [j]
4) [h]
5) [t]
6) [e]
7) [ʉ]
8) [ɶ]
9) [ɔ]
10) [u]

4.
1) On a clear day you can see for miles.
2) Some people think that first impressions count for a lot.

5. 1)
Quite a few human organs are involved in the production of speech: the lungs, the trachea (or windpipe), the throat, the nose, and the mouth. The pharynx, mouth, and nose form the three cavities of the vocal tract. Speech sounds are produced with an airstream as their source of energy. In most circumstances, the airstream comes from the lungs. It is forced out of the lungs and then passes through the bronchioles and bronchi, a series of branching tubes, into the trachea. Then the air is modified at various points in various ways in the larynx, and in the oral and nasal cavities: the mouth and the nose are often referred to, respectively, as the oral cavity and the nasal cavity.
Inside the oral cavity, we need to distinguish the tongue and various parts of the palate, while inside the throat, we have to distinguish the upper part, called the pharynx, from the lower part, known as the larynx. The larynx opens into a muscular tube, the pharynx, part of which can be seen in a mirror. The upper part of the pharynx connects to the oral and nasal cavities.
The contents of the mouth are very important for speech production. Starting from the front, the upper part of the mouth includes the upper lip, the upper teeth, the alveolar ridge, the hard palate, the soft palate (or the velum), and the uvula.
The soft palate can be lowered to allow air to pass through the nasal cavity. When the oral cavity is at the same time blocked, a nasal sound is produced.
The bottom part of the mouth contains the lower lip, the lower teeth, the tongue, and the mandible.
At the top of the trachea is the larynx, the front of which protrudes in males and is known as the "Adam's Apple". The larynx contains the vocal folds, also known as "vocal cords" or "vocal bands". The vocal folds are a pair of structures that lie horizontally below the latter, and their front ends are joined together at the back of the Adam's Apple. Their rear ends, however, remain separated and can move into various positions: inwards, outwards, forwards, backwards, upwards, and downwards.

5. 2)
This is because gh is pronounced as [f] in enough, o as [ɪ] in women, and ti as [ʃ] in nation.

5. 3)
In the production of consonants at least two articulators are involved. For example, the initial sound in bad involves both lips and its final segment involves the blade (or the tip) of the tongue and the alveolar ridge. The categories of consonant, therefore, are established on the basis of several factors. The most important of these factors are: (a) the actual relationship between the articulators and thus the way in which the air passes through certain parts of the vocal tract, and (b) where in the vocal tract there is approximation, narrowing, or the obstruction of air. The former is known as the Manner of Articulation and the latter as the Place of Articulation.
The Manner of Articulation refers to ways in which articulation can be accomplished: (a) the articulators may close off the oral tract for an instant or a relatively long period; (b) they may narrow the space considerably; or (c) they may simply modify the shape of the tract by approaching each other.
The Place of Articulation refers to the point where a consonant is made. Practically, consonants may be produced at any place between the lips and the vocal folds.
Eleven places of articulation are distinguished on the IPA chart.
As the vowels cannot be described in the same way as the consonants, a system of cardinal vowels has been suggested to solve this problem. The cardinal vowels, as exhibited by the vowel diagram in the IPA chart, are a set of vowel qualities arbitrarily defined, fixed and unchanging, intended to provide a frame of reference for the description of the actual vowels of existing languages.
The cardinal vowels are abstract concepts. If we imagine that for the production of [ə] the tongue is in a neutral position (neither high nor low, neither front nor back), the cardinal vowels are as remote as possible from this neutral position. They represent extreme points of a theoretical vowel space: extending the articulators beyond this space would involve friction or contact. The cardinal vowel diagram (or quadrilateral) in the IPA is therefore a set of hypothetical positions for vowels used as reference points.
The front, center, and back of the tongue are distinguished, as are four levels of tongue height: the highest position the tongue can achieve without producing audible friction (high or close); the lowest position the tongue can achieve (low or open); and two intermediate levels, dividing the intervening space into auditorily equivalent areas (mid-high or close-mid, and mid-low or open-mid).
5. 4)
Both phonetics and phonology study human speech sounds but they differ in the levels of analysis. Phonetics studies how speech sounds are produced, transmitted, and perceived. Imagine that the speech sound is articulated by a Speaker A. It is then transmitted to and perceived by a Listener B.
Consequently, a speech sound goes through a three-step process: speech production, sound transmission, and speech perception.
Naturally, the study of sounds is divided into three main areas, each dealing with one part of the process: Articulatory Phonetics is the study of the production of speech sounds, Acoustic Phonetics is the study of the physical properties of speech sounds, and Perceptual or Auditory Phonetics is concerned with the perception of speech sounds.
Phonology is the study of the sound patterns and sound systems of languages. It aims to discover the principles that govern the way sounds are organized in languages, and to explain the variations that occur.
In phonology we normally begin by analyzing an individual language, say English, in order to determine its phonological structure, i.e. which sound units are used and how they are put together. Then we compare the properties of sound systems in different languages in order to make hypotheses about the rules that underlie the use of sounds in them, and ultimately we aim to discover the rules that underlie the sound patterns of all languages.
5. 5)
Speech is a continuous process, so the vocal organs do not move from one sound segment to the next in a series of separate steps. Rather, sounds continually show the influence of their neighbors. For example, if a nasal consonant (such as [m]) precedes an oral vowel (such as [æ] in map), some of the nasality will carry forward so that the vowel [æ] will begin with a somewhat nasal quality. This is because in producing a nasal the soft palate is lowered to allow airflow through the nasal tract. To produce the following vowel [æ], the soft palate must move back to its normal position. Of course it takes time for the soft palate to move from its lowered position to the raised position. This process is still in progress when the articulation of [æ] has begun.
Similarly, when [æ] is followed by [m], as in lamb, the velum will begin to lower itself during the articulation of [æ] so that it is ready for the following nasal.
When such simultaneous or overlapping articulations are involved, we call the process "coarticulation". If the sound becomes more like the following sound, as in the case of lamb, it is known as "anticipatory coarticulation". If the sound shows the influence of the preceding sound, it is "perseverative coarticulation", as in the case of map.
Assimilation is a phonological term, often used synonymously with coarticulation, which is more of a phonetic term. Similarly, there are two possibilities of assimilation: if a following sound is influencing a preceding sound, we call it "regressive assimilation"; the converse process, in which a preceding sound is influencing a following sound, is known as "progressive assimilation".
Anticipatory coarticulation is by far the most common cause of assimilation in English. For example,
ex. 1
a. cap [kæp] can [kæn]
b. tap [tæp] tan [tæn]
ex. 2
a. tent [tɛnt] tenth [tɛn̪θ]
b. ninety [naɪnti] ninth [naɪn̪θ]
ex. 3
a. since [sɪns] sink [sɪŋk]
b. mince [mɪns] mink [mɪŋk]
In both exx. 1a and 1b, the words differ in two sounds. The vowel in the second word of each pair is "nasalized" because of the influence of the following nasal consonant. In ex. 2, the nasal /n/ is "dentalized" before a dental fricative. In ex. 3, the alveolar nasal /n/ becomes the velar nasal [ŋ] before the velar stop [k]. In this situation, nasalization, dentalization, and velarization are all instances of assimilation, a process by which one sound takes on some or all the characteristics of a neighboring sound.
Assimilation can occur across syllable or word boundaries, as shown by the following:
ex. 4
a. pan[ŋ]cake
b. he can[ŋ] go now
Studies of English fricatives and affricates have shown that their voicing is strongly influenced by the voicing of the following sound:
ex. 5
a. five past [faɪvpɑːst] > [faɪfpɑːst]
b.
has to [hæztə] > [hæstə]
c. as can be shown [əzkənbɪʃəʊn] > [əskənbɪʃəʊn]
d. edge to edge [ɛʤtəɛʤ] > [ɛʧtəɛʤ]
The first column of symbols shows the way these phrases are pronounced in slow or careful speech while the second column shows how they are pronounced in normal, connected speech. It indicates that in English fricatives and affricates are devoiced when they are followed by voiceless sounds. This however does not occur with stops and vowels.
5. 6)
The word teller is formed by adding a suffix -er to the base word tell to form a new word. We are all familiar with the rule that governs the allophones of the phoneme /l/: when preceding a vowel, it is [l] and when following a vowel it is [ɫ]. However, in teller it has a vowel both before and after it, so how do we decide that it should be pronounced as [l], not [ɫ]?
We notice that tell is a monosyllabic word while teller is disyllabic. In a polysyllabic word, we follow the Maximal Onset Principle (MOP) for the division of syllables. By MOP, the /l/ must be placed in the onset position of the second syllable instead of the coda position of the first syllable. Thus, the phoneme /l/ is realized as it should be before the vowel in the second syllable. The same is true with telling, falling, and many others. We can see from this that the phonological structure of a complex word is often different from its morphological structure, i.e. how the word is formed. In word-formation it is tell + -er while in syllable structure it is [te+lə].
6.
In some dialects of English the following words have different vowels, as shown by the phonetic transcription. Based on these data, answer the questions that follow.
1) All the sounds that end the words in column A are voiceless ([-voice]) consonants and all the sounds that end the words in column B are voiced ([+voice]) consonants.
2) All the words in column C are open syllables, i.e.
they end in vowels.
3) The two sounds are in complementary distribution because [ʌɪ] appears only before voiceless consonants and [aɪ] occurs before voiced consonants and in open syllables.
4) (a) [lʌɪf] (b) lives [laɪvz]
5) (a) [traɪl] (b) [bʌɪk] (c) [lʌɪs] (d) [flaɪ] (e) [maɪn]
6) /aɪ/ → [ʌɪ] / _____[-voice]
          [aɪ] in other places
7.
As far as orthography is concerned, there are four variants: in-, im-, ir-, and il-, but closer scrutiny shows that in- may be pronounced as [ɪŋ] before velar consonants, so there are five groups of words according to their variation in pronunciation:
(1) [ɪn]: inharmonic, ingenious, inoffensive, indifferent, inevitable, innumerable
    [ɪn] or [ɪŋ]: incomprehensible, incompetent, inconsistent
    [ɪm]: impenetrable, impossible, immobile
    [ɪl]: illiterate, illegal, illogical
    [ɪr]: irresponsible, irresistible, irregular
It is clear that the first sound of the base word governs the distribution of the variants, because the final consonant of the prefix in- must assimilate to the first segment of the base word. As a result of this, we find [ɪm] before labial consonants like [m] or [p], [ɪl] before the lateral [l], and [ɪr] before [r]. When the first consonant of the base word is the velar consonant [k], it is [ɪŋ] in rapid speech and [ɪn] in careful speech. In all other cases it is [ɪn]. Assuming an underlying form /ɪn/, the rule for the prefix in- looks roughly like this (in the simplest notation):
(2) /ɪn/ → {[ɪn], [ɪŋ]} / _____[velar]
           [ɪm] / _____[labial]
           [ɪl] / _____[l]
           [ɪr] / _____[r]
           [ɪn] in other places
This rule system could be further simplified if we eliminate the first rule, as the realization [ɪŋ] is actually optional. Unlike the other rules, this variation is due to a more general mechanism of assimilation in fast speech, which happens naturally.
For example, the phrase "in conference" is also often pronounced as [ɪŋkɒnfərəns] in fast speech, and the nasal in thank and think is also realized as a velar.
We can test these rules by looking at other base words which can take the prefix in-, such as correct, moveable, legible, rational, and adequate. When prefixed, they are respectively pronounced [ɪn]correct (or [ɪŋ]correct), [ɪm]moveable, [ɪl]legible, [ɪr]rational, and [ɪn]adequate, which further supports the rules above.
(Based on Plag, 2003: 200-1)
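The allomorph rule in (2) can be sketched in code. This is only an illustrative Python sketch and not part of Plag's account: the function name `in_allomorph`, the `fast_speech` flag, and the small symbol sets for labials and velars are assumptions introduced here. The input is the first segment of the base word in IPA, not its spelling.

```python
# Illustrative sketch of rule (2) for the prefix in-.
# Assumption: the caller supplies the first *sound* of the base word in IPA.
LABIALS = {"p", "b", "m"}   # assumed set; the text names [m] and [p] as examples
VELARS = {"k", "g"}         # assumed set; the text names [k] as the example


def in_allomorph(first_segment: str, fast_speech: bool = False) -> str:
    """Return the surface form of underlying /ɪn/ before the given segment."""
    if first_segment in LABIALS:
        return "ɪm"
    if first_segment == "l":
        return "ɪl"
    if first_segment == "r":
        return "ɪr"
    # Velar assimilation is optional: it applies only in fast speech.
    if fast_speech and first_segment in VELARS:
        return "ɪŋ"
    return "ɪn"


# The test words from the text, keyed by the first sound of the base.
for base, first in [("correct", "k"), ("moveable", "m"), ("legible", "l"),
                    ("rational", "r"), ("adequate", "æ")]:
    print(f"[{in_allomorph(first)}]{base}")
```

Treating the velar case as a flag rather than a separate branch of the rule mirrors the point made above that [ɪŋ] is optional and belongs to a more general fast-speech process.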
Introduction to English Linguistics: Selected Test Questions (Student Edition)
Introduction to English Linguistics: Selected Test Questions 1
1. Which of the following statements about language is NOT true?
A. Language is a system B. Language is symbolic C. Animals also have language D. Language is arbitrary
2. Which of the following features is NOT one of the design features of language?
A. Symbolic B. Duality C. Productive D. Arbitrary
3. What is the most important function of language?
A. Interpersonal B. Phatic C. Informative D. Metalingual
4. Who put forward the distinction between Langue and Parole?
A. Saussure B. Chomsky C. Halliday D. Anonymous
5. According to Chomsky, which is the ideal user's internalized knowledge of his language?
A. competence B. parole C. performance D. langue
6. The function of the sentence "A nice day, isn't it?" is ________.
A. informative B. phatic C. directive D. performative
7. Articulatory phonetics mainly studies ________.
A. the physical properties of the sounds produced in speech
B. the perception of sounds
C. the combination of sounds
D. the production of sounds
8. The distinction between vowels and consonants lies in ________.
A. the place of articulation
B. the obstruction of airstream
C. the position of the tongue
D. the shape of the lips
9. Which is the branch of linguistics which studies the characteristics of speech sounds and provides methods for their description, classification and transcription?
A. Phonetics B. Phonology C. Semantics D. Pragmatics
10. Which studies the sound systems in a certain language?
A. Phonetics B. Phonology C. Semantics D. Pragmatics
11. Minimal pairs are used to ________.
A. find the distinctive features of a language
B. find the phonemes of a language
C. compare two words
D. find the allophones of language
12. Usually, suprasegmental features include ________, length and pitch.
A. phoneme B. speech sounds C. syllables D. stress
13. Which is an indispensable part of a syllable?
A. Coda B. Onset C. Stem D. Peak
III. True or false
1. The analyst collects samples of the language as it is used, not according to some views of how it should be used. This is called the prescriptive approach. F
2. Broad transcription is normally used by the phoneticians in their study of speech sounds.
F
Taizhou University Exam Questions
1. Articulatory Phonetics studies the physical properties of speech sounds.
2. English is a typical intonation language.
3. Phones in complementary distribution should be assigned to the same phoneme.
4. Linguistic c________ is a native speaker's linguistic knowledge of his language.
1. The relationship between the sound and the meaning of a word is a________.
2. P________ refers to the realization of langue in actual use.
3. Linguistics is generally defined as the s________ study of language.
1. Which of the following branches of linguistics takes the inner structure of words as its main object of study?
A. Phonetics. B. Semantics. C. Morphology. D. Sociolinguistics.
3. Which of the following is a voiceless bilabial stop?
A. [w]. B. [m]. C. [b]. D. [p].
6. What phonetic feature distinguishes the [p] in please and the [p] in speak?
A. Voicing B. Aspiration C. Roundness D. Nasality
11. Conventionally a ________ is put in slashes.
A. allophone B. phone C. phoneme D. morpheme
Language is a tool of communication. The sign "highway closed" serves ________.
A. an expressive function B. an informative function C. a performative function D. a persuasive function
14. Which of the following groups of words is a minimal pair?
A. but/pub B. wet/which C. cool/curl D. fail/find
16. What are the dual structures of language?
A. Sounds and letters. B. Sounds and meaning. C. Letters and meaning. D. Sounds and symbols.
19. Which of the following is one of the core branches of linguistics?
A. Phonology. B. Psycho-linguistics. C. Sociolinguistics. D. Anthropology.
IV. Translate the following linguistic terms: (10 points, 1 point each)
A. From English to Chinese B. From Chinese to English
1. acoustic phonetics 6. 應用語言學
2. closed class words
4. distinctive features
VI. Answer the following questions briefly. (20 points)
1. Define phoneme. (4 points)
2. Explain complementary distribution with an example. (5 points)
3. What are the four criteria for classifying English vowels? (4 points)
Answers to the short-answer questions
1. A contrastive phonological segment whose phonetic realizations are predictable by rules.
(4 points) (or: A phoneme is a phonological unit; it is a unit that is of distinctive value.)
2. The situation in which phones never occur in the same phonetic environment. (4 points) e.g. [p] and [pʰ] never occur in the same position. (1 point)
3. The position of the tongue in the mouth (1 point), the openness of the mouth (1 point), the shape of the lips (1 point), and the length of the vowels. (1 point)
Chapter 1 Introduction to Linguistics
I. Choose the best answer. (20%)
1. Language is a system of arbitrary vocal symbols used for human ________.
A. contact B. communication C. relation D. community
2. Which of the following words is entirely arbitrary?
A. tree B. typewriter C. crash D. bang
3. The function of the sentence "Water boils at 100 degrees Centigrade." is ________.
A. interrogative B. directive C. informative D. performative
4. In Chinese, when someone breaks a bowl or a plate, the host or the people present are likely to say "碎碎(岁岁)平安" (a pun on "broken, broken" and "peace year after year") as a means of controlling the forces which they believe might affect their lives. Which function does it perform?
A. Interpersonal B. Emotive C. Performative D. Recreational
5. Which of the following properties of language enables language users to overcome the barriers caused by time and place? Due to this feature of language, speakers of a language are free to talk about anything in any situation.
A. Transferability B. Duality C. Displacement D. Arbitrariness
6. Study the following dialogue. What function does it play according to the functions of language?
- A nice day, isn't it?
- Right! I really enjoy the sunlight.
A. Emotive B. Phatic C. Performative D. Interpersonal
7. ________ refers to the actual realization of the ideal language user's knowledge of the rules of his language in utterances.
A. Performance B. Competence C. Langue D. Parole
8. When a dog is barking, you assume it is barking for something or at someone that exists here and now. It couldn't be sorrowful for some lost love or lost bone. This indicates the design feature of ________.
A. cultural transmission B. productivity C. displacement D.
Duality
9. ________ answers such questions as how we as infants acquire our first language.
A. Psycholinguistics B. Anthropological linguistics C. Sociolinguistics D. Applied linguistics
10. ________ deals with language application to other fields, particularly education.
A. Linguistic theory B. Practical linguistics C. Applied linguistics D. Comparative linguistics
II. Decide whether the following statements are true or false. (10%)
11. Language is a means of verbal communication. Therefore, the communication way used by the deaf-mute is not language. F
13. Speaking is the quickest and most efficient way of the human communication systems.
14. Language is written because writing is the primary medium for all languages. F
15. We were all born with the ability to acquire language, which means the details of any language system can be genetically transmitted. F
16. Only human beings are able to communicate. F
17. F. de Saussure, who made the distinction between langue and parole in the early 20th century, was a French linguist. F
18. A study of the features of the English used in Shakespeare's time is an example of the diachronic study of language. F
19. Speech and writing came into being at much the same time in human history. F
20. All the languages in the world today have both spoken and written forms. F
III. Fill in the blanks. (10%)
21. Language, broadly speaking, is a means of verbal communication.
22. In any language words can be used in new ways to mean new things and can be combined into innumerable sentences based on limited rules. This feature is usually termed creativity.
23. Language has many functions. We can use language to talk about itself.
This function is ________.
24. The theory that primitive man made involuntary vocal noises while performing heavy work has been called the yo-he-ho theory.
25. Linguistics is the systematic study of language.
26. Modern linguistics is ________ in the sense that the linguist tries to discover what language is rather than lay down some rules for people to observe.
27. One general principle of linguistic analysis is the primacy of ________ over writing.
28. The description of a language as it changes through time is a ________ study.
29. Saussure put forward two important concepts. ________ refers to the abstract linguistic system shared by all members of a speech community.
30. Linguistic potential is similar to Saussure's langue and Chomsky's ________.
IV. Explain the following terms, using examples. (20%)
31. Design feature
32. Displacement
33. Competence
34. Synchronic linguistics
V. Answer the following questions. (20%)
35. Why do people take duality as one of the important design features of human language? Can you tell us what language will be if it has no such design feature? (Nankai University, 2004)
35. Duality makes our language productive. A large number of different units can be formed out of a small number of elements: for instance, tens of thousands of words out of a small set of sounds, around 48 in the case of the English language. And out of the huge number of words, there can be an astronomical number of possible sentences and phrases, which in turn can combine to form an unlimited number of texts. Most animal communication systems do not have this design feature of human language.
If language has no such design feature, then it will be like an animal communication system, which will be highly limited. It cannot produce a very large number of sound combinations, e.g. words, which are distinct in meaning.
Chapter 2 Speech Sounds
I. Choose the best answer. (20%)
1. Pitch variation is known as ________ when its patterns are imposed on sentences.
A. intonation B. tone C. pronunciation D. voice
2. Conventionally a ________ is put in slashes (/ /).
A. allophone B. phone C.
phoneme D. morpheme
3. An aspirated p, an unaspirated p and an unreleased p are ________ of the p phoneme.
A. analogues B. tagmemes C. morphemes D. allophones
4. The opening between the vocal cords is sometimes referred to as ________.
A. glottis B. vocal cavity C. pharynx D. uvula
6. A phoneme is a group of similar sounds called ________.
A. minimal pairs B. allomorphs C. phones D. allophones
7. Which branch of phonetics concerns the production of speech sounds?
A. Acoustic phonetics B. Articulatory phonetics C. Auditory phonetics D. None of the above
8. Which one is different from the others according to places of articulation?
A. [n] B. [m] C. [b] D. [p]
9. Which vowel is different from the others according to the characteristics of vowels?
A. [i:] B. [u] C. [e] D. [i]
10. What kind of sounds can we make when the vocal cords are vibrating?
A. Voiceless B. Voiced C. Glottal stop D. Consonant
II. Decide whether the following statements are true or false. (10%)
11. Suprasegmental phonology refers to the study of phonological properties of units larger than the segment-phoneme, such as syllable, word and sentence.
12. The air stream provided by the lungs has to undergo a number of modifications to acquire the quality of a speech sound.
14. [p] is a voiced bilabial stop.
15. Acoustic phonetics is concerned with the perception of speech sounds.
16. All syllables must have a nucleus but not all syllables contain an onset and a coda.
17. When pure vowels or monophthongs are pronounced, no vowel glides take place.
18. According to the length or tenseness of the pronunciation, vowels can be divided into tense vs. lax or long vs. short.
III. Fill in the blanks. (20%)
21. Consonant sounds can be either ________ or ________, while all vowel sounds are ________.
23. The qualities of vowels depend upon the position of the ________ and the lips.
25. Consonants differ from vowels in that the latter are produced without ________.
26. In phonological analysis the words fail / veil are distinguishable simply because of the two phonemes /f/ - /v/.
This is an example for illustrating ________.
27. In English there are a number of ________, which are produced by moving from one vowel position to another through intervening positions.
28. ________ refers to the phenomenon of sounds continually showing the influence of their neighbors.
29. ________ is the smallest linguistic unit.
IV. Explain the following terms, using examples. (20%)
31. Sound assimilation
32. Suprasegmental feature
33. Complementary distribution
34. Distinctive features
V. Answer the following questions. (20%)
35. What is acoustic phonetics? (Renmin University of China, 2003)
36. What are the differences between voiced sounds and voiceless sounds in terms of articulation? (Nankai University, 2004)
VI. Analyze the following situation. (20%)
37. Write the symbol that corresponds to each of the following phonetic descriptions; then give an English word that contains this sound. Example: voiced alveolar stop [d] dog. (Ocean University of Qingdao, 1999)
(1) voiceless bilabial unaspirated stop
(2) low front vowel
(3) lateral liquid
(4) velar nasal
(5) voiced interdental fricative
32. Suprasegmental feature: The phonetic features that occur above the level of the segments are called suprasegmental features; these are the phonological properties of such units as the syllable, the word, and the sentence. The main suprasegmental features include stress, intonation, and tone.
33. Complementary distribution: The different allophones of the same phoneme never occur in the same phonetic context. When two or more allophones of one phoneme never occur in the same linguistic environment they are said to be in complementary distribution.
34. Distinctive features: The term refers to the features that can distinguish one phoneme from another. If we can group the phonemes into two categories, one with this feature and the other without, this feature is called a distinctive feature.
V.
35. Acoustic phonetics deals with the transmission of speech sounds through the air. When a speech sound is produced it causes minor air disturbances (sound waves).
Various instruments are used to measure the characteristics of these sound waves.
36. When the vocal cords are spread apart, the air from the lungs passes between them unimpeded. Sounds produced in this way are described as voiceless; consonants [p, s, t] are produced in this way. But when the vocal cords are drawn together, the air from the lungs repeatedly pushes them apart as it passes through, creating a vibration effect. Sounds produced in this way are described as voiced. [b, z, d] are voiced consonants.
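The voiced/voiceless contrast in answer 36 can be made concrete with a small feature table. The following Python sketch is only an illustration: the `Consonant` type, the `INVENTORY` list, and the `voicing_counterpart` helper are hypothetical names introduced here, and the table covers just the six consonants cited in the answer, with their standard place and manner labels.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    symbol: str
    voiced: bool   # True if the vocal cords vibrate
    place: str     # place of articulation
    manner: str    # manner of articulation


# Only the six consonants mentioned in answer 36 (an illustrative subset).
INVENTORY = [
    Consonant("p", False, "bilabial", "stop"),
    Consonant("b", True, "bilabial", "stop"),
    Consonant("t", False, "alveolar", "stop"),
    Consonant("d", True, "alveolar", "stop"),
    Consonant("s", False, "alveolar", "fricative"),
    Consonant("z", True, "alveolar", "fricative"),
]


def voicing_counterpart(symbol: str) -> Optional[str]:
    """Return the sound with the same place and manner but opposite voicing."""
    source = next((c for c in INVENTORY if c.symbol == symbol), None)
    if source is None:
        return None  # symbol is outside this small table
    for c in INVENTORY:
        if (c.place, c.manner) == (source.place, source.manner) and c.voiced != source.voiced:
            return c.symbol
    return None


for sym in ["p", "s", "t"]:
    print(sym, "->", voicing_counterpart(sym))
```

Each pair differs in voicing alone, which is why [p]/[b], [t]/[d], and [s]/[z] are the usual textbook illustrations of the contrast.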
Concise Linguistics: Final Exam Review Notes
Chapter 1
1.1 What is linguistics?
Linguistics is the scientific (i.e. systematic) study of language.
Linguistics studies not any particular language, but languages in general.
Why is linguistics a scientific study?
It is a systematic investigation of linguistic data, conducted with reference to some general linguistic theory.
1.3 Important distinctions in linguistics (exam note: true/false questions)
1.3.1 Prescriptive vs. descriptive
Descriptive: describe and analyze the language people actually use.
Prescriptive: lay down rules for "correct and standard" behaviour in using language.
Descriptive vs. prescriptive:
• Don't say X. (prescriptive)
• People don't say X. (descriptive)
The distinction: describing how things are vs. prescribing how things ought to be.
Modern linguistics vs. grammar
Modern linguistics is mostly descriptive.
Grammar: to set models for language users to follow.
Modern linguistics is supposed to be scientific and objective and its task is to describe the language people actually use, be it "correct" or not.
Modern linguists believe that whatever occurs in the language people use should be described and analyzed in their investigation.
1.3.2 Synchronic vs. diachronic (exam note: be able to define)
Synchronic study: description of a language at some point of time (modern linguistics).
Diachronic study: description of a language through time (historical development of language over a period of time).
A synchronic approach enjoys priority over a diachronic one.
Synchronic or diachronic?
1. The change of vocabulary since China's reform and opening up.
2. The study of Internet language in the 21st century.
3. Pejorative Sense Development in English.
4. The Categories and Types of Present-day English Word-Formation.
1.3.4. Langue and parole (exam note: know the concepts) (F. de Saussure: 1857-1913)
Langue: the abstract linguistic system shared by all the members of a speech community.
Parole: the realization of langue in actual use.
Langue is abstract. Parole is concrete.
Langue is stable.
Parole varies from person to person, and from situation to situation.
What linguists should do is to abstract langue from parole.
1.3.5 Competence and performance (exam note: be able to define) (Chomsky)
Competence: the ideal user's knowledge of the rules of his language.
Performance: the actual realization of this knowledge in linguistic communication.
Similarity and difference between Saussure's distinction and that of Chomsky
Similarity: both make the distinction between the abstract language system and the actual use of language.
Difference: Chomsky's competence-performance is from a psychological point of view. Saussure's langue-parole is from a sociological point of view.
1.3.6 Traditional grammar and modern linguistics (exam note: must understand! The true/false questions are drawn from the points below)
Modern linguistics differs from traditional grammar in several basic ways.
Firstly, linguistics is descriptive while traditional grammar is prescriptive.
Secondly, modern linguistics regards the spoken language as primary, not the written.
Modern linguistics differs from traditional grammar also in that it does not force languages into a Latin-based framework.
1.2.2 Design features of language (exam note: must know!)
Arbitrary: no intrinsic connection between the word and the thing it denotes, e.g. "pen" by any other name is the thing we use to write with.
The design features are:
Arbitrariness
Productivity/Creativity
Duality
Displacement
Cultural transmission
Arbitrariness: no logical (motivated or intrinsic) connection between sounds and meanings.
Onomatopoeic words (which imitate natural sounds) are somewhat motivated (English: rumble, crackle, bang, ...; Chinese: putong, shasha, dingdang, ...).
Some compound words are not entirely arbitrary, e.g. type-writer, shoe-maker, air-conditioner, photocopy, ...
Productivity/creativity: peculiar to human languages. Users of language can understand and produce sentences they have never heard before, e.g.
we can understand a sentence like "A red-eyed elephant is dancing on the hotel bed", though it does not describe a common happening in the world.
A gibbon call system is not productive, for gibbons draw all their calls from a fixed repertoire which is rapidly exhausted, making any novelty impossible.
The bee dance does have a limited productivity, as it is used to communicate about food sources in any direction. But food sources are the only kind of messages that can be sent through the bee dance; bees do not "talk" about themselves, the hives, or wind, let alone about people, animals, hopes or desires.
Duality (double articulation)
Lower level: sounds (meaningless)
Higher level: meaning (larger units of meaning)
A communication system with duality is considered more flexible than one without it, for a far greater number of messages can be sent. A small number of sounds can be grouped and regrouped into a large number of units of meaning (words), and the units of meaning can be arranged and rearranged into an infinite number of sentences. (We can make a dictionary of a language, but we cannot make a dictionary of the sentences of that language.)
Cultural transmission: language is culturally transmitted (through teaching and learning, rather than by instinct).
Animal call systems are genetically transmitted. All cats, gibbons and bees have systems which are almost identical to those of all other cats, gibbons and bees.
A Chinese speaker and an English speaker are not mutually intelligible. This shows that language is culturally transmitted.
That is, it is passed on from one generation to the next by teaching and learning, rather than by instinct.
The story of a wolf child or a pig child shows that a human being brought up in isolation simply does not acquire human language.
Chapter 2 Phonology
2.2 Phonetics
2.2.1 What is phonetics? (exam note: must know; the three branches must be known as well)
Phonetics is the study of the phonic medium of language in isolation.
It is concerned with the production, transcription and classification of speech sounds.
The production and perception of speech sounds:
Speaker (production) → Sound waves → Hearer (perception)
Articulatory phonetics | Acoustic phonetics | Auditory phonetics
(the three branches of phonetics)
2.2.3 Orthographic representation of speech sounds: broad and narrow transcriptions (exam note: understand the general idea)
Towards the end of the 19th century, articulatory phonetics had developed to such an extent in the West that scholars began to feel the need for a standardized and internationally accepted system of phonetic transcription. Thus the International Phonetic Alphabet (IPA) came into being.
Exercise: Transcription of speech sounds
Task: Write the phonetic symbol for the first sound in each word according to your pronunciation.
Example: zoo /z/ psycho /s/
a. Judge / /
b. Thomas / /
c. Phone / /
d. Easy / /
e.
Usual / /
Are the two /t/ sounds in "student" really the same?
The first "t" is unaspirated.
The second "t" is aspirated.
[stjuːdəntʰ]
Are the four /l/ sounds the same?
Leaf /li:f/; feel /fi:l/; build /bild/; health /helθ/
Clear [l], dark [ɫ], dental [l̪]
The mark for aspiration, [ʰ], and the mark for dark l, as in [ɫ], are both called diacritics.
Two transcriptions (exam note: understand how much detail narrow transcription records and how much broad transcription records)
Broad transcription: the letter-symbols only, written in / /.
Narrow transcription: the letter-symbols plus diacritics, written in [ ].
2.4 Classification of English speech sounds (exam note: know how speech sounds are classified)
Vowels
Consonants
Vowels and consonants together are called segmental phonemes; intonation, tone and stress are called suprasegmental phonemes.
The difference between consonants and vowels
Vowels: with no obstruction through the speech organs.
Consonants: with obstruction through the speech organs.
2.2.4.1 Classification of English consonants (exam note: know the ways in which consonants are classified)
English consonants can be classified in two ways:
1. manner of articulation
2. place of articulation
2.3 Phonology
The difference between phonetics and phonology (exam note: understand what each one does)
Phonetics: studies sounds in isolation, one by one; phonetic features; language universal.
Phonology: studies sound patterns used to convey meaning; language specific.
2.3.2 Phone, phoneme and allophone (exam note: understand the meanings; true/false questions)
Phone:
1) a phonetic unit
2) not necessarily distinctive of meaning
3) physical as perceived
4) marked with [ ]
Phoneme:
1) a phonological unit
2) distinctive of meaning
3) abstract, not physical
4) marked with / /
Allophone
Allophones: the phones that can represent a phoneme in different phonetic environments.
Student
Phones: [t] [tʰ]
Phoneme: /t/
Allophones: [t] [tʰ]
Exercise (How many phones, phonemes and allophones?)
Pit, Spit, Tip
Leaf, Feel, Health
But, May, Rest
2.3.3 Phonetic contrast, complementary distribution, and minimal pair (exam note: know what complementary distribution is)
Complementary distribution
Complementary distribution: allophones of the same phoneme are in complementary distribution. They do not distinguish meaning.
They occur in different phonetic contexts,e.g.dark [l] & clear [l], aspirated [p] & unaspirated [p]. [t] [th] Two allophones2.3.4 some rules in phonology 3条规则要明白是什么意思判断题2.3.4.1 sequential rules 序列规则There are rules that govern the combination of sounds in a particular language. These rules are called Sequential rules (序列规则)1.What are possible sequences if 3 consonants cluster together at the beginning of a word?2.3.4.2 assimilation rule 同化规则The assimilation rule assimilates one sound to another by “copying” a feature of a sequential phoneme, thus making the two phones similar. (同化规则)3. Give examples to show how the assimilation rule works in English.2.3.4.3 deletion rules 省略规则Deletion rule tells us when a sound is to be deleted although it is orthographically represented. Deletion /g/ occurs before a final nasal consonant. Sign design…比如l r 后面只允许出现元音这是什么规则在约束它是同化规则在约束它错@!应该是序列规则在约束它要知道序列规则该怎么写2.3.5 suprasegmental features ----stress tone声调intonation语调只需知道超音段音位包括哪些就可以了英语中tone算不算超音段音位?不算所谓是音位,需要能够区分意义5. List different types of stress patterns that can distinguish meaning. (Task)1)To distinguish some nouns from their related verbs.` import (n.) —im` port (v.)`record (n.) —re`cord (v.)2) To distinguish compounds from noun phrases.`hotdog (n.) — hot `dog (phrase)3)To distinguish the compound combinations of -ing modifiers and nouns and the phrasalcombinations of-ing forms for the action and nouns for the doer.`sleeping car (compound) —ֽsleeping `boy (phrase)4)To distinguish content words from function words in sentences.He is `driving my `car.5)To emphasize to a certain part of a sentence.I prefer `small apples, those are far too large.7. 
What is the difference between tone and intonation?1)Tone refers to pitch movement in spoken utterances that is related to differences in wordmeaning.2)Intonation refers to pitch movement in spoken utterances that is not related todifferences in word meaning.Chapter 3 morphology 形态学morphology 的概念要会3.1 Morphology refers to the study of the internal structure of words and the rules bywhich words are formed. 词的内部结构的学习和词形成的规则3.2 open class and closed class 开放词类和封闭词类需要注意既不是开放的也不是封闭的词Open class words----content words of a language to which we can regularly add new words, such as nouns, adjectives, verbs and adverbs, e.g. beatnik(a member of the Beat Generation), hacker, email, internet, “做秀,时装秀…” in Chinese.Closed class words----grammatical or functional words, such as conjunction, articles, preposition and pronouns.封闭词类和开放词类就语法功能来说,词可分为:封闭词类(Closed Class)和开放词类(Open Class)。
Hu Zhuanglin, Linguistics: A Course Book (Revised Edition), Test Questions. Chapter 2: Speech Sounds

Chapter 2 Speech Sounds

I. Choose the best answer. (20%)
Answer key: 1~5 ACDAD; 6~10 DBABB
1. Pitch variation is known as __________ when its patterns are imposed on sentences.
A. intonation B. tone C. pronunciation D. voice
2. Conventionally a __________ is put in slashes (/ /).
A. allophone B. phone C. phoneme D. morpheme
3. An aspirated p, an unaspirated p and an unreleased p are __________ of the p phoneme.
A. analogues B. tagmemes C. morphemes D. allophones
4. The opening between the vocal cords is sometimes referred to as __________.
A. glottis B. vocal cavity C. pharynx D. uvula
5. The diphthongs that are made with a movement of the tongue towards the center are known as __________ diphthongs.
A. wide B. closing C. narrow D. centering
6. A phoneme is a group of similar sounds called __________.
A. minimal pairs B. allomorphs C. phones D. allophones
7. Which branch of phonetics concerns the production of speech sounds?
A. Acoustic phonetics B. Articulatory phonetics C. Auditory phonetics D. None of the above
8. Which one is different from the others according to places of articulation?
A. [n] B. [m] C. [b] D. [p]
9. Which vowel is different from the others according to the characteristics of vowels?
A. [i:] B. [u] C. [e] D. [i]
10. What kind of sounds can we make when the vocal cords are vibrating?
A. Voiceless B. Voiced C. Glottal stop D. Consonant

IV. Explain the following terms, using examples. (20%)
31. Sound assimilation: Speech sounds seldom occur in isolation. In connected speech, under the influence of their neighbors, they are replaced by other sounds. Sometimes two neighboring sounds influence each other and are replaced by a third sound which is different from both original sounds. This process is called sound assimilation.
32. Suprasegmental feature: The phonetic features that occur above the level of the segments are called suprasegmental features; these are the phonological properties of such units as the syllable, the word, and the sentence. The main suprasegmental features include stress, intonation, and tone.
33. Complementary distribution: The different allophones of the same phoneme never occur in the same phonetic context. When two or more allophones of one phoneme never occur in the same linguistic environment, they are said to be in complementary distribution.
34. Distinctive features: These are the features that can distinguish one phoneme from another. If we can group the phonemes into two categories, one with a given feature and the other without it, that feature is called a distinctive feature.

V. Answer the following questions. (20%)
35. What is acoustic phonetics? (Renmin University of China, 2003)
Acoustic phonetics deals with the transmission of speech sounds through the air. When a speech sound is produced it causes minor air disturbances (sound waves). Various instruments are used to measure the characteristics of these sound waves.
36. What are the differences between voiced sounds and voiceless sounds in terms of articulation? (Nankai University, 2004)
When the vocal cords are spread apart, the air from the lungs passes between them unimpeded. Sounds produced in this way are described as voiceless; the consonants [p, s, t] are produced in this way. But when the vocal cords are drawn together, the air from the lungs repeatedly pushes them apart as it passes through, creating a vibration effect. Sounds produced in this way are described as voiced; [b, z, d] are voiced consonants.

VI. Analyze the following situation. (20%)
37. Write the symbol that corresponds to each of the following phonetic descriptions; then give an English word that contains this sound. Example: voiced alveolar stop [d] dog. (Ocean University of Qingdao, 1999)
(1) voiceless bilabial unaspirated stop
(2) low front vowel
(3) lateral liquid
(4) velar nasal
(5) voiced interdental fricative
ChatGPT’s AI Can Help Screen for Alzheimer’s
By Edd Gent (Chinese translation by Zhang Yahui, reviewed by Shi Xiaojun)
The AI-powered chatbot ChatGPT is taking the Internet by storm with its impressive language capabilities, helping to draw up legal contracts as well as write fiction. But it turns out that the underlying technology could also help spot the early signs of Alzheimer’s disease, potentially making it possible to diagnose the debilitating condition sooner.

Catching Alzheimer’s early can significantly improve treatment options and give patients time to make lifestyle changes that could slow progression. Diagnosing the disease typically requires brain imaging or lengthy cognitive evaluations, though, which can be both expensive and time-consuming and therefore unsuitable for widespread screening, says Hualou Liang, a professor of biomedical engineering at Drexel University in Philadelphia.

A promising avenue for early detec-
Assignment 5
Hidden Markov Model in Music Information Retrieval
MUMT 611, March 2005
Assignment 5
Paul Kolesnik

Introduction to HMM

A Hidden Markov Model (HMM) is a structure that is used to statistically characterize the behavior of sequences of event observations. By definition, an HMM is “a double stochastic process with an underlying stochastic process which is not observable, but can only be observed through another set of stochastic process that produces the sequence of observed symbols” (Rabiner and Juang 1986).

The main idea behind an HMM is that any observable sequence of events can be represented as a succession of states, with each state representing a grouped portion of the observation values and containing its features in a statistical form. The HMM keeps track of what state the sequence will start in, what state-to-state transitions are likely to take place, and what values are likely to occur in each state. The corresponding model parameters are an array of initial state probabilities, a matrix of state-to-state transition probabilities, and a matrix of state output probabilities. The two basic HMM types are the ergodic model, where any-to-any state transitions are allowed, and the left-to-right model, where a transition can only lead to the state itself or to the subsequent state.

An HMM deals with three basic problems: recognition, uncovering of states, and training. The recognition problem can be formulated as: “given an observation sequence and a Hidden Markov Model, calculate the probability that the model would produce this observation sequence”. The uncovering of states problem is: “given an observation sequence and a Hidden Markov Model, calculate the optimal sequence of states that would maximize the likelihood of the HMM producing the observation”. The training problem states: “given an observation sequence (or a set of observation sequences) and a Hidden Markov Model, adjust the model parameters so that the probability of the model producing the observations is maximized”.
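The recognition and state-uncovering problems stated above can be illustrated with a minimal sketch. It assumes a discrete-output HMM given as plain Python lists; the function names, the two-state left-to-right toy model and all of its probabilities are made up for illustration and do not come from any of the systems discussed in this assignment. The forward recursion sums over all state paths to obtain the probability of the observations, while the Viterbi recursion keeps only the best path.

```python
def forward(obs, pi, A, B):
    """Recognition problem: probability that the model produces `obs`."""
    n = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs, pi, A, B):
    """Uncovering of states: most likely state path and its joint probability."""
    n = len(pi)
    # delta[i] = probability of the best path so far that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        back.append(step)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(back):  # follow the back-pointers from the end
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical left-to-right model: 2 states, 2 output symbols.
PI = [1.0, 0.0]                # initial state probabilities
A = [[0.7, 0.3], [0.0, 1.0]]   # transitions: stay, or move to the next state
B = [[0.9, 0.1], [0.2, 0.8]]   # B[state][symbol] output probabilities
OBS = [0, 0, 1, 1]

if __name__ == "__main__":
    print("P(obs | model):", forward(OBS, PI, A, B))
    print("best path and its probability:", viterbi(OBS, PI, A, B))
```

Plain probabilities are used here to keep the sketch short; for long observation sequences an implementation would normally work with scaled or log probabilities to avoid numerical underflow.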
Through the algorithms used to solve those problems (the Forward-Backward, Viterbi and Baum-Welch algorithms), an HMM can be trained with a number of observations, and then be used either to calculate the probability of an input sequence or to identify the states in the input sequence as interpreted by the HMM.

Overview of Works

This section provides an overview of the works presented at the International Symposium on Music Information Retrieval (ISMIR) that dealt with HMM-based systems.

A publication by Batlle and Cano (2000) describes a system that uses HMMs to classify audio segments. Using the system, audio files are automatically segmented into abstract acoustic events, with similar events given the same label to be used for training of the HMMs. The system is applied to classification of a database of audio sounds, and allows fast indexing and retrieval of audio fragments from the database. During the initial stage of the process, mel-cepstrum analysis is used to obtain feature vectors from the audio information, which are then supplied to an HMM-based classification engine. Since traditional HMMs are not suited for blind learning (which is the goal of this system, as there is no prior knowledge of the feature vector data), competitive HMMs (CoHMMs) are used instead. CoHMMs differ from HMMs only in the training stage, whereas the recognition procedure is exactly the same for both.

A work by Durey and Clements (2001) deals with a melody-based database song retrieval system. The system uses a melody-spotting procedure adapted from word-spotting techniques in automatic speech recognition. Humming, whistling or keyboard are allowed as input. According to the publication, the main goal of the work was to develop a practical system for non-symbolic (audio) music representation. The word/melody-spotting techniques involve searching for a data segment in a data stream using HMMs. Left-to-right, 5-state HMMs are used to represent each available note and a rest. As part of the preprocessing, frequency and time-domain features are used for extraction of feature vectors. The system records the extracted feature vectors for all of the musical pieces and stores them in a database. Once an input query is received, the system constructs an HMM from the query and runs all of the feature vectors from the songs in the database through the model using the Viterbi state-uncovering process. As a result, a ranked list of melody occurrences in database songs is created, which allows identifying the occurrence of the melodies within the songs.

A publication by Jin and Jagadish (2002) describes a new technique suggested for HMM-based music retrieval systems. The paper describes traditional MIR HMM techniques as effective but not efficient, and suggests a more efficient mechanism to index the HMMs in the database. In it, each state is represented by an interval / inter-onset interval ratio, and each transition is transformed into a 4-dimensional box. All boxes are inserted into an R-tree, an indexing structure for multidimensional data, and HMMs are ranked by the number of boxes present in the search tree. The most likely candidates from the R-tree are selected for evaluation of the HMMs, which uses the traditional forward algorithm.

A publication by Orio and Sette (2003) describes an HMM-based approach to transcription of musical queries. HMMs are used to model features related to the singing voice. A sung query is considered as an observation of an unknown process: the melody the user has in mind. A two-level HMM is suggested for pitch tracking: an event level (using pitches as labels) and an audio level (attack-sustain-rest events).

A paper by Sheh and Ellis (2003) deals with a system that uses HMMs for chord recognition. An EM (Expectation-Maximization) algorithm is used to train the HMMs, and PCP (Pitch Class Profile) vectors are used as features for the training process. Each chord type is represented by an independent HMM. According to the publication, the system is able to successfully recognize chords in unstructured, polyphonic, multi-timbre audio.

Shifrin and Birmingham (2003) investigate the performance of an HMM-based query-by-humming system on a large musical database. The VocalSearch system, which was designed as part of the MusArt project, is used on a database of 50000 themes that have been extracted from 22000 songs. The system uses a <delta-pitch / inter-onset interval ratio> pair as parameters for the feature vectors supplied to the HMMs. The work compares perfect queries with imperfect queries that have simulated insertions and deletions added to the sequences. Among the trends discovered: longer queries have a positive effect on evaluation performance, but all experiments show an early saturation point beyond which performance does not improve with query length. The system performed well with imperfect queries on a large database.

Bibliography

Batlle, E., and P. Cano. 2000. Automatic segmentation for music classification using Competitive Hidden Markov Models. In Proceedings of International Symposium on Music Information Retrieval.
Durey, A., and M. Clements. 2001. Melody spotting using Hidden Markov Models. In Proceedings of International Symposium on Music Information Retrieval, 109–17.
Jin, H., and H. Jagadish. 2002. Indexing Hidden Markov Models for music retrieval. In Proceedings of International Symposium on Music Information Retrieval, 20–4.
Orio, N., and M. Sette. 2003. A HMM-based pitch tracker for audio queries. In Proceedings of International Symposium on Music Information Retrieval, 249–50.
Rabiner, L., and B. Juang. 1986. An introduction to Hidden Markov Models. IEEE Acoustics, Speech and Signal Processing Magazine 3(1), 4–16.
Sheh, A., and D. Ellis. 2003. Chord segmentation and recognition using EM-trained Hidden Markov Models. In Proceedings of International Symposium on Music Information Retrieval, 183–9.
Shifrin, J., and W. Birmingham. 2003. Effectiveness of HMM-based retrieval on large databases. In Proceedings of International Symposium on Music Information Retrieval, 33–9.
CHINESE JOURNAL OF ACOUSTICS
Vol. 30 No. 4, 2011

Acoustic features based on auditory model and adaptive fractional Fourier transform for speech recognition*

YIN Hui  XIE Xiang†  KUANG Jingming
(Department of Electronic Engineering, Beijing Institute of Technology, Beijing 100081)

Abstract  It is well known that auditory system of human beings has excellent performance which automatic speech recognition (ASR) systems can't match, and fractional Fourier transform (FrFT) has unique advantages in non-stationary signal processing. In this paper, the Gammatone filterbank is applied to speech signals for front-end temporal filtering, and then acoustic features of the output subband signals are extracted based on fractional Fourier transform. Considering the critical effect of transform order for FrFT, an order adaptation method [...] ambiguity function. ASR experiments are conducted on clean and noisy [...] that the proposed features achieve significantly higher [...] instantaneous frequency has much lower complexity than that based on ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate using the proposed order adaptation method.

PACS numbers: 43.72, 43.06

* This work was supported by the National Science and Technology Major Projects (2010ZX0300[?]-003-01), the National Natural Science Foundation of China (90920304) and the Research Fund for the Doctoral Program of Higher Education of China (20101101110020).
† Corresponding author: XIE Xiang, xiexiang@bit.edu

1 Introduction

The human auditory system is a highly complex sensory system. Studying its structure and function is of great importance for a better understanding of human perceptive processes and of hearing pathologies and their treatment in particular. Besides, the achievements in auditory system study can guide the design of artificial auditory system as well. As is well known, [...]

The application of auditory models to speech recognition has already been studied extensively. New acoustic features for ASR can be derived through analysis (i.e. cepstrum analysis) [...] systems might improve the recognition rate significantly.

Speech signal is very complex signal. Because of intonation and coarticulation, frequency of speech signal is always changing continuously [2]. Compared with modeling speech signal as stationary signal, modeling them as amplitude-frequency modulation signal accords better with speech characteristics as well as the perception of human auditory systems [3-4]. As a new time-frequency analysis tool, Fractional Fourier transform (FrFT) is attracting more and more attention in signal processing literature. Since FrFT can be considered as a decomposition of the signal in terms of chirps, FrFT is especially suitable for the processing of chirp-like signals. [...]

Because speech signals are very complex, they have many main frequency components, they don't quite resemble simple chirp signals. If we first decompose the speech signal into subband signals using the auditory filterbank, the decomposed subband signals may be more like chirp signals and the FrFT can play a greater role.
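The front-end step named in the abstract, splitting the speech signal into subband signals with a Gammatone filterbank, can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not give the paper's filter parameters, so the fourth-order impulse response, the bandwidth factor 1.019 and the Glasberg-Moore equivalent rectangular bandwidth (ERB) formula used here are common textbook choices, and every function name is hypothetical. The fractional Fourier transform stage is not shown.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula; assumed here)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4, b=1.019):
    """Sampled impulse response t^(order-1) * exp(-2*pi*b*ERB*t) * cos(2*pi*fc*t),
    scaled so that its largest absolute value is 1."""
    n = int(duration_s * fs_hz)
    decay = 2.0 * math.pi * b * erb(fc_hz)
    ir = [(k / fs_hz) ** (order - 1)
          * math.exp(-decay * k / fs_hz)
          * math.cos(2.0 * math.pi * fc_hz * k / fs_hz)
          for k in range(n)]
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def fir_filter(signal, ir):
    """Direct-form convolution, truncated to the length of the input."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k in range(min(t + 1, len(ir))):
            acc += ir[k] * signal[t - k]
        out.append(acc)
    return out


def filterbank(signal, fs_hz, centre_freqs_hz):
    """Return one subband signal per centre frequency."""
    return [fir_filter(signal, gammatone_ir(fc, fs_hz)) for fc in centre_freqs_hz]


if __name__ == "__main__":
    fs = 8000
    # A 1 kHz test tone should excite the 1 kHz channel far more than the 3 kHz one.
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(400)]
    for fc, band in zip([1000.0, 3000.0], filterbank(tone, fs, [1000.0, 3000.0])):
        print(fc, "Hz channel energy:", sum(v * v for v in band))
```

Each subband returned by `filterbank` is narrower in frequency than the full signal, which is the property the introduction relies on when it argues that subband signals may look more like chirps than the original speech does.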