Acousticfeaturesbased_省略_mforspeechrecog


Hu Zhuanglin, Linguistics: A Course Book (《语言学教程》): Test Questions and Answers

Hu Zhuanglin, Linguistics: A Course Book (revised edition): Test Questions

Chapter 1: Introduction to Linguistics

I. Choose the best answer. (20%)
1. Language is a system of arbitrary vocal symbols used for human __________. A. contact B. communication C. relation D. community
2. Which of the following words is entirely arbitrary? A. tree B. typewriter C. crash D. bang
3. The function of the sentence “Water boils at 100 degrees Centigrade.” is __________. A. interrogative B. directive C. informative D. performative
4. In Chinese, when someone breaks a bowl or a plate, the host or the people present are likely to say “碎碎（岁岁）平安” as a means of controlling the forces which they feel might affect their lives. Which function does it perform? A. Interpersonal B. Emotive C. Performative D. Recreational
5. Which of the following properties of language enables language users to overcome the barriers caused by time and place, so that speakers of a language are free to talk about anything in any situation? A. Transferability B. Duality C. Displacement D. Arbitrariness
6. Study the following dialogue. What function does it play according to the functions of language?
- A nice day, isn't it?
- Right! I really enjoy the sunlight.
A. Emotive B. Phatic C. Performative D. Interpersonal
7. __________ refers to the actual realization of the ideal language user's knowledge of the rules of his language in utterances. A. Performance B. Competence C. Langue D. Parole
8. When a dog is barking, you assume it is barking for something or at someone that exists here and now. It couldn't be sorrowful for some lost love or lost bone. This indicates the design feature of __________. A. cultural transmission B. productivity C. displacement D. duality
9. __________ answers such questions as how we as infants acquire our first language. A. Psycholinguistics B. Anthropological linguistics C. Sociolinguistics D. Applied linguistics
10. __________ deals with language application to other fields, particularly education. A. Linguistic theory B. Practical linguistics C. Applied linguistics D. Comparative linguistics

II. Decide whether the following statements are true or false. (10%)
11. Language is a means of verbal communication. Therefore, the communication way used by the deaf-mute is not language.
12. Language change is universal, ongoing and arbitrary.
13. Speaking is the quickest and most efficient way of the human communication systems.
14. Language is written because writing is the primary medium for all languages.
15. We were all born with the ability to acquire language, which means the details of any language system can be genetically transmitted.
16. Only human beings are able to communicate.
17. F. de Saussure, who made the distinction between langue and parole in the early 20th century, was a French linguist.
18. A study of the features of the English used in Shakespeare's time is an example of the diachronic study of language.
19. Speech and writing came into being at much the same time in human history.
20. All the languages in the world today have both spoken and written forms.

III. Fill in the blanks. (10%)
21. Language, broadly speaking, is a means of __________ communication.
22. In any language words can be used in new ways to mean new things and can be combined into innumerable sentences based on limited rules. This feature is usually termed __________.
23. Language has many functions. We can use language to talk about itself. This function is __________.
24. The theory that primitive man made involuntary vocal noises while performing heavy work has been called the __________ theory.
25. Linguistics is the __________ study of language.
26. Modern linguistics is __________ in the sense that the linguist tries to discover what language is rather than lay down some rules for people to observe.
27. One general principle of linguistic analysis is the primacy of __________ over writing.
28. The description of a language as it changes through time is a __________ study.
29. Saussure put forward two important concepts. __________ refers to the abstract linguistic system shared by all members of a speech community.
30. Linguistic potential is similar to Saussure's langue and Chomsky's __________.

IV. Explain the following terms, using examples. (20%)
31. Design feature
32. Displacement
33. Competence
34. Synchronic linguistics

V. Answer the following questions. (20%)
35. Why do people take duality as one of the important design features of human language? Can you tell us what language will be if it has no such design feature? (Nankai University, 2004)
36. Why is it difficult to define language? (Beijing International Studies University, 2004)

VI. Analyze the following situation. (20%)
37. How can a linguist make his analysis scientific? (Ocean University of Qingdao, 1999)

Chapter 2: Speech Sounds

I. Choose the best answer. (20%)
1. Pitch variation is known as __________ when its patterns are imposed on sentences. A. intonation B. tone C. pronunciation D. voice
2. Conventionally a __________ is put in slashes (/ /). A. allophone B. phone C. phoneme D. morpheme
3. An aspirated p, an unaspirated p and an unreleased p are __________ of the p phoneme. A. analogues B. tagmemes C. morphemes D. allophones
4. The opening between the vocal cords is sometimes referred to as __________. A. glottis B. vocal cavity C. pharynx D. uvula
5. The diphthongs that are made with a movement of the tongue towards the center are known as __________ diphthongs. A. wide B. closing C. narrow D. centering
6. A phoneme is a group of similar sounds called __________. A. minimal pairs B. allomorphs C. phones D. allophones
7. Which branch of phonetics concerns the production of speech sounds? A. Acoustic phonetics B. Articulatory phonetics C. Auditory phonetics D. None of the above
8. Which one is different from the others according to places of articulation? A. [n] B. [m] C. [b] D. [p]
9. Which vowel is different from the others according to the characteristics of vowels? A. [i:] B. [u] C. [e] D. [i]
10. What kind of sounds can we make when the vocal cords are vibrating? A. Voiceless B. Voiced C. Glottal stop D. Consonant

II. Decide whether the following statements are true or false. (10%)
11. Suprasegmental phonology refers to the study of phonological properties of units larger than the segment-phoneme, such as syllable, word and sentence.
12. The air stream provided by the lungs has to undergo a number of modifications to acquire the quality of a speech sound.
13. Two sounds are in free variation when they occur in the same environment and do not contrast, namely, the substitution of one for the other does not produce a different word, but merely a different pronunciation.
14. [p] is a voiced bilabial stop.
15. Acoustic phonetics is concerned with the perception of speech sounds.
16. All syllables must have a nucleus but not all syllables contain an onset and a coda.
17. When pure vowels or monophthongs are pronounced, no vowel glides take place.
18. According to the length or tenseness of the pronunciation, vowels can be divided into tense vs. lax or long vs. short.
19. Received Pronunciation is the pronunciation accepted by most people.
20. The maximal onset principle states that when there is a choice as to where to place a consonant, it is put into the coda rather than the onset.

III. Fill in the blanks. (20%)
21. Consonant sounds can be either __________ or __________, while all vowel sounds are __________.
22. Consonant sounds can also be made when two organs of speech in the mouth are brought close together so that the air is pushed out between them, causing __________.
23. The qualities of vowels depend upon the position of the __________ and the lips.
24. One element in the description of vowels is the part of the tongue which is at the highest point in the mouth. A second element is the __________ to which that part of the tongue is raised.
25. Consonants differ from vowels in that the latter are produced without __________.
26. In phonological analysis the words fail / veil are distinguishable simply because of the two phonemes /f/ - /v/. This is an example for illustrating __________.
27. In English there are a number of __________, which are produced by moving from one vowel position to another through intervening positions.
28. __________ refers to the phenomenon that sounds continually show the influence of their neighbors.
29. __________ is the smallest linguistic unit.
30. Speech takes place when the organs of speech move to produce patterns of sound. These movements have an effect on the __________ coming from the lungs.

IV. Explain the following terms, using examples. (20%)
31. Sound assimilation
32. Suprasegmental feature
33. Complementary distribution
34. Distinctive features

V. Answer the following questions. (20%)
35. What is acoustic phonetics? (Renmin University of China, 2003)
36. What are the differences between voiced sounds and voiceless sounds in terms of articulation? (Nankai University, 2004)

VI. Analyze the following situation. (20%)
37. Write the symbol that corresponds to each of the following phonetic descriptions; then give an English word that contains this sound. Example: voiced alveolar stop [d] dog. (Ocean University of Qingdao, 1999)
(1) voiceless bilabial unaspirated stop
(2) low front vowel
(3) lateral liquid
(4) velar nasal
(5) voiced interdental fricative

Chapter 3: Lexicon

I. Choose the best answer. (20%)
1. Nouns, verbs and adjectives can be classified as __________. A. lexical words B. grammatical words C. function words D. form words
2. Morphemes that represent tense, number, gender and case are called __________ morphemes. A. inflectional B. free C. bound D. derivational
3. There are __________ morphemes in the word denationalization. A. three B. four C. five D. six
4. In English -ise and -tion are called __________. A. prefixes B. suffixes C. infixes D. stems
5. The three subtypes of affixes are: prefix, suffix and __________. A. derivational affix B. inflectional affix C. infix D. back-formation
6. __________ is a way in which new words may be formed from already existing words by subtracting an affix which is thought to be part of the old word. A. affixation B. back-formation C. insertion D. addition
7. The word TB is formed in the way of __________. A. acronymy B. clipping C. initialism D. blending
8. The words like comsat and sitcom are formed by __________. A. blending B. clipping C. back-formation D. acronymy
9. The stem of disagreements is __________. A. agreement B. agree C. disagree D. disagreement
10. All of them are meaningful except for __________. A. lexeme B. phoneme C. morpheme D. allomorph

II. Decide whether the following statements are true or false. (10%)
11. Phonetically, the stress of a compound always falls on the first element, while the second element receives secondary stress.
12. Fore as in foretell is both a prefix and a bound morpheme.
13. Base refers to the part of the word that remains when all inflectional affixes are removed.
14. In most cases, prefixes change the meaning of the base whereas suffixes change the word-class of the base.
15. Conversion from noun to verb is the most productive process of a word.
16. Reduplicative compound is formed by repeating the same morpheme of a word.
17. The words whimper, whisper and whistle are formed in the way of onomatopoeia.
18. In most cases, the number of syllables of a word corresponds to the number of morphemes.
19. Back-formation is a productive way of word-formation.
20. Inflection is a particular way of word-formation.

III. Fill in the blanks. (20%)
21. An __________ is pronounced letter by letter, while an __________ is pronounced as a word.
22. Lexicon, in most cases, is synonymous with __________.
23. Orthographically, compounds are written in three ways: __________, __________ and __________.
24. All words may be said to contain a root __________.
25. A small set of conjunctions, prepositions and pronouns belong to __________ class, while the largest part of nouns, verbs, adjectives and adverbs belongs to __________ class.
26. __________ is a reverse process of derivation, and therefore is a process of shortening.
27. __________ is extremely productive, because English had lost most of its inflectional endings by the end of the Middle English period, which facilitated the use of words interchangeably as verbs or nouns, verbs or adjectives, and vice versa.
28. Words are divided into simple, compound and derived words on the __________ level.
29. A word formed by derivation is called a __________, and a word formed by compounding is called a __________.
30. Bound morphemes are classified into two types: __________ and __________.

IV. Explain the following terms, using examples. (20%)
31. Blending
32. Allomorph
33. Closed-class word
34. Morphological rule

V. Answer the following questions. (20%)
35. How many types of morphemes are there in the English language? What are they? (Xiamen University, 2003)
36. What are the main features of the English compounds?

VI. Analyze the following situation. (20%)
37. Match the terms under COLUMN I with the underlined forms from COLUMN II. (Wuhan University, 2004)
COLUMN I                        COLUMN II
(1) acronym                     a. foe
(2) free morpheme               b. subconscious
(3) derivational morpheme       c. UNESCO
(4) inflectional morpheme       d. overwhelmed
(5) prefix                      e. calculation

Chapter 4: Syntax

I. Choose the best answer. (20%)
1. The sentence structure is __________. A. only linear B. only hierarchical C. complex D. both linear and hierarchical
2. The syntactic rules of any language are __________ in number. A. large B. small C. finite D. infinite
3. The __________ rules are the rules that group words and phrases to form grammatical sentences. A. lexical B. morphological C. linguistic D. combinational
4. A sentence is considered __________ when it does not conform to the grammatical knowledge in the mind of native speakers. A. right B. wrong C. grammatical D. ungrammatical
5. A __________ in the embedded clause refers to the introductory word that introduces the embedded clause. A. coordinator B. particle C. preposition D. subordinator
6. Phrase structure rules have __________ properties. A. recursive B. grammatical C. social D. functional
7. Phrase structure rules allow us to better understand __________. A. how words and phrases form sentences B. what constitutes the grammaticality of strings of words C. how people produce and recognize possible sentences D. all of the above
8. The head of the phrase “the city Rome” is __________. A. the city B. Rome C. city D. the city Rome
9. The phrase “on the shelf” belongs to __________ construction. A. endocentric B. exocentric C. subordinate D. coordinate
10. The sentence “They were wanted to remain quiet and not to expose themselves.” is a __________ sentence. A. simple B. coordinate C. compound D. complex

II. Decide whether the following statements are true or false. (10%)
11. Universally found in the grammars of all human languages, syntactic rules that comprise the system of internalized linguistic knowledge of a language speaker are known as linguistic competence.
12. The syntactic rules of any language are finite in number, but there is no limit to the number of sentences native speakers of that language are able to produce and comprehend.
13. In a complex sentence, the two clauses hold unequal status, one subordinating the other.
14. Constituents that can be substituted for one another without loss of grammaticality belong to the same syntactic category.
15. Minor lexical categories are open because these categories are not fixed and new members are allowed for.
16. In English syntactic analysis, four phrasal categories are commonly recognized and discussed, namely, noun phrase, verb phrase, infinitive phrase, and auxiliary phrase.
17. In English the subject usually precedes the verb and the direct object usually follows the verb.
18. What is actually internalized in the mind of a native speaker is a complete list of words and phrases rather than grammatical knowledge.
19. A noun phrase must contain a noun, but other elements are optional.
20. It is believed that phrase structure rules, with the insertion of the lexicon, generate sentences at the level of D-structure.

III. Fill in the blanks. (20%)
21. A __________ sentence consists of a single clause which contains a subject and a predicate and stands alone as its own sentence.
22. A __________ is a structurally independent unit that usually comprises a number of words to form a complete statement, question or command.
23. A __________ may be a noun or a noun phrase in a sentence that usually precedes the predicate.
24. The part of a sentence which comprises a finite verb or a verb phrase and which says something about the subject is grammatically called __________.
25. A __________ sentence contains two, or more, clauses, one of which is incorporated into the other.
26. In the complex sentence, the incorporated or subordinate clause is normally called an __________ clause.
27. Major lexical categories are __________ categories in the sense that new words are constantly added.
28. __________ condition on case assignment states that a case assignor and a case recipient should stay adjacent to each other.
29. __________ are syntactic options of UG that allow general principles to operate in one way or another and contribute to significant linguistic variations between and among natural languages.
30. The theory of __________ condition explains the fact that noun phrases appear only in subject and object positions.

IV. Explain the following terms, using examples. (20%)
31. Syntax
32. IC analysis
33. Hierarchical structure
34. Trace theory

V. Answer the following questions. (20%)
35. What are endocentric construction and exocentric construction? (Wuhan University, 2004)
36. Distinguish the two possible meanings of “more beautiful flowers” by means of IC analysis. (Beijing International Studies University, 2004)

VI. Analyze the following situation. (20%)
37. Draw a tree diagram according to the PS rules to show the deep structure of the sentence: The student wrote a letter yesterday.

Chapter 5: Meaning

I. Choose the best answer. (20%)
1. The naming theory is advanced by __________. A. Plato B. Bloomfield C. Geoffrey Leech D. Firth
2. “We shall know a word by the company it keeps.” This statement represents __________. A. the conceptualist view B. contextualism C. the naming theory D. behaviorism
3. Which of the following is NOT true? A. Sense is concerned with the inherent meaning of the linguistic form. B. Sense is the collection of all the features of the linguistic form. C. Sense is abstract and decontextualized. D. Sense is the aspect of meaning dictionary compilers are not interested in.
4. “Can I borrow your bike?” __________ “You have a bike.” A. is synonymous with B. is inconsistent with C. entails D. presupposes
5. __________ is a way in which the meaning of a word can be dissected into meaning components, called semantic features. A. Predication analysis B. Componential analysis C. Phonemic analysis D. Grammatical analysis
6. “Alive” and “dead” are __________. A. gradable antonyms B. relational antonyms C. complementary antonyms D. None of the above
7. __________ deals with the relationship between the linguistic element and the non-linguistic world of experience. A. Reference B. Concept C. Semantics D. Sense
8. __________ refers to the phenomenon that words having different meanings have the same form. A. Polysemy B. Synonymy C. Homonymy D. Hyponymy
9. Words that are close in meaning are called __________. A. homonyms B. polysemies C. hyponyms D. synonyms
10. The grammaticality of a sentence is governed by __________. A. grammatical rules B. selectional restrictions C. semantic rules D. semantic features

II. Decide whether the following statements are true or false. (10%)
11. Dialectal synonyms can often be found in different regional dialects such as British English and American English but cannot be found within the variety itself, for example, within British English or American English.
12. Sense is concerned with the relationship between the linguistic element and the non-linguistic world of experience, while the reference deals with the inherent meaning of the linguistic form.
13. Linguistic forms having the same sense may have different references in different situations.
14. In semantics, meaning of language is considered as the intrinsic and inherent relation to the physical world of experience.
15. Contextualism is based on the presumption that one can derive meaning from or reduce meaning to observable contexts.
16. Behaviorists attempted to define the meaning of a language form as the situation in which the speaker utters it and the response it calls forth in the hearer.
17. The meaning of a sentence is the sum total of the meanings of all its components.
18. Most languages have sets of lexical items similar in meaning but ranked differently according to their degree of formality.
19. “It is hot.” is a no-place predication because it contains no argument.
20. In grammatical analysis, the sentence is taken to be the basic unit, but in semantic analysis of a sentence, the basic unit is predication, which is the abstraction of the meaning of a sentence.

III. Fill in the blanks. (20%)
21. __________ can be defined as the study of meaning.
22. The conceptualist view holds that there is no __________ link between a linguistic form and what it refers to.
23. __________ means what a linguistic form refers to in the real, physical world; it deals with the relationship between the linguistic element and the non-linguistic world of experience.
24. Words that are close in meaning are called __________.
25. When two words are identical in sound, but different in spelling and meaning, they are called __________.
26. __________ opposites are pairs of words that exhibit the reversal of a relationship between the two items.
27. __________ analysis is based upon the belief that the meaning of a word can be divided into meaning components.
28. Whether a sentence is semantically meaningful is governed by rules called __________ restrictions, which are constraints on what lexical items can go with what others.
29. A(n) __________ is a logical participant in a predication, largely identical with the nominal element(s) in a sentence.
30. According to the __________ theory of meaning, the words in a language are taken to be labels of the objects they stand for.

IV. Explain the following terms, using examples. (20%)
31. Entailment
32. Proposition
33. Componential analysis
34. Reference

V. Answer the following questions. (20%)
35. What are the sense relations between the following groups of words? Dogs, cats, pets, parrots; trunk, branches, tree, roots (Ocean University of Qingdao, 1999)
36. What are the three kinds of antonymy? (Wuhan University, 2004)

VI. Analyze the following situation. (20%)
37. For each group of words given below, state what semantic property or properties are shared by the (a) words and the (b) words, and what semantic property or properties distinguish between the classes of (a) words and (b) words. (Ocean University of Qingdao, 1999)
(1) a. bachelor, man, son, paperboy, pope, chief
    b. bull, rooster, drake, ram
(2) a. table, stone, pencil, cup, house, ship, car
    b. milk, alcohol, rice, soup
(3) a. book, temple, mountain, road, tractor
    b. idea, love, charity, sincerity, bravery, fear

Chapter 7: Language, Culture and Society
[Note: there are no test questions for Chapter 6.]

I. Choose the best answer. (20%)
1. __________ is concerned with the social significance of language variation and language use in different speech communities. A. Psycholinguistics B. Sociolinguistics C. Applied linguistics D. General linguistics
2. The most distinguishable linguistic feature of a regional dialect is its __________. A. use of words B. use of structures C. accent D. morphemes
3. __________ is speech variation according to the particular area where a speaker comes from. A. Regional variation B. Language variation C. Social variation D. Register variation
4. __________ are the major source of regional variation of language. A. Geographical barriers B. Loyalty to and confidence in one's native speech C. Physical discomfort and psychological resistance to change D. Social barriers
5. __________ means that certain authorities, such as the government, choose a particular speech variety, standardize it and spread the use of it across regional boundaries. A. Language interference B. Language changes C. Language planning D. Language transfer
6. __________ in a person's speech or writing usually ranges on a continuum from casual or colloquial to formal or polite according to the type of communicative situation. A. Regional variation B. Changes in emotions C. Variation in connotations D. Stylistic variation
7. A __________ is a variety of language that serves as a medium of communication among groups of people with diverse linguistic backgrounds. A. lingua franca B. register C. Creole D. national language
8. Although __________ are simplified languages with reduced grammatical features, they are rule-governed, like any human language. A. vernacular languages B. creoles C. pidgins D. sociolects
9. In normal situations, __________ speakers tend to use more prestigious forms than their __________ counterparts with the same social background. A. female; male B. male; female C. old; young D. young; old
10. A linguistic __________ refers to a word or expression that is prohibited by the “polite” society from general use. A. slang B. euphemism C. jargon D. taboo

II. Decide whether the following statements are true or false. (10%)
11. Language as a means of social communication is a homogeneous system with a homogeneous group of speakers.
12. The goal of sociolinguistics is to explore the nature of language variation and language use among a variety of speech communities and in different social situations.
13. From the sociolinguistic perspective, the term “speech variety” cannot be used to refer to standard language, vernacular language, dialect or pidgin.
14. The most distinguishable linguistic feature of a regional dialect is its grammar and uses of vocabulary.
15. A person's social backgrounds do not exert a shaping influence on his choice of linguistic features.
16. Every speaker of a language is, in a stricter sense, a speaker of a distinct idiolect.
17. A lingua franca can only be used within a particular country for communication among groups of people with different linguistic backgrounds.
18. A pidgin usually reflects the influence of the higher, or dominant, language in its lexicon and that of the lower language in its phonology and occasionally syntax.
19. Bilingualism and diglossia mean the same thing.
20. The use of euphemisms has the effect of removing derogatory overtones and the disassociative effect as such is usually long-lasting.

III. Fill in the blanks. (20%)
21. The social group isolated for any given study is called the speech __________.
22. Speech __________ refers to any distinguishable form of speech used by a speaker or group of speakers.
23. From the sociolinguistic perspective, a speech variety is no more than a __________ variety of a language.
24. Language standardization is also called language __________.
25. Social variation gives rise to __________ which are subdivisible into smaller speech categories that reflect their socioeconomic, educational, occupational background, etc.
26. __________ variation in a person's speech or writing usually ranges on a continuum from casual or colloquial to formal or polite according to the type of communicative situation.
27. A regional dialect may gain status and become standardized as the national or

Linguistics: A Course Book, 02 Chapter 2: Sound (2)

If a sound becomes more like the following sound, as in the case of lamb, it is known as anticipatory coarticulation. If a sound shows the influence of the preceding sound, it is perseverative coarticulation, as in the case of map.
In phonetic terms, phonemic transcriptions represent the ‘broad’ transcriptions.
3.3 Allophones

Allophones: the different phones which can represent a phoneme in different phonetic contexts.

Velarization: clear l and dark l. /l/ → [l] / _____ V; [ɫ] / V _____

Think about tell and telling!
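
The tell / telling contrast can be worked through in a few lines of Python. This is only a sketch under stated assumptions: the vowel inventory `VOWELS` and the function name `realize_l` are illustrative, and the rule is read with a precedence that the slide does not spell out, namely clear [l] whenever a vowel follows and dark [ɫ] in all other positions.

```python
# Illustrative (incomplete) vowel inventory; an assumption for this sketch.
VOWELS = {"i:", "ɪ", "e", "æ", "ɑ:", "ɒ", "ɔ:", "ʊ", "u:", "ʌ", "ɜ:", "ə"}

def realize_l(segments):
    """Replace each phoneme /l/ in a list of segments with its allophone."""
    out = []
    for i, seg in enumerate(segments):
        if seg != "l":
            out.append(seg)
            continue
        next_is_vowel = i + 1 < len(segments) and segments[i + 1] in VOWELS
        # Assumed precedence: clear [l] before a vowel, dark [ɫ] elsewhere
        # (after a vowel at the end of a word, or before a consonant).
        out.append("l" if next_is_vowel else "ɫ")
    return out

tell = realize_l(["t", "e", "l"])               # /tel/
telling = realize_l(["t", "e", "l", "ɪ", "ŋ"])  # /telɪŋ/
print(tell, telling)
```

Under this reading, tell ends in dark [ɫ] while telling keeps clear [l], because the suffix puts a vowel after the /l/.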

Phonetic similarity: the allophones of a phoneme must bear some phonetic resemblance.

The word ‘phoneme’ simply refers to a ‘unit of explicit sound contrast’: the existence of a minimal pair automatically grants phonemic status to the sounds responsible for the contrast.
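
The minimal-pair test can be sketched in Python as follows. The assumptions are that each word is stored as a tuple of segments, that a minimal pair means two transcriptions of equal length differing in exactly one position, and that the small `LEXICON` and the function names are purely illustrative.

```python
from itertools import combinations

def is_minimal_pair(a, b):
    """True if two transcriptions have equal length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every minimal pair in the lexicon."""
    found = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if is_minimal_pair(t1, t2):
            # Locate the single position where the two transcriptions contrast.
            i = next(k for k, (x, y) in enumerate(zip(t1, t2)) if x != y)
            found.append((w1, w2, t1[i], t2[i]))
    return found

# Illustrative broad transcriptions (an assumption for this sketch).
LEXICON = {
    "fail": ("f", "eɪ", "l"),
    "veil": ("v", "eɪ", "l"),
    "pit": ("p", "ɪ", "t"),
    "bit": ("b", "ɪ", "t"),
    "spit": ("s", "p", "ɪ", "t"),
}

pairs = minimal_pairs(LEXICON)
print(pairs)
```

Each pair that is found is evidence that the two contrasting sounds are separate phonemes; spit pairs with nothing here because it has an extra segment.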

A New Concise Course in Linguistics for Students of English (Dai Weidong edition), Units 1-6: Final Exam Revision Notes

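● Linguists
1. F. de Saussure (p. 4): Swiss linguist. He distinguished langue from parole in the early 20th century and wrote Course in General Linguistics. He stressed the study of langue (what the linguist should do is to abstract langue from parole).
2. N. Chomsky: American linguist. He distinguished competence from performance in the late 1950s and stressed the study of competence, which parallels Saussure's emphasis on langue.
● Differences between Saussure and Chomsky: Saussure takes a sociological view, and his notion of langue belongs to the sphere of social conventions; Chomsky takes a psychological view and regards competence as a property of the mind of each individual.
3. Modern linguistics is mostly descriptive, whereas traditional grammar is prescriptive.
4. In modern linguistics, synchronic study takes priority.

Phonetics and Phonology
● Organs of speech: 1. pharyngeal cavity 2. oral cavity 3. nasal cavity
● Speech and writing are the two media or substances of natural language (speech is more basic than writing).
● From which three points of view does phonetics study speech sounds?
(1) The speaker's point of view: articulatory phonetics (the branch with the longest history)
(2) The hearer's point of view: auditory phonetics
(3) The way sounds are transmitted: acoustic phonetics
● Sounds are now mainly transcribed in IPA, but linguists also use narrow transcription. The book gives two examples: [l] in leap, feel and health, and [p] in pit and spit (aspirated vs. unaspirated), with [pʰ] marking aspiration.
● Classification of speech sounds: vowels (voiced sounds) and consonants (voiced or voiceless).
● Classification of vowels:
(1) By which part of the tongue is highest: front, central and back
(2) By the openness of the mouth: close, semi-close, semi-open and open vowels
(3) By lip shape: unrounded (all front and central vowels plus [a:]) and rounded (the back vowels)
● Segment and syllable: to count segments, count the vowels and consonants; to count syllables, count the vowels.
● The difference between phonetics and phonology:
(1) Phoneticians are concerned with how [l] is pronounced: clear l versus dark l.
(2) Phonologists are concerned with the distribution pattern of /l/, that is, in which positions each sound occurs: after a vowel or before a consonant it is the dark [ɫ], as in feel and quilt; before a vowel it is the clear [l], as in leap.
Note: Phonology is concerned with the sound system of a particular language. Linguistics is the scientific study of human languages in general.

I. Distinguishing phone, phoneme and allophone
● Phone:
(1) The words feel [fi:ɫ], leaf [li:f], tar [tʰa:] and star [sta:] contain eight different phones in all: [f], [i:], [ɫ], [l], [tʰ], [s], [t] and [a:].
(2) English has 48 sounds in all: 20 vowels and 28 consonants.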

Festival Multisyn Voices for the 2007 Blizzard Challenge

Korin Richmond, Volker Strom, Robert Clark, Junichi Yamagishi and Sue Fitt
Centre for Speech Technology Research, University of Edinburgh, Edinburgh, United Kingdom
(korin|vstrom|robert|jyamagis|sue)@

Abstract

This paper describes selected aspects of the Festival Multisyn entry to the Blizzard Challenge 2007. We provide an overview of the process of building the three required voices from the speech data provided. This paper focuses on new features of Multisyn which are currently under development and which have been employed in the system used for this Blizzard Challenge. These differences are the application of a more flexible phonetic lattice representation during forced alignment labelling and the use of a pitch accent target cost component. Finally, we also examine aspects of the speech data provided for this year's Blizzard Challenge and raise certain issues for discussion concerning the aim of comparing voices made with differing subsets of the data provided.

1. Introduction

Multisyn is a waveform synthesis module which has recently been added to the Festival speech synthesis system [1]. It provides a flexible, general implementation of unit selection and a set of associated voice building tools. Strong emphasis is placed on flexibility as a research tool on one hand, and a high level of automation using default settings during “standard” voice building on the other.

This paper accompanies the Festival Multisyn entry to the Blizzard Challenge 2007. Similar to the Blizzard Challenges of the previous two years ([2, 3]), the 2007 Blizzard Challenge required entrants to build three voices from the speech data provided by speaker “EM001”, then submit a set of synthesised test sentences for evaluation. The first voice, labelled voice “A”, used the entire voice database. Two smaller voices, “B” and “C”, used subsections of the database. Voice “B” used the set of sentences from the ARCTIC database [4] which were recorded by the EM001 speaker. For voice “C”, entrants were invited to perform their own text selection on the voice database prompts to select a subset of sentences no larger than the ARCTIC data set in terms of total duration of speech in seconds. Voices “B” and “C” are intended as a means to compare different text selection algorithms, as well as to evaluate the performance of synthesis systems when using more limited amounts of speech data.

Multisyn and the process of building voices for Multisyn is described in detail in [1]. In addition, entrants to the Blizzard Challenge this year have been asked to provide a separate system description in the form of a template questionnaire. For the reader's convenience this paper will provide a brief overview of Multisyn and the voices built. To limit redundancy, however, we will not repeat all details comprehensively. Instead, we aim to focus here on areas where the use of Multisyn differs from [1]. Those significant differences are two-fold. First, we will introduce a new technique we have been developing to help in forced alignment labelling. Next, we describe a target cost component which uses a simple pitch accent prediction model. Finally, we will discuss our experience of building voice “C”, and highlight some issues we believe may complicate comparison of entrants' voices “B” and “C”.

2. Multisyn voice building

We use our own Unisyn lexicon and phone set [5], so we only used the prompts and associated wave files from the distributed data, performing all other processing for voice building from scratch.
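
The constraint placed on voice “C” in the Introduction (a subset of prompts whose total speech duration does not exceed that of the ARCTIC set) can be illustrated with a minimal Python sketch. This is a generic greedy coverage heuristic and not the text selection method used for this entry; the function name `select_subset`, the unit labels and the durations are all hypothetical.

```python
def select_subset(utterances, budget_seconds):
    """Greedily pick utterances that add the most new units per second of speech.

    utterances maps an utterance id to (duration_in_seconds, set_of_units);
    durations are assumed to be positive. Hypothetical sketch only.
    """
    chosen, covered, total = [], set(), 0.0
    remaining = dict(utterances)
    while remaining:
        best_id, best_gain = None, 0.0
        for utt_id, (dur, units) in remaining.items():
            if total + dur > budget_seconds:
                continue  # adding this utterance would exceed the duration budget
            gain = len(units - covered) / dur
            if gain > best_gain:
                best_id, best_gain = utt_id, gain
        if best_id is None:
            break  # nothing fits the budget or adds new coverage
        dur, units = remaining.pop(best_id)
        chosen.append(best_id)
        covered |= units
        total += dur
    return chosen, total

# Hypothetical prompts: duration in seconds and the units (e.g. diphones) they contain.
prompts = {
    "u1": (3.0, {"a-b", "b-c", "c-d"}),
    "u2": (2.0, {"a-b", "b-c"}),
    "u3": (4.0, {"d-e", "e-f", "f-g", "g-h", "h-i"}),
    "u4": (2.0, {"x-y"}),
}
chosen, total = select_subset(prompts, budget_seconds=6.0)
print(chosen, total)
```

The budget check is the part taken from the challenge rules; the gain measure (new units per second) is just one plausible choice among many.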
The first step of voice building involved some brief examination of the text prompts to find missing words and to add some of them to our lexicon, fix gross text normalisation problems and so on. Next, we used an automatic script to reduce the duration of any single silence found in a wave file to a maximum of 50 msec. From this point, the process for building Multisyn voices “A”, “B” and “C” described in the remainder of this section was repeated separately for the relevant utterance subset for each voice.

We used HTK tools in a scripted process to perform forced alignment using frames of 12 MFCCs plus log energy (utterance based energy normalisation switched off) computed with a 10 msec window and 2 msec frame shift. The process began with single mixture monophone models with three emitting states, trained from a “flat start”. Initial labelling used a single phone sequence predicted by the Festival Multisyn front end. However, as the process progressed with further iterations of reestimation, realignment, mixing up, adding a short pause tee model, and so on, we switched to using a phone lattice for alignment described in Section 3. Once labelling was completed, we used it to perform a waveform power factor normalisation of all waveforms in the database. This process looks at the energy in the vowels of each utterance to compute a single factor to scale its waveform. The power normalised waveforms were then used throughout the remainder of the voice building process, which began with repeating the whole labelling process.

Once the labelling had been completed, it was used to build utterance structures (footnote 1), which are used as part of the internal representation within a final Multisyn voice. At this stage, the text prompts were run through a simple pitch accent prediction model (see Section 4), and this information stored in the utterance structures. Additional information was also added to the utterance structures at this stage; for example, phones with a duration more than 2 standard deviations from the mean were
flagged. Such information could be used later at unit selection time in the target cost function.

In addition to labelling and linguistic information stored in utterance files, Multisyn requires join cost coefficients and RELP synthesis parameters. To create the synthesis parameters, we first performed pitchmarking using a custom script which makes use of Entropic’s epochs, get_resid, get_f0 and refcof programs. We then used the sig2fv and sigfilter programs from the Edinburgh Speech Tools for LPC analysis and residual signal generation respectively. The Multisyn join cost uses three equally weighted components: spectral, f0 and log energy. The spectral and log energy join cost coefficients were taken from the MFCC files calculated by HTK’s HCopy used for labelling. The f0 contours were provided by the ESPS program get_f0. All three of these feature streams were globally normalised and saved in the appropriate voice data structure.

(Footnote 1: A data structure defined in the Edinburgh Speech Tools library.)

During unit selection, Multisyn does not use any acoustic prosodic targets in terms of pitch or duration. Instead, the target cost is a weighted normalised sum of a series of components which consider the following: lexical stress, syllable position, word position, phrase position, part of speech, left and right phonetic context, “bad duration” and “bad f0”. As mentioned above, “bad duration” is a flag which is set on a phone within a voice database utterance during voice building and suggests a segment should not be used. Similarly, the “bad f0” target cost component looks at a candidate unit’s f0 at concatenation points, considering voicing status rather than a specific target f0 value. We have also used an additional target cost component for the presence or absence of a pitch accent on a vowel. This is described further in Section 4.

Finally, we stress that during concatenation of the best candidate unit sequence, Multisyn does not currently employ any signal processing apart from a simple overlap-add windowing at unit
boundaries. No prosodic modification of candidate units is attempted and no spectral, amplitude or f0 interpolation is performed across concatenation boundaries.

3. Finite state phonetic lattice labelling

For all three voices for this Blizzard Challenge we employed a forced alignment system we have been developing which makes use of a finite state representation of the predicted phonetic realisation of the recorded prompts. The advantage of the finite state phonetic representation is that it makes it possible to elegantly encode and process a wide variety of pronunciation variation during labelling of speech data. In the following two sections we first give a general introduction to how our phonetic lattice labelling works, and then give some more specific details of how the system was applied to building voices for this Blizzard Challenge.

3.1. General implementation

If we consider how forced alignment is standardly performed using HTK, for example, the user is required to provide, among other things, a pronunciation lexicon and word level transcription. The pronunciation lexicon contains a mapping between a given word and a corresponding sequence of phone model labels. During forced alignment, the HTK recognition engine loads the word level transcription and expands this into a recognition network, or “lattice”, of phone models using the pronunciation dictionary. This lattice is then used to align against the sequence of acoustic parameter vectors. The predominant way to include pronunciation variation within this system is to use multiple entries in the lexicon for the same word. This approach generally suits speech recognition, but in the case of labelling for building a unit selection voice, we could perhaps profit from more flexibility. Complete flexibility is achieved if we compose the phone lattice directly and pass that to the recognition engine.

To build the phone lattice for a given prompt sentence, we first look up each word in the lexicon and convert the phone string to a simple finite state
structure. When a word is not found in the lexicon, we use the CART letter-to-sound rules the final Festival voice would use to generate a phone string. Where multiple pronunciations for a word are found, we can combine these into a single finite state representation using the union operation. The finite state machines for the separate words are then concatenated in sequence to give a single representation of the sentence. The top finite state acceptor (FSA) in Figure 1 gives a simplified example of the result of this process for a phrase fragment “...wider economic...”.

At this stage, there is little advantage over the standard HTK method, which would internally arrive at the same result. However, once we have a predicted phonetic realisation for a recording prompt in a finite state form, it is then straightforward to process this representation further in an elegant and robust way. This is useful to help perform simple tasks, such as splitting stops and affricates into separate symbols for their stop and release parts during forced alignment (done to identify a suitable concatenation point). More significantly, though, we can also robustly apply more complex context dependent postlexical rules, for example optional “r” epenthesis intervocalically across word boundaries for certain British English accents. This is indicated in the bottom FSA of Figure 1.

This may be conveniently achieved by writing rules in the form of context dependent regular expressions. It is then possible to automatically compile these rules into an equivalent finite state transducer which can operate on the input lattice which resulted from lexical lookup (e.g. top FSA in Figure 1). Several variations of compilation methods have been previously described to convert a system of handwritten context dependent mapping rules into an equivalent FST machine to perform the transduction, e.g. [6, 7, 8]. Note that the use of context dependent modifications is more flexible and powerful than the standard HTK methods. For example, a standard way to
implement optional “r” epenthesis pronunciation variation using a pronunciation lexicon alone would be to include multiple entries for “wider”, one of which contains the additional “r”. However, this introduces a number of problems. The most significant problem is the absence of any mechanism to disallow “r” epenthesis in environments where a vowel does not follow.

The phonetic lattice alignment code has been implemented as a set of Python modules which underlyingly use and extend the MIT Finite State Transducer Toolkit [9]. We use CSTR’s Unisyn lexicon [5] to build voices and within the running synthesis system. For forced alignment, we use scripts which underlyingly make use of the HTK speech recognition library [10]. Finally, we are planning to make this labelling system publicly available once it reaches a more mature state of development.

3.2. Application to EM001 voice

Speaker EM001 exhibits a rather careful and deliberate approach to pronunciation during the recordings and uses a relatively slow rate of speech. This in fact tends to limit the applicability and usefulness of postlexical rules for the Blizzard Challenge voices somewhat. Postlexical rules are more usefully applied to the processes of more fluent and rapid connected speech. Thus, in building the three voices for the 2007 Blizzard Challenge, the sole postlexical rule we used was a “tap” rule. Under this rule, alveolar stops in an intervocalic cross word environment could undergo optional transformation to a tap. Specifically, the left phonetic context for this rule comprised the set of vowels together with /r, l, n/ (central and lateral approximants and alveolar nasal stop), while the right context contained just the set of vowels.

[Figure 1: Toy example finite state phonetic lattices for the phrase fragment “wider economic”: a) after lexical lookup, the lattice encodes multiple pronunciation variants for “economic”; b) after additional “r” insertion postlexical rule, the input lattice (top) is modified to allow optional insertion of “r” (instead of short pause “sp”).]

4. Pitch accent prediction

In this year’s system, we have experimented with a simple pitch accent target cost function component. To use pitch accent prediction in the voices built for the Blizzard Challenge required three changes. First, we ran a pitch accent predictor on the text prompts and flagged words with a predicted accent as such in the voice data structures. Next, at synthesis time, our front end linguistic processor was modified to run the accent predictor on the input sentence to be synthesised, and words with a predicted accent were similarly flagged. Finally, an additional target cost component compared the values of the pitch accent flag for the word associated with each target vowel and returned a suitable cost depending on whether they match or not.

The method for pitch accent prediction we used here is very simple. It is centred on a look-up table of probabilities that a word will be accented, or “accent ratios”, along the lines of the approach described in [11]. The accent predictor simply looks up a word in this list. If the word is found and its probability for being accented is less than the threshold of 0.28, it is not accented. Otherwise it will receive an accent. These accent ratios are based on the BU Radio Corpus and six Switchboard dialogues. The list contains 157 words with an accent ratio of less than 0.28 (footnote 2). The pitch accent target cost component has recently been evaluated in a large scale listening test and was found to be beneficial [12].

5. Voice “C” and text selection

Entrants to the 2007 Blizzard Challenge were encouraged to enter a third voice with a voice database size equal to that of the ARCTIC subset, but with a freely selected subset of utterances.
The purpose of this voice is to probe the performance of each team’s text selection process, as well as to provide some insight into the suitability of the ARCTIC data set itself.

5.1. Text selection process

Ordinarily, when designing a prompt set for recording a unit selection voice database, we would seek to avoid longer sentences. They are generally harder to read, which means they are more taxing on the speaker and are more likely to slow down the recording process. In this case, however, since the sentences had been recorded already, we decided to relax this constraint.

In a simple greedy text selection process, sentences were chosen in an iterative way. First, the diphones present in the EM001 text prompts were subcategorised to include certain contextual features. The features we included were lexical stress, pitch accent and proximity to word boundary. Syllable boundary information was not used in the specification of diphone subtypes.

Next, sentences were ranked according to the number of context dependent diphones contained. The top ranking sentence was selected, then the ranking of the remaining sentences was recomputed to reflect the diphones now present in the subset of selected sentences. Sentences were selected one at a time in this way until the total time of the selected subset reached the prescribed threshold. This resulted in a subset comprising 431 utterances, with a total duration of 2908.75 seconds.

(Footnote 2: Using the accent ratio table in this way is essentially equivalent to using an (incomplete) list of English function words.)

[Figure 2: Histogram of counts of unique context dependent diphone types present in the full EM001 set which are missing from the selected subset used to build voice “C”. Axis labels: count of diphone type in full EM001 set; count of counts of missing diphone types.]

Our definition of context dependent diphones implied a total of 6,199 distinct diphones with context in the entire EM001 corpus. Our selected subset for voice “C” contained 4,660 of these, which meant 1,539 were
missing. Figure 2 shows a histogram of the missing diphone types in terms of their counts in the full EM001 data set. We see that the large majority of the missing diphone types only occur 1–5 times in the full EM001 dataset. For example, 773 of the diphone types which are missing from the selected subset only occur once in the full EM001 set, while only one diphone type which is missing occurred as many as 26 times in the full data set.

5.2. Evaluation problems

Although it is certainly interesting to compare different text selection algorithms against the ARCTIC sentence set, we suggest the way it has been performed this year could potentially confuse this comparison. The first issue to which we would like to draw attention concerns the consistency of the recorded speech material throughout the database. The second issue concerns the question of how far the full EM001 data set satisfies the selection criteria used by arbitrary text selection algorithms.

5.2.1. Consistency of recorded utterances

Figures 3–5 show plots of MFCC parameter means from the EM001 database taken in alphabetical file ordering. To produce these plots we have taken all files in the EM001 data set in alphabetical ordering (along the x-axis) and calculated the mean MFCC parameters (footnote 3) for each file. In calculating these means, we have omitted the silence at the beginning and end of files using the labelling provided by the forced alignment we conducted during voice building. A single selected dimension of this mean vector is then plotted in each of the Figures 3–5.

[Figure 3: Mean value for 9th MFCC channel for each file of the EM001 voice database. Axes: EM001 file (alphabetical sorting); mean for 9th MFCC channel.]

[Figure 4: Mean value for 7th MFCC channel for each file of the EM001 voice database. Axes: EM001 file (alphabetical sorting); mean for 7th MFCC channel.]

[Figure 5: Mean value for 11th MFCC channel for each file of the EM001 voice database. Axes: EM001 file (alphabetical sorting); mean for 11th MFCC channel.]

From these
figures, we notice that there seem to be three distinct sections of the database, which correspond to the “ARCTIC”, “BTEC” and “NEWS” file labels as indicated in the plots. Within each of these blocks, the MFCC mean varies randomly, but apparently uniformly so. Between these three sections, however, we observe marked differences. For example, compare the distributions of per-file means of the 9th (Fig. 3) and 7th (Fig. 4) MFCC parameters within the “NEWS” section with those from the other two sections of the database.

We naturally expect the MFCC means to vary “randomly” from file to file according to the phonetic content of the utterance contained. However, an obvious trend such as that exhibited in these plots suggests the influence of something more than phonetic variation alone. Specifically, we suspect this situation has arisen due to the significant difficulty of ensuring consistency throughout the many days necessary to record a speech corpus of this size. We have observed similar effects of inconsistency within other databases, both those we have recorded at CSTR, as well as other commercially recorded databases. Recording a speech corpus over time allows the introduction of variability, with potential sources ranging from the acoustic recording environment (e.g. microphone placement relative to speaker) to the quality of the speaker’s own voice, which of course can vary over a very short space of time [13]. In addition, even the genre and nature of the prompts themselves can influence a speaker’s reading style and voice characteristics.

Note that although we do not see any trends within each of the three sections of the EM001 data set, and that they appear relatively homogeneous, this does not imply that these subsections are free of the same variability and inconsistency. These plots have been produced by taking the files in alphabetical, and hence numerical, order. But it is not necessarily the case that the files were recorded in this order. In fact, it is likely the file ordering within the
subsections has been randomised, which has the effect of disguising inconsistency within the three sections. The inconsistency between the sections is evident purely because the genre identity tag has maintained three distinct groups.

Therefore, despite the probable randomisation of file order within sections, we infer from the patterns evident in Figures 3–5 that the speech data corresponding to the ARCTIC prompt set was recorded all together, and constitutes a reasonably consistent “block” of data. Meanwhile, the rest of the data seems to have been recorded at different times. This introduces inconsistency throughout the database, which a selection algorithm based entirely upon text features will not take account of. This means that unless it is explicitly and effectively dealt with by the synthesis system which uses the voice data, both at voice building time (e.g. using cepstral mean normalisation during forced alignment) and at synthesis time, voice “C” stands a high chance of being disadvantaged by selecting data indiscriminately from inconsistent subsections of the database. The forced alignment labelling may suffer because of the increased variance of the speech data. Unit selection may suffer because the spectral component of the join cost may result in a nonuniform probability of making joins across sections of the database, compared with those joins within a single section. This has the effect of “partitioning” the voice database.

(Footnote 3: Extracted using HTK’s HCopy as part of our forced alignment processing, and also subsequently used in the Multisyn join cost.)

The Multisyn voice building process currently takes account of amplitude inconsistency, and attempts waveform power normalisation on a per-utterance basis. However, other sources of inconsistency, most notably spectral inconsistency, are not currently addressed. This means that Multisyn voice “C” is potentially affected by database inconsistency, which introduces uncertainty and confusion in any comparison between
voices “B” and “C”. Within the subset of 431 sentences we selected to build voice “C”, 261 came from the “NEWS” section, 169 came from the “BTEC” section, and the remaining 36 came from the “ARCTIC” section.

This issue of inconsistency can potentially affect the comparison between the “C” voices from different entrants. For example, according to our automatic phonetic transcriptions of the EM001 sentence set, the minimum number of phones contained in a single sentence within the “NEWS” section is 52. Meanwhile, the “BTEC” section contains 1,374 sentences with fewer than 52 phones. Although we have not done so here, it is not unreasonable for a text selection strategy to favour short sentences, in which case a large majority may be selected from the “BTEC” section. This would result in avoiding the large discontinuity we observe in Figures 3 and 4 and could potentially confer an advantage which is in fact unrelated to the text selection algorithm per se.

The problem has the potential, however, to introduce most confusion into the comparison between entrants’ voices “B” and “C”, as there is most likely to be a bias in favour of the ARCTIC subset, which seems to have been recorded as a single block.
We suggest there are at least two ways of avoiding this bias in future challenges. One way would be to provide a database without the inconsistency we observe here, for example through post-processing. This is likely to be rather difficult to realise, and our own previous attempts have failed to find a satisfactory solution, although [14] reported some success. A second, simpler way would be to record the set of ARCTIC sentences randomly throughout the recording of a future Blizzard Challenge corpus.

5.2.2. Selection criteria coverage

The second problem inherent in attempting to compare text selection processes in this way arises from differing selection criteria. It is usual to choose text selection criteria (i.e. which diphone context features to consider) which complement the synthesis system’s target cost function. Hence the criteria may vary between systems.

The set of ARCTIC sentences was selected from a very large amount of text, and so the possibility for the algorithm to reach its optimal subset in terms of the selection criteria it used is maximised. In contrast, the text selection required for voice “C” was performed on a far smaller set of sentences. Although, admittedly, it is likely to be phonetically much richer than if the same number of sentences had been selected randomly from a large corpus, it is possible that the initial set of sentences does not contain a sufficient variety of material to satisfy the selection criteria of arbitrary text selection systems. This again may tend to accord an inherent advantage to voice “B”.

6. Conclusion

We have introduced two new features of the Multisyn unit selection system. We have also raised issues for discussion concerning the comparison of voices built with differing subsets of the provided data. Finally, we note that, as in previous years, participating in this Blizzard Challenge has proved both interesting and useful.

7. Acknowledgments

Korin Richmond is currently supported by EPSRC grant EP/E027741/1. Many thanks to Lee Hetherington
for making the MIT FST toolkit available under a BSD-style license, and for other technical guidance. Thanks to A. Nenkova for processing the Blizzard text prompts for pitch accent prediction.

8. References

[1] R. A. J. Clark, K. Richmond, and S. King, “Multisyn: Open-domain unit selection for the Festival speech synthesis system,” Speech Communication, vol. 49, no. 4, pp. 317–330, 2007.
[2] R. Clark, K. Richmond, V. Strom, and S. King, “Multisyn voice for the Blizzard Challenge 2006,” in Proc. Blizzard Challenge Workshop (Interspeech Satellite), Pittsburgh, USA, Sept. 2006, (/blizzard/blizzard2006.html).
[3] R. A. Clark, K. Richmond, and S. King, “Multisyn voices from ARCTIC data for the Blizzard challenge,” in Proc. Interspeech 2005, Sept. 2005.
[4] J. Kominek and A. Black, “The CMU ARCTIC speech databases,” in 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004, pp. 223–224.
[5] S. Fitt and S. Isard, “Synthesis of regional English using a keyword lexicon,” in Proc. Eurospeech ’99, vol. 2, Budapest, 1999, pp. 823–826.
[6] M. Mohri and R. Sproat, “An efficient compiler for weighted rewrite rules,” in Proc. 34th annual meeting of Association for Computational Linguistics, 1996, pp. 231–238.
[7] R. Kaplan and M. Kay, “Regular models of phonological rule systems,” Computational Linguistics, vol. 20, no. 3, pp. 331–378, Sep 1994.
[8] L. Karttunen, “The replace operator,” in Proc. 33rd annual meeting of Association for Computational Linguistics, 1995, pp. 16–23.
[9] L. Hetherington, “The MIT finite-state transducer toolkit for speech and language processing,” in Proc. ICSLP, 2004.
[10] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK version 3.2), Cambridge University Engineering Department, 2002.
[11] J. Brenier, A. Nenkova, A. Kothari, L. Whitton, D. Beaver, and D. Jurafsky, “The (non)utility of linguistic features for predicting prominence on spontaneous speech,” in IEEE/ACL 2006 Workshop on Spoken Language Technology, 2006.
[12] V. Strom, A. Nenkova, R. Clark, Y. Vazquez-Alvarez, J. Brenier, S. King, and D. Jurafsky, “Modelling prominence and emphasis improves unit-selection synthesis,” in Proc. Interspeech, Antwerp, 2007.
[13] H. Kawai and M. Tsuzaki, “Study on time-dependent voice quality variation in a large-scale single speaker speech corpus used for speech synthesis,” in Proc. IEEE Workshop on Speech Synthesis, 2002, pp. 15–18.
[14] Y. Stylianou, “Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis,” in Proc. ICASSP-99, Phoenix, Arizona, Mar. 1999, pp. 377–380.
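
As a supplementary illustration of the greedy text selection described in Section 5.1, the following Python sketch may help make the re-ranking step concrete. It is a minimal sketch under stated assumptions, not the scripts used for the Blizzard entry: the function name `greedy_select`, the representation of each sentence as a set of context dependent diphone labels, the rule of skipping sentences that would exceed the duration budget, and the rule of stopping when no candidate adds a new type are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set


def greedy_select(units: Dict[str, Set[str]],
                  durations: Dict[str, float],
                  max_duration: float) -> List[str]:
    """Greedily pick sentence ids that add the most not-yet-covered unit types.

    units        maps a sentence id to its set of context dependent diphone labels
    durations    maps a sentence id to its duration in seconds
    max_duration is the duration budget for the selected subset (assumed hard limit)
    """
    covered: Set[str] = set()
    selected: List[str] = []
    total = 0.0
    remaining = set(units)
    while True:
        # Assumption: only sentences that still fit in the budget are candidates.
        candidates = sorted(s for s in remaining
                            if total + durations[s] <= max_duration)
        if not candidates:
            break
        # Re-rank on every pass: count only types missing from the current subset.
        best = max(candidates, key=lambda s: len(units[s] - covered))
        if not units[best] - covered:
            break  # Assumption: stop once nothing new can be added.
        selected.append(best)
        covered |= units[best]
        total += durations[best]
        remaining.remove(best)
    return selected


# Toy data; the labels are made up and only stand in for diphone subtypes.
toy_units = {
    "s1": {"a-b+stress", "b-c", "c-d"},
    "s2": {"a-b+stress", "b-c"},
    "s3": {"d-e", "e-f"},
    "s4": {"c-d"},
}
toy_durations = {"s1": 3.0, "s2": 2.0, "s3": 2.5, "s4": 1.0}
print(greedy_select(toy_units, toy_durations, max_duration=6.0))
```

Recomputing the gain against the covered set on every pass is what distinguishes this from a single static ranking: a sentence that looked rich at the start contributes nothing once its diphone types are already in the subset.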

Modern Linguistics: Answers to the End-of-Chapter Exercises for the First Five Chapters

Chapter 1 Introduction

1. Explain the following definition of linguistics: Linguistics is the scientific study of language.

Linguistics investigates not any particular language, but languages in general. Linguistic study is scientific because it is based on the systematic investigation of authentic language data. No serious linguistic conclusion is reached until after the linguist has done the following three things: observing the way language is actually used, formulating some hypotheses, and testing these hypotheses against linguistic facts to prove their validity.

2. What are the major branches of linguistics? What does each of them study?

Phonetics: how speech sounds are produced and classified
Phonology: how sounds form systems and function to convey meaning
Morphology: how morphemes are combined to form words
Syntax: how morphemes and words are combined to form sentences
Semantics: the study of meaning (in abstraction)
Pragmatics: the study of meaning in context of use
Sociolinguistics: the study of language with reference to society
Psycholinguistics: the study of language with reference to the workings of the mind
Applied linguistics: the application of linguistic principles and theories to language teaching and learning

3. What makes modern linguistics different from traditional grammar?

Modern linguistics is descriptive; its investigations are based on authentic and mainly spoken language data.

Client World Model Synchronous Alignement for Speaker Verification

password phonetic structure is similar for the speaker and the impostors. This motivates the study of a synchronous alignment approach where the hidden process (i.e. the sequence of states) is assumed to be identical for both client and non-client. Only the output distributions differ between the two hypotheses. The synchronous alignment approach is depicted and compared to the classical one in Figure 1.
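
To make the shared-state idea concrete, here is a minimal Python sketch of decoding one state sequence for both hypotheses and scoring the utterance with a log-likelihood ratio along it. It rests on assumptions that are not taken from the text: the joint criterion used below (the sum of the client and world log-likelihoods along the shared path) is only a placeholder, the initial state distribution is taken as uniform and dropped, and the names `viterbi` and `synchronous_llr` are hypothetical.

```python
import math


def viterbi(log_trans, log_emit):
    """Return the best state path; log_emit[t][j] is the score of state j at frame t."""
    n_states = len(log_trans)
    delta = list(log_emit[0])  # uniform initial prior assumed, so it is omitted
    backptrs = []
    for frame in log_emit[1:]:
        new_delta, ptrs = [], []
        for j in range(n_states):
            # Best predecessor i for arriving in state j.
            i_best = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptrs.append(i_best)
            new_delta.append(delta[i_best] + log_trans[i_best][j] + frame[j])
        delta = new_delta
        backptrs.append(ptrs)
    state = max(range(n_states), key=lambda i: delta[i])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def synchronous_llr(log_trans, emit_client, emit_world):
    """Decode one shared path, then compare the two output distributions along it."""
    # Placeholder criterion: log P(X, S | client) + log P(X, S | world).
    joint_trans = [[2.0 * a for a in row] for row in log_trans]
    joint_emit = [[c + w for c, w in zip(fc, fw)]
                  for fc, fw in zip(emit_client, emit_world)]
    path = viterbi(joint_trans, joint_emit)
    # Transitions are shared, so only the emission terms remain in the ratio.
    llr = sum(emit_client[t][s] - emit_world[t][s] for t, s in enumerate(path))
    return path, llr


# Toy two-state example with made-up probabilities.
log_trans = [[math.log(0.5)] * 2 for _ in range(2)]
emit_client = [[math.log(p) for p in row]
               for row in ([0.9, 0.1], [0.2, 0.8], [0.1, 0.9])]
emit_world = [[math.log(p) for p in row]
              for row in ([0.6, 0.4], [0.5, 0.5], [0.3, 0.7])]
path, llr = synchronous_llr(log_trans, emit_client, emit_world)
print(path, llr)
```

In the classical approach each model would be decoded separately and the two best paths could differ; here the same path is used for both, so the transition terms cancel in the ratio and the decision depends only on the output distributions.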
The main idea of synchronous alignment is to make the two models share the same topology and differ in the output distributions. In order to compute the optimal path in the shared model, a global criterion is defined. Two possible criteria are proposed in this section. Specific decoding and training algorithms for both criteria are derived. The convergence properties of these algorithms are studied and the results are presented here.

2.1 Criteria for synchronous alignment

Chapter 2 Speech Sounds

Various obstructions created within the oral cavity lead to the production of various sounds [p] [b]; [s] [z]; [k] [g]
2.1 How Speech Sounds Are Made? The Nasal Cavity（鼻腔）
● When the vocal cords are apart, the air can pass through easily and the sound produced is said to be voiceless, e.g. [p, s, t].
● When they are close together, the airstream causes them to vibrate and produces voiced sounds, e.g. [b, z, d].
● When they are totally closed, no air can pass between them, which produces the glottal stop [ʔ] (none in English).
2.1 How Speech Sounds Are Made? The Oral Cavity（口腔）
The oral cavity provides the greatest source of modification.
Tongue: the most flexible
Uvula, the teeth and the lips
Hard palate, soft palate (velum)
Alveolar ridge: the rough, bony ridge immediately behind the upper teeth

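Fanting Middle School, Yuanping City, Shanxi Province: Senior Two English Monthly Examination, April, 2024-2025 Academic Year

This paper consists of Part I (multiple-choice questions) and Part II (non-multiple-choice questions), for a total of 150 points. The examination lasts 120 minutes.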

Part I (100 points)
Section 1: Reading Comprehension (two subsections, 60 points)
Subsection 1 (15 questions, 3 points each, 45 points in total)

A

A new app aims to help parents interpret what their baby wants based on the sound of their cry. The free app ChatterBaby, which was released last month, analyzes the acoustic (声学的) features of a baby’s cry, to help parents understand whether their child might be hungry, fussy or in pain. While critics say caregivers should not rely too much on their smartphone, others say it’s a helpful tool for new or tired parents.

Ariana Anderson, a mother of four, developed the app. She originally designed the technology to help deaf parents better understand why their baby was upset, but soon realized it could be a helpful tool for all new parents.

To build a database, Anderson and her team uploaded 2,000 audio samples of infant (婴儿) cries. She used cries recorded during ear piercings and vaccinations to distinguish pain cries. And to create a baseline for the other two categories, a group of moms had to agree on whether the cry was either hungry or fussy.

Anderson’s team continues to collect data and hopes to make the app more accurate by asking parents to get specific about what certain sounds mean.

Pediatrician Eric Ball pointed out that evaluating cries can never be an exact science. “I think that all of the apps and technology that new parents are using now can be helpful but need to be taken seriously,” Ball said, “I do worry that some parents will get stuck in big data and turn their parenting into basically a spreadsheet (电子表格) which I think will take away the love and caring that parents are supposed to be providing for the children.”

But Anderson said the aim of the app is to have parents interpret the results, not to provide a yes or no answer. The Bells, a couple using this app, say it’s a win-win. They believe they are not only helping their baby now but potentially others in the future.

1. How does the app judge what babies want?
A. By collecting data.
B. By recording all the sounds.
C. By analyzing the sound of their cries.
D. By asking parents about specific messages.
2. What was the app designed for in the beginning?
A. All new parents.
B. Deaf parents.
C. Ariana Anderson.
D. Crying babies.
3. What is Ball’s opinion about the app?
A. Parents should use the app wisely.
B. The app can create an accurate result.
C. Parents and babies are addicted to the app.
D. The app makes babies lose love and caring.
4. What is the text mainly about?
A. Parents should not rely too much on their smartphones.
B. A new app helps parents figure out why their babies are crying.
C. Parents can deal with babies’ hunger with the help of a new app.
D. A new app called ChatterBaby can prevent babies from crying.

B

Many people spend more than four hours per day on WeChat, and it is redefining the word “friend.” Does friending someone on social media make him or her your friend in real life?

Robin Dunbar, a professor at Oxford University, found that only 15 of the 150 Facebook friends the average user has could be counted as actual friends and only five as close friends. WeChat may show a similar pattern.

Those with whom you attended a course together, applied for the same part-time job, went to a party and intended to cooperate but failed take up most of your WeChat friends. In chat records, the only message may be a system notice, “You have accepted somebody’s friend request”. Sometimes when seeing some photos shared on “Moments”, you even need several minutes to think about when you became friends. Also, you may be disturbed by mass messages (群发信息) sent from your unfamiliar “friends”, including requests for voting for their children or friends, links from Pinduoduo (a Chinese e-commerce platform that allows users to buy items at lower prices if they purchase in groups) and cookie-cutter (一模一样的) blessings in holidays.

You would have thought about deleting this type of “friends” and sorting out your connections.
But actually you did not do that as you were taught that social networkingis valuable to one’s success. Besides, it would be really awkward if they found thatyou have unfriended them already. Then, you keep increasing your “friends” in social media and click “like” on some pictures that you are not really interested. Butthe fact is that deep emotional connections do not come with the increasing numberof your friends in social media.If the number of your friends reaches 150, maintaining these relationships canbe tough to you, and sometimes even will make you anxious. According to Robin Dunbar,150 is the limit of the number of people with whom one can maintain stable social relationships.5．What can we learn from Robin Dunbar's finding in Paragraph 2?A．A Facebook user has 250 friends on average.B．Most of the social media friends can be actual friends.C．Among our social media friends, only a few people matter.D．Only 15 people of a person’s Facebook friends can be close friends.6．What does the third paragraph tell us about most of your WeChat friends?A．You have deep communication with them.B．You benefit a lot from their mass messages.C．You just have a nodding acquaintance with them.D．You become friends with them in important occasions.7．What does the underlined word “that” in Paragraph 4 refer to?A．Removing unfamiliar friends in WeChat.B．Strengthening ties with your We Chat friends.C．Keeping increasing your friends in social media.D．Clicking “like” on pictures posted by your friends.8．What can we infer from the last paragraph?A．We will be anxious if we make friends online.B．We should avoid making any friends in social media.C．We should make as many friends as possible in social media.D．We have difficulty managing relationships with over 150 people.CLast week, Vodafone started a test of the UK's first full 5G service, available for use by businesses in Salford. It is part of its plan to trial the technology in seven UK cities. 
But what can we expect from the next generation of mobile technology?One thing we will see in the preparation for the test is lots of tricks with the new tech. Earlier this year, operators paid almost £ 1.4 billion for the 5G wavelengths, and to compensate for that cash, they will need to catch the eye of consumers. In September, Vodafone used its bit of the range to display the UK's first hologram (全息) call. The Manchester City captain Steph Houghton appeared as a hologram in Newbury. It isn't all holograms, however: 5G will offer faster internet access, with Ofcom (英国通讯管理局) suggesting that video that takes a minute to download on 4G will be available in just a second.The wider application is to support connected equipment on the "internet of things" -not just the internet-enabled fridge that can reorder your milk for you, but the network that will enable driverless cars and delivery drones (无人机) to communicate with each other.Prof William Webb has warned that the technology could be a case of the emperor's new clothes. Much of the speed increase, he claims, could have been achieved by putting more money in the 4G network, rather than a new technology. Other different voices have suggested that a focus on rolling out wider rural broadband access and addressing current network coverage would be more beneficial to the UK as a whole.Obviously, 5G will also bring a cost to consumers. It requires a handset for both 5G and 4G, and the first 5G-enabled smart phones are expected in the coming year. 
With the slow pace of network rollout so far, it is likely that consumers will end up upgrading to a new 5 G phone well before 5 G becomes widely available in the nextcouple of years.9．Why does Prof William Webb say "the technology could be a case of the emperor's new clothes" ?A．He is in favor of the application of the new technology.B．5G will bring a cost to consumers in their daily life.C．5G helps people communicate better with each other.D．He prefers more money to be spent on 4G networks.10．The underlined word "addressing" in the fourth paragraph has the closest meaning to________A．making a speech to B．trying to solveC．managing to decrease D．responding to11．The last paragraph indicates thatA．it'll take several years .to make 5G accessible to the public in the UK B．5G service shows huge development potential and a broad marketC．customers are eager to use 5G smart phones instead of 4G onesD．it's probable that 5G network rollout is speeding up in BritainDZebra crossings (斑马线) — the alternating dark and light stripes on the road surface — are meant to remind drivers that pedestrians may be trying to get across. Unfortunately, they are not very effective. A 1998 study done by the Department of Traffic Planning and Engineering at Sweden’s Lund University showed that three out of four drivers kept the same speed or even speeded up as they were approaching a crossing. Even worse? Only 5% stopped even when they saw someone trying to get across.Now a mother-daughter team in Ahmedabad, India has come up with a clever way to get drivers to pay more attention — a 3D zebra crossing with an optical illusion (视错觉). Artists Saumya Pandya Thakkar and Shakuntala Pandya were asked to paint the crosswalks by IL&FS, an Indian company that manages the highways in Ahmedabad. The corporation was looking for a creative solution to help the city’s residents to cross the busy accident-prone (易出事故的) roads safely. 
Thakkar and Pandya, who had previously seen images of 3D zebra crossings that gave drivers the illusion oflogs of wood on the streets in Taizhou， China, decided to test if a similar way would work in India.Sure enough, in the six months when the 3D crosswalks have been painted across four of the city’s most dangerous highways, there have been no accidents reported! The artists say that while it may appear that the zebra crossing could cause the drivers to brake suddenly and endanger the vehicles behind, such is not the case. Because of the way the human eye works, the illusion is only visible from a distance. As they get closer, the painting looks just like any other ordinary zebra crossing. The creators hope that their smart design will become increasingly common throughout India and perhaps even the world. So let’s look forward to it.12．What can we learn from the first paragraph?A．Most drivers will slow down at zebra crossings.B．Common zebra crossings don’t function well.C．Drivers have to stop when approaching zebra crossings.D．About 95% of the drivers choose to speed up when approaching zebra crossings. 13．Why do drivers seeing the 3D zebra crossings slow down according to Para. 2?A．Because the drivers consider the safety of pedestrians.B．Because the drivers mistake them for logs of wood on the streets.C．Because the drivers are afraid of being fined for breaking the traffic rules.D．Because the drivers don’t want to brake suddenly and endanger the vehicles behind.14．The last paragraph is mainly about ________.A．the theory of the 3D zebra crossingsB．the popularity of the 3D zebra crossingsC．the shortcoming of the 3D zebra crossingsD．the positive effect of the 3D zebra crossings15．What is the author’s attitude towards the 3D zebra cross ings?A．Cautious. B．Doubtful. C．Approving. D．Disapproving.其次节（共5小题；每小题3分，满分15分）依据短文内容，从短文后的选项中选出能填入空白处的最佳选项。


CHINESE JOURNAL OF ACOUSTICS
Vol.30 No.4 2011

Acoustic features based on auditory model and adaptive fractional Fourier transform for speech recognition*

YIN Hui  XIE Xiang†  KUANG Jingming
(Department of Electronic Engineering, Beijing Institute of Technology, Beijing 100081)

Received Apr. 25, 2010
Revised Jun. 4, 2010

Abstract  It is well known that the auditory system of human beings has excellent performance which automatic speech recognition (ASR) systems can't match, and fractional Fourier transform (FrFT) has unique advantages in non-stationary signal processing. In this paper, the Gammatone filterbank is applied to speech signals for front-end temporal filtering, and then acoustic features of the output subband signals are extracted based on fractional Fourier transform. Considering the critical effect of transform order for FrFT, an order adaptation method [...] [amb]iguity function. ASR experiments are conducted on clean and noisy [...] that the proposed features achieve significantly higher [...] [instan]taneous frequency has much lower complexity than that based on ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate using the proposed order adaptation method.

PACS numbers: 43.72, 43.60

1 Introduction

The human auditory system is a highly complex sensory system. Studying its structure and function is of great importance for a better understanding of human perception processes and of hearing pathologies and their treatment in particular. Besides, the achievements in auditory system study can guide the design of artificial auditory systems as well. As is well known, [...] systems might improve the recognition rate significantly.

The application of auditory models to speech recognition has already been studied extensively. New acoustic features for ASR can be derived through analysis (i.e. cepstrum analysis) [...]

Speech signal is a very complex signal. Because of intonation and coarticulation, the frequency of speech signal is always changing continuously[2]. Compared with modeling speech signals as stationary signals, modeling them as amplitude-frequency modulation signals accords better with speech characteristics as well as the perception of the human auditory system[3,4]. As a new time-frequency analysis tool, fractional Fourier transform (FrFT) is attracting more and more attention in signal processing literature. Since FrFT can be considered as a decomposition of the signal in terms of chirps, FrFT is especially suitable for the processing of chirp-like signals. Because speech signals are very complex and have many main frequency components, they won't quite resemble simple chirp signals. If we first decompose the speech signal into subband signals using the auditory filterbank, the decomposed subband signals may be more like chirp signals and the FrFT can play a greater role.

* This work was supported by the National Science and Technology Major Projects (2010ZX03004-003-01), the National Natural Science Foundation of China (90920304) and the Research Fund for the Doctoral Program of Higher Education of China (20101101110020).
† Corresponding author: XIE Xiang, xiexiang@bit.edu