FORMANT FREQUENCIES OF DUTCH VOWELS IN TRACHEOESOPHAGEAL SPEECH
SPEECH RECOGNITION AND THE FREQUENCY OF RECENTLY USED WORDS
cache component. Furthermore, the relative weigbts calculated from the training text for the two components of the combined model indicate tlmse POSs for which short-term frequencies of word use differ drastically from long-term frequencies, and those for which word frequencies Stay nearly constant over time. 2 A N a t u r a l L a n g u a g e M o d e l w i t h M a r k o v ~.nd Cache C o m p o n e n t s The "trigram " Markov language model for speech recognition developed by F. Jelinek and his colleagues uses the context provided by the two preceding words to estimate the probability that the word Wi occurring at time i is a given vocabulary item W. Assume rccursivcly that at time i we have just recognized the word sequence W 0 ." " , W i _ 2 Wi__1. The trigram model approximatss P (Wi : W ] Wo, • " • , Wi_2, W~_I)
Prediction of Vowel and Consonant Place of Articulation
• least effort: this refers to the efficiency of the articulatory-acoustic relations (2) with monotonicity (3) and orthogonality (4) of the relations. The use of an efficiency criterion corresponds to increasing the role of the dynamics; • simplicity: with respect to the commands of the deformation of the acoustic tube, the commands must be simple J(5) (straight deformations), few in number (6), and with a reduced number of degrees of constriction (7). This is based on the assumption that fewer commands make for smaller demands on memory resources (8) that may be engaged in the learning and eventual mastery of these commands in the phonological acquisition process. An algorithm to automatically and efficiently deform the area function of an acoustic tube in order to increase or decrease the frequency of a formant or combination of several formants has been proposed elsewhere Carr6 e t al. (1994; 1995). As an example, figure 2 shows the automatic evolution of the area function of the tube from a closed-open neutral configuration, for increasing and decreasing F2. Four main regions naturally emerge which are not of equal length. The /a/ vowel is an automatic consequence of a back constriction associated with a front cavity, and, t h e / i / v o w e l of a front constriction associated with a back cavity (anti-symmetrical behavior), a pharynx cavity, thus, being automatically obtained. If, however, the initial configuration is that of a tube closed at both ends (i.e. closed-closed) the/u/ vowel is automaiicaUy obtained with a central constriction (symmetrical behavior). A summary of the main conclusions that may be drawn from the above manipulations are as following: • configurations using the maximum acoustic contrast criterion correspond to those for the three vowels/a, i, u / o f the vowe[ triangle; • the deformation of the tube is minimum, because of the use of the sensitivity function (Fant and Pauli, 1974) and thus efficient; • the deformation 'commands (or gestures) are simple (recti-linear), limited in number (only one in the case of figure 2, for the making of a back constriction is automatically as~sociated with a front cavity and vice-versa, as i9 human), and applied at specific places called distinctive regions.
《英语语言学概论》精选试题学生版
《英语语言学概论》精选试题11.Which of the following statements about language is NOT true?nguage is a systemnguage is symbolicC.Animals also have l anguagenguage is arbitrary2.Which of the following features is NOT one of the design features of language?A. SymbolicB. DualityC. ProductiveD. Arbitrary3.What is the most important function of language?A. InterpersonalB. PhaticC. InformativeD. Metalingual4.Who put forward the distinction between Langue and Parole?A. SaussureB. C homskyC. HallidayD. Anonymous5.According to Chomsky, which is the ideal user's internalized knowledge of his language?A. competenceB. paroleC. performanceD. langue6.The function of the sentence "A nice day, isn't it?" is .A. informativeB. phaticC. directiveD. performative7.Articulatory phonetics mainly studies .A.the physical properties of the sounds produced in speechB.the perception of soundsC.the combination of soundsD.the production of sounds8.The distinction between vowels and consonants lies in .A.the place of articulationB.the obstruction of airstreamC.the position of the tongueD.the shape of the lips9.Which is the branch of linguistics which studies the characteristics of speech sounds and provides methods for their description, classification and transcription?A. PhoneticsB. PhonologyC. SemanticsD. Pragmatics10.Which studies the sound systems in a certain language?A. PhoneticsB. PhonologyC. SemanticsD. Pragmatics11.Minimal pairs are used to .A.find the distinctive features of a languageB.find the phonemes of a languagepare two wordsD.find the allophones of languageually, suprasegmental features include ,length and pitch.A. phonemeB. speech soundsC. syllablesD. stress13.Which is an indispensable part of a syllable?A. CodaB. OnsetC. StemD. Peak三、判断1.The analyst collects samples of the language as it is used, not according to some views of how it should be used. This is called the prescriptive approach. F2.B road transcription is normally used by the phoneticians in their study of speech sounds. F台州学院考试题1.Articulatory Phonetics studies the physical properties of speech sounds.2.English is a typical intonation language.3.Phones in complementary distribution should be assigned to the same phoneme.4.Linguistic c is a native speaker’s linguistic knowledge of his language.1.The relationship between the sound and the meaning of a word is a .2.P refers to the realization of langue in actual u se.3.Linguistics is generally defined as t he s study of language.1.Which of the following branch of linguistics takes the inner structure of word as its main object of study?A. Phonetics.B. Semantics.C. M orphology.D. Sociolinguistics.3. Which of the following is a voiceless bilabial stop?A. [w].B. [m].C. [b].D. [p].6. What phonetic feature distinguishes the [p] in please and the [p] in speak?A.VoicingB. AspirationC. RoundnessD. Nasality11.Conventionally a is put in slashes.A. a llophoneB. phoneC. phonemeD. morphemenguage is tool of communication. The symbol “highway closed ”serves .A. an expressive functionB. an informative functionC. a performative functionD. a persuasive function14.Which of the following groups of words is a minimal pair?A. but/pubB. wet/whichC. cool/curlD. fail/find16.What are the dual structures of language?A. Sounds and letters.B. Sounds and m eaning.C. Letters and meaning.D. Sounds and symbols.19.Which of the following is one of the core branches of linguistics?A.Phonology.B.Psycho-linguistics.C.Sociolinguistics.D.Anthropology.IV. Translate the following linguistic terms: (10 points, 1 point each)A. From English to ChineseB. From Chinese to English1.acoustic phonetics6. 應用語言學2. closed class words4. distinctive featuresVI.Answer the following questions briefly. (20 points)1.Define phoneme. (4 points)2.Explain complementary distribution with an example.(5 points)3.What are the four criteria for classifying English vowels. (4 points)问答答案1. A contrastive phonological segment whose phonetic realizations are predictable by rules. (4 points)(or: A phoneme is a phonological unit; it is a unit that is of distinctive value.)2.The situation in which phones never occur in the same phonetic environment.(4 points)e.g. [p] and [p h] never occur in the same position. (1 point)3.the position of the tongue in the mouth(1 point), the openness of the mouth(1 point), the shape of the lips(1 point), and the length of the vowels. (1 point)Chapter 1 Introductions to LinguisticsI.Choose the best answer. (20%)nguage is a system of arbitrary vocal symbols used for humanA. contactB. communicationC. relationD. Community2.Which of the following words is entirely arbitrary?A. treeB. typewriterC. crashD. Bang3.The function of the sentence ―Water boils at 100 degrees Centigrade.‖ is.A. interrogativeB. directiveC. informativeD. Performative4.In Chinese when someone breaks a bowl or a plate the host or the people present are likely to say―碎碎(岁岁)平安‖as a means of controlling the forces which they believes feelmight affect their lives. Which functions does it perform?A. InterpersonalB. EmotiveC. PerformativeD. Recreational5.Which of the following property of language enables language users to overcome the barriers caused by time and place , due to this feature of language, speakers of a language arefree to talk about anything in any situation?A. TransferabilityB. DualityC. DisplacementD. Arbitrariness6.Study the following dialogue. What function does it play according to the functions of language?—Anice day, isn’t it?—Right! I really enjoy the sunlight.A. EmotiveB. PhaticC. PerformativeD. Interpersonal7.________ refers to the actual realization of the ideal language user’s knowledge of the rules of his language in utterances.A. PerformanceB. CompetenceC. LangueD. Parole8.When a dog is barking, you assume it is barking for something or at someone thatexists hear and now. It couldn’t be sorrowful for some lost love or lost bone. This indicat design feature of .A.cultural transmissionB.productivityC.displacementD. Duality9.answers such questions as how we as infants acquire our first language.A.PsycholinguisticsB.A nthropological linguisticsC.SociolinguisticsD.Applied linguistics10.deals with language application to other fields, particularly education.A.Linguistic theoryB.Practical linguisticsC.Applied linguisticsparative linguisticsII.Decide whether the following statements are true or false. (10%)11. Language is a means of verbal communication. Therefore, the communication way used by the deaf-mute is not language. F13.Speaking is the quickest and most efficient way of the human communication systems.nguage is written because writing is the primary medium for all languages. F15.We were all born with the ability to acquire language, which means the details language system can be genetically transmitted. F16.Only human beings are able to communicate. F17. F. de Saussure, who made the distinction between langue and parole in the early 20th century, was a French linguist. F18. A study of the features of the English used in Shake e s a p re’s time is an example of the diachronic 历时study of language. F19.Speech and writing came into being at much the same time in human history.F20. III.All the languages in the world today have both spoken and written forms.Fill in the blanks. (10%)Fnguage, broadly speaking, is a means of verbal_ communication.22.In any language words can be used in new ways to mean new things and can becombined into innumerable sentences based on limited rules. This feature is usually termed creativity_ .nguage has many functions. We can use language to talk about itself. This funct is .24.Theory that primitive man made involuntary vocal noises while performing heavywork has been c alled the yo-he-ho ________ theory.25.Linguistics is the systematic study of language.26.Modern linguistics is __ ________ in the sense that the linguist tries to discover what language is rather than lay down some rules for people to observe.27.One general principle of linguistic analysis is the primacy of over writing.28.The description of a language as it changes through time is a study.29.Saussure put forward two important concepts. refers to the abstract linguistic system shared by all members of a speech community.30.Linguistic potential is similar to Saussure’s langue and Chomsky’s.I V.Explain the following terms, using examples. (20%)31.Design feature32.Displacementpetence34.Synchronic linguisticsV.Answer the following questions. (20%)35.Why do people take duality as one of the important design features of human language?Can you tell us what language will be if it has no such design feature? (南开大学,2004 )35.Duality makes our language productive. A large number of different units can be formed out o a small number of elements – for instance, tens of thousands of words out of a small set of sounds, around 48 in the case of the English language. And out of the huge number of words, there can be astronomical number of possible sentences and phrases, which in turn can combineto form unlimited number of texts. Most animal communication systems do not have this design feature of human language.If language has no such design feature, then it will be like animal communicational systemwhich will be highly limited. It cannot produce a very large number of sound combinations, e.g. words, which are distinct in meaning.Chapter 2 Speech SoundsI.Choose the best answer. (20%)1.Pitch variation is k nown as when its patterns are imposed on s entences.A. intonationB. toneC. pronunciationD. voice2.Conventionally a is put in slashes (/ /).A. allophoneB. phoneC. phonemeD. morpheme3.An aspirated p, an unaspirated p and an unreleased p are of the p phoneme.A. analoguesB. tagmemesC. morphemesD. allophones4.The opening between the vocal cords is sometimes referred to as .A. g lottisB. vocal cavityC. pharynxD. uvula6.A phoneme is a group of similar sounds called .A. minimal pairsB. allomorphsC. phonesD. allophones7.Which branch of phonetics concerns the production of speech sounds?A.Acoustic phoneticsB.Articulatory phoneticsC.Auditory phoneticsD.None of the above8.Which one is different from the others according to places of articulation?A. [n]B. [m]C. [ b ]D. [p]9.Which vowel is different from the others according to the characteristics of vowels?A. [i:]B. [ u ]C. [e]D. [ i ]10.What kind of sounds can we make when the vocal cords are vibrating?A. VoicelessB. V oicedC. G lottal s topD. ConsonantII.Decide whether the following statements are true or false. (10%)11.Suprasegmental phonology refers to the study of phonological properties of units larger thanthe segment-phoneme, such as syllable, word and sentence.12.The air stream provided by the lungs has to undergo a number of modification to acquire thequality of a speech sound.14.[p] is a voiced bilabial stop.15.Acoustic phonetics is concerned with the perception of speech sounds.16.All syllables must have a nucleus but not all syllables contain an onset and a coda.17.W hen pure vowels or monophthongs are pronounced, no vowel glides take place.18.According to the length or tenseness of the pronunciation, vowels can be divided into vs. lax or long vs. short.III.Fill in the blanks. (20%)21. Consonant sounds can be e ither ______ __ or _______ _, while all vowel sounds are .23. The qualities of vowels depend upon the position of the and the lips.25.Consonants differ from vowels in that the latter are produced without .26.In phonological analysis the words fail / veil are distinguishable simply because of the two phonemes /f/ - /v/. This is an example for illustrating .27.In English there are a number of _________ , which are produced by moving f rom one vowel position to another through intervening positions.28.refers to the phenomenon of sounds continually show the influence of their neighbors.29.is the smallest linguistic unit.IV.Explain the following terms, using examples. (20%)31.Sound assimilation32.Suprasegmental featureplementary distribution34.Distinctive featuresV.Answer the following questions. (20%)35.What is a coustic phonetics? (中国人民大学,2003 )36.What are the differences between voiced sounds and voiceless sounds in terms of articulation? (南开大学,2004 )VI.Analyze the f ollowing situation. (20%)37.Write the symbol that corresponds to each of the following phonetic descriptions; then give an English word that contains this sound. Example: voiced alveolar stop [d] dog. (青岛海洋大学,1999 )(1)voiceless bilabial unaspirated stop(2)low front vowel(3)lateral liquid(4)velar nasal(5)voiced interdental fricative32.Suprasegmental feature: The phonetic features that occur above the level of the segments are called suprasegmental features; these are the phonological properties of such units as the syllable, the word, and the sentence. The main suprasegmental ones includes stress, intonation, and tone.plementary distribution: The different allophones of the same phoneme never occur in the same phonetic context. When two or more allophones of one phoneme never occur in the same linguistic environment they are said to be in complementary distribution.34.Distinctive featureIst: refers to the features that can distinguish one phoneme from another. If we can group the phonemes into two categories: one with this feature and the other without, this feature is called a d istinctive feature.V. 35.Acoustic phonetics deals with the transmission of speech sounds through the air. When a speechsound is produced it causes minor air disturbances (sound waves). Various instruments are usedto measure the characteristics of these sound waves.36. When the vocal cords are spread apart, the air from the lungs passes between them unimpeded. Sounds produced in this way are described as voiceless; consonants [p, s, t] are produced in this way. But when the vocal cords are drawn together, the air from the lungs repeatedly pushes them apart as it passes through, creating a vibration effect. Sounds pr in this way are described as voiced. [b, z, d] are voiced consonants.11。
嗓音声学检测影响因素分析
嗓音声学检测影响因素分析中南大学湘雅医院康复医学科,中南大学湘雅医院康复医学科,通讯作者:李哲,中南大学湘雅医院康复医学科嗓音是语言的基础,在沟通、个性表达以及社交中具有重要意义。
大脑语言中枢发出指令控制呼吸肌收缩产生气流上行至声门形成声门波,产生基音[1],基音向上传导经过共鸣腔体和构音器官时某些声音被放大,形成不同的嗓音[2]。
在嗓音形成通路上神经支配、解剖结构或功能损害会导致呼吸、发声、共鸣或构音等功能异常,进而出现嗓音障碍。
嗓音障碍是临床中常见的主诉,也是一些隐匿性疾病如帕金森、肌萎缩侧索硬化以及喉癌等的早期表现。
因此,嗓音障碍的精准识别对于疾病的早期诊断与治疗具有重要意义。
目前,嗓音的评估主要包括患者主观评分、感知嗓音评估、动态喉镜检查、空气动力学和声学评估五个方面。
其中嗓音声学检测由于无创、易获取、可重复并能提供喉功能相关的定量数据,是嗓音客观评估的首选方法之一。
通过测量基频、共振峰、基频微扰(Jitter)、振幅微扰(Shimmer)以及噪协比(NHR)等多个指标反映发声过程声带振动的稳定性和规律性[3]、共鸣以及噪声能量等。
因此声学检测在早期识别异常嗓音、疾病诊断、评估嗓音功能、动态观察病情以及评价治疗方案中均有优越表现。
但由于嗓音是一个复杂多维的过程,会受到声门下气流、声带振动以及共鸣等因素的影响,目前尚未出现统一的嗓音声学检测方法也缺乏鉴别嗓音障碍的声学区间值。
本文将对影响嗓音声学的因素进行梳理,分析可能影响声学检测结果的因素,为提高声学检测的准确性及后续临床中嗓音的精准评估提供参考。
目前国内外开展的嗓音声学影响因素分析主要归纳为个体差异和检测方式两个部分。
1.个体差异1.1人种、语言嗓音因人种、语言、文化的不同而有一定区别[4]。
嗓音声学研究选择的元音在阿拉伯语中[5]是/i:/、/e:/、/a:/、/o:/和/u:/,在瑞典[6]语中则是/e:/、/u:/、/i:/和/ɔ/,而汉语主要集中在/a/、/i/和/u/,在汉语不同方言与民族语言中相同元音也表现出显著差异[7]。
Automatic technique in frequency domain for near-lossless time-scale modification of audio
Automatic technique in frequency domainfor near-lossless time-scale modification of audioJordi BonadaAudiovisual Institute, Pompeu Fabra UniversityRambla 31, 08002 Barcelona, Spainjordi.bonada@iua.upf.eshttp://www.iua.upf.es[Published in the Proceedings of the ICMC 2000]AbstractTime-scale modification of sounds has been a topic of interest in computer music since its very beginning. Many different techniques both in time and frequency domain have been proposed to solve the problem. Some frequency domain techniques yield high-quality results and can work with large modification factors. However, they present some artifacts, like phasiness, loss of attack sharpness and loss of stereo image. In this paper we propose a new frequency domain approach for an automatic near-lossless time stretching of audio.1. IntroductionTime-scaling an audio signal means changing the length of the sound without affecting other perceptual features, such as pitch or timbre. The system presented in this paper strives for obtaining a near-lossless time-scaled audio modification without any other perceptual change, just like if the music was performed faster or slower.The system has been implemented taking into consideration two important issues: easiness of use and speed of computation. The system should be very easy to control and to make use of, and should have no complex parameter selection at all. Concerning the processing speed, it should be able to apply the effect in real-time, without any DSP or hardware add-on.2. Frame based frequency domaintechniqueThe technique used in this paper is a frame based frequency domain technique. It's based on the well-known phase-vocoder [GL84, D86, ML95]2.1 General diagramIn figure 1 we can see the general diagram, where the input is the audio signal and the output is the time-scaled version of the input. First or all the input sound is windowed and goes through the FFT module to get the analysis frame (AF n), with the spectrum amplitude and phase envelopes. Then the time-scale module generates the synthesis frame (SF m) that goes to the inverse FFT (IFFT). And finally the Windowing&Overlap-Add block divides the sound segment by the analysis window and multiplies it by the overlap-add window, to get the output sound.2.2 Constant frame rateIt is important to remark that the frame rate used is the same both in analysis and synthesis modules, as opposed to the most broadly used change of frame rate in synthesis to achieve the time-scale. In the next figure we can see what happens in our system in the cases of a time-scale factor that makes the audio length larger than the original duration (TS > 1), and the opposite (TS < 1).As it is shown in figure 2, sometimes one analysis frame is used twice (or more), and sometimes it is never used! This will not add any artifacts if the frames are small enough and the sound characteristics don't change veryFigure 1 General diagram. It is clear that both analysis and synthesis processes must use the same window size and type.Figure 2 Analysis and synthesis frames. The horitzontalaxis corresponds to the time of the center of the frame inthe input audio signal. Therefore, when TS>1, the timeincrements in the input audio signal will be shorter insynthesis than in analysis frames, but the actual framerate will be exactly the same. Each synthesis framepoints to the nearest analysis frame looking at the right.AnalysisframesAnalysisframesfast. In the case of a percussive attack, for example, a repetition or an omission could be noticeable by the listener even if the frames are really small (about 10 ms). Therefore some wise knowledge of the sound is needed to decide where to repeat or omit frames, as it will be later exposed.3. The Time-Scale module3.1 DescriptionThe inputs to the time-scale module are the analysis frames (AF n ) containing the spectrum amplitude and phase envelopes. A peak detection algorithm followed by a peak continuation block is applied to the current and previous (Z -1) amplitude envelopes. The peaks that are going to be continued are used as inputs to the spectrum phase generation module. The spectral amplitude envelope is unchanged by the time-scale module. Only the phase is changed.3.2 Peak detection and continuationThe peak detection algorithm used is very simple. It just locates relative maxima (around a band, A peak (i)>A peak (m) for |m -i|?B/2, where B is the required bandwith of the peak) of the amplitude envelope and applies a parabolic interpolation. The continuation algorithm just chooses the closest peak in frequency, and resolves peak conflicts.3.3 Spectrum Phase generationThe phase of the peak is calculated supposing that frequency varies linearly between two consecutivesframes and that exists some phase deviation (fig. 4). Since the frame rate is the same for analysis and synthesis, then the phase variation between two consecutives frames can be supposed to be also the same. As pointed in [LD97], the peaks subdivide the spectrum into regions around each peak where the phase is related to the peak's phase. The phase around each peak is obtained applying the delta phase function of the original spectrum phase envelope.4. Fast changesWe cannot time-scale fast changes (specially attacks) in order to keep the perceptual features of audio. For example, the percussive attacks should be kept as they are. The solution proposed is to detect the fast changes and keep their original length. This means that a greater amount of time-scale should be applied around a fast changing region, so to keep the average time-scale modification factor.4.1 Detection of fast changesThe detection of fast changes (usually attacks ) is done processing several observations: bank filter energies, Mel cepstrum coefficients, and its deltas. The bank of filters is composed of 42 bands between 40 and 20353.6 Hz, following a Mel scale. The window used is a Hamming of about 23.2ms. Several simple rules are applied combining these inputs, but the main idea is to find points of maximum increasing slope.4.2 Fast changing regionThe audio around the fast changing time should not be time-scaled. Besides, it's important to keep the original phase as much as possible in a fast changing region, to preserve the perceptual features of the input sound.AmplitudeAmplitudeFigure 5 Original delta phase function around each peak.Figure 6 Fast changes detection.Figure 4 Peak continuation and phase generation. The peak of frequency f 2 in AF n-1 is not continued in the next frame AF n because the difference ?f = f '2 – f 2 is greater than the defined maximum deviation frequency?f max , and f '3 is closer to f 3 than to f 2n-1n 1n=Analysis frame index m=Synthesis frame index11111,,22n n AF f AF f f f t ?????????????11111,,22m mSF f SFf f f t ?????????????f 2 3f f 0 SF mWe can use the original spectrum phase envelope above a specified frequency cut (FC, 2500Hz has proven to be good selection) along the fast changing region, with no perceived artifacts and preserving the perceptual behaviour. For frequencies lower than the frequency cut, we need to continue the phase (section 3.3) of the quasi-stable sinusoidal peaks. But we can use the original phase in the non stable peaks (this way we just assure the phase envelope in the synthesis frame being more similar to the original one, and therefore guarantee a more similar timbre).There are different types of fast changes, where each type defines a different frequency cut (FC):??hi-freq: only fast changes in the high frequencies (cymbal, crash…). FC high??bass: fast changes in low frequencies (kick drum, tom…) FC low??kick: big changes at all frequencies. FC zero (the full original phase spectrum is used)5. Parallel windowing5.1 Frequency vs. time resolutionIt is good to have long windows in order to achieve a highfrequency resolution, but also to have short windows so to achieve a high time resolution. If an important low frequency sound is present in the audio signal, then a long window is really needed, because the low frequency peaks will be very close one to the other, and with a short window the peaks will not be detected properly. Otherwise, if we use this long window, then the time-scale process will add some reverberance and smoothness to the sound. This leads us to think about parallel windowing. 5.2 Multiple windows in parallelWe propose to use several channels (band pass). Each channel is the result of an FFT with a specific window size, window type and zero padding. For low frequencies the window should be longer than for high frequencies. The peak detection is applied to each of the channels. The peak continuation takes care of the desired channel frequency cuts, so it can connect peaks of different channels. The Time-scale module fills the spectrum of allthe channels (amplitude and phase envelopes)and applies a set of parallel filters H n(f) that must add up to a constant (all pass filter). Time varying channel frequency cuts are required because if a peak is very close to a frequency cut, then breaking it into different channels could produce some artifacts. This way we can guarantee the amplitude and phase envelopes around the peak to be the desired ones.A useful example with three bands could be the next one, with a frame rate of 86 frames/secBand 1Band 2Band 3 Frequency cut (Hz) 700 240022050 Window size (in frames) 8 4 3 Window type KB2.0 KB3.0KB3.0 6. Stereo time-scaleIf we want to keep the stereo image, then it is necessary to preserve the phase and amplitude relation between left and right channels. Since we don't change the spectrum amplitude envelope of the analysis frame in the system, then the amplitude relation between channels will be already preserved if we just use always the same frameAF nFAST CHANGING REGIONClosest synthesis frameto AF nFigure 7 Fast changing region. Notice that no frame is omitted or repeated along the fast changing region. Therefore, the original timing in the fast changing regionis preserved.Figure 8 Phase envelope in the fast change region.Figure 9 Multiple parallel windowing. There are K different channels, each one with a specific window size, window type andzero padding factor.Figure 10 Variable phase frequency cut. The used frequency cut is calculated as the middle point between the two closest peaks to the frequency cut that come out of the peak continuation.Synthesis Frame n-1cutfrequencycutcutfrequencycutSynthesis Frame ntimes for left and right channels. This means that fast changes synchronization is needed to ensure that the left and right channel frames have the same time tag. Therefore, only the phase relation needs to be carefully treated. In the next figure we can see the diagram showing the stereo time-scale system. Notice that the number of FFT and IFFT operations is multiplied by two, and as a consequence the same happens to the processing time.6.1 Channel phase relationAs it was said before, the time-scale module should preserve the phase relation between channels. The phase around the peak is obtained with the original delta phase function (section 3.3). Therefore, by straight-forward algebra it is very clear that if we keep the phase difference of the peaks from different channels, than all the spectrum bins around the peaks will also keep the phase relation, and the stereo image will be preserved.7. Time varying time-scaleThe amount of time-scaling can be a time varying function without affecting the quality of the processed audio. The only significant change in the system is that the time increments of the synthesis frames into the input signal is not constant. The rest of the system remains exactly the same. This opens a lot of new and interesting perspectives:??The system could be easily adapted and used for alignment and synchronization of two soundsources.??The amount of time-scale can be used in a wise way to inspire emotions. For example, toincrease the climax or the suspense of a musicalpiece, by slowing or increasing the tempo duringcertain fragments.??An interesting application could be to control the time-scale factor the same way as the orchestraconductor does and play in real-time apreviously recorded background with a liveperformance.8. Results and conclusionsThe system has been fully implemented. Several audio sources have been tested in order to prove the robustness and quality of the time-scale modification. The tests consisted in hearing and comparing carefully the input and the output audio signals. Different styles of music were used as input into the system. The processed audio yields a high quality (near l o ss-less) in a broad range of time-scaling factors (70% to 150%). The detection and treatment of fast changes (attack) works quite well in most types of audio inputs (musical or not). The attack sharpness is really preserved in the processed signal, as well as the stereo image. Besides, it's possible to apply a time varying time-scale factor with no loss of quality. Weaknesses of the system:??The detection and characterization of fast changes needs to be improved.??Very low pitch harmonic sounds require preserving the phase alignment, specially if thewaveform is quite impulsive.??It's slow. About 50% real-time on a AMD Athlon CPU running at 700Mhz (44Khz, 16 bit)9. Future workThere are still many things to be done. First of all, the current system needs a better detection and characterization of fast changes. It's also important to optimise the implementation and the system to get a real-time time-scaling of stereo sound files (44Khz, 16 bit) on the last PC's in the market without any DSP or hardware add-on. Another area of interest is the object filtering: detect harmonic audio objects, preserve the phase alignment of harmonic objects (this is important for the perceived timbre, specially for low frequency sounds), preserve original vibratos and tremolos. It would be also interesting to transform the system into a pitch shifter, using most of the current implemented processes. Finally, the system should be adapted to the voice, so to be able to keep the original length of plosives and short consonants, use a wise dynamic time-scale factor, etc. References[GL84] D.W. Griffin, J.S.Lim. "Signal estimation from modified short-time fourier transform". IEEE Trans.Acoust., Speech, Signal Processing, ASSP-32(2):236-243, Apr 1984.[D86] M. Dolson. "The phase vocoder: A tutorial".Computer Music Journal, 10(4):14-27, 1986.[ML95] E. Moulines, J. Laroche. "Non parametric techniques for pitch-scale and time-scale modification of speech". Speech Communication, 16:175-205, Feb 1995.[LD97] J. Laroche, M. Dolson. "About this phasiness business". Proceedings of International Computer Music Conference, 1997.Figure 11 Stereo time-scale。
RHYTHM, PROSODY, TONE, LANGUAGE
PROSODY IS OF GREAT INTEREST IN AUTOMATIC SPEECH RECOGNITION
DECLARATIVE, INTEROGATIVE, IMPERATIVE
DECALARATIVE: “You are going home” INTEROGATIVE: “You are going home?” (voice is raised at end of sentence) IMPERATIVE: “You ARE going home!” (are is emphasized)
PROSODY
IN LINGUISTICS, PROSODY IS THE RHYTHM, STRESS, AND INTONATION OF SPEECH. PROSODY MAY REFLECT VARIOUS FEATURES OF THE SPEAKER OR THE UTTERANCE, THE EMOTIONAL STATE OF A SPEAKER, WHETHER THE UTTERANCE IS A STEMENT, A QUESTION, OR A COMMAND; WHETHER THE SPEAKER IS BEING IRONIC OR SARCASTIC; EMPHASIS, CONTRAST AND FOCUS. IN TERMS OF ACOUSTICS, THE PROSODICS OF ORAL LANGUAGES INVOLVE VARIATION IN SYLLABLE LENGTH, LOUDNESS, PITCH, AND THE FORMANT FREQUENCIES OF SPEECH SOUNDS.
VOICE QUALITY IS A BROAD TERM THAT REFERS TO THE EXTRALINGUISTIC ASPECTS OF A SPEAKER’S VOICE WITH REGARD TO IDENTITY, PERSONALITY, HEALTH, AND EMOTIONAL STATE. VOCAL FOLD MASS, VOCAL TRACT LENGTH, TRACHEAL LENGTH, JAW AND TONGUE SIZE, AND NASAL CAVITY VOLUME MAY INDICATE INFORMATION ABOUT AGE, SEX, PHYSIQUE, AND HEALTH.
声门旁间隙名词解释
声门旁间隙名词解释1. 引言在语音学中,声门旁间隙是指位于声门周围的一个空隙,它在语音生成过程中起到重要的作用。
本文将对声门旁间隙进行详细的解释和分析。
2. 声门旁间隙的定义声门旁间隙是指位于声门前后的一段空间,它由声门旁的软组织和声门闭合时形成的声门孔组成。
它是声门周围的一个截面,将呼出气流和吸气气流分隔开来。
3. 声门旁间隙的结构声门旁间隙主要由以下几个部分组成: - 声门旁软骨:声门旁软骨是声门旁间隙的主要组成部分,它位于声门的两侧。
声门旁软骨的形状和大小因个体差异而异,它可以保护声门和声门旁间隙。
- 声门旁肌肉:声门旁肌肉包围着声门旁间隙,起到控制声门旁间隙开合的作用。
声门旁肌肉主要由横抬肌和斜抬肌组成,它们协同工作,使声门旁间隙在语音生成过程中得到合适的调节。
4. 声门旁间隙的功能声门旁间隙在语音生成过程中发挥着重要的功能: - 呼出气流的通道:声门旁间隙是呼出气流通向口腔腔和鼻腔的主要通道之一。
当声门打开时,呼出气流从声门旁间隙通过,经过口腔和鼻腔形成不同的语音音位。
- 声门闭合的保护作用:声门旁间隙的存在可以保护声门不受外界刺激和损伤。
当声门闭合时,声门旁间隙起到了物理屏障的作用,阻止外界物质进入声门区域。
5. 声门旁间隙与语音生成的关系声门旁间隙是语音生成的重要环节之一,它与声带振动、共鸣和调节等因素密切相关。
声门旁间隙的大小、形状和位置会对发音产生影响,不同的声门旁间隙状态会产生不同的音质和音高。
6. 声门旁间隙与语音障碍的关系声门旁间隙的异常状态常常会导致语音障碍的产生。
一些常见的声门旁间隙异常包括声门狭窄、声门旁间隙过大或过小等。
这些异常状态会影响声门的正常振动和调节,导致语音产生困难。
7. 总结声门旁间隙是语音生成中不可忽视的一个因素。
它在语音产生过程中起到了重要的作用,包括呼出气流的通道和声门闭合的保护作用。
声门旁间隙的结构、功能与语音生成密切相关,它影响着语音的音质和音高。
acoustic phonetics
Pitch and Frequency
Frequency is the number of the complete repetitions (cycles) of waveforms ( or the number of vibrations) in a second. The unit of frequency is measured in Hz. If the vocal cords make 200 complete opening and closing movements in a second, the frequency is 200Hz.
• F0 ( read as F naught or F zero) is of particular importance in studies of intonation. • F0 determines tone. Tone is connected with pitch variations. We perceive tones.
Amplitude/ Intensity/Loudness
• Amplitude determines loudness. • The loudness of a sound depends on the size of the variations in air pressure ( or amplitude).
Vocal folds vibration -> harmonics, H1 (F0), H2, H3 … -> Oral, nasal, and pharyngeal cavities -> to form resonators-> Different shapes/sizes of the resonators -> resonate and emphasize certain harmonics -> formants (F1, F2, F3) -> Formant patterns-> quality of sounds (vowels/sonorants)
各元音平均共振峰频率 -回复
各元音平均共振峰频率-回复各元音平均共振峰频率(Average Formant Frequencies of Vowels)引言:人类语音是由声带振动产生的声音经过呼吸道共鸣的结果。
其中,元音是具有清晰的共振峰(Formant)的音素,其频率范围广泛且不同于辅音。
本文将介绍各个元音的平均共振峰频率,并探讨与其口腔形状的关系。
一、什么是共振峰(Formant)?共振峰指的是声音在人体呼吸道内遇到固定长度的共振管时产生的强调频率峰值。
这些共振峰是由于口腔、喉部和鼻腔等腔体的形状和尺寸而产生的。
在人类语音中,共振峰是决定元音音质差异的主要因素。
二、元音共振峰频率与口腔形状的关系不同的元音发音所需要的口腔形状不同,因此,它们的共振峰频率也不同。
以下是一些常见元音的平均共振峰频率:1. /i/ 音(尖尖的)/i/ 音是一种“尖尖的”元音,其发音需要将舌头抬高,并且将嘴唇拉得比较紧。
这种发音会导致共振峰频率较高。
根据研究,/i/ 元音的第一共振峰频率约为276 Hz,第二共振峰频率约为2396 Hz。
2. /ɑ/ 音(开放的)/ɑ/ 音是一种“开放的”元音,其发音需要将口腔打开并且舌头放松。
这种发音会导致共振峰频率较低。
根据研究,/ɑ/ 元音的第一共振峰频率约为730 Hz,第二共振峰频率约为1090 Hz。
3. /u/ 音(圆唇)/u/ 音是一种“圆唇”元音,其发音需要将嘴唇拉起并向前突出。
这种发音会导致共振峰频率较低。
根据研究,/u/ 元音的第一共振峰频率约为362 Hz,第二共振峰频率约为2297 Hz。
4. /e/ 音(舌头在前的)/e/ 音是一种“舌头在前的”元音,其发音需要将舌头抬高并前移。
这种发音会导致共振峰频率较高。
根据研究,/e/ 元音的第一共振峰频率约为549 Hz,第二共振峰频率约为1914 Hz。
综上所述,不同元音的共振峰频率受到各种因素的影响,如口腔形状、舌位、嘴唇等。
对这些共振峰频率的研究有助于理解人类语音产生与知觉的机制。
英语音标讲义及练习题:单元音知识讲稿
Unit Pronunciation Techniques
03
Unit Sound Practice Questions
Monophthongs and diphthongs
01
Delineate the difference between a monophthong and a diphthong, and provide examples of each.
Dialectical variations
Explain the differences in pronunciation between different dialects of English, and provide examples of words that have different pronunciations in different dialects.
1. Imagine different scenarios in which you would use a particular monophonic sound, such as having a conversation with a friend or ordering food at a restaurant.
4. Repeat the simulation process for other monophonic sounds until you feel confident in your ability to use them appropriately in different scenarios.
2. Act out the scenarios using the correct pronunciation of the sound, focusing on the intonation and stress patterns that are unique to each scenario.
FORMANT FREQUENCY ESTIMATION IN NOISE
FORMANT FREQUENCY ESTIMATION IN NOISEBin Chen and Philipos C.Loizou*University of Texas at Dallas,Dept.of Electrical EngineeringRichardson,TX75083*loizou@ABSTRACTThis paper addresses the problem of formant frequency es-timation of speech signals corrupted by colored noise.The spectrum is sequentially segmented inot K segments so that each segment contains a single formant.A segmentation metric based on Wienerfilter theory is proposed for deter-mining the segment boundaries.A peak-picking algorithm is used for estimating the formant frequencies in each seg-ment.Results obtained using vowels embedded in+5dB S/N speech-shaped noise,indicated that the proposed algo-rithm produced formant frequencies which were compara-ble to those estimated in quiet.1.INTRODUCTIONApart from a variety of formant tracking approaches[1][2], considerable attention has been paid to methods based on linear prediction analysis(LPC)[3][4].However,captur-ing and tracking formants accurately from noisy speech is not easy,largely because the accuracy of root-finding algo-rithms based on LPC is sensitive to the noise level.In[5][6],a set of parallel digital formant resonators has been proposed for speech synthesis or formant frequency estimation.In this paper,we propose the use of a sequen-tial digital resonator model for spectrum segmentation.The spectrum segmentation is implemented sequentially from low to high frequencies.For each spectral segment,a dig-ital resonator isfirst determined to represent the spectral segment.A metric based on Wienerfilter theory is pro-posed to determine the segment boundaries.After identify-ing the spectral segments containing the formants,we apply a peak-picking algorithm on each spectral segment tofind the formant frequency.This approach was taken since the LPC-based digital resonators are sensitive to the noise level.A major advantage of the proposed method is that it deter-mines the segment boundaries sequentially and avoids the need for dynamic programming as done in[5]and[7]. This paper is organized as follows.Section2describes the formant estimation model,Section3presents the pro-posed formant frequency estimation algorithm,Section4presents the experimental results,and Section5gives theconclusions.2.FORMANT ESTIMATION MODELIn this section,a model is described for formant estima-tion that is implemented using a set of digital resonators.Each resonator represents a formant in a segment in the fre-quency domain.The spectrum is divided into segments suchthat only one formant resides in each segment.For the con-venience of representing the digital resonator,the segmentboundaries are assumed to befixed.In the next section,weshow how to determine the segment boundaries sequentiallyusing a Wiener-based metric.Each formant in a spectral segment k is represented by asecond-order predictionfilter.The second-order predictionfilter for the formant in the spectral segment k is given bythe all-pole model1/A k(z)=1/(1+a k z−1+βk z−2).The formants can be considered as being generated by a second-order system driven by white noise.A k(z)is a whiteningfilter that whitens the formant spectrum,i.e.,itflattens thespectrum in segment k.If A k(z)is used as a notchfilter,itwill notch the corresponding formant out of the spectrum.In our application,we adopt the notchfilter definition in[8]H k(z)=γ(1+αz−1+βz−2)(1) whereα=e−2πB,β=−2e−πB cosωandγ= 1/(1+α+β)are specified by the notch frequencyωand the bandwidth B.Note that A k(z)is similar to H k(z)ex-cept for the scalarγ.Thus,we canfind the notchfilter by determining the segmental system transfer function H k(z). According to[5],the optimum prediction coefficients of the notchfilter are given by:αoptk=r k(0)r k(1)−r k(1)r k(2)r k(0)−r k(1)(2a)βoptk=r k(0)r k(2)−r k(1)2k k(2b) where r k(m)are the autocorrelation coefficients obtained for segment kr k(m)=r(wk−1,w k)(m)=1Zωkωk−1|S(e jω)|2cos(mω)dω(3)Substitutingαoptk andβoptkobtained above to Eq.(1)givesus the desired notchfilter H k(z)of the k th band in the spec-trum.The scalarγis independent of minimization of theprediction error and is determined after theαoptk andβoptkare found.As in[5],we use a discrete approximation of the in-tegral in Eq.(3).The frequency range[0π]is divided into I equally spaced intervals∆ω(=π/I)with grid πi/I,i=0,1,...,I.Therefore,the segment boundaries ω0=0,...,ωk,...,ωK=πare replaced by the indices i0=0,...,i k,...,i K=I,and r k(m)is given byr k(m)=1i k X i=i k−1|S(i)|2cosµ2πmi¶(4)with S(i)=S(ω)|ω=e j(2πi/2I).The above autocorrelation sequence is determined for a specific spectral segment[i k−1 i k],and is expected to vary accordingly with the spectral segment.Experiments showed that the autocorrelation se-quence does not change much when a strong formant dom-inates the spectral segment even after the spectral segment is expanded to include a second formant.3.PROPOSED FORMANT FREQUENCYESTIMATION ALGORITHM IN NOISESo far we described a formant frequency model for a sin-gle spectral segment k.That is,we assumed that the seg-ment boundaries were known.In this section,we propose a segmentation metric,motivated by Wienerfilter theory,that identifies the boundaries of the K segments of the spectrum containing the K formants.Suppose that the input to a Wienerfilter is a signal with an additive noise,i.e.,x(n)=s(n)+n(n),and the desired signal is the noise,i.e.,d(n)=n(n).From the orthogonal-ity principle,we know thatE[e(n)x(n−l)]=0(5) =r nn(l)−∞X k=0h w(k)r xx(l−k)where h w(n)is the Wienerfilter,e(n)is the estimation er-ror,and r nn(l)and r xx(l)are the autocorrelation sequences of the noise and noisy speech signal respectively.For a given notchfilter h(n),we can produce the prediction resid-ual w(n)of the clean signal asw(n)=M−1X k=0h(k)s(n−k)(6)where h(0)=1,and M=3.Now,if we replace the Wienerfilter h w(n)in Eq.(5)with the notchfilter h(n),we get:E[e(n)x(n−l)](7a)=r nn(l)−M−1X k=0h(k)r xx(l−k)Since x(n)=s(n)+n(n),we get from Eq.(7a):E[e(n)x(n−l)]=r nn(l)−E[w(n)x(n−l)]−M−1P k=0h(k)r nn(l−k)=0(7b)Note that Eq.(7b)is no longer equal to zero since the notchfilter h(n)in Eq.(7b)is not the optimum Wienerfilter.Sincethe prediction residual w(n)is independent of the noisy sig-nal x(n),the second term E[w(n)x(n−l)]in Eq.(7b)oughtto be zero.In practice,however,w(n)becomes white onlyif h(n)whitens s(n˙).As the upper boundary of a segmentexpands,the notchfilter h(n)will gradually become moreand more matched with the formant in the segment,andE[w(n)x(n−l)]will become smaller and smaller.WhenE[w(n)x(n−l)]reaches its minimum,or E[e(n)x(n−l)]attains its maximum,the whole formant will be matched andcontained in the segment.As mentioned earlier,the notchfilter h(n)will not change much even if the next formantis included.That is,E[e(n)x(n−l)]reaches a maximumand saturates thereafter.The point at which the maximumis reached is indicative of a segment boundary.We there-fore use the energy of E[e(n)x(n−l)]as the segmentationmetric.The third term P h(k)r nn(l−k)in Eq.(7b)may alsobecome small as h(n)changes.In order to offset the effectof this undesired term,we add the term P h(k)r nn(l−k)in Eq.(7a).Thefinal segmentation metric then becomes:E k[e(n)x(n−l)](8)=r k nn(l)−M−1X m=0h k(m)r k xx(l−m)+M−1X m=0h k(m)r k nn(l−m)where h k(m)and r k(m)represent the notchfilter andthe autocorrelation sequence calculated from the k th spec-tral segment[ωk−1ωk]respectively.The energy ofEk[e(n)x(n−l)]is used as the segmentation metric andis denoted byE ex(ωk−1,ωk)=M−1X l=0E k[e(n)x(n−l)](9)The metric saturation point,which is also the segmentboundary point,is defined to be the point at which the fol-lowing condition is satisfied:¯¯¯¯E ex(w k+m)−E ex(w k)E ex(w k)¯¯¯¯<ε(10) where E ex(w k)denotes E ex(ωk−1,ωk)for simplicity.The delay index m is used to ensure that there is a long enough saturation period before a true saturation point is detected. Empirically,m should be selected such that the saturation period is no less than300Hz.The constantεis empirically determined.Figure1shows an example of the segmentation of a noisy vowel spectrum.Once the segmentation of the formant region is deter-mined,we considered peak-picking the spectrum.The basic idea is to segment the noise spectrum to have only one for-mant in each segment,and then for each segment,peak-pick the spectrum to get an estimate for the formant frequency of the noisy speech spectrum.The above segmentation algorithm requires access to the autocorrelation sequence of the clean signal,which we do not have.To estimate the clean autocorrelation sequence, we considered pre-processing the signal by the spectral sub-traction algorithm[9]to get an estimate of the enhanced signal spectrum.The autocorrelation sequence is obtained using Eq.(4)but with S(i)being replaced with the enhanced speech spectrum.3.1.Proposed AlgorithmThe proposed algorithm is outlined below: Initialization:k=1;i k−1=0;i k=1;K=desired number of formantsStep1.Loop(for segment k):(1)Calculate r(i k)xx (l)and r(i k)nn(l)using Eq.(4)(2)Use Equations(2a),(2b)and(4)to calculate the notchfilter h(i k)(n)(3)Use Equations(8)and(9)to estimateE ex(ωk−1,ωk)(4)if E ex(ωk−1,ωk)reaches a saturation point(ac-cording to Eq.10),then:k th boundary=i kPeak-pick spectrum to estimate formant fre-quency.go to Step2end(5)i k=i k+1EndStep2.k=k+1i k−1=i kif k>K,stopelse,go to Step1In our implementation,the autocorrelation sequence of the noise,r nn(l),was estimated using thefirst few speech-absent frames of the noisy speech signal.The speech sig-Table1.Standard deviations(Hz)of formant frequencyerrors for synthetic vowels using the proposed algorithm(SEF)and the LPC algorithm.The formant frequencies ofthe LPC algorithm were obtained in quiet,while the fre-quencies of the SEF algorithm were based on vowels em-bedded in+5dB speech-shaped noise.nal was processed using10-ms duration Hamming windowswith50%overlap between adjacent frames,and the spec-trum in Eq.4was obtained using the FFT.4.EXPERIMENTAL RESULTSThe proposed formant frequency estimation algorithm wasevaluated using real and synthetic vowels.Four naturalvowels,/u/,/a/,/ei/and/i/,corrupted by speech-shapednoise at+5dB S/N were used for evaluation.The vowelswere contained in the words“hood”,“hod”,“hayed”and“heed”and were produced by a male speaker.The esti-mated formant tracks are shown in Figure2.For compar-ative purposes,we also estimated the formant frequenciesof these vowels in quiet using two other methods based onLPC(16th order)and dynamic programming[5].As can beseen,our estimated formant frequencies are comparable tothe estimated formant frequencies in quiet.The same vowels were also synthesized using the Klattsynthesizer[6],and corrupted by a+5dB speech-shapednoise.Each test consisted of200trials in which the F1wasvaried±200Hz and the F2and F3frequencies were varied±150Hz around the center of the corresponding formant frequencies.Standard derivations were measured of the dif-ferences between the true formant frequencies and the es-timated formant frequencies.The results are tabulated inTable1.For comparative purposes,we also list the standarddeviations of the formant frequencies of the same vowelsestimated in quiet using the LPC method.Results indicatedthat the estimation of the F1frequency was more accuratethan the estimation of the F2and F3frequencies.5.SUMMARY AND CONCLUSIONSA new method for estimating formant frequencies in noisewas proposed based on sequential determination of spec-tral segments and formant frequencies.The spectrum wasFig.1.The top panel shows values of the segmentation met-ric as a function of frequency.Saturation point was esti-mated to be 1100Hz.Bottom panel shows the noisy spec-trum of the vowel /ey/.In this example,the F1region was determined to be 0-1100Hz.Fig.2.Formant tracks for four vowels in +5dB S/N es-timated using the proposed formant frequency estimation algorithm (SEF).For comparison,we superimpose the for-mant tracks of the vowels estimated in quiet by the LPC and dynamic programming based algorithms (Dyn)[5].sequentially segmented into K segments using a new seg-mentation metric based on Wiener filter theory.No speci fic assumptions were required for the statistics of the noise.Experimental results showed that the estimated formant fre-quencies of vowels embedded in +5dB speech-shaped noise were comparable to the formant frequencies estimated in quiet.6.REFERENCES[1]A.Crowe and M.A.Jack,”Globally optimizing formanttracker using generalized centroids,”Electron.Lett.,vol.23,pp.1019-1020,Sept.1987.[2]G.E.Kopec,”Formant tracking using hidden Markovmodels and vector quantization,”IEEE Trans.Acoust.,Speech,Signal Processing,vol.ASSP-34,pp.709-729,Aug.1986.[3]S.McCandless,”An algorithm for automatic formantextraction using linear prediction spectra,”IEEE Trans.Acoust.,Speech,Signal Processing,vol.ASSP-22,pp.135-141,1974.[4]R.C.Snell and inazzo,”Formant location fromLPC analysis data,”IEEE Trans.Speech Audio Process-ing,vol.1,pp.129-134,Apr.1993.[5]L.Welling and Hermann Ney,”Formant Estimation forSpeech Recognition,”IEEE Trans.Speech Audio Pro-cessing,vol.6,pp.36-48,Jan.1998.[6]D.H.Klatt,”Software for a cascade/parallel formantsynthesizer,”J.Acoust.Soc.Amer.,vol.67,pp.970-995,Mar.1980.[7]H.S.Chhatwal and A.G.Constantinides,”Speechspectral segmentation for spectral estimation and for-mant modeling,”in IEEE Int.Conf.Acoustics,Speech,and Signal Processing,Dallas,TX,apr.1987,pp.316-319[8]A.Watanabe,”Formant Estimation Method UsingInverse-Filter Control,”IEEE Trans.Speech Audio Pro-cessing,vol.9,pp.317-326,May 2001.[9]M.Berouti,R.Schwartz and J.Makhoul,”Enhance-ment of speech corrupted by acoustic noise”,IEEE Int.Conf.Acoustics,Speech,and Signal Processing,vol.4,pp.208-211,Apr 1979。
Summary+of+English+Phonetic+Alphabet+Knowledge+Poi
The written symbol "ae" reports a double voice sound, where the short "a" sound is followed by a long "e" sound
03
Consistent Phonetic Alphabet Learning
Classification
Vowels are classified into two types: pure Vowels and diphthongs Pure votes are produced with a single sound source, while diphthongs involve a smooth transition from one vote sound to another
Learning • Consistent Phonetic Alphabet
Learning
目录
• Phonetic Pronunciation Rules • Phonetic Alphabet Promotion
Rules • The Application of Phonetic
Alphabet in Practice
要点一
Consonants
Consonants are speech sounds that require some sort of construction in the vocal tract Consistent sounds are represented by letters such as "b," "d," and "g" in the Phonetic Alphabet
Formant discrimination in noise for isolated vowels
Formant discrimination in noise for isolated vowels a) Chang Liu b)and Diane Kewley-Port c)Department of Speech and Hearing Sciences,Indiana University,Bloomington,Indiana47405͑Received4April2004;revised9August2004;accepted9August2004͒Formant discrimination for isolated vowels presented in noise was investigated for normal-hearinglisteners.Discrimination thresholds for F1and F2,for the seven American English vowels/{,(,},,,#,~,É/,were measured under two types of noise,long-term speech-shaped noise͑LTSS͒and multitalker babble,and also under quiet listening conditions.Signal-to-noise ratios͑SNR͒variedfromϪ4toϩ4dB in steps of2dB.All three factors,formant frequency,signal-to-noise ratio,andnoise type,had significant effects on vowel formant discrimination.Significant interactions amongthe three factors showed that threshold-frequency functions depended on SNR and noise type.Thethresholds at the lowest levels of SNR were highly elevated by a factor of about3compared to thosein quiet.The masking functions͑threshold vs SNR͒were well described by a negative exponentialover F1and F2for both LTSS and babble noise.Speech-shaped noise was a slightly more effectivemasker than multitalker babble,presumably reflecting small benefits͑1.5dB͒due to the temporalvariation of the babble.©2004Acoustical Society of America.͓DOI:10.1121/1.1802671͔PACS numbers:43.71.Es,43.66.Fe͓RLD͔Pages:3119–3129I.INTRODUCTIONAs one of the world languages with a crowded vowel space,American English has vowels that typically show con-siderable spectral overlap in the two-dimensional F1ϫF2 vowel formant space͑Peterson and Barney,1952;Hillen-brand et al.,1995͒.Spectral overlap between vowels result-ing from talker and context differences suggest that vowels with the same formant-frequency values(F1and F2)appear to belong to different vowel categories.In spite of this am-biguity,formant frequencies,especially F1and F2,are criti-cal to vowel perception and categorization.Thus,a long-term goal of our research is to establish a model of vowel percep-tion to represent listeners’abilities to discriminate changes of formant frequencies both within and between vowels.To date,most formant discrimination research has been con-ducted in quiet,and the purpose of this research is to exam-ine formant discrimination in noise.Formant discrimination has been investigated systemati-cally in a variety of experimental conditions͑Flanagan, 1955;Kewley-Port and Watson,1994;Hawks,1994; Kewley-Port,1995;Kewley-Port and Zheng,1999;Liu and Kewley-Port,2004b͒.In these studies,either F1or F2was changed by different amounts and the smallest change in formant frequencies that could be detected,represented by⌬F,was defined as the threshold of vowel formant discrimi-nation.Thefirst study estimating discrimination thresholds for single-formant frequencies was conducted by Flanagan͑1955͒,using synthetic,steady-state vowels.Thresholds for F1at300,500,and700Hz and for F2at1000,1500,and 2000Hz were measured,with thefinding that thresholds were approximately3%–5%of the formant frequency,as expressed in terms of the Weber ratio,⌬F/F.Over the last decade,Kewley-Port and her colleagues have investigated vowel formant discrimination systemati-cally from optimal to more ordinary listening conditions for both formant-synthesized and high-fidelity speech.Kewley-Port and Watson͑1994͒,using formant-synthesized speech ͑Klatt,1980͒,measured thresholds for formant discrimina-tion under optimal listening conditions,in which isolated vowels were presented under minimal stimulus uncertainty in quiet to well-trained listeners.They found that thresholds were constant at14Hz for F1Ͻ800Hz and increased with formant frequency at a rate of10Hz/1000Hz for F2.The Weber ratio in the F2region was approximately1.5%.To investigate formant discrimination under other more ecologi-cally representative conditions,several factors have been ma-nipulated systematically such as fundamental frequency (F0),phonetic context,level of stimulus uncertainty,lis-tener training,and the addition of a word identification task. For example,thresholds for vowel formant discrimination were measured in different phonetic contexts including syl-lables,phrases,and sentences͑Kewley-Port,1995;Kewley-Port and Zheng,1999;Liu and Kewley-Port,2004b͒. Kewley-Port͑1995͒investigated the effect of the phonetic context/CVC/on formant discrimination for the/(/vowel and suggested that thresholds in a/CVC/were significantly increased compared to thresholds for isolated vowels.When phonetic context increased further to phrases and sentences in formant-synthesized speech,performance for formant dis-crimination became even worse͑Kewley-Port and Zheng, 1999͒.Thresholds for high-fidelity speech,which was syn-thesized in STRAIGHT͑Kawahara et al.,1999͒and sounded more natural than formant-synthesized speech,showed simi-lar effects for phonetic contexts͑Liu and Kewley-Port, 2004b͒,although thresholds were elevated compared to format-synthesized speech.Besides phonetic context,the level of stimulus uncertainty and subject training were found to significantly affect formant frequency discrimination.a͒Portions of the data were presented at the141st and142nd meeting of the Acoustical Society of America͓J.Acoust.Soc.Am.109͑5͒,2295͑Liu and Kewley-Port,2001͒;J.Acoust.Soc.Am.110͑5͒,2658͑Liu and Kewley-Port,2001͔͒.b͒Electronic mail:chang.liu@c͒Electronic mail:kewley@Compared to thresholds at minimal stimulus uncertainty ͑Kewley-Port and Watson,1994͒,a medium level of stimulus uncertainty appeared to increase thresholds for isolated vow-els by130%͑Kewley-Port and Zheng,1999;Kewley-Port, 2001͒.In another condition,naive listeners in theirfirst block of testing still showed much more difficulty in discriminating formant frequency than highly trained listeners with a reduc-tion of230%associated with training͑Kewley-Port,2001͒.Since formant frequencies are represented on a nonlinear scale in the cochlea,Kewley-Port and Zheng͑1999͒and Liu and Kewley-Port͑2004b͒have described a straightforward method of summarizing formant thresholds data.That is,thresholds for vowel formant discrimination,⌬F,were transformed to an auditory scale͑Kewley-Port and Zheng, 1999͒.Several auditory scales including log frequency, Moore’s equivalent rectangular bandwidth͑ERB rate͒scale ͑Moore and Glasberg,1987;Glasberg and Moore,1990͒,and Zwicker’s͑1961͒critical-band scale͑z͒,were examined.Of the three auditory scales,the z scale showed theflattest func-tions and was the most effective in reducing effects of for-mant frequency and fundamental frequency.Applying the z transform to vowel formant discrimination in phrases and sentences under more ordinary listening conditions,thresh-olds were described as constant at0.28barks for formant-synthesized speech͑Kewley-Port and Zheng,1999͒.Re-cently,in a similar study͑Liu and Kewley-Port,2004b͒using high-fidelity speech,reported thresholds were constant at 0.37barks,which can be thought of as a norm for formant discrimination in modest length sentences.Thus,effects of a number of factors,such as phonetic context,level of stimulus uncertainty,training,and an addi-tional identification task,in vowel formant discrimination tasks have been systematically studied.These factors were investigated under the quiet listening condition.However, one challenge to perceiving speech in everyday conversation is the typically noisy environment.Background noise can mask the speech signal so that listeners have less acoustical information to perceive and identify the speech sounds. Thus,the purpose of this research is to investigate how noise affects formant discrimination.Noise variables that influence speech perception are level,and the spectral and temporal properties of the noise. Speech reception thresholds͑SRT͒are constant at low noise levels but increase proportionally to the increase of noise level for middle and high noise levels in the presence of the steady noise͑Plomp and Mimpen,1979͒.Speech perception becomes more difficult as bandwidth increases such that noise reduces the audibility of larger portions of the speech spectrum͑Dubno and Dirks,1982;Stelmachowicz et al., 1990͒.In addition,listeners are sensitive to the temporal fluctuation in noise,i.e.,they take advantage of intermittent noise,modulated noise,and multitalker babble compared to a noise without modulation͑Miller and Licklider,1950;Ka-likow et al.,1977;Festen,1993͒.Although a number of investigations have been con-ducted on speech perception in noise,little research has been concerned with the effects of noise on vowel formant dis-crimination.The present study investigates the degree to which noise influences formant thresholds spectrally and temporally using two types of noise:long-term speech-shaped noise and multitalker babble in comparison to a quietcondition.Two noise factors were manipulated systemati-cally in this study:the overall signal-to-noise ratio and noisetype.II.METHODA.Speech stimuliV owel formant discrimination was measured for sevenAmerican English vowels/{,(,},,,#,~,É/.This selection ofseven vowels provided a broad coverage of formant frequen-cies that range from270to2562Hz as shown in Table I.V owels were recorded in the syllable context of/bVd/from afemale talker.Many sentences and phrases with each of theseven vowels were originally recorded from the femaletalker.Syllables͑/bVd/͒with similar neutral prosody wereselected at the eighth position of one nine-word sentence toserve as the original stimuli for formant shifts and futureresynthesis.All speech stimuli with and without formantshifts were resynthesized from these original syllables in amodified version of STRAIGHT͑Kawahara et al.,1999͒,which uses a pitch-adaptive method for speech analysis andsynthesis.The resynthesized speech stimuli without anychange of the acoustic parameters in STRAIGHT͑the standardstimuli͒sounded quite similar to original natural-speechstimuli and they were referred to the high-fidelity speech.Formant shifts,as described in Liu and Kewley-Port͑2004a͒,were manipulated based on the standard stimulus as follows:a matrix in MATLAB representing the spectrogram͑amplitude ϫtimeϫfrequency͒of the standard syllable,was obtained by the analysis in STRAIGHT.Visually this spectrogram has verysmooth formant peaks.To shift a formant peak,the temporallocation of the formant across the syllable,including transi-tions,was visually identified.In each time frame͑i.e.,onespectrum͒,formant shift was manipulated for the portion be-tween the valleys on either side of the formant peak.Ampli-tude in the low-frequency valley was adjusted to be a con-stant across the frequency range corresponding to thefrequency shift,while the high-frequency valley was col-lapsed by replacing the original amplitude values with theshifted peak,such that the shift in the selected formant fre-quency resulted in no change in other formants͑see Liu andKewley-Port,2004a͒.Detail in the formant peaks was pre-served in this procedure,with the valleys only somewhatchanged.This modified2D matrix was reloaded into STRAIGHT and used with other unchanged acoustic param-TABLE I.Formant frequency͑Hz and bark͒for F1and F2for seven female vowels.F1V owel i u(},#a Frequency͑Hz͒270280430580678700818 Barks 2.7 2.8 4.3 5.6 6.3 6.57.4 F2V owel u a#,}(i Frequency͑Hz͒1124128114541960207821322562 Barks9.610.110.912.913.313.414.7eters such as F 0and amplitude contours for resynthesis ͓for more details,see Liu and Kewley-Port ͑2004a ͒and their Fig.1͔.Although formants at both formant transitions ͑onset and offset ͒as well as the steady state were adjusted in each /bVd/syllable,only isolated vowels were selected as speech stimuli in the present study.These isolated vowels were edited by deleting the formant transition at the beginning and end of the syllable such that only the relatively steady-state vowel nucleus remained.The duration of isolated vowels varied from 107to 206ms as expected in natural speech.Formant frequency shifts were manipulated in STRAIGHT in sets of 24for each vowel.Selected formant frequencies,F 1and F 2of the seven vowels,were increased by 0.7%to 17%over 24steps using a linear scale with a step size of 0.7%.For example,the F 1of the /}/vowel is 580Hz,and the shifts of F 1ranged from 4.1Hz ͑0.7%͒to 98.6Hz ͑17%͒.This procedure is the same as that of Liu and Kewley-Port’s study ͑2004b ͒.B.MaskersTwo types of noise were selected as maskers in this study:long-term speech-shaped noise and multitalker babble ͑Kalikow et al.,1977͒.Long-term speech-shaped noise ͑LTSS ͒is similar to white noise in that it does not fluctuate;however,it has more masking efficiency due to the similarity between the spectra of speech signals and LTSS.On the other hand,the multitalker babble is a time-varying masker in which occasional elements of speech signal may coincide with momentary minima in the level of the masker such that the multitalker babble has less masking efficiency than LTSS noise.In order to match the shape of the noise as well as possible to the speech of this talker,the LTSS noise was generated from uniform noise that was shaped by a filter with the average spectrum of long-term speech calculated over the 40sentences and 9phrases from the female talker.The babble selected is the 12-talker babble produced by Ka-likow et al.͑1977͒that has been used frequently in the lit-erature.The spectra of LTSS noise,multitalker babble,and four of the seven isolated vowels ͑/(,},,,#/͒with signal-to-noise ratio at 0dB are shown in Fig.1.As expected forthe natural vowels,local signal-to-noise ratios are variable for F 1and F 2for these four vowels under either LTSS noise or babble.Spectra of the other three vowels ͑/a,i,u/͒also showed the similar variability on local SNRs for F 1and F 2regions,with the exception of a relatively lower local SNR for F 2of the vowel /u/.The primary spectral difference be-tween the two noises is that the babble has less high-frequency energy than the LTSS noise,and therefore local SNR for F 2is much higher ͑about 6dB ͒.Given the large variability in formant amplitude for these naturally produced vowels,it was not practical to experimentally control local SNR for each formant and noise.Instead,the overall signal-to-noise ratio was manipulated and the local SNR values were evaluated post hoc .Pilot data suggested that perfor-mance for vowel formant discrimination was mainly affected for SNRs between Ϫ4and ϩ4dB,where vowels at Ϫ4-dB SNR were barely detectable.Thus,SNR was varied from Ϫ4to ϩ4dB with a step size of 2dB in the present study.C.ListenersSix American English native speakers,between 21and 39years old,participated in this study.All listeners had nor-mal hearing with pure-tone thresholds of 15dB HL or better at octave intervals from 250to 8000Hz and were paid for their participation.D.ProcedureSpeech stimuli were presented to the right ears of listen-ers,who were seated in a sound-treated,IAC booth,via cali-brated TDH-39earphones.Stimulus presentation was con-trolled by a series of TDT modules,including a 16-bit D/A converter ͑DA1͒,a programmable filter ͑PF1͒,and a head-phone buffer ͑HB6͒,using a sample rate of 11025Hz.A low-pass filter with a cutoff frequency of 5000Hz and a slope of 80dB/octave,and an attenuation level set by the calibration procedure,was configured in the programmable filter and applied to the summed speech plus noise signal.The vowel /}/,with a duration of 3s,was used as the cali-bration sound.The sound-pressure level measured in the NBS-9A 6-c 3coupler by a Larson-Davis sound-level meter ͑model 2800͒with the linear weighting band was set at 70dB SPL for the vowel stimuli.The LTSS noise was generated by the TDT waveform generator ͑WG2͒,then filtered ͑PF1͒and attenuated ͑PA4͒,and finally added to the vowel stimuli via the weighted sum-mer ͑SM3͒.A FIR filter was designed to match the long-term speech spectrum of the female talker.The babble masker was generated by randomly selecting a segment of multitalker babble of an appropriate length from a 30-s sample of the babble ͑Kalikow et al.,1977͒.This selected babble segment was then presented with the vowel stimuli through the dual channels of the converter ͑DA1͒separately.After being at-tenuated ͑PA4͒,the babble was summed with the vowel stimuli via the weighted summer ͑SM3͒.The level of LTSS noise and multitalker babble was attenuated so as to achieve the overall signal-to-noise ratio at Ϫ4,Ϫ2,0,ϩ2,and ϩ4dB or turned off for the quietcondition.FIG.1.LPC spectra of four vowels,LTSS noise,and multitalker babble for an overall vowel level of 70dB SPL and an overall SNR of 0dB.Test procedures were similar to those of Kewley-Portand Zheng͑1999͒.Formant thresholds were measured usinga modified three-interval,two-alternative forced-choice ͑2AFC͒procedure with a two-down,one-up tracking algo-rithm,estimating the frequency increment required for71%-correct responses͑Levitt,1971͒.In each trial,the standardwas presented in thefirst interval,followed by two intervals,in one of which was the incremented-formant vowel and inthe other was the standard vowel.The listener’s task was toindicate which interval contained the stimulus that differedfrom the standard.Sixty trials were run in each block for thequiet only condition and90trials each for the LTSS noiseand babble conditions.Only one formant was tested in eachblock,varying SNR and quiet.There were three experiments in this study.In thefirstexperiment,formant thresholds were measured in quiet forall seven vowels.In the second experiment,thresholds weremeasured in LTSS noise with interleaved trials of the quietcondition.In each block,vowels with only one shifted for-mant were presented.The SNR values were randomly variedtrial by trial with quiet with the range fromϪ4toϩ4dB. Thus,six conditions͑five SNR levels in noise plus quiet͒with only one formant and one vowel were presented in each block.Seven vowels were presented to the listeners in a qua-sirandom order such that the sequence of vowel presenta-tions was different among listeners.Only one vowel was presented in one session͑per day͒and F1and F2were alternated in successive blocks for threshold measurements. In the third experiment,thresholds were measured in the multitalker babble͑Kalikow et al.,1977͒using the same pro-cedures as the second experiment,again with interleaved quiet-condition trials.In each masking trial of the second and third experiment,the maskers began1s before the standard vowel and continued for1s after the three vowel intervals. The intervals between vowel presentations,within each trial, were400ms in duration.Prior to the collection of experimental data,listeners had extensive training in each of the conditions.The duration of each block was approximately4min for the quiet condition and7–8min for the noise conditions.Every session was composed of six or seven blocks.At the beginning of each experiment͑quiet only,LTSS noise,and multitalker babble͒, listeners were given one training session in which F1and F2of vowel/}/were presented so that listeners became familiar with the procedures.After the training session,the test sessions were begun.A mean value of⌬F for each lis-tener was averaged over the⌬F values for the last four blocks in which performance was stable by visual inspection. Group means,i.e.,⌬F for each formant,were calculated as the average of the⌬F thresholds across the six listeners. III.RESULTSA.Thresholds for formant discrimination in quietAverage thresholds of⌬F as a function of formant fre-quency for F1and F2for the seven vowels tested in quiet are shown in parisons for the quiet conditions presented in each of the three experiments͑quiet only and quiet trials interleaved with masking trials in experiment2and3͒were made with a two-way͑listening condition X, formant frequency͒repeated-measure analysis of variance͑ANOV A͒.Results showed that there was no significant ef-fect of listening conditions on formant discrimination in the quiet͓F(2,10)ϭ2.75,pϭ0.112],whereas higher formant frequency significantly increased thresholds͓F(13,65)ϭ18.31,pϽ0.001͔.In addition,the interaction between the formant frequency and listening conditions was not signifi-cant͓F(26,130)ϭ0.971,pϭ0.511].The Tukey post hoc test suggested that there was no significant difference for the thresholds between any two of the three quiet listening con-ditions.Not surprisingly,effects of formant frequency were similar to those obtained in a previous study͑Kewley-Port and Watson,1994͒,that is,thresholds(⌬F)were reasonably constant in the F1region͑Ͻ800Hz͒,and elevated with increases of formant frequency in the F2region.The thresholds in Hz were converted to thresholds in an auditory scale in bark͑Z͒,using Traunmu¨ller’s equation͑1990͒.As expected,the transformation produced thresholds (⌬Z)that were relativelyflat across F1and F2values of formant frequency͑see Fig.3͒.Linear regression on all three quiet conditions showed that there was no significant linear relationship between thresholds(⌬Z)and formant frequency͑barks͒,suggestingflat patterns for all three quiet conditions ͓quiet only,F(1,12)ϭ0.607,pϭ0.451;quiet with LTSS, F(1,12)ϭ0.382,pϭ0.548;quiet with babble,F(1,12)ϭ2.103,pϭ0.173].However,a two-way͑listening condi-tion X,formant frequency͒ANOV A again suggested that lis-tening condition had no significant effect on⌬Z͓F(2,10)ϭ2.542,pϭ0.128],whereas formant frequency did have a significant effect on⌬Z͓F(13,65)ϭ8.707,pϽ0.001͔.The significant effect of formant frequency on⌬Z might be due to the patterns of variability seen for some thresholds across the formant frequency in the three quiet conditions͑ranging from0.108to0.294barks͒.Altogether,the three quiet con-ditions showed similar average thresholds͑about0.20barks͒and similar variability over formants͑standard deviation of 0.05–0.06barks for the three quiet conditions͒.FIG.2.Thresholds of vowel formant discrimination(⌬F)in Hz as a func-tion of formant frequency in quiet conditions for three procedures:quiet only,quiet trials interleaved with LTSS noise trials,and quiet trials inter-leaved with babble trials.B.Thresholds for formant discrimination in LTSS noiseBecause statistical analyses showed similar results for thresholds in ⌬F and ⌬Z ,only thresholds in ⌬Z are reported here.Thresholds in barks for only the LTSS noise and the interleaved quiet condition are shown in Fig.4.The lines connecting the thresholds are only intended to help compare visually the different SNR conditions.Thresholds in ⌬Z were analyzed by a two-way ͑formant frequency X,SNR ͒repeated-measure ANOV A.The analysis showed that for-mant frequency ͓F (13,65)ϭ9.331,p Ͻ0.001͔,SNR ͓F (5,25)ϭ133.5,p Ͻ0.001͔and the interaction between for-mant frequency and SNR ͓F (65,325)ϭ7.341,p Ͻ0.001͔all showed significant effects.The Tukey post hoc tests of threshold-frequency (⌬Z )functions across the SNRs indi-cated four different patterns:͑1͒quiet condition;͑2͒SNR ϭϩ4dB;͑3͒SNR ϭϩ2and 0dB;and ͑4͒SNR ϭϪ2and Ϫ4dB.Simple main effects analyses revealed that formant fre-quency was a significant effect for every SNR and the quiet condition (p Ͻ0.01).Under the quiet listening condition and noise conditions at high SNRs,thresholds in ⌬Z showed relatively flat patterns over formant frequencies.However,thresholds increased markedly in the F 2regions under noise conditions at low SNRs,such as Ϫ2and Ϫ4dB,suggesting more effects of noise on the F 2regions than the F 1regions ͑see Fig.4͒.In order to measure whether thresholds at F 1region are different from thresholds at F 2region,planned comparisons between thresholds for F 1and F 2regions un-der each SNR condition,including the quiet condition,were completed.Results suggested that thresholds in the F 1re-gion were significantly lower than thresholds in the F 2re-gions for only noise conditions with SNR at Ϫ2͓F (1,5)ϭ83.928,p Ͻ0.001͔and Ϫ4dB ͓F (1,5)ϭ125.184,p Ͻ0.001͔.There was no significant difference for thresholds (⌬Z )between F 1and F 2regions at ϩ4dB,ϩ2dB,0dB,or for the quiet condition (p Ͼ0.08).Thus,masking elevates thresholds in the F 2region compared to F 1at low SNRs.Another perspective on the results is provided by mask-ing functions,which demonstrate more clearly how thresh-olds change with decreasing local SNRs.Local SNRs are the measured signal-to-noise ratios for each individual formant.The local SNRs in the frequency region of each formant was obtained by measuring the 1/3-octave levels centered around the formants relative to the noise level using a Larson-Davis sound-level meter ͑2800model ͒,as shown in Table II.In Fig.5,it appears that thresholds in bark as a function of local SNR improve for F 1and F 2for the seven vowels as SNR increases.An exponential decay function and a linear function were fit to the masking function.Because the threshold in noise will eventually reach the threshold in quiet,the quiet threshold was used as a reference to normal-ize thresholds in noise.The normalized thresholds were cal-culated by subtracting the quiet threshold ͑0.201barks ͒from each threshold in Fig.5and then used for the exponential decay regression function.Analysis of the nonlinear regres-sion suggested that the exponential decay function fit well to the masking function (r ϭϪ0.779).The function,⌬Z ϭ0.201ϩ0.378*e (Ϫ0.083*SNR),shown in Fig.5,suggests that thresholds improved exponentially with the increase inlocalFIG.3.Thresholds of vowel formant discrimination (⌬Z )in bark as a function of formant frequency in quiet conditions for three procedures:quiet only,quiet trials interleaved with LTSS noise trials,and quiet trials inter-leaved with babbletrials.FIG.4.Thresholds of vowel formant discrimination (⌬Z )in bark as a function of formant frequency in quiet and in LTSS noise conditions with SNR at Ϫ4,Ϫ2,0,ϩ2,and ϩ4dB.TABLE II.Local SNR in the frequency region of each formant of the seven vowels for F 1͑top ͒and F 2͑bottom ͒for LTSS noise in dB.F 1Overall SNRi u (},#a 412.612.29.611.79.78.310.4210.610.27.69.77.7 6.38.408.68.2 5.67.7 5.7 4.3 6.4Ϫ2 6.6 6.2 3.6 5.7 3.7 2.3 4.4Ϫ4 4.6 4.2 1.6 3.7 1.70.32.4F 2Overall SNRu a #},(i 4Ϫ2.98.97.7 6.68.5 4.3 2.02Ϫ4.9 6.9 5.7 4.6 6.5 2.300Ϫ6.9 4.9 3.7 2.6 4.50.3Ϫ2.0Ϫ2Ϫ8.9 2.9 1.70.6 2.5Ϫ1.7Ϫ4.0Ϫ4Ϫ10.90.9Ϫ0.3Ϫ1.40.5Ϫ3.7Ϫ6.0SNR and approached the threshold in quiet ͑0.201barks ͒when SNR was above 30dB.The linear regression was also completed,yielding a strong relationship between thresholds in ⌬Z and SNRs (r ϭϪ0.812).However,because the mask-ing function shows an asymptotic pattern and it is reasonable to predict that the threshold will approach the threshold in quiet as SNR reaches a high level,nonlinear regression was considered a more principled description of the masking function.C.Thresholds for formant discrimination in multitalker babbleSimilar to the LTSS noise condition,thresholds in mul-titalker babble were analyzed in Hz (⌬F )and barks (⌬Z ).Again results for the two statistical analyses were quite simi-lar and only thresholds in barks (⌬Z )are reported here.As expected,results of a two-factor ͑formant frequency X,SNR ͒repeated-measure ANOV A on ⌬Z suggested that effects of formant frequency ͓F (13,65)ϭ7.176,p Ͻ0.001͔,SNR ͓F (5,25)ϭ88.214,p Ͻ0.001͔,and the interaction between formant frequencies and SNR ͓F (65,325)ϭ2.400,p Ͻ0.001͔were significant.As seen in Fig.6,frequency func-tions in ⌬Z were dependent on overall SNRs.Based on the Tukey post hoc tests across the SNRs,patterns of threshold-frequency functions were categorized into four groups:͑1͒quiet,ϩ4and ϩ2dB;͑2͒0dB;͑3͒Ϫ2dB;and ͑4͒Ϫ4dB.Examining functions for the quiet condition and high-level SNRs,⌬Z was relatively constant across the formants of F 1and F 2regions,whereas at low SNRs,particularly Ϫ2and Ϫ4dB,variability of ⌬Z increased,especially for the mid-frequency formants ͑i.e.,F 1of /a/and F 2of /u/͒.Simple main effects analysis suggested that there was significant ef-fect of the formant frequency under each listening condtion (p Ͻ0.01).However,different from in the LTSS noise,planned comparisons in the babble suggested that there was no significant difference between thresholds for F 1and F 2regions under each SNR and the quiet condition (p Ͼ0.07).These patterns of ⌬Z frequency under different SNRs inmultitalker babble are different from those patterns in LTSS noise,primarily in the F 2region as expected,given the dif-ferences in the noise spectra ͑Fig.1͒.Similar to the LTSS noise condition,the local SNRs for each formant were obtained by measuring the 1/3-octave lev-els of vowels and babble ͑long-term average ͒with a Larson-Davis sound-level meter ͑2800model ͒͑Table III ͒.As shown in Fig.1and Table III,in F 2region,the level of vowel formants is much higher than the masking level of multi-talker babble.The masking function for ⌬Z over F 1and F 2for the seven vowels is shown in Fig.7along with a negative exponential function that was fit by nonlinear regression (r ϭϪ0.622).The nonlinear regression function,represented as ⌬Z ϭ0.201ϩ0.340*e (Ϫ0.094*SNR),suggested that thresh-olds in ⌬Z decrease exponentially with increasing SNRs and approach the threshold for the quiet condition ͑0.201barks ͒for SNR above 30dB.parison between LTSS noise and multitalker babbleHaving considered the thresholds for each noise type separately,another analysis was completed to comparethemFIG.5.The masking effects of LTSS noise (⌬Z )on vowel formant thresh-olds as a function of local SNR are displayed along with the nonlinear ͑exponential decay ͒regressionfunction.FIG.6.Thresholds of vowel formant discrimination (⌬Z )in bark as a function of formant frequency in quiet and in multitalker babble conditions with SNR at Ϫ4,Ϫ2,0,ϩ2,and ϩ4dB.TABLE III.Local SNR for the frequency region at each formant of the seven vowels for F 1͑top ͒and F 2͑bottom ͒for multitalker babble in dB.F 1Overall SNRi u (},#a 48.38.27.711.610.910.711.32 6.3 6.2 5.79.68.98.79.30 4.3 4.2 3.77.6 6.9 6.77.3Ϫ2 2.3 2.2 1.7 5.6 4.9 4.7 5.3Ϫ40.30.2Ϫ0.3 3.6 2.9 2.7 3.3F 2Overall SNRu a #},(i 4 4.914.115.715.122.321.322.42 2.912.113.713.120.319.320.400.910.111.711.118.317.318.4Ϫ2Ϫ1.18.19.79.116.315.316.4Ϫ4Ϫ3.16.17.77.114.313.314.4。
海上无线电通信业务阅读习题集
Marine Radio Communication ServiceExercise BookletZhou fengShanghai Maritime UniversitySection-A: General Information and System Overview. Key Topic #1: Fundamental Concepts:1A1 What is the fundamental concept of the GMDSS?A. GMDSS utilizes automated systems and satellite technology to improve emergency communications for the world’s shipping industry.B. It is intended to automate and improve existing digital selective calling procedures and techniques.C. It is intended to provide more effective but lower cost commercial communications.D. It is intended to provide compulsory vessels with a collision avoidance system when they are operating in waters that are also occupied by non-compulsory vessels.1A2 The primary purpose of the GMDSS is:A. Allow more effective control of SAR situations by vessels.B. Automate and improve emergency communications for the world's shipping industry.C. Provide additional shipboard systems for more effective company communications.D. Effective and inexpensive communications.1A3 What is the basic concept of GMDSS?A. Shoreside authorities will rely on reports from nearby vessels to become aware of Distress alerts.B. Shoreside authorities and vessels can assist in a coordinated SAR operation only after the correct chain of DSC relays takes place.C. SAR authorities ashore can be alerted to a Distress situation & shipping in the vicinity can be requested to participate in SAR operations.D. SAR authorities ashore wait to have EPIRB Distress alerts confirmed by satellite follow-on communications.1A4 GMDSS is primarily a system based on:A. Ship-to-ship Distress communications using MF or HF radiotelephony.B. VHF digital selective calling from ship to shore.C. Distress, Urgency and Safety communications carried out by the use of narrow-band direct printing telegraphy.D. The linking of search and rescue authorities ashore with shipping in the immediate vicinity of a ship in Distress or in need of assistance.1A5 What is the responsibility of compulsory GMDSS vessels?A. Every vessel must be able to perform communications functions essential for its own safety and the safety of other vessels.B. Vessels must transmit a DSC distress relay upon receipt of a DSC distress alert.C. Only the vessels closest to a Distress incident must render assistance.D. Vessels must immediately acknowledge all DSC distress alerts.1A6 GMDSS is required for which of the following?A. All vessels capable of international voyages.B. SOLAS Convention ships of 300 gross tonnage or more.C. Vessels operating outside of the range of VHF coast radio stations.D. Coastal vessels of less than 300 gross tons.Answers - 1A1 - A 1A2 - B 1A3 - C 1A4 - D 1A5 - A 1A6 - BSection-A: General Information and System Overview. Key Topic #2: Equipment Systems:2A1 Which GMDSS system utilizes terrestrial radio techniques?A. F-77B. Inmarsat-CC. GPSD. VHF-MF-HF-DSC2A2 What equipment utilizes satellite communications?A. Inmarsat-CB. VHF-MF-HFC. NAVTEXD. SART2A3 What equipment is used in or near the survival craft?A. NAVTEXB. EPIRBC. FathometerD. COSPAS-SARSAT2A4 What equipment is programmed to initiate transmission of Distress alerts and calls to individual stations?A. NAVTEXB. GPSC. DSC ControllerD. DSC Scanning Watch Receiver2A5 What system provides accurate vessel position information to the GMDSS equipment?A. COSPAS-SARSATB. EPIRBC. GPSD. Inmarsat-B2A6 Which of these can be used to receive MSI?A. SARTB. EPIRBC. Inmarsat-BD. NAVTEXAnswers: 2A1 - D 2A2 - A 2A3 - B 2A4 - C 2A5 - C 2A6 - DSection-A: General Information and System Overview. Key Topic #3: Sea Areas:3A1 Which of the following regions lie outside Sea Areas A1, A2, and A3?A. Sea Areas only apply to Inmarsat footprint areas.B. Sea Area A3-I Inmarsat coverage and Sea Area A3-S HF SITOR (NBDP) coverage.C. Sea Area A4D. There are no additional Sea Areas.3A2 What sea area is defined as being within range of a shore-based MF station that provides for continuous DSC alerting?A. Coastal watersB. Sea area A3C. Sea area A1D. Sea area A23A3 If a vessel is engaged in local trade and at no point in its voyage travels outside the range of a VHF shore station with continuous DSC alerting then the vessel is operating in what area?A. Sea area A1B. Coastal and international zonesC. Inland and coastal watersD. Sea areas A1 and A23A4 What is defined as an area, excluding sea areas A1 and A2, within the coverage of an Inmarsat geostationary satellite in which continuous alerting is available?A. Ocean Area Regions AOR-E, AOR-W, POR or IORB. Sea Area A3C. Sea Area A4D. Coastal and Inland Waters3A5 SITOR (NBDP) equipment is a partial or alternate carriage requirement under GMDSS for vessels operating in which sea area(s)?A. A1B. A3 and A4C. A1 and A2D. A1, A2, A3 and A43A6 What is defined as the area within the radiotelephone coverage area of at least one VHF coast station in which continuous DSC alerting is available as defined by the IMO regulation for GMDSS?A. Ocean Area Regions AOR-E, AOR-W, POR or IORB. Sea Area A2C. Sea Area A1D. Coastal and Inland WatersAnswers: 3A1 - C 3A2 - D 3A3 - A 3A4 - B 3A5 - B 3A6 - C4A1 Which of the following is a functional or carriage requirement for compulsory vessels?A. A compulsory vessel must carry at least two (2) FCC licensed GMDSS Radio Operators in all sea areas as well as a GMDSS Maintainer in sea areas A3 & A4.B. A compulsory vessel must satisfy certain equipment carriage requirements based on the intended sea area of operation.C. A compulsory vessel must be able to transmit and respond to Distress alerts and carry only one (1) FCC licensed GMDSS Radio Operator in sea areas A1 & A2.D. None of these answers are correct.4A2 Which GMDSS communication functions must all compulsory vessels be capable of performing to meet International Maritime Organization requirements?A. Distress alerting and receipt of Maritime Safety Information via Inmarsat for all vessels intending to operate in Sea Area A4.B. Distress alerting and receipt of MSI in Sea Areas A1, A2, A3, and A4 regardless of the vessel's intended area of operation.C. Distress alerting, general communications and receipt of Maritime Safety Information in the vessel's intended area of operation.D. General communications via Inmarsat and receipt of Maritime Safety Information via Enhanced Group Calling in Sea Area A4.4A3 GMDSS-equipped ships will be required to perform which of the following communications functions?A. Distress alerting, MSI, SAR and on-scene communications & receipt of satellite alerts from other vessels.B. SAR and on-scene communications, Bridge-to-Bridge and general radio communications, MSI and relay of satellite alerts from other vessels.C. Bridge-to-Bridge and general radio communications, RDF of EPIRB homing signals, Distress alerting and MSI.D. Transmit distress alerts, SAR and on-scene communications, MSI, Bridge-to-Bridge and general radio communications.4A4 What equipment can be used to receive Maritime Safety Information?A. NAVTEX, EGC receiver or HF SITOR (NBDP).B. EGC receiver, Inmarsat B or F77 terminal.C. HF SITOR (NBDP), Inmarsat B or NAVTEX.D. All of these answers are correct.4A5 Which of the following are required GMDSS functions?A. Bridge-to-Bridge communications, reception of weather map facsimile broadcasts, SAR communications.B. Reception of weather map facsimile broadcasts, receiving company email, On-scene communications.C. Reception of VHF weather channels, On-scene communications, general communicationsD. Bridge-to-Bridge communications, general communications, SAR communications.4A6 Which of the following are required GMDSS functions for vessels?A. Transmit and receive locating signals, general communications and SAR communications.B. Transmit and receive general communications, transmit Distress Alerts by at least one means, MSI.C. Transmit and receive locating signals, send MSI to other ships via EGC, Bridge-to-Bridge communications.D. Transmit and receive SAR communications, transmit Distress Alerts by at least one means, Bridge-to-Bridge communications.Answers: 4A1 - B 4A2 - C 4A3 - D 4A4 - A 4A5 - D 4A6 - A5A1 Which statement is true regarding a vessel equipped with GMDSS equipment that will remain in Sea AreaA1 at all times?A. The vessel must be provided with a radio installation capable of initiating the transmission of ship-to-shore Distress alerting from the position from which the ship is normallynavigated. B. VHF DSC alerting may be the sole means of Distressalerting.C. HF or MF DSC may satisfy the equipment requirement.D. HF SSB with 2182 kHz automatic alarm generator may satisfy the equipment requirement.5A2 What statement is true regarding the additional equipment carriage requirement imposed for the survival craft of vessels over 500 gross tons?A. Additional carriage of two radio equipped lifeboatsaft. B. A second radar transponder is required.C. Four additional portable VHF radios are required.D. The ability to communicate in all modes with any shore station.5A3 Vessels operating in which sea area(s) are required to carry either Inmarsat or HF equipmentor a combination thereof under GMDSS?A. All sea areasB. A4C. A3D. A15A4 Within a single sea area, what is the primary reason GMDSS imposes carriage requirements for different radio subsystems?A. Redundancy in duplicating all operational functions in the event of a system failure.B. Different subsystems are required to meet the specific equipment carriage requirements ofnational authorities.C. GMDSS vessels must be equipped to communicate in all modes with coast radio stations.D. The combined capabilities of redundant subsystems mitigate the risk of a single point of failure.5A5 If operating within Ocean Area A1, and outside of NAVTEX coverage, a GMDSS-equipped vesselmust carry:A. Equipment capable of reception of Maritime Safety Information by the Inmarsat enhanced group call system, or HF SITOR (NBDP).B. A GPS receiver.C. Equipment capable of maintaining a continuous DSC watch on 2187.5kHz. D. An Inmarsat-B terminal.5A6 What is the equipment carriage requirement for survival craft under GMDSS?A. At least three SCT units and two SARTs on every cargo ship between 300-500 gross tons and the same on all passenger ships regardless of tonnage.B. At least three SCT units and two SARTs on every passenger ship and cargo ships of 500 gross tonsand upwards.C. At least two radar transponders must be carried on every cargo ship of 300-500 gross tons and tworadar transponders (one for each side) of every passenger ship regardless of tonnage.D. All cargo vessels above 300 gross tons and every passenger ship regardless of tonnage must carry three SCT units and two SARTs.Answers: 4A1 - B 4A2 - C 4A3 - D 4A4 - A 4A5 - D 4A6 - ASection-C: F.C.C. Rules & Regulations: Key Topic #16: License and Personnel Requirements:16C1 Which FCC license meets the requirement to serve as a GMDSS operator?A. General Radiotelephone Operator’s License.B. GMDSS Radio Operator’s LicenseC. Marine Radio Operator’s Permit.D. GMDSS Radio Maintainer’s License.16C2 Which of the following statements concerning GMDSS Radio Operator requirements is false?A. Each compulsory vessel must carry at least two licensed GMDSS Radio Operators at all times while at sea.B. Each compulsory vessel must carry at least two licensed GMDSS Radio Operators at all times while at sea and may elect to carry a GMDSS Radio Maintainer as well.C. All communications involving Safety of life at sea must be logged as long as the compulsory vessel was not involved in such communications.D. While at sea, adjustments to, and the maintaining of, GMDSS equipment may be performed by the GMDSS Radio Operator as long as the work is supervised by an on-board licensed GMDSS Radio Maintainer.16C3 Which FCC license meets the requirements to perform or supervise the performance of at-sea adjustments, servicing, or maintenance which may affect the proper operation of the GMDSS station?A. General Radiotelephone Operator’s License with Shipboard RADAR endorsement.B. Marine Radio Operator’s Permit or GMDSS Maintainer’s license.C. GMDSS Radio Operator’s license or Marine Radio Operator’s Permit.D. GMDSS Operator’s/Maintainer’s license or GMDSS Maintainer’s license.16C4 Which statement is false regarding the radio operator requirements for a GMDSS-equipped ship station?A. Maintaining a record of all incidents connected with the radio-communications service that appear to be of importance to Safety of life at sea is not required.B. One of the qualified GMDSS radio operators must be designated to have primary responsibility for radio- communications during Distress incidents.C. A qualified GMDSS radio operator, and a qualified backup, must be designated to perform Distress, Urgency and Safety communications.D. While at sea, all adjustments or radio installations, servicing or maintenance of such installations that may affect the proper operation of the GMDSS station must be performed by, or under the supervision of, a qualified GMDSS radio maintainer.16C5 Which of the following are personnel, functional, or equipment FCC requirements of the GMDSS?A. One FCC licensed GMDSS radio operator in sea areas A1 & A2, two FCC licensed GMDSS radio operators in sea areas A3 & A4 and equipment carriage based on intended sea area of operations.B. Distress alerting and response, two USCG STCW GMDSS watchstanders, equipment carriage based on intended sea area of operations.C. Equipment carriage reduced for sea areas A3 & A4, Distress alerting and response and two FCC licensed GMDSS radio operators.D. Equipment carriage based on intended sea area of operations, distress alerting and response and two FCC licensed GMDSS radio operators.16C6 How many GMDSS radio maintainers must be carried aboard a compulsory vessel if the At-Sea maintenance method is used?A. One regardless of sea area of operation.B. Two in Sea Areas A3 and A4.C. Two in Sea Area A1.D. None of these answers are correct.Answers: 16C1 - B 16C2 - C 16C3 - D 16C4 - A 16C5 - D 16C6 - ASection-C: F.C.C. Rules & Regulations: Key Topic #16: Reserve Source of Energy:17C1 Which statement is false regarding the GMDSS requirement for ship sources of energy?A. The reserve sources of energy need to supply independent MF and HF radio installations at the same time.B. At all times while the vessel is at sea, a sufficient supply of electrical energy to operate the radio installations and charge any batteries which may be part of the reserve source of energy is required.C. An uninterruptible power supply or other means of ensuring a continuous supply of electrical power to all GMDSS equipment that could be affected by normal variations and interruptions of ship's power is required.D. If a vessel’s position is constantly required for the proper performance of a GMDSS station, provisions must be made to ensure position information is uninterrupted if the ship’s source of main or emergency energy fails.17C2 What is the meaning of “Reserve Source of Energy”?A. High caloric value items for lifeboat, per SOLAS regulations.B. Power to operate the radio installation and conduct Distress and Safety communications in the event of failure of the ship's main and emergency sources of electrical power.C. Diesel fuel stored for the purpose of operating the powered survival craft for a period equal to or exceeding the U.S.C.G. and SOLAS requirements.D. The diesel fueled emergency generator that supplies AC to the vessel’s Emergency power bus.17C3 Which term describes the source of energy required to supply the GMDSS console with power if the ship’s source of main or emergency energy fails?A. Emergency powerB. Ship's emergency diesel generatorC. Reserve Source of EnergyD. Ship's standby generator17C4 What characteristics describe the GMDSS Reserve Source of Energy (RSE)?A. Supplies independent HF and MF installations at the same time.B. Cannot be independent of the propelling power of the ship.C. Must be incorporated into the ship's electrical system.D. Must be independent of the ship's electrical system when the RSE is needed to supply power to the GMDSS equipment.17C5 What is the requirement for emergency and reserve power in GMDSS radio installations?A. Compulsory ships must have emergency and reserve power sources for radio communications.B. An emergency power source for radio communications is not required if a vessel has proper reserve power (batteries).C. A reserve power source is not required for radio communications.D. Only one of the above is required if a vessel is equipped with a second 406 EPIRB as a backup means of sending a Distress alert.17C6 Which of the following terms is defined as a back-up power source that provides power to radio installations for the purpose of conducting Distress and Safety communications when the vessel's main and emergency generators cannot?A. Emergency Diesel Generator (EDG)B. Reserve Source of Energy (RSE)C. Reserve Source of Diesel Power (RSDP)D. Emergency Back-up Generator (EBG)Answers: 17C1 - A 17C2 - B 17C3 - C 17C4 - D 17C5 - A 17C6 - BSection-C: F.C.C. Rules & Regulations: Key Topic #18: Equipment Testing:18C1 Under GMDSS, a compulsory VHF-DSC radiotelephone installation must be tested at what minimum intervals at sea?A. Annually, by a representative of the FCC.B. At the annual SOLAS inspection.C. MonthlyD. Daily18C2 Which statement concerning the testing of a compulsory radiotelephone station is false?A. Calling the USCG on VHF CH-16 or 2182.0 kHz is the most effective method.B. Tests may be accomplished by using the equipment for normal business.C. A daily test is necessary unless the equipment was used for routine traffic.D. The test may not interfere with communications in progress and must wait or be suspended if a request to do so is made.18C3 While underway, how frequently is the DSC controller required to be tested?A. Once a weekB. Once a dayC. Twice a weekD. Once a month18C4 At sea, all required equipment (other than Survival Craft Equipment) must be proven operational by:A. Testing at least every 48 hours.B. Weekly testing of all S.C.E. and other compulsory equipment.C. Daily testing or daily successful use of the equipment.D. Daily testing of the S.C.E. and weekly tests of the other equipment.18C5 The best way to test the MF-HF SITOR (NBDP) system is:A. Make a radiotelephone call to a coast station.B. Initiate an ARQ call to demonstrate that the transmitter and antenna are working.C. Initiate an ARQ call to a Coast Station and wait for the automatic exchange of answerbacks.D. Initiate an FEC call to demonstrate that the transmitter and antenna are working.18C6 The best way to test the Inmarsat-C terminal is:A. Send a message to a shore terminal and wait for confirmation.B. Send a message to another ship terminal.C. If the “Send” light flashes, proper operation has been confirmed.D. Compose and send a brief message to your own Inmarsat-C terminal.Answers: 18 - D 18C2 - A 18C3 - B 18C4 - C 18C5 - C 18C6 - D19C1 A vessel certified for service in Sea Area A3 is required to maintain a watch on:A. VHF Channel 70, MF Frequency 2182.0 kHz, HF on 8414.5 kHz and one other HF DSC frequency.B. MF Frequency 2187.5 kHz, HF on 8414.5 kHz and one other HF DSC frequency, HF on 4125.0 kHz.C. VHF Channel 70, MF Frequency 2187.5 kHz, HF on 8414.5 kHz and one other HF DSC frequency.D. VHF Channel 16, VHF Channel 70, MF Frequency 2187.5 kHz, HF on 8414.5 MHz and HF 4177.5 MHz.19C2 A vessel certified for service in Sea Area A-2 is required to maintain watch on:A. 2174.5 kHzB. 2182.0 kHzC. 2738.0 kHzD. 2187.5 kHz19C3 What are the mandatory DSC watchkeeping bands/channels?A. 8 MHz HF DSC, 1 other HF DSC, 2 MHz MF DSC and VHF Ch-70.B. 2 MHz MF DSC, 8 MHz DSC, VHF Ch-16 and 1 other HF DSC.C. VHF Ch-70, 2 MHz MF DSC, 6 MHz DSC and 1 other HF DSC.D. VHF Ch-70, 2 MHZ MF DSC, 4 MHZ DSC and 8 MHz DSC.19C4 Proper watchkeeping includes the following:A. Monitoring all required frequencies in the proper mode, setting the DSC scanner to 2 MHz, 4 MHZ and 8 MHz for ships in the vicinity, notifying the Master of any Distress alerts.B. After silencing an alarm all displays and/or printouts are read, monitoring all required frequencies in the proper mode, notifying the Master of any Distress alerts.C. Notifying the Master of any Distress alerts, setting the DSC scanner to 2 MHz, 4 MHZ and 8 MHz for ships in the vicinity, monitoring all required frequencies in the proper mode.D. Setting the DSC scanner only to the mandatory 2 MHz & 8 MHz, maintain continuous watch on 2182.0 kHz or 4125.0 kHz, notify the Master of any Distress traffic heard.19C5 Proper watchkeeping includes the following:A. Understanding normal operational indicators, setting the DSC scanner frequencies to minimize alarms, maintaining a proper log.B. Maintaining a proper GMDSS radio station log, understanding normal operational indicators, responding to and comprehending alarms.C. Responding to and comprehending alarms, logging out of Inmarsat-C terminals while at sea, maintaining a proper GMDSS radio station log.D. Maintaining a proper GMDSS radio station log, setting the DSC scanner frequencies to minimize alarms, logging out of Inmarsat-C terminals while at sea.19C6 Which is true concerning a required watch on VHF Ch-16?A. When a vessel is in an A1 sea area and subject to the Bridge-to-Bridge act and in a VTS system, a watch is required on Ch-16 in addition to both Ch-13 and the VTS channel.B. It is not compulsory at all times while at sea until further notice, unless the vessel is in a VTS system.C. When a vessel is in an A1 sea area and subject to the Bridge-to-Bridge act and in a VTS system, a watch is not required on Ch-16 provided the vessel monitors both Ch-13 and the VTS channel.D. It is not always compulsory in sea areas A2, A3 and A4.20C1 Which of the following statements meets requirements for 47 CFR 80 Subpart-W?A. GMDSS Radio Logs may not be retained aboard compulsory vessels in an electronic file (e.g., word processing) format.B. GMDSS Radio Logs must contain entries of all Distress and Urgency communications affecting your own ship.C. GMDSS Radio Logs must be retained aboard compulsory vessels for a period of at least 90 days in their original form.D. Entries in the GMDSS Radio Log are only required for communications within the vessel's intended Sea Area of operation.20C2 Which of the following statements is false?A. Key letters or abbreviations may be used in GMDSS Radio Logbooks if their meaning is explained.B. Urgency communications may need to be entered in the GMDSS radio log.C. Distress communications heard do not require entries if the vessel did not participate in SAR activity.D. Log entries of VHF Safety broadcasts are not required.20C3 Where should the GMDSS radio log be kept on board ship?A. Captain's officeB. Sea cabinC. Anywhere on board the vessel.D. At the GMDSS operating position.20C4 How long must the radio log be retained on board before sending it to the shoreside licensee?A. At least 30 days after the last entry.B. At least one year after the last entry.C. At least two years after the last entry.D. At least 90 days after the last entry.20C5 Which statement concerning radio log archival by the station licensee is false?A. Retain for two years if there are no Distress entries.B. Logs related to an investigation may not be destroyed without specific authorization.C. Retain for three years if there are Distress entries.D. Retain for one year unless there are Distress or Urgency entries.20C6 Which of the following logkeeping statements is false?A. Entries of all company communications using GMDSS satellite equipment are required.B. Entries relating to pre-voyage, pre-departure and daily tests are required.C. A summary of all Distress communications heard and Urgency communications affecting the station’s own ship. Also, all Safety communications (other than VHF) affecting the station’s own ship must be logged.D. Entries related to failures of compulsory equipment are required.Section-D: DSC & Alpha-Numeric ID: Key Topic #21: Call Signs and SELCALs21D1 A typical call sign for a large container ship under Chinese Flagwould be:A. KBZYB. WBX1469C. NADND. KPH21D2 What would the number 1090 indicate?A. A ship DSC MMSI number.B. A coast station SITOR (NBDP) SELCAL number.C. A coast station DSC MMSI number.D. A ship station SITOR (NBDP) SELCAL number.21D3 Which one of the following is a ship station SELCAL?A. 1104B. 1502352C. 11243D. 023*******21D4 Which of the following is the call sign for a U.S.C.G. coast station?A. NERKB. KPHC. WCCD. NMN21D5 What type of station would be assigned the call sign WAB2174?A. Tug boatB. Container shipC. Passenger shipD. Bulk Tanker21D6 What number will a ship station use to identify itself using SITOR (NBDP)?A. Four digit SELCAL.B. Five digit SELCAL or 9 digit SELCAL number identical to MMSI.C. 9 digit Inmarsat-B I.D. number.D. 9 digit Inmarsat-C I.D. number.Answers: 21D1 - A 21D2 - B 21D3 - C 21D4 - D 21D5 - A 21D6 - BSection-D: DSC & Alpha-Numeric ID: Key Topic #22: MMSI- MID and Ship Station I.D. Numbers: 22D1 What is the MID?A. Mobile Identification NumberB. Marine Indemnity DirectoryC. Mobile Interference DigitsD. Maritime Identification Digits22D2 How many digits are in the MID (Maritime Identification Digits)?A. 3B. 7C. 9D. 1022D3 What does the MID (Maritime Identification Digits) signify?A. Port of registryB. NationalityC. Gross tonnageD. Passenger vessel22D4 Which of the following numbers indicates a CHINA flag shipstation?A. 036627934B. 243537672C. 412426791D. 00338231522D5 Which of the following MMSI numbers indicates a CHINA flag shipstation?A. 412326890B. 033609991C. 303236824D. 25732681922D6 Which of the following numbers indicates a ship station MMSI?A. 003372694B. 030356328C. 3384672D. 623944326Section-D: DSC & Alpha-Numeric ID: Key Topic #23: MMSI; Group and Coast Station I.D. Numbers:23D1 A DSC call is received from a station with a MMSI number of 003669991. What type of station made the call?A. A vessel operating in Sea Area A3.B. A group ship stationC. A China. coast stationD. An Intercoastal vessel23D2 A valid MMSI number for a DSC call to a specific group of vessels is:A. 003664523B. 338462941C. 003036483D. 03032793123D3 A MMSI 030346239 indicates what?A. Group MMSIB. Inmarsat-C I.D. numberC. Coast stationD. Ship station23D4 Which of the following statements concerning MMSI is true?A. Coast station MMSI numbers have 9 digits starting with 4.B. All MMSI numbers are 9 digits and contain an MID.C. Ship station MMSI numbers can be 7 digits or 9 digits depending on the Inmarsat terminal.D. Group MMSI numbers must begin with 2 zeros.23D5 Which of the following statements concerning MMSI is false?A. All Coast Station MMSI must begin with 2 zeros.B. All Coast Station MMSI must begin with the MID then 2 zeros.C. A group call must begin with a single zero followed by the MID.D. The first 3 digits of a ship MMSI comprise the MID.23D6 Which of the following statements concerning MMSI is true?A. All ship station MMSI must begin with a single zero and include the MID.B. All group station MMSI must begin with the MID.C. None of these answers are correct.D. All Coast Station MMSI must be 9 digits and begin with the MID and then two zeros.。
元音和辅音的声学特征(丁红卫)
元音和辅音的声学特征
6
2. Human speech mechanism
Cross-section of the vocal tract
元音和辅音的声学特征
7
3. Representing the sounds of speech
Two ways to represent an ephemeral (短暂 的), time-bound (时间界限的)
– Thyroid (tc 甲状腺)
– Cricoid (cc环状软骨)
– Arytenoid (ac杓状软骨) 元音和辅音的声学特征
20
4.2 How vocal folds vibrate
Vocal folds open, air pass unimpeded: – Voiceless
Vocal folds vibrate: – Voiced
Relationship between pitch and fundamental frequency – Not linear – But logarithmic
Linear: difference same 100Hz-200Hz, 300Hz-400Hz Logarithmic: 100Hz-200Hz(1:2), 300Hz-400Hz(1:1.5)
元音和辅音的声学特征
Dr. Hongwei Ding 丁红卫
元音和辅音的声学特征
1
Introduction
1. Introduction to phonetics 2. Human speech mechanism 3. Representing the sounds of speech 4. The larynx, voicing and voice quality 5. Vowels 6. Approximants 7. Plosives 8. Fricatives 9. Nasals
kSamples Frequency-to-Digital Conversion of a 10 MHz FM Signal
A Delta–Sigma PLL for14-b,50kSample/s Frequency-to-Digital Conversion of a10MHz FM Signal Ian Galton,William Huff,Paolo Carbone,and Eric SiragusaAbstract—In many wireless applications,it is necessary to demodulate and digitize frequency or phase modulated signals. Most commonly,this is done using separate frequency discrim-ination and analog-to-digital(A/D)conversion.In low-cost IC technologies,such as CMOS,precise analog frequency discrim-ination is not practical,so the A/D conversion is usually per-formed in quadrature or at a nonzero intermediate frequency (IF)with digital frequency discrimination.While practical,the approach tends to require complicated A/D converters,and ac-curacy is usually limited by the quality of the A/D conversion. This paper presents an alternative structure,referred to as a delta–sigma frequency-to-digital converter(16FDC),that si-multaneously performs frequency demodulation and digitization. The16FDC is shown to offer high-precision performance with very low analog complexity.A prototype of the key component of the16FDC has been fabricated in a0.6 m,single-poly, CMOS process.The prototype achieved50kSample/s frequency-to-digital conversion of a10MHz frequency-modulated signal with a worst case signal-to-noise-and-distortion ratio of85dB and a worst case spurious-free dynamic range of88dB.I.I NTRODUCTIOND IGITAL signal processing is increasingly used in placeof analog processing in wireless communication systems to reduce manufacturing costs,improve reliability,and allow computer access.In receivers,the requisite digitization usually is performed after the radio-frequency(RF)signal has been down-converted to an intermediate frequency(IF)in the0–100 MHz range.The majority of wireless signal formats are based on frequency or phase modulation,so the demodulation process usually involves some form of frequency discrimina-tion.Conventional IF receiver architectures that perform fre-quency demodulation and digitization are shown in Fig.1. Each performs frequency demodulation and digitization and therefore can be viewed as a type of frequency-to-digital converter(FDC).The conceptually simplest system consists of an analog frequency discriminator and an analog-to-digital converter(ADC),as shown in Fig.1(a).While practical inManuscript received April30,1998;revised July28,1998.This work was supported by the National Science Foundation and by the California MICRO program in conjunction with Rockwell International Corp.I.Galton,W.Huff,and E.Siragusa are with the Department of Electrical and Computer Engineering,University of California,San Diego,La Jolla,CA 92093USA.P.Carbone is with the Instituto di Elettronica,Universit´a di Perugia,Perugia 06125Italy.Publisher Item Identifier S0018-9200(98)08866-0.(a)(b)(c)mon systems that use separate A/D conversion and frequency discrimination to achieve the equivalent of frequency-to-digital conversion. discrete-component systems,high-precision analog frequency discrimination is difficult to achieve in low-cost integrated circuit(IC)technologies such as CMOS.The systems shown in Fig.1(b)and(c)avoid this difficulty through the use of digital discriminators.The system of Fig.1(b)performs in-phase and quadrature demodulation,dual A/D conversion,and digital discrimination,and the system of Fig.1(c)performs bandpass A/D conversion and digital discrimination.Recent advances inbandpassFig.2.A high-level view of the prototype FDC preceded by an IFfilter. Recently,new DPLL-like structures have been proposed that use oversampling and quantization noise shaping to limit the portion of the quantization noise that resides within the signal band[9]–[12].The key component of the FDC presented in this paper,referred to as a delta–sigma phase-locked loop(has theform(1)where is the instantaneousfrequency relativeto,andas an input signal,F/D conversion is the process of extracting,sampling,anddigitizing.The prototype FDC is referred to asa(2)whereand is spectrally shaped quantization noiseequivalent to that of a conventionalsecond-order(3)It follows that,from a signal-processing point of view,theistaken to have a bandwidth of25kHz.Therefore,the effectiveoversampling ratio is approximately200.As inato reside at low frequencies,and noise shapingcauses the quantizationnoise to reside primarily at highfrequencies in the discrete-time spectrum.Consequently,muchof the quantization noise can be removed by a low-passdigital decimationfilter followingtheto be slightly larger than had it been sampled uniformly andit gives rise to harmonic distortion.The slight increase inthe sampled signal bandwidth is not a significant problem inthat it simply reduces the effective oversampling ratio by asmall amount.However,for applications such as broadcastFM,the distortion caused by nonuniform sampling would beintolerable if left uncorrected.Fortunately,the necessary correction can be performed inconjunction with the decimationfiltering at the cost of a rel-atively small amount of additional digital hardware[15].Theresulting structure is referred to as the NTU decimationfilter.A hardware-efficient version of the structure is presented at theregister-transfer level in Section IV.Through a combinationof digitalfiltering,nonuniform downsampling,and two-pointinterpolation,the NTU decimationfilter is able simultaneouslyto remove most of the out-of-band quantization noise andcorrect the distortion to better than14-b linearity.As demonstrated by the results of this paper,the primaryadvantage oftheblock,a digital constant adder,a4-b counter,and a timing controller.Two events control thetiming:1)transitionof from low to high and2)transitionof the“Carry”signal from low to high.The4-b counter isdriven by an80MHz external clock,and the“Carry”signalgoes high on its terminal count.In normal operation,the outputof the XOR gate goes highwhen goes high.This causesthe positive currentsourceFig.3.The 16PLL functional diagram.the capacitor.The negative currentsourceand adding nine.Thiseffectively sets the time until the next “Carry,”which closes the feedback loop.Because it is generated by the 4-b counter,the “Carry”signal and its time-shifted derivatives are synchronous with the 80MHz clock.The 2-b ADC samples and quantizes on the rising edges of the“Carry1”signal.B.Signal-Processing DetailsIt will now be shown that,from a signal-processing point of view,theis loaded intothe 4-b counter each time the“Carryis loaded into the4-b counter each time the “Carry”signal goes high in theversion of Fig.4(a).It is easy to verify that this difference does not alter the “Carry”signal or any of its time-shifted derivatives,and therefore has no effecton,of the rising edgesof.Similarly,the feedback information is represented by the sequence oftimes,,of the rising edges of the “Carry”signal.The time information controls the gating of the current sources into the summing capacitor and therefore gets converted into a sequence of analog voltages across the summing capacitor.The ADC converts the sequence of analog voltages to a sequence of digital values.The digital sequence ultimately controls the data that gets loaded into the counter,which generates the feedbacksignal .The inputsignalchangesthe Therefore,can be viewed astheth sample period will be definedasGiventhat corresponds to a hard limited FM signal,it follows that the sample period varies as a functionofand has the form of asecond-orderdifferencer,a constant offset,and adiscrete-time integrator,as indicated in Fig.5.The heuristics behind this assertion are as follows.In each sample period,the positive current source is connected to the summing capacitorfor(a)(b)Fig.4.(a)A functionally equivalent but simplified version of the 16PLL.(b)The flow of signals in the 16PLL.Fig.5.The signal-processing operations performed by the charge pump and associated logic.period,namely,3.5periods of the 80MHz external clock.Thus,in each sample period,a “packet”of charge proportionaltois the step size of the ADC and the ratio ofthe positive and negative currents is nominally,the value at the input to the ADC attimeis given byThefrom their ideal values result in anintegrator gain error,and a deviation in the ratioFig.6.The signal-processing operations performed by the counter.its nominal value of 3.5/4results in an offset error prior to the integrator.The effects of such deviations with respecttoth sample period affects notonly butalso all future values of the sequence,namely,For example,suppose that a value of nine is loaded into the counter duringtheperiods of the 80MHz externalclock after its previous rising edge,soperiodsof the 80MHz clock,sooperation and the counter intheoperation and adiscrete-time integrator,as indicated in Fig.6.Note that Fig.6also contains the replacement indicated in Fig.5.Fig.7showsthecan be writtenas(5)Theoccurs attime(i.e.,withFig.8.The signal-processing equivalent of the 16PLL redrawn to show it in the form of a second-order 16modulator.Fig.9.The simple switched current source and the overshoot problem.becomes(6)With thedefinitionNotethatrelativetoth samplingperiodThus,relativetoand achieved bythe4offset,theTherefore,relative deviationsbetween the positive and negative currents do not affect the quantization noise shaping performance ofthefrom their ideal values result ina gain error in the signal processing associated with the charge pump.In the system of Fig.8,this corresponds to a gain of1is the gain error.Thus,the gain error has the same effect inthetrack each other so as tominimizeFig.10.A simplified schematic diagram of the negative switched current source used in the16PLL.drainofA.When“A,which causesthe voltage at the drainoftransfer functiongivenbyrepresents a good compromise with respect to these tradeoffs.1Moreover,it can be implemented efficiently using a multiplier-free recursive structure.Thefilteredsequence still contains the low-frequencysignal information,but with much of the high-frequency quan-tization noise removed.The maximum passband magnitude ofthe combfilter is givenbyis the residual quantization noiseafter thefiltering process.Alternatively,it follows from(4)that10z0501Fig.11.The NTU decimation filter high-levelstructure.Fig.12.The NTU downsampler.B.NTU DownsamplerThe NTU downsampler is shown at the register-transfer level in Fig.12.It consists of two subblocks:the uniform clock generator and the interpolating downsampler .The uniform clock generator extracts the sample time informationfromto determine which pairs of actual sample times straddle the desired sample times.The interpolating downsampler performs frequency estimation at the desired sample times by two-point interpolation about these actual sample times.The desired sample times are defined to occur at uniform intervalsofis the average downsampling ratiobecauseis 256dc offsetof from(10)and causes a value proportionaltoSince1.After exceeds zero,the output of the hard limiterswitches from1,so the scaled hard limiter output will combine withthe,insuring that the next valueofis negative.This effectively sets a new desired sample time.Thus,each rising edgeofis the quantization noise remainingatth sample of the resulting sequence isthe nonuniform phase estimate attimeFig.13.The interpolation between actual sample points to estimate the desired downsampled values.wheretwo-point interpolation is usedto find the uniform phaseestimatewhich is depicted graphically in Fig.13.From (9),(11),(12),and Fig.12,it followsthatoperation performeduponandas well as that imposed by the effective moving average of the desired signal performed by the nonuniform-to-uniform conversion processing.V.E XPERIMENTAL R ESULTSAs depicted in Fig.14,theprototypewithkHzandFig.14.Measurement schematic and PSD plots of representative measured data.The PSD of the16PLL output is in units of decibels/hertz relative to the 16PLL quantization step size12(for ease of comparison to the corresponding data from a conventional16)modulator.The PSD of the16FDC outputis in units of decibels/hertz relative to1Hz2(because the output of the16FDC is a frequency estimate in units of hertz).Fig.15.Measured SINAD and SFDR versus frequency deviation(i.e.,message signal amplitude).measured SINAD and SFDR reach their maximum values.Inan FDC,the same measurements are performed except thefrequency deviation of the input signal is increased instead ofthe amplitude.Fig.15shows a plot of measured SINAD andSFDR values as functions of increasing frequency deviation.In each case,the input signal was an FM signal with a12kHz sinusoidal message signal.The resulting peak SINADand SFDR were89and94dB,respectively.In general,thepeak SINAD and SFDR were found to depend somewhat onthe choice of the frequency and offset of the message signalused to generate the FM input signal.This is not surprising,as the output sample times of the2052IEEE JOURNAL OF SOLID-STATE CIRCUITS,VOL.33,NO.12,DECEMBER1998Fig.17.Die photograph.TABLE IM EASUREMENT SUMMARYAs shown in the figure,the maximum in-band intermodulation product for this testwas75kHz.In thiscase,only 14%of the full no-overload range of theprototypem,single-poly,CMOSprocess,and an efficient digital nonuniform-to-uniform dec-imation filter architecture has been proposed.Together,these components were shown to achieve 50kSample/s frequency-to-digital conversion of a 10MHz FM signal with a worst case SINAD of 85dB and a worst case SFDR of 88dB.The system performs limiter-discriminator FM demodulation and digitization.Relative to other approaches involving separate A/D conversion and FM discrimination,it offers the advan-tage of excellent performance with very low analog circuit complexity and only moderate digital complexity.R EFERENCES[1]S.R.Norsworthy,R.Schreier,and G.C.Temes,Delta–Sigma DataConverters:Theory,Design,and Simulation .Piscataway,NJ:IEEE Press,1997.[2] A.K.Ong and B.A.Wooley,“A two-path bandpass 61modulatorfor digital IF extraction at 20MHz,”in IEEE ISSCC Dig.Tech.Papers ,Feb.1997,pp.212–213.[3]M.Song,J.Park,W.Joe,M.J.Choe,and B.S.Song,“A fully-interated5MHz-IF FM demodulator,”in Proc.IEEE Custom Integrated Circuits Conf.,May 1997.[4]S.Jantzi,K.Martin,and A.Sedra,“A quadrature bandpass 61modulator for digital radio,”in IEEE ISSCC Dig.Tech.Papers ,Feb.1997,pp.216–217.[5]W.c.Lindsey and C.M.Chie,“A survey of digital phase-locked loops,”Proc.IEEE ,vol.69,pp.410–431,Apr.1981.[6]G.M.Bernstein,M.A.Lieberman,and A.J.Lichtenberg,“Nonlineardynamics of a digital phase locked loop,”IEEE mun.,vol.37,pp.1062–1070,Oct.1989.[7] C.A.Pomalaza-Raez and C.D.McGillem,“Digital phase-locked loopbehavior with clock and sampler quantization,”IEEE mun.,-33,no.8,pp.753–759,Aug.1985.[8] B.C.Sarkar and S.Chattopadhyay,“Symmetric lock-range multi-levelquantized digital phase locked FM demodulator,”IEEE mun.,vol.38,no.12,pp.2114–2116,Dec.1990.[9]I.Galton and G.Zimmerman,“Combined RF phase extraction anddigitization,”in Proc.IEEE Int.Symp.Circuits and Systems ,May 1993,pp.1104–1107.[10]I.Galton,“Analog-input digital phase-locked loops for precise frequencyand phase demodulation,”IEEE Trans.Circuits Syst.Part II ,vol.42,pp.621–630,Oct.1995.GALTON et al.:DELTA–SIGMA PLL2053 [11]R. D.Beards and M. A.Copeland,“An oversampled delta sigmafrequency discriminator,”IEEE Trans.Circuits Syst.,vol.41,pp.26–32,Jan.1994.[12]M.Hovin,A.Olsen,nde,and C.Toumazou,“Delta-sigmamodulators using frequency-modulated intermediate values,”IEEE J.Solid-State Circuits,vol.32,pp.13–22,Jan.1997.[13]I.Galton,W.Huff,P.Carbone,and E.Siragusa,“A16PLL for14b50kSample/s frequency-to-digital conversion of a10MHz FM signal,”in Dig.Tech.Papers1998IEEE Int.Solid-State Circuits Conf.,vol.41,pp.366–367,Feb.1998.[14]J.C.Candy,“A use of double integration in sigma-delta modulation,”IEEE mun.,-33,pp.249–258,Mar.1985.[15]W.Huff and I.Galton,“Nonuniform to uniform decimation for delta-sigma frequency-to-digital conversion,”in Proc.1998IEEE Int.Symp.Circuits and Systems,May1998,pp.365–368.[16]U.L.Rohde,J.Whitaker,and T.T.N.Bucher,CommunicationsReceivers,2nd ed.New York:McGraw-Hill,1997.Ian Galton received the Sc.B.degree from BrownUniversity,Providence,RI,in1984and the M.S.and Ph.D.degrees from the California Institute ofTechnology,Pasadena,in1989and1992,respec-tively,all in electrical engineering.He is currently an Associate Professor at theUniversity of California(UC),San Diego.He wasformerly with UC Irvine,Acuson,and Mead DataCentral and has acted as a regular Consultant forseveral companies.His research interests involveintegrated signal-processing circuits and systems for communications.He has received four patents.Dr.Galton received the Caltech Charles Wilts doctoral thesisprizeWilliam Huff received the B.S.degree in electrical engineering from the University of California,Los Angeles,in1990.He is currently pursuing the Ph.D. degree in electrical engineering at the University of California,San Diego.From1990to1996,he was an Analog Circuit Designer with Lear Astronics,Santa Monica,CA. His research interests include signal processing for communication systems and design of integrated circuits including delta–sigma frequency-to-digital converters and frequencysynthesizers.Paolo Carbone received the engineer’s degree andthe Dottorato di Ricerca degree from the Universityof Padova,Italy,in1990and1994,respectively.He joined the Electronics Engineering Depart-ment of the University“Roma Tre”in Rome,Italy,in1994as an Assistant Professor.Since1997,he has been an Assistant Professor at the Istitutodi Elettronica,University of Perugia,Italy.Hisresearch interests cover the application of digitalsignal-processing techniques to the measurementscience and the issues related to their practicalimplementation.Eric Siragusa received the B.S.degree(summa cumlaude)in electrical engineering from the Universityof California,Irvine,in1996.He is currently pur-suing the Ph.D.degree in electrical engineering atthe University of California,San Diego.From1996to1998,he was a Design Engi-neer with Newport Microsystems Corp.,Irvine,CA,where he was involved in the system and circuitlevel design of mixed-signal communication IC’s.His research interests include signal processing andIC design for communication systems and circuits. Mr.Siragusa received a Center for Wireless Communication Graduate Student Fellowship in1998.。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Institute of Phonetic Sciences,University of Amsterdam,Proceedings 21 (1997), 143-153FORMANT FREQUENCIES OF DUTCH VOWELS IN TRACHEOESOPHAGEAL SPEECH* Corina J. van As1,2, Annemieke M.A. van Ravesteijn1, Florien J. Koopmans-vanBeinum1, Frans J.M. Hilgers2, Louis C.W. Pols11Institute of Phonetic Sciences, University of Amsterdam, Amsterdam, TheNetherlands2The Netherlands Cancer Institute/Antonie van Leeuwenhoek Hospital, Amsterdam,The NetherlandsAbstractIn the present study Dutch vowel formant characteristics of laryngectomizedtracheoesophageal speakers are investigated. Both vowels in CV nonsense syllables (/a/,/i/, and /u/), and vowels of stressed syllables in read-aloud text are studied. It appearedthat in the nonsense syllables the first formants were comparable to those of normalspeakers, but that the second formants were higher than in normals for the /a/, and /i/,and lower than in normals for the /u/. In read-aloud text however a large portion of theformant frequencies F1 and F2was significantly higher than the formant frequencies innormal speakers. It is thought that the higher formant frequencies can be explained bythe fact that the vocal tract is shorter in laryngectomized speakers, since the neoglottis inthese speakers has a higher position than the vocal folds in normal speakers.1. IntroductionIn a total laryngectomy the whole larynx, and thus the vocal folds are removed. The most widely used methods of restoring speech following total laryngectomy are tracheoesophageal (TE) and esophageal speech. Since the past two decades, TE speech has become the most preferred method of voice restoration. The disconnection of the upper and lower airways implies that, for esophageal speech, the air supply consists of injected volumes of air from the mouth into the esophagus. For TE-speech, a fistula is created between the trachea and the esophagus; this opening allows the insertion of a prosthesis, which acts as a one way valve through which pulmonary air can be directed into the esophagus. The main difference between esophageal speech and TE speech is therefore the air supply.In a few earlier studies on vowel formants in alaryngeal speech, esophageal speech is studied. Sisty and Weinberg (1972) studied both male and female esophageal speech. For both groups of speakers the systematic changes in formant frequencywere similar, and the mean formant frequencies of esophageal speakers were found to be consistently higher than those of normal speakers. For men the average increases were 122 Hz for the first formant, and 325 Hz for the second formant.Comparable observations were made by Rollin (1962) who studied English vowels in esophageal speech, and Kytta (1964) who studied Finnish vowels in esophageal speech. Sisty and Weinberg (1972) state that the consistency of this finding across languages, sex, and vowels shows that removal of the larynx does alter vocal-cavity transmission characteristics. They conclude that differences in tongue position (Nichols, 1968), and mouth opening do not fully explain this effect (Stevens and House, 1952; Fant, 1960); but that a reduction in the effective length of the vocal tract may account for these changes in formant frequencies. Kytta (1964) found a higher first formant for all vowels except /u/, /o/, and /e/, and a higher second formant for all vowels. His explanation for these changes is that following removal of the larynx, after which the base of the tongue is directly connected with the esophagus, the vocal tract loses a portion of its most posterior resonance cavity, which becomes apparent as a rise of the mean frequency results for all formants studied. In cineradiographical studies that Kytta (1964) performed in laryngectomees (n = 6), the neoglottis could be defined at the level of the sixth cervical vertebra in four patients, between the fifth and sixth cervical vertebra in one patient, and between the fourth and the fifth cervical vertebra in one patient.He also found that the shape and function of the neoglottis were not affected by the articulation of the three most extremely shaped vowels /a/, /i/, and /u/.Schilling and Binder (1926), and Beck (1931) studying German vowels in esophageal speech (n =1, and n =2, respectively), and Luchsinger (1952) studying Swiss esophageal vowels (n = 3) demonstrated in the cases investigated only little differences between vowel formant frequencies of normal and esophageal speakers.According to Damsté (1958) this is understandable since the buccopharyngeal cavity has changed little. Later however, Diedrich and Youngstrom (1966) who obtained cinefluorograms of a patient one day prior to and 20 months following surgery, demonstrated that the effective length of the vocal tract of this patient was reduced postoperatively.In the present study, in contrast to the earlier mentioned studies, tracheoesophageal speech is studied. For this purpose Dutch vowels are studied. Since tracheo-esophageal puncture is nowadays the most widely used method of vocal rehabilitation, it is interesting to get insight in the differences that occur between vowel formant frequencies of tracheoesophageal and normal speech. To the best of our knowledge no comparative data are available yet.2. Subjects and methods2.1 SubjectsSubjects were 17 male tracheoesophageal speakers. All of them underwent a standard total laryngectomy, and used a Provox® voice prosthesis (Hilgers and Schouwenburg, 1990). Ages varied from 45 to 81 years, with a mean of 65 years. The time after surgery varied from 9 months to 11 years, mean time after surgery was 6 years. Stoma occlusion was normally performed with thumb or finger in 5 patients, with a Provox®Stomafilter (Hilgers et al., 1996) in 10 patients, and with a Blom-Singer Adjustable Tracheostoma Valve (Blom et al., 1982) in 2 patients.2.2 Speech MaterialSpeech material was taken from an earlier study in this patient group in which the influence of stoma occlusion was investigated (van As et al., in press). For each patient series of nonsense syllables (CV) and a read-aloud text were recorded twice, one recording was made while the stoma was occluded by finger, and one recording was made while the stoma was occluded by a Provox® Stomafilter. In the nonsense syllables the vowel was always an /a/, /i/, or /u/, the consonant differed (/p/, /b/, /t/, /d/, /f/, /v/, /s/, /z/, /k/, and /g/ were used). The text contained all Dutch vowels except /ø/, and was the same as used in a study of Dutch vowel frequencies by Koopmans-van Beinum (1980). Since it is not expected that stoma occlusion may affect vowel formant frequencies (the use of an extratracheal device does not affect the source-filter system), for each patient both texts and both series of nonsense syllables were used in the investigation. The vowel formant frequencies for both occlusion conditions are grouped together for the nonsense syllables as well as for the texts.2.3 MethodsFor the recordings as well as for the formant frequency analysis, the Computerised Speech Lab of Kay Elemetrics Corporation (Lincoln Park, NJ, USA) was used. Via the external module of the Speech Lab, the speech data are digitally stored on a DAT tape. The DAT recorder was a portable Sony TCD8 recorder. The speech data are stored with a sample frequency of 48000 Hz. The microphone was a head-set microphone (AKG-c410), which is standardly used with the Computerised Speech Lab (CSL). The mouth-to-microphone distance was 2,5 centimeters.All vowels were auditorily and visually selected from the oscillogram and the audio signal. For the nonsense syllables only the vowels /a/, /i/, and /u/ were available from the speech material, each of the vowels was selected 5 times. From read-aloud text all Dutch vowels except the vowel /ø/, which was not used in the text, were available. From the other vowels in the read-aloud text 5 items were selected, except for the vowels /y/ and /Y/, which could only be used twice, and the vowel /u/ which could only be selected three times. The vowels that were used were the same for each subject, and selected on the basis of vowel environment criteria, i.e. stressed syllable, no nasal, and no /r/ or other surrounding consonants that could influence the formant frequencies.For each vowel the first (F1) and second (F2) formant frequency were measured usingFFT (Fast Fourier Transform) analysis, performed with CSL. The frame length used was 10 ms, a Hamming window was used for the analysis. Exact positions of the formant frequencies were determined visually at the energy peaks in the frequency spectrum. As a control formant frequencies of 10 /a/ vowels were also measured via the signal processing software package Praat (Boersma and Weenink, 1996).2.4 ControlsThe formant frequencies found in this study were compared to those of normal Dutch speakers from literature.Control values for the /a/, /i/, and /u/ formant frequencies for our nonsense syllables were obtained from a study by Pols et al. (1973). They studied vowel formants in 50 male Dutch speakers. Their words were of the type /h/-/V/-/t/. Control values for the vowel formant frequency in our read aloud text were compared with the formant frequencies found in a study of Koopmans-van Beinum (1980), who used the same text. The formant frequencies found in stressed syllables in read aloud text of both an untrained speaker and a trained speaker were used. In the present study the ratios between the first and the second formant were determined for comparison, for this comparison control values were also taken from studies mentioned above.2.5 StatisticsA t-test for one sample was used to investigate possible differences in formant values and ratios of the first and second formant between TE-speakers and the control values of normal speakers.3. ResultsBoth first and second formant frequencies could be measured in 81% of the vowels, as a whole 93,1% of the formants could be measured. Failures were spread over all data, and not specifically in one speaker or vowel. In some cases vowels were too short, or were recorded too loud and then clipped, in other cases formants were ‘missing’ or fell together. In Figure 1(a) a spectrum of a vowel /a/ with clear formants is given, in Figure 1(b) a spectrum of a vowel /a/ is shown in which the formants also are clearly visible, but the second formant is absent, most probably it falls almost together with the first formant. These figures are drawn with the program Praat developed by Boersma and Weenink (1996). For 8 /a/ vowels spoken by tracheoesophageal speakers results of the programs CSL and Praat were compared. The CSL program only draws a spectral envelope, but the formants that are found are comparable with those found in the spectrum drawn by Praat, the averaged absolute difference between formant values for both programs was 16 Hz.The two spectra in both figures below also give a clear indication of the large amount of noise in these tracheoesophageal voices. Only a few harmonics can be seen, the large amount of noise results in the absence of higher harmonics. These harmonics also show that the voice shown in Figure 1(a) contains more noise (fewer harmonics), than the voice shown in Figure 1(b).Figure 1(a). Spectrum of the vowel /a/, spoken by a tracheoesophageal speaker. Clear formants can be seen.Figure 1(b). Spectrum of the vowel /a/ spoken by a tracheoesophageal speaker, the second format is absent, and most probably falls together with the first formant.3.1. T racheoesophageal speech versus normal speech3.1.1 CV syllablesFirst the vowels /a/, /i/, and /u/ extracted from the nonsense syllables were studied. These vowels were chosen since they represent the most extreme articulation positions of the vocal tract. Comparison of the vowel formant frequencies found in the vowels /a/, /i/, and /u/ of the nonsense syllables with the vowel formant frequencies that were found by Pols et al. (1973) were performed by means of a Student’s t-test in which formant frequencies of each separate tracheoesophageal speaker were compared with the mean values of the normal speakers. The t-test showed certain differences between both speaker groups. The first formant showed no significant differences for either of the three vowels, the second formant showed a significant difference for all three vowels: the second formant of the /u/ was found to be significantly lower than the normal frequency, the second formants of the /a/ and /i/ were significantly higher than the normal frequency.Mean formant frequency values found for tracheoesophageal speech and normal speech are given in Table 1. In figure 2 a graphic representation of these differences is given.Table 1. Mean formant frequencies of Dutch vowels in syllables for male tracheoesophageal speakers (n = 17) and in words for male normal speakers (n = 50). Also the t-value and probabilities are given.Figure 2. Plot of the vowel formant frequencies in nonsense syllables of normal male Dutch speakers (Pols et al. 1973, average value of 50 speakers) and male tracheoesophageal Dutch speakers (average value of 17 speakers, this study).3.1.2 Read-aloud textRegarding the first and second vowel formant frequencies of stressed syllables in read-aloud text, a Student’s T-test between the control values of the trained and the untrained speaker, and the values found by the tracheoesophageal speakers showed significant differences between both speaker groups. In Table 2 the vowel formant frequencies are given, with t-value and probability. In Table 3 an overview is given of the differences that were found.Table 2. Vowel formant frequencies of vowels of stressed syllables in read-aloud text of one trained and one untrained speaker (Koopmans-van Beinum, 1980), and the mean vowel formant frequencies of vowels of stressed syllables in read-aloud text of the tracheoesophageal speakers (n = 17). For both conditions t-value and probability (p) are given.Table 3. Summary of all differences for first and second vowel formant frequencies in read-aloud text, between normal and tracheoesophageal (TE) Dutch speakers. Vowels are represented with IPA (International Phonetic Alphabet) symbols.A graphic representation of the formant frequency values found in text for the normal trained speaker and the normal untrained speaker and tracheoesophageal speakers (n = 17) is given in Figure 3.Figure 3. Plot of the vowel formant frequencies in stressed syllables of read aloud text of a male untrained Dutch speaker (Koopmans- van Beinum, 1980), a male trained Dutch speaker (Koopmans-van Beinum, 1980), and tracheoesophageal Dutch speakers (average value of 17 speakers).3.2. Ratios between F1 and F2.Each separate mean vowel ratio that was found for each TE-speaker was compared to the control value of the normal speaker group by means of a Students t-test for one sample. In Table 4 the mean ratios of the control groups and the TE speakers are given, both for syllables and text. Also the vowels for which the ratios were significantly different, are indicated.Table 4. Averaged ratios between first and second formant frequencies, of normal speakers in words, of TE-speakers in nonsense syllables, of an untrained and a trained speaker in stressed syllables of read-aloud text, and of TE-speakers in stressed syllables or read-aloud text.Vowel (IPA)Words-normalSyllablesTETextUntrainedTextTrainedText-TE* Statistical significant difference between ratios of vowels in syllables (p < 0.05).1 Statistical significant difference with ratio of untrained speaker (p < 0.05).2 Statistical significant difference with ratio of trained speaker (p < 0.05).As can be seen in the Table for part of the vowels the ratios were significantly different from those of the control group. For the syllables the ratios of the vowels /a/ and /i/ were higher, and the ratios of the vowel /u/ were lower in tracheoesophageal speech. For the vowels from the text the ratios of the vowels /u/, /a/, /α/, /i/, /I/, /ε/, /y/, and /Y/ were significantly different from the control ratio of an untrained speaker. The ratios of the vowels /i/, /I/, /ε/, /y/, and /Y/ were also significantly different from the control ratio of the trained speaker.3.3 Interspeaker differences between the tracheoesophageal speakersBy means of a paired Students t-test it was investigated for each speaker separately whether or not his formant frequencies extracted from read aloud text, differed from those of the trained and the untrained speaker. It appeared that 9 speakers significantly differed from both the trained and the untrained speaker. One tracheoesophageal speaker differed significantly only from the trained speaker. For the remaining seven tracheoesophageal speakers no significant differences with the trained or untrained speaker were found.4. Discussion and conclusionsThe aim of this study was to investigate the first and second vowel formant frequencies in male Dutch tracheoesophageal speakers. Studies reporting on vowel formant frequencies are from Beck (1931), Luchsinger (1952), Schilling and Binder. (1926), and Damsté (1958), who reported only a small difference between esophageal and normal speech, and from Sisty and Weinberg (1972), Rollin (1962) and Kytta (1964), who found higher formant frequencies in esophageal speech. To the best of our knowledge no reports on formant frequencies in tracheoesophageal speech are available yet. It can, however, be expected that the vocal tract is comparable to the vocal tract of esophageal speakers, since the type of surgery is the same. The only difference between esophageal and tracheoesophageal speech lies in the fact that tracheoesophageal speech, like normal speech, is pulmonary driven.As found in earlier studies on Finnish (Kytta, 1964), and English (Sisty and Weinberg, 1972; Rollin 1962) esophageal speech, also in the present study on Dutch tracheoesophageal speech higher vowel formant frequencies were found compared to normal speech. It was found that the first and second vowel formant frequencies of male Dutch tracheoesophageal speakers do differ significantly of those of normal male Dutch speakers. As a whole the “vowel triangle” is enlarged for the TE speaker group. An explanation for this might be, according to Sisty and Weinberg (1972), that the length of the vocal tract is shorter compared to normal speakers. Also the back of tongue might be a little lowered, due to the removal of the larynx. The formant frequencies found in this group had a large range of variation, which is in concordance with the observations of Rollin (1962) and Sisty and Weinberg (1972). The differences between TE-speakers may be larger than those between normal speakers, since the anatomy of the voice source and the vocal tract depends on the type and extent of the surgical intervention. These differences in vocal tract most probably also explain the intraspeaker differences that were found between the TE-speakers. This leads to significant differences in vowel formant frequencies compared to normal speakers in one part of the TE-speakers, and similar vowel formant frequencies compared to normals in another part of the TE-speakers. Also the ratios between the first and second formants are significantly different from the control ratios for part of the vowels. Performing videofluoroscopy recordings during speech may give more information about the length of the vocal tract and the position of the back of the tongue compared to normals, and can also give insight about the influence of the extent of the surgical intervention on this phenomenon. Also the influence of the different vowel formant frequencies and ratios on the intelligibility of the vowels should be studied.5. AcknowledgementsThe second author’s masters thesis was the basis for this report (van Ravesteijn, 1997). The authors wish to acknowledge the Maurits and Anna de Kock Stichting for their financial support of the equipment used for the recordings and formant analysis. They also wish to thank all patients for their participation in the study.6. ReferencesBlom, E.D., Singer, M. I., and Hamaker, R.C. (1982). “Tracheostoma valve for postlaryngectomy voice rehabilitation”. Annals of Otology, Rhinology, and Laryngology,91, 576-578. Boersma, P. and Weenink D.J.M. (1996). “Praat, a system for doing phonetics by computer, version3.4”. Report of he Institute of Phonetic Sciences, Amsterdam,132, 182 pp.Beck, J. (1931). “Zur Phonetik der Stimme und Sprache Laryngektomierter”, Zeitschrift für Laryngologie usw., Bd. 21, H6, 506-521.Damsté, P.H. (1958). Oesophageal speech after laryngectomy, Ph.D. Thesis, University of Groningen. Diedrich W.M., and Youngstrom, K.A. (1966). Alaryngeal speech. Springfield: Charles C. Thomas. Fant, G. (1960). Acoustic Theory of Speech Production. The Hague, The Netherlands: Mouton, ‘s Gravenhage.Hilgers, F.J.M., and Schouwenburg, P.F. (1990). “A new low-resistance, self retaining voice prosthesis (Provox®) for voice rehabilitation after total laryngectomy”. Laryngoscope,100, 1203-1207. Hilgers, F.J.M., Ackerstaff, A.H., Balm, A.J.M., and Gregor, R.T. (1996). “A new heat and moisture exchanger with speech valve (Provox® stomafilter”. Clinical Otolaryngology,21, 414-418.Koopmans-van Beinum, F.J. (1980). Vowel contrast reduction. An acoustical and perceptual study of Dutch vowels in various speech conditions. Ph.D. Thesis, University of AmsterdamKytta, J. (1964). “Finnish oesophageal speech after laryngectomy: Sound spectrographic and cineradiographic studies”, Acta Otolaryngologica (Stockholm), Suppl. 195.Luchsinger, R. (1952). “Der Mechanismus der Sprech- und Stimmbildung bei Laryngektomierten und die Ubungsbehandlung”, Practica Oto-Rhino-Laryngologica,XIV, Fasc. 4/5, 304-323. Nichols, A.C. (1968). “Loudness and quality in esophageal speech and the artificial larynx. In: J.C.Snidecor (Ed.), Speech Rehabilitation of the Laryngectomized (2nd ed). Illinois: Thomas, 108-127.Pols, L.C.W., Tromp, H.R.C., Plomp, R., (1973). “Frequency analysis of Dutch vowels from 50 male speakers”, The Journal of the Acoustical Society of America,53, 1093-1101.Rollin, W.J. (1962). A comparative study of vowel formants of esophageal and normal-speaking adults.Ph. D. Thesis, Wayne State University.Schilling, R., and Binder, H. (1926). “Experimentalphonetische Untersuchungen über die Stimme ohne Kehlkopf”, Archiv für Ohren-, Nasen- en Kehlkopfheilkunde, 115, 235-270.Sisty, N.L., and Weinberg, B. (1972). “Formant frequency characteristics of esophageal speech”, Journal of Speech and Hearing Research,15, 439-448.Stevens, K.N., and House, A.S. (1955). “Development of a quantitative description of vowel articulation”, The Journal of the Acoustical Society of America, 27, 484-493.Van As, C.J., Hilgers, F.J.M., Koopmans-van Beinum, F.J., and Ackerstaff A.H. (in press). “The influence of stoma occlusion on aspects of tracheoesophageal voice”, Acta Otolaryngologica. Van Ravesteijn, A.M.A, (1997). Evaluatie van tracheoesophageale spraak. Master’s Thesis, University of Amsterdam.IFA Proceedings 21, 1997153。