
Expressive Prosody for Unit-selection Speech Synthesis


Volker Strom, Robert Clark, Simon King
Centre for Speech Technology Research, The University of Edinburgh, Edinburgh, UK
vstrom@

Abstract

Current unit selection speech synthesis voices cannot produce emphasis or interrogative contours because of a lack of the necessary prosodic variation in the recorded speech database. A method of recording script design is proposed which addresses this shortcoming. Appropriate components were added to the target cost function of the Festival Multisyn engine, and a perceptual evaluation showed a clear preference over the baseline system.

Index Terms: speech synthesis, unit selection, prosody, recording script design

1. Introduction

The Festival unit selection speech synthesis system, Multisyn [1], achieves highly natural synthetic speech by avoiding use of an explicit model of prosody in terms of F0 and duration. Instead, large amounts of speech are recorded, so that each diphone is available in a variety of prosodic contexts. Of course, this means that there is no user control over the prosody of the resulting speech. Even when F0 and duration models are used in unit selection systems with large databases, and these models are learnt from speech as is [2], they still represent an "average prosody", sounding somewhat unnatural and monotonous, similar to that of diphone synthesis.

The Festival Multisyn engine did not previously model prosody at all, except for distinguishing sentence-internal from sentence-final phrase boundaries. This works surprisingly well, provided that the database speech style closely matches the required synthesis style; in particular, it is good for read newspaper-style text. But for generating prosody appropriate for conveying specific meanings, such as emphasis or the type of a question, some prosody control is essential (note that, unlike [3], we are not attempting to convey emotional content).

It remains an open question on what level "prosodic context" is best described, and how to design a suitable text corpus.
In our approach, we focus on emphasis and boundary tones, represented on the symbolic level.

2. Text corpus design

Corpus design, in this context also known as "recording script design" or simply "text selection", aims for maximal coverage of diphones in context. Above and beyond completeness of coverage, instances of diphones in multiple text types and reading styles are also desirable. Contextual features may include a stress flag for each half phone; the presence or absence of a boundary of a syllable, word, phrase, or sentence (three possible locations); or more abstract descriptions such as ToBI accent and boundary tones, pitch accents, nuclear accents, sentence mood, etc. The context may also include the identity of neighbouring phones.

To select the sentences to be recorded, a large text corpus is searched in order to determine the set of existing diphones-in-context. A subset of sentences is selected which covers them all, and which is as small as possible. The main advantage of making the subset as small as possible is the reduced effort of manually correcting the automatic annotations.
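This smallest-subset selection is an instance of greedy set cover. A minimal sketch of such a selection loop in Python (names are illustrative; the paper does not publish its implementation):

def select_script(sentences, units_of):
    # sentences: iterable of sentence ids
    # units_of:  dict mapping sentence id -> set of diphones-in-context it contains
    uncovered = set().union(*units_of.values())   # everything attested in the corpus
    script = []
    while uncovered:
        # take the sentence that covers the most still-uncovered units
        best = max(sentences, key=lambda s: len(units_of[s] & uncovered))
        gained = units_of[best] & uncovered
        if not gained:                            # nothing further can be covered
            break
        script.append(best)
        uncovered -= gained
    return script

Greedy selection does not guarantee the truly smallest cover, but it is the usual practical choice for this kind of script design.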
2.1. Standard approaches

In our standard approach, "context" refers to just syllable and word boundaries, plus lexical stress (as a binary feature, although the lexicon distinguishes primary, secondary and tertiary stress). In a text corpus of 442k newspaper sentences, 11585 distinct types of diphones-in-context were found. A subset of 7k sentences was selected that covers all of them.

But even with this restricted definition of context, complete coverage of all existing diphones is almost impossible to achieve. Figure 1 shows how many distinct diphones-in-context are found in subsets of the 442k newspaper sentences, made by randomly choosing 1/2, 1/4, ... of the corpus. Extrapolating this curve suggests that even 10 million newspaper sentences will not be enough to cover all diphones-in-context. This is true even for a definition of context that ignores intonational context entirely. It is questionable whether covering diphones in all syllable boundary contexts is more desirable for synthesis than covering all diphones in, for example, all boundary tone contexts.

[Figure 1: Number of different diphones as a function of corpus size (in sentences), for three definitions of context: with syllable boundaries, without syllable boundaries, and no context.]

Some approaches take more context into account, but still ignore prosody. [4] first covers all triphones found in a database of 153k sentences, then covers larger and larger units found in that database, up to morphemes with phone context. [5] attempts to cover as many pentaphones as possible.

In [8] and [3], small databases for specific prosodic expressions, such as contrast and yes/no-questions, were recorded in order to train speaker-independent prosodic models for duration and F0. For synthesis, a standard speech corpus is used (to save the cost of recording the specific prosodic expressions for each voice), hoping that it still yields what those prosodic models ask for.

Some text selection approaches aim to cover prosodic context in terms of phone duration and F0. [6] aims for an even distribution of the predicted F0 and duration, while in [7] the recorded speaker is explicitly told to read 525 sentences at three different speaking rates (slow, normal, fast) as well as in three pitch ranges (low, normal, high), yielding nine sub-corpora.

Models of expressive prosody are useless in unit selection synthesis if they ask for units that are simply missing from the speech database. This may sound trivial, but it is exactly the reason for the poor realisation of emphasized words in [8]. Selecting text on the basis of predicted prosody, on the other hand, is rather unreliable, due to the tendency of these predictors to average out the natural variance of prosody. Forcing a speaker to speak with a distinct pitch and rate, as in [7], runs the risk of unnatural-sounding joins when units from different sub-corpora are joined.

2.2. Our proposed approach

In order to avoid modelling prosody on the acoustic level, text marked up on a more abstract level was desired. Our initial idea was to use text generated by a dialogue system, which comes marked up with features such as given/new and theme/rheme. However, even the most sophisticated such system available to us, capable of producing 67k different sentences (about bathroom design, [9]), did not produce sentences of sufficient variety to form the basis of a general-purpose corpus.

In the search for text that has a natural variety of prosody, Lewis Carroll's children's stories "Alice in Wonderland" and "Through the Looking Glass" seemed to be good candidates. Theatre plays and movie scripts were also considered, but the most alluring feature of Carroll's stories was their existing markup of emphasis using typographical devices.

As shown in Table 1, the major part of the recording script consists of word lists, read to achieve four different prosodic contexts, as described in Section 3.2. Some specialist texts were added: spelling, digit strings, and addresses. Finally, a relatively small number of newspaper sentences were used in order to cover any remaining missing diphones.

3. Corpus description

3.1. Carroll

From Lewis Carroll's children's stories "Alice in Wonderland" and "Through the Looking Glass", 1434 sentences were selected manually from dialogue-rich sections. These are rich in questions, exclamations, quotations within spoken utterances and emphasized words such as contrastive and deictic pronouns. Furthermore, emphasized words are already capitalized in the text or are quotations within a spoken utterance:

  ...and even Stigand, the patriotic archbishop of Canterbury, found it advisable
  'Found WHAT?' said the Duck.
  'Found IT,' the Mouse replied rather crossly: 'of course you know what "it" means.'
  'I know what "it" means well enough, when I find a thing,' said the Duck: 'it's generally a frog or a worm. The question is, what did the archbishop find?'

The speaker was asked to read in a spirited manner, but not to give the characters different voices.

3.2. Word lists

The largest part of the speech corpus consists of lists of 2880 words, selected from the Unisyn lexicon [10], such that all diphones (with context) in phrase-final syllable position are covered.
Each word was read five times, with a fixed intonation pattern:

  Ace, ace, ace. Ace? Ace!
  Ache, ache, ache. Ache? Ache!

This covers continuation rise (L-H% at the commas), terminal intonation (L-L% at the period and exclamation mark) and interrogative intonation (H-H% at the question mark). The speaker was asked to emphasize the last word. Thus, many diphones in emphasized words are covered, but by no means all of them, since the emphasis is mainly on the lexically stressed syllable, which in polysyllabic words is not necessarily the last one. The intonation was rehearsed at the beginning of each recording session and was found to be fairly stable throughout the word list sub-corpus.

Word selection criteria other than diphone coverage in word-final syllables were, in order: exclude homographs and function words, avoid proper nouns, prefer short words, and prefer more frequent words.

The main problem with this sub-corpus seems to be the poor performance of the automatic phone alignment method [1]. Because the word lists are not continuous speech like the other parts of the corpus, they appear to cause problems when training HMMs from a flat start. The resulting models have problems in accurately placing silence/speech and speech/silence boundaries. This affects not only stops, but also voiceless fricatives and sometimes even vowels. A number of improvements to the procedure described in [1] were attempted, but no satisfying solution has been found yet.

3.3. Newspaper

In the 442k newspaper sentences mentioned in Section 2.1, 6998 distinct diphones-in-context, but without the syllable boundary feature, were found. This set was compared to the diphone set covered thus far by the word list and Carroll's children's stories. The difference was 2278 diphones, most being word-initial or across a word boundary. The fall-back strategy of the unit selection algorithm can fix these by inserting a short pause. But for the remaining 413 word-internal diphones, 283 newspaper sentences were selected to be added to the recording script.

3.4. Corpus statistics

Table 1 shows the sizes of the sub-corpora making up the entire speech database.

[Table 1: Partitions of the speech database by sub-corpus; 40916 words (7 h) in total.]

The speaking rate in words per minute varies considerably between sub-corpora: 56 for the word list, 93 for Carroll, and 161 for the newspaper text. This is reflected in the phone duration statistics: Figure 2 shows the duration distribution of /aa/.

[Figure 2: Distribution of durations in seconds for the phone /aa/ in the different sub-corpora.]

[Figure 3: Distribution of F0 values in Hz in the different sub-corpora.]

4. Automatic text-based prosodic labelling

The emphasis labels are based solely on textual markup: a word is considered emphasized if it is in uppercase, or if it is a short quotation (one or two words) within a spoken utterance, or is a short exclamation (including the 2880 such words in the word list sub-corpus).

In our system, the type of boundary tone in ToBI notation is determined from punctuation, the POS of the sentence-initial word, and a flag indicating whether the sentence contains the word "or" immediately following a comma. Wh-questions get an L-L% boundary tone, yes/no questions an H-H%, and alternative questions (such as "1, 2, 3, or 4?") get an H-H% at each alternative but the last one, which gets an L-L%.
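These labelling rules are simple enough to restate directly as code. A schematic sketch (function names and the POS-tag convention are illustrative; a tokenizer and POS tagger are assumed to be available):

def is_emphasized(word, short_quote_in_speech=False, short_exclamation=False):
    # Emphasis comes from textual markup only: uppercase words, short
    # quotations (one or two words) inside a spoken utterance, or short
    # exclamations.
    return word.isupper() or short_quote_in_speech or short_exclamation

def boundary_tone(sentence, first_word_pos, or_after_comma):
    # Sentence-final boundary tone in ToBI notation, decided from punctuation,
    # the POS of the sentence-initial word, and the "or"-after-comma flag.
    if not sentence.rstrip().endswith("?"):
        return "L-L%"                    # statements and exclamations
    if first_word_pos == "WH":           # wh-question (who/what/where/why/how ...)
        return "L-L%"
    if or_after_comma:                   # alternative question, e.g. "1, 2, 3, or 4?"
        return "H-H% on every alternative but the last, which gets L-L%"
    return "H-H%"                        # yes/no question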
The sentence-initial word's POS distinguishes Wh-questions from other questions, although the interrogative pronoun may occur later, as in "But how do you know?" or "And what comes next?". The single-word question "What?" and its equivalents like "What was that again?" are also likely to be pronounced with an H-H%. An exception list is used to deal with these.

Questions which are not Wh-questions are either yes/no questions or alternative questions, which is decided by the "or" flag. This rule is imperfect: consider, e.g., "Why can't we do that, or the people of Glasgow do that?", found in the newspaper text. The text corpus was manually checked for this type of Wh-question that looks like an alternative question. Recognizing them automatically is left as a future improvement; currently a user of the synthesis system is required to re-formulate the question as, e.g., "Why can't we or the people of Glasgow do that?" in order to get the desired intonation.

Accents are not labelled or modelled yet.

5. Target cost function

The default target cost is a weighted sum of normalized components, which each score how well a candidate diphone matches the given target. These features are (most highly weighted first): lexical stress, phrase-finality, part of speech (noun, verb, or function word), position of the diphone in its syllable, position of the diphone in its word, left phonetic context and right phonetic context [1].

In the new system with prosody, a relatively large penalty is added when one half of the target diphone is a vowel and should be emphasized, but the candidate is not, or vice versa. Another penalty is added when the boundary tones of target and candidate (L-L%, L-H%, H-H%, or NONE) do not agree. Note that there are no components for accent, F0, or duration. The relative weights for the target cost components were set manually and are not optimal.
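In outline, the extended cost has the following shape; the weight values below are purely illustrative, since the paper gives only the ranking of the default components and notes that the weights were hand-set:

# Illustrative weights: the default components in their published ranking,
# plus the two new prosodic penalties, which are relatively large.
WEIGHTS = {"stress": 1.0, "phrase_final": 0.9, "pos": 0.8, "syl_pos": 0.7,
           "word_pos": 0.6, "left_ctx": 0.5, "right_ctx": 0.4,
           "emphasis": 2.0, "boundary_tone": 1.5}

def target_cost(target, candidate):
    # Weighted sum of normalized mismatch components (0 = match, 1 = mismatch).
    cost = 0.0
    for feat in ("stress", "phrase_final", "pos", "syl_pos",
                 "word_pos", "left_ctx", "right_ctx"):
        cost += WEIGHTS[feat] * (target[feat] != candidate[feat])
    # Large penalty when a vowel half of the diphone disagrees in emphasis.
    if target["half_is_vowel"] and target["emphasized"] != candidate["emphasized"]:
        cost += WEIGHTS["emphasis"]
    # Penalty when boundary tones (L-L%, L-H%, H-H%, or NONE) disagree.
    if target["boundary_tone"] != candidate["boundary_tone"]:
        cost += WEIGHTS["boundary_tone"]
    return cost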
6. Listening tests

Two web-based listening tests were used to evaluate the phrase boundary component and the emphasis component.

6.1. Boundary listening test

The purpose of this test was to find out whether the phrase boundary component improves the overall quality of the system, and not to test whether listeners recognize yes/no questions when they are not syntactically marked as such, as in [8].

Adding a target cost component for boundaries increases the pressure on the unit selection algorithm, because it exacerbates the problem of data sparsity. When the unit selection is pushed towards a particular intonation pattern at the cost of, for example, less smooth joins, the outcome may be worse than with the default system, even when the default system produced less appropriate intonation.

100 newspaper sentences were selected for the listening test, 25 each of: yes/no questions, Wh-questions, alternative questions, and statements. They ranged in length from 4 to 19 words with an average length of 9 words. They were synthesized by two versions of the Festival system: one using all the target cost components described in the previous section, and one without the boundary cost component. The emphasis component was part of both systems, because this listening test is designed only to evaluate the boundary component. Without the emphasis component in the default system, emphasis can be realized somewhat arbitrarily, which would distract listeners from the main point of this test.

The order of the sentences was randomized, as was the order of the two versions for each sentence. 10 volunteers, all native speakers of English, took the test. All but one listened with headphones. They were able to play each stimulus as often as they wanted, in any order, until they decided which version they preferred (this was a forced choice test with no "undecided" option).

The system with the boundary component was preferred in 557 of the 1000 stimulus-listener pairs, a statistically significant preference. Comparing the listeners, the number of votes for the system with the boundary component ranged from 48 to 60. For 32 of the 100 stimulus pairs, all or all but one listener agreed with each other.

6.2. Emphasis listening test

Sentence modality can be expressed syntactically, and even when a yes/no-question does not start with a verb, speakers often do not raise the pitch at the end [11], because there is no need to do so if the nature of the question is obvious from the context. Thus there is no right or wrong question intonation: question intonation can only be more or less appropriate, formal, natural, etc.

The purpose of emphasis, on the other hand, is much less "ornamental": it is often the only way to convey a special meaning such as contrast, and is therefore essential, not optional. If speech synthesis system A succeeds in emphasizing the intended word in the perception of a listener, and system B fails to do so, it is not clear how to assess the segmental quality of both systems independently from system B's failure.

Therefore, in this second listening test, we looked at how well listeners recognize intended emphasis, regardless of other differences between the two systems (such as better or worse segmental quality). Seven short sentences were selected, 4 to 9 words long (6.3 on average), in which 3, 4, or 7 different words could carry emphasis. This resulted in 24 stimuli, the order of which was randomized. Again, the volunteers could listen to each stimulus as often as they wanted and in any order. Their task was to locate in each sentence the one word which they perceived as most prominent. 15 listeners took the test, all but three of them with headphones.

In 43% of the stimulus-listener pairs, the emphasized word was chosen correctly, with the average chance level being 18%. The average agreement between listeners was 83%.

Table 2: Recognition of emphasis shift.

  Stimulus                              recog'd   agreem't   for
  CAN he show us how to make it pay?    93.33%    93.33%     CAN
  Can HE show us how to make it pay?     0.00%    73.33%     pay
  Can he SHOW us how to make it pay?    20.00%    73.33%     pay
  Can he show US how to make it pay?     0.00%    53.33%     pay
  Can he show us HOW to make it pay?    80.00%    80.00%     HOW
  Can he show us how to MAKE it pay?     0.00%    80.00%     pay
  Can he show us how to make it PAY?    86.67%    86.67%     PAY
  Average:                              40.00%    77.14%

Table 2 shows an example sentence before randomizing the stimuli order. It is the longest sentence, having 7 possible emphasis positions. Apparently it resulted in the most difficult set of stimuli: only in 40% of all 105 listener-stimulus pairs (7 variants x 15 listeners) was the intended emphasis recognized correctly.
However, the chance level here is 1/7 (about 14%), which means the recognition rate is 2.8 times the chance level.

7. Discussion

When it comes to phrase boundaries, in particular for yes/no-questions, our database should be big enough already. As mentioned at the end of Section 3.2, the major problem with this voice is still bad phone alignment, in particular at the boundaries between speech and silence.

[12] reports that although listeners prefer the standard intonation of yes/no-questions in natural speech, in synthesized speech their preference is based more on the overall quality than on intonation. It seems as if the best way to improve the listener judgement of the phrase boundary component is to further improve the phone alignment.

Looking closer at misrecognized emphasized words reveals that the recorded speaker did not put the same effort into all words marked up as emphasized. It is also striking that, if in doubt, listeners prefer the default location for nuclear accents, i.e. the rightmost pitch accent, as can be seen in Table 2. However, when looking at where in the database the selected units come from, it becomes obvious that, as pointed out in Section 3.2, many diphones in clearly emphasized words are still missing. Recording an additional 600 or so utterances in the word list sub-corpus style should close this gap.

8. Acknowledgements

The author would like to thank Scottish Enterprise (under the Edinburgh-Stanford Link) for funding this project, and Roger Burroughes for his voice.

9. References

[1] R. A. J. Clark, K. Richmond, and S. King, "Multisyn voices from ARCTIC data for the Blizzard challenge," Proc. Interspeech, 2005.
[2] V. Strom, "From text to speech without ToBI," Proc. Int. Conf. on Spoken Language Processing, 2002.
[3] E. Eide, A. Aaron, R. Bakis, W. Hamza, M. Picheny, and J. Pitrelli, "A corpus-based approach to <Ahem/> expressive speech synthesis," 5th ISCA Speech Synthesis Workshop, Pittsburgh, 2004.
[4] M. Isogai, H. Mizuno, and K. Mano, "Recording script design for corpus-based TTS system based on coverage of various phonetic elements," Proc. Int. Conf. on Acoustics, Speech and Signal Processing, 2005.
[5] B. Bozkurt, O. Ozturk, and T. Dutoit, "Text design for TTS speech corpus building using a modified greedy selection," Proc. Int. Conf. on Spoken Language Processing, 2003.
[6] H. Kawai, S. Yamamoto, and T. Shimizu, "A design method of speech corpus of text-to-speech," Proc. Int. Conf. on Spoken Language Processing, 2000.
[7] H. Kawanami, T. Masuda, T. Toda, and K. Shikano, "Designing speech database with prosodic variety for expressive TTS system," Proc. LREC, 2000.
[8] J. F. Pitrelli and E. M. Eide, "Expressive speech synthesis using American English ToBI: questions and contrastive emphasis," Proc. ASRU, 2003.
[9] M. E. Foster, M. White, A. Setzer, and R. Catizone, "Multimodal generation in the COMIC dialogue system," Proc. of the ACL Interactive Poster and Demonstration Sessions, Ann Arbor, 2005.
[10] The Unisyn lexicon, /projects/unisyn
[11] A. K. Syrdal and M. Jilka, "To rise or to fall: that is the question," Acoustical Society of America, 146th Meeting, Austin, TX, 2003.
[12] A. K. Syrdal and M. Jilka, "Acceptability of variations in question intonation in natural and synthesized American English," J. of the Acoustical Society of America, Vol. 155, No. 33, 2004.

Linguistics, Chapter 2


Summary special issue: Linguistics, Chapter 2. Edited by Sun Bo and Ren Chong; proofread by Wang Yanhua and Kang Liangliang.

I. Phonetics
1. Definition: phonetics studies how speech sounds are produced, transmitted, and perceived.

2. Its branches are articulatory phonetics, acoustic phonetics, and perceptual phonetics.

II. Phonology: the study of the sound patterns and sound systems of languages.

III. Voiceless and voiced sounds
1. Voiceless sounds: sounds produced without vibration of the vocal cords.

2. Voiced sounds: sounds produced with vibration of the vocal cords.

IV. Consonants and vowels
1. Consonants: sounds produced by constricting or obstructing the vocal tract at some place so as to divert, impede or completely shut off the flow of air in the oral cavity.

2. Vowels: sounds produced without obstruction, so that no turbulence or total stopping of the air can be perceived.

WHISPERY SPEECH RECOGNITION USING ADAPTED ARTICULATORY FEATURES


Szu-Chen Jou, Tanja Schultz, and Alex Waibel
Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh, PA
scjou, tanja, ahw@

ABSTRACT

This paper describes our research on adaptation methods applied to articulatory feature detection on soft whispery speech recorded with a throat microphone. Since the amount of adaptation data is small and the testing data is very different from the training data, a series of adaptation methods is necessary. The adaptation methods include: maximum likelihood linear regression, feature-space adaptation, and re-training with downsampling, a sigmoidal low-pass filter, and linear multivariate regression. Adapted articulatory feature detectors are used in parallel to standard senone-based HMM models in a stream architecture for decoding. With these adaptation methods, articulatory feature detection accuracy improves from 87.82% to 90.52% with corresponding F-measure from 0.504 to 0.617, while the final word error rate improves from 33.8% to 31.2%.

1. INTRODUCTION

Today's real-world applications are driven by ubiquitous mobile devices which lack keyboard functionality. These applications demand new spoken input methods that do not disturb the environment and preserve the privacy of the user. Verification systems for banking applications or private phone calls in a quiet environment are only a few examples. As a consequence, recent developments in the area of processing whispered speech or non-audible murmur draw a lot of attention. Automatic speech recognition (ASR) has been proven to be a successful interface for spoken input, but so far, microphones have been used that apply the principle of air transmission to transmit the sound from the speaker's mouth to the input device. When transmitting soft whisper, those microphones tend to fail, causing the performance of ASR to deteriorate.

Contact microphones, on the other hand, pick up speech signals through skin vibrations rather than by air transmission. As a result, processing of whispered speech is possible. Research related to contact microphones includes using a stethoscopic microphone for non-audible murmur recognition [1] and speech detection and enhancement with a bone-conductive microphone [2].

In our previous work, we have demonstrated how to use a throat microphone, one of many kinds of contact microphones, for automatic soft whisper recognition [3]. Based on that, this paper discusses how we incorporate articulatory features (AFs) as an additional information source to improve recognition results. Articulatory features, e.g. voicing or tongue position, have shown great potential for robust speech recognition [4]. [...] Our whispery speech speakers differ from those of the BN data, and our sentences are different from the BN ones but in the same domain.

Table 1. Data for training, adaptation, and testing.

  Training     66.48 hr
  Adaptation   712.8 s
  Testing      153.1 s

[Tables of AF detection accuracy/F-measure for the re-training adaptation methods, including Downsample 88.46/0.551, log Mel-spec 87.12/0.493, CMN-MFCC 87.53/0.513, and FSA/Group FSA up to 90.27/0.610.]

Some of these methods also make performance worse, in contrast to the improvements made for senone models [3]. Since sigmoidal low-pass filtering is the only improving adaptation method, the following experiments are conducted in addition to it. We then apply additional FSA, group FSA, group MLLR, and iterative MLLR methods. As shown in Table 3, group FSA performs the best, so further iterative MLLR is conducted in addition to group FSA. Compared to its effects on senone models, iterative MLLR saturates faster, in about 20 iterations, and peaks at 34 iterations with performance 90.52%/0.617.
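MLLR [8] estimates an affine transform of the Gaussian means from adaptation data. A minimal single-regression-class sketch in Python/NumPy for diagonal-covariance models, assuming the occupation statistics have already been accumulated (an illustration of the technique, not the authors' code):

import numpy as np

def estimate_mllr(means, variances, gammas, gamma_obs):
    # means, variances: (M, D) Gaussian means and diagonal covariances
    # gammas:           (M,)   occupation counts on the adaptation data
    # gamma_obs:        (M, D) posterior-weighted observation sums per Gaussian
    # Returns W = [b A] of shape (D, D+1) so that the adapted mean is W @ [1; mu].
    M, D = means.shape
    xi = np.hstack([np.ones((M, 1)), means])          # extended mean vectors
    W = np.zeros((D, D + 1))
    for i in range(D):                                # one row of W per dimension
        inv_var = gammas / variances[:, i]
        G = (xi * inv_var[:, None]).T @ xi            # (D+1, D+1) accumulator
        k = (gamma_obs[:, i] / variances[:, i]) @ xi  # (D+1,) accumulator
        W[i] = np.linalg.solve(G, k)                  # row-wise closed form
    return W

def apply_mllr(W, means):
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T                                   # adapted means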
Fig. 2 shows a comparison of the F-measure of the individual AFs, including the baseline AFs tested on the BN eval98/F0 test set and on the throat-whisper test set, and the best adapted AFs on the throat-whisper test set. The AFs are listed in the order of F-score improvement from adaptation; e.g. the leftmost, AFFRICATE, has the largest improvement by adaptation. Performance degradation from BN to throat-whisper had been expected. However, some AFs, such as AFFRICATE and GLOTTAL, degrade drastically, as the acoustic variation of these features is among the largest. Since there is no vocal cord vibration in whispery speech, GLOTTAL would not be useful for such a task. For the same reason, vowel-related AFs, such as CLOSE and CENTRAL, suffer from the mismatch. Most AFs improve by adaptation; NASAL, for example, is one of the best AFs on BN data but degrades a lot on throat-whisper, as can be inferred from Fig. 1. After adaptation, its F-measure doubles, but there is still a gap to the performance level on BN data.

5.2. Stream Decoding

In the stream architecture, we put together our best senone model and the best AF detectors. The first experiments combine the senone model with each single AF detector to see how well the AF detectors can help the senone model. Table 4 shows the WERs of the different combination weights and the four best single AF detectors.

[Table 4: Four-best single-AF WERs at different weight ratios. Baseline: 33.8. At 90:10 senone:AF weighting: ASPIRATED 31.4, ALVEOLAR 33.1, CLOSE 32.6, RETROFLEX 31.7, DENTAL 33.3, PALATAL 33.1.]

As shown in the table, the combination of 90% of the weight on the senone models and 10% on the AF detectors results in the best performance, which can be regarded as a global minimum in the performance concave with respect to different weights. In other words, the single AFs can help only with carefully selected weights.

In the next experiments, we incrementally add from one up to ten AF detectors to the streams. We use simple rules to select the AF detectors. The AF selection criteria include one-best WER (WER), accuracy (acc), and F-measure (F). According to each criterion, AF selection starts in greedy fashion from the AF detector having the best performance, then picks the second best one, and so on. There is also a set of weighting rules for adding more AFs. The first weighting rule always assigns 0.05 to the weight of every AF (w5). The second rule distributes uniform weights summing to 0.1 across the AFs (unif). The last one puts more weight on the better-performing AFs using a rank-based formula (scaled), in which the weight depends on the total number of AF detectors used and on each detector's performance rank.
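The first two weighting rules, together with the log-linear stream combination they feed, can be sketched as follows (a schematic restatement; the exact rank-based formula of the scaled rule is not reproduced here):

def stream_score(senone_logp, af_logps, af_weights):
    # Log-linear combination of the senone stream with the AF detector
    # streams; the senone stream receives the remaining weight. The paper's
    # best single-AF setting was 90% senone, 10% AF.
    w_senone = 1.0 - sum(af_weights)
    return w_senone * senone_logp + sum(w * lp for w, lp in zip(af_weights, af_logps))

def weights_w5(n_afs):
    # Rule "w5": every AF detector gets a fixed weight of 0.05.
    return [0.05] * n_afs

def weights_unif(n_afs):
    # Rule "unif": a total AF weight of 0.1, shared uniformly.
    return [0.1 / n_afs] * n_afs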
Fig. 3 shows the WERs with AF selection using *-WER, which showed better results than the other two criteria; this result is consistent with [5]. On the other hand, the fixed weight (w5-*) suffers from insufficient weight for the senone models as the number of AFs increases. With one exception, where the WER improves to 31.2% in scaled-F with ALVEOLAR and FRICATIVE, incorporating more than one AF doesn't improve the WER. We suspect the reason is that the mismatched training and testing data are quite different acoustically, while the adaptation data is not enough to reliably estimate the AFs. Therefore we cannot achieve the improvement level reported in [5].

[Figure 3: WERs as a function of the number of AF detectors used in the stream.]

6. CONCLUSIONS

We have developed a series of adaptation methods applied to articulatory feature detection, which improve the performance of a standard senone-based HMM throat-whisper recognizer using a stream decoder. Also, we have shown that AF adaptation improves detection accuracy and F-measure. With a t-test p-value of 0.046, the best stream decoding performance (WER = 31.2%) is statistically significant; however, on such a small test set, some other, smaller improvements are not. We therefore plan to collect more data. Further work could be applying discriminative model combination (DMC) on the stream architecture for better weights [12].

7. ACKNOWLEDGEMENTS

The authors wish to thank Dr. Yoshitaka Nakajima for the invitation to his lab, the chance to gain hands-on experience using the stethoscopic microphones developed at his lab, and his hospitality. Many thanks to Hua Yu for providing the BN baseline system, and to Florian Metze and Sebastian Stüker for the AF and stream scripts.
Thanks also go to the reviewers for their valuable comments.

8. REFERENCES

[1] Y. Nakajima, H. Kashioka, K. Shikano, and N. Campbell, "Non-audible murmur recognition input interface using stethoscopic microphone attached to the skin," in Proc. ICASSP, Hong Kong, 2003.
[2] Y. Zheng, Z. Liu, Z. Zhang, M. Sinclair, J. Droppo, L. Deng, A. Acero, and X. Huang, "Air- and bone-conductive integrated microphones for robust speech detection and enhancement," in Proc. ASRU, St. Thomas, U.S. Virgin Islands, Dec 2003.
[3] S.-C. Jou, T. Schultz, and A. Waibel, "Adaptation for soft whisper recognition using a throat microphone," in Proc. ICSLP, Jeju Island, Korea, Oct 2004.
[4] K. Kirchhoff, Robust Speech Recognition Using Articulatory Information, Ph.D. thesis, University of Bielefeld, Germany, July 1999.
[5] F. Metze and A. Waibel, "A flexible stream architecture for ASR using articulatory features," in Proc. ICSLP, Denver, CO, Sep 2002.
[6] "/english/prod01.htm".
[7] H. Yu and A. Waibel, "Streaming the front-end of a speech recognizer," in Proc. ICSLP, Beijing, China, 2000.
[8] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
[9] H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, pp. 175-187, 1992.
[10] M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, pp. 75-98, 1998.
[11] P. Heracleous, Y. Nakajima, A. Lee, H. Saruwatari, and K. Shikano, "Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation," in Proc. ASRU, St. Thomas, U.S. Virgin Islands, Dec 2003.
[12] S. Stüker, F. Metze, T. Schultz, and A. Waibel, "Integrating multilingual articulatory features into speech recognition," in Proc. Eurospeech, Geneva, Switzerland, Sep 2003.

The Use of Sound against the Background of Visual Culture: Reflections Based on Practice



Wu Yan (Shanghai Media Group, Shanghai 200051)

Abstract: This is an era of mass communication dominated by visual culture.

Yet cultural products that take sound as their sign still possess a charm of their own.

Taking visual culture as its background of investigation, this article identifies several characteristics of how sound products are used today, and proposes creating the aesthetic value of sound products in three respects: spectacularizing auditory elements, making sound elements intimate, and making the consumption process subject-centred, so as to inform related practice.

Keywords: visual culture background; sound products; aesthetic value

From a biological point of view, human hearing and vision are biological faculties of perception: "both require a sensory/perceptual system, i.e. auditory organs and visual organs, to process the appropriate forms of energy.

Both must come into contact with those forms of energy and the energy they carry; otherwise perception cannot take place.

But the energy of sound is present 24 hours a day, night or day, in clear weather or in rain and snow; by contrast, the energy of vision, light, is far more fragile and fastidious: in nature, when and how visual perception occurs depends entirely on the environment."

Viewed further from the perspective of the "media environment", this is an era of mass communication dominated by visual culture, in which "visual signs are overtaking, or have already overtaken, linguistic signs to become the dominant form of culture."

Can hearing, one of humanity's "innate endowments", then attain a status equal to vision in the context of visual communication? Under the dominance of visual culture, what characteristics does the use of sound display, and how can its functions be brought into full play? And how can cultural products that take sound as their sign express the character of that sign and thereby create aesthetic value? Taking the documentary programme "Sound Archive" on Shanghai Radio's News 990 as the main sample, supplemented by other cases, the author draws on practical work experience and relevant theories of cultural communication to reflect on and sort through these questions.

It should be noted that this article does not seek to isolate vision from hearing; in the actual material world the two cannot be isolated. Rather, it uses vision as a frame of reference and a vantage point from which to explore how the value of sound products can be maximized in a production environment dominated by visual culture.

1. Understanding sound
(1) What is sound? Much of the time people become aware of a sound because they notice the object producing it: a clock, a guitar, a vehicle, a speaker, a cat...

At such moments, sound acquires a concrete referential form; it is no longer merely sound itself but a thing, gaining a recognizable existence in our sensory system, as if it were seen, touched, or smelled.

Speech Synthesis

April 14, 2009
Some Reminders
• Final Exam is next Monday:
• In this room
• (I am looking into changing the start time to 9 am.)
• I have a review sheet for you (to hand out at the end of class).
Perception → Production
• Japanese listeners performed an /r/ - /l/ discrimination task.
• Important: listeners were told nothing about how to produce the /r/ - /l/ contrast...
• ...but, through perception training, their productions got better anyway.
Exemplar Categorization
1. Stored memories of speech experiences are known as traces.
• Each trace is linked to a category label.
2. Incoming speech tokens are known as probes.
3. A probe activates the traces it is similar to.
• Note: the amount of activation is proportional to the similarity between trace and probe.
• Traces that closely match a probe are activated a lot; traces that have no similarity to a probe are not activated much at all.
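The slides give no formula, but the trace/probe mechanism is easy to make concrete. A toy sketch in the spirit of exemplar models such as the Generalized Context Model, in which activation falls off exponentially with distance (the distance measure and sensitivity parameter are assumptions of this illustration):

import numpy as np

def categorize(probe, traces, labels, sensitivity=2.0):
    # traces: (N, D) stored speech memories; labels: the category label
    # linked to each trace; probe: (D,) incoming speech token.
    dists = np.linalg.norm(traces - probe, axis=1)
    activations = np.exp(-sensitivity * dists)   # close traces activate strongly
    scores = {}
    for label, act in zip(labels, activations):
        scores[label] = scores.get(label, 0.0) + act
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}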

Answers to the Exercises in An Introduction to Linguistics


Chapter 4 Morphology

What is morphology?
The total number of words stored in the brain is called the lexicon. Words are the smallest free units of language that unite sounds with meaning. Morphology is defined as the study of the internal structure and the formation of words.

Morphemes and allomorphs
The smallest meaningful unit of language is called a morpheme. A morpheme may be represented by different forms, called allomorphs.

"Zero" form of a morpheme and suppletives
Some countable nouns do not change form to express plurality. Similarly, some regular verbs do not change form to indicate past tense. In these two cases, the noun or verb contains two morphemes, among which there is one "zero form" of a morpheme. Some verbs have irregular changes when they are in the past tense. In this case, the verbs also have two morphemes. Words which are not related in form to their roots while indicating grammatical contrast are called suppletives.

Free and bound morphemes
Some morphemes constitute words by themselves. These morphemes are called free morphemes. Other morphemes are never used independently in speech and writing. They are always attached to free morphemes to form new words. These morphemes are called bound morphemes. The distinction between a free morpheme and a bound morpheme is whether it can be used independently in speech or writing. Free morphemes are the roots of words, while bound morphemes are the affixes (prefixes and suffixes).

Inflexional and derivational morphemes
Inflexional morphemes in modern English indicate case and number of nouns, tense and aspect of verbs, and degree of adjectives and adverbs. Derivational morphemes are bound morphemes added to existing forms to construct new words. English affixes are divided into prefixes and suffixes. Some languages have infixes, bound morphemes which are inserted into other morphemes. The process of putting affixes to existing forms to create new words is called derivation. Words thus formed are called derivatives.

Conclusion: classification of morphemes
Morphemes are either free or bound; bound morphemes are either inflexional (e.g. -s, -'s, -er, -est, -ing, -ed) or derivational (affixes: prefixes and suffixes).

Formation of new words

Derivation
Derivation forms a word by adding an affix to a free morpheme. Since derivation can apply more than once, it is possible to create a derived word with a number of affixes. For example, if we add affixes to the word friend, we can form befriend, friendly, unfriendly, friendliness, unfriendliness, etc. This process of adding more than one affix to a free morpheme is termed complex derivation. Derivation does not apply freely to any word of a given category. Generally speaking, affixes cannot be added to morphemes of a different language origin. Derivation is also constrained by phonological factors. Some English suffixes also change the word stress.

Compounding
Compounding is another common way to form words. It is the combination of free morphemes.
The majority of English compounds are combinations of words from three classes (nouns, verbs and adjectives) and fall into those three classes. In compounds, the rightmost morpheme determines the part of speech of the word. The meaning of a compound is not always the sum of the meanings of its components.

Conversion
Conversion is the process of putting an existing word of one class into another class. Conversion is usually found in words containing one morpheme.

Clipping
Clipping is a process that shortens a polysyllabic word by deleting one or more syllables. Clipped words are initially used in spoken English on informal occasions. Some clipped words have become widely accepted, and are used even in formal styles. For example, the words bus (omnibus), vet (veterinarian), gym (gymnasium), fridge (refrigerator) and fax (facsimile) are rarely used in their complete forms.

Blending
Blending is a process that creates new words by putting together non-morphemic parts of existing words. For example, smog (smoke + fog), brunch (a meal in the middle of the morning, replacing both breakfast and lunch), motel (motor + hotel). There is also an interesting word in the textbook for junior middle school students, "plike" (a kind of machine that is like both a plane and a bike).

Back-formation
Back-formation is the process that creates a new word by dropping a real or supposed suffix. For example, the word televise is back-formed from television. Originally, the word television was formed by putting the prefix tele- (far) onto the root vision (viewing). At the same time, there is a suffix -sion in English indicating nouns. People then took the -sion in television for that suffix and dropped it to form the verb televise.

Acronyms and abbreviations
Acronyms and abbreviations are formed by putting together the initial letters of all words in a phrase or title. Acronyms can be read as words and are usually longer than abbreviations, which are read letter by letter. This type of word formation is common in names of organizations and in scientific terminology.

Eponyms
Eponyms are words that originate from proper names of individuals or places. For example, the word sandwich is a common noun originating from the fourth Earl of Sandwich, who put his food between two slices of bread so that he could eat while gambling.

Coinage
Coinage is a process of inventing words not based on existing morphemes. This way of word formation is especially common where industry requires a word for a new product, e.g. Kodak and Coca-Cola.

Chapter 3 Phonology

What is phonology?
Phonology is the study of sound systems and patterns. Phonology and phonetics are two studies, different in perspective, which are both concerned with speech sounds. Phonology focuses on three fundamental questions: What sounds make up the list of sounds that can distinguish meaning in a particular language? What sounds vary in what ways in what contexts? What sounds can appear together in a sequence in a particular language?

Phonemes and allophones
A phoneme is a distinctive, abstract sound unit with a distinctive feature. The variants of a phoneme are termed allophones. We use allophones to realize phonemes.

Discovering phonemes

Contrastive distribution - phonemes
If sounds appear in the same environment, they are said to be in contrastive distribution. Typical contrastive distribution of sounds is found in minimal pairs and minimal sets. A minimal pair consists of two words that differ by only one sound in the same position. Minimal sets are more than two words that are distinguished by one segment in the same position. The overwhelming majority of the consonants and vowels represented by the English phonetic alphabet are in contrastive distribution. Some sounds can hardly be found in contrastive distribution in English. However, these sounds are distinctive in terms of phonetic features. Therefore, they are separate phonemes.
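The minimal-pair test is mechanical enough to automate over a phonemically transcribed word list. A small sketch (the toy lexicon format is an assumption of this illustration):

def minimal_pairs(lexicon):
    # lexicon: dict mapping words to phoneme tuples,
    # e.g. {"pin": ("p", "i", "n"), "bin": ("b", "i", "n")}.
    # Returns (word1, word2, position) triples differing in exactly one segment.
    pairs = []
    items = list(lexicon.items())
    for i, (w1, p1) in enumerate(items):
        for w2, p2 in items[i + 1:]:
            if len(p1) == len(p2):
                diffs = [k for k in range(len(p1)) if p1[k] != p2[k]]
                if len(diffs) == 1:
                    pairs.append((w1, w2, diffs[0]))
    return pairs

Running this on {"pin": ("p","i","n"), "bin": ("b","i","n"), "pit": ("p","i","t")} yields the minimal pairs pin/bin and pin/pit, contrasting /p/-/b/ and /n/-/t/ respectively.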
Complementary distribution - allophones
Sounds that are not found in the same position are said to be in complementary distribution. If segments are in complementary distribution and share a number of features, they are allophones of the same phoneme.

Free variation
If segments appear in the same position but mutual substitution does not result in a change of meaning, they are said to be in free variation.

Distinctive and non-distinctive features
Features that distinguish meaning are called distinctive features; features that do not are non-distinctive features. Distinctive features in one language may be non-distinctive in another.

Phonological rules
Phonemes are abstract sound units stored in the mind, while allophones are the actual pronunciations in speech. What phoneme is realized by what allophones in what specific context is another major question in phonology. The regularities of what sounds vary in what ways in what contexts are generalized and stated in phonology as rules. There are many phonological rules in English; take the following as examples:

[+voiced, +consonant] -> [-voiced] / [-voiced, +consonant] __
[-voiced, +bilabial, +stop] -> [unaspirated] / [-voiced, +alveolar, +fricative] __

Syllable structure
A syllable is a phonological unit that is composed of one or more phonemes. Every syllable has a nucleus, which is usually a vowel. The nucleus may be preceded by one or more consonants called the onset, and followed by one or more consonants called the coda.

Sequence of phonemes
Native speakers of any language intuitively know what sounds can be put together. Some sequences are not possible in English; the impossible sequences are called systematic gaps. Sequences that are possible but do not yet occur are called accidental gaps. When new words are coined, they may fill some accidental gaps, but they will never fill systematic gaps.

Suprasegmental features
Features that are found over a segment or a sequence of two or more segments are called suprasegmental features. These features are distinctive features.

Stress
Stress is the perceived prominence of one or more syllabic elements over others in a word. Stress is a relative notion: only words that are composed of two or more syllables have stress. If a word has three or more syllables, there is a primary stress and a secondary stress. In some languages word stress is fixed, i.e. on a certain syllable.
In English, word stress is unpredictable.

Intonation
When we speak, we change the pitch of our voice to express ideas. Intonation is the variation of pitch to distinguish utterance meaning. The same sentence uttered with different intonation may express a different attitude of the speaker. In English, there are three basic intonation patterns: fall, rise, fall-rise.

Tone
Tone is the variation of pitch to distinguish words. The same sequence of segments can be different words if uttered with different tones. Chinese is a typical tone language.

Chapter 2 Phonetics

What is phonetics?
Phonetics is the study of speech sounds.

Sub-branches of phonetics
Articulatory phonetics - the production of speech sounds
Acoustic phonetics - the physical properties of speech sounds
Auditory phonetics - the perceptive mechanism of speech sounds

The speech organs
Where does the air stream come from? From the lungs.
What is the function of the vocal cords? Controlling the air stream.
What are the cavities? The oral cavity, the pharyngeal cavity, and the nasal cavity.

Transcription of speech sounds
Units of representation: segments (the individual sounds).
Phonetic symbols: the most widely used symbols for the phonetic transcription of speech sounds are those of the International Phonetic Alphabet (IPA). The IPA attempts to represent each sound of human speech with a single symbol, and the symbols are enclosed in brackets [ ] to distinguish phonetic transcriptions from the spelling system of a language. In more detailed (narrow) transcription, a sound may be transcribed with a symbol to which a diacritic is added in order to mark finer distinctions.

Description of speech sounds

Description of English consonants
General feature: obstruction.
Criteria of consonant description: place of articulation, manner of articulation, and voicing.

Places of articulation - each point at which the air stream can be modified to produce a sound:
Bilabial: [p] [b] [m] [w]
Labiodental: [f] [v]
Interdental: [θ] [ð]
Alveolar: [t] [d] [s] [z] [l] [n] [r]
Palatal: [ʃ] [ʒ] [tʃ] [dʒ] [j]
Velar: [k] [g] [ŋ]
Glottal: [h]

Manners of articulation - how the air stream is modified, whether it is completely blocked or partially obstructed:
Stops: [p] [b] [t] [d] [k] [g]
Fricatives: [s] [z] [ʃ] [ʒ] [f] [v] [θ] [ð] [h]
Affricates: [tʃ] [dʒ]
Liquids: [l] [r]
Glides: [w] [j]
Nasals: [m] [n] [ŋ]

Voicing of articulation - whether the vocal cords vibrate when the sound is produced: voiced sounds vs. voiceless sounds.

Description of English vowels
General feature: no obstruction.
Criteria of vowel description:
- the part of the tongue that is raised: front, central, back;
- the extent to which the tongue rises in the direction of the palate: high, mid, low;
- the kind of opening made at the lips;
- the position of the soft palate;
- single vowels (monophthongs) vs. diphthongs.

Phonetic features and natural classes
Classes of sounds that share a feature or features are called natural classes. Major class features can specify segments across the consonant-vowel boundary. Classification of segments by features is the basis on which variations of sounds can be analyzed.

Chapter 3 "Lexicon": questions and exercises
1. Explain the following terms: morpheme, compound, inflection, affix, derivative, root, allomorph, stem, bound morpheme, free morpheme, lexeme, lexicon, grammatical word, lexical word, closed class, open class, blending, loanword, loanblend, loanshift, acronym, loss (elision), back-formation, assimilation, dissimilation, folk etymology.
2. Add an appropriate negative prefix to each of the following words:
a. removable  m. syllabic
b. formal  n. normal
c. practicable  o. workable
d. sensible  p. written
e. tangible  q. usual
f. logical  r. thinkable
g. regular  s. human
h. proportionate  t. relevant
i. effective  u. editable
j. elastic  v. mobile
k. ductive  w. legal
l. rational  x. discreet
3. A morpheme is defined as the smallest unit of the relation between expression and content.

Answers to the Exercises in the First Five Chapters of Modern Linguistics


Chapter 1 Introduction

1. Explain the following definition of linguistics: linguistics is the scientific study of language.

Linguistics investigates not any particular language, but languages in general. Linguistic study is scientific because it is based on the systematic investigation of authentic language data. No serious linguistic conclusion is reached until the linguist has done the following three things: observing the way language is actually used, formulating some hypotheses, and testing these hypotheses against linguistic facts to prove their validity.


2. What are the major branches of linguistics? What does each of them study?
Phonetics - how speech sounds are produced and classified.
Phonology - how sounds form systems and function to convey meaning.
Morphology - how morphemes are combined to form words.
Syntax - how morphemes and words are combined to form sentences.
Semantics - the study of meaning (in the abstract).
Pragmatics - the study of meaning in the context of use.
Sociolinguistics - the study of language with reference to society.
Psycholinguistics - the study of language with reference to the workings of the mind.
Applied linguistics - the application of linguistic principles and theories to language teaching and learning.

3. What makes modern linguistics different from traditional grammar?
Modern linguistics is descriptive; its investigations are based on authentic, and mainly spoken, language data.

English-language references on the expressiveness of broadcast language


Title: The Expressiveness of Broadcast Language: A Review of Literature

Introduction
Broadcast language, also known as announcer language, refers to the manner in which news anchors, radio hosts, and television presenters speak while on air. The expressiveness of broadcast language is crucial in capturing and maintaining audience attention, conveying emotion and tone, and engaging listeners. In this review of literature, we examine various aspects of broadcast language expressiveness and how it impacts audience perception and comprehension.

Voice Quality and Modulation
Research suggests that voice quality and modulation play a significant role in the effectiveness of broadcast language. A study by Smith et al. (2017) found that presenters with a clear, smooth voice and appropriate modulation were perceived as more trustworthy and authoritative by listeners. Furthermore, variations in pitch, volume, and speed can help convey emotion and emphasize key points in a broadcast. For example, raising the pitch and volume of the voice can indicate excitement, while lowering the pitch and speaking slowly can signal seriousness or urgency.

Pronunciation and Articulation
Proper pronunciation and articulation are essential for clear communication in broadcast language. Research by Jones (2018) suggests that mispronunciations and slurred speech can detract from the credibility of the presenter and confuse the audience. Additionally, certain sounds and phonetic patterns may be more challenging to pronounce accurately, especially for non-native speakers. To improve pronunciation and articulation, broadcasters can practice tongue twisters and vocal exercises, and receive feedback from language coaches.

Intonation and Emphasis
Intonation refers to the rising and falling patterns of pitch in speech, while emphasis involves highlighting important words or phrases through changes in pitch, volume, or duration. By varying intonation and emphasis, broadcasters can convey nuanced meanings, express emotions, and maintain audience interest. Research by Brown (2019) demonstrates that using intonation and emphasis effectively can enhance audience engagement and comprehension. For instance, stressing keywords and using rising intonation at the end of a sentence can invite listener participation and create a sense of curiosity.

Rhythm and Pacing
The rhythm and pacing of speech in broadcast language also influence audience perception and engagement. Research by Lee and Kim (2020) shows that a balanced rhythm and appropriate pacing can help maintain listener attention and facilitate information processing. Broadcasters should consider factors such as phrasing, pausing, and rate of speech to create a natural flow and avoid monotony. For example, inserting brief pauses between phrases can allow listeners to digest information, while adjusting the speed of speech can convey a sense of urgency or importance.

Nonverbal Communication
In addition to verbal cues, nonverbal communication plays a crucial role in expressing emotions and engaging audiences in broadcast language. Research by Zhang and Wang (2021) highlights the impact of facial expressions, gestures, and body language on viewer perception and comprehension. A friendly smile, a confident posture, or a subtle nod can convey warmth, enthusiasm, and professionalism to the audience.
Combining verbal and nonverbal cues effectively can enhance the overall expressiveness and impact of broadcast language.

Conclusion
In conclusion, the expressiveness of broadcast language is a multifaceted phenomenon that encompasses voice quality, pronunciation, intonation, rhythm, and nonverbal communication. Understanding and mastering these aspects can help broadcasters engage and connect with their audience, convey meaning effectively, and create a memorable listening experience. Further research on the role of expressiveness in broadcast language can offer valuable insights for language educators, media professionals, and communication scholars. By continuously honing their linguistic skills and expressive abilities, broadcasters can elevate the quality and impact of their on-air performance.

English translations of various course names



A Study of Humorous Speech Acts in Friends (《老友记》幽默言语行为研究)

Postgraduate: Huang Yingying. Year: 2007. Major: Foreign Linguistics and Applied Linguistics. Supervisor: Associate Professor Li Dongmei. Research area: Pragmatics.

Abstract: As a linguistic phenomenon, humor pervades people's daily lives, and researchers have approached it from many different angles. Taking speech act theory as its perspective and the most popular American sitcom, Friends, as its case study, this thesis statistically classifies and analyzes all the humorous speech acts in Friends. It aims to identify the principal means by which Friends performs its humorous speech acts, and the underlying reasons for them, interpreting American humor from a new angle.

Building on an account of J. Austin's speech act theory and of J. R. Searle's classification of illocutionary acts, including the contributions and shortcomings of both, the study treats humor as a speech act that provokes laughter through words, and discusses it at the level of the pragmatic act proposed by Mey (2003), which takes the interlocutors, the context, and other factors into consideration. By collecting and counting data from Friends, the study analyzes the means and motivations of English humorous speech acts from a more macroscopic angle, rather than merely assigning humorous speech acts to the major categories of illocutionary acts according to the performative verbs they contain, since research has shown that some speech acts containing no performative verb can still exert the illocutionary force of provoking laughter. How humorous speech acts exert this laughter-provoking illocutionary force is the focus of the whole study.

Guided by the background laughter, the author extracted all the humorous utterances from the scripts of the 234 episodes of Friends, collecting 2,980 turns in total; analyzing these turns one by one yielded 4,541 humorous utterances. According to Tan Daren (1997), human verbal activity is governed by roughly four kinds of rules: linguistic, rhetorical, logical, and communicative; humorous speech is no exception. Rhetoric usually has specific forms of expression and exploits linguistic elements ingeniously to make expression vivid and interesting, so rhetorical devices should also be regarded as a linguistic means. On this basis the 4,541 humorous utterances were classified. To avoid double counting and to keep the focus of the study sharp, the classification gives priority to the linguistic means employed, including means at the phonetic/prosodic, lexical, semantic, and rhetorical levels of English, and then analyzes the main features at each level; where no clearly evident linguistic means is used, the utterance is classified by logical or communicative rules instead. This shows the main ways in which the humorous speech acts in Friends successfully achieve their laughter-provoking illocutionary force, and lays the groundwork for a further discussion of why these means are used.

SPEECH RECOGNITION SYSTEM

Surabhi Bansal, Ruchi Bahety

ABSTRACT

Speech recognition applications are becoming more and more useful nowadays. Various interactive speech-aware applications are available on the market, but they are usually meant for, and executed on, traditional general-purpose computers. With the growth of embedded computing and the demand for emerging embedded platforms, speech recognition systems (SRS) need to be available on them too. PDAs and other handheld devices are becoming more powerful and affordable, and it has become possible to run multimedia on them. Speech recognition emerges as an efficient alternative for such devices, where typing is difficult because of small-screen limitations. This paper characterizes an SR process on the PXA27x XScale processor, a widely used platform for handheld devices, and implements it for performing tasks on media files through a Linux media player, Mplayer.

INTRODUCTION

Speech recognition basically means talking to a computer, having it recognize what we are saying, and doing this in real time. The process functions as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech.

Figure 1: Block diagram of a speech recognizer [16]

The elements of the pipeline are:

Transform the PCM digital audio into a better acoustic representation. The input to a speech recognizer is a stream of amplitudes, sampled at about 16,000 times per second. Audio in this form is not useful for the recognizer, so Fast Fourier transforms are used to produce graphs of the frequency components describing the sound heard for 1/100th of a second. Any sound is then identified by matching it to its closest entry in a database of such graphs, producing a number, called the "feature number", that describes the sound.

The unit matching system provides likelihoods of a match of all sequences of speech recognition units to the input speech. These units may be phones, diphones, syllables, or derivative units such as fenones and acoustic units. They may also be whole-word units or units corresponding to groups of two or more words. Each such unit is characterized by an HMM whose parameters are estimated from a training set of speech data.

Lexical decoding constrains the unit matching system to follow only those search-path sequences whose speech units are present in a word dictionary.

Apply a "grammar" so the speech recognizer knows what phonemes to expect. This places further constraints on the search sequence of the unit matching system. A grammar could be anything from a context-free grammar to full-blown English.

Figure out which phonemes are spoken. This is a tricky task, as the same word sounds different when spoken by different people, and background noise picked up by the microphone can make the recognizer hear a different vector. A probability analysis is therefore done during recognition, and a hypothesis is formed based on it. A speech recognizer works by hypothesizing a number of different "states" at once; each state contains a phoneme with a history of previous phonemes. The hypothesized state with the highest score is used as the final recognition result.
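The front end of this pipeline is easy to make concrete. The sketch below is not from the paper: it frames 16 kHz PCM into the 1/100-second windows described above, computes a coarse magnitude spectrum with a naive DFT, and returns the index of the nearest entry in a template database as the "feature number". The frame size, bin count, codebook size, and all names are illustrative assumptions.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define FRAME_LEN 160   /* 16 kHz audio, 10 ms frames (1/100 s)  */
    #define N_BINS    32    /* coarse magnitude-spectrum bins        */
    #define N_TEMPL   256   /* hypothetical codebook of known sounds */

    /* Naive DFT magnitude spectrum of one 10 ms frame (fine at this size). */
    static void frame_spectrum(const short *pcm, double *mag)
    {
        for (int k = 0; k < N_BINS; k++) {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < FRAME_LEN; n++) {
                double w = 2.0 * M_PI * k * n / FRAME_LEN;
                re += pcm[n] * cos(w);
                im -= pcm[n] * sin(w);
            }
            mag[k] = sqrt(re * re + im * im);
        }
    }

    /* "Feature number": index of the codebook entry closest to this frame. */
    int feature_number(const short *pcm, const double templ[N_TEMPL][N_BINS])
    {
        double mag[N_BINS];
        frame_spectrum(pcm, mag);

        int best = 0;
        double best_d = INFINITY;
        for (int t = 0; t < N_TEMPL; t++) {
            double d = 0.0;
            for (int k = 0; k < N_BINS; k++) {
                double diff = mag[k] - templ[t][k];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = t; }
        }
        return best;
    }

A real recognizer would use an FFT and perceptually motivated features rather than raw DFT bins, but the frame-to-nearest-template mapping is the idea the paragraph above describes.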
RELATED WORK

A lot of speech-aware applications are already on the market. Various dictation packages have been developed by Dragon [12], IBM and Philips [13]. Genie [11] is an interactive speech recognition application developed by Microsoft. Various voice navigation applications, one developed by AT&T, allow users to control their computer by voice, for example browsing the Internet by voice, and many more applications of this kind appear every day. The SPHINX speech recognizer from CMU [1] provides both the acoustic and the language models used for recognition; it is based on Hidden Markov Models (HMMs). The SONIC recognizer [2], developed by the University of Colorado, is another. There are other recognizers, such as XVoice [4] for Linux, which takes input from IBM's ViaVoice, which now exists only for Windows.

Background noise is the worst part of a speech recognition process: it confuses the recognizer and makes it unable to hear what it is supposed to. One recognizer has been devised for robots [17] that, despite inevitable motor noise, lets a robot communicate with people efficiently; this is made possible by using a noise-type-dependent acoustic model corresponding to the motion the robot is performing. Optimizations for speech recognition on an HP SmartBadge IV embedded system [19] have been proposed to reduce energy consumption while maintaining the quality of the application. Another scalable system has been proposed in [18] for DSR (Distributed Speech Recognition), combining it with scalable compression and hence reducing the computational load as well as the bandwidth requirement on the server. Various capabilities of current speech recognizers in the field of telecommunications, such as voice banking and directory assistance, are described in [20].

PROJECT DESCRIPTION

Our main goal is to integrate Mplayer with the SR system to control the playback of a media file. The block diagram below depicts its functioning:

Figure 2: Block diagram of the SRS integrated with Mplayer (voice input, microphone, speech recognizer, Mplayer, XScale LCD playing the media file)

The target hardware platform for this work is the PXA27x Mainstone board, which serves as a prototype for handheld devices. The board has a 208 MHz Intel PXA27x processor, 64 MB of SDRAM, 32 MB of flash memory, and a quarter-VGA color LCD screen. We chose this particular device because it runs the GNU/Linux operating system, simplifying the initial port of our system. To build the system, a GCC 3.4.3 cross-compiler was used, built with the crosstool script. Let us describe each component of our system.

Voice input: The input is human voice, sampled at a rate of 16,000 samples per second. It should be given in live mode, but because of conflicts between the channel settings of the sound card and those used by the software, we are not able to do so. Instead we run the recognizer in batch mode, taking input in the form of a pre-recorded audio file (in RAW format).

Microphone: The microphone used for recognition is built onto the PXA27x platform itself. This has its own advantages and disadvantages.

Advantages: nothing to plug in; the user's hands are free.

Disadvantages: low accuracy unless the user is close to the device; poor performance in a noisy environment.

Speech recognizer: Platform speed directly affected our choice of a speech recognition system for this work. Although all members of the SPHINX recognizer family have well-developed programming interfaces and are actively used by researchers in fields such as spoken dialog systems and computer-assisted learning, we chose PocketSphinx [9, 10] as our speech decoder because it is meant specifically for embedded platforms. It is a version of the open-source Sphinx2 speech recognizer and is faster than any other SR system.
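For readers who want to reproduce the decoding step, here is a minimal batch-mode sketch against the PocketSphinx C API (cmd_ln_init/ps_args, ps_init, ps_start_utt, ps_process_raw, ps_end_utt, ps_get_hyp). It is not the authors' code: the model paths are placeholders, and exact signatures vary between PocketSphinx releases (older versions, for example, pass an utterance id to ps_start_utt and ps_get_hyp).

    #include <pocketsphinx.h>
    #include <stdio.h>

    int main(void)
    {
        /* Placeholder model paths: substitute the acoustic model,      */
        /* language model and dictionary shipped with your install.     */
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
                               "-hmm",  "/usr/share/pocketsphinx/model/hmm",
                               "-lm",   "/usr/share/pocketsphinx/model/lm.bin",
                               "-dict", "/usr/share/pocketsphinx/model/words.dic",
                               NULL);
        ps_decoder_t *ps = ps_init(config);
        if (ps == NULL) return 1;

        FILE *fh = fopen("ready.raw", "rb");  /* 16 kHz, 16-bit mono PCM */
        if (fh == NULL) return 1;

        int16 buf[512];
        size_t n;
        int32 score;

        ps_start_utt(ps);   /* older releases: ps_start_utt(ps, NULL) */
        while ((n = fread(buf, sizeof(int16), 512, fh)) > 0)
            ps_process_raw(ps, buf, n, FALSE, FALSE);
        ps_end_utt(ps);

        const char *hyp = ps_get_hyp(ps, &score);  /* e.g. "PLAY" or "STOP" */
        printf("recognized: %s\n", hyp ? hyp : "(none)");

        fclose(fh);
        ps_free(ps);
        return 0;
    }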
We cross-compiled PocketSphinx for XScale and executed the various test scripts included with it; both the digit and the word recognition scripts run on the PXA27x. The voice input is supposed to come from the on-board microphone, but the PXA27x microphone is set to accept STEREO input only, whereas the PocketSphinx decoder takes voice input in MONO format only. Due to limitations in the code we were not able to solve this problem, so we use pre-recorded audio files instead. The SR process decodes these input files, identifies the command, and generates output accordingly.

In our application, two input files are used: ready.raw and stop.raw. The first contains the utterance PLAY; when the SRS recognizes it, it fires a command to the shell asking it to play a media file:

    mplayer inc_160_68.avi

Mplayer starts playing this file on the PXA27x LCD display. This is done by creating a child process, so that the speech recognition system keeps running and can take further input. The second file contains the utterance STOP, which kills the playing process.
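The PLAY/STOP dispatch is described only in prose above; a plausible sketch of that fork/exec logic follows. It is not the authors' code: only the command words and the filename inc_160_68.avi come from the paper.

    #include <signal.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t player = -1;   /* PID of the running Mplayer, if any */

    /* Dispatch one recognized command word. "PLAY" spawns a child so  */
    /* recognition keeps running; "STOP" kills the playing process.    */
    void dispatch(const char *hyp)
    {
        if (strcmp(hyp, "PLAY") == 0 && player < 0) {
            player = fork();
            if (player == 0) {      /* child: becomes the media player */
                execlp("mplayer", "mplayer", "inc_160_68.avi", (char *)NULL);
                _exit(127);         /* only reached if exec fails */
            }
        } else if (strcmp(hyp, "STOP") == 0 && player > 0) {
            kill(player, SIGTERM);
            waitpid(player, NULL, 0);   /* reap the child */
            player = -1;
        }
    }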
Challenges

1) The main challenge was to identify an efficient SRS that runs on Linux and can be cross-compiled.
2) At first we used SPHINX2 as our decoder, but some of its scripts did not work on the PXA27x because they depend on Perl, which is very difficult to cross-compile for XScale. Patches are available to cross-compile it for the StrongARM processor, but even the authors of those patches could not say how to do it on XScale.
3) PocketSphinx supports mono-channel audio input only. The AC97 audio codec driver hardcodes the channel setting to stereo, without considering mono. This is the biggest reason live mode does not work: stereo produces twice as many samples as required, causing recognition to fail.
4) We faced memory problems when running the software on the Mainstone board; we had to set parameters such as the -mmap option to prevent memory overflows.

Limitations

1. The dictionary of PocketSphinx is huge and needs to be optimized.
2. Our dictionary was supposed to contain four words (PLAY, PAUSE, STOP and REPEAT), which must be pronounced in a specific way for the SRS to recognize them. Of these, we were only able to implement the PLAY and STOP commands, for a single media file.
3. The conflict between the software and the sound-card channel settings remains a big problem for this system.

RESULTS

Input      Output
ready.raw  Mplayer starts playing the media file inc_160_68.avi
stop.raw   Mplayer stops

CONCLUSION AND FUTURE ENHANCEMENTS

The speech recognizer we chose for the PXA27x, PocketSphinx, is the first open-source embedded SR system capable of real-time, medium-vocabulary continuous speech recognition. We are able to recognize both digits and words on the board. We tried to run the system in live mode, but because of the software limitation (it cannot accept stereo input) and the resulting conflict with the sound card, we can only run it in batch mode.

This work can be taken much further. The current software does not support a stereo channel; it could be made adaptable to channel changes. The author of the software is going to address the channel conflict in the next version, where every alternate sample will be dropped so that there are the right number of samples for mono mode, the main requirement for PocketSphinx to work in live mode. Also, in the present work we hardcoded the filename to play; this could be extended to take the filename from the user. Beyond that, the application could be deployed on other handheld and mobile devices.

REFERENCES

[1] Kai-Fu Lee, Hsiao-Wuen Hon, and Raj Reddy. An Overview of the SPHINX Speech Recognition System. IEEE Transactions on Acoustics, Speech and Signal Processing.
[2] B. Pellom. Sonic: The University of Colorado Continuous Speech Recognition System.
[3] /HOWTO/Speech-Recognition-HOWTO/index.html
[4] /s/xvoice
[5] /
[6] Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, and Joe Woelfel. Sphinx-4: A Flexible Open Source Framework for Speech Recognition.
[7] A. Hagen, D. A. Connors, and B. L. Pellom. The Analysis and Design of Architecture Systems for Speech Recognition on Modern Handheld-Computing Devices.
[8] /speech.htm
[9] David Huggins-Daines, Mohit Kumar, Arthur Chan, Alan W. Black, Mosur Ravishankar, and Alex I. Rudnicky. PocketSphinx: A Free, Real-Time Continuous Speech Recognition System for Handheld Devices.
[10]
[11] /intdev/agent/
[12]
[13] /index.htm
[14] Ben Shneiderman. The Limits of Speech Recognition.
[15] Stefan Eickeler, K. Biatov, Martha Larson, and J. Kohler. Two Novel Applications of Speech Recognition Methods for Robust Spoken Document Retrieval.
[16] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.
[17] Yoshitaka Nishimura, Mikio Nakano, and Kazuhiro Nakadai. Speech Recognition for a Robot under its Motor Noises by Selective Application of Missing Feature Theory and MLLR.
[18] Naveen Srinivasamurthy, Antonio Ortega, and Shrikanth Narayanan. Efficient Scalable Speech Compression for Scalable Speech Recognition.
[19] Brian Delaney, Tajana Simunic, and Nikil Jayant. Energy Aware Distributed Speech Recognition for Wireless Mobile Devices.
[20] Lawrence R. Rabiner. Applications of Speech Recognition in the Area of Telecommunications.

Introduction to Linguistics (English): Test Paper 3

Jiangsu Province Higher Education Self-Study Examination (201_), Paper 8801: Introduction to Linguistics (English)

Part I. Multiple choice (1 point each, 20 points in total). For each question, choose the one correct answer from the four options and write its letter in the brackets.

1. Which of the following statements is FALSE? ________
A. Language is just for communication.
B. Language is one of many ways in which we experience the world.
C. Language is a sign system.
D. Language is arbitrary and conventional.

2. ________ refers to the fact that there is no necessary or logical relationship between a linguistic form and its meaning.
A. Displacement
B. Creativity
C. Arbitrariness
D. Duality

3. The study of a language at some point of time is called ________.
A. computational linguistics
B. sociolinguistics
C. diachronic linguistics
D. synchronic linguistics

4. ________ refers to the abstract linguistic system shared by all the members of a speech community.
A. Langue
B. Performance
C. Competence
D. Parole

Audio-Visual Speech Processing

Goal

A human listener can use visual cues, such as lip and tongue movements, to enhance the level of speech understanding, especially in a noisy environment. The process of combining the audio modality and the visual modality is referred to as speechreading, or lipreading. Inspired by human speechreading, the goal of this project is to enable a computer to use speechreading for higher speech recognition accuracy.

There are many applications in which it is desirable to recognize speech under extremely adverse acoustic conditions: detecting a person's speech from a distance or through a glass window, understanding a person speaking in a very noisy crowd, or monitoring speech over a TV broadcast when the audio link is weak or corrupted. In these applications the performance of traditional speech recognition is very limited. In this project, we use a video camera to track the lip movements of the speaker to assist acoustic speech recognition. We have developed a robust lip-tracking technique with which preliminary results show that recognition accuracy for noisy audio can be improved from less than 20% when only audio information is used to close to 60% when lip tracking assists recognition. Even with the visual modality only, i.e. without listening at all, lip tracking achieves a recognition accuracy close to 40%.

Keywords: audio-visual, speech processing, audio modality, visual modality (视听, 语音处理, 音频模式, 视觉模式)
Data format: voice
Data uses: data processing, identification

System Description

We explore the problem of enhancing speech recognition in noisy environments (for both Gaussian white noise and cross-talk noise) by using visual information such as lip movements. We use a novel Hidden Markov Model (HMM) to model the audio-visual bi-modal signal jointly, which shows promising recognition results. We also explore fusing the acoustic signal and the visual information with different combination approaches, to find the optimal method.

To test the performance of this approach, we add Gaussian noise to the acoustic signal at different SNRs (signal-to-noise ratios) and plot the recognition rate versus SNR. In that plot, the blue curve shows the recognition rate when the joint audio-visual feature (the visual parameters cascaded with the acoustic feature) is the input to the recognition system; the black curve with "o" markers shows the performance when the acoustic signal alone is the input, as in most conventional speech recognition systems; and the flat black curve shows the performance of recognition using only the lip-movement parameters. The recognition rate of the joint HMM system drops much more slowly with SNR than that of the acoustic-only system, which means the joint system is more robust to Gaussian noise in the acoustic signal. We also apply this approach to speech corrupted by cross-talk noise.
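The "cascaded" joint feature mentioned above is per-frame concatenation of the two modalities. A minimal sketch, assuming an arbitrary 13-dimensional acoustic vector and the six lip-geometry parameters (x1, y1, x2, y2, h1, h2) defined in the data-format section later on this page:

    #include <stddef.h>

    #define N_AUDIO 13  /* acoustic coefficients per frame (arbitrary here) */
    #define N_LIP    6  /* x1, y1, x2, y2, h1, h2 from the lip tracker      */

    /* Cascade the audio and lip parameters of one frame into a single */
    /* observation vector for the joint audio-visual HMM.              */
    void joint_feature(const double *audio, const double *lip, double *out)
    {
        size_t i, j = 0;
        for (i = 0; i < N_AUDIO; i++) out[j++] = audio[i];
        for (i = 0; i < N_LIP;   i++) out[j++] = lip[i];
    }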
Download

As the first step of this research, we collected an audio-visual data corpus, which is available to the public. The data set contains 10 subjects (7 males and 3 females). The vocabulary includes 78 isolated words commonly used for time, such as "Monday", "February", "night", etc. Each word is repeated 10 times.

Data Collection

To collect a database of high quality, we were very careful about the recording environment. Recording took place in a soundproof studio to collect noise-free audio, and we used controlled lighting and a blue-screen background for the image data. We used a SONY digital camcorder with a tie-clip microphone to record the data on DV tapes. The data on the tapes was transferred to a PC with the Radius MotoDV program and stored as QuickTime files; since the data on the DV tapes is already digital, there is no quality loss in the transfer. We have altogether 100 such QuickTime files, each containing one subject articulating all the words in the vocabulary. Each file is about 450 MB. Both the DV raw data and the QuickTime files are available upon request.

Data Pre-processing

The raw data on the DV tapes and the QuickTime files are large and need further processing before they can be used directly in lipreading research. We processed the data as follows.

First, for the video part, we keep only the mouth area, since it is the region of interest in lipreading research. We used a video editing tool (such as Adobe Premiere) to crop out the mouth area: the whole face picture, of size 720x480, was cut down to a picture of just the mouth, of size 216x264. (The face in the original QuickTime frames lies horizontally, due to the shooting procedure.) The four offsets of the mouth picture were noted down for calculating the positions of the lip parameters later. After that, we shrank the mouth picture from 216x264 to 144x176 and rotated it to the final picture of size 176x144, the standardized QCIF format.

We then use a lip-tracking program to extract the lip parameters from each QCIF-sized file, based on a deformable template and color information. The template is defined by the left and right corners of the mouth, the height of the upper lip, and the height of the lower lip; an example of the tracking result shows the template as black lines superimposed on the mouth area. For more details about lip tracking and object tracking, please see our face-tracking web page; the face-tracking toolkit can be extended to track lip movements. The lip parameters are stored in text files, along with the offsets of the mouth picture.

Second, we have waveform files extracted from the QuickTime video files. These contain the speech signals corresponding to the lip parameters in the text files mentioned above.

Data Format

Text files with the lip parameters. Each text file contains approximately 3000-4000 lines of numbers. The first line has four numbers, which are the offsets used when the mouth area was cropped from the whole face picture: the left, right, top and bottom offsets, respectively. Each of the remaining lines consists of seven numbers: the frame number; the positions of the left corner (x1, y1) and the right corner (x2, y2); and the heights of the upper lip (h1) and the lower lip (h2).

The waveform files contain audio sampled as PCM, 44.1 kHz, 16-bit, mono.
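Since the file layout above is fully specified, a small reader is easy to write. The following C sketch (illustrative, not project code) parses the four crop offsets and the per-frame records:

    #include <stdio.h>

    /* One tracked frame: the seven numbers per line described above. */
    struct lip_frame {
        int frame;            /* frame number                  */
        int x1, y1, x2, y2;   /* left and right mouth corners  */
        int h1, h2;           /* upper- and lower-lip heights  */
    };

    /* Reads a lip-parameter text file: the first line holds the four   */
    /* crop offsets (left, right, top, bottom); every following line    */
    /* holds seven numbers. Returns the number of frames read, or -1.   */
    int read_lip_file(const char *path, int off[4],
                      struct lip_frame *out, int max)
    {
        FILE *fp = fopen(path, "r");
        if (!fp) return -1;

        if (fscanf(fp, "%d %d %d %d",
                   &off[0], &off[1], &off[2], &off[3]) != 4) {
            fclose(fp);
            return -1;
        }

        int n = 0;
        while (n < max &&
               fscanf(fp, "%d %d %d %d %d %d %d", &out[n].frame,
                      &out[n].x1, &out[n].y1, &out[n].x2, &out[n].y2,
                      &out[n].h1, &out[n].h2) == 7)
            n++;

        fclose(fp);
        return n;
    }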
Vocabulary

For date/time: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety, hundred, thousand, million, billion.
For month: January, February, March, April, May, June, July, August, September, October, November, December.
For day: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.
Additional words: morning, noon, afternoon, night, midnight, evening, AM, PM, now, next, last, yesterday, today, tomorrow, ago, after, before, from, for, through, until, till, that, this, day, month, week, year.

Publications

F. J. Huang and T. Chen, "Real-Time Lip-Synch Face Animation Driven by Human Voice", IEEE Workshop on Multimedia Signal Processing, Los Angeles, California, Dec 1998.

Contact

Any suggestions or comments are welcome. Please send them to Wende Zhang.

Chapter 2 Speech Sounds

Tongue: the most flexible speech organ.

Voicing (浊音化): the vibration of the vocal folds. When the vocal folds are close together, the airstream causes them to vibrate against each other, and the resultant sound is "voiced".
4) Arbitrariness: the widely accepted meaning of this feature, which Saussure discussed first, refers to the fact that the forms of linguistic signs bear no natural relationship to their meanings.


Distinctive features
Suprasegmentals

Phonetics studies how speech sounds are produced, transmitted, and perceived.

Articulatory phonetics (发音语音学) is the study of the production of speech sounds.

It was changed to its present title, the International Phonetic Association (IPA, 国际语音学会), in 1897.

One of the first activities of the Association was to produce a journal in which the contents were printed entirely in phonetic transcription.

Speech synthesis manner, device and storage

Patent title: Speech synthesis manner, device and storage
Inventor: 山田 雅章 (Masaaki Yamada)
Application number: JP2000099531 (filed 2000-03-31)
Publication number: JP3728173B2 (published 2005-12-21)

Abstract: A speech synthesizing apparatus acquires a synthesis unit speech segment divided as a speech synthesis unit, and acquires partial speech segments by dividing the synthesis unit speech segment at phoneme boundaries. The power value required for each partial speech segment is estimated on the basis of a target power value for reproduction. An amplitude magnification is acquired from the ratio of the estimated power value to the reference power value for each of the partial speech segments. Synthesized speech is generated by changing the amplitude of each partial speech segment of the synthesis unit speech segment on the basis of the acquired amplitude magnification.

Applicant: キヤノン株式会社 (Canon Inc.), 3-30-2 Shimomaruko, Ota-ku, Tokyo, Japan
Agents: 大塚 康徳, 高柳 司郎, 大塚 康弘, 木村 秀二, 丸山 幸雄
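As a rough illustration of the scaling step the abstract describes, the sketch below derives a per-segment gain from the ratio of estimated to reference power and rescales the samples. The square-root mapping from a power ratio to an amplitude magnification is an assumption here (power scales with the square of amplitude); the abstract does not give the exact formula, and all names are illustrative.

    #include <math.h>

    /* Scale each partial speech segment of a synthesis unit toward its  */
    /* estimated target power. Segment i spans samples                   */
    /* [bounds[i], bounds[i+1]); bounds has n_seg + 1 entries and marks  */
    /* the phoneme-boundary division described in the abstract.          */
    void scale_segments(short *pcm, const int *bounds, int n_seg,
                        const double *est_power, const double *ref_power)
    {
        for (int i = 0; i < n_seg; i++) {
            /* Assumed power-to-amplitude mapping: gain = sqrt(ratio). */
            double gain = sqrt(est_power[i] / ref_power[i]);
            for (int s = bounds[i]; s < bounds[i + 1]; s++) {
                double v = pcm[s] * gain;
                if (v >  32767.0) v =  32767.0;  /* clip to 16-bit range */
                if (v < -32768.0) v = -32768.0;
                pcm[s] = (short)v;
            }
        }
    }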

Six Viewpoints of Saussure's Linguistics

(1) The first point is the distinction between speech activity (langage), speech (parole), and language (langue). Saussure holds that speech activity spans the fields of physics, physiology, and psychology, and belongs to both the individual and society. Speech activity consists of two parts. The first part is primary: language (langue), which is social in essence and independent of the individual; its study is purely psychological. The second part is secondary: it takes the individual side of speech activity, namely speech (parole), including pronunciation, as its object of study, and is psycho-physical. To use an analogy, language can be said to be a musical score, and speech a performance of it.

(2) The second point is that language is a system of signs. To borrow another metaphor, language can be said to be an algebra that has only complex terms. The combination of a concept and a sound image is called the sign; the concept is called the signified (signifié) and the sound image the signifier (signifiant). What, then, are the characteristics of this essentially psychological linguistic sign? Saussure points out two:

A. The sign is arbitrary, meaning that it is unmotivated: it has no natural connection with the thing in real life that it refers to. Since signs are arbitrary, why do we not see pervasive, sudden changes in the speech these signs make up? Four factors hinder such change. First, the arbitrariness of the sign itself makes language shun all attempts to change it deliberately. Second, the fact that any language must consist of a very large number of signs makes the signs difficult to change. Third, the nature of the language system is highly complex; because the system is complex, people have to think deeply to master it. Fourth, collective inertia resists all linguistic innovation. Among all social institutions, "language is the least suited to innovation; it is integrated with the life of the community, and the latter, being inherently inert, acts above all as a conservative factor." Language is stable not only because it is tied to the collective mass, but also because it is situated in time, and these two things are inseparable: whenever there is continuity with the past, freedom of choice is impeded.

B. The signifier is linear. Being auditory in nature, the signifier can unfold only in time and takes on the characteristics of time: first, it represents a length; second, this length has only one dimension, so it is a line. This law is as important as the law of the arbitrariness of signs, and the whole mechanism of language depends on it.

(3) The distinction between internal and external linguistics. Saussure uses chess as a metaphor in many places. For example, he says that replacing wooden pieces with ivory pieces is a change irrelevant to the system, but reducing or increasing the number of pieces deeply affects the game. Following this rule, he proposes that everything that changes the system in any way is internal.

(4) The systematicity of language and the value of signs. The signs of language are not isolated facts of language but constituent elements of a system, and this system is the language. The function of a sign entering the system is determined by its relations to the other elements of the system. Language is a system, and all the elements of this system form a whole. Here Saussure once again turns to chess: the state of a game can be reduced to the positions of the pieces, and language is a system based only on the oppositions among its concrete units, so a state of the game of chess corresponds to a state of language. The value of each piece is determined by its position on the board; similarly, in language each element has its value because it stands in opposition to the other elements. "The system is always only momentary; it changes from one state to another. It is also true that values depend above all on an unchangeable convention, the rule of the game, which exists before the game begins and persists after each move." Language also has such rules: once recognized, they exist permanently; they are the eternal principles of semiology. Saussure further proposes that "language is a form, not a substance." The concept of value is a basic concept of Saussure's linguistics, and the concept of identity is often combined with it; value subsumes the notions of unit, concrete entity, and reality. Since value determines the function of signs, the concept of value is one of the pivotal concepts in the system of Saussure's linguistics.

(5) Synchronic linguistics and diachronic linguistics. Saussure distinguishes between the two. He believes that the synchronic and diachronic views stand in opposition to each other, admitting no compromise. At the same time, Saussure holds that the synchronic view is more important than the diachronic one, because for the speaker it is "the true and only reality". I disagree with him on this point.

(6) He proposes the distinction between syntagmatic and associative relations, which is also a very important one. Bloomfield praised Saussure for providing the theoretical basis for the new direction of language study, that is, modern linguistics; this is why Saussure is repeatedly called the father of modern linguistics.

Teaching English Classical Poetry (PPT courseware)

Metaphor: the use of a comparison between two unlike things to highlight their shared characteristics
Methodology
Simile: a comparison between two unlike things using the words "like" or "as"
Guidance for students to analyze the poem's structure and language features
Guidance for students to appreciate the poem's artistic value and cultural connotations
Teaching English Classical Poetry

Contents:
1. Introduction
2. Basic knowledge of classic English poetry
3. Teaching methods for classic English poetry
4. Teaching practice of classic English poetry
5. Evaluation of classic English poetry teaching
6. Conclusion
Based on the evaluation results, teachers can develop improvement plans for students that focus on areas where they need to enhance their skills or knowledge

Speech Act Theory and Subtitle Translation

Speech Act Theory and Subtitle Transla-tion: Applications to Green Book Yongzhong YI , Wenjing ZHANGCollege of Foreign Studies, Guilin University of Electronic TechnologyAbstract: With the rapid economic development and frequent cultural exchanges, a large number of excellent English films have been introduced to China. In this cir-cumstance, the film industry has been gradually developed and the demand of subti-tle translation has been increased. In order to achieve cross-cultural communication through subtitle translation, translator should not only accurately express superficial meaning of subtitle but also vividly convey profound meaning such as various emo-tions, personalities and images of characters. It is easy to find that subtitle is primar-ily composed of conversations. Therefore, translator should take advantage of speech theory as guidance in the process of subtitle translation. Proposed by J.L. Austin, speech act theory has an important guiding role in subtitle translation. As a core the-ory of pragmatic, speech act theory mainly focuses on locutionary act, illocutionary act and perlocutionary act. This paper analyzes subtitle translation from the aspect of locutionary act and illocutionary act.Being released in 2018, the film called Green Book enjoys great popularity. Due to its fascinating story and profound connotation, it won Best Picture at the 91st Academy Awards. Under the guidance of speech act theory, this paper discusses subtitle translation of the Green Book and selects conversations from the film as cases. The author hopes that this paper will provide some suggestions for further study of subtitle translation and boost cultural exchange among China and foreign countries.Key words: Speech act theory; Subtitle translation; Green BookAbout the author: Yongzhong YI, (1965-10), Male Han, Beihai, Guangxi Zhuang Au-tonomous Region, Professor, Master of Art, Applied linguistic.Wenjing ZHANG, (1994-10), Female, Han, Pingyao, Jinzhong, Shanxi Province, Master, English Translation.Funded project: Supported by the study abroad program for graduate student of Guilin University of Electronic Technology (GDYX2018012).Speech Act Theory and Subtitle Translation: Applications to Green Book 1. IntroductionThe film called Green Book was debuted at the Toronto International Film Festi-val on September 11, 2018 and theatrically released in the United States on No-vember 16, 2018. The film enjoyed great popularity after it was released. Due to its fascinating story and profound connotation, it won Best Picture at the 91st Academy Awards, the 76th Golden Globe Awards, the 72nd British Academy of Film and Tel-evision Awards (BAFTA) and other awards in 2018. On March 1st, 2019, the Green Book was released in China.Directed by Peter Farrelly and mainly acted by Viggo Mortensen and Maher-shala Ali, Green Book is inspired by a true story. It tells a story of a tour of the Deep South by Don Shirley, an African American jazz pianist, and Tony Vallelonga (also called Tony Lip), an Italian American bouncer working in Copacabana nightclub. Tony Vallelonga (Viggo Mortensen) and Don Shirley (Mahershala Ali) are two pro-tagonists in the film. Coming from different class, they have distinctive personalities and suffer different predicaments. As an idle man, Tony works as a bouncer in Co-pacabana nightclub. Due to renovation, the nightclub has to be closed for several months. However, Tony, who loses his job, has to pay for rent of his house and living expenses of his family. 
In this circumstance, he has to find another job to make money during the renovation period. Don Shirley is a famous black pianist and he is about to start his eight-week tour to the Deep South. The discrimination against black people in the South is very serious at that time, so Don Shirley has to find a body-guard to protect him during his tour. Accidently, Tony is hired by Don Shirley as his driver and bodyguard. Then, they become partners and embark on a fascinating jour-ney. During the journey, they confront with a variety of obstacles because of Shirley’s skin color. Furthermore, there are many conflicts between these two people since they come from different class, have different education background and have distinctive personalities. Fortunately, these difficulties and conflicts finally lead them to develop a special friendship regardless of racial difference and class distinc-tion.As a popular film, Green Book not only tells a story of love, compassion and humanity, but also receives praise from all kinds of medias. The National Board of Review says that the film shows a moving friendship and makes audiences experi-ence love and compassion deeply. The Guardian points out that Viggo Mortensen and Mahershala Ali are two excellent actors since Viggo acts a legendary person vividly and Ali presents elegance and composure of Don Shirley. The Observer com-ments that Green Book is the best film over the past decade and audiences are deeply attracted by the humor and sincerity of Viggo and Ali.Due to success and popularity of Green Book and distinctive personalities of characters, this paper selects conversations from this film to do some research from the aspect of speech act theory and subtitle translation.Creativity and Innovation Vol.3 No.2 20192. Literature Review(1) Speech act theory1) Definition of termAs one of core theories of pragmatic, speech act theory is created by J. L. Austin, a British philosopher, in 1950s. The core theories of speech act theory originate with J.L. Austin’s posthumously published article, How to do things with Words (Grey, 2009). Then, John S earle, Austin’s student, continues to develop speech act theory over the next several decades (Howell & Lioy, 2011). The basic idea of speech act theory is that speech and act are inseparable and the foundation of speech communi-cation is performance of speech act. According to speech act theory, communicators use as little language as they think is needed in order to convey all the information they intend to communicate (Sperber & Wilson 1986a, Cappelen & Lepore 2006). In How to Do Things with Words (Austin, 1961), Austin proposes that the object of language research is act performed by words and sentences rather than word or sen-tence. The foundational aspects of speech act theory are as such: speech has the ut-terance, or locution; it has the desired effect of the speech by the speaker, known as the illocution, or illocutionary force; and it has a consequential reaction, the perlo-cution (Emory D. Dively, 2014).When the speaker talking with others, they will perform locutionary act, illo-cutionary act and perlocutionary act at the same time. Focusing on the speaker, the locutionary act means that the speaker articulates sounds and utters words, phrases or sentences with meaning. In other words, the locutionary act lays emphasis on su-perficial meaning of the speech. Illocutionary act means that the speaker shows his or her intention through uttering words or sentences. 
It attends to the speaker's intention and the implied meaning of the speech. The perlocutionary act refers to the effect the speech produces on the hearer, that is, whether the implied meaning of the speech is understood by the hearer. Both the illocutionary act and the perlocutionary act depend on the hearer's comprehension and on the performance of the speech. Among these three speech acts, Austin pays most attention to the illocutionary act, since it conveys the specific and implied meaning that the speaker wants to express on a particular occasion.

Speech act theory shows that, by saying things, we in fact do them. In other words, "speech that acts instead of describes": "to say something is to do something, or in saying something we do something, or even by saying something we do something" (Austin, Urmson, & Sbisà, 1975, p. 109). The following example illustrates the locutionary act, illocutionary act and perlocutionary act.

Setting: A and B are having lunch in a restaurant.
A: The food is not delicious.
Locutionary act: A thinks the food is not delicious. (surface meaning)
Illocutionary act: A wants to have lunch in another restaurant, since the food here is not delicious. (implied meaning)
Perlocutionary act: A and B go to another restaurant for lunch. (effect and act produced by the speech)

2) Features of speech act theory

Speech act theory emphasizes the linguistic actions that we perform towards another person (Schiffrin 1994:414) and focuses on the effect that speech produces on the hearer.

Speech act theory has several strengths. First, it provides grounds for the investigation of illocutionary acts, whose communicative significance is tied to the linguistic environments in which they are used (Acheoah, 2013). Second, it helps language users achieve their communicative purposes through their linguistic competence: when two people talk with each other, one can infer the implied meaning of the other's utterances from their surface meaning.

Speech act theory also has its weaknesses. To some degree, the actual meaning of a speaker's utterance remains ambiguous, since one locutionary act may give rise to different illocutionary meanings. For instance, when we are asked "Have you finished lunch?", what is the actual meaning of the question? Is it only a greeting? Does the speaker really want to know whether we have finished lunch? If we say "no", will the speaker have lunch with us or invite us to lunch? People from different cultures will interpret the speaker's meaning in different ways.

3. Communication chain

Speech act theory plays a significant role in communication research, since it gives scholars a solid foundation for understanding the force behind human communication. Rooted in spoken communication, speech act theory is closely related to the speech communication chain.

The speech communication chain shown in Figure 1 applies to interlocutors who share a native language. If the speaker and the listener have different native languages, how can they hold an efficient conversation?
Likewise, if the speaker in a film is a native English speaker and the audience are native Chinese speakers, how can the audience fully understand the English subtitles? In this case, a translator is needed between the speaker and the audience. As a bridge between the source language and the target language, the translator becomes part of a communication chain created in the process of translation:

Sender (English native speaker) -> Translator (Channel) -> Receiver (Chinese native speaker)

4. Case Study

(1) Locutionary act

Case 1:
Augie: What the hell happened at the Copa? I heard you split a guy's face open. That guy you hit, Mikey Charon. He was one of Charlie the Hands' crew.
奥吉:科帕那什么情况?我听说你把一个人的脸揍开花了。

Speech Act Theory
Chapter 1 Background
1.1 Origin of speech act theory
1.2 Definition
Chapter 2 Development of Speech Act
Chapter 3 Three Different Acts
3.1 Illocutionary Act
3.1.1 Directives
3.1.2 Commissives
3.1.3 Expressives
3.1.4 Declarations
3.2 Locutionary Act
3.3 Perlocutionary Act
Chapter 4 Conclusion
Chapter 3 Three Different Acts
3.1 Illocutionary act

Illocutionary act is a term in linguistics introduced by John L. Austin in his investigation of the various aspects of speech acts. We may sum up Austin's theory of speech acts with the following example. In uttering the locution "Is there any salt?" at the dinner table, one may thereby perform the illocutionary act of requesting salt, as well as the distinct locutionary act of uttering the interrogatory sentence about the presence of salt, and the further perlocutionary act of causing somebody to hand one the salt.
VRIO: A Speech Processing Unit for Virtual Reality and Real-World Scenarios - An Experience Report

D. Kranzlmüller1, A. Ferscha2, P. Heinzlreiter1, M. Pitra2, J. Volkert1
GUP1 and IPI2, Joh. Kepler University Linz
Altenbergerstr. 69, A-4040 Linz, Austria/Europe
kranzlmueller@gup.jku.at1 | ferscha@soft.uni-linz.ac.at2

Abstract

Human Computer Interaction (HCI) summarizes research and engineering activities related to the communication between human beings and all sorts of "computerized" machines. Within this domain, a substantial amount of work is dedicated to the idea of using the human voice as a natural interface for accessing computer systems. The VRIO speech processing unit represents one example of such an interface, where users control the machine via spoken commands. While VRIO was originally intended for Virtual Reality (VR) environments only, a major redesign of its architecture allows its application to arbitrary scenarios, e.g. within ubiquitous and pervasive environments. This paper describes the revised architecture of VRIO as well as examples of its application in VR environments and in real-world scenarios.

1 Introduction

Comparable to everyday life, communication problems between the human user and the machine may have a substantial impact on either of the two communication partners and/or the surrounding environment, and are thus not desired. For this reason, many ongoing research projects investigate new or improved ways of Human Computer Interaction (HCI). An example is the VRIO prototype, a combination of a software framework and commodity off-the-shelf hardware, which provides a flexible user interface within arbitrary computing environments.

The original idea of VRIO started in Virtual Reality (VR) environments (Burdea & Coiffet, 1994), in particular the room-sized, 3-D projection-based CAVE Automatic Virtual Environment (Cruz-Neira et al., 1992). While the visual output of the CAVE provides interesting possibilities for experiencing computer-generated scenarios, the options for user input are often not satisfying. Sophisticated input devices such as the wand, a 3D-like mouse with six degrees of freedom (Ware, 1990), require a significant amount of training, while the usage of traditional input devices, such as a keyboard, is limited by the user's position, posture, and movement in the CAVE.

This example calls for a more human-centred interface design (Landay & Myers, 2001), with possibly natural or intuitive human-computer interaction through multimodal input and output (Ark & Selker, 1999). The deficit of human-to-computer communication compared to computer-to-human communication (Damper, 1993) is addressed by a series of ongoing projects in areas such as speech processing and computer vision (Wahlster, 2000).

In this context, the approach of VRIO was to replace or supplement parts of the user interaction with voice input. The user, wearing a headset with a microphone, was able to control the VR scenario via spoken commands (Kranzlmüller, Reitinger, Hackl & Volkert, 2001). Due to the invariance of the VR environment, the speech processing was performed on a dedicated workstation located close to the user.

The application of VRIO in Virtual Reality environments demonstrated some of the limitations of the system, but also the feasibility of using speech processing for HCI. Based on this experience, a second prototype of VRIO has been developed. The software framework of VRIO was completely reworked to provide a better-suited level of abstraction. The resulting client-server architecture enables interaction between arbitrary command clients - not only speech processing - and the server on the one side, and between the server and arbitrary actuators on the other side.

This paper provides an overview of VRIO's architecture and two examples of its application, one within Virtual Reality and one in the real world. The next section describes the architecture of VRIO and its basic functionality, while example applications are presented in Section 3. A conclusion and an outlook on future work in this project summarize the paper.

2 Architecture of VRIO

2.1 Overview

The architecture of VRIO resembles the traditional client-server approach. The user interface is provided as a preferably simple and almost invisible device, which receives the commands from the user. The input device transforms the commands into machine-readable form and forwards them via the input interface to the central server. The server analyzes the commands and generates one or more corresponding controls, which initiate the requested actions on connected actuator devices. An overview of the architecture is provided in Figure 1, with possible command clients on the left side of the server and example actuators on the right side. Of course, a command client may also provide actuator functionality.

Figure 1: System architecture of VRIO

The communication between the client and the server, as well as between the server and the actuators, relies on TCP/IP and HTTP, and can thus be easily integrated into many existing computing infrastructures. In fact, a variety of command clients and actuators, from simple web forms to self-made hardware instruments, have already been tested with VRIO. The mapping between commands and controls is provided by an XML request scheme, which can easily be adapted to different needs and scenarios. In particular, the system is able to dynamically adapt itself to different contexts, depending on the user's location and surroundings. The whole system is supported by an Application Programming Interface (API), which simplifies the usability of the framework for both clients and actuators. The API is (as much as possible) platform-independent in order to facilitate its usage on different devices, from embedded systems to sophisticated high-end installations such as the CAVE.
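To make the command-to-control mapping concrete, the following is a minimal server-side sketch in Python. It is an illustration only, not the actual VRIO implementation: the <command> element layout, the COMMAND_MAP table, the port, and all names are invented for this sketch, since the paper does not publish its real XML request scheme.

```python
# Hypothetical sketch of a VRIO-style dispatch server: XML command requests
# arrive over HTTP and are mapped to controls for actuators (see Figure 1).
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed mapping table: symbolic command name -> (actuator id, control string).
COMMAND_MAP = {
    "light_on": ("lamp-actuator", "switch:on"),
    "add_cube": ("holodeck", "create:cube"),
}

def forward_control(actuator: str, control: str) -> None:
    # Stand-in for the HTTP forward to the actuator's own endpoint.
    print(f"-> {actuator}: {control}")

class VrioServerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = ET.fromstring(self.rfile.read(length))  # e.g. <command name="light_on"/>
        mapping = COMMAND_MAP.get(request.get("name"))
        if mapping is None:
            self.send_response(404)  # unknown command, no matching control
        else:
            forward_control(*mapping)
            self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), VrioServerHandler).serve_forever()
```

Keeping the mapping in a plain declarative table mirrors the adaptability claim above: the command-to-control scheme can be exchanged per context without touching client or actuator code.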
3 Example

As a practical example of this second-generation VRIO, the hardware unit of the command client has been replaced by a personal digital assistant (PDA). Figures 2 and 3 display a VRIO user with a PDA, in this case a Compaq iPAQ. The PDA is equipped with a microphone for processing speech commands and a network interface card for communication with the server. The command client uses IBM's ViaVoice speech processing software (http://www.software.ibm.com/speech) to translate voice commands into corresponding XML requests. The requests are transferred over the network to the server, which maps the commands to matching controls as indicated in Figure 1. The controls are then forwarded to the corresponding actuators in order to perform the desired activity.
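As a companion to the server sketch above, a command client along these lines might POST each recognized utterance as a small XML request. Only the transport (XML over HTTP/TCP-IP) is taken from the paper; the URL, element layout, and command names are assumptions, and the recognition step (ViaVoice on the PDA) is omitted here.

```python
# Hypothetical command-client sketch: wrap a recognized symbolic command in a
# small XML request and send it to the (assumed) VRIO server endpoint.
import urllib.request

def send_command(name: str, server: str = "http://localhost:8080/") -> int:
    xml = f'<command name="{name}"/>'.encode("utf-8")
    req = urllib.request.Request(server, data=xml,
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 if the server found a matching control

# After the recognizer maps the utterance "create a cube" to a symbolic command:
# send_command("add_cube")
```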
3.1 Application of VRIO in the CAVE

According to our original intention, VRIO has been utilized within the CAVE Automatic Virtual Environment for different kinds of applications. Within the computational steering environment MoSt, the user controls the execution of a large-scale high-performance computing application by navigating through a graphical representation of the program's states during its execution. The user is able to query the system's state, extract behavioural data, or modify parameters to change the program's behaviour.

Figure 2: Application of VRIO with the CAVE Holodeck application

Another VR application of VRIO is the Holodeck 3D Editor. In this example, VRIO supports the user in generating arbitrary 3D VR scenarios, which can afterwards be used in artificial worlds. The 3D world is constructed interactively by placing and manipulating 3D objects. A variety of commands for handling 3D objects (e.g. cubes, spheres, cylinders, ...) and other useful items (e.g. light sources) are provided. Objects can be generated, selected and moved within the virtual world. Their graphical representation can be modified by transformations such as scaling and rotation, as well as by changing the colours of the objects.

An example of the Holodeck 3D Editor is given in Figure 2. The user is standing in the virtual world generated by the CAVE. Shutter glasses are required to provide the impression of 3D stereo pictures. The user's position is tracked through the glasses and the 3-D wand, a six-degrees-of-freedom mouse, which is also used for object movement. The input client of VRIO (a Compaq iPAQ) in the right hand of the user is equipped with a headset to receive the spoken commands. Figure 2 contains some graphical elements of a simple scenario, which have already been positioned and manipulated by the user.

3.2 Application of VRIO in Real-World Scenarios

The redesign of VRIO extends its application domain from Virtual Reality to pervasive computing scenarios (Birnbaum, 1997). One example is a VRIO actuator for a standard interface card, which is used to connect arbitrary electric devices to a computer. With this approach, VRIO is able to switch and manipulate devices, e.g. to control arbitrary gadgets of consumer electronics; a sketch of such an actuator follows at the end of this section. Another major effort of integrating this kind of natural user interface is being undertaken in the Wireless Campus project, where students will be able to access administrative services of the Johannes Kepler University Linz. Related to this project is the public communication WebWall project developed at the IPI (Ferscha & Vogl, 2002).

Figure 3: Application of VRIO in front of a public communication WebWall (Ferscha & Vogl, 2002)

The basic concept of the WebWall is the provision of large-scale displays at public places, which provide an interface to the World Wide Web (WWW) independent of the available input interface. Through the WebWall's connection to the Internet and the cellular phone network, users can utilize the WebWall services as a kind of electronic pinboard for a variety of activities, e.g. the placement of small advertisements, event notes, or even multimedia content.

VRIO represents another sophisticated interface to the WebWall. Instead of typing the messages for the electronic pinboard, users can simply dictate their notes or activate arbitrary actions via spoken commands. An example is shown in Figure 3. The user stands in front of a WebWall, wearing VRIO (in the form of a Compaq iPAQ) and speaking to the device. Immediately after the command has been submitted, a new window with the translated note opens on the WebWall, containing the message of the user.
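For the interface-card actuator mentioned above, a minimal hypothetical sketch might accept a control string over HTTP and toggle a relay channel. The set_relay() driver call, the "switch:on" control format, and the port are invented stand-ins, as the paper does not describe the card's API.

```python
# Hypothetical actuator sketch: a tiny HTTP endpoint on the actuator side that
# receives a control string from the VRIO server and switches a device.
from http.server import BaseHTTPRequestHandler, HTTPServer

def set_relay(channel: int, on: bool) -> None:
    # Stand-in for the real interface-card driver call.
    print(f"relay {channel} -> {'on' if on else 'off'}")

class RelayActuatorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        control = self.rfile.read(length).decode("utf-8")  # e.g. "switch:on"
        action = control.split(":", 1)[-1]
        set_relay(channel=1, on=(action == "on"))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8081), RelayActuatorHandler).serve_forever()
```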
4 Conclusions and Future Work

The latest redesign of VRIO transformed the original Virtual Reality Input Output device from the self-contained world of the CAVE to ubiquitous computers and pervasive systems. The new and much more portable approach, combined with the adaptable client-server architecture, opens VRIO up to a set of novel and interesting application areas. First examples in the CAVE and in the real world have already demonstrated the feasibility of the approach. Several more applications are currently being developed.

An aspect that has not been sufficiently covered so far is (linguistic) context, which strongly influences the interaction between communicating human beings. Although the spoken words or gestures of humans are similar in different situations, the context of words or phrases enclosed by other words may trigger different "states" in the communication partners. Thus, context itself must be considered as a primary source of input within any future human-computer interface.

Acknowledgments

Several colleagues at GUP and IPI have contributed to the development of VRIO. We are most grateful to Ingo Hackl, Bernhard Reitinger, Christoph Anthes, Edith Spiegl, and Simon Vogl.

References

Ark, W.S. and Selker, T. (1999). A Look at Human Interaction with Pervasive Computers. IBM Systems Journal, 38(4), 504-507.
Birnbaum, J. (1997). Pervasive Information Systems. Communications of the ACM, 40(2), 40-41.
Burdea, G., and Coiffet, P. (1994). Virtual Reality Technology. John Wiley & Sons.
Cruz-Neira, C., Sandin, D.J., DeFanti, T.A., Kenyon, R.V., and Hart, J.C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment. Communications of the ACM, 35(6), 64-72.
Damper, R.I. (1993). Speech as an Interface Medium: How Can It Best Be Used? Proc. Interactive Speech Technology: Human Factors Issues in the Application of Speech Input/Output to Computers, Taylor and Francis, 59-71.
Ferscha, A. and Vogl, S. (2002). Pervasive Web Access via Public Communication Walls. Proc. Pervasive 2002, International Conference on Pervasive Computing, Springer-Verlag, 84-97.
Kranzlmüller, D., Reitinger, B., Hackl, I., and Volkert, J. (2001). Voice Controlled Virtual Reality and Its Perspectives for Everyday Life. In: A. Bode, W. Karl (Eds.), Proc. APC 2001 - Arbeitsplatzcomputer, ITG Fachbericht, Vol. 168, Munich, Germany, 101-107.
Landay, J. and Myers, B.A. (2001). Sketching Interfaces: Toward More Human Interface Design. IEEE Computer, 34(3), 56-64.
Wahlster, W. (2000). Pervasive Speech and Language Technology. In: Wilhelm, R. (Ed.), Informatics - 10 Years Back, 10 Years Ahead. Springer-Verlag, LNCS, 274-293.
Ware, C. (1990). Using Hand Position for Virtual Object Placement. The Visual Computer, 6(5), 245-253.
