Lexicology: Collocation (PPT)
Chapter 8 Collocation
Collocation is the way words combine in a language to produce natural-sounding speech and writing. For example, in English you say strong wind but heavy rain. It would not be normal to say heavy wind or strong rain.
Collocation is an important means of lexical cohesion. It runs through the whole of the English language. No piece of natural spoken or written English is totally free of collocation. For the student, choosing the right collocation will make his speech and writing sound much more natural, more native-speaker-like. A student who talks about strong rain may make himself understood, but possibly not without provoking a smile or a correction, which may or may not matter. He will certainly be marked down for it in an exam. (Oxford Collocations Dictionary for Students of English, edited by Jonathan Crowther, Sheila Dignen and Diana Lea, 2001.)
Words can collocate with different degrees of frequency and acceptability. For example, a high-frequency verb collocate of "story" is "tell". Some combinations are unacceptable: "strong tea" is fine, but one cannot speak of "powerful tea".
1. Definition of collocation
1) (British) Firth: "…you shall know the word by the company it keeps."
2) (British) Leech: collocative meaning is one of the seven types of word meaning.
3) (American) Morton Benson: "…is the combination of the co-occurrence in a language."
4) Halliday: "Collocation is a natural co-occurrence of lexical items, a word or phrase that is often used with another word or phrase."
So some words go together and some do not. This natural combination of words is called collocation. A collocation is a pair or group of words that are often used together.
E.g. "do an experiment", not "make an experiment".
2. 
Classification of collocation
Benson, Benson & Ilson (2007): Collocations fall into two major groups.
Grammatical collocations: a phrase consisting of a dominant word (noun, adjective, or verb) and a preposition or grammatical structure.
Lexical collocations: these consist of nouns, adjectives, verbs, and adverbs, and do not contain prepositions, infinitives, or clauses. A lexical collocation is the combination of two full lexical items:
a. Noun (as subject) + Verb
b. Verb + Noun (as object)
c. Adjective + Noun
G1 [N + Prep]: The price of apathy towards public affairs is to be ruled by evil men.
G2 [N + to Inf]: It was a pleasure to do business with you.
G3 [N + that Cl]: I took an oath that I would do my duty.
G4 [Prep + N]: Major earthquakes can be predicted months in advance.
G5 [Adj + Prep]: We were hungry for more success.
G6 [Adj + to Inf]: It is necessary to understand the nature of science.
G7 [Adj + that Cl]: He was afraid that he might lose his job.
L1 [V + N]: The computer software program can help young children compose music.
L2 [Adj + N]: I can give you a rough estimate over the phone.
L4 [N + V]: Our burglar alarm goes off unexpectedly.
L5 [N + of + N]: He sent her a bouquet of flowers.
L6 [Adv + Adj]: The good manager will be keenly aware of the needs of others.
L7 [V + Adv]: Students vary considerably in their abilities to understand the second language.
As we can see, the way items combine can be a major source of frustration for non-native speakers. Therefore, some ESL professionals believe that when learning a new word, students also need to learn its common collocates. These collocates can be presented along with the target word, or the word can be used in a number of authentic sentences so that students can work out the collocates on their own. Students can test their knowledge of collocates by doing matching exercises.
3. 
Types of collocation
1) fixed combination (固定型组合)
see eye to eye (with) (意见一致)
see red (大为生气), see stars (眼冒金星)
2) natural collocation (习惯型搭配)
a cat mews, a rooster crows, a bird chirps, a tiger roars
3) free combination (自由型组合)
✧ The economy boomed in the 1990s.
✧ The company has expanded and now has branches in major cities.
✧ The price increase poses a problem for us.
1) adj. + n.
a major problem / a key issue / a chief ...
fast / quick: chat, food, meal, car, glance
2) v. + n.
发动汽车 start the car; 经营公司 run a company
成家 start a family; 开店 run a shop
说实话 tell the truth; 思念一个人 miss a person
讲笑话 tell a joke; 错过一节课 miss a lesson
3) n. + n.
✧ As Sam read the lies about him, he felt a surge of anger.
✧ Every parent feels a sense of pride when…
✧ When Paul saw how harshly the poor little girl was treated, he felt a surge of ______.
✧ If people have a sense of ______ in their town, they are more likely to behave well there.
4) v. + prep.
✧ burst into tears, filled with horror
5) v. + adv.
✧ He whispered softly to her.
✧ He placed the beautiful vase gently on the window ledge.
✧ Perhaps it's a good thing that Ken's ______ unaware of what people really think of him.
✧ I am _____ aware that there will be problems whatever we decide.
✧ Nadya smiled ________ as she watched the children playing in the garden.
4. Common errors
1) V + N:
* learn knowledge → acquire/ broaden/ increase/ extend/ gain/ improve knowledge (增进知识)
* kill problem → attack/ combat/ deal with/ ease/ grapple with/ overcome/ resolve/ solve/ tackle a problem (解决问题)
* make problem → create/ cause a problem (造成问题)
raise (提出) a question/ *suggestion/ *warning/ *an application/ *one's resignation
2) A + N:
* sour rain → acid rain (酸雨)
* hearty greeting → hearty welcome (热烈欢迎)
* thick tea → strong tea (浓茶)
* toxic snake → poisonous snake (毒蛇)
Practice:
1. An explosion of chemicals in the factory started a large fire. (big)
2. 
The car runs at a mean speed of fifty miles an hour. (average)
3. You have a nice-looking hat. (nice)
4. There are many high buildings in this university. (tall)
5. In big cities such as Shanghai, the number of trees and other greenery is very small. (scarce)
6. In Shanghai, the traffic is too difficult. (heavy)
7. But when the customers buy the goods, they may find they are of bad quality. (low/poor)
8. However, the speed of a car is much faster than that of a bicycle. (greater)
9. In the past the price of milk was so expensive that most families could not afford it. (high)
10. The opportunities for being promoted in a joint venture will be smaller if one cannot speak English. (fewer)
11. That the moon moves round the earth is ordinary sense. (common)
12. The inactive volcano may erupt at any time. (dormant)
13. We must have a good command of daily English. (everyday)
14. My watch is quick, and yours is slow. (fast)
15. The scientist's speech drew many audiences. (a large)
16. The angle contained by the lines AB and AC is a straight angle. (right)
17. The ill engineer has recovered. (sick)
18. Penicillin is considered a special medicine for infection. (specific)
19. Small drops of water in the air form clouds. (tiny)
20. This is the only empty room in the hotel. (vacant)
21. As a famous physicist, he received a hot welcome. (warm)
22. Charles Darwin is a clever man. (wise)
23. He is the only alive man who survived the shipwreck. (living)
3) N + N:
* job chance → job opportunity (工作机会)
4) N + of + N
cause of accident/ damage/ death/ trouble/ failure/ anxiety/ *success/ *progress
✧ Learning collocation is mostly a matter of noticing and recording, and students should be trained to explore texts and select the collocations that are crucial to their own writing needs.
✧ Eventually, students are guided to internalize these prefabricated collocations/chunks so that they can write a complete research article (autonomous learning/ independent learners).
5. 
Fixed use of lexical collocation
1) Verbal collocation
1) We accused him ______ theft.
2) They indicted the official ______ taking bribes.
3) They impeached the President ______ killing the boy.
4) Public opinion blamed her ____ leading the boy astray.
5) Mother scolded the child ___ telling a lie.
6) They were punished ____ selling drugs.
(All of these patterns mean to accuse or charge someone with doing something.)
Presentation slides: Lexicology, Collocation.pptx
break the vase/cup/window
break the law/a contract/a promise
break a heart/a man/a spirit
Fat pig/pork/lands/income/kitchen
A. Generating dictionary B. Lexical cohesion
Quality of the node (head word)
Verb collocation Noun collocation Adjective collocation…
Range selection restriction
Unrestricted collocation Semi-restricted collocation Restricted/frozen collocation
C. Predictability of collocates D. Distinctiveness of polysemics
E. Accuracy of synonyms
collocates   fat   plump   obese   stout
man          v     v       v       v
woman        v     v       v       v
baby         v     v       x       x
legs         v     v       x       v
Collocations
DRAFT! © January 7, 1999 Christopher Manning & Hinrich Schütze.
5 Collocations
A collocation is an expression consisting of two or more words that corresponds to some conventional way of saying things. Or in the words of Firth (1957: 181): "Collocations of a given word are statements of the habitual or customary places of that word." Collocations include noun phrases like strong tea and weapons of mass destruction, phrasal verbs like to make up, and other stock phrases like the rich and powerful. Particularly interesting are the subtle and not-easily-explainable patterns of word usage that native speakers all know: why we say a stiff breeze but not ??a stiff wind (while either a strong breeze or a strong wind is okay), or why we speak of broad daylight (but not ?bright daylight or ??narrow darkness).
Collocations are characterized by limited compositionality. We call a natural language expression compositional if the meaning of the expression can be predicted from the meaning of the parts. Collocations are not fully compositional in that there is usually an element of meaning added to the combination. In the case of strong tea, strong has acquired the meaning "rich in some active agent", which is closely related to, but slightly different from, the basic sense "having great physical strength". Idioms are the most extreme examples of non-compositionality. Idioms like to kick the bucket or to hear it through the grapevine only have an indirect historical relationship to the meanings of the parts of the expression. We are not talking about buckets or grapevines literally when we use these idioms. Most collocations exhibit milder forms of non-compositionality, like the expression international best practice that we used as an example earlier in this book. It is very nearly a systematic composition of its parts, but still has an element of added meaning. It usually refers to administrative efficiency and would, for example, not be used to describe a cooking technique although that meaning would be compatible 
with its literal meaning.
There is considerable overlap between the concept of collocation and notions like term, technical term, and terminological phrase. As these names suggest, the latter three are commonly used when collocations are extracted from technical domains (in a process called terminology extraction). The reader should be warned, though, that the word term has a different meaning in information retrieval. There, it refers to both words and phrases. So it subsumes the more narrow meaning that we will use in this chapter.
Collocations are important for a number of applications: natural language generation (to make sure that the output sounds natural and mistakes like powerful tea or to take a decision are avoided), computational lexicography (to automatically identify the important collocations to be listed in a dictionary entry), parsing (so that preference can be given to parses with natural collocations), and corpus linguistic research (for instance, the study of social phenomena like the reinforcement of cultural stereotypes through language (Stubbs 1996)).
There is much interest in collocations partly because this is an area that has been neglected in structural linguistic traditions that follow Saussure and Chomsky. There is, however, a tradition in British linguistics, associated with the names of Firth, Halliday, and Sinclair, which pays close attention to phenomena like collocations. Structural linguistics concentrates on general abstractions about the properties of phrases and sentences. In contrast, Firth's Contextual Theory of Meaning emphasizes the importance of context: the context of the social setting (as opposed to the idealized speaker), the context of spoken and textual discourse (as opposed to the isolated sentence), and, important for collocations, the context of surrounding words (hence Firth's famous dictum that a word is characterized by the company it keeps). These contextual features easily get lost in the 
abstract treatment that is typical of structural linguistics.
A good example of the type of problem that is seen as important in this contextual view of language is Halliday's example of strong vs. powerful tea (Halliday 1966: 150). It is a convention in English to talk about strong tea, not powerful tea, although any speaker of English would also understand the latter unconventional expression. Arguably, there are no interesting structural properties of English that can be gleaned from this contrast. However, the contrast may tell us something interesting about attitudes towards different types of substances in our culture (why do we use powerful for drugs like heroin, but not for cigarettes, tea and coffee?) and it is obviously important to teach this contrast to students who want to learn idiomatically correct English. Social implications of language use and language teaching are just the type of problem that British linguists following a Firthian approach are interested in.
In this chapter, we will introduce the principal approaches to finding collocations: selection of collocations by frequency, selection based on mean and variance of the distance between focal word and collocating word, hypothesis testing, and mutual information. We will then return to the question of what a collocation is and discuss in more depth different definitions that have been proposed and tests for deciding whether a phrase is a collocation or not. The chapter concludes with further readings and pointers to some of the literature that we were not able to include.
The reference corpus we will use in examples in this chapter consists of four months of the New York Times newswire: from August through November of 1990. This corpus has about 115 megabytes of text and roughly 14 million words. Each approach will be applied to this corpus to make comparison easier. For most of the chapter, the New York Times examples will only be drawn from fixed two-word phrases (or bigrams). It is important to keep in mind, however, that we 
chose this pool for convenience only. In general, both fixed and variable word combinations can be collocations. Indeed, the section on mean and variance looks at the more loosely connected type.
5.1 Frequency
Surely the simplest method for finding collocations in a text corpus is counting. If two words occur together a lot, then that is evidence that they have a special function that is not simply explained as the function that results from their combination.
Predictably, just selecting the most frequently occurring bigrams is not very interesting, as is shown in Table 5.1. The table shows the bigrams (sequences of two adjacent words) that are most frequent in the corpus and their frequency. Except for New York, all the bigrams are pairs of function words.
There is, however, a very simple heuristic that improves these results a lot (Justeson and Katz 1995b): pass the candidate phrases through a part-of-speech filter which only lets through those patterns that are likely to be "phrases". Justeson and Katz (1995b: 17) suggest the patterns in Table 5.2. Each is followed by an example from the text that they use as a test set. In these patterns A refers to an adjective, P to a preposition, and N to a noun.
Table 5.3 shows the most highly ranked phrases after applying the filter. The results are surprisingly good. There are only 3 bigrams that we would not regard as non-compositional phrases: last year, last week, and first time.
w             C(strong, w)    w            C(powerful, w)
support       50              computers    10
sales         21              men          8
showing       18              man          7
message       15              military     6
gains         13              country      6
criticism     13              post         5
feelings      11              nation       5
challenges    11              chip         5
case          11              senators     4
signal        9               magnet       4
Table 5.4 The nouns w occurring most often in the patterns "strong w" and "powerful w".
However, searching the larger corpus of the World Wide Web we find 799 examples of strong tea and 17 examples of powerful tea (the latter mostly in the computational linguistics literature on collocations), which indicates that the correct phrase is strong tea.
Justeson and Katz' method of collocation discovery is 
instructive in that it demonstrates an important point. A simple quantitative technique (the frequency filter in this case) combined with a small amount of linguistic knowledge (the importance of parts of speech) goes a long way. In the rest of this chapter, we will use a stop list that excludes words whose most frequent tag is not a verb, noun or adjective.
Exercise 5-1
Add part-of-speech patterns useful for collocation discovery to Table 5.2, including patterns longer than two tags.
Sentence: Stocks crash as rescue plan teeters
Bigrams:
stocks crash, stocks as, stocks rescue
crash as, crash rescue, crash plan
as rescue, as plan, as teeters
rescue plan, rescue teeters
plan teeters
Figure 5.1 Using a three word collocational window to capture bigrams at a distance.
Exercise 5-2
Pick a document in which your name occurs (an email, a university transcript or a letter). Does Justeson and Katz's filter identify your name as a collocation?
Exercise 5-3
We used the World Wide Web as an auxiliary corpus above because neither strong tea nor powerful tea occurred in the New York Times. Modify Justeson and Katz's method so that it uses the World Wide Web as a resource of last resort.
5.2 Mean and Variance
Frequency-based search works well for fixed phrases. But many collocations consist of two words that stand in a more flexible relationship to one another. Consider the verb knock and one of its most frequent arguments, door. Here are some examples of knocking on or at a door from our corpus:
(5.1)
a. she knocked on his door
b. they knocked at the door
c. 100 women knocked on Donaldson's door
d. a man knocked on the metal front door
The words that appear between knocked and door vary and the distance between the two words is not constant, so a fixed phrase approach would not work here. But there is enough regularity in the patterns to allow us to determine that knock is the right verb to use in English for this situation, not hit, beat or rap.
A short note is in order here on collocations that occur as a fixed 
phrase versus those that are more variable. To simplify matters we only look at fixed phrase collocations in most of this chapter, and usually at just bigrams. But it is easy to see how to extend techniques applicable to bigrams to bigrams at a distance. We define a collocational window (usually a window of 3 to 4 words on each side of a word), and we enter every word pair in there as a collocational bigram, as in Figure 5.1. We then proceed to do our calculations as usual on this larger pool of bigrams.
However, the mean and variance based methods described in this section by definition look at the pattern of varying distance between two words. If that pattern of distances is relatively predictable, then we have evidence for a collocation like knock ... door that is not necessarily a fixed phrase. We will return to this point and a more in-depth discussion of what a collocation is towards the end of this chapter.
One way of discovering the relationship between knocked and door is to compute the mean and variance of the offsets (signed distances) between the two words in the corpus. The mean is simply the average offset. For the examples in (5.1), we compute the mean offset between knocked and door as follows:
$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$
The variance measures how much the individual offsets deviate from the mean. We estimate it as:
(5.2) $s^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$
where $n$ is the number of times the two words co-occur, $d_i$ is the offset for co-occurrence $i$, and $\bar{d}$ is the mean. If the offset is the same in all cases, then the variance is zero. If the offsets are randomly distributed (which will be the case for two words which occur together by chance, but not in a particular relationship), then the variance will be high. As is customary, we use the standard deviation $s = \sqrt{s^2}$, the square root of the variance, to assess how variable the offset between the two words is. A low standard deviation means that the two words usually occur at about the same distance. Zero standard deviation means that the two words always occur at exactly the same distance.
We can also explain the information that variance gets at in terms of peaks in the distribution of one word with respect to another. Figure 5.2 shows the three cases we are interested in. The 
distribution of strong with respect to opposition has one clear peak at position $-1$ (corresponding to the phrase strong opposition). Therefore the variance of strong with respect to opposition is small. The mean indicates that strong usually occurs at position $-1$ (disregarding the noise introduced by one occurrence at a more distant position).
We have restricted positions under consideration to a window of size 9 centered around the word of interest. This is because collocations are essentially a local phenomenon. Note also that we always get a count of 0 at position 0 when we look at the relationship between two different words. This is because, for example, strong cannot appear in position 0 in contexts in which that position is already occupied by opposition.
Moving on to the second diagram in Figure 5.2, the distribution of strong with respect to support is drawn out, with several negative positions having large counts. For example, the count of approximately 20 at position $-2$ is due to uses like strong leftist support and strong business support. Because of this greater variability we get a higher standard deviation and a mean that lies between positions $-1$ and $-2$.
Finally, the occurrences of strong with respect to for are more evenly distributed. There is a tendency for strong to occur before for (hence the negative mean), but it can pretty much occur anywhere around for.
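The offset statistics used in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `offsets` and `mean_and_std` are hypothetical, tokens are obtained by plain whitespace splitting (so "Donaldson's" counts as a single token, which may give different offsets from a tokenizer that splits the possessive), and the window of 5 positions is an assumption chosen so that the longest example in (5.1) is covered.

```python
import math

def offsets(tokens, focal, collocate, window=5):
    """Signed distances from each `focal` token to every `collocate`
    token that lies within `window` positions on either side."""
    found = []
    for i, token in enumerate(tokens):
        if token != focal:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i and tokens[j] == collocate:
                found.append(j - i)  # positive: collocate follows focal word
    return found

def mean_and_std(ds):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(ds)
    if n == 0:
        raise ValueError("no co-occurrences found")
    mean = sum(ds) / n
    if n == 1:
        return mean, 0.0
    variance = sum((d - mean) ** 2 for d in ds) / (n - 1)
    return mean, math.sqrt(variance)

# The four examples from (5.1), whitespace-tokenized (an assumption).
sentences = [
    "she knocked on his door",
    "they knocked at the door",
    "100 women knocked on Donaldson's door",
    "a man knocked on the metal front door",
]
all_offsets = []
for sentence in sentences:
    all_offsets.extend(offsets(sentence.split(), "knocked", "door"))

mean, std = mean_and_std(all_offsets)
print(all_offsets, mean, std)
```

A pair whose offsets cluster tightly yields a small standard deviation; a pair that co-occurs only by chance spreads its offsets across the window and yields a large one.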
The high standard deviation indicates this randomness. This indicates that for and strong don't form interesting collocations.
The word pairs in Table 5.5 indicate the types of collocations that can be found by this approach. If the mean is close to 1 and the standard deviation low, as is the case for New York, then we have the type of phrase that Justeson and Katz' frequency-based approach will also discover. If the mean is much greater than 1, then a low standard deviation indicates an interesting phrase. The pair previous/games (distance 2) corresponds to phrases like in the previous 10 games or in the previous 15 games; minus/points corresponds to phrases like minus 2 percentage points, minus 3 percentage points etc.; hundreds/dollars corresponds to hundreds of billions of dollars and hundreds of millions of dollars.
High standard deviation indicates that the two words of the pair stand in no interesting relationship, as demonstrated by the four high-variance examples in Table 5.5.
[Figure 5.2: three histograms of the frequency of strong at positions -4 to 4. Panels: position of strong with respect to opposition; position of strong with respect to support; position of strong with respect to for.]
[Table 5.5: standard deviation, mean offset, and count for word pairs such as New/York, previous/games, minus/points, and hundreds/dollars.]
In sum, the mean and variance approach is suited to collocations of words that are in a looser relationship than fixed phrases and that are variable with respect to intervening material and relative position.
5.3 Hypothesis Testing
One difficulty that we have glossed over so far is that high frequency and low variance can be accidental. If the two constituent words of a frequent bigram like new companies are frequently occurring words (as new and companies are), then we expect the two words to co-occur a lot just by chance, even if they do not form a collocation.
What we really want to know is whether two words occur together more often than chance. Assessing whether or not something is a chance event is 
one of the classical problems of statistics. It is usually couched in terms of hypothesis testing. We formulate a null hypothesis $H_0$ that there is no association between the words beyond chance occurrences, compute the probability $p$ that the event would occur if $H_0$ were true, and then reject $H_0$ if $p$ is too low (typically if beneath a significance level of $p < 0.05$, $0.01$, $0.005$, or $0.001$) and retain $H_0$ as possible otherwise.
It is important to note that this is a mode of data analysis where we look at two things at the same time. As before, we are looking for particular patterns in the data. But we are also taking into account how much data we have seen. Even if there is a remarkable pattern, we will discount it if we haven't seen enough data to be certain that it couldn't be due to chance.
How can we apply the methodology of hypothesis testing to the problem of finding collocations? We first need to formulate a null hypothesis which states what should be true if two words do not form a collocation. For such a free combination of two words we will assume that each of the words $w^1$ and $w^2$ is generated completely independently of the other, and so their chance of coming together is simply given by:
$P(w^1 w^2) = P(w^1) P(w^2)$
The model implies that the probability of co-occurrence is just the product of the probabilities of the individual words. As we discuss at the end of this section, this is a rather simplistic model, and not empirically accurate, but for now we adopt independence as our null hypothesis.
5.3.1 The t test
Next we need a statistical test that tells us how probable or improbable it is that a certain constellation will occur. A test that has been widely used for collocation discovery is the $t$ test. The $t$ test looks at the mean and variance of a sample of measurements, where the null hypothesis is that the sample is drawn from a distribution with mean $\mu$. The test looks at the difference between the observed and expected means, scaled by the variance of the data, and tells us how likely one is to get a sample of that mean and 
variance (or a more extreme mean and variance) assuming that the sample is drawn from a normal distribution with mean $\mu$. To determine the probability of getting our sample (or a more extreme sample), we compute the $t$ statistic:
$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$
where $\bar{x}$ is the sample mean, $s^2$ is the sample variance, $N$ is the sample size, and $\mu$ is the mean of the distribution.
If you look up the value of $t$ that corresponds to a confidence level of $\alpha = 0.005$, you will find $2.576$. Since the $t$ we got is larger than $2.576$, we can reject the null hypothesis with 99.5% confidence. So we can say that the sample is not drawn from a population with mean 158 cm, and our probability of error is less than 0.5%.
To see how to use the $t$ test for finding collocations, let us compute the $t$ value for new companies. What is the sample that we are measuring the mean and variance of? There is a standard way of extending the $t$ test for use with proportions or counts. We think of the text corpus as a long sequence of $N$ bigrams, and the samples are then indicator random variables that take on the value 1 when the bigram of interest occurs, and are 0 otherwise.
Using maximum likelihood estimates, we can compute the probabilities of new and companies as follows. In our corpus, new occurs 15,828 times, companies 4,675 times, and there are 14,307,668 tokens overall.
$P(\text{new}) = \frac{15828}{14307668}$
$P(\text{companies}) = \frac{4675}{14307668}$
The null hypothesis is that occurrences of new and companies are independent.
$H_0: P(\text{new companies}) = P(\text{new}) P(\text{companies}) = \frac{15828}{14307668} \times \frac{4675}{14307668} \approx 3.615 \times 10^{-7}$
If the null hypothesis is true, then the process of randomly generating bigrams of words and assigning 1 to the outcome new companies and 0 to any other outcome is in effect a Bernoulli trial with $p = 3.615 \times 10^{-7}$ for the probability of new companies turning up. The mean for this distribution is $\mu = 3.615 \times 10^{-7}$ and the variance is $\sigma^2 = p(1 - p)$ (see Section 2.1.9), which is approximately $p$. The approximation holds since for most bigrams $p$ is small.
It turns out that there are actually 8 occurrences of new companies among the 14,307,668 bigrams in our corpus. So, for the sample, we have that the sample mean is:
$\bar{x} = \frac{8}{14307668} \approx 5.591 \times 10^{-7}$
Applying the $t$ test with these values gives:
$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}} \approx \frac{5.591 \times 10^{-7} - 3.615 \times 10^{-7}}{\sqrt{5.591 \times 10^{-7} / 14307668}} \approx 0.9999$
This $t$ value is not larger than $2.576$, so we cannot reject the null hypothesis that new and companies occur independently.
Table 5.6 Finding collocations: The $t$ test applied to 10 bigrams that occur with frequency 20.
Table 5.6 shows $t$ values for ten bigrams that occur exactly 20 times in the corpus. For the 
top five bigrams, we can reject the null hypothesis that the component words occur independently for $\alpha = 0.005$, so these are good candidates for collocations. The bottom five bigrams fail the test for significance, so we will not regard them as good candidates for collocations.
Note that a frequency-based method would not be able to rank the ten bigrams since they occur with exactly the same frequency. Looking at the counts in Table 5.6, we can see that the $t$ test takes into account the number of co-occurrences of the bigram ($C(w^1 w^2)$) relative to the frequencies of the component words. If a high proportion of the occurrences of both words (Ayatollah Ruhollah, videocassette recorder) or at least a very high proportion of the occurrences of one of the words (unsalted) occurs in the bigram, then its $t$ value is high. This criterion makes intuitive sense.
Unlike most of this chapter, the analysis in Table 5.6 includes some stop words; without stop words, it is actually hard to find examples that fail significance. It turns out that most bigrams attested in a corpus occur significantly more often than chance. For 824 out of the 831 bigrams that occurred 20 times in our corpus the null hypothesis of independence can be rejected. But we would only classify a fraction as true collocations. The reason for this surprisingly high proportion of possibly dependent bigrams ($824/831 \approx 0.99$) is that the independence assumption is a poor model of language: text is far more regular than a sequence of independently generated words.
The $t$ test and other statistical tests are most useful as a method for ranking collocations. The level of significance itself is less useful. In fact, in most publications that we cite in this chapter, the level of significance is never looked at. All that is used is the scores and the resulting ranking.
5.3.2 Hypothesis testing of differences
The $t$ test can also be used for a slightly different collocation discovery problem: to find words whose co-occurrence patterns best distinguish between two words. For example, in computational lexicography we may want to find the words that best differentiate the meanings of strong and powerful. This use of the $t$ test was suggested by Church and 
Hanks (1989). Table 5.7 shows the ten words that occur most significantly more often with powerful than with strong (first ten words) and most significantly more often with strong than with powerful (second set of ten words).
The $t$ scores are computed using the following extension of the $t$ test to the comparison of the means of two normal populations:
(5.4) $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Here the null hypothesis is that the average difference is 0 ($\mu = 0$), so we have $\bar{x} - \mu = \bar{x} = \bar{x}_1 - \bar{x}_2$. Writing $w$ for the collocate and $v^1$, $v^2$ for the two words being compared, and using the Bernoulli approximation $s^2 \approx p$ with both samples of size $N$, the statistic simplifies to:
$t \approx \frac{C(v^1 w) - C(v^2 w)}{\sqrt{C(v^1 w) + C(v^2 w)}}$
where $C(x)$ is the number of times $x$ occurs in the corpus.
t         C(w)    C(strong w)    C(powerful w)    word
4.6904    986     22             0                safety
7.0710    *       *              *                support
6.3257    3616    58             7                enough
4.5825    3741    21             0                sales
4.024*    *       *              *                opposition
3.9000    802     18             1                showing
3.9000    1641    18             1                sense
3.7416    2501    14             0                defense
3.6055    851     13             0                gains
3.6055    832     13             0                criticism
Table 5.7 Words that occur significantly more often with powerful (the first ten words) and strong (the last ten words).
The application suggested by Church and Hanks (1989) for this form of the $t$ test was lexicography. The data in Table 5.7 are useful to a lexicographer who wants to write precise dictionary entries that bring out the difference between strong and powerful. Based on significant collocates, Church and Hanks analyze the difference as a matter of intrinsic vs. extrinsic quality. For example, strong support from a demographic group means that the group is very committed to the cause in question, but the group may not have any power. So strong describes an intrinsic quality. Conversely, a powerful supporter is somebody who actually has the power to move things.
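The two uses of the $t$ test in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the one-sample version uses the Bernoulli variance $\bar{x}(1 - \bar{x})$ of the indicator variable, and the difference version uses the simplified form $(C(v^1 w) - C(v^2 w)) / \sqrt{C(v^1 w) + C(v^2 w)}$, which reproduces the surviving rows of Table 5.7.

```python
import math

def t_score_bigram(c1, c2, c12, n):
    """One-sample t score for the bigram w1 w2 under the independence
    null hypothesis. c1, c2: counts of w1 and w2; c12: count of the
    bigram; n: number of bigrams (tokens) in the corpus."""
    x_bar = c12 / n                  # observed bigram probability
    mu = (c1 / n) * (c2 / n)         # expected probability under H0
    s2 = x_bar * (1.0 - x_bar)       # Bernoulli variance, roughly x_bar
    return (x_bar - mu) / math.sqrt(s2 / n)

def t_score_difference(c_v1_w, c_v2_w):
    """Simplified t score comparing how often a word w co-occurs with
    v1 versus v2, given only the two co-occurrence counts."""
    total = c_v1_w + c_v2_w
    if total == 0:
        return 0.0
    return (c_v1_w - c_v2_w) / math.sqrt(total)

# Counts quoted in the chapter for "new companies".
t_new_companies = t_score_bigram(15828, 4675, 8, 14307668)

# Rows of Table 5.7: (C(strong w), C(powerful w)).
t_safety = t_score_difference(22, 0)
t_enough = t_score_difference(58, 7)

print(round(t_new_companies, 4), round(t_safety, 4), round(t_enough, 4))
```

Because the difference score depends only on the two co-occurrence counts, it is cheap to compute for every candidate collocate and is used purely for ranking.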
Many of the collocates we found in our corpus support Church and Hanks' analysis. But there is more complexity to the difference in meaning between the two words since what is extrinsic and intrinsic can depend on subtle matters like cultural attitudes. For example, we talk about strong tea but about powerful drugs.
5.3.3 Pearson's chi-square test
                    w1 = new               w1 ≠ new
w2 = companies      8 (new companies)      4667 (e.g., old companies)
w2 ≠ companies      15820                  14287181
(5.6) $X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where $i$ ranges over rows of the table, $j$ ranges over columns, $O_{ij}$ is the observed value for cell $(i, j)$ and $E_{ij}$ is the expected value. One can show that the quantity $X^2$ is asymptotically $\chi^2$ distributed. In other words, if the numbers are large, then $X^2$ has a $\chi^2$ distribution. We will return to the issue of how good this approximation is later.
The expected frequencies $E_{ij}$ are computed from the marginal probabilities, that is from the totals of the rows and columns converted into proportions. For example, the expected frequency for cell $(1, 1)$ (new companies) would be the marginal probability of new occurring as the first part of a bigram times the marginal probability of companies occurring as the second part of a bigram (multiplied by the number of bigrams in the corpus):
$E_{11} = \frac{8 + 4667}{N} \times \frac{8 + 15820}{N} \times N \approx 5.2$
That is, if new and companies occurred completely independently of each other we would expect 5.2 occurrences of new companies on average for a text of the size of our corpus.
The $\chi^2$ test can be applied to tables of any size, but it has a simpler form for 2-by-2 tables (see Exercise 5-9):
$\chi^2 = \frac{N (O_{11} O_{22} - O_{12} O_{21})^2}{(O_{11} + O_{12})(O_{11} + O_{21})(O_{12} + O_{22})(O_{21} + O_{22})}$
For the table above this gives $\chi^2 \approx 1.55$. Looking up the $\chi^2$ distribution in the appendix, we find that at a probability level of $\alpha = 0.05$ the critical value is $\chi^2 = 3.841$ (the statistic has one degree of freedom for a 2-by-2 table). So we cannot reject the null hypothesis that new and companies occur independently of each other. Thus new companies is not a good candidate for a collocation.
This result is the same as we got with the $t$ statistic. In general, for the problem of finding collocations, the differences between the $t$ statistic and the $\chi^2$ statistic do not seem to be large. For example, the 20 bigrams with the highest $t$ scores in our corpus are also the 20 bigrams with the 
highest scores.However,the test is also appropriate for large probabilities,for which the normality assumption of the test fails.This is perhaps the reason that the test has been applied to a wider range of problems in collocation discovery.1605Collocationsvache8570934Table5.9Correspondence of vache and cow in an aligned corpus.By applying thetest to this table one can determine whether vache and cow are translations ofeach other.word150076word35.They actually use a measure they call,which is multiplied by.They do this sincethey are only interested in ranking translation pairs,so that assessment of significance is notimportant.5.3Hypothesis Testing161out of bigrams areout of bigrams areTable5.11How to compute Dunning’s likelihood ratio test.For example,thelikelihood of hypothesis is the product of the last two lines in the rightmostcolumn.Just as application of the test is problematic because of the underlyingnormality assumption,so is application of in cases where the numbersin the2-by-2table are small.Snedecor and Cochran(1989:127)adviseagainst using if the total sample size is smaller than20or if it is between20and40and the expected value in any of the cells is5or less.In general,the test as described here can be inaccurate if expected cell values are small(Read and Cressie1988),a problem we will return to below.5.3.4Likelihood RatiosLikelihood ratios are another approach to hypothesis testing.We will seebelow that they are more appropriate for sparse data than the test.Butthey also have the advantage that the statistic we are computing,a likelihood LIKELIHOOD RATIOratio,is more interpretable than the statistic.It is simply a number thattells us how much more likely one hypothesis is than the other.In applying the likelihood ratio test to collocation discovery,we examinethe following two alternative explanations for the occurrence frequency ofa bigram(Dunning1993):Hypothesis1.Hypothesis2.Hypothesis1is a formalization of independence(the occurrence of 
isindependent of the previous occurrence of),Hypothesis2is a formaliza-tion of dependence which is good evidence for an interesting collocation.6We use the usual maximum likelihood estimates for,and andwrite,,and for the number of occurrences of,and in。
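The two tests discussed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the counts in the usage lines are made up, and the likelihoods assume the binomial form L(k, n, x) = x^k (1 − x)^(n − k) with the maximum likelihood estimates p = c2/N, p1 = c12/c1 and p2 = (c2 − c12)/(N − c1).

```python
import math


def chi_square_2x2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Pearson's X^2 for a 2-by-2 contingency table (closed form)."""
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


def _log_l(k: int, n: int, x: float) -> float:
    """log of x^k * (1 - x)^(n - k), treating 0 * log(0) as 0."""
    result = 0.0
    if k > 0:
        result += k * math.log(x)
    if n - k > 0:
        result += (n - k) * math.log(1.0 - x)
    return result


def log_likelihood_ratio(c1: int, c2: int, c12: int, n: int) -> float:
    """Return -2 log(lambda) for the bigram w1 w2.

    c1, c2, c12: counts of w1, w2 and the bigram; n: number of bigrams.
    Requires 0 < c1 < n and c12 <= min(c1, c2).
    """
    p = c2 / n                    # Hypothesis 1: one shared probability
    p1 = c12 / c1                 # Hypothesis 2: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)    # Hypothesis 2: P(w2 | not w1)
    log_h1 = _log_l(c12, c1, p) + _log_l(c2 - c12, n - c1, p)
    log_h2 = _log_l(c12, c1, p1) + _log_l(c2 - c12, n - c1, p2)
    return -2.0 * (log_h1 - log_h2)


CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

# Hypothetical counts, for illustration only.
x2 = chi_square_2x2(10, 20, 30, 40)
llr = log_likelihood_ratio(c1=100, c2=200, c12=80, n=1000)
print(f"X^2 = {x2:.4f}, reject independence: {x2 > CRITICAL_VALUE}")
print(f"-2 log lambda = {llr:.1f}, reject independence: {llr > CRITICAL_VALUE}")
```

Both statistics are compared against the same critical value here because −2 log λ is also asymptotically χ² distributed; with sparse counts the likelihood ratio is usually the safer of the two.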
English Lexicology: Word Formation (PPT courseware)
November 18, 2023
Common suffixes 1: noun suffixes
Suffix   Examples
-an      Asian, American, Russian, African…
-ance    attendance, performance, assistance…
-ation   education, examination, pronunciation…
-dom     freedom, kingdom, wisdom, boredom…
Common suffixes 2: adjective suffixes
-en      woolen, golden, wooden, earthen…
-ent     different, dependent, existent, consistent…
-ic      realistic, poetic, historic, economic…
spy (to keep watch on)
lower (adj., lower)   lower (v., to lower)
Other word-formation processes: clipping
shortcoming
disorder   dislike
Common prefixes 3: prefixes with specific meanings
Prefix (meaning)          Examples
anti- (against)           antiwar, anticancer, antipollution…
auto- (automatic)         automatic, autotimer, auto-record…
bi- (two)                 bicycle, binoculars…
centi- (one hundredth)    centimeter, centigram, centigrade…
co- (together)            co-operate, co-edit, co-exist…
down- (downward)          downstairs, downhill, downwards…
ex- (former)              ex-wife, ex-lover, ex-husband…
fore- (front)             forehead, foreleg, forearm…
full- (complete)          full-time, full-speed, full-strength…
Lexicology, Chapter 1: The Basic Concepts of Words and Vocabulary (PPT)
1.4 Sound and Form
Task 1 Say the following words by yourself.
cough
thought
though
thorough
tough
through
Question: Why is there this disparity?
The internal reason; changes; borrowings
1.3 Sound and Meaning
In how many languages do you know the name of the animal in this picture?
Task 1 Say the name of the animal in
as many languages as you can.
content words and which are functional words? denote never and run notion upon seven Christmas have would
1.5.3 Native words & borrowed words
Task
Guess whether the statements are true or false.
non-basic vocabulary
Not all the words of the basic word stock have these features.
Non-basic vocabulary includes:
Terminology, Jargon, Slang, Argot, Dialectal words, Archaisms, Neologisms
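English Lexicology: Collocation in English (all-English PPT courseware)
Main contents
What is collocation? Motivation of collocation; basic types of collocation; features of collocation; common collocations with examples
What is collocation
1.1 Definition of collocation 1.2 Significance of collocation 1.3 Types of word combination
Definition of Collocation
A combination of two or more words that are habitually used together and treated as a single lexical item is called a collocation.
Significance
3. Collocation makes communication varied in form and concise in meaning.
For example, the collocations that make and do form with other words are both flexible in form and clear in meaning.
do the cooking
make a mistake
do the washing
make the bed
do your hair
make money
do your homework
make a call
do the shopping
make an arrangement
4. Collocation is open-ended and keeps pace with the times. E.g.:
bird flu, digital camera, knowledge economy, brain science, human cloning
Motivation of collocation
A. Grammatical motivation
a. Grammatical-structure meaning. E.g.: The fire was an unsuspected disaster to everyone.
b. Word-class functional meaning: in an unsuspected disaster, an acts as a determiner, disaster is the head word, and unsuspected is the modifier.
Basic Types of Collocation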
English Lexicology lecture slides (PPT)
B: Rapid growth of present-day English vocabulary and its causes
❖ Neologisms after World War II ❖ Reasons: ❖ 1. progress of science and technology
❖ This definition emphasizes syntax, but does not touch upon meaning.
Antoine Meillet
❖ “A word is defined by the association of a given sense with a given group of sounds capable of a given grammatical use.” (p.2, para.2 )
Bloomfield (American linguist and educator) and ❖ a French linguist, Antoine Meillet
Bloomfield
❖ “some linguistic forms, which we call bound forms are never used as sentences.
invaded by Angles, Saxons, Jutes
❖ Vocabulary: 5000-6000 words, chiefly Anglo-Saxon / some Old Norse
❖ Old Norse words (are, they, their, them, till, call, die, give, take, skin, window, ill, weak)
Academic English: Collocation (PPT courseware)
• _____ additional time to
• _____ comments • _____ a letter (to submit a letter)
• stiff • charge • evenly • go into effect • act • implement • volume • extend • accommodate • merit • develop • submit
• intensely worrying • awash with change • shape our lives for the better • the prime concern • a passing encounter • intense feelings • a mutual attachment • self-evident • a matter of chance/luck • have little/no say in…/have a say
2020/10/13
• living standards • economic difficulties • looking for a job • the final outcome • newspaper advertisements • job-hunting websites • university employment office • headhunters • job opportunities • job seekers • on-the-job training • switching industries • set up and run by the government
Text A
• standard of living • economic hardship • job search • the end result • newspaper ads • Internet job sites • university placement offices • headhunters • job openings • job candidates • on-the-job/in-service training • sectoral shifts • government-run
English Lexicology courseware, Chapters 1-6
1) Old English vocabulary (450-1150 AD)
After the Romans, three Germanic tribes called the Angles, Saxons and Jutes controlled England. Their language, Anglo-Saxon, also dominated the land. Common practice: combining two native words to create new words. It was a highly inflected language with about 50000-60000 words.
3) Modern English (1500-now). Two sub-periods can be distinguished: a. Early Modern English (1500-1700). Because of the Renaissance, many Latin and Greek words entered English, and English began to have a Latinate flavor.
• C. Productivity (can form new words) • D. Polysemy (various meanings, “book”; “man”: to man a dove) • E. Collocability (forms idioms, proverbs)
It has something to do with the following subjects:
Collocational Relations of Words (PPT courseware)
Collocational range:
semantic compatibility
He killed three bottles of whisky in a week. (He drank three bottles of whisky in a week.) I've been racking my brains to think of some way to kill time. (to pass the time) He was so worried about the exam that he read the book 20 times. Personally, I think that is overkill. (He was overdoing it out of nervousness.)
1. strong man (a physically powerful person) 2. strong wind (a high wind) 3. strong majority (an overwhelming majority) 4. strong demand (great demand) 5. strong mind (a sound mind) 6. strong rope (a sturdy rope) 7. strong will (a firm will)
Collocation
15. strong cheese (cheese with a pungent smell) 16. strong solution (a concentrated solution) 17. strong opponent (a formidable opponent) 18. strong economy (a robust economy) 19. strong eyes (keen eyesight) 20. strong language (abusive language)
Selection restrictions: 1. A week elapsed. 2. The idea frightened the man.
Error correction:
1.My ideal job is a teacher. 2.This year will produce more grain than
An Informal Talk on Learning English Vocabulary (PPT teaching courseware)
1) Pronunciation (s)
2020/12/10
compete competition competitive competitor competent
analyze analysis analytical analyst
molecule molecular mechanic mechanical mechanism
survey console contact contract attribute addict row bow
sword Warwick debris debut mortgage plumber Maginot buffet
seeds
damp: used about sth. you would like to be dry (damp clothes/bed/room)
humid = hot and damp: a technical word used to describe climate or weather
the dog
be a far cry from…=be totally different from… sound beating // a sound idea // a sound sleeper // sound and safe culture succeed industry respect stand contain assume reduce promise honor liquid funds provincial: a provincial attitude shows you are unwilling to accept new
Part6-collocation
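• Every word has a collocational range, and the difference between two synonyms or near-synonyms can often be made clear by describing their respective collocations.
E.g.
develop and expand (both rendered by the same verb, 发展, in Chinese). We can say: a developing country
develop the economy and ensure supplies; to expand revolutionary forces
(to start a car; English does not say begin the car)
see red (become very angry)   see stars (see flashes of light, as after a blow to the head)
see the light (come to understand)
3) adj. + n.
heavy traffic   strong tea
a powerful car (a car with a large engine; not “strong car”)
black tea (literally “red tea” in Chinese)   back number (an out-of-date issue of a magazine)
e.g. Collocations of eye with prepositions
before one's eyes (right in front of one)
in the eyes of (in the opinion of)
to the eye (judging by appearances)
under one's eye (in someone's presence)   with all one's eye (watching with full attention)
Associative memory and collocation
Onomatopoeia
A lion roared. A dog barked. A cat mews. A rooster crows. A bird chirps.
2) v. + object n.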
bite one’s nails
English Lexicology, Chapter 9: Collocative Meaning
Mary is very (really, quite) able.
Mary is a very (really, quite) able student.
*Mary is perfectly (well, totally) able.
*Mary is a perfectly (well, totally) able student.
able: When “able” is used predicatively, or when it modifies a noun, it can be collocated with “very, really, quite”, but not with “perfectly, well, totally”, e.g.
The differences in collocations represent the differences in collocative meaning.
If we say that pretty and handsome give rise to associations with different kinds of beauty, then what kinds of “bigness” do “good, strong and high” convey in our minds?
See Leech, Semantics: The Study of Meaning, pp. 12-13.
He cites “woman” as an example: in the past, woman has been burdened with such attributes as “frail, prone to tears, cowardly, emotional, irrational, inconstant, as well as gentle, compassionate, sensitive, hard working, etc.” All these have formed part of the connotations of the word “woman”. According to his theory of connotation, we can conclude that the differences between “pretty” and “handsome” lie in the different associations that they give rise to in users' minds.
Glossary of Corpus Linguistics Terms
Co-text: 共文
DDL/Data Driven Learning: 数据驱动学习
Diachronic corpus: 历时语料库
Discourse: 话语、语篇
Discourse prosody: 话语韵律
Documentation: 备检文件、文检报告
EAGLES/Expert Advisory Groups on Language Engineering Standards
Mini-text: 微型文本
Misuse: 误用
Monitor corpus: (动态)监察语料库
Monolingual corpus: 单语语料库
Multilingual corpus: 多语语料库
Multimodal corpus: 多模态语料库
MWU/Multiword unit: 多词单位
MWE/Multiword expression
习语原则
Index/Indexing: (建)索引
In-line annotation: 文内标注、行内标注
Key keyword: 关键主题词
Keyness: 主题性、关键性
Keyword: 主题词
KWIC/Key Word in Context: 语境中的关键词、语境共现(方式)
Learner corpus: 学习者语料库
EAGLES文本规格
Empirical Linguistics: 实证语言学
Empiricism: 经验主义
Encoding: 字符编码
Error-tagging: 错误标注、错误赋码
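Definition of the term collocation
1. What is collocation?
In linguistics, collocation refers to the phenomenon in which two or more adjacent words frequently occur together in a language.
These words stand in a fixed combinatory relationship; in a text they are usually closely linked and form set combinations.
Such collocations may be temporary, geometric, or confined to particular domains of the language.
Collocation helps make language use more natural and accurate, so it is widely applied in language teaching and corpus research.
2. Forms and characteristics of collocation
1. Fixed combinatory relationship
The word combinations in a collocation are usually fixed and cannot be substituted at will; otherwise the original meaning and style change.
In pragmatics, this fixed combinatory relationship is regarded as an important feature of language acquisition.
2. Varied forms
Collocation does not only mean two adjacent words; it can also take the form of continuous phrases, idioms, set expressions and so on.
These forms vary, but all of them have fixed grammatical and semantic relations.
3. Pragmatic completeness
The combinatory relationship in a collocation is not purely grammatical; it rests largely on pragmatics and reflects the habits and norms of language use.
Collocation therefore plays an important role in communication.
3. Areas of application
1. Language teaching
Collocation has important practical value in language teaching.
Teachers can guide students to learn collocations in order to improve their expressive ability and fluency.
By learning common collocations, students can master the skills of language use more quickly and improve their speaking and writing.
2. Translation and language research
Collocation also plays an important role in translation and language research.
Translators need an accurate grasp of the meaning and conventional use of collocations in order to avoid inappropriate or stilted expressions.
Language researchers can use collocation analysis to reveal the regularities and characteristics of linguistic phenomena.
3. Writing and rhetoric
The use of collocation is also crucial in writing and rhetoric.
Writers and rhetoricians can use collocations appropriately to make their writing more vivid and expressive, and their work more appealing and moving.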
Lexicology: Collocation
Break the vase/cup/window Break the law/a contract/promise Break heart/a man/spirit
Fat pig/pork/lands/income/kitchen
He kicked the bucket out of the way. He kicked the bucket last night.
Unrestricted (free phrase) Restricted
D. Range selection restriction
Significance
A. Generating dictionary B. Lexical cohesion
Wear a coat   Wear a watch   Wear perfume
Central collocation Medial collocation Peripheral collocation
A. Degree of closeness
A fat kitchen makes a lean will. —B. Franklin
Collocation
Types of Collocations
Degree of closeness
Central collocation Medial collocation Peripheral collocation
Collocation
Function
Grammatical/Syntactic collocation Lexical collocation
Lexicology: Collocation (PPT courseware)
Lean person Lean season
Poor Richard's Almanac
be interested in big apple
Grammatical/Syntactic collocation Lexical collocation
B. Function
Come out
A glass of
C. Predictability of collocates D. Distinctiveness of polysemics
E. Accuracy of synonyms
Lexical cohesion
Co-occurrence of Collocation Reiteration
By Halliday and Hasan (1976)
Quality of the node (head word)
Verb collocation Noun collocation Adjective collocation…
Range selection restriction
Unrestricted collocation Semi-restricted collocation Restricted/frozen collocation
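Colligation, Collocation and Chunks in English Vocabulary Teaching
Pu Jianzhong: Colligation, Collocation and Chunks in English Vocabulary Teaching
… important, and they must be answered. So far, the best answers to the latter two questions have come from Sinclair and Renouf (1998) and Willis (1990). On the basis of extensive corpus research, Sinclair and Renouf (1998: 148) set out the rationale and outline of a lexical syllabus, stating explicitly that English teaching should focus on: 1) the commonest word forms in the language; 2) the central patterns of usage of these word forms; 3) their typical combinations. This view differs from Richards's account: it not only identifies the key components of vocabulary knowledge (note that they are closely related to the three most important points in Richards's claims summarized above), but also integrates them into an organic whole. Building on this lexical syllabus, Willis (1990) went on to propose a more concrete way of implementing it, giving prominence to a task-driven approach to vocabulary teaching.
1 many way to know our society. We can reach to that through the medium such as
2 s mother let him run to her. He can't
Table 11. Distribution of the verb reach across its colligations
Colligation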
Be aware of
Do homework Developed country Be curious about
V
N
Adj
C. Quality of the node (head word)
decide on a boat (make a decision while on a boat)
decide on a boat (decide to buy a boat)
Unrestricted (free phrase) Semi-restricted
E. Accuracy of synonyms
fat   plump   obese   stout   collocates
v     v       v       v       man
v     v       v       v       woman
v     v       x       x       baby
v     v       x       v       legs
v     v       x       v       fingers
v     v       x       x       chicken
v     x       x       x       apple
Thank you!
Suddenly they became the parents of triplets, two girls and a boy.
E. Accuracy of synonyms
Break the vase/cup/window Break the law/a contract/promise Break heart/a man/spirit
Fat pig/pork/lands/income/kitchen