Linguistics (Corpora)
Specialized corpora: useful for English for specific purposes. If we need to find out what language is used in a certain profession, we select texts from that profession.
Sample corpora: classification of genres; a large number of short extracts; random selection of extracts within genres; great internal validity.
Monitor corpora: gigantic, ever-moving stores of text. A monitor corpus has the capacity to hold a 'state of the language' for research purposes.
Literary vs. ordinary language
Typical vs. atypical (non-standard) language
Types of corpora
General corpora: useful for language research as a whole. A general reference corpus is not a collection of material from different specialist areas – technical, dialectal, juvenile, etc. It is a collection of material which is broadly homogeneous, but which is gathered from a variety of sources, so that the individuality of a source is obscured, unless the researcher isolates a particular text.
What is a corpus?
A corpus is a collection of linguistic data, compiled either as written texts or as a transcription of recorded speech. The main purpose of a corpus is to verify a hypothesis about language, for example to determine how the usage of a particular sound, word, or syntactic construction varies.
Pragmatic information
Information from corpora can tell us how language is actually used in communication.
How can we make use of corpora?
The usual method is referred to as concordancing. A concordance is a collection of the occurrences of a word-form, each in its own textual environment. Concordances are usually made from corpora, and concordancing is at the heart of corpus work.
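As a rough illustration of "each occurrence in its own textual environment", the sketch below prints keyword-in-context lines for one word-form. The function name `concordance`, the `width` parameter, and the sample text are invented for this example; real concordancers work on much larger corpora and offer sorting and filtering that this sketch omits.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Invented sample text, standing in for a corpus.
sample = (
    "We made an effort to finish early. "
    "It takes real effort to learn collocations. "
    "Every effort was made to check the data."
)

for line in concordance(sample, "effort"):
    print(line)
```

Aligning the keyword in a central column is what lets a reader scan the left and right contexts for recurring patterns.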
Context and co-text information
Context: the situational environment.
Co-text: the linguistic environment.
It is sometimes very difficult to tell the difference between two words or phrases with similar meanings. However, if we look at the context and co-text in which they are used, the difference becomes clear.
Factors in a corpus
The size of texts selected
The types of texts selected
The criteria for selection of texts for corpora
Spoken vs. written language
Formal vs. informal language
Good afternoon
Outline
Corpus linguistics
Language teaching
Objective
Corpora are the main knowledge base in corpus linguistics.
Corpus linguistics deals with the principles and practice of using corpora (the usual plural of corpus) in language study.
A fourth area of activity, which has been among the most innovative outcomes of the corpus revolution, has been the exploitation of corpus-based linguistic description in a variety of applications, such as language learning and teaching, and natural language processing by machine, including speech recognition and translation.
Collocation and phraseology information
It is usually difficult for second and foreign language learners to learn which words are frequently used together, so this kind of information helps a lot. For example: make effort or take effort? A corpus search will answer the question.
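A corpus search of this kind can be sketched as counting the words that appear just before a node word. The function `left_collocates`, the `window` size, and the four sample sentences are assumptions made for illustration; the counts describe only this toy text, not real English usage, and the sketch does no lemmatization (so "made" and "make" would be counted separately).

```python
import re
from collections import Counter

def left_collocates(text, node, window=3):
    """Count the words found within `window` tokens to the left of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - window):i])
    return counts

# Invented sample text, standing in for a corpus.
sample = (
    "She will make an effort to attend. "
    "They make every effort to help. "
    "Please make an effort. "
    "It can take some effort to adjust."
)

counts = left_collocates(sample, "effort")
print("make:", counts["make"], "take:", counts["take"])
```

On a real corpus the same comparison would be run over millions of words, and the raw counts would normally be checked against the concordance lines before drawing a conclusion.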
What uses can we make of corpora?
Frequency information
Why do we need frequency information? Corpora can tell us how frequently certain language items or structures are used. This kind of information is useful when we try to select what to teach, select what to focus on, and decide what senses to focus on in the language classroom.
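Frequency information of this kind can be approximated with a simple word count. The sketch below assumes a very crude tokenizer (lowercased runs of letters and apostrophes) and an invented sample sentence; the function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercased word-form occurs in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Invented sample text, standing in for a corpus.
sample = "The cat sat on the mat. The dog sat on the log."

freq = word_frequencies(sample)
# List the three most frequent word-forms with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

A teacher could use a list like this, built from a suitable corpus, to decide which words or senses deserve classroom time first.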
Types of corpus researchers
Work in corpus linguistics is currently associated with several quite different activities.
Scholars working in the field tend to be identified with one or more of them.
Grammar information
We usually refer to grammar books for grammatical information. However, what corpora show is far more complicated than what grammar books say about grammar. For example, information from corpora has shown that English has far more than three types of conditional.
What can a corpus do?
A corpus is a store of used language. It can rearrange that store so that observations of various kinds can be made.
A corpus represents a speaker's experience of language. It can re-order that experience so that it can be re-examined in different ways.
A third group of researchers consists of descriptive linguists, whose main concern has been to make use of computerized corpora to describe reliably the lexicon and grammar of languages, both the linguistic systems we use and our likely use of those systems.
A second group of researchers has been concerned with developing tools for the analysis of corpora.
This is the main task of researchers in computational linguistics.
The first group of researchers consists of corpus makers or compilers. These scholars are concerned with the design and compilation of corpora, the collection of texts, and their preparation and storage for later analysis.
The function of concordance
Concordances are frequently used as a tool in linguistics for the study of a text, for example:
1. comparing different usages of the same word;
2. analysing keywords;
3. analysing word frequencies;
4. finding and analysing phrases and idioms;
5. creating indexes and word lists (also useful for publishing).
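The last use in the list above, creating an index, can be sketched as a mapping from each word-form to the lines where it occurs. The function `build_index` and the three sample lines are assumptions for illustration only; a published index would also need lemmatization and a stop-word policy, which are left out here.

```python
import re
from collections import defaultdict

def build_index(lines):
    """Map each lowercased word-form to the 1-based line numbers containing it."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Use a set so a word repeated within one line is listed only once.
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append(number)
    # Return a plain dict with the entries in alphabetical order.
    return dict(sorted(index.items()))

# Invented sample lines, standing in for a text to be indexed.
lines = [
    "A corpus is a store of used language.",
    "A concordance lists each occurrence of a word.",
    "Concordances are usually made from corpora.",
]

index = build_index(lines)
for word in ("a", "concordance", "of"):
    print(word, index[word])
```

The keys of the returned dictionary double as an alphabetical word list for the text.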