Hu Zhuanglin, Linguistics: A Course Book, Chapter 10 (courseware)
Concordance
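A computer can search a text for a particular word, a sequence of words, or even a particular part of speech. It can also retrieve every instance of a word and count how many times the word occurs, which yields information about its frequency. The data can then be sorted in some way.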
Example: concordance of "poor" in A Tale of Two Cities, Book 1
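The search, retrieval, and counting steps described under Concordance can be sketched as a small keyword-in-context (KWIC) routine. This is a minimal illustration only: the helper names `tokenize` and `concordance`, the context width, and the sample text are assumptions made for this sketch, and the sample is an invented sentence, not a quotation from A Tale of Two Cities.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return one keyword-in-context line for each occurrence of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text, invented for illustration (not a quotation).
sample = "The poor man sat by the road. A poor harvest left the village poor."

tokens = tokenize(sample)
freq = Counter(tokens)               # frequency of every word form
hits = concordance(tokens, "poor")   # every instance, shown with its context

for line in hits:
    print(line)
print("frequency of 'poor':", freq["poor"])
```

Sorting `hits` alphabetically by the right-hand context is one simple way to "sort the data" once the instances have been retrieved.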
3. Even if language is a finite construct, corpus methodology is not the best method to study language.
(a) * He shines Tony books.
(b) He gives Tony books.
Corpus linguistics: the study of the data in any such corpus.
Criticisms and the revival of corpus linguistics
Chomsky changed the direction of linguistics away from empiricism to rationalism.
A computer corpus is a large body of machine-readable texts.
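Corpus (13th century, from Latin corpus, "body"; the plural is usually corpora). (1) A collection of texts, especially a complete and self-contained one, e.g. the corpus of Anglo-Saxon verse. (2) Plural also corpuses. In linguistics and lexicography, a body of texts, utterances, or other specimens, usually stored as an electronic database. Computer corpora can store many millions of running words, whose features can be analyzed by means of tagging (adding identifying and classifying tags to words and other formations) and the use of concordancing programs.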
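The tagging mentioned in this definition can be pictured as pairing each word with a class label, after which the corpus can be searched by word class rather than by word form. The sketch below is illustrative only: the tags are assigned by hand for this example (using Universal POS-style labels as an assumption, not the output of any tagger), and `words_with_tag` is a hypothetical helper.

```python
from collections import Counter

# Hand-tagged toy corpus: (word, tag) pairs. Tags are assigned manually
# for illustration; a real corpus would be tagged by a tool or annotator.
tagged = [
    ("He", "PRON"), ("gives", "VERB"), ("Tony", "PROPN"), ("books", "NOUN"),
    ("He", "PRON"), ("lends", "VERB"), ("Tony", "PROPN"), ("books", "NOUN"),
]

def words_with_tag(tagged_tokens, tag):
    """Return every word whose tag equals `tag`, in corpus order."""
    return [word for word, t in tagged_tokens if t == tag]

verbs = words_with_tag(tagged, "VERB")
tag_counts = Counter(t for _, t in tagged)  # how often each word class occurs

print(verbs)
print(tag_counts["NOUN"])
```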
1. The corpus could never be a useful tool for the linguist, as the linguist must seek to model language competence rather than performance.
2. The only way to account for the grammar of a language is by description of its rules, rather than by enumeration of its sentences. It is the syntactic rules that are finite.
3.1 Corpus Linguistics
Corpus linguistics deals with the principles and practice of using corpora in language study.
(c) He lends Tony books.
(d) He owes Tony books.
How can ungrammatical utterances be distinguished from ones that haven’t occurred? If the corpus does not contain sentence (a), how do we conclude that it is ungrammatical while the rest of the sentences are grammatical?
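The difficulty can be made concrete with a toy attestation check. In the sketch below, the corpus is assumed (for illustration only) to contain sentences (b) and (c); the ungrammatical sentence (a) and the grammatical sentence (d) are both missing from it, so a lookup returns the same answer for both. The helper names `normalize` and `attested` are hypothetical.

```python
def normalize(sentence):
    """Lowercase and drop surrounding spaces, full stops, and asterisks."""
    return sentence.lower().strip(" .*")

# Toy corpus: assume only sentences (b) and (c) happen to have been collected.
corpus = {normalize(s) for s in ["He gives Tony books.", "He lends Tony books."]}

def attested(sentence):
    """True if the sentence occurs in the corpus; says nothing about grammar."""
    return normalize(sentence) in corpus

candidates = {
    "(a) ungrammatical": "He shines Tony books.",
    "(d) grammatical": "He owes Tony books.",
}
results = {label: attested(s) for label, s in candidates.items()}

for label, found in results.items():
    print(label, "attested:", found)
```

Both lookups report the sentence as unattested, which is the critics' point: absence from a corpus alone cannot separate an ungrammatical sentence from a grammatical one that simply has not occurred.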
Despite the criticisms, corpus linguistics continued to develop, especially once the computer gradually became the mainstay of corpus work.
Corpus Linguistics
Corpus (plural corpora): a collection of linguistic data, either compiled as written texts or as a transcription of recorded speech. The main purpose of a corpus is to verify a hypothesis about language, for example, to determine how the usage of a particular sound, word, or syntactic construction varies.
Chapter Ten Language and the Computer
Corpus Linguistics
Definition
Criticisms and the revival of corpus linguistics
Concordance
Text encoding and annotation
The roles of corpus data
There are also problems of practicality with corpus linguistics.
How can one imagine searching through an 11-million-word corpus using nothing more than one’s eyes?