Lexicalized context-free grammars
Attacks on Lexical Natural Language Steganography Systems
Cuneyt M. Taskiran (a), Umut Topkara (b), Mercan Topkara (b), and Edward J. Delp (c)
(a) Motorola Labs, Multimedia Research Lab, Schaumburg, Illinois 60196
(b) Center for Education and Research in Information Assurance (CERIAS), Purdue University, West Lafayette, Indiana 47907
(c) Video and Image Processing Laboratory (VIPER), School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907
document. Compared to methods developed for the image, video, and audio domains, NL information hiding is still a new area with its own unique challenges. A small number of NL watermarking and steganography methods have been described in the literature. However, since the theory and practice of NL information hiding are still being developed, previous work has placed little emphasis on testing the security, stealthiness, and robustness of the proposed methods against various attacks.

As mentioned above, NL steganography methods may employ lexical, syntactic, or semantic linguistic transformations to manipulate cover text and embed a message. In this paper we focus on methods that perform lexical steganography, which is based on changing the words and other tokens in the cover text. We test the stealthiness of lexical steganography systems by developing an attack method that determines whether the choice of lexical tokens in a given text has been manipulated to embed hidden information. To the best of the authors' knowledge, this is the first study that assesses the robustness of existing lexical steganography systems against statistical attacks based on text analysis.

Our approach relies on the fact that the text manipulations performed by a lexical steganography system, though they may be imperceptible, nevertheless change the properties of the text by introducing language usage that deviates from the expected characteristics of the cover text. Our method may be summarized as follows. First, we capture cover-text and stego-text patterns by training language models on unmodified and steganographically modified text. Second, we train a support vector machine (SVM) classifier on the statistical output obtained from the language models. Finally, we classify a given text as unmodified or steganographically modified based on the output of the SVM classifier. Our choice of an SVM classifier was motivated by the facts that SVMs have been used successfully for text classification [2] and that they have proven effective as a universal steganalysis attack when images are used as cover objects [3, 4]. We demonstrate the performance of our approach on a lexical steganography system proposed by Winstein [5].

The organization of the paper is as follows. In Section 2 we provide a brief survey of the methods previously proposed for NL steganography. Section 3 describes in detail the lexical steganography system used in our experiments. Section 4 introduces the language modeling scheme used by our system. In Section 5 we present the results of our steganography detection experiments. Finally, conclusions are presented in Section 6.
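The detection pipeline described above can be sketched in a few lines of code. The following is a minimal, hedged illustration rather than the authors' implementation: it assumes two hypothetical corpora of cover and stego sentences (`cover_sents`, `stego_sents`, lists of token lists), scores each sentence with simple add-one-smoothed bigram language models, and trains a scikit-learn SVM on those scores.

```python
# Minimal sketch of the detection pipeline described above (not the authors' code).
from collections import Counter
import math
from sklearn.svm import SVC

def train_bigram_lm(sentences):
    """MLE bigram model with add-one smoothing; returns a per-word log-likelihood scorer."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def logprob(sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:])) / max(len(sent), 1)
    return logprob

def features(sent, lm_cover, lm_stego):
    # Per-word log-likelihood under each language model, plus their difference.
    a, b = lm_cover(sent), lm_stego(sent)
    return [a, b, a - b]

def train_detector(cover_sents, stego_sents):
    lm_cover = train_bigram_lm(cover_sents)
    lm_stego = train_bigram_lm(stego_sents)
    X = [features(s, lm_cover, lm_stego) for s in cover_sents + stego_sents]
    y = [0] * len(cover_sents) + [1] * len(stego_sents)   # 0 = cover, 1 = stego
    clf = SVC(kernel="rbf").fit(X, y)
    return lambda sent: clf.predict([features(sent, lm_cover, lm_stego)])[0]
```

In the paper itself the language-model statistics are richer than this three-feature vector; the sketch only shows how LM scores and an SVM fit together.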
IELTS Writing: Working on Vocabulary and Grammar Together
We all know that IELTS essays are scored on four criteria. The third is lexical resource and the fourth is grammatical range and accuracy, which makes clear that variety in vocabulary and grammar is both the foundation of IELTS writing and where extra marks are earned.
For vocabulary, if we study the high-scoring sample essays at the back of Cambridge IELTS 4 through 8, we find that they are not filled with difficult or obscure words; the words used are simply the most precise ones, and the writing also follows the principle of variety in expression.
For example, "learn different skills" is most students' first instinct; learn can be replaced with acquire, which conveys more vividly the sense of gaining something through study. Likewise, an essay may introduce three paragraphs with "The first reason to be mentioned is that…", "Another cause to be shown is that…" and "The third element to be considered is that…"; reason, cause and element, and mention, show and consider are all basic words, yet repetition is avoided.
So candidates preparing for the test should not chase rare or long, difficult words. The first task is still to memorize common core vocabulary (following a strict vocabulary-learning plan) and to build up synonyms and near-synonyms, so as to avoid weak word choices and repeated words.
For grammar, two aspects matter: accuracy and range.
Grammatical accuracy means avoiding basic errors. Common errors in writing include subject-verb agreement, tense and voice, non-finite verbs, pronoun errors, wrong parts of speech, redundancy, and faulty punctuation.
Grammatical range shows mainly in sentence variety: using different sentence patterns and mixing long and short sentences.
Candidates aiming for a high score should, on top of this, use participle constructions as well as inversion and the subjunctive wherever possible.
Syntactic Parsing with Hierarchical Modeling
Junhui Li, Guodong Zhou, Qiaoming Zhu, and Peide Qian
Jiangsu Provincial Key Lab of Computer Information Processing Technology, School of Computer Science & Technology, Soochow University, China 215006
{lijunhui, gdzhou, qmzhu, pdqian}@...

Abstract. This paper proposes a hierarchical model to parse both English and Chinese sentences. This is done by iteratively constructing simple constituents first, so that complex ones can be detected reliably with richer contextual information in the following passes. Evaluation on the Penn WSJ Treebank and the Penn Chinese Treebank using maximum entropy models shows that our method achieves good performance with more flexibility for future improvement.
Keywords: syntactic parsing, hierarchical modeling, POS tagging.

1 Introduction
A syntactic parser takes a sentence as input and returns a syntactic parse tree that reflects structural information about the sentence. However, with ambiguity as the central problem, even a relatively short sentence can map to a considerable number of grammatical parse trees. Therefore, given a sentence, there are two critical issues in syntactic parsing: how to represent a parse tree and how to score it.

In the literature, several approaches represent a parse tree as a sequence of decisions, with different motivations. Among them, (lexicalized) PCFG-based parsers usually represent a parse tree as a sequence of explicit context-free productions (grammatical rules) and multiply their probabilities as its score (Charniak 1997; Collins 1999). Alternatively, other parsers represent a parse tree as a sequence of implicit structural decisions instead of explicit grammatical rules. (Magerman et al. 1995) maps a parse tree into a unique sequence of actions and applies decision trees to predict the next action from the existing actions. (Ratnaparkhi 1999) further applies maximum entropy models to better predict the next action.

In this paper, we explore the above two issues with a hierarchical parsing strategy that constructs a parse tree level by level: given a forest of trees, we recursively recognize simple constituents first and then form a new forest with fewer trees, until only one tree remains in the newly produced forest.

2 Hierarchical Parsing
Similar to (Ratnaparkhi 1999), our parser is divided into three consecutive modules: POS tagging, chunking, and structural parsing. One major reason is that earlier modules can decrease the search space significantly by providing only n-best results. Another reason is that POS tagging and chunking are well studied in the literature, so we can concentrate on structural parsing by incorporating state-of-the-art POS taggers and chunkers. In the following we concentrate on structural parsing only.

Let us first look in more detail at structural parsing in (Ratnaparkhi 1999). It introduces two procedures, BUILD and CHECK, where BUILD decides whether a tree starts a new constituent or joins the incomplete constituent immediately to its left, and CHECK finds the most recently proposed constituent and decides whether it is complete; the parser alternates between them. In order to obtain the correct parse tree in Fig. 1, the first two decisions on NP(IBM) must be B-S and NO. However, since the other children of S have not been constructed yet at that moment, there is no reliable contextual information to the right of NP(IBM) with which to make the correct decision. One solution to this problem is to delay the B-S decision on NP(IBM) until its right sibling VP(bought Lotus for $200 million) has been constructed.

Fig. 1. The parse tree for "IBM bought Lotus for $200 million"

Motivated by this observation, this paper proposes a hierarchical parsing strategy that constructs a parse tree level by level. The idea behind the strategy is to parse easy constituents first and leave complex ones until more information is ready.

Table 1. BIESO tags used in our hierarchical parsing strategy
  B-X  start a new constituent X        I-X  join the previous one
  E-X  end the previous one             S-X  form a new constituent X alone
  O    keep the tree unchanged

Table 1 shows the tags used in the hierarchical parsing strategy. In each pass, starting from the left, the parser assigns each tree in the forest a tag. Consecutive trees tagged B-X, I-X, ..., E-X from left to right are merged into a new constituent X; S-X indicates forming a constituent X alone. The newly formed forest usually has fewer trees, and the process repeats until only one tree remains. Maximum entropy models are used to predict the probability distribution, and Table 2 shows the contextual information employed in our model.

Table 2. Templates for making predicates and predicates used for prediction
  cons(n):   combination of the headword, constituent (or POS) label and action annotation of the n-th tree (action annotation omitted if n >= 0)
  cons(n*):  combination of the headword's POS, constituent (or POS) label and action annotation of the n-th tree (action annotation omitted if n >= 0)
  cons(n**): combination of the constituent (or POS) label and action annotation of the n-th tree (action annotation omitted if n >= 0)
  1-gram: cons(n), cons(n*), cons(n**), -2 <= n <= 3
  2-gram: cons(m,n), cons(m*,n), cons(m,n*), cons(m*,n*), cons(m**,n), cons(m**,n*), cons(m*,n**), cons(m,n**), cons(m**,n**), (m,n) = (-1,0) or (0,1)
  3-gram: cons(0,m,n), cons(0,m*,n*), cons(0,m*,n), cons(0,m,n*), cons(0*,m*,n*), (m,n) = (1,2), (-2,-1) or (-1,1), and cons(1,2,3), cons(1*,2*,3*), cons(1**,2**,3**), cons(2*,3*,4*), cons(2**,3**,4**)
  4-gram: cons(0,1,2,3), cons(0,1*,2*,3*), cons(0*,1*,2*,3*), cons(1*,2*,3*,4*), cons(1**,2**,3**,4**)
  5-gram: cons(0*,1*,2*,3*,4*), cons(0**,1**,2**,3**,4**)

The decoding algorithm attempts to find the best parse tree T* with the highest score. The breadth-first search (BFS) algorithm introduced in (Ratnaparkhi 1999), with computational complexity O(n), is revised to search possible tag sequences for a forest. In addition, heaps are used to store intermediate forests as the search proceeds. The BFS-based hierarchical parsing algorithm has a computational complexity of O(n^2 N^2 M), where n is the number of words, N is the heap size, and M is the number of actions.

3 Experiments and Results
To test the performance of the proposed hierarchical model, we conduct experiments on both the Penn WSJ Treebank (PTB) and the Penn Chinese Treebank (CTB).

3.1 Parsing the Penn WSJ Treebank
All evaluations in this section are done on the English WSJ Penn Treebank. Sections 02-21 are used as training data for POS tagging and chunking, while Sections 02-05 are used as training data for structural parsing. Section 23 (2,416 sentences) is held out as test data. All experiments are evaluated using LR (labeled recall), LP (labeled precision), and F1; POS tags are not included in the evaluation. Table 3 compares the effect of different window sizes. It shows that, while a window size of 5 is normally used in the literature, extending the window to 7 (from -2 to 4) largely improves performance.

Table 3. Performance of hierarchical parsing on Section 23 (evaluations collapse the distinction between the labels ADVP and PRT and ignore all punctuation)
  window size   #events    #predicates   LR      LP      F1
  5             471,137    229,432       82.01   83.21   82.61
  6             520,566    302,410       84.48   85.79   85.13
  7             559,472    377,332       85.21   86.59   85.89

One advantage of hierarchical parsing is its flexibility in parsing a fragment with higher priority; that is, it is practicable to parse easy (or special) parts of a sentence in advance and then the remainder of the sentence. The problem is how to determine the parts with high priority, such as appositive and relative clauses. Here we define some simple rules (such as finding (LRB, RRB) pairs or "--" symbols in a sentence) to pick out high-priority fragments. As a result, 163 sentences with appositive structure are found with these rules. The experiment shows that this improves F1 by 1.53 (from 77.42 to 78.59) on those sentences, which raises F1 on the whole of Section 23 from 85.89 to 86.02.

3.2 Parsing the Penn Chinese Treebank
The Chinese Penn Treebank (5.1) consists of 890 data files, containing about 18K sentences with 825K words. We put files 301-325 into the development set, 271-300 into the test set, and reserve the other files for training. All the following experiments are based on gold-standard segmentation but untagged text. The evaluation results are listed in Table 4. The accuracy of automatic POS tagging is 94.19%, and POS tags are not included in the evaluation.

Table 4. Evaluation results (<= 40 words) by hierarchical parsing. "Gold Standard POS" uses gold-standard POS tags; "Automatic POS" uses the single best automatic POS result; "Automatic POS*" uses multiple automatic POS results with lambda = 0.20.
                       LR      LP      F1
  Gold Standard POS    88.28   89.79   89.03
  Automatic POS        81.02   82.61   81.81
  Automatic POS*       82.19   83.96   83.07

Impact of automatic POS. As shown in Table 4, the performance gap caused by automatic POS tagging is up to 7.22 in F1, much wider than for English parsing. The second column in Table 5 shows the top 5 POS tagging errors on the test set. Mistaggings between verbs (VV) and common nouns (NN) occur frequently and make up 28% of all POS tagging errors.

To verify the effect of these POS tagging errors on overall performance, for each error type we recompute F1 on the test set and the corresponding decline rate (the last two columns in Table 5) under the assumption that all other POS tags are correct. In particular, tagging errors between verbs and nouns (VV -> NN and NN -> VV) and errors on de5 (DEC -> DEG, DEG -> DEC) significantly deteriorate performance. This is not surprising because 1) all nouns are immediately merged into NP and all verbs into VP, and 2) de5 has different structural preferences when tagged as DEC or DEG.

Table 5. The top 5 POS tagging errors on the test set and their influence. Based on gold-standard POS, F1 on the test set (348 sentences) is 86.38.
  Num.  Mistagging    #errors (rate %)   F1      Decline rate (%)
  1     VV -> NN      70 (15.05)         85.02   1.57
  2     NN -> VV      60 (12.90)         84.78   1.85
  3     DEC -> DEG    40 (8.60)          84.77   1.86
  4     JJ -> NN      38 (8.17)          85.79   0.67
  5     DEG -> DEC    26 (5.59)          85.55   0.96

To reduce the side effect of POS tagging errors, the top K POS results are used as input to the chunking model. K is defined as follows, where lambda (0 <= lambda <= 1) is the factor deciding the number of automatic POS tagging results. The third row in Table 4 shows the performance when lambda = 0.20.

  K = min(20, |{result_i | P(result_i) >= lambda * P(result_0)}|)   (1)

Table 6. Results on CTB parsing for sentences of at most 40 words
  Parser                   LP/LR/F1
  Bikel & Chiang 2000      77.2/76.2/76.7
  Levy & Manning 2003      78.4/79.2/78.8
  Chiang & Bikel 2002      81.1/78.8/79.9
  Xiong et al. 2005        80.1/78.7/79.4
  Ours                     80.0/76.5/78.2

Comparison with other CTB parsers. (Bikel & Chiang 2000) implemented two parsers, one based on the modified BBN model and the other based on TIG. (Chiang & Bikel 2002) used the EM algorithm on the same TIG parser to detect head constituents by mining latent information. (Levy & Manning 2003) employed a factored model and improved performance through error analysis. (Xiong et al. 2005) integrated a head-driven model with several re-annotations and external semantic knowledge from two Chinese electronic semantic dictionaries. Table 6 compares these systems. For a fair comparison, we also train our three models (POS tagging, chunking, and parsing) and test on the same training/test sets as theirs. Table 6 shows that our system performs only slightly worse than the best reported system. This may be due to our low chunking performance: further analysis of the parsing results shows that our chunking model achieves only 80.82 F1 on basic constituents, which make up 40.9% of all constituents. There is therefore still much room for improvement by employing a better chunking model.

4 Conclusions
This paper presents an attempt to apply hierarchical parsing with machine learning techniques. In the parsing process, we always try to detect constituents from simple to complex. Evaluation on the Penn WSJ Treebank shows that our method achieves good performance with more flexibility for future improvement. Moreover, our experiments on the Penn Chinese Treebank suggest that there is still much room for improvement by employing a better chunking model.

Acknowledgements
This research is supported by Project 60673041 under the National Natural Science Foundation of China and Project 2006AA01Z147 under the "863" National High-Tech Research and Development Program of China.

References
1. Bikel, D.M., Chiang, D.: Two statistical parsing models applied to the Chinese Treebank. In: Proceedings of the 2nd Chinese Language Processing Workshop (2000)
2. Charniak, E.: Statistical parsing with a context-free grammar and word statistics. In: Proceedings of AAAI 1997 (1997)
3. Chiang, D., Bikel, D.M.: Recovering latent information in treebanks. In: Proceedings of COLING 2002, pp. 183-189 (2002)
4. Collins, M.: Head-driven statistical models for natural language parsing. Ph.D. Thesis, University of Pennsylvania (1999)
5. Levy, R., Manning, C.: Is it harder to parse Chinese, or the Chinese Treebank? In: ACL 2003. LNCS (LNAI), vol. 2922. Springer, Heidelberg (2004)
6. Magerman, D.M.: Statistical decision-tree models for parsing. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995)
7. Ratnaparkhi, A.: Learning to parse natural language with maximum entropy models. Machine Learning 34(1-3), 151-176 (1999)
8. Xiong, D., Li, S., Liu, Q., et al.: Parsing the Penn Chinese Treebank with semantic knowledge. In: Proceedings of the 2nd IJCNLP, pp. 70-81 (2005)
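As a rough illustration of the per-pass merging step described in Section 2, the sketch below (not the authors' code) takes a list of subtrees and a parallel list of predicted BIESO tags, and merges B-X ... E-X runs (and S-X singletons) into new constituents. Tag prediction itself, which the paper performs with a maximum entropy model, is left as a hypothetical `predict_tags` stub.

```python
# Sketch of one hierarchical-parsing pass: merge trees according to BIESO tags.
# Trees are (label, children) tuples; `tags` is the per-tree prediction from some
# classifier (a maximum entropy model in the paper).
def merge_pass(trees, tags):
    forest, buffer, open_label = [], [], None
    for tree, tag in zip(trees, tags):
        if tag == "O":                      # keep the tree unchanged
            forest.append(tree)
        elif tag.startswith("S-"):          # a single tree forms a constituent alone
            forest.append((tag[2:], [tree]))
        elif tag.startswith("B-"):          # start collecting a new constituent
            open_label, buffer = tag[2:], [tree]
        elif tag.startswith("I-") and open_label:
            buffer.append(tree)
        elif tag.startswith("E-") and open_label:
            buffer.append(tree)
            forest.append((open_label, buffer))
            open_label, buffer = None, []
    return forest

# Repeated application until a single tree remains (predict_tags is hypothetical):
# while len(forest) > 1:
#     forest = merge_pass(forest, predict_tags(forest))
```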
Linguistics
lexicogrammar 词汇语法hierarchical 层级性; 层次性decode 解码Cultural Transmission文化传承性speech therapy 言语矫治structuralism 结构主义corpus linguistics 语料库语言学Morphology 形态学morpheme 语素; 词素word-formation 构词(法) morphonology 形态音系学Pragmatics 语用学context 语境utterance 话语social context 社会语境anthropology 人类学language / speech Anthropological linguistics人类语言学field work 实地调查; 现场研究divergence 分化ancestral language (原始) 母语Computational linguistics计算语言学natural language 自然语言machine translation 机器翻译text 文本corpus linguistics 语料库语言学information retrieval 信息检索non-normative 非规范式的usage 惯用法; 习语, 成语dialect 方言synchronic共时diachronic 历时synchronic description 共时描写synchrony 共时性; 共时语言学diachronic linguistics 历时语言学historical linguistics 历史语言学Indo-European tongues 印欧语系语言Humboldt 洪堡特underlying competence 潜在能力Hymes 海姆斯speech community 言语社团pragmatic ability 语用能力communicative competence交际能力Syntagmatic 横组合Paradigmatic 纵聚合reference 指称Macrolinguistics 宏观语言学ethnography 人种学artificial intelligence 人工智能autonomy 自主性underlying system 底层/ 深层系统general linguistics 普通语言学descriptive linguistics 描写语言学psycholinguistics 心理语言学language acquisition 语言习得language development 语言发展cognition 认知Sociolinguistics 社会语言学language variety语言变体speech community 言语社团; 言语集团; 言语共同体mutterance 话语force 语力cooperative principle 合作原则speech act 语言行为orphological ending 词尾语素word order 语序Syntax 句法学word class 词类string 语符列Semantics 语义学encode 编码lexical item 词项semantic component 语义成分denotation 外延; 指称(意义) sense relation 意义关系entailment 蕴含presupposition 预设qualitative research 定性研究quantitative research 定量研究phonological 音系学的; 语音学的morphological 形态学的; 词法的syntactic 句法学的semantic 语义学的pragmatic 语用学的Phonetics 语音学speech sound 语音larynx 喉articulatory phonetics 发音语音学acoustic phonetics 声学语音学lexical-grammatical 词汇语法的morpheme 语素clause complex 小句复合体clause 小句rank 级阶hierarchical scale 层级阶/ 体系word rank 词阶Morpheme and Morphology 语素与形态学morphology 形态学; 词法Types of Morphemes 语素类型free morpheme 自由语素bound morpheme 黏着语素mono-morphemic word 单语素词poly-morphemic word多语素词compound 合成词root 词根stem 词干affix 词缀infix 中缀Morphological Change 形态变化Allomorph语素变体variation 变异; 变体allomorph 语素变体inflectional change 屈折变化possessive case 所有格; 属格lexical item 词项relative uninterruptibility相对连续性minimum free form最小自由形式Classification of W ords 词的分类variable word 可变词invariable word 不变词inflective change 屈折变化Indo-European languages印欧语系诸语言inflective ending 屈折词尾grammatical word 语法词lexical word 词汇词content word 实词, 内容词functional word 语法词, 功能词closed-class word 封闭类词open-class word 开放类词word class 词类distribution 分布parts of speech 词类particle 小品词infinitive marker 不定式符号/ 标记negative marker 否定符号/ 标记phrasal verb 短语动词pro-form 替代形式; 代词形式nominal group 名词词组pro-adjective 代形容词pro-locative 代处所词determiner 限定词pre-determiner 前位限定词head 中心词reference 所指; 指称definite 定指的indefinite 不定指的partitive 部分的universal 整体的inflection 屈折(变化)inflectional affix 屈折词finiteness 限定性aspect 体paradigm 聚合体derivation 派生(词) compound 合成(词) productive inflectional affix 能产性/ 生成性屈折词缀lexical morpheme 词汇语素endocentric compound 向心合成词exocentric compound 离心合成词de-verbal 动词派生词Blending 混成法fusion 融合法Abbreviation 缩写法clipping 截短法Acronym 首字母缩略法/语Back-formation 逆成法Analogical Creation 类推构词Analogy 类推(构词Class shift 词类转换zero derivation 零派生Borrowing 借词loanwords (完全) 借词loanblend 混合借词loanshift 转移借词loan Translation 翻译借词calque 仿造词nominal group 名词词组pre-modifier 前置修饰成分post-modifier 后置修饰成分definite 有定(词) determinative 限定词interrogative 疑问词indefinite 不定(代)词assessment 评价词syntax 句法sequential arrangement 排列顺序, 序列安排syntagmatic relations (横)组合关horizontal relations 横向关系chain relations链状关系genetic classification 谱系/ 亲缘分类法areal classification 地域/ 区域分类法word order 
词序, 语序affixation 附加词缀法Genetic Classification谱系分类法, 发生学分类法Indo-European family, 印欧语系Germanic Branch, 日耳曼语族W est Germanic Sub-branch西日耳曼语支Sino-Tibetan 汉藏语系Indo-European 印欧语系Altaic 阿尔泰语系Semito-Hamitic 闪含语系Finno-Ugric 芬兰乌戈尔语系Dravidian 德拉维达/ 达罗毗荼语系Ibero-Caucasian 伊比利亚高加索语系Malayo-Polynesian 马来波利尼西亚语系Austro-Asiatic南亚语系Bantu 班图语系Syntactic Relations 句法关系co-ocurrence 同现Positional Relation 位置关系clause 小句noun phrase 名词词组affixation 附加词缀法word order 词序, 语序Typological / Morphological Classification 类型学/ 形态学分类法Isolating language 孤立语Formless language 无形态语Inflected language 屈折语Agglutinative language 粘着语Polysynthetic language 多式综合语Structural Classification结构分类法Synthetic language 综合语Analytic language 分析语Relation of Substitutability 替代关系associative relations 联想关系paradigmatic relations (纵)聚合关系vertical relations 垂直关系choice relations 选择关系Relation of Co-occurrence同现关系determiner 限定词Grammatical Construction and Its Constituents 语法结构及其构成成分grammatical Construction 语法结构discourse analysis 话语分析text analysis 语篇分析string 语符列, 系列constituent 构成成分Immediate Constituents 直接成分tree (diagram) 树形图node 节点mother (node) 母节点daughter (node) 子节点sister nodes 姊妹/ 姐妹节点Immediate Constituents 直接成分Ultimate Constituents 终端成分Immediate constituent analysis直接成分分析法syntactic category 句法范畴tree diagram 树形图bracketing 括号法binary 两分法top down 自上而下; 从大到小center 中心(词) head 中心词Exocentric 离心结构basic sentence 基本句sequence 序列Coordination and Subordination并列关系和从属关系coordinate sentence 并列句daughter (S constituent) 并列分句co-head 并列中心(词)recursiveness 递归性Subordination 从属关系subordinate constituent 从属成分modifier 修饰语, 修饰成分complement clause 补语/ 宾语从句adjunct clause 修饰/ 状语从句relative clause 关系/ 定语从句finite verb 限定动词Syntactic Function 句法功能predicator 述谓成分; 谓词nominative case 主格semantic role 语义角色agent 施事(者) patient 受事(者) grammatical subject 语法主语logical subject 逻辑主语pro-form 替代形式; 代词形式content question 内容疑问句; 要旨疑问句tag question 反意疑问句predicator 述谓成分; 谓词Object inflecting language 屈折语言case label 格标记accusative case 对格; 直接宾格dative case 与格; 间接宾格passive transformation转换为被动句Category 范畴countability 可数性aspect 体Number 数dual 双数Gender 性animate 有生命; 有生名词inanimate 无生命; 无生名词natural gender 自然性别grammatical gender 语法性别Case 格variation 变体accusative 对格; 直接宾格nominative 主格dative 与格; 间接宾格inflection 屈折(变化) Agreement 一致关系concord 一致(关系) anaphoric 照应antecedent 先行词head 中心词dependent 从属(成分) Phrase, Clause and Sentence短语, 小句和句子Phrase 短语Clause 小句, 从句finite 限定的non-finite 非限定的gerundial phrase 动名词短语Sentence functional approach 功能分类jussive 命令句optative 祈愿句; 请求句copula 系动词existential 存在动词coordination 并列subordination 从属conjoining 连接embedding 嵌入hypotactic 形合(子句) 从属的paratactic 意合(结构) 并列的Conjoining 连接Embedding 嵌入complement clause 补语从句adjunct clause 状语从句adjunct 附加语; 附加成分relative clause 关系从句Beyond the Sentence篇章(大于句子的语言单位) text linguistics 篇章语言学discourse analysis 话语分析Sentential Connection 句子连接hypotactic 主从/ 形合(子句) paratactic 并列/ 意合(结构)Cohesion 衔接discoursal cohesiveness会话衔接textual cohesiveness 语篇衔接cohesive devices 衔接手段ellipsis 省略lexical collocation 词汇搭配reference 指称; 照应substitution 替代Conceptual meaning 概念意义Associative Meaning 联想意义connotative meaning 内涵意义social meaning 社会意义affective meaning 情感意义reflected meaning 反映意义collocative meaning 搭配意义reference 指称; 指称意义connotation 内涵denotation 外延Signifier / Reference能指Signified / Referent所指Refer ential Theory 指称说sense 意义, 涵义Sense Relations 涵义关系intralinguistic relations语言内关系non-linguistic entity 非语言实体Synonymy 同义关系synonym 同义词style 语体, 文体dialectal difference 方言差别Situational / Contextual Antonyms情景反义词Antonymy 反义关系Antonym 反义词Markeredness标记, 有标记性Complementary Antonyms互补反义词Converse Antonym反向反义词Hyponymy 
上下义关系meaning inclusiveness意义内包superordinate上义/ 位词hyponym / subordinate下义/ 位词co-hyponym 同位下义/ 位词auto-hyponym 自我下义词Componential Analysis (语义) 成分分析(法)semantic features 语义特征semantic components语义成分expression in logic 逻辑表达式self-contradictory 自我矛盾(句)binary taxonomy两元分类, 两项分类meta-languaeg元语言Sentence Meaning句子意义thematic meaning 主位意义linear order 线性顺序hierarchical structure层次结构Tautology 冗辞; 赘言Tautology (as a rhetorical device) 同义反复; 重言式An Integrated Theory 合成理论principle of compositionality组合/合成原则transformational grammar 转换语法semantic component 语义部分dictionary 词典projection rules 投射规则grammatical marker 语法标记syntactic marker 句法标记semantic marker 语义标记distinguisher 辨义成分selection restrictions 选择限制Logical Semantics 逻辑语义学propositional logic 命题逻辑predicate logic 谓词逻辑propositional calculus 命题演算sentential calculus 句子演算truth conditions真值条composite proposition 复合命题constituent proposition 成分命题connection 关联, 联系truth value 真值function 函数component proposition 成分命题connective 关联词, 连接词simple proposition 简单命题conjunction 合取连词disjunction 析取连词implication 蕴含连词equivalence 恒等连词, 等值连词two-place connective 二元连词one-place connective 一元连词truth table 真值表connective conjunction 合取连词connective disjunction 析取连词connective implication 蕴含连词antecedent 前件consequent 后件connective equivalence 等值连词biconditional 双条件连词constituent proposition 成分命题truth function 真值函数implication connective蕴含连词counterfactual proposition违反事实的命题; 与事实相反的命题syllogism三段论; 演绎推理predicate logic 谓词逻辑predicate calculus 谓词运算argument 主目; 中项predicate 谓词one-place predicate 一元谓词complex predicate 复合谓词; 复杂谓词propositional argument 命题主目quantifier 量词variable 变项universal quantifier 全称量词existential quantifier 存在量词quantified proposition 量化命题major premise (三段论法的) 大前提set theory 集合理论intersect / intersection 交集subset 子集Montague semantics 蒙塔古语义学Montague grammar 蒙塔古logical semantics 逻辑语义学minimal Pair 最小对比对internal mental state 内部心理/心智状态Syntactic Structure 《句法结构》mental process 心理/心智过程information processin信息处理formal approach 形式方法conceptual approach 概念方法overt aspect 显性方面autonomous 自主的; 自立的; 独立存在的sensorimotor processing感觉运动处理/ 加工formal and logic mind 形式逻辑思维cognitive linguistics 认知语言学psychology of language语言心理学formal property 形式特征conceptual property 概念特征cognitive linguistics认知语言学conceptual category概念范畴scenes and events场景和事件entity and process 物体与过程motion and location 运动与处所force and causation 作用力与作用结果ideational and affective category概念和情感范畴cognitive agent 认知因子metaphorical mapping隐喻投射semantic frame 语义框架language acquisition 语言习得utterance 话语language learning 语言学习structural linguistics 结构语言学cognitive psychology 认知心理学neuroscience 神经科学transformational-generative model转换生成模式short-term memory 短期记忆long-term memory 长期记忆perceptual strategy 感知策略; 知觉策略first language acquisition 第一语言习得second language acquisition 第二语言习得foreign language learning 外语学习acoustic wave 声波neurocognition 神经认知cerebral-functional architecture 大脑功能结构language acquisition 语言习得spontaneous speech 自然语言holophrastic stage 独词句阶段holophrastic stage 独词句阶段memorized chunk记忆模块two-word stage 双词句阶段stage of three-word utterances三词句阶段agent 施事者r ecipient 受事者two-word string 双词词串mental lexicon 心理词汇combinatorial system 组合系统telegraphic speech 电报式话语; 电文语言grammatical machinery 语法系统adult grammar成人语法morphological rule 形态学规则agglutinated suffix 黏着后缀grammatical gender 语法性(别)ergative case(爱斯基摩等语言语中的) 动者格/主动格Language Comprehension 语言理解word retrieval 词汇提取word recognition 词汇识别sentence parsing 句子语法分析句子句法分解textual interpretation 语篇解析fleeting signal 转瞬即逝的信号word recognition 单词识别sound segment 音段discrete units 分散单位; 分representation 表征; 表述cohort model 集群模式word candidate 待选词interactive model 
交互模式race model 竞争模式pre-lexical route 词前路经phonotactics 音位结构学; 音位配列学segmentation problem 切分问题orthography 正字法connectionist theory 连结主义vowel nucleus 元音核consonant coda 辅音节尾connectionist model连结主义模式serial model 串行/ 序列模式dual-route model 双路模式processor 处理器parallel model 并行模式Garden Path Sentence/Utterance花园小径(句) language production 语言生成conceptualization 概念化; 概念形成linearization序列化; 线性化self-monitoring 自我监控phonological encoding 音位编码single word access 单个词语检索/ 提取word access procedure词语检索/提取程序access to words 词汇提取uperordinate concept 上位概念morpho-phonological encoding形态音位编码target word 目标词representational level表达/ 表述层directionality 指向性functional planning process 功能计划过程lexicon-grammar unit 词汇语法单位positional encoding 定位编码Construal and Construal Operations识解及识解操作vantage point 有利位置/ 条件figure-ground segregation图形-背景分离conceptualizing process 概念化过程attention / salience 注意力/ 突显foreground 前景; 前突地位vt. 把……至于前景地位trajector 射体figure-ground 图形背景ground 背景landmark 路标perspective / situatedness视点/观察者位置deixis 指示词social deixis 社会指示词textual / discursive deixi篇章/ 推论指示词categorization 范畴化basic level category基本(层次) 范畴superordinate level category 上位(层次) 范畴gestalt 格式塔; 完型parasitical categorization依附性范畴化,寄生范畴化parasitical category依附性范畴semantic frame 语义框架modifier-head 修饰语-中心词image schema 意象图式image schematic structure意象图式结构pre-conceptual image schematic structure前概念意象图式结构identifying patter 识别模式center-periphery schema中心-边缘图式containment schema 容纳/容器图式transitivity 及物性; 传递性cycle schema 循环图式self-repair 自我改正/ 纠正force schema (作用) 力图式link schema 连接图式part-whole schema 部分-整体图式path schema 路经图式scale schema 标量图式verticality schema 垂直图式Metaphor 隐喻Metonymy转喻target domain 目标域source domain 源域ontological metaphor 实体隐喻structural metaphor 结构隐喻orientational metaphor 空间/ 方位隐喻vehicle 本体; 源域Idealized Cognitive Models理想化认知模式conceptual mapping概念映射domain highlighting 域突显scenario 事件情景reference-point activation 参照点激活ontological realm 本体域semiotic triangl语义三角sign ICM符号理想化认知模式r eference ICM所指理想化认知模式concept ICM 概念理想化认知模式whole ICM and its part(s) 整体与部分间的转喻thing-and-part ICM 事物及部分转喻metonymic variant 转喻变体scale ICM 标量转喻constitution ICM 构成转喻event ICM 事件转喻subevent 子事件category-and-member ICM范畴及范畴成员转喻reduction ICM 压缩转喻parts of an ICM部分与部分转喻perception ICM 知觉转喻causation ICM 因果转喻production ICM 生产转喻control ICM 控制转喻possession ICM 领属转喻containment ICM 容器转喻sign and reference ICMs 符号和指代转喻modification ICM 修饰转喻Blending Theory 整合理论integration theory 整合理论cognitive operation 认知操作mental space 心理空间(=概念) composition 组合blended space 合成空间emergent structure层创结构; 新创结构cross-space mapping跨空间映射generic space 类属空间blend 合成空间emergent structure层创结构completion 完善elaboration 扩展emergent logic层创逻辑integrated frame 整合框架auditory phonetics 听觉语音学Phonology 音系学distribution 分布sequencing 排列phoneme 音位surrounding sounds 临音semiotic system 符号系统referential 指称的the Prague School 布拉格学派structuralist 结构主义语言学家reference 指称speech event 言语事件context 语境code 语码referential 指称/ 所指(功能)poetic 诗学(功能)emotive 情感(功能)conative 意动(功能)phatic 寒暄(功能)metalingual function元语言功能meta-function 元功能ideational function概念功能interpersonal function人际功能textual function语篇功能instrumental 工具(功能)regulatory 控制/ 调节(功能)representational 表达/ 描写(功能)interactional 交互/ 交流(功能)personal 个体/ 自我意识(功能)heuristic 探索/ 启发(功能)imaginative 想象(功能)Metalingual Function元语言功能linear order 线性语序thematic function主位/ 主题功能functional grammar功能语法theme rheme主位述位segment 音段morpheme 语素utterance 话语; 语句text 语篇discourse 话语stratification 层次; 分层immediate stimulus control受直接刺激控制immediate stimulus-free不受直接刺激控制generalization 概括; 归纳abstraction 抽象; 抽象概念/行为immediate physical context直接物质环境referential application 用作指称non-thing 
非实体词汇语法semantics 语义学dialect 方言style 文体; 语体jargon 行话; 黑话linguistics 语言学neurolinguistics 神经语言学psycholinguistics 心理语言学first language acquisition 母语习得applied linguistics 应用语言学second language acquisition 二语习得general linguistics 普通语言学cognitive linguistics 认知语言学semantics 语义学cross-cultural communication跨文化交际学sociolinguistics 社会语言学anthropological linguistics人类语言学stylistics 文体学computational linguistics 计算语言学vocal sounds 语音multimodal 多模态(的)mode of meaning-making意义产生方式text 文本; 语篇verbal 言语的; 口头的conventional 规约的; 约定俗成的social semiotic 社会符号nonverbal 非语言的; 非言语的sign 符号, (语言)符号semiotics 符号学natural signs自然符号conventional signs 规约符号symbol 象征符号; 语符icon 图像符号index 指示符号natural signs 自然符号symbol 象征符号; 语符conventional signs 规约符号Design features of language语言的结构/ 设计特征morpheme 语素onomatopoeia象声词,拟声词syntactical level 句法层次systemic-functionalist系统功能语言学家syntax 句法convention 规约; 惯例conventionality 规约性; 惯例production 发音transmission 传送perception 感知phonetics 语音学phonology 音系学articulatory phonetics 发音语音学acoustic phonetics 声学语音学perceptual phonetics 感知语音学auditory phonetics 听觉语音学sound pattern 语音模式phonological structure 音系结构Speech Organs 发音器官vocal organs 发音器官vocal tract 声道oral cavity 口腔nasal cavity 鼻腔IPA 国际音标International Phonetic Association国际语音学协会phonetic transcription标音(法); 语音转写phonetic alphabet 音标(系统)International Phonetic Alphabet国际音标(字母)diacritic 变音符; 附加符号plosive 爆破音nasal 鼻音trill 颤音tap / flap 触音/ 闪音fricative 擦音lateral fricative 边擦音approximant 近音lateral approximant 边近音pulmonic 肺气流音bilabial 双唇音labiodental 唇齿音dental 齿音alveolar 齿龈音postalveolar 齿龈后音retroflex 卷舌音palatal 硬腭音velar 软腭音uvular 小舌音pharyngeal 咽音glottal 声门音non-pulmonic 非肺气流音click 吸气音voiced implosive 浊内破音ejective 挤喉音suprasegmental 超音段音位/ 特征tone 声调; 音调segment 音段; 切分成分consonant 辅音vowel 元音tonal difference 声调差异intonation pattern 语调模式Consonants 辅音articulator 发音器官manner of articulation发音方式place of articulation 发音部位oral tract 口腔stop 塞音plosive 爆破音nasal stop 鼻腔塞音fricative 擦音turbulent airflow 振荡气流approximant 近音lateral 边音lateral fricative 边擦音lateral approximant 边近音trill 颤音roll 滚音tap 触音flap 闪音retroflex 卷舌音r-colored vowel 儿化元音affricate 塞擦音Vowels 元音cardinal vowels 基本元音vowel quality 元音音质cardinal vowel diagram基本元音图quadrilateral 四边形图high / close 高/ 闭low / open 低/ 开mid-high / close-mid 中高/ 闭中mid-low / open-mid 中低/ 开中schwa 中元音unrounded vowel 展唇元音rounded vowel 圆唇元音segment 音段semi-vowel 半元音quality 音质pure vowel 纯元音monophthong 单元音vowel glide 元音滑音; 元音音渡diphthong 双元音, 二合元音triphthong 三元音, 三合元音voicing 浊音化; 有声化voiceless 清音的; 不带声的voiced 浊音的; 带声的IPA chart 国际音标表r-colored (元音) 带r音色的rhotic 发r音的retroflection 卷舌; 卷舌音质r-coloring r音色rhoticity r音化low central vowel低央元音, 低中元音hypothetical vowel quality假设元音音质phonetic transcription语音转写Coarticulation andPhonetic T ranscriptions协同发音与语音转写overlapping articulation重合发音coarticulation 协同发音anticipatory coarticulation先期协同发音perseverative coarticulation后滞协同发音nasalization 鼻音化variation 变体; 变异aspirated 送气(音)unaspirated 非送气(音)broad transcription宽式标音(法); 宽式转写narrow transcription严式标音(法); 严式转写Narrow T ranscription 严式音标Broad T ranscription 宽式音标velarization 软腭化送气(音)aspiration 送气devoicing 清音化dentalization 齿音化glottalization 声门化syllabification音节特性; 成音节Phonemes 音位minimal pairs最小变异对; 最小对立体minimal pairs test最小变异对测试contrastive sounds 对立音phonemic transcription音位标注; 音位转写Allophones 音位变体phone 音子; 单音; 音素complementary distribution互补分布allophony 音位变体现象allophonic variation音位变体velarization 软腭化dark L 模糊Lphonetic similarity发音相似性free variant 自由变体free variation 自由变异Free variation 自由变体Phonological processes,Phonological rules andDistinctive features音系过程、音系规则和区别性特征Assimilation 同化(现象)nasalization 鼻音化dentalization 
齿音化velarization 软腭化phonetic similarity 发音相似性regressive assimilation 逆同化progressive assimilation 顺同化devoicing 清音化phonological process 音系过程target 目标音段affected segment 承事音段phonological rule 音系规则focus bar 焦点线Epenthesis, Rule orderingand the Elsewhere Condition增音、规则顺序及剩余位置条件epenthesis 增音empty position 空位plural variant 复数变体sibilant 咝音underlying form 底层形式underlying representation底层表达式surface form 表层形式surface representation 表层表达式rule ordering 规则顺序elsewhere condition 剩余位置条件Distinctive Features区别性特征sonorant 响音obstruent 阻塞音binary features二分特征; 二项对立特征spread glottis 声门展开Suprasegmentals 超音段特征sound segment 音段The Syllable Structure音节结构monosyllable 单音节polysyllable 多音节nucleus 节核, 音节核心peak 节峰consonant cluster 辅音丛onset 节首coda 节尾rhyme 韵基open syllable 开音节closed syllable 闭音节Maximal Onset Principle = MOP 最大节首原则Stress 重音primary stress 主重音secondary stress 次重音notional words 实词structural words 虚词Intonation 语调intonation-group boundary调群分界线T one 声调tone language 声调语言tone number 调值数字tone description 声调描述English gloss 英语释义tone contour声调升降曲线。
On the Representational Structure of the Bilingual Mental Lexicon
1. Overview. With globalization, bilingual education and learning have become the choice of more and more people.
As the system in which a bilingual's brain stores and processes lexical information, the bilingual mental lexicon and its representational structure have long been a research focus in linguistics, psychology, and cognitive science.
This article examines the representational structure of the bilingual mental lexicon in depth, analyzing how bilinguals store, organize, and retrieve the lexical information of two languages in the brain.
Studying this representational structure not only deepens our understanding of bilinguals' language-processing mechanisms, but can also provide theoretical support and practical guidance for bilingual education, second language acquisition, and the treatment of language disorders.
The article first delimits the basic concept of the bilingual mental lexicon, clarifying its definition and characteristics.
It then reviews research on the representational structure of the bilingual mental lexicon, including the word-association model and the concept-mediation model, analyzing the strengths, weaknesses, and scope of each.
On this basis, the article focuses on how the bilingual mental lexicon is represented, covering lexical, semantic, and morphological representation.
It also attends to the dynamic development of the bilingual mental lexicon, examining how its representational structure changes and adjusts as bilinguals learn and use both languages.
Finally, the article looks ahead to research prospects, proposing future research directions and potential areas of application.
Through in-depth study of the representational structure of the bilingual mental lexicon, we can hope to better understand bilinguals' language-processing mechanisms and provide more effective theoretical support and practical guidance for bilingual education, second language acquisition, and the treatment of language disorders.
2. Theoretical background of the bilingual mental lexicon. Research on the bilingual mental lexicon has developed gradually at the intersection of linguistics, psychology, cognitive science, and other disciplines.
Its theoretical background comes mainly from the following sources. First, research on the bilingual mental lexicon has been influenced by information-processing theory in cognitive psychology.
Information-processing theory holds that when the human brain processes information it forms a complex cognitive structure, namely the mental lexicon.
This theory provides the foundation for research on the bilingual mental lexicon, allowing us to understand and describe bilinguals' lexical storage and processing from an information-processing perspective.
Second, research on the bilingual mental lexicon has also been influenced by lexical theory in linguistics.
Lexical theory holds that vocabulary is the foundation of language and the key to language comprehension and production.
In a bilingual's two languages, the storage and processing of words is a complex process involving mutual influence and interaction between the vocabularies of the two languages.
paper
Research on Automatic Grammar Scoring for Large-Scale Spoken English Tests
Ding Keyu (1,3), Li Zhaoyuan (1,3), Liu Fei (1,3), Chen Xiaoping (1,3), Hu Guoping (2,3), Chen Zhigang (2,3)
(1. Multi-Agent Systems Laboratory, School of Computer Science, University of Science and Technology of China, 230027) (2. iFLYTEK Research, 230088) (3. Anhui Provincial Engineering Laboratory of Speech and Language Technology, 230088)
Abstract: This paper introduces grammar assessment into automatic scoring for large-scale spoken English tests for the first time, and obtains good results on a dataset of manual transcriptions.
To assess examinees' grammar accurately, sentences are analyzed with three techniques: parsing with an unlexicalized probabilistic context-free grammar, parsing with a lexicalized probabilistic context-free grammar, and head-driven statistical parsing; the resulting grammar scores are extracted as features.
In addition, features such as the number of repeated phrases and the height of the parse tree are introduced to reflect the characteristics of spoken English.
Using these features, linear regression and decision-tree prediction models are built to obtain the grammar score.
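A rough sketch of how such features might feed the regression model is shown below. The helpers `parse_scores` and `count_repeated_phrases` are hypothetical stand-ins for the three parsers and the phrase-repetition feature described above, and scikit-learn's LinearRegression plays the role of the regression model (the decision-tree model would be analogous); this is an illustration, not the system's code.

```python
# Hedged sketch: combine parser-based features into a linear-regression grammar scorer.
# parse_scores(sentence) is a hypothetical helper returning the log-scores of the
# unlexicalized PCFG, lexicalized PCFG and head-driven parsers plus the tree height.
import numpy as np
from sklearn.linear_model import LinearRegression

def feature_vector(sentence, parse_scores, count_repeated_phrases):
    pcfg, lex_pcfg, head_driven, tree_height = parse_scores(sentence)
    return [pcfg, lex_pcfg, head_driven, tree_height,
            count_repeated_phrases(sentence), len(sentence.split())]

def fit_grammar_scorer(transcripts, expert_scores, parse_scores, count_repeated_phrases):
    X = np.array([feature_vector(t, parse_scores, count_repeated_phrases)
                  for t in transcripts])
    y = np.array(expert_scores)               # expert grammar ratings as targets
    return LinearRegression().fit(X, y)       # a decision-tree regressor is analogous
```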
On the manual transcriptions of a retelling-task dataset of 128 responses collected from real test sessions, the final system reaches 91% of the performance of fine-grained expert grammar scoring.
Keywords: computer-assisted language learning; grammar assessment; statistical natural language processing; scoring features. 1. Introduction. How to grade spoken English tests automatically has become a research focus in computer-assisted education in recent years, and several computer- and network-based spoken English testing systems have appeared [1], such as the systems from Shanghai Foreign Language Education Press, 蓝鸽, and iFLYTEK.
Spoken English tests typically include reading aloud, retelling, picture description, and similar task types.
In the retelling task, the examinee reads or listens to a short passage; the passage is then hidden, and the examinee must express its general meaning within a set time.
The examinee need not reproduce the original text, as long as the retelling covers the main content of the passage and is reasonably fluent, clear, and easy to understand.
Automatic scoring for the retelling task has recently become a research focus and has produced some results [1].
Current scoring of the retelling task mainly considers features such as phoneme posterior probabilities, speaking rate, keyword coverage, and text coverage, but does not consider grammatical information.
Grammar is an important component of language; if an automatic scoring system cannot evaluate an examinee's grammar, it can hardly be considered a complete system.
Corpus-Based Autonomous English Vocabulary Learning
Abstract: This article discusses corpus-based autonomous English vocabulary learning; the development of large computer corpora offers a new model for autonomous vocabulary learning.
The article discusses the feasibility of corpus-based autonomous English vocabulary learning and offers several suggestions on how to use corpora for this purpose.
Keywords: corpus; vocabulary learning; autonomous learning. Vocabulary size is one of the important indicators of an English learner's proficiency.
In traditional English vocabulary teaching, teachers tend to regard vocabulary learning as a task students should complete on their own after class and give little guidance in class, while students' vocabulary learning is often time-consuming, laborious, and not very effective.
Over the past twenty years, foreign language education has been shifting toward communicative teaching and, at the same time, from teacher-centered to student-centered instruction.
Learners have begun to take the central position in foreign language learning.
Since the concept of autonomous learning entered foreign language teaching, it has attracted increasing attention from scholars at home and abroad.
Using corpora as an auxiliary tool in language teaching is still an emerging research area within applied linguistics.
A corpus collects a large amount of authentic language material and, with advanced retrieval software, makes rapid analysis of vast quantities of language data possible.
The corpus has thus also become a mode of autonomous learning for language learners.
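As a small illustration of the kind of retrieval such software performs, the sketch below builds a keyword-in-context (KWIC) concordance over a tokenized text; it is a toy example rather than any particular concordancer, and the example sentence is invented.

```python
# Toy keyword-in-context (KWIC) concordance over a tokenized corpus.
def kwic(tokens, keyword, window=5):
    """Return each occurrence of `keyword` with `window` tokens of context on each side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"{left:>40} | {tok} | {right}")
    return hits

# Example: kwic("the cat sat on the mat because the cat was tired".split(), "cat")
```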
1. The feasibility of corpus-based autonomous English vocabulary learning. Autonomous learning (autonomy / self-access / independent learning) is a long-standing concept, and many scholars have defined it.
Holec (1981) sees autonomous learning as "the ability to take charge of one's own learning"; Dickinson (1987) defines it as a "situation in which the learner is totally responsible for all the decisions concerned with his or her learning and the implementation of those decisions"; Little (1997) regards it as "essentially a matter of the learner's psychological relation to the process and content of learning".
Context Free Grammars
So far we have looked at models of language that capture only local phenomena, namely what we can analyze when looking at only a small window of words in a sentence. To move towards more sophisticated applications that require some form of understanding, we need to be able to determine larger structures and to make decisions based on information that may be further away than we can capture in any fixed window. Here are some properties of language that we would like to be able to analyze:

1. Structure and Meaning. A sentence has a semantic structure that is critical to determine for many tasks. For instance, there is a big difference between Foxes eat rabbits and Rabbits eat foxes. Both describe some eating event, but in the first the fox is the eater, and in the second it is the one being eaten. We need some way to represent overall sentence structure that captures such relationships.

2. Agreement. In English, as in most languages, there are constraints between forms of words that should be maintained, and which are useful in analysis for eliminating implausible interpretations. For instance, English requires number agreement between the subject and the verb (among other things). Thus we can say Foxes eat rabbits, but "Foxes eats rabbits" is ill-formed. In contrast, The fox eats a rabbit is fine, whereas "The fox eat a rabbit" is not.

3. Recursion. Natural languages have a recursive structure that is important to capture. For instance, Foxes that eat chickens eat rabbits is best analyzed as having an embedded sentence (i.e., that eat chickens) modifying the noun phrase Foxes. This one additional rule allows us to also interpret Foxes that eat chickens that eat corn eat rabbits and a host of other sentences.

4. Long Distance Dependencies. Because of the recursion, word dependencies like number agreement can be arbitrarily far apart from each other; e.g., "Foxes that eat chickens that eat corn eats rabbits" is awkward because the subject (foxes) is plural while the main verb (eats) is singular. In principle, we could find sentences that have an arbitrary number of words between the subject and its verb.

Tree Representations of Structure

The standard representation of sentence structure that captures these properties is a tree, as shown in Figure 1 for the sentence The birds in the field eat the corn. A tree consists of labeled nodes, with arcs indicating a parent-child relationship. Trees have a unique root node (S in Figure 1). Every node except the root has exactly one parent node, and zero or more children. Nodes without children are called leaf nodes. This tree captures the following structural analysis: there is a sentence (S) that consists of an NP followed by a VP. The NP is an article (ART) followed by a noun (N) and a prepositional phrase (PP). The ART is the, the N is birds, and the PP consists of a preposition (P) followed by an NP. The P is in, and the NP consists of an ART (the) followed by an N (field). Returning to the VP, it consists of a V followed by an NP, and so on through the structure. Note that the tree form readily captures the recursive nature of language: in this tree we have an NP as a subpart of a larger NP.

Figure 1: A tree representation of sentence structure for The birds in the field eat the corn

Thus this structure could also easily represent the structure of the sentence The birds in the field in the forest eat the corn, where the inner NP itself also contains a PP that contains another NP. Note also that the structure allows us to state dependencies between constituents in a concise way. For instance, English requires number agreement between the subject NP (however complex) and the main verb. This constraint is easily captured as additional information on when an NP followed by a VP can produce a reasonable sentence S. To make properties like agreement work in practice would require developing a more general formalism for expressing constraints on trees, which takes us beyond our focus in this course (but see a text on Natural Language Understanding).

The Context-Free Grammar Formalism

Given a tree analysis, how do we tell which trees characterize acceptable structures in the language? The space of allowable trees is specified by a grammar, in our case a Context Free Grammar (CFG). A CFG consists of a set of rules of the form

  category0 -> category1 ... categoryN

which states that a node labeled category0 is allowed in a tree when it has n children of categories (from left to right) category1 ... categoryN. For example, Figure 2 gives a set of rules that would allow all the trees in the previous figures. You can easily verify that every parent-child relationship shown in each of the trees is listed as a rule in the grammar. Typically, a CFG defines an infinite space of possible trees, and for any given sentence a number of trees may be possible. These different trees can provide an account of some of the ambiguities that are inherent in language. For example, Figure 3 provides two trees associated with the sentence I hate annoying neighbors: in one, annoying is an adjective and the sentence means we dislike neighbors who are annoying; in the other, annoying is a verb and the sentence means we don't like doing things (like playing loud music) that annoy our neighbors.

Figure 3: Two tree representations of the same sentence

This ambiguity arises from choosing a different lexical category for the word annoying, which then forces different structural analyses. There are many other cases of ambiguity that are purely structural, where the words are interpreted the same way in each case. The classic example is the sentence I saw the man with a telescope, which is ambiguous between me seeing a man holding a telescope and me using a telescope to see a man. This ambiguity would be reflected in where the prepositional phrase with a telescope attaches in the parse tree.

Probabilistic Context Free Grammars

Given a grammar and a sentence, the natural questions to ask are 1) whether there is any tree that accounts for the sentence (i.e., is the sentence grammatical), and 2) if there are many trees, which is the right interpretation (i.e., disambiguation). For the first question, we need to be able to build a parse tree for a sentence if one exists. To answer the second, we need some way of comparing the likelihood of one tree against another. There have been many attempts to find inherent preferences based on the structure of trees; for instance, we might say we prefer trees that are smaller over ones that are larger. But none of these approaches has been able to provide a satisfactory account of parse preferences.
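Before turning to probabilities, it may help to see how a rule set like the one described above can be written down as data and used to check whether a parent-child configuration in a tree is licensed. The rules below are an assumed approximation of the Figure 2 grammar (which is not reproduced here), not a copy of it.

```python
# A small CFG written as a mapping from left-hand side to allowed right-hand sides.
# The rules approximate the Figure 2 grammar; they are illustrative, not exhaustive.
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("ART", "N"), ("ART", "N", "PP"), ("PRO",), ("ADJ", "N"), ("N",)],
    "VP": [("V", "NP"), ("V", "VP")],
    "PP": [("P", "NP")],
}

def licensed(label, children, grammar=GRAMMAR):
    """True if a node `label` may have exactly these child labels, in this order."""
    return tuple(children) in grammar.get(label, [])

# e.g. licensed("NP", ["ART", "N", "PP"]) -> True; licensed("VP", ["NP", "V"]) -> False
```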
Putting this into a probabilistic framework: given a grammar G, a sentence s, and a set of trees T_s that could account for s (i.e., each t in T_s has s as its leaf nodes), we want the parse tree

  argmax_{t in T_s} P_G(t)

To compute this directly, we would need to know the probability distribution P_G(T), where T is the set of all possible trees. Clearly we will never be able to estimate this distribution directly. As usual, we must make some independence assumptions. The most radical assumption is that each node in the tree is expanded independently of the rest of the nodes in the tree. In other words, say we have a tree consisting of node T_1 with children T_1.1, ..., T_1.n, with T_i.j having children T_i.j.1, ..., T_i.j.m, and so on, as shown in Figure 4.

Figure 4: The notation for analyzing trees

Making the independence assumption, we say

  P(T_1) = P(T_1 -> T_1.1 T_1.2) * P(T_1.1) * P(T_1.2)

Using the independence assumption repeatedly, we can expand P(T_1.1) to P(T_1.1 -> T_1.1.1 T_1.1.2 T_1.1.3) * P(T_1.1.1) * P(T_1.1.2) * P(T_1.1.3), and so on, until we have expanded out the probabilities of all the subtrees. Having done this, we end up with

  P(T) = prod_r P(r), where r ranges over all rules used to construct T.

The probability of a rule r is the probability that the rule will be used to rewrite the node labeled with its left-hand side. Thus, if 3/4 of NPs are built by the rule NP -> ART N, then the probability of this rule would be .75. This formalism is called a probabilistic context-free grammar (PCFG).

Estimating Rule Probabilities

There are a number of ways to estimate the probabilities for a PCFG. The best, but most labor-intensive, way is to hand-construct a corpus of parse trees for a set of sentences and then estimate the probability of each rule by counting over the corpus. Specifically, the MLE estimate for a rule LHS -> RHS would be

  P_MLE(LHS -> RHS) = Count(LHS -> RHS) / sum_c Count(LHS -> c)

For example, given the trees in Figures 1 and 3 as our training corpus, we would obtain the estimates for the rule probabilities in Figure 5.

Figure 5: The MLE estimates from the trees in Figures 1 and 3
  Rule             Count   Total for LHS   MLE estimate
  S -> NP VP         3          3            1
  NP -> PRO          2          7            .29
  NP -> ART N        2          7            .29
  NP -> N            1          7            .15
  NP -> ADJ N        1          7            .15
  NP -> ART N PP     1          7            .15
  VP -> V NP         3          4            .75
  VP -> V VP         1          4            .25
  PP -> P NP         1          1            1
  PRO -> I           2          2            1
  V -> hate          2          4            .5
  V -> annoying      1          4            .25
  ADJ -> annoying    1          1            1
  N -> neighbors     1          4            .25

Given this model, we can now answer which interpretation in Figure 3 is more likely. The tree on the left (ignoring for the moment the words) involves the rules S -> NP VP, NP -> PRO, VP -> V NP, NP -> ADJ N, PRO -> I, V -> hate, ADJ -> annoying, and N -> neighbors, and thus has probability 1 * .29 * .75 * .15 * 1 * .5 * 1 * .25 = .004, whereas the tree on the right involves the rules S -> NP VP, NP -> PRO, VP -> V VP, VP -> V NP, NP -> N, PRO -> I, V -> hate, V -> annoying, and N -> neighbors, and thus has probability 1 * .29 * .25 * .75 * .15 * 1 * .5 * .25 * .25, roughly .0003. Thus the tree on the left appears more likely. Note that one reason there is a large difference in probabilities is that the tree on the right involves one more rule.
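These rule probabilities make the comparison above easy to reproduce mechanically. The sketch below is a toy illustration only: it hard-codes the Figure 5 estimates, represents trees as nested (label, children) tuples, and multiplies the probabilities of the rules a tree uses.

```python
# Score a parse tree under the PCFG whose rule probabilities were estimated above.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("PRO",)): 0.29, ("NP", ("ADJ", "N")): 0.15, ("NP", ("N",)): 0.15,
    ("VP", ("V", "NP")): 0.75, ("VP", ("V", "VP")): 0.25,
    ("PRO", ("I",)): 1.0, ("V", ("hate",)): 0.5, ("V", ("annoying",)): 0.25,
    ("ADJ", ("annoying",)): 1.0, ("N", ("neighbors",)): 0.25,
}

def tree_prob(tree):
    label, children = tree
    if isinstance(children, str):                 # lexical rule, e.g. ("V", "hate")
        return RULE_PROB[(label, (children,))]
    p = RULE_PROB[(label, tuple(child[0] for child in children))]
    for child in children:                        # independence assumption: multiply subtrees
        p *= tree_prob(child)
    return p

left_tree = ("S", [("NP", [("PRO", "I")]),
                   ("VP", [("V", "hate"),
                           ("NP", [("ADJ", "annoying"), ("N", "neighbors")])])])
print(tree_prob(left_tree))   # roughly 0.004, matching the calculation above
```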
In general, a PCFG will tend to prefer interpretations that involve the fewest rules. Note that the probabilities of the lexical rules are the same as the tag output probabilities we used for part-of-speech tagging, i.e.,

  P(ADJ -> annoying) = P(annoying | ADJ)

Parsing Probabilistic Context Free Grammars

To find the most likely parse tree, it might seem that we need to search an exponential number of possible trees. But using the probability model described above, with its strong independence assumptions, we can use dynamic programming to incrementally build up all possible constituents that cover one word, then two, and so on, over all words in the sentence. Since the probability of a tree is independent of how its subtrees were built, we can collapse the search at each point by just remembering the maximum-probability parse. This algorithm looks similar to the minimum edit distance and Viterbi algorithms we developed earlier.

Index each word by its position in the sentence, so that I would be 1, hate would be 2, and so on. Also, to simplify matters, let's assume all grammar rules have one of only two possible forms, namely

  X -> Y Z
  X -> w, where w is a word

(i.e., there are one or two categories on the right-hand side). This is called Chomsky Normal Form, and it can be shown that every context-free grammar has an equivalent grammar in this form. The parsing algorithm is called the probabilistic CKY algorithm. It first builds all possible hypotheses of length one by considering all rules that could produce each of the words. Figure 6 shows all non-zero probabilities for the words, where we assume that I can be a pronoun (.9) or a noun (.1, the letter "i"), hate is a verb, annoying is a verb (.7) or an ADJ (.3), and neighbors is a noun. We first add all the lexical probabilities and then use the rules to build any other constituents of length 1, as shown in Figure 6. For example, at position 1 we have an NP constructed from the rule NP -> PRO (prob .29) and the constituent PRO_1 (prob 1), yielding NP_1 with prob .29. Likewise, at position 4 we have an N with probability .25 (N_4), which with the rule NP -> N (prob .15) produces an NP with probability .0375.

The next iteration finds the maximum probability for all possible constituents of length two. In essence, we iterate through every non-terminal and every pair of positions to find the maximum-probability combination that produces that non-terminal (if there is one at all). Consider this for NP, where NP_i,j denotes an NP spanning positions i through j:

  P(NP_1,2) = 0 (no rule producing an NP matches the constituents over this span)
  P(NP_2,3) = 0, for the same reason
  P(NP_3,4) = P(NP -> ADJ N) * P(ADJ_3) * P(N_4) = .15 * 1 * .25 = .0375

The only other non-zero constituent of length 2 in this simple example is VP_3,4, which has probability P(VP -> V NP) * P(V_3) * P(NP_4) = .75 * .25 * .0375 = .007. The results after this iteration are shown in Figure 7.

The next iteration builds constituents of length 3. Here we have the first case with competing alternatives for the same constituent. In particular, VP_2,4 has two possible derivations, using rule VP -> V NP or rule VP -> V VP. Since we are only interested in finding the maximum-probability parse tree, we keep whichever one gives the maximum probability:

  VP -> V NP: P(VP -> V NP) * P(V_2) * P(NP_3,4) = .75 * .5 * .0375 = .014
  VP -> V VP: P(VP -> V VP) * P(V_2) * P(VP_3,4) = .25 * .5 * .007 = .0009

Thus we add VP_2,4 with probability .014. Note that to recover the parse tree at the end, we would also have to record which rule and subconstituents were used to produce the maximum-probability interpretation. As we move to the final iteration, note that there is only one possible combination to produce S_1,4, namely combining NP_1 with VP_2,4. Because we dropped the other interpretation of the VP, that parse tree is not considered at the next level up. It is this property that allows us to search efficiently for the highest-probability parse tree. There is only one possibility for a constituent of length four, combining NP_1, VP_2,4, and rule S -> NP VP, yielding a probability for S_1,4 of .004 (as we got for this tree before).

Note that the CKY algorithm computes the highest-probability tree for each constituent type over each possible word sequence, similar to how the Viterbi algorithm finds the probability of the best path to each node at each time. A simple variant of the CKY algorithm could add the probabilities of the trees together, producing an algorithm analogous to computing the forward probabilities in HMMs. This value is called the inside probability and is defined as follows, where w_i,j is the sequence of words from i to j, N is a constituent name, and N_i,j indicates that this constituent generates the input from positions i through j:

  inside_G(N, i, j) = P_G(w_i,j | N_i,j)

i.e., the probability that the grammar G generates w_i,j given that constituent N is the root of the tree covering positions i through j.

Another probability often used in algorithms involving PCFGs is the outside probability. This is the probability that a constituent appears between positions i and j together with the words on either side of the constituent (i.e., from 1 to i-1 and from j+1 to the end). In other words,

  outside_G(N, i, j) = P_G(w_1,i-1, N_i,j, w_j+1,n)

If we put these two together, we get the probability that a constituent N generates the words between positions i and j for a given input w_1,n:

  P_G(N_i,j | w_1,n) = inside_G(N, i, j) * outside_G(N, i, j) / P_G(w_1,n)

Figure 8: Inside and outside probabilities

Evaluating Parser Accuracy

A quick look at the literature will reveal that statistical parsers appear to perform very well on quite challenging texts. For example, a common test corpus is text from the Wall Street Journal, which contains many very long sentences with quite complex structure. The best parsers report that they attain about 90% accuracy in identifying the correct constituents. Seeing this, one might conclude that parsing is essentially solved. But this is not true. Once we look at the details behind these claims, we see there is much that remains to be done. First, 90% constituent accuracy sounds good, but what does it translate into in terms of how many sentences are parsed accurately? Let's say we have a sentence that involves 20 constituents. Then the probability of getting the sentence completely correct is .9^20, roughly .12, so the parser would only be getting about 12% of the parses completely correct. Second, the results also depend on the constraints of the grammar itself. For instance, it's easy to write a context-free grammar that generates all sentences with two rules:

  S -> w
  S -> S S

i.e., a sentence is a sequence of words. Clearly, getting the correct parse for this grammar would be trivial. Thus, to evaluate the results, we need to consider whether the grammars produce structures that will be useful for subsequent analysis.
We should require, for instance, that the grammar identify all the major constituents in the sentences. In most grammatical models, the major constituents are noun phrases, verb phrases, sentences, and adjective and adverbial phrases. Typically, the grammars being evaluated do meet this requirement. But for certain constructs that are important for semantic analysis (like whether a phrase is an argument of a verb or a modifier of the verb), current grammars collapse these distinctions. So if we want a grammar for semantic analysis, the results will look even worse. The lesson here is not to accept accuracy numbers without careful consideration of what assumptions are being made and how the results might be used.
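To make the CKY procedure described earlier concrete, here is a compact probabilistic CKY recognizer for a grammar in Chomsky Normal Form. The rule dictionaries are assumed inputs in the spirit of the Figure 5 estimates, and, as in the discussion above, only the best-scoring analysis per label and span is kept.

```python
# Minimal probabilistic CKY: best-scoring constituent per (span, label).
# binary_rules: {(X, Y, Z): P(X -> Y Z)}   lexical_rules: {(X, word): P(X -> word)}
def pcky(words, binary_rules, lexical_rules):
    n = len(words)
    best = {}                                   # (i, j, label) -> best probability
    for i, w in enumerate(words):               # length-1 spans from lexical rules
        for (label, word), p in lexical_rules.items():
            if word == w:
                best[(i, i + 1, label)] = max(best.get((i, i + 1, label), 0.0), p)
    for span in range(2, n + 1):                # longer spans from binary rules
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point between the two children
                for (x, y, z), p in binary_rules.items():
                    left = best.get((i, k, y), 0.0)
                    right = best.get((k, j, z), 0.0)
                    if left and right:
                        score = p * left * right
                        if score > best.get((i, j, x), 0.0):
                            best[(i, j, x)] = score
    return best.get((0, n, "S"), 0.0)           # probability of the best S over the sentence
```

A full parser would also store back-pointers alongside each best score so the highest-probability tree can be reconstructed, exactly as the text notes.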
斯坦福 nlp课程笔记
Stanford's natural language processing (NLP) course covers a broad range of topics in the field.
These notes summarize and discuss the core concepts and key content of the course.
What follows are my notes and takeaways from studying it.
1. Introduction to NLP. Natural language processing is an important research direction in computer science and artificial intelligence, concerned with algorithms and techniques for processing and understanding human language. It covers text processing, speech processing, semantic understanding, and related areas.
2. Language models. Language models are a foundation of NLP; a language model estimates, or can be used to generate, the probability of a natural-language sentence. Common language models include n-gram models and neural network language models.
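As a small illustration of the n-gram idea, the sketch below (Python; the miniature corpus is made up purely for illustration) estimates bigram probabilities with add-one smoothing and scores a sentence.

```python
from collections import Counter

# A made-up miniature corpus, purely for illustration.
corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "deep", "learning", "</s>"],
          ["<s>", "nlp", "is", "fun", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
vocab = set(unigrams)

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing avoids zero probabilities for unseen pairs.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def sentence_prob(words):
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob(["i", "like", "nlp"]))
```

A neural network language model replaces these count-based estimates with a learned function of the preceding context, but it is used the same way: it assigns a probability to the next word.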
3. Word vector representations. Word embeddings are a widely used NLP technique that maps words into a low-dimensional vector space, capturing semantic relationships between them. Word2Vec and GloVe are two common word embedding methods.
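For a sense of what an embedding gives you once it has been trained, here is a tiny sketch (the three-dimensional vectors are invented; real Word2Vec or GloVe vectors are learned from large corpora and typically have 100-300 dimensions):

```python
import numpy as np

# Invented 3-d "embeddings"; real ones are learned from large corpora.
emb = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # semantically close -> high similarity
print(cosine(emb["king"], emb["apple"]))  # unrelated -> lower similarity
```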
4. Sequence labelling. Sequence labelling is the task of associating each element of an input sequence with a label; common sequence labelling tasks include named entity recognition and part-of-speech tagging. Hidden Markov models (HMMs) and conditional random fields (CRFs) are commonly used algorithms for sequence labelling.
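A minimal Viterbi decoder for HMM-style sequence labelling might look like the sketch below (Python; the states, transition probabilities, and emission probabilities are invented toy values, not a trained model):

```python
states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.7,
         ("VERB", "NOUN"): 0.8, ("VERB", "VERB"): 0.2}
emit = {("NOUN", "dogs"): 0.5, ("NOUN", "bark"): 0.1,
        ("VERB", "dogs"): 0.1, ("VERB", "bark"): 0.6}

def viterbi(words):
    # best[t][s] = (probability of the best tag path ending in s at position t, backpointer)
    best = [{s: (start[s] * emit.get((s, words[0]), 1e-6), None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            prob, prev = max(
                (best[t - 1][p][0] * trans[(p, s)] * emit.get((s, words[t]), 1e-6), p)
                for p in states)
            column[s] = (prob, prev)
        best.append(column)
    # Follow the backpointers from the best final state.
    tag = max(states, key=lambda s: best[-1][s][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))   # expected: ['NOUN', 'VERB']
```

A CRF is trained discriminatively over feature functions rather than generative emission probabilities, but decoding uses the same dynamic program.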
5. Semantic role labelling. Semantic role labelling identifies the verbs and noun phrases in a given sentence and assigns semantic role labels to them. It helps in understanding the structure and meaning of the sentence.
6. Semantic parsing and syntactic analysis. Semantic parsing converts a natural-language sentence into a semantic representation; common approaches include rule-based and statistical methods. Syntactic analysis turns a sentence into a formal structured representation; common approaches include dependency parsing and constituency parsing.
7. Machine translation. Machine translation converts text in one natural language into another and is one of the major applications of NLP. Statistical machine translation and neural machine translation are the mainstream approaches today.
8. Text generation. Text generation uses algorithms and models to produce natural-language text that obeys grammatical and semantic rules; common tasks include summarization and dialogue generation.
9. Sentiment analysis. Sentiment analysis identifies and infers the sentiment orientation and emotional state expressed in text; common tasks include sentiment classification and sentiment intensity prediction.
MA thesis: A Contrastive Analysis of Hedges in Chinese and English Legislative Languages
A Contrastive Analysis of Hedges in Chinese and English Legislative Languages (中英立法语言中模糊限制语的对比研究)
By Hao Changxia, under the supervision of Professor Zhang Yiming
A thesis submitted to the College of Foreign Languages of Shanghai Maritime University in partial fulfillment of the requirements for the MA degree, Shanghai Maritime University, May 10th, 2009

Acknowledgements
First, I would like to thank my supervisor, Professor Zhang Yiming of Shanghai Maritime University, who inspired and encouraged me throughout the writing of this thesis. Its completion would never have been possible without his precious guidance and constructive suggestions.
Second, I would like to express my deepest gratitude to my boyfriend, who was always ready to give me his opinions and support. He patiently taught me how to make good use of software tools, which was a great help, especially in the quantitative analysis of the data.
Last but not least, I would like to thank all my roommates, with whom I shared the pains and pleasures of writing the thesis; they helped me ease the stress and keep a good mood so that I could finish it smoothly.

ABSTRACT
Accuracy and preciseness have always been regarded as the soul of legislative language, and people have long believed that legislative language is invariably clear and precise. However, research findings since the birth of fuzzy linguistics show that vague words and phrases can also be found in this kind of language. This thesis is based on exactly such findings.
Since vague language is a large family, this thesis takes hedges, one member of that family, as its research focus and makes a contrastive analysis of how they are used in English and Chinese legislative language. The English and Chinese corpora are the Labor Regulation of Washington State and the Labor Law of the PRC respectively.
The thesis consists of three parts. Chapters One to Three form the first part, which gives an overview of the whole thesis, reviews past research on hedges both at home and abroad, and presents the theoretical framework for the analysis.
Chapters Four and Five form the second part, which first examines what kinds of hedges are used in the corpora and how they are distributed, based on Prince et al.'s taxonomy; then studies the linguistic realizations of hedges by means of Channell's categories of approximators and their pragmatic functions through Grice's Cooperative Principle; and finally carries out a contrastive analysis of the hedges used in English and Chinese legislative language in order to identify similarities and differences.
Chapter Six is the third part, which summarizes the thesis, notes its limitations, and offers suggestions for further study.
Keywords: hedges, legislative language, contrastive analysis

摘要 (Chinese abstract): Clarity and accuracy have always been regarded as the soul of legislative language, and it has traditionally been held that legislative language should express things precisely and clearly.
Exploration and practice of using the lexical chunk approach to improve senior high school students' English writing
ENGLISH ON CAMPUS, 2019, Issue 44 (No. 484 overall). Text by Fu Rong (傅蓉).

I. Introduction
English writing is a comprehensive thinking activity and the most important productive skill for demonstrating students' language ability. In senior high school English teaching, writing is both a key focus and a major difficulty. The senior high school English curriculum standards (Ministry of Education, 2003) require students to "write coherent and structurally complete short passages" with "fluent sentences." In my observation, however, a considerable number of students have weak writing skills, shown in the following ways: 1. interference from native-language thinking, with pervasive "Chinglish"; 2. limited vocabulary, words that fail to convey the intended meaning, and frequent inappropriate expression; 3. weak grammatical knowledge and confused sentence structure; 4. insufficient accumulation, monotonous sentence patterns, and empty content. In response, I have tried the lexical chunk approach in the hope of improving students' English writing.

II. The theory of the lexical chunk approach and its implications for writing
The Lexical Approach holds that understanding and producing lexical chunks is an important component of language acquisition (Lewis, 1993). Lexical chunks are language structures that combine lexical and grammatical features and serve specific discourse functions; they usually consist of several words, are stored in memory as wholes, and are retrieved directly in use without grammatical generation or analysis (Wray, 2002). Nattinger and DeCarrico (1992) ...

... engaged in the process of teaching. Many teachers, when carrying out teaching activities, like to spoon-feed their students; some even simply read from the textbook at the lectern, completely indifferent to whether the students below are listening carefully. This is a one-sided model of teaching: the teacher gets no feedback from the students and therefore cannot effectively adjust the teaching content to their actual situation.
许多老师在进行教学活动的时候喜欢对学生进行灌输式教学,甚至一部分教师上课只是自顾自地在讲台上照本宣科,对于学生在底下是否认真听讲完全漠不关心,这种教学模式是一种单边的教学模式,教师无法从学生那里获得反馈,也就无法有效地根据学生的实际情况对教学内容进行调整。
教师应当在教学时多关注学生的反馈,采取课堂提问的方式让学生集中注意力,帮助学生了解哪些知识点没有掌握。
教师还可以在教学模式和教学手段上实现创新,充分利用现代科技,如多媒体教学设备,帮助提高课堂的学习效果。
教师可以通过排练英语短剧,观看英文电影,进行英文图书分享等相关活动,通过多样的教学形式引发学生对于英语的学习兴趣;教师还可以让学生去充当“教师”,让学生自己备课,然后为同学们讲解课堂教学内容,在讲解结束后请其他同学评价,找出不足,提出建议。
Why do people say that when practising English listening you should never use Chinese to understand what you hear?
青格乐: It is not only listening; for reading, English writing, and vocabulary learning it is also best not to rely on Chinese to understand the language.
Do not practise translation. Do not be misled by personal anecdotes; look at the experiments. Whether you are a beginner or have been studying for a while, the theories and models I share below apply. The fact that some people have learned English well does not mean they know how they learned it.
One caveat: none of what follows applies to translation or interpreting majors, or to people planning a career in translation. My answer is aimed only at people who want to develop English as a second language and become proficient users of it.
Many theoretical models in second-language acquisition research indirectly support this point.
The declarative/procedural memory model (DP model). Ullman (1997, 2001a, 2001b, 2004) holds that human memory comprises a declarative memory system and a procedural memory system, each with its own neural basis. The declarative system stores knowledge, facts, and "what" information; the corresponding language knowledge is the vocabulary and grammar rules learned consciously from books and classes (declarative knowledge). The procedural system stores "how-to" knowledge, especially sequenced skills; the corresponding language skill is the ability to deploy stored language knowledge appropriately, correctly, and at the right moment.
So the declarative system stores consciously acquired knowledge (sometimes also called explicit learning; the two notions are not identical, but I will not distinguish them here), whereas procedural knowledge is generally acquired unconsciously (implicit learning; likewise).
For our native language, Chinese, our linguistic competence consists mostly of procedural knowledge, the ability to use Chinese fluently and effortlessly, though there is a declarative component as well; for example, we can explain to a foreign friend the difference between 不来 and 没来.
For our second language, English, declarative knowledge makes up a much larger share. This is a consequence of how we learn it: we are always consciously treating the English language as knowledge to be studied.
lexicalized: a linguistics term explained
lexicalized语言学名词解释在语言学中,lexicalized(词汇化)是指原本是独立的词或短语成为一个固定的搭配(collocation),具有特定的意义并被作为一个整体使用的过程。
当一个词或短语被频繁地用作固定搭配时,它的意义可能与其原始的字面意义不同,而成为一个具有独特词义的整体。
这种现象被称为lexicalization。
词汇化可以发生在各种语言的不同层次上,包括词汇层面、结构层面和句法层面。
在词汇层面上,某些词或短语经常被结合在一起,形成固定的搭配,并且在使用中逐渐获得了固定的含义,与其独立字面意义有所不同。
例如,英语中的"break the ice"(打破僵局)这个搭配的意义并不是字面上的“打破冰块”,而是用来表示"破除困难"或"打破沉默"。
在结构层面上,多个词或短语可以以特定的顺序、形式和语法规则进行组合,形成一个短语或句子,这种组合也可以是lexicalized。
例如,英语中的"take a shower"(洗澡)这个结构的组合顺序和形式固定,不能随意改变,而且这个结构的意义也不能从独立的词义推断出来。
在句法层面上,某些句子结构经常出现,并且被视为一个整体,具有固定的意义。
这种句法结构的lexicalized可以包括动词短语、名词短语和形容词短语等。
例如,英语中的"make up"(弥补)这个动词短语在句法上可以用来表示“编造”、“和解”或“化妆”,具体的意义不仅取决于动词本身,还取决于其他成分的配合。
总之,lexicalized 是指一个词或短语在语言使用中逐渐形成固定搭配并获得特定意义的过程,这种现象可以出现在词汇层面、结构层面和句法层面。
lexicalization 可以丰富语言的表达能力,并帮助语言使用者更好地理解和运用语言。
Head-corner parsing for TAG
[Figure: elementary TAG trees for a wh-question with "who," showing s, sbar, np, and vp nodes and a vp* foot node.]
Linguistic Context
• today's paper (a newspaper)
• examination paper (a set of questions used as an examination)
The same is true of polysemic verbs:
• do a sum (work out the answer to a mathematical question)
• do one's teeth (brush)
• do the flowers (arrange)
• do fish (cook)
become:
• She gradually became irritable at her husband's strange behaviour.
2. become + pron./n. (used as object), meaning "suit, befit":
• This sort of behaviour hardly becomes a person in your position.
Do illiterate people know grammar? (an English essay)
Title: Can Illiterates Understand Grammar?
Language, an intricate tapestry of communication, is composed of various elements, including vocabulary, syntax, and grammar. Among these components, grammar plays a pivotal role, serving as the structural backbone that enables coherence and clarity in communication. However, an intriguing question arises: can illiterates comprehend grammar? Let us delve into this inquiry, exploring the intricate relationship between literacy and grammatical understanding.
To begin with, it is imperative to elucidate the concept of literacy. Literacy encompasses not only the ability to read and write but also a deeper comprehension of language structures, including grammar. Hence, it stands to reason that individuals who lack literacy may encounter challenges in grasping the nuances of grammar.
The history of grammar formalisms for English
Det  N     V     Det  N
the  girl  sees  a    boy
History of Grammar Formalisms
• 500 B.C.
– Pānini's grammar of Sanskrit, the Astadhyayi, contains production rules (!!!) for Sanskrit phonology, morphology, and grammar.
– It is a lasting work of genius, still studied today, and remarkably similar to some modern linguistic theories.
Reference
• Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation, second edition, Addison-Wesley, 2001.
– Chapter 5.2
– Parsing: when you find the string on the right-hand side, replace it with the symbol on the left-hand side.
– Generation: when you find the symbol on the left-hand side, replace it with the string on the right-hand side.
– Different grammar formalisms will have different instructions.
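As a small illustration of the generation instruction above, the sketch below (Python; the toy grammar and helper name are my own, chosen so the rules can derive the example "the girl sees a boy") rewrites the start symbol by repeatedly replacing a left-hand-side symbol with one of its right-hand sides.

```python
import random

# Toy context-free rules: non-terminal -> list of possible right-hand sides.
rules = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["girl"], ["boy"]],
    "V":   [["sees"]],
}

def generate(symbol):
    """Generation: replace a left-hand-side symbol with one of its right-hand sides."""
    if symbol not in rules:            # a terminal word: nothing left to rewrite
        return [symbol]
    expansion = random.choice(rules[symbol])
    words = []
    for sym in expansion:
        words.extend(generate(sym))
    return words

print(" ".join(generate("S")))   # e.g. "the girl sees a boy"
```

Parsing runs the instructions in the other direction: it looks for a right-hand side in the input and replaces it with its left-hand-side symbol until only the start symbol remains.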
Yves Schabes and Richard C. Waters
Mitsubishi Electric Research Laboratories, 201 Broadway, Cambridge, MA 02139
e-mail: schabes@merl.com and dick@merl.com
Lexicalized context-free grammar (LCFG) is an attractive compromise between the parsing efficiency of context-free grammar (CFG) and the elegance and lexical sensitivity of lexicalized tree-adjoining grammar (LTAG). LCFG is a restricted form of LTAG that can only generate context-free languages and can be parsed in cubic time. However, LCFG supports much of the elegance of LTAG's analysis of English and shares with LTAG the ability to lexicalize CFGs without changing the trees generated.
Motivation
Context-free grammar (CFG) has been a well-accepted framework for computational linguistics for a long time. While it has drawbacks, including the inability to express some linguistic constructions, it has the virtue of being computationally efficient, O(n^3)-time in the worst case. Recently there has been a growth of interest in the so-called 'mildly' context-sensitive formalisms (Vijay-Shanker, 1987; Weir, 1988; Joshi, Vijay-Shanker, and Weir, 1991; Vijay-Shanker and Weir, 1993a) that generate only a small superset of context-free languages. One such formalism is lexicalized tree-adjoining grammar (LTAG) (Schabes, Abeillé, and Joshi, 1988; Abeillé et al., 1990; Joshi and Schabes, 1992), which provides a number of attractive properties at the cost of decreased efficiency, O(n^6)-time in the worst case (Vijay-Shanker, 1987; Schabes, 1991; Lang, 1990; Vijay-Shanker and Weir, 1993b). An LTAG lexicon consists of a set of trees each of which contains one or more lexical items. These elementary trees can be viewed as the elementary clauses (including their transformational variants) in which the lexical items participate. The trees are combined by substitution and adjunction. LTAG supports context-sensitive features that can capture some language constructs not captured by CFG. However, the greatest virtue of LTAG is that it is lexicalized and supports extended domains of locality. The lexical nature of LTAG is of linguistic interest, since it is believed that the descriptions of many linguistic phenomena are dependent upon lexical data. The extended domains allow for the localization of most syntactic and semantic dependencies (e.g., filler-gap and predicate-argument relationships). A further interesting aspect of LTAG is its ability to lexicalize CFGs. One can convert a CFG into an LTAG that preserves the original trees (Joshi and Schabes, 1992).

Lexicalized context-free grammar (LCFG) is an attractive compromise between LTAG and CFG that combines many of the virtues of LTAG with the efficiency of CFG. LCFG is a restricted form of LTAG that places further limits on the elementary trees that are possible and on the way adjunction can be performed. These restrictions limit LCFG to producing only context-free languages and allow LCFG to be parsed in O(n^3) time in the worst case. However, LCFG retains most of the key features of LTAG enumerated above. In particular, most of the current LTAG grammar for English (Abeillé et al., 1990) follows the restrictions of LCFG. This is of significant practical interest because it means that the processing of these analyses does not require more computational resources than CFGs. In addition, any CFG can be transformed into an equivalent LCFG that generates the same trees (and therefore the same strings). This result breaks new ground, because heretofore every method of lexicalizing CFGs required context-sensitive operations (Joshi and Schabes, 1992). The following sections briefly define LCFG, discuss its relationship to the current LTAG grammar for English, prove that LCFG can be used to lexicalize CFGs, and present a simple cubic-time parser for LCFG. These topics are discussed in greater detail in Schabes and Waters (1993).

Lexicalized Context-Free Grammars

Like an LTAG, an LCFG consists of two sets of trees: initial trees, which are combined by substitution, and auxiliary trees, which are combined by adjunction. An LCFG is lexicalized in the sense that every initial and auxiliary tree is required to contain at least one terminal symbol on its frontier. More precisely, an LCFG is a five-tuple (Σ, NT, I, A, S), where Σ is a set of terminal symbols, NT is a set of non-terminal symbols, I and A are sets of trees labeled by terminal and non-terminal symbols, and S is a distinguished non-terminal start symbol.

Each initial tree in the set I satisfies the following requirements. (i) Interior nodes are labeled by non-terminal symbols. (ii) The nodes on the frontier of the tree consist of zero or more non-terminal symbols and one or more terminal symbols. (iii) The non-terminal symbols on the frontier are marked for substitution. By convention, this is annotated in diagrams using a down arrow (↓).

Each auxiliary tree in the set A satisfies the following requirements. (i) Interior nodes are labeled by non-terminal symbols. (ii) The nodes on the frontier consist of zero or more non-terminal symbols and one or more terminal symbols. (iii) All but one of the non-terminal symbols on the frontier are marked for substitution. (iv) The remaining non-terminal on the frontier of the tree is called the foot. The label on the foot must be identical to the label on the root node of the tree. By convention, the foot is indicated in diagrams using an asterisk (*). (v) The foot must be in either the leftmost or the rightmost position on the frontier.

[Figure 1: sample elementary trees, e.g. an initial tree NP -> D↓ N for 'boy' and a right-recursive auxiliary tree VP -> VP* Adv for 'smoothly'.]

Figure 1 shows seven elementary trees that might appear in an LCFG for English. The trees containing 'boy', 'saw', and 'left' are initial trees. The remainder are auxiliary trees. Auxiliary trees whose feet are leftmost are called left recursive. Similarly, auxiliary trees whose feet are rightmost are called right recursive auxiliary trees. The path from the root of an auxiliary tree to the foot is called the spine.
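To make the definitions above concrete, here is a minimal sketch (an illustrative data structure of my own devising, not code from the paper) that represents elementary trees and checks the LCFG requirements on an auxiliary tree: it must be lexicalized, and its foot must carry the same label as the root and sit at the leftmost or rightmost position on the frontier.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    substitution: bool = False   # frontier non-terminal marked for substitution (↓)
    foot: bool = False           # foot node of an auxiliary tree (*)
    terminal: bool = False       # lexical item

def frontier(node: Node) -> List[Node]:
    """Collect the leaves of the tree from left to right."""
    if not node.children:
        return [node]
    leaves: List[Node] = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves

def is_valid_lcfg_auxiliary(root: Node) -> bool:
    leaves = frontier(root)
    feet = [leaf for leaf in leaves if leaf.foot]
    lexicalized = any(leaf.terminal for leaf in leaves)
    return (len(feet) == 1                           # exactly one foot
            and feet[0].label == root.label          # same label as the root
            and (leaves[0].foot or leaves[-1].foot)  # leftmost or rightmost position
            and lexicalized)                         # at least one terminal on the frontier

# Right-recursive auxiliary tree VP -> VP* Adv('smoothly'), as in Figure 1.
smoothly = Node("VP", children=[
    Node("VP", foot=True),
    Node("Adv", children=[Node("smoothly", terminal=True)]),
])
print(is_valid_lcfg_auxiliary(smoothly))   # True
```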