Multilingual Natural Language Generation (M-NLG)


Levels of Natural Language Research


Levels of Natural Language Research:

1. Morphology: also called "lexical morphology", morphology studies the internal structure of words, including inflection and word formation. A word's morphological changes can affect its syntactic function and its meaning within a sentence.
2. Syntax: syntax studies the relationships among the structural components of sentences and the rules for forming sentence sequences; in other words, why a sentence can be said one way or another.
3. Semantics: semantics studies how the meaning of a sentence is derived from the meanings of its words and the roles those words play in the sentence's syntactic structure; in other words, what a language unit actually says.
4. Pragmatics: pragmatics studies the meaning and function of language in actual use, including its social, cultural, and psychological factors.

These four levels all play important roles in natural language processing, and together they form the foundation of natural language understanding.
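As an illustrative sketch, the first three levels can be mimicked in a few lines; every rule, name, and mapping below is a toy assumption, not a real analyzer:

```python
# Illustrative sketch of the morphology / syntax / semantics levels.
# All rules here are toy assumptions, not a real analyzer.

def morphology(word):
    """Toy morphological analysis: strip a few English inflections."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return {"stem": word[:-len(suffix)], "suffix": suffix}
    return {"stem": word, "suffix": None}

def syntax(tokens):
    """Toy syntax: assume a simple subject-verb-object pattern."""
    if len(tokens) == 3:
        return {"subject": tokens[0], "verb": tokens[1], "object": tokens[2]}
    return None

def semantics(parse, lexicon):
    """Toy semantics: compose word meanings via the syntactic roles."""
    predicate = lexicon.get(parse["verb"], parse["verb"])
    return "{}({}, {})".format(predicate, parse["subject"], parse["object"])

tokens = "cats chased mice".split()
parse = syntax(tokens)
meaning = semantics(parse, {"chased": "CHASE"})
print(meaning)  # CHASE(cats, mice)
```

Pragmatics resists such a sketch, since it depends on context of use rather than sentence-internal structure.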

English Vocabulary for Psychology (L2)


lie 说谎lie confession 谎供lie detector 测谎器lie scale 测谎量表liepmann s apraxia 利普曼氏运用不能liepmann s effect 利普曼效应life 生命life age 生命年龄life change units 生活变化单位life curve 生命曲线life cycle 生命周期life cycle analysis 生命周期分析life cycle of commodities 商品生命周期life cycle theory of leadership 领导的生命周期理论life event scale 生活事件量表life expectancy 预期寿命life experience 生活经验life force 生命力life goal 生活目标life instinct 生存本能life jacket 救生衣life line 生命线life position 生命地位life record 生活史life rehearsal 生活演练life sciences 生命科学life skill counseling 生活技能咨询life space 生活空间life space continuity 生活空间连续性life space interview 生活场面面谈life span 寿命life stress 生活应激life style disease 生活方式疾病life table 生命表lifetime education 生涯教育lifetime employment system 终身雇用制lifetime personality 终身性格life span development 毕生发展life span developmental psychology 生涯发展心理学life span perspective 毕生发展观life support system 生命维持系统life threatening 危害生命life world 生活世界light 光light activated 光敏感的light adaptation 光适应light comparison technique 光比较法light compass movement 光罗盘运动light constancy 光恒常性light difference 光差light distribution of illuminator 照明器光线分布light dread 恐光症light exposure 曝光light filter 滤光器light halo 光晕light induction 光诱导light intensity 光强度light minimum 最低度光觉light pen 光笔light proof 不透光的light reaction 光反应light reflex 光反射light sensation 光觉light sense 光觉light sensitive 感光的light sensitive 光敏的light sensitivity 感光性light sensitivity 光敏度light signal display 信号灯显示light source 光源light vision 明视觉lighting 照明lighting effectiveness factor 照明有效系数lightmeter 光度计lightness 光度lightness contrast 明度对比light dark adaptation 明暗适应light dark ratio 明暗比light induction 光诱导likable 可爱的likableness 可爱like 爱好like 相似likelihood 似然性likelihood analysis 似然分析likelihood ratio 似然比likelihood ratio criterion 似然性比率评准likeness 类似likert management system 利开特管理系统likert method 利开特法likert procedure 利开特程序likert scale 利开特式量表likert scaling 利氏量表法likert technique 利开特法likert type attitude scale 利式态度量表like dislike 爱好厌恶like indifferent dislike 喜欢无所谓厌恶liking 爱好lilliputian hallucination 渺小幻觉limbic lobe 边缘叶limbic 
system 边缘系统limbus 缘limbus boundary method 角膜缘部测定法limbus of corneae 角膜缘limb kinetic apraxia 肢体运动性运用不能limen 阈限limen of twoness 两点阈limes 限量limes reacting dose 反应限量liminal 阈限的liminal stimulus 阈限刺激limit 极限limit 限度limit analysis 极限分析limit checks 极限检验limit distribution 极限分布limit method 极限法limit of error 误差界限limit of interval 区间限limit of perception 感觉限度limit of the error of sampling 抽样误差限limit point 极限点limit process 极限法limit sample 极限样本limit superior 上极限limit value 极限值limitation 限制limited capacity 有限容量limited information 有限资料limited prediction 有限推测limiting 极限limiting character 极限特性limiting chi square distribution 极限x2分布limiting distribution 极限分布limiting form 极限形式limiting point 极限点limiting surface 界面limiting value 极限值limits 极限limits of trustworthiness 可靠性限度limosis 善饥症limotherapy 饥饿疗法lincoln oseretsky motor development scale 林奥二氏肌动发展量表line 线line chart 线形图line diagram 线形图line graph 线图line in education 教育路线line length coding 线长编码line of business 行业line of direction 方向线line of regard 注视线line organization 直系组织line plotting 画线line production 流水作业line relationship 直线关系line segment 线段lineage 世系linear 线性linear additive model 线性递加模式linear analysis 线性分析linear approximation 线性近似linear causal model 线性因果模型linear correlation 线性相关linear discrimination function 线性辨别函数linear equating 直线等化linear equation 线性方程linear function 线性函数linear inference 线性推演linear interpolation 线性内推法linear maze 直线迷津linear model 线性模式linear operator model 线性演算器模式linear order 直线序列linear order of genes 基因直线排列linear perspective 直线透视linear program 直线式程序linear regression 线性回归linear regression analysis 线性回归分析性归分析linear regression equation 直线回归方程式linear regression in z ×c table线性回归z×c 表linear regression method 线性回归法linear relation 线性关系linear relationship 线性关系linear syllogism 线性三段论linear transformation 直线转换linear trend 直线趋向linear type 直线型linearity 直线性linearity condition 线性条件linearity error 线性误差linearly independent vector 线性独立向量lingual nerve 舌咽神经linguistic ability 语言能力linguistic analysis 语言分析linguistic approach 语言教读法linguistic 
behavior 语言行为linguistic capacity 语言能力linguistic code 语言编码linguistic competence 语言能力linguistic context 语言的前后关系linguistic expression 语言表达式linguistic form 语言形式linguistic information 语言信息linguistic kinesic method 语言运动法linguistic organization 语言结构linguistic performance 语言行为linguistic psychology 语言心理学linguistic relativism 语言相对论linguistic relativity 语言相关性linguistic relativity hypothesis 语言相对假说linguistic relativity principle 语言相对原则linguistic skill 语言技能linguistic structure 语言结构linguistic symbol 语言符号linguistic unit 语言单元linguistic universal 语言共通性linguistics 语言学link 环link 联结link analysis 链分析link trainer 环状训练舱link up 连接linkage 联系linked character 关联特征linking rule 连接规则linophobia 绳索恐怖症lip erotism 唇欲lip key 唇键lip language 唇话liparodyspnea 肥胖性呼吸困难lipase 酯酶lipid 脂类lipomatosis 脂肪过多症lipophrenia 失神lipostatic theory 恒脂论lipothymia 昏厥lipps figure 利普斯图形lipreading 读唇法lips 唇lip reading 唇读liquid 液体liquid conservation 液体守恒liquid crystal 液晶liquid crystal display 液晶显示liquidation of memory 记忆同化liquidity 流动性lisp 口齿不清lissauer s paralysis 卒中型麻痹性痴呆list 清单listen 听listening 听力listening comprehension 听懂listening gear 听音器listening with the third ear 第三只耳朵listing technique 列举法listing s law 利斯廷定律literacy 识字literacy test 识字测验literal 文字的literal alexia 字性失读literal notation 文字记号literal paraphasia 字性错语literal perception 精密知觉literary and artistic appreciation 文艺欣赏literary and artistic creation 文艺创作literary language 文学语言literary psychology 文艺心理学literary artistic imagination 文艺想象literature 文献literature 文学literature review 文献评论lithium 锂盐lithium carbonate 碳酸锂盐lithium therapy 锂疗litigation psychology 诉讼心理学litigious delusion 诉讼妄想little s disease 痉挛性两侧瘫little s disease 李特耳氏病lived experience 生活体验livelihood 生活lively type 活泼型livierato s sign 利韦拉托症living clock 生物钟living condition 生活条件living environment 生活环境living form 生活型living goods 生活用品living level 生活水平living space 生活空间living standard 生活标准living substance 生活物质lloyd morgan s canon 摩根法则lmx theory 领导者成员交换理论lness 性善恶混论load 负荷load stress 负荷应激loading 负荷量loading routine 输入程序lobes 
of cerebrum 大脑叶lobotomized personality 额叶切除人格lobotomy 额叶切除术lobstein s disease ganglion 洛布斯坦神经节lobus centralis 小脑中叶lobus occipitalis 枕叶lobus olfactorius 嗅叶lobus opticus 视叶lobus parietalis 顶叶local 局部local anaphylaxis 局部过敏local circuit 局部路local control 局部控制local convergence 局部收敛local disturbance 局部障碍local effect 局部效应local epilepsy 局部性癫痫local excitatory 局部兴奋态local fatigue 局部疲劳local habitats 本土生境local independence 局地独立local lighting 局部照明local maximum 局部极大local minimum 局部极小local norm 地区常模local optimum 局部最优local pattern 局部模式local phenomenon 局部现象local population 地域人口local reaction 局部反应local reflex 局部反射local response 局部反应local restriction 局部制约local sign 部位记号local stability 局部稳定性local terminal 本机终端local vibration 局部振动locality of animal behavior 动物地域性行为localization 定位localization of function 机能定位localization of function in cortex 大脑机能定位localization of genes 基因定位localization of psychological function 心理机能定位localization theory 机能定位论localized amnesia 局部失忆症localized disturbance 局部失调localized general lighting 分区一般照明localized learning 局部性学习localized transduction 局部性传导location 位置location chart 位置图表location constancy 位置常性location of controls 控制器位置location of displays 显示器位置locational space 位置空间lock and key theory 锁与匙论locomotion 位置移动locomotor 运动器官的locomotor ataxia 运动共济失调locomotor genital stage 运动性欲期locus 地点locus 轨迹locus of control 控制点log 对数log normal distribution 对数常态分布logagnosia 失语症logagraphia 失写症logamnesia 记言不能logaphasia 表达性失语症logaphasia 运动性言语不能logarithmic characteristic 对数特性logarithmic computation 对数计算logarithmic coordinates 对数坐标logarithmic mean 对数平均数logarithmic property 对数性质logarithmic table 对数表logasthenia 言语理解困难logic 逻辑logic behavior 逻辑行为logic decision making 逻辑决策logic equation 逻辑方程logic judgment 逻辑判断logic learning 推理式学习logic memory 逻辑记忆logic model 逻辑模式logic network 逻辑网络logic of faith 信念逻辑logic of language 语言逻辑logic of modality 模态逻辑logic of natural language 自然语言逻辑logic of proposition 命题逻辑logic of question 问句逻辑logic of relation 关系逻辑logic of science 科学逻辑logic rule 逻辑规则logic theorist 
逻辑理论家logic theory 逻辑论logical abstraction 逻辑抽象logical algebra 逻辑代数logical analysis 逻辑分析logical argumentation 逻辑论证logical axiom 逻辑公理logical calculus 逻辑演算logical category 逻辑范畴logical checking 逻辑检验logical combination 逻辑组合logical comparison 逻辑比较logical condition 逻辑条件logical deduction 逻辑演绎logical deduction by reasoning 逻辑推理logical derivation 逻辑推导logical empiricism 逻辑经验主义logical entailment 逻辑蕴涵logical equation 逻辑方程logical error 逻辑错误logical expression 逻辑表达logical formula 逻辑式logical grammar 逻辑语法logical ground 逻辑的根据logical idealism 逻辑唯心论logical identity 逻辑的同一logical inclusion 逻辑的包含logical interdependence 逻辑的相互依赖logical judgment 逻辑判断logical knowledge 逻辑认识logical language 逻辑语言logical law 逻辑定律logical memory 逻辑记忆logical method 逻辑方法logical multiplication 逻辑的繁衍logical necessity 逻辑的必然性logical nets 逻辑网格logical neuron 逻辑神经元logical operations 逻辑运算logical order 逻辑程序logical positivism 逻辑实证论logical possibility 逻辑可能性logical priority 逻辑的先在性logical proof 逻辑证明logical proposition 逻辑命题logical reasoning 逻辑推理logical record 逻辑记号logical rigor 逻辑的严密性logical simulation 逻辑模拟logical structure 逻辑结构logical sum 逻辑和logical symbol 逻辑符号logical syntax 逻辑句法logical system 逻辑系统logical term 逻辑术语logical theory 逻辑理论logical thinking 逻辑思维logical type 逻辑型logical validity 逻辑效度logical variable 逻辑变量logicality 逻辑性logically compatible 逻辑相容的logically proper name 逻辑专名logicosemantic analysis 逻辑语义分析logic in use 应用逻辑logistic discrimination 逻辑辨别logistic method 逻辑函数法logistic transformation 逻辑变换logistics system 符号逻辑体系logoclonia 言语痉挛logocophosia 听性失语logodiarrhea 多言症logogriph 字谜logoklony 痉语症logokophosia 听性失语logokophosis 词聋logomachy 字谜游戏logomania 多语症logoneurosis 言语神经机能症logopathy 言语障碍logopedia 言语矫正法logopedics 言语矫正法logophasia 口吃logophobia 谈话恐怖症logoplegia 发音器官麻痹logorrhea 多语癖logos 理性logospasm 痉语症logotherapy 言语疗法log linear models 对数线性模型log linear smoothing model 对数线性平滑模型loneliness 孤独lonely crowd 孤独人群loner 孤独的人long 长的long distance hearing 远距离听觉long distance vision 远距离视觉longevity 长寿longevous 长命的longing 渴望longish 稍长的longitudinal 
纵向的longitudinal design 纵向设计longitudinal fissure 大脑纵裂longitudinal method 纵向法longitudinal research 纵向研究longitudinal study 纵向研究longitudinal survey 纵向调查longitudinal wave 纵波longwall method 长壁法long circuit appeal 长期招徕long hot summer effect 长夏效应long range planning 长期经营计划long sightedness 远视long term effect 长期效应long term memory 长时记忆long term store 长期贮存lonizing radiation 放射线looking glass self 镜映自我look and say 看和说look glass self 镜映自我looming 庞视loop 环loose construct 松散性结构loosening 放松loosening of associations 联想散漫loquacity 多语争辩癖lordosis 脊柱前曲lorge thorndike intelligence test 洛桑二氏智力测验loss 失落loss boundaries of ego 自我界限消失loss of consciousness 意识丧失lost generation 迷惘的一代lost letter technique 丢信方法lost time injury 失时伤害loud reading 朗读loud speaker 扬声器loudness 响度loudness compensation 响度补偿loudness contour 等响曲线loudness discrimination 响度辨别loudness level 响度级loudness perception 响度知觉love child 私生子love needs 爱的需要lovelace 色鬼love object 恋爱对象low altitude flight 低空飞行low birth weight infant 出生低体重儿low pressure 低压low temperature 低温lowenfeld mosaic test 洛温菲尔德拼镶测验lower absolute threshold 下绝对阈限lower difference threshold 下差别阈lower limit 下限lower limit value 下限值lower management 现场管理lower mental process 低级心理过程lower nervous activity 低级神经活动lower orders 下层社会lower pressure environment 低压环境lower pressure environment tank 低压舱lower quartile 下四分位数lower sensation 低级感觉lower threshold 下辨别阈lower value 下限值lowering trend of criminal age 犯罪低龄化lowerleg foot length 足小腿高lower order needs 较低层次需要lowest audible tone 最低可听音lowlife 下层社会的人low grade defective 轻度身心缺陷loxophthalmus 斜视loyalty 忠诚lpc 最差同事lpc questionnaire 最难共事者问卷lps 利侧表lq 下四分位值lr 反应限量lrcs 语言认知结构lsd 麦角酸二乙基酰胺lt 逻辑理论家lth 促黄体分泌激素ltm 长时记忆长期记忆ltt 潜在特质理论lucifugal 避光的lucotherapy 光线疗法ludd franklin theory 鲁德弗兰克林说lullaby 催眠曲lumbar spinal cord 腰背lumen 流明lumen per square meter 流明平方米lumenmeter 流明计lumen second 流明·秒luminance 亮度luminance balance 亮度平衡luminance coding 亮度编码luminance contrast 亮度对比luminance difference 亮度差异luminance factor 亮度因素luminance meter 亮度计luminance non uniformity 
亮度非均匀性luminance non uniformity 亮度非均匀性亮度非均匀性luminance ratio 亮度比率luminance threshold 亮度阈限luminescence 发光luminophor 发光体luminosity 亮度luminosity body 发光体luminosity coefficient 照度系数luminosity function 亮度函数luminous 发光的luminous body 发光体luminous efficiency 光效率luminous efficiency curve 光效率曲线luminous emittance 光发散度luminous environment 发光环境luminous flux 光通量luminous intensity 光强luminous reflectance 光反射率lump sum separation allowance system 退职金制度lumsden s center 呼吸调节中枢lunar periodicity 月周期性lunatic 精神病患者lung 肺lung volume and capacity 肺活量luria nebraska neuropsychological battery 鲁内神经心理成套测验luridine 胆碱lust 色欲lust of pain 疼痛色情luster 光泽lustrous 有光泽的lutein 黄体素luteinizing hormone 黄体生长激素luteinizing hormone releasing factor 促黄体生成素释放因子luteinizing hormone releasing hormone 黄体生成素释放激素luteotrophic 促黄体的luteotrophic hormone 促黄体分泌激素lux 勒克斯luxury 奢侈lycanthropy 变狼狂lygophilia 暗室癖lygophilia 嗜室症lying 说谎lying posture 躺姿lying with face downward 面向下躺lying with face upward 面向上躺lympha 淋巴lymphatic temperament 粘液质lynching 私刑lypemania 悲伤狂lypemania 沮丧癖lypothymia 忧郁症lysergic acid 麦角酸lysergic acid diethylamine 麦角酸二乙基酰胺lysine 赖氨酸lysine decarboxylase 赖氨酸脱羧酶lysine racemase 赖氨酸消旋酶lyssophobia 狂犬病恐怖症lm最低度光觉lva左视敏度ldopa 左旋多巴lid 喜欢无所谓厌恶。

Natural Language

• In November 1966, the committee released a report titled Language and Machines (known as the ALPAC report), which flatly denied the feasibility of machine translation and recommended ending funding for machine translation projects. The report dealt a heavy blow to a field then in full bloom, and machine translation research fell into near-total stagnation. During the same period, China's ten-year Cultural Revolution broke out, and this research essentially stopped there as well. Machine translation entered its period of depression.

• Machine translation is a branch of Natural Language Processing, inseparably related to Computational Linguistics and Natural Language Understanding.

Brief History

• While discussing the range of applications of electronic computers, the idea of using computers for automatic language translation was put forward in 1947. • In 1949, W. Weaver published the memorandum Translation, formally proposing the idea of machine translation.

The Pioneering Period (1947-1964)

• In 1954, Georgetown University, in cooperation with IBM, completed the first English-Russian machine translation experiment on an IBM-701 computer, demonstrating the feasibility of machine translation to the public and the scientific community and opening the curtain on machine translation research. The translation process:










When translating from many source languages into one target language:
• Source analysis (takes the target language's characteristics into account)
• Source-to-target transfer
• Target generation (independent; does not consider the source language's characteristics)
Combined, these form a system with language-dependent analysis and independent generation.

When translating from one source language into many target languages:
• Source analysis (independent; does not consider the target languages' characteristics)
• Source-to-target transfer
• Target generation (takes the source language's characteristics into account)
Combined, these form a system with independent analysis and language-dependent generation.
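A minimal sketch of the transfer pipeline described above (source analysis → transfer → target generation); the tiny dictionaries and the French-to-English example are hypothetical, purely for illustration:

```python
# Minimal sketch of transfer-based MT: analysis -> transfer -> generation.
# The dictionaries and the French-to-English example are hypothetical.

ANALYSIS_RULES = {"je": ("PRON", "1sg"), "vois": ("VERB", "see"),
                  "Marie": ("NOUN", "Marie")}
TRANSFER_LEXICON = {"1sg": "I", "see": "see", "Marie": "Marie"}

def analyze(source_tokens):
    """Source-language analysis: map surface words to abstract nodes."""
    return [ANALYSIS_RULES[tok] for tok in source_tokens]

def transfer(nodes):
    """Source-to-target transfer: swap lexical content, keep structure."""
    return [(pos, TRANSFER_LEXICON[lemma]) for pos, lemma in nodes]

def generate(nodes):
    """Target-language generation: linearize nodes into a sentence."""
    return " ".join(word for _, word in nodes)

print(generate(transfer(analyze(["je", "vois", "Marie"]))))  # I see Marie
```

Making `generate` ignore source-language details (or `analyze` ignore target-language details) is exactly the independence choice discussed above for many-to-one versus one-to-many translation.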

An English Essay on Linguistics


Linguistics, the scientific study of language, is a field that has captivated scholars and laymen alike for centuries. It is a multifaceted discipline that encompasses a variety of subfields, each focusing on different aspects of human communication. In this essay, we will delve into the importance of linguistics, its various branches, and its impact on our understanding of language and culture.

Firstly, linguistics is crucial for understanding the structure and function of language. It is divided into several branches, including phonetics, which studies the physical properties of speech sounds; phonology, which examines the system of sounds in a language; morphology, which looks at word formation; syntax, which investigates the rules for constructing sentences; and semantics, which deals with meaning in language. Each of these branches contributes to a comprehensive understanding of how languages work.

Secondly, linguistics plays a significant role in language acquisition. It helps educators and psychologists understand how children learn to speak and understand their first language, and how adults can acquire new languages. This knowledge is vital for developing effective language teaching methods and for understanding language disorders.

Another important aspect of linguistics is sociolinguistics, which explores the relationship between language and society. It looks at how language varies across social situations and how it reflects social identities, such as gender, class, and ethnicity. This field is particularly relevant in a globalized world where cultural diversity and social inclusion are increasingly important.

Linguistics also intersects with technology in the form of computational linguistics and natural language processing. These areas are essential for developing artificial intelligence systems that can understand and generate human language.
As technology continues to advance, the role of linguistics in creating more sophisticated and human-like AI becomes more prominent.

Furthermore, linguistics contributes to the preservation of endangered languages. As languages disappear, so does the unique cultural knowledge and worldview that they carry. Linguists work to document and revitalize these languages, ensuring that the world's linguistic diversity is maintained for future generations.

In conclusion, linguistics is a dynamic and essential field that offers insights into the human capacity for communication. It not only informs us about the structure and use of language but also plays a critical role in education, social understanding, technological innovation, and cultural preservation. As we continue to navigate an increasingly interconnected world, the study of linguistics will remain a vital tool for fostering communication and understanding among diverse populations.

English Pre-training Corpora for NLP


Pre-trained Corpora for NLP

Natural language processing (NLP) is a field of computer science and artificial intelligence that focuses on the interaction between computers and human language. To develop effective NLP models, a large amount of textual data is required for training. This data is referred to as a "corpus."

A corpus is a collection of texts or utterances in a particular language or domain. It serves as the input to train NLP models, enabling them to learn language patterns, grammar, semantics, and pragmatics. By analyzing and processing the data in a corpus, NLP models can produce meaningful output for tasks such as text generation, machine translation, sentiment analysis, and named entity recognition.

There are various types of corpora available for NLP, including:

1. General corpora: large collections of text from a wide range of sources, such as the Internet, books, newspapers, and magazines. Examples include the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC).
2. Domain-specific corpora: corpora that focus on specific domains or topics, such as medicine, law, or finance. They contain text related to that particular domain, enabling NLP models to specialize in processing texts within that field.
3. Annotated corpora: corpora that include additional metadata or annotations providing further information about the text. Examples include sentiment-annotated corpora for sentiment analysis or named-entity corpora for identifying entities in text.
4. Multilingual corpora: corpora containing text in multiple languages, facilitating the development of multilingual NLP models or cross-lingual applications.

To create a corpus, text data is typically collected from various sources, then cleaned and preprocessed to remove noise and standardize the format.
The corpus can then be divided into training, validation, and test sets to evaluate the performance of NLP models during training and inference.

In recent years, the availability of large-scale pre-trained corpora has revolutionized the field of NLP. Pre-trained language models, such as GPT-3, BERT, and RoBERTa, are trained on massive amounts of text data from the Internet. These models can be fine-tuned on specific tasks or domains using smaller datasets, significantly improving the performance and efficiency of NLP applications.

In summary, pre-trained corpora are essential for developing and advancing NLP models. They provide the necessary data to train models, enabling them to understand and process human language with greater accuracy and effectiveness. The continued development and availability of high-quality corpora will drive further innovations in NLP and its applications across various domains.
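The train/validation/test split described above can be sketched in a few lines; the 80/10/10 ratio is just a common convention, not a requirement:

```python
import random

def split_corpus(documents, train=0.8, valid=0.1, seed=0):
    """Shuffle a corpus and split it into train/validation/test sets."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed for reproducibility
    n_train = int(len(docs) * train)
    n_valid = int(len(docs) * valid)
    return (docs[:n_train],
            docs[n_train:n_train + n_valid],
            docs[n_train + n_valid:])

corpus = ["document %d" % i for i in range(100)]
train_set, valid_set, test_set = split_corpus(corpus)
print(len(train_set), len(valid_set), len(test_set))  # 80 10 10
```

Shuffling before splitting matters: without it, documents collected from the same source would cluster in one split and bias the evaluation.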

Usage of WordNet in natural language generation


Abstract
WordNet has rarely been applied to natural language generation, despite its wide application in other fields. In this paper, we address three issues in the usage of WordNet in generation: adapting a general lexicon like WordNet to a specific application domain, using the information in WordNet in generation, and augmenting WordNet with other types of knowledge that are helpful for generation. We propose a three-step procedure to tailor WordNet to a specific domain, and carried out experiments on a basketball corpus (1,015 game reports, 1.7 MB).
Usage of WordNet in Natural Language Generation
Hongyan Jing Department of Computer Science
Columbia University New York, NY 10027, USA
hjing@
Secondly, as a semantic net linked by lexical relations, WordNet can be used for lexicalization in generation. Lexicalization maps the semantic concepts to be conveyed to appropriate words. Usually it is achieved by step-wise refinements based on syntactic, semantic, and pragmatic constraints while traversing a semantic net (Danlos, 1987). Currently most generation systems acquire their semantic net for lexicalization by building their own, while WordNet provides the possibility to acquire such knowledge automatically from an existing resource.
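The step-wise refinement idea can be sketched with a toy synonym net; the net, the `register` constraint, and the concept names below are invented for illustration and are far simpler than real WordNet synsets:

```python
# Toy lexical choice over a small synonym net, in the spirit of the
# WordNet-based lexicalization described above. The net and the
# "register" constraint are invented; real synsets are far richer.

SYNSET = {
    "defeat": [
        {"word": "beat", "register": "neutral"},
        {"word": "trounce", "register": "emphatic"},
        {"word": "edge", "register": "narrow-margin"},
    ]
}

def lexicalize(concept, register="neutral"):
    """Pick the synset member whose pragmatic constraint matches."""
    for entry in SYNSET.get(concept, []):
        if entry["register"] == register:
            return entry["word"]
    return concept  # fall back to the concept name itself

print(lexicalize("defeat", "emphatic"))  # trounce
```

In a real system the constraints would be syntactic and semantic as well as pragmatic, and the net would be traversed through hypernym and synonym links rather than looked up flat.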

Natural language and the role of meta language

Natural language and the role of metalanguages
Wei Sheng
Natural language
Major Tasks of NLP
• Automatic summarization
• Part-of-speech tagging
• Machine translation
• Relationship extraction
• Speech segmentation
• Word segmentation
• and so on
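Word segmentation, one of the tasks listed above, can be approximated by the classic forward maximum-matching algorithm over a dictionary; the tiny vocabulary below is illustrative only:

```python
# Forward maximum matching: a classic baseline for word segmentation.
# The tiny vocabulary is illustrative only.

def max_match(text, dictionary, max_len=4):
    """Greedily take the longest dictionary word at each position,
    falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

vocab = {"自然", "语言", "自然语言", "处理"}
print(max_match("自然语言处理", vocab))  # ['自然语言', '处理']
```

Greedy matching fails on genuinely ambiguous strings, which is why modern segmenters use statistical or neural models instead, but it shows why segmentation is a task at all for languages written without spaces.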
a classical paradox
• "This sentence is wrong." True or False?
My explanation of the paradox: the sentence is not only object language, but also metalanguage!

Meta language
• Broadly, any metalanguage is language or symbols used when language itself is being discussed or examined.
• In logic and linguistics, a metalanguage is a language used to make statements about statements in another language (the object language).
Explanation
Euro 2012 football kick ...
It is always easier to use another language to explain the object language; if you use the same language, you end up in a cycle.

NLP in Early 2020: New Paradigms Highlight Cross-Task and Cross-Lingual Ability, While Speech Processing Flourishes in Deployment


At the start of 2020, we summarized and looked ahead to Microsoft Research Asia's breakthroughs and trends in several AI areas.

Today, we explore the new developments in Natural Language Processing (NLP) paradigms, together with Microsoft Research Asia's innovations in speech recognition and synthesis.

Over the last two years, NLP has formed a nearly complete technical stack, including word embeddings, sentence embeddings, encoder-decoder architectures, attention models, the Transformer, and pre-trained models. This has driven NLP applications in important areas such as search, reading comprehension, machine translation, text classification, question answering, dialogue, chat, information extraction, summarization, and text generation, signaling that NLP has entered the era of large-scale industrial deployment.

Meanwhile, with advances in hardware and software and breakthroughs in models and algorithms, speech synthesis, speech recognition, and speech enhancement have also progressed rapidly. Systems such as Microsoft Research Asia's FastSpeech and PHASEN bring machine speech ever closer to human speech, further accelerating the rollout of speech products.

NLP Enters Its Fourth-Generation Paradigm: Pre-training + Fine-tuning

An NLP paradigm is the working mode of a natural language processing system. Counting them up, the field has gone through three generations of paradigms and is now entering a fourth.

The first-generation NLP paradigm, before the 1990s, was "dictionary + rules"; the second, before 2012, was "data-driven + statistical machine learning models"; the third, beginning in 2012, was "end-to-end deep learning with neural networks".

Around 2018, researchers turned their attention to pre-training + fine-tuning, marking the emergence of the fourth-generation NLP paradigm and pointing to the future direction of the field.

Figure 1: The evolution of NLP paradigms

Currently, the mainstream NLP paradigm is the new "pre-training + fine-tuning" research and application paradigm represented by BERT. Its basic idea is to split the training of a large, deep, end-to-end neural network into two steps.

First, most of the parameters are pre-trained on large-scale text data through unsupervised (self-supervised) learning; then, for a specific NLP task, task-specific neural network layers are added on top. These layers contain far fewer parameters than the pre-trained model and are fine-tuned on the labeled data of the downstream task.
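As a toy illustration of this two-step recipe (not a neural implementation), bigram counts can stand in for the pre-trained parameters, with fine-tuning nudging them toward in-domain text:

```python
# Toy "pretrain + fine-tune": bigram counts stand in for pretrained
# parameters; fine-tuning adjusts them on a small in-domain corpus.
# This is a conceptual sketch, not a neural implementation.

from collections import Counter

def bigrams(text):
    toks = text.split()
    return list(zip(toks, toks[1:]))

def pretrain(corpus):
    """Step 1: collect statistics from large generic (unlabeled) text."""
    model = Counter()
    for text in corpus:
        model.update(bigrams(text))
    return model

def fine_tune(model, task_corpus, weight=5):
    """Step 2: adjust the pretrained counts on a small in-domain corpus."""
    tuned = Counter(model)
    for text in task_corpus:
        for bg in bigrams(text):
            tuned[bg] += weight
    return tuned

def next_word(model, word):
    """Predict the most likely continuation under the model."""
    candidates = {b: c for (a, b), c in model.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None

generic = ["the cat sat", "the dog ran", "the cat slept"]
domain = ["the model converged", "the model overfit"]
base = pretrain(generic)
tuned = fine_tune(base, domain)
print(next_word(base, "the"), next_word(tuned, "the"))  # cat model
```

The analogy to the paradigm above: `pretrain` touches all the "parameters" on generic data, while `fine_tune` makes a small, targeted adjustment on task data, changing the model's predictions in-domain without retraining from scratch.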

Citation and Reference Formats for Academic Presentations


The reference format in an academic presentation depends on the type of source being cited.

Common source types include journal articles, books, and web resources.

Common reference formats are as follows:

Journal article: author names, title, journal name, year, volume(issue), pages.

For example: Li, M., Wang, L., & Zhu, X. (2019). A review of natural language generation methods in Chinese. Journal of Computer Research and Development, 56(8), 1591-1606.

Book: author names, title, place of publication: publisher, year, pages.

For example: 王立平, 《机器学习》, 北京: 清华大学出版社, 2018.

Web page: author (year published or updated). Title. Site name. Date retrieved.

Optionally: the URL.

For example: Lample, G., & Conneau, A. (2019). Cross-lingual Language Model Pretraining. ArXiv:1911.02116 [Cs]. Retrieved September 1, 2020.

The examples above cover several common reference formats; when using them, mind details such as separators, capitalization, and font size.

Note also that citation conventions vary across academic fields.

Therefore, the specific reference format and requirements should be determined according to actual needs.
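The journal-article pattern above (authors, year, title, journal, volume(issue), pages) can be captured in a small formatter; the function and field names are assumptions for this sketch:

```python
def format_journal_citation(authors, year, title, journal, volume, issue, pages):
    """Render a journal reference following the pattern above:
    authors (year). title. journal, volume(issue), pages."""
    return "{} ({}). {}. {}, {}({}), {}.".format(
        authors, year, title, journal, volume, issue, pages)

ref = format_journal_citation(
    "Li, M., Wang, L., & Zhu, X.", 2019,
    "A review of natural language generation methods in Chinese",
    "Journal of Computer Research and Development", 56, 8, "1591-1606")
print(ref)
```

Encoding each source type as its own template makes the field-order differences between journal, book, and web-page citations explicit and easy to keep consistent.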

Advanced Uses of GPT


Advanced Uses of GPT: Exploring the Unlimited Possibilities of Natural Language Generation

GPT (Generative Pre-trained Transformer) is a natural language processing model based on the Transformer architecture, developed by OpenAI.

It excels at natural language generation tasks such as text generation, dialogue generation, and summary generation.

In this article, we explore advanced uses of GPT and how to take advantage of its possibilities.

1. Fine-tuning

GPT is a pre-trained model: it has been trained on a large corpus to learn the patterns and regularities of natural language.

However, a pre-trained model is not designed for any particular task, so it must be fine-tuned to adapt to one.

Fine-tuning means training the model on a specific dataset so that it fits that dataset better.

For example, a GPT model can generate movie reviews, but if we want it to generate reviews about technology, we need to fine-tune it.

2. Conditional Generation

GPT can generate text tied to a given condition; this is called conditional generation.

For example, we can have GPT generate text related to a given topic.

This can be done by adding topic words or tags to the input.

For example, to have GPT generate text about "artificial intelligence", we can include "artificial intelligence" as part of the input.

3. Text Completion

GPT can be used for text completion: given part of a text, the model completes the rest automatically.

This is useful for tasks such as automated writing and automated summarization.

For example, GPT can complete the ending of an article or generate its summary.

4. Dialogue Generation

GPT can be used for dialogue generation, i.e., generating conversational exchanges with a user.

This is useful for applications such as chatbots and intelligent customer service.

For example, GPT can generate turns in a conversation with a user to answer questions or offer help.

5. Multilingual Generation

GPT can be used for multilingual generation, i.e., generating text in multiple languages.
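The conditional-generation idea described above, steering output by adding a topic tag to the input, amounts to simple prompt construction; `call_model` below is a hypothetical stand-in for a real GPT API call, not an actual client:

```python
def build_conditional_prompt(topic, instruction):
    """Prefix the instruction with a topic tag, as described above."""
    return "[topic: {}] {}".format(topic, instruction)

def call_model(prompt):
    """Hypothetical model call; here it just echoes the prompt."""
    return "generated text for: " + prompt

prompt = build_conditional_prompt("artificial intelligence",
                                  "Write a short review.")
print(call_model(prompt))
```

The same pattern covers text completion (pass the partial text as the instruction) and dialogue (accumulate prior turns into the prompt); only the prompt construction changes, not the model.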

Mother-Tongue Influence in Second Language Acquisition (PPT)

• Chinese has no interdental [θ] [ð] sounds; learners substitute [s] [z] (思/兹), e.g. these [ði:z] pronounced [zi:z]
Chinese: no obvious stress on words or sentences
English: stress is an important feature, with both word stress and sentence stress
Word stress
➢ English words are monosyllabic or polysyllabic. Stressed monosyllables such as pen and bit show no stress mark in their transcription; a polysyllable such as contact [kɔn'tækt] is stressed on the second syllable when it serves as the predicate of a sentence
➢ Chinese has no comparable stress feature; every syllable tends to be read as stressed
• Sentence stress
➢ Sentence stress gives a sentence or passage its rhythm: the words carrying the important substantive information, such as nouns, verbs, adjectives, adverbs, numerals, and demonstrative pronouns, are read with stress so that the listener can catch the speaker's meaning in time
➢ The ‘streets are ‘wide and ‘clean — stressed versus weakened syllables
The rhythm of the sentence and the meaning of its key words are clear at a glance
Vowel insertion ("啊"): Spring [sprɪŋ] pronounced [spərɪŋ]
Interlanguage
Pronouncing English with Chinese phonology and emphasis: "Good Morning" becomes "故得毛宁", because in Chinese the initials b, p, m, f, d, t, n, l each carry a trailing vowel
English: syllables contain both vowels and consonants
view [vju:] pronounced as [wju:]
pot [pɔt] pronounced as [pɔtə]
An English closed syllable ends in a consonant, and Chinese learners tend to add a vowel after it
Chinese does not distinguish vowel length so clearly; long and short vowels are pronounced alike, as in "依"
Language transfer: a relatively open syllable is a single vowel letter followed by a single consonant letter (except r)
—What is the matter? (falling tone). English also has vowel phonemes with no Chinese counterpart, e.g. for [i] and [a:] Chinese has no corresponding phonemes
➢ Chinese has no sentence-stress feature: every word is read out clearly and evenly, so the rhythmic contrast between stressed syllables is lost; the speech is clear but not fluent, and lacks aesthetic appeal

An English Essay on Natural Language


Title: The Power of Natural Language Processing in Shaping the Future

In today's digital era, natural language processing (NLP) stands at the forefront of technological innovation, revolutionizing the way humans interact with machines and transforming various industries. NLP, a branch of artificial intelligence (AI) concerned with the interaction between computers and humans in natural language, has made significant strides in recent years, paving the way for a future where communication between man and machine is seamless and intuitive.

One of the most remarkable applications of NLP is in language translation. With the advent of sophisticated NLP algorithms, the accuracy of machine translation has improved dramatically, enabling individuals to communicate effortlessly across language barriers. This has profound implications for global connectivity, fostering cross-cultural understanding and facilitating international cooperation.

Furthermore, NLP plays a pivotal role in enhancing human-computer interaction. Virtual assistants powered by NLP, such as Apple's Siri, Amazon's Alexa, and Google Assistant, have become integral parts of our daily lives, simplifying tasks and providing valuable information at our fingertips. Through natural language understanding and generation, these virtual assistants can comprehend user queries and respond in a conversational manner, mimicking human-like interaction.

In addition to its consumer-facing applications, NLP has revolutionized data analysis and information retrieval. By extracting insights from unstructured textual data, NLP algorithms empower businesses to make data-driven decisions and gain a competitive edge in the marketplace.
Whether it's sentiment analysis, topic modeling, or text summarization, NLP techniques enable organizations to derive valuable insights from vast amounts of text data, unlocking new opportunities for innovation and growth.

Moreover, NLP holds immense potential in healthcare, where it can assist clinicians in diagnosing diseases, extracting relevant information from medical records, and providing personalized patient care. By analyzing patient data and medical literature, NLP algorithms can identify patterns, predict outcomes, and support clinical decision-making, ultimately improving healthcare outcomes and reducing medical errors.

However, despite its remarkable progress, NLP still faces several challenges. One of the major hurdles is the ambiguity and complexity of natural language, which poses difficulties for accurate interpretation by machines. Additionally, ethical considerations surrounding data privacy, bias in algorithmic decision-making, and the responsible use of AI remain paramount concerns that need to be addressed.

Looking ahead, the future of NLP holds immense promise. As researchers continue to advance the field with breakthroughs in deep learning, neural networks, and cognitive computing, we can expect even greater capabilities in language understanding, generation, and translation. Moreover, the integration of NLP with other emerging technologies such as machine learning, robotics, and augmented reality will further expand its potential applications across diverse domains.

In conclusion, natural language processing stands at the forefront of technological innovation, empowering humans to communicate with machines in a natural and intuitive manner. From language translation and virtual assistants to data analysis and healthcare, the applications of NLP are vast and far-reaching.
As we continue to harness the power of NLP, we can look forward to a future where communication barriers are dissolved, information is readily accessible, and human potential is unleashed to its fullest extent.

generative language model


Generative Language Models

Generative language models have revolutionized the field of natural language processing (NLP) and have gained significant attention in recent years. These models can generate human-like text based on the input provided to them. In this article, we will explore the fundamentals of generative language models, their applications, and the challenges associated with them.

Introduction to Generative Language Models

Generative language models are primarily based on machine learning techniques, specifically deep learning algorithms. These models are trained on a large corpus of text data, which may include books, articles, or Internet sources. The goal is for the model to learn the statistical patterns and relationships between words in order to generate coherent and contextually appropriate text.

One popular approach to generative language modeling is the use of recurrent neural networks (RNNs), specifically long short-term memory (LSTM) networks. The sequential nature of RNNs makes them suitable for generating text, as they can capture dependencies between words in a sentence. LSTM networks, in particular, are designed to handle long-term dependencies and have been successful in generating high-quality text.

Applications of Generative Language Models

Generative language models find a wide range of applications in NLP and other fields:

1. Text generation: These models can be used to generate creative writing, poetry, or even entire articles. They power applications such as chatbots, virtual assistants, and content-creation tools.
2. Speech recognition and synthesis: Generative language models can also be used for speech recognition and synthesis tasks.
By training the models on a large amount of spoken language data, they can generate human-likespeech or transcribe spoken words accurately.3.Machine Translation: Another important application area is machinetranslation. Generative language models have shown promising results intranslating text from one language to another. By learning the statisticalpatterns of language from parallel text corpora, these models can generatetranslated sentences that are contextually appropriate.4.Question Answering: Generative language models can also beapplied to question answering tasks. By understanding the context of aquestion, these models can generate relevant and informative answers. They have been used in chatbot systems, online customer support, and virtualassistants to provide instant and accurate responses.Challenges and LimitationsWhile generative language models have shown remarkable performance in various tasks, they also face challenges and limitations. Some of these include:1.Coherence and Consistency: Generating coherent and consistenttext is a major challenge. The models may sometimes generate grammatically correct but contextually inappropriate sentences, leading to nonsensical ormisleading outputs.2.Bias and Inequality: The language models can also inherit biasespresent in the training data. If the training data contains biased language orstereotypes, the generated text may exhibit the same biases. This canperpetuate societal inequalities and reinforce biased perspectives.3.Data Requirements: Training generative language models requires asignificant amount of data. Access to large, diverse, and high-quality datasets is crucial for achieving good performance. 
Acquiring and preprocessing suchdatasets can be time-consuming and computationally expensive.putational Resources: Generating text using large-scalegenerative language models requires substantial computational resources.Training and inference can be computationally expensive, making itinaccessible for those with limited resources.ConclusionGenerative language models have opened up new possibilities in the domain of natural language processing. Their ability to generate human-like text has made them instrumental in various applications such as text generation, speech synthesis, machine translation, and question answering. However, several challenges such as coherence, bias, data requirements, and computational resources need to be addressed to ensure their responsible and efficient use. With ongoing research and advancements in the field, generative language models will continue to play a significant role in the future of language processing.。
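The core idea in the introduction — learning statistical patterns between words and sampling continuations — can be sketched with a toy bigram model. This is a deliberately tiny, hypothetical stand-in for the LSTM networks discussed above, not how production models work:

```python
import random
from collections import defaultdict

def train_bigrams(corpus):
    """Count which word follows which across a list of sentences."""
    model = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            model[prev].append(nxt)
    return model

def generate(model, start, max_words=10, seed=0):
    """Sample a continuation one word at a time, like a tiny LM."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_words - 1):
        followers = model.get(out[-1])
        if not followers:  # no observed continuation: stop early
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = [
    "the model generates text",
    "the model learns patterns",
    "patterns help the model generate text",
]
print(generate(train_bigrams(corpus), "the"))
```

Even at this scale, the sketch shows why coherence is hard: the model only knows which word pairs it has seen, so it can wander into locally plausible but globally inconsistent text.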

Overview of Natural Language Processing (Translated Excerpt)

Natural Language Processing (NLP) is an interdisciplinary field spanning computer science, artificial intelligence, and linguistics.

Its goal is to enable computers to understand, process, and generate natural-language information.

NLP has a very wide range of applications, including machine translation, speech recognition, text analysis, and question-answering systems.

The most common tasks in NLP are Natural Language Understanding (NLU) and Natural Language Generation (NLG).

Natural language understanding converts natural language into a form that computers can process.

This includes word segmentation, part-of-speech tagging, syntactic parsing, semantic analysis, and entity recognition.

Word segmentation splits continuous text into a sequence of meaningful words; part-of-speech tagging labels each word with its part of speech; syntactic parsing analyzes the grammatical structure of a sentence; semantic analysis interprets a sentence's meaning; and entity recognition identifies specific entities (person names, place names, organization names, and so on) in text.
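The first of these steps, word segmentation, can be sketched with greedy forward maximum matching. The four-entry dictionary here is a made-up example:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the
    longest dictionary word; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size == 1 or chunk in dictionary:
                words.append(chunk)
                i += size
                break
    return words

dictionary = {"自然", "语言", "自然语言", "处理"}
print(forward_max_match("自然语言处理", dictionary))  # ['自然语言', '处理']
```

Real segmenters use statistical or neural models to resolve ambiguities that greedy matching gets wrong, but the sketch shows the basic problem: recovering word boundaries that the text itself does not mark.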

Natural language generation produces natural-language text according to a given specification.

The basic process is to select appropriate expressions from linguistic knowledge and then combine those expressions into sentences that satisfy the specification.

Natural language generation is widely used in automated question answering, intelligent dialogue systems, and similar applications.
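That two-step process — selecting expressions, then assembling them into a sentence — can be sketched with a minimal template-based generator. The template, lexicon, and slot names below are invented for illustration:

```python
# One template per message type; slots are filled from the message,
# with a small lexicon mapping internal values to surface words.
TEMPLATES = {
    "weather": "The weather in {city} will be {condition} on {day}.",
}
LEXICON = {"rain": "rainy", "sun": "sunny"}

def realize(message):
    template = TEMPLATES[message["type"]]
    slots = {k: LEXICON.get(v, v) for k, v in message.items() if k != "type"}
    return template.format(**slots)

msg = {"type": "weather", "city": "Beijing", "condition": "rain", "day": "Monday"}
print(realize(msg))  # The weather in Beijing will be rainy on Monday.
```

Production NLG systems replace the fixed templates with grammar-driven realization, but the division of labor — content selection first, surface realization second — is the same.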

Chinese NLP versus English NLP

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and humans using natural language. It is a field that has seen significant advancements in recent years, with researchers around the world working to improve the accuracy and effectiveness of NLP systems. In this article, we will compare and contrast the differences between NLP in Chinese and NLP in English.

Chinese NLP:

1. Character-based: One of the key differences between Chinese NLP and English NLP is that Chinese is a character-based language, whereas English is an alphabet-based language. This means that Chinese NLP systems need to be able to understand and process individual characters, as opposed to words in English.
2. Word segmentation: Chinese does not use spaces between words, which makes word segmentation a crucial step in Chinese NLP. This process involves identifying where one word ends and another begins, which can be challenging due to the lack of spaces.
3. Tonal differences: Chinese is a tonal language, meaning that the tone in which a word is spoken can change its meaning. NLP systems need to be able to recognize and account for these tonal differences in order to accurately process and understand Chinese text.

English NLP:

1. Word-based: In contrast to Chinese, English is an alphabet-based language, which means that NLP systems can focus on processing words rather than individual characters. This can make certain tasks, such as named entity recognition, easier in English NLP.
2. Sentence structure: English has a more rigid sentence structure than Chinese, which can make tasks such as parsing and syntactic analysis more straightforward in English NLP. English follows a specific subject-verb-object order in most sentences, whereas Chinese has a more flexible word order.
3. Verb conjugation: English uses verb conjugation, meaning that verbs change form based on tense, person, and number. NLP systems need to be able to recognize and interpret these verb forms in order to accurately understand and generate English text.

In conclusion, while there are similarities between Chinese NLP and English NLP, such as the use of machine learning algorithms and linguistic resources, there are also key differences that researchers need to consider when developing NLP systems for these languages. By understanding these differences, researchers can continue to advance the field of NLP and improve the performance of NLP systems in both Chinese and English.
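The point about verb conjugation — that English verbs change form with tense, person, and number — can be sketched with a toy rule-based lemmatizer. The suffix rules and irregular-verb table below are simplified assumptions, not a production approach:

```python
IRREGULAR = {"went": "go", "ran": "run", "was": "be", "were": "be"}

def lemmatize(word):
    """Map an inflected verb form back to its base form."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("ing") and len(w) > 5:
        base = w[:-3]
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]  # undo consonant doubling: running -> run
        return base
    if w.endswith("ed") and len(w) > 4:
        return w[:-2]
    if w.endswith("s") and len(w) > 3:
        return w[:-1]
    return w

print([lemmatize(w) for w in ["went", "walked", "studies", "running"]])
# ['go', 'walk', 'study', 'run']
```

Real systems use morphological analyzers or learned models rather than hand-written suffix rules, since English has many exceptions these rules would mishandle.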

EMNLP Reference Formats

EMNLP (Empirical Methods in Natural Language Processing) is a major conference in the field of natural language processing that brings together leading researchers and scholars from around the world.

When writing a paper or conducting academic research, using the correct citation format is essential.

Below are some suggestions and guidelines for EMNLP reference formats.

1. Conference papers: When citing an EMNLP conference paper, the following format is generally used: Author. (Year). Title. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages. Place: Publisher.

For example: Smith, J., & Johnson, A. (2019). A Novel Approach to Sentiment Analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 100-110. New York, NY: Association for Computational Linguistics.

2. Journal articles: For journal articles, the citation format is usually: Author. (Year). Title. Journal Name, Volume(Issue), pages.

For example: Brown, L., & Miller, R. (2018). Neural Machine Translation: A Comprehensive Review. Journal of Natural Language Processing, 15(3), 345-367.

3. Doctoral or master's theses: When citing a thesis, the format is: Author. (Year of completion). Thesis title. Thesis type, institution.

For example: Johnson, M. (2020). Cross-lingual Named Entity Recognition using Neural Networks. Doctoral dissertation, University of California, Berkeley.

4. Books: For books, the citation format is: Author. (Year). Title. Place: Publisher.

For example: Smith, J. (2017). Introduction to Natural Language Processing. New York, NY: Springer.

In summary, when citing EMNLP references, make sure to list the author names, article title, publication year, conference or journal name, and page numbers accurately.
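The conference-paper pattern above is regular enough to generate mechanically. A small helper, with field names taken from the example rather than any official EMNLP style file, might look like:

```python
def format_emnlp(authors, year, title, pages, place, publisher):
    """Render the conference-paper pattern shown above:
    Author. (Year). Title. In Proceedings of ... (EMNLP), pages. Place: Publisher."""
    if len(authors) > 1:
        author_str = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_str = authors[0]
    return (f"{author_str} ({year}). {title}. "
            "In Proceedings of the Conference on Empirical Methods in "
            f"Natural Language Processing (EMNLP), {pages}. {place}: {publisher}.")

print(format_emnlp(["Smith, J.", "Johnson, A."], 2019,
                   "A Novel Approach to Sentiment Analysis", "100-110",
                   "New York, NY", "Association for Computational Linguistics"))
```

In practice most authors use BibTeX entries from the ACL Anthology rather than formatting citations by hand, but the helper makes the structure of the pattern explicit.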

An Explanation of the XLM Paper

1. Preface

Over the past year, the NLP field has advanced rapidly, from ELMo to LSTM-based models and then to last year's standout, Google's BERT. At the beginning of this year, Facebook introduced the XLM model, which performs impressively in cross-lingual pretraining.

Experimental results show that on the XNLI task it beats the previous state of the art by 4.9 percentage points; on unsupervised machine translation for WMT'16 German-English it beats the previous state of the art by 9 BLEU; and on supervised machine translation for WMT'16 Romanian-English it beats the previous state of the art by 4 BLEU.

Recent work has demonstrated the effectiveness of generative pretraining for English natural language understanding. In this work, the authors extend this approach to multiple languages and demonstrate the effectiveness of cross-lingual pretraining.

They propose two methods for learning cross-lingual language models (XLM): an unsupervised method that relies only on monolingual data, and a supervised method that exploits parallel data with a new cross-lingual language-model objective.

They obtain state-of-the-art results on cross-lingual classification and on unsupervised and supervised machine translation.

2. Introduction

Generative pretraining of sentence encoders has brought significant progress on many natural language understanding benchmarks. In this setting, a Transformer language model is trained on a large unsupervised text corpus and then fine-tuned for a specific natural language understanding (NLU) task, such as classification or natural language inference.

Although interest in learning general-purpose sentence representations has surged, research in this area has focused almost entirely on English benchmarks. Recent progress on learning and evaluating cross-lingual sentence representations in multiple languages aims to mitigate this English-centric bias and suggests that universal cross-lingual encoders can be built that map any sentence into a shared embedding space.

This paper demonstrates the effectiveness of cross-lingual language-model pretraining on several cross-lingual understanding (XLU) benchmarks. Specifically, it makes the following contributions: it proposes a new unsupervised method that uses cross-lingual language modeling to learn cross-lingual representations, and investigates two monolingual pretraining objectives; and it proposes a new supervised learning objective.
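The supervised objective is what the XLM paper calls translation language modeling (TLM): tokens are masked in a concatenated pair of parallel sentences, so the model can use context from either language to recover them. A data-preparation sketch follows; the tokenization, separator token, and mask rate here are simplified assumptions, not the paper's exact preprocessing:

```python
import random

MASK = "[MASK]"

def tlm_example(src_tokens, tgt_tokens, mask_rate=0.15, seed=0):
    """Concatenate a parallel sentence pair and randomly mask tokens on
    both sides; the training target is each masked token, which the model
    may recover using context from either language."""
    rng = random.Random(seed)
    stream = src_tokens + ["</s>"] + tgt_tokens
    inputs, targets = [], []
    for tok in stream:
        if tok != "</s>" and rng.random() < mask_rate:
            inputs.append(MASK)
            targets.append(tok)   # to be predicted
        else:
            inputs.append(tok)
            targets.append(None)  # not predicted
    return inputs, targets

inp, tgt = tlm_example(["the", "cat", "sleeps"], ["le", "chat", "dort"])
print(inp)
```

Because the English and French halves share one input stream, a masked English word can be predicted from its French translation, which is precisely what encourages aligned cross-lingual representations.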

The Evolution and Characteristics of Natural Language Processing Technology

Natural Language Processing (NLP) is an important branch of artificial intelligence (AI), devoted to research and development of technology that enables computers to understand, process, and generate natural language. The development of NLP technology can be traced back to the 1950s and has gone through several stages and major technical breakthroughs.

In the early decades, NLP research focused mainly on text processing and machine translation, characterized by rule-based and dictionary-based methods. Researchers at the time tried to convert natural language into forms that computers could understand and process, in order to cope with the ambiguity and complexity of natural language. However, these methods were limited by the finiteness of hand-written rules and encoded linguistic knowledge, and struggled with large-scale corpora and diverse language use.

By the 1980s, statistical methods began to emerge in NLP. As computing power improved and large-scale corpora accumulated, researchers started using statistical models to tackle problems such as speech recognition, machine translation, and text classification. Statistical methods infer the regularities of language use by analyzing large amounts of language data, allowing computers to understand and process natural language more effectively.

In recent years, the rapid development of deep learning has brought new breakthroughs to NLP. Deep neural network models learn language representations and semantics through multiple layers of neurons, greatly improving the performance of NLP technology. Deep learning has produced a series of important research results in machine translation, sentiment analysis, automatic question answering, and other areas, pushing NLP to new heights.

This evolution has given NLP technology several notable characteristics. First, its range of applications keeps broadening: beyond basic text processing and machine translation, NLP is applied to speech recognition, sentiment analysis, automatic question answering, knowledge graphs, and more, and powers applications such as intelligent assistants, intelligent customer service, and recommender systems across many industries. Second, its performance keeps improving: with statistical methods and deep learning, NLP has made significant progress in language understanding, generation, and analysis; in machine translation, for example, neural network models now outperform traditional statistical machine translation methods.

Multilingual authoring using feedback texts

Richard Power and Donia Scott
ITRI, University of Brighton
Lewes Road, Brighton BN2 4AT, UK

Abstract

There are obvious reasons for trying to automate the production of multilingual documentation, especially for routine subject-matter in restricted domains (e.g. technical instructions). Two approaches have been adopted: Machine Translation (MT) of a source text, and Multilingual Natural Language Generation (M-NLG) from a knowledge base. For MT, information extraction is a major difficulty, since the meaning must be derived by analysis of the source text; M-NLG avoids this difficulty but seems at first sight to require an expensive phase of knowledge engineering in order to encode the meaning. We introduce here a new technique which employs M-NLG during the phase of knowledge editing. A 'feedback text', generated from a possibly incomplete knowledge base, describes in natural language the knowledge encoded so far, and the options for extending it. This method allows anyone speaking one of the supported languages to produce texts in all of them, requiring from the author only expertise in the subject-matter, not expertise in knowledge engineering.

1 Introduction

The production of multilingual documentation has an obvious practical importance. Companies seeking global markets for their products must provide instructions or other reference materials in a variety of languages. Large political organizations like the European Union are under pressure to provide multilingual versions of official documents, especially when communicating with the public. This need is met mostly by human translation: an author produces a source document which is passed to a number of other people for translation into other languages.

Human translation has several well-known disadvantages. It is not only costly but time-consuming, often delaying the release of the product in some markets; also the quality is uneven and hard to control (Hartley and Paris, 1997). For all these reasons, the production of multilingual documentation is an obvious candidate for automation, at least for some classes of document. Nobody expects that automation will be applied in the foreseeable future for literary texts ranging over wide domains (e.g. novels). However, there is a mass of non-literary material in restricted domains for which automation is already a realistic aim: instructions for using equipment are a good example.

The most direct attempt to automize multilingual document production is to replace the human translator by a machine. The source is still a natural language document written by a human author; a program takes this source as input, and produces an equivalent text in another language as output. Machine translation has proved useful as a way of conveying roughly the information expressed by the source, but the output texts are typically poor and over-literal. The basic problem lies in the analysis phase: the program cannot extract from the source all the information that it needs in order to produce a good output text. This may happen either because the source is itself poor (e.g. ambiguous or incomplete), or because the source uses constructions and concepts that lie outside the program's range. Such problems can be alleviated to some extent by constraining the source document, e.g. through use of a 'Controlled Language' such as AECMA (1995).

An alternative approach to translation is that of generating the multilingual documents from a non-linguistic source. In the case of automatic Multilingual Natural Language Generation (M-NLG), the source will be a knowledge base expressed in a formal language. By eliminating the analysis phase of MT, M-NLG can yield high-quality output texts, free from the 'literal' quality that so often arises from structural imitation of an input text. Unfortunately, this benefit is gained at the cost of a huge increase in the difficulty of obtaining the source. No longer can the domain expert author the document directly by writing a text in natural language. Defining the source becomes a task akin to building an expert system, requiring collaboration between a domain expert (who understands the subject-matter of the document) and a knowledge engineer (who understands the knowledge representation formalism). Owing to this cost, M-NLG has been applied mainly in contexts where the knowledge base is already available, having been created for another purpose (Iordanskaja et al., 1992; Goldberg et al., 1994); for discussion see Reiter and Mellish (1993).

Is there any way in which a domain expert might author a knowledge base without going through this time-consuming and costly collaboration with a knowledge engineer? Assuming that some kind of mediation is needed between domain expert and knowledge formalism, the only alternative is to provide easier tools for editing knowledge bases. Some knowledge management projects have experimented with graphical presentations which allow editing by direct manipulation, so that there is no need to learn the syntax of a programming language; see for example Skuce and Lethbridge (1995). This approach has also been adopted in two M-NLG systems: GIST (Power and Cavallotto, 1996), which generates social security forms in English, Italian and German; and DRAFTER (Paris et al., 1995), which generates instructions for software applications in English and French.

These projects were the first attempts to produce symbolic authoring systems - that is, systems allowing a domain expert with no training in knowledge engineering to author a knowledge base (or symbolic source) from which texts in many languages can be generated. Although helpful, graphical tools for managing knowledge bases remain at best a compromise solution. Diagrams may be easier to understand than logical formalisms, but they still lack the flexibility and familiarity of natural language.
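The paper's central idea — a feedback text generated from a possibly incomplete knowledge base, with unfilled knowledge surfaced as selectable options — can be caricatured in a few lines. The slot names and phrasing below are invented for illustration and are not taken from GIST or DRAFTER:

```python
def feedback_text(kb):
    """Verbalize a partial knowledge base: known slots become statements,
    unknown slots (None) become bracketed options for the author to fill."""
    parts = []
    for slot, value in kb.items():
        label = slot.replace("_", " ")
        if value is None:
            parts.append(f"[specify the {label}]")
        else:
            parts.append(f"the {label} is {value}")
    return "To perform this task, " + "; ".join(parts) + "."

kb = {"action": "save the file", "menu": "File", "shortcut": None}
print(feedback_text(kb))
# To perform this task, the action is save the file; the menu is File; [specify the shortcut].
```

In the systems the paper describes, the bracketed spans are mouse-sensitive anchors the author clicks to extend the knowledge base; regenerating the feedback text after each edit keeps the author working entirely in natural language, while the same knowledge base drives output generation in every supported language.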