Neural Networks for Latent Semantic Analysis


Text Coherence Analysis Based on Latent Semantic Analysis

Received: 2006-02-21. Supported by the PhD Programs Foundation (20050007023). Tang Shiping, PhD candidate; main research area: natural language understanding.
Computer Applications and Software, 2008
paragraphs}, but the paragraph unit here refers mainly to the central idea a paragraph carries, not merely to information such as its position and boundaries.

Text is the stored representation of an article inside a computer; therefore, in the discussion below […] this fact, we attempt to partition the text into levels in an ordered fashion.

Suppose a text T has n paragraphs and K levels. Writing H for a level and P for a paragraph, the composition is T = H_1 H_2 … H_K = (P_{i_1} … P_{i_2 − 1})(P_{i_2} … P_{i_3 − 1}) … (P_{i_K} … P_n), where i_1 = 1 ≤ i_2 ≤ … ≤ i_K ≤ n. What is emphasized is the ordered nature of the levels: each level consists of several consecutive, ordered paragraphs.

Mapping terms to the concept level with latent semantic indexing will undoubtedly help strengthen the cohesion among paragraphs within the same level. The paragraphs of one level jointly support the theme the level expresses, so they are conceptually highly clustered, and their term-usage frequencies also tend to be very similar; according to […]
Keywords: vector space model; latent semantic analysis; text coherence; computer-aided assessment
LATENT SEMANTIC ANALYSIS BASED TEXT COHERENCE ANALYSIS
Tang Shiping, Fan Xiaozhong, Zhu Jianyong
(School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)

Some Methods for Semantic Analysis (Part 1)

Semantic analysis, in this article, means using various machine learning methods to mine and learn the deeper-level concepts behind text, images, and other data.

Wikipedia's definition: "In machine learning, semantic analysis of a corpus is the task of building structures that approximate concepts from a large set of documents (or images)."

Over my years of work I have practiced on a number of projects: search ads, social ads, Weibo ads, brand ads, content ads, and so on.

To maximize the returns of our advertising platform, we must first understand the user, the context (the page where an ad will be shown), and the ad itself, so that the most suitable ad can be shown to each user.

None of this is possible without semantic analysis of users, contexts, and ads, which has spawned a number of subprojects: text semantic analysis, image semantic understanding, semantic indexing, short-string semantic association, user-ad semantic matching, and more.

In what follows I will write about semantic analysis methods as I know them. Our work is mostly results-driven, so my grasp of the underlying theory may not be deep; take this as a personal summary of knowledge points, and please point out anything improper.

This article consists of four parts: basic text processing, text semantic analysis, image semantic analysis, and a summary.

We first cover basic text processing methods, which form the foundation of semantic analysis.

Then text and images each get a section on their semantic analysis methods; note that although these are separate sections, the two share many common and related techniques.

Finally, we briefly introduce how semantic analysis is applied to user-ad matching in Guangdiantong, and look ahead at future semantic analysis methods.

1 Basic Text Processing

Before discussing text semantic analysis, we first cover basic text processing, because it forms the foundation of semantic analysis.

Text processing has many aspects; given this article's focus, we only introduce Chinese word segmentation and term weighting here.

1.1 Chinese Word Segmentation

Given a piece of text, the first step is usually word segmentation.

Segmentation methods generally fall into the following categories:

• String-matching (dictionary-based) methods. These scan the text in a chosen direction and segment it by looking candidate strings up in a lexicon one by one.
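As a concrete illustration, here is a minimal sketch of forward maximum matching (FMM), one common string-matching segmentation scheme: at each position, greedily take the longest dictionary word. The tiny vocabulary is hypothetical, for illustration only.

```python
# Minimal forward-maximum-matching (FMM) segmentation sketch.
# The dictionary below is a hypothetical toy vocabulary.
def fmm_segment(text, dictionary, max_len=4):
    """Greedily match the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, fall back to a single character.
        for j in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + j]
            if j == 1 or cand in dictionary:
                words.append(cand)
                i += j
                break
    return words

vocab = {"深度", "学习", "深度学习", "语义", "分析", "语义分析"}
print(fmm_segment("深度学习语义分析", vocab))  # → ['深度学习', '语义分析']
```

FMM is greedy and direction-dependent; production segmenters combine dictionary matching with statistical models, but the lookup loop above is the core of the string-matching family.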

An Annotated Chinese Translation of "Deep Learning" (Hinton et al.)

Deep Learning
Yann LeCun, Yoshua Bengio & Geoffrey Hinton

Deep learning refers to computational models composed of multiple processing layers that learn representations of data with multiple levels of abstraction.

These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection, and many other domains such as drug discovery and genomics.

Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer.

Deep convolutional nets have brought breakthroughs in processing images, video, speech, and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

Machine learning technology powers many aspects of modern society: from web search to content filtering on social networks to recommendations on e-commerce sites, and it is increasingly present in consumer products such as cameras and smartphones.

Machine learning systems are used to identify objects in images, transcribe speech into text, match news items, posts, or products with users' interests, and select relevant search results.

Increasingly, these applications make use of a class of techniques called deep learning.

Conventional machine learning techniques were limited in their ability to process natural data in its raw form.

For decades, constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise to design a feature extractor that transformed the raw data (such as the pixel values of an image) into a suitable internal representation or feature vector, from which the learning subsystem, often a classifier, could detect or classify patterns in the input.

Representation learning is a set of methods that allows a machine to be fed raw data and to automatically discover the representations needed for detection or classification.

Deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple nonlinear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, more abstract level.

With enough such transformations, very complex functions can be learned.

For classification tasks, higher layers of representation amplify the aspects of the input that are important for discrimination and suppress irrelevant variations.

An image, for example, comes in the form of an array of pixel values, and the learned features in the first representation layer typically indicate the presence or absence of edges at particular orientations and locations in the image.

The second layer typically detects motifs by spotting particular arrangements of edges, regardless of small variations in the edge positions.
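The edge-sensitive first-layer features described above can be mimicked by hand: the sketch below cross-correlates a tiny image with a fixed Sobel-style kernel, a stand-in for one learned filter, so the response is strong only where a vertical edge is present. All values here are illustrative.

```python
import numpy as np

# A hand-set "first layer" filter that responds to vertical edges,
# analogous to the oriented-edge detectors a trained first layer learns.
def conv2d(img, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((5, 6))
img[:, 3:] = 1.0                      # a vertical step edge at column 3
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)
resp = conv2d(img, sobel_x)           # strong response only around the edge
```

In a real network the kernel weights are learned by backpropagation rather than fixed, and many such filters are applied in parallel; the mechanics of the sliding window are the same.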

100 Must-Read NLP Papers (collected, ready to download)

A paper collection I compiled myself; the download link has been updated (extraction code: x7tn). This is a list of 100 important natural language processing (NLP) papers that serious students and researchers working in the field should probably know about and read.

This list is compiled by .

I welcome any feedback on this list.

This list is originally based on the answers for a Quora question I posted years ago: [What are the most important research papers which all NLP students should definitely read?]( -are-the-most-important-research-paper -which-all-NLP-students-should- definitread).

I thank all the people who contributed to the original post.

This list is far from complete or objective, and is evolving, as important papers are being published year after year.

Improving the Adversarial Robustness of Deep Neural Networks via Self-Supervised Contrastive Learning
2 Negative-Sample-Free Self-Supervised Contrastive Learning

2.1 Negative-sample-free contrastive learning with Siamese networks

Many self-supervised learning methods [5] are based on a cross-view prediction framework, which models feature learning as the problem of multiple views of the same sample image predicting one another. The prediction problem is usually cast in a representation space: the feature representation of an arbitrary random augmentation of a sample image should predict the feature representations of other transformed versions of the same image. Direct prediction in representation space, however, can lead to collapsed representations; for example, a representation that stays constant across views trivially achieves cross-view prediction. To address this, contrastive learning methods recast the prediction problem as a discrimination problem: deciding whether two augmented images come from the same sample image. In most cases this mechanism prevents training from degenerating into collapsed representations; its drawback is that it requires negative samples for the contrast. Choosing negative samples optimally, and determining how many are needed, remain major challenges both in theory and in practice.
Adversarial examples are generated either by perturbing along the gradient direction with the Fast Gradient Sign Method (FGSM) [3], or by iteratively maximizing the network's loss function with Projected Gradient Descent (PGD) [4]. Adversarial training of deep neural networks is a supervised learning process, and its sample complexity is much higher than that of standard training, so it requires larger, high-quality labeled data sets.
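As a minimal illustration of FGSM, the sketch below perturbs an input along the sign of the loss gradient for a one-layer logistic "network" with hypothetical weights; real attacks apply the same one-step rule x_adv = x + ε·sign(∇ₓL) to a deep network.

```python
import numpy as np

# One-step FGSM sketch on logistic regression with cross-entropy loss.
# The weights, input, and epsilon below are hypothetical toy values.
def fgsm(x, y, w, b, eps):
    """Return x + eps * sign(grad_x loss) for a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))   # sigmoid output
    grad_x = (p - y) * w                             # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.3])
x_adv = fgsm(x, y=1.0, w=w, b=0.0, eps=0.1)          # each coord moves by ±0.1
```

PGD repeats a projected version of this step several times, which is why it is the stronger (and costlier) attack mentioned above.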
Self-Supervised Contrastive Learning for Improving the Adversarial Robustness of Deep Neural Networks
SUN Hao1, XU Yanjie1, CHEN Jin2, LEI Lin1, JI Kefeng1, KUANG Gangyao1
Vol. 37, No. 6, June 2021. Article ID: 1003-0530(2021)06-0903-09.

Object-Feature-Based Deep Hashing for Cross-Modal Retrieval

Zhu Jie1, Bai Hongyu1, Zhang Zhongyu1, Xie Bojun2, Zhang Junsan3+
1. Department of Information Management, The National Police University for Criminal Justice, Baoding, Hebei 071000
2. College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002
3. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580
+ Corresponding author, e-mail: *******************.cn

Abstract: With the rapid growth of data of different modalities on the Internet, cross-modal retrieval has gradually become a hot research topic.

Because it is fast and effective, hashing-based retrieval has become one of the main approaches to large-scale cross-modal retrieval.

Among the many deep image-text cross-modal retrieval algorithms, the design criterion is usually to make the deep features of an image as similar as possible to the deep features of the corresponding text.

But such methods mix the image's background information into the feature learning, which degrades retrieval performance.

To solve this problem, an object-feature-based deep hashing (OFBDH) cross-modal retrieval method is proposed.

The method learns optimized, discriminative maximum-activation features from the feature maps as object features, and integrates them into the cross-modal network learning for images and text.

Experimental results show that OFBDH achieves good cross-modal retrieval results on the MIRFLICKR-25K, IAPR TC-12, and NUS-WIDE data sets.

Key words: object features; cross-modal loss; network parameter learning; retrieval
Document code: A. CLC number: TP391.

Object Feature Based Deep Hashing for Cross-Modal Retrieval
ZHU Jie1, BAI Hongyu1, ZHANG Zhongyu1, XIE Bojun2, ZHANG Junsan3+
1. Department of Information Management, The National Police University for Criminal Justice, Baoding, Hebei 071000, China
2. College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
3. College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China

Abstract: With the rapid growth of data with different modalities on the Internet, cross-modal retrieval has gradually become a hot research topic. Due to its efficiency and effectiveness, hashing-based methods have become one of the most popular large-scale cross-modal retrieval strategies. In most of the image-text cross-modal retrieval methods, the goal is to make the deep features of the images similar to the corresponding deep text features. However, these methods incorporate background information of the images into the feature learning; as a result, the retrieval performance is decreased. To solve this problem, OFBDH (object feature based deep hashing) is proposed to learn […]

Journal of Frontiers of Computer Science and Technology, 1673-9418/2021/15(05)-0922-09, doi: 10.3778/j.issn.1673-9418.2006062.
Funding: National Natural Science Foundation of China (61802269); Natural Science Foundation of Hebei Province for Young Scientists (F2018511002); Science and Technology Research Project of Higher Education Institutions of Hebei Province (Z2019037, QN2018251); research start-up funds for high-level innovative talent of Hebei University; 2019 provincial college students' innovation and entrepreneurship training program of The National Police University for Criminal Justice (S201911903004).

Common Training Methods for Latent Semantic Models

The latent semantic model is a widely used text representation method: it represents text as points in a low-dimensional vector space, which facilitates tasks such as text classification and clustering.

In practical applications, training an efficient latent semantic model is very important.

This article introduces the training methods commonly used for latent semantic models.

1. Training methods based on matrix factorization

1.1 SVD
Singular Value Decomposition (SVD) is a matrix-factorization method that decomposes a matrix into the product of three matrices, A = UΣV^T.

Here U and V are orthogonal matrices, and Σ is a diagonal matrix whose diagonal entries are the singular values.

In a latent semantic model, we can factor the user-item rating matrix R into the product of two low-dimensional matrices P and Q, i.e., R ≈ PQ^T.

Here P is the matrix of user vectors and Q is the matrix of item vectors.

Concretely, for SVD we first preprocess the rating matrix R.

Typically, we subtract each user's or each item's mean rating and normalize the remainder.

We can then use SVD to decompose the processed rating matrix R into the three matrices P, Q, and Σ.

Again, P and Q are low-dimensional matrices, and Σ is diagonal with the singular values on its diagonal.

By adjusting the dimensions of P and Q, we control the model's complexity.

During training, we minimize the error between predicted and actual ratings using methods such as gradient descent.

Concretely, at each iteration we randomly pick a user-item pair (u, i), compute the predicted rating p̂_ui, and update the corresponding vectors in P and Q based on the actual rating r_ui.

The update rules are as follows:
p_u ← p_u + η(e_ui · q_i − λ · p_u)
q_i ← q_i + η(e_ui · p_u − λ · q_i)
where η is the learning rate, λ is the regularization parameter, and e_ui = r_ui − p̂_ui is the error between the actual and predicted ratings.
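The update rules above can be sketched directly; the toy rating matrix and hyperparameters below are hypothetical, with 0 standing for an unobserved rating.

```python
import numpy as np

# SGD matrix factorization sketch implementing
#   e_ui = r_ui - p_u . q_i
#   p_u <- p_u + eta * (e_ui * q_i - lam * p_u)
#   q_i <- q_i + eta * (e_ui * p_u - lam * q_i)
def sgd_mf(R, k=2, eta=0.01, lam=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], k))   # user vectors
    Q = 0.1 * rng.standard_normal((R.shape[1], k))   # item vectors
    obs = [(u, i) for u in range(R.shape[0])
                  for i in range(R.shape[1]) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i in obs:
            e = R[u, i] - P[u] @ Q[i]                # prediction error
            P[u], Q[i] = (P[u] + eta * (e * Q[i] - lam * P[u]),
                          Q[i] + eta * (e * P[u] - lam * Q[i]))
    return P, Q

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])   # hypothetical ratings; 0 = unobserved
P, Q = sgd_mf(R)
```

After training, P[u] @ Q[i] predicts the missing entries; the small λ keeps the factor norms from growing unchecked.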

1.2 NMF
Nonnegative Matrix Factorization (NMF) is another matrix-factorization method that is also widely used in latent semantic models.

Unlike SVD, NMF requires all matrix entries to be nonnegative.

Concretely, in NMF we preprocess the rating matrix R and factor it into the product of two nonnegative matrices P and Q, i.e., R ≈ PQ.
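The text does not fix a training algorithm for NMF; one standard choice is Lee and Seung's multiplicative updates, sketched below, which keep P and Q nonnegative by construction. The toy matrix is hypothetical.

```python
import numpy as np

# NMF sketch using Lee & Seung's multiplicative updates for R ~= P @ Q
# under the Frobenius-norm objective; 1e-9 guards against division by zero.
def nmf(R, k, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], k)) + 0.1            # nonnegative init
    Q = rng.random((k, R.shape[1])) + 0.1
    for _ in range(iters):
        Q *= (P.T @ R) / (P.T @ P @ Q + 1e-9)        # Q stays >= 0
        P *= (R @ Q.T) / (P @ Q @ Q.T + 1e-9)        # P stays >= 0
    return P, Q

R = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])       # a rank-1 nonnegative matrix
P, Q = nmf(R, k=1)               # P @ Q recovers R almost exactly
```

Because the updates multiply by nonnegative ratios, no projection step is needed, which is the practical appeal of this algorithm over projected gradient descent.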

Research Methods for Semantic Spaces
The following are the titles of nine technical memoranda. The first five concern human-computer interaction and the last four concern mathematical graph theory; the two groups of topics are unrelated. Each title is a column, and each content word occurring in at least two titles (italicized in the original) is a row, which yields a 12-row, 9-column raw matrix {A}.

C1: Human machine interface for ABC computer applications
C2: A survey of user opinion of computer system response time
C3: The EPS user interface management system
C4: System and human system engineering testing of EPS
C5: Relation of user perceived response time to error measurement
M1: The generation of random, binary, ordered trees
M2: The intersection graph of paths in trees
M3: Graph minors IV: Widths of trees and well-quasi-ordering
M4: Graph minors: A survey

{A} = (partially preserved; the term counts for columns C3-C5 and M1-M2):
C3: 0 1 0 1 1 0 0 1 0 0 0 0
C4: 1 0 0 0 2 0 0 1 0 0 0 0
C5: 0 0 0 1 0 1 1 0 0 0 0 0
M1: 0 0 0 0 0 0 0 0 0 1 0 0
M2: 0 0 0 0 0 0 0 0 0 1 1 0
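The LSA step that follows such a matrix can be sketched with a truncated SVD; the tiny term-document counts below are illustrative, not the actual Deerwester matrix. Documents that share terms end up close in the rank-2 latent space, while unrelated documents end up nearly orthogonal.

```python
import numpy as np

# LSA sketch: rank-2 truncated SVD of a toy term-document matrix.
# Rows are terms, columns are documents; counts are hypothetical.
A = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0]], float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T   # each row: one document in latent space

def cos(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share vocabulary; documents 0 and 3 share none.
```

Comparing cos(docs[0], docs[1]) with cos(docs[0], docs[3]) shows the concept-level clustering that latent semantic indexing provides over raw term matching.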

A Brief Overview of Deep Learning for Text Matching

I have recently been reading papers on similar-text retrieval, but this area has too many models and papers, so I made a simple survey of deep text matching methods for my own convenience.

Matching methods can be divided into three classes. (1) Representation-based deep models (single-granularity document representation): the main idea is to first encode each text independently into one dense vector (a distributed representation), and then directly compute the similarity between the two vectors as the matching score.
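A representation-based matcher can be sketched in a few lines: encode each text independently (here, a mean of toy word vectors, which are hypothetical), then score the pair by cosine similarity.

```python
import numpy as np

# Representation-based matching sketch: encode each text into one dense
# vector, then score by cosine similarity. The embeddings are toy values.
emb = {"cheap": np.array([1.0, 0.2]), "inexpensive": np.array([0.9, 0.3]),
       "flight": np.array([0.1, 1.0]), "ticket": np.array([0.2, 0.9])}

def encode(tokens):
    """Encode a text independently of its matching partner (mean pooling)."""
    return np.mean([emb[t] for t in tokens], axis=0)

def match_score(a, b):
    va, vb = encode(a), encode(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

s1 = match_score(["cheap", "flight"], ["inexpensive", "ticket"])  # paraphrase
s2 = match_score(["cheap", "flight"], ["cheap", "flight"])        # identical
```

The key property, and limitation, is that the two texts never see each other before the final similarity: interaction-based models, described next, let them interact earlier.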

(2) Interaction-based deep models (multi-granularity document representation): these hold that a single-granularity vector is too coarse to represent a text. Representations should be built at multiple semantic granularities, the two texts should interact earlier, and the matching score is then derived by mining the pattern features of their interactions.

(3) BERT and its successors. Although text matching was dominated by the two model classes above before BERT appeared, text matching is in fact a broad concept with many tasks under it, as follows. 1. Paraphrase identification: judging whether two texts express the same meaning, i.e., whether they form a paraphrase pair.

Some data sets give graded similarity levels, with higher levels meaning more similar; others give a direct 0/1 match label.

Such scenarios are generally modeled as classification problems.

2. Textual entailment recognition. Textual entailment is a task within NLI (natural language inference). Its form is: given a premise text, infer the relation of a hypothesis text to it, usually either entailment (the hypothesis can be inferred from the text) or contradiction (the hypothesis contradicts the text).

The output of textual entailment is these few class probabilities.

3. Question answering (QA). QA is one of the most common text-matching tasks and is easy to understand: find the answer to a question within a passage or document (a setting now usually called reading comprehension), or find the documents that contain the answer. QA is often modeled as classification, but real scenarios usually require picking the correct answer out of several candidates, and the data sets are often built as one matched positive plus several negatives, so it is often modeled as a ranking problem instead.

A Survey of DeepFake Detection and Digital Image Manipulation Forensics

Survey 1: An Introduction to DeepFake Generation and Defense (reposted from the WeChat public account 隐者联盟). "DeepFake" is a portmanteau of "deep learning" and "fake," referring specifically to AI-based synthesis of human images; that is Wikipedia's basic definition of the term.

Broadly speaking, deep forgery includes deep-learning-based generation and editing of images, text, audio, video, and other media.

From the Reddit "DeepFake" posts that caused a sensation in 2017 to the recent popularity of "蚂蚁呀嘿," DeepFakes have set off wave after wave of applications across the Internet.

The development of deep learning has made face forgery accessible to the general public, and the problems caused by the abuse of DeepFake technology now seriously threaten social trust, judicial fairness, and even national security, so the corresponding defense technologies have also developed rapidly.

Overview of forgery techniques. 1. Methods based on image-domain feature encoding. At present, fully automated deep face forgery is not yet mature. The mainstream forgery techniques operate in the face image domain, tampering via feature encoding and reconstruction of face images; the tampering types fall into two broad classes, face swapping and attribute editing.

Face swapping aims to replace the facial region of a target face with a source face, which changes the identity attribute of the target image.

Attribute editing, by contrast, tampers with attributes of the target face other than identity, for example via expression transfer or lip-sync manipulation.

The classic face-swapping algorithm is "Deepfakes" [1], whose main structure is based on autoencoders.

For a source face A and a target face B, an encoder with shared weights is trained to encode facial features, while A and B each train an independent decoder for face reconstruction.

At test time, target B is encoded with the trained encoder, and B's features are then decoded with A's trained decoder, achieving a face swap between A and B.
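The shared-encoder/two-decoder arrangement can be sketched structurally with linear maps: a fixed random "encoder" plus per-identity decoders fitted by least squares, with swapping done by decoding B's codes through A's decoder. This is only a structural sketch with random stand-in data; the real Deepfakes pipeline trains deep convolutional autoencoders jointly.

```python
import numpy as np

# Structural sketch of the shared-encoder / two-decoder setup.
# E is a fixed random linear "encoder"; real systems learn it jointly.
rng = np.random.default_rng(0)
E = rng.standard_normal((4, 8))      # shared encoder: 8-dim "face" -> 4-dim code

def fit_decoder(X):
    """Fit a linear decoder (code -> face) for one identity by least squares."""
    Z = X @ E.T                      # encode all samples of this identity
    D, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return D                         # minimizes ||Z @ D - X||_F

X_a = rng.standard_normal((50, 8))   # stand-in "faces" of identity A
X_b = rng.standard_normal((50, 8))   # stand-in "faces" of identity B
D_a, D_b = fit_decoder(X_a), fit_decoder(X_b)

swapped = (X_b @ E.T) @ D_a          # swap: encode B, decode with A's decoder
```

Because D_a is the least-squares optimum for identity A, it reconstructs A's data better than D_b does, which is the asymmetry the face swap exploits.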

To achieve better swapping quality and controllability, techniques such as adversarial losses and face disentanglement and reconstruction have also been used to constrain and supervise deep forgery algorithms, producing many variant methods, such as FSGAN [2] and FaceShifter [3], that greatly improve the quality of the generated fake faces.

Attribute-editing algorithms follow the same basic principle as face swapping, but they tamper with facial attributes without changing the target person's identity, and are typically used for applications such as expression transfer and lip-sync manipulation.

Text Sentiment Analysis Based on LDA

Undergraduate Thesis
School of Computer Science and Technology
Title: Text Sentiment Analysis Based on LDA
Year: 2014. Major: Information Management and Information Systems. Class: 14 IM. Student ID: 1427402014. Name: He Cong. Advisor: Yan Jianfeng, Associate Professor. Submitted: May 19, 2019.

Contents
Abstract
Preface
Chapter 1 Overview
1.1 Overview of sentiment analysis
1.1.1 Main research content
1.1.2 Classification of text sentiment analysis
1.1.3 Topic models in sentiment analysis
1.2 Research status at home and abroad
1.3 Organization of this thesis
Chapter 2 Data Preprocessing
2.1 Overview
2.2 Word segmentation and simplified/traditional conversion
2.3 Stop-word removal
2.4 Extracting sentiment information
2.4.1 Building the sentiment lexicon
2.4.2 Extracting sentiment information
2.4.3 Data
2.5 Chapter summary
Chapter 3 LDA Modeling
3.1 LDA concepts
3.1.1 The idea of probabilistic topics
3.1.2 The LDA model
3.2 Experiments
3.2.1 Splitting the data set
3.2.2 Data dictionary
3.2.3 Vectorization
3.2.4 TF-IDF as feature values
3.2.5 Training the LDA model
3.3 Chapter summary
Chapter 4 SVM Classification
4.1 SVM concepts
4.1.1 Linear classification
4.1.2 Soft-margin maximization
4.1.3 Nonlinear support vector machines
4.2 The SVC used in this thesis
4.2.1 Algorithm description
4.3 Experiments
4.3.1 Feature selection
4.3.2 Data conversion
4.3.3 Random train/test split
4.3.4 SVM training and prediction
4.4 Chapter summary
Chapter 5 Naive Bayes Classification
5.1 Concepts
5.2 Bayes' theorem
5.2.1 Naive Bayes
5.2.2 The Bernoulli model
5.3 The naive Bayes classifier used in this thesis
5.3.1 Algorithm description
5.4 Experiments
5.4.1 Feature selection
5.4.2 Vectorization
5.4.3 Training the naive Bayes classifier
5.4.4 Testing
5.4.5 Accuracy
5.5 Chapter summary
Chapter 6 Conclusions and Outlook
6.1 Summary of this thesis
6.2 Open problems and future work
References
Acknowledgements

Abstract: The rapid development of the Internet has multiplied social media of all kinds, and people publish all sorts of comments, blogs, and other information online.
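The TF-IDF featurization mentioned in section 3.2.4 of the outline can be sketched in plain Python, using tf-idf(t, d) = tf(t, d) · log(N / df(t)) with no smoothing; the toy corpus is hypothetical.

```python
import math

# Plain TF-IDF sketch: tf = term frequency within the document,
# idf = log(N / document frequency). No smoothing or normalization.
def tfidf(docs):
    N = len(docs)
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1        # document frequency per term
    weights = []
    for d in docs:
        w = {t: d.count(t) / len(d) * math.log(N / df[t]) for t in set(d)}
        weights.append(w)
    return weights

docs = [["good", "movie", "good"], ["bad", "movie"], ["good", "plot"]]
W = tfidf(docs)   # e.g. "bad" gets a high weight: rare and frequent in doc 1
```

Terms that appear in every document get idf = log(1) = 0, so the weighting automatically discounts uninformative words before the LDA/SVM stages described above.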

Bidirectional Recurrent Neural Networks

Bidirectional Recurrent Neural Networks
Mike Schuster and Kuldip K. Paliwal, Member, IEEE

Abstract—In the first part of this paper, a regular recurrent neural network (RNN) is extended to a bidirectional recurrent neural network (BRNN). The BRNN can be trained without the limitation of using input information just up to a preset future frame. This is accomplished by training it simultaneously in positive and negative time direction. Structure and training procedure of the proposed network are explained. In regression and classification experiments on artificial data, the proposed structure gives better results than other approaches. For real data, classification experiments for phonemes from the TIMIT database show the same tendency. In the second part of this paper, it is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution. For this part, experiments on real data are reported.

Index Terms—Recurrent neural networks.

I. INTRODUCTION

A. General

Many classification and regression problems of engineering interest are currently solved with statistical approaches using the principle of "learning from examples." For a certain model with a given structure inferred from the prior knowledge about the problem and characterized by a number of parameters, the aim is to estimate these parameters accurately and reliably using a finite amount of training data. In general, the parameters of the model are determined by a supervised training process, whereas the structure of the model is defined in advance. Choosing a proper structure for the model is often the only way for the designer of the system to put in prior knowledge about the solution of the problem.
Artificial neural networks (ANNs) (see [2] for an excellent introduction) are one group of models that take the principle "infer the knowledge from the data" to an extreme. In this paper, we are interested in studying ANN structures for one particular class of problems that are represented by temporal sequences of input-output data pairs. For these types of problems, which occur, for example, in speech recognition, time series prediction, dynamic control systems, etc., one of the challenges is to choose an appropriate network structure that, at least theoretically, is able to use all available input information to predict a point in the output space.

Manuscript received June 5, 1997. The associate editor coordinating the review of this paper and approving it for publication was Prof. Jenq-Neng Hwang. M. Schuster is with the ATR Interpreting Telecommunications Research Laboratory, Kyoto, Japan. K. K. Paliwal is with the ATR Interpreting Telecommunications Research Laboratory, Kyoto, Japan, on leave from the School of Microelectronic Engineering, Griffith University, Brisbane, Australia. Publisher Item Identifier S 1053-587X(97)08055-0.

Many ANN structures have been proposed in the literature to deal with time-varying patterns. Multilayer perceptrons (MLPs) have the limitation that they can only deal with static data patterns (i.e., input patterns of a predefined dimensionality), which requires definition of the size of the input window in advance. Waibel et al. [16] have pursued time-delay neural networks (TDNNs), which have proven to be a useful improvement over regular MLPs in many applications. The basic idea of a TDNN is to tie certain parameters in a regular MLP structure without restricting the learning capability of the ANN too much. Recurrent neural networks (RNNs) [5], [8], [12], [13], [15] provide another alternative for incorporating temporal dynamics and are discussed in more detail in a later section.

In this paper, we investigate different ANN structures for incorporating temporal dynamics. We conduct a
number of experiments using both artificial and real-world data. We show the superiority of RNNs over the other structures. We then point out some of the limitations of RNNs and propose a modified version of an RNN called a bidirectional recurrent neural network, which overcomes these limitations.

B. Technical

Consider a (time) sequence of input data vectors x_1, x_2, …, x_T and a sequence of corresponding output data vectors y_1, y_2, …, y_T, with neighboring data pairs (in time) being somehow statistically dependent. Given time sequences as training data, the aim is to learn the rules to predict the output data given the input data. Inputs and outputs can, in general, be continuous and/or categorical variables. When outputs are continuous, the problem is known as a regression problem, and when they are categorical (class labels), the problem is known as a classification problem. In this paper, the term prediction is used as a general term that includes regression and classification.

1) Unimodal Regression: For unimodal regression or function approximation, the components of the output vectors are continuous variables. The ANN parameters are estimated to maximize some predefined objective criterion (e.g., maximize the likelihood of the output data). When the distribution of the errors between the desired and the estimated output vectors is assumed to be Gaussian with zero mean and a fixed global data-dependent variance, the likelihood criterion reduces to the convenient Euclidean distance measure between the desired and the estimated output vectors, or the mean-squared-error criterion, which has to be minimized during training [2]. It has been shown by a number of researchers [2], [9] that neural networks can estimate the conditional average of the desired output (or target) vectors at their network outputs, i.e., E[y_t | x_t], where E[·] is an expectation operator.

Fig. 1. General structure of a regular unidirectional RNN shown (a) with a delay line and (b) unfolded in time for two time steps.

2) Classification: In the case of a
classification problem, one seeks the most probable class out of a given pool of classes. To make this kind of problem suitable to be solved by an ANN, the categorical variables are usually coded as vectors as follows. Consider that c_t is the class at time t.¹ Then, construct an output vector y_t whose c_t-th component is one and whose other components are zero. The output vector sequence constructed in this manner, along with the input vector sequence, can be used to train the network, which then estimates the class posterior probabilities at each network output at each time point, with the quality of the estimate depending on the size of the training data and the complexity of the network. For some applications, it is not necessary to estimate the conditional posterior probability of a complete sequence; per-frame regression or classification [i.e., computing the class posteriors and deciding the class using the maximum a posteriori decision rule] is sufficient. In this case, the outputs are treated as statistically independent. Experiments for this part are conducted for artificial toy data as well as for real data.

¹Here, we want to make a distinction between C_t and c_t. C_t is a categorical random variable, and c_t is its value.

• Estimation of the conditional probability of a complete sequence of classes of length T […] to predict y_t. How much of this information is captured by a particular RNN depends on its structure and the training algorithm. An illustration of the amount of input information used for prediction with different kinds of NNs is given in Fig. 2. Future input information coming up later than t is usually also useful for prediction. With an RNN, this can be partially achieved by delaying the output by a certain number of frames D (Fig. 2). Theoretically, D could be made large, but in practice it is found that prediction results deteriorate if D is too large. A possible explanation for this could be that, with rising D, more modeling power is spent on "remembering" input information for the prediction of y_t, leaving less modeling power for combining the prediction knowledge from different input vectors. While delaying the output by some frames has been used successfully to improve results in a practical speech recognition system [12], which was also confirmed by the experiments conducted here, the optimal delay is task dependent and has to
be found by the "trial and error" method on a validation test set. Certainly, a more elegant approach would be desirable.

Fig. 2. Visualization of the amount of input information used for prediction by different network structures.

To use all available input information, it is possible to use two separate networks (one for each time direction) and then somehow merge the results. Both networks can then be called experts for the specific problem on which the networks are trained. One way of merging the opinions of different experts is to assume the opinions to be independent, which leads to arithmetic averaging for regression and to geometric averaging (or, alternatively, to an arithmetic averaging in the log domain) for classification. These merging procedures are referred to as linear opinion pooling and logarithmic opinion pooling, respectively [1], [7]. Although simple merging of network outputs has been applied successfully in practice [14], it is generally not clear how to merge network outputs in an optimal way, since different networks trained on the same data can no longer be regarded as independent.

B. Bidirectional Recurrent Neural Networks

To overcome the limitations of a regular RNN outlined in the previous section, we propose a bidirectional recurrent neural network (BRNN) that can be trained using all available input information in the past and future of a specific time frame.

1) Structure: The idea is to split the state neurons of a regular RNN into a part that is responsible for the positive time direction (forward states) and a part for the negative time direction (backward states). Outputs from forward states are not connected to inputs of backward states, and vice versa. This leads to the general structure that can be seen in Fig. 3, where it is unfolded over three time steps.

Fig. 3. General structure of the bidirectional recurrent neural network (BRNN) shown unfolded in time for three time steps.

It is not possible to display the BRNN structure in a figure similar to
Fig. 1 with the delay line, since the delay would have to be positive and negative in time. Note that without the backward states, this structure simplifies to a regular unidirectional forward RNN, as shown in Fig. 1. If the forward states are taken out, a regular RNN with a reversed time axis results. With both time directions taken care of in the same network, input information in the past and the future of the currently evaluated time frame can directly be used to minimize the objective function, without the need for delays to include future information, as for the regular unidirectional RNN discussed above.

2) Training: The BRNN can principally be trained with the same algorithms as a regular unidirectional RNN, because there are no interactions between the two types of state neurons and the network can therefore be unfolded into a general feed-forward network. However, if, for example, any form of back-propagation through time (BPTT) is used, the forward and backward pass procedure is slightly more complicated, because the update of state and output neurons can no longer be done one at a time. If BPTT is used, the forward and backward passes over the unfolded BRNN over time are done almost in the same way as for a regular MLP. Some special treatment is necessary only at the beginning and the end of the training data: the forward state inputs at t = 1 and the backward state inputs at t = T are not known […]. The weights are initialized with small random values drawn from the uniform distribution, except the output biases, which are set so that the corresponding output gives the prior average of the output data in case of zero input activation.

IEEE Transactions on Signal Processing, vol. 45, no. 11, November 1997.

For the regression experiments, the networks use the tanh activation function and are trained to minimize the mean-squared-error objective function. For type "MERGE," the arithmetic mean of the network outputs of "RNN-FOR" and "RNN-BACK" is taken, which assumes them to be independent, as discussed above for the linear opinion pool. For the classification experiments, the
output layer uses the "softmax" output function [4], so that outputs add up to one and can be interpreted as probabilities. As commonly used for ANNs to be trained as classifiers, the cross-entropy objective function is used as the optimization criterion. Because the outputs are probabilities assumed to be generated by independent events, for type "MERGE," the normalized geometric mean (logarithmic opinion pool) of the network outputs of "RNN-FOR" and "RNN-BACK" is taken.

c) Results: The results for the regression and the classification experiments, averaged over 100 training/evaluation runs, can be seen in Figs. 4 and 5, respectively. For the regression task, the mean squared error is shown, depending on the shift of the output data in positive time direction as seen from the time axis of the network. For the classification task, the recognition rate is shown instead of the mean value of the objective function (which would be the mean cross-entropy), because it is a more familiar measure to characterize results of classification experiments.

Fig. 4. Averaged results (100 runs) for the regression experiment on artificial data over different shifts of the output data with respect to the input data in future direction (viewed from the time axis of the corresponding network) for several structures.

Several interesting properties of RNNs in general can be directly seen from these figures. The minimum (maximum) for the regression (classification) task should be at 20 frames delay for the forward RNN and at 10 frames delay for the backward RNN, because at those points all information for a perfect regression (classification) has been fed into the network. Neither is the case, because the modeling power of the networks, given by the structure and the number of free parameters, is not sufficient for the optimal solution.
Instead, the single-time-direction networks try to make a tradeoff between "remembering" the past input information, which is useful for regression (classification), and "knowledge combining" of currently available input information. This results in an optimal delay of one (two) frame for the forward RNN and five (six) frames for the backward RNN. The optimum delay is larger for the backward RNN because the artificially created correlations in the training data are not symmetrical, with the important information for regression (classification) being twice as dense on the left side as on the right side of each frame. In the case of the backward RNN, the time series is evaluated from right to left, with the denser information coming up later. Because the denser information can be evaluated more easily (fewer parameters are necessary for a contribution to the objective function minimization), the optimal delay is larger for the backward RNN. If the delay is so large that almost no important information can be saved over time, the network converges to the best possible solution based only on prior information. This can be seen for the classification task with the backward RNN, which converges to 59% (the prior of class 0) for more than 15 frames delay.
Another sign of the tradeoff between "remembering" and "knowledge combining" is the variation in the standard deviation of the results, which is only shown for the backward RNN in the classification task. In areas where both mechanisms could be useful (a 3- to 17-frame shift), different local minima of the objective function correspond to a certain amount to either one of these mechanisms, which results in larger fluctuations of the results than in areas where "remembering" is not very useful. The results of the merged networks […] (2 to 10) are, in almost all cases, better than with only one network. This is no surprise, because besides the use of more useful input information, the number of free parameters for the model doubled. For the BRNN, it does not make sense to delay the output data, because the structure is already designed to cope with all available input information on both sides of the currently evaluated time point. Therefore, the experiments for the BRNN are only run for SHIFT = 0. For the regression and classification tasks tested here, the BRNN clearly performs better than the network "MERGE" built out of the single time-direction networks "RNN-FOR" and "RNN-BACK," with a comparable number of total free parameters.

Fig. 5. Averaged results for the classification experiment on artificial data.

2) Experiments with Real Data: The goal of the experiments with real data is to compare different ANN structures
consisting of3696sentences from462speakers and2)the test data set consisting of1344 sentences from168speakers.The TIMIT database provides hand segmentation of each sentence in terms of phonemes and a phonemic label for every segment out of a pool of61 phonemes.This gives142910phoneme segments for training and51681for testing.In our experiments,every sentence is transformed into a vector sequence using three levels of feature extraction. First,features are extracted every frame to represent the raw waveform in a compressed form.Then,with the knowledge of the boundary locations from the corresponding labelfiles, segment features are extracted to map the information from an arbitrary length segment to afixed-dimensional vector.A third transformation is applied to the segment feature vectors to make them suitable as inputs to a neural net.These three steps are briefly described below.1)Frame Feature Extraction:As frame features,12reg-ular MFCC’s(from24mel-space frequency bands)plus the log-energy are extracted every10ms with a25.6-msHamming window and a preemphasis of0.97.This is a commonly used feature extraction procedure for speech signals at the frame level[17].2)Segment Feature Extraction:From the frame fea-tures,the segment features are extracted by dividing the segment in time intofive equally spaced regions and computing the area under the curve in each region, with the function values between the data points linearly interpolated.This is done separately for each of the 13frame features.The duration of the segment is used as an additional segment feature.This results in a66-dimensional segment feature vector.3)Neural Network Preprocessing:Although ANN’s canprincipally handle any form of input distributions,we have found in our experiments that the best results are achieved with Gaussian input distributions,which matches the experiences from[12].To generate an “almost-Gaussian distribution,”the inputs arefirst nor-malized to zero mean and unit variance on a 
sentence basis, and then every feature of a given channel² is quantized using a scalar quantizer having 256 reconstruction levels (1 byte). The scalar quantizer is designed to maximize the entropy of the channel for the whole training data. The maximum-entropy scalar quantizer can be easily designed for each channel by arranging the channel points in ascending order according to their feature values and putting (almost) an equal number of channel points in each quantization cell. For presentation to the network, the byte-coded value is remapped […].

²Here, each vector has a dimensionality of 66. The temporal sequence of each component (or feature) of this vector defines one channel. Thus, we have here 66 channels.

TABLE II. TIMIT phoneme classification results for full training and test data sets with 13,000 parameters.

For some applications, it is necessary to estimate the conditional posterior probability of a complete class sequence. To do so, we decompose the sequence posterior probability into a product of conditional probabilities, which can be written in two ways: as a forward posterior probability, Pr(c_1 … c_T | x_1 … x_T) = ∏_t Pr(c_t | c_1 … c_{t−1}, x_1 … x_T), or as a backward posterior probability, ∏_t Pr(c_t | c_{t+1} … c_T, x_1 … x_T) (these conditionals are the probability terms in the products). The estimates for these probabilities can then be combined by using the formulas above to estimate the full conditional probability of the sequence.

Fig. 6. Modified bidirectional recurrent neural network structure shown here with extensions for the forward posterior probability estimation.

It should be noted
that the forward and the backward posterior probabilities are exactly equal, provided the probability estimator is perfect. However, if neural networks are used as probability estimators, this will rarely be the case, because different architectures or different local minima of the objective function to be minimized correspond to estimators of different performance. It might therefore be useful to combine several estimators to get a better estimate of the quantity of interest using the methods of the previous section. Two candidates that could be merged here are the forward and the backward posterior probability estimates.

B. Modified Bidirectional Recurrent Neural Networks

A slightly modified BRNN structure can efficiently be used to estimate conditional probabilities of this kind, which are conditioned on continuous dimensions of the whole input vector. To make the BRNN suitable for this estimation, two changes are necessary. First, instead of connecting the forward and backward states to the current output states, they are connected to the next and previous output states, respectively, and the inputs are directly connected to the outputs. Second, if in the resulting structure the first […] can be used to make predictions. This is exactly what is required to estimate the forward posterior probability. Fig. 6 illustrates this change of the original BRNN architecture. Cutting the input connections to the forward states instead of the backward states gives the architecture for estimating the backward posterior probability. Theoretically, all discrete and continuous inputs that are necessary to estimate the probability are still accessible for a contribution to the prediction. During training, the bidirectional structure can adapt to the best possible use of the input information, as opposed to structures that do not provide part of the input information because of the limited size of the input windows (e.g., in MLP and TDNN) or one-sided windows (unidirectional RNN).

TABLE III. Classification results for full TIMIT training and test data with 61 (39) symbols.

C. Experiments and
Results1)Experiments:Experiments are performed using the fullTIMIT data set.To include the output(target)class in-formation,the original66-dimensional feature vectors areextended to72dimensions.In thefirst six dimensions,thecorresponding output class is coded in a binary format(binary[0,1]activation function.The forward(backward)modified BRNN has64(32)forward and32(64)backward states.Additionally,64hidden neurons areimplemented before the output layer.This results in a forward(backward)modified BRNN structure with26333weights.These two structures,as well as their combination—mergedas a linear and a logarithmic opinion pool—are evaluated forphoneme classification on the test data.2)Results:The results for the phoneme classification taskare shown in Table III.It can be seen that the combination ofthe forward and backward modified BRNN structures resultsin much better performance than the individual structures.Thisshows that the two structures,even though they are trained onthe same training data set to compute the sameprobabilitySCHUSTER AND PALIWAL:BIDIRECTIONAL RECURRENT NEURAL NETWORKS2681sequence and that it does not provide a class sequence with the highest probability.For this,all possible class sequences have to be searched to get the most probable class sequence (which is a procedure that has to be followed if one is interested in a problem like continuous speech recognition). 
In the experiments reported in this section, we have used the class sequence provided by the TIMIT database. Therefore, the context on the (right or left) output side is known and is correct.

IV. DISCUSSION AND CONCLUSION

In the first part of this paper, a simple extension to a regular recurrent neural network structure has been presented, which makes it possible to train the network in both time directions simultaneously. Because the network concentrates on minimizing the objective function for both time directions simultaneously, there is no need to worry about how to merge the outputs of two separate networks. There is also no need to search for an "optimal delay" to minimize the objective function in a given data/network structure combination, because all future and past information around the currently evaluated time point is theoretically available and does not depend on a predefined delay parameter. Through a series of extensive experiments, it has been shown that the BRNN structure leads to better results than the other ANN structures. In all these comparisons, the number of free parameters has been kept approximately the same, so the training time for the BRNN is about the same as for the other RNNs. Since the search for an optimal delay (an additional search parameter during development) is not necessary, BRNNs can provide, in comparison to the other RNNs investigated in this paper, faster development of real applications with better results.

In the second part of this paper, we have shown how to use slightly modified bidirectional recurrent neural networks to estimate the conditional probability of symbol sequences without making any explicit assumption about the shape of the output probability distribution. It should be noted that the modified BRNN structure is only a tool to estimate the conditional probability of a given class sequence; it does not provide the class sequence with the highest probability. For this, all possible class sequences have to be searched to find the most probable one. We are currently working on designing an efficient search engine that will use only ANNs to find the most probable class sequence.

Mike Schuster received the M.Sc. degree in electronic engineering in 1993 from the Gerhard Mercator University, Duisburg, Germany, and is currently working toward the Ph.D. degree at the Nara Institute of Technology, Nara, Japan. After doing research in fiber optics at the University of Tokyo, Tokyo, Japan, and in gesture recognition in Duisburg, he joined Advanced Telecommunication Research (ATR), Kyoto, Japan, to work on speech recognition. His research interests include neural networks and stochastic modeling in general, Bayesian approaches, information theory, and coding.

Kuldip K. Paliwal (M'89) is a Professor and Chair of Communication/Information Engineering at Griffith University, Brisbane, Australia. He has worked at a number of organizations, including the Tata Institute of Fundamental Research, Bombay, India; the Norwegian Institute of Technology, Trondheim, Norway; the University of Keele, U.K.; AT&T Bell Laboratories, Murray Hill, NJ; and Advanced Telecommunication Research (ATR) Laboratories, Kyoto, Japan. He has co-edited two books: Speech Coding and Synthesis (New York: Elsevier, 1995) and Speech and Speaker Recognition: Advanced Topics (Boston, MA: Kluwer, 1996). His current research interests include speech processing, image coding, and neural networks. Dr. Paliwal received the 1995 IEEE Signal Processing Society Senior Award and is an Associate Editor of the IEEE Transactions on Speech and Audio Processing.

Improving Machine Translation Accuracy with Artificial Intelligence Algorithms


The rapid development and broad adoption of artificial intelligence have driven substantial progress in machine translation.

In the past, traditional machine translation methods often suffered from inaccuracy and semantic errors.

With the continuous improvement and optimization of AI algorithms, however, the accuracy of machine translation has improved markedly.

This article discusses the application of AI algorithms in machine translation and their impact on translation accuracy.

1. Neural Machine Translation (NMT). Neural machine translation is a machine translation approach based on deep learning.

Compared with traditional statistical machine translation (SMT), NMT uses a neural network model to map an input source-language sentence to a target-language sentence; through training, the network learns to translate accurately from the source language into the target language.

NMT's main advantages are that it can handle more complex linguistic structures and that it performs better when translating long sentences.

Compared with traditional SMT, NMT captures more semantic information and effectively resolves several problems that were difficult for traditional methods.

2. Attention. The attention mechanism is a technique widely used in NMT.

It assigns a weight to each word of the input source sentence, allowing the model to focus on the source words most relevant to the target word currently being translated.

This mechanism eases the difficulty of translating long sentences and improves translation accuracy.

By introducing attention, an NMT model can better capture the correspondence between the source and target languages, improving translation accuracy.

During decoding, the attention mechanism dynamically adjusts these weights according to the relevance between source and target words, making the translation more accurate.
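The source-word weighting described above can be sketched minimally as dot-product attention; this is an illustrative example, not code from the article, and all function and variable names are assumptions:

```python
import numpy as np

def attention_weights(decoder_state, encoder_states):
    """Score each source position against the current decoder state and
    softmax-normalize, so the model focuses on the most relevant words."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def context_vector(decoder_state, encoder_states):
    """Attention context: the weighted sum of encoder states used when
    predicting the next target word."""
    return attention_weights(decoder_state, encoder_states) @ encoder_states
```

In a real NMT decoder this context vector is concatenated with (or added to) the decoder state before predicting the next token.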

3. Pretrained models. Pretraining is a method widely applied in machine translation.

A model is first trained on a large corpus to obtain general-purpose language representations, and is then fine-tuned on the specific task, improving translation accuracy.

With a pretrained model, a machine translation system can better understand the semantics and context of the source sentence, improving translation quality.

The language representations learned during pretraining help the translation model better recognize the relationship between semantics and context, improving translation accuracy.

4. Incremental training and adversarial training. Incremental training and adversarial training are two training methods commonly used to improve machine translation accuracy.

In incremental training, the model continually receives new training data and keeps adjusting and updating its parameters to adapt to new languages and translation tasks.

NLP Text Similarity Computation


Text similarity computation in natural language processing (NLP) is a comparison task over textual content that aims to measure the degree of similarity between two or more texts.

Text similarity computation is widely used in many areas, including information retrieval, question-answering systems, and machine translation.

This article introduces some common text-similarity methods and related reference material.

1. Bag-of-words-based methods:
- Term-frequency method: convert each text into a term-frequency vector, then measure similarity as the cosine similarity between the vectors.
- TF-IDF method: builds on term frequencies but also weights each term by its importance, using TF-IDF values to compute text similarity.
- BM25 method: an improved TF-IDF scheme that models how term frequency and document length affect term importance; commonly used for text-similarity computation in information retrieval.
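The term-frequency and TF-IDF variants above can be sketched as follows; a minimal illustration with assumed helper names, using the standard raw log-IDF weighting without smoothing:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors (as sparse dicts) for a list of
    tokenized documents: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents sharing weighted terms score higher; a term appearing in every document gets IDF log(N/N) = 0 and contributes nothing, which is exactly the intended down-weighting of uninformative words.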

2. Word-vector-based methods:
- Word2Vec: map each word of a text into a fixed-length vector space, then measure text similarity by the similarity between the resulting vectors.
- Doc2Vec: map the entire text into a fixed-length vector space, then measure text similarity by the similarity between the document vectors.
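A common lightweight variant of the word-vector approach is to average a document's word vectors and compare the averages; a hedged sketch in which toy 2-D embeddings stand in for trained Word2Vec vectors:

```python
import numpy as np

def doc_vector(tokens, embeddings):
    """Average the word vectors of a document (a simple Word2Vec-style
    document representation; out-of-vocabulary tokens are skipped)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def cos_sim(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Averaging discards word order, but it is a strong and cheap baseline before reaching for Doc2Vec-style learned document vectors.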

3. Semantic-model-based methods:
- LSA (Latent Semantic Analysis): uses matrix factorization to extract latent semantic information from texts, then computes the similarity between them.
- LDA (Latent Dirichlet Allocation): a topic-model-based method that represents each text as a distribution over topics and measures similarity between these topic distributions.
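The LSA step can be sketched with a truncated SVD of a toy term-document matrix; a minimal illustration assuming raw counts rather than TF-IDF weighting, with illustrative function names:

```python
import numpy as np

def lsa_doc_vectors(term_doc, k):
    """Latent Semantic Analysis: factor the term-document matrix with a
    truncated SVD and return k-dimensional latent document vectors."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (s[:k, None] * vt[:k]).T      # row i = document i in latent space

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Documents are then compared by cosine similarity in the k-dimensional latent space, where synonymous terms that co-occur across documents end up along similar directions.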

Some related reference material on text-similarity methods and applications (no links):
1. "Introduction to Information Retrieval" (Christopher D. Manning et al.): covers the fundamentals of information retrieval, including the bag-of-words model and TF-IDF.
2. "Natural Language Processing in Action" (Hobson Lane et al.): a detailed treatment of natural-language-processing tasks and methods, including text-similarity computation and word vectors.

Convolutional Neural Networks for Sentence Classification

Recursive Neural Tensor Networks (RNTN)
Figure 2: Socher et al., "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank", EMNLP 2013
Figure 1: Skipgram architecture of Mikolov et al. (2013)
Word Embeddings
Linguistic regularities in the obtained embeddings
The learned embeddings encode semantic and syntactic regularities:
w_big − w_bigger ≈ w_slow − w_slower
w_france − w_paris ≈ w_korea − w_seoul
These are cool, but not necessarily unique to neural language models. “ [...] the neural embedding process is not discovering novel patterns, but rather is doing a remarkable job at preserving the patterns inherent in the word-context co-occurrence matrix.” Levy and Goldberg, “Linguistic Regularities in Sparse and Explicit Representations”, CoNLL 2014
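These analogy regularities can be probed with simple vector arithmetic; a hedged sketch using toy 2-D vectors in place of real trained embeddings, implementing the standard nearest-neighbor analogy query (3CosAdd):

```python
import numpy as np

def analogy(a, b, c, embeddings):
    """Solve a : b :: c : ? by finding the word whose vector is closest
    (by cosine) to b - a + c, excluding the three query words."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

Excluding the query words matters in practice: without it, one of the inputs (usually c or b) is often the nearest neighbor of the target vector.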

Big Data Theory Exam (Question Set 12)


Note: answers and explanations appear at the end of the paper. Part 1: single-choice questions, 64 in total; each question has exactly one correct answer, and selecting more or fewer options scores no marks.

1. [Single choice] ( ) tries to learn a function that makes predictions from a linear combination of attributes.
A) Decision tree  B) Bayes classifier  C) Neural network  D) Linear model

2. [Single choice] The set of all possible outcomes of a random experiment is called ( ).
A) Elementary event  B) Sample  C) All events  D) Sample space

3. [Single choice] In a DWS instance, which of the following is not deployed in an active/standby configuration?
A) CMS  B) GTM  C) OMS  D) coordinator

4. [Single choice] A data scientist may use multiple algorithms (models) for prediction at the same time and finally integrate their results for the final prediction (ensemble learning). Which of the following statements about ensemble learning is correct?
A) The individual models are highly correlated  B) The individual models have low correlation  C) In ensemble learning, "weighted averaging" works better than "voting"  D) All individual models use the same algorithm

5. [Single choice] Which of the following algorithms is a local (neighborhood) operation?
A) Linear gray-level transformation  B) Binarization  C) Fourier transform  D) Median filtering

6. [Single choice] Word2Vec is often used for Chinese synonym substitution. Which of the following statements is incorrect?
A) Word2Vec is based on probability statistics  B) Word2Vec results fit the current corpus environment  C) The words Word2Vec finds are all semantic synonyms  D) Word2Vec is limited by the size and quality of the training corpus

7. [Single choice] A mother recorded her son's height from ages 3 to 9 and fitted the regression line y = 7.19x + 73.93 of height on age. Using it to predict the child's height at age 10, which statement is correct?
A) The height will certainly be 145.83 cm  B) The height will certainly exceed 146.00 cm  C) The height will certainly be above 145.00 cm  D) The height will be around 145.83 cm

8. [Single choice] Regarding the development characteristics of data warehouses, which description is incorrect?
A) Data warehouse development starts from the data;  B) The usage requirements of a data warehouse must be fully specified before development begins;  C) Data warehouse development is a continuous, iterative, heuristic process;  D) In a data warehouse environment there is no fixed, well-defined processing flow as in the operational environment; data analysis and processing are more flexible and follow no fixed pattern

9. [Single choice] Because different categories of keywords contribute differently to ranking, retrieval algorithms generally divide query keywords into several categories. Which of the following does not belong to these keyword categories? ( )

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation (ICCV 2015)


Weakly-and Semi-Supervised Learning of a Deep Convolutional Network forSemantic Image SegmentationGeorge Papandreou∗Google,Inc. gpapan@ Liang-Chieh Chen∗UCLAlcchen@Kevin P.MurphyGoogle,Inc.kpmurphy@Alan L.YuilleUCLAyuille@AbstractDeep convolutional neural networks(DCNNs)trained on a large number of images with strong pixel-level anno-tations have recently significantly pushed the state-of-art in semantic image segmentation.We study the more challeng-ing problem of learning DCNNs for semantic image seg-mentation from either(1)weakly annotated training data such as bounding boxes or image-level labels or(2)a com-bination of few strongly labeled and many weakly labeled images,sourced from one or multiple datasets.We develop Expectation-Maximization(EM)methods for semantic im-age segmentation model training under these weakly super-vised and semi-supervised settings.Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC2012image segmentation benchmark,while requiring significantly less annotation effort.We share source code implementing the proposed system at https: ///deeplab/deeplab-public.1.IntroductionSemantic image segmentation refers to the problem of assigning a semantic label(such as“person”,“car”or “dog”)to every pixel in the image.Various approaches have been tried over the years,but according to the results on the challenging Pascal VOC2012segmentation benchmark,the best performing methods all use some kind of Deep Convo-lutional Neural Network(DCNN)[2,5,8,14,25,27,41].In this paper,we work with the DeepLab-CRF approach of[5,41].This combines a DCNN with a fully connected Conditional Random Field(CRF)[19],in order to get high resolution segmentations.This model achieves state-of-art results on the challenging PASCAL VOC segmentation benchmark[13],delivering a mean intersection-over-union (IOU)score exceeding70%.A key bottleneck in building this class of 
DCNN-based∗Thefirst two authors contributed equally to this work.segmentation models is that they typically require pixel-level annotated images during training.Acquiring such data is an expensive,time-consuming annotation effort.Weak annotations,in the form of bounding boxes(i.e.,coarse object locations)or image-level labels(i.e.,information about which object classes are present)are far easier to collect than detailed pixel-level annotations.We develop new methods for training DCNN image segmentation mod-els from weak annotations,either alone or in combination with a small number of strong annotations.Extensive ex-periments,in which we achieve performance up to69.0%, demonstrate the effectiveness of the proposed techniques.According to[24],collecting bounding boxes around each class instance in the image is about15times faster/cheaper than labeling images at the pixel level.We demonstrate that it is possible to learn a DeepLab-CRF model delivering62.2%IOU on the PASCAL VOC2012 test set by training it on a simple foreground/background segmentation of the bounding box annotations.An even cheaper form of data to collect is image-level labels,which specify the presence or absence of se-mantic classes,but not the object locations.Most exist-ing approaches for training semantic segmentation models from this kind of very weak labels use multiple instance learning(MIL)techniques.However,even recent weakly-supervised methods such as[25]deliver significantly infe-rior results compared to their fully-supervised counterparts, only achieving25.7%.Including additional trainable ob-jectness[7]or segmentation[1]modules that largely in-crease the system complexity,[31]has improved perfor-mance to40.6%,which still significantly lags performance of fully-supervised systems.We develop novel online Expectation-Maximization (EM)methods for training DCNN semantic segmentation models from weakly annotated data.The proposed algo-rithms alternate between estimating the latent pixel labels 
(subject to the weak annotation constraints),and optimiz-ing the DCNN parameters using stochastic gradient descent (SGD).When we only have access to image-level anno-tated training data,we achieve39.6%,close to[31]butwithout relying on any external objectness or segmenta-tion module.More importantly,our EM approach also excels in the semi-supervised scenario which is very im-portant in practice.Having access to a small number of strongly (pixel-level)annotated images and a large number of weakly (bounding box or image-level)annotated images,the proposed algorithm can almost match the performance of the fully-supervised system.For example,having access to 2.9k pixel-level images and 9k image-level annotated im-ages yields 68.5%,only 2%inferior the performance of the system trained with all 12k images strongly annotated at the pixel level.Finally,we show that using additional weak or strong annotations from the MS-COCO dataset can further improve results,yielding 73.9%on the PASCAL VOC 2012benchmark.Contributions In summary,our main contributions are:1.We present EM algorithms for training with image-level or bounding box annotation,applicable to both the weakly-supervised and semi-supervised settings.2.We show that our approach achieves excellent per-formance when combining a small number of pixel-level annotated images with a large number of image-level or bounding box annotated images,nearly match-ing the results achieved when all training images have pixel-level annotations.3.We show that combining weak or strong annotations across datasets yields further improvements.In partic-ular,we reach 73.9%IOU performance on PASCAL VOC 2012by combining annotations from the PAS-CAL and MS-COCO datasets.2.Related workTraining segmentation models with only image-level labels has been a challenging problem in the literature [12,36,37,39].Our work is most related to other re-cent DCNN models such as [30,31],who also study the weakly supervised setting.They both develop 
MIL-based algorithms for the problem.In contrast,our model em-ploys an EM algorithm,which similarly to [26]takes into account the weak labels when inferring the latent image seg-mentations.Moreover,[31]proposed to smooth the predic-tion results by region proposal algorithms,e.g .,CPMC [3]and MCG [1],learned on pixel-segmented images.Neither [30,31]cover the semi-supervised setting.Bounding box annotations have been utilized for seman-tic segmentation by [38,42],while [15,21,40]describe schemes exploiting both image-level labels and bounding box annotations.[4]attained human-level accuracy for car segmentation by using 3D bounding boxes.Bounding box annotations are also commonly used in interactive segmen-tation [22,33];we show that such foreground/backgroundPixel annotationsImage Deep Convolutional Neural NetworkLossFigure 1.DeepLab model training from fully annotated images.segmentation methods can effectively estimate object seg-ments accurate enough for training a DCNN semantic seg-mentation system.Working in a setting very similar to ours,[9]employed MCG [1](which requires training from pixel-level annotations)to infer object masks from bounding box labels during DCNN training.3.Proposed MethodsWe build on the DeepLab model for semantic image seg-mentation proposed in [5].This uses a DCNN to predict the label distribution per pixel,followed by a fully-connected (dense)CRF [19]to smooth the predictions while preserv-ing image edges.In this paper,we focus for simplicity on methods for training the DCNN parameters from weak la-bels,only using the CRF at test time.Additional gains can be obtained by integrated end-to-end training of the DCNN and CRF parameters [41,6].Notation We denote by x the image values and y the seg-mentation map.In particular,y m ∈{0,...,L }is the pixel label at position m ∈{1,...,M },assuming that we have the background as well as L possible foreground labels and M is the number of pixels.Note that these pixel-level la-bels may not be 
visible in the training set.We encode the set of image-level labels by z ,with z l =1,if the l -th label is present anywhere in the image,i.e .,if m [y m =l ]>0.3.1.Pixel-level annotationsIn the fully supervised case illustrated in Fig.1,the ob-jective function isJ (θ)=log P (y |x ;θ)=Mm =1log P (y m |x ;θ),(1)where θis the vector of DCNN parameters.The per-pixellabel distributions are computed byP (y m |x ;θ)∝exp(f m (y m |x ;θ)),(2)where f m (y m |x ;θ)is the output of the DCNN at pixel m .We optimize J (θ)by mini-batch SGD.3.2.Image-level annotationsWhen only image-level annotation is available,we can observe the image values x and the image-level labels z ,but the pixel-level segmentations y are latent variables.WeAlgorithm 1Weakly-Supervised EM (fixed bias version)Input:Initial CNN parameters θ′,potential parameters b l ,l ∈{0,...,L },image x ,image-level label set z .E-Step:For each image position m1:ˆf m (l )=f m (l |x ;θ′)+b l ,if z l =12:ˆf m (l )=f m (l |x ;θ′),if z l =03:ˆy m =argmax l ˆf m (l )M-Step:4:Q (θ;θ′)=log P (ˆy |x ,θ)= M m =1log P (ˆy m |x ,θ)5:Compute ∇θQ (θ;θ′)and use SGD to update θ′.have the following probabilistic graphical model:P (x ,y ,z ;θ)=P (x )Mm =1P (y m |x ;θ)P (z |y ).(3)We pursue an EM-approach in order to learn the model parameters θfrom training data.If we ignore terms that do not depend on θ,the expected complete-data log-likelihood given the previous parameter estimate θ′isQ (θ;θ′)= yP (y |x ,z ;θ′)log P (y |x ;θ)≈log P (ˆy |x ;θ),(4)where we adopt a hard-EM approximation,estimating in the E-step of the algorithm the latent segmentation by ˆy =argmax yP (y |x ;θ′)P (z |y )(5)=argmax ylog P (y |x ;θ′)+log P (z |y )(6)=argmaxyMm =1f m (y m |x ;θ′)+log P (z |y ) .(7)In the M-step of the algorithm,we optimize Q (θ;θ′)≈log P (ˆy |x ;θ)by mini-batch SGD similarly to (1),treatingˆyas ground truth segmentation.To completely identify the E-step (7),we need to specifythe observation model P (z |y ).We have experimented withtwo 
variants,EM-Fixed and EM-Adapt .EM-Fixed In this variant,we assume that log P (z |y )fac-torizes over pixel positions aslog P (z |y )=Mm =1φ(y m ,z )+(const),(8)allowing us to estimate the E-step segmentation at eachpixel separatelyˆy m =argmaxy mˆf m (y m ).=f m (y m |x ;θ′)+φ(y m ,z ).(9)ImageFigure 2.DeepLab model training using image-level labels.We assume thatφ(y m =l,z )=b l if z l =10if z l =0(10)We set the parameters b l =b fg ,if l >0and b 0=b bg ,with b fg >b bg >0.Intuitively,this potential encourages a pixel to be assigned to one of the image-level labels z .We choose b fg >b bg ,boosting present foreground classes more than the background,to encourage full object coverage andavoid a degenerate solution of all pixels being assigned to background.The procedure is summarized in Algorithm 1and illustrated in Fig.2.EM-Adapt In this method,we assume that log P (z |y )=φ(y ,z )+(const),where φ(y ,z )takes the form of a cardi-nality potential [23,32,35].In particular,we encourage atleast a ρl portion of the image area to be assigned to classl ,if z l =1,and enforce that no pixel is assigned to classl ,if z l =0.We set the parameters ρl =ρfg ,if l >0andρ0=ρbg .Similar constraints appear in [10,20].In practice,we employ a variant of Algorithm 1.Weadaptively set the image-and class-dependent biases b l so as the prescribed proportion of the image area is assigned to the background or foreground object classes.This acts as a powerful constraint that explicitly prevents the background score from prevailing in the whole image,also promoting higher foreground object coverage.The detailed algorithm is described in the supplementary material.EM It is instructive to compare our EM-based approach with two recent Multiple Instance Learning (MIL)methods for learning semantic image segmentation models [30,31].The method in [30]defines an MIL classification objective based on the per-class spatial maximum of the lo-cal label distributions of (2),ˆP (l |x ;θ).=max m P (y m =l 
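The EM-Fixed E-step described above (Algorithm 1) adds a fixed bias b_l to every class present in the image-level label set and then takes a per-pixel argmax. A minimal sketch, assuming the DCNN scores are available as a NumPy array; the array layout and function name are assumptions, while the default biases b_fg = 5 and b_bg = 3 are the values used in the paper's experiments:

```python
import numpy as np

def em_fixed_e_step(scores, present_classes, b_fg=5.0, b_bg=3.0):
    """E-step of EM-Fixed: boost present classes (b_fg for foreground,
    b_bg for background), leave absent classes unchanged, then take the
    per-pixel argmax to get the hard latent segmentation y_hat.

    scores: (M, L+1) array of DCNN outputs f_m(l|x); column 0 = background.
    present_classes: foreground class indices l with z_l = 1.
    """
    boosted = scores.copy()
    boosted[:, 0] += b_bg                  # background treated as always present
    for l in present_classes:
        boosted[:, l] += b_fg              # boost present foreground classes more
    return boosted.argmax(axis=1)
```

In the M-step, the returned hard labels y_hat are treated as ground truth for a mini-batch SGD update of the DCNN parameters, exactly as in fully supervised training. Choosing b_fg > b_bg encourages foreground coverage and avoids the degenerate all-background solution.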
|x ;θ),and [31]adopts a softmax function.While this approach has worked well for image classification tasks [28,29],it is less suited for segmentation as it does not pro-mote full object coverage:The DCNN becomes tuned to focus on the most distinctive object parts (e.g .,human face)instead of capturing the whole object (e.g .,human body).ImageBbox annotationsDeep ConvolutionalNeural NetworkDenseCRFargmaxLossFigure3.DeepLab model training from bounding boxes.3.3.Bounding Box AnnotationsWe explore three alternative methods for training our segmentation model from labeled bounding boxes.Thefirst Bbox-Rect method amounts to simply consider-ing each pixel within the bounding box as positive example for the respective object class.Ambiguities are resolved by assigning pixels that belong to multiple bounding boxes to the one that has the smallest area.The bounding boxes fully surround objects but also contain background pixels that contaminate the training set with false positive examples for the respective object classes.Tofilter out these background pixels,we have also explored a second Bbox-Seg method in which we per-form automatic foreground/background segmentation.To perform this segmentation,we use the same CRF as in DeepLab.More specifically,we constrain the center area of the bounding box(α%of pixels within the box)to be fore-ground,while we constrain pixels outside the bounding box to be background.We implement this by appropriately set-ting the unary terms of the CRF.We then infer the labels for pixels in between.We cross-validate the CRF parameters to maximize segmentation accuracy in a small held-out set of fully-annotated images.This approach is similar to the grabcut method of[33].Examples of estimated segmenta-tions with the two methods are shown in Fig.4.The two methods above,illustrated in Fig.3,estimate segmentation maps from the bounding box annotation as a pre-processing step,then employ the training procedure of Sec.3.1,treating these estimated labels 
as ground-truth.Our third Bbox-EM-Fixed method is an EM algorithm that allows us to refine the estimated segmentation maps throughout training.The method is a variant of the EM-Fixed algorithm in Sec.3.2,in which we boost the present foreground object scores only within the bounding box area.3.4.Mixed strong and weak annotationsIn practice,we often have access to a large number of weakly image-level annotated images and can only afford to procure detailed pixel-level annotations for a small fraction of these images.We handlethishybrid training scenario byImage with Bbox Ground-Truth Bbox-Rect Bbox-SegFigure4.Estimatedsegmentation frombounding box annotation.+Pixel AnnotationsFG/BGBiasargmax1. Car2. Person3. HorseDeep ConvolutionalNeural Network LossDeep ConvolutionalNeural NetworkLossScore mapsFigure5.DeepLab model training on a union of full(strong labels)and image-level(weak labels)annotations.combining the methods presented in the previous sections,as illustrated in Figure5.In SGD training of our deep CNNmodels,we bundle to each mini-batch afixed proportionof strongly/weakly annotated images,and employ our EMalgorithm in estimating at each iteration the latent semanticsegmentations for the weakly annotated images.4.Experimental Evaluation4.1.Experimental ProtocolDatasets The proposed training methods are evaluatedon the PASCAL VOC2012segmentation benchmark[13],consisting of20foreground object classes and one back-ground class.The segmentation part of the original PAS-CAL VOC2012dataset contains1464(train),1449(val),and1456(test)images for training,validation,and test,re-spectively.We also use the extra annotations provided by[16],resulting in augmented sets of10,582(train aug)and12,031(trainval aug)images.We have also experimentedwith the large MS-COCO2014dataset[24],which con-tains123,287images in its trainval set.The MS-COCO2014dataset has80foreground object classes and one back-ground class and is also annotated at the pixel level.The performance is measured 
in terms of pixelintersection-over-union(IOU)averaged across the21classes.Wefirst evaluate our proposed methods on the PAS-CAL VOC2012val set.We then report our results on the official PASCAL VOC2012benchmark test set(whose an-notations are not released).We also compare our test set results with other competing methods.Reproducibility We have implemented the proposed methods by extending the excellent Caffe framework[18]. We share our source code,configurationfiles,and trained models that allow reproducing the results in this paper at a companion web site https:/// deeplab/deeplab-public.Weak annotations In order to simulate the situations where only weak annotations are available and to have fair comparisons(e.g.,use the same images for all settings),we generate the weak annotations from the pixel-level annota-tions.The image-level labels are easily generated by sum-marizing the pixel-level annotations,while the bounding box annotations are produced by drawing rectangles tightly containing each object instance(PASCAL VOC2012also provides instance-level annotations)in the dataset. Network architectures We have experimented with the two DCNN architectures of[5],with parameters initialized from the VGG-16ImageNet[11]pretrained model of[34]. 
They differ in the receptivefield of view(FOV)size.We have found that large FOV(224×224)performs best when at least some training images are annotated at the pixel level, whereas small FOV(128×128)performs better when only image-level annotations are available.In the main paper we report the results of the best architecture for each setup and defer the full comparison between the two FOVs to the supplementary material.Training We employ our proposed training methods to learn the DCNN component of the DeepLab-CRF model of [5].For SGD,we use a mini-batch of20-30images and ini-tial learning rate of0.001(0.01for thefinal classifier layer), multiplying the learning rate by0.1after afixed number of iterations.We use momentum of0.9and a weight decay of 0.0005.Fine-tuning our network on PASCAL VOC2012 takes about12hours on a NVIDIA Tesla K40GPU.Similarly to[5],we decouple the DCNN and Dense CRF training stages and learn the CRF parameters by cross val-idation to maximize IOU segmentation accuracy in a held-out set of100Pascal val fully-annotated images.We use10 mean-field iterations for Dense CRF inference[19].Note that the IOU scores are typically3-5%worse if we don’t use the CRF for post-processing of the results.4.2.Pixel-level annotationsWe havefirst reproduced the results of[5].Training the DeepLab-CRF model with strong pixel-level annota-tions on PASCAL VOC2012,we achieve a mean IOU scoreMethod#Strong#Weak val IOUEM-Fixed(Weak)-10,58220.8EM-Adapt(Weak)-10,58238.2EM-Fixed(Semi)20010,38247.650010,08256.97509,83259.81,0009,58262.01,4645,00063.21,4649,11864.6Strong1,464-62.510,582-67.6Table1.VOC2012val performance for varying number of pixel-level(strong)and image-level(weak)annotations(Sec.4.3).Method#Strong#Weak test IOUMIL-FCN[30]-10k25.7MIL-sppxl[31]-760k35.8MIL-obj[31]BING760k37.0MIL-seg[31]MCG760k40.6EM-Adapt(Weak)-12k39.6EM-Fixed(Semi)1.4k10k66.22.9k9k68.5Strong[5]12k-70.3Table2.VOC2012test performance for varying number of pixel-level(strong)and 
… image-level (weak) annotations (Sec. 4.3). … of 67.6% on val and 70.3% on test; see method DeepLab-CRF-LargeFOV in [5, Table 1].

4.3. Image-level annotations

Validation results  We evaluate our proposed methods in training the DeepLab-CRF model using image-level weak annotations from the 10,582 PASCAL VOC 2012 train_aug set, generated as described in Sec. 4.1 above. We report the val performance of our two weakly-supervised EM variants described in Sec. 3.2. In the EM-Fixed variant we use b_fg = 5 and b_bg = 3 as fixed foreground and background biases. We found the results to be quite sensitive to the difference b_fg - b_bg but not very sensitive to their absolute values. In the adaptive EM-Adapt variant we constrain at least rho_bg = 40% of the image area to be assigned to background and at least rho_fg = 20% of the image area to be assigned to foreground (as specified by the weak label set).

We also examine using weak image-level annotations in addition to a varying number of pixel-level annotations, within the semi-supervised learning scheme of Sec. 3.4. In this Semi setting we employ strong annotations of a subset of the PASCAL VOC 2012 train set and use the weak image-level labels from another non-overlapping subset of the train_aug set. We perform segmentation inference for the images that only have image-level labels by means of EM-Fixed, which we have found to perform better than EM-Adapt in the semi-supervised training setting.

The results are summarized in Table 1. We see that the EM-Adapt algorithm works much better than the EM-Fixed algorithm when we only have access to image-level annotations, 20.8% vs. 38.2% validation IOU. Using 1,464 pixel-level and 9,118 image-level annotations in the EM-Fixed semi-supervised setting significantly improves performance, yielding 64.6%. Note that image-level annotations are helpful, as training only with the 1,464 pixel-level annotations only yields 62.5%.

Test results  In Table 2 we report our test results. We compare the proposed methods with the recent MIL-based approaches of [30, 31], which also report results obtained with image-level annotations on the VOC benchmark. Our EM-Adapt method yields 39.6%, which improves over MIL-FCN [30] by a large 13.9% margin. As [31] shows, MIL can become more competitive if additional segmentation information is introduced: using low-level superpixels, MIL-sppxl [31] yields 35.8% and is still inferior to our EM algorithm. Only if augmented with BING [7] or MCG [1] can MIL obtain results comparable to ours (MIL-obj: 37.0%, MIL-seg: 40.6%) [31]. Note, however, that both BING and MCG have been trained with bounding box or pixel-annotated data on the PASCAL train set, and thus both MIL-obj and MIL-seg indirectly rely on bounding box or pixel-level PASCAL annotations.

The more interesting finding of this experiment is that including very few strongly annotated images in the semi-supervised setting significantly improves the performance compared to the pure weakly-supervised baseline. For example, using 2.9k pixel-level annotations along with 9k image-level annotations in the semi-supervised setting yields 68.5%. We would like to highlight that this result surpasses all techniques which are not based on the DCNN+CRF pipeline of [5] (see Table 6), even if trained with all available pixel-level annotations.

4.4. Bounding box annotations

Validation results  In this experiment, we train the DeepLab-CRF model using bounding box annotations from the train_aug set. We estimate the training set segmentations in a pre-processing step using the Bbox-Rect and Bbox-Seg methods described in Sec. 3.3. We assume that we also have access to 100 fully-annotated PASCAL VOC 2012 val images which we have used to cross-validate the value of the single Bbox-Seg parameter alpha (percentage of the center bounding box area constrained to be foreground). We varied alpha from 20% to 80%, finding that alpha = 20% maximizes accuracy in terms of IOU in recovering the ground truth foreground from the bounding box. We also examine the effect of combining these weak bounding box annotations with strong pixel-level annotations, using the semi-supervised learning methods of Sec. 3.4.

The results are summarized in Table 3. When using only bounding box annotations, we see that Bbox-Seg improves over Bbox-Rect by 8.1%, and gets within 7.0% of the strong pixel-level annotation result. We observe that combining 1,464 strong pixel-level annotations with weak bounding box annotations yields 65.1%, only 2.5% worse than the strong pixel-level annotation result. In the semi-supervised learning settings and 1,464 strong annotations, Semi-Bbox-EM-Fixed and Semi-Bbox-Seg perform similarly.

Method                #Strong  #Box    val IOU
Bbox-Rect (Weak)      -        10,582  52.5
Bbox-EM-Fixed (Weak)  -        10,582  54.1
Bbox-Seg (Weak)       -        10,582  60.6
Bbox-Rect (Semi)      1,464    9,118   62.1
Bbox-EM-Fixed (Semi)  1,464    9,118   64.8
Bbox-Seg (Semi)       1,464    9,118   65.1
Strong                1,464    -       62.5
Strong                10,582   -       67.6

Table 3. VOC 2012 val performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

Test results  In Table 4 we report our test results. We compare the proposed methods with the very recent BoxSup approach of [9], which also uses bounding box annotations on the VOC 2012 segmentation task. Comparing our alternative Bbox-Rect (54.2%) and Bbox-Seg (62.2%) methods, we see that simple foreground-background segmentation provides much better segmentation masks for DCNN training than using the raw bounding boxes. BoxSup does 2.4% better, however it employs the MCG segmentation proposal mechanism [1], which has been trained with pixel-annotated data on the PASCAL train set; it thus indirectly relies on pixel-level annotations.

Method                #Strong      #Box  test IOU
BoxSup [9]            MCG          10k   64.6
BoxSup [9]            1.4k (+MCG)  9k    66.2
Bbox-Rect (Weak)      -            12k   54.2
Bbox-Seg (Weak)       -            12k   62.2
Bbox-Seg (Semi)       1.4k         10k   66.6
Bbox-EM-Fixed (Semi)  1.4k         10k   66.6
Bbox-Seg (Semi)       2.9k         9k    68.0
Bbox-EM-Fixed (Semi)  2.9k         9k    69.0
Strong [5]            12k          -     70.3

Table 4. VOC 2012 test performance for varying number of pixel-level (strong) and bounding box (weak) annotations (Sec. 4.4).

When we also have access to pixel-level annotated images, our performance improves to 66.6% (1.4k strong annotations) or 69.0% (2.9k strong annotations). In this semi-supervised setting we outperform BoxSup (66.6% vs. 66.2% with 1.4k strong annotations), although we do not use MCG. Interestingly, Bbox-EM-Fixed improves over Bbox-Seg as we add more strong annotations, and it performs 1.0% better (69.0% vs. 68.0%) with 2.9k strong annotations. This shows that the E-step of our EM algorithm can estimate the object masks better than the foreground-background segmentation pre-processing step when enough pixel-level annotated images are available.

Comparing with Sec. 4.3, note that 2.9k strong + 9k image-level annotations yield 68.5% (Table 2), while 2.9k strong + 9k bounding box annotations yield 69.0% (Table 3). This finding suggests that bounding box annotations add little value over image-level annotations when a sufficient number of pixel-level annotations is also available.

4.5. Exploiting Annotations Across Datasets

Validation results  We present experiments leveraging the 81-label MS-COCO dataset as an additional source of data in learning the DeepLab model for the 21-label PASCAL VOC 2012 segmentation task. We consider three scenarios:

• Cross-Pretrain (Strong): Pre-train DeepLab on MS-COCO, then replace the top-level network weights and fine-tune on Pascal VOC 2012, using pixel-level annotation in both datasets.
• Cross-Joint (Strong): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets.
• Cross-Joint (Semi): Jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using the pixel-level labels from PASCAL and varying the number of pixel- and image-level labels from MS-COCO.

In all cases we use strong pixel-level annotations for all 10,582 train_aug PASCAL images.

We report our results on the PASCAL VOC 2012 val in Table 5, also including for comparison our best PASCAL-only 67.6% result exploiting all 10,582 strong annotations as a baseline. When we employ the weak MS-COCO annotations (EM-Fixed (Semi)) we obtain 67.7% IOU, which does not improve over the PASCAL-only baseline. However, using strong labels from 5,000 MS-COCO images (4.0% of the MS-COCO dataset) and weak labels from the remaining MS-COCO images in the Cross-Joint (Semi) semi-supervised scenario yields 70.0%, a significant 2.4% boost over the baseline. This Cross-Joint (Semi) result is also 1.3% better than the 68.7% performance obtained using only the 5,000 strong and no weak annotations from MS-COCO. As expected, our best results are obtained by using all 123,287 strong MS-COCO annotations, 71.0% for Cross-Pretrain (Strong) and 71.7% for Cross-Joint (Strong). We observe that cross-dataset augmentation improves by 4.1% over the best PASCAL-only result. Using only a small portion of pixel-level annotations and a large portion of image-level annotations in the semi-supervised setting reaps about half of this benefit.

Method                   #Strong COCO  #Weak COCO  val IOU
PASCAL-only              -             -           67.6
EM-Fixed (Semi)          -             123,287     67.7
Cross-Joint (Semi)       5,000         118,287     70.0
Cross-Joint (Strong)     5,000         -           68.7
Cross-Pretrain (Strong)  123,287       -           71.0
Cross-Joint (Strong)     123,287       -           71.7

Table 5. VOC 2012 val performance using strong annotations for all 10,582 train_aug PASCAL images and a varying number of strong and weak MS-COCO annotations (Sec. 4.5).

Test results  We report our PASCAL VOC 2012 test results in Table 6. We include results of other leading models from the PASCAL leaderboard. All our models have been trained with pixel-level annotated images on the PASCAL trainval_aug and the MS-COCO 2014 trainval datasets.

Method                                      test IOU
MSRA-CFM [8]                                61.8
FCN-8s [25]                                 62.2
Hypercolumn [17]                            62.6
TTI-Zoomout-16 [27]                         64.4
DeepLab-CRF-LargeFOV [5]                    70.3
BoxSup (Semi, with weak COCO) [9]           71.0
DeepLab-CRF-LargeFOV (Multi-scale net) [5]  71.6
Oxford TVG CRF RNN VOC [41]                 72.0
Oxford TVG CRF RNN COCO [41]                74.7
Cross-Pretrain (Strong)                     72.7
Cross-Joint (Strong)                        73.0
Cross-Pretrain (Strong, Multi-scale net)    73.6
Cross-Joint (Strong, Multi-scale net)       73.9

Table 6. VOC 2012 test performance using PASCAL and MS-COCO annotations (Sec. 4.5).

Methods based on the DCNN+CRF pipeline of DeepLab-CRF [5] are the most competitive, with performance surpassing 70%, even when only trained on PASCAL data. Leveraging the MS-COCO annotations brings about 2% improvement. Our top model yields 73.9%, using the multi-scale network architecture of [5]. Also see [41], which also uses joint PASCAL and MS-COCO training, and further improves performance (74.7%) by end-to-end learning of the DCNN and CRF parameters.

4.6. Qualitative Segmentation Results

In Fig. 6 we provide visual comparisons of the results obtained by the DeepLab-CRF model learned with some of the proposed training methods.

5. Conclusions

The paper has explored the use of weak or partial annotation in training a state-of-the-art semantic image segmentation model. Extensive experiments on the challenging PASCAL VOC 2012 dataset have shown that: (1) Using weak annotation solely at the image level seems insufficient to train a high-quality segmentation model. (2) Using weak bounding-box annotation in conjunction with careful segmentation inference for images in the training set suffices to train a competitive model. (3) Excellent performance is obtained when combining a small number of pixel-level annotated images with a large number of weakly annotated images in a semi-supervised setting, nearly matching the results achieved when all training images have pixel-level annotations. (4) Exploiting extra weak or strong annotations from other datasets can lead to large improvements.

Acknowledgments  This work was partly supported by ARO 62250-CS, and NIH 5R01EY022247-03. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.
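As a reading aid, the EM-Fixed E-step that the image-level experiments above rely on (the variant of Sec. 3.2 with fixed biases b_fg = 5 and b_bg = 3) can be sketched as follows. This is an illustrative reconstruction from the text, not the authors' code: the array shapes, function name, and toy inputs are our own assumptions.

```python
import numpy as np

def em_fixed_estep(scores, image_labels, b_fg=5.0, b_bg=3.0):
    """E-step of EM-Fixed: boost classes present at image level, then argmax.

    scores:       (H, W, C) per-pixel class scores from the DCNN,
                  with class 0 taken to be background.
    image_labels: iterable of foreground class indices present in the image.
    Returns an (H, W) map of estimated pixel labels.
    """
    biased = scores.copy()
    biased[:, :, 0] += b_bg            # fixed background bias
    for c in image_labels:
        biased[:, :, c] += b_fg        # fixed foreground bias for present classes
    # Absent classes receive no boost, so the argmax is strongly steered
    # toward {background} plus the image-level label set.
    return np.argmax(biased, axis=-1)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4, 21))   # toy 4x4 image, 21 PASCAL classes
labels = em_fixed_estep(scores, image_labels=[12])
print(labels.shape)
```

In the full algorithm these estimated labels then serve as targets for the M-step (an SGD update of the DCNN), which is omitted here.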

Chinese-English Glossary of Artificial Intelligence (AI) Terms


人工智能(AI)中英文术语对照表目录人工智能(AI)中英文术语对照表 (1)Letter A (1)Letter B (2)Letter C (3)Letter D (4)Letter E (5)Letter F (6)Letter G (6)Letter H (7)Letter I (7)Letter K (8)Letter L (8)Letter M (9)Letter N (10)Letter O (10)Letter P (11)Letter Q (12)Letter R (12)Letter S (13)Letter T (14)Letter U (14)Letter V (15)Letter W (15)Letter AAccumulated error backpropagation 累积误差逆传播Activation Function 激活函数Adaptive Resonance Theory/ART 自适应谐振理论Addictive model 加性学习Adversarial Networks 对抗网络Affine Layer 仿射层Affinity matrix 亲和矩阵Agent 代理/ 智能体Algorithm 算法Alpha-beta pruning α-β剪枝Anomaly detection 异常检测Approximation 近似Area Under ROC Curve/AUC Roc 曲线下面积Artificial General Intelligence/AGI 通用人工智能Artificial Intelligence/AI 人工智能Association analysis 关联分析Attention mechanism注意力机制Attribute conditional independence assumption 属性条件独立性假设Attribute space 属性空间Attribute value 属性值Autoencoder 自编码器Automatic speech recognition 自动语音识别Automatic summarization自动摘要Average gradient 平均梯度Average-Pooling 平均池化Action 动作AI language 人工智能语言AND node 与节点AND/OR graph 与或图AND/OR tree 与或树Answer statement 回答语句Artificial intelligence,AI 人工智能Automatic theorem proving自动定理证明Letter BBreak-Event Point/BEP 平衡点Backpropagation Through Time 通过时间的反向传播Backpropagation/BP 反向传播Base learner 基学习器Base learning algorithm 基学习算法Batch Normalization/BN 批量归一化Bayes decision rule 贝叶斯判定准则Bayes Model Averaging/BMA 贝叶斯模型平均Bayes optimal classifier 贝叶斯最优分类器Bayesian decision theory 贝叶斯决策论Bayesian network 贝叶斯网络Between-class scatter matrix 类间散度矩阵Bias 偏置/ 偏差Bias-variance decomposition 偏差-方差分解Bias-Variance Dilemma 偏差–方差困境Bi-directional Long-Short Term Memory/Bi-LSTM 双向长短期记忆Binary classification 二分类Binomial test 二项检验Bi-partition 二分法Boltzmann machine 玻尔兹曼机Bootstrap sampling 自助采样法/可重复采样/有放回采样Bootstrapping 自助法Letter CCalibration 校准Cascade-Correlation 级联相关Categorical attribute 离散属性Class-conditional probability 类条件概率Classification and regression tree/CART 分类与回归树Classifier 分类器Class-imbalance 类别不平衡Closed -form 闭式Cluster 簇/类/集群Cluster analysis 聚类分析Clustering 聚类Clustering ensemble 
聚类集成Co-adapting 共适应Coding matrix 编码矩阵COLT 国际学习理论会议Committee-based learning 基于委员会的学习Competitive learning 竞争型学习Component learner 组件学习器Comprehensibility 可解释性Computation Cost 计算成本Computational Linguistics 计算语言学Computer vision 计算机视觉Concept drift 概念漂移Concept Learning System /CLS概念学习系统Conditional entropy 条件熵Conditional mutual information 条件互信息Conditional Probability Table/CPT 条件概率表Conditional random field/CRF 条件随机场Conditional risk 条件风险Confidence 置信度Confusion matrix 混淆矩阵Connection weight 连接权Connectionism 连结主义Consistency 一致性/相合性Contingency table 列联表Continuous attribute 连续属性Convergence收敛Conversational agent 会话智能体Convex quadratic programming 凸二次规划Convexity 凸性Convolutional neural network/CNN 卷积神经网络Co-occurrence 同现Correlation coefficient 相关系数Cosine similarity 余弦相似度Cost curve 成本曲线Cost Function 成本函数Cost matrix 成本矩阵Cost-sensitive 成本敏感Cross entropy 交叉熵Cross validation 交叉验证Crowdsourcing 众包Curse of dimensionality 维数灾难Cut point 截断点Cutting plane algorithm 割平面法Letter DData mining 数据挖掘Data set 数据集Decision Boundary 决策边界Decision stump 决策树桩Decision tree 决策树/判定树Deduction 演绎Deep Belief Network 深度信念网络Deep Convolutional Generative Adversarial Network/DCGAN 深度卷积生成对抗网络Deep learning 深度学习Deep neural network/DNN 深度神经网络Deep Q-Learning 深度Q 学习Deep Q-Network 深度Q 网络Density estimation 密度估计Density-based clustering 密度聚类Differentiable neural computer 可微分神经计算机Dimensionality reduction algorithm 降维算法Directed edge 有向边Disagreement measure 不合度量Discriminative model 判别模型Discriminator 判别器Distance measure 距离度量Distance metric learning 距离度量学习Distribution 分布Divergence 散度Diversity measure 多样性度量/差异性度量Domain adaption 领域自适应Downsampling 下采样D-separation (Directed separation)有向分离Dual problem 对偶问题Dummy node 哑结点Dynamic Fusion 动态融合Dynamic programming 动态规划Letter EEigenvalue decomposition 特征值分解Embedding 嵌入Emotional analysis 情绪分析Empirical conditional entropy 经验条件熵Empirical entropy 经验熵Empirical error 经验误差Empirical risk 经验风险End-to-End 端到端Energy-based model 基于能量的模型Ensemble learning 集成学习Ensemble pruning 集成修剪Error Correcting Output 
Codes/ECOC 纠错输出码Error rate 错误率Error-ambiguity decomposition 误差-分歧分解Euclidean distance 欧氏距离Evolutionary computation 演化计算Expectation-Maximization 期望最大化Expected loss 期望损失Exploding Gradient Problem 梯度爆炸问题Exponential loss function 指数损失函数Extreme Learning Machine/ELM 超限学习机Letter FExpert system 专家系统Factorization因子分解False negative 假负类False positive 假正类False Positive Rate/FPR 假正例率Feature engineering 特征工程Feature selection特征选择Feature vector 特征向量Featured Learning 特征学习Feedforward Neural Networks/FNN 前馈神经网络Fine-tuning 微调Flipping output 翻转法Fluctuation 震荡Forward stagewise algorithm 前向分步算法Frequentist 频率主义学派Full-rank matrix 满秩矩阵Functional neuron 功能神经元Letter GGain ratio 增益率Game theory 博弈论Gaussian kernel function 高斯核函数Gaussian Mixture Model 高斯混合模型General Problem Solving 通用问题求解Generalization 泛化Generalization error 泛化误差Generalization error bound 泛化误差上界Generalized Lagrange function 广义拉格朗日函数Generalized linear model 广义线性模型Generalized Rayleigh quotient 广义瑞利商Generative Adversarial Networks/GAN 生成对抗网络Generative Model 生成模型Generator 生成器Genetic Algorithm/GA 遗传算法Gibbs sampling 吉布斯采样Gini index 基尼指数Global minimum 全局最小Global Optimization 全局优化Gradient boosting 梯度提升Gradient Descent 梯度下降Graph theory 图论Ground-truth 真相/真实Letter HHard margin 硬间隔Hard voting 硬投票Harmonic mean 调和平均Hesse matrix海塞矩阵Hidden dynamic model 隐动态模型Hidden layer 隐藏层Hidden Markov Model/HMM 隐马尔可夫模型Hierarchical clustering 层次聚类Hilbert space 希尔伯特空间Hinge loss function 合页损失函数Hold-out 留出法Homogeneous 同质Hybrid computing 混合计算Hyperparameter 超参数Hypothesis 假设Hypothesis test 假设验证Letter IICML 国际机器学习会议Improved iterative scaling/IIS 改进的迭代尺度法Incremental learning 增量学习Independent and identically distributed/i.i.d. 
独立同分布Independent Component Analysis/ICA 独立成分分析Indicator function 指示函数Individual learner 个体学习器Induction 归纳Inductive bias 归纳偏好Inductive learning 归纳学习Inductive Logic Programming/ILP 归纳逻辑程序设计Information entropy 信息熵Information gain 信息增益Input layer 输入层Insensitive loss 不敏感损失Inter-cluster similarity 簇间相似度International Conference for Machine Learning/ICML 国际机器学习大会Intra-cluster similarity 簇内相似度Intrinsic value 固有值Isometric Mapping/Isomap 等度量映射Isotonic regression 等分回归Iterative Dichotomiser 迭代二分器Letter KKernel method 核方法Kernel trick 核技巧Kernelized Linear Discriminant Analysis/KLDA 核线性判别分析K-fold cross validation k 折交叉验证/k 倍交叉验证K-Means Clustering K –均值聚类K-Nearest Neighbours Algorithm/KNN K近邻算法Knowledge base 知识库Knowledge Representation 知识表征Letter LLabel space 标记空间Lagrange duality 拉格朗日对偶性Lagrange multiplier 拉格朗日乘子Laplace smoothing 拉普拉斯平滑Laplacian correction 拉普拉斯修正Latent Dirichlet Allocation 隐狄利克雷分布Latent semantic analysis 潜在语义分析Latent variable 隐变量Lazy learning 懒惰学习Learner 学习器Learning by analogy 类比学习Learning rate 学习率Learning Vector Quantization/LVQ 学习向量量化Least squares regression tree 最小二乘回归树Leave-One-Out/LOO 留一法linear chain conditional random field 线性链条件随机场Linear Discriminant Analysis/LDA 线性判别分析Linear model 线性模型Linear Regression 线性回归Link function 联系函数Local Markov property 局部马尔可夫性Local minimum 局部最小Log likelihood 对数似然Log odds/logit 对数几率Logistic Regression Logistic 回归Log-likelihood 对数似然Log-linear regression 对数线性回归Long-Short Term Memory/LSTM 长短期记忆Loss function 损失函数Letter MMachine translation/MT 机器翻译Macron-P 宏查准率Macron-R 宏查全率Majority voting 绝对多数投票法Manifold assumption 流形假设Manifold learning 流形学习Margin theory 间隔理论Marginal distribution 边际分布Marginal independence 边际独立性Marginalization 边际化Markov Chain Monte Carlo/MCMC马尔可夫链蒙特卡罗方法Markov Random Field 马尔可夫随机场Maximal clique 最大团Maximum Likelihood Estimation/MLE 极大似然估计/极大似然法Maximum margin 最大间隔Maximum weighted spanning tree 最大带权生成树Max-Pooling 最大池化Mean squared error 均方误差Meta-learner 元学习器Metric learning 度量学习Micro-P 微查准率Micro-R 微查全率Minimal Description 
Length/MDL 最小描述长度Minimax game 极小极大博弈Misclassification cost 误分类成本Mixture of experts 混合专家Momentum 动量Moral graph 道德图/端正图Multi-class classification 多分类Multi-document summarization 多文档摘要Multi-layer feedforward neural networks 多层前馈神经网络Multilayer Perceptron/MLP 多层感知器Multimodal learning 多模态学习Multiple Dimensional Scaling 多维缩放Multiple linear regression 多元线性回归Multi-response Linear Regression /MLR 多响应线性回归Mutual information 互信息Letter NNaive bayes 朴素贝叶斯Naive Bayes Classifier 朴素贝叶斯分类器Named entity recognition 命名实体识别Nash equilibrium 纳什均衡Natural language generation/NLG 自然语言生成Natural language processing 自然语言处理Negative class 负类Negative correlation 负相关法Negative Log Likelihood 负对数似然Neighbourhood Component Analysis/NCA 近邻成分分析Neural Machine Translation 神经机器翻译Neural Turing Machine 神经图灵机Newton method 牛顿法NIPS 国际神经信息处理系统会议No Free Lunch Theorem/NFL 没有免费的午餐定理Noise-contrastive estimation 噪音对比估计Nominal attribute 列名属性Non-convex optimization 非凸优化Nonlinear model 非线性模型Non-metric distance 非度量距离Non-negative matrix factorization 非负矩阵分解Non-ordinal attribute 无序属性Non-Saturating Game 非饱和博弈Norm 范数Normalization 归一化Nuclear norm 核范数Numerical attribute 数值属性Letter OObjective function 目标函数Oblique decision tree 斜决策树Occam’s razor 奥卡姆剃刀Odds 几率Off-Policy 离策略One shot learning 一次性学习One-Dependent Estimator/ODE 独依赖估计On-Policy 在策略Ordinal attribute 有序属性Out-of-bag estimate 包外估计Output layer 输出层Output smearing 输出调制法Overfitting 过拟合/过配Oversampling 过采样Letter PPaired t-test 成对t 检验Pairwise 成对型Pairwise Markov property成对马尔可夫性Parameter 参数Parameter estimation 参数估计Parameter tuning 调参Parse tree 解析树Particle Swarm Optimization/PSO粒子群优化算法Part-of-speech tagging 词性标注Perceptron 感知机Performance measure 性能度量Plug and Play Generative Network 即插即用生成网络Plurality voting 相对多数投票法Polarity detection 极性检测Polynomial kernel function 多项式核函数Pooling 池化Positive class 正类Positive definite matrix 正定矩阵Post-hoc test 后续检验Post-pruning 后剪枝potential function 势函数Precision 查准率/准确率Prepruning 预剪枝Principal component analysis/PCA 主成分分析Principle of multiple explanations 
多释原则Prior 先验Probability Graphical Model 概率图模型Proximal Gradient Descent/PGD 近端梯度下降Pruning 剪枝Pseudo-label伪标记Letter QQuantized Neural Network 量子化神经网络Quantum computer 量子计算机Quantum Computing 量子计算Quasi Newton method 拟牛顿法Letter RRadial Basis Function/RBF 径向基函数Random Forest Algorithm 随机森林算法Random walk 随机漫步Recall 查全率/召回率Receiver Operating Characteristic/ROC 受试者工作特征Rectified Linear Unit/ReLU 线性修正单元Recurrent Neural Network 循环神经网络Recursive neural network 递归神经网络Reference model 参考模型Regression 回归Regularization 正则化Reinforcement learning/RL 强化学习Representation learning 表征学习Representer theorem 表示定理reproducing kernel Hilbert space/RKHS 再生核希尔伯特空间Re-sampling 重采样法Rescaling 再缩放Residual Mapping 残差映射Residual Network 残差网络Restricted Boltzmann Machine/RBM 受限玻尔兹曼机Restricted Isometry Property/RIP 限定等距性Re-weighting 重赋权法Robustness 稳健性/鲁棒性Root node 根结点Rule Engine 规则引擎Rule learning 规则学习Letter SSaddle point 鞍点Sample space 样本空间Sampling 采样Score function 评分函数Self-Driving 自动驾驶Self-Organizing Map/SOM 自组织映射Semi-naive Bayes classifiers 半朴素贝叶斯分类器Semi-Supervised Learning半监督学习semi-Supervised Support Vector Machine 半监督支持向量机Sentiment analysis 情感分析Separating hyperplane 分离超平面Searching algorithm 搜索算法Sigmoid function Sigmoid 函数Similarity measure 相似度度量Simulated annealing 模拟退火Simultaneous localization and mapping同步定位与地图构建Singular Value Decomposition 奇异值分解Slack variables 松弛变量Smoothing 平滑Soft margin 软间隔Soft margin maximization 软间隔最大化Soft voting 软投票Sparse representation 稀疏表征Sparsity 稀疏性Specialization 特化Spectral Clustering 谱聚类Speech Recognition 语音识别Splitting variable 切分变量Squashing function 挤压函数Stability-plasticity dilemma 可塑性-稳定性困境Statistical learning 统计学习Status feature function 状态特征函Stochastic gradient descent 随机梯度下降Stratified sampling 分层采样Structural risk 结构风险Structural risk minimization/SRM 结构风险最小化Subspace 子空间Supervised learning 监督学习/有导师学习support vector expansion 支持向量展式Support Vector Machine/SVM 支持向量机Surrogat loss 替代损失Surrogate function 替代函数Symbolic learning 符号学习Symbolism 符号主义Synset 同义词集Letter TT-Distribution Stochastic 
Neighbour Embedding/t-SNE T –分布随机近邻嵌入Tensor 张量Tensor Processing Units/TPU 张量处理单元The least square method 最小二乘法Threshold 阈值Threshold logic unit 阈值逻辑单元Threshold-moving 阈值移动Time Step 时间步骤Tokenization 标记化Training error 训练误差Training instance 训练示例/训练例Transductive learning 直推学习Transfer learning 迁移学习Treebank 树库Tria-by-error 试错法True negative 真负类True positive 真正类True Positive Rate/TPR 真正例率Turing Machine 图灵机Twice-learning 二次学习Letter UUnderfitting 欠拟合/欠配Undersampling 欠采样Understandability 可理解性Unequal cost 非均等代价Unit-step function 单位阶跃函数Univariate decision tree 单变量决策树Unsupervised learning 无监督学习/无导师学习Unsupervised layer-wise training 无监督逐层训练Upsampling 上采样Letter VVanishing Gradient Problem 梯度消失问题Variational inference 变分推断VC Theory VC维理论Version space 版本空间Viterbi algorithm 维特比算法Von Neumann architecture 冯·诺伊曼架构Letter WWasserstein GAN/WGAN Wasserstein生成对抗网络Weak learner 弱学习器Weight 权重Weight sharing 权共享Weighted voting 加权投票法Within-class scatter matrix 类内散度矩阵Word embedding 词嵌入Word sense disambiguation 词义消歧。

A Literature Review of Neural ODEs


"Neural ODE"(神经ordinary differential equation,神经常微分方程)是一种深度学习模型,它将神经网络与常微分方程(ODE)结合起来,用于对动态系统进行建模和预测。

以下是一些关于neural ODE 的文献综述:1."Neural Ordinary Differential Equations" by Chen et al. (2018):这篇论文介绍了neural ODE 的基本概念和方法,包括如何将神经网络与ODE 结合起来,以及如何训练和应用这种模型。

2." Solving Ordinary Differential Equations with Neural Networks" byRaissi et al. (2019):这篇论文介绍了如何使用neural ODE 来解决常微分方程的问题,包括数值解和解析解。

3."On the expressive power of neural ODEs" by Chen et al. (2019):这篇论文探讨了neural ODE 的表达能力,以及它与其他深度学习模型的关系。

4."Neural Ordinary Differential Equations for Image Processing" by Zhu et al. (2019):这篇论文介绍了如何使用neural ODE 来进行图像处理,包括图像生成、图像分类和图像分割等任务。

5." Beyond RNNs with Neural Ordinary Differential Equations" byGrathwohl et al. (2019):这篇论文介绍了如何使用neural ODE 来超越传统的循环神经网络(RNN),包括在处理长序列数据和时间序列预测等任务中的应用。

An Automated Essay Scoring Model Combining BERT Pruning


… lightens teachers' workload and also makes essay scoring more objective and efficient. [1]

The earliest automated essay scoring system appeared in the 1960s: PEG (Project Essay Grader) [2], developed by Professor Ellis Page. The system mainly extracts shallow linguistic features, such as … , to train its model. As a result, PEG lacks any analysis of essay content and structure, and its scores are rather one-sided. IEA (Intelligent Essay Analysis) [4], developed by Hearts et al., is built on Latent Semantic Analysis (LSA) [5] …

Automated essay scoring was taken up relatively late in China. Early on, Professor Liang Maocheng, building on the combined strengths of PEG and IEA, led a team that developed "Jukuu" (句酷网) [7]. The core idea of these systems is to train models on shallow linguistic features with linear regression and to score essays along three dimensions: language, content, and structure. In recent years, deep learning has also entered automated essay scoring research; for example, iFLYTEK trains models with long short-term memory networks (LSTM) [8] and other machine learning techniques, and these models also show good predictive performance.

The automated scoring techniques summarized above mainly adopt natural … context, with few dimensions and fast training. Its drawback is that words and vectors stand in a one-to-one relationship, so it cannot handle polysemous words in an essay and may produce inaccurate scores. Moreover, word2vec is a static model that cannot be dynamically optimized for a specific task; that is, the model must be retrained for essays with different content.

RNN refers to the recurrent neural network, whose core idea is to take sequence data as input, recurse along the sequence direction, and connect all recurrent units in a chain. [15] In natural language processing tasks, this network can …

Funding: Shanghai Youth Science and Technology Talents Sailing Program (17YF1428400). About the authors: Zhang Bingxue (1985- ), female, Ph.D.; research interests: AI in education, educational data mining, adaptive learning systems. Shao Xiaobo (1995- ), female, …

Neural Networks for Latent Semantic Analysis

Ruck Thawonmas, Jun-ichirou Hirayama, Takayuki Tomoike, and Akio Sakamoto
Department of Information Systems Engineering, Kochi University of Technology
185 Miyanokuchi, Tosayamada-cho, Kami-gun, Kochi 782-8502, Japan
E-mail: ruck@info.kochi-tech.ac.jp

Abstract

Textual data can be represented by a usually large dimensional matrix. For tasks such as information retrieval or information filtering, it is therefore necessary to reduce the dimensionality. A statistical method called Latent Semantic Analysis (LSA) can be used for not only dimensional reduction, but also for transformation of the original space into a much lower dimensional, but much more meaningful feature space, usually called a semantic space. However, LSA is known to consume a lot of both memory space and computational time. This paper illustrates how to resolve these problems.

1 Introduction

The greater availability of information in electronic form brought about a comparable growth in research on methods for information filtering and information retrieval. These two tasks both have the goal of retrieving information relevant to what a user wants, while minimizing the amount of irrelevant information retrieved [1]. A textual database in information filtering or retrieval systems can be represented by a word-by-document matrix, whose elements represent the occurrence of a word in a document [9]. To improve the retrieval performance, a statistical method called Latent Semantic Analysis (LSA) [8] is often used. LSA uses singular-value decomposition (SVD) to decompose a usually large word-by-document matrix into a set of much smaller dimensional orthogonal factors. The original data can then be approximated by linear combination of these factors. In addition, this set of factors spans a semantic space wherein terms and documents that are closely associated are placed near one another. In practice, however, implementation of LSA for large-size data sets can not be achieved with most numerical SVD algorithms, mainly due to enormous consumption of memory space and computational time [6].

This paper presents a neural network realization of LSA. The memory-space problem is resolved by computing each of the orthogonal factors in a sequential fashion or quasi-parallel fashion, namely only one orthogonal factor spanning the semantic space is computed at a time. The proposed neural network architecture and its learning rule for LSA are suited either for VLSI implementation [2] or for implementation on a Dynamic Data-Driven Multiprocessor machine [10] currently being developed by our colleagues at KUT. Hence the computational-time problem could be resolved by parallel implementation.

The remaining sections of this article are organized as follows. To make this article self-contained, Section 2 gives a brief review of LSA. Section 3 describes the architecture and the learning algorithm of the LSA neural networks. Section 4 reports the experimental results. Discussions and conclusions are then given.

2 A Brief Review of LSA

A word-by-document t x d matrix X is composed of elements Xij defined by

    Xij = 1, if word i occurs in document j; 0, otherwise,   (1)

where t is the number of words, and d the number of documents. To improve LSA's results, several preprocessing techniques can be applied to Xij, such as the one in [8]. We, however, do not use them in this work.

2.2 SVD

To find the underlying or "latent" structure in the pattern of word usage across documents, X is approximated by X^ as follows:

    X ~= X^ = T S D',   (3)

where the t x n matrix T, the n x n matrix S, and the d x n matrix D are composed of the first n columns of T0, S0, and D0, respectively. This approximation allows the arrangement of the space to reflect the major associative patterns in the data, and to ignore the smaller, random or less important influences. The resulting space is called a semantic space.

The similarity between two documents is then measured by

    sim(d, di) = (d, di) / (||d|| ||di||),   (5)

where di is the ith row-vector in D, (d, di) represents the dot product of vectors d and di, and || . || is the Euclidean norm operation.

0-7803-6456-2/00/$10.00 (c)2000 IEEE
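To make the LSA review above concrete, here is a small end-to-end sketch: build the binary word-by-document matrix X of Eq. (1), truncate its SVD as in Eq. (3), and compare documents with the cosine similarity of Eq. (5). The toy corpus and the choice n = 2 are illustrative assumptions only; the paper's point is precisely that for large corpora this batch SVD is replaced by a neural network computing one factor at a time.

```python
import numpy as np

docs = ["human machine interface", "user interface system",
        "graph of trees", "graph minors of trees"]
vocab = sorted({w for d in docs for w in d.split()})

# Eq. (1): binary word-by-document matrix X (t words x d documents)
X = np.array([[1.0 if w in d.split() else 0.0 for d in docs] for w in vocab])

# Eq. (3): keep the first n singular triplets, X ~= T S D'
n = 2
T0, s0, D0t = np.linalg.svd(X, full_matrices=False)
T, S, D = T0[:, :n], np.diag(s0[:n]), D0t[:n, :].T   # rows of D are document vectors

def sim(a, b):
    """Eq. (5): cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents about graphs/trees end up close to each other in the
# semantic space, and far from the user-interface documents.
print(sim(D[2], D[3]), sim(D[0], D[3]))
```

Note that D[2] and D[3] come out nearly identical even though the raw documents differ, which is the "latent structure" effect Eq. (3) is meant to capture.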