A Discrete-Time Projection Neural Network for Solving
Research on Auto-Associative Memory Neural Networks (Wang Chuandong)
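$$X' = \operatorname{sgn}(WX) = \operatorname{sgn}\Big(\sum_{i=1}^{M} X^{i}(X^{i})^{T}X\Big) = \operatorname{sgn}\Big(\sum_{i=1}^{M} X^{i}\langle X^{i}, X\rangle\Big) \tag{1}$$

where $W=\sum_{i=1}^{M}X^{i}(X^{i})^{T}$ is the weight matrix and $\operatorname{sgn}(\cdot)$ is the binary sign function.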
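The recall rule in Eq. (1) can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: the helper names (`sgn`, `train_weights`, `recall`), the two stored patterns, and the convention that `sgn(0)` returns +1 are all hypothetical choices, not taken from the paper.

```python
def sgn(v):
    # Binary sign function; mapping 0 to +1 is an assumption of this sketch.
    return 1 if v >= 0 else -1

def train_weights(patterns):
    # W = sum_i X^i (X^i)^T, the outer-product rule of Eq. (1).
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for r in range(n):
            for c in range(n):
                W[r][c] += p[r] * p[c]
    return W

def recall(W, x):
    # One synchronous update X' = sgn(W X).
    return [sgn(sum(w * xj for w, xj in zip(row, x))) for row in W]

# Two mutually orthogonal bipolar patterns (illustrative data).
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
W = train_weights(patterns)
noisy = [-1, -1, 1, -1, 1, -1, 1, -1]  # first pattern with its first bit flipped
restored = recall(W, noisy)
print(restored == patterns[0])
```

Because the two stored patterns are orthogonal, the inner-product term for the first pattern dominates the sum, so a single update is enough to correct the flipped bit in this toy case.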
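2) Convergence analysis of the HAM model.
(2) Its storage capacity and the error-correction performance it can achieve can be analyzed with mathematical theory;
(3) It pioneered research on associative memory neural networks; in particular, the idea of analyzing network stability through an energy function has been widely adopted in many later associative memory models;
(4) It offers useful ideas for solving practical problems such as combinatorial optimization.
2 Research progress on auto-associative memory models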
3. Department of Information and Technology, Nanjing College of Forestry Police, Nanjing 210046, China)
Abstract: As an important artificial neural network, the auto-associative memory model (AM) can be employed to mimic human thinking and machine intelligence, and it has a massively parallel distributed configuration and content-addressable ability. This paper first introduces in detail the Hopfield Associative Memory (HAM) neural network, which has had a great impact on the development of auto-associative memory models, and analyzes the strengths and drawbacks of HAM. Second, focusing on the existing relevant research literature, it presents a survey of auto-associative memory models from three aspects: learning algorithm, network architecture, and practical application. Finally, it summarizes the main problems that auto-associative memory models currently face and forecasts their future development trends.
Key words: neural network; auto-associative memory; intelligent information processing
Recurrent Neural Networks With Recursive Least Squares
ZHAO Jie 1, ZHANG Chun-Yuan 1, LIU Chao 1, ZHOU Hui 1, OU Yi-Gui 2, SONG Qi 1
DOI 10.16383/j.aas.c190847
Abstract In recurrent neural networks (RNNs), first-order optimization algorithms usually converge slowly, and second-order optimization algorithms commonly have high time and space complexities. To solve these problems, a new minibatch recursive least squares (RLS) optimization algorithm is proposed. Using the non-activated linear output error to replace the conventional activation output error for backpropagation, together with the equivalent gradients of the weighted linear least squares objective function with respect to the linear outputs of the hidden layer, the proposed algorithm derives the minibatch recursive least squares solutions of the RNN parameters layer by layer. Compared with the stochastic gradient descent algorithm, the proposed algorithm only adds one covariance matrix to each of the hidden and output layers of the RNN, and its time and space complexities are only about three times those of stochastic gradient descent. Furthermore, to address the adaptation of the forgetting factor and the overfitting problem of the proposed algorithm, one approach is presented for each. The simulation results, on classification and prediction problems with sequential data, show that the proposed algorithm converges faster than popular first-order optimization algorithms.
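In addition, the proposed algorithm also has good robustness in the selection of hyperparameters.
Key words Deep learning, recurrent neural network (RNN), recursive least squares (RLS), minibatch learning, optimization algorithm
Citation Zhao Jie, Zhang Chun-Yuan, Liu Chao, Zhou Hui, Ou Yi-Gui, Song Qi. Recurrent neural networks with recursive least squares. Acta Automatica Sinica, 2022, 48(8): 2050-2061
Manuscript received December 12, 2019; accepted April 7, 2020
Supported by National Natural Science Foundation of China (61762032, 61662019, 11961018)
Recommended by Associate Editor CAO Xiang-Hui
1. School of Computer Science and Technology, Hainan University, Haikou 570228 2. School of Science, Hainan University, Haikou 570228
Acta Automatica Sinica, Vol. 48, No. 8, August 2022

As an effective deep learning model, recurrent neural networks (RNNs) introduce short-term memory dependencies along the time dimension of the data. In recent years, RNNs have performed well in sequence tasks such as language modeling [1], machine translation [2], and speech recognition [3]. Compared with feedforward neural networks, however, and precisely because of this short-term memory dependency, the parameters of RNNs are harder to train [4-5]. How to train RNNs efficiently, that is, the optimization of RNNs, is one of the key problems in putting RNNs to effective use. The mainstream optimization algorithms for RNNs currently fall into three types: first-order gradient descent algorithms, adaptive learning rate algorithms, and second-order gradient descent algorithms.

The most typical first-order gradient descent algorithm is stochastic gradient descent (SGD) [6], which is widely used to optimize RNNs. SGD optimizes the parameters with the average gradient of a mini-batch of data. Because the size and direction of each SGD step depend entirely on the current batch, it easily falls into local minima, so learning is inefficient and updates are unstable. For this reason, researchers introduced the notion of velocity on top of SGD to accelerate learning; this algorithm is called SGD with momentum [7], or Momentum for short. Building on it, Sutskever et al. [8] proposed the Nesterov momentum algorithm, which differs from Momentum in how the gradient is computed. The hyperparameters of first-order gradient descent algorithms are usually fixed in advance, and a poor setting may make training slow or even impossible. To address these problems of SGD, researchers proposed a series of first-order gradient descent algorithms whose learning rate adapts automatically, referred to as adaptive learning rate algorithms. The AdaGrad algorithm of Duchi et al. [9] adjusts the learning rate dynamically using accumulated squared gradients; it performs well on convex optimization problems but makes the learning rate shrink too quickly in deep neural networks. The RMSProp algorithm of Tieleman et al. [10] and the AdaDelta algorithm of Zeiler [11] follow a similar idea: both use an exponentially decaying average to reduce the influence of gradients from the distant past, which solves the problem of AdaGrad's learning rate decreasing too quickly. The Adam algorithm of Kingma et al. [12] combines RMSProp with the momentum idea and computes the learning rate from estimates of the first and second moments of the gradient; it outperforms AdaDelta and similar algorithms in most experiments. However, Keskar et al. [13] found that the final convergence of Adam is worse than that of SGD, and Reddi et al. [14] pointed out that Adam fails to converge in some cases.

Algorithms based on second-order gradient descent optimize the parameters using second-order gradient information of the objective function.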
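The most widely used is Newton's method, which minimizes the objective function based on a second-order Taylor expansion. It converges much faster than first-order gradient algorithms, but every iteration requires the Hessian matrix and its inverse, so the computational complexity is very high. In recent years researchers have proposed approximate algorithms to reduce the cost. The Hessian-Free algorithm [15] lowers the complexity by computing Hessian-vector products directly, but each parameter update needs hundreds of linear conjugate gradient iterations. AdaQN [16] requires a two-loop recursion in every iteration cycle, so its cost is still high. The K-FAC algorithm (Kronecker-factored approximate curvature) [17] computes second-order gradients by constructing an invertible approximation of the Fisher information matrix online. In addition, the BFGS algorithm [18] and its derivatives (for example L-BFGS [19-20]) reduce the complexity by avoiding the computation of the inverse Hessian. Compared with first-order optimization algorithms, second-order algorithms are still too expensive and therefore unsuitable for very large datasets; moreover, the high-precision solutions they find improve the generalization of the model only slightly and sometimes even harm it. Second-order gradient optimization algorithms are therefore still difficult to use widely for training RNNs.

Besides the types of optimization algorithms described above, many researchers have tried to apply the recursive least squares (RLS) algorithm to the training of various neural networks. RLS is an adaptive filtering algorithm with very fast convergence. Azimi-Sadjadi et al. [21] proposed an RLS algorithm for training multilayer perceptrons. Tan [22] split each neural network layer into a linear input layer and a nonlinear activation layer, approximated the backpropagated error of the nonlinear activation layer, and solved for the parameter matrix of the linear input layer with RLS to speed up convergence. Xu et al. [23] successfully applied RLS to multilayer RNNs. These algorithms need to store one covariance matrix for every neuron, so their time and space overhead is large. Peter et al. [24] proposed an extended Kalman filter optimization algorithm for training RNNs. It represents the RNN as a stationary process corrupted by noise and then solves for the state matrix of the network. Its drawback is that Jacobian matrices must be computed for linearization, so the time and space overhead is also large. Jaeger [25] obtained an RLS solution for the parameters of echo state networks by approximating the nonlinear system as a linear one, but that algorithm is limited to the output-layer parameters of echo state networks and does not apply to the training of general RNNs.

To address these problems, this paper proposes a new RLS-optimized RNN algorithm (RLS-RNN for short). The main contributions are as follows: 1) In deriving the output-layer parameter update of RLS-RNN, an RLS update suited to mini-batch training is proposed by borrowing the average-gradient idea of SGD; it markedly reduces the actual training time of RNNs and allows the algorithm to handle larger datasets. 2) In deriving the hidden-layer parameter update of RLS-RNN, an equivalent-gradient idea is proposed to obtain the least squares solution of that layer's parameters; as a result, an RNN can be trained with RLS as long as the output-layer activation function has an inverse, with no such requirement on the hidden-layer activation function. 3) Compared with earlier RLS optimization algorithms, RLS-RNN needs only one covariance matrix for the hidden layer and one for the output layer, rather than one for every neuron of these two layers, so its time and space complexity is only about three times that of SGD. 4) The adaptation of the forgetting factor and the prevention of overfitting in RLS-RNN are briefly discussed, and one solution is given for each.

1 Background

1.1 RNN algorithm optimized with SGD

The model structure that RNNs use to process sequential data is shown in Fig. 1. A basic RNN usually consists of an input layer, a hidden layer (also called the recurrent layer), and an output layer. In Fig. 1, $X_{s,t}\in\mathbb{R}^{m\times a}$, $H_{s,t}\in\mathbb{R}^{m\times h}$ and $O_{s,t}\in\mathbb{R}^{m\times d}$ are the input, the hidden-layer output, and the output-layer output of the $s$-th batch of training samples at time step $t$, where $m$ is the mini-batch size, $a$ is the dimension of one training sample, $h$ is the number of hidden neurons, and $d$ is the number of output neurons; $U_{s-1}\in\mathbb{R}^{a\times h}$, $W_{s-1}\in\mathbb{R}^{h\times h}$ and $V_{s-1}\in\mathbb{R}^{h\times d}$ are the input-to-hidden, hidden-to-hidden, and hidden-to-output parameter matrices used when training on the $s$-th batch; $b^{H}_{s-1}\in\mathbb{R}^{1\times h}$ and $b^{O}_{s-1}\in\mathbb{R}^{1\times d}$ are the bias parameters of the hidden and output layers; $\tau$ is the number of time steps in the current sequence. The core idea of RNNs is to share parameters across time steps and to feed the hidden-layer output of each time step, weighted, into the computation of the next time step, so that the weights learn the relations between different time steps of the sequence and generalize over them. The output layer selects which time steps produce output depending on the problem; the common cases are classification and prediction of sequential data. For sequence prediction, the output layer produces an output at every time step; for sequence classification, the output layer has no output at the time steps inside the dashed box of Fig. 1, that is, it produces an output only at the last time step.

Fig. 1 RNN model structure

RNNs obtain their actual output by forward propagation, which can be described as

$$H_{s,t}=\varphi\big(X_{s,t}U_{s-1}+H_{s,t-1}W_{s-1}+\mathbf{1}\times b^{H}_{s-1}\big) \tag{1}$$

$$O_{s,t}=\sigma\big(H_{s,t}V_{s-1}+\mathbf{1}\times b^{O}_{s-1}\big) \tag{2}$$

where $\mathbf{1}$ is an all-ones column vector with $m$ rows, and $\varphi(\cdot)$ and $\sigma(\cdot)$ are the activation functions of the hidden and output layers; commonly used activation functions include the sigmoid and tanh functions. For convenience in the later derivations and for brevity, the two equations above can be rewritten with augmented matrices as

$$H_{s,t}=\varphi\big(R^{H}_{s,t}\Theta^{H}_{s-1}\big) \tag{3}$$

$$O_{s,t}=\sigma\big(R^{O}_{s,t}\Theta^{O}_{s-1}\big) \tag{4}$$

where $R^{H}_{s,t}\in\mathbb{R}^{m\times(a+h+1)}$ and $R^{O}_{s,t}\in\mathbb{R}^{m\times(h+1)}$ are the augmented input matrices of the hidden and output layers, and $\Theta^{H}_{s-1}\in\mathbb{R}^{(a+h+1)\times h}$ and $\Theta^{O}_{s-1}\in\mathbb{R}^{(h+1)\times d}$ are the augmented weight matrices of the hidden and output layers, that is,

$$R^{H}_{s,t}=\big[X_{s,t},\ H_{s,t-1},\ \mathbf{1}\big] \tag{5}$$

$$R^{O}_{s,t}=\big[H_{s,t},\ \mathbf{1}\big] \tag{6}$$

$$\Theta^{H}_{s-1}=\begin{bmatrix}U_{s-1}\\ W_{s-1}\\ b^{H}_{s-1}\end{bmatrix} \tag{7}$$

$$\Theta^{O}_{s-1}=\begin{bmatrix}V_{s-1}\\ b^{O}_{s-1}\end{bmatrix} \tag{8}$$

How the parameters of an RNN are updated depends closely on the optimization algorithm. SGD-based optimization of RNNs is usually carried out by backpropagation that minimizes an objective function. Common objective functions include the cross-entropy, mean squared error, and logistic functions. Only the mean squared error objective $\hat J(\Theta_{s-1})$, Eq. (9), is considered here, where $Y^{*}_{s,t}\in\mathbb{R}^{m\times d}$ is the expected output corresponding to $X_{s,t}$; $\Theta_{s-1}$ denotes all parameter matrices of the network; and $t_0$ is the first time step at which the output layer produces output: $t_0=\tau$ for classification problems and $t_0=1$ for sequence prediction problems. This convention is kept throughout and is not repeated.

Let $\hat\nabla^{O}_{s}=\partial\hat J(\Theta_{s-1})/\partial\Theta^{O}_{s-1}$. By Eq. (9) and the chain rule,

$$\hat\nabla^{O}_{s}=\sum_{t=t_0}^{\tau}\big(R^{O}_{s,t}\big)^{T}\hat\Delta^{O}_{s,t}$$

where $\hat\Delta^{O}_{s,t}=\partial\hat J(\Theta_{s-1})/\partial Z^{O}_{s,t}$ is formed with the Hadamard product $\circ$, and $Z^{O}_{s,t}$ is the non-activated linear output of the output layer, that is,

$$Z^{O}_{s,t}=R^{O}_{s,t}\Theta^{O}_{s-1} \tag{12}$$

The parameter update rule of this layer can then be defined as

$$\Theta^{O}_{s}=\Theta^{O}_{s-1}-\alpha\hat\nabla^{O}_{s}$$

where $\alpha$ is the learning rate.

Let $\hat\nabla^{H}_{s}=\partial\hat J(\Theta_{s-1})/\partial\Theta^{H}_{s-1}$. According to the BPTT (backpropagation through time) algorithm [26], Eq. (9), and the chain rule,

$$\hat\nabla^{H}_{s}=\sum_{t=1}^{\tau}\big(R^{H}_{s,t}\big)^{T}\hat\Delta^{H}_{s,t}$$

where $\hat\Delta^{H}_{s,t}=\partial\hat J(\Theta_{s-1})/\partial Z^{H}_{s,t}$ is the gradient of the objective function with respect to the non-activated linear output of the hidden layer, that is,

$$\hat\Delta^{H}_{s,t}=\Big(\tilde\Delta^{H}_{s,t}\big(\tilde\Theta^{H}_{s-1}\big)^{T}\Big)\circ\varphi'\big(Z^{H}_{s,t}\big)$$

where $\tilde\Delta^{H}_{s,t}=\big[\hat\Delta^{O}_{s,t},\ \hat\Delta^{H}_{s,t+1}\big]$, $\tilde\Theta^{H}_{s-1}=\big[V_{s-1},\ W_{s-1}\big]$, and $Z^{H}_{s,t}$ is the non-activated linear output of the hidden layer, that is,

$$Z^{H}_{s,t}=R^{H}_{s,t}\Theta^{H}_{s-1} \tag{16}$$

The parameter update rule of this layer can then be defined as

$$\Theta^{H}_{s}=\Theta^{H}_{s-1}-\alpha\hat\nabla^{H}_{s}$$

1.2 RLS algorithm

RLS is a recursive form of the least squares optimization algorithm; it converges very quickly and is also suitable for online learning. Let the current set of training inputs be $X_t=\{x_1,\cdots,x_t\}$ and the corresponding set of expected outputs be $Y^{*}_t=\{y^{*}_1,\cdots,y^{*}_t\}$. The objective function is usually defined as a weighted sum of squared errors, in which $w$ is the weight vector and $\lambda\in(0,1]$ is the forgetting factor. Setting $\nabla_{w}J(w)=0$ and rearranging gives the least squares solution, Eq. (20), together with the two accumulated quantities it depends on, Eqs. (21) and (22). To avoid the expensive matrix inversion and to support online learning, Eqs. (21) and (22) are rewritten in a recursive update form, Eqs. (23) and (24). The Sherman-Morrison-Woodbury formula [27] then readily gives the recursion for the inverse, Eq. (25), in which $g_t$, Eq. (26), is the gain vector. Substituting Eqs. (23), (25) and (26) into Eq. (20) yields the update formula for the current weight vector.

2 RLS-optimized RNN algorithm

Although RLS learns very quickly, it applies only to linear systems. We note that in RNNs, if the activation functions are ignored, the output computations of the hidden and output layers are still linear; this section builds a new mini-batch RLS optimization algorithm on this property. Assume that the output-layer activation function $\sigma(\cdot)$ has an inverse $\sigma^{-1}(\cdot)$ and, following the RLS algorithm, define an output-layer objective function over all batches, where $s$ denotes that there are $s$ batches of training samples in total and $Z^{O*}_{n,t}$ is the expected non-activated linear value of the output layer, that is,

$$Z^{O*}_{n,t}=\sigma^{-1}\big(Y^{*}_{n,t}\big)$$

The RNN parameter optimization problem can therefore be defined as in Eq. (31).

Since forward propagation in RNNs does not involve weight updates, the forward pass of the proposed algorithm is essentially the same as in the SGD-RNN algorithm of Section 1.1: $H_{s,t}$ is again computed with Eq. (3). The only difference is that $O_{s,t}$ does not need to be computed here; instead, $Z^{O}_{s,t}$ is computed with Eq. (12).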
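This section therefore considers only the derivation of the output-layer and hidden-layer parameter updates of RLS-RNN.

2.1 Derivation of the RLS-RNN output-layer parameter update

Let $\nabla_{\Theta^{O}}=\partial J(\Theta)/\partial\Theta^{O}$. By Eq. (31) and the chain rule, this gradient can be expressed through $\Delta^{O}_{n,t}=\partial J(\Theta)/\partial Z^{O}_{n,t}$, whose explicit form is Eq. (35). To find the optimal parameters $\Theta^{O*}$, further set $\nabla_{\Theta^{O}}=0$, which is Eq. (36). Substituting Eq. (35) into Eq. (36) and rearranging gives the least squares solution of $\Theta^{O}_{s}$, Eq. (38), in terms of two accumulated matrices $A^{O}_{s}$ and $B^{O}_{s}$. As in the derivation of the RLS algorithm, these two matrices can be written in recursive form, Eqs. (41) and (42), where $R^{O}_{s,t,k}\in\mathbb{R}^{h+1}$ is the $k$-th column vector of $(R^{O}_{s,t})^{T}$ and $Z^{O*}_{s,t,k}\in\mathbb{R}^{d}$ is the $k$-th column vector of $(Z^{O*}_{s,t})^{T}$. However, because the RNN here is trained on mini-batches, the inverse of $A^{O}_{s}$ cannot be obtained from Eq. (41) by applying the Sherman-Morrison-Woodbury formula directly, as was done for Eq. (24).

Considering that $\Theta^{O}_{s-1}$, $A^{O}_{s-1}$ and $B^{O}_{s-1}$ are the same for all samples in a batch, and borrowing the idea of the mini-batch average gradient in SGD, an averaging approximation is used to handle this problem. Eqs. (41) and (42) can be rewritten as averages over the samples of the batch, so $(A^{O}_{s})^{-1}$ and $\Theta^{O}_{s}$ can be obtained approximately. Let $P^{O}_{s}=(A^{O}_{s})^{-1}$. From Eqs. (47) and (38) and the Sherman-Morrison-Woodbury formula, rearranging gives the update equations (49) and (50), where $\Delta^{O}_{s,t,k}\in\mathbb{R}^{d}$ is the $k$-th column vector of $(\Delta^{O}_{s,t})^{T}$, $G^{O}_{s,t,k}$ is given by Eq. (52), and

$$\Lambda^{O}_{s,t,k}=P^{O}_{s-1}R^{O}_{s,t,k} \tag{51}$$

2.2 Derivation of the RLS-RNN hidden-layer parameter update

Let $\nabla_{\Theta^{H}}=\partial J(\Theta)/\partial\Theta^{H}$. By Eq. (31) and the chain rule, this gradient can be expressed through $\Delta^{H}_{n,t}=\partial J(\Theta)/\partial Z^{H}_{n,t}$, whose concrete form, Eq. (54), is computed with the BPTT algorithm from $\acute\Delta^{H}_{n,t}=\big[\Delta^{O}_{n,t},\ \Delta^{H}_{n,t+1}\big]$. Further setting $\nabla_{\Theta^{H}}=0$ gives Eq. (55). However, Eq. (54) is very complicated and $\varphi'(Z^{H}_{s,t})$ is generally nonlinear, so Eq. (54) cannot be substituted into Eq. (55) to obtain the least squares solution of the hidden-layer parameters $\Theta^{H}$.

Next we propose a new method that derives an equivalent form of $\Delta^{H}_{n,t}$ and uses it to obtain the least squares solution of $\Theta^{H}$. Temporarily define a new hidden-layer objective function $J^{H}(\Theta^{H})$, in which $Z^{H*}_{n,t}$ is the expected non-activated linear output of that layer. Clearly, if $J(\Theta)\to 0$, then $J^{H}(\Theta^{H})\to 0$. Setting $\partial J^{H}(\Theta^{H})/\partial\Theta^{H}=0$ gives Eq. (58). Comparing Eq. (55) with Eq. (58) yields another, equivalent definition of $\Delta^{H}_{n,t}$, Eq. (59), where $\eta$ is a scaling factor. In theory, the value of $\eta$ should differ somewhat between mini-batches. Since every mini-batch is drawn at random from the whole training set, however, this difference can be ignored. From Eq. (16), $Z^{H}_{n,t}=R^{H}_{n,t}\Theta^{H}$; substituting Eq. (59) into Eq. (55) and rearranging further gives the least squares solution of $\Theta^{H}_{s}$, Eq. (61).

The recursive least squares solution of Eq. (61) is derived in the same way as the output-layer parameter update. Let $P^{H}_{s}=(A^{H}_{s})^{-1}$; using the same averaging approximation as above readily gives the update equations (64) and (65), where $\Delta^{H}_{s,t,k}\in\mathbb{R}^{h}$ is the $k$-th column vector of $(\Delta^{H}_{s,t})^{T}$, $G^{H}_{s,t,k}$ is given by Eq. (67), and

$$\Lambda^{H}_{s,t,k}=P^{H}_{s-1}R^{H}_{s,t,k} \tag{66}$$

Note that, because the expected hidden-layer output $Z^{H*}_{s,t}$ is unknown, $\Delta^{H}_{s,t}$ cannot actually be obtained from Eq. (59). Fortunately, Eqs. (54) and (59) are equivalent, so the implementation uses Eq. (54) in place of Eq. (59).

In summary, the RLS-RNN algorithm is given in Algorithm 1.

Algorithm 1. RLS-optimized RNN algorithm
Require: mini-batch samples $\{(X_1,Y^{*}_1),(X_2,Y^{*}_2),\cdots,(X_N,Y^{*}_N)\}$, time steps $\tau$, forgetting factor $\lambda$, scaling factor $\eta$, covariance matrix initialization parameter $\alpha$;
Initialize: weight matrices $\Theta^{H}_0$ and $\Theta^{O}_0$; covariance matrices $P^{H}_0=\alpha I^{H}$, $P^{O}_0=\alpha I^{O}$;
for $s=1,2,\cdots,N$ do
    set $H_{s,0}=0$;
    for $t=1,2,\cdots,\tau$ do
        compute $H_{s,t}$ with Eq. (3);
        compute $Z_{s,t}$ with Eq. (12);
    end for
    for $t=\tau,\tau-1,\cdots,1$ do
        compute $\Delta^{O}_{s,t}$ with Eq. (35);
        compute $\Delta^{H}_{s,t}$ with Eq. (54);
        for $k=1,\cdots,m$ do
            compute $\Lambda^{O}_{s,t,k}$, $G^{O}_{s,t,k}$ with Eqs. (51), (52);
            compute $\Lambda^{H}_{s,t,k}$, $G^{H}_{s,t,k}$ with Eqs. (66), (67);
        end for
    end for
    update $P^{O}_{s}$, $\Theta^{O}_{s}$ with Eqs. (49), (50);
    update $P^{H}_{s}$, $\Theta^{H}_{s}$ with Eqs. (64), (65);
end for

3 Analysis and improvements

3.1 Complexity analysis

Among the optimization algorithms currently used for RNNs, SGD has the lowest time and space complexity. This section takes SGD-RNN as the reference and compares it with the time and space complexity of the proposed RLS-RNN algorithm. The complexities of the two algorithms for learning from one mini-batch are compared in Table 1. As introduced in Section 1, $\tau$ is the number of time steps of the sequence, $m$ is the batch size, $a$ is the dimension of a single sample vector, $h$ is the number of hidden neurons, and $d$ is the number of output neurons. In practice, $a$ and $d$ are usually smaller than $h$, so the time and space complexity of RLS-RNN is roughly three times that of SGD-RNN. In actual runs we found that RLS-RNN uses about three times the time and memory of SGD-RNN, which agrees with the theoretical analysis of this section.

The proposed algorithm needs only one matrix for the hidden layer and one for the output layer of the RNN, whereas earlier RLS optimization algorithms need a covariance matrix of the same size for every neuron in the hidden and output layers, so the proposed algorithm greatly reduces time and space complexity. In addition, it adopts the mini-batch training widely used in deep learning, which allows it to handle larger datasets.

3.2 Adaptive adjustment of $\lambda$

Many studies have shown that the value of the forgetting factor $\lambda$ strongly affects the performance of RLS [28], especially on time-varying tasks. Since the proposed algorithm is built on conventional RLS, the convergence quality of RLS-RNN is also sensitive to the value of $\lambda$. In RLS research there are already many results on adapting $\lambda$ [28-29], which can be used directly to improve RLS-RNN further.

Based on [29], this subsection directly gives an adaptive adjustment method for $\lambda$. For the $s$-th mini-batch, the forgetting factor $\lambda_s$ used in all layers of RLS-RNN is defined by a single update rule in which $\lambda_{\max}$ is close to 1; $\kappa>1$ controls the update of $\lambda_s$ and is generally recommended to be 2, and usually the smaller $\kappa$ is, the more frequently $\lambda_s$ is updated; $\xi$ is a very small constant that prevents the denominator from being 0 when $\lambda_s$ is computed; and $q_s$, $\sigma^{e}_{s}$ and $\sigma^{v}_{s}$ are running estimates with smoothing coefficients $\mu_0$, $\mu_1$ and $\mu_2$, where $\mu_0$ is recommended to be $7/8$; $\mu_1=1-1/(\varsigma_1 m)$, usually with $\varsigma_1\ge 2$; and $\mu_2=1-1/(\varsigma_2 m)$ with $\varsigma_2>\varsigma_1$.

Of course, updating $\lambda_s$ in this way introduces new hyperparameters, which makes RLS-RNN somewhat harder to tune. From practical experience with RLS-RNN, a fixed $\lambda$ can also be used for training, and a value between 0.99 and 1 is recommended.

3.3 Overfitting prevention

Although the conventional RLS algorithm converges very quickly, it often runs the risk of overfitting, and RLS-RNN faces the same risk. As in Section 3.2, results on this problem from the RLS literature can be used to improve RLS-RNN.

Ekşioğlu [30] proposed an $L_1$-regularized RLS method, which adds a regularization term to the parameter update. With a slight modification, Eqs. (50) and (65) can each be redefined with such a term, where $\gamma$ is the regularization factor, $G^{O}_{s,t}=\big[G^{O}_{s,t,1},\cdots,G^{O}_{s,t,m}\big]$ and $G^{H}_{s,t}=\big[G^{H}_{s,t,1},\cdots,G^{H}_{s,t,m}\big]$. Besides this method, readers may also use other regularization methods to improve RLS-RNN further.

4 Simulation experiments

To verify the effectiveness of the proposed algorithm, this section runs simulation experiments on two sequence classification problems and two sequence prediction problems. The two classification problems are MNIST handwritten digit recognition [31] and IMDB movie review sentiment classification; the two prediction problems are Google stock price prediction [32] and Beijing PM2.5 pollution prediction [33].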
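The experiments focus on the convergence performance of the proposed algorithm and on the robustness of the choice of the hyperparameters $\alpha$ and $\eta$. For the convergence comparison, the mainstream first-order gradient optimization algorithms SGD, Momentum and Adam are used as baselines, and every experiment runs for 150 epochs. For the hyperparameter robustness tests, since the proposed algorithm converges very quickly, every experiment runs for only 50 epochs. To reduce the randomness of the results, every experiment is repeated 5 times and the average is reported. In addition, to observe the actual effect of the proposed algorithm, none of the optimization algorithms applies Dropout during the RNN parameter updates. Note in particular that, for the two classification problems, which are not strongly time-varying, the forgetting factor of the proposed algorithm is fixed rather than adapted with the method of Section 3.2; for the two prediction problems, the forgetting factor uses the method of Section 3.2; and for all four problems the proposed algorithm uses the method of Section 3.3 to prevent overfitting.

Table 1 Complexity analysis of SGD-RNN and RLS-RNN

| Item | SGD-RNN | RLS-RNN |
|---|---|---|
| Time complexity | | |
| $O_s$ | $O(\tau mdh)$ | - |
| $Z_s$ | - | $O(\tau mdh)$ |
| $H_s$ | $O(\tau mh(h+a))$ | $O(\tau mh(h+a))$ |
| $\Delta^{O}_s$ | $O(4\tau md)$ | $O(3\tau md)$ |
| $\Delta^{H}_s$ | $O(\tau mh(h+d))$ | $O(\tau mh(h+d))$ |
| $P^{O}_s$ | - | $O(2\tau mh^2)$ |
| $P^{H}_s$ | - | $O(2\tau m(h+a)^2)$ |
| $\Theta^{O}_s$ | $O(\tau mdh)$ | $O(\tau mdh)$ |
| $\Theta^{H}_s$ | $O(\tau mh(h+a))$ | $O(\tau mh(h+a))$ |
| Total | $O(\tau m(3dh+3h^2+2ha))$ | $O(\tau m(7h^2+2a^2+3dh+6ha))$ |
| Space complexity | | |
| $\Theta^{O}_s$ | $O(hd)$ | $O(hd)$ |
| $\Theta^{H}_s$ | $O(h(h+a))$ | $O(h(h+a))$ |
| $P^{H}_s$ | - | $O((h+a)^2)$ |
| $P^{O}_s$ | - | $O(h^2)$ |
| Total | $O(h^2+hd+ha)$ | $O(hd+3ha+a^2+3h^2)$ |

4.1 MNIST handwritten digit recognition

The training and test sets of the MNIST classification problem consist of 55 000 and 10 000 grayscale handwritten digit images of $28\times 28$ pixels in 10 classes; the learning goal is to recognize a given handwritten digit image. To suit RNN learning, every image in the training and test sets is converted into a sequence of 28 time steps, each with 28 pixel inputs, and the image class is one-hot encoded.

The RNN model for this problem is set up as follows: 1) The input layer has 28 input time steps and an input vector dimension of 28. 2) The hidden layer has 28 time steps and 100 neurons, with activation function $\tanh(\cdot)$. 3) The output layer has 1 time step and 10 neurons, with activation function $\tanh(\cdot)$. Because $\tanh^{-1}(1)$ and $\tanh^{-1}(-1)$ are positive and negative infinity, the implementation adopts the following convention for $\tanh^{-1}(x)$: if $x\ge 0.997$, then $\tanh^{-1}(x)=\tanh^{-1}(0.997)$; if $x\le -0.997$, then $\tanh^{-1}(x)=\tanh^{-1}(-0.997)$. The RNN weight parameters use He initialization [34].

In the convergence comparison, the hyperparameters of the optimization algorithms are set as follows: for RLS, the forgetting factor $\lambda$ is 0.9999, the scaling factor $\eta$ is 1, the covariance matrix initialization parameter $\alpha$ is 0.4, and the regularization factor $\gamma$ is 0.0001; the SGD learning rate is 0.05; Momentum uses a learning rate of 0.05 and a momentum parameter of 0.5; Adam uses a learning rate of 0.001, with $\beta_1$ set to 0.9, $\beta_2$ to 0.999, and $\epsilon$ to $10^{-8}$. In the robustness tests for $\alpha$ and $\eta$, one variable is varied at a time: 1) fix $\lambda=0.9999$, $\gamma=0.0001$ and $\eta=1$, and test $\alpha=0.01,0.1,0.2,\cdots,1$ in turn; 2) fix $\lambda=0.9999$, $\gamma=0.0001$ and $\alpha=0.4$, and test $\eta=0.1,1,2,\cdots,10$ in turn.

Under these settings, every epoch randomly partitions the training set into 550 mini-batches of size 100. After each epoch, 50 mini-batches are drawn at random from the test set for testing, and the average classification accuracy is recorded. The results are shown in Fig. 2(a), Table 2 and Table 3. Fig. 2(a) shows that RLS raises the classification accuracy above 95% in the first epoch, converging far faster than the other three optimization algorithms, and that the accuracy curve of RLS is fairly smooth, which indicates stable parameter convergence. Tables 2 and 3 record the average classification accuracy at epoch 50 for different values of $\alpha$ and $\eta$ in this experiment.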
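Table 2 shows that, for the different initialization factors $\alpha$, the accuracy at epoch 50 stays between 97.10% and 97.70%, which is stable overall and indicates that $\alpha$ has little effect on the performance of the algorithm. Table 3 shows that the accuracy for the different values of $\eta$ stays between 97.04% and 97.80% with little fluctuation, so the value of $\eta$ also has little effect on performance. In summary, the RLS algorithm is robust to the choice of both $\alpha$ and $\eta$.

Fig. 2 Experimental results on the convergence comparisons

4.2 IMDB movie review sentiment classification

The training and test sets of the IMDB classification problem consist of 25 000 and 10 000 movie reviews, with positive and negative reviews each making up 50%; the learning goal is to recognize the sentiment of a given review. To suit RNN learning, the reviews of the training and test sets are first loaded from the dataset built into Keras, and the first 32 valid words of each review form a sequence of time steps; each valid word is then embedded with the pretrained GloVe.6B model [35], so that every time step has 50 input dimensions. The positive or negative sentiment class of a review is one-hot encoded.

The RNN model for this problem is set up as follows: 1) The input layer has 32 input time steps and an input vector dimension of 50. 2) The hidden layer has 32 time steps and 100 neurons, with activation function $\tanh(\cdot)$. 3) The output layer has 1 time step and 2 neurons, with activation function $\tanh(\cdot)$. The $\tanh^{-1}(x)$ issue and the initialization of the RNN weight parameters are handled as in Section 4.1.

In the convergence comparison, the hyperparameters of the optimization algorithms are set as follows: for RLS, the forgetting factor $\lambda$ is 0.9999, the scaling factor $\eta$ is 1, the covariance matrix initialization parameter $\alpha$ is 0.4, and the regularization factor $\gamma$ is 0.001; the SGD learning rate is 0.05; Momentum uses a learning rate of 0.05 and a momentum parameter of 0.5; Adam uses a learning rate of 0.0001, with $\beta_1$ set to 0.9, $\beta_2$ to 0.999, and $\epsilon$ to $10^{-8}$. In the robustness tests for $\alpha$ and $\eta$, one variable is again varied at a time: 1) fix $\lambda=0.9999$, $\gamma=0.001$ and $\eta=1$, and test $\alpha=0.01,0.1,0.2,\cdots,1$ in turn; 2) fix $\lambda=0.9999$, $\gamma=0.001$ and $\alpha=0.4$, and test $\eta=0.1,1,2,\cdots,10$ in turn.

Under these settings, every epoch randomly partitions the training set into 250 mini-batches of size 100. After each epoch, 50 mini-batches are drawn at random from the test set for testing, and the average classification accuracy is recorded. The results are shown in Fig. 2(b), Table 2 and Table 3. Fig. 2(b) shows that SGD and Momentum converge rather unstably with large fluctuations, while the accuracy curve of Adam is fairly smooth; all three have low accuracy early in training. In contrast, the early accuracy of RLS is already close to its later prediction accuracy, its early convergence is extremely fast, and its overall accuracy is clearly better than that of the other three optimization algorithms. Tables 2 and 3 record the average classification accuracy at epoch 50 of the IMDB experiment for different values of $\alpha$ and $\eta$. Table 2 shows that the accuracy varies within a small range for different $\alpha$, so $\alpha$ has little effect on the algorithm. Table 3 shows that the accuracy varies between 72.86% and 73.82% for different $\eta$, so the value of $\eta$ has little effect on performance. In summary, the RLS algorithm is again robust to the choice of both $\alpha$ and $\eta$ in this experiment.

4.3 Google stock price prediction

The data for the Google stock price prediction problem come from Google's stock price records from January 4, 2010 to December 30, 2016. Each daily record contains five values: the opening price, the daily low, the daily high, the number of trades, and the adjusted closing price. The learning goal is to predict the next day's adjusted closing price from the current day's prices.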
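To suit RNN learning, these values are first normalized and then sampled in units of 50 consecutive trading days. Each sample yields one 5-dimensional input sequence, and shifting the same sample by one trading day and taking the adjusted closing price of each day yields the corresponding one-dimensional expected output sequence. The first 1 400 sequences form the training set, and the following 200 sequences are used as the test set.

Table 2 Robustness analysis of the initializing factor $\alpha$

| $\alpha$ | 0.01 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MNIST classification accuracy (%) | 97.10 | 97.36 | 97.38 | 97.35 | 97.57 | 97.70 | 97.19 | 97.27 | 97.42 | 97.25 | 97.60 |
| IMDB classification accuracy (%) | 72.21 | 73.50 | 73.24 | 73.32 | 74.02 | 73.01 | 73.68 | 73.25 | 73.20 | 73.42 | 73.12 |
| Stock price prediction MSE ($\times 10^{-4}$) | 5.32 | 5.19 | 5.04 | 5.43 | 5.42 | 5.30 | 4.87 | 4.85 | 5.32 | 5.54 | 5.27 |
| PM2.5 prediction MSE ($\times 10^{-3}$) | 1.58 | 1.55 | 1.53 | 1.55 | 1.61 | 1.55 | 1.55 | 1.54 | 1.57 | 1.58 | 1.57 |

Table 3 Robustness analysis of the scaling factor $\eta$

| $\eta$ | 0.1 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MNIST classification accuracy (%) | 97.80 | 97.59 | 97.48 | 97.61 | 97.04 | 97.62 | 97.44 | 97.33 | 97.38 | 97.37 | 97.45 |
| IMDB classification accuracy (%) | 73.58 | 73.46 | 73.62 | 73.76 | 73.44 | 73.82 | 73.71 | 72.97 | 72.86 | 73.12 | 73.69 |
| Stock price prediction MSE ($\times 10^{-4}$) | 5.70 | 5.32 | 5.04 | 5.06 | 5.61 | 4.73 | 5.04 | 5.14 | 4.85 | 4.97 | 5.19 |
| PM2.5 prediction MSE ($\times 10^{-3}$) | 1.53 | 1.55 | 1.56 | 1.59 | 1.56 | 1.53 | 1.58 | 1.55 | 1.54 | 1.50 | 1.52 |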
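The classical RLS recursion that Section 1.2 of this paper builds on can be sketched in a few lines. This is a minimal sketch of the textbook exponentially weighted RLS update for a single linear output, not the paper's mini-batch RLS-RNN update; the function name `rls_step`, the initial scale `alpha = 1000.0`, and the toy data are assumptions made purely for illustration.

```python
import math

def rls_step(w, P, x, y, lam=0.99):
    # Textbook RLS with forgetting factor lam (assumed form, see lead-in).
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    g = [v / denom for v in Px]                        # gain vector
    e = y - sum(w[i] * x[i] for i in range(n))         # a priori error
    w_new = [w[i] + g[i] * e for i in range(n)]
    # P stays symmetric, so x^T P equals (P x)^T and no inverse is needed.
    P_new = [[(P[i][j] - g[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new

# Toy problem: recover a fixed linear map from noise-free samples.
w_true = [2.0, -3.0, 0.5]
alpha = 1000.0  # hypothetical initial scale, P_0 = alpha * I
w = [0.0, 0.0, 0.0]
P = [[alpha if i == j else 0.0 for j in range(3)] for i in range(3)]
for t in range(200):
    x = [math.sin(0.7 * t), math.cos(1.3 * t), 1.0]    # last entry acts as bias
    y = sum(wi * xi for wi, xi in zip(w_true, x))
    w, P = rls_step(w, P, x, y)
print([round(v, 4) for v in w])
```

The only state carried between samples is the weight vector and one matrix `P`, which is the property the paper exploits when it keeps a single covariance matrix per layer.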
recurrent_neural_network_regularization
2 RELATED WORK
Dropout (Srivastava, 2013) is a recently introduced regularization method that has been very successful with feed-forward neural networks. While much work has extended dropout in various ways (Wang & Manning, 2013; Wan et al., 2013), there has been relatively little research in applying it to RNNs. The only paper on this topic is by Bayer et al. (2013), who focus on "marginalized dropout" (Wang & Manning, 2013), a noiseless deterministic approximation to standard dropout. Bayer et al. (2013) claim that conventional dropout does not work well with RNNs because the recurrence amplifies noise, which in turn hurts learning. In this work, we show that this problem can be fixed by applying dropout to a certain subset of the RNNs' connections. As a result, RNNs can now also benefit from dropout. Independently of our work, Pham et al. (2013) developed the very same RNN regularization method and applied it to handwriting recognition. We rediscovered this method and demonstrated strong empirical results over a wide range of problems. Other work that applied dropout to LSTMs is Pachitariu & Sahani (2013).
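To make the idea of restricting dropout to a subset of connections concrete, here is a minimal sketch. It assumes that the subset is the non-recurrent (input-to-hidden) path and uses a plain tanh RNN step instead of an LSTM; the function names, weights, and dropout rate are hypothetical and chosen only for illustration.

```python
import math
import random

def dropout(vec, p_drop, rng, train=True):
    # Inverted dropout: zero each unit with probability p_drop, rescale the rest.
    if not train or p_drop == 0.0:
        return list(vec)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

def rnn_step(x, h_prev, U, W, p_drop, rng, train=True):
    # Dropout touches only the input (non-recurrent) path;
    # h_prev passes through the recurrent weights W unmasked.
    x_d = dropout(x, p_drop, rng, train)
    h = []
    for j in range(len(h_prev)):
        z = sum(x_d[i] * U[i][j] for i in range(len(x)))
        z += sum(h_prev[k] * W[k][j] for k in range(len(h_prev)))
        h.append(math.tanh(z))
    return h

rng = random.Random(0)
U = [[0.5, -0.3], [0.1, 0.8]]   # input-to-hidden weights (illustrative)
W = [[0.2, 0.0], [0.0, 0.2]]    # hidden-to-hidden weights (illustrative)
h = [0.0, 0.0]
for x in ([1.0, 2.0], [0.5, -1.0], [0.0, 1.0]):
    h = rnn_step(x, h, U, W, p_drop=0.5, rng=rng)
print(len(h))
```

Since the hidden state is never masked, noise injected at one time step is not re-applied to the memory carried through the recurrence, which is the intuition behind limiting dropout to part of the network.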
Existence of Periodic Solutions for Memristive Cohen-Grossberg Neural Networks with Time Delays
Wang Yougang; Wu Huaiqin
Abstract: The objective of this paper is to investigate the periodic dynamical behaviors of a class of memristive Cohen-Grossberg neural networks with time-varying delays. By employing M-matrix theory, differential inclusion theory and the Mawhin-like coincidence theorem in set-valued analysis, the existence of a periodic solution for the network system is proved. Finally, an illustrative example is given to demonstrate the validity of the theoretical results, and the existence of the periodic solution and the equilibrium point is shown visually by graphical simulation.
Journal: Journal of Xihua University (Natural Science Edition)
Year (volume), issue: 2017(036)005
Total pages: 10 (P22-30, 35)
Keywords: memristor; Cohen-Grossberg neural network; periodic solution; time-varying delay
Authors: Wang Yougang; Wu Huaiqin
Affiliations: Department of Mathematics, Lüliang University, Lüliang, Shanxi 033001; School of Science, Yanshan University, Qinhuangdao, Hebei 066004
Language of the main text: Chinese
Chinese Library Classification: TP183
In 1971, the Chinese-American scientist Leon O. Chua inferred on theoretical grounds that, besides the resistor, the capacitor and the inductor, there should be one more component, representing the relationship between electric charge and magnetic flux.
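Fundamentals of Artificial Neural Network Algorithms (PPT courseware)
2.3 Learning rules
In the learning of a neural network, the connection weights of each neuron must be adjusted according to certain rules; such weight adjustment rules are called learning rules. Several common learning rules are listed below.
1. Hebb learning rule
2. Delta (δ) learning rule
3. LMS learning rule
4. Winner-take-all learning rule
5. Kohonen learning rule
6. Probabilistic learning rule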
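The first two rules in the list can be illustrated with a short sketch. The slides only name the rules, so the formulas used here are the common textbook forms (Hebb: weight change proportional to input times output; Delta: weight change proportional to error times input for a linear unit), and the function names, learning rates, and sample data are assumptions made for illustration.

```python
def hebb_update(w, x, y, eta=0.1):
    # Hebb rule (textbook form): strengthen w_i in proportion to x_i * y.
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

def delta_update(w, x, target, eta=0.1):
    # Delta rule (textbook form) for a linear unit: follow the error times the input.
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (target - y) * xi for wi, xi in zip(w, x)]

# Learn the map target = 2*x1 - x2 from three samples with the delta rule.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]
for _ in range(200):
    for x, target in samples:
        w = delta_update(w, x, target)
print([round(v, 4) for v in w])
```

The Hebb rule needs no target and simply reinforces co-active input-output pairs, while the delta rule is supervised and stops changing the weights once the error is zero.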
2.3 Learning rules
1. Hebb learning rule
[Figure: schematic diagram of the synapse structure]
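1.3 Information processing mechanism of biological neurons
Fig. 12.2 Functional model of a biological neuron (labels: electrical pulse; input; dendrite; cell body; formation; axon; synapse; output; information processing; transmission)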
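Excitation and inhibition of neurons
When incoming nerve impulses, after integration, raise the membrane potential above the action-potential threshold, the neuron is in the excited state and generates a nerve impulse, which travels along the axon and out through the nerve endings. When incoming impulses, after integration, lower the membrane potential below the threshold, the neuron is in the inhibited state and generates no nerve impulse.
④ The output and response of a neuron are the combined result of all of its input values.
⑤ Excited and inhibited states: when the membrane potential rises above the threshold, the cell enters the excited state and generates a nerve impulse; when the membrane potential is below the threshold, the cell enters the inhibited state.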
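The excited and inhibited states described above correspond to a simple threshold unit. The sketch below is an illustrative abstraction only: the weighted sum stands in for the integrated membrane potential, and the function name, weights, and threshold value are hypothetical.

```python
def neuron_state(inputs, weights, threshold):
    # Integrate the weighted inputs into a "membrane potential"
    # and compare it with the firing threshold.
    potential = sum(w * x for w, x in zip(weights, inputs))
    if potential > threshold:
        return ("excited", 1)    # a nerve impulse is generated
    return ("inhibited", 0)      # no impulse

print(neuron_state([1, 1], [0.6, 0.6], 1.0))
print(neuron_state([1, 0], [0.6, 0.6], 1.0))
```

With both inputs active the integrated potential is 1.2 and exceeds the threshold of 1.0, whereas a single active input gives 0.6 and leaves the unit inhibited.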
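1.6 Activation functions
There are many ways to describe a neuron; they differ in the activation function used, and different activation functions give the neuron different output characteristics. Commonly used activation functions are of the following types:
In 1957, F. Rosenblatt proposed the "Perceptron" model, which for the first time took neural network research from purely theoretical discussion into engineering practice and set off the first wave of research on artificial neural networks.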
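1.1 A brief history of artificial neural networks
After the 1960s, the development of digital computers reached its peak. People mistakenly believed that digital computers could solve the problems of artificial intelligence, expert systems, and pattern recognition, and research on the "perceptron" was neglected. As a result, from the late 1960s onward, research on artificial neural networks entered a low period.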
Surrogate Model Based on Generalised Radial Basis Function Neural Networks with Non-Radially Symmetric Kernels
reduce its computational complexity, we propose and construct the generalised radial basis function neural networks with a non-radially symmetric kernel function. The radially symmetric Gaussian kernel function and the non-radially symmetric kernel function are used to validate the generalised network, the relative error and the root-mean-square error. Experimental results indicate that the agent model based on non-radially symmetric generalised radial basis function neural networks has many advantages, such as high computation accuracy, fewer required network nodes, and less calculating and comparing time.
1-Discrete-time+MPC+for+Beginners+
1 Discrete-time MPC for Beginners

1.1 Introduction

In this chapter, we will introduce the basic ideas and terms about model predictive control. In Section 1.2, a single-input and single-output state-space model with an embedded integrator is introduced, which is used in the design of discrete-time predictive controllers with integral action in this book. In Section 1.3, we examine the design of predictive control within one optimization window. This is demonstrated by simple analytical examples. With the results obtained from the optimization, in Section 1.4, we discuss the ideas of receding horizon control, state feedback gain matrices, and the closed-loop configuration of the predictive control system. The results are extended to multi-input and multi-output systems in Section 1.5. In a general framework of state-space design, an observer is needed in the implementation, and this is discussed in Section 1.6. With a combination of estimated state variables and the predictive controller, in Section 1.7, we present state estimate predictive control.

1.1.1 Day-to-day Application Example of Predictive Control

The general design objective of model predictive control is to compute a trajectory of a future manipulated variable u to optimize the future behaviour of the plant output y. The optimization is performed within a limited time window by giving plant information at the start of the time window. To help understand the basic ideas that have been used in the design of predictive control, we examine a typical planning activity of our day-to-day work.

The day begins at 9 o'clock in the morning. We are, as a team, going to complete the tasks of design and implementation of a model predictive control system for a liquid vessel. The rule of the game is that we always plan our activities for the next 8 hours of work; however, we only implement the plan for the first hour. This planning activity is repeated for every fresh hour until the tasks are completed.

Given the amount of background work that we have completed for 9 o'clock, we plan ahead for the next 8 hours. Assume that the work tasks are divided into modelling, design, simulation and implementation. Completing these tasks will be a function of various factors, such as how much effort we will put in, how well we will work as a team and whether we will get some additional help from others. These are the manipulated variables in the planning problem. Also, we have our limitations, such as our ability to understand the design problem, and whether we have good skills of computer hardware and software engineering. These are the hard and soft constraints in the planning. The background information we have already acquired is paramount for this planning work.

After everything is considered, we determine the design tasks for the next 8 hours as functions of the manipulated variables. Then we calculate hour-by-hour what we need to do in order to complete the tasks. In this calculation, based on the background information, we will take our limitations into consideration as the constraints, and find the best way to achieve the goal. The end result of this planning gives us our projected activities from 9 o'clock to 5 o'clock. Then we start working by implementing the activities for the first hour of our plan.

At 10 o'clock, we check how much we have actually done for the first hour. This information is used for the planning of our next phase of activities.
Maybe we have done less than we planned because we could not get the correct model or because one of the key members went for an emergency meeting. Nevertheless, at 10 o'clock, we make an assessment on what we have achieved, and use this updated information for planning our activities for the next 8 hours. Our objective may remain the same or may change. The length of time for the planning remains the same (8 hours). We repeat the same planning process as it was at 9 o'clock, which then gives us the new projected activities for the next 8 hours. We implement the first hour of activities at 10 o'clock. Again at 11 o'clock, we assess what we have achieved again and use the updated information for the plan of work for the next 8 hours. The planning and implementation process is repeated every hour until the original objective is achieved.

There are three key elements required in the planning. The first is the way of predicting what might happen (model); the second is the instrument of assessing our current activities (measurement); and the third is the instrument of implementing the planned activities (realization of control). The key issues in the planning exercise are:
1. the time window for the planning is fixed at 8 hours;
2. we need to know our current status before the planning;
3. we take the best approach for the 8 hours of work by taking the constraints into consideration, and the optimization is performed in real-time with a moving horizon time window and with the latest information available.
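The plan, act, and re-plan cycle described above can be sketched as a short loop. This is only an illustrative sketch under stated assumptions: a scalar plant x(k+1) = a x(k) + b u(k), an unconstrained quadratic cost with a hypothetical input weight `r_w`, and a window of 8 samples to mirror the 8-hour plan. The function names and all numerical values are not from the text.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def plan(x0, r, a, b, n_p, r_w):
    # Prediction j steps ahead: x_j = a^j x0 + sum_{i<j} a^(j-1-i) b u_i.
    F = [a ** j for j in range(1, n_p + 1)]
    Phi = [[(a ** (j - 1 - i)) * b if i < j else 0.0 for i in range(n_p)]
           for j in range(1, n_p + 1)]
    # Normal equations of  min ||r - F x0 - Phi U||^2 + r_w ||U||^2.
    H = [[sum(Phi[k][i] * Phi[k][j] for k in range(n_p))
          + (r_w if i == j else 0.0) for j in range(n_p)] for i in range(n_p)]
    g = [sum(Phi[k][i] * (r - F[k] * x0) for k in range(n_p))
         for i in range(n_p)]
    return solve(H, g)

def receding_horizon(x0, r, a=1.0, b=1.0, n_p=8, r_w=0.1, steps=10):
    x, history = x0, [x0]
    for _ in range(steps):
        u_plan = plan(x, r, a, b, n_p, r_w)  # plan the whole window
        x = a * x + b * u_plan[0]            # implement only the first move
        history.append(x)                    # "measure" and re-plan next step
    return history

history = receding_horizon(0.0, 1.0)
print(round(history[-1], 6))
```

Each pass through the loop computes a full 8-sample plan but applies only its first element, then re-plans from the newly reached state, which is the receding horizon principle the planning example illustrates.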
The planning activity described here involves the principle of MPC. In this example, it is described by the terms that are to be used frequently in the following: the moving horizon window, prediction horizon, receding horizon control, and control objective. They are introduced as below.

1. Moving horizon window: the time-dependent window from an arbitrary time $t_i$ to $t_i+T_p$. The length of the window $T_p$ remains constant. In this example, the planning activity is performed within an 8-hour window, thus $T_p=8$, with the measurement taken every hour. However, $t_i$, which defines the beginning of the optimization window, increases on an hourly basis, starting with $t_i=9$.
2. Prediction horizon: dictates how 'far' we wish the future to be predicted for. This parameter equals the length of the moving horizon window, $T_p$.
3. Receding horizon control: although the optimal trajectory of the future control signal is completely described within the moving horizon window, the actual control input to the plant only takes the first sample of the control signal, while neglecting the rest of the trajectory.
4. In the planning process, we need the information at time $t_i$ in order to predict the future. This information is denoted as $x(t_i)$, which is a vector containing many relevant factors, and is either directly measured or estimated.
5. A given model that will describe the dynamics of the system is paramount in predictive control. A good dynamic model will give a consistent and accurate prediction of the future.
6. In order to make the best decision, a criterion is needed to reflect the objective. The objective is related to an error function based on the difference between the desired and the actual responses. This objective function is often called the cost function $J$, and the optimal control action is found by minimizing this cost function within the optimization window.

1.1.2 Models Used in the Design

There are three general approaches to predictive control design. Each approach uses a unique model structure. In the earlier formulation of model predictive control, finite impulse response (FIR) models and step response models were favoured. FIR model/step response model based design algorithms include dynamic matrix control (DMC) (Cutler and Ramaker, 1979) and the quadratic DMC formulation of Garcia and Morshedi (1986). The FIR type of models are appealing to process engineers because the model structure gives a transparent description of process time delay, response time and gain. However, they are limited to stable plants and often require large model orders. This model structure typically requires 30 to 60 impulse response coefficients depending on the process dynamics and choice of sampling intervals. Transfer function models give a more parsimonious description of process dynamics and are applicable to both stable and unstable plants. Representatives of transfer function model-based predictive control include the predictive control algorithm of Peterka (Peterka, 1984) and the generalized predictive control (GPC) algorithm of Clarke and colleagues (Clarke et al., 1987). The transfer function model-based predictive control is often considered to be less effective in handling multivariable plants. A state-space formulation of GPC has been presented in Ordys and Clarke (1993).

Recent years have seen the growing popularity of predictive control design using state-space design methods (Ricker, 1991, Rawlings and Muske, 1993, Rawlings, 2000, Maciejowski, 2002). In this book, we will use state-space models, both in continuous time and discrete time, for simplicity of the design framework and the direct link to the classical linear quadratic regulators.
1.2State-space Models with Embedded IntegratorModel predictive control systems are designed based on a mathematical model of the plant.The model to be used in the control system design is taken to be a state-space model.By using a state-space model,the current information required for predicting ahead is represented by the state variable at the current time.1.2.1Single-input and Single-output SystemFor simplicity,we begin our study by assuming that the underlying plant is a single-input and single-output system,described by:x m(k+1)=A m x m(k)+B m u(k),(1.1) y(k)=C m x m(k),(1.2)where u is the manipulated variable or input variable;y is the process output; and x m is the state variable vector with assumed dimension n1.Note that this plant model has u(k)as its input.Thus,we need to change the model to suit our design purpose in which an integrator is embedded.Note that a general formulation of a state-space model has a direct term from the input signal u(k)to the output y(k)asy(k)=C m x m(k)+D m u(k).However,due to the principle of receding horizon control,where a current information of the plant is required for prediction and control,we have im-plicitly assumed that the input u(k)cannot affect the output y(k)at the same time.Thus,D m=0in the plant model.Taking a difference operation on both sides of(1.1),we obtain thatx m(k+1)−x m(k)=A m(x m(k)−x m(k−1))+B m(u(k)−u(k−1)).1.2State-space Models with Embedded Integrator 5Let us denote the difference of the state variable byΔx m (k +1)=x m (k +1)−x m (k );Δx m (k )=x m (k )−x m (k −1),and the difference of the control variable byΔu (k )=u (k )−u (k −1).These are the increments of the variables x m (k )and u (k ).With this transfor-mation,the difference of the state-space equation is:Δx m (k +1)=A m Δx m (k )+B m Δu (k ).(1.3)Note that the input to the state-space model is Δu (k ).The next step is to connect Δx m (k )to the output y (k ).To do so,a new state variable vector is chosen to be x (k )= Δx m (k )T y (k ) T 
, where superscript $T$ indicates matrix transpose. Note that

$$y(k+1) - y(k) = C_m\,(x_m(k+1) - x_m(k)) = C_m \Delta x_m(k+1) = C_m A_m \Delta x_m(k) + C_m B_m \Delta u(k). \tag{1.4}$$

Putting together (1.3) with (1.4) leads to the following state-space model:

$$\begin{aligned}
\overbrace{\begin{bmatrix} \Delta x_m(k+1) \\ y(k+1) \end{bmatrix}}^{x(k+1)} &= \overbrace{\begin{bmatrix} A_m & o_m^T \\ C_m A_m & 1 \end{bmatrix}}^{A} \overbrace{\begin{bmatrix} \Delta x_m(k) \\ y(k) \end{bmatrix}}^{x(k)} + \overbrace{\begin{bmatrix} B_m \\ C_m B_m \end{bmatrix}}^{B} \Delta u(k) \\
y(k) &= \overbrace{\begin{bmatrix} o_m & 1 \end{bmatrix}}^{C} \begin{bmatrix} \Delta x_m(k) \\ y(k) \end{bmatrix},
\end{aligned} \tag{1.5}$$

where $o_m = [\,0\ 0\ \ldots\ 0\,]$ is a row vector of $n_1$ zeros. The triplet $(A, B, C)$ is called the augmented model, which will be used in the design of predictive control.

Example 1.1. Consider a discrete-time model in the following form:

$$\begin{aligned} x_m(k+1) &= A_m x_m(k) + B_m u(k) \\ y(k) &= C_m x_m(k), \end{aligned} \tag{1.6}$$

where the system matrices are

$$A_m = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}; \quad B_m = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}; \quad C_m = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$

Find the triplet matrices $(A, B, C)$ in the augmented model (1.5) and calculate the eigenvalues of the system matrix, $A$, of the augmented model.

Solution. From (1.5), $n_1 = 2$ and $o_m = [\,0\ 0\,]$. The augmented model for this plant is given by

$$\begin{aligned} x(k+1) &= A x(k) + B \Delta u(k) \\ y(k) &= C x(k), \end{aligned} \tag{1.7}$$

where the augmented system matrices are

$$A = \begin{bmatrix} A_m & o_m^T \\ C_m A_m & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}; \quad B = \begin{bmatrix} B_m \\ C_m B_m \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \\ 0.5 \end{bmatrix}; \quad C = \begin{bmatrix} o_m & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}.$$

The characteristic equation of matrix $A$ is given by

$$\rho(\lambda) = \det(\lambda I - A) = \det \begin{bmatrix} \lambda I - A_m & o_m^T \\ -C_m A_m & (\lambda - 1) \end{bmatrix} = (\lambda - 1)\det(\lambda I - A_m) = (\lambda - 1)^3. \tag{1.8}$$

Therefore, the augmented state-space model has three eigenvalues at $\lambda = 1$. Among them, two are from the original integrator plant, and one is from the augmentation of the plant model.

1.2.2 MATLAB Tutorial: Augmented Design Model

Tutorial 1.1. The objective of this tutorial is to demonstrate how to obtain a discrete-time state-space model from a continuous-time state-space model, and form the augmented discrete-time state-space model. Consider a continuous-time system with the state-space model:

$$\begin{aligned} \dot{x}_m(t) &= \begin{bmatrix} 0 & 1 & 0 \\ 3 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} x_m(t) + \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} u(t) \\ y(t) &= \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} x_m(t). \end{aligned} \tag{1.9}$$

Step by Step

1. Create a new file called extmodel.m. We form a continuous-time state variable model; then this continuous-time model is discretized using the MATLAB function 'c2dm' with specified sampling interval $\Delta t$.

2. Enter the following program into the file:

    Ac = [0 1 0; 3 0 1; 0 1 0];
    Bc = [1; 1; 3];
    Cc = [0 1 0];
    Dc = zeros(1,1);
    Delta_t = 1;                                  % sampling interval
    [Ad,Bd,Cd,Dd] = c2dm(Ac,Bc,Cc,Dc,Delta_t);    % discretize the model

3. The dimensions of the system matrices are determined to discover the numbers of states, inputs and outputs. The augmented state-space model is produced. Continue entering the following program into the file:

    [m1,n1] = size(Cd);                  % m1 outputs, n1 states
    [n1,n_in] = size(Bd);                % n_in inputs
    A_e = eye(n1+m1,n1+m1);
    A_e(1:n1,1:n1) = Ad;
    A_e(n1+1:n1+m1,1:n1) = Cd*Ad;
    B_e = zeros(n1+m1,n_in);
    B_e(1:n1,:) = Bd;
    B_e(n1+1:n1+m1,:) = Cd*Bd;
    C_e = zeros(m1,n1+m1);
    C_e(:,n1+1:n1+m1) = eye(m1,m1);

4. Run this program to produce the augmented state variable model for the design of predictive control.

1.3 Predictive Control within One Optimization Window

Upon formulation of the mathematical model, the next step in the design of a predictive control system is to calculate the predicted plant output with the future control signal as the adjustable variables. This prediction is described within an optimization window. This section will examine in detail the optimization carried out within this window. Here, we assume that the current time is $k_i$ and the length of the optimization window is $N_p$ as the number of samples. For simplicity, the case of single-input and single-output systems is considered first, then the results are extended to multi-input and multi-output systems.

1.3.1 Prediction of State and Output Variables

Assuming that at the sampling instant $k_i$, $k_i > 0$, the state variable vector $x(k_i)$ is available through measurement, the state $x(k_i)$ provides the current plant information. The more general situation where the state is not directly measured will be discussed later. The future control trajectory is denoted by

$$\Delta u(k_i),\ \Delta u(k_i+1),\ \ldots,\ \Delta u(k_i+N_c-1),$$

where $N_c$ is called the control horizon dictating the number of parameters used to capture the future control trajectory. With given information $x(k_i)$, the future state variables are predicted for $N_p$ number of
samples, where $N_p$ is called the prediction horizon. $N_p$ is also the length of the optimization window. We denote the future state variables as

$$x(k_i+1 \mid k_i),\ x(k_i+2 \mid k_i),\ \ldots,\ x(k_i+m \mid k_i),\ \ldots,\ x(k_i+N_p \mid k_i),$$

where $x(k_i+m \mid k_i)$ is the predicted state variable at $k_i+m$ with given current plant information $x(k_i)$. The control horizon $N_c$ is chosen to be less than (or equal to) the prediction horizon $N_p$.

Based on the state-space model $(A, B, C)$, the future state variables are calculated sequentially using the set of future control parameters:

$$\begin{aligned}
x(k_i+1 \mid k_i) &= A x(k_i) + B \Delta u(k_i) \\
x(k_i+2 \mid k_i) &= A x(k_i+1 \mid k_i) + B \Delta u(k_i+1) \\
&= A^2 x(k_i) + A B \Delta u(k_i) + B \Delta u(k_i+1) \\
&\ \ \vdots \\
x(k_i+N_p \mid k_i) &= A^{N_p} x(k_i) + A^{N_p-1} B \Delta u(k_i) + A^{N_p-2} B \Delta u(k_i+1) + \ldots + A^{N_p-N_c} B \Delta u(k_i+N_c-1).
\end{aligned}$$

From the predicted state variables, the predicted output variables are, by substitution,

\begin{align}
y(k_i+1 \mid k_i) &= C A x(k_i) + C B \Delta u(k_i) \tag{1.10} \\
y(k_i+2 \mid k_i) &= C A^2 x(k_i) + C A B \Delta u(k_i) + C B \Delta u(k_i+1) \notag \\
y(k_i+3 \mid k_i) &= C A^3 x(k_i) + C A^2 B \Delta u(k_i) + C A B \Delta u(k_i+1) + C B \Delta u(k_i+2) \notag \\
&\ \ \vdots \notag \\
y(k_i+N_p \mid k_i) &= C A^{N_p} x(k_i) + C A^{N_p-1} B \Delta u(k_i) + C A^{N_p-2} B \Delta u(k_i+1) + \ldots + C A^{N_p-N_c} B \Delta u(k_i+N_c-1). \tag{1.11}
\end{align}

Note that all predicted variables are formulated in terms of current state variable information $x(k_i)$ and the future control movement $\Delta u(k_i+j)$, where $j = 0, 1, \ldots, N_c-1$.

Define vectors

$$Y = \begin{bmatrix} y(k_i+1 \mid k_i) & y(k_i+2 \mid k_i) & y(k_i+3 \mid k_i) & \ldots & y(k_i+N_p \mid k_i) \end{bmatrix}^T$$

$$\Delta U = \begin{bmatrix} \Delta u(k_i) & \Delta u(k_i+1) & \Delta u(k_i+2) & \ldots & \Delta u(k_i+N_c-1) \end{bmatrix}^T,$$

where in the single-input and single-output case, the dimension of $Y$ is $N_p$ and the dimension of $\Delta U$ is $N_c$. We collect (1.10) and (1.11) together in a compact matrix form as

$$Y = F x(k_i) + \Phi \Delta U, \tag{1.12}$$

where

$$F = \begin{bmatrix} CA \\ CA^2 \\ CA^3 \\ \vdots \\ CA^{N_p} \end{bmatrix}; \quad \Phi = \begin{bmatrix} CB & 0 & 0 & \ldots & 0 \\ CAB & CB & 0 & \ldots & 0 \\ CA^2B & CAB & CB & \ldots & 0 \\ \vdots & & & & \\ CA^{N_p-1}B & CA^{N_p-2}B & CA^{N_p-3}B & \ldots & CA^{N_p-N_c}B \end{bmatrix}.$$

1.3.2 Optimization

For a given set-point signal $r(k_i)$ at sample time $k_i$, within a prediction horizon the
objective of the predictive control system is to bring the predicted output as close as possible to the set-point signal, where we assume that the set-point signal remains constant in the optimization window. This objective is then translated into a design to find the 'best' control parameter vector $\Delta U$ such that an error function between the set-point and the predicted output is minimized.

Assuming that the data vector that contains the set-point information is

$$R_s^T = \overbrace{\begin{bmatrix} 1 & 1 & \ldots & 1 \end{bmatrix}}^{N_p}\, r(k_i),$$

we define the cost function $J$ that reflects the control objective as

$$J = (R_s - Y)^T (R_s - Y) + \Delta U^T \bar{R}\, \Delta U, \tag{1.13}$$

where the first term is linked to the objective of minimizing the errors between the predicted output and the set-point signal while the second term reflects the consideration given to the size of $\Delta U$ when the objective function $J$ is made to be as small as possible. $\bar{R}$ is a diagonal matrix of the form $\bar{R} = r_w I_{N_c \times N_c}$ ($r_w \ge 0$), where $r_w$ is used as a tuning parameter for the desired closed-loop performance. For the case that $r_w = 0$, the cost function (1.13) is interpreted as the situation where we would not want to pay any attention to how large the $\Delta U$ might be and our goal would be solely to make the error $(R_s - Y)^T (R_s - Y)$ as small as possible. For the case of large $r_w$, the cost function (1.13) is interpreted as the situation where we would carefully consider how large the $\Delta U$ might be and cautiously reduce the error $(R_s - Y)^T (R_s - Y)$.

To find the optimal $\Delta U$ that will minimize $J$, by using (1.12), $J$ is expressed as

$$J = (R_s - F x(k_i))^T (R_s - F x(k_i)) - 2 \Delta U^T \Phi^T (R_s - F x(k_i)) + \Delta U^T (\Phi^T \Phi + \bar{R}) \Delta U. \tag{1.14}$$

From the first derivative of the cost function $J$:

$$\frac{\partial J}{\partial \Delta U} = -2 \Phi^T (R_s - F x(k_i)) + 2 (\Phi^T \Phi + \bar{R}) \Delta U, \tag{1.15}$$

the necessary condition of the minimum $J$ is obtained as

$$\frac{\partial J}{\partial \Delta U} = 0,$$

from which we find the optimal solution for the control signal as

$$\Delta U = (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)), \tag{1.16}$$

with the assumption that $(\Phi^T \Phi + \bar{R})^{-1}$ exists. The matrix $\Phi^T \Phi + \bar{R}$ is, up to the factor of 2 visible in (1.15), the Hessian matrix of $J$ in the optimization literature. Note that $R_s$ is a data
vector that contains the set-point information expressed as

$$R_s = \overbrace{\begin{bmatrix} 1 & 1 & 1 & \ldots & 1 \end{bmatrix}^T}^{N_p}\, r(k_i) = \bar{R}_s\, r(k_i),$$

where

$$\bar{R}_s = \overbrace{\begin{bmatrix} 1 & 1 & 1 & \ldots & 1 \end{bmatrix}^T}^{N_p}.$$

The optimal solution of the control signal is linked to the set-point signal $r(k_i)$ and the state variable $x(k_i)$ via the following equation:

$$\Delta U = (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (\bar{R}_s\, r(k_i) - F x(k_i)). \tag{1.17}$$

Example 1.2. Suppose that a first-order system is described by the state equation:

$$\begin{aligned} x_m(k+1) &= a x_m(k) + b u(k) \\ y(k) &= x_m(k), \end{aligned} \tag{1.18}$$

where $a = 0.8$ and $b = 0.1$ are scalars. Find the augmented state-space model. Assuming a prediction horizon $N_p = 10$ and control horizon $N_c = 4$, calculate the components that form the prediction of future output $Y$, and the quantities $\Phi^T \Phi$, $\Phi^T F$ and $\Phi^T \bar{R}_s$. Assuming that at a time $k_i$ ($k_i = 10$ for this example), $r(k_i) = 1$ and the state vector $x(k_i) = [\,0.1\ \ 0.2\,]^T$, find the optimal solution $\Delta U$ with respect to the cases where $r_w = 0$ and $r_w = 10$, and compare the results.

Solution. The augmented state-space equation is

$$\begin{aligned}
\begin{bmatrix} \Delta x_m(k+1) \\ y(k+1) \end{bmatrix} &= \begin{bmatrix} a & 0 \\ a & 1 \end{bmatrix} \begin{bmatrix} \Delta x_m(k) \\ y(k) \end{bmatrix} + \begin{bmatrix} b \\ b \end{bmatrix} \Delta u(k) \\
y(k) &= \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} \Delta x_m(k) \\ y(k) \end{bmatrix}.
\end{aligned} \tag{1.19}$$

Based on (1.12), the $F$ and $\Phi$ matrices take the following forms:

$$F = \begin{bmatrix} CA \\ CA^2 \\ CA^3 \\ CA^4 \\ CA^5 \\ CA^6 \\ CA^7 \\ CA^8 \\ CA^9 \\ CA^{10} \end{bmatrix}; \quad \Phi = \begin{bmatrix}
CB & 0 & 0 & 0 \\
CAB & CB & 0 & 0 \\
CA^2B & CAB & CB & 0 \\
CA^3B & CA^2B & CAB & CB \\
CA^4B & CA^3B & CA^2B & CAB \\
CA^5B & CA^4B & CA^3B & CA^2B \\
CA^6B & CA^5B & CA^4B & CA^3B \\
CA^7B & CA^6B & CA^5B & CA^4B \\
CA^8B & CA^7B & CA^6B & CA^5B \\
CA^9B & CA^8B & CA^7B & CA^6B
\end{bmatrix}.$$

The coefficients in the $F$ and $\Phi$ matrices are calculated as follows:

$$\begin{aligned}
CA &= \begin{bmatrix} s_1 & 1 \end{bmatrix} \\
CA^2 &= \begin{bmatrix} s_2 & 1 \end{bmatrix} \\
CA^3 &= \begin{bmatrix} s_3 & 1 \end{bmatrix} \\
&\ \ \vdots \\
CA^k &= \begin{bmatrix} s_k & 1 \end{bmatrix},
\end{aligned} \tag{1.20}$$

where $s_1 = a$, $s_2 = a^2 + s_1$, $\ldots$, $s_k = a^k + s_{k-1}$, and

$$\begin{aligned}
CB &= g_0 = b \\
CAB &= g_1 = ab + g_0 \\
CA^2B &= g_2 = a^2 b + g_1 \\
&\ \ \vdots \\
CA^{k-1}B &= g_{k-1} = a^{k-1} b + g_{k-2} \\
CA^kB &= g_k = a^k b + g_{k-1}.
\end{aligned} \tag{1.21}$$

With the plant parameters $a = 0.8$ and $b = 0.1$, $N_p = 10$ and $N_c = 4$, we calculate the quantities

$$\Phi^T \Phi = \begin{bmatrix}
1.1541 & 1.0407 & 0.9116 & 0.7726 \\
1.0407 & 0.9549 & 0.8475 & 0.7259 \\
0.9116 & 0.8475 & 0.7675 & 0.6674 \\
0.7726 & 0.7259 & 0.6674 & 0.5943
\end{bmatrix}$$

$$\Phi^T F = \begin{bmatrix}
9.2325 & 3.2147 \\
8.3259 & 2.7684 \\
7.2927 & 2.3355 \\
6.1811 & 1.9194
\end{bmatrix}; \quad \Phi^T \bar{R}_s = \begin{bmatrix} 3.2147 \\ 2.7684 \\ 2.3355 \\ 1.9194 \end{bmatrix}.$$

Note that the vector $\Phi^T \bar{R}_s$ is identical to the last column in the matrix $\Phi^T F$. This is because the last column of the $F$ matrix is identical to $\bar{R}_s$.

At time $k_i = 10$, the state vector $x(k_i) = [\,0.1\ \ 0.2\,]^T$. In the first case, the error between predicted $Y$ and $R_s$ is reduced without any consideration to the magnitude of control changes, namely, $r_w = 0$. Then, the optimal $\Delta U$ is found through the calculation

$$\Delta U = (\Phi^T \Phi)^{-1} (\Phi^T R_s - \Phi^T F x(k_i)) = \begin{bmatrix} 7.2 & -6.4 & 0 & 0 \end{bmatrix}^T.$$

We note that without weighting on the incremental control, the last two elements $\Delta u(k_i+2) = 0$ and $\Delta u(k_i+3) = 0$, while the first two elements have a rather large magnitude. Figure 1.1a shows the changes of the state variables where we can see that the predicted output $y$ has reached the desired set-point 1 while the $\Delta x_m$ decays to zero.

Fig. 1.1. Comparison of optimal solutions. (a) State variables with no weight on $\Delta u$; (b) State variables with weight on $\Delta u$. Key: line (1) $\Delta x_m$; line (2) $y$.

To examine the effect of the weight $r_w$ on the optimal solution of the control, we let $r_w = 10$. The optimal solution of $\Delta U$ is given below, where $I$ is a $4 \times 4$ identity matrix,

$$\Delta U = (\Phi^T \Phi + 10 \times I)^{-1} (\Phi^T R_s - \Phi^T F x(k_i)) = \begin{bmatrix} 0.1269 & 0.1034 & 0.0829 & 0.065 \end{bmatrix}^T. \tag{1.22}$$

With this choice, the magnitude of the first two control increments is significantly reduced, and the last two components are no longer zero. Figure 1.1b shows the optimal state variables. It is seen that the output $y$ did not reach the set-point value of 1; however, the $\Delta x_m$ approaches zero.

An observation follows from the comparison study. It seems that if we want the control to move cautiously, then it takes
longer for the control signal to reach its steady state (i.e., the values in $\Delta U$ decrease more slowly), because the optimal control energy is distributed over the longer period of future time. We can verify this by increasing $N_c$ to 9, while maintaining $r_w = 10$. The result shows that the magnitude of the elements in $\Delta U$ is reducing, but they are significant for the first 8 elements:

$$\Delta U^T = \begin{bmatrix} 0.1227 & 0.0993 & 0.0790 & 0.0614 & 0.0463 & 0.0334 & 0.0227 & 0.0139 & 0.0072 \end{bmatrix}.$$

In comparison with the case where $N_c = 4$, we note that when $N_c = 9$, the first four parameters in $\Delta U$ are slightly different from the previous case.

Example 1.3. There is an alternative way to find the minimum of the cost function via completing the squares. This is an intuitive approach, and the minimum of the cost function is obtained as a by-product. Find the optimal solution for $\Delta U$ by completing the squares of the cost function $J$ (1.14).

Solution. From (1.14), by adding and subtracting the term

$$(R_s - F x(k_i))^T \Phi (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i))$$

to the original cost function $J$, its value remains unchanged. This leads to

$$\begin{aligned}
J = {} & (R_s - F x(k_i))^T (R_s - F x(k_i)) \\
& + \underbrace{\begin{aligned} & -2 \Delta U^T \Phi^T (R_s - F x(k_i)) + \Delta U^T (\Phi^T \Phi + \bar{R}) \Delta U \\ & + (R_s - F x(k_i))^T \Phi (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)) \end{aligned}}_{J_0} \\
& - (R_s - F x(k_i))^T \Phi (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)),
\end{aligned} \tag{1.23}$$

where the quantities under the brace are the completed 'squares':

$$J_0 = \left( \Delta U - (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)) \right)^T (\Phi^T \Phi + \bar{R}) \left( \Delta U - (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)) \right). \tag{1.24}$$

This can be easily verified by expanding the squares. Since the first and last terms in (1.23) are independent of the variable $\Delta U$ (sometimes called a decision variable), and $(\Phi^T \Phi + \bar{R})$ is assumed to be positive definite, the minimum of the cost function $J$ is achieved if the quantity $J_0$ equals zero, i.e.,

$$\Delta U = (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)). \tag{1.25}$$

This is the optimal control solution. By substituting this optimal solution into the cost function (1.23), we obtain the minimum of the cost as

$$J_{\min} = (R_s - F x(k_i))^T (R_s - F x(k_i)) - (R_s - F x(k_i))^T \Phi (\Phi^T \Phi + \bar{R})^{-1} \Phi^T (R_s - F x(k_i)).$$

1.3.3 MATLAB Tutorial: Computation of MPC Gains

Tutorial 1.2. The objective of this tutorial is to produce a MATLAB function for calculating $\Phi^T \Phi$, $\Phi^T F$, $\Phi^T \bar{R}_s$. The key here is to create the $F$ and $\Phi$ matrices. The $\Phi$ matrix is a Toeplitz matrix, which is created by defining its first column; each following column is obtained by shifting the previous column down by one row.

Step by Step

1. Create a new file called mpcgain.m.

2. The first step is to create the augmented model for MPC design. The input parameters to the function are the state-space model $(A_p, B_p, C_p)$, prediction horizon $N_p$ and control horizon $N_c$. Enter the following program into the file:

    function [Phi_Phi,Phi_F,Phi_R,A_e,B_e,C_e] = mpcgain(Ap,Bp,Cp,Nc,Np);
    [m1,n1] = size(Cp);
    [n1,n_in] = size(Bp);
    A_e = eye(n1+m1,n1+m1);
    A_e(1:n1,1:n1) = Ap;
    A_e(n1+1:n1+m1,1:n1) = Cp*Ap;
    B_e = zeros(n1+m1,n_in);
    B_e(1:n1,:) = Bp;
    B_e(n1+1:n1+m1,:) = Cp*Bp;
    C_e = zeros(m1,n1+m1);
    C_e(:,n1+1:n1+m1) = eye(m1,m1);

3. Note that the F and Phi matrices have special forms. By taking advantage of the special structure, we obtain the matrices.

4. Continue entering the program into the file:

    n = n1+m1;
    h(1,:) = C_e;
    F(1,:) = C_e*A_e;
    for kk=2:Np
        h(kk,:) = h(kk-1,:)*A_e;
        F(kk,:) = F(kk-1,:)*A_e;
    end
    v = h*B_e;
    Phi = zeros(Np,Nc);    %declare the dimension of Phi
    Phi(:,1) = v;          %first column of Phi
    for i=2:Nc
        Phi(:,i) = [zeros(i-1,1); v(1:Np-i+1,1)];    %Toeplitz matrix
    end
    BarRs = ones(Np,1);
    Phi_Phi = Phi'*Phi;
    Phi_F = Phi'*F;
    Phi_R = Phi'*BarRs;

5. In the MATLAB workspace, set Ap=0.8, Bp=0.1, Cp=1, Nc=4 and Np=10. Run this MATLAB function by typing

    [Phi_Phi,Phi_F,Phi_R,A_e,B_e,C_e] = mpcgain(Ap,Bp,Cp,Nc,Np);

6. Compare the results with the answers from Example 1.2. If they are identical to what was presented there, then your program is correct.

7. Vary the prediction horizon and control horizon, and observe the changes in these matrices.

8. Calculate $\Delta U$ by assuming the information of initial condition on $x$ and $r$. The inverse of matrix M is calculated in MATLAB as
inv(M).

9. Validate the results in Example 1.2.

1.4 Receding Horizon Control

Although the optimal parameter vector $\Delta U$ contains the controls $\Delta u(k_i)$, $\Delta u(k_i+1)$, $\Delta u(k_i+2)$, $\ldots$, $\Delta u(k_i+N_c-1)$, with the receding horizon control principle, we only implement the first sample of this sequence, i.e., $\Delta u(k_i)$, while ignoring the rest of the sequence. When the next sample period arrives, the more recent measurement is taken to form the state vector $x(k_i+1)$ for calculation of the new sequence of control signal. This procedure is repeated in real time to give the receding horizon control law.

Example 1.4. We illustrate this procedure by continuing Example 1.2, where a first-order system with the state-space description

$$x_m(k+1) = 0.8\, x_m(k) + 0.1\, u(k)$$

is used in the computation. We will consider the case $r_w = 0$. The initial conditions are $x(10) = [\,0.1\ \ 0.2\,]^T$ and $u(9) = 0$.

Solution. At sample time $k_i = 10$, the optimal control was previously computed as $\Delta u(10) = 7.2$. Assuming that $u(9) = 0$, then the control signal to the plant is $u(10) = u(9) + \Delta u(10) = 7.2$ and with $x_m(10) = y(10) = 0.2$, we calculate the next simulated plant state variable

$$x_m(11) = 0.8\, x_m(10) + 0.1\, u(10) = 0.88. \tag{1.26}$$

At $k_i = 11$, the new plant information is $\Delta x_m(11) = 0.88 - 0.2 = 0.68$ and $y(11) = 0.88$, which forms $x(11) = [\,0.68\ \ 0.88\,]^T$. Then we obtain

$$\Delta U = (\Phi^T \Phi)^{-1} (\Phi^T R_s - \Phi^T F x(11)) = \begin{bmatrix} -4.24 & -0.96 & 0.0000 & 0.0000 \end{bmatrix}^T.$$

This leads to the optimal control $u(11) = u(10) + \Delta u(11) = 2.96$. This new control is implemented to obtain

$$x_m(12) = 0.8\, x_m(11) + 0.1\, u(11) = 1. \tag{1.27}$$

At $k_i = 12$, the new plant information is $\Delta x_m(12) = 1 - 0.88 = 0.12$ and $y(12) = 1$, which forms $x(12) = [\,0.12\ \ 1\,]^T$. We obtain

$$\Delta U = (\Phi^T \Phi)^{-1} (\Phi^T R_s - \Phi^T F x(12)) = \begin{bmatrix} -0.96 & 0.000 & 0.0000 & 0.0000 \end{bmatrix}^T.$$

This leads to the control at $k_i = 12$ as $u(12) = u(11) - 0.96 = 2$. By implementing this control, we obtain the next plant output as

$$x_m(13) = a\, x_m(12) + b\, u(12) = 1. \tag{1.28}$$

The new plant information is $\Delta x_m(13) = 1 - 1 = 0$ and $y(13) = 1$. From this information, we obtain $\Delta U = 0$: with $x(13) = [\,0\ \ 1\,]^T$, every row of $F x(13)$ equals 1, so the error vector $R_s - F x(13)$ is zero and the control stays at its current value.
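The calculations of Examples 1.2 and 1.4 can be sketched in a few lines of plain Python. This is a minimal sketch, not the book's MATLAB program: the function names (`mat_mul`, `solve`, `prediction_matrices`, `optimal_dU`) are hypothetical, only the single-input, single-output case is handled, and the linear system in (1.17) is solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def prediction_matrices(A, B, C, Np, Nc):
    """Build F (rows C A^k, k = 1..Np) and the Toeplitz matrix Phi of (1.12)."""
    h, F = [], []
    row = C[0]                       # running row vector C A^(k-1)
    for _ in range(Np):
        h.append(row)
        row = mat_mul([row], A)[0]   # advance to C A^k
        F.append(row)
    v = [sum(hk[j] * B[j][0] for j in range(len(B))) for hk in h]   # C A^(k-1) B
    Phi = [[v[i - j] if i >= j else 0.0 for j in range(Nc)] for i in range(Np)]
    return F, Phi


def optimal_dU(F, Phi, x, r, rw):
    """Evaluate (1.17): solve (Phi'Phi + rw I) dU = Phi' (Rs r - F x)."""
    Np, Nc = len(Phi), len(Phi[0])
    err = [r - sum(f * xi for f, xi in zip(F[i], x)) for i in range(Np)]
    H = [[sum(Phi[k][i] * Phi[k][j] for k in range(Np)) + (rw if i == j else 0.0)
          for j in range(Nc)] for i in range(Nc)]
    g = [sum(Phi[k][i] * err[k] for k in range(Np)) for i in range(Nc)]
    return solve(H, g)


# Augmented model (1.19) for the plant of Example 1.2.
a, b = 0.8, 0.1
A = [[a, 0.0], [a, 1.0]]
B = [[b], [b]]
C = [[0.0, 1.0]]
F, Phi = prediction_matrices(A, B, C, Np=10, Nc=4)

dU_free = optimal_dU(F, Phi, [0.1, 0.2], 1.0, rw=0.0)       # no weight on dU
dU_weighted = optimal_dU(F, Phi, [0.1, 0.2], 1.0, rw=10.0)  # r_w = 10

# Receding horizon (Example 1.4): apply only the first increment, then re-measure.
xm, u, x = 0.2, 0.0, [0.1, 0.2]
history = []
for k in range(10, 13):
    u += optimal_dU(F, Phi, x, 1.0, rw=0.0)[0]
    xm_next = a * xm + b * u
    x = [xm_next - xm, xm_next]      # new augmented state [dx_m, y]
    xm = xm_next
    history.append((k, u, xm))
```

Under these assumptions, `dU_free` should reproduce the increments 7.2 and -6.4 of Example 1.2, and `history` should trace the control values 7.2, 2.96 and 2 of Example 1.4. Solving the linear system directly, instead of inverting the matrix, is a common choice when the matrix is poorly conditioned.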
Mock AI English Interview Questions and Answers

1. Question: What is the difference between a neural network and a deep learning model?
Answer: A neural network is a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. A deep learning model is a neural network with multiple layers, allowing it to learn more complex patterns and features from data.

2. Question: Explain the concept of 'overfitting' in machine learning.
Answer: Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, resulting in poor generalization to new, unseen data.

3. Question: What is the role of a 'bias' in an AI model?
Answer: Bias in an AI model refers to the systematic errors introduced by the model during the learning process. It can be due to the choice of model, the training data, or the algorithm's assumptions, and it can lead to unfair or inaccurate predictions.

4. Question: Describe the importance of data preprocessing in AI.
Answer: Data preprocessing is crucial in AI as it involves cleaning, transforming, and reducing the data to a suitable format for the model to learn effectively. Proper preprocessing can significantly improve the performance of AI models by ensuring that the input data is relevant, accurate, and free from noise.

5. Question: How does reinforcement learning differ from supervised learning?
Answer: Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize a reward signal. It differs from supervised learning, where the model learns from labeled data to predict outcomes based on input features.

6. Question: What is the purpose of a 'convolutional neural network' (CNN)?
Answer: A convolutional neural network (CNN) is a type of deep learning model that is particularly effective for processing data with a grid-like topology, such as images. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input images.

7. Question: Explain the concept of 'feature extraction' in AI.
Answer: Feature extraction in AI is the process of identifying and extracting relevant pieces of information from the raw data. It is a crucial step in many machine learning algorithms, as it helps to reduce the dimensionality of the data and to focus on the most informative aspects that can be used to make predictions or classifications.

8. Question: What is the significance of 'gradient descent' in training AI models?
Answer: Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In the context of AI, it is used to minimize the loss function of a model, thus refining the model's parameters to improve its accuracy.

9. Question: How does 'transfer learning' work in AI?
Answer: Transfer learning is a technique where a pre-trained model is used as the starting point for learning a new task. It leverages the knowledge gained from one problem to improve performance on a different but related problem, reducing the need for large amounts of labeled data and computational resources.

10. Question: What is the role of 'regularization' in preventing overfitting?
Answer: Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which discourages overly complex models. It helps to control the model's capacity, forcing it to generalize better to new data by not fitting too closely to the training data.
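The answers on gradient descent and regularization can be illustrated together with a small sketch. It fits a single slope parameter by gradient descent on a mean-squared-error loss with an optional L2 penalty; the function name `fit_slope`, the toy data, the learning rate and the penalty weight are all assumptions chosen for illustration.

```python
def fit_slope(xs, ys, lam=0.0, lr=0.01, steps=2000):
    """Fit y ~ w * x by gradient descent on mean((w*x - y)^2) + lam * w^2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the data term plus gradient of the L2 penalty.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad               # step against the gradient
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]            # generated by y = 2x

w_plain = fit_slope(xs, ys)          # no penalty: approaches 2
w_reg = fit_slope(xs, ys, lam=1.0)   # penalty shrinks the slope toward 0
```

For this toy loss the minimizer has the closed form 15 / (7.5 + lam), so the penalized fit settles near 1.76 instead of 2: the penalty trades some fit to the training data for a smaller parameter, which is the mechanism answer 10 describes.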
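Research on Visual Inspection Algorithms for Defects in Textured Objects (Outstanding Graduation Thesis)

Abstract

In fiercely competitive automated industrial production, machine vision plays a pivotal role in product quality control, and its use in defect detection is becoming increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient, and safer. Textured objects are widespread in industrial production: substrates and light-emitting diodes used in semiconductor assembly and packaging, printed circuit boards in modern electronic systems, and cloth and fabrics in the textile industry can all be regarded as objects with texture features. This thesis is devoted to defect detection for textured objects, with the aim of providing efficient and reliable algorithms for their automated inspection.

Texture is an important feature for describing image content, and texture analysis has been applied successfully to texture segmentation and texture classification. This work proposes a defect detection algorithm based on texture analysis and reference comparison. The algorithm tolerates image registration errors caused by object distortion and is robust to the influence of texture. It aims to provide rich and physically meaningful information about the detected defect regions, such as their size, shape, brightness contrast, and spatial distribution. In addition, when a reference image is available, the algorithm can be used to inspect both homogeneous and non-homogeneous textured objects, and it also achieves good results on non-textured objects.

Throughout the detection process, we use steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we add a tolerance control algorithm in the wavelet domain that handles object distortion and texture influence, which makes the method tolerant of object distortion and robust to texture. Finally, the steerable-pyramid reconstruction ensures that the physical meaning of the defect regions is recovered accurately. In the experiments, we tested a series of images of practical value. The results show that the proposed defect detection algorithm for textured objects is efficient and easy to implement.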
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
ARTIFICIAL NEURAL NETWORKS
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as interpreting visual scenes, speech recognition, and learning robot control strategies.

4.1 INTRODUCTION

Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions. For certain types of problems, such as learning to interpret complex real-world sensor data, artificial neural networks are among the most effective learning methods currently known. For example, the BACKPROPAGATION algorithm described in this chapter has proven surprisingly successful in many practical problems such as learning to recognize handwritten characters (LeCun et al. 1989), learning to recognize spoken words (Lang et al. 1990), and learning to recognize faces (Cottrell 1990). One survey of practical applications is provided by Rumelhart et al. (1994).

4.1.1 Biological Motivation

The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological learning systems are built of very complex webs of interconnected neurons. In rough analogy, artificial neural networks are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs (possibly the outputs of other units) and produces a single real-valued output (which may become the input to many other units).

To develop a feel for this analogy, let us consider a few facts from neurobiology. The human brain, for example, is estimated to contain a densely interconnected network of approximately $10^{11}$ neurons, each connected, on average, to $10^4$ others.
Neuron activity is typically excited or inhibited through connections to other neurons. The fastest neuron switching times are known to be on the order of $10^{-3}$ seconds, quite slow compared to computer switching speeds of $10^{-10}$ seconds. Yet humans are able to make surprisingly complex decisions, surprisingly quickly. For example, it requires approximately $10^{-1}$ seconds to visually recognize your mother. Notice the sequence of neuron firings that can take place during this $10^{-1}$-second interval cannot possibly be longer than a few hundred steps, given the switching speed of single neurons. This observation has led many to speculate that the information-processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANN systems is to capture this kind of highly parallel computation based on distributed representations. Most ANN software runs on sequential machines emulating distributed processes, although faster versions of the algorithms have also been implemented on highly parallel machines and on specialized hardware designed specifically for ANN applications.

While ANNs are loosely motivated by biological neural systems, there are many complexities to biological neural systems that are not modeled by ANNs, and many features of the ANNs we discuss here are known to be inconsistent with biological systems. For example, we consider here ANNs whose individual units output a single constant value, whereas biological neurons output a complex time series of spikes.

Historically, two groups of researchers have worked with artificial neural networks. One group has been motivated by the goal of using ANNs to study and model biological learning processes. A second group has been motivated by the goal of obtaining highly effective machine learning algorithms, independent of whether these algorithms mirror biological processes.
Within this book our interest fits the latter group, and therefore we will not dwell further on biological modeling. For more information on attempts to model biological systems using ANNs, see, for example, Churchland and Sejnowski (1992); Zornetzer et al. (1994); Gabriel and Moore (1990).

4.2 NEURAL NETWORK REPRESENTATIONS

A prototypical example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways. The input to the neural network is a 30 × 32 grid of pixel intensities obtained from a forward-pointed camera mounted on the vehicle. The network output is the direction in which the vehicle is steered. The ANN is trained to mimic the observed steering commands of a human driving the vehicle for approximately 5 minutes. ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per hour and for distances of 90 miles on public highways (driving in the left lane of a divided public highway, with other vehicles present).

Figure 4.1 illustrates the neural network representation used in one version of the ALVINN system, and illustrates the kind of representation typical of many ANN systems. The network is shown on the left side of the figure, with the input camera image depicted below it. Each node (i.e., circle) in the network diagram corresponds to the output of a single network unit, and the lines entering the node from below are its inputs. As can be seen, there are four units that receive inputs directly from all of the 30 × 32 pixels in the image. These are called "hidden" units because their output is available only within the network and is not available as part of the global network output. Each of these four hidden units computes a single real-valued output based on a weighted combination of its 960 inputs. These hidden unit outputs are then used as inputs to a second layer of 30 "output" units.
Each output unit corresponds to a particular steering direction, and the output values of these units determine which steering direction is recommended most strongly.

The diagrams on the right side of the figure depict the learned weight values associated with one of the four hidden units in this ANN. The large matrix of black and white boxes on the lower right depicts the weights from the 30 × 32 pixel inputs into the hidden unit. Here, a white box indicates a positive weight, a black box a negative weight, and the size of the box indicates the weight magnitude. The smaller rectangular diagram directly above the large matrix shows the weights from this hidden unit to each of the 30 output units.

The network structure of ALVINN is typical of many ANNs. Here the individual units are interconnected in layers that form a directed acyclic graph. In general, ANNs can be graphs with many types of structures: acyclic or cyclic, directed or undirected. This chapter will focus on the most common and practical ANN approaches, which are based on the BACKPROPAGATION algorithm. The BACKPROPAGATION algorithm assumes the network is a fixed structure that corresponds to a directed graph, possibly containing cycles. Learning corresponds to choosing a weight value for each edge in the graph. Although certain types of cycles are allowed, the vast majority of practical applications involve acyclic feed-forward networks, similar to the network structure used by ALVINN.

FIGURE 4.1 Neural network learning to steer an autonomous vehicle. The ALVINN system uses BACKPROPAGATION to learn to steer an autonomous vehicle (photo at top) driving at speeds up to 70 miles per hour. The diagram on the left shows how the image of a forward-mounted camera is mapped to 960 neural network inputs, which are fed forward to 4 hidden units, connected to 30 output units. Network outputs encode the commanded steering direction. The figure on the right shows weight values for one of the hidden units in this network. The 30 × 32 weights into the hidden unit are displayed in the large matrix, with white blocks indicating positive and black indicating negative weights. The weights from this hidden unit to the 30 output units are depicted by the smaller rectangular block directly above the large block. As can be seen from these output weights, activation of this particular hidden unit encourages a turn toward the left.

4.3 APPROPRIATE PROBLEMS FOR NEURAL NETWORK LEARNING

ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data, such as inputs from cameras and microphones. It is also applicable to problems for which more symbolic representations are often used, such as the decision tree learning tasks discussed in Chapter 3. In these cases ANN and decision tree learning often produce results of comparable accuracy. See Shavlik et al. (1991) and Weiss and Kapouleas (1989) for experimental comparisons of decision tree and ANN learning. The BACKPROPAGATION algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics:

● Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example. These input attributes may be highly correlated or independent of one another. Input values can be any real values.

● The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. For example, in the ALVINN system the output is a vector of 30 attributes, each corresponding to a recommendation regarding the steering direction. The value of each output is some real number between 0 and 1, which in this case corresponds to the confidence in predicting the corresponding steering direction. We can also train a single network to output both the steering command and suggested acceleration, simply by concatenating the vectors that encode these two output predictions.

● The training examples may contain errors. ANN learning methods are quite robust to noise in the training data.

● Long training times are acceptable. Network training algorithms typically require longer training times than, say, decision tree learning algorithms. Training times can range from a few seconds to many hours, depending on factors such as the number of weights in the network, the number of training examples considered, and the settings of various learning algorithm parameters.

● Fast evaluation of the learned target function may be required. Although ANN learning times are relatively long, evaluating the learned network, in order to apply it to a subsequent instance, is typically very fast. For example, ALVINN applies its neural network several times per second to continually update its steering command as the vehicle drives forward.

● The ability of humans to understand the learned target function is not important. The weights learned by neural networks are often difficult for humans to interpret. Learned neural networks are less easily communicated to humans than learned rules.

The rest of this chapter is organized as follows: We first consider several alternative designs for the primitive units that make up artificial neural networks (perceptrons, linear units, and sigmoid units), along with learning algorithms for training single units. We then present the BACKPROPAGATION algorithm for training multilayer networks of such units and consider several general issues such as the representational capabilities of ANNs, nature of the hypothesis space search, overfitting problems, and alternatives to the BACKPROPAGATION algorithm.
A detailed example is also presented applying BACKPROPAGATION to face recognition, and directions are provided for the reader to obtain the data and code to experiment further with this application.

4.4 PERCEPTRONS

One type of ANN system is based on a unit called a perceptron, illustrated in Figure 4.2. A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs $x_1$ through $x_n$, the output $o(x_1, \ldots, x_n)$ computed by the perceptron is

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$

where each $w_i$ is a real-valued constant, or weight, that determines the contribution of input $x_i$ to the perceptron output. Notice the quantity $(-w_0)$ is a threshold that the weighted combination of inputs $w_1 x_1 + \cdots + w_n x_n$ must surpass in order for the perceptron to output a 1.

To simplify notation, we imagine an additional constant input $x_0 = 1$, allowing us to write the above inequality as $\sum_{i=0}^{n} w_i x_i > 0$, or in vector form as $\vec{w} \cdot \vec{x} > 0$. For brevity, we will sometimes write the perceptron function as

$$o(\vec{x}) = \mathrm{sgn}(\vec{w} \cdot \vec{x})$$

where

$$\mathrm{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{otherwise.} \end{cases}$$

Learning a perceptron involves choosing values for the weights $w_0, \ldots, w_n$. Therefore, the space $H$ of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.

4.4.1 Representational Power of Perceptrons

We can view the perceptron as representing a hyperplane decision surface in the $n$-dimensional space of instances (i.e., points). The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side, as illustrated in Figure 4.3. The equation for this decision hyperplane is $\vec{w} \cdot \vec{x} = 0$. Of course, some sets of positive and negative examples cannot be separated by any hyperplane.
Those that can be separated are called linearly separable sets of examples.

FIGURE 4.2
A perceptron.

A single perceptron can be used to represent many boolean functions. For example, if we assume boolean values of 1 (true) and -1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights $w_0 = -0.8$, and $w_1 = w_2 = 0.5$. This perceptron can be made to represent the OR function instead by altering the threshold to $w_0 = 0.3$. In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. The OR function corresponds to m = 1 and the AND function to m = n. Any m-of-n function is easily represented using a perceptron by setting all input weights to the same value (e.g., 0.5) and then setting the threshold $w_0$ accordingly.

Perceptrons can represent all of the primitive boolean functions AND, OR, NAND (¬AND), and NOR (¬OR). Unfortunately, however, some boolean functions cannot be represented by a single perceptron, such as the XOR function whose value is 1 if and only if $x_1 \neq x_2$. Note the set of linearly nonseparable training examples shown in Figure 4.3(b) corresponds to this XOR function.

The ability of perceptrons to represent AND, OR, NAND, and NOR is important because every boolean function can be represented by some network of interconnected units based on these primitives. In fact, every boolean function can be represented by some network of perceptrons only two levels deep, in which the inputs are fed to multiple units, and the outputs of these units are then input to a second, final stage. One way is to represent the boolean function in disjunctive normal form (i.e., as the disjunction (OR) of a set of conjunctions (ANDs) of the inputs and their negations). Note that the input to an AND perceptron can be negated simply by changing the sign of the corresponding input weight.

FIGURE 4.3
The decision surface represented by a two-input perceptron. (a) A set of training examples and the decision surface of a perceptron that classifies them correctly. (b) A set of training examples that is not linearly separable (i.e., that cannot be correctly classified by any straight line). $x_1$ and $x_2$ are the perceptron inputs. Positive examples are indicated by "+", negative by "-".

Because networks of threshold units can represent a rich variety of functions and because single units alone cannot, we will generally be interested in learning multilayer networks of threshold units.

4.4.2 The Perceptron Training Rule

Although we are interested in learning networks of many interconnected units, let us begin by understanding how to learn the weights for a single perceptron. Here the precise learning problem is to determine a weight vector that causes the perceptron to produce the correct ±1 output for each of the given training examples.

Several algorithms are known to solve this learning problem. Here we consider two: the perceptron rule and the delta rule (a variant of the LMS rule used in Chapter 1 for learning evaluation functions). These two algorithms are guaranteed to converge to somewhat different acceptable hypotheses, under somewhat different conditions. They are important to ANNs because they provide the basis for learning networks of many units.

One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly.
Weights are modified at each step according to the perceptron training rule, which revises the weight $w_i$ associated with input $x_i$ according to the rule

$$w_i \leftarrow w_i + \Delta w_i$$

where

$$\Delta w_i = \eta (t - o) x_i$$

Here $t$ is the target output for the current training example, $o$ is the output generated by the perceptron, and $\eta$ is a positive constant called the learning rate. The role of the learning rate is to moderate the degree to which weights are changed at each step. It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.

Why should this update rule converge toward successful weight values? To get an intuitive feel, consider some specific cases. Suppose the training example is correctly classified already by the perceptron. In this case, $(t - o)$ is zero, making $\Delta w_i$ zero, so that no weights are updated. Suppose the perceptron outputs a -1, when the target output is +1. To make the perceptron output a +1 instead of -1 in this case, the weights must be altered to increase the value of $\vec{w} \cdot \vec{x}$. For example, if $x_i > 0$, then increasing $w_i$ will bring the perceptron closer to correctly classifying this example. Notice the training rule will increase $w_i$ in this case, because $(t - o)$, $\eta$, and $x_i$ are all positive. For example, if $x_i = 0.8$, $\eta = 0.1$, $t = 1$, and $o = -1$, then the weight update will be $\Delta w_i = \eta (t - o) x_i = 0.1 (1 - (-1)) 0.8 = 0.16$. On the other hand, if $t = -1$ and $o = 1$, then weights associated with positive $x_i$ will be decreased rather than increased.

In fact, the above learning procedure can be proven to converge within a finite number of applications of the perceptron training rule to a weight vector that correctly classifies all training examples, provided the training examples are linearly separable and provided a sufficiently small $\eta$ is used (see Minsky and Papert 1969).
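A minimal sketch of this training procedure follows. It makes a few assumptions that are not in the text: weights start at zero rather than at random values (to keep the run reproducible), a hypothetical `max_epochs` cap guards against non-separable data, and the training set is the AND function with true = 1 and false = -1.

```python
def perceptron_output(weights, x):
    """x includes the constant input x0 = 1 as its first component."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1


def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * x_i until no example is misclassified.

    Weights start at zero (an assumption made for reproducibility; the text
    starts from random weights). Returns (weights, converged).
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            o = perceptron_output(weights, x)
            if o != t:  # (t - o) is zero for correctly classified examples
                mistakes += 1
                weights = [w + eta * (t - o) * xi for w, xi in zip(weights, x)]
        if mistakes == 0:
            return weights, True
    return weights, False


# AND with true = 1, false = -1; each input vector is (x0 = 1, x1, x2).
AND_EXAMPLES = [((1, 1, 1), 1), ((1, 1, -1), -1),
                ((1, -1, 1), -1), ((1, -1, -1), -1)]

if __name__ == "__main__":
    print(train_perceptron(AND_EXAMPLES))
```

Because the AND examples are linearly separable, the loop stops after a finite number of updates; on XOR data it would instead run until `max_epochs` and report `converged = False`.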
If the data are not linearly separable, convergence is not assured.

4.4.3 Gradient Descent and the Delta Rule

Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable. A second training rule, called the delta rule, is designed to overcome this difficulty. If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to the target concept.

The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training examples. This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units. It is also important because gradient descent can serve as the basis for learning algorithms that must search through hypothesis spaces containing many different types of continuously parameterized hypotheses.

The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output $o$ is given by

$$o(\vec{x}) = \vec{w} \cdot \vec{x}$$

Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.

In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector), relative to the training examples. Although there are many ways to define this error, one common measure that will turn out to be especially convenient is

$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \qquad (4.2)$$

where $D$ is the set of training examples, $t_d$ is the target output for training example $d$, and $o_d$ is the output of the linear unit for training example $d$. By this definition, $E(\vec{w})$ is simply half the squared difference between the target output $t_d$ and the linear unit output $o_d$, summed over all training examples.
Here we characterize $E$ as a function of $\vec{w}$, because the linear unit output $o$ depends on this weight vector. Of course $E$ also depends on the particular set of training examples, but we assume these are fixed during training, so we do not bother to write $E$ as an explicit function of these. Chapter 6 provides a Bayesian justification for choosing this particular definition of $E$. In particular, there we show that under certain conditions the hypothesis that minimizes $E$ is also the most probable hypothesis in $H$ given the training data.

4.4.3.1 VISUALIZING THE HYPOTHESIS SPACE

To understand the gradient descent algorithm, it is helpful to visualize the entire hypothesis space of possible weight vectors and their associated $E$ values, as illustrated in Figure 4.4. Here the axes $w_0$ and $w_1$ represent possible values for the two weights of a simple linear unit. The $w_0, w_1$ plane therefore represents the entire hypothesis space. The vertical axis indicates the error $E$ relative to some fixed set of training examples. The error surface shown in the figure thus summarizes the desirability of every weight vector in the hypothesis space (we desire a hypothesis with minimum error). Given the way in which we chose to define $E$, for linear units this error surface must always be parabolic with a single global minimum. The specific parabola will depend, of course, on the particular set of training examples.

FIGURE 4.4
Error of different hypotheses. For a linear unit with two weights, the hypothesis space $H$ is the $w_0, w_1$ plane. The vertical axis indicates the error of the corresponding weight vector hypothesis, relative to a fixed set of training examples. The arrow shows the negated gradient at one particular point, indicating the direction in the $w_0, w_1$ plane producing steepest descent along the error surface.

Gradient descent search determines a weight vector that minimizes $E$ by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.
At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface depicted in Figure 4.4. This process continues until the global minimum error is reached.

4.4.3.2 DERIVATION OF THE GRADIENT DESCENT RULE

How can we calculate the direction of steepest descent along the error surface? This direction can be found by computing the derivative of $E$ with respect to each component of the vector $\vec{w}$. This vector derivative is called the gradient of $E$ with respect to $\vec{w}$, written $\nabla E(\vec{w})$.

$$\nabla E(\vec{w}) \equiv \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right] \qquad (4.3)$$

Notice $\nabla E(\vec{w})$ is itself a vector, whose components are the partial derivatives of $E$ with respect to each of the $w_i$. When interpreted as a vector in weight space, the gradient specifies the direction that produces the steepest increase in $E$. The negative of this vector therefore gives the direction of steepest decrease. For example, the arrow in Figure 4.4 shows the negated gradient $-\nabla E(\vec{w})$ for a particular point in the $w_0, w_1$ plane.

Since the gradient specifies the direction of steepest increase of $E$, the training rule for gradient descent is

$$\vec{w} \leftarrow \vec{w} + \Delta \vec{w}$$

where

$$\Delta \vec{w} = -\eta \nabla E(\vec{w}) \qquad (4.4)$$

Here $\eta$ is a positive constant called the learning rate, which determines the step size in the gradient descent search. The negative sign is present because we want to move the weight vector in the direction that decreases $E$. This training rule can also be written in its component form

$$w_i \leftarrow w_i + \Delta w_i$$

where

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} \qquad (4.5)$$

which makes it clear that steepest descent is achieved by altering each component $w_i$ of $\vec{w}$ in proportion to $\partial E / \partial w_i$.

To construct a practical algorithm for iteratively updating weights according to Equation (4.4), we need an efficient way of calculating the gradient at each step. Fortunately, this is not difficult. The vector of derivatives that form the gradient can be obtained by differentiating $E$ from Equation (4.2), as

$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\
&= \sum_{d \in D} (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d) \\
\frac{\partial E}{\partial w_i} &= \sum_{d \in D} (t_d - o_d)(-x_{id}) \qquad (4.6)
\end{aligned}$$

where $x_{id}$ denotes the single input component $x_i$ for training example $d$.
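As a sanity check on Equation (4.6), the sketch below evaluates $E$ and its gradient for a linear unit and compares the analytic gradient with a central finite difference. The function names, the three training examples, and the step size `h` are all assumptions made for illustration.

```python
def linear_output(w, x):
    """Linear unit o = w . x; x includes the constant input x0 = 1."""
    return sum(wi * xi for wi, xi in zip(w, x))


def error(w, examples):
    """E(w) = 1/2 * sum over examples of (t - o)^2, as in Equation (4.2)."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


def gradient(w, examples):
    """dE/dw_i = sum over examples of (t - o) * (-x_i), as in Equation (4.6)."""
    grad = [0.0] * len(w)
    for x, t in examples:
        o = linear_output(w, x)
        for i, xi in enumerate(x):
            grad[i] += (t - o) * (-xi)
    return grad


def numerical_gradient(w, examples, h=1e-5):
    """Central finite difference, used only to check the analytic gradient."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += h
        down[i] -= h
        grad.append((error(up, examples) - error(down, examples)) / (2 * h))
    return grad


# Hypothetical data: each example is ((x0 = 1, x1), t).
EXAMPLES = [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0), ((1.0, 2.0), 2.0)]

if __name__ == "__main__":
    w = [0.5, -0.25]
    print(error(w, EXAMPLES), gradient(w, EXAMPLES), numerical_gradient(w, EXAMPLES))
```

Both gradient components are negative at this point, so the update $\Delta w_i = -\eta \, \partial E / \partial w_i$ would increase both weights.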
We now have an equation that gives $\partial E / \partial w_i$ in terms of the linear unit inputs $x_{id}$, outputs $o_d$, and target values $t_d$ associated with the training examples. Substituting Equation (4.6) into Equation (4.5) yields the weight update rule for gradient descent

$$\Delta w_i = \eta \sum_{d \in D} (t_d - o_d) x_{id} \qquad (4.7)$$

To summarize, the gradient descent algorithm for training linear units is as follows: Pick an initial random weight vector. Apply the linear unit to all training examples, then compute $\Delta w_i$ for each weight according to Equation (4.7). Update each weight $w_i$ by adding $\Delta w_i$, then repeat this process. This algorithm is given in Table 4.1. Because the error surface contains only a single global minimum, this algorithm will converge to a weight vector with minimum error, regardless of whether the training examples are linearly separable, provided a sufficiently small learning rate $\eta$ is used. If $\eta$ is too large, the gradient descent search runs the risk of overstepping the minimum in the error surface rather than settling into it. For this reason, one common modification to the algorithm is to gradually reduce the value of $\eta$ as the number of gradient descent steps grows.

4.4.3.3 STOCHASTIC APPROXIMATION TO GRADIENT DESCENT

Gradient descent is an important general paradigm for learning. It is a strategy for searching through a large or infinite hypothesis space that can be applied whenever (1) the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a linear unit), and (2) the error can be differentiated with respect to these hypothesis parameters.
The key practical difficulties in applying gradient descent are (1) converging to a local minimum can sometimes be quite slow (i.e., it can require many thousands of gradient descent steps), and (2) if there are multiple local minima in the error surface, then there is no guarantee that the procedure will find the global minimum.

GRADIENT-DESCENT(training_examples, $\eta$)
Each training example is a pair of the form $\langle \vec{x}, t \rangle$, where $\vec{x}$ is the vector of input values, and $t$ is the target output value. $\eta$ is the learning rate (e.g., .05).
● Initialize each $w_i$ to some small random value
● Until the termination condition is met, Do
    ● Initialize each $\Delta w_i$ to zero.
    ● For each $\langle \vec{x}, t \rangle$ in training_examples, Do
        ● Input the instance $\vec{x}$ to the unit and compute the output $o$
        ● For each linear unit weight $w_i$, Do
            $\Delta w_i \leftarrow \Delta w_i + \eta (t - o) x_i$    (T4.1)
    ● For each linear unit weight $w_i$, Do
        $w_i \leftarrow w_i + \Delta w_i$    (T4.2)

TABLE 4.1
GRADIENT DESCENT algorithm for training a linear unit. To implement the stochastic approximation to gradient descent, Equation (T4.2) is deleted, and Equation (T4.1) replaced by $w_i \leftarrow w_i + \eta (t - o) x_i$.

One common variation on gradient descent intended to alleviate these difficulties is called incremental gradient descent, or alternatively stochastic gradient descent. Whereas the gradient descent training rule presented in Equation (4.7) computes weight updates after summing over all the training examples in $D$, the idea behind stochastic gradient descent is to approximate this gradient descent search by updating weights incrementally, following the calculation of the error for each individual example. The modified training rule is like the training rule given by Equation (4.7) except that as we iterate through each training example we update the weight according to

$$\Delta w_i = \eta (t - o) x_i$$

where $t$, $o$, and $x_i$ are the target value, unit output, and $i$th input for the training example in question.
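The two variants can be compared side by side in a short sketch. The assumptions here are illustrative only: weights start at zero instead of small random values, the termination condition is a fixed number of passes, and the four noise-free examples are generated from the hypothetical target $t = 1 + 2 x_1$, so that the minimum of $E$ is at $w_0 = 1$, $w_1 = 2$.

```python
def linear_output(w, x):
    """Linear unit o = w . x; x includes the constant input x0 = 1."""
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_gradient_descent(examples, eta=0.05, steps=2000):
    """Table 4.1: accumulate delta_w over all examples, then update once per pass."""
    w = [0.0] * len(examples[0][0])
    for _ in range(steps):
        delta = [0.0] * len(w)
        for x, t in examples:
            o = linear_output(w, x)
            for i, xi in enumerate(x):
                delta[i] += eta * (t - o) * xi       # (T4.1)
        w = [wi + di for wi, di in zip(w, delta)]    # (T4.2)
    return w


def stochastic_gradient_descent(examples, eta=0.05, passes=2000):
    """Stochastic variant: update the weights after every single example."""
    w = [0.0] * len(examples[0][0])
    for _ in range(passes):
        for x, t in examples:
            o = linear_output(w, x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical noise-free data from t = 1 + 2 * x1, with x0 = 1 prepended.
EXAMPLES = [((1.0, float(x1)), 1.0 + 2.0 * x1) for x1 in range(4)]

if __name__ == "__main__":
    print(batch_gradient_descent(EXAMPLES))
    print(stochastic_gradient_descent(EXAMPLES))
```

With this small learning rate both variants settle near the same weights; the difference lies in when the update is applied, once per pass over $D$ or once per example.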
To modify the gradient descent algorithm of Table 4.1 to implement this stochastic approximation, Equation (T4.2) is simply deleted and Equation (T4.1) replaced by $w_i \leftarrow w_i + \eta (t - o) x_i$. One way to view this stochastic
State Space Reconstruction for Multivariate Time Series Prediction
arXiv:0809.2220v1 [nlin.CD] 12 Sep 2008

I. Vlachos∗ and D. Kugiumtzis†
Department of Mathematical, Physical and Computational Sciences, Faculty of Technology, Aristotle University of Thessaloniki, Greece
(Dated: September 12, 2008)

In the nonlinear prediction of scalar time series, the common practice is to reconstruct the state space using time-delay embedding and apply a local model on neighborhoods of the reconstructed space. The method of false nearest neighbors is often used to estimate the embedding dimension. For prediction purposes, the optimal embedding dimension can also be estimated by some prediction error minimization criterion. We investigate the proper state space reconstruction for multivariate time series and modify the two abovementioned criteria to search for optimal embedding in the set of the variables and their delays. We pinpoint the problems that can arise in each case and compare the state space reconstructions (suggested by each of the two methods) on the predictive ability of the local model that uses each of them. Results obtained from Monte Carlo simulations on known chaotic maps revealed the non-uniqueness of optimum reconstruction in the multivariate case and showed that prediction criteria perform better when the task is prediction.

PACS numbers: 05.45.Tp, 02.50.Sk, 05.45.a
Keywords: nonlinear analysis, multivariate analysis, time series, local prediction, state space reconstruction

I. INTRODUCTION

Since its publication, Takens' Embedding Theorem [1] (and its extension, the Fractal Delay Embedding Prevalence Theorem by Sauer et al. [2]) has been used in time series analysis in many different settings ranging from system characterization and approximation of invariant quantities, such as correlation dimension and Lyapunov exponents, to prediction and noise-filtering [3]. The Embedding Theorem implies that although the true dynamics of a system may not be known, equivalent dynamics
can be obtained under suitable conditions using time delays of a single time series, treated as a one-dimensional projection of the system trajectory.

Most applications of the Embedding Theorem deal with univariate time series, but often measurements of more than one quantity related to the same dynamical system are available. One of the first uses of multivariate embedding was in the context of spatially extended systems where embedding vectors were constructed from data representing the same quantity measured simultaneously at different locations [4, 5]. Multivariate embedding was used for noise reduction [6] and for surrogate data generation with equal individual delay times and equal embedding dimensions for each time series [7]. In nonlinear multivariate prediction, the prediction with local models on a space reconstructed from a different time series of the same system was studied in [8]. This study was extended in [9] by having the reconstruction utilize all of the observed time series. Multivariate embedding with the use of independent components analysis was considered in [10] and more recently multivariate em-

as $x_n = h(\mathbf{y}_n)$. Despite the apparent loss of information of the system dynamics by the projection, the system dynamics may be recovered through suitable state space reconstruction from the scalar time series.

A. Reconstruction of the state space

According to Takens' embedding theorem, a trajectory formed by the points $\mathbf{x}_n$ of time-delayed components from the time series $\{x_n\}_{n=1}^{N}$ as

$$\mathbf{x}_n = (x_{n-(m-1)\tau}, x_{n-(m-2)\tau}, \ldots, x_n), \qquad (1)$$

under certain genericity assumptions, is a one-to-one mapping of the original trajectory of $\mathbf{y}_n$ provided that $m$ is large enough.

Given that the dynamical system "lives" on an attractor $A \subset \Gamma$, the reconstructed attractor $\tilde{A}$ through the use of the time-delay vectors is topologically equivalent to $A$. A sufficient condition for an appropriate unfolding of the attractor is $m \geq 2d + 1$ where $d$ is the box-counting dimension of $A$. The embedding process is visualized
in the following graph

$$\begin{array}{ccc}
\mathbf{y}_n \in A \subset \Gamma & \xrightarrow{\;F\;} & \mathbf{y}_{n+1} \in A \subset \Gamma \\
\downarrow h & & \downarrow h \\
x_n \in \mathbb{R} & & x_{n+1} \in \mathbb{R} \\
\downarrow e & & \downarrow e \\
\mathbf{x}_n \in \tilde{A} \subset \mathbb{R}^m & \xrightarrow{\;G\;} & \mathbf{x}_{n+1} \in \tilde{A} \subset \mathbb{R}^m
\end{array}$$

where $e$ is the embedding procedure creating the delay vectors from the time series and $G$ is the reconstructed dynamical system on $\tilde{A}$. $G$ preserves properties of the unknown $F$ on the unknown attractor $A$ that do not change under smooth coordinate transformations.

B. Univariate local prediction

For a given state space reconstruction, the local prediction at a target point $\mathbf{x}_n$ is made with a model estimated on the $K$ nearest neighboring points to $\mathbf{x}_n$. The local model can have a simple form, such as the zeroth order model (the average of the images of the nearest neighbors), but here we consider the linear model

$$\hat{x}_{n+1} = \mathbf{a}^{(n)} \mathbf{x}_n + b^{(n)},$$

where the superscript $(n)$ denotes the dependence of the model parameters ($\mathbf{a}^{(n)}$ and $b^{(n)}$) on the neighborhood of $\mathbf{x}_n$. The neighborhood at each target point is defined either by a fixed number $K$ of nearest neighbors or by a distance determining the borders of the neighborhood, giving a varying $K$ with $\mathbf{x}_n$.

C. Selection of embedding parameters

The two parameters of the delay embedding in (1) are the embedding dimension $m$, i.e. the number of components in $\mathbf{x}_n$, and the delay time $\tau$. We skip the discussion on the selection of $\tau$ as it is typically set to 1 in the case of discrete systems that we focus on. Among the approaches for the selection of $m$ we choose the most popular method of false nearest neighbors (FNN) and present it briefly below [13].

The measurement function $h$ projects distant points $\{\mathbf{y}_n\}$ of the original attractor to close values of $\{x_n\}$. A small $m$ may still give badly projected points and we seek the reconstructed state space of the smallest embedding dimension $m$ that unfolds the attractor. This idea is implemented as follows. For each point $\mathbf{x}_n^m$ in the $m$-dimensional reconstructed state space, the distance from its nearest neighbor $\mathbf{x}_{n(1)}^m$ is calculated, $d(\mathbf{x}_n^m, \mathbf{x}_{n(1)}^m) = \|\mathbf{x}_n^m - \mathbf{x}_{n(1)}^m\|$. The dimension of the reconstructed state space is augmented by 1 and the new distance of these vectors is calculated, $d(\mathbf{x}_n^{m+1}, \mathbf{x}_{n(1)}^{m+1}) = \|\mathbf{x}_n^{m+1} - \mathbf{x}_{n(1)}^{m+1}\|$. If the ratio of the two distances exceeds a predefined tolerance threshold $r$ the two neighbors are classified as false neighbors, i.e.

$$r_n(m) = \frac{d(\mathbf{x}_n^{m+1}, \mathbf{x}_{n(1)}^{m+1})}{d(\mathbf{x}_n^{m}, \mathbf{x}_{n(1)}^{m})} > r. \qquad (2)$$

III. MULTIVARIATE EMBEDDING

In Section II we gave a summary of the reconstruction technique for a deterministic dynamical system from a scalar time series generated by the system. However, it is possible that more than one time series is observed, each possibly related to the system under investigation. For $p$ time series measured simultaneously from the same dynamical system, a measurement function $H: \Gamma \to \mathbb{R}^p$ is decomposed to $h_i$, $i = 1, \ldots, p$, defined as in Section II, each giving a time series $\{x_{i,n}\}_{n=1}^{N}$. According to the discussion on univariate embedding any of the $p$ time series can be used for reconstruction of the system dynamics, or better, the most suitable time series could be selected after proper investigation. In a different approach all the available time series are considered and the analysis of the univariate time series is adjusted to the multivariate time series.

A. From univariate to multivariate embedding

Given that there are $p$ time series $\{x_{i,n}\}_{n=1}^{N}$, $i = 1, \ldots, p$, the equivalent to the reconstructed state vector in (1) for the case of multivariate embedding is of the form

$$\mathbf{x}_n = (x_{1,n-(m_1-1)\tau_1}, x_{1,n-(m_1-2)\tau_1}, \ldots, x_{1,n}, x_{2,n-(m_2-1)\tau_2}, \ldots, x_{2,n}, \ldots, x_{p,n}) \qquad (3)$$

and is defined by an embedding dimension vector $\mathbf{m} = (m_1, \ldots, m_p)$ that indicates the number of components used from each time series and a time delay vector $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_p)$ that gives the delays for each time series.
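The delay vectors of (1) and (3) can be built with a short sketch such as the following. The function names are hypothetical, indices are 0-based (unlike the 1-based indexing of the text), and a component with $m_i = 0$ is simply left out of the vector.

```python
def delay_vector(series, n, m, tau=1):
    """Eq. (1): (x_{n-(m-1)tau}, ..., x_{n-tau}, x_n); empty list when m == 0.

    n is a 0-based index into series.
    """
    if m > 0 and n - (m - 1) * tau < 0:
        raise ValueError("not enough past samples for this m and tau")
    return [series[n - k * tau] for k in range(m - 1, -1, -1)]


def multivariate_delay_vector(series_list, n, dims, taus):
    """Eq. (3): concatenate the delay vectors of the p time series.

    dims is the embedding dimension vector (m_1, ..., m_p) and
    taus the time delay vector (tau_1, ..., tau_p).
    """
    vec = []
    for series, m, tau in zip(series_list, dims, taus):
        vec.extend(delay_vector(series, n, m, tau))
    return vec


if __name__ == "__main__":
    x = list(range(10))        # stand-in for the first time series
    y = list(range(10, 20))    # stand-in for the second time series
    print(multivariate_delay_vector([x, y], 5, (2, 1), (1, 1)))
```

The length of the returned vector equals the sum of the entries of `dims`, so different embedding dimension vectors can give reconstructions of the same total dimension.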
The corresponding graph for the multivariate embedding process is shown below.

$$\begin{array}{ccc}
\mathbf{y}_n \in A \subset \Gamma & \xrightarrow{\;F\;} & \mathbf{y}_{n+1} \in A \subset \Gamma \\
\swarrow h_1 \;\; \downarrow h_2 \;\; \cdots \;\; \searrow h_p & & \swarrow h_1 \;\; \downarrow h_2 \;\; \cdots \;\; \searrow h_p \\
x_{1,n} \quad x_{2,n} \quad \cdots \quad x_{p,n} & & x_{1,n+1} \quad x_{2,n+1} \quad \cdots \quad x_{p,n+1} \\
\searrow e \;\; \downarrow e \;\; \cdots \;\; \swarrow e & & \searrow e \;\; \downarrow e \;\; \cdots \;\; \swarrow e \\
\mathbf{x}_n \in \tilde{A} \subset \mathbb{R}^M & \xrightarrow{\;G\;} & \mathbf{x}_{n+1} \in \tilde{A} \subset \mathbb{R}^M
\end{array}$$

The total embedding dimension $M$ is the sum of the individual embedding dimensions for each time series, $M = \sum_{i=1}^{p} m_i$. Note that if redundant or irrelevant information is present in the $p$ time series, only a subset of them may be represented in the optimal reconstructed points $\mathbf{x}_n$. The selection of $\mathbf{m}$ and $\boldsymbol{\tau}$ follows the same principles as for the univariate case: the attractor should be fully unfolded and the components of the embedding vectors should be uncorrelated. A simple selection rule suggests that all individual delay times and embedding dimensions are the same, i.e. $\mathbf{m} = m\mathbf{1}$ and $\boldsymbol{\tau} = \tau\mathbf{1}$ with $\mathbf{1}$ a $p$-vector of ones [6, 7]. Here, we set again $\tau_i = 1$, $i = 1, \ldots, p$, but we consider both fixed and varying $m_i$ in the implementation of the FNN method (see Section III D).

B. Multivariate local prediction

The prediction for each time series $x_{i,n}$, $i = 1, \ldots, p$, is performed separately by $p$ local models, estimated as in the case of univariate time series, but for reconstructed points formed potentially from all $p$ time series as given in (3) (e.g. see [9]).

We propose an extension of the NRMSE for the prediction of one time series to account for the error vectors comprised of the individual prediction errors for each of the predicted time series. If we have one step ahead predictions for the $p$ available time series, i.e. $\hat{x}_{i,n}$, $i = 1, \ldots, p$ (for a range of current times $n - 1$), we define the multivariate NRMSE

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_n \|(x_{1,n} - \hat{x}_{1,n}, \ldots, x_{p,n} - \hat{x}_{p,n})\|^2}{\sum_n \|(x_{1,n} - \bar{x}_1, \ldots, x_{p,n} - \bar{x}_p)\|^2}} \qquad (4)$$

where $\bar{x}_i$ is the mean of the actual values of $x_{i,n}$ over all target times $n$.

C. Problems and restrictions of multivariate reconstructions

A major problem in the multivariate case is the problem of identification. There are often not unique $\mathbf{m}$ and $\boldsymbol{\tau}$ embedding parameters that unfold fully the attractor. A trivial example is the Henon map [17]

$$\begin{aligned} x_{n+1} &= 1.4 - x_n^2 + y_n \\ y_{n+1} &= 0.3 x_n \end{aligned} \qquad (5)$$

It is known that
for the state space reconstruction from the observable $x_n$ the appropriate embedding parameters are $m = 2$ and $\tau = 1$. Due to the fact that $y_n$ is a lagged multiple of $x_n$, the attractor can obviously be reconstructed from the bivariate time series $\{x_n, y_n\}$ equally well with any of the following two-dimensional embedding schemes

$$\mathbf{x}_n = (x_n, x_{n-1}) \qquad \mathbf{x}_n = (x_n, y_n) \qquad \mathbf{x}_n = (y_n, y_{n-1})$$

since they are essentially the same. This example shows also the problem of redundant information, e.g. the state space reconstruction would not improve by augmenting the delay vector $\mathbf{x}_n = (x_n, x_{n-1})$ with the component $y_n$ that actually duplicates $x_{n-1}$. Redundancy is inevitable in multivariate time series as synchronous observations of the different time series are generally correlated, and using these observations as components in the same embedding vector adds redundant information to it. We note here that in the case of continuous dynamical systems, the delay parameter $\tau_i$ may be selected so that the components of the $i$th time series are not correlated with each other, but this does not imply that they are not correlated to components from another time series.

A different problem is that of irrelevance, when time series that are not generated by the same dynamical system are included in the reconstruction procedure. This may be the case even when a time series is connected to time series generated by the system under investigation.

An issue of concern is also the fact that the data do not always have the same data ranges, and distances calculated on delay vectors with components of different ranges may depend highly on only some of the components. So it is often preferred to scale all the data to have either the same variance or be in the same data range. For our study we choose to scale the data to the range [0, 1].

D. Selection of the embedding dimension vector

Taking into account the problems in the state space reconstruction from multivariate time series, we present three methods for determining $\mathbf{m}$, two based on the false nearest neighbor algorithm, which we name FNN1 and FNN2, and one
based on local models which we call prediction error minimization criterion (PEM).

The main idea of the FNN algorithms is as for the univariate case. Starting from a small value the embedding dimension is increased by including delay components from the $p$ time series and the percentage of the false nearest neighbors is calculated until it falls to the zero level. The difference of the two FNN methods is in the way that $\mathbf{m}$ is increased.

For FNN1 we restrict the state space reconstruction to use the same embedding dimension for each of the $p$ time series, i.e. $\mathbf{m} = (m, m, \ldots, m)$ for a given $m$. To assess whether $m$ is sufficient, we consider all delay embeddings derived by augmenting the state vector of embedding dimension vector $(m, m, \ldots, m)$ with a single delayed variable from any of the $p$ time series. Thus the check for false nearest neighbors in (2) yields the increase from the embedding dimension vector $(m, m, \ldots, m)$ to each of the embedding dimension vectors $(m+1, m, \ldots, m)$, $(m, m+1, \ldots, m)$, ..., $(m, m, \ldots, m+1)$. Then the algorithm stops at the optimal $\mathbf{m} = (m, m, \ldots, m)$ if the zero level percentage of false nearest neighbors is obtained for all $p$ cases. A sketch of the first two steps for a bivariate time series is shown in Figure 1(a).

This method has been commonly used in multivariate reconstruction and is more appropriate for spatiotemporally distributed data (e.g. see the software package TISEAN [18]). A potential drawback of FNN1 is that the selected total embedding dimension $M$ is always a multiple of $p$, possibly introducing redundant information in the embedding vectors.

We modify the algorithm of FNN1 to account for any form of the embedding dimension vector $\mathbf{m}$, and the total embedding dimension $M$ is increased by one at each step of the algorithm. Let us suppose that the algorithm has reached at some step the total embedding dimension $M$.
For this $M$ all the combinations of the components of the embedding dimension vector $\mathbf{m} = (m_1, m_2, \ldots, m_p)$ are considered under the condition $M = \sum_{i=1}^{p} m_i$. Then for each such $\mathbf{m} = (m_1, m_2, \ldots, m_p)$ all the possible augmentations with one dimension are checked for false nearest neighbors, i.e. $(m_1+1, m_2, \ldots, m_p)$, $(m_1, m_2+1, \ldots, m_p)$, ..., $(m_1, m_2, \ldots, m_p+1)$. A sketch of the first two steps of the extended FNN algorithm, denoted as FNN2, for a bivariate time series is shown in Figure 1(b).

The termination criterion is the drop of the percentage of false nearest neighbors to the zero level at every increase of $M$ by one for at least one embedding dimension vector $(m_1, m_2, \ldots, m_p)$. If more than one embedding dimension vector fulfills this criterion, the one with the smallest cumulative FNN percentage is selected, where the cumulative FNN percentage is the sum of the $p$ FNN percentages for the increase by one of the respective component of the embedding dimension vector.

The PEM criterion for the selection of $\mathbf{m} = (m_1, m_2, \ldots, m_p)$ is simply the extension of the goodness-of-fit or prediction criterion in the univariate case to account for the multiple ways the delay vector can be formed from the multivariate time series. Thus for all possible $p$-tuples $(m_1, m_2, \ldots, m_p)$ from $(1, 0, \ldots, 0)$, $(0, 1, \ldots, 0)$, etc. up to some vector of maximum embedding dimensions $(m_{\max}, m_{\max}, \ldots, m_{\max})$, the respective reconstructed state spaces are created, local linear models are applied and out-of-sample prediction errors are computed. So, in total $p^{m_{\max}} - 1$ embedding dimension vectors are compared and the optimal is the one that gives the smallest multivariate NRMSE as defined in (4).

IV. MONTE CARLO SIMULATIONS AND RESULTS

A. Monte Carlo setup

We test the three methods by performing Monte Carlo simulations on a variety of known nonlinear dynamical systems. The embedding dimension vectors are selected using the three methods on 100 different realizations of each system and the most frequently selected embedding dimension vectors for each method are
tracked. Also, for each realization and selected embedding dimension vector […] multivariate NRMSE over the 100 realizations for each method is then used as an indicator of the performance of each method in prediction.

The selection of the embedding dimension vector by FNN1, FNN2 and PEM is done on the first three quarters of the data, $N_1 = 3N/4$, and the multivariate NRMSE is computed on the last quarter of the data ($N - N_1$). For PEM, the same split is used on the $N_1$ data, so that $N_2 = 3N_1/4$ data are used to find the neighbors (training set) and the remaining $N_1 - N_2$ are used to compute the multivariate NRMSE (test set) and decide on the optimal embedding dimension vector. A sketch of the split of the data is shown in Figure 2. The number of neighbors for the local models in PEM varies with $N$, and we set $K_N = 10, 25, 50$ for time series lengths $N = 512, 2048, 8192$, respectively. The parameters of the local linear model are estimated by ordinary least squares. For all methods the investigation is restricted to $m_{\max} = 5$.

The multivariate time series are derived from nonlinear maps of varying dimension and complexity as well as spatially extended maps. The results are given below for each system.

B. One and two Ikeda maps

The Ikeda map is an example of a discrete low-dimensional chaotic system in two variables $(x_n, y_n)$ defined by the equations [19]

$$z_{n+1} = 1 + 0.9\, z_n \exp\!\left(0.4i - \frac{6i}{1 + |z_n|^2}\right), \qquad x_n = \mathrm{Re}(z_n), \qquad y_n = \mathrm{Im}(z_n),$$

where Re and Im denote the real and imaginary part, respectively, of the complex variable $z_n$. Given the bivariate time series of $(x_n, y_n)$, both FNN methods identify the original vector $\mathbf{x}_n = (x_n, y_n)$ and find $\mathbf{m} = (1,1)$ as optimal at all realizations, as shown in Table I. On the other hand, the PEM criterion finds over-embedding as optimal, but this improves the prediction only slightly. As expected, the prediction improves with the increase of $N$.

TABLE I: Dimension vectors and NRMSE for the Ikeda map. Columns 2, 3 and 4 contain the embedding dimension vectors by their respective frequency of occurrence.
Column headings: FNN1, PEM, FNN2, NRMSE
Entries as extracted: 512 (1,1) 100 0.051 0.032 (1,1) 100 (2,2) 100 0.028 8192 (1,1) 100 0.013 0.003

Next we consider the sum of two Ikeda maps as a more complex and higher dimensional system. The bivariate time series are generated as

$$x_n = \mathrm{Re}(z_{1,n} + z_{2,n}), \qquad y_n = \mathrm{Im}(z_{1,n} + z_{2,n}).$$

TABLE II: Dimension vectors and NRMSE for the sum of two Ikeda maps.
Column headings: FNN1, PEM, FNN2, NRMSE
Entries as extracted: 512 (2,2) 65 0.456 0.447 (1,3) 26 (3,3) 95 (2,3) 54 0.365 (2,2) 3 (2,2) 44 8192 (2,3) 43 0.260 0.251 (1,4) 37

The results of the Monte Carlo simulations shown in Table II suggest that the prediction worsens dramatically from that in Table I and that the total embedding dimension $M$ increases with $N$. The FNN2 criterion generally gives multiple optimal $\mathbf{m}$ structures across realizations, and PEM does the same but only for small $N$. This indicates that high complexity degrades the performance of the algorithms for small sample sizes. PEM is again best for predictions, but overall we do not observe large differences among the three methods. An interesting observation is that although FNN2 finds two optimal $\mathbf{m}$ with high frequencies, they both give the same $M$. This reflects the problem of identification, where different $\mathbf{m}$ unfold the attractor equally well. This feature cannot be observed in FNN1 because the FNN1 algorithm inspects fewer possible vectors and only one for each $M$, where $M$ can only be a multiple of $p$ (in this case (1,1) for $M = 2$, (2,2) for $M = 4$, etc.). On the other hand, the PEM criterion seems to converge to a single $\mathbf{m}$ for large $N$, which means that for the sum of the two Ikeda maps this particular structure gives the best prediction results. Note that there is no reason that the embedding dimension vectors derived from FNN2 and PEM should match, as they are selected under different conditions.
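The two test systems of this subsection can be generated with a few lines of code. The sketch below is only an illustration under stated assumptions: it uses the standard form of the Ikeda map, in which the exponential multiplies $z_n$; the initial conditions, the transient length and all function names are hypothetical choices; and the sum of two Ikeda maps is assumed to come from two independent orbits started from different initial conditions.

```python
import cmath


def ikeda_step(z):
    """One iteration of the Ikeda map in complex form (standard form assumed)."""
    return 1 + 0.9 * z * cmath.exp(0.4j - 6j / (1 + abs(z) ** 2))


def ikeda_orbit(n, z0, transient=1000):
    """Return n complex states after discarding an initial transient."""
    z = z0
    for _ in range(transient):
        z = ikeda_step(z)
    orbit = []
    for _ in range(n):
        z = ikeda_step(z)
        orbit.append(z)
    return orbit


def bivariate_series(orbit):
    """Split complex states into the observed series x_n = Re(z_n), y_n = Im(z_n)."""
    return [z.real for z in orbit], [z.imag for z in orbit]


def sum_of_two_ikeda(n, z0_a=0.1 + 0.1j, z0_b=-0.3 + 0.2j):
    """Bivariate series of z_{1,n} + z_{2,n} for two orbits (initial values are illustrative)."""
    orbit_a = ikeda_orbit(n, z0_a)
    orbit_b = ikeda_orbit(n, z0_b)
    return bivariate_series([za + zb for za, zb in zip(orbit_a, orbit_b)])
```

Calling `bivariate_series(ikeda_orbit(512, 0.1 + 0.1j))` gives one realization of length $N = 512$; varying the initial condition would give the different realizations of a Monte Carlo study.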
Moreover,it is expected that the m selected by PEM gives always the lowest average of multivariate NRMSE as it is selected to optimize prediction.TABLE III:Dimension vectors and NRMSE for the KDR mapNRMSE FNN1PEM FNN2512(0,0,2,2)30(1,1,1,1)160.7760.629 (1,1,1,1)55(2,2,2,2)39(0,2,1,1)79(0,1,0,1)130.6598192(2,1,1,1)40(1,1,1,1)140.5580.373TABLE IV:Dimension vectors and NRMSE for system of Driver-Response Henon systemEmbedding dimensionsN FNN1PEM FNN2512(2,2)100(2,2)75(2,1)100.196(2,2)100(3,2)33(2,2)250.127(2,2)100(3,0)31(0,3)270.0122048(2,2)100(2,2)1000.093(2,2)100(3,3)45(4,3)450.084(2,2)100(0,3)20(3,0)190.0068192(2,2)100(2,2)1000.051(2,2)100(3,3)72(4,3)250.027(2,2)100(0,4)31(4,0)300.002TABLE V:Dimension vectors and NRMSE for Lattice of3coupled Henon mapsEmbedding dimensionsN FNN1PEM FNN2512(2,2,2)94(1,1,1)6(1,2,1)29(1,1,2)230.298(2,2,2)98(1,1,1)2(2,0,2)44(2,1,1)220.2282048(2,2,2)100(1,2,2)34(2,2,1)300.203(2,2,2)100(2,1,2)48(2,0,2)410.1318192(2,2,2)100(2,2,2)97(3,2,3)30.174(2,2,2)100(2,1,2)79(3,2,3)190.084NRMSEC FNN2FNN1PEM0.4(1,1,1,1)42(1,0,2,1)170.2850.2880.8(1,1,1,1)40(1,0,1,2)170.3140.2910.4(1,1,1,1)88(1,1,1,2)70.2290.1900.8(1,1,1,1)36(1,0,2,1)330.2250.1630.4(1,1,1,1)85(1,2,1,1)80.1970.1370.8(1,2,0,1)31(1,0,2,1)220.1310.072 PEM cannot distinguish the two time series and selectswith almost equal frequencies vectors of the form(m,0)and(0,m)giving again over-embedding as N increases.Thus PEM does not reveal the coupling structure of theunderlying system and picks any embedding dimensionstructure among a range of structures that give essen-tially equivalent predictions.Here FNN2seems to de-tect sufficiently the underlying coupling structure in thesystem resulting in a smaller total embedding dimensionthat gives however the same level of prediction as thelarger M suggested by FNN1and slightly smaller thanthe even larger M found by PEM.ttices of coupled Henon mapsThe last system is an example of spatiotemporal chaosand is defined as a lattice of k coupled Henon maps{x 
i,n, y i,n}k i=1 [22] specified by the equations x i,n+1 = 1.4 − ((1−C) x i,n + C(x i−1,n + x i+1,n) […] sample size, at least for the sizes we used in the simulations. Such a feature shows a lack of consistency of the PEM criterion and suggests that the selection is led by factors inherent in the prediction process rather than by the quality of the reconstructed attractor. For example, the increase of embedding dimension with the sample size can be explained by the fact that more data lead to an abundance of close neighbors used in local prediction models, and this in turn suggests that the K neighbors used in the model can still be located after augmenting the embedding vectors.

On the other hand, the two schemes used here that extend the method of false nearest neighbors (FNN) to multivariate time series aim at finding the minimum embedding that unfolds the attractor, but often a higher embedding gives better prediction results. In particular, the second scheme (FNN2), which explores all possible embedding structures, gives consistent selection of an embedding of smaller dimension than that selected by PEM. Moreover, this embedding could be justified by the underlying dynamics of the known systems we tested. However, lack of consistency of the selected embedding was observed with all methods for small sample sizes (somewhat expected, given the large variance of any estimate) and for the coupled maps (probably due to the presence of more than one optimal embedding).

In this work, we used only a prediction performance criterion to assess the quality of state space reconstruction, mainly because it has the most practical relevance.
There is no reason to expect that PEM would be found best if the assessment were done using another criterion not based on prediction. However, the reference (true) values of other measures, such as the correlation dimension, are not known for all systems used in this study. Another constraint of this work is that only noise-free multivariate time series from discrete systems are considered, so that the delay parameter is not involved in the state space reconstruction and the effect of noise is not studied. It is expected that the addition of noise would further complicate the process of selecting the optimal embedding dimension and degrade the performance of the algorithms. For example, we found that in the case of the Henon map the addition of noise of equal magnitude to the two time series of the system makes the criteria select any of the three equivalent embeddings ((2,0), (0,2), (1,1)) at random. The authors intend to extend this work to include noisy multivariate time series, also from flows, and to search for other measures to assess the performance of the embedding selection methods.

Acknowledgments

This paper is part of the 03ED748 research project, implemented within the framework of the "Reinforcement Programme of Human Research Manpower" (PENED) and co-financed at 90% by National and Community Funds (25% from the Greek Ministry of Development-General Secretariat of Research and Technology and 75% from E.U.-European Social Fund) and at 10% by Rikshospitalet, Norway.

[1] F. Takens, Lecture Notes in Mathematics 898, 365 (1981).
[2] T. Sauer, J. A. Yorke, and M. Casdagli, Journal of Statistical Physics 65, 579 (1991).
[3] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis (Cambridge University Press, 1997).
[4] J. Guckenheimer and G. Buzyna, Physical Review Letters 51, 1438 (1983).
[5] M. Paluš, I. Dvořák, and I. David, Physica A: Statistical Mechanics and its Applications 185, 433 (1992).
[6] R. Hegger and T. Schreiber, Physics Letters A 170, 305 (1992).
[7] D. Prichard and J. Theiler, Physical Review Letters 73, 951 (1994).
[8] H. D. I. Abarbanel, T. A. Carroll, L. M. Pecora, J. J. Sidorowich, and L. S. Tsimring, Physical Review E 49, 1840 (1994).
[9] L. Cao, A. Mees, and K. Judd, Physica D 121, 75 (1998), ISSN 0167-2789.
[10] J. P. Barnard, C. Aldrich, and M. Gerber, Physical Review E 64, 046201 (2001).
[11] S. P. Garcia and J. S. Almeida, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 72, 027205 (2005).
[12] Y. Hirata, H. Suzuki, and K. Aihara, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 74, 026202 (2006).
[13] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, Physical Review A 45, 3403 (1992).
[14] D. T. Kaplan, in Chaos in Communications, edited by L. M. Pecora (SPIE-The International Society for Optical Engineering, Bellingham, Washington, 98227-0010, USA, 1993), pp. 236-240.
[15] B. Chun-Hua and N. Xin-Bao, Chinese Physics 13, 633 (2004).
[16] R. Hegger and H. Kantz, Physical Review E 60, 4970 (1999).
[17] M. Hénon, Communications in Mathematical Physics 50, 69 (1976).
[18] R. Hegger, H. Kantz, and T. Schreiber, Chaos: An Interdisciplinary Journal of Nonlinear Science 9, 413 (1999).
[19] K. Ikeda, Optics Communications 30, 257 (1979).
[20] C. Grebogi, E. Kostelich, E. O. Ott, and J. A. Yorke, Physica D 25 (1987).
[21] S. J. Schiff, P. So, T. Chang, R. E. Burke, and T. Sauer, Physical Review E 54, 6708 (1996).
[22] A. Politi and A. Torcini, Chaos: An Interdisciplinary Journal of Nonlinear Science 2, 293 (1992).
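Event-Triggered Optimal Control of Discrete-Time Nonlinear Systems
ZHANG Xin; BO Yingchun

[Abstract] To reduce the number of data transmissions and the computational load, an event-triggered optimal control scheme based on a single-network value iteration algorithm is proposed for the optimal control problem of discrete-time nonlinear systems. First, a new event-triggering threshold is designed; an event is triggered when the event-triggering error exceeds this threshold. Then, only one neural network is used to build the critic network, and the system state and control policy are computed directly. This omits the model network and action network of typical adaptive dynamic programming and so reduces the amount of neural network weight training. By iterating between the critic network and the control policy, an event-triggered near-optimal control policy is obtained. Next, based on Lyapunov stability theory, the stability of the closed-loop system and the uniform ultimate boundedness of the critic network weights are proved. Finally, the method is applied in simulation to a discrete-time nonlinear system, and the experimental results verify the effectiveness of the proposed event-triggered optimal control scheme.
[Journal] Journal of Shenyang Normal University (Natural Science Edition)
[Year (Volume), Issue] 2018 (036) 004
[Pages] 6 (pp. 318-323)
[Keywords] discrete-time nonlinear systems; event-triggered control; value iteration algorithm; optimal control
[Authors] ZHANG Xin; BO Yingchun
[Affiliation] College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, Shandong, China (both authors)
[Language of main text] Chinese
[CLC number] TP273; O221

0 Introduction
Because it reduces the number of data transmissions and the computational load while still maintaining good control performance, event-triggered control has been an active research topic in the control community in recent years.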
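Unlike traditional sampling methods, event-triggered control provides an aperiodic strategy that updates only at the state sampling instants.
The system state is sampled and the control law updated only when the event-triggering condition is violated.
Between two updates, a zero-order hold maintains the controller output.
Many works have already used event-triggered control schemes to solve different control problems [1-5].
Reference [3] studied periodic event-triggered control of linear systems.
Reference [4] extended event-triggered control to discrete-time nonlinear systems.
In [5], Tallapragada et al. gave an event-triggered control scheme for nonlinear tracking problems.
To study optimal control under the event-triggered mechanism, many researchers have recently begun to introduce adaptive dynamic programming (ADP) into event-triggered control schemes.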
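The sampling rule described above (sample the state only when the triggering condition is violated, and hold the control in between) can be illustrated with a toy simulation. This is only a sketch under loudly stated assumptions: the scalar linear plant, the fixed feedback gain and the relative threshold `sigma * |x|` are hypothetical choices made for illustration, and are not the threshold, the critic network or the nonlinear system of the paper.

```python
def simulate_event_triggered(x0=1.0, steps=200, sigma=0.3, a=1.05, gain=0.1):
    """Toy plant x[k+1] = a*x[k] + u[k] with held feedback u[k] = -gain * x_held.

    All numerical values are illustrative assumptions, not taken from the paper.
    """
    x = x0
    x_held = None          # last sampled state used by the controller
    events = 0
    trajectory = [x]
    for _ in range(steps):
        # Event-triggering error: gap between the held sample and the current state.
        if x_held is None or abs(x_held - x) > sigma * abs(x):
            x_held = x     # event: sample the state and update the control law
            events += 1
        u = -gain * x_held  # zero-order hold between events
        x = a * x + u
        trajectory.append(x)
    return trajectory, events
```

In this toy setting the state still decays towards zero while the controller is updated at only a fraction of the time steps, which is the saving in transmissions and computation that motivates event-triggered schemes.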
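Written Test Questions and Reference Answers for Artificial Intelligence Position Recruitment (a Large Group Company)
(Answers are at the end)

I. Single-choice questions (10 questions, 2 points each, 20 points in total)

1. Which of the following algorithms is not a supervised learning algorithm?
A. Decision tree
B. Support vector machine
C. K-nearest neighbors
D. Naive Bayes

2. In deep learning, which of the following concepts refers to the process of adjusting the weights and biases of a network to minimize the loss function?
A. Overfitting
B. Underfitting
C. Backpropagation
D. Regularization

3. Which of the following is not a component of a convolutional neural network (CNN) in deep learning?
A. Convolutional layer
B. Activation function
C. Pooling layer
D. Backpropagation algorithm

4. In natural language processing (NLP), which of the following models is commonly used for text classification?
A. Decision tree
B. Naive Bayes
C. Support vector machine
D. Long short-term memory network (LSTM)

5. Which of the following is not a core technology of artificial intelligence?
A. Machine learning
B. Deep learning
C. Data mining
D. Computer vision

6. Which of the following algorithms is usually more efficient than the others when processing large-scale datasets?
A. K-Nearest Neighbors (KNN)
B. Support Vector Machines (SVM)
C. Decision Tree
D. Random Forest

7. Which of the following techniques does not belong to the field of deep learning?
A. Convolutional neural network (CNN)
B. Support vector machine (SVM)
C. Recurrent neural network (RNN)
D. Stochastic gradient descent (SGD)

8. Which of the following algorithms is not used for unsupervised learning?
A. K-means clustering
B. Decision Tree
C. Principal component analysis (PCA)
D. Hierarchical Clustering

9. Which of the following is not a neural network layer in deep learning?
A. Convolutional Layer
B. Recurrent Layer
C. Linear Layer
D. Stochastic Gradient Descent

II. Multiple-choice questions (10 questions, 4 points each, 40 points in total)

1. Which of the following techniques or methods are commonly used to improve the performance of machine learning models? ( )
A. Feature engineering
B. Data augmentation
C. Ensemble learning
D. Regularization
E. Transfer learning

2. Which of the following statements about deep learning are correct? ( )
A. Deep learning is a special machine learning method that extracts features through multi-layer neural networks.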
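Vol. 41 No. 4, Journal of Jilin University (Information Science Edition), July 2023
Article ID: 1671-5896(2023)04-0621-10

Cloud Segmentation Method of Surface Damage Point Based on Feature Adaptive Shifting-DGCNN

ZHANG Wenrui, WANG Congqing
(School of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)

Received: 2022-09-21
Funding: National Natural Science Foundation of China (61573185)
About the authors: ZHANG Wenrui (1998-), male, from Yangzhou, Jiangsu, master's student at Nanjing University of Aeronautics and Astronautics, working mainly on point cloud segmentation, (Tel) 86-188****8397, (E-mail) 839357306@; WANG Congqing (1960-), male, from Nanjing, professor and doctoral supervisor at Nanjing University of Aeronautics and Astronautics, working mainly on pattern recognition and intelligent systems, (Tel) 86-130****6390, (E-mail) cqwang@.

Abstract: Point cloud data of surface damage on metal parts place high demands on the local feature analysis ability of a segmentation network, and traditional algorithms with weak local feature analysis cannot achieve satisfactory segmentation on some datasets. To address this, features such as the relative damage volume are used to classify damage, dividing metal surface damage into six classes, and a 3D graph attention feature extraction method that incorporates spatial-scale region information is proposed. The resulting spatial-scale region features are used to design a feature update network module, on which a feature-updating dynamic graph convolutional network (Feature Adaptive Shifting-Dynamic Graph Convolutional Neural Networks) is built for point cloud semantic segmentation. Experimental results show that the method segments point clouds more effectively and extracts their local features. For metal surface damage segmentation, its accuracy exceeds that of PointNet++, DGCNN (Dynamic Graph Convolutional Neural Networks) and other methods, improving the accuracy and validity of the segmentation results.
Key words: point cloud segmentation; dynamic graph convolution; feature adaptive shifting; damage classification
CLC number: TP391.41    Document code: A

0 Introduction

Deep-learning-based image segmentation is approaching maturity in face recognition, license plate recognition and satellite image analysis. To obtain more complete 3D information about objects, 3D point cloud data are needed to further improve semantic segmentation. 3D point clouds are sparse and unordered, and their distinctive geometric feature distribution and 3D attributes make point cloud semantic segmentation difficult to apply in many fields: object detection, tracking and reconstruction from 3D point clouds in robotics and computer vision; extraction and recognition of the 3D geometry of buildings and land in architecture; and acquisition, detection and segmentation of road traffic objects, roads and maps in autonomous driving.

In 2017, Lawin et al. [1] projected the point cloud onto multiple views, segmented the views, mapped the results back, and analyzed the projected segmentation on the original point cloud to segment it. The earliest voxel deep learning network appeared in 2015: the VOXNET (Voxel Partition Network) structure created by Maturana et al. [2] is built on the volumetric representation of 3D point clouds and learns the distribution of points from 3D voxel shapes. Combined with the gridded point cloud representation proposed by Le et al. [3], new deep networks such as PointGrid emerged, integrating points and grids into an efficient hybrid network, but voxelized point clouds perform poorly on point cloud files with very many points. Converting irregular point clouds into regular intermediate forms such as projections and voxels loses much spatial information. To make full use of the data characteristics of point clouds themselves, base network models that take point clouds directly as input were gradually proposed. In 2017, Qi et al. [4] exploited the properties of point cloud files to develop PointNet, which learns features directly from raw point clouds. Qi et al. [5] then proposed PointNet++, which improves on how PointNet represents the relations between points. Hu et al. [6] proposed SENET (Squeeze-and-Excitation Networks), which recalibrates channel responses and brought channel attention to 3D point cloud deep learning. In 2018, Li et al. [7] proposed PointCNN, whose X-Conv module couples information over longer distances without significantly increasing the number of parameters.

A graph convolutional network [8] (Graph Convolutional Network) is a deep neural network that passes information between the nodes of a graph to capture the relations within it. A graph can be viewed as a set of vertices and edges. Making every point a vertex costs an immeasurable amount of computation, so the edge convolution layer (EdgeConv) built on K-nearest-neighbor computation [9] is needed. It uses a center point and its neighbors as edge features and extracts those edge features. As a new framework for point cloud deep learning, graph convolutional networks make up for some of the shortcomings of PointNet and similar networks [10].

For irregular surface damage, a kind of point cloud segmentation with missing features, 2D image data and convolutional neural networks have already been used to detect damage on fan blades, buildings and vehicles [11], where the main damage classes are cracks and peeling surface paint. However, 2D image segmentation covers too few damage types and can be affected by surface contamination and lighting, so that dents and bulges are overlooked or, under uneven illumination, misjudged as paint loss.

This paper proposes a dynamic graph convolutional network based on feature updating, aimed at 3D point cloud segmentation, and designs a new feature update module. Exploiting the distinctive spatial structure of 3D point clouds, neighbors with similar weights in a conventional K-neighborhood are distinguished by spatial scale, and the approach is applied to the problem of mixed useful and useless information in segmenting surface damage on metal parts. The neighbors are partitioned by spatial scale, the attention weights are grouped, and features are updated within each group. While effectively identifying the errors caused by interfering features from the outer neighborhood, the feature extraction area is enlarged to make local region features more useful.

1 Computation Method of the Deep Convolutional Network

1.1 3D graph attention feature extraction with spatial-scale region information

Iterative farthest point sampling divides the whole point cloud into $n$ point sets $\{M_1, M_2, M_3, \ldots, M_n\}$, each containing $k$ points $\{P_1, P_2, P_3, \ldots, P_k\}$. According to the spatial-scale relations within a point set, the local region is divided into different spatial regions. Within each region, local features are combined with spatial-scale features to obtain more discriminative feature information. Following the attention mechanism, different weights are assigned to the points in the K-neighborhood; the feature information includes the distribution of points within the spatial region and the characteristics of the region. Weighting these features gives the convolution result for the point set.

This feature extraction requires suitable values of the K-neighborhood parameter $K$ and the number of spatial partition layers $R$. If $K$ is too small, segmentation is weak, because local features are not fully used and accuracy suffers; if $K$ is too large, computation time and data volume increase. Fig. 1 shows the segmentation results for defect damage under different values of $K$. As Fig. 1 shows, the results are good for $K = 30$ or $K = 50$, and $K = 30$ needs less computation, so $K = 30$ is used in the experiments.

Fig. 1 Segmentation results of defect damage under different parameters K

Before determining $R$, we briefly analyze the problem that spatial layering addresses. The sparsity and disorder of 3D point clouds, together with the noise and the many edge and corner points of damage point clouds, lead to a common weakness in point cloud processing: outliers are selected as sampling points within a neighborhood. Because a damaged surface is mostly a single face, the segmented damage points should lie on that face, whereas noise points are distributed on both sides of it and some even lie inside the damage. This volumetric distribution of noise is what causes outliers to be selected into neighborhoods as sampling points. According to sampling experiments with the DGCNN (Dynamic Graph Convolutional Neural Networks) segmentation network, outliers near the cut plane and inside the damage affect the segmentation result most and are the most likely to be wrongly segmented as feature points, so such noise points should be handled first in later preprocessing.

Based on these results, $R$ is chosen with $K = 30$. Fig. 2 shows the segmentation of defect damage under different values of $R$. The result in Fig. 2b is closer to the test-set label segmentation, reflects the damage features better, and masks most of the noise. Therefore $R = 4$ is used in the experiments.

Fig. 2 Segmentation results of defect damage under different parameters R

Within a K-neighborhood, the spatial relation and the feature difference between a neighbor and the center point best express the weight of the neighbor. The spatial feature coefficient expresses how important a neighbor is to the point set of its center point. To better distinguish the weights of neighbors in the graph, the whole neighborhood needs to be subdivided, and subdividing by spatial scale is a suitable choice. The K-neighborhood of a center point can be regarded as a local space and divided into $r$ regions of different scale. A spatial attention mechanism then assigns weight coefficients to these $r$ regions. Multi-level partition by spatial scale loses none of the core neighbor features and effectively suppresses meaningless, interfering features. This improves the ability of the network to learn local spatial features of the point cloud and reduces mutual interference between adjacent neighborhoods. The spatial attention mechanism is shown in Fig. 3, and the computation proceeds as follows.

Fig. 3 Schematic diagram of attention feature extraction method for spatial scale regional information

Step 1: compute the feature coefficient $e_{mk}$, the weight of the $k$-th neighbor of each center point $m$ with respect to that center point. Let $\Delta p_{mk}$ and $\Delta f_{mk}$ denote the 3D spatial relation and the local feature difference, $M$ the MLP (Multi-Layer Perceptrons) operation and $C$ the concat function, with $\Delta p_{mk} = p_{mk} - p_m$ and $\Delta f_{mk} = M(f_{mk}) - M(f_m)$. The two are concatenated and fed to the multi-layer perceptron, giving the feature coefficient

$$e_{mk} = M[C(\Delta p_{mk} \,\|\, \Delta f_{mk})]. \tag{1}$$

Step 2: compute the graph weight coefficient $a_{mk}$, the share of the weight of the $k$-th neighbor of each center point $m$ with respect to that center point, where $k \in \{1, 2, 3, \ldots, K\}$ and $K$ is the number of points in each neighborhood. The feature coefficients $e_{mk}$ are normalized with the normalized exponential function $S$ (Softmax), which gives the multi-class weights

$$a_{mk} = S(e_{mk}) = \frac{\exp(e_{mk})}{\sum_{g=1}^{K} \exp(e_{mg})}. \tag{2}$$

Step 3: let the spatial-scale region feature $s_{mr}$ denote the feature of the $r$-th spatial-scale region of center point $m$, where $k_r \in \{1, 2, 3, \ldots, K_r\}$ and $K_r$ is the number of neighbors in the $r$-th spatial-scale region. A feature bias term $b_r$ is added so that the weighted features do not accumulate a one-sided error direction in the dynamic graph:

$$s_{mr} = \sum_{k_r=1}^{K_r} \left[a_{mk_r} M(f_{mk_r})\right] + b_r. \tag{3}$$

Computing over the $r$ spatial-scale regions gives all spatial-scale region features of point $m$ over the whole local region, $s_m = \{s_{m1}, s_{m2}, s_{m3}, \ldots, s_{mr}\}$, where $r \in \{1, 2, 3, \ldots, R\}$.

1.2 Dynamic graph convolutional network based on feature updating

The dynamic graph convolutional network is a deep learning network that processes raw 3D point cloud input directly. Its distinguishing feature is that the composite feature transform module (Feature Transform) of PointNet is replaced by an edge convolution layer composed of K-nearest-neighbor computation (K-Near Neighbor) and a multi-layer perceptron [12]. The edge convolution layer is powerful: the features it extracts include not only global features but also local features formed by the spatial relation between a center point and its neighbors. In a dynamic graph convolutional network every neighborhood is treated as a point set, and strengthening feature learning for its center point strengthens the network as a whole [13]. Within a neighborhood point set, the edge points that contribute the least useful local features to the center point can be regarded as abnormal noise points or low-weight points, and they may cause edge overflow in the overall segmentation. Compared with 2D images, point clouds carry sparser information and more noise. Simply discarding or simply accepting the noise points in a local region lowers the quality of feature extraction. Here they are instead assigned to a low-weight group and their features are updated within the region, which improves noise robustness and avoids losing point cloud information.

Within a spatial-scale region, suppose $s$ points $x$ in region $T$ are assigned to the low-weight group. The spatial information set of this point set is $P \in \mathbb{R}^{N_s \times 3}$ and its local feature set is $F \in \mathbb{R}^{N_s \times D_f}$ [14], where $D_f$ is the feature dimension and $N_s$ denotes the set of the $s$ points in the region. Let $p_i$ and $f_i$ be the spatial information and feature information of point $x_i$. Within the point set, a small-range $N$-neighborhood search around $x_i$ finds its neighbors $\{x_{i,1}, x_{i,2}, \ldots, x_{i,N}\} \in N(x_i)$, with feature set $\{f_{i,1}, f_{i,2}, \ldots, f_{i,N}\} \in F$.

After partitioning by spatial scale, features are updated within the regions whose spatial-scale region feature $s_{mt}$ is low: an aggregation function rewrites the local features, in the graph, of the neighbors with the lowest weights. Given the center point $m$, the feature $f_{mx_i}$ of point $x_i$ and the spatial-scale region feature $s_{mt}$, the goal is to find $f'_{mx_i}$, the new feature of the low-weight neighbor $x_i$ of center point $m$ after the neighborhood feature update. For a point $x_i$ in region $T$ and all $x_{i,j} \in H(x_i)$, the feature similarity between $x_i$ and the neighbors in its neighborhood $H$ is

$$R(x_i, x_{i,j}) = S\!\left[\frac{C(f_{i,j})^{\mathrm{T}}\, C(f_{i,j})}{D_o}\right], \tag{4}$$

where $C$ denotes a one-dimensional convolution from the input dimension to the output dimension, $D_o$ is the output dimension and $\mathrm{T}$ denotes transpose. The updated feature of $x_i$ is obtained by aggregating $R(x_i, x_{i,j})$ and transforming the feature $f_{mx_i}$ to the output dimension:

$$f'_{mx_i} = \sum \left[R(x_i, x_{i,j})\, S(s_{mt} f_{mx_i})\right]. \tag{5}$$

Fig. 4 illustrates this feature update computation, and Fig. 5 shows the feature-updating dynamic graph convolutional network.

Fig. 4 Schematic diagram of feature update network module
Fig. 5 Flow chart of dynamic graph convolution network with feature update

The dynamic graph convolutional network (DGCNN) performs edge convolution layer by layer with its edge convolution module [15]. The output of each layer dynamically produces a new feature space and new local regions, and each new layer learns features from the previous one (see Fig. 5). In the edge convolution module of every layer, spatial-scale region attention features are added after edge convolution and pooling to capture the neighbors within a specific spatial region $T$ for feature updating. Feature updating reduces the contamination of local features by local outliers. Compared with a conventional graph convolutional network, the network obtains more feature information and is more robust to point clouds with many noisy values [16]; it resists interference better on point clouds that are unstable, non-smooth and contain a prominent center that must be captured and segmented. Compared with conventional preprocessing it is more stable, and it does not mis-segment or miss the prominent part [17].

2 Experimental Results and Analysis

The accuracy of point cloud segmentation is evaluated mainly by two measures [18], the mean intersection over union and the overall accuracy. The mean intersection over union $U$ (MIoU: Mean Intersection over Union) is the mean of the ratio of intersection to union of the ground truth and the prediction:

$$U = \frac{1}{T+1} \sum_{a=0}^{T} \frac{p_{aa}}{\sum_{b=0}^{T} p_{ab} + \sum_{b=0}^{T} p_{ba} - p_{aa}}, \tag{6}$$

where $T$ denotes the classes (indexed from 0 to $T$), $a$ the ground truth, $b$ the prediction, and $p_{ab}$ the points of class $a$ predicted as $b$. The overall accuracy $A$ (OA: Overall Accuracy) is the ratio of all correctly predicted points $P_c$ to the total number of points $P_{all}$ in the point cloud model:

$$A = \frac{P_c}{P_{all}}. \tag{7}$$

Larger values of $U$ and $A$ indicate a more accurate segmentation network, and $U \le A$.

2.1 Experimental setup and data preprocessing

A Kinect V2 with the Depth Basics-WPF module is used to capture depth images of damaged metal part surfaces, and the depth images are converted with the SDK (Software Development Kit) into point cloud data in pcd format. The depth image resolution of the Kinect V2 is fixed at 512×424 pixels, so data should be captured as close as possible to obtain clearer images. A capture range of 0.6-1.2 m is used, starting at 0.6 m and increasing by 0.2 m each time, which gives several groups of captured data.

Noise is distributed throughout the point cloud and, if not filtered, harms later processing. Following statistical principles, the neighborhood of each point is analyzed and a dedicated standard deviation is established. The actual distribution of the point cloud is then compared with an assumed Gaussian distribution, and points whose error exceeds the standard deviation are regarded as noise [19]. Because the point cloud is large, the following modification is used for efficiency. For each point, compute the spatial distance $L_1$ to its first neighbor and the spatial distance $L_k$ to its $k$-th neighbor. Compare the difference between $L_1$ and $L_k$ across points, and treat the $1/K$ of points with the largest difference as possible noise points [20]. For each possible noise point, compute the mean distance to its $K$ neighbors; those whose mean exceeds the standard deviation are regarded as noise. Removing these outliers completes the filtering of the point cloud.

2.2 Key information extraction and segmentation for metal surface damage point clouds

When building the training set for damage segmentation, labeling all damage uniformly would make result analysis and application inconvenient and would also reduce the quality of feature segmentation. To make the segmentation easier to analyze and control, ArcGIS is used to convert the point cloud model into a triangulated irregular network, TIN (Triangulated Irregular Network). To classify damage precisely, the surface contour properties of the TIN are used to obtain, for the training data, the inner (outer) volume of the damage, the contour area of the damaged surface, and so on, as shown in Fig. 6.

Fig. 6 Schematic diagram of triangulated irregular network

Two damage volume indices are used: the relative damage volume $V$ (RDV: Relative Damage Volume) and the neighborhood relative damage volume ratio $N$ (NRDVR: Neighborhood Relative Damage Volume Ratio). The relative damage volume is the part between the relative mean depth plane and the gridded depth surface of the point cloud. The TIN neighborhood grid gives the relative depth share of a damage within its neighborhood, which avoids mistaking relative depth caused by curvature or shape for damage when building the test set. The two indices are

$$V = \left(\frac{\sum_{k=1}^{P_d} h_k}{P_d} - \frac{\sum_{k=1}^{P} h_k}{P}\right) S_d, \tag{8}$$

$$N = \left(\frac{P_n \sum_{k=1}^{P_d} h_k\, S_d}{P_d \sum_{k=1}^{P_n} h_k\, S_n} - 1\right) \times 100\%, \tag{9}$$

where $P$ is the total number of points, $P_d$ the number of points labeled as damage, and $P_n$ the number of points identified as lying in the damage neighborhood; $h_k$ is the depth value of point $k$; $S_d$ is the area of the damage plane and $S_n$ the area of the damage neighborhood plane.

With the TIN standard envelope view, damage can be depicted more clearly and its severity quantified. Damage is divided into six types and classified with the computed TIN indices. In addition, based on the relation between the volumes of the damaged and undamaged parts, a standard damage volume (SDV: Standard Damage Volume) is defined to separate damage classes. Five test groups with 50 images in total are randomly selected as samples. The absolute RDV values of non-penetrating damage are collected; the largest 30% are labeled as dents or bulges and the rest as surface damage, and the boundary value of this classification is set as the SDV.

With these criteria, six kinds of metal surface damage are classified: dent, bulge, perforation, surface damage, breakage and defect, as illustrated in Fig. 7. Damage is first split into two groups by whether it penetrates the part. Non-penetrating damage comprises dents, bulges and surface damage; penetrating damage comprises perforation, breakage and defect. Among non-penetrating damage, dents and bulges use SDV values of opposite sign as thresholds, and anything in between is classified as surface damage. Among penetrating damage, the planar area of the damaged part is the reference: smaller ones are classified as perforation and larger ones as breakage, while missing corners and internal loss at edges caused by corrosion, collision and similar causes are classified as defect. The classification is summarized in Table 1.

Fig. 7 Schematic diagram of metal surface damage

Tab. 1 Damage classification
Damage class                       Dent   Bulge   Perforation   Surface damage   Breakage   Defect
Penetrates the part                ×      ×       √             ×                √          √
Absolute RDV reaches SDV           √      √       \             ×                \          \
$S_d$ reaches the standard         \      \       ×             \                √          \

2.3 Analysis of experimental results

To verify the effectiveness of the improved graph convolutional deep neural network for point cloud semantic segmentation, the model is tested with the TensorFlow neural network framework. To verify the recognition accuracy of the deep network for damage segmentation, point clouds of damaged metal part surfaces were collected and preprocessed. Point cloud data from multiple sample surfaces on several metal parts were screened, data whose damage share was below 5% or above 60% were removed, and the rest were divided and packaged into a point cloud dataset. CloudCompare was used to label the damaged parts on the sample metal with the six damage classes described above.

The part damage dataset follows ModelNet40part, a public dataset widely used in point cloud deep learning. The segmentation dataset contains many types of metal part damage, shown in 510 full point cloud images. The images are varied and consist of metal surfaces with damage, such as metal doors, metal skins and outer surfaces of mechanical components. The full images were split by random points with the relevant ArcGIS tools; following the ModelNet40part specification, each independent point cloud group contains 1024 points, and all full images were split into 510×128 unit point clouds. The samples were divided into 400 for training and 110 for testing, and cross-validation was used to ensure adequate testing [20]. Several methods were evaluated; the results are reassembled from the unit point clouds at their original positions and carry the segmentation labels assigned to the unit point clouds after splitting. Fig. 8 compares the segmentation results.

Fig. 8 Comparison of segmentation results

In the part damage segmentation experiments, different networks are compared with the proposed network (FAS-DGCNN: Feature Adaptive Shifting-Dynamic Graph Convolutional Neural Networks). Apart from the segmentation network, all experiments use the same settings as the improved graph convolutional method. Results are evaluated by per-class intersection over union (IoU), mean intersection over union (MIoU), per-class accuracy (Accuracy) and overall accuracy (OA), as given in Tables 2 to 4. Comparing the Accuracy and IoU of the six damage classes leads to the following conclusion: compared with the baseline network PointNet++, the proposed method improves OA and MIoU by about 10% and 20% on penetrating and non-penetrating damage respectively, and reaches an OA of 90.8% on the overall segmentation. For non-penetrating damage, which is supported by more points and contains more point cloud features, all of the segmentation networks reach about 90% overall. PointNet, which cannot recognize local features, performs poorly on penetrating damage and lacks effective discriminative ability, so its segmentation there is worse than for other damage.

Tab. 2 Performance comparison of segmentation accuracy of damaged parts (%)
Method        Dent-1   Bulge-2   Perforation-3   Surface damage-4   Breakage-5   Defect-6
PointNet      82.7     85.0      73.8            80.9               71.6         70.1
PointNet++    88.7     86.9      82.7            83.4               86.3         82.9
DGCNN         90.4     88.8      91.7            88.7               88.6         87.1
FAS-DGCNN     92.5     88.8      92.1            91.4               90.1         88.6

Tab. 3 Performance comparison of segmentation intersection ratio of damaged parts (IoU, %)
Method        Dent-1   Bulge-2   Perforation-3   Surface damage-4   Breakage-5   Defect-6
PointNet      80.5     82.7      70.8            76.6               67.3         66.9
PointNet++    86.3     84.5      80.4            81.1               84.2         80.9
DGCNN         88.7     86.5      89.9            86.4               86.2         84.7
FAS-DGCNN     89.9     86.5      90.3            88.1               87.3         85.7

Tab. 4 Overall performance comparison of damage segmentation […]

[…] that dynamic graph convolution features, together with effective neighborhood feature updating and multi-scale attention, give the segmentation network better local neighborhood segmentation ability and suit it better to the task of surface damage segmentation.

3 Conclusion

Exploiting the distinctive spatial structure of 3D point clouds, this paper distinguishes neighbors with similar weights in a conventional K-neighborhood by spatial scale, applies the spatial-scale partition to weight assignment within the neighborhood, and proposes a feature update module that down-weights and filters out noise points in the neighborhood. A dynamic graph convolutional network using this module performs well in segmentation, and the feature-updating dynamic graph convolutional network (FAS-DGCNN) segments metal surface damage effectively. Compared with other networks, the method is more reliable for point cloud semantic segmentation: with attention that carries spatial-scale region information and with local point cloud feature updating, the proposed network performs better, and compared with segmentation networks that lack local feature extraction it is more effective for non-penetrating damage, where points are sparse and features are indistinct.

References:
[1] LAWIN F J, DANELLJAN M, TOSTEBERG P, et al. Deep Projective 3D Semantic Segmentation [C]// International Conference on Computer Analysis of Images and Patterns. Ystad, Sweden: Springer, 2017: 95-107.
[2] MATURANA D, SCHERER S. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition [C]// Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015: 922-928.
[3] LE T, DUAN Y. PointGrid: A Deep Network for 3D Shape Understanding [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018: 9204-9214.
[4] QI C R, SU H, MO K, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Hawaii, USA: IEEE, 2017: 652-660.
[5] QI C R, SU H, MO K, et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space [C]// Advances in Neural Information Processing Systems. California, USA: SpringerLink, 2017: 5099-5108.
[6] HU J, SHEN L, SUN G. Squeeze-and-Excitation Networks [C]// IEEE Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE, 2018: 7132-7141.
[7] LI Y, BU R, SUN M, et al. PointCNN: Convolution on X-Transformed Points [C]// Advances in Neural Information Processing Systems. Montreal, Canada: NeurIPS, 2018: 820-830.
[8] ANH VIET PHAN, MINH LE NGUYEN, YEN LAM HOANG NGUYEN, et al. DGCNN: A Convolutional Neural Network over Large-Scale Labeled Graphs [J]. Neural Networks, 2018, 108(10): 533-543.
[9] REN W J, GAO M Y, GAO M Z, et al. Research on Point Cloud Registration Method Based on Hybrid Algorithm [J]. Journal of Jilin University (Information Science Edition), 2019, 37(4): 408-416.
[10] ZHANG K, HAO M, WANG J, et al. Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features [EB/OL]. [2022-03-15]. https:∥/stamp/stamp.jsp?tp=&arnumber=9665104.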
[11] LIN S D, FENG C, CHEN Z D, et al. An Efficient Segmentation Algorithm for Vehicle Body Surface Damage Detection [J]. Journal of Data Acquisition and Processing, 2021, 36(2): 260-269.
[12] ZHANG L P, ZHANG Y, CHEN Z Z, et al. Splitting and Merging Based Multi-Model Fitting for Point Cloud Segmentation [J]. Journal of Geodesy and Geoinformation Science, 2019, 2(2): 78-79.
[13] XING Z Z, ZHAO S F, GUO W, et al. Processing Laser Point Cloud in Fully Mechanized Mining Face Based on DGCNN [J]. ISPRS International Journal of Geo-Information, 2021, 10(7): 482-482.
[14] YANG J, DANG J S. Semantic Segmentation of 3D Point Cloud Based on Contextual Attention CNN [J]. Journal on Communications, 2020, 41(7): 195-203.
[15] CHEN L, WANG H Y, XIAO H H, et al. Estimation of External Phenotypic Parameters of Bunting Leaves Using FL-DGCNN Model [J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(13): 172-179.
[16] CHAI Y J, MA J, LIU H. Deep Graph Attention Convolution Network for Point Cloud Semantic Segmentation [J]. Laser and Optoelectronics Progress, 2021, 58(12): 35-60.
[17] ZHANG X D, FANG H. BTDGCNN: BallTree Dynamic Graph Convolution Neural Network for 3D Point Cloud Topology [J]. Journal of Chinese Computer Systems, 2021, 42(11): 32-40.
[18] ZHANG J Y, ZHAO X L, CHEN Z. A Survey of Point Cloud Semantic Segmentation Based on Deep Learning [J]. Lasers and Photonics, 2020, 57(4): 28-46.
[19] SUN Y, ZHANG S H, WANG T Q, et al. An Improved Spatial Point Cloud Simplification Algorithm [J]. Neural Computing and Applications, 2021, 34(15): 12345-12359.
[20] GAO F S, ZHANG D L, LIANG X Z. A Region Growing Algorithm for Triangular Network Surface Generation from Point Cloud Data [J]. Journal of Jilin University (Science Edition), 2008, 46(3): 413-417.
(Responsible editor: LIU Qiaoliang)
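Big Data Theory Exam (Exercise Set 12)
Note: answers and explanations are at the end of the paper.
Part 1: Single-choice questions, 64 in total. Each question has exactly one correct answer; selecting more or fewer options earns no points.

1. [Single choice] ( ) tries to learn a function that makes predictions through a linear combination of attributes.
A) Decision tree  B) Bayesian classifier  C) Neural network  D) Linear model

2. [Single choice] All the possible outcomes of a random experiment are called ( )
A) Elementary events  B) Sample  C) All events  D) Sample space

3. [Single choice] In a DWS instance, which of the following is not deployed in a primary/standby configuration:
A) CMS  B) GTM  C) OMS  D) coordinato

4. [Single choice] A data scientist may use several algorithms (models) at the same time for prediction and finally combine their results for the final prediction (ensemble learning). Which of the following statements about ensemble learning is correct? ( )
A) The individual models are highly correlated  B) The individual models have low correlation  C) In ensemble learning it is better to use "average weights" than "voting"  D) The individual models all use the same algorithm

5. [Single choice] Which of the following algorithms is a local operation? ( )
A) Gray-level linear transform  B) Binarization  C) Fourier transform  D) Median filtering

6. [Single choice] Word2Vec is often used when replacing Chinese synonyms. Which of the following statements is wrong? ( )
A) Word2Vec is based on probability and statistics  B) Word2Vec results fit the current corpus context  C) Everything Word2Vec produces is a semantic synonym  D) Word2Vec is limited by the quantity and quality of the training corpus

7. [Single choice] A mother recorded her son's height from age 3 to age 9 and built from it the regression line of height on age y = 7.19x + 73.93. Using it to predict the child's height at age 10, the correct statement is ( ).
A) The height will definitely be 145.83 cm  B) The height will definitely exceed 146.00 cm  C) The height will definitely be above 145.00 cm  D) The height will be around 145.83 cm

8. [Single choice] Regarding the development characteristics of a data warehouse, the incorrect description is ( ).
A) Data warehouse development starts from the data;  B) The requirements for using the data warehouse must be clear at the start of development;  C) Data warehouse development is a continuously iterating, heuristic process;  D) In a data warehouse environment there is no fixed and well-defined processing flow as in an operational environment; data analysis and processing in a data warehouse are more flexible and follow no fixed pattern

9. [Single choice] Because keywords of different categories contribute differently to ranking, retrieval algorithms generally divide query keywords into several categories. Which of the following is not one of these keyword categories? ( ).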
recurrent models of visual attention

Recurrent models of visual attention are a type of deep neural network used to simulate the human visual system and its ability to focus on different areas of an image for a certain period of time. This is done by using recurrent neural networks (RNNs) combined with convolutional neural networks (CNNs). The recurrent model of visual attention is designed to mimic the way humans shift their focus from one area to another in order to identify objects and features in images.

In general, recurrent models of visual attention use a combination of CNNs and RNNs to detect which elements or features of an image are most important. This is done in two stages. First, the CNN extracts the features of the image, such as edges and shapes. Then, the RNN takes these features and uses them to determine which elements of the image should be focused on, and for how long. This is done through a process called "attention", where the RNN assigns more importance to certain elements of the image over others based on their relevance to the task at hand.

For example, if an image contains both a cat and a dog, the RNN might assign more attention to the cat than the dog. This is because cats tend to be more relevant to the task of identifying animals in images than dogs. Similarly, if an image contains both a car and a tree, the RNN might assign more attention to the car than the tree. This is because cars tend to be more relevant to the task of identifying vehicles in images than trees.

The advantage of using recurrent models of visual attention is that they can learn to focus on the most relevant elements in an image, instead of just relying on the fixed filters used in regular CNNs. This makes them more suitable for tasks such as object detection, where it is important to identify the specific objects in an image. Furthermore, since they are able to focus on different elements of an image for different lengths of time, they can also be used for tasks such as scene recognition, where it is important to recognize not only individual objects but also the overall structure of the scene.

Overall, recurrent models of visual attention are a powerful tool for simulating the human visual system and its ability to focus on different elements of an image for various lengths of time. By combining CNNs and RNNs, these models can learn to focus on the most relevant elements in an image, making them well suited for tasks such as object detection and scene recognition.
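The two-stage idea described above (features per image region, then a recurrent state that decides where to focus) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of any particular published model: the region features are taken as given instead of coming from a real CNN, relevance is scored with a plain dot product, and the state update is a simple `tanh` recurrence. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(state, region_features):
    """Attention weights over image regions given the current recurrent state."""
    scores = [sum(s * f for s, f in zip(state, feat)) for feat in region_features]
    return softmax(scores)


def recurrent_attention(region_features, steps=3):
    """Run a few attention steps and return the weights used at each step."""
    dim = len(region_features[0])
    state = [0.0] * dim            # the recurrent state starts with no preference
    history = []
    for _ in range(steps):
        weights = attend(state, region_features)
        # Glimpse: attention-weighted average of the region features.
        glimpse = [sum(w * feat[j] for w, feat in zip(weights, region_features))
                   for j in range(dim)]
        # Simple recurrent update (an assumption made for illustration).
        state = [math.tanh(s + g) for s, g in zip(state, glimpse)]
        history.append(weights)
    return history
```

With an all-zero initial state the first step spreads attention evenly over the regions; later steps shift weight towards the regions whose features align with what has been seen so far, which is the shifting focus the text describes.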
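Delay-Dependent Asymptotic Stability Analysis of Cellular Neural Networks with Interval Time-Varying Delays

[Keywords] neural networks; time-varying delay; asymptotic stability; linear matrix inequality (LMI)

0 Introduction
Cellular neural networks were proposed by Chua and Yang in 1988 and have since been widely studied and applied, for example in signal processing, static image processing, pattern recognition, solving nonlinear algebraic equations, optimization problems and determining the velocity of moving objects, attracting the attention of many researchers over the past decades.
However, in the circuit design of artificial neural networks, time delays are unavoidable because the speed of signal transmission is finite, so the analysis of delayed neural networks has greater theoretical significance and practical value.
Current results on the stability of neural networks fall into two main classes, delay-dependent and delay-independent.
Because delay-dependent stability conditions contain information about the delay, they are less conservative than delay-independent conditions, especially when the delay is small.
Previously, delay-dependent stability conditions were usually obtained as sufficient conditions through fixed model transformations and cross-term bounding techniques, such as neutral transformations and parameter transformations; using the Newton-Leibniz formula, some zero terms are added to the derivative of the Lyapunov-Krasovskii functional, auxiliary variables are introduced and generalized state variables are used, which gives some less conservative stability criteria.
However, because fixed weights are used to express the relations among the terms during the model transformation, some conservatism arises.
In recent years, He et al. proposed a free-weighting matrix method for studying delayed neural network systems. The method reduces the conservatism previously caused by model transformation of the neural network system and yields a series of delay-dependent stability conditions.
Based on the free-weighting matrix method of He et al., this paper studies the delay-dependent asymptotic stability of cellular neural network systems with interval delays. All useful terms are retained when differentiating the Lyapunov-Krasovskii functional, and a sufficient condition for delay-dependent stability of the system, expressed as a linear matrix inequality (LMI), is obtained.
Finally, a numerical example shows the effectiveness and superiority of the method.

1 Problem Statement
Consider the following cellular neural network system with interval time-varying delay […]

4 Conclusion
This paper studies neural network systems with interval delays and obtains a delay-dependent condition for the asymptotic stability of the system. By choosing a suitable Lyapunov-Krasovskii functional, a less conservative sufficient condition for delay-dependent asymptotic stability of the neural network system, based on a linear matrix inequality (LMI), is obtained.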
PIECEWISE-CONSTANT STABILIZATION
With the help of topological necessary conditions for continuous stabilization it is shown that, in general, in order to stabilize continuous- and discrete-time systems one has to use time-dependent or discontinuous feedback controls. On the other hand, a criterion of stabilization in the class of piecewise-constant feedbacks is established. In the context of this paper a piecewise-constant feedback is associated with a piecewise-constant function of the form $u = u(x)$, where $x \in \mathbb{R}^n$. The piecewise-constant feedback synthesis outlined here has several attractive features. First, it can be effectively applied to design feedback stabilizers subject to control constraints. Second, the designed feedback laws do not cause sliding-mode and/or chattering behavior in the closed-loop system, i.e., on a finite interval of time the control in the closed-loop system may have only a finite number of jump discontinuities.
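Sequence Modeling Models in Long Short-Term Memory Network AI Technology

Sequence modeling models are a class of artificial intelligence models for processing sequence data.
In the long short-term memory network (Long Short-Term Memory, LSTM), the sequence modeling model is a core component.
LSTM is a special kind of recurrent neural network (Recurrent Neural Network, RNN) that performs well on long sequence data.
In a traditional RNN, the long-term dependency problem causes earlier input information to fade from memory over long spans.
To solve this problem, the LSTM model introduces a memory cell together with input and output gates (input gate and output gate).
Through a fine-grained gating mechanism, LSTM can selectively forget or retain earlier memory and integrate the current input information, and so captures long-term dependencies in a sequence better.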
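The gating mechanism described above can be made concrete with a single time step of a scalar LSTM cell. The sketch below is illustrative and rests on stated assumptions: it uses the widely used LSTM variant with an explicit forget gate next to the input and output gates, it works on scalars instead of vectors to keep the arithmetic visible, and the parameter layout (a dictionary of `(w_x, w_h, b)` triples) is a hypothetical choice.

```python
import math


def sigmoid(v):
    """Logistic function, used for the gates (values between 0 and 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", "candidate"
    to a triple (w_x, w_h, b); this layout is an illustrative assumption.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate new content

    c = f * c_prev + i * g                      # updated memory cell
    h = o * math.tanh(c)                        # updated hidden state
    return h, c
```

With a strongly positive forget-gate bias and a strongly negative input-gate bias, the cell keeps `c_prev` almost unchanged from step to step, which is how an LSTM can carry information across long spans of a sequence.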
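In sequence modeling tasks, LSTM-based models can be used for sequence-to-sequence conversion, for example in machine translation and speech recognition.
Two typical LSTM sequence modeling models are introduced next: the encoder-decoder model and the attention model.
The encoder-decoder model is a common sequence modeling model made up of two LSTMs, an encoder and a decoder.
The encoder encodes the input sequence into a fixed-length vector, called the context vector, and passes this vector to the decoder.
The decoder uses the context vector and its previously generated outputs to produce the target sequence step by step.
The attention model is an improved sequence modeling model that addresses the information compression problem of the encoder-decoder model.
The attention mechanism lets the decoder selectively "attend" to different parts of the encoder's output sequence at every generation step, which gives a better model of the input sequence.
By forming a weighted combination of the inputs at different positions, the attention model captures the important information in the sequence more accurately.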
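The weighted combination described above can be sketched as follows. This is a minimal example under stated assumptions: the text does not say how attention scores are computed, so dot-product scoring followed by a softmax is assumed here, and the names `attend`, `query` and `encoder_states` are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return attention weights and the weighted sum of the encoder states.

    Scoring by dot product is an assumption made for this sketch.
    """
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Hypothetical encoder outputs for a three-step input sequence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]                          # hypothetical decoder state at one step

weights, context = attend(query, encoder_states)
```

Unlike the single fixed-length context vector of the plain encoder-decoder model, `context` is recomputed for every decoder step, so positions that match the current decoder state receive larger weights.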
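Besides the encoder-decoder model and the attention model introduced above, there are other variants of LSTM sequence modeling models.
For example, a model with a bidirectional LSTM takes both past and future context into account and is used for tasks such as sequence labeling.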
Excellent Simple English Self-Introductions
Six Excellent Simple English Self-Introductions

Simple English Self-Introduction 1:
I am a third-year master's student majoring in automation at Shanghai Jiao Tong University, P. R. China. With tremendous interest in Industrial Engineering, I am writing to apply for acceptance into your Ph.D. graduate program.
Education background
In 1995, I entered the Nanjing University of Science & Technology (NUST), widely considered one of China's best engineering schools. During my undergraduate study, my academic record remained distinguished within the whole department. I was granted the First Class Prize every semester, and my overall GPA (89.5/100) ranked No. 1 among 113 students. In 1999, I earned the privilege of entering the graduate program with the admission test waived. I selected Shanghai Jiao Tong University to continue my study because of its strong reputation in Combinatorial Optimization and Network Scheduling, where my research interest lies.
During my graduate study, my overall GPA (3.77/4.0) ranked in the top 5% of the department. In the second semester, I became a teaching assistant, a position given only to talented and mature students. This year, I won the Acer Scholarship as the one and only candidate in my department, which is the ultimate accolade my university awards to distinguished students. Presently, I am preparing my graduation thesis and trying for the honor of Excellent Graduation Thesis.
Research experience and academic activity
When I was a sophomore, I joined the Association of AI Enthusiasts and began to narrow down my interests for future research. In 1997, I participated in simulation tool development for the scheduling system in Prof. Wang's lab. Using OpenGL and Matlab, I designed a simulation program for a transportation scheduling system. It is now widely used by different research groups at NUST. In 1998, I took on and completed a sewage analysis and disposal project for the Nanjing sewage treatment plant. This was my first experience of converting a laboratory idea into a commercial product.
In 1999, I joined the research group of the distinguished Professor Yu-Geng Xi, which works on network flow problem solving and heuristic algorithm research. Soon I was engaged in the Fu Dan Gene Database Design. My duty was to pick out the useful information from different kinds of gene matching formats. Through the comparison and analysis of many heuristic algorithms, I introduced an improved evolutionary algorithm, the Multi-population Genetic Algorithm. By dividing a whole population into several sub-populations, this improved algorithm can effectively prevent GA from local convergence and promote various evolutionary orientations. It also proved more efficient than SGA in experiments. In the second semester, I joined the workshop-scheduling research at the Shanghai Heavy Duty Tyre plant. The scheduling was designed for the rubber-making process, which covers not only discrete but also continuous circumstances. To strike a balance between optimization quality and time cost, I proposed a Dynamic Layered Scheduling method based on hybrid Petri nets. The practical application showed that the average makespan was shortened by a large margin. I also published two papers in core journals on this idea. Recently, I have been doing research for Bao Steel on the Composite Prediction of the Electrical Power system, assisted by data mining technology. I am trying to combine the Decision Tree with Receding Optimization to provide a new solution for the Composite Predictive Problem. This project is now under way.
Besides, in July 2000, I got the opportunity to give a lecture in English at the Asia Control Conference (ASCC), which is one of the top-level conferences in the world in the area of control and automation. In my senior year, I met Prof. Xiao-Song Lin, a visiting professor of mathematics from the University of California-Riverside, and I learned graph theory from him for my network research.
These experiences all rapidly expanded my knowledge of English and my understanding of western culture.
I hope to study in depth
In retrospect, I find myself standing on a solid basis in both theory and experience, which has prepared me for the Ph.D. program. My future research interests include: Network Scheduling Problem, Heuristic Algorithm research (especially in GA and Neural network), Supply chain network research, Hybrid system performance analysis with Petri nets and Data Mining.

Simple English Self-Introduction 2:
Please allow me to introduce myself to you. My name is __. I am 20 years old and was born in Qinghai Province. I am very honored to have this opportunity to come here for an interview. Now I will briefly introduce myself. I graduated from the Qinghai First Vocational School. My specialty is computer software applications. In the past year I have acquired the basic expertise. I like reading books, watching movies, listening to music, and so on. I think I am honest and optimistic, and I can also work under great pressure. That is probably all. Thank you.

Simple English Self-Introduction 3:
It is really my honor to have this opportunity for an interview, and I hope I can perform well today. I am confident that I can succeed. Now I will introduce myself briefly. I am 26 years old and was born in Shandong Province. I graduated from Qingdao University, where my major was electronics, and I received my bachelor's degree in 2003. I spent most of my time on study; I have passed CET-4 and CET-6, and I acquired the basic knowledge of my major during my time at school. In July 2003, I began work for a small private company in Qingdao as a technical support engineer. Because I was capable of more responsibility, I decided to change my job. In August 2004, I left Qingdao for Beijing and worked for a foreign enterprise as an automation software test engineer. Because I want to change my working environment, I would like to find a job which is more challenging.
Moreover, Motorola is a global company, so I feel I can gain the most from working in this kind of company environment. That is the reason why I have come here to compete for this position. I think I am a good team player and a person of great honesty to others. I am also able to work under great pressure. That's all. Thank you for giving me the chance.

Simple English Self-Introduction 4:
Hello, boys and girls! My name is Forest. I am 12 years old, and I am a girl. There are three people in my family, and I am the youngest. My father and my mother love me very much, and I love them too. I am very happy at home. I love my family very much. And you?
I have big eyes and a big mouth, and I have short hair. I have a lot of hobbies. For example, I like playing football, basketball, badminton and table tennis, and I like painting watercolours and landscapes. I really like it. And you?
Now I am in the sixth grade. I like Chinese class very much. It is great fun, and I love going to Chinese class. My favourite is P.E.; it makes me very happy. And you?
My dream is to be a computer engineer when I grow up, because I like playing on computers very much. And you?
I study very hard, and I like studying very much. One summer holiday, my mother, my father and I went mountain climbing together. Suddenly my father and my mother heard, "Oh, help me! Help me!" You may ask, "Why?" I will tell you: "Because I had suddenly fallen down."

Simple English Self-Introduction 5:
Hello everyone. My name is … I am a student in Grade Eight. I am an outgoing, lovely girl and I am very popular with my friends and my classmates. I have a best friend, Xiao Hai. She is very interesting and lovely too. She often tells funny stories and always makes me laugh. We often play together. I like action movies. I think they are exciting and interesting. I often go to the movies with my friends on weekends. I can also play the violin and have won many prizes in competitions. I take violin lessons twice a week. It is a little hard for me but I am very happy, because I have a dream.
I want to be a great violinist one day. Thank you.

Simple English Self-Introduction 6:
Hello, everyone! My name is ___. In 20__ I joined the Department of Information Management of __ School to study economic information, and I will graduate in __ 200_. I come from a rural area and am used to hardship, so I know that an education does not come easily. At school I studied economic information management systematically, while using my spare time to take part-time jobs. I worked as a clerk, and I did sales work for two companies, among other things. In this way I accumulated some experience in social practice. From __ 200_ I worked at a YISHION clothing store as a shop clerk and in sales. In my six months there I made some achievements: I completed the sales tasks set by the company and also helped colleagues at the other branch sell some clothes. In my work I had successes and failures, but either way they all added to my experience and lessons. Through these six months of work, I not only improved my personal ability and professional knowledge but also learned team spirit and its importance. I believe that, with six months of work experience, I understand more clearly what I have to do and how to do a good job. I believe that this experience will be a good start for the next stage of my life. Thank you!
A Discrete-Time Projection Neural Network for Solving Degenerate Convex Quadratic Optimization

Zican Zhang^1 · Chuandong Li^1 · Xing He^1 · Tingwen Huang^2

Received: 24 June 2015 / Revised: 21 March 2016 / Accepted: 23 March 2016
© Springer Science+Business Media New York 2016

Corresponding author: Chuandong Li (licdswu@). Zican Zhang (zhangzican123@), Xing He (hexingdoc@), Tingwen Huang (tingwen.huang@)
1 School of Electronics and Information Engineering, Southwest University, Chongqing 400715, People's Republic of China
2 Department of Mathematics, Texas A&M University at Qatar, P.O. Box 23874, Doha, Qatar

Abstract This paper presents a discrete-time neural network to solve convex degenerate quadratic optimization problems. Under certain conditions, it is shown that the proposed neural network is stable in the sense of Lyapunov and globally convergent to an optimal solution. Compared with the existing continuous-time neural networks for degenerate quadratic optimization, the proposed neural network is more suitable for hardware implementation. Results of two experiments are given to illustrate the effectiveness of the proposed neural network.

Keywords Degenerate quadratic optimization · Discrete-time neural network · Convex optimization · Algorithm convergence

1 Introduction

Optimization problems play an important role in science and engineering applications, such as regression analysis [14], signal and image processing [29], manufacturing, optimal control [3,15,16,30,34] and system identification. In many cases, real-time solutions of optimization problems are required, but traditional algorithms are not efficient enough because their computing time depends greatly on the structure and dimension of the problem. One promising approach to real-time optimization is to use artificial neural networks, which can be much faster than traditional optimization algorithms.
In [10], Tank and Hopfield first proposed a neural network for solving linear optimization problems. Their heuristic work has inspired many researchers to investigate neural
networks for solving optimization problems [19]. Soon after that, Kennedy and Chua [12] presented a recurrent neural network for solving constrained nonlinear programming problems. It uses both the gradient method and the penalty parameter method, but its equilibrium points correspond only to approximate optimal solutions. Since then, neural network methods for optimization have had a profound influence; examples include the primal–dual neural networks [28], projection neural networks [2,4,6,7,18,20–22,33,35], one-layer neural networks [8,9] and non-smooth neural networks [5,17,24,25].
Recently, the projection method has been widely used to build neural networks for constrained optimization, as in [23] and [26]. In particular, neural networks for solving non-degenerate quadratic optimization problems [35] have been widely studied. However, few works use neural networks to solve degenerate quadratic optimization problems, in which the coefficient matrix of the quadratic term is degenerate, that is, not of full rank. Degenerate quadratic programming problems are nevertheless more general. Specifically, Tao and Cao [27] developed a projection neural network to solve degenerate convex quadratic programming problems, and Xue and Bian [35] developed a continuous-time projection neural network for solving degenerate convex quadratic problems with general linear constraints.
There exist a few discrete-time neural networks for solving constrained optimization problems [23,26,27,36]. The discrete-time projection neural networks with global exponential stability in [23] and [26] were proposed for solving strictly convex quadratic optimization problems. In [36], a discrete-time primal–dual neural network derived from its corresponding continuous-time neural network was proposed for solving nonlinear convex optimization problems. For this kind of problem, many researchers have focused on continuous-time neural network methods. In
this paper, we solve degenerate quadratic optimization problems using a discrete-time neural network approach and, to the best of our knowledge, few (if any) publications have reported this method in the literature. Continuous-time neural networks are difficult to implement in hardware circuits, whereas discrete-time neural networks are easy to implement. Motivated by this advantage of discrete-time neural networks, we constructed a hardware circuit for simulating the iterative calculation. This is of great significance in engineering applications, where real-time solutions of optimization problems are necessary in many cases. Moreover, based on Lyapunov function theory, it is shown that under certain conditions the proposed neural network is Lyapunov stable and can find the optimal solution effectively for many optimization problems.
The rest of this paper is organized as follows. In Sect. 2, the discrete-time projection neural network is described. In Sect. 3, we prove the asymptotic stability of the proposed neural network. In Sect. 4, two examples are given to illustrate the good performance of the proposed neural network. Conclusions are found in Sect. 5.

2 Problem Formulation and Model Description

Consider the following quadratic optimization problem:
$$\begin{aligned} \text{minimize}\quad & \tfrac{1}{2}x^{T}Qx + c^{T}x \\ \text{subject to}\quad & b_1 \le Bx \le b_2,\quad d_0 \le x \le h_0, \end{aligned} \tag{1}$$
where $Q \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{m\times n}$, $b_1, b_2 \in \mathbb{R}^{m}$, $d_0, h_0 \in \mathbb{R}^{n}$ and $c \in \mathbb{R}^{n}$. $Q$ is a symmetric positive semidefinite matrix. We only investigate degenerate quadratic optimization problems in this paper. To facilitate the calculation, we use a method similar to that in [32]. First, let
$$E = \begin{pmatrix} I_n \\ B \end{pmatrix},\qquad d = \begin{pmatrix} d_0 \\ b_1 \end{pmatrix},\qquad h = \begin{pmatrix} h_0 \\ b_2 \end{pmatrix},$$
where $I_n \in \mathbb{R}^{n\times n}$ is the identity matrix. Then problem (1) can be rewritten as
$$\begin{aligned} \text{minimize}\quad & \tfrac{1}{2}x^{T}Qx + c^{T}x \\ \text{subject to}\quad & d \le Ex \le h, \end{aligned} \tag{2}$$
where $d = [d_1, d_2, \ldots, d_{n+m}]^{T}$ and $h = [h_1, h_2, \ldots, h_{n+m}]^{T}$.
The Lagrange function of (2) is defined as
$$L(x,u,\eta) = \tfrac{1}{2}x^{T}Qx + c^{T}x - u^{T}(Ex - \eta), \tag{3}$$
where $u \in \mathbb{R}^{m+n}$ is the Lagrange multiplier and $\eta \in X = \{u \in \mathbb{R}^{n+m} \mid d \le u \le h\}$.
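To make the stacked form (2) and the box set $X$ concrete, the following minimal sketch builds $E$, $d$ and $h$ from the data of problem (1) and clips a vector onto $X$. The helper names and the toy numbers are hypothetical and are not taken from the paper; the only fact relied on is that the Euclidean projection onto a box is componentwise clipping.

```python
def stack_constraints(B, b1, b2, d0, h0):
    """Build E = [I_n; B], d = [d0; b1] and h = [h0; b2] as in problem (2)."""
    n = len(d0)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    E = identity + [list(row) for row in B]
    d = list(d0) + list(b1)
    h = list(h0) + list(b2)
    return E, d, h


def matvec(E, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(e_ij * x_j for e_ij, x_j in zip(row, x)) for row in E]


def project_box(u, d, h):
    """Euclidean projection of u onto X = {u : d <= u <= h}, i.e. clipping."""
    return [min(max(u_i, d_i), h_i) for u_i, d_i, h_i in zip(u, d, h)]


# Hypothetical toy data with n = 2 variables and m = 1 general constraint.
B = [[1.0, -1.0]]
b1, b2 = [0.0], [1.0]
d0, h0 = [0.0, 0.0], [2.0, 2.0]

E, d, h = stack_constraints(B, b1, b2, d0, h0)

x = [1.5, 0.2]
Ex = matvec(E, x)                              # first n entries are x, the rest are Bx
feasible = all(lo <= v <= hi for lo, v, hi in zip(d, Ex, h))
clipped = project_box(Ex, d, h)                # nearest point of X to Ex
```

Here `x` satisfies its own bounds but violates $b_1 \le Bx \le b_2$, so `feasible` is false and only the last component of `Ex` changes under clipping. Note that clipping `Ex` onto $X$ is not the same as projecting `x` onto the feasible set of (1).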
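According to the well-known saddle point theorem [1], $x^{*}$ is an optimal solution of (2) if and only if there exist $u^{*}$ and $\eta^{*}$ satisfying
$$L(x^{*},u,\eta^{*}) \le L(x^{*},u^{*},\eta^{*}) \le L(x,u^{*},\eta). \tag{4}$$
The derivation is similar to that in [23]. From the left inequality in (4), we have
$$Ex^{*} = \eta^{*}. \tag{5}$$
From the right inequality in (4), we get
$$f(x^{*}) - f(x) \le 0 \tag{6}$$
and
$$(u^{*})^{T}(\eta - \eta^{*}) \ge 0 \quad \forall \eta \in X. \tag{7}$$
According to the projection formulation [13], (7) is equivalent to
$$\eta^{*} = P_X(\eta^{*} - u^{*}), \tag{8}$$
where $P_X(u) = [P_X(u_1), P_X(u_2), \ldots, P_X(u_{n+m})]^{T}$ is a projection function and, for $i = 1, 2, \ldots, n+m$,
$$P_X(u_i) = \begin{cases} d_i, & u_i < d_i, \\ u_i, & d_i \le u_i \le h_i, \\ h_i, & u_i > h_i. \end{cases}$$
In convex optimization a stationary point must be an optimal point, and we already have $f(x^{*}) \le f(x)$ for all $x \in \mathbb{R}^{n}$; therefore
$$\nabla f(x^{*}) = Qx^{*} + c - E^{T}u^{*} = 0. \tag{9}$$
So, if there exists $(x^{*},u^{*},\eta^{*})$, it certainly satisfies
$$Qx^{*} + c - E^{T}u^{*} = 0,\qquad Ex^{*} = P_X(Ex^{*} - u^{*}). \tag{10}$$
Based on the above analysis, the projection equation can be designed as follows:
$$x^{*} = P_X\big(x^{*} - (Qx^{*} + c - E^{T}u^{*})\big),\qquad Ex^{*} = P_X(Ex^{*} - u^{*}), \tag{11}$$
that is,
$$\begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix} = P_X\left[ \begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix} - \left( \begin{pmatrix} Q & -I_n & -B^{T} \end{pmatrix} \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix} + c \right) \right] \tag{12}$$
and
$$\begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \\ B & 0_{m\times n} & 0_{m\times m} \end{pmatrix} \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix} = P_X\left[ \begin{pmatrix} I_n & -I_n & 0_{n\times m} \\ B & 0_{m\times n} & -I_m \end{pmatrix} \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix} \right], \tag{13}$$
where $x^{*} = (x_1, \ldots, x_n)^{T}$, $u^{*} = ((v^{*})^{T}, (w^{*})^{T})^{T}$, $v \in \mathbb{R}^{n}$, $w \in \mathbb{R}^{m}$; $0_{n\times n}$, $0_{n\times m}$ and $0_{m\times m}$ are zero matrices, and $I_n$, $I_m$ are identity matrices. Combining the left-hand sides and the right-hand sides of (12) and (13), respectively, we get a new projection equation:
$$\begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \\ B & 0_{m\times n} & 0_{m\times m} \\ I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} \begin{pmatrix} x \\ v \\ w \end{pmatrix} = P_Y\left[ \begin{pmatrix} I_n & -I_n & 0_{n\times m} \\ B & 0_{m\times n} & -I_m \\ I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} \begin{pmatrix} x \\ v \\ w \end{pmatrix} - \left( \begin{pmatrix} 0_{n\times n} & 0_{n\times n} & 0_{n\times m} \\ 0_{m\times n} & 0_{m\times n} & 0_{m\times m} \\ Q & -I_n & -B^{T} \end{pmatrix} \begin{pmatrix} x \\ v \\ w \end{pmatrix} + \begin{pmatrix} 0_{n\times 1} \\ 0_{m\times 1} \\ c \end{pmatrix} \right) \right], \tag{14}$$
where $Y = \{u \in \mathbb{R}^{2n+m} \mid \acute{d} \le u \le \acute{h}\}$, $\acute{d} = d_0 + \infty$, $\acute{h} = h_0 + \infty$. Let
$$y^{*} = \begin{pmatrix} x^{*} \\ v^{*} \\ w^{*} \end{pmatrix},\quad M = \begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \\ B & 0_{m\times n} & 0_{m\times m} \\ I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix},\quad W = \begin{pmatrix} I_n & -I_n & 0_{n\times m} \\ B & 0_{m\times n} & -I_m \\ I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix},$$
$$A = \begin{pmatrix} 0_{n\times n} & 0_{n\times n} & 0_{n\times m} \\ 0_{m\times n} & 0_{m\times n} & 0_{m\times m} \\ Q & -I_n & -B^{T} \end{pmatrix},\quad C = \begin{pmatrix} 0_{n\times 1} \\ 0_{m\times 1} \\ c \end{pmatrix}.$$
Then the above projection equation can be transformed into the form
$$My(k) = P_Y[Wy(k) - (Ay(k) + C)]. \tag{15}$$
If $y^{*}$ is an equilibrium point of the above projection equation, then
$$x^{*} = \begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} y^{*}$$
is an optimal solution of (2). Next, the proposed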
neural network for solving problem (2) can be designed as follows.
State equation:
$$y(k+1) = y(k) + \lambda H\{P_Y[Wy(k) - (Ay(k) + C)] - My(k)\} \tag{16}$$
Output equation:
$$x(k) = \begin{pmatrix} I_n & 0_{n\times n} & 0_{n\times m} \end{pmatrix} y(k) \tag{17}$$
Here $H \in \mathbb{R}^{(2n+m)\times(2n+m)}$ is a matrix and $\lambda > 0$ is a positive constant. The asymptotic stability of the network depends on this constant, and the proof is given later. $y(k) \in \mathbb{R}^{2n+m}$ is the state variable, $x(k) \in \mathbb{R}^{n}$ is the output variable, and
$$H = \begin{pmatrix} I_n + Q & (I_n + Q)B^{T} & (I_n + Q)^{2} \\ I_n & 0_{n\times m} & -I_n \\ 0_{m\times n} & I_m & -B \end{pmatrix}.$$
Since $Q$ is symmetric positive semidefinite, $Q + I_n$ is positive definite; hence $H$ is a nonsingular matrix. Therefore, $y(k+1) = y(k)$ if and only if (15) holds.

Fig. 1 The principle diagram of model (16)
Fig. 2 Module of the matrix W

Remark 1 Figure 1 shows the principle diagram of model (16), and Fig. 2 details the module $W_1 y(k)$. In Fig. 2, $R_{i1}, R_{i2}, \ldots, R_{in}$ and $R_{s1}, R_{s2}, \ldots, R_{sn}$ are all variable resistors, and the input signal of $R_{i1}$ is $y(k)_1$, the first element of the column vector $y(k)$. The ratio $R_{s1}/R_{i1}$ sets the gain of amplifier $A_1$, and the resistances can be changed as the optimization problem varies. We let $W_{11} = R_{s1}/R_{i1}$, where $W_{11}$ is the element in the first row and first column of the matrix $W$. The modules for the other coefficient matrices, for example $H$ and $A$, work on the same principle: every element of a coefficient matrix is realized as a ratio of resistances. The triggered subsystem is a rising-edge-triggered D flip-flop. The clock signal is a square wave that can be adjusted as required. When a rising edge arrives, the triggered subsystem outputs a signal that activates one iteration: after $W_1 y(k)$ passes through Amplifier 1, the signal becomes $W_1 y(k) - (A_1 y(k) + C_1)$; the signal then passes through the Projection Module, Amplifier 2 and the module of matrix $H$. After one cycle of operation, the fed-back signal represents $y(k+1)$. The iteration continues with each new rising edge.

3 Stability Analysis

In this section, the stability of the proposed neural network will be analyzed
and proved.

Lemma 1 [35] (2007) Let $E_P$ be the set of solutions of (16), and suppose that $y^{*} \in E_P$ is one of the equilibrium points of (16). Let $\delta = Wy(k) - (Ay(k) + C)$. Then the following inequality holds:
$$(y(k) - y^{*})^{T}(2M - W + A)^{T}[P_Y[\delta] - My(k)] \le -\|P_Y[\delta] - My(k)\|^{2} - (y(k) - y^{*})^{T}(M - W + A)^{T}M(y(k) - y^{*}). \tag{18}$$

Theorem 1 For model (16), if the constant $\lambda$ satisfies $0 < \lambda \le \dfrac{2}{\|H\|^{2}\lambda_{\max}(P)}$, then the neural network is asymptotically stable, where
$$P = \begin{pmatrix} (I_n + Q)^{-1} & 0_{n\times n} & 0_{n\times m} \\ 0_{n\times n} & I_n & 0_{n\times m} \\ 0_{m\times n} & 0_{m\times n} & I_m \end{pmatrix},$$
$\lambda_{\max}(P)$ is the maximum eigenvalue of the matrix $P$, and $\|H\|^{2}$ is the square of the two-norm of $H$.

Proof See "Appendix".

4 Illustrative Examples

In this section, two examples are given to illustrate the effectiveness of the proposed neural network.

Example 1 Consider the following degenerate optimization problem [31]:
$$\begin{aligned} \text{minimize}\quad & x_1^{2} + x_2^{2} + x_3^{2} - 2x_1x_2 - x_3 \\ \text{subject to}\quad & 0 \le x_1, x_2, x_3, \\ & 2 \le x_1 - x_2 \le 3, \\ & 0 \le x_1 + x_3 \le 3, \end{aligned} \tag{19}$$
where $x = (x_1, x_2, x_3)^{T}$. This problem has the solution set $\Omega^{*} = \{(x_1, x_2, 0.5) \mid x_1 - x_2 = 2\}$. The coefficients of the problem are
$$Q = \begin{pmatrix} 2 & -2 & 0 \\ -2 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix},\quad B = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix},\quad d_0 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\quad b_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix},\quad b_2 = \begin{pmatrix} 3 \\ 3 \end{pmatrix},\quad c = \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}.$$
The maximal eigenvalue of the matrix $P$ is 1, so the feasible region of the constant $\lambda$ is $0 < \lambda \le 2/\|H\|^{2} = 0.0028$. In this example, $\lambda = 0.0023$. The random initial points and the corresponding limit points are given in Table 1, and the convergence result is shown in Fig. 3. If we let the step size $\lambda = 0.0328$, which is beyond its feasible region, with the initial point $(5,1,3,6,7,9,2,4)$, then the system is unstable, as shown in Fig. 4.

Table 1
Initial point | Limit point | Optimal solution
$(0,0,0,1,0,0,0,0)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$
$(1,2,3,4,5,6,7,8)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$
$(8,2,2,2,2,3,4,7)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$
$(5,1,3,6,7,9,2,4)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$
$(3,1,2,8,4,6,5,7)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$
$-(1,1,2,3,4,6,5,7)^{T}$ | $(2.00,0.00,0.50,0.00,0.00,0.00,4.00,0.00)^{T}$ | $(2.00,0.00,0.50)^{T}$

Fig. 3 Convergence result of
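Example 1
Fig. 4 Behavior of the network under the unsuitable step size

Example 2 Consider the following degenerate quadratic programming problem [11]:
$$\begin{aligned} \text{minimize}\quad & \tfrac{1}{4}x_1^{2} - x_1x_2 + x_2^{2} + \tfrac{1}{4}x_3^{2} - \tfrac{1}{2}x_3 \\ \text{subject to}\quad & -1 \le x_1, x_2, x_3 \le 1, \\ & 3 \le x_1 + x_2 + 4x_3 \le 5, \\ & 1 \le x_1 - x_2 + 2x_3 \le 3, \end{aligned} \tag{20}$$
where $x = (x_1, x_2, x_3)^{T}$. This problem has the solution set
$$\begin{cases} x_1^{*} = 2x_2^{*}, \\ -2/3 < x_1^{*} < 2/3, \\ x_3^{*} = 1. \end{cases}$$
The coefficients of the problem are
$$Q = \begin{pmatrix} 0.5 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0.5 \end{pmatrix},\quad B = \begin{pmatrix} 1 & 1 & 4 \\ 1 & -1 & 2 \end{pmatrix},\quad d_0 = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix},\quad h_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\quad b_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix},\quad b_2 = \begin{pmatrix} 5 \\ 3 \end{pmatrix},\quad c = \begin{pmatrix} 0 \\ 0 \\ -0.5 \end{pmatrix}.$$
The maximal eigenvalue of the matrix $P$ is 1, so the feasible region of the constant $\lambda$ in this example is $0 < \lambda \le 2/\|H\|^{2} = 0.0106$, and the value used is $\lambda = 0.0073$. The random initial points and the corresponding limit points are shown in Table 2, and the convergence result in Fig. 5 shows that the neural network can find the optimal solution effectively.

Table 2
Initial point | Limit point | Optimal solution
$(0,0,0,3,1,0,0,0)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$
$(1,2,3,4,5,6,7,8)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$
$(0,1,2,4,2,3,6,1)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$
$(4,2,5,6,2,3,4,7)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$
$-(1,1,2,3,4,6,5,7)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$
$(5,1,3,6,7,9,2,4)^{T}$ | $(0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00)^{T}$ | $(0.00,0.00,1.00)^{T}$

If we let the step size $\lambda = 0.041$, which is beyond its feasible region, with the initial point $-(1,1,2,3,4,6,5,7)$, the network is unstable, as shown in Fig. 6.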
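The stated solution set of Example 2 can be sanity-checked numerically. The sketch below is not the network (16); it only evaluates the objective $\tfrac{1}{2}x^{T}Qx + c^{T}x$ on a few points of the stated solution set and on a coarse grid of feasible points. It assumes $c = (0, 0, -0.5)^{T}$ and the $Q$, $B$, $d_0$, $h_0$, $b_1$, $b_2$ listed for Example 2; the function names and the grid resolution are hypothetical choices.

```python
# Data of Example 2; c = (0, 0, -0.5) is an assumption about the coefficients.
Q = [[0.5, -1.0, 0.0],
     [-1.0, 2.0, 0.0],
     [0.0, 0.0, 0.5]]
c = [0.0, 0.0, -0.5]
B = [[1.0, 1.0, 4.0],
     [1.0, -1.0, 2.0]]
b1, b2 = [3.0, 1.0], [5.0, 3.0]
d0, h0 = [-1.0] * 3, [1.0] * 3


def objective(x):
    """Evaluate 0.5 * x^T Q x + c^T x."""
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(3) for j in range(3))
    return 0.5 * quad + sum(c_i * x_i for c_i, x_i in zip(c, x))


def is_feasible(x, tol=1e-9):
    """Check d0 <= x <= h0 and b1 <= Bx <= b2 up to a small tolerance."""
    in_box = all(lo - tol <= x_i <= hi + tol for lo, x_i, hi in zip(d0, x, h0))
    Bx = [sum(b_ij * x_j for b_ij, x_j in zip(row, x)) for row in B]
    in_band = all(lo - tol <= v <= hi + tol for lo, v, hi in zip(b1, Bx, b2))
    return in_box and in_band


# Points of the stated solution set: x1 = 2 * x2, x3 = 1, |x1| < 2/3.
candidates = [[2.0 * t, t, 1.0] for t in (-0.3, 0.0, 0.3)]
candidate_values = [objective(x) for x in candidates]

# Coarse grid over the box; keep only the feasible points.
grid = [i / 10.0 for i in range(-10, 11)]
feasible_values = [objective([x1, x2, x3])
                   for x1 in grid for x2 in grid for x3 in grid
                   if is_feasible([x1, x2, x3])]
grid_minimum = min(feasible_values)
```

Under these assumptions every candidate is feasible and attains the same objective value as the grid minimum, which agrees with the limit point $(0, 0, 1)$ reported in Table 2. A grid search of this kind is only a plausibility check, not a proof of optimality.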
With the unsuitable step size $\lambda = 0.041$, the optimal solution cannot be found.

Fig. 5 Convergence result of Example 2
Fig. 6 Behavior of the network under the unsuitable step size

5 Conclusions

In this paper, a discrete-time neural network has been proposed for solving convex degenerate quadratic optimization problems, and its asymptotic stability is guaranteed if the parameter $\lambda$ is in its feasible region. Simulation results on numerical examples demonstrate the effectiveness of the proposed neural network for solving degenerate quadratic optimization problems. The circuit we built prepares the ground for a hardware implementation. In the near future, we will try to build neural networks with discontinuous activation functions for solving optimization problems.

Acknowledgments This work was supported by the Natural Science Foundation of China (Grant Nos. 61403313, 61374078), the Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2015jcyjBX0052) and the Natural Science Foundation Project of Chongqing CSTC (No. cstc2014jcyjA40014). This publication was made possible by NPRP Grant No. NPRP 4-1162-1-181 from the Qatar National Research Fund (a member of Qatar Foundation).

Appendix

Proof of Theorem 1 First, we construct the Lyapunov function
$$V(k) = [y(k) - y^{*}]^{T}P[y(k) - y^{*}],\qquad k \in \mathbb{R}^{n}_{+},$$
where $y^{*}$ is an equilibrium point. Calculating the difference and using the state equation (16) gives
$$\begin{aligned}
\Delta V(k) &= V(k+1) - V(k) \\
&= [y(k+1) - y^{*}]^{T}P[y(k+1) - y^{*}] - [y(k) - y^{*}]^{T}P[y(k) - y^{*}] \\
&= y(k+1)^{T}Py(k+1) - y(k+1)^{T}Py^{*} - (y^{*})^{T}Py(k+1) + (y^{*})^{T}Py^{*} \\
&\quad - \big[y(k)^{T}Py(k) - y(k)^{T}Py^{*} - (y^{*})^{T}Py(k) + (y^{*})^{T}Py^{*}\big] \\
&= y(k+1)^{T}Py(k+1) - y(k)^{T}Py(k) - [y(k+1) - y(k)]^{T}Py^{*} - (y^{*})^{T}P[y(k+1) - y(k)] \\
&= \{y(k) + \lambda H[P_Y[\delta] - My(k)]\}^{T}P\{y(k) + \lambda H[P_Y[\delta] - My(k)]\} - y(k)^{T}Py(k) \\
&\quad - \{\lambda H[P_Y[\delta] - My(k)]\}^{T}Py^{*} - (y^{*})^{T}P\{\lambda H[P_Y[\delta] - My(k)]\} \\
&= y(k)^{T}P\lambda H[P_Y[\delta] - My(k)] + \{\lambda H[P_Y[\delta] - My(k)]\}^{T}Py(k) \\
&\quad + \{\lambda H[P_Y[\delta] - My(k)]\}^{T}P\{\lambda H[P_Y[\delta] - My(k)]\} \\
&\quad - \{\lambda H[P_Y[\delta] - My(k)]\}^{T}Py^{*} - (y^{*})^{T}P\{\lambda H[P_Y[\delta] - My(k)]\} \\
&= \lambda[y(k) - y^{*}]^{T}P\{H[P_Y[\delta] - My(k)]\} + \lambda\{H[P_Y[\delta] - My(k)]\}^{T}P[y(k) - y^{*}] \\
&\quad + \lambda^{2}\{H[P_Y[\delta] - My(k)]\}^{T}P\{H[P_Y[\delta] - My(k)]\} \\
&= 2\lambda[y(k) - y^{*}]^{T}PH[P_Y[\delta] - My(k)] + \lambda^{2}\{H[P_Y[\delta] - My(k)]\}^{T}P\{H[P_Y[\delta] - My(k)]\}.
\end{aligned} \tag{21}$$
By simple calculation,
$$PH = \begin{pmatrix} I_n & B^{T} & I_n + Q \\ I_n & 0_{n\times m} & -I_n \\ 0_{m\times n} & I_m & -B \end{pmatrix} = (2M - W + A)^{T}.$$
Using the inequality in Lemma 1, it follows that
$$\begin{aligned}
& 2\lambda[y(k) - y^{*}]^{T}PH[P_Y[\delta] - My(k)] + \lambda^{2}\{H[P_Y[\delta] - My(k)]\}^{T}P\{H[P_Y[\delta] - My(k)]\} \\
&\le -2\lambda\|P_Y[\delta] - My(k)\|^{2} - 2\lambda[y(k) - y^{*}]^{T}(M - W + A)^{T}M[y(k) - y^{*}] \\
&\quad + \lambda^{2}\{H[P_Y[\delta] - My(k)]\}^{T}P\{H[P_Y[\delta] - My(k)]\} \\
&\le -2\lambda\|P_Y[\delta] - My(k)\|^{2} - 2\lambda[y(k) - y^{*}]^{T}(M - W + A)^{T}M[y(k) - y^{*}] \\
&\quad + \lambda^{2}\|H\|^{2}\lambda_{\max}(P)\|P_Y[\delta] - My(k)\|^{2} \\
&= \big(-2\lambda + \lambda^{2}\|H\|^{2}\lambda_{\max}(P)\big)\|P_Y[\delta] - My(k)\|^{2} - 2\lambda[y(k) - y^{*}]^{T}(M - W + A)^{T}M[y(k) - y^{*}],
\end{aligned} \tag{22}$$
while
$$(M - W + A)^{T}M = \begin{pmatrix} Q & 0_{n\times n} & 0_{n\times m} \\ 0_{n\times n} & 0_{n\times n} & 0_{n\times m} \\ 0_{m\times n} & 0_{m\times n} & 0_{m\times m} \end{pmatrix}$$
is a positive semidefinite matrix. Therefore, if the constant $\lambda$ satisfies
$$-2\lambda + \lambda^{2}\|H\|^{2}\lambda_{\max}(P) \le 0, \tag{23}$$
that is,
$$0 < \lambda \le \frac{2}{\|H\|^{2}\lambda_{\max}(P)}, \tag{24}$$
we obtain
$$\begin{aligned}
\Delta V(k) &= 2\lambda[y(k) - y^{*}]^{T}PH[P_Y[\delta] - My(k)] + \lambda^{2}\{H[P_Y[\delta] - My(k)]\}^{T}P\{H[P_Y[\delta] - My(k)]\} \\
&\le \big(-2\lambda + \lambda^{2}\|H\|^{2}\lambda_{\max}(P)\big)\|P_Y[\delta] - My(k)\|^{2} - 2\lambda[y(k) - y^{*}]^{T}(M - W + A)^{T}M[y(k) - y^{*}] \\
&\le 0.
\end{aligned} \tag{25}$$
Therefore, the network (16) is globally convergent. This completes the proof.

References
1. M.S. Bazaraa, C.M. Shetty, Nonlinear programming: theory and algorithms. J. Oper. Res. Soc. (1979). doi:10.2307/3009696
2. H. Che et al., An intelligent method of swarm neural networks for equalities-constrained nonconvex optimization. Neurocomputing 167, 569–577 (2015)
3. B. Chen et al., Distributed fusion estimation with communication bandwidth constraints. IEEE Trans. Autom. Control 60(5), 1398–1403 (2015)
4. L. Cheng et al., A neutral-type delayed projection neural network for solving nonlinear variational inequalities. IEEE Trans. Circuits Syst. II Express Briefs 55(8), 806–810 (2008)
5. L. Cheng et al., Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks. IEEE Trans. Neural Netw. 22(5), 714–726 (2011)
6. X. He et al., An inertial projection neural network for solving variational inequalities. IEEE Trans. Cybern. (2016). doi:10.1109/TCYB.2016.2523541
7. X. He et al., Neural networks for solving Nash equilibrium problem in application of multiuser power control. Neural Netw. 57(9), 73–78 (2014)
8. X. He et al., A recurrent neural network for solving bilevel linear programming problem. IEEE Trans. Neural Netw. Learn. Syst. 25(4), 824–830 (2014)
9. X. He et al., Neural network for solving convex quadratic bilevel programming problems. Neural Netw. 51(3), 17–25 (2014)
10. J.J. Hopfield, D.W. Tank, "Neural" computation of decisions in optimization problems. Biol. Cybern. 52(3), 141–152 (1985)
11. M.J., Improvement of the convergence speed of a discrete-time recurrent neural network for quadratic optimization with general linear constraints. Neurocomputing 144, 493–500 (2014)
12. M.P. Kennedy, L.O. Chua, Neural networks for nonlinear programming. IEEE Trans. Circuits Syst. 35(5), 554–562 (1988)
13. D. Kinderlehrer, G. Stampacchia, An introduction to variational inequalities and their applications. SIAM 31, 7–22 (1980)
14. S. Lee et al., Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of … Landslides 4(4), 327–338 (2007)
15. H. Li et al., Control of nonlinear networked systems with packet dropouts: interval type-2 fuzzy model-based approach. IEEE Trans. Cybern. (2014). doi:10.1109/TCYB.2371814
16. C. Li et al., Distributed event-triggered scheme for economic dispatch in smart grids. IEEE Trans. Ind. Inf. (2015). doi:10.1109/TII.2479558
17. C. Li et al., A generalized Hopfield network for nonsmooth constrained convex optimization: Lie derivative approach. IEEE Trans. Neural Netw. Learn. Syst. 27(2), 308–321 (2016)
18. C. Li, X. Yu, W. Yu, Efficient computation for sparse load shifting in demand side management. IEEE Trans. Smart Grid (2016). doi:10.1109/TSG.2521377
19. Q. Liu, J. Wang, A one-layer recurrent neural network with a discontinuous activation function for linear programming. Neural Comput. 20(5), 1366–1383 (2008)
20. Q. Liu, J. Wang, A projection neural network for constrained quadratic minimax optimization. IEEE Trans. Neural Netw. Learn. Syst. 26(11), 2891–2900 (2015)
21. Q. Liu, J. Wang, A second-order multi-agent network for bound-constrained distributed optimization. IEEE Trans. Autom. Control 60(12), 3310–3315 (2015)
22. Q. Liu et al., One-layer continuous- and discrete-time projection neural networks for solving variational inequalities and related optimization problems. IEEE Trans. Neural Netw. Learn. Syst. 25(7), 1308–1318 (2014)
23. Q. Liu, J. Cao, Global exponential stability of discrete-time recurrent neural network for solving quadratic programming problems subject to linear constraints. Neurocomputing 74(17), 3494–3501 (2011)
24. S. Qin et al., Neural network for constrained nonsmooth optimization using Tikhonov regularization. Neural Netw. 63(3), 272–281 (2015)
25. S. Qin, X. Xue, A two-layer recurrent neural network for nonsmooth convex optimization problems. IEEE Trans. Neural Netw. Learn. Syst. 26(6), 1149–1160 (2015)
26. K.C. Tan et al., Global exponential stability of discrete-time neural networks for constrained quadratic optimization. Neurocomputing 56(3), 399–406 (2004)
27. Q. Tao, J. Cao, D. Sun, A simple and high performance neural network for quadratic programming problems. Appl. Math. Comput. 124(2), 251–260 (2001)
28. J. Wang, Primal and dual neural networks for shortest-path routing. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 28(6), 864–869 (1998)
29. W.E. Weideman, Neural network image processing system. U.S. Patent 4,941,122 (1990)
30. Z. Wu et al., Sampled-data fuzzy control of chaotic systems based on T–S fuzzy model. IEEE Trans. Fuzzy Syst. 22(1), 153–163 (2014)
31. Y. Xia, J. Wang, A general methodology for designing globally convergent optimization neural networks. IEEE Trans. Neural Netw. 9(6), 1331–1343 (1998)
32. Y. Xia et al., A recurrent neural network with exponential convergence for solving convex quadratic program and related linear piecewise equations. Neural Netw. 17(7), 1003–1015 (2004)
33. Y. Xia et al., A projection neural network and its application to constrained optimization problems. IEEE Trans. Circuits Syst. I Regul. Pap. 49(4), 447–458 (2002)
34. Y. Xu et al., Asynchronous dissipative state estimation for stochastic complex networks with quantized jumping coupling and uncertain measurements. IEEE Trans. Neural Netw. Learn. Syst. (2015). doi:10.1109/TNNLS.2503772
35. X. Xue, W. Bian, A project neural network for solving degenerate convex quadratic program. Neurocomputing 70(13), 2449–2459 (2007)
36. M. Yashtini, A. Malek, A discrete-time neural network for solving nonlinear convex problems with hybrid constraints. Appl. Math. Comput. 195(2), 576–584 (2008)