1960 Lanczos Linear Systems in Self-Adjoint Form


Tsinghua University, Advanced Numerical Computation (Li Jin), Practical Exercise 1 (Conjugate Gradient (CG), Lanczos, and MINRES Algorithms)

1. Purpose. This computational exercise builds on a working knowledge of the conjugate gradient (CG), Lanczos, and MINRES algorithms and investigates their numerical properties further, focusing on how the eigenvalues and eigenvectors of the coefficient matrix influence convergence.

2. Procedure. (I) Generating the matrices. (1) Construct five diagonal matrices D_i of order 100 as follows:

D_1: d_j = 1 for j = 1, ..., 20; d_j = 1 + 0.1(j - 20) for j = 21, ..., 100.
D_2: d_j = 1 for j = 1, ..., 20; d_j = 1 + (j - 20) for j = 21, ..., 100.
D_3: d_j = j for j = 1, ..., 80; d_j = 81 for j = 81, ..., 100.
D_4: d_j = j for j = 1, ..., 40; d_j = 41 for j = 41, ..., 60; d_j = 41 + (j - 60) for j = 61, ..., 100.
D_5: d_j = j for j = 1, ..., 100.

Denote by λ_1^i and λ_n^i the eigenvalues of D_i of largest and smallest modulus, respectively. The spectra then have the following features: D_1 has many eigenvalues close to λ_n^i and a small ratio λ_1^i/λ_n^i; D_2 has many eigenvalues close to λ_n^i and a large ratio; D_3 has many eigenvalues close to λ_1^i and a large ratio; D_4 has many eigenvalues clustered near the middle of the spectrum and a large ratio; D_5 has uniformly distributed eigenvalues and a large ratio.

(2) Randomly generate ten matrices M_j of order 100, M_j = fix(100*rand(100)), and compute their QR factorizations to obtain Q_j and R_j. This yields fifty symmetric matrices A_ij = Q_j^T D_i Q_j, whose eigenvalues are exactly the diagonal entries of D_i; if these are all positive, A_ij is positive definite, and the columns of Q_j are the corresponding eigenvectors.

Combined with (1), it follows that every A_ij is symmetric positive definite.

(II) Computational results. In all of the computations below, the exact solution is taken as x_exact = ones(100,1) and the initial guess as x_0 = zeros(100,1); the right-hand side b_k is obtained from A_ij x_exact = b_k (the algorithms are run to a tolerance of 1e-10).
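A minimal sketch of how the test problems above might be generated and solved, assuming NumPy/SciPy; the `rtol` keyword of `cg`/`minres` assumes SciPy 1.12 or later (older releases use `tol`), and the random seed and iteration cap are illustrative choices, not part of the original exercise.

```python
import numpy as np
from scipy.sparse.linalg import cg, minres

n = 100
rng = np.random.default_rng(0)
j = np.arange(1, n + 1)

# Diagonal spectra D_1 ... D_5 as described above.
D = [
    np.where(j <= 20, 1.0, 1.0 + 0.1 * (j - 20)),                     # D_1
    np.where(j <= 20, 1.0, 1.0 + (j - 20)),                           # D_2
    np.where(j <= 80, j, 81.0),                                       # D_3
    np.where(j <= 40, j, np.where(j <= 60, 41.0, 41.0 + (j - 60))),   # D_4
    j.astype(float),                                                  # D_5
]

x_exact = np.ones(n)
x0 = np.zeros(n)

for i, d in enumerate(D, start=1):
    M = np.floor(100 * rng.random((n, n)))        # analogue of fix(100*rand(100))
    Q, _ = np.linalg.qr(M)
    A = Q.T @ np.diag(d) @ Q                      # symmetric, eigenvalues = entries of d
    b = A @ x_exact                               # right-hand side b_k
    x_cg, _ = cg(A, b, x0=x0, rtol=1e-10, maxiter=2000)
    x_mr, _ = minres(A, b, x0=x0, rtol=1e-10, maxiter=2000)
    print(f"D_{i}: CG error {np.linalg.norm(x_cg - x_exact):.2e}, "
          f"MINRES error {np.linalg.norm(x_mr - x_exact):.2e}")
```

Comparing the iteration counts across D_1 to D_5 then exposes how the eigenvalue distribution, not just the condition number, governs convergence.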

Analysis Methods for Interconnect Delay

Liu Kun [1], Zheng Yun [2], Huang Daojun [3], Hou Jinsong [4]. [2][4] Beijing CEC Huada Electronic Design Co., Ltd.; [1][3] School of Mechano-Electronic Engineering, Xidian University. Abstract: As process technology has reached the deep-submicron regime, the delay contributed by interconnect has grown steadily; it now exceeds gate delay and has become the dominant component of circuit delay.

Interconnect delay has therefore become a problem that must be addressed in integrated circuit design.

Extensive, in-depth research has been carried out on this topic and many methods have been proposed.

This paper surveys the various methods for estimating and analyzing interconnect delay, explains their underlying principles, compares their strengths and weaknesses, and indicates their ranges of applicability.

1 Introduction. As chip fabrication technology moves into the deep-submicron regime, the influence of interconnect delay keeps growing; it has surpassed gate delay and become the main component of circuit delay.

High-speed interconnect effects such as ringing, reflection, crosstalk, and waveform distortion have severely degraded system performance.

Interconnect delay analysis has thus become an unavoidable problem in integrated circuit design.

Circuit simulators such as SPICE and AS/X are excellent delay-analysis tools [1-2].

They compute interconnect delay very accurately, but they are computationally inefficient, especially for linear circuits.

Since interconnect is a linear circuit, a class of model-order-reduction techniques [3-5], such as AWE [3], has been used to compute interconnect delay.

These achieve accuracy comparable to simulation with much higher efficiency.

However, they suffer from stability and conservativeness problems, and using them for delay estimation in the early design stages is still expensive.

Consequently, delay metrics that are both efficient and easy to implement have become a research focus, provided their accuracy and fidelity remain reasonable.

In 1948, Elmore [6] proposed an expression for computing the time at which the transient step response reaches 50% of its final value.

Its principle is to approximate the time at which a monotonic step-response waveform reaches 50% of its final value by the mean of the impulse response, that is, by its first moment.

The Elmore delay is the first moment m_1 of the impulse response.

It is sometimes quite inaccurate because it ignores the resistive shielding of downstream capacitance.

To obtain higher accuracy, the higher-order moments m_2, m_3, ... must be used.

Kahng and Muddu [7] proposed three delay metrics, all of which use the first three circuit moments m_1, m_2, and m_3.
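Since the Elmore delay is the first moment m_1 of the impulse response, for an RC tree it reduces to a sum, over the resistors on the path from the driver to the node of interest, of each resistance times its total downstream capacitance. The sketch below illustrates this for a simple three-segment RC ladder; the component values are assumed purely for illustration.

```python
# Hypothetical 3-segment RC ladder driven at the source: R[i], C[i] per segment.
R = [10.0, 10.0, 10.0]       # segment resistances in ohms (illustrative)
C = [2e-12, 2e-12, 2e-12]    # node capacitances in farads (illustrative)

def elmore_delay(R, C, node):
    """Elmore delay (first moment m_1) at `node` of an RC ladder:
    each resistor R[i] on the path carries the current that charges every
    capacitor downstream of it, so T_D = sum_i R[i] * sum_{k >= i} C[k]."""
    delay = 0.0
    for i in range(node + 1):            # resistors on the path source -> node
        delay += R[i] * sum(C[i:])       # downstream capacitance seen by R[i]
    return delay

for n in range(len(R)):
    print(f"node {n}: Elmore delay = {elmore_delay(R, C, n) * 1e12:.1f} ps")
```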

Classic Works by Leading Figures in the Control Field

Roughly speaking, modern control theory comprises three parts: linear system theory, optimal control theory, and system identification.

All three were developed during the 1960s and 1970s.

The most classic works on linear systems are Chen Qizong's Linear System Theory and Design and Kailath's Linear Systems.

Both are designated textbooks for control programs at top American universities. Reading the former gives you a sense of the beauty and completeness of a coherent framework; the latter is harder to follow, but if you persist to the end your theoretical grounding will improve greatly. In the American control community there is no one who does not know Kailath.

Optimal control theory grew out of operations research and related mathematics, and there are many books on the subject.

One recommendation here is The Robust Maximum Principle, which starts from a high theoretical level and covers nearly all of the essentials of optimal control.

The classic on system identification is by L. Ljung of Sweden; building on it, the classic on adaptive control is Goodwin's Adaptive Filtering, Prediction and Control.

The classic work on robust control is without question Professor Kemin Zhou's Robust and Optimal Control, a designated textbook in graduate schools across the United States; by incomplete counts, Professor Zhou's SCI papers and this book have been cited more than six thousand times, and he is an idol for those of us working on robust control. On nonlinear control: Hassan K. Khalil's Nonlinear Systems, which won the IFAC Control Engineering Textbook Prize, is available in Chinese translation.

I hope these recommendations help those of you aiming to make your mark in the control field, both in theoretical grounding and in rounding out your knowledge. Addendum: the "L. Ljung" above refers to Lennart Ljung. Another work on nonlinear control is Alberto Isidori's Nonlinear Control Systems, Third Edition, hailed in the field as the "bible" of nonlinear control.

A Chinese translation is published by Publishing House of Electronics Industry; the book is slightly more difficult than Hassan K. Khalil's Nonlinear Systems.

P.S.: It should be Chen Tongwen's Linear System Theory and Design; the "Chen Qizong" mentioned by the original poster appears to be a Hong Kong-style romanization.

Survey of Neural Networks and Deep Learning (Deep Learning, 15 May 2014)

Draft:Deep Learning in Neural Networks:An OverviewTechnical Report IDSIA-03-14/arXiv:1404.7828(v1.5)[cs.NE]J¨u rgen SchmidhuberThe Swiss AI Lab IDSIAIstituto Dalle Molle di Studi sull’Intelligenza ArtificialeUniversity of Lugano&SUPSIGalleria2,6928Manno-LuganoSwitzerland15May2014AbstractIn recent years,deep artificial neural networks(including recurrent ones)have won numerous con-tests in pattern recognition and machine learning.This historical survey compactly summarises relevantwork,much of it from the previous millennium.Shallow and deep learners are distinguished by thedepth of their credit assignment paths,which are chains of possibly learnable,causal links between ac-tions and effects.I review deep supervised learning(also recapitulating the history of backpropagation),unsupervised learning,reinforcement learning&evolutionary computation,and indirect search for shortprograms encoding deep and large networks.PDF of earlier draft(v1):http://www.idsia.ch/∼juergen/DeepLearning30April2014.pdfLATEX source:http://www.idsia.ch/∼juergen/DeepLearning30April2014.texComplete BIBTEXfile:http://www.idsia.ch/∼juergen/bib.bibPrefaceThis is the draft of an invited Deep Learning(DL)overview.One of its goals is to assign credit to those who contributed to the present state of the art.I acknowledge the limitations of attempting to achieve this goal.The DL research community itself may be viewed as a continually evolving,deep network of scientists who have influenced each other in complex ways.Starting from recent DL results,I tried to trace back the origins of relevant ideas through the past half century and beyond,sometimes using“local search”to follow citations of citations backwards in time.Since not all DL publications properly acknowledge earlier relevant work,additional global search strategies were employed,aided by consulting numerous neural network experts.As a result,the present draft mostly consists of references(about800entries so far).Nevertheless,through an expert selection bias I may have missed important work.A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century.For these reasons,the present draft should be viewed as merely a snapshot of an ongoing credit assignment process.To help improve it,please do not hesitate to send corrections and suggestions to juergen@idsia.ch.Contents1Introduction to Deep Learning(DL)in Neural Networks(NNs)3 2Event-Oriented Notation for Activation Spreading in FNNs/RNNs3 3Depth of Credit Assignment Paths(CAPs)and of Problems4 4Recurring Themes of Deep Learning54.1Dynamic Programming(DP)for DL (5)4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL (6)4.3Occam’s Razor:Compression and Minimum Description Length(MDL) (6)4.4Learning Hierarchical Representations Through Deep SL,UL,RL (6)4.5Fast Graphics Processing Units(GPUs)for DL in NNs (6)5Supervised NNs,Some Helped by Unsupervised NNs75.11940s and Earlier (7)5.2Around1960:More Neurobiological Inspiration for DL (7)5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) (8)5.41979:Convolution+Weight Replication+Winner-Take-All(WTA) (8)5.51960-1981and Beyond:Development of Backpropagation(BP)for NNs (8)5.5.1BP for Weight-Sharing Feedforward NNs(FNNs)and Recurrent NNs(RNNs)..95.6Late1980s-2000:Numerous Improvements of NNs (9)5.6.1Ideas for Dealing with Long Time Lags and Deep CAPs (10)5.6.2Better BP Through Advanced Gradient Descent (10)5.6.3Discovering Low-Complexity,Problem-Solving NNs 
(11)5.6.4Potential Benefits of UL for SL (11)5.71987:UL Through Autoencoder(AE)Hierarchies (12)5.81989:BP for Convolutional NNs(CNNs) (13)5.91991:Fundamental Deep Learning Problem of Gradient Descent (13)5.101991:UL-Based History Compression Through a Deep Hierarchy of RNNs (14)5.111992:Max-Pooling(MP):Towards MPCNNs (14)5.121994:Contest-Winning Not So Deep NNs (15)5.131995:Supervised Recurrent Very Deep Learner(LSTM RNN) (15)5.142003:More Contest-Winning/Record-Setting,Often Not So Deep NNs (16)5.152006/7:Deep Belief Networks(DBNs)&AE Stacks Fine-Tuned by BP (17)5.162006/7:Improved CNNs/GPU-CNNs/BP-Trained MPCNNs (17)5.172009:First Official Competitions Won by RNNs,and with MPCNNs (18)5.182010:Plain Backprop(+Distortions)on GPU Yields Excellent Results (18)5.192011:MPCNNs on GPU Achieve Superhuman Vision Performance (18)5.202011:Hessian-Free Optimization for RNNs (19)5.212012:First Contests Won on ImageNet&Object Detection&Segmentation (19)5.222013-:More Contests and Benchmark Records (20)5.22.1Currently Successful Supervised Techniques:LSTM RNNs/GPU-MPCNNs (21)5.23Recent Tricks for Improving SL Deep NNs(Compare Sec.5.6.2,5.6.3) (21)5.24Consequences for Neuroscience (22)5.25DL with Spiking Neurons? (22)6DL in FNNs and RNNs for Reinforcement Learning(RL)236.1RL Through NN World Models Yields RNNs With Deep CAPs (23)6.2Deep FNNs for Traditional RL and Markov Decision Processes(MDPs) (24)6.3Deep RL RNNs for Partially Observable MDPs(POMDPs) (24)6.4RL Facilitated by Deep UL in FNNs and RNNs (25)6.5Deep Hierarchical RL(HRL)and Subgoal Learning with FNNs and RNNs (25)6.6Deep RL by Direct NN Search/Policy Gradients/Evolution (25)6.7Deep RL by Indirect Policy Search/Compressed NN Search (26)6.8Universal RL (27)7Conclusion271Introduction to Deep Learning(DL)in Neural Networks(NNs) Which modifiable components of a learning system are responsible for its success or failure?What changes to them improve performance?This has been called the fundamental credit assignment problem(Minsky, 1963).There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses(Sec.6.8).The present survey,however,will focus on the narrower,but now commercially important,subfield of Deep Learning(DL)in Artificial Neural Networks(NNs).We are interested in accurate credit assignment across possibly many,often nonlinear,computational stages of NNs.Shallow NN-like models have been around for many decades if not centuries(Sec.5.1).Models with several successive nonlinear layers of neurons date back at least to the1960s(Sec.5.3)and1970s(Sec.5.5). 
An efficient gradient descent method for teacher-based Supervised Learning(SL)in discrete,differentiable networks of arbitrary depth called backpropagation(BP)was developed in the1960s and1970s,and ap-plied to NNs in1981(Sec.5.5).BP-based training of deep NNs with many layers,however,had been found to be difficult in practice by the late1980s(Sec.5.6),and had become an explicit research subject by the early1990s(Sec.5.9).DL became practically feasible to some extent through the help of Unsupervised Learning(UL)(e.g.,Sec.5.10,5.15).The1990s and2000s also saw many improvements of purely super-vised DL(Sec.5).In the new millennium,deep NNs havefinally attracted wide-spread attention,mainly by outperforming alternative machine learning methods such as kernel machines(Vapnik,1995;Sch¨o lkopf et al.,1998)in numerous important applications.In fact,supervised deep NNs have won numerous of-ficial international pattern recognition competitions(e.g.,Sec.5.17,5.19,5.21,5.22),achieving thefirst superhuman visual pattern recognition results in limited domains(Sec.5.19).Deep NNs also have become relevant for the more generalfield of Reinforcement Learning(RL)where there is no supervising teacher (Sec.6).Both feedforward(acyclic)NNs(FNNs)and recurrent(cyclic)NNs(RNNs)have won contests(Sec.5.12,5.14,5.17,5.19,5.21,5.22).In a sense,RNNs are the deepest of all NNs(Sec.3)—they are general computers more powerful than FNNs,and can in principle create and process memories of ar-bitrary sequences of input patterns(e.g.,Siegelmann and Sontag,1991;Schmidhuber,1990a).Unlike traditional methods for automatic sequential program synthesis(e.g.,Waldinger and Lee,1969;Balzer, 1985;Soloway,1986;Deville and Lau,1994),RNNs can learn programs that mix sequential and parallel information processing in a natural and efficient way,exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past75years.The rest of this paper is structured as follows.Sec.2introduces a compact,event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs.Sec.3introduces the concept of Credit Assignment Paths(CAPs)to measure whether learning in a given NN application is of the deep or shallow type.Sec.4lists recurring themes of DL in SL,UL,and RL.Sec.5focuses on SL and UL,and on how UL can facilitate SL,although pure SL has become dominant in recent competitions(Sec.5.17-5.22). 
Sec.5is arranged in a historical timeline format with subsections on important inspirations and technical contributions.Sec.6on deep RL discusses traditional Dynamic Programming(DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs,as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs,including successful policy gradient and evolutionary methods.2Event-Oriented Notation for Activation Spreading in FNNs/RNNs Throughout this paper,let i,j,k,t,p,q,r denote positive integer variables assuming ranges implicit in the given contexts.Let n,m,T denote positive integer constants.An NN’s topology may change over time(e.g.,Fahlman,1991;Ring,1991;Weng et al.,1992;Fritzke, 1994).At any given moment,it can be described as afinite subset of units(or nodes or neurons)N= {u1,u2,...,}and afinite set H⊆N×N of directed edges or connections between nodes.FNNs are acyclic graphs,RNNs cyclic.Thefirst(input)layer is the set of input units,a subset of N.In FNNs,the k-th layer(k>1)is the set of all nodes u∈N such that there is an edge path of length k−1(but no longer path)between some input unit and u.There may be shortcut connections between distant layers.The NN’s behavior or program is determined by a set of real-valued,possibly modifiable,parameters or weights w i(i=1,...,n).We now focus on a singlefinite episode or epoch of information processing and activation spreading,without learning through weight changes.The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.During an episode,there is a partially causal sequence x t(t=1,...,T)of real values that I call events.Each x t is either an input set by the environment,or the activation of a unit that may directly depend on other x k(k<t)through a current NN topology-dependent set in t of indices k representing incoming causal connections or links.Let the function v encode topology information and map such event index pairs(k,t)to weight indices.For example,in the non-input case we may have x t=f t(net t)with real-valued net t= k∈in t x k w v(k,t)(additive case)or net t= k∈in t x k w v(k,t)(multiplicative case), where f t is a typically nonlinear real-valued activation function such as tanh.In many recent competition-winning NNs(Sec.5.19,5.21,5.22)there also are events of the type x t=max k∈int (x k);some networktypes may also use complex polynomial activation functions(Sec.5.3).x t may directly affect certain x k(k>t)through outgoing connections or links represented through a current set out t of indices k with t∈in k.Some non-input events are called output events.Note that many of the x t may refer to different,time-varying activations of the same unit in sequence-processing RNNs(e.g.,Williams,1989,“unfolding in time”),or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events.During an episode,the same weight may get reused over and over again in topology-dependent ways,e.g.,in RNNs,or in convolutional NNs(Sec.5.4,5.8).I call this weight sharing across space and/or time.Weight sharing may greatly reduce the NN’s descriptive complexity,which is the number of bits of information required to describe the NN (Sec.4.3).In Supervised Learning(SL),certain NN output events x t may be associated with teacher-given,real-valued labels or targets d t yielding errors e t,e.g.,e t=1/2(x t−d t)2.A typical goal of supervised NN training is tofind weights that yield 
episodes with small total error E,the sum of all such e t.The hope is that the NN will generalize well in later episodes,causing only small errors on previously unseen sequences of input events.Many alternative error functions for SL and UL are possible.SL assumes that input events are independent of earlier output events(which may affect the environ-ment through actions causing subsequent perceptions).This assumption does not hold in the broaderfields of Sequential Decision Making and Reinforcement Learning(RL)(Kaelbling et al.,1996;Sutton and Barto, 1998;Hutter,2005)(Sec.6).In RL,some of the input events may encode real-valued reward signals given by the environment,and a typical goal is tofind weights that yield episodes with a high sum of reward signals,through sequences of appropriate output actions.Sec.5.5will use the notation above to compactly describe a central algorithm of DL,namely,back-propagation(BP)for supervised weight-sharing FNNs and RNNs.(FNNs may be viewed as RNNs with certainfixed zero weights.)Sec.6will address the more general RL case.3Depth of Credit Assignment Paths(CAPs)and of ProblemsTo measure whether credit assignment in a given NN application is of the deep or shallow type,I introduce the concept of Credit Assignment Paths or CAPs,which are chains of possibly causal links between events.Let usfirst focus on SL.Consider two events x p and x q(1≤p<q≤T).Depending on the appli-cation,they may have a Potential Direct Causal Connection(PDCC)expressed by the Boolean predicate pdcc(p,q),which is true if and only if p∈in q.Then the2-element list(p,q)is defined to be a CAP from p to q(a minimal one).A learning algorithm may be allowed to change w v(p,q)to improve performance in future episodes.More general,possibly indirect,Potential Causal Connections(PCC)are expressed by the recursively defined Boolean predicate pcc(p,q),which in the SL case is true only if pdcc(p,q),or if pcc(p,k)for some k and pdcc(k,q).In the latter case,appending q to any CAP from p to k yields a CAP from p to q(this is a recursive definition,too).The set of such CAPs may be large but isfinite.Note that the same weight may affect many different PDCCs between successive events listed by a given CAP,e.g.,in the case of RNNs, or weight-sharing FNNs.Suppose a CAP has the form(...,k,t,...,q),where k and t(possibly t=q)are thefirst successive elements with modifiable w v(k,t).Then the length of the suffix list(t,...,q)is called the CAP’s depth (which is0if there are no modifiable links at all).This depth limits how far backwards credit assignment can move down the causal chain tofind a modifiable weight.1Suppose an episode and its event sequence x1,...,x T satisfy a computable criterion used to decide whether a given problem has been solved(e.g.,total error E below some threshold).Then the set of used weights is called a solution to the problem,and the depth of the deepest CAP within the sequence is called the solution’s depth.There may be other solutions(yielding different event sequences)with different depths.Given somefixed NN topology,the smallest depth of any solution is called the problem’s depth.Sometimes we also speak of the depth of an architecture:SL FNNs withfixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers.Certain SL RNNs withfixed weights for all connections except those to output units(Jaeger,2001;Maass et al.,2002; Jaeger,2004;Schrauwen et al.,2007)have a maximal problem depth of1,because only thefinal links in the corresponding CAPs 
are modifiable.In general,however,RNNs may learn to solve problems of potentially unlimited depth.Note that the definitions above are solely based on the depths of causal chains,and agnostic of the temporal distance between events.For example,shallow FNNs perceiving large“time windows”of in-put events may correctly classify long input sequences through appropriate output events,and thus solve shallow problems involving long time lags between relevant events.At which problem depth does Shallow Learning end,and Deep Learning begin?Discussions with DL experts have not yet yielded a conclusive response to this question.Instead of committing myself to a precise answer,let me just define for the purposes of this overview:problems of depth>10require Very Deep Learning.The difficulty of a problem may have little to do with its depth.Some NNs can quickly learn to solve certain deep problems,e.g.,through random weight guessing(Sec.5.9)or other types of direct search (Sec.6.6)or indirect search(Sec.6.7)in weight space,or through training an NNfirst on shallow problems whose solutions may then generalize to deep problems,or through collapsing sequences of(non)linear operations into a single(non)linear operation—but see an analysis of non-trivial aspects of deep linear networks(Baldi and Hornik,1994,Section B).In general,however,finding an NN that precisely models a given training set is an NP-complete problem(Judd,1990;Blum and Rivest,1992),also in the case of deep NNs(S´ıma,1994;de Souto et al.,1999;Windisch,2005);compare a survey of negative results(S´ıma, 2002,Section1).Above we have focused on SL.In the more general case of RL in unknown environments,pcc(p,q) is also true if x p is an output event and x q any later input event—any action may affect the environment and thus any later perception.(In the real world,the environment may even influence non-input events computed on a physical hardware entangled with the entire universe,but this is ignored here.)It is possible to model and replace such unmodifiable environmental PCCs through a part of the NN that has already learned to predict(through some of its units)input events(including reward signals)from former input events and actions(Sec.6.1).Its weights are frozen,but can help to assign credit to other,still modifiable weights used to compute actions(Sec.6.1).This approach may lead to very deep CAPs though.Some DL research is about automatically rephrasing problems such that their depth is reduced(Sec.4). 
In particular,sometimes UL is used to make SL problems less deep,e.g.,Sec.5.10.Often Dynamic Programming(Sec.4.1)is used to facilitate certain traditional RL problems,e.g.,Sec.6.2.Sec.5focuses on CAPs for SL,Sec.6on the more complex case of RL.4Recurring Themes of Deep Learning4.1Dynamic Programming(DP)for DLOne recurring theme of DL is Dynamic Programming(DP)(Bellman,1957),which can help to facili-tate credit assignment under certain assumptions.For example,in SL NNs,backpropagation itself can 1An alternative would be to count only modifiable links when measuring depth.In many typical NN applications this would not make a difference,but in some it would,e.g.,Sec.6.1.be viewed as a DP-derived method(Sec.5.5).In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth(Sec.6.2).DP algorithms are also essen-tial for systems that combine concepts of NNs and graphical models,such as Hidden Markov Models (HMMs)(Stratonovich,1960;Baum and Petrie,1966)and Expectation Maximization(EM)(Dempster et al.,1977),e.g.,(Bottou,1991;Bengio,1991;Bourlard and Morgan,1994;Baldi and Chauvin,1996; Jordan and Sejnowski,2001;Bishop,2006;Poon and Domingos,2011;Dahl et al.,2012;Hinton et al., 2012a).4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL Another recurring theme is how UL can facilitate both SL(Sec.5)and RL(Sec.6).UL(Sec.5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning.In particular,codes that describe the original data in a less redundant or more compact way can be fed into SL(Sec.5.10,5.15)or RL machines(Sec.6.4),whose search spaces may thus become smaller(and whose CAPs shallower)than those necessary for dealing with the raw data.UL is closely connected to the topics of regularization and compression(Sec.4.3,5.6.3). 
4.3Occam’s Razor:Compression and Minimum Description Length(MDL) Occam’s razor favors simple solutions over complex ones.Given some programming language,the prin-ciple of Minimum Description Length(MDL)can be used to measure the complexity of a solution candi-date by the length of the shortest program that computes it(e.g.,Solomonoff,1964;Kolmogorov,1965b; Chaitin,1966;Wallace and Boulton,1968;Levin,1973a;Rissanen,1986;Blumer et al.,1987;Li and Vit´a nyi,1997;Gr¨u nwald et al.,2005).Some methods explicitly take into account program runtime(Al-lender,1992;Watanabe,1992;Schmidhuber,2002,1995);many consider only programs with constant runtime,written in non-universal programming languages(e.g.,Rissanen,1986;Hinton and van Camp, 1993).In the NN case,the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view(e.g.,MacKay,1992;Buntine and Weigend,1991;De Freitas,2003), and to high generalization performance(e.g.,Baum and Haussler,1989),without overfitting the training data.Many methods have been proposed for regularizing NNs,that is,searching for solution-computing, low-complexity SL NNs(Sec.5.6.3)and RL NNs(Sec.6.7).This is closely related to certain UL methods (Sec.4.2,5.6.4).4.4Learning Hierarchical Representations Through Deep SL,UL,RLMany methods of Good Old-Fashioned Artificial Intelligence(GOFAI)(Nilsson,1980)as well as more recent approaches to AI(Russell et al.,1995)and Machine Learning(Mitchell,1997)learn hierarchies of more and more abstract data representations.For example,certain methods of syntactic pattern recog-nition(Fu,1977)such as grammar induction discover hierarchies of formal rules to model observations. The partially(un)supervised Automated Mathematician/EURISKO(Lenat,1983;Lenat and Brown,1984) continually learns concepts by combining previously learnt concepts.Such hierarchical representation learning(Ring,1994;Bengio et al.,2013;Deng and Yu,2014)is also a recurring theme of DL NNs for SL (Sec.5),UL-aided SL(Sec.5.7,5.10,5.15),and hierarchical RL(Sec.6.5).Often,abstract hierarchical representations are natural by-products of data compression(Sec.4.3),e.g.,Sec.5.10.4.5Fast Graphics Processing Units(GPUs)for DL in NNsWhile the previous millennium saw several attempts at creating fast NN-specific hardware(e.g.,Jackel et al.,1990;Faggin,1992;Ramacher et al.,1993;Widrow et al.,1994;Heemskerk,1995;Korkin et al., 1997;Urlbe,1999),and at exploiting standard hardware(e.g.,Anguita et al.,1994;Muller et al.,1995; Anguita and Gomes,1996),the new millennium brought a DL breakthrough in form of cheap,multi-processor graphics cards or GPUs.GPUs are widely used for video games,a huge and competitive market that has driven down hardware prices.GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training,where they can speed up learning by a factorof50and more.Some of the GPU-based FNN implementations(Sec.5.16-5.19)have greatly contributed to recent successes in contests for pattern recognition(Sec.5.19-5.22),image segmentation(Sec.5.21), and object detection(Sec.5.21-5.22).5Supervised NNs,Some Helped by Unsupervised NNsThe main focus of current practical applications is on Supervised Learning(SL),which has dominated re-cent pattern recognition contests(Sec.5.17-5.22).Several methods,however,use additional Unsupervised Learning(UL)to facilitate SL(Sec.5.7,5.10,5.15).It does make sense to treat SL and UL in the same section:often gradient-based methods,such as 
BP(Sec.5.5.1),are used to optimize objective functions of both UL and SL,and the boundary between SL and UL may blur,for example,when it comes to time series prediction and sequence classification,e.g.,Sec.5.10,5.12.A historical timeline format will help to arrange subsections on important inspirations and techni-cal contributions(although such a subsection may span a time interval of many years).Sec.5.1briefly mentions early,shallow NN models since the1940s,Sec.5.2additional early neurobiological inspiration relevant for modern Deep Learning(DL).Sec.5.3is about GMDH networks(since1965),perhaps thefirst (feedforward)DL systems.Sec.5.4is about the relatively deep Neocognitron NN(1979)which is similar to certain modern deep FNN architectures,as it combines convolutional NNs(CNNs),weight pattern repli-cation,and winner-take-all(WTA)mechanisms.Sec.5.5uses the notation of Sec.2to compactly describe a central algorithm of DL,namely,backpropagation(BP)for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP1960-1981and beyond.Sec.5.6describes problems encountered in the late1980s with BP for deep NNs,and mentions several ideas from the previous millennium to overcome them.Sec.5.7discusses afirst hierarchical stack of coupled UL-based Autoencoders(AEs)—this concept resurfaced in the new millennium(Sec.5.15).Sec.5.8is about applying BP to CNNs,which is important for today’s DL applications.Sec.5.9explains BP’s Fundamental DL Problem(of vanishing/exploding gradients)discovered in1991.Sec.5.10explains how a deep RNN stack of1991(the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths(CAPs,Sec.3)of depth1000and more.Sec.5.11discusses a particular WTA method called Max-Pooling(MP)important in today’s DL FNNs.Sec.5.12mentions afirst important contest won by SL NNs in1994.Sec.5.13describes a purely supervised DL RNN(Long Short-Term Memory,LSTM)for problems of depth1000and more.Sec.5.14mentions an early contest of2003won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs(2003).Sec.5.15is mostly about Deep Belief Networks(DBNs,2006)and related stacks of Autoencoders(AEs,Sec.5.7)pre-trained by UL to facilitate BP-based SL.Sec.5.16mentions thefirst BP-trained MPCNNs(2007)and GPU-CNNs(2006). Sec.5.17-5.22focus on official competitions with secret test sets won by(mostly purely supervised)DL NNs since2009,in sequence recognition,image classification,image segmentation,and object detection. 
Many RNN results depended on LSTM(Sec.5.13);many FNN results depended on GPU-based FNN code developed since2004(Sec.5.16,5.17,5.18,5.19),in particular,GPU-MPCNNs(Sec.5.19).5.11940s and EarlierNN research started in the1940s(e.g.,McCulloch and Pitts,1943;Hebb,1949);compare also later work on learning NNs(Rosenblatt,1958,1962;Widrow and Hoff,1962;Grossberg,1969;Kohonen,1972; von der Malsburg,1973;Narendra and Thathatchar,1974;Willshaw and von der Malsburg,1976;Palm, 1980;Hopfield,1982).In a sense NNs have been around even longer,since early supervised NNs were essentially variants of linear regression methods going back at least to the early1800s(e.g.,Legendre, 1805;Gauss,1809,1821).Early NNs had a maximal CAP depth of1(Sec.3).5.2Around1960:More Neurobiological Inspiration for DLSimple cells and complex cells were found in the cat’s visual cortex(e.g.,Hubel and Wiesel,1962;Wiesel and Hubel,1959).These cellsfire in response to certain properties of visual sensory inputs,such as theorientation of plex cells exhibit more spatial invariance than simple cells.This inspired later deep NN architectures(Sec.5.4)used in certain modern award-winning Deep Learners(Sec.5.19-5.22).5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) Networks trained by the Group Method of Data Handling(GMDH)(Ivakhnenko and Lapa,1965; Ivakhnenko et al.,1967;Ivakhnenko,1968,1971)were perhaps thefirst DL systems of the Feedforward Multilayer Perceptron type.The units of GMDH nets may have polynomial activation functions imple-menting Kolmogorov-Gabor polynomials(more general than traditional NN activation functions).Given a training set,layers are incrementally grown and trained by regression analysis,then pruned with the help of a separate validation set(using today’s terminology),where Decision Regularisation is used to weed out superfluous units.The numbers of layers and units per layer can be learned in problem-dependent fashion. 
This is a good example of hierarchical representation learning(Sec.4.4).There have been numerous ap-plications of GMDH-style networks,e.g.(Ikeda et al.,1976;Farlow,1984;Madala and Ivakhnenko,1994; Ivakhnenko,1995;Kondo,1998;Kord´ık et al.,2003;Witczak et al.,2006;Kondo and Ueno,2008).5.41979:Convolution+Weight Replication+Winner-Take-All(WTA)Apart from deep GMDH networks(Sec.5.3),the Neocognitron(Fukushima,1979,1980,2013a)was per-haps thefirst artificial NN that deserved the attribute deep,and thefirst to incorporate the neurophysiolog-ical insights of Sec.5.2.It introduced convolutional NNs(today often called CNNs or convnets),where the(typically rectangular)receptivefield of a convolutional unit with given weight vector is shifted step by step across a2-dimensional array of input values,such as the pixels of an image.The resulting2D array of subsequent activation events of this unit can then provide inputs to higher-level units,and so on.Due to massive weight replication(Sec.2),relatively few parameters may be necessary to describe the behavior of such a convolutional layer.Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values.They essentially“down-sample”the competition layer’s input.This helps to create units whose responses are insensitive to small image shifts(compare Sec.5.2).The Neocognitron is very similar to the architecture of modern,contest-winning,purely super-vised,feedforward,gradient-based Deep Learners with alternating convolutional and competition lay-ers(e.g.,Sec.5.19-5.22).Fukushima,however,did not set the weights by supervised backpropagation (Sec.5.5,5.8),but by local un supervised learning rules(e.g.,Fukushima,2013b),or by pre-wiring.In that sense he did not care for the DL problem(Sec.5.9),although his architecture was comparatively deep indeed.He also used Spatial Averaging(Fukushima,1980,2011)instead of Max-Pooling(MP,Sec.5.11), currently a particularly convenient and popular WTA mechanism.Today’s CNN-based DL machines profita lot from later CNN work(e.g.,LeCun et al.,1989;Ranzato et al.,2007)(Sec.5.8,5.16,5.19).5.51960-1981and Beyond:Development of Backpropagation(BP)for NNsThe minimisation of errors through gradient descent(Hadamard,1908)in the parameter space of com-plex,nonlinear,differentiable,multi-stage,NN-related systems has been discussed at least since the early 1960s(e.g.,Kelley,1960;Bryson,1961;Bryson and Denham,1961;Pontryagin et al.,1961;Dreyfus,1962; Wilkinson,1965;Amari,1967;Bryson and Ho,1969;Director and Rohrer,1969;Griewank,2012),ini-tially within the framework of Euler-LaGrange equations in the Calculus of Variations(e.g.,Euler,1744). Steepest descent in such systems can be performed(Bryson,1961;Kelley,1960;Bryson and Ho,1969)by iterating the ancient chain rule(Leibniz,1676;L’Hˆo pital,1696)in Dynamic Programming(DP)style(Bell-man,1957).A simplified derivation of the method uses the chain rule only(Dreyfus,1962).The methods of the1960s were already efficient in the DP sense.However,they backpropagated derivative information through standard Jacobian matrix calculations from one“layer”to the previous one, explicitly addressing neither direct links across several layers nor potential additional efficiency gains due to network sparsity(but perhaps such enhancements seemed obvious to the authors).。
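The passage above describes BP as iterating the chain rule, propagating derivative information through the Jacobian of each layer back to the previous one. A minimal sketch of that view for a two-layer network follows (NumPy; the architecture, squared-error loss, and finite-difference check are illustrative choices, not taken from the survey).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer net: x -> tanh(W1 x) -> W2 h, squared error against target d.
n_in, n_hid, n_out = 5, 4, 3
W1 = rng.standard_normal((n_hid, n_in))
W2 = rng.standard_normal((n_out, n_hid))
x = rng.standard_normal(n_in)
d = rng.standard_normal(n_out)

def loss_and_grads(W1, W2, x, d):
    a1 = W1 @ x
    h = np.tanh(a1)                        # hidden activations
    y = W2 @ h                             # output
    e = y - d
    E = 0.5 * e @ e                        # E = 1/2 ||y - d||^2
    # Backward pass: propagate dE/d(activation) layer by layer via the chain rule.
    delta2 = e                             # dE/dy
    gW2 = np.outer(delta2, h)              # dE/dW2
    delta1 = (W2.T @ delta2) * (1 - h**2)  # dE/da1: Jacobian of layer 2, then tanh'
    gW1 = np.outer(delta1, x)              # dE/dW1
    return E, gW1, gW2

E, gW1, gW2 = loss_and_grads(W1, W2, x, d)

# Finite-difference check of one entry of dE/dW1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (loss_and_grads(W1p, W2, x, d)[0] - E) / eps
print("analytic:", gW1[0, 0], " numeric:", num)
```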

A regularizing L-curve Lanczos method for underdetermined linear systems

A regularizing L-curve Lanczos method forunderdetermined linear systemsS.Morigi,F.Sgallari *,1Department of Mathematics,CIRAM,University of Bologna,Piazza di Porta San Donato 5,I-40127Bologna,ItalyAbstractMany real applications give rise to the solution of underdetermined linear systems of equations with a very ill conditioned matrix A ,whose dimensions are so large as to make solution by direct methods impractical or infeasible.Image reconstruction from projections is a well-known example of such systems.In order to facilitate the com-putation of a meaningful approximate solution,we regularize the linear system,i.e.,we replace it by a nearby system that is better conditioned.The amount of regularization is determined by a regularization parameter.Its optimal value is,in most applications,not known a priori.A well-known method to determine it is given by the L-curve approach.We present an iterative method based on the Lanczos algorithm for inexpensively evaluating an approximation of the points on the L-curve and then determine the value of the optimal regularization parameter which lets us compute an approximate solution of the regularized system of equations.Ó2001Elsevier Science Inc.All rights reserved.Keywords:Ill posed problems;Regularization;L-curve criterion;Lanczos algorithm1.IntroductionIn this article,we describe a new regularizing iterative method for the so-lution of ill conditioned underdetermined linear systems of equationsA x b ;A P R m Ân ;x P R n ;b P R m ;m <n : 1/locate/amcApplied Mathematics and Computation 121(2001)55±73*Corresponding author.E-mail address:sgallari@dm.unibo.it (F.Sgallari).1This research was supported by the University of Bologna,funds for research topics.0096-3003/01/$-see front matter Ó2001Elsevier Science Inc.All rights reserved.PII:S 0096-3003(99)00262-356S.Morigi,F.Sgallari/put.121(2001)55±73Linear systems of this kind arise,for example,in the reconstruction of images from projections.In this application,each column corresponds to one pixel of the image to be reconstructed,and each rowto a``ray''from an emitter to a detector.The matrices that arise are typically large and have many more col-umns than rows.Further,they are generally very ill conditioned,because the linear system is obtained by discretizing an ill posed problem,the inversion of a Radon transform.The ill conditioning makes a solution of(1)very sensitive to perturbations in the right-hand side vector.Such perturbations arise,e.g.,from measurement errors[1,2].In order to be able to compute a meaningful approximate solution of(1),the system has to be regularized,i.e.,the linear system(1)has to be replaced by a linear system of equations with a not very ill conditioned matrix,such that the unique solution of this system yields an acceptable approximate solution of(1). We introduce the matrixB: AA T:The solution of the minimal norm of(1)can be written as x: A T y,where y P R m solves the linear systemB y b: 2 We regularize this system by adding a multiple of the identity matrix to B and obtainB k I y b; 3 where k>0is a regularization parameter.We denote the solution of(3)by y k. The smaller the value of k,the closer the solution y k of(3)is to the minimal norm solution of(2),and the more ill conditioned the matrix B k I.We wish to®nd a value of k in(3),such that the matrix B k I is not very ill conditioned, and the solution of(3)is an acceptable approximate solution of(2).Generally, it is not known a priori for which values of k these requirements are satis®ed. 
The approximate solution x k A T y k of(1),obtained by®rst solving(3)for y k,can also be computed by standard Tikhonov regularization of system(1). This approach,however,as has been shown in[3],requires more computer storage and arithmetic work when the linear system has many more columns than rows.Let us consider the minimization problemfk B yÀb k2 k k A T y k2g; 4 miny P R mwhere kÁk denotes the Euclidean norm here and in the sequel.The solution y k of(4)satis®es the equationB T B k AA T y B T b: 5S.Morigi,F.Sgallari/put.121(2001)55±7357 The comparison of(3)±(5)shows the equivalence of problem(3)and(5) when B is a nonsingular matrix and this justi®es what follows.Let components of the right-hand side vector b be contaminated by errors, e.g.,due to inaccurate measurements,and b exact and e denote the unperturbed right-hand side and the error,respectively.If the norm of the error e is known, then the value of k can be chosen so that the associated solution satis®es the Morozov discrepancy principle[3].However,in many applications the norm of e is not explicitly known,and other approaches to determine a value of the regularization parameter k have to be employed.We will show that the plot of the curve(k B y kÀb k;k A T y k k)is shaped roughly like an``L'',and the optimal value of the regularization parameter k corresponds to the``vertex''point of the L-curve.This idea of L-curve was observed by Lawson and Hanson[4]to determine the regularizing parameter k in the Tikhonov approach,while the choice of the optimal regularization pa-rameter as the vertex point was suggested recently in[5,6],and successively considered in[7].The computation of the L-curve is quite costly for large problems;in fact, the determination of a point on the L-curve requires that both the norm of the regularized approximate solution and the norm of the corresponding residual vector be available.Therefore,usually only a fewpoints on the L-curve are computed,and these values,which we will refer to as the discrete L-curve,are used to determine a value of the regularization parameter.In[8],a newmethod for approximately determining the points on the L-curve is proposed.They showed how rectangular regions that contain points on the L-curve(L-ribbon)can be determined fairly inexpensively for several values of the regularization parameter without having to solve the corresponding regularized minimization problem but using Gauss and Gauss±Radau quadr-ature rules.When the L-ribbon is narrow,it is possible to infer the approximate location of the vertex of the L-curve from the shape of the L-ribbon.In order to solve problem(4),or equivalently(3),we cannot follow the Calvetti et al.[8]approach because their assumptions are not satis®ed due to the term A T in the regularizing functional.We present an iterative procedure that determines successive encapsulated intervals containing the optimal regularization parameter corresponding to the vertex of the L-curve.At each step,some newpoints on the L-curve are computed by a fewsteps of the Lanczos bidiagonalization algorithm,while a new interval for k is de-termined by a minimum distance from the local origin criterion.The®nal value obtained for the regularization parameter is used to compute an approximation of the solution of the regularized system(3).This article is organized as follows.In Section2,we give a brief descrip-tion of properties of the L-curve and we show how to use it to solve prob-lem(4).Section3explains the relation between the bidiagonalization Lanczosalgorithm and the 
approximated L-curve and howto update the information extracted from the Lanczos process to compute a newpoint.Section4de-scribes the regularizing algorithm for the solution of the linear system(3). Section5contains several computed examples which illustrate the performance of the algorithm.2.L-curve for the choice of regularization parameter valueA recent method to choose the regularization parameter is the so-called L-curve.The L-curve is a parametric plot of q k ;g k ,where g k and q k measure the size of the regularized solution and the corresponding residual. The underlying idea is that a good method for choosing the regularization parameter for discrete ill posed problems must incorporate information about the solution size in addition to using information about the residual size.This is indeed quite natural,because we are seeking a fair balance in keeping both of these values small.The L-curve has a distinct L-shaped corner located exactly where the solution y k changes in nature from being dominated by regulariza-tion errors(i.e.,by over smoothing)to being dominated by the errors in the right-hand side.Hence,the corner of the L-curve corresponds to a good bal-ance between minimization of the two quantities,and the corresponding reg-ularization parameter k is a good choice.In particular the continuous L-curve for our problem consists of all the points q k ;g k for k P 0;I ,where q k k B y kÀb k and g k k A T y k k. In what follows,we present some important properties of the continuous L-curve and a method to compute a good approximation of the discrete L-curve. We®rst need to introduce the following singular value decomposition for matrix A U R V T.Here U P R mÂm and V P R nÂn are orthogonal matrices, R R m0 P R mÂn,and R m diag f r i g,where the singular values r i are ordered as0<r1T r2TÁÁÁT r m.It is straightforward to see that the solution of problem(4)can be written asy kX mi 1u T i bk r2iu i: 6Using these notations,the L-curve has the following properties:Proposition1.Let y k denote the solution to(4).Then,k A T y k k is a monotonically decreasing function of k B y kÀb k.Proof.The fact that the k A T y k k is a decreasing function of k B y kÀb k follows immediately from the following expressions:k A T y k k2X mi 1r i u T i br2i k2; 758S.Morigi,F.Sgallari/put.121(2001)55±73k B y k Àb k 2X m i 1kr 2i k u T ib 2; 8which are easily derived from (6).ÃIf a fewreasonable assumptions are made about the ill posed problem,then it is possible to characterize the L-curve behaviour analogously to that done in [5,6]for the Tikhonov regularization.The three assumptions made are:1.The coe cients j u T i b exact j on average decay to zero faster than the r i ;2.The perturbation vector e is essentially ``white noise''with standard devia-tion r ;3.The norm of e is bounded by the norm of b exact .Under Assumptions 1±3,the L-curve q k ;g k is shown to exhibit a corner behaviouras a function of k ,and the corner appears approximately at r m p ;k A T y 0k .Here,y 0is the unregularized solution to the unperturbed problem (2).The larger the di erence between the decay rates of j u T i b exact j and j u Ti e j ,the more distinct the corner will appear.Following [6],we summarize the reasoning that leads to the previous characterization.We consider the behaviour of the L-curve for an unperturbedproblem with e 0.For k (r 21,w e have r i = r 2i k %1=r i ,so that y k %y 0,andk B y k Àb exact k 2%X m i 1k r 2i u T i b exact 2T k r 21k b exact k 2: 9 Hence,for small values of k ,the L-curve is approximately a 
horizontal line atk A T y k k k A T y 0k .As k increases,it follows from (7)that k A T y k k starts to de-crease,while k B y k Àb k still grows.Now,consider the L-curve associated with the perturbation e of the right-hand side.The corresponding solution y k given by (6)with b replaced by e ,satis®es (from Assumption 2)k A T y k k 2 X m i 1r i u T i e r 2i k 2%r 2X m i 1r i r 2i k 2; 10 k B y k Àb k 2X m i 1kr 2i k u T ie 2%r 2X m i 1kr 2i k2:11For k (r 21,this L-curve is approximately a horizontal line at k A T y k k %m p r =r 1,and it starts to decrease towards the abscissa axis for much smaller values of k than the L-curve for b exact (see Assumption 3).Moreover,we see that as k increases,k B y k Àb k becomes almost independent of k ,while k A T y k k isS.Morigi,F.Sgallari /put.121(2001)55±7359dominated by a fewterms and becomes about r2=kp.Hence,this L-curve soonbecomes almost a vertical line at k B y kÀb k%rmpas k3I.The actual L-curve for a given problem,with a perturbed right-hand side b b exact e,is a combination of the above two special L-curves.For small k, the behaviour is completely dominated by contributions from e,whereas,for large k it is entirely dominated by contributions from b exact.In between there is a small region where both b exact and e contribute,and this region contains the L-shaped corner of the L-curve.In actual computations,the values q k and g k are usually not available; hence,we search for good approximations.Let y k k be an approximation of the solution y k of(3),and introduce the residual vectorr k k: bÀ B k I y k k 12 and the quantitiesq k k : k B y k kÀb k; 13g k k : k A T y k k k: 14 The following proposition is concerned with how well the functions q k and g k approximate q and g,respectively.Proposition2.Let q k k and g k k be defined by(13)and(14),respectively. 
Thenj q k k Àq k j T k r k k k; 15 j g k k Àg k j T k r k k k: 16 Proof.It follows from the de®nition of q and q k thatj q k k Àq k j T k B y k kÀy k k k B y k kÀ B k I À1b kk B B k I À1r k k k T k B B k I À1kk r k k k;and in viewof k B B k I À1k T1,the®rst part of the proposition follows.For the second part,we have from the de®nition of g and g kj g k k Àg k j T k A T y k kÀy k k k A T y k kÀ B k I À1b kk A T B k I À1r k k k T k A T B k I À1kk r k k k:Using the singular value decomposition of the matrix A U R V T and B U R2U T,we obtain that k A T B k I À1k T1,then the last part of the proposition follows.Ã60S.Morigi,F.Sgallari/put.121(2001)55±73Thus,we can obtain a good approximation of the discrete L-curve by computing q k k ;g k k ,and quantifying the errors by the length of the re-sidual vector.3.The Lanczos algorithm and regularizationIn this section,we brie¯y review the bidiagonalization Lanczos process[9], and,we explore its relation with the functions q k k and g k k introduced in Section2.Given a symmetric matrix B AA T and a nonvanishing vector b,if we ex-ecute k iterations of the Lanczos bidiagonalization Algorithm1in exact arithmetic,we get the orthogonal mÂk matrixV k v0;v1;...;v kÀ1 17 and the orthogonal nÂk matrixW k w0;w1;...;w kÀ1 : 18 The matrices V k and W k are connected by the two equationsAW k V k L k d k v k e T k; 19A T V k W k L T k; 20 where L k denotes the kÂk lower bidiagonal matrixL k:c0d1c1......d kÀ2c kÀ2d kÀ1c kÀ12666664377777521and e k is the k th column of the identity matrix.Algorithm1(Lanczos bidiagonalization algorithm).Input:b P R m n f0g,B P R mÂm,0<k<m;Output:f d j g k j 1;f c j g kÀ1j 0;f v j g kÀ1j 0;v0: b=k b k;s0: A T v0;c0: k s0k;w0: s0=c0;for j 2;3;...;k doS.Morigi,F.Sgallari/put.121(2001)55±7361r jÀ1: A w jÀ2Àc jÀ2v jÀ2;d jÀ1: k r jÀ1k;v jÀ1: r jÀ1=d jÀ1;s jÀ1: A T v jÀ1Àd jÀ1w jÀ2;c jÀ1: k s jÀ1k;w jÀ1: s jÀ1=c jÀ1;end jLet T k 1;k denote a tridiagonal matrixT k 1;k:a0b1b1a1b2.........b kÀ2a kÀ2b kÀ1b kÀ1a kÀ1b k2666666666437777777775P R k 1 Âk: 22It is nowstraightforw ard to identify T k;k L k L T k if a j d2j c2j, b j d j c jÀ1;j 1;...;kÀ1,and a0 c20.By combining the two relations(19) and(20),we getBV k V k L k L T k d k c kÀ1v k e T k V k 1T k 1;k; 23 with V k 1 v0;v2;...;v k ,such that V T k 1V k 1 I k 1and V k 1e1 b=k b k. 
Throughout this article,I k 1denotes the identity matrix of order k 1,I k 1;k its leading principal k 1 Âk submatrix.It follows from Proposition2that in order to have a good approximation of the L-curve points q k ;g k ,we have to choose k su ciently large and in-crease it until k r k k k is under a given tolerance.Now,we want to show how,by means of Algorithm1,it is possible to compute the residual k r k k k for increasing values of k in a straightforward manner.For notational simplicity,we will consider in the sequel the T k 1;k matrix even if,from a computational point of view,only the necessary elements are determined starting from L k.We determine an approximate solution of(3)of the form y k k V k z by solving the minimization problemmin z P R k k B k I V k zÀb k minz P R kk V k 1 T k 1;k k I k 1;k zÀV k 1e1k b kkminz P R kk T k 1;k k I k 1;k zÀe1k b kk: 24Let us introduce the QR-factorization of the tridiagonal matrix~Tk 1;k : T k 1;k k I k 1;k ~Q k 1~R k 1;k; 2562S.Morigi,F.Sgallari/put.121(2001)55±73where ~Q k 1P R k 1 Â k 1 ,~Q T k 1~Q k 1 I k 1and ~R k 1;k P R k 1 Âk has an upper triangular leading principal k Âk submatrix,denoted by ~R k ,and a vanishinglast row.It follows thatmin z P R kk B k I V k z Àb k min z P R kk ~R k 1;k z À~Q T k 1e 1k b kk j e T k 1~Q T k 1;k e 1jk b khas the solutionz k k: ~RÀ1kI T k 1;k ~Q Tk 1e 1k b k :26Thus,the associated approximate solution of (3)is given byy k k : V k ~R À1k I T k 1;k ~Q T k 1e 1k b k ;27and the corresponding residual vector (12)has the normk r k k k j e T k 1~Q Tk 1e 1jk b k :28We note for future reference that,when y k k is given by (27),the quantities q kand g k de®ned by (13)and (14),respectively,can be evaluated asq k k k T k 1;k z k k Àe 1k b kk ;29 g k k k L T k z k k k ;30where z k k is given by (26).When increasing k ,the QR-factorizations of the matrices ~Tk 1;k are updated.We outline the computations required for this.As the matrix ~Tk 1;k is tridiag-onal,it is convenient to compute its QR-factorization by Givens rotations [10,Chapter 5].Let G ik 1be a Givens rotation of order k 1that rotates the i ;i 1 coordinate planes so that the i th subdiagonal element of the matrix G i À1 k 1;...;G 1 k 1~T k 1;k vanishes.Then,the factors in the QR-factorization (25)of ~Tk 1;k are given by ~Q k 1: G k k 1;...;G 1 k 1T ;~R k 1;k : G k k 1;...;G 1 k 1~T k 1;k :31Taking an additional step with the Lanczos algorithm yields the matrixwhose QR-factorization can be determined by updating factors (31)as follows.For 1T i T k ,let G ik 2denote the Givens rotation of order k 2whose leadingprincipal submatrix of order k 1is G ik 1.Then,32where Ãdenotes a matrix element that may be nonvanishing.The upper tri-angularmatrixin the QR-factorization ~Tk 2;k 1 ~Q k 2~R k 2;k 1is obtained by multiplying matrix (32)from the left by a Givens rotation G k 1k 2that annihilates the k 2;k 1 entry b k 2.Thus,andQ T k 2e 1 G k 1 k 2~Q T k 1e 102435:64S.Morigi,F.Sgallari /put.121(2001)55±73We point out that in order to compute ~Rk 2;k 1,given ~R k 1;k ,we only need to apply the Givens rotations G k À1 k 2,G k k 2and G k 1k 2,in this order,to the lastcolumn of ~Tk 2;k 1.We note that q k k and g k k ,for ®xed k ,can be evaluated for di erent values of k without recomputing the Lanczos decomposition (23).Each newvalue of k yields a newtridiagonal matrix ~Tk 1;k ,whose QR-factorization,see (25)and (31),has to be computed.The computation of this factorization is inexpensive;it requires only O k arithmetic operations,as it can be carried with k Givens 
rotations.4.A regularizing algorithm based on the Lanczos processIn this section,our method for the solution of underdetermined ill condi-tioned linear systems (1)is described.The evaluation of points on the L-curve requires that the exact solution of the linear system (3)is available.However,as already mentioned,computation of the exact or a highly accurate solution of (3)for each value of k is expensive,and therefore,we replace the equations for q k and g k by (29)and (30),respectively.It follows from Proposition 2and (28)that the norm of the re-sidual vector r k k is a computable inexpensive bound for the error introduced when approximating q k and g k by q k k and g k k .Algorithm 2computes the regularization parameter and the corresponding approximate solution of the regularized system (3).Algorithm 2(Regularizing L-curve Lanczos iteration ).Input :A P R m Ân ,b P R m , d , r P R ,[k i ,k f ],k i ;k f P R ,N ;k val P N ;Output :k opt ,regularized approximate solution x kopt k P R n ;%Initialization %.k : k val;j 0;k 1i k i ,k 1f k f ;k 0opt : 0;repeat.j j 1;.k j s : k j f Àk ji = N À1 ,%L-curve computation phase %.Perform k Lanczos steps by applying Algorithm 1;for k `: k j i to k j f step k js do.Compute QR-factorization of ~Tk 1;k ;.Evaluate k r k k ;while (k r k k > r )do .k k 1;.Apply one more Lanczos step;end ;S.Morigi,F.Sgallari /put.121(2001)55±736566S.Morigi,F.Sgallari/put.121(2001)55±73 .Compute point P` q k k` ;g k k` ;end;%Detection corner phase%.Detect the smallest interval that contains the closest point P j to the origin; .Compute the associated k j opt;until(j k jÀ1optÀk j opt j T d j k jÀ1opt j);%Resolution phase%.Evaluate the®nal solution of A x b by solving x k opt k A T y k opt k.The initialization step needs the de®nition of an initial interval k i;k f for the regularizing parameter,the number of required points(N)on the discrete L-curve,and the initial dimension k val of the Krylov space.We will refer to each step j of the repeat loop as re®nement step j.Each of these steps consists of an L-curve computation phase,which calculates N even spaced points on the discrete L-curve belonging to the interval k j i;k j f ,and of a detection phase for determining the corner of the L-curve at step j.Let us consider the L-curve computation phase in more detail.Given an initial k value(k val),k Lanczos steps are preliminarily performed. 
Using the relations between d and c entries of L k matrix and a and b entries of T k 1;k the~Q~R factorization(25)is then computed and further Lanczos steps are performed until the residual norm,evaluated by relation(28),does not satisfy the given precision r.The approximate point P` q k k` ;g k k` of the discrete L-curve is then given by(29)and(30).To determine the next point P`of the L-curve the algorithm uses for k and L k the previous values,then,using(25),a newQR-factorization is computed and consequently the residual is available.The corner detection phase considers the L-curve resulting from the previ-ous phase and determines the smallest interval containing the k value corre-sponding to the closest point P j opt to a local origin to a given tolerance d.The local origin coordinates are computed in the®rst re®nement step as min`q k k` and min`g k k` .Note that during each re®nement step,we can also verify whether the given j th interval contains the corner point P j opt.In fact,if the k value cor-responding to the returned point P opt j is equal to one of the extreme points of the interval k j i;k j f ,then the k value associated with the corner point is not contained in the present interval.Thus,it will be necessary to expand the interval and consequently to compute some additional points to complete the L-curve.The last step of Algorithm2evaluates the approximate solution y k opt k of(3) by(27),and the regularized approximate solution of the original problem(1) by x k opt k A T y k opt k.As a®nal remark,we observe that each re®nement step can take advantage of the previous Lanczos step without recomputing all the values.5.Numerical resultsThe purpose of this section is to illustrate the performance of Algorithm2in some numerical examples obtained on a Sun Ultra workstation with a unit roundo of 2À52%2:2204Â10À16.We implemented our method to com-pute an approximation of the L-curve in FORTRAN77language,and we used MATLAB commercial software package[11]to obtain the exact computation of points on the L-curve.In order to obtain small and large scale test problems,we generated an m-by-n matrix A de®ned by its singular value decompositionA U R V T; 33 whereU IÀ2uu Tk u k22;V IÀ2vv Tk v k22; 34and the entries of u and v are random numbers.R is a diagonal matrix with the singular values r i eÀ0:2 iÀm .On the right-hand side of the linear system(1)to be solved b exact is a random vector,and e is the noise vector generated by®rst determining a vector with normally distributed random numbers with zero mean and variance one as components,and then scaling it to have the desired norm.We will refer to the quantity k e k=k b k as the noise level.All computations here reported correspond to small ill conditioned test problems m 100;n 200;r m 1:0;r1 2:5175Â10À9 in order to be able to perform qualitative comparisons with the MATLAB results.The results of each re®nement step of Algorithm2are summarized in Table 1which consider,respectively,the three di erent noise levels0:1;0:01and 0:001.The®rst column in the tables indicates the level of re®nement(LR),the second column labelled`` k i;k f ''shows the interval of the regularization pa-rameter k,the columns labelled``k opt e''and``k opt a''contain the optimal regular-ization parameter determined in that re®nement interval,respectively,by MATLAB(exact)and by Algorithm2(approximated).The last two columns report the distance of the corner from the local origin,chosen for a given noise level(distance),and the number of evaluation points in the re®nement 
The results of each refinement step of Algorithm 2 are summarized in Table 1, which considers, respectively, the three different noise levels 0.1, 0.01 and 0.001. The first column in the tables indicates the level of refinement (LR); the second column, labelled "(λ_i, λ_f)", shows the interval of the regularization parameter λ; the columns labelled "λ_opt^e" and "λ_opt^a" contain the optimal regularization parameter determined in that refinement interval by MATLAB (exact) and by Algorithm 2 (approximated), respectively. The last two columns report the distance of the corner from the local origin chosen for a given noise level (distance) and the number of evaluation points in the refinement step (NP). The precision ε_r used in the Lanczos procedure is reported in the captions. The dimension k of the Krylov subspace is not reported in the tables because it maintains a value of five for all the computations. The only exception is for some extremely small λ values (e.g., 1.0×10^-9, 1.0×10^-10) when a high precision ε_r is used in the Lanczos procedure. To avoid the use of k greater than 100, we suggest either to consider a smaller interval for λ, or to use, for the computation related to the critical extreme value, a higher precision ε_r.

Table 1. Test problem

(a) Noise level 0.1, ε_r = 0.1
LR  (λ_i, λ_f)                λ_opt^e    λ_opt^a    Distance        NP
1   (1.0×10^-9, 2.0)          3.0×10^-1  3.0×10^-1  5.963200×10^-1  20
2   (4.0×10^-1, 2.0×10^-1)    2.5×10^-1  2.5×10^-1  5.956456×10^-1  20
3   (2.4×10^-1, 2.6×10^-1)    2.5×10^-1  2.5×10^-1  5.956456×10^-1  20
1   (1.0×10^-9, 2.0)          2.5×10^-1  2.5×10^-1  5.956456×10^-1  200

(b) Noise level 0.01, ε_r = 0.05
LR  (λ_i, λ_f)                λ_opt^e    λ_opt^a    Distance        NP
1   (1.0×10^-9, 2.0)          1.0×10^-1  1.1×10^-1  1.211257        20
2   (1.0×10^-9, 2.0×10^-1)    1.0×10^-2  1.0×10^-2  8.901137×10^-1  20
3   (1.0×10^-9, 2.0×10^-2)    1.0×10^-3  1.0×10^-3  7.939300×10^-1  20
4   (1.0×10^-9, 2.0×10^-3)    1.3×10^-3  1.0×10^-3  7.930785×10^-1  20
1   (1.0×10^-9, 2.0)          1.0×10^-2  1.0×10^-2  8.901137×10^-1  200
2   (1.0×10^-9, 2.0×10^-2)    9.0×10^-4  1.0×10^-3  7.930785×10^-1  200

(c) Noise level 0.001, ε_r = 0.005
LR  (λ_i, λ_f)                λ_opt^e    λ_opt^a    Distance        NP
1   (1.0×10^-10, 2.0)         1.0×10^-1  1.0×10^-1  2.232340        20
2   (1.0×10^-10, 2.0×10^-1)   1.0×10^-2  1.0×10^-2  1.715200        20
3   (1.0×10^-10, 2.0×10^-2)   1.0×10^-3  1.0×10^-3  1.306169        20
4   (1.0×10^-10, 2.0×10^-3)   1.0×10^-4  1.0×10^-4  9.563796×10^-1  20
5   (1.0×10^-10, 2.0×10^-4)   1.0×10^-5  1.0×10^-5  8.587058×10^-1  20
6   (1.0×10^-10, 2.0×10^-5)   7.0×10^-6  6.0×10^-6  8.572574×10^-1  20
7   (6.0×10^-6, 8.0×10^-6)    7.2×10^-6  5.8×10^-6  8.570727×10^-1  20
1   (1.0×10^-10, 2.0)         1.0×10^-2  1.0×10^-2  9.806748×10^-1  200
2   (1.0×10^-10, 2.0×10^-2)   1.0×10^-4  1.0×10^-4  8.646498×10^-1  200
3   (1.0×10^-10, 2.0×10^-4)   5.0×10^-6  7.0×10^-6  8.572574×10^-1  200

References on Adaptive Signal Processing

Adaptive signal processing is a technique that applies mathematics and algorithms to the processing of signals. It automatically adjusts its parameters and algorithms according to the characteristics of the signal and changes in the environment, thereby improving signal-processing performance. The technique is widely used in communications, radar, array signal processing, audio processing, and other fields.

Within adaptive signal processing there are many classic references worth noting. Three representative works are introduced below.

1. Widrow, B., & Hoff, M. E. (1960). Adaptive Switching Circuits. IRE Convention Record, 4, 96-104. This is one of the milestones of adaptive signal processing. Widrow and Hoff developed an adaptive filter known as the LMS (Least Mean Squares) algorithm, which adaptively adjusts the filter weights by minimizing the sum of squared errors. It is widely used in signal processing and system identification and laid the foundation for later adaptive algorithms (a minimal code sketch of the LMS update appears after this list).

2. Haykin, S. (1996). Adaptive Filter Theory. Prentice-Hall. This book is one of the classic textbooks of the field. Its author, Simon Haykin, is an authority on adaptive filtering; the book systematically presents the principles, algorithms, and applications of adaptive filters, explains the theory and methods of adaptive signal processing in depth, and is very useful for study and research.

3. Sadjadi, F. A. (2013). An overview of adaptive signal processing: Theory and applications. International Journal of Computer Science Issues (IJCSI), 10(1), 377-385. This survey gives a comprehensive overview of adaptive signal processing from both the theoretical and the application point of view. The author, Fakhreddine A. Sadjadi, summarizes the main concepts, algorithms, and application areas of the field and discusses its future development directions.
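As a concrete illustration of the Widrow-Hoff weight update described in reference 1, here is a minimal LMS sketch in Python; the filter length, step size, and signal names are illustrative choices, not values taken from the cited paper.

```python
import numpy as np

def lms_filter(x, d, n_taps=8, mu=0.01):
    """Minimal LMS (Widrow-Hoff) adaptive filter.

    x : input signal, d : desired/reference signal, mu : step size.
    At each sample the output is y[n] = w^T u[n], where u[n] holds the
    most recent n_taps inputs; the weights move along the negative
    gradient of the instantaneous squared error: w <- w + mu * e[n] * u[n].
    """
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]          # most recent samples first
        y[n] = w @ u
        e[n] = d[n] - y[n]
        w += mu * e[n] * u                 # Widrow-Hoff update
    return y, e, w

# toy use: adapt the filter to track a noisy reference derived from x
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
d = np.convolve(x, [0.5, -0.3, 0.2], mode="same") + 0.01 * rng.standard_normal(5000)
_, err, weights = lms_filter(x, d)
```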

Lanczos Method (PPT lecture slides)

The norm is ||x||_B = sqrt((x, x)_B). The reader can easily verify that the matrices M^(-1)K and (K - σM)^(-1)M are both symmetric with respect to this inner product.

Given an arbitrary starting vector U_1, set U_0 = 0 and iterate with the three-term recurrence

  U_(k+1) = (K̃ U_k - α_k U_k - β_k U_(k-1)) / β_(k+1).   (11)

The Lanczos method uses a three-term recurrence to generate a set of orthonormal vectors while reducing the original matrix to tridiagonal form, so that the problem is converted into the eigenvalue problem of a tridiagonal matrix. It is named after the 20th-century Hungarian mathematician Cornelius Lanczos. The Lanczos method is in fact the special form of the Arnoldi algorithm for symmetric matrices; it underlies Krylov subspace methods for solving symmetric linear systems as well as symmetric eigenvalue problems.

The eigenvectors satisfy the orthogonality relations

  φ_i^T M φ_j = 0 (i ≠ j),  φ_i^T M φ_j = m_ii (i = j),   (5)
  φ_i^T K φ_j = 0 (i ≠ j),  φ_i^T K φ_j = k_ii (i = j).   (6)

Normalizing the eigenvectors with respect to M, i.e.,

  φ̄_i = φ_i / sqrt(m_ii),   (7)

gives the normalized eigenvectors, and (5), (6) become

  φ̄_i^T M φ̄_j = 1 (i = j), 0 (i ≠ j);   φ̄_i^T K φ̄_j = λ_i (i = j), 0 (i ≠ j).   (8)

The dynamic analysis of engineering structures involves two main aspects: analysis of the dynamic characteristics of the structure and analysis of its dynamic response. The undamped free-vibration equation of a structure is

  M ÿ + K y = 0.   (1)

Substituting the harmonic motion

  y = φ sin(ωt)   (2)

into the above gives

  (K - ω² M) φ = 0,   (3)

which can also be written as

  K φ = λ M φ.   (4)

The Lanczos Method

The Lanczos method is a numerical algorithm that provides an effective way to treat problems, such as feasibility and minimization, arising from finite-dimensional systems of linear equations. A typical implementation proceeds in the following steps: first, a feasible starting point is chosen and a set of unit vectors with the same dimension as the matrix is built; next, the Lanczos algorithm is applied to construct a set of linearly independent vectors of the same dimension and to compute the dominant eigenvectors of the matrix; finally, these eigenvectors are combined with the starting point to solve the feasibility, minimization, and related problems associated with the linear system. The Lanczos method is used very widely in computer science, where it delivers effective results and is often employed to help solve complex matrix models. Its ideas also appear in machine learning, and in particular in deep learning, for example in parameter-optimization procedures related to convolutional neural networks.
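The description above is generic; concretely, for a symmetric matrix the Lanczos algorithm builds an orthonormal Krylov basis and a small tridiagonal matrix whose eigenpairs (Ritz pairs) approximate the extreme eigenpairs of the original matrix. Below is a minimal sketch, with full reorthogonalization added because the bare recurrence loses orthogonality in floating point; all names are illustrative.

```python
import numpy as np

def lanczos_eig(A, v0, m):
    """Symmetric Lanczos: return Ritz values/vectors from an m-step run."""
    n = len(v0)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V[:, 0] = v0 / np.linalg.norm(v0)
    k_used = m
    for k in range(m):
        w = A @ V[:, k]
        alpha[k] = V[:, k] @ w
        w -= alpha[k] * V[:, k]
        if k > 0:
            w -= beta[k - 1] * V[:, k - 1]
        w -= V[:, :k + 1] @ (V[:, :k + 1].T @ w)     # full reorthogonalization
        if k + 1 == m:
            break
        beta[k] = np.linalg.norm(w)
        if beta[k] < 1e-12:                          # invariant subspace found
            k_used = k + 1
            break
        V[:, k + 1] = w / beta[k]
    T = (np.diag(alpha[:k_used])
         + np.diag(beta[:k_used - 1], 1)
         + np.diag(beta[:k_used - 1], -1))
    ritz_vals, S = np.linalg.eigh(T)
    return ritz_vals, V[:, :k_used] @ S

# example: the largest Ritz value approximates the largest eigenvalue of A
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)); A = (A + A.T) / 2
vals, vecs = lanczos_eig(A, rng.standard_normal(200), 30)
print(vals[-1], np.linalg.eigvalsh(A)[-1])
```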

An Implicitly Restarted Refined Lanczos Bidiagonalization Method

赖降周; 卢琳璋
【Journal】厦门大学学报(自然科学版)
【Year (volume), issue】2009 (048) 002
【Abstract】An implicitly restarted refined Lanczos bidiagonalization method is given for computing a few of the smallest singular triplets of a large matrix. Harmonic Ritz values are used as shifts, so that the singular triplets associated with the small singular values are approximated effectively. The algorithm uses refined residuals, refined singular vectors, and a refined Rayleigh quotient, and applies a deflation technique to deflate the small singular triplets that have already been computed. Numerical experiments show that the algorithm computes the small singular triplets of large matrices more effectively and converges faster.
【Pages】7 (pp. 153-159)
【Authors】赖降周; 卢琳璋
【Affiliation】厦门大学数学科学学院, 福建, 厦门, 361005
【Language】Chinese
【CLC number】O241.6
【Related works】
1. Lanczos bidiagonalization: a fast initialization method for nonnegative matrix factorization [J], 王炫盛; 陈震; 卢琳璋
2. Application of the implicitly restarted Lanczos algorithm to model order reduction [J], 王瑞瑞; 卢琳璋
3. A semi-refined biorthogonal Lanczos method [J], 吴钢
4. A comparison of implicitly restarted upper and lower bidiagonalization Lanczos methods [J], 牛大田
5. A refined biorthogonal Lanczos method [J], 王耀卫
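The abstract builds on Lanczos (Golub-Kahan) bidiagonalization. As background, here is a minimal bidiagonalization sketch; the restarting, harmonic-Ritz shifts, refinement, and deflation that the paper adds on top are not shown, and the function name and arguments are illustrative.

```python
import numpy as np

def golub_kahan_bidiag(A, p0, k):
    """k steps of lower Lanczos bidiagonalization: A V_k = U_{k+1} B_k.

    U, V have orthonormal columns and B_k is (k+1) x k lower bidiagonal;
    the singular values of B_k approximate singular values of A.
    """
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)
    beta[0] = np.linalg.norm(p0)
    U[:, 0] = p0 / beta[0]
    v_prev = np.zeros(n)
    for j in range(k):
        r = A.T @ U[:, j] - beta[j] * v_prev
        alpha[j] = np.linalg.norm(r)
        V[:, j] = r / alpha[j]
        p = A @ V[:, j] - alpha[j] * U[:, j]
        beta[j + 1] = np.linalg.norm(p)
        U[:, j + 1] = p / beta[j + 1]
        v_prev = V[:, j]
    B = np.zeros((k + 1, k))
    B[np.arange(k), np.arange(k)] = alpha
    B[np.arange(1, k + 1), np.arange(k)] = beta[1:]
    return U, B, V

# the smallest singular value of B converges only slowly to that of A,
# which is why restarted/refined variants such as the paper's are needed
```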

An Arnoldi Model Order Reduction Method Based on Krylov-Schur Restarting

徐康丽; 杨志霞; 蒋耀林
【Abstract】Krylov subspace methods form one of the standard families of model order reduction techniques, and the Arnoldi method is a basic member of this family. A re-orthogonalized Arnoldi algorithm is used to obtain an r-step Arnoldi decomposition; a Krylov-Schur restarting process is then carried out, which yields an Arnoldi model reduction method based on implicitly restarted Krylov-Schur technology. Applying this method to reduce large-scale linear time-invariant systems gives a stable reduced-order system with high approximation accuracy, thereby remedying the inability of plain Krylov subspace reduction methods to preserve the stability of the reduced system. Numerical examples verify that the method is effective. (The record also carries an English abstract with the same content.)
【Journal】计算机工程与应用
【Year (volume), issue】2016 (052) 012
【Pages】5 (pp. 251-255)
【Keywords】model order reduction; Krylov subspace methods; reorthogonalization; Krylov-Schur restarting
【Authors】徐康丽; 杨志霞; 蒋耀林
【Affiliations】新疆大学数学与系统科学学院, 乌鲁木齐 830046; 西安交通大学数学与统计学院, 西安 710049
【Language】Chinese
【CLC number】TP39

XU Kangli, YANG Zhixia, JIANG Yaolin. Computer Engineering and Applications, 2016, 52(12): 251-255.

In many fields of engineering, as problems become more complex the systems involved grow ever larger — power systems, fluid machinery, very-large-scale integrated circuits — and their computer-aided design, simulation, optimization, and control all involve the computation of large or complex dynamical systems.
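The building block mentioned in the abstract, an r-step Arnoldi decomposition with re-orthogonalization, can be sketched as follows; the Krylov-Schur restart itself (reordering the Schur form of H and truncating) is not shown, and all names are illustrative.

```python
import numpy as np

def arnoldi(A, b, r):
    """r-step Arnoldi with one re-orthogonalization pass.

    Returns V (n x (r+1), orthonormal columns) and H ((r+1) x r upper
    Hessenberg) with A @ V[:, :r] = V @ H.  In projection-based model
    reduction the reduced system matrix is the leading r x r block of H.
    """
    n = len(b)
    V = np.zeros((n, r + 1))
    H = np.zeros((r + 1, r))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(r):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        for i in range(j + 1):                 # re-orthogonalization pass
            c = V[:, i] @ w
            H[i, j] += c
            w -= c * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # breakdown: Krylov space exhausted
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```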

The Lanczos Iteration in 3-D Geoelectric Field Numerical Computation and an Improved Algorithm

宛新林; 席道瑛
【Journal】物探化探计算技术
【Year (volume), issue】2009 (031) 003
【Abstract】For the very large sparse linear systems formed in 3-D geoelectric field forward modelling, this paper analyses the general solution approaches for such systems and then focuses on a Lanczos iteration process and algorithm well suited to them. When the electrical properties of the subsurface media differ greatly, the condition number of the coefficient matrix A becomes very large, and the algorithm can be improved accordingly. The use of an incomplete Cholesky factorization as a preconditioner is discussed; once the conditioning is improved, the coefficient matrix of the resulting linear system is approximately an identity matrix. The improved Lanczos algorithm increases numerical stability and thus accelerates the convergence of the iteration, providing a basis for improving inversion quality.
【Pages】5 (pp. 197-201)
【Authors】宛新林; 席道瑛
【Affiliations】安徽建筑工业学院土木工程学院, 安徽, 合肥 230022; 中国科学技术大学地球与空间科学学院, 安徽, 合肥 230026
【Language】Chinese
【CLC number】O241
【Related works】
1. An improved algorithm for the vehicle-track coupled iteration process based on the Newmark scheme [J], 张斌; 雷晓燕; 罗雁云
2. The Lanczos iterative algorithm and its application to 3-D geoelectric field numerical simulation [J], 宛新林; 席道瑛; 高尔根
3. Application of the splatting algorithm in 3-D CT iterative algorithms [J], 莫会云; 潘晋孝
4. Fast 3-D geoelectric field forward computation based on a preconditioned Lanczos algorithm [J], 宛新林; 席道瑛
5. The Lanczos iterative algorithm in 3-D resistivity forward computation [J], 宛新林; 席道瑛; 高尔根
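The improvement described in the abstract is standard preconditioning: factor the symmetric positive definite coefficient matrix approximately as A ≈ L L^T with an incomplete Cholesky factorization and iterate on the preconditioned system. The sketch below illustrates the idea with a dense IC(0) factor feeding a preconditioned conjugate gradient iteration (for SPD systems CG is mathematically equivalent to a Lanczos-based solver); it is an illustration of the technique, not the paper's implementation.

```python
import numpy as np

def ichol0(A):
    """Incomplete Cholesky IC(0): keep fill only on the nonzero pattern of A."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        for i in range(k + 1, n):
            if L[i, k] != 0.0:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if L[i, j] != 0.0:
                    L[i, j] -= L[i, k] * L[j, k]
    return L

def pcg(A, b, L, tol=1e-10, maxit=1000):
    """Conjugate gradients preconditioned with M = L L^T."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = np.linalg.solve(L.T, np.linalg.solve(L, r))
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        a = rz / (p @ Ap)
        x += a * p
        r -= a * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = np.linalg.solve(L.T, np.linalg.solve(L, r))
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```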

Some Problems in the Error Estimation of the Lanczos Algorithm

贝清泉
【Journal】汕头大学学报(自然科学版)
【Year (volume), issue】1997 (012) 002
【Abstract】Let A be an n×n real symmetric matrix. For a given n×j real matrix Q whose j columns are linearly independent, and for an arbitrary j×j real matrix T, write R(T) = AQ − QT. This paper exhibits a j×j real matrix H such that ||R(H)||_2 = min_T ||R(T)||_2, and proves that when T is taken to be H, the factor √2 appearing in Theorem 4.10 of reference [1] can be replaced by 1.
【Pages】3 (pp. 24-26)
【Author】贝清泉
【Affiliation】汕头大学数学系
【Language】Chinese
【CLC number】O241.6
【Related works】
1. An initialization method for symmetric nonnegative matrix factorization based on the Lanczos algorithm [J], 武坚强; 郭江鸿
2. A real-time image interpolation algorithm based on the Lanczos kernel [J], 郭莹; 李伦; 王鹏
3. Studying a one-dimensional optical-lattice Fermi gas with the Lanczos algorithm [J], 胡波; 陈亮; 韩榕生
4. Studying a one-dimensional optical-lattice Fermi gas with the Lanczos algorithm [J], 胡波; 陈亮; 韩榕生
5. A comparison of the Lanczos algorithm and the refined Lanczos algorithm for large nonsymmetric linear systems [J], 张亚蕾; 杨少静
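To make the quantity in the abstract concrete: for Q with orthonormal columns, the familiar Rayleigh-Ritz choice H = Q^T A Q minimizes ||AQ − QT|| in the Frobenius norm, and a classical bound (the likely source of the √2 in the abstract) compares this choice with the 2-norm optimum that the paper studies. The sketch below only evaluates the residual for that familiar choice; it is an illustration, not the construction of the paper's 2-norm-optimal H.

```python
import numpy as np

def rayleigh_ritz_residual(A, Q):
    """Return H = Q^T A Q and the 2-norm and Frobenius norm of R(H) = A Q - Q H."""
    H = Q.T @ A @ Q
    R = A @ Q - Q @ H
    return H, np.linalg.norm(R, 2), np.linalg.norm(R, "fro")

# small self-check on a random symmetric matrix and an orthonormal Q
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
Q, _ = np.linalg.qr(rng.standard_normal((50, 5)))
H, r2, rF = rayleigh_ritz_residual(A, Q)
print(r2, rF)
```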

Adaptive Backstepping Control of the Uncertain Lü System

封希媛; 李医民
【Journal】西安科技大学学报
【Year (volume), issue】2013 (033) 004
【Abstract】The Lü system, a model from chaos anti-control, connects the well-known Lorenz system and Chen system and represents a continuous transition between them. Many control methods already exist for chaos control. This paper studies the control of the Lü system when its parameters are unknown. The system is converted into a general nonlinear strict-feedback form, and, using the adaptive backstepping design method together with the structure of the Lü system, a new adaptive controller is designed for its three unknown parameters that guarantees global stability of the closed-loop system. Simulation results show the feasibility and effectiveness of the method.
【Pages】4 (pp. 466-469)
【Authors】封希媛; 李医民
【Affiliations】青海民族大学计算机学院, 青海 西宁 810007; 江苏大学理学院, 江苏 镇江 212013
【Language】Chinese
【CLC number】TP273
【Related works】
1. Adaptive fuzzy backstepping predictive control of nonlinear systems with model uncertainty [J], 郑兰; 周卫东; 廖成毅; 程华
2. Adaptive backstepping control of electro-hydraulic servo systems with nonlinear uncertain parameters [J], 林浩; 李恩; 梁自泽
3. ELM-based adaptive backstepping control for a class of uncertain pure-feedback nonlinear systems [J], 李军; 石青
4. Adaptive backstepping variable-structure control for a class of uncertain nonlinear systems with input constraints [J], 李飞; 胡剑波; 王坚浩; 汪涛
5. Robust adaptive backstepping control for a class of uncertain nonlinear systems [J], 粟世玮; 张思洋; 尤熠然; 李雍
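For readers unfamiliar with the plant being controlled: the Lü system is usually written as ẋ = a(y − x), ẏ = −xz + cy, ż = xy − bz, with a, b, c the parameters the abstract treats as unknown. The sketch below only integrates the uncontrolled system, assuming that standard form and commonly quoted chaotic parameter values; the paper's adaptive backstepping controller is not reproduced here.

```python
import numpy as np

def lu_rhs(state, a=36.0, b=3.0, c=20.0):
    """Right-hand side of the (uncontrolled) Lu system, assumed standard form."""
    x, y, z = state
    return np.array([a * (y - x), -x * z + c * y, x * y - b * z])

def rk4_step(f, state, dt):
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 1.0, 1.0])
trajectory = [state]
for _ in range(20000):
    state = rk4_step(lu_rhs, state, 1e-3)
    trajectory.append(state)
trajectory = np.array(trajectory)   # chaotic trajectory the controller must tame
```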

A Note on the Free-Terminal Stochastic Optimal Regulator

张维海
【Journal】控制理论与应用
【Year (volume), issue】2006, 23(1)
【Abstract】This paper discusses the relationship between the infinite-horizon free-terminal stochastic optimal regulator problem and the solutions of the corresponding generalized algebraic Riccati equation. Specifically, it is shown that the infinite-horizon free-terminal stochastic optimal regulator corresponds to the minimal nonnegative solution of the generalized algebraic Riccati equation, and that the kernel of this minimal solution equals the exactly unobservable subspace of the stochastic system. In addition, the paper points out and analyzes an error in an earlier published proof of the existence of the maximal solution of the generalized algebraic Riccati equation.
【Pages】4 (pp. 135-138)
【Author】张维海
【Affiliations】山东轻工业学院电子信息与控制工程学院, 山东 济南 250100; 山东科技大学信息与电气工程学院, 山东 青岛 266510
【Language】Chinese
【CLC number】TP13
【Related works】
1. Model-free stochastic linear quadratic optimal control of discrete-time systems [J], 么彩莲; 王涛
2. Optimal regulators for generalized discrete-time stochastic linear systems [J], 秦超英; 戴冠中
3. Stochastic fractional-order optimal control of quasi-non-integrable Hamiltonian systems with multiple degrees of freedom [J], 钱佳敏; 陈林聪; 陈虹霖
4. Optimal output-feedback law for the infinite-terminal linear quadratic regulator [J], 张雷; 曾蓉
5. A three-degree-of-freedom helicopter control system based on an LQR optimal regulator [J], 赵笑笑; 董秀成

A New Method for Solving the Optimal Control of Linear Time-Varying Systems

李慕兰; 贾磊
【Journal】山东工业大学学报
【Year (volume), issue】1990 (020) 004
【Abstract】For the LQR problem (the linear optimal regulator with a quadratic performance index), the key to obtaining the optimal control law is solving the Riccati equation. For time-varying systems this solution becomes quite complicated, which has limited its practical application. This paper proposes a solution based on shifted Jacobi orthogonal polynomials: by directly computing the state transition matrix required by the optimal feedback law K(t), a new method for computing K(t) is obtained. It converts an augmented state equation into a simple matrix algebraic equation, thereby avoiding the difficulty of solving the Riccati equation and greatly simplifying the computation.
【Pages】8 (pp. 24-31)
【Authors】李慕兰; 贾磊
【Affiliations】Not given
【Language】Chinese
【CLC number】O232
【Related works】
1. A dynamic-programming solution, and its improvement, for the optimal control of linear time-varying systems using block-pulse functions [J], 贺昱
2. Analysis and optimal control of linear time-varying systems using Walsh functions [J], 王霏
3. Optimal control of linear time-varying systems using piecewise linear functions [J], 古天龙; 徐国华
4. Optimal control and simulation of linear time-varying systems using Walsh functions [J], 王霏
5. Symplecticity-preserving approximate solution of the quadratic optimal control problem for linear time-varying systems [J], 谭述君; 钟万勰
