4 Convex optimization problems
Convex Optimization (Chinese translation edition)
1. Introduction
With the development of science and technology, convex optimization plays an increasingly important role in many fields, and its applications in engineering, finance, computer science, and elsewhere continue to expand and deepen. Research on the theory and methods of convex optimization, as well as the translation and dissemination of its literature, has therefore become especially important. This article introduces and discusses some key topics in convex optimization, in the hope of providing a useful reference for researchers and readers in related fields.
2. Basic Concepts of Convex Optimization
2.1 Convex sets and convex functions
Convex sets and convex functions are fundamental concepts in convex optimization. A convex set is a set that contains the entire line segment between any two of its points. A convex function is a real-valued function defined on a convex set such that the chord joining any two points of its graph lies on or above the graph. The properties of convex sets and convex functions provide the foundation for the theory and methods of convex optimization.
2.2 General form of a convex optimization problem
The general form of a convex optimization problem can be written as:
minimize f(x)
subject to g_i(x) ≤ 0, i = 1, 2, ..., m
           h_j(x) = 0, j = 1, 2, ..., p
where f(x) is the objective function to be optimized, and g_i(x) and h_j(x) are the inequality and equality constraints, respectively. A convex optimization problem requires the objective function and the inequality constraint functions to be convex, and the equality constraint functions to be affine.
3. Common Algorithms in Convex Optimization
3.1 Gradient descent
Gradient descent is a widely used optimization algorithm, particularly well suited to convex problems. Its basic idea is to compute the gradient of the objective function and iterate along the negative gradient direction, gradually approaching the optimal solution.
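The iteration described above can be sketched in a few lines; the quadratic objective used here is a hypothetical example, not taken from the text:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Iterate x <- x - lr * grad(x), following the negative gradient direction."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimizer is x = 3.
x_star = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_star, 4))  # → 3.0
```

For a convex objective with a suitable step size, the iterates converge to the global minimum; for nonconvex objectives, only a stationary point is guaranteed.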
3.2 The method of Lagrange multipliers
The method of Lagrange multipliers is mainly used for constrained optimization problems: one constructs the Lagrangian function and optimizes it to obtain a solution of the original constrained problem. The method is widely applied in convex optimization.
3.3 Interior-point methods
Interior-point methods are a class of iterative algorithms used mainly to solve convex optimization problems such as linear programs and quadratic programs. Their advantage is fast convergence to the optimal solution, which makes them especially suitable for large-scale convex problems.
4. Applications of Convex Optimization in Science and Engineering
Convex optimization is widely applied in science and engineering, for example in least-squares problems in signal processing, support vector machines in machine learning, and power allocation in communication systems. These applications have both driven the development of convex optimization theory and provided effective tools and methods for solving practical problems.
An Active-Reactive Power Coordinated Optimization Model for IEGES Considering Reactive Power Support from Gas-fired Turbines
Electric Drive, 2021, Vol. 51, No. 12

Coordinated Active-Reactive Optimization Model for IEGES Considering Reactive Power Support by Gas-fired Turbine
ZHANG Haoyu, QIU Xiaoyan, ZHOU Shengrui, LIU Mengyi, ZHAO Youlin (School of Electrical Engineering, Sichuan University, Chengdu 610065, Sichuan, China)

Abstract: As the key equipment linking the power network and the natural gas network, the gas-fired turbine's reactive power support capability offers an important way to reduce network loss and voltage deviation. Based on this fact, a coordinated active-reactive optimization model for the integrated electricity-gas energy system (IEGES) considering reactive power support by gas-fired turbines is proposed. First, the important units in the model, such as power-to-gas (P2G), the gas-fired turbine, the on-load tap changer, and conventional generators, are modeled, and piecewise linearization is used to handle the coupling constraint between the active and reactive power output of the gas-fired turbine, thereby incorporating its reactive power support capability. Second, taking dispatch cost and network loss as the optimization objectives subject to power and gas network constraints, second-order cone relaxation is applied to transform the model into a mixed-integer convex optimization problem. Finally, simulations on a coupled system of the IEEE 33-node power network and the 20-node Belgian natural gas network show that considering the reactive power support of gas-fired turbines can significantly reduce distribution network loss and voltage deviation. The model provides a reference for the construction and dispatch decisions of integrated energy systems.

Key words: integrated electricity-gas energy system (IEGES); reactive power support by gas-fired turbine; second-order cone relaxation; piecewise linearization; coordinated active-reactive power optimization
CLC number: TM933  Document code: A  DOI: 10.19457/j.1001-2095.dqcd21286
Funding: Key R&D Project of the Sichuan Provincial Department of Science and Technology (2017FZ0103)
Author: ZHANG Haoyu (1994—), male, master's student.

With rapid economic development, humanity increasingly faces the problems of fossil-fuel depletion and environmental protection [1].
Lagrangian Neural Network for Solving Nonsmooth Nonconvex Optimization Problems with Equality and Inequality Constraints
YU Xin, XU Zhijian, CHEN Zhaorong, XU Chenhua
Journal of Electronics & Information Technology, 2017, 39(8): 1950–1955
Affiliations: School of Computer and Electronic Information, Guangxi University, Nanning 530004; School of Electrical Engineering, Guangxi University, Nanning 530004
Key words: Lagrangian neural network; convergence; nonconvex nonsmooth optimization  CLC number: TP183

Abstract: Nonconvex nonsmooth optimization problems arise in many fields of science and engineering and are an active research topic. To address the shortcomings of earlier penalty-function-based neural networks for nonsmooth optimization problems, a recurrent neural network model based on a Lagrange-multiplier penalty function is proposed to solve nonconvex nonsmooth optimization problems with equality and inequality constraints. Since the penalty factor in this network model is a variable, the network is guaranteed to converge to an optimal solution without computing an initial value for the penalty factor, which makes the network more convenient to compute. In addition, unlike the traditional Lagrange method, the model adds an equality-constraint penalty term, which improves the convergence ability of the network. A detailed analysis proves that the trajectory of the network model reaches the feasible region in finite time and finally converges to the set of critical points. Numerical experiments verify the effectiveness of the theoretical results.

As a parallel computational model for solving optimization problems, recurrent neural networks have received great attention over the past few decades, and many neural network models have been proposed.
Convex Optimization Exercises and Solutions (1): Past Exams from National Taiwan University
Exam policy: Open book. You can bring any books, handouts, and any kinds of paper-based notes with you, but electronic devices (including cellphones, laptops, tablets, etc.) are strictly prohibited.
2. (18%) Determine whether each of the following functions is a convex function, a quasi-convex function, or a concave function. Write your answer as a table of 6 rows and 3 columns, with
⟨z, X₁z⟩ ≥ 1,  ⟨z, X₂z⟩ ≥ 1.
Then, for 0 ≤ θ ≤ 1,
⟨z, (θX₁ + (1 − θ)X₂)z⟩ = θ⟨z, X₁z⟩ + (1 − θ)⟨z, X₂z⟩ ≥ θ · 1 + (1 − θ) · 1 = 1,
as required in the definition of S10. To see it is not a cone, consider z = (1, 0, . . . , 0) and X = I ∈ Sⁿ (the symmetric matrices). Here ⟨z, Iz⟩ = 1, but ⟨z, 2Iz⟩ = 2. The reason that it is not affine is the same, by considering 2I = 2 · I + (−1) · O, on the "line" containing O (the all-zero matrix) and I. It follows that it is not a subspace.
11. S11 = {x ∈ Rⁿ : ‖Px + q‖₂ ≤ cᵀx + r}, given any P ∈ R^(m×n), q ∈ Rᵐ, c ∈ Rⁿ, and r ∈ R. T, F, F, F. To show convexity, if
Research Statement
Parikshit Gopalan

My research focuses on fundamental algebraic problems such as polynomial reconstruction and interpolation arising from various areas of theoretical computer science. My main algorithmic contributions include the first algorithm for list-decoding a well-known family of codes called Reed-Muller codes [13], and the first algorithms for agnostically learning parity functions [3] and decision trees [11] under the uniform distribution. On the complexity-theoretic side, my contributions include the best-known hardness results for reconstructing low-degree multivariate polynomials from noisy data [12] and the discovery of a connection between representations of Boolean functions by polynomials and communication complexity [2].

1 Introduction

Many important recent developments in theoretical computer science, such as probabilistic proof checking, deterministic primality testing and advancements in algorithmic coding theory, share a common feature: the extensive use of techniques from algebra. My research has centered around the application of these methods to problems in coding theory, computational learning, hardness of approximation and Boolean function complexity. While at first glance these might seem like four research areas that are not immediately related, there are several beautiful connections between them. Perhaps the best illustration of these links is the noisy parity problem, where the goal is to recover a parity function from a corrupted set of evaluations. The seminal Goldreich-Levin algorithm solves a version of this problem; this result initiated the study of list-decoding algorithms for error-correcting codes [5]. An alternate solution is the Kushilevitz-Mansour algorithm [19], which is a crucial component in algorithms for learning decision trees and DNFs [17]. Håstad's ground-breaking work on the hardness of this problem has revolutionized our understanding of inapproximability [16]. All these results rely on insights into the Fourier structure of Boolean functions. As I illustrate below, my research has contributed to a better understanding of these connections, and yielded progress on some important open problems in these areas.

2 Coding Theory

The broad goal of coding theory is to enable meaningful communication in the presence of noise, by suitably encoding the messages. The natural algorithmic problem associated with this task is that of decoding, or recovering the transmitted message from a corrupted encoding. The last twenty years have witnessed a revolution with the discovery of several powerful decoding algorithms for well-known families of error-correcting codes. A key role has been played by the notion of list-decoding, a relaxation of the classical decoding problem where we are willing to settle for a small list of candidate transmitted messages rather than insisting on a unique answer. This relaxation allows one to break the classical half-the-minimum-distance barrier for decoding error-correcting codes. We now know powerful list-decoding algorithms for several important code families; these algorithms have also made a huge impact on complexity theory [5, 15, 23].

List-Decoding Reed-Muller Codes: In recent work with Klivans and Zuckerman, we give the first such list-decoding algorithm for a well-studied family of codes known as Reed-Muller codes, obtained from low-degree polynomials over the finite field F2 [13]. The highlight of this work is that our algorithm is able to tolerate error-rates which are much higher than what is known as the Johnson bound in coding theory. Our results imply new combinatorial bounds on the error-correcting capability of these codes. While Reed-Muller codes have been studied extensively in both the coding theory and computer science communities, our result is the first to show that they are resilient to remarkably high error-rates. Our algorithm is based on a novel view of the Goldreich-Levin algorithm as a reduction from list-decoding to unique-decoding; our view readily extends to polynomials of arbitrary degree over any field. Our result complements recent work on the Gowers norm, showing that Reed-Muller codes are testable up to large distances [21].

Hardness of Polynomial Reconstruction: In the polynomial reconstruction problem, one is asked to recover a low-degree polynomial from its evaluations at a set of points, where some of the values could be incorrect. The reconstruction problem is ubiquitous in both coding theory and computational learning. Both the noisy parity problem and the Reed-Muller decoding problem are instances of this problem. In joint work with Khot and Saket, we address the complexity of this problem and establish the first hardness results for multivariate polynomials of arbitrary degree [12]. Previously, the only hardness known was for degree 1, which follows from the celebrated work of Håstad [16]. Our work introduces a powerful new algebraic technique called global folding, which allows one to bypass a module called consistency testing that is crucial to most hardness results. I believe this technique will find other applications.

Average-Case Hardness of NP: Algorithmic advances in decoding of error-correcting codes have helped us gain a deeper understanding of the connections between worst-case and average-case complexity [23, 24]. In recent work with Guruswami, we use this paradigm to explore the average-case complexity of problems in NP against algorithms in P [8]. We present the first hardness amplification result in this setting by giving a construction of an error-correcting code where most of the symbols can be recovered correctly from a corrupted codeword by a deterministic algorithm that probes very few locations in the codeword. The novelty of our work is that our decoder is deterministic, whereas previous algorithms for this task were all randomized.

3 Computational Learning

Computational learning aims to understand the algorithmic issues underlying how we learn from examples, and to explore how the complexity of learning is influenced by factors such
as the ability to ask queries and the possibility of incorrect answers. Learning algorithms for a concept class typically rely on understanding the structure of that concept class, which naturally ties learning to Boolean function complexity. Learning in the presence of noise has several connections to decoding from errors. My work in this area addresses the learnability of basic concept classes such as decision trees, parities and halfspaces.

Learning Decision Trees Agnostically: The problem of learning decision trees is one of the central open problems in computational learning. Decision trees are also a popular hypothesis class in practice. In recent work with Kalai and Klivans, we give a query algorithm for learning decision trees with respect to the uniform distribution on inputs in the agnostic model: given black-box access to an arbitrary Boolean function, our algorithm finds a hypothesis that agrees with it on almost as many inputs as the best decision tree [11]. Equivalently, we can learn decision trees even when the data is corrupted adversarially; this is the first polynomial-time algorithm for learning decision trees in a harsh noise model. Previous decision-tree learning algorithms applied only to the noiseless setting. Our algorithm can be viewed as the agnostic analog of the Kushilevitz-Mansour algorithm [19]. The core of our algorithm is a procedure to implicitly solve a convex optimization problem in high dimensions using approximate gradient projection.

The Noisy Parity Problem: The noisy parity problem has come to be widely regarded as a hard problem. In work with Feldman et al., we present evidence supporting this belief [3]. We show that in the setting of learning from random examples (without queries), several outstanding open problems such as learning juntas, decision trees and DNFs reduce to restricted versions of the problem of learning parities with random noise. Our result shows that in some sense, noisy parity captures the gap between learning from random examples and learning with queries, as it is believed to be hard in the former setting and is known to be easy in the latter. On the positive side, we present the first non-trivial algorithm for the noisy parity problem under the uniform distribution in the adversarial noise model. Our result shows that, somewhat surprisingly, adversarial noise is no harder to handle than random noise.

Hardness of Learning Halfspaces: The problem of learning halfspaces is a fundamental problem in computational learning. One could hope to design algorithms that are robust even in the presence of a few incorrectly labeled points. Indeed, such algorithms are known in the setting where the noise is random. In work with Feldman et al., we show that the setting of adversarial errors might be intractable: given a set of points where 99% are correctly labeled by some halfspace, it is NP-hard to find a halfspace that correctly labels even 51% of the points [3].

4 Prime versus Composite Problems

My thesis work focuses on new aspects of an old and famous problem: the difference between primes and composites. Beyond basic problems like primality and factoring, there are many other computational issues that are not yet well understood. For instance, in circuit complexity, we have excellent lower bounds for small-depth circuits with mod 2 gates, but the same problem for circuits with mod 6 gates is wide open. Likewise in combinatorics, set systems where the sizes of the sets need to satisfy certain modular conditions are well studied. Again the prime case is well understood, but little is known for composites. In all these problems, the algebraic techniques that work well in the prime case break down for composites.

Boolean function complexity: Perhaps the simplest class of circuits for which we have been unable to show lower bounds is small-depth circuits with And, Or and Mod m gates where m is composite; indeed this is one of the frontier open problems in circuit complexity. When m is prime, such bounds were proved by Razborov and Smolensky [20, 22]. One reason for this gap is that we do not fully understand the computational power of polynomials over composites; Barrington et al. were the first to show that such polynomials are surprisingly powerful [1]. In joint work with Bhatnagar and Lipton, we solve an important special case: when the polynomials are symmetric in their variables [2]. We show an equivalence between computing Boolean functions by symmetric polynomials over composites and multi-player communication protocols, which enables us to apply techniques from communication complexity and number theory to this problem. We use these techniques to show tight degree bounds for various classes of functions where no bounds were known previously. Our viewpoint simplifies previously known results in this area, and reveals new connections to well-studied questions about Diophantine equations.

Explicit Ramsey Graphs: A basic open problem regarding polynomials over composites is: can asymmetry in the variables help us compute a symmetric function with low degree? I show a connection between this question and an important open problem in combinatorics, which is to explicitly construct Ramsey graphs, or graphs with no large cliques and independent sets [6]. While good Ramsey graphs are known to exist by probabilistic arguments, explicit constructions have proved elusive. I propose a new algebraic framework for constructing Ramsey graphs and show how several known constructions can all be derived from this framework in a unified manner. I show that all known constructions rely on symmetric polynomials, and that such constructions cannot yield better Ramsey graphs. Thus the question of symmetry versus asymmetry of variables is precisely the barrier to better constructions by such techniques.

Interpolation over Composites: A basic problem in computational algebra is polynomial interpolation, which is to recover a polynomial from its evaluations. Interpolation and related algorithmic tasks which are easy for primes become much
harder, even intractable, over composites. This difference stems from the fact that over primes, the number of roots of a polynomial is bounded by the degree, but no such theorem holds for composites. In lieu of this theorem I presented an algorithmic bound; I show how to compute a bound on the degree of a polynomial given its zero set [7]. I use this to give the first optimal algorithms for interpolation, learning and zero-testing over composites. These algorithms are based on new structural results about the zeroes of polynomials. These results were subsequently useful in ruling out certain approaches for better Ramsey constructions [6].

5 Other Research Highlights

My other research work spans areas of theoretical computer science ranging from algorithms for massive data sets to computational complexity. I highlight some of this work below.

Data Stream Algorithms: Algorithmic problems arising from complex networks like the Internet typically involve huge volumes of data. This has led to increased interest in highly efficient algorithmic models like sketching and streaming, which can meaningfully deal with such massive data sets. A large body of work on streaming algorithms focuses on estimating how sorted the input is. This is motivated by the realization that sorting the input is intractable in the one-pass data stream model. In joint work with Krauthgamer, Jayram and Kumar, we presented the first sub-linear space data stream algorithms to estimate two well-studied measures of sortedness: the distance from monotonicity (or Ulam distance for permutations), and the length of the longest increasing subsequence, or LIS. In more recent work with Anna Gál, we prove optimal lower bounds for estimating the length of the LIS in the data-stream model [4]. This is established by proving a direct-sum theorem for the communication complexity of a related problem. The novelty of our techniques is the model of communication that they address. As a corollary, we obtain a separation between two models of communication that are commonly studied in relation to data stream algorithms.

Structural Properties of SAT Solutions: The solution space of random SAT formulae has been studied with a view to better understanding connections between computational hardness and phase transitions from satisfiable to unsatisfiable. Recent algorithmic approaches rely on connectivity properties of the space and break down in the absence of connectivity. In joint work with Kolaitis, Maneva and Papadimitriou, we consider the problem: given a Boolean formula, do its solutions form a connected subset of the hypercube? We classify the worst-case complexity of various connectivity properties of the solution space of SAT formulae in Schaefer's framework [14]. We show that the jump in the computational hardness is accompanied by a jump in the diameter of the solution space from linear to exponential.

Complexity of Modular Counting Problems: In joint work with Guruswami and Lipton, we address the complexity of counting the roots of a multivariate polynomial over a finite field F_q modulo some number r [9]. We establish a dichotomy showing that the problem is easy when r is a power of the characteristic of the field and intractable otherwise. Our results give several examples of problems whose decision versions are easy, but whose modular counting versions are hard.

6 Future Research Directions

My broad research goal is to gain a complete understanding of the complexity of problems arising in coding theory, computational learning and related areas; I believe that the right tools for this will come from Boolean function complexity and hardness of approximation. Below I outline some of the research directions I would like to pursue in the future.

List-decoding algorithms have allowed us to break the unique-decoding barrier for error-correcting codes. It is natural to ask if one can perhaps go beyond the list-decoding radius and solve the problem of finding the codeword nearest to a received word at even higher error rates. On the negative side, we do not currently know any examples of codes where one can do this. But I think that recent results on Reed-Muller codes do offer some hope [13, 21]. Algorithms for solving the nearest codeword problem, if they exist, could also have exciting implications in computational learning. There are concept classes which are well-approximated by low-degree polynomials over finite fields lying just beyond the threshold of what is currently known to be learnable efficiently [20, 22]. Decoding algorithms for Reed-Muller codes that can tolerate very high error rates might present an approach to learning such concept classes.

One of the challenges in algorithmic coding theory is to determine whether known algorithms for list-decoding Reed-Solomon codes [15] and Reed-Muller codes [13, 23] are optimal. This raises both computational and combinatorial questions. I believe that my work with Khot et al. represents a good first step towards understanding the complexity of the decoding/reconstruction problem for multivariate polynomials. Proving similar results for univariate polynomials is an excellent challenge which seems to require new ideas in hardness of approximation.

There is a large body of work proving strong NP-hardness results for problems in computational learning. However, all such results only address the proper learning scenario, where the learning algorithm is restricted to produce a hypothesis from some particular class H, which is typically the same as the concept class C. In contrast, known learning algorithms are mostly improper algorithms which could use more complicated hypotheses. For hardness results that are independent of the hypothesis class H used by the algorithm, one currently has to resort to cryptographic assumptions. In ongoing work with Guruswami and Raghavendra, we are investigating the possibility of proving NP-hardness for improper learning.

Finally, I believe that there are several interesting directions to explore in the agnostic learning model. An exciting
insight in this area comes from the work of Kalai et al., who show that ℓ1 regression is a powerful tool for noise-tolerant learning [18]. A powerful paradigm in computational learning is to prove that the concept has some kind of polynomial approximation and then recover the approximation. Algorithms based on ℓ1 regression require a weaker polynomial approximation in comparison with previous algorithms (which use ℓ2 regression), but use more powerful machinery for the recovery step. Similar ideas might allow us to extend the boundaries of efficient learning even in the noiseless model; this is a possibility I am currently exploring.

Having worked in areas ranging from data stream algorithms to Boolean function complexity, I view myself as both an algorithm designer and a complexity theorist. I have often found that working on one aspect of a problem gives insights into the other; indeed much of my work has originated from such insights ([12] and [13], [10] and [4], [6] and [7]). I find that this is increasingly the case across several areas in theoretical computer science. My aim is to maintain this balance between upper and lower bounds in my future work.

References

[1] D. A. Barrington, R. Beigel, and S. Rudich. Representing Boolean functions as polynomials modulo composite numbers. Computational Complexity, 4:367–382, 1994.
[2] N. Bhatnagar, P. Gopalan, and R. J. Lipton. Symmetric polynomials over Z_m and simultaneous communication protocols. Journal of Computer & System Sciences (special issue for FOCS'03), 72(2):450–459, 2003.
[3] V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proc. 47th IEEE Symp. on Foundations of Computer Science (FOCS'06), 2006.
[4] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.
[5] O. Goldreich and L. Levin. A hard-core predicate for all one-way functions. In Proc. 21st ACM Symposium on the Theory of Computing (STOC'89), pages 25–32, 1989.
[6] P. Gopalan. Constructing Ramsey graphs from Boolean function representations. In Proc. 21st IEEE Symposium on Computational Complexity (CCC'06), 2006.
[7] P. Gopalan. Query-efficient algorithms for polynomial interpolation over composites. In Proc. 17th ACM-SIAM Symposium on Discrete Algorithms (SODA'06), 2006.
[8] P. Gopalan and V. Guruswami. Deterministic hardness amplification via local GMD decoding. Submitted to 23rd IEEE Symp. on Computational Complexity (CCC'08), 2008.
[9] P. Gopalan, V. Guruswami, and R. J. Lipton. Algorithms for modular counting of roots of multivariate polynomials. In Latin American Symposium on Theoretical Informatics (LATIN'06), 2006.
[10] P. Gopalan, T. S. Jayram, R. Krauthgamer, and R. Kumar. Estimating the sortedness of a data stream. In Proc. 18th ACM-SIAM Symposium on Discrete Algorithms (SODA'07), 2007.
[11] P. Gopalan, A. T. Kalai, and A. R. Klivans. Agnostically learning decision trees. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.
[12] P. Gopalan, S. Khot, and R. Saket. Hardness of reconstructing multivariate polynomials over finite fields. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.
[13] P. Gopalan, A. R. Klivans, and D. Zuckerman. List-decoding Reed-Muller codes over small fields. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.
[14] P. Gopalan, P. G. Kolaitis, E. N. Maneva, and C. H. Papadimitriou. Computing the connectivity properties of the satisfiability solution space. In Proc. 33rd Intl. Colloquium on Automata, Languages and Programming (ICALP'06), 2006.
[15] V. Guruswami and M. Sudan. Improved decoding of Reed-Solomon and algebraic-geometric codes. IEEE Transactions on Information Theory, 45(6):1757–1767, 1999.
[16] J. Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
[17] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414–440, 1997.
[18] A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. Agnostically learning halfspaces. In Proc. 46th IEEE Symp. on Foundations of Computer Science, pages 11–20, 2005.
[19] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993.
[20] A. Razborov. Lower bounds for the size of circuits of bounded depth with basis {∧, ⊕}. Mathematical Notes of the Academy of Sciences of the USSR, (41):333–338, 1987.
[21] A. Samorodnitsky. Low-degree tests at large distances. In Proc. 39th ACM Symposium on the Theory of Computing (STOC'07), pages 506–515, 2007.
[22] R. Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proc. 19th Annual ACM Symposium on Theory of Computing (STOC'87), pages 77–82, 1987.
[23] M. Sudan, L. Trevisan, and S. P. Vadhan. Pseudorandom generators without the XOR lemma. J. Comput. Syst. Sci., 62(2):236–266, 2001.
[24] L. Trevisan. List-decoding using the XOR lemma. In Proc. 44th IEEE Symposium on Foundations of Computer Science (FOCS'03), pages 126–135, 2003.
Introduction to Optimization
Course Overview
Convex Optimization Example: Minimum Cost Flow
Given a directed network G = (V, E) with cost c_e ∈ R+ per unit of traffic on edge e, and capacity d_e, find the minimum cost routing of r divisible units of traffic.
Who Should Take this Class
Anyone planning to do research in the design and analysis of algorithms
Convex Optimization Problem
A continuous optimization problem where f is a convex function on X, and X is a convex set.
Convex function: f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for all x, y ∈ X and α ∈ [0, 1]
Convex set: αx + (1 − α)y ∈ X, for all x, y ∈ X and α ∈ [0, 1]
Convexity of X implied by convexity of the g_i's
For maximization problems, f should be concave
Typically solvable efficiently (i.e. in polynomial time)
Encodes optimization problems from a variety of application areas
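As a quick sanity check on the defining inequality above, convexity can be probed numerically for a concrete function; f(x) = x² here is a hypothetical example, not one from the slides:

```python
import random

def f(x):
    return x * x  # a simple convex function

# Check f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) at many random points.
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    a = random.uniform(0, 1)
    lhs = f(a * x + (1 - a) * y)
    rhs = a * f(x) + (1 - a) * f(y)
    assert lhs <= rhs + 1e-9, "convexity inequality violated"
print("convexity inequality holds on all sampled points")
```

Random sampling can only refute convexity, never prove it, but it is a useful quick check when experimenting.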
Adversarial Examples
Fig. 1: Example of attacks on deep learning models with 'universal adversarial perturbations'. The attacks are shown for the CaffeNet, VGG-F network and GoogLeNet. All the networks recognized the original clean images correctly with high confidence. After small perturbations were added to the images, the networks predicted wrong labels with similarly high confidence. Notice that the perturbations are hardly perceptible to the human visual system; however, their effects on the deep learning models are catastrophic.
Definition of terms
• Adversarial example/image is a modified version of a clean image that is intentionally perturbed (e.g. by adding noise) to confuse/fool a machine learning technique, such as deep neural networks.
• Adversarial perturbation is the noise that is added to the clean image to make it an adversarial example.
• Adversarial training uses adversarial images besides the clean images during training.
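One common way to craft such a perturbation, not detailed on these slides, is the fast gradient sign method (FGSM); the sketch below applies it to a toy logistic classifier, with all names and numbers purely illustrative:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One-step fast gradient sign attack on a logistic classifier.

    x: input vector, y: true label in {0, 1}, eps: perturbation budget.
    Returns x + eps * sign(grad_x loss), the adversarial example.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                    # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

# Tiny demo: the perturbation pushes x against the correct class.
w = np.array([1.0, -2.0]); b = 0.0
x = np.array([0.5, -0.5]); y = 1            # currently classified correctly
x_adv = fgsm_perturb(x, w, b, y, eps=0.3)
print(x_adv)                                 # margin w @ x drops from 1.5 to 0.6
```

Each entry of the input moves by at most eps, which is why the perturbation can stay imperceptible while still degrading the classifier's margin.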
Overview of Node Localization
Sensor Network Node Localization Algorithms
Range-based localization algorithms are relatively complex to implement. First, the absolute distances from each unknown node to the beacon nodes are measured using common ranging techniques such as TOA, TDOA, AOA, and RSSI. This stage is also called the ranging stage. After ranging comes the localization (coordinate computation) stage, in which the inter-node distances or bearings obtained during ranging are used to compute the positions of the unknown nodes.
Sensor Network Node Localization Algorithms
Range-based localization algorithms:
➢ Trilateration
➢ Multilateration
➢ Triangulation
➢ Maximum Likelihood Method
➢ Goniometry (angle-based localization), etc.
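As an illustration of the first of these methods, here is a sketch (not from the slides) of trilateration: subtracting the first circle equation from the others turns the three distance equations into a linear system:

```python
import numpy as np

def trilaterate(anchors, dists):
    """Estimate a 2D position from >= 3 beacon positions and measured distances.

    Subtracting the first circle equation from the others yields a linear
    system A p = b in the unknown position p, solved here by least squares.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    x0, y0 = anchors[0]
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - (x0 ** 2 + y0 ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Unknown node at (2, 3); distances to the three beacons are exact here.
beacons = [(0, 0), (10, 0), (0, 10)]
true_p = np.array([2.0, 3.0])
d = [np.linalg.norm(true_p - np.array(bc)) for bc in beacons]
print(trilaterate(beacons, d))  # close to [2, 3]
```

With noisy range measurements, the same least-squares formulation still applies and more beacons improve the estimate.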
The MDS-MAP localization algorithm was proposed by Yi Shang et al. at the University of Missouri-Columbia. It is a centralized localization algorithm: inter-node connectivity information is used to build an inter-node distance matrix via the Dijkstra or Floyd algorithm, and multidimensional scaling (MDS) is then applied to obtain the relative positions of the nodes.
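The core MDS step mentioned above can be sketched as classical multidimensional scaling (an illustrative sketch only, not the full MDS-MAP algorithm): double-center the squared distance matrix and take its top eigenvectors.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Recover relative coordinates from a pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    w, V = w[::-1][:dim], V[:, ::-1][:, :dim]
    return V * np.sqrt(np.maximum(w, 0.0))

# Three points on a line; MDS recovers the geometry up to rotation/translation.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
X = classical_mds(D)
D_rec = np.linalg.norm(X[:, None] - X[None, :], axis=2)
print(np.allclose(D, D_rec))  # → True
```

In MDS-MAP the input distances are shortest-path hop estimates rather than exact ranges, so the recovered map is then rotated and scaled to fit the known beacon positions.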
This polygon essentially determines the region in which the unknown node lies and narrows down its possible range; finally, the centroid of the polygonal region is computed and taken as the position of the unknown node, thereby localizing it.
Sensor Network Node Localization Algorithms
Range-free localization algorithms: the convex programming localization algorithm (Convex Optimization)
➢ The basic principle of the convex programming localization algorithm is shown in Fig. 2-5.
RFID and Identification Technology
Concept Description
Localization
➢ Localization means determining the position of something within a given environment.
➢ In wireless sensor networks, localization has two meanings:
• First, a node determines its own position in the system.
• Second, the system determines the position of its target within the system.
Concept Description
In practical applications of sensor networks:
convex optimization problem
A convex optimization problem is an optimization problem in which the functions involved are convex. A convex function is one whose value on the segment between any two points never exceeds the chord joining the function values at those endpoints, which gives convex functions very useful properties in optimization. In mathematics, convex optimization problems form an important class of problems, widely applied in statistics, machine learning, control theory, and other fields. Because of their good properties, convex problems can be solved efficiently under mild assumptions.
The general form of a convex optimization problem is:
min f(x)
s.t. g_i(x) ≤ 0, i = 1, 2, ..., m
     h_j(x) = 0, j = 1, 2, ..., n
where x is the optimization variable, f(x) is the convex function to be minimized, g_i(x) are the inequality constraints, and h_j(x) are the equality constraints.
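For intuition, here is a minimal sketch of solving one hypothetical instance of this form, min ‖x − a‖² subject to x ≥ 0, by projected gradient descent, where projecting onto the feasible set is a simple clip (this example is not from the text):

```python
import numpy as np

def solve_nonneg_ls(a, lr=0.2, steps=200):
    """min ||x - a||^2  s.t.  x >= 0, via projected gradient descent."""
    x = np.zeros_like(a)
    for _ in range(steps):
        x = x - lr * 2 * (x - a)   # gradient step on the objective
        x = np.maximum(x, 0.0)     # project back onto the feasible set
    return x

a = np.array([2.0, -1.0, 0.5])
print(solve_nonneg_ls(a))  # converges to [2, 0, 0.5], i.e. a clipped at zero
```

Projected gradient methods apply whenever the feasible set admits an easy projection; general convex constraints usually call for interior-point or other solvers.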
There are many methods for solving convex optimization problems; among the most common are interior-point methods and gradient descent.
Interior-point methods are iterative: each iteration solves a linear or nonlinear system of equations.
They have good convergence properties and computational efficiency, and perform particularly well on large-scale problems.
Gradient descent, by contrast, iterates along the gradient direction: each iteration updates the optimization variable along the negative gradient.
It likewise performs well on convex problems, but on non-convex problems it may converge only to a local minimum.
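As a concrete sketch of the gradient-descent idea on a convex problem (the quadratic objective and step size below are illustrative assumptions):

```python
import numpy as np

# Gradient descent on the convex quadratic f(x) = 0.5 x^T Q x - b^T x,
# with Q symmetric positive definite; the unique minimizer solves Q x = b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b               # gradient of f

x = np.zeros(2)
step = 0.2                          # fixed step < 2 / lambda_max(Q)
for _ in range(500):
    x = x - step * grad(x)          # move along the negative gradient

ok = np.allclose(x, np.linalg.solve(Q, b))
print(ok)  # True: converged to the global minimum
```

Because the objective is convex, any point where the gradient vanishes is the global minimum; on a non-convex objective the same iteration could stall at a local minimum.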
Besides these two, there are other methods for solving convex optimization problems, such as the proximal gradient method and the subgradient method, as well as reformulations into standard classes such as semidefinite programs.
Each method has its own characteristics and scope of applicability, and should be chosen and tuned for the problem at hand.
In short, convex optimization is widely used in mathematics and its applications, and many effective solution methods are available.
Mastering the basics of convex optimization and its solution methods greatly improves one's ability in mathematical modeling and problem solving.
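The proximal gradient method mentioned above has a particularly clean form for ℓ1-regularized least squares, min_x ½‖y − Ax‖₂² + τ‖x‖₁ (the same objective the GPSR paper below targets): each iteration takes a gradient step on the smooth term followed by componentwise soft-thresholding (ISTA). A minimal sketch, with an illustrative random test problem:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, tau, n_iter=2000):
    """Proximal gradient (ISTA) for min_x 0.5||y - Ax||^2 + tau*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)            # gradient of the smooth term
        x = soft_threshold(x - g / L, tau / L)
    return x

# Tiny sparse-recovery demo with a random Gaussian sensing matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[3, 30, 70]] = [1.0, -1.0, 1.0]
y = A @ x_true
x_hat = ista(A, y, tau=0.01)
print(np.linalg.norm(x_hat - x_true))    # small: the sparse signal is recovered
```

The soft-thresholding step is what drives most coefficients exactly to zero, producing sparse iterates.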
GPSR
Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems
Mário A. T. Figueiredo, Robert D. Nowak, Stephen J. Wright

Abstract—Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (ℓ2) error term combined with a sparseness-inducing (ℓ1) regularization term. Basis Pursuit, the Least Absolute Shrinkage and Selection Operator (LASSO), wavelet-based deconvolution, and Compressed Sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform very well in a wide range of applications, being significantly faster (in terms of computation time) than competing methods.

I. INTRODUCTION

A. Background

There has been considerable interest in solving the convex unconstrained optimization problem

$$\min_x \; \tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1, \qquad (1)$$

[...] dictionary (that is, multiplying by W corresponds to performing an inverse wavelet transform), and x is the vector of representation coefficients of the unknown image/signal [24], [25], [26].

We mention also image restoration problems under total variation (TV) regularization [10], [46]. In the one-dimensional (1D) case, a change of variables leads to the formulation (1). In 2D, however, the techniques of this paper cannot be applied directly.

Another intriguing new application for the optimization problem (1) is compressed sensing (CS) [7], [9], [18]. Recent results show that a relatively small number of random projections of a sparse signal can contain most of its salient information.
It follows that if a signal is sparse or approximately sparse in some orthonormal basis,then an accurate reconstruction can be obtained from random projec-tions,which suggests a potentially powerful alterna-tive to conventional Shannon-Nyquist sampling.In the noiseless setting,accurate approximations can be obtained byfinding a sparse signal that matches the random projections of the original signal.This problem can be cast as(5),where again matrix A has the form A=R W,but in this case R represents a low-rank randomized sensing matrix (e.g.,a k×d matrix of independent realizations of a random variable),while the columns of W contain the basis over which the signal has a sparse representation(e.g.,a wavelet basis).Problem(1) is a robust version of this reconstruction process, which is resilient to errors and noisy data,and similar criteria have been proposed and analyzed in [8],[31].Although some CS theory and algorithms apply to complex vectors,x∈C n,y∈C k,we will not consider that case here,since the proposed approach does not apply to it.B.Previous AlgorithmsSeveral optimization algorithms and codes have recently been proposed to solve the QCLP(3),the QP(4),the LP(5),and the unconstrained(but nonsmooth)formulation(1).We review this work briefly here and identify those contributions that are most suitable for signal processing applications, which are the target of this paper.In the class of applications that motivates this paper,the matrix A cannot be stored explicitly, and it is costly and impractical to access significant portions of A and A T A.In wavelet-based image reconstruction and some CS problems,for which A=R W,explicit storage of A,R,or W is not practical for problems of interesting scale.However, matrix-vector products involving R and W can be done quite efficiently.For example,if the columns of W contain a wavelet basis,then any multiplication 1A comprehensive repository of CS literature and software can be fond in /cs/.of the form Wv or W T v can be 
performed by a fast wavelet transform(see Section III-G,for details).Similarly,if R represents a convolution, then multiplications of the form Rv or R T v can be performed with the help of the fast Fourier transform (FFT)algorithm.In some CS applications,if the dimension of y is not too large,R can be explicitly stored;however,A is still not available explicitly, because the large and dense nature of W makes it highly impractical to compute and store R W. Homotopy algorithms thatfind the full path of solutions,for all nonnegative values of the scalar parameters in the various formulations(τin(1),εin(3),and t in(4)),have been proposed in[22],[38], [45],and[56].The formulation(4)is addressed in [45],while[56]addresses(1)and(4).The method in[38]provides the solution path for(1),for a range of values ofτ.The least angle regression (LARS)procedure described in[22]can be adapted to solve the LASSO formulation(4).These are all essentially homotopy methods that perform pivoting operations involving submatrices of A or A T A at certain critical values of the corresponding parame-ter(τ,t,orε).These methods can be implemented so that only the submatrix of A corresponding to nonzero components of the current vector x need be known explicitly,so that if x has few nonze-ros,these methods may be competitive even for problems of very large scale.(See for example the SolveLasso function in the SparseLab toolbox, available from .)In some signal processing applications,however,the number of nonzero x components may be signif-icant,and since these methods require at least as many pivot operations as there are nonzeros in the solution,they may be less competitive on such problems.The interior-point approach in[57],which solves a generalization of(4),also requires explicit construction of A T A,though the approach could in principle modified to allow iterative solution of the linear system at each primal-dual iteration. 
Algorithms that require only matrix-vector prod-ucts involving A and A T have been proposed ina number of recent works.In[11],the problems(5)and(1)are solved byfirst reformulating them as“perturbed linear programs”(which are linear programs with additional terms in the objective which are squared norms of the unknowns),then applying a standard primal-dual interior-point ap-proach[59].The linear equations or least-squares problems that arise at each interior-point iteration are then solved with iterative methods such as LSQR [47]or conjugate gradients(CG).Each iteration of these methods requires one multiplication each by A and A T.MATLAB implementations of related approaches are available in the SparseLab toolbox; see in particular the routines SolveBP and pdco. For additional details see[50].Another interior-point method was proposed recently(after submission of thefirst version of this paper)to solve a quadratic program-ming reformulation of(1),different from the one adopted here.Each search step is com-puted using preconditioned conjugate gradients (PCG)and requires only products involving A and A T[35].The code,which is available from /˜boyd/l1_ls/,is re-ported to perform faster than competing codes on the problems tested in[35].Theℓ1-magic suite of codes(which is available at )implements algorithms for several of the formulations described in Section I-A. In particular,the formulation(3)is solved by re-casting it as a second-order cone program(SOCP), then applying a primal log-barrier approach.For each value of the log-barrier parameter,the smooth unconstrained subproblem is solved using Newton’s method with line search,where the Newton equa-tions may be solved using CG.(Background on this approach can be found in[6],[9],while details of the algorithm are given in the User’s Guide for ℓ1-magic.)As in[11]and[35],each CG iteration requires only multiplications by A and A T;these matrices need not be known or stored explicitly. 
Iterative shrinkage/thresholding(IST)algorithms can also be used to handle(1)and only require matrix-vector multiplications involving A and A T. Initially,IST was presented as an EM algorithm,in the context of image deconvolution problems[44], [25].IST can also be derived in a majorization-minimization(MM)framework2[16],[26](see also [23],for a related algorithm derived from a dif-ferent perspective).Convergence of IST algorithms was shown in[13],[16].IST algorithms are based on bounding the matrix A T A(the Hessian of y−Ax 22)by a diagonal D(i.e.,D−A T A is positive semi-definite),thus attacking(1)by solving a sequence of simpler denoising problems.While this bound may be reasonably tight in the case of deconvolution(where R is usually a square matrix), it may be loose in the CS case,where matrix R usually has many fewer rows than columns.For this reason,IST may not be as effective for solving(1)in CS applications,as it is in deconvolution problems. Finally,we mention matching pursuit(MP)and orthogonal MP(OMP)[17],[20],[55],which are greedy schemes tofind a sparse representation of a signal on a dictionary of basis functions.(Matrix A is seen as an n-element dictionary of k-dimensional signals).MP works by iteratively choosing the dic-tionary element that has the highest inner product with the current residual,thus most reduces the representation error.OMP includes an extra orthog-2Also known as bound optimization algorithms(BOA).For a general introduction to MM/BOA,see[32].onalization step,and is known to perform better than standard MP.Low computational complexity is one of the main arguments in favor of greedy schemes like OMP,but such methods are not designed to solve any of the optimization problems posed above. 
However,if y=Ax,with x sparse and the columns of A sufficiently incoherent,then OMPfinds the sparsest representation[55].It has also been shown that,under similar incoherence and sparsity condi-tions,OMP is robust to small levels of noise[20].C.Proposed ApproachThe approach described in this paper also requires only matrix-vector products involving A and A T, rather than explicit access to A.It is essentially a gradient projection(GP)algorithm applied to a quadratic programming formulation of(1),in which the search path from each iterate is obtained by projecting the negative-gradient onto the feasible set.We refer to our approach as GPSR(gradient projection for sparse reconstruction).Various en-hancements to this basic approach,together with careful choice of stopping criteria and afinal debias-ing phase(whichfinds the least squaresfit over the support set of the solution to(1)),are also important in making the method practical and efficient. Unlike the MM approach,GPSR does not involve bounds on the matrix A T A.In contrasts with the interior-point approaches discussed above,GPSR involves only one level of iteration.(The approaches in[11]and[35]have two iteration levels—an outer interior-point loop and an inner CG,PCG,or LSQR loop.Theℓ1-magic algorithm for(3)has three nested loops—an outer log-barrier loop,an inter-mediate Newton iteration,and an inner CG loop.) GPSR is able to solve a sequence of problems(1)efficiently for a sequence of values ofτ.Oncea solution has been obtained for a particularτ,it can be used as a“warm-start”for a nearby value. 
Solutions can therefore be computed for a range of τ values for a small multiple of the cost of solving for a single τ value from a "cold start." This feature of GPSR is somewhat related to that of LARS and other homotopy schemes, which compute solutions for a range of parameter values in succession. Interior-point methods such as those that underlie the approaches in [11], [35], and ℓ1-magic have been less successful in making effective use of warm-start information, though this issue has been investigated in various contexts (see for example [30], [34], [60]). To benefit from a warm start, interior-point methods require the initial point to be not only close to the solution but also sufficiently interior to the feasible set and close to a "central path," which is difficult to satisfy in practice.

II. PROPOSED FORMULATION

A. Formulation as a Quadratic Program

The first key step of our GPSR approach is to express (1) as a quadratic program; as in [28], this is done by splitting the variable x into its positive and negative parts. Formally, we introduce vectors u and v and make the substitution

$$x = u - v, \quad u \geq 0, \quad v \geq 0. \qquad (6)$$

These relationships are satisfied by u_i = (x_i)_+ and v_i = (−x_i)_+ for all i = 1, 2, ..., n, where (·)_+ denotes the positive-part operator defined as (x)_+ = max{0, x}. We thus have ‖x‖₁ = 1_n^T u + 1_n^T v, where 1_n = [1, 1, ..., 1]^T is the vector consisting of n ones, so (1) can be rewritten as the following bound-constrained quadratic program (BCQP):

$$\min_{u,v} \; c^T z + \tfrac{1}{2} z^T B z \equiv F(z), \quad \text{s.t. } z \geq 0, \qquad (8)$$

where

$$z = \begin{bmatrix} u \\ v \end{bmatrix}, \quad b = A^T y, \quad c = \tau 1_{2n} + \begin{bmatrix} -b \\ b \end{bmatrix}$$

and

$$B = \begin{bmatrix} A^T A & -A^T A \\ -A^T A & A^T A \end{bmatrix}. \qquad (9)$$

B. Dimensions of the BCQP

It may be observed that the dimension of problem (8) is twice that of the original problem (1): x ∈ R^n, while z ∈ R^{2n}. However, this increase in dimension has only a minor impact. Matrix operations involving B can be performed more economically than its size suggests, by exploiting its particular structure (9). For a given z = [u^T v^T]^T, we have

$$Bz = \begin{bmatrix} A^T A (u - v) \\ -A^T A (u - v) \end{bmatrix},$$

indicating that Bz can be found by computing the vector difference u − v and then multiplying once each by A and A^T. Since ∇F(z) = c + Bz (the gradient of the objective function in (8)), we conclude that computation of ∇F(z) requires one multiplication each by A and A^T, assuming that c, which depends on b = A^T y, is pre-computed at the start of the algorithm.

Another common operation in the GP algorithms described below is to find the scalar z^T B z for a given z = [u^T, v^T]^T. It is easy to see that

$$z^T B z = (u - v)^T A^T A (u - v) = \|A(u - v)\|_2^2,$$

indicating that this quantity can be calculated using only a single multiplication by A. Since F(z) = (1/2) z^T B z + c^T z, it follows that evaluation of F(z) also requires only one multiplication by A.

C. A Note Concerning Non-negative Solutions

It is worth pointing out that when the solution of (1) is known in advance to be nonnegative, we can directly rewrite the problem as

$$\min_x \; (\tau 1_n - A^T y)^T x + \tfrac{1}{2} x^T A^T A x, \quad \text{s.t. } x \geq 0.$$

[...]

We then choose the initial guess to be

$$\alpha_0 = \arg\min_\alpha F(z^{(k)} - \alpha g^{(k)}),$$

which we can compute explicitly as

$$\alpha_0 = \frac{(g^{(k)})^T g^{(k)}}{(g^{(k)})^T B g^{(k)}}.$$

[...] With γ^(k) = (δ^(k))^T B δ^(k), the step parameter is safeguarded as α^(k+1) = mid(α_min, ‖δ^(k)‖₂²/γ^(k), α_max).

Step 4: Perform convergence test and terminate with approximate solution z^(k+1) if it is satisfied; otherwise set k ← k + 1 and return to Step 1.

Since F is quadratic, the line search parameter λ^(k) in Step 2 can be calculated simply using the following closed-form expression:

$$\lambda^{(k)} = \operatorname{mid}\left\{0, \; \frac{-(\delta^{(k)})^T \nabla F(z^{(k)})}{(\delta^{(k)})^T B \delta^{(k)}}, \; 1\right\}.$$

(When (δ^(k))^T B δ^(k) = 0, we set λ^(k) = 1.) The use of this parameter λ^(k) removes one of the salient properties of the Barzilai-Borwein approach, namely, the possibility that F may increase on some iterations. Nevertheless, in our problems, it appeared to improve performance over the more standard non-monotone variant, which sets λ^(k) ≡ 1. We also tried other variants of the Barzilai-Borwein approach, including one proposed in [15], which alternates between two definitions of α^(k). The differences in performance were very small, so we focus our presentation on the method described above.
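The variable split and the safeguarded Barzilai-Borwein line search described above can be sketched as follows. This is a simplified, illustrative reading of the monotone GPSR-BB iteration — the variable names and the orthogonal-matrix test case are my own, not the authors' released MATLAB code:

```python
import numpy as np

def gpsr_bb(A, y, tau, n_iter=300, a_min=1e-30, a_max=1e30):
    """Simplified monotone GPSR-BB sketch for min 0.5||y - Ax||^2 + tau*||x||_1.

    Works on z = [u; v] with x = u - v (u, v >= 0) and
    F(z) = c^T z + 0.5 z^T B z, exploiting the structure of B so that each
    iteration costs only a few products by A and A^T.
    """
    n = A.shape[1]
    b = A.T @ y
    c = np.concatenate([tau - b, tau + b])      # c = tau*1_{2n} + [-b; b]
    z = np.zeros(2 * n)
    alpha = 1.0

    def Bz(w):
        t = A.T @ (A @ (w[:n] - w[n:]))         # A^T A (u - v)
        return np.concatenate([t, -t])          # block structure of B

    grad = c + Bz(z)                            # grad F(z) = c + B z
    for _ in range(n_iter):
        delta = np.maximum(z - alpha * grad, 0.0) - z   # projected BB step
        Bd = Bz(delta)
        dBd = delta @ Bd
        # Exact line search on the quadratic, clipped to [0, 1] (monotone).
        lam = 1.0 if dBd == 0.0 else np.clip(-(delta @ grad) / dBd, 0.0, 1.0)
        z = z + lam * delta
        grad = grad + lam * Bd                  # grad F is affine in z
        if dBd > 0:                             # safeguarded BB update
            alpha = np.clip((delta @ delta) / dBd, a_min, a_max)
    return z[:n] - z[n:]

# Sanity check with an orthogonal A (A^T A = I): the minimizer of (1) is
# then the soft-thresholding of A^T y, which the iteration should match.
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((8, 8)))
y = rng.standard_normal(8)
x = gpsr_bb(A, y, tau=0.3)
x_star = np.sign(A.T @ y) * np.maximum(np.abs(A.T @ y) - 0.3, 0.0)
print(np.allclose(x, x_star))  # True
```

The debiasing phase and duality-based stopping criteria of the paper are omitted here; the sketch only illustrates the BCQP split and the projected BB step.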
In earlier testing, we experimented with other variants of GP, including the GPCG approach of [42] and the proximal-point approach of [58]. The GPCG approach runs into difficulties because the projection of the Hessian B onto most faces of the positive orthant defined by z ≥ 0 is singular, so the inner CG loop in this algorithm tends to fail.

C. Convergence

Convergence of the methods proposed above can be derived from the analysis of Bertsekas [3] and Iusem [33], but follows most directly from the results of Birgin, Martinez, and Raydan [4] and Serafini, Zanghirati, and Zanni [51]. We summarize convergence properties of the two algorithms described above, assuming that termination occurs only when z^(k+1) = z^(k) (which indicates that z^(k) is optimal).

Theorem 1: The sequence of iterates {z^(k)} generated by either the GPSR-Basic or GPSR-BB algorithm either terminates at a solution of (8), or else converges to a solution of (8) at an R-linear rate.

Proof: Theorem 2.1 of [51] can be used to show that all accumulation points of {z^(k)} are stationary points. (This result applies to an algorithm in which the α^(k) are chosen by a different scheme, but the only relevant requirement on these parameters in the proof of [51, Theorem 2.1] is that they lie in the range [α_min, α_max], as is the case here.) Since the objective in (8) is clearly bounded below (by zero), we can apply [51, Theorem 2.2] to deduce convergence to a solution of (8) at an R-linear rate.

[...]

$$\max_s \; -\tfrac{1}{2} s^T s - y^T s, \quad \text{s.t. } -\tau 1_n \leq A^T s \leq \tau 1_n. \qquad (18)$$

If s is feasible for (18), then

$$\tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1 + \tfrac{1}{2} s^T s + y^T s \geq 0, \qquad (19)$$

with equality attained if and only if x is a solution of (1) and s is a solution of (18). To define a termination criterion, we invert the transformation in (6) to obtain a candidate x, and then construct a feasible s as follows:

$$s \equiv \tau \, \frac{Ax - y}{\max\{\tau, \|A^T (Ax - y)\|_\infty\}},$$

[...] and terminate if |C_k|/|I_k| ≤ tolA. (20) This criterion is well suited to the class of problems addressed in this paper (where we expect the cardinality of I_k, in later stages of the algorithm, to be much less than the dimension of z), and to algorithms of the gradient projection
type, which generate iterates on the boundary of the feasible set. However, it does not work for general BCQPs (e.g., in which all the components of z are nonzero at the solution) or for algorithms that generate iterates that remain in the interior of the feasible set.

It is difficult to choose a termination criterion from among these options that performs well on all data sets and in all contexts. In the tests described in Section IV, unless otherwise noted, we use (17), with tolP = 10^{-2}, which appeared to yield fairly consistent results. We may also impose some large upper limit maxiter on the number of iterations.

E. Debiasing

Once an approximate solution has been obtained using one of the algorithms above, we optionally perform a debiasing step. The computed solution z = [u^T, v^T]^T is converted to an approximate solution x_GP = u − v. The zero components of x_GP are fixed at zero, and the least-squares objective ‖y − Ax‖₂² is then minimized subject to this restriction using a CG algorithm (see for example [43, Chapter 5]). In our code, the CG iteration is terminated when

$$\|y - Ax\|_2^2 \leq \text{tolD} \cdot \|y - Ax_{GP}\|_2^2, \qquad (21)$$

where tolD is a small positive parameter. We also restrict the number of CG steps in the debiasing phase to maxiterD.

Essentially, the problem (1) is being used to select the "explanatory" variables (components of x), while the debiasing step chooses the optimal values for these components according to a least-squares criterion (without the regularization term τ‖x‖₁).
Similar techniques have been used in other ℓ1-based algorithms, e.g., [41]. It is also worth pointing out that debiasing is not always desirable. Shrinking the selected coefficients can mitigate unusually large noise deviations [19], a desirable effect that may be undone by debiasing.

F. Efficiently Solving for a Sequence of Regularization Parameters τ: Warm Starting

The gradient projection approach benefits from a good starting point. We can use the non-debiased solution of (1) to initialize GPSR in solving another problem in which τ is changed to a nearby value. The second solve will typically take fewer iterations than the first one; the number of iterations depends on the closeness of the values of τ and the closeness of the solutions. Using this warm-start technique, we can solve for a sequence of values of τ, moving from one value to the next in the sequence in either increasing or decreasing order. It is generally best to solve in order of τ increasing, as the number of nonzero components of x in the solution of (1) typically decreases as τ increases, so the additional steps needed in moving between two successive solutions usually take the form of moving some nonzero x components to the boundary and adjusting the other nonzero values.
We note that it is important to use the non-debiased solution as starting point;debiasing may move the iterates away from the true minimizer of(1)and is therefore generally of lower quality as a starting point for the next value ofτ.Our motivation for solving for a range ofτvalues is that it is often difficult to determine an appropriate value a priori.Theoretical analysis may suggest a certain value,but it is beneficial to explore a range of solutions surrounding this value and to examine the sparsity of the resulting solutions,possibly using some test based on the solution sparsity and the goodness of least-squaresfit to choose the“best”solution from among these possibilities.G.Analysis of Computational CostIt is not possible to accurately predict the number of GPSR-Basic and GPSR-BB iterations required to find an approximate solution.As mentioned above, this number depends in part on the quality of the initial guess.We can however analyze the cost of each iteration of these algorithms.The main compu-tational cost per iteration is a small number of inner products,vector-scalar multiplications,and vector additions,each requiring n or2nfloating-point operations,plus a modest number of multiplications by A and A T.When A=R W,these operations entail a small number of multiplications by R,R T, W,and W T.The cost of each CG iteration in the debiasing phase is similar but lower;just one multiplication by each of R,R T,W,and W T plus a number of vector operations.We next analyze the cost of multiplications by R,R T,W,and W T for various typical problems; let us begin by recalling that A=R W is a k×n matrix,and that x∈R n,y∈R k.Thus,if R has dimensions k×d,then W must be a d×n matrix. 
If W contains an orthogonal wavelet basis (d = n), matrix-vector products involving W or W^T can be implemented using fast wavelet transform algorithms with O(n) cost [39], instead of the O(n²) cost of a direct matrix-vector product. Consequently, the cost of a product by A or A^T is O(n) plus that of multiplying by R or R^T which, with a direct implementation, is O(kn). When using redundant translation-invariant wavelet systems, W is d × d(log₂(d) + 1), but the corresponding matrix-vector products can be done with O(d log d) cost, using fast undecimated wavelet transform algorithms [39]. Similar cost analyses apply when W corresponds to other fast transform operators, such as the FFT.

As mentioned above, direct implementations of products by R and R^T have O(kd) cost. However, in some cases, these products can be carried out with significantly lower cost. For example, in image deconvolution problems [25], R is a k × k (d = k) block-Toeplitz matrix with Toeplitz blocks (representing 2D convolutions) and these products can be performed in the discrete Fourier domain using the FFT, with O(k log k) cost, instead of the O(k²) cost of a direct implementation. If the blur kernel support is very small (say l pixels) these products can be done with even lower cost, O(kl), by implementing the corresponding convolution. Also, in certain applications of compressed sensing, such as MR image reconstruction [37], R is formed from a subset of the discrete Fourier transform basis, so the cost is O(k log k) using the FFT.

IV. EXPERIMENTS

This section describes some experiments testifying to the very good performance of the proposed algorithms in several types of problems of the form (1). These experiments include comparisons with state-of-the-art approaches, namely IST [16], [25], and the recent l1_ls package, which was shown in [35] to outperform all previous methods, including the ℓ1-magic toolbox and the homotopy method from [21]. The algorithms discussed in Section III are written in MATLAB and are freely available for download
from www.lx.it.pt/˜mtf/GPSR/. For the GPSR-BB algorithm, we set α_min = 10^{-30}, α_max = 10^{30}; the performance is insensitive to these choices, and similar results are obtained for other small settings of α_min and large values of α_max. We discuss results also for a nonmonotone version of the GPSR-BB algorithm, in which λ_k ≡ 1. In GPSR-Basic, we used β = 0.5 and µ = 0.1.

A. Compressed Sensing

In our first experiment we consider a typical compressed sensing (CS) scenario (similar to the one in [35]), where the goal is to reconstruct a length-n sparse signal (in the canonical basis) from k observations, where k < n. In this case, the k × n matrix A is obtained by first filling it with independent samples of a standard Gaussian distribution and then orthonormalizing the rows. In this example, n = 4096, k = 1024, the original signal x contains 160 randomly placed ±1 spikes, and the observation y is generated according to (2), with σ² = 10^{-4}. Parameter τ is chosen as suggested in [35]:

$$\tau = 0.1 \, \|A^T y\|_\infty; \qquad (22)$$

this value can be understood by noticing that for τ ≥ ‖A^T y‖_∞ the unique minimum of (1) is the zero vector [29], [35]. The original signal and the estimate obtained by solving (1) using the monotone version of the [...]

Fig. 2. The objective function plotted against iteration number and CPU time, for GPSR-Basic and the monotone and nonmonotone versions of GPSR-BB, corresponding to the experiment illustrated in Fig. 1.

In Fig. 2, we plot the evolution of the objective function (without debiasing) versus iteration number and CPU time, for GPSR-Basic and both versions of GPSR-BB. The GPSR-BB variants are slightly faster, but the performance of all three codes is quite similar on this problem. Fig. 3 shows how the objective function (1) and the MSE evolve in the debiasing phase. Notice that the objective function (1) increases during the debiasing phase, since we are minimizing a different function in this phase. (MSE = (1/n)‖x̂ − x‖₂², where x̂ is an estimate of x.)

Fig. 4. Evolution of the objective function and reconstruction MSE, vs. CPU time, including the debiasing phase, corresponding
to the experiment illustrated in Fig. 1.

B. Comparison with OMP

Next, we compare the computational efficiency of GPSR algorithms against OMP, often regarded as a highly efficient method that is especially well-suited to very sparse cases. We use two efficient MATLAB implementations of OMP: the greed_omp_qr function of the Sparsify toolbox (based on QR factorization [5], [17]; available at /˜tblumens/sparsify/sparsify.html), and the function SolveOMP included in the SparseLab toolbox (based on Cholesky factorization). Because greed_omp_qr requires each column of the matrix R to have unit norm, we use matrices with this property in all our comparisons.

Since OMP is not an optimization algorithm for minimizing (1) (or any other objective function), it is not obvious how to compare it with GPSR. In our experiments, we fix the matrix size (1024 × 4096) and consider a range of degrees of sparseness: the number m of non-zero spikes in x (randomly located values of ±1) ranges from 5 to 250. For each value of m we generate 10 random data sets, i.e., triplets (x, y, R). For each data set, we first run GPSR-BB (the monotone version) and store the final value of the residual; we then run greed_omp_qr and SolveOMP until they reach the same value of residual. Finally, we compute average MSE (with respect to the true x) and average CPU time, over the 10 runs.

Fig. 4 plots the reconstruction MSE and the CPU time required by GPSR-BB and OMP, as a function of the number of nonzero components in x. We observe that all methods basically obtain exact reconstructions for m up to almost 200, with the OMP solutions (which are equal up to some numerical fluctuations) starting to degrade earlier and faster than those produced by solving (1). Concerning computational efficiency, which is our main focus in this experiment, we can observe that GPSR-BB is clearly faster than both OMP implementations, except in the case of extreme sparseness (m < 50 non-zero elements in the 4096-vector x).

C. Scalability Assessment

To assess how the computational cost of the GPSR
algorithms grows with the size of the matrix A, we have performed an experiment similar to the one in [35, Section 5.3]. The idea is to assume that the computational cost is O(n^α) and obtain empirical estimates of the exponent α. We consider random sparse matrices (with the nonzero entries normally distributed) of dimensions (0.1n) × n, with n ranging from 10^4 to 10^6. Each matrix is generated with about 3n nonzero elements, and the original signal with n/4 randomly placed nonzero components. For each value of n, we generate 10 random matrices, original signals, and observed data according to (2), with noise variance σ² = 10^{-4}. For each data set (i.e., each pair A, y), τ is chosen as in (22). The results in Fig. 5 (which are averages over 10 data sets of each size) show that all GPSR algorithms have empirical exponents below 0.9, thus much better than l1_ls (for which we found α = 1.21, in agreement with the value 1.2 reported in [35]); IST has an exponent very close to that of the GPSR algorithms, but a worse constant, thus its computational complexity is approximately a constant factor above GPSR. Finally, notice that, according to [35], ℓ1-magic has an exponent close to 1.3, while all the other methods considered in that paper have exponents no less than 2.

D. Warm Starting

As mentioned in Section III-F, GPSR algorithms can benefit from being warm-started, i.e., initialized at a point close to the solution. This property can be exploited to find minima of (1) for a sequence of values of τ, at a modest multiple of the cost of solving only for one value of τ. We illustrate this possibility in a problem with k = 1024, n = 8192, which we wish to solve for 9 different values of τ, τ ∈ {0.05, 0.075, 0.1, ..., 0.275}·‖A^T y‖_∞.
Convex functions (1)
Figure 1: Examples of a convex set (a) and a non-convex set (b)
• All of R^n. It should be fairly obvious that given any x, y ∈ R^n, θx + (1 − θ)y ∈ R^n.

• The non-negative orthant, R^n_+. The non-negative orthant consists of all vectors in R^n whose elements are all non-negative: R^n_+ = {x : x_i ≥ 0 ∀i = 1, . . . , n}. To show that this is a convex set, simply note that given any x, y ∈ R^n_+ and 0 ≤ θ ≤ 1,

(θx + (1 − θ)y)_i = θx_i + (1 − θ)y_i ≥ 0 ∀i.

• Norm balls. Let ‖·‖ be some norm on R^n (e.g., the Euclidean norm, ‖x‖₂ = √(Σ_{i=1}^n x_i²)). Then the set {x : ‖x‖ ≤ 1} is a convex set. To see this, suppose x, y ∈ R^n, with ‖x‖ ≤ 1, ‖y‖ ≤ 1, and 0 ≤ θ ≤ 1. Then

‖θx + (1 − θ)y‖ ≤ ‖θx‖ + ‖(1 − θ)y‖ = θ‖x‖ + (1 − θ)‖y‖ ≤ 1

where we used the triangle inequality and the positive homogeneity of norms.

• Affine subspaces and polyhedra. Given a matrix A ∈ R^{m×n} and a vector b ∈ R^m, an affine subspace is the set {x ∈ R^n : Ax = b} (note that this could possibly be empty if b is not in the range of A). Similarly, a polyhedron is the (again, possibly empty) set {x ∈ R^n : Ax ⪯ b}, where '⪯' here denotes componentwise inequality (i.e., all the entries of Ax are less than or equal to their corresponding element in b). To prove convexity, first consider x, y ∈ R^n such that Ax = Ay = b. Then for 0 ≤ θ ≤ 1,

A(θx + (1 − θ)y) = θAx + (1 − θ)Ay = θb + (1 − θ)b = b.

Similarly, for x, y ∈ R^n that satisfy Ax ⪯ b and Ay ⪯ b and 0 ≤ θ ≤ 1,

A(θx + (1 − θ)y) = θAx + (1 − θ)Ay ⪯ θb + (1 − θ)b = b.
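These set-membership arguments can also be spot-checked numerically; a small illustrative sketch for the Euclidean norm ball (the random sampling is my own, not from the text):

```python
import numpy as np

# Empirical spot-check of norm-ball convexity: any convex combination of
# two points with ||x||_2 <= 1 stays inside the unit ball.
rng = np.random.default_rng(0)
inside = True
for _ in range(1000):
    x = rng.standard_normal(4)
    x /= max(np.linalg.norm(x), 1.0)       # scale into the ball if needed
    y = rng.standard_normal(4)
    y /= max(np.linalg.norm(y), 1.0)
    theta = rng.uniform()
    inside = inside and np.linalg.norm(theta * x + (1 - theta) * y) <= 1 + 1e-12
print(inside)  # True
```

Of course, random sampling only corroborates the proof above; it does not replace it.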
Proofs of the first- and second-order conditions for convexity
2  First- and second-order characterizations of convex functions
Theorem 2. Suppose f : R^n → R is twice differentiable over an open domain. Then, the following are equivalent: (i) f is convex. (ii) f(y) ≥ f(x) + ∇f(x)^T (y − x), for all x, y ∈ dom(f). (iii) ∇²f(x) ⪰ 0, for all x ∈ dom(f).
• The theorem simplifies many basic proofs in convex analysis, but it does not usually make verification of convexity much easier, as the condition needs to hold for all lines (and we have infinitely many).

• Many algorithms for convex optimization iteratively minimize the function over lines. The statement above ensures that each subproblem is also a convex optimization problem.
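Condition (iii) of the theorem can at least be spot-checked numerically by sampling points and testing the smallest eigenvalue of the Hessian; a sketch for the convex log-sum-exp function (the function choice and sampling scheme are illustrative assumptions):

```python
import numpy as np

def hessian_logsumexp(x):
    """Analytic Hessian of f(x) = log(sum(exp(x))): diag(p) - p p^T,
    where p is the softmax of x. This matrix is positive semidefinite."""
    p = np.exp(x - x.max())     # stabilized softmax
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
ok = all(
    np.linalg.eigvalsh(hessian_logsumexp(rng.standard_normal(3))).min() >= -1e-12
    for _ in range(100)
)
print(ok)  # True: Hessian is PSD at every sampled point, consistent with (iii)
```

As with any sampling check, this can refute convexity (one bad point suffices) but never prove it.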
1.4  Examples of multivariate convex functions
• Affine functions: f(x) = a^T x + b (for any a ∈ R^n, b ∈ R). They are convex, but not strictly convex; they are also concave: ∀λ ∈ [0, 1],

f(λx + (1 − λ)y) = a^T (λx + (1 − λ)y) + b = λa^T x + (1 − λ)a^T y + λb + (1 − λ)b = λf(x) + (1 − λ)f(y).

In fact, affine functions are the only functions that are both convex and concave.

• Some quadratic functions: f(x) = x^T Qx + c^T x + d.
– Convex if and only if Q ⪰ 0.
Convex optimization, 2017 lecture slides: Lecture4_convex_problems
Implicit Constraints

The standard form optimization problem has an implicit constraint:

$$x \in D = \bigcap_{i=0}^{m} \operatorname{dom} f_i \;\cap\; \bigcap_{i=1}^{p} \operatorname{dom} h_i$$

D is the domain of the problem. The constraints f_i(x) ≤ 0, h_i(x) = 0 are the explicit constraints. A problem is unconstrained if it has no explicit constraints. Example: minimize
Convex Optimization Problems
Spring 2016-2017
Yuanming Shi ShanghaiTech University
Outline
1 Optimization Problems
2 Convex Optimization
3 Quasi-Convex Optimization
4 Classes of Convex Problems: LP, QP, SOCP, SDP
5 Multicriterion Optimization (Pareto Optimality)
Global and Local Optimality
A feasible x is optimal if f0(x) = p⋆; Xopt is the set of optimal points. A feasible x is locally optimal if it is optimal within a ball, i.e., there is an R > 0 such that x is optimal for minimize (over z) f0(z) subject to the problem constraints and ‖z − x‖₂ ≤ R.
An introduction to convex optimization
Convex optimization is a subfield of mathematical optimization that studies the minimization of convex functions over convex sets.
Informally: drop a small ball into a smooth, bowl-shaped pit and it rolls to the lowest point; that is "convex optimization."
If instead the pit is bumpy, or contains smaller pits within it, the ball may get stuck on a rough patch rather than reaching the lowest point, or it may happen to land at the bottom; that is "non-convex optimization."
Many ideas and techniques from convex optimization extend to the rest of optimization and even to other disciplines; a common way to attack a non-convex problem is to transform it, to some degree, into a convex one and then apply the machinery of convex optimization.
Beyond mathematics, convex optimization has become a course that many students and researchers need to study in computer science, engineering, and even finance and economics.
Convex Optimization, textbook exercise solutions
Stephen Boyd
Lieven Vandenberghe
January 4, 2006
Chapter 2
Convex sets
Exercises
Definition of convexity
2.1 Let C ⊆ R^n be a convex set, with x1, . . . , xk ∈ C, and let θ1, . . . , θk ∈ R satisfy θi ≥ 0, θ1 + · · · + θk = 1. Show that θ1x1 + · · · + θkxk ∈ C. (The definition of convexity is that this holds for k = 2; you must show it for arbitrary k.) Hint. Use induction on k.
Solution. This is readily shown by induction from the definition of a convex set. We illustrate the idea for k = 3, leaving the general case to the reader. Suppose that x1, x2, x3 ∈ C, and θ1 + θ2 + θ3 = 1 with θ1, θ2, θ3 ≥ 0. We will show that y = θ1x1 + θ2x2 + θ3x3 ∈ C. At least one of the θi is not equal to one; without loss of generality we can assume that θ1 ≠ 1. Then we can write
y = θ1x1 + (1 − θ1)(µ2x2 + µ3x3)
where µ2 = θ2/(1 − θ1) and µ3 = θ3/(1 − θ1). Note that µ2, µ3 ≥ 0 and µ2 + µ3 = (θ2 + θ3)/(1 − θ1) = 1, so µ2x2 + µ3x3 ∈ C by convexity (the k = 2 case), and hence y ∈ C, again by convexity.
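The k-point convex combination of Exercise 2.1 can be illustrated numerically on the probability simplex, a convex set whose membership test is easy. The helper below is an illustrative sketch, not part of the solutions manual:

```python
import numpy as np

def convex_combination(points, thetas):
    """Form θ1 x1 + ... + θk xk, checking θi >= 0 and Σθi = 1.
    `points` holds x1, ..., xk as rows."""
    thetas = np.asarray(thetas, dtype=float)
    assert np.all(thetas >= 0) and abs(thetas.sum() - 1.0) < 1e-12
    return np.asarray(points, dtype=float).T @ thetas

# The probability simplex {x : x >= 0, sum(x) = 1} is convex, so any
# convex combination of its vertices e1, ..., en lies in it.
vertices = np.eye(4)
y = convex_combination(vertices, [0.1, 0.2, 0.3, 0.4])
assert np.all(y >= 0) and abs(y.sum() - 1.0) < 1e-12
print(y)  # [0.1 0.2 0.3 0.4]
```

With the standard basis vectors as the points, the combination reproduces the weights themselves, which makes the simplex a convenient sanity check.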
Robust Portfolio Selection Based on a Multi-stage Scenario Tree
Ruijun Shen
∗
Shuzhong Zhang
†
February 2006; revised November 2006
Abstract The aim of this paper is to apply the concept of robust optimization introduced by Ben-Tal and Nemirovski to portfolio selection problems based on multi-stage scenario trees. The objective of our portfolio selection is to maximize an expected utility function value (or equivalently, to minimize an expected disutility function value) as in a classical stochastic programming problem, except that we allow for ambiguities to exist in the probability distributions along the scenario tree. We show that such a problem can be formulated as a finite convex program in conic form, to which general convex optimization techniques can be applied. In particular, if there is no short-selling, the disutility function takes the form of semi-variance downside risk, and all the parameter ambiguity sets are ellipsoidal, then the problem becomes a second-order cone program, and is thus tractable. We use SeDuMi to solve the resulting robust portfolio selection problem, and the simulation results show that the robust consideration helps to reduce the variability of the optimal values caused by the parameter ambiguity.
QP-problems
where f ∈ C²(R^n) is a convex function. For numerical testing we use the easier quadratic form (1). Since f is convex, the Karush-Kuhn-Tucker conditions are necessary and sufficient for x being a global minimizer of (2). Friedlander et al. use the KKT conditions to construct an exact penalty function which defines a "primal-dual" box-constrained optimization problem with 2n + m variables. In the third section their trust-region SQP method for solving the penalty problem is described. For testing this combination of penalty function and trust-region SQP approach, a special problem generator is used. Numerical results are presented.
and an initial trust-region radius Δ0 ≥ Δmin.
Step 1: (upper bound of Bk)
Step 2: (compute initial step) Set the trust-region radius.
Convex optimization (凸优化)
Least-squares and linear programming problems can both be viewed as special cases of convex optimization problems, but unlike those two, solving a general convex optimization problem cannot yet be considered a mature technology.
There is usually no analytical formula for the solution of a convex optimization problem, but there are effective algorithms, the most prominent being interior-point methods.
If a practical problem can be formulated as a convex optimization problem, we may regard it as essentially solved.
However, recognizing a convex optimization problem is often much harder than recognizing a least-squares problem, and requires more skill.
There are also problems that are not convex, yet for which convex optimization still plays an important role in the solution process.
For example, relaxation methods and Lagrangian relaxation replace non-convex constraints with convex ones.
Convex optimization has several levels. Quadratic optimization sits at the bottom: it can be solved via systems of linear equations.
Newton's method is one level up: it handles unconstrained or equality-constrained problems, typically by reducing them to a sequence of quadratic problems.
Interior-point methods sit at the top level: they reduce inequality-constrained problems to a sequence of unconstrained or equality-constrained problems.
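The reduction just described — inequality constraints handled by solving a sequence of unconstrained subproblems — can be sketched with a toy log-barrier method. This is only an illustration under simplifying assumptions: it uses plain gradient descent with Armijo backtracking for the inner solves, whereas real interior-point codes use Newton steps, and all names here are illustrative:

```python
import numpy as np

def barrier_objective(f, x, A, b, t):
    slack = b - A @ x
    if np.any(slack <= 0):
        return np.inf  # infeasible: outside the barrier's domain
    return t * f(x) - np.sum(np.log(slack))

def barrier_method(f, f_grad, A, b, x0, mu=10.0, outer=6, inner=100):
    """Toy log-barrier sketch for: minimize f(x) s.t. Ax <= b.
    Solves a sequence of unconstrained subproblems
        minimize t*f(x) - sum_i log(b_i - a_i^T x)
    for increasing t, by gradient descent with backtracking."""
    x, t = x0.astype(float), 1.0
    for _ in range(outer):
        for _ in range(inner):
            slack = b - A @ x
            g = t * f_grad(x) + A.T @ (1.0 / slack)  # gradient of barrier objective
            phi = barrier_objective(f, x, A, b, t)
            step = 1.0
            # backtrack until the step is feasible and sufficiently decreasing
            while barrier_objective(f, x - step * g, A, b, t) > phi - 0.25 * step * (g @ g):
                step *= 0.5
                if step < 1e-12:
                    break
            x = x - step * g
        t *= mu  # tighten the barrier
    return x

# minimize x1 + x2 subject to 0 <= x_i <= 1 (optimum at the origin)
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
x_star = barrier_method(lambda x: x.sum(), lambda x: np.ones(2), A, b,
                        x0=np.array([0.5, 0.5]))
print(x_star)  # approaches [0, 0] as t grows
```

Each outer pass multiplies the weight t on the true objective, so the iterates trace the "central path" toward the constrained optimum while staying strictly feasible.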
Methods for solving convex optimization problems
A convex optimization solution method uses tools from mathematical analysis, together with algorithms, to solve a convex optimization problem.
A convex optimization problem is defined as making an objective function attain its optimal value subject to given conditions.
Such methods are widely used across the engineering sciences and mechanical engineering, especially in design optimization.
The main methods include the following:
1. Least squares (Least Squares Method): a commonly used approach that fits parameters by minimizing squared error, yielding optimal parameter estimates and hence the optimal objective value.
2. Gradient descent (Gradient Descent): an iterative method that repeatedly moves the iterate along the negative gradient of the objective, searching toward the optimum until the solution is reached.
3. Random projection (Random Projection): a fast method that applies random projections to the objective in order to search toward the optimum.
4. Newton's method (Newton's Method): an iterative method grounded in numerical analysis that successively approximates, and ultimately reaches, the optimum.
5. Lagrangian methods (Lagrange-Algorithm): approaches based on algorithm design that combine several algorithms to solve convex optimization problems.
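Items 2 and 4 of the list above can be compared on a strongly convex quadratic, where Newton's method reaches the minimizer in a single step while gradient descent converges geometrically. A minimal sketch (all names illustrative):

```python
import numpy as np

# f(x) = 1/2 x^T Q x - b^T x with Q ≻ 0: gradient Qx - b, Hessian Q,
# unique minimizer x* = Q^{-1} b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(Q, b)

def gradient_descent(x, step=0.2, iters=100):
    # fixed step < 2/λ_max(Q) guarantees convergence on a quadratic
    for _ in range(iters):
        x = x - step * (Q @ x - b)
    return x

def newton(x, iters=5):
    # the Newton step solves the local quadratic model exactly, so on a
    # quadratic objective one iteration already lands on the minimizer
    for _ in range(iters):
        x = x - np.linalg.solve(Q, Q @ x - b)
    return x

x0 = np.array([5.0, -3.0])
print(np.allclose(gradient_descent(x0), x_star, atol=1e-6))  # True
print(np.allclose(newton(x0), x_star))                        # True
```

The trade-off is the usual one: each Newton iteration requires solving a linear system, while each gradient step costs only a matrix-vector product.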
The word root "convex"

Convexity: Exploring the Concepts and Applications

Introduction: The term "convex" is derived from the Latin word "convexus," which means "rounded or arched." This article aims to explore the concept of convexity and its applications in various fields such as mathematics, physics, economics, and computer science. Convexity is a fundamental concept that describes specific characteristics and properties of objects, functions, and spaces. By understanding convexity, we can gain insights into optimization problems, geometric analysis, and resource allocation.

What is convexity? Convexity refers to the property of being curved or rounded outward, like a convex lens. It is essentially a geometrical concept that defines the shape of a curve or a surface. In mathematics, convexity describes a set or a function for which any line segment connecting two points in the set lies entirely within the set. This implies that the set does not contain any indentations or holes, and its boundaries are always curved outward.

Properties of convex sets: A convex set has several defining properties that are crucial in understanding its characteristics. Firstly, any line segment joining two points within the set lies entirely within the set; this is known as the segment property. Secondly, if a set contains two points, then it also contains every point lying on the line segment joining these two points; this is called the line property. Thirdly, the intersection of two convex sets is also convex. Lastly, the convex combination property states that any convex combination of two points lies within the set.

Applications of convexity:

1. Optimization: Convexity plays a vital role in optimization problems, where the objective is to find the best possible solution given certain constraints. Convex optimization problems have well-defined properties, making them easier to solve.
The convexity of the objective function ensures that the solution obtained is not merely a local optimum but a global optimum. This significantly simplifies the optimization process and allows efficient algorithms to be developed. Convex optimization finds applications in finance, engineering, machine learning, and many other fields.

2. Geometry and spatial analysis: Convexity has extensive applications in geometry and spatial analysis. Convex polyhedra, such as cubes and pyramids, play a crucial role in solid geometry. These shapes have well-defined edges, vertices, and faces, making them easier to analyze and manipulate. Convex hulls, which are the smallest convex sets containing a given set of points, are employed in computer graphics, computational geometry, and image processing. Convexity also allows for the definition of convex curves and surfaces, which are used in various contexts, such as designing aerodynamic shapes, analyzing biological structures, and constructing mathematical models.

3. Economics and game theory: Convexity finds applications in the study of economics and game theory. Convexity is often assumed in the utility functions of consumers and the production functions of firms. This assumption helps simplify economic models and explain the behavior of consumers and producers. Convexity ensures that indifference curves are convex, meaning that individuals have diminishing marginal rates of substitution. Convexity also plays a significant role in game theory, particularly in the study of convex games, where players' payoffs are determined by their own decisions and a convex combination of other players' decisions.

4. Resource allocation and fair division: Convexity is essential in the field of resource allocation and fair division, where it ensures that fair allocations can be obtained efficiently and without envy.
Fair division problems arise in various real-world scenarios, such as dividing goods among individuals, allocating resources to projects, or distributing tasks among workers. Convexity allows for fair-division algorithms that guarantee an equitable distribution without complaints arising from the participants.

Conclusion: Convexity is a powerful concept with broad applications in mathematics, physics, economics, and computer science. It represents essential properties of sets, functions, and spaces, allowing for efficient optimization, geometric analysis, and resource allocation. Whether the task is finding global optima, analyzing geometric shapes, modeling economic behavior, or ensuring fair division, convexity remains a fundamental and indispensable concept.
Convex optimization problems
4–7
Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and y is optimal with f0(y) < f0(x); x locally optimal means there is an R > 0 such that

z feasible, ‖z − x‖₂ ≤ R =⇒ f0(z) ≥ f0(x)
• p⋆ = 0 if constraints are feasible; any feasible x is optimal • p⋆ = ∞ if constraints are infeasible
Convex optimization problem
• f0 : Rn → R is the objective or cost function
• p⋆ = ∞ if problem is infeasible (no x satisfies the constraints)
Optimal and locally optimal points
Equivalent convex problems
two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice versa.

some common transformations that preserve convexity:

• eliminating equality constraints

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

is equivalent to

minimize (over z) f0(F z + x0)
subject to fi(F z + x0) ≤ 0, i = 1, . . . , m

where F and x0 are such that Ax = b ⇐⇒ x = F z + x0 for some z
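One concrete way to obtain the F and x0 used in this transformation is a least-squares particular solution plus an SVD null-space basis. A sketch (the helper name `affine_parametrization` is illustrative, not from the slides):

```python
import numpy as np

def affine_parametrization(A, b, tol=1e-10):
    """Return (F, x0) with {x : Ax = b} = {F z + x0 : z}.
    x0 is a particular (least-squares) solution; the columns of F
    span the null space of A, read off from the SVD."""
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    F = Vt[rank:].T  # rows of Vt beyond the rank span null(A)
    return F, x0

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
F, x0 = affine_parametrization(A, b)

# Any z gives a feasible x, so minimizing f0(F z + x0) over z is the
# equality-constraint-free equivalent problem.
z = np.array([3.7])
x = F @ z + x0
print(np.allclose(A @ x, b))  # True
```

This assumes b is in the range of A (otherwise the least-squares x0 does not satisfy Ax0 = b and the affine set is empty).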
which contradicts our assumption that x is locally optimal
Optimality criterion for differentiable f0
x is optimal if and only if it is feasible and ∇f0(x)T (y − x) ≥ 0 for all feasible y
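This criterion can be verified numerically on a simple example. Below, projecting a point onto the unit box minimizes the squared distance to it, and the inequality ∇f0(x)ᵀ(y − x) ≥ 0 is checked on sampled feasible points (an illustrative sketch, not from the slides):

```python
import numpy as np

# minimize ||x - c||^2 over the box 0 <= x_i <= 1, with c outside the box.
# The optimum is the projection of c onto the box; at that point
# ∇f0(x)^T (y - x) >= 0 must hold for every feasible y.
c = np.array([2.0, -0.5])
x_opt = np.clip(c, 0.0, 1.0)   # projection onto the box: [1.0, 0.0]
grad = 2.0 * (x_opt - c)       # ∇f0(x) = 2(x - c)

rng = np.random.default_rng(1)
ys = rng.uniform(0.0, 1.0, size=(1000, 2))          # random feasible points
print(bool(np.all((ys - x_opt) @ grad >= -1e-12)))  # True
```

Geometrically, the criterion says −∇f0(x) defines a supporting hyperplane of the feasible set at x: no feasible direction is a descent direction.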
• f0(x) = x³ − 3x: p⋆ = −∞, local optimum at x = 1
• f0(x) = x log x, dom f0 = R++: p⋆ = −1/e, x = 1/e is optimal
can be considered a special case of the general problem with f0(x) = 0:

minimize 0
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p
• fi : R^n → R, i = 1, . . . , m, are the inequality constraint functions
• hi : R^n → R, i = 1, . . . , p, are the equality constraint functions

optimal value: p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p}
• p⋆ = −∞ if problem is unbounded below
• introducing equality constraints

minimize f0(A0x + b0)
subject to fi(Aix + bi) ≤ 0, i = 1, . . . , m

is equivalent to

minimize (over x, yi) f0(y0)
subject to fi(yi) ≤ 0, i = 1, . . . , m
           yi = Aix + bi, i = 0, 1, . . . , m

• introducing slack variables for linear inequalities

minimize f0(x)
subject to aiᵀx ≤ bi, i = 1, . . . , m

is equivalent to

minimize (over x, s) f0(x)
subject to aiᵀx + si = bi, i = 1, . . . , m
           si ≥ 0, i = 1, . . . , m
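The slack-variable transformation is purely mechanical; a sketch of the bookkeeping (the helper name `add_slacks` is illustrative):

```python
import numpy as np

def add_slacks(A, b):
    """Rewrite Ax <= b as the equality system A x + s = b, s >= 0,
    in the stacked variable (x, s): returns [A  I] and b."""
    m, n = A.shape
    A_eq = np.hstack([A, np.eye(m)])
    return A_eq, b

A = np.array([[1.0, 2.0], [-1.0, 0.0]])
b = np.array([4.0, 0.0])
A_eq, b_eq = add_slacks(A, b)

x = np.array([1.0, 1.0])   # satisfies Ax <= b
s = b - A @ x              # the slacks, nonnegative for a feasible x
print(np.allclose(A_eq @ np.concatenate([x, s]), b_eq), bool(np.all(s >= 0)))
```

The point of the transformation is that a generic inequality becomes a linear equality plus the simple sign constraint s ⪰ 0, which many solvers handle natively.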
• minimization over nonnegative orthant

minimize f0(x)
subject to x ⪰ 0

x is optimal if and only if x ∈ dom f0, x ⪰ 0, and
∇f0(x)i ≥ 0 if xi = 0
∇f0(x)i = 0 if xi > 0
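The componentwise conditions above translate directly into a checker; a sketch with illustrative names, testing the conditions only up to a numerical tolerance:

```python
import numpy as np

def nonneg_orthant_optimal(grad, x, tol=1e-9):
    """Check the optimality condition for: minimize f0(x) s.t. x >= 0.
    Requires x feasible, grad_i >= 0 wherever x_i = 0, and grad_i = 0
    wherever x_i > 0."""
    g = grad(x)
    feasible = np.all(x >= -tol)
    at_bound = np.abs(x) <= tol
    return bool(feasible
                and np.all(g[at_bound] >= -tol)
                and np.all(np.abs(g[~at_bound]) <= tol))

# f0(x) = ||x - c||^2 with c = (1, -1): the optimum over x >= 0 is (1, 0).
c = np.array([1.0, -1.0])
grad = lambda x: 2.0 * (x - c)
print(nonneg_orthant_optimal(grad, np.array([1.0, 0.0])))  # True
print(nonneg_orthant_optimal(grad, np.array([1.0, 1.0])))  # False
```

At the optimum (1, 0) the free coordinate has zero gradient while the coordinate pinned at the boundary has nonnegative gradient, exactly as the slide's condition requires.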
minimize f0(x) = −∑_{i=1}^{k} log(bi − aiᵀx)

is an unconstrained problem with implicit constraints aiᵀx < bi, i = 1, . . . , k
Feasibility problem
find x
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p
Optimization problem in standard form
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

• x ∈ R^n is the optimization variable
x is feasible if x ∈ dom f0 and it satisfies the constraints; a feasible x is optimal if f0(x) = p⋆; Xopt is the set of optimal points.

x is locally optimal if there is an R > 0 such that x is optimal for

minimize (over z) f0(z)
subject to fi(z) ≤ 0, i = 1, . . . , m
           ‖z − x‖₂ ≤ R

examples (with n = 1, m = p = 0):
• f0(x) = 1/x, dom f0 = R++: p⋆ = 0, no optimal point
Convex Optimization — Boyd & Vandenberghe
4. Convex optimization problems
• optimization problem in standard form
• convex optimization problems
• quasiconvex optimization
• linear optimization
• quadratic optimization
• geometric programming
• generalized inequality constraints
• semidefinite programming
• vector optimization
consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖₂):
• ‖y − x‖₂ > R, so 0 < θ < 1/2
• z is a convex combination of two feasible points, hence also feasible
• ‖z − x‖₂ = R/2 and f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x), contradicting the local optimality of x
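The local-equals-global fact can also be observed numerically: gradient descent on a convex objective reaches the same optimal value from any starting point, since there are no spurious local minima. An illustrative sketch (names and the specific objective are assumptions for the demo):

```python
import numpy as np

# f(x1, x2) = log(e^{x1} + e^{-x1}) + (x2 - 1)^2 is convex with
# minimizer (0, 1) and optimal value p* = log 2.
f = lambda x: np.logaddexp(x[0], -x[0]) + (x[1] - 1.0) ** 2
grad = lambda x: np.array([np.tanh(x[0]), 2.0 * (x[1] - 1.0)])

def descend(x, step=0.1, iters=2000):
    """Plain gradient descent; returns the final objective value."""
    for _ in range(iters):
        x = x - step * grad(x)
    return f(x)

rng = np.random.default_rng(2)
values = [descend(rng.uniform(-5, 5, size=2)) for _ in range(10)]
print(np.allclose(values, np.log(2.0), atol=1e-4))  # every run reaches p* = log 2
```

For a non-convex objective the same experiment would typically scatter across several distinct limiting values, one per basin of attraction.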