Numerical Optimization

Numerical Optimization: Introduction and gradient-based methods

Master 2 Recherche LRI "Apprentissage Statistique et Optimisation"

Anne Auger
Inria Saclay-Ile-de-France

November 2011
http://tao.lri.fr/tiki-index.php?page=Courses

Outline:
1. Numerical optimization
   - A few examples: data fitting and regression, machine learning, black-box optimization
   - Different notions of optimum
   - Typical difficulties in optimization
   - Deterministic vs stochastic, local versus global methods
2. Mathematical tools and optimality conditions
   - First order differentiability and gradients
   - Second order differentiability and Hessian
   - Optimality conditions for unconstrained optimization
3. Gradient based optimization algorithms
   - Root finding methods (1-D optimization)
   - Relaxation algorithm
   - Descent methods: gradient descent, Newton descent, BFGS
   - Trust regions methods
Numerical optimization
A few examples

Optimization and machine learning

(Simple) Linear regression: given a set of data (examples) {y_i, x_{i1}, ..., x_{ip}}_{i=1,...,N}, and writing X_i = (x_{i1}, ..., x_{ip})^T, solve

    \min_{w \in \mathbb{R}^p,\, \beta \in \mathbb{R}} \sum_{i=1}^{N} | w^T X_i + \beta - y_i |^2 = \| \tilde{X} w - y \|^2

where \tilde{X} is the data matrix augmented with a column of ones. This is the same as data fitting with a linear model, i.e. f_\theta(x) = w^T x + \beta with \theta = (w, \beta).
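A minimal NumPy sketch (synthetic data; the names are illustrative, not from the slides): the intercept \beta is absorbed into \tilde{X} as a column of ones, and the least-squares problem is solved directly.

    import numpy as np

    # Synthetic data: N = 50 examples with p = 3 features (illustrative only).
    rng = np.random.default_rng(0)
    N, p = 50, 3
    X = rng.normal(size=(N, p))
    w_true, beta_true = np.array([1.5, -2.0, 0.5]), 0.7
    y = X @ w_true + beta_true + 0.1 * rng.normal(size=N)

    # Append a column of ones so the intercept is fitted jointly:
    # the problem becomes min_w ||X_tilde w - y||^2 with w = (w_1, ..., w_p, beta).
    X_tilde = np.hstack([X, np.ones((N, 1))])
    w_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
    print(w_hat)  # close to [1.5, -2.0, 0.5, 0.7]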
Black-box optimization

In the black-box setting the algorithm can only query objective function values f(x); gradients or any analytical structure of f are not available.

Example: optimization of well placement.
Data fitting - Data calibration

Objective: given a sequence of data points (x_i, y_i) ∈ R^p × R, i = 1, ..., N, find a model "y = f(x)" that explains the data, e.g. experimental measurements in biology, chemistry, ...

In general one chooses a parametric model, i.e. a family of functions (f_\theta)_{\theta \in \mathbb{R}^n} (linear, quadratic, ...), either using expertise to choose the model or because only simple models are affordable. One then tries to find the parameter \theta \in \mathbb{R}^n fitting best to the data by minimizing the quadratic error:

    \min_{\theta \in \mathbb{R}^n} \sum_{i=1}^{N} | f_\theta(x_i) - y_i |^2
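A minimal sketch of such a fit (the model family and the data are illustrative assumptions, not from the slides), using SciPy's general least-squares solver:

    import numpy as np
    from scipy.optimize import least_squares

    # Hypothetical parametric model f_theta: an exponential decay a * exp(b * x).
    def f_theta(theta, x):
        a, b = theta
        return a * np.exp(b * x)

    # Synthetic data standing in for experimental measurements.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 2.0, 30)
    y = 2.0 * np.exp(-1.3 * x) + 0.05 * rng.normal(size=30)

    # least_squares minimizes the sum of squared residuals f_theta(x_i) - y_i.
    res = least_squares(lambda th: f_theta(th, x) - y, x0=np.array([1.0, -1.0]))
    print(res.x)  # close to [2.0, -1.3]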
Numerical or continuous optimization

Unconstrained optimization: optimize a function whose parameters are "continuous", i.e. live in R^n:

    \min_{x \in \mathbb{R}^n} f(x)

n: the dimension of the problem, which corresponds to the dimension of the Euclidean vector space R^n.

Maximization vs minimization: Maximize f = Minimize (−f), so it suffices to consider minimization.
Different notions of optimum

Local versus global minimum:
- local minimum x*: f(x) ≥ f(x*) for all x in a neighborhood of x*
- global minimum x*: f(x) ≥ f(x*) for all x

Essential infimum: given a measure µ on R^n, the essential infimum of f is defined as

    \operatorname{ess\,inf} f = \sup \{ b \in \mathbb{R} : \mu(\{ x : f(x) < b \}) = 0 \}

important to keep in mind in the context of stochastic optimization algorithms. For instance, for the Lebesgue measure on R, the function with f(0) = 0 and f(x) = 1 otherwise has inf f = 0 but ess inf f = 1: values taken on a set of measure zero do not count.
Analytical functions

Convex quadratic function:

    f(x) = \frac{1}{2} x^T A x + b^T x + c                        (1)
         = \frac{1}{2} (x - x_0)^T A (x - x_0) + c'               (2)

where A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n, c ∈ R, and c' is the constant produced by completing the square.

Exercise: express x_0 in (2) as a function of A and b; express the minimum of f. For n = 2, plot the level sets of a convex quadratic function, where level sets are defined as the sets L_c = {x ∈ R^n | f(x) = c}.
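A minimal sketch for the plotting part of the exercise (A, b, c are arbitrary choices; any symmetric positive definite A works):

    import numpy as np
    import matplotlib.pyplot as plt

    # Arbitrary symmetric positive definite A, plus b and c (illustrative).
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([-1.0, 0.5])
    c = 0.0

    def f(x1, x2):
        # Evaluate (1/2) x^T A x + b^T x + c on a grid of points.
        x = np.stack([x1, x2], axis=-1)
        return 0.5 * np.einsum('...i,ij,...j', x, A, x) + x @ b + c

    g = np.linspace(-3.0, 3.0, 200)
    X1, X2 = np.meshgrid(g, g)
    cs = plt.contour(X1, X2, f(X1, X2), levels=15)  # the level sets L_c
    plt.clabel(cs, inline=True, fontsize=8)
    plt.gca().set_aspect('equal')
    plt.show()

For a positive definite A the level sets are concentric ellipses centered at the minimizer; their elongation reflects the conditioning of A.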
Optimization and Machine Learning (cont.)

Clustering: k-means. Given a set of observations (x_1, ..., x_p), where each observation is an n-dimensional real vector, k-means clustering aims to partition the p observations into k sets (k ≤ p), S = {S_1, S_2, ..., S_k}, so as to minimize the within-cluster sum of squares:

    \min_{\mu_1, \ldots, \mu_k} \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2

where the minimization runs jointly over the partition S and the cluster centers µ_1, ..., µ_k (at the optimum, each µ_i is the mean of the points in S_i).
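The slides do not prescribe an algorithm here; below is a minimal sketch of Lloyd's algorithm, the standard alternating heuristic for this objective (it assumes no cluster ever becomes empty):

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Lloyd's algorithm: alternate nearest-center assignments and mean updates."""
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
        for _ in range(iters):
            # Assignment step: each point joins the cluster of its nearest mean.
            dist = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=-1)
            labels = dist.argmin(axis=1)
            # Update step: each mean becomes the centroid of its cluster
            # (assumes every cluster is non-empty).
            new_mu = np.array([X[labels == i].mean(axis=0) for i in range(k)])
            if np.allclose(new_mu, mu):
                break
            mu = new_mu
        return mu, labels

Each sweep can only decrease the within-cluster sum of squares, so the iteration converges, generally to a local rather than a global minimum.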
Typical difficulties in optimization

What Makes a Function Difficult to Solve?

- ruggedness: non-smooth, discontinuous, multimodal, and/or noisy function
- dimensionality: (considerably) larger than three
- non-separability: dependencies between the objective variables
- ill-conditioning: a narrow ridge

[Figure: a rugged one-dimensional function, cut from a 3-D example, solvable with an evolution strategy]
Typical difficulties in optimization

Curse of Dimensionality

The term curse of dimensionality (Richard Bellman) refers to the problems caused by the rapid increase in volume associated with adding extra dimensions to a (mathematical) space.
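A quick Monte Carlo illustration of this volume effect (a sketch, not from the slides): the fraction of a hypercube occupied by its inscribed ball collapses as the dimension grows, so uniformly sampled points are almost never near the center.

    import numpy as np

    # Estimate the fraction of the cube [-1, 1]^n covered by the unit ball.
    rng = np.random.default_rng(2)
    for n in (2, 5, 10, 20):
        pts = rng.uniform(-1.0, 1.0, size=(100_000, n))
        inside = np.linalg.norm(pts, axis=1) <= 1.0
        print(n, inside.mean())
    # Typical output: ~0.785, ~0.16, ~0.0025, ~0.0; the exact fraction is
    # pi^(n/2) / (Gamma(n/2 + 1) * 2^n), which vanishes rapidly with n.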