Numerical Optimization
Numerical Optimization: Introduction and gradient-based methods
Master 2 Recherche LRI Apprentissage Statistique et Optimisation
Anne Auger
Inria Saclay-Ile-de-France
Essential infimum of a function: given a measure $\mu$ on $\mathbb{R}^n$, the essential infimum of $f$ is defined as $\operatorname{ess\,inf} f = \sup\{b \in \mathbb{R} : \mu(\{x : f(x) < b\}) = 0\}$
important to keep in mind in the context of stochastic optimization algorithms
Numerical Optimization I
November 2011
Numerical optimization
A few examples
Optimization and machine learning
(Simple) linear regression: given a set of data (examples) $\{y_i, x_{i1}, \dots, x_{ip}\}_{i=1,\dots,N}$
Black-box optimization
Optimization of well placement
Data fitting - Data calibration
k-means objective (within-cluster sum of squares):
$$\min_{\mu_1, \dots, \mu_k} \sum_{i=1}^{k} \sum_{x_j \in S_i} \|x_j - \mu_i\|^2$$
$n$: dimension of the problem, corresponding to the dimension of the Euclidean vector space $\mathbb{R}^n$ in which $x$ lives
Maximization vs minimization: maximize $f$ = minimize $-f$
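$$\min_{w \in \mathbb{R}^p,\ \beta \in \mathbb{R}} \underbrace{\sum_{i=1}^{N} |w^T X_i + \beta - y_i|^2}_{\|\tilde{X} w - y\|^2}$$
where $X_i^T = (x_{i1}, \dots, x_{ip})$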
Same as data fitting with a linear model, i.e. $f_{(w,\beta)}(x) = w^T x + \beta$, with $\theta = (w, \beta) \in \mathbb{R}^{p+1}$
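The least-squares problem above has a closed-form solution. As a minimal sketch, the special case of a single input variable ($p = 1$) can be written in plain Python; the function name `fit_simple_linear_regression` and the sample data are illustrative assumptions, not part of the slides.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w, beta) minimizing sum_i (w * x_i + beta - y_i)^2 for scalar x_i."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form least-squares solution for one input variable (p = 1).
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx
    beta = y_mean - w * x_mean
    return w, beta

# Illustrative noiseless data generated from y = 2x + 1 (assumed values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, beta = fit_simple_linear_regression(xs, ys)
```

On noiseless data the fitted `w` and `beta` coincide with the generating slope and intercept; for $p > 1$ one would typically solve the same problem with a linear algebra library.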
Numerical or continuous optimization
Unconstrained optimization
Optimize a function whose parameters are “continuous” (live in $\mathbb{R}^n$): $\min_{x \in \mathbb{R}^n} f(x)$
Different notions of optimum
local versus global minimum
local minimum $x^\star$: for all $x$ in a neighborhood of $x^\star$, $f(x) \ge f(x^\star)$
global minimum $x^\star$: for all $x$, $f(x) \ge f(x^\star)$
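To make the distinction concrete, here is a small sketch that locates local minima and the global minimum of a one-dimensional function on a grid. The function $f(x) = x^4 - 3x^2 + x$, the interval and the grid resolution are assumptions chosen for illustration only.

```python
def f(x):
    # Illustrative multimodal function (an assumption, not from the slides).
    return x**4 - 3 * x**2 + x

# Evaluate f on a uniform grid over [-2, 2].
xs = [-2 + 4 * i / 400 for i in range(401)]
ys = [f(x) for x in xs]

# Interior grid points lower than both neighbours approximate local minima.
local_minima = [xs[i] for i in range(1, 400)
                if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]

# The lowest grid point approximates the global minimum.
global_minimum = min(xs, key=f)
```

The grid search finds two local minima; only the one with the lower function value is also the global minimum.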
2 Mathematical tools and optimality conditions
First order differentiability and gradients
Second order differentiability and Hessian
Optimality conditions for unconstrained optimization
3 Gradient based optimization algorithms
Root finding methods (1-D optimization)
Relaxation algorithm
Descent methods
Analytical functions
Convex quadratic function:
$$f(x) = \tfrac{1}{2} x^T A x + b^T x + c \tag{1}$$
$$\phantom{f(x)} = \tfrac{1}{2} (x - x_0)^T A (x - x_0) + c \tag{2}$$
where $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$.
Exercise: Express $x_0$ in (2) as a function of $A$ and $b$. Express the minimum of $f$. For $n = 2$, plot the level sets of a convex-quadratic function, where level sets are defined as the sets $L_c = \{x \in \mathbb{R}^n \mid f(x) = c\}$.
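A useful starting point for the exercise is the gradient of form (1), which for symmetric $A$ is $\nabla f(x) = Ax + b$. The sketch below cross-checks this formula against central finite differences for $n = 2$; the matrix, vectors and helper names are illustrative assumptions.

```python
def quadratic(A, b, c, x):
    """f(x) = 0.5 * x^T A x + b^T x + c for n = 2, with A given as nested lists."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) + b[0] * x[0] + b[1] * x[1] + c

def gradient(A, b, x):
    """Analytical gradient A x + b (valid because A is symmetric)."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1]]

def finite_difference_gradient(A, b, c, x, h=1e-6):
    """Central finite differences, used only to cross-check the formula."""
    grad = []
    for i in range(2):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((quadratic(A, b, c, xp) - quadratic(A, b, c, xm)) / (2 * h))
    return grad

# Illustrative values (assumptions): A is symmetric positive definite.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [-1.0, 4.0]
c = 0.5
x = [0.7, -1.2]
g_exact = gradient(A, b, x)
g_fd = finite_difference_gradient(A, b, c, x)
```

The two gradients agree up to rounding error, since central differences are exact for quadratic functions apart from floating-point effects.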
Objective: given a sequence of data points $(x_i, y_i) \in \mathbb{R}^p \times \mathbb{R}$, $i = 1, \dots, N$, find a model “$y = f(x)$” that explains the data
experimental measurements in biology, chemistry, ...
Data fitting, regression
In machine learning
Black-box optimization
Different notions of optimum
Typical difficulties in optimization
Deterministic vs stochastic - local versus global methods
Optimization and Machine Learning (cont.)
Clustering
k-means: given a set of observations $(x_1, \dots, x_p)$, where each observation is an $n$-dimensional real vector, k-means clustering aims to partition the $p$ observations into $k$ sets ($k \le p$), $S = \{S_1, S_2, \dots, S_k\}$, so as to minimize the within-cluster sum of squares.
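As a rough sketch, the within-cluster sum of squares can be evaluated for any candidate partition. The code below assumes that each $\mu_i$ is taken as the mean of the points in $S_i$ (the optimal choice for a fixed partition); the helper name and the sample points are illustrative assumptions.

```python
def within_cluster_sum_of_squares(clusters):
    """Sum over clusters of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # mu_i: component-wise mean of the points in cluster S_i.
        mu = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(sum((p[d] - mu[d]) ** 2 for d in range(dim)) for p in points)
    return total

# Two candidate partitions of the same four 2-D observations (illustrative data).
good = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
bad = [[(0.0, 0.0), (5.0, 5.0)], [(0.0, 1.0), (5.0, 6.0)]]
wcss_good = within_cluster_sum_of_squares(good)
wcss_bad = within_cluster_sum_of_squares(bad)
```

The partition that groups nearby points yields a much smaller objective value, which is what k-means clustering searches for.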
Gradient descent, Newton descent, BFGS
Trust region methods
Typical difficulties in optimization
What Makes a Function Difficult to Solve?
Figure: cut from a 3-D example, solvable with an evolution strategy
non-separability
dependencies between the objective variables
ill-conditioning
a narrow ridge
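The effect of ill-conditioning can be sketched on the quadratic $f(x) = \tfrac{1}{2}(x_1^2 + \kappa x_2^2)$, whose level sets become narrow ellipses as $\kappa$ grows. The example below counts iterations of gradient descent with a fixed step $1/\kappa$; the test function, step size, starting point and tolerance are all assumptions made for illustration.

```python
import math

def gradient_descent_iterations(kappa, tol=1e-6, max_iter=100_000):
    """Count fixed-step gradient descent iterations on f(x) = 0.5*(x1^2 + kappa*x2^2)."""
    x1, x2 = 1.0, 1.0
    step = 1.0 / kappa  # 1 / (largest eigenvalue of the Hessian)
    for k in range(max_iter):
        if math.hypot(x1, x2) < tol:
            return k
        # The gradient of f is (x1, kappa * x2).
        x1, x2 = x1 - step * x1, x2 - step * kappa * x2
    return max_iter

iters_well = gradient_descent_iterations(kappa=1.0)    # condition number 1
iters_ill = gradient_descent_iterations(kappa=100.0)   # condition number 100
```

With condition number 1 the method converges in a single step, while with condition number 100 it needs more than a thousand iterations to reach the same tolerance.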
Data fitting: in general, choice of a parametric model or family of functions $(f_\theta)_{\theta \in \mathbb{R}^n}$
use of expertise for choosing the model, or only simple models affordable (linear, quadratic)
http://tao.lri.fr/tiki-index.php?page=Courses
ruggedness
non-smooth, discontinuous, multimodal, and/or noisy function
dimensionality
(considerably) larger than three
Try to find the parameter $\theta \in \mathbb{R}^n$ fitting best to the data
Minimize the quadratic error:
$$\min_{\theta \in \mathbb{R}^n} \sum_{i=1}^{N} |f_\theta(x_i) - y_i|^2$$
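The quadratic error can be written as an ordinary function of $\theta$ that an optimizer would then minimize. The sketch below assumes a quadratic model family $f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ and synthetic noiseless data; both, along with the function names, are illustrative assumptions.

```python
def quadratic_error(model, theta, data):
    """Sum over data points of |f_theta(x_i) - y_i|^2."""
    return sum((model(theta, x) - y) ** 2 for x, y in data)

def quadratic_model(theta, x):
    # f_theta(x) = theta_0 + theta_1 * x + theta_2 * x^2 (a simple illustrative family).
    return theta[0] + theta[1] * x + theta[2] * x ** 2

# Synthetic data generated from known parameters (assumed values).
theta_true = (1.0, -2.0, 0.5)
data = [(x, quadratic_model(theta_true, x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

error_at_truth = quadratic_error(quadratic_model, theta_true, data)
error_perturbed = quadratic_error(quadratic_model, (1.0, -2.0, 0.6), data)
```

The error vanishes at the generating parameters and grows as soon as one of them is perturbed.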
Curse of Dimensionality
The term Curse of dimensionality (Richard Bellman) refers to problems caused by the rapid increase in volume associated with adding extra dimensions to a (mathematical) space.
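One common way to illustrate this growth, given here as an assumption rather than as the example used in the course, is to count the points of a regular grid with a fixed number of points per axis.

```python
def grid_size(points_per_axis, dimension):
    """Number of points in a regular grid with the given resolution per axis."""
    return points_per_axis ** dimension

# Grid sizes for 10 points per axis in increasing dimension.
sizes = {n: grid_size(10, n) for n in (1, 2, 3, 10, 20)}
```

A resolution that needs 1000 evaluations in dimension 3 already needs $10^{20}$ evaluations in dimension 20, which rules out exhaustive grid search.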