MIT OpenCourseWare Dynamic Programming Lecture (14)
6.231 DYNAMIC PROGRAMMING
LECTURE 14
LECTURE OUTLINE
• Limited lookahead policies
• Performance bounds
• Computational aspects
• Problem approximation approach
• Vehicle routing example
• Heuristic cost-to-go approximation
• Computer chess
LIMITED LOOKAHEAD POLICIES
• One-step lookahead (1SL) policy: At each $k$ and state $x_k$, use the control $\bar{\mu}_k(x_k)$ that attains the minimum in

$$\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\},$$

where
  - $\tilde{J}_N = g_N$.
  - $\tilde{J}_{k+1}$: approximation to the true cost-to-go $J_{k+1}$.
• Two-step lookahead policy: At each $k$ and $x_k$, use the control $\tilde{\mu}_k(x_k)$ attaining the minimum above, where the function $\tilde{J}_{k+1}$ is obtained using a 1SL approximation (solve a 2-step DP problem).
• If $\tilde{J}_{k+1}$ is readily available and the minimization above is not too hard, the 1SL policy is implementable on-line.
• Sometimes one also replaces $U_k(x_k)$ above with a subset of "most promising controls" $\bar{U}_k(x_k)$.
• As the length of lookahead increases, the required computation quickly explodes.
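The one-step lookahead rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes a finite control set and a disturbance with finitely many values, and the names `one_step_lookahead`, `controls`, `disturbances`, and the toy problem at the bottom are all hypothetical.

```python
def one_step_lookahead(x, k, controls, disturbances, f, g, J_tilde):
    """Return (u, q) where u minimizes E{ g_k(x,u,w) + J_tilde_{k+1}(f_k(x,u,w)) }.

    controls(x, k)      -> iterable of admissible controls (plays the role of U_k(x_k))
    disturbances        -> list of (w, probability) pairs (assumed finite)
    f(x, u, w, k)       -> next state
    g(x, u, w, k)       -> stage cost
    J_tilde(x, k)       -> approximate cost-to-go at stage k
    """
    best_u, best_q = None, float("inf")
    for u in controls(x, k):
        # Expected stage cost plus approximate cost-to-go of the next state.
        q = sum(p * (g(x, u, w, k) + J_tilde(f(x, u, w, k), k + 1))
                for w, p in disturbances)
        if q < best_q:
            best_u, best_q = u, q
    return best_u, best_q


# Hypothetical toy problem: scalar state, additive control and noise.
toy_controls = lambda x, k: (-1, 0, 1)
toy_disturbances = [(-1, 0.5), (1, 0.5)]
toy_f = lambda x, u, w, k: x + u + w
toy_g = lambda x, u, w, k: x * x + u * u
toy_J_tilde = lambda x, k: x * x          # assumed approximation, chosen for illustration

u_star, q_star = one_step_lookahead(3, 0, toy_controls, toy_disturbances,
                                    toy_f, toy_g, toy_J_tilde)
```

Restricting the search to a subset of "most promising controls" amounts to passing a `controls` function that returns fewer candidates; the cost of each call grows with the product of the number of controls and the number of disturbance values, which is why deeper lookahead becomes expensive quickly.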
PERFORMANCE BOUNDS
• Let $\bar{J}_k(x_k)$ be the cost-to-go from $(x_k, k)$ of the 1SL policy, based on functions $\tilde{J}_k$.
• Assume that for all $(x_k, k)$, we have

$$\hat{J}_k(x_k) \le \tilde{J}_k(x_k), \qquad (*)$$

where $\hat{J}_N = g_N$ and for all $k$,

$$\hat{J}_k(x_k) = \min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde{J}_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}$$

[so $\hat{J}_k(x_k)$ is computed along with $\bar{\mu}_k(x_k)$]. Then

$$\bar{J}_k(x_k) \le \hat{J}_k(x_k), \qquad \text{for all } (x_k, k).$$

• Important application: When $\tilde{J}_k$ is the cost-to-go of some heuristic policy (then the 1SL policy is called the rollout policy).
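The rollout case can be checked numerically on a small problem. The sketch below assumes a hypothetical 5-state, 4-stage problem with a "do nothing" base heuristic; it computes the heuristic's cost-to-go (playing the role of $\tilde{J}_k$), the one-step lookahead values (playing the role of $\hat{J}_k$), and the exact cost of the resulting rollout policy, so that the chain rollout cost ≤ lookahead value ≤ heuristic cost can be inspected. All names and numbers are illustrative assumptions.

```python
# Hypothetical toy problem (all values are assumptions for illustration).
N = 4                           # horizon
STATES = range(5)               # states 0..4
CONTROLS = (-1, 0, 1)
DIST = ((0, 0.7), (1, 0.3))     # (disturbance value, probability)

def f(x, u, w):
    return min(max(x + u + w, 0), 4)   # next state, clipped to 0..4

def g(x, u, w):
    return x * x + abs(u)              # stage cost

def g_N(x):
    return x * x                       # terminal cost

def expected(x, u, J_next):
    """E{ g(x,u,w) + J_next(f(x,u,w)) } over the finite disturbance."""
    return sum(p * (g(x, u, w) + J_next[f(x, u, w)]) for w, p in DIST)

def evaluate(policy):
    """Exact cost-to-go J[k][x] of a policy given as policy[k][x] -> u."""
    J = [None] * (N + 1)
    J[N] = [g_N(x) for x in STATES]
    for k in range(N - 1, -1, -1):
        J[k] = [expected(x, policy[k][x], J[k + 1]) for x in STATES]
    return J

# Base heuristic: always apply u = 0. Its cost-to-go serves as J_tilde.
base = [[0 for _ in STATES] for _ in range(N)]
J_tilde = evaluate(base)

# One-step lookahead on J_tilde gives J_hat and the rollout policy.
J_hat = [None] * (N + 1)
J_hat[N] = [g_N(x) for x in STATES]
rollout = []
for k in range(N):
    q = [{u: expected(x, u, J_tilde[k + 1]) for u in CONTROLS} for x in STATES]
    rollout.append([min(q[x], key=q[x].get) for x in STATES])
    J_hat[k] = [min(q[x].values()) for x in STATES]

# Exact cost-to-go of the rollout policy.
J_bar = evaluate(rollout)

for k in range(N + 1):
    print(k, [round(v, 2) for v in J_bar[k]], [round(v, 2) for v in J_tilde[k]])
```

In this setting condition (*) holds automatically, because the base heuristic's own control is one of the candidates in the minimization, so the lookahead value can never exceed the heuristic's cost-to-go.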
• The bound can be extended to the case where there is a $\delta_k$ in the RHS of (*). Then

$$\bar{J}_k(x_k) \le \tilde{J}_k(x_k) + \delta_k + \cdots + \delta_{N-1}.$$
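One way to see why the extended bound holds is a backward induction. The following is a sketch under stated assumptions: the relaxed condition is read as $\hat{J}_k(x_k) \le \tilde{J}_k(x_k) + \delta_k$, $\bar{J}_k$ denotes the cost-to-go of the 1SL policy $\bar{\mu}_k$, and the induction hypothesis is $\bar{J}_{k+1} \le \tilde{J}_{k+1} + \delta_{k+1} + \cdots + \delta_{N-1}$ (with an empty sum at $k+1 = N$).

```latex
\begin{align*}
\bar{J}_N &= \hat{J}_N = \tilde{J}_N = g_N, \\
\bar{J}_k(x_k)
  &= E\Big\{ g_k\big(x_k,\bar{\mu}_k(x_k),w_k\big)
       + \bar{J}_{k+1}\big(f_k(x_k,\bar{\mu}_k(x_k),w_k)\big) \Big\} \\
  &\le E\Big\{ g_k\big(x_k,\bar{\mu}_k(x_k),w_k\big)
       + \tilde{J}_{k+1}\big(f_k(x_k,\bar{\mu}_k(x_k),w_k)\big) \Big\}
       + \delta_{k+1} + \cdots + \delta_{N-1}
     && \text{(induction hypothesis)} \\
  &= \hat{J}_k(x_k) + \delta_{k+1} + \cdots + \delta_{N-1}
     && \text{($\bar{\mu}_k$ attains the minimum)} \\
  &\le \tilde{J}_k(x_k) + \delta_k + \delta_{k+1} + \cdots + \delta_{N-1}
     && \text{(relaxed assumption)}.
\end{align*}
```

Setting every $\delta_k = 0$ recovers the original chain $\bar{J}_k \le \hat{J}_k \le \tilde{J}_k$.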
COMPUTATIONAL ASPECTS
• Sometimes nonlinear programming can be used to calculate the 1SL or the multistep version [particularly when $U_k(x_k)$ is not a discrete set]. Connection with the methodology of stochastic programming.
• The choice of the approximating functions $\tilde{J}_k$ is critical; they can be calculated with a variety of methods.
• Some approaches:
(a) Problem Approximation: Approximate the optimal cost-to-go with some cost derived from a related but simpler problem
(b) Heuristic Cost-to-Go Approximation: Approximate the optimal cost-to-go with a function of a suitable parametric form, whose parameters are tuned by some heuristic or systematic scheme (Neuro-Dynamic Programming)
(c) Rollout Approach: Approximate the optimal cost-to-go with the cost of some suboptimal policy, which is calculated either analytically or by simulation
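As an illustration of approach (c) with the cost calculated by simulation, the sketch below estimates, for each candidate control, the cost of applying that control once and then following a base heuristic, averaging over simulated trajectories. It is a minimal sketch under assumptions: the function names, the toy problem, the sample count, and the fixed seed are all hypothetical, and no variance-reduction technique is used.

```python
import random

def simulate_cost(x, k, first_u, base_policy, f, g, g_N, N, sample_w, rng):
    """Cost of one simulated trajectory: first_u at stage k, then base_policy."""
    total, u = 0.0, first_u
    for t in range(k, N):
        w = sample_w(rng)
        total += g(x, u, w)
        x = f(x, u, w)
        if t + 1 < N:
            u = base_policy(x, t + 1)
    return total + g_N(x)

def rollout_control(x, k, controls, base_policy, f, g, g_N, N, sample_w,
                    n_sims=200, seed=0):
    """Pick the control with the lowest Monte Carlo cost estimate."""
    rng = random.Random(seed)       # fixed seed only to make runs repeatable
    estimates = {}
    for u in controls:
        costs = [simulate_cost(x, k, u, base_policy, f, g, g_N, N, sample_w, rng)
                 for _ in range(n_sims)]
        estimates[u] = sum(costs) / n_sims
    return min(estimates, key=estimates.get), estimates


# Hypothetical toy problem: states 0..4, disturbance pushes the state up.
def toy_f(x, u, w):
    return min(max(x + u + w, 0), 4)

def toy_g(x, u, w):
    return x * x + abs(u)

def toy_g_N(x):
    return x * x

def toy_base(x, k):
    return 0                        # base heuristic: do nothing

def toy_sample_w(rng):
    return 1 if rng.random() < 0.3 else 0

u_star, est = rollout_control(4, 0, (-1, 0, 1), toy_base,
                              toy_f, toy_g, toy_g_N, 4, toy_sample_w)
print(u_star, est)
```

The estimates are noisy, so with few samples the selected control can differ from the one an exact calculation would pick; increasing `n_sims` trades computation for accuracy.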