MIT OpenCourseWare: Dynamic Programming Lecture (14)


6.231 DYNAMIC PROGRAMMING

LECTURE 14

LECTURE OUTLINE

• Limited lookahead policies
• Performance bounds
• Computational aspects

• Problem approximation approach
• Vehicle routing example
• Heuristic cost-to-go approximation
• Computer chess

LIMITED LOOKAHEAD POLICIES

• One-step lookahead (1SL) policy: At each $k$ and state $x_k$, use the control $\bar\mu_k(x_k)$ that attains the minimum in

$$\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k,u_k,w_k) + \tilde J_{k+1}\big(f_k(x_k,u_k,w_k)\big) \Big\},$$

where
− $\tilde J_N = g_N$.
− $\tilde J_{k+1}$: approximation to the true cost-to-go $J_{k+1}$.

• Two-step lookahead policy: At each $k$ and $x_k$, use the control $\tilde\mu_k(x_k)$ attaining the minimum above, where the function $\tilde J_{k+1}$ is obtained using a 1SL approximation (solve a 2-step DP problem).

• If $\tilde J_{k+1}$ is readily available and the minimization above is not too hard, the 1SL policy is implementable on-line.

• Sometimes one also replaces $U_k(x_k)$ above with a subset of "most promising controls" $\bar U_k(x_k)$.

• As the length of lookahead increases, the required computation quickly explodes.
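The following is a minimal Python sketch of the 1SL and two-step definitions. It runs on a small, entirely hypothetical inventory problem. The names `N`, `MAX_STOCK`, `DEMAND`, `controls`, `f`, `g`, `g_N`, `J_tilde` and `lookahead`, and all numbers, are illustrative assumptions and do not come from the lecture.

- `lookahead(k, x, depth)` with `depth=1` carries out the minimization above.
- With `depth=2` it replaces $\tilde J_{k+1}$ by a 1SL value, which means solving the 2-step DP problem.
- The nested recursion also shows why the work grows roughly like $(|U|\,|W|)^{\text{depth}}$.

```python
# Hypothetical toy inventory problem (illustrative, not from the lecture).
N = 3                                    # horizon
MAX_STOCK = 4                            # storage capacity
DEMAND = [(0, 0.3), (1, 0.5), (2, 0.2)]  # (demand w, probability)


def controls(x):
    """U_k(x_k): order 0, 1 or 2 units without exceeding capacity."""
    return [u for u in range(3) if x + u <= MAX_STOCK]


def f(x, u, w):
    return max(0, x + u - w)             # next stock level (lost sales)


def g(x, u, w):
    # ordering + holding + shortage cost
    return 1.0 * u + 0.5 * max(0, x + u - w) + 3.0 * max(0, w - x - u)


def g_N(x):
    return 0.0                           # terminal cost


def J_tilde(k, x):
    """Assumed cost-to-go approximation; J~_N = g_N as in the 1SL definition."""
    if k == N:
        return g_N(x)
    return 1.5 * (N - k) * max(0, 2 - x)  # crude guess: penalize low stock


def lookahead(k, x, depth):
    """Return (value, control) for a depth-step lookahead at state x, stage k.

    depth=1 is the 1SL policy; depth=2 is the two-step lookahead policy,
    in which J~_{k+1} is itself replaced by a 1SL value.
    """
    best_val, best_u = float("inf"), None
    for u in controls(x):                # minimize over u_k in U_k(x_k)
        q = 0.0
        for w, p in DEMAND:              # expectation over w_k
            y = f(x, u, w)
            if depth == 1 or k + 1 == N:
                tail = J_tilde(k + 1, y)
            else:
                tail = lookahead(k + 1, y, depth - 1)[0]
            q += p * (g(x, u, w) + tail)
        if q < best_val:
            best_val, best_u = q, u
    return best_val, best_u


if __name__ == "__main__":
    for x0 in range(MAX_STOCK + 1):
        print(x0, lookahead(0, x0, depth=1), lookahead(0, x0, depth=2))
```

With `depth=N` from stage 0, the recursion reaches $\tilde J_N = g_N$ and reproduces exact DP. That is a convenient sanity check, but it defeats the purpose of limited lookahead for realistic horizons.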

PERFORMANCE BOUNDS

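• Let $\bar J_k(x_k)$ be the cost-to-go from $(x_k,k)$ of the 1SL policy, based on the functions $\tilde J_k$.

• Assume that for all $(x_k,k)$, we have

$$\hat J_k(x_k) \le \tilde J_k(x_k), \qquad (*)$$

where $\hat J_N = g_N$ and for all $k$,

$$\hat J_k(x_k) = \min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k,u_k,w_k) + \tilde J_{k+1}\big(f_k(x_k,u_k,w_k)\big) \Big\}$$

[so $\hat J_k(x_k)$ is computed along with $\bar\mu_k(x_k)$]. Then

$$\bar J_k(x_k) \le \hat J_k(x_k), \qquad \text{for all } (x_k,k).$$

• Important application: When $\tilde J_k$ is the cost-to-go of some heuristic policy, the 1SL policy is called the rollout policy. In this case (*) holds automatically, because the heuristic's own control is one of the controls in the minimization that defines $\hat J_k$.

• The bound can be extended to the case where there is a $\delta_k$ in the right-hand side of (*). Then

$$\bar J_k(x_k) \le \tilde J_k(x_k) + \delta_k + \cdots + \delta_{N-1}.$$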
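Below is a sketch of the standard induction argument behind the extended bound. Write $\bar\mu_k$ for the 1SL control and $\bar J_k$ for the 1SL cost-to-go, and assume $\hat J_k(x_k) \le \tilde J_k(x_k) + \delta_k$ for all $(x_k,k)$ and $\tilde J_N = g_N$. The base case is $\bar J_N = g_N = \tilde J_N$. Suppose $\bar J_{k+1} \le \tilde J_{k+1} + \delta_{k+1} + \cdots + \delta_{N-1}$. Then

$$
\begin{aligned}
\bar J_k(x_k) &= E\Big\{ g_k\big(x_k,\bar\mu_k(x_k),w_k\big) + \bar J_{k+1}\Big(f_k\big(x_k,\bar\mu_k(x_k),w_k\big)\Big) \Big\} \\
&\le E\Big\{ g_k\big(x_k,\bar\mu_k(x_k),w_k\big) + \tilde J_{k+1}\Big(f_k\big(x_k,\bar\mu_k(x_k),w_k\big)\Big) \Big\} + \delta_{k+1} + \cdots + \delta_{N-1} \\
&= \hat J_k(x_k) + \delta_{k+1} + \cdots + \delta_{N-1} \\
&\le \tilde J_k(x_k) + \delta_k + \cdots + \delta_{N-1}.
\end{aligned}
$$

The equality in the third line holds because $\bar\mu_k(x_k)$ attains the minimum defining $\hat J_k(x_k)$. Setting all $\delta_k = 0$ in the third line recovers $\bar J_k(x_k) \le \hat J_k(x_k)$.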

COMPUTATIONAL ASPECTS

• Sometimes nonlinear programming can be used to calculate the 1SL or the multistep version [particularly when $U_k(x_k)$ is not a discrete set]. There is a connection with the methodology of stochastic programming.
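The nonlinear-programming remark can be illustrated on a hypothetical scalar problem with a continuous control interval, under these assumptions:

- dynamics $f(x,u,w) = x+u+w$ and cost $g(x,u,w) = x^2+u^2$;
- $w$ takes three values;
- an assumed quadratic approximation $\tilde J_{k+1}(x) = A x^2$.

All names and numbers are illustrative. The 1SL minimization uses golden-section search, a simple derivative-free method picked here only to keep the sketch self-contained. It needs the objective to be unimodal in $u$, which holds because the objective is convex. The lecture does not prescribe a solver; in practice one might use a library routine such as SciPy's `scipy.optimize.minimize_scalar` with `method='bounded'`.

```python
import math

# Hypothetical scalar problem (illustrative, not from the lecture):
#   x_{k+1} = x + u + w,  g(x, u, w) = x**2 + u**2,  u in [U_LO, U_HI].
W = [(-1.0, 0.25), (0.0, 0.5), (1.0, 0.25)]  # (w, probability)
U_LO, U_HI = -2.0, 2.0
A = 0.8  # assumed coefficient of J~_{k+1}(x) = A * x**2


def f(x, u, w):
    return x + u + w


def g(x, u, w):
    return x ** 2 + u ** 2


def J_tilde_next(x):
    return A * x ** 2


def q_factor(x, u):
    """E{ g(x,u,w) + J~_{k+1}(f(x,u,w)) } as a finite sum over scenarios of w."""
    return sum(p * (g(x, u, w) + J_tilde_next(f(x, u, w))) for w, p in W)


def golden_section_min(fun, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fun(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fun(d)
    return (a + b) / 2


def one_step_lookahead_control(x):
    """1SL control over the continuous set U(x) = [U_LO, U_HI]."""
    return golden_section_min(lambda u: q_factor(x, u), U_LO, U_HI)


if __name__ == "__main__":
    for x0 in [0.0, 1.5, -3.0, 10.0]:
        print(x0, one_step_lookahead_control(x0))
```

In this quadratic case, the unconstrained minimizer is $-A(x+E\{w\})/(1+A)$, and the search should reproduce it whenever it lies inside the interval; otherwise the minimum is at the nearer endpoint. Replacing the expectation by a finite sum over scenarios of $w$, as here, is also a common starting point in stochastic programming.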

• The choice of the approximating functions $\tilde J_k$ is critical; they are calculated with a variety of methods.

• Some approaches:

(a) Problem Approximation: Approximate the optimal cost-to-go with some cost derived from a related but simpler problem.

(b) Heuristic Cost-to-Go Approximation: Approximate the optimal cost-to-go with a function of a suitable parametric form, whose parameters are tuned by some heuristic or systematic scheme (Neuro-Dynamic Programming).

(c) Rollout Approach: Approximate the optimal cost-to-go with the cost of some suboptimal policy, which is calculated either analytically or by simulation.
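The rollout approach (c) can be sketched in Python together with the performance bound given earlier for rollout policies. The sketch uses a hypothetical inventory problem, and all names, numbers and the "order up to 1" base heuristic are illustrative assumptions. The heuristic's cost-to-go $H_{k+1}$ is obtained in one of two ways and then used as $\tilde J_{k+1}$ in the 1SL minimization:

- exactly, by backward recursion (the "analytical" option);
- approximately, by Monte Carlo simulation of the heuristic.

```python
import random
from functools import lru_cache

# Hypothetical toy inventory problem (illustrative, not from the lecture).
N = 3                                    # horizon
MAX_STOCK = 4                            # storage capacity
DEMAND = [(0, 0.3), (1, 0.5), (2, 0.2)]  # (demand w, probability); g_N = 0


def controls(x):
    """U_k(x_k): order 0, 1 or 2 units without exceeding capacity."""
    return [u for u in range(3) if x + u <= MAX_STOCK]


def f(x, u, w):
    return max(0, x + u - w)             # next stock level (lost sales)


def g(x, u, w):
    return 1.0 * u + 0.5 * max(0, x + u - w) + 3.0 * max(0, w - x - u)


def base_policy(k, x):
    """Hypothetical base heuristic: order up to a stock level of 1."""
    return max(0, 1 - x)


@lru_cache(maxsize=None)
def heuristic_cost_exact(k, x):
    """H_k(x): exact expected cost of the base heuristic from (x, k)."""
    if k == N:
        return 0.0
    u = base_policy(k, x)
    return sum(p * (g(x, u, w) + heuristic_cost_exact(k + 1, f(x, u, w)))
               for w, p in DEMAND)


def heuristic_cost_sim(k, x, n_sims, rng):
    """Monte Carlo estimate of H_k(x) averaged over n_sims simulated trajectories."""
    values = [w for w, _ in DEMAND]
    weights = [p for _, p in DEMAND]
    total = 0.0
    for _ in range(n_sims):
        y, cost = x, 0.0
        for t in range(k, N):
            u = base_policy(t, y)
            w = rng.choices(values, weights=weights)[0]
            cost += g(y, u, w)
            y = f(y, u, w)
        total += cost
    return total / n_sims


def rollout_control(k, x, n_sims=None, rng=None):
    """1SL control with J~_{k+1} = H_{k+1} (exact if n_sims is None, else simulated)."""
    def tail(y):
        if n_sims is None:
            return heuristic_cost_exact(k + 1, y)
        return heuristic_cost_sim(k + 1, y, n_sims, rng)

    return min(controls(x),
               key=lambda u: sum(p * (g(x, u, w) + tail(f(x, u, w)))
                                 for w, p in DEMAND))


def policy_cost_exact(policy, k, x):
    """Exact expected cost from (x, k) of any policy(k, x)."""
    if k == N:
        return 0.0
    u = policy(k, x)
    return sum(p * (g(x, u, w) + policy_cost_exact(policy, k + 1, f(x, u, w)))
               for w, p in DEMAND)


if __name__ == "__main__":
    for x0 in range(MAX_STOCK + 1):
        print(x0, heuristic_cost_exact(0, x0), policy_cost_exact(rollout_control, 0, x0))
    print(rollout_control(0, 0, n_sims=2000, rng=random.Random(0)))
```

With exact $H_k$, the rollout policy's cost is no larger than the heuristic's, as the performance bound predicts. With simulated $H_k$, the bound holds only approximately, because the estimates are noisy.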
