Chapter 4: Markov Decision Programming (Lectures 8 and 9)


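(2) Discounted criterion $V_\beta(\pi)$. Fix a discount factor $\beta$ $(0<\beta<1)$, so that a reward earned at stage $t$ is weighted by $\beta^t$. The expected total discounted reward of policy $\pi$ over stages $t=0,1,\dots,N$, starting from state $i$, is

$$V_{N,\beta}(\pi,i)=\sum_{t=0}^{N}\beta^{t}\sum_{j\in S}\sum_{a\in A(j)}P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}\,r(j,a).$$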
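With $S=\{1,2\}$, $A(1)=\{a_1\}$ and $A(2)=\{a_2,a_3\}$, find $V^*_0=(V^*_0(1),V^*_0(2))$ and an optimal policy.

Solution. Here $N=3$, so recursion (4.4) is applied for $n=3,2,1,0$. This computes $V^*_4(i)\to V^*_3(i)\to V^*_2(i)\to V^*_1(i)\to V^*_0(i)$ for $i\in S=\{1,2\}$.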
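In (4.4), $i\in S$, $n=N,N-1,\dots,0$, and $V^*_{N+1}(j)=0$ for all $j\in S$. With $S=\{1,2,\dots,l\}$, the vector $V^*_0=(V^*_0(1),V^*_0(2),\dots,V^*_0(l))$ is the optimal value. That is, for every $i\in S$,

$$V^*_0(i)=\sup_{\pi\in\Pi}V_N(\pi,i),$$

and $\pi^*=(f^*_0,f^*_1,\dots,f^*_N)$ is an optimal policy.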
$$\qquad+\beta^{2}\sum_{k=1}^{2}\sum_{j=1}^{2}P(j\mid i,f_0(i))\,P(k\mid j,f_1(j))\,r(k,f_2(k))+\cdots\qquad(i=1,2)$$
$V_\beta(\pi,i)$ is the expected total discounted reward obtained when the system starts in state $i$ and policy $\pi$ is followed.
4.2.2 The MDP model

An MDP is specified by the five-tuple $\{S,(A(i),i\in S),P,r,V\}$. $S$ is the state space, and states are denoted $i,j,k,\dots$.
4.2.1

Example 4.1.
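The system is in one of two states: state 1 ($i=1$) or state 2 ($i=2$). In state 1 the only action is $a_1$. It earns a reward of 10, after which the system stays in state 1 with probability 0.7 and moves to state 2 with probability 0.3. In state 2 there are two actions. Action $a_2$ costs 5 (reward $-5$); it moves the system to state 1 with probability 0.6 and leaves it in state 2 with probability 0.4. Action $a_3$ costs 2 (reward $-2$); it moves the system to state 1 with probability 0.4 and leaves it in state 2 with probability 0.6. The rewards $r(i,a)$ and transition probabilities $P(j\mid i,a)$ are summarized in Table 4.1.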
Here $S=\{1,2\}$, $A(1)=\{a_1\}$ and $A(2)=\{a_2,a_3\}$, and $P(j\mid i,a)$ is given in Table 4.1. The computation uses recursion (4.4).
(1) $n=3$. Start from $V^*_4(1)=V^*_4(2)=0$:

$$V^*_3(1)=\max_{a\in A(1)}\Big\{r(1,a)+\sum_{j\in S}P(j\mid 1,a)V^*_4(j)\Big\}=\max_{a\in A(1)}\{r(1,a)\}=r(1,a_1)=10.$$

This maximum is attained at $a_1$, so $f^*_3(1)=a_1$.

$$V^*_3(2)=\max_{a\in A(2)}\Big\{r(2,a)+\sum_{j\in S}P(j\mid 2,a)V^*_4(j)\Big\}=\max\{r(2,a_2),r(2,a_3)\}=\max\{-5,-2\}=-2.$$

This maximum is attained at $a_3$, so $f^*_3(2)=a_3$.

(2) $n=2$. Using $V^*_3(1)$ and $V^*_3(2)$ in (4.4):
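For a policy $\pi$ used from $t=0$ with initial state $i$, the expected total discounted reward over an infinite horizon is the limit of $V_{N,\beta}(\pi,i)$ as $N\to\infty$:

$$V_\beta(\pi,i)=\sum_{t=0}^{\infty}\beta^{t}\sum_{j\in S}\sum_{a\in A(j)}P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}\,r(j,a).\qquad(4.3)$$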
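The series (4.3) can be checked numerically for a stationary policy, meaning one that uses the same decision function $f$ at every stage. In that case the stage-$t$ term is $\beta^t(P_f^t r_f)(i)$, where $P_f$ is the matrix $(P(j\mid i,f(i)))$ and $r_f$ is the vector $(r(i,f(i)))$. The infinite sum $V$ then satisfies $V=r_f+\beta P_f V$.

The Python sketch below is an illustration, not the text's own method. It compares the partial sums $V_{N,\beta}$ with the solution of that linear system for the decision function $f(1)=a_1$, $f(2)=a_2$ of Example 4.1. The value $\beta=0.9$ and all names are assumptions.

```python
# Stationary policy f of Example 4.1: f(1) = a1, f(2) = a2 (data from Table 4.1).
P_F = [[0.7, 0.3],    # row i: P(j | i, f(i)) for j = 1, 2
       [0.6, 0.4]]
R_F = [10.0, -5.0]    # r(i, f(i)) for i = 1, 2
BETA = 0.9            # assumed discount factor (any 0 < beta < 1 works)


def discounted_partial_sum(N, beta=BETA):
    """V_{N,beta}(f, i) = sum_{t=0}^{N} beta^t (P_f^t r_f)(i), for i = 1, 2."""
    v = [0.0, 0.0]
    term = R_F[:]                                  # P_f^t r_f at t = 0
    for t in range(N + 1):
        v = [v[i] + beta ** t * term[i] for i in range(2)]
        term = [sum(P_F[i][j] * term[j] for j in range(2)) for i in range(2)]
    return v


def discounted_exact(beta=BETA):
    """Solve (I - beta P_f) V = r_f, the equation satisfied by the infinite sum."""
    a, b = 1 - beta * P_F[0][0], -beta * P_F[0][1]
    c, d = -beta * P_F[1][0], 1 - beta * P_F[1][1]
    det = a * d - b * c                            # nonzero for 0 < beta < 1
    return [(R_F[0] * d - b * R_F[1]) / det, (a * R_F[1] - c * R_F[0]) / det]


print(discounted_partial_sum(300))   # close to discounted_exact()
print(discounted_exact())            # about [55.49, 39.01]
```

With $\beta=0.9$ the neglected tail after 300 terms is of order $0.9^{301}$ times the reward scale, so the two results agree to many digits.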
$V_\beta(\pi)$ denotes the vector $(V_\beta(\pi,i),\ i\in S)$. The model $\{S,(A(i),i\in S),P,r,V_\beta\}$ is the MDP with the discounted criterion.
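Let $S=\{1,2,\dots,l\}$. A policy $\pi^*\in\Pi$ with $V_N(\pi^*,i)\ge V_N(\pi,i)$ for all $\pi\in\Pi$ and $i\in S$ is optimal, and $V_N(\pi^*)$ is the optimal value.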
Consider the model $\{S,(A(i),i\in S),P,r,V_N\}$ with $S$ and every $A(i)$ $(i\in S)$ finite. It has an optimal deterministic Markov policy $\pi^*=(f^*_0,f^*_1,\dots,f^*_N)$ with $f^*_t\in F$ $(t=0,1,\dots,N)$.
$$V^*_0=(V^*_0(1),V^*_0(2))=(26.752,\,10.096)=(V_3(\pi^*,1),V_3(\pi^*,2)).$$

An optimal policy is $\pi^*=(f^*_0,f^*_1,f^*_2,f^*_3)=(f,f,g,g)$, where $f(1)=g(1)=a_1$, $f(2)=a_2$ and $g(2)=a_3$. That is, $f$ is used at stages 0 and 1 and $g$ at stages 2 and 3.
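State 1 has a single action. A deterministic Markov policy for $N=3$ in this example is therefore fixed by the action chosen in state 2 at each of the four stages, which gives $2^4=16$ policies.

The brute-force check below is an illustration, not the textbook's method. It evaluates each policy by backward policy evaluation, $V_t(i)=r(i,f_t(i))+\sum_j P(j\mid i,f_t(i))V_{t+1}(j)$, and confirms that $(f,f,g,g)$ is the only one attaining $V^*_0$ in both states. The names `evaluate`, `results` and `optimal` are hypothetical.

```python
from itertools import product

# Example 4.1 data (Table 4.1); state 1 always uses a1.
P = {
    (1, "a1"): {1: 0.7, 2: 0.3},
    (2, "a2"): {1: 0.6, 2: 0.4},
    (2, "a3"): {1: 0.4, 2: 0.6},
}
R = {(1, "a1"): 10.0, (2, "a2"): -5.0, (2, "a3"): -2.0}


def evaluate(choices):
    """V_N(pi, i), i = 1, 2, for the policy using choices[t] in state 2 at stage t."""
    V = {1: 0.0, 2: 0.0}                      # V_{N+1} = 0
    for a in reversed(choices):               # stages t = N, ..., 0
        rule = {1: "a1", 2: a}                # decision function f_t
        V = {i: R[(i, rule[i])]
                + sum(p * V[j] for j, p in P[(i, rule[i])].items())
             for i in (1, 2)}
    return V


results = {c: evaluate(c) for c in product(["a2", "a3"], repeat=4)}  # N = 3
best = {i: max(v[i] for v in results.values()) for i in (1, 2)}
optimal = [c for c, v in results.items() if v[1] == best[1] and v[2] == best[2]]
print(len(results), optimal)   # 16 [('a2', 'a2', 'a3', 'a3')]
```

The tuple `('a2', 'a2', 'a3', 'a3')` lists the state-2 actions of $f,f,g,g$.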
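Consider a policy $\pi=(f_0,f_1,f_2,\dots)$ that, at each stage $t$, chooses the action $f_t(i)$ whenever the system is in state $i$. Such a policy is called a deterministic Markov policy, because the action at stage $t$ depends only on $t$ and the current state. The set of all deterministic Markov policies is denoted $\Pi_{dm}$.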
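Definition 4.2. A policy $\pi=(\pi_0,\pi_1,\pi_2,\dots)$ is called a Markov policy if, for every $t$, the decision rule $\pi_t$ depends only on the current state $i_t$. It chooses action $a$ with probability $\pi_t(a\mid i_t)\ge 0$, where $\sum_{a\in A(i_t)}\pi_t(a\mid i_t)=1$. The set of all Markov policies is denoted $\Pi_m$.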
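Let a policy $\pi$ be used from $t=0$ with initial state $i$. Then $V(\pi,i)$ denotes the value of $\pi$ under the criterion $V$, and $V(\pi)$ is the vector $(V(\pi,i),\ i\in S)$; see [5, 26]. Different choices of $V$ give different MDP models, and $V(\pi)$ is the objective that the decision maker optimizes over policies.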
4.3
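Consider an MDP $\{S,(A(i),i\in S),P,r,V\}$ whose horizon is a finite number $N$ of stages. For a policy $\pi$ and initial state $i\in S$, the expected total reward over stages $t=0,1,\dots,N$ is

$$V_N(\pi,i)=\sum_{t=0}^{N}\sum_{j\in S}\sum_{a\in A(j)}P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}\,r(j,a).$$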
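4.3.1

Definition 4.3. Let $S=\{1,2,\dots,l\}$. Suppose $\pi^*\in\Pi$ satisfies $V_N(\pi^*,i)\ge V_N(\pi,i)$ for all $\pi\in\Pi$ and $i\in S$. Then $\pi^*$ is called an optimal policy, and $(V_N(\pi^*,1),V_N(\pi^*,2),\dots,V_N(\pi^*,l))$ is called the optimal value [5, 26, 27].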
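4.3.2

The optimal value and an optimal Markov policy of the finite-horizon problem can be found by backward recursion, that is, by dynamic programming [28].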
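Theorem 4.1. Let $V^*_n(i)$ and $f^*_n(i)$ be determined by the backward recursion

$$V^*_n(i)=\max_{a\in A(i)}\Big\{r(i,a)+\sum_{j\in S}P(j\mid i,a)\,V^*_{n+1}(j)\Big\}=r(i,f^*_n(i))+\sum_{j\in S}P(j\mid i,f^*_n(i))\,V^*_{n+1}(j),\qquad(4.4)$$

for $i\in S$ and $n=N,N-1,\dots,0$, with $V^*_{N+1}(j)=0$.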
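Recursion (4.4) translates almost line by line into code. The sketch below is an illustration, not the text's own program. It assumes finite $S$ and $A(i)$ stored in Python dictionaries and uses hypothetical names (`backward_induction`, `ACTIONS`, `P`, `R`). Ties are broken by taking the first maximizing action listed in $A(i)$.

Run on the data of Table 4.1 with $N=3$, it returns $V^*_0=(26.752,\,10.096)$ and the decision functions of the worked example.

```python
# Example 4.1 data (Table 4.1).
ACTIONS = {1: ["a1"], 2: ["a2", "a3"]}
P = {
    (1, "a1"): {1: 0.7, 2: 0.3},
    (2, "a2"): {1: 0.6, 2: 0.4},
    (2, "a3"): {1: 0.4, 2: 0.6},
}
R = {(1, "a1"): 10.0, (2, "a2"): -5.0, (2, "a3"): -2.0}


def backward_induction(actions, P, R, N):
    """Recursion (4.4); returns V*_0 and the decision functions f*_0, ..., f*_N."""
    V = {j: 0.0 for j in actions}           # V*_{N+1}(j) = 0
    rules = [None] * (N + 1)
    for n in range(N, -1, -1):              # n = N, N-1, ..., 0
        # q[i][a] = r(i, a) + sum_j P(j | i, a) V*_{n+1}(j)
        q = {
            i: {a: R[(i, a)] + sum(p * V[j] for j, p in P[(i, a)].items())
                for a in actions[i]}
            for i in actions
        }
        rules[n] = {i: max(q[i], key=q[i].get) for i in actions}  # f*_n
        V = {i: q[i][rules[n][i]] for i in actions}                # V*_n
    return V, rules


V0, rules = backward_induction(ACTIONS, P, R, N=3)
print({i: round(v, 3) for i, v in V0.items()})   # {1: 26.752, 2: 10.096}
print([rule[2] for rule in rules])               # ['a2', 'a2', 'a3', 'a3']
```

Storing one dictionary per stage keeps the non-stationary nature of the finite-horizon optimal policy visible: $f^*_n$ may differ from stage to stage.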
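Suppose the system is in state $i$ at stage $t$ and action $a$ is taken. It then earns the reward $r(i,a)$ at stage $t$ and is in state $j$ at stage $t+1$ with probability $P(j\mid i,a)$. For this example, $P(j\mid i,a)$ and $r(i,a)$ are given in Table 4.1.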
Table 4.1
State i                  1      2      2
Action a                 a1     a2     a3
P(j | i, a), j = 1       0.7    0.6    0.4
P(j | i, a), j = 2       0.3    0.4    0.6
Reward r(i, a)           10     −5     −2
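The decision function $f$ chooses $a_1$ in state 1. At $t=2$ the system is in state $k$ with probability $\sum_{j=1}^{2}P(j\mid i,f_0(i))\,P(k\mid j,f_1(j))$ $(k,j=1,2)$, and the decision function $f_2$ is chosen from $\{f,g\}$.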
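For a deterministic Markov policy $\pi=(f_0,f_1,f_2,f_3,\dots)$, the reward earned at stage $t$ is discounted by the factor $\beta^t$. Here $\beta=\dfrac{1}{1+a}$ for some constant $a>0$; this $a$ is a number, not an action. Starting from state $i$, the expected total discounted reward is

$$V_\beta(\pi,i)=r(i,f_0(i))+\beta\sum_{j=1}^{2}P(j\mid i,f_0(i))\,r(j,f_1(j))$$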
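$A(i)$ is the set of actions available in state $i$, and actions are denoted $a,b,c,\dots$.

$P$ is the transition law. At each stage $t$ $(t=0,1,2,\dots)$, suppose the system is in state $i$ and action $a\in A(i)$ is taken. It then moves to state $j$ at stage $t+1$ with probability $P(j\mid i,a)$, where $P(j\mid i,a)\ge 0$ and $\sum_{j\in S}P(j\mid i,a)=1$ for all $a\in A(i)$ and $i,j\in S$. This is the Markov property: the transition depends only on the current state $i$ and the chosen action $a$.

$r$ is the reward function: $r(i,a)$ is the reward earned when action $a$ is taken in state $i$.

Write $\Gamma=\{(i,a)\mid a\in A(i),\ i\in S\}$ for the set of admissible state-action pairs.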
The maximum defining $V^*_1(2)$ is attained at $a_2$, so $f^*_1(2)=a_2$.

(4) $n=0$. Using $V^*_1(1)$ and $V^*_1(2)$ in (4.4):

$$V^*_0(1)=\max_{a\in A(1)}\Big\{r(1,a)+\sum_{j\in S}P(j\mid 1,a)V^*_1(j)\Big\}=10+0.7\times21.72+0.3\times5.16=26.752.$$

Hence $f^*_0(1)=a_1$. For state 2:
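A policy is a sequence $\pi=(\pi_0,\pi_1,\pi_2,\dots)$, where $\pi_t$ is the decision rule used at stage $t$. The set of all policies is denoted $\Pi$. When $\pi_t$ depends only on the current state $i_t$, it is written $\pi_t(a\mid i_t)$, the probability of choosing action $a$ at stage $t$, and the policy is Markov.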
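Evidently $\Pi_{dm}\subset\Pi_m\subset\Pi$.

(1) Finite-horizon criterion $V_N(\pi)$ with horizon $N$. For $\pi\in\Pi$ and $i\in S$,

$$V_N(\pi,i)=\sum_{t=0}^{N}\sum_{j\in S}\sum_{a\in A(j)}P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}\,r(j,a).$$

In general, an evaluation criterion $V$ assigns a real number $V(\pi,i)$ to every pair $(\pi,i)\in\Pi\times S$.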
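For a deterministic Markov policy, this definition can be evaluated directly by propagating the state distribution forward. For such a policy, $P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}$ equals the probability of being in state $j$ at stage $t$ when $a=f_t(j)$, and 0 otherwise.

The following is a minimal Python sketch under that restriction; randomized policies are not handled. The dictionary layout and the names `P`, `R`, `STATES` and `finite_horizon_value` are illustrative assumptions. The data are those of Table 4.1 (Example 4.1).

```python
# Example 4.1 data (Table 4.1): transition probabilities and one-stage rewards.
P = {
    (1, "a1"): {1: 0.7, 2: 0.3},
    (2, "a2"): {1: 0.6, 2: 0.4},
    (2, "a3"): {1: 0.4, 2: 0.6},
}
R = {(1, "a1"): 10.0, (2, "a2"): -5.0, (2, "a3"): -2.0}
STATES = [1, 2]


def finite_horizon_value(policy, start):
    """V_N(pi, start) for a deterministic Markov policy pi = (f_0, ..., f_N).

    Each f_t is a dict mapping state -> action; the horizon is N = len(policy) - 1.
    """
    dist = {j: 0.0 for j in STATES}   # P_pi{Y_t = j | Y_0 = start}
    dist[start] = 1.0
    total = 0.0
    for f in policy:
        # Expected reward at stage t: sum_j P{Y_t = j} * r(j, f_t(j)).
        total += sum(dist[j] * R[(j, f[j])] for j in STATES)
        # Distribution of Y_{t+1}.
        nxt = {k: 0.0 for k in STATES}
        for j in STATES:
            for k, p in P[(j, f[j])].items():
                nxt[k] += dist[j] * p
        dist = nxt
    return total


f = {1: "a1", 2: "a2"}
g = {1: "a1", 2: "a3"}
pi_star = [f, f, g, g]                # N = 3
print([round(finite_horizon_value(pi_star, i), 3) for i in STATES])  # [26.752, 10.096]
```

For the policy $(f,f,g,g)$ with $N=3$ this reproduces the optimal values $(26.752,\,10.096)$ obtained for this example by backward recursion.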
Together with an initial state $i$ and a policy $\pi$, the five-tuple $\{S,(A(i),i\in S),P,r,V\}$ is called a Markov decision process (MDP) model.
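4.2.3

A function $f$ on $S$ with $f(i)\in A(i)$ for every $i\in S$ is called a decision function. The set of all decision functions is denoted $F$.

Definition 4.1. A sequence $\pi=(f_0,f_1,f_2,\dots)$ with $f_t\in F$, $t\in\{0,1,2,\dots\}$, where $f_t$ is the decision function used at stage $t$, is a deterministic Markov policy.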
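For $\pi\in\Pi$ and $i\in S$, $V_\beta(\pi,i)$ is the expected total discounted reward of $\pi$ from initial state $i$. The MDP with criterion $V_\beta(\pi)$ is called the discounted MDP.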
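Let $\bar V_N(\pi,i)=V_N(\pi,i)/(N+1)$ be the average expected reward per stage over the $N+1$ stages $t=0,1,\dots,N$, starting from state $i$. The long-run average expected reward of $\pi$ from state $i$ is

$$\bar V(\pi,i)=\liminf_{N\to\infty}\frac{V_N(\pi,i)}{N+1}=\liminf_{N\to\infty}\frac{1}{N+1}\sum_{t=0}^{N}\sum_{j\in S}\sum_{a\in A(j)}P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}\,r(j,a).$$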
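The average criterion can be approximated by computing $V_N(\pi,i)/(N+1)$ for a large finite $N$ instead of the $\liminf$. The Python sketch below does this for the two stationary policies of Example 4.1, which use $f$ (with $f(2)=a_2$) or $g$ (with $g(2)=a_3$) at every stage. It is an illustration only, and the names are assumptions.

The exact limits follow from the stationary distributions: $(2/3,1/3)$ under $f$ and $(4/7,3/7)$ under $g$. These give averages of $5$ and $34/7\approx4.857$ respectively, from either starting state. Under the average criterion, $f$ therefore does slightly better than $g$ in this example.

```python
# Example 4.1, stationary policies (states 1, 2 stored at indices 0, 1):
# f uses a2 in state 2, g uses a3 in state 2 (data from Table 4.1).
POLICIES = {
    "f": ([[0.7, 0.3], [0.6, 0.4]], [10.0, -5.0]),
    "g": ([[0.7, 0.3], [0.4, 0.6]], [10.0, -2.0]),
}


def average_reward(P, r, start, N):
    """V_N(pi, start) / (N + 1) for the stationary policy with matrix P and rewards r."""
    dist = [0.0, 0.0]
    dist[start] = 1.0                 # distribution of Y_0
    total = 0.0
    for _ in range(N + 1):            # stages t = 0, ..., N
        total += sum(dist[j] * r[j] for j in range(2))
        dist = [sum(dist[j] * P[j][k] for j in range(2)) for k in range(2)]
    return total / (N + 1)


for name, (P, r) in POLICIES.items():
    print(name, [round(average_reward(P, r, s, 20000), 3) for s in (0, 1)])
```

The finite-$N$ values differ from the limits by a term of order $1/N$, so they agree with $5$ and $34/7$ to about three decimal places.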
$$V^*_2(1)=\max_{a\in A(1)}\Big\{r(1,a)+\sum_{j\in S}P(j\mid 1,a)V^*_3(j)\Big\}=r(1,a_1)+0.7\times10+0.3\times(-2)=16.4$$
$f^*_2(1)=a_1$;

$$V^*_2(2)=\max_{a\in A(2)}\Big\{r(2,a)+\sum_{j\in S}P(j\mid 2,a)V^*_3(j)\Big\}=\max\{r(2,a_2)+0.6\times10+0.4\times(-2),\ r(2,a_3)+0.4\times10+0.6\times(-2)\}$$
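Here $Y_t$ and $\Delta_t$ denote the state and the action at stage $t$. $P_\pi\{Y_t=j,\Delta_t=a\mid Y_0=i\}$ is the probability that the system is in state $j$ and action $a\in A(j)$ is taken at stage $t$, when policy $\pi\in\Pi$ is used from $t=0$ with initial state $i$. Thus $V_N(\pi,i)$ is the expected total reward over stages $t=0,1,\dots,N$ starting from state $i$.

The vector $V_N(\pi)$ with components $V_N(\pi,i)$ $(i\in S)$ is called the $N$-stage total expected reward criterion. The model $\{S,(A(i),i\in S),P,r,V_N\}$ is the finite-horizon MDP.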
$$=\max\{0.2,\,0.8\}=0.8.$$

This maximum is attained at $a_3$, so $f^*_2(2)=a_3$.

(3) $n=1$. Using $V^*_2(1)$ and $V^*_2(2)$ in (4.4):

$$V^*_1(1)=\max_{a\in A(1)}\Big\{r(1,a)+\sum_{j\in S}P(j\mid 1,a)V^*_2(j)\Big\}=10+0.7\times16.4+0.3\times0.8=21.72$$
4.1

This chapter studies Markov decision programming, that is, sequential decision problems modeled as Markov decision processes (MDP) [5].
4.2 The Markov decision model

A Markov decision model is specified by the five-tuple $\{S,(A(i),i\in S),P,r,V\}$.
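Two decision functions are available. The first, $f$, chooses $a_2$ in state 2: $f(1)=a_1$, $f(2)=a_2$. The second, $g$, chooses $a_3$ in state 2: $g(1)=a_1$, $g(2)=a_3$.

At $t=0$, starting in state $i$, a decision function $f_0\in\{f,g\}$ is used and the reward $r(i,f_0(i))$ is earned. At $t=1$ the system is in state $j$ with probability $P(j\mid i,f_0(i))$ $(i,j=1,2)$. Using $f_1\in\{f,g\}$ then gives the expected reward $\sum_{j=1}^{2}P(j\mid i,f_0(i))\,r(j,f_1(j))$. The decision at $t=2$ is made in the same way.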
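Here $S$ and each $A(i)$ $(i\in S)$ are finite with $S=\{1,2,\dots,l\}$, and $f^*_n(i)$ is an action attaining the maximum in (4.4).

Theorem 4.1 states the following. For each $n$, $V^*_n(i)$ is the maximal expected total reward over the remaining $N+1-n$ stages $n,n+1,\dots,N$ when the system is in state $i$ at stage $n$, and $(f^*_n,f^*_{n+1},\dots,f^*_N)$ achieves it.

This is Bellman's principle of optimality: whatever state $i$ is reached at stage $n$, the remaining decisions of an optimal policy must be optimal for the problem that starts from $i$ at stage $n$. Recursion (4.4) therefore computes the optimal values backward, from stage $N$ down to stage $0$.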
.
4.2
4.1 N = 3
.
, {a1}, A(2) = {a1, a2}.
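Define

$$R(i)=\begin{cases}0,& i=0,\\ C_3,& i=1.\end{cases}$$

The probability of remaining in state 1 for $j$ periods is $(3/4)^j$. The $j$-step transition matrices, with rows and columns indexed by the states $0,1$, are

$$P(0)=\begin{pmatrix}1&0\\0&1\end{pmatrix},\quad P(1)=\begin{pmatrix}1&0\\ \tfrac14&\tfrac34\end{pmatrix},\quad P(2)=\begin{pmatrix}1&0\\ \tfrac{7}{16}&\tfrac{9}{16}\end{pmatrix},\quad P(3)=\begin{pmatrix}1&0\\ \tfrac{37}{64}&\tfrac{27}{64}\end{pmatrix},$$

with $C_1=10$, $C_2=5$ and $C_3=64$.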
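The matrices $P(0),\dots,P(3)$ of Example 4.3 are consistent with $P(j)$ being the $j$-th power of the one-period matrix $P(1)$, with the entry for remaining in state 1 equal to $(3/4)^j$.

Below is a short exact-arithmetic check using Python's standard `fractions` module. It reads rows as the current state and columns as the state $j$ periods later, which is an assumed orientation. `matmul` and `matpow` are illustrative helpers, not part of the text.

```python
from fractions import Fraction

# One-period matrix P(1) of Example 4.3; rows = current state (0, 1),
# columns = state one period later (orientation assumed).
P1 = [[Fraction(1), Fraction(0)],
      [Fraction(1, 4), Fraction(3, 4)]]


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(A, j):
    """A ** j by repeated multiplication; A ** 0 is the identity, i.e. P(0)."""
    result = [[Fraction(int(i == k)) for k in range(len(A))] for i in range(len(A))]
    for _ in range(j):
        result = matmul(result, A)
    return result


for j in range(4):
    print(j, [[str(x) for x in row] for row in matpow(P1, j)])
# last line printed: 3 [['1', '0'], ['37/64', '27/64']]
```

Exact fractions avoid any rounding, so the printed entries can be compared directly with $7/16$, $9/16$, $37/64$ and $27/64$.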
The maximum defining $V^*_1(1)$ is attained at $a_1$, so $f^*_1(1)=a_1$.

$$V^*_1(2)=\max_{a\in A(2)}\Big\{r(2,a)+\sum_{j\in S}P(j\mid 2,a)V^*_2(j)\Big\}=\max\{r(2,a_2)+0.6\times16.4+0.4\times0.8,\ r(2,a_3)+0.4\times16.4+0.6\times0.8\}=\max\{5.16,\,5.04\}=5.16$$
$$V^*_0(2)=\max_{a\in A(2)}\Big\{r(2,a)+\sum_{j\in S}P(j\mid 2,a)V^*_1(j)\Big\}=\max\{-5+0.6\times21.72+0.4\times5.16,\ -2+0.4\times21.72+0.6\times5.16\}=\max\{10.096,\,9.784\}=10.096.$$

This maximum is attained at $a_2$, so $f^*_0(2)=a_2$.
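Example 4.3. In this problem an action $j$ $(0\le j\le 3)$ is chosen, and in each period the system leaves state 1 for state 0 with probability 0.25. The problem data include the cost parameters $C_1$, $C_2$ and $C_3$. The planning horizon covers 3 stages, so $N+1=3$ and $N=2$.

Solution. Take $S=\{0,1\}$. The admissible actions are $A(0)=\{0\}$ and $A(1)=\{0,1,2,3\}$. For state $i$ and action $j$, the one-stage term is

$$r(i,j)=\begin{cases}C_1+C_2\,j,& i=1,\ j>0,\\ 0,&\text{otherwise.}\end{cases}$$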