The Element of Statistical Learning – Chapter 5

oxstar@SJTU January 6, 2011

Ex. 5.9 Derive the Reinsch form $S_\lambda = (I + \lambda K)^{-1}$ for the smoothing spline.

Answer Let $K = (N^T)^{-1}\Omega_N N^{-1}$, so $K$ does not depend on $\lambda$, and we have $\Omega_N = N^T K N$ and
\[
\begin{aligned}
S_\lambda &= N(N^T N + \lambda\Omega_N)^{-1}N^T \\
&= N(N^T N + \lambda N^T K N)^{-1}N^T \\
&= N\bigl[N^T(I + \lambda K)N\bigr]^{-1}N^T \\
&= N N^{-1}(I + \lambda K)^{-1}(N^T)^{-1}N^T \\
&= (I + \lambda K)^{-1}
\end{aligned}
\]
We can also calculate the singular value decomposition of $N$, i.e. $N = UDV^T$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix. Then we have
\[
\begin{aligned}
S_\lambda &= N(N^T N + \lambda\Omega_N)^{-1}N^T \\
&= UDV^T(VDU^T UDV^T + \lambda\Omega_N)^{-1}VDU^T \\
&= \bigl(UD^{-1}V^T(VD^2V^T)VD^{-1}U^T + UD^{-1}V^T(\lambda\Omega_N)VD^{-1}U^T\bigr)^{-1} \\
&= (I + \lambda UD^{-1}V^T\Omega_N VD^{-1}U^T)^{-1} \\
&= (I + \lambda K)^{-1}
\end{aligned}
\]
where $K = UD^{-1}V^T\Omega_N VD^{-1}U^T$.
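As a quick numerical sanity check of the algebra above, the sketch below compares the direct expression for $S_\lambda$ with the Reinsch form. The matrices are random stand-ins (an arbitrary invertible $N$ and an arbitrary symmetric positive semi-definite $\Omega_N$), not an actual natural-spline basis and penalty; the derivation only needs $N$ to be invertible.

```python
import numpy as np

# Sanity check of the Reinsch form on random stand-in matrices.
rng = np.random.default_rng(0)
n, lam = 6, 0.37
N = rng.normal(size=(n, n))          # stand-in for the (invertible) basis matrix
A = rng.normal(size=(n, n))
Omega = A @ A.T                      # stand-in for the symmetric PSD penalty matrix

S_direct = N @ np.linalg.inv(N.T @ N + lam * Omega) @ N.T
K = np.linalg.inv(N.T) @ Omega @ np.linalg.inv(N)
S_reinsch = np.linalg.inv(np.eye(n) + lam * K)
print(np.allclose(S_direct, S_reinsch))          # True: S_lambda = (I + lam K)^{-1}

# The SVD route gives the same K = U D^{-1} V^T Omega_N V D^{-1} U^T.
U, d, Vt = np.linalg.svd(N)
K_svd = U @ np.diag(1 / d) @ Vt @ Omega @ Vt.T @ np.diag(1 / d) @ U.T
print(np.allclose(K, K_svd))                     # True
```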
Ex. 5.15 This exercise derives some of the results quoted in Section 5.8.1. Suppose $K(x, y)$ satisfies the conditions (5.45) and let $f(x) \in \mathcal{H}_K$. Show that

1. $\langle K(\cdot, x_i), f\rangle_{\mathcal{H}_K} = f(x_i)$.
2. $\langle K(\cdot, x_i), K(\cdot, x_j)\rangle_{\mathcal{H}_K} = K(x_i, x_j)$.
3. If $g(x) = \sum_{i=1}^N \alpha_i K(x, x_i)$, then $J(g) = \sum_{i=1}^N\sum_{j=1}^N K(x_i, x_j)\alpha_i\alpha_j$.

Suppose that $\tilde{g}(x) = g(x) + \rho(x)$, with $\rho(x) \in \mathcal{H}_K$, and orthogonal in $\mathcal{H}_K$ to each of $K(x, x_i)$, $i = 1, \ldots, N$. Show that

4. $\sum_{i=1}^N L(y_i, \tilde{g}(x_i)) + \lambda J(\tilde{g}) \ge \sum_{i=1}^N L(y_i, g(x_i)) + \lambda J(g)$

with equality iff $\rho(x) = 0$.

Proof 1. Writing $f = \sum_{k=1}^\infty c_k\phi_k$, we have
\[
\langle K(\cdot, x_i), f\rangle_{\mathcal{H}_K}
= \sum_{k=1}^\infty \frac{c_k}{\gamma_k}\,\langle K(\cdot, x_i), \phi_k(\cdot)\rangle
= \sum_{k=1}^\infty \frac{c_k}{\gamma_k}\,[\gamma_k\phi_k(x_i)]
= \sum_{k=1}^\infty c_k\phi_k(x_i) = f(x_i) \qquad (1)
\]
2.
\[
\langle K(\cdot, x_i), K(\cdot, x_j)\rangle_{\mathcal{H}_K}
= \sum_{k=1}^\infty \frac{1}{\gamma_k}\,[\gamma_k\phi_k(x_i)]\,[\gamma_k\phi_k(x_j)]
= \sum_{k=1}^\infty \gamma_k\phi_k(x_i)\phi_k(x_j) = K(x_i, x_j) \qquad (2)
\]
3. From (2), we have
\[
J(g) = \langle g(x), g(x)\rangle_{\mathcal{H}_K}
= \sum_{i=1}^N\sum_{j=1}^N \langle K(x, x_i), K(x, x_j)\rangle_{\mathcal{H}_K}\,\alpha_i\alpha_j
= \sum_{i=1}^N\sum_{j=1}^N K(x_i, x_j)\,\alpha_i\alpha_j \qquad\text{// (2)}
\]
4. Because $\rho(x)$ is orthogonal in $\mathcal{H}_K$ to each of $K(x, x_i)$, we have $\langle K(\cdot, x_i), \rho\rangle_{\mathcal{H}_K} = 0$ and
\[
\tilde{g}(x_i) = \langle K(\cdot, x_i), \tilde{g}\rangle_{\mathcal{H}_K}
= \langle K(\cdot, x_i), g\rangle_{\mathcal{H}_K} + \langle K(\cdot, x_i), \rho\rangle_{\mathcal{H}_K}
= g(x_i) \qquad\text{// (1)}
\]
\[
\begin{aligned}
J(\tilde{g}) = \langle \tilde{g}(x), \tilde{g}(x)\rangle_{\mathcal{H}_K}
&= \langle g(x), g(x)\rangle_{\mathcal{H}_K} + 2\langle g(x), \rho(x)\rangle_{\mathcal{H}_K} + \langle \rho(x), \rho(x)\rangle_{\mathcal{H}_K} \\
&= J(g) + 2\sum_{i=1}^N \alpha_i\langle K(x, x_i), \rho(x)\rangle_{\mathcal{H}_K} + J(\rho) \\
&= J(g) + J(\rho)
\end{aligned}
\]
Hence we proved
\[
\sum_{i=1}^N L(y_i, \tilde{g}(x_i)) + \lambda J(\tilde{g})
= \sum_{i=1}^N L(y_i, g(x_i)) + \lambda\bigl(J(g) + J(\rho)\bigr)
\ge \sum_{i=1}^N L(y_i, g(x_i)) + \lambda J(g),
\]
with equality iff $\rho(x) = 0$, in which case $J(\tilde{g}) = J(g)$ and
\[
\sum_{i=1}^N L(y_i, \tilde{g}(x_i)) + \lambda J(\tilde{g}) = \sum_{i=1}^N L(y_i, g(x_i)) + \lambda J(g).
\]
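The reproducing-property calculations above can be checked in a finite-dimensional analogue. The sketch below truncates the eigen-expansion to $P$ terms and represents the $\phi_k$ as orthonormal vectors on a grid of $P$ points; the particular $\gamma_k$ and coefficients are random and purely illustrative, not taken from the text.

```python
import numpy as np

# Finite-dimensional check of properties (1)-(3): phi_k are orthonormal columns
# on a grid of P points, gamma_k > 0, and K(t_a, t_b) = sum_k gamma_k phi_k(t_a) phi_k(t_b).
rng = np.random.default_rng(1)
P = 8
Phi, _ = np.linalg.qr(rng.normal(size=(P, P)))    # Phi[:, k] = phi_k on the grid
gamma = rng.uniform(0.5, 2.0, size=P)
Kmat = Phi @ np.diag(gamma) @ Phi.T               # kernel matrix on the grid

def hk_inner(c, d):
    """<f, g>_{H_K} = sum_k c_k d_k / gamma_k for f = sum_k c_k phi_k, g = sum_k d_k phi_k."""
    return np.sum(c * d / gamma)

c = rng.normal(size=P)                            # coefficients of f
f = Phi @ c                                       # values of f on the grid
i, j = 2, 5
ci_K = gamma * Phi[i, :]                          # coefficients of K(., t_i)
cj_K = gamma * Phi[j, :]                          # coefficients of K(., t_j)
print(np.isclose(hk_inner(ci_K, c), f[i]))        # (1) reproducing property
print(np.isclose(hk_inner(ci_K, cj_K), Kmat[i, j]))   # (2)

alpha = rng.normal(size=P)
cg = gamma * (Phi.T @ alpha)                      # coefficients of g = sum_i alpha_i K(., t_i)
print(np.isclose(hk_inner(cg, cg), alpha @ Kmat @ alpha))   # (3) J(g) = alpha^T K alpha
```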
Ex. 5.16 Consider the ridge regression problem (5.53), and assume $M \ge N$. Assume you have a kernel $K$ that computes the inner product $K(x, y) = \sum_{m=1}^M h_m(x)h_m(y)$.

1. Derive (5.62) on page 171 in the text. How would you compute the matrices $V$ and $D_\gamma$, given $K$? Hence show that (5.63) is equivalent to (5.53).
2. Show that
\[
\hat{f} = H\hat{\beta} = K(K + \lambda I)^{-1}y,
\]
where $H$ is the $N \times M$ matrix of evaluations $h_m(x_i)$, and $K = HH^T$ the $N \times N$ matrix of inner-products $h(x_i)^T h(x_j)$.
3. Show that
\[
\hat{f}(x) = h(x)^T\hat{\beta} = \sum_{i=1}^N K(x, x_i)\hat{\alpha}_i
\]
and $\hat{\alpha} = (K + \lambda I)^{-1}y$.
4. How would you modify your solution if $M < N$?

Answer 1. According to the definition of $K(x, y)$, we have
\[
K(x, y) = \sum_{m=1}^M h_m(x)h_m(y) = \sum_{i=1}^\infty \gamma_i\phi_i(x)\phi_i(y) \qquad (3)
\]
Multiply each item by $\phi_k(x)$ and integrate w.r.t. $x$, i.e. calculate $\langle K(x, y), \phi_k(x)\rangle$:
\[
\sum_{m=1}^M \Bigl(\int h_m(x)\phi_k(x)\,dx\Bigr) h_m(y) = \sum_{i=1}^\infty \gamma_i\Bigl(\int \phi_i(x)\phi_k(x)\,dx\Bigr)\phi_i(y)
\]
By the orthogonality of the eigen-functions $\phi_i(x)$,
\[
\int \phi_i(x)\phi_k(x)\,dx = \langle \phi_i(x), \phi_k(x)\rangle =
\begin{cases} 1, & i = k \\ 0, & i \ne k \end{cases}
\]
so we can simplify (3):
\[
\sum_{m=1}^M \Bigl(\int h_m(x)\phi_k(x)\,dx\Bigr) h_m(y) = \gamma_k\phi_k(y)
\]
Let $g_{km} = \int h_m(x)\phi_k(x)\,dx$ and calculate $\langle\,\cdot\,, \phi_\ell(y)\rangle$; then
\[
\sum_{m=1}^M g_{km}\, h_m(y) = \gamma_k\phi_k(y) \qquad (4)
\]
\[
\sum_{m=1}^M g_{km}\Bigl(\int h_m(y)\phi_\ell(y)\,dy\Bigr) = \gamma_k\int \phi_k(y)\phi_\ell(y)\,dy
\]
\[
\sum_{m=1}^M g_{km}\, g_{\ell m} = \gamma_k\,\delta_{k,\ell}
\]
where $g_{\ell m} = \int h_m(y)\phi_\ell(y)\,dy$ and $\delta_{k,\ell} = \begin{cases}1, & \ell = k\\ 0, & \ell \ne k\end{cases}$.

Let $G_M = \{g_{nm}\}$, $n, m \le M$ (an $M \times M$ matrix whose rows are linearly independent); then we have
\[
G_M G_M^T = \mathrm{diag}\{\gamma_1, \gamma_2, \ldots, \gamma_M\} = D_\gamma
\]
Let $V^T = D_\gamma^{-1/2} G_M$; then
\[
(V^T)^T V^T G_M^T = V V^T G_M^T
= G_M^T (D_\gamma^{-1/2})^T D_\gamma^{-1/2} G_M G_M^T
= G_M^T D_\gamma^{-1/2} D_\gamma^{-1/2} D_\gamma
= G_M^T
\]
Therefore $(V^T)^T V^T = V V^T = I$, i.e. $V^T$ is an orthogonal matrix. Let $h(x) = (h_1(x), h_2(x), \ldots, h_M(x))^T$ and $\phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_M(x))^T$; then equation (4) equals
\[
G_M h(x) = D_\gamma\,\phi(x)
\]
\[
V D_\gamma^{-1/2} G_M h(x) = V D_\gamma^{-1/2} D_\gamma\,\phi(x)
\]
\[
h(x) = V D_\gamma^{1/2}\,\phi(x)
\]
And we have
\[
\begin{aligned}
&\min_{\{\beta_m\}_1^M}\; \sum_{i=1}^N \Bigl(y_i - \sum_{m=1}^M \beta_m h_m(x_i)\Bigr)^2 + \lambda\sum_{m=1}^M \beta_m^2
&&\text{// (5.63)} \\
&= \min_{\beta}\; \sum_{i=1}^N \bigl(y_i - \beta^T h(x_i)\bigr)^2 + \lambda\beta^T\beta
&&\text{// let } \beta = (\beta_1, \beta_2, \ldots, \beta_M)^T \qquad (5) \\
&= \min_{\beta}\; \sum_{i=1}^N \bigl(y_i - \beta^T V D_\gamma^{1/2}\phi(x_i)\bigr)^2 + \lambda\beta^T\beta \\
&= \min_{c}\; \sum_{i=1}^N \bigl(y_i - c^T\phi(x_i)\bigr)^2 + \lambda\bigl(V D_\gamma^{-1/2}c\bigr)^T V D_\gamma^{-1/2}c
&&\text{// let } c = D_\gamma^{1/2}V^T\beta \\
&= \min_{c}\; \sum_{i=1}^N \bigl(y_i - c^T\phi(x_i)\bigr)^2 + \lambda\, c^T D_\gamma^{-1}c \\
&= \min_{\{c_j\}_1^\infty}\; \sum_{i=1}^N \Bigl(y_i - \sum_{j=1}^\infty c_j\phi_j(x_i)\Bigr)^2 + \lambda\sum_{j=1}^\infty \frac{c_j^2}{\gamma_j}
&&\text{// (5.53); for } j > M,\ \gamma_j = 0 \text{ forces } c_j = 0
\end{aligned}
\]
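Part 1 also asks how $V$ and $D_\gamma$ could be computed given $K$. One practical route, sketched below, is to discretize: evaluate the kernel on a grid, eigendecompose the grid kernel matrix to get $(\gamma_k, \phi_k)$, and form $g_{km} = \langle h_m, \phi_k\rangle$ with the same grid inner product. The basis functions here are random stand-ins and the grid sum plays the role of the integral, so this is only an illustration of the construction, not part of the original solution.

```python
import numpy as np

# Compute D_gamma and V from K on a grid: eigendecompose K = H H^T to get
# (gamma_k, phi_k), then g_km = <h_m, phi_k>, G G^T = D_gamma, V^T = D_gamma^{-1/2} G.
# The h_m are random stand-ins; the grid sum stands in for the integral.
rng = np.random.default_rng(2)
P, M = 50, 4
Hg = rng.normal(size=(P, M))                       # h_m evaluated on P grid points
Kg = Hg @ Hg.T                                     # kernel matrix on the grid (rank M)

evals, evecs = np.linalg.eigh(Kg)
gamma = evals[::-1][:M]                            # top-M eigenvalues gamma_k > 0
Phi = evecs[:, ::-1][:, :M]                        # corresponding orthonormal phi_k

G = Phi.T @ Hg                                     # g_km = <h_m, phi_k>
D_gamma = np.diag(gamma)
V = G.T @ np.diag(gamma ** -0.5)                   # from V^T = D_gamma^{-1/2} G_M

print(np.allclose(G @ G.T, D_gamma))               # G_M G_M^T = D_gamma
print(np.allclose(V @ V.T, np.eye(M)))             # V^T is orthogonal
print(np.allclose(Hg, Phi @ np.diag(np.sqrt(gamma)) @ V.T))   # h(x) = V D_gamma^{1/2} phi(x), stacked over the grid
```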
2. Calculate the derivative of (5) w.r.t. $\beta$ and set it to $0$; we have
\[
-H^T(y - H\hat{\beta}) + \lambda\hat{\beta} = 0
\]
\[
\hat{\beta} = (H^T H + \lambda I)^{-1}H^T y
\]
$M \ge N$, so $K$ is invertible and, for any $\lambda$, $K + \lambda I = HH^T + \lambda I$ is invertible:
\[
\begin{aligned}
\hat{f} = H\hat{\beta} &= H(H^T H + \lambda I)^{-1}H^T y \\
&= \bigl((H^T)^{-1}H^T H H^{-1} + \lambda(H^T)^{-1}H^{-1}\bigr)^{-1}y \\
&= (I + \lambda K^{-1})^{-1}y \\
&= K K^{-1}(I + \lambda K^{-1})^{-1}y \\
&= K(K + \lambda I)^{-1}y
\end{aligned}
\]
3. From part 2,
\[
\hat{f} = H\hat{\beta} = K(K + \lambda I)^{-1}y = K\hat{\alpha} = K(\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_N)^T
\]
with $\hat{\alpha} = (K + \lambda I)^{-1}y$. Since $\hat{\beta} = (H^T H + \lambda I)^{-1}H^T y = H^T(HH^T + \lambda I)^{-1}y = H^T\hat{\alpha}$, we get
\[
\hat{f}(x) = h(x)^T\hat{\beta} = \sum_{i=1}^N h(x)^T h(x_i)\,\hat{\alpha}_i = \sum_{i=1}^N K(x, x_i)\,\hat{\alpha}_i
\]
4. $M < N$, so $K$ is not invertible. If $\lambda \ne 0$, $K + \lambda I$ is still invertible, hence the solutions above still hold. But if $\lambda = 0$, we have
\[
\hat{f} = H\hat{\beta} = H(H^T H)^{-1}H^T y,
\]
which is the projection of $y$ onto the column space of $H$ and in general no longer equals $y$.
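A small numerical check of parts 2 and 3, using assumed random data: ridge regression in the $h_m$ basis gives the same fitted values as the kernel form $K(K + \lambda I)^{-1}y$, and the same prediction at a new point via $\sum_i K(x, x_i)\hat{\alpha}_i$. With $\lambda \ne 0$ the same identities go through for $M < N$ as well, which is the point of part 4.

```python
import numpy as np

# Check of parts 2-3 on random data: the h-basis ridge fit equals the kernel
# form K (K + lam I)^{-1} y, and predictions agree via sum_i K(x, x_i) alpha_i.
rng = np.random.default_rng(3)
N, M, lam = 5, 8, 0.1                        # M >= N as assumed in the exercise
H = rng.normal(size=(N, M))                  # H[i, m] = h_m(x_i)
y = rng.normal(size=N)
K = H @ H.T                                  # K[i, j] = h(x_i)^T h(x_j)

beta = np.linalg.solve(H.T @ H + lam * np.eye(M), H.T @ y)   # ridge coefficients
alpha = np.linalg.solve(K + lam * np.eye(N), y)              # alpha = (K + lam I)^{-1} y

print(np.allclose(H @ beta, K @ alpha))      # part 2: H beta = K (K + lam I)^{-1} y
h_new = rng.normal(size=M)                   # h(x) at a new point x
k_new = H @ h_new                            # K(x, x_i) = h(x)^T h(x_i)
print(np.isclose(h_new @ beta, k_new @ alpha))   # part 3: h(x)^T beta = sum_i K(x, x_i) alpha_i
```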