Statistical Learning [The-Elements-of-Statistical-Learning]: Chapter 4 Exercises


The Element of Statistical Learning – Chapter 4
oxstar@SJTU, January 6, 2011

Ex. 4.1 Show how to solve the generalized eigenvalue problem $\max_a a^T B a$ subject to $a^T W a = 1$ by transforming it to a standard eigenvalue problem.

Answer. $W$ is the common covariance matrix, and it is positive semi-definite (assumed nonsingular here, so that $W^{-\frac{1}{2}}$ exists), so we can define
$$ b = W^{\frac{1}{2}} a, \qquad a = W^{-\frac{1}{2}} b, \qquad a^T = b^T W^{-\frac{1}{2}}. $$
Hence the generalized eigenvalue problem becomes
$$ \max_a \, a^T B a = \max_b \, b^T W^{-\frac{1}{2}} B W^{-\frac{1}{2}} b $$
subject to
$$ a^T W a = b^T W^{-\frac{1}{2}} W W^{-\frac{1}{2}} b = 1. $$
So the problem is transformed to a standard eigenvalue problem for the symmetric matrix $W^{-\frac{1}{2}} B W^{-\frac{1}{2}}$.
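To make the transformation concrete, here is a minimal numerical sketch (my own illustration, not part of the original solution): it builds the whitened matrix $W^{-\frac{1}{2}} B W^{-\frac{1}{2}}$, takes its top eigenvector, maps it back via $a = W^{-\frac{1}{2}} b$, and cross-checks against SciPy's generalized eigensolver. The matrices `B` and `W` are made-up examples.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Made-up symmetric positive-definite W and symmetric B (illustrative only).
p = 4
A = rng.standard_normal((p, p))
W = A @ A.T + p * np.eye(p)      # stand-in for the common covariance matrix
C = rng.standard_normal((p, p))
B = C @ C.T                      # stand-in for the between-class scatter

# Whitening: W^{-1/2} via the eigendecomposition of W.
w_vals, w_vecs = np.linalg.eigh(W)
W_inv_half = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T

# Standard symmetric eigenproblem for W^{-1/2} B W^{-1/2}.
M = W_inv_half @ B @ W_inv_half
m_vals, m_vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
b = m_vecs[:, -1]                    # top eigenvector, unit norm
a = W_inv_half @ b                   # map back: a = W^{-1/2} b

print("objective a^T B a :", a @ B @ a)
print("constraint a^T W a:", a @ W @ a)   # should be 1

# Cross-check with the generalized eigensolver B v = lambda W v.
g_vals, g_vecs = eigh(B, W)
a_ref = g_vecs[:, -1]                # eigh normalizes so that v^T W v = 1
print("same direction    :", np.isclose(abs(a @ W @ a_ref), 1.0))
```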
Ex. 4.2 Suppose we have features $x \in \mathbb{R}^p$, a two-class response, with class sizes $N_1, N_2$, and the target coded as $-N/N_1$, $N/N_2$.

1. Show that the LDA rule classifies to class 2 if
$$ x^T \hat\Sigma^{-1} (\hat\mu_2 - \hat\mu_1) > \frac{1}{2} \hat\mu_2^T \hat\Sigma^{-1} \hat\mu_2 - \frac{1}{2} \hat\mu_1^T \hat\Sigma^{-1} \hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N}, $$
and class 1 otherwise.
2. Consider minimization of the least squares criterion
$$ \sum_{i=1}^{N} \bigl( y_i - \beta_0 - \beta^T x_i \bigr)^2. $$
Show that the solution $\hat\beta$ satisfies
$$ \Bigl[ (N-2)\,\hat\Sigma + \frac{N_1 N_2}{N}\,\hat\Sigma_B \Bigr] \beta = N (\hat\mu_2 - \hat\mu_1) $$
(after simplification), where $\hat\Sigma_B = (\hat\mu_2 - \hat\mu_1)(\hat\mu_2 - \hat\mu_1)^T$.

3. Hence show that $\hat\Sigma_B \beta$ is in the direction $(\hat\mu_2 - \hat\mu_1)$ and thus
$$ \hat\beta \propto \hat\Sigma^{-1} (\hat\mu_2 - \hat\mu_1). $$
Therefore the least-squares regression coefficient is identical to the LDA coefficient, up to a scalar multiple.

4. Show that this result holds for any (distinct) coding of the two classes.

5. Find the solution $\hat\beta_0$, and hence the predicted values $\hat{f} = \hat\beta_0 + \hat\beta^T x$. Consider the following rule: classify to class 2 if $\hat{y}_i > 0$ and class 1 otherwise. Show this is not the same as the LDA rule unless the classes have equal numbers of observations.
(Fisher, 1936; Ripley, 1996)
Proof

1. Consider the log-ratio of the class densities (Equation 4.9 in the textbook):
$$ \log \frac{\Pr(G=2 \mid X=x)}{\Pr(G=1 \mid X=x)} = \log\frac{\pi_2}{\pi_1} - \frac{1}{2} (\mu_2 + \mu_1)^T \Sigma^{-1} (\mu_2 - \mu_1) + x^T \Sigma^{-1} (\mu_2 - \mu_1). $$
When this log-ratio is greater than 0, the LDA rule classifies $x$ to class 2; meanwhile, we need to estimate the parameters of the Gaussian distributions from the training data ($\hat\pi_k = N_k/N$). The rule therefore classifies to class 2 when
$$ x^T \hat\Sigma^{-1} (\hat\mu_2 - \hat\mu_1) > \frac{1}{2} (\hat\mu_2 + \hat\mu_1)^T \hat\Sigma^{-1} (\hat\mu_2 - \hat\mu_1) + \log \hat\pi_1 - \log \hat\pi_2 = \frac{1}{2} \hat\mu_2^T \hat\Sigma^{-1} \hat\mu_2 - \frac{1}{2} \hat\mu_1^T \hat\Sigma^{-1} \hat\mu_1 + \log\frac{N_1}{N} - \log\frac{N_2}{N}, $$
and class 1 otherwise.
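As an illustration of this rule (my own sketch, not part of the original solution), the following code estimates $\hat\mu_k$, $\hat\Sigma$, and the class proportions from made-up two-class Gaussian data and evaluates the inequality above; all data-generating parameters are assumptions chosen only for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up two-class Gaussian training data with a shared covariance.
p, N1, N2 = 3, 40, 60
N = N1 + N2
L = 0.3 * rng.standard_normal((p, p))
cov_true = L @ L.T + np.eye(p)
X1 = rng.multivariate_normal(np.zeros(p), cov_true, N1)
X2 = rng.multivariate_normal(np.ones(p), cov_true, N2)

# Plug-in estimates used by the LDA rule.
mu1_hat, mu2_hat = X1.mean(axis=0), X2.mean(axis=0)
Sigma_hat = ((X1 - mu1_hat).T @ (X1 - mu1_hat) +
             (X2 - mu2_hat).T @ (X2 - mu2_hat)) / (N - 2)
Sigma_inv = np.linalg.inv(Sigma_hat)

def lda_class(x):
    """Classify to class 2 iff the derived inequality holds, else class 1."""
    lhs = x @ Sigma_inv @ (mu2_hat - mu1_hat)
    rhs = (0.5 * mu2_hat @ Sigma_inv @ mu2_hat
           - 0.5 * mu1_hat @ Sigma_inv @ mu1_hat
           + np.log(N1 / N) - np.log(N2 / N))
    return 2 if lhs > rhs else 1

print(lda_class(np.zeros(p)), lda_class(np.ones(p)))  # typically prints: 1 2
```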
2. Let $\tilde\beta = (\beta_0, \beta^T)^T$ and compute the partial derivatives of $\mathrm{RSS}(\tilde\beta)$; setting them to zero gives
$$ \frac{\partial\, \mathrm{RSS}}{\partial \beta_0} = -2 \sum_{i=1}^{N} (y_i - \beta_0 - \beta^T x_i) = 0, \qquad (1) $$
$$ \frac{\partial\, \mathrm{RSS}}{\partial \beta} = -2 \sum_{i=1}^{N} x_i (y_i - \beta_0 - \beta^T x_i) = 0. \qquad (2) $$
We can also derive that
$$ \beta_0 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \beta^T x_i) \qquad \text{// from (1)} \qquad (3) $$
$$ \sum_{i=1}^{N} x_i \bigl[ \beta^T (x_i - \bar{x}) \bigr] = \sum_{i=1}^{N} x_i \Bigl( y_i - \frac{1}{N} \sum_{j=1}^{N} y_j \Bigr) \qquad \text{// from (2), (3)} \qquad (4) $$
$$ = \sum_{k=1}^{2} \sum_{g_i = k} x_i \Bigl( y_k - \frac{N_1 y_1 + N_2 y_2}{N} \Bigr) \qquad (5) $$
$$ = \sum_{k=1}^{2} N_k \hat\mu_k \, \frac{N y_k - (N_1 y_1 + N_2 y_2)}{N} \qquad \text{// } \textstyle\sum_{g_i=k} x_i = N_k \hat\mu_k \qquad (6) $$
$$ = \frac{N_1 N_2}{N} (y_2 - y_1)(\hat\mu_2 - \hat\mu_1) \qquad (7) $$
$$ = N (\hat\mu_2 - \hat\mu_1). \qquad \text{// } y_1 = -N/N_1,\; y_2 = N/N_2 \qquad (8) $$
We also have
$$ (N-2)\,\hat\Sigma = \sum_{k=1}^{2} \sum_{g_i=k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^T = \sum_{k=1}^{2} \sum_{g_i=k} \bigl( x_i x_i^T - 2\, x_i \hat\mu_k^T + \hat\mu_k \hat\mu_k^T \bigr). \qquad \text{// } \textstyle\sum_{g_i=k} \hat\mu_k x_i^T = \sum_{g_i=k} x_i \hat\mu_k^T = N_k \hat\mu_k \hat\mu_k^T $$
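The identity this derivation is building toward can also be checked numerically. Below is a minimal sketch (my own addition, not part of the original solution): it fits the least-squares coefficients with the $-N/N_1, N/N_2$ coding on made-up data and verifies both that $\bigl[(N-2)\hat\Sigma + \frac{N_1 N_2}{N}\hat\Sigma_B\bigr]\hat\beta = N(\hat\mu_2 - \hat\mu_1)$ and that $\hat\beta$ is parallel to $\hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up two-class data (parameters are illustrative only).
p, N1, N2 = 3, 30, 50
N = N1 + N2
X1 = rng.multivariate_normal(np.zeros(p), np.eye(p), N1)
X2 = rng.multivariate_normal(np.full(p, 1.5), np.eye(p), N2)
X = np.vstack([X1, X2])
y = np.concatenate([np.full(N1, -N / N1), np.full(N2, N / N2)])  # target coding

# Least-squares fit with an intercept column.
Xd = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
beta0, beta = coef[0], coef[1:]

# Plug-in LDA quantities.
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
Sigma = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (N - 2)
Sigma_B = np.outer(mu2 - mu1, mu2 - mu1)

# Check [(N-2) Sigma + (N1 N2 / N) Sigma_B] beta = N (mu2 - mu1).
lhs = ((N - 2) * Sigma + N1 * N2 / N * Sigma_B) @ beta
rhs = N * (mu2 - mu1)
print("identity holds:", np.allclose(lhs, rhs))

# Check beta is proportional to Sigma^{-1}(mu2 - mu1).
lda_dir = np.linalg.solve(Sigma, mu2 - mu1)
ratio = beta / lda_dir
print("constant ratio:", np.allclose(ratio, ratio.mean()))
```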