Peking University Summer Course "Regression Analysis" (Linear-Regression-Analysis), Lecture Notes PKU5


Class 5: ANOVA (Analysis of Variance) and F-tests

I. What is ANOVA

What is ANOVA? ANOVA is short for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components of a regression: one for the structural part and the other for the stochastic part. Today we are going to examine the easiest case.

II. ANOVA: An Introduction

Let the model be

$$y = X\beta + \varepsilon.$$

Assuming $x_i$ is a column vector (of length $p$) of independent variable values for the $i$th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $x_i'b$ is the predicted value.

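Sum of squares total:

$$
\begin{aligned}
\mathrm{SST} &= \sum \left[y_i - \bar{Y}\right]^2 \\
&= \sum \left[y_i - x_i'b + x_i'b - \bar{Y}\right]^2 \\
&= \sum \left[y_i - x_i'b\right]^2 + \sum \left[x_i'b - \bar{Y}\right]^2 + 2\sum \left[y_i - x_i'b\right]\left[x_i'b - \bar{Y}\right] \\
&= \sum e_i^2 + \sum \left[x_i'b - \bar{Y}\right]^2 \\
&= \mathrm{SSE} + \mathrm{SSR},
\end{aligned}
$$

because

$$\sum \left[y_i - x_i'b\right]\left[x_i'b - \bar{Y}\right] = \sum e_i\left[x_i'b - \bar{Y}\right] = 0.$$

This is always true by OLS when the model includes an intercept.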

Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.

Geometric interpretation: [blackboard]
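The algebra above can also be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the sample size, coefficients, seed, and the helper name `anova_sums` are illustrative choices, not part of the lecture. It fits OLS with an intercept and confirms that the cross-product term vanishes, so that SST equals SSE plus SSR.

```python
import numpy as np

def anova_sums(X, y):
    """Return (SST, SSE, SSR, cross) for an OLS fit of y on X.

    X is assumed to already contain a column of ones (the intercept).
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients b
    fitted = X @ b                              # predicted values x_i'b
    e = y - fitted                              # residuals e_i
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)
    sse = np.sum(e ** 2)
    ssr = np.sum((fitted - y_bar) ** 2)
    cross = np.sum(e * (fitted - y_bar))        # cross-product term
    return sst, sse, ssr, cross

# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

sst, sse, ssr, cross = anova_sums(X, y)
print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}, cross term = {cross:.2e}")
```

Dropping the column of ones from `X` is a quick way to see that the cross term need not vanish without an intercept.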

Decomposition of Variance

If we treat X as a random variable, we can decompose the total variance into a between-group portion and a within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \tag{1}$$

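Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
&= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta, \varepsilon_i) \\
&= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $\mathrm{Cov}(x_k'\beta, \varepsilon) = 0$ for all possible $k$).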
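The variance decomposition just proved can be illustrated by simulation. This is only a sketch under stated assumptions: NumPy is available, the regressors and errors are drawn independently from normal distributions, and the coefficient vector `beta` and the helper `variance_parts` are hypothetical. In any finite sample the gap between V(y) and V(x'β) + V(ε) equals twice the sample covariance, which shrinks toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([2.0, -1.0])             # hypothetical coefficients

def variance_parts(n):
    """Sample analogues of V(y), V(x'beta), V(eps), and Cov(x'beta, eps)."""
    x = rng.normal(size=(n, 2))          # random regressors
    eps = rng.normal(scale=1.5, size=n)  # errors drawn independently of x
    xb = x @ beta                        # structural part x_i'beta
    y = xb + eps
    cov = np.cov(xb, eps)[0, 1]          # sample covariance (ddof=1)
    return y.var(ddof=1), xb.var(ddof=1), eps.var(ddof=1), cov

for n in (100, 10_000, 1_000_000):
    v_y, v_xb, v_eps, cov = variance_parts(n)
    gap = v_y - (v_xb + v_eps)           # equals 2 * sample covariance
    print(f"n={n:>9,}: V(y)={v_y:.4f}  V(x'b)+V(e)={v_xb + v_eps:.4f}  gap={gap:+.4f}")
```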

The ANOVA table estimates the three quantities in equation (1) from the sample.

As the sample size grows, the quantities in the ANOVA table approach those in the equation more and more closely.

In a sample, the decomposition of the estimated variance does not hold exactly. We thus need to decompose sums of squares and degrees of freedom separately. Is ANOVA a misnomer?
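The need to decompose sums of squares and degrees of freedom separately can be made concrete. The sketch below assumes NumPy, simulated data, and the usual degrees-of-freedom convention for a model with p coefficients including the intercept: n − 1 for the total, p − 1 for the regression, and n − p for the error. The sums of squares add up and the degrees of freedom add up, but the mean squares (the estimated variances) do not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3                                   # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)  # hypothetical model

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS coefficients
fitted = X @ b
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((fitted - y.mean()) ** 2)
sse = np.sum((y - fitted) ** 2)

df_total, df_reg, df_err = n - 1, p - 1, n - p
mst, msr, mse = sst / df_total, ssr / df_reg, sse / df_err

print("source             SS   df       MS")
for name, ss, df, ms in [("regression", ssr, df_reg, msr),
                         ("error", sse, df_err, mse),
                         ("total", sst, df_total, mst)]:
    print(f"{name:<11} {ss:9.3f} {df:4d} {ms:8.3f}")
```

In the printed table the SS and df columns sum to their totals, while the MS column does not.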

III. ANOVA in Matrix

I will try to give a simplified representation of ANOVA as follows:

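$$
\begin{aligned}
\mathrm{SST} &= \sum \left[y_i - \bar{Y}\right]^2 \\
&= \sum \left(y_i^2 + \bar{Y}^2 - 2\bar{Y}y_i\right) \\
&= \sum y_i^2 + \sum \bar{Y}^2 - 2\sum \bar{Y}y_i \\
&= \sum y_i^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad \text{(because } \sum y_i = n\bar{Y}\text{)} \\
&= \sum y_i^2 - n\bar{Y}^2 \\
&= y'y - n\bar{Y}^2 \\
&= y'y - \tfrac{1}{n}\,y'Jy \quad \text{(in your textbook, monster look)}
\end{aligned}
$$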
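The three equivalent expressions for SST can be compared directly. This is a small sketch assuming NumPy and made-up data, and taking J to be the n × n matrix of ones, which is what the identity n·Ȳ² = (1/n)·y'Jy requires; the notes themselves do not define J.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
y = rng.normal(loc=10.0, scale=2.0, size=n)   # hypothetical data

J = np.ones((n, n))                            # n x n matrix of ones (assumed meaning of J)
y_bar = y.mean()

sst_deviations = np.sum((y - y_bar) ** 2)      # sum of (y_i - Ybar)^2
sst_shortcut = y @ y - n * y_bar ** 2          # y'y - n * Ybar^2
sst_matrix = y @ y - (y @ J @ y) / n           # y'y - (1/n) y'Jy

print(sst_deviations, sst_shortcut, sst_matrix)
```

The three values agree up to floating-point rounding. The deviation form is usually the numerically safer one, because the other two subtract two large, nearly equal numbers when the mean is far from zero.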

$$\mathrm{SSE} = e'e$$

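$$
\begin{aligned}
\mathrm{SSR} &= \sum \left[x_i'b - \bar{Y}\right]^2 \\
&= \sum \left[(x_i'b)^2 + \bar{Y}^2 - 2\bar{Y}(x_i'b)\right] \\
&= \sum \left[(x_i'b)^2\right] + n\bar{Y}^2 - 2\bar{Y}\sum (x_i'b)
\end{aligned}
$$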