Chapter 2: Multiple Regression Analysis
wage = b0 + b1educ + b2exper + b3tenure + u
log(wage) = b0 + b1educ + b2exper + b3tenure + u
The estimated equations are as below:
wage = -2.873 + 0.599educ + 0.022exper + 0.169tenure
log(wage) = 0.284 + 0.092educ + 0.0041exper + 0.022tenure
educ = 13.575 - 0.0738exper + 0.048tenure
wage = 5.896 + 0.599resid
log(wage) = 1.623 + 0.092resid
We can see that the coefficient on resid is the same as the coefficient on educ in the first estimated equation, and likewise for the log(wage) equation.
The procedure: first regress educ on exper and tenure to partial out their effects; then regress wage (or log(wage)) on the residuals from that regression. Do we get the same result?
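This two-step procedure can be sketched in numpy. The data below are synthetic stand-ins (the data-generating coefficients are made up; wage1.dta itself is not used here):

```python
import numpy as np

# Synthetic stand-in for the wage data (made-up coefficients, not wage1.dta)
rng = np.random.default_rng(42)
n = 1000
exper = rng.uniform(0, 30, n)
tenure = rng.uniform(0, 20, n)
educ = 12 + 0.1 * exper - 0.05 * tenure + rng.normal(0, 2, n)
wage = -2.9 + 0.6 * educ + 0.02 * exper + 0.17 * tenure + rng.normal(0, 1, n)

# Multiple regression: wage on educ, exper, tenure
X_full = np.column_stack([np.ones(n), educ, exper, tenure])
b_full = np.linalg.lstsq(X_full, wage, rcond=None)[0]

# Step 1: regress educ on exper and tenure, keep the residuals
X_aux = np.column_stack([np.ones(n), exper, tenure])
resid = educ - X_aux @ np.linalg.lstsq(X_aux, educ, rcond=None)[0]

# Step 2: regress wage on those residuals
X_res = np.column_stack([np.ones(n), resid])
b_res = np.linalg.lstsq(X_res, wage, rcond=None)[0]

# The slope on resid matches the educ coefficient from the full regression
print(b_full[1], b_res[1])
```

The two printed numbers agree to floating-point precision, which is exactly the claim above.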
Because R2 will usually increase with the number of independent variables, it is not a good way to compare models
This means only the part of xi1 that is uncorrelated with xi2 is being related to yi, so we are estimating the effect of x1 on y after x2 has been "partialled out".
Consider the case where k = 2, i.e.
yˆ = bˆ0 + bˆ1x1 + bˆ2x2. Then
bˆ1 = Σ rˆi1 yi / Σ rˆi1²,
where the rˆi1 are the residuals from the estimated regression xˆ1 = γˆ0 + γˆ2x2.
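The formula can be checked numerically. A small sketch with made-up data (the variable names r1, b1_formula are mine):

```python
import numpy as np

# Verify bˆ1 = (Σ rˆi1 yi) / (Σ rˆi1²) on made-up data
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)   # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# rˆi1: residuals from regressing x1 on x2 (with intercept)
A = np.column_stack([np.ones(n), x2])
r1 = x1 - A @ np.linalg.lstsq(A, x1, rcond=None)[0]

b1_formula = (r1 * y).sum() / (r1 ** 2).sum()

# Same coefficient straight from the multiple regression
X = np.column_stack([np.ones(n), x1, x2])
b1_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(b1_formula, b1_ols)
```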
Goodness-of-Fit (continued)
How do we think about how well our sample regression line fits our sample data?
Can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression
Σi=1..n xik (yi - bˆ0 - bˆ1xi1 - … - bˆkxik) = 0.
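The first order conditions say the OLS residuals sum to zero and are orthogonal to each regressor. A quick numpy check on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
# Design matrix: intercept column plus k regressors (made-up data)
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(k)])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b                 # OLS residuals

print(u.sum())                # ~0: the intercept condition
print(X[:, 1:].T @ u)         # ~0 for each regressor: the slope conditions
```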
Obtaining OLS Estimates, cont.
yˆ = bˆ0 + bˆ1x1 + bˆ2x2 + … + bˆkxk
The above estimated equation is called the OLS regression line or the sample regression function (SRF)
In the general case with k independent variables, we seek estimates bˆ0, bˆ1, …, bˆk in yˆ = bˆ0 + bˆ1x1 + … + bˆkxk; therefore, we minimize the sum of squared residuals.
From the first order conditions, we can get k+1 linear equations in k+1 unknowns bˆ0, bˆ1, …, bˆk.
so holding x2, …, xk fixed implies that
Δyˆ = bˆ1Δx1; that is, each b has
a ceteris paribus interpretation
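A tiny numerical illustration of the ceteris paribus reading (made-up data; the evaluation points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Raise x1 by one unit while holding x2 fixed:
point = np.array([1.0, 0.5, -0.2])        # [const, x1, x2]
shifted = point + np.array([0.0, 1.0, 0.0])
print(shifted @ b - point @ b, b[1])      # fitted value moves by exactly bˆ1
```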
An Example (Wooldridge, p76)
The determination of wage (dollars per hour), wage:
The above equation is the estimated equation, not the true equation. The true equation is the population regression line, which we do not know; we can only estimate it. Different samples give different OLS intercept and slope estimates, and hence different estimated OLS equations (sample regression lines). The population regression line is E(y|x) = b0 + b1x1 + … + bkxk.
R2 = SSE/SST = 1 – SSR/SST
Goodness-of-Fit (continued)
We can also think of R2 as being equal to the squared correlation coefficient between the actual yi and the fitted values yˆi.
The estimated equations without tenure:
wage = -3.391 + 0.644educ + 0.070exper
log(wage) = 0.217 + 0.098educ + 0.0103exper
wage = -0.905 + 0.541educ
log(wage) = 0.584 + 0.083educ
in bˆ0, bˆ1, …, bˆk, by minimizing the sum of squared residuals:

min Σi=1..n (yi - bˆ0 - bˆ1xi1 - … - bˆkxik)²

The first order conditions are:

Σi=1..n (yi - bˆ0 - bˆ1xi1 - … - bˆkxik) = 0
Σi=1..n xi1 (yi - bˆ0 - bˆ1xi1 - … - bˆkxik) = 0
…
Σi=1..n xik (yi - bˆ0 - bˆ1xi1 - … - bˆkxik) = 0
Chapter 2: Multiple Regression Analysis: Estimation
y = b0 + b1x1 + b2x2 + … + bkxk + u
Multiple Regression Analysis
y = b0 + b1x1 + b2x2 + … + bkxk + u
1. Estimation
Parallels with Simple Regression
Σ(yi - ȳ)² is the total sum of squares (SST)
Σ(yˆi - ȳ)² is the explained sum of squares (SSE)
Σ uˆi² is the residual sum of squares (SSR)
Then SST = SSE + SSR
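The decomposition SST = SSE + SSR (which holds exactly when the regression includes an intercept) can be confirmed numerically on made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b
uhat = y - yhat

SST = ((y - y.mean()) ** 2).sum()
SSE = ((yhat - y.mean()) ** 2).sum()   # fitted values share the mean of y
SSR = (uhat ** 2).sum()
print(SST, SSE + SSR)                  # equal up to floating point
```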
Simple vs Multiple Reg Estimate
Compare the simple regression y~ = b~0 + b~1x1 with the multiple regression yˆ = bˆ0 + bˆ1x1 + bˆ2x2. Generally b~1 ≠ bˆ1, unless: bˆ2 = 0 (i.e. no partial effect of x2) OR
Years of education, educ Years of labor market experience, exper Years with the current employer, tenure
The relationship between wage and educ, exper, and tenure:
x1 and x2 are uncorrelated in the sample.
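A sketch of the comparison with correlated regressors (coefficients made up): the simple-regression slope on x1 absorbs part of the effect of the omitted x2, while the multiple-regression slope recovers the partial effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)       # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

b_simple = np.linalg.lstsq(np.column_stack([np.ones(n), x1]), y, rcond=None)[0]
b_mult = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]

# Simple slope is near 2 + 3*0.8 = 4.4; multiple-regression slope is near 2
print(b_simple[1], b_mult[1])
```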
The wage determination: example
The estimated equations are as below:
wage = -2.873 + 0.599educ + 0.022exper + 0.169tenure
log(wage) = 0.284 + 0.092educ + 0.0041exper + 0.022tenure
The wage determination
The estimated equations are as below:
wage = -2.873 + 0.599educ + 0.022exper + 0.169tenure
log(wage) = 0.284 + 0.092educ + 0.0041exper + 0.022tenure
y = b0 + b1x1 + b2x2 + … + bkxk + u
b0 is still the intercept; b1 to bk are all called slope parameters.
u is still the error term (or disturbance). We still need a zero conditional mean assumption.
“Partialling Out” continued
The previous equation implies that regressing y on x1 and x2 gives the same effect of x1 as regressing y on the residuals from a regression of x1 on x2.
R2 = [Σ(yi - ȳ)(yˆi - ȳ)]² / [Σ(yi - ȳ)² Σ(yˆi - ȳ)²]

(with an intercept, the fitted values yˆi have the same sample mean ȳ as the yi)
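Both expressions, 1 - SSR/SST and the squared correlation between yi and yˆi, can be checked to coincide (made-up data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

SST = ((y - y.mean()) ** 2).sum()
SSR = ((y - yhat) ** 2).sum()
r2_from_ssr = 1 - SSR / SST
r2_from_corr = np.corrcoef(y, yhat)[0, 1] ** 2
print(r2_from_ssr, r2_from_corr)   # identical up to rounding
```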
More about R-squared
R2 can never decrease when another independent variable is added to a regression, and usually will increase
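This can be seen by adding a regressor that is pure noise: R2 still (weakly) rises. A numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 250
x1 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 + rng.normal(size=n)

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    return 1 - (u ** 2).sum() / ((y - y.mean()) ** 2).sum()

X_small = np.column_stack([np.ones(n), x1])
noise = rng.normal(size=n)                  # unrelated to y by construction
X_big = np.column_stack([X_small, noise])

r2_small, r2_big = r_squared(X_small, y), r_squared(X_big, y)
print(r2_small, r2_big)                     # r2_big >= r2_small
```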
The STATA commands:
use [path]wage1.dta
(or: insheet using [path]wage1.raw / wage1.txt)
reg wage educ exper tenure
reg lwage educ exper tenure
A “Partialling Out” Interpretation
Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part, yi = yˆi + uˆi. We then define the following:
E(y|x) = b0 + b1x1 + b2x2 + … + bkxk
Interpreting Multiple Regression
yˆ = bˆ0 + bˆ1x1 + bˆ2x2 + … + bˆkxk, so
Δyˆ = bˆ1Δx1 + bˆ2Δx2 + … + bˆkΔxk,
So now assume that E(u|x1, x2, …, xk) = 0. We are still minimizing the sum of squared residuals, so we have k+1 first order conditions.
Obtaining OLS Estimates