Multiple Regression Analysis: Model Specification and Data Problems
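It can be tedious to add and test extra terms, and we may find that a square term matters when really using logs would be even better.
RESET relies on a trick similar to the special form of the White test. Instead of adding functions of the x’s directly, RESET uses functions of the fitted values ŷ.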
Method 2: the Davidson-MacKinnon test
If (m1) is true, then the fitted values from (m2) should be insignificant in (m1). Thus, to test (m1), we first estimate (m2) by OLS to obtain the fitted values ŷ, then add them to (m1). That is, we estimate y = b0 + b1x1 + b2x2 + θŷ + u and look at the t statistic on ŷ.
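The Davidson-MacKinnon steps can be sketched on simulated data. This is only an illustration under stated assumptions: the data are generated so that (m2) is the true model, the helper names `solve`, `ols` and `davidson_mackinnon_t2` are hypothetical, and the squared t statistic on ŷ is computed as the F statistic for excluding that single regressor (for one restriction the two are identical).

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS of y on the columns of X (constant included in X).
    Returns (fitted values, sum of squared residuals)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, ssr

def davidson_mackinnon_t2(x1, x2, y):
    """Squared t statistic on the (m2) fitted values when added to (m1)."""
    n = len(y)
    X_m1 = [[1.0, a, b] for a, b in zip(x1, x2)]
    X_m2 = [[1.0, math.log(a), math.log(b)] for a, b in zip(x1, x2)]
    yhat_m2, _ = ols(X_m2, y)            # step 1: fitted values from (m2)
    _, ssr_r = ols(X_m1, y)              # (m1) without yhat
    X_aug = [row + [f] for row, f in zip(X_m1, yhat_m2)]
    _, ssr_ur = ols(X_aug, y)            # step 2: (m1) plus yhat
    return (ssr_r - ssr_ur) / (ssr_ur / (n - 4))  # 4 parameters in the augmented model

# Simulated data (an assumption for illustration): the log model (m2) is true.
random.seed(1)
n = 200
x1 = [random.uniform(1, 20) for _ in range(n)]
x2 = [random.uniform(1, 20) for _ in range(n)]
y = [1 + 2 * math.log(a) + 3 * math.log(b) + random.gauss(0, 0.5)
     for a, b in zip(x1, x2)]

t_squared = davidson_mackinnon_t2(x1, x2, y)
print(f"squared t statistic on yhat: {t_squared:.1f}")
```

A value well above the 5% critical value of F(1, n-4), which is about 3.9 here, counts as evidence against (m1).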
RESET test procedure
Step 1: regress price on lotsize, sqrft and bdrms, and obtain the fitted values of price, ŷ. This restricted model gives SSR_r = 300723.806, n = 88, R² = 0.6724.
Step 2: compute ŷ² and ŷ³, add them to the original equation, and re-estimate it. That is, regress price on lotsize, sqrft, bdrms, ŷ² and ŷ³. This unrestricted model gives SSR_ur = 269983.825, n = 88, R² = 0.7059.
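The two steps above can be sketched in code. The hprice data are not reproduced here, so this sketch runs on simulated data (an assumption), and the helper names `solve`, `ols` and `reset_f` are hypothetical. The fitted values are standardized before taking powers; because the regressors include a constant and ŷ is a linear combination of them, this leaves both SSRs unchanged and only helps numerical stability.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS of y on the columns of X (constant included in X).
    Returns (fitted values, sum of squared residuals)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, ssr

def reset_f(X, y):
    """RESET F statistic: add yhat^2 and yhat^3 and test them jointly."""
    n, n_params = len(y), len(X[0])
    yhat, ssr_r = ols(X, y)                      # step 1: restricted model
    mean = sum(yhat) / n
    sd = (sum((v - mean) ** 2 for v in yhat) / n) ** 0.5
    z = [(v - mean) / sd for v in yhat]          # standardized fitted values
    X_aug = [row + [zi ** 2, zi ** 3] for row, zi in zip(X, z)]
    _, ssr_ur = ols(X_aug, y)                    # step 2: unrestricted model
    df = n - n_params - 2                        # equals n - k - 3 with k slopes
    return ((ssr_r - ssr_ur) / 2) / (ssr_ur / df)

# Simulated data (an assumption for illustration).
random.seed(0)
n = 200
x1 = [random.uniform(0, 4) for _ in range(n)]
x2 = [random.uniform(0, 4) for _ in range(n)]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
noise = [random.gauss(0, 1) for _ in range(n)]
y_linear = [1 + 2 * a + b + e for a, b, e in zip(x1, x2, noise)]
y_curved = [1 + 2 * a + a * a + b + e for a, b, e in zip(x1, x2, noise)]

f_linear = reset_f(X, y_linear)   # linear model is correct
f_curved = reset_f(X, y_curved)   # linear model omits x1 squared
print(f"RESET F, linear truth:    {f_linear:.2f}")
print(f"RESET F, quadratic truth: {f_curved:.2f}")
```

When the linear model omits the squared term, the statistic should be far larger than when the linear model is correct.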
Functional form misspecification
Functional Form
We’ve seen that a linear regression can really fit nonlinear relationships:
Can use logs on RHS, LHS or both
Can use quadratic forms of x’s
Can use interactions of x’s
It may be possible to avoid omitted variable bias by using a proxy variable.
Model: y = b0 + b1x1 + b2x2 + b3x3* + u
A proxy variable must be related to the unobservable variable.
y = b0 + b1x1 + b2x2 + b3x3* + u
x3* = d0 + d3x3 + v3
What do we need for this solution to give us consistent estimates of b1 and b2?
Assume u is uncorrelated with x1, x2, x3* and x3, and v3 is uncorrelated with x1, x2 and x3.
In the same way, for the second (log) model we calculate F = [(2.86256385 - 2.69401081)/2]/(2.69401081/82) = 2.565, with p-value = 0.0835. So we cannot reject the null hypothesis at the 5% significance level.
Does it make more sense for the derivative of y with respect to x1 to vary with x1 (quadratic), to vary with x2 (interactions), or to be fixed?
Functional Form (continued)
We already know how to test joint exclusion restrictions to see if higher order terms or interactions belong in the model. For example, is
log(wage) = b0 + b1educ + b2exper + b3tenure + u
misspecified because higher order terms are omitted, and how do we detect it? One option is to test the added terms in
log(wage) = b0 + b1educ + b2exper + b3tenure + b4educ² + b5exper² + b6tenure² + b7educ·tenure + u
For the general model y = b0 + b1x1 + … + bkxk + u, RESET instead adds ŷ² and ŷ³ and tests H0: d1 = 0, d2 = 0 (their coefficients) using F ~ F(2, n-k-3) or LM ~ χ²(2).
In the wage example, the RESET statistic is the F statistic for testing H0: d4 = 0, d5 = 0 in the expanded equation
log(wage) = b0 + b1educ + b2exper + b3tenure + d4ŷ² + d5ŷ³ + u
RESET test, example
Housing price equation (hprice.raw)
price = b0 + b1lotsize + b2sqrft + b3bdrms + u
log(price) = b0 + b1log(lotsize) + b2log(sqrft) + b3bdrms + u
log(wage) = b0 + b1educ + u, or the same equation with log(educ) as the independent variable
Does it make more sense for x to affect y in percentage (use logs) or absolute terms?
Results
log(wâge) = 5.503 + 0.078educ + 0.0198exper (biased estimate)
(0.112) (0.007) (0.003)
n = 935, R² = 0.1309
log(wâge) = 5.198 + 0.057educ + 0.0195exper + 0.0058IQ
Chapter 8: Multiple Regression Analysis
Model Specification and Data Problems
Contents
Functional form misspecification
Using proxy variables
Measurement error in variables
Missing data and outlying observations
y = b0 + b1x1 + b2x2 + u (m1)
y = b0 + b1log(x1) + b2log(x2) + u (m2)
Which model to choose?
Method 1: estimate a comprehensive model
y = d0 + d1x1 + d2x2 + d3log(x1) + d4log(x2) + u
Test H0: d3 = 0, d4 = 0 as a test of (m1), and H0: d1 = 0, d2 = 0 as a test of (m2).
In the Davidson-MacKinnon test, a significant t statistic on ŷ is a rejection of model (m1).
Proxy variables
Proxy Variables
What if the model is misspecified because no data are available on an important x variable?
For the level housing price model, the RESET F statistic is [(300723.806 - 269983.825)/2]/(269983.825/82) = 4.6682, with p-value = 0.012. Therefore we reject the null hypothesis that there is no misspecification.
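The arithmetic of the RESET F statistic can be checked with a few lines of code. This is a minimal sketch: the SSR values, n = 88 and the three regressors come from the slides, while the function names `reset_f_stat` and `f_pvalue_q2` are hypothetical. The p-value uses the closed form P(F > f) = (1 + 2f/df)^(-df/2), which holds only when the numerator has 2 degrees of freedom.

```python
def reset_f_stat(ssr_r, ssr_ur, n, k, q=2):
    """F statistic for q added terms; denominator df = n - k - 1 - q."""
    df = n - k - 1 - q
    f = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
    return f, df

def f_pvalue_q2(f, df):
    """P(F > f) for an F(2, df) variable (closed form, 2 numerator df only)."""
    return (1.0 + 2.0 * f / df) ** (-df / 2.0)

# Level model: price on lotsize, sqrft, bdrms (k = 3 slopes, n = 88).
f_level, df = reset_f_stat(300723.806, 269983.825, n=88, k=3)
p_level = f_pvalue_q2(f_level, df)

# Log model: same n and k, SSRs taken from the slides.
f_log, _ = reset_f_stat(2.86256385, 2.69401081, n=88, k=3)
p_log = f_pvalue_q2(f_log, df)

print(f"level model: F = {f_level:.4f}, p = {p_level:.3f}")
print(f"log model:   F = {f_log:.3f}, p = {p_log:.3f}")
```

The level model should reproduce F of about 4.668 with a p-value of about 0.012. For the log model the p-value comes out near 0.083, close to the reported 0.0835; the small gap presumably reflects rounding of F before the p-value was computed.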
For example, the proxy x3 may be related to the unobserved variable by x3* = d0 + d3x3 + v3, where * implies unobserved.
Now suppose we just substitute x3 for x3*.
Proxy Variables (continued)
A test of functional form is Ramsey’s regression specification error test (RESET): we add and test functions of ŷ.
So, estimate y = b0 + b1x1 + … + bkxk + d1ŷ² + d2ŷ³ + error.
For the wage example: first estimate log(wage) = b0 + b1educ + b2exper + b3tenure + u and get the fitted values ŷ (the fitted log(wage) from this equation).
Nonnested Alternatives Test
If the models have the same dependent variable but nonnested x’s, we could still just make a giant model with the x’s from both and test joint exclusion restrictions that lead to one model or the other. For example, we may have to choose between models (m1) and (m2).
E(x3* | x1, x2, x3) = E(x3* | x3) = d0 + d3x3
So we are really running y = (b0 + b3d0) + b1x1 + b2x2 + b3d3x3 + (u + b3v3), and have just redefined the intercept, the error term and the x3 coefficient.
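The redefined equation follows from a direct substitution, written out here step by step. This is only a sketch of the algebra, writing the slides' b and d as β and δ and using nothing beyond the proxy equation x3* = d0 + d3x3 + v3.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^{*} + u \\
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3(\delta_0 + \delta_3 x_3 + v_3) + u \\
  &= (\beta_0 + \beta_3\delta_0) + \beta_1 x_1 + \beta_2 x_2 + \beta_3\delta_3\, x_3 + (u + \beta_3 v_3)
\end{align*}
```

The coefficients on x1 and x2 are unchanged, which is why b1 and b2 can still be estimated consistently when the composite error u + b3v3 is uncorrelated with x1, x2 and x3.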
How do we know if we’ve gotten the right functional form for our model?
Functional Form (continued)
First, use economic theory to guide you
Y = A·K^a·L^b·e^u, or ln Y = ln A + a ln K + b ln L + u
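A quick numerical check shows that taking logs turns the multiplicative production function into the linear form. The parameter and input values below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical values for illustration only.
A, a, b = 2.0, 0.3, 0.7      # technology level and output elasticities
K, L, u = 10.0, 5.0, 0.1     # capital, labor, and the error term

Y = A * K ** a * L ** b * math.exp(u)                          # Y = A K^a L^b e^u
log_of_product = math.log(Y)                                   # ln Y
linear_form = math.log(A) + a * math.log(K) + b * math.log(L) + u

print(abs(log_of_product - linear_form) < 1e-12)
```

Because the log form is linear in a and b, it can be estimated by OLS with ln K and ln L as regressors.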
Think about the interpretation
Example: IQ as a Proxy for Ability (wage2.raw, p297)
Model
log(wage) = b0 + b1educ + b2exper + b3abil + u
Assume E(u | educ, exper, abil) = 0
But data on ability are not available. We think IQ may be correlated with ability, that is,
(0.122) (0.007) (0.003) (0.001)
n = 935, R² = 0.1622
(efficient estimate)
abil = d0 + d1IQ + v
Assume E(v | educ, exper, IQ) = 0
So we use IQ as a proxy for ability, and the estimated model is
log(wage) = b0* + b1educ + b2exper + b3*IQ + u*