工具变量_李兵
合集下载
相关主题
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
one of the regressors is endogenous and we do not have an
instrument
■ The demand function can be consistently estimated because we
can take z1 as an instrument for the endogenous price variable
工具变量工作原理
Graphical illustration of identification problem
Intuitively, it is clear why the demand equation can be identified: We have an observed variable z1 that shifts the supply equation while not affecting the demand equation. In this way the demand equation can be traced out.
经济学实证研究中的
工具变量
主讲人:李兵 中央财经大学 国际经济与贸易学院
2015年11月14日 浙江 杭州
大 纲
实证研究的主要方法 OLS与内生性问题 工具变量
■ 定义 ■ 工作原理 ■ 相关检验 ■ 如何找到工具变量
实证研究的主要方法
描述现象:如何测量?如何收集数据(Data)?
Supply of milk Demand for milk For example, price of cattle feed
Which of the two equations is identified?
■ The supply function cannot be consistently estimated because
The Problem: It is impossible to observe individual treatment effect since we do not know the outcomes for untreated observations when it is under treatment, and for treated when it is not under treatment.
Observed demand shifters e.g. agricultural land area
Wage elasticity of demand
同时性(Simultaneity)
Example: Labor demand and supply in agriculture (cont.)
两阶段最小二乘法 (2SLS)
■ 工具变量估计的结果与下面的估计方法是等价的,而且更加直观,便于
解释:
第一阶段:
只使用外生变量来预测y2
多出来的外生变量(= IV)
第二阶段(= OLS 估计 用第一阶段预测出来的值来替代y2:
两阶段最小二乘法
两阶段最小二乘法的性质
The standard errors from the OLS second stage
Biblioteka Baidu
实证研究的主要方法
什么是最理想的实证研究方法?
■ 控制实验
■ 随机试验
在自然科学中可行
在社会科学中经常不可行
■ 更多依赖自然实验
实证研究的主要方法
计量经济学中因果分析的一部分主要方法
■ OLS
■ IV ■ DID ■ RD
■ Matching ■ System GMM
内生性问题
IV:
估计也更加不准了
弱工具变量
差的工具变量不如OLS的情况
■ 工具变量估计还可能不如OLS,如果工具变量不是严格外生的,并且与内
生变量(x)仅仅是弱相关
如果工具变量确实是外生的,就 没有问题。但是如果不是, 趋近 偏误会随着与x的弱相关而变大。
IV 不如 OLS 的情况:
e.g.
两阶段最小二乘法
regression are wrong. However, it is not difficult to compute correct standard errors.
If there is one endogenous variable and one instrument
then 2SLS = IV
遗漏变量偏误
X3 不可观测或者很难测量
遗漏变量偏误
测量误差
测量误差
衰减偏误 (attenuation bias)
选择性偏误
选择性偏误(selection bias)
选择性偏误
选择性偏误
同时性(Simultaneity)
Simultaneity occurs if at least two variables are jointly determined A typical case is when observed outcomes are the result of separate
The return to education is much lower but also much more imprecise than with OLS
工具变量工作原理
Identification in simultaneous equations systems
Example: Supply and demand system
Note: Without separate exogenous variables in each equation, the two equations could never be distiguished/separately identified
Structural error terms (uncorrelated with the exogenous variables)
什么是内生性问题( Endogeneity Problem)?
OLS经典假设:所有的解释变量Xi与随机误差项彼此之间
不相关。
内生性问题:
■ 若解释变量Xi和ui相关,则OLS估计量是非一致的,也就是
即使当样本容量很大时,OLS估计量也不会接近回归系数的
真值。
■ 当解释变量和随机误差项相关时,模型存在着内生性问题。
假定存在一个工具变量:
(but )
工具变量与残差项不相关
IV-估计:
工具变量与解释变量 相关
工具变量工作原理
一个简单例子: 父亲教育水平作为孩子教育水平的工具变量
OLS:
教育回报可能会被高估
父亲的教育水平是一个好的工具变量吗? 1)不出现在回归方程中 2)与教育水平显著相关 3)与残差项不相关(?) 教育回报降低了(符合我们 的预期)
First stage regression (regress educ on all exogenous variables):
Education is significantly partially correlated with the education of the parents Two Stage Least Squares estimation results:
内生性问题
内生性问题
什么时候会有内生性问题?
遗漏变量偏误(Omitted Variable Bias) 测量误差(Measurement Error)
选择性偏误(Selection bias) 同时性(Simultaneity) – 反向因果(reverse
caulsality)
Endogenous variables
Exogenous variables
谁发明的工具变量?
谁发明的工具变量?
James H. Stock & Francesco Trebbi, 2003.
Retrospectives: Who Invented Instrumental Variable Regression? Journal of Economic Perspectives
■ Observed quantity and price will be determined in equilibrium
同时性(Simultaneity)
Example: Labor demand and supply in agriculture
Annual hours supplied by workers in a given county if the average hourly wage offered to such workers is w Annual hours demanded by employers in a given county if the average hourly wage paid to workers is w
有足够的反事实(Counterfactuals)
例子:医疗的效果
■ 医院效果=那些去了医院的人的健康状况-那些没去医院的人的健
康状况
■ 但是,这两群人也许是不同的,去医院的人群健康状况往往较差。
因果推论的根本问题
Group Treatment (D=1) Control (D=0) Y1 observable (counterfactual) Y0 (counterfactual) observable
The 2SLS estimation can also be used if there is more
than one endo-genous variable and at least as many
instruments
两阶段最小二乘法
Example: 2SLS in a wage equation using two instruments
■ 访谈(Interview)
■ 实地(Field Work) ■ 问卷(Questionnaire)
■ 统计数据(Second hand data) ■ 大数据(Big data)
实证研究的主要方法
解释现象:如何建立因果关系
相关关系 vs. 因果关系 因果推论的根本问题(Holland,1886: 947):没有,或者没
Wright, Philip G. 1928. The Tariff on Animal and
Vegetable Oils. New York: Macmillan.
什么是工具变量?
工具变量(IV)的定义:
1) 不出现在回归方程中 2) 与内生变量高度相关
3) 与残差项不相关
工具变量工作原理
Competition on the labor market in each county i will lead to a county wage w i so that the total number of hours his supplied by workers in this county equals the total number of hours hid demanded by agricultural employers in this county: (= observed equilibrium outcomes in each county) Simultaneous equations model (SEM):
Wage elasticity of supply Observed supply shifters e.g. manufacturing wage Unobserved supply shifters e.g. immigration flows
Unobserved demand shifters e.g. food market shocks
behavioral mechanisms that are coordinated in an equilibrium
The prototypical case is a system of demand and supply equations:
■ D(p) = how high would demand be if the price was set to p? ■ S(p) = how high would supply be if the price was set to p? ■ Both mechanisms have a ceteris paribus interpretation
instrument
■ The demand function can be consistently estimated because we
can take z1 as an instrument for the endogenous price variable
工具变量工作原理
Graphical illustration of identification problem
Intuitively, it is clear why the demand equation can be identified: We have an observed variable z1 that shifts the supply equation while not affecting the demand equation. In this way the demand equation can be traced out.
经济学实证研究中的
工具变量
主讲人:李兵 中央财经大学 国际经济与贸易学院
2015年11月14日 浙江 杭州
大 纲
实证研究的主要方法 OLS与内生性问题 工具变量
■ 定义 ■ 工作原理 ■ 相关检验 ■ 如何找到工具变量
实证研究的主要方法
描述现象:如何测量?如何收集数据(Data)?
Supply of milk Demand for milk For example, price of cattle feed
Which of the two equations is identified?
■ The supply function cannot be consistently estimated because
The Problem: It is impossible to observe individual treatment effect since we do not know the outcomes for untreated observations when it is under treatment, and for treated when it is not under treatment.
Observed demand shifters e.g. agricultural land area
Wage elasticity of demand
同时性(Simultaneity)
Example: Labor demand and supply in agriculture (cont.)
两阶段最小二乘法 (2SLS)
■ 工具变量估计的结果与下面的估计方法是等价的,而且更加直观,便于
解释:
第一阶段:
只使用外生变量来预测y2
多出来的外生变量(= IV)
第二阶段(= OLS 估计 用第一阶段预测出来的值来替代y2:
两阶段最小二乘法
两阶段最小二乘法的性质
The standard errors from the OLS second stage
Biblioteka Baidu
实证研究的主要方法
什么是最理想的实证研究方法?
■ 控制实验
■ 随机试验
在自然科学中可行
在社会科学中经常不可行
■ 更多依赖自然实验
实证研究的主要方法
计量经济学中因果分析的一部分主要方法
■ OLS
■ IV ■ DID ■ RD
■ Matching ■ System GMM
内生性问题
IV:
估计也更加不准了
弱工具变量
差的工具变量不如OLS的情况
■ 工具变量估计还可能不如OLS,如果工具变量不是严格外生的,并且与内
生变量(x)仅仅是弱相关
如果工具变量确实是外生的,就 没有问题。但是如果不是, 趋近 偏误会随着与x的弱相关而变大。
IV 不如 OLS 的情况:
e.g.
两阶段最小二乘法
regression are wrong. However, it is not difficult to compute correct standard errors.
If there is one endogenous variable and one instrument
then 2SLS = IV
遗漏变量偏误
X3 不可观测或者很难测量
遗漏变量偏误
测量误差
测量误差
衰减偏误 (attenuation bias)
选择性偏误
选择性偏误(selection bias)
选择性偏误
选择性偏误
同时性(Simultaneity)
Simultaneity occurs if at least two variables are jointly determined A typical case is when observed outcomes are the result of separate
The return to education is much lower but also much more imprecise than with OLS
工具变量工作原理
Identification in simultaneous equations systems
Example: Supply and demand system
Note: Without separate exogenous variables in each equation, the two equations could never be distiguished/separately identified
Structural error terms (uncorrelated with the exogenous variables)
什么是内生性问题( Endogeneity Problem)?
OLS经典假设:所有的解释变量Xi与随机误差项彼此之间
不相关。
内生性问题:
■ 若解释变量Xi和ui相关,则OLS估计量是非一致的,也就是
即使当样本容量很大时,OLS估计量也不会接近回归系数的
真值。
■ 当解释变量和随机误差项相关时,模型存在着内生性问题。
假定存在一个工具变量:
(but )
工具变量与残差项不相关
IV-估计:
工具变量与解释变量 相关
工具变量工作原理
一个简单例子: 父亲教育水平作为孩子教育水平的工具变量
OLS:
教育回报可能会被高估
父亲的教育水平是一个好的工具变量吗? 1)不出现在回归方程中 2)与教育水平显著相关 3)与残差项不相关(?) 教育回报降低了(符合我们 的预期)
First stage regression (regress educ on all exogenous variables):
Education is significantly partially correlated with the education of the parents Two Stage Least Squares estimation results:
内生性问题
内生性问题
什么时候会有内生性问题?
遗漏变量偏误(Omitted Variable Bias) 测量误差(Measurement Error)
选择性偏误(Selection bias) 同时性(Simultaneity) – 反向因果(reverse
caulsality)
Endogenous variables
Exogenous variables
谁发明的工具变量?
谁发明的工具变量?
James H. Stock & Francesco Trebbi, 2003.
Retrospectives: Who Invented Instrumental Variable Regression? Journal of Economic Perspectives
■ Observed quantity and price will be determined in equilibrium
同时性(Simultaneity)
Example: Labor demand and supply in agriculture
Annual hours supplied by workers in a given county if the average hourly wage offered to such workers is w Annual hours demanded by employers in a given county if the average hourly wage paid to workers is w
有足够的反事实(Counterfactuals)
例子:医疗的效果
■ 医院效果=那些去了医院的人的健康状况-那些没去医院的人的健
康状况
■ 但是,这两群人也许是不同的,去医院的人群健康状况往往较差。
因果推论的根本问题
Group Treatment (D=1) Control (D=0) Y1 observable (counterfactual) Y0 (counterfactual) observable
The 2SLS estimation can also be used if there is more
than one endo-genous variable and at least as many
instruments
两阶段最小二乘法
Example: 2SLS in a wage equation using two instruments
■ 访谈(Interview)
■ 实地(Field Work) ■ 问卷(Questionnaire)
■ 统计数据(Second hand data) ■ 大数据(Big data)
实证研究的主要方法
解释现象:如何建立因果关系
相关关系 vs. 因果关系 因果推论的根本问题(Holland,1886: 947):没有,或者没
Wright, Philip G. 1928. The Tariff on Animal and
Vegetable Oils. New York: Macmillan.
什么是工具变量?
工具变量(IV)的定义:
1) 不出现在回归方程中 2) 与内生变量高度相关
3) 与残差项不相关
工具变量工作原理
Competition on the labor market in each county i will lead to a county wage w i so that the total number of hours his supplied by workers in this county equals the total number of hours hid demanded by agricultural employers in this county: (= observed equilibrium outcomes in each county) Simultaneous equations model (SEM):
Wage elasticity of supply Observed supply shifters e.g. manufacturing wage Unobserved supply shifters e.g. immigration flows
Unobserved demand shifters e.g. food market shocks
behavioral mechanisms that are coordinated in an equilibrium
The prototypical case is a system of demand and supply equations:
■ D(p) = how high would demand be if the price was set to p? ■ S(p) = how high would supply be if the price was set to p? ■ Both mechanisms have a ceteris paribus interpretation