The Regression-Discontinuity Design
合集下载
相关主题
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Waldinger (Warwick) 6 / 48
Key Identifying Assumption
Key identifying assumption: E [Y0i jXi ] and E [Y1i jXi ] are continuos in Xi at X0 . This means that all other unobserved determinants of Y are continuously related to the running variable X . This allows us to use average outcomes of units just below the cuto¤ as a valid counterfactual for units right above the cuto¤. This assumption cannot be directly tested. But there are some tests which give suggestive evidence whether the assumption is satis…ed (see below).
1 2
Sharp RD: treatment is a deterministic function of a covariate X. Fuzzy RD: exploits discontinuities in the probability of treatment conditional on a covariate X (the discontinuity is then used as an IV).
10 / 48
Waldinger (Warwick)
Di¤erent Polynomials on the 2 Sides of the Discontinuity
To derive a regression model that can be used to estimate the causal e¤ect we use the fact that Di is a deterministic function of Xi : E [Yi jXi ] = E [Y0i jXi ] + (E [Y1i jXi ] E [Y0i jXi ])Di
Lecture 4: Regression Discontinuity Design
Fabian Waldinger
Waldinger ()
1 / 48
Topics Covered in Lecture
1 2 3 4 5
Sharp RD. Fuzzy RD. Practical Tips for Running RD Models. Example Fuzzy RD: Angrist & Lavy (1999) - Maimonides Rule. Regression Kink Design (very brie‡y).
1 2
Use a nonparametric kernel method (see more below). Use a pth order polynomial: i.e. estimate: Yi = α + β1 xi + β2 xi2 + ... + βp xip + ρDi + η i (1)
Sharp Regression Discontinuity - Nonlinear Case
Suppose the nonlinear relationship is E [Y0i jxi ] = f (Xi ) for some reasonably smooth function f (Xi ). In that case we can construct RD estimates by …tting: Yi = f (xi ) + ρDi + η i There are 2 ways of approximating f (xi ):
Waldinger (Warwick)
7 / 48
Sharp Regression Discontinuity - Nonlinear Case
Sometimes the trend relation E [Y0i jxi ] is nonlinear.
Waldinger (Warwick)
8 / 48
During the following introduction we will focus on the pth order polynomial approach but will discuss the other approach below.
Waldinger (Warwick)
9 / 48
(2)
Equation (1) above is a special case of (2) with β1 = β2 = βp = 0. The treatment e¤ect at Xo is ρ. The treatment e¤ect at Xi X0 = c > 0 is: ρ + β1 c + β2 c 2 + ...+ βp c p
Waldinger (Warwick) 11 / 48
Example Sharp RD: Lee (2008) Incumbency E¤ects
Lee (2008) uses a sharp RD design to estimate the probability that the incumbent wins an election. A large political science literature suggests that incumbents may use privileges and resources of o¢ ce to gain an advantage over potential challengers. An OLS regression of incumbency status on election success is likely to be biased because of unobserved di¤erences. Incumbents have already won an election so they may just be better. Lee analyzes the incumbency e¤ect using Democratic incumbents for US congressional elections. He analyzes the probability of winning the election in year t+1 by comparing candidates who just won compared to candidates who just lost the election in year t.
Di¤erent Polynomials on the 2 Sides of the Discontinuity
We can generalize the function f (xi ) by allowing the xi terms to di¤er on both sides of the threshold by including them both individually and interacting them with Di . As Lee and Lemieux (2010) note, allowing di¤erent functions on both sides of the discontinuity should be the main results in an RD paper (as otherwise we use values from both sides of the cuto¤ the estimate the function on each side). In that case we have: ei + β X e2 ep E [Y0i jXi ] = α + β01 X 02 i + ... + β0 p Xi ei + β X e2 ep E [Y1i jXi ] = α + ρ + β11 X 12 i + ... + β1 p Xi
RD captures the causal e¤ect by distinguishing the nonlinear and discontinuous function, 1(Xi Xo ) from the smooth function f (Xi ).
Waldinger (Warwick)
The regression model which you estimate is then: Yi = α + β01 e xi + β02 e xi2 + ... + β0p e xip
where β1 = β11
xi + β2 Di e xi2 + ... + βp Di e xip + η i + ρ Di + β 1 Di e β01 , β2 = β21 β21 and βp = β1p β0 p
ei = Xi where X
X0 Centering at X0 ensures that the treatment e¤ect at Xi = X0 is the coe¢ cient on Di in a regression model with interaction terms (because you do not have to add values of the Di interacted with X to get the treatment e¤ect at X0 ).
Waldinger (Warwick)
2 / 48
ቤተ መጻሕፍቲ ባይዱ
Regression Discontinuity Design
Regression discontinuity research designs exploit the fact that some rules are quite arbitrary and therefore provide good quasi-experiments when you compare people (or cities, …rms, countries,...) who are just a¤ected by the rule with people who are just not a¤ected by the rule. There are 2 types of RD designs:
3 / 48
Example Linear RD
Waldinger (Warwick)
4 / 48
Sharp RD
In Sharp RD designs you exploit that treatment status is a deterministic and discontinuous function of a covariate xi . 1 if xi xo Di = f 0 if xi < xo xo is a known threshold or cuto¤ Once we know xi we know Di . As highlighted by Imbens and Lemieux (2008) there is no value of xi at which you observe both treatment and control observations. ! the method relies on extrapolation across covariate values. For this reason we cannot be agnostic about regression functional form in RD.
Waldinger (Warwick)
5 / 48
Formalizing Sharp RD
Suppose that in addition to the assignment mechanism above, potential outcomes can be described by a linear, constant e¤ects model: E [ Y0 i j Xi ] = α + β Xi Y1 i = Y0 i + ρ This leads to the regression: Yi = α + βXi + ρDi + η i The key di¤erence between this regression and regressions we have investigated in previous lectures is that Di is not only correlated with Xi but it is a deterministic function of Xi .
Key Identifying Assumption
Key identifying assumption: E [Y0i jXi ] and E [Y1i jXi ] are continuos in Xi at X0 . This means that all other unobserved determinants of Y are continuously related to the running variable X . This allows us to use average outcomes of units just below the cuto¤ as a valid counterfactual for units right above the cuto¤. This assumption cannot be directly tested. But there are some tests which give suggestive evidence whether the assumption is satis…ed (see below).
1 2
Sharp RD: treatment is a deterministic function of a covariate X. Fuzzy RD: exploits discontinuities in the probability of treatment conditional on a covariate X (the discontinuity is then used as an IV).
10 / 48
Waldinger (Warwick)
Di¤erent Polynomials on the 2 Sides of the Discontinuity
To derive a regression model that can be used to estimate the causal e¤ect we use the fact that Di is a deterministic function of Xi : E [Yi jXi ] = E [Y0i jXi ] + (E [Y1i jXi ] E [Y0i jXi ])Di
Lecture 4: Regression Discontinuity Design
Fabian Waldinger
Waldinger ()
1 / 48
Topics Covered in Lecture
1 2 3 4 5
Sharp RD. Fuzzy RD. Practical Tips for Running RD Models. Example Fuzzy RD: Angrist & Lavy (1999) - Maimonides Rule. Regression Kink Design (very brie‡y).
1 2
Use a nonparametric kernel method (see more below). Use a pth order polynomial: i.e. estimate: Yi = α + β1 xi + β2 xi2 + ... + βp xip + ρDi + η i (1)
Sharp Regression Discontinuity - Nonlinear Case
Suppose the nonlinear relationship is E [Y0i jxi ] = f (Xi ) for some reasonably smooth function f (Xi ). In that case we can construct RD estimates by …tting: Yi = f (xi ) + ρDi + η i There are 2 ways of approximating f (xi ):
Waldinger (Warwick)
7 / 48
Sharp Regression Discontinuity - Nonlinear Case
Sometimes the trend relation E [Y0i jxi ] is nonlinear.
Waldinger (Warwick)
8 / 48
During the following introduction we will focus on the pth order polynomial approach but will discuss the other approach below.
Waldinger (Warwick)
9 / 48
(2)
Equation (1) above is a special case of (2) with β1 = β2 = βp = 0. The treatment e¤ect at Xo is ρ. The treatment e¤ect at Xi X0 = c > 0 is: ρ + β1 c + β2 c 2 + ...+ βp c p
Waldinger (Warwick) 11 / 48
Example Sharp RD: Lee (2008) Incumbency E¤ects
Lee (2008) uses a sharp RD design to estimate the probability that the incumbent wins an election. A large political science literature suggests that incumbents may use privileges and resources of o¢ ce to gain an advantage over potential challengers. An OLS regression of incumbency status on election success is likely to be biased because of unobserved di¤erences. Incumbents have already won an election so they may just be better. Lee analyzes the incumbency e¤ect using Democratic incumbents for US congressional elections. He analyzes the probability of winning the election in year t+1 by comparing candidates who just won compared to candidates who just lost the election in year t.
Di¤erent Polynomials on the 2 Sides of the Discontinuity
We can generalize the function f (xi ) by allowing the xi terms to di¤er on both sides of the threshold by including them both individually and interacting them with Di . As Lee and Lemieux (2010) note, allowing di¤erent functions on both sides of the discontinuity should be the main results in an RD paper (as otherwise we use values from both sides of the cuto¤ the estimate the function on each side). In that case we have: ei + β X e2 ep E [Y0i jXi ] = α + β01 X 02 i + ... + β0 p Xi ei + β X e2 ep E [Y1i jXi ] = α + ρ + β11 X 12 i + ... + β1 p Xi
RD captures the causal e¤ect by distinguishing the nonlinear and discontinuous function, 1(Xi Xo ) from the smooth function f (Xi ).
Waldinger (Warwick)
The regression model which you estimate is then: Yi = α + β01 e xi + β02 e xi2 + ... + β0p e xip
where β1 = β11
xi + β2 Di e xi2 + ... + βp Di e xip + η i + ρ Di + β 1 Di e β01 , β2 = β21 β21 and βp = β1p β0 p
ei = Xi where X
X0 Centering at X0 ensures that the treatment e¤ect at Xi = X0 is the coe¢ cient on Di in a regression model with interaction terms (because you do not have to add values of the Di interacted with X to get the treatment e¤ect at X0 ).
Waldinger (Warwick)
2 / 48
ቤተ መጻሕፍቲ ባይዱ
Regression Discontinuity Design
Regression discontinuity research designs exploit the fact that some rules are quite arbitrary and therefore provide good quasi-experiments when you compare people (or cities, …rms, countries,...) who are just a¤ected by the rule with people who are just not a¤ected by the rule. There are 2 types of RD designs:
3 / 48
Example Linear RD
Waldinger (Warwick)
4 / 48
Sharp RD
In Sharp RD designs you exploit that treatment status is a deterministic and discontinuous function of a covariate xi . 1 if xi xo Di = f 0 if xi < xo xo is a known threshold or cuto¤ Once we know xi we know Di . As highlighted by Imbens and Lemieux (2008) there is no value of xi at which you observe both treatment and control observations. ! the method relies on extrapolation across covariate values. For this reason we cannot be agnostic about regression functional form in RD.
Waldinger (Warwick)
5 / 48
Formalizing Sharp RD
Suppose that in addition to the assignment mechanism above, potential outcomes can be described by a linear, constant e¤ects model: E [ Y0 i j Xi ] = α + β Xi Y1 i = Y0 i + ρ This leads to the regression: Yi = α + βXi + ρDi + η i The key di¤erence between this regression and regressions we have investigated in previous lectures is that Di is not only correlated with Xi but it is a deterministic function of Xi .