Wooldridge, Introductory Econometrics (6th ed.) — Answers, Appendix E


Wooldridge, Introductory Econometrics (6th ed.) — Answers, Chapter 3


CHAPTER 3

TEACHING NOTES

For undergraduates, I do not work through most of the derivations in this chapter, at least not in detail. Rather, I focus on interpreting the assumptions, which mostly concern the population. Other than random sampling, the only assumption that involves more than population considerations is the assumption about no perfect collinearity, where the possibility of perfect collinearity in the sample (even if it does not occur in the population) should be touched on. The more important issue is perfect collinearity in the population, but this is fairly easy to dispense with via examples. These come from my experiences with the kinds of model specification issues that beginners have trouble with.

The comparison of simple and multiple regression estimates – based on the particular sample at hand, as opposed to their statistical properties – usually makes a strong impression. Sometimes I do not bother with the "partialling out" interpretation of multiple regression.

As far as statistical properties, notice how I treat the problem of including an irrelevant variable: no separate derivation is needed, as the result follows from Theorem 3.1.

I do like to derive the omitted variable bias in the simple case. This is not much more difficult than showing unbiasedness of OLS in the simple regression case under the first four Gauss-Markov assumptions. It is important to get the students thinking about this problem early on, and before too many additional (unnecessary) assumptions have been introduced.

I have intentionally kept the discussion of multicollinearity to a minimum. This partly indicates my bias, but it also reflects reality. It is, of course, very important for students to understand the potential consequences of having highly correlated independent variables. But this is often beyond our control, except that we can ask less of our multiple regression analysis.
If two or more explanatory variables are highly correlated in the sample, we should not expect to precisely estimate their ceteris paribus effects in the population.

I find extensive treatments of multicollinearity, where one "tests" or somehow "solves" the multicollinearity problem, to be misleading, at best. Even the organization of some texts gives the impression that imperfect collinearity is somehow a violation of the Gauss-Markov assumptions: they include multicollinearity in a chapter or part of the book devoted to "violation of the basic assumptions," or something like that. I have noticed that master's students who have had some undergraduate econometrics are often confused on the multicollinearity issue. It is very important that students not confuse multicollinearity among the included explanatory variables in a regression model with the bias caused by omitting an important variable.

I do not prove the Gauss-Markov theorem. Instead, I emphasize its implications. Sometimes, and certainly for advanced beginners, I put a special case of Problem 3.12 on a midterm exam, where I make a particular choice for the function g(x). Rather than have the students directly compare the variances, they should appeal to the Gauss-Markov theorem for the superiority of OLS over any other linear, unbiased estimator.

SOLUTIONS TO PROBLEMS

3.1 (i) hsperc is defined so that the smaller it is, the lower the student's standing in high school. Everything else equal, the worse the student's standing in high school, the lower is his/her expected college GPA.

(ii) Just plug these values into the equation: colgpa = 1.392 − .0135(20) + .00148(1050) = 2.676.

(iii) The difference between A and B is simply 140 times the coefficient on sat, because hsperc is the same for both students. So A is predicted to have a score .00148(140) ≈ .207 higher.

(iv) With hsperc fixed, ∆colgpa = .00148∆sat.
Now, we want to find ∆sat such that ∆colgpa = .5, so .5 = .00148(∆sat) or ∆sat = .5/(.00148) ≈ 338. Perhaps not surprisingly, a large ceteris paribus difference in SAT score – almost two and one-half standard deviations – is needed to obtain a predicted difference in college GPA of half a point.

3.2 (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(∆sibs), so ∆sibs = 1/.094 ≈ 10.6.

(ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about a half a year (.524) more years of education.

(iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

3.3 (i) If adults trade off sleep for work, more work implies less sleep (other things equal), so β1 < 0.

(ii) The signs of β2 and β3 are not obvious, at least to me. One could argue that more educated people like to get more out of life, and so, other things equal, they sleep less (β2 < 0). The relationship between sleeping and age is more complicated than this model suggests, and economists are not in the best position to judge such things.

(iii) Since totwrk is in minutes, we must convert five hours into minutes: ∆totwrk = 5(60) = 300. Then sleep is predicted to fall by .148(300) = 44.4 minutes. For a week, 45 minutes less sleep is not an overwhelming change.

(iv) More education implies less predicted time sleeping, but the effect is quite small.
If we assume the difference between college and high school is four years, the college graduate sleeps about 45 minutes less per week, other things equal.

(v) Not surprisingly, the three explanatory variables explain only about 11.3% of the variation in sleep. One important factor in the error term is general health. Another is marital status, and whether the person has children. Health (however we measure that), marital status, and number and ages of children would generally be correlated with totwrk. (For example, less healthy people would tend to work less.)

3.4 (i) A larger rank for a law school means that the school has less prestige; this lowers starting salaries. For example, a rank of 100 means there are 99 schools thought to be better.

(ii) β1 > 0, β2 > 0. Both LSAT and GPA are measures of the quality of the entering class. No matter where better students attend law school, we expect them to earn more, on average. β3, β4 > 0. The number of volumes in the law library and the tuition cost are both measures of the school quality. (Cost is less obvious than library volumes, but should reflect quality of the faculty, physical plant, and so on.)

(iii) This is just the coefficient on GPA, multiplied by 100: 24.8%.

(iv) This is an elasticity: a one percent increase in library volumes implies a .095% increase in predicted median starting salary, other things equal.

(v) It is definitely better to attend a law school with a lower rank. If law school A has a ranking 20 less than law school B, the predicted difference in starting salary is 100(.0033)(20) = 6.6% higher for law school A.

3.5 (i) No. By definition, study + sleep + work + leisure = 168. Therefore, if we change study, we must change at least one of the other categories so that the sum is still 168.

(ii) From part (i), we can write, say, study as a perfect linear function of the other independent variables: study = 168 − sleep − work − leisure.
This holds for every observation, so MLR.3 is violated.

(iii) Simply drop one of the independent variables, say leisure: GPA = β0 + β1 study + β2 sleep + β3 work + u.
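The perfect collinearity in Problem 3.5 is easy to verify numerically. Below is a minimal sketch (simulated data, my own construction, not from the text): with study + sleep + work + leisure = 168 in every observation, the design matrix including an intercept is rank-deficient, and dropping one category restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
study = rng.uniform(10, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 40, n)
leisure = 168 - study - sleep - work        # exact linear dependence

# Design matrices with an intercept column:
X_all = np.column_stack([np.ones(n), study, sleep, work, leisure])
X_drop = np.column_stack([np.ones(n), study, sleep, work])  # drop leisure

rank_all = np.linalg.matrix_rank(X_all)     # 4, but X_all has 5 columns
rank_drop = np.linalg.matrix_rank(X_drop)   # 4 = full column rank
print(rank_all, rank_drop)
```

Because X_all has five columns but rank four, the OLS normal equations have no unique solution, which is exactly what MLR.3 rules out.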

Wooldridge, Introductory Econometrics (6th ed.) — Notes and End-of-Chapter Answers


Chapter 1  The Nature of Econometrics and Economic Data

1.1 Review Notes

Key Point 1: Econometrics and Its Applications ★

1. Econometrics
Econometrics is the discipline that, building on economic theory and using the tools of mathematics and statistics, quantitatively analyzes the relationships among economic variables by constructing econometric models.

The main steps of an econometric analysis are: ① use economic data to estimate the unknown parameters of the model; ② test the model; ③ once the model passes the tests, use it to make predictions.

2. Steps of an economic analysis
An economic analysis uses collected data to test whether a theory holds or to estimate a particular relationship.

It consists of the following steps: state the question, build an economic model, convert the economic model into an econometric model, collect the relevant data, estimate the parameters, and test hypotheses.

Key Point 2: Economic Data ★★★

1. The structure of economic data (see Table 1-1)
Table 1-1  The structure of economic data
2. Panel data versus pooled cross sections (see Table 1-2)
Table 1-2  Panel data versus pooled cross sections

Key Point 3: Causality and Ceteris Paribus ★★

1. Causality
Causality means that a change in one variable brings about a change in another variable; establishing it is one of the important goals of economic analysis.

Although econometric analysis can uncover correlations between variables, interpreting them causally also requires ruling out the possibility of reverse causation in the model itself; otherwise the conclusion is hard to believe.

2. Ceteris paribus
Ceteris paribus means holding all other variables fixed in an economic analysis.

The ceteris paribus assumption plays an important role in causal analysis.

Wooldridge, Introductory Econometrics (6th ed.), Review Notes and Answers — Multiple Regression Analysis: Asymptotic Properties of OLS [Shengcai]


Chapter 5  Multiple Regression Analysis: OLS Asymptotics

5.1 Review Notes

Key Point 1: Consistency ★★★★

1. Theorem 5.1 (consistency of OLS)
(1) Proving consistency
Under Assumptions MLR.1 through MLR.4, the OLS estimator β̂j is a consistent estimator of βj for all j = 0, 1, 2, …, k.

The proof proceeds as follows. Substituting y_i = β0 + β1x_i1 + u_i into the expression for β̂1 gives (sums running over i = 1, …, n)

β̂1 = Σ(x_i1 − x̄1)y_i / Σ(x_i1 − x̄1)² = β1 + [n⁻¹Σ(x_i1 − x̄1)u_i] / [n⁻¹Σ(x_i1 − x̄1)²].

By the law of large numbers, the numerator and denominator of the second term on the right-hand side converge in probability to the population quantities Cov(x1, u) and Var(x1), respectively.

Assuming Var(x1) ≠ 0, and because Cov(x1, u) = 0, the properties of probability limits give plim β̂1 = β1 + Cov(x1, u)/Var(x1) = β1.

This shows that the OLS estimator β̂j is consistent.
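The consistency result can be illustrated with a small Monte Carlo sketch (my own, not from the notes; the data are simulated so that MLR.1 through MLR.4 hold, with β1 = 2): the OLS slope concentrates around the true value as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0

def ols_slope(n):
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)                  # Cov(x1, u) = 0, E(u) = 0
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    return np.sum(xd * (y - y.mean())) / np.sum(xd * xd)

for n in (100, 10_000, 1_000_000):
    print(n, ols_slope(n))                   # estimates approach beta1 = 2.0
```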

The argument above shows that OLS is consistent in the simple regression case if we assume only zero correlation.

The same is true in the general case, and we can state this point as an assumption.

That is, Assumption MLR.4′ (zero mean and zero correlation): E(u) = 0 and Cov(x_j, u) = 0 for all j = 1, 2, …, k.

(2) Comparing MLR.4′ with MLR.4
① MLR.4 requires that u be unrelated to any function of the explanatory variables, whereas MLR.4′ requires only that each x_j be uncorrelated with u (and that u has zero mean in the population).

② Under MLR.4 we have E(y|x1, x2, …, xk) = β0 + β1x1 + β2x2 + … + βkxk, so we can obtain the partial effect of each explanatory variable on the average or expected value of y. Under MLR.4′, β0 + β1x1 + β2x2 + … + βkxk need not be the population regression function, because some nonlinear function of an x_j could be correlated with the error term.
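The gap between the two assumptions can be made concrete with a small numerical sketch (my own construction, not from the notes): taking x ~ N(0,1) and u = x² − 1 gives E(u) = 0 and Cov(x, u) = 0, so MLR.4′ holds, yet u is correlated with the nonlinear function x², so MLR.4 fails.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 1_000_000)
u = x**2 - 1                       # mean zero by construction

mean_u = np.mean(u)
cov_x_u = np.cov(x, u)[0, 1]       # ≈ 0, since E(x^3) = 0 for symmetric x
cov_x2_u = np.cov(x**2, u)[0, 1]   # ≈ Var(x^2) = 2: far from zero
print(mean_u, cov_x_u, cov_x2_u)
```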

2. Deriving the inconsistency of OLS
When the error term is correlated with any of x1, x2, …, xk, all of the OLS estimators generally lose consistency, and increasing the sample size does not help.
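A quick simulation sketch (my own; simulated data with Cov(x1, u) = 0.5 and Var(x1) = 1) shows the asymptotic bias formula plim β̂1 = β1 + Cov(x1, u)/Var(x1) at work: even with a million observations the estimate sits near β1 + 0.5, not β1.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 2.0
n = 1_000_000
x = rng.normal(0, 1, n)
u = 0.5 * x + rng.normal(0, 1, n)   # endogeneity: Cov(x, u) = 0.5
y = beta1 * x + u

xd = x - x.mean()
slope = np.sum(xd * (y - y.mean())) / np.sum(xd * xd)
print(slope)                         # ≈ 2.5 = beta1 + 0.5/1.0, not 2.0
```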

Wooldridge, Introductory Econometrics (6th ed.) — Answers, Chapter 1


CHAPTER 1

TEACHING NOTES

You have substantial latitude about what to emphasize in Chapter 1. I find it useful to talk about the economics of crime example (Example 1.1) and the wage example (Example 1.2) so that students see, at the outset, that econometrics is linked to economic reasoning, even if the economics is not complicated theory.

I like to familiarize students with the important data structures that empirical economists use, focusing primarily on cross-sectional and time series data sets, as these are what I cover in a first-semester course. It is probably a good idea to mention the growing importance of data sets that have both a cross-sectional and time dimension.

I spend almost an entire lecture talking about the problems inherent in drawing causal inferences in the social sciences. I do this mostly through the agricultural yield, return to education, and crime examples. These examples also contrast experimental and nonexperimental (observational) data. Students studying business and finance tend to find the term structure of interest rates example more relevant, although the issue there is testing the implication of a simple theory, as opposed to inferring causality. I have found that spending time talking about these examples, in place of a formal review of probability and statistics, is more successful in teaching the students how econometrics can be used. (And, it is more enjoyable for the students and me.)

I do not use counterfactual notation as in the modern "treatment effects" literature, but I do discuss causality using counterfactual reasoning. The return to education, perhaps focusing on the return to getting a college degree, is a good example of how counterfactual reasoning is easily incorporated into the discussion of causality.

SOLUTIONS TO PROBLEMS

1.1 (i) Ideally, we could randomly assign students to classes of different sizes.
That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).

(ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist their children are in the smaller classes, and these same parents tend to be more involved in their children's education.

(iii) Given the potential for confounding factors – some of which are listed in (ii) – finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

1.2 (i) Here is one way to pose the question: If two firms, say A and B, are identical in all respects except that firm A supplies job training one hour per worker more than firm B, by how much would firm A's output differ from firm B's?

(ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race.
Perhaps firms choose to offer training to more or less able workers, where "ability" might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.

(iii) The amount of capital and technology available to workers would also affect output. So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect.

(iv) No, unless the amount of training is randomly assigned. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

1.3 It does not make sense to pose the question in terms of causality. Economists would assume that students choose a mix of studying and working (and other activities, such as attending class, leisure, and sleeping) based on rational behavior, such as maximizing utility subject to the constraint that there are only 168 hours in a week. We can then use statistical methods to …

Wooldridge, Introductory Econometrics (6th ed.), Review Notes and Answers — Part 1 (Chapters 4–6) [Shengcai]


Key Point 5: Testing Multiple Linear Restrictions: the F Test ★★★★★
1. Testing exclusion restrictions
A test of exclusion restrictions asks whether a group of independent variables jointly has no effect on the dependent variable; it does not apply to hypotheses involving different dependent variables. The F statistic is useful for testing the exclusion of a group of variables, especially when those variables are highly correlated.

The unrestricted model with k independent variables is y = β0 + β1x1 + … + βkxk + u, which has k + 1 parameters. Suppose we have q exclusion restrictions to test, and that the q excluded variables are the last q of the regressors: x_{k−q+1}, …, x_k. The restricted model is then y = β0 + β1x1 + … + β_{k−q}x_{k−q} + u.

The null hypothesis is H0: β_{k−q+1} = 0, …, β_k = 0; the alternative is that at least one of the listed parameters is nonzero.

The F statistic is defined as F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)], where SSR_r is the sum of squared residuals from the restricted model and SSR_ur that from the unrestricted model. Because SSR_r can be no smaller than SSR_ur, the F statistic is always nonnegative. q = df_r − df_ur is the difference in degrees of freedom between the restricted and unrestricted models, that is, the number of restrictions; n − k − 1 = df_ur is the denominator degrees of freedom, and the denominator of F is an unbiased estimator of σ² = Var(u) in the unrestricted model.

Under the CLM assumptions, the F statistic has an F distribution with (q, n − k − 1) degrees of freedom under H0: F ~ F_{q, n−k−1}. If F exceeds the critical value at the chosen significance level, we reject H0 in favor of H1; when H0 is rejected, we say that x_{k−q+1}, …, x_k are jointly statistically significant (or jointly significant) at that level.
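The F statistic above can be computed directly from the two residual sums of squares. Below is a hedged sketch on simulated data (my own, not from the notes), testing q = 2 exclusion restrictions in a model with k = 3 regressors, where H0 is true by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, q = 200, 3, 2
X = rng.normal(size=(n, k))
# True model: only x1 matters, so H0: beta2 = beta3 = 0 holds here.
y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=n)

def ssr(Z, y):
    Zd = np.column_stack([np.ones(len(y)), Z])  # add intercept
    b, *_ = np.linalg.lstsq(Zd, y, rcond=None)
    resid = y - Zd @ b
    return resid @ resid

ssr_ur = ssr(X, y)                  # unrestricted: x1, x2, x3
ssr_r = ssr(X[:, :k - q], y)        # restricted: last q regressors dropped
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(F)   # compare with the F(2, 196) 5% critical value, about 3.04
```

Since SSR_r ≥ SSR_ur always holds, F is nonnegative, exactly as the text states.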





Note: any linear combination of β̂1, β̂2, …, β̂k is also normally distributed, and any subset of the β̂j has a joint normal distribution.

Key Point: Testing a Hypothesis about a Single Population Parameter: the t Test ★★★★
1. The population regression function
The population model has the form y = β0 + β1x1 + … + βkxk + u. Assuming this model satisfies the CLM assumptions, the OLS estimators of the βj are unbiased.

Wooldridge, Introductory Econometrics (6th ed.) — Answers, Chapter 11


CHAPTER 11

TEACHING NOTES

Much of the material in this chapter is usually postponed, or not covered at all, in an introductory course. However, as Chapter 10 indicates, the set of time series applications that satisfy all of the classical linear model assumptions might be very small. In my experience, spurious time series regressions are the hallmark of many student projects that use time series data. Therefore, students need to be alerted to the dangers of using highly persistent processes in time series regression equations. (The spurious regression problem and the notion of cointegration are covered in detail in Chapter 18.)

It is fairly easy to heuristically describe the difference between a weakly dependent process and an integrated process. Using the MA(1) and the stable AR(1) examples is usually sufficient.

When the data are weakly dependent and the explanatory variables are contemporaneously exogenous, OLS is consistent. This result has many applications, including the stable AR(1) regression model. When we add the appropriate homoskedasticity and no serial correlation assumptions, the usual test statistics are asymptotically valid.

The random walk process is a good example of a unit root (highly persistent) process. In a one-semester course, the issue comes down to whether or not to first difference the data before specifying the linear model. While unit root tests are covered in Chapter 18, just computing the first-order autocorrelation is often sufficient, perhaps after detrending. The examples in Section 11.3 illustrate how different first-difference results can be from estimating equations in levels. Section 11.4 is novel in an introductory text, and simply points out that, if a model is dynamically complete in a well-defined sense, it should not have serial correlation. Therefore, we need not worry about serial correlation when, say, we test the efficient market hypothesis.
Section 11.5 further investigates the homoskedasticity assumption and, in a time series context, emphasizes that what is contained in the explanatory variables determines what kind of heteroskedasticity is ruled out by the usual OLS inference. These two sections could be skipped without loss of continuity.

SOLUTIONS TO PROBLEMS

11.1 Because of covariance stationarity, γ0 = Var(x_t) does not depend on t, so sd(x_{t+h}) = √γ0 for any h ≥ 0. By definition, Corr(x_t, x_{t+h}) = Cov(x_t, x_{t+h})/[sd(x_t)⋅sd(x_{t+h})] = γ_h/(√γ0⋅√γ0) = γ_h/γ0.

11.2 (i) E(x_t) = E(e_t) − (1/2)E(e_{t−1}) + (1/2)E(e_{t−2}) = 0 for t = 1, 2, … Also, because the e_t are independent, they are uncorrelated, and so Var(x_t) = Var(e_t) + (1/4)Var(e_{t−1}) + (1/4)Var(e_{t−2}) = 1 + (1/4) + (1/4) = 3/2, because Var(e_t) = 1 for all t.

(ii) Because x_t has zero mean, Cov(x_t, x_{t+1}) = E(x_t x_{t+1}) = E[(e_t − (1/2)e_{t−1} + (1/2)e_{t−2})(e_{t+1} − (1/2)e_t + (1/2)e_{t−1})] = E(e_t e_{t+1}) − (1/2)E(e_t²) + (1/2)E(e_t e_{t−1}) − (1/2)E(e_{t−1}e_{t+1}) + (1/4)E(e_{t−1}e_t) − (1/4)E(e_{t−1}²) + (1/2)E(e_{t−2}e_{t+1}) − (1/4)E(e_{t−2}e_t) + (1/4)E(e_{t−2}e_{t−1}) = −(1/2)E(e_t²) − (1/4)E(e_{t−1}²) = −(1/2) − (1/4) = −3/4; the third-to-last equality follows because the e_t are pairwise uncorrelated and E(e_t²) = 1 for all t. Using Problem 11.1 and the variance calculation from part (i), Corr(x_t, x_{t+1}) = (−3/4)/(3/2) = −1/2.

Computing Cov(x_t, x_{t+2}) is even easier because only one of the nine terms has expectation different from zero: (1/2)E(e_t²) = 1/2. Therefore, Corr(x_t, x_{t+2}) = (1/2)/(3/2) = 1/3.

(iii) Corr(x_t, x_{t+h}) = 0 for h > 2 because, for h > 2, x_{t+h} depends at most on e_{t+j} for j > 0, while x_t depends on e_{t+j}, j ≤ 0.

(iv) Yes, because terms more than two periods apart are actually uncorrelated, and so it is obvious that Corr(x_t, x_{t+h}) → 0 as h → ∞.

11.3 (i) E(y_t) = E(z + e_t) = E(z) + E(e_t) = 0. Var(y_t) = Var(z + e_t) = Var(z) + Var(e_t) + 2Cov(z, e_t) = σ_z² + σ_e² + 2⋅0 = σ_z² + σ_e².
Neither of these depends on t.

(ii) We assume h > 0; when h = 0 we obtain Var(y_t). Then Cov(y_t, y_{t+h}) = E(y_t y_{t+h}) = E[(z + e_t)(z + e_{t+h})] = E(z²) + E(z e_{t+h}) + E(e_t z) + E(e_t e_{t+h}) = E(z²) = σ_z², because {e_t} is an uncorrelated sequence (it is an independent sequence) and z is uncorrelated with e_t for all t. From part (i) we know that E(y_t) and Var(y_t) do not depend on t, and we have shown that Cov(y_t, y_{t+h}) depends on neither t nor h. Therefore, {y_t} is covariance stationary.

(iii) From Problem 11.1 and parts (i) and (ii), Corr(y_t, y_{t+h}) = Cov(y_t, y_{t+h})/Var(y_t) = σ_z²/(σ_z² + σ_e²) > 0.

(iv) No. The correlation between y_t and y_{t+h} is the same positive value obtained in part (iii) no matter how large h is. In other words, no matter how far apart y_t and y_{t+h} are, their correlation is always the same. Of course, the persistent correlation across time is due to the presence of the time-constant variable, z.

11.4 Assuming y_0 = 0 is a special case of assuming y_0 nonrandom, and so we can obtain the variances from (11.21): Var(y_t) = σ_e²t and Var(y_{t+h}) = σ_e²(t + h), h > 0. Because E(y_t) = 0 for all t (since E(y_0) = 0), Cov(y_t, y_{t+h}) = E(y_t y_{t+h}) and, for h > 0,

E(y_t y_{t+h}) = E[(e_t + e_{t−1} + … + e_1)(e_{t+h} + e_{t+h−1} + … + e_1)] = E(e_t²) + E(e_{t−1}²) + … + E(e_1²) = σ_e²t,

where we have used the fact that {e_t} is a pairwise uncorrelated sequence. Therefore, Corr(y_t, y_{t+h}) = Cov(y_t, y_{t+h})/[sd(y_t)⋅sd(y_{t+h})] = σ_e²t/[σ_e√t ⋅ σ_e√(t + h)] = √(t/(t + h)).

11.5 (i) (The graph of the estimated lag distribution is omitted here.) Wage inflation has its largest effect on price inflation nine months later. The smallest effect is at the twelfth lag, which hopefully indicates (but does not guarantee) that we have accounted for enough lags of gwage in the FDL model.

(ii) Lags two, three, and twelve have t statistics less than two. The other lags are statistically significant at the 5% level against a two-sided alternative.
(Assuming either that the CLM assumptions hold for exact tests, or that Assumptions TS.1′ through TS.5′ hold for asymptotic tests.)

(iii) The estimated LRP is just the sum of the lag coefficients from zero through twelve: 1.172. While this is greater than one, it is not much greater, and the difference from unity could be due to sampling error.

(iv) The underlying model and the estimated equation can be written with intercept α0 and lag coefficients δ0, δ1, …, δ12. Denote the LRP by θ0 = δ0 + δ1 + … + δ12. Now, we can write δ0 = θ0 − δ1 − δ2 − … − δ12. If we plug this into the FDL model we obtain (with y_t = gprice_t and z_t = gwage_t)

y_t = α0 + (θ0 − δ1 − δ2 − … − δ12)z_t + δ1z_{t−1} + δ2z_{t−2} + … + δ12z_{t−12} + u_t
    = α0 + θ0z_t + δ1(z_{t−1} − z_t) + δ2(z_{t−2} − z_t) + … + δ12(z_{t−12} − z_t) + u_t.

Therefore, we regress y_t on z_t, (z_{t−1} − z_t), (z_{t−2} − z_t), …, (z_{t−12} − z_t) and obtain the coefficient and standard error on z_t as the estimated LRP and its standard error.

(v) We would add lags 13 through 18 of gwage_t to the equation, which leaves 273 − 6 = 267 observations. Now, we are estimating 20 parameters, so the df in the unrestricted model is df_ur = 267 − 20 = 247. Let R²_ur be the R-squared from this regression. To obtain the restricted R-squared, R²_r, we need to reestimate the model reported in the problem but with the same 267 observations used to estimate the unrestricted model. Then F = [(R²_ur − R²_r)/(1 − R²_ur)](247/6). We would find the critical value from the F_{6,247} distribution.

[Instructor's Note: As a computer exercise, you might have the students test whether all 13 lag coefficients in the population model are equal. The restricted regression is gprice on (gwage + gwage_{−1} + gwage_{−2} + … + gwage_{−12}), and the R-squared form of the F test, with 12 and 259 df, can be used.]

11.6 (i) The t statistic for H0: β1 = 1 is t = (1.104 − 1)/.039 ≈ 2.67. Although we must rely on asymptotic results, we might as well use df = 120 in Table G.2.
So the 1% critical value against a two-sided alternative is about 2.62, and so we reject H0: β1 = 1 against H1: β1 ≠ 1 at the 1% level. It is hard to know whether the estimate is practically different from one without comparing investment strategies based on the theory (β1 = 1) and the estimate (β̂1 = 1.104). But the estimate is 10% higher than the theoretical value.

(ii) The t statistic for the null in part (i) is now (1.053 − 1)/.039 ≈ 1.36, so H0: β1 = 1 is no longer rejected against a two-sided alternative unless we are using more than a 10% significance level. But the lagged spread is very significant (contrary to what the expectations hypothesis predicts): t = .480/.109 ≈ 4.40. Based on the estimated equation, when the lagged spread is positive, the predicted holding yield on six-month T-bills is above the yield on three-month T-bills (even if we impose β1 = 1), and so we should invest in six-month T-bills.

(iii) This suggests unit root behavior for {hy3_t}, which generally invalidates the usual t-testing procedure.

(iv) We would include three quarterly dummy variables, say Q2_t, Q3_t, and Q4_t, and do an F test for joint significance of these variables. (The F distribution would have 3 and 117 df.)

11.7 (i) We plug the first equation into the second to get

y_t − y_{t−1} = λ(γ0 + γ1x_t + e_t − y_{t−1}) + a_t,

and, rearranging,

y_t = λγ0 + (1 − λ)y_{t−1} + λγ1x_t + a_t + λe_t ≡ β0 + β1y_{t−1} + β2x_t + u_t,

where β0 ≡ λγ0, β1 ≡ (1 − λ), β2 ≡ λγ1, and u_t ≡ a_t + λe_t.

(ii) An OLS regression of y_t on y_{t−1} and x_t produces consistent, asymptotically normal estimators of the βj. Under E(e_t|x_t, y_{t−1}, x_{t−1}, …) = E(a_t|x_t, y_{t−1}, x_{t−1}, …) = 0 it follows that E(u_t|x_t, y_{t−1}, x_{t−1}, …) = 0, which means that the model is dynamically complete [see equation (11.37)]. Therefore, the errors are serially uncorrelated.
If the homoskedasticity assumption Var(u_t|x_t, y_{t−1}) = σ² holds, then the usual standard errors, t statistics, and F statistics are asymptotically valid.

(iii) Because β1 = (1 − λ), if β̂1 = .7 then λ̂ = .3. Further, β̂2 = λ̂γ̂1, or γ̂1 = β̂2/λ̂ = .2/.3 ≈ .67.

11.8 (i) Sequential exogeneity does not rule out correlation between, say, u_{t−1} and x_{tj} for any regressors j = 1, 2, …, k. The differencing generally induces correlation between the differenced errors and the differenced regressors. To see why, consider a single explanatory variable, x_t. Then ∆u_t = u_t − u_{t−1} and ∆x_t = x_t − x_{t−1}. Under sequential exogeneity, u_t is uncorrelated with x_t and x_{t−1}, and u_{t−1} is uncorrelated with x_{t−1}. But u_{t−1} can be correlated with x_t, which means that ∆u_t and ∆x_t are generally correlated. In fact, under sequential exogeneity, it is always true that Cov(∆x_t, ∆u_t) = −Cov(x_t, u_{t−1}).

(ii) Strict exogeneity of the regressors in the original equation is sufficient for OLS on the first-differenced equation to be consistent. Remember, strict exogeneity implies that the regressors in any time period are uncorrelated with the errors in any time period. Of course, we could make the weaker assumption: for any t, u_t is uncorrelated with x_{t−1,j}, x_{tj}, and x_{t+1,j} for all j = 1, 2, …, k. The strengthening beyond sequential exogeneity is the assumption that u_t is uncorrelated with all of next period's outcomes on all regressors. In practice, this is probably similar to just assuming strict exogeneity.

(iii) If we assume sequential exogeneity in a static model, the condition can be written as …
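Two of the results above are easy to spot-check by simulation. The sketch below (my own, on simulated data, not part of the solutions) verifies the MA(2) autocorrelations of Problem 11.2 (−1/2 at lag one, 1/3 at lag two, 0 beyond) and the random-walk correlation Corr(y_t, y_{t+h}) = √(t/(t + h)) of Problem 11.4.

```python
import numpy as np

rng = np.random.default_rng(5)

# Problem 11.2: x_t = e_t - e_{t-1}/2 + e_{t-2}/2, with e_t iid N(0,1).
T = 2_000_000
e = rng.normal(0, 1, T + 2)
x = e[2:] - 0.5 * e[1:-1] + 0.5 * e[:-2]

def corr(h):
    return np.corrcoef(x[:-h], x[h:])[0, 1]

print(np.var(x), corr(1), corr(2), corr(3))   # ≈ 1.5, -0.5, 0.333, 0

# Problem 11.4: random walk y_s = e_1 + ... + e_s, many replications.
reps, t, h = 200_000, 10, 30
y = np.cumsum(rng.normal(0, 1, (reps, t + h)), axis=1)
r = np.corrcoef(y[:, t - 1], y[:, t + h - 1])[0, 1]
print(r, np.sqrt(t / (t + h)))                # both ≈ 0.5
```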

Wooldridge, Introductory Econometrics (6th ed.) — Notes and End-of-Chapter Answers


Chapter 1  The Nature of Econometrics and Economic Data

1.1 Review Notes

Key Point 1: Econometrics ★

1. The meaning of econometrics
Econometrics, also known as economic measurement, is a branch of economics formed by combining economic theory, statistics, and mathematics; its subject matter is the quantitative relationships that objectively exist in economic phenomena.

2. Econometric models
(1) Classification of models
A model is a description and simulation of real-world phenomena. Models are classified according to the method of description and simulation used, as shown in Table 1-1.

(2) Differences between mathematical economic models and econometric models
① Different subject matter: a mathematical economic model studies the theoretical relationships among the factors of an economic phenomenon, whereas an econometric model studies their quantitative relationships.
② Different method of description and simulation: a mathematical economic model mainly uses deterministic mathematical forms, whereas an econometric model mainly uses stochastic mathematical forms.
③ Different place and role: a mathematical economic model can serve a preliminary study of the research object, whereas an econometric model can serve its in-depth study.

Key Point 2: Economic Data ★★★

1. The structure of economic data (see Table 1-3)
2. Panel data versus pooled cross sections (see Table 1-4)

Key Point 3: Causality and Ceteris Paribus ★★

1. Causality
Causality means that a change in one variable brings about a change in another variable; establishing it is one of the important goals of economic analysis. Although econometric analysis can uncover correlations between variables, interpreting them causally also requires ruling out the possibility of reverse causation in the model itself; otherwise the conclusion is hard to believe.

2. Ceteris paribus
Ceteris paribus means holding all other variables fixed in an economic analysis. The ceteris paribus assumption plays an important role in causal analysis.

1.2 Detailed Answers to End-of-Chapter Problems

I. Problems

1. Suppose you are asked to direct a study to determine whether smaller class sizes improve the performance of fourth graders.

(i) If you could direct any experiment you wished, what would you do? Be specific.

(ii) More realistically, suppose you can collect observational data on several thousand fourth graders in a given state. You can obtain their fourth-grade class sizes and their standardized test scores at the end of fourth grade. Why might you expect a negative correlation between class size and test scores?

(iii) Would a negative correlation necessarily mean that smaller class sizes lead to better performance? Explain.

Answer: (i) Suppose students could be randomly assigned to classes of different sizes; that is, each student is assigned to a class without regard to student characteristics such as ability and family background.

Wooldridge, Introductory Econometrics (6th ed.) — Answers, Chapter 5


CHAPTER 5

TEACHING NOTES

Chapter 5 is short, but it is conceptually more difficult than the earlier chapters, primarily because it requires some knowledge of asymptotic properties of estimators. In class, I give a brief, heuristic description of consistency and asymptotic normality before stating the consistency and asymptotic normality of OLS. (Conveniently, the same assumptions that work for finite sample analysis work for asymptotic analysis.) More advanced students can follow the proof of consistency of the slope coefficient in the bivariate regression case. Section E.4 contains a full matrix treatment of asymptotic analysis appropriate for a master's level course.

An explicit illustration of what happens to standard errors as the sample size grows emphasizes the importance of having a larger sample. I do not usually cover the LM statistic in a first-semester course, and I only briefly mention the asymptotic efficiency result. Without full use of matrix algebra combined with limit theorems for vectors and matrices, it is difficult to prove asymptotic efficiency of OLS.

I think the conclusions of this chapter are important for students to know, even though they may not fully grasp the details. On exams I usually include true-false type questions, with explanation, to test the students' understanding of asymptotics. [For example: "In large samples we do not have to worry about omitted variable bias." (False). Or "Even if the error term is not normally distributed, in large samples we can still compute approximately valid confidence intervals under the Gauss-Markov assumptions." (True).]

SOLUTIONS TO PROBLEMS

5.1 Write y = β0 + β1x1 + u, and take the expected value: E(y) = β0 + β1E(x1) + E(u), or µ_y = β0 + β1µ_x since E(u) = 0, where µ_y = E(y) and µ_x = E(x1). We can rewrite this as β0 = µ_y − β1µ_x. Now, β̂0 = ȳ − β̂1x̄1.
Taking the plim of this we have plim(β̂0) = plim(ȳ − β̂1x̄1) = plim(ȳ) − plim(β̂1)⋅plim(x̄1) = µ_y − β1µ_x = β0, where we use the fact that plim(ȳ) = µ_y and plim(x̄1) = µ_x by the law of large numbers, and plim(β̂1) = β1. We have also used parts of Property PLIM.2 from Appendix C.

5.2 A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ1 > 0: plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.

5.3 The variable cigs has nothing close to a normal distribution in the population. Most people do not smoke, so cigs = 0 for over half of the population. A normally distributed random variable takes on no particular value with positive probability. Further, the distribution of cigs is skewed, whereas a normal random variable must be symmetric about its mean.

5.4 Write y = β0 + β1x + u, and take the expected value: E(y) = β0 + β1E(x) + E(u), or µ_y = β0 + β1µ_x, since E(u) = 0, where µ_y = E(y) and µ_x = E(x). We can rewrite this as β0 = µ_y − β1µ_x. Now, β̂0 = ȳ − β̂1x̄. Taking the plim of this we have plim(β̂0) = plim(ȳ − β̂1x̄) = plim(ȳ) − plim(β̂1)⋅plim(x̄) = µ_y − β1µ_x, where we use the fact that plim(ȳ) = µ_y and plim(x̄) = µ_x by the law of large numbers, and plim(β̂1) = β1. We have also used parts of Property PLIM.2 from Appendix C.

SOLUTIONS TO COMPUTER EXERCISES

C5.1 (i) The estimated equation is

wage = −2.87 + .599 educ + .022 exper + .169 tenure
       (0.73)  (.051)      (.012)       (.022)
n = 526, R² = .306, σ̂ = 3.085.

Below is a histogram of the 526 residuals, ûᵢ, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations.
For comparison, the normal distribution that provides the best fit to the histogram is also plotted.

(ii) The estimated equation in logs is

log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
            (.104) (.007)      (.0017)       (.003)
n = 526, R² = .316, σ̂ = .441.

The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below.

(iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.

C5.2 (i) The regression with all 4,137 observations is

colgpa = 1.392 − .01352 hsperc + .00148 sat
         (0.072) (.00055)        (.00007)
n = 4,137, R² = .273.

(ii) Using only the first 2,070 observations gives

colgpa = 1.436 − .01275 hsperc + .00147 sat
         (0.098) (.00072)        (.00009)
n = 2,070, R² = .283.

(iii) The ratio of the standard error using 2,070 observations to that using 4,137 observations is about 1.31. From (5.10) we compute √(4,137/2,070) ≈ 1.41, which is somewhat above the actual ratio.

C5.3 We first run the regression colgpa on cigs, parity, and faminc using only the 1,191 observations with nonmissing observations on motheduc and fatheduc. After obtaining these residuals, ûᵢ, these are regressed on cigsᵢ, parityᵢ, famincᵢ, motheducᵢ, and fatheducᵢ, where, of course, we can only use the 1,191 observations with nonmissing values for both motheduc and fatheduc. The R-squared from this regression, R²_û, is about .0024. With 1,191 observations, the chi-square statistic is (1,191)(.0024) ≈ 2.86.
The p-value from the χ²₂ distribution is about .239, which is very close to .242, the p-value for the comparable F test.

C5.4 (i) The measure of skewness for inc is about 1.86. When we use log(inc), the skewness measure is about .360. Therefore, there is much less skewness in the log of income, which means inc is less likely to be normally distributed. (In fact, the skewness in income distributions is a well-documented fact across many countries and time periods.)

(ii) The skewness for bwght is about −.60. When we use log(bwght), the skewness measure is about −2.95. In this case, there is much more skewness after taking the natural log.

(iii) The example in part (ii) clearly shows that this statement cannot hold generally. It is possible to introduce skewness by taking the natural log. As an empirical matter, for many economic variables, particularly dollar values, taking the log often does help to reduce or eliminate skewness. But it does not have to.

(iv) For the purposes of regression analysis, we should be studying the conditional distributions; that is, the distributions of y and log(y) conditional on the explanatory variables, x₁, ..., x_k. If we think the mean is linear, as in Assumptions MLR.1 and MLR.4, then this is equivalent to studying the distribution of the population error, u. In fact, the skewness measure studied in this question often is applied to the residuals from an OLS regression.

C5.5 (i) The variable educ takes on all integer values from 6 to 20, inclusive. So it takes on 15 distinct values. It is not a continuous random variable, nor does it make sense to think of it as approximately continuous. (Contrast a variable such as hourly wage, which is rounded to two decimal places but takes on so many different values it makes sense to think of it as continuous.)

(ii) With a discrete variable, usually a histogram has bars centered at each outcome, with the height being the fraction of observations taking on the value.
Such a histogram, with a normal distribution overlay, is given below.

(iii) Even discounting the discreteness, the best-fitting normal distribution (matching the sample mean and variance) fits poorly. The focal point at educ = 12 clearly violates the notion of a smooth bell-shaped density.

(iv) Given the findings in part (iii), the error term in the equation

educ = β0 + β1 motheduc + β2 fatheduc + β3 abil + β4 abil² + u

cannot have a normal distribution independent of the explanatory variables. Thus, MLR.6 is violated. In fact, the inequality educ ≥ 0 means that u is not even free to vary over all values given motheduc, fatheduc, and abil. (It is likely that the homoskedasticity assumption fails, too, but this is less clear and does not follow from the nature of educ.)

(v) The violation of MLR.6 means that we cannot perform exact statistical inference; we must rely on asymptotic analysis. This in itself does not change how we perform statistical inference: without normality, we use exactly the same methods, but we must be aware that our inference holds only approximately.
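The skewness comparison in C5.4 is easy to replicate with simulated data. A minimal sketch (pure Python, with a made-up lognormal "income" variable rather than the actual data set) showing that taking logs removes the right skew:

```python
import math
import random

def skewness(xs):
    # sample skewness: average cubed deviation divided by sd cubed
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in xs) / n)
    return sum((v - m) ** 3 for v in xs) / (n * sd ** 3)

random.seed(42)
# log(inc) is drawn as Normal, so inc itself is lognormal (right-skewed)
log_inc = [random.gauss(10.0, 0.8) for _ in range(5000)]
inc = [math.exp(v) for v in log_inc]

skew_inc = skewness(inc)      # large and positive
skew_log = skewness(log_inc)  # close to zero
```

As C5.4(iii) warns, this works here by construction (the variable is lognormal); for other variables, such as bwght, taking the log can increase skewness.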

Wooldridge, Introductory Econometrics (6th ed.): Review Notes and Solutions — Multiple Regression Analysis: Inference


Chapter 4 Multiple Regression Analysis: Inference

4.1 Review Notes

Topic 1: The sampling distributions of the OLS estimators ★★★

1. Assumption MLR.6 (Normality): the population error u is independent of all the explanatory variables and is normally distributed with zero mean and variance σ², that is, u ~ Normal(0, σ²).

For cross-sectional regression applications, Assumptions MLR.1 through MLR.6 are called the classical linear model assumptions.

The model under these assumptions is called the classical linear model (CLM).

2. Using the central limit theorem (CLT): when the sample size is large, u is approximately normally distributed.

How well the normal approximation works depends on how many factors u contains and how much their distributions differ.

However, the CLT argument presumes that all the unobserved factors affect y in a separate, additive fashion.

When u is a complicated function of the unobserved factors, the CLT argument may not apply.

3. Normal sampling distributions of the OLS estimators. Theorem 4.1 (Normal sampling distributions): under the CLM assumptions MLR.1 through MLR.6, conditional on the sample values of the independent variables, β̂j ~ Normal(βj, Var(β̂j)).

Standardizing the normal distribution gives: (β̂j − βj)/sd(β̂j) ~ Normal(0, 1).

Note: any linear combination of β̂1, β̂2, …, β̂k is also normally distributed, and any subset of the β̂j has a joint normal distribution.

Topic 2: Testing hypotheses about a single population parameter: the t test ★★★★

1. The population regression function. The population model has the form y = β0 + β1x1 + … + βkxk + u.

Assuming this model satisfies the CLM assumptions, the OLS estimators of the βj are unbiased.

2. Theorem 4.2 (t distribution for the standardized estimators): under the CLM assumptions MLR.1 through MLR.6, (β̂j − βj)/se(β̂j) ~ t_{n−k−1}, where k + 1 is the number of unknown parameters in the population model (k slope parameters and the intercept β0).

The t statistic follows a t distribution rather than the standard normal distribution because the constant σ in sd(β̂j) has been replaced by the random variable σ̂ in se(β̂j).

The t statistic can be written as the ratio of the standard normal random variable (β̂j − βj)/sd(β̂j) to the square root of σ̂²/σ²; the two can be shown to be independent, and (n − k − 1)σ̂²/σ² ~ χ²_{n−k−1}.
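The claim that (n − k − 1)σ̂²/σ² behaves like a chi-square random variable with n − k − 1 degrees of freedom can be checked by simulation. A minimal sketch (pure Python, simple regression so k = 1, with made-up parameter values):

```python
import random

random.seed(1)
n, sigma = 10, 2.0
ratios = []
for _ in range(2000):
    # simulate y = 1 + 0.5 x + u with u ~ Normal(0, sigma^2)
    x = [random.gauss(5, 2) for _ in range(n)]
    u = [random.gauss(0, sigma) for _ in range(n)]
    y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)              # unbiased estimator of sigma^2
    ratios.append((n - 2) * sigma2_hat / sigma ** 2)

# a chi-square with n - 2 = 8 degrees of freedom has mean 8
mean_ratio = sum(ratios) / len(ratios)
```

The simulated mean of (n − 2)σ̂²/σ² lands near 8, consistent with the chi-square claim above.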

Wooldridge, Introductory Econometrics (6th ed.): Review Notes and Solutions — More on Specification and Data Issues


Chapter 9 More on Specification and Data Issues

9.1 Review Notes

Topic 1: Tests for functional form misspecification (see Table 9-1) ★★★★

Topic 2: Using proxy variables for unobserved explanatory variables ★★★

1. Proxy variables. A proxy variable is a variable that is correlated with an unobserved variable that we would like to control for in the analysis.

(1) The plug-in solution to the omitted variables problem. Suppose that in a model with three independent variables, two are observed, while the explanatory variable x3* is unobserved: y = β0 + β1x1 + β2x2 + β3x3* + u.

But we have a proxy variable for x3*, namely x3, with x3* = δ0 + δ3x3 + v3.

Here x3* and x3 are positively related, so δ3 > 0; the intercept δ0 allows x3* and x3 to be measured on different scales.

We act as if x3 were x3* and run the regression of y on x1, x2, x3, hoping that using x3 delivers unbiased (or at least consistent) estimators of β1 and β2.

Because we simply substitute x3 for x3* before running OLS, this is called the plug-in solution to the omitted variables problem.

A proxy variable can also come in the form of binary information.

(2) Assumptions needed for the plug-in solution to deliver consistent estimators (see Table 9-2).

2. Using a lagged dependent variable as a proxy variable. To control for unobserved factors, one can use the lagged dependent variable as a proxy variable; this approach is well suited to policy analysis.

It accounts for current differences that are hard to explain in other ways.

Using a lagged dependent variable is not the only way to control for omitted variables, but the approach is useful for estimating the effects of policy variables.

Topic 3: Random slope models ★★★

1. Definition of a random slope model. If the partial effect of a variable depends on unobserved factors that vary across population units, and there is a single explanatory variable x, we can write this general model as: yi = ai + bixi.

This model is sometimes called a random coefficient model or a random slope model.

For this model, write ai = a + ci and bi = β + di, so that E(ci) = 0 and E(di) = 0; substituting into the model gives yi = a + βxi + ui, where ui = ci + dixi.

2. Conditions for OLS to be unbiased (consistent). (1) Simple regression: when ui = ci + dixi, a sufficient condition for unbiasedness is E(ci|xi) = E(ci) = 0 and E(di|xi) = E(di) = 0.
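A quick simulation illustrates the claim: when the slope heterogeneity di is drawn independently of xi, OLS applied to yi = a + βxi + ui with ui = ci + dixi centers on the average slope β. A minimal sketch with made-up values (a = 1, β = 2):

```python
import random

random.seed(7)
slopes = []
for _ in range(500):
    n = 200
    x = [random.uniform(0, 4) for _ in range(n)]
    # heterogeneous intercepts (1 + c_i) and slopes (2 + d_i), drawn
    # independently of x, so E(c_i|x_i) = E(d_i|x_i) = 0 holds
    y = [(1 + random.gauss(0, 1)) + (2 + random.gauss(0, 0.5)) * xi
         for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    slopes.append(b1)

# the OLS slope varies across samples but averages close to beta = 2
avg_slope = sum(slopes) / len(slopes)
```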

Wooldridge, Introductory Econometrics (6th ed.): Review Notes and Solutions — Advanced Panel Data Methods


Chapter 14 Advanced Panel Data Methods

14.1 Review Notes

Topic 1: Fixed effects estimation ★★★★★

1. The fixed effects transformation. The fixed effects transformation is also called the within transformation. Consider a model with a single explanatory variable: for each i,

y_it = β1 x_it + a_i + u_it, t = 1, 2, …, T.

Averaging this equation over time for each i gives

ȳ_i = β1 x̄_i + a_i + ū_i,

where ȳ_i = T⁻¹ Σ_{t=1}^T y_it (the average over time).

Because a_i is fixed over time, it appears in both the original equation and the time-averaged equation. Subtracting the second equation from the first for each t gives

y_it − ȳ_i = β1(x_it − x̄_i) + (u_it − ū_i), t = 1, 2, …, T,

or

ÿ_it = β1 ẍ_it + ü_it, t = 1, 2, …, T,

where ÿ_it = y_it − ȳ_i is the time-demeaned data on y; ẍ_it and ü_it are interpreted similarly.

The important point about this equation is that the unobserved effect a_i has disappeared, so pooled OLS can be used to estimate ÿ_it = β1 ẍ_it + ü_it, t = 1, 2, …, T.

The pooled OLS estimator of this equation is called the fixed effects estimator or the within estimator.

The between estimator is obtained as the OLS estimator of the cross-sectional equation ȳ_i = β1 x̄_i + a_i + ū_i; that is, a cross-sectional regression of the time averages of y on the time averages of x.

If a_i is correlated with x̄_i, this estimator is biased.

If instead we believe a_i is uncorrelated with x_it, it is better to use the random effects estimator.

The between estimator ignores how the variables change over time.

Adding more explanatory variables to the equation changes nothing essential.

2. The fixed effects model. (1) Unbiasedness. Starting from the original unobserved effects model, subtracting the time averages from every variable yields the fixed effects equation.

Unbiasedness of the fixed effects estimator rests on a strict exogeneity assumption: the idiosyncratic errors u_it must be uncorrelated with each explanatory variable in every time period.
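The within transformation can be verified on a toy panel. In this sketch (made-up numbers, with β1 = 1.5 and no idiosyncratic error), a_i is strongly correlated with x_it, so pooled OLS on the levels is badly biased, but OLS on the time-demeaned data recovers β1 exactly:

```python
# toy panel: 3 units, T = 3; a_i is larger for units with larger x
alpha = [0.0, 5.0, 10.0]
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[1.5 * xit + alpha[i] for xit in x[i]] for i in range(3)]

# pooled OLS on the levels picks up the correlation between a_i and x_it
xf = [v for row in x for v in row]
yf = [v for row in y for v in row]
xm, ym = sum(xf) / 9, sum(yf) / 9
beta_pooled = sum((u - xm) * (v - ym) for u, v in zip(xf, yf)) / \
              sum((u - xm) ** 2 for u in xf)

# within transformation: subtract each unit's time average
xdd, ydd = [], []
for i in range(3):
    xbar_i, ybar_i = sum(x[i]) / 3, sum(y[i]) / 3
    xdd += [v - xbar_i for v in x[i]]
    ydd += [v - ybar_i for v in y[i]]
beta_fe = sum(u * v for u, v in zip(xdd, ydd)) / sum(u * u for u in xdd)
```

Here beta_pooled overstates the effect (it equals 3.0 for these numbers), while beta_fe returns the true 1.5 because the demeaning wipes out a_i.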

Wooldridge Econometrics: Answers

CONTENTS

PREFACE

SUGGESTED COURSE OUTLINES

Chapter 1 The Nature of Econometrics and Economic Data
Chapter 2 The Simple Regression Model
Chapter 3 Multiple Regression Analysis: Estimation
Chapter 4 Multiple Regression Analysis: Inference
Chapter 5 Multiple Regression Analysis: OLS Asymptotics
Chapter 6 Multiple Regression Analysis: Further Issues
The solutions to the computer exercises were obtained using Stata, starting with version 4.0 and running through version 7.0. Nevertheless, almost all of the estimation methods covered in the text have been standardized, and different econometrics or statistical packages should give the same answers. There can be differences when applying more advanced techniques, as conventions sometimes differ on how to choose or estimate auxiliary parameters. (Examples include heteroskedasticity-robust standard errors, estimates of a random effects model, and corrections for sample selection bias.)

Wooldridge Econometrics (6th ed.) Answers: Appendix B


APPENDIX B

SOLUTIONS TO PROBLEMS

B.1 Before the student takes the SAT exam, we do not know – nor can we predict with certainty – what the score will be. The actual score depends on numerous factors, many of which we cannot even list, let alone know ahead of time. (The student's innate ability, how the student feels on exam day, and which particular questions were asked, are just a few.) The eventual SAT score clearly satisfies the requirements of a random variable.

B.2 (i) P(X ≤ 6) = P[(X − 5)/2 ≤ (6 − 5)/2] = P(Z ≤ .5) ≈ .692, where Z denotes a Normal(0,1) random variable. [We obtain P(Z ≤ .5) from Table G.1.]

(ii) P(X > 4) = P[(X − 5)/2 > (4 − 5)/2] = P(Z > −.5) = P(Z ≤ .5) ≈ .692.

(iii) P(|X − 5| > 1) = P(X − 5 > 1) + P(X − 5 < −1) = P(X > 6) + P(X < 4) ≈ (1 − .692) + (1 − .692) = .616, where we have used answers from parts (i) and (ii).

B.3 (i) Let Y_it be the binary variable equal to one if fund i outperforms the market in year t. By assumption, P(Y_it = 1) = .5 (a 50-50 chance of outperforming the market for each fund in each year). Now, for any fund, we are also assuming that performance relative to the market is independent across years. But then the probability that fund i outperforms the market in all 10 years, P(Y_i1 = 1, Y_i2 = 1, …, Y_i,10 = 1), is just the product of the probabilities: P(Y_i1 = 1)⋅P(Y_i2 = 1)⋯P(Y_i,10 = 1) = (.5)¹⁰ = 1/1024 (which is slightly less than .001). In fact, if we define a binary random variable Y_i such that Y_i = 1 if and only if fund i outperformed the market in all 10 years, then P(Y_i = 1) = 1/1024.

(ii) Let X denote the number of funds out of 4,170 that outperform the market in all 10 years. Then X = Y_1 + Y_2 + ⋯ + Y_4,170. If we assume that performance relative to the market is independent across funds, then X has the Binomial(n, θ) distribution with n = 4,170 and θ = 1/1024. We want to compute P(X ≥ 1) = 1 − P(X = 0) = 1 − P(Y_1 = 0, Y_2 = 0, …, Y_4,170 = 0) = 1 − P(Y_1 = 0)⋅P(Y_2 = 0)⋯P(Y_4,170 = 0) = 1 − (1023/1024)⁴¹⁷⁰ ≈ .983.
This means, if performance relative to the market is random and independent across funds, it is almost certain that at least one fund will outperform the market in all 10 years.

(iii) Using the Stata command Binomial(4170,5,1/1024), the answer is about .385. So there is a nontrivial chance that at least five funds will outperform the market in all 10 years.

B.4 We want P(X ≥ .6). Because X is continuous, this is the same as P(X > .6) = 1 − P(X ≤ .6) = 1 − F(.6) = 3(.6)² − 2(.6)³ = .648. One way to interpret this is that almost 65% of all counties have an elderly employment rate of .6 or higher.

B.5 (i) As stated in the hint, if X is the number of jurors convinced of Simpson's innocence, then X ~ Binomial(12, .20). We want P(X ≥ 1) = 1 − P(X = 0) = 1 − (.8)¹² ≈ .931.

(ii) Above, we computed P(X = 0) as about .069. We need P(X = 1), which we obtain from (B.14) with n = 12, θ = .2, and x = 1: P(X = 1) = 12⋅(.2)(.8)¹¹ ≈ .206. Therefore, P(X ≥ 2) ≈ 1 − (.069 + .206) = .725, so there is almost a three in four chance that the jury had at least two members convinced of Simpson's innocence prior to the trial.

B.6 E(X) = ∫₀³ x f(x) dx = ∫₀³ x[(1/9)x²] dx = (1/9)∫₀³ x³ dx. But ∫₀³ x³ dx = (1/4)x⁴|₀³ = 81/4. Therefore, E(X) = (1/9)(81/4) = 9/4, or 2.25 years.

B.7 In eight attempts the expected number of free throws is 8(.74) = 5.92, or about six free throws.

B.8 The weights for the two-, three-, and four-credit courses are 2/9, 3/9, and 4/9, respectively. Let Y_j be the grade in the j-th course, j = 1, 2, and 3, and let X be the overall grade point average. Then X = (2/9)Y₁ + (3/9)Y₂ + (4/9)Y₃ and the expected value is E(X) = (2/9)E(Y₁) + (3/9)E(Y₂) + (4/9)E(Y₃) = (2/9)(3.5) + (3/9)(3.0) + (4/9)(3.0) = (7 + 9 + 12)/9 ≈ 3.11.

B.9 If Y is salary in dollars then Y = 1000⋅X, and so the expected value of Y is 1,000 times the expected value of X, and the standard deviation of Y is 1,000 times the standard deviation of X.
Therefore, the expected value and standard deviation of salary, measured in dollars, are $52,300 and $14,600, respectively.

B.10 (i) E(GPA|SAT = 800) = .70 + .002(800) = 2.3. Similarly, E(GPA|SAT = 1,400) = .70 + .002(1400) = 3.5. The difference in expected GPAs is substantial, but the difference in SAT scores is also rather large.

(ii) Following the hint, we use the law of iterated expectations. Since E(GPA|SAT) = .70 + .002 SAT, the (unconditional) expected value of GPA is .70 + .002 E(SAT) = .70 + .002(1100) = 2.9.
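Several of the Appendix B answers can be verified numerically with nothing but the standard library. A sketch:

```python
import math

# B.2(i): P(X <= 6) for X ~ Normal(5, 4), via the standard normal cdf
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_b2 = phi((6 - 5) / 2)                    # P(Z <= .5), about .692

# B.3(ii): chance at least one of 4,170 funds beats the market 10 years running
p_b3 = 1 - (1023 / 1024) ** 4170           # about .983

# B.6: E(X) = integral of x * (1/9)x^2 over [0, 3], by the midpoint rule
steps = 100000
dx = 3 / steps
e_x = sum(((i + 0.5) * dx) ** 3 / 9 * dx for i in range(steps))   # about 9/4

# B.8: credit-weighted expected GPA
e_gpa = (2 / 9) * 3.5 + (3 / 9) * 3.0 + (4 / 9) * 3.0             # about 3.11
```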

Wooldridge Econometrics (6th ed.) Answers: Chapter 2


CHAPTER 2

TEACHING NOTES

This is the chapter where I expect students to follow most, if not all, of the algebraic derivations. In class I like to derive at least the unbiasedness of the OLS slope coefficient, and usually I derive the variance. At a minimum, I talk about the factors affecting the variance. To simplify the notation, after I emphasize the assumptions in the population model, and assume random sampling, I just condition on the values of the explanatory variables in the sample. Technically, this is justified by random sampling because, for example, E(uᵢ|x₁, x₂, …, xₙ) = E(uᵢ|xᵢ) by independent sampling. I find that students are able to focus on the key assumption SLR.4 and subsequently take my word about how conditioning on the independent variables in the sample is harmless. (If you prefer, the appendix to Chapter 3 does the conditioning argument carefully.) Because statistical inference is no more difficult in multiple regression than in simple regression, I postpone inference until Chapter 4. (This reduces redundancy and allows you to focus on the interpretive differences between simple and multiple regression.)

You might notice how, compared with most other texts, I use relatively few assumptions to derive the unbiasedness of the OLS slope estimator, followed by the formula for its variance. This is because I do not introduce redundant or unnecessary assumptions. For example, once SLR.4 is assumed, nothing further about the relationship between u and x is needed to obtain the unbiasedness of OLS under random sampling.

Incidentally, one of the uncomfortable facts about finite-sample analysis is that there is a difference between an estimator that is unbiased conditional on the outcome of the covariates and one that is unconditionally unbiased.
If the distribution of the xᵢ is such that they can all equal the same value with positive probability – as is the case with discreteness in the distribution – then the unconditional expectation does not really exist. Or, if it is made to exist then the estimator is not unbiased. I do not try to explain these subtleties in an introductory course, but I have had instructors ask me about the difference.

SOLUTIONS TO PROBLEMS

2.1 (i) Income, age, and family background (such as number of siblings) are just a few possibilities. It seems that each of these could be correlated with years of education. (Income and education are probably positively correlated; age and education may be negatively correlated because women in more recent cohorts have, on average, more education; and number of siblings and education are probably negatively correlated.)

(ii) Not if the factors we listed in part (i) are correlated with educ. Because we would like to hold these factors fixed, they are part of the error term. But if u is correlated with educ then E(u|educ) ≠ 0, and so SLR.4 fails.

2.2 In the equation y = β0 + β1x + u, add and subtract α0 from the right hand side to get y = (α0 + β0) + β1x + (u − α0). Call the new error e = u − α0, so that E(e) = 0. The new intercept is α0 + β0, but the slope is still β1.

2.3 (i) Let yᵢ = GPAᵢ, xᵢ = ACTᵢ, and n = 8. Then x̄ = 25.875, ȳ = 3.2125, Σᵢ(xᵢ − x̄)(yᵢ − ȳ) = 5.8125, and Σᵢ(xᵢ − x̄)² = 56.875. From equation (2.9), we obtain the slope as β̂1 = 5.8125/56.875 ≈ .1022, rounded to four places after the decimal. From (2.17), β̂0 = ȳ − β̂1x̄ ≈ 3.2125 − (.1022)25.875 ≈ .5681. So we can write

GPA = .5681 + .1022 ACT
n = 8.

The intercept does not have a useful interpretation because ACT is not close to zero for the population of interest.
If ACT is 5 points higher, GPA increases by .1022(5) = .511.

(ii) The fitted values and residuals — rounded to four decimal places — are given along with the observation number i and GPA in the following table: You can verify that the residuals, as reported in the table, sum to −.0002, which is pretty close to zero given the inherent rounding error.

(iii) When ACT = 20, GPA = .5681 + .1022(20) ≈ 2.61.

(iv) The sum of squared residuals, Σᵢ ûᵢ², is about .4347 (rounded to four decimal places), and the total sum of squares, Σᵢ(yᵢ − ȳ)², is about 1.0288. So the R-squared from the regression is

R² = 1 − SSR/SST ≈ 1 − (.4347/1.0288) ≈ .577.

Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of students.

2.4 (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght = 109.49. This is about an 8.6% drop.

(ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight, and might also be correlated with cigarette smoking.

(iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.524) ≈ −10.18, or about −10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.

(iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0.
The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under predict high birth rates.

2.5 (i) The intercept implies that when inc = 0, cons is predicted to be negative $124.84. This, of course, cannot be true, and reflects the fact that this consumption function might be a poor predictor of consumption at very low-income levels. On the other hand, on an annual basis, $124.84 is not so far from zero.

(ii) Just plug 30,000 into the equation: cons = −124.84 + .853(30,000) = 25,465.16 dollars.

(iii) The MPC and the APC are shown in the following graph. Even though the intercept is negative, the smallest APC in the sample is positive. The graph starts at an annual income level of $1,000 (in 1970 dollars).

2.6 (i) Yes. If living closer to an incinerator depresses housing prices, then being farther away increases housing prices.

(ii) If the city chose to locate the incinerator in an area away from more expensive neighborhoods, then log(dist) is positively correlated with housing quality. This would violate SLR.4, and OLS estimation is biased.

(iii) Size of the house, number of bathrooms, size of the lot, age of the home, and quality of the neighborhood (including school quality), are just a handful of factors. As mentioned in part (ii), these could certainly be correlated with dist [and log(dist)].

2.7 (i) When we condition on inc, √inc becomes a constant. So E(u|inc) = E(√inc⋅e|inc) = √inc⋅E(e|inc) = √inc⋅0 = 0 because E(e|inc) = E(e) = 0.

(ii) Again, when we condition on inc, √inc becomes a constant. So Var(u|inc) = Var(√inc⋅e|inc) = (√inc)²Var(e|inc) = σ²ₑinc because Var(e|inc) = σ²ₑ.

(iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving.
This discretion suggests wider variability in saving among higher income families.

2.8 (i) From equation (2.66),

β̃1 = (Σᵢ xᵢyᵢ) / (Σᵢ xᵢ²).

Plugging in yᵢ = β0 + β1xᵢ + uᵢ gives

β̃1 = [Σᵢ xᵢ(β0 + β1xᵢ + uᵢ)] / (Σᵢ xᵢ²).

After standard algebra, the numerator can be written as

β0 Σᵢ xᵢ + β1 Σᵢ xᵢ² + Σᵢ xᵢuᵢ.

Putting this over the denominator shows we can write β̃1 as

β̃1 = β0 (Σᵢ xᵢ)/(Σᵢ xᵢ²) + β1 + (Σᵢ xᵢuᵢ)/(Σᵢ xᵢ²).

Conditional on the xᵢ, we have

E(β̃1) = β0 (Σᵢ xᵢ)/(Σᵢ xᵢ²) + β1

because E(uᵢ) = 0 for all i. Therefore, the bias in β̃1 is given by the first term in this equation. This bias is obviously zero when β0 = 0. It is also zero when Σᵢ xᵢ = 0, which is the same as x̄ = 0. In the latter case, regression through the origin is identical to regression with an intercept.

(ii) From the last expression for β̃1 in part (i) we have, conditional on the xᵢ,

Var(β̃1) = (Σᵢ xᵢ²)⁻² Var(Σᵢ xᵢuᵢ) = (Σᵢ xᵢ²)⁻² [Σᵢ xᵢ² Var(uᵢ)] = (Σᵢ xᵢ²)⁻² (σ² Σᵢ xᵢ²) = σ² / (Σᵢ xᵢ²).

(iii) From (2.57), Var(β̂1) = σ² / [Σᵢ (xᵢ − x̄)²]. From the hint, Σᵢ xᵢ² ≥ Σᵢ (xᵢ − x̄)², and so Var(β̃1) ≤ Var(β̂1). A more direct way to see this is to write Σᵢ (xᵢ − x̄)² = Σᵢ xᵢ² − n x̄², which is less than Σᵢ xᵢ² unless x̄ = 0.

(iv) For a given sample size, the bias in β̃1 increases as x̄ increases (holding the sum of the xᵢ² fixed). But as x̄ increases, the variance of β̂1 increases relative to Var(β̃1). The bias in β̃1 is also small when β0 is small. Therefore, whether we prefer β̃1 or β̂1 on a mean squared error basis depends on the sizes of β0, x̄, and n (in addition to the size of Σᵢ xᵢ²).

2.9 (i) We follow the hint, noting that the sample average of c₁yᵢ is c₁ȳ (c₁ times the sample average of the yᵢ) and the sample average of c₂xᵢ is c₂x̄.
When we regress c₁yᵢ on c₂xᵢ (including an intercept) we use equation (2.19) to obtain the slope:

β̃1 = [Σᵢ (c₂xᵢ − c₂x̄)(c₁yᵢ − c₁ȳ)] / [Σᵢ (c₂xᵢ − c₂x̄)²] = [c₁c₂ Σᵢ (xᵢ − x̄)(yᵢ − ȳ)] / [c₂² Σᵢ (xᵢ − x̄)²] = (c₁/c₂)β̂1.

From (2.17), we obtain the intercept as β̃0 = (c₁ȳ) − β̃1(c₂x̄) = (c₁ȳ) − [(c₁/c₂)β̂1](c₂x̄) = c₁(ȳ − β̂1x̄) = c₁β̂0, because the intercept from regressing yᵢ on xᵢ is β̂0 = ȳ − β̂1x̄.

(ii) We use the same approach from part (i) along with the fact that the sample average of c₁ + yᵢ is c₁ + ȳ and the sample average of c₂ + xᵢ is c₂ + x̄. Therefore, (c₁ + yᵢ) − (c₁ + ȳ) = yᵢ − ȳ and (c₂ + xᵢ) − (c₂ + x̄) = xᵢ − x̄. So c₁ and c₂ entirely drop out of the slope formula for the regression of (c₁ + yᵢ) on (c₂ + xᵢ), and β̃1 = β̂1. The intercept is β̃0 = (c₁ + ȳ) − β̃1(c₂ + x̄) = (c₁ + ȳ) − β̂1(c₂ + x̄) = (ȳ − β̂1x̄) + c₁ − c₂β̂1 = β̂0 + c₁ − c₂β̂1, which is what we wanted to show.

(iii) We can simply apply part (ii) because log(c₁yᵢ) = log(c₁) + log(yᵢ). In other words, replace c₁ with log(c₁), yᵢ with log(yᵢ), and set c₂ = 0.

(iv) Again, we can apply part (ii) with c₁ = 0 and replacing c₂ with log(c₂) and xᵢ with log(xᵢ). If β̂0 and β̂1 are the original intercept and slope, then β̃1 = β̂1 and β̃0 = β̂0 − β̂1 log(c₂).

2.10 (i) This derivation is essentially done in equation (2.52), once (1/SSTₓ) is brought inside the summation (which is valid because SSTₓ does not depend on i). Then, just define wᵢ = dᵢ/SSTₓ (where dᵢ = xᵢ − x̄), so that β̂1 = β1 + Σᵢ wᵢuᵢ.

(ii) Because Cov(β̂1, ū) = E[(β̂1 − β1)ū], we show that the latter is zero. But, from part (i),

E[(β̂1 − β1)ū] = E[(Σᵢ wᵢuᵢ)ū] = Σᵢ wᵢE(uᵢū).

Because the uᵢ are pairwise uncorrelated (they are independent), E(uᵢū) = E(uᵢ²/n) = σ²/n (because E(uᵢu_h) = 0 for i ≠ h).
Therefore,

Σᵢ wᵢE(uᵢū) = Σᵢ wᵢ(σ²/n) = (σ²/n) Σᵢ wᵢ = 0,

because Σᵢ wᵢ = (Σᵢ (xᵢ − x̄))/SSTₓ = 0.

(iii) The formula for the OLS intercept is β̂0 = ȳ − β̂1x̄ and, plugging in ȳ = β0 + β1x̄ + ū gives β̂0 = (β0 + β1x̄ + ū) − β̂1x̄ = β0 + ū − (β̂1 − β1)x̄.

(iv) Because β̂1 and ū are uncorrelated,

Var(β̂0) = Var(ū) + Var(β̂1)⋅x̄² = σ²/n + (σ²/SSTₓ)x̄²,

which is what we wanted to show.

(v) Using the hint and substitution gives

Var(β̂0) = σ²[(SSTₓ/n) + x̄²]/SSTₓ = σ²[n⁻¹ Σᵢ xᵢ² − x̄² + x̄²]/SSTₓ = σ²(n⁻¹ Σᵢ xᵢ²)/SSTₓ.

2.11 (i) We would want to randomly assign the number of hours in the preparation course so that hours is independent of other factors that affect performance on the SAT. Then, we would collect information on SAT score for each student in the experiment, yielding a data set {(satᵢ, hoursᵢ): i = 1, …, n}, where n is the number of students we can afford to have in the study. From equation (2.7), we should try to get as much variation in hoursᵢ as is feasible.

(ii) Here are three factors: innate ability, family income, and general health on the day of the exam. If we think students with higher native intelligence think they do not need to prepare for the SAT, then ability and hours will be negatively correlated. Family income would probably be positively correlated with hours, because higher income families can more easily afford preparation courses. Ruling out chronic health problems, health on the day of the exam should be roughly uncorrelated with hours spent in a preparation course.

(iii) If preparation courses are effective, β1 should be positive: other factors equal, an increase in hours should increase sat.

(iv) The intercept, β0, has a useful interpretation in this example: because E(u) = 0, β0 is the average SAT score for students in the population with hours = 0.

2.12 (i) I will show the result without using calculus.
Let ȳ be the sample average of the yᵢ and write

Σᵢ (yᵢ − b₀)² = Σᵢ [(yᵢ − ȳ) + (ȳ − b₀)]²
             = Σᵢ (yᵢ − ȳ)² + 2 Σᵢ (yᵢ − ȳ)(ȳ − b₀) + Σᵢ (ȳ − b₀)²
             = Σᵢ (yᵢ − ȳ)² + 2(ȳ − b₀) Σᵢ (yᵢ − ȳ) + n(ȳ − b₀)²
             = Σᵢ (yᵢ − ȳ)² + n(ȳ − b₀)²,

where we use the fact (see Appendix A) that Σᵢ (yᵢ − ȳ) = 0 always. The first term does not depend on b₀ and the second term, n(ȳ − b₀)², which is nonnegative, is clearly minimized when b₀ = ȳ.

(ii) If we define ūᵢ = yᵢ − ȳ then Σᵢ ūᵢ = Σᵢ (yᵢ − ȳ), and we already used the fact that this sum is zero in the proof in part (i).

SOLUTIONS TO COMPUTER EXERCISES

C2.1 (i) The average prate is about 87.36 and the average mrate is about .732.

(ii) The estimated equation is

prate = 83.05 + 5.86 mrate
n = 1,534, R² = .075.

(iii) The intercept implies that, even if mrate = 0, the predicted participation rate is 83.05 percent. The coefficient on mrate implies that a one-dollar increase in the match rate – a fairly large increase – is estimated to increase prate by 5.86 percentage points. This assumes, of course, that this change in prate is possible (if, say, prate is already at 98, this interpretation makes no sense).

(iv) If we plug mrate = 3.5 into the equation we get prate = 83.05 + 5.86(3.5) = 103.59. This is impossible, as we can have at most a 100 percent participation rate. This illustrates that, especially when dependent variables are bounded, a simple regression model can give strange predictions for extreme values of the independent variable. (In the sample of 1,534 firms, only 34 have mrate ≥ 3.5.)

(v) mrate explains about 7.5% of the variation in prate. This is not much, and suggests that many other factors influence 401(k) plan participation rates.

C2.2 (i) Average salary is about 865.864, which means $865,864 because salary is in thousands of dollars. Average ceoten is about 7.95.

(ii) There are five CEOs with ceoten = 0.
The longest tenure is 37 years.

(iii) The estimated equation is

log(salary) = 6.51 + .0097 ceoten
n = 177, R² = .013.

We obtain the approximate percentage change in salary given Δceoten = 1 by multiplying the coefficient on ceoten by 100: 100(.0097) = .97%. Therefore, one more year as CEO is predicted to increase salary by almost 1%.

C2.3 (i) The estimated equation is

sleep = 3,586.4 − .151 totwrk
n = 706, R² = .103.

The intercept implies that the estimated amount of sleep per week for someone who does not work is 3,586.4 minutes, or about 59.77 hours. This comes to about 8.5 hours per night.

(ii) If someone works two more hours per week then Δtotwrk = 120 (because totwrk is measured in minutes), and so Δsleep = −.151(120) = −18.12 minutes. This is only a few minutes a night. If someone were to work one more hour on each of five working days, Δsleep = −.151(300) = −45.3 minutes, or about five minutes a night.

C2.4 (i) Average salary is about $957.95 and average IQ is about 101.28. The sample standard deviation of IQ is about 15.05, which is pretty close to the population value of 15.

(ii) This calls for a level-level model:

wage = 116.99 + 8.30 IQ
n = 935, R² = .096.

An increase in IQ of 15 increases predicted monthly salary by 8.30(15) = $124.50 (in 1980 dollars). IQ score does not even explain 10% of the variation in wage.

(iii) This calls for a log-level model:

log(wage) = 5.89 + .0088 IQ
n = 935, R² = .099.

If ΔIQ = 15 then Δlog(wage) = .0088(15) = .132, which is the (approximate) proportionate change in predicted wage. The percentage increase is therefore approximately 13.2.

C2.5 (i) The constant elasticity model is a log-log model:

log(rd) = β0 + β1 log(sales) + u,

where β1 is the elasticity of rd with respect to sales.

(ii) The estimated equation is

log(rd) = −4.105 + 1.076 log(sales)
n = 32, R² = .910.

The estimated elasticity of rd with respect to sales is 1.076, which is just above one. A one percent increase in sales is estimated to increase rd by about 1.08%.
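The elasticity interpretation in C2.5 can be made exact: in the model log(rd) = β0 + β1 log(sales), a one percent increase in sales multiplies predicted rd by (1.01) raised to the power β1. A quick check using the estimated elasticity of 1.076:

```python
b1 = 1.076   # estimated elasticity of rd with respect to sales

# exact percent effect of a 1% sales increase, vs. the usual log approximation
exact_pct = 100 * (1.01 ** b1 - 1)
approx_pct = b1            # "1% more sales -> about 1.08% more rd"
```

For small percentage changes the two agree to well within rounding, which is why the approximate reading in the text is harmless here.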
A one percent increase in sales is estimated to increase rd by about 1.08%.C2.6 (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books, computers, and for hiring better qualified teachers. At high levels of spending, we would expend little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.(ii) If we take changes, as usual, we obtain1110log()(/100)(%),math expend expend ββ∆=∆≈∆just as in the second row of Table 2.3. So, if %10,expend ∆=110/10.math β∆=(iii) The regression results are21069.34 11.16 log()408, .0297math expend n R =-+==(iv) If expend increases by 10 percent, 10math increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted values is only about 30.2.C2.7 (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.(ii) The average mailings per year is about 2.05. The minimum value is .25 (which presumably means that someone has been on the mailing list for at least four years) and the maximum value is 3.5.(iii) The estimated equation is22.01 2.65 4,268, .0138gift mailsyear n R =+==(iv) The slope coefficient from part (iii) means that each mailing per year is associated with – perhaps even “causes” – an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however. 
Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generated much more than the mailing cost.

(v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gifts is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.

C2.8 There is no "correct" answer to this question because all answers depend on how the random outcomes are generated. I used Stata 11 and, before generating the outcomes on the xᵢ, I set the seed to the value 123. I reset the seed to 123 to generate the outcomes on the uᵢ. Specifically, to answer parts (i) through (v), I used the sequence of commands

set obs 500
set seed 123
gen x = 10*runiform()
sum x
set seed 123
gen u = 6*rnormal()
sum u
gen y = 1 + 2*x + u
reg y x
predict uh, resid
gen x_uh = x*uh
sum uh x_uh
gen x_u = x*u
sum u x_u

(i) The sample mean of the xᵢ is about 4.912 with a sample standard deviation of about 2.874.

(ii) The sample average of the uᵢ is about .221, which is pretty far from zero. We do not get zero because this is just a sample of 500 from a population with a zero mean. The current sample is "unlucky" in the sense that the sample average is far from the population average. The sample standard deviation is about 5.768, which is nontrivially below 6, the population value.
When we sample from a population our estimates contain sampling error; that is why the estimates differ from the population values.(iv) When I use the command sum uh x_uh and multiply by 500 I get, using scientific notation, sums equal to 4.181e-06 and .00003776, respectively. These are zero for practical purposes, and differ from zero only due to rounding inherent in the machine imprecision (which is unimportant).(v) We already computed the sample average of the i u in part (ii). When we multiply by 500 the sample average is about 110.74. The sum of i i x u is about 6.46. Neither is close to zero, and nothing says they should be particularly close.(vi) For this part I set the seed to 789. The sample average and standard deviation of the i x are about 5.030 and 2.913; those for the i u are about .077- and 5.979. When I generated the i y and run the regression I get0ˆ.701β= and 1ˆ 2.044β= These are different from those in part (iii) because they are obtained from a different random sample. Here, for both the intercept and slope, we get estimates that are much closer to the population values. Of course, in practice we would never know that.。

Introductory Econometrics, 6th Edition (Wooldridge): Answer Notes


Chapter 1: Introduction to Econometrics

1. Why do we need econometrics? The main goal of econometrics is to provide a scientific method for addressing economic questions.

Economists need to use data to test the validity of economic theories and to forecast the direction of economic variables.

Econometrics provides a framework that allows economists to analyze economic questions with mathematical and statistical methods.

2. Basic concepts of econometrics
• Causal inference: at the core of econometrics is inferring causal relationships between variables from observed data.

Using statistical methods, we can analyze the effect of one variable on another.

• Data types: the data studied in econometrics can be time series data or cross-sectional data.

Time series data are observations made along the time dimension, while cross-sectional data are observations made at a single point in time.

• Data bias: in econometrics, data bias refers to discrepancies between the data and the true values caused by sample selection problems, measurement error, and the like.

3. Methods of econometrics. Econometrics uses many statistical and economic methods to analyze data.

Some commonly used econometric methods are:
• Ordinary least squares (OLS): in econometrics, OLS is a commonly used regression method.

It estimates the unknown parameters by minimizing the sum of squared differences between the observed and fitted values.
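A tiny numerical illustration of that minimization property (a Python/NumPy sketch with made-up data, not an example from the text): the closed-form OLS coefficients achieve a smaller sum of squared residuals than any perturbed line.

```python
import numpy as np

# Small made-up data set for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Closed-form OLS estimates for y = b0 + b1*x
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

def ssr(a0, a1):
    """Sum of squared residuals for a candidate intercept a0 and slope a1."""
    return ((y - a0 - a1 * x) ** 2).sum()

# The OLS line attains the minimum SSR; any other line does strictly worse.
best = ssr(b0, b1)
```

Perturbing either coefficient away from the OLS values, e.g. `ssr(b0 + 0.1, b1)`, always yields a larger sum of squared residuals.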

• Time series analysis: time series analysis studies the movements of economic variables by modeling and forecasting time series data.

• Panel data analysis: panel data are data sets that contain both time series and cross-sectional dimensions.

Panel data analysis can be used to study variation across individuals and over time, and the relationships between the two.

4. Fields of application. Econometrics is applied widely across economic research and practice.

Some application areas:
• Labor economics: econometrics can be used to study supply and demand in labor markets, the determinants of wages, and related questions.

• Financial economics: econometrics can be used to study security prices, volatility in financial markets, and related questions.

• Industrial organization: econometrics can be used to study market competition, monopoly power, and related questions.

• Development economics: econometrics can be used to study economic growth and poverty in developing countries.

Chapter 2: Statistics Review

1. Basic statistical concepts
• Population and sample: the population is the entire collection of individuals or objects we wish to study, while a sample is a subset of individuals or objects drawn from the population.

Wooldridge, Introductory Econometrics, 6th Edition: Answers to Chapter 7


CHAPTER 7

TEACHING NOTES

This is a fairly standard chapter on using qualitative information in regression analysis, although I try to emphasize examples with policy relevance (and only cross-sectional applications are included).

In allowing for different slopes, it is important, as in Chapter 6, to appropriately interpret the parameters and to decide whether they are of direct interest. For example, in the wage equation where the return to education is allowed to depend on gender, the coefficient on the female dummy variable is the wage differential between women and men at zero years of education. It is not surprising that we cannot estimate this very well, nor should we want to. In this particular example we would drop the interaction term because it is insignificant, but the issue of interpreting the parameters can arise in models where the interaction term is significant.

In discussing the Chow test, I think it is important to discuss testing for differences in slope coefficients after allowing for an intercept difference. In many applications, a significant Chow statistic simply indicates intercept differences. (See the example in Section 7.4 on student-athlete GPAs in the text.) From a practical perspective, it is important to know whether the partial effects differ across groups or whether a constant differential is sufficient.

I admit that an unconventional feature of this chapter is its introduction of the linear probability model. I cover the LPM here for several reasons. First, the LPM is being used more and more because it is easier to interpret than probit or logit models. Plus, once the proper parameter scalings are done for probit and logit, the estimated effects are often similar to the LPM partial effects near the mean or median values of the explanatory variables. The theoretical drawbacks of the LPM are often of secondary importance in practice.
Computer Exercise C7.9 is a good one to illustrate that, even with over 9,000 observations, the LPM can deliver fitted values strictly between zero and one for all observations.

If the LPM is not covered, many students will never know about using econometrics to explain qualitative outcomes. This would be especially unfortunate for students who might need to read an article where an LPM is used, or who might want to estimate an LPM for a term paper or senior thesis. Once they are introduced to the purpose and interpretation of the LPM, along with its shortcomings, they can tackle nonlinear models on their own or in a subsequent course.

A useful modification of the LPM estimated in equation (7.29) is to drop kidsge6 (because it is not significant) and then define two dummy variables, one for kidslt6 equal to one and the other for kidslt6 at least two. These can be included in place of kidslt6 (with no young children being the base group). This allows a diminishing marginal effect in an LPM. I was a bit surprised when a diminishing effect did not materialize.

SOLUTIONS TO PROBLEMS

7.1 (i) The coefficient on male is 87.75, so a man is estimated to sleep almost one and one-half hours more per week than a comparable woman. Further, t_male = 87.75/34.33 ≈ 2.56, which is close to the 1% critical value against a two-sided alternative (about 2.58). Thus, the evidence for a gender differential is fairly strong.

(ii) The t statistic on totwrk is −.163/.018 ≈ −9.06, which is very statistically significant. The coefficient implies that one more hour of work (60 minutes) is associated with .163(60) ≈ 9.8 minutes less sleep.

(iii) To obtain R²_r, the R-squared from the restricted regression, we need to estimate the model without age and age².
When age and age² are both in the model, age has no effect only if the parameters on both terms are zero.

7.2 (i) If Δcigs = 10 then Δlog(bwght) = −.0044(10) = −.044, which means about a 4.4% lower birth weight.

(ii) A white child is estimated to weigh about 5.5% more, other factors in the first equation fixed. Further, t_white ≈ 4.23, which is well above any commonly used critical value. Thus, the difference between white and nonwhite babies is also statistically significant.

(iii) If the mother has one more year of education, the child's birth weight is estimated to be .3% higher. This is not a huge effect, and the t statistic is only one, so it is not statistically significant.

(iv) The two regressions use different sets of observations. The second regression uses fewer observations because motheduc or fatheduc is missing for some observations. We would have to reestimate the first equation (and obtain the R-squared) using the same observations used to estimate the second equation.

7.3 (i) The t statistic on hsize² is over four in absolute value, so there is very strong evidence that it belongs in the equation. We obtain this by finding the turnaround point; this is the value of hsize that maximizes the predicted sat (other things fixed): 19.3/(2·2.19) ≈ 4.41. Because hsize is measured in hundreds, the optimal size of graduating class is about 441.

(ii) This is given by the coefficient on female (since black = 0): nonblack females have SAT scores about 45 points lower than nonblack males. The t statistic is about −10.51, so the difference is very statistically significant. (The very large sample size certainly contributes to the statistical significance.)

(iii) Because female = 0, the coefficient on black implies that a black male has an estimated SAT score almost 170 points less than a comparable nonblack male.
The t statistic is over 13 in absolute value, so we easily reject the hypothesis that there is no ceteris paribus difference.

(iv) We plug in black = 1, female = 1 for black females and black = 0, female = 1 for nonblack females. The difference is therefore −169.81 + 62.31 = −107.50. Because the estimate depends on two coefficients, we cannot construct a t statistic from the information given. The easiest approach is to define dummy variables for three of the four race/gender categories and choose nonblack females as the base group. We can then obtain the t statistic we want as the coefficient on the black female dummy variable.

7.4 (i) The approximate difference is just the coefficient on utility times 100, or −28.3%. The t statistic is −.283/.099 ≈ −2.86, which is very statistically significant.

(ii) 100·[exp(−.283) − 1] ≈ −24.7%, and so the estimate is somewhat smaller in magnitude.

(iii) The proportionate difference is .181 − .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is

log(salary) = β₀ + β₁log(sales) + β₂roe + δ₁consprod + δ₂utility + δ₃trans + u,

where trans is a dummy variable for the transportation industry. Now, the base group is finance, and so the coefficient δ₁ directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

7.5 (i) Following the hint,

colGPA-hat = β̂₀ + δ̂₀(1 − noPC) + β̂₁hsGPA + β̂₂ACT = (β̂₀ + δ̂₀) − δ̂₀noPC + β̂₁hsGPA + β̂₂ACT.

For the specific estimates in equation (7.6), β̂₀ = 1.26 and δ̂₀ = .157, so the new intercept is 1.26 + .157 = 1.417. The coefficient on noPC is −.157.

(ii) Nothing happens to the R-squared. Using noPC in place of PC is simply a different way of including the same information on PC ownership.

(iii) It makes no sense to include both dummy variables in the regression: we cannot hold noPC fixed while changing PC.
We have only two groups based on PC ownership so, in addition to the overall intercept, we need only include one dummy variable. If we try to include both along with an intercept, we have perfect multicollinearity (the dummy variable trap).

7.6 In Section 3.3 – in particular, in the discussion surrounding Table 3.2 – we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or, we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β₁ (with ability in the error term) has a downward bias. Because we think β₁ ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.

7.7 (i) Write the population model underlying (7.29) as

inlf = β₀ + β₁nwifeinc + β₂educ + β₃exper + β₄exper² + β₅age + β₆kidslt6 + β₇kidsge6 + u,

plug in inlf = 1 − outlf, and rearrange:

1 − outlf = β₀ + β₁nwifeinc + β₂educ + β₃exper + β₄exper² + β₅age + β₆kidslt6 + β₇kidsge6 + u,

or

outlf = (1 − β₀) − β₁nwifeinc − β₂educ − β₃exper − β₄exper² − β₅age − β₆kidslt6 − β₇kidsge6 − u.

The new error term, −u, has the same properties as u.
From this we see that if we regress outlf on all of the independent variables in (7.29), the new intercept is 1 − .586 = .414 and each slope coefficient takes on the opposite sign from when inlf is the dependent variable. For example, the new coefficient on educ is −.038 while the new coefficient on kidslt6 is .262.

(ii) The standard errors will not change. In the case of the slopes, changing the signs of the estimators does not change their variances, and therefore the standard errors are unchanged (but the t statistics change sign). Also, Var(1 − β̂₀) = Var(β̂₀), so the standard error of the intercept is the same as before.

(iii) We know that changing the units of measurement of independent variables, or entering qualitative information using different sets of dummy variables, does not change the R-squared. But here we are changing the dependent variable. Nevertheless, the R-squareds from the regressions are still the same. To see this, part (i) suggests that the squared residuals will be identical in the two regressions. For each i, the error in the equation for outlfᵢ is just the negative of the error in the equation for inlfᵢ, and the same is true of the residuals. Therefore, the SSRs are the same. Further, in this case, the total sums of squares are the same. For outlf we have

SST = Σᵢ (outlfᵢ − avg(outlf))² = Σᵢ [(1 − inlfᵢ) − (1 − avg(inlf))]² = Σᵢ (avg(inlf) − inlfᵢ)² = Σᵢ (inlfᵢ − avg(inlf))²,

which is the SST for inlf.
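The algebra in Problem 7.7 is easy to confirm numerically. The sketch below (Python/NumPy, with simulated data rather than the MROZ data used in the text) regresses a binary y and then 1 − y on the same regressors: every slope flips sign, the intercept becomes one minus the old intercept, and the R-squared is unchanged.

```python
import numpy as np

# Simulated binary outcome; the columns of X are an intercept plus two regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = (X @ np.array([0.5, 0.3, -0.2]) + 0.4 * rng.normal(size=n) > 0).astype(float)

def ols(X, y):
    """OLS coefficients and R-squared."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return b, r2

b_inlf, r2_inlf = ols(X, y)          # plays the role of the inlf regression
b_outlf, r2_outlf = ols(X, 1 - y)    # plays the role of the outlf regression
```

Here `b_outlf[0]` equals `1 - b_inlf[0]`, the remaining coefficients are the negatives of those in `b_inlf`, and the two R-squareds coincide, exactly as parts (i) and (iii) claim.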
Because R² = 1 − SSR/SST, the R-squared is the same in the two regressions.

7.8 (i) We want to have a constant semi-elasticity model, so a standard wage equation with marijuana usage included would be

log(wage) = β₀ + β₁usage + β₂educ + β₃exper + β₄exper² + β₅female + u.

Then 100·β₁ is the approximate percentage change in wage when marijuana usage increases by one time per month.

(ii) We would add an interaction term in female and usage:

log(wage) = β₀ + β₁usage + β₂educ + β₃exper + β₄exper² + β₅female + β₆female·usage + u.

The null hypothesis that the effect of marijuana usage does not differ by gender is H₀: β₆ = 0.

(iii) We take the base group to be nonuser. Then we need dummy variables for the other three groups: lghtuser, moduser, and hvyuser. Assuming no interactive effect with gender, the model would be

log(wage) = β₀ + δ₁lghtuser + δ₂moduser + δ₃hvyuser + β₂educ + β₃exper + β₄exper² + β₅female + u.

(iv) The null hypothesis is H₀: δ₁ = 0, δ₂ = 0, δ₃ = 0, for a total of q = 3 restrictions. If n is the sample size, the df in the unrestricted model – the denominator df in the F distribution – is n − 8. So we would obtain the critical value from the F(q, n−8) distribution.

(v) The error term could contain factors, such as family background (including parental history of drug abuse), that could directly affect wages and also be correlated with marijuana usage. We are interested in the effects of a person's drug usage on his or her wage, so we would like to hold other confounding factors fixed. We could try to collect data on relevant background information.

7.9 (i) Plugging in u = 0 and d = 1 gives f₁(z) = (β₀ + δ₀) + (β₁ + δ₁)z.

(ii) Setting f₀(z*) = f₁(z*) gives β₀ + β₁z* = (β₀ + δ₀) + (β₁ + δ₁)z*, or 0 = δ₀ + δ₁z*. Therefore, provided δ₁ ≠ 0, we have z* = −δ₀/δ₁.
Clearly, z* is positive if and only if δ₀/δ₁ is negative, which means δ₀ and δ₁ must have opposite signs.

(iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years.

(iv) The estimated years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed – not even close. In fact, at four years of …

Wooldridge, Introductory Econometrics (6th Edition): Review Notes and Detailed Solutions – Chapter 1 and Part 1 (Chapters 2–3)


2. A justification for job training programs is that training raises worker productivity. Suppose that you are asked to evaluate whether more job training makes workers more productive. However, you have no data on individual workers; instead, you have data on manufacturing firms in Ohio. In particular, for each firm you have information on hours of job training per worker (training) and number of nondefective items produced per worker hour (output).

(i) Carefully state the ceteris paribus thought experiment underlying this policy question.

(ii) Does it seem likely that a firm's decision to train its workers will be independent of worker characteristics? What are some observable and unobservable worker characteristics?

(iii) Name a factor other than worker characteristics that can affect worker productivity.

(iv) If you find a positive correlation between training and output, would you have convincingly established that job training makes workers more productive? Explain.

Table 1-1: The structure of economic data

2. Panel data versus pooled cross sections (see Table 1-2)

Table 1-2: Panel data compared with pooled cross sections

Key point 3: Causality and ceteris paribus ★★

1. Causality
Causality means that a change in one variable brings about a change in another; establishing causal relationships is one of the important goals of economic analysis. Econometric analysis can uncover correlations between variables, but to interpret them causally one must also rule out reverse causation in the model; otherwise the conclusion is hard to believe.

Answer: It does not make sense. Finding the relationship between weekly hours of study (study) and weekly hours of work (work) only says that the two variables are related; it does not say the relationship is causal. Weekly hours of study may be related to other factors, and so may weekly hours of work.

4. States or provinces that control their own taxes sometimes reduce taxes to stimulate economic growth. Suppose that you are hired by a state government to estimate the effects of corporate tax rates on, say, the growth in per capita gross state product.

Wooldridge, Introductory Econometrics (6th Edition): Review Notes and Detailed Solutions – Instrumental Variables Estimation and Two Stage Least Squares


Chapter 15: Instrumental Variables Estimation and Two Stage Least Squares

15.1 Review Notes

Key point 1: The instrumental variables method ★★★★★

1. IV in a simple model
The simple regression model is y = β₀ + β₁x + u, where x is correlated with u: Cov(x, u) ≠ 0.

(1) To obtain consistent estimators of β₀ and β₁ when x and u are correlated, we need an observable variable z that satisfies two assumptions: ① instrument exogeneity: z is uncorrelated with u, that is, Cov(z, u) = 0, meaning that z should have no partial effect on y (once x and the omitted variables in u are controlled for) and should be uncorrelated with the other unobserved factors affecting y; ② instrument relevance: z is correlated with x, that is, Cov(z, x) ≠ 0, meaning that z must be related, positively or negatively, to the endogenous explanatory variable x.

When both conditions are satisfied, z is called an instrumental variable for x, or simply an instrument for x.

(2) The difference between the two requirements
① Cov(z, u) is the covariance between z and the unobservable error u, so in general it cannot be tested: in most cases we must maintain this assumption by appealing to economic behavior or introspection.

② Given a random sample from the population, the condition that z and x are correlated (in the population) can be tested.

The easiest way is to estimate a simple regression between x and z.

In the population we have x = π₀ + π₁z + v, and since π₁ = Cov(z, x)/Var(z), the assumption Cov(z, x) ≠ 0 holds if and only if π₁ ≠ 0.

We should therefore be able to reject the null hypothesis H₀: π₁ = 0 against the two-sided alternative H₁: π₁ ≠ 0 at a sufficiently small significance level.

If so, we can be fairly confident that the instrument z and x are correlated.

2. The IV estimator
(1) The instrumental variables (IV) estimator of the parameter. Identification of the parameter means that β₁ can be written in terms of population moments, and those moments can be estimated from sample data.

To write β₁ in terms of population covariances, use the simple regression equation to obtain the covariance between z and y: Cov(z, y) = β₁Cov(z, x) + Cov(z, u). Under the assumptions Cov(z, u) = 0 and Cov(z, x) ≠ 0, we can solve for β₁: β₁ = Cov(z, y)/Cov(z, x). Thus β₁ is the population covariance between z and y divided by the population covariance between z and x, which shows that β₁ is identified.

Given a random sample, we estimate the population quantities by their sample counterparts.
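The sample-analogue step can be illustrated directly (a Python/NumPy sketch with a simulated data-generating process of my own construction, not an example from the text): replace the population covariances in β₁ = Cov(z, y)/Cov(z, x) with sample covariances.

```python
import numpy as np

# Simulated setup: x is endogenous because it shares the component v with u,
# while z shifts x but is independent of u. True beta1 = 1.5.
rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)
v = rng.normal(size=n)
x = 1.0 + 0.8 * z + v              # instrument relevance: Cov(z, x) != 0
u = 0.5 * v + rng.normal(size=n)   # endogeneity: Cov(x, u) != 0, but Cov(z, u) = 0
y = 2.0 + 1.5 * x + u

cov = lambda a, b: np.cov(a, b, ddof=1)[0, 1]

beta1_ols = cov(x, y) / np.var(x, ddof=1)   # inconsistent under endogeneity
beta1_iv = cov(z, y) / cov(z, x)            # sample analogue of the IV formula
```

With this design, `beta1_ols` is pushed well above 1.5 by the correlation between x and u, while `beta1_iv` stays close to the true value, which is exactly what identification via the instrument buys us.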

Wooldridge, Introductory Econometrics (6th Edition): Review Notes and Detailed Solutions – Serial Correlation and Heteroskedasticity in Time Series Regressions


Chapter 12: Serial Correlation and Heteroskedasticity in Time Series Regressions

12.1 Review Notes

Key point 1: Properties of OLS with serially correlated errors ★★★

1. Unbiasedness and consistency
When the first three Gauss–Markov assumptions for time series regression hold, the OLS estimates are unbiased.

If the strict exogeneity assumption is relaxed to E(uₜ|xₜ) = 0, it can be shown that when the data are weakly dependent, the β̂ⱼ are still consistent, although not necessarily unbiased.

2. Efficiency and inference
Suppose the errors are serially correlated, following uₜ = ρuₜ₋₁ + eₜ, t = 1, 2, …, n, |ρ| < 1,

where eₜ is an error with mean 0 and variance σₑ² that satisfies the classical assumptions.

Consider the simple regression model yₜ = β₀ + β₁xₜ + uₜ.

Assume the sample average of the xₜ is zero, so that

β̂₁ = β₁ + SSTₓ⁻¹ Σₜ₌₁ⁿ xₜuₜ,  where SSTₓ = Σₜ₌₁ⁿ xₜ².

The variance of β̂₁ is

Var(β̂₁) = SSTₓ⁻² Var(Σₜ₌₁ⁿ xₜuₜ) = σ²/SSTₓ + 2(σ²/SSTₓ²) Σⱼ₌₁ⁿ⁻¹ Σₜ₌₁ⁿ⁻ʲ ρʲ xₜxₜ₊ⱼ,

where σ² = Var(uₜ).
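The variance expansion above can be checked numerically without any simulation, since under stationary AR(1) errors Cov(uₜ, uₜ₊ⱼ) = σ²ρʲ, so Var(Σₜ xₜuₜ) is the quadratic form x'Ωx with Ω having (t, s) entry σ²ρ^|t−s|. A Python/NumPy sketch (the n, ρ, σ², and x values are arbitrary choices of mine):

```python
import numpy as np

# Exact check: the double-sum expansion of Var(sum_t x_t u_t) under AR(1)
# errors must equal the quadratic form x' Omega x, where
# Omega[t, s] = sigma^2 * rho^{|t-s|}.
n, rho, sigma2 = 8, 0.5, 2.0
x = np.array([0.3, -1.2, 0.7, 1.5, -0.4, 0.9, -0.8, 0.2])

t = np.arange(n)
Omega = sigma2 * rho ** np.abs(t[:, None] - t[None, :])
quad_form = x @ Omega @ x

# Expansion: sigma^2 * SST_x plus twice the serial-correlation cross terms
sstx = (x ** 2).sum()
double_sum = sum(rho ** j * (x[:n - j] * x[j:]).sum() for j in range(1, n))
formula = sigma2 * sstx + 2 * sigma2 * double_sum
```

Dividing either quantity by SSTₓ² gives Var(β̂₁); the point is that the second (cross-product) term is what the usual OLS variance formula ignores.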

The first term in this expression for Var(β̂₁) is the variance of the OLS slope estimator in the simple regression model under the classical assumptions.

Therefore, when the errors in the model are serially correlated, the usual OLS variance formula is biased, and test statistics based on it are biased as well.

3. Goodness of fit
When the errors in a time series regression model are serially correlated, the usual goodness-of-fit measures, R² and adjusted R², can fail; but as long as the data are stationary and weakly dependent, these measures remain valid.

4. Serial correlation in the presence of lagged dependent variables
(1) With a lagged dependent variable and serially correlated errors, OLS is not necessarily inconsistent. Suppose that E(yₜ|yₜ₋₁) = β₀ + β₁yₜ₋₁,

where |β₁| < 1.

Adding the error term, write this as yₜ = β₀ + β₁yₜ₋₁ + uₜ, with E(uₜ|yₜ₋₁) = 0.

The model satisfies the zero conditional mean assumption, so the OLS estimators β̂₀ and β̂₁ are consistent.

The errors {uₜ} may be serially correlated.

Although E(uₜ|yₜ₋₁) = 0 guarantees that uₜ is uncorrelated with yₜ₋₁, we have uₜ₋₁ = yₜ₋₁ − β₀ − β₁yₜ₋₂, so uₜ and yₜ₋₂ may be correlated.


APPENDIX E
SOLUTIONS TO PROBLEMS
E.1 This follows directly from partitioned matrix multiplication in Appendix D. Write X as the n × k matrix whose t-th row is xₜ, so that X' = (x₁', x₂', …, xₙ') and y = (y₁, y₂, …, yₙ)'.

Therefore,

X'X = Σₜ₌₁ⁿ xₜ'xₜ and X'y = Σₜ₌₁ⁿ xₜ'yₜ.

An equivalent expression for β̂ is

β̂ = (n⁻¹ Σₜ₌₁ⁿ xₜ'xₜ)⁻¹ (n⁻¹ Σₜ₌₁ⁿ xₜ'yₜ),

which, when we plug in yₜ = xₜβ + uₜ for each t and do some algebra, can be written as

β̂ = β + (n⁻¹ Σₜ₌₁ⁿ xₜ'xₜ)⁻¹ (n⁻¹ Σₜ₌₁ⁿ xₜ'uₜ).

As shown in Section E.4, this expression is the basis for the asymptotic analysis of OLS using matrices.
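The three expressions for β̂ in E.1 are algebraically identical, which is easy to confirm on simulated data (a Python/NumPy sketch; the design matrix and parameter values are arbitrary choices of mine):

```python
import numpy as np

# Verify E.1: the matrix OLS formula equals the sum-of-rows form, and equals
# beta plus the sampling-error term (X'X)^{-1} X'u.
rng = np.random.default_rng(1)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n)
y = X @ beta + u

bhat = np.linalg.solve(X.T @ X, X.T @ y)

# Sum form: X'X = sum_t x_t' x_t and X'y = sum_t x_t' y_t, built row by row
XtX = sum(np.outer(X[i], X[i]) for i in range(n))
Xty = sum(X[i] * y[i] for i in range(n))
bhat_sum = np.linalg.solve(XtX, Xty)

# Decomposition: bhat = beta + (X'X)^{-1} X'u
bhat_decomp = beta + np.linalg.solve(X.T @ X, X.T @ u)
```

All three vectors agree to machine precision, which is the content of the derivation above.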
E.2 (i) Following the hint, we have

SSR(b) = (y − Xb)'(y − Xb) = [û + X(β̂ − b)]'[û + X(β̂ − b)] = û'û + û'X(β̂ − b) + (β̂ − b)'X'û + (β̂ − b)'X'X(β̂ − b).

But by the first order conditions for OLS, X'û = 0, and so (X'û)' = û'X = 0. But then

SSR(b) = û'û + (β̂ − b)'X'X(β̂ − b),

which is what we wanted to show.

(ii) If X has rank k then X'X is positive definite, which implies that (β̂ − b)'X'X(β̂ − b) > 0 for all b ≠ β̂. The term û'û does not depend on b, and so SSR(b) − SSR(β̂) = (β̂ − b)'X'X(β̂ − b) > 0 for b ≠ β̂.
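The decomposition in E.2(i) holds exactly for any competitor vector b, which can be verified numerically (a Python/NumPy sketch with simulated data and an arbitrary competitor of my choosing):

```python
import numpy as np

# Verify E.2: SSR(b) = SSR(bhat) + (bhat - b)' X'X (bhat - b), so OLS
# minimizes the sum of squared residuals over all b.
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

bhat = np.linalg.solve(X.T @ X, X.T @ y)
uhat = y - X @ bhat

def ssr(b):
    """Sum of squared residuals at candidate coefficient vector b."""
    r = y - X @ b
    return r @ r

b = bhat + np.array([0.3, -0.2, 0.1])   # any b != bhat
diff = bhat - b
identity_rhs = uhat @ uhat + diff @ (X.T @ X) @ diff
```

`ssr(b)` matches `identity_rhs` to machine precision, and the positive-definite quadratic term guarantees `ssr(b) > ssr(bhat)` for every b ≠ β̂, mirroring part (ii).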
E.3 (i) We use the placeholder feature of the OLS formulas. By definition,

β̃ = (Z'Z)⁻¹Z'y = [(XA)'(XA)]⁻¹(XA)'y = [A'(X'X)A]⁻¹A'X'y = A⁻¹(X'X)⁻¹(A')⁻¹A'X'y = A⁻¹(X'X)⁻¹X'y = A⁻¹β̂.

(ii) By definition of the fitted values, ŷₜ = xₜβ̂ and ỹₜ = zₜβ̃. Plugging zₜ and β̃ into the second equation gives ỹₜ = (xₜA)(A⁻¹β̂) = xₜβ̂ = ŷₜ.

(iii) The estimated variance matrix from the regression of y on Z is σ̃²(Z'Z)⁻¹, where σ̃² is the error variance estimate from this regression. From part (ii), the fitted values from the two regressions are the same, which means the residuals must be the same for all t. (The dependent variable is the same in both regressions.) Therefore, σ̃² = σ̂². Further, as we showed in part (i), (Z'Z)⁻¹ = A⁻¹(X'X)⁻¹(A')⁻¹, and so σ̃²(Z'Z)⁻¹ = σ̂²A⁻¹(X'X)⁻¹(A⁻¹)', which is what we wanted to show.

(iv) The β̃ⱼ are obtained from a regression of y on XA, where A is the k × k diagonal matrix with 1, a₂, …, aₖ down the diagonal. From part (i), β̃ = A⁻¹β̂. But A⁻¹ is easily seen to be the k × k diagonal matrix with 1, a₂⁻¹, …, aₖ⁻¹ down its diagonal. Straightforward multiplication shows that the first element of A⁻¹β̂ is β̂₁ and the j-th element is β̂ⱼ/aⱼ, j = 2, …, k.

(v) From part (iii), the estimated variance matrix of β̃ is σ̂²A⁻¹(X'X)⁻¹(A⁻¹)'. But A⁻¹ is a symmetric, diagonal matrix, as described above. The estimated variance of β̃ⱼ is the j-th diagonal element of σ̂²A⁻¹(X'X)⁻¹A⁻¹, which is easily seen to be σ̂²cⱼⱼ/aⱼ², where cⱼⱼ is the j-th diagonal element of (X'X)⁻¹. The square root of this, σ̂√cⱼⱼ/|aⱼ|, is se(β̃ⱼ), which is simply se(β̂ⱼ)/|aⱼ|.

(vi) The t statistic for β̃ⱼ is, as usual,

β̃ⱼ/se(β̃ⱼ) = (β̂ⱼ/aⱼ)/[se(β̂ⱼ)/|aⱼ|],

and so the absolute value is (|β̂ⱼ|/|aⱼ|)/[se(β̂ⱼ)/|aⱼ|] = |β̂ⱼ|/se(β̂ⱼ), which is just the absolute value of the t statistic for β̂ⱼ. If aⱼ > 0, the t statistics themselves are identical; if aⱼ < 0, the t statistics are simply opposite in sign.
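The scaling results of E.3 can be confirmed numerically (a Python/NumPy sketch with simulated data and an arbitrary diagonal A of my choosing): regressing y on XA rescales the coefficients by A⁻¹ and leaves the fitted values unchanged.

```python
import numpy as np

# Verify E.3 parts (i) and (ii): btil = A^{-1} bhat, and the fitted values
# from y on X and from y on Z = XA coincide.
rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)
A = np.diag([1.0, 4.0, -0.5])   # diagonal rescaling, first element 1 as in (iv)
Z = X @ A

bhat = np.linalg.solve(X.T @ X, X.T @ y)
btil = np.linalg.solve(Z.T @ Z, Z.T @ y)

fitted_x = X @ bhat
fitted_z = Z @ btil
```

Because the residuals are identical, the estimated error variance is identical too, which is what drives the standard-error and t-statistic results in parts (iii) through (vi).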
E.4 (i) E(δ̂|X) = E(Gβ̂|X) = G·E(β̂|X) = Gβ = δ.

(ii) Var(δ̂|X) = Var(Gβ̂|X) = G[Var(β̂|X)]G' = G[σ²(X'X)⁻¹]G' = σ²G(X'X)⁻¹G'.

(iii) The vector of regression coefficients from the regression of y on XG⁻¹ is

[(XG⁻¹)'(XG⁻¹)]⁻¹(XG⁻¹)'y = [(G⁻¹)'X'XG⁻¹]⁻¹(G⁻¹)'X'y = G(X'X)⁻¹[(G⁻¹)']⁻¹(G⁻¹)'X'y = G(X'X)⁻¹X'y = Gβ̂ = δ̂.

Further, as shown in Problem E.3, the residuals are the same as from the regression of y on X, and so the error variance estimate, σ̂², is the same. Therefore, the estimated variance matrix is σ̂²[(XG⁻¹)'(XG⁻¹)]⁻¹ = σ̂²G(X'X)⁻¹G'.
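Part (iii) of E.4 generalizes E.3 from a diagonal A to any nonsingular G, and it too is easy to confirm numerically (a Python/NumPy sketch; the particular G below is an arbitrary nonsingular matrix of my choosing):

```python
import numpy as np

# Verify E.4(iii): regressing y on X G^{-1} yields exactly G times the OLS
# coefficients from regressing y on X, with identical fitted values.
rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)
G = np.array([[1.0, 0.5, 0.0],
              [0.0, 2.0, 0.0],
              [0.3, 0.0, 1.0]])   # any nonsingular k x k matrix

bhat = np.linalg.solve(X.T @ X, X.T @ y)
W = X @ np.linalg.inv(G)                  # transformed regressors X G^{-1}
dhat = np.linalg.solve(W.T @ W, W.T @ y)  # equals G @ bhat
```

Since the fitted values (and hence residuals) coincide, σ̂² is unchanged, exactly as the derivation concludes.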
