Peking University Summer Course "Regression Analysis" (Linear-Regression-Analysis), Lecture Notes PKU7


Class 7: Path analysis and multicollinearity

I. Standardized Coefficients: Transformations

Suppose the true model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)} + \varepsilon_i \qquad (1)$$

and we make the following transformation:

$$y_i^* = (y_i - \bar{Y})/s_y, \qquad x_{ik}^* = (x_{ik} - \bar{X}_k)/s_{xk},$$

where $s_y$ and $s_{xk}$ are the sample standard deviations of $y$ and $x_k$, respectively.

Thus, standardization does two things: centering and rescaling. Centering normalizes the location of a variable so that it has a mean of zero. Rescaling normalizes the scale of a variable so that it has a variance of one.

Location of a measurement: where is zero?

Scale of a measurement: how big is one unit?

Both the location and the scale of a variable can be arbitrary to begin with and need to be normalized. Examples: temperature, IQ, emotion. Some other variables have natural location and scale, such as the number of children and the number of days.
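The temperature example can be sketched in code. The readings below are hypothetical, and `standardize` is a helper name introduced here for illustration. Celsius and Fahrenheit differ in both location and scale, yet after centering and rescaling the two versions of the variable should coincide.

```python
import numpy as np

def standardize(v):
    """Center to mean zero, then rescale to unit sample variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Hypothetical temperature readings, expressed on two arbitrary scales.
celsius = np.array([12.0, 18.5, 21.0, 25.5, 30.0])
fahrenheit = celsius * 9.0 / 5.0 + 32.0

z_c = standardize(celsius)
z_f = standardize(fahrenheit)

# Both standardized versions have mean 0 and variance 1, and they agree.
print(z_c.mean(), z_c.var(ddof=1))
print(np.allclose(z_c, z_f))
```

A count variable such as the number of children has a natural zero and a natural unit, so standardizing it discards information that the raw scale carries.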

Standardized regression: a regression with all variables standardized.

$$y_i^* = \beta_1^* x_{i1}^* + \cdots + \beta_{p-1}^* x_{i(p-1)}^* + \varepsilon_i^* \qquad (2)$$

Relationship between (1) and (2):

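Average equation (1) over observations and then subtract the averaged equation from (1). This is equivalent to centering the variables in (1) (note that $\bar{\varepsilon} = 0$):

$$y_i - \bar{Y} = \beta_1 (x_{i1} - \bar{X}_1) + \cdots + \beta_{p-1} (x_{i(p-1)} - \bar{X}_{p-1}) + \varepsilon_i \qquad (3)$$

Note: $\bar{Y} = \beta_0 + \beta_1 \bar{X}_1 + \cdots + \beta_{p-1} \bar{X}_{p-1}$.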

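Divide (3) by $s_y$:

$$\begin{aligned}
(y_i - \bar{Y})/s_y &= (\beta_1/s_y)(x_{i1} - \bar{X}_1) + \cdots + (\beta_{p-1}/s_y)(x_{i(p-1)} - \bar{X}_{p-1}) + \varepsilon_i/s_y \\
&= (\beta_1 s_{x1}/s_y)\,(x_{i1} - \bar{X}_1)/s_{x1} + \cdots + (\beta_{p-1} s_{x(p-1)}/s_y)\,(x_{i(p-1)} - \bar{X}_{p-1})/s_{x(p-1)} + \varepsilon_i/s_y \\
&= \beta_1^* x_{i1}^* + \cdots + \beta_{p-1}^* x_{i(p-1)}^* + \varepsilon_i^*
\end{aligned}$$

That is, $\beta_k^* = \beta_k\, s_{xk}/s_y$.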
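The conversion between unstandardized and standardized coefficients can be checked numerically. The following is a minimal sketch on simulated data; the data-generating values and the helper names `ols` and `standardize` are assumptions made for illustration. It fits the regression once on the raw variables and once on the standardized variables, then compares the standardized slopes with the raw slopes multiplied by the ratio of standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (hypothetical) data: two predictors on very different scales.
x1 = rng.normal(50.0, 10.0, n)
x2 = rng.normal(0.0, 0.5, n) + 0.01 * x1
y = 3.0 + 0.2 * x1 - 4.0 * x2 + rng.normal(0.0, 1.0, n)

def ols(X, target):
    """Least-squares coefficients; X must already contain any intercept column."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def standardize(v):
    """Center to mean zero and rescale to unit sample variance."""
    return (v - v.mean()) / v.std(ddof=1)

# Raw regression with an intercept: b[0] is the intercept, b[1:] are the slopes.
X = np.column_stack([np.ones(n), x1, x2])
b = ols(X, y)

# Standardized regression: every variable has mean zero, so no intercept is needed.
Z = np.column_stack([standardize(x1), standardize(x2)])
b_star = ols(Z, standardize(y))

# Convert the raw slopes: multiply by s_xk and divide by s_y.
s_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
s_y = y.std(ddof=1)
b_converted = b[1:] * s_x / s_y

print(b_star)
print(b_converted)
```

The two printed vectors should agree up to floating-point error, because the relationship is algebraic and holds in any sample, not only on average.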

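When all variables are standardized, we have

$$X'X \Leftrightarrow r_{xx}, \qquad X'y \Leftrightarrow r_{xy},$$

where each pair is equal up to the common factor $n-1$, which cancels in

$$b = (X'X)^{-1} X'y = r_{xx}^{-1} r_{xy}.$$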

In the earlier days of sociology (1960s and 1970s), many studies published correlation matrices so that their regression results could be easily replicated. This is possible because correlation matrices contain all the sufficient statistics for path analysis.
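The replication point can be illustrated with a small sketch. The correlations below are hypothetical numbers chosen for illustration, not taken from any published study. Given only the correlations among the predictors and between each predictor and the outcome, the standardized coefficients follow from solving one linear system.

```python
import numpy as np

# Hypothetical "published" correlations (illustrative values only).
r_xx = np.array([[1.0, 0.5],
                 [0.5, 1.0]])   # correlations among the two predictors
r_xy = np.array([0.6, 0.5])     # correlation of each predictor with y

# Standardized coefficients: solve r_xx b = r_xy.
# solve() is used instead of forming the inverse explicitly.
b_star = np.linalg.solve(r_xx, r_xy)
print(b_star)
```

No raw data are needed, which is why a reader holding only the correlation matrix could reproduce the standardized regression.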

II. Why Standardized Coefficients?

A. Ease of Computation

B. Boundaries of Estimates: -1 to 1.

C. Standardized Scale in Comparison

Which is better: Standardized or Unstandardized?

Unstandardized coefficients are generally better because they tell you more about the data and about changes in real units.

Rule of Thumb:

A. Usually it is not a good idea to report standardized coefficients.

B. Almost always report unstandardized coefficients (if you can).

C. Read standardized coefficients on your own.

D. You can interpret unstandardized coefficients in terms of standard deviations. (homework).

E. If only a correlation matrix is available, then only standardized coefficients can be estimated (LISREL).

F. In an analysis comparing multiple populations, the choice between standardized and unstandardized coefficients is consequential. In this case, theoretical/conceptual considerations should dictate the decision.
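Point F can be made concrete with a simulated sketch. The two populations, their parameter values, and the helper name `slopes` are assumptions introduced for illustration. Both populations share the same unstandardized slope, but the predictor varies more in the second one, so the standardized slopes differ.

```python
import numpy as np

rng = np.random.default_rng(3)

def slopes(sd_x, n=5000):
    """Return (unstandardized, standardized) slope for one simulated population."""
    x = rng.normal(0.0, sd_x, n)
    y = 2.0 * x + rng.normal(0.0, 1.0, n)      # same slope of 2 in every population
    b = np.cov(x, y)[0, 1] / x.var(ddof=1)     # simple-regression slope
    b_star = b * x.std(ddof=1) / y.std(ddof=1) # rescale by s_x / s_y
    return b, b_star

b_a, b_star_a = slopes(sd_x=1.0)   # population A: little variation in x
b_b, b_star_b = slopes(sd_x=3.0)   # population B: more variation in x

print(b_a, b_b)            # both close to 2
print(b_star_a, b_star_b)  # B is larger than A
```

A comparison of standardized coefficients would suggest a stronger effect in population B, while a comparison of unstandardized coefficients would suggest no difference. Which reading is appropriate depends on the substantive question.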

III. Decomposition of Total Effects

A. Difference between reduced-form equations and structural equations

Everything I am now discussing is about systems of equations. What are systems of equations? Systems of equations are equations with different dependent variables. For example, we talked about auxiliary regressions: one independent variable is turned into the new dependent variable.

1. Exogenous variables

Exogenous variables are variables that are used only as independent variables in all equations.

2. Endogenous variables

Endogenous variables are variables that are used as dependent variables in some equations and may be used as independent variables in other equations.

B. Structural Equations versus Reduced Forms

1. Structural Equations

Structural equations are theoretically derived equations that often have endogenous variables as independent variables.

2. Reduced Forms

Reduced form equations are equations in which all independent variables are exogenous variables. In other words, in reduced form equations, we purposely ignore intermediate (or relevant) variables.
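The distinction can be sketched with a hypothetical three-variable path model, in which an exogenous variable x affects an outcome y both directly and through an intermediate endogenous variable m. The variable names, coefficient values, and the helper `ols_slopes` are assumptions made for illustration. The structural equations include m; the reduced form regresses y on x alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Simulated (hypothetical) path model: x -> m -> y, plus a direct path x -> y.
x = rng.normal(size=n)                        # exogenous
m = 0.5 * x + rng.normal(size=n)              # endogenous, intermediate
y = 0.3 * x + 0.8 * m + rng.normal(size=n)    # endogenous, outcome

def ols_slopes(columns, target):
    """Fit OLS with an intercept and return only the slope coefficients."""
    X = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]

# Structural equations: m on x, and y on both x and m.
(c,) = ols_slopes([x], m)
a, b = ols_slopes([x, m], y)

# Reduced form: y on the exogenous variable only, ignoring m.
(total,) = ols_slopes([x], y)

print(total)       # reduced-form coefficient of x
print(a + b * c)   # direct path plus the path through m
```

With OLS estimates the reduced-form coefficient should equal the direct coefficient plus the product of the two coefficients along the indirect path, up to floating-point error.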
