SPSS Ridge Regression
Ridge Estimators —for standardized regression model
For ordinary least squares, the normal equations are given by (1):

(X′X)b = X′Y    (1)
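As a quick numerical check (not part of the lecture), the normal equations (1) can be solved directly for a tiny hypothetical data set; this is a minimal pure-Python sketch with made-up numbers, not the course's SPSS procedure:

```python
# Solve the OLS normal equations (1), (X'X) b = X'Y, for a small example.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve2(M, v):
    # Cramer's rule for a 2x2 system M x = v
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

# Design matrix with an intercept column and one predictor (hypothetical data)
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
Y = [[3.0], [5.0], [7.0], [9.0]]   # exactly Y = 1 + 2*X

XtX = matmul(transpose(X), X)
XtY = matmul(transpose(X), Y)
b = solve2(XtX, [row[0] for row in XtY])
print(b)  # -> [1.0, 2.0] (intercept, slope)
```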
When all variables are transformed by the correlation transformation (3), the transformed regression model is given by (4):
Choice of Biasing Constant k
- Clearly, when k = 0, bkR reduces to the ordinary least squares estimator; and as k → ∞, bkR tends to 0. Hence k should not be too large.
- Since the choice of k is essentially arbitrary, an important question in ridge regression analysis is what value of k is appropriate.
- Because the ridge estimator is biased, k should not be too large; in general we want to retain as much information as possible, i.e., to keep k as small as we can.
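The two limits noted above (k = 0 gives OLS; k → ∞ shrinks the coefficients to 0) can be checked numerically. A minimal sketch for two standardized predictors; the correlations r12 = 0.9, rY1 = 0.8, rY2 = 0.7 are hypothetical values chosen for illustration:

```python
# bR(k) = (rXX + kI)^(-1) rYX for two standardized predictors.

def ridge2(r12, rY1, rY2, k):
    # Solve (rXX + kI) bR = rYX by Cramer's rule for the 2x2 case
    a = 1.0 + k                     # diagonal entries of rXX + kI
    det = a * a - r12 * r12
    return [(rY1 * a - r12 * rY2) / det,
            (a * rY2 - r12 * rY1) / det]

b_ols   = ridge2(0.9, 0.8, 0.7, 0.0)    # k = 0: ordinary least squares
b_small = ridge2(0.9, 0.8, 0.7, 0.1)    # small bias, more stable
b_huge  = ridge2(0.9, 0.8, 0.7, 1e6)    # very large k: coefficients near 0
print(b_ols, b_small, b_huge)
```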
Yi* = β1* Xi1* + β2* Xi2* + … + βp−1* Xi,p−1* + εi*    (4)
Ridge Estimators —for standardized regression model
And the least squares normal equations are given by (5):

rXX b = rYX    (5)

Solving the ridge normal equations yields the ridge estimators (10):

bR = (rXX + kI)^(−1) rYX    (10)
The constant k reflects the amount of bias in the estimators bkR. When k = 0, formula (10) reduces to the ordinary least squares regression coefficients
1. Some Remedial Measures for Multicollinearity
2. Principles of Ridge Regression
3. SPSS Example for Ridge Regression
4. Comments for Ridge Regression
Some Remedial Measures for Multicollinearity
Where rXX is the correlation matrix of the X variables defined in (6) and rYX is the vector of coefficients of simple correlation between Y and
each X variable defined in (7).
Ridge Estimators —for standardized regression model
in standardized form. When k > 0, the ridge regression coefficients are
biased but tend to be more stable (i.e., less variable) than ordinary least squares estimators.
- The Figure illustrates this situation. Estimator b is unbiased but imprecise, whereas estimator bR is much more precise but has a small bias. The probability that bR falls near the true value β is much greater than that for the unbiased estimator b.
- This remedial measure has two important limitations.
- First, no direct information is obtained about the dropped predictor variable.
- Second, the magnitudes of the regression coefficients for the predictor variables remaining in the model are affected by the correlated predictor variables not included in the model.
SPSS Statistical Data Analysis and Practice
Lecturer: Zhou Tao, Associate Professor, College of Resources Science, Beijing Normal University
2007-12-11
Course website: http://www.ires.cn/Courses/SPSS
Chapter 16: Ridge Regression (Multicollinearity Remedial Measures)
Contents:
- Therefore, one can examine how the fitted equation changes for different values of k and choose the smallest k at which the coefficients become essentially stable.
Choice of Biasing Constant k
- A commonly used method of determining the biasing constant k is based on the ridge trace and the variance inflation factors (VIF)k.
- A limitation of principal components regression, also called latent root regression, is that it may be difficult to attach concrete meaning to the indexes.
Some Remedial Measures for Multicollinearity
3. Another remedial measure for multicollinearity that can be used with ordinary least squares is to form one or several composite indexes based on the highly correlated predictor variables, an index being a linear combination of the correlated predictor variables. The methodology of principal components provides composite indexes that are uncorrelated.
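For two predictors, the principal-components idea above has a simple closed form: when the two standardized predictors have equal variance, the components (x1 + x2)/√2 and (x1 − x2)/√2 are exactly uncorrelated. A minimal sketch with hypothetical data (the exact decorrelation relies on the equal-variance assumption):

```python
# Composite indexes via 2-variable principal components (hypothetical data).
import math

x1 = [-1.5, -0.5, 0.5, 1.5]
x2 = [-0.5, -1.5, 0.5, 1.5]   # highly correlated with x1, same variance

z1 = [(a + b) / math.sqrt(2) for a, b in zip(x1, x2)]   # first index
z2 = [(a - b) / math.sqrt(2) for a, b in zip(x1, x2)]   # second index

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

print(cov(x1, x2))   # large: the raw predictors are strongly correlated
print(cov(z1, z2))   # essentially zero: the indexes are uncorrelated
```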
The ridge standardized regression estimators are obtained by introducing into the least squares equations (5) a biasing constant k≥0, in the following form:
1. As we saw in Chapter 9, the presence of serious multicollinearity often does not affect the usefulness of the fitted model for estimating mean responses or making predictions. Hence, one remedial measure is to restrict the use of the fitted regression model to inferences for values of the predictor variables that follow the same multicollinearity pattern as the data.
Some Remedial Measures for Multicollinearity
4. Ridge regression is one of several methods that have been proposed to remedy the multicollinearity problem by modifying the method of least squares to allow biased estimators of the regression coefficients.
Basic Principles of Ridge Regression
Ridge Regression—Biased Estimation
- When an estimator has only a small bias and is substantially more precise than an unbiased estimator, it may well be the preferred estimator since it will have a larger probability of being close to the true parameter value.
The vector of ridge standardized regression coefficients is:

bR = [b1R  b2R  …  bp−1R]′    (9)
Ridge Estimators —for standardized regression model
Solution of the normal equations (8) yields the ridge standardized regression coefficients:
(rXX + kI)bR = rYX (8)
where bR is the vector of the standardized ridge regression coefficients bkR, and I is the (p−1)×(p−1) identity matrix.
Here bR is the (p−1)×1 vector of ridge coefficients, rXX is the (p−1)×(p−1) correlation matrix of the X variables (6), and rYX is the (p−1)×1 vector of simple correlations between Y and each X variable (7):

        ⎡ 1        r12      …   r1,p−1 ⎤
rXX  =  ⎢ r21      1        …   r2,p−1 ⎥    (6)
        ⎢ ⋮        ⋮             ⋮     ⎥
        ⎣ rp−1,1   rp−1,2   …   1      ⎦

        ⎡ rY1    ⎤
rYX  =  ⎢ rY2    ⎥    (7)
        ⎢ ⋮      ⎥
        ⎣ rY,p−1 ⎦
Here, r12 denotes the coefficient of simple correlation between X1 and X2, and so on.
SY = [ Σi (Yi − Ȳ)² / (n − 1) ]^(1/2)
Sk = [ Σi (Xik − X̄k)² / (n − 1) ]^(1/2)    (k = 1, …, p−1)    (2)

Yi*  = (1 / √(n − 1)) · (Yi − Ȳ) / SY
Xik* = (1 / √(n − 1)) · (Xik − X̄k) / Sk    (k = 1, …, p−1)    (3)
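A minimal sketch of the correlation transformation (2) and (3) on hypothetical data: after transforming, each column has sum of squares 1, and the cross product of two transformed columns is exactly their simple correlation r12 (which is why the transformed model leads to the correlation matrices rXX and rYX):

```python
# Correlation transformation (2)-(3) applied to a small made-up sample.
import math

def transform(v):
    n = len(v)
    mean = sum(v) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in v) / (n - 1))   # (2)
    return [(x - mean) / (s * math.sqrt(n - 1)) for x in v]    # (3)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
t1, t2 = transform(x1), transform(x2)

ss = sum(v * v for v in t1)                 # sum of squares of a column -> 1
r12 = sum(a * b for a, b in zip(t1, t2))    # simple correlation of x1, x2
print(ss, r12)
```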
Some Remedial Measures for Multicollinearity
2. One or several predictor variables may be dropped from the model in order to lessen the multicollinearity and thereby reduce the standard errors of the estimated regression coefficients of the predictor variables remaining in the model.
- The ridge trace is a simultaneous plot of the values of the p−1 estimated ridge standardized regression coefficients for different values of k, usually between 0 and 1.
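The ridge trace values themselves are easy to compute. The sketch below evaluates bR(k) = (rXX + kI)^(−1) rYX for two standardized predictors over a grid of k between 0 and 1; the correlations (r12 = 0.95, rY1 = 0.7, rY2 = 0.69) are hypothetical values chosen to make the trace's stabilizing behavior visible:

```python
# Ridge trace: bR(k) on a grid of k in [0, 1] for two standardized predictors.

def ridge2(r12, rY1, rY2, k):
    a = 1.0 + k                    # diagonal entries of rXX + kI
    det = a * a - r12 * r12
    return ((rY1 * a - r12 * rY2) / det,
            (a * rY2 - r12 * rY1) / det)

trace = [(k / 10.0, ridge2(0.95, 0.7, 0.69, k / 10.0)) for k in range(11)]
for k, (b1, b2) in trace:
    print(f"k={k:.1f}  b1R={b1:+.3f}  b2R={b2:+.3f}")
```

Plotting b1R and b2R against k reproduces the ridge trace described above: the coefficients change rapidly near k = 0 and then settle down, and the smallest k in the stable region is the usual choice.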