RUNNING HEAD: Standardized Coefficients in Logistic Regression
Standardized Coefficients in Logistic Regression
Jason E. King
Baylor College of Medicine
Paper presented at the annual meeting of the Southwest Educational Research Association, San Antonio, Texas, Feb. 7-10, 2007.
Correspondence concerning this paper should be addressed to Jason King, One Baylor Plaza, MS: BCM #155, Houston, TX. 77030. E-mail: Jasonk@
Abstract
The aim of this paper is to briefly describe a relatively new standardized coefficient for making variable comparisons in logistic regression. A Microsoft Excel function is appended and application is made using a readily available dataset.
Introduction
Although logistic regression was slow to catch on initially (White, Long, & Tansey, 1997), the last two decades have seen tremendous growth in its use within the social sciences. In spite of the expanding literature base, many researchers remain unaware of or unable to apply best practices in logistic regression. This is due in part to the necessary lag between the publication of new methods and their subsequent implementation in statistical computing packages. Calculation of improved standardized coefficients is a case in point. The SAS and SPSS procedures for calculating standardized logistic regression coefficients are inferior to more recently developed approaches. The aim of this paper is to briefly describe an effective and relatively new standardized coefficient and present a Microsoft Excel function applying the algorithm.
Standardized Coefficients
In linear regression, standardized beta weights are often used to compare strength of prediction across variables. The predictors are placed on a common scale so that each has the same mean and standard deviation. Variables having larger standardized beta weights (in absolute value) are considered to be stronger predictors in the equation. Though some researchers discourage the use of standardized coefficients (Darlington, 1990) or warn against interpreting them as indicators of variable importance (Neter et al., 1996; Pedhazur, 1997), the most recent publication manual of the American Psychological Association (2001) encourages their routine use and reporting. Standardized beta weights are especially useful when variables are measured on an arbitrary scale (e.g., Likert ratings).
It is possible to obtain standardized coefficients in logistic regression (Long, 1997), though things become appreciably more complex than in linear regression. One problem is the relatively meaningless notion of a standard deviation change on a categorical covariate (Greenland, Schlesselman, & Criqui, 1986). Another is that interpretation of variable importance using standardized coefficients is typically tied to log-odds, which are difficult to interpret. Further complicating matters is fluctuation in the variance of the criterion across values of the predictor variables. SPSS does not print any form of standardized logistic regression coefficient. SAS prints a partial or semi-standardized coefficient that is not without limitations (Kaufman, 1996).
One of the more useful standardized coefficients is given by Kaufman (1996) as:

$$
SS_{\Delta P} = \frac{1}{1 + \exp\left[-\left(\ln\dfrac{P_{\text{Ref}}}{1 - P_{\text{Ref}}} + \dfrac{1}{2}\hat{b}s\right)\right]} - \frac{1}{1 + \exp\left[-\left(\ln\dfrac{P_{\text{Ref}}}{1 - P_{\text{Ref}}} - \dfrac{1}{2}\hat{b}s\right)\right]} \tag{1}
$$

where $P_{\text{Ref}}$ = a probability value used as a reference point, $\hat{b}$ = the unstandardized logistic regression coefficient, and $s$ = the sample standard deviation of the predictor. This coefficient measures the change in predicted probability associated with a one-standard deviation change in the predictor and is restricted to the interval -1 to 1. Technically, $SS_{\Delta P}$ is only semi-standardized because the predictor variable alone is standardized. Coefficients that incorporate full standardization of criterion and predictors may be preferable when the goal is comparing predictors across models rather than within models.
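As a rough illustration of Equation 1, the following Python sketch evaluates the coefficient directly. The function names `logistic` and `semi_standardized_delta_p` are hypothetical labels chosen for this example, not part of Kaufman (1996) or of any statistical package.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def semi_standardized_delta_p(p_ref: float, b: float, s: float) -> float:
    """Evaluate Equation 1 for one predictor.

    p_ref: reference probability (strictly between 0 and 1)
    b:     unstandardized logistic regression coefficient
    s:     sample standard deviation of the predictor
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    ref_logit = math.log(p_ref / (1.0 - p_ref))
    half_step = 0.5 * b * s  # half of a one-SD change, on the logit scale
    return logistic(ref_logit + half_step) - logistic(ref_logit - half_step)
```

Under this sketch a coefficient of zero yields a value of zero, reversing the sign of the coefficient reverses the sign of the result, and the result always stays inside the interval -1 to 1.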
Equation 1 suggests that it is necessary to select a reference predicted probability in calculating the index. This is because its value typically varies based on the probability selected. Reasonable reference values include the mean predicted probability, the predicted probability corresponding to the mean of the independent variable, or P = 0.5 (Kaufman, 1996). We will focus on the mean predicted probability. Appendix A presents a Microsoft Excel function for obtaining the standardized coefficient. To use the function: (a) calculate the sample standard deviation for each predictor, (b) save logistic regression predicted probabilities, (c) calculate the mean of those values to obtain $P_{\text{Ref}}$, and (d) substitute $s$, $\hat{b}$, and $P_{\text{Ref}}$ into the Excel function.
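Steps (a) through (d) can also be sketched outside of Excel. The Python example below assumes the logistic model has already been fitted elsewhere and that its predicted probabilities were saved. The data values, the coefficient `b_education`, and the function name `standardized_coefficient` are all hypothetical and serve only to show the order of operations.

```python
import math
import statistics


def standardized_coefficient(x_values, predicted_probs, b):
    """Follow steps (a)-(d) for a single predictor."""
    s = statistics.stdev(x_values)              # (a) sample SD (n - 1 denominator)
    p_ref = statistics.fmean(predicted_probs)   # (b)-(c) mean of saved predicted probabilities
    ref_logit = math.log(p_ref / (1.0 - p_ref))
    half_step = 0.5 * b * s
    upper = 1.0 / (1.0 + math.exp(-(ref_logit + half_step)))
    lower = 1.0 / (1.0 + math.exp(-(ref_logit - half_step)))
    return upper - lower                        # (d) substitute s, b, and P_Ref


# Hypothetical illustration data (NOT the Employee dataset).
education_years = [8, 12, 12, 15, 16, 19]
predicted_probs = [0.02, 0.05, 0.08, 0.20, 0.35, 0.60]
b_education = 0.9  # hypothetical unstandardized coefficient

coef = standardized_coefficient(education_years, predicted_probs, b_education)
print(f"semi-standardized coefficient: {coef:.3f}")
```

The same reference probability would be reused for every predictor in the model, so only the coefficient and the standard deviation change from one predictor to the next.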
Application
Application is made using the Employee dataset which comes bundled with recent versions of the Statistical Package for the Social Sciences (SPSS) and can also be freely downloaded at <>. To access the file, log in as Guest and download 75brief.exe from the Patches & Utilities/SPSS Utilities/Example Data Files directory. The database includes measures of Employee Education Level (in years), Sex (recoded as 0 = male; 1 = female), Minority Status (0 = no; 1 = yes), Previous Experience (in months), Case ID, and Job Category (custodial, clerical, managerial). The custodial and clerical job categories were combined to form a dichotomous criterion and recoded as custodial/clerical = 0 and managerial = 1. The file originally contained n = 474 cases. Deleting 24 cases with missing data on the Experience variable left 450 usable observations.
Table 1 lists the logistic regression standardized coefficients obtained for a model in which Job Category (custodial/clerical, managerial) is predicted using Sex, Minority Status, Education, and Experience. The Education variable evidenced the largest coefficient with a value of 0.736. It is interpreted as follows: A one-standard deviation increase in Education increases the predicted probability of being a manager by 0.736 in the context of P = 0.187. Because the coefficient is bounded by -1 and 1, the result can be interpreted as the proportion of the maximum possible change in the probability of the outcome. Thus, the value 0.736 represents 74% of the total change possible in predicted probability. More specifically, because Equation 1 centers the one-standard deviation interval on the logit of the reference probability, 0.736 is the difference between the predicted probabilities half a standard deviation above and half a standard deviation below the point at which P = 0.187, a very large increase. By contrast, a one-standard deviation change in Experience yields a coefficient of only 0.027, less than a 3% increase in predicted probability. For comparative purposes, Table 2 presents the ordinary linear regression estimates and standardized coefficients.
Extensions
An alternative to selecting a single reference probability is to consider a range of probabilities. This approach faithfully represents the nonlinear relationship between predictors and predicted criterion values and is easily accomplished using a spreadsheet. Figure 1 depicts a range of change in predicted probabilities. Note that stronger predictors (i.e., Education and Minority Status) produce greater variability in strength of prediction. It is also clear from the figure that although the rank ordering of variables remains constant, the maximum point of dispersion among standardized coefficients occurs when P = 0.5. Such graphics aid in understanding complex nonlinear relationships and promote explorations of the strength of relationship across a range of candidate predicted probabilities.
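A range of reference probabilities can be explored with a few lines of code as well as with a spreadsheet. The Python sketch below evaluates Equation 1 over a grid of reference probabilities for two hypothetical predictors. The coefficient and standard deviation pairs are invented for illustration and are not the Employee dataset estimates.

```python
import math


def delta_p(p_ref: float, b: float, s: float) -> float:
    """Equation 1 evaluated at reference probability p_ref."""
    ref_logit = math.log(p_ref / (1.0 - p_ref))
    half_step = 0.5 * b * s
    upper = 1.0 / (1.0 + math.exp(-(ref_logit + half_step)))
    lower = 1.0 / (1.0 + math.exp(-(ref_logit - half_step)))
    return upper - lower


# Hypothetical (b, s) pairs: one strong and one weak predictor.
predictors = {"strong": (1.5, 2.0), "weak": (0.1, 2.0)}

# Reference probabilities 0.05, 0.10, ..., 0.95.
grid = [i / 20 for i in range(1, 20)]

curves = {
    name: [delta_p(p, b, s) for p in grid]
    for name, (b, s) in predictors.items()
}

# Locate the reference probability at which each curve is largest in absolute value.
peaks = {
    name: grid[max(range(len(grid)), key=lambda i: abs(curve[i]))]
    for name, curve in curves.items()
}

for name, curve in curves.items():
    print(name, "peak at P =", peaks[name], "range:",
          round(min(curve), 3), "to", round(max(curve), 3))
```

In this sketch each predictor's coefficient is largest in absolute value at P = 0.5 and shrinks toward zero as the reference probability approaches 0 or 1, and the stronger predictor varies more across the grid than the weaker one.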
Concluding Remarks
Standardized logistic regression coefficients aid in exploring the differential effects of predictors on the criterion. Although newer methods for calculating these coefficients are currently unavailable in the popular statistical computing packages, researchers are encouraged to make use of the appended spreadsheet and begin reporting these results when applicable. As logistic regression grows in popularity and usage, social scientists should occupy the forefront in applying the most current and effective methodological practices, which will in turn lead to stronger scientific research.
References
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Darlington, R. B. (1990). Regression and linear models. New York: McGraw-Hill.
Greenland, S., Schlesselman, J. J., & Criqui, M. H. (1986). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203-208.
Kaufman, R. L. (1996). Comparing effects in dichotomous logistic regression: A variety of standardized coefficients. Social Science Quarterly, 77, 90-109.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications, Inc.
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear statistical models (3rd ed.). Chicago, IL: R.D. Irwin.
Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Fort Worth, TX: Harcourt Brace College.
White, M. C., Long, R. G., & Tansey, R. (1997). Logistic regression: The generalized linear model in the social sciences. Perceptual and Motor Skills, 85, 66.
Appendix A
A Microsoft Excel Function for Calculating a Standardized Logistic Regression Coefficient
Cell A1 = Enter the mean predicted probability for the dataset.
Cell A2 = Enter the unstandardized beta weight for X.
Cell A3 = Enter the sample standard deviation for X.
Cell A4: Calculate a standardized coefficient for X by typing:
=(1/(1+EXP(-(LN(A1/(1-A1))+0.5*A2*A3))))-(1/(1+EXP(-(LN(A1/(1-A1))-0.5*A2*A3))))
Repeat for additional predictor variables.
Table 1
Logistic Regression Results
Variable          b̂      SE b̂        β     Wald t     Prob.       OR
(constant)   -28.301     4.304       --     43.244    0.000a    0.000
Sex           -0.805     0.452   -0.061      3.173    0.075     0.447
Minority      -2.442     0.819   -0.156      8.886    0.003     0.087
Education      1.772     0.275    0.736     41.400    0.000a    5.883
Experience     0.002     0.003    0.027      0.406    0.524     1.002
Note: β = semi-standardized beta weight using the mean predicted probability of 0.187 as a reference value. OR = odds ratio. D0 = -2LL Intercept = 433.218; DM = -2LL Model = 165.887; G(4) = 267.331, p < 0.001.
a Value less than 0.0005.
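Two quantities in Table 1 follow arithmetically from others in the same table: each odds ratio is the exponentiated coefficient, and G is the difference between the two deviances given in the note. The short Python check below recomputes both from the rounded values printed in the table, so agreement is expected only to about three decimals. The variable names are chosen for this example.

```python
import math

# Coefficients and odds ratios transcribed from Table 1 (rounded in the source).
coefficients = {"Sex": -0.805, "Minority": -2.442, "Education": 1.772, "Experience": 0.002}
reported_or = {"Sex": 0.447, "Minority": 0.087, "Education": 5.883, "Experience": 1.002}

# Odds ratio = exp(b) for each predictor.
computed_or = {name: math.exp(b) for name, b in coefficients.items()}
for name in coefficients:
    print(f"{name}: exp(b) = {computed_or[name]:.3f}, reported OR = {reported_or[name]:.3f}")

# Model chi-square G = D0 - DM, using the deviances from the table note.
d0 = 433.218  # -2LL, intercept-only model
dm = 165.887  # -2LL, full model
g = d0 - dm
print(f"G = {g:.3f}")
```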
Table 2
Linear Regression Results
Variable          b̂      SE b̂        β          t     Prob.      r s
(constant)    -0.804     0.092       --     -8.765    0.000a       --
Sex           -0.082     0.032   -0.104     -2.553    0.011    -0.473
Minority      -0.131     0.035   -0.141     -3.757    0.000a   -0.325
Education      0.076     0.006    0.569     13.467    0.000a    0.961
Experience     0.000a    0.000a   0.063      1.588    0.113    -0.169
Note: β = standardized beta weight. r s = structure coefficient. Multiple R2 = 0.397.
a Value less than 0.0005.
Figure 1. Range of estimated standardized logistic coefficients.