2008: Multilevel Analysis Using Stata
3-level model in Stata (xtmixed)
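The slide's own output is not reproduced in this text. As a minimal sketch only, assuming Stata 13 or later (where `xtmixed` was renamed `mixed`; `xtmixed` still runs), the same kind of three-level random-intercepts model can be fitted to the public `productivity` example data from the Stata manuals, in which state-years are nested in states and states are nested in regions. This is not the model shown on the slide.

```stata
* Three-level random-intercepts model: observations (state-years)
* nested in states, states nested in regions.
webuse productivity, clear

* Random-effects levels are listed from highest to lowest after ||.
mixed gsp private emp hwy water other unemp || region: || state:

* Equivalent older syntax (Stata 10-12), as used in the slides:
* xtmixed gsp private emp hwy water other unemp || region: || state:
```

The output reports one variance component per level (region, state, residual), which is what distinguishes a three-level model from a two-level one.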
The same model in MLwiN
A controversial claim about Stata
• Stata is the best package to use for multilevel modelling.
• In SPSS and Stata, the standard regression specification can be extended to fit the simplest random intercepts model.
• By contrast:
– Other mainstream packages don’t have an adequate range of model estimators
– Specialist packages (e.g. MLwiN; HLM) do have more advanced modelling estimators, but they inhibit data manipulation / serious model building
The work of statistical modelling
• $Y_i = BX_i + e_i$
• Most of the time:
– we have a single Y
– we ignore e
– we concentrate on what goes into B
Example
• Data: British Household Panel Survey 2005 adult interviews (7k adults in work)
• Y = GHQ scale score for adults in employment (General Health Questionnaire; higher = worse subjective well-being)
• X = various possible measures, including gender, age, marital status, occupational advantage, education and partner’s GHQ
• You can run this example; the files are at:
Scottish Social Survey Network: Master Class 1 Data Analysis with Stata
Dr Vernon Gayle and Dr Paul Lambert
23rd January 2008, University of Stirling
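Reasons for the claim that Stata is the best package for multilevel modelling:
– It is integrated with data management capacity: easy to change variables, change cases, add higher level explanatory variables, etc.
– It has a wide range of hierarchical model estimators
– It allows easy comparison between long-standing hierarchical estimators (from economics) and new random effects models
Results from four linear models
(Y = GHQ score; * and ** mark statistical significance as on the original slide, thresholds not stated; an empty cell means no value in the extracted slide)
Variable | Model 1 | Model 2 | Model 3 | Model 4
Cons | | 6.29** | 6.14** | 6.56**
Fem | | 1.25** | 1.28** | 1.39**
Age | | 0.22** | 0.23** | 0.22**
Age-squared | | -0.0024** | -0.0026** | -0.0024**
Cohab | | -0.77** | -0.76** | -1.52**
Own CAMSIS | | | -0.01* | -0.01
R2 | 0.0009 | 0.0234 | 0.0244 | 0.0293
Model 1 also reports two estimates, 11.03** and -0.33*, and the slide lists values for Father’s CAMSIS, Degree/Diploma, Vocational qual, No qual, Works > 10hrs and Partner’s GHQ (0.01, -0.05, -0.13, -0.11, 0.13, 0.08**); the extracted text does not show which row and model each of these belongs to.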
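A table of nested models like the one above can be produced in Stata with `estimates store` and `estimates table`. The sketch below is an illustration on the public `nlswork` dataset (log wages of young women), not a reproduction of the BHPS results; the sequence of four models is hypothetical.

```stata
* Fit a sequence of nested linear models and show them side by side.
webuse nlswork, clear
capture drop age2
generate age2 = age^2          // squared term, like Age-squared above

* Fix one common estimation sample so the columns are comparable.
quietly regress ln_wage age age2 grade ttl_exp tenure
generate byte insample = e(sample)

quietly regress ln_wage age                           if insample
estimates store m1
quietly regress ln_wage age age2                      if insample
estimates store m2
quietly regress ln_wage age age2 grade                if insample
estimates store m3
quietly regress ln_wage age age2 grade ttl_exp tenure if insample
estimates store m4

* star marks significance (default levels .05, .01, .001);
* stats() adds the sample size and R-squared under each column.
estimates table m1 m2 m3 m4, star stats(N r2) b(%9.4f)
```

Holding the sample fixed matters: if later models drop cases with missing values, changes in the coefficients or R2 could reflect the changed sample rather than the added variables.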
Some regression assumptions
The SSSN is funded under Phase II of the ESRC Research Development Initiative
Multilevel data and analysis with Stata (in 15 minutes)
Generalised linear model
• Models which ignore clustering should give unbiased but inefficient estimates
• The simplest multilevel model:
– shouldn’t change the coefficient estimates (they were already unbiased)
– should change the confidence intervals (because the estimates ignoring clustering were inefficient)
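Why the confidence intervals should change can be seen from a standard approximation, the Moulton factor (an addition here, assuming every cluster has the same size $m$). The true sampling variance of an OLS coefficient exceeds the variance that OLS reports by roughly:

```latex
\frac{\operatorname{Var}_{\text{true}}(\hat{B})}{\operatorname{Var}_{\text{OLS}}(\hat{B})}
  \approx 1 + (m - 1)\,\rho_x\,\rho_e ,
\qquad
\rho_e = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Here $\rho_x$ is the within-cluster correlation of the explanatory variable and $\rho_e$ is the intra-cluster correlation of the error under the random-intercepts model $Y_{ij} = BX_{ij} + u_j + e_{ij}$. With, for example, $m = 10$, $\rho_x = 0.5$ and $\rho_e = 0.2$, the factor is $1 + 9 \times 0.1 = 1.9$, so standard errors that ignore clustering are too small by a factor of about $\sqrt{1.9} \approx 1.38$. When $X$ is constant within clusters ($\rho_x = 1$), this reduces to the familiar design effect $1 + (m - 1)\rho_e$.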
[Figure: examples of hierarchical clusters in survey data. Individuals belong to person groups, primary sampling units (PSU1, PSU2, PSU3, ...) and regions; the same person is observed in Wave 1, Wave 2 and Wave 3; interviewers cut across waves (Interviewer1 in waves 1 and 3, Interviewer2 in wave 2 only).]
• Y = BX + e
• Y = outcome variable(s)
• X = explanatory variables
• e = error term for each individual response
Generalised linear mixed models
– Adding complexity to the GLM, such as by disaggregating the error structures
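One standard way to write this extension, as a sketch assuming a single random intercept and adding a link function $g$ to the document's notation:

```latex
g\!\left(\mathrm{E}[Y_{ij} \mid X_{ij}, u_j]\right) = BX_{ij} + u_j ,
\qquad u_j \sim N(0, \sigma_u^2)
```

With the identity link and a normal individual-level error $e_{ij} \sim N(0, \sigma_e^2)$, this reduces to the random-intercepts model $Y_{ij} = BX_{ij} + u_j + e_{ij}$. With a logit link it gives a multilevel logistic regression, which Stata fits with `xtmelogit` (renamed `melogit` in Stata 13).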
• All variables are measured without error
• All relevant predictors of the dependent variable are included in the analysis
• Expected value of the error is zero
• Homoscedasticity of the error (constant error variance)
• No autocorrelation (no relation between error terms for different cases)
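Two of these assumptions can be checked informally after `regress`. This is a minimal sketch that uses the public `nlswork` panel (repeated observations per woman, identified by `idcode`) as a stand-in for the BHPS data. `estat hettest` tests for heteroscedasticity, and `loneway` estimates how strongly residuals from the same person are correlated. That within-cluster correlation is the kind of clustering multilevel models address.

```stata
* Informal checks of two regression assumptions on a public panel dataset.
webuse nlswork, clear
regress ln_wage age grade

* Homoscedasticity: Breusch-Pagan / Cook-Weisberg test for heteroscedasticity.
estat hettest

* "No autocorrelation": residuals from the same person should be unrelated.
* loneway reports the intraclass correlation of the residuals within idcode.
predict double resid, residuals
loneway resid idcode
display "ICC of residuals within persons: " r(rho)
```

A clearly positive intraclass correlation here signals that the independence assumption fails and that the hierarchical structure should be modelled.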
Stata examples
• regress ghq fem age age2 cohab
• regress ghq fem age age2 cohab, robust cluster(ohid)
• xtmixed ghq fem age age2 cohab ||ohid:
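The three commands above use the Stata 10 syntax of the time. Below is a sketch of the same three specifications in current syntax: `vce(cluster ...)` is the newer form of `robust cluster()`, and `mixed` replaces `xtmixed` from Stata 13. It runs on the public `nlswork` data because the BHPS extract is not included here. `idcode` plays the role of the household identifier `ohid`, and the covariates are stand-ins.

```stata
* Three ways of handling clustered data, on a public stand-in dataset.
webuse nlswork, clear
capture drop age2
generate age2 = age^2

* 1. Ordinary regression, ignoring clustering
regress ln_wage age age2 grade
estimates store ols

* 2. Same coefficients, cluster-robust standard errors
regress ln_wage age age2 grade, vce(cluster idcode)
estimates store clus

* 3. Random-intercepts multilevel model (formerly xtmixed)
mixed ln_wage age age2 grade || idcode:
estimates store ri

* Coefficients with standard errors, side by side
estimates table ols clus ri, b se
```

Columns 1 and 2 share identical coefficients and differ only in their standard errors. Column 3 re-weights the data by the estimated variance components, so its coefficients can also move when cluster sizes are unbalanced.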
Comments
– [Regression assumptions above based on: Menard, S. 1995. Applied Logistic Regression Analysis. London: Sage.]
Multilevel modelling
• What if there was some connection between some of the cases within the dataset?
– This occurs by design in certain projects
• e.g. educational research, sample includes multiple children from the same school
– Some connections (‘hierarchical clusters’) are standard in most social surveys
• Linear model: $Y_i = BX_i + e_i$
• Multilevel model (‘random intercepts’): $Y_{ij} = BX_{ij} + u_j + e_{ij}$
• Multilevel model (‘random coefficients’): $Y_{ij} = BX_{ij} + u_{Bj}X_{ij} + u_j + e_{ij}$
(i indexes individuals, j indexes clusters)
3. We could try telling the model that we expect the error terms to be related
– these are ‘hierarchical random effects’ = multilevel models
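Option 3 corresponds to the random-intercepts and random-coefficients equations given earlier. This is a minimal sketch, assuming Stata 13 or later (`mixed`, formerly `xtmixed`). It uses the public `pig` example data (weekly weights, nine weeks nested within each of 48 pigs) as a stand-in for clustered survey data.

```stata
* Random intercepts vs random coefficients on public example data.
webuse pig, clear

* Random intercepts: Y_ij = B X_ij + u_j + e_ij
mixed weight week || id:
estimates store ri
estat icc        // share of unexplained variance lying between pigs

* Random coefficients: adds a pig-specific slope u_Bj on week;
* covariance(unstructured) lets u_j and u_Bj be correlated.
mixed weight week || id: week, covariance(unstructured)
estimates store rc

* Likelihood-ratio test of the extra slope parameters (Stata notes the
* test is conservative because a variance is tested at its boundary).
lrtest rc ri
```

`estat icc` is used only after the random-intercepts model, because the intraclass correlation is not a single number once slopes vary by cluster.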
Creating a multilevel model
How to account for hierarchy / clustering in individual data?
1. We could try a unique dummy variable for every cluster
– Country: Y = BX + scot + wal + Nir + e
– ‘areg’ in Stata allows several hundred variables like this
– often called a ‘hierarchical fixed effect’
– but many hierarchies have too many clusters for this to be satisfactory
2. We could use higher level explanatory variables
– e.g. average unemployment rate in local authority district
– these are also ‘hierarchical fixed effects’
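Approaches 1 and 2 can be sketched on the public `productivity` example data, where state-years are nested in states and states in regions. The variable `reg_unemp` is a hypothetical name created here as an analogue of the district unemployment example. `areg` gives the same slope estimates as listing every state dummy in `regress`, but it does not print hundreds of dummy coefficients.

```stata
* Hierarchical fixed effects on public example data.
webuse productivity, clear

* 1. One dummy per state, entered explicitly...
regress gsp private emp unemp i.state
scalar b_dummies = _b[private]

*    ...and the same dummies absorbed by areg (identical slopes).
areg gsp private emp unemp, absorb(state)
scalar b_areg = _b[private]
display "Relative difference: " reldif(b_dummies, b_areg)

* 2. A higher-level explanatory variable: the regional average of the
*    state unemployment rate (hypothetical variable reg_unemp).
bysort region: egen reg_unemp = mean(unemp)
regress gsp private emp reg_unemp
* Note: these standard errors still ignore the clustering of states
* within regions; option 3 (random effects) addresses that.
```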