多水平统计模型简介(全套课件108P)
多水平结构方程模型 ppt课件
多水平结构方程模型
多水平结构方程模型
• 概念
(Hyman, 1955; James & Brett, 1984; Judd & Kenny, 1981; Baron & Kenny, 1986 )
多水平结构方程模型
(MacKinnon, Fairchild,Fritz,2007)
• 最小方差二次无偏估计方法:
在无偏估计中,具有最小方差。
多水平结构方程模型
Estimators
• Muthén’s limited information estimator (MUML) – random
intercepts
– ESTIMATOR = MUML – Muthén’s limited information estimator for
unbalanced data – Maximum likelihood for balanced data
• Full-information maximum likelihood (FIML) – random intercepts and random slopes
多水平结构方程模型
Tests of Model Fit • MUML – chi-square, robust chi-square, CFI,
多水平结构方程模型
• 选用更为严格的显著性水平(即更小的α)
– 仍然有偏,没能校正观测独立性不成立带来的问题。
• 使用跨级相关系数ICC
– 并非最优,且没有考虑数据的层级结构关系。
• 将较低一层水平的分数合成在较高一层的水平上 进行数据分析
– 统计检验力下降; – 同样两个变量在较高水平和较低水平上的关系可能不同; – 数据间的变异不一定存在于较高水平; – 研究感兴趣的问题可能发生在较低水平而非较高水平。
多水平统计分析模型(混合效应模型)
多⽔平统计分析模型(混合效应模型)⼀、概述普通的线性回归只包含两项影响因素,即固定效应(fixed-effect)和噪声(noise)。
噪声是我们模型中没有考虑的随机因素。
⽽固定效应是那些可预测因素,⽽且能完整的划分总体。
例如模型中的性别变量,我们清楚只有两种性别,⽽且理解这种变量的变化对结果的影响。
那么为什么需要 Mixed-effect Model?因为有些现实的复杂数据是普通线性回归是处理不了的。
例如我们对⼀些⼈群进⾏重复测量,此时存在两种随机因素会影响模型,⼀种是对某个⼈重复测试⽽形成的随机噪声,另⼀种是因为⼈和⼈不同⽽形成的随机效应(random effect)。
如果将⼀个⼈的测量数据看作⼀个组,随机因素就包括了组内随机因素(noise)和组间随机因素(random effect)。
这种嵌套的随机因素结构违反了普通线性回归的假设条件。
你可能会把⼈员(组间的随机效应)看作是⼀种分类变量放到普通线性回归模型中,但这样作是得不偿失的。
有可能这个factor的level很多,可能会⽤去很多⾃由度。
更重要的是,这样作没什么意义。
因为⼈员ID和性别不⼀样,我们不清楚它的意义,⽽且它也不能完整的划分总体。
也就是说样本数据中的路⼈甲,路⼈⼄不能完全代表总体的⼈员ID。
因为它是随机的,我们并不关⼼它的作⽤,只是因为它会影响到模型,所以不得不考虑它。
因此对于随机效应我们只估计其⽅差,不估计其回归系数。
混合模型中包括了固定效应和随机效应,⽽随机效应有两种⽅式来影响模型,⼀种是对截距影响,⼀种是对某个固定效应的斜率影响。
前者称为 Random intercept model,后者称为Random Intercept and Slope Model。
Random intercept model的函数结构如下Yij = a0 + a1*Xij + bi + eija0: 固定截距a1: 固定斜率b: 随机效应(只影响截距)X: 固定效应e: 噪声混合线性模型有时⼜称为多⽔平线性模型或层次结构线性模型由两个部分来决定,固定效应部分+随机效应部分,⼆、R语⾔中的线性混合模型可⽤包1、nlme包这是⼀个⽐较成熟的R包,是R语⾔安装时默认的包,它除了可以分析分层的线性混合模型,也可以处理⾮线性模型。
多水平统计模型简介操作 PPT
水平 2 方差之与:
Var yij | 0 , 1, xij Var(u0 j e0ij )
2 u0
2 e0
• 同一个学校得两个学生(用 i1, i表2 示)间得
协方差为:
Cov u0 j ei1 j ,u0 j ei2 j
Cov u0 j , u0 j
2 u0
• 因此,同一学校三名学生得协差阵为
例如,来自同一家庭得子女,其生理与心理特征 较从一般总体中随机抽取得个体趋向于更为相似, 即子女特征在家庭中具有相似性或聚集性 (clustering),数据就是非独立得(non independent)。
忽略多水平层次结构得后果
1、模型中得参数估计值、标准误有偏差 2、残差方差偏大,即模型拟合优度差 3、损失高水平(如水平二:学校)对结果得影响信息
Cov u0 j ei1 j , u0 j ei2 j
Cov u0 j , u0 j
2 u0
组内相关(intra-class correlation, ICC)
2 u0
2
2
u0
e0
ICC测量了学校间方差占总方差得比例, 实际上它反映了学校内个体间相关,即水平 1 单位(学生)在水平 2 单位(学校)中得聚集性或 相似性。
第二层:0 j 00 u0 j
组内相关得度量
应变量方差为(可含固定效应协变量)
Var yij | 0 , 1, xij Var(u0 j eij )
Var(u0 j ) Var(eij ) Cov(u0 j , eij )
2 u0
2 e0
此即水平 2 与水平 1 方差之与。
同一学校中两学生(用i1,i2 表示)间得协方差为:
• SAS、SPSS默认采用REML
多水平统计模型
多水平统计模型
Harvey Goldstein, UK, University of London, Institute of Education 《Multilevel Models in Educational and Social Research》1987
多水平统计模型
经典方法框架下的分析策略
经典的线性模型只对某一层数据的问题进行 分析,而不能将涉及两层或多层数据的问题进行 综合分析。
但有时某个现象既受到水平1变量的影响, 又受到水平2变量的影响,还受到两个水平变量 的交互影响(cross-level interaction)。
多水平统计模型
个体的某事件既受到其自身特征的影响,也 受到其生活环境的影响,即既有个体效应,也有 环境或背景效应(context effect)。
多水平统计模型
层次结构数据为一种非独立数据,即某观察 值在观察单位间或同一观察单位的各次观察间不 独立或不完全独立,其大小常用组内相关(intraclass correlation,ICC)度量。
例如,来自同一家庭的子女,其生理和心理 特征较从一般总体中随机抽取的个体趋向于更为 相似,即子女特征在家庭中具有相似性或聚集性 (clustering),数据是非独立的(non independent)。
多水平统计模型
✓ ML3 (1994) / MLN (1996) / MLwiN (1999) ✓ HLM (Hierarchical Linear Model)
SAS (Mixed) SPSS STБайду номын сангаасTA
多水平统计模型简介SPSS操作课件.ppt
Multilevel Models
ko
1
Chongqing Medical University Peng Bin
单水平模型
1,2,...,i,...n个观察对象
yi 0 1xi ei ,
ei
~
N
(0,
2 e
)
模型假设: 正态性、独立性、残差方差齐同性 协变量的影响保持不变
• 多水平模型将单一的随机误差项分解到与数据 层次结构相应的各水平上,具有多个随机误差 项并估计相应的残差方差及协方差。
• 构建与数据层次结构相适应的复杂误差结构, 是多水平模型区别于经典模型的根本特征
• 多水平模型由固定与随机两部分构成,其随机
部分可以包含解释变量ko
8
多水平模型基本结构
假定一个两水平的层次结构数据,学校为水 平 2 单位,学生为水平 1 单位,学校为相应总体 的随机样本。
yij 0 1 j xij eij
截距不同,斜率不同
yij
ko
0 j 1 j xij eij11
Chongqing Medical University Peng Bin
按学校绘制散点图及拟合线
该模型即为多水平模型
yij 0 j 1 j xij eij
计值与总均数的离差值,反映了第 j 个学校对 y 的 随机效应。
ko
15
Chongqing Medical University Peng Bin
1 j 01 u1 j
01 表示协变量 x 在所有学校的平均效应估计
值(固定部分),u1 j 表示协变量 x 在不同学校所
产生的特殊效应(随机部分),反映协变量与学 校之间产生的交互效应,即学校间 y 的变异与协 变量 x 的变化有关。
优选多水平统计模型
常见的层次结构数据
✓ 多中心临床试验研究 ✓ 临床试验和动物实验的重复测量 ✓ 纵向观测 如儿童生长发育研究 ✓ 流行病学现场调查 如整群抽样调查 ✓ 遗传学 家系调查资料 ✓ meta 分析资料
经典线性模型只对某一层数据进行分析
层次结构数据:
✓ 可能同时受水平1和水平2变量的影响 ✓ 还受到两个水平变量的交互影响(cross-level
Subject specific effects of X on Pr(Death), OR = 20 per 1 unit increase in X Population average effect of X on Pr(Death), OR = 2.7 per 1 unit increase in X
2 u0
2 e0
同一医院所诊疗的三名患者的协差阵:
2 u0
2 e0
2 u0
2 u0
2 u0
2
2
u0
e0
2 u0
2 u0
2 u0
2
2
u0
e0
医院1:3名患者 , 医院2:2名患者
应变量向量 Y 总的协方差阵:
2 u0
2 e0
2 u0
2 u0
2 u0
2
2
u0
e0
2 u0
2 u0
多水平统计模型
多中心临床试验的多水平结构
中心(医院): 个体: 重复测量值:
3水平 2水平 1水平
医院1
医院2
个体1
个体2
个体1
个体2
个体3
重复测量1 重复测量2 重复测量1 重复测量2 重复测量1 重复测量2 重复测量1 重复测量2 重复测量2 重复测量1 重复测量2
多水平模型简介PPT精选文档
14
15
16
* 在模型中纳入水平1解释方差估计分别为0.002449和0.01518,
对应的Z检验统计量为1.65和2.30,prob(Z)分别为 0.0490和0.0108,说明这两个变量的回归系数是随机
系数。 *水平1随机斜率检验
5
*1)由于多水平模型同时考虑不同水平上的
差异,因此当数据水平结构较多时,多水平 模型结构较一般计量模型结构复杂;
*(2)需要较大的样本量才可以保证多水平
模型估计的稳定性,较小的样本会带来偏差
*多水平模型的局限性
6
*无条件两水平模型
首先建立无条件两水平模型,又称为截距模型(intercept-only model) 或空模型(empty model),是两水平模型建模的基础。其模型形式为:
*多水平模型简介
1
*社会科学研究中的一个基本概念是,社会是一个具有
分级结构的整体,社会的分级结构自然而然地使由其 所产生的数据呈现水平(层次)结构。在该类数据中, 低一水平(层次)的数据单位嵌套与或聚集在高一水 平(层次)的单位中。
*长期以来用以说明具有多种水平结构的数据的例子是
对学生学习成绩的研究。学生的学习状况不仅与个人 的内在因素(如智力水平)相联系,而且与其所处的 环境相联系,如学习风气、教师的教学经验、学校的 设施等。因此在对学习成绩与个体水平变量(如性别、 智力水平、种族等)关系的研究中,可将学生个体嵌 套在班级里,而将班级嵌套在学校里的形式进行数据
*空模型
12
结果表明:各村农户的人均收入增长率存在显著差异。组内相关 系数(ICC):
ICC=0.368表明结局测量中约有36.8%的总变 异是由村之间的差异造成的。
多水平统计模型(共108张PPT)
时间的变化;
1 此即水平 2 和水平 1 方差之和。
空模型的结果可以说明总结局测量变异中多大程度是由组内变异引起,多大程度是由组间变异引起。 (3) 第一水平模型纳入第一水平解释变量
随机系数模型
(Random Coefficient Model)
随机系数模型是指协变量的系数估计不是固 定的而是随机的,即协变量对反应变量的效应在
不同的水平 2 单位间是不同的。
仍以医院与患者两水平数据结构说明随机系 数模型基本结构与假设。
yij0j 1jxije0ij
与方差成份模型的区别在于 。 1 j
结构,可忽略医院的存在,即简化为传统的单
水平模型;反之,若存在非零的 略医院的存在。
,则不u20能忽
水平 2 单位中的水平 1 单位间存在相关,
通 常 的 “ 普 通 最 小 二 乘 法 ” (Ordinary Least Squares OLS)进行参数估计是不适宜的。
进一步,如数据具有三个水平的层次结 构,如医院、医生和患者三个水平,则将有 两个这样的相关系数,即医院内相关和医生 内相关。
多水平统计模型简介
A Brief Introduction to
Multilevel Statistical Models
概述 层次结构数据的普遍性 经典方法及其局限性 基本多水平模型 多水平模型的应用
概述
80 年代中后期,英、美等国教育统计学家开始探讨分析
层次结构数据(hierarchically structured data)的统计方法, 并相继提出不同的模型理论和算法。
多水平统计模型研究生版-PPT文档资料
多水平分析的概念为人们提供了这样一个框架,即 可将个体的结局联系到个体特征以及个体所在环境或背 景特征进行分析,从而实现研究的事物与其所在背景的 统一。
基本的多水平模型
经典模型的基本假定是单一水平和单一的随 机误差项,并假定随机误差项独立、服从方差为
常量的正态分布,代表不能用模型解释的残留的
随机成份。
MLwiN (2019)
SAS (Mixed) SPSS STATA
层次结构数据的普遍性
水平2
水平1
两水平层次结构数据
“水平” (level) :
指数据层次结构中的某一层次。例如,子女为低水平
即水平 1 ,家庭为高水平即水平 2 。
“单位” (unit) :
指数据层次结构中某水平上的一个实体。例
2 Var ( e ) E ( e ) 0 0 ij e 0 ij , 0
多水平统计模型简介
A Brief Introduction to Multilevel Statistical Models
概述 层次结构数据的普遍性 经典方法及其局限性 基本多水平模型 多水平模型的应用
多水平主成分分析 多水平因子分析 多水平判别分析 多水平logistic回归 多水平Cox模型 多水平Poisson回归 多水平时间序列分析 多元多水平模型 多水平结构方程模型
u 0 j 0 0 j
0 为平均截距,反映 y ij
与
x ij
的平均关系,
即当 x 取 0 时,所有 y 的总平均估计值。
u 0 j 为随机变量,表示第 j 个医院 y 之平均估
计值与总均数的离差值,反映了第 j 个医院对 y 的 随机效应。
3多水平统计模型简介
Cov e
, e0i2 j
组内相关(intra-class correlation, ICC)
2 u0
2 u0 2 e0
代表组间方差, 组水平方差。
代表组内方差, 个体水平方差
ICC测量了医院间方差占总方差的比例,实际上它反映 了医院内个体间相关,即水平 1 单位(患者)在水平 2 单位(医院)中的聚集性或相似性。 当组内各个体间趋于相互独立时,ICC 趋于0,表示没有 群组效应,此时多层模型可简化为固定效应模型。
项,并假定随机误差项独立、服从方差为常量的正态分布, 代表不能用模型解释的残留的随机成份。Y 0i 1i x1
当数据存在层次结构时,随机误差项则不满足独立
常方差的假定。模型的误差项不仅包含了模型不能解释的 应变量的残差成份,也包含了高水平单位自身对应变量的 效应成份。
多水平模型将单一的随机误差项分解到与数据层次结
2.随机系数模型(Random Coefficient Model)
随机系数模型是指协变量的系数估计不是固定的而是 随机的,即协变量对反应变量的效应在不同的水平 2 单位 间是不同的。(仍以医院与患者两水平数据结构说明随机系
数模型基本结构与假设。)
yij 0 j 1 j xij e0ij
1. 方差成份模型(多水平模型中最简单的)
(Variance Component Modelቤተ መጻሕፍቲ ባይዱ 1.1固定效应模型 1.2不含协变量的随机 效应方差成分模型(空 模型) 1.3含协变量的随机效 应方差成分模型
方差成分模型
1.1固定效应模型
某研究中有多个不同处理因素,若研究者感兴趣的各 种处理都设计在研究当中,则认为这一因素具有固定 效应,如以下例2. 1 中对小白鼠给予三种不同的营养 素.
多水平模型简介
hosp no time group age gender ess0 adl0 ess adl
1~15 1~456
1~3周
试验组=1,对照组=0 18~75岁 女性=0,男性=1 40~80 (评分高病情轻) 0~95 (评分高病情轻) 0~100 0~100
新药临床试验原始资料格式
疗后1周 疗后2周 疗后3周 疗前 医院 患者 组别 年龄 性别 编号 编号 ESS0 ADL0 ESS1 ADL1 ESS2 ADL2 ESS3 ADL3
资料特点
• 两水平层次结构
• 地区(水平2单位) 15 • 各地区内逐年重复观察(水平1单位) 1980
• 资料按性别、年龄分组 • 反应变量是肺癌死亡人数
定性反应变量的多水平模型
重点:二分类反应变量的两水平模型
调查研究
• 某省调查其农村居民的卫生服务
随机抽取30个乡镇,每个乡镇分别抽取2
个行政村,每个村再随机抽取33户(家庭),
2
0
•
1
为处理因素的效应参数,又称固定效应 (fixed effect)参数
• u0 j为 水 平 2 单 位 的 logit 均 值 0 j 与 总 均 值 0 之差,又称为随机效应(random effect) 或高水平的残差。
•
2 u0 j 的 方 差 u 又 称 为 随 机 参 数 ( random
1 1 1 1 1 1
1 2 3 4 5 6
1 0 1 1 0 1
60 43 61 71 71 67
0 1 1 1 1 1
69 50 73 50 86 90 85 100 76 75 82 75 84 100 90 100 40 30 42 35 55 35 72 45 78 80 90 95 92 100 93 100 72 75 75 75 82 - 82 - 80 80 93 85 100 95 - -
多水平统计模型 第3章
Chapter 3Extensions to the basic multilevel model 3.1 Complex variance structuresIn all the models of chapter 2 we have assumed that a single variance describes the random variation at level 1. At level 2 we have introduced a more complex variance structure, as shown in figure 2.7, by allowing regression coefficients to vary across level 2 units. The modelling and interpretation of this complex variation, however, was solely in terms of randomly varying coefficients. Now we look at how we can model the variation explicitly as a function of explanatory variables and how this can give substantively interesting interpretations. We shall consider mainly the level 1 variation, but the same principles apply to higher levels. We shall also in this chapter consider extensions of the basic model to include constraints on parameters, unit weighting, standard error estimation and aggregate level analyses.In the analysis of the JSP data in chapter 2 we saw that the level 1 residual variation appeared to decrease with increasing 8-year maths score. We also saw how the estimated individual school lines appeared to converge at high 8-year scores. We consider first the general problem of modelling the level 1 variation.Since we shall now consider several random variables at each level the notation used in chapter 2 needs to be extended. For a 2-level model we continue to use the notation u e j ij , for the total variation at levels 2 and 1 and we writeu u z e e z j hj hj ij hij hij h r h r ====∑∑, 012(3.1)where the z 's are explanatory variables. Normally z z j ij 00, refer to the constant (=1) defining a basic or intercept variance term at each level.For three level models we will use the notation v u e k kj ijk ,, where i refers to level 1 units, j to level 2 units, and k to level 3 units and h indexes the explanatory variables and their coefficients within each level..One simple model for the level 1 variation is to make it a linear function of a simple explanatory variable. Consider the following extension of (2.1)y x u e e z z x e e e e ij ij j ij ij ij ij ij ij e ij ij ij e =++++====ββσσ0101002101010()var(),var(),cov(),(3.2)so that the level 1 contribution to the overall variance is the linear function of z ijσσe e ij z 02012+This device of constraining a variance parameter to be zero in the presence of a non zero covariance is used to obtain the required variance structure. Thus it is only the specified functions of the random parameters in (3.2) which have an interpretation in terms of the level 1 variances of theresponses y ij . This will generally be the case where the coefficients are random at the same level at which the explanatory variables are defined. Thus for example, in the analyses of the JSP data in chapter 2, we could model the average school 8-year-score, which is a level-2 variable, as random at level 2. If the resulting variance and covariance are non-zero, the interpretation will be that the between-school variance is a quadratic function of the 8-year score namelyσσσu u j u j z z 02011222++where z j is the average 8-year score.Furthermore, we can allow a variance parameter to be negative, so long as the total level 1 variance remains positive within the range of the data In chapter 5 we discuss modelling the total level 1 variance as a nonlinear function of explanatory variables, for example as a negative exponential function which automatically constrains the variance to be positive.Where a coefficient is made random at a level higher than that at which the explanatory variable itself is defined, then the resulting variance (and covariance) can be interpreted as the between-higher-level unit variance of the within-unit relationship described by the coefficient. This is the interpretation, for example, of the random coefficient model of table 2.5 where the coefficient of the student 8-year score varies randomly across schools. In addition, of course, we have a complex variance (and covariance) structure at the higher level.The model (3.2) does not constrain the overall level 1 contribution to the variance in any way. In particular, it is quite possible for the level 1 variance and hence the total response variance to become negative. This is clearly inadmissible and will also lead to numerical estimation problems. To overcome this we can consider elaborating the model by adding a quadratic term, most simply by removing the zero constraint on the variance. In chapter 5 we consider the alternative of modelling the variance as a nonlinear function of explanatory variables.In table 3.0 we extend the model of table 2.5 to incorporate a such a quadratic function for the level 1 variance. If we attempt to fit a linear function we indeed find that a negative total variance is predicted.The results from model A show a significant complex level 1 variation (chi squared with 2 degrees of freedom = 123). Furthermore, the level 2 correlation between the intercept and slope is now reduced to -0.91 and with little change among the fixed part coefficients. The predicted level 1 standard deviation varies from about 9.0 at the lowest 8-year score value to about 1.9 at the highest, reflecting the impression from the scatterplot in figure 2.1.Table 3.1 JSP data with level 1 variance a quadratic function of 8-year score measured about the sample mean. Model A with original scale; models B and C with Normal score transform of 11-year score.Parameter Estimate(s.e.) Estimate (s.e.) Estimate (s.e.) A B CFixed:Constant 31.7 0.13 0.14 8-year score 0.58 (0.029) 0.097 (0.004) 0.096 (0.004) Gender (boys - girls) -0.35 (0.26) -0.04 (0.05) -0.03 (0.05) Social class (Non Man - Man) 0.74 (0.29) 0.16 (0.06) 0.16 (0.06) School mean 8-year score 0.02 (0.11) -0.008 (0.02) 8-yr score x school mean 8-yr score 0.02 (0.01) 0.0006 (0.02)Random:Level 2σu 022.84 (0.88) 0.084 (0.024) 0.086 (0.024)σu 01 -0.17 (0.07) -0.0024 (0.0015) -0.0030 (0.0015) σu 120.012 (0.007) 0.00018 (0.00016) 0.00021 (0.00016)Level 1σe 02 16.5 (1.02) 0.413 (0.029) 0.412 (0.022) σe 01-0.90 (0.02) -0.0032 (0.0017)σe 12 0.06 (0.02)0.0000093(0.00041)One of the reasons for the high negative correlation between the intercept and slope at the school level may be associated with the fact that the 11-year score has a 'ceiling' with a third of the students having scores of 35 or more out of 40. A standard procedure for dealing with such skewed distributions is to transform the data, for example to normality, and this is most conveniently done by computing Normal scores; that is by assigning Normal order statistics to the ranked scores. The results from this analysis are given under model B in table 3.1. Note that the scale has changed since the response is now a standard normal variable with zero mean and unit standard deviation. We now find that there is no longer any appreciable complex variation at level 1; the chi squared test yields a value of 3.4 on 2 degrees of freedom. Nor is there any effect of the compositional variable of mean school 8-year score; the chi squared test for the two fixed coefficients associated with this give a value of 0.2 on 2 degrees of freedom. The reduced model is fitted as C. The parameters associated with the random slope at level 2 remain significant (χ22=7.7, P=0.02) and the level 2 correlation is further reduced to -0.71. Figure 3.1 shows the level 1 standardised residuals plotted against the predicted values from which it is clear that now the variance is much more nearly constant. This example demonstrates that interpretations may be sensitive to the scale on which variables are measured. It is typical of many measurements in the social sciences that their scales are arbitrary and we can justify nonlinear, but monotone, order preserving, transformations if they help to simplify the statistical model and the interpretation.Predicted valuelevel 1residual -5-4-3-2-101234-3-2-112Figure 3.1 Level 1 standardised residuals by predicted values for analysis C in table 3.1We are not limited to making the variance a function of a single explanatory variable, and we can consider general functions of these combined. Some may be absent from the fixed part of the model, or equivalently have their fixed coefficients constrained to zero. A traditional, single level, exampleis 'regression through the origin' in which the fixed intercept term is zero while a level 1 variance associated with the intercept is fitted.We can consider any particular function of explanatory variables as the basis for modelling thevariance. One possibility is to take the fixed part predicted value yij and define the level 1 random term as e yij ij 1 , assuming the predicted value is positive, so that the level 1 variance becomes σe ij y 12 , that is proportional to the predicted value; often known as a 'constant coefficient of variation' model. Other functions are clearly possible, and as we shall see in chapter 7 often there are naturalchoices associated with distributional assumptions made about the responses.3.1.1 Variances for subgroups defined at level 1A common example of complex variation at level 1 is where variances are specific for subgroups. For example, for many measurements there are gender or social class differences in the level 1 variation. A straightforward way to model this situation in the case of a single such grouping is by defining the following version of (3.2) for a model with different variances for children with manual and with non-manual social class backgrounds.y x u e z e z z z e e e e ij ij j ij ij ij ij ij ijij e ij e ij ij =++++=====ββσσ0102233232223322310010()var()var()cov(,) for manual , for non -manual for manual , for non -manual, ,(3.3)Table 3.2 JSP data with normal score of 11-year maths as response. Subscript 1 refers to 8-year maths score, 2 to manual group, 3 to non manual group and 4 to boys.Parameter Estimate (s.e.) Estimate (s.e.) Estimate (s.e.) A B CFixedConstant 0.13 0.13 0.13 8-year score 0.096 (0.004) 0.096 (0.004) 0.096 (0.004) Gender (boys-girls) -0.03 (0.05) -0.03 (0.05) -0.03 (0.05) Social Class (Non Man - Man) 0.16 (0.05) 0.16 (0.05) 0.16 (0.05)Randomlevel 2σu 020.086 (0.025) 0.086 (0.025) 0.086 (0.024) σu 01-0.0029 (0.0015) -0.0029 (0.0015) -0.0028 (0.0015) σu 120.00018 (0.00015) 0.00018 (0.00015) 0.00018 (0.00015)level 1σe 020.37 (0.04) 0.36 (0.04) σe 020.03 (0.02) 0.03 (0.02) σe 22 0.43 (0.03) σe 32 0.37 (0.04)σe 040.004 (0.02)-2 (log likelihood)1491.81491.81491.7If we do this for model C in table 3.1 then we obtain the estimates in column A of table 3.2. The estimates of the fixed parameters have changed little and the level 2 parameters are also similar. At level 1 the variance for the manual students is higher than that for the non manual students, but not significantly so since the likelihood ratio test statistic, formed by differencing the values of (-2 log likelihood) for the model with a single level 1 variance (1493.7) and that given in analysis A of table 3.2, gives a chi-squared test statistic of 1.9 on 1 degree of freedom.We now look at an alternative method for specifying this type of complex variation at level 1 which has certain advantages. We now writey x u e z z e e e e ij ij j ij ij ij ij e ij ij ij e =+++====ββσσ01022200220202100()var(),var()cov(,) for manual , for non -manual,and the level 1 variance is given by σσe e ij z 020222+ because we have constrained the variance of the manual coefficient to be zero. Thus, for manual children (z ij 21=) the level 1 variance is σσe eo 0222+ and for non manual children the level 1 variance is σe 02. The second column in table 3.2 gives the results from this formulation and we see that, as expected, the covariance estimate is equal to half the difference between the separate variance estimates in the first column.Suppose now that we wish to model the level 1 variance as a function both of social class group and gender. One possibility is to fit a separate variance for each of the 4 possible resulting groups, usingeither of the above procedures. Another possibility is to consider a more parsimonious 'additive' model for the variances as followse e e z e z z e e e e e ij ij ij ij ij ij ij ij e ij ij e ij ij e =++====022444002020204041 if a boy , 0 if a girl, var(),cov()cov()σσσ(3.4)with the remaining two variances and covariance equal to zero. Thus (3.4) implies that the level 1 variance for a manual boy is σσσe e e 02020422++ etc. The third column of table 3.2 gives the estimates for this model and we see that there is a negligible difference in the level 1 variance for boys and girls.We can extend such structuring to the case of multicategory variables and we can also include continuous variables as in table 3.1. Suppose we had a 3 category variable: we define two dummy variables, say z z ij ij 56, corresponding to the second and third categories, just as if we were fitting the factor in the fixed part of the model. With z ij 1 representing the continuous variable an additive model for the level 1 random variation can be written ase e e z e z e e e e e e e e ij ij ij ij ij ijij e ij e ij ij e ij ij e ij ij e =++=====0556*******010*********var(),var(),cov()cov(),cov()σσσσσThis model can be elaborated by including one or both the covariances between the dummy variable coefficients and the continuous variable coefficient, namely σσe e 1516, . These covariances are analogous to interaction terms in the fixed part of the model and we see that, starting with an additive model, we can build up models of increasing complexity. The only restriction is that we cannot fit covariances between the dummy variable categories for a single explanatory variable. Thus if social class had three categories, we could fit two covariances corresponding to, say, categories 2 and 3 but not a covariance between these categories.Residuals can be estimated in a straightforward manner for these complex variation models. Forexample, from (3.4) the estimated residual for a manual boy is ee e ij ij ij 024++ where the estimates of the individual residuals are computed using the formulae in appendix 2.2 with the appropriate zero variances.3.1.2 Variance as a function of predicted valueThe level 1 variance can be modelled as a function of any combination of explanatory variables and in particular we can incorporate the estimated coefficients themselves in such functions. A usefulspecial case is where the function is the fixed part predicted value yij . Thus (3.2) becomes y x u e e yij ij j ij ij ij =++++ββ01001( )with level 1 variance given by σσσe e ij e ij y y 02011222++ . A special case of this model is the so called'constant coefficient of variation model' where the two variance terms are constrained to zero. The estimation of the random parameters is straightforward: at each iteration of the algorithm a new set of predicted values are calculated and used as the level 1 explanatory variable.Fixed A B Constant0.130.14Reading score0.50 (0.03) 0.49 (0.03) Gender (boys - girls)-0.19 (0.06) -0.22 (0.06) Social class (Non Man. - Man.) -0.07 (0.06) -0.06 (0.06)Random Level 2:σu 02 0.03 (0.02) 0.02 (0.01) level 1:σe 02 0.66 (0.04) 0.63 (0.04) σe 010.16 (0.04) σe 120.11 (0.09)-2 log(likelihood)1929.51905.0Table 3.3 illustrates the use of this model where the level 1 variance shows a strong dependence on the predicted value. The data are the General Certificate of Secondary Examination (GCSE ) scores at the age of 16 years of the Junior School Project students. This score is derived by assigning values to the grades achieved in each subject examination and summing these to produce a total score (See Nuttall et al, 1989 for a detailed description). There are 785 students in this analysis in 116 secondary schools to which they transferred at the age of 11 years. The students have a measure of reading achievement, the London Reading Test (LRT) taken at the end of their junior school and this is used as a pretest baseline measure against which relative progress is judged. Both the reading test score and the examination score have been transformed to Normal equivalent deviates. Analysis A is a variance components analysis and figure 3.2 shows a plot of the standardised level 1 residuals against the predicted values. It is clear that the variation is much smaller for low predicted values.One possible extension of the model to deal with this is the use the LRT score as an explanatory variable at level 1, so that the level 1 variance becomes a quadratic function of LRT score. This does not, however, entirely eliminate the relationship and instead we model the predicted value as a level 1 explanatory variable, and the results are presented as analysis B of Table 3.3. If we now plot the standardised residuals associated with the intercept against the predicted values we obtain the pattern in figure 3.3 from which it is clear that much of the relationship between the variance and the predicted value has been accounted for. We could go on to fit more complex functions of the predicted value, for example involving nonlinear or higher order polynomial terms.Table 3.3 GCSE scores related to secondary school intake achievement.Predicted valuelevel 1residual -5-4-3-2-101234-1.5-1-0.50.511.52Figure 3.2 Standardised residuals for variance components analysis.3.1.3 Variances for subgroups defined at higher levelsThe random slopes model in table 3.1 has already introduced complex variation at level 2 when the coefficient of a level 1 explanatory variable is allowed to vary across level 2 units. Just as with level 1 complex variation, we can also allow coefficients of variables defined at level 2 to vary at level 2. Exactly the same considerations apply for categorical level 2 variables as we had for such variables at level 1 and complex additive or interactive structures can be defined.Predicted valuelevel 1residual -5-4-3-2-101234-2-112Figure 3.3 Standardised residuals with level 1 variance a function of predicted value.In addition, the coefficient of a level 2 variable can vary randomly at either level 1 or level 2 or both. For example, suppose we have three types of school; all boys schools, all girls schools and mixed schools. We can allow different variances, at level 2, between boys schools, between girls schools and between mixed schools. We can also allow different between-student variances for each type of school.To further illustrate complex level 1 variation and also to introduce a three level model we turn to another data set, this time from a survey of social attitudes.3.2 A 3-level complex variation model.The longitudinal or panel data come from the British Social Attitudes Survey and cover the years 1983 - 1986 with a random sample of 264 adults measured a year apart on four occasions and living at the same address. This panel was a subsample of a larger series of cross sectional surveys. The final sample was intended to be self weighting with each household as represented by a single person having the same inclusion probability. A full technical account of the sampling procedures is given by McGrath and Waterton (1986). The sampling procedure was at the first stage to sample parliamentary constituencies with probability proportional to size of electorate, then to sample a single 'polling district' within each constituency in a similar way and finally to sample an equal number of addresses within each polling district.Because only one polling district was sampled from each constituency, we cannot separate the between-district from the between-constituency variation; the two are 'confounded'. Likewise we cannot separate the between-individuals from the between-households variation. The basic variation is therefore at two levels, between-districts (constituencies) and between-individuals (households). The longitudinal structure of the data, with four occasions, introduces a further level below these two, namely a between-occasion-within-individual level, so that occasion is level 1, individual is level 2 and district is level 3. In chapter 5 we shall study longitudinal data structures in more depth, both at level 1 and higher levels.The response variable we shall use is a scale, in the range 0 - 7, concerned with attitudes to abortion. It is derived by summing the (0,1) responses to seven questions and can be interpreted as indicating whether the respondent supported or opposed a woman's right to abortion with high scores indicating strong support. Explanatory variables are political party allegiance (4 categories), self-assessed social class (3 categories), gender, age (continuous), and religion (4 categories) and year (4 categories). A number of preliminary analyses have been carried out and the effects of party allegiance, social class, gender, and age, were found to be small and not statistically significant. We therefore examine the basic 3-level model which can be written as follows.y x x x x x x v u e ijk ijk ijk ijk ijk ijkijk k jk ijk =+++++++++βββββββ0112233445566()()()(3.5)with the explanatory variables with subscripts 1-3 being dummy variables for religious categories2-4 and those with subscripts 4-6 being dummy variables for years 1984-1986. We have three variances, one at each level in the random part of the model. The response variable in the following analyses has only 8 categories, with 32% of the sample having the highest value of 7. The response has been transformed by assigning Normal scores to the overall distribution and we shall treat the response as if it was continuously distributed. In chapter 7 we shall look at other models which retain the categorisation of the response variable.Table 3.4 Repeated measurements of Attitudes to Abortion. Response is Normal score transformation. Religion estimates are contrasted with none. Age is measured about the mean of 37 years.Parameter Estimate(S.E.) Estimate (S.E.) Estimate (S.E.)A B C Fixed: Constant0.320.330.33Religion: R. Catholic -0.80(0.18) -0.80(0.18) -0.69(0.18) Protestant -0.27(0.10) -0.26(0.10) -0.25(0.10) Other -0.63(0.13) -0.63(0.13) -0.54(0.14)Year: 1984 -0.29(0.05) -0.29(0.48) -0.29(0.05) 1985 -0.06(0.05) -0.07(0.05) -0.07(0.05) 1986 0.06(0.05) 0.05(0.04) 0.05(0.04)Age 0.013(0.005) Age x R. Catholic -0.036(0.010) Age x Protestant -0.014(0.007) Age x Other -0.023(0.008)Random: Level 3σv 2 0.03(0.02) 0.03(0.02) 0.03(0.02) Level 2σu 2 0.37(0.04)0.34(0.04) Level 1σe 02 0.31(0.02) 0.21(0.08) 0.21(0.03) σe 01 0.11(0.05) 0.10(0.04) σe 02 0.03(0.16) 0.03(0.02) σe 03 0.04(0.02) 0.04(0.02) σe 04 0.05(0.02) 0.05(0.02) σe 05 0.05(0.02) 0.05(0.02) σe 06 0.00(0.02) 0.00(0.02)-2 (log likelihood)2233.52214.22198.7Table 3.4 gives the results of fitting (3.5). The between-occasion and between-individual variances are similar. The level 3 variance is small, and the likelihood ratio chi-squared is 2.05 (compared with a value of 1.64 obtained from comparing the estimate with its standard error), which is not significant at the 10% level.For the religious differences we have χ32337=. for the overall test with all those having religious beliefs being less inclined to support abortion, the Roman Catholic and other religions being least likely of all. The Roman Catholic and other religions are significantly less likely than the Protestants to support abortion. The simultaneous test (3 d.f.) chi-squared statistics respectively are 9.7 and 9.0 (P=0.03). For the year differences we have χ32597=. and simultaneous comparisons show that in 1984 there was a substantially less approving attitude towards abortion. It is likely thatthis is an artefact of the way questions were put to respondents.1 No significant interaction exists between religion and year.We now look at elaborating the random structure of the model. At level 1 we fit an additive model as in section 3.1.1 for the categories of religion and for year. Year is the variable defining level 1, but religion is defined at level 2 and is an example of a higher level variable used to define complex variation at a lower level.The results are given as analysis B in table 3.3. For year we obtain χ3283=. (P=0.04) and forreligion χ32=11.0 (P=0.01). There is a greater heterogeneity within the Roman Catholics, from year to year, and within the other religions than within Protestants and those with no religion. The addition of these variances to the model does not change substantially the values for the other parameters.Fitting complex variation at level 2 (between individuals) and level 3 (between districts) does not yield statistically significant effects, although there is some suggestion that there may be more variation among Roman Catholics.For the final analysis we look again at the fixed part and explore interactions. None of the interactions have important effects except for that of age with religion, although age on its own had a negligible effect. We see from analysis C that those with no religion show an increasing approval of abortion with age, whereas the Roman Catholics and to a smaller extent other religions show a decreasing approval with age. The overall chi-squared for testing the interactions is 16.1 with 3 degrees of freedom.3.3 Parameter ConstraintsIn the example of the previous section some of the fixed and random parameters for year and religious groups were similar. This suggests that we could fit a simpler model by forcing or 'constraining' such parameters to take the same values and also so decreasing the standard errors in the model. We illustrate the procedure using the fixed part estimates for the abortion attitudes data. We consider the general linear constraint for the fixed parameters in the form C k β=, where C is a (n x p ) constraint matrix and k is a vector which can have quite general values for their elements. Suppose that, in analysis C of table 3.4, we wished to constrain the main effects and interaction terms of the Roman Catholic and Other religions to be equal. This implies two constraint functions, and we haveC k =--⎛⎝ ⎫⎭⎪=⎛⎝ ⎫⎭⎪010*********000000010100which implies ββββ13810==, . The constrained estimator of β is1In 1984 seven questions making up the attitude scale were put to respondents in the reverse order, that is with the most 'acceptable' reasons for having an abortion (e.g. as a results of rape) coming first. This illustrates an important issue in surveys of all kinds which collect data for comparisons over time, namely to maintain the same questioning procedure.()( )( )βββc T T T LC C LC C k L X V X =--=---111 (3.6)where βis the unconstrained estimator. The covariance matrix of the constrained estimator is MLM whereM I LC C LC C T T =--()1.There is an analogous formula for constrained random parameters.Using the above constraints for analysis C in Table 3.4, the random parameters are little changed, the main effects for Roman Catholic and Other religion become -0.57 and the interaction terms become -0.026 and the remaining main effects are virtually unaltered. The standard errors, as expected, are smaller being 0.121 for the main effect estimate and 0.007 for the interaction.In addition to linear constraints we can also apply nonlinear constraints. To illustrate the procedure we consider the analysis in table 2.5, where the estimated correlation between the slope and intercept was -1.03. To constrain this to be exactly -1.0, after each iteration of the algorithm we compute the covariance as a function of the variances to give this correlation. Thus, after iteration twe compute σσσu tu t u t 01101+= and then constrain the covariance to be equal to this value, a linearconstraint, for iteration t+1. This procedure is repeated until convergence is obtained for the unconstrained values. For more general nonlinear constraints we may require several such constraints to apply simultaneously.If we constrain the model of Table 2.5 to give a correlation of -1.0 we find that the fixed effects and the level 1 variance are altered only slightly, with a small reduction in standard errors. The level 2 parameters, however, are reduced by about 50% and are closer to those in analysis A of Table 3.1 where the estimated correlation is -0.91.We can also temporarily constrain values during the iterative estimation procedure if convergence is difficult or slow. Some parameters, or functions of them, can be held at current values, other parameter values allowed to converge and the constrained parameters subsequently unconstrained.3.4 Weighting unitsIt is common in sample surveys to select level 1 units, for example household members, so that each unit in the population has the same probability of selection. Such self-weighting samples can then be modelled using any of the multilevel models of this book. Likewise, if the model correctly specifies the population structure, non-self weighting samples can be modelled similarly: the differential selection probabilities contain no extra information for the model parameters. If we wished to form predictions for the whole population on the basis of the model estimates, we could cobine weights from each level of the data hierarchy (typically inverses of selection probabilities) into composite level 1 weights and apply these to the predicted values for each level 1 unit and then form a weighted sum over these units.In general we can carry out the following procedure for assigning weights. Two cases need to be distinguished. In the first the weights are independent of the random effects at the level. In this case we adopt the following procedure.Consider the case of a 2 level model. Denote by w j the weight attached to the j -th level 2 unit and by w i j | the weight attached to the i -th level 1 unit within the j -th level 2 unit such that。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
的一个实体。例如,每个子女是一个水平 1 单位,
每个家庭是一个水平 2 单位。
临床试验和动物实验的重复测量 多中心临床试验研究 纵向观测如儿童生长发育研究 流行病学现场调查如整群抽样调查 遗传学家系调查资料 meta 分析资料
层次结构数据为一种非独立数据,即某观察 值在观察单位间或同一观察单位的各次观察间不 独立或不完全独立,其大小常用组内相关 (intraclass correlation,ICC)度量。 例如,来自同一家庭的子女,其生理和心理 特征较从一般总体中随机抽取的个体趋向于更为 相似,即子女特征在家庭中具有相似性或聚集性 (clustering),数据是非独立的(non independent)。
当数据存在层次结构时,随机误差项则不满足 独立常方差的假定。模型的误差项不仅包含了模型 不能解释的应变量的残差成份,也包含了高水平单 位自身对应变量的效应成份。
多水平模型将单一的随机误差项分解到与数
据层次结构相应的各水平上,具有多个随机误差 项并估计相应的残差方差及协方差。构建与数据 层次结构相适应的复杂误差结构,这是多水平模 型区别于经典模型的根本特征。
分解(disaggregation) 聚合(aggregation)
分解:不满足模型独立性假定,回归系数及 其标准误的估计无效,且未能有效区分个体效应 与背景效应。另一种分析策略是用哑变量拟合高 水平单位的固定效应。 聚合:损失大量水平1单位的信息,更严重的 是可能导致“生态学谬误”(ecological fallacy)。
多水平分析的概念为人们提供了这样一个框架,即 可将个体的结局联系到个体特征以及个体所在环境或背
景特征进行分析,从而实现研究的事物与其所在背景的
统一。
基本的多水平模型
经典模型的基本假定是单一水平和单一的随 机误差项,并假定随机误差项独立、服从方差为
常量的正态分布,代表不能用模型解释的残留的
随机成份。
ML3 (1994) / MLN (1996) / MLwiN (1999)
HLM (Hierarchical Linear Model)
SAS (Mixed)
SPSS STATA
层次结构数据的普遍性
水平2
ห้องสมุดไป่ตู้
水平1
两水平层次结构数据
“水平” (level) :指数据层次结构中的某一层 次。例如,子女为低水平即水平 1 ,家庭为高水 平即水平 2 。 “单位” (unit) :指数据层次结构中某水平上
多水平模型由固定与随机两部分构成,与 一般的混合效应模型的不同之处在于,其随机部 分可以包含解释变量,故又称为随机系数模型 (random coefficient model) ,其组内相关也可为 解释变量的函数。换言之,多水平模型可对不同 水平上的误差方差进行深入和精细的分析。
1. 方差成份模型
(Variance Component Model)
假定一个两水平的层次结构数据,医院为水平
非独立数据不满足经典方法的独立性条件, 采用经典方法可能失去参数估计的有效性并导致 不合理的推断结论。 但非独立数据的组内相关结构各异,理论上, 不同的结构应采用相应的统计方法。如纵向观测 数据常用广义估计方程 (GEE) ,但有两个局限性: 一是对误差方差的分解仅局限于 2 水平的情形, 二是没有考虑解释变量对误差方差的影响。当应 变量的协差阵为分块对角阵时,一般采用多水平 模型。
多水平模型 (multilevel models) 最先应用于教育 学领域,后用于心理学、社会学、经济学、组织行 为与管理科学等领域,逐步应用到医学及公共卫生 等领域。
Harvey Goldstein, UK, University of London, Institute of Education 《Multilevel Models in Educational and Social Research》1987
经典方法框架下的分析策略
经典的线性模型只对某一层数据的问题进 行分析,而不能将涉及两层或多层数据的问题进 行综合分析。
但有时某个现象既受到水平1变量的影响, 又受到水平 2 变量的影响,还受到两个水平变量 的交互影响(cross-level interaction)。
个体的某事件既受到其自身特征的影响,也 受到其生活环境的影响,即既有个体效应,也有 环境或背景效应(context effect)。 例如,个体发生某种牙病的危险可能与个体 的遗传倾向、个体所属的社会阶层 ( 如饮食文化和 口腔卫生习惯 ) 、环境因素 ( 如饮水中氟浓度 ) 等有 关。
多水平统计模型简介
A Brief Introduction to Multilevel Statistical Models
概述 层次结构数据的普遍性 经典方法及其局限性 基本多水平模型 多水平模型的应用
概
述
80 年代中后期,英、美等国教育统计学家开始 探 讨 分 析 层 次 结 构 数 据 (hierarchically structured data) 的统计方法,并相继提出不同的模型理论和算 法。
Nicholas
Longford,
Princeton
University,
Education Testing Service
《Random Coefficient Models》1993
多水平主成分分析 多水平因子分析 多水平判别分析 多水平logistic回归 多水平Cox模型 多水平Poisson回归 多水平时间序列分析 多元多水平模型 多水平结构方程模型
Anthony Bryk, University of Chicago
Stephen Raudenbush, Michigan State University , Department of Educational Psychology
《Hierarchical Linear Models : Applications and Data Analysis Methods》1992