【原创】线性混合效应模型Linear Mixed-Effects Models的部分折叠Gibbs采样数据分析报告(含代码数据)
混合线性效应模型
• (3)一阶自回归结构(AR(1)),协方差矩阵中 含2个参数; • (4)循环相关结构(Toeplitz),协方差矩阵中含 有t个参数(t为矩阵维数); • (5)带状主对角结构(UN(1)),协方差矩阵中含t个 参数; • (6)空间幂相关结构(SP(POW)),协方差 矩阵中含有2个参数; • (7)独立结构(UN),又称无结构协方阵。
配合混合线性模型的步骤如下:
小结
• 混合线性模型保留了一般线性模型的Y具有正态 性假定条件,但放弃了独立性和方差齐性的假定。
SAS 程序
• /*程序1:建立例题1数据集,配合一般线性和混合效 应线性模型*/ • Data aaa; • Input student gender $ area $ scores @@;datalines; • 1 m A 56.3 2 F A 84.2 • 3 m A 56.8 4 m A 87.4 • 5 m B 70.1 6 F B 69.8 • 31 m A 78.5 •; • /*fixed-effects model with GLM procedure*/
• 该资料也可以看成是一个3水平资料。第一水平位 各时间点的测量值,第二水平位病人,第三水平 为手术方案。 • 把时间作为第一水平(测量值水平)上的协变量, 在第二水平(病人水平)上有2个协变量:年龄及 术后保留肝容积。手术前白蛋白含量也可作为协 变量处理。 • 在第三水平(手术方案水平)上无协变量。
/*程序2:建立例2资料的SAS数据集及配合混合效 应线性模型*/ Data pad; Input pnt plan $ age h_v pad0 pad2 pad10 pad20@@; Cards; 1 a 30 300 205 129 117 103 40 2 a 43 580 77 171 220 159 105 3 a 47 704 245 172 177 186 145 27 b 59 850 200 230 250 240 208;
专业术语
现代回归分析专业词汇、术语中英文对照第一章Linear regression model/analysis 线性回归模型/分析Response variable 因变量、解释变量、反应变量、响应变量Predictor variable 自变量、回归变量、预测变量Scatterplot 散点图Scatterplots matrix 散点图矩阵Data 数据Summary graph 描述图、概述图、摘要图The horizontal axis 水平轴、横轴、x轴the vertical axis 垂直轴、纵轴、y轴Separated point 离群点Leverage point 杠杆点Outlier 异常点Base 10 common logarithm 以10为底的常用对数Base-2 logrithm 以2为底的对数Natural logarithm 自然对数Cross-sectional data 截面数据Time series data 时间序列数据Longitudinal data 纵向数据;即panel data 面板数据Mean f unction 均值函数Population mean 总体均值(期望)Nonparam etric estimate 非参数估计nonparametric regression 非参数回归Smoother 光滑Transf ormation 变换Variance f unction 方差函数Summary statistics 描述统计量Smooth curve 光滑曲线Quadratic polynomial 二次多项式Null plotMarginal relationship 边际关系joint relationship 联合关系第二章Simple linear regression 简单线性回归、一元线性回归Parameter 参数Intercept 截距Slope 斜率Expected value 期望值expectation 期望Statistical error 统计误差Random variable 随机变量Assumption 假设Independent 独立Normally distributed 服从正态分布normality 正态性Poisson distribution 泊松分布Binomial distribution 二项分布Ordinary least squares estimation /OLS 普通最小二乘估计The residual sum of squares /RSS 残差平方和Residual 残差f itted value 拟合值predict value 预测值Regression coeff i cients 回归系数Cross-product 叉乘、交叉乘积Corrected sum of squares 修正平方和uncorrected sum of squares 未修正的平方和Corrected cross products 修正的叉积uncorrected cross product 未修正的叉积sample average 样本均值population average 总体均值sample variance 样本方差population variance 总体方差sample covariance (matrix) 样本协方差(矩阵)sample correl ation (matrix) 样本相关系数(矩阵)population correlation 总体相关系数inconsistency 不一致性、非一致性sample 样本sampling 抽样asymmetry 不对称dashed line 虚线solid line实线rounding 舍入、舍去estimated equation 估计出来的方程unbiased estimate 无偏估计degrees of freedom /df自由度residual degrees of freedom 残差自由度mean square 均方和residual mean square 残差均方和square root 平方根standard error 标准误差standard deviation 标准差residual standard error 残差标准误差即regression standard error 回归标准误差normal distribution 正态分布chi-squared distribution 卡方分布t distribution t分布 F distribution F分布linear combination 线性组合Gauss-Markov theorem 高斯-马尔科夫定理Best linear unbiased estimate /BLUE 最佳线性无偏估计Linearity 线性normality 正态性nonlinearity 非线性independently and identically distributed /i.i.d. 独立同分布estimated variance 方差的估计analysis of vari ance /ANOVA 方差分析F-test F检验t-test t检验regression sum of squares /SSreg 回归平方和regression df / model df回归自由度、又称为模型自由度null hypothesis 原假设alternative hypothesis 备择假设sample size 样本量numerator 分子denominator 分母signif icance level 显著性水平p value p值percentage point 分位点percentile 分位数critical value 临界值power of test 检验的势、检验的功效coeff i cient of det ermination 可决系数、判决系数scale-free 无计量单位的、无量纲的conf idence interval 置信区间conf idence level 置信水平hypothesis test 假设检验two-sided test 双边检验one-sided test 单边检验prediction 预测point prediction 点预测predictive interval 预测区间standard error of prediction 预测标准误差joint conf idence region 联合置信域residuals plot 残差图第三章Multiple linear regression model 多元线性回归Fertility 出生率Per person gross domestic product 人均国内生产总值(GDP)Urbanization 城市化Perspective 透视法motion 运动、动画3-dinensional plot 三维图Added-variable plot 附加(增加)变量图Linear f unction of the paramet ers 参数的线性函数Hyperplane 超平面Term 项predictor 自变量Continuous variable 连续变量Discrete variable 离散变量Categorical vari able /data 分类变量/数据Transf ormations of predictors 自变量的变换Interaction term 交叉项、交互作用项Dummy variable 虚拟变量Factor 因子Sample correlation matrix 样本相关矩阵Random vector 随机向量Variance-covariance matrix 方差和协方差矩阵Identity matrix 单位矩阵rounding error 舍入误差matrix decomposition 矩阵分解Q R f actori zation Q R分解total sum of squares 总离差平方和regression sum of squares 回归平方和several orders of magnitude 几个数量级main diagonal elements 主对角线元素different nested sets 不同的嵌套集overall analysis of variance 总方差分析computed signif icance level (实际)计算的显著性水平,即p值multiple correlation coeff icient 复相关系数subtract … f rom … 从… 减去…partial F-test 偏F检验sequential analysis of vari ance 序贯方差分析、依次方差分析第四章Fitting model 拟合模型Units of regression parameters /coeff ici ents 回归参数/系数的单位Rate of change 变化率Signs of parameter estimates 参数估计的符号Straight line直线Weak signif icance 弱显著性,即该自变量对因变量的效应很弱或不显著Quadratic time trend 二次时间趋势Rank def i cient 不满秩over-paramet erized 过度参数化,即模型中参数太多,不能由数据估计nonlinear combination 非线性组合linearly independent 线性独立(无关) linearly dependent 线性相关rank of matrix 矩阵的秩f ull rank 满秩computer package 计算机软件包one-way design 单因素设计、单向设计the column of ones 分量全都是1的列向量place a constraint 施加一个约束the order of f itting 拟合的阶test of signif icance 显著性检验repeated experimentation 反复试验dropping variable /term 遗漏变量/项、删除变量conditional expectation 条件期望rectangl e 矩形quadratic f unction 二次函数the f irst derivative of mean f unction with respect to x 均值函数的关于x的一阶导数the slope of the tangent 切线的斜率differentiating the second equation 对第二个方程求微分experimental data 实验数据、试验数据experimental predictors 试验自变量observational data 观测数据observational predictors 观测自变量fertilizer 肥料、化肥the spacing of plants 作物的间距irrigation 灌溉plots of land 地块charact eristics of the plots 地块的特征drainage 排水exposure 光照、日照时间soil f ertility 土壤肥力weather variable 天气变量yields 产量causation 因果关系association 伴随关系、联系feedlot 养殖场plausible 看似合理的outweigh 大于、超过lurking variable 遗漏变量、潜在变量random assignment 随机分配normal population 正态总体multivariate normal distribution 多元正态分布conditional distribution 条件分布random sampling 随机抽样population multiple correlation coeff i cient 总体复相关系数bivariate normal distribution 二元正态分布nonlinear regression 非线性回归regression through the origin 过原点的回归missing data缺失数据incomplete data 不完全数据、即缺失数据missing at random /MAR 随机缺失censored response 因变量删失cross-cultural demographic study 跨文化的人口研究less-developed count ry 不发达国家EM algorithm EM算法Joint distribution 联合分布multiple imputation 多重插值computationally intensive method 计算量大的方法computer simulation 计算机模拟percentile-based 基于分位数的median 中位数sampling with replacem ent 重复抽样、有放回的抽样bootstrap method bootstrap方法、自助法bootstrap sample bootstrap样本large sample size 大样本skewed distribution 偏态分布、有偏分布bootstrap bias bootstrap偏差bootstrap interval bootstrap区间normal interval 正态区间measured with error 有误差的测量、测量误差capture–recapture experiment 捕获再捕获试验第五章Lack of f it test 拟合失真检验、拟合不足检验、失拟检验Weighted least squares /WLS 加权最小二乘Constant variance 常方差homoscedasticity 同方差、同方差性Nonconstant variance 异方差heteroscedasticity 异方差heteroscedasticity Weighted residual sum of squares 加权残差平方和Identity matrix 单位矩阵Mixed effect model 混合效应模型Variance component model 方差分量模型Econometric model 计量经济学模型Generalized least squares /GLS 广义最小二乘Positive def inite matrix 正定矩阵Gamma distribution 伽马分布Variance stabilizing transf ormation 方差稳定变换Generalized linear model 广义线性模型Logistic regression logistic回归Poisson regression 泊松回归Log-linear regression 对数线性回归Systematic bias 系统性偏差Model-f ree estimate 与模型无关的估计Pooled estimate 联合估计、合并估计Pure error 纯误差Sum of squares f or lack of f it 拟合不足的平方和、失拟平方和general F testing 一般的F检验non-null distribution 非原假设下的分布noncentral chi-squared distribution 非中心卡方分布likelihood ratio test 似然比检验robust 稳健的joint conf idence regions 联合置信域ellipsoid 椭球、椭球体ellipse 椭圆the orientation of ellipse 椭圆的方向第六章Polynomial 多项式polynomial regression 多项式回归Factor 因素、因子Smooth f unction 光滑函数integer powers of the predictors 预测变量的整数次幂quadratic regression 二次回归the range of the predictor 预测变量的值域cubic 三次的polynomial of high-enough degree 足够高次的多项式physical model 物理模型numerical accuracy 数值精度centering 中心化orthogonal polynomial 正交多项式the highest-order term 最高阶的项nonsignif icant 不显著的second-order mean f unction 二阶均值函数quadratic curve 二次曲线delta method 德尔塔方法nonlinear combination 非线性组合elementary calculus 基础微积分Taylor series expansion 泰勒级数展开式Partial derivative 偏导数Regularity conditions 正则条件Symbolic dif ferentiation 符号微分Fractional polynomial 分数次幂多项式fractional powers of predictors 自变量的分数次幂qualitative or categorical predictors 定性或分类自变量two levels 两水平dummy variable 虚拟变量The f actor rule 因素法则、因子准则one-way analysis of vari ance 单因素方差分析Parallel regression 并行回归Within-group 组内的Coincident regression lines 重合的回归直线less stringent model 更(较)不严格的模型main-effects 主效应partial one-dimensional mean f unction 偏一维均值函数random coeff i cient model 随机系数模型wetland contamination 湿地污染chloride concentrations 氯的浓度road runoff路面径流intra-class correlation 类内相关性、组内相关性random intercepts model 随机截距模型linear mixed model 线性混合效应模型第七章Transf ormation 变换nonlinear transf orm ation 非线性变换Uneven 不均匀的In the transf ormed scale 在变换后的尺度中、in log scale 对数尺度Power transf ormation 幂变换Transf ormation f amily 变换族Strictly positive严格大于零的、严格为正的Cube root 三次根log transf ormation 对数变换inverse transf ormation 逆变换、倒数变换empirical rule 经验法则the log rule对数法则the range rule 值域法则multiplicative error 乘法误差、相乘形式的误差scaled power transf ormation 尺度化的幂变换basic power transf ormation 基本幂变换nonlinear least squares 非线性最小二乘inverse f itted value plot 逆拟合值图、倒拟合值图Box-Cox transf orm ation Box-Cox变换modif ied power f amily 修正的幂变换族modif ied power transf ormation 修正的幂变换、调整的幂变换、改进的幂变换geometric mean 几何平均数linear predictors 线性预测变量nonpositive variable 非正变量Y eo-Johnson transf orm ation Y eo-Johnson变换第八章Regression diagnostics 回归诊断graphical diagnostics 图形诊断Residual 残差rescaled residual 重新调整的残差Inf luential case 影响点strong inf luence point 强影响点Diagnostic statistics 诊断统计量Distance measure 距离度量(测度) leverage value 杠杆值Full column rank 列满秩Hat matrix 帽子矩阵Zero mean and uncorrelated el ements 零均值且元素(分量)不相关Common variance 同方差、共同的方差Computed residual 计算出的残差Symmetric matrix 对称矩阵Orthogonal projection 正交投影Column space 列向量空间Leverage 杠杆high leverage point 高杠杆点Elliptical contour 椭圆等高线Weighted hat matrix 加权帽子矩阵Weighted f itted values 加权拟合值weighted residual 加权残差Pearson residual Pearson残差weighted residual sum of squares 加权残差平方和nonconstant variance 非常数方差、即异方差incorrectly speci f ied mean f unction 错误设定的均值函数the right-opening megaphone 右开口喇叭megaphone 麦克风test f or curvature 检验曲率、检验弯曲程度、检验曲线(即非线性)Tukey’s test f or nonadditivity 非可加性的Tukey检验Remedy 补救措施Variance stabilizing transf ormation 方差稳定变换Within group variance 组内方差Generalized least squares 广义最小二乘Remain unbiased 仍然保持无偏性Ineff ici ent 无效的inaccurat e 不精确的Generalized linear model 广义线性模型Diagnostic f or nonconstant variance 异方差的诊断Score test 得分检验、Score检验Asymptotic distribution 渐进分布Model assessment 模型评价第九章Outliers 异常点inf luence 影响点Mean shif t outliers model /MSOM 均值漂移异常点模型Large residual (较)大的残差Candidates of outliers 可能的异常点、异常点的候选者Oil deposits 石油储量Outliers identif ication 异常点的识别outlier test 异常点检验Standardized residual 标准化残差Studentized statistics 学生化统计量studentized residual 学生化残差Critical value 临界值Multiple test 多重检验multiple comparison 多重比较Bonf erroni inequality Bonf erroni不等式Conservative 保守的Robust statistical method 稳健的统计方法Inf luence analysis 影响分析Robustness of the conclusion 结论的稳健性Perturbation 扰动Case deleting model /CDM 数据删除模型、个案删除模型、样本点删除模型Cook’s distance Cook距离Local inf luence analysis 局部影响分析Normality assumption 正态性假设Small sample小样本bootstrap method Nonnormality 非正态性Supernormality of residual 残差的超正态性Normal probability plot 正态概率图、正态PP图Order statistics 秩序统计量、顺序统计量Expected order statistics 期望的秩序统计量、预期的秩序统计量第十章Variable selection 变量选择Cheap data and expensive inf ormation 数据廉价但信息昂贵Training of employees 雇员的培训、员工的培训Supplier of raw materials 原材料的供应Important or active predictors 重要或有效的自变量、起作用的自变量Inactive predi ctors 无效变量、不起作用的自变量To overestimate signif icance 高估显著性Machine learning 机器学习data mining 数据挖掘Identi f ying the active predictors 识别有效自变量Surprisingly dif f icult 非常困难Highly positive correlated 高度正相关Highly negative correlated 高度负相关Collinearity 共线性multicollinearity 多重共线性Exactly collinear 完全共线的、即线性相关linearly dependent 线性相关Approximate collinearity 近似共线性,即常说的共线性Sample correlation 样本相关系数approximately collinear 近似共线性的variance inf l ation factor /VIF 方差膨胀因子inf orm ation criteria 信息准则lack of f it of a model 一个模型的拟合失真(即不足)情况complexity of a model 一个模型的复杂性(由其变量的个数决定,太复杂就会产生过度拟合) model comparison 模型比较Akaike inf ormation criteria /AIC 赤池信息准则Bayes inf orm ation criteria /BIC 贝叶斯信息准则Schwarz inf orm ation criteria 许瓦兹信息准则Mallows’ C pComputationally intensive criteria 计算量大的准则Cross-validation 交叉验证Construction set 构造集、建模集V alidation set 验证集Training set 训练集test set 测试集Predicted residual 预测残差predicted residual sum of squares /PRESS 预测残差平方和leaps and bounds algorithm 跨越式算法、快速算法stepwise method 逐步方法stepwise regression 逐步回归f orward selection 前向选择backward elimination algorithm 后向删除算法best subset 最佳子集subset selection 子集选择normal random deviate 正态随机数random number 随机数overstate signif i cance 夸大显著性,即夸大自变量对因变量效应的显著性calibrate 定标、校正、作为标准。
GEMMA2多变量线性混合效应模型版本0.1.3说明书
Package‘gemma2’October13,2022Title GEMMA Multivariate Linear Mixed ModelVersion0.1.3Description Fits a multivariate linear mixed effects model that uses a polygenic term,af-ter Zhou&Stephens(2014)(<https:///articles/nmeth.2848>).Of partic-ular interest is the estimation of variance components with restricted maximum likeli-hood(REML)methods.Genome-wide efficient mixed-model association(GEMMA),as imple-mented in the package'gemma2',uses an expectation-maximization algorithm for variance com-ponents inference for use in quantitative trait locus studies.License MIT+file LICENSEEncoding UTF-8LazyData trueURL https:///fboehm/gemma2BugReports https:///fboehm/gemma2/issuesSuggests covr,testthat,knitr,rmarkdown,readrRoxygenNote7.1.1VignetteBuilder knitrImports methods,MatrixLanguage en-USNeedsCompilation noAuthor Frederick Boehm[aut,cre](<https:///0000-0002-1644-5931>)Maintainer Frederick Boehm<*************************>Repository CRANDate/Publication2020-10-2416:20:03UTCR topics documented:calc_omega (2)calc_qi (3)calc_sigma (4)12calc_omega calc_XHiY (4)center_kinship (5)eigen2 (6)eigen_proc (6)gemma2 (7)MphCalcLogL (7)MphEM (8)stagger_mats (9)UpdateRL_B (9)update_e (10)update_u (10)update_v (11)Index12 calc_omega Calculate Omega matricesDescriptionCalculate Omega matricesUsagecalc_omega(eval,D_l)Argumentseval vector of eigenvalues from decomposition of relatedness matrixD_l vector of length d_sizeValuelist of length2.First entry in the list is the symmetric matrix OmegaU.Second entry in the list is the symmetric matrix OmegaE.Examplescalc_omega(eval=50:1,D_l=runif(2))calc_qi3 calc_qi Calculate Qi(inverse of Q)and log determinant of QDescriptionCalculate Qi(inverse of Q)and log determinant of QUsagecalc_qi(eval,D_l,X)Argumentseval vector of eigenvalues from decomposition of relatedness matrixD_l vector of length d_sizeX design matrixValuea list of length two.First entry in the list is a symmetric numeric matrix,Qi,the inverse of theQ matrix.The second entry in the outputted list is the log determinant of the matrix Q for use in likelihood calculations.Examplesas.matrix(readr::read_tsv(system.file("extdata","mouse100.cXX.txt",package="gemma2"),col_names=FALSE)[,1:100])->kinshipeigen2(kinship)->e2_oute2_out$values->evale2_out$vectors->Ueigen_proc(V_g=diag(c(1.91352,0.530827)),V_e=diag(c(0.320028,0.561589)))->ep_outcalc_qi(eval=eval,D_l=ep_out[[4]],X=t(rep(1,100))%*%U)4calc_XHiY calc_sigma Calculate Sigma_ee and Sigma_uu matricesDescriptionCalculate Sigma_ee and Sigma_uu matricesUsagecalc_sigma(eval,D_l,X,OmegaU,OmegaE,UltVeh,Qi)Argumentseval eigenvalues vector from decomposition of relatedness matrixD_l vectorX design matrixOmegaU matrixOmegaE matrixUltVeh matrixQi inverse of Q matrixcalc_XHiY Calculate XHiYDescriptionCalculate XHiYUsagecalc_XHiY(eval,D_l,X,UltVehiY)Argumentseval vector of eigenvalues from the decomposition of the relatedness matrixD_l vector of length d_sizeX design matrixUltVehiY a matrixValuenumeric vectorcenter_kinship5Examplesreadr::read_tsv(system.file("extdata","mouse100.pheno.txt",package="gemma2"),col_names=FALSE)->phenophe16<-as.matrix(pheno[,c(1,6)])as.matrix(readr::read_tsv(system.file("extdata","mouse100.cXX.txt",package="gemma2"),col_names=FALSE)[,1:100])->kinshipeigen2(kinship)->eouteout$values->evaleout$vectors->UUltVehi<-matrix(c(0,-1.76769,-1.334414,0),nrow=2,byrow=FALSE)#from output of eigen_proc()calc_XHiY(eval=eval,D_l=c(0.9452233,5.9792268),X=rep(1,100)%*%U,UltVehiY=UltVehi%*%t(phe16)%*%U)center_kinship Center a relatedness matrix,after Zhou’s GEMMA function Center-MatrixDescriptionCenter a relatedness matrix,after Zhou’s GEMMA function CenterMatrixUsagecenter_kinship(mat)Argumentsmat a relatedness matrixValuea centered relatedness matrixExamplesreadr::read_tsv(system.file("extdata","mouse100.cXX.txt",package="gemma2"),col_names=FALSE)[,1:100]->kinshipe_out<-eigen2(as.matrix(kinship))center_kinship(as.matrix(kinship))->kinship_centered6eigen_proc eigen2Calculate eigendecomposition and return ordered eigenvalues andeigenvectorsDescriptionCalculate eigendecomposition and return ordered eigenvalues and eigenvectorsUsageeigen2(spd,decreasing=FALSE)Argumentsspd a semi-positive definite matrixdecreasing argument passed to order()Valuea list with2components,the eigenvalues and the eigenvectorsExamplesreadr::read_tsv(system.file("extdata","mouse100.cXX.txt",package="gemma2"),col_names=FALSE)[,1:100]->kinshipe_out<-eigen2(as.matrix(kinship))eigen_proc Eigendecomposition procedure for Vg and VeDescriptionEigendecomposition procedure for Vg and VeUsageeigen_proc(V_g,V_e,tol=1/10000)ArgumentsV_g a d_size by d_size covariance matrixV_e a d_size by d_size covariance matrixtol a positive number indicating the tolerance for isSymmetricgemma27 Valuea named list of length4containing the outputs of eigendecomposition procedureExampleseigen_proc(diag(2),diag(2))gemma2gemma2DescriptionWe implement an expectation-maximization algorithm for multivariate variance components after the GEMMA software’s algorithm.MphCalcLogL Calculate log likelihoodDescriptionCalculate log likelihoodUsageMphCalcLogL(eval,D_l,Qi,UltVehiY,xHiy)Argumentseval eigenvalues vector from decomposition of relatedness matrixD_l vector of eigenvalues from decomposition of Ve matrixQi inverse of Q matrixUltVehiY matrix of(transformed)Y valuesxHiy vector8MphEM MphEM Perform expectation-maximization algorithm to infer Vg and Ve valuesfor a pair of traits.DescriptionPerform expectation-maximization algorithm to infer Vg and Ve values for a pair of traits.UsageMphEM(max_iter=10000,max_prec=1/1e+06,eval,X,Y,V_g,V_e,verbose_output=FALSE)Argumentsmax_iter maximum number of iterations for EM algorithmmax_prec maximum precision for EM algorithmeval vector of eigenvalues from relatedness matrix decompositionX design matrix.Typically contains founder allele dosages.Y matrix of phenotype valuesV_g genetic covariance matrixV_e error covariance matrixverbose_output logical indicating whether to output entire collection of intermediate values for all iterations.Default is FALSE.Valuea list of lists.Length of list corresponds to number of EM iterationsstagger_mats9 stagger_mats Stagger matrices within a larger,block-diagonal matrixDescriptionStagger matrices within a larger,block-diagonal matrixUsagestagger_mats(...)Arguments...one or more matrices,separated by commasValuea block-diagonal matrix,with the inputted matrices as blocks on the diagonal.Examplesfoo<-matrix(rnorm(40000),ncol=8)block_diag<-stagger_mats(foo,foo)dim(foo)dim(block_diag)UpdateRL_B Update B for restricted log likelihoodDescriptionUpdate B for restricted log likelihoodUsageUpdateRL_B(xHiy,Qi,d_size)ArgumentsxHiy vectorQi Q inverse matrixd_size number of traitsSee AlsoOther expectation-maximization functions:update_e(),update_u(),update_v()10update_u update_e Update EDescriptionUpdate EUsageupdate_e(UltVehiY,UltVehiBX,UltVehiU)ArgumentsUltVehiY matrix of transformed Y valuesUltVehiBX matrix of transformed BX valuesUltVehiU matrix of transformed U valuesSee AlsoOther expectation-maximization functions:UpdateRL_B(),update_u(),update_v()update_u Update U matrixDescriptionUpdate U matrixUsageupdate_u(OmegaE,UltVehiY,UltVehiBX)ArgumentsOmegaE the OmegaE matrix,calculated in calc_omegaUltVehiY matrixUltVehiBX matrixSee AlsoOther expectation-maximization functions:UpdateRL_B(),update_e(),update_v()update_v11Examplesreadr::read_tsv(system.file("extdata","mouse100.pheno.txt",package="gemma2"),col_names=FALSE)->phenophe16<-as.matrix(pheno[,c(1,6)])as.matrix(readr::read_tsv(system.file("extdata","mouse100.cXX.txt",package="gemma2"),col_names=FALSE)[,1:100])->kinshipeigen2(kinship)->e2_oute2_out$values->evale2_out$vectors->Ueigen_proc(V_g=diag(c(1.91352,0.530827)),V_e=diag(c(0.320028,0.561589)))->ep_outUltVehi<-ep_out[[3]]calc_omega(eval,ep_out$D_l)->co_outupdate_u(OmegaE=co_out[[2]],UltVehiY=UltVehi%*%t(phe16),UltVehiBX=matrix(c(-0.71342,-0.824482),ncol=1)%*%t(rep(1,100)))update_v Update V_e and V_gDescriptionUpdate V_e and V_gUsageupdate_v(eval,U,E,Sigma_uu,Sigma_ee,tol=1/10000)Argumentseval vector of eigenvalues from eigendecomposition of relatedness matrixU matrixE matrixSigma_uu matrixSigma_ee matrixtol a positive number indicating tolerance to be passed to isSymmetric()See AlsoOther expectation-maximization functions:UpdateRL_B(),update_e(),update_u()Index∗expectation-maximization functionsupdate_e,10update_u,10update_v,11UpdateRL_B,9calc_omega,2calc_qi,3calc_sigma,4calc_XHiY,4center_kinship,5eigen2,6eigen_proc,6gemma2,7MphCalcLogL,7MphEM,8stagger_mats,9update_e,9,10,10,11update_u,9,10,10,11update_v,9,10,11UpdateRL_B,9,10,1112。
Linear Mixed Effects Modeling in SPSS
Technical reportLinear Mixed-Effects Modelingin SPSS: An Introduction to the MIXED ProcedureTable of contentsIntroduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Data preparation for MIXED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Fitting fixed-effects models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Fitting simple mixed-effects models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Fitting mixed-effects models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13Multilevel analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16Custom hypothesis tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18Covariance structure selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19Random coefficient models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20Estimated marginal means. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28About SPSS Inc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28IntroductionThe linear mixed-effects models (MIXED) procedure in SPSS enables you to fit linear mixed-effects models to data sampled from normal distributions. Recent texts, such as those by McCulloch and Searle (2000) and Verbeke and Molenberghs (2000), comprehensively review mixed-effects models. The MIXED procedure fits models more general than those of the general linear model (GLM) procedure and it encompasses all models in the variance components (VARCOMP) procedure. This report illustrates the types of models that MIXED handles. We begin with an explanation of simple models that can be fitted using GLM and VARCOMP, to show how they are translated into MIXED. We then proceed to fit models that are unique to MIXED.The major capabilities that differentiate MIXED from GLM are that MIXED handles correlated data and unequal variances. Correlated data are very common in such situations as repeated measurements of survey respondents or experimental subjects. MIXED extends repeated measures models in GLM to allow an unequal number of repetitions. It also handles more complex situations in which experimental units are nested in a hierarchy. MIXED can, for example, process data obtained from a sample of students selected from a sample of schools in a district.In a linear mixed-effects model, responses from a subject are thought to be the sum (linear) of so-called fixed and random effects. If an effect, such as a medical treatment, affects the population mean, it is fixed. If an effect is associated with a sampling procedure (e.g., subject effect), it is random. In a mixed-effects model, random effects contribute only to the covariance structure of the data. The presence of random effects, however, often introduces correlations between cases as well. Though the fixed effect is the primary interest in most studies or experiments, it is necessary to adjust for the covariance structure of the data. The adjustment made in procedures like GLM-Univariate is often not appropriate because it assumes independence of the data.The MIXED procedure solves these problems by providing the tools necessary to estimate fixed and random effects in one model. MIXED is based, furthermore, on maximum likelihood (ML) and restricted maximum likelihood (REML) methods, versus the analysis of variance (ANOVA) methods in GLM. ANOVA methods produce an optimum estimator (minimum variance) for balanced designs, whereas ML and REML yield asymptotically efficient estimators for balanced and unbalanced designs. ML and REML thus present a clear advantage over ANOVA methods in modeling real data, since data are often unbalanced. The asymptotic normality of ML and REML estimators, furthermore, conveniently allows us to make inferences on the covariance parameters of the model, which is difficult to do in GLM.Data preparation for MIXEDMany datasets store repeated observations on a sample of subjects in “one subject per row” format. MIXED, however, expects that observations from a subject are encoded in separate rows. To illustrate, we select a subset of cases from the data that appear in Potthoff and Roy (1964). The data shown in Figure 1 encode, in one row, three repeated measurements of a dependent variable (“dist1” to “dist3”) from a subject observed at different ages (“age1” to“age3”).Figure 1. MIXED, however, requiresthat measurements at different ages be collapsed into one variable, so that each subject has three cases. The Data Restructure Wizard in SPSS simplifies the tedious data conversion process. We choose “Data->Restructure” from the pull-down menu, and select the option “Restructure selected variables into cases.” We then click the “Next” button to reach the dialog shownin Figure 2.Figure 2. We need to convert two groups ofvariables (“age” and “dist”) into cases. We therefore enter “2” and click “Next.” Thisbrings us to the “Select Variables” dialog box. Figure 3. In the “Select Variables” dialog box,we first specify “Subject ID [subid]” as the case group identification. We then enter the namesof new variables in the target variable drop-down list. For the target variable “age,” we drag “age1,” “age2,” and “age3” to the list box in the “Variables to be Transposed” group. We similarly associate variables “dist1,” “dist2,” and “dist3” with the target variable “distance.” We then drag variables that do not vary within a subject to the “Fixed Variable(s)” box. Clicking “Next” brings us to the “Create Index Variables” dialog box. We accept the default of one index variable, then click “Next” to arrive at the final dialog box.Figure 4. In the “Create One Index Variable”dialog box, we enter “visit” as the name of the indexing variable and click “Finish.” Figure 5. We now have three cases foreach subject.We can also perform the conversion using the following command syntax:VARSTOCASES/MAKE age FROM age1 age2 age3/MAKE distance FROM dist1 dist2 dist3/INDEX = visit(3)/KEEP = subid gender.The command syntax is easy to interpret—it collapses the three age variables into “age” and the three response variables into “distance.” At the same time, a new variable, “visit,” is created to index the three new cases within each subject. The last subcommand means that the two variables that are constant within a subject should be kept.Fitting fixed-effects modelsWith iid residual errorsA fitted model has the form , where is a vector of responses, is the fixed-effects design matrix, is a vector of fixed-effects parameters and is a vector of residual errors. In this model, we assume that is distributed as , where is an unknown covariance matrix. A common belief is that . We can use GLM or MIXED to fit a model with this assumption. Using a subset of the growth study dataset, we illustrate how to use MIXED to fit a fixed-effects model. The following command (Example 1) fits a fixed-effects model that investigates the effect of the variables “gender” and “age” on “distance,” which is a measure of the growth rate.Example 1: Fixed-effects model using MIXEDCommand syntax:MIXED DISTANCE BY GENDER WITH AGE/FIXED = GENDER AGE | SSTYPE(3)/PRINT = SOLUTION TESTCOV.Output:Figure 6Figure 7The command in Example 1 produces a “Type III Tests of Fixed Effects” table (Figure 6). Both “gender” and “age” are significant at the .05 level. This means that “gender” and “age” are potentially impor-tant predictors of the dependent variable. More detailed information on fixed-effects parameters may be obtained by using the subcommand /PRINT SOLUTION. The “Estimates of Fixed Effects” table (Figure 7) gives estimates of individual parameters,as well as their standard errors and confidence intervals.We can see that the mean distance for males is larger than that for females. Distance, moreover, increases with age. MIXED also produces an estimate of the residual error variance and its standard error. The /PRINT TESTCOV option gives us the Wald statistic and the confidence interval for the residual error variance estimate.Example 1 is simple—users familiar with the GLM procedure can fit the same model using GLM.Example 2: Fixed-effects model using GLM Command syntax:GLM DISTANCE BY GENDER WITH AGE /METHOD = SSTYPE(3)/PRINT = PARAMETER /DESIGN = GENDER AGE.Output:Figure 8Figure 9We see in Figure 9 that GLM and MIXED produced the same Type III tests and parameter estimates. Note, however, that in the MIXED “Type III Tests of Fixed Effects” table (Figure 6), there is no column for the sum of squares. This is because, for some complex models, the test statistics in MIXED may not be expressed as a ratio of two sums of squares. They are thus omitted from the ANOVA table.With non-iid residual errorsThe assumption may be violated in some situations. This often happens when repeated measurements are made on each subject. In the growth study dataset, for example, the response variable of each subject is measured at various ages. We may suspect that error terms within a subject are correlated. A reasonable choice of the residual error covariance will therefore be a block diagonal matrix, where each block is a first-order autoregressive (AR1) covariance matrix.Example 3: Fixed-effects model with correlated residual errors Command syntax:MIXED DISTANCE BY GENDER WITH AGE /FIXED GENDER AGE/REPEATED VISIT | SUBJECT(SUBID) COVTYPE(AR1)/PRINT SOLUTION TESTCOV R.Output:Figure 10Figure 11Figure 12Example 3 uses the /REPEATED subcommand to specify a more general covariance structure for the residual errors. Since there are three observations per subject, we assume that the set of three residual errors for each subject is a sample from a three-dimensional normal distribution with a first-order autoregressive (AR1) covariance matrix. Residual errors within each subject are therefore correlated, but are independent across subjects. The MIXED procedure, by default, usesthe REML method to estimate the covariance matrix. An alternative is to request ML estimates by using the /METHOD=ML subcommand.The command syntax in Example 3 also produces the “Residual Covariance (R) Matrix” (Figure 14), which shows the estimated covariance matrix of the residual error for one subject. We see from the “Estimates of Covariance Parameters” table (Figure 13) that the correlation parameter has a relatively large value (.729) and that the p-value of the Wald test is less than .05. The autoregressive structure may fit the data better than the model in Example 1.We also see that, for the tests of fixed effects, the denominator degrees of freedom are not integers. This is because these statistics do not have exact F distributions. The values for denominator degrees of freedom are obtained by a Satterthwaite approximation. We see in the new model that gender is not significant at the .05 level. This demonstrates that ignoring the possible correlations in your data may lead to incorrect conclusions. MIXED is therefore usually a better alternative to GLM and VARCOMP when data are correlated.Fitting simple mixed-effects models Balanced designMIXED, as its name implies, handles complicated models that involve fixed and random effects. Levels of an effect are, in some situations, only a sample of all possible levels. If we want to study the efficiency of workers in different environments, for example, we don’t need to include all workers in the study—a sample of workers is usually enough. The worker effect should be considered random, due to the sampling process. A mixed-effects model has, in general, the formwhere the extra term models the random effects. is the design matrix of random effects and is a vector of random-effects parameters. We can use GLM and MIXED to fit mixed-effects models. MIXED, however, fits a much wider class of models. To understand the functionality of MIXED, we first look at several simpler models that can be created in MIXED and GLM. We also look at the similarity between MIXED and VARCOMP in these models.Figure 13Figure 14In examples 4 through 6, we use a semiconductor dataset that appeared in Pinheiro and Bates (2000) to illustrate the similarity between GLM, MIXED, and VARCOMP. The dependent variable in this dataset is “current” and the predictor is “voltage.” The data are collected from a sample of ten silicon wafers. There are eight sites on each wafer and five measurements are taken at each site. We have, therefore, a total of 400 observations and a balanced design.Example 4: Simple mixed-effects model with balanced design using MIXEDCommand syntax:MIXED CURRENT BY WAFER WITH VOLTAGE/FIXED VOLTAGE | SSTYPE(3)/RANDOM WAFER/PRINT SOLUTION TESTCOV.Output:Figure 15Figure 16Figure 17Example 5: Simple mixed-effects model with balanced design using GLM Command syntax:GLM CURRENT BY WAFER WITH VOLTAGE /RANDOM = WAFER /METHOD = SSTYPE(3)/PRINT = PARAMETER /DESIGN = WAFER VOLTAGE.Output:Example 6: Variance components model with balanced design Command syntax:VARCOMP CURRENT BY WAFER WITH VOLTAGE /RANDOM = WAFER /METHOD = REML.Output:In Example 4, “voltage” is entered as a fixed effect and “wafer” is entered as a random effect. This example tries to model the relationship between “current” and “voltage” using a straight line, but the intercept of the regression line will vary from wafer to wafer according to a normal distribution. In the Type III tests for “voltage,” we see a significantrelationship between “current” and “voltage.” If we delve deeper into the parameter estimates table, the regression coefficient of “voltage” is 9.65. This indicates a positive relationship between “current” and “voltage.” In the “Estimates of Covariance Parameters” table (Figure 17), we have estimates for the residual error variance and the variance due to the sampling of wafers.Figure 18Figure 19Figure 20We repeat the same model in Example 5 using GLM. Note that MIXED produces Type III tests for fixed effects only, but GLM includes fixed and random effects. GLM treats all effects as fixed during computation and constructs F statistics by taking the ratio of the appropriate sums of squares. Mean squares of random effects in GLM are estimates of functions of the variance parameters of random and residual effects. These functions can be recovered from “Expected Mean Squares” (Figure 19). In MIXED, the outputs are much simpler because the variance parameters are estimated directly using ML or REML. As a result, there are no random-effect sums of squares.When we have a balanced design, as in examples 4 through 6, the tests of fixed effects are the same for GLM and MIXED. We can also recover the variance parameter estimates of MIXED by using the sum of squares in GLM. In MIXED, for example, the estimate of the residual variance is 0.175, which is the same as the MS(Error) in GLM. The variance estimate of random effect “wafer” is 0.093, which can be recovered in GLM using the “Expected Mean Squares” table (Figure 19) in Example 5:Var(WAFER) = [MS(WAFER)-MS(Error)]/40 = 0.093This is equal to MIXED’s estimate. One drawback of GLM, however, is that you cannot compute the standard error of the variance estimates.VARCOMP is, in fact, a subset of MIXED. These two procedures therefore always provide the same variance estimates, as seen in examples 4 and 6. VARCOMP only fits relatively simple models. It can only handle random effects that are iid. No statistics on fixed effects are produced. If your primary objective is to make inferences about fixed effects and your data are correlated, MIXED is a better choice.An important note: Due to the different estimation methods that are used, GLM and MIXED often do not produce the same results. The next section gives an example of situations in which they produce different results.Unbalanced designOne situation in which MIXED and GLM disagree is with an unbalanced design. To illustrate this, we removed some cases in the semiconductor dataset, so that the design is no longer balanced.Figure 21We then rerun examples 4 through 6 with this unbalanced dataset. The output is shown in examples 4a through 6a. We want to see whether the three methods—GLM, MIXED and VARCOMP—still agree with each other.Example 4a: Mixed-effects model with unbalanced design using MIXED Command syntax:MIXED CURRENT BY WAFER WITH VOLTAGE /FIXED VOLTAGE | SSTYPE(3)/RANDOM WAFER/PRINT SOLUTION TESTCOV.Output:Example 5a: Mixed-effects model with unbalanced design using GLM Command syntax:GLM CURRENT BY WAFER WITH VOLTAGE /RANDOM = WAFER /METHOD = SSTYPE(3)/PRINT = PARAMETER /DESIGN = WAFER VOLTAGE.Figure 22Figure 23Figure 24Output:Example 6a: Variance components model with unbalanced designCommand syntax:VARCOMP CURRENT BY WAFER WITH VOLTAGE /RANDOM = WAFER/METHOD = REML.Output:Since the data have changed, we expect examples 4a through 6a to differ from examples 4 through 6. We will focus instead on whether examples 4a, 5a, and 6a agree with each other.In Example 4a, the F statistic for the “voltage” effect is 67481.118, butExample 5a gives an F statistic value of 67482.629. Apart from the test of fixed effects, we also see a difference in covariance parameter estimates.Examples 4a and 6a, however, show that VARCOMP and MIXED can produce the same variance estimates, even in an unbalanced design. This is because MIXED and VARCOMP offer maximum likelihood or restricted maximum likelihood methods in estimation, while GLM estimates are based on the method-of-moments approach.MIXED is generally preferred because it is asymptotically efficient (minimum variance), whether or not the data are balanced. GLM, however, only achieves its optimum behavior when the data are balanced.Figure 25Figure 26Figure 27Fitting mixed-effects modelsWith subjectsIn the semiconductor dataset, “current” is a dependent variable measured on a batch of wafers. These wafers are therefore considered subjects in a study. An effect of interest (such as “site”) may often vary with subjects (“wafer”). One scenario is that the (population) means of “current” at separate sites are different. When we look at the current measured at these sites on individual wafers, however, they hover below or above the population mean according to some normal distribution. It is therefore common to enter an “effect by subject” interaction term in a GLM or MIXED model to account for the subject variations.In the dataset there are eight sites and ten wafers. The site*wafer effect, therefore, has 80 parameters, which can be denoted by , i=1...10 and j=1...8. A common assumption is that ’s are assumed to be iid normal with zero mean and an unknown variance. The mean is zero because ’s are used to model only the population variation. The mean of the population is modeled by entering “site” as a fixed effect in GLM and MIXED. The results of this model for MIXED and GLM are shown in examples 7 and 8.Example 7: Fitting random effect*subject interaction using MIXEDCommand syntax:MIXED CURRENT BY WAFER SITE WITH VOLTAGE/FIXED SITE VOLTAGE |SSTYPE(3)/RANDOM SITE*WAFER | COVTYPE(ID).Output:Figure 28Figure 29Example 8: Fitting random effect*subject interaction using GLMCommand syntax:GLM CURRENT BY WAFER SITE WITH VOLTAGE/RANDOM = WAFER/METHOD = SSTYPE(3)/DESIGN = SITE SITE*WAFER VOLTAGE.Output:Figure 30Figure 31Since the design is balanced, the results of GLM and MIXED in examples 7 and 8 match. This is similar to examples 4 and 5. We see from the results of Type III tests that “voltage” is still an important predictor of “current,” while “site” is not. The mean currents at different sites are thus not significantly different from each other, so we can use a simpler model without the fixed effect “site.” We should still, however, consider a random-effects model, because ignoring the subject variation may lead to incorrect standard error estimates of fixed effects or false significant tests.Up to this point, we examined primarily the similarities between GLM and MIXED. MIXED, in fact, has a much more flexible way of modeling random effects. Using the SUBJECT and COVTYPE options, Example 9 presents an equivalent form of Example 7.Example 9: Fitting random effect*subject interaction using SUBJECT specificationCommand syntax:MIXED CURRENT BY SITE WITH VOLTAGE/FIXED SITE VOLTAGE |SSTYPE(3)/RANDOM SITE | SUBJECT(WAFER) COVTYPE(ID).The SUBJECT option tells MIXED that each subject will have its own set of random parameters for the random effect “site.” The COVTYPE option will specify the form of the variance covariance matrix of the random parameters within one subject. The command syntax attempts to specify the distributional assumption in a multivariate form, which can be written as:Under normality, this assumption is equivalent to that in Example 7. One advantage of the multivariate form is that you can easily specify other covariance structures by using the COVTYPE option. The flexibility in specifying covariance structures helps us to fit a model that better describes the data. If, for example, we believe that the variances of different sites are different, we can specify a diagonal matrix as covariance type and the assumption becomes:The result of fitting the same model using this assumption is given in Example 10.Example 10: Using COVTYPE in a random-effects model Command syntax:MIXED CURRENT BY SITE WITH VOLTAGE /FIXED SITE VOLTAGE |SSTYPE(3)/RANDOM SITE | SUBJECT(WAFER) COVTYPE(DIAG)/PRINT G TESTCOV.Output:Figure 32Figure 33Figure 34In Example 10, we request one extra table, the estimated covariance matrix of the random effect “site.” It is an eight-by-eight diagonal matrix in this case. Note that changing the covariance structure of a random effect also changes the estimates and tests of fixed effects. We want, in practice, an objective method to select suitable covariance struc-tures for our random effects. In the section “Covariance Structure Selection,” we revisit examples 9 and 10 to show how to selectcovariance structures for random effects.Multilevel analysisThe use of the SUBJECT and COVTYPE options in /RANDOM and /REPEATED brings many options for modeling the covariance structures of random effects and residual errors. It is particularly useful when modeling data obtained from a hierarchy. Example 11 illustrates the simultaneous use of theseoptions in a multilevel model. We selected data from six schools from the Junior School Project of Mortimore, et al. (1988). We investi-gate below how the socioeconomic status (SES) of a student affects his or her math scores over a three-year period.Example 11: Multilevel mixed-effects model Command syntax:MIXED MATHTEST BY SCHOOL CLASS STUDENT GENDER SES SCHLYEAR /FIXED GENDER SES SCHLYEAR SCHOOL/RANDOM SES |SUBJECT(SCHOOL*CLASS) COVTYPE(ID)/RANDOM SES |SUBJECT(SCHOOL*CLASS*STUDENT) COVTYPE(ID)/REPEATED SCHLYEAR | SUBJECT(SCHOOL*CLASS*STUDENT) COVTYPE(AR1)/PRINT SOLUTION TESTCOV.Figure 35Figure 36In Example 11, the goal is to discover whether socioeconomic status (“ses”) is an important predictor for mathematics achievement (“mathtest”). To do so, we use the factor “ses” as a fixed effect. We also want to adjust for the possible sampling variation due to different classes and students. “Ses” is therefore also used twice as a random effect. The first random effect tries to adjust for the variation of the “ses” effect owing to class variation. In order to identify all classes in the dataset, school*class is specified in the SUBJECT option. The second random effect also tries to adjust for the variation of the “ses” effect owing to student variation. The subject specification is thus school*class*student. All of the students are followed for three years; the school year (“schlyear”) is therefore used as a fixed effect to adjust for possible trends in this period. The /REPEATED subcommand is also used to model the possible correlation of the residual errors within each student.We have a relatively small dataset. Since there are only six schools, we can only use school as a fixed effect while adjusting for possible differences between schools. In this example, there is only one random effect at each level. With SPSS 11.5 or later, you can specify more than one random effect in MIXED. If multiple random effects are specified on the same RANDOM subcommand, you can model their correlation by using a suitable COVTYPE specification. If the random effects are specified on separate RANDOM subcommands, they are assumed to be independent.Figure 38Figure 39In the Type III tests of fixed effects, in Example 11, we see that socioeconomic status does impact student performance. The parameter estimates of “ses” for students with “ses=1” (fathers have managerial or professional occupations) indicate that these students perform better than students at other socioeconomic levels. The effect “schlyear” is also significant in the model and the students’ performances increase with “schlyear.”From “Estimates of Covariance Parameters” (Figure 39), we notice that the estimate of the “AR1 rho” parameter is not significant, which means that a simple, scaled-identity structure may be used. For the variation of “ses” due to school* class, the estimate is very small compared to other sources of variance and the Wald test indicates that it is not significant. We can therefore consider removing the random effect from the model.We see from this example that the major advantages of MIXED are that it is able to look at different aspects of a dataset simultaneously and that all of the statistics are already adjusted for all effects in the model. Without MIXED, we must use different tools to study different aspects of the models. An example of this is using GLM to study the fixed effects andusing VARCOMP to study the covariance structure. This is not only time consuming, but the assumptions behind the statistics are usually violated.Custom hypothesis testsApart from predefined statistics, MIXED allows users to construct custom hypotheses on fixed- and random-effects parameters through the use of the /TEST subcommand. To illustrate, we use a dataset from Pinheiro and Bates (2000). The data consistof a CT scan on a sample of ten dogs. The dogs’ left and right lymph nodes were scanned and the intensity of each scan was recorded in the variable pixel. The following mixed-model command syntax tests whether there is a difference between the left and right lymph nodes.Example 12: Custom hypothesis testing in mixed-effects modelCommand syntax:MIXED PIXEL BY SIDE/FIXED SIDE/RANDOM SIDE | SUBJECT(DOG) COVTYPE(UN)/TEST(0) ‘Side (fixed)’ SIDE 1 -1/TEST(0) ‘Side (random)’ SIDE 1 -1 | SIDE 1 -1/PRINT LMATRIX.Output:Figure 40The output of the two /TEST subcommands is shown above. The first test looks at differences in the left and right sides in the general population (broad inference space). We should use the second test to test the differences between the left and right sides for the sample of dogs used in this particular study (narrow inference space). In the second test, the average differences of the random effects over the ten dogs are added to the statistics. MIXED automatically calculates the average over subjects. Note that the contrast coefficients for random effects are scaled by one/(number of subjects). Though the average difference for the random effect is zero, it affects the standard error of the statistic. We see that statistics of the two tests are the same, but the second has a smaller standard error. This means that if we make an inference on a larger population, there will be more uncertainty. This is reflected in the larger standard error of the test. The hypothesis in this example is not significant in the general population, but it is significant for the narrow inference. A larger sample size is therefore often needed to test a hypothesis about the general population.Covariance structure selectionIn examples 3 and 11, we see the use of Wald statistics in covariance structure selection. Another approach to testing hypotheses on covariance parameters uses likelihood ratio tests. The statistics are constructed by taking the differences of the -2 Log likelihoods of two nested models. Under the null hypothesis that the covariance parameters are 0 in the population, this difference follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters of the models.To illustrate the use of the likelihood ratio test, we again look at the model in examples 9 and 10. In Example 9, we use a scaled identity as the covariance matrix of the random effect “site.” In Example 10, however, we use a diagonal matrix with unequal diagonal elements. Our goal is to discover which model better fits the data. We obtain the -2 Log likelihood values and other criteria about the two models from the information criteria tables shown on the next page.Figure 41Figure 42。
多水平统计分析模型(混合效应模型)
多⽔平统计分析模型(混合效应模型)⼀、概述普通的线性回归只包含两项影响因素,即固定效应(fixed-effect)和噪声(noise)。
噪声是我们模型中没有考虑的随机因素。
⽽固定效应是那些可预测因素,⽽且能完整的划分总体。
例如模型中的性别变量,我们清楚只有两种性别,⽽且理解这种变量的变化对结果的影响。
那么为什么需要 Mixed-effect Model?因为有些现实的复杂数据是普通线性回归是处理不了的。
例如我们对⼀些⼈群进⾏重复测量,此时存在两种随机因素会影响模型,⼀种是对某个⼈重复测试⽽形成的随机噪声,另⼀种是因为⼈和⼈不同⽽形成的随机效应(random effect)。
如果将⼀个⼈的测量数据看作⼀个组,随机因素就包括了组内随机因素(noise)和组间随机因素(random effect)。
这种嵌套的随机因素结构违反了普通线性回归的假设条件。
你可能会把⼈员(组间的随机效应)看作是⼀种分类变量放到普通线性回归模型中,但这样作是得不偿失的。
有可能这个factor的level很多,可能会⽤去很多⾃由度。
更重要的是,这样作没什么意义。
因为⼈员ID和性别不⼀样,我们不清楚它的意义,⽽且它也不能完整的划分总体。
也就是说样本数据中的路⼈甲,路⼈⼄不能完全代表总体的⼈员ID。
因为它是随机的,我们并不关⼼它的作⽤,只是因为它会影响到模型,所以不得不考虑它。
因此对于随机效应我们只估计其⽅差,不估计其回归系数。
混合模型中包括了固定效应和随机效应,⽽随机效应有两种⽅式来影响模型,⼀种是对截距影响,⼀种是对某个固定效应的斜率影响。
前者称为 Random intercept model,后者称为Random Intercept and Slope Model。
Random intercept model的函数结构如下Yij = a0 + a1*Xij + bi + eija0: 固定截距a1: 固定斜率b: 随机效应(只影响截距)X: 固定效应e: 噪声混合线性模型有时⼜称为多⽔平线性模型或层次结构线性模型由两个部分来决定,固定效应部分+随机效应部分,⼆、R语⾔中的线性混合模型可⽤包1、nlme包这是⼀个⽐较成熟的R包,是R语⾔安装时默认的包,它除了可以分析分层的线性混合模型,也可以处理⾮线性模型。
混合效应模型spss
混合效应模型spss
SPSS (Statistical Package for the Social Sciences) 是一款常用于统计分析的软件。
它可以用来进行混合效应模型的分析。
混合效应模型(Mixed Effects Model)是一种统计模型,用于分析来自多个随机分组的数据。
在SPSS 中,可以使用GLMM (Generalized Linear Mixed Models) 命令来进行混合效应模型分析。
这个命令可以用来
分析带有随机效应的线性回归模型和逻辑回归模型。
需要注意的是,在使用GLMM 命令之前,需要确保数据已经合适
地设置了随机效应和固定效应。
在使用GLMM 命令进行混合效应模型分析时,需要指
定固定效应变量和随机效应变量。
固定效应变量是指对所有样本都有相同效应的变量,而随机效应变量是指对每个样本都有不同效应的变量。
在SPSS 中还支持多元高斯混合效应模型, MIXED 命令, 它可以支持多元高斯分布,并且可以支持更复杂的模型类型。
它同样可以指定固定效应变量和随机效应变量。
在使用这些命令之前,建议对数据进行清洗和检查,确保数据是符合要求的。
需要根据具体分析目的和数据情况来确定使用哪种命令或模型.。
Bayesian Linear Mixed-Effects Models 贝叶斯线性混合效应模型说
Package‘blme’October12,2022Version1.0-5Date2020-12-28Title Bayesian Linear Mixed-Effects ModelsDepends R(>=3.0-0),lme4(>=1.0-6)Imports methods,stats,utilsSuggests expint(>=0.1-3),testthatDescription Maximum a posteriori estimation for linear and generalized linear mixed-effects mod-els in a Bayesian setting,implementing the meth-ods of Chung,et al.(2013)<doi:10.1007/s11336-013-9328-2>.Extends pack-age'lme4'(Bates,Maechler,Bolker,and Walker(2015)<doi:10.18637/jss.v067.i01>).License GPL(>=2)URL https:///vdorie/blmeBugReports https:///vdorie/blme/issuesNeedsCompilation noAuthor Vincent Dorie[aut,cre](<https:///0000-0002-9576-3064>), Douglas Bates[ctb](lme4non-modular functions,<https:///0000-0001-8316-9503>),Martin Maechler[ctb](lme4non-modular functions,<https:///0000-0002-8685-9910>),Ben Bolker[ctb](lme4non-modular functions,<https:///0000-0002-2127-0443>),Steven Walker[ctb](lme4non-modular functions,<https:///0000-0002-4394-9078>)Maintainer Vincent Dorie<****************>Repository CRANDate/Publication2021-01-0518:10:11UTCR topics documented:blme (2)bmerDist-class (5)bmerMod-class (7)12blme Index9blme Fit Bayesian Linear and Generalized Linear Mixed-Effects ModelsDescriptionMaximum a posteriori estimation for linear and generalized linear mixed-effects models in a Bayesian setting.Built off of lmer.Usageblmer(formula,data=NULL,REML=TRUE,control=lmerControl(),start=NULL,verbose=0L,subset,weights,na.action,offset,contrasts=NULL,devFunOnly=FALSE,cov.prior=wishart,fixef.prior=NULL,resid.prior=NULL,...)bglmer(formula,data=NULL,family=gaussian,control=glmerControl(),start=NULL,verbose=0L,nAGQ=1L,subset,weights,na.action,offset,contrasts=NULL,mustart,etastart,devFunOnly=FALSE,cov.prior=wishart,fixef.prior=NULL,...)Argumentscov.prior a BLME prior or list of priors with allowable distributions:wishart,invwishart, gamma,invgamma,or NULL.Imposes a prior over the covariance of the randomeffects/modeled coefficients.Default is wishart.The NULL argument imposesflat priors over all relevant parameters.fixef.prior a BLME prior of family normal,t,horseshoe,or NULL.Imposes a prior overthefixed effects/modeled coefficients.Default is NULL.resid.prior a BLME prior of family gamma,invamma,point or NULL.Imposes a prior overthe noise/residual variance,also known as common scale parameter or the con-ditional variance given the random effects.Default is NULL.start like the start arguments for lmer and glmer a numeric vector or named list.Unlike the aforementioned,list members of fixef and sigma are applicable tolinear mixed models provided that numeric optimization is required for theseparameters.formula,data,REML,family,control,verbose,nAGQ,mustart,etastart,devFunOnly,...model specification arguments as in lmer and glmer;see there for details.subset,weights,na.action,offset,contrastsfurther model specification arguments as in lm;see there for details.blme3 DetailsThe bulk of the usage for blmer and bglmer closely follows the functions lmer and glmer.Those help pages provide a good overview offitting linear and generalized linear mixed models.The primary distinction is that blmer and bglmer allow the user to do Bayesian inference or penalized maximum likelihood,with priors imposed on the different model components.For the specifics of any distribution listed below,see the distributions page.Covariance PriorThe cov.prior argument applies a prior over the covariance matrix of the random effects/modeled coefficients.As there is one covariance matrix for every named grouping factor-that is every element that appears to the right of a vertical bar("|")in the model formula-it is possible to apply as many different priors as there are said factors.The general formats of an argument to blmer or bglmer for such a prior are of the form:•cov.prior=~covariance.distribution(option1=value1,...)•cov.prior=list(fc.nm~dist1,fc.nm~dist2,...,default.distribution) If the“~”construct is ommitted,the prior is interpretted as a default and applied to all factors that lack specific priors of their own.Options are not required,but permitfine-tuning of the model.Supported distributions are gamma,invgamma,wishart,invwishart,NULL,and custom.The common.scale option,a logical,determines whether or not the prior applies to in the absolute-real world sense(value=FALSE),or if the prior is applied to the random effect covariance divided by the estimated residual variance(TRUE).As a practical matter,when false computation can be slower as the profiled common scale may no longer have a closed-form solution.As such,the default for all cases is TRUE.Other options are specified along with the specific distributions and defaults are explained in the blme distributions page.Fixed Effects PriorPriors on thefixed effects,or unmodeled coefficients,are specified in a fashion similar to that of covariance priors.The general format is•fixef.prior=multivariate.distribution(options1=value1,...) At present,the implemented multivariate distributions are normal,t,horseshoe,and NULL.t and horseshoe priors cannot be used when REML is TRUE,as that integral does not have a closed form solution.Residual Variance PriorThe general format for a residual variance prior is the same as for afixed effect prior.The supported distributions are point,gamma,invgamma.ValueAn object of class"bmerMod",for which many methods are available.See there for details.4blmeSee Alsolmer,glmer,merMod class,and lm.Examplesdata("sleepstudy",package="lme4")###Examples using a covariance prior###Here we are ignoring convergence warnings just to illustate how the package#is used:this is not a good idea in practice..control<-lmerControl(check.conv.grad="ignore")(fm1<-blmer(Reaction~Days+(0+Days|Subject),sleepstudy,control=control,cov.prior=gamma))(fm2<-blmer(Reaction~Days+(0+Days|Subject),sleepstudy,control=control,cov.prior=gamma(shape=2,rate=0.5,posterior.scale= sd ))) (fm3<-blmer(Reaction~Days+(1+Days|Subject),sleepstudy,control=control,cov.prior=wishart))(fm4<-blmer(Reaction~Days+(1+Days|Subject),sleepstudy,control=control,cov.prior=invwishart(df=5,scale=diag(0.5,2))))#Custom priorpenaltyFn<-function(sigma)dcauchy(sigma,0,10,log=TRUE)(fm5<-blmer(Reaction~Days+(0+Days|Subject),sleepstudy,cov.prior=custom(penaltyFn,chol=TRUE,scale="log")))###Examples using a fixed effect prior###(fm6<-blmer(Reaction~Days+(1+Days|Subject),sleepstudy,cov.prior=NULL,fixef.prior=normal))(fm7<-blmer(Reaction~Days+(1+Days|Subject),sleepstudy,cov.prior=NULL,fixef.prior=normal(cov=diag(0.5,2),common.scale=FALSE))) ###Example using a residual variance prior####This is the"eight schools"data set;the mode should be at the boundary#of the space.control<-lmerControl(check.conv.singular="ignore",check.nobs.vs.nRE="ignore",check.nobs.vs.nlev="ignore")y<-c(28,8,-3,7,-1,1,18,12)sigma<-c(15,10,16,11,9,11,10,18)g<-1:8(schools<-blmer(y~1+(1|g),control=control,REML=FALSE,resid.prior=point,cov.prior=NULL,weights=1/sigma^2))bmerDist-class Bayesian Linear Mixed-Effects Model Prior Representations andbmer*Dist MethodsDescriptionObjects created in the initialization step of a blme model that represent the type of prior being applied.Objects from the ClassObjects can be created by calls of the form new("bmerPrior",...)or,more commonly,as side effects of the blmer and bglmer functions.When using the main blme functions,the prior-related arguments can be passed what essentially are function calls with the distinction that they are delayed in evaluation until information about the model is available.At that time,the functions are defined in a special environment and then evaluated in an environment that directly inherits form the one in which blmer or bglmer was called.This is reflected in some of the prototypes of various prior-creating functions which depend on parameters not available in the top-level environment.Finally,if the trailing parentheses are omitted from a blmer/bglmer prior argument,they are simply added as a form of“syntactic sugar”.Prior DistributionsThis section lists the prototypes for the functions that are called to parse a prior during a modelfit.Fixed Effect Priors•normal(sd=c(10,2.5),cov,common.scale=TRUE)Applies a Gaussian prior to thefixed effects.Normal priors are constrained to have a mean of 0-non-zero priors are equivalent to shifting covariates.The covariance hyperparameter can be specified either as a vector of standard deviations, using the sd argument,a vector of variances using the cov argument,or the entire vari-ance/covariance matrix itself.When specifying standard deviations,a vector of length less than the number offixed effects will have its tail repeated,while thefirst element is assumed to apply only to the intercept term.So in the default of c(10,2.5),the intercept receives a standard deviation of10and the various slopes are all given a standard deviation of2.5.The common.scale argument specifies whether or not the prior is to be interpretted as being on the same scale as the residuals.To specify a prior in an absolute sense,set to FALSE.Argument is only applicable to linear mixed models.•t(df=3,mean=0,scale=c(10^2,2.5^2),common.scale=TRUE)The degrees of freedom-df argument-must be positive.If mean is of length1,it is repeated for everyfixed effect.Length2repeats just the second element for all slopes.Otherwise,the length must be equal to that of the number offixed effects.If scale is of length1,it is repeated along the diagonal for every component.Length2repeatsjust the second element for all slopes.Length equal to the number offixed effects sees thevector simply turned into a diagonal matrix.Finally,it can be a full scale matrix,so long as itis positive definite.t priors for linear mixed models require that thefixed effects be added to set of parameters thatare numerically optimized,and thus can substantially increase running time.In addition,whencommon.scale is TRUE,the residual variance must be numerically optimized as well.normalpriors on the common scale can be fully profiled and do not suffer from this drawback.At present,t priors cannot be used with the REML=TRUE argument as that implies an integralwithout a closed form solution.•horseshoe(mean=0,global.shrinkage=2.5,common.scale=TRUE)The horseshoe shrinkage prior is implemented similarly to the t prior,in that it requires addingthefixed effects to the parameter set for numeric optimization.global.shrinkage,also referred to asτ,must be positive and is on the scale of a standarddeviation.Local shrinkage parameters are treated as independent across allfixed effects andare integrated out.See Carvalho et al.(2009)in the references.Covariance Priors•gamma(shape=2.5,rate=0,common.scale=TRUE,posterior.scale="sd")Applicable only for univariate grouping factors.A rate of0or a shape of0imposes an im-proper prior.The posterior scale can be"sd"or"var"and determines the scale on which theprior is meant to be applied.•invgamma(shape=0.5,scale=10^2,common.scale=TRUE,posterior.scale="sd")Applicable only for univariate grouping factors.A scale of0or a shape of0imposes animproper prior.Options are as above.•wishart(df=level.dim+2.5,scale=Inf,common.scale=TRUE,posterior.scale="cov")A scale of Inf or a shape of0imposes an improper prior.The behavior for singular matriceswith only some infinite eigenvalues is undefined.Posterior scale can be"cov"or"sqrt",thelatter of which applies to the unique matrix root that is also a valid covariance matrix.•invwishart(df=level.dim-0.5,scale=diag(10^2/(df+level.dim+1),level.dim), common.scale=TRUE,posterior.scale="cov")A scale of0or a shape of0imposes an improper prior.The behavior for singular matriceswith only some zero eigenvalues is undefined.•custom(fn,chol=FALSE,common.scale=TRUE,scale="none")Applies to the given function(fn).If chol is TRUE,fn is passed a right factor of covariancematrix;FALSE results in the matrix being passed directly.scale can be"none","log",or"dev"corresponding to p(Σ),log p(Σ),and−2log p(Σ)respectively.Since the prior is may have an arbitrary form,setting common.scale to FALSE for a linearmixed model means that full profiling may no longer be possible.As such,that parameter isnumerically optimized.Residual Variance Priors•point(value=1.0,posterior.scale="sd")Fixes the parameter to a specific value given as either an"sd"or a"var".•gamma(shape=0,rate=0,posterior.scale="var")As above with different defaults.•invgamma(shape=0,scale=0,posterior.scale="var")As above with different defaults.Evaluating EnvironmentThe variables that the defining environment have populated are:•p aliased to n.fixef-the number offixed effects•n aliased to n.obs-the number of observations•q.k aliased to level.dim-for covariance priors,the dimension of the grouping factor/grouping level•j.k aliased to n.grps-also for covariance priors,the number of groups that comprise aspecific grouping factorMethodstoString Pretty-prints the distribution and its parameters.ReferencesCarvalho,Carlos M.,Nicholas G.Polson,and James G.Scott."Handling Sparsity via the Horse-shoe."AISTATS.V ol.5.2009.See Alsoblmer()and bglmer(),which produce these objects,and bmerMod-class objects which contain them.bmerMod-class Class"bmerMod"of Fitted Mixed-Effect ModelsDescriptionThe bmerMod class represents linear or generalized linear or nonlinear mixed-effects models with possible priors over model components.It inherits from the merMod class.Objects from the ClassObjects are created by calls to blmer or bglmer.SlotsA bmerMod object contains one additional slot beyond the base merMod class:priors:A named list comprised of covPriors,fixefPrior,and residPrior.In addition,the devcomp slot,element cmp includes the penalty item which is the computed de-viance for the priors.Add this to the regular deviance to obtain the value of the objective function that is used in optimization.See Alsoblmer and bglmer,which produce these objects.merMod,from which this class inherits.ExamplesshowClass("bmerMod")methods(class="bmerMod")Index∗GLMMblme,2∗NLMMblme,2∗classesbmerDist-class,5bmerMod-class,7∗methodsblme,2∗modelsblme,2bglmer,5,7,8bglmer(blme),2bglmerMod-class(bmerMod-class),7 blme,2blmer,5,7,8blmer(blme),2blmerMod-class(bmerMod-class),7 bmerDist(bmerDist-class),5bmerDist-class,5bmerMod,3bmerMod(bmerMod-class),7bmerMod-class,7distributions,3glmer,2–4lm,2,4lmer,2–4merMod,4,7,8print,bmerDist-method(bmerDist-class), 5print.bmerMod(bmerMod-class),7 print.summary.bmerMod(bmerMod-class),7 prior,2show,bmerDist-method(bmerDist-class),5show,bmerMod-method(bmerMod-class),7show.bmerMod(bmerMod-class),7summary.bmerMod(bmerMod-class),7summary.summary.bmerMod(bmerMod-class),7vcov.summary.bmerMod(bmerMod-class),7 9。
线性混合效应模型的估计与检验的开题报告
线性混合效应模型的估计与检验的开题报告一、选题背景线性混合效应模型(linear mixed effects model)是一种广泛应用于数据分析的统计模型。
它可以用来处理纵向数据(longitudinal data)或重复测量数据(repeated measures data),在多个观测时间下对相同个体进行测量,同时考虑个体间和个体内的变异性。
该模型还可以用于处理随机效应(random effects),如个体的不同特征或测量设备的变异性,等等。
通常线性混合效应模型的估计与检验需要使用专业软件或编程语言进行实现。
本文计划使用R编程语言进行模型的估计与检验,以说明如何使用R中的lme4和lmerTest包进行线性混合效应模型的估计与检验。
二、研究目的本文旨在介绍线性混合效应模型的基本概念、模型公式和模型参数的估计方法。
同时,本文也将介绍如何使用lme4和lmerTest包进行模型的估计与检验,并给出相应的R代码和解释。
三、研究内容本文将涉及以下内容:1. 线性混合效应模型的基本概念和模型公式2. 模型参数的估计方法3. 模型诊断和检验4. 使用lme4和lmerTest包进行模型的估计与检验5. 给出R代码和解释,以说明如何实现线性混合效应模型的估计与检验四、研究方法本文将采用文献研究的方法,收集和整理相关文献的理论知识和实践经验,重点介绍多个实例的应用过程,并使用R编程语言对其进行实现。
五、预期结果本文实现了线性混合效应模型在R编程语言中的估计与检验,通过多个实例的应用说明了模型的基本概念和估计方法,同时也强调了模型诊断和检验的重要性。
本文力求通过讲解编程细节和代码实现,使读者能够深入理解模型的思想和背后的统计学原理,并能够灵活地使用R进行模型的估计、模型选择和模型验证等操作。
线性混合效应模型
线性混合效应模型线性混合效应模型(Linear Mixed Effects Model,LME)是一种非常有用的统计模型,它允许将个体差异和时间序列效应集成在一起,以便更好地了解数据中发生的不断变化。
LME模型是一个结构复杂的模型,首先要求对建模进行概括,然后就可以使用概括的参数进行建模。
LME模型由两部分组成:随机效应和固定效应。
随机效应允许将个体差异考虑在内,从而可以更好地量化个体之间的差异。
固定效应是将可测量的变量作为解释变量考虑进来的。
例如,在研究学生成绩时,可以将课程、年级、学习时间等变量作为固定效应加以考虑。
LME模型可以用来分析和预测复杂的数据,例如研究人员从多个独立样本中观察到的实验数据。
它可以帮助弄清实验变量之间的相互作用,并发现不同样本之间的差异。
同时,它还可以用来考察分组效应,以了解样本之间的差异可能是由独立的因素导致的,也可能是由某些群体作用导致的,又或者是由两者共同作用导致的。
另外,LME模型还可以用来研究变量之间的关系,特别是用于分析长期追踪和时间序列数据,这些数据可能会随时间而发生变化。
此外,它还可以用于分析多变量之间的关系,以了解哪些因素会影响另一变量,以及这些变量之间的相互作用。
由于LME模型的复杂性,使用它需要专业统计学知识,以便将模型中的参数准确估计出来,从而能够得到有意义的结果。
同时,模型的参数也有可能会出现过拟合以及其他问题,因此,使用者需要仔细检查模型的参数,以避免出现这些问题。
总的来说,LME模型是一种非常有用的统计模型,能够将个体差异和时间序列效应考虑在内,从而有助于更好地解释和预测复杂的数据。
它可以用来分析和预测变量之间的关系,以及考查多变量之间的相互作用。
然而,由于它的复杂性,使用LME模型可能会出现过拟合或其他问题,因此,使用者需要仔细检查模型的参数,以避免出现这些问题。
model selection in linear mixed effects model using proc mixed
∧
∧
Ω=
∧
1
Notice that the term
∫ f ( x ) log( f ( x ))dx is dependent
of Σ Ω given in (2) since the density function of the fitted model is known. There are two different methods presented here for selecting the "best" or the "smallest AIC" model among all the models under consideration. The first method essentially identifies all the possible mean functions and all the possible variance-covariance structures applicable to the question of interest. The number of possible models for consideration includes all combinations resulted from both the mean and variance-covariance structures. For each of these models, AIC is then computed and the model with minimum AIC is selected. The second method recommended by Wolfinger and Diggle (1,2,10) has the following steps : First, using the most complex mean structure under consideration, select the best variance-covariance structure using the restricted maximum likelihood (REML). REML which focuses on the covariance side of the model should be used in place of the likelihood. Let AIC_R denotes AIC derived from REML : AIC_R = -2*(Restricted likelihood) + 2*(# covariance parameters). The variance-covariance structure with the smallest AIC_R is selected. Then using this variance-covariance structure, go back and use AIC to select the best mean structure. In the following example, both methods will be illustrated.
线性混合效应模型入门之一(linear mixed effects model)
适用场景线性混合效应模型入门(linear mixed effects model),缩写LMM,在生物医学或社会学研究中经常会用到。
它主要适用于内部存在层次结构或聚集的数据,大体上有两种情况:(1)内部聚集数据:比如要研究A、B两种教学方法对学生考试成绩的影响,从4所学校选取1000名学生作为研究对象。
由于学校之间的差异,来自其中某一所学校的学生成绩可能整体都好于另一所学校,换句话说就是学生成绩在学校这个维度上存在聚集现象。
(2)重复测量数据:比如要研究A、B两种降压药物对高血压患者血压的影响,在每个患者服药前、服药后1个月、3个月、6个月分别测量血压。
由于同一个患者的每次血压之间存在明显的相关性,不能适用于传统的方差分析方法。
随机效应与固定效应之所以称为“线性混合效应模型”,就是因为这种模型结合了固定效应和随机效应。
固定效应(fixed effect):所谓固定效应,指的是这个因素的每个水平(level)已经“穷举”出来了,不能或者不需要再做“推广”。
比如上面的降压药物研究,虽然降压药物有很多,但是研究者只关心A、B两种药物的效果,所以可以视为固定效应。
固定效应影响的是响应变量或因变量(如血压)的均值。
随机效应(random effect):指的是该因素是从一个更大的总体中抽取出来的样本,我们的研究结果要推广到整个总体。
还是上面的药物研究,参与研究的患者只是一个小样本,所以患者作为随机效应。
随机效应影响的是响应变量(血压)的变异程度即方差。
图a中演示是固定效应因子,每次重复实验,因子都是A1、A2、A3三个水平,三个水平的效应均值是固定的。
图b演示的是随机效应因子,每次重复实验,因子水平都不一样,如第一次是B1、B2、B3,第二次是B4、B5、B6,以此类推。
所以因子的每个水平对均值的影响都是随机的,不固定的。
当然这两种效应有时并不是绝对的,主要还是看研究的目的。
广义线性混合效应模型试验设计若干
广义线性混合效应模型试验设计若干IntroductionThe Generalized Linear Mixed Effects Model (GLMM) is a powerful statistical tool used for analyzing data in various fields including ecology, social sciences, economics, and psychology. GLMM combines both fixed effects and random effects to account for within-group and between-group variability. It is a flexible model that can handle non-normal data types and non-linear relationships. In this article, we will discuss the experimental design considerations for GLMMs.Experimental Design Considerations for GLMMs1. Choosing the Appropriate Response VariableThe first step in designing a GLMM is to choose an appropriate response variable. The response variable should be continuous and non-negative. When using count data, the response variable should follow a Poisson distribution. If the data is binary or categorical, then logistic regression or multinomial regression should be used.2. Determining Fixed and Random EffectsFixed effects are variables that are assumed to have a systematic effect on the response variable. Examples of fixed effects include treatment group, time, and sex. Random effects are variables that contribute to within-group variation. Examples of random effects include subject ID, site, and block. It is important to carefully consider which variables to include as fixed or random effects.3. Sample Size ConsiderationsSample size considerations are important in any experimental design. In a GLMM, it is important to have enough observations and groups to capture the random effects. A rule of thumb is to have at least 5-10 groups per random effect included in the model. Additionally, a larger sample size will improve the power of the model.4. Choosing the Appropriate Link Function and DistributionThe link function connects the response variable to the linear predictor in the GLMM. Common link functions include the identity, logit, and log-link. The choice of the link function depends on the distribution of the response variable. The distribution of the response variable should also be carefully considered. Common distributions used in GLMMs include the Gaussian, Poisson, and binomial distributions.5. Controlling for CovariatesCovariates are variables that may influence the response variable but are not of interest in the study. It is important to control for covariates in the GLMM to prevent confounding. Examples of covariates include age, weight, and baseline measurements.6. Model Selection and ValidationSelecting an appropriate model is key to obtaining meaningful results. It is important to consider both statistical and biological significance of the model. Model selection can be done through stepwise selection or model comparison techniques such as AIC or BIC. The final model should be validated through checking model assumptions, assessing the fit of the model, and examining any influential observations.ConclusionGLMM is a powerful statistical tool used in various fields to analyze data. In this article, we have discussed the experimental design considerations for GLMMs. Careful consideration of the response variable, fixed and random effects, sample size, link function and distribution, controlling for covariates, and model selection and validation are key to obtaining valid results.。
基于线性混合效应模型的脑结构影像的研究
基于线性混合效应模型的脑结构影像的研究
脑结构影像研究是研究脑结构和功能的关系的重要手段之一。
目前,随着神经影像学
技术的不断发展,研究者们可以利用磁共振成像(MRI)等技术获得高质量的脑结构数据。
基于这些数据分析脑结构与疾病、发育等方面的关系,对理解脑结构与功能的内在机制有
着重要的作用。
线性混合效应模型(linear mixed effects model, LMEM)是一种常用的统计模型,
用于分析在多个层面上的数据的变化。
它被广泛应用于生物学、医学等领域的研究中。
在
脑结构影像研究中,LMEM能够考虑到不同个体之间的差异,同时允许考虑不同因素的影响,并通过模型参数的显著性检验来分析它们在脑结构中的作用。
因此,LMEM能够帮助研究者在脑结构影像数据的分析中更好地理解数据。
在脑结构影像的研究中,LMEM常用于分析不同人群之间存在的差异。
例如,在探究老年人与年轻人脑结构差异的研究中,研究者可以利用LMEM考虑年龄、性别、教育等因素的影响,并找到这些因素在脑结构变化中的作用。
此外,LMEM还可以用于观察个体之间的差异。
比如,在探究特定神经系统疾病的患者脑结构变化时,研究者可以利用LMEM考虑患者之间的差异,并通过参数估计来了解患者个体的特点。
总之,LMEM是一种灵活、有效的统计模型,在脑结构影像研究中的应用越来越广泛。
它能够考虑到不同个体和不同因素之间的差异,并能够帮助研究者更准确地分析数据。
未来,随着神经影像学技术的进一步发展,基于LMEM的脑结构影像研究将会更加精细和深入,为我们更好地理解脑结构与功能之间的关系提供更多的支持。
线性混合效应模型的运用和解读
线性混合效应模型的运用和解读线性混合效应模型(Linear Mixed Effects Model,简称LME)是一种统计模型,用于分析具有重复测量或者多层次结构的数据。
它在社会科学、医学研究、生态学等领域得到广泛应用,能够更准确地估计固定效应和随机效应之间的关系,从而提高数据分析的准确性和可靠性。
LME模型的核心思想是将数据分解为固定效应和随机效应两部分。
固定效应是指影响整个样本的因素,例如性别、年龄等,而随机效应则是指影响个体差异的因素,例如个体间的随机误差或者组别间的随机变异。
通过同时考虑固定效应和随机效应,LME模型能够更好地描述数据的变异情况,提高参数估计的准确性。
LME模型的数学表达形式如下:Y = Xβ + Zγ + ε其中,Y是因变量,X和Z是设计矩阵,β和γ分别是固定效应和随机效应的系数,ε是随机误差项。
通过最大似然估计或者贝叶斯方法,可以估计出模型的参数,进而进行数据的分析和解读。
LME模型的应用范围非常广泛。
在社会科学领域,比如教育研究中,研究者常常需要考虑学校和学生之间的差异,LME模型可以很好地处理这种多层次结构的数据。
在医学研究中,LME模型可以用于分析多个医院或者诊所的数据,考虑到不同医院或者诊所之间的差异。
在生态学研究中,LME模型可以用于分析观测数据和实验数据,考虑到不同观测点或者实验处理之间的差异。
LME模型的解读需要注意几个方面。
首先,需要关注固定效应和随机效应的估计结果。
固定效应的估计结果可以告诉我们在整个样本中哪些因素对因变量有显著影响,而随机效应的估计结果可以告诉我们个体差异或者组别间的差异对因变量的解释程度。
其次,需要关注模型的拟合优度,例如R方值或者AIC/BIC等指标。
拟合优度可以反映模型对数据的解释能力,值越高表示模型拟合得越好。
最后,需要进行参数估计的显著性检验,判断模型中的固定效应和随机效应是否显著。
除了上述基本的应用和解读,LME模型还可以进行进一步的扩展和改进。
r语言icc组内相关系数
r语言icc组内相关系数在R语言中,`icc()`函数来自于lme4包,可以用来计算组内相关系数(intraclass correlation coefficient, ICC)。
具体用法如下:1. 首先,确保已经安装了lme4包。
如果没有安装,可以使用以下命令安装:rinstall.packages("lme4")2. 加载lme4包:rlibrary(lme4)3. 准备数据,假设数据为一个数据框,包含一个组别变量和一个连续型的观测变量:rdata <- data.frame(group = factor(c(1, 1, 2, 2, 3, 3)), value = c(10, 12, 15, 17, 8, 9))4. 使用`icc()`函数计算组内相关系数。
需要指定一个线性混合模型(linear mixed-effects model, LMM)来估计组内方差,并且使用`multiple = TRUE`参数计算多个ICC的平均值。
ricc_result <- icc(value ~ (1 group), data = data, multiple = TRUE)其中,`(1 group)`表示以group作为随机效应的模型,`value`为观测变量。
5. 查看计算结果:ricc_result函数将返回一个数据框,包含了不同类型的ICC的值。
注意:要使用lme4包中的icc函数,数据应该是一个线性层次结构(linear hierarchical structure)的数据,即每个组别有多个观测值。
如果数据不满足这个条件,ICC的计算可能不适用。
线性混合效应模型
线性混合效应模型
线性混合效应模型(Linear Mixed Effects Model, LME)是一类统计模型,用于描述一个随机变量如何受多个不同因素影响的情况。
它是一种统计分析方法,用于处理复杂的数据结构,如多个组的数据或多维数据。
线性混合效应模型分为两类:固定效应模型和随机效应模型。
固定效应模型是一种线性回归模型,旨在描述一个变量(正因变量)如何受多个解释变量(自变量)影响的情况。
它假设每一组观测数据都服从相同的线性关系,并且假设解释变量和正因变量之间存在一个固定的关系。
随机效应模型是一种更加灵活多变的模型,旨在描述一个变量(正因变量)如何受多个解释变量(自变量)影响的情况,同时也考虑了不同组之间的差异。
它假设每一组观测数据的线性关系存在一定的变化,并且假设解释变量和正因变量之间存在一个可变的关系。
线性混合效应模型可以用来比较不同组的数据,从而获得更准确的结果。
例如,可以用它来研究不同年龄段的人群对某个产品的反应,或者可以用它来研究不同地区的人们对某个事件的反应。
LME模型可以帮助研究人员比较不同组之间的数据,发现数据之间的差异,从而更加准确地了解数据的意义。
线性混合效应模型可以用来分析多维数据,用于研究复杂的结构。
它可以帮助研究人员更好地理解数据,从而更准确地推断结果。
使用LME模型,可以更加精确地了解不同组之间的数据,从而发现数据之间的差异,从而更准确地分析数据。
lme函数 标准差
LME(Linear Mixed Effects)函数的标准差可以通过模型的残差标准差来估计。
在R语言中,可以使用lme()函数来拟合线性混合效应模型,并使用summary()函数来获取模型的摘要信息,其中包括残差标准差的估计值。
例如,假设已经拟合了一个线性混合效应模型null.model <- lme(fixed = fev1 ~ 1, data = Data, random =
~ 1 | conwrd),可以使用以下代码来获取残差标准差的估计值:
r复制代码
summary(null.model)
在模型的摘要信息中,可以找到名为“Residual StdDev”的列,该列给出了残差标准差的估计值。
请注意,这里的“标准差”指的是残差的标准差,而不是固定效应或随机效应的标准差。
固定效应的标准差可以通过模型参数的标准误来估计,而随机效应的标准差可以通过模型的方差分量来估计。
基于线性混合效应模型的黄 体长体重关系的时空差异
中国水产科学 2018年11月, 25(6): 1299-1307 Journal of Fishery Sciences of China研究论文收稿日期: 2018-05-15; 修订日期: 2018-07-22.基金项目:中央高校基本科研业务费专项(201562030, 201612004).作者简介: 衷思剑(1994-), 男, 硕士研究生, 从事渔业资源生态学研究. E-mail: zsj4319@ 通信作者: 任一平, 教授, 从事渔业资源生态学研究. E-mail: renyip@ DOI: 10.3724/SP.J.1118.2018.18165基于线性混合效应模型的黄体长体重关系的时空差异衷思剑1, 麻秋云1, 刘淑德2, 王四杰2, 任一平1, 31. 中国海洋大学水产学院, 山东青岛 266003;2. 山东省水生生物资源养护管理中心, 山东 烟台 264003;3. 青岛海洋科学与技术国家实验室海洋渔业科学与食物产出过程功能实验室, 山东 青岛 266237 摘要: 为了研究黄(Lophius litulon )生活史特征的异质性, 根据秋(2016年10月)、冬(2017年1月)、春(2017年5月)和夏(2017年8月)4个季节在山东近海的底拖网调查数据,对该物种体长-体重关系的时空差异进行了研究。
本文构建了广义线性模型和9个线性混合效应模型, 用来研究黄的体长-体重关系(W=aL b )及其时空差异。
b的固定值(2.77)小于3, 表示黄为负异速生长, 肥满度与体长负相关, 身体趋于细长。
根据AIC值最小原则, 最复杂的线性混合效应模型(即水域和季节对两个参数a 和b 均存在随机效应)拟合效果最佳; 交叉验证的结果同样表明, 该模型的预测效果最为可靠。
根据最佳模型和广义线性模型的差异性分析结果, 黄体长-体重关系的时空差异是极显著的(P <0.01)。
在最佳模型中, a 值在春季最大, 其次是秋季和冬季, 而夏季最小; b 值则与此相反。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
咨询QQ:3025393450
有问题百度搜索“”就可以了
欢迎登陆官网:/datablog
线性混合效应模型Linear Mixed-Effects Models的部分折叠Gibbs采样数据分析报告
来源:大数据部落|有问题百度搜索“”就可以了
本文介绍了线性混合效应模型的新型贝叶斯分析。
该分析基于部分折叠的方法,该方法允许某些组件从模型中部分折叠。
得到的部分折叠的Gibbs(PCG)采样器被构造成适合线性混合效应模型,预计会比相应的Gibbs采样器表现出更好的收敛特性。
为了构建PCG采样器而不使组件更新复杂化,我们考虑通过在线性混合效应模型中根据组内方差表示组间方差来重新参数化模型组件。
简介
已经开发出混合效应模型来处理相关响应数据并考虑多种变化来源。
为了解释响应变量的依赖结构,混合效应模型不仅包含固定效应,还包含将某些协变量视为随机变量的随机效应。
混合效应模型在一段时间内对受试者进行重复测量的环境中特别方便。
与传统的纵向数据方法相比,混合效应模型也可以处理缺失值。
方法具有适当先验分布的混合效应模型考虑一般的混合效应模型
(1)
咨询QQ:3025393450
有问题百度搜索“”就可以了
欢迎登陆官网:/datablog
(2)
其中b=(b1,b2,...,b k)是随机效应的q×k矩阵,Y= {Y i}ki= 1是观测数据的集合,代表逆Wishart分布,和
默顿的跳跃扩散模型
考虑默顿的跳跃扩散模型其目的是模型跳跃由于罕见的经济事件或新闻突然资产价格。
该模型由。
给出
(3)
其中St代表时间t的资产价格,γ是资产的瞬时预期收益,σ是资产收益的瞬时标准差,Wt是维纳过程,对数跳跃大小Jt是均值μ高斯随机变量Ĵ和方差σ2Ĵ,和ñ吨是一个泊松过程与到达速率λ。
在没有跳跃过程的情况下,(3)中的模型被称为几何布朗运动过程,并且{St}Tt= 1的连续对数比率与平均γ和方差σ
独立高斯随机变量2。
然而,当在时间t发生跳跃时,该过程不再是连续的; S t -明确表示跳转之间的不连续性。
此外,我们考虑基于日常的资产价格,因此假设在每个时间间隔内最多发生一次跳跃,即
咨询QQ:3025393450
有问题百度搜索“”就可以了
欢迎登陆官网:/datablog
模拟
为了说明PCG采样器相对于Gibbs采样器的改进收敛特性,对具有适当先验分布的混合效应模型和Merton的跳跃扩散模型进行了仿真研究。
首先,在(7)中具有适当先验分布的混合效应模型中,我们假设存在k = 100个组,并且每个组
具有相同的大小n i = 2.模型参数的真实值被设置为,σ 2 = 1,并且
图1由具有适当先验分布的混合效应模型构建的Gibbs采样器模拟的每个模型参数的混合图,自相关图和边际后验概率分布函数。
咨询QQ:3025393450
有问题百度搜索“”就可以了
欢迎登陆官网:/datablog
图2由具有适当先验分布的混合效应模型构建的PCG采样器模拟的每个模型参数的混合图,自相关图和边际后验概率分布函数。
咨询QQ:3025393450
有问题百度搜索“”就可以了
欢迎登陆官网:/datablog
图3由具有适当先验分布的Merton跳跃扩散模型构建的Gibbs采样器模拟的每个模型参数的混合图,自相关图和边际后验概率分布函数。