Multilevel Logistic Models and Their Application to Epidemiological Survey Data

Guangdong Pharmaceutical University Master's Thesis
Multilevel Logistic Models and Their Application to Epidemiological Survey Data
In this study, we focus on the rationale for using the multilevel logistic model in public health research and epidemiology, summarize the statistical methodology, and highlight some of the research questions that have been addressed using these methods. The advantages and disadvantages of the multilevel logistic model compared with standard methods are reviewed. The use of the multilevel logistic model raises theoretical and methodological issues related to the theoretical model being tested, the conceptual distinction between group-level and individual-level variables, the ability to differentiate "independent" effects, the reciprocal relationships between factors at different levels, and the increased complexity that these models imply. The potential and limitations of the multilevel logistic model are also considered.
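Guangdong Pharmaceutical University
Master's Thesis
Multilevel Logistic Models and Their Application to Epidemiological Survey Data
Name: Hua-ping Luo (骆华萍)
Degree applied for: Master's
Major: Epidemiology and Health Statistics
Supervisor: Pide Zhang (张丕德)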
20100501
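Abstract (Chinese version):
Objective: To examine the basic theory of the multilevel logistic model and its application to epidemiological survey data, with a focus on the methodological issues that arise when the model is applied in practice, and to provide a reference for the effective analysis of hierarchically structured data in future work.
Results: The worked examples show that, when applied to hierarchically structured data, the multilevel model can address the hierarchy and clustering in the data, and that explanatory variables can be treated as fixed or random effects according to subject-matter knowledge and the actual situation, which yields richer and more comprehensive results. When the multilevel model was compared with a logistic regression model fitted after variable selection, the former had smaller standard errors, showed stronger statistical significance, and gave a more reasonable interpretation of the results. After multiple imputation of the missing values, the results were more reliable than those obtained from the original data.
Conclusions: For binary outcomes in hierarchically structured data, multilevel logistic model theory provides richer information that is closer to the real situation than traditional models do. The ordinary logistic regression model is simple and easy to apply, but it can examine only individual-level information and fixed effects; it cannot analyze group-level information or identify which factors influence the variability of the outcome variable. Moreover, when the data are clustered it produces biased results and cannot give a reasonable interpretation of them. The multilevel logistic model takes full account of the correlation among observations, can incorporate hierarchical information, and can simultaneously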
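Methods: Hierarchically structured data are common in epidemiology. Such data are characterized by large differences between groups, while members of the same group tend to resemble one another; that is, the data show a degree of clustering. In this situation, the assumptions of independence and homogeneity of variance made by traditional models may not hold. The multilevel logistic model takes the hierarchy and clustering of the data into account. Its basic idea is to decompose the total residual across the corresponding levels: variation among higher-level units represents between-group variation, and variation among lower-level units represents differences among individuals. The residual at each level is expressed as a function of certain variables, so that its influencing factors and trends can be analyzed. The multilevel logistic model differs from the ordinary logistic regression model in that it handles data with within-group clustering well, measures individual-level and group-level variation simultaneously, accommodates fixed and random effects together, and can examine the effect of contextual variables on group units, none of which the ordinary model can analyze or resolve.
Ignoring the group effects in hierarchically structured data sacrifices the completeness of the information in the data, invalidates the statistical results, and may lead to wrong conclusions. Therefore, for epidemiological survey data with a hierarchical structure, the multilevel logistic model is a good choice. As the theory of the multilevel logistic model is refined and matures, the model will offer greater advantages and broader prospects for application in epidemiology.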
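The idea of splitting variation into a group-level part and an individual-level part can be illustrated with a small simulation. The sketch below is not the thesis's own analysis (which used SAS and MLwiN). It assumes the simplest two-level form, a random-intercept model logit(p_ij) = beta0 + beta1 * x_ij + u_j with u_j drawn from a normal distribution, and every name and parameter value in it is hypothetical.

```python
import math
import random


def simulate_two_level(n_groups, n_per_group, beta0, beta1, sigma_u, seed=0):
    """Simulate binary outcomes from a random-intercept logistic model.

    Assumed model: logit(p_ij) = beta0 + beta1 * x_ij + u_j,
    with u_j ~ Normal(0, sigma_u^2) shared by everyone in group j.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, sigma_u)  # level-2 (group) residual
        for _ in range(n_per_group):
            x_ij = rng.gauss(0.0, 1.0)  # individual-level covariate
            eta = beta0 + beta1 * x_ij + u_j  # linear predictor
            p_ij = 1.0 / (1.0 + math.exp(-eta))
            y_ij = 1 if rng.random() < p_ij else 0  # level-1 (Bernoulli) draw
            rows.append((j, x_ij, y_ij))
    return rows


def group_rates(rows):
    """Observed proportion of y = 1 within each group."""
    totals = {}
    for j, _, y in rows:
        n, s = totals.get(j, (0, 0))
        totals[j] = (n + 1, s + y)
    return {j: s / n for j, (n, s) in totals.items()}


def sample_variance(values):
    """Unbiased sample variance of a collection of numbers."""
    values = list(values)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Hypothetical settings: 50 groups of 200 people each.
clustered = simulate_two_level(50, 200, beta0=-1.0, beta1=0.5, sigma_u=1.0)
independent = simulate_two_level(50, 200, beta0=-1.0, beta1=0.5, sigma_u=0.0)

var_clustered = sample_variance(group_rates(clustered).values())
var_independent = sample_variance(group_rates(independent).values())
print("between-group variance of rates, sigma_u = 1:", var_clustered)
print("between-group variance of rates, sigma_u = 0:", var_independent)
```

With a nonzero group residual, the group-level prevalence varies far more than binomial sampling alone would produce, which is the clustering that an ordinary logistic regression leaves unmodeled.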
Master's Candidate: Hua-ping Luo
Major: Epidemiology and Health Statistics
Supervisor: Professor Pide Zhang
Abstract
In public health and epidemiology, large-scale surveys often produce hierarchically structured data because they are based on multistage stratified cluster sampling. Examples of hierarchical data structures include persons nested within families and pupils nested within schools. A defining feature of hierarchical data sets is that observations are correlated: lower-level units belonging to the same higher-level unit tend to be more alike than lower-level units from different higher-level units. Under these circumstances, a standard model such as the logistic regression model may not be suitable. Standard approaches have the drawback of ignoring the potential importance of group-level attributes in influencing individual-level outcomes. In addition, if outcomes for individuals within groups are correlated, the assumption of independence of observations is violated, resulting in incorrect standard errors and inefficient estimates. The appropriate approach to analyzing such survey data is therefore one based on nested sources of variability that come from different levels of the hierarchy.
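One common way to see why within-group correlation distorts standard errors is the design effect for equal-sized clusters, 1 + (m - 1) * rho, where m is the cluster size and rho the intraclass correlation. The sketch below is illustrative only: this formula is a standard survey-sampling approximation for a mean or proportion and is not taken from the thesis, and the function names and numbers are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


def se_inflation(cluster_size, icc):
    """Factor by which a standard error that ignores clustering is too small."""
    return math.sqrt(design_effect(cluster_size, icc))


# Hypothetical survey: 2100 respondents in clusters of 21, rho = 0.05.
print(design_effect(21, 0.05))
print(effective_sample_size(2100, 21, 0.05))
print(se_inflation(21, 0.05))
```

Even a small intraclass correlation can matter when clusters are large, because the design effect grows with cluster size.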
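The main content of this study covers the basic principles and methods of multilevel logistic model theory, the fitting of worked examples (using data from the Shenzhen community residents' health status survey and the Guangzhou residents' smoking survey), the steps of the analysis, a comparison of methods, and the interpretation of results.
Data preprocessing was carried out in SAS 9.2, the multilevel models were fitted with MLwiN and SAS 9.2, and missing values were handled with the MI procedure in SAS 9.2.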
The multilevel logistic model differs from standard approaches in four ways. First, it allows the simultaneous examination of the effects of group-level and individual-level predictors. Second, the nonindependence of observations within groups is accounted for. Third, groups or contexts are not treated as unrelated but are seen as coming from a larger population of groups. Fourth, both interindividual and intergroup variation can be examined (as well as the contributions of individual-level and group-level variables to these variations). Thus, multilevel analysis allows researchers to deal with the micro level of individuals and the macro level of groups or contexts simultaneously.
Keywords: multilevel logistic model; hierarchically structured data; intraclass correlation; random effects; fixed effects; missing data
Multilevel Logistic Models and Their Application to Epidemiological Survey Data
It is found that, when multilevel effects are not taken into account in the modeling, the standard logistic model considerably overestimates or underestimates effects compared with the multilevel logistic model. Therefore, for epidemiological survey data with a hierarchical structure, the multilevel logistic model is a good choice. As the theory of the multilevel logistic model is refined and matures, the model will offer greater advantages and more potential applications in the field of epidemiology.
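examine individual-level variation and between-group variation, analyze both fixed and random effects, and provide accurate estimates and hypothesis tests for the factors under study. However, the multilevel model also has limitations: for example, it requires the residuals at the lower and higher levels to follow a normal or multivariate normal distribution, and parameter estimation is relatively complex. In addition, hierarchically structured data do not necessarily call for multilevel analysis. One should first look at the size of the intraclass correlation, that is, whether within-group clustering is present; if the data show no clustering, an ordinary statistical model is sufficient. In practice, the statistical method should be chosen in light of subject-matter knowledge and the characteristics of the data.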
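For a random-intercept logistic model, one widely used way to quantify the intraclass correlation is the latent-variable definition, in which the level-1 residual variance is fixed at pi^2 / 3 (the variance of the standard logistic distribution). The sketch below assumes that definition; the thesis may have computed the intraclass correlation differently, and the function name and the example variance are hypothetical.

```python
import math

# Variance of the standard logistic distribution, used as the
# level-1 variance under the latent-variable formulation.
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def latent_icc(sigma_u2):
    """Latent-scale ICC for a two-level random-intercept logistic model.

    sigma_u2 is the estimated level-2 (between-group) intercept variance.
    """
    if sigma_u2 < 0:
        raise ValueError("variance must be non-negative")
    return sigma_u2 / (sigma_u2 + LOGISTIC_VARIANCE)


# Hypothetical between-group variance taken from a fitted model.
print(latent_icc(0.5))
```

A value near zero suggests little within-group clustering, in which case an ordinary logistic regression may be adequate; how large a value justifies a multilevel model remains a judgment call informed by the design and subject matter.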
We use multistage stratified cluster data from the Shenzhen Residents Health Survey and the Guangzhou residents smoking survey. These analyses are designed to assist in all aspects of working with multilevel logistic regression models, including model conceptualization, model description, understanding the structure of the required multilevel data, estimation of the model with the statistical packages SAS and MLwiN, and interpretation of the results.