Clustering in Generalized Linear Mixed Model using Dirichlet
• Dynamic latent variable model (Dunson 2019) Let i index patient and t index follow-up time,
t1
itxitTv ( T jkxT jkv) jkit k0
GLMM – Advanced Applications
Clustering in Generalized Linear Mixed Model Using Dirichlet Process Mixtures
Ya Xue, Xuejun Liao
April 1, 2019
Introduction
Concept drift fits within the framework of the generalized linear mixed model, but it raises a new question: how to exploit the structure of auxiliary data.
a link function.
Generalized Linear Model (GLM)
DDE Example: binomial distribution
Scientific interest: does DDE exposure increase
the risk of cancer? Test on rats. Let i index rat.
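Dependent variable:
$$y_i \sim \mathrm{Bi}(n = 1, p_i), \qquad p_i: \text{risk of cancer for rat } i,$$
$$y_i = \begin{cases} 1, & \text{rat } i \text{ is diagnosed with cancer} \\ 0, & \text{no cancer.} \end{cases}$$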
Independent variable: dose of DDE exposure,
denoted by xi.
Generalized Linear Model (GLM)
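Likelihood function of $y_i$:
$$f(y_i \mid p_i) = p_i^{y_i}(1-p_i)^{1-y_i} = \exp\left\{ y_i \ln\frac{p_i}{1-p_i} - \ln\frac{1}{1-p_i} \right\} = \frac{\exp\{y_i\theta_i\}}{1+\exp\{\theta_i\}}, \quad \text{where } \theta_i = \ln\frac{p_i}{1-p_i}.$$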
Linear model: $y_i = x_i'\beta + \varepsilon_i$, where $i$ indexes the subject.
GLM is a generalization of normal linear regression models to the exponential family (normal, Poisson, Gamma, binomial, etc.).
Generalized Linear Model (GLM)
[Figure: mixture prior, with the point mass at $x = 0$ and the continuous component $g(x)$; horizontal axis from -5 to 5, vertical axis from 0 to 0.5.]
Bayesian Feature Selection in GLMM
Fixed effects: choose mixture priors for the fixed effects coefficients.
• Spatially varying coefficient processes (Gelfand 2019): random effects are modeled as a spatially correlated process.
[Figure: plot with both axes ranging from -5 to 25.]
Possible application: A landmine field where landmines tend to be close together.
If we choose zij = 1, then only the intercept
varies for the different labs (random intercept model).
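The random intercept model can be illustrated with a small simulation. This is only a sketch under stated assumptions: a logit link, Gaussian lab-level intercepts, and hypothetical values for the fixed effects, the random-effect standard deviation, and the number of rats per lab. Only the count of 19 labs comes from the DDE example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_labs, rats_per_lab = 19, 30      # 19 labs as in the DDE example; 30 rats per lab is assumed
beta = np.array([-1.0, 0.8])       # hypothetical fixed effects: intercept and dose slope
sigma_b = 0.5                      # hypothetical sd of the lab-level random intercept

# b_i: one random deviation per lab (Gaussian is an assumption of this sketch)
b = rng.normal(0.0, sigma_b, size=n_labs)

# Hypothetical DDE doses for each rat j in lab i
dose = rng.uniform(0.0, 2.0, size=(n_labs, rats_per_lab))

# Linear predictor with x_ij = (1, dose_ij) and z_ij = 1:
# only the intercept varies between labs
eta = beta[0] + beta[1] * dose + b[:, None]

# Inverse logit gives the cancer risk; y_ij = 1 means a cancer diagnosis
p = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, p)

print(y.mean(axis=1))              # observed cancer rate per lab
```

Labs with a larger simulated intercept deviation tend to show a higher cancer rate at the same dose, which is the between-lab variation the random effect is meant to absorb.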
GLMM - Implementation
Gibbs sampling
• Disadvantage: slow convergence.
• Solution: hierarchical centering reparametrisation (Gelfand 1994; Gelfand 2019).
Generalized Linear Model (GLM)
A linear model specifies the relationship between a dependent (or response) variable
Y, and a set of predictor variables, Xs, so that
$b_i$ are “random” effects - deviations for lab $i$.
GLMM – Basic Model
$$\eta_{ij} = x_{ij}'\beta + z_{ij}'b_i$$
If we choose $x_{ij} = z_{ij}$, then all the regression coefficients are assumed to vary for the different labs.
Nested GLMM: within each lab, rats were group housed with three rats per cage.
$$\eta_{ijk} = x_{ijk}'\beta + z_{ijk}'b_i + v_{ijk}'\gamma_{ij},$$
where $i$ indexes lab, $j$ indexes cage and $k$ indexes rat.
Crossed GLMM: for all labs, four dose protocols were applied on different rats.
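Choosing the canonical link $\theta_i = \ln\frac{p_i}{1-p_i} = x_i'\beta$, the likelihood function becomes
$$f(y_i \mid x_i, \beta) = \frac{\exp\{y_i\, x_i'\beta\}}{1+\exp\{x_i'\beta\}}.$$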
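A minimal numerical check of the canonical-link likelihood. It evaluates $\exp\{y_i x_i'\beta\}/(1+\exp\{x_i'\beta\})$ and compares it with $p_i^{y_i}(1-p_i)^{1-y_i}$. The covariate vector and coefficients are hypothetical values chosen for illustration, not data from the bioassay.

```python
import numpy as np

def bernoulli_logit_likelihood(y, x, beta):
    """Likelihood exp(y * x'beta) / (1 + exp(x'beta)) for a binary y in {0, 1}."""
    eta = float(np.dot(x, beta))               # linear predictor x'beta
    return np.exp(y * eta) / (1.0 + np.exp(eta))

# Hypothetical values: an intercept term plus one DDE dose covariate
x = np.array([1.0, 0.5])
beta = np.array([-1.0, 2.0])

# Inverse of the canonical (logit) link recovers the risk p
p = 1.0 / (1.0 + np.exp(-np.dot(x, beta)))

for y in (0, 1):
    direct = p**y * (1.0 - p)**(1 - y)         # p^y (1 - p)^(1 - y)
    assert np.isclose(bernoulli_logit_likelihood(y, x, beta), direct)
```

The two expressions agree for both outcomes, which is the algebraic identity used when moving from the $p_i$ parameterisation to the regression coefficients.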
GLMM – Basic Model
Returning to the DDE example, 19 labs all over the world participated in this bioassay.
• Generalized linear model (GLM)
• Generalized linear mixed model (GLMM)
• Advanced applications
• Bayesian feature selection in GLMM
Part II: nonparametric method
Deterministic methods are only available for logit and probit models.
• EM algorithm (Anderson 1985)
• Simplex method (Im 1988)
GLMM – Advanced Applications
Random effects: reparameterization
• LDU decomposition of the random effect covariance.
• Choose a mixture prior for the elements of the diagonal matrix.
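The reparameterization step can be sketched numerically. For a symmetric positive-definite covariance the LDU form reduces to $\Sigma = L D L'$, with $L$ unit lower triangular and $D$ diagonal. The sketch below obtains it from a Cholesky factor; the example covariance is hypothetical, and the function name is introduced here for illustration only.

```python
import numpy as np

def ldl_from_cholesky(sigma):
    """Return (L, D) with sigma = L @ D @ L.T, L unit lower triangular, D diagonal."""
    c = np.linalg.cholesky(sigma)   # sigma = c @ c.T, with c lower triangular
    d = np.diag(c)                  # positive diagonal of the Cholesky factor
    L = c / d                       # divide column j by d[j] so that diag(L) = 1
    D = np.diag(d**2)               # diagonal matrix holding the scale of each component
    return L, D

# Hypothetical 2 x 2 random-effect covariance
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L, D = ldl_from_cholesky(sigma)
print(np.diag(D))
```

In this parameterisation the diagonal entries of `D` are the quantities on which a mixture prior would be placed.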
Mixtures with a countably infinite number of components can be handled in a Bayesian framework by employing Dirichlet process priors.
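One way to see how a Dirichlet process prior avoids fixing the number of components is the Chinese restaurant process listed in the outline. The sketch below is illustrative only: the concentration parameter `alpha` and the sample size are hypothetical choices, and the function name is not taken from the slides.

```python
import numpy as np

def crp_assignments(n, alpha, rng):
    """Sample cluster labels for n items from a Chinese restaurant process."""
    counts = []                     # number of items currently in each cluster
    labels = []
    for i in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)        # open a new cluster
        else:
            counts[k] += 1          # join an existing cluster
        labels.append(k)
    return np.array(labels), counts

rng = np.random.default_rng(2)
labels, counts = crp_assignments(n=50, alpha=1.0, rng=rng)
print(len(counts), counts)
```

The number of clusters is an outcome of the sampling rather than an input, and it tends to grow slowly as more items are added.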
Outline
Part I: generalized linear mixed model
• Chinese restaurant process
• Dirichlet process (DP)
• Dirichlet process mixture models
• Variational inference for Dirichlet process mixtures
Part I Generalized Linear Mixed Model
• Rats are sorted into 19 groups by lab.
• Rats are sorted into 4 groups by protocol.
GLMM – Advanced Applications
Temporal/spatial statistics: Account for correlation between the random effects at different times/locations.
Bayesian Feature Selection in GLMM
Simultaneous selection of fixed and random effects in GLMM (Cai and Dunson 2019)
Mixture prior: $p(x) = \pi\,\delta(x = 0) + (1-\pi)\,g(x)$
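A minimal sketch of drawing coefficients from a mixture prior of this kind: a point mass at zero combined with a continuous density $g(x)$. The weight on the point mass (called `pi` here) and the choice of a standard normal for $g$ are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def sample_mixture_prior(pi, size, rng, slab_sd=1.0):
    """Draw from pi * (point mass at 0) + (1 - pi) * g, with g = N(0, slab_sd^2) assumed."""
    is_zero = rng.random(size) < pi               # with probability pi the draw is exactly 0
    slab = rng.normal(0.0, slab_sd, size=size)    # draws from the assumed continuous part g
    return np.where(is_zero, 0.0, slab)

rng = np.random.default_rng(1)
draws = sample_mixture_prior(pi=0.4, size=10_000, rng=rng)

frac_zero = np.mean(draws == 0.0)                 # share of coefficients excluded from the model
print(frac_zero)
```

A coefficient drawn as exactly zero corresponds to dropping that feature, which is how the prior performs selection.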
Nested GLMM: within each lab, rats were group housed with three rats per cage. Two-level GLMM: level I – lab, level II – cage.
Crossed GLMM: for all labs, four dose protocols were applied on different rats.
There are unmeasured factors that vary between the different labs, for example rodent diet.
GLMM is an extension of the generalized linear model, obtained by adding random effects to the linear predictor (Schall 1991).
GLM differs from the linear model in two major respects:
• The distribution of Y can be non-normal, and does not have to be continuous.
• Y can still be predicted from a linear combination of the Xs, but they are "connected" via a link function.
Missing Identification in GLMM
Data table of DDE bioassay
……
Berlin  1  0.01  0.00  34.10  40.90  37.50
Berlin  1  0.01  0.00  35.70  35.60  32.10
Tokyo   0  0.01  0.00  56.50  28.90  27.10
Tokyo   1  0.01  0.00  51.50  29.90  25.90
……
What if the first column is missing? This is an unusual case in statistics, so few people work on it. But this is the problem we have to solve for
GLMM – Basic Model
The previous linear predictor is modified as:
$$\eta_{ij} = x_{ij}'\beta + z_{ij}'b_i,$$
where $i = 1, \dots, n$ indexes lab and $j = 1, \dots, n_i$ indexes rat within lab $i$. $\beta$ are “fixed” effects - parameters common to all rats.
Crossed GLMM, with the four dose protocols applied on different rats:
$$\eta_{ij} = x_{ij}'\beta + z_{ij}'b_i + v_{ij}'\gamma_k,$$
where $i$ indexes lab, $j$ indexes rat and $k$ indicates the protocol applied on rat $(i, j)$.
GLMM – Advanced Applications