PSYCHOMETRIKA—VOL. 77, NO. 2, 293–314
APRIL 2012
DOI: 10.1007/S11336-012-9252-X

A HETEROGENEOUS BAYESIAN REGRESSION MODEL FOR CROSS-SECTIONAL DATA INVOLVING A SINGLE OBSERVATION PER RESPONSE UNIT
DUNCAN K.H. FONG
THE PENNSYLVANIA STATE UNIVERSITY

PETER EBBES
THE OHIO STATE UNIVERSITY

WAYNE S. DESARBO
THE PENNSYLVANIA STATE UNIVERSITY
Multiple regression is frequently used across the various social sciences to analyze cross-sectional data. However, it can often be challenging to justify the assumption of common regression coefficients across all respondents. This manuscript presents a heterogeneous Bayesian regression model that enables the estimation of individual-level regression coefficients in cross-sectional data involving a single observation per response unit. A Gibbs sampling algorithm is developed to implement the proposed Bayesian methodology. A Monte Carlo simulation study is constructed to assess the performance of the proposed methodology across a number of experimental factors. We then apply the proposed method to analyze data collected from a consumer psychology study that examines the differential importance of price and quality in determining perceived value evaluations.
Key words: Bayesian estimation, cross-sectional analysis, heterogeneity, consumer psychology.
1. Introduction
Cross-sectional data are collected and analyzed by researchers in most scientific domains, including both the social and physical sciences. To draw statistical inferences, aggregate-level cross-sectional regression analysis is frequently used to functionally relate a dependent variable to a set of designated independent variables. However, in many cases, it can be challenging to justify the assumption of common regression coefficients (i.e., response homogeneity) across all respondents. For example, in consumer psychology studies (e.g., studies on customer satisfaction) conducted to relate customers' overall perceived value responses to perceived quality and perceived price ratings, it is unlikely that each and every respondent weighs price more heavily than quality (or vice versa) in the formation of their value perception or utility function. Hence, a simple aggregate-level regression analysis with perceived quality and perceived price as independent variables (here denoted as the X independent variables) to predict the perceived value response (here denoted as the Y dependent variable) may be inappropriate. Indeed, in the customer satisfaction and perceived value literature, despite the relative congruence among the major components of perceived value (i.e., perceived quality and perceived price), a number of researchers have noticed considerable heterogeneity in terms of the relative importance of these components of perceived value (see Zeithaml, 1988; Sinha & DeSarbo, 1998; Swait & Sweeney, 2000; DeSarbo, Jedidi, & Sinha, 2001; DeSarbo, Ebbes, Fong, & Snow, 2010). When background demographic/firmographic variables (here denoted as the Z descriptor variables) are available, some researchers may run an analysis involving all possible interaction terms between these Z variables and the X variables in an aggregate-level moderator regression model to account for possible individual differences. However, this approach is restrictive because it assumes that respondents with the same Z values have identical (combined) regression coefficients associated with the X variables. Moreover, there are limited individual effects if all of the interaction terms in such an approach are insignificant.

Author note: Duncan K.H. Fong is Professor of Marketing and Statistics. Wayne S. DeSarbo is the Smeal Distinguished Research Professor of Marketing. Peter Ebbes is Visiting Assistant Professor. Requests for reprints should be sent to Duncan K.H. Fong, Marketing Department, Smeal College of Business, Pennsylvania State University, 456 Business Building, University Park, PA 16802, USA. E-mail: i2v@
To fully account for individual differences, it is more appropriate to assume a regression model with respondent-level coefficients. Bayesian estimation of such individual-level coefficients then becomes an extension of the normal mean problem considered in Berger (1985, pp. 183–193). However, we are not aware of any published work that addresses this regression estimation problem with non-replicated cross-sectional data. Since hierarchical Bayesian regression models have been developed to analyze longitudinal panel or replicated cross-sectional data to obtain individual-level parameter estimates (e.g., Allenby & Ginter, 1995; Lenk, DeSarbo, Green, & Young, 1996; Rossi, McCulloch, & Allenby, 1996; Marshall & Bradlow, 2002; DeSarbo, Fong, Liechty, & Coupland, 2005; Liechty, Fong, & DeSarbo, 2005), these models may be extendable to cross-sectional data. However, when there is only a single observation per response unit, we show that the use of conventional vague proper priors in such traditional hierarchical Bayesian regression models can lead to undesirable results. Indeed, when some prior variances are very large or approach infinity in the limit to represent vague information, the corresponding posterior distribution becomes improper. This implies that the posterior results are sensitive to prior specifications, as they become unstable when the prior variances are set larger and larger. (See the related work on the propriety of posterior distributions in Hobert & Casella, 1996; Berger, 2000; Sun, Tsutakawa, & He, 2001; Bayarri & Berger, 2004; and Berger, Strawderman, & Tang, 2005.)
To overcome this impropriety problem, we propose a heterogeneous Bayesian regression model to obtain individual-level parameter estimates in single-observation cross-sectional studies even when some prior variances become infinitely large. We show that the posterior distribution of our proposed model is proper, such that the Bayesian results are stable with respect to various proper priors with large variances. Whereas subjective information may be used to specify informative priors, in many cases there are no strong prior beliefs; in these situations, "let the data speak" approaches are preferred. In such cases, vague priors (e.g., proper priors with large variances) are commonly employed, and a model that is not as sensitive to specific vague prior specifications is thus desirable.
Note that, for most survey-based studies in social science applications, administering multiple surveys longitudinally is very costly and in many cases impractical. For example, cross-sectional data involving a single observation per response unit are collected in the vast majority of customer psychology studies involving the measurement of customer satisfaction and value (DeSarbo & Grisaffe, 1998; Sinha & DeSarbo, 1998; DeSarbo et al., 2001). It would be nearly impossible to attempt to contact the exact same set of consumers months later to replicate the same study. Thus, our proposed Bayesian procedure could be very useful in actual practice and can result in substantial savings in terms of data collection costs. We develop a Gibbs sampling algorithm to generate random samples from the joint posterior distribution for our proposed model to obtain estimates of the various parameters of interest. Given the nature of the specific application described in this manuscript, we develop such models for situations involving unconstrained regression coefficients as well as for sign-constrained regression coefficients.
Section 2 describes the technical aspects of our proposed heterogeneous Bayesian regression model for cross-sectional data with a single observation per response unit. We prove that the joint posterior distribution from our model is proper when the prior variances approach infinity, whereas the conventional vague priors that are commonly used in a traditional hierarchical Bayesian model lead to an improper posterior distribution. In Section 3, we provide a simulation study to examine estimation performance with known data structures and parameters. In Section 4, we present the results of our proposed methodology for a consumer psychology study conducted to examine the relative impact of price and quality on perceptions of consumer value. Finally, conclusions and a number of suggestions for future research are provided in Section 5.
2. The Proposed Model
2.1. Model and Prior Distributions
Let y_i be the observation on the dependent variable for respondent i. We assume an individual-level regression model where, for i = 1, ..., N,

$$y_i = x_i'\beta_i + \varepsilon_i, \qquad (1)$$

where x_i is a (column) vector of dimension J containing the independent variable values for respondent i, and β_i is a vector of subject-specific regression parameters. It is assumed that the error terms ε_i are independently and normally distributed as N(0, σ²). If respondent background variables (e.g., demographics and/or firmographics) are available, they can be used to formally model the regression parameters as follows:

$$\beta_i = \alpha z_i + \xi_i, \qquad (2)$$

where z_i is a (column) vector of dimension K containing background variable values for respondent i, and α is a J by K matrix of impact coefficients. The error terms ξ_i are independent and follow a multivariate normal distribution with zero mean vector O and covariance matrix Σ, i.e., N_J(O, Σ). An intercept term is introduced for each vector component in Equation (2) by setting the first element of z_i to 1.
Note that, when ξ_i = O for all i (or equivalently, Σ is a zero matrix), Equations (1) and (2) can be combined to yield an aggregate-level regression model with common regression coefficients across all respondents:

$$y = X^* Z^* \mathrm{vec}(\alpha) + e, \qquad (3)$$

where y = (y_1, ..., y_N)', vec(α) = (α_11, ..., α_1K, ..., α_J1, ..., α_JK)', e = (ε_1, ..., ε_N)', X* = B(x_1', ..., x_N') is a block matrix of order N by NJ having x_i', i = 1, ..., N, as main diagonal blocks and zero matrices for the off-diagonal blocks, and Z* = (B(z_1', ..., z_1')', ..., B(z_N', ..., z_N')')', where B(z_i', ..., z_i') is a J by JK block matrix with the J z_i' as main diagonal blocks, so that Z* is a matrix of order NJ by JK. In this form, estimates of α in (3) can be obtained using a standard Bayesian linear multiple regression model (Bernardo & Smith, 1994). We call this the compound regression model, which is used here as a benchmark model for comparison purposes. For this benchmark model, the individual-level regression coefficients can be obtained as β_i = α z_i.
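To make the construction of X* and Z* concrete, the following sketch (our own illustration, not code from the paper) builds the two block matrices with NumPy and estimates vec(α) by least squares as a stand-in for the standard Bayesian linear regression step; all function and variable names are assumptions introduced here for illustration.

```python
# Minimal sketch of the compound regression benchmark in Equation (3).
import numpy as np

def compound_regression(y, X, Z):
    """y: (N,), X: (N, J), Z: (N, K). Returns alpha (J, K) and the implied beta_i = alpha z_i."""
    N, J = X.shape
    K = Z.shape[1]
    # X*: N x NJ block matrix with x_i' as the i-th main diagonal block
    X_star = np.zeros((N, N * J))
    for i in range(N):
        X_star[i, i * J:(i + 1) * J] = X[i]
    # Z*: NJ x JK, stacking B(z_i', ..., z_i') = I_J (kron) z_i' for each respondent
    Z_star = np.vstack([np.kron(np.eye(J), Z[i][None, :]) for i in range(N)])
    design = X_star @ Z_star                          # N x JK design matrix
    vec_alpha, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha = vec_alpha.reshape(J, K)                   # vec(alpha) is stacked row by row
    betas = Z @ alpha.T                               # row i holds beta_i = alpha z_i
    return alpha, betas

# Usage with simulated data
rng = np.random.default_rng(0)
N, J, K = 200, 2, 3
X, Z = rng.normal(size=(N, J)), rng.normal(size=(N, K))
alpha_true = rng.uniform(-1, 1, size=(J, K))
y = np.einsum('ij,ij->i', X, Z @ alpha_true.T) + rng.normal(scale=0.5, size=N)
alpha_hat, beta_hat = compound_regression(y, X, Z)
```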
When the error term in Equation (2) is not identically equal to zero, estimating the regression coefficients becomes much more challenging. Specifically, we prove in Section 2.2.2 (Theorem 2) that a standard hierarchical Bayesian model yields an improper posterior distribution when prior variances approach infinity. Our proposed approach can overcome this problem; we show in Section 2.2.1 that the posterior distribution from our model is proper. First, we specify the prior distributions for our proposed model:
$$\mathrm{vec}(\alpha) \sim N_{JK}(O, \gamma), \qquad (4)$$
$$\sigma^{-2} \sim G(p, q), \qquad (5)$$
$$\Sigma^{-1} \mid \sigma^{-2} = \sigma^{-2} C^{-1}, \quad \text{and} \quad C^{-1} \sim W_J(a I_J, b), \qquad (6)$$

where W_J(·,·) denotes a Wishart distribution, G(·,·) represents a Gamma distribution, I_J is the identity matrix of dimension J, and a, b, p, and q are positive numbers. Note that Equations (4) and (5) are standard assumptions in conventional hierarchical Bayes models. In Equation (6), we introduce a scale-free precision matrix (C^{-1}), which is commonly used in Bayesian dynamic linear models (West & Harrison, 1997). Unlike the standard hierarchical Bayesian model in which the priors for σ^{-2} and Σ^{-1} are independent, we assume that σ^{-2} and C^{-1} are independent, where C^{-1} follows a Wishart distribution. In Theorem 1 below, we show that the proposed priors yield a proper posterior distribution even if we set the prior variances in Equations (4) and (5) arbitrarily large, according to the convention in Bayesian analyses where large variances are used when prior information is vague.
2.2. Posterior Distributions
We first establish the propriety of the joint posterior distribution from our proposed model.
2.2.1. Theorem 1. For the model given by Equations (1) and (2) with priors specified in Equations (4), (5), and (6), where γ^{-1} is a zero matrix and p and q approach 0, the joint posterior distribution is proper. In addition, we obtain the following results:

(a) The conditional posterior distribution of β_i, i = 1, ..., N, is

$$\pi\left(\beta_i \mid y, \alpha, \sigma^{-2}, C^{-1}\right) = N_J\left( \left(\tfrac{1}{\sigma^2} x_i x_i' + \Sigma^{-1}\right)^{-1} \left(\tfrac{y_i}{\sigma^2} x_i + \Sigma^{-1}\alpha z_i\right),\; \left(\tfrac{1}{\sigma^2} x_i x_i' + \Sigma^{-1}\right)^{-1} \right). \qquad (7)$$

(b) The conditional posterior distribution of vec(α) is

$$\pi\left(\mathrm{vec}(\alpha) \mid y, \sigma^{-2}, C^{-1}\right) = N_{JK}\left( \mu_\alpha,\; \sigma^2 \left( Z^{*\prime} X^{*\prime} \left[\mathrm{diag}\left(x_i' C x_i + 1\right)\right]^{-1} X^* Z^* \right)^{-1} \right), \qquad (8)$$

where μ_α = (Z*′X*′[diag(x_i′Cx_i + 1)]^{-1}X*Z*)^{-1} Z*′X*′[diag(x_i′Cx_i + 1)]^{-1} y, and diag(x_i′Cx_i + 1) is a diagonal matrix with (scalar) elements x_i′Cx_i + 1, i = 1, ..., N, on the diagonal.

(c) The conditional posterior distribution of σ^{-2} is

$$\pi\left(\sigma^{-2} \mid y, C^{-1}\right) = G\left( \frac{N - JK}{2},\; \frac{1}{2}\left(y - X^* Z^* \mu_\alpha\right)' \left[\mathrm{diag}\left(x_i' C x_i + 1\right)\right]^{-1} \left(y - X^* Z^* \mu_\alpha\right) \right). \qquad (9)$$

(d) The posterior distribution of C^{-1} is

$$\pi\left(C^{-1} \mid y\right) \propto \left( \frac{\left|\mathrm{diag}\left(\left|C^{-1} + x_i x_i'\right|\right)\right|^{-1}}{\left| Z^{*\prime} X^{*\prime} \left[\mathrm{diag}\left(\left|C^{-1} + x_i x_i'\right|\right)\right]^{-1} X^* Z^* \right|} \right)^{1/2} \times \left( \left(y - X^* Z^* \mu_\alpha\right)' \left[\mathrm{diag}\left(\left|C^{-1} + x_i x_i'\right|\right)\right]^{-1} \left(y - X^* Z^* \mu_\alpha\right) \right)^{-(N - JK)/2} \times \pi\left(C^{-1}\right), \qquad (10)$$

where π(C^{-1}) is the prior distribution of C^{-1}. The proof of this theorem is given in Appendix A.
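As an illustration of how the conditional posterior in Equation (7) would be used within a sampler, the following sketch (our own code, not the authors' implementation) draws β_i given current values of α, σ², and Σ = σ²C; all inputs and names are assumptions introduced for illustration.

```python
# Minimal sketch of a draw from the conditional posterior of beta_i in Equation (7).
import numpy as np

def draw_beta_i(y_i, x_i, z_i, alpha, sigma2, Sigma, rng):
    Sigma_inv = np.linalg.inv(Sigma)
    prec = np.outer(x_i, x_i) / sigma2 + Sigma_inv              # posterior precision
    cov = np.linalg.inv(prec)                                    # posterior covariance
    mean = cov @ (y_i * x_i / sigma2 + Sigma_inv @ (alpha @ z_i))
    return rng.multivariate_normal(mean, cov)

# Usage with assumed current values from a Gibbs sweep
rng = np.random.default_rng(0)
J, K = 2, 3
x_i, z_i = rng.normal(size=J), rng.normal(size=K)
alpha, sigma2, Sigma = rng.normal(size=(J, K)), 1.0, np.eye(J)
beta_i = draw_beta_i(y_i=1.5, x_i=x_i, z_i=z_i, alpha=alpha, sigma2=sigma2, Sigma=Sigma, rng=rng)
```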
The posterior distributions in Theorem 1(a) to 1(d) above show how prior distributions are updated at various stages. For example, vec(α) is given a prior with an infinitely large variance (to represent vague information), but conditional on y, σ^{-2}, and C^{-1}, it has a normal posterior distribution with a finite variance. Also, as seen in (a), the posterior mean of β_i depends on both x_i and z_i; thus, respondents with the same Z values need not have identical regression coefficient estimates, as required in aggregate-level moderator regression analysis. With this theorem, one can be assured that, for the proposed model, the use of proper priors with large variances does not lead to divergent results in the limit. However, this is not the case with the traditional hierarchical Bayesian model: Theorem 2 shows that conventional vague priors lead to an improper posterior distribution.
2.2.2. Theorem 2. Under the same conditions specified in Theorem 1, with the exception that the prior for Σ^{-1} is now π(Σ^{-1}) = W_J(a I_J, b), the corresponding joint posterior distribution is improper. The proof of this theorem is given in Appendix B.
2.3. The Full Conditional Distributions
In many social science and consumer psychology studies, certain designated regression coefficients are expected to have particular signs; e.g., higher perceived quality should result in higher perceived value. Thus, we consider the full conditional distributions for the general case in which some components of the regression coefficients may be under sign constraints. Let β*_i = (β*_{i1}, ..., β*_{iJ})' be the corresponding vector of regression parameters. Without loss of generality, we assume non-negative sign constraints and β*_{ij} = β_{ij} I{β_{ij} > 0} for some j. We show in Appendix C that a Gibbs sampler can be used to generate random samples from the joint posterior distribution to obtain estimates of the individual-level regression coefficients. Specifically, random deviates are drawn iteratively and recursively from the full conditional distributions, which are all standard probability distributions. As shown in Appendix C, the full conditional distribution for vec(α) is a normal distribution. The full conditional distribution for σ^{-2} is a gamma distribution, and that for C^{-1} is a Wishart distribution. If β*_{ij} = β_{ij}, the full conditional distribution for β_{ij} is a normal distribution; if β*_{ij} = β_{ij} I{β_{ij} > 0}, we consider the full conditional distribution of F_{ij}(β_{ij}), where F_{ij}(·) is the cumulative distribution function of β_{ij} that has a closed-form inverse. This full conditional distribution is a uniform distribution, and a new value for β*_{ij} is obtained from each generated β_{ij}.
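The inverse-CDF step for a sign-constrained coefficient might look as follows. This is a minimal sketch under one plausible reading of the description above, assuming the unconstrained full conditional of β_ij is normal with a given mean and standard deviation; it is not the authors' Appendix C implementation, and all names are ours.

```python
# Sketch of the sign-constrained update: draw a uniform deviate, invert the
# (assumed normal) CDF of beta_ij, and apply beta*_ij = beta_ij * I{beta_ij > 0}.
import numpy as np
from scipy.stats import norm

def draw_beta_star(mean, sd, rng):
    u = rng.uniform(0.0, 1.0)                    # the full conditional of F_ij(beta_ij) is uniform
    beta_ij = norm.ppf(u, loc=mean, scale=sd)    # closed-form inverse CDF
    return beta_ij if beta_ij > 0.0 else 0.0     # enforce the non-negativity constraint

rng = np.random.default_rng(2)
samples = [draw_beta_star(mean=0.3, sd=0.4, rng=rng) for _ in range(1000)]
```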
3. Monte Carlo Simulation
In this section, the proposed heterogeneous Bayesian regression model for cross-sectional data, employing the corresponding Gibbs sampling algorithm, is evaluated with a simulation study. We start with a description of the experimental design and the factors that are investigated in the study.
3.1. Simulation Study Design
We generated data according to the model presented in Equations (1) and (2) in Section 2. We experimentally manipulate the following nine factors:
F1 Sample size, with levels (1) N = 250, (2) N = 750, (3) N = 1250;
F2 Number of regressors in X, with levels (1) J = 2, (2) J = 4, (3) J = 6;
F3 Number of background variables in Z, with levels (1) K = 1, (2) K = 4, (3) K = 7, (4) K = 10. In all cases, we include an intercept term in Equation (2);
F4 The structure of the variance-covariance matrix Σ, with levels (1) Diagonal, (2) Non-diagonal;
F5 Value of σ², with levels (1) Low, (2) Medium, (3) High. We discuss the specific values for these levels at the end of this subsection;
F6 Starting values, with levels (1) Random, (2) Compound regression model, (3) True values;
F7 Thinning of the Gibbs sampler, with levels (1) No thinning (i.e., all iterations are saved), (2) Thinning in every 10th iteration (i.e., saving only every 10th iteration);
F8 Burn-in, with levels (1) 2500 iterations, (2) 25000 iterations;
F9 Prior settings, with levels (1) Vague, (2) Informative around the true values.
Factors F1 to F5 specify the data-generating mechanism, while factors F6 to F9 define aspects of the MCMC algorithm presented in Section 2.3, as well as the prior sensitivity of the model. We consider three different ways to obtain starting values (F6). For F6 = 1, we randomly generate the starting values from the prior distributions but with smaller variances; for F6 = 2, we use the results of the compound regression model (see Section 2.1) as starting values. Finally, for F6 = 3, we employ the true values. F7 and F8 allow us to investigate the efficiency of the proposed Gibbs sampling algorithm. If these factors have minimal effects on the recovery of the parameters and convergence aspects of the algorithm, they can all be set at low levels to avoid unnecessary computation. In that case, the proposed algorithm may be more efficiently implemented with a smaller number of total iterations. F9 assesses the sensitivity of the Bayesian estimates with respect to prior input. When F9 = 1, we set γ = 1000 I_{JK}, p = q = 1, b = J + 3, and a = b to represent vague prior information. When F9 = 2, we center the priors at the true parameter values with variances set to 0.5. In all cases, we use a total of 10000 draws to compute various posterior summary measures.
Consistent with previously published methodological work in Psychometrika (e.g., DeSarbo, Johnson, Manrai, Manrai, & Edwards, 1992; DeSarbo, Fong, Liechty, & Saxton, 2004), we conducted our simulation study by employing a fractional factorial design (Addelman, 1961) involving main-effects-only estimation. This design allows us to efficiently investigate the effect of the above nine factors on designated performance measures without having to consider every possible combination of factor levels. Utilizing an orthogonal design, we generated a total of 64 synthetic data sets, where the true values for the model parameters were determined as described below:
(1) Impact coefficients (α) were generated from uniform distributions: α_jk ~ U(−1, 1). We generated the constant α_j0 from U(4, 6);

(2) Error variance σ² was generated from an inverted gamma distribution with mean 0.5 (F5 = 1), 1 (F5 = 2), and 2 (F5 = 3), all with the same variance of 0.25;

(3) The inverse of the scale-free precision matrix C is diagonal when F4 = 1, where we drew each diagonal element of C from an inverted gamma distribution with mean 1 and variance 0.25. For F4 = 2, we drew C from an inverted Wishart with parameters 11 + J and 10 I_J, so that the mean of C is the identity matrix I_J, and the covariance matrix of C is given by var(C_ii) = 0.25 and var(C_ij) = 0.11.

We generated all regressors X and Z from standard normal distributions. The errors ε_i and ξ_i were drawn from normal distributions with mean zero and variances specified by F4 and F5 above.
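As a concrete illustration, the following sketch (our own code, using one fixed design cell with simplified settings rather than the full factorial draws) generates a synthetic data set from Equations (1) and (2) in the spirit of the design above; all variable names and fixed values are assumptions introduced here.

```python
# Minimal sketch of the data-generating mechanism (one design cell, diagonal Sigma).
import numpy as np

rng = np.random.default_rng(42)
N, J, K = 250, 2, 4                          # F1-F3; K includes the intercept column

alpha = rng.uniform(-1, 1, size=(J, K))      # impact coefficients alpha_jk ~ U(-1, 1)
alpha[:, 0] = rng.uniform(4, 6, size=J)      # intercepts alpha_j0 ~ U(4, 6)
sigma2 = 1.0                                 # F5: fixed here; drawn from an inverted gamma in the study
# diagonal C with inverted-gamma entries (mean 1, variance 0.25 -> shape 6, scale 1/5)
C = np.diag(1.0 / rng.gamma(shape=6.0, scale=0.2, size=J))

Z = np.column_stack([np.ones(N), rng.standard_normal((N, K - 1))])   # first element of z_i is 1
X = rng.standard_normal((N, J))

Sigma = sigma2 * C                                                    # Sigma = sigma^2 * C
xi = rng.multivariate_normal(np.zeros(J), Sigma, size=N)
beta = Z @ alpha.T + xi                                               # Equation (2)
y = np.einsum('ij,ij->i', X, beta) + rng.normal(scale=np.sqrt(sigma2), size=N)  # Equation (1)
```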
3.2. The Performance Measures
To evaluate the performance of the proposed Bayesian procedure under various experimental conditions, we use three major categories of performance measures: (1) overall goodness-of-fit (GOF), (2) recovery of the true parameter values, and (3) MCMC algorithm diagnostics. We next discuss each category in turn, and use θ to denote the vector of model parameters, specifically:

$$\theta = \left(\beta_1', \ldots, \beta_N', \mathrm{vec}(\alpha)', \mathrm{vec}(C)', \sigma^2\right)'. \qquad (11)$$

Goodness-of-Fit. The posterior predictive p-value has been proposed in the literature (Gelman & Meng, 1998) to assess the compatibility of a given model with the observed data. However, there is a problem of "double use" of the data with such an approach, as it may lead to unusual behavior such as extreme conservatism with respect to the resulting p-values (Bayarri & Berger, 2000; Lee & Tang, 2006). An alternative for overcoming this problem is the partial posterior predictive (ppp) p-value based on the partial posterior predictive distribution. First, the partial posterior distribution is given by (Bayarri & Castellanos, 2007):

$$\pi(\theta \mid Y_{obs} \setminus t_{obs}) \propto \frac{\pi(Y_{obs} \mid \theta)\, \pi(\theta)}{\pi(t_{obs} \mid \theta)}, \qquad (12)$$

where π(Y_obs | θ) is the likelihood function corresponding to the observed data Y_obs, π(θ) is the prior distribution, and π(t_obs | θ) is the likelihood function corresponding to an observed test statistic t_obs. The partial posterior predictive distribution is then equal to

$$m_{ppp}(t \mid Y_{obs} \setminus t_{obs}) = \int \pi(t \mid \theta)\, \pi(\theta \mid Y_{obs} \setminus t_{obs})\, d\theta. \qquad (13)$$

We implement the procedure outlined in Bayarri and Castellanos (2007) to sample from the partial posterior predictive distribution. Here, we choose T(Y) = (1/N) Σ_i y_i = Ȳ to assess whether the model appropriately describes the central tendency of the dependent variable across the experimental conditions.
In addition, we generated new values for the dependent variable from the posterior predictive distribution and compared them to the observed data. Specifically, in each sweep l = 1, ..., L of the Gibbs sampler, we compute MSE_l = N^{-1} Σ_i (y_{l,i}^{pred} − y_i)², where y_{l,i}^{pred} is generated from the posterior predictive distribution.
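A sketch of this in-sample predictive check, under the assumption that draws of β_i and σ² from each Gibbs sweep have been stored, could look as follows (illustrative code, not from the paper; all names are ours).

```python
# Posterior predictive MSE per sweep: MSE_l = (1/N) * sum_i (y_pred_{l,i} - y_i)^2,
# where y_pred_{l,i} is simulated from Equation (1) at the sweep-l parameter draws.
import numpy as np

def predictive_mse(y, X, beta_draws, sigma2_draws, rng):
    """beta_draws: (L, N, J) array of beta_i draws; sigma2_draws: (L,) array of sigma^2 draws."""
    mse = np.empty(len(sigma2_draws))
    for l, (beta_l, s2_l) in enumerate(zip(beta_draws, sigma2_draws)):
        mu_l = np.einsum('ij,ij->i', X, beta_l)                       # x_i' beta_i at sweep l
        y_pred = mu_l + rng.normal(scale=np.sqrt(s2_l), size=len(y))  # predictive draw
        mse[l] = np.mean((y_pred - y) ** 2)
    return mse
```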
Recovery of True Parameter Values. We examine parameter recovery by computing the bias and mean-squared errors. In each sweep of the MCMC chain, we compute, for each element j of θ, the mean-squared error MSE(θ_{j,l}) = (θ_{j,l} − θ̃_j)² and bias Bias(θ_{j,l}) = θ_{j,l} − θ̃_j, where θ̃_j denotes the true value. For simplicity and clarity, we report our results for blocks of similar parameters; e.g., we compute the average bias and MSE for all elements of β_1, ..., β_N, all elements of α, all diagonal elements of Σ, and all off-diagonal elements of Σ.
MCMC Diagnostics. We consider three measures to assess the performance of the proposed algorithm: (1) computation time (CPU), (2) the lag-k autocorrelations for k = 10 and 30, and (3) Geweke's convergence diagnostic. Large and non-vanishing autocorrelations at high lags suggest slow mixing and possibly non-convergence, and the MCMC chain should be run longer to obtain larger samples that cover the parameter space. The extent of the autocorrelation may depend on the complexity of the model, among other things (Congdon, 2006). To further examine convergence of the Gibbs sampling procedure, we use a method proposed by Geweke (1992). Specifically, for each element j of θ, we use L_1 observations from the early part of the chain to compute the mean θ̄_j^{(1)} = (1/L_1) Σ_{l_1} θ_{j,l_1}, and we use L_2 observations from a (non-overlapping) later part of the chain to compute θ̄_j^{(2)} = (1/L_2) Σ_{l_2} θ_{j,l_2}. Geweke's diagnostic is then given by

$$G_j = \frac{\bar{\theta}_j^{(1)} - \bar{\theta}_j^{(2)}}{\sqrt{S_1/L_1 + S_2/L_2}}, \qquad (14)$$

where S_1 and S_2 are estimates of the variances obtained from the spectral density at zero frequency (Heidelberger & Welch, 1981; Geweke, 1992). We follow Geweke (1992) in setting L_1 = 0.1 × L and L_2 = 0.5 × L. Values of |G_j| > 2 indicate that the chain may not have converged.
As is done for the bias and MSE measures above, we average the autocorrelations and Geweke's diagnostic values across blocks of similar parameters (across the elements of β_1, ..., β_N, across the elements of α, and across the elements of Σ), and report only these values to conserve space.
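The two convergence measures can be computed directly from a stored chain; the sketch below (our own illustration) computes the lag-k autocorrelation and Geweke's G_j for a single parameter's draws, with L_1 = 0.1L and L_2 = 0.5L as above and simple sample variances standing in for the spectral-density-at-zero estimates used in the paper.

```python
# Lag-k autocorrelation and Geweke's diagnostic for one parameter's chain of length L.
# Note: plain sample variances are used here only to keep the sketch short.
import numpy as np

def lag_k_autocorr(chain, k):
    chain = np.asarray(chain, dtype=float)
    c = chain - chain.mean()
    return float(np.dot(c[:-k], c[k:]) / np.dot(c, c))

def geweke(chain, first=0.1, last=0.5):
    chain = np.asarray(chain, dtype=float)
    L = len(chain)
    a, b = chain[: int(first * L)], chain[-int(last * L):]   # early and late segments
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return float((a.mean() - b.mean()) / se)

rng = np.random.default_rng(3)
chain = rng.normal(size=10000)            # a well-mixed chain should give |G_j| < 2
print(lag_k_autocorr(chain, 10), geweke(chain))
```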
3.3. The Simulation Results
To determine the statistical effects of the nine factors on the various performance measures, we ran separate (main effects only) ANOVA models using the manipulated design factors as independent variables and the posterior mean of each performance measure as the dependent variable. The primary results are shown in Table 1. We report the means and standard deviations across the 64 experimental cells, and list the design factors (if any) that are significant in explaining variability in the performance measures.
Recovery of True Parameter Values. To examine the recovery of the true parameters, we first consider the difference between the posterior means and the true parameter values (i.e., subject-specific regression parameters, impact coefficients, and the covariance matrix Σ = σ²C), as indicated by the bias measure in Table 1. It can be seen that, on average, the biases are approximately zero across the 64 experimental cells. For the bias in the regression parameters, F6 (starting values) is significant; however, the average biases for the three levels of F6 are very similar (i.e., −0.03, 0.00, and 0.01 for F6 = 1, 2, and 3, respectively) and not practically different from zero. For the diagonal elements of Σ, F8 (burn-in) and F9 (prior) are significant, but the average biases are still very small for all levels of these factors (i.e., the biases range from −0.02 to 0.07).

Next, we consider the MSE of the estimators. From the results presented in Table 1, F5 (value of σ²) is significant in explaining the MSE for all parameters. In particular, we find that the MSE values are lower for smaller values of σ². The MSE for the off-diagonal elements of Σ is smaller than that for the diagonal elements.¹ The MSE for the impact coefficients α is the lowest, and its value is lower in the case of larger sample sizes stipulated under F1. However, the MSE for the parameters β is the largest. This result is expected because estimates of the regression parameters involve posterior uncertainty from both the impact coefficients α and the error covariance matrix Σ, which reflects the amount of unexplained variance in (2). Across the 64 cells and the diagonal of Σ = σ²C, the average (true) unexplained variance is 0.93, with values that range from 0.11 to 3.78. This average may be seen as a lower bound for the MSE for β in Table 1. Overall, we conclude that, in terms of parameter recovery, the proposed approach is fairly robust to the nine factors under investigation.
¹ The bias and MSE for σ² are −0.02 and 0.07, respectively. We observe similar results for σ² as for Σ(j, j), j = 1, ..., J.
TABLE 1.
Simulation study results: the reported means and standard deviations for the performance measures computed across 64 experimental cells. Factors that are significant at the 5% level (indicated by '*') or the 1% level (indicated by '**') are listed in the last column.

                                                       Mean   Std. dev.   Significant factors
Recovery of true parameter values
  Bias                  β's                           −0.01      0.06     F6*
                        α's                           −0.01      0.03
                        Σ(i,i)'s                       0.02      0.18     F8*, F9*
                        Σ(i,j)'s                       0.03      0.11
  MSE                   β's                            1.48      1.37     F5**
                        α's                            0.02      0.02     F1**, F5**
                        Σ(i,i)'s                       0.14      0.37     F5**
                        Σ(i,j)'s                       0.08      0.20     F5**
Goodness-of-fit
  Partial posterior predictive p-values for T(Y)=Ȳ    0.41      0.25
  MSE predictions                                      1.79      1.63     F5**
MCMC diagnostics
  Lag-10 AR             β's                            0.02      0.04     F1**, F2**, F7**
                        α's                            0.18      0.20     F1**, F2**, F7**
                        Σ                              0.33      0.30     F1*, F2**, F7**, F9**
  Lag-30 AR             β's                            0.01      0.02     F1**, F2**, F7**
                        α's                            0.05      0.09     F1**, F2**, F7**
                        Σ                              0.14      0.16     F2**, F7**
  Geweke's diagnostic   β's                            0.86      0.09
                        α's                            0.82      0.23
                        Σ                              0.81      0.36
  Total MCMC time (in minutes)                        28.0      36.3      F1**, F2**, F7**, F8*
Goodness-of-Fit. In Table 1, we report the average estimated partial predictive p-value for the sample mean of Y across the 64 cells. Since none of the factors are significant and the p-value has a mean of 0.41 with a 0.25 standard deviation, there is no evidence of incompatibility of the data with the assumed model across the range of conditions considered in our simulation study. Note that the MSE of the in-sample predictions is affected by F5; however, this result is expected because a larger σ² tends to yield a larger MSE. On average, the unexplained variance in Y equals 0.92 and ranges from 0.10 to 3.40. Since the MSE of the in-sample predictions consists of both the unexplained variance and the posterior variability in the regression parameters β_i, the range of these MSE values is within the simulation error.
MCMC Diagnostics. From Table 1, the average Geweke's diagnostic values across all simulated datasets for the three sets of parameters are all below 1, suggesting that the chains have converged (an inspection of the trace plots for some parameters also confirms these findings). Also, none of the factors are significant, which implies that the MCMC algorithm does not significantly degrade under a low level of burn-in and no thinning, thus reducing the computational burden.
Although lower autocorrelations are achieved for larger sample sizes (F1), fewer regressors (F2), and some thinning (F7), the results shown in Table 1 indicate that the autocorrelations are small for β and α, and they also dampen quickly for Σ. Thus, the mixing of the sampler is acceptable here. In general, thinning is a useful tool to reduce autocorrelations. For example, when the