Mixed Logit Models

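•Studying travel mode choice (which mode of transport people choose)
•Studying consumer product choice (which product consumers buy)
•Studying customer satisfaction (the factors that influence satisfaction)
•Studying the level of acceptance of a given item
•Estimating product market shares
•Willingness to pay and choice preferences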
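2 Data Description and Research Steps
2.1 Data Description
We use inschoice.dta to illustrate the conditional logit, mixed logit, random-parameters logit, and latent class logit models.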

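The dataset contains 6 variables that record the available and chosen insurance plans of 250 individuals. The variables are:
•id: individual identifier
•premium: insurance premium (varies by plan), in $100/month
•deductible: deductible (varies by plan), in $1,000/year
•income: income (individual attribute), in $10,000/year
•insurance: insurance plan (the available alternatives)
•choice: chosen alternative (dependent variable), 1 for the selected plan and 0 otherwise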
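To follow along, the data must be in memory. The lines below are a minimal sketch, assuming that inschoice.dta is the example dataset Stata distributes through webuse (with a local copy, use that file instead). The helper variables n_alt and n_chosen are illustrative names, not part of the dataset.

```stata
* Load the example data (assumes it is available through webuse)
webuse inschoice, clear

* Inspect variable names, labels, and storage types
describe

* Check that each individual faces five alternatives and chooses exactly one
bysort id: generate byte n_alt = _N
bysort id: egen byte n_chosen = total(choice)
tabulate n_alt n_chosen

* Remove the helper variables again
drop n_alt n_chosen
```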
First, look at the layout of the first 10 rows of the data:
. list in 1/10, sepby(id) abbreviate(10)

     +----------------------------------------------------------+
     | id   premium   deductible   income    insurance   choice |
     |----------------------------------------------------------|
  1. |  1      2.87         1.70     5.74       Health        1 |
  2. |  1      3.13         2.14     5.74        HCorp        0 |
  3. |  1      2.03         2.26     5.74      SickInc        0 |
  4. |  1      1.65         2.94     5.74       MGroup        0 |
  5. |  1      0.87         3.56     5.74   MoonHealth        0 |
     |----------------------------------------------------------|
  6. |  2      3.52         1.24     2.89       Health        0 |
  7. |  2      3.23         1.52     2.89        HCorp        0 |
  8. |  2      2.81         2.31     2.89      SickInc        0 |
  9. |  2      1.04         2.58     2.89       MGroup        1 |
 10. |  2      0.93         3.17     2.89   MoonHealth        0 |
     +----------------------------------------------------------+
Next, review the summary statistics of the data:
. sum id premium deductible income insurance choice

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          id |      1,250       125.5    72.19709          1        250
     premium |      1,250    2.298161     .858024   .0568172   4.348273
  deductible |      1,250    2.194286    .7541999    .334168   4.171037
      income |      1,250    4.935434    1.440165          0   8.337807
   insurance |      1,250           3     1.41478          1          5
-------------+---------------------------------------------------------
      choice |      1,250          .2    .4001601          0          1
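2.2 Research Steps
The main purpose of this article is to use inschoice.dta to show how to estimate mixed logit, latent class logit, and random-parameters logit models in Stata. For comparison, a conditional logit model is estimated as well. The workflow is:
•estimate the conditional logit model;
•estimate the mixed logit model;
•estimate the random-parameters logit model;
•finally, estimate the latent class logit model.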

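3 Application
3.1 Conditional Logit Model
Because the conditional logit model only accepts regressors that vary across alternatives, only premium and deductible are used to analyze the choice of insurance plan.

In addition, the dataset does not contain dummy variables for the plans, so the first step is to generate one for each insurance plan: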
gen Health=0
gen HCorp=0
gen SickInc=0
gen MGroup=0
gen MoonHealth=0
replace Health=1 if insurance==1
replace HCorp=1 if insurance==2
replace SickInc=1 if insurance==3
replace MGroup=1 if insurance==4
replace MoonHealth=1 if insurance==5
Then estimate the conditional logit model, with plan 5 (MoonHealth) as the reference category:
. clogit choice Health HCorp SickInc MGroup premium deductible, group(id) nolog

Conditional (fixed-effects) logistic regression

                                                Number of obs     =      1,250
                                                LR chi2(6)        =     211.64
                                                Prob > chi2       =     0.0000
Log likelihood = -296.53966                     Pseudo R2         =     0.2630

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Health |   3.520578   .7210234     4.88   0.000     2.107398    4.933757
       HCorp |   2.913834    .565827     5.15   0.000     1.804834    4.022835
     SickInc |   2.094392   .4213334     4.97   0.000     1.268594     2.92019
      MGroup |   .9859225   .3116788     3.16   0.002     .3750433    1.596802
     premium |  -2.453505   .2142519   -11.45   0.000    -2.873431   -2.033579
  deductible |  -.9891893   .2936813    -3.37   0.001    -1.564794   -.4135846
------------------------------------------------------------------------------
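The estimates show that, holding the other regressors (premium, deductible) equal across plans, individuals are most likely to choose the Health plan.

In addition, the higher a plan's premium and deductible, the lower the probability that it is chosen.

Because discrete choice models are nonlinear, the sign of a coefficient gives the direction of an effect, but the coefficient itself does not measure the size of the effect on the choice probability. For a more interpretable quantity, add the or option to the command above to report odds ratios: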
. clogit choice Health HCorp SickInc MGroup premium deductible, group(id) nolog or

Conditional (fixed-effects) logistic regression

                                                Number of obs     =      1,250
                                                LR chi2(6)        =     211.64
                                                Prob > chi2       =     0.0000
Log likelihood = -296.53966                     Pseudo R2         =     0.2630

------------------------------------------------------------------------------
      choice | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Health |   33.80394   24.37344     4.88   0.000     8.226804    138.9004
       HCorp |   18.42732   10.42667     5.15   0.000      6.07896    55.85922
     SickInc |   8.120503   3.421439     4.97   0.000     3.555849    18.54482
      MGroup |   2.680283   .8353873     3.16   0.002     1.455054    4.937216
     premium |   .0859917   .0184239   -11.45   0.000     .0565047    .1308663
  deductible |    .371878   .1092136    -3.37   0.001     .2091311    .6612756
------------------------------------------------------------------------------
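The odds ratio for premium is 0.086: each one-unit ($100) increase in a plan's premium lowers the odds of choosing that plan by 91.4%, other things equal. Note that this is a change in the odds, not in the choice probability.

Likewise, if premiums and deductibles are equal across plans, the odds of choosing the Health plan are 33.804 times the odds of choosing MoonHealth. The other variables are interpreted in the same way.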
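The odds ratios are simply the exponentiated coefficients, so they can be checked by hand. The sketch below uses only the coefficients reported in the first clogit table; the results should match the Odds Ratio column (about .086 for premium and about 33.8 for Health) and the 91.4% reduction quoted above.

```stata
* Odds ratio for premium = exp(coefficient)
display exp(-2.453505)

* Percent change in the odds for a one-unit ($100) increase in premium
display 100*(exp(-2.453505) - 1)

* Odds of Health relative to the base alternative, MoonHealth
display exp(3.520578)
```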

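3.2 Mixed Logit Model
Whereas the conditional logit model only accepts variables that vary across alternatives, the mixed logit model (in this article, the model fitted by Stata's alternative-specific conditional logit command, asclogit) also accepts individual-specific variables. For comparison, we first estimate a mixed logit model that contains only alternative-varying variables.

As before, plan 5 (MoonHealth) is the reference category: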
. asclogit choice premium deductible, case(id) alternatives(insurance) base(5) nolog

Alternative-specific conditional logit         Number of obs      =      1,250
Case ID variable: id                           Number of cases    =        250

Alternatives variable: insurance               Alts per case: min =          5
                                                              avg =        5.0
                                                              max =          5

                                                  Wald chi2(2)    =     133.80
Log likelihood = -296.53966                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
insurance    |
     premium |  -2.453505   .2142519   -11.45   0.000    -2.873431   -2.033579
  deductible |  -.9891893   .2936813    -3.37   0.001    -1.564794   -.4135846
-------------+----------------------------------------------------------------
Health       |
       _cons |   3.520578   .7210234     4.88   0.000     2.107398    4.933757
-------------+----------------------------------------------------------------
HCorp        |
       _cons |   2.913834    .565827     5.15   0.000     1.804834    4.022835
-------------+----------------------------------------------------------------
SickInc      |
       _cons |   2.094392   .4213334     4.97   0.000     1.268594     2.92019
-------------+----------------------------------------------------------------
MGroup       |
       _cons |   .9859225   .3116788     3.16   0.002     .3750433    1.596802
-------------+----------------------------------------------------------------
MoonHealth   |  (base alternative)
------------------------------------------------------------------------------
Comparing the two sets of results shows that, when only alternative-varying regressors are included, the mixed logit and conditional logit models give identical estimates.
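In Stata 16 and later, the cm suite provides cmclogit for the same kind of model. The sketch below assumes such a version is installed and that the data come from webuse; it is expected to reproduce the coefficients shown above, though this was not part of the original analysis.

```stata
* Assumes Stata 16 or later and the webuse copy of the data
webuse inschoice, clear

* Declare the case ID and alternatives variables
cmset id insurance

* Alternative-varying regressors only; alternative 5 (MoonHealth) is the base
cmclogit choice premium deductible, basealternative(5) nolog
```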

Next, add the individual-specific variable (income) to the mixed logit model above and report odds ratios:
. asclogit choice premium deductible, case(id) alternatives(insurance) base(5) casevars(income) nolog or

Alternative-specific conditional logit         Number of obs      =      1,250
Case ID variable: id                           Number of cases    =        250

Alternatives variable: insurance               Alts per case: min =          5
                                                              avg =        5.0
                                                              max =          5

                                                  Wald chi2(6)    =     136.57
Log likelihood = -290.93207                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
      choice | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
insurance    |
     premium |   .0790987   .0174853   -11.48   0.000     .0512868    .1219926
  deductible |   .3776923   .1118699    -3.29   0.001     .2113577    .6749293
-------------+----------------------------------------------------------------
Health       |
      income |   1.638965   .3107048     2.61   0.009     1.130327    2.376485
       _cons |     3.5912   4.079012     1.13   0.260     .3876274     33.2709
-------------+----------------------------------------------------------------
HCorp        |
      income |   1.449831   .2586876     2.08   0.037     1.021976    2.056808
       _cons |   3.515446   3.557963     1.24   0.214     .4835973    25.55506
-------------+----------------------------------------------------------------
SickInc      |
      income |   1.117928   .2027894     0.61   0.539     .7834442    1.595217
       _cons |   5.390802   5.215012     1.74   0.082      .809485    35.90028
-------------+----------------------------------------------------------------
MGroup       |
      income |   1.167163   .2149489     0.84   0.401     .8135256    1.674526
       _cons |   1.357472   1.244309     0.33   0.739     .2251584    8.184149
-------------+----------------------------------------------------------------
MoonHealth   |  (base alternative)
------------------------------------------------------------------------------
Note: _cons estimates baseline odds for each outcome.
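The results show that income has a significant effect, at the 0.05 level, on choosing the Health and HCorp plans relative to MoonHealth.

For example, each one-unit ($10,000) increase in annual income raises the odds of choosing Health rather than MoonHealth by about 64%.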

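3.3 Random-Parameters Logit Model
The random-parameters logit model assumes that individuals are heterogeneous and captures this heterogeneity through the distribution (mean and standard deviation) of the model parameters.

Stata 15 provided the asmixlogit command for this model. Stata 16 replaced it with cmmixlogit as the official command, so this article uses cmmixlogit.

First, use cmset to declare the case and alternatives variables, which are id and insurance in this dataset: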
. cmset id insurance
caseid variable: id
alternatives variable: insurance
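We treat the coefficients on both premium and deductible as random and assume that they are normally distributed. The random-parameters logit model is estimated by Monte Carlo simulation, and studies have found Halton sequences to be more efficient than pseudo-random draws for this purpose, so this demonstration uses Halton draws with 1,000 integration points: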
. cmmixlogit choice, random(deductible premium) casevars(income) basealternative(5) intmethod(Halton) intpoints(1000)

Fitting fixed parameter model:

Fitting full model:

Iteration 0:   log simulated likelihood = -290.34612  (not concave)
Iteration 1:   log simulated likelihood = -290.22006
Iteration 2:   log simulated likelihood = -288.96157
Iteration 3:   log simulated likelihood = -288.87469
Iteration 4:   log simulated likelihood = -288.87435
Iteration 5:   log simulated likelihood = -288.87435

Mixed logit choice model                       Number of obs      =      1,250
Case ID variable: id                           Number of cases    =        250

Alternatives variable: insurance               Alts per case: min =          5
                                                              avg =        5.0
                                                              max =          5
Integration sequence:      Halton
Integration points:          1000                 Wald chi2(6)    =      63.16
Log simulated likelihood = -288.87435             Prob > chi2     =     0.0000

--------------------------------------------------------------------------------
        choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
insurance      |
    deductible |  -1.155932   .3666095    -3.15   0.002    -1.874473   -.4373905
       premium |  -3.013726    .388873    -7.75   0.000    -3.775904   -2.251549
---------------+----------------------------------------------------------------
/Normal        |
sd(deductible)|   .8484104   .4390417                      .3076947    2.339333
   sd(premium)|   .8561866   .4157747                      .3305328      2.2178
---------------+----------------------------------------------------------------
Health         |
        income |    .643623   .2750944     2.34   0.019     .1044478    1.182798
         _cons |   1.242772   1.453895     0.85   0.393     -1.60681    4.092354
---------------+----------------------------------------------------------------
HCorp          |
        income |   .4971046   .2452945     2.03   0.043     .0163362    .9778731
         _cons |   1.486424   1.254783     1.18   0.236    -.9729056    3.945754
---------------+----------------------------------------------------------------
SickInc        |
        income |    .185903   .2281991     0.81   0.415    -.2613589     .633165
         _cons |   2.094465   1.178101     1.78   0.075    -.2145708      4.4035
---------------+----------------------------------------------------------------
MGroup         |
        income |   .1462316   .2188757     0.67   0.504    -.2827569    .5752201
         _cons |   .7973504   1.108057     0.72   0.472    -1.374401    2.969102
---------------+----------------------------------------------------------------
MoonHealth     |  (base alternative)
--------------------------------------------------------------------------------
LR test vs. fixed parameters: chi2(2) = 4.12          Prob > chi2 = 0.1277
Note: LR test is conservative and provided only for reference.
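The test that Stata reports for whether random parameters are needed is conservative and is provided for reference only.

A simple, approximate way to judge whether a random-parameters specification is warranted is to check whether the means and standard deviations of the random parameters are statistically significant.

The official Stata output reports z statistics and p-values only for the means of the random parameters, not for their standard deviations.

The z statistic for each standard deviation is easily computed by hand (z = estimate / standard error):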
. display(0.8484104/ 0.4390417)
1.9324142
. display(0.8561866/ 0.4157747 )
2.0592561
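The computed z values show that the standard deviation of premium is significant at the 0.05 level and that of deductible at the 0.1 level, which suggests that modeling these two coefficients as random is reasonable.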
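The corresponding p-values can be computed from the same numbers. This is a sketch that assumes a two-sided test against the standard normal distribution, which is only approximate for a standard deviation because the parameter cannot be negative; the scalar names are illustrative. The p-values should come out at roughly 0.053 for deductible and 0.039 for premium, and the last line should be close to the LR test p-value printed in the output.

```stata
* z statistics for the two standard deviations (estimate / standard error)
scalar z_ded  = 0.8484104/0.4390417
scalar z_prem = 0.8561866/0.4157747

* Two-sided p-values from the standard normal distribution
scalar p_ded  = 2*(1 - normal(abs(z_ded)))
scalar p_prem = 2*(1 - normal(abs(z_prem)))

display "deductible: z = " z_ded  "  p = " p_ded
display "premium:    z = " z_prem "  p = " p_prem

* p-value of the reported LR statistic, chi2(2) = 4.12
display chi2tail(2, 4.12)
```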

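The fixed parameters of the random-parameters logit model are interpreted as in the previous models, so we use deductible to illustrate how a random parameter is interpreted.

The coefficient on deductible follows a normal distribution N(-1.155932, 0.8484104^2). From the normal cumulative distribution, the coefficient is negative for about 91.3% of individuals, who are less likely to choose a plan the higher its deductible, and positive for about 8.7%, who are more likely to choose it. This reflects heterogeneity across individuals.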
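The share of individuals with a negative coefficient is the normal probability P(beta < 0) = Phi((0 - mean)/sd). A short sketch using the estimated mean and standard deviation from the output above; the first line should return about 0.913, and the premium share in the last line should be very close to 1.

```stata
* Share of individuals with a negative deductible coefficient:
* Phi((0 - (-1.155932))/0.8484104)
display normal(1.155932/0.8484104)

* Share of individuals with a positive deductible coefficient
display 1 - normal(1.155932/0.8484104)

* The same calculation for premium, using its estimated mean and sd
display normal(3.013726/0.8561866)
```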

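We can likewise quantify the effect of each variable on the probability of choosing each plan. Taking deductible as an example, first predict the market share of each insurance plan: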
. margins

Predictive margins                              Number of obs     =      1,250
Model VCE    : OIM

Expression   : Pr(insurance), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _outcome |
     Health  |   .2030631   .0225357     9.01   0.000      .158894    .2472323
      HCorp  |   .2586804   .0239616    10.80   0.000     .2117165    .3056442
    SickInc  |   .2197405   .0220424     9.97   0.000     .1765382    .2629428
     MGroup  |   .1871674   .0212356     8.81   0.000     .1455463    .2287884
 MoonHealth  |   .1313486    .018806     6.98   0.000     .0944895    .1682078
------------------------------------------------------------------------------
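The results indicate that, for the data at hand, 20.3% of individuals are predicted to choose the Health plan.

Next, change the value of deductible and see how the share of the Health plan responds. Consider a 10% increase in the deductible of the Health plan: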
. margins, at(deductible=generate(deductible*1.10)) alternative(Health)

Predictive margins                              Number of obs     =      1,250
Model VCE    : OIM

Expression   : Pr(insurance), predict()
Alternative  : Health
at           : deductible      = deductible*1.10

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _outcome |
     Health  |   .1867732   .0218897     8.53   0.000     .1438702    .2296762
      HCorp  |   .2662112   .0245576    10.84   0.000     .2180792    .3143433
    SickInc  |   .2241216   .0223748    10.02   0.000     .1802678    .2679755
     MGroup  |   .1902699   .0214875     8.85   0.000     .1481552    .2323847
 MoonHealth  |    .132624   .0189892     6.98   0.000     .0954058    .1698422
------------------------------------------------------------------------------
When the deductible of the Health plan rises by 10%, the predicted share of individuals choosing Health falls to 18.7%.
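The size of this response can be summarized with a little arithmetic on the two predicted shares. The sketch below uses only the margins reported above; the scalar names are illustrative. It should give a drop of about 1.6 percentage points, a relative change of about -8%, and an implied arc elasticity of about -0.8.

```stata
* Predicted Health share before and after the 10% deductible increase
scalar s0 = .2030631
scalar s1 = .1867732

* Change in percentage points
display 100*(s1 - s0)

* Relative change in the share, in percent
display 100*(s1 - s0)/s0

* Implied arc elasticity with respect to the Health deductible
display ((s1 - s0)/s0)/0.10
```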

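Note: this example is for demonstration only and estimates a simple random-parameters logit model. In applied work, one should try different distributions for the random parameters, as well as different draw sequences and numbers of integration points.

In particular, as the number of variables grows, each variable needs to be tested repeatedly to decide whether it should be modeled as a random parameter.

In addition, the Nlogit software can also estimate random-parameters logit models and is more flexible than Stata: users can write their own utility functions, estimate nonlinear utility functions, and treat the coefficients on individual-specific variables as random. Interested readers may wish to try Nlogit; references and books on it are listed at the end of this article.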

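3.4 Latent Class Logit Model
Pacifico (2012) wrote a user-written command, lclogit, that estimates latent class logit models with the EM algorithm (see also reference [3]). This article uses lclogit. The command must be installed before use; the installation is not shown here.

Because the optimal number of latent classes is not known in advance, the appropriate number must be chosen from the CAIC and BIC values of models with different numbers of classes before the final model is estimated.

First, fit latent class logit models with 2 to 5 classes to determine the best number of classes. Their summary statistics are: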
Classes        LLF   Nparam       CAIC        BIC
-------------------------------------------------
      2  -307.8538        6   654.8363   648.8363
      3  -306.0769       10   677.3684   667.3684
      4  -305.9623       14   703.2252   689.2252
      5  -306.0615       18   729.5093   711.5093
With 2 classes, both CAIC and BIC reach their minimum, so the 2-class latent class logit model is selected.
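The two criteria in the table can be reproduced from the log likelihood and the number of parameters. The sketch below assumes BIC = -2 lnL + k ln(N) and CAIC = -2 lnL + k (ln(N) + 1) with N = 250 individuals; these formulas are an inference that matches the reported values, not something stated in the lclogit output shown here. For the 2-class row, the results should be about 648.84 and 654.84.

```stata
* 2-class model: log likelihood and number of parameters from the table
scalar ll = -307.8538
scalar k  = 6

* Assumption: N is the number of individuals, not the number of rows
scalar N  = 250

* BIC = -2*lnL + k*ln(N)
display -2*ll + k*ln(N)

* CAIC = -2*lnL + k*(ln(N) + 1)
display -2*ll + k*(ln(N) + 1)
```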

Building on this model, we add the plan dummy variables and estimate the 2-class latent class logit model:
. lclogit choice Health HCorp SickInc MGroup premium deductible, group(id) id(id) nclasses(2) membership(income) seed(123)

Latent class model with 2 latent classes

Choice model parameters and average classs shares
--------------------------------
    Variable |  Class1   Class2
-------------+------------------
      Health |   6.081    2.841
       HCorp |   4.892    2.484
     SickInc |   3.626    1.780
      MGroup |   1.947    0.829
     premium |  -3.052   -2.347
  deductible |  -1.206   -0.829
-------------+------------------
 Class Share |   0.379    0.621
--------------------------------

Class membership model parameters : Class2 = Reference class
--------------------------------
    Variable |  Class1   Class2
-------------+------------------
      income |   2.556    0.000
       _cons | -13.981    0.000
--------------------------------
Note: Model estimated via EM algorithm
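The table above shows that class 2 is the reference class and reports the share of each class. Since we are more interested in the parameter estimates, we do not discuss these results further and turn to the estimates: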
. lclogitml, iter(10)

Latent class model with 2 latent classes

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
choice1      |
      Health |   6.565479   1.602333     4.10   0.000     3.424965    9.705994
       HCorp |    5.16454   1.261782     4.09   0.000     2.691492    7.637588
     SickInc |   3.777135   .9423269     4.01   0.000     1.930208    5.624062
      MGroup |   2.009132   .7347536     2.73   0.006     .5690411    3.449222
     premium |  -3.465304   .5205146    -6.66   0.000    -4.485494   -2.445114
  deductible |  -1.321517   .5371505    -2.46   0.014    -2.374312   -.2687208
-------------+----------------------------------------------------------------
choice2      |
      Health |   2.932226   .8614032     3.40   0.001     1.243907    4.620546
       HCorp |   2.516419   .6787702     3.71   0.000     1.186054    3.846784
     SickInc |   1.803789   .5060035     3.56   0.000     .8120409    2.795538
      MGroup |   .8331806   .3580345     2.33   0.020     .1314458    1.534915
     premium |  -2.232606   .2445483    -9.13   0.000    -2.711912   -1.753301
  deductible |  -.7977167   .3597044    -2.22   0.027    -1.502724    -.092709
-------------+----------------------------------------------------------------
share1       |
      income |    249.251   .8743337   285.08   0.000     247.5373    250.9646
       _cons |  -1418.197          .        .       .            .           .
------------------------------------------------------------------------------
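The estimates show that in both class 1 and class 2, holding the other regressors (premium, deductible) equal, individuals are most likely to choose the Health plan. Compared with class 2, class 1 is more sensitive to premiums and deductibles.

In addition, the higher a plan's premium and deductible, the lower the probability that it is chosen, which is consistent with the conditional logit results.

Compared with class 2, members of class 1 are more likely to have higher annual incomes. Note, however, that the membership coefficients are extremely large and the constant has no standard error, so this part of the output should be read with caution.

As before, odds ratios can be obtained as exp(B); this is not demonstrated here.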
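As a brief illustration of the exp(B) calculation, the sketch below exponentiates the premium and deductible coefficients from the lclogitml table. The premium odds ratios should be roughly 0.03 in class 1 and 0.11 in class 2, in line with class 1 being the more price-sensitive group.

```stata
* Odds ratios for premium in class 1 and class 2
display exp(-3.465304)
display exp(-2.232606)

* Odds ratios for deductible in class 1 and class 2
display exp(-1.321517)
display exp(-.7977167)
```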

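4 Conclusion
This article has used a worked example to show how to estimate conditional logit, mixed logit, random-parameters logit, and latent class logit models in Stata, with particular attention to the two heterogeneity models, random-parameters logit and latent class logit. We hope it helps readers who use these models in their research.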

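Note that this article does not compare the models. In applied work, measures such as pseudo R-squared, predictive accuracy, and BIC can be used to select the best model.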
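As one possible starting point, information criteria can be computed by hand from the reported log likelihoods. The sketch below compares the mixed logit model with income against the random-parameters logit model. The parameter counts (10 and 12) are read off the coefficient tables, and N = 250 individuals is an assumption; the scalar names are illustrative. With these inputs the two AIC values are nearly equal (about 601.9 and 601.7), while BIC favors the mixed logit model (about 637.1 against 644.0).

```stata
* Log likelihoods reported in the output above
scalar ll_ml  = -290.93207
scalar ll_rpl = -288.87435

* Assumed parameter counts: 10 (mixed logit with income), 12 (random parameters)
scalar k_ml  = 10
scalar k_rpl = 12

* AIC = -2*lnL + 2*k
scalar aic_ml  = -2*ll_ml  + 2*k_ml
scalar aic_rpl = -2*ll_rpl + 2*k_rpl

* BIC = -2*lnL + k*ln(N), with N = 250 individuals (an assumption)
scalar bic_ml  = -2*ll_ml  + k_ml*ln(250)
scalar bic_rpl = -2*ll_rpl + k_rpl*ln(250)

display "Mixed logit:       AIC = " aic_ml  "  BIC = " bic_ml
display "Random parameters: AIC = " aic_rpl "  BIC = " bic_rpl
```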

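In the author's experience, the Stata commands are simple to use but also fairly limited. Interested readers may consider Nlogit for estimating discrete choice models.

If readers are interested, a follow-up post could cover estimating random-parameters logit models with nonlinear utility functions, and ordered random-parameters logit models, in Nlogit.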

5 References
[1] Train K. Discrete Choice Methods with Simulation [M]. Second edition. Cambridge: Cambridge University Press, 2009.
[2] Mannering F L, Bhat C R. Analytic methods in accident research: methodological frontier and future directions [J]. Analytic Methods in Accident Research, 2014, 1: 1-22.
[3] Pacifico D, Yoo H I. lclogit: A Stata command for fitting latent-class conditional logit models via the expectation-maximization algorithm [J]. The Stata Journal, 2013, 13(3): 625-639.
[4] StataCorp. Stata Choice Models Reference Manual, Release 16 [M]. College Station, TX: StataCorp LLC, 2019.
[5] Chen Qiang (陈强). Advanced Econometrics and Stata Applications (高级计量经济学及Stata应用) [M]. Second edition. 2013.
Reference books for learning Nlogit:
[1] Hensher D A, Rose J M, Greene W H. Applied Choice Analysis [M]. Second edition. Cambridge: Cambridge University Press, 2015.
[2] Greene W H. NLOGIT Version 6 Reference Guide [S]. Plainview, NY: Econometric Software, 2016.