HMM Toolbox Command Usage Guide
HOW TO USE THE HMM TOOLBOX (MATLAB)
1. Hidden Markov models with discrete outputs (DHMM)
Maximum likelihood parameter estimation using EM (Baum-Welch algorithm)
The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic matrices as follows.
O = 3; % number of output symbols
Q = 2; % number of hidden states
prior0 = normalise(rand(Q,1));        % initial state distribution
transmat0 = mk_stochastic(rand(Q,Q)); % state transition matrix
obsmat0 = mk_stochastic(rand(Q,O));   % observation (emission) matrix
Now we sample nex=20 sequences of length T=10 each from this model, to use as training data.
T = 10;   % length of each sequence
nex = 20; % number of sample sequences
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);
Here data is 20x10. Now we make a random guess as to what the parameters are:
prior1 = normalise(rand(Q,1));        % initial state probabilities
transmat1 = mk_stochastic(rand(Q,Q)); % state transition matrix
obsmat1 = mk_stochastic(rand(Q,O));   % observation (emission) matrix
and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, ...
    'max_iter', 5);
% prior2, transmat2, obsmat2 are the learned initial state probabilities,
% state transition matrix, and observation (emission) matrix
LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.
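For instance, a minimal learning-curve plot (plain MATLAB, not a toolbox function):
plot(LL, '-o');  % one point per EM iteration
xlabel('iteration'); ylabel('log-likelihood');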
Sequence classification
To evaluate the log-likelihood of a trained model given test data, proceed as follows:
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2) % score test data under the trained HMM
Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence data cannot contain any 0s.
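If your symbols are 0-based, a one-line shift fixes this (assuming the alphabet is {0, ..., O-1}):
data = data + 1;  % map {0, ..., O-1} to the required {1, ..., O}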
To classify a sequence into one of k classes, train up k HMMs, one per class, and then compute the log-likelihood that each model gives to the test sequence; if the i'th model is the most likely, then declare the class of the sequence to be class i, as in the sketch below.
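A minimal sketch of this decision rule; the cell arrays priors, transmats, obsmats (holding the k trained models) and the variable test_seq are hypothetical names:
scores = zeros(1,k);
for c = 1:k
  scores(c) = dhmm_logprob(test_seq, priors{c}, transmats{c}, obsmats{c});
end
[~, class] = max(scores);  % the class whose HMM assigns the highest log-likelihood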
Computing the most probable sequence (Viterbi)
First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t and i:
B = multinomial_prob(data, obsmat);
Then you can use
[path] = viterbi_path(prior, transmat, B)
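For example, to decode the first training sequence with the parameters learned above (reusing the variables from the earlier snippets):
B = multinomial_prob(data(1,:), obsmat2);  % B(i,t) for sequence 1
path = viterbi_path(prior2, transmat2, B)  % most probable hidden state sequence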
2. Hidden Markov models with mixture-of-Gaussians outputs (GHMM)
Maximum likelihood parameter estimation using EM (Baum-Welch)
Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.
O = 2;    % dimensionality of each observation vector
T = 50;   % length of each sequence
nex = 50; % number of sequences
data = randn(O,T,nex);
Now let us fit a mixture of M=2 Gaussians to each of the Q=2 states, initialising with K-means.
M = 2;             % number of mixture components per state
Q = 2;             % number of hidden states
cov_type = 'full'; % covariance type ('full' or 'diag'); missing from the original snippet
left_right = 0;    % 0 = ergodic model (no left-right constraint)
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);         % means: one O-vector per (state, component) pair
Sigma0 = reshape(Sigma0, [O O Q M]); % covariances
mixmat0 = mk_stochastic(rand(Q,M));  % mixture weights per state
Finally, let us improve these parameter estimates using EM.
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
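As in the discrete case, the log-likelihood of test data under the trained model can be computed with
loglik = mhmm_logprob(data, prior1, transmat1, mu1, Sigma1, mixmat1)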
Since EM only finds a local optimum, good initialisation is crucial. The initialisation procedure illustrated above is very crude and is probably not adequate for real applications; the BNT documentation contains a real-world example of EM with mixtures of Gaussians.
What to do if the log-likelihood becomes positive?
It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x) >= 0 for all x and ∫ p(x) dx = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (i.e., not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes, e.g., by standardizing them (see KPMstats/standardize).
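A minimal sketch of such standardization (zero mean, unit variance for each of the O components), in the spirit of KPMstats/standardize; it assumes data is the O x T x nex array from above and relies on implicit expansion (MATLAB R2016b+):
X = reshape(data, O, []);           % collapse the time and sequence dimensions
X = (X - mean(X,2)) ./ std(X,0,2);  % standardize each component
data = reshape(X, O, T, nex);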
This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.
What to do if the log-likelihood decreases during EM?
Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease.