Modeling Musical Emotion Dynamics (Model-Based Analysis of Musical Emotion)

6. Y. E. Kim and E. M. Schmidt, “Modeling musical emotion dynamics with conditional random fields”, in ISMIR, 2011
7. Hanna M. Wallach, “Conditional Random Fields: An Introduction”, 2004
8. Greg Welch and Gary Bishop, “An Introduction to the Kalman Filter”, 2006
Adding the known bias, we get the final estimate as:
Experimental Results
Emotion label preprocessing: gray dots indicate individual second-by-second labels, red ellipses indicate the estimates of the distribution, and blue ellipses indicate the predictions made with the Kalman filter
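The distribution estimates drawn as ellipses can be illustrated with a minimal sketch. It assumes that the labels collected for one second of audio are summarized by a single 2-D Gaussian (sample mean and covariance), which the caption suggests but the slides do not spell out. The function names and the label values are hypothetical.

```python
import numpy as np

def fit_gaussian(labels):
    """Fit a 2-D Gaussian to the (valence, arousal) labels of one second."""
    labels = np.asarray(labels, dtype=float)
    mean = labels.mean(axis=0)
    cov = np.cov(labels, rowvar=False)  # unbiased sample covariance, shape (2, 2)
    return mean, cov

def ellipse_axes(cov, n_std=1.0):
    """Semi-axis lengths (major first) and orientation of the covariance ellipse."""
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    eigvals = np.maximum(eigvals, 0.0)           # guard against tiny negative round-off
    major = eigvecs[:, 1]                        # direction of largest variance
    angle = np.arctan2(major[1], major[0])       # radians, relative to the valence axis
    return n_std * np.sqrt(eigvals[::-1]), angle

# Hypothetical ratings from five listeners for the same second of a clip
labels = [(0.20, 0.50), (0.30, 0.40), (0.10, 0.60), (0.25, 0.55), (0.15, 0.45)]
mean, cov = fit_gaussian(labels)
axes, angle = ellipse_axes(cov)
```

The mean gives the ellipse center, and the eigen-decomposition of the covariance gives its axes and tilt.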
CRF: Dynamic Programming
The right-hand side can be rewritten as:
Using the forward-backward algorithm, define:
CRF: Dynamic Programming (Cont.)
with the recursion relations:
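The slide equations for these recursions did not survive extraction. As a hedged illustration, the sketch below follows the standard matrix form of forward-backward for a linear-chain CRF: forward vectors obey alpha_i^T = alpha_{i-1}^T M_i and backward vectors obey beta_i = M_{i+1} beta_{i+1}. The function name, the array layout of `M`, the fixed start label, and the omission of an explicit stop state are all assumptions made for this example.

```python
import numpy as np

def forward_backward(M, start=0):
    """Forward-backward for a linear-chain model.

    M has shape (n, K, K); M[i, a, b] is the (unnormalized) potential of label a
    at position i followed by label b at position i + 1. Position 0 is fixed to
    the label `start`. Works in probability space, so it suits short chains only.
    """
    n, K, _ = M.shape
    alpha = np.zeros((n + 1, K))
    beta = np.ones((n + 1, K))
    alpha[0, start] = 1.0
    for i in range(1, n + 1):
        alpha[i] = alpha[i - 1] @ M[i - 1]   # forward recursion
    for i in range(n - 1, -1, -1):
        beta[i] = M[i] @ beta[i + 1]         # backward recursion
    Z = alpha[n].sum()                       # normalization factor Z(x)
    # pairwise[i, a, b] = p(y_i = a, y_{i+1} = b | x)
    pairwise = alpha[:-1, :, None] * M * beta[1:, None, :] / Z
    return alpha, beta, Z, pairwise

# Toy chain: 3 transitions, 2 labels, arbitrary positive potentials
rng = np.random.default_rng(0)
M = rng.uniform(0.5, 1.5, size=(3, 2, 2))
alpha, beta, Z, pairwise = forward_backward(M)
```

The pairwise marginals are what the feature expectations in the likelihood gradient need, so one forward and one backward sweep replace a sum over all label sequences.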
Modeling Musical Emotion Dynamics
Presenter: SHUMIN XU
Arousal-Valence Model
Two-dimensional representation of emotional memory
Arousal: high- vs. low-energy (e.g. energetic vs. calm)
Valence: positive vs. negative (e.g. happy vs. sad)
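As a small illustration of the two axes, the sketch below maps a point in the arousal-valence plane to a coarse quadrant description. The assumption that both axes are centered at zero, and the function name `av_quadrant`, are hypothetical and not taken from the slides.

```python
def av_quadrant(arousal, valence):
    """Describe which quadrant of the arousal-valence plane a point falls in.

    Assumes both coordinates are centered at 0 (e.g. scaled to [-1, 1]).
    """
    energy = "high-arousal" if arousal >= 0 else "low-arousal"
    tone = "positive-valence" if valence >= 0 else "negative-valence"
    return f"{energy}, {tone}"

print(av_quadrant(0.7, 0.6))    # an energetic, happy excerpt
print(av_quadrant(-0.5, -0.4))  # a calm, sad excerpt
```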
Data Training with Kalman Filtering
AKA: Linear Quadratic Estimation (LQE)
Recursive estimator
Predict: uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep, known as the a priori state estimate
Conditional Random Fields (CRF)
Definition: For an observation sequence X and a label sequence Y, let G = (V, E) be a graph such that Y = (Y_v), v ∈ V, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field when the random variables Y_v, conditioned on X, obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G.
Greatly relaxes the independence assumptions
Avoids the label bias problem
Higher complexity
1.
2. Y. E. Kim, E. M. Schmidt, R. Migneco, B. G. Morton, P. Richardson, J. Scott, J. A. Speck, and D. Turnbull, “Music emotion recognition: A state of the art review”, in ISMIR, Utrecht, Netherlands, 2010
3. Y. E. Kim, E. M. Schmidt, and D. Turnbull, “Feature selection for content-based, time-varying musical emotion regression”, in ACM MIR, Philadelphia, PA, 2010
Advantages and Limitations
Smooth and robust estimates
Distribution evolves over time
Limited model complexity: unable to cover the wide variance in emotion-space dynamics
All three become darker as time progresses, i.e. the estimate becomes increasingly indeterminate
Where Z(x) is a normalization factor
Maximum Likelihood Parameter Inference
The log-likelihood is given by:
Differentiating the log-likelihood with respect
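The equations for this step are missing from the extracted slides. The block below is a reconstruction under the standard linear-chain CRF formulation (as in Wallach's introduction, reference 7), not a copy of the original slide. It assumes training pairs (x^(k), y^(k)), weights λ_j, and F_j(y, x) denoting feature j summed over all positions of the sequence.

```latex
\begin{align*}
\mathcal{L}(\lambda)
  &= \sum_k \Big[ \sum_j \lambda_j F_j\big(y^{(k)}, x^{(k)}\big) - \log Z\big(x^{(k)}\big) \Big] \\
\frac{\partial \mathcal{L}}{\partial \lambda_j}
  &= \sum_k \Big[ F_j\big(y^{(k)}, x^{(k)}\big)
     - \mathbb{E}_{p(Y \mid x^{(k)}, \lambda)}\big[ F_j\big(Y, x^{(k)}\big) \big] \Big]
\end{align*}
```

Setting the gradient to zero means the observed feature counts match the counts expected under the model. There is generally no closed-form solution, so the weights are found iteratively, and the expectation term is the quantity that dynamic programming computes.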
4. Y. E. Kim and E. M. Schmidt, “Prediction of time-varying musical mood distributions from audio”, in ISMIR, Utrecht, Netherlands, 2010
5. Y. E. Kim and E. M. Schmidt, “Prediction of time-varying musical mood distributions using Kalman filtering”, in IEEE ICMLA, Washington, D.C., 2010
Conditional Random Fields (Cont.)
Potential functions:
Joint Probability:
Let Rewrite the probability function as:
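The formulas on this slide were lost in extraction. As a hedged reconstruction, the standard linear-chain CRF notation from Wallach's introduction (reference 7) is shown below. The symbols t_j (transition features), s_k (state features), λ_j and μ_k (weights), and the unified features f_j are assumed from that formulation rather than recovered from the slide.

```latex
\begin{align*}
% Potential function at position i
&\exp\Big( \sum_j \lambda_j\, t_j(y_{i-1}, y_i, x, i) + \sum_k \mu_k\, s_k(y_i, x, i) \Big) \\
% Probability of a label sequence given the observations
p(y \mid x, \lambda, \mu)
  &= \frac{1}{Z(x)} \exp\Big( \sum_i \sum_j \lambda_j\, t_j(y_{i-1}, y_i, x, i)
     + \sum_i \sum_k \mu_k\, s_k(y_i, x, i) \Big) \\
% Treat state features as transition features and sum each feature over positions
s(y_{i-1}, y_i, x, i) &= s(y_i, x, i), \qquad
F_j(y, x) = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, x, i) \\
% Rewritten probability function
p(y \mid x, \lambda)
  &= \frac{1}{Z(x)} \exp\Big( \sum_j \lambda_j F_j(y, x) \Big), \qquad
Z(x) = \sum_{y'} \exp\Big( \sum_j \lambda_j F_j(y', x) \Big)
\end{align*}
```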
CRF: Max Likelihood Parameter Inference
The process x is Markov, i.e., p(x_{t+1} | x_0, ..., x_t) = p(x_{t+1} | x_t)
Kalman Filtering
First performing the forward recursions:
Kalman filtering (Cont.)
Then performing the backward recursions:
Update: the current a priori prediction is combined with the current observation to refine the state estimate, termed the a posteriori state estimate
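The predict and update steps, and the forward and backward recursions, can be sketched as follows. The slide equations are not available, so this uses the textbook Kalman filter for the forward pass and assumes that the backward pass is a Rauch-Tung-Striebel smoother. The symbols A, C, Q, R, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x0, P0):
    """Forward recursions. (x0, P0) is the prior on the first state."""
    n, d = len(ys), len(x0)
    x_pred = np.zeros((n, d)); P_pred = np.zeros((n, d, d))
    x_filt = np.zeros((n, d)); P_filt = np.zeros((n, d, d))
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for t, y in enumerate(ys):
        if t > 0:
            # Predict: a priori estimate from the previous a posteriori estimate
            x, P = A @ x, A @ P @ A.T + Q
        x_pred[t], P_pred[t] = x, P
        # Update: blend the prediction with the current observation
        S = C @ P @ C.T + R                  # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (y - C @ x)
        P = (np.eye(d) - K @ C) @ P
        x_filt[t], P_filt[t] = x, P
    return x_filt, P_filt, x_pred, P_pred

def rts_smoother(x_filt, P_filt, x_pred, P_pred, A):
    """Backward recursions (Rauch-Tung-Striebel smoother)."""
    x_s, P_s = x_filt.copy(), P_filt.copy()
    for t in range(len(x_filt) - 2, -1, -1):
        G = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])   # smoother gain
        x_s[t] = x_filt[t] + G @ (x_s[t + 1] - x_pred[t + 1])
        P_s[t] = P_filt[t] + G @ (P_s[t + 1] - P_pred[t + 1]) @ G.T
    return x_s, P_s

# Toy example: a slowly drifting 2-D emotion state observed with noise
rng = np.random.default_rng(0)
A = C = np.eye(2)
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(2)
truth = np.cumsum(rng.normal(scale=0.1, size=(30, 2)), axis=0)
ys = truth + rng.normal(scale=np.sqrt(0.1), size=(30, 2))
x_filt, P_filt, x_pred, P_pred = kalman_filter(ys, A, C, Q, R, np.zeros(2), np.eye(2))
x_s, P_s = rts_smoother(x_filt, P_filt, x_pred, P_pred, A)
```

The forward pass uses only past and present observations. The backward pass revisits each estimate using the later ones, which is why smoothed covariances are never larger than the filtered ones.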
State probabilities: state of emotion
Recognition: emotion changes over time
Emotion Space Heatmap Prediction
Advantages and Disadvantages
CRF in Musical Emotion Recognition
Label: A-V modeled acoustic data
Observation: Mel-frequency cepstral coefficients (MFCC)
Transition probabilities: emotions tend to change smoothly
Linear Gauss-Markov model
Statistical Assumptions
Driving noise w and observation noise v are zero mean Gaussian
w and v are independent of X and Y
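A linear Gauss-Markov model under these assumptions can be simulated with a short sketch. The model equations are not present in the extracted slides, so the usual form x_{t+1} = A x_t + w_t, y_t = C x_t + v_t with w ~ N(0, Q) and v ~ N(0, R) is assumed. The matrix names, the function name `simulate_lgm`, and the parameter values are hypothetical.

```python
import numpy as np

def simulate_lgm(A, C, Q, R, x0, n_steps, seed=0):
    """Draw a state and observation sequence from a linear Gauss-Markov model."""
    rng = np.random.default_rng(seed)
    d, m = A.shape[0], C.shape[0]
    xs = np.zeros((n_steps, d))
    ys = np.zeros((n_steps, m))
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        xs[t] = x
        # Observation noise v: zero-mean Gaussian, drawn independently of w
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(m), R)
        # Driving noise w: zero-mean Gaussian; the next state depends only on x
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)
    return xs, ys

A = 0.95 * np.eye(2)          # slow decay toward the origin
C = np.eye(2)                 # the state is observed directly
Q, R = 0.01 * np.eye(2), 0.05 * np.eye(2)
xs, ys = simulate_lgm(A, C, Q, R, x0=[0.5, -0.2], n_steps=50)
```

Because each new state is computed from the current state and fresh noise only, the simulated process is Markov by construction.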