CMU高级机器学习非参数贝叶斯模型

合集下载
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Advanced Machine Learning Nonparametric Bayesian Models 一
Learning/Reasoning in Open Possible Worlds —Learning/Reasoning Eric Xing Lecture 17, August 14, 2009 Reading: Eric Xing © Eric Xing @ CMU, 2006-2009 1 Clustering Eric Xing © Eric Xing @ CMU, 2006-2009 2 1
Image Segmentation How to segment images? Manual segmentation (very expensive Algorithm segmentation K-means Statistical mixture models Spectral clustering Problems with most existing algorithms Ignore 什w spatial information Perform the segmentation one image at a time Need to specify the number of segments a priori Eric Xing © Eric Xing @ CMU,
2006-2009 3 Object Recognition and Tracking (1.9, 9.0, 2.1 (1.& 7.4, 2.3 (1.9, 6.1, 2.2 (0.7, 5.1, 3.2 (0.9, 5.& 3」(0.6, 5.9, 3.2 t=l Eric Xing t=2 © Eric Xing @ CMU, 2006-2009 t=3 4 2
Modeling The Mind •…Latent brain processes: View picture Read sentence Decide whether consistent fMRI scan:》....... Eric Xing ... t=l © Eric Xing @ CMU, 2006・
2009 t=T 5 The Evolution of Science Research circles Phy Research topics Bio CS PNAS papers 1900 Eric Xing © Eric Xing @ CMU, 2006-2009 2000 6?3
A Classical Approach Clustering as Mixture Modeling Then M model selection" Eric Xing © Eric Xing @ CMU, 2006-2009 7 Partially Observed, Open and Evolving Possible Worlds Unbounded # of objects/trajectories Changing attributes Birth/death, merge/split Relational ambiguity Tlie parametric paradigm: p cpkO ({ } or p({(p } 1:T k Evunt model p (pkt +1 motion model t k ({ } {cp } E*+l|t+l t Finite Entity space Stinctxirally unambiguous E*|t t Sensor model p(x | {cpk } obsen*ation space How to open it up? © Eric Xing @ CMU, 2006-2009 8 Eric Xing 4
Model Selection vs. Posterior Inference Model selection H intelligent H guess: ??? cross validation: data-hungry information theoretic: AIC TIC MDL : " arg min KL f (・ I g (・ | 6 ML , K Parsimony, Ockam's Razor need to compute data likeliliood ( Bayes factor: Posterior inference: we want to handle uncertainty of model complexity explicitly p (M I D 00 p (D I M p (M we favor a distribution that does not constrain M in a "closed" space! M = {0 , K } Eric Xing
© Eric Xing @ CMU, 2006-2009 9 Two "Recent" Developments First order probabilistic languages (FOPLs Examples: PRM. BLOG ・・・ Lift giaphical models to M open M world relation, index, lifespan ・・・ Focus on complete, consistent, and operating rules to instantiate possible worlds, and formal language of expressing such rules Operational way of defining distributions over possible worlds, via sampling methods Bayesian Nonparametrics Examples: Dirichlet processes, stick-breaking processes ・・・ From finite, to infinite mixture, to more complex constnictions (hierarchies, spatial/temporal sequences,・・・ Focus on the laws and behaviors of both the generative formalisms and resulting distributions Often offer explicit expression of distributions, and expose the stnjcture of the distributions ― motivate various approximate schemes Eric Xing © Eric Xing @ CMU, 2006-2009 10 5
Clustering How to label them ? How many clusters ??? Eric Xing © Eric Xing @ CMU, 2006-2009 11 Random Partition of Probability Space {<p4 , TT 4 } {cp6,兀6 } {<p5 , 7i 5 } {(p3 , IT 3 } {(pl 1 ・(event,pe\*ent centroid :=(p linage ele. :=(xQ 述} {q>2,兀2 } •… Eric Xing © Eric Xing @ CMU, 2006-2009 12 6
Stick-breaking Process 兀G =》k 6 (0k 0k - GO k =1 k =1 co cc 0 0.6 0.4 0.5 0.8 0.4 0.3 0.24 兀》k =1 k ・1 Location 0.3 nk = pk H(1 - pk j =1 [3k 〜Beta(l, a Eric Xing Mass © Eric Xing @ CMU, 2006-2009 GO 13 Cliinese Restaurant Process 01 02 P(ci = k | c -i =11 1+a 1 2 +a 1 3+a ml i + a -1 a 1+a 1 2 +a 2 3 +a m2 i + a -1 000a2+aa3 +a .... ai + a-1 CRP defines an exchangeable distribution on partitions over an (infinite sequence of samples, such a distribution is formally known as 什Dirichlet Process (DP Eric Xing © Eric Xing @ CMU, 2006-2009 14 7
Diriclilet Process(p4 <p6 <p3 <p5 (p(pl 2 a distribution A CDF, G, on possible worlds of random partitions follows a Dirichlet Process if for any measurable finite partition (cpl,<p2,・・,(pm: (G((pl, G(cp2, G((pm 〜Dii'iclilet( aG0((pl, ・・・・, aG0((pm another distribution where GO is the base measure and a is the scale parameter Thus a Dirichlet
Process G defines a distribution of distribution Eric Xing © Eric Xing @ CMU, 2006- 2009 15 Gi'apliical Model Representations of DP GO GO a G 0i xi N a K oo 0 yi xi N The Stick-breaking
construction The CRP constmction Eric Xing © Eric Xing @ CMU, 2006-2009 16 8 Ancestral Inference ? Ak 0k Hnl Gn Hii2 N Essentially a clustering problem, but ・・・Better recovery of the ancestors leads to better haplotyping results (because of more accurate grouping of common haplotypes True haplotypes are obtainable with high cost, but they can validate model more subjectively (as opposed to examining saliency of clustering Eric Xing Many other biological/scientific utilities © Eric Xing @ CMU, 2006-2009 17 Example:
DP-haplotypei' Clustering human populations a G K [Xing et al, 2004] GO DP A 0 Hii2 Gn N infinite mixture components (for population haplotypes haplotypes Hnl Likelihood model (for individual haplotypes and genotypes genotypes Inference: Markov Chain Monte Carlo (MCMC Gibbs sampling Metropolis Hasting Eric Xing © Eric Xing @ CMU, 2006-2009 18 9
The DP Mixture of Ancestral Haplotypes The customers around a table in CRP form a cluster associate a mixture component (i.e., a population haplotype with a table sample {a, 0} at each table from a base measure GO to obtain the population haplotype and nucleotide substitution frequency for that component 1 4 8 9 {A.0} 3 2 {A50} 5 {A,0} 6 {A,0} 7 {A,0} {A,6} ... Withp(h|{A, 0} andp(g|hl,h2, the CRP yields a posterior distribution on the number of population haplotypes (and on the haplotype configurations and the nucleotide substitution frequencies Eric Xing © Eric Xing @ CMU, 2006-2009 19 Inheritance and Observation Models Single-locus mutation model ACi —> H ie e Cil C i2 for ht = a t ( 0 I PH ( ht | a t, 0 = I 1-0 I B | - 1 for ht a t l—ht = at with prob ・ 0 Al A2 A3 •…Ancestral pool H il H i2 Haplotypes Noisy observation model H il,Hi2->GiPG(g|hl,h2:gt = hl,t ㊉h2,t with prob. X Gi © Eric Xing @ CMU, 2006-2009 Genotype Eric Xing 20 10
Con vargence -of Ancwstrai Inference
DP vs. Finite
E H
IMCMC 1ar Haplotype Inference
• ■ t •
• O DCO unpng»x<«QXtin;Lta (mtMDr-eM )tafuiLnM bcp^^oailnsiM
•冷tr ・tw •&.
xq
• • u » • ■ h
: *■
■■
MCMC ior Haplotype inference
a StnRi*. ton

Octo anting uftitoc n not «ttoo« moi^h k>nte (jito IM KAMI RitHm
• TnniaM «nofeTrtkn am :■ tonwMsd n
h« ul rt4*at x^satif 号uM
• VMrxMxrw (rie<wK«uA w ee mMioaucfi aM» tffTOWatWTM^ a »m (CKa
• httiwaMini. I. t; waanwlMTMii 事用・口山& ■
W X. 1 « MiuiXfH/ X!:^:X». Approx imations to DP
• g, t
is
IS
Summary A non-parametric Bayesian model for Pattern Unco very Finite mixture model of latent patterns (e.g., image segments, objects infinite mixture of propotypes: alternative to model selection hierarchical infinite mixture infinite hidden Markov model temporal infinite mixture model Applications in general data-miiiiiig ・・・ Eric Xing © Eric Xing @ CMU, 2006-2009 31 16。

相关文档
最新文档