2 + (-1) + 1 = 2
● If the sum is at least 0: "yes", otherwise: "no"
Sequential Data Modeling – Conditional Random Fields
Review: Mathematical Formulation
y = sign(w⋅ϕ(x)) = sign(Σ_{i=1}^{I} w_i⋅ϕ_i(x))
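As a rough sketch, this prediction rule can be written in Python with sparse feature dictionaries (the dict representation and the helper name `predict_one` follow the pseudocode style used later in the slides; they are illustrative, not the slides' own code):

```python
def predict_one(w, phi):
    """Return sign(w . phi(x)): +1 if the weighted feature sum is >= 0, else -1."""
    score = sum(w.get(name, 0.0) * value for name, value in phi.items())
    return 1 if score >= 0 else -1
```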
Gradient of the Logistic Function
● Take the derivative of the probability:

dP(y=1∣x)/dw = d/dw [ e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x))) ]
             = ϕ(x) e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
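A quick numerical sketch of this derivative (dict-based sparse features assumed; `gradient_y1` is an illustrative name): the scalar coefficient e^z/(1+e^z)², with z = w⋅ϕ(x), multiplies every component of ϕ(x).

```python
import math

def gradient_y1(w, phi):
    """dP(y=1|x)/dw = phi(x) * e^z / (1 + e^z)^2, where z = w . phi(x)."""
    z = sum(w.get(k, 0.0) * v for k, v in phi.items())
    coeff = math.exp(z) / (1 + math.exp(z)) ** 2
    return {k: coeff * v for k, v in phi.items()}
```

Note that at z = 0 the coefficient is exactly 0.25, the steepest point of the logistic curve.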
w ← w + (−0.25) ϕ(x)

wunigram "A" = -0.25        wunigram "site" = -0.25     wunigram "," = -0.5
wunigram "located" = -0.25  wunigram "in" = -0.25       wunigram "Maizuru" = -0.25
wunigram "Kyoto" = -0.25
● The logistic function is a "softened" version of the function used in the perceptron

P(y=1∣x) = e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))

[Plots: "Perceptron" step function and "Logistic Function" curve, p(y|x) against w⋅ϕ(x)]
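A minimal sketch of the logistic probability (dict-based sparse features assumed; the function name is illustrative):

```python
import math

def p_y1(w, phi):
    """P(y=1|x) = e^(w.phi) / (1 + e^(w.phi)): a smooth version of the step function."""
    z = sum(w.get(k, 0.0) * v for k, v in phi.items())
    return math.exp(z) / (1 + math.exp(z))
```

At w⋅ϕ(x) = 0 this gives exactly 0.5, and it approaches 0 or 1 as the score grows large and negative or positive.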
ŵ = argmax_w Π_i P(y_i ∣ x_i ; w)

● How do we solve this?
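To make the objective concrete, here is a sketch of the conditional likelihood being maximized (dict-based features and the function name are assumptions): P(y=−1|x) is 1 − P(y=1|x), and the objective multiplies these probabilities over the training set.

```python
import math

def conditional_likelihood(w, data):
    """Product over (phi, y) pairs of P(y | x; w) under the logistic model."""
    total = 1.0
    for phi, y in data:
        z = sum(w.get(k, 0.0) * v for k, v in phi.items())
        p_pos = math.exp(z) / (1 + math.exp(z))   # P(y=1|x)
        total *= p_pos if y == 1 else 1 - p_pos   # P(y=-1|x) = 1 - P(y=1|x)
    return total
```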
Review: Perceptron Training Algorithm
create map w
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y
            w += y * phi
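The pseudocode above can be fleshed out as runnable Python (a sketch: the unigram features and the ±1 update match the slides' examples, while the exact data structures are assumptions):

```python
from collections import defaultdict

def create_features(x):
    """Unigram features: one count per word, as in the slides' examples."""
    phi = defaultdict(int)
    for word in x.split():
        phi["unigram " + word] += 1
    return phi

def predict_one(w, phi):
    score = sum(w[name] * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def train_perceptron(data, iterations):
    """data: list of (sentence, y) pairs, y in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            phi = create_features(x)
            if predict_one(w, phi) != y:      # mistake: move w toward y * phi
                for name, value in phi.items():
                    w[name] += y * value      # w += y * phi
    return w
```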
P(y=1∣x) = 1 if w⋅ϕ(x) ≥ 0
P(y=1∣x) = 0 if w⋅ϕ(x) < 0

[Plot: step function p(y|x) against w⋅ϕ(x)]
The Logistic Function
Sequential Data Modeling Conditional Random Fields
Graham Neubig Nara Institute of Science and Technology (NAIST)
[Plot: the gradient dp(y|x)/d(w⋅ϕ(x)) against w⋅ϕ(x), a bell-shaped curve peaking at 0.25 when w⋅ϕ(x) = 0]
dP(y=−1∣x)/dw = d/dw [ 1 − e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x))) ]
              = −ϕ(x) e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
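A small sanity check of this derivative with respect to z = w⋅ϕ(x) (illustrative code, not from the slides): the analytic slope of P(y=−1|x) should match a centered finite-difference estimate.

```python
import math

def p_neg(z):
    """P(y=-1|x) = 1 - e^z / (1 + e^z), with z = w . phi(x)."""
    return 1 - math.exp(z) / (1 + math.exp(z))

def grad_neg(z):
    """The slide's analytic derivative: -e^z / (1 + e^z)^2 (times phi(x))."""
    return -math.exp(z) / (1 + math.exp(z)) ** 2

# Compare the analytic form against a finite difference at an arbitrary point
eps = 1e-6
numeric = (p_neg(0.3 + eps) - p_neg(0.3 - eps)) / (2 * eps)
```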
● Predict: Yes! / No!
● This is binary classification (of course!)
Review: Linear Prediction Model
● Each element that helps us predict is a feature
[Plots: perceptron step function and logistic curve, p(y|x) against w⋅ϕ(x)]

● Can account for uncertainty
● Differentiable
● x: the input
● ϕ(x): vector of feature functions {ϕ1(x), ϕ2(x), …, ϕI(x)}
● w: the weight vector {w1, w2, …, wI}
● y: the prediction, +1 if "yes", -1 if "no"

P(y∣x) is useful for:
● Estimating confidence in predictions
● Combining with other systems
● However, the perceptron only gives us a prediction:
y = sign(w⋅ϕ(x))
● In other words:
Logistic Regression
● Train based on conditional likelihood:
  Find the parameters w that maximize the conditional likelihood of all answers y_i given the examples x_i
● In other words:
  ● For every training example, calculate the gradient (the direction that will increase the probability of y)
  ● Move in that direction, multiplied by learning rate α
contains "priest"    contains "(<#>-<#>)"    contains "site"    contains "Kyoto Prefecture"

● Each feature has a weight, positive if it indicates "yes", and negative if it indicates "no"
● Binary Prediction (2 choices)
● Multi-class Prediction (several choices)
● Structured Prediction (millions of choices)

I read a book → N VBD DET NN
Given x:                          Predict y:
  A book review                     Is it positive?
    Oh, man I love this book!         yes
    This book is so boring...         no
  A tweet                           Its language
    On the way to the park!           English
    公園に行くなう!                     Japanese
  A sentence                        Its parts-of-speech
    I read a book
wcontains "priest" = 2
wcontains "site" = -3
wcontains "(<#>-<#>)" = 1
wcontains "Kyoto Prefecture" = -1
● For a new example, sum the weights
Kuya (903-972) was a priest born in Kyoto Prefecture.
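Summing the example weights for this sentence (a sketch using the slide's numbers: the sentence fires "priest", "(<#>-<#>)", and "Kyoto Prefecture", but not "site"):

```python
# Weights from the slide above
w = {'contains "priest"': 2, 'contains "site"': -3,
     'contains "(<#>-<#>)"': 1, 'contains "Kyoto Prefecture"': -1}
# Features that fire for the Kuya sentence
fired = ['contains "priest"', 'contains "(<#>-<#>)"', 'contains "Kyoto Prefecture"']
score = sum(w[f] for f in fired)          # 2 + (-1) + 1 = 2
answer = "yes" if score >= 0 else "no"    # sum is at least 0, so "yes"
```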
w⋅ϕ(x) = −1

dP(y=1∣x)/dw = 0.196 ϕ(x)

w ← w + 0.196 ϕ(x)

wunigram "A" = -0.25        wunigram "site" = -0.25     wunigram "located" = -0.25
wunigram "Maizuru" = -0.25  wunigram "," = -0.304       wunigram "in" = -0.054
wunigram "Kyoto" = -0.054   wunigram "Shoken" = 0.196   wunigram "monk" = 0.196
wunigram "born" = 0.196
Prediction Problems
Given x, predict y
Logistic Regression
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person
● In other words:
  ● Try to classify each training example
  ● Every time we make a mistake, update the weights
Stochastic Gradient Descent
● Online training algorithm for probabilistic models (including logistic regression)

create map w
for I iterations
    for each labeled pair x, y in the data
        w += α * dP(y|x)/dw
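A runnable sketch of this loop for logistic regression (dict-based features assumed; the gradient coefficient e^z/(1+e^z)² comes from the derivative slides, with the sign carried by y):

```python
import math
from collections import defaultdict

def train_sgd(data, iterations, alpha=1.0):
    """data: list of (phi, y) pairs, phi a feature dict, y in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(iterations):
        for phi, y in data:
            z = sum(w[k] * v for k, v in phi.items())
            coeff = math.exp(z) / (1 + math.exp(z)) ** 2   # e^z / (1 + e^z)^2
            for k, v in phi.items():
                w[k] += alpha * y * coeff * v              # w += alpha * dP(y|x)/dw
    return w
```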
Example: Initial Update
● Set α = 1, initialize w = 0

x = A site , located in Maizuru , Kyoto    y = -1

w⋅ϕ(x) = 0
dP(y=−1∣x)/dw = −ϕ(x) e^0 / (1 + e^0)² = −0.25 ϕ(x)
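The −0.25 in this update can be checked directly (illustrative code): with w = 0 the score z = w⋅ϕ(x) is 0, and the gradient coefficient for y = −1 is −e^0/(1+e^0)².

```python
import math

z = 0.0                                         # w = 0, so w . phi(x) = 0
coeff = -math.exp(z) / (1 + math.exp(z)) ** 2   # y = -1: gradient is coeff * phi(x)
# With alpha = 1, every feature in the sentence moves by exactly -0.25
```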
Example: Second Update
x = Shoken , monk born in Kyoto    y = 1
(current weights: wunigram "," = -0.5, wunigram "in" = -0.25, wunigram "Kyoto" = -0.25)

dP(y=1∣x)/dw = ϕ(x) e^(−1) / (1 + e^(−1))² = 0.196 ϕ(x)
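The 0.196 here can be verified numerically (illustrative code): the three unigrams shared with the previous sentence give z = −0.5 − 0.25 − 0.25 = −1, and e^(−1)/(1+e^(−1))² ≈ 0.196.

```python
import math

z = -0.5 + -0.25 + -0.25                      # w . phi(x) for ",", "in", "Kyoto"
coeff = math.exp(z) / (1 + math.exp(z)) ** 2  # gradient coefficient for y = +1
# w <- w + coeff * phi(x), so e.g. w_"," becomes -0.5 + 0.196 = -0.304
```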
Given:
  Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
  Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
(sign(v) is +1 if v >= 0, -1 otherwise)
Perceptron and Probabilities
● Sometimes we want the probability