
w⋅ϕ(x) = 2 + (-1) + 1 = 2

If the sum is at least 0: “yes”, otherwise: “no”
Sequential Data Modeling – Conditional Random Fields
Review: Mathematical Formulation
y = sign(w⋅ϕ(x)) = sign(∑_{i=1}^{I} w_i⋅ϕ_i(x))
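As a quick illustration, this prediction rule can be sketched in Python; representing w and ϕ(x) as dicts from feature names to values is an assumption of this sketch, not the lecture's code:

```python
# Sketch of the linear predictor y = sign(w . phi(x)).
def sign(v):
    # sign(v) is +1 if v >= 0, -1 otherwise (as defined in the lecture)
    return 1 if v >= 0 else -1

def predict_one(w, phi):
    # w and phi map feature names to values; missing features count as 0
    score = sum(w.get(name, 0.0) * value for name, value in phi.items())
    return sign(score)

# illustrative weights and features
w = {'contains "priest"': 2.0, 'contains "site"': -3.0}
print(predict_one(w, {'contains "priest"': 1.0}))  # 1
print(predict_one(w, {'contains "site"': 1.0}))    # -1
```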
Gradient of the Logistic Function
● Take the derivative of the probability:
d/dw P(y=1|x) = d/dw [ e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x))) ]
             = ϕ(x) · e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
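This derivative is easy to sanity-check numerically; the sketch below (written for this document, not from the lecture) compares the closed form against a finite difference on the scalar v = w⋅ϕ(x):

```python
import math

def p_y1(v):
    # P(y=1|x) as a function of v = w . phi(x)
    return math.exp(v) / (1 + math.exp(v))

def dp_y1(v):
    # closed-form derivative from the slide: e^v / (1 + e^v)^2
    return math.exp(v) / (1 + math.exp(v)) ** 2

# finite-difference check at a few points
for v in (-2.0, 0.0, 1.5):
    eps = 1e-6
    numeric = (p_y1(v + eps) - p_y1(v - eps)) / (2 * eps)
    assert abs(numeric - dp_y1(v)) < 1e-6
```

The gradient with respect to the vector w is then this scalar times ϕ(x).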
w ← w − 0.25 ϕ(x)

wunigram "Maizuru" = -0.25
wunigram "," = -0.5
wunigram "in" = -0.25
wunigram "Kyoto" = -0.25
wunigram "A" = -0.25
wunigram "site" = -0.25
wunigram "located" = -0.25
● The logistic function is a "softened" version of the function used in the perceptron

P(y=1|x) = e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))

[Plots: p(y|x) vs. w⋅ϕ(x) for the perceptron (step function) and the logistic function (smooth S-curve)]
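A few values of the logistic function make the "softening" concrete; this is a small sketch assumed for this document, not lecture code:

```python
import math

def logistic(v):
    # P(y=1|x) = e^v / (1 + e^v), with v = w . phi(x)
    return math.exp(v) / (1 + math.exp(v))

print(logistic(0.0))   # 0.5: completely uncertain at the decision boundary
print(logistic(5.0))   # close to 1
print(logistic(-5.0))  # close to 0
```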
ŵ = argmax_w ∏_i P(y_i | x_i; w)
● How do we solve this?
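For intuition, the quantity being maximized can be computed directly for a toy data set; the scores and labels below are invented for illustration:

```python
import math

def p(y, v):
    # P(y|x) under the logistic model, y in {+1, -1}, v = w . phi(x)
    # 1 / (1 + e^(-v)) equals e^v / (1 + e^v), matching the formula for y = 1
    return 1 / (1 + math.exp(-y * v))

# (v_i, y_i) pairs for three toy examples
data = [(2.0, 1), (-1.5, -1), (0.5, 1)]
likelihood = 1.0
for v, y in data:
    likelihood *= p(y, v)
# training chooses w to make this product as large as possible
```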
Review: Perceptron Training Algorithm
create map w
for I iterations
  for each labeled pair x, y in the data
    phi = create_features(x)
    y' = predict_one(w, phi)
    if y' != y
      w += y * phi
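The pseudocode above can be fleshed out in Python; the unigram create_features and the tiny toy data set are assumptions of this sketch:

```python
from collections import defaultdict

def create_features(x):
    # unigram count features (an assumption of this sketch)
    phi = defaultdict(float)
    for word in x.split():
        phi["unigram:" + word] += 1
    return phi

def predict_one(w, phi):
    score = sum(w[name] * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def train_perceptron(data, iterations):
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            phi = create_features(x)
            if predict_one(w, phi) != y:  # every mistake triggers an update
                for name, value in phi.items():
                    w[name] += y * value
    return w

w = train_perceptron([("a priest", 1), ("a site", -1)], 10)
```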
P(y=1|x) = 1 if w⋅ϕ(x) ≥ 0
P(y=1|x) = 0 if w⋅ϕ(x) < 0

[Plot: p(y|x) vs. w⋅ϕ(x), a step function jumping from 0 to 1 at w⋅ϕ(x) = 0]
The Logistic Function
Sequential Data Modeling: Conditional Random Fields
Graham Neubig, Nara Institute of Science and Technology (NAIST)
[Plot: dP(y|x)/d(w⋅ϕ(x)) vs. w⋅ϕ(x), peaked at w⋅ϕ(x) = 0]
d/dw P(y=−1|x) = d/dw [ 1 − e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x))) ]
              = −ϕ(x) · e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
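Since P(y=−1|x) = 1 − P(y=1|x), this gradient is just the negative of the previous one; a quick numeric check, written as a sketch for this document:

```python
import math

def p_ym1(v):
    # P(y=-1|x) = 1 - e^v / (1 + e^v), with v = w . phi(x)
    return 1 - math.exp(v) / (1 + math.exp(v))

def dp_ym1(v):
    # closed form from the slide: -e^v / (1 + e^v)^2
    return -math.exp(v) / (1 + math.exp(v)) ** 2

# finite-difference check at a few points
for v in (-3.0, 0.0, 2.0):
    eps = 1e-6
    numeric = (p_ym1(v + eps) - p_ym1(v - eps)) / (2 * eps)
    assert abs(numeric - dp_ym1(v)) < 1e-6
```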
● Predict: Yes! / No!
● This is binary classification (of course!)
Review: Linear Prediction Model
● Each element that helps us predict is a feature
● Can account for uncertainty
● Differentiable

● x: the input
● φ(x): vector of feature functions {φ1(x), φ2(x), …, φI(x)}
● w: the weight vector {w1, w2, …, wI}
● y: the prediction, +1 if "yes", -1 if "no"
P(y|x) is useful for:
● Estimating confidence in predictions
● Combining with other systems
● However, the perceptron only gives us a prediction:

y = sign(w⋅ϕ(x))
In other words:
Logistic Regression
● Train based on conditional likelihood
● Find the parameters w that maximize the conditional likelihood of all answers y_i given the examples x_i
In other words:
● For every training example, calculate the gradient (the direction that will increase the probability of y)
● Move in that direction, multiplied by learning rate α
● contains "priest"
● contains "(<#>-<#>)"
● contains "site"
● contains "Kyoto Prefecture"

● Each feature has a weight, positive if it indicates "yes", and negative if it indicates "no"
I read a book → N VBD DET NN

● Binary Prediction (2 choices)
● Multi-class Prediction (several choices)
● Structured Prediction (millions of choices)
Input x                              Prediction y
A book review:                       Is it positive?
  "Oh, man I love this book!"          yes
  "This book is so boring..."          no
A tweet:                             Its language:
  "On the way to the park!"            English
  "公園に行くなう!"                      Japanese
A sentence:                          Its parts-of-speech:
  "I read a book"                      N VBD DET NN
wcontains "priest" = 2
wcontains "(<#>-<#>)" = 1
wcontains "site" = -3
wcontains "Kyoto Prefecture" = -1
● For a new example, sum the weights:
Kuya (903-972) was a priest born in Kyoto Prefecture.
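A small sketch of this sum for the sentence above; reducing feature extraction to substring and regex tests is a simplification made for this document:

```python
import re

w = {'contains "priest"': 2, 'contains "site"': -3,
     'contains "(<#>-<#>)"': 1, 'contains "Kyoto Prefecture"': -1}

def create_features(sentence):
    # binary "contains" features; (<#>-<#>) stands for a number range like (903-972)
    return {
        'contains "priest"': 1 if "priest" in sentence else 0,
        'contains "site"': 1 if "site" in sentence else 0,
        'contains "(<#>-<#>)"': 1 if re.search(r"\(\d+-\d+\)", sentence) else 0,
        'contains "Kyoto Prefecture"': 1 if "Kyoto Prefecture" in sentence else 0,
    }

phi = create_features("Kuya (903-972) was a priest born in Kyoto Prefecture.")
score = sum(w[name] * value for name, value in phi.items())
print(score)  # 2 + 1 + (-1) = 2
```

Since the sum is at least 0, the prediction is "yes": the article is about a person.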
w⋅ϕ(x) = −1
w ← w + 0.196 ϕ(x)

wunigram "A" = -0.25
wunigram "site" = -0.25
wunigram "located" = -0.25
wunigram "Maizuru" = -0.25
wunigram "," = -0.304
wunigram "in" = -0.054
wunigram "Kyoto" = -0.054
wunigram "Shoken" = 0.196
wunigram "monk" = 0.196
wunigram "born" = 0.196
Prediction Problems
Given x, predict y
Logistic Regression
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person
In other words:
● Try to classify each training example
● Every time we make a mistake, update the weights
Stochastic Gradient Descent
● Online training algorithm for probabilistic models (including logistic regression)

create map w
for I iterations
  for each labeled pair x, y in the data
    w += α * dP(y|x)/dw
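The pseudocode above can be filled in with the logistic-regression gradient derived earlier; the unigram features and the two-sentence toy data set are assumptions of this sketch, chosen to reproduce the worked example that follows:

```python
import math
from collections import defaultdict

def create_features(x):
    # unigram count features (an assumption of this sketch)
    phi = defaultdict(float)
    for word in x.split():
        phi["unigram:" + word] += 1
    return phi

def train_sgd(data, iterations, alpha):
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            phi = create_features(x)
            v = sum(w[name] * value for name, value in phi.items())
            # dP(y|x)/dw = y * e^v / (1 + e^v)^2 * phi(x), for y in {+1, -1}
            coeff = y * math.exp(v) / (1 + math.exp(v)) ** 2
            for name, value in phi.items():
                w[name] += alpha * coeff * value
    return w

data = [("A site , located in Maizuru , Kyoto", -1),
        ("Shoken , monk born in Kyoto", 1)]
w = train_sgd(data, 1, 1.0)
print(round(w["unigram:site"], 3))  # -0.25
print(round(w["unigram:monk"], 3))  # 0.197 (the slides show 0.196)
```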
Example: Initial Update
● Set α = 1, initialize w = 0
y = −1
x = A site , located in Maizuru , Kyoto

d/dw P(y=−1|x) = −ϕ(x) · e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
With w⋅ϕ(x) = 0: −ϕ(x) · e^0 / (1 + e^0)² = −0.25 ϕ(x)
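With w = 0, the inner term is e^0/(1 + e^0)² = 1/4, which is where the −0.25 comes from; a one-line check (assumed Python, not lecture code):

```python
import math

v = 0.0  # w . phi(x) when w = 0
coeff = -math.exp(v) / (1 + math.exp(v)) ** 2
print(coeff)  # -0.25
```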
Example: Second Update
x = Shoken , monk born in Kyoto
y = 1

w⋅ϕ(x) = −0.5 + (−0.25) + (−0.25) = −1
d/dw P(y=1|x) = ϕ(x) · e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
             = ϕ(x) · e^(−1) / (1 + e^(−1))²
             = 0.196 ϕ(x)
Given:
● Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
● Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
(sign(v) is +1 if v >= 0, -1 otherwise)
Perceptron and Probabilities
● Sometimes we want the probability P(y|x)