Deep Transfer Metric Learning
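$$z_i^{(m)} = W^{(m)} h_i^{(m-1)} + b^{(m)}$$
$$L_{ji}^{(M)} = \big(h_j^{(M)} - h_i^{(M)}\big) \odot \varphi'\big(z_j^{(M)}\big)$$
$$L_{ij}^{(m)} = \big(W^{(m+1)T} L_{ij}^{(m+1)}\big) \odot \varphi'\big(z_i^{(m)}\big)$$
$$L_{ji}^{(m)} = \big(W^{(m+1)T} L_{ji}^{(m+1)}\big) \odot \varphi'\big(z_j^{(m)}\big)$$
where $\odot$ denotes element-wise multiplication and $\varphi'$ is the derivative of the activation function $\varphi$.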
Figure: from DML to DTML (labels: learn feature representations; nonlinear relationship; kernel trick; h, W, b; training and test samples from the same distribution; transfer learning)
The nonlinear mapping function can be explicitly obtained:
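DSTML exploits discriminative information from the output of all layers, not only the top layer:
$$\min_{f^{(M)}} J = J^{(M)} + \sum_{m=1}^{M-1} \omega^{(m)}\, h\big(J^{(m)} - \tau^{(m)}\big)$$
where $J^{(m)}$ is the criterion computed on the output of the $m$-th layer, $h(x) = \max(x, 0)$ is the hinge loss, $\omega^{(m)}$ weights the loss of the $m$-th layer, and $\tau^{(m)}$ is a threshold on that loss.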
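The DSTML objective only combines per-layer criteria, so it can be sketched independently of the network. The sketch below is written under stated assumptions. The values $J^{(1)}, \dots, J^{(M)}$ are taken as already computed, and the function names are hypothetical. The hinge means a lower layer adds to the loss only when its criterion exceeds its threshold $\tau^{(m)}$.

```python
def hinge(x):
    """h(x) = max(x, 0)."""
    return max(x, 0.0)


def dstml_objective(layer_criteria, omegas, taus):
    """J = J^(M) + sum_{m=1}^{M-1} omega^(m) * h(J^(m) - tau^(m)).

    layer_criteria: [J^(1), ..., J^(M)], each already evaluated at its layer.
    omegas, taus: one weight and one threshold per lower layer 1..M-1.
    """
    *lower, top = layer_criteria
    if len(omegas) != len(lower) or len(taus) != len(lower):
        raise ValueError("need one omega and one tau per lower layer")
    # Lower layers contribute only when J^(m) exceeds tau^(m).
    return top + sum(w * hinge(j - t) for j, w, t in zip(lower, omegas, taus))
```

With $M = 2$, $\omega^{(1)} = 1$ and $\tau^{(1)} = 0$, the only extra term is $\max(J^{(1)}, 0)$. Because $J^{(m)}$ contains $-\alpha S_b^{(m)}$, it can be negative, and the hinge then clips it to zero.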
$$D_{ts}^{(m)}(X_t, X_s) = \Big\| \frac{1}{N_t}\sum_{i=1}^{N_t} f^{(m)}(x_{ti}) - \frac{1}{N_s}\sum_{i=1}^{N_s} f^{(m)}(x_{si}) \Big\|_2^2$$
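This MMD term is the squared Euclidean distance between the mean layer output of the target samples and the mean layer output of the source samples. The minimal sketch below assumes the layer outputs $f^{(m)}(x)$ are already computed and stored as plain Python lists. The function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mmd_linear(target_feats, source_feats):
    """D_ts = || mean(target outputs) - mean(source outputs) ||_2^2."""
    mu_t = mean_vector(target_feats)
    mu_s = mean_vector(source_feats)
    return sum((a - c) ** 2 for a, c in zip(mu_t, mu_s))
```

For example, target outputs `[[1.0, 0.0]]` and source outputs `[[0.0, 0.0], [0.0, 2.0]]` have means `[1, 0]` and `[0, 1]`, which gives $D_{ts} = 2$.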
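We formulate DTML as the following optimization problem:
$$\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \beta D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M} \Big( \|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2 \Big)$$
where $\alpha$, $\beta$ and $\gamma$ are trade-off parameters.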
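Once the three top-layer terms are known, evaluating the DTML objective is a weighted sum plus weight decay over all layers. The sketch below assumes $S_c^{(M)}$, $S_b^{(M)}$ and $D_{ts}^{(M)}$ are passed in as numbers and each layer is a hypothetical `(W, b)` pair of Python lists. It illustrates the formula only and is not the authors' code.

```python
def frobenius_sq(W):
    """||W||_F^2 for a matrix stored as a list of rows."""
    return sum(w * w for row in W for w in row)


def dtml_objective(s_c, s_b, d_ts, params, alpha, beta, gamma):
    """J = S_c - alpha*S_b + beta*D_ts + gamma * sum_m(||W^(m)||_F^2 + ||b^(m)||_2^2).

    s_c, s_b, d_ts: the top-layer values S_c^(M), S_b^(M), D_ts^(M).
    params: list of (W, b) pairs, one per layer m = 1..M.
    """
    reg = sum(frobenius_sq(W) + sum(v * v for v in b) for W, b in params)
    return s_c - alpha * s_b + beta * d_ts + gamma * reg
```

For example, with $\alpha = 0.1$, $\beta = 10$, $\gamma = 0.1$ and a single layer `W = [[1, 2]]`, `b = [1]`, the regularizer is 6. The call `dtml_objective(1.0, 2.0, 0.5, ...)` then returns $1 - 0.2 + 5 + 0.6 = 6.4$.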
Deep Transfer Metric Learning
Introduction
Problems
Figure: relationship between metric learning, deep learning, deep metric learning, DSTML and DTML (labels: linear feature space; explicit nonlinear mapping functions)
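Then, $W^{(m)}$ and $b^{(m)}$ can be updated with the gradient descent algorithm as follows until convergence:
$$W^{(m)} \leftarrow W^{(m)} - \lambda \frac{\partial J}{\partial W^{(m)}}$$
$$b^{(m)} \leftarrow b^{(m)} - \lambda \frac{\partial J}{\partial b^{(m)}}$$
where $\lambda$ is the learning rate.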
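The update rule can be sketched as a plain loop that steps against the gradient until every gradient entry falls below a tolerance. This is a sketch under assumptions. `grad_fn` is a hypothetical callback returning $(\partial J/\partial W, \partial J/\partial b)$. The stopping tolerance is an illustrative choice. The default `lr=0.2` mirrors the learning rate on the parameter slides. The toy `reg_grad` differentiates only the weight-decay term $\gamma(\|W\|_F^2 + \|b\|_2^2)$, so descent on it simply shrinks the parameters toward zero.

```python
def gd_step(W, b, grad_W, grad_b, lr):
    """One update: W <- W - lr * dJ/dW, b <- b - lr * dJ/db."""
    new_W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad_W)]
    new_b = [v - lr * g for v, g in zip(b, grad_b)]
    return new_W, new_b


def descend(grad_fn, W, b, lr=0.2, tol=1e-6, max_iter=10_000):
    """Repeat gd_step until every gradient entry is below tol (or max_iter)."""
    for _ in range(max_iter):
        grad_W, grad_b = grad_fn(W, b)
        largest = max([abs(g) for row in grad_W for g in row] +
                      [abs(g) for g in grad_b])
        if largest < tol:  # treated as convergence in this sketch
            break
        W, b = gd_step(W, b, grad_W, grad_b, lr)
    return W, b


def reg_grad(W, b, gamma=0.1):
    """Toy gradient of gamma * (||W||_F^2 + ||b||_2^2) only (hypothetical)."""
    return [[2 * gamma * w for w in row] for row in W], [2 * gamma * v for v in b]
```

In the full method, `grad_fn` would return the DTML gradients for each layer $m$ instead of this toy regularizer gradient.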
DTML
STEP:
stochastic sub-gradient
descent method
DTML
The gradients of the objective function J with respect to the parameters W and b are computed as follows:
intra-class variations are minimized
inter-class variations are maximized
Transfer learning
Deep learning
DSTML
DSTML: exploiting discriminative information from the output of all layers
$$\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \gamma \sum_{m=1}^{M} \Big( \|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2 \Big)$$
intra-class variations are minimized
inter-class variations are maximized
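$$S_c^{(m)} = \frac{1}{N k_1} \sum_{i=1}^{N} \sum_{j=1}^{N} P_{ij}\, d^2_{f^{(m)}}(x_i, x_j)$$
where $P_{ij} = 1$ if $x_j$ is one of the $k_1$ intra-class nearest neighbors of $x_i$, and $P_{ij} = 0$ otherwise.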
$$\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \beta D_{ts}^{(M)}(X_t, X_s) + \gamma \sum_{m=1}^{M} \Big( \|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2 \Big)$$
Then compute the gradient.
Face Verification
Target domain: LFW (LBP features)
Source domain: WDRef (LBP features)
DTML
output
Parameters: $\alpha = 0.1$, $\beta = 10$, $\gamma = 0.1$, $\omega^{(1)} = 1$, $\tau^{(1)} = 0$, $k_1 = 5$, $k_2 = 10$, learning rate $\lambda = 0.2$
Face Verification
Person Re-Identification
Source domain: color and texture histograms
Target domain: color and texture histograms
DTML
output
Parameters: $\alpha = 0.1$, $\beta = 10$, $\gamma = 0.1$, $\omega^{(1)} = 1$, $\tau^{(1)} = 0$, $k_1 = 3$, $k_2 = 10$, learning rate $\lambda = 0.2$
Deep network with three layers (M = 2), and neural nodes from bottom to top layer are set as: 200→200→100.
The nonlinear activation function: the tanh function.
Person Re-Identification
DML
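Enforce the marginal Fisher analysis criterion on the output of all the training samples at the top layer:
$$\min_{f^{(M)}} J = S_c^{(M)} - \alpha S_b^{(M)} + \gamma \sum_{m=1}^{M} \Big( \|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2 \Big)$$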
Deep network with three layers (M = 2), and neural nodes from bottom to top layer are set as: 500→400→300. The nonlinear activation function: the tanh function.
DTML
The updating equations for all layers $1 \le m \le M - 1$ are computed as follows:
$$L_{ij}^{(M)} = \big(h_i^{(M)} - h_j^{(M)}\big) \odot \varphi'\big(z_i^{(M)}\big)$$
$$f^{(m)}(x) = h^{(m)} = \varphi\big(W^{(m)} h^{(m-1)} + b^{(m)}\big) \in \mathbb{R}^{p^{(m)}}$$
where $\varphi$ is a nonlinear activation function applied component-wise. For the first layer, we assume $h^{(0)} = x$.
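A minimal forward-pass sketch makes the explicit mapping concrete. It assumes tanh as $\varphi$, the activation used in the experiments. Each layer is a hypothetical `(W, b)` pair of Python lists, with `W` stored as $p^{(m)}$ rows of length $p^{(m-1)}$. The helper `sq_dist` gives the squared Euclidean distance between two layer outputs, which is the quantity $d^2_{f^{(m)}}$ used by the criteria.

```python
import math


def layer_forward(W, b, h_prev):
    """z = W h_prev + b and h = tanh(z), applied component-wise."""
    z = [sum(w * v for w, v in zip(row, h_prev)) + b_r
         for row, b_r in zip(W, b)]
    return z, [math.tanh(v) for v in z]


def forward(params, x):
    """Return [f^(1)(x), ..., f^(M)(x)], starting from h^(0) = x."""
    outputs, h = [], list(x)
    for W, b in params:
        _, h = layer_forward(W, b, h)
        outputs.append(h)
    return outputs


def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||_2^2."""
    return sum((a - c) ** 2 for a, c in zip(u, v))
```

For the face verification setting (500→400→300, M = 2), `params` would hold two pairs with shapes 400×500 and 300×400.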
$$L_{ti}^{(M)} = \Big(\frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)} - \frac{1}{N_s}\sum_{j=1}^{N_s} h_{sj}^{(M)}\Big) \odot \varphi'\big(z_{ti}^{(M)}\big)$$
$$L_{si}^{(M)} = \Big(\frac{1}{N_s}\sum_{j=1}^{N_s} h_{sj}^{(M)} - \frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)}\Big) \odot \varphi'\big(z_{si}^{(M)}\big)$$
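These top-layer terms come from differentiating the MMD term. For a target sample, $\partial D_{ts}^{(M)} / \partial z_{ti}^{(M)} = \frac{2}{N_t} L_{ti}^{(M)}$. For a source sample, $\partial D_{ts}^{(M)} / \partial z_{si}^{(M)} = \frac{2}{N_s} L_{si}^{(M)}$. The sketch below assumes tanh, so $\varphi'(z) = 1 - \tanh^2(z)$. It takes the top-layer pre-activations directly as inputs, and the function names are hypothetical. The accompanying check compares against a central finite difference.

```python
import math


def _tanh_rows(Z):
    return [[math.tanh(v) for v in z] for z in Z]


def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def mmd_from_preactivations(z_t, z_s):
    """D_ts^(M) = ||mean tanh(z_t) - mean tanh(z_s)||_2^2 (phi = tanh assumed)."""
    mu_t, mu_s = _mean(_tanh_rows(z_t)), _mean(_tanh_rows(z_s))
    return sum((a - c) ** 2 for a, c in zip(mu_t, mu_s))


def mmd_top_terms(z_t, z_s):
    """Return (L_t, L_s) with L_ti = (mu_t - mu_s) * phi'(z_ti)
    and L_si = (mu_s - mu_t) * phi'(z_si)."""
    mu_t, mu_s = _mean(_tanh_rows(z_t)), _mean(_tanh_rows(z_s))
    diff = [a - c for a, c in zip(mu_t, mu_s)]
    # phi'(z) = 1 - tanh(z)^2 for the tanh activation
    L_t = [[d * (1 - math.tanh(v) ** 2) for d, v in zip(diff, z)] for z in z_t]
    L_s = [[-d * (1 - math.tanh(v) ** 2) for d, v in zip(diff, z)] for z in z_s]
    return L_t, L_s
```

With two target samples the factor $2/N_t$ equals 1, so the finite-difference slope with respect to a target pre-activation should match the corresponding entry of `L_t` directly.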
$$L_{ti}^{(m)} = \big(W^{(m+1)T} L_{ti}^{(m+1)}\big) \odot \varphi'\big(z_{ti}^{(m)}\big)$$
$$L_{si}^{(m)} = \big(W^{(m+1)T} L_{si}^{(m+1)}\big) \odot \varphi'\big(z_{si}^{(m)}\big)$$
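All lower-layer terms share the same recursion: multiply by the transposed weights of the layer above, then take the element-wise product with $\varphi'$ at the current layer. The sketch below assumes tanh as $\varphi$ and a hypothetical list-of-rows layout, with `W_next` holding $W^{(m+1)}$ as $p^{(m+1)}$ rows of length $p^{(m)}$.

```python
import math


def backprop_term(W_next, L_next, z):
    """L^(m) = (W^(m+1)^T L^(m+1)) ⊙ phi'(z^(m)) with phi = tanh.

    W_next: W^(m+1) as p^(m+1) rows of length p^(m).
    L_next: L^(m+1), length p^(m+1).
    z: pre-activation z^(m), length p^(m).
    """
    # (W^T L)[k] = sum over rows r of W[r][k] * L[r]
    back = [sum(row[k] * l for row, l in zip(W_next, L_next))
            for k in range(len(z))]
    return [g * (1 - math.tanh(v) ** 2) for g, v in zip(back, z)]
```

The same function serves $L_{ij}^{(m)}$, $L_{ji}^{(m)}$, $L_{ti}^{(m)}$ and $L_{si}^{(m)}$. Only the starting top-layer term and the pre-activation passed as `z` change.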
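$$\frac{\partial J}{\partial W^{(m)}} = \frac{2}{N k_1} \sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}\Big(L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T}\Big)
- \frac{2\alpha}{N k_2} \sum_{i=1}^{N}\sum_{j=1}^{N} Q_{ij}\Big(L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T}\Big)
+ 2\beta\Big(\frac{1}{N_t}\sum_{i=1}^{N_t} L_{ti}^{(m)} h_{ti}^{(m-1)T} + \frac{1}{N_s}\sum_{i=1}^{N_s} L_{si}^{(m)} h_{si}^{(m-1)T}\Big)
+ 2\gamma W^{(m)}$$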
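Each pair in the first two sums contributes $2\big(L_{ij}^{(m)} h_i^{(m-1)T} + L_{ji}^{(m)} h_j^{(m-1)T}\big)$, which is the gradient of $\|h_i - h_j\|_2^2$ with respect to $W^{(m)}$. The sketch below checks this for a single tanh layer, where $h^{(m-1)} = h^{(0)} = x$, against a central finite difference. It is an illustration under that one-layer assumption with hypothetical names. In the full gradient these per-pair terms would be weighted by $P_{ij}/(N k_1)$ and $-\alpha Q_{ij}/(N k_2)$ before the $\beta$ and $2\gamma W^{(m)}$ terms are added.

```python
import math


def _layer(W, b, x):
    """z = W x + b, h = tanh(z) for a single layer (so h^(0) = x)."""
    z = [sum(w * v for w, v in zip(row, x)) + bb for row, bb in zip(W, b)]
    return z, [math.tanh(v) for v in z]


def pair_dist_sq(W, b, x_i, x_j):
    """d^2 = ||h_i - h_j||_2^2 at the output of the single layer."""
    _, h_i = _layer(W, b, x_i)
    _, h_j = _layer(W, b, x_j)
    return sum((a - c) ** 2 for a, c in zip(h_i, h_j))


def pair_grad_W(W, b, x_i, x_j):
    """Gradient of d^2 w.r.t. W: 2 (L_ij x_i^T + L_ji x_j^T)."""
    z_i, h_i = _layer(W, b, x_i)
    z_j, h_j = _layer(W, b, x_j)
    L_ij = [(a - c) * (1 - math.tanh(z) ** 2) for a, c, z in zip(h_i, h_j, z_i)]
    L_ji = [(c - a) * (1 - math.tanh(z) ** 2) for a, c, z in zip(h_i, h_j, z_j)]
    return [[2 * (L_ij[r] * x_i[k] + L_ji[r] * x_j[k]) for k in range(len(x_i))]
            for r in range(len(W))]
```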
For each pair of samples $x_i$ and $x_j$, their squared distance at the $m$-th layer is
$$d^2_{f^{(m)}}(x_i, x_j) = \big\| f^{(m)}(x_i) - f^{(m)}(x_j) \big\|_2^2$$
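$$S_b^{(m)} = \frac{1}{N k_2} \sum_{i=1}^{N} \sum_{j=1}^{N} Q_{ij}\, d^2_{f^{(m)}}(x_i, x_j)$$
where $Q_{ij} = 1$ if $x_j$ is one of the $k_2$ inter-class nearest neighbors of $x_i$, and $Q_{ij} = 0$ otherwise.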
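The two marginal Fisher analysis terms can be sketched by building the 0/1 neighbor matrices and then averaging the weighted squared distances. Some choices here are assumptions of this sketch rather than details given in the slides. Neighbors are searched with Euclidean distance in whatever space is passed as `X`, and ties are broken by index order. `F` holds the layer outputs $f^{(m)}(x_i)$. The function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||_2^2."""
    return sum((a - c) ** 2 for a, c in zip(u, v))


def knn_indicator(X, labels, k, same_class):
    """0/1 matrix: [i][j] = 1 if x_j is among the k nearest neighbours of x_i,
    restricted to same-class samples (P) or other-class samples (Q)."""
    n = len(X)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        candidates = [j for j in range(n)
                      if j != i and (labels[j] == labels[i]) == same_class]
        candidates.sort(key=lambda j: sq_dist(X[i], X[j]))
        for j in candidates[:k]:
            mask[i][j] = 1
    return mask


def scatter(F, mask, k):
    """S = 1/(N k) * sum_i sum_j mask[i][j] * ||F_i - F_j||_2^2."""
    n = len(F)
    total = sum(mask[i][j] * sq_dist(F[i], F[j])
                for i in range(n) for j in range(n))
    return total / (n * k)
```

Calling `scatter(F, knn_indicator(X, labels, k1, True), k1)` gives $S_c^{(m)}$. Calling `scatter(F, knn_indicator(X, labels, k2, False), k2)` gives $S_b^{(m)}$. For the 1-D points 0, 1 (class 0) and 10, 11 (class 1) with $k = 1$, these give 1.0 and 90.5.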
DTML
Given target domain data $X_t$ and source domain data $X_s$, to reduce the distribution difference, we apply the Maximum Mean Discrepancy (MMD)