matrixid矩阵向量求导法则

合集下载

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

(1b)
(AB)T = BT AT
(1c)
if individual inverses exist (AB)−1 = B−1A−1
(1d)
(A−1)T = (AT )−1
(1e)
0.2 trace, determinant and rank
|AB| = |A||B|
(2a)
|A−1| = 1
(2b)
0 A22
= |A11||A22|
A11 0
0 A22
−1
=
A−111 0
0 A−221
(9d) (9e)
0.10 matrix inversion lemma (sherman-morrison-woodbury)
using the above results for block matrices we can make some substitutions and get the following important results:
∂X
∂X
∂Tr XT AX = (A + AT )X
(3d)
∂X
∂Tr X−1A = −X−1AT X−1
(3e)
∂X
0.4 derivatives of determinants
∂|AXB| = |AXB|(X−1)T = |AXB|(XT )−1
(4a)
∂X
∂ ln |X| = (X−1)T = (XT )−1
(4b)
∂X
∂ ln |X(z)| = Tr
X−1 ∂X
(4c)
∂z
∂z
∂|XT AX| = |XT AX|(AX(XT AX)−1 + AT X(XT AT X)−1) (4d) ∂X
0.5 derivatives of scalar forms
∂(aT x) ∂(xT a)
=
=a
(5a)
∂x
∂x
−A−221A21F−111
A−221 + A−221A21F−111A12A−221
where
F11 = A11 − A12A−221A21
F22 = A22 − A21A−111A12
for block diagonal matrices things are much easier:
A11 0
=I
(6f )
∂x
∂(xT AxxT ) = (A + AT )xxT + xT AxI
(6g)
∂x
0.7 constrained maximization
the maximum over x of the quadratic form:
µT x − 1 xT A−1x
(7a)
2
subject to the J conditions cj(x) = 0 is given by:
similarlytheindexingforderivativesofvectorsandmatriceswithrespecttoscalarsisstraightforward103derivativesoftraces?trx?xi3a?trxa?x?trxta?x?trax?x?traxt?x?trxtax?x?trx?1a?xat3ba3caatx3d?x?1atx?13e04derivativesofdeterminants?axb?xx?1txt?1axbx?1taxbxt?14a?lnx?x4b?lnxz?z?xtax?xtrx?1?x?z4cxtaxaxxtax?1atxxtatx?14d05derivativesofscalarforms?atx?x?xta?xa5a?xtax?x?atxb?x?atxtb?x?atxta?x?atxtcxb?xaatx5babt5cbat5d?atxa?xaat5ectxabtcxbat5f?xabtcxab?xcctxabat5g2thederivativeofonevectorywithrespecttoanothervectorxisamatrixwhoseijthelementis?yj?xi
onalized to the form:
A = CΛCT
(8)
3
where the columns of C are (orthonormal) eigenvectors (i.e. CCT = I) and the diagonal of Λ has the eigenvalues
0.9 block matrices
(2f )
biggest eval
condition number = γ =
(2g)
smallest eval
derivatives of scalar forms with respect to scalars, vectors, or matricies are
indexed in the obvious way. similarly, the indexing for derivatives of vectors
matrix identities
sam roweis (revised June 1999)
note that a,b,c and A,B,C do not depend on X,Y,x,y or z
0.1 basic formulae
A(B + C) = AB + AC
(1a)
(A + B)T = AT + BT
for conformably partitioned block matrices, addition and multiplication is performed by adding and multiplying blocks in exactly the same way as scalar elements of regular matrices however, determinants and inverses of block matrices are very tricky; for 2 blocks by 2 blocks the results are:
|A|
|A| = evals (2c)
Tr [A] = evals (2d)
if the cyclic products are well deﬁned,
Tr [ABC . . .] = Tr [BC . . . A] = Tr [C . . . AB] = . . .
(2e)
rank [A] = rank AT A = rank AAT
represents the ratio of the hypervolume dy to that of dx so that f (y)dy = f (y(x))|∂yT /∂x|dx. however, the sloppy forms ∂y/∂x, ∂yT /∂xT and
∂y/∂xT are often used for this Jacobain matrix.
and matrices with respect to scalars is straightforward.
1
0.3 derivatives of traces
∂Tr [X]
=I
(3a)
∂X
∂Tr [XA] = ∂Tr [AX] = AT
(3b)
∂X
∂X
∂Tr XT A ∂Tr AXT
=
=A
(3c)
0.6 derivatives of vector/matrix forms
∂(X−1) = −X−1 ∂X X−1
(6a)
∂z
∂z
∂(Ax) ∂x
=A
(6b)
∂z
∂z
∂(XY) ∂Y ∂X
=X + Y
(6c)
∂z
∂z ∂z
∂(AXB) ∂X
=A B
(6d)
∂z
∂z
∂(xT A)
=A
(6e)
∂x
∂(xT )
(A + XBXT )−1 = A−1 − A−1X(B−1 + XT A−1X)−1XT A−1
(10)
|A + XBXT | = |B||A||B−1 + XT A−1X|
(11)
where A and B are square and invertible matrices but need not be of the same dimension. this lemma often allows a really hard inverse to be converted into an easy inverse. the most typical example of this is when A is large but diagonal, and X has many rows but few columns
Aµ + ACΛ, Λ = −4(CT AC)CT Aµ
(7b)
where the jth column of C is ∂cj(x)/∂x
0.8 symmetric matrices
have real eigenvalues, though perhaps not distinct and can always be diag-
(5g)
∂X
2
the derivative of one vector y with respect to another vector x is a matrix whose (i, j)th element is ∂y(j)/∂x(i). such a derivative should be written as ∂yT /∂x in which case it is the Jacobian matrix of y wrt x. its determinant
4
A11 A21
A12 A22
= |A22| · |F11| = |A11| · |F22|Βιβλιοθήκη (9a)A11 A21
A12
−1
=
A22
=
F1−11
−A−111A12F−221
−F−221A21A−111
F−221
(9b)
A−111 + A−111A12F−221A21A−111
−F−111A12A−221
∂(xT Ax) = (A + AT )x
(5b)
∂x
∂(aT Xb) = abT
(5c)
∂X
∂(aT XT b) = baT
(5d)
∂X
∂(aT Xa) = ∂(aT XT a) = aaT
(5e)
∂X
∂X
∂(aT XT CXb) = CT XabT + CXbaT
(5f )
∂X
∂ (Xa + b)T C(Xa + b) = (C + CT )(Xa + b)aT