Fredholm determinants and the mKdV/sinh-Gordon hierarchies

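where $D$ denotes $\partial/\partial t_1$ and $D^{-1}$ denotes the antiderivative which vanishes at $t_1 = -\infty$. (Observe that $\varphi$ and all its derivatives vanish at $t_1 = -\infty$.) This is the integrated mKdV hierarchy of equations,

$$\frac{\partial\varphi}{\partial t_3} = \frac{\partial^3\varphi}{\partial t_1^3} - 2\left(\frac{\partial\varphi}{\partial t_1}\right)^3,$$

$$\frac{\partial\varphi}{\partial t_5} = \frac{\partial^5\varphi}{\partial t_1^5} - 10\left(\frac{\partial^2\varphi}{\partial t_1^2}\right)^2\frac{\partial\varphi}{\partial t_1} - 10\left(\frac{\partial\varphi}{\partial t_1}\right)^2\frac{\partial^3\varphi}{\partial t_1^3} + 6\left(\frac{\partial\varphi}{\partial t_1}\right)^5,$$

etc. (In general there are constant factors on the left sides which can be removed by changes of scale in the time variables; see e.g. [1].) To go in the other direction we introduce the inverse of the operator appearing in (2), which is given by

$$\left(D^2 - 4\,\frac{\partial\varphi}{\partial t_1}\,D^{-1}\,\frac{\partial\varphi}{\partial t_1}\,D\right)^{-1} = \frac{1}{2}\left(D^{-1}e^{2\varphi}D^{-1}e^{-2\varphi} + D^{-1}e^{-2\varphi}D^{-1}e^{2\varphi}\right). \tag{3}$$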
The case $n = 1$ of this is equivalent to the sinh-Gordon equation

$$\frac{\partial^2\varphi}{\partial t_{-1}\,\partial t_1} = \frac{1}{2}\sinh 2\varphi. \tag{5}$$
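As a short worked step (our own remark, not part of the original derivation), differentiating the first equation of the integrated mKdV hierarchy with respect to $t_1$ shows why the name is appropriate. Writing $u := \partial\varphi/\partial t_1$, a symbol introduced here only for illustration, one obtains the mKdV equation in its usual form:

```latex
\begin{align*}
\frac{\partial\varphi}{\partial t_3}
  &= \frac{\partial^3\varphi}{\partial t_1^3}
     - 2\left(\frac{\partial\varphi}{\partial t_1}\right)^{3}
  && \text{(integrated form)} \\
\frac{\partial u}{\partial t_3}
  &= \frac{\partial^3 u}{\partial t_1^3}
     - 6\,u^{2}\,\frac{\partial u}{\partial t_1}
  && \text{(apply } \partial/\partial t_1,\ u = \partial\varphi/\partial t_1\text{)}
\end{align*}
```

The factor 6 comes from $\partial_{t_1}(2u^3) = 6u^2\,\partial_{t_1}u$.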
Observe that (2) and (4) can be combined into the single statement that either of them holds for all values of the integer $n$. Further observe that these results hold independently of the function $e(x)$ appearing in the kernel $K(x, y)$. The function $e(x)$ affects the boundary conditions for (2) and (4) at $t_k = -\infty$. That $\varphi$ satisfies the integrated mKdV hierarchy was conjectured in [12], and that it satisfies the sinh-Gordon equation (5) was conjectured in [12] and proved in [2]. A related identity,

$$-\frac{\partial^2}{\partial t_{-1}\,\partial t_1}\log\det(I - K) = \frac{e^{2\varphi} - 1}{4}, \tag{6}$$

was also conjectured in [12] and proved in [2], and will be rederived here. We prove our results by expressing all relevant quantities in terms of inner products

$$u_{i,j} := \left((I - K^2)^{-1}E_i,\, E_j\right), \qquad v_{i,j} := \left((I - K^2)^{-1}K E_i,\, E_j\right), \tag{7}$$

Iterative methods for ill-conditioned Toeplitz matrices

2. Matrices related to Fast Transforms
To derive preconditioners for Toeplitz matrices we consider the following transforms and related matrix classes: (I) The matrix

$$F_n = \frac{1}{\sqrt{n}}\left(\exp\left(-\frac{2\pi i\,jk}{n}\right)\right)_{j,k=0}^{n-1};$$
Iterative methods for ill-conditioned Toeplitz Matrices
Thomas Huckle, Institut für Informatik, Technical University München, D-80333 München, Germany
Toeplitz systems. We consider Toeplitz matrices with a real generating function $f$ that is nonnegative with only a small number of zeros. Then we can define a preconditioner of the form $S_n \Lambda S_n$, where $S_n$ is the matrix describing the discrete Sine transform and $\Lambda$ is a diagonal matrix. If we have full knowledge about $f$, then we can show that the preconditioned system has a condition number that is bounded independently of $n$. We can obtain the same result for the case that we know only the position and order of the zeros of $f$. If we only know the matrix and its coefficients $t_j$, we present Sine transform preconditioners that show the same numerical behaviour in many examples. Key Words. Toeplitz matrix, Sine Transform, preconditioned conjugate gradient method
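The structure $S_n \Lambda S_n$ can be illustrated on the simplest case, where the sine transform diagonalizes the Toeplitz matrix exactly. The sketch below is only an illustration under our own assumptions: NumPy is used, the generating function is taken to be $f(x) = 2 - 2\cos x$ (tridiagonal Toeplitz matrix with entries $-1, 2, -1$), and the function names are hypothetical. It is not the preconditioner construction of the paper, only the underlying identity.

```python
import numpy as np

def sine_transform_matrix(n):
    """Discrete sine transform matrix S_n with entries sqrt(2/(n+1)) * sin(j*k*pi/(n+1))."""
    j = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(j, j) * np.pi / (n + 1))

def tridiagonal_toeplitz(n):
    """Toeplitz matrix generated by f(x) = 2 - 2*cos(x): t_0 = 2, t_1 = t_{-1} = -1."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

n = 8
S = sine_transform_matrix(n)
T = tridiagonal_toeplitz(n)

# Diagonal matrix: f sampled on the grid j*pi/(n+1), j = 1..n (assumed choice).
grid = np.arange(1, n + 1) * np.pi / (n + 1)
Lam = np.diag(2.0 - 2.0 * np.cos(grid))

# S is symmetric and orthogonal, and S @ Lam @ S reproduces T in this special case.
print(np.allclose(S @ S, np.eye(n)))
print(np.allclose(S @ Lam @ S, T))
```

For generating functions with more nonzero coefficients the identity no longer holds exactly, which is where the choice of the diagonal matrix becomes the actual design question.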
Limit theorems for sample eigenvalues in a generalized spiked population model

1. Introduction Let $(T_p)$ be a sequence of $p \times p$ non-random and nonnegative definite Hermitian matrices and let $(w_{ij})$, $i, j \ge 1$, be a doubly infinite array of i.i.d. complex-valued random variables satisfying $E(w_{11}) = 0$, $E(|w_{11}|^2) = 1$, $E(|w_{11}|^4) < \infty$.
1991 Mathematics Subject Classification. Primary 60F15, 60F05; secondary 15A52, 62H25. Key words and phrases. Sample covariance matrices, Spiked population model, limit theorems, Largest eigenvalue, Extreme eigenvalues. The research of Zhidong Bai was supported by CNSF grant 10571020 and NUS grant R-155-000-061-112. Research was (partially) completed while J.-F. Yao was visiting the Department of Statistics and Applied Probability, National University of Singapore in March 2007.
[10], the population covariance matrix has all its eigenvalues equal to one except for a few fixed eigenvalues (spikes). The question is to quantify the effect of the perturbation caused by the spike eigenvalues. Baik and Silverstein [6] establish the almost sure limits of the extreme sample eigenvalues associated with the spike eigenvalues when the population and the sample sizes become large. In a recent work [5], we have provided the limiting distributions for these extreme sample eigenvalues. In this paper, we extend this theory to a generalized spiked population model where the base population covariance matrix is arbitrary, instead of the identity matrix as in Johnstone's case. New mathematical tools are introduced for establishing the almost sure convergence of the sample eigenvalues generated by the spikes.
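For orientation, the almost sure limit in the classical case (identity base covariance, a single spike) can be written out explicitly. The formula below is stated as we recall it from [6]; the symbols $\alpha$ (spike), $y$ (limit of $p/n$) and $\lambda_1$ (largest sample eigenvalue) are notation introduced here for illustration and do not appear in the excerpt above.

```latex
\[
\lambda_1 \;\xrightarrow{\text{a.s.}}\;
\begin{cases}
\alpha + \dfrac{y\,\alpha}{\alpha - 1}, & \alpha > 1 + \sqrt{y},\\[2ex]
(1 + \sqrt{y})^{2}, & 1 < \alpha \le 1 + \sqrt{y},
\end{cases}
\qquad y = \lim \frac{p}{n} \in (0,1).
\]
% Example: y = 1/4 and alpha = 4 give the threshold 1 + sqrt(y) = 1.5 and
% the limit 4 + (1/4)(4)/3 = 13/3, which lies above the bulk edge (1.5)^2 = 2.25.
```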
Graph based semi-supervised learning via label fitting-on line

ORIGINAL ARTICLE

Graph based semi-supervised learning via label fitting

Weiya Ren (1) · Guohui Li (1)

Received: 8 April 2015 / Accepted: 29 October 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract The global smoothness and the local label fitting are two key issues for estimating the function on the graph in graph based semi-supervised learning (GSSL). The unsupervised normalized cut method can provide a more reasonable criterion for learning the global smoothness of the data than classic GSSL methods. However, the semi-supervised norm of the normalized cut, which is an NP-hard problem, has not been studied well. In this paper, a new GSSL framework is proposed by extending normalized cut to its semi-supervised norm. The NP-hard semi-supervised normalized cut problem is innovatively solved by effective algorithms. In addition, we can design more reasonable local label fitting terms than conventional GSSL methods. Other graph cut methods are also investigated to extend the proposed semi-supervised learning algorithms. Furthermore, we incorporate the nonnegative matrix factorization with the proposed learning algorithms to solve the out-of-sample problem in semi-supervised learning. Solutions obtained by the proposed algorithms are sparse, nonnegative and congruent with the unit matrix. Experimental results on several real benchmark datasets indicate that the proposed algorithms achieve good results compared with state-of-the-art methods.

Keywords Graph based semi-supervised learning · Global smoothness · Label fitting · Graph cut · Basis matrix · Congruency approximation

1 Introduction

In the past several years, the semi-supervised learning (SSL) approach, which combines limited labeled samples with rich unlabeled samples to improve learning ability, has attracted a lot of attention [1–6]. As an important branch of SSL, graph based semi-supervised learning (GSSL) [7–12] has recently become popular in a wide range of applications due to its high accuracy and computational efficiency. Its application areas include image annotation
[13,14], collective image parsing [15] and medical diagnosis [16]. Some studies focus on the graph construction [17–19] in GSSL, while others focus on the propagation strategy, such as Gaussian fields and harmonic functions (GFHF) [9], local and global consistency (LGC) [7], greedy gradient max-cut (GGMC) [8], and manifold regularization [20]. Specifically, LGC and GFHF treat the soft label matrix as the only variable in optimization, while GGMC solves a bivariate optimization problem over the predicted soft labels and the initial hard labels. GSSL is also one kind of graph-based learning (GL), which treats samples from a data set as vertices in a graph and builds pairwise weights between these vertices.

Global smoothness and local label fitting are two key issues in GSSL [7,20,21]. For global smoothness learning, many graph cut methods are available, such as normalized cut [21], ratio cut [22], average cut [23], minimum cut [24] and min–max cut [25]. Given a dataset $X \in R^{m \times n}$ and the neighborhood graph with affinity matrix $W$, normalized cut is defined as

$$\mathrm{Ncut}(P_1, P_2, \ldots, P_k) \triangleq \frac{1}{2}\sum_{i=1}^{k}\frac{W(P_i, \bar{P}_i)}{\mathrm{vol}(P_i)}, \tag{1}$$

where $P_1, P_2, \ldots, P_k$ are a partition of $P$ ($P_1 \cup P_2 \cup \ldots \cup P_k = P$; $P_i \cap P_j = \emptyset$, $i \neq j$; and $P_i \neq \emptyset$, $i = 1, \ldots, c$), $W(P_i, P_j) \triangleq \sum_{a \in P_i,\, b \in P_j} w_{ab}$, $\mathrm{vol}(P_i) \triangleq \sum_{a \in P_i,\, b \in P} w_{ab}$, and $\bar{P}_i$ is the complementary set of $P_i$.

Weiya Ren (weiyren.phd@), Guohui Li (guohli@)
1 College of Information System and Management, National University of Defense Technology, Changsha 410072, People's Republic of China
Int. J. Mach. Learn. & Cyber., DOI 10.1007/s13042-015-0458-y

However, finding the optimal normalized cut has proven to be NP-hard. In addition, it is also a challenge to incorporate prior information in normalized cut. Some works [26,27] incorporate prior information into normalized cut by adding certain constraints. Yang et al. [28] consider prior information by first separating and assigning training data to the source or the sink set. Instead of considering label information, Kulis et al. [29] use pairwise
must-link constraints and cannot-link constraints. A kernel learning approach is then proposed to extend normalized cut to its semi-supervised norm. Nevertheless, these methods cannot perform as well as LGC or GGMC in practice [8].

In this paper, the NP-hard semi-supervised normalized cut problem is innovatively solved by considering constraint relaxation. We extend the normalized cut to its semi-supervised norm to construct the proposed GSSL framework. In addition, designing local label fitting terms can be more flexible than in conventional GSSL methods [7–9]. The LGC method, which is one of the most popular GSSL methods, can also be integrated into our framework if we discard all constraints and adopt the simplest label fitting strategy. Furthermore, we blend the proposed graph based semi-supervised learning method with nonnegative matrix factorization to solve the out-of-sample classification problem. It can also be regarded as an extension of RNMF [30] that incorporates label information.

The rest of the paper is organized as follows: Sect. 2 presents the graph based semi-supervised learning. In Sect. 3, we introduce the algorithms to solve the graph based semi-supervised learning problem. In Sect. 4, three graph cuts are adopted to produce other semi-supervised learning methods. Experiment results are presented in Sect. 5. Finally, conclusions are drawn in Sect. 6.

2 Graph based semi-supervised learning

It turns out [31] that minimizing normalized cut can be equivalently recast as

$$\arg\min_{V^T V = I,\; V \ge 0} \mathrm{tr}\left(V^T L V\right), \tag{2}$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $L = I - S$ is the normalized Laplacian matrix, $I$ is the identity matrix, $S = D^{-1/2} W D^{-1/2}$ and $D$ is an $n \times n$ diagonal matrix with $D_{ii} = \sum_j w_{ij}$. $V \in R^{n \times c}$ is a specific discrete indicator matrix (or label matrix), which means each row of $V$ has a unique nonzero positive value (see details in [31,32]). Suppose there are $c$ classes, and the label set becomes $\mathcal{L} = \{1, 2, \ldots, c\}$. Point $x_i$ ($1 \le i \le n$) can be labeled as $y_i = \{j \mid V_{ij} \neq 0\}$ or $y_i = \arg\max_j V_{ij}$. In
semi-supervised learning, data are partially labeled. We assume the dataset is organized as $X = [x_1, \ldots, x_l, x_{l+1}, \ldots, x_n] \in R^{m \times n}$, where the first $l$ points $x_i$ ($i \le l$) are labeled and the remaining points $x_j$ ($l + 1 \le j \le n$) are unlabeled. Define an $n \times c$ matrix $Y$ with $Y_{ij} = 1$ if $x_i$ is labeled as $j$ ($1 \le j \le c$) and $Y_{ij} = 0$ otherwise. Then semi-supervised normalized cut can be formulated as follows

$$\arg\min_{V^T V = I,\; V \ge 0} \mathrm{tr}\left(V^T L V\right) + \mu f(V, Y), \tag{3}$$

where $\mu > 0$ is the tuned parameter, $f(V, Y)$ is the label fitting term, and $V$ is a specific indicator matrix.

2.1 $\mu > 0$

Now we discuss the specific form of $f(V, Y)$ when $\mu > 0$. There are many ways to define $f(V, Y)$, and the simplest way is

$$f(V, Y) = \|V - Y\|_F^2, \tag{4}$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Notice that if we discard all constraints in (3) and define $f(V, Y)$ by (4), we get a standard LGC problem. Obviously, it is not a good label fitting term, especially when labeled samples are relatively few. If we only focus on fitting the labeled data, $f(V, Y)$ can be defined as

$$f(V, Y) = \|\Lambda \odot V - Y\|_F^2, \tag{5}$$

where $\odot$ denotes the element-wise product of matrices, and $\Lambda$ is an $n \times c$ matrix with $\Lambda_{iz} = 1$ ($z = 1, 2, \ldots, c$) if $x_i$ is labeled and $\Lambda_{iz} = 0$ ($z = 1, 2, \ldots, c$) otherwise.

Here we give an example to show the difference between (4) and (5). Consider the following toy example matrix

$$Y = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

This matrix indicates that there are five samples in total and the number of categories is three. The first sample belongs to category two and the fourth sample belongs to category one. According to the definition of $\Lambda$, we have

$$\Lambda = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Suppose we know the optimal solution is

$$V^{*} = \begin{pmatrix} 0 & 1/\sqrt{3} & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Consider the following solutions

$$V_1 = \begin{pmatrix} 1/\sqrt{3} & 0 & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 1/\sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} 0 & 1/\sqrt{3} & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

Then,

$$\|V_1 - Y\|_F^2 = 5, \qquad \|V_2 - Y\|_F^2 = 9.84,$$
$$\|\Lambda \odot V_1 - Y\|_F^2 = 3.33, \qquad \|\Lambda \odot V_2 - Y\|_F^2 = 0.17.$$

Thus, $V_1$ is a better solution if we
consider (4) and $V_2$ is a better solution if we consider (5). However, $V_2$ is obviously a better solution in this case. We can also find that the constraints in (4) and (5) are both semi-hard constraints for $V$, i.e., they only drive the maximum value of the row $V(i)$ ($i = 1, 2, \ldots, n$) of $V$ to equal 1 if $x_i$ is labeled and 0 otherwise. Now we consider a soft constraint for $V$, i.e., we hope the maximum value of the row $V(i)$ ($i = 1, 2, \ldots, n$) equals the sum of $V(i)$ if $x_i$ is labeled and equals 0 otherwise. Mathematically, $f(V, Y)$ can be defined as

$$f(V, Y) = \mu_1\|\Phi \odot V\|^2 + \mu_2\|Y \odot V - Y \odot (V 1_c 1_c^T)\|_F^2, \tag{6}$$

where $\Phi$ is an $n \times c$ matrix, and $\mu_1, \mu_2 > 0$ are the tuned parameters. If $x_i$ is labeled and its label is $j$, then $\Phi_{iz} = 1$ ($z = 1, 2, \ldots, c$; $z \neq j$) and $\Phi_{ij} = 0$. Besides, $\Phi_{iz} = 0$ ($z = 1, 2, \ldots, c$) if $x_i$ is unlabeled. $1_c$ is a vector with $1_c = [1, \ldots, 1]^T \in R^{c \times 1}$.

For a labeled point $x_i$ with label $j$, the first term of (6) drives $V_{iz}$ to 0 ($z = 1, 2, \ldots, c$; $z \neq j$) and the second term drives $V_{ij}$ to equal the sum of the $i$-th row of $V$. Notice that the second term is a soft constraint for $V$, which does not require the maximum value of any row to equal 1. According to the definition of $\Phi$, we have

$$\Phi = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Notice that

$$\|\Phi \odot V_1\|^2 = 1.33, \qquad \|\Phi \odot V_2\|^2 = 0,$$
$$\|Y \odot V_1 - Y \odot (V_1 1_k 1_k^T)\|^2 = 1.33, \qquad \|Y \odot V_2 - Y \odot (V_2 1_k 1_k^T)\|^2 = 0.$$

Obviously, $V_2$ is a better solution than $V_1$ if we consider (6). In brief, the terms $f(V, Y)$ defined in (5) and (6) are not affected by the unlabeled data, which makes them more reasonable label fitting terms than (4). The above designs are flexible, and one can design more reasonable label fitting terms to incorporate prior information.

2.2 $\mu = \infty$

If we assume $\mu = \infty$, then problem (3) becomes

$$\arg\min_{V_u^T V_u = I,\; V_u \ge 0} \mathrm{tr}\left(V^T L V\right) \quad \text{s.t. } V_l = Y_l, \tag{7}$$

where $V = [V_l; V_u]$, and $V_l \in R^{l \times c}$, $V_u \in R^{(n-l) \times c}$ are the solutions of the labeled data and the unlabeled data, respectively. $Y_l$ is a part of $Y$, where $Y = [Y_l; Y_u]$. It can be seen as the hard constraint for label fitting. $L$ can be divided into

$$L = \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix}. \tag{8}$$

We have

$$V^T L V = [V_l; V_u]^T \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix} [V_l; V_u] = V_l^T L_{ll} V_l + V_l^T L_{lu} V_u + V_u^T L_{ul} V_l + V_u^T L_{uu} V_u. \tag{9}$$

Since $V_l = Y_l$ and $L_{lu} = L_{ul}^T$, the constant term
$V_l^T L_{ll} V_l$ can be dropped, and problem (7) becomes

$$\arg\min_{V_u^T V_u = I,\; V_u \ge 0} \mathrm{tr}\left(2 V_u^T L_{ul} Y_l + V_u^T L_{uu} V_u\right). \tag{10}$$

2.3 Out of sample problem

Note that the algorithms in 2.1 and 2.2 cannot solve the out-of-sample problem. In this section, we develop a graph based semi-supervised learning method which can solve the out-of-sample problem. If the data is nonnegative, we combine the nonnegative matrix factorization with problem (3). Then problem (3) becomes

$$\arg\min_{V^T V = I;\; U, V \ge 0} \|X - U V^T\|_F^2 + \gamma\, \mathrm{tr}\left(V^T L V\right) + \mu f(V, Y), \tag{11}$$

where $X = [x_1, \ldots, x_l, x_{l+1}, \ldots, x_n] \in R^{m \times n}$, $U \in R^{m \times c}$ is the basis matrix, $\gamma > 0$, $\mu \ge 0$ are the tuned parameters and $V \in R^{n \times c}$ is the solution matrix (or the representation matrix). If $\mu > 0$, we can define $f(V, Y)$ by (4), (5), (6). Now we discuss the case when $\mu = \infty$. When $\mu = \infty$, problem (11) becomes

$$\arg\min_{V_u^T V_u = I;\; U, V_u \ge 0} \|X - U V^T\|_F^2 + \gamma\, \mathrm{tr}\left(V^T L V\right) \quad \text{s.t. } V_l = Y_l. \tag{12}$$

Divide $X$ as $X = [X_l, X_u]$, then we have

$$X V U^T = [X_l, X_u][V_l; V_u] U^T = X_l V_l U^T + X_u V_u U^T,$$
$$U V^T V U^T = U [V_l; V_u]^T [V_l; V_u] U^T = U V_l^T V_l U^T + U V_u^T V_u U^T.$$

Thus

$$\|X - U V^T\|_F^2 = \mathrm{tr}\left(X X^T - 2\left[X_l V_l U^T + X_u V_u U^T\right] + U V_l^T V_l U^T + U V_u^T V_u U^T\right).$$

Drop constant terms, and problem (12) becomes

$$\arg\min_{V_u^T V_u = I;\; U, V_u \ge 0} \mathrm{tr}\left(-2\left[X_l V_l U^T + X_u V_u U^T\right] + U V_l^T V_l U^T + U V_u^T V_u U^T\right) + \gamma\, \mathrm{tr}\left(2 V_u^T L_{ul} Y_l + V_u^T L_{uu} V_u\right). \tag{13}$$

The key to solving the out-of-sample problem is the basis matrix $U$. When a new sample $x$ arrives, the representation $v$ of $x$ can be computed by solving

$$\arg\min_{v \ge 0} \|x - U v^T\|^2. \tag{14}$$

At last, we can determine the label of $x$ by contrasting the obtained $v$ and the trained $V$. Usually, the 1-nn (nearest neighbor) method is adopted to do this job.

3 Algorithms

3.1 $\mu > 0$

We first discuss how to solve problem (3). The problem in (3) is a discrete optimization problem. Thus, finding the optimal solution is NP-hard. To get around this, relaxations can be considered. As mentioned above, if $f(V, Y)$ is defined by (4), then (3) becomes a standard LGC problem by discarding the orthogonality and the
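discreteness constraints. In this way, the discreteness and the orthogonality of the solutions are totally ignored. Actually, we want to preserve more constraints in (3). Firstly, we keep the nonnegative constraint strictly. Though the discreteness constraint is always discarded, we still want the solutions to be sparse. In [33], we proposed a novel algorithm by congruent approximation to solve the normalized cut problem. The orthogonality constraint and sparseness of solutions can be properly reached by considering the congruent approximation. Consider the regularizer

$$R(V) \triangleq \mathrm{tr}\left(V^T V\right) - \log\det\left(V^T V\right). \tag{15}$$

It is a strictly convex function [34] and this regularizer can be viewed as a special case of the LogDet divergence [35]. The regularizer $R$ is used to approximate the orthogonality constraint $V^T V = I_c$. By considering the regularizer $R$, (3) becomes

$$\arg\min_{V \ge 0} \mathrm{tr}\left(V^T L V\right) + \lambda R + \mu f(V, Y). \tag{16}$$

Considering different $f(V, Y)$, we have the following three objective functions

$$O_1 = \arg\min_{V \ge 0} \mathrm{tr}\left(V^T L V\right) + \alpha\|V - Y\|_F^2 + \lambda R, \tag{17}$$

$$O_2 = \arg\min_{V \ge 0} \mathrm{tr}\left(V^T L V\right) + \beta\|\Lambda \odot V - Y\|_F^2 + \lambda R, \tag{18}$$

$$O_3 = \arg\min_{V \ge 0} \mathrm{tr}\left(V^T L V\right) + \mu_1\|\Phi \odot V\|^2 + \mu_2\|Y \odot V - Y \odot (V 1_k 1_k^T)\|^2 + \lambda R, \tag{19}$$

where $\alpha, \beta, \mu_1, \mu_2, \lambda > 0$ are the regularization parameters, and $L = I - S$ is the normalized Laplacian matrix with $S = D^{-1/2} W D^{-1/2}$.

We first discuss how to minimize the objective function $O_1$, which can be rewritten as

$$O_1 = \mathrm{tr}\left(V^T L V + \lambda V^T V\right) - \lambda \log\det\left(V^T V\right) + \alpha\, \mathrm{tr}\left[(V - Y)^T (V - Y)\right]. \tag{20}$$

Let $\phi_{jk}$ be the Lagrange multiplier for the constraint $v_{jk} \ge 0$. Denote $\Phi = [\phi_{jk}]$, then the Lagrangian $M$ is

$$M = \mathrm{tr}\left(V^T L V\right) + \lambda\, \mathrm{tr}\left(V^T V\right) - \lambda \log\det\left(V^T V\right) + \alpha\, \mathrm{tr}\left[(V - Y)^T (V - Y)\right] + \mathrm{tr}\left(\Phi V^T\right). \tag{21}$$

Letting the derivatives of $M$ with respect to $V$ vanish, we have

$$\frac{\partial M}{\partial V} = 2 L V + 2\lambda V - 2\lambda V\left(V^T V\right)^{-1} + 2\alpha (V - Y) + \Phi. \tag{22}$$

Using the KKT conditions [36] $\phi_{jk} V_{jk} = 0$, we get the following equations for $V_{jk}$

$$\left((L V)_{jk} + \lambda V_{jk} - \lambda\left[V\left(V^T V\right)^{-1}\right]_{jk} + \alpha V_{jk} - \alpha Y_{jk}\right) V_{jk} = 0. \tag{23}$$

These equations lead to the following update rule

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(S V + \alpha Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V + \lambda V + \alpha V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{24}$$

where we separate the positive and negative parts of a matrix $B$ ($B = V(V^T V)^{-1}$) as $B^{+}_{ik} = (|B_{ik}| + B_{ik})/2$, $B^{-}_{ik} = (|B_{ik}| - B_{ik})/2$.

Similar to minimizing $O_1$, minimizing the objective function $O_2$ leads to the following update rule

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(S V + \beta Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V + \lambda V + \beta\, \Lambda \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}. \tag{25}$$

Minimizing the objective function $O_3$ leads to the following update rule

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(S V + \mu_2\, Y \odot (V 1_k 1_k^T) + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V + \lambda V + \mu_1\, \Phi \odot V + \mu_2\, Y \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}. \tag{26}$$

3.2 $\mu = \infty$

Now we discuss the updating rule for problem (10). By considering the regularizer $R(V_u) \triangleq \mathrm{tr}\left(V_u^T V_u\right) - \log\det\left(V_u^T V_u\right)$, problem (10) can be recast as

$$O_4 = \mathrm{tr}\left(V^T L V\right) + \lambda R(V_u). \tag{27}$$

Let $\phi_{jk}$ be the Lagrange multiplier for the constraint $v_{jk} \ge 0$. Denote $\Phi = [\phi_{jk}]$, then the Lagrangian $\mathcal{L}$ is

$$\mathcal{L} = \mathrm{tr}\left(V^T L V\right) + \lambda R(V_u) + \mathrm{tr}\left(\Phi V_u^T\right). \tag{28}$$

Then

$$\frac{\partial \mathcal{L}}{\partial V_u} = 2 L_{ul} Y_l + 2 L_{uu} V_u + 2\lambda V_u - 2\lambda V_u\left(V_u^T V_u\right)^{-1} + \Phi. \tag{29}$$

Using the KKT conditions [36] $\phi_{jk} V_{jk} = 0$, we get the following update rule for $v^{u}_{jk}$, the entries of $V_u$,

$$v^{u}_{jk} \leftarrow v^{u}_{jk}\, \frac{\left(L_{uu}^{-} V_u + L_{ul}^{-} Y_l + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{+}\right)_{jk}}{\left(L_{uu}^{+} V_u + L_{ul}^{+} Y_l + \lambda V_u + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{-}\right)_{jk}}. \tag{30}$$

3.3 Out of sample problem

Now we discuss the updating rule for problem (11) by considering the regularizer $R(V) = \mathrm{tr}(V^T V) - \log\det(V^T V)$ when $\mu > 0$ and $R(V_u) = \mathrm{tr}\left(V_u^T V_u\right) - \log\det\left(V_u^T V_u\right)$ when $\mu = \infty$. When $\mu > 0$, problem (11) can be recast as

$$\arg\min_{U, V \ge 0} \|X - U V^T\|_F^2 + \gamma\, \mathrm{tr}\left(V^T L V\right) + \mu f(V, Y) + \lambda R. \tag{31}$$

The objective function of (31) is

$$O_5 = \|X - U V^T\|_F^2 + \gamma\, \mathrm{tr}\left(V^T L V\right) + \mu f(V, Y) + \lambda R. \tag{32}$$

Let $\psi_{jk}$ be the Lagrange multiplier for the constraint $u_{jk} \ge 0$. Denote $\Psi = [\psi_{jk}]$, then the Lagrangian $\mathcal{L}$ of (32) is

$$\mathcal{L} = \|X - U V^T\|_F^2 + \gamma\, \mathrm{tr}\left(V^T L V\right) + \lambda R + \mu f(V, Y) + \mathrm{tr}\left(\Psi U^T\right). \tag{33}$$

Letting the derivatives of $\mathcal{L}$ with respect to $U$ vanish, we have

$$\frac{\partial \mathcal{L}}{\partial U} = -2 X V + 2 U V^T V + \Psi. \tag{34}$$

Using the KKT conditions [36] $\psi_{jk} u_{jk} = 0$, we get the following update rule for $u_{jk}$

$$u_{jk} \leftarrow u_{jk}\, \frac{(X V)_{jk}}{(U V^T V)_{jk}}. \tag{35}$$

Similar to minimizing $O_1$–$O_3$, minimizing the objective function $O_5$ with different $f(V, Y)$ leads to the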
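following update rules

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(X^T U + \gamma S V + \alpha Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V + \lambda V + \alpha V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{36}$$

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(X^T U + \gamma S V + \beta Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V + \lambda V + \beta\, \Lambda \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{37}$$

$$V_{jk} \leftarrow V_{jk}\, \frac{\left(X^T U + \gamma S V + \mu_2\, Y \odot (V 1_k 1_k^T) + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V + \lambda V + \mu_1\, \Phi \odot V + \mu_2\, Y \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{38}$$

where $\alpha > 0$ is used to substitute $\mu$ in (36), and $\beta > 0$ is used to substitute $\mu$ in (37).

When $\mu = \infty$, problem (12) [or problem (13)] can be recast as

$$\arg\min_{U, V_u \ge 0} \mathrm{tr}\left(-2\left[X_l V_l U^T + X_u V_u U^T\right] + U V_l^T V_l U^T + U V_u^T V_u U^T\right) + \gamma\, \mathrm{tr}\left(2 V_u^T L_{ul} Y_l + V_u^T L_{uu} V_u\right) + \lambda R(V_u). \tag{39}$$

It is easy to see that the updating rule for $U$ is the same as (35), and the updating rule for $V_u$ is

$$v^{u}_{jk} \leftarrow v^{u}_{jk}\, \frac{\left(X_u^T U + \gamma L_{uu}^{-} V_u + \gamma L_{ul}^{-} Y_l + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_u U^T U + \gamma L_{uu}^{+} V_u + \gamma L_{ul}^{+} Y_l + \lambda V_u + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{-}\right)_{jk}}. \tag{40}$$

Now we discuss the method to solve (14), which is

$$O_6 = \|x - U v^T\|_F^2. \tag{41}$$

Let $\psi_j$ be the Lagrange multiplier for the constraint $v_j \ge 0$. Denote $\psi = [\psi_j]$, then the Lagrangian $\mathcal{L}$ of (41) is

$$\mathcal{L} = \|x - U v^T\|_F^2 + \mathrm{tr}\left(\psi v^T\right). \tag{42}$$

Letting the derivatives of $\mathcal{L}$ with respect to $v$ vanish, we have

$$\frac{\partial \mathcal{L}}{\partial v} = v U^T U - x^T U + \psi. \tag{43}$$

Using the KKT conditions [36] $\psi_j v_j = 0$, we get the following update rule for $v_j$

$$v_j \leftarrow v_j\, \frac{(x^T U)_j}{(v U^T U)_j}. \tag{44}$$

4 Graph cuts

In Sect. 3, algorithms are designed to study the semi-supervised graph cut problem under the normalized cut (Ncut) [7] criterion. Besides, other graph criteria including the min–max cut (Mmcut) [25] and ratio cut (Rcut) [22] can also be considered to study the semi-supervised graph cut problem. In this section, we investigate min–max cut and ratio cut to extend the proposed learning algorithms.

Note that (2) is used as the graph cut regularizer to learn the global smoothness of the data. If we denote the graph cut regularizer by $J$, then the semi-supervised graph cut problem can be written as

$$\arg\min_{V^T V = I,\; V \ge 0} J + \mu f(V, Y). \tag{45}$$

The semi-supervised graph cut problem that solves the out-of-sample problem can be written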
as

$$\arg\min_{V^T V = I;\; U, V \ge 0} \|X - U V^T\|_F^2 + \gamma J + \mu f(V, Y). \tag{46}$$

In fact, the original normalized cut problem [21] can be recast as

$$\arg\min \tilde{J}_{Ncut} = \arg\min \mathrm{tr}\left[\tilde{V}^T (D - W) \tilde{V}\right] \quad \text{s.t. } \tilde{V}^T D \tilde{V} = I_k,\; \tilde{V} \ge 0, \tag{47}$$

where $\tilde{V} \in R^{n \times k}$ is a specific indicator matrix. A transformation can be considered, and (47) can be recast as

$$\arg\min J_{Ncut} = \arg\min \mathrm{tr}\left[V^T \left(I - D^{-1/2} W D^{-1/2}\right) V\right] \quad \text{s.t. } V^T V = I_k,\; V \ge 0, \tag{48}$$

where $V \in R^{n \times k}$ is a specific indicator matrix.

It is easy to see that $\tilde{V} = D^{-1/2} V$ ($\tilde{V}$ in (47), and $V$ in (48)). Both (47) and (48) are normalized cut problems, and we can name them original normalized cut and normalized cut respectively. We can first solve the normalized cut problem in Sect. 3, and then solve the original normalized cut problem by $\tilde{V} = D^{-1/2} V$.

The ratio cut problem can be recast as

$$\arg\min J_{Rcut} = \arg\min \mathrm{tr}\left[V^T (D - W) V\right] \quad \text{s.t. } V^T V = I_k,\; V \ge 0, \tag{49}$$

where $V \in R^{n \times k}$ is a specific indicator matrix.

The minmax cut problem can be recast as

$$\arg\min \tilde{J}_{Mmcut} = \arg\min \sum_{i=1}^{k} \frac{\tilde{v}_i^T D \tilde{v}_i}{\tilde{v}_i^T W \tilde{v}_i} \quad \text{s.t. } \tilde{V}^T D \tilde{V} = I_k,\; \tilde{V} \ge 0, \tag{50}$$

where $\tilde{v}_i$ is the $i$-th column of $\tilde{V}$, and $\tilde{V} \in R^{n \times k}$ is a specific indicator matrix. A transformation can be considered, and (50) can be recast as

$$\arg\min J_{Mmcut} = \arg\min \sum_{i=1}^{k} \frac{v_i^T v_i}{v_i^T D^{-1/2} W D^{-1/2} v_i} \quad \text{s.t. } V^T V = I_k,\; V \ge 0, \tag{51}$$

where $v_i$ is the $i$-th column of $V$, and $V \in R^{n \times k}$ is a specific indicator matrix.

It is easy to see that $\tilde{V} = D^{-1/2} V$ ($\tilde{V}$ in (50), and $V$ in (51)). Both (50) and (51) are minmax cut problems, and we can name them original minmax cut and minmax cut. We can first solve the minmax cut problem, and then solve the original minmax cut by $\tilde{V} = D^{-1/2} V$.

If we use ratio cut as the graph cut regularizer, we find that ratio cut and normalized cut share the same updating rules in Sect. 3. The difference between them is that they use different Laplacian matrices. Normalized cut uses $I - D^{-1/2} W D^{-1/2}$ as the Laplacian matrix, while ratio cut uses $D - W$ as the Laplacian matrix.

If we use minmax cut as the graph cut regularizer, we can adopt the proposed method in Sect. 3 to solve the
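semi-supervised minmax cut problem. When $\mu > 0$, updating rules for problem (45) with three kinds of $f(V, Y)$ are

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(L_\alpha V_\gamma + \alpha Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_\beta + \lambda V + \alpha V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{52}$$

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(L_\alpha V_\gamma + \beta Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_\beta + \lambda V + \beta\, \Lambda \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{53}$$

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(L_\alpha V_\gamma + \mu_2\, Y \odot (V 1_k 1_k^T) + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_\beta + \lambda V + \mu_1\, \Phi \odot V + \mu_2\, Y \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{54}$$

where

$$V_\beta = \left[\frac{1}{v_1^T L_\alpha v_1} v_1,\; \frac{1}{v_2^T L_\alpha v_2} v_2,\; \ldots,\; \frac{1}{v_k^T L_\alpha v_k} v_k\right], \qquad V_\gamma = \left[\frac{v_1^T v_1}{(v_1^T L_\alpha v_1)^2} v_1,\; \frac{v_2^T v_2}{(v_2^T L_\alpha v_2)^2} v_2,\; \ldots,\; \frac{v_k^T v_k}{(v_k^T L_\alpha v_k)^2} v_k\right],$$

and $L_\alpha = D^{-1/2} W D^{-1/2}$.

When $\mu = \infty$, we first divide $L_\alpha$ as

$$L_\alpha = \begin{pmatrix} (L_\alpha)_{ll} & (L_\alpha)_{lu} \\ (L_\alpha)_{ul} & (L_\alpha)_{uu} \end{pmatrix}. \tag{55}$$

Then we divide $v_i$ as $v_i = [v_{li}; v_{ui}]$, where $v_i$ is the $i$-th column of $V$. The updating rule can be obtained by

$$v^{u}_{jk} \leftarrow v^{u}_{jk}\, \frac{\left((L_\alpha)_{ul} V_\gamma + (L_\alpha)_{uu} V_\theta + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_\beta + \lambda V_u + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{-}\right)_{jk}}, \tag{56}$$

where

$$V_\beta = \left[\frac{1}{v_1^T L_\alpha v_1} v_{u1},\; \ldots,\; \frac{1}{v_k^T L_\alpha v_k} v_{uk}\right], \quad V_\gamma = \left[\frac{v_1^T v_1}{(v_1^T L_\alpha v_1)^2} y_{l1},\; \ldots,\; \frac{v_k^T v_k}{(v_k^T L_\alpha v_k)^2} y_{lk}\right], \quad V_\theta = \left[\frac{v_1^T v_1}{(v_1^T L_\alpha v_1)^2} v_{u1},\; \ldots,\; \frac{v_k^T v_k}{(v_k^T L_\alpha v_k)^2} v_{uk}\right].$$

When $\mu > 0$, updating rules for problem (46) with three kinds of $f(V, Y)$ are

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(X^T U + \gamma L_\alpha V_\gamma + \alpha Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V_\beta + \lambda V + \alpha V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{57}$$

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(X^T U + \gamma L_\alpha V_\gamma + \beta Y + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V_\beta + \lambda V + \beta\, \Lambda \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}, \tag{58}$$

$$v_{jk} \leftarrow v_{jk}\, \frac{\left(X^T U + \gamma L_\alpha V_\gamma + \mu_2\, Y \odot (V 1_k 1_k^T) + \lambda\left[V\left(V^T V\right)^{-1}\right]^{+}\right)_{jk}}{\left(V U^T U + \gamma V_\beta + \lambda V + \mu_1\, \Phi \odot V + \mu_2\, Y \odot V + \lambda\left[V\left(V^T V\right)^{-1}\right]^{-}\right)_{jk}}. \tag{59}$$

When $\mu = \infty$, we have

$$v^{u}_{jk} \leftarrow v^{u}_{jk}\, \frac{\left(X_u^T U + \gamma (L_\alpha)_{ul} V_\gamma + \gamma (L_\alpha)_{uu} V_\theta + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{+}\right)_{jk}}{\left(V_u U^T U + \gamma V_\beta + \lambda V_u + \lambda\left[V_u\left(V_u^T V_u\right)^{-1}\right]^{-}\right)_{jk}}, \tag{60}$$

where

$$V_\beta = \left[\frac{1}{v_1^T L_\alpha v_1} v_{u1},\; \ldots,\; \frac{1}{v_k^T L_\alpha v_k} v_{uk}\right], \quad V_\gamma = \left[\frac{v_1^T v_1}{(v_1^T L_\alpha v_1)^2} y_{l1},\; \ldots,\; \frac{v_k^T v_k}{(v_k^T L_\alpha v_k)^2} y_{lk}\right], \quad V_\theta = \left[\frac{v_1^T v_1}{(v_1^T L_\alpha v_1)^2} v_{u1},\; \ldots,\; \frac{v_k^T v_k}{(v_k^T L_\alpha v_k)^2} v_{uk}\right].$$

5 Experiments

In this section, we construct experiments to demonstrate the effectiveness of the proposed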
algorithms. We use four categories of public datasets in the experiments, including image data, text data and handwritten digit data. We summarize these databases in Table 1. These datasets are

• Yale Database. The Yale database (1) contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per different facial expression or configuration. Each image is represented by a 1024-dimensional vector in image space.

Table 1 Statistics of the four datasets

Dataset   Size (n)   Dimensionality (m)   # of Classes (k)
YaleB     640        2056                 10
TDT2      1500       36,771               30
USPS      500        256                  10
Yale      165        1024                 15

(1) /projects/yalefaces/yalefaces.html
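Returning to the toy example of Sect. 2.1, the quoted values of the label fitting terms (4), (5) and (6) can be reproduced numerically. The following is a minimal sketch, assuming NumPy and assuming the matrices $Y$, $V_1$ and $V_2$ are as read from the example (five samples, three classes, samples 1 and 4 labeled); the function names `fit4`, `fit5` and `fit6_terms` are our own and purely illustrative.

```python
import numpy as np

s = 1.0 / np.sqrt(3.0)

# Label matrix: sample 1 has label 2, sample 4 has label 1, the rest are unlabeled.
Y = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

# Lambda: rows of ones for labeled samples. Phi: ones on the wrong classes of labeled samples.
labeled = Y.sum(axis=1) > 0
Lam = np.repeat(labeled[:, None], Y.shape[1], axis=1).astype(float)
Phi = Lam - Y

V1 = np.array([[s, 0, 0], [0, s, 0], [s, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
V2 = np.array([[0, s, 0], [0, s, 0], [0, s, 0], [1, 0, 0], [0, 0, 3]], dtype=float)

def fit4(V):
    """Term (4): squared Frobenius distance between V and Y on all samples."""
    return float(np.sum((V - Y) ** 2))

def fit5(V):
    """Term (5): the same distance restricted to labeled rows via Lambda."""
    return float(np.sum((Lam * V - Y) ** 2))

def fit6_terms(V):
    """The two unweighted norms appearing in term (6)."""
    row_sums = V.sum(axis=1, keepdims=True)  # broadcasts like V 1 1^T
    wrong_class = float(np.sum((Phi * V) ** 2))
    soft_fit = float(np.sum((Y * V - Y * row_sums) ** 2))
    return wrong_class, soft_fit

for name, V in (("V1", V1), ("V2", V2)):
    print(name, fit4(V), fit5(V), fit6_terms(V))
```

Up to rounding, the printed values agree with the figures quoted in Sect. 2.1: term (4) prefers $V_1$, while terms (5) and (6) prefer $V_2$.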
2009 On The Heston Model with Stochastic Interest Rates

Derivatives Research and Validation Group, Rabobank, Jaarbeursplein 22, 3521 AP, Utrecht, the Netherlands
CWI - National Research Institute for Mathematics and Computer Science, Kruislaan 413, 1098 SJ, Amsterdam, the Netherlands

first version: February 17, 2009
this version: January 18, 2010

Abstract

We discuss the Heston [Heston-1993] model with stochastic interest rates driven by Hull-White [Hull,White-1996] (HW) or Cox-Ingersoll-Ross [Cox, et al.-1985] (CIR) processes. A so-called volatility compensator is defined which guarantees that the Heston hybrid model with a non-zero correlation between the equity and interest rate processes is properly defined. Two different approximations of the hybrid models are presented in order to obtain the characteristic functions. These approximations admit pricing basic derivative products with Fourier techniques [Carr,Madan-1999; Fang,Oosterlee-2008], and can therefore be used for fast calibration of the hybrid model. The effect of the approximations on the instantaneous correlations and the influence of the correlation between stock and interest rate on the implied volatilities are also discussed.

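32. Which of the following statements about the difference between conscripts and non-commissioned officers is correct? A. Active-duty soldiers in our army are divided, according to the nature of their military service, into conscripted-service soldiers and volunteer-service soldiers.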