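By entering international markets, firms can, on the one hand, seek innovation resources and learning opportunities on the international stage and thereby strengthen their innovation capability and competitiveness; on the other hand, they can exploit their existing technological and marketing advantages in international markets and thereby expand their market share (Luo & Tung, 2007).
However, Chinese firms lack the technological knowledge, marketing knowledge, and international operating experience that internationalization requires, so they inevitably run into various obstacles in the process, such as the liability of foreignness and the liability of newness (Zaheer, 1995). External networks, as an important channel for acquiring knowledge, give Chinese firms an opportunity to overcome these obstacles (Oviatt & McDougall, 1994; Tseng et al., 2007; Yu, Gilbert & Oviatt, 2011).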
Regularization Paths for Generalized Linear Models via Coordinate Descent
Stanford University
Trevor Hastie
Stanford University
Rob Tibshirani
Stanford University
Abstract

We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include $\ell_1$ (the lasso), $\ell_2$ (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
Journal of Statistical Software
January 2010, Volume 33, Issue 1.
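The cyclical coordinate descent along a regularization path described in the abstract can be sketched for the simplest case, the lasso with squared-error loss. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes no intercept, no standardization, and the objective $(1/2n)\|y - X\beta\|^2 + \lambda\|\beta\|_1$; the function names `soft_threshold`, `lasso_cd`, and `lasso_path` are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, beta0=None, n_sweeps=200, tol=1e-10):
    """Cyclical coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumption of this sketch: no intercept and no column standardization.
    """
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n      # x_j^T x_j / n for each column
    resid = y - X @ beta                   # full residual, updated in place
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):                 # one full cycle over coordinates
            if col_sq[j] == 0.0:
                continue
            # x_j^T (partial residual) / n, with coordinate j added back in
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def lasso_path(X, y, n_lambda=20, ratio=1e-3):
    """Solve along a decreasing lambda grid, warm-starting each fit."""
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest lambda giving beta = 0
    lams = lam_max * np.logspace(0, np.log10(ratio), n_lambda)
    beta = np.zeros(X.shape[1])
    betas = []
    for lam in lams:
        beta = lasso_cd(X, y, lam, beta0=beta)   # warm start
        betas.append(beta.copy())
    return lams, np.array(betas)
```

Starting at the largest penalty, where the solution is zero, and reusing each solution as the starting point for the next value of $\lambda$ is what makes computing the whole path cheap in this sketch.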
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 109, No. 3, pp. 475–494, JUNE 2001

Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization¹

P. TSENG²

Communicated by O. L. Mangasarian

Abstract. We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function $f(x_1, \dots, x_N)$ with certain separability and regularity properties. Assuming that $f$ is continuous on a compact level set, the subsequence convergence of the iterates to a stationary point is shown when either $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks or $f$ has at most one minimum in each of $N-2$ coordinate blocks. If $f$ is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of $f$ and compactness of the level set may be relaxed further. These results are applied to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut, and an algorithm of Han. They are applied also to a problem of blind source separation.

Key Words. Block coordinate descent, nondifferentiable minimization, stationary point, Gauss–Seidel method, convergence, quasiconvex functions, pseudoconvex functions.

¹ This work was partially supported by the National Science Foundation Grant CCR-9731273.
² Professor, Department of Mathematics, University of Washington, Seattle, Washington.

0022-3239/01/0600-0475 \$19.50/0 © 2001 Plenum Publishing Corporation

1. Introduction

A popular method for minimizing a real-valued continuously differentiable function $f$ of $n$ real variables, subject to bound constraints, is the (block) coordinate descent method. In this method, the coordinates are partitioned into $N$ blocks and, at each iteration, $f$ is minimized with respect to one of the coordinate blocks while the other coordinates are held fixed. This method, which is related closely to the Gauss–Seidel and SOR methods for equation solving (Ref. 1), was studied early by Hildreth (Ref. 2) and Warga (Ref. 3), and is described in various books on optimization (Refs. 1 and 4–10). Its applications include channel capacity computation (Refs. 11–12), image reconstruction (Ref. 7), dynamic programming (Refs. 13–15), and flow routing (Ref. 16). It may be applied also to the dual of a linearly constrained, strictly convex program to obtain various decomposition methods (see Refs. 6–7, 17–22, and references therein) and parallel SOR methods (Ref. 23).

Convergence of the (block) coordinate descent method requires typically that $f$ be strictly convex (or quasiconvex or hemivariate) differentiable and, taking into account the bound constraints, has bounded level sets (e.g., Refs. 3–4 and 24–25). Zadeh (Ref. 26; also see Ref. 27) relaxed the strict convexity assumption to pseudoconvexity, which allows $f$ to have a nonunique minimum along coordinate directions. For certain classes of convex functions, the level sets need not be bounded (see Refs. 2, 6–7, 17, 19–22, and references therein). If $f$ is not (pseudo)convex, then an example of Powell (Ref. 28) shows that the method may cycle without approaching any stationary point of $f$. Nonetheless, convergence can be shown for special cases of non(pseudo)convex $f$, as when $f$ is quadratic (Ref. 29), or $f$ is strictly pseudoconvex in each of $N-2$ coordinate blocks (Ref. 27), or $f$ has unique minimum in each coordinate block (Ref. 8, p. 159). If $f$ is not differentiable, the coordinate descent method may get stuck at a nonstationary point even when $f$ is convex (e.g., Ref. 4, p. 94). For this reason, it is perceived generally that the method is unsuitable when $f$ is nondifferentiable. However, an exception occurs when the nondifferentiable part of $f$ is separable. Such a structure for $f$ was considered first by Auslender (Ref. 4, p. 94) in the case where $f$ is strongly convex. This structure is implicit in a decomposition method and projection method of Han (Refs. 18, 30), for which $f$ is the convex dual functional associated
with a certain linearly constrained convex program (see Ref. 22 for detailed discussions). This structure arises also in least-square problems where an $\ell_1$-penalty is placed on a subset of the parameters in order to minimize the support (see Refs. 31–33 and references therein).

Motivated by the preceding works, we consider in this paper the nondifferentiable (nonconvex) case where the nondifferentiable part of $f$ is separable. Specifically, we assume that $f$ has the following special form:

$$f(x_1, \dots, x_N) = f_0(x_1, \dots, x_N) + \sum_{k=1}^{N} f_k(x_k), \tag{1}$$

for some $f_0: \Re^{n_1 + \cdots + n_N} \to \Re \cup \{\infty\}$ and some $f_k: \Re^{n_k} \to \Re \cup \{\infty\}$, $k = 1, \dots, N$. Here, $N, n_1, \dots, n_N$ are positive integers. We assume that $f$ is proper, i.e., $f \not\equiv \infty$. We will refer to each $x_k$, $k = 1, \dots, N$, as a coordinate block of $x = (x_1, \dots, x_N)$. We will show that each cluster point of the iterates generated by the (block) coordinate descent method is a stationary point of $f$, provided that $f_0$ has a certain smoothness property (see Lemma 3.1), $f$ is continuous on a compact level set, and either $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks, or $f$ has at most one minimum in each of $N-2$ coordinate blocks (see Theorem 4.1).
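The introduction's remark that coordinate descent can stall at a nonstationary point of a nondifferentiable convex function, but not when the nondifferentiable part is separable as in (1), can be checked numerically. The sketch below is only an illustration: the two test functions `g` (nonseparable term $|x - y|$) and `h` (separable term $|x| + |y|$), the closed-form coordinate updates, and the helper `min_directional_slope` are assumptions of this example and do not come from the paper.

```python
import numpy as np

def soft(z, t):
    """Scalar soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def g(x, y):
    """Convex, but the nonsmooth term |x - y| is NOT separable."""
    return abs(x - y) + 0.5 * (x ** 2 + y ** 2)

def h(x, y):
    """Convex, of form (1): smooth f0 plus separable 0.5*(|x| + |y|)."""
    return (0.5 * (x - 1) ** 2 + 0.5 * (y - 1) ** 2 + 0.5 * x * y
            + 0.5 * (abs(x) + abs(y)))

def cd_g(x, y, sweeps=50):
    """Exact cyclic coordinate minimization of g."""
    for _ in range(sweeps):
        x = y + soft(-y, 1.0)   # argmin over x of g(x, y)
        y = x + soft(-x, 1.0)   # argmin over y of g(x, y)
    return x, y

def cd_h(x, y, sweeps=50):
    """Exact cyclic coordinate minimization of h."""
    for _ in range(sweeps):
        x = soft(1.0 - 0.5 * y, 0.5)   # argmin over x of h(x, y)
        y = soft(1.0 - 0.5 * x, 0.5)   # argmin over y of h(x, y)
    return x, y

def min_directional_slope(f, x, y, step=1e-6, n_dirs=360):
    """Crude finite-difference estimate of min over unit d of f'((x, y); d)."""
    base = f(x, y)
    angles = 2.0 * np.pi * np.arange(n_dirs) / n_dirs
    return min((f(x + step * np.cos(a), y + step * np.sin(a)) - base) / step
               for a in angles)

# g: the iterates never leave (0.5, 0.5), although g(0, 0) is smaller and
# the slope along the diagonal direction (-1, -1) is strictly negative.
xg, yg = cd_g(0.5, 0.5)
# h: the iterates converge to a point where no direction gives descent.
xh, yh = cd_h(2.0, -1.0)
```

In this sketch the point $(0.5, 0.5)$ is a coordinatewise minimum of `g` without being stationary, whereas for `h` the coordinatewise minimum reached by the iteration is also stationary, which is the behavior the regularity condition introduced later is meant to guarantee.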
If $f$ is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of $f$ and compactness of the level set may be relaxed further (see Proposition 5.1). These results unify and extend some previous results in Refs. 4, 6, 8, 26–27. For example, previous results assumed that $f$ is pseudoconvex and that $f_1, \dots, f_N$ are indicator functions for closed convex sets, whereas we assume only that $f$ is pseudoconvex in every pair of coordinate blocks from among $N-1$ coordinate blocks, with no additional assumption made on $f_1, \dots, f_N$. Previous results also did not consider the case where $f$ is not continuous on its effective domain. Lastly, we apply our results to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut (Refs. 11–12), and an algorithm of Han (Ref. 30); see Examples 6.1–6.3. We also apply them to a problem of blind source separation described in Refs. 31, 33; see Example 6.4.

In our notation, $\Re^m$ denotes the space of $m$-dimensional real column vectors. For any $x, y \in \Re^m$, we denote by $\langle x, y \rangle$ the Euclidean inner product of $x, y$ and by $\|x\|$ the Euclidean norm of $x$, i.e., $\|x\| = \sqrt{\langle x, x \rangle}$. For any set $S \subseteq \Re^m$, we denote by $\operatorname{int}(S)$ the interior of $S$ and denote $\operatorname{bdry}(S) = S \setminus \operatorname{int}(S)$. For any $h: \Re^m \to \Re \cup \{\infty\}$, we denote by $\operatorname{dom} h$ the effective domain of $h$, i.e.,

$$\operatorname{dom} h = \{x \in \Re^m \mid h(x) < \infty\}.$$

For any $x \in \operatorname{dom} h$ and any $d \in \Re^m$, we denote the (lower) directional derivative of $h$ at $x$ in the direction $d$ by

$$h'(x; d) = \liminf_{\lambda \downarrow 0} \, [h(x + \lambda d) - h(x)]/\lambda.$$

We say that $h$ is quasiconvex if

$$h(x + \lambda d) \le \max\{h(x), h(x + d)\}, \quad \text{for all } x, d \text{ and } \lambda \in [0, 1];$$

$h$ is pseudoconvex if

$$h(x + d) \ge h(x), \quad \text{whenever } x \in \operatorname{dom} h \text{ and } h'(x; d) \ge 0;$$

see Ref. 34, p. 146; and $h$ is hemivariate if $h$ is not constant on any line segment belonging to $\operatorname{dom} h$ (Ref. 1). For any nonempty $I \subseteq \{1, \dots, m\}$, we say that $h(x_1, \dots, x_m)$ is pseudoconvex [respectively, has at most one minimum point] in $(x_i)_{i \in I}$ if, for every fixed $(x_j)_{j \notin I}$, the function $(x_i)_{i \in I} \mapsto h(x_1, \dots, x_m)$ is pseudoconvex [respectively, has at most one minimum point].

2. Block Coordinate Descent Method

We describe formally the block coordinate descent (BCD) method below.

BCD Method.

Initialization. Choose any $x^0 = (x_1^0, \dots, x_N^0) \in \operatorname{dom} f$.

Iteration $r+1$, $r \ge 0$. Given $x^r = (x_1^r, \dots, x_N^r) \in \operatorname{dom} f$, choose an index $s \in \{1, \dots, N\}$ and compute a new iterate

$$x^{r+1} = (x_1^{r+1}, \dots, x_N^{r+1}) \in \operatorname{dom} f$$

satisfying

$$x_s^{r+1} \in \arg\min_{x_s} f(x_1^r, \dots, x_{s-1}^r, x_s, x_{s+1}^r, \dots, x_N^r), \tag{2}$$

$$x_j^{r+1} = x_j^r, \quad \forall j \ne s. \tag{3}$$

We note that the minimization in (2) is attained if

$$X^0 = \{x : f(x) \le f(x^0)\}$$

is bounded and $f$ is lower semicontinuous (lsc) on $X^0$, so $X^0$ is compact (Ref. 35). Alternatively, this minimization is attained if $f$ is convex, has a minimum point, and is hemivariate in each coordinate block (but the level sets of $f$ need not be bounded). To ensure convergence, we need further that each coordinate block is chosen sufficiently often in the method. In particular, we will choose the coordinate blocks according to the following rule (see, e.g., Refs. 7–8, 21, 25).

Essentially Cyclic Rule. There exists a constant $T \ge N$ such that every index $s \in \{1, \dots, N\}$ is chosen at least once between the $r$th iteration and the $(r + T - 1)$th iteration, for all $r$.

A well-known special case of this rule, for which $T = N$, is given below.

Cyclic Rule. Choose $s = k$ at iterations $k, k + N, k + 2N, \dots$, for $k = 1, \dots, N$.

3. Stationary Points of f

We say that $z$ is a stationary point of $f$ if $z \in \operatorname{dom} f$ and

$$f'(z; d) \ge 0, \quad \forall d.$$

We say that $z$ is a coordinatewise minimum point of $f$ if $z \in \operatorname{dom} f$ and

$$f(z + (0, \dots, d_k, \dots, 0)) \ge f(z), \quad \forall d_k \in \Re^{n_k}, \tag{4}$$

for all $k = 1, \dots, N$. Here and throughout, we denote by $(0, \dots, d_k, \dots, 0)$ the vector in $\Re^{n_1 + \cdots + n_N}$ whose $k$th coordinate block is $d_k$ and whose other coordinates are zero. We say that $f$ is regular at $z \in \operatorname{dom} f$ if

$$f'(z; d) \ge 0, \quad \forall d = (d_1, \dots, d_N) \text{ such that } f'(z; (0, \dots, d_k, \dots, 0)) \ge 0, \; k = 1, \dots, N. \tag{5}$$

This notion of regularity is weaker than that used by Auslender (Ref. 4, p. 93), which entails

$$f'(z; d) = \sum_{k=1}^{N} f'(z; (0, \dots, d_k, \dots, 0)), \quad \text{for all } d = (d_1, \dots, d_N).$$

For example, the function

$$f(x_1, x_2) = \phi(x_1, x_2) + \phi(-x_1, x_2) + \phi(x_1, -x_2) + \phi(-x_1, -x_2),$$

where $\phi(a, b) = \max\{0, \, a + b - \sqrt{a^2 + b^2}\}$, is regular at $z = (0, 0)$ in the sense of (5), but is not regular in the sense of
Ref. 4, p. 93. Since (4) implies

$$f'(z; (0, \dots, d_k, \dots, 0)) \ge 0, \quad \text{for all } d_k,$$

it follows that a coordinatewise minimum point $z$ of $f$ is a stationary point of $f$ whenever $f$ is regular at $z$. To ensure regularity of $f$ at $z$, we consider one of the following smoothness assumptions on $f_0$:

(A1) $\operatorname{dom} f_0$ is open and $f_0$ is Gâteaux-differentiable on $\operatorname{dom} f_0$.

(A2) $f_0$ is Gâteaux-differentiable on $\operatorname{int}(\operatorname{dom} f_0)$ and, for every $z \in \operatorname{dom} f \cap \operatorname{bdry}(\operatorname{dom} f_0)$, there exist $k \in \{1, \dots, N\}$ and $d_k \in \Re^{n_k}$ such that $f(z + (0, \dots, d_k, \dots, 0)) < f(z)$.

Assumption A1 was considered essentially by Auslender (Ref. 4, Example 2 on p. 94). In contrast to Assumption A1, Assumption A2 allows $\operatorname{dom} f_0$ to include boundary points. We will see an application (Example 6.2) where A2 holds but not A1.

Lemma 3.1. Under A1, $f$ is regular at each $z \in \operatorname{dom} f$. Under A2, $f$ is regular at each coordinatewise minimum point $z$ of $f$.

Proof. Under A1, if $z = (z_1, \dots, z_N) \in \operatorname{dom} f$, then $z \in \operatorname{dom} f_0$. Under A2, if $z = (z_1, \dots, z_N)$ is a coordinatewise minimum point of $f$, then $z \notin \operatorname{bdry}(\operatorname{dom} f_0)$, so $z \in \operatorname{int}(\operatorname{dom} f_0)$. Thus, under either A1 or A2, $f_0$ is Gâteaux-differentiable at $z$. Fix any $d = (d_1, \dots, d_N)$ such that

$$f'(z; (0, \dots, d_k, \dots, 0)) \ge 0, \quad k = 1, \dots, N.$$

Then,

$$\begin{aligned}
f'(z; d) &= \langle \nabla f_0(z), d \rangle + \liminf_{\lambda \downarrow 0} \sum_{k=1}^{N} [f_k(z_k + \lambda d_k) - f_k(z_k)]/\lambda \\
&\ge \langle \nabla f_0(z), d \rangle + \sum_{k=1}^{N} \liminf_{\lambda \downarrow 0} \, [f_k(z_k + \lambda d_k) - f_k(z_k)]/\lambda \\
&= \langle \nabla f_0(z), d \rangle + \sum_{k=1}^{N} f_k'(z_k; d_k) \\
&= \sum_{k=1}^{N} f'(z; (0, \dots, d_k, \dots, 0)) \\
&\ge 0. \qquad \square
\end{aligned}$$

4. Convergence Analysis: I

Our first convergence result unifies and extends a result of Auslender (Ref. 4, p. 95) for the nondifferentiable convex case and some results of Grippo and Sciandrone (Ref. 27), Luenberger (Ref. 8, p. 159), and Zadeh (Ref. 26) for the differentiable case. In what follows, $r \equiv (N-1) \bmod N$ means $r = N-1, 2N-1, 3N-1, \dots$.

Theorem 4.1. Assume that the level set $X^0 = \{x : f(x) \le f(x^0)\}$ is compact and that $f$ is continuous on $X^0$. Then, the sequence $\{x^r = (x_1^r, \dots, x_N^r)\}_{r = 0, 1, \dots}$ generated by the BCD method using the essentially cyclic rule is defined and bounded. Moreover, the following statements hold:

(a) If $f(x_1, \dots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for
every $i, k \in \{1, \dots, N\}$, and if $f$ is regular at every $x \in X^0$, then every cluster point of $\{x^r\}$ is a stationary point of $f$.

(b) If $f(x_1, \dots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{1, \dots, N-1\}$, if $f$ is regular at every $x \in X^0$, and if the cyclic rule is used, then every cluster point of $\{x^r\}_{r \equiv (N-1) \bmod N}$ is a stationary point of $f$.

(c) If $f(x_1, \dots, x_N)$ has at most one minimum in $x_k$ for $k = 2, \dots, N-1$, and if the cyclic rule is used, then every cluster point $z$ of $\{x^r\}_{r \equiv (N-1) \bmod N}$ is a coordinatewise minimum point of $f$. In addition, if $f$ is regular at $z$, then $z$ is a stationary point of $f$.

Proof. Since $X^0$ is compact, an induction argument on $r$ shows that $x^{r+1}$ is defined, $f(x^{r+1}) \le f(x^r)$, and $x^{r+1} \in X^0$ for all $r = 0, 1, \dots$. Thus, $\{x^r\}$ is bounded. Consider any subsequence $\{x^r\}_{r \in R}$, with $R \subseteq \{0, 1, \dots\}$, converging to some $z$. For each $j \in \{1, \dots, T\}$, $\{x^{r-T+1+j}\}_{r \in R}$ is bounded, so by passing to a subsequence, if necessary, we can assume that $\{x^{r-T+1+j}\}_{r \in R}$ converges to some $z^j = (z_1^j, \dots, z_N^j)$, $j = 1, \dots, T$. Thus, $z^{T-1} = z$. Since $\{f(x^r)\}$ converges monotonically and $f$ is continuous on $X^0$, we obtain that

$$f(x^0) \ge \lim_{r \to \infty} f(x^r) = f(z^1) = \cdots = f(z^T). \tag{6}$$

By further passing to a subsequence, if necessary, we can assume that the index $s$ chosen at iteration $r - T + 1 + j$, $j \in \{1, \dots, T\}$, is the same for all $r \in R$, which we denote by $s_j$. For each $j \in \{1, \dots, T\}$, since $s_j$ is chosen at iteration $r - T + 1 + j$ for $r \in R$, then (2) and (3) yield

$$f(x^{r-T+1+j}) \le f(x^{r-T+1+j} + (0, \dots, d_{s_j}, \dots, 0)), \quad \forall d_{s_j}, \; j = 1, \dots, T,$$

$$x_k^{r-T+1+j} = x_k^{r-T+j}, \quad \forall k \ne s_j, \; j = 2, \dots, T.$$

Then, the continuity of $f$ on $X^0$ yields in the limit that

$$f(z^j) \le f(z^j + (0, \dots, d_{s_j}, \dots, 0)), \quad \forall d_{s_j}, \; j = 1, \dots, T,$$

$$z_k^j = z_k^{j-1}, \quad \forall k \ne s_j, \; j = 2, \dots, T. \tag{7}$$

Then, (6) and (7) yield

$$f(z^{j-1}) \le f(z^{j-1} + (0, \dots, d_{s_j}, \dots, 0)), \quad \forall d_{s_j}, \; j = 2, \dots, T. \tag{8}$$

(a), (b) Suppose that $f$ is regular at every $x \in X^0$ and that $f(x_1, \dots, x_N)$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{s_1\} \cup \cdots \cup \{s_{T-1}\}$. This holds under the assumption (a) or under the assumption (b), with $\{x^r\}_{r \in R}$ being any convergent subsequence of $\{x^r\}_{r \equiv (N-1) \bmod N}$. We claim that, for $j = 1, \dots, T-1$,

$$f(z^j) \le f(z^j + (0, \dots, d_k, \dots, 0)), \quad \forall d_k, \; \forall k = s_1, \dots, s_j. \tag{9}$$

By (7), (9) holds for $j = 1$. Suppose that (9) holds for $j = 1, \dots, l-1$ for some $l \in \{2, \dots, T-1\}$. We show that (9) holds for $j = l$. From (8), we have that

$$f(z^{l-1}) \le f(z^{l-1} + (0, \dots, d_{s_l}, \dots, 0)), \quad \forall d_{s_l},$$

implying

$$f'(z^{l-1}; (0, \dots, z_{s_l}^l - z_{s_l}^{l-1}, \dots, 0)) \ge 0.$$

Also, since (9) holds for $j = l-1$, we have that, for each $k = s_1, \dots, s_{l-1}$,

$$f'(z^{l-1}; (0, \dots, d_k, \dots, 0)) \ge 0, \quad \forall d_k.$$

Since by (6) $z^{l-1} \in X^0$, so $f$ is regular at $z^{l-1}$, the above two relations imply

$$f'(z^{l-1}; (0, \dots, d_k, \dots, 0) + (0, \dots, z_{s_l}^l - z_{s_l}^{l-1}, \dots, 0)) \ge 0, \quad \forall d_k.$$

Since $f$ is pseudoconvex in $(x_k, x_{s_l})$, this yields [also using $z^l = z^{l-1} + (0, \dots, z_{s_l}^l - z_{s_l}^{l-1}, \dots, 0)$] for $k = s_1, \dots, s_{l-1}$ that

$$f(z^l + (0, \dots, d_k, \dots, 0)) \ge f(z^{l-1}) = f(z^l), \quad \forall d_k.$$

Since we have also that (7) holds with $j = l$, we see that (9) holds for $j = l$. By induction, (9) holds for all $j = 1, \dots, T-1$.

Since $z^{T-1} = z$ and (9) holds for $j = T-1$, then (4) holds for $k = s_1, \dots, s_{T-1}$. Since $z^{T-1} = z$ and (8) holds (in particular, for $j = T$), then (4) holds for $k = s_T$ also. Since $\{1, \dots, N\} = \{s_1\} \cup \cdots \cup \{s_T\}$, this implies that $z$ is a coordinatewise minimum point of $f$. Since $f$ is regular at $z$, then $z$ is in fact a stationary point of $f$.

(c) Suppose that $f(x_1, \dots, x_N)$ has at most one minimum in $x_k$ for $k = s_2, \dots, s_{T-1}$. This holds under the assumption (c), with $\{x^r\}_{r \in R}$ being any convergent subsequence of $\{x^r\}_{r \equiv (N-1) \bmod N}$. For each $j = 2, \dots, T-1$, since (7) and (8) hold, then the function

$$d_{s_j} \mapsto f(z^j + (0, \dots, d_{s_j}, \dots, 0))$$

attains its minimum at both $d_{s_j} = 0$ and $d_{s_j} = z_{s_j}^{j-1} - z_{s_j}^j$. By assumption, the minimum point is unique, implying $0 = z_{s_j}^{j-1} - z_{s_j}^j$, or equivalently, $z^{j-1} = z^j$. Thus, $z^1 = z^2 = \cdots = z^{T-1} = z$ and (7) yields that (4) holds for $k = s_1, \dots, s_{T-1}$. Since $z^{T-1} = z$ and (8) holds (in particular, for $j = T$), then (4) holds for $k = s_T$ also. Since $\{1, \dots, N\} = \{s_1\} \cup \cdots \cup \{s_T\}$, this implies that $z$ is a coordinatewise minimum point of $f$. If $f$ is regular at $z$, then $z$ is also a
stationary point of $f$. $\square$

Notice that, if $f$ is pseudoconvex, then $f$ is pseudoconvex in $(x_k, x_i)$ for every $i, k \in \{1, \dots, N\}$; if $f$ is quasiconvex and hemivariate in $x_k$, then $f$ has at most one minimum in $x_k$. The converses do not hold. For example, the 2-variable Rosenbrock function has a unique minimum point but is not quasiconvex. The following 3-variable quadratic function

$$f(x_1, x_2, x_3) = (1/2)x_1^2 + (1/2)x_2^2 + (1/2)x_3^2 + x_1 x_3 + x_2 x_3 - x_1 x_2$$

is convex in every pair of variables, but is not pseudoconvex. In particular, for $x = (0, 0, 1/2)$ and $d = (1, 1, -1)$, we have $f'(x; d) = 1/2 \ge 0$, while $f(x + d) = -7/8 < f(x) = 1/8$. This example generalizes to any quadratic function

$$f(x) = \langle x, Qx \rangle,$$

where $Q \in \Re^{N \times N}$ is symmetric, not positive semidefinite, but whose $2 \times 2$ principal submatrices are positive semidefinite. Then, for any $d$ satisfying $\langle d, Qd \rangle < 0$ and any $x$ satisfying

$$0 \le \langle x, Qd \rangle < -(1/2)\langle d, Qd \rangle,$$

we have that $f'(x; d) \ge 0$, while $f(x + d) < f(x)$.

Thus, parts (a) and (c) of Theorem 4.1 may be viewed as extensions of two results of Grippo and Sciandrone (Ref. 27, Propositions 5.2, 5.3) for the case of $f_0$ being continuously differentiable and each $f_k$ being the indicator function of some closed convex set. In turn, the first of these results extended a result of Zadeh (Ref. 26) for which $f_k \equiv 0$ for all $k$. Part (b) makes a less restrictive assumption on $f$ than part (a), though its assumption on the BCD method is more restrictive. Part (b) is sharp in the sense that it is false if instead we assume that $f$ is convex in every coordinate block. This is because the Powell 3-variable example (Ref. 28) is convex in each variable; see Ref. 27, Section 6 for further discussions of the example. We will see an application (Example 6.4) in which part (b) applies but not part (a) nor (c).

5. Convergence Analysis: II

The convergence analysis of the previous section assumes $f$ to be continuous on a bounded level set and makes no use of the special structure (1) of $f$. In this section, we show that this assumption can be
relaxed by exploiting the special structure (1), provided that $f$ is quasiconvex and hemivariate in each coordinate block. More precisely, we will make the following assumptions on $f, f_0, f_1, \dots, f_N$:

(B1) $f_0$ is continuous on $\operatorname{dom} f_0$.

(B2) For each $k \in \{1, \dots, N\}$ and $(x_j)_{j \ne k}$, the function $x_k \mapsto f(x_1, \dots, x_N)$ is quasiconvex and hemivariate.

(B3) $f_0, f_1, \dots, f_N$ are lsc.

We will see some applications (Ref. 6, Section 3.4.3 and Examples 6.1–6.3) for which $f$ satisfies this weaker assumption although it is not strictly convex. In addition, we will make one of the following technical assumptions on $f_0$:

(C1) $\operatorname{dom} f_0$ is open and $f_0$ tends to $\infty$ at every boundary point of $\operatorname{dom} f_0$.

(C2) $\operatorname{dom} f_0 = Y_1 \times \cdots \times Y_N$, for some $Y_k \subseteq \Re^{n_k}$, $k = 1, \dots, N$.

In contrast to Assumption C1, Assumption C2 allows $f_0$ to have a finite value on $\operatorname{bdry}(\operatorname{dom} f)$. We will see in Example 6.2 a nonseparable function $f_0$ that satisfies Assumptions B1–B3 and C2, but not C1.

We show below that Assumptions B1–B3, together with either Assumption C1 or C2, ensure that every cluster point of the iterates generated by the BCD method is a coordinate minimum point of $f$. The proof of this result is patterned after an argument given by Bertsekas and Tsitsiklis (Ref. 6, pp. 220–221; also see Ref. 27), but is complicated by the fact that $f$ is not necessarily differentiable (or even continuous) on its effective domain.

Proposition 5.1. Suppose that $f, f_0, f_1, \dots, f_N$ satisfy Assumptions B1–B3 and that $f_0$ satisfies either Assumption C1 or C2. Also, assume that the sequence $\{x^r = (x_1^r, \dots, x_N^r)\}_{r = 0, 1, \dots}$ generated by the BCD method using the essentially cyclic rule is defined. Then, either $\{f(x^r)\} \downarrow -\infty$, or else every cluster point $z = (z_1, \dots, z_N)$ is a coordinatewise minimum point of $f$.

Proof. Since $f(x^0) < \infty$ and $f(x^{r+1}) \le f(x^r)$ for all $r$, then either $\{f(x^r)\} \downarrow -\infty$, or else $\{f(x^r)\}$ converges to some limit and $\{f(x^{r+1}) - f(x^r)\} \to 0$. Consider the latter case and let $z$ be any cluster point of $\{x^r\}$.
Since $f$ is lsc by Assumption B3, we have

$$f(z) \le \lim_{r \to \infty} f(x^r) < \infty,$$

so $z \in \operatorname{dom} f$. We show below that $z$ satisfies (4) for $k = 1, \dots, N$.

First, we claim that, for any infinite subsequence

$$\{x^r\}_{r \in R} \to z, \tag{10}$$

with $R \subseteq \{0, 1, \dots\}$, there holds that

$$\{x^{r+1}\}_{r \in R} \to z. \tag{11}$$

We prove this by contradiction. Suppose that this were not true. Then, there exists an infinite subsequence $R'$ of $R$ and a scalar $\epsilon > 0$ such that $\|x^{r+1} - x^r\| \ge \epsilon$, for all $r \in R'$. By further passing to a subsequence, if necessary, we can assume that there is some nonzero vector $d$ for which

$$\{(x^{r+1} - x^r)/\|x^{r+1} - x^r\|\}_{r \in R'} \to d, \tag{12}$$

and that the same coordinate block, say $x_s$, is chosen at the $(r+1)$st iteration for all $r \in R'$. Moreover, (10) implies that $\{f_0(x^r)\}_{r \in R}$ and $\{f_k(x_k^r)\}_{r \in R}$, $k = 1, \dots, N$, are bounded from below, which together with the convergence of $\{f(x^r)\} = \{f_0(x^r) + \sum_{k=1}^{N} f_k(x_k^r)\}$ implies that $\{f_0(x^r)\}_{r \in R}$ and $\{f_k(x_k^r)\}_{r \in R}$, $k = 1, \dots, N$, are bounded. Hence, by further passing to a subsequence, if necessary, we can assume that there is some scalar $\theta$ for which

$$\{f_0(x^r) + f_s(x_s^r)\}_{r \in R'} \to \theta. \tag{13}$$

Fix any $\lambda \in [0, \epsilon]$. Let

$$\hat{z} = z + \lambda d, \tag{14}$$

and for each $r \in R'$, let

$$\hat{x}^r = x^r + \lambda (x^{r+1} - x^r)/\|x^{r+1} - x^r\|. \tag{15}$$

Then, by (10), (12), and (14),

$$\{\hat{x}^r\}_{r \in R'} \to \hat{z}. \tag{16}$$

For each $r \in R'$, we see from (2) that $x^{r+1}$ is obtained from $x^r$ by minimizing $f$ with respect to $x_s$, while the other coordinates are held fixed. Since $\lambda/\|x^{r+1} - x^r\| \le \lambda/\epsilon \le 1$, so $\hat{x}^r$ lies on the line segment joining $x^r$ with $x^{r+1}$, this together with $f(x^{r+1}) \le f(x^r)$ and the quasiconvexity of $x_s \mapsto f(x_1^r, \dots, x_{s-1}^r, x_s, x_{s+1}^r, \dots, x_N^r)$ implies

$$f(\hat{x}^r) \le f(x^r), \quad \forall r \in R'.$$

Since $f$ is lsc, this and (16) imply $\hat{z} \in \operatorname{dom} f$. Also, this and (1) and the observation that $x^r$ and $\hat{x}^r$ differ only in their $s$th coordinate block imply

$$f_0(\hat{x}^r) + f_s(\hat{x}_s^r) \le f_0(x^r) + f_s(x_s^r), \quad \forall r \in R'.$$

This combined with (13) yields

$$\limsup_{r \to \infty, \, r \in R'} \{f_0(\hat{x}^r) + f_s(\hat{x}_s^r)\} \le \theta. \tag{17}$$

Also, since $\{f(x^{r+1}) - f(x^r)\}_{r \in R'} \to 0$, we have equivalently that

$$\{f_0(x^{r+1}) + f_s(x_s^{r+1}) - f_0(x^r) - f_s(x_s^r)\}_{r \in R'} \to 0,$$

so (13) implies

$$\{f_0(x^{r+1}) + f_s(x_s^{r+1})\}_{r \in R'} \to \theta. \tag{18}$$

Let $\delta = f_0(\hat{z}) + f_s(\hat{z}_s) - \theta$. Since $f_0$ and $f_s$ are lsc, we have
from (16), (17) that $\delta \le 0$. We claim that in fact $\delta = 0$. Suppose that this were not true, so that $\delta < 0$. By (16) and the observation that, for all $r \in R'$, $\hat{x}^r$ and $x^r$ differ in only their $s$th coordinate block, we have

$$\{(x_1^r, \dots, x_{s-1}^r, \hat{z}_s, x_{s+1}^r, \dots, x_N^r)\}_{r \in R'} \to \hat{z}. \tag{19}$$

Moreover, the vector on the left-hand side of (19) is in $\operatorname{dom} f_0$ for all $r \in R'$ sufficiently large. Since $\hat{z} \in \operatorname{dom} f_0$, this is certainly true under Assumption C1; under Assumption C2, this is also true because $x^r \in \operatorname{dom} f_0$ for all $r$ and $\operatorname{dom} f_0$ has a product structure corresponding to the coordinate blocks. Then, (18) together with (19) and the continuity of $f_0$ on $\operatorname{dom} f_0$ implies that, for all $r \in R'$ sufficiently large, there holds that

$$f_0(x_1^r, \dots, x_{s-1}^r, \hat{z}_s, x_{s+1}^r, \dots, x_N^r) + f_s(\hat{z}_s) \le f_0(x^{r+1}) + f_s(x_s^{r+1}) + \delta/2,$$

or equivalently [via (1) and the observation that $x^r$ and $x^{r+1}$ differ in only their $s$th coordinate block],

$$f(x_1^r, \dots, x_{s-1}^r, \hat{z}_s, x_{s+1}^r, \dots, x_N^r) \le f(x^{r+1}) + \delta/2,$$

a contradiction to the fact that $x^{r+1}$ is obtained from $x^r$ by minimizing $f$ with respect to the $s$th coordinate block, while the other coordinates are held fixed. Hence, $\delta = 0$ and therefore

$$f_0(\hat{z}) + f_s(\hat{z}_s) = \theta.$$

Since the choice of $\lambda$ was arbitrary, we obtain [also using (14)]

$$f_0(z + \lambda d) + f_s(z_s + \lambda d_s) = \theta, \quad \forall \lambda \in [0, \epsilon],$$

where $d_s$ denotes the $s$th coordinate block of $d$. Since $x^r$ and $x^{r+1}$ differ in only their $s$th coordinate block for all $r \in R'$, then all coordinate blocks of $d$, except $d_s$, are zero [see (12)], and the above relation, together with (1), shows that $f(z + \lambda d)$ is constant (and finite) for all $\lambda \in [0, \epsilon]$, a contradiction to Assumption B2, namely, that $f$ is hemivariate in the $s$th coordinate block.
Hence, (11) holds.

Since (11) holds for any subsequence $\{x^r\}_{r \in R}$ of $\{x^r\}$ converging to $z$, we can apply (11) to the subsequence $\{x^{r+1}\}_{r \in R}$ to conclude that $\{x^{r+2}\}_{r \in R} \to z$ and so on, yielding

$$\{x^{r+j}\}_{r \in R} \to z, \quad \forall j = 0, 1, \dots, T, \tag{20}$$

where $T$ is the bound specified in the essentially cyclic rule. We claim that (20), together with Assumption C1 or C2, implies

$$f_0(z) + f_k(z_k) \le f_0(z_1, \dots, z_{k-1}, x_k, z_{k+1}, \dots, z_N) + f_k(x_k), \tag{21}$$

for all $x_k$ and all $k \in \{1, \dots, N\}$. To see this, fix any $k \in \{1, \dots, N\}$. Since the coordinate blocks are chosen according to the essentially cyclic rule, there exists some $j \in \{1, \dots, T\}$ and an infinite subsequence $R' \subseteq R$ such that the coordinate block $x_k$ is chosen at the $(r+j)$th iteration for all $r \in R'$. Then, for each $r \in R'$, $x_k^{r+j}$ minimizes $f_0(x_1^{r+j}, \dots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \dots, x_N^{r+j}) + f_k(x_k)$ over all $x_k$ [see (1), (2), (3)], so that

$$f_0(x^{r+j}) + f_k(x_k^{r+j}) \le f_0(x_1^{r+j}, \dots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \dots, x_N^{r+j}) + f_k(x_k), \quad \forall x_k. \tag{22}$$

Fix any $x_k \in \operatorname{dom} f_k$ such that $(z_1, \dots, z_{k-1}, x_k, z_{k+1}, \dots, z_N) \in \operatorname{dom} f_0$. Suppose that Assumption C1 holds, so $\operatorname{dom} f_0$ is open. Since $z \in \operatorname{dom} f_0$, then (20) implies that

$$(x_1^{r+j}, \dots, x_{k-1}^{r+j}, x_k, x_{k+1}^{r+j}, \dots, x_N^{r+j}) \in \operatorname{dom} f_0,$$

for all $r \in R'$ sufficiently large.
Passing to the limit as $r \to \infty$, $r \in R'$, and using the lsc property of $f_k$ and the continuity of $f_0$ on the open set $\operatorname{dom} f_0$, we obtain from (20) and (22) that (21) holds. Suppose instead that Assumption C2 holds, so

$$\operatorname{dom} f_0 = Y_1 \times \cdots \times Y_N,$$

for some $Y_1 \subseteq \Re^{n_1}, \dots, Y_N \subseteq \Re^{n_N}$. Then, the first quantity on the right-hand side of (22) is finite for all $r \in R'$. Passing to the limit as $r \to \infty$, $r \in R'$, and using the lsc property of $f_k$ and the continuity of $f_0$ on $\operatorname{dom} f_0$, we obtain from (20) and (22) that (21) holds. If $x_k \notin \operatorname{dom} f_k$ or $(z_1, \dots, z_{k-1}, x_k, z_{k+1}, \dots, z_N) \notin \operatorname{dom} f_0$, then the right-hand side of (21) has the extended value $\infty$, so (21) holds trivially. Since the above choice of $k$ was arbitrary, this shows that (21) holds for all $x_k$ and all $k \in \{1, \dots, N\}$. Then, it follows from (1) that (4) holds for all $k = 1, \dots, N$. $\square$

Proposition 5.1 extends a result of Grippo and Sciandrone (Ref. 27, Proposition 5.1) for the special case where each $f_k$ is the indicator function for some closed convex set and $f_0$ is continuously differentiable and (block) coordinatewise strictly pseudoconvex. In turn, the latter result is an extension of a result of Bertsekas and Tsitsiklis (Ref. 6, Proposition 3.9 in Section 3.3.5), which assumes further $f_0$ to be convex. As a corollary of Proposition 5.1, we obtain the following convergence result for the BCD method.

Theorem 5.1. Suppose that $f, f_0, f_1, \dots, f_N$ satisfy Assumptions B1–B3 and that $f_0$ satisfies either Assumption C1 or C2. Also, assume that $\{x : f(x) \le f(x^0)\}$ is bounded. Then, the sequence $\{x^r\}$ generated by the BCD method using the essentially cyclic rule is defined, bounded, and every cluster point is a coordinatewise minimum point of $f$.

Theorem 5.1 extends a result of Auslender [see Theorem 1.2(a) in Ref. 4, p. 95] for the special case where $f_k$ is convex for all $k$, $\operatorname{dom} f_0 = Y_1 \times \cdots \times Y_N$ for some closed convex sets $Y_k \subseteq \Re^{n_k}$, $k = 1, \dots, N$, and $f_0$ is strongly convex and continuous on $\operatorname{dom} f_0$.

6. Applications

We describe four interesting applications of the BCD method below.
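Before the specific examples, the BCD iteration (2)–(3) with the cyclic rule can be sketched on a small instance of the form (1). This is only an illustrative sketch under assumptions that are not in the paper: $f_0(x) = \tfrac{1}{2}\langle x, Qx \rangle - \langle b, x \rangle$ with $Q$ symmetric positive definite, each block is a single coordinate, and each $f_k$ is the indicator function of an interval $[lo_k, hi_k]$. The names `objective`, `bcd_box_qp`, and `is_coordinatewise_min` are hypothetical.

```python
import numpy as np

def objective(Q, b, x):
    """f0(x) = 0.5 <x, Qx> - <b, x>; finite on the box, where each f_k is 0."""
    return 0.5 * x @ Q @ x - b @ x

def bcd_box_qp(Q, b, lo, hi, x0, n_cycles=200):
    """Cyclic BCD with scalar blocks for min f0(x) subject to lo <= x <= hi.

    Assumes Q is symmetric with positive diagonal, so each 1-D subproblem
    is a strictly convex quadratic whose exact minimizer over the interval
    is the clipped unconstrained minimizer.
    """
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)   # start inside dom f
    history = [objective(Q, b, x)]
    for _ in range(n_cycles):
        for s in range(len(x)):                        # cyclic rule: s = 1..N
            rest = Q[s] @ x - Q[s, s] * x[s]           # sum over j != s of Q_sj x_j
            x[s] = np.clip((b[s] - rest) / Q[s, s], lo[s], hi[s])
        history.append(objective(Q, b, x))
    return x, np.array(history)

def is_coordinatewise_min(Q, b, lo, hi, x, tol=1e-8):
    """Check condition (4) for this problem class via a projection test.

    x_k minimizes the k-th 1-D convex quadratic over [lo_k, hi_k] exactly
    when x_k equals the projection of x_k - grad_k onto that interval.
    """
    grad = Q @ x - b
    return bool(np.all(np.abs(x - np.clip(x - grad, lo, hi)) <= tol))

Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, -3.0, 4.0])
lo, hi = np.zeros(3), np.ones(3)
x_star, hist = bcd_box_qp(Q, b, lo, hi, x0=np.ones(3))
```

In this sketch the recorded objective values never increase, as the proofs above require of the BCD iterates, and the limit passes the coordinatewise-minimum check.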
In all applications, the objective function $f$ is not necessarily strictly convex nor differentiable everywhere on its effective domain.

Example 6.1. Proximal Minimization Algorithm. Let $\psi: \Re^n \to \Re \cup \{\infty\}$ be a proper (i.e., $\psi \not\equiv \infty$) lsc function. Fix any scalar $c > 0$, and consider the proper lsc function $f$ defined by

$$f(x, y) = c\|x - y\|^2 + \psi(x).$$

Clearly, this function has the form (1) with

$$f_0(x, y) = c\|x - y\|^2, \quad f_1 = \psi, \quad f_2 \equiv 0.$$

Applying the BCD method to $f$ yields a method whereby $f(x, y)$ is alternately minimized with respect to $x$ and $y$. This method has the form

$$x^{r+1} = \arg\min_{x} \; c\|x - x^r\|^2 + \psi(x), \quad r = 0, 1, \dots,$$

which is the proximal minimization algorithm with fixed parameter $c$ for minimizing $\psi$; see Ref. 6, Section 3.4.3 and Refs. 36–37 and references therein. It is easily seen that $f, f_0, f_1, f_2$ satisfy Assumptions B1–B3 and that $f_0$ satisfies Assumptions A1 and C1. Moreover, $f$ is regular everywhere on $\operatorname{dom} f$. Then, by Proposition 5.1, if $\psi$ is bounded below (so, $f$ is bounded below), then every cluster point $z$ of the iterates generated by the above proximal minimization algorithm is a stationary point of $\psi$, i.e.,

$$\psi'(z; d) \ge 0, \quad \text{for all } d.$$

Notice that Theorem 4.1 is not applicable here, since $f$ need not be continuous on its level sets.

Example 6.2. Arimoto–Blahut Algorithm. Let $P_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, m$, be given nonnegative scalars satisfying

$$\sum_{j} P_{ij} = 1, \quad \text{for all } i.$$

The $P_{ij}$ may be viewed as probabilities. Consider the proper lsc function $f$ defined by

$$f(x, y) = f_0(x, y) + f_1(x) + f_2(y),$$