On the characterizations of compactness
JOURNAL OF FORMALIZED MATHEMATICS
Volume 13, Released 2001, Published 2003
Inst. of Computer Science, Univ. of Białystok

On the Characterizations of Compactness

Grzegorz Bancerek, University of Białystok
Noboru Endou, Gifu National College of Technology
Yuji Sakai, Shinshu University, Nagano

Summary. In the paper we show the equivalence of the convergence of filters on a topological space and the convergence of nets in the space. We also give five characterizations of compactness. Namely, for any topological space T we proved that the following conditions are equivalent:
• T is compact,
• every ultrafilter on T is convergent,
• every proper filter on T has a cluster point,
• every net in T has a cluster point,
• every net in T has a convergent subnet,
• every Cauchy net in T is convergent.

MML Identifier: YELLOW19.
WWW: /JFM/Vol13/yellow19.html

The articles [18], [7], [22], [23], [19], [14], [10], [5], [25], [24], [6], [16], [9], [12], [8], [15], [17], [21], [1], [2], [3], [11], [4], [20], and [13] provide the notation and terminology for this paper.

One can prove the following proposition:

(2) For every non empty set X and for every proper filter F of 2^X_⊆ and for every set A such that A ∈ F holds A is not empty.

Let T be a non empty topological space and let x be a point of T. The neighborhood system of x is a subset of 2^{ΩT}_⊆ and is defined by:

(Def. 1) The neighborhood system of x = {A : A ranges over neighbourhoods of x}.

The following proposition is true:

(3) Let T be a non empty topological space, x be a point of T, and A be a set. Then A ∈ the neighborhood system of x if and only if A is a neighbourhood of x.

Let T be a non empty topological space and let x be a point of T. Note that the neighborhood system of x is non empty, proper, upper, and filtered.

The following propositions are true:

(4) Let T be a non empty topological space, x be a point of T, and F be an upper subset of 2^{ΩT}_⊆. Then x is a convergence point of F, T if and only if the neighborhood system of x ⊆ F.

(5) For every non empty topological space T holds every point x of T is a convergence point of the neighborhood system of x, T.

(6) Let T be a non empty topological space and A be a subset of T. Then A is open if and only if for every point x of T such that x ∈ A and for every filter F of 2^{ΩT}_⊆ such that x is a convergence point of F, T holds A ∈ F.

Let S be a non empty 1-sorted structure and let N be a non empty net structure over S. A subset of S is called a subset of S reachable by N if:

(Def. 2) There exists an element i of N such that it = rng(the mapping of N↾i).

Next we state the proposition:

(7) Let S be a non empty 1-sorted structure, N be a non empty net structure over S, and i be an element of N. Then rng(the mapping of N↾i) is a subset of S reachable by N.

Let S be a non empty 1-sorted structure and let N be a reflexive non empty net structure over S. Observe that every subset of S reachable by N is non empty.

The following three propositions are true:

(8) Let S be a non empty 1-sorted structure, N be a net in S, i be an element of N, and x be a set. Then x ∈ rng(the mapping of N↾i) if and only if there exists an element j of N such that i ≤ j and x = N(j).

(9) Let S be a non empty 1-sorted structure, N be a net in S, and A be a subset of S reachable by N. Then N is eventually in A.

(10) Let S be a non empty 1-sorted structure, N be a net in S, and F be a finite non empty set. Suppose every element of F is a subset of S reachable by N. Then there exists a subset B of S reachable by N such that B ⊆ ⋂F.

Let T be a non empty 1-sorted structure and let N be a non empty net structure over T. The filter of N is a subset of 2^{ΩT}_⊆ and is defined as follows:

(Def. 3) The filter of N = {A; A ranges over subsets of T: N is eventually in A}.

We now state the proposition:

(11) Let T be a non empty 1-sorted structure, N be a non empty net structure over T, and A be a set. Then A ∈ the filter of N if and only if N is eventually in A and A is a subset of T.

Let T be a non empty 1-sorted structure and let N be a non empty net structure over T. One can check that the filter of N is non empty and upper.

Let T be a non empty 1-sorted structure and let N be a net in T. One can verify that the filter of N is proper and filtered.

One can prove the following propositions:

(12) Let T be a non empty topological space, N be a net in T, and x be a point of T. Then x is a cluster point of N if and only if x is a cluster point of the filter of N, T.

(13) Let T be a non empty topological space, N be a net in T, and x be a point of T. Then x ∈ Lim N if and only if x is a convergence point of the filter of N, T.

Let L be a non empty 1-sorted structure, let O be a non empty subset of L, and let F be a filter of 2^O_⊆. The net of F is a strict non empty net structure over L and is defined by the conditions (Def. 4):

(Def. 4) (i) The carrier of the net of F = {⟨a, f⟩ : a ranges over elements of L, f ranges over elements of F: a ∈ f},
(ii) for all elements i, j of the net of F holds i ≤ j iff j₂ ⊆ i₂, and
(iii) for every element i of the net of F holds (the net of F)(i) = i₁.

Let L be a non empty 1-sorted structure, let O be a non empty subset of L, and let F be a filter of 2^O_⊆. Observe that the net of F is reflexive and transitive.

Let L be a non empty 1-sorted structure, let O be a non empty subset of L, and let F be a proper filter of 2^O_⊆. Note that the net of F is directed.

One can prove the following propositions:

(14) For every non empty 1-sorted structure T and for every filter F of 2^{ΩT}_⊆ holds F \ {∅} = the filter of the net of F.

(15) Let T be a non empty 1-sorted structure and F be a proper filter of 2^{ΩT}_⊆. Then F = the filter of the net of F.

(16) Let T be a non empty 1-sorted structure, F be a filter of 2^{ΩT}_⊆, and A be a non empty subset of T. Then A ∈ F if and only if the net of F is eventually in A.

(17) Let T be a non empty topological space, F be a proper filter of 2^{ΩT}_⊆, and x be a point of T. Then x is a cluster point of the net of F if and only if x is a cluster point of F, T.

(18) Let T be a non empty topological space, F be a proper filter of 2^{ΩT}_⊆, and x be a point of T. Then x ∈ Lim(the net of F) if and only if x is a convergence point of F, T.

The proposition (19) has been removed.

(20) Let T be a non empty topological space, x be a point of T, and A be a subset of T. Then x ∈ Ā if and only if there exists a net N in T such that N is eventually in A and x is a cluster point of N.

(24) Let T be a non empty topological space, A be a subset of T, and x be a point of T. Then x ∈ Ā if and only if there exists a proper filter F of 2^{ΩT}_⊆ such that A ∈ F and x is a cluster point of F, T.

(28) Let T be a non empty topological space, A be a subset of T, and x be a point of T. Then x ∈ …

(30) Let T be a non empty topological space and A be a subset of T. Then A is closed if and only if for every ultrafilter F of 2^{ΩT}_⊆ such that A ∈ F and for every point x of T such that x is a convergence point of F, T holds x ∈ A.

(31) Let T be a non empty topological space, N be a net in T, and s be a point of T. Then s is a cluster point of N if and only if for every subset A of T reachable by N holds s ∈ …

[9] Agata Darmochwał. Families of subsets, subspaces and mappings in topological spaces. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/tops_2.html.
[10] Agata Darmochwał. Finite sets. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/finset_1.html.
[11] Adam Grabowski and Robert Milewski. Boolean posets, posets under inclusion and products of relational structures. Journal of Formalized Mathematics, 8, 1996. /JFM/Vol8/yellow_1.html.
[12] Zbigniew Karno. Maximal discrete subspaces of almost discrete topological spaces. Journal of Formalized Mathematics, 5, 1993. /JFM/Vol5/tex_2.html.
[13] Artur Korniłowicz. On the topological properties of meet-continuous lattices. Journal of Formalized Mathematics, 8, 1996. /JFM/Vol8/waybel_9.html.
[14] Beata Padlewska. Families of sets. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/setfam_1.html.
[15] Beata Padlewska. Locally connected spaces. Journal of Formalized Mathematics, 2, 1990. /JFM/Vol2/connsp_2.html.
[16] Beata Padlewska and Agata Darmochwał. Topological spaces and continuous functions. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/pre_topc.html.
[17] Alexander Yu. Shibakov and Andrzej Trybulec. The Cantor set. Journal of Formalized Mathematics, 7, 1995. /JFM/Vol7/cantor_1.html.
[18] Andrzej Trybulec. Tarski Grothendieck set theory. Journal of Formalized Mathematics, Axiomatics, 1989. /JFM/Axiomatics/tarski.html.
[19] Andrzej Trybulec. Tuples, projections and Cartesian products. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/mcart_1.html.
[20] Andrzej Trybulec. Moore-Smith convergence. Journal of Formalized Mathematics, 8, 1996. /JFM/Vol8/yellow_6.html.
[21] Wojciech A. Trybulec. Partially ordered sets. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/orders_1.html.
[22] Zinaida Trybulec. Properties of subsets. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/subset_1.html.
[23] Edmund Woronowicz. Relations and their basic properties. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/relat_1.html.
[24] Edmund Woronowicz. Relations defined on sets. Journal of Formalized Mathematics, 1, 1989. /JFM/Vol1/relset_1.html.
[25] Edmund Woronowicz and Anna Zalewska. Properties of binary relations. Journal of Formalized Mathematics, 1, 1989. http://mizar.org/JFM/Vol1/relat_2.html.

Received July 29, 2001
Published January 2, 2004
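Since the article's equivalence proofs all pass through the translation between nets and filters (Definitions 3 and 4 and Propositions (12)–(18) above), it may help to restate that correspondence in conventional notation. The following LaTeX display is only a summary sketch of what the article proves, with the shorthand names F(N), D_F, and N_F introduced here for readability:

```latex
% Filter of a net N in T (Def. 3):
\mathcal{F}(N) \;=\; \{\, A \subseteq T : N \text{ is eventually in } A \,\}.

% Net of a filter F (Def. 4): the underlying directed set is
D_F \;=\; \{\, \langle a, f \rangle : f \in F,\ a \in f \,\},
\qquad
\langle a, f \rangle \le \langle b, g \rangle \iff g \subseteq f,

% with evaluation N_F(\langle a, f \rangle) = a.  For a proper filter F the
% two constructions are mutually inverse (Prop. (15)), and they preserve
% cluster points and convergence points (Props. (12), (13), (17), (18)):
\mathcal{F}(N_F) = F,
\qquad
x \in \operatorname{Lim} N \iff x \text{ is a convergence point of } \mathcal{F}(N),\, T.
```

This is why each of the five filter-flavoured and net-flavoured characterizations of compactness in the summary can be deduced from its counterpart.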
Holomorphic functions of slow growth on nested covering spaces of compact manifolds
arXiv:math/9808042v2 [math.CV] 3 Mar 1999

HOLOMORPHIC FUNCTIONS OF SLOW GROWTH ON NESTED COVERING SPACES OF COMPACT MANIFOLDS

Finnur Lárusson
University of Western Ontario
3 March 1999

Abstract. Let Y be an infinite covering space of a projective manifold M in P^N of dimension n ≥ 2. Let C be the intersection with M of at most n−1 generic hypersurfaces of degree d in P^N. The preimage X of C in Y is a connected submanifold. Let φ be the smoothed distance from a fixed point in Y in a metric pulled up from M. Let O_φ(X) be the Hilbert space of holomorphic functions f on X such that f²e^{−φ} is integrable on X, and define O_φ(Y) similarly. Our main result is that (under more general hypotheses than described here) the restriction O_φ(Y) → O_φ(X) is an isomorphism for d large enough. This yields new examples of Riemann surfaces and domains of holomorphy in C^n with corona. We consider the important special case when Y is the unit ball B in C^n, and show that for d large enough, every bounded holomorphic function on X extends to a unique function in the intersection of all the nontrivial weighted Bergman spaces on B. Finally, assuming that the covering group is arithmetic, we establish three dichotomies concerning the extension of bounded holomorphic and harmonic functions from X to B.

Introduction

Let Y → M be an infinite covering space of an n-dimensional projective manifold, n ≥ 2. The function theory of such spaces is still not well understood. The central problem in this area is the conjecture of Shafarevich that the universal covering space of any projective manifold is holomorphically convex. This is a higher-dimensional variation on the venerable theme of uniformization. There are no known counterexamples to the conjecture, and it has been verified only in a number of fairly special cases.

Suppose M is embedded into a projective space by sections of a very ample line bundle L. The generic linear subspace of codimension n−1 intersects M in a 1-dimensional connected submanifold C called an L-curve. The preimage X of C is a connected Riemann surface embedded in Y. A natural approach to constructing holomorphic functions on Y is to extend them from X. This has the advantage of reducing certain questions to the 1-dimensional case, but the price one pays is having to work with functions of slow growth. Here, slow growth means slow exponential growth with respect to the distance from a fixed base point or a similar well-behaved exhaustion, in an L² or L^∞ sense. Functions in the Hardy class H^p(X) grow slowly in this sense for p large enough.

In Section 1, we improve upon the main result of our earlier paper [Lár1] and show that if L is sufficiently ample, then the restriction map ·|_X is an isomorphism of the Hilbert spaces of holomorphic functions of slow growth. As before, the proof is based on the L² method of solving the ∂̄-equation. This may be viewed as a sampling and interpolation theorem, related to those of Seip; Berndtsson and Ortega Cerdà; and others. See [BO], [Sei], and the references therein.

In Section 2, we use the isomorphism theorem to construct new examples of Riemann surfaces with corona. These easily defined surfaces have many symmetries, we have a simple description of characters in the corona, and the corona is large in the sense that it contains a domain in euclidean space of arbitrarily high dimension.

In Section 3, we adapt results of Hörmander on generating algebras of holomorphic functions of exponential growth to the case of covering spaces. Consequently, as shown in Section 4, the restriction map ·|_X may fail to be an isomorphism if L is not sufficiently ample compared to the exhaustion.

Under the mild assumption that the covering group is Gromov hyperbolic, we found in [Lár2] that the only obstruction to every positive harmonic function on X being the real part of a holomorphic function (in which case X has many holomorphic functions of slow growth that extend to Y) is a geometric condition involving the Martin boundary, characteristic of the higher-dimensional case. There are examples of infinitely connected X for which the obstruction is not present, but these have 1-dimensional boundary, whereas in general the curves X of interest to us do not: they have the same boundary as the ambient space Y. No examples with higher dimensional boundary are known. In hopes of shedding some light on the dichotomy in [Lár2], we restrict ourselves from Section 4 onwards to what seems to be the most auspicious setting possible and let Y be the unit ball B in C^n, n ≥ 2. We present the results of the previous sections in a more explicit form. We obtain a sampling and interpolation theorem for the weighted Bergman spaces on B. For each weight, the restriction to X induces an isomorphism from the weighted Bergman space on B to the one on X if L is sufficiently ample. This is in contrast to Seip's result that no sequence in the disc is both sampling and interpolating for any weighted Bergman space [Sei]. Also, every bounded holomorphic function on X extends to a unique function of just barely exponential growth on B, i.e., a function in the intersection of all the nontrivial weighted Bergman spaces on B, when L is sufficiently ample, for instance when L is the m-th tensor power of the canonical bundle K with m ≥ 2. Whether the extension is itself bounded is an important open question.

In Section 5, assuming that the covering group is an arithmetic subgroup of the automorphism group PU(1, n) of B, we establish two dichotomies related to that in [Lár2] but using very different means. One of them says that either every holomorphic function f continuous up to the boundary on the preimage of a K^{⊗m}-curve in a finite covering of M extends to a continuous function on …

… so

∫_X |f|² e^{−φ} ≤ c ∫_Y |f|² e^{−φ}.

This shows that we have a continuous linear restriction map ρ: O_φ(Y) → O_φ(X), f ↦ f|_X. In a previous paper we showed that under suitable curvature assumptions, ρ is surjective when n ≥ 2.

1.1. Theorem [Lár1, Thm. 3.1]. If n ≥ 2 and Θ ≥ i∂∂̄φ + εω for some ε > 0, then ρ is surjective.

By [Lár1, Cor. 2.4], since the weighted metric e^φ h in L has curvature −i∂∂̄φ + Θ ≥ εω, the k-th L² cohomology group H^k_{(2)}(Y, L^∨) of Y with coefficients in the dual bundle L^∨ with the dual metric e^{−φ}h^∨ vanishes for k < n. The proof of Theorem 1.1 is based on vanishing for k = 1, for which we need n ≥ 2. We will now use vanishing for k = 0 to show that ρ is injective.

Let f ∈ O_φ(Y) such that f|_X = 0. Then α = fs^∨ is a holomorphic section of L^∨ on Y. We will show that α is square-integrable with respect to e^{−φ}h^∨. Then vanishing of H^0_{(2)}(Y, L^∨) implies that α = 0, so f = 0.

Since ∇s ≠ 0 on C, there is a constant c > 0 such that dist(·, C) ≤ c|s| on M. For y ∈ Y \ X let x ∈ X have dist(y, x) = dist(y, X). Then

|α(y)| = |f(y)| |s(y)|^{−1} ≤ c |f(x) − f(y)| / dist(x, y) ≤ c sup |df| ≤ c (∫_B |f|²)^{1/2},

where the supremum is taken over a ball centred at y covering all of M, and B is a ball of larger radius. Hence,

|α(y)|² e^{−φ(y)} ≤ c ∫_B |f|² e^{−φ}, so ∫_Y |α|² e^{−φ} ≤ c ∫_Y |f|² e^{−φ} < ∞.

We have proved the following theorem.

1.2. Theorem. Suppose Θ ≥ i∂∂̄φ + εω for some ε > 0. Then ρ is injective. If dim X ≥ 1, then ρ is an isomorphism.

By induction, the theorem generalizes to the case when C is the common zero locus of sections s₁, ..., s_k, k ≤ n, of L over M which, in a trivialization, can be completed to a set of local coordinates at each point of C. When k = n − 1, such C will be referred to as L-curves. If L is very ample, and therefore the pullback of the hyperplane bundle by an embedding of M into some projective space, then this condition means that the linear subspace {s₁ = ··· = s_k = 0} intersects M transversely in a smooth subvariety C of codimension k. By Bertini's theorem, this holds for the generic linear subspace of codimension k. If k ≤ n − 1, then C is connected and the map π₁(C) → π₁(M) is surjective by the Lefschetz hyperplane theorem, which implies that X is connected.

An important example of a function φ as above is obtained by smoothing the distance δ from a fixed point in Y. By a result of Napier [Nap], there is a smooth function τ on Y such that

(1) c₁δ ≤ τ ≤ c₂δ + c₃ for some c₁, c₂, c₃ > 0,
(2) dτ is bounded, and
(3) i∂∂̄τ is bounded.

Furthermore, by (1) and since the curvature of Y is bounded below, there is c > 0 such that e^{−cτ} is integrable on Y. Then e^{−cτ} is also integrable on X. If L is positive, then kΘ ≥ i∂∂̄τ for k ∈ N sufficiently large by (3), so the curvature inequality in Theorem 1.2 holds if L is replaced by a sufficiently high tensor power of itself.

1.3. Example. There is an open hyperbolic Riemann surface X such that for some ε > 0, any f ∈ O(X) with |f| ≤ ce^{εδ} is constant, where δ is the distance from a fixed point in the Poincaré metric. Namely, there is an example due to Cousin of a projective 2-dimensional torus (abelian surface) M with a Z-covering space Y → M such that Y has no nonconstant holomorphic functions [NR, 3.9]. Let τ be a smooth function on Y satisfying (1), (2), and (3), such that e^{−τ/2} is integrable. Let L be a very ample line bundle on M such that Θ ≥ i∂∂̄τ + εω. Let X be the pullback in Y of an L-curve in M. Then O_τ(X) = C by Theorem 1.2. If f ∈ O(X) and |f| ≤ ce^{εδ} with ε > 0 sufficiently small, then |f| ≤ ce^{τ/4}, so f ∈ O_τ(X) and f is constant.

2. New examples of Riemann surfaces with corona

Let X be a complex manifold. Let H^∞(X) be the space of bounded holomorphic functions on X, which is a Banach algebra in the supremum norm. Let M be the character space of H^∞(X), which is a compact Hausdorff space in the weak-star topology. There is a continuous map ι: X → M taking x ∈ X to the evaluation character f ↦ f(x). The Corona Problem asks whether ι(X) is dense in M. The complement of the closure of ι(X) in M is referred to as the corona, so if ι(X) is not dense in M, then X is said to have corona. It is well known that the following are equivalent.

(1) ι(X) is dense in M.
(2) If f₁, ..., f_m ∈ H^∞(X) and |f₁| + ··· + |f_m| > ε > 0, then there are g₁, ..., g_m ∈ H^∞(X) such that f₁g₁ + ··· + f_mg_m = 1.

By Carleson's famous Corona Theorem (1962), the disc has no corona. The Corona Theorem holds for Riemann surfaces of finite type and planar domains of various kinds. The Corona Problem for arbitrary planar domains is open. Around 1970, Cole constructed the first example of a Riemann surface with corona [Gam]. By modifying Cole's example, Nakai obtained a regular Parreau-Widom surface with corona [Nak], [Has, p. 229]. Recently, Barrett and Diller showed that the homology covering spaces of domains in the Riemann sphere, whose complement has positive logarithmic capacity and zero length, have corona [BD]. See also [EP, 7.3]. Sibony [Sib] found the first example of a domain of holomorphy in C^n, n ≥ 2, with corona. There are no known examples of such domains without corona.

We will now present a new class of Riemann surfaces with corona (see also Theorem 4.2). We remind the reader that if Y is a bounded domain in C^n covering a compact complex manifold M, then Y is a domain of holomorphy [Sie, p. 136] and M is projective. In fact, the canonical bundle of M is ample [Kol, 5.22].

2.1. Theorem. Let π: Y → M be a covering map, where Y is a bounded domain in C^n, n ≥ 2, and M is compact. Let L be an ample line bundle on M. If C is an L^{⊗m}-curve in M with m sufficiently large, then the Riemann surface X = π^{−1}(C) has corona. In fact, the natural map from X into the character space M of H^∞(X) extends to an embedding of Y into M which maps Y \ X into the corona of X.

Proof. Let τ ≥ 0 be a smoothed distance function on Y as described in Section 1, such that e^{−τ} is integrable on Y, and hence on X, so H^∞(X) ⊂ O_τ(X). We claim that O_τ(X) · O_τ(X) ⊂ O_{3τ}(X). Namely, suppose f ∈ O_τ(X). For p ∈ X, let B be the ball of radius 1 centred at p in a metric pulled up from C. Then

|f(p)| ≤ c ∫_B |f| ≤ c (∫_B |f|²)^{1/2},

where the constants are independent of p, so

|f(p)| e^{−τ(p)/2} ≤ c (∫_B |f|² e^{−τ})^{1/2} < ∞.

Hence, |f| ≤ ce^{τ/2}, so

∫_X |f|⁴ e^{−3τ} ≤ c ∫_X e^{2τ} e^{−3τ} < ∞,

and f² ∈ O_{3τ}(X).

If m is sufficiently large, then the curvature of L^{⊗m} is at least 3i∂∂̄τ + ω for some Kähler form ω on M, so the restriction map ρ: O_{3τ}(Y) → O_{3τ}(X) is an isomorphism by Theorem 1.2. Clearly, ρO_τ(Y) = O_τ(X). For p ∈ Y, let λ_p be the linear functional f ↦ ρ^{−1}(f)(p) on H^∞(X). If f, g ∈ H^∞(X), then ρ^{−1}(f)ρ^{−1}(g) restricts on X to fg ∈ O_τ(X) · O_τ(X) ⊂ O_{3τ}(X), so ρ^{−1}(f)ρ^{−1}(g) = ρ^{−1}(fg). This shows that λ_p is a character on H^∞(X). We have obtained a map ι: Y → M, p ↦ λ_p, extending the natural map from X into M. We claim that ι is a homeomorphism onto its image with the induced topology, and that the closure of ι(X) meets ι(Y) exactly in ι(X). This shows that the corona of X contains an embedded image of Y \ X.

Let us remark that a Riemann surface as in the theorem is not Parreau-Widom, since a Parreau-Widom surface X embeds into the character space of H^∞(X) as an open subset [Has, p. 222]. See also the proof of Theorem 5.2.

By the same argument we easily obtain the following more general result.

2.2. Theorem. Let Y be a covering space of a projective manifold M with dim M ≥ 2, L be a line bundle on M, and C be an L-curve in M with preimage X in Y. If (1) L is sufficiently ample, and (2) there is a bounded holomorphic map g: Y → C^m with …

… U, then we can show by the same argument as in the proof of Theorem 2.1 that U has corona. Since X is Stein, U may be chosen to be a domain of holomorphy by Siu's theorem [Siu]. Thus we obtain new examples of bounded domains of holomorphy in C^n with corona.

3. Generating Hörmander algebras on covering spaces

In this section, we adapt results of Hörmander [Hör] on generating algebras of holomorphic functions of exponential growth to the case of covering spaces over compact manifolds. We let X → M be a covering space of an n-dimensional compact hermitian manifold M. Let φ: X → [0, ∞) be a smooth function such that

(1) dφ is bounded, and
(2) e^{−cφ} is integrable on X for some c > 0.

Let

A_φ = A_φ(X) = ⋃_{c>0} O_{cφ}(X)

be the vector space of holomorphic functions f on X such that f²e^{−cφ} is integrable on X for some c > 0. By (2), A_φ contains all bounded holomorphic functions on X. The following is easy to see by an argument similar to that in the proof of Theorem 2.1.

3.1. Proposition. A holomorphic function f on X is in A_φ if and only if |f| ≤ ce^{aφ} for some a > 0.

Hence, A_φ is a C-algebra, called a Hörmander algebra. If functions f₁, ..., f_m in A_φ generate A_φ, then there are g₁, ..., g_m ∈ A_φ such that f₁g₁ + ··· + f_mg_m = 1, so

max_{i=1,...,m} |f_i| ≥ ce^{−aφ}

for some a > 0. We will establish an effective converse to this observation. Our proof is a straightforward adaptation of Hörmander's Koszul complex argument in [Hör]. See also [EP, 7.3].

Let m ≥ 1 and r, s ≥ 0 be integers, and t ∈ [0, ∞). Choose a basis {e₁, ..., e_m} for C^m. Let L^s_r(t) be the space of smooth Λ^sC^m-valued (0, r)-forms on X which are square-integrable with respect to e^{−tφ}.

3.2. Lemma. Suppose M is Kähler with Kähler form ω, and Ric(X) + ti∂∂̄φ ≥ εω for some ε > 0. If η ∈ L^s_{r+1}(t) and ∂̄η = 0, then there is ξ ∈ L^s_r(t) with ∂̄ξ = η.

Proof. This follows directly from standard L² theory. See for instance [Dem, Sec. 14].

Now let f₁, ..., f_m be holomorphic functions on X such that

ce^{−c₂φ} ≤ max_i |f_i|² ≤ ce^{c₁φ}, c₁, c₂ > 0.

Define a linear operator α: L^{s+1}_r(t) → L^s_r(t + c₁) by the formula

α(e_{i₁} ∧ ··· ∧ e_{i_{s+1}}) = Σ_{k=1}^{s+1} (−1)^{k+1} f_{i_k} e_{i₁} ∧ ··· ∧ ê_{i_k} ∧ ··· ∧ e_{i_{s+1}},

and set α = 0 on L⁰_r(t). Then α² = 0, and α commutes with ∂̄. Also define a linear operator β: L^s_r(t) → L^{s+1}_r(t + c₁ + 2c₂) by the formula

β(ξ) = …

3.3. Lemma. Suppose e^{−aφ} is integrable on X. If ξ ∈ L^s_r(t) and α(ξ) = 0, then there is η ∈ L^{s+1}_r(t + c₁ + 2c₂) such that α(η) = ξ and in addition ∂̄η ∈ L^{s+1}_{r+1}(t + 2c₁ + 3c₂ + a) if ∂̄ξ = 0.

Proof. Take η = β(ξ). Say ∂̄ξ = 0. Then |∂̄η| ≤ c|ξ| max_i |∂̄f̄_i| …

… Then functions f₁, ..., f_m in A_φ generate A_φ if and only if

max_{i=1,...,m} |f_i| ≥ ce^{−aφ}

for some a > 0.

The hypotheses of the corollary are satisfied for example when X is the unit ball in C^n and φ = −log(1 − |·|²), which is comparable to the Bergman distance from the origin (see Section 4).

Proof of Theorem 3.4. If s ≥ m or r > n, then ξ = 0 and we take η = 0. Assume that s < m and r ≤ n, and that the theorem has been proved with r, s replaced by r+1, s+1. By Lemma 3.3, there is η₁ ∈ L^{s+1}_r(t + c₁ + 2c₂) such that α(η₁) = ξ and ∂̄η₁ ∈ L^{s+1}_{r+1}(t + 2c₁ + 3c₂ + a). Now ∂̄∂̄η₁ = 0 and α(∂̄η₁) = ∂̄α(η₁) = ∂̄ξ = 0, so by the induction hypothesis, there is η₂ ∈ L^{s+2}_{r+1}(u − c₁) such that ∂̄η₂ = 0 and α(η₂) = ∂̄η₁. By Lemma 3.2, there is η₃ ∈ L^{s+2}_r(u − c₁) such that ∂̄η₃ = η₂. Now let η = η₁ − α(η₃) ∈ L^{s+1}_r(u). Then ∂̄η = ∂̄η₁ − α(∂̄η₃) = ∂̄η₁ − α(η₂) = 0 and α(η) = α(η₁) = ξ.

4. The case of the ball

In this section, we will consider the results of the previous sections in the explicit setting of the unit ball B in C^n, n ≥ 2. For an instructive discussion of compact ball quotients, see [Kol, Ch. 8]. Let M be a projective manifold covered by B with a positive line bundle L with curvature Θ. Let X be the preimage in B of an L-curve C in M. We are particularly interested in the extension problem for bounded holomorphic functions on X. The restriction map H^∞(B) → H^∞(X) is injective, so we can consider H^∞(B) as a subspace of H^∞(X), which is closed in the sup-norm. In the locally uniform topology, however, H^∞(B) is dense in H^∞(X). Namely, say f ∈ H^∞(X), and let F ∈ O(B) be an extension of f. For r < 1, the functions z ↦ F(rz) are bounded in B, and they converge locally uniformly to f on X as r → 1.

It is well known that if Y is a complex submanifold of a neighbourhood of …

… where g_{jk̄} = −((n+1)/2) ∂²/∂z_j∂z̄_k log(1 − |z|²), and (i/2) ∂∂̄ log det(g_{jk̄}) = −ω. The distance from the origin in the Bergman metric is

δ(z) = (√(n+1)/2) log((1 + |z|)/(1 − |z|)).

Let τ(z) = −((n+1)/2) log(1 − |z|²); then τ is smooth and comparable to δ in the sense of Section 1, with c₁δ ≤ τ ≤ c₂δ + c₃.

The weighted Bergman space B^a_2 is the space of holomorphic functions g on B such that

∫_B |g|² (1 − |z|)^a ω^n < ∞.

See [Sto, Ch. 10]. We have B^n_2 = 0. The intersection B^{n+0}_2 = ⋂_{ε>0} B^{n+ε}_2 contains the Hardy space H²(B) of holomorphic functions f on B such that |f|² has a harmonic majorant. The boundary behaviour of the functions in B^{n+0}_2 may be rather wild. A theorem of Bagemihl, Erdős, and Seidel [BES], [Mac], states that if µ: [0, 1) → [0, ∞) goes to infinity at 1, then there exists a holomorphic function f on the unit disc with |f| ≤ µ(|·|), such that for some sequence r_n ↗ 1, we have min_{|z|=r_n} |f(z)| → ∞. In particular, f does not have a finite limit along any curve that intersects every neighbourhood of the boundary. Taking µ(r) = −log(1 − r), we get f in B^{1+0}_2.

We have B^a_2 = O_{2aτ/(n+1)}(B). Let E(B) = ⋂_{c>0} O_{cτ}(B), and define E(X) similarly. These are Fréchet-Hilbert spaces. From now on we let n ≥ 2. Theorem 1.2 yields the following result.

4.1. Theorem. If Θ > cω, then the restriction map ρ: O_{cτ}(B) → O_{cτ}(X) is an isomorphism.

Suppose Θ > (2n/(n+1))ω; then X has corona.

Proof. Let p ∈ B \ X and f_i(z) = z_i − p_i, i = 1, ..., n. Then Σ|f_i| > ε > 0 on X. Suppose X has no corona. Then there are g₁, ..., g_n ∈ H^∞(X) with Σ f_i g_i = 1. By Theorem 4.1, g_i extends to a function G_i ∈ E(B). Then h = Σ f_i G_i ∈ E(B), and h|_X = 1. Again by Theorem 4.1, h = 1, which is absurd since h(p) = 0.

The proof of Theorem 2.1 shows that if Θ > 6n… and similarly for X. In contrast to the first part of Theorem 4.1, we obtain the following result from Corollary 3.5.

4.3. Theorem. If s ≥ 1 and Θ ≤ sω, then the restriction map ρ: O_{cτ}(B) → O_{cτ}(X) is not an isomorphism for

c > s + (2n² − n + 1)/(n + 1).

Furthermore, we can easily show that ρ is not injective if c > 2m + 2n/(n + 1).

5. A dichotomy

As before, we consider a projective manifold M covered by the unit ball B in C^n, n ≥ 2, with a positive line bundle L, and the preimage X in B of an L-curve C in M. We will denote the covering group by Γ. The bounded extension problem for holomorphic functions is related to the question of which bounded harmonic functions on X are real parts of holomorphic functions. This question was studied in [Lár2], where the following dichotomy was established in the more general setting of a nonelementary Gromov-hyperbolic covering space of a compact Kähler manifold.

5.1. Theorem [Lár2, Thm. 4.2]. One of the following holds.
(1) Every positive harmonic function on X is the real part of a holomorphic function.
(2) If u ≥ 0 is the real part of an H¹ function on X, then the boundary decay of u at a zero on the Martin boundary of X is no faster than its radial decay.

By results of Ancona [Anc], the Martin compactification of X is naturally homeomorphic to X ∪ S, where S = ∂B is the unit sphere.

Clearly, if (1) holds, then there are holomorphic functions on X with a bounded real part that do not extend to a holomorphic function on B with a bounded real part. If (1) holds, then each Martin function k_p, p ∈ S, is the real part of a holomorphic function f_p on X. Then the holomorphic map exp(−f_p): X → D is proper at every boundary point except p. Here, D denotes the unit disc. Also, if p, q ∈ S, p ≠ q, then the holomorphic map (exp(−f_p), exp(−f_q)): X → D × D is proper. However, we have the following result.

5.2. Theorem. There is no proper holomorphic map X → D.

Proof. Bounded holomorphic functions separate points on X, so if there is a proper holomorphic map X → D, then X is Parreau-Widom by a theorem of Hasumi [Has, p. 209]. By [Lár2, Thm. 5.1], if X is Parreau-Widom, then X is either isomorphic to D or homeomorphic to the 2-sphere with a Cantor set removed. Both possibilities are excluded by the Martin boundary of X being S.

When L is sufficiently ample, we can prove a stronger result.

5.3. Theorem. If L is sufficiently ample and f is a holomorphic function on X, then f^{−1}(U) is not relatively compact in X for any nonempty open subset U of the image f(X). In other words, every value of f is taken at infinity.

Proof. Suppose there is a holomorphic function f on X such that f^{−1}(U) is relatively compact in X for some nonempty open subset U of f(X). We may assume that 0 ∈ U. Then 1/f is a meromorphic function on X which has a pole p and is bounded outside the compact closure of f^{−1}(U). Since bounded holomorphic functions separate points on X, a theorem of Hayashi [Hay] now implies that the natural map from X into the character space of H^∞(X) is open when restricted to some neighbourhood of p. By Theorem 2.1, this is absurd when L is sufficiently ample.

We will now present another dichotomy in a similar vein. Let C_K(S) denote the space of continuous functions S → K, K = R or K = C, with the supremum norm. Let P be the subspace of C_R(S) of boundary values of pluriharmonic functions on B which are continuous on the closed ball. It is known that if V is a proper closed subspace of C_R(S) and V is invariant under the action of the automorphism group G = PU(1, n) of B, then V = R or V = P. Also, if V is a proper closed G-invariant subspace of C_C(S), then V is one of the following: C, O, …

5.4. Theorem. Suppose that the covering group Γ is arithmetic, and that L is a tensor power of the canonical bundle. Then one (and only one) of the following holds.
(1) E(C) = P for every L-curve in a finite covering of M.
(2) The subspace of C_R(S) generated by E(C) for all L-curves C in finite coverings of M is dense in C_R(S).

Note that (2) holds if (1) in Theorem 5.1 holds for the preimage X of some L-curve in a finite covering of M.

Proof. Let C be an L-curve in a finite covering M₁ of M, with preimage X in B. Then M₁ = B/Γ₁, where Γ₁ is a subgroup of finite index in Γ. Let g be an element of the commensurability subgroup Comm(Γ) in G. This means that Γ and gΓg^{−1} are commensurable, i.e., their intersection is of finite index in both of them. Then Γ₂ = Γ₁ ∩ gΓ₁g^{−1} is a subgroup of finite index in Γ₁. If α ∈ E(C), so H_X[α] = Re f with f holomorphic on X, then f∘g is holomorphic on g^{−1}X and H_{g^{−1}X}[α∘g] = Re f∘g. If γ ∈ Γ₂, then γ = g^{−1}γ₁g for some γ₁ ∈ Γ₁, so γg^{−1}X = g^{−1}γ₁gg^{−1}X = g^{−1}γ₁X = g^{−1}X. Hence, g^{−1}X is Γ₂-invariant, so g^{−1}X is the preimage of an L-curve C′ in the finite covering B/Γ₂ of M (here is where we use the assumption that L is a tensor power of the canonical bundle), and α∘g ∈ E(C′).

This shows that the subspace E of C_R(S) described in (2) is invariant under Comm(Γ). Since Γ is arithmetic, Comm(Γ) is Hausdorff-dense in G [Zim, 6.2.4] (and in fact conversely), so the closure of E is either P or C_R(S), and the theorem follows.

If the spaces E(C) are rigid in the sense that they do not change when C is varied in its linear equivalency class, then the theorem yields a strong dichotomy.

5.5. Corollary. Suppose that Γ is arithmetic, and that L is a tensor power of the canonical bundle. Suppose also that if C₁ and C₂ are L-curves in the same finite covering of M, then E(C₁) = E(C₂). Then E(C) is either P or C_R(S) for every L-curve C in a finite covering of M.

We obtain analogous results for holomorphic functions. If C is an L-curve in a finite covering of M with preimage X in B, let us denote by F(C) the closed subspace of functions α ∈ C_C(S) that extend to a holomorphic function on X. Clearly, O ⊂ F(C), but F(C) is considerably smaller than C_C(S).

5.6. Lemma. F(C) ∩ (P + iC_R(S)) = O.

Proof. Let α ∈ F(C) ∩ (P + iC_R(S)), so H[α] = f ∈ O(X) and there is u ∈ C_R(…) …

… Now F maps B into a vertical strip. Let σ be an isomorphism from a neighbourhood of the closure of this strip in the Riemann sphere onto D. Then σ∘F is a bounded holomorphic function on B and σ∘F|_X = σ∘f, so σ∘F has the same nontangential boundary function σ∘α as σ∘f. Since σ∘α is continuous, σ∘F extends continuously to …

… u ∈ C_R(B̄) such that
(1) u is plurisubharmonic on B,
(2) (∂∂̄u)^n = 0, i.e., u is maximal, and
(3) u|_S = α.

Let us write u = M[α] = M_B[α]. This is the solution of the Dirichlet problem for the Monge-Ampère operator, due to Bedford and Taylor [BT]. See also earlier work of Bremermann [Bre] and Walsh [Wal]. In fact, u is given by the Perron-Bremermann formula u = sup F_α, where F_α is the set of all plurisubharmonic functions v on B with

lim sup_{z→x} v(z) ≤ α(x), x ∈ S.

The operator M: C_R(S) → C_R(B̄) is continuous. Namely, if ε > 0, then α − ε ≤ α_i ≤ α + ε for i sufficiently large, and then F_{α−ε} ⊂ F_{α_i} ⊂ F_{α+ε}, so M[α] − ε ≤ M[α_i] ≤ M[α] + ε.

6.1. Theorem. Let α ∈ C_R(S). The following are equivalent.
(1) The harmonic extension H[α] of α to X extends to a bounded-above plurisubharmonic function on B.
(2) H[α] extends to a function in C_R(…) …

… so if M[α_i]|_X are harmonic, then so is M[α]|_X.

We obtain a dichotomy analogous to those in Section 5. If C is an L-curve in a finite covering of M with preimage X in B, let us denote by D(C) the space of functions α ∈ C_R(S) such that M[α]|_X is harmonic. Clearly, P ⊂ D(C) but, as noted above, D(C) is considerably smaller than C_R(S).

6.3. Theorem. Suppose that Γ is arithmetic, and that L is a tensor power of the canonical bundle. Then one of the following holds.
(1) D(C) = P for every L-curve in a finite covering of M.
(2) The subspace of C_R(S) generated by D(C) for all L-curves C in finite coverings of M is dense in C_R(S).

Suppose furthermore that if C₁ and C₂ are L-curves in the same finite covering of M, then D(C₁) = D(C₂). Then D(C) = P for every L-curve C in a finite covering of M.

There are no examples for which it is known which alternative holds in any of the four dichotomies 5.1, 5.4, 5.7, and 6.3, nor is it known if these dichotomies are actually different.

References

[Anc] A. Ancona, Théorie du potentiel sur les graphes et les variétés, École d'été de probabilités de Saint-Flour XVIII—1988, Lecture Notes in Mathematics, vol. 1427, Springer-Verlag, Berlin, 1990, pp. 1–112.
[BES] F. Bagemihl, P. Erdős, W. Seidel, Sur quelques propriétés frontières des fonctions
holomorphes définies par certains produits dans le cercle-unité, Ann. Sci. École Norm. Sup. (3) 70 (1953), 135–147.
[BD] D. E. Barrett, J. Diller, A new construction of Riemann surfaces with corona, (preprint 1996), J. Geometric Analysis (to appear).
[BT] E. Bedford, B. A. Taylor, The Dirichlet problem for a complex Monge-Ampère equation, Invent. Math. 37 (1976), 1–44.
[BO] B. Berndtsson, J. Ortega Cerdà, On interpolation and sampling in Hilbert spaces of analytic functions, J. reine angew. Math. 464 (1995), 109–128.
[Bre] H. J. Bremermann, On a generalized Dirichlet problem for plurisubharmonic functions and pseudo-convex domains. Characterization of Šilov boundaries, Trans. Amer. Math. Soc. 91 (1959), 246–276.
[Dem] J.-P. Demailly, Théorie de Hodge L2 et théorèmes d'annulation, Introduction à la théorie de Hodge, Panoramas et synthèses 3, Soc. Math. France, 1996.
[EP] J. Eschmeier, M. Putinar, Spectral decompositions and analytic sheaves, London Math. Society Monographs, new series, vol. 10, Oxford University Press, 1996.
[Gam] T. W. Gamelin, Uniform algebras and Jensen measures, London Math. Society Lecture Note Series, vol. 32, Cambridge University Press, 1978.
[Has] M. Hasumi, Hardy classes on infinitely connected Riemann surfaces, Lecture Notes in Mathematics, vol. 1027, Springer-Verlag, 1983.
[Hay] M. Hayashi, The maximal ideal space of the bounded analytic functions on a Riemann surface, J. Math. Soc. Japan 39 (1987), 337–344.
[HL] G. Henkin, J. Leiterer, Theory of functions on complex manifolds, Monographs in Mathematics, vol. 79, Birkhäuser, 1984.
[Hör] L. Hörmander, Generators for some rings of analytic functions, Bull. Amer. Math. Soc. 73 (1967), 943–949.
[Kol] J. Kollár, Shafarevich maps and automorphic forms, Princeton University Press, 1995.
[Kra] S. G. Krantz, Function theory of several complex variables, 2nd ed., Wadsworth & Brooks/Cole, 1992.
[Lár1] F. Lárusson, An extension theorem for holomorphic functions of slow growth on covering spaces of projective manifolds, J. Geometric Analysis 5 (1995), 281–291.
[Lár2]
A Discriminatively Trained, Multiscale, Deformable Part Model
Pedro Felzenszwalb, University of Chicago, pff@
David McAllester, Toyota Technological Institute at Chicago, mcallester@
Deva Ramanan, UC Irvine, dramanan@

Abstract

This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose.

1. Introduction

We consider the problem of detecting and localizing objects of a generic category, such as people or cars, in static images. We have developed a new multiscale deformable part model for solving this problem. The models are trained using a discriminative procedure that only requires bounding box labels for the positive examples. Using these models we implemented a detection system that is both highly efficient and accurate, processing an image in about 2 seconds and achieving recognition rates that are significantly better than previous systems. Our system achieves a two-fold improvement in average precision over the winning system [5] in the 2006 PASCAL person
detection challenge. The system also outperforms the best results in the 2007 challenge in ten out of twenty object categories. Figure 1 shows an example detection obtained with our person model.

(This material is based upon work supported by the National Science Foundation under Grant No. 0534820 and 0535174.)

Figure 1. Example detection obtained with the person model. The model is defined by a coarse template, several higher resolution part templates and a spatial model for the location of each part.

The notion that objects can be modeled by parts in a deformable configuration provides an elegant framework for representing object categories [1–3, 6, 10, 12, 13, 15, 16, 22]. While these models are appealing from a conceptual point of view, it has been difficult to establish their value in practice. On difficult datasets, deformable models are often outperformed by "conceptually weaker" models such as rigid templates [5] or bag-of-features [23]. One of our main goals is to address this performance gap.

Our models include both a coarse global template covering an entire object and higher resolution part templates. The templates represent histogram of gradient features [5]. As in [14, 19, 21], we train models discriminatively. However, our system is semi-supervised, trained with a max-margin framework, and does not rely on feature detection.
We also describe a simple and effective strategy for learning parts from weakly-labeled data. In contrast to computationally demanding approaches such as [4], we can learn a model in 3 hours on a single CPU.

Another contribution of our work is a new methodology for discriminative training. We generalize SVMs for handling latent variables such as part positions, and introduce a new method for data mining "hard negative" examples during training. We believe that handling partially labeled data is a significant issue in machine learning for computer vision. For example, the PASCAL dataset only specifies a bounding box for each positive example of an object. We treat the position of each object part as a latent variable. We also treat the exact location of the object as a latent variable, requiring only that our classifier select a window that has large overlap with the labeled bounding box.

A latent SVM, like a hidden CRF [19], leads to a non-convex training problem. However, unlike a hidden CRF, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive training examples. This leads to a general coordinate descent algorithm for latent SVMs.

System Overview. Our system uses a scanning window approach. A model for an object consists of a global "root" filter and several part models. Each part model specifies a spatial model and a part filter. The spatial model defines a set of allowed placements for a part relative to a detection window, and a deformation cost for each placement.

The score of a detection window is the score of the root filter on the window plus the sum over parts, of the maximum over placements of that part, of the part filter score on the resulting subwindow minus the deformation cost. This is similar to classical part-based models [10, 13]. Both root and part filters are scored by computing the dot product between a set of weights and histogram of gradient (HOG) features within a window. The root filter is equivalent to a
Dalal-Triggs model [5]. The features for the part filters are computed at twice the spatial resolution of the root filter. Our model is defined at a fixed scale, and we detect objects by searching over an image pyramid.

In training we are given a set of images annotated with bounding boxes around each instance of an object. We reduce the detection problem to a binary classification problem. Each example x is scored by a function of the form, f_β(x) = max_z β·Φ(x,z). Here β is a vector of model parameters and z are latent values (e.g. the part placements). To learn a model we define a generalization of SVMs that we call latent variable SVM (LSVM). An important property of LSVMs is that the training problem becomes convex if we fix the latent values for positive examples. This can be used in a coordinate descent algorithm.

In practice we iteratively apply classical SVM training to triples ⟨x1,z1,y1⟩, ..., ⟨xn,zn,yn⟩ where zi is selected to be the best scoring latent label for xi under the model learned in the previous iteration. An initial root filter is generated from the bounding boxes in the PASCAL dataset. The parts are initialized from this root filter.

2. Model

The underlying building blocks for our models are the Histogram of Oriented Gradient (HOG) features from [5]. We represent HOG features at two different scales. Coarse features are captured by a rigid template covering an entire

Figure 2. The HOG feature pyramid and an object hypothesis defined in terms of a placement of the root filter (near the top of the pyramid) and the part filters (near the bottom of the pyramid).
detection window. Finer scale features are captured by part templates that can be moved with respect to the detection window. The spatial model for the part locations is equivalent to a star graph or 1-fan [3] where the coarse template serves as a reference position.

2.1. HOG Representation

We follow the construction in [5] to define a dense representation of an image at a particular resolution. The image is first divided into 8x8 non-overlapping pixel regions, or cells. For each cell we accumulate a 1D histogram of gradient orientations over pixels in that cell. These histograms capture local shape properties but are also somewhat invariant to small deformations.

The gradient at each pixel is discretized into one of nine orientation bins, and each pixel "votes" for the orientation of its gradient, with a strength that depends on the gradient magnitude. For color images, we compute the gradient of each color channel and pick the channel with highest gradient magnitude at each pixel. Finally, the histogram of each cell is normalized with respect to the gradient energy in a neighborhood around it. We look at the four 2×2 blocks of cells that contain a particular cell and normalize the histogram of the given cell with respect to the total energy in each of these blocks. This leads to a vector of length 9×4 representing the local gradient information inside a cell.

We define a HOG feature pyramid by computing HOG features of each level of a standard image pyramid (see Figure 2). Features at the top of this pyramid capture coarse gradients histogrammed over fairly large areas of the input image while features at the bottom of the pyramid capture finer gradients histogrammed over small areas.

2.2. Filters

Filters are rectangular templates specifying weights for subwindows of a HOG pyramid. A w by h filter F is a vector with w×h×9×4 weights. The score of a filter is defined by taking the dot product of the weight vector and the features in a w×h subwindow of a HOG pyramid.

The system in [5] uses a single filter
to define an object model. That system detects objects from a particular class by scoring every w×h subwindow of a HOG pyramid and thresholding the scores.

Let H be a HOG pyramid and p = (x,y,l) be a cell in the l-th level of the pyramid. Let φ(H,p,w,h) denote the vector obtained by concatenating the HOG features in the w×h subwindow of H with top-left corner at p. The score of F on this detection window is F·φ(H,p,w,h). Below we use φ(H,p) to denote φ(H,p,w,h) when the dimensions are clear from context.

2.3. Deformable Parts

Here we consider models defined by a coarse root filter that covers the entire object and higher resolution part filters covering smaller parts of the object. Figure 2 illustrates a placement of such a model in a HOG pyramid. The root filter location defines the detection window (the pixels inside the cells covered by the filter). The part filters are placed several levels down in the pyramid, so the HOG cells at that level have half the size of cells in the root filter level.

We have found that using higher resolution features for defining part filters is essential for obtaining high recognition performance. With this approach the part filters represent finer resolution edges that are localized to greater accuracy when compared to the edges represented in the root filter. For example, consider building a model for a face.
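The filter scoring of Section 2.2 (the dot product F·φ(H,p,w,h) over a w×h subwindow of HOG cells, scanned and thresholded Dalal-Triggs style) can be sketched in a few lines. This is an illustrative toy reimplementation, not the authors' code; the nested-list data layout and function names are assumptions:

```python
def filter_score(level, filt, x, y):
    """Score a filter at cell (x, y) of one HOG pyramid level.

    level: H x W grid of cells, each a length-36 feature list
           (9 orientation bins x 4 block normalizations).
    filt:  h x w grid of weight vectors of the same length.
    Returns the dot product F . phi(H, p, w, h) over the subwindow.
    """
    h, w = len(filt), len(filt[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            cell = level[y + i][x + j]
            weights = filt[i][j]
            total += sum(c * wt for c, wt in zip(cell, weights))
    return total


def scan_level(level, filt, thresh):
    """Single-filter detection: score every w x h subwindow of a
    level and keep placements whose score exceeds a threshold."""
    H, W = len(level), len(level[0])
    h, w = len(filt), len(filt[0])
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = filter_score(level, filt, x, y)
            if s > thresh:
                hits.append((x, y, s))
    return hits
```

In a real system the same scan is repeated at every level of the feature pyramid, which is what makes the search scale-invariant.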
The root filter could capture coarse resolution edges such as the face boundary while the part filters could capture details such as eyes, nose and mouth.

The model for an object with n parts is formally defined by a root filter F0 and a set of part models (P1, ..., Pn) where Pi = (Fi, vi, si, ai, bi). Here Fi is a filter for the i-th part, vi is a two-dimensional vector specifying the center for a box of possible positions for part i relative to the root position, si gives the size of this box, while ai and bi are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part. Figure 1 illustrates a person model.

A placement of a model in a HOG pyramid is given by z = (p0, ..., pn), where pi = (xi, yi, li) is the location of the root filter when i = 0 and the location of the i-th part when i > 0. We assume the level of each part is such that a HOG cell at that level has half the size of a HOG cell at the root level. The score of a placement is given by the scores of each filter (the data term) plus a score of the placement of each part relative to the root (the spatial term),

  Σ_{i=0}^{n} Fi·φ(H,pi) + Σ_{i=1}^{n} [ai·(x̃i, ỹi) + bi·(x̃i², ỹi²)],   (1)

where (x̃i, ỹi) = ((xi, yi) − 2(x, y) + vi)/si gives the location of the i-th part relative to the root location. Both x̃i and ỹi should be between −1 and 1.

There is a large (exponential) number of placements for a model in a HOG pyramid. We use dynamic programming and distance transform techniques [9, 10] to compute the best location for the parts of a model as a function of the root location. This takes O(nk) time, where n is the number of parts in the model and k is the number of cells in the HOG pyramid. To detect objects in an image we score root locations according to the best possible placement of the parts and threshold this score.

The score of a placement z can be expressed in terms of the dot product, β·ψ(H,z), between a vector of model parameters β and a vector ψ(H,z),

  β = (F0, ..., Fn, a1, b1, ..., an, bn),
  ψ(H,z) = (φ(H,p0), φ(H,p1), ..., φ(H,pn), x̃1, ỹ1, x̃1², ỹ1², ..., x̃n, ỹn, x̃n², ỹn²).

We use this representation for learning the model parameters as it makes a connection between our deformable models and linear classifiers.

One interesting aspect of the spatial models defined here is that we allow for the coefficients (ai, bi) to be negative. This is more general than the quadratic "spring" cost that has been used in previous work.

3. Learning

The PASCAL training data consists of a large set of images with bounding boxes around each instance of an object. We reduce the problem of learning a deformable part model with this data to a binary classification problem. Let D = (⟨x1,y1⟩, ..., ⟨xn,yn⟩) be a set of labeled examples where yi ∈ {−1, 1} and xi specifies a HOG pyramid, H(xi), together with a range, Z(xi), of valid placements for the root and part filters. We construct a positive example from each bounding box in the training set. For these examples we define Z(xi) so the root filter must be placed to overlap the bounding box by at least 50%. Negative examples come from images that do not contain the target object.
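The placement score of equation (1) can be sketched directly. This is an illustrative reimplementation under an assumed data layout, with the filter responses Fi·φ(H,pi) taken as precomputed inputs; variable names are not from the authors' code:

```python
def placement_score(filter_scores, parts, root_xy, part_locs):
    """Score of a placement z, following Eq. (1): the data term
    (filter responses) plus the quadratic spatial term for each
    part relative to the root.

    filter_scores: list of n+1 responses F_i . phi(H, p_i); index 0
                   is the root filter.
    parts:         list of n tuples (v_i, s_i, a_i, b_i): anchor,
                   box size, and 2D quadratic score coefficients.
    root_xy:       (x, y) cell coordinates of the root filter.
    part_locs:     list of n (x_i, y_i) part locations, at twice the
                   root resolution (hence the factor of 2 below).
    """
    score = filter_scores[0]
    rx, ry = root_xy
    for i, ((vx, vy), s, (ax, ay), (bx, by)) in enumerate(parts):
        x, y = part_locs[i]
        # (x~_i, y~_i) = ((x_i, y_i) - 2(x, y) + v_i) / s_i
        xt = (x - 2 * rx + vx) / s
        yt = (y - 2 * ry + vy) / s
        score += filter_scores[i + 1]
        score += ax * xt + ay * yt        # linear term  a_i . (x~, y~)
        score += bx * xt ** 2 + by * yt ** 2  # quadratic  b_i . (x~^2, y~^2)
    return score
```

With ai = (0,0) and bi = (−1,−1), the spatial term reduces to minus the squared norm of the displacement, which is exactly the initial deformation cost used in Section 3.4.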
Each placement of the root filter in such an image yields a negative training example.

Note that for the positive examples we treat both the part locations and the exact location of the root filter as latent variables. We have found that allowing uncertainty in the root location during training significantly improves the performance of the system (see Section 4).

3.1. Latent SVMs

A latent SVM is defined as follows. We assume that each example x is scored by a function of the form,

  f_β(x) = max_{z∈Z(x)} β·Φ(x,z),   (2)

where β is a vector of model parameters and z is a set of latent values. For our deformable models we define Φ(x,z) = ψ(H(x),z) so that β·Φ(x,z) is the score of placing the model according to z.

In analogy to classical SVMs we would like to train β from labeled examples D = (⟨x1,y1⟩, ..., ⟨xn,yn⟩) by optimizing the following objective function,

  β*(D) = argmin_β λ||β||² + Σ_{i=1}^{n} max(0, 1 − yi f_β(xi)).   (3)

By restricting the latent domains Z(xi) to a single choice, f_β becomes linear in β, and we obtain linear SVMs as a special case of latent SVMs. Latent SVMs are instances of the general class of energy-based models [18].

3.2. Semi-Convexity

Note that f_β(x) as defined in (2) is a maximum of functions each of which is linear in β. Hence f_β(x) is convex in β. This implies that the hinge loss max(0, 1 − yi f_β(xi)) is convex in β when yi = −1. That is, the loss function is convex in β for negative examples. We call this property of the loss function semi-convexity.

Consider an LSVM where the latent domains Z(xi) for the positive examples are restricted to a single choice. The loss due to each positive example is now convex. Combined with the semi-convexity property, (3) becomes convex in β.

If the labels for the positive examples are not fixed we can compute a local optimum of (3) using a coordinate descent algorithm:

1. Holding β fixed, optimize the latent values for the positive examples zi = argmax_{z∈Z(xi)} β·Φ(x,z).
2. Holding {zi} fixed for positive examples, optimize β by solving the convex problem defined above.

It can be shown that both steps always improve or
maintain the value of the objective function in (3). If both steps maintain the value we have a strong local optimum of (3), in the sense that Step 1 searches over an exponentially large space of latent labels for positive examples while Step 2 simultaneously searches over weight vectors and an exponentially large space of latent labels for negative examples.

3.3. Data Mining Hard Negatives

In object detection the vast majority of training examples are negative. This makes it infeasible to consider all negative examples at a time. Instead, it is common to construct training data consisting of the positive instances and "hard negative" instances, where the hard negatives are data mined from the very large set of possible negative examples.

Here we describe a general method for data mining examples for SVMs and latent SVMs. The method iteratively solves subproblems using only hard instances. The innovation of our approach is a theoretical guarantee that it leads to the exact solution of the training problem defined using the complete training set. Our results require the use of a margin-sensitive definition of hard examples.

The results described here apply both to classical SVMs and to the problem defined by Step 2 of the coordinate descent algorithm for latent SVMs. We omit the proofs of the theorems due to lack of space. These results are related to working set methods [17].

We define the hard instances of D relative to β as,

  M(β, D) = {⟨x,y⟩ ∈ D | y f_β(x) ≤ 1}.   (4)

That is, M(β, D) are training examples that are incorrectly classified or near the margin of the classifier defined by β. We can show that β*(D) only depends on hard instances.
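The hard-instance set of equation (4), together with the iterative shrink/grow cache scheme detailed in the next subsection, can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation; the `train_svm` and `score` callables, the in-memory cache, and the fixed round count are assumptions:

```python
def hard_instances(beta, data, score):
    """M(beta, D): examples that are misclassified or within the
    margin, i.e. y * f_beta(x) <= 1 (Eq. 4)."""
    return [(x, y) for (x, y) in data if y * score(beta, x) <= 1.0]


def mine_and_train(train_svm, score, data, cache, limit, rounds):
    """Iterate: train on the cache C, shrink C to its hard
    instances M(beta, C), then grow C with hard instances from the
    full dataset M(beta, D) up to a memory limit."""
    beta = None
    for _ in range(rounds):
        beta = train_svm(cache)                      # beta := beta*(C)
        cache = hard_instances(beta, cache, score)   # shrink C
        for ex in hard_instances(beta, data, score): # grow C
            if len(cache) >= limit:
                break
            if ex not in cache:
                cache.append(ex)
    return beta, cache
```

The point of the construction is that the solver only ever sees the cache, yet (per Theorems 1-3 below) the fixed point of this loop is the solution on the complete training set.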
Theorem 1. Let C be a subset of the examples in D. If M(β*(D), D) ⊆ C then β*(C) = β*(D).

This implies that in principle we could train a model using a small set of examples. However, this set is defined in terms of the optimal model β*(D). Given a fixed β we can use M(β, D) to approximate M(β*(D), D). This suggests an iterative algorithm where we repeatedly compute a model from the hard instances defined by the model from the last iteration. This is further justified by the following fixed-point theorem.

Theorem 2. If β*(M(β, D)) = β then β = β*(D).

Let C be an initial "cache" of examples. In practice we can take the positive examples together with random negative examples. Consider the following iterative algorithm:

1. Let β := β*(C).
2. Shrink C by letting C := M(β, C).
3. Grow C by adding examples from M(β, D) up to a memory limit L.

Theorem 3. If |C| < L after each iteration of Step 2, the algorithm will converge to β = β*(D) in finite time.

3.4. Implementation details

Many of the ideas discussed here are only approximately implemented in our current system. In practice, when training a latent SVM we iteratively apply classical SVM training to triples ⟨x1,z1,y1⟩, ..., ⟨xn,zn,yn⟩ where zi is selected to be the best scoring latent label for xi under the model trained in the previous iteration. Each of these triples leads to an example ⟨Φ(xi,zi), yi⟩ for training a linear classifier. This allows us to use a highly optimized SVM package (SVMLight [17]). On a single CPU, the entire training process takes 3 to 4 hours per object class in the PASCAL datasets, including initialization of the parts.

Root Filter Initialization: For each category, we automatically select the dimensions of the root filter by looking at statistics of the bounding boxes in the training data.¹ We train an initial root filter F0 using an SVM with no latent variables. The positive examples are constructed from the unoccluded training examples (as labeled in the PASCAL data). These examples are anisotropically scaled to the size and aspect ratio of the filter. We use random
subwindows from negative images to generate negative examples.

Root Filter Update: Given the initial root filter trained as above, for each bounding box in the training set we find the best-scoring placement for the filter that significantly overlaps with the bounding box. We do this using the original, un-scaled images. We retrain F0 with the new positive set and the original random negative set, iterating twice.

Part Initialization: We employ a simple heuristic to initialize six parts from the root filter trained above. First, we select an area a such that 6a equals 80% of the area of the root filter. We greedily select the rectangular region of area a from the root filter that has the most positive energy. We zero out the weights in this region and repeat until six parts are selected. The part filters are initialized from the root filter values in the subwindow selected for the part, but filled in to handle the higher spatial resolution of the part. The initial deformation costs measure the squared norm of a displacement with ai = (0,0) and bi = −(1,1).

Model Update: To update a model we construct new training data triples. For each positive bounding box in the training data, we apply the existing detector at all positions and scales with at least a 50% overlap with the given bounding box. Among these we select the highest scoring placement as the positive example corresponding to this training bounding box (Figure 3). Negative examples are selected by finding high scoring detections in images not containing the target object. We add negative examples to a cache until we encounter file size limits. A new model is trained by running SVMLight on the positive and negative examples, each labeled with part placements. We update the model 10 times using the cache scheme described above. In each iteration we keep the hard instances from the previous cache and add as many new hard instances as possible within the memory limit. Toward the final iterations, we are able to include all hard instances, M(β, D), in the
cache.

¹We picked a simple heuristic by cross-validating over 5 object classes. We set the model aspect to be the most common (mode) aspect in the data. We set the model size to be the largest size not larger than 80% of the data.

Figure 3. The image on the left shows the optimization of the latent variables for a positive example. The dotted box is the bounding box label provided in the PASCAL training set. The large solid box shows the placement of the detection window while the smaller solid boxes show the placements of the parts. The image on the right shows a hard-negative example.

4. Results

We evaluated our system using the PASCAL VOC 2006 and 2007 comp3 challenge datasets and protocol. We refer to [7, 8] for details, but emphasize that both challenges are widely acknowledged as difficult testbeds for object detection. Each dataset contains several thousand images of real-world scenes. The datasets specify ground-truth bounding boxes for several object classes, and a detection is considered correct when it overlaps more than 50% with a ground-truth bounding box. One scores a system by the average precision (AP) of its precision-recall curve across a test set.

Recent work in pedestrian detection has tended to report detection rates versus false positives per window, measured with cropped positive examples and negative images without objects of interest. These scores are tied to the resolution of the scanning window search and ignore effects of non-maximum suppression, making it difficult to compare different systems. We believe the PASCAL scoring method gives a more reliable measure of performance.

The 2007 challenge has 20 object categories. We entered a preliminary version of our system in the official competition, and obtained the best score in 6 categories. Our current system obtains the highest score in 10 categories, and the second highest score in 6 categories. Table 1 summarizes the results. Our system performs well on rigid objects such as cars and sofas as well as highly deformable objects such as
persons and horses. We also note that our system is successful when given a large or small amount of training data. There are roughly 4700 positive training examples in the person category but only 250 in the sofa category. Figure 4 shows some of the models we learned. Figure 5 shows some example detections.

We evaluated different components of our system on the longer-established 2006 person dataset. The top AP score

              aero bike bird boat bottle bus  car  cat  chair cow  table dog  horse mbike person plant sheep sofa train tv
Our rank:     3    1    2    1    1      2    2    4    1     1    1     4    2     2     1      1     2     1    4     1
Our score:    .180 .411 .092 .098 .249   .349 .396 .110 .155  .165 .110  .062 .301  .337  .267   .140  .141  .156 .206  .336
Darmstadt:    .301
INRIA Normal: .092 .246 .012 .002 .068   .197 .265 .018 .097  .039 .017  .016 .225  .153  .121   .093  .002  .102 .157  .242
INRIA Plus:   .136 .287 .041 .025 .077   .279 .294 .132 .106  .127 .067  .071 .335  .249  .092   .072  .011  .092 .242  .275
IRISA:        .281 .318 .026 .097 .119   .289 .227 .221 .175  .253
MPI Center:   .060 .110 .028 .031 .000   .164 .172 .208 .002  .044 .049  .141 .198  .170  .091   .004  .091  .034 .237  .051
MPI ESSOL:    .152 .157 .098 .016 .001   .186 .120 .240 .007  .061 .098  .162 .034  .208  .117   .002  .046  .147 .110  .054
Oxford:       .262 .409 .393 .432 .375   .334
TKK:          .186 .078 .043 .072 .002   .116 .184 .050 .028  .100 .086  .126 .186  .135  .061   .019  .036  .058 .067  .090

Table 1. PASCAL VOC 2007 results. Average precision scores of our system and other systems that entered the competition [7]. Empty boxes indicate that a method was not tested in the corresponding class. The best score in each class is shown in bold. Our current system ranks first in 10 out of 20 classes. A preliminary version of our system ranked first in 6 classes in the official competition.

Bottle / Car / Bicycle / Sofa

Figure 4. Some models learned from the PASCAL VOC 2007 dataset. We show the total energy in each orientation of the HOG cells in the root and part filters, with the part filters placed at the center of the allowable displacements. We also show the spatial model for each part, where bright values represent "cheap" placements, and dark values represent "expensive" placements.

in the PASCAL competition was .16, obtained using a rigid
template model of HOG features [5]. The best previous result of .19 adds a segmentation-based verification step [20]. Figure 6 summarizes the performance of several models we trained. Our root-only model is equivalent to the model from [5] and it scores slightly higher at .18. Performance jumps to .24 when the model is trained with a LSVM that selects a latent position and scale for each positive example. This suggests LSVMs are useful even for rigid templates because they allow for self-adjustment of the detection window in the training examples. Adding deformable parts increases performance to .34 AP, a factor of two above the best previous score. Finally, we trained a model with parts but no root filter and obtained .29 AP. This illustrates the advantage of using a multiscale representation.

We also investigated the effect of the spatial model and allowable deformations on the 2006 person dataset. Recall that si is the allowable displacement of a part, measured in HOG cells. We trained a rigid model with high-resolution parts by setting si to 0. This model outperforms the root-only system by .27 to .24. If we increase the amount of allowable displacements without using a deformation cost, we start to approach a bag-of-features. Performance peaks at si = 1, suggesting it is useful to constrain the part displacements. The optimal strategy allows for larger displacements while using an explicit deformation cost.

Figure 5. Some results from the PASCAL 2007 dataset. Each row shows detections using a model for a specific class (Person, Bottle, Car, Sofa, Bicycle, Horse). The first three columns show correct detections while the last column shows false positives. Our system is able to detect objects over a wide range of scales (such as the cars) and poses (such as the horses). The system can also detect partially occluded objects such as a person behind a bush. Note how the false detections are often quite reasonable, for example detecting a bus with the car model, a bicycle sign with the bicycle model, or a dog with the horse model. In general the part filters represent meaningful object parts that are well localized in each detection such as the head in the person model.

Figure 6. Evaluation of our system on the PASCAL VOC 2006 person dataset. Root uses only a root filter and no latent placement of the detection windows on positive examples. Root+Latent uses a root filter with latent placement of the detection windows. Parts+Latent is a part-based system with latent detection windows but no root filter. Root+Parts+Latent includes both root and part filters, and latent placement of the detection windows.

The following table shows AP as a function of freely allowable deformation in the first three columns. The last column gives the performance when using a quadratic deformation cost and an allowable displacement of 2 HOG cells.

  si:  0    1    2    3    2 + quadratic cost
  AP: .27  .33  .31  .31  .34

5. Discussion

We introduced a general framework for training SVMs with latent structure. We used it to build a recognition system based on multiscale, deformable models. Experimental results on difficult benchmark data suggest our system is the current state-of-the-art in object detection.

LSVMs allow for exploration of additional latent structure for recognition. One can consider deeper part hierarchies (parts with parts), mixture models (frontal vs. side cars), and three-dimensional pose. We would like to train and detect multiple classes together using a shared vocabulary of parts (perhaps visual words). We also plan to use A* search [11] to efficiently search over latent parameters during detection.

References

[1] Y. Amit and A. Trouve. POP: Patchwork of parts models for object recognition. IJCV, 75(2):267–282, November 2007.
[2] M. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry. In ECCV, pages II:628–641, 1998.
[3] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models. In CVPR, pages 10–17, 2005.
[4] D. Crandall and
D.Huttenlocher.Weakly supervised learn-ing of part-based spatial models for visual object recognition.In ECCV,pages I:16–29,2006.[5]N.Dalal and B.Triggs.Histograms of oriented gradients forhuman detection.In CVPR,pages I:886–893,2005.[6] B.Epshtein and S.Ullman.Semantic hierarchies for recog-nizing objects and parts.In CVPR,2007.[7]M.Everingham,L.Van Gool,C.K.I.Williams,J.Winn,and A.Zisserman.The PASCAL Visual Object Classes Challenge2007(VOC2007)Results./challenges/VOC/voc2007/workshop.[8]M.Everingham, A.Zisserman, C.K.I.Williams,andL.Van Gool.The PASCAL Visual Object Classes Challenge2006(VOC2006)Results./challenges/VOC/voc2006/results.pdf.[9]P.Felzenszwalb and D.Huttenlocher.Distance transformsof sampled functions.Cornell Computing and Information Science Technical Report TR2004-1963,September2004.[10]P.Felzenszwalb and D.Huttenlocher.Pictorial structures forobject recognition.IJCV,61(1),2005.[11]P.Felzenszwalb and D.McAllester.The generalized A*ar-chitecture.JAIR,29:153–190,2007.[12]R.Fergus,P.Perona,and A.Zisserman.Object class recog-nition by unsupervised scale-invariant learning.In CVPR, 2003.[13]M.Fischler and R.Elschlager.The representation andmatching of pictorial structures.IEEE Transactions on Com-puter,22(1):67–92,January1973.[14] A.Holub and P.Perona.A discriminative framework formodelling object classes.In CVPR,pages I:664–671,2005.[15]S.Ioffe and D.Forsyth.Probabilistic methods forfindingpeople.IJCV,43(1):45–68,June2001.[16]Y.Jin and S.Geman.Context and hierarchy in a probabilisticimage model.In CVPR,pages II:2145–2152,2006.[17]T.Joachims.Making large-scale svm learning practical.InB.Sch¨o lkopf,C.Burges,and A.Smola,editors,Advances inKernel Methods-Support Vector Learning.MIT Press,1999.[18]Y.LeCun,S.Chopra,R.Hadsell,R.Marc’Aurelio,andF.Huang.A tutorial on energy-based learning.InG.Bakir,T.Hofman,B.Sch¨o lkopf,A.Smola,and B.Taskar,editors, Predicting Structured Data.MIT Press,2006.[19] A.Quattoni,S.Wang,L.Morency,M.Collins,and T.Dar-rell.Hidden 
conditional randomfields.PAMI,29(10):1848–1852,October2007.[20] ing segmentation to verify object hypothe-ses.In CVPR,pages1–8,2007.[21] D.Ramanan and C.Sminchisescu.Training deformablemodels for localization.In CVPR,pages I:206–213,2006.[22]H.Schneiderman and T.Kanade.Object detection using thestatistics of parts.IJCV,56(3):151–177,February2004. [23]J.Zhang,M.Marszalek,zebnik,and C.Schmid.Localfeatures and kernels for classification of texture and object categories:A comprehensive study.IJCV,73(2):213–238, June2007.。
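The quadratic deformation cost discussed above can be illustrated with a short sketch. This brute-force version is only illustrative: the actual system computes the same maximization efficiently with generalized distance transforms [9], and every name below is ours, not the paper's.

```python
def best_part_placement(response, anchor, max_disp, quad_cost=0.0):
    """Place one part: maximize filter response minus a quadratic
    deformation cost, over displacements of at most max_disp cells.

    response:  2-D grid (list of lists) of part-filter scores per HOG cell.
    anchor:    (row, col) ideal location of the part.
    max_disp:  allowable displacement in cells (the s_i of the text).
    quad_cost: weight on squared displacement (0.0 = free placement).
    """
    ay, ax = anchor
    rows, cols = len(response), len(response[0])
    best_score, best_pos = float("-inf"), anchor
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < rows and 0 <= x < cols:
                score = response[y][x] - quad_cost * (dx * dx + dy * dy)
                if score > best_score:
                    best_score, best_pos = score, (y, x)
    return best_score, best_pos
```

With max_disp = 0 this reduces to the rigid model (s_i = 0); a nonzero max_disp with quad_cost = 0.0 corresponds to the freely allowable deformation of the first three table columns, and a nonzero quad_cost to the last column.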
Consistency regularization: origins (reply)
Consistency Regularization: An Overview and Applications

Introduction

Consistency regularization has emerged as a powerful technique in machine learning, specifically in the field of deep learning. It aims to improve the generalization and robustness of models by encouraging consistency in their predictions. This regularization technique has found applications in various domains, including image classification, natural language processing, and speech recognition. In this article, we will provide an overview of consistency regularization, discuss its theoretical foundations, and explore its applications in different areas.

Theoretical Foundations

Consistency regularization is rooted in the principle of encouraging smoothness and stability in model predictions. The underlying assumption is that small changes in the input should not significantly alter the output of a well-trained model. This principle is particularly relevant in scenarios where the training data may contain noisy or ambiguous samples.

One of the commonly used methods for achieving consistency regularization is known as consistency training. In this approach, two different input transformations are applied to the same sample, creating two augmented versions. The model is then trained to produce consistent predictions for the transformed samples. Intuitively, this process encourages the model to focus on the underlying patterns in the data rather than being influenced by specific input variations.

Consistency regularization can be formulated using several loss functions. One popular choice is the mean squared error (MSE) loss, which measures the discrepancy between predictions for the original input and its transformed versions. Other approaches include cross-entropy loss and Kullback-Leibler divergence.

Applications in Image Classification

Consistency regularization has yielded promising results in image classification tasks.
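To make the MSE variant above concrete, here is a minimal, dependency-free sketch. The model, the augmentations, and every name in it are illustrative stand-ins, not taken from any particular implementation:

```python
def consistency_loss(predict, x, augment_a, augment_b):
    """Mean squared error between the model's predictions on two
    augmented views of the same input (consistency training)."""
    p_a = predict(augment_a(x))
    p_b = predict(augment_b(x))
    return sum((a - b) ** 2 for a, b in zip(p_a, p_b)) / len(p_a)

# Toy stand-ins: a "model" mapping a feature vector to two scores,
# plus an identity view and a small additive jitter.
def predict(x):
    s = sum(x)
    return [s, -s]

identity = lambda x: x
jitter = lambda x: [v + 0.1 for v in x]

x = [1.0, 2.0, 3.0]
loss = consistency_loss(predict, x, identity, jitter)
```

Minimizing such a term pushes the model toward identical outputs for both views, which is exactly the smoothness assumption discussed above; replacing the squared error with a KL divergence between softmax outputs gives the other variants mentioned.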
One notable application is semi-supervised learning, where the goal is to leverage a small amount of labeled data together with a larger set of unlabeled data. By enforcing consistent predictions on both labeled and unlabeled data, models can effectively learn from the unlabeled data and improve their performance on the labeled task. This approach has been shown to outperform traditional supervised learning methods in scenarios with limited labeled samples.

Additionally, consistency regularization has been explored in the context of adversarial attacks. Adversarial attacks attempt to fool a model by introducing subtle perturbations to the input data. By training models to give consistent predictions for both original and perturbed inputs, their robustness against such attacks can be significantly improved.

Applications in Natural Language Processing

Consistency regularization has also demonstrated promising results in natural language processing (NLP) tasks. In NLP, models often face the challenge of understanding and generating coherent sentences. By applying consistency regularization, models can be trained to produce consistent predictions for different representations of the same text. This encourages the model to focus on the meaning and semantics of the text rather than being influenced by superficial variations, such as different word order or sentence structure.

Furthermore, consistency regularization can be used in machine translation tasks, where the goal is to translate text from one language to another. By enforcing consistency between translations of the same source text, models can generate more accurate and consistent translations.

Applications in Speech Recognition

Speech recognition is another domain where consistency regularization has found applications. One of the key challenges in speech recognition is handling variations in pronunciation and speaking styles.
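The semi-supervised recipe described above, a supervised loss on the labeled examples plus a weighted consistency term on the unlabeled ones, can be sketched as follows. The helper names and the toy model are hypothetical, and the fixed weight lam is just one common choice of weighting scheme:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def mse(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def semi_supervised_loss(predict, labeled, unlabeled, augment, lam=1.0):
    """Cross-entropy on (input, label) pairs plus lam times the average
    consistency (MSE) between predictions on each unlabeled input and
    its augmented version."""
    sup = sum(cross_entropy(predict(x), y) for x, y in labeled) / len(labeled)
    cons = sum(mse(predict(x), predict(augment(x)))
               for x in unlabeled) / len(unlabeled)
    return sup + lam * cons

# Toy stand-ins for a model and an augmentation.
predict = lambda x: [sum(x), 0.0]
identity = lambda x: x

loss = semi_supervised_loss(predict, [([2.0], 0)], [[1.0]], identity)
```

With the identity augmentation the consistency term vanishes and only the supervised term remains; a real setup would use stochastic augmentations so the unlabeled data contributes a nonzero gradient.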
By training models to produce consistent predictions for different acoustic representations of the same speech utterance, they can better capture the underlying patterns and improve their accuracy in recognizing speech under different conditions. This can lead to more robust and reliable speech recognition systems in real-world scenarios.

Conclusion

Consistency regularization has emerged as an effective technique for improving the generalization and robustness of models in various machine learning tasks. By encouraging consistency in predictions, models can better learn the underlying patterns in the data and generalize well to unseen examples. This regularization technique has been successfully applied in image classification, natural language processing, and speech recognition tasks, among others. As research in consistency regularization continues to advance, we can expect further developments and applications in the future.
Superoperator Representation of Nonlinear Response: Unifying Quantum Field and Mode Coupling Theories
Shaul Mukamel Department of Chemistry, University of California, Irvine, CA 92697-2025
(Dated: February 2, 2008)
Abstract
Computing response functions by following the time evolution of superoperators in Liouville space (whose vectors are ordinary Hilbert space operators) offers an attractive alternative to the diagrammatic perturbative expansion of many-body equilibrium and nonequilibrium Green functions. The bookkeeping of time ordering is naturally maintained in real (physical) time, allowing the formulation of Wick’s theorem for superoperators, giving a factorization of higher order response functions in terms of two fundamental Green’s functions. Backward propagations and the analytic continuations using artificial times (Keldysh loops and Matsubara contours) are avoided. A generating functional for nonlinear response functions unifies quantum field theory and the classical mode coupling formalism of nonlinear hydrodynamics and may be used for semiclassical expansions. Classical response functions are obtained without the explicit computation of stability matrices.
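For orientation, the superoperator algebra the abstract refers to is conventionally built from "left" and "right" multiplication superoperators acting on an ordinary Hilbert-space operator X. The definitions below are the standard ones in this formalism; the notation is illustrative, not quoted from the paper:

```latex
A_L X \equiv A X, \qquad A_R X \equiv X A, \qquad
A_- \equiv A_L - A_R, \qquad A_+ \equiv \tfrac{1}{2}\,(A_L + A_R),
```

so that $A_-$ generates commutators and $A_+$ symmetrized products, and an $n$th-order response function takes the compact time-ordered form

```latex
R^{(n)}(t; t_n, \ldots, t_1) \;\propto\;
\bigl\langle \mathcal{T}\, A_+(t)\, A_-(t_n) \cdots A_-(t_1) \bigr\rangle .
```

Here $\mathcal{T}$ orders superoperators in real (physical) time, which is the bookkeeping property highlighted in the abstract; Wick's theorem for superoperators then factorizes such expectation values into products of two fundamental Green functions.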
Moyal Planes are Spectral Triples
arXiv:hep-th/0307241 v3, 7 Oct 2003

CENTRE DE PHYSIQUE THÉORIQUE [1]
CNRS-Luminy, Case 907, 13288 Marseille Cedex 9, FRANCE

V. Gayral [2], J. M. Gracia-Bondía [3], B. Iochum [2], T. Schücker [2] and J. C. Várilly [4,5]

Abstract. Axioms for nonunital spectral triples, extending those introduced in the unital case by Connes, are proposed. As a guide, and for the sake of their importance in noncommutative quantum field theory, the spaces R^{2N} endowed with Moyal products are intensively investigated. Some physical applications, such as the construction of noncommutative Wick monomials and the computation of the Connes-Lott functional action, are given for these noncommutative hyperplanes.

PACS numbers: 11.10.Nx, 02.30.Sa, 11.15.Kc
MSC-2000 classes: 46H35, 46L52, 58B34, 81S30

July 2003, CPT-03/P.4546

[1] Unité Propre de Recherche 7061
[2] Also at Université de Provence; gayral@cpt.univ-mrs.fr, iochum@cpt.univ-mrs.fr, schucker@cpt.univ-mrs.fr
[3] Departamento de Física, Universidad de Costa Rica, 2060 San Pedro, Costa Rica
[4] Departamento de Matemáticas, Universidad de Costa Rica, 2060 San Pedro, Costa Rica
[5] Regular Associate of the Abdus Salam ICTP, 34014 Trieste; varilly@ictp.trieste.it

Contents

1  Introduction
2  The theory of distributions and Moyal analysis
   2.1 Basic facts of Moyalology
   2.2 The oscillator basis
   2.3 Moyal multiplier algebras
   2.4 Smooth test function spaces, their duals and the Moyal product
   2.5 The preferred unitization of the Schwartz Moyal algebra
3  Axioms for noncompact spin geometries
   3.1 Generalization of the unital case conditions
   3.2 Modified conditions for nonunital spectral triples
   3.3 The commutative case
   3.4 On the Connes-Landi spaces example
4  The Moyal 2N-plane as a spectral triple
   4.1 The compactness condition
   4.2 Spectral dimension of the Moyal planes
   4.3 The regularity condition
   4.4 The finiteness condition
   4.5 The other axioms for the Moyal 2N-plane
5  Moyal-Wick monomials
   5.1 An algebraic mould
   5.2 The noncommutative Wick monomials
6  The functional action
   6.1 Connes-Terashima fermions
   6.2 The differential algebra
   6.3 The action
7  Conclusions and outlook
8  Appendix: a few explicit formulas
   8.1 On the oscillator basis functions
   8.2 More junk

1 Introduction

Since Seiberg and Witten conclusively confirmed [79] that the endpoints of open strings in a magnetic field background effectively live on a noncommutative space, string theory has given much impetus to noncommutative field theory (NCFT). This noncommutative space turns out to be of the Moyal type, for which there already existed a respectable body of mathematical knowledge, in connection with the phase-space formulation of quantum mechanics [65].

However, NCFT is a problematic realm. Its bane is the trouble with both unitarity and causality [39,78]. Feynman rules for NCFT can be derived either using the canonical operator formalism for quantized fields, working with the scattering matrix in the Heisenberg picture by means of Yang-Feldman-Källén equations; or from the functional integral formalism. These two approaches clash [3], and there is the distinct possibility that both fail to make sense. The difficulties vanish if we look instead at NCFT in the Euclidean signature. Also, in spite of the tremendous influence on NCFT, direct and indirect, of the work by Connes, it is surprising that NCFT based on the Moyal product as currently practised does not appeal to the spectral triple formalism.

So we may, and should, raise a basic question: namely, whether the Euclidean version of Moyal noncommutative field theory is compatible with the full strength of Connes' formulation of noncommutative geometry, or not.

The prospective benefits of such an endeavour are mutual. Those interested in applications may win a new toolkit, and Connes' paradigm stands to gain from careful consideration of new examples.

In order to speak of noncommutative spaces endowed with topological, differential and metric structures, Connes has put forward an axiomatic scheme for "noncommutative spin manifolds",
which in fact is the end product of a long process of learning how to express the concept of an ordinary spin manifold in algebraic and operatorial terms.

A compact noncommutative spin manifold consists of a spectral triple (A, H, D), subject to the six or seven special conditions laid out in [19], reviewed below in due course. Here A is a unital algebra, represented on a Hilbert space H, together with a distinguished selfadjoint operator, the abstract Dirac operator D, whose resolvent is completely continuous, such that each operator [D, a] for a ∈ A is bounded. A spectral triple is even if it possesses a Z_2-grading operator χ commuting with A and anticommuting with D.

The key result is the reconstruction theorem [19,20] which recovers the classical geometry of a compact spin manifold M from the noncommutative setup, once the algebra of coordinates is assumed to be isomorphic to the space of smooth functions C∞(M). Details of this reconstruction are given in [45, Chapters 10 and 11] and in a different vein in [71].

Thus, for compact noncommutative spaces, the answer to our question is clearly in the affirmative. Indeed the first worked examples of noncommutative differential geometries are the noncommutative tori (NC tori), as introduced already in 1980 [14,74]. It is a simple observation that the NC torus can be obtained as an ordinary torus endowed with a periodic version of the Moyal product. The NC tori have been thoroughly exploited in NCFT [24,92].

The restriction to compact noncommutative spaces ("compactness" being a metaphor for the unitality of the coordinate algebra A) is essentially a technical one, and no fundamental obstacle to extending the theory of spectral triples to nonunital algebras was foreseen. However, it is fair to say that so far a complete treatment of the nonunital case has not been written down. (There have been, of course, some noteworthy partial treatments: one can mention [41,73], which identify some of the outstanding issues.) The time has come to add a new twist to the tale.

In this article we
show in detail how to build noncompact noncommutative spin geometries. The indispensable commutative example of noncompact manifolds is considered first. Then the geometry associated to the Moyal product is laid out. One of the difficulties in doing this is to pin down a "natural" compactification or unitization (embedding of the coordinate algebra as an essential ideal in a unital algebra), the main idea being that the chosen Dirac operator must play a role in this choice. Since the resolvent of D is no longer compact, some adjustments need to be made; for instance, we now ask for a(D − λ)^{-1} to be compact for a ∈ A and λ ∉ sp D. Then, thanks to a variation of the famous Cwikel inequality [27,81], often used for estimating bound states of Schrödinger operators, we prove that the spectral triple

  ( (S(R^{2N}), ⋆_Θ),  L^2(R^{2N}) ⊗ C^{2^N},  −i∂_µ ⊗ γ^µ ),

where S denotes the space of Schwartz functions and ⋆_Θ a Moyal product, is 2N^+-summable and has in fact the spectral dimension 2N. The interplay between all suitable algebras containing (S(R^{2N}), ⋆_Θ) must be validated by the orientation and finiteness conditions [19,20]. In so doing, we prove that the classical background of modern-day NCFTs does fit in the framework of the rigorous Connes formalism for geometrical noncommutative spaces.

This accomplished, the construction of noncommutative gauge theories, which we perform by means of the primitive form of the spectral action functional, is straightforward. The issue of understanding the fluctuations of the geometry, in order to develop "noncommutative gravity" [12], has not reached a comparable degree of mathematical maturity, and is not examined yet.

As a byproduct of our analysis, and although we do not deal here with NCFT proper, a mathematically satisfactory construction of the Moyal-Wick monomials is also given. The main results in this paper have been announced and summarized in [38].

The first order of business is to review the Moyal product more carefully with due attention paid to the mathematical details.

2 The theory of distributions
and Moyal analysis

In this first paragraph we fix the notations and recall basic definitions. For any finite dimension k, let Θ be a real skewsymmetric k×k matrix, let s·t denote the usual scalar product on Euclidean R^k and let S(R^k) be the space of complex Schwartz (smooth, rapidly decreasing) functions on R^k. One defines, for f, h ∈ S(R^k), the corresponding Moyal or twisted product:

  f ⋆_Θ h(x) := (2π)^{-k} ∬ f(x − ½Θu) h(x + t) e^{−iu·t} d^k u d^k t.   (2.1)

In the quantum-mechanical context on phase space, the entries of Θ have the dimensions of an action; one then selects Θ = ℏS, with

  S := ( 0      1_N
         −1_N   0  ).

Indeed, the product ⋆ (or rather, its commutator) was introduced in that context by Moyal [65], using a series development in powers of ℏ whose first nontrivial term gives the Poisson bracket; later, it was rewritten in the above integral form. These are actually oscillatory integrals, of which Moyal's series development,

  f ⋆ g(x) = Σ_{α ∈ N^{2N}} (iℏ/2)^{|α|} (1/α!) (∂^α f/∂x^α)(x) (∂^α g/∂(Sx)^α)(x),   (2.3)

is an asymptotic expansion. The development (2.3) holds, and sometimes becomes exact, under conditions spelled out in [33]. The first integral form (2.1) of the Moyal product was exploited by Rieffel in a remarkable monograph [75], who made it the starting point for a more general deformation theory of C*-algebras. Since the problems we are concerned with in this paper are of functional analytic nature, there is little point in using the most general Θ here: we concentrate on the nondegenerate case and adopt the form Θ = θS with θ real. Therefore, the corresponding Moyal products are indexed by the real parameter θ; we denote them by ⋆_θ and usually omit explicit reference to N in the notation.

The plan of the rest of this section is roughly as follows. The Schwartz space S(R^{2N}) endowed with these products is an algebra without unit and its unitization will not be unique. Below, after extending the Moyal product to large classes of distributions, we find and choose unitizations suitable for our construction of a noncompact spectral triple, and show that (S(R^{2N}), ⋆_θ) is a pre-C*-algebra. We prove that the left Moyal product by a function f ∈ S(R^{2N}) is a regularizing operator on R^{2N}. In
connection with that, we examine the matter of Calderón-Vaillancourt-type theorems in Moyal analysis. We inspect as well the relation of our compactifications with NC tori.

2.1 Basic facts of Moyalology

With the choice Θ = θS made, the Moyal product can also be written

  f ⋆_θ g(x) := (πθ)^{−2N} ∬ f(y) g(z) e^{(2i/θ)(x−y)·S(x−z)} d^{2N}y d^{2N}z.   (2.4)

Among its basic properties, the product satisfies the Leibniz rule

  ∂/∂x_j (f ⋆_θ g) = (∂f/∂x_j) ⋆_θ g + f ⋆_θ (∂g/∂x_j).   (2.5)

(iv) Pointwise multiplication by any coordinate x_j obeys

  x_j (f ⋆_θ g) = f ⋆_θ (x_j g) + (iθ/2) (∂f/∂(Sx)_j) ⋆_θ g = (x_j f) ⋆_θ g − (iθ/2) f ⋆_θ (∂g/∂(Sx)_j).   (2.6)

(v) The product has the tracial property:

  ⟨f, g⟩ := (πθ)^{−N} ∫ f ⋆_θ g(x) d^{2N}x = (πθ)^{−N} ∫ g ⋆_θ f(x) d^{2N}x = (πθ)^{−N} ∫ f(x) g(x) d^{2N}x.

2.2 The oscillator basis

There is a natural matrix basis {f_mn : m, n ∈ N^N} for the Moyal algebra. If H_l := ½(x_l² + x_{l+N}²) for l = 1, ..., N and H := H_1 + H_2 + ··· + H_N, then the f_mn diagonalize these harmonic oscillator Hamiltonians:

  H_l ⋆_θ f_mn = θ(m_l + ½) f_mn.   (2.9)

They may be defined by

  f_mn := (θ^{|m|+|n|} m! n!)^{−1/2} (a*)^{⋆m} ⋆_θ f_00 ⋆_θ a^{⋆n},   (2.10)

where f_00 is the Gaussian function f_00(x) := 2^N e^{−2H/θ}, and the annihilation and creation functions respectively are

  a_l := (1/√2)(x_l + i x_{l+N})  and  a*_l := (1/√2)(x_l − i x_{l+N}).   (2.11)

One finds that a^n := a_1^{n_1} ··· a_N^{n_N} = a_1^{⋆_θ n_1} ⋆_θ ··· ⋆_θ a_N^{⋆_θ n_N}. These Wigner eigentransitions are already found in [46] and also in [6]. (Incidentally, the "first" attributions in [36] are quite mistaken.) The f_mn can be expressed with the help of Laguerre functions in the variables H_l: see subsection 8.1 of the Appendix. The next lemma summarizes their chief properties.

Lemma 2.4. [43] Let m, n, k, l ∈ N^N. Then f_mn ⋆_θ f_kl = δ_nk f_ml and f*_mn = f_nm. Thus f_nn is an orthogonal projector and f_mn is nilpotent for m ≠ n. Moreover, ⟨f_mn, f_kl⟩ = 2^N δ_mk δ_nl. The family {f_mn : m, n ∈ N^N} ⊂ S ⊂ L^2(R^{2N}) is an orthogonal basis.

It is clear that e_K := Σ_{|n|≤K} f_nn, for K ∈ N, defines a (not uniformly bounded) approximate unit {e_K} for A_θ. As a consequence of Lemma 2.4, the Moyal product has a matricial form.

Proposition 2.5. [43] Let N = 1. Then A_θ has a Fréchet algebra isomorphism with the matrix algebra of rapidly decreasing double sequences c = (c_mn) such that, for each k ∈ N,

  r_k(c) := [ Σ_{m,n=0}^∞ θ^{2k} (m + ½)^k (n + ½)^k |c_mn|² ]^{1/2}

is finite, topologized by all the seminorms (r_k); via the decomposition f = Σ_{m,n ∈ N} c_mn f_mn of
S (R 2)in the {f mn }basis.For N >1,A θis isomorphic to the (projective)tensor product of N matrix algebras of this kind.Definition 2.6.We may as well introduce more Hilbert spaces G st (for s,t ∈R )of those f ∈S ′(R 2)for which the following sum is finite:f 2st :=∞ m,n =0θs +t (m +12)t |c mn |2.We define G st ,for s,t now in R N ,as the tensor product of Hilbert spaces G s 1t 1⊗···⊗G s N t N .In other words,the elements (2π)−N/2θ−(N +s +t )/2(m +12)−t/2f mn (with an obvious multiindex notation),for m,n ∈N N ,are declared to be an orthonormal basis for G st .If q ≤s and r ≤t in R N ,then S ⊂G st ⊆G qr ⊂S ′with continuous dense inclusions.Moreover,S = s,t ∈R N G st topologically (i.e.,the projective limit topology of the intersection induces the usual Fr´e chet space topology on S )and S ′= s,t ∈R N G st topologically (i.e.,theinductive limit topology of the union induces the usual DF topology on S ′).In particular,the expansion f = m,n ∈N N c mn f mn of f ∈S ′converges in the strong dual topology.We will use the notational convention that if F,G are spaces such that f⋆θg is defined whenever f ∈F and g ∈G ,then F ⋆θG is the linear span of the set {f⋆θg :f ∈F,g ∈G };in many cases of interest,this set is already a vector space.It is now easy to show that S ⋆θS =S ;more precisely,the following result holds.7Proposition 2.7.[43,p.877]The algebra (S ,⋆θ)has the (nonunique)factorization property:for all h ∈S there exist f,g ∈S such that h =f⋆θg .2.3Moyal multiplier algebrasDefinition 2.8.The Moyal product can be defined,by duality,on larger sets than S .For T ∈S ′,write the evaluation on g ∈S as T,g ∈C ;then,for f ∈S we may define T ⋆θf and f⋆θT as elements of S ′by T ⋆θf,g := T,f⋆θg and f⋆θT,g := T,g⋆θf ,using the continuity of the star product on S .Also,the involution is extended to S ′by T ∗,g :=(ii)⋆θis a bilinear associative product on L 2(R 2N ).The complex conjugation of functions f →f ∗is an involution for ⋆θ.(iii)The linear functional f → f (x )dx on S extends 
to I 00(R 2N ):=L 2(R 2N )⋆θL 2(R 2N ),and the product has the tracial property:f,g :=(πθ)−N f⋆θg (x )d 2N x =(πθ)−N g⋆θf (x )d 2N x =(πθ)−N f (x )g (x )d 2N x.We are not asserting that h =f⋆θg is absolutely integrable.We can nevertheless find u ∈S ′with u ∗⋆θu =1and |h |∈I 00so that h =u⋆θ|h |and |h |=l ∗⋆θl with l ∈G 00.Writing h 00,1:= 1,|h | = l 200,we obtain a Banach space norm for I 00such that f⋆θg 00,1≤ f 00 g 00.(iv)lim θ↓0L θf g (x )=f (x )g (x )almost everywhere on R2N .In subsection 8.1of the Appendix it is discussed why I 00⊂/L 1(R 2N ).Since f ∈I 00if and only if the Schr¨o dinger representative σθ(f )is trace-class (see the proof of the next Proposition 2.13),one can obtain sufficient conditions for f to belong in I 00from the treatment in [29].Definition 2.11.Let A θ:={T ∈S ′:T ⋆θg ∈L 2(R 2N )for all g ∈L 2(R 2N )},provided with the operator norm L θ(T ) op :=sup { T ⋆θg 2/ g 2:0=g ∈L 2(R 2N )}.Obviously A θ=S ֒→A θ.But A θis not dense in A θ(see below),and we shall denote by A 0θits closure in A θ.Note that G 00⊂A θ.This is clear from the following estimate.Lemma 2.12.[43]If f,g ∈L 2(R 2N ),then f⋆θg ∈L 2(R 2N )and L θf op ≤(2πθ)−N/2 f 2.Proof.Expand f = m,n c mn αmn and g = m,n d mn αmn with respect to the orthonormal basis{αnm }:=(2πθ)−N/2{f nm }of L 2(R 2N ).Thenf⋆θg 22=(2πθ)−2N m,l n c mn d nl f ml 22=(2πθ)−N m,l n c mn d nl 2≤(2πθ)−N m,j |c mj |2 k,l |d kl |2=(2πθ)−N f 22 g 22,on applying the Cauchy–Schwarz inequality.The algebra A θcontains moreover L 1(R 2N )and its Fourier transform [57],even the bounded measures and their Fourier transforms;the plane waves;but no nonconstant polynomials,nor derivatives of δ.The algebra A θis selfconjugate,and it could have been defined using right Moyal multiplication instead.Proposition 2.13.[56,90](A θ, . 
op )is a unital C ∗-algebra of operators on L 2(R 2N ),isomor-phic to L (L 2(R N ))and including L 2(R 2N ).Also,(I 00)′=A θ.Moreover,there is a continuous injection of ∗-algebras A θ֒→A θ,but A θis not dense in A θ,namely A 0θ A θ.Proof.We prove the nondensity result.The left regular representation L θof A θis a denumerable direct sum of copies of the Schr¨o dinger representation σθon L 2(R N )[66].Indeed,there is a unitary operator,the Wigner transformation W [36,90],from L 2(R 2N )onto L 2(R N )⊗L 2(R N ),such thatW L θ(f )W −1=σθ(f )⊗1.9If f ∈S ,then σθ(f )is a compact (indeed,trace-class)operator on L 2(R N ),and so A 0θequals {W −1(T ⊗1)W :T compact },while A θitself is {W −1(T ⊗1)W :T bounded }.Clearly thedual space is (A 0θ)′=I 00.Notice as well that conjugation by W yields an explicit isomorphismbetween A θand L (L 2(R N )).Consequently,A θis a Fr´e chet algebra whose topology is finer than the . op -topology.More-over,it is stable under holomorphic functional calculus in its C ∗-completion A 0θ,as the next proposition shows.Proposition 2.14.A θis a (nonunital)Fr´e chet pre-C ∗-algebra.Proof.We adapt the argument for the commutative case in [45,p.135].To show that A θis stable under the holomorphic functional calculus,we need only check that if f ∈A θand 1+f is invertible in A 0θwith inverse 1+g ,then the quasiinverse g of f must lie in A θ.From f +g +f⋆θg =0,we obtain f⋆θf +g⋆θf +f⋆θg⋆θf =0,and it is enough to show that f⋆θg⋆θf ∈A θ,since the previous relation then implies g⋆θf ∈A θ,and then g =−f −g⋆θf ∈A θalso.Now,A θ⊂G −r,0for any r >N [90,p.886].Since f ∈G s,p +r ∩G qt ,for s,t arbitrary and p,q positive,we conclude that f⋆θg⋆θf ∈G s,p +r ⋆θG −r,0⋆θG qt ⊂G st ;as S = s,t ∈R G st ,the proof is complete.The Fr´e chet algebras A θare automatically good (their sets of quasiinvertible elements are open);and by an old result of Banach [5],the quasiinversion operation is continuous in a good Fr´e chet algebra.Note that a good algebra with identity cannot 
have proper (even one-sided)dense ideals.However,the nonunital (M θL )′provides an example of a good Fr´e chet algebra that harbours A θas a proper dense left ideal [44].We noticed already that the extensions M θand A θof A θare quite different.Clearly M θis associated with smoothness;however,even though the Sobolev-like spaces G st grow more regular with increasing s and t [90],M θincludes none of them;in particular,L 2(R 2N )⊂/M θfor any θ.Be that as it may,the plane waves belong both to M θand A θ.One obtains for the Moyal product of plane waves:exp(ik ·)⋆θexp(il ·)=e −i2k ·Θl exp(i (k +l )·).(2.13)Therefore the plane waves close to an algebra,the Weyl algebra .It represents the translation group of R 2N : exp(ik ·)⋆θf⋆θexp(−ik ·) (x )=f (x +θSk ),for f ∈S or f ∈G 00,say.2.4Smooth test function spaces,their duals and the Moyal productHere there is a fascinating interplay.Recall that a pseudodifferential operator A ∈ΨDO on R k is a linear operator which can be written asA h (x )=(2π)−k σ[A ](x,ξ)h (y )e iξ·(x −y )d k ξd k y.10LetΨd:={A∈ΨDO:σ[A]∈S d}be the class ofΨDOs of order d,withS d:={σ∈C∞(R k×R k):|∂αx∂βξσ(x,ξ)|≤C Kαβ(1+|ξ|2)(d−|β|)/2for x∈K}, where K is any compact subset of R k,α,β∈N k,and C Kαβis some constant.AlsoΨ∞:= d∈RΨd andΨ−∞:= d∈RΨd.Recall,too,that aΨDO A is called regularizing or smoothing if A∈Ψ−∞,or equivalently[52,80],if A extends to a continuous linear map from the dual of the space of smooth functions C∞(R k)to itself.is a regularizingΨDO.Lemma2.15.If f∈S,then LθfProof.From(2.1),one at once sees that left Moyal multiplication by f is the pseudodifferential operator on R2N with symbol f(x−θSξ)|≤C Kαβ(1+|ξ|2)(d−|β|)/2,2valid for allα,β∈N2N,any compact K⊂R2N,and any d∈R,since f∈S.Remark2.16.Unlike for the case of a compact manifold,regularizingΨDOs are not necessarily compact operators.For instance,for each n,Lθ(f nn)possesses the eigenvalue1with infinite multiplicity,so it cannot be compact.Definition2.17.For m∈N,f∈C m(R k)—functions with m 
continuous derivatives—and γ,l∈R,letqγlm(f):=sup{(1+|x|2)(−l+γ|α|)/2|∂αf(x)|:x∈R k,|α|≤m};and then let Vmis Horv´a th’s space S m−2l[53].We define0,lVγ:= l∈R m∈N V mγ,l,and,more generally,Vγ,l:= m∈N V mγ,l,so that Vγ= l∈R Vγ,l.Particularly interesting cases include the space K:=V1of Grossmann–Loupias–Stein functions[47],whose dual K′is the space of Ces`a ro-summable distributions[34], the space O C:=V0whose dual O′C is the space of convolution multipliers(Fourier transforms of O M),and the space O T:=V−1[43].Similarly,K r:=V1,r and O r:=V0,r are defined.We see thatS= m∈N l∈R V m0,l.Following Schwartz,we denote B:=O0,the space of smooth functions bounded together with all derivatives.We shall also need˙B:= m∈N VThere are continuous inclusions D ֒→V γ֒→V γ′֒→O M ֒→D ′for γ>γ′;these are all normal spaces of distributions,namely,locally convex spaces which include S as a dense subspace and are continuously included in S ′.Also D L 2(density of S in this space follows from density of theSchwartz functions in L 2and invariance of S under derivations)and M θL ,M θR and M θ[90]arenormal space of distributions.By the way,there are suggestive Tauberian-type theorems for these spaces,establishing when their intersections with their respective dual spaces are included in S .Concretely,we quote the following result from [32].Proposition 2.18.If C is a space of smooth functions on R 2N which is closed under complex conjugation,and if the pointwise product space KC lies within C ,then C ∩C ′⊆S .In particular,V γ∩V ′γ=S for γ≤1.Also O M ∩O ′M =S and C∞∩(C ∞)′=D ⊂S .Now,what can be said about the relation of all these spaces with M θ?In [43]it is established that O ′T ,and a fortiori O ′M ,is included in M θ,for all θ.Therefore by Fourier analysis O C is included in M θfor all θ,and g⋆θf is defined as a tempered distribution whenever f,g ∈O C .Growth estimates may be obtained as follows.It is true that O C = r ∈R O r topologically.If g ∈O r and f ∈O s ,the following crucial 
proposition shows that the O r spaces have similar behaviour under pointwise and Moyal products.Proposition 2.19.The space O C is an associative ∗-algebra under the Moyal product.In fact,the Moyal product is a jointly continuous map from O r ×O s into O r +s ,for all r,s ∈R .Moreover,A θis a two sided essential ideal in O C .Proof.For the reader’s convenience,we reproduce part of Theorem 2of [35].Let f ∈O r and g ∈O s .By the Leibniz rule for the Moyal product,∂α(f⋆θg )= β+γ=α αβ ∂βf⋆θ∂γg .Hence we need only show that there are constants C rsm such that(1+|x |2)−(r +s )/2|(∂βf⋆θ∂γg )(x )|≤C rsm q 0rm (f )q 0sm (g )(2.14)for all x ∈R 2N ,for large enough m ≥|β|+|γ|.If k ∈N (to be determined later),we can write(∂βf⋆θ∂γg )(x )=(πθ)−2N ∂βf (x +y )(1+|z |2)k(1+|y |2)k (1+|z |2)k e 2i (1+|y |2)k∂γg (x +z )θy ·Sz d 2N y d 2N z =(πθ)−2N e 2i (1+|y |2)k ∂γg (x +z )(1+|y |2)k∂γ+k ′′g (x +z )(1+|y |2)k (1+|x +z |2)s/2m ≥|β|+|γ|+2N +max {r,s }),the integrals will be finite.The joint continuity now follows directly from the estimates (2.14).That S is a two-sided ideal in O C follows from the inclusion O C ⊂M θ.Essentiality for the ideal S =A θis equivalent [45,Prop.1.8]to g⋆θS =0for any nonzero g ∈O s ;but if g⋆θf mn =0for all m,n ,then in the expansion g = m,n c mn f mn (as an element of S ′,say)all coefficients must vanish,so that g =0.Similar results hold for V γwhen γ>0.Indeed,the Moyal product (f,g )→f⋆θg is a jointly continuous map from K r ×K s into K r +s ;moreover,f⋆θg −fg ∈K r +s −2,which is a bonus for semiclassical analysis (while on the contrary the similar statement for O r ×O s is in general false).For γ<0,we lose control of the estimates;indeed,Lassner and Lassner [59]gave an example of two functions in O T whose twisted product can be defined but is not a smooth function,but rather a distribution (of noncompact support).Also,in the next subsection we prove by counterexample that O T ⊂/M θL .The integral estimates on the derivatives of g⋆θf can be refined to 
show that in fact O_M ⋆_θ O_C = O_M. However, since these estimates depend on the order of the derivatives in a complicated way, it is doubtful that the twisted product can be extended to O_M.

The regularizing property of ⋆_θ proved at the beginning of the section can be vastly improved, as follows.

Proposition 2.20. [43] If T ∈ S′ and f ∈ S, then T ⋆_θ f and f ⋆_θ T lie in O_T. Moreover, these bilinear maps of S′ × S and S × S′ into O_T are hypocontinuous.

In fact, S ⋆_θ S′ equals (M^θ_L)′, so the latter is made of smooth functions. But (M^θ_L)′ ∩ (M^θ_L)′′ = (M^θ_L)′ ∩ M^θ_L = (M^θ_L)′ ⊋ S; so (M^θ_L)′ and (M^θ_R)′ do not satisfy the conclusion of Proposition 2.18. (Here ′′ of course denotes the strong bidual space, not a bicommutant.) As distributions, the elements of (M^θ_L)′ and (M^θ_R)′ belong to O′_C, and a fortiori they are Cesàro summable [34].

Finally, it is important to know when smooth functions give rise to elements of A⁰_θ or A_θ. Sufficient conditions are the following (quite strong) results of the Calderón–Vaillancourt type [36, 54].

Theorem 2.21. The inclusion V^{2N+1}_{0,0} ⊂ A_θ holds. In particular, B ⊂ A_θ.

We have also proved that the function space B is a ∗-algebra under the Moyal product ⋆_θ for any θ, in which A_θ is a two-sided essential ideal. Recall that D_{L²} ⊂ ˙B ⊂ M_θ. We will now show that D_{L²} is a ∗-algebra under the Moyal product as well.

Lemma 2.22. (D_{L²}, ⋆_θ) is a ∗-algebra with continuous product and involution. Moreover, it is an ideal in (B, ⋆_θ).

Proof. The closure under the twisted product follows from the Leibniz rule and Lemma 2.12:

‖∂^α(f ⋆_θ g)‖₂ ≤ (2πθ)^{−N/2} Σ_{β≤α} (α choose β) ‖∂^β f‖₂ ‖∂^{α−β} g‖₂.

This also shows that the product is separately continuous, and indeed jointly continuous, since D_{L²} is a Fréchet space. The continuity of the involution f → f* is immediate. The fact that D_{L²} is a two-sided ideal in B comes directly from the stability of these spaces under partial derivations and from the inclusion B ⊂ A_θ given by the previous theorem, since then ‖∂^α f ⋆_θ ∂^β g‖₂ < ∞ for all f ∈ B, g ∈ D_{L²} and all α, β ∈ N^{2N}.
2.5 The preferred unitization of the Schwartz Moyal algebra

As with Stone–Čech compactifications, the algebras M_θ are too vast to be of much practical use (in particular, to define noncommutative vector bundles). A more suitable unitization of A_θ is given by the algebra Ã_θ := (B, ⋆_θ). This algebra possesses an intrinsic characterization as the smooth commutant of right Moyal multiplication (see our comments at the end of subsection 4.5). The inclusion of A_θ in B is not dense, but this is not needed. Ã_θ contains the constant functions and the plane waves, but no nonconstant polynomials and no imaginary-quadratic exponentials, such as e^{iax₁x₂} in the case N = 1 (we will see later the pertinence of this).

Proposition 2.23. Ã_θ is a unital Fréchet pre-C∗-algebra.

Proof. We already know that B is a unital ∗-algebra with the Moyal product, and that ⋆_θ is continuous in the topology of the Fréchet space B defined by the seminorms q_{00m}, for m ∈ N. Its elements have all derivatives bounded, and so are uniformly continuous functions on R^{2N}, as are their derivatives: the group of translations τ_y f = f(· − y), for y ∈ R^{2N}, acts strongly continuously on Ã_θ (i.e., y → τ_y f is continuous for each f). This action preserves the seminorms q_{00m}, and it is clear that B is a subspace of the space of smooth elements for τ, which we provisionally call A^∞_θ. The latter space has its own Fréchet topology, coming from the strongly continuous action. Rieffel [75, Thm. 7.1] proves two important properties in this setting: firstly, based on a density theorem of Dixmier and Malliavin [30], that the inclusion B ↪ A^∞_θ is continuous and dense. Secondly, using a "Θ-twisting" of C∗-algebras with an R^k-action which generalizes (2.1), whereby the pointwise product can be recovered as (B, ⋆_0) = (Ã_θ, ⋆_{−θ}), one obtains the reverse inclusion; thus B = A^∞_θ. (Thus, the smooth subalgebra is independent of Θ.)

It is now easy to show that Ã_θ, as a subalgebra of the C∗-algebra A_θ, is stable under the holomorphic functional calculus. Indeed, since G(τ_y(f)) = τ_y(G(f)) for any function G which is
holomorphic in a neighbourhood of sp L_θ(f) = sp L_θ(τ_y(f)), it is clear that f ∈ Ã_θ entails G(f) ∈ Ã_θ.

Clearly the C∗-algebra completion of Ã_θ properly contains A⁰_θ; it is not known to us whether it is equal to A_θ. At any rate, Ã_θ ≡ B is nonseparable as it stands; there is, however, another topology on B, induced by the topology of C^∞(R^{2N}) [77, p. 203], under which this space is separable. That latter topology is very natural in the context of commutative and Connes–Landi spaces (see subsections 3.3 and 3.4). To investigate its pertinence in the context of Moyal spaces would take us too far afield.

An advantage of Ã_θ is that the covering relation of the noncommutative plane to the NC torus is made transparent. To wit, the smooth noncommutative torus algebra C^∞(T^{2N}_Θ) can be embedded in B as periodic functions (with a fixed period parallelogram). In that respect, it is well to recall [76, 87] how far the algebraic structure of C^∞(T^{2N}_Θ) can be obtained from the integral form (2.1) of (a periodic version of) the Moyal product.

Anticipating the next section, we finally note the main reason for the suitability of Ã_θ, namely, that each [D̸, L_θ(f) ⊗ 1_{2^N}] lies in Ã_θ ⊗ M_{2^N}(C), for f ∈ Ã_θ and D̸ the Dirac operator on R^{2N}.

The previous proposition has another useful consequence.

Corollary 2.24. (D_{L²}, ⋆_θ) is a (nonunital) Fréchet pre-C∗-algebra, whose C∗-completion is A⁰_θ.

Proof. The argument of the proof of Proposition 2.14 applies, with the following modifications. Firstly, S ⊂ D_{L²} ⊂ A⁰_θ with continuous inclusions, so that A⁰_θ is indeed the C∗-completion of (D_{L²}, ⋆_θ). Indeed, for the second inclusion one can notice that if f ∈ D_{L²}, then W L_θ(f) W^{−1} …
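The algebraic facts used above—the noncommutativity and associativity of ⋆_θ—can be illustrated symbolically in the simplest case. The sketch below implements the standard Groenewold series for the Moyal product in one degree of freedom (x, p); the series terminates on polynomials, so the truncation is exact there. The function name `moyal` is ours, and its normalization (with θ playing the role of ħ) may differ from the ⋆_θ conventions of the text:

```python
import sympy as sp

x, p, theta = sp.symbols('x p theta')

def moyal(f, g, order=8):
    """Groenewold series for the Moyal product in one degree of freedom:
    f * g = sum_n (i*theta/2)^n / n! * sum_k C(n,k) (-1)^k
            (d_x^{n-k} d_p^k f) (d_p^{n-k} d_x^k g).
    Exact on polynomials of total degree <= 'order'."""
    total = sp.Integer(0)
    for n in range(order + 1):
        term = sp.Integer(0)
        for k in range(n + 1):
            df, dg = f, g
            if n - k:                       # apply the mixed derivatives only
                df = sp.diff(df, x, n - k)  # when their order is nonzero
                dg = sp.diff(dg, p, n - k)
            if k:
                df = sp.diff(df, p, k)
                dg = sp.diff(dg, x, k)
            term += sp.binomial(n, k) * (-1)**k * df * dg
        total += (sp.I * theta / 2)**n / sp.factorial(n) * term
    return sp.expand(total)

# Canonical commutator: x * p - p * x = i*theta
comm = moyal(x, p) - moyal(p, x)
print(comm)  # I*theta

# Associativity on polynomials: (x * x) * p == x * (x * p)
lhs = moyal(moyal(x, x), p)
rhs = moyal(x, moyal(x, p))
print(sp.simplify(lhs - rhs))  # 0
```

To leading orders the series also exhibits the semiclassical structure mentioned above: f ⋆_θ g = fg + (iθ/2){f, g} + O(θ²).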
Thermalization Mechanisms in Compact Sources
arXiv:astro-ph/99224v1 15 Feb 1999

Roland Svensson
Stockholm Observatory, SE-133 36 Saltsjöbaden, Sweden

Abstract. There is strong observational evidence that a quasi-thermal population of electrons (or pairs) exists in compact X-ray sources. It is, however, unclear what mechanism thermalizes the particles. Here, two processes, Coulomb scattering and synchrotron self-absorption, that may be responsible for the thermalization are reviewed. The parameter spaces in which the respective processes dominate are given. While the Coulomb thermalization mechanism is well known, this is not the case for synchrotron self-absorption thermalization. We give the arguments that synchrotron self-absorption must act as a thermalizing mechanism in sufficiently compact sources. The emitting and absorbing electrons then exchange energy efficiently with the self-absorbed synchrotron radiation field and are driven towards a relativistic or mildly relativistic thermal distribution in a few synchrotron cooling times (the "synchrotron boiler").

1. Introduction

Observations with hard X-ray/γ-ray satellites such as CGRO OSSE, RXTE, and BeppoSAX indicate that the X/γ spectra cut off at a few hundred keV for the majority of active galactic nuclei (see Zdziarski, this volume; Matt, this volume) and for the hard states of galactic black hole candidates (see Zdziarski, this volume; Grove, this volume). It is generally believed that Comptonization by a quasi-thermal population of electrons (or pairs) is responsible for the formation of the X/γ spectra from these sources. In the spectral modeling codes (e.g., Poutanen & Svensson 1996), it is normally assumed that the Comptonizing particles have a Maxwellian distribution. It has been pointed out several times that thermalization by Coulomb scattering may not be fast enough as compared to various cooling mechanisms (such as Compton cooling) and that the particle distribution therefore will differ from a Maxwellian (e.g., Dermer & Liang
1989; Fabian 1994). On the other hand, it has been noticed that another thermalization mechanism, synchrotron self-absorption, may operate in compact plasmas (Ghisellini, Guilbert, & Svensson 1988; Ghisellini & Svensson 1989). Here, we review the physics of these two thermalization mechanisms, and explore in which contexts each of them may operate.

2. Thermalization by Coulomb Scattering

The approximate time scale, t_C, for thermalization by Coulomb (Møller) scattering between electrons in nonrelativistic plasmas has long been known (e.g., Spitzer 1956; see Stepney 1983 for relativistic corrections):

t_C/t_T = 4Θ^{3/2}/lnΛ,   (1)

where Θ ≡ kT_e/m_e c², t_T = (n_e σ_T c)^{-1} is the Thomson time scale, and lnΛ is the Coulomb logarithm.

Figure 1. Thermalization through electron-electron scattering of an initially Gaussian electron test distribution (curve a), centered at 1 MeV and with a FWHM of 0.28 MeV, in a background thermal electron plasma having a Maxwellian distribution of temperature 511 keV, i.e., Θ = 1 (curve b). The dashed curves show the relaxing distribution at different times, assuming an electron background density of 1 cm⁻³. The thermalization time scale from eq. (1) is 2.6×10¹³ s. From Dermer & Liang (1989).

If cooling operates on a shorter time scale, the thermalization process will be inhibited by the cooling, first noticeable as a truncation of the Maxwellian tail. Stepney (1983) noticed that bremsstrahlung cooling will prevent thermalization for temperatures larger than about 5×10¹⁰ K, and Baring (1987) performed further analysis for additional cooling processes, as did Ghisellini, Haardt, & Fabian (1993).

Including Compton and synchrotron losses in the Fokker-Planck equation allows for the determination of the steady distribution function under the influence of these cooling processes. Results from Dermer & Liang (1989) are shown in Figure 2. It is seen that, for increasing energy densities of radiation and magnetic fields, the high energy tail of the electron distribution becomes increasingly truncated and the effective temperature of the distribution becomes smaller.

What are then the conditions for losses to dominate over Coulomb thermalization (see, e.g., Fabian 1994)? The nonrelativistic cooling time scale can
be written as (see Coppi, this volume)

t_cool ≈ R/[c ℓ_B (1 + U_rad/U_B)],   (2)

where R is the size of the region, ℓ_B is the magnetic compactness defined in equation (4) below, and U_rad, U_B are the energy densities in radiation and magnetic fields, respectively. Comparing with equation (1), one finds that Coulomb scattering cannot maintain a Maxwellian when

ℓ_B (1 + U_rad/U_B) > τ_T lnΛ/(4Θ^{3/2}),   (3)

where τ_T is the Thomson scattering optical depth. The dissipation compactness, ℓ_diss ≡ (L_diss/h)(σ_T/m_e c³), characterizes the dissipation, with L_diss being the power providing uniform heating in a cubic volume of size h in the case of a slab of height h, or in the whole volume in the case of an active region of size h. Figure 3b shows the Θ vs. ℓ_diss relations.

Figure 2. Truncation of a thermal electron distribution with temperature 511 keV due to Compton and synchrotron losses. The solid curve shows a Maxwellian of temperature 511 keV. Dashed curves show the relaxed distributions at different values of the parameter (U_rad + U_B)/n_e in units of MeV. From Dermer & Liang (1989).

Figure 3. Dimensionless volume-averaged temperature, Θ ≡ kT_e/m_e c², vs. Thomson scattering optical depth, τ_T, in panel (a), and vs. dissipation compactness, ℓ_diss ≡ (L_diss/h)(σ_T/m_e c³), in panel (b), for a steady X-ray emitting plasma region in pair and energy balance on or above a cold disk surface of black body temperature kT_bb = 5 eV. The plasma Compton scatters reprocessed soft black body photons from the cold disk surface. Solid rectangles and the dashed curve show results from a nonlinear Monte Carlo code (Stern et al. 1995a) and an iterative scattering method code (Poutanen & Svensson 1996), respectively, for the case of a plane-parallel slab corona. Results using the Monte Carlo code for individual active pair regions are shown for hemispheres located on the disk surface; surface spheres, also located on the surface (underlined spheres); spheres located at a height of 0.5h (spheres), 1h (diamond), and 2h (triangle), where h is the radius of the sphere. The results for each type of active region are connected by dotted curves. The dash-dotted and dash-dot-dot-dotted curves in panel (b) show the critical compactness as function
of Θ above which thermalization by Møller and Bhabha scattering is not achieved, for the cases of pair slabs and surface spheres, respectively. (Unpublished results by Stern et al.; see also Stern et al. 1995b; Svensson 1997.)

The question arises whether the electrons can thermalize or not for the conditions, Θ, τ_T, and ℓ_diss, in Figure 3. Energy exchange and thermalization through Møller (e±e±) and Bhabha (e+e−) scattering compete with various loss mechanisms, with Compton losses being the most important for our conditions. The thermalization is slowest and the Compton losses largest for the higher energy particles in the Maxwellian tail. Instead of using the approximate equation (3), we use the detailed simulations by Dermer & Liang (1989, their fig. 8) to find the critical compactness above which the deviation of the electron distribution at the Maxwellian mean energy is more than a factor e ≈ 2.7. The dash-dotted and dash-dot-dot-dotted curves in Figure 3b show this critical compactness for slabs and for surface spheres, respectively. In agreement with Ghisellini, Haardt, & Fabian (1993), we find that Møller and Bhabha scattering cannot compete with Compton losses in our pair slabs and active regions.

The problem then arises of what mechanism can thermalize the apparently thermal electron distribution in compact sources. One such mechanism is cyclo/synchrotron absorption.

3. Thermalization by Cyclo/Synchrotron Absorption

3.1. A Brief History of Synchrotron Thermalization

Ever since the classical interpretation by Shklovskii in the 1950s of the radiation from the Crab nebula as being synchrotron radiation, this process has played an important role in our interpretation of the non-thermal radiation from a wide variety of astronomical objects. In general, the electron distribution has been assumed to be a power law or nearly a power law. Much less attention has been paid to what happens to the electron distribution at self-absorbed electron energies.

The theory for synchrotron radiation was developed in
the 1950s (see, e.g., reviews by Ginzburg and Syrovatskii 1965, 1969; Pacholczyk 1970). The emission and absorption coefficients for single relativistic electrons, as well as for ensembles of relativistic electrons having power law or thermal distributions, were calculated in the 1950s and 1960s. For power law distributions, the absorption coefficient increases towards lower photon energies. Below some photon energy, ν_abs, the source becomes optically thick to synchrotron self-absorption, resulting in an intensity proportional to ν^{5/2}, obtained from the ratio of the emission and absorption coefficients (Le Roux 1961). A finite ν_abs, of course, requires that the source is finite. Below we call this and its consequences "finite source effects".

This early work was, in general, applied to extended sources where the cooling time at self-absorbed particle energies is longer than other relevant time scales (such as the age of the source or the dynamical time scales). It was therefore natural to assume that the self-absorbing electron distribution below the Lorentz factor, γ_abs, of the electron (emitting at the frequency ν_abs where the source becomes optically thick) is unaffected by self-absorption and simply maintains the power law distribution of the injected electrons.

In the late 1960s, it became increasingly clear that, in compact sources or on long time scales, the self-absorbed electron distribution N(γ) will evolve under the influence of synchrotron emission and absorption. What are then the possible equilibrium solutions at self-absorbing Lorentz factors towards which N(γ) would relax? In the important papers by Rees (1967) and McCray (1969), it was shown that power law distributions, N(γ) ∝ γ^{−s} with s = 2 and 3, are equilibrium solutions to the kinetic equations. Rees (1967), however, also found that the solution with s = 3 is unstable and would evolve away from s = 3 if slightly perturbed. McCray (1969) showed this explicitly by numerically calculating the time dependent evolution of initial power law distributions in an infinite
source. Rees predicted and McCray confirmed that the high energy electrons in a flat (s < 3) initial power law would tend to evolve into a quasi-Maxwellian distribution. McCray (1969), furthermore, emphasized the importance of finite source effects on the evolution. In particular, for power laws with s < 3, the self-absorbing electrons would gain energy, absorbing slightly more energy than they emit, while the electrons radiating in the optically thin limit lose energy by radiating much more energy than they absorb. All electrons would therefore tend to gather at γ_abs, developing a peak there (as was already emphasized by Rees 1967). It must be emphasized that the relaxation of self-absorbing electrons takes place through the energy exchange with the radiation field, which in its turn is determined by the particle distribution. This is the "synchrotron boiler", a terminology coined by Ghisellini, Guilbert, & Svensson (1988).

3.2. Rise and Fall of yet another Paradigm

In a series of papers in the 1970s, Norman and coworkers further developed the concept of the Plasma Turbulent Reactor (PTR) introduced by Kaplan and Tsytovich (1973). Originally the turbulence feeding the electrons was thought to be plasmons. In practice, however, the PTR is exactly the self-absorbing synchrotron source considered here, as photons are the only plasma modes with a sufficiently small damping rate that they can mediate energy transfer from one electron to another (Norman 1977). Norman and ter Haar (1975) and Norman (1977) essentially repeated the analysis of McCray (1969), using quite different notation and definitions but arriving at the same conclusions that N(γ) ∝ γ^{−2} and N(γ) ∝ γ^{−3} are the only steady power law equilibrium solutions. It is important that they noted that the N(γ) ∝ γ^{−2} solution corresponds to a finite electron flux upwards along the energy axis, while N(γ) ∝ γ^{−3} corresponds to zero electron flux. They argued that N(γ) ∝ γ^{−3} was the most physical solution, as the synchrotron time scales establishing this distribution are shorter than other time
scales. Although being aware of possible finite source effects, they considered them not to influence the electron distribution at Lorentz factors ≪ γ_abs.

The self-absorbed solution, N(γ) ∝ γ^{−3}, was considered sufficiently important in explaining power law spectra from a variety of sources that Norman and ter Haar (1975) called the PTR a new astrophysical paradigm. Norman and coworkers, however, do not seem to have considered the stability of the N(γ) ∝ γ^{−3} solution. The work of Rees (1967) and McCray (1969) indicates that a Maxwellian distribution may be the only stable equilibrium solution. This was, however, not rigorously established, causing Ghisellini, Guilbert & Svensson (1988, GGS88) to numerically determine the steady solutions of the kinetic equations including physical boundary conditions (i.e., correct Fokker-Planck coefficients at subrelativistic energies, and accounting for finite source effects at large energies). As γ_abs typically is of the order 10-100 in compact radio sources and the development of the self-absorbed distribution takes place at mildly relativistic energies, they used expressions and equations valid at any energy. Furthermore, in order to obtain steady solutions, the particle injection had to be balanced by a sink term (escape or reacceleration). Injecting a power law proportional to γ^{−3} (i.e., with the equilibrium slope), GGS88 found that the steady solution was a Maxwellian with a temperature corresponding to the mean energy of the injected electrons. Similarly, an injected power law proportional to γ² (essentially corresponding to monoenergetic injection at some large Lorentz factor ≫ γ_abs) led to the establishment of a Maxwellian just below γ_abs. The injected electrons cool until reaching γ_abs, where they thermalize, exchanging energy with the self-absorbed radiation field. Additional studies were made by de Kool, Begelman, & Sikora (1989) and Coppi (1990). With these works it appears that the PTR paradigm of Norman and ter Haar (1975) has been shown to be invalid.

3.3. Relaxation by
Cyclo/Synchrotron Absorption

The works so far that explicitly have demonstrated the formation of a Maxwellian through synchrotron self-absorption are the numerical simulations of GGS88, Coppi (1990), and Ghisellini, Haardt, & Svensson (1998). Here, we review some of the results in the last paper. In the numerical simulations, a kinetic equation for the electron distribution is solved. The kinetic equation, which also in this case takes the form of a Fokker-Planck equation (see the derivation in McCray 1969), includes Compton and synchrotron cooling, synchrotron absorption (heating), electron injection, and electron escape. Even though various source geometries are discussed, the radiation field is assumed to be given by the steady slab solution, which is correct to order unity.

The simulations consider a region of size R with a magnetic field of strength B, into which some distribution of electrons is injected with a power L. In a steady state, this power emerges either as radiation or as the power of escaping electrons. The electrons are assumed to escape at the speed v_esc = cβ_esc = R/t_esc, where t_esc is the escape time. Convenient parameters describing compact sources are the injection compactness, ℓ_inj, and the magnetic compactness, ℓ_B, defined as

ℓ_inj = Lσ_T/(R m_e c³);   ℓ_B = σ_T R U_B/(m_e c²).   (4)

Figure 4. Electron distribution, τ(p) ≡ σ_T R N(p), evolving due to cyclo/synchrotron emission/absorption/diffusion. Curves are labeled by times in units of R/c. Parameters are ℓ_inj = 1, ℓ_B = 10, R = 10¹³ cm (or B ≃ 5.5×10³ G), and β_esc = 1. The injected distribution is a Gaussian centered at γ = 10. From Ghisellini, Haardt, & Svensson (1998).

It is seen that the high energy part of the Maxwellian distribution is formed earlier than the low energy part, due to the higher efficiency of photon exchange of the high energy electrons. A slower evolution takes place after 0.1(R/c), as the balance between electron injection and electron escape is achieved on a time scale of a few t_esc. Only then have both the shape and the amplitude of the electron distribution reached their equilibrium
values.

3.4. Influence of Cooling Processes on the Steady Electron Distribution

The equilibrium distributions for different values of the injected compactness are shown in Figure 5. The magnetic compactness is set to ℓ_B = 30, corresponding to B = 10⁴ G for R = 10¹³ cm (from Eq. 4). In all cases, the injected distribution is a peaked function with an exponential high energy cut-off. The mean injected Lorentz factor is ⟨γ⟩ ≃ 5, and essentially all electrons are below γ_abs. It is apparent from Figure 5 that the electron distribution is a quasi-Maxwellian at all energies as long as ℓ_inj ≪ ℓ_B. This is a consequence of an almost perfect balance between synchrotron gains (absorption) and losses, while Compton losses are only a small perturbation. As ℓ_inj increases towards ℓ_B, Compton losses become increasingly important, competing with the synchrotron processes. At high energies, losses overcome gains, and the electrons diffuse downwards in energy, until subrelativistic energies are reached. In this energy regime, the increased efficiency of synchrotron gains (relative to losses) halts the systematic downward diffusion in energy, and a Maxwellian can form (see Ghisellini & Svensson 1989). The temperature of this part of N(γ) can be obtained by fitting a Maxwellian to the low energy part of the distribution, up to energies just above the peak of the electron distribution. The resulting temperatures are plotted in Figure 6 as a function of ℓ_inj.

Figure 5. Steady equilibrium electron distributions due to cyclo/synchrotron emission/absorption/diffusion and Compton cooling for different injected compactnesses (decreasing from top to bottom). Further parameters are ℓ_B = 30, R = 10¹³ cm, β_esc = 1. The corresponding magnetic field is B = 10⁴ G. The mean injected Lorentz factor is about 5. Increasing ℓ_inj implies increased Compton cooling, resulting in a shift of the quasi-Maxwellian towards lower temperatures. From Ghisellini, Haardt, & Svensson (1998).
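As a quick numerical sanity check (our sketch, not from the paper; the helper names are ours): inverting eq. (4) for B reproduces the field strengths quoted in the captions of Figures 4 and 5, and the optical-depth relation τ_T = (3/4π) ℓ_inj/(β_esc⟨γ⟩) used in subsection 3.4 reproduces the quoted range τ_T ≈ 5×10⁻³ to 5:

```python
import math

SIGMA_T = 6.6524587e-25   # Thomson cross section [cm^2]
ME_C2 = 8.187e-7          # electron rest energy m_e c^2 [erg]

def B_from_ell_B(ell_B, R):
    """Invert eq. (4), ell_B = sigma_T R U_B / (m_e c^2), with U_B = B^2 / (8 pi)."""
    U_B = ell_B * ME_C2 / (SIGMA_T * R)
    return math.sqrt(8.0 * math.pi * U_B)

def tau_T(ell_inj, beta_esc=1.0, gamma_mean=5.0):
    """Thomson optical depth from the injection/escape balance of subsection 3.4."""
    return (3.0 / (4.0 * math.pi)) * ell_inj / (beta_esc * gamma_mean)

def theta_crit(ell_inj, ell_B=30.0, ln_lambda=20.0, gamma_mean=5.0):
    """Temperature above which synchrotron self-absorption beats Coulomb scattering."""
    return 0.11 * (ln_lambda / gamma_mean) ** (2.0 / 3.0) * (ell_inj / ell_B) ** (2.0 / 3.0)

R = 1e13  # cm
print(B_from_ell_B(10, R))     # ~5.5e3 G (Fig. 4)
print(B_from_ell_B(30, R))     # ~1e4 G  (Fig. 5)
print(tau_T(0.1), tau_T(100))  # ~5e-3 and ~5
print(theta_crit(1.0))         # ~0.03, i.e. the quoted 0.03 * ell_inj^(2/3)
```

The last line confirms that the condition Θ > 0.11(lnΛ/⟨γ⟩)^{2/3}(ℓ_inj/ℓ_B)^{2/3} reduces to Θ > 0.03 ℓ_inj^{2/3} for the simulation parameters (lnΛ = 20, ⟨γ⟩ = 5, ℓ_B = 30).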
For ℓ_inj ≲ 1, the temperature is approximately constant, while it decreases for ℓ_inj ≳ 1.

From Equation (3) (with U_rad/U_B set to zero), we see that thermalization by synchrotron self-absorption dominates when ℓ_B > τ_T lnΛ/(4Θ^{3/2}) (assuming Θ ≲ 1). The Coulomb process thus dominates for small temperatures and large τ_T (i.e., large electron densities). We need to know τ_T for our simulations. The balance between electron injection and escape in our model gives a Thomson optical depth of τ_T = (3/4π)(ℓ_inj/β_esc⟨γ⟩). For the simulations in Figure 5, the optical depth increases from τ_T = 5×10⁻³ for ℓ_inj = 0.1 to τ_T = 5 for ℓ_inj = 100. Using the expression for τ_T, we find that thermalization by synchrotron self-absorption then dominates over Coulomb scattering for temperatures Θ > 0.11 (lnΛ/⟨γ⟩)^{2/3} (ℓ_inj/ℓ_B)^{2/3}. For the parameters of the simulations in Figures 5 and 6 and lnΛ = 20, the condition becomes Θ > 0.03 (ℓ_inj)^{2/3}, which is plotted as the solid line in Figure 6. One sees that

Figure 6. Temperatures of the quasi-Maxwellian part of the steady equilibrium electron distributions shown in Fig.
5. Above the solid line, synchrotron self-absorption dominates over Coulomb exchange as the thermalization mechanism. The solid dots show the Comptonization y-parameter. From Ghisellini, Haardt, & Svensson (1998).

synchrotron self-absorption dominates the thermalization for all cases with ℓ_inj smaller than about 10. For the cases ℓ_inj = 30 and 100, one cannot neglect Coulomb thermalization.

3.5. Spectra from Steady Electron Distributions

In Figure 7, the radiation spectra corresponding to four of the equilibrium electron distributions in Figure 5 are shown. Each spectrum consists of several continuum components:

• a self-absorbed synchrotron spectrum (S);
• a Comptonized synchrotron spectrum (SSC);
• a reprocessed thermal soft component (bump);
• a component from Comptonization of thermal bump photons (IC);
• a Compton reflection component.

Details of the spectral calculations are given in Ghisellini, Haardt, & Svensson (1998). Some features in Figure 7 may be noticed. For ℓ_inj < 1, the Compton y-parameter is less than unity (see Fig. 6), making the Compton losses relatively unimportant relative to the self-absorbed synchrotron radiation. The large value of

Figure 7. Radiation spectra calculated using four of the electron distributions shown in Fig.
5. The different spectral components are the reprocessed thermal bump and IC components (dashed curves), the synchrotron and SSC components (dash-dotted curves), and the Compton reflection component (dotted curves). The total spectra are shown by the solid curves. Note that the chosen parameters (R = 10¹³ cm, and the corresponding magnetic field B = 10⁴ G) correspond to conditions expected near black holes in active galactic nuclei. From Ghisellini, Haardt, & Svensson (1998).

Θ makes the Comptonized spectra bumpy. The 2-10 keV band is dominated by the SSC component, rather than by the IC. The thermal bump and the X-ray flux are thus not directly related. This is contrary to the common interpretation of the X-ray emission in Seyfert galaxies as being due to Comptonization of thermal bump photons.

For ℓ_inj > 1, the Compton cooling dominates and limits the y-parameter to unity. The smooth IC power law dominates over the S and SSC components. For ℓ_inj ≲ 3, the high energy spectral cut-off can be described by an exponential, since the electron distribution is a quasi-Maxwellian in the entire energy range.
For ℓ_inj ≳ 3, the electron distribution is more complex (see Fig. 5), resulting in a more complex spectral cut-off.

The choice of R = 10¹³ cm (or B = 10⁴ G) in Figure 7 corresponds to the case of active galactic nuclei (AGN). Ghisellini, Haardt, & Svensson (1998) also study the case of galactic black holes, choosing R = 10⁷ cm (or B = 10⁷ G) for the same set of ℓ_B and ℓ_inj. The large magnetic field and small size move the S and bump peaks to larger frequencies. The main difference is that now the synchrotron component is not completely self-absorbed, leading to an optically thin synchrotron component from the highest energy electrons. More soft synchrotron photons enhance the SSC component relative to the IC component, as compared to the AGN case.

4. Final Remarks

First, we note that ℓ_B > 1 is needed for synchrotron self-absorption to operate efficiently (from eq. 2). A Maxwellian is then formed with the same mean energy as the injected electrons, assuming that self-absorption operates at essentially all electron energies of interest. If, furthermore, Compton cooling is important, i.e., if ℓ_inj > ℓ_B, then the Maxwellian is modified and is shifted to lower energies (temperature).

Second, we note that the criterion for Coulomb vs. synchrotron thermalization in the case when all electrons are self-absorbing (i.e., radiating optically thick synchrotron radiation) is more complex (eq. 3). Essentially, the same criterion is valid for comparing Coulomb thermalization in the case when the major part of the electrons radiate optically thin synchrotron radiation (or Compton radiation), truncating the Maxwellian. Which of the two cases is valid depends on whether the energy, (γ_abs − 1)m_e c², of the electrons radiating at the photon energy where the absorption optical depth is unity is much larger or much smaller than kT_e.

There should also be a region in parameter space where both Coulomb and synchrotron thermalization operate simultaneously (assuming that most electrons radiate optically thick radiation). Here, Coulomb thermalization should
dominate at lower electron energies and synchrotron thermalization at larger energies. However, nobody seems so far to have solved the Fokker-Planck equations to study the thermalization process including both Coulomb scattering and synchrotron self-absorption.

Ultimately, the thermalization process should be put in a realistic context. Mahadevan & Quataert (1997) studied the importance of thermalization in advection-dominated flows onto black holes under the conditions considered in such flows (e.g., close to free-fall, equipartition magnetic fields). Comparing the thermalization time scales with the accretion time scale (equivalent to t_esc in our discussion), they found that thermalization did not occur at large radii and small accretion rates. However, at sufficiently large accretion rates, synchrotron thermalization becomes important, and at even larger rates (and thus larger densities) Coulomb thermalization starts operating.

Another scenario for generating the X-ray radiation from compact sources is that of a corona or magnetic flares atop an accretion disk. The typical condition for the flare regions is that the magnetic energy density should dominate the radiation energy density, i.e., that ℓ_B ≳ ℓ_inj > 1, which should ensure that cyclo/synchrotron self-absorption acts as a very efficient thermalizing mechanism in such regions.

Acknowledgments. I appreciate a more than decade-long collaboration with G. Ghisellini on the issues discussed in this review. I thank J. Poutanen and A. Beloborodov for valuable comments. This work is supported by the Swedish Natural Science Research Council and the Swedish National Space Board.
References

Baring, M. 1987, MNRAS, 228, 695
Coppi, P. S. 1992, MNRAS, 258, 657
de Kool, M., Begelman, M. C., & Sikora, M. 1989, ApJ, 337, 66
Dermer, C. D., & Liang, E. P. 1989, ApJ, 339, 512
Fabian, A. C. 1994, ApJS, 92, 555
Ghisellini, G., Guilbert, P. W., & Svensson, R. 1988, ApJ, 334, L5 (GGS88)
Ghisellini, G., & Haardt, F. 1993, ApJ, 429, L53
Ghisellini, G., Haardt, F., & Fabian, A. C. 1993, MNRAS, 263, L9
Ghisellini, G., Haardt, F., & Svensson, R. 1998, MNRAS, 297, 348
Ghisellini, G., & Svensson, R. 1989, in Physical Processes in Hot Cosmic Plasmas, eds. W. Brinkmann, A. C. Fabian, & F. Giovannelli, NATO ASI Series, Kluwer Academic Publishers, 395
Ginzburg, V. L., & Syrovatskii, S. I. 1965, ARA&A, 3, 297
Ginzburg, V. L., & Syrovatskii, S. I. 1969, ARA&A, 7, 375
Kaplan, S. A., & Tsytovich, V. N. 1973, Plasma Astrophysics, Elmsford, N.Y.: Pergamon
Le Roux, E. 1961, Annales d'Astrophysique, 24, 71
Mahadevan, R., & Quataert, E. 1997, ApJ, 490, 605
McCray, R. 1969, ApJ, 156, 329
Nayakshin, S., & Melia, F. 1998, ApJS, 114, 269
Norman, C. A. 1974, Fysisk Tidsskrift (Møller Festschrift), 72, 84
Norman, C. A. 1977, Annals of Physics, 106, 26
Norman, C. A., & ter Haar, D. 1975, Physics Reports, 17, 309
Pacholczyk, A. G. 1970, Radio Astrophysics, San Francisco: Freeman
Pilla, R. P., & Shaham, J. 1997, ApJ, 486, 903
Poutanen, J., & Svensson, R. 1996, ApJ, 470, 249
Rees, M. J. 1967, MNRAS, 136, 279
Spitzer, L. 1956, Physics of Fully Ionized Gases, New York: Wiley
Stepney, S. 1983, MNRAS, 202, 467
Stern, B. E., Begelman, M. C., Sikora, M., & Svensson, R. 1995a, MNRAS, 272, 291
Stern, B. E., Poutanen, J., Svensson, R., Sikora, M., & Begelman, M. C. 1995b, ApJ, 449, L13
Svensson, R. 1997, in Relativistic Astrophysics: A Conference in Honour of Professor I. D. Novikov's 60th Birthday, eds. B. J. T. Jones & D. Markovic, Cambridge: Cambridge University Press, 235
Comparison of Multiobjective Evolutionary Algorithms: Empirical Results

Eckart Zitzler, Department of Electrical Engineering, Swiss Federal Institute of Technology, 8092 Zurich, Switzerland, zitzler@tik.ee.ethz.ch
Kalyanmoy Deb, Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur, PIN 208016, India, deb@iitk.ac.in
Lothar Thiele, Department of Electrical Engineering, Swiss Federal Institute of Technology, 8092 Zurich, Switzerland, thiele@tik.ee.ethz.ch

Abstract
In this paper, we provide a systematic comparison of various evolutionary approaches to multiobjective optimization using six carefully chosen test functions. Each test function involves a particular feature that is known to cause difficulty in the evolutionary optimization process, mainly in converging to the Pareto-optimal front (e.g., multimodality and deception). By investigating these different problem features separately, it is possible to predict the kind of problems to which a certain technique is or is not well suited. However, in contrast to what was suspected beforehand, the experimental results indicate a hierarchy of the algorithms under consideration. Furthermore, the emerging effects are evidence that the suggested test functions provide sufficient complexity to compare multiobjective optimizers. Finally, elitism is shown to be an important factor for improving evolutionary multiobjective search.

Keywords
Evolutionary algorithms, multiobjective optimization, Pareto optimality, test functions, elitism.

1 Motivation
Evolutionary algorithms (EAs) have become established as the method at hand for exploring the Pareto-optimal front in multiobjective optimization problems that are too complex to be solved by exact methods, such as linear programming and gradient search. This is not only because there are few alternatives for searching intractably large spaces for multiple Pareto-optimal solutions. Due to their inherent parallelism and their capability to exploit similarities of solutions by recombination, they are able to
approximate the Pareto-optimal front in a single optimization run. The numerous applications and the rapidly growing interest in the area of multiobjective EAs take this fact into account.

© 2000 by the Massachusetts Institute of Technology. Evolutionary Computation 8(2): 173-195.

After the first pioneering studies on evolutionary multiobjective optimization appeared in the mid-eighties (Schaffer, 1984, 1985; Fourman, 1985), several different EA implementations were proposed in the years 1991-1994 (Kursawe, 1991; Hajela and Lin, 1992; Fonseca and Fleming, 1993; Horn et al., 1994; Srinivas and Deb, 1994). Later, these approaches (and variations of them) were successfully applied to various multiobjective optimization problems (Ishibuchi and Murata, 1996; Cunha et al., 1997; Valenzuela-Rendón and Uresti-Charre, 1997; Fonseca and Fleming, 1998; Parks and Miller, 1998). In recent years, some researchers have investigated particular topics of evolutionary multiobjective search, such as convergence to the Pareto-optimal front (Van Veldhuizen and Lamont, 1998a; Rudolph, 1998), niching (Obayashi et al., 1998), and elitism (Parks and Miller, 1998; Obayashi et al., 1998), while others have concentrated on developing new evolutionary techniques (Laumanns et al., 1998; Zitzler and Thiele, 1999). For a thorough discussion of evolutionary algorithms for multiobjective optimization, the interested reader is referred to Fonseca and Fleming (1995), Horn (1997), Van Veldhuizen and Lamont (1998b), and Coello (1999).

In spite of this variety, there is a lack of studies that compare the performance and different aspects of these approaches. Consequently, the question arises: which implementations are suited to which sort of problem, and what are the specific advantages and drawbacks of different techniques? First steps in this direction have been made in both theory and practice. On the theoretical side, Fonseca and Fleming (1995) discussed the influence of different fitness assignment strategies on the selection process. On the
practical side, Zitzler and Thiele (1998, 1999) used an NP-hard 0/1 knapsack problem to compare several multiobjective EAs.

In this paper, we provide a systematic comparison of six multiobjective EAs, including a random search strategy as well as a single-objective EA using objective aggregation. The basis of this empirical study is formed by a set of well-defined, domain-independent test functions that allow the investigation of independent problem features. We thereby draw upon results presented in Deb (1999), where problem features that may make convergence of EAs to the Pareto-optimal front difficult are identified and, furthermore, methods of constructing appropriate test functions are suggested. The functions considered here cover the range of convexity, nonconvexity, discrete Pareto fronts, multimodality, deception, and biased search spaces. Hence, we are able to systematically compare the approaches based on different kinds of difficulty and to determine more exactly where certain techniques are advantageous or have trouble. In this context, we also examine further factors such as population size and elitism.

The paper is structured as follows: Section 2 introduces key concepts of multiobjective optimization and defines the terminology used in this paper mathematically. We then give a brief overview of the multiobjective EAs under consideration with special emphasis on the differences between them. The test functions, their construction, and their choice are the subject of Section 4, which is followed by a discussion about performance metrics to assess the quality of trade-off fronts. Afterwards, we present the experimental results in Section 6 and investigate further aspects like elitism (Section 7) and population size (Section 8) separately. A discussion of the results as well as future perspectives are given in Section 9.

2 Definitions
Optimization problems involving multiple, conflicting objectives are often approached by aggregating the objectives into a scalar function and solving the resulting
single-objective optimization problem. In contrast, in this study, we are concerned with finding a set of optimal trade-offs, the so-called Pareto-optimal set. In the following, we formalize this well-known concept and also define the difference between local and global Pareto-optimal sets.

A multiobjective search space is partially ordered in the sense that two arbitrary solutions are related to each other in two possible ways: either one dominates the other or neither dominates.

DEFINITION 1: Let us consider, without loss of generality, a multiobjective minimization problem with $m$ decision variables (parameters) and $n$ objectives:

Minimize $y = f(x) = (f_1(x), \ldots, f_n(x))$, where $x = (x_1, \ldots, x_m) \in X$ and $y = (y_1, \ldots, y_n) \in Y$,   (1)

and where $x$ is called the decision vector, $X$ the parameter space, $y$ the objective vector, and $Y$ the objective space. A decision vector $a \in X$ is said to dominate a decision vector $b \in X$ (also written as $a \succ b$) if and only if

$\forall i \in \{1, \ldots, n\}: f_i(a) \le f_i(b) \ \wedge\ \exists j \in \{1, \ldots, n\}: f_j(a) < f_j(b).$   (2)

Additionally, in this study, we say $a$ covers $b$ ($a \succeq b$) if and only if $a \succ b$ or $f(a) = f(b)$.

Based on the above relation, we can define nondominated and Pareto-optimal solutions:

DEFINITION 2: Let $a \in X$ be an arbitrary decision vector.
1. The decision vector $a$ is said to be nondominated regarding a set $A \subseteq X$ if and only if there is no vector in $A$ which dominates $a$; formally,

$\nexists\, a' \in A: a' \succ a.$   (3)

If it is clear within the context which set $A$ is meant, we simply leave it out.
2. The decision vector $a$ is Pareto-optimal if and only if $a$ is nondominated regarding $X$.

Pareto-optimal decision vectors cannot be improved in any objective without causing a degradation in at least one other objective; they represent, in our terminology, globally optimal solutions. However, analogous to single-objective optimization problems, there may also be local optima which constitute a nondominated set within a certain neighborhood. This corresponds to the concepts of global and local Pareto-optimal sets introduced by Deb (1999):

DEFINITION 3: Consider a set of decision vectors $A \subseteq X$.
1. The set $A$ is denoted as a local Pareto-optimal set if and only if

$\forall a \in A: \nexists\, x \in X: x \succ a \ \wedge\ \|x - a\| < \epsilon \ \wedge\ \|f(x) - f(a)\| < \delta,$   (4)

where $\|\cdot\|$ is a corresponding distance metric and $\epsilon > 0$, $\delta > 0$. A slightly modified
definition of local Pareto optimality is given here.

2. The set $A$ is called a global Pareto-optimal set if and only if

$\forall a \in A: \nexists\, x \in X: x \succ a.$   (5)

Note that a global Pareto-optimal set does not necessarily contain all Pareto-optimal solutions. If we refer to the entirety of the Pareto-optimal solutions, we simply write "Pareto-optimal set"; the corresponding set of objective vectors is denoted as the "Pareto-optimal front".

3 Evolutionary Multiobjective Optimization
Two major problems must be addressed when an evolutionary algorithm is applied to multiobjective optimization:
1. How to accomplish fitness assignment and selection, respectively, in order to guide the search towards the Pareto-optimal set.
2. How to maintain a diverse population in order to prevent premature convergence and achieve a well distributed trade-off front.

Often, different approaches are classified with regard to the first issue, where one can distinguish between criterion selection, aggregation selection, and Pareto selection (Horn, 1997). Methods performing criterion selection switch between the objectives during the selection phase. Each time an individual is chosen for reproduction, potentially a different objective will decide which member of the population will be copied into the mating pool. Aggregation selection is based on the traditional approaches to multiobjective optimization where the multiple objectives are combined into a parameterized single objective function.
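The dominance relation of Definition 1 and the nondominated set of Definition 2 can be sketched in a few lines, assuming minimization of every objective. The function names below are illustrative, not taken from the paper.

```python
# Sketch of dominance (Definition 1) and the nondominated filter
# (Definition 2), assuming every objective is to be minimized.

def dominates(a, b):
    """True if objective vector a dominates b: a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def covers(a, b):
    """a covers b: a dominates b or the objective vectors are equal."""
    return dominates(a, b) or tuple(a) == tuple(b)

def nondominated(vectors):
    """Return the vectors not dominated by any other vector in the set."""
    return [a for a in vectors if not any(dominates(b, a) for b in vectors)]

# (4, 4) is dominated by (2, 2); the other three points are incomparable.
front = nondominated([(1, 5), (2, 2), (3, 1), (4, 4)])
```

Note that dominance is only a partial order: neither `(1, 5)` nor `(3, 1)` dominates the other, so both survive in `front`.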
The parameters of the resulting function are systematically varied during the same run in order to find a set of Pareto-optimal solutions. Finally, Pareto selection makes direct use of the dominance relation from Definition 1; Goldberg (1989) was the first to suggest a Pareto-based fitness assignment strategy.

In this study, six of the most salient multiobjective EAs are considered, where for each of the above categories, at least one representative was chosen. Nevertheless, there are many other methods that may be considered for the comparison (cf. Van Veldhuizen and Lamont (1998b) and Coello (1999) for an overview of different evolutionary techniques):

Among the class of criterion selection approaches, the Vector Evaluated Genetic Algorithm (VEGA) (Schaffer, 1984, 1985) has been chosen. Although some serious drawbacks are known (Schaffer, 1985; Fonseca and Fleming, 1995; Horn, 1997), this algorithm has been a strong point of reference up to now. Therefore, it has been included in this investigation.

The EA proposed by Hajela and Lin (1992) is based on aggregation selection in combination with fitness sharing (Goldberg and Richardson, 1987), where an individual is assessed by summing up the weighted objective values. As weighted-sum aggregation appears still to be widespread due to its simplicity, Hajela and Lin's technique has been selected to represent this class of multiobjective EAs.

Pareto-based techniques seem to be most popular in the field of evolutionary multiobjective optimization (Van Veldhuizen and Lamont, 1998b). In particular, the algorithm presented by Fonseca and Fleming (1993), the Niched Pareto Genetic Algorithm (NPGA) (Horn and Nafpliotis, 1993; Horn et al., 1994), and the Nondominated Sorting Genetic Algorithm (NSGA) (Srinivas and Deb, 1994) appear to have achieved the most attention in the EA literature and have been used in various studies. Thus, they are also considered here. Furthermore, a recent elitist Pareto-based
strategy, the Strength Pareto Evolutionary Algorithm (SPEA) (Zitzler and Thiele, 1999), which outperformed four other multiobjective EAs on an extended 0/1 knapsack problem, is included in the comparison.

4 Test Functions for Multiobjective Optimizers
Deb (1999) has identified several features that may cause difficulties for multiobjective EAs in 1) converging to the Pareto-optimal front and 2) maintaining diversity within the population. Concerning the first issue, multimodality, deception, and isolated optima are well-known problem areas in single-objective evolutionary optimization. The second issue is important in order to achieve a well distributed nondominated front. However, certain characteristics of the Pareto-optimal front may prevent an EA from finding diverse Pareto-optimal solutions: convexity or nonconvexity, discreteness, and nonuniformity. For each of the six problem features mentioned, a corresponding test function is constructed following the guidelines in Deb (1999). We thereby restrict ourselves to only two objectives in order to investigate the simplest case first. In our opinion, two objectives are sufficient to reflect essential aspects of multiobjective optimization. Moreover, we do not consider maximization or mixed minimization/maximization problems.

Each of the test functions defined below is structured in the same manner and consists itself of three functions $f_1$, $g$, $h$ (Deb, 1999, 216):

Minimize $T(x) = (f_1(x_1), f_2(x))$
subject to $f_2(x) = g(x_2, \ldots, x_m) \cdot h(f_1(x_1), g(x_2, \ldots, x_m))$,   (6)
where $x = (x_1, \ldots, x_m)$.

The function $f_1$ is a function of the first decision variable only, $g$ is a function of the remaining $m - 1$ variables, and the parameters of $h$ are the function values of $f_1$ and $g$. The test functions differ in these three functions as well as in the number of variables $m$ and in the values the variables may take.

DEFINITION 4: We introduce six test functions $T_1, \ldots, T_6$ that follow the scheme given in Equation 6:

The test function $T_1$ has a convex Pareto-optimal front:

$f_1(x_1) = x_1$, $\quad g(x_2, \ldots, x_m) = 1 + 9 \cdot \sum_{i=2}^{m} x_i / (m - 1)$, $\quad h(f_1, g) = 1 - \sqrt{f_1 / g}$,   (7)

where $m = 30$ and $x_i \in [0, 1]$. The Pareto-optimal front is formed with $g(x) = 1$.

The test function $T_2$ is the nonconvex counterpart to $T_1$:

$f_1(x_1) = x_1$, $\quad g(x_2, \ldots, x_m) = 1 + 9 \cdot \sum_{i=2}^{m} x_i / (m - 1)$, $\quad h(f_1, g) = 1 - (f_1 / g)^2$,   (8)
where $m = 30$ and $x_i \in [0, 1]$. The Pareto-optimal front is formed with $g(x) = 1$.

The test function $T_3$ represents the discreteness feature; its Pareto-optimal front consists of several noncontiguous convex parts:

$f_1(x_1) = x_1$, $\quad g(x_2, \ldots, x_m) = 1 + 9 \cdot \sum_{i=2}^{m} x_i / (m - 1)$, $\quad h(f_1, g) = 1 - \sqrt{f_1 / g} - (f_1 / g)\sin(10\pi f_1)$,   (9)

where $m = 30$ and $x_i \in [0, 1]$. The Pareto-optimal front is formed with $g(x) = 1$. The introduction of the sine function in $h$ causes discontinuity in the Pareto-optimal front. However, there is no discontinuity in the parameter space.

The test function $T_4$ contains many local Pareto-optimal fronts and, therefore, tests for the EA's ability to deal with multimodality:

$f_1(x_1) = x_1$, $\quad g(x_2, \ldots, x_m) = 1 + 10(m - 1) + \sum_{i=2}^{m} (x_i^2 - 10\cos(4\pi x_i))$, $\quad h(f_1, g) = 1 - \sqrt{f_1 / g}$,   (10)

where $m = 10$, $x_1 \in [0, 1]$, and $x_2, \ldots, x_m \in [-5, 5]$. The global Pareto-optimal front is formed with $g(x) = 1$, the best local Pareto-optimal front with $g(x) = 1.25$. Note that not all local Pareto-optimal sets are distinguishable in the objective space.

The test function $T_5$ describes a deceptive problem and distinguishes itself from the other test functions in that each $x_i$ represents a binary string:

$f_1(x_1) = 1 + u(x_1)$, $\quad g(x_2, \ldots, x_m) = \sum_{i=2}^{m} v(u(x_i))$, $\quad h(f_1, g) = 1 / f_1$,   (11)

where $u(x_i)$ gives the number of ones in the bit vector $x_i$ (unitation), and

$v(u(x_i)) = 2 + u(x_i)$ if $u(x_i) < 5$, and $v(u(x_i)) = 1$ if $u(x_i) = 5$,

with $m = 11$, where $x_1$ comprises 30 bits and $x_2, \ldots, x_m$ 5 bits each. The true Pareto-optimal front is formed with $g(x) = 10$, while the best deceptive Pareto-optimal front is represented by the solutions for which the unitation of every substring equals zero. The global Pareto-optimal front as well as the local ones are convex.

The test function $T_6$ includes two difficulties caused by the nonuniformity of the search space: first, the Pareto-optimal solutions are nonuniformly distributed along the global Pareto front (the front is biased for solutions for which $f_1(x_1)$ is near one); second, the density of the solutions is lowest near the Pareto-optimal front and highest away from the front:

$f_1(x_1) = 1 - \exp(-4x_1)\sin^6(6\pi x_1)$, $\quad g(x_2, \ldots, x_m) = 1 + 9 \cdot \left(\sum_{i=2}^{m} x_i / (m - 1)\right)^{0.25}$, $\quad h(f_1, g) = 1 - (f_1 / g)^2$,   (12)

where $m = 10$ and $x_i \in [0, 1]$. The Pareto-optimal front is formed with $g(x) = 1$ and is nonconvex.

We will discuss each function in more detail in Section 6, where the corresponding Pareto-optimal fronts are visualized as well (Figures 1-6).

5 Metrics of Performance
Comparing different optimization techniques experimentally always involves the notion of performance. In the case of multiobjective optimization, the definition of quality is substantially
more complex than for single-objective optimization problems, because the optimization goal itself consists of multiple objectives:

- The distance of the resulting nondominated set to the Pareto-optimal front should be minimized.
- A good (in most cases uniform) distribution of the solutions found is desirable. The assessment of this criterion might be based on a certain distance metric.
- The extent of the obtained nondominated front should be maximized, i.e., for each objective, a wide range of values should be covered by the nondominated solutions.

In the literature, some attempts can be found to formalize the above definition (or parts of it) by means of quantitative metrics. Performance assessment by means of weighted-sum aggregation was introduced by Esbensen and Kuh (1996). Thereby, a set of decision vectors is evaluated regarding a given linear combination by determining the minimum weighted-sum of all corresponding objective vectors. Based on this concept, a sample of linear combinations is chosen at random (with respect to a certain probability distribution), and the minimum weighted-sums for all linear combinations are summed up and averaged. The resulting value is taken as a measure of quality. A drawback of this metric is that only the "worst" solution determines the quality value per linear combination.
Although several weight combinations are used, nonconvex regions of the trade-off surface contribute to the quality more than convex parts and may, as a consequence, dominate the performance assessment. Finally, the distribution, as well as the extent of the nondominated front, is not considered.

Another interesting means of performance assessment was proposed by Fonseca and Fleming (1996). Given a set of nondominated solutions, a boundary function divides the objective space into two regions: the objective vectors for which the corresponding solutions are not covered by the set and the objective vectors for which the associated solutions are covered by it. They call this particular function, which can also be seen as the locus of the family of tightest goal vectors known to be attainable, the attainment surface. Taking multiple optimization runs into account, a method is described to compute a median attainment surface by using auxiliary straight lines and sampling their intersections with the attainment surfaces obtained. As a result, the samples represented by the median attainment surface can be relatively assessed by means of statistical tests and, therefore, allow comparison of the performance of two or more multiobjective optimizers. A drawback of this approach is that it remains unclear how the quality difference can be expressed, i.e., how much better one algorithm is than another. However, Fonseca and Fleming describe ways of meaningful statistical interpretation in contrast to the other studies considered here, and furthermore, their methodology seems to be well suited to visualization of the outcomes of several runs.

In the context of investigations on convergence to the Pareto-optimal front, some authors (Rudolph, 1998; Van Veldhuizen and Lamont, 1998a) have considered the distance of a given set to the Pareto-optimal set in the same way as the distance function defined below. The distribution was not taken into account, because the focus was not on this
matter. However, in comparative studies, distance alone is not sufficient for performance evaluation, since extremely differently distributed fronts may have the same distance to the Pareto-optimal front.

Two complementary metrics of performance were presented in Zitzler and Thiele (1998, 1999). On one hand, the size of the dominated area in the objective space is taken under consideration; on the other hand, a pair of nondominated sets is compared by calculating the fraction of each set that is covered by the other set. The area combines all three criteria (distance, distribution, and extent) into one, and therefore, sets differing in more than one criterion may not be distinguished. The second metric is in some way similar to the comparison methodology proposed in Fonseca and Fleming (1996). It can be used to show that the outcomes of an algorithm dominate the outcomes of another algorithm, although it does not tell how much better it is. We give its definition here, because it is used in the remainder of this paper.

DEFINITION 5: Let $X', X'' \subseteq X$ be two sets of decision vectors. The function $C$ maps the ordered pair $(X', X'')$ to the interval $[0, 1]$:

$C(X', X'') := \dfrac{|\{a'' \in X''; \ \exists\, a' \in X': a' \succeq a''\}|}{|X''|}.$   (13)

The value $C(X', X'') = 1$ means that all solutions in $X''$ are dominated by or equal to solutions in $X'$. The opposite, $C(X', X'') = 0$, represents the situation when none of the solutions in $X''$ are covered by the set $X'$. Note that both $C(X', X'')$ and $C(X'', X')$ have to be considered, since $C(X', X'')$ is not necessarily equal to $1 - C(X'', X')$.

In summary, it may be said that performance metrics are hard to define and it probably will not be possible to define a single metric that allows for all criteria in a meaningful way. Along with that problem, the statistical interpretation associated with a performance comparison is rather difficult and still needs to be answered, since multiple significance tests are involved, and thus, tools from analysis of variance may be required.

In this study, we have chosen a visual presentation of the results together with the application of the metric from Definition 5. The reason for this is that
we would like to investigate 1) whether test functions can adequately test specific aspects of each multiobjective algorithm and 2) whether any visual hierarchy of the chosen algorithms exists. However, for a deeper investigation of some of the algorithms (which is the subject of future work), we suggest the following metrics that allow assessment of each of the criteria listed at the beginning of this section separately.

DEFINITION 6: Given a set $A \subseteq X$ of pairwise nondominating decision vectors, a neighborhood parameter $\sigma > 0$ (to be chosen appropriately), and a distance metric $\|\cdot\|$. We introduce three functions to assess the quality of $A$ regarding the parameter space:

1. The function $M_1$ gives the average distance to the Pareto-optimal set $\bar{X} \subseteq X$:

$M_1(A) := \dfrac{1}{|A|} \sum_{a \in A} \min\{\|a - \bar{x}\|; \ \bar{x} \in \bar{X}\}.$   (14)

2. The function $M_2$ takes the distribution in combination with the number of nondominated solutions found into account:

$M_2(A) := \dfrac{1}{|A| - 1} \sum_{a \in A} |\{b \in A; \ \|a - b\| > \sigma\}|.$   (15)

3. The function $M_3$ considers the extent of the front described by $A$:

$M_3(A) := \sqrt{\sum_{i=1}^{m} \max\{\|a_i - b_i\|; \ a, b \in A\}}.$   (16)

Analogously, we define three metrics $M_1^*$, $M_2^*$, and $M_3^*$ on the objective space. Let $A^*, \bar{Y} \subseteq Y$ be the sets of objective vectors that correspond to $A$ and $\bar{X}$, respectively, and $\sigma^* > 0$ and $\|\cdot\|$ as before:

$M_1^*(A^*) := \dfrac{1}{|A^*|} \sum_{p \in A^*} \min\{\|p - \bar{y}\|; \ \bar{y} \in \bar{Y}\},$   (17)

$M_2^*(A^*) := \dfrac{1}{|A^*| - 1} \sum_{p \in A^*} |\{q \in A^*; \ \|p - q\| > \sigma^*\}|,$   (18)

$M_3^*(A^*) := \sqrt{\sum_{i=1}^{n} \max\{\|p_i - q_i\|; \ p, q \in A^*\}}.$   (19)

While $M_1$ and $M_1^*$ are intuitive, $M_2$ and $M_3$ (respectively $M_2^*$ and $M_3^*$) need further explanation. The distribution metrics give a value within the interval $[0, |A|]$ ($[0, |A^*|]$) that reflects the number of $\sigma$-niches ($\sigma^*$-niches) in $A$ ($A^*$). Obviously, the higher the value, the better the distribution for an appropriate neighborhood parameter (e.g., $M_2^*(A^*) = |A^*|$ means that for each objective vector there is no other objective vector within $\sigma^*$-distance to it). The functions $M_3$ and $M_3^*$ use the maximum extent in each dimension to estimate the range to which the front spreads out. In the case of two objectives, this equals the distance of the two outer solutions.

[Footnote: Recently, an alternative metric has been proposed in Zitzler (1999) in order to overcome this problem.]

6 Comparison of Different Evolutionary Approaches
6.1 Methodology
We compare eight algorithms on the six proposed test functions:
1. A random search algorithm (RAND).
2. Fonseca and
Fleming's multiobjective EA (FFGA).
3. The Niched Pareto Genetic Algorithm (NPGA).
4. Hajela and Lin's weighted-sum based approach (HLGA).
5. The Vector Evaluated Genetic Algorithm (VEGA).
6. The Nondominated Sorting Genetic Algorithm (NSGA).
7. A single-objective evolutionary algorithm using weighted-sum aggregation (SOEA).
8. The Strength Pareto Evolutionary Algorithm (SPEA).

The multiobjective EAs, as well as RAND, were executed several times on each test problem, where the population was monitored for nondominated solutions, and the resulting nondominated set was taken as the outcome of one optimization run. Here, RAND serves as an additional point of reference and randomly generates a certain number of individuals per generation according to the rate of crossover and mutation (but neither crossover and mutation nor selection are performed). Hence, the number of fitness evaluations was the same as for the EAs. In contrast, multiple simulation runs were considered in the case of SOEA, each run optimizing towards another randomly chosen linear combination of the objectives. The nondominated solutions among all solutions generated in the runs form the trade-off front achieved by SOEA on a particular test function.

Independent of the algorithm and the test function, each simulation run was carried out using the following parameters:

- Number of generations: 250
- Population size: 100
- Crossover rate: 0.8
- Mutation rate: 0.01
- Niching parameter σ_share: 0.48862
- Domination pressure t_dom: 10

The niching parameter was calculated using the guidelines given in Deb and Goldberg (1989) assuming the formation of ten independent niches. Since HLGA uses genotypic fitness sharing, a different value of σ_share was chosen for this particular case.
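To make the test-function scheme of Equation 6 concrete, the convex function $T_1$ can be sketched as follows. The formulation assumed here (now widely known as ZDT1) uses the standard settings of $m = 30$ decision variables in $[0, 1]$; variable and function names are ours.

```python
import math

# Sketch of the scheme of Equation 6, instantiated for the convex test
# function T1 (commonly known as ZDT1). m = 30 variables in [0, 1] assumed.

M = 30  # number of decision variables

def f1(x):
    # depends on the first decision variable only
    return x[0]

def g(x):
    # depends on the remaining m - 1 variables
    return 1.0 + 9.0 * sum(x[1:]) / (M - 1)

def h(f1_val, g_val):
    return 1.0 - math.sqrt(f1_val / g_val)

def f2(x):
    # second objective per Equation 6: f2(x) = g(x) * h(f1(x), g(x))
    return g(x) * h(f1(x), g(x))

# On the Pareto-optimal front g(x) = 1 (x2 = ... = xm = 0), so f2 = 1 - sqrt(f1).
x = [0.25] + [0.0] * (M - 1)
pareto_f2 = f2(x)  # expected: 1 - sqrt(0.25) = 0.5
```

Setting any of $x_2, \ldots, x_m$ above zero raises $g$ and lifts the point above the front, which is exactly what the distance metrics of Definition 6 penalize.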
Concerning NPGA, the recommended value for t_dom of 10 percent of the population size was taken (Horn and Nafpliotis, 1993). Furthermore, for reasons of fairness, SPEA ran with a smaller internal population, with the external nondominated set restricted such that the total number of individuals equals that of the other algorithms.

Regarding the implementations of the algorithms, one chromosome was used to encode the parameters of the corresponding test problem. Each parameter is represented by 30 bits; the parameters $x_2, \ldots, x_m$ only comprise 5 bits each for the deceptive function $T_5$. Moreover, all approaches except FFGA were realized using binary tournament selection with replacement in order to avoid effects caused by different selection schemes. Furthermore, since fitness sharing may produce chaotic behavior in combination with tournament selection, a slightly modified method is incorporated here, named continuously updated sharing (Oei et al., 1991). As FFGA requires a generational selection mechanism, stochastic universal sampling was used in its implementation.

6.2 Simulation Results
In Figures 1-6, the nondominated fronts achieved by the different algorithms are visualized.
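The pairwise assessments reported below rely on the coverage function $C$ of Definition 5. A minimal sketch, assuming minimization (helper names are illustrative):

```python
# Sketch of the coverage metric C from Definition 5 (Equation 13):
# C(x1, x2) is the fraction of vectors in x2 that are covered
# (dominated or equalled) by at least one vector in x1.

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def covers(a, b):
    return dominates(a, b) or tuple(a) == tuple(b)

def coverage(x1, x2):
    """C(x1, x2) in [0, 1]. Note: C(x1, x2) != 1 - C(x2, x1) in general,
    so both orderings must be computed."""
    return sum(1 for b in x2 if any(covers(a, b) for a in x1)) / len(x2)

A = [(1, 4), (2, 2), (4, 1)]
B = [(2, 5), (3, 3), (5, 5), (0, 9)]
c_ab = coverage(A, B)  # A covers 3 of the 4 points in B -> 0.75
c_ba = coverage(B, A)  # B covers no point of A -> 0.0
```

The asymmetry visible in `c_ab` versus `c_ba` is why the box plots discussed below show both orderings of each algorithm pair.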
Per algorithm and test function, the outcomes of the first five runs were unified, and then the dominated solutions were removed from the union set; the remaining points are plotted in the figures. Also shown are the Pareto-optimal fronts (lower curves), as well as additional reference curves (upper curves). The latter curves allow a more precise evaluation of the obtained trade-off fronts and were calculated by adding $\max f_2 - \min f_2$ to the $f_2$ values of the Pareto-optimal points. The space between the Pareto-optimal and reference fronts thus represents a fixed fraction of the corresponding objective space.

[Figure 1: Test function $T_1$ (convex). Figure 2: Test function $T_2$ (nonconvex). Figure 3: Test function $T_3$ (discrete). Figure 4: Test function $T_4$ (multimodal). Figure 5: Test function $T_5$ (deceptive). Figure 6: Test function $T_6$ (nonuniform). Each figure plots $f_2$ against $f_1$ for RAND, FFGA, NPGA, HLGA, VEGA, NSGA, SOEA, and SPEA.]

However, the curve resulting from the deceptive function $T_5$ is not appropriate for our purposes, since it lies above the fronts produced by the random search algorithm. Instead, we consider all solutions for which the parameters are set to the deceptive attractors ($u(x_i) = 0$ for $i = 2, \ldots, m$).

In addition to the graphical presentation, the different algorithms were assessed in pairs using the $C$ metric from Definition 5. For an ordered algorithm pair, there is a sample of $C$ values according to the runs performed. Each value is computed on the basis of the nondominated sets achieved by the two algorithms with the same initial population. Here, box plots are used to visualize the distribution of these samples (Figure 7). A box plot consists of a box summarizing 50% of the data. The upper and lower ends of the box are the upper and lower quartiles, while a thick line within the box encodes the median.
Dashed appendages summarize the spread and shape of the distribution. Furthermore, the shortcut used in Figure 7 stands for "reference set" and represents, for each test function, a set of equidistant points that are uniformly distributed on the corresponding reference curve.

Generally, the simulation results prove that all multiobjective EAs do better than the random search algorithm. However, the box plots reveal that some of them do not always cover the randomly created trade-off front completely. Furthermore, it can be observed that NSGA clearly outperforms the other nonelitist multiobjective EAs regarding both distance to the Pareto-optimal front and distribution of the nondominated solutions. This confirms the results presented in Zitzler and Thiele (1998). Furthermore, it is remarkable that VEGA performs well compared to several of the other algorithms, although some serious drawbacks of this approach are known (Fonseca and Fleming, 1995). The reason for this might be that we consider the off-line performance here in contrast to other studies that examine the on-line performance (Horn and Nafpliotis, 1993; Srinivas and Deb, 1994). On-line performance means that only the nondominated solutions in the final population are considered as the outcome, while off-line performance takes the solutions nondominated among all solutions generated during the entire optimization run into account. Finally, the best performance is provided by SPEA, which makes explicit use of the concept of elitism.
On most test functions, it even outperforms SOEA in spite of substantially lower computational effort and although SOEA uses an elitist strategy as well. This observation leads to the question of whether elitism would increase the performance of the other multiobjective EAs. We will investigate this matter in the next section.

Considering the different problem features separately, convexity seems to cause the least amount of difficulty for the multiobjective EAs. All algorithms evolved reasonably distributed fronts, although there was a difference in the distance to the Pareto-optimal set. On the nonconvex test function $T_2$, however, the weighted-sum based approaches have difficulties finding intermediate solutions, as linear combinations of the objectives tend to prefer solutions strong in at least one objective (Fonseca and Fleming, 1995). Pareto-based algorithms have advantages here, but only some of them evolved a sufficient number of nondominated solutions. In the case of $T_3$ (discreteness), the strongest algorithms are superior to the remaining ones: the fronts achieved by the former cover a substantially larger part of the reference set on average. Among the considered test functions, the multimodal and the deceptive functions seem to be the hardest problems, since none of the algorithms was able to evolve a global Pareto-optimal set. The results on the multimodal problem indicate

[Footnote: Note that outside values are not plotted in Figure 7 in order to prevent overloading of the presentation.]
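The three criteria of Definition 6 (distance, distribution, extent) can be sketched for the objective space as follows. Euclidean distance and the normalizations shown are assumptions, and the function names are ours.

```python
import math

# Hedged sketch of the objective-space metrics of Definition 6
# (Equations 17-19): average distance to a reference Pareto front,
# a sigma-niche distribution measure, and front extent.

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

def m1(front, pareto):
    """Average distance of the front to the Pareto-optimal reference set."""
    return sum(min(dist(a, p) for p in pareto) for a in front) / len(front)

def m2(front, sigma):
    """For each point, count the points farther away than sigma;
    average over |front| - 1 (assumes at least two points)."""
    return sum(sum(1 for b in front if dist(a, b) > sigma)
               for a in front) / (len(front) - 1)

def m3(front):
    """Extent: sqrt of the summed per-dimension maximal spreads."""
    dims = len(front[0])
    return math.sqrt(sum(max(abs(a[i] - b[i]) for a in front for b in front)
                         for i in range(dims)))
```

For a two-objective front, `m3` reduces to (the square root of the squared) distance between the two outer solutions, matching the remark after Definition 6.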
INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
Int. J. Robust Nonlinear Control 2001; 11:515-539 (DOI: 10.1002/rnc.596)

Hybrid control of force transients for multi-point injection engines

Andrea Balluchi, Luca Benvenuti, Maria Domenica Di Benedetto*, Alberto L. Sangiovanni-Vincentelli

PARADES, Via San Pantaleo, 66, 00186 Roma, Italy
Dip. di Informatica e Sistemistica, Università degli Studi di Roma "La Sapienza", Via Eudossiana 18, 00184 Roma, Italy
Dip. di Ingegneria Elettrica, Università degli Studi dell'Aquila, Poggio di Roio, 67040 L'Aquila, Italy
Department of Electrical Engineering and Computer Science, University of California at Berkeley, CA 94720, U.S.A.

SUMMARY
We address the problem of delivering as quickly as possible a requested torque produced by a spark ignition engine equipped with a multi-point port injection manifold and with electronic throttle. The optimal control problem, subject to the constraint that the air-fuel ratio stays within a pre-assigned range around the stoichiometric ratio, is solved for a detailed, cycle-accurate hybrid model with a hybrid control approach based on a two-step process. In the first step, a continuous approximation of the hybrid problem is solved exactly. Then, the control law so obtained is adjusted to satisfy the constraints imposed by the hybrid model.
The quality of the control law has been in part analytically demonstrated and in part validated with simulations. Copyright © 2001 John Wiley & Sons, Ltd.

KEY WORDS: hybrid systems; engine control; optimal control

1. INTRODUCTION
In this paper, we deal with the problem of delivering as quickly as possible a requested torque produced by a spark ignition engine equipped with a multi-point port injection manifold and electronics to control the throttle-valve position. The control variables are the amount of injected fuel and the voltage given to the electric motor controlling the position of the throttle valve. The optimization problem is subject to the constraint that the air-fuel (A/F) ratio stays within a pre-assigned range around the stoichiometric value of 14.64 (the ratio that guarantees minimum emission). Air and fuel dynamics depend on the pressure in the intake manifold, which is controlled by the throttle valve, and on second-order phenomena, the most important of which is the fluid-film dynamics [1]. The fluid film is created during the injection process: after fuel is injected in the intake runner, part of the vapourized fuel turns into fluid that deposits onto the walls of the intake runner and, hence, it is not immediately available for combustion inside the cylinders. The fluid film on the intake walls evaporates again, thus contributing to the combustion process but with a noticeable delay. This phenomenon (unfortunately) cannot be ignored, since it has a definite effect on the performance of the combustion process. The most used solutions to this problem consist of feed-forward compensation of fuel dynamics [2-4], based on mean value engine models [5-7]. However, the mean values of the engine variables of interest may not be accurate enough to

* Correspondence to: M. D. Di Benedetto, Dip. di Ingegneria Elettrica, Università degli Studi dell'Aquila, Poggio di Roio, 67040 L'Aquila, Italy. E-mail: dibenede@ing.univaq.it
Contract/grant sponsor: PARADES, P.F. MADESS II CNR
guarantee small transient deviations from the optimal A/F ratio. In this paper, we propose an approach that yields very small deviations with respect to the optimal A/F ratio, by using a hybrid model of the cyclic behaviour of the engine. The hybrid model describes accurately the detailed behaviour of the actuated throttle, of the injection systems and of the torque-generation mechanism and, at the same time, allows us to develop powerful closed-loop control laws. Our goal is to design a control law for the fuel injection durations and the voltage supplied to the throttle valve motor, to drive the evolution of the system from an initial condition characterized by the delivery of a torque u_0 to a final condition characterized by the delivery of a requested torque u_d in minimum time, subject to constraints on emissions. Note that the control problem described above is new not only because we use a detailed hybrid model for the injection process, but also because we consider the entire control chain, from throttle motor to engine.
Minimum time is not the only relevant criterion to consider for control schemes that might be implemented for engine control. Preventing significant over/undershoot of the reference value, and being able to hold the desired torque value within some bounds, are also desirable. However, minimum-time control problems have well-known solutions that allow us to obtain powerful control laws. These additional criteria can be cast into constraints in the same way as we have done for the A/F ratio. Then, the optimal control problem can still be solved analytically, even though the number of details and the notational complexity would become overwhelming. For this reason, we have chosen to ignore these additional considerations, since handling them could cloud our approach. Our approach to the optimal control problem is a two-step process: (1) we first introduce and solve an auxiliary optimal control problem in continuous time, and (2) we then map the optimal control law back into the hybrid domain, trying to maintain as much as possible the properties of the solution. The mapping process is critical for obtaining a satisfactory solution to the hybrid problem. Several mappings are possible that satisfy the restrictions imposed by the hybrid model, but we want to choose a mapping that satisfies the original constraints and is as close as possible to the optimal solution. In Reference [8], we solved with the same approach a related, but simpler, problem where throttle actuation was not taken into account, the torque requested was zero, and the cost function was the amount of undesired oscillations of the power-train (the cut-off problem).
In this case, we were able to prove that the hybrid control law obtained by the mapping process ensured stability and constraint satisfaction. In this paper, given the additional details that have been considered, the control law obtained by the two-step process is more complex and depends on several engine and car parameters. For this reason, the theoretical results are weaker; but, under conservative assumptions on the relative speed of the crankshaft at the times of successive injection points, and ignoring the dynamics of the throttle control loop, we can still prove that the original constraints of the problem are satisfied with the hybrid control law. Simulation data are used to validate the control laws for the full chain on power-trains of existing cars. It can be argued that a dual approach, relying on a discretized, crank-angle-based abstraction of the cylinder's FSM (see for example Reference [9]), would solve the problem tackled in this paper as well. Since we assume small excursions in engine speed, the phase of mapping the auxiliary optimal solution into the hybrid domain is likely to be easier when discrete-time abstractions are used with respect to continuous-time ones. Indeed, the cut-off problem has also been solved using this approach [10], i.e. a discrete-time abstraction of the hybrid model of the plant was used. However, the weakness of this approach lies in the lack of theoretical results asserting properties of the control law, since only numerical solutions to the auxiliary optimal discrete-time control problem can be obtained.

2. PLANT MODEL AND PROBLEM FORMULATION

In this section, a hybrid model for vehicles with a four-stroke four-cylinder gasoline engine equipped with a multi-point injection system and electronic throttle is illustrated. The model (an expansion of the model in Reference [8]) consists of four parts (see Figure 1): two continuous-time systems, modelling the power-train
dynamics and the air dynamics, respectively, and two parts each composed of four hybrid systems modelling the behaviour of each cylinder and of each injection system.

Air dynamics: The model of the quantity of air entering the cylinder during the intake run is obtained from the air flow balance equation of the manifold. The air mass m_a, loaded during an intake run, is subject to the manifold pressure (p) dynamics, which is controlled by the throttle valve actuated by a DC motor. Since we consider a short control horizon, we can assume small variations of the crankshaft speed and neglect the dependence on it in the air dynamics. The model is then

α'(t) = a_α α(t) + b_α v(t)   (1)
p'(t) = a_p p(t) + b_p α(t)   (2)
m_a(t) = c_p p(t)   (3)

where v ∈ [−V, +V] is the DC motor voltage and α is the throttle angle, which is subject to the constraint

0 ≤ α(t) ≤ 90   (4)

Powertrain model: Powertrain dynamics are modelled by the linear system

x'(t) = A x(t) + b u(t)

where x = [θ_a, ω_c, ω_w]^T represents the axle torsion angle, the crankshaft revolution speed and the wheel revolution speed. The input signal u is the torque produced by the engine and acting on the crankshaft. The model parameters A and b depend on the transmission gear, which is assumed not to change. A single-state hybrid system emits the event dead_point when the pistons reach either the top or bottom dead centers, and produces the crank angle θ.

Torque generation: The behaviour of each cylinder in the engine is abstractly represented by a finite state machine (FSM) and a discrete event system (DES) modelling torque generation. The FSM state S_i of the i-th cylinder assumes values in the set {H_i, I_i, C_i, E_i}, which correspond to the exhaust, intake, compression and expansion strokes, respectively, in the four-stroke engine

Figure 1. Engine hybrid model E. The hybrid models of the cylinders and injection systems 2, 3 and 4 are not reported due
to space limitation.

cycle. An FSM transition occurs when the piston reaches a dead center, that is, when the event dead_point is emitted. The DES describing the torque generation process of the i-th cylinder increments its sequence index k by one at each transition of the FSM. Its inputs are the masses m_{a,i} and m_{v,i} of air and fuel loaded during the intake phase; its output is the torque u_i(k) generated by the cylinder. At the transition I_i → C_i, that is, at time t'_i, the event int_end_i is generated and the DES reads its inputs, storing their values in q_{a,i} and q_{v,i}. The amount of torque achievable during the next expansion phase, obtained by the fuel-to-torque gain G, is stored in z_i. The DES output u_i(k) is always zero except at the C_i → E_i transition, when it is set to the value stored in z_i. The input u_i(t) to the powertrain dynamics is obtained from u_i(k) by a zero-order holder latched on the event dead_point.

Injection process: The i-th injection system is abstractly represented by a hybrid system, whose discrete state F_i assumes values in the set {J_i, B_i, W_i, D_i}, described below:
(1) J_i: the injector is open and delivers a constant flow P of vapourized fuel. A fraction δ of it condenses in a fuel puddle and increases the mass m_{l,i} of liquid fuel; the fraction 1 − δ
increases the mass m_{v,i} of vapourized fuel in the intake runner. The mass of liquid fuel evaporates off with a time constant τ_e. The injector remains open for τ_i seconds, as measured by the timer t_i.
(2) B_i: the injector is closed and the evaporation process continues. When the next dead_point event is emitted, the intake valve opens, and the air–fuel mix begins to enter the cylinder. At the I_i → C_i transition, the intake valve closes and the int_end_i event is generated. The mass m_{v,i} of vapour is reset to zero, since all the vapourized fuel has been loaded into the cylinder.
(3) W_i: the injector is closed and evaporation proceeds.
(4) D_i: the beginning of fuel injection is synchronized with respect to the beginning of the exhaust phase with a time delay of t_d seconds, measured by the timer t_i. This delay allows the injection interval to be located with respect to the engine cycle. The value of t_d, which in general depends on the crankshaft speed, is considered constant since small engine speed variations are assumed. In this state, fuel dynamics is as in state W_i.

Engine hybrid model: The overall model E of the engine is the combination of four hybrid systems, representing the behaviour of each cylinder and its related injection system, and of the powertrain and intake manifold models, which are shared among all cylinders. The pistons are connected to the crankshaft, so that dead points are synchronous and the cycle of each cylinder is delayed one step with respect to the cycle of the previous one. Then, the dead_point events and the sequence index k are shared among all the cylinders, and only one signal u_i(t) may be different from zero at any time. The input signals are: the input voltage v(t) to the DC motor actuating the throttle valve, a scalar continuous-time signal in the class of functions R+ → [−V, +V]; and the injection intervals τ(k), a scalar discrete-time signal in the class of functions Z+ → [0, τ_max], which is sequentially distributed over the four injectors synchronously with the corresponding exp_end_i event. The
state of the overall hybrid system is a triple (q, z, x) where:
(1) q = [S_1, F_1, S_2, F_2, S_3, F_3, S_4, F_4] is the state of the FSMs associated with each cylinder and each injection system;
(2) z = [z_1, q_{v,1}, q_{a,1}, ..., z_4, q_{v,4}, q_{a,4}] is the cylinder DES state;
(3) x = [θ_a, ω_c, ω_w, p, t_1, m_{v,1}, m_{l,1}, ..., t_4, m_{v,4}, m_{l,4}] is the continuous state associated with the powertrain, the air dynamics, and each injection system.
The output of the overall system is the generated torque u.

2.1. Problem formulation

In order to reduce tailpipe emissions, the air–fuel ratio A/F of each cylinder has to be kept in a range [L_min, L_max] around the stoichiometric value L_s = 14.64. This corresponds to requiring the following constraint:

L_min q_{v,i−}(k) ≤ q_{a,i−}(k) ≤ L_max q_{v,i−}(k)   (5)

where i− denotes the index of the cylinder which enters the state C. Given a value u_d of torque produced by the engine, define in the hybrid state space the set T(u_d) of all the hybrid states (q(0), z(0), x(0)) for which there exist v(t): R+ → [−V, +V] and τ(k): Z+ → [0, τ_max] such that, for all t ≥ 0 and for all k ≥ 0, u(t) = u_d and constraints (4) and (5) are satisfied. Note that the set T(u_d) consists of all the state trajectories of the hybrid model E such that, along the entire trajectory, a constant torque u_d is generated while the constraints on the inputs v, τ and the states α, q_a, q_v are satisfied.

Problem 1
Consider the engine hybrid model E, shown in Figure 1. Let u_0 and u_d be the initial value and the desired value of the torque, respectively. Assume that, at the initial time t = 0, the hybrid state (q_0, z_0, x_0) belongs to T(u_0). Find v(t): R+ → [−V, +V] and τ(k): Z+ → [0, τ_max] such that
(1) the initial state (q_0, z_0, x_0) is steered to T(u_d) at some unspecified time t*;
(2) for all t ≥ 0 and for all k ≥ 0, constraints (4) and (5) are satisfied;
(3) the time t* is minimized.

3. AUXILIARY CONTINUOUS-TIME OPTIMAL CONTROL WITHOUT THROTTLE DYNAMICS

In this section, the interactions between fuel and air dynamics, subject to constraint (5), are
analysed by considering a continuous-time model approximating the behaviour of the engine hybrid model E. The hybrid nature of the intake process is abstracted away by using average continuous-time models for fuel and air, whose outputs are the average fuel flow, f_v(t), and air flow, f_a(t), entering the cylinders. Moreover, we abstract away the throttle actuation dynamics and consider the throttle angle α to be the air dynamics input. To solve the continuous optimal problem, we follow a two-step process: in the first step, we find the minimum-time control for the air dynamics alone; in the second step, we introduce the fuel dynamics and appropriately modify the optimal control law found in the first step to solve the continuous optimal problem at hand. The intake manifold dynamics is described by Equation (2) and constraint (4). The flow of air f_a(t) entering the cylinders is expressed as

f_a(t) = c_a p(t)

where c_a = (ω/30) c_p. Furthermore, fuel dynamics is modelled by the average model

m_l'(t) = a_l m_l(t) + b_l τ(t)
f_v(t) = c_l m_l(t) + d_l τ(t)   (6)

where f_v denotes the average fuel flow entering the cylinders, m_l denotes the average mass of liquid fuel, and a_l = −1/τ_e, b_l = δ P (ω/30), c_l = 1/τ_e, d_l = (1 −
δ) P (ω/30). The A/F constraints are rewritten for the flows f_a and f_v as follows:

L_1 p(t) + L_K m_l(t) ≤ τ(t) ≤ L_2 p(t) + L_K m_l(t)   (7)

where L_1 = c_a/(L_max d_l) > 0, L_2 = c_a/(L_min d_l) > 0, and L_K = −c_l/d_l < 0. For the continuous-time model considered here, the target set corresponding to the desired torque u_d is

T(u_d) = {(m_l, p) : p = p_d = (14.64/(c_p G)) u_d}   (8)

so that Problem 1 reduces to the following one:

Problem 2
Consider the engine continuous-time model described by Equations (2) and (6). Let u_0 and u_d be the initial value and the desired value of the torque, respectively. Assume that, at the initial time t = 0, the state (m_l(0), p(0)) belongs to T(u_0). Find α(t): R+ → [0, 90] and τ(t): R+ → [0, τ_max] such that
(1) the initial state (m_l(0), p(0)) is steered to T(u_d) at some unspecified time t*;
(2) for all t ≥ 0, constraints (7) are satisfied;
(3) the time t* is minimized.

Constraints (7) define a set of feasible values for (m_l, p), obtained for τ ranging in the interval [0, τ_max]. Since the liquid fuel mass is non-negative, the set of feasible states for the control problem at hand is defined by the following linear inequalities:

L_2 p + L_K m_l ≥ 0,  L_1 p + L_K m_l ≤ τ_max,  m_l ≥ 0   (9)

Note that, when the manifold pressure p is zero, the unique feasible state value is (m_l, p) = (0, 0), which is obtained with injection τ = 0 (neither fuel nor air is loaded by the cylinders). As a matter of fact, the evaporation of any liquid fuel m_l > 0 would produce a rich mixture with f_a/f_v < L_min.
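Under the notation reconstructed above (L_1, L_2 > 0, L_K < 0, actuator range [0, τ_max]; the parameter values in the usage note are illustrative, not from the paper), the feasibility test implied by constraints (7) and (9) can be sketched as follows:

```python
def admissible_injection(p, m_l, L1, L2, LK, tau_max):
    """Admissible injection interval implied by the A/F constraints (7):
    L1*p + LK*m_l <= tau <= L2*p + LK*m_l, intersected with the actuator
    range [0, tau_max].  The state (m_l, p) is feasible iff lo <= hi."""
    lo = max(0.0, L1 * p + LK * m_l)       # lean (maximum A/F) limit
    hi = min(tau_max, L2 * p + LK * m_l)   # rich (minimum A/F) limit
    return lo, hi

def feasible(p, m_l, L1, L2, LK, tau_max):
    lo, hi = admissible_injection(p, m_l, L1, L2, LK, tau_max)
    return lo <= hi
```

For example, with the illustrative values L1 = 1, L2 = 2, LK = −0.5, τ_max = 10, a state with a fuel puddle (m_l = 1) but zero manifold pressure is infeasible — exactly the rich-mixture situation discussed above — while (m_l, p) = (0, 1) admits any τ in [1, 2].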
Hence, if a fuel puddle is present, the manifold pressure has to be greater than zero. However, by injecting a proper amount of fuel, some values (m_l, p) on the line m_l = 0 may be feasible even if there is no fuel puddle. These values lie on the segment with extremum points (0, 0) and (0, τ_max/L_1). Note that only the segment from (0, p_d) to (−(L_2/L_K) p_d, p_d) of the target set T(u_d) belongs to the feasible set (9). Constraints (7) couple the manifold dynamics (2) and the fuel dynamics (6). If one considers the manifold dynamics (2) alone, the straightforward minimum-time control to the target point p = p_d is

α = 0 if p > p_d;  α = 90 if p < p_d;  α = −(a_p/b_p) p_d when p = p_d   (10)

The minimum time t* needed to steer an initial manifold pressure p(0) to p_d is

t* = (1/a_p) ln(p_d/p(0)) if p(0) > p_d
t* = (1/a_p) ln((a_p p_d + 90 b_p)/(a_p p(0) + 90 b_p)) if p(0) < p_d   (11)

Given an initial fuel value m_l(0), the manifold control (10) remains optimal when the fuel dynamics (6) are also considered and constraints (7) are introduced, if there exists a fuel injection signal τ(t): [0, t*] → [0, τ_max] such that constraints (7) are satisfied along the trajectory. In fact, if this is the case, the trajectory (m_l(t), p(t)) starting from (m_l(0), p(0)) reaches the target set T(u_d) at time t* without leaving the feasible set defined by (9). In the following we will: (1) find the feasible initial conditions for which the control law (10) remains optimal; (2) give the optimal control law for the remaining initial conditions. Consider first the initial conditions (m_l(0), p(0)) in the region delimited by (9) with p < p_d, where the control α = 90 is applied. For these conditions it can be shown that there are always values of τ satisfying (7) along the entire trajectory to the target set. Then, the control law α = 90, with τ equal to any value satisfying (7), steers the state to the target set in minimum time while satisfying the constraints. In this case, the time required to drive the initial condition to the
target set is determined only by the pressure dynamics and is given by (11). On the other hand, when p(0) > p_d and the control α = 0 is applied, it can be the case that no value of τ satisfying (7) exists at some point of the trajectory. Since p(t) is decreasing, this corresponds to violating the constraint f_a/f_v > L_min. To avoid (if possible) this situation, the amount of fuel flow entering the cylinders has to be minimized, by choosing the minimum admissible value for τ, i.e. τ = max{0, L_1 p + L_K m_l}. Then, for the initial conditions (m_l(0), p(0)) in the region delimited by (9) with p > p_d, where the control α = 0 is applied, there are always values of τ satisfying (7) along the entire trajectory to the target set, provided that (m_l(0), p(0)) is on the left of the trajectory obtained by backwards integration of dynamics (2) and (6) from the point (−(L_2/L_K) p_d, p_d) with α = 0 and τ = max{0, L_1 p + L_K m_l}. See Figure 2. Hence, for these initial conditions, applying the control law α = 0, with τ equal to any value satisfying (7), the target set is reached in minimum time and the constraints are satisfied. Also in this case, the time needed to steer the initial condition to the target set is determined only by the pressure dynamics and is given by Equation (11). In summary, for all the initial conditions in the feasible set (9) and on the left of this curve, the optimal controls are

α = 0 if p > p_d;  α = 90 if p < p_d;  and τ = max{0, L_1 p + L_K m_l}

Fuel dynamics is steered in such a way that it tracks the intake manifold dynamics to satisfy the air–fuel constraint (7). Since the amount of fuel puddle is small with respect to the values of the manifold pressure, the air dynamics can be controlled in minimum time to the target pressure p_d, without any interference due to the air–fuel constraint, which is handled by the injection signal only.

Figure 2. Minimum-time trajectories without throttle dynamics.

For conditions (m_l(0), p(0)) in the region delimited by
(9) and lying to the right of the curve, the air dynamics has to track the slower fuel evaporation dynamics to achieve satisfaction of the air–fuel constraint (7). Hence, for these initial conditions, the optimal feedback controls are as follows:

α = 0 if L_2 p + L_K m_l > 0;  α = [(a_l − a_p)/b_p] p if L_2 p + L_K m_l = 0;  and τ = max{0, L_1 p + L_K m_l}   (12)

According to (12), these initial conditions are first steered by the controls α = 0 and τ = max{0, L_1 p + L_K m_l} to the line L_2 p + L_K m_l = 0. Then, under the action of the control signals α = [(a_l − a_p)/b_p] p and τ = 0, they follow a sliding motion along this constraint until they reach the extremum (−(L_2/L_K) p_d, p_d) of the target set. It is worth noting that, during the sliding motion, the closed-loop system is

m_l'(t) = a_l m_l(t)
p'(t) = a_p p + b_p [(a_l − a_p)/b_p] p = a_l p

Since |a_l| < |a_p|, the pressure dynamics is slowed down to make it follow the fuel dynamics and to satisfy constraints (7). Thus, the control law for α given by (12) is optimal, since it minimizes the length of the arc of trajectory over the constraint. Summarizing:

Theorem 1
If the initial state (m_l(0), p(0)) belongs to the feasible set described by inequalities (9), then the optimal control α(t): R+ → [0, 90] and τ(t): R+ → [0, τ_max] solving Problem 2 is:

α(t) = 0 if L_2 p(t) + L_K m_l(t) > 0 and p(t) > p_d,
α(t) = 90 if p(t) < p_d,
α(t) = −(a_p/b_p) p_d if p(t) = p_d,
α(t) = [(a_l − a_p)/b_p] p(t) if L_2 p(t) + L_K m_l(t) = 0 and p(t) > p_d   (13)

τ(t) = max{0, L_1 p(t) + L_K m_l(t)}   (14)

This result can be proved by applying the Pontryagin Maximum Principle. Figure 2 shows some minimum-time trajectories to the target set (8) for dynamics (2) and (6) and constraints (7).

4. HYBRID CONTROL WITHOUT THROTTLE DYNAMICS

The continuous control law described in Section 3 must be approximated to yield a feasible control law for the hybrid model E introduced in Section 2. More precisely, in the continuous-time model adopted in Section 3, the air–fuel constraints (7) are
expressed in terms of the continuous evolutions of the manifold pressure and liquid fuel. Moreover, the control signals α and τ are assumed to be continuous-time signals. When dealing with the hybrid model E, the air–fuel constraints (5) are expressed in terms of the event-based signals q_a and q_v. In addition, the amount of air q_a loaded in the cylinder depends on the manifold pressure p at the dead center corresponding to the end of the intake. The amount of loaded fuel q_v depends on the evolution of the hybrid model describing the fuel injection system, which models the delay between the time at which the injection control signal is set and the time at which the fuel is loaded. Then, the main issues to address when we move from the continuous case to the hybrid case are:
(1) in model E there is a delay between the time at which the injection control τ_i is set (at the end of the expansion phase) and the time at which the vapourized fuel q_v is loaded (at the end of the intake phase);
(2) feasible control actions on τ_i are discrete-time signals synchronized with the crank angle. This issue is the main cause of difficulty in devising a hybrid control strategy;
(3) in model E there exist four independent fuel dynamics, controlled by inputs τ_1, ..., τ_4, whose evolutions are constrained with respect to the same air flow evolution by the A/F bounds.
The measurements available for closing the control loop are: the pressure p, the angle α and the crankshaft speed ω. Since, in the solution derived in Section 3, τ is chosen as the maximum between 0 and L_1 p + L_K m_l, fuel injection is regulated so as to maintain in (5) q_a = L_max q_v when the cylinder is in the compression stroke. The injection control τ_i for the i-th cylinder is set at the end of the expansion stroke (i.e. when the exp_end_i event is generated). Consider first the design of the fuel injection control. The continuous optimal injection control law (14) has only two possible actions: either no fuel is injected, that is τ = 0; or τ = L_1 p + L_K m_l, which corresponds to
producing a mixture with the maximum feasible value of the A/F ratio.

Hence, our strategy is mapped into the hybrid domain as follows. At time t_k, corresponding to the end of the expansion stroke, the value of τ is set to one of two possible values on the basis of the estimated values of q_a(k+2) and q_v(k+2) at time t_{k+1}, corresponding to the end of the next intake stroke:
(τ1) either τ(k) = 0, i.e. if q_a(k+2) < L_max q_v(k+2), no fuel is injected;
(τ2) or τ(k) such that q_a(k+2) = L_max q_v(k+2), i.e. a mixture with the maximum feasible value of the A/F ratio is produced (see (5)).
The estimations of q_a(k+2) and q_v(k+2) are non-trivial, since they depend not only on the values of the state components α(t_k), m_v(k), m_l(k) and p(t_k), but also on the chosen control actions τ(k) and α(t) over [t_k, t_{k+1}]. Consider now the design of the throttle control. The continuous minimum-time control (13) assumes only four possible values: α = 0, α = 90, α = [(a_l − a_p)/b_p] p and, finally, α = −(a_p/b_p) p_d when the target set has been reached. This strategy is mapped into the hybrid domain as follows:
(α1) α = 90, if p < p_d;
(α2) α = 0, if p > p_d and q_a(k+2) > L_min q_v(k+2);
(α3) α = [(a_l − a_p)/b_p] p, if q_a(k+2) = L_min q_v(k+2), so that the manifold dynamics tracks the fuel dynamics to obtain a mixture with the minimum feasible value of the A/F ratio;
(α4) α = −(a_p/b_p) p_d when the target set has been reached.
We will now show how to calculate the values of τ(k) and α(t). Consider the cases p(t_k) < p_d (Case 1) and p(t_k) > p_d (Case 2) separately.

Case 1: p(t_k) < p_d. According to (α1), suppose α(t) = 90 for all t ∈ [t_k, t_{k+1}]. Then, the value p(t_{k+1}) of the manifold pressure at time t_{k+1}, obtained by integration of the pressure dynamics, would be

p(t_{k+1}) = p(t_k) e^{a_p (t_{k+1} − t_k)} − (1 − e^{a_p (t_{k+1} − t_k)}) 90 b_p / a_p   (15)

Two cases are possible:

Case 1a: p(t_{k+1}) ≤ p_d. In this case, we can indeed set α(t) = 90 for all t ∈ [t_k, t_{k+1}], and only the value of τ(k) needs to be computed. In order to compute the value of τ(k), the amount of fuel q_v(k+2) loaded in
the cylinder at time t_{k+1} needs to be evaluated. Integration of the fuel dynamics gives

q_v(k+2) = (1 − e^{−(t_{k+1} − t_{k−1})/τ_e}) m_l(t_{k−1}) + e^{−(t_{k+1} − t_k)/τ_e} e^{t_d/τ_e} (1 − e^{−τ(k)/τ_e}) δ P τ_e + (1 − δ) P τ(k)   (16)

Hence, according to (τ1) and (τ2),

τ(k) = 0 if q_a(k+2) = c_p p(t_{k+1}) < L_max q_v(k+2);
τ(k) such that c_p p(t_{k+1}) = L_max q_v(k+2), otherwise.
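The one-step pressure prediction (15) and the minimum time (11) are both closed-form consequences of the linear manifold dynamics (2). A minimal numerical sketch, using the symbols as reconstructed above (a_p < 0, b_p > 0; the parameter values below are illustrative, not taken from the paper):

```python
import math

def pressure_next(p_k, dt, a_p, b_p, alpha=90.0):
    """Exact integration of p' = a_p*p + b_p*alpha over an interval dt
    with the throttle held constant (Equation (15) for alpha = 90)."""
    e = math.exp(a_p * dt)
    return p_k * e - (1.0 - e) * alpha * b_p / a_p

def min_time(p0, p_d, a_p, b_p):
    """Minimum time (11) to steer p0 to p_d under the bang-bang law (10):
    alpha = 0 when p0 > p_d (free decay), alpha = 90 when p0 < p_d."""
    if p0 > p_d:
        return math.log(p_d / p0) / a_p
    return math.log((a_p * p_d + 90.0 * b_p) / (a_p * p0 + 90.0 * b_p)) / a_p
```

With illustrative a_p = −1, b_p = 0.1 (full-throttle equilibrium pressure 9), steering p from 1 to 5 takes t* = ln 2, and pressure_next reproduces p(t*) = 5, confirming that (11) and (15) are mutually consistent.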
Optimization methods: a problem where the matrix in Newton's method has rank one
Answer (in English): The Newton-Raphson method is an iterative optimization algorithm used to locate a local minimum or maximum of a given function. Within the realm of optimization, the Newton-Raphson method iteratively updates the current solution by leveraging second-derivative information about the objective function. This approach enables the method to converge towards the optimal solution at an accelerated pace compared to first-order optimization algorithms such as gradient descent. Nonetheless, the Newton-Raphson method requires the solution of a system of linear equations involving the Hessian matrix, i.e. the matrix of second derivatives of the objective function. Of particular note, when the Hessian matrix has rank one, this introduces a special case for the Newton-Raphson method.
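One concrete computation that arises in such rank-one problems: when the Hessian is a rank-one modification of an easily inverted matrix, H = B + u vᵀ, the Newton step −H⁻¹∇f can be obtained from the Sherman–Morrison formula without factoring H. A minimal sketch (the function and variable names are our own, not from any particular textbook exercise):

```python
import numpy as np

def newton_step_rank_one(grad, B, u, v):
    """Newton step -H^{-1} grad for H = B + u v^T, via Sherman-Morrison:
    (B + u v^T)^{-1} g = B^{-1} g - (B^{-1} u)(v^T B^{-1} g)/(1 + v^T B^{-1} u).
    Valid whenever B is invertible and 1 + v^T B^{-1} u != 0."""
    Big = np.linalg.solve(B, grad)      # B^{-1} g
    Biu = np.linalg.solve(B, u)         # B^{-1} u
    denom = 1.0 + v @ Biu
    return -(Big - Biu * (v @ Big) / denom)
```

A quick sanity check is to compare the step against a direct solve with the assembled matrix B + u vᵀ; the two agree to machine precision.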
Existence and multiplicity of radial solutions of the Schrödinger–Maxwell equations (in English)
In 1926, the Austrian physicist Erwin Schrödinger proposed his wave equation, and radial solutions of the Schrödinger–Maxwell equations have been studied ever since. The equation describes the behavior of an electron in an atom and is used to calculate its energy levels. The radial solution was found to be valid for all values of the angular momentum quantum number l, which means that it can describe any type of atomic orbital. The existence and multiplicity of such radial solutions have been studied extensively. It has been shown that there are infinitely many solutions for each value of l, with each one corresponding to a different energy level. Furthermore, these solutions can be divided into two categories: bound states and scattering states. Bound states have negative energies and correspond to electrons that are trapped within the atom; scattering states have positive energies and correspond to electrons that escape from the atom after being excited by external radiation or by collisions with other particles. The existence and multiplicity of these solutions are important because they provide insight into how atoms interact with their environment through electromagnetic radiation or collisions with other particles. They also help us understand why certain elements form molecules when combined, as well as why some elements remain stable while others decay over time through radioactive processes such as alpha decay or beta decay.
Acoustic-index-based dynamic changes of sound diversity in Shennongjia National Park
The results show that the ACI index does not reflect the daily variation trend well, while the BI and NDSI indices show clear daily variation trends, consistent with the dawn/dusk chorus habits of the species. The spatial variation of the acoustic indices along the elevation gradient shows that the ACI and BI indices reach their maxima in the mid-elevation region, with ACI strongly correlated with elevation, while NDSI shows no significant trend. [Conclusion] The BI and NDSI indices can better reflect the dynamics of animal sound diversity.
Abstract: [Objective] The study aims to evaluate the response of acoustic indices to the dynamic changes of animal sound diversity, and further to explore the characteristics of the variation of animal sound diversity in Shennongjia National Park, China, in order to provide a quantitative basis for local ecological protection. [Method] We deployed nine sound recording devices at nine sampling sites in Shennongjia National Park, and sound recording data from May to July 2021 were obtained. A time series of ecoacoustic indices, including the acoustic complexity index (ACI), bioacoustic index (BI) and normalized difference soundscape index (NDSI), were extracted from the recording data after noise
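For readers unfamiliar with NDSI: it is commonly computed as (B − A)/(B + A), where A is the spectral energy in an "anthrophony" band (often 1–2 kHz) and B the energy in a "biophony" band (often 2–8 kHz). A minimal sketch assuming those common default bands (the study's exact settings are not given in this excerpt):

```python
import numpy as np
from scipy.signal import welch

def ndsi(x, fs, anthro=(1000, 2000), bio=(2000, 8000)):
    """Normalized Difference Soundscape Index from a mono signal:
    (bio_energy - anthro_energy) / (bio_energy + anthro_energy),
    using Welch power spectral density estimates in the two bands."""
    f, psd = welch(x, fs=fs, nperseg=4096)
    a = psd[(f >= anthro[0]) & (f < anthro[1])].sum()
    b = psd[(f >= bio[0]) & (f < bio[1])].sum()
    return (b - a) / (b + a)
```

As a sanity check, a pure 4 kHz tone (inside the biophony band) yields an NDSI near +1, while a pure 1.5 kHz tone (inside the anthrophony band) yields an NDSI near −1.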
Robust Control and Estimation
Robust control and estimation are essential components of modern engineering systems, providing the ability to maintain stability and performance in the face of uncertainties and disturbances. In control theory, robust control techniques aim to design controllers that can effectively handle variations in system parameters or external disturbances, ensuring that the system remains stable and performs as desired. Robust estimation techniques, on the other hand, focus on accurately estimating the state of a system despite uncertainties in the measurements or model inaccuracies.

One of the key challenges in robust control and estimation is dealing with uncertainties in the system model. Real-world systems are often subject to variations in parameters or external disturbances that are difficult to predict or quantify accurately. Traditional control and estimation techniques that rely on precise mathematical models may fail in such scenarios, leading to poor performance or instability. Robust control and estimation techniques, by contrast, are designed to handle these uncertainties by incorporating them into the design process and ensuring that the system remains stable and performs well under a wide range of operating conditions.

Robust control techniques typically involve advanced mathematical tools such as H-infinity control, mu-synthesis, or robust model predictive control. These techniques allow engineers to design controllers that can guarantee stability and performance even in the presence of uncertainties. By considering the worst-case scenario and optimizing the controller design to handle these extreme conditions, robust control techniques provide a higher level of confidence in the system's performance. Similarly, robust estimation techniques play a crucial role in accurately estimating the state of a system despite uncertainties in the measurements or model inaccuracies.
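As a small illustration of the worst-case viewpoint: the H-infinity norm of a stable transfer function is its peak frequency-response magnitude, and a crude numerical proxy can be computed on a frequency grid (this sketch is a grid approximation, not a true norm computation; the system coefficients below are illustrative):

```python
import numpy as np
from scipy import signal

def peak_gain(num, den, w=None):
    """Grid-based proxy for the H-infinity norm of a stable SISO system:
    the largest frequency-response magnitude over a log-spaced grid."""
    if w is None:
        w = np.logspace(-2, 3, 2000)
    _, H = signal.freqresp(signal.TransferFunction(num, den), w)
    return np.abs(H).max()
```

For the lightly damped system 1/(s^2 + 0.1 s + 1), the peak gain is about 10, i.e. a disturbance near the resonant frequency is amplified roughly tenfold — exactly the kind of worst case that robust design guards against.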
Kalman filtering, robust observers, and adaptive estimation algorithms are commonly used in robust estimation to improve the accuracy and reliability of state estimation. By incorporating robust estimation techniques into the control system, engineers can ensure that the controller receives accurate and reliable state information, leading to better control performance.

In addition to handling uncertainties, robust control and estimation techniques also offer other benefits, such as improved robustness to sensor noise, modeling errors, and external disturbances. By designing controllers and estimators that are robust to these factors, engineers can enhance the overall performance and reliability of the system. Moreover, robust control and estimation techniques can also simplify the tuning process for controllers, as they are designed to be more forgiving of variations in system parameters.

Overall, robust control and estimation play a critical role in ensuring the stability, performance, and reliability of modern engineering systems. By incorporating robust techniques into the design process, engineers can create systems that are more resilient to uncertainties and disturbances, leading to improved overall performance and reliability. As technology continues to advance and systems become more complex, the importance of robust control and estimation techniques will only continue to grow, making them essential tools for engineers in various fields.
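As a concrete instance of the estimation side, here is a minimal scalar Kalman filter — the simplest of the estimators mentioned above; the model and noise variances are illustrative assumptions:

```python
def kalman_1d(zs, x0, P0, q, r, a=1.0):
    """Scalar Kalman filter for x_k = a*x_{k-1} + w_k (process var q),
    z_k = x_k + v_k (measurement var r).  Returns the filtered estimates."""
    x, P, est = x0, P0, []
    for z in zs:
        # predict
        x = a * x
        P = a * a * P + q
        # update with measurement z
        K = P / (P + r)          # Kalman gain
        x = x + K * (z - x)
        P = (1.0 - K) * P
        est.append(x)
    return est
```

Filtering noisy measurements of a constant converges toward the underlying value, with the gain K shrinking as the estimate's confidence grows — the mechanism by which the filter trades off model and measurement uncertainty.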
Finding compact coordinate representations for polygons and polyhedra
1 Introduction
To achieve reliable programs that implement geometric computations, one must understand and control the effect of numerical error. One approach to controlling numerical error is to eliminate it by working only with objects and transformations that can be represented with numbers in a representable subfield of the reals. For linear objects (e.g., points, lines, planes), using exact (i.e., arbitrary-precision) rational numbers would allow exact computation of intersections. Unfortunately, the situation is different with rotation, a commonly used geometric transformation: rotating a line with rational coefficients can yield a line with irrational coefficients. Pythagorean triples can be used to approximate any rotation arbitrarily closely by a rational rotation, a rotation that can be represented by a matrix with rational entries [7, Section 4.3.3]. Unfortunately, when rational representations are used, iteration of geometric transformations, like rotation, can cause unbounded growth in the precision needed to represent transformed objects, which can lead rapidly to unacceptable performance of geometric computations. We shall focus here on the precision growth problem. Precision growth arises from a sequence of geometric transformations. As a simple example, consider a sequence of r rotations about the coordinate axes (any rotation can be expressed as a sequence of rotations about the axes), each of which is to be approximated as above by a rational rotation with an angle of rotation accurate to one part in 2^P. It is easy to show that after applying these r rotations to a point, the
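The precision growth described above can be observed directly with exact rationals: parametrize a rational rotation by a Pythagorean triple (p² − q², 2pq, p² + q²) and iterate it. A small sketch (Python's Fraction stands in for an arbitrary-precision rational type):

```python
from fractions import Fraction

def rational_rotation(p, q):
    """cos and sin of a rational rotation from the Pythagorean triple
    (p^2 - q^2, 2 p q, p^2 + q^2): both are exact rationals."""
    d = p * p + q * q
    return Fraction(p * p - q * q, d), Fraction(2 * p * q, d)

def rotate(pt, rot):
    c, s = rot
    x, y = pt
    return (c * x - s * y, s * x + c * y)

rot = rational_rotation(2, 1)          # cos = 3/5, sin = 4/5
pt = (Fraction(1), Fraction(0))
for _ in range(10):
    pt = rotate(pt, rot)
# After 10 exact rotations the coordinates require denominator 5**10:
# the number of bits needed grows linearly in the number of rotations.
```

For this triple no cancellation ever occurs (the denominator after k steps is exactly 5^k), so the representation grows without bound, which is precisely the precision-growth problem the text describes. Note that the rotation remains exact: x² + y² stays equal to 1.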
J. Mol. Biol. (1995) 246, 28–34

COMMUNICATION

Tetravalent Miniantibodies with High Avidity Assembling in Escherichia coli

Peter Pack, Kristian Müller, Ralph Zahn and Andreas Plückthun*

Biochemisches Institut, Universität Zürich, Winterthurerstr. 190, CH-8057 Zürich, Switzerland

We have designed tetravalent miniantibodies assembling in the periplasm of Escherichia coli. They are based on single-chain Fv fragments, connected via a flexible hinge to an amphipathic helix which tetramerizes the molecule. The amphipathic helix is derived from the coiled coil helix of the transcription factor GCN4, in which all hydrophobic a positions of every heptad repeat have been exchanged to leucine and all d positions to isoleucine. Gel filtration shows tetramer assembly of the miniantibody even at low concentrations. As expected, the functional affinity (avidity) of the tetravalent miniantibody is higher in ELISA and BIAcore measurements than that of the bivalent construct, and the gain is dependent on surface epitope density.

Keywords: antibodies; single-chain Fv fragments; leucine zipper; BIAcore; multivalency

*Corresponding author

The low intrinsic affinity of antibodies produced early after immunization is compensated for very effectively by means of multivalency. This natural strategy of polymerization of binding sites, for instance in the form of a decavalent immunoglobulin M, leads to an increase in functional affinity (avidity) toward multimeric antigens, such as viral or bacterial surface epitopes, of several orders of magnitude (Devey & Steward, 1988). The gain in stability of antibody-antigen complexes depends on the thermodynamic affinity of a single binding site (intrinsic affinity), the number of binding sites per molecule and the number of epitopes within reach, and is strongly influenced by geometric factors (Crothers & Metzger, 1971; Kaufman & Jain, 1992). We previously utilized the benefits of bivalency by designing miniantibodies (Pack & Plückthun, 1992). The model with which the design
was tested was the well-characterized phosphorylcholine-binding antibody McPC603 (Satow et al., 1986) in the form of a single-chain Fv fragment (Glockshuber et al., 1990), connected to an association motif via a flexible hinge region. Two different motifs have previously been tested: parallel associating coiled coil helices (O'Shea et al., 1991) and antiparallel associating helices of a four-helix bundle design (Ho & DeGrado, 1987). In the latter case, when using only one bundle helix per fragment, nevertheless mostly dimers were obtained. Only a small fraction of the affinity-purified protein behaved consistently with tetramers in ultracentrifugation studies (Pack & Plückthun, 1992). Dimeric miniantibodies with further improved binding characteristics could be formed using a helix-turn-helix motif instead of a single helix for association. The resulting scdHLX miniantibody is expressed in yields of several hundred milligrams per liter culture in high cell density fermentation of Escherichia coli (Pack et al., 1993). Here we report a new application of helical association domains to produce tetravalent miniantibodies assembling in vivo in the periplasm of E. coli.

It is based on the observation of P. Kim, T. Alber and co-workers that the simultaneous change of four amino acid residues in heptad position a and four amino acid residues in heptad position d of the leucine zipper dimerization domain of the GCN4 protein (Figure 1) can result in the formation of highly stable trimeric and tetrameric synthetic peptides (Harbury et al., 1993). Changing of the hydrophobic residues in position a to leucine and the

Present addresses: P. Pack, MorphoSys GmbH, Frankfurter Ring 193a, D-80807 München, Fed. Rep. Germany; R. Zahn, MRC Unit for Protein Function and Design, Lensfield Road, Cambridge CB2 1EW, United Kingdom.

Abbreviations used: BIAcore, biosensor device; Ig, immunoglobulin; BSA, bovine serum albumin; BSA-PC, conjugate from BSA and PC; ELISA, enzyme-linked immunosorbent assay; PBS, phosphate buffered saline; PC, phosphorylcholine; RU, resonance units; scFv, single-chain Fv fragment; scTETRAZIP, fusion protein from modified tetramerizing leucine zipper and single-chain Fv fragment; scZIP, fusion protein from leucine zipper and single-chain Fv fragment; scdHLX, fusion protein from helix-turn-helix peptide and single-chain Fv fragment.

Figure 1. Sequence comparison of the unmodified GCN4 zipper (amino acid residues 249 to 281 of the GCN4 protein), used as an association handle in the construct scZIP, and the zipper modified according to Harbury et al. (1993), used in the construct scTETRAZIP. Positions exchanged in the heptads of the modified zipper part are indicated in bold type.

conserved leucine residues in position d to isoleucine leads to a parallel four-helix complex, as revealed by X-ray crystal structure analysis and deduced from re-equilibration experiments of peptides carrying additional cysteine residues at the N or C terminus. According to the findings of Harbury et al. (1993), the coiled coil packing of the dimer cannot accommodate the side-chains of isoleucine at position d pointing into the hydrophobic core, or at least the
tetrameric packing is energetically favored. Deviations from two-stranded association have also been seen with several other related peptides, in which the nature of the hydrophobic a and d positions in the typical heptad repeat (Lupas et al., 1991) of coiled coil sequences was altered (Lovejoy et al., 1993; Zhu et al., 1993). The tetramerizing protein was constructed and expressed in E. coli in analogy to the dimeric miniantibodies (Pack & Plückthun, 1992) by fusing the modified GCN4 zipper to the single-chain Fv fragment via a hinge region. Here we describe the successful expression and purification of this tetrameric miniantibody (named scTETRAZIP) and compare its functional behavior with a miniantibody dimerized with the unmodified GCN4 zipper (named scZIP). Despite the similarity of the individual domains of the constructs, the protein yield of the tetrameric miniantibody was initially drastically lower (1.5 µg/l per A550 unit) than for the dimer miniantibody. This problem was circumvented by exchanging three amino acid residues in the heavy chain which are important for efficient folding (Knappik & Plückthun, 1995); additionally, four amino-terminal amino acid residues were added for detection with the anti-FLAG antibody (Knappik & Plückthun, 1994). The parallel, tetrameric nature of this protein may increase its sensitivity to inefficient folding, as the whole tetramer might be lost if just one of the scFv units fails to fold and starts to aggregate cooperatively. These mutations (Knappik & Plückthun, 1995) as well as slight modifications in the production procedure (Figure 2) increased the yield of functional tetrameric protein 100-fold, to 0.2 mg/l per A550 unit in shake flasks. To keep the binding domains identical for functional comparisons, the stabilizing mutations and the short FLAG epitope were also introduced into the dimeric construct scZIP. Starting from a higher yield, the relative increase was smaller, and finally 0.8 mg/l per A550 unit were obtained in shake flasks.

Size exclusion chromatography was used to analyze the oligomerization state of the miniantibodies (Figure 2). This was to test whether the tetramerization seen with peptides in ultracentrifugation studies at concentrations from 20 to 200 µM (Harbury et al., 1993) is also seen in proteins purified from E. coli at a concentration as low as 0.5 µM. The highly symmetrical peak at the expected molecular weight of 130 kDa in the chromatogram demonstrates that a stable tetramer is purified. In comparison, the miniantibody scZIP elutes at half the molecular weight, corresponding to dimers, while the scFv fragment elutes at the expected monomeric molecular weight. All these proteins give single peaks in the elution profile, indicating a stable oligomerization for the zipper as well as for the tetra-zipper miniantibody, and no unspecific aggregation of the scFv fragment.

Figure 2. Size exclusion chromatography of the scFv fragment and the two miniantibody constructs, scZIP (containing the GCN4 zipper) and scTETRAZIP (containing the modified leucine zipper), at 1 µM protein. The column used was Superdex-200 (Pharmacia), equilibrated and run in borate-buffered saline (BBS: 200 mM borate/NaOH (pH 8.0), 160 mM NaCl). Standards (open circles) were chymotrypsin (25 kDa), ovalbumin (43 kDa), aldolase (158 kDa) and ferritin (440 kDa). The expression and purification of the miniantibodies largely followed published procedures (Pack & Plückthun, 1992, and references cited therein). However, a more robust E. coli strain derived from RV308, allowing longer induction times, was used, and cells were disrupted by sonification. During phosphorylcholine affinity chromatography of the tetrameric, but not of the dimeric, miniantibody, a protein co-purified, subsequently identified as GroEL (P. Pack, R. Zahn & A. Plückthun, unpublished observation), which necessitated an additional size exclusion chromatography purification step on a Superdex-200 column.

The binding characteristics of the dimeric and tetrameric miniantibody were compared
with surface plasmon resonance using the BIAcore instrument (Jönsson et al., 1991) and ELISA. For both methods, phosphorylcholine coupled to bovine serum albumin (BSA-PC) was used for coating the chip or plate. Under these conditions, binding of a monomeric scFv is not detectable by ELISA due to the relatively weak intrinsic binding constant of 1.6 × 10^5 M^-1 (Metzger et al., 1971; Glockshuber et al., 1990). The fast dissociation rate of the monomeric complex of 10 to 38 s^-1 (Goetze & Richards, 1977) does not allow one to observe the on- and off-rates directly with the BIAcore instrument, but a steady state is reached quickly. The bivalent complex formed by bivalent miniantibodies is more stable and overcomes the unfavorable dissociation rate of the monomeric scFv (Pack & Plückthun, 1992). The BIAcore sensograms at three different antigen immobilization densities and various miniantibody concentrations of the dimeric and the tetrameric miniantibody are compared in Figure 3. The different binding behavior of the dimer and the tetramer can clearly be seen and binding constants can be estimated. The problem of multivalency has been addressed with the BIAcore method (Ito & Kurosawa, 1993). It should be kept in mind, however, that a quantitative evaluation is made difficult by the high coating densities required for obtaining dimeric binding, which bring the on-rates into the neighborhood of mass-transport control and cause rebinding in the dissociation phase. Furthermore, the individual components of the observed multiphasic kinetics are hard to separate. To prevent mass transport limitation, the flow rate was chosen to be in a range where the response was shown to be flow-rate-independent. This required a significantly higher flow rate for the tetrameric protein than for the dimer. Rebinding, which is often a problem for the evaluation of intrinsic affinities, is in our case an interesting property of the molecules to be compared, and it accounts partially for the avidity effect.

At the lowest immobilization density the dimeric miniantibody shows mostly monovalent binding, as indicated by immediately reaching a steady state (Figure 3A). The reaction is too fast to be kinetically resolved by the instrument. Hence only the equilibrium constant can be estimated by plotting signal versus concentration. The value found was about 10^6 M^-1. Two facts indicate that dimeric binding occurs at higher coating densities. First, the association kinetics change, which is best explained by a slower reaction overlaying the fast monomeric kinetics. Second, the relative remaining signal at time points after the monomeric dissociation is complete (e.g. 100 seconds) increases with the coating density (Figure 3C). Rebinding is likely to account for a large part of this signal, and this rebinding has to be dimeric, as it is not observed with the scFv monomer. Although this experiment does not rule out dimeric binding to a single BSA-PC molecule, the dependence on coating density makes it more likely that the miniantibody bridges two BSA-PC molecules. The second, slow phase during the association and dissociation of the dimer is likely to be due to bivalent binding with strict stereochemical requirements.
The tetramer never shows an immediate plateau and thus always binds at least bivalently. Apparently, even at the lowest antigen concentration, BSA-PC molecules can be bridged or bound bivalently, probably because of the larger distance spanned by the tetramer. Additionally, there are many more combinations of two binding sites out of the four, of which one may fit favorably. The overall reaction of the tetramer has a slower on-rate but also a much slower off-rate, leading to a notably higher affinity than the dimer, despite being at the border of mass-transport limitation, as evidenced by measurements at various flow rates (data not shown). A closer look at Figure 3 reveals that this reaction deviates slightly from monophasic kinetics. The difference between the tetramer and the dimer in the rate constants is critically dependent on the conditions and the time point of comparison. At low coating densities and high tetramer concentrations, multiple binding is not favored and rebinding is hindered by the surface being covered by the miniantibody itself (Figure 3A). Thus, in the latter case, the tetramer kinetics have a significant portion of fast kinetics, which is best explained by monomeric binding. At high coating densities and low concentrations, the dissociation of the tetramer is reduced to zero. This effect is also seen in the second part of the dissociation of the dimer (Figure 3C). At all coating densities and concentrations, even when taking the different masses of scZIP and scTETRAZIP into account, the tetramer reaches much higher response units, which also reflects a higher avidity. To get further insight into the binding properties of the tetramer, the running buffer in the dissociation phase was supplemented with PC. This causes an immediate elution (Figure 3), demonstrating that the binding is entirely specific. The fast and efficient interception of surface binding by soluble PC also shows that there must be a high fluctuation of the individual binding sites, which makes the tetramer elute with the enormously fast intrinsic off-rate when soluble PC is added.

To obtain a more quantitative comparison of the binding properties, the sensograms were analyzed using a pseudo-first-order kinetic model implemented in the vendor software (Karlsson et al., 1991), which gives an idea of the overall binding properties using the linear range of the relevant plot (Table 1). Since this model is not valid for the observed kinetics, which are more complicated, as easily seen in the case of the dimer, a biphasic association and dissociation model was used to fit the data obtained with the medium coating density surface (Figure 3B). Using this model, the fast components in the association phase are relatively similar (Table 1). The striking difference between the dimeric and tetrameric miniantibody is, according to this model, based on the difference in the dissociation rate and the different contributions (amplitudes) of the fast and slow kinetics to the signal.

Figure 3. Comparison of BIAcore runs of the dimeric and tetrameric miniantibody at different coating densities and increasing concentrations of the miniantibodies. Coating densities were: A, 1270 BIAcore resonance units (RU); B, 5000 RU; C, 10,000 RU. Phosphorylcholine was coupled to BSA as described (Pack & Plückthun, 1992) and the derivatized BSA was coupled to the sensor chip (CM5 research grade) using N-ethyl-N'-(dimethylaminopropyl)carbodiimide hydrochloride (EDC) and N-hydroxysuccinimide (NHS) amide coupling. The running buffer was BBS, which was also used as dialysis buffer and dilution buffer for the samples. The flow rate was 8 µl/min for the scZIP and 16 µl/min for the scTETRAZIP measurements. For the additional washing step, running buffer was supplemented with PC at a concentration of 20 mM in runs A and B and 5 mM in run C. The elevated signal level during the washing with PC reflects the refractive index of this solution. HCl (20 mM) was used for chip regeneration. In control experiments performed with underivatized BSA or without
coating, binding and the bulk effect were negligible.

Table 1. Kinetic parameters† for the dimeric and tetrameric miniantibody derived from the BIAcore measurements

Coating density      scZIP                              scTETRAZIP
(resonance units)    k_ass×10^-5    k_diss×10^2         k_ass×10^-5    k_diss×10^2
                     (M^-1 s^-1)    (s^-1)              (M^-1 s^-1)    (s^-1)
 1,270               ‡              ‡                   3§             0.2–0.3§
 5,000               1§             1–2§                0.8§           0.02–0.09§
   fast phase>       0.8            17–19               1              1–2
   slow phase>       0.03           0.3–0.6             0.06           0.02–0.05
 10,000              1§             1–2§                0.5§           0.01–0.09§

† Goetze & Richards (1977) obtained, with soluble phosphorylcholine and the whole McPC603 antibody, an intrinsic association rate constant of 9.5 to 38 × 10^5 M^-1 s^-1 and a dissociation rate constant of 10 to 38 s^-1 from NMR measurements.
‡ This reaction was too fast to be kinetically resolved; therefore only the equilibrium constant was obtained from the plateau level (see the text).
§ Monophasic evaluation: the slope of the dR/dt versus R plots yields k_obs, which was plotted against the concentration of soluble antibody to obtain k_ass, and ln(R_1/R_n) was plotted versus time to obtain k_diss, using the vendor software as described by Karlsson et al. (1991).
> Biphasic evaluation: two independent pseudo-first-order phases describing reactions of the type A + B ⇌ AB were used. The assumptions were: [A] is constant, [B] = R_max − R, and [AB] equals the resonance signal R. Integration of the rate equation

    dR/dt = k_ass[A](R_max − R) − k_diss R

gives an expression for the resonance signal as a function of time:

    R = (k_ass[A]R_max / (k_ass[A] + k_diss)) (1 − e^(−(k_ass[A]+k_diss)t)).

Thus, the total expression used in fitting is of the form

    R = C_1(1 − e^(−k_obs1 t)) + C_2(1 − e^(−k_obs2 t)) + C_3,

where C_1, C_2, k_obs1 and k_obs2 were the fitted parameters and C_3 was the fixed baseline. An analogous fit was used for the dissociation phase. A Marquardt–Levenberg algorithm implemented in the program Origin was used for fitting. The association constant was obtained from the plot of k_obs versus concentration.

In the case of the dimer, the amplitude ratio of the fast to the slow kinetics ranges from 1.7 to 3.9, whereas the same ratio for the tetramer ranges from 0.06 to 0.45. During the dissociation phase the difference in this ratio is even higher. We propose that this is an indication of the fraction of molecules displaying monovalent versus multivalent binding.

The dimeric and tetrameric miniantibodies were also compared in their binding properties in a competition ELISA. In this assay, increasing amounts of the zipper and tetrazipper miniantibodies were added to a constant amount of TEPC15 antibody (Figure 4A), a related IgA which recognizes the same antigen (Perlmutter et al., 1984). The second, enzyme-linked antibody detected the constant a domain without cross-reaction. The displacing effect of the tetramer clearly exceeds the competition of the dimeric miniantibody, of which about a tenfold higher concentration is required for the same reduction in binding of the IgA. This is another validation that the avidity of the tetrameric molecule is increased. In a direct comparison of ELISA signals obtained with increasing amounts of the dimeric and tetrameric miniantibodies, a sevenfold higher concentration of the scZIP molecule is needed to obtain the same response (Figure 4B). Since in this case the observed signal is mediated by a secondary antibody directed against the variable domains, part of the sensitivity increase may be due to more secondary antibodies binding to the twofold larger scTETRAZIP. However, the different shape of the response curve is inconsistent with this being the only cause of the increased sensitivity of the scTETRAZIP. Rather, there must also be an increase in avidity, which is consistent with the competition ELISA and BIAcore data (see above). The gain in stability of scTETRAZIP-antigen complexes in ELISA is also seen in comparing the response as a function of antigen coating density (Figure 4C and D). A steeper increase in the response is seen for the scTETRAZIP than for the scZIP when unspecific binding is accounted for (Figure 4C and D). This suggests that more molecules find a suitable second binding partner in the case of the tetramer than the dimer at low coating densities. Furthermore, a further binding site may become involved in the case of the scTETRAZIP at high coating density. Since the inhibition of binding by soluble hapten is more quantitative and the reproducibility of the signal is better in the case of the scTETRAZIP than the dimeric scZIP, we conclude that the tetramerization of the zipper sequences leads to a better shielding of the hydrophobic core, resulting in fewer hydrophobic and unspecific interactions than with the GCN4 zipper.
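The pseudo-first-order model used for the BIAcore evaluation can be sanity-checked numerically: integrating dR/dt = k_ass[A](R_max − R) − k_diss·R from R(0) = 0 reproduces the closed-form expression quoted in the Table 1 footnote, with plateau k_ass[A]R_max/(k_ass[A] + k_diss). The sketch below uses hypothetical parameter values chosen only to be loosely in the range of Table 1; it is not a fit to the paper's data.

```python
import math

def response(t, k_ass, A, k_diss, R_max):
    """Closed-form solution of dR/dt = k_ass*[A]*(R_max - R) - k_diss*R
    with R(0) = 0 (association phase of a pseudo-first-order binding model)."""
    k_obs = k_ass * A + k_diss
    return (k_ass * A * R_max / k_obs) * (1.0 - math.exp(-k_obs * t))

# Hypothetical values, loosely in the range of Table 1:
# k_ass ~ 1e5 M^-1 s^-1, k_diss ~ 1e-2 s^-1, 100 nM analyte, R_max = 1000 RU.
k_ass, k_diss, A, R_max = 1e5, 1e-2, 100e-9, 1000.0

# Numerically integrate the rate equation (forward Euler) and compare
# with the closed-form solution at the end of a 300 s association phase.
R, dt = 0.0, 0.01
for _ in range(int(300 / dt)):
    R += dt * (k_ass * A * (R_max - R) - k_diss * R)
closed = response(300.0, k_ass, A, k_diss, R_max)
assert abs(R - closed) < 1.0   # Euler result matches the closed form

# The plateau (t -> infinity) is k_ass[A]R_max / (k_ass[A] + k_diss):
plateau = k_ass * A * R_max / (k_ass * A + k_diss)
print(round(plateau, 1))  # → 500.0
```

With these made-up numbers the plateau sits at half of R_max because k_ass[A] equals k_diss, which illustrates why the plateau level alone (footnote ‡ of Table 1) determines only the equilibrium constant, not the individual rates.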
This is very reminiscent of the comparison of the antiparallel four-helix bundles with the dimeric coiled-coil helices as dimerization motifs (Pack et al., 1993). Additionally, crystallographic data reveal that the hydrophobic surface buried by association is 900 Å^2 per helix in the parallel dimer, but 1640 Å^2 per helix in the tetrameric, altered version of the GCN4 zipper (O'Shea et al., 1991; Harbury et al., 1993). We suggest that four-helix arrangements are generally superior to dimeric coiled coil helices because the latter expose too much of the hydrophobic surface in the dimeric state. Hence single helices of the type described here are suitable for tetramers, and helix-turn-helix modules for dimers.

Figure 4. A, Competition ELISA detecting the binding of the mouse IgA TEPC15, which is displaced by either the dimeric miniantibody (W) or the tetrameric miniantibody (r). Microtiter plates were coated with 200 µg/ml BSA-PC, blocked with 3% (w/v) skim milk powder and washed with BBS. After incubating the antibody mixture for 1 h and washing with PBS-Tween, TEPC15 binding was detected with a peroxidase-coupled anti-a mouse serum (Sigma). TEPC15, purchased as ascites (Sigma), was reduced, alkylated (Goetzl & Metzger, 1970) and affinity purified like the miniantibodies. B, Functional ELISA of the miniantibody constructs scZIP and scTETRAZIP. Comparison of different concentrations of the miniantibodies was carried out at constant hapten density. The ELISA wells were coated with the hapten carrier BSA-PC (200 µg/ml), and the amount of antibody fragment per well (given as mol molecules/well) is indicated (r, scTETRAZIP; W, scZIP). A rabbit polyclonal serum against the McPC603 variable domains and, as the second antibody, an anti-rabbit IgG serum were used. C, scTETRAZIP, and D, scZIP, analyzed in a competitive coating ELISA (Pack et al., 1992). Coating of the ELISA plates was carried out with a total of 300 µg/ml of a mixture of BSA and BSA-PC. The concentration of the latter is given; the amount remaining to 300 µg/ml was BSA. The amount of miniantibody was 4 × 10^-12 mol/well (antigen binding sites). Open squares give the signal in the presence of 1 mM PC. Binding was detected as in B. The maximum signal (A405) for the given concentration range was set to 100%. Inhibition of binding by soluble PC was 100% for scTETRAZIP but only 70 to 81% for scZIP. Therefore, a constant signal (about 25% of the maximum) is due to unspecific adsorption of scZIP to the plate and should be viewed as a baseline. Only the values above this level reflect functional binding and should be used in comparison with scTETRAZIP.

We have demonstrated a method to produce tetravalent scTETRAZIP miniantibodies, which have the molecular weight of bivalent F(ab')2 fragments. They exhibit enhanced binding compared to the bivalent miniantibody scZIP, as demonstrated by BIAcore and ELISA measurements. The effect of multivalency depends on the antigen density. By altering the hinge region, the antibody might be fine-tuned to a given biological problem. The design of our miniantibody may be useful in contexts where extremely high avidity is important or multiple binding is essential, like receptor cross-linking, yet the molecular weight should remain small. The general design principle should be useful to assemble a wide range of molecular complexes.

Acknowledgements

We thank Drs T. Alber and P. Kim for making data available prior to publication, Dr D. Riesenberg and colleagues for helpful discussions, and K. M. Arndt for help with the experiments.

References

Crothers, D. M. & Metzger, H. (1972). The influence of polyvalency on the binding properties of antibodies. Immunochemistry, 9, 341–357.
Devey, M. E. & Steward, M. W. (1988). The role of antibody affinity in the performance of solid phase assays. In ELISA and Other Solid Phase Immunoassays (Kemeny, D. M. & Challacombe, S. J., eds), pp. 135–153, John Wiley & Sons Ltd, London.
Glockshuber, R., Malia, M., Pfitzinger, I. & Plückthun, A. (1990). A comparison of strategies to stabilize immunoglobulin Fv-fragments. Biochemistry, 29, 1362–1367.
Goetze, A. M. & Richards, J. (1977). Structure-function relations in phosphorylcholine-binding mouse myeloma proteins. Proc. Nat. Acad. Sci., U.S.A. 74, 2109–2112.
Goetzl, E. J. & Metzger, H. (1970). Affinity labeling of a mouse myeloma protein which binds nitrophenyl ligands. Kinetics of labeling and isolation of a labeled peptide. Biochemistry, 9, 1267–1278.
Harbury, P. B., Zhang, T., Kim, P. S. & Alber, T. (1993). A switch between two-, three- and four-stranded coiled coils in GCN4 leucine zipper mutants. Science, 262, 1401–1407.
Ho, S. P. & DeGrado, W. F. (1987). Design of a 4-helix bundle protein: synthesis of peptides which self-associate into a helical protein. J. Amer. Chem. Soc., 109, 6751–6758.
Ito, W. & Kurosawa, Y. (1993). Development of an artificial antibody system with multiple valency using an Fv fragment fused to a fragment of protein A. J. Biol. Chem., 268, 20668–20675.
Jönsson, U., Fägerstam, L., Ivarsson, B., Johnsson, B., Karlsson, R., Lundh, K., Löfås, S., Persson, B., Roos, H., Rönnberg, I., Sjölander, S., Stenberg, E., Ståhlberg, R., Urbaniczky, C., Östlin, H. & Malmqvist, M. (1991). Real-time biospecific interaction analysis using surface plasmon resonance and a sensor chip technology. BioTechniques, 11, 620–627.
Karlsson, R., Michaelsson, A. & Mattsson, L. (1991). Kinetic analysis of monoclonal antibody-antigen interactions with a new biosensor based analytical system. J. Immunol. Methods, 145, 229–240.
Kaufman, E. N. & Jain, R. K. (1992). Effect of bivalent interaction upon apparent antibody affinity: experimental confirmation of theory using fluorescence photobleaching and implications for antibody binding assays. Cancer Res., 52, 4157–4167.
Knappik, A. & Plückthun, A. (1994). An improved tag based on the FLAG peptide for the detection and purification of antibody fragments in Escherichia coli. BioTechniques, 17, 754–761.
Knappik, A. & Plückthun, A. (1995). Engineered turns of a recombinant antibody improve its in vivo folding. Protein Eng., in the press.
Lovejoy, B., Choe, S., Cascio, D., McRorie, D. K., DeGrado, W. F. & Eisenberg, D. (1993). Crystal structure of a synthetic triple-stranded a-helical bundle. Science, 259, 1288–1293.
Lupas, A., Van Dyke, M. & Stock, J. (1991). Predicting coiled coils from protein sequences. Science, 252, 1162–1164.
Metzger, H., Chesebro, B., Hadler, N. M., Lee, J. & Otchin, N. (1971). Modification of immunoglobulin combining sites. In Progress in Immunology: Proceedings of the 1st Congress of Immunology (Amos, B., ed.), pp. 253–267, Academic Press, New York.
O'Shea, E. K., Klemm, J. D., Kim, P. S. & Alber, T. (1991). X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science, 254, 539–545.
Pack, P. & Plückthun, A. (1992). Miniantibodies: use of amphipathic helices to produce functional, flexibly linked dimeric Fv fragments with high avidity in Escherichia coli. Biochemistry, 31, 1579–1584.
Pack, P., Knappik, A., Krebber, C. & Plückthun, A. (1992). Mono- and bivalent antibody fragments produced in E. coli: binding properties and folding in vivo. In ACS Conference Proceedings Series, Harnessing Biotechnology for the 21st Century (Ladisch, M. R. & Bose, A., eds), pp. 10–13, Amer. Chem. Soc., Washington DC.
Pack, P., Kujau, M., Schroeckh, V., Knüpfer, U., Wenderoth, R., Riesenberg, D. & Plückthun, A. (1993). Improved bivalent miniantibodies, with identical avidity as whole antibodies, produced by high cell density fermentation of Escherichia coli. Biotechnology, 11, 1271–1277.
Perlmutter, R. M., Crews, S. T., Douglas, R., Soerensen, G., Johnson, N., Nivera, N., Gearhart, P. J. & Hood, L. (1984). The generation of diversity in phosphorylcholine-binding antibodies. Advan. Immunol., 35, 1–37.
Satow, Y., Cohen, G. H., Padlan, E. A. & Davies, D. R. (1986). Phosphocholine binding immunoglobulin Fab McPC603. An X-ray diffraction study at 2.7 Å. J. Mol. Biol., 190, 593–604.
Zhu, B.-Y., Zhou, N. E., Kay, C. M. & Hodges, R. S. (1993). Packing and hydrophobicity effects on protein folding and stability: effects of b-branched amino acids, valine and isoleucine, on the formation and stability of two-stranded a-helical coiled coils/leucine zippers. Protein Sci., 2, 383–394.

Edited by I. B. Holland

(Received 25 August 1994; accepted in revised form 27 October 1994)
SPE-159919 (translation)
SPE 159919: Extended Finite Element Modeling of Multiscale Flow in Fractured Shale Gas Reservoirs. M. Sheng, SPE, and G. Li, SPE, China University of Petroleum (Beijing); S. N. Shah, SPE, and X. Jin, SPE, University of Oklahoma. Copyright 2012, Society of Petroleum Engineers. This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in San Antonio, Texas, USA, 8-10 October 2012.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s).
The contents of the paper have not been reviewed by the Society of Petroleum Engineers and have not been corrected by the author(s).
The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members.
Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited.
Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied.
The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract. An economic production scheme for shale gas requires a better understanding of its gas flow regimes and the establishment of an appropriate reservoir model.
The complexity of the gas flow behavior is heightened in complex fractures and multiscale flow channels.
This paper combines an improved shale gas transport model with extended finite element modeling (XFEM) to describe the dominant flow mechanisms of shale gas and its discrete fracture network.
The shale gas reservoir is treated as a dual-permeability medium containing discrete fractures.
The discrete fractures do not need to be meshed; a fracture with a given location, length, and orientation can be placed anywhere.
The implicit coupling of rock deformation and gas flow captures the stress sensitivity of the shale gas.
In addition, the displacement across fractures and the matrix pore pressure are approximated with collections of discontinuous enrichment functions.
A computer code was developed for the model, and the code was verified against a consolidation problem in a dual-permeability medium.
The results show that, compared with the pressure field of the conventional continuous-fracture model, the pressure field of the shale gas reservoir is significantly perturbed by the discrete fractures.
It is therefore important to treat the fractures in a shale gas reservoir as discrete fractures in a porous medium.
To demonstrate the application of the above model, a case study of a shale gas reservoir is presented.
The simulation is based on two fracture network patterns in the fractured reservoir.
An orthogonal fracture network is evidently preferable to inclined fractures, because the former depletes the pore pressure field symmetrically.
Moreover, the stress-sensitive region is the main factor controlling the pressure depletion.
The results show that the proposed model and code are capable of simulating discrete fracture networks in shale gas reservoirs.
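The abstract above states that the displacement across fractures and the matrix pore pressure are approximated with sets of discontinuous enrichment functions. A standard device for this in XFEM is Heaviside enrichment of the finite element basis; the one-dimensional sketch below illustrates the idea only and is not the paper's implementation (all nodal values are hypothetical).

```python
def shape_functions(x, x0=0.0, x1=1.0):
    """Standard linear finite element shape functions on the element [x0, x1]."""
    L = x1 - x0
    return [(x1 - x) / L, (x - x0) / L]

def heaviside(x, xc):
    """Generalized Heaviside enrichment: -1/+1 across a fracture at x = xc."""
    return 1.0 if x >= xc else -1.0

def xfem_field(x, u, a, xc=0.5):
    """Heaviside-enriched 1-D approximation
    u_h(x) = sum_i N_i(x) u_i + sum_i N_i(x) H(x - xc) a_i,
    which lets the field jump at xc without the mesh conforming to the fracture."""
    N = shape_functions(x)
    std = sum(Ni * ui for Ni, ui in zip(N, u))
    enr = sum(Ni * heaviside(x, xc) * ai for Ni, ai in zip(N, a))
    return std + enr

u = [0.0, 1.0]     # standard nodal dofs (hypothetical)
a = [0.25, 0.25]   # enrichment dofs controlling the jump (hypothetical)
left = xfem_field(0.5 - 1e-9, u, a)
right = xfem_field(0.5, u, a)
# The jump across the fracture is 2 * sum_i N_i(xc) * a_i = 2 * 0.25 = 0.5
print(round(right - left, 6))  # → 0.5
```

This is exactly why the abstract can say the fractures "do not need to be meshed": the discontinuity is carried by the enrichment terms, not by element boundaries.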
Global Existence Results for the Anisotropic Boussinesq System in Dimension Two
arXiv:0809.4984v1 [math.AP] 29 Sep 2008
1. Introduction

The Boussinesq system describes the influence of the convection (or convection-diffusion) phenomenon in a viscous or inviscid fluid. It is used as a toy model for geophysical fluids whenever rotation and stratification play an important role (see for example J. Pedlosky's book [23]). In the two-dimensional case, the Boussinesq system reads:

(Bκ,ν)    ∂t θ + u · ∇θ − κ∆θ = 0,
          ∂t u + u · ∇u − ν∆u + ∇Π = θ e₂,
          div u = 0,

with e₂ = (0, 1).
Above, u = u(t, x) denotes the velocity vector field and θ = θ(t, x) is a scalar quantity such as the concentration of a chemical substance or the temperature variation in a gravity field, in which case θe₂ represents the buoyancy force. The nonnegative parameters κ and ν denote respectively the molecular diffusion and the viscosity. In order to simplify the presentation, we restrict ourselves to the whole plane case (that is, the space variable x describes the whole of R²) and focus on the evolution for positive times (i.e. t ∈ R₊). In the case where both κ and ν are positive, classical methods allow one to establish the global existence of regular solutions (see for example [6, 18]). On the other hand, if κ = ν = 0, then constructing global unique solutions for some nonconstant θ₀ is a challenging open problem (even in the two-dimensional case) which has many similarities with the global existence problem for the three-dimensional incompressible Euler equations. The intermediate situation where the diffusion acts only on one of the equations has been investigated in a number of recent papers. Under various regularity assumptions on the initial data, it has been shown that for arbitrarily large initial data, systems (Bκ,0) with κ > 0 and (B0,ν) with ν > 0 admit a global unique solution (see for example [1, 7, 13, 14, 15, 19, 20]). In the present paper, we aim at making one more step toward the study of the system with κ = ν = 0 by assuming that the diffusion or the viscosity occurs in the horizontal direction and in one of the equations only. More precisely, we want to consider the following two systems:

(1)    ∂t θ + u · ∇θ = 0,
       ∂t u + u · ∇u − ν ∂₁² u + ∇Π = θ e₂,
       div u = 0.
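The excerpt cuts off after the first system. Since the sentence above announces two systems, each with horizontal dissipation in exactly one equation, the companion system presumably places the horizontal diffusion in the temperature equation instead. A hedged LaTeX reconstruction (an assumption, not taken from the source):

```latex
% Assumed reconstruction of system (2): horizontal diffusion acts on \theta,
% while the velocity equation is inviscid (notation as in system (1)).
\begin{equation}
  \left\{
  \begin{aligned}
    &\partial_t \theta + u \cdot \nabla\theta - \kappa\,\partial_1^2 \theta = 0,\\
    &\partial_t u + u \cdot \nabla u + \nabla\Pi = \theta e_2,\\
    &\operatorname{div} u = 0.
  \end{aligned}
  \right.
  \tag{2}
\end{equation}
```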
Appl. Math. J. Chinese Univ. 2019, 34(1): 55-75

Compactness for the commutators of multilinear singular integral operators with non-smooth kernels

BU Rui1   CHEN Jie-cheng∗,2

Abstract. In this paper, the behavior of commutators of a class of bilinear singular integral operators associated with non-smooth kernels on the product of weighted Lebesgue spaces is considered. By introducing some new maximal functions to control the commutators of bilinear singular integral operators with CMO(Rn) functions, compactness of the commutators is proved.

Received: 2016-10-13. Revised: 2018-12-10.
MR Subject Classification: 42B20, 42B25.
Keywords: singular integral operator, maximal function, weighted norm inequality, commutator, compact operator.
Digital Object Identifier (DOI): 10.1007/s11766-019-3501-z.
Supported by the Natural Science Foundation of Shandong Province (Nos. ZR2018PA004 and ZR2016AB07) and the National Natural Science Foundation of China (Nos. 11571306 and 11671363).
∗Corresponding author.

§1 Introduction

In recent decades, the study of multilinear analysis has become an active topic in harmonic analysis. The first important work, among several pioneering papers, is the famous one by Coifman and Meyer [8, 9], where they established a bilinear multiplier theorem on the Lebesgue spaces. Note that a multilinear multiplier is in fact a convolution operator; naturally, one will study the non-convolution operator

T(f1, . . . , fm)(x) = ∫_{(Rn)^m} K(x, y1, . . . , ym) f1(y1) · · · fm(ym) dy1 · · · dym,   (1)

where K(x, y1, . . . , ym) is a locally integrable function defined away from the diagonal x = y1 = · · · = ym in (Rn)^{m+1}, x ∉ ∩_{j=1}^m supp fj, and f1, . . . , fm are bounded functions with compact supports. Precisely,

T : S(Rn) × · · · × S(Rn) → S′(Rn)

is an m-linear operator associated with the kernel K(x, y1, . . . , ym). If there exist positive constants A and γ ∈ (0, 1] such that K satisfies the size condition

|K(x, y1, . . . , ym)| ≤ A / (|x − y1| + · · · + |x − ym|)^{mn}   (2)

for all (x, y1, . . . , ym) ∈ (Rn)^{m+1} with x ≠ yj for some j ∈ {1, 2, . . . , m}; the smoothness condition

|K(x, y1, . . . , ym) − K(x′, y1, . . . , ym)| ≤ A |x − x′|^γ / ( ∑_{i=1}^m |x − yi| )^{mn+γ}

whenever |x − x′| ≤ (1/2) max_{1≤j≤m} |x − yj|; and also, for each j, the smoothness condition

|K(x, y1, . . . , yj, . . . , ym) − K(x, y1, . . . , yj′, . . . , ym)| ≤ A |yj − yj′|^γ / ( ∑_{i=1}^m |x − yi| )^{mn+γ}   (3)

whenever |yj − yj′| ≤ (1/2) max_{1≤k≤m} |x − yk|, then we say that K is a Calderón-Zygmund kernel and denote it by K ∈ m-CZK(A, γ). Also, T is called the multilinear Calderón-Zygmund
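As a sanity check on the size and smoothness conditions above, one can verify them by random sampling in the simplest model case m = 1, n = 1 with the Hilbert-transform kernel K(x, y) = 1/(x − y). The constants A = 2 and γ = 1 below are our choices, not from the paper; they come from the classical estimate |1/(x − y) − 1/(x′ − y)| = |x − x′| / (|x − y||x′ − y|) ≤ 2|x − x′|/|x − y|^2, valid when |x − x′| ≤ |x − y|/2 (which forces |x′ − y| ≥ |x − y|/2).

```python
# Random-sampling check that K(x, y) = 1/(x - y) lies in 1-CZK(A, gamma)
# with A = 2, gamma = 1 (constants chosen by us for this model case; the
# paper treats general m-linear kernels).
import random

A, gamma = 2.0, 1.0
K = lambda x, y: 1.0 / (x - y)

random.seed(0)
ok = True
for _ in range(10_000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    if abs(x - y) < 1e-6:
        continue                         # stay off the diagonal x = y
    # size condition (2) with m = n = 1: |K(x, y)| <= A / |x - y|
    ok &= abs(K(x, y)) <= A / abs(x - y)
    # smoothness condition in x, sampled with |x - x'| <= |x - y| / 2
    xp = x + random.uniform(-0.5, 0.5) * abs(x - y)
    bound = A * abs(x - xp) ** gamma / abs(x - y) ** (1 + gamma)
    ok &= abs(K(x, y) - K(xp, y)) <= bound + 1e-12   # tiny float slack

print(ok)  # True: both conditions hold on every sample
```

Of course this is only an illustration: the point of the paper is that compactness of commutators can be obtained even for kernels that fail the pointwise smoothness conditions (2)-(3).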