An Optimal Method of Diffusion Algorithm for Hetergeneous System
算术平均牛顿法的英文
算术平均牛顿法的英文Arithmetic-Geometric Mean Newton's Method.The arithmetic-geometric mean (AGM) Newton's method is an iterative algorithm used in numerical analysis to approximate the solution of equations, particularly those involving transcendental functions. This method is avariant of the classical Newton's method, which uses the tangent line to the function at a given point to approximate the root of the function. The AGM Newton's method incorporates the arithmetic-geometric mean (AGM) iteration, which is itself a fast converging method for computing the square root of a number.Background on Newton's Method:Newton's method is based on the Taylor series expansion of a function. Given a function f(x) and its derivativef'(x), the method starts with an initial guess x0 and iteratively updates the approximation using the formula:x_{n+1} = x_n f(x_n) / f'(x_n)。
An optimal regularization method for an inverse problem of time fraction diffusion equation
An optimal regularization method for an inverse problem of time fraction diffusionequationXiang-Tuan Xiong1∗,Xiao-Hong Liu2Department of Mathematics and Information,Northwest Normal University,Lanzhou,Gansu,730070,People’s Republic of ChinaAbstractIn this paper,we consider the time fraction inverse advection-dispersion problem(TFI-ADP)in a belt plane.The solute concentration is sought from a measured concentration history at afixed location inside the body.Such problem is obtained from the classical advection-dispersion equation by replacing thefirst-order time derivate by the Caputo frac-tional derivative of orderα(0<α<1).We show that the TFIADP is severely ill-posed and further apply an optimal modified method to solve it based on the solution given by the Fourier method.We give and prove the optimal convergence estimate,which shows that the regularized solution is dependent continuously on the data and is an approximation of the exact solution of the TFIADP.Key words:An optimal modified method;Time fraction inverse advection-dispersion equation;Fourier transform;optimal estimate1IntroductionFraction derivatives calculus and fraction differential equation have been used recently to describe a range of problems in physical,chemical,biology,mechanical engineering,signal processing and systems identification,electrical,control theory,finance,fractional dynam-ics,refer to[22,24,26].For example,in practical physical applications,Brownian motion,the diffusion with an additional velocityfield and diffusion under the influence of a constant external forcefield are both modeled by the advection-dispersion equation(ADE)[12,17]. However,in the case of anomalous diffusion this is no longer true,i.e.,the fractional gen-eralization may be different for the advection case and the transport in an external force field[22].A straightforward extension of the continuous time random walk(CTRW)model leads to a fractional advection-dispersion equation(FADE).The direct problem,i.e.,initial ∗Corresponding author.Email addresss:qianal07@;Tel:+867158144689;Fax:+867158144688value problem and initial boundary value problem for FADE have been studied extensively in the past few years[13-16,18,19,21,23,25,27,28].However,in some practical problems,the boundary data on the whole boundary cannot be obtained.We only know the noisy data on a part of the boundary or at some interior points of the concerned domain,which will lead to some inverse problem,i.e.,fractional inverse advection-dispersion problem(FIADP). To the author’s knowledge,the result for FIADP is still very sparse.2Ill-posedness of the problem and regularizationa Description of the problemIn this paper,we consider the FIADP as follows:Dαtu=u xx,x>0,t>0,(2.1) u(x,0)=0,0≤x≤1,(2.2)u(1,t)=f(t),t≥0,u(x,t)|x→∞bounded.(2.3)where u is the solute concentration.The time fractional derivative Dαt is the Caputofractional derivative of orderα(0<α<1)defined byDαt =1Γ(1−α)t∂u(x,s)∂sd s(t−s)α,0<α<1(2.4)Dαt=u t(x,t),α=1,(2.5)whereΓ(·)is Gamma function,see[24].The TFIADE is an inverse problem and is severely ill-posed.That means the solution does not depend continuously on the given data and any small perturbation in the given data many cause large change to the solution.In this paper,we use an optimal modified method to solve the TFIADE in a belt plane.In the present paper,motivateed by Ref.9,we study the time fraction inverse advection-dispersion problem(TFIADE)from a new point of view.We construct a stable approximate solution for the TFIADE and present convergence result under suitable choices of the regularization parameter.Our paper is divided into three sections:In Section2,we present the ill-posedness of the problem and propose a modified regularization method.In section3,optimal estimates for solute concentration u is given based on a priori assumption for exact solution.b Ill-posednessIn order to use the Fourier transform,we extend the function u(x,·),f(·),and fδ(·)to the whole line−∞<t<∞by defining them to be zero for t<0.Here,and in the following sections, · denotes the L2norm,i.e.f =(R|f(t)|2d t)12We now could assume that the measured data function fδ(t)satisfiesf−fδ ≤δ,(2.6)where the constantδ>0represents a bound on the measurement error.Letˆf(ξ)=1√2π ∞−∞f(t)e−iξt d t,ξ∈Rbe the Fourier transform of a function f(t).Applying this transformation to Eq.(2.1)with respect to t,we obtainˆu xx(x,ξ)=(iξ)αˆu(x,ξ)which is a one-order ordinary differential equation.Now using condition(2.3)in frequency domain,we can easily get the solution of problem(2.1-2.3)ˆu(x,ξ)=ˆf(ξ)e(1−x)(iξ)α20≤x≤1,ξ∈R(2.7) where(iξ)α2=|ξ|α2(cos απ4+isgn(ξ)sinαπ4)=:a+bi.(2.8)since|e(1−x)(iξ)α2|is unbounded with respect to variableξforfixed0<x<1,and our solutionˆu(x,ξ)with respect toξis assumed to be in L2(R),for0≤x<1the exact data function,ˆf(ξ),must decay rapidly as|ξ|→∞.However,the data f in problem(2.1-2.3) are generally based on observations,and we only have the noisy data fδ∈L2(R)with f−fδ L2(R)≤δ.Since we cannot expect the measurement dataˆfδ(ξ)have the same decay in the frequency domain as the exact dataˆf(ξ),the solutionˆuδ(x,ξ)=e(1−x)(iξ)α2ˆfδ(ξ)will not,in general,be in L2(R)forfixed0≤x<1.Thus,if we try to solve problem(2.1-2.3) numerically,high frequency components in the error,δ,are magnified and can destroy the solution.c regularizationFrom the analysis of Sec3,we know that the real cause of ill-posedness is that the noise of data in the high frequency components blows up the solution.Also from(2.7),we have two ways to stably solve the concrete inverse problem.One way is to eliminate the noise in the high frequency components through mollifying the noisy data.4,5H`a o5used the Dirichlet kernel to mollify the noisy data,and Guo and Murio4used the Gaussian kernel to mollify the noisy data.The other way is to eliminate the high frequency effect through modifying the”kernel”e(1−x)(iξ)α2.In the present paper,we are interested in the optimal regularization method and the optimal convergence estimate.Motivated by Refs.4,5,and 8-11,we discuss a regularization method,which is exactly the spectral method(6.7)of Ref.11,in the frequency domain according toˆuδβ(x,ξ)=kβ(x,ξ)ˆfδ(ξ),(2.9) wherekβ(x,ξ)=e(1−x)(iξ)α2,|e(1−x)(iξ)α2|≤β(x)β(x)e i(1−x)sgn(ξ)(ξ)α2sinαπ4,|e(1−x)(iξ)α2|>β(x).(2.10)β(x),which will be determined in Sec.3,can be considered as a constant,which depends only on the location x.To obtain the convergent rate between the regularized solution(2.9)and the exact one (2.7),like any other ill-posed problems,we need to assume the a priori boundu(0,·) p≤E,p≥0,(2.11) where E>0is a constant and · p denotes the norm in Sobolev space H p(R)defined byu(0,·) p:=(R(1+ξ2)p|ˆu(0,·)|2dξ)12.Obviously,when p=0,H p(R)=H0(R)=L2(R),and formula(2.11)is bounded in the L2(R)-norm.Obviously,the larger the p,the more restrictive is the assumption(2.11). Thus,when considering a real problem,we just assume the smallest p that satisfies our requirement.3Optimal estimateIn what follows,we study the properties of(2.9)considered as a regularized solution of problem(2.1-2.3).We give the optimal convergence estimate which shows that formula (2.9)is really an effective approximation.Wefirst analyze the properties of the modified kernel kβ(x,ξ)in(4.1).From two parts of kβ(x,ξ),we at leastfind that(a)the kβ(x,ξ)completely reserving the low frequency com-ponents(i.e.,for small|ξ|,the constructed new kernel equals to the exact kernel=e(1−x)(iξ)α2) since these components contain the main information of solution.(b)The kβ(x,ξ)elimi-nating the effect of high frequency components(i.e.,the new kernel is bounded even if|ξ| tend to infinity)since these components are the natural cause of producing ill-posedness. Property(a)indicates that the constructed regularization solution is an approximation of the exact one.Property(b)indicates that constructed regularization solution is continu-ously dependent on the data.Both properties(a)and(b)are also the basic requirement of the general regularization principle3,6.Theorem3.1.Let u be the solution of problem(2.1-2.3)with the exact data f,which canbe expressed as formula(2.7)in the frequency domain.Let uδβbe the regularized solution with the measured data fδ,which can be expressed as formula(2.9)in the frequency domain.Assume that the measured data at x=1,fδ,satisfies f−fδ ≤δ,and the a priori bound(2.11)holds.Then,if choosingβ(x)=x(Eδ)1−x,(3.1)there holds the optimal estimate for p=0andfixed0<x<1,u(x,·)−uδβ(x,·) ≤E1−xδx.(3.2)Proof.:According to(2.8),we can rewrite solution(2.7)asˆu(x,ξ)=e(1−x)(a+bi)ˆf(ξ),(3.3) wherea=|ξ|α2cos απ4,b=|ξ|α2sgn(ξ)sinαπ4,and rewrite the regularized solution(2.9)asˆuδβ(x,ξ)=kβ(x,ξ)ˆfδ(ξ),(3.4) wherekβ(x,a,b)=e(1−x)(a+bi),e(1−x)a≤β(x)β(x)e(1−x)bi,e(1−x)a>β(x).(3.5)From(3.3)we haveˆf(ξ)=e−(a+bi)ˆu(0,ξ).(3.6) Now using the Parseval relation,(3.3)and(3.4),we haveu(x,·)−uδβ(x,·) = ˆu(x,·)−ˆuδβ(x,·) = e(1−x)(a+bi)ˆf−kβ(x,a,b) .(3.7)Adding and subtracting kβ(x,a,b)ˆf,and using the triangle inequality,we getu(x,·)−uδβ(x,·) ≤ (e(1−x)(a+bi)−kβ(x,a,b))ˆf + kβ(x,a,b)(ˆf−ˆfδ) .(3.8) The second term on the right-hand side of(3.8)is easy,i.e.,kβ(x,a,b)(ˆf−ˆfδ ≤δsupξ∈R|kβ(x,a,b)|≤δβ(x),(3.9) Where we have also used the error bound f−fδ ≤δ.We now estimate thefirst term on the right hand side of(3.8),note that(3.5)and(3.6),(e(1−x)(a+bi)−kβ(x,a,b))ˆf=e(1−x)(a+bi)−kβ(x,a,b)e a+biˆu(0,ξ)=e(1−x)(a+bi)−min{e(1−x)a,β(x)}e(1−x)bie a+biˆu(0,ξ)=(e(1−x)a−min{e(1−x)a,β(x)}e(1−x)bie a+biˆu(0,ξ).Consequently,we have(e(1−x)(a+bi)−kβ(x,a,b))ˆf ≤supξ∈R |(e(1−x)a−min{e(1−x)a,β(x)})e(1−x)bie a+bi|E=supξ∈R,e(1−x)a>β(x)e(1−x)a−β(x)e aE,where we have also used the a priori bound(2.11)for p=0.Let h(a)=(e(1−x)a−β(x))/e a.Differentiating h(a)and setting the derivative equal tozero,wefinde(1−x)a=1xβ(x).(3.10)When formula(3.10)holds,the function h(a)arrives at the maximum value. Therefore,we have(e(1−x)(a+bi)−kβ(x,a,b))ˆf ≤(1x−1)β(x)(1xβ(x))11−xE=1−xx(1x)−11−xβ(x)−x1−x E.(3.11)Combining(3.8),(3.9),and(3.11)with the choice ofβ(x)in(3.1),we arrive at optimal estimate(3.2),i.e.,u(x,·)−uδβ(x,·) ≤1−xx(1x)−11−xβ(x)−x1−x E+δβ(x)=(1−xx)(1x)−11−x(x(Eδ)1−x)−x1−x E+δx(Eδ)1−x =(1−xx)(x)11−x(x)−x1−x(Eδ)(−x)E+δx(Eδ)1−x =(1−x)(Eδ)(−x)E+δx(Eδ)1−x=(Eδ)(−x)((1−x)E+δxEδ)=(Eδ)(−x)E=E1−xδx.Remark3.2:In our application u(x,·) p is usually not known,therefore we have no exact a priori bound E in(2.11)and cannot choose the parameterβ(x)according to(3.1).However,if selectingβ∗(x)=x(1δ)1−x,(3.12)we can also obtain the convergent rateu(x,·)−uδβ∗(x,·) ≤cδx,where the constant c=1/((1−x) u(0,·) p+x).This choice is helpful in our realistic computation.Although the argument above provides an approximation only for0<x<1,we may use this construction to obtain an estimate for g(t):=u(0,t).Although we will not have such a nice estimate as for0<x<1,we do obtain convergence asδ→0.In the following discussion,not only we obtain the convergence at x=0,but also we obtain the explicit convergent rate.In fact,since we are now specially for the endpoint x=0,we rewrite the regularization formula instead of(3.4)ofˆgδβ(ξ)=kβ(a,b)ˆfδ(ξ),(3.13) wherekβ(a,b)=e a+bi,e a≤ββe bi,e a>β.β,which will be given in the following theorem,can be considered as a constant. Theorem3.3:Let u be the solution of problem(2.1-2.3)with the exact data f,which can be expressed as formula(3.3)in the frequency domain.Let gδβbe the regularized solution with the measure data fδ,which can be expressed as formula(3.13)in the frequency do-main.Assume that the measured data at x=1,fδ,satisfy f−fδ ≤δ,and the a priori bound(2.11)holds.Then,if choosingβ=1c(a0)δ−r,(3.14)the constant c(a0)>1,0<r<1, there golds the convergence estimate,g−gδβ ≤c(ln1δ)−2pα,p>0.(3.15)Proof.:Following the process of proof of Theorem3.1,we haveg−gδβ = ˆg−ˆgδβ= e(a+bi)ˆf−kβˆfδ≤ e(a+bi)ˆf−kβˆf + kβˆf−kβˆfδ=e a+biˆf−kβ(1+ξ2)p2e a+bi(1+ξ2)p2e a+biˆf + kβ(ˆf−ˆfδ=e a+bi−kβ(1+ξ2)p2e a+bi(1+ξ2)p2ˆu(0,·) + kβ(ˆf−ˆfδ≤supξ∈R|e a+bi−kβ(1+ξ2)p2e a+bi|E+sup|kβ|δ=supξ∈R |(e a−min{e a,β})e bi(1+ξ2)p2e a+bi|E+supξ∈R|kβ|δ≤E supξ∈R,e a>βe a−β(1+ξ2)p2e a+δβ≤E supξ∈R,e a>βe a−βξp e a+δβ≤E supξ∈R,e a>βe a−βa2pαe a+δβwhere,a=|ξ|α2cos απ4≤|ξ|α2,therefore,|ξ|p≥a2pα.Let h(a)=(e a−β)/a2pαe a.Differentiating h(a)and setting the derivative equals to zero,we find2p αβ+a0β−e a02pαe a0a2pα=0,i.e.,e a0=c(a0)β,c(a0)=(1+αa02p)>1.Therefore,g−gδβ ≤c(a0)−1c(a0)(ln(c(a0)β))−2pαE+δβ=c(a0)−1c(a0)(r ln1δ)−2pαE+1c(a0)δ1−r ≤c(ln1δ)−2pα,δ→0,where we have also used formula(3.14).Our Theorem3.1shows that we not only obtain the convergence but also we give the optimal convergent rate for0<x<1,Theorem3.3shoes that we really get the conver-gence for the endpoint x=0.Remark3.4:It is straightforward to implement the proposed methods numerically accord-ing to(2.9)or(3.13).A numerical procedure for the method can be based on the discrete Fourier transform.However,we do not pursue this aspect in this paper,as our aim here is to obtain optimal stability estimate only.AcknowledgmentsThe author wants to express her thanks to the referee for many valuable comments.References[1]Beck,J.V.,Blackwell,B.,and Clair,C.R.S.,Inverse Heat Conduction:Ill-Posed Prob-lems(Wiley,New York,1985).[2]Eld´e n,L.,Berntsson,F.,and Regi´n ska,T.,”Wavelet and Fourier methods for solving the side-ways heat equation,”put.(USE)21,2087(2000).[3]Engl,H.W.Hanker,M.,and Neubauer,A.,Regularization of Inverse Problems(Kluwer Aca-demic,Boston,MA,1996).[4]Guo,L.and Murio,D.A.,”A mollified space-marchingfinite-difference algorithm for the two-dimensional inverse heat conduction problem with slab symmetry,”Inverse Probl.7.247(1991).[5]H`a o,D.N.,”A mollification method for ill-posed problems,”Numer.Math.68,469(1994).[6]Kirsch,A.,An Introduction to the Mathematical Theory of Inverse Problems(Springer-Verlag,Berlin,1996).[7]Murio,D.A.,The Mollification Method and the Numerical Solution of Ill-posed Prob-lem(Wiley,New York,1993).[8]Qian,Z.and Fu,C.L.,”Regularization strategies for a two-dimensional inverse heat conductionproblem,”Inverse Probl.23,1053(2007).[9]Seidman,T.I.and Eld´e n,L.,”An’optimalfiltering’method for the sideways heat equa-tion,”Inverse Probl.6,681(1990).[10]Tautenhahn,U.,”Optimal stable approximations for the sideways heat equation,”J.Inv.Ill-Posed Problems5,287(1997).[11]Tautenhahn,U.,”Optimality for ill-posed problems under general source condi-tions,”Numer.Funct.Anal.Optim.19,377(1998).[12] F.Catania,M.Massab’o.O.Paladino,Estimation of transport and kinetic parameters us-ing analytical solution of the2D advection-dispersion-reaction model,Environmetrics 17(2)(2006)199-216.[13]S.Chen,F.Liu,ADI-Euler and extrapolation methods for the two-dimensional fractionaladvection-dispersion equation,put.26(1-2)(2008)295-311.[14]V.J.Ervin,J.P.Roop,Variational formulation for the stationary fractional advection dispersionequation,Numer.Methods Partial Differential Equations22(3)(2006)558-576.[15]V.J.Ervin,J.P.Roop,Variational solution of fractional advection dispersion equations onbounded domains in Rd.Numer.Methods Partial Differential Equations23(2)(2007)256-281.[16] F.Huang,F.Liu,The fundamental solution of the space-time fractional advection-dispersionequation,put.18(1-2)(2005)339-350.[17]M.E.Khalifa,Some analytical solution for the advection-dispersion equa-tion,put.139(2-3)(2003)299-310.[18] F.Liu.V.V.Anh,I.Turner,P.Zhuang,Time fractional advection-dispersion equa-tion,put.13(1-2)(2003)233-245.[19]Q.Liu,F.Liu,I.Turner,V.Anh,Approximation of the L´e vy-Feller advection-dispersion processby random walk andfinite difference put.Phys.222(1)(2007)57-70.[20]X.Z.Lu,F.W.Liu,Time fractional diffusion-reaction equation,Numer.Math.J.Chin.Univ.27(3)(2005)267-273.[21]M.M.Meerschaert,C.Tadjeran.Finite difference approximations for fractional advection-dispersionflow equations,put.Appl.Math.172(1)(2004)65-77.[22]R.Metzler,J.Klafter,The random walk’s guide to anomalous diffusion:a fractional dynamicsapproach,Phys.Rep.339.(2000)1-77.[23]S.Momani,Z.Odibat,Numerical solution of the space-time fractional advection-dispersionequation,Numer.Methods Partial Differential Equations24(6)(2008)1416-1429.[24]I.Podlubny,Fractional differential equations:an introduction to fractional deriva-tives,fractional differential,some methods of their solution and some of their applica-tions,in:Mathematics in Science and Engineering,vol.198,Academic Press,San Diego,1999. [25]J.P.Roop,Numerical approximation of a one-dimensional space fractional advection-dispersion equation with boundary layer,Comput.Math.Appl.56(7)(2008)1808-1819.[26] E.Scalas,R.Gorenflo,F.Mainard,Fractional calculus and continuous-timefinance,Physica A284(1-4)(2000)376-384.[27]Z.Yong,D.A.Benson,M.M.Meerschaert,H.P.Scheffler.On using random walks to solve thespace fractional advection-dispersion equations,J.Stat.Phys.123(1)(2006)89-110.[28] F.Huang,F.Liu,The time fractional diffusion equation and the advection-dispersion equa-tion,ANZIAM J.46(3)(2005)317-330.。
对溶质运移模型的并行化处理
第32卷第1期吉首大学学报(自然科学版)Vol.32No .12011年1月Journ al of Ji shou Universit y (Nat ural Science Edit ion)J an.2011文章编号:1007-2985(2011)01-0026-04对溶质运移模型的并行化处理*邓永辉(湖南财政经济学院,湖南长沙410205)摘要:在了解和广泛收集前人研究资料的基础上,将并行算法引入到求解首采区卤水动态二维模型中关于溶质运移的问题中.把溶质运移方程按时间分裂法分成5个子方程,针对这5个子方程讨论了全域精细积分和子域精细积分的并行算法,给出了对流和扩散方程的子域精细积分并行算法.子域精细积分考虑了精细积分法高精度的特点,又避免了全域积分的大矩阵运算,其精度优于单点精细积分法.关键词:对流扩散方程;并行算法;偏微分方程中图分类号:O29文献标志码:A溶质运移模型的数值求解法主要包括有限元和有限差分法2种.由于有限差分法计算简单,物理意义直观,在溶质运移问题求解中得到普遍应用.但是,当对流项占优,即Peclet 数较高时,则出现数值波动和弥散.采用高阶的时间导数和空间导数离散逼近,并用极短的时间和空间步长求解,上游加权法可同时消除大部分的数值弥散,但求解效率却大大降低.总之,对差分法的改进,或消除了数值波动,或减少了数值弥散,仍然未能从根本上消除数值困难.与有限差分法相比,有限单元法比较灵活[1],求解精度高,因而成为早期求解溶质运移问题应用最广泛的数值法.1溶质运移数学模型通过几种简单模型的推导和分析,再结合首采区能引起浓度变化的各种因素,如对流、扩散、排泄、补给等等,可以导出最终目标模型如下:M(C t+V 1C x+V 2C y )=M (C x(D 11C x)+y (D 22C y))+Q(C q -C)+M f c.(1)(1)式只表明了浓度随时间变化的规律,要求出微分方程的解还需要一些定解条件,求出在特定条件下浓度的值,再计算区内边界条件如下:C(x,y,t)|t=0=C 0(x ,y),C(x,y,t)|1=C 1(x,y),M D n Cn |n=q c 2(x,y),12=,12=.其中:C 为浓度(mL -3);V 1,V 2为渗透速度(LT -1);D 11,D 22为弥散系数(L 2T -1);C q 为汇源项的浓度(mL -3);n 为外法线方向;f 为固液转化系数(T -1),若忽略固液转化作用,则f =0;C 0(x,y)为初始浓度分布函数(mL -3);C 1(x,y)为第1类边界1上的浓度分布函数(mL -3);q c 2(x,y)为第2类边界2上弥散通量分布函数(mT -1L -1);为渗流区;M 为含水层厚度(L);Q 为汇源补给项(LT -1);自由孔隙度率,即给水度,也就是从单位体积含水层中在重力作用下能够释放给出的水量所占该含水层体积的份数.*收稿日期基金项目国家科技攻关项目(B 6ZB )作者简介邓永辉(),女,湖南常德人,湖南财政经济学院讲师,硕士,主要从事应用数学领域研究:2010-11-24:2001A 0-09-1:1979-.2并行前的预处理多维的对流扩散方程,尤其当它为非线性和非定常的时候很难直接求解,而且在差分格式的选择上还有很大的局限性,即使得出结果,解的精度也不高.为了剔除这些不利因素,Yanenko 提出了一种利用时间分裂的概念[2],分步求解多维问题的方法.所谓时间分裂,笔者是这样理解的,即在某一固定的时间段内,一个物理或是化学过程中的若干反应所引起的效力,并不是同时完成的,而是存在主次先后之分.换句话说,在某一固定的时间段里,可以将一个物理或化学过程的每个子反应看成是互步相交的集合,而它们的并形成了这个时间段内完整的物理或化学过程的演变.因此,任意多维的问题都可以分解成若干个一维的偏微分方程,依次求解.对于方程(1),不步考虑排泄补给项和溶质的化学转换等,方程化为(C t +V 1C x +V 2C y)=(C x (D 11C x)+y(D 22C y))+S.(2)根据分步解法,方程(2)可以分裂成5个方程:C t=R r =1Q r (z -z r )(y -y r )(x -x r ),(3)C t =-V 1C x ,(4)C t =-V 2C x ,(5)C t =D 112C x 2,(6)Ct=D 112C y 2.(7)其中:()为Dirac 函数;Q r 为第r 个源头的源强;x r ,y r 和z r 为这个源的坐标.方程(3)为R 个源头对浓局度部变化C t 的贡献,(4)和(5)式为对流项对浓度局部变化Ct 的贡献,而(6)和(7)式为水平方向的扩散项对浓度局部变化C t的贡献.在对对流扩散方程采用时间分裂法,将多维的问题转换为5个一维的方程求解时,很大程度上增强了已知算法的可操作性.下面来研究一维扩散方程和一维对流扩散方程的并行化处理.3并行处理偏微分方程的精细时程积分法先在空间坐标上按差分法离散[3],而得到对时间的常微分方程组,然后采用全域精细积分法计算此方程组,可得到方程组的高精度解.这相当于给出了偏微分方程的半解析解.精细积分中指数矩阵T 的计算量很大,而T 的求解主要是矩阵乘法的运算.全域精细积分中,需要对矩阵运算进行并行化.文中使用的T ransputer 系统是分布式存储的多处理器系统,采用分块的矩阵乘法.变量较多时,全域精细积分的指数矩阵的存储量和计算量都很庞大.子域精细积分不需形成总体的指数矩阵,各子域的求解是相对独立的,计算量较大的指数矩阵求解可在各处理器中分别计算,它具有良好的并行计算的性质.3.1一维扩散方程的并行算法(1)全域精细积分的并行算法.考虑一维扩散方程C t =D 2Cx 2,当两端边界条件为0,用差分法在x 坐标方向离散后,可得到k +1=exp(H t)=exp(H t)k=Tk,(8)其中是常数矩阵,0是初值文献[]中给出了指数矩阵T 的精细计算方法T =x ()=[x ()]=[x (T)]=[I +T ],其中可选用=N 若是一不大的时间区段,则当N 并不很大时,=也将是一个非常小的时间27第1期邓永辉:对溶质运移模型的并行化处理H .4:e p H t e p H t/m m e p H m a m m 2.t t/m段.T 的计算相当于做N 次循环:T a =2T a +T a T a .(9)最后做T =I +T a ,这样可得到高精度的指数矩阵T =exp(Ht).Transputer 系统是分布式存储的并行多处理器系统,各处理器的数据由相互间的通讯得到.Fortr an 语言中数组的数据是按列存放的.为了通讯的方便和高效率,(9)式中矩阵的计算按列分块T a =(T a1,T a 2,,T an ).n 是处理器的数目.处理器i 中的矩阵计算为T ai =2T a i +T aT ai .由于相乘和相加是相同的T a 矩阵,因此计算时先将T a 阵分别送到各个处理器,各处理器计算T a i ;然后进行通讯得到新的完整的T a 矩阵;再将T a 阵送给各处理器,依次循环后,加上单位阵I 得最终的矩阵T.(8)式的时间步计算中,将T 阵按行分块T =(T 1,T 2,,T n ),k +1则分为n 段k +1=((k +11)(k +12)(k +1n)),k +1i=T ik,i =1,2,,n.由通讯收集得到k +1,再送给各处理器进行下一时间段的计算.(2)子域精细积分的并行算法.当未知数增多时,全域精细积分法指数矩阵的存储量和计算量都将迅速增加,这会影响到其解题的规模和效率.当时间步长t 很小时,指数矩阵T 是近似窄带宽的.因此,文献[5]提出了子域精细积分的方法,这可以使存储量和计算量大幅度下降.考虑子域的非齐次微分方程组,其向前精细积分格式为v n +1j=Tv n j +(T -I )H -1f njj =1,2,,J.(10)蛙跳格式为T(2t)=exp(2Ht),v n -1j=Tv n j +(T -I )H -1f n j j =1,2,,J.(11)格式(10)是一阶时间精度,格式(11)则是二阶时间精度,而两者的计算量几乎是一样的.当各子域划分相同时,只需计算1组T 和H -1阵.T 阵的并行算法与全域精细积分的相同,H -1阵的计算采用分块的Gauss Jordon 算法.将H 按列分块送到各处理器:H =(H 1,H 2,,H n).设有m 个内点.从H 阵的第1列到最后的第m 列依次计算.计算第j 列时,j 列所在的第I 个处理器从第j列选主元.第I 个处理器将选出的主元行号k 及第j 列向量a 送到每个处理器.各处理器交换H 阵中的第j,k 2行:tmp =H ijt ,H ijt =H ik l ,H ik l =tmpl =1,,m;i =1,,n.选出的主元行号存储于一维数组M 中.然后进行消去运算.设H 阵的第j 列在H i 阵中为第p 列,则第I 个处理器中的算法为:H Ijp =1/a j ,H It p =-a l /a j l =1,,j -1,j +1,,m;H I jl =H I jl /a jl =1,,p -1,p +1,,m I ;H Iq l =H Iq l -H Ijl a qq =1,,j -1,j +1,,m,l =1,,p -1,p +1,,m I .其他处理器中的算法为:H I jl =H I jl /a jl =1,,m i ,i =1,,n;H Iq l =H Iq l -H Ijl a qq =1,,j -1,j +1,,m,l =1,,m i ,i =1,,n.H 阵是按列存在各处理器中,交换的通讯量很大,可将各处理器中的H i 送到主控程序[4],由主控程序进行最后的列交换,得到逆阵H -1.再将T 和H -1阵送回各处理器.向量按照子域划分情况送到各处理器,按(10)或(11)式计算各子域的v n +1j .将v n +1j 中与其他子域重合部分的分量送出,同时接收其他处理器计算的这几个分量,并取平均值作为计算结果.3.2对流扩散方程的并行算法考虑更一般的一维对流扩散方程=D x x,()为改进差分法的数值精度和稳定性,文献[]给出了对流扩散方程的单点精细积分算法由方程()的28吉首大学学报(自然科学版)第32卷ut2u2-v u 124.122个通解u =常数和u =x -v t ,得到在空间坐标上离散的微分方程[5].u 0j =c 1u j-1-c 2u j +c 3u j+1.(13)(13)式的单点蛙跳格式为u n +1j =[u n -1j -(c 3u n j+1+c 1u n j -1)/c 2]exp(-2c 2t)+(c 3u n j+1+c 1u nj -1)/c 2.(14)此格式具有二阶时间精度,并且是无条件稳定的.为提高精度,可以采用多内点的子域精细积分算法.利用蛙跳格式(14),考虑3个内点的子域,其非齐次微分方程组是u 0j -1u 0j u 0j +1=-c 2c 30c 1-c 2c 30c 1-c 2u j -1u jj j+1+c 2u j-2c 3u j+2j =1,2,,J.(15)u j-2和u j +2作为常数,在t 时间段内是不变的,可以从相邻子域中得到这2点的值.采用与(10),(11)式相同的格式计算方程(15).当在空间坐标作等距离的离散,并且各子域内点相同时,T j 和H -1j 是相同的,则只需计算1组[6];算法与前相同.使用不同划分的子域时,先将向量x 按划分的子域分段地送到各处理器,各处理器计算各系数,并组装得到H j 矩阵;然后计算H -1j和T j .这一步计算量较大,且不需通讯,是大粒度的.4结语精细积分法有很高的计算精度,不论是全域积分还是子域积分都有很好的并行性,在并行机系统上可充分发挥其效能.其中子域积分法有更大的优点,它克服了全域积分矩阵过大的弱点,又可以提高计算的精度.在并行计算中能方便地将各子域放到各处理器中,算例表明此算法是高效率的.增加内点数加大了指数矩阵和每个时间步的计算量,但能够提高精度,所以应适当地选择子域的大小.参考文献:[1]王文科.地下水有限分析数值模拟的理论与方法[M].西安:陕西科学技术出版社,1996:102-126.[2]ROACH E P putational Fluid Dynam ics [M].H ermosa,Albuquer que,New Mcxico,1976:446.[3]孙纳正.地下水污染数学模型和数值方法[M].北京:北京地质出版社,1989.[4]钟万勰.子域精细积分及偏微分方程数值解[J].计算结构力学及其应用,1995,12(3):253-260.[5]罗焕炎,陈雨孙.地下水运动的数值模拟[M].北京:中国建筑工业出版社,2001.[6]DURBIN F.Numer ical Inver sion of Laplace Tr ansfor m:An Efficient Improvement to Dur bner and Aba re s Method [J].Comp.J,1973,17:371-376.Parallel Algorithms for a Solute Transport ModelDENG Yong hui(H unan Univer sity of Finance and Economics,Changsha 410205,China)Abstr act:This paper pr esents the parallel algorithms of precise integration methods.The subdomain par allel algorithms of pr ecise integration of convection diffusion equations are given.T he method take advan tage of the high precision of precise integration methods,and they also avoid a large amount of matr ices calculation of whole domain methods.The precision of the methods is better than that of the single point integration.Key words:convection diffusion equations;parallel algorithm;partial differential equations(责任编辑向阳洁)29第1期邓永辉:对溶质运移模型的并行化处理。
基于最大熵和粒子群优化的红外图像分割英文
第30卷 第5期2007年10月电子器件Ch inese Jou r nal Of Elect ro n DevicesVol.30 No.5Oct.2007Maximum Fuzzy Entr opy and Par t icle Sw ar m Optimizat ion (PSO)B a sed Inf rar ed Image Segmenta tion 3Z H A N G Wei 2w ei 1,TA N G Yi n g 2ga n21.Insti t ute of In f ormati on Engi neeri ng ,Hebei I nst i t ut e of Technolo gy ,Tan gs han Hebei 063000,China ;2.I nst i t ut e of Elect rical En gi neeri ng ,Yansh an Uni versit y ,Qi ng huang dao Hebei 066004,ChinaAbstract :Maxi mum f uzzy ent ropy has been proven to be an efficient met hod for i mage segmentation.The key problem associate d wit h t hi s met hod is to find t he opt imal paramet er com bi nation of membe rship f unc 2t ion so t hat an i mage can be t ransfor med into f uzzy domai n wi t h maxi mum fuzzy ent rop y.However ,i t is computationall y i nt ensi ve or even impossibl e to fi nd t he optimal parameter combi nation of mem bership function direct l y.A new optimizat ion al gorit hm ,namely ,P SO al gorit hm i s used to search opt imal para me 2t er com bination.Each parti cle represent s a po ssible paramet er com bi nat ion.A swa rm of particles are ini 2t ialized and fly t hrough solution space for tar get ing t he opti mal solution.The p roposed met hod i s used t o segment i nf rared i ma ge and ideal segmentat ion result s ca n be obt ai ned wit h less computat ion cost.K ey w or ds :inf rared image ;i mage se gmentation ;f uzzy ent ropy ;part icle swarm opt imizat ion (PSO)EEACC :6140C基于最大熵和粒子群优化的红外图像分割3张薇薇1,唐英干21.河北理工大学信息学院,河北唐山063000;2.燕山大学电气工程学院,河北秦皇岛066004收稿日期6228基金项目国家自然科学基金中杰出青年学者资助项目(65533);河北省教育厅基金资助项目()作者简介张薇薇(2),女,硕士,河北理工大学教师,主要研究方向为图像处理和模式识别,zjz @63摘 要:最大模糊熵是一种有效的图像分割方法,该方法的一个关键问题是确定模糊隶属度函数的最优参数组合,从而使得图像变换到模糊域后的模糊熵最大.但是直接采用穷举法来寻找最优参数组合的计算量是很大的,甚至是不可能的.因此,提出了采用一种新的优化方法,即粒子群算法来寻找最优参数组合.在参数搜索空间中,随机初始化一群粒子,通过粒子之间的相互协作来寻找最优解.所提出的方法用于分割红外图像的结果表明,花费很小的计算代价就可以获得理想的分割结果.关键词:红外图像;图像分割;模糊熵;粒子群中图分类号:TP391.41 文献标识码:A 文章编号:100529490(2007)0521736205 Infrared image segment at io n ,which ext ract s object s f ro m background ,plays an import ant role i n auto mat ic t arget recognit ion (AR T)and i nf rared guidance techni que.Thresholdi ng i s a most used met hod to segment inf rared i mage for it s si mplici t y and l ess comput ation cost.However ,i nf ra red i m 2age po ssesse s low cont rast bet ween object andbackground and t he edge of object is am biguous ,cla ssical t hreshol d select ion met hod ,such a s Ot 2su [1]a nd Kapur [2]approach ,can usually not obtain an i deal segment ation resul t s.Fuzzy set t heory i s a powerf ul mat hematic tool ha ndli ng a mbiguit y or uncert ai nt y and has bee n ap 2plied to image segment ation successf ully.Fuzzy c 2part ition is a popul ar f uzzy segmentat ion met hod.:200092:020*******:1977w w 1.co m.Cheng,et al[3].proposed to use f uzzy c2parti tionand maxi mum ent ropy p ri nciple to select t hreshol d automatical ly.Cheng,et al[4],defined a new f uzzy ent rop y and used it t o image segment ation and e n2 hance.Zhao,et al[5]proposed an ot her e nt ropy function using f uzzy c2partit ion and proba bili t y partit ion.Tao[6]proposed a n i mproved ve rsion of Zhao’s met hod and used i t to select t hree2level t hreshol d.A lt hough t he f uzzy c2pa rtition app roach for im age t hre sholdi ng result s i n much bet t er se g2 mentat ion t han many exi sting met hods,t he com2 putat io n burden i s very large because t he size of searc h space increases when t he number of param2 ete rs needed to det ermi ne t he m em ber ship f unc2 t ion increa ses.For example,if S2f unction and Z2 function wi t h t hree paramet ers(i.e.a,b,c)i s se2 lect ed to const ruct f uzzy c2partition,t he searc h space i s L(L-1)(L-2)/6for an i mage having L gray level s.I f L equal s256,t hen t he sea rch space will be2763520.It i s impo ssible to sea rch opti mal parameter com binat ion direct ly.Ma ny resea rcher s seek help to genetic al gorit hm(GA)[4,6]and simu2 lat ed anneali ng(SA)[3,5]to solve t hi s problem.Al2 t hough GA a nd SA can usually obt ai n opt imal so2 l ution,t hey are equipped wit h some drawbacks such a s ea rl y mat uration of G A and slow conver2 gence of SA.PSO i s an evolutionary opt imization t ec hni que developed by K ennedy a nd Eberhart i n1995[7]in2 spired by t he beha vior of bi rd flocking a nd fish schooli ng.Li ke G A,PSO i s a pop ulation2ba sed al2 gori t hm,each i ndi vi dual o r candidat e solution i s call ed a“part icle”.PSO i s effecti ve i n nonli near opti mization problem s and easy to i mple me nt.PSO and it s i mprovement versions have been applied i n ma ny area s such as reacti ve power opt imization[8], polygo nal app roxi mat ion[9]and combi natori al opt i2 mizat ion[10]et c.In t hi s paper,P SO i s used to find t he opti mal parameter combi nation based on ma xi2 m um f uzzy ent ropy p ri ncipl e.Each part icle repre2 se nt s a possi ble parameter combination.Inf rared 1 Threshold Select ion Using Fuzzy Entr opy Pr inciple L et I be a n image of size M×N and I(x,y)be t he gray l evel of pi xel(x,y),where I(x,y)∈{0, 1,2,…,l-1}.The hi stogra m of image I i s rep re2sent ed by H={p0,p1,…,p l-1}where p k=n kM×N and n k i s t he num ber of pi xel wi t h gray l evel k.As2 sume t hat i mage I i s sepa rat ed i nto object E o a nd background E b by t hreshold T,t hen it s probabilit y i s p o=p(E o)and p b=p(E b).For each gray l evel k(k=0,1,2,…,l-1),t he proba bi li t y belongi ng to object a nd background arep k0=p k×p o|k a nd p kb=p k×p b|k(1) respectivel y,where p o|k and p b|k are t he co nditional p roba bili t y of gray l evel k belongi ng to object a nd background.In t hi s paper,l et t he co nditional p roba bili t y of gray l evel k belongi ng to object a nd background equal to fuzzy member ship of gray lev2 el k t o o bj ect and background,i.e.p o|k=μo(k),p b|k=μb(k)(2)whereμo(k)andμb(k)a re t he f uzzy member2 ship of gray level k to object and background.It can get from Equ.(1)a nd Equ.(2):p o=p(E o)=∑l-1k=0p ko=∑l-1k=0p k×p o|k(3) p b=p(E b)=∑l-1k=0p kb=∑l-1k=0p k×p b|k(4) We choose t he S2f unction and Z2function de2 fined in Equ.(5)a nd Equ.(6)a s t he mem bership f unction of object a nd background respectively,μo(k)=S(k;a,b,c)=0 kΦa(k-a)2(c-a)(b-a) a<kΦb1-(k-c)2(c-a)(b-a) b<kΦc1 k>c(5)μb(k)=Z(k;a,b,c)=1-S(k;a,b,c)(6) where0<aΦbΦc.He nce,t he f uzzy e nt ropy of obj ect and background i s=∑=×μ()×μ()()7371第5期张薇薇,唐英干:基于最大熵和粒子群优化的红外图像分割image is segmented using the proposed method andideal results can be obtained with le ss comp uta tion cost.H o-l-1k0p k o kp oln p k o kp o7 andH b =-∑l-1k =0p k ×μb (k)p bln p k ×μb (k )p b(8)The t ot al f uzzy ent rop y f unction is gi ve n a s followingH (a ,b,c )=H o +H b(9)The t otal fuzzy e nt ropy varies along wit h t hree varia bles a ,b ,c .We can fi nd an opti mal com binat ion of (a ,b ,c )so t hat t he t ot al f uzzy en 2t ropy H (a ,b ,c )ha s maxim um val ue.Then ,t he most approp riat e t hreshold T can be comput ed asμo (T )=μb (T )=0.5(10)However ,find t he opt imal com binat ion of (a ,b ,c)i s not a n ea sy wor k as t he reaso n ment io ned previously.In t his paper ,a new opti mization algo 2rit hm ,namel y t he pa rticle swarm opti miza tion (PSO )i s used to find t he opti mal com bi nat ion of (a ,b ,c ).2 PSO A lgor ithmThe basic P SO model consist s of a swar m of particles movi ng in a d 2di mensional search space w he re a ce rtain qualit y measure ,t he fit ness ,can be calculated.Each pa rticl e ha s a po si tio n repre 2sent ed by a vector x i =(x i1,x i2,…,x id )and a velocit y rep re sented by vector v i =(v i1,v i2,…,v id ),where i is t he i nde x of part icle.W hile searc hing opti mal solution in search space ,each particle re mem ber s t wo vari able s:one i s t he best position fo und by it s ow n so far ,denoted by p i a nd anot her i s t he best posit ion among all pa rticle s in t he swar m ,denoted by p g .A s ti me pa sse s ,each particle up date s it s position and velocit y to a new value accor ding t o form ula Equ.(11)a nd Equ.(12)v i (t +1)=w 3v i (t)+c 13r 13(p i -x i (t))+c 23r 23(p g -x i (t ))(11)x i (t +1)=x i (t )+v i (t +1),i =1,2,…,n(12)Where w is t he ine rtia weight ,c 1and c 2are t he lea rning factor s ,ge nerall y ,c 1=c 2=2,r 1,and r 2are two ra ndom number wi t hin t he i nt erval of (0,1),t i s t he number of i teration.G enerally ,t he value of each component s in v i can be clamped [x ,x ]x 2f T f Equ.(12).Thi s process i s repeated until a user 2de 2fined stoppi ng cri terion i s reac hed.In t hi s paper ,PSO i s used fi nd t he opt imal pa 2rameter com bination.Each part icle represent s a possi ble pa ramet er combination.The tot al fuzzy ent ropy i s use d as t he fit ness of each particle.The segment ation procedure usi ng f uzzy ent ropy a nd PSO can be depict ed as followi ng :(1)Ra ndomly generate n part icle s of 32di men 2sion.Each particle is a ssigned a randomized veloci 2t y and posit io n.Set t he ma xi mum it erat io ns N a nd let t =0.(2)Calc ulate t he fi t ness of each part icle usi ng Equ.(9).(3)Updat e t he per sonal be st of eac h parti cles accordi ng to it s curre nt fi t ness and global best a 2mo ng all particles accordi ng t he fit ness of t he swarm s.(4)Cha nge t he velocit y of each particl e usi ng Equ.(11)(5)Move each parti cle to it s new position u 2sing Equ.(12)(6)Let t =t +1and go to St ep 2.Repeat untilt =N(7)Out put t he best position of t he swarm.Calcul at e t he best t hreshold using Equ.(10).3 Exper imental ResultsTo eval uat e t he perform ance of t he proposed met ho d ,experi ment s have been car ried on two i n 2f rared ima ges usi ng MA TL AB 6.5on Pentium 2IV 2.66GHz .The t est im ages are factory a nd t ank whose size i s 185×255and 176×126respec 2tively.Several met hods ,i. e.,Ot su ,Kap ur ,f uzzy e nt ropy usi ng GA i n Ref.[6]are used to seg 2me nt t hese images as well as t he p ropo se d met h 2od.The segm ent ation re sult s are show n i n Fi g.1and Fig.2and t hreshol ds o bt ai ned by t hese met h 2ods a re li st ed i n Table 1.It can be seen from t he result s t hat t he se gm ent ation result s obt ai ned by f uzzy ent ropy met hod are bet t er t han Ot su and Kapur met hod because m uch fal se object ’s i nfor 2x y O K 8371电 子 器 件第30卷to t he range -v ma v ma to control e cessive roa ming o pa rticle s outside the search space.he particle lie s toward to a new position according tomatio n e ist in the results obtained b tsu and apur method.(a )original ima ge (b )Ot su method (c )K apure method(d)f u zzy en tropy method using G A in Re f.[6] (e)f u zzy entropy meth od using PSO (f)total entropy at each iterati on using G A and PSOFig.1 Segmentation results of image f actor y To demonst rat e t he good performance of PSO ,G A is al so used to find t he opti mal paramet er co mbina 2t ion.The paramet ers of GA are set a s i n Ref.[6].Iteration number of G A and PSO are bot h 100.The best parameter combi nation ,t he best tot al fuzzy ent rop y and computation t ime obt ai ned by GA a nd PSO are shown i n Table 2.The cha nge of t otal fuzzy ent ropy at each it era tion using G A a nd PSO are shown in Fi g.1(f)and Fi g.2(f).From Fi g.1(d)-(e)and Fig.2(d)-(e),we can see t hat t he segmentat ion re 2sult s a re compa rat ive usi ng G A and PSO and t he best tot al f uzzy ent ropy is almost t he same.However ,t he computation ti me usi ng PSO i s m uc h less t han usi ng GA.The comput ation t ime using PSO i s a bo ut 1/4of G A.So ,i t i s much fa ster usi ng PSO t han G A.In t hi s paper ,we do not compare t he comput ation ti me u 2si ng PSO and e xha ust searching met hod because i t i s a very t ime 2co nsuming work usi ng exhaust searchi ng met hod to fi nd t he opti mal pa ra met er com binat io n.However ,we can e sti mat e t he comput ation ti me roughl y.In our experi ment ,about 0.05second i s used t o compute t he tot al f uzzy ent ropy each ti me.If ex 2haust searching is used ,about 138162seco nds ,i.e.about 38hour s should be used a nd it cannot be t oler 2ant i n real ti me syst em.(a)original ima ge (b)Ot su method (c)K apure method()f zzy y G R f [6] ()f zzy y SO (f)y G SOF S f 9371第5期张薇薇,唐英干:基于最大熵和粒子群优化的红外图像分割d u en trop method using A in e .e u entrop meth od using P total entrop at each iterati on using A and P ig.2e gmentation results o image tankT a b.1 Thr eshold obta ined by Otsu,K a pur,GA an d PSOImageMethodOtsu K apur GA PSOfactory64109141142 tank107140180180T a b.2 Computa tion time,total thr eshold and par ameter combina tion(a,b,c)of G A a nd PSOtermf actor y tankG A PSO GA PSOtime(s)8.859 2.1579.14 2.25 total entropy9.35799.358210.45310.462 (a,b,c)(51,173,186)(51,179,179)(0,255,255)(0,255,255)4 ConclusionIn co ncl usion,a new segment ation scheme i s proposed for i nf rared ima ge.The i mage t o be se g2 mented i s t ransformed i nto f uzzy domain using S2 function and Z2f unction wit h t hree parameter s. The best t hre shol d is obt ai ned based on maxi mum fuzzy ent ropy pri nciple.PSO algorit hm i s use to find t he best parameter co mbination of S2f unction and Z2f unction.It i s show n t hat,P SO al gorit hm can obtain t he best parameter com bination wit h less comput ation cost t ha n many existi ng met hod such a s e xha ust searching and G A.It p rovides a good qualificat ion for real ti me syst em.R eferences:[1] Ot su N.A Threshol d Select ion Met ho d f ro m Gray2L evel His2t ogram[J].IEEE Tran s.on Syst em s,Man and Cyberneti c, 1979,9(1):62266.[2] K apur J N,Sahoo P K,W o ng A K C.A New Met hod forGray2Level Pict ure Th res hol di ng using t he ent rop y of t he hi s2 t ogram[J].C o m p ut er Vi sion,G raphics and Im age Pro cess2 in g,1985,29(3):2732285.[3] Chen g H D,Chen J R,Li J G.Th res hol d Sel ect ion Based onFuzzy c2Parti tio n Ent ropy Approach[J].Patt ern Recognit ion, 1998,31(7):8572870.[4] Chen g H D,Hung Y H and Sun Y.A Novel Fuzzy Ent ropyApp roach t o Im age Enhancem ent and Threshol ding[J].Signal Proces s i ng,1999,75(3):2772301.[5] Zhao M S,Fu A M N,Yan H.A Tech ni que of Three LevelTh res hol di ng Based o n Probabi lit y Partit ion and Fuzzy32Part i2 t ion[J].I EEE Tran s.Fuzzy Syst em s,2001,9(3):4692479.[6] Tao W B,Tian J W and Li u J.Image Segment atio n by T hree2Level Thresholdin g Based on Maxi mum Fuzzy Ent ropy and Genet ic Algorit h m[J].Patt ern Recognit ion L et t er s,2003,24(16):306923078.[7] Ken nedy J,Ebert hart R C.Par ticl e Swarm Opt i m izat ion[C]//Proc.IEEE Int er nat.Co nf.On Neural Network s,Piscat2 away,NJ,USA:194221948.[8] Ji ang C W,Bo m pard W.A Hybrid Met hod fo Chaot ic Part icl eSwarm Op ti m izat ion and Li near Interior fo r Reacti ve Power Opt imizat ion[J].Mat hem ati cs and C o mput ers i n Sim ul at ion, 2005,68(1):57265.[9] Yin P Y,A Discret e Part icl e Swarm Al go ri t hm for Opt imalPolygonal App ro xi mation of Di gi t al Curves[J].Jo urnal of Vi s2 ual C o mmu nicat ion and Image Rep resent ation,2004,15(2): 2412260.[10] Sal man,Ahmad I and Madani S A.Parti cle Swarm Opt imiza2t ion fo r Task Asignment Poblem[J].Mi croprocessor s andMicrosys tems,2002,26(8):3632371.0471电 子 器 件第30卷。
最优化方法有关牛顿法的矩阵的秩为一的题目
英文回答:The Newton-Raphson method is an iterative optimization algorithm utilized for locating the local minimum or maximumof a given function. Within the realm of optimization, the Newton-Raphson method iteratively updates the current solution by leveraging the second derivative information of the objective function. This approach enables the method to converge towards the optimal solution at an accelerated pacepared to first-order optimization algorithms, such as the gradient descent method. Nonetheless, the Newton-Raphson method necessitates the solution of a system of linear equations involving the Hessian matrix, which denotes the second derivative of the objective function. Of particular note, when the Hessian matrix possesses a rank of one, it introduces a special case for the Newton-Raphson method.牛顿—拉弗森方法是一种迭代优化算法,用于定位特定函数的局部最小或最大值。
An Optimal Algorithm for Constructing the Reduced Grobner Basis of Binomial Ideals
J.Symbolic Computation(2001)31,259–276doi:10.1006/jsco.1999.1015Available online at onAn Optimal Algorithm for Constructing the Reduced Gr¨o bner Basis of Binomial Ideals,and Applications toCommutative SemigroupsULLA KOPPENHAGEN†AND ERNST W.MAYR†Institut f¨u r Informatik,Technische Universit¨a t M¨u nchen,D-80290M¨u nchen,GermanyIt is known that the reduced Gr¨o bner basis of general polynomial ideals can be computedin exponential space.The algorithm,obtained by K¨u hnle and Mayr,is,however,basedon rather complex parallel computations,and,above that,makes extensive use of theparallel computation thesis.In this paper,we exhibit an exponential space algorithmfor generating the reduced Gr¨o bner basis of binomial ideals which can be implementedwithout any complex parallel computations.This result is then applied to derive spaceoptimal decision procedures for thefinite enumeration and subword problems for com-mutative semigroups.c 2001Academic Press1.IntroductionThe method of Gr¨o bner bases(see Buchberger,1965;also Hironaka,1964)is a technique that provides algorithmic solutions to a variety of problems,for instance,primary decom-position of polynomial ideals,computations in the residue class ring modulo a polynomial ideal,decisions about various properties of ideals generated by a givenfinite set of poly-nomials,word problems modulo ideals and in commutative semigroups(reversible Petri nets),bijective enumeration of all polynomial ideals over a given coefficient domain etc. Although versions of Buchberger’s algorithm have been somewhat successful in prac-tice,the complexity of the algorithm is not well understood.Afirst step in understanding the complexity of the algorithm is to bound the degree of polynomials that occur in a minimal Gr¨o bner basis.In the univariate case,the Gr¨o bner-basis algorithm specializes to Euclid’s algorithm whose complexity has been extensively studied(see Loos,1982,for a survey).In the bi-variate case Buchberger(1983)and Lazard(1983)gave important bounds on the degrees and the number of polynomials occurring in a reduced Gr¨o bner basis.In the multivariate case,first steps towards an upper bound for the degrees in a minimal Gr¨o bner basis were taken in Bayer(1982)and M¨o ller and Mora(1984).However,these results did not,or only under very restrictive assumptions,imply bounds for the degree of the polynomials arising during the intermediate computations of the Gr¨o bner basis algorithms.Using a novel partitioning method for polynomial ideals,Dub´e(1990)obtained the †E-mail:{koppenha|mayr}@in.tum.de,rmatik.tu-muenchen.de/0747–7171/01/010259+18$35.00/0c 2001Academic Press260U.Koppenhagen and E.W.Mayrsharpened degree bound of2· d2+d 2k−1(with d the maximum degree of the input basis and k the number of indeterminates)for the degree of polynomials in a reduced Gr¨o bner basis,employing only combinatorial arguments.An exponential bound on the degrees of Gr¨o bner bases for zero-dimensional ideals was shown by Caniglia et al.(1988).Extending this result,Krick and Logar(1991) showed that Gr¨o bner bases of zero-or one-dimensional ideals can be computed in time exponential in the number of indeterminates.By transforming a representation of the normal form of a polynomial into a system of linear equations,K¨u hnle and Mayr(1996)exhibited an exponential space computation of Gr¨o bner bases.This algorithm,however,is based on rather complex parallel compu-tations like parallel rank computations of matrices,and,above that,makes extensive use of the parallel computation thesis(Fortune and Wyllie,1978).In this paper we present an exponential space algorithm for constructing the reduced Gr¨o bner basis of a binomial ideal(each generator is a difference of two terms;for an investigation of the algebraic structure of binomial ideals see Eisenbud and Sturmfels, 1996).This algorithm can be implemented without any difficult parallel rank compu-tations of matrices,or any other complex parallel computations.We make use of the close relationship between commutative semigroups and binomial ideals,in particular,of the algorithm in Mayr and Meyer(1982)for the uniform word problem in commutative semigroups.By the results in Mayr and Meyer(1982)and Huynh(1986),this algorithm is space optimal.As applications of this algorithm we derive a procedure enumerating the elements of finite congruence classes in commutative semigroups and an algorithm for the general subword problem in commutative semigroups.Both algorithms use exponential space and are space optimal.Another immediate application is computing complete rewrite systems for which we also obtain an exponential space completeness result(Huynh,1986).2.Basic Concepts and NotationsIn this section we review some definitions and notations used in the following.2.1.semigroups,Thue systems,and semigroup presentationsA semigroup(H,◦)is a set H with a binary operation◦which is associative.If ad-ditionally◦is commutative we have a commutative semigroup,and a semigroup with a unit element is called a monoid.For simplicity,we write ab instead of a◦b.A commutative monoid M is said to befinitely generated by afinite subset X= {x1,...,x k}⊆M if†M={u|u=x1 (x1)e k,e i∈N,x i∈X}.e2...x k...x ke1x2 (x2)Each element of M can then be represented as a k-dimensional vector in N k,i.e.there is a surjectionϕ:N k→M such thatϕ(e1,...,e k)=x1 (x1)e k.e2...x k...x ke1x2 (x2)†N denotes the set of non-negative integers,and Q the set of rationals.An Optimal Gr¨o bner Base Algorithm for Binomial Ideals,and Applications261 Ifϕis also injective,and hence bijective,then every element of M has a unique repre-sentation in N k,and M is said to be free.For afinite alphabet X={x1,...,x k},X denotes the free commutative monoid generated by X.LetΦ:X →N k be the so-called Parikh mapping,i.e.(Φ(u))i(also writtenΦ(u,x i)) indicates,for every u∈X and i∈{1,...,k},the number of occurrences of x i∈X in u. For an element u of X ,called a(commutative)word,the order of the symbols isimmaterial,and in the following we shall use an exponent notation:u=x e11...x e kk ,where e i=Φ(u,x i)∈N for i=1,...,k.A commutative semi-Thue system over X is given by afinite set P of productions l i→r i,where l i,r i∈X .A word v∈X is derived in one step from u∈X (written u→v(P))by application of the production(l i→r i)∈P iff,for some w∈X ,we have u=wl i and v=wr i.The word u derives v iffu∗→v(P),where∗→is the reflexive transitive closure of→.More precisely we write u+→v(P),where+→is the transitive closure of→,if u∗→v(P)and u=v.A sequence(u0,...,u n)of words u i∈X with u i→u i+1(P)for i=0,...,n−1is called a derivation(of length n)of u n from u0in P.A commutative Thue system is a symmetric commutative semi-Thue system P,i.e.(l→r)∈P⇒(r→l)∈P.Derivability in a commutative Thue system establishes a congruence≡P on X by the ruleu≡v mod P⇔d ef u∗→v(P).Thus,for commutative Thue systems,we also use the notation l≡r mod P to denote the pair of productions(l→r)and(r→l)in P.A commutative Thue system P is also called a presentation of the commutative quotient semigroup X /≡P.We note that commutative semi-Thue systems appear in other,equivalent formula-tions in the literature,like vector addition systems and Petri nets.Finitely presented commutative semigroups are equivalent to reversible vector addition systems or Petri nets.Readers more familiar with reversible Petri nets may want to think of a vector in N k as a marking.2.2.polynomials and idealsLet X denote thefinite set{x1,...,x k},and Q[X]the(commutative)ring of polyno-mials with indeterminates x1,...,x k and rational coefficients.A term t in x1,...,x k is a product of the form t=x e11·x e22···x e k k,with e=(e1,e2,...,e k)∈N k the degree vector of t.By the degree deg(t)of a term t we shall mean the integer e1+e2+···+e k(which is ≥0).Each polynomial f(x1,...,x k)∈Q[X]is afinite sum f(x1,...,x k)= n i=1a i·t i, with a i∈Q−{0}the coefficient of the i th term t i of f.The product m i=a i·t i is called the i th monomial of the polynomial f.The degree of a polynomial is the maximum of the degrees of its terms.An ideal in Q[X]is any subset I of Q[X]satisfying the following conditions:(I1)p,q∈I⇒p+q∈I;(I2)r∈Q[X],p∈I⇒r·p∈I.For f1,...,f h∈Q[X], f1,...,f h ⊆Q[X]denotes the ideal generated by{f1,...,f h},262U.Koppenhagen and E.W.Mayr that is†f1,...,f h :=hi=1p i f i;p i∈Q[X]for i∈I h.If I= f1,...,f h ,{f1,...,f h}is called a basis of I.An admissible term ordering≺on Q[X]is given by any admissible order on N k,i.e. any total order<on N k satisfying the following two conditions:(T1)e>(0,...,0)for all e∈N k−{(0,...,0)};(T2)a<b⇒a+c<b+c for all a,b,c∈N k.If(d1,...,d k)>(e1,...,e k),we say that any monomial a1·x d11···x d k k,a1∈Q−{0},is greater in the term ordering than any monomial a2·x e11···x e k k,a2∈Q−{0}(written a1·x d11···x d k k a2·x e11···x e k k).For a polynomial f(x1,...,x k)= n i=1a i·t i we always assume that t1 t2 ··· t n. For any such non-zero polynomial f∈Q[X]we define the leading term LT(f):=t1. For the sake of constructiveness,we assume that the term ordering is given as part of the input by a k×k integer matrix T such that a1·x d11···x d k k a2·x e11···x e k k iff,for the corresponding degree vectors d and e,T d is lexicographically greater than T e(see Robbiano,1985;Weispfenning,1987)Let I be an ideal in Q[X],and let some admissible term ordering≺on Q[X]be given.A finite set{g1,...,g r}of polynomials from Q[X]is called a Gr¨o bner basis of I(w.r.t.≺),if (G1){g1,...,g r}is a basis of I;(G2){LT(g1),...,LT(g r)}is a basis of the leading term ideal of I,which is the smallest ideal⊆Q[X]containing the leading terms of all f∈I,or equivalently:if f∈I,then LT(f)∈ LT(g1),...,LT(g r) .A Gr¨o bner basis is called reduced if no monomial in any one of its polynomials is divisible by the leading term of any other polynomial in the basis.3.The Connection between Commutative Semigroups and Binomial Ideals3.1.the basic problems and their relationshipIn this section,we consider the uniform word problem for commutative semigroups and the polynomial ideal membership problem.We will show the relationship between these two very basic and fundamental algorithmic problems for commutative semigroups and polynomial ideals.Let P={l i≡r i;i∈I h}be some(finite)commutative semigroup presentation with l i,r i∈X for i∈I h.We identify any u∈X (resp.the corresponding vectoru=(Φ(u,x1),...,Φ(u,x k))∈N k)with the term u=xΦ(u,x1)1·xΦ(u,x2)2···xΦ(u,x k)kandvice versa any term u=x e11·x e22···x e k k∈Q[X]with the wordu=x1 (x1)e1x2 (x2)e2...x k...x ke k∈X .†For n∈N,I n denotes the set{1,...,n}.An Optimal Gr¨o bner Base Algorithm for Binomial Ideals,and Applications263 By I(P)we denote the Q[X]-ideal generated by{l1−r1,...,l h−r h},i.e.I(P):=hi=1p i(l i−r i);p i∈Q[X]for i∈I h.We call such an ideal a binomial ideal,i.e.each polynomial in the basis is the difference of two terms.By looking at Buchberger’s(1965)algorithm it is not hard to see that the reduced Gr¨o bner basis of a binomial ideal still consists only of binomials.The Uniform Word Problem for commutative semigroups is:Given a commutative semi-group presentation P over some alphabet X,and two words u,v∈X ,decide whether u≡v mod P.The Polynomial Ideal Membership Problem is:Given polynomials f,f1,...,f h∈Q[X], decide whether f∈ f1,...,f h .Proposition3.1.(Mayr and Meyer,1982)Let X={x1,...,x k},P={l i≡r i; l i,r i∈X ,i∈I h},and u,v∈X .Then the following are equivalent:(i)there exist p1,...,p h∈Q[X]such that v−u= h i=1p i(l i−r i);(ii)there is a derivation u=γ1→γ2→...→γn=v(P)of v from u in P such that for j∈I n deg(γj)≤max{deg(l i p i),deg(r i p i);i∈I h};(iii)u≡v mod P.In the fundamental paper(Hermann,1926),G.Hermann gave a doubly exponential degree bound for the polynomial ideal membership problem:Proposition3.2.(Hermann,1926)Let X={x1,...,x k},g,g1,...,g h∈Q[X],and d:=max{deg(g i);i∈I h}.If g∈ g1,...,g h ,then there exist p1,...,p h∈Q[X]such that(i)g= h i=1g i p i;(ii)(∀i∈I h);[deg(p i)≤deg(g)+(hd)2k].By size(·)we shall denote the number of bits needed to encode the argument in some standard way(using radix representation for numbers).Then the above two propositions yield an exponential space upper bound for the uniform word problem for commutative semigroups:Proposition3.3.(Mayr and Meyer,1982)Let X={x1,...,x k}and P={l i≡r i;l i,r i∈X ,i∈I h}.Then there is a(deterministic)Turing machine M and some constant c>0independent of P,such that M decides for any two words u,v∈X whether u≡v mod P using at most space(size(u,v,P))2·2c·k.3.2.the reduced Gr¨o bner basis for binomial idealsLet P be a commutative semigroup presentation over some alphabet X,and≺some admissible term ordering on Q[X].The following two theorems characterize the binomials of the reduced Gr¨o bner basis of I(P)(w.r.t.≺).Thefirst shows that in each binomial of264U.Koppenhagen and E.W.Mayrthe reduced Gr¨o bner basis G of I(P)the smaller term(w.r.t.≺)is the minimal element (w.r.t.≺)of the congruence class of the leading term.Theorem3.1.Let X={x1,...,x k},P={l i≡r i;l i,r i∈X ,i∈I h},and G={h1−m1,...,h r−m r}be the reduced Gr¨o bner basis of the ideal I(P)w.r.t.some admissible term ordering≺(m i≺h i).Then m i is the minimal element(w.r.t.≺)of the congruence class[h i]P,i∈I r.Proof.Assume that w=m i is the minimal element of[h i]P(w.r.t.≺).Then w≺m i and m i−w∈I(P).Since G is a Gr¨o bner basis of I(P),m i∈ h1,...,h r ,i.e.there must be some j∈I r such that h j divides m i.This however contradicts the fact that h i−m i is an element of the reduced Gr¨o bner basis of I(P).2The next theorem characterizes the leading terms of the polynomials in I(P). Theorem3.2.Let X={x1,...,x k},P={l i≡r i;l i,r i∈X ,i∈I h},and G={h1−m1,...,h r−m r}be the reduced Gr¨o bner basis of the ideal I(P)w.r.t.some admissible term ordering≺(m i≺h i).Then LT(I(P))(the set of the leading terms of I(P))is the set of all terms with non-trivial congruence class which are not the minimal element in their congruence class w.r.t.≺.H={h1,...,h r}is the set of the minimal elements of LT(I(P))w.r.t.divisibility.Proof.Since G is the reduced Gr¨o bner basis of I(P),it is clear that H is the set of the minimal elements of LT(I(P))w.r.t.divisibility.Since h i−m i∈I(P),there is a derivation in P of m i≺h i from h i(h i+→m i(P)),for all i∈I r.Because G is a Gr¨o bner basis,for any h∈LT(I(P))there is an h j−m j∈G and a term t in X with h=t·h j and h+→t·m j(P).Thus,for any h∈LT(I(P)),the congruence class[h]P is non-trivial,and h is not the minimal element in[h]P.Let s∈X be a term with non-trivial congruence class.If s is not the minimal element m s(w.r.t.≺)of its congruence class[s]P,then s derives m s(s+→m s(P)),and thus, s−m s∈I(P),i.e.s∈LT(I(P)).If s=m s,then there is no derivation of any t s≺s from s,and there is no h j∈H such that h j divides s.This is because if there is some h j∈H and some term t in X with s=t·h j,then s≡t·m j mod P what contradicts the minimality of s.Thus,if s=m s,then s∈LT(I(P))and s∈H.24.An Optimal Algorithm for the Reduced Gr¨o bner BasisIn this section we give an exponential space algorithm for generating the reduced Gr¨o bner basis of a binomial ideal.To determine the complexity of the algorithm we need the results of Section3and the following upper bound for the total degree of polynomials required in a Gr¨o bner basis,obtained by Dub´e(1990).Note that we use exponential notation in representing words over X.Proposition4.1.(Dub´e,1990)Let X={x1,...,x k},F={f1,...,f h}⊂Q[X], I= f1,...,f h the ideal generated by F,and let d be the maximum degree of any f∈F. Then,for any admissible term ordering≺on Q[X],the degree of polynomials required in a Gr¨o bner basis for I w.r.t.≺is bounded by2· d2+d 2k−1.An Optimal Gr¨o bner Base Algorithm for Binomial Ideals,and Applications265 Now we will generate the reduced Gr¨o bner basis of the binomial ideal I(P)w.r.t.some fixed admissible term ordering≺,where X={x1,...,x k}and P={l i≡r i;l i,r i∈X ,i∈I h}(w.l.o.g.l i r i).Let H denote the set{h1,...,h r}of the minimal elements of LT(I(P))w.r.t.divisibility,and m i the minimal element of[h i]P w.r.t.≺,for i∈I r. From Theorems3.1and3.2we know that the set G={h1−m1,...,h r−m r}is the reduced Gr¨o bner basis of I(P).We have to determine the elements in H,as well as the minimal element m i(w.r.t.≺) of the congruence class of each h i∈H.From Proposition4.1we know that the degrees deg(h i)and deg(m i)are bounded by2· d22+d 2k−1,where d is the maximum degree of any l i−r i,i∈I h.Lemma4.1.For a term u∈X with non-trivial congruence class the minimal element w.r.t.≺of[u]P is of the form t·r i with r i∈{r1,...,r h},t∈X .Proof.W.l.o.g.assume that u is not the minimal element m u of[u]P w.r.t.≺.Then there is a derivation in P leading from u to m u≺u,i.e.u+→m u(P),where m u=t·r i for some r i∈{r1,...,r h},t∈X (note that l j r j∀j∈I h).2···x e k k. For h=x e11···x e k k∈X and i∈I k such that e i≥1,define h(i):=x e11···x e i−1i Then H consists exactly of those terms h∈X which have degree≤2· d2+d 2k−1,which are congruent to some term t·r i≺h with r i∈{r1,...,r h},t∈X ,and deg(t·r i)≤2· d2+d 2k−1,and for which,for all applicable i,[h(i)]P is trivial.By Proposition3.3, the condition regarding the reducibility of h can be checked in space(size(P))2·2c·k for some constant c>0independent of P.Testing non-reducibility of the h(i)can also be done in exponential space because of Proposition3.3and:Lemma4.2.A term u∈X with deg(u)≤D is an element of LT(I(P))iffthere is some t·r i with t·r i≺u,r i∈{r1,...,r h},t∈X ,and deg(t·r i)≤D+2· d22+d 2k−1 such that u+→t·r i(P).Proof.We only have to prove the degree bound.Note that u∈LT(I(P))iffeither u∈H,and thus deg(m u)≤2· d22+d 2k−1,where m u is the minimal element of[u]P,or there is some h∈H with u=t u·h for some t u∈X .The degree of the minimal element m h of[h]P w.r.t.≺is bounded by2· d22+d 2k−1.From m h≺h we get t u·m h≺u with deg(t u·m h)≤D+2· d2+d 2k−1.2From this,we derive the exponential space algorithm given in Figure1.Putting everything together,we have proven the following theorem.Theorem4.1.Let X={x1,...,x k},P={l i≡r i;l i,r i∈X ,i∈I h},and≺some admissible term ordering.Then there is an algorithm which generates the reduced Gr¨o bner basis G={h1−m1,...,h r−m r}of the binomial ideal I(P)using at most space (size(P))2·2¯c·k≤2c·size(P),where¯c,c>0are some constants independent of P.266U.Koppenhagen and E.W.MayrThe AlgorithmInput:admissible term ordering≺,P={l1−r1,...,l h−r h}with r i≺l i∀i∈I h Output:the reduced Gr¨o bner basis G={h1−m1,...,h r−m r}of I(P)d:=max{deg(l i),deg(r i);i∈I h}G:=∅for each h=x e11···x e k k∈X with degree≤2· d22+d 2k−1dogb:=falseif there exists t·r i with t·r i≺h,r i∈{r1,...,r h},t∈X ,deg(t·r i)≤2· d22+d 2k−1 which is≡h mod P then/*h∈LT(I(P))*/m:=the minimal(w.r.t.≺)among these termsgb:=trueend ifD:=deg(h)for each i∈I k with e i≥1while gb doh :=x e11···x e i−1i···x e k kif there exists t·r j with t·r j≺h ,r j∈{r1,...,r h},t∈X ,deg(t·r j)≤(D−1)+2· d22+d 2k−1which is≡h mod P then/*h ∈LT(I(P))⇒h∈H*/gb:=falseend ifend forif gb then/*h∈H*/G:=G∪{h−m}end ifend forFigure1.Algorithm for constructing the reduced Gr¨o bner basis of a binomial ideal.From the results in Huynh(1986)we know that,in the worst case,any Gr¨o bner basis of I(P)has maximal degree at least22c ·size(P)for some constant c >0independent of P. Hence,any algorithm that computes Gr¨o bner bases of binomial ideals requires at least exponential space in the worst case.5.ApplicationsWe now consider two applications of the exponential space algorithm obtained in Sec-tion4.We present space optimal decision procedures for thefinite enumeration and subword problems for commutative semigroups.5.1.the finite enumeration problem for commutative semigroupsLet P be afinite commutative semigroup presentation over some alphabet X,and u∈X a word such that the congruence class of u isfinite(or,synonymously,bounded).Then thefinite enumeration problem for commutative semigroups,or equivalently,reversibleAn Optimal Gr¨o bner Base Algorithm for Binomial Ideals,and Applications267Petri nets is the problem of generating a complete list of all the elements of[u]P.We give a procedure for the solution of this problem which needs at most exponential work space.Theorem5.1.Let X={x1,...,x k},P={l i≡r i;l i,r i∈X ,i∈I h}be afinite commutative semigroup presentation over X,and u∈X a word such that the congruence class of u isfinite.Then there is an algorithm which generates the elements of[u]P using at most space(size(u,P))2·2¯c·k≤2c·size(u,P),where¯c,c>0are some constants independent of u and P.Proof.In addition to x1,...,x k we introduce2k+3new variables m,s,t,y1,...,y k and z1,...,z k.Let X =X∪{m,s,t,y1,...,y k,z1,...,z k}.Given P and the word u∈X ,we construct a new commutative semigroup presentation P over X as follows:P contains the relationss·x j≡s·y j·z j,for j=1,...,k,(5.1)s·y(u)≡t,(5.2)s·u≡m,(5.3) and,for every relation l i≡r i in P,the relations·y(l i)≡s·y(r i),and(5.4)t·z(l i)≡t·z(r i),(5.5) where y,resp.z are the homomorphisms replacing x j by y j,resp.z j,j∈I k.Let≺be a lexicographic term ordering satisfyingm≺a≺s≺b for all a∈{x1,...,x k},b∈{t,y1,...,y k,z1,...,z k}.In the following we prove that v∈[u]P iffs·v−m∈G,where G is the reduced Gr¨o bner basis of the ideal I(P )w.r.t.≺.Then,by Theorem4.1,the elements of[u]P can be generated using at most space(size(u,P ))2·2d ·k≤(size(u,P))2·2d·k,where d ,d>0 are some constants independent of u and P ,resp.P.First we establish some technical details.Lemma5.1.Every word w∈[s·u]P satisfies the following conditions:(i)Φ(w,s)+Φ(w,t)+Φ(w,m)=1;(ii)ifΦ(w,s)=1,then xΦ(w,x1)+Φ(w,y1)1·xΦ(w,x2)+Φ(w,y2)2···xΦ(w,x k)+Φ(w,y k)k∈[u]P,xΦ(w,x1)+Φ(w,z1) 1·xΦ(w,x2)+Φ(w,z2)2···xΦ(w,x k)+Φ(w,z k)k∈[u]P;ifΦ(w,t)=1,thenΦ(w,x1)=Φ(w,x2)=...=Φ(w,x k)=0,Φ(w,y1)=Φ(w,y2)=...=Φ(w,y k)=0,xΦ(w,z1) 1·xΦ(w,z2)2···xΦ(w,z k)k∈[u]P.Proof.Let w be any word in[s·u]P .Then there is a repetition-free derivation in P leading from s·u to w.If w=m,then w is derived in one step from s·u by relation(5.3) and w trivially satisfies conditions(i)and(ii).Note that if in a derivation starting at s·u relation(5.3)is applied,then this derivation can only be continued by again using268U.Koppenhagen and E.W.Mayrrelation (5.3),causing a repetition.If w =m ,then in any repetition-free derivation starting at s ·u leading to w only the relations in (5.1)and (5.4)can be applied until the word s ·y (u )·z (u )is reached and changed to t ·z (u )by relation (5.2).Since [u ]P is finite,there is no u ∈{y 1,...,y k } with s ·u ·z (u )∈[s ·u ]P ,u =y (u ),and y (u )divides u .Therefore,any word w occurring in this derivation of s ·y (u )·z (u )from s ·u satisfies conditions (i)and (ii):(i)Φ(w,s )=1,Φ(w,t )=0,Φ(w,m )=0,(ii)x Φ(w,x 1)+Φ(w,y 1)1·x Φ(w,x 2)+Φ(w,y 2)2···x Φ(w,x k )+Φ(w,y k )k ∈[u ]P ,and x Φ(w,x 1)+Φ(w,z 1)1·x Φ(w,x 2)+Φ(w,z 2)2···x Φ(w,x k )+Φ(w,z k )k =u .Then,as long as relation (5.2)is not applied,by the relations in (5.5),words t ·z (v )with v ∈[u ]P can be derived from t ·z (u ).Note that for all such words t ·z (v )with v ∈[u ]P Φ(t ·z (v ),s )=0,Φ(t ·z (v ),t )=1,Φ(t ·z (v ),m )=0,and condition (ii)is satisfied.Relation (5.2)changes t ·z (v )to s ·y (u )·z (v )and again the relations in (5.1)and (5.4)can be applied.As above,the words w in the resulting sub-derivation starting at s ·y (u )·z (v )satisfy (i)and (ii)withx Φ(w,x 1)+Φ(w,z 1)1·x Φ(w,x 2)+Φ(w,z 2)2···x Φ(w,x k )+Φ(w,z k )k =v.By the relations in (5.4),from s ·y (u )·z (v )any word s ·y (v )·z (v )with v ∈[u ]P can be derived.Relation (5.2)can only be applied to the word s ·y (u )·z (v ),causing a repetition.Thus,conditions (i)and (ii)are satisfied within the whole derivation.This completes the proof of Lemma 5.1.2For the derivation of some word s ·v ∈[s ·u ]P ,with v ∈X ,from s ·u in P we conclude from Lemma 5.1and its proof:Corollary 5.1.Let s ·v ∈[s ·u ]P with v ∈X ,v =u ,and let s ·u =γ0→γ1→···→γn =s ·v be any repetition-free derivation in P leading from s ·u to s ·v .Then there is exactly one i ∈I n −1with γi =s ·y (u )·z (u ),γi +1=t ·z (u ),and exactly one j ∈I n −1,j >i ,with γj =t ·z (v ),γj +1=s ·y (u )·z (v ).Thus,we can prove:Lemma 5.2.Let v be some word in X ,thenv ∈[u ]P ⇐⇒s ·v ∈[s ·u ]P .Proof.By Lemma 5.1and its Corollary 5.1,a repetition-free derivation in P leading from s ·u to s ·v with v ∈X has the following form:s ·u +−→(5.1),(5.4)s ·y (u )·z (u )−→(5.2)t ·z (u )+−→(5.5)t ·z (v )−→(5.2)s ·y (u )·z (v )+−→(5.1),(5.4)s ·v,where +−→(.)denotes some repetition-free derivation applying only the relations given in (.).Within the sub-derivations +−→(5.1),(5.4),the values Φ(w,x i )+Φ(w,z i )are constantAn Optimal Gr¨o bner Base Algorithm for Binomial Ideals,and Applications 269for all i ∈I k ,i.e.the word x Φ(w,x 1)+Φ(w,z 1)1·x Φ(w,x 2)+Φ(w,z 2)2···x Φ(w,x k )+Φ(w,z k )kremainsthe same within +−→(5.1),(5.4).Furthermore,all words occurring in the above derivationsatisfy Lemma 5.1.2Lemma 5.3.[s ·u ]P is finite.Proof.Since [u ]P is finite,it follows from the definition of P and Lemma 5.1that[s ·u ]P is also finite.2Lemma 5.4.Let v be some word in X with v ∈[u ]P ,and v divides some u ∈[u ]P .Then s ·v is the minimal (w.r.t.≺)element of its congruence class [s ·v ]P .Proof.If v ∈X with v ∈[u ]P ,and v divides some u ∈[u ]P ,then there is some v ∈X −{ε}with u =v ·v ∈[u ]P .Because of the finiteness of [u ]P there is no ¯v ∈[v ]Pwith ¯v =u ·¯u for ¯u ∈X .If there is such a ¯v ∈[v ]P ,then u =v ·v ≡¯v ·vmod P ,¯v ·v =u ·¯u ·v ∈[u ]P ,i.e.[u ]P is not finite.Thus,in any derivation starting at s ·v relations (5.2)and (5.3)cannot be applied.Only the relations in (5.1)and (5.4)can possibly be used.Since y i x i (resp.z i x i )for all i ∈I k ,s ·v is the minimal element of [s ·v ]P w.r.t.≺.2Note that each v ∈X is the minimal (w.r.t.≺)element of [v ]P because no relation in P is applicable.Since [s ·u ]P is finite,it follows from Dickson’s (1913)Lemma that each w ∈[s ·u ]P is minimal in [s ·u ]P w.r.t.divisibility,i.e.if w ∈[s ·u ]P ,then there is no w ∈[s ·u ]P ,w =w such that w divides w .The minimal element w.r.t.≺of [s ·u ]P is m .Thus,by Lemma 5.4,each s ·v ∈[s ·u ]P with v ∈X is contained in the set of the minimal elements of LT (I (P ))w.r.t.divisibility,and hence G ⊇{s ·v −m |s ·v ∈[s ·u ]P ,v ∈X }(see Theorems 3.1and 3.2).This establishes Theorem 5.1.2As an example of Theorem 5.1,consider the finite commutative semigroup presentationP ={x 21≡x 1x 32,x 22≡x 2x 33}over the alphabet X ={x 1,x 2,x 3}and the word u =x 31.Splitting each relation of P into its two corresponding productions providesx 21→x 1x 32,(5.1)x 1x 32→x 21,(5.2)x 22→x 2x 33,(5.3)x 2x 33→x 22.(5.4)Applying these productions,the congruence class [u ]P of u in P can be derived as shown in Figure 2(v 1(.)−→v 2means that v 2is derived in one step from v 1by application of production (.)).Note that [u ]P is finite.Using the construction of Theorem 5.1we compute the reduced Gr¨o bner basis G of the idealI := sy 1z 1−sx 1,sy 2z 2−sx 2,sy 3z 3−sx 3,。
熵值法 英文
熵值法英文Entropy method, also known as information entropy method, is a multi-criteria decision-making method based on information theory. It was proposed by Shannon in 1948. Entropy method uses the concept of entropy to measure the degree of uncertainty and information of a system, and converts the qualitative factors into quantitative ones for decision-making. It has been widely appliedin various fields such as economics, management, and engineering.In the entropy method, the decision problem is transformed into a mathematical model. The decision matrix is established based on the evaluation criteria and alternatives. The first step is to calculate the weighted normalized decision matrix. The weights of the criteria are calculated by using information entropy and Shannon's entropy formula. The criteria with higher entropy value are considered more important. The decision matrix is then normalized to eliminate the effect of different measurement units or scales.After obtaining the weighted normalized decision matrix, the next step is to calculate the entropy values of each alternative. The entropy value of an alternative is calculated based on the probability distribution of the alternative under each criterion. A lower entropy value indicates a higher degree of certainty and information. The entropy value of an alternative is then multiplied by the weight of the corresponding criterion to obtain the weighted entropy value.The last step is to calculate the comprehensive entropy value of each alternative. The comprehensive entropy value is obtained by summing up the weighted entropy values of all criteria. Thealternative with the lowest comprehensive entropy value is considered the best choice.The entropy method has several advantages. First, it can effectively deal with complex decision-making problems with multiple criteria. Second, it avoids the subjectivity and inconsistency of subjective judgment. Third, it provides a clear and objective ranking of alternatives based on the entropy values.However, there are also limitations to the entropy method. It assumes that the criteria and alternatives are independent, which may not always hold in real-world situations. It can also be sensitive to the choice of decision matrix size and criteria weights. Therefore, it is important to carefully select the criteria and properly determine their weights in order to obtain reliable results.In conclusion, the entropy method is a useful tool for multi-criteria decision-making. It provides a systematic and objective approach to evaluate and rank alternatives. By utilizing the concept of entropy, it effectively measures the degree of uncertainty and information in a decision-making problem. Despite its limitations, the entropy method has been widely applied and has made significant contributions to various fields.。
Design of optimal solvent for extraction of bio-active
Biochemical Engineering Journal37(2007)271–278Design of optimal solvent for extraction of bio-activeingredients from mulberry leavesJong-Min Kim a,Sang-Mok Chang a,In-Ho Kim b,Young-Eun Kim c,Ji-Hwan Hwang c,Kyo-Seon Kim d,Woo-Sik Kim c,∗a Department of Chemical Engineering,Dong-A University,Saha-ku,Hadan2-dong840,Busan604-714,Republic of Koreab Department of Chemical Engineering,Choongnam University,Yusung-ku Kung-dong,Daejeon305-764,Republic of Koreac Department of Chemical Engineering,Kyunghee University,Yongin Kiheung Seochun1,Kyungki-Do449-701,Republic of Koread Department of Chemical Engineering,Kangwon National University,Chuncheon Hyoja2-Dong,Kangwon-Do200-701,Republic of KoreaReceived3October2006;received in revised form8April2007;accepted13May2007AbstractA method of designing solvents for the optimal extraction of bio-active ingredients from natural resources was developed using an alcohol–water binary solvent.The target bio-active ingredient of polyphenols,anti-oxidation and anti-tyrosinase ingredients exhibited different dependency of extraction efficiency on the alcohol species(methanol,ethanol,n-propanol and i-propanol)and composition of binary ing the solubility parameter,the extraction efficiency of the bio-active ingredients was correlated with the solvent polarity.As a result,the optimal solvent polarities for the extraction of polyphenols,anti-oxidation and anti-tyrosinase ingredients were predicted as38.5,37.33,and33.0[MPa1/2],respectively.These predictions also agreed well with the optimal solvent conditions of the water–alcohol mixtures depending on the alcohol species and composition. Plus,the correlation was confirmed with model solvents designed using other solvent species,including acetone and ethylene glycol.©2007Elsevier B.V.All rights reserved.Keywords:Extraction;Alcohol–water binary solvent;Bio-active ingredients;Solvent polarity1.IntroductionRecent studies on exploiting natural compounds for medicine and cosmetics have drawn much attention to the effective extrac-tion of the desired bio-active ingredients from natural products. Typically,various solvents of water,alcohols,acetone and ether, etc.are used to extract bio-active substances from natural prod-ucts due to their broad solubility propensity on solvents,where water is generally applied to extract high polar ingredients,such as carbohydrates,glycosides,and amino acids,while ether is used to extract low polar ingredients,such as aromatic com-pounds.Thereby,alcohol–water mixtures are used to extract out various ingredients having broad range of solubility propensity for the investigation of the specific functionality of the molecular compounds from extracted ingredients.For example,Doi et ed hexane and butanol to extract ingredients having low polarity from mulberry roots∗Corresponding author.Fax:+82312021946.E-mail address:wskim@khu.ac.kr(W.-S.Kim).[1],then investigated the specific functional compounds of prenyflavas,glycoside,iso-quercetine,and astragalin from extracted solution.Methanol is frequently used to extract spe-cific bio-active ingredients from various natural resources.As such,anti-inflammatory ingredients have been found in the methanol extraction from Culcasia scadens P.Beauv[2],anti-microbial compounds from Ceanothus americanus[3],and anti-histaminic compounds from Mentha spicata[4].Plus, ethylacetate and n-hexane have also been applied to extract bio-active ingredients.Thus,from the above previous studies, the solvent polarity would appear to be important for extract-ing specific functional ingredients from a natural resource. Thus,a variety of solvents,pure and mixtures,have been applied to extract bio-active ingredients with various polarities[5].A few studies have already examined the optimal extraction conditions for bio-active ingredients.For example,ethanol for the extraction of anti-oxidants from Spirulina platenis has been suggested as the best solvent among hexane,petroleum ether, and water,while the extraction temperature and time had a min-imal influence on the extraction of the anti-oxidants[6].Mean-1369-703X/$–see front matter©2007Elsevier B.V.All rights reserved. doi:10.1016/j.bej.2007.05.006272J.-M.Kim et al./Biochemical Engineering Journal37(2007)271–278while,Chandrika and Fereidoon attempted tofind the optimal solvent conditions for extracting polyphenolic compounds from wheat using mixture of methanol,ethanol and acetone with vary-ing the solvent composition,extraction temperature,and time [7].Although such previous studies have shown that the optimal solvent conditions for extraction can be found based on trials with various species of solvent and compositions,no system-atic method has yet been suggested for determining the optimal extraction solvent for natural resources.Accordingly,the present study is focused to develop a method for determining the optimal solvent conditions and designing a solvent for the optimal extraction of bio-active ingredients from mulberry leaf known to contain the active ingredients for anti-oxidation and anti-hyperpigmentation.Since the extrac-tion of specific ingredients from natural resources depends on the polarity of the solvent,as implied in previous studies,the extraction efficiency of the solvent for bio-active ingredients is investigated,along with the variation in the polarity of the solvent according to the species and composition of a binary alcohol–water solvent.Here,methanol,ethanol,n-propanol,and iso-propanol are used as the alcohol species for the binary mixture.Plus,ethylene glycol and acetone are used to design model solvents to confirm the relationship between the extrac-tion of bio-active ingredients and the solvent polarity.Based on the extraction of mulberry leaf,activities of ingredients spe-cific to anti-oxidation and anti-hyperpigmentation are used as references to evaluate the extraction efficiency of the solvent [8,9].2.Experimental2.1.ExtractionMulberry(Morus alba L.)leaf,purchased from a herbal market in Korea,was completely dried in a convection oven at60–80◦C for a couple of days,thenfinely pulverized using a milling machine.Next,the leaf powder was meshed with an aperture size of200m and kept in a desiccator. To extract the bio-active ingredients,2g of the leaf powder were added to10ml of a solvent made of a binary mix-ture of alcohol and water for1h in a hot bath at80◦C. The extracted solution was then separated from the solid leaf using a centrifuge(Hanil Science Industrial Co.Ltd.,HA-500, Korea)and the contents of the bio-active ingredients examined, including the polyphenolic compounds,anti-oxidants,and anti-tyrosinase.The solvent conditions for extracting the bio-active ingre-dients were varied by adjusting the alcohol composition and species in the alcohol–water binary mixture.Methanol,ethanol, n-propanol and iso-propanol were used for the binary mixture and their compositions were changed from0to100%.Plus,ethy-lene glycol and acetone were also applied to formulate a binary mixture solvent to evaluate the optimal extraction conditions for the solvent.For the present experiment,all chemicals were pur-chased from Sigma–Aldrich Chemical Co.(U.S.A.)and ACS grade.2.2.Assay of phenolic compoundsBased on the method of Goldstein and Swain[10],the total content of polyphenolic compounds in the extraction was eval-uated.First,the extraction was dilutedfifty times with distilled water(1/50,v/v),then100l of the diluted sample was com-pletely mixed with1250l of the Folin-Denis reagent(ACS grade,Fluka,Switzerland)that had been diluted ten times with distilled water(1/10,v/v).Thereafter,the mixture solution was incubated at25◦C for20min after adding250l of satu-rated sodium ing a UV spectrophotometer with a 760nm wavelength(JASCO,Model V-570,Japan),the content of polyphenolic compounds was estimated by comparing with a standard concentration curve for tannic acid that has an equiv-alent absorbance to the UV wavelength.The standard curve for tannic acid was prepared using the same procedure used to mea-sure the UV absorbance of the extracted sample.Thus,100l of a tannic acid solution was mixed with1250l of the same Folin-Denis reagent and250l of saturated sodium carbonate, then incubated at25◦C for20min.The UV absorbance of the tannic acid solution relative to the concentration resulted in the following standard curve:C P=A T−0.01980.00472(1)where C P is the tanic acid concentration[g ml−1]and A T is the UV absorbance.Eq.(1)was then used to estimate the polyphenol concentration equivalent to the tannic acid concentration.2.3.Anti-oxidation activityThe anti-oxidation activity of the extraction was evaluated based on the degree of scavenging1,1-diphenyl-2-picryhydrazyl (DPPH)free radicals[11].First,a free radical solution was pre-pared with0.15mM DPPH in2000l of ethanol and100l of a0.5%(v/v)Tween-20solution.The pH of the radical solu-tion was then adjusted to7.4using1800l of a0.1M Tris–HCl buffer.After adding10l of the extraction sample to the radical solution,the mixture was allowed to react for30min at room temperature,then the UV absorbance was measured at a wave-length of517nm.The blank solvent containing no extraction was used as a base reference for the anti-oxidation activity.The activity of the extraction(C AO)was expressed on a relative scale as:C AO(%)=A AO−A ROA RO×100(2)where A AO and A RO are the UV absorbances of the extraction sample and blank solvent,respectively.2.4.Anti-tyrosinase activityBased on the method suggested by Lee et al.[9],the mush-room tyrosinase inhibition of the ingredients was measured as indicative activity of anti-hyperpigmentation.The extraction sample was diluted50%(v/v)with methanol,then320l of the diluted sample was mixed with960l of0.83mM l-dopaJ.-M.Kim et al./Biochemical Engineering Journal37(2007)271–278273 and320l of a tyrosinase solution,containing125units ml−1and buffered with0.1M phosphate at pH6.8.The mixture wasquickly cooled to0◦C right after being incubated at37◦C for10min,then the UV absorbance of the solution at490nm wasmeasured.The blank mixture without tyrosinase was used as abase reference for the anti-tyrosinase activity.The activity of theextraction(C AT)was then expressed on a relative scale as:C AT(%)=A AT−A RTRT×100(3)where A AT and A RT are the UV absorbances of the extraction sample and blank mixture,respectively.3.Results and discussion3.1.Extraction of bio-active ingredientsIt is already known that most of the natural plant bio-active ingredients that are polyphenolic compounds,such as mulber-roside F,quercetine,catechin,and rutin,etc.in mulberry leaves, have broad solubility propensities due to their molecular polar-ities.Thus,solvents composed of a broad range of alcohol species and compositions were used for effective extraction of the bio-active ingredients from mulberry leaves.The extrac-tion efficiency of the solvents was evaluated based on three criteria:the total polyphenolic content,anti-oxidation activity (DPPH radical scavenging),and anti-tyrosinase activity(mush-room tyrosinase inhibition)of the extracted solution.As shown in Fig.1,the total content of polyphenols extracted from the mulberry leaves varied with the composition and species of alcohol in the binary mixture solvent.The extrac-tion efficiency of a propanol binary mixture(n-propanol–water and iso-propanol–water mixtures)was slightly maximized with an alcohol composition of about20%,then dramatically reduced when increasing the propanol composition.Meanwhile,with an ethanol–water mixture,the optimum condition for extraction appeared with an alcohol composition of about40%.In the case of a methanol–water mixture,the optimum condition for extrac-tion shifted further to an alcohol composition of60%although the extraction efficiency somewhatfluctuated with the alcohol composition,and the dependency of the extraction efficiency on the methanol composition was significantly diminished when compared to that with a propanol–watermixture.Fig.1.Extraction of polyphenol content by alcohol–water binary mixtures.The extraction of organic ingredients from plant leaves is directly related to the compatibility of the ingredients to the sol-vent;thus,when the ingredients are well matched in polarity with the solvent they will be easily extracted,otherwise,it will be hard to extract them.Therefore,based on the current experi-mental results,it was supposed that an optimal solvent condition of polarity could maximize the total content of polyphenol compounds extracted from mulberry leaves.Also,when using different species for a binary mixture,the solvent composition for optimal polarity was varied due to the distinct polarity of each solvent species.As a result,the optimal alcohol–water mixture appeared at a low composition with propanols and shifted to a high composition with methanol,as the methanol was more polar than the propanols,as displayed in Table1.Since the polar-ity range for the methanol–water mixture was smaller than that for the propanol–water mixture,the extraction efficiency of the methanol–water mixture was found to be much less sensitive to the composition than with any of the other alcohol–water mixtures.The anti-oxidation activity,equivalent capability of scav-enging DPPH free radicals,of the extracted solution was also evaluated under various solvent conditions,as shown in Fig.2. When methanol was used for the binary mixture,the maximum anti-oxidation activity of the extracted solution was obtained at about60%of methanol composition,yet this shifted to a lower alcohol fraction when using ethanol and propanols.It was inter-esting to note that the anti-oxidation activity of the extractedTable1Cohesive energies and solubility parameter of solventsSolvent Molecular weight(g mol−1)Molecular volume(cm3mol−1)δ(MPa1/2)δdδpδhδ(25◦C)δ(80◦C) Water18.0218.112.222.840.448.044.1 Methanol32.040.711.613.024.029.727.3 Ethanol46.158.712.611.220.026.124.0n-Propanol60.175.214.110.517.724.922.9iso-Propanol60.176.814.09.816.023.421.5 Ethylene glycol62.155.910.115.129.834.932.1 Acetone58.174.013.09.811.019.718.1274J.-M.Kim et al./Biochemical Engineering Journal 37(2007)271–278Fig.2.Extraction of bio-active ingredients for anti-oxidation by alcohol–waterbinary mixture.solution was more sensitive to the solvent conditions (species and composition of alcohol)than the polyphenol extraction effi-ciency.In particular,when using methanol for the binary water mixture,the anti-oxidation activity of the solution extracted with a high alcohol composition (above 80%)was dramati-cally reduced,while the extraction efficiency of polyphenolic compounds remained almost the same with only a slight drop from the maximum extraction efficiency.Consequently,it would appear that among the polyphenolic compounds in the extracted solution,the effective ingredients related to anti-oxidation activ-ity were quite polar and their solubility was very sensitive to the solvent polarity.However,the anti-tyrosinase activity of the extracted solution behaved quite differently when varying the alcohol species and composition of the solvent.As shown in Fig.3,when using methanol in the solvent,the activity of the solution continued to be enhanced when increasing the methanol fraction,up to an alcohol composition of around 80%,after which it slightly dropped,whereas in the cases of ethanol and propanols,theFig.3.Extraction of bio-active ingredients for anti-tyrosinase by alcohol–waterbinary mixture.optimal solvent condition was found at an alcohol composition of around 60%.This means that the bio-active ingredients related to anti-tyrosinase activity were more favorably extracted by a solvent with a lower polarity than the polyphenols and anti-oxidation ingredients.3.2.Design of solvents for extraction of bio-active ingredientsAs shown in the above experiment,the extraction efficiency of the bio-active ingredients was directly related to the polarity of the binary solvent,which was varied based on the alcohol fraction and species.Thus,an optimal solvent for the extraction of specific active ingredients could be designed if the relation-ship between the solvent polarity and the extraction efficiency were available.Hence,the solubility parameter [12]is intro-duced as a simple way of representing the polarity of a solvent and correlating the extraction efficiency of the polyphenol con-tent,anti-oxidation ingredients,and anti-tyrosinase ingredients with the polarity of the solvent.Actually,this parameter has already been frequently used to predict the miscibility and solu-bility of materials with a particular solvent [13,14].According to Hildebrand [12],the solubility parameter is directly dictated by the cohesive energy (E coh,i )composed of a linear combination of contributions from the dispersion interaction (E d,i ),polar inter-action (E p,i ),and hydrogen bonding interaction (E h ,i ),defined as:δi =E coh ,ii = E d,i +E p,i +E h,i i (4)Since the cohesion parameters are related to the correspondinginteraction energies as:E d,i +E p,i +E h,i =δ2d,i +δ2p,i +δ2h,i(5)the solubility parameter can be rearranged as:δi =δ2d,i +δ2p,i +δ2h,i(6)where δi is the solubility parameter [MPa 1/2]for species i andV i is the mole volume for species i .For pure solvents of current study,the values of the cohesion parameters,as summarized in Table 1,were then used to calculate the solubility parameters for the water and alcohol species,and the solubility parameters for the alcohol–water mixtures estimated based on a simple mixing rule as follows:δm = ix i δi (7)where δm is the solubility parameter for the alcohol–water mix-ture and x i is the volume fraction of species i in the mixture.In addition,the temperature adjustment for the solubility parameter is considered by Barton [12]: δ1δ2 2=T 2T 1(8)where T i indicates the extraction temperature [K].J.-M.Kim et al./Biochemical Engineering Journal37(2007)271–278275Fig.4.Correlations of extraction of bio-active ingredients form mulberry leaves with polarity of solvent:(a)content of polyphenols,(b)anti-oxidation ingredients and(c)anti-tyrosinase ingredients.As shown in Fig.4,the extraction efficiencies of the solvents for the bio-active ingredients in mulberry leaves,obtained using various alcohol species and compositions,correlated well with the single parameter of the solvent polarity.The optimal extrac-tion for polyphenolic compounds was achieved with a solvent polarity of38.5[MPa1/2](Fig.4(a)),corresponding to a36.9% methanol fraction,30.9%ethanol fraction,29.3%n-propanol fractions,and27.5%iso-propanol fraction in the binary mixture. Meanwhile,the optimal extraction for anti-oxidation ingredients was achieved with a solvent polarity of37.3[MPa1/2](Fig.4(b)), representing a slight increase in the alcohol fraction in the binary mixture(42%methanol,35.1%ethanol,33.3%n-propanol,and 31.2%iso-propanol fraction in the binary mixture).As such, these correlation results on the optimal solvent polarity for extraction would seem to explain why the optimal alcohol com-position varied with the alcohol species in Figs.1and2,and agreed well with the extraction efficiency profiles relative to the alcohol composition.However,the optimal solvent polarity for the anti-tyrosinase ingredients was about33.0[MPa1/2],which was lower than that for the polyphenols and anti-oxidation ingredients,as shown in Fig.4(c).From those experiment results,it would be inferred that the anti-tyrosinase ingredients were less polar and then optimally extracted out with the lower polarity of the solvent than anti-oxidation ingredients.As a result,this low polar condition for the optimum solvent meant a high alcohol fraction was required, such as a70.3%methanol,58.7%ethanol,55.7%n-propanol, and52.3%iso-propanol fraction in the binary mixture,to extract the anti-tyrosinase ingredients.The correlations of the extraction efficiency for bio-active ingredients with the solvent polarity were evaluated using model solvents with a broad polarity range.As summarized in Table2, ethylene glycol,acetone,methanol,and water were used to make the model solvents with polarities ranging from24.8to41.0 [MPa1/2],and applied to extract the bio-active ingredients,as shown in Fig.5.Despite the different species and composi-tions of the model solvents,their extraction efficiencies for the polyphenol content,anti-oxidation and anti-tyrosinase ingredi-ents were well matched with the correlations obtained using the alcohol–water binary mixtures.Furthermore,the polarities of quercetine and mulberroside F were estimated to confirm the above correlation using a func-276J.-M.Kim et al./Biochemical Engineering Journal 37(2007)271–278Table 2Solubility parameters of model solvents (at 80◦C)Solventδ(MPa 1/2)Symbols (for Fig.5)PolyphenolsAnti-oxidant Anti-tyrosinase25%ethylene glycol in water 41.10᭹42%ethylene glycol in water 39.0558%ethylene glycol in water 37.1247%acetone in water 31.8857%acetone in water 29.28♦27%acetone inmethanol24.81parison of extraction of bio-active ingredients by model solvents with correlation of extraction along with solvent polarity.tional group contribution theory [12,15],as these compounds,molecular structures shown in Fig.6,were the most active anti-oxidation and anti-tyrosinase compoundsfrom the mulberry leaves,respectively.Due to the unavailability of cohesive energyFig.6.Molecular structures of (a)quercetine and (b)mulberroside F that are most active ingredients for anti-oxidation and anti-tyrosinase,respectively,in mulberry leaves.data,the solubility parameter for quercetine and mulberroside F was predicted from the molar vaporization energy (g U i )and molar volume (g V i )of the functional group in each compound using the following equation [12]:δi = g U igV i1/2(8)As summarized in Table 3,when using the functional group data on the molar vaporization energy and molar volume with Eq.(8),the polarity of quercetine and mulberroside F was pre-dicted as 36.1and 34.5[MPa 1/2],respectively.Consequently,these polarity values for the compounds,which were consistent with the optimal solvent polarities for the extractions,provide a possible explanation why the optimal solvent conditions for the anti-tyrosinase extraction occurred at a lower solvent polarity than the anti-oxidation extraction in the correlations.The correlation of the extraction efficiency with the solvent polarity was also examined by comparing the contents of a target ingredient extracted under different solvent conditions.As shown in Fig.7,when extracted at three different solvent polarities (21.5,32.0,and 44.1[MPa 1/2]),the thin layer chro-matography spectra clearly revealed that the extraction with a solvent polarity of 32.0[MPa 1/2]contained a higher concen-tration of mulberroside F than any of the other extractions,as expected from the above correlation with the anti-tyrosinase ingredients.Table 3The molar vaporation energy and molar volume of functional groups for predic-tion of solubility parameters of quercetine and mulberroside F [9](at 25◦C)Groupg U(kJ mol −1)g V (cm 3mol −1)CH 3 4.7133.5CH 2 4.9416.1>CH 3.43−1.0CH 4.3113.5>C4.31−5.5Ring closure,five or more atoms1.0516Conjugation in ring,for each double bond1.67−2.2OH (disubstituted or on adjacent C atoms)21.913.0OH 29.810.0O 3.35 3.8CO17.410.8J.-M.Kim et al./Biochemical Engineering Journal37(2007)271–278277parison of concentration of mulberroside F extacted at three different solvent conditions.4.ConclusionsAn effective solvent for extracting bio-active ingredients, such as anti-oxidants and anti-tyrosinases,from mulberry leaves was identified by varying the solvent species and composi-tions.For the effective extraction of target ingredients with a specific activity from among many ingredients,the solvent con-dition was determined based on the polar propensity of the target ingredients.As such,the active anti-oxidation ingredi-ents and polyphenols with a high polar propensity required an alcohol–water binary solvent with a methanol fraction of about 60%for optimal extraction,whereas the anti-tyrosinase ingre-dients were most favorably drawn out by a binary solvent with an80%methanol fraction.In addition,due to the lower polar propensity of ethanol and propanols than methanol,their frac-tions of the binary mixture for the optimal extraction of the active ingredients were lower than the methanol fraction.This dependency of the optimal extraction on the solvent species and composition can be simply described by the sol-vent polarity represented by the solubility parameter.Based on correlating the extraction efficiency of the anti-oxidation and anti-tyrosinase ingredients with the solvent polarity,the opti-mal conditions for the solvent were predicted as a solubility parameter above38.0and33.0[MPa1/2],respectively.These predictions for the optimal solvent conditions for the extraction of the anti-oxidation and anti-tyrosinase ingredients were also consistent with model solvent conditions designed using acetone and ethylene glycol,and confirmed by the extracted contents of quercetine and mulberroside F,the most active anti-oxidation and anti-tyrosinase compounds in mulberry leaves,respectively.Accordingly,the correlations between the extracted contents of bio-active ingredients and the solvent polarity would appear to provide useful information on the optimal solvent conditions for the extraction of target ingredients with specific activity,which may prove to be helpful in the design of industrial processes and solvents.AcknowledgementThe authors are grateful for grants from the Korean Ministry of Health and Welfare(project no.:HMP-03-PJ1-PG1-CH14-0001).References[1]K.Doi,T.Kojima,M.Makino,Y.Kimura,Y.Fujimoto,Studies on theconstituents of the leaves of Morus alba L,Chem.Pharm.Bull.49(2) (2001)151–153.[2]C.O.Okoli,P.A.Akah,Mechanisms of the anti-inflammatory activity ofthe leaf extracts of Culcasia scandens P.Beauv(Araceae),Pharmacol.Biochem.Behav.79(2004)473–481.[3]X.C.Li,L.Cai,C.D.Wu,Antimicrobial compounds from Ceanothusamericanus against oral pathogens,Phytochemistry46(1)(1997)97–102.[4]S.Yamamura,K.Ozawa,K.Ohtani,R.Kasai,K.Yamasaki,Antihistaminicflavones and aliphatic glycosides from Mentha spicata,Phytochemistry48(1)(1998)131–136.[5]P.A.Akaha,A.C.Ezike,S.V.Nwafor,C.O.Okoli,N.M.Enwerem,Eval-uation of the anti-asthmatic property Asystasia gangetica leaf extracts,J.Ethnopharmacol.89(1)(2003)25–36.[6]H.Miguel,J.M.A.F.Pedro,S.Javier,C.Alejandro,I.Elena,Optimizationof accelerated solvent extraction of antioxidants from Spirulina platensis microalga,Food Chem.93(2005)417–423.[7]L.P.Chandrika,S.Fereidoon,Optimization of extraction of phenolic com-pounds from wheat using response surface methodology,Food Chem.93 (2005)47–56.[8]F.J.Chen,N.Nakashima,I.Kimura,M.Kimura,N.Asano,S.Koya,Potentiating effects on pilocarpine-induced saliva secretion,by extracts and278J.-M.Kim et al./Biochemical Engineering Journal37(2007)271–278N-containing sugars derived from mulberry leaves,in streptozocin-diabetic mice,Biol.Pharm.Bull.18(12)(1995)1676–1680.[9]S.H.Lee,S.Y.Choi,H.Kim,J.S.Hwang,B.G.Lee,J.J.Gao,S.Y.Kim,Mulberroside F isolated from the leaves of Morus alba inhibits melanin biosynthesis,Biol.Pharm.Bull.8(25)(2002)1045–1048.[10]J.L.Goldstein,T.Swain,Changes in tannins in ripening fruits,Phytochem-istry2(4)(1963)371–383.[11]Q.Xiong,S.Kadota,T.Tani,T.Namba,Antioxidative effects ofphenylethanoids from Cistanche deserticola,Biol.Pharm.Bull.19(12) (1996)1580–1585.[12]A.F.M.Barton,CRC Handbook of Solubility Parameters and Other Cohe-sion Parameters,CRC Press,Western Australia,1983.[13]F.Fedors,A method for estimation both the solubility parameters and molarvolumes of liquids,Polym.Eng.Sci.14(2)(1974)147–154.[14]P.Bustamante,R.Ochoa,A.Reillo,J.B.Escalera,Chameleonic effect ofsulfanilamide and sulfamethazine in solvent mixtures:solubility curves with two maxima,Chem.Pharm.Bull.42(5)(1994)1129–1133. [15]A.Grubenmann,The solvent dependence of the solubility of organic solidsand solubility parameter theory:investigation by means of an organic pig-ment,Dyes Pigments21(4)(1993)273–292.。
熵值法的英文
熵值法的英文Entropy-Based Weighting MethodEntropy is a fundamental concept in various fields, including information theory, thermodynamics, and decision-making. In the context of decision-making and data analysis, the entropy-based weighting method has emerged as a useful tool for determining the relative importance or weights of different criteria or attributes. This method provides a systematic and objective approach to assigning weights to variables, which can be particularly valuable in multi-criteria decision-making problems.The entropy-based weighting method is based on the principle that the more information a criterion or attribute provides, the more important it is in the decision-making process. The method relies on the calculation of the entropy of each criterion, which reflects the degree of uncertainty or dispersion of the data associated with that criterion. The higher the entropy, the lower the information content, and consequently, the lower the weight assigned to that criterion.The first step in the entropy-based weighting method is to construct the decision matrix, which is a table that represents the performanceof each alternative with respect to each criterion. The decision matrix can be denoted as X = [xij], where xij represents the value of the jth criterion for the ith alternative.Next, the decision matrix is normalized to ensure that all values are within the range of 0 to 1. This normalization process can be done using various techniques, such as linear normalization or vector normalization. The normalized decision matrix is denoted as R = [rij], where rij represents the normalized value of the jth criterion for the ith alternative.The entropy of each criterion is then calculated using the following formula:ej = -k * Σ(rij * ln(rij))where ej is the entropy of the jth criterion, k is a constant (usually set to 1/ln(m), where m is the number of alternatives), and rij is the normalized value of the jth criterion for the ith alternative.The weight of each criterion is then calculated as:wj = (1 - ej) / Σ(1 - ej)where wj is the weight of the jth criterion, and ej is the entropy of thejth criterion.The entropy-based weighting method has several advantages over other weighting methods, such as subjective weighting methods or equal weighting methods. First, it is an objective and data-driven approach, which means that the weights are determined based on the information content of the data rather than on subjective judgments or assumptions. This can be particularly useful in situations where decision-makers have limited knowledge or experience with the problem at hand.Second, the entropy-based weighting method is sensitive to the degree of variation in the data. Criteria with higher variation in their values will have lower entropy and, consequently, higher weights. This reflects the fact that more variable criteria are generally more informative and, therefore, more important in the decision-making process.Third, the entropy-based weighting method is relatively simple to implement and can be easily automated using computer software or spreadsheet applications. This makes it a practical and accessible tool for decision-makers in a variety of contexts, from business and finance to environmental management and public policy.Despite its advantages, the entropy-based weighting method alsohas some limitations. For example, it assumes that the criteria are independent and that the data is accurate and reliable. In situations where there are dependencies between criteria or where the data is incomplete or uncertain, the method may not produce accurate results.In conclusion, the entropy-based weighting method is a powerful tool for determining the relative importance of different criteria or attributes in decision-making. By utilizing the concept of information entropy, the method provides an objective and data-driven approach to weight assignment, which can be particularly valuable in complex, multi-criteria decision problems. While it has its limitations, the entropy-based weighting method is a widely-used and well-established technique in the field of decision analysis and data analysis.。
Optimal Algorithm
Module – 8 Lecture Notes – 4Direct and Indirect Search MethodsIntroductionMost of the real world system models involve nonlinear optimization with complicated objective functions or constraints for which analytical solutions (solutions using quadratic programming, geometric programming, etc.) are not available. In such cases one of the possible solutions is the search algorithm in which, the objective function is first computed with a trial solution and then the solution is sequentially improved based on the corresponding objective function value till convergence. A generalized flowchart of the search algorithm in solving a nonlinear optimization with decision variable X i, is presented in Fig. 1.The search algorithms can be broadly classified into two types: (1) direct search algorithm and (2) indirect search algorithm. A direct search algorithm for numerical search optimization depends on the objective function only through ranking a countable set of function values. It does not involve the partial derivatives of the function and hence it is also called nongradient or zeroth order method. Indirect search algorithm, also called the descent method, depends on the first (first-order methods) and often second derivatives (second-order methods) of the objective function. A brief overview of the direct search algorithm is presented.Direct Search AlgorithmSome of the direct search algorithms for solving nonlinear optimization, which requires objective functions, are described below:A) Random Search Method: This method generates trial solutions for the optimization model using random number generators for the decision variables. Random search method includes random jump method, random walk method and random walk method with direction exploitation. Random jump method generates huge number of data points for the decision variable assuming a uniform distribution for them and finds out the best solution by comparing the corresponding objective function values. Random walk method generates trial solution with sequential improvements which is governed by a scalar step length and a unit random vector. The random walk method with direct exploitation is an improved version of random walk method, in which, first the successful direction of generating trial solutions is found out and then maximum possible steps are taken along this successful direction.B) Grid Search Method: This methodology involves setting up of grids in the decision space and evaluating the values of the objective function at each grid point. The point which corresponds to the best value of the objective function is considered to be the optimum solution. A major drawback of this methodology is that the number of grid points increases exponentially with the number of decision variables, which makes the method computationally costlier.C) Univariate Method: This procedure involves generation of trial solutions for one decision variable at a time, keeping all the others fixed. Thus the best solution for a decision variablekeeping others constant can be obtained. After completion of the process with all the decision variables, the algorithm is repeated till convergence.D) Pattern Directions: In univariate method the search direction is along the direction of co-ordinate axis which makes the rate of convergence very slow. To overcome this drawback, the method of pattern direction is used, in which, the search is performed not along the direction of the co-ordinate axes but along the direction towards the best solution. This can be achieved with Hooke and Jeeves’ method or Powell’s method. In the Hooke and Jeeves’ method, a sequential technique is used consisting of two moves: exploratory move and the pattern move. Exploratory move is used to explore the local behavior of the objective function, and the pattern move is used to take advantage of the pattern direction. Powell’s method is a direct search method with conjugate gradient, which minimizes the quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function, conjugate gradient minimizes the computational time to convergence.E)Rosen Brock’s Method of Rotating Coordinates: This is a modified version of Hooke and Jeeves’ method, in which, the coordinate system is rotated in such a way that the first axis always orients to the locally estimated direction of the best solution and all the axes are made mutually orthogonal and normal to the first one.F) Simplex Method: Simplex method is a conventional direct search algorithm where the best solution lies on the vertices of a geometric figure in N-dimensional space made of a set of N+1 points. The method compares the objective function values at the N+1 vertices and moves towards the optimum point iteratively. The movement of the simplex algorithm is achieved by reflection, contraction and expansion.Indirect Search AlgorithmThe indirect search algorithms are based on the derivatives or gradients of the objective function. The gradient of a function in N-dimensional space is given by:⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦⎤⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣⎡∂∂∂∂∂∂=∇N x f x f x f f (21)(1)Indirect search algorithms include:A) Steepest Descent (Cauchy) Method: In this method, the search starts from an initial trial point X1, and iteratively moves along the steepest descent directions until the optimum point is found. Although, the method is straightforward, it is not applicable to the problems having multiple local optima. In such cases the solution may get stuck at local optimum points.B) Conjugate Gradient (Fletcher-Reeves) Method: The convergence technique of the steepest descent method can be greatly improved by using the concept of conjugate gradient with the use of the property of quadratic convergence.C) Newton’s Method: Newton’s method is a very popular method which is based on Taylor’s series expansion. The Taylor’s series expansion of a function f(X ) at X =X i is given by:()()()()[](i i T i i T i i X X J X X X X f X f X f −−+−∇+=21) (2) where, [J i ]=[J]|x i , is the Hessian matrix of f evaluated at the point X i . Setting the partial derivatives of Eq. (2), to zero, the minimum value of f(X ) can be obtained.()N j x X f j,...,2,1,0==∂∂ (3)From Eq. (2) and (3)[]()0=−+∇=∇i i i X X J f f(4)Eq. (4) can be solved to obtain an improved solution X i+1 []i i i i f J X X ∇−=−+11 (5)The procedure is repeated till convergence for finding out the optimal solution.D) Marquardt Method: Marquardt method is a combination method of both the steepest descent algorithm and Newton’s method, which has the advantages of both the methods, movement of function value towards optimum point and fast convergence rate. By modifying the diagonal elements of the Hessian matrix iteratively, the optimum solution is obtained in this method.E) Quasi-Newton Method: Quasi-Newton methods are well-known algorithms for finding maxima and minima of nonlinear functions. They are based on Newton's method, but they approximate the Hessian matrix, or its inverse, in order to reduce the amount of computation per iteration. The Hessian matrix is updated using the secant equation, a generalization of the secant method for multidimensional problems.It should be noted that the above mentioned algorithms can be used for solving only unconstrained optimization. For solving constrained optimization, a common procedure is the use of a penalty function to convert the constrained optimization problem into an unconstrained optimization problem. Let us assume that for a point X i , the amount of violation of a constraint is δ. In such cases the objective function is given by:()()2δλ××+=M X f X f i i (6)where, λ=1( for minimization problem) and -1 ( for maximization problem), M=dummy variable with a very high value. The penalty function automatically makes the solution inferior where there is a violation of constraint.SummaryVarious methods for direct and indirect search algorithms are discussed briefly in the present class. The models are useful when no analytical solution is available for an optimization problem. It should be noted that when there is availability of an analytical solution, the search algorithms should not be used, because analytical solution gives a global optima whereas, there is always a possibility that the numerical solution may get stuck at local optima.。
高熵合金
Microstructures and properties of high-entropyalloysYong Zhang a ,⇑,Ting Ting Zuo a ,Zhi Tang b ,Michael C.Gao c ,d ,Karin A.Dahmen e ,Peter K.Liaw b ,Zhao Ping Lu aa State Key Laboratory for Advanced Metals and Materials,University of Science and Technology Beijing,Beijing 100083,Chinab Department of Materials Science and Engineering,The University of Tennessee,Knoxville,TN 37996,USAc National Energy Technology Laboratory,1450Queen Ave SW,Albany,OR 97321,USAd URS Corporation,PO Box 1959,Albany,OR 97321-2198,USAe Department of Physics,University of Illinois at Urbana-Champaign,1110West Green Street,Urbana,IL 61801-3080,USA a r t i c l e i n f o Article history:Received 26September 2013Accepted 8October 2013Available online 1November 2013a b s t r a c tThis paper reviews the recent research and development of high-entropy alloys (HEAs).HEAs are loosely defined as solid solutionalloys that contain more than five principal elements in equal ornear equal atomic percent (at.%).The concept of high entropyintroduces a new path of developing advanced materials withunique properties,which cannot be achieved by the conventionalmicro-alloying approach based on only one dominant element.Up to date,many HEAs with promising properties have beenreported, e.g.,high wear-resistant HEAs,Co 1.5CrFeNi 1.5Ti andAl 0.2Co 1.5CrFeNi 1.5Ti alloys;high-strength body-centered-cubic(BCC)AlCoCrFeNi HEAs at room temperature,and NbMoTaV HEAat elevated temperatures.Furthermore,the general corrosion resis-tance of the Cu 0.5NiAlCoCrFeSi HEA is much better than that of theconventional 304-stainless steel.This paper first reviews HEA for-mation in relation to thermodynamics,kinetics,and processing.Physical,magnetic,chemical,and mechanical properties are thendiscussed.Great details are provided on the plastic deformation,fracture,and magnetization from the perspectives of cracklingnoise and Barkhausen noise measurements,and the analysis of ser-rations on stress–strain curves at specific strain rates or testingtemperatures,as well as the serrations of the magnetizationhysteresis loops.The comparison between conventional andhigh-entropy bulk metallic glasses is analyzed from the viewpointsof eutectic composition,dense atomic packing,and entropy of 0079-6425/$-see front matter Ó2013Elsevier Ltd.All rights reserved./10.1016/j.pmatsci.2013.10.001⇑Corresponding author.Tel.:+8601062333073;fax:+8601062333447.E-mail address:drzhangy@ (Y.Zhang).2Y.Zhang et al./Progress in Materials Science61(2014)1–93mixing.Glass forming ability and plastic properties of high-entropy bulk metallic glasses are also discussed.Modeling tech-niques applicable to HEAs are introduced and discussed,such asab initio molecular dynamics simulations and CALPHAD modeling.Finally,future developments and potential new research directionsfor HEAs are proposed.Ó2013Elsevier Ltd.All rights reserved. Contents1.Introduction (3)1.1.Four core effects (4)1.1.1.High-entropy effect (4)1.1.2.Sluggish diffusion effect (5)1.1.3.Severe lattice-distortion effect (6)1.1.4.Cocktail effect (7)1.2.Key research topics (9)1.2.1.Mechanical properties compared with other alloys (10)1.2.2.Underlying mechanisms for mechanical properties (11)1.2.3.Alloy design and preparation for HEAs (11)1.2.4.Theoretical simulations for HEAs (12)2.Thermodynamics (12)2.1.Entropy (13)2.2.Thermodynamic considerations of phase formation (15)2.3.Microstructures of HEAs (18)3.Kinetics and alloy preparation (23)3.1.Preparation from the liquid state (24)3.2.Preparation from the solid state (29)3.3.Preparation from the gas state (30)3.4.Electrochemical preparation (34)4.Properties (34)4.1.Mechanical behavior (34)4.1.1.Mechanical behavior at room temperature (35)4.1.2.Mechanical behavior at elevated temperatures (38)4.1.3.Mechanical behavior at cryogenic temperatures (45)4.1.4.Fatigue behavior (46)4.1.5.Wear behavior (48)4.1.6.Summary (49)4.2.Physical behavior (50)4.3.Biomedical,chemical and other behaviors (53)5.Serrations and deformation mechanisms (55)5.1.Serrations for HEAs (56)5.2.Barkhausen noise for HEAs (58)5.3.Modeling the Serrations of HEAs (61)5.4.Deformation mechanisms for HEAs (66)6.Glass formation in high-entropy alloys (67)6.1.High-entropy effects on glass formation (67)6.1.1.The best glass former is located at the eutectic compositions (67)6.1.2.The best glass former is the composition with dense atomic packing (67)6.1.3.The best glass former has high entropy of mixing (67)6.2.GFA for HEAs (68)6.3.Properties of high-entropy BMGs (70)7.Modeling and simulations (72)7.1.DFT calculations (73)7.2.AIMD simulations (75)7.3.CALPHAD modeling (80)8.Future development and research (81)Y.Zhang et al./Progress in Materials Science61(2014)1–9338.1.Fundamental understanding of HEAs (82)8.2.Processing and characterization of HEAs (83)8.3.Applications of HEAs (83)9.Summary (84)Disclaimer (85)Acknowledgements (85)References (85)1.IntroductionRecently,high-entropy alloys(HEAs)have attracted increasing attentions because of their unique compositions,microstructures,and adjustable properties[1–31].They are loosely defined as solid solution alloys that contain more thanfive principal elements in equal or near equal atomic percent (at.%)[32].Normally,the atomic fraction of each component is greater than5at.%.The multi-compo-nent equi-molar alloys should be located at the center of a multi-component phase diagram,and their configuration entropy of mixing reaches its maximum(R Ln N;R is the gas constant and N the number of component in the system)for a solution phase.These alloys are defined as HEAs by Yeh et al.[2], and named by Cantor et al.[1,33]as multi-component alloys.Both refer to the same concept.There are also some other names,such as multi-principal-elements alloys,equi-molar alloys,equi-atomic ratio alloys,substitutional alloys,and multi-component alloys.Cantor et al.[1,33]pointed out that a conventional alloy development strategy leads to an enor-mous amount of knowledge about alloys based on one or two components,but little or no knowledge about alloys containing several main components in near-equal proportions.Theoretical and experi-mental works on the occurrence,structure,and properties of crystalline phases have been restricted to alloys based on one or two main components.Thus,the information and understanding are highly developed on alloys close to the corners and edges of a multi-component phase diagram,with much less knowledge about alloys located at the center of the phase diagram,as shown schematically for ternary and quaternary alloy systems in Fig.1.1.This imbalance is significant for ternary alloys but becomes rapidly much more pronounced as the number of components increases.For most quater-nary and other higher-order systems,information about alloys at the center of the phase diagram is virtually nonexistent except those HEA systems that have been reported very recently.In the1990s,researchers began to explore for metallic alloys with super-high glass-forming ability (GFA).Greer[29]proposed a confusion principle,which states that the more elements involved,the lower the chance that the alloy can select viable crystal structures,and thus the greater the chanceand quaternary alloy systems,showing regions of the phase diagram thatand relatively less well known(white)near the center[33].solid-solutions even though the cooling rate is very high,e.g.,alloys of CuCoNiCrAlFeTiV,FeCrMnNiCo,CoCrFeNiCu,AlCoCrFeNi,NbMoTaWV,etc.[1,2,12–14].The yield strength of the body-centered cubic (BCC)HEAs can be rather high [12],usually compa-rable to BMGs [12].Moreover,the high strength can be kept up to 800K or higher for some HEAs based on 3d transition metals [14].In contrast,BMGs can only keep their high strength below their glass-transition temperature.1.1.Four core effectsBeing different from the conventional alloys,compositions in HEAs are complex due to the equi-molar concentration of each component.Yeh [37]summarized mainly four core effects for HEAs,that is:(1)Thermodynamics:high-entropy effects;(2)Kinetics:sluggish diffusion;(3)Structures:severe lattice distortion;and (4)Properties:cocktail effects.We will discuss these four core effects separately.1.1.1.High-entropy effectThe high-entropy effects,which tend to stabilize the high-entropyphases,e.g.,solid-solution phases,were firstly proposed by Yeh [9].The effects were very counterintuitive because it was ex-pected that intermetallic compound phases may form for those equi-or near equi-atomic alloy com-positions which are located at the center of the phase diagrams (for example,a monoclinic compound AlCeCo forms in the center of Al–Ce–Co system [38]).According to the Gibbs phase rule,the number of phases (P )in a given alloy at constant pressure in equilibrium condition is:P ¼C þ1ÀF ð1-1Þwhere C is the number of components and F is the maximum number of thermodynamic degrees of freedom in the system.In the case of a 6-component system at given pressure,one might expect a maximum of 7equilibrium phases at an invariant reaction.However,to our surprise,HEAs form so-lid-solution phases rather than intermetallic phases [1,2,4,17].This is not to say that all multi-compo-nents in equal molar ratio will form solid solution phases at the center of the phase diagram.In fact,only carefully chosen compositions that satisfy the HEA-formation criteria will form solid solutions instead of intermetallic compounds.The solid-solution phase,according to the classical physical-metallurgy theory,is also called a ter-minal solid solution.The solid-solution phase is based on one element,which is called the solvent,and contains other minor elements,which are called the solutes.In HEAs,it is very difficult to differentiate the solvent from the solute because of their equi-molar portions.Many researchers reported that the multi-principal-element alloys can only form simple phases of body-centered-cubic (BCC)or face-cen-tered-cubic (FCC)solid solutions,and the number of phases formed is much fewer than the maximum number of phases that the Gibbs phase rule allows [9,23].This feature also indicates that the high en-tropy of the alloys tends to expand the solution limits between the elements,which may further con-firm the high-entropy effects.The high-entropy effect is mainly used to explain the multi-principal-element solid solution.According to the maximum entropy production principle (MEPP)[39],high entropy tends to stabilize the high-entropy phases,i.e.,solid-solution phases,rather than intermetallic phases.Intermetallics are usually ordered phases with lower configurational entropy.For stoichiometric intermetallic com-pounds,their configurational entropy is zero.Whether a HEA of single solid solution phase is in its equilibrium has been questioned in the sci-entific community.There have been accumulated evidences to show that the high entropy of mixing truly extends the solubility limits of solid solution.For example,Lucas et al.[40]recently reported ab-sence of long-range chemical ordering in equi-molar FeCoCrNi alloy that forms a disordered FCC struc-ture.On the other hand,it was reported that some equi-atomic compositions such as AlCoCrCuFeNi contain several phases of different compositions when cooling slowly from the melt [15],and thus it is controversial whether they can be still classified as HEA.The empirical rules in guiding HEA for-mation are addressed in Section 2,which includes atomic size difference and heat of mixing.4Y.Zhang et al./Progress in Materials Science 61(2014)1–93Y.Zhang et al./Progress in Materials Science61(2014)1–935 1.1.2.Sluggish diffusion effectThe sluggish diffusion effect here is compared with that of the conventional alloys rather than the bulk-glass-forming alloys.Recently,Yeh[9]studied the vacancy formation and the composition par-tition in HEAs,and compared the diffusion coefficients for the elements in pure metals,stainless steels, and HEAs,and found that the order of diffusion rates in the three types of alloy systems is shown be-low:Microstructures of an as-cast CuCoNiCrAlFe alloy.(A)SEM micrograph of an etched alloy withBCC and ordered BCC phases)and interdendrite(an FCC phase)structures.(B)TEMplate,70-nm wide,a disordered BCC phase(A2),lattice constant,2.89A;(B-b)aphase(B2),lattice constant,2.89A;(B-c)nanoprecipitation in a spinodal plate,7nm(B-d)nanoprecipitation in an interspinodal plate,3nm in diameter,a disorderedarea diffraction(SAD)patterns of B,Ba,and Bb with zone axes of BCC[01[011],respectively[2].illustration of intrinsic lattice distortion effects on Bragg diffraction:(a)perfect latticewith solid solutions of different-sized atoms,which are expected to randomly distribute statistical average probability of occupancy;(c)temperature and distortion effectsY.Zhang et al./Progress in Materials Science61(2014)1–937 the intensities further drop beyond the thermal effect with increasing the number of constituent prin-cipal elements.An intrinsic lattice distortion effect caused by the addition of multi-principal elements with different atomic sizes is expected for the anomalous decrease in the XRD intensities.The math-ematical treatment of this distortion effect for the modification of the XRD structure factor is formu-lated to be similar to that of the thermal effect,as shown in Fig.1.3[41].The larger roughness of the atomic planes makes the intensity of the XRD for HEAs much lower than that for the single-element solid.The severe lattice distortion is also used to explain the high strength of HEAs,especially the BCC-structured HEAs[4,12,23].The severe lattice-distortion effect is also related to the tensile brittle-ness and the slower kinetics of HEAs[2,9,11].However,the authors also noticed that single-phase FCC-structured HEAs have very low strength[7],which certainly cannot be explained by the severe lattice distortion argument.Fundamental studies in quantification of lattice distortion of HEAs are needed.1.1.4.Cocktail effectThe cocktail-party effect was usually used as a term in the acousticsfield,which have been used to describe the ability to focus one’s listening attention on a single talker among a mixture of conversa-tions and background noises,ignoring other conversations.For metallic alloys,the effect indicates that the unexpected properties can be obtained after mixing many elements,which could not be obtained from any one independent element.The cocktail effect for metallic alloys wasfirst mentioned by Ranganathan[42],which has been subsequently confirmed in the mechanical and physical properties [12,13,15,18,35,43].The cocktail effect implies that the alloy properties can be greatly adjusted by the composition change and alloying,as shown in Fig.1.4,which indicates that the hardness of HEAs can be dramat-ically changed by adjusting the Al content in the CoCrCuNiAl x HEAs.With the increase of the Al con-lattice constants of a CuCoNiCrAl x Fe alloy system with different x values:(A)hardnessconstants of an FCC phase,(C)lattice constants of a BCC phase[2].CoNiCrAl x Fe alloy system with different x values,the Cu-free alloy has lower hardness.CoCrCuFeNiAl x[15,45].Cu forms isomorphous solid solution with Ni but it is insoluble in Co,Cr and Fe;it dissolves about20at.%Al but also forms various stable intermetallic compounds with Al.Fig.1.6exhibits the hardness of some reported HEAs in the descending order with stainless steels as benchmark.The MoTiVFeNiZrCoCr alloy has a very high value of hardness of over800HV while CoCrFeNiCu is very soft with a value of less than200HV.Fig.1.7compares the specific strength,which yield strength over the density of the materials,and the density amongalloys,polymers and foam materials[5].We can see that HEAs have densitieshigh values of specific strength(yield strength/density).This is partiallyHEAs usually contain mainly the late transitional elements whoselightweight HEAs have much more potential because lightweightdensity of the resultant alloys will be lowered significantly.Fig.1.8strength of HEAs vs.Young’s modulus compared with conventional alloys.highest specific strength and their Young’s modulus can be variedrange of hardness for HEAs,compared with17–4PH stainless steel,Hastelloy,andYield strength,r y,vs.density,q.HEAs(dark dashed circle)compared with other materials,particularly structural Grey dashed contours(arrow indication)label the specific strength,r y/q,from low(right bottom)to high(left top).among the materials with highest strength and specific strength[5].Specific-yield strength vs.Young’s modulus:HEAs compared with other materials,particularly structural alloys.among the materials with highest specific strength and with a wide range of Young’s modulus[5].range.This observation may indicate that the modulus of HEAs can be more easily adjusted than con-ventional alloys.In addition to the high specific strength,other properties such as high hydrogen stor-age property are also reported[46].1.2.Key research topicsTo understand the fundamentals of HEAs is a challenge to the scientists in materials science and relatedfields because of lack of thermodynamic and kinetic data for multi-component systems in the center of phase diagrams.The phase diagrams are usually available only for the binary and ternary alloys.For HEAs,no complete phase diagrams are currently available to directly assist designing the10Y.Zhang et al./Progress in Materials Science61(2014)1–93alloy with desirable micro-and nanostructures.Recently,Yang and Zhang[28]proposed the X param-eter to design the solid-solution phase HEAs,which should be used combing with the parameter of atomic-size difference.This strategy may provide a starting point prior to actual experiments.The plastic deformation and fracture mechanisms of HEAs are also new because the high-entropy solid solutions contain high contents of multi-principal elements.In single principal-element alloys,dislo-cations dominate the plastic behavior.However,how dislocations interact with highly-disordered crystal lattices and/or chemical disordering/ordering will be an important factor responsible for plastic properties of HEAs.Interactions between the other crystal defects,such as twinning and stacking faults,with chemical/crystal disordering/ordering in HEAs will be important as well.1.2.1.Mechanical properties compared with other alloysFor conventional alloys that contain a single principal element,the main mechanical behavior is dictated by the dominant element.The other minor alloying elements are used to enhance some spe-cial properties.For example,in the low-carbon ferritic steels[47–59],the main mechanical properties are from the BCC Fe.Carbon,which is an interstitial solute element,is used for solid-solution strength-ened steels,and also to enhance the martensite-quenching ability which is the phase-transformation strengthening.The main properties of steels are still from Fe.For aluminum alloys[60]and titanium alloys[61],their properties are mainly related to the dominance of the elemental aluminum and tita-nium,respectively.Intermetallic compounds are usually based on two elements,e.g.,Ti–Al,Fe3Al,and Fe3Si.Interme-tallic compounds are typically ordered phases and some may have strict compositional range.The Burgers vectors of the ordered phases are too large for the dislocations to move,which is the main reason why intermetallic phases are usually brittle.However,there are many successful case studies to improve the ductility of intermetallic compound by micro-alloying,e.g.,micro-alloying of B in Ni3Al [62],and micro-alloying of Cr in Fe3Al[63,64].Amorphous metals usually contain at least three elements although binary metallic glasses are also reported,and higher GFA can be obtained with addition of more elements,e.g.,ZrTiCuNiBe(Vit-1), PdNiCuP,LaAlNiCu,and CuZrAlY alloys[65–69].Amorphous metals usually exhibit ultrahigh yield strength,because they do not contain conventional any weakening factors,such as dislocations and grain boundaries,and their yield strengths are usually three tofive times of their corresponding crys-talline counterpart alloys.There are several models that are proposed to explain the plastic deforma-tion of the amorphous metal,including the free volume[70],a shear-transformation-zone(STZ)[71], more recently a tension-transition zone(TTZ)[72],and the atomic-level stress[73,74].The micro-mechanisms of the plastic deformation of amorphous metals are usually by forming shear bands, which is still an active research area till today.However,the high strength of amorphous alloys can be sustained only below the glass-transition temperature(T g).At temperatures immediately above T g,the amorphous metals will transit to be viscous liquids[68]and will crystallize at temperatures above thefirst crystallization onset temperature.This trend may limit the high-temperature applica-tions of amorphous metals.The glass forming alloys often are chemically located close to the eutectic composition,which further facilitates the formation of the amorphous metal–matrix composite.The development of the amorphous metal–matrix composite can enhance the room-temperature plastic-ity of amorphous metals,and extend application temperatures[75–78].For HEAs,their properties can be different from any of the constituent elements.The structure types are the dominant factor for controlling the strength or hardness of HEAs[5,12,13].The BCC-structured HEAs usually have very high yield strengths and limited plasticity,while the FCC-structured HEAs have low yield strength and high plasticity.The mixture of BCC+FCC is expected to possess balanced mechanical properties,e.g.,both high strength and good ductility.Recent studies show that the microstructures of certain‘‘HEAs’’can be very complicated since they often undergo the spinodal decomposition,and ordered,and disordered phase precipitates at lower temperatures. Solution-strengthening mechanisms for HEAs would be much different from conventional alloys. HEAs usually have high melting points,and the high yield strength can usually be sustained to ultrahigh temperatures,which is shown in Fig.1.9for refractory metal HEAs.The strength of HEAs are sometimes better than those of conventional superalloys[14].Temperature dependence of NbMoTaW,VNbMoTaW,Inconel718,and Haynes2301.2.2.Underlying mechanisms for mechanical propertiesMechanical properties include the Young’s modulus,yield strength,plastic elongation,fracture toughness,and fatigue properties.For the conventional one-element principal alloys,the Young’s modulus is mainly controlled by the dominant element,e.g.,the Young’s modulus of Fe-based alloys is about200GPa,that of Ti-based alloys is approximately110GPa,and that of Al-based alloys is about 75GPa,as shown in Fig.1.8.In contrast,for HEAs,the modulus can be very different from any of the constituent elements in the alloys[79],and the moduli of HEAs are scattered in a wide range,as shown in Fig.1.8.Wang et al.[79] reported that the Young’s modulus of the CoCrFeNiCuAl0.5HEA is about24.5GPa,which is much lower than the modulus of any of the constituent elements in the alloy.It is even lower than the Young’s modulus of pure Al,about69GPa[80].On the other hand,this value needs to be verified using other methods including impulse excitation of vibration.It has been reported that the FCC-structured HEAs exhibit low strength and high plasticity[13], while the BCC-structured HEAs show high strength and low plasticity at room temperature[12].Thus, the structure types are the dominant factor for controlling the strength or hardness of HEAs.For the fracture toughness of the HEAs,there is no report up to date.1.2.3.Alloy design and preparation for HEAsIt has been verified that not all the alloys withfive-principal elements and with equi-atomic ratio compositions can form HEA solid solutions.Only carefully chosen compositions can form FCC and BCC solid solutions.Till today there is no report on hexagonal close-packed(HCP)-structured HEAs.One reason is probably due to the fact that a HCP structure is often the stable structure at low tempera-tures for pure elements(applicable)in the periodic table,and that it may transform to either BCC or FCC at high temperatures.Most of the HEA solid solutions are identified by trial-and-error exper-iments because there is no phase diagram on quaternary and higher systems.Hence,the trial-and er-ror approach is the main way to develop high-performance HEAs.However,some parameters have been proposed to predict the phase formation of HEAs[17,22,28]in analogy to the Hume-Rothery rule for conventional solid solution.The fundamental thermodynamic equation states:G¼HÀTSð1-2Þwhere H is the enthalpy,S is the entropy,G is the Gibbs free energy,and T is the absolute temperature. From Eq.(1-2),the TS term will become significant at high temperatures.Hence,preparing HEAs from the liquid and gas would provide different kinds of information.These techniques may include sput-tering,laser cladding,plasma coating,and arc melting,which will be discussed in detail in the next chapter.For the atomic-level structures of HEAs,the neutron and synchrotron diffraction methods are useful to detect ordering parameters,long-range order,and short-range ordering[81].1.2.4.Theoretical simulations for HEAsFor HEAs,entropy effects are the core to their formation and properties.Some immediate questions are:(1)How can we accurately predict the total entropy of HEA phase?(2)How can we predict the phasefield of a HEA phase as a function of compositions and temperatures?(3)What are the proper modeling and experimental methods to study HEAs?To address the phase-stability issue,thermody-namic modeling is necessary as thefirst step to understand the fundamental of HEAs.The typical mod-eling techniques to address thermodynamics include the calculation of phase diagram(CALPHAD) modeling,first-principle calculations,molecular-dynamics(MD)simulations,and Monte Carlo simulations.Kao et al.[82]using MD to study the structure of HEAs,and their modeling efforts can well explain the liquid-like structure of HEAs,as shown in Fig.1.10.Grosso et al.[83]studied refractory HEAs using atomistic modeling,clarified the role of each element and their interactions,and concluded that4-and 5-elements alloys are possible to quantify the transition to a high-entropy regime characterized by the formation of a continuous solid solution.2.Thermodynamicsof a liquid-like atomic-packing structure using multiple elementsthird,fourth,andfifth shells,respectively,but the second and third shellsdifference and thus the largefluctuation in occupation of different atoms.2.1.EntropyEntropy is a thermodynamic property that can be used to determine the energy available for the useful work in a thermodynamic process,such as in energy-conversion devices,engines,or machines. The following equation is the definition of entropy:dS¼D QTð2-1Þwhere S is the entropy,Q is the heatflow,and T is the absolute temperature.Thermodynamic entropy has the dimension of energy divided by temperature,and a unit of Joules per Kelvin(J/K)in the Inter-national System of Units.The statistical-mechanics definition of entropy was developed by Ludwig Boltzmann in the1870s [85]and by analyzing the statistical behavior of the microscopic components of the system[86].Boltz-mann’s hypothesis states that the entropy of a system is linearly related to the logarithm of the fre-quency of occurrence of a macro-state or,more precisely,the number,W,of possible micro-states corresponding to the macroscopic state of a system:Fig.2.1.Illustration of the D S mix for ternary alloy system with the composition change[17].。
基于偏振补偿的级联式液晶偏振光栅衍射效率优化方法
第 38 卷第 7 期2023 年 7 月Vol.38 No.7Jul. 2023液晶与显示Chinese Journal of Liquid Crystals and Displays基于偏振补偿的级联式液晶偏振光栅衍射效率优化方法胡虎1,刘雪莲2,王春阳2,3*,王子硕3,梁书宁3(1.西安工业大学兵器科学与技术学院,陕西西安 710021;2.西安工业大学西安市主动光电成像探测技术重点实验室,陕西西安 710021;3.长春理工大学电子信息工程学院,吉林长春 130022)摘要:针对光束在级联式液晶偏振光栅传播过程中的斜入射现象会改变其椭圆率进而降低衍射效率的问题,提出了基于偏振补偿的级联式液晶偏振光栅衍射效率优化方法。
运用扩展琼斯矩阵分析斜入射角度对液晶偏振光栅相位延迟的影响,结合斯托克斯参数求解不同斜入射角度下液晶偏振光栅出射光束椭圆率的变化,利用矢量衍射理论完成级联式液晶偏振光栅衍射效率模型的建立。
基于液晶分子指向矢分布建立斜入射下液晶电控波片的坐标系,分析斜入射角度对液晶电控波片相位延迟的影响,推导出液晶电控波片相位延迟与椭圆率的关系。
通过优化液晶电控波片的工作电压补偿斜入射造成的退偏量,实现对椭圆率的偏振补偿。
搭建实验平台验证了理论的准确性与方法的有效性。
根据测量结果可知,当光束偏转角度从-5°偏转到5°时,本文所提方法提高了级联式液晶偏振光栅3%~4%的衍射效率。
关键词:级联式液晶偏振光栅;衍射效率;扩展琼斯矩阵;斯托克斯参数;液晶电控波片中图分类号:TN958.92 文献标识码:A doi:10.37188/CJLCD.2023-0043Optimization method of diffraction efficiency of cascade liquid crystal polarization gratings based on polarization compensation HU Hu1,LIU Xue-lian2,WANG Chun-yang2,3*,WANG Zi-shuo3,LIANG Shu-ning3(1.College of Ordnance Science and Technology, Xi'an Technological University,Xi'an 710021, China;2.Xi'an Key Laboratory of Active Photoelectric Imaging Detection Technology,Xi'an Technological University, Xi'an 710021, China;3.College of Electronic and Information Engineering, Changchun University of Science and Technology,Changchun 130022, China)Abstract:Aiming at the problem that the oblique incidence of beam during the propagation of cascaded liquid crystal polarization gratings will change its ellipticity and reduce the diffraction efficiency,an optimization 文章编号:1007-2780(2023)07-0880-12收稿日期:2023-02-07;修订日期:2023-03-07.基金项目:中国博士后科学基金(No.2020M673606XB);西安市智能兵器重点实验室项目(No.2019220514SYS020CG042)Supported by China Postdoctoral Science Foundation (No.2020M673606XB); Xi'an Key Laboratory ofIntelligent Weapons(No.2019220514SYS020CG042)*通信联系人,E-mail:Wangchunyang19@第 7 期胡虎,等:基于偏振补偿的级联式液晶偏振光栅衍射效率优化方法method of diffraction efficiency of cascaded liquid crystal polarization gratings based on polarization compensation is proposed.The effect of oblique incidence on retardation of liquid crystal polarization grating is analyzed by using extended Jones bined with Stokes parameter,the ellipticity of liquid crystal polarization grating outgoing beam at different oblique incidence angles is analyzed. Based on the vector diffraction theory, the diffraction efficiency model of cascaded liquid crystal polarization gratings is established. Based on the liquid crystal molecular direction vector distribution, the coordinate system of liquid crystal variable retarder under oblique incidence is established.The influence of oblique incidence angle on retardation of liquid crystal variable retarder is analyzed, and the relationship between retardation and ellipticity is derived. By optimizing the operating voltage of liquid crystal variable retarder, the depolarization caused by oblique incidence is compensated to realize the polarization compensation of ellipticity. The experimental platform is set up to verify the accuracy of the theory and the effectiveness of the method. According to the measurement results, when the beam deflection angle is from -5° to 5°, the proposed method improves the diffraction efficiency of the cascaded liquid crystal polarization gratings by 3%~4%.Key words: cascaded liquid crystal polarization gratings;diffraction efficiency;extended Jones matrix;stokes parameter; liquid crystal variable retarder1 引言液晶偏振光栅(Liquid Crystal Polarization Grating, LCPG)是一种新型的高衍射效率、超薄型、可实现非机械式光束偏转的液晶器件,其基于几何相位来调制入射光的偏振态,进而控制出射光的偏转级次。
On the Cauchy problem for a nonlinear Kolmogorov equation
ON THE CAUCHY PROBLEM FOR A NONLINEAR KOLMOGOROV EQUATION ∗ANDREA PASCUCCI †AND SERGIO POLIDORO †SIAM J.M ATH.A NAL .c2003Society for Industrial and Applied Mathematics Vol.35,No.3,pp.579–595Abstract.We consider the Cauchy problem related to the partial differential equationLu ≡∆x u +h (u )∂y u −∂t u =f (·,u ),where (x,y,t )∈R N ×R ×]0,T [,which arises in mathematical finance and in the theory of diffusion processes.We study the regularity of solutions regarding L as a perturbation of an operator of Kolmogorov type.We prove the existence of local classical solutions and give some sufficient conditions for global existence.Key words.nonlinear degenerate Kolmogorov equation,interior regularity,H¨o rmander opera-tors AMS subject classifications.35K57,35K65,35K70DOI.10.1137/S00361410013993491.Introduction.In this paper we study the Cauchy problemLu =f (·,u )in S T ≡R N +1×]0,T [,(1.1)u (·,0)=g in R N +1,(1.2)where L is the nonlinear operator defined byLu =∆x u +h (u )∂y u −∂t u,(1.3)(x,y,t )=z denotes the point in R N ×R ×R ,and ∆x is the Laplace operator acting in the variable x ∈R N .We assume that f,g ,and h are globally Lipschitz continuous functions.One of the main features of operator (1.3)is the strong degeneracy of its char-acteristic form due to the lack of diffusion in the y -direction,so that (1.1)–(1.2)may include the Cauchy problem for the Burgers equation,when h (u )=u ,g =g (y ),and f ≡0.On the other hand,L can be considered as nonlinear version of the operator K =∆x +x 1∂y −∂t ,(1.4)which was introduced by Kolmogorov [17]and has been extensively studied by Kuptsov [12]and Lanconelli and Polidoro [14].Among the known results of K ,we recall that every solution to Ku =0is smooth;thus we may expect some regularity properties also for the solutions to (1.1).Problem (1.1)–(1.2)arises in mathematical finance as well as in the study of nonlinear physical phenomena such as the combined effects of diffusion and convection of matter.∗Receivedby the editors December 10,2001;accepted for publication (in revised form)March 21,2003;published electronically October 2,2003.This research was supported by University of Bologna funds for selected research topics./journals/sima/35-3/39934.html †Dipartimento di Matematica,Universit`a di Bologna,Piazza di Porta S.Donato 5,40127Bologna,Italy (pascucci@dm.unibo.it,polidoro@dm.unibo.it).579580ANDREA PASCUCCI AND SERGIO POLIDOROEscobedo,Vazquez,and Zuazua [8]prove that there exists a unique distributional solution to (1.1)–(1.2)satisfying an entropy condition that generalizes the one by Kruzhkov [11].This solution is characterized in the vanishing viscosity sense;i.e.,it is the limit of a sequence of classical solutions to Cauchy problems related to the regularized operatorL εu =∆x u +ε2∂2y u +h (u )∂y u −∂t u.(1.5)Vol’pert and Hudjaev [19]prove similar existence and uniqueness results in a space of bounded variation functions whose spatial derivatives are square integrable with respect to (w.r.t.)a suitable weight.In this framework,it is natural to consider bounded and integrable initial data g and nonlinearities of the form h (u )=u p −1for p ∈]1,N +2N +1[.Our paper is mainly motivated by the theory of agents’decisions under risk,arising in mathematical finance.The problem is the representation of agents’pref-erences over consumption processes.Antonelli,Barucci,and Mancino [1]propose a utility functional that takes into account aspects of decision making such as the agents’habit formation,which is described as a smoothed average of past consump-tion and expected utility.In that model the processes utility and habit are described by a system of backward-forward stochastic differential equations.The solution of such a system,as a function of consumption and time,satisfies the Cauchy problem (1.1)–(1.2).Our regularity assumption on f,g,h is required by the financial model,since these functions appear in the backward-forward system as Lipschitz continuous coefficients.In the paper by Antonelli and Pascucci [2]an existence result,in the case N =1,is proved by probabilistic techniques that exploit the properties of the solutions to the system of backward-forward stochastic differential equations related to (1.1)–(1.2).In [2],the existence of a viscosity solution,in the sense of [7],is proved.The solution is defined in a suitably small strip R 2×[0,T ]and satisfies the following conditions:|u (x,y,t )−u (x ,y ,t )|≤c 0(|x −x |+|y −y |),|u (x,y,t )−u (x,y,t )|≤c 0(1+|(x,y )|)|t −t |12(1.6)for every (x,y ),(x ,y )∈R 2,t,t ∈[0,T ],where c 0is a positive constant that depends on the Lipschitz constants of f,g ,and h .Concerning the regularity of u ,we remark that the results by Caffarelli and Cabr´e [3]and Wang [20,21]do not apply to our operator.In this paper we prove the existence of a classical solution u to problem (1.1)–(1.2)by combining the analysis on Lie groups with the standard techniques for the Cauchy problem related to degenerate parabolic equations.We say that u is a classical solution if ∂x j x k u ,j,k =1,...,N,the directional derivativez −→Y u (z )=∂u ∂νz (z ),ν(z )=(0,h (u (z )),−1),are continuous functions,and (1.1)–(1.2)are verified at every point.Our main result is the following.Theorem 1.1.There exists a positive T and a unique function u ∈C (S T ),verifying estimates (1.6)on S T ,which is a classical solution to (1.1)–(1.2).We stress that the regularity stated above is natural for the problem under con-sideration.Indeed,although Y u is the sum of the more simple terms h (u )∂y u andA NONLINEAR KOLMOGOROV EQUATION581∂t u,it is not true in general that they are continuous functions.Further regularity properties of solutions can be obtained under additional conditions.For instance, in[5,6]in collaboration with Citti,we considered the nonlinear equation in three variables,(1.7)∂xx u+u∂y u−∂t u=0,which is a special case of(1.1).Assuming a hypothesis formally analogous to the classical H¨o rmander condition,we proved that the viscosity solution u of(1.7)con-structed in[2]actually is a C∞classical solution.In this paper we give a direct proof of the existence of a classical solution to the Cauchy problem(1.1)–(1.2)by using analytical methods.The regularity part in Theorem1.1is based on a modification of the classical freezing method,introduced by Citti in[4]for the study of the Levi equation.More precisely,for any¯z∈S T,we approximate L by the linear operatorL¯z=∆x+(h(u(¯z))+x1−¯x1)∂y−∂t,(1.8)and we represent a solution u in terms of its fundamental solution.Note that up to a straightforward change of coordinates,L¯z is the Kolmogorov operator(1.4),and hence an explicit expression of the fundamental solution of L¯z is available.Also note that L¯z is a good approximation of L in the sense that,by(1.6),we have|Lu(z)−L¯z u(z)|=|u(z)−u(¯z)−(x1−¯x1)||∂y u(z)|≤c0d(¯z,z),where d(¯z,z)is the standard parabolic distance.The existence part of Theorem1.1relies on the Bernstein technique.We explicitly note that the nonlinearity in(1.3)is not monotone;therefore a maximum principle for the operator Lv+h (u)v2,which occurs when we differentiate both sides of(1.1) w.r.t.y,does not hold unless we assume condition(1.6).We end this introduction with a remark about the existence of global solutions. Wefirst note that the space of functions characterized by conditions(1.6)is,in some sense,optimal for the existence of local classical solutions.Indeed the linear growth of the initial data g does not allow,in general,solutions that are defined at every time t>0,as the following example given in[2]shows.Consider the problem(1.7), with f≡0and g(x,y)=x+y:a direct computation shows that u(x,y,t)=x+y1−t is the unique solution to the problem and blows up as t→1.This fact is expectedsince,if u grows as a linear function,then its Cole–Hopf transformed function grows as exp(y2),which is the critical case for the parabolic Cauchy problem.Next we give a simple sufficient condition for the global existence of classical solutions.Theorem1.2.Let f,g,and h be globally Lipschitz continuous functions.Suppose that g is nonincreasing w.r.t.y,that f is nondecreasing w.r.t.y,and that there exists c0∈]0,c1]such thatc0(u−v)≤h(u)−h(v)(1.9)for every u,v∈R.Then the Cauchy problem(1.1)–(1.2)has a solution u that is defined in R N+1×R+.This paper is organized as follows.In section2we prove Theorem1.1,and in section3we prove the existence of a viscosity solution of(1.1)–(1.2).Section4is devoted to the proof of Theorem1.2.582ANDREA PASCUCCI AND SERGIO POLIDORO2.Classical solutions.In this section we prove Theorem 1.1.We first state an existence and uniqueness result of a strong solution u to problem (1.1)–(1.2).And then we prove that u is a solution in the classical sense.We say that a continuous function u is a strong solution to (1.1)–(1.2)if u ∈H 1loc (S T ),∂x j x k u ∈L 2loc (S T ),j,k =1,...,N ,it satisfies equation (1.1)a.e.,and it assumes the initial datum g .Theorem 2.1.If T is suitably small,there exists a unique strong solution of (1.1)–(1.2)verifying estimates (1.6)on S T .The proof of Theorem 2.1is postponed to section3.We remark that in the above statement,we consider the termY u =h (u )∂y u −∂t uas a sum of weak derivatives.Here we aim to prove that Y u is a continuous function and that it coincides with the directional derivative w.r.t.the vector νz =(0,h (u ),−1),namely,Y (u (z ))=∂u ∂νz (z )∀z ∈S T .(2.1)In what follows,when we consider a function F that depends on many variables,to avoid any ambiguity we shall systematically write the directional derivative introduced in (2.1)asY (z )F (·,ζ)=∂F (·,ζ)∂νz(z ).Our technique is inspired by the recent paper [6],where,in collaboration with Citti,we developed some ideas for a general study of a nonlinear equation of the form (1.4).We recall the following lemma,which has been proved in Lemma 3.1of [6],for the Cauchy problem (1.7).We state the lemma for the operator (1.3)and omit the proof,since it is analogous to the one given in [6].Lemma 2.2.Let v be a continuous function defined in S T .Assume that its weak derivatives v y ,v t ∈L 2loc and that the limitlimδ→0v (z +δνz )−v (z )δexists and is uniform w.r.t.z in compact subsets of S T .Then ∂v ∂νz (z )=(h (u )∂y v −∂t v )(z ) a.e.z ∈S T .We next prove Theorem 1.1by using a representation formula of the strong so-lution u in terms of the fundamental solution of the operator L ¯z introduced in (1.8).We define the first order operators (vector fields)X j =∂x j ,j =1,...,N,Y ¯z =(h (u (¯z ))+x 1−¯x 1)∂y −∂t .(2.2)Thus we can rewrite the operator L ¯z in the standard formL ¯z =Nj =1X 2j +Y ¯z .(2.3)A NONLINEAR KOLMOGOROV EQUATION 583Let us recall some preliminary facts about real analysis on nilpotent Lie groups.More details about this topic can be found in [15]and [18].We define on R N +2the composition law θ⊕θ = θ1+θ 1,...,θN +θ N ,θN +1+θ N +1,θN +2+θ N +2+12(θ1θ N +1−θN +1θ 1) and the dilations groupδλ(θ)=(λθ1,...,λθN ,λ2θN +1,λ3θN +2),λ>0.We remark that G =(R N +2,⊕)is a nilpotent stratified Lie group which,in the case N =1,coincides with the standard Heisenberg group.Since the Jacobian Jδλequals λN +5,the homogeneous dimension of G w.r.t.(δλ)λ>0is the natural number Q =N +5.A norm which is homogeneous w.r.t.this dilations group is given byθ = |θ1|6+···+|θN |6+|θN +1|3+|θN +2|2 16.Let ∇¯z =(X 1,...,X N ,Y ¯z ,∂y )be the gradient naturally associated to L ¯z and consider any z ∈RN +2.The exponential map E ¯z (θ,z )=exp( θ,∇¯z )(z )is a global diffeomorphism and induces a Lie group structure on R N +2whose product is defined byζ◦z =E ¯z E −1¯z (ζ,0)⊕E −1¯z (z,0) ,0 ,and it can be explicitly computed asζ◦z =(x +ξ,y +η−tξ1,t +τ).Moreover,a control distance d ¯z in (RN +2,◦)is defined by d ¯z (z,ζ)= E −1¯z ζ−1◦z,0 = |x −ξ|6+|t −τ|3+ y −η+(t −τ) h (u (¯z ))+x 1−ξ1−2¯x 122 16,(2.4)where ζ−1is the inverse in the group law “◦”.We denote by Γ¯z (z,ζ)the fundamental solution of L ¯z with pole in ζand evaluated at z .We refer to [12,14,13,9]for known results about Γ¯z .The following bound holds:Γ¯z (z,ζ)=Γ¯z (ζ−1◦z,0)≤cd ¯z (z,ζ)−Q +2,(2.5)where the constant c continuously depends on ¯z .We are now in a position to prove Theorem 1.1.Proof of Theorem 1.1.By Theorem 2.1there exists a unique strong solution of (1.1)–(1.2)verifying (1.6)in S T for T suitably small.In order to prove that u is a classical solution,we represent it in terms of the fundamental solution Γ¯z :(uϕ)(z )=S TΓ¯z (z,ζ)(U 1,¯z (ζ)−U 2,¯z (ζ))dζ≡I 1(z )−I 2(z )(2.6)584ANDREA PASCUCCI AND SERGIO POLIDOROfor every ϕ∈C ∞0(S T ),whereU 1,¯z =ϕf (·,u )+uL ¯z ϕ+2 ∇x u,∇x ϕ ,U 2,¯z =(h (u )−h (u (¯z ))−(x 1−¯x 1))∂y uϕare bounded functions with compact support.Therefore it is straightforward to prove that uϕ∈C 1+αd ¯z ,α∈]0,1[,where C k +αd ¯z denotes the space of H¨o lder continuous functions w.r.t.the control distance d ¯z .In particular,by choosing ϕ≡1in a compact neighborhood K of ¯z ,we have that X j u (z )=X j (z )Γ¯z (·,ζ)(U 1,¯z (ζ)−U 2,¯z (ζ))dζ,z ∈K,j =1,...,N,and|X j u (z )−X j u (ζ)|≤cd ¯z (z,ζ)α∀z,ζ∈K,α∈]0,1[.(2.7)This proves the H¨o lder continuity of the first order derivatives of u .Let us now consider the second order derivatives X j X h u ,j,k,=1,...,N ,and Y u .We next prove the existence of the directional derivative Y u (¯z )by studying sep-arately the terms I 1,I 2.Since Y is the unique nonlinear vector field to be considered,the proof of our result for the derivatives X j X h u is simpler and will be omitted.The term I 2.We set J (¯z )=S TY (¯z )Γ¯z (·,ζ)U 2,¯z (ζ)dζ.We remark that J is well defined and continuous since,by (1.6),we have|U 2,¯z (ζ)|≤c d ¯z (¯z ,ζ).(2.8)We denote by χ∈C ∞([0,+∞[,[0,1])a cut-offfunction such that χ(s )=0for 0≤s ≤12,χ(s )=1for s ≥1,and we setI 2,δ(z )= S T Γ¯z (z,ζ)χ d ¯z (¯z ,ζ)¯c δ12U 2,¯z (ζ)dζ,¯c ,δ>0.In what follows we shall assume d ¯z (¯z ,z )≤δ12;then by the triangular inequality d ¯z (¯z ,ζ)≤c (d ¯z (¯z ,z )+d ¯z (z,ζ)),(2.9)we can choose ¯c suitably large so that χ d ¯z (¯z ,ζ)¯c δ12 =0if d ¯z (z,ζ)<δ12,and,as a consequence,I 2,δis smooth for any δ>0.We claim thatsup d ¯z (¯z ,z )≤δ12|I 2,δ(z )−I 2(z )|≤c δ32,(2.10)sup d ¯z (¯z ,z )≤δ12|Y ¯z I 2,δ(z )−J (¯z )|≤c δ12|log(δ)|(2.11)A NONLINEAR KOLMOGOROV EQUATION585 for some positive constant c.We postpone the proof of(2.10)–(2.11)to the end.Let us now compute the derivative∂I2∂ν¯z(¯z).For every positiveδwe haveI2(¯z+δν¯z)−I2(¯z)δ−J(¯z)≤I2,δ(¯z+δν¯z)−I2,δ(¯z)δ−J(¯z)+I2(¯z+δν¯z)−I2,δ(¯z+δν¯z)δ+I2(¯z)−I2,δ(¯z)δ.Wefirst note that,using the expression(2.4),wefind d¯z(¯z,¯z+δν¯z)=δ12.Thus,by (2.10)and by the mean value theorem,there exists aδ0∈]0,δ[such thatI2(¯z+δν¯z)−I2(¯z)δ−J(¯z)≤|(h(u(¯z))∂y I2,δ−∂t I2,δ)(¯z+δ0ν¯z)−J(¯z)|+cδ12 =|Y¯z I2,δ(¯z+δ0ν¯z)−J(¯z)|+cδ12≤cδ12|logδ|,where the last inequality follows from(2.11).Therefore we have∂I2∂νz(z)=J(z),and,by Lemma2.2,we get(2.1).We are left with the proof of(2.10)–(2.11).We assume d¯z(¯z,z)≤δ12.By(2.8) and(2.5),we have|I2,δ(z)−I2(z)|≤cd¯z(z,ζ)<δ12d¯z(z,ζ)−Q+2d¯z(¯z,ζ)dζ(since,by(2.9),d¯z(¯z,ζ)<cδ12,and by using the homogeneous polar coordinates)≤δ12<δ12−Q+2+Q−1d =cδ32.This proves(2.10).Next we recall the following estimate which immediately follows by the mean value theorem:|Y¯z(z)Γ¯z(·,ζ)−Y¯z(¯z)Γ¯z(·,ζ)|≤cd¯z(¯z,z)d¯z(¯z,ζ)−Q−1(2.12)for d¯z(¯z,ζ)≥¯c d¯z(¯z,z).Then we have|Y¯z I2,δ(z)−J(¯z)|≤|Y¯z(z)Γ¯z(·,ζ)−Y¯z(¯z)Γ¯z(·,ζ)|χd¯z(¯z,ζ)¯cδ12|U2,¯z(ζ)|dζ+d¯z(¯z,ζ)<δ12|Y¯z(¯z)Γ¯z(·,ζ)U2,¯z(ζ)|dζ(by(2.12)and since the second term can be estimated as before)≤cδ12d¯z(¯z,ζ)>¯cδ12d¯z(¯z,ζ)−Q−1|U2,¯z(ζ)|dζ+cδ12=cδ12|log(δ)|.This concludes the proof of(2.11).586ANDREA PASCUCCI AND SERGIO POLIDOROThe term I 1.Let G (z,ζ)=g (ζ−1◦z ),where g is a smooth function.A direct computation givesY ¯z (z )G (·,ζ)=R ¯z (ζ)G (z,·),(2.13)whereR ¯z (ζ)=−Y ¯z (ζ)−(x 1−ξ1)∂η(see [16,p.295]for a general statement of this result).We aim to prove that Y (¯z )I 1= ΩY (¯z )Γ¯z (·,ζ)(U 1,¯z (ζ)−U 1,¯z (¯z ))dζ+U 1,¯z (¯z ) ∂ΩΓ¯z (¯z ,ζ) R ¯z (ζ),ν(ζ) dσ,(2.14)where νis the outer normal to the set Ω=supp(ϕ),for which we assume that the divergence theorem holds.By (2.5),the homogeneity of the fundamental solution,and the H¨o lder continuity of U 1,¯z ,the functionV (z )= ΩY (z )Γ¯z (·,ζ)(U 1,¯z (ζ)−U 1,z (z ))dζ+U 1,¯z (z )∂ΩΓ¯z (z,ζ) R ¯z (ζ),ν(ζ) dσ(ζ)(2.15)is well defined.Let K be a compact subset of Ω.We set,for δ>0,I 1,δ(z )= ΩΓ¯z (z,ζ)χ d ¯z (z,ζ)δ U 1,¯z (ζ)dζ,where χis the cut-offfunction previously introduced.We choose δsuitably small so that χ d ¯z (z,ζ)δ=1(2.16)for any z ∈K,ζ∈∂Ω.Clearly I 1,δis a smooth function,and differentiating we get Y ¯z (z )I 1,δ= ΩY ¯z (z ) Γ¯z (·,ζ)χ d ¯z (·,ζ)δ (U 1,¯z (ζ)−U 1,¯z (z ))dζ+U 1,¯z (z ) ΩY ¯z (z ) Γ¯z (·,ζ)χ d ¯z (·,ζ)δ dζ.(2.17)By (2.13)and the divergence theorem,we have ΩY ¯z (z ) Γ¯z (·,ζ)χ d ¯z (·,ζ)δ dζ= ΩR ¯z (ζ) Γ¯z (z,·)χ d ¯z (z,·)δ dζ= ∂ΩΓ¯z (z,ζ)χ d ¯z (z,ζ)δ R ¯z (ζ),ν(ζ) dσ(ζ).(2.18)Then,by (2.18)and (2.16),the last terms in (2.17)and (2.15)are equal.Hence we get|V (z )−Y ¯z (z )I 1,δ|= δ¯z (z,ζ)≤δY ¯z (z ) Γ¯z (·,ζ) 1−χ d ¯z (·,ζ)δ (U 1,¯z (ζ)−U 1,¯z (z ))dζ≤C δ¯z (z,ζ)≤δ d ¯z (z,ζ)−Q +Γ¯z (z,ζ)d ¯z (z,ζ)−1δ d ¯z (z,ζ)αdζ≤Cδα.A NONLINEAR KOLMOGOROV EQUATION 587Since the constant C continuously depends on ¯z ,we have that Y ¯z (z )I 1,δconverges to V as δ→0uniformly on K .Since I 1,δconverges to I 1we get (2.14).This completes the proof of Theorem 1.1.3.A priori estimates.In this section we prove Theorem 2.1by using a modifi-cation of the classical Bernstein method.Here we adopt the notation of [10,Chap.3],which we briefly recall for the reader’s convenience.Given a bounded domain Ωin R N +2and α∈]0,1[,C α(Ω)denotes the space of H¨o lder continuous functions w.r.t.the parabolic distanced (z,z )≡|x −x |+|y −y |+|t −t |12,i.e.,the family of all functions u on Ωfor which|u |Ωα=|u |α=|u |0+sup Ω|u (z )−u (z )|d (z,z )α<∞,where |u |Ω0=|u |0=sup Ω|u |.The spaces of H¨o lder continuous functions C k +α,k ∈N ,are defined straightforwardly.We setB r ={(x,y )∈R N ×R ||(x,y )|<r },S r,T =B r ×]0,T [,T,r >0.(3.1)The “parabolic”boundary of the cylinder S r,T is defined by∂p S r,T =(B r ×{0})∪(∂B r ×[0,T ]).(3.2)Given two points z,z ∈S r,T in (3.1),we denote by d z the distance from z to the parabolic boundary ∂p S r,T (cf.(3.2)),and d zz =min {d z ,d z }.We set|u |S r,T α=|u |α=|u |0+sup S r,T d αzz |u (z )−u (z )|d (z,z )α.The space of all functions u with finite norm |u |Sr,T αis denoted by C α(S r,T ).The spaces C k +αof H¨o lder continuous functions of higher order are defined analogously.We say that u ∈C k +α,loc (S T )if u ∈C k +α(S r,T )for every r >0.We consider the Cauchy problemL εu =f (·,u )in S T ≡R N +1×]0,T [,(3.3)u (·,0)=g in R N +1,(3.4)where L ε,ε>0,is the regularized operator in (1.5).We assume that the functions f,g,h are globally Lipschitz continuous;then there exists a positive constant c 1such thatc 1≥max {Lipschitz constants of f,g,h },|h (v )|≤c 1 ,|g (x,y )|≤c 1 1+|(x,y )|2,(3.5)|f (x,y,t,v )|≤c 1 1+|(x,y,t,v )|2,(x,y,t,v )∈S T ×R .The following result holds.Theorem 3.1.There exist two positive constants T,c that depend only on the constant c 1in (3.5)such that for every ε>0and α∈]0,1[the Cauchy problem588ANDREA PASCUCCI AND SERGIO POLIDORO(3.3)–(3.4)has a unique solution u ε∈C 2+α,loc (S T )∩C S T verifying the following ε-uniform estimates:|u εx i |0,|u εy |0≤4c 1,i =1,...,N,(3.6)|u ε(x,y,t +s )−u ε(x,y,t )|≤c 1+|(x,y )|2|s |12,(3.7)|u ε(x,y,t )|≤2c 1 1+|(x,y,t )|2∀(x,y,t )∈S T .(3.8)Before proving Theorem 3.1,we introduce some further notation.If χ=χ(x,y )∈C ∞0 R N +1 is a cut-offfunction such that χ=1in B 12and supp(χ)⊂B 1,we set χn (x,y )=χ x n ,y n,f n =fχn ,g n (·,t )=gχn ,h n (·,v )=h (v )χn ,n ∈N ,(3.9)so that,by (3.5)and readjusting the constant c 1if necessary,we have|∇χn |0≤|∇χ|0n ,|∇g n |≤c 1,|∇x,y f n (x,y,t,v )|≤|χn ∇x,y f |+c 1|∇χ|0n ≤c 1if |v |n is bounded and t ∈[0,T ].Finally,fixing n ∈N and ε>0,we consider the linearized Cauchy–Dirichlet problemL ε,n v u ≡∆x u +ε2u yy +h n (·,v )∂y u −∂t u =f n (·,v )in S n,T ,(3.10)u =g nin ∂p S n,T .(3.11)Given α∈]0,1[,we assume that the coefficient v in (3.10)–(3.11)belongs to the space C 1+α(S n,T )and satisfies the estimates |v (x,y,t )|≤2c 1 1+|(x,y )|2in S n,T ,(3.12)|v x i |0≤4c 1,i =1,...,N,(3.13)|v y |0≤4c 1.(3.14)Then a classical solution u ∈C 2+α(S n,T )to (3.10)–(3.11)exists by known results (see,for example,[10,Chap.3,Thm.7],since h n (·,v ),f n (·,v )∈C 1+α(S n,T ),g n ∈C ∞ S n,T ,and the compatibility condition L ε,n v g n =f n =0holds on ∂B n .Once we have given the following ε-uniform a priori estimates,the proof of Theorem 3.1is rather standard.Lemma 3.2.Under the above assumptions,there exists T >0such that,for any n ∈N ,every classical solution of (3.10)–(3.11)verifies (3.12)–(3.14).Proof .Let u be a classical solution of (3.10)–(3.11).We prove estimate (3.12)for u by applying the maximum principle to the functions H ±u ,where H is defined as H (x,y,t )=(c 1+µt ) 1+|(x,y )|2and µis to be suitably fixed.Keeping in mind (3.5)and (3.12),it is easily verified thatL ε,n v H (x,y,t )≤(1+ε2)(c 1+µT ) 1+|(x,y )|2+((c 1+µT )c 1−µ) 1+|(x,y )|2≤−|f n (x,y,t,v (x,y,t ))|if µ,1T are suitably large.On the other hand,by (3.5),H |∂p S n,T ≥|g n |.Therefore,by the maximum principle,we infer that |u |≤H ≤2c 1 1+|(x,y )|2if T ≤c 1µ.Next we prove estimate (3.14)for the y -derivative of u .Our method is based on the maximum principle.We start by proving a gradient estimate for u on the parabolic boundary of S n,T .Since u ∈C 2+α(S n,T ),it is clear that ∇x,y u =∇x,y g n in B n ×{0}.In order to estimate ∇x,y u on ∂B n ×]0,T [,we employ the classical argument of the barrier functions on the cylinder Q ≡S n,T \S n 2,T .More precisely,given (x 0,y 0,t 0)∈∂B n ×]0,T [,we setw (x,y )=4c 1 (x −x 0,y −y 0),ν ,where νis the inner normal to Q at (x 0,y 0,t 0).Then we haveL ε,n v (w ±u )=±f n (·,v )=0in Q,since f n and h n vanish on Q .On the other hand,it is straightforward to verify that |u |≤w on ∂p Q .Therefore,by the maximum principle,we get |u |≤w and,in particular,|∇x,y u (x 0,y 0,t 0)|≤|∇x,y w (x 0,y 0)|≤4c 1.(3.15)Now we are in a position to prove estimate (3.14)for u .We differentiate equation (3.10)w.r.t.the variable y and then multiply it by e −2λt u y .Denoting ω= e −λt u y 2,we obtainL εv ω=e −2λt L εv u 2y +2λω=2 e −2λt |∇x u y |2+ε2u 2yy +u y ((f n )y +(f n )v v y ) +ω(λ−h (v )v y ) ≥2 e −2λt u y ((f n )y +(f n )v v y )+ω(λ−h (v )v y ) .(3.16)Hence,by setting w =ω−(4c 1)2,we get from (3.16)L εvw ≥2√ω −|(f n )y |−|v y (f n )v |+√ω(λ−|h v y |) (by (3.5),(3.14),and by the elementary inequality √ω≥√22(4c 1+sgn(w ) |w |))≥√2ω√2c 1 2√2 λ−4c 21 −4c 1−1 + λ−4c 21 sgn(w ) |w | (for λ=λ(c 1)suitably large)≥c ω|w |sgn(w )(3.17)for some positive constant c =c (c 1).By contradiction,we want to prove that w ≤0in S n,T .It will follow that|u y |≤c 1e λt ,which implies (3.16)if T =T (c 1)>0is sufficiently small.Let z 0be the maximum of w on Q T .If w (z 0)>0,then z 0∈S n,T \∂p S n,T ,since by (3.15)w ≤0on ∂p S n,T .This leads to a contradiction,since by (3.17)0≥L εvw (z 0)≥c ω(z 0)w (z 0)>0.This concludes the proof of (3.14).By a similar technique,we prove estimate (3.13)of the x -derivatives of u :|u x k |0≤4c 1,k =1,...,N.We setω= e −λt u x k 2,w =ω−(4c 1)2.Differentiating (3.10)w.r.t.x k and multiplying it by e −2λt u x k ,we getL εv w =e −2λt L εv u 2x k +2λω=2 e −2λt u x k ((f n )x k +v x k ((f n )v −u y h ))+λω(by (3.5),(3.13),and estimate (3.14)of u y previously proved)≥c ω|w |sgn(w ),if λ=λ(c 1)is suitably large,for some positive constant c which depends only on c 1.As before,we infer that w ≤0,which yields (3.13).We are in a position to prove Theorem 3.1.Proof of Theorem 3.1.In order to prove the existence of a unique classical solution to (3.3)–(3.4),we consider,for every ε>0and n ∈N ,the Cauchy–Dirichlet problem ∆x u +ε2u yy +h n (·,u )∂y u −∂t u =f n (·,u )in S n,T ,(3.18)u =g n in ∂p S n,T .(3.19)We split the proof into four steps:We first use Schauder’s fixed point theorem to solve the above problem.Then we let n go to infinity under the assumption that the coefficients are smooth.Next we prove estimates (3.6),(3.7),and (3.8).Finally we remove the smoothness assumption.First step.Assume that f,g,h are C ∞functions.We fix α∈]0,1[,n ∈N and denote by W the family of functions v ∈C 1+α(S n,T )such that|v |1+α≤M,(3.20)|v (x,y,t )|≤2c 1 1+|(x,y )|2in S n,T ,(3.21)|v x i |0≤4c 1,i =1,...,N,(3.22)|v y |0≤4c 1,(3.23)where the positive constants M,T will be suitably chosen later.Clearly,W is a closed,convex subset of C 1+α(S n,T ).We define a transformation u ≡Z v on W by choosing u as the unique classical solution of the linear Cauchy–Dirichlet problem (3.10)–(3.11).If we show that (i)Z (W )is precompact in C 1+α(S n,T );(ii)Z is a continuous operator;(iii)Z (W )⊆W ,then we are done.The proof of (i)and (ii)is quite standard and relies on the following two estimates of u (see,for example,[10,Chap.3,Thm.6and Chap.7,Thm.4]:|u |2+α≤c |g n |2+α+|f n (·,v )|α ≤¯c|g n |2+α+|v |α (3.24)for some constant ¯c >0dependent on ε,n,M,α;|u |1+δ≤ c |f n |0+|L εv g n |0+|g n |1+δ ,δ∈]0,1[,(3.25)for some positive constant c dependent on ε,n,δbut not on M .Besides,(iii)is exactly the content of Lemma 3.2.Therefore,by Schauder’s theorem,the operator Z has a fixed point u in W .Note that,by (3.6),a comparison principle in the space W does hold;therefore u is the unique classical solution of problem (3.18)–(3.19)verifying estimates (3.6)and (3.8).Moreover,by a standard bootstrap argument,u ∈C ∞(S n,T ).Second step.We fix ε>0and denote by u n the solution of the Cauchy–Dirichlet problem (3.18)–(3.19),whose existence has been proved in the previous step.We now want to obtain the solution of the Cauchy problem (3.3)–(3.4)letting n go to infinity.Fixing k ∈N ,we consider the sequence (u n χ4k )n ≥4k ,where χis the cut-offfunction introduced in (3.9).Then we have L εun (u n χ4k )=f 4k (·,u n )+2 ∇x u n ,∇x χ4k +ε2∂y u n ∂y χ4k +u n L εu n χ4k ≡F n,4k on S 4k,T ,(u n χ4k )|∂p S 4k,T =g 4k .By classical H¨o lder estimates,we deduce |u n |S 2k,T δ≤|u n χ4k |S 4k,T 1+δ≤c |F n,4k |S 4k,T 0+|L εu n g 4k |S 4k,T 0+|g 4k |S 4k,T 1+δ≤¯c for every n ≥4k and δ∈]0,1[,where ¯c =¯c (δ,ε,c 1,k )does not depend on n .Moreover,sinceL εu n (u n χ2k )=F n,2k on S 4k,T ,(u n χ2k )|∂p S 2k,T =g 2k ,we obtain|u n |S k,T 2+δ≤|u n χ2k |S 2k,T 2+δ≤c |F n,2k |S 2k,T δ+|g 2k |S 2k,T 2+δ ≤¯c ∀n ≥4k,where ¯c =¯c (δ,ε,c 1,k )does not depend on n .Then,by the Ascoli–Arzel`a theorem and Cantor’s diagonal argument,we can extract from u n a subsequence ||2+α-convergent on compacts of S T for every α∈]0,1[to the solution u εof (3.18)–(3.19)verifying estimates (3.6)and (3.8).The uniqueness of u εfollows again from standard results.Third step.We still assume f,g,h ∈C ∞∩Lip.We aim to prove estimate (3.7)for the solution u εfound in the previous step.We fix (¯x ,¯y )∈R n ×R and setw (x,y,t )=u ε(x,εy,t )¯χ(x,εy ),ε>0,(x,y,t )∈S T ,where ¯χ(x,y )=χ(x −¯x ,y −¯y )and χis the cut-offfunction in (3.9).We have (∆x +∂yy −∂t )w =Ψεon S T ,whereΨε(x,y,t )= ¯χ f (·,u ε)−h (u ε)u εy +u ε ∆x ¯χ+ε2¯χyy +2 ∇x u ε,∇x ¯χ +ε2u εy ¯χy (x,εy,t ),(x,y,t )∈S T .。
Optimal method and apparatus for neural modulation
专利名称:Optimal method and apparatus for neuralmodulation for the treatment ofneurological disease, particularly movementdisorders发明人:Daniel J. DiLorenzo申请号:US10008576申请日:20011111公开号:US20020177882A1公开日:20021128专利内容由知识产权出版社提供专利附图:摘要:A neurological control system for modulating activity of any component orstructure comprising the entirety or portion of the nervous system, or any structure interfaced thereto, generally referred to herein as a “nervous system component.” The neurological control system generates neural modulation signals delivered to a nervous system component through one or more intracranial (IC) stimulating electrodes in accordance with treatment parameters. Such treatment parameters may be derived from a neural response to previously delivered neural modulation signals sensed by one or more sensors, each configured to sense a particular characteristic indicative of a neurological or psychiatric condition. Neural modulation signals include any control signal which enhances or inhibits cell activity. Significantly the neurological control system considers neural response, in the form of the sensory feedback, as an indication of neurological disease state and/or responsiveness to therapy, in the determination of treatment parameters.申请人:DILORENZO DANIEL J.更多信息请下载全文后查看。
Toward optimal diffusion matrices £
Toward optimal diffusion matricesRobert Elsässer,Burkhard Monien,Stefan SchambergerUniversität PaderbornDepartment of Mathematics and Computer Science Fürstenallee11,D-33102Paderborn{elsa,bm,schaum}@uni-paderborn.deGünther RoteFreie Universität Berlin Institute of Computer Science Takustr.9,D-14195Berlin rote@inf.fu-berlin.deAbstractEfficient load balancing algorithms are the key to many efficient parallel applications.Until now,research in this area has mainly been focusing on homogeneous schemes. However,observations show that the convergence rate of diffusion algorithms can be improved using edge weighted graphs without deteriorating theflows quality.In this pa-per we consider common interconnection topologies and demonstrate,how optimal edge weights can be calculated for the First and Second Order Diffusion -ing theoretical analysis and practical experiments we show, what improvements can be archived on selected networks.Keywords.load balancing,diffusion,eigenvalues,FOS, SOS,hypercubic networks1.IntroductionLoad balancing is a very important prerequisite for an ef-ficient use of parallel computers.Many parallel applications produce dynamic work load and its amount per processor often changes dramatically during run time.Therefore,to reduce the overall computation time,the total work load of the network has to be distributed evenly among all nodes while the computation proceeds.Obviously,we can ensure an overall benefit of the computation only if the balancing scheme itself is highly efficient.One example showing the importance of an efficient load balancing scheme is parallelfinite element -ing meshes consisting out of several million elements repre-senting the discretized geometric space,these are split into parts and evenly distributed among all processors.Each processor starts computing independently on its part until the next global communication step is required.Depending This work was partly supported by the German Science Foundation (DFG)project SFB-376and by the IST Program of the EU under contract number IST-1999-14186(ALCOM-FT).on the application,the mesh refines and coarsens in someareas during the computation what causes an imbalance be-tween the processors’load and therefore delays the overallcomputation.Influid dynamics for example,simulation ofturbulences or shocks often depends on such refinements. In these situations,there is a need to balance the load.Theapplication is interrupted and the at this moment static loadbalancing problem is solved.For a selection of applications, case studies and references on the problem of parallelfiniteelement simulations the reader is referred to[11].One of the most common approaches for load balancingis the2-step model(e.g.[7]).Thefirst step calculatesa balancingflow.Thisflow is used in the second step,in which load elements are migrated accordingly.This paperfocuses on thefirst step and analyzes the questions,howmuch load has to be migrated and where to.More formally, given a network with nodes,each containing work load,we calculate a load balancingflow over the edges of the network such that after termination of the second step,eachnode has the balanced work load of.We further assume that no load is generated or consumed during the balancing process and the structure of the network isfixed,meaning we consider a static load balancing scenario.If the global imbalance vector is known,it is possible to solve the problem by solving a linear system of equations[14].But assuming that processors of the parallel network may only access information of their direct neigh-bors,load information has to be exchanged locally in iter-ations until a balancingflow is computed.Two sub classes of local iterative load balancing algorithms are the diffu-sion schemes[3,5]and the dimension exchange schemes [5,19].These two classes reflect different communication abilities of the network.Diffusion algorithms assume that a processor can send and receive messages to/from all of its neighbors simultaneously,while the dimension exchange approach is more restrictive and only allows a processor communicate with one of its neighbors during each itera-tion.The alternating direction iterative scheme[10]rep-resents a mixture of the diffusion and dimension exchangemethods.It reduces the number of iteration steps needed for networks constructed by Cartesian products of graphs.The drawback of this scheme is that the resultingflow may have load migration loops tending to infinity.In[5],Cybenco defined the general diffusion scheme. If we denote the load after the th iteration step on node of the graph with,then satisfies the equationMost of the results in this area concentrate on homoge-neous schemes with the entries being the same for any.Furthermore,there is plenty of work [5,7,9,13,16,17]focusing on the relation between conver-gence rates of diffusion algorithms and the condition num-ber of the unweighted Laplacian matrix defined in the next section.In[7]it is shown that all diffusion schemes calcu-late the sameflow and that thisflow is minimal considering the-norm.In the same paper is also shown that the known diffusion schemes can be generalized for weighted graphs. Sending a higher amount of load over heavier weighted edges,the calculatedflow is still minimal with respect to a weighted-norm.A formal definition of this is given in the next section.Inhomogeneous schemes can be described by edge weighted graphs.The goal is tofind edge weights such that the condition number of the resulting Laplacian matrix is maximized among all Laplacians having the same commu-nication structure,e.g.having the same zero entries.At this time very little is known about this problem.To our knowl-edge,[8]was thefirst and up to now the only paper address-ing this topic.There,semi definite programming is used and it is proved that a polynomial time approximation algorithm exists to compute the optimal values.Furthermore,some examples of graph classes with optimal weights are given. Using this approach,however,considerable results can only be obtained for graphs of small cardinality.The results of this paper are the following.First,we consider edge transitive graphs and show that for these graphs the maximal condition number of the correspond-ing weighted Laplacian matrix is achieved if all edges have the same weight.This result solves some open problems described in[8]with respect to optimal edge weights of Hy-percubes,Cycles and the Star.Second,we consider Cayley graphs and prove,that edges generated by the same genera-tor must be of equal weight in order to achieve the maximal condition number.Another general graph class are Carte-sian products of graphs.For this class,we compute edge weights that can be used to improve the known load bal-ancing diffusion algorithms on them.Additionally,we con-sider the Cube Connected Cycles and compute optimal val-ues for the weights of its edges,maximizing the condition number of the corresponding Laplacian.Moreover,we de-scribe how optimal edge weights can be obtained for other hypercubic networks like Cube Connected Paths,Butterfly, wrapped Butterfly and the de Bruijn.To confirm our the-oretical results,we perform several experiments using dif-ferent edge weight scenarios on the mentioned graph types and show the dependencies between edge weights and con-vergence rate.2.Background and DefinitionsLet be a connected,weighted undirected graph with nodes and edges.Let be the weight of edge,be the load of node and be the vector of load values.Vector denotes the vector of an average load.Let be the weighted adjacency matrix of. As is undirected,is symmetric.Column/row of contains where and are neighbors in.For some of our constructions we need the Laplacianof defined as,where con-tains the weighted degrees as diagonal entries,e.g.,and elsewhere.We consider the following local iterative balancing al-gorithm that requires communication with adjacent nodes only and performs the iteration(1)on every node.Here,represents the arbitrar-ily assigned edge direction,describes the amount of load sent via edge in step,is the load sent via edge added up until iteration and is the load of the node after the-th iteration.If a directed edge is pointing from to,then otherwise. Note,that and therefore for any pair puting theflow can be skipped in case of a-step model since there the load is immediately moved and no monitoring needs to be done. Throughout this paper however,we assume applying the-step model in which a balancingflow is calculatedfirst and load is moved in a second step accordingly as already men-tioned in section1.The scheme shown in equation(1)is known as the First Order Scheme(FOS)and converges to the average load[5].It can be written in matrix nota-tion as with. contains at position for every edge,at diagonal entry,and elsewhere.Now,let,be the eigenvalues of the Lapla-cian in non decreasing order.It is known thatwith eigenvector and with being the maximum weighted degree of all nodes [4].has the eigenvalues.Here,has to be chosen such that.Since is connected thefirst eigenvalue is simple to the eigenvector.The matrix is called diffusion matrix.We denote by the second largest eigenvalue of according to absolute values and call it the diffusion norm of.Several modifications to the First Order Scheme have been discussed in the past.One of them is the Second Order Scheme(SOS)[13]which has the formwith being afixed parameter,whereby fastest conver-gence is archived for.The Chebyshev method[7]differs from SOS only by the fact that depends on according toGeneralized,a polynomial based load balancing scheme is any scheme for which the work load in step can be expressed in the form where. Here,denotes the set of all polynomials of degree satisfying the constraint.The convergence of a polynomial based scheme depends on whether(and how fast)the error be-tween the load after iteration,and the corresponding average loadconverges to zero.In this work we consider a system to be-balanced after the th iteration step iff the error.Here,and represent the vectors resp.in-norm.In[13],the number of steps needed to-balance a system is analyzed and using the re-sults of this work we can state the following lemma.Lemma1Let be a graph and be its Laplacian.Let be the diffusion matrix withand set.Then FOS and SOS both takeresp.steps to-balance the system.Here,is the condition number of.In this lemma and are chosen such that the conver-gence rate of FOS and SOS is maximized.We can see that the SOS converges faster than FOS by almost a quadratic factor.The Chebyshev method can be regarded to perform asymptotically identical to SOS[7].Lemma1also shows the importance of the condition number of the Laplacian.Both schemes,FOS and SOS converge faster,if the condi-tion number is higher.As mentioned in section1,by using edge weighted graphs it is possible to increase the condition number of the Laplacian and therefore to reduce the number of steps needed to compute a balancingflow distributing the load in the network.We concentrate now on theflow obtained by the polyno-mial based diffusion algorithms.Let the optimalflow be represented by the minimalflow with respect to the weighted Euclidian norm,i.e.the solution to the problem min!over all balancingflowsHere,represent the weights assigned to the edges of the graph.Then we can state the following lemma [7].Lemma2FOS and the SOS compute an-minimal bal-ancingflow.In[7],lemma2is shown for polynomial based load balanc-ing schemes in a general form.3.New ResultsIn this section we deal with general graph classes like edge-transitive graphs,Cayley graphs and Cartesian prod-ucts of graphs as well as with interconnection topologies like Grid(G),Torus(T),Cube Connected Cycles(CCC), Butterfly(BF)and de Bruijn(DB)networks.These inter-connection topologies are designed to have many favorable properties for distributed computing,e.g.small vertex de-gree and diameter and large connectivity.In the present work,we focus our attention on computing optimal edge weights for these graphs in order to maximize the condition number of the corresponding Laplacian.First,let us con-centrate on some simple graphs like Cycles,Hypercubes, complete graphs or the Star.All these graphs are edge tran-sitive.In other words,for any pair of edges and there is an automorphism such thatand.An automorphism of a graph is a one-to-one mapping of nodes onto nodes such that edges are mapped onto edges.To show that for all edge transitive graphs the maximal condition number is achieved if all edges have the same weight,the following lemma is useful.Lemma3Let be Laplacian matrices of weighted graphs,all with the same adjacency structure,and.Thenand.Proof1We know that is an eigenvector of, to the eigenvalue and for all Graphs we denote the number of vertices with and the number of edges with.Furthermore,we denote bythe weights of the edges of.We assume w.l.o.g.that.Let be the eigenvector of to the eigenvalue.Then,using the Rayleigh coefficient we obtainwhere and are vectors of size.The second statement of the lemma can be obtained in a similar manner.Now,let be the Laplacian of a weighted edge symmetric graph.Applying the lemma to the family of all matrices that can be obtained from by permuting rows and columns according to some automorphism of,we obtain a Lapla-cian where all edge weights are equal.Due to lemma 3,the condition number of will not be smaller than the condition number of and we can state the following the-orem.Theorem1Let be an unweighted,edge transitive graph. Among all weighted graphs with’s adjacency structure, the condition number of the Laplacian is maximal for the one that has all edge weights set to.In the present paper we consider several graphs that can be viewed as Cayley graphs.The definition of a Cayley graph is given below.Definition1Let be any abstractfinite group,with iden-tity and let be a set of generators for,with the prop-erties and.The Cayley graph is a simple graph with vertex set and edge set.An edge{h,k}is generated by a generator,iff or.We now show that edges of the same generator of must have the same weights in order to achieve a minimal amount of iteration steps in diffusion algorithms.Theorem2Let be a Cayley graph and let be the set of its generators.The condition number of the Laplacian is maximized,if for any two edges and generated by the same generator the edge weights of and are equal.Proof2For each we may define a permutation of by the rule.This is an auto-morphism of[2].If there exists an edge between and generated by,then there also exists an edge between and generated by.Assume and let be the smallest integer with the property.Then generates cycles of length where each vertex has an inci-dent edge generated by and an other incident edge gen-erated by.Therefore,the number of edges generated by equals the number of vertices of.Next,we have to show that for different and the edge is mapped to different edges.Assume.Then and or and.In the first case,we have a contradiction to the assumption that ,while in the second case there is a contradiction to.Hence,we can use permutations to map each edge to every other edge and the theorem follows by lemma 3.If,using permutations causes each edge being mapped twice to every other edge in the graph and the theorem also follows by lemma3.As a consequence of this theorem,edges belonging to the same dimension of a Torus must have the same weight.On the other hand,a Torus can be viewed as a Cartesian product of Cycles.For a Cartesian product of two graphs and however,we can state the following theorem.Theorem3Let and be two unweighted graphs and let be their Cartesian product.W.l.o.g,assume.The diffusion schemes on can be improved by assigning the weight to the edges contained in and to the edges contained in.Proof3The second smallest eigenvalue of isand the largest eigenvalue of has the form.Let be the weight of the edges of.We have to maximize the function.The functionis increasing,while is de-creasing in.It follows that holds for a maxi-mized.Note,that if both graphs and defined in theorem3 are edge transitive graphs,assigning weight to the edges of and to the edges of maximizes the condition number of the Laplacian matrix.Since the eigenvalues of a Cycle of length are,we can state the following: Corollary1Let be the-dimensional Torus generatedfrom the Cartesian product of Cycles of length.The polynomial based diffusion algorithms have their fastest convergence rate,if the edge-weights of cycle, are set to.Other graphs with a similar structure are-dimensionalGrids.However,these are not Cayley graphs and it is known that edges of the same dimension do not neces-sarily need to have the same edge weight[8].However,considering them as Cartesian products of Paths of length,we can also improve the diffusionalgorithms on them.Similar to corollary1,the best results are achieved by setting the edge weight of a dimension to.Another example showing the power of this method isthe Cartesian product of a Path of length and a complete graph of ing theorem3and the result of[13],we see that steps are required to-balance the system using FOS.Assigning a weight ofto the edges of the Path,only steps are re-quired.Next,we consider the Cube Connected Cycles Network of dimension,which will be denoted by CCC().The CCC()contains cycles of length.We can represent each node by a pair where is the po-sition of the node within its Cycle and is a-bit binary string,where is the label of the node that corresponds to the cycle.Two nodes and are adjacent,iff ei-ther and mod,or and differs from in exactly the-th bit.Edges of thefirst type are called cycle edges,while edges of the second type are re-ferred to as hypercube edges.Our objective is to determine the edge weights for which the diffusion algorithms FOS and SOS will have the fastest convergence.We use the fact that the CCC()is a Cayley graph[1].It is known that the cycles in the CCC()are generated by one generator of the corresponding Cayley graph,while the hypercube edges are generated by some other generator.As a consequence of theorem2,the CCC()’s optimal value for the condition number is obtained,iff all cycle edges are of one weight and all hypercube edges of some other weight.We normal-ize the weight of the Cycle edges to while the weight of the Hypercube edges remains variable and is set to.To compute the optimal value of we need the following lem-mas.Lemma4Let and where with and all other entries of equal.Thenfor all and.The proof of this lemma immediately follows from the so called Separation Theorem[18].In the next lemma we com-pute the eigenvalues of a modified Laplacian of a Cycle, where one diagonal entry contains the value and all other diagonal entries are set to.Lemma5Let be the Laplacian of an unweighted Cycle of length and where and is defined as in lemma4.Then is maximized for.The proof of this lemma has to be omitted due to space lim-itations.Now,we are ready to formulate the following the-orem.Theorem4The optimal value of the condition number of the Laplacian of a weighted CCC()is achieved for.Proof4The Laplacian of the weighted CCC()is of the form whereand Here,represents the unweighted Cycle of length,its Laplacian,denotes a matrix where all entries equal and the operation“”is the Kronecker Product:For ,the matrix is the matrix obtained from by replacing every element by the block.The eigenvalues of are equal to the eigenvalues of the matrices and. Applying this transformation times,we obtain some ma-trices of the form where represents the the adjacency matrix of an unweighted cycle of length.is a diagonal matrix with all diagonal entries belonging to the set and all of-diagonal entries set to .Lemma4states that the second smallest eigenvalue of the Laplacian of the weighted CCC()is the smallest eigen-value of,where contains exactly one diagonal entry set to and all other diagonal entries equal. Furthermore,lemma4also implies that the largest eigen-value of this Laplacian is the largest eigenvalue ofwhere all diagonal entries of equal.Thus, calculated in lemma5equals the condition number of the and we obtain the theorem.Analyzing the improvement of the condition number of the Laplacian by setting the weight of the hypercube edgesto,it can be calculated that the quotient between the condition number of the weighted Laplacian and the condi-tion number of the unweighted Laplacian converges to.Therefore,we can save about of the time needed for FOS to balance the load on large topologies of this kind.Due to space limitations,the remaining part of this sec-tion contains only a brief overview of our proves and cog-nitions on other network topologies.The structure of the Cube Connected Path is similar to the one of the CCC.Its definition is identical to the CubeConnected Cycle,except that the edges between and are missing.Similar to the CCC,edges of the first type are called path edges,while edges of the secondtype are hypercube edges.In the following we denote the -dimensional Cube Connected Path of vertices by CCP().The CCP is not a Cayley graph and therefore it is quite difficult to determine optimal parameters for its edges. Anyway,a similar approach can be used to improve the convergence rate,assigning weight to the path edges and to the hypercube edges.Doing this,the calculations in lemma5and theorem4provide a value of for .As in the case of the CCC,this value improves the condi-tion number of the Laplacian compared to the unweighted case by a factor of approximately for large.Another common interconnection topology is the But-terfly graph,which has a similar structure to the CCP and is defined as follows.The nodes of the-dimensional Butter-fly BF()correspond to pairs where is the dimension of the node and is a-bit binary number that denotes the row of the node.Two nodes andare adjacent iff and either or and differ in precisely the th bit.In thefirst case the edges are called path edges,while in the second case the edges are cross edges.These graph is neither a Cayley graph,so that using similar approaches as applied for the CCC we can only derive improvements for the convergence rate of diffusion algorithms,but we can not determine the optimal values for the edge weights.However,using weight for the path edges and weight for the cross edges it turns out, that the condition number of the Laplacian is maximized for .Like in the case of CCC and CCP,we can define a But-terfly graph with wrap around edges.Two vertices and are adjacent,iff mod and eitheror and differ in precisely the th bit.Again,edges of thefirst kind are Cycle edges while edges of the second kind are cross edges.This type of graph is a Cayley graph[1], so optimal values for the edge weights can be determined. Similar to lemma5and theorem4,the optimal condition number occurs when the weight of the cycle edges equals and the weight of the cross edges are. However,there is no significant improvement of the condi-tion number for large.To obtain this value for,we have reduced the eigenvalues of the weighted Laplacian of the wrapped Butterfly to the eigenvalues of some matrices of the form.represent the adjacency matri-ces of some weighted Cycles of length with edge weights from the set while is a diagonal matrix with all diagonal entries set to and all others.Wefinish this section by considering the de Bruijn graph DB().The directed de Bruijn graph contains vertices labeled from to with-bit binary numbers,such that there is a directed edge from vertex towhenever for all. By replacing each directed edge by an undirected edge,we obtain the undirected de Bruijn which is regular graph of degree.Note,that this definition allows2loops at the vertices and and one double edge between the vertices and.To improve the diffusion algorithms on the de Bruijn network, we assign to the edges formed between and where,and to the edges betweenand where.By using tech-niques from[6]and lemma5together with theorem4,we get the same value for as we obtain for the wrapped But-terfly.The reason for this is that the eigenvalues of the de Bruijn graph are reduced to the eigenvalues of matrices also contained in the set of matrices used for calculating the eigenvalues of the wrapped Butterfly.4.ExperimentsTo show the effects of the approach introduced in chap-ter3,we have implemented a simulation program and run several work types included are Grid(G), Torus(T),Cube Connected Cycles(CCC),Cube Connected Paths(CCP),Butterfly(BF),wrapped Butterfly and de Bruijn(DB).The program was implemented in C++,using the ARPACK++library[15]for eigenvalue computations. While it is possible to determine eigenvalues of relatively small networks(C(8))from the Laplacian itself,we are not able to do this for larger networks(C(16)) in a reasonable amount of time.Therefore,we determine the second smallest and largest eigenvalues of these graphs by either using explicit formulas or by reducing their cal-culations to the computation of eigenvalues of only parts of the original graph.A detailed description of this approach applied to the CCC can be found in section3and we use similar techniques for other hypercubic networks.Prior to thefirst iteration of the simulation,the network’s load is either distributed randomly(RS)over the network or placed onto a single node(SS),while we normalize the balanced load.The total amount of load is there-fore equal to the total number of nodes in the graph.We apply the FOS and the SOS and keep iterating until an al-most evenly distributingflow is calculated.For our tests,we define this to be archived as soon as after the it-eration is less than 0.01.For both diffusionschemes,we have chosen the optimal value of,for SOS we used .The time spent on com-puting eigenvalues of large graphs is reduced by applying the approach described in section 3,and most of the com-putation time is consumed by the flow calculations.Figures 1through 6show some results of our experi-ments.For each selection of on the -axis the resulting convergence rate of FOS applied on the specific network type (left)and the number of iterations needed by SOS to compute a balancing flow (right)is shown.Note,that since the results are very similar for any combination of one of the schemes (FOS/SOS)and one of the load patterns (RS/SS),we have only included those for the SOS and SS.The results shown in figures 1through 6are also included in tables 1,2and 3,where a short overview on the simulation results with other network sizes is given.0.9660.9680.970.9720.9740.9760.9780.9800.51 1.52 2.53 3.54464850525456586062µ2i t e r a t i o n s aµ2iterationsFigure 1.SOS SS CCC(8)0.9860.98650.9870.98750.9880.98850.9890.98950.990.990500.511.522.570727476788082848688µ2i t e r a t i o n saµ2iterationsFigure 2.SOS SS CCP(8)0.9320.9340.9360.9380.940.9420.9440.9460.9480.950.9520.511.522.533.544.5343536373839µ2i t e r a t i o n saµ2iterationsFigure 3.SOS SS wrapped BF(8)0.9320.9340.9360.9380.940.9420.9440.9460.9480.950.9520.511.522.533.544.5303132333435µ2i t e r a t i o n saµ2iterationsFigure 4.SOS SS DB(8)0.9810.9820.9830.9840.9850.9860.9870.9880.9890.990.9910510152025304850525456586062646668µ2i t e r a t i o n s aµ2iterationsFigure 5.SOS SSG()0.930.9350.940.9450.950.9550.960.9650.970510152025302426283032343638µ2i t e r a t i o n saµ2iterationsFigure 6.SOS SST()As we can see from figures 1to 6,the closer is tothe optimal value ,the smaller becomes the number ofiterations needed to compute a balancing flow on all net-work types.First,let us study the CCC.In case of the to -dimensional CCC we have an optimal greater than .We obtain the best improvements for the -dimensional CCC and the savings decrease when increasing the dimen-sion.Considering CCC of higher dimensions than ,we observe that the improvements increase again with larger dimension.As described in section 3,we can win usingFOS at mostfor the flow computation when tends to infinity.Similar savings can be archived for the CCP,butwe obtain anvalue smaller than for the -dimensional CCP and the improvements become higher with higher di-mensions.Note however,that for the CCC converges to for large in contrast to wrapped BF and DB,wherewill stay about the same.A special case is the BF withits optimal value.Here of course,no savings are possible at all,so we omit the corresponding graph.In the case of the wrapped BF and DB,the maximum savings are also modest,ranging from 3%to 14%and as pointed out in section 3,we cannot expect higher improvements for larger dimensions.Second Order Scheme (SOS),Single Source (SS)CCC()CCP()Iterations ()Savings Iterations ()Savings 31616 1.500%19190.880%42322 1.504%29280.773%52828 1.290%38380.690%63534 1.233%49480.632%84848 1.070%74720.543%1283830.870%1411340.439%161271260.751%2252110.376%wrapped BF()DB()Iterations ()Savings Iterations ()Savings 31110 2.239%109 2.2310%41614 2.3112%1412 2.3114%52019 2.355%1816 2.3511%62524 2.374%2221 2.375%83635 2.393%3230 2.396%126360 2.405%5452 2.404%169895 2.413%8481 2.414%Table 1.Number of iterations needed to calcu-late a balancing flow for unweighted graphs()and optimal weighted graphs ().is the optimal edge weight for one edge type (see Section 3)assuming the other edges have weight .The results for Grid and Torus given in Figures 5and6differ from the others in the way that large savings of iterations are possible,what is due to the large value of.As shown in Table 2and 3,savings up to 28%can be archived.Note,that by fixing one dimension and increasing the other dimension to infinity,the optimal value of will grow quadratically with the cardinality of the graph in the second dimension leading to improvements up to a factor of .We have restricted our experiments to -dimensional Grid and Torus,but similar results can also be obtained for higher dimensional graphs of the same type.This is an in-teresting result,since these networks of about the same size are widely available.The hpcline [12]operated by the。
DEMO Differential evolution for multiobjective optimization
DEMO:Differential Evolution for MultiobjectiveOptimizationTea Robiˇc and Bogdan FilipiˇcDepartment of Intelligent Systems,Joˇz ef Stefan Institute,Jamova39,SI-1000Ljubljana,Sloveniatea.robic@ijs.sibogdan.filipic@ijs.siAbstract.Differential Evolution(DE)is a simple but powerful evolu-tionary optimization algorithm with many successful applications.In thispaper we propose Differential Evolution for Multiobjective Optimization(DEMO)–a new approach to multiobjective optimization based on DE.DEMO combines the advantages of DE with the mechanisms of Pareto-based ranking and crowding distance sorting,used by state-of-the-artevolutionary algorithms for multiobjective optimization.DEMO is im-plemented in three variants that achieve competitive results onfive ZDTtest problems.1IntroductionMany real-world optimization problems involve optimization of several(conflict-ing)criteria.Since multiobjective optimization searches for an optimal vector, not just a single value,one solution often cannot be said to be better than an-other and there exists not only a single optimal solution,but a set of optimal solutions,called the Pareto front.Consequently,there are two goals in multi-objective optimization:(i)to discover solutions as close to the Pareto front as possible,and(ii)tofind solutions as diverse as possible in the obtained nondom-inated front.Satisfying these two goals is a challenging task for any algorithm for multiobjective optimization.In recent years,many algorithms for multiobjective optimization have been introduced.Most originate in thefield of Evolutionary Algorithms(EAs)–the so-called Multiobjective Optimization EAs(MOEAs).Among these,the NSGA-II by Deb et al.[1]and SPEA2by Zitzler et al.[2]are the most popular.MOEAs take the strong points of EAs and apply them to Multiobjective Optimization Problems(MOPs).A particular EA that has been used for multiobjective opti-mization is Differential Evolution(DE).DE is a simple yet powerful evolution-ary algorithm by Price and Storn[3]that has been successfully used in solving single-objective optimization problems[4].Hence,several researchers have tried to extend it to handle MOPs.Abbass[5,6]was thefirst to apply DE to MOPs in the so-called Pareto Differential Evolution(PDE)algorithm.This approach employs DE to createC.A.Coello Coello et al.(Eds.):EMO2005,LNCS3410,pp.520–533,2005.c Springer-Verlag Berlin Heidelberg2005DEMO:Differential Evolution for Multiobjective Optimization521 new individuals and then keeps only the nondominated ones as the basis for the next generation.PDE was compared to SPEA[7](the predecessor of SPEA2) on two test problems and found to outperform it.Madavan[8]achieved good results with the Pareto Differential Evolution Approach(PDEA1).Like PDE,PDEA applies DE to create new individuals. It then combines both populations and calculates the nondominated rank(with Pareto-based ranking assignment)and diversity rank(with the crowding distance metric)for each individual.Two variants of PDEA were investigated.Thefirst compares each child with its parent.The child replaced the parent if it had a higher nondominated rank or,if it had the same nondominated rank and a higher diversity rank.Otherwise the child is discarded.This variant was found inefficient –the diversity was good but the convergence slow.The other variant simply takes the best individuals according to the nondominated rank and diversity rank(like in NSGA-II).The latter variant has proved to be very efficient and was applied to several MOPs,where it produced favorable results.Xue[9]introduced Multiobjective Differential Evolution(MODE).This algo-rithm also uses the Pareto-based ranking assignment and the crowding distance metric,but in a different manner than PDEA.In MODE thefitness of an in-dividual isfirst calculated using Pareto-based ranking and then reduced with respect to the individual’s crowding distance value.This singlefitness value is then used to select the best individuals for the new population.MODE was tested onfive benchmark problems where it produced better results than SPEA.In this paper,we propose a new way of extending DE to be suitable for solving MOPs.We call it DEMO(Differential Evolution for Multiobjective Op-timization).Although similar to the existing algorithms(especially PDEA),our implementation differs from others and represents a novel approach to multiob-jective optimization.DEMO is implemented in three variants(DEMO/parent, DEMO/closest/dec and DEMO/closest/obj).Because of diverse recommenda-tions for the crossover probability,three different values for this parameter are investigated.From the simulation results onfive test problems wefind that DEMO efficiently achieves the two goals of multiobjective optimization,i.e.the convergence to the true Pareto front and uniform spread of individuals along the front.Moreover,DEMO achieves very good results on the test problem ZDT4 that poses many difficulties to state-of-the-art algorithms for multiobjective op-timization.The rest of the paper is organized as follows.In Section2we describe the DE scheme that was used as a base for DEMO.Thereafter,in Section3,we present DEMO in its three variants.Section4outlines the applied test prob-lems and performance measures,and states the results.Further comparison and discussion of the results are provided in Section5.The paper concludes with Section6.1This acronym was not used by Madavan.We introduce it to make clear distinction between his approach and other implementations of DE for multiobjective optimiza-tion.522T.Robiˇc and B.FilipiˇcDifferential Evolution1.Evaluate the initial population P of random individuals.2.While stopping criterion not met,do:2.1.For each individual P i (i =1,...,popSize )from P repeat:(a)Create candidate C from parent P i .(b)Evaluate the candidate.(c)If the candidate is better than the parent,the candidate replacesthe parent.Otherwise,the candidate is discarded.2.2.Randomly enumerate the individuals in P .Fig.1.Outline of DE’s main procedureCandidate creation Input:Parent P i1.Randomly select three individuals P i 1,P i 2,P i 3from P ,where i,i 1,i 2and i 3are pairwise different.2.Calculate candidate C as C =P i 1+F ·(P i 2−P i 3),where F is a scaling factor.3.Modify the candidate by binary crossover with the parent using crossover probability crossP rob .Output:Candidate C Fig.2.Outline of the candidate creation in scheme DE/rand/1/bin2Differential EvolutionDE is a simple evolutionary algorithm that creates new candidate solutions by combining the parent individual and several other individuals of the same pop-ulation.A candidate replaces the parent only if it has better fitness.This is a rather greedy selection scheme that often outperforms traditional EAs.The DE algorithm in pseudo-code is shown in Fig.1.Many variants of cre-ation of a candidate are possible.We use the DE scheme DE/rand/1/bin de-scribed in Fig.2(more details on this and other DE schemes can be found in [10]).Sometimes,the newly created candidate falls out of bounds of the vari-able space.In such cases,many approaches of constraint handling are possible.We address this problem by simply replacing the candidate value violating the boundary constraints with the closest boundary value.In this way,the candidate becomes feasible by making as few alterations to it as possible.Moreover,this approach does not require the construction of a new candidate.3Differential Evolution for Multiobjective OptimizationWhen applying DE to MOPs,we face many difficulties.Besides preserving a uniformly spread front of nondominated solutions,which is a challenging task for any MOEA,we have to deal with another question,that is,when to replace the parent with the candidate solution.In single-objective optimization,the decisionDEMO:Differential Evolution for Multiobjective Optimization523 Differential Evolution for Multiobjective Optimization1.Evaluate the initial population P of random individuals.2.While stopping criterion not met,do:2.1.For each individual P i(i=1,...,popSize)from P repeat:(a)Create candidate C from parent P i.(b)Evaluate the candidate.(c)If the candidate dominates the parent,the candidate replaces the parent.If the parent dominates the candidate,the candidate is discarded.Otherwise,the candidate is added in the population.2.2.If the population has more than popSize individuals,truncate it.2.3.Randomly enumerate the individuals in P.Fig.3.Outline of DEMO/parentis easy:the candidate replaces the parent only when the candidate is better than the parent.In MOPs,on the other hand,the decision is not so straightforward. We could use the concept of dominance(the candidate replaces the parent only if it dominates it),but this would make the greedy selection scheme of DE even greedier.Therefore,DEMO applies the following principle(see Fig.3). The candidate replaces the parent if it dominates it.If the parent dominates the candidate,the candidate is discarded.Otherwise(when the candidate and parent are nondominated with regard to each other),the candidate is added to the population.This step is repeated until popSize number of candidates are created.After that,we get a population of the size between popSize and 2·popSize.If the population has enlarged,we have to truncate it to prepare it for the next step of the algorithm.The truncation consists of sorting the individuals with nondominated sorting and then evaluating the individuals of the same front with the crowding dis-tance metric.The truncation procedure keeps in the population only the best popSize individuals(with regard to these two metrics).The described truncation is derived from NSGA-II and is also used in PDEA’s second variant.DEMO incorporates two crucial mechanisms.The immediate replacement of the parent individual with the candidate that dominates it,is the core of DEMO. The newly created candidates that enter the population(either by replacement or by addition)instantly take part in the creation of the following candidates. This emphasizes elitism within reproduction,which helps achieving thefirst goal of multiobjective optimization–convergence to the true Pareto front.The second mechanism is the use of nondominated sorting and crowding distance metric in truncation of the extended population.Besides preserving elitism, this mechanism stimulates the uniform spread of solutions.This is needed to achieve the second goal–finding as diverse nondominated solutions as possible. DEMO’s selection scheme thus efficiently pursues both goals of multiobjective optimization.The described DEMO’s procedure(outlined in Fig.3)is the most elemen-tary of the three variants presented in this paper.It is called DEMO/parent.524T.Robiˇc and B.FilipiˇcThe other two variants were inspired by the concept of Crowding DE,recently introduced by Thomsen in[11].Thomsen applied crowding-based DE in optimization of multimodal func-tions.When optimizing functions with many optima,we would sometimes like not only tofind one optimal point,but also discover and maintain multiple op-tima in a single algorithm run.For this purpose,Crowding DE can be used. Crowding DE is basically conventional DE with one important diffu-ally,the candidate is compared to its parent.In Crowding DE,the candidate is compared to the most similar individual in the population.The applied similarity measure is the Euclidean distance between two solutions.Crowding DE was tested on numerous benchmark problems and its performance was impressive.Because the goal of maintaining multiple optima is similar to the second goal of multiobjective optimization(maintaining diverse solutions in the front),we implemented the idea of Crowding DE in two addi-tional variants of the DEMO algorithm.The second,DEMO/closest/ dec,works in the same way as DEMO/parent,with the exception that the can-didate solution is compared to the most similar individual in decision space.If it dominates it,the candidate replaces this individual,otherwise it is treated in the same way as in DEMO/parent.The applied similarity measure is the Eu-clidean distance between the two solutions in decision space.In the third variant, DEMO/closest/obj,the candidate is compared to the most similar individual in objective space.DEMO/closest/dec and DEMO/closest/obj need more time for one step of the procedure than DEMO/parent.This is because at every step they have to search for the most similar individual in the decision and objective space,respec-tively.Although the additional computational complexity is notable when oper-ating on high-dimensional spaces,it is negligible in real-world problems where the solution evaluation is the most time consuming task.4Evaluation and Results4.1Test ProblemsWe analyze the performance of the three variants of DEMO onfive ZDT test problems(introduced in[12])that were frequently used as benchmark problems in the literature[1,8,9].These problems are described in detail in Tables1and2.4.2Performance MeasuresDifferent performance measures for evaluating the efficiency of MOEAs have been suggested in the literature.Because we wanted to compare DEMO to other MOEAs(especially the ones that use DE)on their published results,we use three metrics that have been used in these studies.For all three metrics,we need to know the true Pareto front for a problem. Since we are dealing with artificial test problems,the true Pareto front is notDEMO:Differential Evolution for Multiobjective Optimization525 Table1.Description of the test problems ZDT1,ZDT2and ZDT3ZDT1Decision space x∈[0,1]30Objective functions f1(x)=x1f2(x)=g(x)1−x1/g(x)g(x)=1+9n−1 ni=2x iOptimal solutions0≤x∗1≤1and x∗i=0for i=2,...,30 Characteristics convex Pareto frontZDT2Decision space x∈[0,1]30Objective functions f1(x)=x1f2(x)=g(x)1−(x1/g(x))2g(x)=1+9n−1 ni=2x iOptimal solutions0≤x∗1≤1and x∗i=0for i=2,...,30 Characteristics nonconvex Pareto frontZDT3Decision space x∈[0,1]30Objective functions f1(x)=x1f2(x)=g(x)1−x1/g(x)−x1g(x)sin(10πx1)g(x)=1+9n−1 ni=2x iOptimal solutions0≤x∗1≤1and x∗i=0for i=2,...,30Characteristics discontinuous Pareto fronthard to obtain.In our experiments we use500uniformly spaced Pareto-optimal solutions as the approximation of the true Pareto front2.Thefirst metric is Convergence metricΥ.It measures the distance between the obtained nondominated front Q and the set P∗of Pareto-optimal solutions:Υ= |Q|i=1d i |Q|,where d i is the Euclidean distance(in the objective space)between the solution i∈Q and the nearest member of P∗.Instead of the convergence metric,some researchers use a very similar metric, called Generational Distance GD.This metric measures the distance between2These solutions are uniformly spread in the decision space.They were made available online by Simon Huband at .au/research/wfg/ datafiles.html.526T.Robiˇc and B.FilipiˇcTable 2.Description of the test problems ZDT4and ZDT6ZDT4Decision space x ∈[0,1]×[−5,5]9Objective functions f 1(x )=x 1f 2(x )=g (x )1−(x i /g (x ))2g (x )=1+10(n −1)+n i =2x 2i −10cos(4πx i )Optimal solutions 0≤x ∗1≤1and x ∗i=0for i =2,...,10Characteristicsmany local Pareto frontsZDT6Decision space x ∈[0,1]10Objective functions f 1(x )=1−exp −4x 1sin(6πx 1)6f 2(x )=g (x )1−(f 1(x )/g (x ))2g (x )=1+9n −1ni =2x iOptimal solutions 0≤x ∗1≤1and x ∗i =0for i =2,...,10Characteristicslow density of solutions near Pareto frontthe obtained nondominated front Q and the set P ∗of Pareto-optimal solutions as GD =|Q |i =1d 2i|Q |,where d i is again the Euclidean distance (in the objective space)between the solution i ∈Q and the nearest member of P ∗.The third metric is Diversity metric ∆.This metric measures the extent of spread achieved among the nondominated solutions:∆=d f +d l + |Q |−1i =1|d i −d |d f +d l +(|Q |−1)d ,where d i is the Euclidean distance (in the objective space)between consecutive solutions in the obtained nondominated front Q ,and d is the average of these distances.The parameters d f and d l represent the Euclidean distances between the extreme solutions of the Pareto front P ∗and the boundary solutions of the obtained front Q .4.3ExperimentsIn addition to investigating the performance of the three DEMO variants,we were also interested in observing the effect of the crossover probability (crossProb in Fig.2)on the efficiency of DEMO.Therefore,we made the following exper-iments:for every DEMO variant,we run the respective DEMO algorithm with crossover probabilities set to 30%,60%and 90%.We repeated all tests 10timesDEMO:Differential Evolution for Multiobjective Optimization527 Table3.Statistics of the results on test problems ZDT1,ZDT2and ZDT3ZDT1Algorithm Convergence metric Diversity metricNSGA-II(real-coded)0.033482±0.0047500.390307±0.001876NSGA-II(binary-coded)0.000894±0.0000000.463292±0.041622SPEA0.001799±0.0000010.784525±0.004440PAES0.082085±0.0086791.229794±0.004839PDEA N/A0.298567±0.000742MODE0.005800±0.000000N/ADEMO/parent0.001083±0.0001130.325237±0.030249DEMO/closest/dec0.001113±0.0001340.319230±0.031350DEMO/closest/obj0.001132±0.0001360.306770±0.025465ZDT2Algorithm Convergence metric Diversity metricNSGA-II(real-coded)0.072391±0.0316890.430776±0.004721NSGA-II(binary-coded)0.000824±0.0000000.435112±0.024607SPEA0.001339±0.0000000.755148±0.004521PAES0.126276±0.0368771.165942±0.007682PDEA N/A0.317958±0.001389MODE0.005500±0.000000N/ADEMO/parent0.000755±0.0000450.329151±0.032408DEMO/closest/dec0.000820±0.0000420.335178±0.016985DEMO/closest/obj0.000780±0.0000350.326821±0.021083ZDT3Algorithm Convergence metric Diversity metricNSGA-II(real-coded)0.114500±0.0079400.738540±0.019706NSGA-II(binary-coded)0.043411±0.0000420.575606±0.005078SPEA0.047517±0.0000470.672938±0.003587PAES0.023872±0.0000100.789920±0.001653PDEA N/A0.623812±0.000225MODE0.021560±0.000000N/ADEMO/parent0.001178±0.0000590.309436±0.018603DEMO/closest/dec0.001197±0.0000910.324934±0.029648DEMO/closest/obj0.001236±0.0000910.328873±0.019142with different initial populations.The scaling factor F was set to0.5and was not tuned.To match the settings of the algorithms used for comparison,the population size was set100and the algorithm was run for250generations.4.4ResultsTables3and4present the mean and variance of the values of the convergence and diversity metric,averaged over10runs.We provide the results for all three DEMO variants.Results of other algorithms are taken from the literature(see528T.Robiˇc and B.FilipiˇcTable4.Statistics of the results on test problems ZDT4and ZDT6ZDT4Algorithm Convergence metric Diversity metricNSGA-II(real-coded)0.513053±0.1184600.702612±0.064648NSGA-II(binary-coded)3.227636±7.3076300.479475±0.009841SPEA7.340299±6.5725160.798463±0.014616PAES0.854816±0.5272380.870458±0.101399PDEA N/A0.840852±0.035741MODE0.638950±0.500200N/ADEMO/parent0.001037±0.0001340.359905±0.037672DEMO/closest/dec0.001016±0.0000910.359600±0.026977DEMO/closest/obj0.041012±0.0639200.407225±0.094851ZDT6Algorithm Convergence metric Diversity metricNSGA-II(real-coded)0.296564±0.0131350.668025±0.009923NSGA-II(binary-coded)7.806798±0.0016670.644477±0.035042SPEA0.221138±0.0004490.849389±0.002713PAES0.085469±0.0066641.153052±0.003916PDEA N/A0.473074±0.021721MODE0.026230±0.000861N/ADEMO/parent0.000629±0.0000440.442308±0.039255DEMO/closest/dec0.000630±0.0000210.461174±0.035289DEMO/closest/obj0.000642±0.0000290.458641±0.031362[1]for the results and parameter settings of both versions of NSGA-II,SPEA and PAES,[8]for PDEA,and[9]for MODE3).Results for PDEA in[8]were evaluated with generational distance instead of the convergence metric.Because PDEA is the approach that is the most similar to DEMO,we present the additional comparison of their results in Tables5 and6.Once more,we present the mean and variance of the values of generational distance,averaged over10runs.Table5.Generational distance achieved by PDEA and DEMO on the problems ZDT1, ZDT2and ZDT3Generational distanceAlgorithm ZDT1ZDT2ZDT3PDEA0.000615±0.0000000.000652±0.0000000.000563±0.000000 DEMO/parent0.000230±0.0000480.000091±0.0000040.000156±0.000007 DEMO/closest/dec0.000242±0.0000280.000097±0.0000040.000162±0.000013 DEMO/closest/obj0.000243±0.0000500.000092±0.0000040.000169±0.0000173The results for MODE are the average of30instead of10runs.In[9]no diversity metric was calculated.DEMO:Differential Evolution for Multiobjective Optimization529f 1f 2ZDT1f 1f 2ZDT2f 1f 2ZDT3f 1f 2ZDT40 0.20.40.60.811.2f 1f 2ZDT6Fig.4.Nondominated solutions of the final population obtained by DEMO on five ZDT test problems (see Table 7for more details on these fronts).The presented fronts are the outcome of a single run of DEMO/parentIn Tables 3,4,5and 6,only the results of DEMO with crossover probability 30%are shown.The results obtained with crossover probabilities set to 60%and530T.Robiˇc and B.FilipiˇcTable6.Generational distance achieved by PDEA and DEMO on the problems ZDT4 and ZDT6Generational distanceAlgorithm ZDT4ZDT6PDEA0.618258±0.8268810.023886±0.003294DEMO/parent0.000202±0.0000530.000074±0.000004DEMO/closest/dec0.000179±0.0000480.000075±0.000002DEMO/closest/obj0.004262±0.0065450.000076±0.000003Table7.Metric values for the nondominated fronts shown in Fig.4Problem Convergence metric Diversity metricZDT10.0012200.297066ZDT20.0007720.281994ZDT30.0012200.274098ZDT40.0012940.318805ZDT60.0006480.38508890%for all three variants were always worse than the ones given in the tables and are not presented for the sake of clarity4.Figure4shows the nondominated fronts obtained by a single run of DE-MO/parent with crossover probability30%.Table7summarizes the values of the convergence and diversity metrics for the nondominated fronts from Fig.4.5DiscussionAs mentioned in the previous section,DEMO with crossover probability of30% achieved the best results.This is in contradiction with the recommendations by the authors of DE[10]and confirms Madavan’sfindings in[8].However,we have to take these results with caution.Crossover probability is highly related to the dimensionality of the decision space.In our study,only high-dimensional functions were used.When operating on low-dimensional decision spaces,higher values for crossover probabilities should be used to preserve the diversity in the population.The challenge for MOEAs in thefirst three test problems(ZDT1,ZDT2and ZDT3)lies in the high-dimensionality of these problems.Many MOEAs have achieved very good results on these problems in both goals of multiobjective optimization(convergence to the true Pareto front and uniform spread of solu-tions along the front).The results for the problems ZDT1and ZDT2(Tables3 and5)show that DEMO achieves good results,which are comparable to the4The interested reader mayfind all nondominated fronts obtained by the three ver-sions of DEMO with the three different crossover probabilities on the internet site http://dis.ijs.si/tea/demo.htm.DEMO:Differential Evolution for Multiobjective Optimization531 results of the algorithms NSGA-II and PDEA.On ZDT3,the three DEMO variants outperform all other algorithms used in comparison.On thefirst three test problems we cannot see a meaningful difference in performance of the three DEMO variants.ZDT4is a hard optimization problem with many(219)local Pareto fronts that tend to mislead the optimization algorithm.In Tables4and6we can see that all algorithms(with the exception of DEMO)have difficulties in converging to the true Pareto front.Here,we can see for thefirst time that there is a notable difference between the DEMO variants.The third variant,DEMO/closest/obj, performs poorly compared to thefirst two variants,although still better than other algorithms.While thefirst two variants of DEMO converge to the true Pareto optimal front in all of the10runs,the third variant remains blocked in a local Pareto front3times out of10.This might be caused by DEMO/closest/obj putting too much effort infinding well spaced solutions and thus falling behind in the goal of convergence to the true Pareto optimal front.In this problem, there is also a big difference in results produced by the DEMO variants with different crossover probabilities.When using crossover probability60%or90%, no variant of DEMO ever converged to the true Pareto optimal front.With the test problem ZDT6,there are two major difficulties.Thefirst one is thin density of solutions towards the Pareto front and the second one non-uniform spread of solutions along the front.On this problem,all three DEMO variants outperform all other algorithms(see Tables4and6).The results of all DEMO variants are also very similar and almost no difference is noted when using different crossover probabilities.Here,we note that the diversity metric value is worse than on all other problems.This is because of the non-uniform spread of solutions that causes difficulties although the convergence is good.6ConclusionDEMO is a new DE implementation dealing with multiple objectives.The big-gest difference between DEMO and other MOEAs that also use DE for repro-duction is that in DEMO the newly created good candidates immediately take part in the creation of the subsequent candidates.This enables fast convergence to the true Pareto front,while the use of nondominated sorting and crowding distance metric in truncation of the extended population promotes the uniform spread of solutions.In this paper,three variants of DEMO were introduced.The detailed analysis of the results brings us to the following conclusions.The three DEMO variants are as effective as the algorithms NSGA-II and PDEA on the problems ZDT1 and ZDT2.On the problems ZDT3,ZDT4and ZDT6,DEMO achieves better results than any other algorithm used for comparison.As for the DEMO variants, we have not found any variant to be significantly better than another.Crowding DE thus showed not to bring the expected advantage over standard DE.Because DEMO/closest/dec and DEMO/closest/obj are computationally more expensive532T.Robiˇc and B.Filipiˇcthan DEMO/parent and do not bring any important advantage,we recommend the variant DEMO/parent be used in future experimentation.In this study,we also investigated the influence of three different settings of the crossover probability.We found that DEMO in all three variants worked best when the crossover probability was low(30%).Thesefindings,of course, cannot be generalized because our test problems were high-dimensional(10-or 30-dimensional).In low-dimensional problems,higher values of crossover prob-ability should be used to preserve the diversity in the population.Seeing how crossover probability affects DEMO’s performance,we are now interested if the other parameter used in candidate creation(the scaling factor F)also influences DEMO’s performance.This investigation is left for further work.In the near future,we also plan to evaluate DEMO on additional test problems.AcknowledgmentThe work presented in the paper was supported by the Slovenian Ministry of Ed-ucation,Science and Sport(Research Programme P2-0209Artificial Intelligence and Intelligent Systems).The authors wish to thank the anonymous reviewers for their comments and Simon Huband for making the Pareto-optimal solutions for the ZDT problems available online.References1.Deb,K.,Pratap,A.,Agarwal,S.,Meyarivan,T.:A fast and elitist multiobjectivegenetic algorithm:NSGA–II.IEEE Transactions on Evolutionary Computation6 (2002)182–1972.Zitzler,E.,Laumanns,M.,Thiele,L.:SPEA2:Improving the strength pareto evo-lutionary algorithm.Technical Report103,Computer Engineering and Networks Laboratory(TIK),Swiss Federal Institute of Technology(ETH)Zurich,Glorias-trasse35,CH-8092Zurich,Switzerland(2001)3.Price,K.V.,Storn,R.:Differential evolution–a simple evolution strategy for fastoptimization.Dr.Dobb’s Journal22(1997)18–24mpinen,J.:(A bibliography of differential evolution algorithm)http://www2.lut.fi/˜jlampine/debiblio.htm5.Abbass,H.A.,Sarker,R.,Newton,C.:PDE:A pareto-frontier differential evolu-tion approach for multi-objective optimization problems.In:Proceedings of the Congress on Evolutionary Computation2001(CEC’2001).Volume2,Piscataway, New Jersey,IEEE Service Center(2001)971–9786.Abbass,H.A.:The self-adaptive pareto differential evolution algorithm.In:Congress on Evolutionary Computation(CEC’2002).Volume1,Piscataway,New Jersey,IEEE Service Center(2002)831–8367.Zitzler,E.,Thiele,L.:Multiobjective evolutionary algorithms:A comparativecase study and the strength pareto approach.IEEE Transactions on Evolutionary Computation3(1999)257–2718.Madavan,N.K.:Multiobjective optimization using a pareto differential evolutionapproach.In:Congress on Evolutionary Computation(CEC’2002).Volume2, Piscataway,New Jersey,IEEE Service Center(2002)1145–1150。
一种基于万有引力的提高协作推荐准确性的相似性度量方法(IJISA-V12-N2-5)
I.J. Intelligent Systems and Applications, 2020, 2, 44-53Published Online April 2020 in MECS (/)DOI: 10.5815/ijisa.2020.02.05A New Similarity Measure Based onGravitational Attraction for Improving the Accuracy of Collaborative RecommendationsVijay VermaComputer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, India-136119E-mail: vermavijay@nitkkr.ac.inRajesh Kumar AggarwalComputer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, India-136119E-mail: rka15969@Received: 04 April 2019; Accepted: 09 May 2019; Published: 08 April 2020 Abstract—Recommender Systems (RSs) work as apersonal agent for individuals who are not able to make decisions from the potentially overwhelming number of alternatives available on the World Wide Web (or simply Web). Neighborhood-based algorithms are traditional approaches for collaborative recommendations and are very popular due to their simplicity and efficiency. Neighborhood-based recommender systems use numerous kinds of similarity measures between users or items in order to achieve diverse goals for designing an RS such as accuracy, novelty, diversity etc. However, the existing similarity measures cannot manage well the data sparsity problems, which results in either very few co-rated items or absolutely no co-rated items. Furthermore, there are also situations where only the associations between users and items, such as buying/browsing behaviors, exist in form of unary ratings, a special case of ratings. In such situations, the existing similarity measures are either undefined or provide extreme values such as either 0 or 1. Thus, there is a compelling need to define a similarity measure that can deal with data sparsity problem and/or unary rating data. This article proposes a new similarity measure for neighborhood-based collaborative recommender systems based on Newton's law of universal gravitation. In order to achieve this, a new way of interpreting the relative mass as well as the relative distance has been taken into consideration by using the rating data from the user-item matrix. Finally, for evaluating the proposed approach against baseline approaches, several experiments have been conducted using standardized benchmark datasets such as MovieLens-100K and MovieLens-1M. Results obtained demonstrate that the proposed method provides better predictive accuracy in terms of RMSE and significantly improves the classification accuracy in terms of precision-recall.Index Terms—Recommender Systems, Collaborative Filtering, Similarity Measures,Newton's law of universal gravitation, E-commerce.I. I NTRODUCTION Recommender Systems (RSs) help individuals who are not able to make decisions from the potentially overwhelming number of alternatives available on the World Wide Web (or simply Web) [1,2]. A number of commercially successful websites including YouTube[3] and Netflix[4] have employed RSs to facilitate their customers to find interesting items e.g. videos, movies. RSs are the crucial technique for e-commerce websites for providing personalized services to their customers[5]. RS provide relevant suggestions, that may be interesting/useful to users, by exploiting the various sources of data related to users, items and their interactions[6]. Fig.1 shows a high-level abstract view of an RS in general.Fig.1. A high-level abstract view of an RS.Among various recommendation approaches, collaborative filtering-based approaches are the most popular techniques. The articles [6], [16], and [17] provide excellent surveys on collaborative filtering-based recommendations. According to [16], collaborative recommendation approaches can further be classified into two general classes as memory-based and model-based. Memory-based approaches [17-19], also referred to as neighborhood-based collaborative filtering algorithms, use the entire collection of user-item rating data stored in the system for the recommendations process. In contrast to neighborhood-based approaches, themodel-basedapproaches build a model by learning from rating data [20–23]. Particularly, neighborhood-based algorithms (for an example k-Nearest Neighbors) are the traditional algorithms for collaborative recommendation approaches [24]. These algorithms are very popular due to their simplicity, in terms of implementation, and efficiency in terms of performance. Neighborhood-based recommender systems use numerous definitions of similarities between either users or items. The similarity measure is the core component of the neighborhood-based collaborative recommendation as explained by Herlocker et al. in the articles [25], [26].Traditionally, the similarity between users or items is defined using statistical measures such as Pearson Correlation Coefficient (PCC) [27, 28], Constrained Pearson Correlation Coefficient (CPCC) [11], Mean Squared Difference (MSD) [11], Cosine similarity (COS) [16] and Adjusted Cosine Similarity (ACOS) [19]. Moreover, few researchers utilized heuristic information to define the similarity between users/items such as Proximity-Impact-Popularity (PIP-measure) [29], and Proximity-Significance-Singularity (PSS-measure) [30]. In recent years, different contextual information along with hybridization of two or more existing similarity measures, is also exploited for defining similarity such as Jaccard-Mean-Squared- Difference (JMSD) [31], Coverage-based JMSD (CJMSD) [32], singularity-based (SING) [33], and significance-based (s-metric) [34]. However, data sparsity and unary rating data (where only the association between users and items are known such as buying/browsing behaviors in contrast to the actual rating value) are the two main limitations associated with existing similarity measures. This warrants the need for a new similarity measure to overcome the existing issues with similarity measures. Therefore, this article proposes a new similarity measure between users based on Newton’s gravitational law of attraction. It describes a way for modeling two main parameters, i.e. the relative mass & distance, in the gravitational force by using the rating data available in User-Item Matrix. Here, the similarity between users is calculated based on an equation analogous to Newton’s law of universal gravitation. Finally, the proposed similarity measure is used to find similar users to an active user.Several experiments have been performed using the standardized benchmark datasets, MovieLens-1M and MovieLens-100k, for validating the effectiveness of the proposed similarity measure against various accuracy matrices such as RMSE, precision, and recall; which are considered as the most examined goals for designing an RS.The rest of the article is organized as follows: Section II highlights some of the related work for the similarity measures. Section III explains the proposed similarity measure in detail; Section IV discusses the experimental details and results; finally, we conclude the article in Section V.II.R ELATED W ORKSIn the past, there have been several efforts in designing similarity measures to be used with the neighborhood-based collaborative recommendation. In RS literature, there are numerous similarity measures have been proposed to calculate the similarity between users or items.Traditionally, Pearson Correlation Coefficient (PCC) [27, 28], Constrained Pearson Correlation Coefficient (CPCC) [11], Mean Squared Difference (MSD) [11], Cosine similarity (COS) [16] and Adjusted Cosine Similarity (ACOS) [19] are used to find out the similarity between either users or items. Table 1 summarizes these traditional similarity measures with their formulae and major drawbacks.Apart from the traditional similarity measures, there are some new similarity measures proposed in recent times. Ahn [29] proposed a new similarity measure, called PIP (Proximity, Impact, and Popularity), for solving the cold starting problem and Liu et al. [30] further improves this PIP measure.Bobadilla et al. proposed several similarity measures in order to remove the limitations of existing traditional similarity measures [31-34]. In [31], Bobadilla et al. combined the Jaccard index with MSD to define a new similarity metric, called JMSD. This JMSD metric had been further enhanced in [32] by adding an additional term corresponding to improve the coverage of an RS, therefore called as Coverage-based JMSD (CJMSD). Different contextual information related to items and users such as singularities, significances etc. have been utilized to propose new similarity measures in [33, 34]. Further, Patra et al. [36], has proposed a new similarity measure by utilizing Bhattacharyya Coefficient to handle the data sparsity problem. In addition, there are numerous articles present in the recommender systems literature, which combine two or more similarity measures in order to introduce a new hybrid similarity measure.In very recent time, Kumar et al. [37], proposed a similarity measure to describe the affinity between items (in the online space) based on Newton’s law of the gravitational attraction. In their work, Kumar et al. analyzed the features (or attributes) of the items along with rating data for modeling the similarity measure, therefore, their approach is an example of hybridization of content-based and collaborative filtering. Additionally, they have suggested that the two main features of the model (i.e. the relative mass & distance) can be interpreted in various different ways. The proposed work is inspired by their idea but in contrast, does not exploit contents (tags or attributes) of neither items nor users. Thus, the proposed method is an example of the pure collaborative filtering.A. MotivationThere exist numerous ways to define the similarity between users or items in the neighborhood-basedcollaborative recommendations, as explained in the previous section. These similarity measures facilitate the development of an RS with diverse goals such as accuracy, novelty, coverage, serendipity etc. Particularly, some of the model-based approaches are better than neighborhood-based approaches in terms of prediction accuracy, still, for better users’ experience, the accuracy alone is not sufficient. Another important factor for effective and satisfying users’ experience is the concept of serendipity [38], which may help the users to discover absolutely different items. Serendipity enhances the idea of novelty by adding a factor of surprise, however, serendipitous recommendations may not always result in guaranteed success.Table 1. The traditional similarity measures with their definitive formulaeNeighborhood-based approaches favor the serendipitous recommendations along with following advantages [39]:∙Simplicity: these methods are comparatively simple to implement and require just oneparameter, the number of neighbors, generally.∙Efficiency: these methods are very efficient in term of performance and memory because there isno costly training phase, therefore makes themsuitable for scalable applications.∙Justifiability: these methods are capable to provide concise explanations for the recommendations sothat users can better understand the system. Therefore, finding new similarity measures for neighborhood-based methods is an active thread of research among the researchers. Furthermore, the existing CFRS can be easily refurbished by only replacing the similarity measurement module. The proposed similarity measure overcomes the following limitations of the traditional as well as the state-of-art similarity measures: Data sparsity: In any RS, the rating data i.e. the user-item matrix is very sparse which results in very few or almost no co-rated items between users. Therefore, existing similarity measures fail to calculate the similarity. The proposed similarity measure works well with no commonly rated items as explained in the illustrative example 2.Implicit feedback datasets (Unary ratings): In some cases, the users’ preferences are collected from their actions such as buying/browsing behaviors in the form of unary ratings. The state-of-art similarity measures are either not defined or provide always extreme score (0 or 1) for such datasets while the proposed measure works well for the unary rating dataset as explained in the illustrative example 1.III.P ROPOSED S IMILARITY M EASUREIn 1686, Sir Isaac Newton proposed the universal gravitational law of attraction between two bodies basedon some empirical observations. The law states that the two bodies attract each other with a force (F), which is proportional to the product of their masses and inversely proportional to the square of the distance between them as shown in Fig. 2.Fig.2. Newton’s Law of Universal Gravitation .This section will formalize the proposed similarity between users based on the gravitational law of attraction and derive the two main parameters (i.e. the relative mass & distance) from the rating data as available in User-Item Matrix. Here, we do not exploit the content (tags or attributes) of neither the users nor the items for modeling the similarity measure. A. Relative MassThe relative mass of a user can be represented as follows: if a user provides more ratings that means that he/she is an active user of the system. Therefore, the relative mass of a user (u) may be modeled using the activeness of the user as shown in (1).total number of ratings given by user i.e. ||mass(u)u I MAX=(1)The constant MAX is used to normalize the relative mass value between 0 and 1. The constant MAX is defined as the maximum number of ratings given by a user in the system. Therefore, the most active user will have the relative mass equal to 1, who has rated the maximum number of items in the system. Cleary, the relative mass of any user will lies between 0 and 1 so for any two users A and B having masses m1 and m2 respectively then 101m ≤≤ and 201m ≤≤which implies that 120(*)1m m ≤≤.The relative mass value of 0 represents a new user (or a user who has not provided any rating) while the mass value 1 represents the most active user in the system. The user with the higher value of the mass will show greater affinity in comparison to a userwith the lower value of the mass. Another way to model the relative mass of a user may be defined as shown in (2).||()||||||u uui ui i I i I u u r r I mass u I I I ∈∈⎛⎫⎛⎫ ⎪== ⎪ ⎪⎝⎭ ⎪⎝⎭∑∑ (2)Notations used are already explained in the last row of Table 1. The first term of the expression represents the average rating given by a user u. The second term represents how much items have been rated by the user with respect to the total number of items in the system. Semantically, both terms in combination show the average rating given by the user u over all the items in the system if he/she would rate all the items of the system. B. Relative DistanceSimilar to the relative mass, the relative distance between users is also represented using the rating data, present in the UI-matrix. The basic hypothesis is as follows: if the two users have more number of commonly rated items then they possess the tendency of similar interests. Likewise, if the number of uncommonly rated items between two users increases then these users may have their inclination to different items, therefore, having the tendency to be more apart from each other. We postulate that the relative distance between two users can be defined as the ratio of the number of uncommonly rated items to the number of items rated by both users, as defined by (3).{()()}{()()}distance(,)()n u n u v n v n u v u v n u v -⋂+-⋂=⋃ (3)where,()n u : the number of items rated by the user u, i.e. ||u I ()n v : the number of items rated by the user v, i.e. ||v I ()n u v ⋂: the number of items that are commonly rated by both the users u and v, i.e. ||uv I()n u v ⋃: the number of items rated by either the user u or by the user v or by both the users.In (3), the term in the denominator, ()n u v ⋃, is used to normalize the distance value between 0 and 1. Fig. 3 shows a snippet of a UI-matrix, where the number of uncommonly rated items between the users i U and j U are shown using line segments. Further, (3) may be rewritten after further simplification as (4).()()()()distance(,)11(,)()()n u v n u v n u v u v jaccard u v n u v n u v ⎛⎫⋃-⋂⋂==-=- ⎪⋃⋃⎝⎭(4)Fig.3. A portion of a UI-matrix.Where (,)jaccard u v represents the Jaccard similarity index [40], between the users u and v. Therefore, we can conclude that the proposed distance effectively comes out to be equal to the Jaccard distance. We know that the Jaccard index(J) always lies between 0 and 1 i.e.01J ≤≤ which implies that 0(1)1J ≤-≤ , therefore the Jaccard distance will also lie between 0 and 1. B.1. Is the proposed distance is a metric or a measure? Formally, a distance measure(d) is called a metric, if it satisfies the following four conditions:(1) (,)0d i j ≥(2) (,)0d i j i j =⇔= (3) (,)(,)d i j d j i =(4) (,)(,)(,)d i k d i j d j k ≤+.As the proposed relative distance is equal to the Jaccard distance which satisfies all the above four conditions, the proof is given in the article [41]. Thus, results in making the proposed relative distance as a metric.C. The similarity measureFinally, the similarity between two users, u1 and u2 are calculated by using the formula analogous to Newton’s law of the gravitational attraction as specified by (5).1212212()*()(,)[distance(,)]sim mass u mass u F u u u u =(5)From (5), it is clear that a lower value of distance will provide greater similarity between users. As, (1)-(5) have been written for users, we can also rewrite analogous equations to calculate the similarity between items for the item-based neighborhood approaches.Illustrative Example1: Consider the UI matrix as shown in Fig. 3, the similarity between the users a U and b U can be calculated as follows:()7a n U =; ()5b n U =; ()3a b n U U ⋂=; ()9a b n U U ⋃=. Therefore,()7/a mass U MAX = ; ()5/b mass U MAX= ; ()distance(,)13/90.67a b U U =-=. The similarity betweenusers,(,)sim a b F U U ,isequalto()()20.70.5/0.670.779⨯=; assuming the value of theconstant MAX is to be 10 for this toy example. It shouldbe noted that the rating values in the UI matrix are on a scale of 1 to 5 i.e. scalar rating; if these ratings are unary ratings i.e. only the association of users with items are known then traditional similarity measures are either not defined or result in extreme values, however, the proposed similarity measure will result in the same value, 0.779, as in the case of scalar rating.Illustrative Example 2: Again consider the UI matrix as shown in Fig. 3, In order to calculate the similarity between the users b U and c U , who have no corated items between them, we proceed as follows: ()5b n U =; ()3c n U =; ()0b c n U U ⋂=; ()8b c n U U ⋃=. Therefore, ()5/b mass U MAX =; ()3/c mass U MAX =;()distance(,)10/81b c U U =-=.The similarity between users, (,)sim b c F U U , is equal to ()()20.50.3/1.00.15⨯=; assuming the value of the constant MAX is to be 10 for this toy example.Here again, none of the traditional similarity is defined as they operate only on the set of commonly rated items, which is an empty set here. In such cases, the proposed similarity measure returns the maximum value of the distance i.e. 1 and the value of the similarity is measuredbased on the relative masses of the users.IV.E XPERIMENTSIn order to validate the effectiveness of the proposed similarity measure, we have performed offline experiments. An offline experiment is typically the easiest because it doesn’t require the interaction with real users [42].A. The datasetWe have used the standard benchmark MovieLens datasets, which are publicly available from GroupLens[43]. Table 2 summarizes datasets briefly, the more detailed description and history of the MovieLens datasets can be found in [44].Fig. 4(a) represents the well-known long tail problem associated with recommender systems for the MovieLens-1M dataset. According to this problem, only a limited number of items are rated repeatedly by users, such items are called popular items, while, a very large number of items are rated barely. This problem results in the distribution of ratings to be extremely skewed. Fig. 4(b) shows the rating distribution with respect to the user ids. Here, it is clear that the user with the userId= 4169 has rated the maximum number of movies i.e. 2314 times; hence the constant MAX is equal to 2314 for the MovieLens-1M dataset.B. Experimental designIn order to evaluate the effectiveness of the proposed gravitational similarity measure, we have used the traditional user-based collaborative filtering (UBCF) with k-Nearest Neighbors(kNN) technique. However, the usual similarity measures between users are replaced with the proposed gravity-based similarity measure as shown in (5). All the experiments are performed using the system and software specified in Table 3. Furthermore, we have utilized the Apache Mahout framework[45] for all experiments. Fig. 5 summarizes the basic components of a simple user-based recommender in this framework[46], while Fig. 6 demonstrates a simple flow diagram of the basic steps involved in a UBCF for the Apache Mahout framework.C. Evaluation MetricsAccuracy is considered as the most crucial and the most examined goal for designing an RS. We have examined the following two types of accuracy measures in our experiments.∙Prediction accuracy∙Classification accuracyC.1. Prediction accuracyIn order to evaluate the capability of an RS to correctly predict the preference for a user-item pair, we have used the Root Mean Squared Error(RMSE) metric. For a given test set T of user-item pairs (u, i), RMSE is defined as follows:RMSE= (6)C.2. Classification accuracyGenerally, it is not necessary to predict the rating values for providing the recommendations. Actually, in some cases, it is sufficient to provide a list of top-N recommendations, which may or may not be ordered, without estimated ratings for the items in the list. Therefore, we can also apply basic information retrieval metrics for evaluating recommender. These metrics, such as precision and recall, can be adopted in the recommender system scenario without much difficulty. Precision: the proportion of top recommendations that are good recommendationsRecall: the proportion of good recommendations that are present in top recommendations.In a recommender system, the utility of these evaluation metrics relies exclusively upon the fact that how adequately we define: what are good recommendations.Table 2. Brief Description of MovieLens-1M & MovieLens-100K Datasets(a)(b)Fig.4. For MovieLens-1M dataset (a) The long tail problem (b)The distribution of ratings with respect to users Furthermore, in the case of unary rating datasets, only the classification-based accuracy metrics are available for evaluation purposes.D. ResultsBased on the general bibliography in the recommender system research, the Pearson Correlation Coefficient (PCC), among all the traditional similarity measures, provides the best result for predictive accuracy. Therefore, we have evaluated the proposed gravity-based similarity measure against the PCC measure. We have used 80% of rating data from all the users as training data (used for calculating similarity, nearest neighbors) and the remaining 20% ratings of each user are used for testing purpose. Training and test data are chosen randomly in order to avoid any bias from the data selection. Fig.7 compares the RMSE values for both the similarity measures by varying size of nearest neighbors (a) for the MovieLens-100K dataset and (b) for MovieLens-1M dataset.As shown in Fig.7(a) & (b), the RMSE values are calculated for MovieLens-1000K & MovieLens-1M datasets using the proposed method and standard (PCC) as ground truth. Here, based on the values obtained for RMSE, the proposed Gravity-based recommender system results in lowers the RMSE values against the traditional PCC based similarity measure that shows the better performance in terms of predictive accuracy.It is a well-known fact that the predictive accuracy measures, such as MAE, RMSE etc., alone are not enough to depict the real flavor of the accuracy of an RS. Therefore, other classification-based accuracy metrics such as precision and recall are also examined for the varying size of the Top-N recommendations while keeping the neighborhood size to be fixed. Here, we have chosen the neighborhood size equal to 200 for the MovieLens-1M dataset since after this neighborhood size there is no significant improvement in the RMSE values. Furthermore, these accuracy measures, precision, and recall have been simulated as follows:∙for each user, the framework determines the Top-N preferences.∙train the model without these Top-N preferences ∙ask the newly trained model for Top-N recommendation for that user and∙compare those excluded top-N preferences with respect to the predicted top-N recommendationsIn order to define the notion of good recommendations, the framework selects the set of good recommendations only from those items for which the user has expressed preferences above the certain threshold value. This threshold value can be specified either explicitly or implicitly chosen by the framework on a per-user basis.Fig.5. The interaction among basic components of general UBCF in mahout.Fig.6. The flow diagram of a general UBCF in Mahout.(a)(b)Fig.7. The RMSE values for (a) MovieLens-100K and(b) MovieLens-1M dataset.Table 3. The particulars of the system and software used in theexperimentsWe have experimented with the threshold value on a per-user basis, in order to avoid any biases which may arise due to individual differences of rating pattern. For a user, this threshold value is defined as user’s average rating(µ) plus one standard deviation(σ) as shown in Eq.(7).thresholdμσ=+ (7)Table 4 compares the precision and recall values for the different size of recommendation list (a) for MovieLens-1M dataset (b) for MovieLens-100K dataset. The proposed similarity measure is compared against the PCC and Jaccard similarity index. The precision and recall values are also measured for the Jaccard similarity index because it also considers only the structural associations between users & items rather than actual rating values. The entries in Table 4 can be interpreted as follows: the value Precision@20 = 0.042 means that from a list of Top-20 recommendations 4.2% are good recommendations; by considering the average over all the users in the system.Table 4. The value of precision and recall for the varying size of the recommended list. (a) for MovieLens-1M dataset (b) for MovieLens-100K datasetThe proposed measure significantly improves these metrics against the traditional measures and the results highlight the effectiveness of the proposed similarity measure. Based on the empirical findings, we believe that significant improvement is due to the following facts:∙The proposed measure considers all the items rated by a user rather than considering only the co-rateditems.∙It gives more weight to the number of uncommonly rated items while calculating thedistance between users (by taking the square of thedistance in the denominator part).V.C ONCLUSION AND F UTURE W ORKIn this paper, a new similarity measure for collaborative recommender systems, in particular for the user-based collaborative filtering (UBCF), is proposed.。
STL模型的分层轮廓数据优化算法_黄新华
收稿日期:20030331作者简介:黄新华(1978-),男(汉),浙江,硕士研究生黄新华文章编号:1003-8728(2004)05-0605-03STL 模型的分层轮廓数据优化算法黄新华,孙 琨,方 亮,岑启宏(西安交通大学材料科学与工程学院,西安 710049)摘 要:在快速成形系统中,由于STL 模型在切片后的截面轮廓数据含有大量的数据冗余点,严重影响了插补加工的精度和效率。
为此,作者提出了一种数据冗余点的联合剔除算法,在保证加工精度的条件下,以进一步提高加工效率与质量。
关 键 词:快速成形;STL 模型;数据分层;数据优化中图分类号:TG24 文献标识码:AOptimal Algorithm of Laminated Contour for STL (Stereolithography )ModelHUANG Xin -hua ,SUN Kun ,FANG Liang ,CE N Qi -hong(School of Materials Science and Engineering ,Xi ′an Jiaotong University ,Xi ′an 710049)A bstract :There will be a lot of redundant contour points after laminating for STL model in rapid pr ototyping (RP )manufacturing ,which decreases the dimension accuracy and machining efficienc y .A combined algorithm for eliminating redundant points was presented to solve this pr oblem .This algorithm was used in a laminating software ,and applications sho w that machining efficiency and quality are improved for the same machining ac -curacy .Key words :RP manufacturing ;STL model ;Laminating 快速成形近年来得到了非常迅猛的发展,它是集C AD /CAM 、材料、计算机和数控技术为一体的一门新技术。
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
+ Corresponding author: Phn: 86-10-68409911 ext 5516, E-mail: jinzy@
Received 2002-04-04; Accepted 2002-09-02 Jin ZY, Wang DX. An optimal method of diffusion algorithm for hetergeneous system. Journal of Software, 2003,14(5):904~910. /1000-9825/14/904.htm Abstract: Many organizations nowadays operate local area networks connecting hundreds of workstations and
l =
∑ wi ∑ li si ∑ si
i =1 i =1 P
P
P
=
i =1 P
∑ si
i =1
.
(1)
负载平衡问题可以归结为通过调整计算时间 wi,使该节点的时间为 l ,则系统达到负载平衡,节点计算时间变化 向量是
b = (l − l1 , l − l 2 , l − l3 ,..., l − l P )T ,
衡,则计算负载不再移动. 设节点 i,j 相邻 ,一个时间步内通过 eij 到达 ni 的流量正比于节点 i,j 的水位差.设在第 t 次迭代时,各节点的 水位(运算时间)是 li( t ) ,扩散算法的公式是
li(t +1) = li(t ) +
1 si
∑τ ij (l (jt ) − li(t ) ) ,
(1) 和式 (2) 得出 , 因此 , 动态负载平衡问题归结于求解代数方程式 (3). 它有 |E| 个未知变量 , 有 P=|V| 个方程 , 一般
地,|E|>P,所以式(3)的解不惟一,存在多种方法来求解该方程,各种方法的结果也可以各不相同.
2
异构系统的动态负载平衡扩散算法
如图 1 所示 ,用一组相互连接的连通器建立异构扩散算 法的模型.设有 P 个水平截面积不同的容器,在底部有管道相 互连接 ,容器内有水 ,可以通过连接管在容器之间流动 .容器的 底面都在同一水平面上 . 如果在初始时刻液面有高有低 , 水就 会在容器之间流动 , 直到所有容器的液面高度相同为止 . 在此 模型中 , 水的体积与计算量相对应 , 水位与节点的计算时间相 对应 , 容器的横截面积与节点的处理速度相对应 , 连接管与节
1000-9825/2003/14(05)0904
©2003 Journal of Software 软 件 学 报
Vol.14, No.5
异构系统中负载平衡扩散算法的加速方法
金之雁 1+, 王鼎兴 2
1 2
∗
(中国气象科学研究院,北京
100081) 100084)
(清华大学 计算机科学与技术系,北京
An Optimal Method of Diffusion Algorithm for Hetergeneous System
JIN Zhi-Yan1+,
1 2
WANG Ding-Xing2
(Chinese Academy of Meteorological Science, Beijing 100081, China) (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
1
异构系统的动态负载平衡方法
我们假设并行处理系统是由用某种拓扑结构的网络相连的处理速度不同的节点构成 . 该系统可以用节点
图 G=(V,E)来表示,其中 V 是节点,P 是节点数量,|V|=P,E 是边,代表节点之间的通信连接.我们将节点从 1~P 顺 序编号,使每个节点有惟一的序号与之对应,节点用 ni 表示.同样,我们将边从 1~|E|编号 ,使每个边也有惟一的序 号与之对应,用 ei 表示,也可用 eij 表示连接节点 ni,nj 的边,|E|是边的数量.第 i 个节点的计算速度是 si,计算负载是 wi,计算时间是 li=wi/s .St 是节点 i 的横截面积 ( 计算速度 ). 此式表示每一步 , 节点 i,j 之间交换的计算负载为 1 τ ij (l tj − lit ) ,引起的计算时间的变化为 τ ij (l tj − lit ) .我们考虑的通信线路是双向的,因此 τ ji = τ ij .公式可改写为 si
personal computers and use them as a cluster system. Dynamic load balancing is an important method to improve the performance on such heterogeneous system. Diffusion algorithm is a dynamic load balancing method for homogeneous system. In this paper, the diffusion algorithm is extended to heterogeneous system, the influence of the arrangement of different processors in the system on the convergence rate of the diffusion algorithm is studied, and an optimal method is proposed to improve the convergence rate. Primary result shows that it can find the better arrangement of the processors to accelerate the diffusion algorithm. Key words: parallel computing; heterogeneous system; dynamic load balancing; diffusion algorithm; convergence rate; arrangement of processors 摘 要: 目前,很多单位与组织都有连接着数百台工作站和微机的局域网,并将它们作为一个机群系统使用.在这
j ↔i
果整个系统达到负载平衡,对于节点 i 有
∑ δ ij
j ↔i
si
= l − li ,
i = 1,..., P ,
906
写成矩阵的形式为
Journal of Software
软件学报
2003,14(5)
S −1 Ax = b , 其中 S 为对角矩阵, S = diag ( s1 , s 2 , s3 ,..., s P ) , A 为图 G 的邻接矩阵,其维数为|V|×|E|,其元素 aij 为
Fig.1
图1
The diffusion algorithm for heterogeneous system
异构系统中的扩散算法
点之间的通信线路对应 , 如果计算时间 ( 水位 ) 不相同 , 一些计 算负载 ( 水 ) 会通过处理节点间的通信连接 ( 管道 ) 从一个节点 转移到另一个节点 . 如果计算时间相等 , 整个系统达到负载均
(3)
1, i ↔ j且i是起点 aij = − 1 , i ↔ j且i是终点 . 0, 其他 因此,负载平衡算法归结为求出每一条边的计算负载转移量,即向量 x,使计算时间相同. 对于给定的系统,已知速度对角矩阵 S 和邻接矩阵 A,向量 b 可以通过测量处理节点的处理时间 li 并通过式
样的异构系统上动态负载平衡是提高性能的一个重要方法. 扩散方法是同构系统的动态负载平衡算法 .将散算法 扩展到异构系统中 , 对异构系统中速度不同的处理机的位置与扩散收敛速度的关系进行了研究 , 提出了加速扩 散算法的收敛速度的优化方法.初步实验证明,该方法能通过合理安排处理机,加快扩散算法的速度. 关键词: 并行计算;异构系统;动态负载平衡;扩散算法;收敛速度;处理机安排 文献标识码: A 中图法分类号: TP301
∗ Supported by the National Natural Science Foundation of China under Grant Nos.40245023, 60273007, 60131160743 (国家自然
科学基金 ) 第一作者简介 : 金之雁 (1962- ),男 ,江苏吴县人 ,正研级高级工程师 ,主要研究领域为数值天气预报 ,并行计算 .
T
(2)
其中() 为()的转置矩阵.我们规定,节点间的每一条边都有一个方向,从序号大的处理节点指向序号小的节点.设 沿 ei 边计算负载的转移量是 δi,计算负载转移向量 x=(δ1,δ2,δ3,…,δ|E|)T,沿边 eij 的负载转移量也可成 δij,对于节点 i,它的计算负载总变化量是 ∑ δ ij ,其中 j↔i 表示节点 j 与节点 i 之间有边直接相连,称为节点 j 与节点 i 相邻.如
金之雁 等:异构系统中负载平衡扩散算法的加速方法
905
随着分布式计算技术的普及 , 在分布式并行系统上开展科学计算的学者越来越多 . 如果并行系统的所有节 点完全相同,这种系统称为同构的,反之称为异构的.异构并行系统在工作站网络环境中是很常见的. 负载平衡问题是许多研究学者的研究焦点 [1,2]. 如果计算负载可以在运行之前确定 , 可以事先将负载划分 , 这是静态负载平衡问题 ;若只能在运行时测量计算负载并动态确定负载划分 , 则是动态负载平衡问题 .对于异构 系统 , 不同体系结构的处理机的计算速度、利用率等等很难事先估算 , 静态负载平衡需要很多假定 ;动态负载平 衡无须对处理机的计算速度作假定,可根据实际测量数据进行调整. 实现动态负载平衡的方法很多 , 一些学者采用由一个节点监视整个系统的负载状况 , 并控制整个系统的负 载分配的方法,这称为直接方法,它的优点是简单、直观,在一些应用中取得了较好的效果.另一类方法是紧邻方 法 (nearest-neighbor approach),它假定并行处理系统是由多个节点按照一定的结构相互连接构成的并行处理网 络 , 每个节点只能与其直接连接的节点通信 , 计算负载也只能通过这些连接移动 . 在负载平衡过程中 , 每个节点 只能与其周围的节点进行负载交换 , 通过多次循环迭代实现系统的负载平衡 [3]. 采用这种模式的负载平衡算法 有 :扩散方法 (diffusion method)[4] 、维交换法 (dimension exchange method)[5] 和基于梯度的方法 (gradient based method) 等 . 在维交换法中 , 每个节点在一个时间内只与它的一个相邻的节点进行负载平衡 . 在超立方体结构中 , 在每个节点与其所有相邻的节点进行负载平衡后 ,整个系统达到负载平衡 .基于梯度的方法有多种 ,其共同特点 是用负载梯度图来描述整个系统的负载情况 , 负载是向着梯度最陡方向的节点移动的 . 其中 , 梯度模式方法 (gradient model method)[6] 用计算负载压力面来表示负载的移动 .在扩散算法中 ,每个节点向它周围计算负载较 低的节点 , 在送出计算负载的同时从计算负载高的节点接受计算负载 . 一些学者还研究了构造切比雪夫多项式 来加速收敛的方法 [7].但是 ,上述算法都是针对同构系统的 .Chi-Chung Hui[8]提出了针对异构系统的流体负载平 衡 (hydrodynamic load balancing) 方法 , 该方法用连通器中水的流动来计算负载的移动 , 还应用水的位能在平衡 时最低的原理计算负载的移动 .本文将扩散方法扩展到异构系统 , 重点研究了不同速度的处理机安排在异构系 统的不同位置对收敛速度的影响 , 并利用它来加速扩散算法收敛速度 ,提出了一种找到收敛速度较快的处理机 安排的算法.