Stochastic Search Algorithms for Optimal Content-based Sampling of Video Sequences

合集下载

Self-adaptive differential evolution algorithm for numerical optimization

u l
n
Abstract—In this paper, we propose an extension of Self-adaptive Differential Evolution algorithm (SaDE) to solve optimization problems with constraints. In comparison with the original SaDE algorithm, the replacement criterion was modified for handling constraints. The performance of the proposed method is reported on the set of 24 benchmark problems provided by CEC2006 special session on constrained real parameter optimization.
2006 IEEE Congress on Evolutionary Computation Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006
Self-adaptive Differential Evolution Algorithm for Constrained Real-Parameter Optimization
“DE/rand/1”: Vi ,G = Xr ,G + F ⋅ Xr ,G − Xr G
1 2 3,
(
“DE/best/1”: Vi ,G = Xbest ,G + F ⋅ Xr ,G − X r G 1 2,

A meta-heuristic algorithm for heterogeneous fleet vehicle routing problems

Discrete OptimizationA meta-heuristic algorithm for heterogeneous ﬂeet vehicle routing problems with two-dimensional loading constraintsStephen C.H.Leung a ,Zhenzhen Zhang b ,Defu Zhang b ,⇑,Xian Hua b ,Ming K.Lim caDepartment of Management Sciences,City University of Hong Kong,Hong Kong bDepartment of Computer Science,Xiamen University,Xiamen 361005,China cDerby Business School,University of Derby,Derby,UKa r t i c l e i n f o Article history:Received 18October 2011Accepted 16September 2012Available online 3October 2012Keywords:Routing PackingSimulated annealing Heterogeneous ﬂeeta b s t r a c tThe two-dimensional loading heterogeneous ﬂeet vehicle routing problem (2L-HFVRP)is a variant of the classical vehicle routing problem in which customers are served by a heterogeneous ﬂeet of vehicles.These vehicles have different capacities,ﬁxed and variable operating costs,length and width in dimension,and two-dimensional loading constraints.The objective of this problem is to minimize transportation cost of designed routes,according to which vehicles are used,to satisfy the customer demand.In this study,we proposed a simulated annealing with heuristic local search (SA_HLS)to solve the problem and the search was then extended with a collection of packing heuristics to solve the loading constraints in 2L-HFVRP.To speed up the search process,a data structure was used to record the information related to loading feasi-bility.The effectiveness of SA_HLS was tested on benchmark instances derived from the two-dimensional loading vehicle routing problem (2L-CVRP).In addition,the performance of SA_HLS was also compared with three other 2L-CVRP models and four HFVRP methods found in the literature.Ó2012Elsevier B.V.All rights reserved.1.IntroductionThe vehicle routing problem (VRP)was ﬁrstly addressed by Dantzig and Ramser (1959),proposing the most cost-effective way to distribute items between customers and depots by a ﬂeet of vehicles.Taking into account of the attribute of the ﬂeet,the tra-ditional VRP has evolved to different variants.Amongst them in-clude CVRP (homogenous VRP)that only considers a constraint of vehicles having the same limited capacity (Rochat and Taillard,1995),HVRP (heterogeneous VRP)that serves customers with dif-ferent types of vehicles (Golden et al.,1984;Gendreau et al.,1999;Lima et al.,2004;Prins,2009;Brandao,2011),VRPTW (VRP with time windows)that requires the service of each customer to start within the time window subject to time windows constraints (Kolen et al.,1987);and SDVRP (split deliver VRP)that allows more than one vehicle serving a customer (Chen et al.,2007).Readers are to refer to Crainic and Laporte (1998)and Toth and Vigo (2002)for a detailed description of VRP and its variants.To solve the VPR variants above effectively,a number of metaheuristics have been applied,such as simulated annealing (Osman,1993),Tabu search (Brandao,2011;Gendreau et al.,1999),genetic algorithms (Lima et al.,2004),variable neighborhood search (Imran et al.,2009),and ant colony optimization (Rochat and Taillard,1995;Li et al.,2009).In the real world,logistics managers have to deal with routing and packing problems simultaneously.This results in another domain of VRP to be investigated.In the literature,there are a number of frameworks proposed to address these two problems simultaneously.Iori et al.(2007)addressed the VRP with two-dimensional packing constraints (2L-CVRP)with an algorithm based on branch-and-cut technique.Gendreau et al.(2008)proposed a Tabu search heuristic algorithm to solve large instances with up to 255customers and more than 700items in the 2L-CVRP.Zachariadis et al.(2009)developed a new meta-methodology guided Tabu search (GTS)which can obtain better results.In this work,a collection of packing heuristics was proposed to check the loading feasibility.Fuellerer et al.(2009)presented a new ant colony optimization algorithm deriving from saving-based ant colony opti-mization method and demonstrated its performance to successfully solve the 2L-CVRP.More recently,Leung et al.(2011)developed a new efﬁcient method that consists of a series of algorithms for two-dimensional packing problems.The method has proven its capability to improve the results of most instances used by Zachariadis et al.(2009).Duhamel et al.(2011)proposed a GRASP ÂELS algorithm for 2L-CVRP,whereby the loading constraints were transformed into resource constrained project scheduling problem (RCPSP)constraints before a packing problem can be solved.However,only basic CVRP and Unrestricted version0377-2217/$-see front matter Ó2012Elsevier B.V.All rights reserved./10.1016/j.ejor.2012.09.023Corresponding author.Tel.:+865922582013;fax:+865922580258.E-mail address:dfzhang@ (D.Zhang).of2L-CVRP were solved with their algorithm.Some researchers have extended their heuristics to three-dimensional problems. Gendreau et al.(2006)proposed a multi-layer Tabu search algorithm that iteratively invokes an inner Tabu search procedure to search the optimal solutions of a three-dimensional loading sub-problem.Tarantilis et al.(2009)used a guided Tabu search (GTS)approach with a combination of six packing heuristics to solve 3L-CVRP.In their work,a manual unloading problem was also tested.Furthermore,Fuellerer et al.(2010)also proposed their methods to deal with three-dimensional loading constraints.In addition,Iori and Martello(2010)provided a review in regard to vehicle routing problems with two and three-dimensional loading constraints.Since most enterprises own a heterogeneousﬂeet of vehicles or hire different types of vehicles to serve their customers,it is there-fore crucial to study VRP with aﬂeet of heterogeneous vehicles.The heterogeneousﬂeet VRP(HFVRP)addresses the VRP with a hetero-geneousﬂeet of vehicles which have various capacities,ﬁxed costs and variable costs(Choi and Tcha,2007;Imran et al.,2009).In the literature,three versions of HFVRP have been studied.Golden et al. (1984)considered the variable costs to be uniformly spread across all vehicle types and the availability of each type of vehicle to be unlimited.Gendreau et al.(1999)considered the different variable costs for different types of vehicle.The third HFVRP was introduced by Taillard(1999)and Tarantilis et al.(2004),in which the number of available vehicles of each type is limited.Recently,Penna et al. (2011)introduced an Iterated Local Search,combined with a Vari-able Neighborhood Decent procedure and a random neighborhood ordering(ILS-RVND),to solve all variants of HFVRP.In this paper,we combined the HFVRP with two-dimensional loading constraints,called the heterogeneousﬂeet vehicle routing problems with two-dimensional loading constraints(2L-HFVRP). However,to the best of our knowledge,no work has been con-ducted to address such VRP although it is a practical problem in real-world transportation and logistics industries.In2L-HFVRP, there are different types of vehicles with different capacity,ﬁxed cost,variable cost,length and width in vehicle dimension,andtwo-dimensional loading constraints.The demand of a customer is deﬁned by a set of rectangular items with given width,length and weight.All the items belonging to one customer must be as-signed to the same route.The objective is to describe the minimum transportation costs with a function of the distance travelled,ﬁxed and variable costs associated with the vehicles.This paper presents a simulated annealing(SA)algorithm for 2L-HFVRP.In the literature SA has been proven to be an effective method to solve combinatorial optimization problems and it has been successfully applied to2L-CVRP(Leung et al.,2010).In this paper,a heuristic local search is used to further improve the solu-tion of SA.In addition,six promising packing algorithms,whereby ﬁve were developed by Zachariadis et al.(2009)and one by Leung et al.(2010),are also used to solve the loading constraints in 2L-HFVRP.These algorithms are extensively tested on benchmark instances derived from the2L-CVRP test problems with vehicles of different capacity,ﬁxed and variable costs,length,and width. The comparison with several effective methods of the2L-CVRP and pure HFVRP is also given.2.Problem descriptionThe2L-HFVRP is deﬁned on an undirected connected graph G=(V,E),where V={0,1,...,n}is a vertex set corresponding to the depot(vertex0)and the customers(vertices1,2,...,n)and E={e ij:i,j2V}is an edge set.For each e ij2E,a distance d ij (d ii=0)is associated.Aﬂeet of P different types of vehicles is lo-cated at the depot,and the number of vehicles of each type is unlimited.Capacity Q t,ﬁxed cost F t,variable cost V t,length L t andwidth W t are associated to each type of vehicle t(t=1,2,...,P). The loading surface of vehicle of type t is A t=L tÃW t.On the basis that a vehicle with larger capacity usually has higher cost and greater fuel consumption,we assume that Q16Q26ÁÁÁ6Q P,F1-6F26ÁÁÁ6FPand V16V26ÁÁÁ6V P.The traveling cost of eachedge e ij2E by a vehicle type t is Cij t¼V tÃd ij.The transportation cost of a route for vehicle type t is C R¼F tþP i<j R ji¼1V tÃd RðiÞ;Rðiþ1Þ, where R is the route whose start point and end point are the depot. Each customer i(i=1,2,...,n)demands a set of m i rectangular items,denoted as IT i,and the total weight of IT i equals to D i.Each item I ir2IT i(r=1,2,...,m i)has a speciﬁc length l ir and width w ir.We also denote a i¼P m ir¼1w irÃl ir as the total area of the items of customer i.In2L-HFVRP,a feasible loading must satisfy the fol-lowing constraints:(i)All items of a given customer must be loaded on the samevehicle and split deliveries are not allowed.(ii)All items must have aﬁxed orientation and must be loaded with their sides parallel to the sides of the loading surface. (iii)Each vehicle must start andﬁnish at the depot.(iv)Each customer can only be served once.(v)The capacity,length and width of the vehicle cannot be exceeded.(vi)No two items can overlap in the same route.The objective of2L-HFVRP is to assign customer i(i=1,2,...,n) to one of the routes,so that the total transportation cost is mini-mized and all the routes fulﬁll the constraints.In this paper,we200S.C.H.Leung et al./European Journal of Operational Research225(2013)199–210consider two versions of2L-HFVRP which is the same as2L-CVRP: the Unrestricted only deals with feasible loading of the items unto the vehicles,and the Sequential considers both loading and unload-ing constraints(e.g.when visiting a customer,his/her items can be unloaded without the need to move items that belong to other cus-tomers in the same route).Fig.1gives an example of the two versions.3.The optimization heuristics for two-dimensional loading problemsFor a given route,it is necessary to determine whether all the items required by the customers can be feasibly loaded onto the vehicle.In this paper,we willﬁrst investigate if the total weight of items demanded by the customers exceeds the capacity of the vehicle.Otherwise,six packing heuristics are used to solve the two-dimensional loading problem.As mentioned earlier,the load-ing position of an inserted item must be feasible,i.e.it must not lead to any overlaps(for both Unrestricted and Sequential prob-lems),or sequence constraint violations(for Sequential only).The ﬁrstﬁve heuristics Heur i(i=1,2,...,5)are based on the work by Zachariadis et al.(2009).Each heuristic loads an item in the most suitable position selected from the feasible ones according to the individual criterion as follows:Heur1:Bottom-leftﬁll(W-axis).The selected position is the one with the minimumW-axis coordinate,breaking ties by minimum L-axis coordinate.Heur2:Bottom-leftﬁll(L-axis).The selected position is the one with the minimumL-axis coordinate,breaking ties by minimum W-axis coordinate.Heur3:Max touching perimeter heuristic.The selected position is the one with the maximumsum of the common edges between the inserteditem,the loaded items in the vehicle,and the load-ing surface of the vehicle.Heur4:Max touching perimeter no walls heuristic.The selected position is the one with the maximumsum of the common edges between the inserteditem and the loaded items in the vehicle.Heur5:Min area heuristic.The selected position is the one with the minimumrectangular surface.The rectangular surface corre-sponding to the position at the circle point is shownon the left in Fig.2.More details of theseﬁve heuristics can be found inZachariadis et al.(2009).In order to handle a morecomplex system Heur6was also used,which wasproposed in Leung et al.(2010).Heur6:Maxﬁtness value heuristic.This heuristic gives a priority to a loading point if itcan decrease the number of corner positions whenwe place an item on the point.As a result,everytime an item is loaded,we will select the best load-ing point for the item,which would increase theprobability to obtain a better loading position.These six heuristics are called in sequence,which means if Heur1fails to produce a feasible loading solution,the more com-plex Heur i(i=2,...,6)will be called one at a time toﬁnd the solu-tion.If a feasible solution is found,the loading process stops and the solution is stored.During the loading process,feasible loading positions are recorded.Atﬁrst,only the front left corner(0,0)is available.When an item is successfully inserted,four new positions are added onto the list,and the occupied and duplicated positions are removed.As shown in Fig.2,item D is inserted in the position shown by a circle and four new positions are created.The items are loaded one at a time according to a given sequence.Here,two orders(Ord1,Ord2)are generated.In a given route,each customer has a unique visit order.Ord1is produced by sorting all items by reverse customer visit order,and breaking ties by decreasing area.In Ord2,all items are simply sorted by decreasing area.Both orders will be evaluated by the six heuristics to search for feasible loading solutions.The pseudo-code for the packing heuristics is given in Table1.4.The simulated annealing meta-heuristics for2L-HFVRPSimulated annealing(SA)is a point-based stochastic optimiza-tion method,which explores iteratively from an initial solution to a better result(Cerny,1985;Kirkpatrick et al.,1983).The search mechanism of SA has a very good convergence,and it has been widely applied in solving various NP-hard problems.Each iterationTable1The pseudo-code for the packing heuristics.Is_Feasible(Route r)if total weight of all items exceeds the capacity thenreturn falseend ifsort the items to generate two orderings Ord1,Ord2for each ordering Ord i of the two orderings doif Heur1k Heur2k Heur3k Heur4k Heur5k Heur6thenreturn trueend ifend forreturn falseTable2The pseudo-code of SA_HLS for the2L-HFVRP.SA_HLS_2L-HFVRP(customer demands,vehicle information)Generate initial Order through sorting the customers by decreasing totalweightAssign_Vehicle(Order)to construct the initial solutionT k=T0,Iter=0//Iter is the number of iterationwhile stopping criteria not met dofor i=1to Len doif Iter<10thengenerate a new Order based on the old oneAssign_Vehicle(new Order)if the new solution is packing-feasible and better than the currentone thenaccept the new solution as current solutionend ifaccept the new Order based on the acceptance rule of SAend ifstochastically select NS from{NS1,NS2,NS3},then get a feasible solutionif new solution is better than the current one thenaccept the new solution as the current solutionelseaccept the new solution through the acceptance probability functionend ifLocal_Search(),and get a new feasible solutionif new solution is better than the best one thenreplace the current solution with this new oneend ifupdate the best solution when the solution is better than itend forT k=0.9ÃT k,Iter=Iter+1end whilereturn the best solutionS.C.H.Leung et al./European Journal of Operational Research225(2013)199–210201in SA generates a candidate solution using a neighborhood func-tion.This is a vital step to develop an efﬁcient SA.However,in many cases,the neighborhood function alone is inadequate when seeking for a global optimum solution.In addition to the proposed SA,we also use heuristic local search algorithms to improve thesolutions.Therefore,our algorithm is denoted as SA_HLS.Some mechanisms are adopted to adjust the search trajectory.One important characteristic of SA is that it can accept a worse solution on a probabilistic manner,aiming to search for a better re-sult.With the initial temperature T 0,the temperature cooling sche-dule is T k =0.9ÃT k À1.For a speciﬁc temperature T k ,a sequence of moves are carried out,which is a Markov chain whose length is de-noted as Len .In every iteration,after applying the neighborhood function,if the new solution is better than the current solution (i.e.the cost is lower),then it is accepted.However,if the cost is higher,the new solution may be accepted subject to the accep-tance probability function p (T k ,S new ,S cur ),which depends on the difference between the corresponding cost values and the global parameter T k :p ðT k ;S new ;S cur Þ¼expcos t ðS cur ÞÀcos t ðS new Þkð1Þwhere S cur and S new represent the current solution and the new solu-tion respectively.Table 2provides a framework for the proposed SA_HLS methodology.4.1.Initial solutionGood initial solutions are often a key to the overall efﬁciency of the metaheuristic.We construct the initial solution focusing on the demand of the customers,so that the use of differenttypesTable 3The pseudo-code for assigning customers into vehicles.Assign _Vehicle (Order )iused [1,2,...,P ]={0}for each customer i in Order do while true doselect the vehicle k which is not tabu for i and has minimum (freeD k ÀD i )ÃF kinsert the customer i at the last position of route for vehicle k if !Is_Feasible (route )theniused [P ]=iused [P ]+1;//add a new vehicle with largest capacity Tabu this vehicle k for customer i elseif freeD k <MinD then //vehicle k cannot service any customeriused [t ]=iused [t ]+1//assuming the type of vehicle k is t ,add onenew vehicleend ifaccept the new route,and break the loop//start to assign successive customerend if end while end forreturn generated solutionTable 4The characteristics of items of Classes 2–5instances.Classm iVertical Homogeneous Horizontal LengthWidth Length Width Length Width 2[1,2][0.4L ,0.9L ][0.1W ,0.2W ][0.2L ,0.5L ][0.2W ,0.5W ][0.1L ,0.2L ][0.4W ,0.9W ]3[1,3][0.3L ,0.8L ][0.1W ,0.2W ][0.2L ,0.4L ][0.2W ,0.4W ][0.1L ,0.2L ][0.3W ,0.8W ]4[1,4][0.2L ,0.7L ][0.1W ,0.2W ][0.1L ,0.4L ][0.1W ,0.4W ][0.1L ,0.2L ][0.2W ,0.7W ]5[1,5][0.1L ,0.6L ][0.1W ,0.2W ][0.1L ,0.3L ][0.1W ,0.3W ][0.1L ,0.2L ][0.1W ,0.6W ]202S.C.H.Leung et al./European Journal of Operational Research 225(2013)199–210of vehicles can be maximized.Firstly,all of the n customers are sorted on decreasing value of D i(i=1,2,...,n),where D i is the total demand of customer i(i=1,2,...,n)and the sequence is re-corded as Order.Subsequently we assign the customers one at a time from the Order list to a vehicle.The decision of which vehi-cle is assigned to a given customer is based on the least value of (freeD kÀD i)ÃF k,where freeD k is the unused capacity and F k is the ﬁxed cost of the current vehicle k(procedure Assign_Vehicle()). Because the number of each type of vehicle is unlimited,the pro-cedure alwaysﬁnds a feasible solution.Table3provides a pseu-do-code for the proposed Assign_Vehicle()algorithm.iused is an array presenting the number of used vehicles of different types.MinD is the minimal demand in all the customers.When assigning one customer i to vehicle k,the feasibility is examined to ensure the loading for the modiﬁed route is feasible.Other-wise,the assignment of customer i to vehicle k is forbidden and the procedure tries to assign the customer i to another vehicle.As shown in Table2,this procedure is used in SA.In each loop, a partial segment of Order is reversed to get a new Order.Then, we reassign the customers using this method.If a new solution is better than the current one,it becomes the new current solu-tion in order to adjust the search trajectory.This is assumed that previous solution does not have a good characteristic that can be improved easily.In order to obtain a better solution,the new Order is adopted based on the SA acceptance rule.After several steps of improvement by SA,the solution constructed is usually not comparable to the current one.So this method is only applied during theﬁrst ten iterations.Table5Dataset for different types of vehicle.Inst A B C DQ A L A W A F A V A Q B L C W C F C V C Q C L C W C F C V C Q D L D W D F D V D120101010 1.025151520 1.140252530 1.260402040 1.3 220101010 1.025151520 1.140252530 1.355402040 1.5 320101010 1.030151520 1.160402040 1.2420101010 1.040202020 1.160402030 1.251000101010 1.025******** 1.14000252530 1.36000402050 1.5 62000101010 1.025******** 1.14000402030 1.37200101010 1.0500151520 1.120002525120 6.0450040202508 8200101010 1.0500151520 1.120002525120 5.04500402025010 920101010 1.025151520 1.148402030 1.310200101010 1.0500151520 1.1200025251208.04500402025010 11200101010 1.0500151520 1.1200025251208.04500402025010 1220101010 1.025151520 1.140402030 1.3132000101010 1.05000151550 2.030,00040202001014500101010 1.01500151550 2.130002020400 3.2500040208005 155******** 1.01500151550 2.130002020400 3.2500040208005 1620101010 1.040202020 1.160402030 1.21720101010 1.025151520 1.140252530 1.360403040 1.4 182******** 1.0500202030 2.020********* 5.01920101010 1.040201020 1.160201530 1.2150402090 5.0 202000101010 1.04000201020 1.110,000301560 4.030,00040201508 2120101010 1.040201020 1.160201530 1.220040201208 2220101010 1.040201020 1.160201530 1.220040201208 2320101010 1.040201020 1.160201530 1.220040201208 2420101010 1.040201020 1.160201530 1.2100402060 3.2 2520101010 1.040201020 1.160201530 1.220040201208 2620101010 1.040201020 1.160201530 1.220040201208 2720101010 1.040201020 1.160201530 1.2100402060 3.2 2820101010 1.040201020 1.160201530 1.220040201208 29200101010 1.0500201030 2.0200040201208.03020101010 1.040201020 1.160201530 1.220040201208 3120101010 1.040201020 1.160201530 1.220040201208 3220101010 1.040201020 1.160201530 1.220040201208 3320101010 1.040201020 1.160201530 1.220040201208 3420101010 1.040201020 1.160201530 1.220040201208 35200101010 1.0400201020 1.51000402060 4.036100101010 1.020******* 1.1300302030 1.2400402040 1.3Table6Calibration experiment result for T0and Len.5152535S Sec tot S Sec tot S Sec tot S Sec totT0Unrest5266.65327.945159.32288.995094.74402.305111.83500.08 Seq5427.43461.795274.77510.395198.10580.955238.46680.853000500070009000LenUnrest5144.37383.375094.74402.305078.97779.145083.18968.25 Seq5328.87456.855198.10580.955206.401025.195191.071242.93S.C.H.Leung et al./European Journal of Operational Research225(2013)199–2102034.2.Neighborhood functionsIn our work,three types of move are used to step from the current solution to the subsequent solutions.They are noted as NS i(i=1,2,3).In each loop,one of them is selected ran-domly with equal probability.To explore a larger search space, a dummy empty vehicle is added for each type of vehicle.NS1is a type of customer relocation(Or-opt)(Waters,1987),which reassigns a customer from one route to another position on the same or different route(Fig.3).It is worth noting that relo-cation between two different routes can reduce the number of vehicles required.Waters(1987)introduced a‘‘swap’’type of route exchange which is represented by NS2(Fig.4).It is only applied to vehicles of the same type as swapping loads of heterogeneous vehicles could lead to an unfeasible route from a loading perspective. Therefore,customers’positions can only be exchanged in the cur-rent solution if they belong to vehicles of the same type.NS3is a variant of route interchange(2-opt)(Croes,1958;Lin, 1965).As for NS2,NS3only considers vehicles of the same type (Fig.5).If the selected customers are in the same route as depicted in Fig.5a,the positions of other customers between them (and including themselves)will be reversed.If they belong to dif-ferent routes as illustrated in Fig.5b,in each route from the se-lected customer to the last customer will be grouped as a block. Between the routes the blocks will be swapped.4.3.Heuristic local search mechanismIn order to improve the quality of the solution,we also apply a heuristic local search mechanism,which consists of three methods,to the proposed SA algorithm.It is worth noting that we only apply the mechanism to the best solution with a prob-ability of5%and this is aimed to obtain more efﬁcient solutions within a shorter period of time.The local search methods adopt theﬁrst improvement criterion using the neighborhood func-tions mentioned in the previous section.Because this neighbor-hood is not operated on two randomly selected customers,we deﬁne this mechanism as heuristic local search.We denote the local search methods as LS i(i=1,2,3)according to the neighborhoods NS i(i=1,2,3).These three methods are ran-domly executed.Let us consider an instance with n customers and k vehicles.In LS1,the relocation move of one customer involves the reassign-ment of(n+k)positions.Hence,the complexity of examining NS1 neighborhood of a solution is O(n⁄(n+k)).For LS2,in the worst case whereby all customers are assigned to one type of vehicle, n2pairs of customers can be exchanged and therefore the complex-ity of NS2is O(n2).For LS3,as for LS2,the number of interchange points is(n+k),so the cardinality of pairs for interchange in NS3 is(n+k)2.As a result,examining NS3neighborhood requires O ((n+k)2)computational effort.In practice,the worst case hardly happened because the customers are usually spread out across dif-ferent types of vehicle.Table8Average computational results of Classes2–5for Sequential2L-HFVRP.Inst SA SA_HLS%Gap S Sec h Sec tot S Sec h Sec tot 1678.497.7323.43603.15 5.7331.0411.10 2753.70 6.4321.53705.03 6.0931.28 6.46 3866.5620.8941.05771.8110.3536.6110.93 4796.2822.9340.76704.878.7135.1911.48 5944.8734.6849.07802.569.3527.6215.06 6901.4425.3345.42834.769.9243.907.40 76634.1357.8694.405770.83 1.9531.4013.01 87064.9227.5274.115633.04 4.7737.4220.27 91181.8648.3956.421047.6015.7468.8411.36 108695.22114.15194.217730.73 5.8751.7811.09 119789.89171.07225.858491.9410.0758.8813.26 121707.9915.1123.731681.6132.90158.45 1.54 1335464.18183.90287.9326761.40 6.7165.3424.54 1412027.5546.68112.0511120.3011.2955.627.54 1512871.23118.99190.7211916.3024.53148.507.42 161437.7411.1545.811291.2425.73113.0210.19 172037.0124.8735.341775.4943.41198.8312.84 188364.54344.74600.495790.6230.50138.0330.77 196186.88369.25781.224303.2651.44233.9830.45 209586.34941.001534.626215.4898.14217.7335.16 2114457.73811.101613.928494.36124.15443.8541.25 2215677.581090.281798.928867.10110.40353.4043.44 2315533.281078.921615.038544.30130.44386.8244.99 246756.02595.331092.094714.08104.98287.0230.22 2522864.752937.973155.7411602.30186.28605.5349.26 2620622.13875.112351.0612380.30153.21392.2639.97 279652.102343.762607.825882.75240.04502.4039.05 2841547.502484.484178.8123585.60393.72737.3543.23 2942142.583444.275609.8122938.80486.551045.6945.57 3036243.783609.867783.7316489.70536.781033.7954.50 3149143.853493.4611927.0922033.001110.801531.2855.17 3249142.654755.4512121.4820982.701040.061487.7457.30 3350660.653911.4612173.2721906.601087.931326.9856.76 3425388.93780.5514154.5615005.001680.671887.4140.90 3512902.48160.3217513.849313.541722.961877.5627.82 366279.65256.3716302.744567.292035.082052.9927.27 Avg.27.46Table7Result comparison of SA_HLS and SA on Class1.Inst SA SA_HLS%GapS Sec h Sec tot S Sec h Sec tot1665.64 6.41162.40596.0729.7833.5910.452732.8539.30159.30679.18 5.0532.017.323813.87162.92182.44745.5112.3954.248.404745.50142.72184.50694.33 6.1518.36 6.865916.40214.52226.97761.19 5.6226.9916.946814.8520.02192.53809.56 4.7725.650.6573387.06159.89215.363211.53 3.6729.76 5.1883359.77212.59215.033184.45 3.2124.30 5.2291144.65120.11219.141029.957.3940.2810.02105400.74108.70274.945149.51 6.9725.88 4.65115465.11217.80279.005119.40 6.9726.47 6.33121699.5917.67261.111658.5635.8092.80 2.411319390.0056.72275.6714655.40 1.9332.1824.421410447.1039.97305.7710019.0013.5173.21 4.101510546.9042.11304.9210151.70 5.4955.29 3.75161391.84108.34297.561292.5817.9476.927.13171963.5853.33411.941770.8338.88225.599.82184055.02343.92363.923140.5513.5435.3522.55191980.40205.31445.781553.1132.56107.8121.58203244.47277.20719.061956.9756.0071.5739.68215330.48428.97703.952567.1875.00195.6651.84225934.32160.30699.752605.9076.39174.7256.09235811.71404.05711.092643.8493.99239.2954.51244257.31289.19696.342555.4163.98156.4139.98255960.19248.30908.832972.59129.04253.0950.13265515.4843.45804.554049.6488.09180.2326.58275093.65426.08905.033561.58159.20230.4930.08287944.7382.841103.066858.35125.60161.0513.672915643.10822.161222.339695.00139.73142.6338.023011320.50992.751500.935663.33242.15259.8349.973119297.801754.361946.538054.90325.51483.4458.263217767.80732.971965.678408.61379.53410.8652.683318325.90696.911962.118555.58368.50486.2553.31345713.851406.471998.505536.63323.50425.80 3.10354875.690.981793.804444.59324.64401.588.84364961.221723.82322.953669.89555.13605.3126.03Avg.23.07204S.C.H.Leung et al./European Journal of Operational Research225(2013)199–210。

Firefly_Algorithms_for_Multimodal_Optimization1

a r X i v :1003.1466v 1 [m a t h .O C ] 7 M a r 2010Fireﬂy Algorithms for Multimodal Optimization Xin-She Yang Department of Engineering,University of Cambridge,Trumpington Street,Cambridge CB21PZ,UK Abstract Nature-inspired algorithms are among the most powerful algorithms for op-timization.This paper intends to provide a detailed description of a new Fireﬂy Algorithm (FA)for multimodal optimization applications.We will compare the proposed ﬁreﬂy algorithm with other metaheuristic algorithms such as particle swarm optimization (PSO).Simulations and results indicate that the proposed ﬁreﬂy algorithm is superior to existing metaheuristic algorithms.Finally we will discuss its applications and implications for further research.Citation detail:X.-S.Yang,“Fireﬂy algorithms for multimodal optimiza-tion”,in:Stochastic Algorithms:Foundations and Applications ,SAGA 2009,Lecture Notes in Computer Sciences,Vol.5792,pp.169-178(2009).1Introduction Biologically inspired algorithms are becoming powerful in modern numerical optimization [1,2,4,6,9,10],especially for the NP-hard problems such as the travelling salesman problem.Among these biology-derived algorithms,the multi-agent metaheuristic algorithms such as particle swarm optimization form hot research topics in the start-of-the-art algorithm development in optimiza-tion and other applications [1,2,9].Particle swarm optimization (PSO)was developed by Kennedy and Eber-hart in 1995[5],based on the swarm behaviour such as ﬁsh and bird schooling in nature,the so-called swarm intelligence.Though particle swarm optimization has many similarities with genetic algorithms,but it is much simpler because it does not use mutation/crossover operators.Instead,it uses the real-number randomness and the global communication among the swarming particles.In this sense,it is also easier to implement as it uses mainly real numbers.This paper aims to introduce the new Fireﬂy Algorithm and to provide the comparison study of the FA with PSO and other relevant algorithms.We will ﬁrst outline the particle swarm optimization,then formulate the ﬁreﬂy algorithms and ﬁnally give the comparison about the performance of these algorithms.The FA optimization seems more promising than particle swarm optimization in the sense that FA can deal with multimodal functions morenaturally and eﬃciently.In addition,particle swarm optimization is just a special class of theﬁreﬂy algorithms as we will demonstrate this in this paper. 2Particle Swarm Optimization2.1Standard PSOThe PSO algorithm searches the space of the objective functions by adjusting the trajectories of individual agents,called particles,as the piecewise paths formed by positional vectors in a quasi-stochastic manner[5,6].There are now as many as about20diﬀerent variants of PSO.Here we only describe the simplest and yet popular standard PSO.The particle movement has two major components:a stochastic component and a deterministic component.A particle is attracted toward the position of the current global best g∗and its own best location x∗i in history,while at the same time it has a tendency to move randomly.When a particleﬁnds a location that is better than any previously found locations,then it updates it as the new current best for particle i.There is a current global best for all n particles.The aim is toﬁnd the global best among all the current best solutions until the objective no longer improves or after a certain number of iterations.For the particle movement,we use x∗i to denote the current best for particle i,and g∗≈min or max{f(x i)}(i=1,2,...,n)to denote the current global best. Let x i and v i be the position vector and velocity for particle i,respectively. The new velocity vector is determined by the following formulav t+1i=v t i+αǫ1⊙(g∗−x t i)+βǫ2⊙(x∗i−x t i).(1) whereǫ1andǫ2are two random vectors,and each entry taking the values between0and1.The Hadamard product of two matrices u⊙v is deﬁned as the entrywise product,that is[u⊙v]ij=u ij v ij.The parametersαandβare the learning parameters or acceleration constants,which can typically be takenas,say,α≈β≈2.The initial values of x t=0i can be taken as the bounds orlimits a=min(x j),b=max(x j)and v t=0i =0.The new position can then beupdated byx t+1 i =x t i+v t+1i.(2)Although v i can be any values,it is usually bounded in some range[0,v max].There are many variants which extend the standard PSO algorithm,and the most noticeable improvement is probably to use inertia functionθ(t)so that v t i is replaced byθ(t)v t i whereθtakes the values between0and1.In the simplest case,the inertia function can be taken as a constant,typicallyθ≈0.5∼0.9. This is equivalent to introducing a virtual mass to stabilize the motion of the particles,and thus the algorithm is expected to converge more quickly.3Fireﬂy Algorithm3.1Behaviour of FireﬂiesTheﬂashing light ofﬁreﬂies is an amazing sight in the summer sky in the tropical and temperate regions.There are about two thousandﬁreﬂy species, and mostﬁreﬂies produce short and rhythmicﬂashes.The pattern ofﬂashes is often unique for a particular species.Theﬂashing light is produced by a process of bioluminescence,and the true functions of such signaling systems are still debating.However,two fundamental functions of suchﬂashes are to attract mating partners(communication),and to attract potential prey.In addition,ﬂashing may also serve as a protective warning mechanism.The rhythmicﬂash, the rate ofﬂashing and the amount of time form part of the signal system that brings both sexes together.Females respond to a male’s unique pattern of ﬂashing in the same species,while in some species such as photuris,female ﬁreﬂies can mimic the matingﬂashing pattern of other species so as to lure and eat the maleﬁreﬂies who may mistake theﬂashes as a potential suitable mate.We know that the light intensity at a particular distance r from the light source obeys the inverse square law.That is to say,the light intensity I de-creases as the distance r increases in terms of I∝1/r2.Furthermore,the air absorbs light which becomes weaker and weaker as the distance increases. These two combined factors make mostﬁreﬂies visible only to a limited dis-tance,usually several hundred meters at night,which is usually good enough forﬁreﬂies to communicate.Theﬂashing light can be formulated in such a way that it is associated with the objective function to be optimized,which makes it possible to formulate new optimization algorithms.In the rest of this paper,we willﬁrst outline the basic formulation of the Fireﬂy Algorithm(FA)and then discuss the implementation as well as its analysis in detail.3.2Fireﬂy AlgorithmNow we can idealize some of theﬂashing characteristics ofﬁreﬂies so as to developﬁreﬂy-inspired algorithms.For simplicity in describing our new Fireﬂire Algorithm(FA),we now use the following three idealized rules:1)allﬁreﬂies are unisex so that oneﬁreﬂy will be attracted to otherﬁreﬂies regardless of their sex;2)Attractiveness is proportional to their brightness,thus for any twoﬂashingﬁreﬂies,the less brighter one will move towards the brighter one. The attractiveness is proportional to the brightness and they both decrease as their distance increases.If there is no brighter one than a particularﬁreﬂy, it will move randomly;3)The brightness of aﬁreﬂy is aﬀected or determined by the landscape of the objective function.For a maximization problem,the brightness can simply be proportional to the value of the objective function. Other forms of brightness can be deﬁned in a similar way to theﬁtness function in genetic algorithms.Based on these three rules,the basic steps of theﬁreﬂy algorithm(FA)can be summarized as the pseudo code shown in Fig.1.Fireﬂy AlgorithmFigure1:Pseudo code of theﬁreﬂy algorithm(FA).In certain sense,there is some conceptual similarity between theﬁreﬂy al-gorithms and the bacterial foraging algorithm(BFA)[3,7].In BFA,the at-traction among bacteria is based partly on theirﬁtness and partly on their distance,while in FA,the attractiveness is linked to their objective function and monotonic decay of the attractiveness with distance.However,the agents in FA have adjustable visibility and more versatile in attractiveness variations, which usually leads to higher mobility and thus the search space is explored more eﬃciently.3.3AttractivenessIn theﬁreﬂy algorithm,there are two important issues:the variation of light intensity and formulation of the attractiveness.For simplicity,we can always assume that the attractiveness of aﬁreﬂy is determined by its brightness which in turn is associated with the encoded objective function.In the simplest case for maximum optimization problems,the brightness I of aﬁreﬂy at a particular location x can be chosen as I(x)∝f(x).However, the attractivenessβis relative,it should be seen in the eyes of the beholder or judged by the otherﬁreﬂies.Thus,it will vary with the distance r ij between ﬁreﬂy i andﬁreﬂy j.In addition,light intensity decreases with the distance from its source,and light is also absorbed in the media,so we should allow the attractiveness to vary with the degree of absorption.In the simplest form, the light intensity I(r)varies according to the inverse square law I(r)=I s/r2 where I s is the intensity at the source.For a given medium with aﬁxed light absorption coeﬃcientγ,the light intensity I varies with the distance r.That is I=I0e−γr,where I0is the original light intensity.In order to avoid the singularity at r=0in the expression I s/r2,the combined eﬀect of both theinverse square law and absorption can be approximated using the following Gaussian formI(r)=I0e−γr2.(3) Sometimes,we may need a function which decreases monotonically at a slower rate.In this case,we can use the following approximationI(r)=I02γ2r4+...,11+γr2.Equation(6)deﬁnes a characteristicdistanceΓ=1/√Γm.3.4Distance and MovementThe distance between any twoﬁreﬂies i and j at x i and x j,respectively,is the Cartesian distancer ij=||x i−x j||=(x i−x j)2+(y i−y j)2.The movement of aﬁreﬂy i is attracted to another more attractive(brighter)ﬁreﬂy j is determined byx i=x i+β0e−γr2ij(x j−x i)+α(rand−1where the second term is due to the attraction while the third term is random-ization withαbeing the randomization parameter.rand is a random number generator uniformly distributed in[0,1].For most cases in our implementation, we can takeβ0=1andα∈[0,1].Furthermore,the randomization term can easily be extended to a normal distribution N(0,1)or other distributions.In addition,if the scales vary signiﬁcantly in diﬀerent dimensions such as−105to 105in one dimension while,say,−0.001to0.01along the other,it is a good idea to replaceαbyαS k where the scaling parameters S k(k=1,...,d)in the d dimensions should be determined by the actual scales of the problem of interest.The parameterγnow characterizes the variation of the attractiveness,and its value is crucially important in determining the speed of the convergence and how the FA algorithm behaves.In theory,γ∈[0,∞),but in practice,γ=O(1) is determined by the characteristic lengthΓof the system to be optimized. Thus,in most applications,it typically varies from0.01to100.3.5Scaling and Asymptotic CasesIt is worth pointing out that the distance r deﬁned above is not limited to the Euclidean distance.We can deﬁne many other forms of distance r in the n-dimensional hyperspace,depending on the type of problem of our interest. For example,for job scheduling problems,r can be deﬁned as the time lag or time interval.For complicated networks such as the Internet and social networks,the distance r can be deﬁned as the combination of the degree of local clustering and the average proximity of vertices.In fact,any measure that can eﬀectively characterize the quantities of interest in the optimization problem can be used as the‘distance’r.The typical scaleΓshould be associated with the scale in the optimization problem of interest.IfΓis the typical scale for a given optimization problem,for a very large number ofﬁreﬂies n≫m where m is the number of local optima,then the initial locations of these n ﬁreﬂies should distribute relatively uniformly over the entire search space in a similar manner as the initialization of quasi-Monte Carlo simulations.As the iterations proceed,theﬁreﬂies would converge into all the local optima (including the global ones)in a stochastic manner.By comparing the best solutions among all these optima,the global optima can easily be achieved. At the moment,we are trying to formally prove that theﬁreﬂy algorithm will approach global optima when n→∞and t≫1.In reality,it converges very quickly,typically with less than50to100generations,and this will be demonstrated using various standard test functions later in this paper.There are two important limiting cases whenγ→0andγ→∞.Forγ→0, the attractiveness is constantβ=β0andΓ→∞,this is equivalent to say that the light intensity does not decrease in an idealized sky.Thus,aﬂashing ﬁreﬂy can be seen anywhere in the domain.Thus,a single(usually global) optimum can easily be reached.This corresponds to a special case of particle swarm optimization(PSO)discussed earlier.Subsequently,the eﬃciency of this special case is the same as that of PSO.On the other hand,the limiting caseγ→∞leads toΓ→0andβ(r)→δ(r)(the Dirac delta function),which means that the attractiveness is almost zero in the sight of otherﬁreﬂies or theﬁreﬂies are short-sighted.This isFigure2:a global minimum f∗≈−1.801at(2.20319,1.57049).equivalent to the case where theﬁreﬂiesﬂy in a very foggy region randomly.No otherﬁreﬂies can be seen,and eachﬁreﬂy roams in a completely randomway.Therefore,this corresponds to the completely random search method.Astheﬁreﬂy algorithm is usually in somewhere between these two extremes,it ispossible to adjust the parameterγandαso that it can outperform both therandom search and PSO.In fact,FA canﬁnd the global optima as well as all thelocal optima simultaneously in a very eﬀective manner.This advantage will bedemonstrated in detail later in the implementation.A further advantage of FAis that diﬀerentﬁreﬂies will work almost independently,it is thus particularlysuitable for parallel implementation.It is even better than genetic algorithmsand PSO becauseﬁreﬂies aggregate more closely around each optimum(withoutjumping around as in the case of genetic algorithms).The interactions betweendiﬀerent subregions are minimal in parallel implementation.4Multimodal Optimization with MultipleOptima4.1ValidationIn order to demonstrate how theﬁreﬂy algorithm works,we have implemented itin Matlab.We will use various test functions to validate the new algorithm.Asan example,we now use the FA toﬁnd the global optimum of the Michalewiczfunctionf(x)=−di=1sin(x i)[sin(ix2iFigure3:The initial40ﬁreﬂies(left)and their locations after10iterations(right).scribed a multimodal function which looks like a standing-wave pattern[11] f(x)= e− d i=1(x i/a)2m−2e− d i=1x2i ·d i=1cos2x i,m=5,(11) is multimodal with many local peaks and valleys,and it has a unique globalminimum f∗=−1at(0,0,...,0)in the region−20≤x i≤20where i=1,2,...,dand a=15.The2D landscape of Yang’s function is shown in Fig.4.4.2Comparison of F A with PSO and GAVarious studies show that PSO algorithms can outperform genetic algorithms(GA)[4]and other conventional algorithms for solving many optimization prob-lems.This is partially due to that fact that the broadcasting ability of the cur-rent best estimates gives better and quicker convergence towards the optimality.A general framework for evaluating statistical performance of evolutionary al-gorithms has been discussed in detail by Shilane et al.[8].Now we will comparethe Fireﬂy Algorithms with PSO,and genetic algorithms for various standardtest functions.For genetic algorithms,we have used the standard version withno elitism with a mutation probability of p m=0.05and a crossover probabilityof0.95.For the particle swarm optimization,we have also used the standardversion with the learning parametersα≈β≈2without the inertia correction[4,5,6].We have used various population sizes from n=15to200,and foundthat for most problems,it is suﬃcient to use n=15to50.Therefore,we haveused aﬁxed population size of n=40in all our simulations for comparison.After implementing these algorithms using Matlab,we have carried out ex-tensive simulations and each algorithm has been run at least100times so as tocarry out meaningful statistical analysis.The algorithms stop when the varia-tions of function values are less than a given toleranceǫ≤10−5.The resultsare summarized in the following table(see Table1)where the global optimaare reached.The numbers are in the format:average number of evaluations(success rate),so3752±725(99%)means that the average number(mean)offunction evaluations is3752with a standard deviation of725.The success rateofﬁnding the global optima for this algorithm is99%.We can see that the FA is much more eﬃcient inﬁnding the global optima with higher success rates.Each function evaluation is virtually instantaneousFigure4:Yang’s function in2D with a global minimum f∗=−1at(0,0)where a=15.Table1:Comparison of algorithm performanceFunctions/Algorithms GA PSO FA on modern personal computer.For example,the computing time for10,000evaluations on a3GHz desktop is about5seconds.Even with graphics fordisplaying the locations of the particles andﬁreﬂies,it usually takes less thana few minutes.It is worth pointing out that more formal statistical hypothesistesting can be used to verify such signiﬁcance.5ConclusionsIn this paper,we have formulated a newﬁreﬂy algorithm and analyzed its simi-larities and diﬀerences with particle swarm optimization.We then implementedand compared these algorithms.Our simulation results forﬁnding the globaloptima of various test functions suggest that particle swarm often outperformstraditional algorithms such as genetic algorithms,while the newﬁreﬂy algo-rithm is superior to both PSO and GA in terms of both eﬃciency and successrate.This implies that FA is potentially more powerful in solving NP-hardproblems which will be investigated further in future studies.The basicﬁreﬂy algorithm is very eﬃcient,but we can see that the solutionsare still changing as the optima are approaching.It is possible to improve the solution quality by reducing the randomness gradually.A further improvement on the convergence of the algorithm is to vary the randomization parameterαso that it decreases gradually as the optima are approaching.These could form important topics for further research.Furthermore,as a relatively straightfor-ward extension,the Fireﬂy Algorithm can be modiﬁed to solve multiobjective optimization problems.In addition,the application ofﬁreﬂy algorithms in com-bination with other algorithms may form an exciting area for further research. References[1]Bonabeau E.,Dorigo M.,Theraulaz G.,Swarm Intelligence:From Naturalto Artiﬁcial Systems.Oxford University Press,(1999)[2]Deb.K.,Optimisation for Engineering Design,Prentice-Hall,New Delhi,(1995).[3]Gazi K.,and Passino K.M.,Stability analysis of social foraging swarms,IEEE Trans.Sys.Man.Cyber.Part B-Cybernetics,34,539-557(2004).[4]Goldberg D.E.,Genetic Algorithms in Search,Optimisation and MachineLearning,Reading,Mass.:Addison Wesley(1989).[5]Kennedy J.and Eberhart R.C.:Particle swarm optimization.Proc.ofIEEE International Conference on Neural Networks,Piscataway,NJ.pp.1942-1948(1995).[6]Kennedy J.,Eberhart R.,Shi Y.:Swarm intelligence,Academic Press,(2001).[7]Passino K.M.,Biomimicrt of Bacterial Foraging for Distributed Optimiza-tion,University Press,Princeton,New Jersey(2001).[8]Shilane D.,Martikainen J.,Dudoit S.,Ovaska S.J.,A general frameworkfor statistical performance comparison of evolutionary computation algo-rithms,Information Sciences:an Int.Journal,178,2870-2879(2008). [9]Yang X.S.,Nature-Inspired Metaheuristic Algorithms,Luniver Press,(2008).[10]Yang X.S.,Biology-derived algorithms in engineering optimizaton(Chap-ter32),in Handbook of Bioinspired Algorithms and Applications(eds Olar-ius&Zomaya),Chapman&Hall/CRC(2005).[11]Yang X.S.,Engineering Optimization:An Introduction with MetaheuristicApplications,Wiley(2010).。

soma包说明书

Package‘soma’October14,2022Version1.2.0Date2022-05-01Title General-Purpose Optimisation with the Self-Organising MigratingAlgorithmAuthor Jon ClaydenMaintainer Jon Clayden<****************>Depends R(>=2.5.0)Imports reportr(>=1.3.0)Suggests tinytest,covr,shadesDescription An R implementation of the Self-Organising Migrating Algorithm,a general-purpose,stochastic optimisation algorithm.The approach is similar to that of genetic algo-rithms,although it is based on the idea of a series of``migrations''by aﬁxed set of individu-als,rather than the development of successive generations.It can be applied to any cost-minimisation problem with a bounded parameter space,and is robust to local minima.License GPL-2URL https:///jonclayden/soma/Encoding UTF-8RoxygenNote7.1.2NeedsCompilation noRepository CRANDate/Publication2022-05-0208:40:05UTCR topics documented:all2one (2)soma (4)Index71all2one Options for the available SOMA variantsDescriptionThese functions generate option lists(and provide defaults)for the SOMA algorithm variants avail-able in the package,which control how the algorithm will proceed and when it will terminate.Each function corresponds to a different top-level strategy,described in a different reference.Usageall2one(populationSize=10L,nMigrations=20L,pathLength=3,stepLength=0.11,perturbationChance=0.1,minAbsoluteSep=0,minRelativeSep=0.001)t3a(populationSize=30L,nMigrations=20L,nSteps=45L,migrantPoolSize=10L,leaderPoolSize=10L,nMigrants=4L,minAbsoluteSep=0,minRelativeSep=0.001)pareto(populationSize=100L,nMigrations=20L,nSteps=10L,perturbationFrequency=1,stepFrequency=1,minAbsoluteSep=0,minRelativeSep=0.001)ArgumentspopulationSize The number of individuals in the population.It is recommended that this be somewhat larger than the number of parameters being optimised over,and itshould not be less than2.The default varies by strategy.nMigrations The maximum number of migrations to complete.pathLength The distance towards the leader that individuals may migrate.A value of1corre-sponds to the leader’s position itself,and values greater than one(recommended)allow for some overshoot.stepLength The granularity at which potential steps are evaluated.It is recommended that the pathLength not be a whole multiple of this value.perturbationChanceThe probability that individual parameters are changed on any given step.minAbsoluteSep The smallest absolute difference between the maximum and minimum cost func-tion values.If the difference falls below this minimum,the algorithm will ter-minate.The default is0,meaning that this termination criterion will never bemet.minRelativeSep The smallest relative difference between the maximum and minimum cost func-tion values.If the difference falls below this minimum,the algorithm will ter-minate.nSteps The number of candidate steps towards the leader per migrating individual.This option is used instead of pathLength and stepLength under the T3A and Paretostrategies,where the step length is variable.migrantPoolSize,leaderPoolSizeThe number of randomly selected individuals to include in the migrant andleader pools,respectively,under the T3A strategy.nMigrants The number of individuals that will migrate,at each migration,under the T3A strategy.perturbationFrequency,stepFrequencyScale factors affecting how rapidly the perturbation probability and step sizesﬂuctuate under the Pareto strategy.DetailsAll To One(the all2one function)is the original SOMA strategy.At each“migration”,the cost function is evaluated for all individuals in the population,and the one with the lowest value is designated the“leader”.All other individuals migrate towards the leader’s position in some or all dimensions of the parameter space,with aﬁxed probability of perturbation in each dimension.Each migration is evaluated against the cost function at several points on the line towards the leader,and the location with the lowest value becomes the individual’s starting position for the next migration.The Team To Team Adaptive(T3A)strategy(Diep,2019)differs in that only a random subset of individuals are selected into a migrant pool and a leader pool for any given migration.A subset of most optimal migrants are then migrated towards the single most optimal individual from the leader pool.The perturbation probability and step length along the trajectory towards the leader also vary according to formulae given by the strategy author as the algorithm progresses through the migrations.In the Pareto strategy(Diep et al.,2019),all individuals are sorted by cost function value at the start of each migration.The leader is selected randomly from the top4%(20%of20%)of most optimal individuals,and a single migrant is chosen at random from between the20th and the36th percentiles of the population(the top20%of the bottom80%).The perturbation probability and the step length again vary across migrations,but this time in a sinusoidal fashion,and the migrant is updated in all dimensions,but some more slowly than others.4somaValueA list of class"soma.options".Author(s)Jon Clayden<****************>ReferencesI.Zelinka(2004).SOMA-self-organizing migrating algorithm.In G.C.Onwubolu&B.V.Babu,eds,New optimization techniques in engineering.V olume141of“Studies in Fuzziness and Soft Computing”,pp.167-217.Springer.Q.B.Diep(2019).Self-Organizing Migrating Algorithm Team To Team Adaptive–SOMA T3A.In proceedings of the2019IEEE Congress on Evolutionary Computation(CEC),pp.1182-1187.IEEE.Q.B.Diep,I.Zelinka&S.Das(2019).Pareto-Based Self-Organizing Migrating Algorithm.Mendel 25(1):111-120.soma The Self-Organising Migrating AlgorithmDescriptionThe Self-Organising Migrating Algorithm(SOMA)is a general-purpose,stochastic optimisation algorithm.The approach is similar to that of genetic algorithms,although it is based on the idea of a series of“migrations”by aﬁxed set of individuals,rather than the development of successive generations.It can be applied to any cost-minimisation problem with a bounded parameter space, and is robust to local minima.Usagesoma(costFunction,bounds,options=list(),init=NULL,...)bounds(min,max)##S3method for class somaplot(x,y=NULL,add=FALSE,...)ArgumentscostFunction A cost function which takes a numeric vector of parameters as itsﬁrst argument, and returns a numeric scalar representing the associated cost value.bounds A list with elements min and max,each a numeric vector giving the upper and lower bounds for each parameter,respectively.options A list of options for the SOMA algorithm itself,usually generated by functions like all2one.soma5 init An optional matrix giving the starting population’s positions in parameter space, one per column.If omitted,initialisation is random(as is usual for SOMA),butspecifying a starting state can be helpful when running the algorithm in stagesor investigating the consistency of solutions....Additional parameters to costFunction(for soma)or the default plotting method (for plot.soma).min,max Vectors of minimum and maximum bound values for each parameter to the costFunction.x An object of class"soma".y Ignored.add If TRUE,add to an existing plot canvas.ValueA list of class"soma",containing the following elements.leader The index of the“leader”,the individual in the population with the lowest cost.population A matrix whose columns give the parameter values for each individual in the popula-tion at convergence.cost A vector giving the cost function values for each individual at convergence.history A vector giving the cost of the leader for each migration during the optimisation.This should be nonincreasing.migrations The number of migrations completed.evaluations The number of times the costFunction was evaluated.A plot method is available for this class,which shows the history of leader cost values during theoptimisation.Author(s)R implementation by Jon Clayden<****************>.ReferencesI.Zelinka(2004).SOMA-self-organizing migrating algorithm.In G.C.Onwubolu&B.V.Babu,eds,New optimization techniques in engineering.V olume141of“Studies in Fuzziness and Soft Computing”,pp.167-217.Springer.See Alsosoma.options for setting options.optim implements other general-purpose optimisation methods.6somaExamples#Rastrigin s function,which contains many local minimarastrigin<-function(x)10*length(x)+sum(x^2-10*cos(2*pi*x))#Find the global minimum over the range-5to5in each parameterx<-soma(rastrigin,bounds(c(-5,-5),c(5,5)))#Find the location of the leader-should be near the true minimum of c(0,0)print(x$population[,x$leader])#Plot the cost history of the leadersplot(x)Indexall2one,2,4bounds(soma),4optim,5pareto(all2one),2plot.soma(soma),4soma,4soma.options,5soma.options(all2one),2t3a(all2one),27。

A Comprehensive Survey of Multiagent Reinforcement Learning

156
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 38, NO. 2, MARCH 2008
A Comprehensive Survey of Multiagent ReinfoN
A
MULTIAGENT system [1] can be deﬁned as a group of autonomous, interacting entities sharing a common environment, which they perceive with sensors and upon which they act with actuators [2]. Multiagent systems are ﬁnding applications in a wide variety of domains including robotic teams, distributed control, resource management, collaborative decision support systems, data mining, etc. [3], [4]. They may arise as the most natural way of looking at the system, or may provide an alternative perspective on systems that are originally regarded as centralized. For instance, in robotic teams, the control authority is naturally distributed among the robots [4]. In resource management, while resources can be managed by a central authority, identifying each resource with an agent may provide a helpful, distributed perspective on the system [5].

最优化方法有关牛顿法的矩阵的秩为一的题目

英文回答：The Newton-Raphson method is an iterative optimization algorithm utilized for locating the local minimum or maximumof a given function. Within the realm of optimization, the Newton-Raphson method iteratively updates the current solution by leveraging the second derivative information of the objective function. This approach enables the method to converge towards the optimal solution at an accelerated pacepared to first-order optimization algorithms, such as the gradient descent method. Nonetheless, the Newton-Raphson method necessitates the solution of a system of linear equations involving the Hessian matrix, which denotes the second derivative of the objective function. Of particular note, when the Hessian matrix possesses a rank of one, it introduces a special case for the Newton-Raphson method.牛顿—拉弗森方法是一种迭代优化算法，用于定位特定函数的局部最小或最大值。

离散时间平均场二次最优控制问题

离散时间平均场二次最优控制问题冀鹏飞【摘要】讨论了带有约束终端的离散时间系统的平均场随机线性二次型最优控制问题.利用拉格朗日乘子定理,在线性二次最优控制问题成立的条件下,给出了状态反馈解的一个必要条件.从某种意义上说,本文可以看作是平均场离散时间随机线性二次最优控制问题的推广.【期刊名称】《德州学院学报》【年(卷),期】2018(034)002【总页数】7页(P8-14)【关键词】随机二次最优控制;离散时间系统;平均场理论;拉格朗日乘子定理【作者】冀鹏飞【作者单位】山东科技大学数学与系统科学学院,山东青岛 266000【正文语种】中文【中图分类】O2321 引言1958年,贝尔曼开始研究二次型最优控制.1960年卡曼建立了基于状态反馈的线性二次型最优控制理论,并在最优控制理论中引入了黎卡提微分方程.这样就可以用统一的解析式来表示线性二次型最优控制的解,且得到一个简单的线性状态反馈控制律,从而构成闭环最优控制.同时线性二次型最优控制问题还可以兼顾系统的性能指标等多方面的因素,如它可以把得到的最优反馈控制与非线性系统开环最优控制结合起来,可以减少开环系统的误差,得到更精确的结果.从20世纪50年代末开始,控制理论进入了一个新的发展时期,它所研究的对象扩展为多输入多输出的,非线性的,时变的离散时间系统,它涉及到了线性控制,自适应控制,最优控制,鲁棒控制,非线性控制,控制系统CAD等理论和方法.今天,随着被控模型的复杂性,不确定性和规模的增大,传统的基于精确的数学模型的控制理论的局限性日益明显. 众所周知,系统很容易受到各种限制因素的影响,例如温度、压力等.因此受约束的随机线性二次最优控制问题的研究是一个非常重要的课题.文献[1]针对模型自由的随机线性离散时间系统,通过Q学习算法,求解无限时间随机线性二次最优控制问题.文献[2]研究了离散时间随机二次最优控制问题.文献[3]考虑了具有确定性系数的平均场随机微分方程的线性二次最优控制问题.在文献[4]中,研究了在无限时间范围内存在的平均场二次最优控制问题.文献[5]提出了有限时域随机最优控制模型的数值方法,推导出了随机最小值原理,并在此基础上提出了一种基于最小值原理直接求解的数值方法.文献[6]研究一类基于社交影响力和平均场理论的信息传播动力学模型,在针对影响力度量中主要研究静态拓扑结构,利用平均场理论来忽略个体行为特征,提出了一种基于动态节点行为和用户影响力的信息传播动力学模型.本文利用凸分析的拉格朗日乘子定理研究带终端的随机线性二次最优控制问题,并且将平均场理论应用到最优控制问题中,可以最大限度的减小噪声对系统的影响,并能方便的处理噪声方面的问题.同时验证了平均区域随机二次最优控制问题存在线性反馈最优解的必要条件,其结果可以看作是平均场离散时间随机二次最优控制问题的推广.为了方便,给出以下定义：M'是矩阵M的转置；Tr(M)是矩阵M的迹；当M>0(M≥0)时,M为正定矩阵；Ex代表随机变量x的数学期望,Rm×n为n×m矩阵；N={0,1,2,…,T}；并且令2 问题陈述考虑以下形式的平均区域离散时间系统(1)bi1x1T+bi2x2T+…+binxnT=ξi, i=1, 2,…, r(2)其中是给定的矩阵值函数；xt和ut分别是状态过程和控制过程；E[ωt]=0和E[ωtωt]=δst是一个二阶过程,δst是Kronecker函数；ωt, t∈N是定义在概率空间(Ω, F, P)上的一维的标准Brown运动,Ft=σ(ωs:s∈N+)为Brown运动生成的信息流.u(.)属于允许控制集(3)ξi为给定的FT可测的平方可积随机变量,即E|ξi|<+,bij为已知实数,i=1,2,…,r;j=1,2,…,n. 令Nr×n=(bij)r×n,ξ=(ξ1,ξ2,…,ξr)′,则约束(2)可写为NT=ξ,在这里假设N为行满秩.表述本文主要定理之前,首先给出本文要用到的拉格朗日乘子定理和一些重要的引理.定义1[7] 设X为向量空间,Y为赋范线性空间,T为X到Y的变换,对x,h∈X,如果极限(4)存在,称此极限为T在x处方向h的方向导数或Gateaux导数.若对任意的h∈X,上述极限都存在,则称T在x处为 Gateaux 可导.定义2[7] 设X,Y为赋范线性空间,T为定义于X到Y的变换.对于给定的x∈D,h∈X,T在x处为Gateaux 可导,Gateaux导数δTx; h∈Y关于h为有界线性变换,且满足则称T在x处为 Frechet 可导,δTx, h为T在x处h的 Frechet 导数.定义3[7] 设Tx为定义于Banach空间X到Banach空间Y的变换,且有连续的Frechet导数.若对x0∈D,δTx; h为从X到Y的满射,则称x0为变换T的正则点. 引理1 [7] 设fx是定义于 Banach 空间X上具有连续的Frechet导数的实值函数,Hx为X到Banach空间Z的映射,x0为变换Hx的正则点.若fx在约束Hx=0下在x0处达到极值,则存在Z上有界线性泛函使Lagrang泛函在x0处有驻点,即†Hx0; h=0,对所有h∈X都成立.在本节的最后再给出一个关于广义逆矩阵的引理.引理2[8] 给定M∈Rm×n,则存在唯一的M†∈Rn×m,满足矩阵M†称为M的 Moor-Penrose 广义逆.3 主要结论对于离散时间控制系统(1),给出关于可容许控制集Uad的目标函数(5)其中是对称矩阵.定义4 如果存在u0∈Uad 满足Jx0, u0=infJx0, u，>-, u∈Uad(6)则称u0为最优控制,系统(1)为适定的.为最优轨迹,Jx0,u0为最优目标函数.如果线性反馈控制对问题(1)和(6)是最优的,那么它在下列形式的反馈中也是最优的(7)其中Lt, t∈NT-1是矩阵值函数,为最优状态反馈控制.把(7)代入(1),则二次最优控制问题变为以下形式(8)称Lt, t∈N为新的控制集.令通过(8)式可以得到(9)X0=Ex0x0′(10)把(9)和(10)代入(5),经过简单的变形得到目标泛函如下其中约束终端(2)变为(11)最优控制问题归结为以下形式目标泛函Jx0, u可视为定义在空间Cm×n[0,T]×Cm×n[0,T]上,其中Cm×n[0,T]为所有元素是[0,T]上连续函数的n阶方阵构成的空间；(9)式和(10)式定义了从Cm×n×Cm×n到Cn×n的变换(12)而(11)式定义了从Cn×n[0,T] 到Rr×r的变换G(XT)=NXTN′从而约束(9)式,(10)式,(11)式可表示成为(13)下面来证明和有连续的Frechet 导数.定理都有连续的 Frechet 导数,且导数为δHX( ΔXt+1)=-ΔXt+1(14)(15)的 Fretchet 导数为其中是矩阵值连续函数.证明在这里只证明(14)式,其他证明过程跟(14)式相似. 令Xαt=Xt+αΔXt,通过定义1,能够得出(16)其中(17)令α→0,可以得出(14).定理2 如果存在(18)是最优控制,那么存在对称矩阵和λ∈Rr×r满足(19)(20)证明设是(5)式的最优解,通过定理2,可以得到对称矩阵和满足以下等式δJXΔXt+δHXΔXt+1+δHXΔXt+δGΔXT=0(21)δJLΔLt+δHLΔHt=0(22)由于那么(21)式和(22)式变为NΔXTN'-TrPTΔXT=0由于ΔXt和ΔXT相互独立,则(19)式证出.通过类似的方法,(20)式也可以被证出.结论1 如果(8)式,(11)式,(18)-(20)式存在解是最优控制,则最优目标函数满足其中把(16)式代入(5)式,经简单变形,就可得到上述结论.推论1 对于平均场二次最优控制问题,如果满足则满足≥0,t∈T.此证明过程与参考文献[9]的证明过程相似,不再加以赘述.4 数值例子考虑一个周期为3的数值例子满足其系数值为借助于Riccati方程(12)和(18),可以得到Riccati解为应用结论1,可以得到最优控制其中5 总结主要研究了平均场线性二次最优控制问题.借助于拉格朗日乘子定理,给出了该问题存在最优解的必要条件,并计算出了状态反馈最优解.将平均场理论应用到最优控制问题中,可以最大限度的减小噪声对系统的影响并能方便的处理噪声问题.最后通过一个数值例子验证了结论的正确性.参考文献:[1] 么彩莲,王涛.模型自由的离散时间系统的随机线性二次最优控制问题[J].辽宁石油化工大学学报,2016,36(6):64-68.[2] X.K.Liu.Y.Li,W.H.Zhang.stochastic linear quadratic optimal control with constraint for discrete-time systems[J].Applied Mathematics and Computation,2014,228(9): 264-270.[3] J.M.Yong.A linear-quadratic optimal control problem for mean-field stochastic differential equations[J].SIAM J.Control andOptim,2013,51(4):2809-2838.[4] Y.N.Ni,R.Elliott,X.Li.Discrete-time mean-field stochastic linear-quadratic optimal control problems,: Infinite horizoncase[J].Automatica,2013,57(11):65-77.[5] P.Parpas,M.Webester.A stochastic minimum principle and an adaptive pathwise algorithm for stochastic optimalcontrol[J].Automatica,2013,49(6):1663-1671.[6] 肖云鹏,李松阳,刘宴兵.一种基于社交影响力和平均场理论的信息传播动力学模型[J].物理学报,2017,66(3):1-13.[7] D.G.Luenberger,Optimization by vectors Space Methods[M].Wiley,New York,1968.[8] M.A.Rami.J.B.Moore.X.Y.Zhou.Indefinite stochastic linear quadratic control and generalized differential Riccati equation[J].SIAM J.Control &Optimization,2001,40:1296-1311.[9] R.J.Elliott,X.Li,Y.H.Ni.Discrete-time mean-field stochastic linear-quadratic optimal control problems[J].Automatica,2013,49:3222-3223.。

遗传算法简介英文版

Using GAs in MatLab
/mirage/GAToolBox/gaot/
MatLab Code
% Bounds on the variables bounds = [-5 5; -5 5]; % Evaluation Function evalFn = 'Four_Eval'; evalOps = []; % Generate an intialize population of size 80 startPop=initializega(80,bounds,evalFn,[1e-10 1]); % GA Options [epsilon float/binary display] gaOpts=[1e-10 1 0]; % Termination Operators -- 500 Generations termFns = 'maxGenTerm'; termOps = [500]; % Selection Function selectFn = 'normGeomSelect'; selectOps = [0.08]; % Crossover Operators xFns = 'arithXover heuristicXover simpleXover'; xOpts = [1 0; 1 3; 1 0];
of steepest gradient. Simple to implement, guaranteed convergence. Must know something about the derivative. Can easily get stuck in a local minimum.
The genetic algorithm was

Probability and Stochastic Processes

Probability and Stochastic Processes Probability and stochastic processes are fundamental concepts in the field of mathematics and have wide-ranging applications in various fields such as engineering, finance, and science. Understanding these concepts is crucial for making informed decisions in uncertain and random environments. In this response, we will delve into the significance of probability and stochastic processes, their real-world applications, and the challenges associated with studying and applying these concepts. Probability is the branch of mathematics that deals with the likelihood of a particular event or outcome occurring. It provides a framework for quantifying uncertainty and making predictions based on available information. Stochastic processes, on the other hand, are mathematical models that describe the evolution of random variables over time. These processes are used to analyze and predict the behavior of complex systems that exhibit random behavior. One of the key reasons why probability and stochastic processes are important is their role in decision-making under uncertainty. In many real-world scenarios, decisions need to be made in the presence of incomplete information and unpredictable outcomes. Probability theory provides a systematic way to evaluate the likelihood of different outcomes and make rational decisions based on this assessment. Stochastic processes, on the other hand, are used to model and analyze random phenomena such as stock prices, weather patterns, and the spread of diseases. In the field of engineering, probability and stochastic processes are used to design and analyze systems that operate in uncertain environments. For example, in the design of communication systems, engineers use probability theory to analyze the performance of error-correcting codes and stochastic processes to model the behavior of wireless channels. Similarly, in the field of finance, these concepts are used to model the behavior of financial markets, price derivatives, and manage risk. Despite their wide-ranging applications, studying probability and stochastic processes can be challenging due to their abstract nature and the need for a strong mathematical foundation. Many students find it difficult to grasp the concepts of probability, random variables, and stochastic processes, as they often require a shift in thinking from deterministic to probabilistic reasoning. Moreover, the mathematical tools and techniques used to analyze these concepts,such as measure theory and stochastic calculus, can be quite advanced and require a significant amount of time and effort to master. In addition to the academic challenges, there are also practical difficulties in applying probability and stochastic processes to real-world problems. For example, in financial modeling, accurately predicting stock prices or interest rates using stochastic processes is a complex task that requires sophisticated mathematical models and large amounts of historical data. Furthermore, the assumptions made in these models, such as the independence of random variables or the stationarity of processes, may not always hold in practice, leading to inaccuracies in predictions. In conclusion, probability and stochastic processes are essential tools for understanding and navigating the uncertainties of the world. From decision-making under uncertainty to modeling complex systems, these concepts play a crucial role in a wide range of fields. However, mastering these concepts and applying them to real-world problems can be challenging due to their abstract nature and the complexity of the mathematical techniques involved. Nonetheless, the rewards of understanding and applying probability and stochastic processes are immense, as they provide a powerful framework for making informed decisions and predicting the behavior of random phenomena.。

关于HMM模型算法的一种改进

关于HMM模型算法的一种改进彭丽莉;周传斌;田永涛【摘要】对传统的HMM模型进行了改进,结合了HMM和MEMM的优点.此改进是在"HMM:状态-观察"和"MEMM:特征-状态"之间进行的,实现"特征-状态-观察"这种模式,简化了繁冗的过程,比一般的HMM具有更好的性能,并举出实例加以论证.【期刊名称】《绵阳师范学院学报》【年(卷),期】2010(029)008【总页数】4页(P110-112,129)【关键词】人工智能;HMM模型;最大熵【作者】彭丽莉;周传斌;田永涛【作者单位】重庆师范大学数学学院,重庆,400047;重庆师范大学数学学院,重庆,400047;重庆师范大学数学学院,重庆,400047【正文语种】中文【中图分类】TP181隐马尔可夫模型(HiddenMarkovModel,HMM)作为一种统计分析模型,是对马尔可夫模型的一种扩充。

它的基本理论形成于上世纪60年代末期和70年代初期。

70年代,CMU的J.K.Baker以及I BM的F. Jelinek等把HMM模型应用于语音识别。

隐马尔可夫模型是一个双重随机过程——具有一定状态数的隐马尔可夫链和显示随机函数集。

它由一些代表隐状态的节点组成,这些节点之间由反映不同状态之间互相转移概率相联系。

每一个隐状态同时都能根据不同的概率发出一些可见的状态。

HMM非常适合描述序列模型,特别是上下文相关的场合,比如:语音中的音素。

最大熵的马尔可夫模型,对文本中的问题和答案部分进行切分。

实际上,它是一种指数模型,输入了文本中的抽象特征,在马尔可夫状态转移的基础上选择下一状态,这一点上更接近于有限状态机(FS M)。

它可以改善抽取的性能,但没有统计具体的文本词汇,只考虑抽取特征,因此在某些情况下性能不如HMM。

因此,本文结合隐马尔可夫模型(HMM)和最大熵的马尔可夫模型(MEMM),对传统的HMM模型进行了改进,来实现文本信息抽取。

Guest Editorial Procedural Content Generation in Games

Guest Editorial:Procedural Content Generation in GamesI.P ROCEDURAL C ONTENT G ENERATIONC REATING games is in essence a nontechnical task and,accordingly,creating computer games involves much more than just programming.First,there needs to be a game design.And then,depending on the game genre,there needs to be character models,maps,levels,boards,racing tracks,trees, rocks,weapons,textures,sound effects,quests,clouds,and so on.In many contemporary game productions,creating all this game content requires a signiﬁcantly larger effort and expense than the actual programming of the game.Therefore,having access to tools that automate some of this content production would be of great and direct beneﬁt to game developers.In addition,such techniques would also indirectly beneﬁt game scholars,as advances in generative techniques are likely to improve our understanding of the content that is designed and of design methods in general.Fortunately,there are many methods(originating,e.g.,from AI,computational intelligence,computer graphics,modeling, and discrete mathematics)that are eminently applicable to gen-erating a large variety of game content.Conversely,efﬁciently generating proper game content poses a number of interesting challenges to such methods,which can spur the development of new algorithms.Therefore,theﬁeld of procedural content generation(PCG)covers a wide set of problems and methods that are of high interest to both game developers and academic researchers.The call for this special issue mentioned many different as-pects of PCG for games,which was clearly reﬂected in a total of 22manuscripts being submitted to the IEEE T RANSACTIONS ON C OMPUTATIONAL I NTELLIGENCE AND AI IN G AMES(TCIAIG) targeted at this special issue.Between November2010and June 2011,all these manuscripts underwent a thorough peer-review process,leading to considerable improvement and sharpening of their contributions.Eventually,the present eight papers stood out as specially representative of the current state of theﬁeld and were selected for inclusion in the special issue.II.T HE P APERSThe selected papers present a good combination of surveys, conceptual frameworks,innovative methods,and applications. Together,they cover a considerable range of methods(including constraint solving,stochastic search,planning,and grammars) and of content types(including game rules,buildings,and quests).However,there is a slight bias towards evolutionary computation as a method and game levels as a content domain. Starting off with“Search-based procedural content genera-tion:a taxonomy and survey,”Togelius et al.present a tax-Digital Object Identiﬁer10.1109/TCIAIG.2011.2166554onomy of PCG in general,and survey work in PCG that em-ploys a“search-based”approach,i.e.,that uses evolutionary or other stochastic search/optimization algorithms.A number of published studies are discussed and put into context using the proposed taxonomy.The paper ends with a discussion of open research topics within search-based PCG,and suggestions for how and when to take this approach to a PCG problem. Next,in“Answer set programming for procedural content generation:A design space approach,”Smith and Mateas pro-pose the use of answer set programming(ASP)as a means of formalizing content generation problems,and solving them using the SAT-solving algorithms that underlie ASP solvers. The authors also consider PCG problems as essentially about ﬁnding good solutions in a search space,but the representa-tions and search techniques proposed are radically different than those by Togelius et al.A couple of examples of this approach are reviewed,and a worked example is given of how to solve a problem using ASP that had previously been solved using evo-lutionary computation.In“Tanagra:Reactive planning and constraint solving for mixed-initiative level design,”Smith et al.describe a system for creating platform game levels.This is a mixed-initiative system, aimed at augmenting the capabilities of human level designers rather than replace them.In Tanagra,the user edits part of a game level,and the system responds by reshaping the adjacent parts of the level so as toﬁt in with the newly edited segment. The underlying methods used are reactive planning and con-straint solving,which might be seen as a form of search.One PCG technique that is deﬁnitely not a form of search is grammar rewriting,as used by Dormans and Bakkes in“Gener-ating missions and spaces for adaptable play experiences.”The authors propose a theory of game design as model transforma-tion,and use grammars to generate both levels and missions for Zelda-style action adventures.Additionally,some strategies are presented for how to make grammar-based level generation adapt to the needs and preferences of individual players using player models.Returning to the search-based approach,in“A generic approach to challenge modeling for the procedural creation of video game levels,”Sorenson et al.present a method for evolving levels for platform games.Key components are the use of a form of player models,as well as the FI-2Pop con-straint satisfaction evolutionary algorithm to enforce that the generated levels be playable.In“Automatic track generation for high-end racing games using evolutionary computation,”Lanzi et al.describe a working system based on genetic algorithms for generating tracks for an advanced3-D car racing game.They use two ﬁtness functions:one rewards diversity in terms of track shape, while the other is simulation based(involving an AI driving the1943-068X/$26.00©2011IEEEcar on the track that is being evaluated)and rewards diversity in driving speed.In“Search-based procedural generation of maze-like levels,”Ashlock et al.describe a technique for evolving maze levels, similar to those common in game genres such as roguelikes and action adventures.Key techniques of the proposed method com-bined direct and indirect representations andﬁtness functions based on dynamic programming.In“Generating consistent buildings:A semantic approach for integrating procedural techniques,”Tutenel et al.propose a generic framework aimed at integrating different PCG tech-niques,so that all disparate content generated by each of them seamlessly combines to form consistent environments,as,e.g., complex buildings(including façade,ﬂoor plan layout,furni-ture,etc.).The integration process is coordinated by a moder-ator that uses a library of semantic classes and constraints.III.T HE C OMMUNITYProcedural content generation has been sporadically used in games since the early1980s.However,that has largely been relegated to niche roles and/or niche games,and most methods used in early examples were,by today’s standards,rather sim-plistic.In addition,back then interaction was very scarce be-tween academic researchers and those game developers using PCG in some form.The last10–15years have seen a considerable increase in PCG research,although not always with an explicit application in games.This is apparent in the areas of computer graphics and modeling,witnessed,for example,by the impressive results proposed yearly at the Eurographics and ACM Siggraph con-ferences,addressing procedural generation of buildings,cities, road networks,vegetation,and other natural phenomena. Another,more game-focused research thread emerged from the area of AI,when game AIﬁnally became a respectable aca-demic research topic(which,depending on your perspective, might have been as late as the early2000s).Most of that ini-tial work,though,focused on generating well-performing(oc-casionally good-looking or believable)strategies for computer players or behavior for nonplayer characters.More recently, however,a number of papers started appearing in conferences such as the IEEE Conference on Computational Intelligence and Games,Artiﬁcial Intelligence and Interactive Digital Enter-tainment,and Foundations of Digital Games,proposing AI and CI solutions to content generation problems.All together,there is clearly an increasing interest and quality in the PCG work being published yearly at these conferences,which led those re-searchers to become aware of each other’s work,thus promoting the growing academic community focused on PCG for games. For example,the editors of this special issue were involved in the organization of theﬁrst two workshops on PCG,which were colocated with the Foundations of Digital Games conferences in 2010and2011( and http://pcgames. ).We intend to keep organizing workshops in this series,as a natural outlet for new concepts and ideas in PCG re-search.Additionally,the PCG Task Force was formed within the IEEE Computational Intelligence Society(http://game.itu.dk/ pcg/)for researchers interested in thisﬁeld,with an associated mailing list/discussion group which is free to join(http://groups. /group/proceduralcontent).In a separate effort,An-drew Doull maintains a PCG Wiki,including a catalog of PCG techniques and their usage in games(/). Another outcome of the efforts to organize the PCG research community is this special issue.We were fortunate to assemble an interesting selection of high-quality papers in this issue.We hope that you willﬁnd them to be as interesting and useful as we found the effort to put this issue together.J ULIAN T OGELIUS,Guest EditorCenter for Computer Games ResearchIT University of CopenhagenCopenhagen,2300DenmarkJ IM W HITEHEAD,Guest EditorCenter for Games and Playable MediaUniversity of California,Santa CruzSanta Cruz,CA95064-1077USAR AFAEL B IDARRA,Guest EditorComputer Graphics GroupDelft University of TechnologyDelft,2628TheNetherlandsJulian Togelius received the B.A.degree in philosophy from Lund University,Lund,Sweden,in 2002,the M.Sc.degree in evolutionary and adaptive systems from University of Sussex,Brighton, U.K.,in2003,and the Ph.D.in computer science from University of Essex,Essex,U.K.,in2007. He is currently an Assistant Professor at the IT University of Copenhagen(ITU),Copenhagen, Denmark.Before joining the ITU in2009,he was a Postdoctoral Researcher at IDSIA in Lugano. His research interests include applications of computational intelligence in games,procedural con-tent generation,automatic game design,evolutionary computation,and reinforcement learning;he has around50papers in journals and conferences about these topics.Dr.Togelius is an Associate Editor of the IEEE T RANSACTIONS ON C OMPUTATIONAL I NTELLIGENCE AND AI IN G AMES and is the current Chair of the IEEE CIS Technical Committee on Games.Jim Whitehead(S’94–M’06–SM’08)received the Ph.D.degree in information and computer sci-ence from the University of California Irvine,Irvine,in2000.He is an Associate Professor in the Computer Science Department,University of California Santa Cruz,Santa Cruz.He was an active participant in the creation of the Computer Science: Computer Game Design major at the University of California Santa Cruz in2006.His research interests include software evolution,software bug prediction,procedural content generation,and augmented design.Prof.Whitehead is a member of the Association for Computing Machinery(ACM)and the In-ternational Game Developers Association(IGDA).He is the founder and chair of the Society for the Advancement of the Science of Digital Games(SASDG).Rafael Bidarra graduated in electronics engineering from the University of Coimbra,Coimbra,Portugal,in1987and received the Ph.D.degree in computer science from Delft University of Tech-nology,Delft,The Netherlands,in1999.He is currently an Associate Professor of Game Technology at the Faculty of Electrical Engi-neering,Mathematics and Computer Science,Delft University of Technology.He leads the re-search line on game technology at the Computer Graphics Group.His current research interests include:procedural and semantic modeling techniques for the speciﬁcation and generation of both virtual worlds and gameplay;serious gaming;semantics of navigation;game adaptivity and inter-pretation mechanisms for in-game data.He has published many papers in international journals, books,and conference proceedings.He integrates the editorial board of several journals,and has served in many conference program committees.。

R软件包‘tpn’的中文名称：截断正态模型和扩展软件包说明书

Package‘tpn’December11,2023Type PackageTitle Truncated Positive Normal Model and ExtensionsVersion1.8Date2023-12-11Author Diego Gallardo[aut,cre],Hector J.Gomez[aut],Yolanda M.Gomez[aut]Maintainer Diego Gallardo<********************>Description Provide data generation and estimation tools for the truncated positive normal(tpn) model discussed in Gomez,Olmos,Varela and Bolfarine(2018)<doi:10.1007/s11766-018-3354-x>,the slash tpn distribution discussed in Gomez,Gallardo and Santoro(2021)<doi:10.3390/sym13112164>,the bimodal tpn distributiondiscussed in Gomez et al.(2022)<doi:10.3390/sym14040665>and theﬂexible tpn model. Depends R(>=4.0.0)Imports pracma,skewMLRM,moments,VGAM,RBE3License GPL(>=2)NeedsCompilation noRepository CRANDate/Publication2023-12-1104:30:05UTCR topics documented:btpn (2)choose.fts (3)est.btpn (4)est.fts (6)est.stpn (7)est.tpn (8)est.utpn (9)fts (11)stpn (12)tpn (13)utpn (15)12btpnIndex17 btpn Bimodal truncated positive normalDescriptionDensity,distribution function and random generation for the bimodal truncated positive normal (btpn)discussed in Gomez et al.(2022).Usagedbtpn(x,sigma,lambda,eta,log=FALSE)pbtpn(x,sigma,lambda,eta,lower.tail=TRUE,log=FALSE)rbtpn(n,sigma,lambda,eta)Argumentsx vector of quantilesn number of observationssigma scale parameter for the distributionlambda shape parameter for the distributioneta shape parameter for the distributionlog logical;if TRUE,probabilities p are given as log(p).lower.tail logical;if TRUE(default),probabilities are P[X<=x]otherwise,P[X>x].DetailsRandom generation is based on the stochastic representation of the model,i.e.,the product betweena tpn(see Gomez et al.2018)and a dichotomous variable assuming values−(1+ )and1− withprobabilities(1+ )/2and(1− )/2,respectively.Valuedbtpn gives the density,pbtpn gives the distribution function and rbtpn generates random deviates.The length of the result is determined by n for rbtpn,and is the maximum of the lengths of the numerical arguments for the other functions.The numerical arguments other than n are recycled to the length of the result.Only theﬁrst elements of the logical arguments are used.A variable have btpn distribution with parametersσ>0,λ∈R andη∈R if its probability densityfunction can be written asf(y;σ,λ,q)=φxσ(1+ )+λ2σΦ(λ),y<0,choose.fts3 andf(y;σ,λ,q)=φxσ(1− )−λ2σΦ(λ),y≥0,where =η/1+η2andφ(·)andΦ(·)denote the probability density function and the cumulativedistribution function for the standard normal distribution,respectively.Author(s)Gallardo,D.I.,Gomez,H.J.and Gomez,Y.M.ReferencesGomez,H.J.,Caimanque,W.,Gomez,Y.M.,Magalhaes,T.M.,Concha,M.,Gallardo,D.I.(2022) Bimodal Truncation Positive Normal Distribution.Symmetry,14,665.Gomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesdbtpn(c(1,2),sigma=1,lambda=-1,eta=2)pbtpn(c(1,2),sigma=1,lambda=-1,eta=2)rbtpn(n=10,sigma=1,lambda=-1,eta=2)choose.fts Choose a distribution in theﬂexible truncated positive class of modelsDescriptionProvide model selection for a given data set in theﬂexible truncated positive class of modelsUsagechoose.fts(y,criteria="AIC")Argumentsy positive vector of responsescriteria model criteria for the selection:AIC(default)or BIC.DetailsThe functionﬁts the truncated positive normal,truncated positive laplace,truncated positive Cauchy and truncated positive logistic models and select the model which provides the lower criteria(AIC or BIC).ValueA list with the following componentsAIC a vector with the AIC for the different truncated positiveﬁtted models:normal, laplace,cauchy and logistic.selected the selected modelestimate the estimated for sigma and lambda and the respective standard errors(s.e.) conv the code related to the convergence for the optim function.0if the convergence was attached.logLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.Author(s)Gallardo,D.I.,Gomez,H.J.and Gomez,Y.M.ReferencesGomez,H.J.,Gomez,H.W.,Santoro,K.I.,Venegas,O.,Gallardo,D.I.(2022).A Family of Trunca-tion Positive Distributions.Submitted.Gomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesset.seed(2021)y=rfts(n=100,sigma=10,lambda=1,dist="logis")choose.fts(y)est.btpn Parameter estimation for the btpn modelDescriptionPerform the parameter estimation for the bimodal truncated positive normal(btpn)discussed in Gomez et al.(2022).Estimated errors are computed based on the hessian matrix.Usageest.btpn(y)Argumentsy the response vector.All the values must be positive.DetailsA variable have btpn distribution with parametersσ>0,λ∈R andη∈R if its probability densityfunction can be written asf(y;σ,λ,q)=φxσ(1+ )+λ2σΦ(λ),y<0,andf(y;σ,λ,q)=φxσ(1− )−λ2σΦ(λ),y≥0,where =η/1+η2andφ(·)andΦ(·)denote the probability density function and the cumulativedistribution function for the standard normal distribution,respectively.ValueA list with the following componentsestimate A matrix with the estimates and standard errorsiter Iterations in which the convergence were attached.logLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.NoteA warning is presented if the estimated hessian matrix is not invertible.Author(s)Gallardo,D.I.,Gomez,H.J.and Gomez,Y.M.ReferencesGomez,H.J.,Caimanque,W.,Gomez,Y.M.,Magalhaes,T.M.,Concha,M.,Gallardo,D.I.(2022) Bimodal Truncation Positive Normal Distribution.Symmetry,14,665.Examplesset.seed(2021)y=rbtpn(n=100,sigma=10,lambda=1,eta=1.5)est.btpn(y)6est.fts est.fts Parameter estimation for the ftp class of distributionsDescriptionPerform the parameter estimation for the Flexible truncated positive(fts)class discussed in Gomez et al.(2022)based on maximum likelihood estimation.Estimated errors are computed based on the hessian matrix.Usageest.fts(y,dist="norm")Argumentsy the response vector.All the values must be positive.dist standard symmetrical distribution.Avaliable options:norm(default),logis, cauchy and laplace.DetailsA variable has fts distribution with parametersσ>0andλ∈R if its probability density functioncan be written asf(y;σ,λ,q)=g0(yσ−λ)σG0(λ),y>0,where g0(·)and G0(·)denote the pdf and cdf for the speciﬁed distribution.The case where g0(·) and G0(·)are from the standard normal model is known as the truncated positive normal model discussed in Gomez et al.(2018).ValueA list with the following componentsestimate A matrix with the estimates and standard errorsdist distribution speciﬁedconv the code related to the convergence for the optim function.0if the convergence was attached.logLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.NoteA warning is presented if the estimated hessian matrix is not invertible.Author(s)Gallardo,D.I.and Gomez,H.J.ReferencesGomez,H.J.,Gomez,H.W.,Santoro,K.I.,Venegas,O.,Gallardo,D.I.(2022).A Family of Trunca-tion Positive Distributions.Submitted.Gomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesset.seed(2021)y=rfts(n=100,sigma=10,lambda=1,dist="logis")est.fts(y,dist="logis")est.stpn Parameter estimation for the stpn modelDescriptionPerform the parameter estimation for the slash truncated positive normal(stpn)discussed in Gomez, Gallardo and Santoro(2021)based on the EM algorithm.Estimated errors are computed based on the Louis method to approximate the hessian matrix.Usageest.stpn(y,sigma0=NULL,lambda0=NULL,q0=NULL,prec=0.001,max.iter=1000)Argumentsy the response vector.All the values must be positive.sigma0,lambda0,q0initial values for the EM algorithm for sigma,lambda and q.If they are omitted,by default sigma0is deﬁned as the root of the mean of the y^2,lambda as0andq as3.prec the precision deﬁned for each parameter.By default is0.001.max.iter the maximum iterations for the EM algorithm.By default is1000.DetailsA variable has stpn distribution with parametersσ>0,λ∈R and q>0if its probability densityfunction can be written as1t1/qσφ(yt1/qσ−λ)dt,y>0,f(y;σ,λ,q)=whereφ(·)denotes the density function for the standard normal distribution.ValueA list with the following componentsestimate A matrix with the estimates and standard errorsiter Iterations in which the convergence were attached.logLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.NoteA warning is presented if the estimated hessian matrix is not invertible.Author(s)Gallardo,D.I.and Gomez,H.J.ReferencesGomez,H.,Gallardo,D.I.,Santoro,K.(2021)Slash Truncation Positive Normal Distribution:with application using the EM algorithm.Symmetry,13,2164.Examplesset.seed(2021)y=rstpn(n=100,sigma=10,lambda=1,q=2)est.stpn(y)est.tpn Parameter estimation for the tpnDescriptionPerform the parameter estimation for the truncated positive normal(tpn)discussed in Gomez et al.(2018)based on maximum likelihood estimation.Estimated errors are computed based on the hessian matrix.Usageest.tpn(y)Argumentsy the response vector.All the values must be positive.DetailsA variable have tpn distribution with parametersσ>0andλ∈R if its probability density functioncan be written asf(y;σ,λ,q)=φyσ−λσΦ(λ),y>0,whereφ(·)andΦ(·)denote the density and cumultative distribution functions for the standard nor-mal distribution.ValueA list with the following componentsestimate A matrix with the estimates and standard errorslogLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.NoteA warning is presented if the estimated hessian matrix is not invertible.Author(s)Gallardo,D.I.and Gomez,H.J.ReferencesGomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesset.seed(2021)y=rtpn(n=100,sigma=10,lambda=1)est.tpn(y)est.utpn Parameter estimation for the utpn modelDescriptionPerform the parameter estimation for the unit truncated positive normal(utpn)type1,2,3or4, parameterized in terms of the quantile based on maximum likelihood estimation.Estimated errors are computed based on the hessian matrix.Usageest.utpn(y,x=NULL,type=1,link="logit",q=0.5)Argumentsy the response vector.All the values must be positive.x the covariates vector.type to distinguish the type of the utpn model:1(default),2,3or4.link link function to be used for the covariates:logit(default).q quantile of the distribution to be modelled.ValueA list with the following componentsestimate A matrix with the estimates and standard errorslogLik log-likelihood function evaluated in the estimated parameters.AIC Akaike’s criterion.BIC Schwartz’s criterion.NoteA warning is presented if the estimated hessian matrix is not invertible.Author(s)Gallardo,D.I.ReferencesGomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesset.seed(2021)y=rutpn(n=100,sigma=10,lambda=1)est.utpn(y)fts11 fts Flexible truncated positive normalDescriptionDensity,distribution function and random generation for theﬂexible truncated positive(ftp)class discussed in Gomez et al.(2022).Usagedfts(x,sigma,lambda,dist="norm",log=FALSE)pfts(x,sigma,lambda,dist="norm",lower.tail=TRUE,log.p=FALSE)qfts(p,sigma,lambda,dist="norm")rfts(n,sigma,lambda,dist="norm")Argumentsx vector of quantilesp vector of probabilitiesn number of observationssigma scale parameter for the distributionlambda shape parameter for the distributiondist standard symmetrical distribution.Avaliable options:norm(default),logis, cauchy and laplace.log,log.p logical;if TRUE,probabilities p are given as log(p).lower.tail logical;if TRUE(default),probabilities are P[X<=x]otherwise,P[X>x]. DetailsRandom generation is based on the inverse transformation method.Valuedfts gives the density,pfts gives the distribution function,qfts gives the quantile function and rfts generates random deviates.The length of the result is determined by n for rbtpn,and is the maximum of the lengths of the numerical arguments for the other functions.The numerical arguments other than n are recycled to the length of the result.Only theﬁrst elements of the logical arguments are used.A variable have fts distribution with parametersσ>0andλ∈R if its probability density functioncan be written asf(y;σ,λ,q)=g0(yσ−λ)σG0(λ),y>0,where g0(·)and G0(·)denote the pdf and cdf for the speciﬁed distribution.The case where g0(·) and G0(·)are from the standard normal model is known as the truncated positive normal model discussed in Gomez et al.(2018).12stpnAuthor(s)Gallardo,D.I.,Gomez,H.J.and Gomez,Y.M.ReferencesGomez,H.J.,Gomez,H.W.,Santoro,K.I.,Venegas,O.,Gallardo,D.I.(2022).A Family of Trunca-tion Positive Distributions.Submitted.Gomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesdfts(c(1,2),sigma=1,lambda=1,dist="logis")pfts(c(1,2),sigma=1,lambda=1,dist="logis")rfts(n=10,sigma=1,lambda=1,dist="logis")stpn Slash truncated positive normalDescriptionDensity,distribution function and random generation for the slash truncated positive normal(stpn) discussed in Gomez,Gallardo and Santoro(2021).Usagedstpn(x,sigma,lambda,q,log=FALSE)pstpn(x,sigma,lambda,q,lower.tail=TRUE,log=FALSE)rstpn(n,sigma,lambda,q)Argumentsx vector of quantilesn number of observationssigma scale parameter for the distributionlambda shape parameter for the distributionq shape parameter for the distributionlog logical;if TRUE,probabilities p are given as log(p).lower.tail logical;if TRUE(default),probabilities are P[X<=x]otherwise,P[X>x].DetailsRandom generation is based on the stochastic representation of the model,i.e.,the quotient betweena tpn(see Gomez et al.2018)and a beta random variable.tpn13 Valuedstpn gives the density,pstpn gives the distribution function and rstpn generates random deviates.The length of the result is determined by n for rstpn,and is the maximum of the lengths of the numerical arguments for the other functions.The numerical arguments other than n are recycled to the length of the result.Only theﬁrst elements of the logical arguments are used.A variable has stpn distribution with parametersσ>0,λ∈R and q>0if its probability densityfunction can be written as1f(y;σ,λ,q)=t1/qσφ(yt1/qσ−λ)dt,y>0,whereφ(·)denotes the density function for the standard normal distribution.Author(s)Gallardo,D.I.and Gomez,H.J.ReferencesGomez,H.,Gallardo,D.I.,Santoro,K.(2021)Slash Truncation Positive Normal Distribution:with application using the EM algorithm.Symmetry,13,2164.Gomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176.Examplesdstpn(c(1,2),sigma=1,lambda=-1,q=2)pstpn(c(1,2),sigma=1,lambda=-1,q=2)rstpn(n=10,sigma=1,lambda=-1,q=2)tpn Truncated positive normalDescriptionDensity,distribution function and random generation for the slash truncated positive normal(stpn) discussed in Gomez,Gallardo and Santoro(2021).Usagedtpn(x,sigma,lambda,log=FALSE)ptpn(x,sigma,lambda,lower.tail=TRUE,log=FALSE)rtpn(n,sigma,lambda)14tpn Argumentsx vector of quantilesn number of observationssigma scale parameter for the distributionlambda shape parameter for the distributionlog logical;if TRUE,probabilities p are given as log(p).lower.tail logical;if TRUE(default),probabilities are P[X<=x]otherwise,P[X>x].DetailsRandom generation is based on the inverse transformation method.Valuedtpn gives the density,ptpn gives the distribution function and rtpn generates random deviates.The length of the result is determined by n for rtpn,and is the maximum of the lengths of the numerical arguments for the other functions.The numerical arguments other than n are recycled to the length of the result.Only theﬁrst elements of the logical arguments are used.A variable have tpn distribution with parametersσ>0andλ∈R if its probability density functioncan be written asf(y;σ,λ,q)=φyσ−λσΦ(λ),y>0,whereφ(·)andΦ(·)denote the density and cumultative distribution functions for the standard nor-mal distribution.Author(s)Gallardo,D.I.and Gomez,H.J.ReferencesGomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesdtpn(c(1,2),sigma=1,lambda=-1)ptpn(c(1,2),sigma=1,lambda=-1)rtpn(n=10,sigma=1,lambda=-1)utpn15 utpn Truncated positive normalDescriptionDensity,distribution function and random generation for the unit truncated positive normal(utpn) type1or2discussed in Gomez,Gallardo and Santoro(2021).Usagedutpn(x,sigma=1,lambda=0,type=1,log=FALSE)putpn(x,sigma=1,lambda=0,type=1,lower.tail=TRUE,log=FALSE)qutpn(p,sigma=1,lambda=0,type=1)rutpn(n,sigma=1,lambda=0,type=1)Argumentsx vector of quantilesn number of observationsp vector of probabilitiessigma scale parameter for the distributionlambda shape parameter for the distributiontype to distinguish the type of the utpn model:1(default)or2.log logical;if TRUE,probabilities p are given as log(p).lower.tail logical;if TRUE(default),probabilities are P[X<=x]otherwise,P[X>x].DetailsRandom generation is based on the inverse transformation method.Valuedutpn gives the density,putpn gives the distribution function,qutpn provides the quantile function and rutpn generates random deviates.The length of the result is determined by n for rtpn,and is the maximum of the lengths of the numerical arguments for the other functions.The numerical arguments other than n are recycled to the length of the result.Only theﬁrst elements of the logical arguments are used.A variable has utpn distribution with scale parameterσ>0and shape parameterλ∈R if itsprobability density function can be written asf(y;σ,λ)=φ1−yσy−λσy2Φ(λ),y>0,(type1),16utpnf(y;σ,λ)=φyσ(1−y)−λσ(1−y)2Φ(λ),y>0,(type2),f(y;σ,λ)=φlog(y)σ+λσyΦ(λ),y>0,(type3),f(y;σ,λ)=φlog(1−y)σ+λσ(1−y)Φ(λ),y>0,(type4),whereφ(·)andΦ(·)denote the density and cumulative distribution functions for the standard nor-mal distribution.Author(s)Gallardo,D.I.ReferencesGomez,H.J.,Olmos,N.M.,Varela,H.,Bolfarine,H.(2018).Inference for a truncated positive normal distribution.Applied Mathemetical Journal of Chinese Universities,33,163-176. Examplesdutpn(c(0.1,0.2),sigma=1,lambda=-1)putpn(c(0.1,0.2),sigma=1,lambda=-1)rutpn(n=10,sigma=1,lambda=-1)Indexbtpn,2choose.fts,3dbtpn(btpn),2dfts(fts),11dstpn(stpn),12dtpn(tpn),13dutpn(utpn),15est.btpn,4est.fts,6est.stpn,7est.tpn,8est.utpn,9fts,11pbtpn(btpn),2pfts(fts),11pstpn(stpn),12ptpn(tpn),13putpn(utpn),15qfts(fts),11qutpn(utpn),15rbtpn(btpn),2rfts(fts),11rstpn(stpn),12rtpn(tpn),13rutpn(utpn),15stpn,12tpn,13utpn,1517。

布谷鸟搜索寻优算法Cuckoo search Optimization Algorithm

Applied Soft Computing 11(2011)5508–5518Contents lists available at ScienceDirectApplied SoftComputingj o u r n a l h o m e p a g e :w w w.e l s e v i e r.c o m /l o c a t e /a s ocCuckoo Optimization AlgorithmRamin Rajabioun ∗Control and Intelligent Processing Centre of Excellence (CIPCE),School of Electrical and Computer Engineering,University of Tehran,Tehran,Irana r t i c l ei n f oArticle history:Received 17September 2009Received in revised form 28August 2010Accepted 1May 2011Available online 13May 2011Keywords:Cuckoo Optimization Algorithm (COA)Evolutionary algorithms Nonlinear optimizationa b s t r a c tIn this paper a novel evolutionary algorithm,suitable for continuous nonlinear optimization problems,is introduced.This optimization algorithm is inspired by the life of a bird family,called Cuckoo.Special lifestyle of these birds and their characteristics in egg laying and breeding has been the basic motivation for development of this new evolutionary optimization algorithm.Similar to other evolutionary methods,Cuckoo Optimization Algorithm (COA)starts with an initial population.The cuckoo population,in differ-ent societies,is in two types:mature cuckoos and eggs.The effort to survive among cuckoos constitutes the basis of Cuckoo Optimization Algorithm.During the survival competition some of the cuckoos or their eggs,demise.The survived cuckoo societies immigrate to a better environment and start reproducing and laying eggs.Cuckoos’survival effort hopefully converges to a state that there is only one cuckoo society,all with the same proﬁt values.Application of the proposed algorithm to some benchmark functions and a real problem has proven its capability to deal with difﬁcult optimization problems.©2011Elsevier B.V.All rights reserved.1.IntroductionOptimization is the process of making something better.In other words,optimization is the process of adjusting the inputs to or char-acteristics of a device,mathematical process,or experiment to ﬁnd the minimum or maximum output or result.The input consists of variables:the process or function is known as the cost function,objective function,or ﬁtness function;and the output is the cost or ﬁtness [1].There are different methods for solving an optimiza-tion problem.Some of these methods are inspired from natural processes.These methods usually start with an initial set of vari-ables and then evolve to obtain the global minimum or maximum of the objective function.Genetic Algorithm (GA)has been the most popular technique in evolutionary computation research.Genetic Algorithm uses operators inspired by natural genetic variation and natural selection [2,3].Another example is Particle Swarm Opti-mization (PSO)which was developed by Eberhart and Kennedy in 1995.This stochastic optimization algorithm is inspired by social behavior of bird ﬂocking or ﬁsh schooling [3–5].Ant Colony Opti-mization (ACO)is another evolutionary optimization algorithm which is inspired by the pheromone trail laying behavior of real ant colonies [3,6,7].On the other hand Simulated Annealing sim-ulates the annealing process in which a substance is heated above its melting temperature and then gradually cools to produce the crystalline lattice,which minimizes its energy probability distribu-∗Correspondence address:Faculty of Engineering,Campus #2,University of Tehran,Kargar Shomali St.,P.O.Box 14395-515,Tehran,Iran.Tel.:+989144045713.E-mail addresses:r.rajabioun@ece.ut.ac.ir ,ramin4251@tion [1,8,9].Besides these well known methods,the investigations on nature inspired optimization algorithms are still being done and new methods are being developed to continually solve some sort of nonlinear problems.In [10],making use of the ergodicity and internal randomness of chaos iterations,a novel immune evolu-tionary algorithm based on the chaos optimization algorithm and immune evolutionary algorithm is presented to improve the con-vergence performance of the immune evolutionary algorithm.The novel algorithm integrates advantages of the immune evolution-ary algorithm and chaos optimization algorithm.[11]introduces a new optimization technique called Grenade Explosion Method (GEM)and its underlying ideas,including the concept of Optimal Search Direction (OSD),are elaborated.In [12]a new particle swarm optimization method based on the clonal selection algorithm is pro-posed to avoid premature convergence and guarantee the diversity of the population.The main advantages of evolutionary algorithms are [3]:(1)Being robust to dynamic changes :Traditional methods of opti-mization are not robust to dynamic changes in the environment and they require a complete restart for providing a solution.In contrary,evolutionary computation can be used to adapt solutions to changing circumstances.(2)Broad applicability :Evolutionary algorithms can be applied toany problems that can be formulated as function optimization problems.(3)Hybridization with other methods :Evolutionary algorithms canbe combined with more traditional optimization techniques.(4)Solves problems that have no solutions :The advantage of evolu-tionary algorithms includes the ability to address problems for1568-4946/$–see front matter ©2011Elsevier B.V.All rights reserved.doi:10.1016/j.asoc.2011.05.008R.Rajabioun/Applied Soft Computing11(2011)5508–55185509Fig.1.Flowchart of Cuckoo Optimization Algorithm.which there is no human expertise.Even though human exper-tise should be used when it is needed and available;it often proves less adequate for automated problem-solving routines.Considering these features,evolutionary algorithms can be applied to various applications including:Power Systems oper-ations and control[13,19,20],NP-Hard combinatorial problems [14,15],Chemical Processes[16],Job Scheduling problems[17], Vehicle Routing Problems,Mobile Networking,Batch process scheduling,Multi-objective optimization problems[18],Modeling optimized parameters[21],Image processing and Pattern recogni-tion problems.In this paper we introduce a new evolutionary optimization algorithm which is inspired by lifestyle of a bird family called cuckoo.Speciﬁc egg laying and breeding of cuckoos is the basis of this novel optimization algorithm.Cuckoos used in this model-ing exist in two forms:mature cuckoos and eggs.Mature cuckoos lay eggs in some other birds’nest and if these eggs are not recog-nized and not killed by host birds,they grow and become a mature cuckoo.Environmental features and the immigration of societies (groups)of cuckoos hopefully lead them to converge andﬁnd the best environment for breeding and reproduction.This best envi-ronment is the global maximum of objective functions.This paper illustrates how the life method of cuckoos is modeled and imple-mented.Section2investigates the birds called cuckoo and reviews their amazing life characteristics.In Section3,the Cuckoo Optimization Algorithm(COA)is proposed and its different parts are studied in details.The proposed algorithm is tested with some benchmark functions and also with a controller design of a Multi-Input Multi-Output(MIMO)process as a real case study in Section4.Finally the conclusions are presented in Section5.2.Cuckoos and their special lifestyle for reproductionAll9000species of birds have the same approach to mother-hood:every one lays eggs.No bird gives birth to live young.Birds quickly form and lay an egg covered in a protective shell that is then incubated outside the body.The large size of an egg makes it difﬁcult for the female to retain more than a single one egg at a time–carrying eggs would makeﬂying harder and require more energy.And because the egg is such a protein-rich high-nutrition prize to all sorts of predators,birds mustﬁnd a secure place to hatch their eggs.Finding a place to safely place and hatch their eggs,and raise their young to the point of independence,is a challenge birds have solved in many clever ways.They use artistry,intricate design and complex engineering.The diversity of nest architecture has no equal in the animal kingdom.Many birds build isolated,inconspic-uous nests,hidden away inside the vegetation to avoid detection by predators.Some of them are so successful at hiding their nests that even the all-seeing eyes of man has hardly ever looked on them.There are other birds that dispense with every convention of home making and parenthood,and resort to cunning to raise their families.These are the“brood parasites,”birds which never build their own nests and instead lay their eggs in the nest of another species,leaving those parents to care for its young.The cuckoo is the best known brood parasite,an expert in the art of cruel decep-tion.Its strategy involves stealth,surprise and speed.The mother removes one egg laid by the host mother,lays her own andﬂies off with the host egg in her bill.The whole process takes barely ten seconds.Cuckoos parasitize the nests of a large variety of bird species and carefully mimic the color and pattern of their own eggs to match that of their hosts.Each female cuckoo specializes on one particular host species.How the cuckoo manages to lay eggs to imi-tate each host’s eggs so accurately is one of nature’s main mysteries. Many bird species learn to recognize a cuckoo egg dumped in their own nest and either throw out the strange egg or desert the nest to start afresh.So the cuckoo constantly tries to improve its mimicry of its hosts’eggs,while the hosts try toﬁnd ways of detecting the parasitic egg.The struggle between host and parasite is akin to an arms race,each trying to out-survive the other[22].For the cuckoos suitable habitat provides a source of food(prin-cipally insects and especially caterpillars)and a place to breed,for brood parasites the need is for suitable habitat for the host species. Cuckoos occur in a wide variety of habitats.The majority of species occur in forests and woodland,principally in the evergreen rain-forests of the tropics.In addition to forests some species of cuckoo occupy more open environments;this can include even arid areas like deserts.Temperate migratory species like the Common Cuckoo inhabit a wide range of habitats in order to make maximum use of the potential brood hosts,from reed beds to treeless moors.Most species of cuckoo are sedentary,but several species of cuckoo undertake regular seasonal migrations,and several more undertake partial migrations over part of their range.The migration may be Diurnal,as in the Channel-billed Cuckoo,or nocturnal,as in the Yellow-billed Cuckoo.For species breeding at higher latitudes food availability dictates that they migrate to warmer climates dur-ing the winter,and all do so.The Long-tailed Koel which breeds in New Zealandﬂies migrates to its wintering grounds in Poly-nesia,Micronesia and Melanesia,a feat described as“perhaps the most remarkable over water migration of any land bird”[23];and the Yellow-billed Cuckoo and Black-billed Cuckoo breed in North America andﬂy across the Caribbean Sea,a non-stopﬂight of 4000km.Other long migrationﬂights include the Lesser Cuckoo,5510R.Rajabioun/Applied Soft Computing11(2011)5508–5518whichﬂies from India to Kenya across the Indian Ocean(3000km) and the Common Cuckoos of Europe whichﬂy non-stop over the Mediterranean Sea and Sahara Desert on their voyage to south-ern Africa.Within Africa10species make regular intra-continental migrations that are described as polarized,that is they spend the non-breeding season in the tropical centre of the continent and move north and south to breed in the more arid and open savannah and deserts[24].About56of the Old World species and3of the New World species are brood parasites,laying their eggs in the nests of other birds[25].These species are obligate brood parasites,meaning that they only reproduce in this fashion.The cuckoo egg hatches earlier than the host’s,and the cuckoo chick grows faster;in most cases the chick evicts the eggs or young of the host species.The chick has no time to learn this behavior,so it must be an instinct passed on genetically.The chick encourages the host to keep pace with its high growth rate with its rapid begging call[26]and the chick’s open mouth which serves as a sign stimulus[27].Female para-sitic cuckoos specialize and lay eggs that closely resemble the eggs of their chosen host.This has been produced by natural selection, as some birds are able to distinguish cuckoo eggs from their own, leading to those eggs least like the host’s being thrown out of the nest[27].Host species may engage in more direct action to prevent cuckoos laying eggs in their nest in theﬁrst place–birds whose nests are at high risk of cuckoo-contamination are known to mob cuckoos to drive them out of the area[28].Parasitic cuckoos are grouped into gents,with each gent specializing in a particular host. There is some evidence that the gents are genetically different from one another.Host speciﬁcity is enhanced by the need to imitate the eggs of the host.3.The proposed Cuckoo Optimization Algorithm(COA)Fig.1shows aﬂowchart of the proposed algorithm.Like other evolutionary algorithms,the proposed algorithm starts with an ini-tial population of cuckoos.These initial cuckoos have some eggs to lay in some host birds’nests.Some of these eggs which are more similar to the host bird’s eggs have the opportunity to grow up and become a mature cuckoo.Other eggs are detected by host birds and are killed.The grown eggs reveal the suitability of the nests in that area.The more eggs survive in an area,the more proﬁt is gained in that area.So the position in which more eggs survive will be the term that COA is going to optimize.Cuckoos search for the most suitable area to lay eggs in order to maximize their eggs survival rate.After remained eggs grow and turn into a mature cuckoo,they make some societies.Each soci-ety has its habitat region to live in.The best habitat of all societies will be the destination for the cuckoos in other societies.Then they immigrate toward this best habitat.They will inhabit somewhere near the best habitat.Considering the number of eggs each cuckoo has and also the cuckoo’s distance to the goal point(best habitat), some egg laying radii is dedicated to it.Then,cuckoo starts to lay eggs in some random nests inside her egg laying radius.This pro-cess continues until the best position with maximum proﬁt value is obtained and most of the cuckoo population is gathered around the same position.3.1.Generating initial cuckoo habitatIn order to solve an optimization problem,it’s necessary that the values of problem variables be formed as an array.In GA and PSO terminologies this array is called“Chromosome”and“Particle Position”,respectively.But here in Cuckoo Optimization Algorithm (COA)it is called“habitat”.In a N var-dimensionaloptimization Fig.2.Random egg laying in ELR,central red star is the initial habitat of the cuckoo with5eggs;pink stars are the eggs’new nest.problem,a habitat is an array of1×N var,representing current living position of cuckoo.This array is deﬁned as follows:habitat=[x1,x2,...,x Nvar](1)Each of the variable values(x1,x2,...,x Nvar)isﬂoating point num-ber.The proﬁt of a habitat is obtained by evaluation of proﬁtfunction f p at a habitat of(x1,x2,...,x Nvar).SoProﬁt=f p(habitat)=f p(x1,x2,...,x Nvar)(2) As it is seen COA is an algorithm that maximizes a proﬁt function.To use COA in cost minimization problems,one can easily maximize the following proﬁt function:Proﬁt=−Cost(habitat)=−f c(x1,x2,...,x Nvar)(3) To start the optimization algorithm,a candidate habitat matrix of size N pop×N var is generated.Then some randomly produced num-ber of eggs is supposed for each of these initial cuckoo habitats.In nature,each cuckoo lays from5to20eggs.These values are used as the upper and lower limits of egg dedication to each cuckoo at different iterations.Another habit of real cuckoos is that they lay eggs within a maximum distance from their habitat.From now on, this maximum range will be called“Egg Laying Radius(ELR)”.In an optimization problem with upper limit of var hi and lower limit of var low for variables,each cuckoo has an egg laying radius(ELR) which is proportional to the total number of eggs,number of cur-rent cuckoo’s eggs and also variable limits of var hi and var low.So ELR is deﬁned as:ELR=˛×Number of current cuckoo’s eggsTotal number of eggs×(var hi−var low)(4)where˛is an integer,supposed to handle the maximum value of ELR.3.2.Cuckoos’style for egg layingEach cuckoo starts laying eggs randomly in some other host birds’nests within her ELR.Fig.2gives a clear view of this concept.After all cuckoos’eggs are laid in host birds’nests,some of them that are less similar to host birds’own eggs,are detected by host birds and though are thrown out of the nest.So after egg laying process,p%of all eggs(usually10%),with less proﬁt values,will beR.Rajabioun/Applied Soft Computing11(2011)5508–55185511Fig.3.Immigration of a sample cuckoo toward goal habitat.killed.These eggs have no chance to grow.Rest of the eggs grow in host nests,hatch and are fed by host birds.Another interesting point about laid cuckoo eggs is that only one egg in a nest has the chance to grow.This is because when cuckoo egg hatches and the chicks come out,she throws the host bird’s own eggs out of the nest. In case that host bird’s eggs hatch earlier and cuckoo egg hatches later,cuckoo’s chick eats most of the food host bird brings to the nest(because of her3times bigger body,she pushes other chicks and eats more).After couple of days the host bird’s own chicks die from hunger and only cuckoo chick remains in the nest.3.3.Immigration of cuckoosWhen young cuckoos grow and become mature,they live in their own area and society for sometime.But when the time for egg laying approaches they immigrate to new and better habitats with more similarity of eggs to host birds and also with more food for new youngsters.After the cuckoo groups are formed in differ-ent areas,the society with best proﬁt value is selected as the goal point for other cuckoos to immigrate.When mature cuckoos live in all over the environment it’s difﬁcult to recognize which cuckoo belongs to which group.To solve this problem,the grouping of cuckoos is done with K-means clustering method(a k of3–5seems to be sufﬁcient in simulations).Now that the cuckoo groups are con-stituted their mean proﬁt value is calculated.Then the maximum value of these mean proﬁts determines the goal group and conse-quently that group’s best habitat is the new destination habitat for immigrant cuckoos.When moving toward goal point,the cuckoos do notﬂy all the way to the destination habitat.They onlyﬂy a part of the way and also have a deviation.This movement is clearly shown in Fig.3.As it is seen in Fig.3,each cuckoo onlyﬂies %of all distance toward goal habitat and also has a deviation ofϕradians.These two parameters, andϕ,help cuckoos search much more positions in all environment.For each cuckoo, andϕare deﬁned as follows: ∼U(0,1)ϕ∼U(−ω,ω)(5)where ∼U(0,1)means that is a random number(uniformly dis-tributed)between0and1.ωis a parameter that constrains the deviation from goal habitat.Anωof /6(rad)seems to beenough Fig.4.Pseudo-code for Cuckoo Optimization Algorithm.5512R.Rajabioun/Applied Soft Computing11(2011)5508–5518 for good convergence of the cuckoo population to global maximumproﬁt.When all cuckoos immigrated toward goal point and new habi-tats were speciﬁed,each mature cuckoo is given some eggs.Thenconsidering the number of eggs dedicated to each bird,an ELRis calculated for each cuckoo.Afterward new egg laying processrestarts.3.4.Eliminating cuckoos in worst habitatsDue to the fact that there is always equilibrium in birds’pop-ulation so a number of N max controls and limits the maximumnumber of live cuckoos in the environment.This balance is becauseof food limitations,being killed by predators and also inability toﬁnd proper nest for eggs.In the modeling proposed here in thispaper,only those N max number of cuckoos survive that have betterproﬁt values,others demise.3.5.ConvergenceAfter some iterations,all the cuckoo population moves to onebest habitat with maximum similarity of eggs to the host birds andalso with the maximum food resources.This habitat will producethe maximum proﬁt ever.There will be least egg losses in this besthabitat.Convergence of more than95%of all cuckoos to the samehabitat puts an end to Cuckoo Optimization Algorithm(COA).Themain steps of COA are presented in Fig.4as a pseudo-code.In thenext part,COA is applied to some benchmark optimization prob-lems.Theoretical proofs for convergence to asymptotic probabilitylaws in all stochastic optimization algorithms,considering theMarkovian nature of the underlying processes,require some sort ofdetailed balance or reversibility condition which means the algo-rithm loses much of its efﬁciency.Furthermore,if one insists oneventual convergence to the global optima in the strong or evenweak sense,very slow annealing is also called for.The strength ofstochastic algorithms stem from the fact that their very probabilis-tic nature ensures that the algorithms will not necessarily get stuckat local optima,and there is no need for using any information onobjective gradients,further requiring differentiability conditions.4.Benchmarks on Cuckoo Optimization AlgorithmIn this section the proposed Cuckoo Optimization Algorithm(COA)is tested with4benchmark functions from Ref.[1],one10-dimensional Rastrigin function and a real case study.4.1.Test cost functionsAll the benchmark functions are minimization problems.Thesefunctions are listed below:Function F1:f=x×sin(4x)+1.1y×sin(2y)0<x,y<0,minimum:f(9.039,8.668)=−8.5547(6)Function F2:f=0.5+sin2x2+y2−0.51+0.1(x2+y2)0<x,y<2,minimum:f(0,0.5)=0.5(7)Function F3:f=(x2+y2)0.25×sin{30[(x+0.5)2+y2]0.1}+|x|+|y|−∞<x,y<+∞,minimum:f(−0.2,0)=−0.2471(8)Fig.5.A3D plot of cost function F1.Function F4:f=J0(x2+y2)+0.11−x+0.11−y−∞<x,y<+∞,minimum:f(1,1.6606)=−0.3356(9)Function F5(10-dimensional Rastrigin function):f=10n+ni=1(x i2−10cos(2 x i)),n=9−5.12≤x i≤5.12,f(0,0,...,0)=0(10)First function F1is studied.This function has the global minimum of−18.5547at(x,y)=(9.039,8.668)in interval0<x,y<10.Fig.5 shows the3D plot of this function.The initial number of cuckoos is set only to20.Each cuckoo can lay between5and10eggs.Fig.6shows initial distribution of cuckoos in problem environment.Figs.7–12show the cuckoo population habitats in consequent iterations.Convergence is gained at iteration7.The COA has obtained the global minimum just in7iterations.As it is seen in Figs.7–12,cuckoos have found2minima at iter-ation4.Then in iteration5it is seen that one group of cuckoos is immigrating toward the global minimum.In iteration6most of cuckoos are in global minimum.Andﬁnally at iteration7nearly all of cuckoos are on the best habitat,which is the global minimum of the problem.This habitat is(9.0396,8.6706)with the cost value −18.5543.Fig.13depicts the cost minimization for test functionF1.Fig.6.Initial habitats of cuckoos.R.Rajabioun /Applied Soft Computing 11(2011)5508–55185513Fig.7.Habitats of cuckoos in 2nditeration.Fig.8.Habitats of cuckoos in 3rd iteration.In order to do a comparison,PSO and continuous GA with Roulette wheel selection,uniform cross-over are applied to this function too.The initial population of GA is also set to 20,muta-tion and selection rates are set to 0.2and 0.5,respectively.ForPSOFig.9.Habitats of cuckoos in 4thiteration.Fig.10.Habitats of cuckoos in 5thiteration.Fig.11.Habitats of cuckoos in 6th iteration.cognitive and social parameters are both set to 2.Due to the fact that different initial populations of each method affect directly to the ﬁnal result and the speed of algorithm,a series of test runs is done to have a mean expectance of performance for eachmethod.Fig.12.Habitats of cuckoos in 7th iteration.5514R.Rajabioun /Applied Soft Computing 11(2011)5508–5518Fig.13.Cost minimization for test functionF1.Fig.14.Cost minimization using GA.Running the simulations for 30times produces a mean of 45.9,38.7and 6.8stopping iterations for GA,PSO and COA.Fig.14shows a sample cost minimization plot of function F1for GA in 100iterations.As it is seen from Fig.14,GA has reached to global minimum at 24th iteration.Best chromosome is (9.0434,8.6785)and the cost value is −18.5513.Fig.15depicts cost minimization of function F1using PSO.As it is seen from Fig.15,PSO has reached to global minimum at 19th iteration.Best particle position is (9.0390,8.6682)and thecostFig.15.Cost minimization using PSO.Table 1Mean stopping iterations of GA,PSO and COA in 30runs.F2F3F4GA 12.652.244.1PSO 10.324.838.6COA5.26.96.3Fig.16.Cost minimization plot of functionF2.Fig.17.Cost minimization plot of function F3.value is −18.5547.Considering Table 1it can be seen that while GA and PSO need a mean of 46.8and 39.1iterations,COA reaches to the goal point in a mean of 6.9(approximately 7)iterations.Until now it can be concluded that COA has out performed GA and PSO.For more test we apply these three optimization algorithms on test functions F2,F3and F4.Figs.16–18show the cost minimization plot of all three algo-rithms for test functions F2,F3and F4in a random run.Table1Fig.18.Cost minimization plot of function F4.R.Rajabioun/Applied Soft Computing11(2011)5508–55185515Fig.19.3D plot of Rastrigin function.shows the mean stopping iterations for aforementioned test func-tions.The most interesting point seen in Figs.16–18and also in Table1, is faster convergence of Cuckoo Optimization Algorithm.Considering the results obtained for test functions F1,F2,F3and F4it can be seen that all three methods have been able toﬁnd the global minimum.The only interesting point of Cuckoo Optimiza-tion Algorithm(COA)might be its faster convergence.But to show the superiority of COA over GA and PSO,the10-dimensional Ras-trigin function is chosen as test function F5.This function has lots of local minima and is one of the difﬁcult problems to solve,even in3-dimensional case.Fig.19shows the3-dimensional Rastrigin function.As it is seen even in3-dimensional case,the Rastrigin func-tion is a really challenging optimization problem.But to see the real performance of COA,GA and PSO the10-dimensional Rastri-gin function is selected as last benchmark function.Fig.20depicts the cost minimization results for all three algorithms.For all three methods,the initial population size and the maximum number of iterations are set to20and100,respectively.Now it is clearly seen that GA and PSO have not been able to ﬁnd the global minimum in100iterations,but COA has converged in only66iterations to almost the global minimum of f(x*)=0.In this benchmark function,COA has stunningly out performed and has found a very good estimation of the real global minimum.After that the great performance of COA is proven in test cost functions it is needed to investigate its performance in real prob-lems.For this,a Multi-Input Multi-Output(MIMO)distillation column process is chosen in order to be controlled by means ofmultivariable PID controller.The parameters of PID controllerareFig.20.Cost minimization for10-dimensional Rastrigin function.designed using COA,GA and the method proposed in[29].Before illustrating the design process a brief description is given about multivariable controller design.4.2.Multivariable controller design4.2.1.PID controller for MIMO processesConsider the multivariable PID control loop in Fig.21.In Fig.21,multivariable process P(s)could be demonstrated as follows:P(s)=⎡⎣p11(s)...p1n(s).........p n1(s)...p nn(s)⎤⎦(11)where p ij(s)is the transfers function between y i and u j.In Fig.21, vectors Y d,Y,U and E are in following form:Y d=[y d1y d2···y dn]TY=[y1y2···y n]TU=[u1u2···u n]TE=Y d−Y=[e11e22···e nn]TMultivariable PID controller C(s)in Fig.21,is in the following form: C(s)=⎡⎣c11(s)...c1n(s).........c n1(s)...c nn(s)⎤⎦(12) where c ij(s)that i,j∈{1,2,...,n}is as follows:c ij(s)=K Pij+K Iij1s+K Dij s(13)where K Pij is the proportional,K Iij is the integral and K Dij is the derivative gains of the PID controller c ij(s).4.2.2.Evolutionary PID designIn designing PID controllers,the goal is to tune proper coef-ﬁcients K P,K I and K d so that the output has some desired ually in time domain,these characteristics are given in terms of overshoot,rise time,settling time and steady state error.Two kinds of performance criteria in output tracking,usu-ally considered in the controller designing,are the integral squared error(ISE)and integral absolute error(IAE)of the desired output.In design of a multivariable controller,one of the major aims is diagonally domination of the control process.That is the controller be designed in such a way that y i(t)be able to track the desired input y di(t)and to reject the response of other inputs y dj(t),for i, j∈{1,2,...,n|i/=j}.Considering the decupling aim,IAE is deﬁned in the following form:IAEni=1nj=1IAE ijni=1nj=1∞e ij(t)dt(14)where|e ii(t)|is absolute error of the output y i(t)when tracking input y di(t)and|e ij(t)|is the absolute error caused by the effect of the input y dj(t),on the output y i(t),(i/=j).Also IAE ij is deﬁned as integral of absolute error e ij(t)over time.The source of|e ij(t)|is the coupling problem.。

Stochastic Optimization Algorithms

Stochastic Optimization AlgorithmsPierre ColletUniversité du Littoral Côte d’OpaleLaboratoire d’Informatique du Littoral BP 71962100 Calais cedex — France pierre.collet@Univ-Littoral.frJean-Philippe Rennard*Grenoble Graduate School of Business12, rue Pierre Sémard BP 12738003 Grenoble cedex 01 — Francejp@This is a draft version of a paper to be published in: Rennard, J.P. (Ed.), Handbook of Research on Nature Inspired Computing for Economics and Management , Hershey, IGR, 2006.Abstract : When looking for a solution, deterministic methods have the enormous advantage that they do find global optima. Unfortunately, they are very CPU-intensive, and are useless on untractable NP-hard problems that would require thousands of years for cutting-edge computers to explore.In order to get a result, one needs to revert to stochastic algorithms, that sample the search space without exploring it thoroughly. Such algorithms can find very good results, without any guarantee that the global optimum has been reached; but there is often no other choice than using them.This chapter is a short introduction to the main methods used in stochastic optimization.IntroductionThe never ending search for productivity has made optimization a core concern for engineers. Quick process, low energy consumption, short and economical supply chains are now key success factorsGiven a space Ω of individual solutions n R ω∈ and an objective function ()f f R ω,→, optimizing is the process of finding the solution ω∗ which minimizes (maximizes) f .For hard problems, optimization is often described as a walk in a fitness landscape . First proposed by biologist S. Wright in 1932 (Wright, 1932), fitness landscapes aimed at representing the fitness of a living organism according to the genotype space. While optimizing, fitness measures the quality of a solution, and fitness landscapes plot solutions and corresponding goodness (fitness). If one wishes to optimize the function 10x +=, then depending on the choice of the error measure, fitness can for example be defined as |(1)|x −−+ or as 1|(1)|x /−+. The optimization process then tries to find the peak of the fitness landscape (see figure 1-a).<<INSERT FIGURE 1>>This example is trivial and the optimum is easy to find. Real problems are often multimodal, meaning that their fitness landscapes contain several local optima (i.e. points whose all neighbors have a lower fitness see figure 1-b). This is particularly true when variables interact with one another (epistatis ).Usual analytical methods, like gradient descent, are often unable to find a global optimum, since they are unable to deal with such functions. Moreover, companies mostly deal with combinatorial problems like quadratic assignment, timetabling or scheduling problems. These problems using discrete states generate non-continuous objective functions that are unreachable through analytical methods.Stochastic optimization algorithms were designed to deal with highly complex optimization problems. This chapter will first introduce the notion of complexity and then present the main stochastic optimization algorithms.NP-complete problems and combinatorial explosionIn December, Santa Claus must prepare the millions of presents he has to distribute for Christmas. Since the capacity of his sleigh is finite, and he prefers to minimize the number of runs, he would like to find the best way to organize the packs. Despite the apparent triviality of the task, Santa Claus is facing a very hard problem. Its simplest formulation is the one-dimensional bin packing problem . Given a list 12()n L a a …a =,,, of items withsizes 0()1i s a <≤, what is the minimum number m of unit-capacity bins j B such that()11i j i a B s a j m ∈≤,≤≤∑ ? This problem is known to be NP-hard (Coffman, Garey, & Johnson, 1996).Various forms of bin packing problem are very common. Transportation industry must optimize truck packing given weight limits, press has to organize advertisements minimizing the space, metal sheet industry must solve the cutting-stock problem (how to minimize waste when cutting a metal sheet)…Such problems are very tough because we do not know how to build algorithms that can solve them in polynomial-time ; they are said to be intractable problems . The only algorithms we know for them need an exponential-time . Table 1 illustrates the evolution of time algorithms for polynomial-time problems vs non-polynomial. Improving the speed of computers or algorithms is not the solution, since if the speed is multiplied, the gain of time is only additive for exponential functions (Papadimitriou & Steiglitz, 1982).<<INSERT TABLE 1>>The consequences of the computational complexity for a great many real world problems are fundamental. Exact method for scheduling problems “become computationally impracticable for problems of realistic size, either because the model grows too large, or because the solution procedures are too lengthy, or both, and heuristics provide the only viable scheduling techniques for large projects” (Cooper, 1976).Heuristics and meta-heuristicsSince many real-world combinatorial problems are NP-hard, it is not possible to guarantee the discovery of the optimum. Instead of exact methods, one usually uses heuristics , which are approximate methods using iterative trial and error processes, to approach the best solution. Many of them are nature-inspired, and their latest development is to use metaheuristics . “Ametaheuristic is an iterative master process that guides and modifies the operations of subordinate heuristics to efficiently produce high-quality solutions. It may manipulate a complete (or incomplete) single solution or a collection of solutions at each iteration. The subordinate heuristics may be high (or low) level procedures, or a simple local search, or just a construction method.” (Voss, Martello, Osman, & Roucairol, 1999).Metaheuristics are high level methods guiding classical heuristics. They deal with a dynamic balance between diversification (exploration of the solution space) and intensification (exploitation of the accumulated knowledge) (Blum & Roli, 2003).Stochastic algorithmsRandom searchRandom search is what it says it is. In essence, it simply consists in picking up random potential solutions and evaluating them. The best solution over a number of samples is the result of random search.Many people do not realize that a stochastic algorithm is nothing else than a random search, with hints by a chosen heuristics (or meta-heuristics) to guide the next potential solution to evaluate. People who realize this feel uneasy about stochastic algorithms, because there is not guarantee that such an algorithm (based on random choices) will always find the global optimum.The only answer to this problem is a probabilistic one:•If, for a particular problem, one already knows the best solution for different instances of this problem, and•if, over a significative number of runs, the proposed stochastic algorithm findsa solution that in average is 99% as good as the known optimum for the testedinstances of the problem, then,•one can hope that on a new instance of the problem for which the solution is not known, the solution found by the stochastic algorithm will be 99% as good as the unknown optimum over a significative number of runs.This claim is not very strong, but there are not many other options available: if one absolutely wants to get the global optimum for a large NP-Hard problem, the only way is to let the computer run for several hundred years (cf. table 1)… The stochastic way is therefore a pragmatic one.Computational EffortAs can be seen above, it is difficult to evaluate the performance of stochastic algorithms, because, as Koza explains for genetic programming in (Koza, 1994):Since genetic programming is a probabilistic algorithm, not all runs aresuccessful at yielding a solution to the problem by generation G.When a particular run of genetic programming is not successful after theprespecified number of generations G, there is no way to know whether or whenthe run would ever be successful. When a successful outcome cannot beguaranteed for every run, there is no knowable value for the number ofgenerations that will yield a solution […].Koza therefore proposes a metrics to measure what he calls the computational effort required to solve a problem, that can be extended to any stochastic algorithm where evaluations consume a significant fraction of the computer resources:P n, the cumulative probability of success by the number of evaluations One first calculates ()n , being the number of runs that succeeded on or before the n th evaluation, divided by the number of runs conducted.The computational effort ()I n z , can then be defined as the number of evaluations that must be computed to produce a satisfactory solution with probability greater than z (where z is usually 99%), using the formula: ln(1)ln(1())z P n n −−⎡⎤∗⎣⎦.No Free Lunch TheoremRandom search is also important because it serves as a reference on which one can judge stochastic algorithms. A very important theorem is that of the No Free Lunch (Wolpert & Macready, 1995). This theorem states that no search algorithm is better than a random search on the space of all possible problems —in other words, if a particular algorithm does better than a random search on a particular type of problem, it will not perform as well on another type of problem, so that all in all, its global performance on the space of all possible problems is equivalent to a random search.The overall implication is very interesting, as it means that an off the shelf stochastic optimizer cannot be expected to give good results on any kind of problem (no free lunch): a stochastic optimizer is not a black box: to perform well, such algorithms must be expertly tailored for each specific problem.Hill-climbingHill-climbing is the basis of most local search methods. It is based on:• A set of feasible solutions {}n R ωωΩ=;∈.• An objective function ()f ω that can measure the quality of a candidate solution.• A neighborhood function (){()}n n N dist ωωωωδ=∈Ω|,≤ able to map anycandidate solution to a set of close candidate solutions.The optimization algorithm has to find a solution ()()f f ωωωω∗∗,∀∈Ω,≤. The basic hill-climbing algorithm is trivial:1.Build a candidate solution ω∈Ω. 2. Evaluate ω by computing ()f ω and set ωω∗←.3. Select a neighbor ()n N ωω∈ and set n ωω←.4. If ()()f f ωω∗≤ set ωω∗←.5. If some stopping criterion is met, exit else go to 3.For example, if one considers the famous Traveling Salesman Problem (TSP: given a collection of cities, finding the shortest way of visiting them all and returning back to the starting point), which is an NP-hard problem (cf. table 1 for complexity). The candidate solution is a list of cities e.g. F-D-B-A-E-C and the objective function is the length of this journey. There are many different ways to build a neighborhood function. 2-opt (Lin, 1965) is one of the simplest since it just reverses a sequence. Applying 2-opt could lead to F-E-A-B-D -C. The new tour will be selected if it is shorter than the previous one, otherwise one will evaluate another neighbor tour.More advanced hill-climbing methods look for the best neighbor:1.Build a candidate solution ω∈Ω. 2. Evaluate ω by computing ()f ω.3. For each neighbor ()n N ωω∈, evaluate ()n f ω.4.If all ()n f ω are ≥ ()f ω (local optimum) then exit.5. Else select ()()()n n N f f ωωωωω∗∗,∀∈,< as the current candidate solutionand set ωω∗←.6. Go to 3.The main advantage of hill-climbing is its simplicity, the core difficulty usually being the design of the neighborhood function. The price for this simplicity is a relative inefficiency. It is trivial to see that hill-climbing is easily trapped in local minima. If one starts from point A (see figure 1-b), it will not be able to reach the global optimum, since once on top of the first peak, it will not find any better point and will get stuck there.Even though many advanced forms of hill-climbing have been developed, these methods are limited to smooth and unimodal landscapes. “A question for debate in medieval theology was whether God could create two hills without an intervening valley (…) unfortunately, when optimizing functions, the answer seems to be no” (Anderson & Rosenfeld, 1988, p.551). This is why search rules based on local topography usually cannot reach the highest point. Simulated annealingSimulated annealing (Kirkpatrick, Gellat, & Vecchi, 1983) is an advanced form of hill-climbing . It originates in metallurgy. While annealing a piece of metal, quickly lowering the temperature leads to a defective crystal structure, far from the minimum energy level. Starting from a high temperature, cooling must be progressive when approaching the freezing point in order to obtain a nearly-perfect crystal, which is a crystal close to the minimum energy level. Knowing that the probability for a system to be at the energy level 0E is 00()exp()()B p E E k T Z T =−//, where B k is the Boltzmann constant , T the temperature and()Z T a normalizing function, Metropolis et al. proposed in 1955 a simple algorithm to simulate the behavior of a collection of atoms at a given temperature (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1955). At each iteration, a small random move is applied to an atom and the difference of energy E Δ is computed. If 0E Δ≤ the new state is always accepted. If 0E Δ> the new state is accepted according to a probability ()exp()B p E E k T Δ=−Δ/.Simulated annealing is based on a series of Metropolis algorithms with a decreasing temperature. It can shortly be described this way:1. Build a candidate solution ω∈Ω.2. Evaluate ω by computing ()f ω.3.Select a neighbor candidate solution ()n N ωω∈. 4. If ()()n f f ωω≤ then set n ωω← and exit if the evaluation is good enough.5. Else select n ω (n ωω←) according to the probability:exp((()()))n i p f f T ωω=−−/ where i T is the current temperature which decreasesover time.6.Go to 3. Uphill moves (step 5) allow overcoming local minima. One can illustrate the difference between hill-climbing and simulated annealing with the rolling ball metaphor (see Figure 2). Imagine a ball on a bumpy surface. The ball will roll down and stop at the first point of minimum elevation which usually is a local optimum. By tolerating uphill moves, simulated annealing somehow “shakes” the surface pushing the ball beyond the local minimum. At the beginning of the process, the surface is brutally shaken —the temperature is high— allowing a large exploration. The reduction of the temperature progressively decreases the shaking toprevent the ball from leaving the global optimum.<<INSERT FIGURE 2>>Simulated annealing is efficient, but slow. Many improvements have been proposed, like the rescaled simulated annealinga target energy level by using 22ij E Δ=−, with typically20t E T αα=,> (Hérault, 2000). This method “flattens” the error surface at the beginning ofthe process, minimizing the tendency of the algorithm to jump among local minima. Tabu Search“Tabu search may be viewed as ‘meta-heuristic’ superimposed on another heuristic. The approach undertakes to transcend local optimality by a strategy of forbidding certain moves.” ((Glover, 1986), this is the first appearance of the term meta-heuristic). Like simulated annealing , it is as an advanced form of hill-climbing , based on a set of feasible solutions Ω, an objective function ()f ω and a neighborhood function ()N ω. Tabu Search tries to overcome local minima by allowing the selection of non-improving solutions and by using a procedure which avoids cycling moves. Unlike simulated annealing , the probability of selection of a non-improving move is not applied to a given neighbor, but to the set of neighbors. To avoid cycles, Tabu Search implements a list T of tabu moves which in the basic form contains the t last moves. The simple Tabu Search works as follows (Glover, 1989):1. Select a potential solution ω∈Ω and let ωω∗←. Initialize the iteration counter 0k = and let T =∅.2. If ()N T ω−=∅ go to 4. Otherwise, increment k and select ()b N T ωω∈−the “best” available move.3. Let b ωω←. If ()()f f ωω∗<, let ωω∗←.4.If ω∗ is equal to the desired minimum or if ()N T ω−=∅ from 2, stop. Otherwise update T and go to 2.We did not define the “best” available move at step 2. The simplest —nevertheless powerful— way is to select b ω such that ()()()n b n N T f f ωωωω∀∈−,<. This means that thealgorithm can select a non-improving move since ()b f ω can be greater than ()f ω∗.The definition of the tabu list (step 4) is also a central one. This list aims at escaping local minima and avoiding cycles. It should then consider as tabu any return to a previous solutionstate . If 1s − is the reverse move of s , the tabu list can be defined such that 1{}h T s h k t −=:>−,where k is the iteration index and t defines the size of the time window. Practically, this method is hard to implement, especially because of memory requirement. One usually stores only partial ranges of the moves attributes, which can be shared by other moves. The tabu list then contains collections h C of moves sharing common attributes: h T C h k t =∪;>−, where1h h s C −∈ (Glover, 1989).Since the tabu list manages moves and not solutions, unvisited solutions can have the tabu status. In order to add flexibility to the research process, Tabu Search uses aspiration levels . In its simplest form, the aspiration level will allow tabu moves whose evaluation has been the best so far.Two extra features are usually added to Tabu Search (Glover, 1990): intensification anddiversification . These terms can be added to the objective function ff =+%intensification +diversification .Intensification aims at closely examining “interesting” areas. The intensification function will favor solutions close to the current best. The simplest way is to get back to a close-to-the-best solution and to reduce the size of the tabu list for some iterations. More sophisticated methods use long-term memory memorizing the good components of good solutions.Diversification aims at avoiding a too local search. The diversification function gives more weight to solutions far from the current one. The simplest way to implement it, is to perform random restarts. One can also penalize the most frequent solutions components.Let us examine a simple example to illustrate Tabu Search .<<INSERT FIGURE 3>>The cube (see figure 3) shows the cost and the neighborhood of an eight configurations problem. The random initial configuration is e.g. 10. We will simply define the tabu movements as the reverse movements in each of the three directions, that is if we move along x +, the movement x − will be tabu.First iteration:• Neighborhood of 10 is 15, 8 and 12.• The best move is z + which selects 8.• z − is added to the tabu list.Second iteration:• Neighborhood is 11, 13 and 10.• The best move is z − but it is tabu. The second best move is x − which selects 11.• x + is added to the tabu list.Third iteration:• Neighborhood is 9, 8 and 15.• The best move is x − but it is tabu. The second best move is y + which selects 9.• y − is added to the tabu list.Fourth iteration:• Neighborhood is 11, 13 and 5.• The best move is z − which is tabu, but its evaluation is 5 which is lower than the best evaluation so far (8). The aspiration criterion overrides the tabu restriction and the 5 is selected.• 5 is the global minimum, the research is over.Despite its simplicity, Tabu Search is a highly efficient algorithm. It is known to be one of the most effective meta-heuristics for solving the job-shop scheduling problem (Taillard, 1994; Watson, Whitley, & Howe, 2003). It is used in many different fields like resources planning, financial analysis, logistics, flexible manufacturing…Neural networksA neural network is a set of processing units linked by “learnable connections.” They are well known in the field of artificial intelligence where they notably provide powerful generalization and clustering tools. Some recurrent neural networks are also useful for optimization. The optimization process is usually based on the minimization of an energy function defined as: ()()()c k k k E x E x a E x =+∑ where c E is the cost function, ()k E x are thepenalties associated to constraint violations and k a are the associated weighting parameters.For many optimization problems, the cost function is expressed in a quadraticform ()12ij i j i i i j i E x T s s I s ,=−/−∑∑, where i s is the signal of the neuron i , 2ij i j T E s s =∂/∂∂ and i i I E s =∂/∂ (Dreyfus et al., 2002).Hopfield networks (Hopfield, 1982) are the most famous neural networks used for optimization. They are asynchronous (one randomly selected neuron is updated at each step) fully connected —except self-connection— neural networks (see Figure 4).<<INSERT FIGURE 4>>The binary version uses a sign function; the output signal of a neuron is computed as: 1is = if 0j ji j i w s θ−≥∑, 0i s = otherwise; where ji w is the weight of the connection betweenneurons j and i , j s is the signal of the neuron j and i θ is the bias (a constant, usuallynegative, signal).Such a network is a dynamic system whose attractors are defined by the minima of the energy function defined as:12ij i j j j i j jE w s s s θ,=−/−∑∑Originally, Hopfield designed his networks as associative memories. Data are stored in the attractors where the network converges starting from partial or noisy data providing a content-addressable memory . In 1985, Hopfield demonstrated the optimizing capabilities of his network applying it to the TSP problem (Hopfield & Tank, 1985).While using Hopfield networks for optimization, the main difficulty is the representation of the problem and the definition of the objective function as the energy of the network. We can illustrate that with the TSP. For n cities, Hopfield and Tank used 2n neurons. A set of n neurons was assigned to each city and the rank of the firing neuron designed the rank of the city during the travel.<<INSERT TABLE 2>>Table 2 represents the tour C-A-E-B-D . The energy function depends on constraints and cost. For the TSP, the constraints define the validity of the tour that is the fact that each city is visited once. Hopfield and Tank defined the corresponding function as:2222()xi xj xi yi xi x i j i i x x y x iA V VB V VC V n ≠≠/+/+/−∑∑∑∑∑∑∑∑ where xi V is the binary signal of the neuron representing the city x at the position i (0i V = or 1i V =) and A , B , C are constant.. The first term is zero when each row contains one “1” (cities are visited once), the second is zero when each column contains one “1” (there is one city per position) and the third term is zero when the matrix contains exactly n “1.” The cost function depends on the length of the tour. It is defined as:112()xy xi y i y i x y x iD d V V V ,+,−≠/+∑∑∑where xy d is distance between cities x and y and where x n j x j V V ,+,=. The energy function is the sum of the four terms. If the constants are large enough (A = B = 500, C = 200, D = 500 in the initial tests), low energy states will correspond to valid tours. The matrix of the connection weights becomes:11(1)(1)()xi yj xy ij ij xy xy j i j i w A B C Dd δδδδδδ,,+,−=−−−−−−+where xi yj w , is the weight of the connection between the neurons representing the city x atposition i and the city y at position j and 1ij δ= if i j =, 0ij δ= otherwise.The original model of Hopfield-Tank has been quite controversial since their results have proved to be very difficult to reproduce. Thanks to posterior improvements (e.g. Boltzmann machine which tries to overcome local minima by using a stochastic activation function (Ackley, Hinton, & Sejnowski, 1985)), Hopfield-Tank model demonstrated its usefulness. It is notably used to solve general (e.g. Gong, Gen, Yamazaki, & Xu, 1995) and quadratic (e.g. Smith, Krishnamoorthy, & Palaniswami, 1996) assignment problems; Cutting stock problems (e.g. Dai, Cha, Guo, & Wang, 1994) or Job-Shop scheduling (e.g. Foo & Takefuji, 1988).Apart from Hopfield networks, T. Kohonen Self Organizing Maps (SOM) (Kohonen,1997), initially designed to solve clustering problems, are also used for optimization (Smith, 1999), notably since the presentation of the Elastic Net Method (Durbin & Willshaw, 1987).They are especially used to solve quadratic assignment problems (Smith, 1995) and vehicle routing problems (e.g. Ramanujam & Sadayappan, 1995).Evolutionary algorithms and Genetic ProgrammingEvolutionary algorithms and Genetic Programming have chapters devoted to them in this book, so that this section will remain small and general. Evolutionary algorithms provide a way to solve the following interesting question. Given:1. a very difficult problem for which no way of finding a good solution is known and where a solution is represented as a set of parameters,2. a number of previous trials that have all been evaluated.How can one use the accumulated knowledge to choose a new set of parameters to try out (and therefore do better than a random search)? One could store all the trials in a database and perform statistics on the different parameters characterizing the trials, to try to deduce some traits that will lead to better results. However, in real life, parameters are often interdependent (epistasis ), so drawing conclusions may not be that easy, even on a large amount of data.Evolutionary algorithms (EAs) rely on artificial Darwinism to do just that: exploit each and every trial to try out new potential solutions that will hopefully be better than the previous ones: given an initial set of evaluated potential solutions (called a population of individuals), “parents” are selected to “give birth” to “children” thanks to “genetic” operators, such as “crossover” and “mutation.” “Children” are then evaluated and form the pool of “parents” and “children,” a replacement operator selects those that will make it to the new “generation.” As can be seen in the previous paragraph, the biological inspiration for this paradigm led to borrow vocabulary specific to this field.The selection and replacement operators are the driving force behind artificial evolution. They are biased towards good individuals, meaning that (all in all) the population is getting better along as the generations evolve. A too strong selection pressure will lead to a premature convergence (the population of individuals will converge towards a local optimum) while a too weak selection pressure will prevent any convergence.Evolutionary algorithms can be used to optimize virtually any kind of problems, even some that cannot be formalized. This makes them usable for interactive problems where the fitness of an individual is given by a human operator (see for example Ian Parmee chapter in this book). They are also very efficient on multi-objective problems, thanks to the fact that they evolve a whole population of individuals at once (see Carlos Coello Coello chapter in this book). Proper techniques (such as NSGA-II (Deb, Agrawal, Pratab, & Meyarivan, 2000)) can be used to create a full Pareto-front in only one run (something impossible to do with simulated annealing or Tabu Search , for instance).If EAs can be used for virtually anything, why not try to evolve programs? This is what Genetic Programming (detailed in chapter Genetic Programming ) is about. Individuals are not。

改进麻雀搜索算法优化支持向量机的井漏预测

改进麻雀搜索算法优化支持向量机的井漏预测王鑫;张奇志【期刊名称】《科学技术与工程》【年(卷),期】2022(22)34【摘要】在钻井过程中,受地质环境、钻井技术等多种因素的影响,容易发生井漏事故。

为预防井漏事故,减少因钻井事故带来的损失,提出了一种改进麻雀搜索算法(improved sparrow search algorithm,ISSA)优化支持向量机的井漏预测方法。

首先,在发现者位置更新公式中引入一种改进的自适应非线性惯性递减权重,提高算法全局搜索能力;其次,在警戒者位置更新公式中引入莱维(Levy)飞行策略,减少算法陷入局部最优的风险。

为验证改进算法的寻优能力,将麻雀搜索算法(sparrow search algorithm,SSA)、遗传算法(genetic algorithm,GA)、灰狼算法(grey wolf algorithm,GWO)以及改进的麻雀搜索算法(ISSA)在8个基准测试函数上做了对比实验。

实验结果表明,改进的麻雀搜索算法(ISSA)在寻优精度、稳定性等方面都较其他算法更为优异。

最后,将改进的麻雀搜索算法用于优化支持向量机(ISSA-SVM)的惩罚参数C和核参数g,进行井漏事故的预测。

结果表明,ISSA-SVM 预测准确率为97.7654%,相比于麻雀算法(SSA)-SVM、遗传算法(GA)-SVM以及灰狼算法(GWO)-SVM预测准确率都高,且收敛速度快,迭代次数少,能够高效、快速预测井漏事故,提高钻井效率和可靠性。

【总页数】8页(P15115-15122)【作者】王鑫;张奇志【作者单位】西安石油大学电子工程学院;陕西省油气井重点测控实验室【正文语种】中文【中图分类】TE21【相关文献】1.基于麻雀搜索算法优化支持向量机的滚动轴承故障诊断2.基于麻雀搜索算法优化支持向量机的刀具磨损识别3.基于改进麻雀搜索算法优化支持向量机的短期光伏发电功率预测4.基于变分模态分解和混沌麻雀搜索算法优化支持向量机的滚动轴承故障诊断5.麻雀搜索算法优化BP神经网络的短期风功率预测因版权原因，仅展示原文概要，查看原文内容请购买。

NIPS2014_WithSupp

Discriminative Metric Learning byNeighborhood GerrymanderingShubhendu Trivedi,David McAllester,Gregory ShakhnarovichToyota Technological InstituteChicago,IL-60637{shubhendu,mcallester,greg}@AbstractWe formulate the problem of metric learning for k nearest neighbor classiﬁcationas a large margin structured prediction problem,with a latent variable representingthe choice of neighbors and the task loss directly corresponding to classiﬁcationerror.We describe an efﬁcient algorithm for exact loss augmented inference,anda fast gradient descent algorithm for learning in this model.The objective drivesthe metric to establish neighborhood boundaries that beneﬁt the true class labelsfor the training points.Our approach,reminiscent of gerrymandering(redrawingof political boundaries to provide advantage to certain parties),is more direct inits handling of optimizing classiﬁcation accuracy than those previously proposed.In experiments on a variety of data sets our method is shown to achieve excellentresults compared to current state of the art in metric learning.1IntroductionNearest neighbor classiﬁers are among the oldest and the most widely used tools in machine learn-ing.Although nearest neighor rules are often successful,their performance tends to be limited by two factors:the computational cost of searching for nearest neighbors and the choice of the metric (distance measure)deﬁning“nearest”.The cost of searching for neighbors can be reduced with ef-ﬁcient indexing,e.g.,[1,4,2]or learning compact representations,e.g.,[13,19,16,9].We will not address this issue here.Here we focus on the choice of the metric.The metric is often taken to be Euclidean,Manhattan orχ2distance.However,it is well known that in many cases these choices are suboptimal in that they do not exploit statistical regularities that can be leveraged from labeled data. This paper focuses on supervised metric learning.In particular,we present a method of learning a metric so as to optimize the accuracy of the resulting nearest neighbor classiﬁer.Existing works on metric learning formulate learning as an optimization task with various constraints driven by considerations of computational feasibility and reasonable,but often vaguely justiﬁed principles[23,8,7,22,21,14,11,18].A fundamental intuition is shared by most of the work in this area:an ideal distance for prediction is distance in the label space.Of course,that can not be measured,since prediction of a test example’s label is what we want to use the similarities to begin with.Instead,one could learn a similarity measure with the goal for it to be a good proxy for the label similarity.Since the performance of k NN prediction often is the real motivation for similarity learning,the constraints typically involve“pulling”good neighbors(from the correct class for a given point)closer while“pushing”the bad neighbors farther away.The exact formulation of “good”and“bad”varies but is deﬁned as a combination of proximity and agreement between labels. We give a formulation that facilitates a more direct attempt to optimize for the k NN accuracy as compared to previous work as far as we are aware.We discuss existing methods in more detail in section2,where we also place our work in context.In the k NN prediction problem,given a point and a chosen metric,there is an implicit hidden variable:the choice of k“neighbors”.The inference of the predicted label from these k examples is trivial,by simple majority vote among the associated labels.Given a query point,there can possibly exist a very large number of choices of k points that might correspond to zero loss:any set of k points with the majority of correct class will do.We would like a metric to“prefer”one of these “good”example sets over any set of k neighbors which would vote for a wrong class.Note that to win,it is not necessary for the right class to account for all the k neighbors–it just needs to get more votes than any other class.As the number of classes and the value of k grow,so does the space of available good(and bad)example sets.These considerations motivate our approach to metric learning.It is akin to the common,albeit negatively viewed,practice of gerrymandering in drawing up borders of election districts so as to provide advantages to desired political parties,e.g.,by concentrating voters from that party or by spreading voters of opposing parties.In our case,the“districts”are the cells in the V oronoi diagram deﬁned by the Mahalanobis metric,the“parties”are the class labels voted for by the neighbors falling in each cell,and the“desired winner”is the true label of the training points associated with the cell.This intuition is why we refer to our method as neighborhood gerrymandering in the title. Technically,we write k NN prediction as an inference problem with a structured latent variable being the choice of k neighbors.Thus learning involves minimizing a sum of a structural latent hinge loss and a regularizer[3].Computing structural latent hinge loss involves loss-adjusted inference—one must compute loss-adjusted values of both the output value(the label)and the latent items(the set of nearest neighbors).The loss augmented inference corresponds to a choice of worst k neighbors in the sense that while having a high average similarity they also correspond to a high loss(“worst offending set of k neighbors”).Given the inherent combinatorial considerations,the key to such a model is efﬁcient inference and loss augmented inference.We give an efﬁcient algorithm for exact inference.We also design an optimization algorithm based on stochastic gradient descent on the surrogate loss.Our approach achieves k NN accuracy higher than state of the art for most of the data sets we tested on,including some methods specialized for the relevant input domains.Although the experiments reported here are restricted to learning a Mahalanobis distance in an ex-plicit feature space,the formulation allows for nonlinear similarity measures,such as those deﬁned by nonlinear kernels,provided computing the gradients of similarities with respect to metric param-eters is feasible.Our formulation can also naturally handle a user-deﬁned loss matrix on labels.2Related Work and DiscussionThere is a large body of work on similarity learning done with the stated goal of improving k NN performance.In much of the recent work,the objective can be written as a combination of some sort of regularizer on the parameters of similarity,with loss reﬂecting the desired“purity”of the neighbors under learned similarity.Optimization then balances violation of these constraints with regularization.The main contrast between this body of work and our approach here is in the form of the loss.A well known family of methods of this type is based on the Large Margin Nearest Neighbor (LMNN)algorithm[22].In LMNN,the constraints for each training point involve a set of pre-deﬁned“target neighbors”from correct class,and“impostors”from other classes.The set of target neighbors here plays a similar role to our“best correct set of k neighbors”(h∗in Section4).How-ever the set of target neighbors are chosen at the onset based on the euclidean distance(in absence of a priori knowledge).Moreover as the metric is optimized,the set of“target neighbors”is not dy-namically updated.There is no reason to believe that the original choice of neighbors based on the euclidean distance is optimal while the metric is updated.Also h∗represents the closest neighbors that have zero loss but they are not necessarily of the same class.In LMNN the target neighbors are forced to be of the same class.In doing so it does not fully leverage the power of the k NN objective. The role of imposters is somewhat similar to the role of the“worst offending set of k neighbors”in our method( h in Section4).See Figure2for an illustration.Extensions of LMNN[21,11]allow for non-linear metrics,but retain the same generalﬂavor of constraints.There is another extension to LMNN that is more aligned to our work[20],in that they lift the constraint of having a static set of neighbors chosen based on the euclidean distance and instead learn the neighborhood.Figure1:Illustration of objectives of LMNN(left)and our structured approach(right)for k=3. The point x of class blue is the query point.In LMNN,the target points are the nearest neighbors of the same class,which are points a,b and c(the circle centered at x has radius equal to the farthest of the target points i.e.point b).The LMNN objective will push all the points of the wrong class that lie inside this circle out(points e,f,h,i,and j),while pulling in the target points to enforce the margin.For our structured approach(right),the circle around x has radius equal to the distance of the farthest of the three nearest neighbors irrespective of class.Our objective only needs to ensure zero loss which is achieved by pushing in point a of the correct class(blue)while pushing out the point having the incorrect class(point f).Note that two points of the incorrect class lie inside the circle(e,and f),both being of class red.However f is pushed out and not e since it is farther from x.Also see section2.The above family of methods may be contrasted with methods of theﬂavor as proposed in[23]. Here“good”neighbors are deﬁned as all similarly labeled points and each class is mapped into a ball of aﬁxed radius,but no separation is enforced between the classes.The k NN objective does not require that similarly labeled points be clustered together and consequently such methods try to optimize a much harder objective for learning the metric.In Neighborhood Component Analysis(NCA)[8],the piecewise-constant error of the k NN rule is replaced by a soft version.This leads to a non-convex objective that is optimized via gradient descent.This is similar to our method in the sense that it also attempts to directly optimize for the choice of the nearest neighbor at the price of losing convexity.This issue of non-convexity was partly remedied in[7],by optimization of a similar stochastic rule while attempting to collapse each class to one point.While this makes the optimization convex,collapsing classes to distinct points is unrealistic in practice.Another recent extension of NCA[18]generalizes the stochastic classiﬁcation idea to k NN classiﬁcation with k>1.In Metric Learning to Rank(MLR)[14],the constraints involve all the points:the goal is to push all the correct matches in front of all the incorrect ones.This again is not the same as requiring correct classiﬁcation.In addition to global optimization constraints on the rankings(such as mean average precision for target class),the authors allow localized evaluation criteria such as Precision at k,which can be used as a surrogate for classiﬁcation accuracy for binary classiﬁcation,but is a poor surrogate for multi-way classiﬁcation.Direct use of k NN accuracy in optimization objective is brieﬂy mentioned in[14],but not pursued due to the difﬁculty in loss-augmented inference.This is because the interleaving technique of[10]that is used to perform inference with other losses based inherently on contingency tables,fails for the multiclass case(since the number of data interleavings could be exponential).We take a very different approach to loss augmented inference,using targeted inference and the classiﬁcation loss matrix,and can easily extend it to arbitrary number of classes.A similar approach is taking in[15],where the constraints are derived from triplets of points formed by a sample,correct and incorrect neighbors.Again,these are assumed to be set statically as an input to the algorithm,and the optimization focuses on the distance ordering(ranking)rather than accuracy of classiﬁcation.3Problem setupWe are given N training examples X={x1,...,x N},represented by a“native”feature map, x i∈R d,and their class labels y=[y1,...,y N]T,with y i∈[R],where[R]stands for the set{1,...,R }.We are also given the loss matrix Λwith Λ(r,r )being the loss incurred by predicting r when the correct class is r .We assume Λ(r,r )=0,and ∀(r,r ),Λ(r,r )≥0.We are interested in Mahalanobis metricsD W (x ,x i )=(x −x i )T W (x −x i ),(1)parameterized by positive semideﬁnite d ×d matrices W .Let h ⊂X be a set of examples in X .For a given W we deﬁne the distance score of h w.r.t.a point x as S W (x ,h )=− x j ∈hD W (x ,x j )(2)Hence,the set of k nearest neighbors of x in X ish W (x )=argmax |h |=k S W (x ,h ).(3)For the remainder we will assume that k is known and ﬁxed.From any set h of k examples from X ,we can predict the label of x by (simple)majority vote:y (h )=majority {y j :x j ∈h },with ties resolved by a heuristic,e.g.,according to 1NN vote.In particular,the k NN classiﬁer predicts y (h W (x )).Due to this deterministic dependence between y and h ,we can deﬁne the classiﬁcation loss incured by a voting classiﬁer when using the set h as∆(y,h )=Λ(y, y (h )).(4)4Learning and inferenceOne might want to learn W to minimize training loss i ∆(y i ,h W (x i )).However,this fails dueto the intractable nature of classiﬁcation loss ∆.We will follow the usual remedy:deﬁne a tractable surrogate loss.Here we note that in our formulation,the output of the prediction is a structured object h W ,for which we eventually report the deterministically computed y .Structured prediction problems usu-ally involve loss which is a generalization of the hinge loss;intuitively,it penalizes the gap between score of the correct structured output and the score of the “worst offending”incorrect output (the one with the highest score and highest ∆).However,in our case there is no single correct output h ,since in general many choices of h would lead to correct y and zero classiﬁcation loss:any h in which the majority votes for the right class.Ideally,we want S W to prefer at least one of these correct h s over all incorrect h s.This intuition leads to the following surrogate loss deﬁnition:L (x ,y,W )=max h [S W (x ,h )+∆(y,h )](5)−max h :∆(y,h )=0S W (x ,h ).(6)This is a bit different in spirit from the notion of margin sometimes encountered in ranking problems where we want all the correct answers to be placed ahead of all the wrong ones.Here,we only care to put one correct answer on top;it does not matter which one,hence the max in (6).5Structured FormulationAlthough we have motivated this choice of L by intuitive arguments,it turns out that our problem is an instance of a familiar type of problems:latent structured prediction [24],and thus our choice of loss can be shown to form an upper bound on the empirical task loss ∆.First,we note that the score S W can be written as S W (x ,h )= W ,− x j ∈h (x −x j )(x −x j )T ,(7)where ·,· stands for the Frobenius inner product.Deﬁning the feature mapΨ(x,h) −x j∈h(x−x j)(x−x j)T,(8) we get a more compact expression W,Ψ(x,h) for(7).Furthermore,we can encode the deterministic dependence between y and h by a“compatibility”function A(y,h)=0if y= y(h)and A(y,h)=−∞otherwise.This allows us to write the joint inference of y and(hidden)h performed by k NN classiﬁer asy W(x), h W(x)=argmaxh,y[A(y,h)+ W,Ψ(x,h) ].(9)This is the familiar form of inference in a latent structured model[24,6]with latent variable h.So, despite our model’s somewhat unusual property that the latent h completely determines the inferred y,we can show the equivalence to the“normal”latent structured prediction.5.1Learning by gradient descentWe deﬁne the objective in learning W asmin W W 2F+CiL(x i,y i,W),(10)where · 2F stands for Frobenius norm of a matrix.1The regularizer is convex,but as in other latent structured models,the loss L is non-convex due to the subtraction of the max in(6).To optimize(10),one can use the convex-concave procedure(CCCP)[25]which has been proposed speciﬁcally for latent SVM learning[24].However,CCCP tends to be slow on large problems. Furthermore,its use is complicated here due to the requirement that W be positive semideﬁnite (PSD).This means that the inner loop of CCCP includes solving a semideﬁnite program,making the algorithm slower still.Instead,we opt for a simpler choice,often faster in practice:stochastic gradient descent(SGD),described in Algorithm1.Algorithm1:Stochastic gradient descentInput:labeled data set(X,Y),regularization parameter C,learning rateη(·)initialize W(0)=0for t=0,...,while not converged dosample i∼[N]h i=argmaxh [S W(t)(x i,h)+∆(y i,h)]h∗i=argmax h:∆(yi ,h)=0S W(t)(x i,h)δW=∂S W(x i, h i)∂W−∂S W(x i,h∗i)∂WW(t)W(t+1)=(1−η(t))W(t)−CδWproject W(t+1)to PSD coneThe SGD algorithm requires solving two inference problems( h and h∗),and computing the gradient of S W which we address below.25.1.1Targeted inference of h∗iHere we are concerned withﬁnding the highest-scoring h constrained to be compatible with a given target class y.We give an O(N log N)algorithm in Algorithm2.Proof of its correctness and complexity analysis is in Appendix.1We discuss other choices of regularizer in Section7.2We note that both inference problems over h are done in leave one out settings,i.e.,we impose an additional constraint i/∈h under the argmax,not listed in the algorithm explicitly.Algorithm2:Targeted inferenceInput:x,W,target class y,τ ties forbidden Output:argmax h: y(h)=y S W(x)Let n∗= k+τ(R−1)R //min.required number of neighbors from yh:=∅for j=1,...,n∗doh:=h∩argminx i:y i=y,i/∈hD W(x,x i)for l=n∗+1,...,k dodeﬁne#(r) |{i:x i∈h,y i=r}|//count selected neighbors from class r h:=h∩argminx i:y i=y,or#(y i)<#(y)−τ,i/∈hD W(x,x i)return hThe intuition behind Algorithm2is as follows.For a given combination of R(number of classes) and k(number of neighbors),the minimum number of neighbors from the target class y required to allow(although not guarantee)zero loss,is n∗(see Proposition1in the App.A).The algorithmﬁrst includes n∗highest scoring neighbors from the target class.The remaining k−n∗neighbors are picked by a greedy procedure that selects the highest scoring neighbors(which might or might not be from the target class)while making sure that no non-target class ends up in a majority.When using Alg.2toﬁnd an element in H∗,we forbid ties,i.e.setτ=1.5.1.2Loss augmented inference h iCalculating the max term in(5)is known as loss augmented inference.We note thatmax h W,Ψ(x,h ) +∆(y,h )=maxymaxh ∈H∗(y )W,Ψ(x,h )= W,Ψ(x,h∗(x,y ))+Λ(y,y )(11)which immediately leads to Algorithm3,relying on Algorithm2.The intuition:perform targeted inference for each class(as if that were the target class),and the choose the set of neighbors for the class for which the loss-augmented score is the highest.In this case,in each call to Alg.2we set τ=0,i.e.,we allow ties,to make sure the argmax is over all possible h’s.Algorithm3:Loss augmented inferenceInput:x,W,target class yOutput:argmax h[S W(x,h)+∆(y,h)]for r∈{1,...,R}doh(r):=h∗(x,W,r,1)//using Alg.2 Let Value(r):=S W(x,h(r)),+Λ(y,r)Let r∗=argmax r Value(r)return h(r∗)5.1.3Gradient updateFinally,we need to compute the gradient of the distance score.From(7),we have∂S W(x,h)∂W =Ψ(x,h)=−x j∈h(x−x j)(x−x j)T.(12)Thus,the update in Alg1has a simple interpretation,illustrated in Fig2on the right.For every x i∈h∗\ h,it“pulls”x i closer to x.For every x i∈ h\h∗,it“pushes”it farther from x;these push and pull refer to increase/decrease of Mahalanobis distance under the updated W.Any other x i, including any x i∈h∗∩ h,has no inﬂuence on the update.This is a difference of our approach fromLMNN,MLR etc.This is illustrated in Figure2.In particular h∗corresponds to points a,c and e, whereas h corresponds to points c,e and f.Thus point a is pulled while point f is pushed.Since the update does not necessarily preserve W as a PSD matrix,we enforce it by projecting Wonto the PSD cone,by zeroing negative eigenvalues.Note that since we update(or“downdate”)W each time by matrix of rank at most2k,the eigendecomposition can be accomplished more efﬁciently than the na¨ıve O(d3)approach,e.g.,as in[17].Usingﬁrst order methods,and in particular gradient methods for optimization of non-convex func-tions,has been common across machine learning,for instance in training deep neural networks.Despite lack(to our knowledge)of satisfactory guarantees of convergence,these methods are oftensuccessful in practice;we will show in the next section that this is true here as well.One might wonder if this method is valid for our objective that is not differentiable;we discuss thisbrieﬂy before describing experiments.A given x imposes a V oronoi-type partition of the space of W into aﬁnite number of cells;each cell is associated with a particular combination of h(x)and h∗(x)under the values of W in that cell.The score S W is differentiable(actually linear)on theinterior of the cell,but may be non-differentiable(though continuous)on the boundaries.Since theboundaries between aﬁnite number of cells form a set of measure zero,we see that the score isdifferentiable almost everywhere.6ExperimentsWe compare the error of k NN classiﬁers using metrics learned with our approach to that with otherlearned metrics.For this evaluation we replicate the protocol in[11],using the seven data sets inTable1.For all data sets,we report error of k NN classiﬁer for a range of values of k;for eachk,we test the metric learned for that petition to our method includes Euclidean Distance, LMNN[22],NCA,[8],ITML[5],MLR[14]and GB-LMNN[11].The latter learns non-linear metrics rather than Mahalanobis.For each of the competing methods,we used the code provided by the authors.In each case we tunedthe parameters of each method,including ours,in the same cross-validation protocol.We omit a fewother methods that were consistently shown in literature to be dominated by the ones we compareto,such asχ2distance,MLCC,M-LMNN.We also could not includeχ2-LMNN since code for it isnot available;however published results for k=3[11]indicate that our method would win againstχ2-LMNN as well.Isolet and USPS have a standard training/test partition,for the otherﬁve data sets,we report the meanand standard errors of5-fold cross validation(results for all methods are on the same folds).Weexperimented with different methods for initializing our method(given the non-convex objective),including the euclidean distance,all zeros etc.and found the euclidean initialization to be alwaysworse.We initialize each fold with either the diagonal matrix learned by ReliefF[12](which gives ascaled euclidean distance)or all zeros depending on whether the scaled euclidean distance obtainedusing ReliefF was better than unscaled euclidean distance.In each experiment,x are scaled by meanand standard deviation of the training portion.3The value of C is tuned on on a75%/25%split ofthe training portion.Results using different scaling methods are attached in the appendix.Our SGD algorithm stops when the running average of the surrogate loss over most recent epoch nolonger descreases substantially,or after max.number of iterations.We use learning rateη(t)=1/t. The results show that our method dominates other competitors,including non-linear metric learning methods,and in some cases achieves results signiﬁcantly better than those of the competition.7ConclusionWe propose a formulation of the metric learning for k NN classiﬁer as a structured prediction prob-lem,with discrete latent variables representing the selection of k neighbors.We give efﬁcient algo-rithms for exact inference in this model,including loss-augmented inference,and devise a stochasticgradient algorithm for learning.This approach allows us to learn a Mahalanobis metric with an ob-jective which is a more direct proxy for the stated goal(improvement of classiﬁcation by k NN rule) 3For Isolet we also reduce dimensionality to172by PCA computed on the training portion.k=3Dataset Isolet USPS letters DSLR Amazon Webcam Caltech d17025616800800800800 N77979298200001579582951123 C26102610101010 Euclidean8.66 6.18 4.79±0.275.20±3.060.13±1.956.27±2.580.5±4.6 LMNN 4.43 5.48 3.26±0.124.17±4.526.72±2.115.59±2.246.93±3.9 GB-LMNN 4.13 5.48 2.92±0.121.65±4.826.72±2.113.56±1.946.11±3.9 MLR 6.618.2714.25±5.836.93±2.624.01±1.823.05±2.846.76±3.4 ITML7.89 5.78 4.97±0.219.07±4.933.83±3.313.22±4.648.78±4.5 NCA 6.16 5.23 4.71±2.231.90±4.930.27±1.316.27±1.546.66±1.8 ours 4.87 5.18 2.32±0.117.18±4.721.34±2.510.85±3.143.37±2.4k=7Dataset Isolet USPS letters DSLR Amazon Webcam Caltech Euclidean7.44 6.08 5.40±0.376.45±6.262.21±2.257.29±6.380.76±3.7 LMNN 3.78 4.9 3.58±0.225.44±4.329.23±2.014.58±2.246.75±2.9 GB-LMNN 3.54 4.9 2.66±0.125.44±4.329.12±2.112.45±4.646.17±2.8 MLR 5.648.2719.92±6.433.73±5.523.17±2.118.98±2.946.85±4.1 ITML7.57 5.68 5.37±0.522.32±2.531.42±1.910.85±3.151.74±2.8 NCA 6.09 5.83 5.28±2.536.94±2.629.22±2.722.03±6.545.50±3.0 ours 4.61 4.9 2.54±0.121.61±5.922.44±1.311.19±3.341.61±2.6k=11Dataset Isolet USPS letters DSLR Amazon Webcam Caltech Euclidean8.02 6.88 5.89±0.473.87±2.864.61±4.259.66±5.581.39±4.2 LMNN 3.72 4.78 4.09±0.123.64±3.430.12±2.913.90±2.249.06±2.3 GB-LMNN 3.98 4.78 2.86±0.223.64±3.430.07±3.013.90±1.049.15±2.8 MLR 5.7111.1115.54±6.836.25±13.124.32±3.817.97±4.144.97±2.6 ITML7.77 6.63 6.52±0.822.28±3.130.48±1.411.86±5.650.76±1.9 NCA 5.90 5.73 6.04±2.840.06±6.030.69±2.926.44±6.346.48±4.0 ours 4.11 4.98 3.05±0.122.28±4.924.11±3.211.19±4.440.76±1.8Table1:k NN error,for k=3,7and11.Features were scaled by z-scoring.Mean and standard deviation are shown for data sets on which5-fold partition was used.Best performing methods are shown in bold.Note that the only non-linear metric learning method in the above is GB-LMNN. than previously proposed similarity learning methods.Our learning algorithm is simple yet efﬁcient, converging on all the data sets we have experimented upon in reasonable time as compared to the competing methods.Our choice of Frobenius regularizer is motivated by desire to control model complexity without biasing towards a particular form of the matrix.We have experimented with alternative regularizers, both the trace norm of W and the shrinkage towards Euclidean distance, W−I 2F,but found both to be inferior to W 2F.We suspect that often the optimal W corresponds to a highly anisotropic scaling of data dimensions,and thus bias towards I may be unhealthy.The results in this paper are restricted to Mahalanobis metric,which is an appealing choice for a number of reasons.In particular,learning such metrics is equivalent to learning linear embedding of the data,allowing very efﬁcient methods for metric search.Still,one can consider non-linear embeddings x→φ(x;w)and deﬁne the distance D in terms of the embeddings,for example,as D(x,x i)= φ(x)−φ(x i) or as−φ(x)Tφ(x i).Learning S in the latter form can be seen as learning a kernel with discriminative objective of improving k NN performance.Such a model would be more expressive,but also more challenging to optimize.We are investigating this direction. AcknowledgmentsThis work was partly supported by NSF award IIS-1409837.。

纯Python和PyTorch对比实现SGDMomentumRMSpropAdam梯

纯Python和PyTorch对比实现SGDMomentumRMSpropAdam梯在机器学习训练过程中，梯度下降算法是最常用的优化算法之一、经典的梯度下降算法包括SGD（Stochastic Gradient Descent）、Momentum、RMSprop和Adam等。

下面将逐一对比纯Python和PyTorch的实现。

1.SGD（随机梯度下降）:SGD是最基本的优化算法，每次迭代仅使用一个随机样本进行梯度计算和参数更新。

以下是纯Python和PyTorch的SGD实现：```pythondef sgd(params, lr):for param in params:param -= lr * param.grad```PyTorch实现：```pythonimport torch.optim as optimoptimizer = optim.SGD(params, lr)optimizer.step```PyTorch提供了torch.optim模块来实现优化算法，可以直接调用其中的SGD类进行参数更新。

2. Momentum:Momentum算法在SGD的基础上引入了动量项，用于加速收敛过程。

以下是纯Python和PyTorch的Momentum实现：```pythondef momentum(params, velocities, lr, momentum):for param, velocity in zip(params, velocities):velocity = momentum * velocity + lr * param.gradparam -= velocityvelocity *= 0.9 # 0.9为动量因子```PyTorch实现：```pythonimport torch.optim as optimoptimizer = optim.SGD(params, lr, momentum=momentum)optimizer.step```PyTorch的SGD类也支持设置动量因子参数，因此可以直接传入momentum参数来实现。

Optimization Algorithms

Optimization AlgorithmsOptimization algorithms are a crucial tool in the field of mathematics, computer science, engineering, and various other disciplines. These algorithms are designed to find the best solution to a problem from a set of possible solutions, often within a specific set of constraints. The application of optimization algorithms is vast, ranging from solving complex mathematical problems to optimizing the performance of real-world systems. In this response, we will delve into the significance of optimization algorithms, their various types, real-world applications, challenges, and future prospects. One of the most prominent types of optimization algorithms is the evolutionary algorithm, which is inspired by the process of natural selection. These algorithms work by iteratively improving a population of candidate solutions through processes such as mutation, recombination, and selection. Evolutionary algorithms have been successfully applied in various domains, including engineering design, financial modeling, and data mining. Their ability to handle complex, multi-modal, and non-linear optimization problems makes them particularly valuable in scenarios where traditional algorithms may struggle to find optimal solutions. Another important type of optimization algorithm is the gradient-based algorithm, which operates by iteratively moving towards the direction of steepest descent in the search space. These algorithms are widely used in machine learning, optimization of neural networks, and various scientific and engineering applications. Gradient-based algorithms, such as the popular gradient descent algorithm, have proven to be highly effective in finding optimal solutions for differentiable and smooth objective functions. However, they may face challenges in dealing with non-convex and discontinuous functions, and can get stuck in local optima. Real-world applications of optimization algorithms are diverse and impactful. In the field of engineering, these algorithms are used for optimizing the design of complex systems, such as aircraft, automobiles, and industrial processes. They are also employed in logistics and supply chain management to optimize transportation routes, inventory management, and scheduling. In finance, optimization algorithms are utilized for portfolio optimization, risk management, and algorithmic trading. Furthermore, these algorithms play a crucial role in healthcare for treatmentplanning, resource allocation, and disease modeling. The wide-ranging applications of optimization algorithms underscore their significance in solving complex real-world problems. Despite their widespread use and effectiveness, optimization algorithms are not without challenges. One of the primary challenges is the need to balance exploration and exploitation in the search for optimal solutions. Many algorithms may struggle to strike the right balance, leading to premature convergence or excessive exploration, which can hinder their performance. Additionally, the scalability of optimization algorithms to handle high-dimensional and large-scale problems remains a significant challenge. As the complexity of problems increases, the computational resources required for optimization also grow, posing practical limitations in many applications. Looking ahead, the future of optimization algorithms holds great promise. With advancements in computational power, parallel processing, and algorithmic innovations, the capabilities of optimization algorithms are expected to expand significantly. The integration of optimization algorithms with artificial intelligence and machine learning techniques is likely to open new frontiers in autonomous optimization, adaptive algorithms, and self-learning systems. Moreover, the increasing emphasis on sustainability and resource efficiency is driving the development of optimization algorithms for eco-friendly designs, renewable energy management, and sustainable urban planning. In conclusion, optimization algorithms are indispensable tools for solving complex problems across diverse domains. Their ability to find optimal solutions within specified constraints makes them invaluable in engineering, finance, healthcare, and many other fields. While they face challenges such as balancing exploration and exploitation and scalability to high-dimensional problems, ongoing research and technological advancements are poised to enhance their capabilities. The future holds exciting prospects for optimization algorithms, as they continue to evolve and contribute to addressing the complex challenges of the modern world.。