Multi-agent coordination based on tokens Reduction of the bullwhip effect in a forest suppl
Research on an Ontology-Based Multi-Agent Information Retrieval System Model
Agricultural Network Information, 2008, No. 4, Research and Development
Sun Qian, Miao Liang
(College of Information Science and Engineering, Shandong Agricultural University, Tai'an, Shandong 271018)
Key words: Information retrieval; Ontology; Multi-Agent
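1 Introduction
The Internet has become the main source from which people acquire knowledge today, but
2 Introduction to Ontology
Ontology is a concept that originated in philosophy, where it means "a systematic description of objectively existing things". It was later introduced into the artificial intelligence community, which first defined an ontology as "giving the basic terms and relations that make up the vocabulary of a domain, together with the rules, built from these terms and relations, that define the extension of that vocabulary". Later, more and more people studied ontology and gave many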
In order to address this issue, a multi-agent information retrieval system model based on ontology is put forward using ontology and multi-agent technology, and the specific functions of each component are explained by examples. The model can improve the efficiency of information retrieval and can largely meet the needs of the application.
The International Journal of Advanced Manufacturing Technology
Ping Lou · Zu-de Zhou · You-Ping Chen · Wu Ai
Study on multi-agent-based agile supply chain management
Received: 23 December 2002 / Accepted: 23 December 2002 / Published online: 5 December 2003
© Springer-Verlag London Limited 2003

Abstract In a worldwide network of suppliers, factories, warehouses, distribution centres and retailers, the supply chain plays a very important role in the acquisition, transformation, and delivery of raw materials and products. One of the most important characteristics of an agile supply chain is the ability to reconfigure dynamically and quickly according to demand changes in the market. In this paper, concepts and characteristics of an agile supply chain are discussed and the agile supply chain is regarded as one of the pivotal technologies of agile manufacture based on dynamic alliance. Also, the importance of coordination in the supply chain is emphasised and a general architecture of agile supply chain management is presented based on multi-agent theory, in which the supply chain is managed by a set of intelligent agents, each responsible for one or more activities. The function of the supply chain management system is to coordinate its agents. Agent functionalities and responsibilities are defined, and a contract net protocol combined with case-based reasoning for coordination and an algorithm for task allocation are presented.

Keywords Agile supply chain · Multi-agent system · Coordination · CBR · Contract net protocol

1 Introduction
Advanced technology and management are constantly being adopted to improve an enterprise's strength and competitiveness in the face of fierce global competition. In a report on 21st century manufacturing strategy development, the author suggests that various production resources, including people, funds, technology and facilities, should be integrated and managed as a whole, thus optimising the utilisation of resources and taking full advantage of advanced manufacturing technology, information technology, network technology and
computers [1]. Agile manufacture based on dynamic alliance is emerging so that enterprises can remain competitive in a constantly changing business environment, and it is becoming a main competitive paradigm in the international market. Agility, which basically has two meanings, flexibility and reconfigurability, has become a very important characteristic of a modern manufacturing enterprise. Flexibility is an enterprise's ability to make adjustments according to customers' needs. Reconfigurability is the ability to meet changing demands [2,3]. The ability to respond quickly to market changes, called agility, has been recognised as a key element in the success and survival of enterprises in today's market. In order to keep up with rapid change, enterprises need to change traditional management in this fierce competition. Through dynamic alliance, enterprises exert their own strengths, cooperate faithfully with each other, and compete jointly so as to meet the needs of the fluctuating market, and finally achieve a win-win outcome [2,3]. So how to improve agility in the supply chain, namely flexibility and reconfigurability, is one of the important factors in winning against the competition.

Supply chain management (SCM) is an approach to satisfying the demands of customers for products and services via integrated management of the whole business process, from raw material procurement to the delivery of the product or service to customers. In [4], M.S. Fox et al.
describe the goals and architecture of an integrated supply chain management system (ISCM). In this system, each agent performs one or more supply chain management functions, and coordinates its decisions with other relevant agents. ISCM provides an approach to the real-time performance of supply chain functions. The integration of multi-agent technology and constraint networks for solving the supply chain management problem is proposed in [6]. In [7], Yan et al. develop a multi-agent-based negotiation support system for distributed electric power transmission cost allocation based on the network flow model and the knowledge query and manipulation language (KQML). A KQML-based multi-agent coordination language was proposed in [8,9] for distributed and dynamic supply chain management. However, the coordination mechanisms have not been formally addressed in a multi-agent-based supply chain. In most industries, marketing is becoming more globalised, and the whole business process is being implemented in a complex network of supply chains. Each enterprise or business unit in the SCM represents an independent entity with conflicting and competing product requirements and may possess localised information relevant to its interests. Being aware of this independence, enterprises are regarded as autonomous agents that can decide how to deploy resources under their control to serve their interests.

Int J Adv Manuf Technol (2004) 23:197–203
DOI 10.1007/s00170-003-1626-x
P. Lou (✉) · Z. Zhou
Room 107, D8 Engineering Research Center of Numerical Control System, School of Mechanical Science & Engineering, Huazhong University of Science & Technology, 430074 Wuhan, Hubei, P.R. China
E-mail: louping_98@
Y.-P. Chen · W. Ai
School of Mechanical Science and Engineering, Huazhong University of Science and Technology, 430074 Wuhan, Hubei, P.R. China

This paper first introduces concepts and characteristics of agile supply chains and emphasises the importance of coordination in the supply chain. Then, it presents an architecture of agile supply chain based on
a multi-agent theory and states the agents' functions and responsibilities. Finally, it presents a CBR contract net protocol for coordination and the corresponding algorithm for task allocation in multi-agent-based agile supply chains.

2 Agile supply chain
A supply chain is, in terms of topological structure, a network composed of autonomous or semi-autonomous enterprises. The enterprises all work together for procurement, production, delivery, and so on [10]. There is a main enterprise in the supply chain that is responsible for configuring the supply chain according to the demand information and for achieving supply chain value using fund flow, material flow and information flow as mediums. There are three discontinuous buffers to make the material flow smoothly and to satisfy changes in demand. On the one hand, as every enterprise manages inventory independently, considerable funds are wasted. On the other hand, as the demand information moves upstream, the forecast becomes inaccurate and the response to changes in demand is slow [11]. Accordingly, the key to competitiveness is improving and optimising supply chain management to achieve integrated, automated, and agile supply chain management and to cut costs in the supply chain.

To optimise supply chain management and coordinate the processes for material flow, fund flow and information flow, it is necessary to make material flow smoothly, turn funds over quickly, and keep information integrated. Prompt reconfiguration and coordination is an important characteristic of an agile supply chain as dynamic alliances are composed and decomposed (enterprise reconfiguration). Agile supply chain management can improve the agility of enterprise reconfiguration. The agile supply chain breaks through the traditional line-style organizational structure. With network technology, an enterprise group is formed by a cooperative relationship which includes an enterprise business centre, a production design centre, a supplier, a distribution centre, a bank, a decision-making centre,
etc. It reduces the lead time to market to satisfy customer demand. An agile supply chain, free of temporal and spatial limits, promptly expands the enterprise's scale, market share and resources through allied enterprises. So, a key factor of the agile supply chain is to integrate the heterogeneous information systems adopted in various enterprises. The integrated information system can provide marketing information and supplier details. Feasible inventory levels, the quantity and cycle of stock replenishment, delivery, etc. are designed using the shared information.

It is evident that the agile supply chain is a typical distributed system. A multi-agent system (MAS), which is characterised by flexibility and adaptability, is suitable for an open and dynamic environment. Thus MAS is a good method for agile supply chain management.

3 The concept of agents and MAS
Some people define an agent as any piece of software or object which can perform a specific given task. Presently the prevailing opinion is that an agent must exhibit three important general characteristics: autonomy, adaptation, and cooperation [8,12,13]. Autonomy means that agents have their own agenda of goals and exhibit goal-directed behaviour. Agents are not simply reactive, but can be pro-active and take initiatives as they deem appropriate. Adaptation implies that agents are capable of adapting to the environment, which includes other agents and human users, and can learn from experience in order to improve themselves in a changing environment. Cooperation and coordination between agents are probably the most important feature of MAS.
Unlike stand-alone agents, agents in a MAS collaborate with each other to achieve common goals. In other words, these agents share information, knowledge, and tasks among themselves. The intelligence of MAS is not only reflected by the expertise of individual agents but also exhibited by the emergent collective behaviour beyond individual agents. Of course various agents have different functions, but some functions are needed for each agent. A generic structure of agents that includes two parts is presented: agent kernel and function module. Figure 1 exhibits the generic structure of agents, which is a plug-in model. In Fig. 1, the generic agent includes the following components:

The mailbox handles communication between one agent and the other agents.

The message handler processes incoming messages from the mailbox, orders them according to priority level, and dispatches them to the relevant components of the agent.

The coordination engine makes decisions concerning the agent's goals, e.g. how they should be pursued, when to abandon them, etc., and sends the accepted tasks to the planner/scheduler. It is also responsible for coordinating the agent's interactions with other agents using coordination protocols and strategies.

The planner and scheduler plans the agent's tasks on the basis of decisions made by the coordination engine and on the resources and task specifications available to the agent. If the available resources are insufficient, a message is sent to the coordination engine to find extra resources.

The blackboard provides a shared work area for exchanging information, data, and knowledge among function modules. Every function module is an independent entity. These function modules execute concurrently under the control of the planner/scheduler and collaborate through the blackboard.

The acquaintance database describes one agent's relationships with other agents in the society, and its beliefs about the capabilities of those agents. The coordination engine uses information contained in this database when making
collaborative arrangements with other agents.

The resource database keeps a list of resources (referred to in this paper as facts) that are owned by and available to the agent. The resource database also supports a direct interface to external systems, which allows the interface to dynamically link and utilise a proprietary database.

The ontology database stores the logical definition of each fact type: its legal attributes, the range of legal values for each attribute, any constraints between attribute values, and any relationship between the attributes of that fact and other facts.

The task/plan database provides logical descriptions of planning operators (or tasks) known to the agent.

4 Multi-agent-based agile supply chain management
Multi-agent-based agile supply chain management performs many functions in a tightly coordinated manner. Agents organise supply chain networks dynamically by coordination according to a changing environment, e.g. exchange rates go up and down unpredictably, customers change or cancel orders, materials do not arrive on time, production facilities fail, etc. [2,14]. Each agent performs one or more supply chain functions independently, and each coordinates its actions with other agents. Figure 2 provides the architecture of multi-agent-based agile supply chains. There are two types of agents: functional agents and mediator agents. Functional agents plan and/or control activities in the supply chain. Mediator agents play a system coordinator role by promoting cooperation among agents and providing message services. Mediator agents dispatch the tasks to the functional agents or other mediator agents, and then those functional or mediator agents complete the tasks by coordination. All functional agents coordinate with each other to achieve the goals assigned by mediator agents. The mediator-mediator and mediator-agent communication is asynchronous, and the communication mode can be point-to-point (between two agents), broadcast (one to all agents), or multicast (to
a selected group of agents). Messages are formatted in an extended KQML format. The architecture is characterised by organizational hierarchy and team spirit, simplifying the organisational architecture and reducing the time needed to fulfil the task.

The rest of this section briefly describes each of the mediator agents under development.

Fig. 1 Generic structures of agents

– Customer mediator agent: This agent is responsible for acquiring orders from customers, negotiating with customers about prices, due dates, technical advisory, etc., and handling customer requests for modifying or cancelling respective orders, then sending the order information to a scheduling mediator agent. If a customer request requires a re-design, the information is sent to a design mediator agent, then to a scheduling mediator agent.
– Scheduling mediator agent: This agent is responsible for scheduling and re-scheduling activities in the factory, exploring hypothetical "what-if" scenarios for potential new orders, and generating schedules that are sent to the production mediator agent and logistics mediator agent. The scheduling agent also acts as a coordinator when infeasible situations arise. It has the capability to explore tradeoffs among the various constraints and goals that exist in the plant.
– Logistics mediator agent: This agent is responsible for coordinating the multiple-plant, multiple-supplier, and multiple-distribution-centre domain of the enterprise to achieve the best possible results in terms of supply chain goals, which include on-time delivery, cost minimisation, etc. It manages the movement of products or materials across the supply chain from the supplier of raw materials to the finished product customer.
– Production mediator agent: This agent performs the order release and real-time floor control functions as directed by the scheduling mediator agent. It monitors production operation and facilities. If the production operation is abnormal or a machine breaks down, this agent re-arranges the task
or re-schedules with the scheduling mediator agent.
– Transportation mediator agent: This agent is responsible for the assignment and scheduling of transportation resources in order to satisfy inter-plant movement specified by the logistics mediator agent. It is able to take into account a variety of transportation assets and transportation routes in the construction of its schedules. The goal is to send the right materials on time to the right location as assigned by the logistics mediator agent.
– Inventory mediator agent: There are three inventories at the manufacturing site: raw product inventory, work-in-process inventory, and finished product inventory. This agent is responsible for managing these inventories to satisfy production requirements.
– Supplier mediator agent: This agent is responsible for managing supplier information and choosing suppliers based on requests in the production process.
– Design mediator agent: This agent is responsible for developing new goods and for sending the relevant information to the scheduling mediator agent for scheduling, as well as to the customer mediator agent for providing technological advice.

5 Coordination in a multi-agent-based agile supply chain
Coordination has been defined as the process of managing dependencies between activities [15]. One important characteristic of an agile supply chain is the ability to reconfigure quickly according to changes in the environment. In order to operate efficiently, functional entities in the supply chain must work in a tightly coordinated manner. The supply chain works as a network of cooperating agents, in which each performs one or more supply chain functions, and each coordinates its action with that of other agents [5]. Correspondingly, a supply chain management system (SCMS) becomes a MAS. In this MAS, agents may join the system and leave it according to coordinating processes. With coordination among agents, this MAS achieves the goal of "the right products in the right quantities (at the right location) at the right moment
at minimal cost".

Fig. 2 An architecture of multi-agent based agile supply chain management

5.1 Contract net protocol combined with case-based reasoning
The contract net protocol (CNP) is a negotiation protocol proposed by Smith [15]. In the CNP, every agent is regarded as a node, such as a manager or a contractor. The manager agent (MA) is responsible for decomposing, announcing, and allocating the task and the contractor agent (CA) is responsible for performing the task. This protocol has been widely used for multi-agent negotiation, but its rounds of announcing and bidding impose a high communication load, which makes it inefficient. For this reason, the contract net protocol is combined with case-based reasoning (CBR).

In case-based reasoning, the target case is defined as the problem or instance which is currently being faced, and the base case is a problem or instance stored in the database. CBR searches the database for a base case under the direction of the target case, and then the base case guides the solution of the target case. This method is efficient, but at the very beginning it is very difficult to set up a database which includes all problem-solving cases.

The cases may be depicted as follows:

\[ C = \langle task,\ MA,\ task\text{-}constraint,\ agent\text{-}set \rangle \]

Here, MA is the task manager. Task-constraint represents the various constraint conditions for performing the task, depicted as a vector $\{c_1, c_2, c_3, \ldots, c_m\}$. Agent-set is the set of agents performing the task, defined as follows:

\[ agent\text{-}set = \{ \langle sub\text{-}task_i,\ agent\text{-}id,\ cost,\ time,\ resource \rangle \}, \qquad task = \bigcup_{i=1}^{n} sub\text{-}task_i \]

In the supply chain, the same process in which a certain product moves from the manufacturer to the customer is performed iteratively, so case-based reasoning is very efficient. Consequently, combining the contract net protocol with CBR can avoid a high communication load, thus promoting efficiency. The process can be depicted as follows (Fig. 3).

5.2 The algorithm for task allocation based on CBR contract net protocol
There are two types of agents in the supply chain, cooperative and self-interested agents. Cooperative agents attempt to maximise social welfare, which is the sum of
the agents' utilities. They are willing to take individual losses in service of the good of the society of agents. An example is function agents that come from the same enterprise. In fact, task allocation among cooperative agents is a combinatorial optimisation problem. Self-interested agents seek to maximise their own profit without caring about the others. In such a case, an agent is willing to do other agents' tasks only for compensation [16]. An example is function agents that come from different enterprises.

In the following section the algorithm for task allocation among self-interested agents based on the CBR contract net protocol will be addressed. Before describing the algorithm, some definitions must be clarified:

Task: a task which is performed by one agent or several agents together, $T = \langle task, reward, constraints \rangle$, where task is the set of tasks ($task = \{t_1, t_2, \ldots, t_m\}$), reward is the payoff to the agents that perform the task ($reward = \{r_1, r_2, \ldots, r_m\}$), and constraints refers to the bounding conditions for performing the task ($constraints = \{c_1, c_2, \ldots, c_n\}$).

Agent coalition (AC): a group of agents that perform task T, described as a set $AC = \{agent_i,\ i = 1, 2, \ldots, n\}$.

Efficiency of agent: the efficiency of an agent i is described as follows:

\[ E_i = \frac{reward - cost}{cost} \qquad (1) \]

where reward is the payoff to the agent performing task T, and cost refers to the amount spent on performing the task.
If agent i is not awarded the task, then $E_i = 0$.

Efficiency of agent coalition:

\[ E_{coalition} = \frac{reward - \sum_{i}^{m} cost_i - h}{\sum_{i}^{m} cost_i + h} \qquad (2) \]

where reward is the payoff of the agent coalition performing task T; $cost_i$ refers to the amount spent on performing task $t_i$; and h is the expense of forming the coalition, which is shared by the members of the coalition. If the coalition is not awarded task T, then $E_{coalition} \le 0$.

6 Algorithm:
1. After MA accepts the task $T = \langle task, reward, constraint \rangle$ (task is decomposable), it searches the database.
2. If it finds a corresponding case, it assigns the task or subtask to the related agents according to the case, and the process is over.
3. If no case is found, then the task T is announced to all relevant agents ($agent_i$, $i = 1, 2, \ldots, n$).
4. The relevant agents make bids for the task according to their own states and capabilities. The bid from agent i can be described as follows: $Bid_i = \langle agentid_i, T_i, price_i, condition_i \rangle$, where i denotes the bidding agent ($i = 1, 2, \ldots, h$); $agentid_i$ is the unique agent identifier; $T_i$ is the task set that agent i offers to fulfil; $price_i$ is the recompense of agent i for fulfilling the task $T_i$; and $condition_i$ is the constraint conditions for agent i to fulfil the task $T_i$.
5. If the bids together do not cover the task, i.e. $\bigcup_{1 \le i \le h} T_i \not\supseteq T$, then the task T cannot be performed. Otherwise MA forms every possible combination of the bidding agents, namely a number of agent coalitions (or agent sets) amounting to $N = 2^h - 1$.
6. First MA deletes those agent coalitions in which no agents are able to satisfy the constraint condition. Next the rest of the coalitions are grouped by the number of agents in the coalition and put into set P ($P = \{P_1, P_2, \ldots, P_h\}$) in increasing order of the minimum recompense of the coalitions, where $P_i$ is the set of agent coalitions that include i agents.
7. MA puts the first coalition from each group $P_i$ ($i = 1, 2, \ldots, h$) into set L. If L is empty it goes to (10); otherwise it calculates the minimum recompense of each coalition as follows:

\[ \min \sum_{i}^{m} price_i \cdot T_i \quad \text{s.t.} \quad \sum_{i=1}^{h} T_i \ge T, \quad \sum_{i}^{m} condition_i \le constraint \]

Then it searches for the minimal agent coalition $AC_{min}$ in the set L.
8. MA sends $AC_{min}$ to the relevant agents, namely MA requests that these agents fulfil the task together. The relevant agents calculate $E_{coalition}$ and $E_i$ according to Eqs. 1 and 2. If $E_{coalition} \ge \max_{i}^{m} E_i$, then all agents in $AC_{min}$ accept the proposal to form a coalition to perform the task T together. MA assigns the task to $AC_{min}$, and the process is over. Otherwise it deletes $AC_{min}$ from $P_i$ and returns to (7).
9. If the relevant agents accept the task or subtask, then MA assigns the task to them and the process is over. If some agents cannot accept the subtask and the stated time has not been reached, then it returns to (3); otherwise it goes to (10).
10. The process is terminated (namely the task cannot be performed).

Fig. 3 CBR contract net process

After all processes have been completed, case-based maintenance is required to improve the CBR. Thus efficiency is continuously promoted.

6.1 An example
A simple instantiation of a supply chain simulation is presented here and the negotiating process among agents is shown. In this supply chain instantiation, the transportation mediator agent (TMA) has a transport task T, in which it has to deliver the finished product to the customer within 15 units of time and will pay 1500 monetary units for it, that is $T = \langle t, 1500, 15 \rangle$. Four transport companies can perform task T. Each company is an autonomous agent, which gives four agents: agent A, agent B, agent C and agent D. So the TMA announces the task T to the four agents. Then the four agents make a bid for the task T as shown in Table 1.

So the four agents can form $2^4 - 1$ coalitions (see Fig. 4), which are put into set P. Cooperation between agents in the coalition requires expense, and the expense of forming the coalition grows with the coalition size. This means that expanding the coalition may be non-beneficial. The expense h of each agent in forming a coalition is 100. First, the coalitions in which
no agents can satisfy the constraint conditions are deleted from the set P. The rest of the coalitions are grouped by the number of agents in the coalition and ordered within each group by increasing recompense of the coalition, namely $P_1 = \{B\}$, $P_2 = \{\{A,B\}, \{A,C\}, \{B,C\}, \{A,D\}, \{B,D\}\}$, $P_3 = \{\{A,B,C\}, \{A,B,D\}, \{B,C,D\}\}$, $P_4 = \{\{A,B,C,D\}\}$. Then the cost and efficiency of coalitions {B}, {A,B} and {A,B,C} are calculated as follows:

\[ Price_{\{A,B\}} = \min\,(800 x_1 + 1200 x_2) \quad \text{s.t.} \quad 20 x_1 + 12 x_2 \le 15, \quad x_1 + x_2 \ge 1, \quad x_1 \ge 0,\ x_2 \ge 0 \]

\[ Price_{\{A,B,C\}} = \min\,(800 y_1 + 1200 y_2 + 2000 y_3) \quad \text{s.t.} \quad 20 y_1 + 12 y_2 + 5 y_3 \le 15, \quad y_1 + y_2 + y_3 \ge 1, \quad y_1 \ge 0,\ y_2 \ge 0,\ y_3 \ge 0 \]

Fig. 4 Agent coalition graph

Table 1 The bids of four agents
Agent Id    Price    Conditions
Agent A     800      20
Agent B     1200     12
Agent C     2000     5
Agent D     2500     3

The following result can be obtained: $Price_{\{B\}} = 1200$; $x_1 = 0.3750$, $x_2 = 0.6250$, $Price_{\{A,B\}} = 1050$; and $y_1 = 0.3750$, $y_2 = 0.6250$, $y_3 = 0$. Since $y_3 = 0$, agent C takes no part in the coalition {A,B,C}, so the candidates that can fulfil the task and satisfy the constraint conditions are agent B alone and coalition {A,B}. According to Eqs. 1 and 2, $E_A = 0$ (because TMA does not assign the task to A), $E_B = (1500 - 1200)/1200 = 0.25$, and $E_{\{A,B\}} = (1500 - 1050 - 2 \times 100)/(1050 + 2 \times 100) = 0.2$. Because $E_{\{A,B\}} < \max\{E_A, E_B\}$, agent B does not agree to form a coalition. Therefore, the TMA selects agent B to fulfil the task.

7 Conclusions
In this paper, the concept and characteristics of agile supply chain management are introduced. Dynamic and quick reconfiguration is one of the important characteristics of an agile supply chain, and agile supply chain management is one of the key technologies of agile manufacturing based on dynamic alliances. As an agile supply chain is a typical distributed system, MAS is well suited to this task.

In the architecture of agile supply chain management, the supply chain is managed by a set of intelligent agents that are responsible for one or more activities. In order to realise the agility of supply chains, coordination amongst agents is very
important. Therefore, it can be suggested that the contract net protocol should be combined with case-based reasoning to coordinate among agents.

Acknowledgement The authors would like to acknowledge the funding support from the National Science Fund Committee (NSFC) of China (Grant No. 5991076861).

References
1. Goldman S, Nagel R, Preiss K (1995) Agile competitors and virtual organization. Van Nostrand Reinhold, New York, pp 23–32, pp 158–166
2. Yusuf YY, Sarhadi M, Gunasekaran A (1999) Agile manufacturing: the drivers, concepts and attributes. Int J Prod Econ 62:33–43
3. Gunasekaran A (1999) Agile manufacturing: a framework for research and development. Int J Prod Econ 62:87–105
4. Fox MS, Chionglo JF, Barbuceanu M (1992) Integrated chain management system. Technical report, Enterprise Integration Laboratory, University of Toronto
5. Shen W, Ulieru M, Norrie DH, Kremer R (1999) Implementing the internet enabled supply chain through a collaborative agent system. In: Proceedings of Agents '99 workshop on agent-based decision support for managing the internet-enabled supply-chain, Seattle, pp 55–62
6. Sandholm TW, Lesser VR (1995) On automated contracting in multi-enterprise manufacturing. Advanced Systems and Tools, Edinburgh, Scotland, pp 33–42
7. Beck JC, Fox MS (1994) Supply chain coordination via mediated constraint relaxation. In: Proceedings of the first Canadian workshop on distributed artificial intelligence, Banff, Alberta, 15 May 1994
8. Chen Y, Peng Y, Finin T, Labrou Y, Cost R, Chu B, Sun R, Willhelm R (1999) A negotiation-based multi-agent system for supply chain management. In: Working notes of the ACM autonomous agents workshop on agent-based decision-support for managing the internet-enabled supply-chain, 4:1–7
9. Wooldridge M, Jennings NR (1995) Intelligent agents: theory and practice. Knowl Eng Rev 10(2):115–152
10. Barbuceanu M, Fox MS (1997) The design of a coordination language for multi-agent systems. In: Muller JP, Wooldridge MJ, Jennings NR (eds) Intelligent agent III: agents theories, architecture and languages (Lecture notes in artificial intelligence), Springer, Berlin Heidelberg New York, pp 341–357
11. Lee HL, Padmanabhan V, Whang S (1997) The Bullwhip effect in supply chains. Sloan Manag Rev 38(4):93–102
12. Yung S, Yang C (1999) A new approach to solve supply chain management problem by integrating multi-agent technology and constraint network. HICSS-32
13. Yan Y, Yen J, Bui T (2000) A multi-agent based negotiation support system for distributed transmission cost allocation. HICSS-33
14. Nwana H (1996) Software agents: an overview. Knowl Eng Rev 11(3):1–40
15. Smith RG (1980) Contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans Comput 29(12):1104–1113
16. Barbuceanu M, Fox MS (1996) Coordinating multiple agents in the supply chain. In: Proceedings of the fifth workshop on enabling technology for collaborative enterprises (WET ICE '96). IEEE Computer Society Press, pp 134–141
17. Jennings NR, Faratin P, Norman TJ, O'Brien P, Odgers B (2000) Autonomous agents for business process management. Int J Appl Artif Intell 14(2):145–18
18. Malone TW, Crowston K (1991) Toward an interdisciplinary theory of coordination. Center for coordination science technical report 120, MIT Sloan School
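
The arithmetic of the worked example in Sect. 6.1 of the supply chain paper above can be checked with a short script. This is only a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the coalition expense passed to Eq. 2 is the total 2 × 100 used in the example, and instead of a general LP solver the code enumerates the only possible optima of this particular linear programme (a single agent, or a mix of two agents that meets the time limit exactly), which holds when all prices and conditions are positive.

```python
from itertools import combinations


def agent_efficiency(reward, cost):
    """Eq. 1: efficiency of a single agent that is awarded the task."""
    return (reward - cost) / cost


def coalition_efficiency(reward, total_cost, formation_expense):
    """Eq. 2: efficiency of a coalition; formation_expense plays the role of h."""
    return (reward - total_cost - formation_expense) / (total_cost + formation_expense)


def min_coalition_price(bids, limit):
    """Cheapest way to split one task among the bidding agents.

    bids maps agent id -> (price, condition). The function minimises
    sum(price_i * x_i) subject to sum(condition_i * x_i) <= limit,
    sum(x_i) = 1 and x_i >= 0. With positive prices and conditions the
    optimum is either one agent alone or two agents mixed so that the
    limit is met exactly, so it is enough to enumerate those candidates.
    Returns (price, shares), or None if nobody can meet the limit.
    """
    candidates = []
    # Candidate type 1: one agent performs the whole task.
    for agent, (price, cond) in bids.items():
        if cond <= limit:
            candidates.append((price, {agent: 1.0}))
    # Candidate type 2: one agent above the limit mixed with one below it.
    for (a, (pa, ca)), (b, (pb, cb)) in combinations(bids.items(), 2):
        if (ca - limit) * (cb - limit) < 0:
            share_a = (limit - cb) / (ca - cb)  # makes the mixed condition equal limit
            price = share_a * pa + (1.0 - share_a) * pb
            candidates.append((price, {a: share_a, b: 1.0 - share_a}))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])


# Bids from Table 1 (price, condition); reward 1500, time limit 15.
bids = {"A": (800, 20), "B": (1200, 12), "C": (2000, 5), "D": (2500, 3)}
price_ab, shares_ab = min_coalition_price({k: bids[k] for k in "AB"}, 15)
e_b = agent_efficiency(1500, bids["B"][0])
e_ab = coalition_efficiency(1500, price_ab, 2 * 100)
print(price_ab, shares_ab, e_b, e_ab)
```

Under these assumptions the script gives a price of 1050 for coalition {A,B} with shares 0.375 and 0.625, and efficiencies 0.25 for agent B and 0.2 for the coalition, in line with the values reported in the example, so agent B has no incentive to join.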
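Research on a Multi-Agent-Based Traceability System for Agricultural Product Quality and Safety

[Keywords] Agent; multi-agent; traceability system

As living standards continue to rise, the quality and safety of agricultural products have become an increasingly prominent problem. Quality problems with agricultural products are frequently exposed by the media, which has drawn even more attention to the traceability of the production and circulation of agricultural products. To support food safety monitoring and traceability systems and to realise a whole-process monitoring system "from field to table", the construction of traceability for agricultural products supplied in large volumes has entered the implementation stage. At present, China has carried out research on and application of traceability systems in areas such as vegetables, meat and aquatic products. The International Organization for Standardization (ISO) defines traceability as the ability to trace the history, application and location of an entity by means of recorded identification. Since 2001, Japan has been trialling and promoting traceability systems for agricultural products and foods of animal origin; in 2003 Japan began to apply a traceability system to beef products, and in the same year the United States Department of Agriculture began to build a livestock traceability system. In 2005, Japan applied a traceability system to all agricultural products, including meat and vegetables, marketed through the national agricultural cooperatives; the United States began to enforce the interim rule on country-of-origin labelling of fish and shellfish products on 4 April 2005; and South Korea began to implement a comprehensive agricultural product traceability programme nationwide in January 2006.

China began to study traceability systems for agricultural product supply chains in 2000. Yuan Kanglai (袁康来) et al. proposed that traceability is a preventive strategy for food quality and safety management and a basic element of agri-food supply chain management systems. In 2006, Beijing proposed to establish and improve a whole-process control system for agricultural product safety "from farmland to table", centred on the Green Olympics, a trustworthy consumption environment, and raising the level of agricultural product safety control. The Beijing vegetable traceability system is based on production records, uses IC cards as the management tool, traceability labels as the carrier of the vegetable traceability code, and a query service system as the platform; 39 production and distribution enterprises take part in its roll-out, more than 120 vegetable varieties carry traceability codes, and there are 170 retail terminals. Many Chinese universities are also actively working on traceability systems, among them China Agricultural University, Fudan University and Nanjing Agricultural University. In 2005, Jiu Shusen (咎树森), Zheng Tongchao (郑同超) et al. designed and developed the "whole-process quality tracking and traceability information system for safe beef production", the first information management system in China to track and trace quality throughout beef production and processing; however, the system runs on a single machine, and its networked management functions remain to be developed.

Existing traceability systems all combine information system technology with coding technology and, provided that identifiers are unique, can effectively trace information on the relevant links of the supply chain. However, taking into account the factors that may affect agricultural product safety, such as production, environment, management and human behaviour, platform support is needed from a technology that can integrate multiple techniques, make real-time diagnoses according to changes in the surrounding environment, and adjust itself autonomously and make decisions according to the system goals. Multi-Agent technology has exactly these characteristics, and applying it to agricultural product quality and safety traceability will fill a gap in this area.

2 Architecture of Multi-Agent systems
The architecture of a Multi-Agent system is a set of rules that defines the elements of the architecture, the relationships between those elements, and the constraints on them. There are three basic schemes for the architecture of a Multi-Agent system:
(1) Centralised structure (blackboard structure): multiple Agents communicate with each other and share resources through a centralised information centre or a common resource-sharing area;
(2) Decentralised structure: there is no centralised information exchange centre or resource-sharing area; instead, Agents exchange messages directly according to a communication protocol, with decentralised coordination and communication;
(3) Hybrid structure: an information structure that combines the centralised and decentralised schemes; a hierarchical information structure can be used.

In a Multi-Agent system the Agents are autonomous. They may be different individuals or organisations, developed with different design methods and programming languages, and may therefore be completely heterogeneous. There is no global data and no global control. It is an open system that Agents are free to join and leave. The Agents in the system cooperate and coordinate their capabilities and goals to solve problems that a single Agent cannot solve.

3 Research on a Multi-Agent-based traceability system model for agricultural product quality and safety
From "field to table", agricultural products pass through many links such as production, processing and transport. Every link contains factors that can cause quality and safety problems. In traditional quality tracing, when a quality problem appears, the tracing party works back step by step from the consumer towards the producer until it finds the link where the problem occurred and brings it under control, but by then the quality problem is already irreversible. Existing quality traceability systems are generally based on RFID technology, which can collect, store and transmit information along the whole supply chain. However, once an agricultural product safety incident occurs, such a system can only trace back to the relevant link; it cannot diagnose sudden quality and safety incidents or detect them in real time.

Multi-Agent-based agricultural product quality and safety traceability instead controls the information of every link starting from the source; its structure is shown in Fig. 1.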
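Research and Implementation of a Multi-Agent-Based Bank Reserve Model
Journal of Shanghai Finance University
No. 4, 2013
Apr No. 118
... methods are applied to the study of complexity in financial markets. Computer simulation is used to model a simple market, the implementation of a Multi-Agent-based bank reserve model is discussed, and the simulation data are analyzed. Finally, the shortcomings of the system and the areas that need improvement are analyzed, together with an outlook on the future of computational finance. Keywords: bank reserves; Agent; computational finance
... expectation rules (for example, using a genetic algorithm (GA) or an artificial neural network (ANN) algorithm to predict the asset's future
Received: 2013-07-03
About the author: Zhang Gaoyu (born 1972), male, from Fang County, Hubei; associate professor at Shanghai Finance University; postdoctoral researcher.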
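II. Bank Reserves
For safety, a bank must set aside a certain proportion of its deposits to guarantee depositors' withdrawals; only the remaining deposits can be used for lending or investment.
The deposits set aside are called bank reserves. The reserve ratio is determined by two factors: the liquidity requirement and the profitability requirement.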
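The definition above amounts to a simple split of deposits, sketched below. The function name `split_deposits` and the 25% ratio are illustrative assumptions, not values from the paper.

```python
from typing import Tuple


def split_deposits(deposits: float, reserve_ratio: float) -> Tuple[float, float]:
    """Split deposits into reserves held back and funds left for lending or investment.

    reserve_ratio is a hypothetical parameter: the fraction of deposits set aside.
    """
    if not 0.0 <= reserve_ratio <= 1.0:
        raise ValueError("reserve_ratio must lie in [0, 1]")
    reserves = deposits * reserve_ratio
    return reserves, deposits - reserves


# Illustrative numbers only.
reserves, loanable = split_deposits(1000.0, 0.25)
```

A higher ratio favors liquidity (more cash on hand for withdrawals), while a lower ratio favors profitability (more funds earning interest), which is the trade-off the text describes.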
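The liquidity requirement is the bank's ability to meet customer withdrawals at any time and to satisfy necessary loan demand. To simplify the analysis, we consider only two kinds of assets: loans and reserves. Because the bank has to deal with the public and other financial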
Application of a Multi-Agent Hybrid Genetic Algorithm to Lithological Parameter Inversion
Figure 3. Density inversion results
F )一 f( 一 ) ( =、 , R , / o ∑ ) r
—— ( ) 式3
4 结 论
最后 算法用于三层样点理 论模 型的A A V 反演 ,每 层的样
点数 为 5 ,设 置 算 法 计 算 参 数 , 入 射 角 在 0 3 度 的 范 围 个 ~ 0 内 ,正 演 H五 层 样 点 理 论 模 型 P P 的 反 射 系 数 ,然 后 利 用 J —波
transmission coefficients of plane longitudinal and transverse waves [J]. Geophysical Prospecting, 1961, 9: 485
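random numbers with a uniform probability distribution.
The multi-agent genetic algorithm (Multi-Agent Genetic Algorithm,
MAGA) is an evolutionary algorithm for optimization problems that combines the genetic algorithm, which simulates natural biological evolution, with agents. An agent can sense its surroundings and respond to changes in them; therefore, when the multi-agent genetic algorithm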
10 ,模拟 退火算 子 中的初始温 度为9 度 ,降温 的方式 幅 0代 O
度 为 0 9 ,迭 代 次 数 为 5 代 , 对 每 个 函 数各 运 行 3 次 ,求 其 .5 0 0 函数 的平均值 ,实验 结果如表 1 ,三 个 测 试 函 数 的全 局 最 优 值 均 为 0 从 表 中可 以看 出M H A 测 试 f ̄ f 计 算 性 能 上 要 , AG在 lU 2 明 显 好 于 微 分 进 化 算 法 (i fr n i l E o u i n D ) 但 D f e e t a v lt o , E ,
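Research on a Collaborative Aircraft Fault Diagnosis Model Based on Multi-Agent Theory
2017, Vol. 24, No. 7
Lu Jianghua (1), Xu Guiqiang (2)
(1. School of Aeronautical Engineering, Chengdu Aeronautic Polytechnic, Chengdu, Sichuan 610100; 2. Technical Engineering Office, Chengdu Airlines Co., Ltd., Chengdu, Sichuan 610200)
Abstract: With the rapid development of China's civil aviation industry, ensuring flight safety has become an increasingly important problem [1].
The key to solving it is to analyze and diagnose faults promptly and accurately.
Based on the practical requirements of remote aircraft fault diagnosis and the problems in current role-based collaborative diagnosis models, this paper applies Multi-Agent theory in an exploratory study of collaborative diagnosis of remote faults in civil aircraft.
Keywords: aircraft fault diagnosis; Multi-Agent system; collaboration mechanism; UML collaboration diagram
doi: 10.3969/j.issn.1006-8554.2017.07.002

1 Passive collaboration mechanism based on Multi-Agent
To address the problems of the role-based collaborative aircraft fault diagnosis model, the Multi-Agent idea is introduced. The function and structure of each entity taking part in diagnosis are defined and encapsulated as a diagnostic Agent, and the study focuses on the collaboration mechanism among Agents and on the interaction between Agents and the collaborative environment [2, 3].

1.1 Functions of the diagnostic Agent
In a Multi-Agent collaborative diagnosis environment, every entity taking part in diagnosis can be abstracted as a diagnostic Agent.
According to the practical requirements of aircraft fault diagnosis, the functions of the diagnostic Agent are as shown in Figure 1.
[Figure 1. Functions of the diagnostic Agent: real-time monitoring, knowledge acquisition, fault diagnosis, data maintenance, collaborative diagnosis; fault submission, process monitoring, decision submission, decision evaluation]
1) Real-time monitoring of aircraft operating state data: the user can observe the operating state data in real time, and when abnormal data appear, the diagnostic Agent's real-time early-warning mechanism alerts the user.
2) Acquisition of feature information from aircraft fault data: the diagnostic Agent preprocesses aircraft fault data and, through a series of module operations, extracts the key feature information from the data.
This is the necessary preparation for the subsequent analysis and diagnosis of fault information.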
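Function 1) above, real-time monitoring with an early-warning prompt, can be sketched as follows. This is only a minimal illustration under stated assumptions: the class name `DiagnosticAgent`, the parameter names `egt_c` and `oil_psi`, and the numeric limits are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DiagnosticAgent:
    # Hypothetical normal operating ranges: parameter name -> (low, high).
    limits: Dict[str, Tuple[float, float]]
    # Every alert raised so far, kept for later diagnosis steps.
    alerts: List[str] = field(default_factory=list)

    def monitor(self, sample: Dict[str, float]) -> List[str]:
        """Check one sample of state data and return the alerts it raises."""
        raised = []
        for name, value in sample.items():
            low, high = self.limits.get(name, (float("-inf"), float("inf")))
            if not low <= value <= high:
                raised.append(f"{name}={value} outside [{low}, {high}]")
        self.alerts.extend(raised)
        return raised


# Illustrative parameters and limits only.
agent = DiagnosticAgent(limits={"egt_c": (300.0, 900.0), "oil_psi": (25.0, 90.0)})
normal = agent.monitor({"egt_c": 650.0, "oil_psi": 40.0})
abnormal = agent.monitor({"egt_c": 950.0, "oil_psi": 40.0})
```

Keeping the alert history on the agent is one possible design choice; it lets later functions such as fault submission read what the monitoring step already flagged.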
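Quasi-Monte Carlo Gaussian Particle Filter Based Target Tracking for Multiple Passive Sensors

GUO Hui, JI Hongbing, WU Bin
(School of Electronic Engineering, Xidian Univ., Xi'an 710071, China)

Journal of Xidian University (Natural Science Edition), Dec. 2010, Vol. 37, No. 6
Received: 2009-12-22. Foundation item: National Natural Science Foundation of China (60871074). About the author: Guo Hui (born 1985), male, PhD candidate at Xidian University, E-mail: gh ui xd@.
doi: 10.3969/j.issn.1001-2400.2010.06.011
CLC number: TN953. Document code: A. Article ID: 1001-2400(2010)06-1042-06

Abstract: This paper employs Quasi-Monte Carlo (QMC) sampling to replace conventional Monte Carlo (MC) sampling, thus improving the performance of the Gaussian Particle Filter (GPF). A multiple passive sensor target tracking algorithm based on the Quasi-Monte Carlo Gaussian Particle Filter (QMC-GPF) is proposed in connection with the multi-sensor centralized fusion strategy, which handles the strong nonlinearity and weak observability of passive tracking more effectively. The algorithm reduces the computational complexity while improving the accuracy and stability of tracking, so that it converges quickly. Moreover, its parallel structure makes it easier to realize with very-large-scale integrated circuits.
Key Words: passive sensors; Quasi-Monte Carlo Gaussian particle filter; target tracking

Passive multi-sensor target tracking is essentially a nonlinear tracking problem, and applying nonlinear filtering to it has become an active research topic. The most typical nonlinear filters are the extended Kalman filter (EKF) [1] and the unscented Kalman filter (UKF) [2]. Passive-sensor tracking is highly nonlinear, and because the EKF requires linearization, the system error grows and the filter can become unstable or even diverge. The UKF approximates the probability distribution of the state with a finite number of deterministic sample points, propagates them through the system equations, and then updates the target state; this avoids the error caused by local linearization and gives higher filtering accuracy. The better-performing quadrature Kalman filter (QKF) [3-4] and cubature Kalman filter (CKF) [5] have also been proposed, but all of these methods are limited by the Gaussian assumption and suit only mildly nonlinear cases, which restricts their use where high tracking accuracy is required. Particle filtering based on Bayesian theory, which has emerged in recent years, has attracted growing attention for its superior performance on nonlinear, non-Gaussian systems, and particle-filter tracking algorithms of many forms have appeared [6-7]. The Gaussian particle filter [8] is a variant of the particle filter. Because it needs no resampling, it overcomes sample impoverishment, and because it does not store samples during computation, it reduces computational complexity. It assumes that the predictive and posterior probability densities of the system state can each be approximated by a unimodal Gaussian density, and it uses Monte Carlo numerical integration to compute the mean and covariance of the updated state; the randomness of Monte Carlo sampling, however, still leads to a heavy computational load. The authors adopt Quasi-Monte Carlo (QMC) numerical integration [9], which replaces the random points of Monte Carlo integration with carefully chosen deterministic points. These explore the sample space more uniformly and further reduce the computational complexity, which improves the performance of the Gaussian particle filter (GPF). The authors combine the QMC-GPF algorithm with the multi-sensor centralized fusion strategy and propose a QMC-GPF-based passive multi-sensor target tracking algorithm. It handles the strong nonlinearity and weak observability of passive sensor systems well, reduces the computational load while improving the accuracy and stability of passive tracking, and converges quickly.

1 System model

The authors mainly consider passive multi-sensor tracking of a single target in Cartesian coordinates. The target moves in three-dimensional space; $(x_{s_i}, y_{s_i}, z_{s_i})$, $i = 1, 2, \ldots, M$, is the position of the $i$-th passive sensor, $(x_k, y_k, z_k)$ is the target position at time $k$, and $\theta$ and $\varphi$ denote the elevation and azimuth of the target relative to a sensor. The system state consists of position, velocity and acceleration in the $x$, $y$ and $z$ directions, giving the discrete-time state equation

$$\mathbf{x}_{k+1} = F\,\mathbf{x}_k + G\,\mathbf{w}_k , \tag{1}$$

where $\mathbf{x}_k = [x_k, y_k, z_k, v^x_k, v^y_k, v^z_k, a^x_k, a^y_k, a^z_k]^{\mathrm T}$ is the target state at time $k$, $F$ is the one-step transition matrix, $G$ is the model noise transition matrix, $\mathbf{w}_k = [w^x_k, w^y_k, w^z_k]^{\mathrm T}$ is the model position noise, and the noise covariance matrix is $Q_k = \operatorname{cov}[G\mathbf{w}_k]$. Parameter settings for different models can be found in [10]. Let $\theta^{s_i}_k$ and $\varphi^{s_i}_k$ denote the elevation and azimuth of the target relative to the $i$-th observation station at time $k$, defined as

$$\theta^{s_i}_k = \arctan\frac{z_k - z_{s_i}}{\left[(x_k - x_{s_i})^2 + (y_k - y_{s_i})^2\right]^{1/2}} , \qquad \varphi^{s_i}_k = \arctan\frac{x_k - x_{s_i}}{y_k - y_{s_i}} . \tag{2}$$

The observation equation of the $i$-th passive station is

$$\mathbf{z}^{s_i}_k = h_{s_i}(\mathbf{x}_k) + \mathbf{v}^{s_i}_k = (\theta^{s_i}_k, \varphi^{s_i}_k)^{\mathrm T} + \mathbf{v}^{s_i}_k , \quad i = 1, 2, \ldots, M , \tag{3}$$

where $\mathbf{v}^{s_i}_k$ is the observation noise, Gaussian with zero mean and covariance $R^{s_i}_k = \operatorname{diag}(\sigma^2_{s_i}, \sigma^2_{s_i})$. The observation set at time $k$ is therefore $\mathbf{z}_k = [\mathbf{z}^{s_1}_k, \ldots, \mathbf{z}^{s_M}_k]^{\mathrm T} = [(\theta^{s_1}_k, \varphi^{s_1}_k)^{\mathrm T}, \ldots, (\theta^{s_M}_k, \varphi^{s_M}_k)^{\mathrm T}]^{\mathrm T}$, and the covariance matrix of the system is

$$R_k = \operatorname{diag}\!\left(R^{s_1}_k, \ldots, R^{s_M}_k\right) . \tag{4}$$

Equations (2) and (3) show that the relation between state and observation is strongly nonlinear. When the classical EKF and UKF are applied to such problems, the system error is large and the estimation performance is poor. The authors introduce an improved particle filter that yields the asymptotically optimal state estimate in the Bayesian sense.

2 Passive multi-sensor target tracking algorithm based on QMC-GPF

2.1 Quasi-Monte Carlo Gaussian particle filter

Let the target states up to time $k$ be $\mathbf{x}_{0:k} = \{\mathbf{x}_0, \ldots, \mathbf{x}_k\}$ and the observations be $\mathbf{z}_{0:k} = \{\mathbf{z}_0, \ldots, \mathbf{z}_k\}$. In the Bayesian framework, given the initial distribution $p(\mathbf{x}_0)$ and all observations up to time $k$, the posterior distribution $p(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ and the predictive distribution $p(\mathbf{x}_{k+1} \mid \mathbf{z}_{0:k})$ are estimated recursively. The measurement update and time update of the optimal Bayesian estimator are

$$p(\mathbf{x}_k \mid \mathbf{z}_{0:k}) = C_k\, p(\mathbf{z}_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{z}_{0:k-1}) , \tag{5}$$

$$p(\mathbf{x}_{k+1} \mid \mathbf{z}_{0:k}) = \int p(\mathbf{x}_{k+1} \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{z}_{0:k})\, \mathrm{d}\mathbf{x}_k , \tag{6}$$

where $C_k$ is the normalizing constant

$$C_k = \left(\int p(\mathbf{x}_k \mid \mathbf{z}_{0:k-1})\, p(\mathbf{z}_k \mid \mathbf{x}_k)\, \mathrm{d}\mathbf{x}_k\right)^{-1} . \tag{7}$$

The Gaussian particle filter approximates both the filtering distribution $p(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ and the predictive distribution $p(\mathbf{x}_{k+1} \mid \mathbf{z}_{0:k})$ by a unimodal Gaussian and computes their means and covariances by Monte Carlo sampling integration. Within the Gaussian particle filter framework, replacing the conventional random Monte Carlo samples with samples obtained by QMC sampling gives the QMC-GPF algorithm. It has been proved that the algorithm is asymptotically optimal when the posterior density can be approximated by a Gaussian [8].

The Quasi-Monte Carlo method replaces the random points of Monte Carlo sampling with carefully chosen deterministic sample points, obtained from a low-discrepancy sequence through some transformation. With an optimal low-discrepancy sequence the integral converges at order $O((\log N)^d / N)$, whereas for the same dimension $d$ Monte Carlo integration converges at order $O(N^{-1/2})$ [9]. QMC can therefore reach, with few samples, the accuracy for which Monte Carlo needs many samples, which reduces the computational load. Miodrag [11] designed parallel hardware for the GPF. The proposed algorithm only replaces the conventional Monte Carlo sampling in the GPF with QMC sampling and does not change the filter structure, so it has the same parallel structure and is suited to very-large-scale integrated circuit implementation.

QMC-GPF is now introduced into the passive multi-sensor tracking system to track the target. The main problem is how to update the mean and covariance of the state from the sensors' observation set during filtering. The authors use the centralized fusion strategy: all observations are passed to a fusion node and filtered together, that is, the weights are computed from all observations at time $k$ and then used to update the target state and covariance. Assuming that spatial and temporal registration has already been performed, the weights in the QMC-GPF filtering process for the passive multi-sensor system are derived below.

Assume the system is a first-order Markov process and the sensors' observations are mutually independent. By Bayes' theorem the posterior density $p(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ can be written as

$$\begin{aligned} p(\mathbf{x}_k \mid \mathbf{z}_{0:k}) &= \frac{p(\mathbf{z}_{0:k} \mid \mathbf{x}_k)\, p(\mathbf{x}_k)}{p(\mathbf{z}_{0:k})} = \frac{p(\mathbf{z}_k \mid \mathbf{z}_{0:k-1}, \mathbf{x}_k)\, p(\mathbf{z}_{0:k-1} \mid \mathbf{x}_k)\, p(\mathbf{x}_k)}{p(\mathbf{z}_k \mid \mathbf{z}_{0:k-1})\, p(\mathbf{z}_{0:k-1})} \\ &= \frac{p(\mathbf{z}_k \mid \mathbf{z}_{0:k-1}, \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{z}_{0:k-1})\, p(\mathbf{z}_{0:k-1})\, p(\mathbf{x}_k)}{p(\mathbf{z}_k \mid \mathbf{z}_{0:k-1})\, p(\mathbf{z}_{0:k-1})\, p(\mathbf{x}_k)} \\ &= \frac{p(\mathbf{z}_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{z}_{0:k-1})}{p(\mathbf{z}_k \mid \mathbf{z}_{0:k-1})} \approx \sum_{i=1}^{N} \omega^{(i)}_k\, \delta(\mathbf{x}_k - \mathbf{x}^{(i)}_k) , \end{aligned} \tag{8}$$

where $\delta(\cdot)$ is the Dirac delta function and $\omega^{(i)}_k$ is the weight of the $i$-th particle. Because the true posterior distribution is hard to obtain, an importance density $q(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ is constructed. Sampling from the importance density gives the particle weights

$$\omega^{(i)}_k = \frac{p(\mathbf{z}_k \mid \mathbf{x}^{(i)}_k)\, p(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k-1})}{q(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k})} . \tag{9}$$

By the QMC-GPF principle, the predictive distribution $p(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k-1})$ can be approximated by a unimodal Gaussian, so (9) becomes

$$\begin{aligned} \omega^{(i)}_k &= \frac{p(\mathbf{z}_k \mid \mathbf{x}^{(i)}_k)\, \mathcal{N}(\mathbf{x}_k = \mathbf{x}^{(i)}_k;\, \hat{\mathbf{x}}_{k|k-1}, \hat{P}_{k|k-1})}{q(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k})} = \frac{p(\mathbf{z}^{s_1}_k, \ldots, \mathbf{z}^{s_M}_k \mid \mathbf{x}^{(i)}_k)\, \mathcal{N}(\mathbf{x}_k = \mathbf{x}^{(i)}_k;\, \hat{\mathbf{x}}_{k|k-1}, \hat{P}_{k|k-1})}{q(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k})} \\ &= \frac{\prod_{j=1}^{M} p(\mathbf{z}^{s_j}_k \mid \mathbf{x}^{(i)}_k)\, \mathcal{N}(\mathbf{x}_k = \mathbf{x}^{(i)}_k;\, \hat{\mathbf{x}}_{k|k-1}, \hat{P}_{k|k-1})}{q(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k})} , \end{aligned} \tag{10}$$

where $\hat{\mathbf{x}}_{k|k-1}$ and $\hat{P}_{k|k-1}$ are the mean and covariance of $p(\mathbf{x}^{(i)}_k \mid \mathbf{z}_{0:k-1})$. The choice of the importance density $q(\mathbf{x}_k \mid \mathbf{z}_{0:k})$ depends on the problem. For QMC-GPF, a simple choice is $q(\mathbf{x}_k \mid \mathbf{z}_{0:k}) = p(\mathbf{x}_k \mid \mathbf{z}_{0:k-1}) = \mathcal{N}(\mathbf{x}_k;\, \hat{\mathbf{x}}_{k|k-1}, \hat{P}_{k|k-1})$, which gives

$$\begin{aligned} \omega^{(i)}_k &= \prod_{j=1}^{M} p(\mathbf{z}^{s_j}_k \mid \mathbf{x}^{(i)}_k) = \prod_{j=1}^{M} (2\pi)^{-1} \left|R^{s_j}_k\right|^{-1/2} \exp\!\left\{-\tfrac12 (\hat{\mathbf{z}}^{s_j}_k - \mathbf{z}^{s_j}_k)^{\mathrm T} (R^{s_j}_k)^{-1} (\hat{\mathbf{z}}^{s_j}_k - \mathbf{z}^{s_j}_k)\right\} \\ &= (2\pi)^{-M} \left|R_k\right|^{-1/2} \exp\!\left\{-\tfrac12 (\hat{\mathbf{z}}_k - \mathbf{z}_k)^{\mathrm T} (R_k)^{-1} (\hat{\mathbf{z}}_k - \mathbf{z}_k)\right\} . \end{aligned} \tag{11}$$

2.2 Algorithm flow

Combining the multi-sensor centralized fusion strategy, the authors propose a QMC-GPF-based passive multi-sensor target tracking algorithm. The steps are as follows.
(1) Initialization. Set $k = 1$; with initial target state $\mathbf{x}_0$ and covariance $P_0$, let $\hat{\mathbf{x}}_{1|0} = \mathbf{x}_0$ and $\hat{P}_{1|0} = P_0$.
(2) QMC sampling. Generate $N$ quasi-Gaussian points with mean $\hat{\mathbf{x}}_{k|k-1}$ and covariance $\hat{P}_{k|k-1}$: $\mathbf{x}^{(i)}_{k|k-1} \sim \mathcal{N}(\hat{\mathbf{x}}_{k|k-1}, \hat{P}_{k|k-1})$, $i = 1, 2, \ldots, N$.
(3) Propagate the quasi-Gaussian points. From the observation equation (3), obtain $\hat{\mathbf{z}}_k = \{\hat{\mathbf{z}}^{s_j}_k,\ j = 1, 2, \ldots, M\}$ with $\hat{\mathbf{z}}^{s_j}_k = h_{s_j}(\mathbf{x}^{(i)}_{k|k-1})$, $i = 1, 2, \ldots, N$.
(4) Compute the weights. For each particle $i = 1, 2, \ldots, N$, compute $\omega^{(i)}_k$ from the observation set $\mathbf{z}_k$ at time $k$ using (11).
(5) Normalize the weights: $\tilde{\omega}^{(i)}_k = \omega^{(i)}_k \big/ \sum_{l=1}^{N} \omega^{(l)}_k$.
(6) Estimate the state mean and covariance: $\hat{\mathbf{x}}_{k|k} = \sum_{i=1}^{N} \tilde{\omega}^{(i)}_k \mathbf{x}^{(i)}_{k|k-1}$, $\hat{P}_{k|k} = \sum_{i=1}^{N} \tilde{\omega}^{(i)}_k (\hat{\mathbf{x}}_{k|k} - \mathbf{x}^{(i)}_{k|k-1})(\hat{\mathbf{x}}_{k|k} - \mathbf{x}^{(i)}_{k|k-1})^{\mathrm T}$.
(7) QMC sampling. Generate $N$ quasi-Gaussian points with mean $\hat{\mathbf{x}}_{k|k}$ and covariance $\hat{P}_{k|k}$: $\mathbf{x}^{(i)}_{k|k} \sim \mathcal{N}(\hat{\mathbf{x}}_{k|k}, \hat{P}_{k|k})$, $i = 1, 2, \ldots, N$.
(8) Propagate the quasi-Gaussian points. From the state equation (1), obtain $\{\mathbf{x}^{(i)}_{k+1|k},\ i = 1, 2, \ldots, N\}$ with $\mathbf{x}^{(i)}_{k+1|k} = F\mathbf{x}^{(i)}_{k|k} + G\mathbf{w}_k$.
(9) Predict the state mean and covariance: $\hat{\mathbf{x}}_{k+1|k} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}^{(i)}_{k+1|k}$, $\hat{P}_{k+1|k} = \frac{1}{N}\sum_{i=1}^{N} (\hat{\mathbf{x}}_{k+1|k} - \mathbf{x}^{(i)}_{k+1|k})(\hat{\mathbf{x}}_{k+1|k} - \mathbf{x}^{(i)}_{k+1|k})^{\mathrm T}$.
(10) If $k$ equals the observation length, stop; otherwise set $k = k + 1$ and go to (2).

3 Simulation experiments and analysis

The experiments use three stationary passive observation stations in three-dimensional space, located at (0 km, 0 km, 0 km), (0 km, 10 km, 0 km) and (0 km, 0 km, $5 \times 3^{1/2}$ km). The angle measurement error of each station is $\sigma_{s_i} = 1$ mrad, $i = 1, 2, 3$. The actual measurement $\mathbf{z}^{s_i}_k$ of each station at time $k$ is obtained by adding mutually independent zero-mean Gaussian white noise of intensity 1 mrad to the observation $(\theta^{s_i}_k, \varphi^{s_i}_k)^{\mathrm T}$. The position root-mean-square error (RMSE) at time $k$ over $L$ runs is defined as

$$E^{\mathrm{RMSE}}_k = \left[\frac{1}{L}\sum_{l=1}^{L}\left((\hat{x}^{\,l}_k - x_k)^2 + (\hat{y}^{\,l}_k - y_k)^2 + (\hat{z}^{\,l}_k - z_k)^2\right)\right]^{1/2} . \tag{12}$$

Experiment 1. Using the constant-velocity straight-line motion model in three-dimensional space [10], the QMC-GPF-based passive multi-sensor tracking algorithm is compared with the PF-based and GPF-based algorithms. The target position noise intensity is $\sigma_w = 1 \times 10^{-3}$ km; the initial target position is (29.98 km, 29.98 km, 10.00 km), which deviates from the true position by 20 m in both the $x$ and $y$ directions; the velocity is (-0.3 km/s, -0.4 km/s, 0.0 km/s); the sampling period is $T_s = 1$ s; and the observation length is 80 s. Each of the three algorithms is run in 100 Monte Carlo trials.

[Figure 1. Tracking performance of the three algorithms (QMC-GPF, PF, GPF) for 100, 200, 300 and 500 particles]

In terms of tracking accuracy, with 100 particles the position RMSE of the QMC-GPF-based algorithm is clearly smaller than that of the other two. As the number of particles grows, the PF improves markedly and the GPF also improves, while the QMC-GPF performance hardly changes. The reason is that conventional Monte Carlo sampling produces "gaps and clusters", which can only be offset by adding particles, so many particles are wasted. The proposed algorithm uses QMC sampling to distribute the particles more uniformly in space, which removes the performance loss caused by "gaps and clusters" and improves tracking accuracy. In terms of particle count, to reach the same tracking accuracy the GPF and PF algorithms need 2 and 5 times as many particles as QMC-GPF, respectively. The proposed algorithm needs the fewest particles, which shows the importance of the spatial distribution of particles, lightens the computational load, and favors engineering implementation.

In terms of convergence speed, the initial state covariance matrix is large in the initial stage, which makes the filters unstable; the tracking accuracy of all three algorithms drops at first, so the curves in Figure 1 rise and then fall. With 100 particles, QMC-GPF clearly tracks the target stably by about time step 12, whereas the GPF and especially the PF need a long adjustment period. In summary, with few particles the proposed algorithm achieves high tracking accuracy and fast convergence.

Experiment 2. Using the constant-velocity straight-line, constant-acceleration straight-line and constant-rate turn motion models in three-dimensional space [10], the performance of the QMC-GPF-based algorithm is compared for different particle numbers, and the track-loss rate under the three motion models is analyzed statistically. In all three models the target position noise intensity is $1 \times 10^{-3}$ km, the initial position is (29.98 km, 29.98 km, 10.00 km) with a 20 m deviation from the true position in the $x$ and $y$ directions, and the velocity is (-0.3 km/s, -0.4 km/s, 0.0 km/s). For constant-acceleration motion the acceleration in the $x$, $y$ and $z$ directions is (-0.002 km/s², -0.003 km/s², 0.000 km/s²); for the constant-rate turn the turn rate is 6°/s. The sampling period is $T_s = 1$ s, the observation length is 80 s, and 100 Monte Carlo trials are run for each case.

Table 1. Mean and variance of the position RMSE for different particle numbers

Motion model              Mean of position RMSE (m)      Variance of position RMSE (m²)
Particles                 100    200    300    500       100    200    300    500
Constant velocity         23.4   22.5   22.1   21.6      131    130    128    124
Constant acceleration     27.8   25.4   24.2   23.8      208    194    188    180
Constant-rate turn        41.3   39.8   37.6   36.9      501    482    446    458

Table 1 lists the mean and variance of the position RMSE obtained with the proposed algorithm under the three motion models for different particle numbers. The algorithm tracks well for constant-velocity, constant-acceleration and constant-rate turn motion. From the viewpoint of the motion model, performance for a maneuvering target is slightly worse than for constant-velocity motion: as the target maneuvers more strongly, the assumed motion model increasingly mismatches the actual motion, which degrades tracking. From the viewpoint of particle count, more particles improve tracking accuracy, most visibly when the target maneuvers. The main reason is that when the target maneuvers, the covariance matrix in the QMC-GPF filtering process is relatively large, and too few particles may cause the filter to diverge and lose the target; more particles are then needed to improve the filter's estimation accuracy and hence tracking performance. The track-loss rate under the above conditions is given in Table 2.

Table 2. Track-loss rate of the algorithm for different particle numbers (%)

Particles                 100    200    300    500
Constant velocity         0      0      0      0
Constant acceleration     5      2      0      0
Constant-rate turn        10     6      1      0

The track-loss rate $\eta$ is defined as the ratio of the number of trials whose position RMSE exceeds 150 m to the total number of trials, $\eta = n/L$, where $n$ is the number of failed tracks and $L$ is the number of Monte Carlo trials. Table 2 shows that the track-loss rate falls as the number of particles increases. Considering the effect of particle count on computational load, 100 to 300 particles are generally enough for good accuracy when the target moves at constant velocity. When the target maneuvers, more particles are needed; in these experiments 500 particles guarantee no track loss and keep the position RMSE within 40 m.

4 Summary and outlook

The authors propose a QMC-GPF-based passive multi-sensor target tracking algorithm that addresses the strong nonlinearity and weak observability of passive tracking. Replacing conventional Monte Carlo sampling with QMC sampling removes the loss of GPF performance caused by the "gaps and clusters" of Monte Carlo sampling. Combined with the centralized fusion strategy, QMC-GPF is introduced into the passive multi-sensor tracking system and achieves high tracking accuracy and fast convergence with few particles. As the number of passive sensors grows, centralized fusion becomes limited by the processor and the communication bandwidth. Although the proposed algorithm reduces computational complexity to some extent, it is still difficult to track the target in real time. A distributed system filters locally first and thereby reduces communication overhead and computation, so passive multi-sensor target tracking in a distributed system is a direction for our future research.

References:
[1] Anderson B D O, Moore J B. Optimal Filtering [M]. Englewood Cliffs: Prentice Hall, 1979.
[2] Julier S, Uhlmann J, Durrant-Whyte H F. A New Method for Nonlinear Transformation of Means and Covariances in Filters and Estimators [J]. IEEE Trans on Automatic Control, 2000, 45(3): 477-482.
[3] Ienkaran A, Simon H, Robert J E. Discrete-Time Nonlinear Filtering Algorithms Using Gauss-Hermite Quadrature [J]. Proceedings of the IEEE, 2007, 95(5): 953-977.
[4] Ienkaran A, Simon H. Square-Root Quadrature Kalman Filtering [J]. IEEE Trans on Signal Processing, 2008, 56(6): 2589-2593.
[5] Ienkaran A, Simon H. Cubature Kalman Filters [J]. IEEE Trans on Automatic Control, 2009, 54(6): 1254-1269.
[6] Li Cuiyun, Ji Hongbing. IR Dim Target Tracking and Detection Based on New Genetic Particle Filter [J]. Journal of Xidian University, 2009, 36(4): 620-623.
[7] Li Liangqun, Huang Jinxiong, Xie Weixin. Target Tracking Based on Particle Filtering in Passive Sensor Array [J]. Journal of Electronics & Information Technology, 2009, 31(4): 844-847.
[8] Kotecha J H, Djuric P A. Gaussian Particle Filtering [J]. IEEE Trans on Signal Process, 2003, 51(10): 2592-2601.
[9] Guo D, Wang X D. Quasi-Monte Carlo Filtering in Nonlinear Dynamic Systems [J]. IEEE Trans on Signal Processing, 2006, 54(6): 2087-2098.
[10] Li X R, Jilkov V P. Survey of Maneuvering Target Tracking Part I: Dynamic Models [J]. IEEE Trans on Aerospace and Electronic Systems, 2003, 39(4): 1333-1364.
[11] Miodrag B. Architectures for Efficient Implementation of Particle Filters [D]. New York: Stony Brook University, 2004.
(Editor: Guo Hua)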
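The measurement-update half of the algorithm flow in Section 2.2 of the paper above (QMC sampling, weighting, normalization, and moment estimation) can be sketched for a scalar state as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses a base-2 van der Corput sequence as the low-discrepancy sequence and the inverse normal CDF as the transformation, neither of which the paper specifies, and the function names are hypothetical. As in the paper, the importance density is taken to be the predictive density, so each weight reduces to the likelihood.

```python
import math
from statistics import NormalDist


def radical_inverse(i: int, base: int = 2) -> float:
    """Van der Corput radical inverse of i, a 1-D low-discrepancy sequence in (0, 1)."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv


def quasi_gaussian_points(mean: float, var: float, n: int) -> list:
    """Map low-discrepancy points through the inverse normal CDF (assumed transformation)."""
    dist = NormalDist(mean, math.sqrt(var))
    # Start at i = 1 so that no point equals 0, where the inverse CDF is undefined.
    return [dist.inv_cdf(radical_inverse(i)) for i in range(1, n + 1)]


def qmc_gpf_update(mean_pred, var_pred, z, r, h, n=512):
    """One scalar measurement update: QMC sampling, weights, normalization, moments."""
    pts = quasi_gaussian_points(mean_pred, var_pred, n)
    # Gaussian likelihood up to a constant factor, which cancels on normalization.
    w = [math.exp(-0.5 * (h(x) - z) ** 2 / r) for x in pts]
    total = sum(w)
    w = [wi / total for wi in w]
    mean = sum(wi * x for wi, x in zip(w, pts))
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, pts))
    return mean, var


# Linear-Gaussian check case: prior N(0, 1), measurement z = 1 with noise variance 1.
mean_post, var_post = qmc_gpf_update(0.0, 1.0, z=1.0, r=1.0, h=lambda x: x)
```

For this linear check case the Kalman filter gives a posterior mean and variance of 0.5 each, so the sketch can be compared against a known answer; the nonlinear angle measurements of the paper would replace `h` and require a multivariate sequence such as a Halton sequence.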
Formation Air Defense Method Based on Multi-Agent
(1. Simulation Training Center, Dalian Warship Academy of PLA Navy, Dalian 116018, China; 2. Technology Department, Satellite Maritime Tracking & Controlling Department of China, Jiangyin 214431, China)
Abstract: According to the features of air-defense warfare for a surface formation, a firepower distribution optimization solution based on a Multi-Agent system is constructed. Air-defense warfare is one of the most important forms of modern naval battle. A firepower distribution strategy based on a Multi-Agent system is studied to solve firepower dynamic
optimization purposes in the end. Compared to the traditional genetic algorithm with a static mathematical model, the MAS
Distributed Coordination of Multi-Agent Systems With Quantized-Observer Based Encoding-Decoding

Tao Li, Member, IEEE, and Lihua Xie, Fellow, IEEE

Abstract: Integrative design of communication mechanism and coordinated control law is an interesting and important problem for multi-agent networks. In this paper, we consider distributed coordination of discrete-time second-order multi-agent systems with partially measurable state and a limited communication data rate. A quantized-observer based encoding-decoding scheme is designed, which integrates the state observation with encoding/decoding. A distributed coordinated control law is proposed for each agent which is given in terms of the states of its encoder and decoders. It is shown that for a connected network, 2-bit quantizers suffice for the exponential asymptotic synchronization of the states of the agents. The selection of controller parameters and the performance limit are discussed. It is shown that the algebraic connectivity and the spectral radius of the Laplacian matrix of the communication graph play key roles in the closed-loop performance. The spectral radius of the Laplacian matrix is related to the selection of control gains, while the algebraic connectivity is related to the spectral radius of the closed-loop state matrix. Furthermore, it is shown that as the number of agents increases, the asymptotic convergence rate can be approximated as a function of the number of agents, the number of quantization levels (communication data rate) and the ratio of the algebraic connectivity to the spectral radius of the Laplacian matrix of the communication graph.

Index Terms: Data rate, digital communication, distributed coordination, encoding and decoding, multi-agent systems, quantized observer.

Manuscript received March 01, 2011; revised September 03, 2011; accepted April 05, 2012. Date of publication May 14, 2012; date of current version November 21, 2012. This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61004029, 60934006 and 61120106011. This paper was presented in part at the 30th Chinese Control Conference, July 22-24, 2011, Yantai, China. Recommended by Associate Editor L. Schenato.
T. Li is with the Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China (e-mail: litao@).
L. Xie is with EXQUISITUS, Centre for E-City, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: elhxie@.sg).
Color versions of one or more of the figures in this paper are available online at .
Digital Object Identifier 10.1109/TAC.2012.2199152
0018-9286/$31.00 © 2012 IEEE

I. INTRODUCTION

In recent years, distributed cooperative control of multi-agent systems has attracted unprecedented attention from the control community ([1]–[14]) in view of its wide applications in many emerging fields such as smart grids, intelligent transportation, formation flight, etc. In particular, the problem of multi-agent consensus has been the focus of much research; see, e.g., [5] and the references therein.

Quantized consensus is an important problem because digital communications are widely adopted, and it has attracted recurring interest ([15]–[24]). Kashyap et al. ([15]) developed an average-consensus algorithm with integer-valued states, which can ensure the asymptotic convergence of agents' states to an integer approximation of the average of the initial states. They gave an upper bound for the expected convergence time for fully connected networks and linear networks. Frasca et al. ([19]), Carli et al. ([20]), and Li et al. ([24]) considered the average-consensus problem with real-valued states and quantized communications. In [19] and [20], static uniform quantizers and dynamic logarithmic quantizers with an infinite number of quantization levels were considered, respectively. In [20] and [24], average-consensus algorithms with dynamic finite-level uniform quantizers were proposed. In particular, [24] shows that if the network is connected, then the control parameters can be chosen such that average consensus is achieved with an exponential convergence rate by using a single-bit quantizer. The work of [24] was extended to the cases with link failures in [25] and time-delay in [26], respectively.

The aforementioned works are concerned with first-order integrator systems with measurable states. In many applications, however, we encounter higher order systems with partially measurable states. Dynamic output feedback control of multi-agent systems of general higher order dynamics was first studied by Fax and Murray ([3]). Tuna proposed a controller design algorithm for synchronization of discrete-time linear systems based on static relative output feedback ([27]). Qu et al. ([28]) dealt with static output feedback of multi-agent systems via feedback linearization, where the control input of an agent is given in terms of its own output and the relative output errors with respect to its neighbors. Li et al. ([29]) and You and Xie ([30]) considered distributed coordination based on dynamic relative output feedback. Hong et al. ([31]) developed a distributed observer for leader-following systems where the leader and the followers are described by second-order integrators and each follower constructs a state observer based on the leader's position, neighbors' positions and leader's control input to estimate the leader's velocity. More literature on distributed observers can be found in [32] and [33].

In this paper, we consider distributed coordination of multi-agent networks based on digital communications. The communications among agents are described by an undirected graph. Each agent is described by a discrete-time second-order integrator, with measurable position but unmeasurable velocity, unlike [20] and [24]. Since the states of the agents are only partially measurable, the encoding-decoding scheme in [24] cannot be easily extended to this case. Further, unlike [20], where infinite-level logarithmic quantizers are considered, we aim to design an efficient encoding-decoding scheme under a limited data rate for information exchange between agents. Our first challenge is to jointly design state observation and encoding-decoding for communication and computation efficiency while achieving consensus. One natural idea is to design a state observer for each agent and then encode and transmit the state estimate to neighbors. This, however, requires a distributed control with a complex encoding-decoding scheme in order to eliminate the effect of quantization and estimation errors on the final closed-loop system. Further, even if such a control scheme can be developed to guarantee convergence, the computation and communication loads are generally higher and the performance (i.e., the convergence rate under the same bit rate) is not necessarily better.

From the perspective of minimizing communication bit rate and reducing computation load, we propose an integrative approach for observer and encoder-decoder design in this paper.
At each time instant, the quantized innovation of each agent's position is sent to its neighbors, while, at each receiver, an observer-based decoder is activated to obtain an estimate of the sender's position and velocity. Our design can result in a much lower communication requirement because: 1) the encoder inputs, i.e., agents' positions, contain fewer variables than the full states; 2) the encoder outputs are in fact a kind of quantized innovation of agents' positions, and it is known that innovations can generally be quantized with far fewer bits than the positions themselves. It is worth pointing out that even if the quantization is ignored, our encoders and decoders are different from the dynamic feedback control law in [3]. Here, we do not design a state observer for each agent separately, but send the quantized innovation of each agent's output directly and integrate the state observation and communication process together. Our observer-based encoding-decoding scheme is also different from the distributed observer given in [31]; in particular, we do not require knowledge of the other agents' control inputs.

We develop a distributed coordinated control law by using the states of the decoders and encoders, provide sufficient conditions on the control gains and network topology for the existence of finite-level quantizers to ensure the closed-loop convergence, and show that these conditions are also necessary in some sense. We prove that, by selecting the number of quantization levels (data rate) properly, the asymptotic synchronization of the positions and velocities can be achieved. Furthermore, for a connected network, we can always select the control gains such that 2-bit quantizers guarantee the exponential convergence of the closed-loop system and the convergence rate can be predesigned.

It should be noted that, compared with classical non-quantized and centralized state observers, due to the nonlinearity of the quantization and the coupling of all agents' states, the convergence of a given observer-based encoding-decoding scheme depends on the control inputs of all agents and the closed-loop dynamics of the whole network. Different from [24], the relationship between the estimation error and the quantization error does not have a simple form if the observer type is not properly selected, and it is very difficult to get an explicit expression for the relationship between the spectral radius of the closed-loop state matrix and the eigenvalues of the graph Laplacian. All these significantly complicate the closed-loop analysis and the control parameter selection. Also, different from [24], there is no explicit relationship between the stability margin and the control gain, which makes the performance limit analysis difficult. By using differential calculus and limit analysis, we give a linear approximation of the spectral radius of the closed-loop state matrix with respect to the control gain ratio and algebraic connectivity of the communication graph, based on which a relationship between the performance limit and the parameters of the network and system is revealed. We show that as the number of agents increases to infinity, the asymptotic highest convergence rate achievable with a finite-level quantizer is determined by the number of quantization levels and by the ratio of the algebraic connectivity to the spectral radius of the Laplacian matrix of the communication graph.

The remainder of this paper is organized as follows. In Section II, we present the model of the network and agents, and give the structures of observer-based encoders, observer-based decoders and distributed coordinated control laws. In Section III, we analyze the closed-loop system and give conditions on the network topology, the control gains and the number of quantization levels to ensure convergence. In Section IV, we discuss the selection of the control gain ratio and show that 2-bit quantizers can guarantee the convergence of the closed-loop system by selecting the control gains properly. We also give an explicit form of the
asymptotic convergence rate.In Section V, we draw some concluding remarks and propose future research topics.The following notation will be used throughout this paper: denotes a column vector with all ones.denotes the identity matrix with an appropriate size.For a given set,the number of its elements is denoted by.For a given vector or matrix ,we denote its transpose by,its-norm by,its Euclidean norm by,its spectral radius by,and its trace by.For a given positive number,the natural logarithm, the logarithm of with base2,the maximum integer less than or equal to,and the minimum integer greater than or equal to are respectively denoted by,,and.II.P ROBLEM F ORMULATIONA.Agent and Network ModelsWe consider distributed coordination of a network of agents with the second-order dynamics:(1) where,,and are the position, velocity control of the th agent,respectively.Here, is the output of agent,that is,for agent,only its po-sition is measurable.The agents communicate with each other through a network whose topology is modeled as an undirected graph,where the agents and the communication channels between agents are represented by the node set and the edge set,respectively.The weighted adjacency matrix ofLI AND XIE:DISTRIBUTED COORDINATION OF MULTI-AGENT SYSTEMS WITH QUANTIZED-OBSERVER BASED ENCODING-DECODING3025is denoted by.Note that is a sym-metric matrix.An edge by the pair represents a communication channel from to and if and only if.The neighborhood of the th agent is denoted by.For any,,and if and only if.Also,is called the degree of,and is called the degree of.The Laplacian matrix of is defined as,where.The Laplacian matrix is a sym-metric positive semi-definite matrix and its eigenvalues in an ascending order are denoted by,where is the spectral radius of and is called the algebraic connectivity of([34],[35]).A sequence of edges is called a path from node to node.The graph is called a connected graph if for any ,there is a path from to.B.Observer-Based 
Encoding-DecodingWe consider digital communication channels with limited channel capacity.At each time step,what each agent can send to its neighbors is only a coded version of its current and past measurements.Generally speaking,the encoder of the th agent may take the following form:(2) where and are the output and input of the encoder, respectively,is a Borel measurable function and is a quan-tizer.Note that both the structure and parameters of and may be time-varying and the encoder may have infinite memory. In this paper,we propose afinite memory encoder of agent as(3) where is an exponentially decaying scaling function to be defined later.In the above,and are the internal states of the encoder and is afiquantizer given by(4) where is the number of quantization levels of.After is received by one of the th agent’s neighbors,say agent,a decoder will be activated:(5) where and are the outputs of the decoder.Remark1:In the above,is a quantized innovation with scaling.From the dynamic(1)of the th agent,we know that to get estimates for and,following the standard observer design,the decoder can be in the form(6) where and are the observer gains.It can be easily verified that if and the quantizer is the identity function,then(6)degenerates to the classical deadbeat posterior state observer based on output.However,since is not available for the neighbors of the th agent,we adopt decoder(5)instead.Remark2:From(3)and(5),we have(7) We will show that and can be viewed as the estimates for and,respectively.Denoteas the quantization error in encoder,as the estimation error for andas the estimation error for.By(3)and some direct calculation,we get(8) and(9) It can be seen that if the quantization error is bounded, then due to the vanishing of,the estimation errorsand will both to zero asymptotically as.Note that here,for the velocity estimation,there is one step delay.Remark3:The relationship among the estimation errors ,and the quantization error is not asin thefirst-order 
case It will be seen later that(8)and(9)will play an important role in the closed-loop analysis.Observe that the estimation errors for velocities depend on two steps of quantization errors,which, as we can see later,leads to an additional bit required for the quantizers as compared to thefirst-order case([24]). Remark4:From the above,we can see that both the en-coder(3)and the decoder(5)can be viewed as the state ob-servers based on the output and the quantized innovation. We call the encoder(3)an observer-based encoder and the de-coder(5)an observer-based decoder.Though the velocityis not measurable,the th agent and its neighbors can make an estimate for the overall state by using an ob-server-based encoder and an observer-based decoder.At each time step,each agent only needs to send the quantized innova-tion of its output to its neighbors,then the neighbors can use observer-based decoders to get estimates for the state of the3026IEEE TRANSACTIONS ON AUTOMATIC CONTROL,VOL.57,NO.12,DECEMBER2012agent.However,generally speaking,there is no separation prin-ciple for the encoder-decoder design and the control design. 
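The idea of transmitting only a scaled, quantized innovation can be made concrete with a deliberately simplified sketch. It is not the second-order observer-based scheme (3) and (5) of the paper, whose formulas are not reproduced here; it is a first-order differential encoder and decoder of the kind used for single-integrator agents, and the quantizer form, the names `quantize` and `run_channel`, and the values of `g0` and `gamma` are all assumptions made for illustration.

```python
import math


def quantize(y: float, levels: int) -> int:
    """Symmetric uniform quantizer with outputs in {-levels, ..., levels}.

    Rounds to the nearest integer and saturates at +/- levels (assumed form).
    """
    magnitude = min(levels, math.floor(abs(y) + 0.5))
    return magnitude if y >= 0 else -magnitude


def run_channel(signal, levels=1, g0=1.0, gamma=0.9):
    """Encode a scalar signal as quantized innovations and decode it at the receiver."""
    enc_state, dec_state = 0.0, 0.0
    enc_hist, dec_hist = [], []
    g = g0  # exponentially decaying scaling function g(t) = g0 * gamma**t
    for y in signal:
        # Only this small integer symbol crosses the channel.
        symbol = quantize((y - enc_state) / g, levels)
        enc_state += g * symbol  # encoder's internal state
        dec_state += g * symbol  # decoder repeats the same update
        enc_hist.append(enc_state)
        dec_hist.append(dec_state)
        g *= gamma
    return enc_hist, dec_hist


# A constant signal, chosen so that the quantizer never saturates.
enc, dec = run_channel([0.3] * 40)
```

Because the decoder applies exactly the same update as the encoder, both hold identical estimates without exchanging real numbers, and as long as the quantizer stays unsaturated the estimation error is bounded by a multiple of the decaying scaling function, which is the mechanism behind the vanishing estimation errors discussed in Remark 2.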
Compared with classical non-quantized and centralized state observers,due to the nonlinearity of the quantization and the coupling of all agents’states,the convergence of a given ob-server-based encoding-decoding scheme depends on the control inputs of all agents and the closed-loop dynamics of the whole network,which significantly complicates the analysis as seen below.C.Distributed Control LawIn this paper,we aim at designing a distributed coordinated control law based on quantized communications such that(10) We propose a distributed coordinated control law of the form(11) where and are the control gains.From(3),(5)and(11),we can see that the control input of each agent only depends on the state of its own encoder and the states of the decoders associated with the channels from its neighbors.Remark5:Since the states of agents are only partially mea-surable,the encoding-decoding scheme in[24]where agents of single integrator dynamics are considered cannot be easily extended to this case.The challenge is to design state observers and encoders-decoders jointly so that they can achieve con-sensus with efficient communications and computation.One natural idea is to design a state-observer for each agent and then encode and transmit the state estimate to neighbors.For example,we may adopt the following state-observer for the th agent:(12)is then encoded and transmitted to the neigh-bors of the th agent.However,since the control inputand estimation error are not available for its neighbors,to eliminate the effect of quantization and estima-tion errors on thefinal closed-loop system,we may need a more complex encoding-decoding scheme and a control law than(3), (5)and(11).Further,even if we canfind such a scheme to guar-antee convergence,the computation and communication loads are higher and the performance(i.e.,the convergence rate under the same bit rate)is not definitely better.From the perspective of bit rate constraint and reducing computation load,we propose an 
integrative approach for the state-observer and encoder-decoder design.

LI AND XIE: DISTRIBUTED COORDINATION OF MULTI-AGENT SYSTEMS WITH QUANTIZED-OBSERVER BASED ENCODING-DECODING

III. CONVERGENCE ANALYSIS

This section is devoted to the convergence analysis of the distributed control law proposed in the last section. To this end, we introduce the following notation: where . We also define the unitary matrix (13) where is the unit eigenvector of associated with , that is, , , .

Under the protocol (3), (5) and (11), due to the quantization, the closed-loop system is a nonlinear discontinuous system. Generally speaking, the convergence analysis is difficult; however, by using the estimation error expressions (8) and (9), the closed-loop equation can be converted into a linear equation with time-varying disturbances, whose homogeneous part is just the closed-loop equation without quantization. Then by properly selecting the number of quantization levels, the quantizers can be kept unsaturated and the convergence of the closed-loop system can be achieved.

We make the following assumptions.
A1) There are known positive constants , , , , such that , , , .
A2) The communication graph is connected.
A3) .
A4) .

The following lemma, whose proof can be found in the Appendix, will be used in the analysis of the homogeneous part of the closed-loop system.

Lemma 3.1: Let (14) Then, i) , if and only if Assumptions (A2)–(A4) hold. ii) Let (15) If Assumptions (A3)–(A4) hold, then the eigenvalues of are 0, , and , where (16) In the above, the arguments , of and were omitted, and , where .

From Lemma 3.1, we know that if Assumptions (A2)–(A4) hold, then is diagonalizable. Let , , be nonsingular matrices, such that where Denote , . 
In the following, the dependence of , and on and will be omitted when there is no confusion.

The following theorem gives sufficient conditions on the control gains and network topology for the existence of finite-level quantizers to ensure the closed-loop convergence.

Theorem 3.1: Suppose Assumptions (A1)–(A4) hold. Let the scaling function , where (17) and . If the numbers of quantization levels of the quantizer , satisfy (18) and (19) where , then under the protocol (3), (5) and (11), the closed-loop system satisfies (20) Furthermore, the convergence rate is given by (21)

Proof: The proof can be divided into three steps. First, we convert the closed-loop system into non-coupled linear equations with nonlinear disturbances. The disturbances are combinations of the estimation errors, which are related to the quantization errors as observed from (8) and (9). Second, we estimate the bound of the synchronization errors in terms of the quantization errors and the system and control parameters. Finally, we prove the boundedness of the quantization error by properly choosing the control parameters and the number of quantization levels, which will lead to the convergence of the closed-loop system.

Step 1) From (7) and (11), it follows that (22) Substituting the control law above into the system (1), we have Let , , where is defined in (13). Denote the th components of and by and , respectively. Then we have , and (23) Denote , then (23) can be rewritten as (24) where with . It is clear that to get (20), we only need to prove , .

Step 2) By (24), we have (25) Further, by (8) and (9), noting that , we have Then it follows from (25) that (26) By the definition of , and , we get (27)

Step 3) By Lemma A.2, we get . This together with (26) gives , , which further implies (20). Then from , (26) and (27), we get (21).

Observe that the distributed control law in Theorem 3.1 relies on , which requires each agent to know the graph and may not be practical. This restriction is relaxed by the following 
corollary.

Corollary 3.1: Suppose Assumptions (A1)–(A4) hold. Let the scaling function , where (28) and . If the numbers of quantization levels of the quantizer , satisfy (29) and (30) where then under the protocol (3), (5) and (11), the closed-loop system satisfies (31) and the convergence rate is given by (32)

Proof: Noting that and , by Theorem 3.1, we get the conclusion of this corollary.

Remark 6: From Theorem 3.1 and Corollary 3.1, we can see that the convergence factor can be properly chosen to tune the convergence rate of the closed-loop system. By Corollary 3.1, we may select the control parameters by the following steps. i) Choose , such that Assumptions (A3)–(A4) hold. ii) Choose and then according to (28). iii) Choose the number of quantization levels according to (29) and (30).

Remark 7: Corollary 3.1 tells us that to select proper and the number of quantization levels, we do not need to know , that is, the exact Laplacian matrix. Furthermore, Assumption A4) holds if , so the selection of the control gains may not need the knowledge of . However, from the definition of , we can see that the selection of needs the knowledge of the eigenvalues of the Laplacian matrix. Hence, we still need some global knowledge of the network topology to select the control parameters. In the case when the network topology can be predesigned, this is not a problem. 
However, in some applications, the network topology may not be known to each agent, for example, under switching topologies due to a changing environment. In this situation, the problem of estimating the eigenvalues of the Laplacian matrix in a distributed manner becomes relevant. Franceschelli et al. ([36]) gave an algorithm by which each agent estimates the eigenvalues of a Laplacian matrix using the fast Fourier transform. The combination of the eigenvalue estimation algorithm with our proposed distributed coordinated control algorithm is an interesting future research topic.

Remark 8: From Lemma 3.1 and the proof of Theorem 3.1, we can see that A2)–A4) are necessary and sufficient for the stability of the homogeneous part of the closed-loop system (24). Since , we can see that a smaller degree , which implies lower local connectivity, will instead give more flexibility for selecting the control gains.

In the main theorem of [15] (Theorem 1 of [15]), the authors proved that under their algorithm, as time goes on, the states of agents converge to a ball centered at the average of the initial states with radius less than or equal to the quantization interval, with probability 1. They also proved that there always exists a finite time such that the states of the agents enter and stay in the ball with a positive probability when . An upper bound for the mathematical expectation of the convergence time for fully connected networks and linear networks was also provided. In this paper, we focus on the case with real-valued states and the asymptotic convergence to exact synchronization. The algorithm given here can guarantee convergence to synchronization with an arbitrary precision as time goes on. In the following, we will give an analysis on the convergence time for a given precision for connected networks. For any given , denote and , which are respectively the time for the positions and velocities 
of all the agents with precision .

Theorem 3.2: Suppose the conditions of Theorem 3.1 hold, and . Then under the protocol (3), (5) and (11), for sufficiently small , the convergence time for the position and velocity respectively satisfies (33) where

Proof: The proof can be found in the Appendix.

Remark 9: Similar to Corollary 3.1, the constant in Theorem 3.2 can be replaced by , which gives us a relationship between the upper bound of the convergence time and the number of agents.

Fig. 1. Curves of of Example 1.

IV. PARAMETER DESIGN AND PERFORMANCE LIMIT ANALYSIS

In this section, we shall investigate controller parameter selection and analyze the asymptotic consensus convergence rate.

A. Selecting the Control Gain Ratio

Selecting the control gains and is equivalent to selecting a control gain ratio and the position control gain . It is easily seen that Assumptions A3)–A4) hold if and only if and . Further will maximize , which implies the largest stability margin of the homogeneous part of the closed-loop system (24).

1) Example 1: We consider a 10-node network with and . The curves of with respect to with different control gain ratios are shown in Fig. 1. It can be seen that will go to 1 as or , and first decreases and then increases with respect to . The of the inflection point of reaches its maximum when . Further, it can be proved theoretically that when is sufficiently small, is almost a linear, monotone decreasing function of . We have the following result.

Lemma 4.1: If Assumptions A2)–A4) hold, then for any given , we have (34)

Proof: The proof can be found in the Appendix. For Example 1, the curves of and with different are shown in Fig. 2.

B. Selecting the Control Parameters Under a Given Communication Data Rate

In Theorem 3.1, we give a criterion for selecting the number of quantization levels (communication data rate) under given control gains and a convergence rate. In the following theorem, we will consider how to select the control parameters under a given communication data rate.

Fig. 2. Curves of and of Example 1 with different , where dots are for and the solid lines are for .

Theorem 4.1: Suppose Assumptions A1) and A2) hold. For any given , , denote (35) Then, i) is nonempty. ii) If , , and the numbers of the quantization levels of satisfy (36) then under the protocol given by (3), (5) and (11) with , the closed-loop system satisfies where is a constant satisfying (37)

Proof: From Lemma 4.1, we have (38) which implies From the aforementioned, noting that the exists, and (35), we have (i). For any given integer and constant , if , , (36) and (37) hold, then it is easily verified that , Assumptions A3)–A4) and (18) hold. Then noting that and , we know that (17) and (19) also hold. By Theorem 3.1, we get ii).

Remark 10: In [24], it is shown that for a connected network with first-order agents, average-consensus can be achieved with an exponential convergence rate based on merely 1-bit information exchange between agents. Here, we prove that for the case with second-order agents, 2-bit quantizers suffice for the exponential asymptotic synchronization of agents' states. Compared with [24], from (A.2), we can see that the additional bit is used to overcome the uncertainty in estimating the velocity of the agent.

Remark 11: Compared with [24], the performance limit analysis for the second-order agents with partially measurable states is much more challenging. In [24], the spectral radius of the closed-loop matrix has the simple form: , where is the control gain. In this paper, it is very difficult to get an explicit expression for the relationship between the closed-loop spectral radius and the eigenvalues of the Laplacian matrix. By the differential mean value theorem and limit analysis, we develop Lemma 4.1 to give a linear approximation of with respect to the control gains and the algebraic connectivity. From (38), we can see that Lemma 4.1 plays a vital role in establishing Theorem 4.1. Different from [24], there is also no explicit relationship between the stability margin and the control gain , which also poses a significant challenge in the 
asymptotic convergence rate analysis as seen later in Section IV-C.

1) Example 2: We consider a network with 10 nodes and weights, which means that , if , otherwise, . The edges of the graph are randomly generated according to , for any unordered pair . Here, , . The initial states are chosen as and , . The control gain and , which give . The scaling factor is taken as 0.9998. According to Theorem 3.1, the 2-bit quantizer can be used. The evolution of the states is shown in Fig. 3. It can be seen that both the positions and the velocities of the agents are asymptotically synchronized. Next, we set . In this
An Overview of Recent Progress in the Study of Distributed Multi-agent Coordination
Yongcan Cao, Member, IEEE, Wenwu Yu, Member, IEEE, Wei Ren, Member, IEEE, and Guanrong Chen, Fellow, IEEE

Abstract: This article reviews some main results and progress in distributed multi-agent coordination, focusing on papers published in major control systems and robotics journals since 2006. Distributed coordination of multiple vehicles, including unmanned aerial vehicles, unmanned ground vehicles and unmanned underwater vehicles, has been a very active research subject studied extensively by the systems and control community. The recent results in this area are categorized into several directions, such as consensus, formation control, optimization, and estimation. After the review, a short discussion section is included to summarize the existing research and to propose several promising research directions along with some open problems that are deemed important for further investigations.

Index Terms: Distributed coordination, formation control, sensor networks, multi-agent system

This work was supported by the National Science Foundation under CAREER Award ECCS-1213291, the National Natural Science Foundation of China under Grant Nos. 61104145 and 61120106010, the Natural Science Foundation of Jiangsu Province of China under Grant No. BK2011581, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20110092120024, the Fundamental Research Funds for the Central Universities of China, and the Hong Kong RGC under GRF Grant CityU1114/11E. The work of Yongcan Cao was supported by a National Research Council Research Associateship Award at AFRL.
Y. Cao is with the Control Science Center of Excellence, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433, USA. W. Yu is with the Department of Mathematics, Southeast University, Nanjing 210096, China and also with the School of Electrical and Computer Engineering, RMIT University, Melbourne VIC 3001, Australia. W. Ren is with the Department of Electrical Engineering, University of California, Riverside, CA 92521, USA. G. Chen is with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong SAR, China.
Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@.

I. INTRODUCTION

Control theory and practice may date back to the beginning of the last century, when the Wright Brothers attempted their first test flight in 1903. Since then, control theory has gradually gained popularity, receiving more and wider attention especially during World War II, when it was developed and applied to fire-control systems, missile navigation and guidance, as well as various electronic automation devices. In the past several decades, modern control theory was further advanced due to the booming of aerospace technology based on large-scale engineering systems.

During the rapid and sustained development of modern control theory, technology for controlling a single vehicle, albeit higher-dimensional and complex, has become relatively mature and has produced many effective tools such as PID control, adaptive control, nonlinear control, intelligent control, and robust control methodologies. In the past two decades in particular, control of multiple vehicles has received increasing demands, spurred by the fact that many benefits can be obtained when a single complicated vehicle is equivalently replaced by multiple yet simpler vehicles. In this endeavor, two approaches are commonly adopted for controlling multiple vehicles: a centralized approach and a distributed approach. 
The centralized approach is based on the assumption that a central station is available and powerful enough to control a whole group of vehicles. Essentially, the centralized approach is a direct extension of the traditional single-vehicle-based control philosophy and strategy. On the contrary, the distributed approach does not require a central station for control, at the cost of becoming far more complex in structure and organization. Although both approaches are considered practical depending on the situations and conditions of the real applications, the distributed approach is believed to be more promising due to many inevitable physical constraints such as limited resources and energy, short wireless communication ranges, narrow bandwidths, and large sizes of vehicles to manage and control. Therefore, the focus of this overview is placed on the distributed approach.

In distributed control of a group of autonomous vehicles, the main objective typically is to have the whole group of vehicles working in a cooperative fashion through a distributed protocol. Here, cooperative refers to a close relationship among all vehicles in the group where information sharing plays a central role. The distributed approach has many advantages in achieving cooperative group performances, especially low operational costs, fewer system requirements, high robustness, strong adaptivity, and flexible scalability, and has therefore been widely recognized and appreciated.

The study of distributed control of multiple vehicles was perhaps first motivated by the work in distributed computing [1], management science [2], and statistical physics [3]. 
In the control systems society, some pioneering works are generally attributed to [4], [5], where an asynchronous agreement problem was studied for distributed decision-making problems. Thereafter, some consensus algorithms were studied under various information-flow constraints [6]–[10]. There are several journal special issues on the related topics published after 2006, including the IEEE Transactions on Control Systems Technology (vol. 15, no. 4, 2007), Proceedings of the IEEE (vol. 94, no. 4, 2007), ASME Journal of Dynamic Systems, Measurement, and Control (vol. 129, no. 5, 2007), SIAM Journal on Control and Optimization (vol. 48, no. 1, 2009), and International Journal of Robust and Nonlinear Control (vol. 21, no. 12, 2011). In addition, there are some recent reviews and progress reports given in the surveys [11]–[15] and the books [16]–[23], among others.

This article reviews some main results and recent progress in distributed multi-agent coordination, published in major control systems and robotics journals since 2006. Due to space limitations, we refer the readers to [24] for a more complete version of the same overview. For results before 2006, the readers are referred to [11]–[14].

Specifically, this article reviews the recent research results in the following directions, which are not independent but may overlap to some extent:
1. Consensus and the like (synchronization, rendezvous). Consensus refers to the group behavior that all the agents asymptotically reach a certain common agreement through a local distributed protocol, with or without predefined common speed and orientation.
2. Distributed formation and the like (flocking). Distributed formation refers to the group behavior that all the agents form a pre-designed geometrical configuration through local interactions with or without a common reference.
3. Distributed optimization. This refers to algorithmic developments for the analysis and optimization of large-scale distributed systems.
4. Distributed estimation and control. This refers to distributed control design based on local estimation about the needed global information.

The rest of this article is organized as follows. In Section II, basic notations of graph theory and stochastic matrices are introduced. Sections III, IV, V, and VI describe the recent research results and progress in consensus, formation control, optimization, and estimation. Finally, the article is concluded by a short section of discussions with future perspectives.

II. PRELIMINARIES

A. Graph Theory

For a system of $n$ connected agents, its network topology can be modeled as a directed graph denoted by $G = (V, W)$, where $V = \{v_1, v_2, \cdots, v_n\}$ and $W \subseteq V \times V$ are, respectively, the set of agents and the set of edges which directionally connect the agents together. Specifically, the directed edge denoted by an ordered pair $(v_i, v_j)$ means that agent $j$ can access the state information of agent $i$. Accordingly, agent $i$ is a neighbor of agent $j$. A directed path is a sequence of directed edges in the form of $(v_1, v_2), (v_2, v_3), \cdots$, with all $v_i \in V$. A directed graph has a directed spanning tree if there exists at least one agent that has a directed path to every other agent. The union of a set of directed graphs with the same set of agents, $\{G_{i_1}, \cdots, G_{i_m}\}$, is a directed graph with the same set of agents whose set of edges is given by the union of the edge sets of all the directed graphs $G_{i_j}$, $j = 1, \cdots, m$. A complete directed graph is a directed graph in which each pair of distinct agents is bidirectionally connected by an edge; thus there is a directed path from any agent to any other agent in the network.

Two matrices are used to represent the network topology: the adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ with $a_{ij} > 0$ if $(v_j, v_i) \in W$ and $a_{ij} = 0$ otherwise, and the Laplacian matrix $L = [\ell_{ij}] \in \mathbb{R}^{n \times n}$ with $\ell_{ii} = \sum_{j=1}^{n} a_{ij}$ and $\ell_{ij} = -a_{ij}$, $i \neq j$, which is generally asymmetric for directed graphs.

B. Stochastic Matrices

A nonnegative square matrix is called a (row) stochastic matrix if every one of its rows sums to one. The product of two stochastic matrices is still a stochastic matrix. A row 
stochastic matrix $P \in \mathbb{R}^{n \times n}$ is called indecomposable and aperiodic if $\lim_{k \to \infty} P^k = \mathbf{1} y^T$ for some $y \in \mathbb{R}^n$ [25], where $\mathbf{1}$ is a vector with all elements being 1.

III. CONSENSUS

Consider a group of $n$ agents, each with single-integrator kinematics described by

$\dot{x}_i(t) = u_i(t), \quad i = 1, \cdots, n,$ (1)

where $x_i(t)$ and $u_i(t)$ are, respectively, the state and the control input of the $i$th agent. A typical consensus control algorithm is designed as

$u_i(t) = \sum_{j=1}^{n} a_{ij}(t) [x_j(t) - x_i(t)],$ (2)

where $a_{ij}(t)$ is the $(i, j)$th entry of the corresponding adjacency matrix at time $t$. The main idea behind (2) is that each agent moves towards the weighted average of the states of its neighbors. Given the switching network pattern due to the continuous motions of the dynamic agents, the coupling coefficients $a_{ij}(t)$ in (2), hence the graph topologies, are generally time-varying. It is shown in [9], [10] that consensus is achieved if the underlying directed graph has a directed spanning tree in some joint fashion in terms of a union of its time-varying graph topologies.

The idea behind consensus serves as a fundamental principle for the design of distributed multi-agent coordination algorithms. Therefore, investigating consensus has been a main research direction in the study of distributed multi-agent coordination. To bridge the gap between the study of consensus algorithms and many physical properties inherent in practical systems, it is necessary and meaningful to study consensus by considering many practical factors, such as actuation, control, communication, computation, and vehicle dynamics, which characterize some important features of practical systems. This is the main motivation to study consensus. In the following part of the section, an overview of the research progress in the study of consensus is given, regarding stochastic network topologies and dynamics, complex dynamical systems, delay effects, and quantization, mainly after 2006. Several milestone results prior to 2006 can be found in [2], [4]–[6], [8]–[10], [26].

A. Stochastic Network Topologies and Dynamics

In multi-agent systems, the network topology among all vehicles plays a crucial role in determining consensus. The objective here is to explicitly identify necessary and/or sufficient conditions on the network topology such that consensus can be achieved under properly designed algorithms.

It is often reasonable to consider the case when the network topology is deterministic under ideal communication channels. Accordingly, main research on the consensus problem was conducted under a deterministic fixed/switching network topology. That is, the adjacency matrix $A(t)$ is deterministic. At other times, when considering random communication failures, random packet drops, and communication channel instabilities inherent in physical communication channels, it is necessary and important to study the consensus problem in the stochastic setting, where a network topology evolves according to some random distributions. That is, the adjacency matrix $A(t)$ is stochastically evolving.

In the deterministic setting, consensus is said to be achieved if all agents eventually reach agreement on a common state. In the stochastic setting, consensus is said to be achieved almost surely (respectively, in mean-square or in probability) if all agents reach agreement on a common state almost surely (respectively, in mean-square or in probability). Note that the problem studied in the stochastic setting is slightly different from that studied in the deterministic setting due to the different assumptions in terms of the network topology. 
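As a concrete reference point for the deterministic setting, the following is a minimal sketch of algorithm (2) on a fixed directed graph, using the adjacency and Laplacian conventions of Section II. The forward-Euler discretization, the step size `dt`, and the function names `laplacian` and `run_consensus` are assumptions made for illustration and do not come from the works surveyed here.

```python
import numpy as np


def laplacian(adjacency):
    """Graph Laplacian L = D - A, where D holds the row sums of A."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a


def run_consensus(adjacency, x0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of x' = -L x, i.e. (1) under control law (2)."""
    lap = laplacian(adjacency)
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        x = x - dt * (lap @ x)
    return x


# Directed chain 0 -> 1 -> 2: a[i, j] > 0 means agent i receives from agent j.
# Agent 0 is the root of a directed spanning tree, so all states approach x0[0].
chain = np.array([[0, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0]])
print(run_consensus(chain, [1.0, -2.0, 4.0]))

# Two disconnected pairs: no directed spanning tree, so two separate
# agreement values remain.
pairs = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])
print(run_consensus(pairs, [0.0, 2.0, 10.0, 20.0]))
```

In the chain example the root agent has no incoming edge and therefore never moves, which is why the group settles on its initial state rather than on the average.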
Consensus over a stochastic network topology was perhaps first studied in [27], where some sufficient conditions on the network topology were given to guarantee consensus with probability one for systems with single-integrator kinematics (1), and where the rate of convergence was also studied. Further results for consensus under a stochastic network topology were reported in [28]–[30], for systems with single-integrator kinematics [28], [29] or double-integrator dynamics [30]. Consensus for single-integrator kinematics under a stochastic network topology has been studied particularly extensively, and some general conditions for almost-sure consensus were derived [29]. Loosely speaking, almost-sure consensus for single-integrator kinematics can be achieved, i.e., $x_i(t) - x_j(t) \to 0$ almost surely, if and only if the expectation of the network topology, namely, the network topology associated with the expectation $E[A(t)]$, has a directed spanning tree. It is worth noting that the conditions are analogous to those in [9], [10], but in the stochastic setting. In view of the special structure of the closed-loop systems concerning consensus for single-integrator kinematics, basic properties of the stochastic matrices play a crucial role in the convergence analysis of the associated control algorithms. 
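The role of the expected topology can be illustrated with a small simulation. The sketch below is a discrete-time version of (2) in which every link of an undirected ring fails independently at each step; the Bernoulli link model, the gain `eps`, and the name `random_link_consensus` are illustrative assumptions rather than the setup of [27]–[29].

```python
import numpy as np


def random_link_consensus(edges, x0, p_link, eps=0.2, steps=2000, seed=0):
    """Discrete-time consensus where each undirected link is active with
    probability p_link, independently at every step (assumed link model).

    eps times the maximum degree should stay below 1 so that every update
    is a convex combination of current states.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        active = rng.random(len(edges)) < p_link
        dx = np.zeros_like(x)
        for (i, j), on in zip(edges, active):
            if on:
                dx[i] += x[j] - x[i]
                dx[j] += x[i] - x[j]
        x += eps * dx
    return x


ring = [(i, (i + 1) % 6) for i in range(6)]
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Links are up only 30% of the time, yet the expected graph is the connected
# ring, and the states agree on the initial average.
print(random_link_consensus(ring, x0, p_link=0.3))

# With p_link = 0 the expected graph has no edges and nothing moves.
print(random_link_consensus(ring, x0, p_link=0.0))
```

Because the links here are symmetric, each update preserves the sum of the states, so the agreement value is the initial average; with directed random links that need not hold.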
Consensus for double-integrator dynamics was studied in [30], where the switching network topology is assumed to be driven by a Bernoulli process, and it was shown that consensus can be achieved if the union of all the graphs has a directed spanning tree. Apparently, the requirement on the network topology for double-integrator dynamics is a special case of that for single-integrator kinematics due to the different nature of the final states (constant final states for single-integrator kinematics and possibly dynamic final states for double-integrator dynamics) caused by the substantial dynamical difference. It is still an open question whether some general conditions (corresponding to some specific algorithms) can be found for consensus with double-integrator dynamics.

In addition to analyzing the conditions on the network topology such that consensus can be achieved, a special type of consensus algorithm, the so-called gossip algorithm [31], [32], has been used to achieve consensus in the stochastic setting. 
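For readers unfamiliar with gossip, a minimal sketch of one common variant, randomized pairwise averaging, is given here. Whether [31], [32] use exactly this update is an assumption; the uniform edge selection and the name `gossip_average` are likewise illustrative.

```python
import numpy as np


def gossip_average(edges, x0, steps=2000, seed=0):
    """Randomized pairwise gossip: at each step one available link is drawn
    uniformly at random and its two endpoints replace their states by the
    pair average (assumed variant of the gossip algorithm)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x


# The available pairwise channels form a connected ring of six agents.
ring = [(i, (i + 1) % 6) for i in range(6)]
print(gossip_average(ring, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]))
```

Only one pair communicates per step, which is what makes the scheme attractive when simultaneous communication over all links is unavailable.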
The gossip algorithm can always guarantee consensus almost surely if the available pairwise communication channels satisfy certain conditions (such as a connected graph). The way in which the network topology switches does not play any role in the consideration of consensus.

The current study on consensus over stochastic network topologies has shown some interesting results regarding: (1) consensus algorithm design for various multi-agent systems, (2) conditions of the network topologies on consensus, and (3) effects of the stochastic network topologies on the convergence rate. Future research on this topic includes, but is not limited to, the following two directions: (1) when the network topology itself is stochastic, how to determine the probability of reaching consensus almost surely? (2) compared with the deterministic network topology, what are the advantages and disadvantages of the stochastic network topology, regarding, for example, robustness and convergence rate?

As is well known, disturbances and uncertainties often exist in networked systems, for example, channel noise, communication noise, uncertainties in network parameters, etc. In addition to the stochastic network topologies discussed above, the effect of stochastic disturbances [33], [34] and uncertainties [35] on the consensus problem also needs investigation. 
Study has been mainly devoted to analyzing the performance of consensus algorithms subject to disturbances and to presenting conditions on the uncertainties such that consensus can be achieved. In addition, another interesting direction in dealing with disturbances and uncertainties is to design distributed local filtering algorithms so as to save energy and improve computational efficiency. Distributed local filtering algorithms play an important role and are more effective than traditional centralized filtering algorithms for multi-agent systems. For example, in [36]–[38] some distributed Kalman filters are designed to implement data fusion. In [39], by analyzing consensus and pinning control in synchronization of complex networks, distributed consensus filtering in sensor networks is addressed. Recently, Kalman filtering over a packet-dropping network was designed through a probabilistic approach [40]. Today, it remains a challenging problem to incorporate both dynamics of consensus and probabilistic (Kalman) filtering into a unified framework.

B. Complex Dynamical Systems

Since consensus is concerned with the behavior of a group of vehicles, it is natural to consider the system dynamics for practical vehicles in the study of the consensus problem. 
Although the study of consensus under various system dynamics is due to the existence of complex dynamics in practical systems, it is also interesting to observe that system dynamics play an important role in determining the final consensus state. For instance, the well-studied consensus of multi-agent systems with single-integrator kinematics often converges to a constant final value. However, consensus for double-integrator dynamics might admit a dynamic final value (i.e., a time function). These important issues motivate the study of consensus under various system dynamics.

As a direct extension of the study of the consensus problem for systems with simple dynamics, for example, with single-integrator kinematics or double-integrator dynamics, consensus with general linear dynamics was also studied recently [41]–[43], where research is mainly devoted to finding feedback control laws such that consensus (in terms of the output states) can be achieved for general linear systems

$\dot{x}_i = A x_i + B u_i, \quad y_i = C x_i,$ (3)

where $A$, $B$, and $C$ are constant matrices with compatible sizes. Apparently, the well-studied single-integrator kinematics and double-integrator dynamics are special cases of (3) for properly chosen $A$, $B$, and $C$.

As a further extension, consensus for complex systems has also been extensively studied. Here, the term consensus for complex systems is used for the study of the consensus problem when the system dynamics are nonlinear [44]–[48] or with nonlinear consensus algorithms [49], [50]. Examples of the nonlinear system dynamics include:
• Nonlinear oscillators [45]. The dynamics are often assumed to be governed by the Kuramoto equation $\dot{\theta}_i = \omega_i + K$ [...] stability.

A well-studied consensus algorithm for (1) is given in (2), where it is now assumed that time delay exists. Two types of time delays, communication delay and input delay, have been considered in the literature. Communication delay accounts for the time for transmitting information from origin to destination. More precisely, if it takes time $T_{ij}$ for agent $i$ to receive 
information from agent $j$, the closed-loop system of (1) using (2) under a fixed network topology becomes

$\dot{x}_i(t) = \sum_{j=1}^{n} a_{ij}(t) [x_j(t - T_{ij}) - x_i(t)].$ (7)

An interpretation of (7) is that at time $t$, agent $i$ receives information from agent $j$ and uses data $x_j(t - T_{ij})$ instead of $x_j(t)$ due to the time delay. Note that agent $i$ can get its own information instantly; therefore, input delay can be considered as the summation of computation time and execution time. More precisely, if the input delay for agent $i$ is given by $T_i^p$, then the closed-loop system of (1) using (2) becomes

$\dot{x}_i(t) = \sum_{j=1}^{n} a_{ij}(t) [x_j(t - T_i^p) - x_i(t - T_i^p)].$ (8)

Clearly, (7) refers to the case when only communication delay is considered, while (8) refers to the case when only input delay is considered. It should be emphasized that both communication delay and input delay might be time-varying and they might co-exist.

In addition to time delay, it is also important to consider packet drops in exchanging state information. Fortunately, consensus with packet drops can be considered as a special case of consensus with time delay, because re-sending packets after they were dropped simply introduces a time delay in the data transmission channels.

Thus, the main problem involved in consensus with time delay is to study the effects of time delay on the convergence and performance of consensus, referred to as consensusability [52]. Because time delay might affect the system stability, it is important to study under what conditions consensus can still be guaranteed even if time delay exists. In other words, can one find conditions on the time delay such that consensus can be achieved? For this purpose, the effect of time delay on the consensusability of (1) using (2) was investigated. When there exists only (constant) input delay, a sufficient condition on the time delay to guarantee consensus under a fixed undirected interaction graph is presented in [8]. Specifically, an upper bound for the time delay is derived under which consensus can 
be achieved. This is a well-expected result because time delay normally degrades the system performance gradually but will not destroy the system stability unless the time delay is above a certain threshold. Further studies can be found in, e.g., [53], [54], which demonstrate that for (1) using (2), the communication delay does not affect the consensusability but the input delay does. In a similar manner, consensus with time delay was studied for systems with different dynamics, where the dynamics (1) are replaced by other more complex ones, such as double-integrator dynamics [55], [56], complex networks [57], [58], rigid bodies [59], [60], and general nonlinear dynamics [61].

In summary, the existing study of consensus with time delay mainly focuses on analyzing the stability of consensus algorithms with time delay for various types of system dynamics, including linear and nonlinear dynamics. Generally speaking, consensus with time delay for systems with nonlinear dynamics is more challenging. For most consensus algorithms with time delays, the main research question is to determine an upper bound of the time delay under which time delay does not affect the consensusability. For communication delay, it is possible to achieve consensus under a relatively large time delay threshold. A notable phenomenon in this case is that the final consensus state is constant. Considering both linear and nonlinear system dynamics in consensus, the main tools for stability analysis of the closed-loop systems include matrix theory [53], Lyapunov functions [57], the frequency-domain approach [54], passivity [58], and the contraction principle [62].
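The difference between communication delay, as in (7), and input delay, as in (8), can be illustrated numerically. The following is a minimal sketch rather than a method from the cited works: it assumes a three-agent complete graph with unit weights, a uniform constant delay, a constant initial history, and forward-Euler integration. The names `simulate`, `ADJ`, and `X0` are hypothetical. For this setting the classical bound for input delay on an undirected graph is taken to be τ < π/(2·λ_max(L)), where λ_max(L) = 3 is the largest Laplacian eigenvalue of the example graph.

```python
import math

def simulate(adj, x0, delay, mode, dt=0.01, t_end=30.0):
    """Forward-Euler simulation of delayed single-integrator consensus.

    mode="communication": x_i' = sum_j a_ij [x_j(t-T) - x_i(t)]    (form of (7))
    mode="input":         x_i' = sum_j a_ij [x_j(t-T) - x_i(t-T)]  (form of (8))
    Returns the spread max_i x_i - min_i x_i at every time step.
    """
    n = len(x0)
    d = int(round(delay / dt))                  # delay measured in steps
    hist = [list(x0) for _ in range(d + 1)]     # hist[0] is the state d steps ago
    spreads = [max(x0) - min(x0)]
    for _ in range(int(round(t_end / dt))):
        cur, old = hist[-1], hist[0]
        new = []
        for i in range(n):
            # Own state is current for communication delay, delayed for input delay.
            own = cur[i] if mode == "communication" else old[i]
            u = sum(adj[i][j] * (old[j] - own) for j in range(n))
            new.append(cur[i] + dt * u)
        hist.append(new)
        hist.pop(0)
        spreads.append(max(new) - min(new))
    return spreads

# Hypothetical example: complete graph on three agents, unit weights.
ADJ = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
X0 = [1.0, 2.0, 6.0]
LAMBDA_MAX = 3.0                                # largest Laplacian eigenvalue of ADJ
TAU_BOUND = math.pi / (2 * LAMBDA_MAX)          # about 0.52 for this graph

comm = simulate(ADJ, X0, 1.0, "communication")  # delay above TAU_BOUND
inp_small = simulate(ADJ, X0, 0.2, "input")     # delay below TAU_BOUND
inp_large = simulate(ADJ, X0, 1.0, "input")     # delay above TAU_BOUND

print("communication delay 1.0, final spread:", comm[-1])
print("input delay 0.2, final spread:", inp_small[-1])
print("input delay 1.0, peak spread near the end:", max(inp_large[-500:]))
```

In this example the same delay of 1.0 leaves the communication-delay system convergent while the input-delay system oscillates with growing amplitude, which matches the qualitative statement that input delay, not communication delay, limits consensusability for (1) using (2).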
Although consensus with time delay has been studied extensively, it is often assumed that time delay is either constant or random. However, time delay itself might obey its own dynamics, which possibly depend on the communication distance, total computation load, computation capability, etc. Therefore, it is more suitable to represent the time delay as another system variable to be considered in the study of the consensus problem. In addition, it is also important to consider time delay and other physical constraints simultaneously in the study of the consensus problem.

D. Quantization

Quantized consensus has been studied recently with motivation from digital signal processing. Here, quantized consensus refers to consensus when the measurements are digital rather than analog; therefore, the information received by each agent is not continuous and might have been truncated due to digital finite-precision constraints. Roughly speaking, for an analog signal $s$, a typical quantizer with an accuracy parameter $\delta$, also referred to as the quantization step size, is described by $Q(s)=q(s,\delta)$, where $Q(s)$ is the quantized signal and $q(\cdot,\cdot)$ is the associated quantization function. For instance [63], a quantizer rounding a signal $s$ to its nearest integer can be expressed as $Q(s)=n$ if $s\in[(n-1/2)\delta,(n+1/2)\delta]$, $n\in\mathbb{Z}$, where $\mathbb{Z}$ denotes the integer set. Note that the types of quantizers might be different for different systems; hence $Q(s)$ may differ for different systems. Due to the truncation of the signals received, consensus is now considered achieved if the maximal state difference is not larger than the accuracy level associated with the whole system. A notable feature of consensus with quantization is that the time to reach consensus is usually finite. That is, it often takes a finite period of time for all agents' states to converge to an accuracy interval. Accordingly, the main research is to investigate the convergence time associated with the proposed consensus algorithm. Quantized consensus was probably first studied
in [63], where a quantized gossip algorithm was proposed and its convergence was analyzed. In particular, the bound of the convergence time for a complete graph was shown to be polynomial in the network size. In [64], coding/decoding strategies were introduced to the quantized consensus algorithms, where it was shown that the convergence rate depends on the accuracy of the quantization but not the coding/decoding schemes. In [65], quantized consensus was studied via the gossip algorithm, with both lower and upper bounds of the expected convergence time in the worst case derived in terms of the principal submatrices of the Laplacian matrix. Further results regarding quantized consensus were reported in [66]–[68], where the main research was also on the convergence time for various proposed quantized consensus algorithms as well as the quantization effects on the convergence time. It is intuitively reasonable that the convergence time depends on both the quantization level and the network topology. It is then natural to ask if and how the quantization methods affect the convergence time. This is an important measure of the robustness of a quantized consensus algorithm (with respect to the quantization method).

Note that it is interesting but also more challenging to study consensus for general linear/nonlinear systems with quantization. Because the difference between the truncated signal and the original signal is bounded, consensus with quantization can be considered as a special case of consensus without quantization in the presence of bounded disturbances. Therefore, if consensus can be achieved for a group of vehicles in the absence of quantization, it might be intuitively correct to say that the differences among the states of all vehicles will be bounded if the quantization step is small enough. However, it is still an open question to rigorously describe the quantization effects on consensus with general linear/nonlinear systems.

E. Remarks

In summary, the existing research on the
consensus problem has covered a number of physical properties for practical systems and control performance analysis. However, the study of the consensus problem covering multiple physical properties and/or control performance analysis has been largely ignored. In other words, two or more problems discussed in the above subsections might need to be taken into consideration simultaneously when studying the consensus problem. In addition, consensus algorithms normally guarantee the agreement of a team of agents on some common states without taking group formation into consideration. To reflect many practical applications where a group of agents are normally required to form some preferred geometric structure, it is desirable to consider a task-oriented formation control problem for a group of mobile agents, which motivates the study of formation control presented in the next section.

IV. FORMATION CONTROL

Compared with the consensus problem, where the final states of all agents typically reach a singleton, the final states of all agents can be more diversified under the formation control scenario. Indeed, formation control is more desirable in many practical applications such as formation flying, cooperative transportation, sensor networks, as well as combat intelligence, surveillance, and reconnaissance. In addition, the performance of a team of agents working cooperatively often exceeds the simple integration of the performances of all individual agents. For its broad applications and advantages, formation control has been a very active research subject in the control systems community, where a certain geometric pattern is to be formed with or without a group reference. More precisely, the main objective of formation control is to coordinate a group of agents such that they can achieve some desired formation so that some tasks can be finished by the collaboration of the agents. Generally speaking, formation control can be categorized according to the group reference. Formation control without a
group reference, called formation producing, refers to the algorithm design for a group of agents to reach some pre-desired geometric pattern in the absence of a group reference, which can also be considered as the control objective. Formation control with a group reference, called formation tracking, refers to the same task but following the predesignated group reference. Due to the existence of the group reference, formation tracking is usually much more challenging than formation producing, and control algorithms for the latter might not be useful for the former. As of today, there are still many open questions in solving the formation tracking problem.

The following part of the section reviews and discusses recent research results and progress in formation control, including formation producing and formation tracking, mainly accomplished after 2006. Several milestone results prior to 2006 can be found in [69]–[71].

A. Formation Producing

The existing work in formation control aims at analyzing the formation behavior under certain control laws, along with stability analysis.

1) Matrix Theory Approach: Due to the nature of multi-agent systems, matrix theory has been frequently used in the stability analysis of their distributed coordination.

Note that the consensus input to each agent (see, e.g., (2)) is essentially a weighted average of the differences between the states of the agent's neighbors and its own. As an extension of the consensus algorithms, some coupling matrices were introduced here to offset the corresponding control inputs by some angles [72], [73]. For example, given (1), the control input (2) is revised as

$$u_i(t)=\sum_{j=1}^{n}a_{ij}(t)\,C\,[x_j(t)-x_i(t)],$$

where $C$ is a coupling matrix with compatible size. If $x_i\in\mathbb{R}^3$, then $C$ can be viewed as the 3-D rotational matrix. The main idea behind the revised algorithm is that the original control input for reaching consensus is now rotated by some angles.
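The effect of a coupling matrix on the consensus input can be sketched in a few lines. This is an illustrative example under stated assumptions, not the algorithm of [72], [73]: it uses planar states with a 2-D rotation matrix in place of the 3-D rotational matrix mentioned above, a three-agent complete graph with unit weights, and forward-Euler integration. The names `rotated_consensus`, `ADJ`, and `P0` are hypothetical.

```python
import math

def rotated_consensus(adj, pts, theta, dt=0.01, t_end=20.0):
    """Euler simulation of x_i' = sum_j a_ij C (x_j - x_i), with C a 2-D rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    pts = [tuple(p) for p in pts]
    n = len(pts)
    for _ in range(int(round(t_end / dt))):
        new = []
        for i in range(n):
            # Standard consensus term (weighted sum of relative positions).
            ex = sum(adj[i][j] * (pts[j][0] - pts[i][0]) for j in range(n))
            ey = sum(adj[i][j] * (pts[j][1] - pts[i][1]) for j in range(n))
            # Rotate the consensus term by theta before applying it.
            ux = c * ex - s * ey
            uy = s * ex + c * ey
            new.append((pts[i][0] + dt * ux, pts[i][1] + dt * uy))
        pts = new
    return pts

# Hypothetical example: three agents on a complete graph with unit weights.
ADJ = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P0 = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
centroid = (sum(p[0] for p in P0) / 3, sum(p[1] for p in P0) / 3)

final = rotated_consensus(ADJ, P0, math.pi / 4)
print("centroid:", centroid)
print("final positions:", final)
```

With a rotation angle of π/4 the agents spiral inward and meet at the initial centroid, which stays fixed because the adjacency matrix is symmetric. In this small example the decay rate scales with cos(θ), so angles approaching π/2 slow convergence and turn the motion into near-circular orbits.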
The closed-loop system can be expressed in a vector form, whose stability can be determined by studying the distribution of the eigenvalues of a certain transfer matrix. Main research work was conducted in [72], [73] to analyze the collective motions for systems with single-integrator kinematics and double-integrator dynamics, where the network topology, the damping gain, and $C$ were shown to affect the collective motions. Analogously, the collective motions for a team of nonlinear self-propelling agents were shown to be affected by
A Particle Model Method for Cooperative Problem Solving in Multi-Agent Systems
Authors: 赵旭宝; 李静; 董靓瑜

Abstract: The relation between distributed cooperative problem solving in a multi-agent system (MAS) and particle collaboration is discussed, and a particle model for cooperative problem solving in MAS is proposed, which transforms the cooperative solving of task-resource planning into the joint optimization of multiple particles. A parameter for the degree of collaboration is introduced, a formula for demand intensity and a benefit objective function are established, and a particle swarm optimization algorithm suited to the problem is constructed. Through the evolutionary computation, an optimal solution for task allocation and resource assignment is found. The proposed approach can describe and handle the self-organization of agents as well as the randomness and concurrency of social interaction behaviors in complicated problem solving. Simulation experiments demonstrate the effectiveness and convergence of the method.

Journal: Journal of Dalian Jiaotong University (《大连交通大学学报》)
Year (volume), issue: 2012 (033) 002
Pages: 6 (P94-99)
Keywords: multi-agent system (MAS); distributed cooperative solving; particle swarm algorithm; task-resource planning and allocation
Affiliation: School of Software, Dalian Jiaotong University, Dalian, Liaoning 116021 (all authors)
Language of the text: Chinese

0 Introduction

In distributed artificial intelligence, agent-based architectures provide flexibility and robustness and are suited to dynamic, uncertain, and distributed problems. Each agent in the system is an autonomous intelligent entity with cognitive attributes such as beliefs, desires, and goals, and social attributes such as commitment, obligation, cooperation, and competition [1-2]. By representing their own knowledge and describing the problem domain, the agents form distributed, heterogeneous, problem-specific solving subsystems that complete the assigned tasks. In a multi-agent distributed system, however, the knowledge resources and execution capability of each agent are limited. When a single agent cannot complete an assigned task on its own, or when several agents working together would produce greater benefit, the agents tend to use cooperation mechanisms to exchange information and share knowledge so as to solve the task cooperatively. To guarantee the performance of cooperative solving, many researchers have carried out extensive research on multi-agent
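system cooperative solving models and cooperative solving methods. Reference [3] proposes a cooperative solving strategy oriented toward a common goal, focusing on maximizing the benefit of the system; references [4-5] propose a cooperative solving method for multi-agent systems based on spring networks, which coordinates the agents through a self-organizing dynamics strategy; references [6-7] propose cooperative solving methods based on the contract net protocol, in which agents first negotiate alliances, then plan and solve, and resolve conflicts through negotiation. Current research on cooperative solving in distributed multi-agent systems falls into two types: methods in which each agent seeks its own maximum benefit, and methods in which the agents jointly seek the maximum benefit of the whole system. The former has no global optimization objective and lacks a unified global control strategy; the latter has difficulty describing the self-variation and self-organization of individual agents. Moreover, although both types involve cooperation and interaction among agents, the interaction is limited to simple social behaviors, and during problem solving they cannot handle in time the dynamic changes of the environment and of the agents themselves, or the randomness and concurrency of social interaction behaviors. For this reason, this paper proposes a particle model method for cooperative solving in multi-agent systems. Cooperative solving is transformed into a joint optimization process of multiple particles, which overcomes the problems caused by dynamic changes in the cognitive and social attributes of agents and by randomness and concurrency, so that each agent obtains its own maximum benefit while also promoting the overall benefit of the system. Finally, a parameter for the degree of collaboration is introduced, a formula for the demand intensity of cooperative solving, a system benefit objective function, and an optimization algorithm are given, and an optimized resource allocation is obtained by iterating the algorithm. Simulation results show that the method has good convergence and practicality.

1 Particle Model for Solving

1.1 Problem Description

Without loss of generality, this paper discusses cooperative solving methods and optimization algorithms for multi-agent systems against the background of cooperative task-resource planning and allocation in a distributed multi-agent environment. Let the set of task agents to be solved and the set of resource agents formed by knowledge and execution capability be Task = {AgentT1, AgentT2, …, AgentTn} and Res = {AgentR1, AgentR2, …, AgentRm}, respectively, and let each sub-resource AgentRi have resource capacity $m_i$. The amount of resource that sub-resource AgentRi allocates to subtask AgentTk for completing its planned task is $r_{ik}$ ($r_{ik}\le m_{ik}$). Subtask AgentTk pays sub-resource AgentRi a reward of $p_{ik}$ per unit of resource. Let $S_k$ be the total amount of resources that the k-th subtask AgentTk demands from all sub-resource agents, and let $P_k$ be the total resource reward that the k-th subtask AgentTk can pay. Which sub-resource agents subtask AgentTk uses when executing its task depends on its demand intensity $x_{ik}$ for sub-resource AgentRi ($1\le i\le m$, $1\le k\le n$); $x_{ik}$ indicates that the k-th task needs the i-th resource. Therefore, in distributed multi-agent cooperative solving, the goal of task-resource planning is to find an optimized allocation scheme that maximizes the overall benefit of the system while minimizing the usage of each resource.

1.2 Particle Model

From the above analysis, the demand of a single subtask AgentTk for each sub-resource AgentRi is a function of $r_{ik}$, $p_{ik}$, and $x_{ik}$, written as $A_{ik}=F(r_{ik},p_{ik},x_{ik})$. The demands of all tasks for resources can be expressed as the allocation matrix $T\_R=[A_{ik}]_{m\times n}$ ($1\le i\le m$, $1\le k\le n$):

$$T\_R=\begin{bmatrix}A_{11}&\cdots&A_{1n}\\ \vdots&&\vdots\\ A_{m1}&\cdots&A_{mn}\end{bmatrix}$$

In this allocation matrix, each subtask AgentTk ($1\le k\le n$) may have a demand for each resource AgentRi ($1\le i\le m$) when completing its task. In other words, each sub-resource AgentRi may be allocated to different subtasks ($r_{ik}x_{ik}\le m_i$, $\forall i=1,2,\dots,m$). When several subtasks need the same resource at the same time, a resource usage conflict may arise. For this reason, this paper proposes a particle model for cooperative resource solving. Each sub-resource AgentRi is regarded as an indivisible individual, called a particle, and each sub-resource particle can be allocated to only one subtask particle at a time. Thus, when several subtasks need to use a sub-resource simultaneously, the agent particles tend to solve cooperatively to complete the subtasks together; likewise, when several sub-resource agent particles completing a planned task together would be more effective, they also tend to cooperate. Whether agent particles cooperate depends on the value of the cooperative solving intensity $x_{ik}$.

1.3 Computing the Demand Intensity

In practical applications, the value of a task's demand intensity for a resource must account not only for changes in the agents' own cognitive attributes, such as desires and goals, but also for the influence of complex social interaction behaviors on the demand intensity during cooperative solving. The social interaction behaviors involved in cooperative solving by a group of agents can be roughly divided into two classes:

(1) for subtask AgentTk, the cooperative interaction $\rho_{ijk}$ between sub-resource particle AgentRi and sub-resource particle AgentRj;
(2) for sub-resource AgentRi, the cooperative interaction $\rho'_{kji}$ between subtask particle AgentTk and subtask particle AgentTj.

The meaning of $\rho_{ijk}$ is as follows: for subtask AgentTk, if resource AgentRi and resource AgentRj share the same desires and goals, and their interaction accelerates the execution of the task or produces more benefit, then this strengthens the demand of task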
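AgentTk for resource particles AgentRi and AgentRj, with strengthening intensity $\rho_{ijk}$; otherwise the demand is weakened. Similarly, the meaning of $\rho'_{kji}$ is as follows: for sub-resource AgentRi, if task AgentTk and task AgentTj interact cooperatively in a way that simplifies task execution and saves resources, then the allocation of resource AgentRi to task particles AgentTk and AgentTj is strengthened, with strengthening intensity $\rho'_{kji}$; otherwise it is weakened. The effects of different types of interaction are assumed to be additive. Therefore, following this classification of social interaction behaviors, the demand intensity of a task for a resource during cooperative solving can be computed by equation (2), where $w_{ij}$ is an association weight taking values in [0,1]. It represents the degree of association between agent particles in cooperative solving and changes with the dynamics of the agent particles, such as metabolism, random failures, and competition, exploitation, and deception in cooperative interaction. At the end of an agent particle's life cycle, $w_{ij}=0$. To reduce the complexity of the problem, this paper assumes that interaction strengthens and weakens the demand intensity to the same degree, that is, $\rho_{ijk}$ and $\rho'_{kji}$ take the same small value.

By equation (2), the computed demand intensity $x_{ik}$ is not an integer, which violates the indivisibility of particles, so it must be processed further. By construction, the demand intensity $x_{ik}$ between particles is defined to take only the integer values 0 or 1 and to satisfy … ($\forall i=1,2,\dots,m$). Here $x_{ik}=1$ means that the i-th resource particle is allocated to the k-th subtask particle, and $x_{ik}=0$ means the opposite. The value of the demand intensity is described by equation (3): when the computed demand intensity exceeds a given threshold $\mu$, $x_{ik}=1$; otherwise $x_{ik}=0$.

After this construction, the state of one round of cooperative task-resource planning can be represented as in Fig. 1. The dots in Fig. 1 represent allocations of sub-resource particles to subtask particles, and the directed edges represent the cooperative solving process between particles. For example, the directed edge $\langle x_{12},x_{24}\rangle$ means that subtask particle T2, while using resource particle R1, also needs resource particle R2, but R2 is already used by T4; therefore resource particles R1 and R2 establish a cooperative relationship. The more directed edges in Fig. 1, the larger the scale of cooperative solving inside the system.

In addition, the cooperative interaction among the agent particles can be described from the perspective of resource allocation as a directed graph, as in Fig. 2. The weight on each edge of Fig. 2 represents the consumption cost of the system. Because of the dynamic changes of the agent particles themselves and the randomness and concurrency of social interaction behaviors, cooperative solving comes at the cost of communication overhead and resource consumption. This paper defines the resource consumption cost produced when the i-th and the j-th agent particles solve cooperatively as $c_{ij}$. The number of directed edges in Fig. 2 also reflects the degree of cooperative interaction in the system, which is denoted by the variable $e\in[0,1]$.

Fig. 1 Agent cooperative solving model
Fig. 2 Cooperative solving process of resource agents

2 Mathematical Model

In the above model, each assignment of values to the $x_{ik}$ determines one allocation scheme for the task-resource planning, from which the overall benefit of the planning system can be computed. The benefit objective function for cooperative solving is given below. Suppose the benefit produced after the k-th subtask is completed is a continuously differentiable, strictly concave function of the resource consumption cost; then the benefit function of the k-th task is … ($\forall k=1,2,\dots,n$), where $\lambda_1$ is the agent particle's own benefit factor ($\lambda_1$ is a small random positive number), which is related to the self-variation and self-organization of the agent during its life cycle and to the resource utilization rate and demand satisfaction rate, and … is the total resource reward consumed by the k-th task. Similarly, when the i-th resource and the j-th resource solve cooperatively, the resulting cooperative benefit is $W_{kb}(X)=\dots$ ($i\ne j$ and $b>k$, $\forall k,b=1,2,\dots,n$), where $\lambda_2$ is the benefit factor of the cooperation process ($\lambda_2$ is a small random positive number), which is related to social interaction behaviors such as congestion, deception, competition, and priority competition, and … is the total reward arising from communication overhead and resource consumption during cooperation. The total benefit function of the whole system is therefore …. The goal of the system is to maximize the total benefit $W(X)$. For the total benefit function $W(X)$, maximizing the total benefit is equivalent to minimizing the total consumption cost of the whole system, that is, minimizing equation (4) subject to the constraints (5), (6), and (7), where $\lambda_1$ and $\lambda_2$ are random numbers in (0,1).

3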
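Particle Swarm Algorithm for Cooperative Solving

In the particle model for distributed cooperative solving in multi-agent systems, each planning solution $x_{ik}$ takes the value 0 or 1 ($1\le i\le m$, $1\le k\le n$), so cooperative task-resource solving is a combinatorial optimization problem. This paper therefore constructs an improved particle swarm optimization algorithm suited to the problem [8]. In each solve, the number of sub-resource particles m is the dimension of the search space in each iteration; that is, each dimension represents the allocation of one resource particle to a task particle and is encoded as an integer. If a resource particle is not used by any subtask, it is encoded as 0. For example, $P_{km}$ denotes the m-th dimension of the k-th solution in the algorithm, and $P_{km}=n-1$ means that in the k-th solution the m-th resource particle is allocated to the (n-1)-th subtask. The algorithm uses the penalty function method [9] to handle the constrained optimization problem: any infeasible solution is discarded directly. The resulting fitness is computed by equation (8). With this fitness, iterating the optimization of equation (4) yields the solution with minimum system consumption cost. The particle swarm algorithm for cooperative solving is as follows:

(1) Randomly initialize the particle population: positions X, velocities V, population size N, learning factors C1 and C2, inertia factor w, maximum number of iterations Max_Len, and the system parameters $p_{ik}$, $r_{ik}$, $c_{ij}$.
(2) Using the given initial positions, initialize the individual best positions Pi and the global best position Pg. According to the autonomy and social interaction of each agent, compute the demand intensity parameters $x_{ik}$ using equations (2) and (3).
(3) While (k <= Max_Len && φ < 0.0001)   // φ is the precision reached by the optimization
      For i = 1:N
        For j = 1:m
          Change(X, V)   // update the position X and velocity V of each solution
        End For
(4)     calculation(i);   // compute the fitness of the i-th solution by equation (8)
        localbest(i);   // compare the i-th solution with its best position so far, Pi, and keep the better one as its best position
      End For
      globalbest();   // compare each Pi with the best position found globally and obtain the global best Pg
    EndWhile

4 Simulation Example

4.1 Parameter Settings

Because each sub-resource agent and subtask agent has complex behaviors such as different priorities, degrees of autonomy, and interactivity, the relevant system parameters were generated randomly in the simulation. The parameters and variables of the simulation program were set as follows. The dimension of the search space in each iteration is the number of resources m. The acceleration factors c1 and c2 are set to 2. The inertia weight is set empirically as w = 0.9 - count*0.5/(Max_Len - 1), where count denotes the current count-th particle. The range of population positions is set to [1,N] according to the problem under study, and the range of particle velocities is set to [1,N]. The maximum number of iterations Max_Len is 500. The remaining parameters are drawn at random from the following ranges: $w_{ij}\in[0,1]$; $\lambda_1\in[0.01,0.1]$; $\lambda_2\in[0.1,0.2]$; $m_i\in[50,100]$; $p_{ik}\in[2,10]$; $r_{ik}\in[1,5]$; $c_{ij}\in[5,10]$; $S_k\in[100,200]$; $P_k\in[100,200]$.

4.2 Simulation Results and Convergence Analysis

The evolution of the global best fitness reflects the optimization process of cooperative solving. Guided by individual and global experience, the population keeps adjusting the position of each solution and finally reaches the global optimum by iterative search. Because the search space of each solution is defined by the number of resources, each global best fitness value represents one allocation scheme for the task resources. In cooperative solving the numbers of subtasks and sub-resources are not fixed, and the degree of cooperation e also changes dynamically. To examine the effectiveness of the algorithm comprehensively, experiments were carried out for different combinations of subtasks, sub-resources, and values of e, as shown in Figs. 3 and 4.

Fig. 3 shows the evolution of the global best fitness Pg for (m,n) = (12,10) and e = 0.3, 0.5, 0.8. When the numbers of resource particles and task particles are held fixed and the degree of cooperation e varies, the global best fitness differs greatly, from 17.1 to 41.2, while the convergence speed is essentially the same. This is because the resource consumption of cooperative solving increases with e, which raises the total consumption of the system; but since the numbers of resources and tasks are unchanged, the search space of each particle has the same dimension, so the convergence speed is essentially the same. This also indicates that the convergence speed of the algorithm is independent of the degree of cooperation e.

When the numbers of resources and tasks differ and the degree of cooperation is the same, the global best fitness varies greatly and convergence becomes markedly slower as the number of resource particles increases, because more resources enlarge the search space and the iterations take longer, as shown in Fig. 4. Although the global best fitness changes at different speeds in Fig. 4, all curves eventually level off, which shows that the method converges well. The simulation results also confirm that the method is suitable for task-resource planning and allocation under different cooperation conditions.

Fig. 3 Evolution of the global best fitness ((m,n) = (12,10))
Fig. 4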
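Evolution of the global best fitness (e = 0.2)

In the simulation, for different numbers of resource particles and task particles and different values of e, the mean and standard deviation of the global best fitness over run_time = 30 runs are listed in the table below. The standard deviation of the global best fitness ranges from 0.22 to 0.42, which indicates that the algorithm has good convergence and effectiveness. The simulation results also confirm that the method is suitable for task-resource planning and allocation under different cooperation conditions and has a degree of generality.

Table: Results of task-resource planning and allocation

(m,n)      e      Mean global best fitness   Standard deviation (STD)
(10,8)     0.3     9.99                      0.22
(10,8)     0.5    16.03                      0.37
(12,10)    0.3    16.86                      0.3
(12,10)    0.5    25.36                      0.32
(16,14)    0.3    28.58                      0.39
(16,14)    0.5    45.97                      0.42

5 Conclusion

Cooperative solving in distributed multi-agent systems is complex and must account for the dynamics, randomness, and complexity of agent autonomy and interaction. For the problem of cooperative task-resource planning and allocation in a distributed environment, this paper proposed a particle model method for cooperative solving in distributed multi-agent systems. By discussing the relation between distributed cooperative solving and particle collaboration, the cooperative solving problem was transformed into a joint optimization process of multiple particles, and a particle swarm algorithm suited to the method was constructed. During the iterations, the global best fitness converged at different speeds but always reached a steady state, which shows that the algorithm converges. The simulation results show that the number of resources determines the search space of the algorithm and affects the convergence speed, whereas the convergence rate is independent of the degree of cooperation. The results also confirm that the model can both cope with dynamic changes of the environment and of the agents themselves and handle the influence of changing social interaction behaviors on cooperative solving, so that it can solve a variety of complex task-resource planning problems effectively and generally.

References:
[1] 张新良, 石纯一. 多Agent合作求解[J]. 计算机科学, 2003, 30(8): 100-103.
[2] 李英. 多Agent系统及其在预测与智能交通系统的应用[M]. 上海: 华东理工大学出版社, 2004.
[3] WOOLDRIDGE M, JENNINGS N R. The cooperative problem-solving process[J]. Journal of Logic Computation, 1999, 9(4): 563-592.
[4] SHUAI Dianxun, FENG Xiang. Distributed problem solving in multi-agent system: A spring net approach[J]. IEEE Intelligent System, 2005, 20(4): 66-74.
[5] 帅典勋, 王亮. 一种新的基于复合弹簧网络的多Agent系统分布式问题求解方法[J]. 计算机学报, 2002, 25(8): 853-859.
[6] 陶海军, 王亚东, 郭茂祖, 等. 基于熟人联盟及扩充合同网协议的多智能体协商模型[J]. 计算机研究与发展, 2006, 43(7): 1155-1160.
[7] 陈宇, 陈新, 陈新度, 等. 基于设备整体效能和多Agent的预测-反应式调度[J]. 计算机集成制造系统, 2009, 15(8): 1599-1605.
[8] 谢晓锋, 张文俊, 杨之廉. 微粒群算法综述[J]. 控制与决策, 2003, 18(2): 129-134.
[9] 徐刚, 于泳波. 基于改进的微粒子算法求解0/1背包问题[J]. 齐齐哈尔大学学报, 2007, 23(1): 71-74.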
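Three small building blocks described in the paper can be sketched directly: the inertia weight schedule w = 0.9 - count*0.5/(Max_Len - 1), the thresholding of demand intensities against μ as in equation (3), and the rule that infeasible solutions are discarded. This is a hedged sketch, not the authors' implementation. The function names and the example numbers are hypothetical, `count` is treated here as a counter running from 0 to Max_Len - 1, and the capacity check assumes the constraint is a sum over tasks, Σ_k r_ik·x_ik ≤ m_i, which the source states only in abbreviated form.

```python
def inertia_weight(count, max_len=500):
    """Inertia weight from the simulation settings: 0.9 at count=0, 0.4 at count=max_len-1."""
    return 0.9 - count * 0.5 / (max_len - 1)

def binarize(demand, mu):
    """Threshold rule in the spirit of equation (3): x_ik = 1 if the intensity exceeds mu, else 0."""
    return [[1 if v > mu else 0 for v in row] for row in demand]

def capacity_ok(x, r, m):
    """Assumed feasibility check: resource i must not hand out more than its capacity m[i]."""
    return all(
        sum(r[i][k] * x[i][k] for k in range(len(x[i]))) <= m[i]
        for i in range(len(x))
    )

# Hypothetical example with two resources (rows) and two tasks (columns).
demand = [[0.8, 0.2], [0.4, 0.9]]   # real-valued demand intensities
x = binarize(demand, mu=0.5)        # 0/1 allocation matrix
r = [[3, 2], [1, 5]]                # resource amounts r_ik
print(x)
print(capacity_ok(x, r, m=[50, 60]))
print(capacity_ok(x, r, m=[2, 60]))
print(inertia_weight(0), inertia_weight(499))
```

A candidate for which `capacity_ok` returns False would simply be dropped, which corresponds to the paper's statement that infeasible solutions are discarded rather than repaired.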
Multivariate Kernel Density Estimation with Vine Copulas (kdevine package) v0.4.4 Reference Manual
Package 'kdevine'
October 18, 2022

Type: Package
Title: Multivariate Kernel Density Estimation with Vine Copulas
Version: 0.4.4
URL: https:///tnagler/kdevine
BugReports: https:///tnagler/kdevine/issues
Description: Implements the vine copula based kernel density estimator of Nagler and Czado (2016) <doi:10.1016/j.jmva.2016.07.003>. The estimator does not suffer from the curse of dimensionality and is therefore well suited for high-dimensional applications.
License: GPL-3
Imports: graphics, stats, utils, MASS, Rcpp, qrng, KernSmooth, cctools, kdecopula (>= 0.8.1), VineCopula, doParallel, parallel, foreach
LazyData: yes
LinkingTo: Rcpp
RoxygenNote: 7.2.0
Suggests: testthat
NeedsCompilation: yes
Author: Thomas Nagler [aut, cre]
Maintainer: Thomas Nagler <****************>
Repository: CRAN
Date/Publication: 2022-10-18 12:25:15 UTC

R topics documented:

kdevine-package
contour.kdevinecop
dkde1d
dkdevine
dkdevinecop
kde1d
kdevine
kdevinecop
plot.kde1d
rkdevine
wdbc


kdevine-package    Kernel Smoothing for Bivariate Copula Densities

Description

This package implements a vine copula based kernel density estimator. The estimator does not suffer from the curse of dimensionality and is therefore well suited for high-dimensional applications (see Nagler and Czado, 2016).

Details

The multivariate kernel density estimator is implemented by the kdevine function. It combines a kernel density estimator for the margins (kde1d) and a kernel estimator of the vine copula density (kdevinecop). The package is built on top of the copula density estimators in the kdecopula::kdecopula-package and lets you choose from all its implemented methods. Optionally, the vine copula can be estimated parametrically (only the margins are nonparametric).

Author(s)

Thomas Nagler

References

Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)

Nagler, T., Schellhase, C. and
Czado, C. (2017) Nonparametric estimation of simplified vine copula models: comparison of methods. arXiv:1701.00845

Nagler, T. (2017) A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457


contour.kdevinecop    Contour plots of pair copula kernel estimates

Description

Contour plots of pair copula kernel estimates

Usage

    ## S3 method for class 'kdevinecop'
    contour(x, tree = "ALL", xylim = NULL, cex.nums = 1, ...)

Arguments

x           a kdevinecop object.
tree        "ALL" or integer vector; specifies which trees are plotted.
xylim       numeric vector of length 2; sets xlim and ylim for the contours.
cex.nums    numeric; expansion factor for font of the numbers.
...         arguments passed to contour.kdecopula.

Examples

    data(wdbc, package = "kdecopula")                       # load data
    u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")    # rank-transform
    # estimate density
    fit <- kdevinecop(u)
    # contour matrix
    contour(fit)


dkde1d    Working with a kde1d object

Description

The density, cdf, or quantile function of a kernel density estimate are evaluated at arbitrary points with dkde1d, pkde1d, and qkde1d respectively.

Usage

    dkde1d(x, obj)
    pkde1d(x, obj)
    qkde1d(x, obj)
    rkde1d(n, obj, quasi = FALSE)

Arguments

x        vector of evaluation points.
obj      a kde1d object.
n        integer; number of observations.
quasi    logical; the default (FALSE) returns pseudo-random numbers, use TRUE for quasi-random numbers (generalized Halton, see ghalton).

Value

The density or cdf estimate evaluated at x.

See Also

kde1d

Examples

    data(wdbc)                  # load data
    fit <- kde1d(wdbc[, 5])     # estimate density
    dkde1d(1000, fit)           # evaluate density estimate
    pkde1d(1000, fit)           # evaluate corresponding cdf
    qkde1d(0.5, fit)            # quantile function
    hist(rkde1d(100, fit))      # simulate


dkdevine    Evaluate the density of a kdevine object

Description

Evaluate the density of a kdevine object

Usage

    dkdevine(x, obj)

Arguments

x      (m x d) matrix of evaluation points (or vector of length d).
obj    a kdevine object.

Value

The density estimate evaluated at x.

See Also

kdevine

Examples

    # load data
    data(wdbc)
    # estimate density (use xmin to indicate positive
support)
    fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))
    # evaluate density estimate
    dkdevine(c(1000, 0.1, 0.1), fit)


dkdevinecop    Working with a kdevinecop object

Description

A vine copula density estimate (stored in a kdevinecop object) can be evaluated on arbitrary points with dkdevinecop. Furthermore, you can simulate from the estimated density with rkdevinecop.

Usage

    dkdevinecop(u, obj, stable = FALSE)
    rkdevinecop(n, obj, U = NULL, quasi = FALSE)

Arguments

u         m x 2 matrix of evaluation points.
obj       kdevinecop object.
stable    logical; option for stabilizing the estimator: the estimated pair copula density is cut off at 50.
n         integer; number of observations.
U         (optional) n x d matrix of independent uniform random variables.
quasi     logical; the default (FALSE) returns pseudo-random numbers, use TRUE for quasi-random numbers (generalized Halton, see ghalton).

Value

A numeric vector of the density/cdf or a n x 2 matrix of simulated data.

Author(s)

Thomas Nagler

References

Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)

Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59(0):52-69.

See Also

kdevinecop, dkdecop, rkdecop, ghalton

Examples

    data(wdbc, package = "kdecopula")                       # load data
    u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")    # rank-transform
    fit <- kdevinecop(u)                                    # estimate density
    dkdevinecop(c(0.1, 0.1, 0.1), fit)                      # evaluate density estimate


kde1d    Univariate kernel density estimation for bounded and unbounded support

Description

Discrete variables are convoluted with the uniform distribution (see Nagler, 2017). If a variable should be treated as discrete, declare it as ordered().

Usage

    kde1d(x, mult = 1, xmin = -Inf, xmax = Inf, bw = NULL, bw_min = 0, ...)

Arguments

x       vector of length n.
mult    numeric; the actual bandwidth used is bw * mult.
xmin    lower bound for the support of the density.
xmax    upper bound for the support of the density.
bw
bandwidth parameter; has to be a positive number or NULL; the latter calls KernSmooth::dpik().
bw_min    minimum value for the bandwidth.
...       unused.

Details

If xmin or xmax are finite, the density estimate will be 0 outside of [xmin, xmax]. Mirror-reflection is used to correct for boundary bias.

Discrete variables are convoluted with the uniform distribution (see Nagler, 2017).

Value

An object of class kde1d.

References

Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457

See Also

dkde1d, pkde1d, qkde1d, rkde1d, plot.kde1d, lines.kde1d

Examples

    data(wdbc, package = "kdecopula")    # load data
    fit <- kde1d(wdbc[, 5])              # estimate density
    dkde1d(1000, fit)                    # evaluate density estimate


kdevine    Kernel density estimator based on simplified vine copulas

Description

Implements the vine-copula based estimator of Nagler and Czado (2016). The marginal densities are estimated by kde1d, the vine copula density by kdevinecop. Discrete variables are convoluted with the uniform distribution (see Nagler, 2017). If a variable should be treated as discrete, declare it as ordered(). Factors are expanded into binary dummy codes.

Usage

    kdevine(x, mult_1d = NULL, xmin = NULL, xmax = NULL, copula.type = "kde", ...)

Arguments

x              (n x d) data matrix.
mult_1d        numeric; all bandwidths for marginal kernel density estimation are multiplied with mult_1d. Defaults to log(1 + d) where d is the number of variables after applying cctools::expand_as_numeric().
xmin           numeric vector of length d; see kde1d.
xmax           numeric vector of length d; see kde1d.
copula.type    either "kde" (default) or "parametric" for kernel or parametric estimation of the vine copula.
...            further arguments passed to kde1d or kdevinecop.

Value

An object of class kdevine.

References

Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)

Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457

See Also

dkdevine,
kde1d, kdevinecop

Examples

    # load data
    data(wdbc, package = "kdecopula")
    # estimate density (use xmin to indicate positive support)
    fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))
    # evaluate density estimate
    dkdevine(c(1000, 0.1, 0.1), fit)
    # plot simulated data
    pairs(rkdevine(nrow(wdbc), fit))


kdevinecop    Kernel estimation of vine copula densities

Description

The function estimates a vine copula density using kernel estimators for the pair copulas (based on the kdecopula package).

Usage

    kdevinecop(data, matrix = NA, method = "TLL2", renorm.iter = 3L, mult = 1,
               test.level = NA, trunc.level = NA, treecrit = "tau", cores = 1,
               info = FALSE)

Arguments

data           (n x d) matrix of copula data (have to lie in [0,1]^d).
matrix         R-Vine matrix (n x d) specifying the structure of the vine; if NA (default) the structure selection heuristic of Dissmann et al. (2013) is applied.
method         see kdecop.
renorm.iter    see kdecop.
mult           see kdecop.
test.level     significance level for independence test. If you provide a number in [0,1], an independence test (BiCopIndTest) will be performed for each pair; if the null hypothesis of independence cannot be rejected, the independence copula will be set for this pair. If test.level = NA (default), no independence test will be performed.
trunc.level    integer; the truncation level. All pair copulas in trees above the truncation level will be set to independence.
treecrit       criterion for structure selection; defaults to "tau".
cores          integer; if cores > 1, estimation will be parallelized within each tree (using foreach).
info           logical; if TRUE, additional information about the estimate will be gathered (see kdecop).

Value

An object of class kdevinecop. That is, a list containing

T1, T2, ...    lists of the estimated pair copulas in each tree,
matrix         the structure matrix of the vine,
info           additional information about the fit (if info = TRUE).

References

Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)

Nagler, T., Schellhase, C. and
Czado, C. (2017) Nonparametric estimation of simplified vine copula models: comparison of methods. arXiv:1701.00845

Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59(0):52-69.

See Also

dkdevinecop, kdecop, BiCopIndTest, foreach

Examples

    data(wdbc, package = "kdecopula")
    # rank-transform to copula data (margins are uniform)
    u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")
    fit <- kdevinecop(u)                    # estimate density
    dkdevinecop(c(0.1, 0.1, 0.1), fit)      # evaluate density estimate
    contour(fit)                            # contour matrix (Gaussian scale)
    pairs(rkdevinecop(500, fit))            # plot simulated data


plot.kde1d    Plotting kde1d objects

Description

Plotting kde1d objects

Usage

    ## S3 method for class 'kde1d'
    plot(x, ...)
    ## S3 method for class 'kde1d'
    lines(x, ...)

Arguments

x      kde1d object.
...    further arguments passed to plot.default.

See Also

kde1d, lines.kde1d

Examples

    data(wdbc)                                  # load data
    fit <- kde1d(wdbc[, 7])                     # estimate density
    plot(fit)                                   # plot density estimate
    fit2 <- kde1d(as.ordered(wdbc[, 1]))        # discrete variable
    plot(fit2, col = 2)


rkdevine    Simulate from a kdevine object

Description

Simulate from a kdevine object

Usage

    rkdevine(n, obj, quasi = FALSE)

Arguments

n        number of observations.
obj      a kdevine object.
quasi    logical; the default (FALSE) returns pseudo-random numbers, use TRUE for quasi-random numbers (generalized Halton, only works for fully nonparametric fits).

Value

An n x d matrix of simulated data from the kdevine object.

See Also

kdevine, rkdevinecop, rkde1d

Examples

    # load and plot data
    data(wdbc)
    # estimate density
    fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))
    # plot simulated data
    pairs(rkdevine(nrow(wdbc), fit))


wdbc    Wisconsin Diagnostic Breast Cancer (WDBC)

Description

The data contain measurements on cells in suspicious lumps in a woman's breast. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. All samples are classified as either benign or
malignant.

Usage

    data(wdbc)

Format

wdbc is a data.frame with 31 columns. The first column indicates whether the sample is classified as benign (B) or malignant (M). The remaining columns contain measurements for 30 features.

Details

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The references listed below contain detailed descriptions of how these features are computed.

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.

Note

This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.

Source

https:///ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

References

O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.

William H. Wolberg and O. L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.

K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).

Examples

    data(wdbc)
    str(wdbc)
Annular formation control of the leader-follower multi-robot based on fractional order
Control Theory & Applications, Vol. 38, No. 1, Jan. 2021

Annular formation control of the leader-follower multi-robot based on fractional order

WU Xi-ru†, XING Meng-yuan
(School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin Guangxi 541004, China)

Abstract: Aiming at the complex problem of annular formation control for multi-robot systems, an annular formation control method based on fractional-order multi-robot systems is proposed. The leader-follower formation method is used to control the annular formation and target encirclement of the multi-robot system, and a state estimator is designed to estimate the states of the robots. The leader obtains the information on the target state in the system; the followers detect the state of the leader and complete the encircling formation control, so that the multi-robot system tracks the dynamic target. According to Lyapunov stability theory and the Mittag-Leffler theorem, sufficient conditions for annular formation control of the multi-robot system are obtained, which achieves encirclement of the target. The effectiveness of the proposed method is verified by a target-encirclement simulation of a group of robots.

Key words: fractional order; multi-robots; formation control; annular formation; target tracking

DOI: 10.7641/CTA.2020.90969

Citation: WU Xiru, XING Mengyuan. Annular formation control of the leader-follower multi-robot based on fractional order. Control
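Theory & Applications, 2021, 38(1): 103–109

Received: 2019-11-25; accepted: 2020-08-10. †Corresponding author. E-mail: ****************; Tel.: +86 132****1790. Associate editor for this paper: HUANG Panfeng. Supported by the National Natural Science Foundation of China (61603107, 61863007) and the Innovation Project of GUET Graduate Education (C99YJM00BX13).

1 Introduction

In recent years, with the rise and development of robotics, robot technologies of all kinds have become an indispensable part of many fields and drive social development and progress. At the same time, the tasks that robots face have become more complex, and a single robot can no longer accomplish them alone, so cooperation among multiple robots to complete a common given task has become a research hotspot. Research on the control of multi-robot systems mainly focuses on consensus problems [1], multi-robot formation control [2–3] and flocking [4–5]. Among these, formation control is one of the main research directions for multi-robot systems and attracts attention from researchers worldwide. Formation control plays an important role in daily life and production, in catering services and, in particular, in military operations. Examples include autonomous navigation and formation control of underwater vehicles, strikes by military aircraft on aerial vehicles, and applications of unmanned aerial vehicles across industries [6–7]. There are currently three main formation control methods for multi-robot systems. The most widely used is the leader-follower method [8–10]; the other two are the behavior-based method and the virtual structure method [11]. The behavior-based method cannot describe the overall system accurately and efficiently and cannot guarantee the stability of the control, whereas the virtual structure method lacks flexibility. The leader-follower method offers simple mathematical analysis, easy formation keeping and a low communication load, and is widely used in multi-robot formation [12]. For example, in 2017 Hu et al. adopted a distributed event-triggered strategy and proposed a new self-triggered algorithm that achieves consensus of linear multi-robot systems [13]; Zuo et al. used a Lyapunov function to construct a global nonlinear consensus control law with variable structure and studied robust finite-time consensus of multi-robot systems [14].

Considering the memory property of fractional calculus, developing potential applications of fractional-order consensus control is of great significance. In 2016, Shi Zhong et al. designed a fractional-order PID control system for space teleoperation, which improved the tracking performance, disturbance rejection, robustness and resistance to delay jitter of the robot system [15]. In 2019, Z. Yang et al. investigated leader-following consensus of fractional-order multi-robot systems [16]. In annular formation control, however, research on multi-robot systems with fractional-order dynamics is very limited, and most work remains at the integer-order stage. Studying target-encirclement formation control with fractional order takes nonlocal distributed effects into account and better describes dynamic models with hereditary properties, so the model reflects the behaviour of the system more accurately, which benefits research on multi-robot formation control.

Target encirclement control is a branch of formation control and a key research area in multi-agent formation. With the rapid development of information technology, many researchers have studied target encirclement control of multi-robot systems. For example, in 2017 Kim and Sugie designed a distributed feedback control law based on a cyclic pursuit strategy, which guarantees that a multi-robot system moves around a target robot [17]. Building on this, Lan and Yan extended the work to the problem of agents encircling multiple target agents and divided this problem into two steps [18].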
Kowdiki K H和Barai K等人则研究了单个移动机器人对任意时变曲线的跟踪包围问题[19].Asif M考虑了机器人与目标之间的避障问题,提出了两种包围追踪控制算法;并实现了移动机器人对目标机器人的包围追踪[20].鉴于以上原因,本文采用了领航–跟随型编队控制方法来控制多机器人系统的环形编队和目标包围,通过设计状态估测器,实现对多机器人的状态估计.系统中目标状态信息只能由领航者获取,确保整个多机器人系统编队按照预期的理想编队队形进行无碰撞运动,并最终到达目标位置,对目标、领航者和跟随者的位置分析如图1(a)所示,图1(b)为编队控制后的状态.通过应用李雅普诺夫稳定性理论,得到实现多机器人系统环形编队控制的充分条件.最后通过对一组多机器人队列进行目标包围仿真,验证了该方法的有效性.(a)编队控制前(b)编队控制后图1目标、领航者和追随者的位置分析Fig.1Location analysis of targets,pilots and followers2代数图论与分数阶基础假定一个含有N个智能体的系统,通讯网络拓扑图用G={v,ε}表示,定义ε=v×v为跟随者节点之间边的集合,v={v i,i=1,2,···,N}为跟随者节点的集合.若(v i,v j)∈ε,则v i与v j为相邻节点,定义N j(t)={i|(v i,v j)∈ε,v i∈v}为相邻节点j的标签的集合.那么称第j个节点是第i 个节点的邻居节点,用N j(t)={i|(v i,v j)∈ε,v i∈v}表示第i个节点的邻居节点集合.矩阵L=D−A称为与图G对应的拉普拉斯矩阵.其中:∆是对角矩阵,对角线元素i=∑jN i a ij.若a ij=a ji,i,j∈I,则称G是无向图,否则称为有向图.如果节点v i与v j之间一组有向边(v i,v k1)(v k1,v k2)(v k2,v k3)···(v kl,v j),则称从节点v i到v j存在有向路径.定义1Riemann-Liouville(RL)分数阶微分定义:RLD atf(t)=1Γ(n−a)d nd t ntt0f(τ)(t−τ)a−n+1dτ,(1)其中:t>t0,n−1<α<n,n∈Z+,Γ(·)为伽马函数.定义2Caputo(C)分数阶微分定义:CDαtf(t)=1Γ(n−α)tt0f n(τ)(t−τ)α−n+1dτ,(2)其中:t>t0,n−1<α<n,n∈Z+,Γ(·)为伽马第1期伍锡如等:分数阶多机器人的领航–跟随型环形编队控制105函数.定义3定义具有两个参数α,β的Mittag-Leffler方程为E α,β(z )=∞∑k =1z kΓ(αk +β),(3)其中:α>0,β>0.当β=1时,其单参数形式可表示为E α,1(z )=E α(z )=∞∑k =1z kΓ(αk +1).(4)引理1[21]假定存在连续可导函数x (t )∈R n ,则12C t 0D αt x T (t )x (t )=x T (t )C t 0D αt x (t ),(5)引理2[21]假定x =0是系统C t 0D αt x (t )=f (x )的平衡点,且D ⊂R n 是一个包含原点的域,R 是一个连续可微函数,x 满足以下条件:{a 1∥x ∥a V (t ) a 2∥x ∥ab ,C t 0D αt V (t ) −a 3∥x ∥ab,(6)其中:t 0,x ∈R ,α∈(0,1),a 1,a 2,a 3,a,b 为任意正常数,那么x =0就是Mittag-Leffler 稳定.3系统环形编队控制考虑包含1个领航者和N 个跟随者的分数阶非线性多机器人系统.领航者的动力学方程为C t 0D αt x 0(t )=u 0(t ),(7)式中:0<α<1,x 0(t )∈R 2是领航者的位置状态,u 0(t )∈R 2是领航者的控制输入.跟随者的动力学模型如下:C t 0D αt x i (t )=u i (t ),i ∈I,(8)式中:0<α<1,x i (t )∈R 2是跟随者的位置状态,u i (t )∈R 2是跟随者i 在t 时刻的控制输入,I ={1,2,···,N }.3.1领航者控制器的设计对于领航者,选择如下控制器:u 0(t )=−k 1(x 0(t )−˜x 0(t ))−k 2sgn(x 0(t )−˜x 0(t )),(9)C t 0D αt x 0(t )=u 0(t )=−k 1(x 0(t )−˜x 0(t ))−k 2sgn(x 0(t )−˜x 0(t )).(10)设计一个李雅普诺夫函数:V (t )=12(x 0(t )−˜x 0(t ))T (x 0(t )−˜x 0(t )).(11)根据引理1,得到该李雅普诺夫函数的α阶导数如下:C 0D 
αt V(t )=12C 0D αt (x 0(t )−˜x 0(t ))T (x 0(t )−˜x 0(t )) (x 0(t )−˜x 0(t ))TC 0D αt (x 0(t )−˜x0(t ))=(x 0(t )−˜x 0(t ))T [C 0D αt x 0(t )−C 0D αt ˜x0(t )]=(x 0(t )−˜x 0(t ))T [−k 1(x 0(t )−˜x 0(t ))−k 2sgn(x 0(t )−˜x 0(t ))−C 0D αt ˜x0(t )]=−k 1(x 0(t )−˜x 0(t ))T (x 0(t )−˜x 0(t ))−k 2∥x 0(t )−˜x 0(t )∥−(x 0(t )−˜x 0(t ))TC 0D αt ˜x0(t )=−2k 1V (t )−k 2∥x 0(t )−˜x 0(t )∥+∥C 0D αt ˜x0(t )∥∥x 0(t )−˜x 0(t )∥=−2k 1V (t )−(k 2−∥C 0D ∝t ˜x0(t )∥)∥x 0(t )−˜x 0(t )∥ −2k 1V (t ).(12)令a 1=a 2=12,a 3=2k 1,ab =2,a >0,b >0,得到a 1∥x 0(t )−˜x 0(t )∥a V (t ) a 2∥x 0(t )−˜x 0(t )∥ab ,(13)C t 0D αt V(t ) −a 3∥x 0(t )−˜x 0(t )∥ab .(14)根据引理2,可知lim t →∞∥x 0(t )−˜x 0(t )∥=0,即x 0(t )逐渐趋近于˜x 0(t ).为了使跟随者能够跟踪观测到领航者的状态,设计了一个状态估测器.令ˆx i ∈R 2是追随者对领航者的状态估计,给出了ˆx i 的动力学方程C 0D αt ˆx i=β(∑j ∈N ia ij g ij (t )+d i g i 0(t )),(15)其中g ij =˜x j (t )−˜x i (t )∥˜x j (t )−˜x i (t )∥,˜x j (t )−˜x i (t )=0,0,˜x j (t )−˜x i (t )=0.(16)对跟随者取以下李雅普诺夫函数:V (t )=12N ∑i =1(ˆx i (t )−x 0(t ))T (ˆx i (t )−x 0(t )).(17)计算该函数的α阶导数如下:C 0D αt V(t )=12C 0D αtN ∑i =1(ˆx i (t )−x 0(t ))T (ˆx i (t )−x 0(t )) N ∑i =1(ˆx i (t )−x 0(t ))TC 0D αt (ˆx i (t )−x 0(t ))=N ∑i =1(ˆx i (t )−x 0(t ))T [C 0D αt ˆxi (t )−C 0D αt x 0(t )]=N ∑i =1(ˆx i (t )−x 0(t ))T [β(∑j ∈N ia ijˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥+d iˆx 0(t )−ˆx i (t )∥ˆx 0(t )−ˆx i (t )∥)−C 0D αt x 0(t )]=N ∑i =1(ˆx i (t )−x 0(t ))T β(∑j ∈N i a ij ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i(t )∥+106控制理论与应用第38卷d iˆx 0(t )−ˆx i (t )∥ˆx 0(t )−ˆx i (t )∥)−N ∑i =1(ˆx i (t )−x 0(t ))TC 0D αt x 0(t )=βN ∑i =1(ˆx i (t )−x 0(t ))T ∑j ∈N i a ij ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥+βN ∑i =1(ˆx i (t )−x 0(t ))Td i ˆx 0(t )−ˆx i (t )∥ˆx 0(t )−ˆx i(t )∥−N ∑i =1(ˆx i (t )−x 0(t ))TC 0D αt x 0(t ).(18)在上式中,令C 0D αt V (t )=N 1+N 2以方便后续计算,其中:N 1=βN ∑i =1(ˆx i (t )−x 0(t ))T ∑j ∈N i a ij ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥+βN ∑i =1(ˆx i (t )−x 0(t ))Td i ˆx 0(t )−ˆx i (t )∥ˆx 0(t )−ˆx i (t )∥=β2[N ∑i =1N ∑j =1a ij (ˆx i (t )−x 0(t ))T ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥+N ∑j 
=1N ∑i =1a ij (ˆx j (t )−x 0(t ))Tˆx i (t )−ˆx j (t )∥ˆx i (t )−ˆx j (t )∥]−βN ∑i =1d i∥ˆx 0(t )−ˆx i (t )∥2∥ˆx 0(t )−ˆx i (t )∥=β2N ∑i =1N ∑j =1a ij [(ˆx i (t )−x 0(t ))Tˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥−(ˆx j (t )−x 0(t ))T ˆx i (t )−ˆx j (t )∥ˆx i (t )−ˆx j (t )∥]−βN ∑i =1d i∥ˆx 0(t )−ˆx i (t )∥2∥ˆx 0(t )−ˆx i (t )∥=β2N ∑i =1N ∑j =1a ij [ˆx T i(t )ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥−x T 0(t )ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥−ˆx T j(t )ˆx i (t )−ˆx j (t )∥ˆx i (t )−ˆx j (t )∥+x T0(t )ˆx i (t )−ˆx j (t )∥ˆx i (t )−ˆx j (t )∥]−βN ∑i =1d i ∥ˆx 0(t )−ˆx i (t )∥=β2N ∑i =1N ∑j =1a ij [ˆx T i (t )ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥−ˆx T j (t )ˆx i (t )−ˆx j (t )∥ˆx i (t )−ˆx j (t )∥]−βN ∑i =1d i ∥ˆx 0(t )−ˆx i (t )∥2∥ˆx 0(t )−ˆx i (t )∥=β2N ∑i =1N ∑j =1a ij (ˆx T i(t )−ˆx Tj (t ))ˆx j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥−βN ∑i =1d i ∥ˆx 0(t )−ˆx i (t )∥2∥ˆx 0(t )−ˆx i (t )∥=−β(12N ∑i =1N ∑j =1a ij (ˆx T j (t )−ˆx T i (t ))׈x j (t )−ˆx i (t )∥ˆx j (t )−ˆx i (t )∥+N ∑i =1d i ∥ˆx 0(t )−ˆx i (t )∥2∥ˆx 0(t )−ˆx i (t )∥),(19)N 2=−N ∑i =1(ˆx i (t )−x 0(t ))TC 0D αt x 0(t )=N ∑i =1∥ˆx i (t )−x 0(t )∥∥C 0D αt x 0(t )∥×cos {ˆx i (t )−x 0(t ),−C 0D αt x 0(t )}.(20)由于∥C 0D αt x 0(t )∥k 1∥x 0(t )−˜x 0(t )∥+k 2∥sgn(x 0(t )−˜x 0(t ))∥ k 1∥x 0(t )−˜x 0(t )∥+k 2.(21)根据定义3,当lim t →∞∥x 0(t )−˜x 0(t )∥=0时,存在T >0(T 为实数),使得在t >T 时∥x 0(t )−˜x 0(t )∥ ε成立,那么对于t >T ,有0<∥C 0D αt x 0(t )∥ k 1ε+k 2=M 2,可得−N ∑i =1(ˆx i (t )−x 0(t ))TC 0D αt x 0(t )N ∑i =1∥ˆx i (t )−x 0(t )∥M 2M 2N max {∥ˆx i (t )−x 0(t )∥},(22)C 0D αt V(t ) −(β−M 2N )max i ∈I{∥ˆx i (t )−x 0(t )∥}−2β1λmin V (t ).(23)根据引理2,得lim t →∞∥ˆx i (t )−x 0(t )∥=0.(24)由上式可知,ˆx i (t )在对目标的追踪过程中逐渐趋近于x 0(t ).3.2跟随者控制器的设计在本文中,整个多机器人系统中领导者能够直接获得目标的位置信息,将这些信息传递给追随者,因此需要为每个追随者设计观测器来估计目标的状态.令ϕi (t )∈R 2由跟随者对目标i 的状态估计,给出ϕi (t )的动力学方程C 0D αt ϕi(t )=α(∑j ∈N ia ij f ij (t )+d i f i 0(t )),(25)其中f ij =ϕj (t )−ϕi (t )∥ϕj (t )−ϕi (t )∥,ϕj (t )−ϕi (t )=0,0,ϕj (t )−ϕi (t )=0.(26)取如下李雅普诺夫函数:V (t )=12N ∑i =1(ϕi (t )−r (t ))T (ϕi (t )−r (t 
)).(27)计算α阶导数如下:C 0D αt V(t )=第1期伍锡如等:分数阶多机器人的领航–跟随型环形编队控制10712N ∑i =1(ϕi (t )−r (t ))T (ϕi (t )−r (t )) N ∑i =1(ϕi (t )−r (t ))TC 0D αt (ϕi (t )−r (t ))=N ∑i =1(ϕi (t )−r (t ))T [C 0D αt ϕi (t )−C 0D αt r (t )]=N ∑i =1(φi (t )−r (t ))T [α(∑j ∈N ia ij f ij (t )+d i f i 0(t ))]−C 0D αt r (t )=N ∑i =1(ϕi (t )−r (t ))T α(∑j ∈N ia ij ϕj (t )−ϕi (t )∥ϕj (t )−ϕi (t )∥+d i ϕ(t )−ϕi (t )∥ϕ(t )−ϕi (t )∥)=βN ∑i =1(ϕi (t )−r (t ))T ∑j ∈N i a ijϕj (t )−ϕi (t )∥ϕj (t )−ϕi(t )∥+βN ∑i =1(ϕi (t )−r (t ))T d i ϕ(t )−ϕi (t )∥ϕ(t )−ϕi(t )∥−N ∑i =1(ϕi (t )−r (t ))TC 0D αt r (t ),(28)可得lim t →∞∥x i (t )−˜x i (t )∥=0.(29)由上式可知,x i (t )在对目标的追踪过程中逐渐趋近于˜x i (t ).4仿真结果与分析本节通过仿真结果来验证本文所提出的方法.图2为通信图,其中:V ={1,2,3,4}表示跟随者集合,0代表领导者.以5个机器人组成的队列为例进行验证,根据领航者对目标的跟随轨迹,分别进行了仿真.图2通信图Fig.2Communication diagrams假设系统中目标机器人的动态为C 0D αt r (t )=[cos t sin t ]T ,令初始值r 1(0)=r 2(0)=1,α=0.98,k 1=1,k 2=4,可知定理3中的条件是满足的.根据式(24)和式(29),随着时间趋于无穷,领航者及其跟随者的状态估计误差趋于0,这意味着领航者的状态可以由跟随者渐近精确地计算出来.令k 2>M 1,M 1=M +M ′>0,则lim t →∞∥x 0(t )−˜x 0(t )∥=0,x 0渐近收敛于领航者的真实状态.此时取时滞参数µ=0.05,实验结果见图3,由1个领航者及4个跟随者组成的多机器人系统在进行目标围堵时,最终形成了以目标机器人为中心的包围控制(见图3(b)).(a)领航者和跟随者的初始位置分析(b)编队形成后多机器人的位置关系图3目标、领航者和追随者的位置分析Fig.3Location analysis of target pilots and followers综合图4–5曲线,跟随者对领航者进行渐进跟踪,领航者同目标机器人的相对位置不变,表明该领航跟随型多机器人系统最终能与目标机器人保持期望的距离,并且不再变化.图4领航者及其跟随者的状态估计误差Fig.4The state estimation error of the leader and followers108控制理论与应用第38卷图5编队形成时领航者与目标的相对位置关系Fig.5The relative position relationship between leader andtarget仿真结果表明,多个机器人在对目标物进行包围编队时,领航者会逐渐形成以目标物运动轨迹为参照的运动路线,而跟随者则渐近的完成对领航者的跟踪(如图6所示),跟随者在对领航者进行跟踪时,会出现一定频率的抖振,但这些并不会影响该多机器人系统的目标包围编队控制.5总结本文提出了多机器人的领航–跟随型编队控制方法,选定了一台机器人作为领航者负责整个编队的路径规划任务,其余机器人作为跟随者.跟随机器人负责实时跟踪领航者,并尽可能与领航机器人之间保持队形所需的距离和角度,确保整个多机器人系统编队按照预期的理想编队队形进行无碰撞运动,并最终到达目标位置.通过建立李雅普诺夫函数和米塔格稳定性理论,得到了实现多机器人系统环形编队的充分条件,并通过对一组多机器人队列的目标包围仿真,验证了该方法的有效性.图6领航者与跟随者对目标的状态估计Fig.6State estimation of target by pilot and follower参考文献:[1]JIANG Yutao,LIU Zhongxin,CHEN Zengqiang.Distributed finite-time consensus algorithm for multiple nonholonomic mobile robots 
with disturbances.Control Theory &Applications ,2019,36(5):737–745.(姜玉涛,刘忠信,陈增强.带扰动的多非完整移动机器人分布式有限时间一致性控制.控制理论与应用,2019,36(5):737–745.)[2]ZHOU Chuan,HONG Xiaomin,HE Junda.Formation control ofmulti-agent systems with time-varying topology based on event-triggered mechanism.Control and Decision ,2017,32(6):1103–1108.(周川,洪小敏,何俊达.基于事件触发的时变拓扑多智能体系统编队控制.控制与决策,2017,32(6):1103–1108.)[3]ZHANG Ruilei,LI Sheng,CHEN Qingwei,et al.Formation controlfor multi-robot system in complex terrain.Control Theory &Appli-cations ,2014,31(4):531–537.(张瑞雷,李胜,陈庆伟,等.复杂地形环境下多机器人编队控制方法.控制理论与应用,2014,31(4):531–537.)[4]WU Jin,ZHANG Guoliang,ZENG Jing.Discrete-time modeling formultirobot formation and stability of formation control algorithm.Control Theory &Applications ,2014,31(3):293–301.(吴晋,张国良,曾静.多机器人编队离散模型及队形控制稳定性分析.控制理论与应用,2014,31(3):293–301.)[5]WANG Shuailei,ZHANG Jinchun,CAO Biao.Target tracking al-gorithm with double-type agents based on flocking control.Control Engineering of China ,2019,26(5):935–940.(王帅磊,张金春,曹彪.双类型多智能体蜂拥控制目标跟踪算法.控制工程,2019,26(5):935–940.)[6]SHAO Zhuang,ZHU Xiaoping,ZHOU Zhou,et al.Distributed for-mation keeping control of UA Vs in 3–D dynamic environment.Con-trol and Decision ,2016,31(6):1065–1072.(邵壮,祝小平,周洲,等.三维动态环境下多无人机编队分布式保持控制.控制与决策,2016,31(6):1065–1072.)[7]PANG Shikun,WANG Jian,YI Hong.Formation control of multipleautonomous underwater vehicles based on sensor measuring system.Journal of Shanghai Jiao Tong University ,2019,53(5):549–555.(庞师坤,王健,易宏.基于传感探测系统的多自治水下机器人编队协调控制.上海交通大学学报,2019,53(5):549–555.)[8]WANG H,GUO D,LIANG X.Adaptive vision-based leader-followerformation control of mobile robots.IEEE Transactions on Industrial Electronics ,2017,64(4):2893–2902.[9]LI R,ZHANG L,HAN L.Multiple vehicle formation control basedon robust adaptive control algorithm.IEEE Intelligent Transportation Systems Magazine ,2017,9(2):41–51.[10]XING C,ZHAOXIA P,GUO G W.Distributed fixed-time formationtracking of multi-robot systems with nonholonomic constraints.Neu-rocomputing 
,2018,313(3):167–174.[11]LOPEZ-GONZALEA A,FERREIRA E D,HERNANDEZ-MAR-TINEZ E G.Multi-robot formation control using distance and ori-entation.Advanced Robotics ,2016,30(14):901–913.[12]DIMAROGONAS D,FRAZZOLI E,JOHNSSON K H.Distributedevent-triggered control for multi-agent systems.IEEE Transactions on Automatic Control ,2019,57(5):1291–1297.[13]HU W,LIU L,FENG G.Consensus of linear multi-agent systems bydistributed event-triggered strategy.IEEE Transactions on Cybernet-ics ,2017,46(1):148–157.第1期伍锡如等:分数阶多机器人的领航–跟随型环形编队控制109[14]ZUO Z,LIN T.Distributed robustfinite-time nonlinear consensusprotocols for multi-agent systems.International Journal of Systems Science,2016,47(6):1366–1375.[15]SHI Zhong,HUANG Xuexiang,TAN Qian.Fractional-order PIDcontrol for teleoperation of a free-flying space robot.Control The-ory&Applications,2016,33(6):800–808.(时中,黄学祥,谭谦.自由飞行空间机器人的遥操作分数阶PID控制.控制理论与应用,2016,33(6):800–808.)[16]YANG Z C,ZHENG S Q,LIU F.Adaptive output feedback con-trol for fractional-order multi-agent systems.ISA Transactions,2020, 96(1):195–209.[17]LIU Z X,CHEN Z Q,YUAN Z Z.Event-triggered average-consensusof multi-agent systems with weighted and directed topology.Journal of Systems Science and Complexity,2016,25(5):845–855.[18]AI X L,YU J Q.Flatness-basedfinite-time leader-follower formationcontrol of multiple quad rotors with external disturbances.Aerospace Science and Technology,2019,92(9):20–33.[19]KOWDIKI K H,BARAI K,BHATTACHARYA S.Leader-followerformation control using artificial potential functions:A kinematic ap-proach.IEEE International Conference on Advances in Engineering.Tamil Nadu,India:IEEE,2012:500–505.[20]ASIF M.Integral terminal sliding mode formation control of non-holonomic robots using leader follower approach.Robotica,2017, 1(7):1–15.[21]CHEN W,DAI H,SONG Y,et al.Convex Lyapunov functions forstability analysis of fractional order systems.IET Control Theory& 
Applications,2017,11(7):1070–1074.作者简介:伍锡如博士,教授,硕士生导师,目前研究方向为机器人控制、神经网络、深度学习等,E-mail:***************.cn;邢梦媛硕士研究生,目前研究方向为多机器人编队控制,E-mail: ****************.。
Optimizing Multi-Agent Cooperation Using Coevolution Biased Toward Maximum Reward
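GAO Jian; ZHANG Wei
[Journal] Computer Engineering and Applications
[Year (Volume), Issue] 2006 (042) 016
[Abstract] Evolutionary computation is a useful technique for learning in multi-agent systems. In some areas of multi-agent research, a common approach is the cooperative coevolution of multi-agent cooperation. Studies have pointed out that, in some domains, coevolutionary systems tend toward stability rather than performance (that is, they converge to locally optimal solutions). This conflicts with the goal of multi-agent systems research, which is to maximize payoff. To address this, the paper proposes a coevolutionary algorithm, based on a chaos mechanism, that is biased toward maximum reward, improving on the work of Wiegand et al. Theoretical analysis and simulation experiments show that this chaos-based bias drives coevolution to converge to better global stable points, and thus helps coevolutionary algorithms find better (even optimal) solutions in some cooperative multi-agent domains.
[Pages] 4 (P38-40, 120)
[Authors] GAO Jian; ZHANG Wei
[Affiliation] School of Computer Science, Yantai University, Yantai, Shandong 264005 (both authors)
[Language] Chinese
[CLC number] TP18
[Related literature]
1. A model for selecting VGAgent team members from a collaborative perspective [J], JIANG Xun; GU Xiaolin; DING Yi; HU Min
2. A bisection particle swarm coevolutionary optimization algorithm [J], YAO Xiangguang; ZHOU Yongquan; LI Yongmei
3. Research on multi-agent cooperation systems based on synergetics [J], JIANG Guorui; YANG Xiaoyan; ZHAO Shuliang
4. A large-scale global coevolutionary optimization algorithm based on formal concept analysis [J], MA Lianbo; CHANG Fengrong; ZHANG Huanxi; WANG Xingwei; HUANG Min; HAO Fei
5. "Cooperative members as model workers" series, report 7. Liu Hui: the greatest goal is to give back to society [J], LUO Qing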
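Multimodal process fault detection based on local relative probability density kNN

Journal of Chemical Engineering of Chinese Universities, February 2019

GUO Jinyu, LIU Yuchao, LI Yuan
(College of Information Engineering, Shenyang University of Chemical Technology)

Abstract: To address the spatial distribution of multimodal process data, in which variances differ markedly across modes, a multimodal process fault detection method based on local relative probability density k-nearest neighbors (LRPD-kNN) is proposed. First, the training data are standardized and the local relative probability density estimates of the training data are computed, which removes the variance differences of the multimodal data. Then a kNN model is built on the preprocessed data, and the statistic and the control limit are computed. For test data, the sum of squared Euclidean distances to the local relative probability densities of the training data is computed, and multimodal faults are detected by comparing the statistic with the control limit. The method is applied to a numerical example and to a semiconductor manufacturing process. Simulation results show that the proposed algorithm outperforms PCA, kNN and the local outlier factor (LOF) method, which indicates that it is highly accurate for fault detection in multimodal processes with large variance differences.
Key words: multimodal process; fault detection; local relative probability density; kNN
CLC number: TP277
Document code: A
DOI: 10.3969/j.issn.1003-9015.2019.01.021

1 Introduction
In modern industrial production, different production strategies cause a process to have several operating modes. Because each mode has a different degree of dispersion and the data are complex, process monitoring is receiving increasing attention [1-3]. Data-driven fault detection in particular has attracted wide attention in academia at home and abroad; multivariate statistical methods, represented by principal component analysis (PCA), have developed rapidly and given rise to many new fault detection methods [4-5]. PCA [6] is a dimensionality-reduction method, but for multimodal processes its basic assumption that the data follow a single distribution does not hold, so it cannot give satisfactory monitoring performance [7-10].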
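The kNN detection step that the abstract builds on can be sketched as follows, in plain Python. The sketch covers only the baseline kNN statistic (the sum of squared distances to the k nearest training samples) and an empirical control limit; the local relative probability density preprocessing is not reproduced, and the function names, the quantile-based limit and the toy data are assumptions made for illustration.

```python
import math

def knn_statistic(sample, training, k, skip_self=False):
    """Sum of squared Euclidean distances from `sample` to its k nearest
    training samples. With skip_self=True the closest match (the sample
    itself, when it belongs to the training set) is ignored."""
    squared = sorted(
        sum((a - b) ** 2 for a, b in zip(sample, other)) for other in training
    )
    if skip_self:
        squared = squared[1:]
    return sum(squared[:k])

def control_limit(training, k, quantile=0.99):
    """Empirical quantile of the training statistics (an assumed choice)."""
    stats = sorted(knn_statistic(x, training, k, skip_self=True) for x in training)
    index = min(len(stats) - 1, math.ceil(quantile * len(stats)) - 1)
    return stats[index]

def is_faulty(sample, training, k, limit):
    """A sample is flagged when its statistic exceeds the control limit."""
    return knn_statistic(sample, training, k) > limit

# Toy training data with two operating modes (illustrative values only).
TRAIN = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
K = 2
LIMIT = control_limit(TRAIN, K)

print(is_faulty((0.05, 0.05), TRAIN, K, LIMIT))   # sample inside the first mode
print(is_faulty((2.5, 2.5), TRAIN, K, LIMIT))     # sample between the two modes
```

Both toy modes have the same spread here, so the plain statistic is enough; when the modes have very different variances, a single limit of this kind becomes loose for the tight mode, which is the situation the abstract targets.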
SYSTEM AND METHOD FOR MULTI-AGENT REINFORCEMENT LE
Patent title: SYSTEM AND METHOD FOR MULTI-AGENT REINFORCEMENT LEARNING IN A MULTI-AGENT ENVIRONMENT
Inventors: David Francis Isele, Kikuo Fujimura, Anahita Mohseni-Kabir
Application number: US16390224
Filing date: 2019-04-22
Publication number: US20200090074A1
Publication date: 2020-03-19
Patent content provided by Intellectual Property Publishing House.
Abstract: A system and method for multi-agent reinforcement learning in a multi-agent environment that include receiving data associated with the multi-agent environment in which an ego agent and a target agent are traveling and learning a single agent policy that is based on the data associated with the multi-agent environment and that accounts for operation of at least one of: the ego agent and the target agent individually. The system and method also include learning a multi-agent policy that accounts for operation of the ego agent and the target agent with respect to one another within the multi-agent environment. The system and method further include controlling at least one of: the ego agent and the target agent to operate within the multi-agent environment based on the multi-agent policy.
Applicant: Honda Motor Co., Ltd.
Address: Tokyo JP
Nationality: JP
Research on Capability Selection and Compensation Methods for Heterogeneous Agents in Runtime Collaboration
Research on Capability Selection and Compensation Methods for Heterogeneous Agents in Runtime Collaboration
LI Shuang; LIU Wei; WU Kun; WANG Jing
[Journal] Journal of Yantai University (Natural Science and Engineering Edition)
[Year (Volume), Issue] 2017 (030) 002
[Abstract] Self-adaptive and multi-agent cooperative systems need dynamic decision-making and planning at runtime, because agents execute and perform differently in different environments and states. This paper introduces a capability model for self-adaptive agents and, on this basis, presents a planning-based capability selection and compensation method. First, a capability meta-model is defined to describe the design of business and information systems, including goals, collaboration, capabilities and context. Then, a capability selection and compensation method is proposed to establish collaboration among agent capabilities, which addresses the coordination problem between agent capabilities and tasks. Finally, the method is validated by using AGVs to simulate a medical waste transport system.
[Pages] 6 (P167-172)
[Authors] LI Shuang; LIU Wei; WU Kun; WANG Jing
[Affiliation] Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, Hubei 430205 (all authors)
[Language] Chinese
[CLC number] TP181
[Related literature]
1. Research progress on heterogeneous agent collaboration [J], WANG Jing; LIU Wei; WU Kun; LI Shuang
2. A multi-agent intelligent collaboration model for the system integration domain [J], LI Qingshan; WANG Meisheng; ZHAO Chenguang; WANG Yingqiang
3. Runtime techniques on inexact heterogeneous multi-cores for low energy consumption [J], FANG Shuangde; DU Zidong; FANG Yuntan; HUANG Yuanjie; LI Huawei; CHEN Yunji; WU Chengyong
4. Research on a service-oriented multi-agent collaboration model [J], XU Zhe; SHEN Jiquan
5. Design of a service-oriented distributed agent CGF collaborative integration system [J], HUAN Jing; ZHOU Weizhu
Dynamic Process Monitoring Based on Adaptive Local Outlier Probability (in English)
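Dynamic Process Monitoring Based on Adaptive Local Outlier Probability (in English)
MA Yuxin; SHI Hongbo; WANG Mengling
[Journal] Chinese Journal of Chemical Engineering (English edition)
[Year (Volume), Issue] 2014 (22) 7
[Abstract] Complex industrial processes often have multiple operating modes and present time-varying behavior. The data in one mode may follow specific Gaussian or non-Gaussian distributions. In this paper, a numerically efficient moving window local outlier probability algorithm is proposed. Its key feature is the capability to handle complex data distributions and incursive operating condition changes, including slow dynamic variations and instant mode shifts. First, a two-step adaption approach is introduced and some designed updating rules are applied to keep the monitoring model up-to-date. Then, a semi-supervised monitoring strategy is developed with an updating switch rule to deal with mode changes. Based on local probability models, the algorithm has a superior ability in detecting faulty conditions and in adapting quickly to slow variations and new operating modes. Finally, the utility of the proposed method is demonstrated with a numerical example and a non-isothermal continuous stirred tank reactor.
[Pages] 8 (P820-827)
[Keywords] outlier probability; process monitoring; adaptive; continuous stirred tank reactor; non-Gaussian distribution; probabilistic algorithm; monitoring model; industrial process
[Authors] MA Yuxin; SHI Hongbo; WANG Mengling
[Affiliation] Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology
[Language] Chinese
[CLC number] TP277; P315.72
[Related literature]
1. Online fault detection based on the dynamic multiway local outlier factor [J], LI Yuan; MA Yuhan; GUO Jinyu
2. Fault detection for complex chemical processes based on a Mahalanobis-distance local outlier factor method [J], MA Hehe; HU Yi; SHI Hongbo
3. Fault detection for complex processes based on time-space nearest-neighbor standardization and the local outlier factor [J], FENG Liwei; LI Yuan; ZHANG Cheng; XIE Yanhong
4. Evaluation of a new adaptive contrast enhancement algorithm based on local standard deviation (in English) [J], ZHANG Feng; JIANG Yifeng; CHEN Zhencheng; JIANG Dazong
5. An adaptive soft-sensor modeling method for multi-output processes based on local PLS (in English) [J], SHAO Weiming; TIAN Xuemin; WANG Ping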
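A Model Framework for Organizational Knowledge Capture Based on Multi-Agent
Authors: WANG Jun [1]; FAN Zhiping [2]
Affiliations: School of Economics and Management, Beihang University, Beijing 100083 [1]; School of Business Administration, Northeastern University, Shenyang, Liaoning 110004 [2]
Journal: Chinese Journal of Management Science
Pages: 41-45
Subject terms: knowledge management; knowledge capture; multi-agent; organizational knowledge; knowledge aggregation
Abstract: Based on an analysis of the process of organizational knowledge capture and of the characteristics of multi-agent technology, a multi-agent-based model framework for organizational knowledge capture is proposed, together with the key techniques for knowledge capture in the model and a knowledge aggregation method. With the proposed framework, organizational knowledge can easily be captured from individual knowledge, which helps improve the effectiveness of knowledge management. Finally, an example illustrates the application of the framework.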
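Subtask Scheduling in Multi-Agent Collaborative Design Using a Genetic Algorithm
WANG Jingzhuo; HU Xiaobing
[Journal] Journal of Huaihai Institute of Technology
[Year (Volume), Issue] 2000 (009) 001
[Abstract] An objective model for task scheduling in multi-agent collaborative design is proposed, and the data structures of system resources and agent resources are described. Taking the scheduling of fine-grained subtasks as an example, task scheduling is implemented with a genetic algorithm, which improves the efficiency of multi-agent collaboration.
[Pages] 6 (P18-23)
[Authors] WANG Jingzhuo; HU Xiaobing
[Affiliation] Department of Electronic Engineering, Huaihai Institute of Technology (both authors)
[Language] Chinese
[CLC number] O224
[Related literature]
1. Application of a multi-agent hybrid genetic algorithm to lithology parameter inversion [J], WU Shangwei; LIU Mingyang; WU Dunshi; SHEN Mingcheng
2. Application of multi-agent technology to shop-floor scheduling [J], WANG Xuehui; LI Shijie; ZHANG Yuzhi
3. Research on virtual environment awareness in multi-agent-based collaborative design [J], NI Ning; LU Gang; BU Jiajun
4. Cash flow optimization in project scheduling using a genetic algorithm [J], XU Baiqun; ZHANG Jun; CHEN Weineng
5. Implementation of multi-agent dynamic task scheduling in a distributed measurement and control system [J], YAN Junhua; ZHANG Huanchun; JING Yazhi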
Multi-Agent Coordination Based on Tokens: Reduction of the Bullwhip Effect in a Forest Supply Chain∗

Thierry Moyaux, Brahim Chaib-draa
Université Laval - DAMAS, Pavillon Pouliot, Département d'Informatique et de Génie Logiciel, Sainte-Foy G1K 7P4 (Québec, Canada)
{moyaux,chaib}@iad.ift.ulaval.ca

Sophie D'Amours
Université Laval - FOR@C, Pavillon Pouliot, Département de Génie Mécanique, Sainte-Foy G1K 7P4 (Québec, Canada)
sophie.damours@gmc.ulaval.ca

ABSTRACT
In this paper, we focus on the supply chain as a multi-agent system and we propose a new coordination technique to reduce the fluctuations of orders placed by each company to its suppliers in such a supply chain. This problem of amplification of the demand variability is called the bullwhip effect. To reduce the bullwhip effect, we propose a technique based on tokens to achieve a decentralized coordination. More precisely, classical orders carry the demand itself, whereas tokens handle the effects that variations of this demand have on company inventory. Finally, the proposed approach is validated with the Wood Supply Game, which is a supply chain model used to make players aware of the bullwhip effect. We verify experimentally that our coordination technique leads to less variable orders (i.e. the standard deviation of orders is reduced) while inventory levels are not excessively high but are sufficient to avoid backorders.

Categories and Subject Descriptors
I.2.11 [Distributed Artificial Intelligence]: multi-agent systems

General Terms
Management, performance, experimentation

Keywords
Multi-agent systems, supply chain, decentralized coordination, tokens, bullwhip effect

1. INTRODUCTION
The performance of many systems is reduced by fluctuations in their internal streams. As a distributed system, a multi-agent sys-

2. THE PROBLEM
In this section, we first describe the fluctuation problem from a general point of view, then we focus on a concrete example of this problem in the supply chain field. The stream fluctuations in supply chains (a supply chain is the set of companies
producing and distributing products to consumers) affect the orders placed by companies to their supplier(s); this is known as the bullwhip effect.

2.1 Fluctuations in distributed systems
The fluctuation problem in a distributed system can be described as follows: (i) a distributed system (multi-agent or other) is built to achieve a function; (ii) many streams travel through the distributed system while it achieves its function; (iii) these streams may fluctuate; (iv) these fluctuations may hinder the distributed system in achieving its function.
Stream fluctuations may arise in many distributed systems: (i) on the road network, vehicle density may fluctuate, generating traffic jams and empty roads instead of steady traffic on every road; (ii) on a computer network, data density between computers may fluctuate, generating congestion; (iii) in electronics, electric current may fluctuate between components and damage them; (iv) in the macro-economy, the economy of a country alternates between recession and growth; etc. In the following two subsections, we focus on two examples: multi-agent systems and supply chains.
2.2 Fluctuations in multi-agent systems
As multi-agent systems are particular distributed systems, fluctuations may occur in them. Parunak [22] notes that, in principle, systems of autonomous agents can become computationally unstable. We think this instability may appear in the following way: to carry out its function, a multi-agent system is crossed by two streams, an action stream resulting from the other agents and a percept stream resulting from the environment and from the other agents; if one stream is disturbed (delay, error...), the other is disturbed as well. In particular, instability may occur due to the following causes: uncertainty in evaluating the environment, delay in information transmission, multithreading management if several agents run on the same processor, etc. In some cases, in particular if the multi-agent system represents a supply chain, the global behavior of the multi-agent system can be perturbed by these fluctuations.

2.3 Fluctuations in supply chains: the bullwhip effect
We now illustrate this problem using the case of unexpected demand fluctuations in a supply chain, which is known as the "amplification of the demand variability", the "bullwhip effect" or "Forrester's effect". The entities of such a supply chain are companies, and the fluctuating streams in this system are the orders placed by each company to its suppliers. Figure 1 shows how the bullwhip effect propagates in a simple supply chain with only three companies: a retailer, a wholesaler and a manufacturer. The retailer sells to the customer and buys from the wholesaler, the wholesaler sells to the retailer and buys from the manufacturer, and the manufacturer sells to the wholesaler and buys from an unknown supplier. The ordering patterns of the three companies share a common, recurring theme: the variabilities of an upstream site are always greater than those of the downstream site [14]. As a variability, the bullwhip effect is measured by the standard deviation σ of orders (note that the means µ of orders are all equal in our
example). Demand fluctuations cost money because of higher inventory levels and reduced supply chain agility (agility is the ability of an organisation to thrive in a constantly changing, unpredictable business environment [12], cited in [24]). In fact, such fluctuations of the demand lead every participant in the supply chain to stockpile because of the high degree of demand uncertainty and variability [15].

Figure 1: The bullwhip effect [15, 16]. (Three panels show orders over time: from the customer to the retailer, from the retailer to the wholesaler, and from the wholesaler to the manufacturer, each with its mean µ and standard deviation σ; order variability is amplified upstream.)

The demand fluctuation in a supply chain was first described by Forrester [10] in 1958, which explains why this phenomenon is sometimes called Forrester's effect. Many years later, Lee and his colleagues [14, 15] gave a more complete understanding of this effect and called it the bullwhip effect. In particular, they proposed four main causes (demand forecast updating, order batching, price fluctuation, and rationing and shortage gaming). Other causes have also been identified [10, 27, 28]. A formal model of this problem was proposed in [4], but very few people [13, 29] have studied it using multi-agent techniques. Simchi-Levi and his colleagues [4, 26] note that one of the most frequent suggestions for reducing the bullwhip effect is to centralize demand information within a supply chain, that is, to provide each participant in the supply chain with complete information on the actual customer demand (this was formally proven for two forecasting techniques). Such centralization of information allows every company in the supply chain to create more accurate forecasts, rather than relying on the orders of the downstream company, which can vary significantly more than the actual customer demand. In our validation, we try to compare our coordination mechanism with this centralization.
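To make this measurement concrete, the following minimal Python sketch propagates a step in customer demand through the retailer, the wholesaler and the manufacturer, and reports the standard deviation σ of the orders at each level. The ordering rule (pass on the incoming order plus `gain` times its latest change), the `gain` parameter, the function names and the demand series are illustrative assumptions; they are not the model used in this paper or in the Wood Supply Game.

```python
from statistics import pstdev

def place_orders(incoming, gain):
    """Orders a company sends upstream, given the orders it receives.

    Hypothetical rule: pass on the incoming order and over-react to its
    latest change by `gain` times that change (never ordering below zero).
    """
    orders, previous = [], incoming[0]
    for demand in incoming:
        orders.append(max(0, demand + gain * (demand - previous)))
        previous = demand
    return orders

def order_variability(customer_demand, tiers, gain):
    """Standard deviation of the orders seen at each level of the chain."""
    sigma = {"customer": pstdev(customer_demand)}
    stream = customer_demand
    for tier in tiers:
        stream = place_orders(stream, gain)
        sigma[tier] = pstdev(stream)
    return sigma

TIERS = ["retailer", "wholesaler", "manufacturer"]
DEMAND = [4] * 4 + [8] * 8          # illustrative step in customer demand

amplified = order_variability(DEMAND, TIERS, gain=1)
steady = order_variability(DEMAND, TIERS, gain=0)

for level in ["customer"] + TIERS:
    print(f"{level:12s} sigma = {amplified[level]:.2f}")
```

With `gain=1` the standard deviation grows at every step upstream, which is the amplification described above; with `gain=0` every company simply forwards the orders it receives and σ stays the same along the chain.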
Finally, reducing the bullwhip effect appears to be a problem of coordination rather than of optimization, constraint satisfaction or any other type of problem, because each company has to order in a manner that is coherent with the behaviour of the other companies. The goal is to synchronize every company's activities in order to avoid products being stored in inventory.

3. DECENTRALIZED COORDINATION BASED ON TOKENS
Our coordination technique aims at improving system efficiency by reducing stream fluctuations between entities. We first look at two decentralized coordination mechanisms based on tokens that are used in the production management field. We then present the technique we propose, in general terms, for any multi-agent system. Finally, we apply this general idea to reduce the bullwhip effect, that is, the amplification of demand variability in a supply chain.

3.1 Tokens as a coordination mechanism
In the field of Industrial Management, several approaches have been proposed, and are used in some companies, to coordinate production entities in manufacturing systems in a decentralized way. These approaches use tokens to coordinate entities. Many mechanisms have been proposed. At the company level, the PAC (Production Authorization Cards) system [3], Kanban, Extended Kanban and Generalized Kanban [17] are used to control the production of one company. At the supply chain level, Responsibility Tokens [23] are the operationalization of Lee and Whang's [16] decentralized supply chain management scheme. As examples, we first present the PAC system and then Responsibility Tokens.

Figure 2: Tokens as a coordination mechanism in manufacturing systems [3]. (The figure distinguishes information flows from material flows.)
The PAC system is a decentralized approach to the coordination and control of material and information flow in multiple-cell manufacturing systems. This approach generalizes other approaches such as MRP (Material Requirements Planning), Kanban (the Japanese card system) and OPT (Optimized Production Technology), among others. Figure 2 shows a production cell (circle at the centre), two stores (dashed boxes at the left and at the right) and the minimal components of the PAC system. Different types of tokens go through cells and inventories:
• requisition tags are sent by cell j to store j−1 to ask store j−1 to ship an item to cell j immediately; if the store is empty, requisition tags wait in a queue at the store until a unit of product is available (so this queue is filled with backorders).
• order tags are sent by cell j to store j−1 to inform store j−1 that the cell will demand a product in the future: for each order tag, there will be a requisition tag. By propagating through the production system, these tokens allow long-term scheduling.
• process tags are sent by cell j to store j when cell j ships an item to store j. When an order tag arrives at a store, it is matched with a process tag, and the match generates a PA card.
• PA (Production Authorization) cards are sent by store j to cell j to allow this cell to process a part. Moreover, when cell j receives a PA card, it sends order and requisition tags to store j−1.
This is a brief description of material and information flow control. The complete PAC system has more components: each type of product has its own set of tags, tags can have priorities, and tags can be added to take into account stream convergences (e.g. for cells having many entry flows and only one exit stream), stream divergences, order cancellations, the treatment of defective products...
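The tag flow at a single store can be sketched as follows. This is a minimal Python illustration of the matching rules listed above, under stated assumptions: the class `Store`, its method names and the initial allocation of one process tag per unit in stock are hypothetical, and priorities, cancellations and defective products are left out.

```python
from collections import deque

class Store:
    """One PAC store j, fed by cell j and serving cell j+1 (illustrative)."""

    def __init__(self, items=0):
        self.items = items            # units currently in the store
        self.process_tags = items     # assumed: one process tag per unit received
        self.order_tags = 0           # order tags waiting for a process tag
        self.backorders = deque()     # requisition tags waiting for a unit
        self.pa_cards = 0             # PA cards issued to cell j so far
        self.shipped = []             # requisitions served, in order

    def _match_tags(self):
        # An order tag matched with a process tag generates one PA card.
        while self.order_tags and self.process_tags:
            self.order_tags -= 1
            self.process_tags -= 1
            self.pa_cards += 1

    def _serve_requisitions(self):
        # Ship immediately while units are available, otherwise keep waiting.
        while self.backorders and self.items:
            self.items -= 1
            self.shipped.append(self.backorders.popleft())

    def receive_order_tag(self):
        self.order_tags += 1
        self._match_tags()

    def receive_requisition_tag(self, requester):
        self.backorders.append(requester)
        self._serve_requisitions()

    def receive_item(self):
        # Cell j ships a unit to the store together with a process tag.
        self.items += 1
        self.process_tags += 1
        self._match_tags()
        self._serve_requisitions()


store = Store(items=1)
store.receive_order_tag()                 # matched at once: first PA card
store.receive_order_tag()                 # no process tag left: waits
store.receive_requisition_tag("req-1")    # served from stock
store.receive_requisition_tag("req-2")    # store empty: backordered
store.receive_item()                      # new unit: second PA card, req-2 served
print(store.pa_cards, store.shipped, len(store.backorders))
```

Keeping order tags and requisition tags in separate queues mirrors the text: order tags only trigger production authorizations, while requisition tags are the ones that move material and become backorders when the store is empty.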
Responsibility Tokens were proposed by Porteus [23] to further operationalize the decentralized supply chain management scheme of Lee and Whang [16], which is itself an operationalization of the decentralized management scheme made implicit by Clark and Scarf [5]. It is a simpler mechanism than the PAC system and is designed to coordinate several companies, whereas the PAC system is designed to coordinate manufacturing workstations within the same company. Responsibility Tokens are used as a mechanism for administering the transfer payments required to implement upstream responsibility. The idea is to base reimbursement on the actual consequences of processing/delivering/shipping less than what was requested, rather than predicting the consequences in advance. The system works as follows: whenever an upstream company cannot meet the entire order placed by its customer company, it substitutes responsibility tokens for the missing units. Customer companies treat these tokens as physical units, and the financial consequences of their not being real units are assigned to the issuing player. Thus, financial penalties give companies an incentive to deliver to their downstream companies as completely as possible.

[Figure 3: Addition of tokens to the 1-1 ordering rule. Panels: a) 1-1 ordering rule; b) 1-1 ordering rule + tokens.]

[Figure 4: Information streams cut into two parts. Legend: X: classical information stream; Y_i: tokens sent by entities i, i−1, etc.; the total information stream is X plus the tokens added to orders. Labels: environment, agent 1, agent 2.]

3.2 General coordination principle

We were inspired by Porteus' Responsibility Tokens to design our coordination mechanism. Our mechanism assumes the bullwhip effect happens in the following way: when customer demand increases, inventory decreases because of the ordering lead time. In fact, if the supplier's inventory is sufficient, it takes only the ordering lead time to increase the product flow up to the new demand. Because inventory decreases, the company has to overorder to avoid 
stockouts: if it does not tell its supplier why it overorders, this supplier faces the same situation but with a bigger demand; thus the supplier's inventory decreases more, and so on. Figure 3a illustrates this fact: the company places orders strictly equal to incoming orders (1-1 ordering rule), which avoids the bullwhip effect but generates stockouts. As shown in Figures 3b and 4, the principle used to align agents' behaviour is to cut information streams into two parts: the first part X (the classical order stream without fluctuations) is used to transmit the agents' real needs, while the second part Y_i (i.e. the tokens) is used to manage the consequences of changes in agent i's environment. When the environment changes, the first part X follows this change. No agent may change X: we assume in this paper that every agent plays the game and that every agent trusts its upstream agent when it says the environment wants X. Each agent i manages a change in X as well as it is able to, and sends Y_i tokens to ask for more resources from the rest of the system. In fact, when agent i only transmits X, it will get in the near future enough resources to fulfil what the upstream agent i−1 needs, and thus what the environment needs; but, as we have just said, because of the existence of delays, agent i has to consume its reserve. Tokens Y_i are thus sent to reconstitute this reserve. Therefore, when agent i sends its tokens Y_i, it must transmit the tokens Y_{i−1} from upstream agents down to the end of the information stream and add to Y_{i−1} its own tokens to ask for resources to reconstitute its own reserve.

3.3 Example of token-based coordination in a supply chain

Each company in the supply chain can be controlled by a software agent; therefore the supply chain can be viewed as a multi-agent system. Improving coordination in the multi-agent system will reduce the bullwhip effect while improving supply chain efficiency. We apply to the supply chain the general idea of our coordination technique as stated previously. Here, for agent 
i, upstream agent i−1 is company i's customer and the environment is the market. Information streams are composed of the orders placed by each company to its suppliers. We now cut these streams into two parts (Figure 4): the first part X represents the actual quantity desired by the company to satisfy its orders (we call this part the order because it is the classic flow) and the second part Y_i allows company i to over- or underorder when there is a change in the incoming order X (we call this part tokens). The information given to company i by company i−1 is the doublet (X, Y_{i−1}). If the incoming order X increases, each company's inventory decreases until the ordered products arrive: so tokens Y_i manage the consequences of delays in physical and information streams. The idea is that orders X exactly follow market demand, which leads to the same fluctuations in orders as in the market; the bullwhip effect is thus eliminated (as with the 1-1 ordering rule) because order fluctuations are the same as the market's. But this can lead to huge inventories or backorders (i.e. negative inventories), so we introduce tokens Y_i to manage a unique over- or underorder for each change in the market in order to stabilize each company's inventory. X thus indicates the market need and is the same for every company in the supply chain. Y_i indicates a variation in X and allows each company to maintain an efficient inventory (i.e. not too big, not too small). When customer demand increases (or decreases) because of a market demand increase (or decrease), the job of tokens is to travel up to the forest to trigger only one big (or small) quantity to go down the supply chain and adjust each company's inventory to its normal level. This is the theory; in practice, to react more quickly to the increase in customer demand, we do not wait for tokens to reach the forest before triggering bigger shipments: each time a company receives a token, it processes it as an order; if it does not have enough inventory, the remaining tokens are memorized as backordered tokens. Coordination 
with tokens and information centralization (i.e. each company knows the actual market demand in real time) both allow every company to know the actual market demand. In fact, when tokens are used, order X is equal to market demand. The first difference between these two systems lies in the propagation speed of market demand: tokens are as slow as orders, while information centralization supposes each company knows the market demand in real time (retailers broadcast the market consumption to the whole supply chain). The second difference between tokens and information centralization concerns customer demand management. With information centralization, when the retailer overorders to reconstitute its inventory, its wholesaler has to overorder even more in order to reconstitute its own inventory, and so on. In this situation, all companies place orders greater than what their customer asks for, because it is their only way to refill their inventory. They do so even though they know what the customer demand is and that this causes their suppliers to be in backorder. On the contrary, when the retailer sends tokens to its wholesaler, it indicates that it needs more products to reconstitute its inventory: it continues to place orders equal to its incoming orders, to avoid perturbations in the rest of the chain (in particular, it does not put its suppliers into a backorder situation), and waits for the incoming product stream to bring the wave of products ordered with tokens in order to adjust its inventory level.

[Figure 5: Model of forest supply chain used in experiments. Nodes: forest, sawmill, pulp mill, lumber and paper wholesalers, lumber and paper retailers, lumber and paper customers (labelled Player 1 to Player 8). Legend: products stream, orders stream, 1 week order delay, 1 week shipping delay.]

4. EXPERIMENTAL VALIDATION

We have validated our coordination mechanism on a model of a forest supply game which simulates the group of companies participating in the production and distribution of paper 
and lumber. This model is first described and then the experiments are detailed.

4.1 The wood supply game

Two games, called "Wood Supply Games" [9, 11], were developed based on the structure and dynamics of the Beer Game. The Beer and Wood Supply Games are exercises that simulate the material and information flows in a production-distribution system, and were designed to make players aware of the bullwhip effect. Compared to the classical Beer Game, which has been used to study supply chain dynamics, the Wood Supply Games introduce divergent product flows to increase their relevance to the North European forest sector. Our team has adapted this game for the Québec forest sector; we use this version, which is displayed in Figure 5. There are eight players in this figure, but both customer players are only there for convenience when drawing Figures 12, 13 and 14. The main difference between the original and our Québec Wood Supply Game is in the lengths of the lumber and paper chains, which are either the same (Fjeld's game) or different (our game).

Figure 5 shows how six players (human or software agents) play the game. The game is played by turns: each turn represents a week in reality and is played in five steps; these steps are played in parallel by each player. In the first step, players receive their inventory (these products were sent two weeks earlier by their supplier, because there is a two-week shipping delay) and advance the shipping delays between suppliers and their customers. Then, in the second step, players look at their incoming orders and try to fill them. If they have backorders, they try to fill those as well. If they do not have enough inventory, they ship as much as they can and add the rest to their backorders. In the third step, players record their inventory or backorders. In the fourth step, players advance the order slips. In the last step, players place an order with their supplier(s) and record this order. To decide what order to place, players compare their incoming orders with their inventory/backorder 
level (in our experiments, they only evaluate what is written in Figure 7). The correct decision that would reduce the bullwhip effect has to be taken here. Finally, a new week begins with a new step 1, and so on. Each position is played in the same way, except the sawmill: this position receives two orders (one from the lumber wholesaler, another from the pulp mill) that have to be aggregated when placing an order with the forest. The sawmill can evaluate its order by basing it on the lumber demand or on the paper demand: in the following experiments, the sawmill places an order equal to the maximum of these two possible orders. Moreover, the sawmill receives one type of product and each unit of this product generates two units: a lumber unit and a paper unit. That is, each incoming unit is cut in two: one piece goes to the sawmill's lumber inventory, the other goes to its paper inventory.

Figure 6: Experiments classification.

                                      Without tokens    With tokens
Without information centralization    Experiment A      Experiment B
With information centralization       Experiment C      Experiment D

Figure 7: Experimented ordering patterns.

Exp   Order placed X                               Tokens sent Y_i
A     incoming order minus inventory variation     none
B     incoming order                               incoming tokens plus 2 times order variation
C     customer demand minus inventory variation    none
D     customer demand                              incoming tokens plus 2 times customer demand variation

When we add our coordination technique to this game, players have to manage tokens. If players receive tokens with incoming orders in the second step, they should transmit them minus the products shipped in the second step. As the incoming orders change (they always change at the same time as tokens arrive), players add the new tokens to the transmitted tokens; in our experiments, we have empirically chosen the quantity of added tokens to be equal to two times the incoming order variation (cf. experiments B and D in Figure 7). Generally, this quantity of added tokens depends on incoming order variation, on transportation delays and on ordering 
delays.

4.2 Experiments

We measure order variability by computing the standard deviation σ of orders placed by each company in the supply chain. This measure is made in the four experiments described in Figures 6 and 7: each either uses or does not use information centralization (each player knows the customer demand in real time) and our coordination mechanism. We now present the results of each experiment and conclude with some comparisons.

4.2.1 Experiment A

Experiment A uses neither information centralization nor any coordination mechanism. It is the most basic experiment, with which the other three experiments are to be compared. Players order from their upstream player what their downstream player ordered minus their stock variation. This order pattern is designed to keep a positive inventory: when incoming orders are greater than stock variation, the quantity ordered will keep a steady inventory, but when incoming orders are not greater than stock variation, nothing is ordered (there are no negative orders, i.e. order cancellations), leading to an increase in inventory. Figure 8 exhibits the quantity ordered each week by each player. The first curve gathers lumber and paper customer demands, the second one represents orders placed by lumber and paper retailers, the third one shows both wholesalers' orders, the fourth one shows only pulp mill orders and the last curve is for the sawmill.

[Figure 8: Experiment A (without information centralization, without tokens).]

[Figure 9: Experiment B (without information centralization, with tokens).]

We can see a very great amplification in order variability between the first two curves (lumber and paper customers) and the last curve (sawmill); that is, the bullwhip effect is huge. The third curve gathers lumber and paper wholesalers' orders: they do not order the same quantity all the time. This is because the supply chain is a system where the lumber chain is shorter than the paper one; therefore players order in a different manner even if they 
have the same ordering pattern.

4.2.2 Experiment B

Experiment B uses our coordination mechanism but not information centralization: only the retailer knows the customer demand. That is, we only added our coordination mechanism to experiment A. The order placed by players is equal to their incoming order (we do not subtract inventory variation as in experiment A, because this variation was there to keep a steady or positive inventory, and this is now the job of tokens). Tokens are added to this order; their quantity is the sum of the incoming tokens plus two times the incoming order variation: the incoming tokens represent the transferred tokens, while two times the incoming order variation is an estimate of the quantity needed to refill inventory. Figure 9 exhibits the ordering pattern for each player. In this figure, tokens are added to orders, which leads to a peak in the ordering pattern when a company becomes aware of changes in market demand. Moreover, we can see that this peak does not happen at the same time for every company: there is a two-week shift between two successive supply chain levels (e.g. between retailers and wholesalers) due to order delays. As the sawmill belongs to both the lumber and paper supply chains and these two chains have different lengths, the sawmill has two peaks. Figure 9 can be compared with Figure 8; we can see that fluctuations in orders are not as great in experiment B as in experiment A, so tokens lower the bullwhip effect.

4.2.3 Experiment C

In comparison with experiment A, experiment C adds information centralization (but not tokens as in experiment B). Experiments C and D use this centralization, which allows players to base their order on customer demand instead of on incoming orders. This explains the difference in the ordering formula with respect to experiments A and B: we replace "incoming order" with "customer demand". So, players now order customer demand minus their own inventory variation. In experiment B, players are also able to know customer demand (this demand travels 
upstream in the order, without being affected by anything else such as the downstream player's inventory variation), but this signal is very slow (as slow as orders: two weeks are needed to go from players to their suppliers). Compared with experiment A, we now assume that customer demand is instantaneously known by all players. When comparing Figures 8 and 10, we have a confirmation that information centralization reduces the bullwhip effect [4, 26].

4.2.4 Experiment D

Experiment D uses information centralization and tokens (cf. Figure 11). Players now order customer demand and send tokens. The quantity of these tokens is the number of incoming tokens plus two times the customer demand variation. If we compare Figures 11 and 9, where we have tokens in both experiments, we can see the advantage of information centralization: (i) all peaks occur early and (ii) peaks are equal. We can note there are several peaks for each company: the first peak corresponds to tokens sent by the company and the next ones are the tokens transmitted from clients.

[Figure 12: Standard deviation of orders. x-axis: level in the supply chain (customer -> sawmill), Player 1 to Player 8; y-axis: orders standard deviation; series: Experiments A, B, C and D.]

[Figure 13: Total inventories for each player. x-axis: level in the supply chain (customer -> sawmill), Player 1 to Player 8; y-axis: total inventory; series: Experiments A, B, C and D.]

4.2.5 Experiment comparison

Figure 12 shows the standard deviation of orders for each company (Figure 5 gives the company name for each player) and for each experiment over 50 weeks. The lower this standard deviation is, the lower the bullwhip effect: in the best case (experiment D), the standard deviation for each company is at most 2.53 (sawmill), while the customer's is 1.09. It does not seem possible to reduce the companies' standard 
deviation to 1.09, because companies have to overorder to reconstitute their inventory due to delay effects. If companies do not do so, it would mean that inventories are too low or even negative, i.e. companies have backorders, which thus reduces customer service. Only experiments that use tokens (B and D) finish with stable order patterns, while customer demand is always stable, except in week 5 where there is a unique change. In all experiments, no company finishes the game with backorders.

[Figure 14: Total backorders for each player. x-axis: level in the supply chain (customer -> sawmill), Player 1 to Player 8; y-axis: total backorder; series: Experiments A, B, C and D.]

In experiments B and D, some companies finish with very few products in inventory: as they have a stable order pattern, this can be seen as an excellent result. On the other hand, we have chosen the factor two in the tokens created in order to reconstitute original inventories: we think this heuristic factor corresponds to the ordering delay, because the longer the ordering delay is, the more inventory decreases and so the more tokens have to be sent. But this factor depends on the shape of the supply chain too (e.g. in experiment D, the lumber wholesaler's inventory stabilizes at 16 products in inventory per week, while the paper wholesaler's inventory stabilizes at 28). This is a consequence of the divergent flow located at the sawmill and of the different sub-supply chain lengths. Next, tokens lead to more stable order patterns (i.e. less bullwhip effect) than information centralization: the curves of experiments A and C are above those of experiments B and D. But this does not mean that our coordination mechanism is always better than information centralization. In fact, when we look at Figures 13 and 14, we see that the total inventory and backorder curves cross: this means some companies prefer to manage the supply chain as in experiment D, while some others prefer to 
manage it as in experiment B or C (never A) if their goal is to minimize only their inventory. However, experiment D has the best results because it has the lowest bullwhip effect and the lowest backorders (i.e. the best customer service) for each company. Moreover, inventories in experiment B are lower than in D for every company, but this inventory may be adjusted by changing the heuristic factor of sent tokens. In fact, when too many tokens are sent, inventories stabilize at too high a level: as we calculate total inventories over 50 weeks, a small difference in the inventory stabilization level leads to a huge difference in Figure 13 (and also in Figure 14).

5. CONCLUSION

In this paper, we have investigated coordination techniques that are able to reduce stream fluctuations in a distributed system. We have proposed a new coordination technique to manage this fluctuation in a decentralized way and applied it to the case of a supply chain. In fact, a supply chain is a distributed system composed of many companies, where the fluctuation problem is the amplification of demand variability called the bullwhip effect. Our tech-