A Multiagent Variant of Dyna-Q


Job-Shop Scheduling Based on an Improved Multi-Agent Particle Swarm Algorithm

Abstract: To solve the dynamic and complex calculation problems in the production scheduling procedure, a fitness function of the particle swarm algorithm and an operation strategy of the shop-scheduling agent are constructed by means of the particle swarm algorithm and the cooperation characteristics of multiple agents. Moreover, an improved algorithm based on the particle swarm algorithm is established, and an optimal shop-scheduling procedure is obtained by the multi-agent based particle swarm algorithm in this paper. Finally, the simulation system of production dynamic …
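The abstract does not spell out the fitness function, so the following is only an illustrative sketch of one common choice for shop-scheduling PSO: particles hold continuous job-priority values, the smallest-position-value rule decodes them into a job permutation, and the fitness is that permutation's flow-shop makespan. The function names and the flow-shop simplification are assumptions, not the paper's formulation.

```python
def decode_particle(position):
    """Smallest-position-value rule: jobs sorted by their continuous particle values."""
    return sorted(range(len(position)), key=lambda j: position[j])

def flowshop_makespan(perm, proc_times):
    """Makespan of a permutation flow shop; proc_times[j][m] = time of job j on machine m."""
    n_machines = len(proc_times[0])
    finish = [0.0] * n_machines                     # completion time of the last job on each machine
    for j in perm:
        for m in range(n_machines):
            prev_machine = finish[m - 1] if m > 0 else 0.0
            finish[m] = max(finish[m], prev_machine) + proc_times[j][m]
    return finish[-1]

def fitness(position, proc_times):
    """Fitness of a particle = makespan of the schedule it encodes (lower is better)."""
    return flowshop_makespan(decode_particle(position), proc_times)
```

For example, `fitness([0.3, 1.2, 0.7], [[3, 2], [2, 4], [4, 1]])` evaluates the job order 0, 2, 1 on two machines and returns 13.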

Research on a Collaborative Aircraft Fault Diagnosis Model Based on Multi-Agent Theory


2017, Vol. 24, No. 7. Research on a Collaborative Aircraft Fault Diagnosis Model Based on Multi-Agent Theory. Lu Jianghua¹, Xu Guiqiang² (1. School of Aviation Engineering, Chengdu Aeronautic Polytechnic, Chengdu, Sichuan 610100; 2. Technical Engineering Office, Chengdu Airlines Co., Ltd., Chengdu, Sichuan 610200). Abstract: With the rapid development of China's civil aviation industry, ensuring flight safety has become an increasingly important problem [1].

The key to solving this problem is to analyze and diagnose faults promptly and accurately.

Based on the practical requirements of remote aircraft fault diagnosis and the problems in the current role-based collaborative diagnosis models, this paper applies Multi-Agent theory to an exploratory study of collaborative remote fault diagnosis for civil aircraft.

Keywords: aircraft fault diagnosis; Multi-Agent system; collaboration mechanism; UML collaboration diagram. doi: 10.3969/j.issn.1006-8554.2017.07.002
1 A Passive Collaboration Mechanism Based on Multi-Agent
To address the problems of the role-based collaborative aircraft fault diagnosis model, the Multi-Agent idea is introduced: the functions and structures of the entities that take part in diagnosis are defined and encapsulated as diagnosis Agents, and the collaboration mechanism among Agents and the interaction between Agents and the collaboration environment are studied in depth [2-3].

1.1 Functions of the diagnosis Agent. In a Multi-Agent collaborative diagnosis environment, every entity participating in diagnosis can be abstracted as a diagnosis Agent.

According to the practical requirements of aircraft fault diagnosis, the functions of a diagnosis Agent are shown in Figure 1.

[Figure 1. Functions of the diagnosis Agent: real-time monitoring, knowledge acquisition, fault diagnosis, data maintenance, and collaborative diagnosis (fault submission, process monitoring, decision submission, decision evaluation).]
1) Real-time monitoring of aircraft operating-state data: users can observe the aircraft's operating-state data in real time, and when abnormal data appear, the diagnosis Agent's real-time warning mechanism issues an alert to the user.

2) Acquisition of feature information from aircraft fault data: the diagnosis Agent provides data-preprocessing functions for aircraft fault data; through a series of module operations, the key feature information in the fault data is finally obtained.

This is the necessary preparation for the subsequent analysis and diagnosis of the fault information.
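As a concrete illustration of the two functions just described (real-time monitoring with an alert on abnormal readings, and feature extraction as preparation for diagnosis), here is a minimal Python skeleton of a diagnosis Agent. The class name, threshold rule, and feature set are assumptions for illustration only, not the model in the paper.

```python
from typing import Callable, Dict, List

class DiagnosisAgent:
    """Minimal skeleton of one diagnosis Agent (illustrative, not the paper's design)."""

    def __init__(self, alert: Callable[[str], None], threshold: float = 3.0):
        self.alert = alert            # callback used by the real-time warning mechanism
        self.threshold = threshold
        self.history: List[float] = []

    def monitor(self, value: float) -> None:
        """Real-time monitoring: warn the user when a reading deviates from recent data."""
        baseline = self.history[-10:]
        if baseline:
            mean = sum(baseline) / len(baseline)
            if abs(value - mean) > self.threshold:
                self.alert(f"abnormal reading: {value:.2f}")
        self.history.append(value)

    def extract_features(self, raw: List[float]) -> Dict[str, float]:
        """Knowledge acquisition: preprocess fault data into a few key features."""
        return {"mean": sum(raw) / len(raw), "max": max(raw), "min": min(raw)}
```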

A Multi-Agent Based Formation Air-Defense Method


(1. Simulation Training Center, Dalian Warship Academy of the PLA Navy, Dalian, China; 2. Technology Department, Satellite Maritime Tracking and Controlling Department, Jiangyin, China)
Abstract: According to the features of the air-defense warfare of a surface formation, a fire-power distribution optimization solution based on a Multi-Agent System (MAS) is constructed. Air-defense warfare is one of the most important forms of modern naval battle; a fire-power distribution strategy based on a Multi-Agent System is studied to solve the dynamic fire-power … optimization purposes in the end. Compared with the traditional genetic algorithm with a static mathematical model, the MAS …

A Network Intrusion Detection System Based on Random Forest and Weighted K-Means Clustering


Ren Xiaofang; Zhao Dequn; Qin Jianyong
Abstract: At present, many misuse detection systems cannot detect unknown attacks, while anomaly detection systems can detect unknown attacks accurately but suffer from high false alarm rates. To address this, a hybrid intrusion detection system based on random forest and weighted K-means clustering is proposed. First, the random forest algorithm builds an intrusion model from the training set to form the misuse detection module, which detects known attacks by matching the features of network connections. Then, a weighted K-means algorithm builds the anomaly detection module: using the features obtained by the random forest, the network connections of uncertain attacks are clustered, so that unknown attacks can be detected. Experiments on the KDD'99 dataset show that the system achieves a high detection rate and a low false alarm rate.
Journal: Microcomputer Applications. Year (Volume), Issue: 2016, 32(7). Pages: 4 (pp. 21-24). Keywords: intrusion detection system; random forest; weighted K-means clustering; misuse detection; anomaly detection. Authors: Ren Xiaofang; Zhao Dequn; Qin Jianyong. Affiliation: Department of Computer Engineering, Xinjiang Institute of Technology, Urumqi 830023. Language: Chinese. CLC number: TP393.
Although current computer networks employ multiple security policies, such as access control, encryption, and firewalls, network security vulnerabilities continue to grow.

Therefore, intelligent intrusion detection systems (IDS) are urgently needed to automatically detect new types of intrusions [1].

At present, there are two main intrusion detection approaches: misuse detection and anomaly detection [2].

Misuse detection can only detect known intrusion patterns in the data, cannot detect newly emerging intrusions, and requires real-time database updates [3].

Anomaly detection systems can detect new intrusions but usually suffer from high false alarm rates [4,5].

To overcome these shortcomings of misuse detection and anomaly detection, hybrid intrusion detection techniques are needed.

To this end, reference [6] proposed an intrusion detection system based on an incremental neural network, which solves the problems caused by offline training of neural networks, performs incremental learning on attack types that newly appear during online detection, and dynamically extends the intrusion detection model.
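To make the two-stage design from the abstract above concrete, here is a minimal sketch using scikit-learn: a random forest plays the misuse-detection role, and connections it classifies with low confidence are passed to a feature-weighted K-means for anomaly detection. The variable names, the confidence threshold, and the idea of reusing the forest's feature importances as weights are illustrative assumptions, not the exact procedure of the cited system.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

def hybrid_ids(X_train, y_train, X_new, conf_threshold=0.8, n_clusters=5):
    """Two-stage IDS sketch: random-forest misuse detection + weighted K-means anomaly detection."""
    # Stage 1: misuse detection -- learn known attack patterns from labelled connections.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    proba = rf.predict_proba(X_new)
    labels = rf.classes_[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)

    # Connections classified with low confidence are treated as "uncertain" attacks.
    uncertain = confidence < conf_threshold

    # Stage 2: anomaly detection -- cluster uncertain connections in a weighted feature space,
    # using the forest's feature importances as per-feature weights (an assumption).
    clusters = np.full(len(X_new), -1)
    if uncertain.any():
        Xw = X_new[uncertain] * rf.feature_importances_
        km = KMeans(n_clusters=min(n_clusters, int(uncertain.sum())), n_init=10, random_state=0)
        clusters[uncertain] = km.fit_predict(Xw)
    return labels, uncertain, clusters
```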

Multiagent Systems: A Survey from a Machine Learning Perspective


Multiagent Systems:A Survey from a Machine LearningPerspectivePeter Stone Manuela VelosoAT&T Labs—Research Computer Science Department180Park Ave.,room A273Carnegie Mellon UniversityFlorham Park,NJ07932Pittsburgh,PA15213pstone@ veloso@/˜pstone /˜mmvIn Autonomous Robotics volume8,number3.July,2000.AbstractDistributed Artificial Intelligence(DAI)has existed as a subfield of AI for less than two decades.DAI is concerned with systems that consist of multiple independent entities that interact in a domain.Traditionally,DAI has been divided into two sub-disciplines:Distributed Problem Solving(DPS)focuseson the information management aspects of systems with several components working together towardsa common goal;Multiagent Systems(MAS)deals with behavior management in collections of severalindependent entities,or agents.This survey of MAS is intended to serve as an introduction to thefieldand as an organizational framework.A series of general multiagent scenarios are presented.For eachscenario,the issues that arise are described along with a sampling of the techniques that exist to deal withthem.The presented techniques are not exhaustive,but they highlight how multiagent systems can beand have been used to build complex systems.When options exist,the techniques presented are biasedtowards machine learning approaches.Additional opportunities for applying machine learning to MASare highlighted and robotic soccer is presented as an appropriate test bed for MAS.This survey does notfocus exclusively on robotic systems.However,we believe that much of the prior research in non-roboticMAS is relevant to robotic MAS,and we explicitly discuss several robotic MAS,including all of thosepresented in this issue.1IntroductionExtending the realm of the social world to include autonomous computer systems has always been an awe-some,if not frightening,prospect.However it is now becoming both possible and necessary through ad-vances in thefield of Artificial Intelligence(AI).In the past several years,AI techniques have become more and more robust and complex.To mention just one of the many exciting successes,a car steered itself more than95%of the way across the United States using the ALVINN system[Pormerleau,1993].By meeting this and other such daunting challenges,AI researchers have earned the right to start examining the im-plications of multiple autonomous“agents”interacting in the real world.In fact,they have rendered thisexamination indispensable.If there is one self-steering car,there will surely be more.And although each may be able to drive individually,if several autonomous vehicles meet on the highway,we must know how their behaviors interact.Multiagent Systems(MAS)is the subfield of AI that aims to provide both principles for construction of complex systems involving multiple agents and mechanisms for coordination of independent agents’behaviors.While there is no generally accepted definition of“agent”in AI[Russell and Norvig,1995],for the purposes of this article,we consider an agent to be an entity,such as a robot,with goals,actions,and domain knowledge,situated in an environment.The way it acts is called its“behavior.”(This is not intended as a general theory of agency.)Although the ability to consider coordinating behaviors of autonomous agents is a new one,thefield is advancing quickly by building upon pre-existing work in thefield of Distributed Artificial Intelligence(DAI).DAI has existed as a subfield of AI for less than two decades.Traditionally,DAI is broken into two 
sub-disciplines:Distributed Problem Solving(DPS)and MAS[Bond and Gasser,1988].The main topics considered in DPS are information management issues such as task decomposition and solution synthesis. For example,a constraint satisfaction problem can often be decomposed into several not entirely independent subproblems that can be solved on different processors.Then these solutions can be synthesized into a solution of the original problem.MAS allows the subproblems of a constraint satisfaction problem to be subcontracted to different prob-lem solving agents with their own interests and goals.Furthermore,domains with multiple agents of any type,including autonomous vehicles and even some human agents,are beginning to be studied.This survey of MAS is intended as an introduction to thefield.The reader should come away with an appreciation for the types of systems that are possible to build using MAS as well as a conceptual framework with which to organize the different types of possible systems.The article is organized as a series of general multiagent scenarios.For each scenario,the issues that arise are described along with a sampling of the techniques that exist to deal with them.The techniques presented are not exhaustive,but they highlight how multiagent systems can be and have been used to build complex systems.Because of the inherent complexity of MAS,there is much interest in using machine learning techniques to help deal with this complexity[Weißand Sen,1996;Sen,1996].When several different systems exist that could illustrate the same or similar MAS techniques,the systems presented here are biased towards those that use machine learning(ML)approaches.Furthermore,every effort is made to highlight additional opportunities for applying ML to MAS.This survey does not focus exclusively on robotic systems.However, we believe that much of the prior research in non-robotic MAS is relevant to robotic MAS,and we explicitly discuss several robotic MAS(referred to as multi-robot systems),including all of those presented in this issue.Although there are many possible ways to divide MAS,the survey is organized along two main di-mensions:agent heterogeneity and amount of communication among agents.Beginning with the simplest multiagent scenario,homogeneous non-communicating agents,the full range of possible multiagent sys-tems,through highly heterogeneous communicating agents,is considered.For each multiagent scenario presented,a single example domain is presented in an appropriate instan-tiation for the purpose of illustration.In this extensively-studied domain,the Predator/Prey or“Pursuit”domain[Benda et al.,1986],many MAS issues arise.Nevertheless,it is a“toy”domain.At the end of the article,a much more complex domain—robotic soccer—is presented in order to illustrate the full power of MAS.The article is organized as follows.Section2introduces thefield of MAS,listing several of its strong points and presenting a taxonomy.The body of the article,Sections3–7,presents the various multiagent scenarios,illustrates them using the pursuit domain,and describes existing work in thefield.A domain that facilitates the study of most multiagent issues,robotic soccer,is advocated as a test bed in Section8. 
Section9concludes.2Multiagent SystemsTwo obvious questions about any type of technology are:What advantages does it offer over the alternatives?In what circumstances is it useful?It would be foolish to claim that MAS should be used when designing all complex systems.Like any useful approach,there are some situations for which it is particularly appropriate,and others for which it is not. The goal of this section is to underscore the need for and usefulness of MAS while giving characteristics of typical domains that can benefit from it.For a more extensive discussion,see[Bond and Gasser,1988].Some domains require MAS.In particular,if there are different people or organizations with different (possibly conflicting)goals and proprietary information,then a multiagent system is needed to handle their interactions.Even if each organization wants to model its internal affairs with a single system,the organi-zations will not give authority to any single person to build a system that represents them all:the different organizations will need their own systems that reflect their capabilities and priorities.For example,consider a manufacturing scenario in which company X produces tires,but subcontracts the production of lug-nuts to company Y.In order to build a single system to automate(certain aspects of)the production process,the internals of both companies X and Y must be modeled.However,neither company is likely to want to relinquish information and/or control to a system designer representing the other company.Perhaps with just two companies involved,an agreement could be reached,but with several companies involved,MAS is necessary.The only feasible solution is to allow the various companies to create their own agents that accurately represent their goals and interests.They must then be combined into a multiagent system with the aid of some of the techniques described in this article.Another example of a domain that requires MAS is hospital scheduling as presented in[Decker,1996c]. 
This domain from an actual case study requires different agents to represent the interests of different people within the hospital.Hospital employees have different interests,from nurses who may want to minimize the patient’s time in the hospital,to x-ray operators who may want to maximize the throughput on their ma-chines.Since different people evaluate candidate schedules with different criteria,they must be represented by separate agents if their interests are to be justly considered.Even in domains that could conceivably use systems that are not distributed,there are several possible reasons to use MAS.Having multiple agents could speed up a system’s operation by providing a method for parallel computation.For instance,a domain that is easily broken into components—several independent tasks that can be handled by separate agents—could benefit from MAS.Furthermore,the parallelism of MAS can help deal with limitations imposed by time-bounded or space-bounded reasoning requirements.While parallelism is achieved by assigning different tasks or abilities to different agents,robustness is a benefit of multiagent systems that have redundant agents.If control and responsibilities are sufficiently shared among different agents,the system can tolerate failures by one or more of the agents.Domains that must degrade gracefully are in particular need of this feature of MAS:if a single entity—processor or agent—controls everything,then the entire system could crash if there is a single failure.Although a multiagent system need not be implemented on multiple processors,to provide full robustness against failure,its agents should be distributed across several machines.Another benefit of multiagent systems is their scalability.Since they are inherently modular,it should be easier to add new agents to a multiagent system than it is to add new capabilities to a monolithic system. Systems whose capabilities and parameters are likely to need to change over time or across agents can also benefit from this advantage of MAS.From a programmer’s perspective the modularity of multiagent systems can lead to simpler program-ming.Rather than tackling the whole task with a centralized agent,programmers can identify subtasks and assign control of those subtasks to different agents.The difficult problem of splitting a single agent’s time among different parts of a task solves itself.Thus,when the choice is between using a multiagent system or a single-agent system,MAS may be the simpler option.Of course there are some domains that are more naturally approached from an omniscient perspective—because a global view is given—or with central-ized control—because no parallel actions are possible and there is no action uncertainty[Decker,1996b]. 
Single-agent systems should be used in such cases.Multiagent systems can also be useful for their illucidation of fundamental problems in the social sci-ences and life sciences[Cao et al.,1997],including intelligence itself[Decker,1987],.As Weißput it:“In-telligence is deeply and inevitably coupled with interaction”[Weiß,1996].In fact,it has been proposed that the best way to develop intelligent machines at all might be to start by creating“social”machines[Daut-enhahn,1995].This theory is based on the socio-biological theory that primate intelligencefirst evolved because of the need to deal with social interactions[Minsky,1988].While all of the above reasons to use MAS apply generally,there are also some arguments in favor of multi-robot systems in particular.In tasks that require robots to be in particular places,such as robot scouting,a team of robots has an advantage over a single robot in that it can take advantage of geographic distribution.While a single robot could only sense the world from a single vantage point,a multi-robot system can observe and act from several locations simultaneously.Finally,as argued in[Jung and Zelinsky,2000],multi-robot systems can exhibit benefits over single-robot systems in terms of the“performance/cost ratio.”By using heterogeneous robots each with a subset of the capabilities necessary to accomplish a given task,one can use simpler robots that are presumably less expensive to engineer than a single monolithic robot with all of the capabilities bundled together.Reasons presented above to use MAS are summarized in Table1.2.1TaxonomySeveral taxonomies have been presented previously for the relatedfield of Distributed Artificial Intelligence (DAI).For example,Decker presents four dimensions of DAI[Decker,1987]:1.Agent granularity(coarse vs.fine);2.Heterogeneity of agent knowledge(redundant vs.specialized);3.Methods of distributing control(benevolent petitive,team vs.hierarchical,static vs.shifting roles);4.and Communication possibilities(blackboard vs.messages,low-level vs.high-level,content).Along dimensions1and4,multiagent systems have coarse agent granularity and high-level communication. 
Along the other dimensions,they can vary across the whole ranges.In fact,the remaining dimensions are very prominent in this article:degree of heterogeneity is a major MAS dimension and all the methods of distributing control appear here as major issues.More recently,Parunak[1996]has presented a taxonomy of MAS from an application perspective.From this perspective,the important characteristics of MAS are:System function;Agent architecture(degree of heterogeneity,reactive vs.deliberative);System architecture(communication,protocols,human involvement).A useful contribution is that the dimensions are divided into agent and system characteristics.Other overviews of DAI and/or MAS include[Lesser,1995;Durfee,1992;Durfee et al.,1989;Bond and Gasser, 1988].There are also some existing surveys that are specific to multi-robot systems.Dudek et al.[1996] presented a detailed taxonomy of multiagent robotics along seven dimensions,including robot size,various communication parameters,reconfigurability,and unit processing.Cao et al.[1997]presented a“taxonomy based on problems and solutions,”using the followingfive axes:group architecture,resource conflicts, origins of cooperation,learning,and geometric problems.It specifically does not consider competitive multi-robot scenarios.This article contributes a taxonomy that encompasses MAS along with a detailed chronicle of existing systems as theyfit in to this taxonomy.The taxonomy presented in this article is organized along what we believe to be the most important aspects of agents(as opposed to domains):degree of heterogeneity and degree of -munication is presented as an agent aspect because it is the degree to which the agents communicate(or whether they communicate),not the communication protocols that are available to them,that is considered. 
Other aspects of agents in MAS are touched upon within the heterogeneity/communication framework.For example,the degree to which different agents play different roles is certainly an important MAS issue,but here it is framed within the scenario of heterogeneous non-communicating agents(it arises in the other three scenarios as well).All four combinations of heterogeneity and communication—homogeneous non-communicating agents; heterogeneous non-communicating agents;homogeneous communicating agents;and heterogeneous com-municating agents—are considered in this article.Our approach throughout the article is to categorize the issues as they are reflected in the literature.Many of the issues could apply in earlier scenarios,but do not in the articles that we have come across.On the other hand,many of the issues that arise in the earlier scenarios also apply in the later scenarios.Nevertheless,they are only mentioned again in the later scenarios to the degree that they differ or become more complex.The primary purpose of this taxonomy is as a framework for considering and analyzing the challenges that arise in MAS.This survey is designed to be useful to researchers as a way of separating out the issues that arise as a result of their decisions to use homogeneous versus heterogeneous agents and communicating versus non-communicating agents.The multiagent scenarios along with the issues that arise therein and the techniques that currently exist to address these issues are described in detail in Sections4–7.Table2gives a preview of these scenarios and associated issues as presented in this article.2.2Single-Agent vs.Multiagent SystemsBefore studying and categorizing MAS,we mustfirst consider their most obvious alternative:centralized, single-agent systems.Centralized systems have a single agent which makes all the decisions,while the others act as remote slaves.For the purposes of this survey,a“single-agent system”should be thought of as a centralized system in a domain which also allows for a multiagent approach.A single-agent system might still have multiple entities—several actuators,or even several physically2.2.1Single-Agent SystemsIn general,the agent in a single-agent system models itself,the environment,and their interactions.Of course the agent is itself part of the environment,but for the purposes of this article,agents are considered to have extra-environmental components as well.They are independent entities with their own goals,actions, and knowledge.In a single-agent system,no other such entities are recognized by the agent.Thus,even if there are indeed other agents in the world,they are not modeled as having goals,etc.:they are just considered part of the environment.The point being emphasized is that although agents are also a part of the environment,they are explicitly modeled as having their own goals,actions,and domain knowledge(see Figure1).2.2.2Multiagent SystemsMultiagent systems differ from single-agent systems in that several agents exist which model each other’s goals and actions.In the fully general multiagent scenario,there may be direct interaction among agents (communication).Although this interaction could be viewed as environmental stimuli,we present inter-agent communication as being separate from the environment.From an individual agent’s perspective,multiagent systems differ from single-agent systems most sig-nificantly in that the environment’s dynamics can be affected by other agents.In addition to the uncertaintyFigure1:A general single-agent framework.The agent 
models itself,the environment,and their interac-tions.If other agents exist,they are considered part of the environment.that may be inherent in the domain,other agents intentionally affect the environment in unpredictable ways. Thus,all multiagent systems can be viewed as having dynamic environments.Figure2illustrates the view that each agent is both part of the environment and modeled as a separate entity.There may be any number of agents,with different degrees of heterogeneity and with or without the ability to communicate directly.From the fully general case depicted here,we begin by eliminating both the communication and the heterogeneity to present homogeneous non-communicating MAS(Section4). Then,the possibilities of agent heterogeneity and inter-agent communication are considered one at a time (Sections5and6).Finally,in Section7,we arrive back at the fully general case by considering agents that directly.can interact3Organization of Existing WorkThe following sections present many different MAS techniques that have been previously published.They present an extensive,but not exhaustive,list of work in thefield.Space does not permit exhaustive coverage. Instead,the work mentioned is intended to illustrate the techniques that exist to deal with the issues that arise in the various multiagent scenarios.When possible,ML approaches are emphasized.All four multiagent scenarios are considered in the following order:homogeneous non-communicating agents,heterogeneous non-communicating agents,homogeneous communicating agents,and heterogeneous communicating agents.For each of these scenarios,the research issues that arise,the techniques that deal with them,and additional ML opportunities are presented.The issues may appear across scenarios,but they are presented and discussed in thefirst scenario to which they apply.In addition to the existing learning approaches described in the sections entitled“Issues and Techniques”, there are several previously unexplored learning opportunities that apply in each of the multiagent scenarios. 
For each scenario,a few promising opportunities for ML researchers are presented.Many existing ML techniques can be directly applied in multiagent scenarios by delimiting a part of the domain that only involves a single agent.However multiagent learning is more concerned with learning issues that arise because of the multiagent aspect of a given domain.As described by Weiß,multiagent learning is“learning that is done by several agents and that becomes possible only because several agents are present”[Weiß,1995].This type of learning is emphasized in the sections entitled“Further Learning Opportunities.”For the purpose of illustration,each scenario is accompanied by a suitable instantiation of the Preda-tor/Prey or“Pursuit”domain.3.1The Predator/Prey(“Pursuit”)DomainThe Predator/Prey,or“Pursuit”domain(hereafter referred to as the“pursuit domain”),is an appropriate one for illustration of MAS because it has been studied using a wide variety of approaches and because it has many different instantiations that can be used to illustrate different multiagent scenarios.Since it involves agents moving around in a world,it is particularly appropriate as an abstraction of robotic MAS.The pursuit domain is not presented as a complex real-world domain,but rather as a toy domain that helps concretize many concepts.For discussion of a domain that has the full range of complexities characteristic of more real-world domains,see Section8.The pursuit domain was introduced by Benda et al.[1986].Over the years,researchers have studied several variations of its original formulation.In this section,a single instantiation of the domain is presented. However,care is taken to point out the parameters that can be varied.The pursuit domain is usually studied with four predators and one prey.Traditionally,the predators are blue and the prey is red(black and grey respectively in Figure3).The domain can be varied by usingdifferent numbers of predators and prey.Predators see each otherPrey stays put 10% of timePrey moves randomlyPredators can communicateSimultaneous movementsOrthogonal Game in a Toroidal WorldCaptureFigure 3:A particular instantiation of the pursuit domain.Predators are black and the prey is grey.The arrows on top of two of the predators indicate possible moves.The goal of the predators is to “capture”the prey,or surround it so that it cannot move to an unoccupied position.A capture position is shown in Figure 3.If the world has boundaries,fewer than four predators can capture the prey by trapping it against an edge or in a corner.Another possible criterion for capture is that a predator occupies the same position as the prey.Typically,however,no two players are allowed to occupy the same position.As depicted in Figure 3,the predators and prey move around in a discrete,grid-like world with square spaces.They can move to any adjacent square on a given turn.Possible variations include grids with other shapes as spaces (for instance hexagons)or continuous worlds.Within the square game,players may be allowed to move diagonally instead of just horizontally and vertically.The size of the world may also vary from an infinite plane to a small,finite board with edges.The world pictured in Figure 3is a toroidal world:the predators and prey can move off one end of the board and come back on the other end.Other parameters of the game that must be specified are whether the players move simultaneously or in turns;how much of the world the predators can see;and whether and how the predators can communicate.Finally,in 
the original formulation of the domain,and in most subsequent studies,the prey moves randomly:on each turn it moves in a random direction,staying still with a certain probability in order to simulate being slower than the predators.However,it is also possible to allow the prey to actively try to escape capture.As is discussed in Section 5,there has been some research done to this effect,but there is still much room for improvement.The parameters that can be varied in the pursuit domain are summarized in Table 3.The pursuit domain is a good one for the purposes of illustration because it is simple to understand and because it is flexible enough to illustrate a variety of scenarios.The possible actions of the predators andAgentFigure4:The pursuit domain with just a single agent.One agent controls all predators and the prey is considered part of the environment.For each of the multiagent scenarios presented below,a new instantiation of the pursuit domain is de-fined.Their purpose is to illustrate the different scenarios within a concrete framework.3.2Domain IssuesThroughout this survey,the focus is upon agent capabilities.However,from the point of view of the system designer,the characteristics of the domain are at least as important.Before moving on to the agent-based categorization of thefield in Sections4–7,a range of domain characteristics is considered.Relevant domain characteristics include:the number of agents;the amount of time pressure for gen-erating actions(is it a real-time domain?);whether or not new goals arrive dynamically;the cost of com-munication;the cost of failure;user involvement;and environmental uncertainty.Thefirst four of these characteristics are self-explanatory and do not need further mention.With respect to cost of failure,an example of a domain with high cost of failure is air-traffic control[Rao and Georgeff,1995].On the other hand,the directed improvisation domain considered by Hayes-Roth et al.[1995]has a very low cost of failure.In this domain,entertainment agents accept all improvisation suggestions from each other.The idea is that the agents should not be afraid to make mistakes,but rather should“just let the wordsflow”[Hayes-Roth et al.,1995].Several multiagent systems include humans as one or more of the agents.In this case,the issue of communication between the human and computer agents must be considered[Sanchez et al.,1995].Another example of user involvement is user feedback in an informationfiltering domain[Ferguson and Karakoulas, 1996].Decker[1995]distinguishes three different sources of uncertainty in a domain.The state transitions in the domain itself might be non-deterministic;agents might not know the actions of other agents;and agents might not know the outcomes of their own actions.This and the other domain characteristics are summarized in Table4.4Homogeneous Non-Communicating Multiagent SystemsIn homogeneous,non-communicating multiagent systems,all of the agents have the same internal structure including goals,domain knowledge,and possible actions.They also have the same procedure for selecting among their actions.The only differences among agents are their sensory inputs and the actual actions they take:they are situated differently in the world.4.1Homogeneous Non-Communicating Multiagent PursuitIn the homogeneous non-communicating version of the pursuit domain,rather than having one agent con-trolling all four predators,there is one identical agent per predator.Although the agents have identical capa-bilities and decision procedures,they 
have limited information about each other’s internal state and sensory inputs.Thus they are not be able to predict each other’s actions.The pursuit domain with homogeneous agents is illustrated in Figure5.Within this framework,Stephens and Merx[1990]propose a simple heuristic behavior for each agent that is based on local information.They define capture positions as the four positions adjacent to the prey. They then propose a“local”strategy whereby each predator agent determines the capture position to which。
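Since this excerpt leans heavily on the pursuit domain, a tiny reconstruction of the environment may help; it follows the instantiation described above (toroidal grid, orthogonal moves, a prey that stays put 10% of the time, capture when all four adjacent cells are occupied). The grid size, the omission of collision handling, and the class layout are assumptions, not taken from the survey.

```python
import random

class PursuitDomain:
    """Toy toroidal-grid pursuit domain: four predators try to surround one prey.

    Collision handling and the predators' decision procedures are deliberately omitted.
    """

    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # orthogonal moves only

    def __init__(self, size=10, n_predators=4, stay_prob=0.1):
        self.size, self.stay_prob = size, stay_prob
        cells = random.sample([(x, y) for x in range(size) for y in range(size)],
                              n_predators + 1)
        self.prey, self.predators = cells[0], cells[1:]

    def _step(self, pos, move):
        # Toroidal wrap-around: moving off one edge re-enters on the opposite edge.
        return ((pos[0] + move[0]) % self.size, (pos[1] + move[1]) % self.size)

    def move_prey(self):
        """Prey stays put 10% of the time, otherwise moves in a random orthogonal direction."""
        if random.random() >= self.stay_prob:
            self.prey = self._step(self.prey, random.choice(self.MOVES))

    def captured(self):
        """Capture: every cell orthogonally adjacent to the prey holds a predator."""
        return {self._step(self.prey, m) for m in self.MOVES} <= set(self.predators)
```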

Multi-Agent Q-Learning Based on Quantum Theory and the Ant Colony Algorithm

Q-learning is a standard, model-free reinforcement learning technique; in essence, Q-learning …

(5)

Traditional search algorithms mainly have to overcome the problem that an overly large solution space leads to too many paths to search, so the core of classical search strategies is to reduce the actual search space. The main problem faced by quantum algorithms, by contrast, is that the amplitude of the solution set is too small, so the core of quantum search strategies is how to quickly concentrate amplitude on the solution set. In other words, traditional search algorithms consider how to avoid searching invalid paths, whereas for quantum algorithms rapidly searching all paths is no longer the issue; what they seek is how to reduce or eliminate the amplitude on non-solution paths and transfer it to solution paths.

To address the problems of searching the strategy space and coordinating strategy selection, a novel multi-Agent cooperative learning algorithm based on quantum theory and the ant colony algorithm is proposed. The new algorithm first draws on quantum computing theory: the behaviors and state space of the Agents are represented by quantum superposition states, quantum entangled states are used to coordinate strategy selection, and probability amplitudes are used for action exploration, which speeds up learning. Second, following the ant colony algorithm, a "footprint" idea is proposed to indirectly strengthen the interaction between Agents. Finally, theoretical analysis and experimental results both show that the improved Q-learning is feasible and can effectively improve learning efficiency. Keywords: multi-Agent system; cooperation; quantum computing; Q-learning; equilibrium solution; ant colony algorithm. DOI: 10.3778/j.issn.1002-8331.2010.21.012. Article ID: 1002-8331 (2010) 21-0043-04. Document code: A. CLC number: TP301.6
… the overall behavioral performance of the system, strengthens the problem-solving ability of individual Agents and of the multi-Agent system, and also gives the system better flexibility. Through cooperation, a multi-Agent system can solve more practical problems and broaden its applications. How to coordinate the behaviors of the Agents through learning so as to adapt to a dynamically changing environment, and thereby effectively achieve a common goal, is an important topic in multi-Agent systems research. Multi-Agent reinforcement learning faces the problem that many independent Agents take different actions in the same environment according to their own reward functions in order to …
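The "footprint" idea is only described qualitatively here, so the following is a rough Python sketch under the assumption that footprints behave like an ant-colony pheromone trace shared among agents: each agent deposits a footprint on visited state-action pairs, and every agent's softmax exploration is biased toward well-trodden pairs. The class, update rules, and parameters are illustrative, not the paper's algorithm (which additionally uses quantum superposition and entanglement, not modelled here).

```python
import numpy as np
from collections import defaultdict

class FootprintQLearner:
    """Tabular Q-learning whose exploration is biased by shared 'footprints' (sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, beta=1.0, evaporation=0.99):
        self.Q = defaultdict(lambda: np.zeros(n_actions))   # this agent's Q-table
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.evaporation = evaporation

    def act(self, state, footprints, temperature=1.0):
        """Softmax action selection biased toward state-action pairs other agents visited."""
        prefs = self.Q[state] / temperature + self.beta * footprints[state]
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    def update(self, state, action, reward, next_state, footprints):
        """Standard Q-learning backup, plus depositing a footprint for the other agents."""
        td_target = reward + self.gamma * self.Q[next_state].max()
        self.Q[state][action] += self.alpha * (td_target - self.Q[state][action])
        footprints[state] *= self.evaporation        # old footprints evaporate
        footprints[state][action] += 1.0             # leave a fresh footprint
```

Here `footprints` would be a table shared by all agents, e.g. `footprints = defaultdict(lambda: np.zeros(n_actions))`.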

Sequence to Sequence Multi-Agent Reinforcement Learning Algorithm


第34卷第3期2021年3月模式识别与人工智能Pattern Recognition and Artificial IntelligenceVol.34No.3Mar.2021序列多智能体强化学习算法史腾飞1王莉1黄子蓉1摘要针对当前多智能体强化学习算法难以适应智能体规模动态变化的问题,文中提出序列多智能体强化学习算法(SMARL).将智能体的控制网络划分为动作网络和目标网络,以深度确定性策略梯度和序列到序列分别作为分割后的基础网络结构,分离算法结构与规模的相关性.同时,对算法输入输出进行特殊处理,分离算法策略与规模的相关性.SMARL中的智能体可较快适应新的环境,担任不同任务角色,实现快速学习.实验表明SMARL在适应性、性能和训练效率上均较优.关键词多智能体强化学习,深度确定性策略梯度(DDPG),序列到序列(Seq2Seq),分块结构引用格式史腾飞,王莉,黄子蓉.序列多智能体强化学习算法.模式识别与人工智能,2021,34(3):206-213. DOI10.16451/ki.issn1003-6059.202103002中图法分类号TP18Sequence to Sequence Multi-agent Reinforcement Learning AlgorithmSHI Tengfei',WANG Li1,HUANG Zirong1ABSTRACT The multi-agent reinforcement learning algorithm is difficult to adapt to dynamically changing environments of agent scale.Aiming at this problem,a sequence to sequence multi-agent reinforcement learning algorithm(SMARL)based on sequential learning and block structure is proposed. The control network of an agent is divided into action network and target network based on deep deterministic policy gradient structure and sequence-to-sequence structure,respectively,and the correlation between algorithm structure and agent scale is removed.Inputs and outputs of the algorithm are also processed to break the correlation between algorithm policy and agent scale.Agents in SMARL can quickly adapt to the new environment,take different roles in task and achieve fast learning. Experiments show that the adaptability,performance and training efficiency of the proposed algorithm are superior to baseline algorithms.Key Words Multi-agent Reinforcement Learning,Deep Deterministic Policy Gradient(DDPG), Sequence to Sequence(Seq2Seq),Block StructureCitation SHI T F,WANG L,HUANG Z R.Sequence to Sequence Multi-agent Reinforcement Learning Algorithm.Pattern Recognition and Artificial Intelligence,2021,34(3):206-213.在多智能体强化学习(Multi-agent Reinforce-收稿日期:2020-10-10;录用日期:2020-11-20Manuscript received October10,2020;accepted November20,2020国家自然科学基金项目(No.61872260)资助Supported by National Natural Science Foundation of China(No. 61872260)本文责任编委陈恩红Recommended by Associate Editor CHEN Enhong1.太原理工大学大数据学院晋中0306001.College of Data Science,Taiyuan University of Technology,Jinzhong030600ment Learning,MARL)技术中,智能体与环境及其它智能体交互并获得奖励(Reward),通过奖励得到信息并改善自身策略.多智能体强化学习对环境的变化十分敏感,一旦环境发生变化,训练好的策略就可能失效.智能体规模变化是一种典型的环境变化,可造成已有模型结构和策略失效.针对上述问题,需要研究自适应智能体规模动态变化的MARL.现今MARL在多个领域已有广泛应用[1],如构建游戏人工智能(Artificial Intelligence,AI)[2]、机器人控制[3]和交通指挥⑷等.MARL研究涉及范围广泛,与本文相关的研究可分为如下3方面.1)多智能体性能方面的研究.多智能体间如何第3期史腾飞等:序列多智能体强化学习算法207较好地合作,保证整体具有良好性能是所有MARL 必须考虑的问题.Lowe等[5]提出同时适用于合作与对抗场景的多智能体深度确定性策略梯度(Multi-agent Deep Deterministic Policy Gradient,MADDPG),使用集中训练分散执行的方式让智能体之间学会较好的合作,提升整体性能.Foerster等⑷提出反事实多智能体策略梯度(Counterfactual Multi-agent Policy Gradients,COMA),同样使用集中训练分散执行的方式,使用单个Critic多个Actor的网络结构,Actor 网络使用门控循环单兀(Gate Recurrent Unit,GRU)网络,提高整体团队的合作效果.Wei等[7]提出多智能体软Q学习算法(Multi-agent Soft Q-Learning, MASQL),将软Q学习(Soft Q-Learning)算法迁至多智能体环境中,多智能体采用联合动作,使用全局回报评判动作好坏,一定程度上提升团队的合作效果.上述算法在一定程度上提升多智能体团队合作和对抗的性能,但是均存在难以适应智能体规模动态变化的问题.2)多智能体迁移性方面的研究.智能体的迁移包括同种环境中不同智能体之间的迁移和不同环境中智能体的迁移.研究如何较好地实现智能体的迁移可提升训练效率及提升智能体对环境的适应性. 
Brys等⑷通过重构奖励实现智能体策略的迁移.虽然可解决智能体策略的迁移问题,但在奖励重构的过程中需要耗费大量资源.Taylor等[9]提出在源任务和目标任务之间通过任务数据的双向传输,实现源任务和目标任务并行学习,加快智能体学习的进度和智能体知识的迁移,但在智能体规模巨大时,训练速度仍然有限.Mnih等[10]通过多线程模拟多个环境空间的副本,智能体网络同时在多个环境空间副本中进行学习,再将学习到的知识进行迁移整合,融入一个网络中.该方法在某种程度上也可视作一种知识的迁移,但并不能直接解决规模变化的问题.3)多智能体可扩展性和适应性方面的研究.在实际应用中,智能体的规模通常不固定并且十分庞大.当前一般解决思路是先人为调整设定模型的网络结构,然后通过大量再训练甚至是从零训练,使模型适应新的智能体规模.这种做法十分耗时耗力,根本无法应对智能体规模动态变化的环境.Khan 等[11]提出训练一个可适用于所有智能体的单一策略,使用该策略(参数共享)控制所有的智能体,实现算法可适应任意规模的智能体环境.但是该方法未注意到智能体规模对模型网络结构的影响.Zhang 等[12]提出使用降维方法对智能体观测进行表征,将不同规模的智能体的观测表征在同个维度下,再将表征作为强化学习算法的输入.该方法本质上是扩充模型网络可接受的输入维度大小,但当智能体规模持续扩大时,仍会超出模型网络的最大范围,从而导致模型无法运行.Long等[|3]改进MADDPG,使用注意力机制进行预处理观测,再将处理后的观测输入MADDPG,使用编码器(Encoder)实现注意力网络.该方法在一定程度上可适应智能体规模的变化,但在面对每次智能体规模变动时,均需要重新调整网络结构和进行再训练.针对智能体规模动态变化引发的MARL失效的问题,本文提出序列多智能体强化学习算法(Sequence to Sequence Multi-agent Reinforcement Learning Algorithm,SMARL).SMARL中的智能体可较快适应新的环境,担任不同任务角色,实现快速学习.1序列多智能体强化学习算法SMARL的核心思想是分离模型网络结构和模型策略与智能体规模的相关性,具体框图见图1.图1SMARL框图Fig.1Framework of SMARL首先在结构上,将智能体的控制网络划分为2个平行的模块一智能体动作网络(图1左侧)和智能体目标网络(图1右侧).每个智能体的执行动作由这两个网络的输出组成.为了适应算法结构,划分智能体的观测数据和动作数据.智能体的观测分为每个智能体的局部观测和所有智能体的全局观测,本文称为个性观测和共性观测.个性观测不会随智能体规模变化而变化.同理,算法中对智能体动作也分成智能体的共性动作和个性动作,所有智能体动作集的交集为共性动作,某智能体的动作集与共208模式识别与人工智能(PR&AI)第34卷性动作的差集为该智能体的个性动作.共性动作为智能体的执行动作,个性动作为智能体执行动作的目标.共性动作不会随智能体规模变化而变化.每个智能体执行的动作由共性动作和个性动作共同组成.举例说明,在二维格子世界中存在3个可移动且能相互之间抛小球的机械手臂.它们的共性观测是统一坐标系下整个地图的观测,个性观测是以自身为坐标原点的坐标系下的观测.它们的共性动作为上、下、左、右抛.个性动作由智能体ID决定:0号智能体的个性动作为1号、2号;1号智能体的个性动作为0号、2号;2号智能体的个性动作为0号、1号.经过上述分割,算法将与智能体规模相关和无关的内容分割为两部分.考虑到深度确定性策略梯度(Deep Deterministic Policy Gradient,DDPG)网络[⑷在单智能强化学习上性能较优,本文在对智能体观测和动作进行分割之后,将所有智能体的动作策略视作同个策略,选取DDPG网络作为智能体动作网络的内部结构.Khan等[||]证明使用单智能体网络和单一策略控制多个智能体的有效性.考虑到序列到序列(Sequence-to-Sequence,Seq2Seq)网络[15-16]对输入输出长度的不敏感性,本文选取Seq2Seq作为智能体目标网络的内部结构,将智能体规模视作序列长度.智能体动作网络输入为智能体的个性观测,输出为智能体的共性动作,详细框图见图2.图2智能体动作网络框图Fig.2Framework of agent action network 智能体动作网络由多个DDPG网络组成,每个智能体均有各自的DDPG网络,其中,Actor网络参数为兹,,Critic网络参数为Q,Actor-target网络参数为兹;,Critic-target网络参数为Q;,i=0,1,…,N-1.单个的DDPG网络仅接收其对应的智能体以自身作为“坐标原点”的局部观测.此时,使用单一策略(参数共享)控制所有智能体的动作是有意义的.另外,为了实现参数共享,本文参考异步优势演员评论家(Asynchronous Advantage Actor-Critic, A3C)的做法[10],在智能体动作网络中额外设置一个不进行梯度更新的中心参数网络,Actor网络参数为兹”,Critic网络参数为Q n网络接收其它DDPG网络的参数进行软更新(软更新超参数子=0.01),再使用软更新更新其它DDPG网络,最终使所有DDPG网络的参数达到同个单一策略.智能体动作网络更新方式如下.令m n l,=o D pg移(九-Q(o ib,山Q J)2达到最小以更新Critic网络,其中,Q i为Critic网络的参数,Q(•-)为网络评估,B_DDPG为算法批次(Batch Size)数量,o ib、两、r ib、0亦1为抽取样本,Ju,=r,b+酌Q'(s u,+1,滋'(s u,+1丨兹忆)Q;),酌为折扣因子.Actor网络更新如下:V兹丿抑B_DDPG移(VQ(o,a Q i)s o)V汕(o丨兹J L), ib lb lb lb其中,兹i为Actor网络的参数,m(••)为网络策略.中心参数网络和其它网络相互更新如下:兹N饮子兹i+(1-子)兹N,Q N饮子匕+(1-子)Q,兹i饮子兹N+(1-子)兹i,Q i饮t Q N+(1-子)Q i-其中:中心参数网络的Actor网络参数为如,Critic 网络参数为Q N;其它DDPG网络的Actor网络参数为兹,,Critic网络参数为Q i,i=0,1,-,N-1;t为软更新超参数.智能体目标网络输入为智能体的共性观测,输出为智能体的个性动作,框图如图3所示.网络由一个Seq2Seq网络和一个存储器组成,Seq2Seq网络参数为啄.Seq2Seq网络由编码器和解码器组成,这两部分内部结构均为循环神经网络(Recurrent Neural Network,RNN).编码器负责将输入序列表征到更高的维度,由解码器将高维表征进行解码,输出新的序列.Seq2Seq网络负责学习和预测智能体间的合作关系.智能体目标网络使用强化学习的思想,存储器起到强化学习中Q的作用,负责记录某观测(序第3期史腾飞等:序列多智能体强化学习算法209列)到动作(序列)的映射及相应获得的奖励. Seq2Seq部分相当于强化学习中的Actor,负责学习最优观测序列到动作序列的映射及预测新观测序列的动作序列.所有智能体的全局观测(共性观测)所有智能体在整体坐标下的全局观测序列存储器取数据训练“翻译”Seq2Seq编码器I RNN^rRN^k l rn N|注意力机制层解码器|RNN川RNN f RNN|智能体动作目标(个性动作)▼图3智能体目标网络框图Fig.3Framework of agent target network智能体目标网络输入的序列长度为智能体规模,序列中的元素维度为每个智能体的观测.输出序列的长度同样为智能体规模,序列中的元素是智能体编号.输入序列和输出序列的顺序均按照智能体的编号排序,每当智能体规模发生变化时,智能体重新从0开始编号.具体如下:先定义Seq2Seq的奖励函数,通过强化学习的思想筛选奖励最大的观测序列到动作序列的映射,将该映射视作一种翻译,再由Seq2Seq网络进行学习.网络输出表示智能体间的合作关系.另外,本文在Seq2Seq网络中引入Attention机制,提升Seq2Seq网络性能[17].Seq2Seq的核心公式如下:m^x Z*q=1E1s s s s s sN移ln(a0,,…,a N-1o0,o1,…,0N-1,啄),n=0其中,啄为Seq2Seq的参数,。
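The soft-update relations at the end of this excerpt (the central parameter network is pulled toward each agent's actor/critic parameters and then pushes its own parameters back, with τ = 0.01) can be sketched as below. This is an illustrative PyTorch snippet with assumed names (`central`, `agents`), not the authors' code.

```python
import torch
from torch import nn

TAU = 0.01  # soft-update coefficient used in the excerpt

@torch.no_grad()
def soft_update(target: nn.Module, source: nn.Module, tau: float = TAU) -> None:
    """target <- tau * source + (1 - tau) * target, parameter by parameter."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * s_param)

def share_parameters(central: nn.Module, agents: list) -> None:
    """Pull every agent's parameters into the central network, then push the central
    parameters back, so all DDPG networks drift toward one shared policy (sketch)."""
    for net in agents:
        soft_update(central, net)   # theta_N <- tau * theta_i + (1 - tau) * theta_N
    for net in agents:
        soft_update(net, central)   # theta_i <- tau * theta_N + (1 - tau) * theta_i
```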

An Improved Sparrow Search Algorithm Based on the Logistic Chaotic Map


$$
X_{Di,j}^{t+1}=
\begin{cases}
X_{B}^{t}+\beta\left|X_{Di,j}^{t}-X_{B}^{t}\right|, & f_i>f_g\\[4pt]
X_{Di,j}^{t}+K\left(\dfrac{\left|X_{Di,j}^{t}-X_{L}^{t}\right|}{(f_i-f_w)+\varepsilon}\right), & f_i=f_g
\end{cases}
\qquad(5)
$$

where $X_B$ is the current global best position; $f_i$ is the fitness of the current sparrow individual; $f_g$ and $f_w$ are the fitness values of the current global best and worst individuals, respectively; $\beta$ is a step-size control parameter, a random number drawn from a normal distribution with mean 0 and variance 1; $K\in[-1,1]$ is a random number; and $\varepsilon$ is a small constant that avoids division by zero.

This paper uses the Logistic chaotic map to improve the quality of the initial solutions and strengthen the global search ability of the sparrow search algorithm, alleviating the tendency of swarm-intelligence algorithms to lose population diversity and fall into local optima as they approach the optimum. A linearly decreasing weight is used to lower the risk of premature convergence and to reduce the oscillation near the global optimum in later iterations [11]. On this basis, an improved sparrow search algorithm based on the Logistic chaotic map is proposed.

$$
F_X=
\begin{bmatrix}
f([x_{11},x_{12},\dots,x_{1d}])\\
f([x_{21},x_{22},\dots,x_{2d}])\\
\vdots\\
f([x_{n1},x_{n2},\dots,x_{nd}])
\end{bmatrix}
\qquad(2)
$$

During the optimization process of SSA, producers with higher fitness obtain food first in the iterative search. Since the producers provide the foraging search direction for the whole population, they have a larger search range than the scroungers. In each iteration, the producer position is updated as

$$
X_{Fi,j}^{t+1}=
\begin{cases}
X_{Fi,j}^{t}\cdot\exp\!\left(\dfrac{-i}{\alpha\, t_{\max}}\right), & R_2<ST\\[6pt]
X_{Fi,j}^{t}+Q\cdot L, & R_2\geq ST
\end{cases}
\qquad(3)
$$

where $R_2$ is the alarm value, $ST$ the safety threshold, $\alpha\in(0,1]$ and $Q$ random numbers, and $t_{\max}$ the maximum number of iterations. The scrounger (joiner) position is updated as

$$
X_{Ji,j}^{t+1}=
\begin{cases}
Q\cdot\exp\!\left(\dfrac{X_{L}^{t}-X_{Ji,j}^{t}}{i^{2}}\right), & i>0.5n\\[6pt]
X_{P}^{t+1}+\left|X_{Ji,j}^{t}-X_{P}^{t+1}\right|\cdot A^{+}\cdot L, & i\leq 0.5n
\end{cases}
\qquad(4)
$$

where $X_P$ is the best position currently occupied by the producers; $X_L$ is the current global worst position; and $L$ is a $1\times d$ matrix whose elements are all 1.
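To illustrate the two modifications that define the improved algorithm, Logistic-chaotic-map initialization and a linearly decreasing weight, here is a small Python sketch for a minimization problem on [lb, ub]^d. The map parameter μ = 4, the weight bounds, and the way the weight multiplies the producer step are assumptions made for illustration; the excerpt does not give these details.

```python
import numpy as np

def logistic_chaotic_init(n, d, lb, ub, mu=4.0, warmup=50):
    """Initialize n sparrows in [lb, ub]^d with the Logistic map z_{k+1} = mu * z_k * (1 - z_k)."""
    z = np.random.uniform(0.1, 0.9, size=(n, d))   # avoid the map's fixed points
    for _ in range(warmup):                        # iterate into the chaotic regime
        z = mu * z * (1.0 - z)
    return lb + z * (ub - lb)

def linear_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing weight: favour global search early, local refinement late."""
    return w_max - (w_max - w_min) * t / t_max

def producer_update(x_i, i, t, t_max, alpha=0.75, ST=0.8, rng=np.random):
    """One producer's move, an illustrative weighted variant of eq. (3)."""
    w = linear_weight(t, t_max)
    if rng.random() < ST:                                   # R2 < ST: no alarm, wide search
        return w * x_i * np.exp(-(i + 1) / (alpha * t_max))
    return w * x_i + rng.standard_normal(x_i.shape)         # R2 >= ST: Q * L step, Q ~ N(0, 1)
```

For example, `X = logistic_chaotic_init(30, 2, -5.0, 5.0)` seeds a 30-sparrow population on [-5, 5]^2; the rest of the loop (scroungers via eq. (4), scouts via eq. (5)) would follow the equations above.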

A Particle Model Method for Cooperative Problem Solving in Multi-Agent Systems


多Agent系统协作求解的粒子模型方法赵旭宝;李静;董靓瑜【摘要】The relation between cooperative problem solving of distribution in MAS and partical collaboration are discussed, and a partical model for cooperative problem solving is proposed in MAS, which transforms the process of cooperative problem solving into co-optimization of particles. A parameter of collaboration extent is introduced, formula of demand intensity and effectiveness of target function of benefits are established, and particle swarm optimization algorithm to solve such problem is developed. Through evolutionary computation, an optimal solution for task allocations and resource assignments can be found. The proposed approach can describe and process the self-organization phenomena of Agent as well as the randomness and simultaneity of social interaction behaviors to complicated problem solving. The simulation experiments demonstrate the effectiveness and convergence of the method.%讨论了多Agent系统分布协作求解和粒子协作之间的关系,提出了一种多Agent系统协作求解粒子模型方法,将任务资源规划协作求解过程转化为多粒子共同寻优的过程.引入了协作程度变化参数,建立了需求强度计算公式和效益目标函数,并构造了适合求解的粒子群算法.通过算法的寻优计算,得到了任务资源规划协作求解的最优解.仿真实验结果表明,对于复杂的任务资源规划问题,该方法能描述和处理Agent本身自组织现象和社会交互行为的随机性和并发性,并具有良好的收敛性和有效性.【期刊名称】《大连交通大学学报》【年(卷),期】2012(033)002【总页数】6页(P94-99)【关键词】多Agent系统(MAS);分布式协作求解;粒子群算法;任务资源规划分配【作者】赵旭宝;李静;董靓瑜【作者单位】大连交通大学软件学院,辽宁大连116021;大连交通大学软件学院,辽宁大连116021;大连交通大学软件学院,辽宁大连116021【正文语种】中文0 引言在分布式人工智能中,基于Agent结构提供了柔性和鲁棒性,适合解决动态、不确定和分布式的问题.系统中各Agent个体都是具有自主性的智能体,存在自己的信念、愿望、目标等认知属性和承诺、义务、协作、竞争等社会属性[1-2].系统中各Agent个体通过对自身知识的表示和对问题域的描述,构成分布的、异构的、面向特定问题的Agent求解子系统,完成指定任务的求解.但在多Agent分布式系统中由于每个Agent个体所具有的知识资源和执行能力是有限的,当单个Agent难以独立完成指定任务,或多个Agent一起完成会产生更大的效益时,多个A-gent个体之间就倾向于利用协作机制进行信息的交流、知识的共享来完成任务的协作求解.为了保证多Agent系统协作求解的性能,很多学者在关于多Agent 系统协作求解模型建立和协作求解方法方面做了大量研究.文献[3]提出面向共同目标的合作求解策略,重点在于寻求系统的最大效益;文献[4-5]提出基于弹簧网络的多Agent系统协作求解方法,通过自组织动力学策略来实现Agent之间的协调;文献[6-7]提出基于合同网协议的合作求解方法,先协商结盟再规划求解,并通过协商的方式解决冲突.目前多Agent分布式系统协作求解方法的研究基本上有两种类型:一种类型是Agent个体各自寻求自身最大利益的方法;另一种是Agent个体共同寻求整个系统最大效益的方法.但前者协作求解中没有全局的优化目标,缺乏统一的全局控制策略;后者又难以描述Agent个体自变与自组织现象.同时,这两种类型虽然都涉及到Agent间的协作和交互,但协作交互也仅仅是一些简单的社会交互行为,在问题求解过程中不能及时处理环境和Agent本身的动态变化以及社会交互行为的随机性和并发性的问题.为此,本文提出了一种多Agent系统协作求解的粒子模型方法.将系统协作求解转换为多粒子共同寻优过程,克服了Agent本身认知属性和社会属性动态变化及随机性和并发性的问题,使得Agent个体在协作求解中既获取自身的最大利益,又促进系统的总体效益.最后,引入了协作程度变化参数,给出了Agent协作求解的需求强度计算公式和系统效益目标函数及优化算法,经过算法迭代计算求得了协作求解的资源分配优化解.仿真结果表明该方法具有很好的收敛性和实用性.1 求解粒子模型1.1 求解问题描述不失一般性,本文以多Agent分布式环境下对问题实施任务资源规划分配的协作求解为背景,讨论多Agent系统协作求解方法与优化算法.设待解决的任务Agent 集合和知识与执行能力构成的资源Agent集合分别为 Task={AgentT1,AgentT2,…, AgentTn} 和 Res = {AgentR1,AgentR2,…,AgentRm},且每个子资源AgentRi个体拥有的资源容量为mi.子资源AgentRi个体分配给子任务AgentTk个体的资源量为rik(rik≤mik),供其完成规划的任务.同时子任务AgentTk个体付给子资源AgentRi个体单位资源报酬为pik.设第k个子任务AgentTk对各种子资源Agent需求资源总量为Sk和第k个子任务AgentTk所能支付总的资源报酬代价为Pk.子任务AgentTk在执行任务时使用何种子资源Agent取决于子任务AgentTk对子资源AgentRi的需求强度xik(1≤i≤m,1≤k≤n).xik表示第k个任务需要第i个资源.因此,在多Agent分布协作求解中,任务资源规划目标则为寻找一个优化的任务资源规划分配方案,在各资源用量最下的前提下,取得系统整体收益最大值.1.2 求解粒子模型由上述问题分析可知,单个子任务AgentTk对各子资源AgentRi的需求是关于rik,pik,xik的函数,函数表示为Aik=F(rik,pik,xik).所有任务对资源需求可以表示成如下的分配矩阵T_R=[Aik]m×n(1 ≤ i≤ m,1≤ k≤ n).矩阵如下:在上述分配矩阵中,每个子任务AgentTk(1≤k≤n)在完成任务时,可能对每个资源Agent Ri(1≤i≤m)个体存在需求.也就是说,每个子资源AgentRi个体可能分配给不同的子任务(rikxik≤ mi(∀i=1,2,…,m)).这样当多个子任务同时需要某资源时,就可能产生资源使用的冲突.为此,本文提出了资源协作求解的粒子模型.将每个子资源AgentRi个体视为不可再分的个体,称为粒子,每个子资源粒子每次仅能分配给一个子任务粒子.这样,当多个子任务需要同时使用某子资源时,Agent粒子就会倾向于进行协作求解共同完成多个子任务.或当多个子资源Agent 粒子一起完成所规划任务会更有效时,也会倾向协作求解.多Agent粒子之间是否进行协作,取决于协作求解强度xik的值.1.3 需求强度的计算在实际应用中,对于任务对资源需求强度的取值,不但要考虑Agent意愿、目标等自身认知属性的变化,更要考虑复杂社会交互行为对协作求解中需求强度的影响.对于群体Agent协作求解过程中所涉及的社会交互行为类型大致可分为两类: (1)对于子任务 AgentTk,子资源 
AgentRi粒子与子资源AgentRj粒子的协作交互行为ρijk;(2)对于子资源AgentRi,子任务AgentTk粒子与子任务AgentTj粒子之间的协作交互行为ρ'kji.其中,ρijk的含义为:对于子任务AgentTk,如果资源AgentRi与资源AgentRj具有相同的意愿和目标,且产生交互行为能加速对任务的执行,或产生更多的效益,则将加强任务 AgentTk对资源AgentRi粒子与AgentRj粒子的需求,加强的强度为ρijk.相反则消弱.同理,ρ'kji的含义为:对于子资源 AgentRi,如果任务AgentTk与任务AgentTj之间产生交互合作行为,能简化任务执行的复杂度,并能节约资源的消耗,则将加强资源AgentRi对任务AgentTk粒子与AgentTj粒子的分配.加强的强度为ρ'kji.相反则消弱.假定不同类型的交互行为产生的效果具有叠加性.因此,根据上述社会交互行为的分类,在协作求解交互过程中,任务对资源的需求强度可通过式(2)计算取得.其中,wij为关联权值,在[0,1]之间取值.它表示协作求解中各Agent粒子间关联程度.它随Agent粒子的动态变化而改变.如新陈代谢、随机故障以及协作交互的竞争、利用、欺骗等.在Agent粒子生命周期结束时wij=0.为了简化问题求解的复杂度,本文假定交互行为对需求强度的加强与消弱程度相同,即ρijk和ρ'kji取相同的小数值.由式(2)可知,计算所得需求强度xik的值为非整数,不满足粒子不可分思想,需对其做进一步的处理.通过构造将粒子间的需求强度xik定义为只取0或1整数值,且满足2,…,m).当xik=1表示第i个资源粒子分配给第k个子任务粒子;xik=0表示含义与前相反.需求强度的取值描述如式(3):在式(3)中,当计算所得需求强度值超过给定的某阈值μ后,使得xik=1,否则为0.构造后,某次任务资源规划协作求解状态可用图1表示.图1中圆点表示子资源粒子对子任务粒子的分配.有向边表示各粒子之间的协作求解过程.如有向边<x12,x24>表示子任务粒子T2在使用资源粒子R1时,还需要使用资源粒子R2,但R2已被T4使用.因此资源粒子R1和R2建立协作关系.在图1中有向边越多表明系统内部协作求解规模越大.另外,如果从资源分配角度描述群体Agent粒子间的协作求解交互过程,还可以表示成有向图,如图2.图2中每条边上的权值代表系统消费代价.在协作求解中由于Agent粒子自身的动态变化和社会交互行为随机性、并发性等因素的影响,使得这种协作求解是以通信开销和资源耗费为代价.本文将第i个Agent粒子和第j个Agent粒子协作求解产生的资源消费代价定义为cij.在图2中有向边的多少也体现了系统协作求解交互程度.用变量e∈[0,1]表示系统协作交互程度.图1 Agent协作求解模型图2 资源Agent协作求解过程2 求解数学模型在上述模型中,每次xik的不同取值,即可确定某次任务资源规划的一次分配方案.由此即可计算任务规划系统整体效益值.下面给出协作求解的效益目标函数.设第k个子任务完成后所产生的效益是一个关于资源消耗代价的某一连续可微的严格凹函数分布,则第k个任务的效益函数为2,…,n).其中λ1为 Agent粒子自身效益因子(λ1为随机正小数).它与Agent在其生命周期内自变与自组织因素及资源的利用率、需求满足率有关;为第k个任务所耗费资源报酬总和.同理,可得到第i个资源与第j个资源执行协作求解时,所产生的协作效益为Wkb(X)=(i≠ j且 b > k,∀k,b=1,2,…,n).其中λ2为协作过程效益因子(λ2为随机正小数).它与协作过程存在拥塞、欺骗、竞争和优先级竞争等社会交互行为有关;为协作过程中由于通信开销和资源消耗所产生的报酬总和.因此,整个系统所产生的总效益函数为.系统的实现目标则为求得W(X)总效益的最大值.对于总效益函数W(X),求总效益的最大值等价于求整个系统消费总代价最小值,即式(4)的最小值,且满足式 (5),(6),(7)约束条件.λ1,λ2为(0,1)的随机数3 协作求解的粒子群算法在多Agent系统分布式协作求解的粒子模型中,每个规划解xik都是一个0或1值(1≤i≤m,1≤k≤n).显然任务资源协作求解问题属于组合优化问题.因此,本文构造了适合求解的改进粒子群优化算法[8].在每次求解中把子资源数粒子数m定义为算法中每次迭代求解的空间维数.即每一维空间代表一个资源粒子对任务粒子的分配情况,用一个整数来表示.如果某资源粒子在求解中没有被任何子任务所使用,则表示为0.如Pkm代表算法中第k次求解的第m维空间;Pkm=n-1表示算法中第k次求解中第m个资源粒子分配给第n-1个子任务使用.在此粒子群优化算法中,采用了惩罚函数法[9]来处理了具有约束的优化问题,即只要是非可行解就直接丢弃.转化后的目标适应值计算公式可表示为式(8).采用这种适应值计算方法,对式(4)经过多次优化迭代计算,即可求得其系统消费代价最小优化解.协作求解的粒子群算法如下:(1)随机初始化粒子种群:粒子群位置X、速度V、种群规模N、学习因子C1和C2、惯性因子w和最大迭代次数Max_Len以及系统参数变量pik,rik,cij值.(2)利用给定的初始位置值,初始化个体最优位置Pi解和全局最优位置Pg解.并根据各个Agent自治性和社会交互性,利用式(2)(3)计算需求强度参数xik.(3)While(k< =Max_Len&&φ <0.0001)//φ为优化达到的精度;For i=1∶NFor j=1∶mChange(X,V)//修改每个解的位置X和速度V.End For(4)calculation(i);//通过式(8),计算第i次迭代适应值.localbest(i);//将第i次计算解与所经历过的当前最好位置解Pi进行比较,若较好,则将其作为当前解的最好位置解;End Forglobalbest();//对每次求解,将其Pi与全局所经历的最好位置解进行比较,求出全局极值Pg.EndWhile4 仿真实例4.1 仿真实例参数设置在仿真实验中,考虑到每个子资源Agent和子任务Agent有不同的优先级、自治度、交互性等复杂的行为,在仿真实验中随机生成了相关系统参数.仿真程序中各个参数和变量分别设置如下:每次迭代中搜索空间的维数为资源数m;加速因子c1和c2设置为2.惯性权w根据经验值设为w=0.9 -count*0.5/(Max_Len -1),count代表当前第count个粒子.种群位置的变化范围根据所要研究的问题设置为[1,N],粒子速度的变化范围设定为[1,N];最大迭代次数Max_Len取值为500;其他参数设置为如下随机值:wij=[0,1];λ1= [0.01,0.1];λ2= [0.1,0.2];mi= [50,100];pik= [2,10];rik= [1,5];cij= [5,10];Sk= [100,200];Pk= [100,200].4.2 仿真结果与收敛性分析算法中的全局最优适应值变化反应了协作求解的寻优过程.种群根据自身经验和全局经验,不断调整单个解的位置,最后通过迭代搜索到全局最优解.由于本文把每个解的搜索空间定义为资源数.所以每个全局最优适应值就代表了任务资源的一次规划分配方案.在系统协作求解中,由于子任务数和子资源数是不固定的,并且协作求解的程度e也处于动态变化之中.因此,为了全面考察算法的有效性.本文针对不同的子任务、子资源和不同e值组合进行实验分析.如图3,4所示.图3 给出了在 (m,n)=(12,10),e=0.3,0.5,0.8条件下全局最优适应值Pg的变化过程.从图3可以看出,当资源粒子数和任务粒子数相同,协作求解程度e不同时,全局最优适应值差别很大,由17.1变化到41.2,而收敛速度基本相同.这是因为随着e的增加,系统协作求解的资源消费不断增加,导致系统总消费不断增大.但由于资源数和任务数相同,每个粒子的搜索空间维数相同.因此算法的收敛速度基本相同.这也说明了该算法的收敛速度与协作求解程度e无关.当资源数和任务数不同,协作求解程度相同时.全局最优适应值虽变化很大,但收敛速度随资源粒子数的增加明显变慢.这是由于资源数的增加导致粒子搜索的空间变大,迭代速度就会变慢时间变长.如图4所示.在图4虽然全局最优适应值变化速度不同,但最终都趋于平稳状态.表明该方法具有良好的收敛性.同时,仿真结果也验证了该方法适合不同协作条件下各种任务资源规划分配求解.图3 
全局最优适应值变化((m,n)=(12,10))图4 全局最优适应值变化(e=0.2)仿真计算过程中,对于不同的资源粒子数、任务粒子数和不同的e值.在run_time=30条件下,求得的全局最优适应平均值和标准方差值如附表所示.在附表中全局最优适应值的标准均方差的变化范围为0.22~0.42.说明了该算法具有良好的收敛性和有效性.仿真结果也验证了该方法适合不同协作条件下各种任务资源规划分配求解,该方法具有一定的通用性和有效性.附表任务资源规划分配结果表(m,n)e 全局最优适应平均值标准方差STD 0.39.99 0.22(10,8)0.5 16.03 0.37(12,10)0.3 16.86 0.3 0.5 25.36 0.32 0.3 28.58 0.39(16,14)0.5 45.97 0.425 结论多Agent分布式系统协作求解比较复杂,需要考虑Agent本身的自治与交互行为动态性和随机性、复杂性等问题.本文针对分布式环境下任务资源协作规划分配问题,提出了一种多Agent分布式系统协作求解粒子模型方法,通过讨论多A-gent 系统分布协作求解和粒子协作之间的关系,将协作求解问题转化为多个粒子共同寻优的过程,并构造了适合求解该方法的粒子群算法.在算法迭代过程中,尽管全局最优适应值收敛的速度不同,但都收敛达到平稳状态,表明该算法是收敛的.仿真实验结果表明在该方法中资源数的多少决定了算法的搜索空间,影响了收敛速度的快慢,但收敛速率与协作求解程度无关.仿真实验结果也验证了该模型方法即能克服了环境和Agent本身的动态变化,又能处理社会交互行为变化对系统协作求解的影响,能够很好地解决各种复杂的任务资源规划问题,具有很好的有效性和通用性.参考文献:[1]张新良,石纯一.多Agent合作求解[J].计算机科学,2003,30(8):100-103.[2]李英.多Agent系统及其在预测与智能交通系统的应用[M].上海:华东理工大学出版社,2004.[3]WOOLDRIDGE M,JENNINGS NR.The cooperative problem-solving process[J].Journal of Logic Computation,1999,9(4):563-592.[4]SHUAI Dianxun,FENG Xiang.Distributed problem solving in multi-agent system:A spring net approach[J].IEEE Intelligent System,2005,20(4):66-74.[5]帅典勋,王亮.一种新的基于复合弹簧网络的多A-gent系统分布式问题求解方法[J].计算机学报,2002,25(8):853-859.[6]陶海军,王亚东,郭茂祖,等.基于熟人联盟及扩充合同网协议的多智能体协商模型[J].计算机研究与发展,2006,43(7):1155-1160.[7]陈宇,陈新,陈新度,等.基于设备整体效能和多Agent的预测-反应式调度[J].计算机集成制造系统,2009,15(8):1599-1605.[8]谢晓锋,张文俊,杨之廉.微粒群算法综述[J].控制与决策,2003,18(2):129-134.[9]徐刚,于泳波.基于改进的微粒子算法求解0/1背包问题[J].齐齐哈尔大学学报,2007,23(1):71-74.。
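A bare-bones sketch of the particle swarm described in Section 3 above, assuming the search space has one dimension per resource whose value is the index of the task it is assigned to (0 = unassigned), a user-supplied cost function for the total system cost, and a feasibility predicate that realizes the penalty-function idea by giving infeasible positions infinite fitness. The inertia schedule is simplified to decrease per iteration, and all names are illustrative rather than the paper's code.

```python
import numpy as np

def pso_allocate(cost, feasible, m, n_tasks, swarm=30, iters=500,
                 w0=0.9, c1=2.0, c2=2.0, seed=0):
    """Discrete PSO sketch for task-resource allocation.

    A particle is an m-vector; entry j is the task (1..n_tasks) assigned to resource j,
    or 0 if the resource is unused.  Infeasible positions get infinite fitness
    (penalty-function idea), so they can never become personal or global bests.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, n_tasks + 1, size=(swarm, m)).astype(float)
    V = rng.uniform(-1.0, 1.0, size=(swarm, m))

    def fitness(x):
        xi = np.rint(x).astype(int)
        return cost(xi) if feasible(xi) else np.inf

    pbest = X.copy()
    pbest_f = np.array([fitness(x) for x in X])
    g = int(pbest_f.argmin())
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    for t in range(iters):
        w = w0 - 0.5 * t / max(iters - 1, 1)             # decreasing inertia (simplified)
        r1, r2 = rng.random((swarm, m)), rng.random((swarm, m))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = np.clip(X + V, 0, n_tasks)
        f = np.array([fitness(x) for x in X])
        improved = f < pbest_f
        pbest[improved] = X[improved]
        pbest_f[improved] = f[improved]
        if pbest_f.min() < gbest_f:
            g = int(pbest_f.argmin())
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return np.rint(gbest).astype(int), gbest_f
```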

Multivariate Kernel Density Estimation with Vine Copulas (kdevine package) v0.4.4 User Guide


Package‘kdevine’October18,2022Type PackageTitle Multivariate Kernel Density Estimation with Vine CopulasVersion0.4.4URL https:///tnagler/kdevineBugReports https:///tnagler/kdevine/issuesDescription Implements the vine copula based kernel density estimator ofNagler and Czado(2016)<doi:10.1016/j.jmva.2016.07.003>.The estimator doesnot suffer from the curse of dimensionality and is therefore well suited forhigh-dimensional applications.License GPL-3Imports graphics,stats,utils,MASS,Rcpp,qrng,KernSmooth,cctools,kdecopula(>=0.8.1),VineCopula,doParallel,parallel,foreachLazyData yesLinkingTo RcppRoxygenNote7.2.0Suggests testthatNeedsCompilation yesAuthor Thomas Nagler[aut,cre]Maintainer Thomas Nagler<****************>Repository CRANDate/Publication2022-10-1812:25:15UTCR topics documented:kdevine-package (2)contour.kdevinecop (3)dkde1d (3)dkdevine (4)dkdevinecop (5)kde1d (6)12kdevine-package kdevine (7)kdevinecop (9)plot.kde1d (10)rkdevine (11)wdbc (12)Index14 kdevine-package Kernel Smoothing for Bivariate Copula DensitiesDescriptionThis package implements a vine copula based kernel density estimator.The estimator does not suf-fer from the curse of dimensionality and is therefore well suited for high-dimensional applications (see,Nagler and Czado,2016).DetailsThe multivariate kernel density estimators is implemented by the kdevine function.It combines a kernel density estimator for the margins(kde1d)and a kernel estimator of the vine copula density (kdevinecop).The package is built on top of the copula density estimators in the kdecopula::kdecopula-package and let’s you choose from all its implemented methods.Optionally,the vine copula can be estimated parameterically(only the margins are nonparametric).Author(s)Thomas NaglerReferencesNagler,T.,Czado,C.(2016)Evading the curse of dimensionality in nonparametric density estimation with simplified vine copu-las.Journal of Multivariate Analysis151,69-89(doi:10.1016/j.jmva.2016.07.003)Nagler,T.,Schellhase,C.and Czado,C.(2017)Nonparametric estimation of simplified vine copula models:comparison of methods arXiv:1701.00845 Nagler,T.(2017)A generic approach to nonparametric function estimation with mixed data.arXiv:1704.07457contour.kdevinecop3 contour.kdevinecop Contour plots of pair copula kernel estimatesDescriptionContour plots of pair copula kernel estimatesUsage##S3method for class kdevinecopcontour(x,tree="ALL",xylim=NULL,cex.nums=1,...)Argumentsx a kdevinecop object.tree"ALL"or integer vector;specifies which trees are plotted.xylim numeric vector of length2;sets xlim and ylim for the contours.cex.nums numeric;expansion factor for font of the numbers....arguments passed to contour.kdecopula.Examplesdata(wdbc,package="kdecopula")#load datau<-VineCopula::pobs(wdbc[,5:7],ties="average")#rank-transform#estimate densityfit<-kdevinecop(u)#contour matrixcontour(fit)dkde1d Working with a kde1d objectDescriptionThe density,cdf,or quantile function of a kernel density estimate are evaluated at arbitrary points with dkde1d,pkde1d,and qkde1d respectively.4dkdevineUsagedkde1d(x,obj)pkde1d(x,obj)qkde1d(x,obj)rkde1d(n,obj,quasi=FALSE)Argumentsx vector of evaluation points.obj a kde1d object.n integer;number of observations.quasi logical;the default(FALSE)returns pseudo-random numbers,use TRUE for quasi-random numbers(generalized Halton,see ghalton).ValueThe density or cdf estimate evaluated at x.See Alsokde1dExamplesdata(wdbc)#load datafit<-kde1d(wdbc[,5])#estimate densitydkde1d(1000,fit)#evaluate density estimatepkde1d(1000,fit)#evaluate corresponding 

A Multiagent Variant of Dyna-Q

Gerhard Weiß
Institut für Informatik, Technische Universität München
D-80290 München, Germany, weissg@in.tum.de

Abstract

This paper describes a multiagent variant of Dyna-Q called M-Dyna-Q. Dyna-Q is an integrated single-agent framework for planning, reacting, and learning. Like Dyna-Q, M-Dyna-Q employs two key ideas: learning results can serve as a valuable input for both planning and reacting, and results of planning and reacting can serve as a valuable input to learning. M-Dyna-Q extends Dyna-Q in that planning, reacting, and learning are jointly realized by multiple agents.

1 Introduction

Dyna-Q (e.g., [1] and [2, Chapter 9]) is a single-agent framework that integrates planning and reacting on the basis of learning. This integration is based on two key ideas:

• Learning results can serve as a valuable basis for both planning and reacting. Through learning the agents acquire information that makes it possible for them to plan and react more effectively and efficiently. More specifically, according to Dyna-Q the agents plan on the basis of an incrementally learnt world model and they react on the basis of incrementally learnt values that indicate the usefulness of their potential actions.

• Results of both planning and reacting can serve as a valuable basis for learning. The agents use the outcomes of their planning and reacting activities for improving their world model and the estimates of their actions' usefulness. More specifically, planning constitutes a basis for trial-and-error learning from hypothetical experience, while reacting at the same time constitutes a basis for trial-and-error learning from real experience.

This paper describes how the Dyna-Q framework can be extended to and applied in multiagent settings. This extension, called M-Dyna-Q, keeps to the key ideas underlying Dyna-Q, but goes beyond the single-agent setting by considering planning, reacting, and learning as processes that are jointly realized by multiple agents.

2 The M-Dyna-Q Framework

According to the M-Dyna-Q framework the overall multiagent activity results from the repeated execution of a basic working cycle consisting of two major joint activities, namely, action selection and learning. Each cycle runs either in real or hypothetical mode, where the agents synchronously switch between the two modes at a fixed and predefined rate. The real mode corresponds to (fast) "reactive behavior," whereas the hypothetical mode corresponds to (slower) "plan-based behavior." During action selection, the agents jointly decide what action should be carried out next (resulting in the next real or a new hypothetical state); this decision is made on the basis of the agents' distributed value function in the case of operating in the real mode, and on the basis of the agents' joint world model in the case of operating in the hypothetical mode. During learning the agents adjust both their world model and their value function if they act in the real mode, and just their world model if they act in the hypothetical mode. Below these two major activities are described in detail.

In the remainder, the following simple notation is used and the following elementary assumptions are made. $Ag = \{A_1, \ldots, A_n\}$ ($n \in \mathbb{N}$) denotes the finite set of agents available in the MAS under consideration. The environment in which the agents act can be described as a discrete state space, and the individual real and hypothetical states are denoted by $S, T, U, \ldots$ $Ac_i^{poss} = \{a_i^1, \ldots, a_i^{m_i}\}$ ($m_i \in \mathbb{N}$) denotes the set of all possible actions of the agent $A_i$, and is called its action potential.
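The working cycle and the notation above are described only in prose; the paper gives no code. Purely as an illustration (not code from the paper), the following Python sketch shows one way they could be organized: each agent $A_i$ carries its action potential $Ac_i^{poss}$ and its estimates $Q_i^j(S)$, and a controller alternates real and hypothetical cycles at a fixed, predefined rate, running joint action selection and learning in each cycle. The class and method names, the one-real-followed-by-k-hypothetical switching scheme, and the environment and world-model interfaces (execute, simulate) are assumptions of this sketch, not details given in the text.

    from collections import defaultdict

    class Agent:
        """One agent A_i: its action potential Ac_i^poss and its estimates Q_i^j(S)."""

        def __init__(self, name, action_potential):
            self.name = name
            self.action_potential = list(action_potential)  # {a_i^1, ..., a_i^(m_i)}
            self.q = defaultdict(float)                      # Q_i^j(S), keyed by (state, action)

        def executable_actions(self, state):
            """Ac_i^poss[S]: the actions this agent identifies as executable in state S.
            Here every action is treated as executable; a real agent would filter them
            using its own view of and knowledge about S."""
            return list(self.action_potential)

    class MDynaQCycle:
        """Skeleton of the basic working cycle: joint action selection followed by
        joint learning, with the agents switching synchronously between real and
        hypothetical mode at a fixed, predefined rate (here realized as one real
        cycle followed by a fixed number of hypothetical cycles, an assumption)."""

        def __init__(self, agents, environment, world_model, hypothetical_per_real=3):
            self.agents = agents
            self.environment = environment    # source of real experience
            self.world_model = world_model    # jointly learnt model, source of hypothetical experience
            self.hypothetical_per_real = hypothetical_per_real

        def step(self, state, mode):
            agent, action = self.select_joint_action(state, mode)
            if mode == "real":
                next_state, reward = self.environment.execute(state, action)
            else:
                next_state, reward = self.world_model.simulate(state, action)
            # what is adjusted here differs between the two modes, as described
            # in the learning discussion of the text
            self.learn(state, action, next_state, reward, mode)
            return next_state

        def run(self, state, n_cycles):
            for t in range(n_cycles):
                mode = "real" if t % (self.hypothetical_per_real + 1) == 0 else "hypothetical"
                state = self.step(state, mode)
            return state

        # The concrete joint selection and learning rules are what the paper's
        # joint action selection and learning subsections describe; they are
        # deliberately left as stubs in this sketch.
        def select_joint_action(self, state, mode):
            raise NotImplementedError

        def learn(self, state, action, next_state, reward, mode):
            raise NotImplementedError

A possible realization of select_joint_action for the real mode is sketched after the Joint Action Selection paragraph below.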
Finally, $Ac_i^{poss}[S]$ denotes the set of all actions that $A_i$ could carry out (identifies as "executable") in the environmental state $S$.

Joint Action Selection. According to M-Dyna-Q each agent $A_i$ maintains state-specific estimates of the usefulness of its actions for goal attainment. More specifically, an agent $A_i$ maintains, for every state $S$ and each of its actions $a_i^j$, a quantity $Q_i^j(S)$ that expresses its estimate of $a_i^j$'s state-specific usefulness with respect to goal attainment. Based on these estimates, action selection works as follows. If the agents operate in the "real mode", then they analyze the current real state $S$, and each agent $A_i$ identifies and announces the set $Ac_i^{poss}[S]$ of actions it could carry out immediately (assuming the availability of a standard blackboard communication structure and a time-out announcement mechanism). The identification of its potential actions can be done by each agent independently of the other agents and on the basis of its own view of and knowledge about $S$; in particular, action identification can be done concurrently by the agents. The action to be carried out is then selected among all announced actions dependent on the agents' action selection policy. A standard policy (which was also used in the experiments reported below) is that the probability of selecting an announced action $a_i^j$ is proportional to $e^{Q_i^j(S)}$ relative to the estimated usefulness of all actions announced in $S$.
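The selection policy above is stated only as a proportionality. The following minimal Python function, which reuses the illustrative Agent class from the sketch above and is likewise not taken from the paper, spells out one plausible reading of it: every agent posts the actions it identifies as executable in $S$, and one announced action $a_i^j$ is then drawn with probability proportional to $e^{Q_i^j(S)}$ over all announced actions, a Boltzmann-style choice. The function name, the flat list of announcements, and the handling of the case with no announcements are assumptions of the sketch.

    import math
    import random

    def select_among_announcements(agents, state):
        """Joint action selection for the real mode.

        Each agent announces its executable actions for state S; an announced
        action a_i^j is then drawn with probability proportional to
        exp(Q_i^j(S)) over all announced actions.  This is one plausible
        reading of the policy described in the text, not the paper's exact
        formula; the agents follow the illustrative Agent class above."""
        announcements = [(agent, action)
                         for agent in agents
                         for action in agent.executable_actions(state)]
        if not announcements:
            return None  # no agent announced an executable action for S
        weights = [math.exp(agent.q[(state, action)]) for agent, action in announcements]
        return random.choices(announcements, weights=weights, k=1)[0]

Note that each agent contributes only its own announcements and its own estimates, so the selection step needs no global knowledge of the other agents' action potentials; it operates solely on what has been announced for the current state, which matches the blackboard-plus-time-out arrangement assumed in the text.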
