University of Bonn Game Theory lecture notes GameT-12
Undergraduate Game Theory Lecture Notes
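In meso-level economic research, labor economics and finance theory both contain game-theoretic models of firms' input markets. Games arise even inside a single firm: workers scheme against one another for the same promotion, and departments compete with one another for the company's capital. At the macro level, international economics has models in which countries compete or collude with one another when choosing tariffs or other trade policies. Industrial organization applies game-theoretic methods most heavily of all (see Jean Tirole, The Theory of Industrial Organization).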
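If each of the n players chooses a strategy s_i from his own strategy set S_i, the tuple s = (s_1, ..., s_n) is called a strategy profile. The strategies of the players other than i are written s_{-i} = (s_1, s_2, ..., s_{i-1}, s_{i+1}, ..., s_n).
For example, one strategy for Tian Ji is s_TianJi = top-middle-bottom, or middle-bottom-top, and so on; S_TianJi = {top-middle-bottom, top-bottom-middle, middle-top-bottom, middle-bottom-top, bottom-top-middle, bottom-middle-top}.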
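The strategy set above is simply the set of orderings of three horses. A minimal Python sketch that enumerates it; the English grade labels and the function name are illustrative assumptions, not part of the notes:

```python
from itertools import permutations

# Horse grades, strongest first (labels are illustrative translations).
GRADES = ("top", "middle", "bottom")

def strategy_set(grades=GRADES):
    """Return every order in which the three horses can be fielded."""
    return [tuple(order) for order in permutations(grades)]

if __name__ == "__main__":
    for s in strategy_set():
        print("-".join(s))
```

Each tuple is one pure strategy, so the set has 3! = 6 elements, matching the six orderings listed.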
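... excessively high interest rates in the credit market. In addition, Akerlof applied information asymmetry to explain a range of social problems. For example, because of information asymmetry, the elderly and the self-employed cannot secure adequate coverage in the health-insurance market.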
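III. Basic Concepts
1. Players: the decision-making agents in a game, each of whom chooses actions (strategies) to maximize his own objective function (utility level, payoff function). Players may be natural persons, groups, or legal entities, such as firms, countries, regions, associations, the EU, or NATO. Passive agents that make no decisions, or that make decisions but do not directly bear the consequences, are not players and are treated only as environmental parameters; examples are kibitzers at a card or chess game, or a firm's consultants. For a player's decision, the most important thing is that there must be
Textbook, p. 5: Game theory is the theory that systematically studies the rational choices of players in all kinds of games and the resulting equilibria.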
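On "economic game theory":
Game theory studies how people choose strategies in situations where their interests affect one another; it is a theory of multi-person decision problems. Strategy choice is the core of economic behavior. Moreover, economics and game theory share the same mode of analysis: both stress individual rationality, that is, the pursuit of maximum utility under given constraints. Economics and game theory are therefore intrinsically linked, and economic game theory arose from this natural connection.
Game Theory: Complete PPT Courseware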
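2nd-order rationality: C believes that R believes that C is rational, so C eliminates R4 from R's strategy space and therefore will not choose C1;
3rd-order rationality: R believes that C believes that R believes that C is rational, so R eliminates C1 from C's strategy space and therefore will not choose R1;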
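Basic assumptions: perfect competition, perfect information
Each individual maximizes his own utility given a price parameter and his income; his utility is unrelated to other people, and the behavior of everyone else is summarized in the "price" parameter.
General equilibrium theory is the theoretical cornerstone and moral foundation of economics as a whole: the market mechanism is perfect, Pareto optimality holds, and equality and efficiency can both be achieved.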
However, these conclusions do not hold in the following situations:
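Common knowledge of rationality
Order-0 common knowledge of rationality: everyone is rational, but does not know whether the others are rational;
Order-1 common knowledge of rationality: everyone is rational and knows that the others are rational, but does not know whether the others know that he is rational;
Order-2 common knowledge of rationality: everyone is rational, knows that the others are rational, and knows that the others know that he is rational; but does not know whether the others know that he knows that they
If you expect me to choose X, I will indeed choose X.
If the players reach an agreement in advance and, in the absence of external enforcement, each of them has an incentive to abide by it, then that agreement is a Nash equilibrium.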
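Application 1: Cournot's duopoly model (1838)
Assumptions:
There are only two firms.
They face the same linear demand curve, P(Q) = a - Q, with Q = q1 + q2.
The two firms decide simultaneously.
The cost function is C(qi) = ci qi.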
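Under the stated assumptions, firm i's profit is (a - q1 - q2 - ci) qi, so its best response is qi = (a - ci - qj)/2 and the interior equilibrium is qi* = (a - 2ci + cj)/3. The following Python sketch computes both; the function names and the numbers in the usage lines are illustrative assumptions, and the closed form presumes both firms produce positive quantities:

```python
def best_response(a, c_i, q_j):
    """Quantity maximizing (a - q_i - q_j - c_i) * q_i, never negative."""
    return max(0.0, (a - c_i - q_j) / 2.0)

def cournot_equilibrium(a, c1, c2):
    """Closed-form interior Nash equilibrium quantities (both firms produce)."""
    q1 = (a - 2.0 * c1 + c2) / 3.0
    q2 = (a - 2.0 * c2 + c1) / 3.0
    return q1, q2

if __name__ == "__main__":
    a, c1, c2 = 12.0, 3.0, 3.0   # illustrative numbers, not from the notes
    q1, q2 = cournot_equilibrium(a, c1, c2)
    print(q1, q2)
    # Each quantity should be a best response to the other.
    print(best_response(a, c1, q2), best_response(a, c2, q1))
```

Checking that each equilibrium quantity reproduces itself through `best_response` is a quick way to confirm that the pair is a fixed point of the two reaction functions.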
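Dominated strategy: if a player in a game has a dominant strategy, then that player's other available strategies are called "dominated strategies".
Game Theory 12 (in English)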
Beer or Quiche
[Figure: game tree of the signaling game. Chance draws P1's type: Weak with probability 0.1, Strong with probability 0.9. P1 chooses Beer or Quiche; P2 observes the choice and then moves. Payoff pairs appearing in the figure: (1, 1), (0, 1), (3, 0), (2, 0), (0, 0), (1, 0), (2, 1), (3, 1).]
Separating candidate:
• P1: Strong->Quiche; Weak->Beer
• P2: optimal response to that is: if Beer->Fight; if Quiche->NoFight
Pooling on Quiche:
• P1: Strong->Quiche; Weak->Quiche
• P2: optimal response to that: NoFight if Quiche; Fight if Beer (so P1 - Weak does not deviate)
• WPBE: if Quiche, Pr(Weak)=0.1; Pr(Strong)=0.9; if Beer, Pr(Weak)=p; Pr(Strong)=1-p; where p>1/2
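Game Theory
Ch. 2 Static Games of Complete Information §2.1 Solving Games
2.1.1 Dominant strategies. If a player's optimal strategy does not depend on the other players' choices, that player is said to have a dominant strategy. If every strategy in a strategy profile is the corresponding player's dominant strategy, the profile is called a dominant-strategy equilibrium. In the prisoner's dilemma, (confess, confess) is a dominant-strategy equilibrium: confessing is each prisoner's dominant strategy. But not every player in every game has a dominant strategy.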
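The definition can be checked mechanically on a payoff matrix. The sketch below tests for a strictly dominant strategy in a two-player game; the function name, the data layout, and the payoff numbers (a standard prisoner's dilemma and a matching-pennies game) are illustrative assumptions:

```python
def dominant_strategy(payoffs, player):
    """Return the index of a strictly dominant strategy for `player`, or None.

    payoffs[r][c] is a (row_payoff, col_payoff) pair; player 0 picks the row,
    player 1 picks the column.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    own = range(n_rows) if player == 0 else range(n_cols)
    other = range(n_cols) if player == 0 else range(n_rows)

    def u(mine, theirs):
        cell = payoffs[mine][theirs] if player == 0 else payoffs[theirs][mine]
        return cell[player]

    for s in own:
        # s is dominant if it beats every alternative against every opposing choice.
        if all(u(s, t) > u(alt, t) for alt in own if alt != s for t in other):
            return s
    return None

# Index 0 = confess, index 1 = do not confess.
PRISONERS_DILEMMA = [[(-8, -8), (0, -10)],
                     [(-10, 0), (-1, -1)]]
# Index 0 = heads, index 1 = tails.
MATCHING_PENNIES = [[(-1, 1), (1, -1)],
                    [(1, -1), (-1, 1)]]

if __name__ == "__main__":
    print(dominant_strategy(PRISONERS_DILEMMA, 0), dominant_strategy(PRISONERS_DILEMMA, 1))
    print(dominant_strategy(MATCHING_PENNIES, 0))
```

In the prisoner's dilemma both players have index 0 (confess) as dominant strategy, while matching pennies illustrates the last sentence above: neither player has one.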
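Game Theory
Jiao Weiran (焦未然)
Contents
Static games of complete information
Static games of incomplete information
Dynamic games of complete and perfect information
Repeated games
Basic concepts
Evolutionary games
Cooperative games
Dynamic games of complete but imperfect information
Dynamic games of incomplete information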
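1.1.1 Basic description of a game: a game in the everyday sense
Basic features of such games: rules, outcomes, strategies, interdependence.
Example: a three-way duel. Each participant shoots to kill an opponent in order to survive. The hit rates and the firing order in each round are as follows.
Player   Hit rate   Order
A        30%        1
B        70%        2
C        100%       3
What is A's strategy in the first round? What does A fear most? Who has the highest chance of survival after the first round?
2.3.3 Reaction functions. For every possible decision of the other players, player i selects his own best decision. The mapping from each possible decision of the other players to this best decision is called the reaction function. The intersection of the players' reaction functions is the NE.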
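[Figure: reaction curves in the (q1, q2) plane. R1(q2): q1 = (a - b q2)/(2b); R2(q1): q2 = (a - b q1)/(2b). The axis intercepts are a/(2b) and a/b; the two curves intersect at E.]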
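As a worked instance of the statement that the intersection of the reaction functions is the NE, the two linear reaction functions q1 = (a - b q2)/(2b) and q2 = (a - b q1)/(2b) can be solved simultaneously. This is a sketch that assumes these two expressions are the intended curves R1 and R2; they correspond to inverse demand P = a - bQ with zero marginal cost, which is an assumption here rather than something stated in the notes.

```latex
\begin{align*}
q_1 &= \frac{a - b\,q_2}{2b}, \qquad q_2 = \frac{a - b\,q_1}{2b} \\
2b\,q_1 &= a - b\cdot\frac{a - b\,q_1}{2b} = \frac{a}{2} + \frac{b\,q_1}{2} \\
\frac{3b}{2}\,q_1 &= \frac{a}{2}
\;\Longrightarrow\;
q_1^{*} = \frac{a}{3b}, \qquad
q_2^{*} = \frac{a - b\,q_1^{*}}{2b} = \frac{a}{3b}
\end{align*}
```

The intersection point E therefore has both firms producing a/(3b), which lies between the two axis intercepts a/(2b) and a/b of each curve.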
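2.1.3 The underlining method
Prisoner's dilemma (payoffs listed as A, B):
                     B: confess    B: do not confess
A: confess           -8, -8        0, -10
A: do not confess    -10, 0        -1, -1
[Payoff table: matching pennies between the coin coverer and the coin guesser, each choosing heads or tails; one cell reads -1, 1.]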
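The underlining method can be sketched in code: in every column underline the row player's best payoff, in every row underline the column player's best payoff, and report the cells in which both payoffs are underlined. The function name and data layout are illustrative assumptions; the matching-pennies payoffs are the standard ones, filled in only for the example:

```python
def pure_nash_by_underlining(payoffs):
    """Return the (row, col) cells in which both payoffs would be underlined.

    payoffs[r][c] is a (row_payoff, col_payoff) pair.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    row_marks, col_marks = set(), set()
    # For each column, underline the row player's best payoff(s).
    for c in range(n_cols):
        best = max(payoffs[r][c][0] for r in range(n_rows))
        row_marks |= {(r, c) for r in range(n_rows) if payoffs[r][c][0] == best}
    # For each row, underline the column player's best payoff(s).
    for r in range(n_rows):
        best = max(payoffs[r][c][1] for c in range(n_cols))
        col_marks |= {(r, c) for c in range(n_cols) if payoffs[r][c][1] == best}
    return sorted(row_marks & col_marks)

if __name__ == "__main__":
    # Index 0 = confess, index 1 = do not confess.
    dilemma = [[(-8, -8), (0, -10)],
               [(-10, 0), (-1, -1)]]
    # Index 0 = heads, index 1 = tails (standard payoffs, assumed).
    pennies = [[(-1, 1), (1, -1)],
               [(1, -1), (-1, 1)]]
    print(pure_nash_by_underlining(dilemma))
    print(pure_nash_by_underlining(pennies))
```

For the prisoner's dilemma the only doubly underlined cell is (confess, confess); for matching pennies no cell is doubly underlined, so the method finds no pure-strategy equilibrium.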
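Game Theory (complete)
Game theory (also rendered in Chinese as 对策论 or 赛局理论) is a branch of applied mathematics that is now widely applied in biology, economics, international relations, computer science, political science, military strategy, and many other disciplines.
The book 博弈圣经 states that game theory concerns two parties in an evenly matched contest, each using the other's strategy to adjust his own counter-strategy in order to win.
It mainly studies the interactions among formalized incentive structures.
It is the mathematical theory and method for studying phenomena of a conflicting or competitive nature.
It is also an important branch of operations research.
Game theory considers the predicted and actual behavior of the individuals in a game and studies their optimizing strategies.
Superficially different interactions may exhibit similar incentive structures, in which case they are special cases of the same game.
One well-known and interesting application is the prisoner's dilemma.
Behavior of a competitive or adversarial nature is called game behavior.
In such behavior, the parties taking part in the conflict or competition have different goals or interests.
To achieve their goals and interests, the parties must consider every possible course of action open to their opponents and try to choose the course that is most favorable or most reasonable for themselves.
Everyday examples include chess and card games.
Game theory is the mathematical theory and method that studies whether the contending parties in a game have a most reasonable course of action, and how to find it.
Biologists use game theory to understand and predict certain outcomes of evolution.
For example, the concept of the "evolutionarily stable strategy", introduced by John Maynard Smith and George R. Price in a 1973 paper in Nature, relies on game theory.
See also evolutionary game theory and behavioral ecology.
Game theory is also applied in other branches of mathematics, such as probability, statistics, and linear programming.
History
Game-theoretic thinking is ancient: Sun Tzu's The Art of War is not only a military treatise but can also be regarded as the earliest monograph on game theory.
Game theory at first mainly studied winning and losing in chess, bridge, and gambling; people's grasp of game situations rested on experience and had not been developed into theory.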
Lecture Notes on Game Theory Algorithms: Sample Text
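I. Game
1. Definition: a game is a system with a decision process in which one or more strategic agents take part. It proceeds step by step, and an agent's actions often affect later actions.
2. Basic assumptions:
(1) The agents are mutually independent and have no opportunity to communicate with one another;
(2) The agents are rational when they act, that is, each seeks to maximize the benefit he can obtain.
3. Types:
(1) Adversarial game: the two sides' objectives are opposed during the decision process, each seeks to maximize its advantage over the other, and in the end neither side wins;
(2) Friendly game: the two sides' objectives are to cooperate during the decision process, for mutual gain and a jointly optimal solution.
II. Game Theory
1. Definition: game theory is the mathematical theory used to study applied problems involving games; it aims to analyze the outcomes of the interaction and confrontation among the agents in a game.
2. Components:
(1) Game model: a decision system made up of all the possible actions of a set of agents and their consequences, together with the agents' information and the resulting payoffs and their graphical representation;
(2) Decision analysis: based on the game model, analyze the different strategies used by the different players and their respective payoffs;
(3) Decision algorithm: given the state of the system, apply a series of effective decision algorithms to reach the optimal solution expected under the game model;
(4) Experimental results: the results of experiments, which through comparison and analysis demonstrate that the game model is accurate and effective.
III. Nash equilibrium.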
lecture23 (Game Theory lecture notes, Carnegie Mellon University)
June 20, 2003
73-347 Game Theory--Lecture 23
Cournot duopoly model of incomplete information (version one) cont'd
Firm 2's marginal cost depends on some factor (e.g. technology) that only firm 2 knows. Its marginal cost can be
Firm 1 chooses q1* which is its best response to firm 2's (q2*(cH), q2*(cL)) (and the probability).
If firm 2's marginal cost is HIGH then firm 2 chooses q2*(cH) which is its best response to firm 1's q1*.
If firm 2's marginal cost is LOW then firm 2 chooses q2*(cL) which is its best response to firm 1's q1*.
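The three best-response conditions above form a linear system once a demand curve and a prior are fixed. The sketch below assumes inverse demand P = a - (q1 + q2), a known marginal cost c1 for firm 1, and a probability theta that firm 2's cost is high; none of these specifics appears in the excerpt, so the function and its parameters are hypothetical illustrations:

```python
def bayesian_cournot(a, c1, c_high, c_low, theta):
    """Interior Bayesian Nash equilibrium (q1, q2_high, q2_low).

    Assumes inverse demand P = a - (q1 + q2) and Pr(cost = c_high) = theta.
    """
    expected_c2 = theta * c_high + (1.0 - theta) * c_low
    # Firm 1 best-responds to firm 2's expected quantity.
    q1 = (a - 2.0 * c1 + expected_c2) / 3.0
    # Each type of firm 2 best-responds to q1.
    q2_high = (a - c_high - q1) / 2.0
    q2_low = (a - c_low - q1) / 2.0
    return q1, q2_high, q2_low

if __name__ == "__main__":
    q1, q2h, q2l = bayesian_cournot(a=10.0, c1=2.0, c_high=3.0, c_low=1.0, theta=0.5)
    print(q1, q2h, q2l)
```

The closed form for q1 follows from substituting the two type-specific best responses into firm 1's first-order condition; the high-cost type produces less than the low-cost type, as expected.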
Static (or simultaneous-move) games of complete information
A set of players (at least two players)
For each player, a set of strategies/actions
Payoffs received by each player for the combinations of the strategies, or for each player, preferences over the combinations of the strategies
All these are common knowledge among all the players.
lecture12 (Game Theory lecture notes, Carnegie Mellon University)
June 4, 2003
73-347 Game Theory--Lecture 12
Game tree
A game tree has a set of nodes and a set of edges such that
each edge connects two nodes (these two nodes are said to be adjacent)
for any pair of nodes, there is a unique path that connects these two nodes
[Figure: a game tree with root x0; labels mark a node and a path from x0 to x4.]
Dynamic Games of Complete Information
Extensive-Form Representation Game Tree
Outline of dynamic games of complete information
Definition: extensive-form representation
The extensive-form representation of a game specifies:
the players in the game
when each player has the move
what each player can do at each of his or her opportunities to move
what each player knows at each of his or her opportunities to move
the payoff received by each player for each combination of moves that could be chosen by the players
lect12
Eco514: Game Theory
Lecture 12: Repeated Games (1)
Marciano Siniscalchi
October 26, 1999

Introduction

[By and large, I will follow OR, Chap. 8, so I will keep these notes to a minimum.]

The theory of repeated games is a double-edged sword. On one hand, it indicates how payoff profiles that are not consistent with Nash equilibrium in a simultaneous-move game might be achieved when the latter is played repeatedly, in a manner consistent with Nash or even subgame-perfect equilibrium.

On the other hand, it shows that essentially every individually rational payoff profile can be thus achieved if the game is repeated indefinitely (and a similar, but a bit more restrictive result holds for finitely repeated games). Thus, the theory has little predictive power.

[To make matters worse, the expression "repeated-game phenomenon" is often invoked to account for the occurrence of a non-equilibrium outcome in real-world strategic interactions. But, as usual, the theory refers to a clearly specified, highly stylized situation, which may or may not approximate the actual strategic interaction.]

OR emphasize the structure of the equilibria supporting these outcomes, rather than the set of attainable outcomes themselves, as the most interesting product of the theory.

Payoff Aggregation Criteria

Definition 1. Consider a normal-form game $G = (N, (A_i, u_i)_{i \in N})$. The $T$-repeated game ($T \le \infty$) induced by $G$ is the perfect-information game $\Gamma = (N, A, H, Z, P, (\succsim_i)_{i \in N})$ where:
(i) $A = \bigcup_{i \in N} A_i$ is the set of actions available to players in $G$;
(ii) $H$ is the set of sequences of length at most $T$ of elements of $\prod_{i \in N} A_i$;
(iii) $P(h) = N$;
(iv) $\succsim_i$ satisfies weak separability: for any sequence $(a^t)_{t=1}^{T}$ and profiles $a, a' \in \prod_{i \in N} A_i$, $u_i(a) \ge u_i(a')$ implies
$$(a^1, \dots, a^{t-1}, a, a^{t+1}, \dots) \succsim_i (a^1, \dots, a^{t-1}, a', a^{t+1}, \dots).$$

The definition uses preference relations instead of utility functions. This is to accommodate payoff aggregation criteria which do not admit an expected-utility representation. I will only illustrate the key features of the two "special" criteria proposed in OR, and then focus on discounting. I will mostly discuss infinitely-repeated games.

Discounting

In general, we assume that players share a common discount factor $\delta \in (0,1)$, and rank payoff streams according to the rule
$$(u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \sum_{t \ge 1} \delta^{t-1} (u_i^t - w_i^t) > 0.$$

Now, we want to be able to talk about payoff profiles of the stage game $G$ as being achieved in the repeated version of $G$. Thus, for instance, if the profile $a \in A$ is played in each repetition of the game, we want to say that the payoff profile $u(a)$ is attained. But, of course, this is not the discounted value of the stream $(u(a), u(a), \dots)$: rather, this value is $\frac{u(a)}{1-\delta}$. Thus, one usually "measures" payoffs in a repeated game in "discounted units", i.e. in terms of the discounted value of a constant payoff stream $(1, 1, \dots)$, which is of course $\frac{1}{1-\delta}$.

The bottom line is that, given the terminal history $(a^1, \dots, a^t, \dots)$, the corresponding payoff profile is taken to be $(1-\delta) \sum_{t \ge 1} \delta^{t-1} u(a^t)$.

Limit-of-Means

The key feature (OR: "the problem") of discounting is that periods are weighted differently. One might think that, in certain situations, this might be unrealistic: OR propose the example of nationalist struggles.

A criterion which treats periods symmetrically is the limit of means. Ideally, we would want the value of a stream $(u_i^t)_{t \ge 1}$ to Player $i$ to be $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i^t$. However, this limit may fail to exist [footnote 1]. Thus, we consider the following rule:
$$(u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} \frac{u_i^t - w_i^t}{T} > 0.$$

[Recall how $\liminf_n x_n$ is defined: let $y_n = \inf\{x_m : m \ge n\}$, and set $\liminf_n x_n = \lim_n y_n$. Thus, $\liminf_n x_n > 0$ iff, for some $\epsilon > 0$, $x_n > \epsilon$ for all but finitely many $n$'s.]

Footnote 1: Here's how to construct an example: call $V_i^T$ the average up to $T$. First, set $v_i^1 = 1$, so $V_i^1 = 1$. Now consider the sequence $(1,0,0)$: $V_i^3 = \frac{1}{3}$. Now extend it to $(1,0,0,1,1,1)$: $V_i^6 = \frac{2}{3}$. Now consider $(1,0,0,1,1,1,0,0,0,0,0,0)$: $V_i^{12} = \frac{1}{3}$. You see how you can construct a sequence such that the sequence of averages flips between $\frac{1}{3}$ and $\frac{2}{3}$.

Now the stream $(0,0,0,\dots,0,2,2,2,\dots)$ will always be preferred to $(1,1,\dots)$ under the limit of means, whereas, under discounting with low enough $\delta$, the reverse is true. Conversely, the limit-of-means criterion does not distinguish between $(-1,1,0,0,\dots)$ and $(0,0,\dots)$, whereas, for all $\delta \in (0,1)$, the latter stream is preferred to the former under discounting.

Note that, for any pair of streams, there exists a discount factor such that the ordering of those two streams according to the limit-of-means and discounting criteria is the same (but you need to pick a different $\delta$ for every pair of streams!).

Overtaking

The limit-of-means criterion completely disregards finite histories. Thus, for instance, it does not distinguish between $(-1,2,0,0,\dots)$ and $(-1,1,0,0,\dots)$. This might seem a bit extreme. Thus, consider the following variant:
$$(u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} (u_i^t - w_i^t) > 0.$$

Then, according to the overtaking criterion, a stream is preferred to another if it eventually yields a higher cumulative payoff. In particular, $(-1,2,0,0,\dots)$ is preferred to $(-1,1,0,0,\dots)$.

In some sense, the overtaking criterion is "the best of both worlds": it treats all stage games symmetrically, and it does give some weight to finite histories. Having said that, we shall forget about any criterion other than discounting!

Machines

Let me just remind you of the definition from OR. We do not really need them in this lecture.

Definition 2. Fix a normal-form game $G$. A machine for Player $i \in N$ is a tuple $M_i = (Q_i, q_i^0, f_i, \tau_i)$, where:
(i) $Q_i$ is a finite set (whose elements should be thought of as labels);
(ii) $q_i^0$ is the initial state of the machine;
(iii) $f_i : Q_i \to A_i$ is the action function: it specifies what Player $i$ does at each state; and
(iv) $\tau_i : Q_i \times A \to Q_i$ is the transition function: if action $a \in A$ is played and Player $i$'s machine state is $q_i \in Q_i$, then at the next stage Player $i$'s machine state will be $\tau_i(q_i, a)$.

Note that every machine defines a strategy, but not conversely. This is clear: a strategy is a sort of "hyper-machine" which has as states the set of non-terminal histories; but of course there are infinitely many such histories, so we cannot "shoehorn" strategies into our definition of machines.

Strategies implemented by machines are Markovian; informally, they are "simple" strategies.

Nash Folk Theorem for the Discounting Criterion

Definition 3. Fix a normal-form game $G$. The minmax payoff for player $i$, $v_i$, is defined by
$$v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i}).$$

Definition 4. Fix a normal-form game $G$. A payoff profile $u \in \mathbb{R}^N$ is feasible if $u = \sum_{a \in A} \frac{\beta_a}{\gamma} u(a)$ for some set of rational coefficients $(\gamma, (\beta_a)_{a \in A})$ with $\sum_{a \in A} \beta_a = \gamma$. It is enforceable (resp. strictly enforceable) if $u_i \ge v_i$ (resp. $u_i > v_i$) for all $i \in N$.

It should be clear that no payoff profile $w$ such that $w_i < v_i$ for some player $i \in N$ can be achieved in a Nash equilibrium of the infinite repetition of $G$:

Proposition 0.1 [OR 144.1]. If $s$ is a Nash equilibrium of the infinitely repeated game $\Gamma$, then the resulting payoff profile is enforceable.

The proof is in OR. In particular, in a two-player game, if the machine $M_2 = (Q_2, f_2, q_2^0, \tau_2)$ implements a strategy for Player 2, then the machine $M_1 = (Q_2, f_1, q_2^0, \tau_2)$ with $f_1(q_2) \in r_1(f_2(q_2))$ yields at least $v_1$ to Player 1. This is OR Exercise [144.2].

The main result in this lecture is known as the Nash Folk Theorem with Discounting:

Proposition 0.2 [OR 145.2]. For any feasible, strictly enforceable payoff profile $w$, and for every $\epsilon > 0$, there exists $\underline{\delta} < 1$ and a feasible, strictly enforceable payoff profile $w'$ such that $|w - w'| < \epsilon$ and, for $\delta > \underline{\delta}$, $w'$ is a Nash equilibrium payoff profile of $\Gamma$.

The intuition is easiest to grasp if $w = u(a)$ for some $a \in \prod_{i \in N} A_i$: for each player, we construct a strategy which conforms to $a$ as long as no other player has deviated, and plays an action which minmaxes the first deviator for every history following a deviation. For high enough discount factor, the benefit from a single-stage deviation is outweighed by the fact that the deviator receives his minmax payoff forever after.

For arbitrary feasible payoffs, we must use cycles. If the payoff aggregation criterion was limit-of-means, we would be home free, because (1) the value of a finite stream in which each $u_i(a)$ is repeated exactly $\beta_a$ times is precisely $\sum_{a \in A} \frac{\beta_a}{\gamma} u_i(a) = w_i$; and (2) the value of an infinite repetition of the finite stream just defined is again $w$. This is OR, Theorem 144.3.

However, for discounting we need to do some extra bookkeeping. The value of the finite stream defined in (1) is not exactly $w$, but it will be close to $w$ for high enough $\delta$: this will be our $w'$. The trick is then to measure time in terms of $\gamma$-cycles, redefining the discount factor appropriately.

Proof: Write $\prod_{i \in N} A_i = \{a(1), \dots, a(K)\}$, where $K = |\prod_{i \in N} A_i|$. Let $\sigma$ be a sequence consisting of $\beta_{a(1)}$ repetitions of $a(1)$, followed by $\beta_{a(2)}$ repetitions of $a(2)$, ..., and finally by $\beta_{a(K)}$ repetitions of $a(K)$. Define
$$\tilde{w}_i(\delta) = \sum_{k=1}^{K} \sum_{\ell=1}^{\beta_{a(k)}} \delta^{\sum_{k'=1}^{k-1} \beta_{a(k')} + \ell - 1} \, u_i(a(k)),$$
i.e. the discounted value of the stream of payoffs for Player $i$ generated by the finite sequence $\sigma$ [footnote 2]. Essentially, the $\ell$-th repetition of $u_i(a(1))$ is discounted by $\delta^{\ell-1}$; the $\ell$-th repetition of $u_i(a(2))$ is discounted by $\delta^{\beta_{a(1)} + \ell - 1}$, because the $u_i(a(2))$'s follow $\beta_{a(1)}$ repetitions of $u_i(a(1))$; and so on.

Footnote 2: We could significantly simplify the exponent of $\delta$, but I choose to be explicit so you see what is really going on.

Now observe that the value of the payoff stream generated by a history $h = (\sigma, \sigma)$ is $\tilde{w}_i(\delta) + \delta^{\gamma} \tilde{w}_i(\delta)$; thus, the value of the payoff stream along the infinite sequence $z = (\sigma, \sigma, \dots)$ is $\frac{1}{1-\delta^{\gamma}} \tilde{w}_i(\delta)$; recall that we measure values in terms of the discounted value of a constant stream $(1, 1, \dots)$, so we need to multiply the discounted sum by $(1-\delta)$. Denote the latter quantity by $w'_i(\delta)$.

Repeat this construction for every $i \in N$ to obtain a profile $w'(\delta)$. Clearly, for every $\epsilon > 0$, there exists $\delta_1$ such that $\delta > \delta_1$ implies that $|w - w'(\delta)| < \epsilon$.

We now construct a strategy $s_i$ for each player $i \in N$. To deter deviations, fix, for each player $j \in N$, a minmaxing action profile
$$p_{-j} \in \arg\min_{a_{-j} \in A_{-j}} \max_{a_j \in A_j} u_j(a_j, a_{-j}).$$
For every $i \ne j$, denote by $p_{-j,i}$ the $i$-th component of $p_{-j}$.

Consider a player $i \in N$. For each finite history $h$ which consists of finitely many repetitions of $\sigma$, followed by an initial subsequence of $\sigma$, Player $i$ plays the action she is supposed to play next according to $\sigma$. These actions determine the equilibrium path.

Now consider a finite history $h$ which is off the equilibrium path. Then there exists a subhistory $h'$ of $h$ such that $h'$ is on the equilibrium path (i.e. players conform to $\sigma$) but there exists at least one player $j \ne i$ such that $s_j(h')$ is not according to $\sigma$ (if there are multiple deviators, choose the one with the lowest index). Then, at $h$, Player $i$ plays $p_{-j,i}$.

Finally, at any history $h$ at which Player $i$ must be punished (i.e. there exists a subhistory $h'$ of $h$ on the equilibrium path such that $i$ is the lowest-indexed deviator), choose $s_i(h) \in r_i(p_{-i})$ [footnote 3].

Footnote 3: Observe that, as soon as a first deviation occurs, the lowest-indexed deviator is punished at each continuation history; in particular, if some player later fails to punish the deviator, nothing happens: the prescribed strategy is still to minmax the original deviator.

Finally, to see that no player has an incentive to deviate from $s$, fix any history $h$ on the equilibrium path. If Player $i$ deviates at $h$, she obtains
$$\max_{a_i \in A_i} u_i(a_i, s_{-i}(h)) + \frac{\delta}{1-\delta} v_i,$$
because her opponents will minmax her, and she will best-respond to $p_{-i}$ [footnote 4]. Clearly, there exists $\delta_2$ such that $\delta > \delta_2$ implies that the deviation is not profitable at $h$. And, since $h$ is on the equilibrium path, it follows that the deviation is not profitable ex-ante. Thus, for $\delta > \underline{\delta} = \max(\delta_1, \delta_2)$, $w'(\delta)$ is the Nash equilibrium payoff profile, and $|w'(\delta) - w| < \epsilon$, as required.

Footnote 4: Note the role of the equilibrium assumption: in principle, Player $i$ could hope that a lower-indexed player $j$ also deviates at $h$, but in equilibrium this does not happen. Hence, she expects to be the only deviator at $h$.

You can see that the specification of the strategies is rather awkward. Machines greatly simplify the task, as I will ask you to verify in the next problem set.
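Two of the payoff-aggregation ideas in these notes are easy to check numerically: the normalized discounted value $(1-\delta)\sum_t \delta^{t-1} u^t$ and the running averages used in the limit-of-means discussion. The sketch below works on finite truncations of streams, which is an approximation chosen for illustration; the function names are assumptions:

```python
def discounted_value(stream, delta):
    """Normalized discounted value (1 - delta) * sum_t delta**(t-1) * u_t.

    `stream` is a finite list, so for an infinite stream this is a truncation.
    """
    return (1.0 - delta) * sum(delta ** t * u for t, u in enumerate(stream))

def running_means(stream):
    """Average payoff up to each period T = 1, 2, ..., len(stream)."""
    total, means = 0.0, []
    for t, u in enumerate(stream, start=1):
        total += u
        means.append(total / t)
    return means

if __name__ == "__main__":
    # A constant stream of 1s is worth about 1 in "discounted units".
    print(discounted_value([1.0] * 2000, 0.9))
    # The 12-period stream from the footnote on non-existence of the limit.
    means = running_means([1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0])
    print(means[2], means[5], means[11])
```

The averages at T = 3, 6, 12 alternate between 1/3 and 2/3, which is the oscillation the footnote describes, while the normalization by (1 - delta) makes a constant stream of u worth u.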
Game Theory: Selected Lecture Notes
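➢ Supermarkets on a long street (the beach location model)
[Figure: the street drawn as the interval from 0 to 1, with marks at 1/4 (A'), 1/2 (O') and 3/4.]
✓ A waste of resources, or the inevitable result of rationality?
✓ Other similar situations: travel agencies' popular routes; prime-time television programs; presidential campaigns.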
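Game Theory 2009
Open and aboveboard, fair and selfless
➢ The stag hunt and investment
The hunt: two hunters have surrounded a stag, each guarding one of the two passes; if they work together they are sure to catch it and share the prey equally. A group of rabbits then runs by. Either hunter who goes after a rabbit is sure to catch one, but the stag will escape.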
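1. Game phenomena
➢ Tian Ji's horse race: the right strategy can turn defeat into victory.
➢ The prisoner's dilemma:
[Payoff table between players A and B]
Rational people are self-interested; rational choice is not globally optimal.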
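➢ Economic cooperation:
[Payoff table between players A and B]
The value of good faith; the tit-for-tat strategy; lessons for the environment in which humanity lives.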
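If the two write the same amount, both are deemed to be telling the truth and are compensated at the amount written. If they write different amounts, the one who wrote less is deemed to be telling the truth, and both are compensated at that lower price. In addition, the truthful traveler is rewarded 2 yuan and the lying traveler is fined 2 yuan.
Under the principle of rationality, what price will they write?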
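2. The concept of a game
➢ What is a game:
A decision problem among individuals or groups who are at once interdependent and opposed, cooperating and in conflict.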
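∴ Player I's optimal mixed strategy is to play strategies (1, 2) with probabilities (1/4, 3/4).
Similarly, Player II's optimal mixed strategy is to play strategies (1, 2) with probabilities (1/2, 1/2).
G = 8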
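Game Theory Lecture Notes [Selected]
Chapter 6 Dynamic Games of Incomplete Information: Perfect Bayesian Nash Equilibrium
I. Perfect Bayesian Nash equilibrium
Basic idea; Bayes' rule; perfect Bayesian Nash equilibrium; perfect Bayesian equilibrium in games of imperfect information
II. Signaling games and application examples
III. Brief summary of game-theory concepts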
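Mental gymnastics:
Zhang and Li both have ample reasoning ability. One day they are taking a reasoning interview. They know that the desk drawer contains the following 16 playing cards:
King Solomon's judgment
Two women quarreling over a child came before King Solomon. One woman said: "Your Majesty, this woman and I live in the same room. I gave birth to a child, and three days later she also gave birth; there was no one else in the room. During the night this woman lay on her child in her sleep and it died. She woke at midnight and, while I slept, took my child away and laid her dead child in my arms. At dawn, when I went to nurse, I found that the child in my arms was dead, and when I looked closely it was not the child I had borne." The other woman quickly said: "No, the living child is mine, and the dead child is hers." They argued without end.
Adverse selection in the used-car market arises from the information asymmetry between buyers and sellers.
Under complete information, the equilibrium price is P = 6000 (high quality) or P = 2000 (low quality).
The buyer does not know a car's true quality. If both types of car enter the market, the average quality is Eθ = 4000, so the highest price a buyer is willing to pay is P = 4000.
→ Sellers of high-quality cars withdraw from the market; only sellers of low-quality cars, θ = 2000, are willing to sell.
→ Buyers know that the high-quality cars have withdrawn and that only low-quality sellers remain, so the highest price they are willing to pay is P = 2000.
→ The unique equilibrium price is P = 2000; only low-quality cars are traded, and high-quality cars leave the market. If instead quality θ is continuously distributed on [2000, 6000], what is the equilibrium outcome?
High-quality cars leave the market and low-quality used cars flood it, so buyers end up with low-quality cars. This phenomenon is adverse selection.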
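The unraveling argument in the used-car example can be traced step by step. The sketch below assumes that a seller withdraws whenever the price is below his car's quality θ and that buyers pay exactly the average quality of the cars still on offer; both behavioral rules, the equal shares of the two types, and the function names are illustrative assumptions. The second function applies the same logic to quality spread uniformly over [2000, 6000]:

```python
def unravel_two_types(low=2000.0, high=6000.0, share_high=0.5):
    """Price path when buyers pay the average quality of cars still offered."""
    price = share_high * high + (1.0 - share_high) * low   # both types offered
    path = [price]
    if price < high:              # high-quality sellers refuse and withdraw
        price = low               # only low-quality cars remain
        path.append(price)
    return path

def unravel_uniform(low=2000.0, high=6000.0, rounds=50):
    """Uniform quality on [low, high]: sellers with theta <= price stay."""
    price = (low + high) / 2.0
    path = [price]
    for _ in range(rounds):
        # Mean quality of the cars whose sellers still accept the current price.
        price = (low + min(price, high)) / 2.0
        path.append(price)
    return path

if __name__ == "__main__":
    print(unravel_two_types())
    print(unravel_uniform()[:5], unravel_uniform()[-1])
```

In the two-type case the price falls from 4000 to 2000 in one step; with uniform quality the price halves its distance to 2000 every round, so under these assumptions the market converges to the lowest quality.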
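Introduction to Game Theory lecture notes, slides12
Infinitely repeated games with observable actions
• T = {0, 1, 2, …, t, …}
• G = "stage game" = a finite game
• At each date t in T, the game G is played, and all players know all actions taken before t;
• Total payoff = discounted sum of the stage payoffs.
• This game is called G(T).
Definitions
The present value of a given payoff stream π = [...] is [...]
The average value of a given payoff stream π is [...]
The present value at date t of a given payoff stream π is [...]
A history of the game is a sequence of previously observed strategy choices, e.g.: [...]
Multiple equilibria
s* =
• At t = 0, play (B,M)
• At t = 1, if (B,M) was played at t = 0, play (C,R); otherwise, play (A,L)
Do you know the strategies on the subgame-perfect equilibrium path?
• (B,M) (B,M): no
• (B,M) (A,L): no
• (B,L) (C,R): yes
• (C,L) (C,R): no
• Let T = {0,1,2}: (C,L) (B,M) (C,R): yes
What happens if there are N entrants?
Goliath Software (GS) and the newcomer
Newcomer
What happens if there are N newcomers?
The twice-repeated and many-times-repeated prisoner's dilemma
• Dates T = {0, 1};
• At each date the prisoner's dilemma is played:
• At the beginning of date 1 all players can observe the strategies played at date 0. Total payoff is the sum of the stage payoffs.
The twice-repeated prisoner's dilemma
What happens if T = {0, 1, 2, …, n}?
Lecture 12: Repeated Games I
14.12 Game Theory, Muhamet Yildiz
Road map
1. Quiz
2. Finitely repeated games
1) Entry deterrence game / chain-store paradox
2) Repeated prisoner's dilemma
3) A general result
4) Multiple equilibria
3. Infinitely repeated games with observable actions
1) Discounting / present value
2) Single-deviation principle
3) Examples
Entry deterrence game
[Figure labels: (Enter), (Accommodate), (Fight)]
The twice-repeated entry deterrence game
A general result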
lecture12_13
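14.12 Game Theory Lecture Notes (Lectures 12-13), Muhamet Yildiz
1 Repeated games
In these notes we discuss repeated games, that is, games in which a particular smaller game is played repeatedly; this smaller game is called the stage game.
The stage game is repeated regardless of what the players chose in earlier rounds.
Our analysis focuses on whether the game is repeated finitely or infinitely many times, and on whether players can observe what every player chose in earlier rounds.
1.1 Finitely repeated games with observable actions
We first consider games in which the stage game is repeated finitely many times and, at the start of each repetition, every player recalls what every player chose in each previous round.
Consider the following entry deterrence game, in which the entrant (1) decides whether to enter a market (Enter) and, after entry, the incumbent (2) decides whether to fight (Fight) or accommodate (Acc.) the entrant.
Suppose this entry deterrence game is repeated twice and all previous actions are observable.
Assume each player simply cares about the sum of his payoffs across the stage games.
The game is shown in the figure below.
Note that after each outcome of the first round the entry deterrence game is played again, with the first-round payoffs added to every subsequent outcome.
Since a player's preferences over lotteries do not change when we add a constant to his utility function, each of the three games played on the second "day" is identical to the stage game (that is, the entry deterrence game above).
The stage game has a unique subgame-perfect equilibrium, in which the incumbent accommodates the entrant and, anticipating this, the entrant enters the market.
In that case, each of the three games played on the second day has this equilibrium as its only subgame-perfect equilibrium.
This is shown in the figure below.
Using backward induction, we reduce the game as follows.
Note that we have simply added the payoff of 1 from the unique subgame-perfect equilibrium of the second day to each payoff of the stage game.
Again, since adding a constant to a player's payoffs does not change the game, the reduced game has the stage game's subgame-perfect equilibrium as its unique subgame-perfect equilibrium.
The unique subgame-perfect equilibrium is therefore as shown in the figure below.
This generalizes.
In other words, for any finitely repeated game with observable actions, if the stage game has a unique subgame-perfect equilibrium, then the repeated game also has a unique subgame-perfect equilibrium, in which the players play the stage game's subgame-perfect equilibrium on every day.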
lecture20 (Game Theory lecture notes, Carnegie Mellon University)
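June 17, 2003
73-347 Game Theory--Lecture 20
Informal game tree
[Figure: informal game tree in which Player 1 chooses between L1 and R1 and Player 2 chooses between L2 and R2; payoff labels visible include (1, 1), (5, 0) and (0, 5).]
                    Player 2
                L2       M2       R2
Player 1  L1    1 , 1    5 , 0    0 , 0
          M1    0 , 5    4 , 4    0 , 0
          R1    0 , 0    0 , 0    3 , 3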
Finitely repeated game
A finitely repeated game is a dynamic game of complete information in which a (simultaneous-move) game is played a finite number of times, and the previous plays are observed before the next play.
The finitely repeated game has a unique subgame perfect Nash equilibrium if the stage game (the simultaneous-move game) has a unique Nash equilibrium. The Nash equilibrium of the stage game is played in every stage.
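Selected Mathematical Modeling Lecture Slides: Game Theory
The prisoner's dilemma can be used to explain many phenomena.
• Advertising wars
Two companies compete, and their advertising affects each other: if one company's advertising is better received by customers, it takes part of the other's revenue. If both release advertising of similar quality at the same time, revenue rises very little but costs go up. Yet if a company does not improve its advertising, its business will be taken by the other.
The two companies have two options:
Reach an agreement with each other to reduce advertising spending. (Cooperate)
Definition of Nash equilibrium
• Put simply, a Nash equilibrium is a strategy profile in which every player faces the following situation: given that the others do not change their strategies, his current strategy is the best one. That is, if he changed his strategy, his payoff would fall. At a Nash equilibrium, no rational player has an incentive to change his strategy unilaterally.
• A method for finding Nash equilibria: underlining the conditional best strategies
Suppose two thieves, A and B, commit a crime together, break into a private home, and are caught by the police. The police hold them in separate rooms for interrogation and offer each suspect the following policy. If both suspects confess and hand over the stolen goods, the evidence is conclusive and both are found guilty and sentenced to 8 years each. If only one suspect confesses while the other denies the crime, the one who denies gets an additional 2 years for obstructing official business (since evidence already shows he is guilty), while the one who confesses is rewarded with an 8-year reduction and released immediately. If both deny the crime, the police cannot convict them of theft for lack of evidence, but can sentence each to 1 year in prison for breaking into a private home.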
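-3y + 2(1-y) = 2y + (-1)*(1-y)
Solving gives:
y = 3/8,
and the beauty's expected gain per round is 2(1-y) - 3y = 1/8 yuan.
These results show that when both sides play optimally, the beauty wins 1/8 yuan per round on average. In fact, as long as the beauty plays the mix (3/8, 5/8), you cannot change the outcome whatever plan you adopt.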
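The indifference equation above can be solved with exact fractions. The sketch below takes the coefficients of y and (1 - y) on each side of the equation as inputs; the function names are assumptions introduced for illustration:

```python
from fractions import Fraction

def indifference_probability(a_heads, a_tails, b_heads, b_tails):
    """Solve a_heads*y + a_tails*(1-y) == b_heads*y + b_tails*(1-y) for y."""
    numerator = Fraction(b_tails - a_tails)
    denominator = Fraction((a_heads - a_tails) - (b_heads - b_tails))
    return numerator / denominator

def expected_value(coef_y, coef_rest, y):
    """Value of coef_y*y + coef_rest*(1-y)."""
    return coef_y * y + coef_rest * (1 - y)

if __name__ == "__main__":
    # Left side: -3y + 2(1-y); right side: 2y + (-1)(1-y).
    y = indifference_probability(-3, 2, 2, -1)
    print(y, expected_value(-3, 2, y), expected_value(2, -1, y))
```

Both sides of the equation evaluate to the same number at the solution, which is why the opponent cannot change the average result by switching plans.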
The Optimality of Auctions
• A seller sells an object whose value to him is zero; he faces two buyers.
• The seller does not know the value of the object to the buyers.
• Each of the buyers has the valuation 3 or 4 with probability p, 1 - p (resp.).
• The seller wishes to design a mechanism that will yield the highest possible expected payoff.

An example: the results so far
Mechanism                     Seller's expected payoff
First best                    4 - p^2
Posting price 3               3
Posting price 4               4(1 - p^2)
1st price auction             ?
2nd price auction             3 + (1 - p)^2
Modified 2nd price auction    4 - p

1st price auction
We look for an equilibrium in which H mixes with probability distribution F(·) on [3, K]. Player L will bid 3. It is not optimal for H to play a pure strategy.
When bidding b he expects to earn: (4 - b)[p + (1 - p)F(b)]
This should be constant over the support, with F(3) = 0, F(K) = 1:
(4 - b)[p + (1 - p)F(b)] = Const. = (4 - 3)[p + (1 - p)F(3)] = p
F(b) = [p/(1 - p)] · (b - 3)/(4 - b), K = 4 - p
The expected payoff to the seller is 3 + (1 - p)^2 (same as the 2nd price auction); see ex. 11.10.39 in Binmore, p. 564.

Mechanism                              Seller's expected payoff
First best                             4 - p^2
Posting price 3                        3
1st price auction, 2nd price auction   3 + (1 - p)^2
Modified 2nd price auction             4 - p
Posting price 4                        4(1 - p^2)
Best so far                            max{4 - p, 4(1 - p^2)}

Is there a better mechanism, one that yields a higher payoff to the seller?

A general mechanism G
[Figure: each player has types H and L with valuations 4 and 3. In an equilibrium of G, player 1 uses strategies s_H, s_L and player 2 uses strategies t_H, t_L.]
In the direct mechanism, the strategy set of player 1 is {H, L} and the strategy set of player 2 is {H, L}. When the players choose the strategies X, Y, the outcome is G(s_X, t_Y).
Truth telling is an equilibrium of the direct mechanism. The payoff in this equilibrium is the same as the payoff of the equilibrium of G.

The Revelation Principle
For every mechanism G and an equilibrium E of it, there is a direct mechanism D (the strategy set of each player is his set of types) in which truth telling is an equilibrium. The truth-telling equilibrium implements the outcome of E.

A general direct mechanism D
Each player is allowed to announce H or L. The mechanism D(·,·), with outcomes D(H,H), D(H,L), D(L,H), D(L,L), describes the probability of obtaining the object and the expected payment.
The mechanism can be described by 4 numbers: h, l, H, L. Assuming the other player tells the truth, a player who announces H obtains the object with probability h and makes expected payment H; a player who announces L obtains it with probability l and makes expected payment L.
We are interested in describing a truth-telling equilibrium.
The seller chooses h, l, H, L to maximize his expected profits:
max 2[(1 - p)H + pL]
subject to:
Individual rationality (participation) constraints:
4h - H ≥ 0 (IR_H)
3l - L ≥ 0 (IR_L)
Incentive compatibility constraints:
4h - H ≥ 4l - L (IC_H)
3l - L ≥ 3h - H (IC_L)

IR_L must be an equality. Assume we are at the optimal h, l, H, L and suppose that 3l - L > 0. Then by IC_H: 4h - H ≥ 4l - L ≥ 3l - L > 0, so 4h - H > 0. Now, increasing both H and L by the same small constant will keep IR_H, IR_L and will not change IC_H, IC_L. An increase in H and L improves the seller's payoff. However, this means that we could not have been at a maximum of the objective function.

IC_H must be an equality. Suppose IC_H were strict. Then by IC_H and IR_L: 4h - H > 4l - L ≥ 3l - L = 0, so 4h - H > 0 (IR_H is a strict inequality). Now, increasing H by a small constant will keep IC_H, IR_H and will not violate IR_L, IC_L. An increase in H improves the seller's payoff. However, this means that we could not have been at a maximum of the objective function.

The problem becomes:
max 2[(1 - p)H + pL]
s.t.
4h - H ≥ 0 (IR_H)
3l - L = 0 (IR_L)
4h - H = 4l - L (IC_H)
3l - L ≥ 3h - H (IC_L)
Moreover, if IC_H and IR_L are equalities, then IR_H is satisfied. And if h ≥ l then also IC_L is satisfied:
4h - H = 4l - L ≥ 3l - L = 0, so IR_H is satisfied.
4h - H = 4l - L implies 3h - H = 4l - L - h = 3l - L + l - h ≤ 3l - L, so IC_L is satisfied.
We therefore need to consider only IC_H and IR_L (and ensure that the outcome satisfies h ≥ l):
max 2[(1 - p)H + pL]
s.t. 3l - L = 0 (IR_L), 4h - H = 4l - L (IC_H)
so that L = 3l and H = 4h - l, and the problem is
max over h, l of 2[(1 - p)(4h - l) + p(3l)] = max over h, l of 2[4h(1 - p) + l(4p - 1)],
which is equivalent to maximizing h(1 - p) + l(p - ¼).

But symmetry imposes additional constraints.
The probability that a given player wins is ≤ ½: (1 - p)h + pl ≤ ½
The winning probabilities h, l cannot be too large:
h ≤ p + ½(1 - p) = ½(1 + p)
l ≤ 1 - p + ½p = 1 - ½p
[Figure: the feasible region in the (h, l) plane, bounded by h ≤ ½(1 + p), l ≤ 1 - ½p and the line (1 - p)h + pl = ½, which has slope -(1 - p)/p.]
For p < ¼ the slope of h(1 - p) + l(p - ¼) = Const is (1 - p)/(¼ - p) > 0, and the maximum is at l = 0, h = ½(1 + p).
For p > ¼ the slope of h(1 - p) + l(p - ¼) = Const is -(1 - p)/(p - ¼) < -(1 - p)/p, and the maximum is at l = ½p, h = ½(1 + p).

The case p > ¼: l = ½p, h = ½(1 + p)
What is the seller's payoff at this point?
H = 4h - l = 2 + 3p/2
L = 3l = 3p/2
2[H(1 - p) + Lp] = 4 - p
Interpretation:
L wins against L with prob. ½, so l = ½p, and pays 3 when he wins (expected payment 3p/2).
H wins against L, and against H with prob. ½, so h = p + ½(1 - p) = ½(1 + p).
He pays 4 when he wins against H, and 3½ against L.
Indeed, he could pay x when he wins against H, and y against L, with ½(1 - p)x + py = 2 + 3p/2.

The case p < ¼: l = 0, h = ½(1 + p)
What is the seller's payoff at this point?
H = 4h - l = 2(1 + p)
L = 3l = 0
2[H(1 - p) + Lp] = 4(1 - p^2)
L never wins. H wins against L, and against H with prob. ½: p + ½(1 - p) = ½(1 + p).
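The revenue formulas and the corner solution of the seller's problem can be tabulated for any p. The sketch below only evaluates the closed forms derived in these slides (it does not solve the linear program numerically); the function names and dictionary keys are illustrative assumptions:

```python
def seller_revenues(p):
    """Expected seller revenue of each mechanism when P(valuation = 3) = p."""
    return {
        "first best": 4 - p ** 2,
        "posting price 3": 3.0,
        "posting price 4": 4 * (1 - p ** 2),
        "1st/2nd price auction": 3 + (1 - p) ** 2,
        "modified 2nd price auction": 4 - p,
    }

def optimal_direct_mechanism(p):
    """Corner solution (h, l, H, L, revenue) of the seller's problem."""
    h = 0.5 * (1 + p)                    # H-type wins with the highest feasible probability
    l = 0.5 * p if p > 0.25 else 0.0     # L-type is served only when p > 1/4
    L = 3 * l                            # IR_L binds
    H = 4 * h - l                        # IC_H binds
    revenue = 2 * ((1 - p) * H + p * L)
    return h, l, H, L, revenue

if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5):
        print(p, optimal_direct_mechanism(p)[-1], seller_revenues(p)["first best"])
```

The optimal revenue equals 4(1 - p^2) for p below 1/4 and 4 - p above it, so it coincides with the better of posting price 4 and the modified 2nd price auction, and it always stays below the first best 4 - p^2.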