Game Theory (Foreign Courseware) Lecture 11


Game Theory: Complete Lecture Slides

(Only two rows of the payoff matrix survived extraction: R3: (3,2) (0,4) (4,3) (50,1); R4: (2,93) (0,92) (0,91) (100,90).)
1st-order rationality: R believes C is rational, so R deletes C4 from C's strategy space; hence R will not choose R4;
2nd-order rationality: C believes that R believes C is rational, so C deletes R4 from R's strategy space; hence C will not choose C1;
3rd-order rationality: R believes that C believes that R believes C is rational, so R deletes C1 from C's strategy space; hence R will not choose R1;
Basic assumptions: perfect competition, perfect information.
Each individual maximizes his own utility given a price parameter and an income; one person's utility does not depend on anyone else, and everyone else's behavior is summarized in the "price" parameter.
General equilibrium theory is the theoretical cornerstone and moral foundation of economics as a whole: the market mechanism is perfect, Pareto optimality holds, and equity and efficiency can be reconciled.
However, in the following situations the above conclusion fails:
Common knowledge of rationality
0th-order: every player is rational, but does not know whether the others are rational;
1st-order: every player is rational and knows the others are rational, but does not know whether the others know that he is rational;
2nd-order: every player is rational, knows the others are rational, and knows the others know that he is rational; but does not know whether the others know that he knows they are rational.
If you expect me to choose X, then I will indeed choose X.
If the players reach an agreement beforehand and, in the absence of outside enforcement, every player has an incentive to keep it, then that agreement is a Nash equilibrium.
Application 1: Cournot's duopoly model (1838)
Assumptions:
- there are only two firms;
- both face the same linear demand curve, P(Q) = a - Q, with Q = q1 + q2;
- the two firms choose quantities simultaneously;
- the cost functions are C(qi) = ci·qi.
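Under these assumptions each firm maximizes qi·(a - q1 - q2 - ci), which gives the best response qi = (a - ci - qj)/2 and the interior Nash equilibrium qi* = (a - 2ci + cj)/3. A minimal sketch (the function names and parameter values are my own, purely illustrative):

```python
# Cournot duopoly: best responses and the closed-form Nash equilibrium.
# Demand: P(Q) = a - Q with Q = q1 + q2; costs: C_i(q_i) = c_i * q_i.

def best_response(a, c_i, q_j):
    """Maximize q_i * (a - q_i - q_j - c_i) over q_i >= 0."""
    return max(0.0, (a - c_i - q_j) / 2)

def cournot_equilibrium(a, c1, c2):
    """Interior Nash equilibrium: q_i* = (a - 2*c_i + c_j) / 3."""
    q1 = (a - 2 * c1 + c2) / 3
    q2 = (a - 2 * c2 + c1) / 3
    return q1, q2

# Symmetric example: with a = 12 and zero costs, each firm produces a/3 = 4,
# and each quantity is a best response to the other (a fixed point).
q1, q2 = cournot_equilibrium(a=12, c1=0, c2=0)
```

The equilibrium is exactly the point where the two best-response functions intersect, which the assertions below the sketch confirm numerically.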
Dominated strategy: if some player in a game has a dominant strategy, then that player's other available strategies are called "dominated strategies".

Game Theory Courseware (Foreign): Lecture


Games of complete information
Summary: all players have exactly the same information; each player knows every other player's strategies and payoffs.
Details: in a game of complete information there is no information asymmetry whatsoever. This type of game is typically used to analyze highly transparent competitive environments.
Games of incomplete information
Summary: at least one player does not fully know the other players' information and strategies.
Game theory is widely applied in economics, political science, sociology, and other fields to explain and predict agents' strategic choices and the resulting outcomes.
A brief history of game theory
In 1944, von Neumann and Morgenstern's Theory of Games and Economic Behavior marked the birth of game theory.
In the 1950s, Nash introduced the concept of Nash equilibrium, laying the foundation for the field's development.
In the 1970s, Selten introduced subgame-perfect Nash equilibrium, further enriching the theory.
The game of chicken
Summary: two fighters meet on a single-log bridge; the example highlights the strategic and psychological factors in confrontation.
Details: the game of chicken is a well-known example in which two fighters meet on a single-log bridge. Each has two choices: advance or retreat. If both advance, they collide and may be hurt; if both retreat, they stay safe but lose face. The example illustrates the strategic and psychological factors in competition, such as fear, face, and competitiveness.
The market entry game
Summary: describes the strategies and interests of a new entrant and an incumbent when deciding whether to enter a market.
Details: the market entry game is a classic example. The incumbent can either maintain its position or fight, while the new firm chooses to enter or stay out. If the entrant decides to enter, the incumbent may fight to protect its market share and profits. The example illustrates the strategic stakes of market entry and competition.

Game Theory Lecture 11: Bargaining


Lecture 11: Bargaining

Readings: Watson, Strategy: An Introduction to Game Theory, Ch. 19.2; Exercise 3.

Outline
- The ultimatum game
- A finite-horizon game with alternating offers and impatient players
- An infinite-horizon game with alternating offers and impatient players

The ultimatum game
- Players: the two players.
- Timing: player 1 proposes a division (x1, x2) of a pie, where x1 + x2 = 1. If player 2 accepts (Y), she receives x2 and player 1 receives x1; if she rejects (N), neither player receives any pie.
- Preferences: each player's payoff equals the share of the pie she receives.

Backward induction
- First consider the optimal strategy of player 2: she either accepts all offers (including x2 = 0), or accepts all offers with x2 > 0 and rejects the offer x2 = 0.
- Now consider the optimal strategy of player 1: if player 2 accepts all offers (including 0), then player 1's optimal offer is 0; if player 2 accepts all offers except zero, then no offer of player 1 is optimal.

Conclusion
- The only equilibrium of the game is the strategy pair in which player 1 offers 0 and player 2 accepts all offers. In this equilibrium, player 1's payoff is the whole pie and player 2's payoff is zero.

Extension of the ultimatum game
- Suppose that after rejecting player 1's proposal, player 2 may counter-offer a division (y1, y2).
- In this game player 1 is powerless; her proposal at the start of the game is irrelevant.
- Every subgame following player 2's rejection of a proposal is a variant of the ultimatum game in which player 2 moves first. Thus every such subgame has a unique equilibrium, in which player 2 offers nothing to player 1 and player 1 accepts all proposals.
- Hence in every equilibrium player 2 obtains all the pie.
- In the extension in which the players alternate offers over many periods, a similar result holds: in every equilibrium, the player who makes the offer in the last period obtains all the pie.

A finite-horizon game with alternating offers and impatient players

Two-period deadline (player 2's second-period payoffs are discounted by δ2). Backward induction gives a unique equilibrium:
- Player 1's initial proposal is (1 - δ2, δ2).
- Player 2 accepts every proposal in which she receives at least δ2 and rejects every proposal in which she receives less than δ2.
- Player 2 proposes (0, 1) after any history in which she has rejected a proposal of player 1.
- Player 1 accepts all proposals of player 2 at the end of the game (after a history in which player 2 rejects player 1's opening proposal).

Conclusion: the outcome of this equilibrium is that player 1 proposes (1 - δ2, δ2), which player 2 accepts. Player 1's payoff is 1 - δ2 and player 2's payoff is δ2.

Three-period deadline: this game has a unique equilibrium, with division (1 - δ2(1 - δ1), δ2(1 - δ1)).

An infinite-horizon game with alternating offers and impatient players

Definition: the bargaining game of alternating offers is the following extensive game with perfect information.
- Players: two negotiators, say 1 and 2.
- Terminal histories: every sequence of the form (x^1, N, x^2, N, ..., x^t, Y) for t ≥ 1, and every (infinite) sequence of the form (x^1, N, x^2, N, ...), where each x^r is a division of the pie (a pair of numbers that sums to 1).
- Player function: P(∅) = 1 (player 1 makes the first offer), and P(x^1, N, ..., x^t, N) = 1 if t is even, 2 if t is odd.
- Preferences: for i = 1, 2, player i's payoff to the terminal history (x^1, N, ..., x^t, Y) is δ_i^(t-1) x_i^t, where 0 < δ_i < 1, and her payoff to every (infinite) terminal history is 0.

Stationarity
- The structure of the game is stationary. This makes it reasonable to guess that the game has an equilibrium in which each player always makes the same proposal and always accepts the same set of proposals; that is, each player's strategy is stationary.
- A stationary strategy pair: player 1 always proposes x* and accepts a proposal y if and only if y1 ≥ y1*; player 2 always proposes y* and accepts a proposal x if and only if x2 ≥ x2*.

Properties of the subgame perfect equilibrium
- Efficiency: player 2 accepts player 1's first offer, so agreement is reached immediately; no resources are wasted in delay.
- Effect of changes in patience: for a given value of δ2, the value of x1*, the equilibrium payoff of player 1, increases as δ1 increases to 1. That is, fixing the patience of player 2, player 1's share increases as she becomes more patient; as player 1 becomes extremely patient (δ1 close to 1), her share approaches 1. Symmetrically, fixing the patience of player 1, player 2's share increases to 1 as she becomes more patient.
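These results can be checked numerically. The sketch below (function names are my own) solves the T-period deadline game by backward induction, where the last proposer takes everything and every earlier proposer offers the responder exactly her discounted continuation value, and compares the result with the stationary infinite-horizon division x1* = (1 - δ2)/(1 - δ1·δ2):

```python
def rubinstein_shares(d1, d2):
    """Stationary-equilibrium division when player 1 makes the first offer:
    x1* = (1 - d2) / (1 - d1*d2); player 2 receives the remainder."""
    x1 = (1 - d2) / (1 - d1 * d2)
    return x1, 1 - x1

def finite_shares(d1, d2, T):
    """Period-0 proposer's (player 1's) share in a T-period deadline game,
    solved by backward induction from the final period."""
    share = 1.0                                   # the period T-1 proposer takes all
    for t in range(T - 2, -1, -1):
        # discount factor of whoever proposes in period t+1
        d_next = d1 if (t + 1) % 2 == 0 else d2
        # offer the responder exactly her discounted continuation value
        share = 1 - d_next * share
    return share
```

With δ1 = δ2 = 1/2 the two-period game gives player 1 the share 1 - δ2 = 1/2, the three-period game gives 1 - δ2(1 - δ1), and as T grows the finite shares converge to the infinite-horizon value 2/3, matching the formulas in the text.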

Lecture: Equilibrium Play (Game Theory, Vanderbilt University)



Two main types of bidders: private value and common value.

Private Value

Dinner

What others know does not affect your willingness to pay.
Source: Photograph courtesy of Erik Dungan, http://sxc.hu (accessed August 10, 2004). Used with permission.
[Figure: U.S. spectrum allocation chart, 300 MHz to 3 GHz, with panels A through F highlighting the auctioned licenses in the 1850-1990 MHz band.]

99 licenses (corresponding to the red circles) were sold to 18 companies for a total price of $7.7 billion
Images of spectrum courtesy of U.S. Department of Commerce’s National Telecommunications and Information Administration (/, accessed 16 July 2004).
Slide courtesy of Mike Shor, Vanderbilt University.
Types of Bidders

Auctions have rules and bidders

Introductory Game Theory: University Lecture Notes (English)


Week 11: Game Theory
Required Reading: Schotter pp. 229-260

Lecture Plan
1. Static game theory: normal form games; solution techniques for solving static games (dominant strategy, Nash equilibrium)
2. Prisoner's dilemma
3. Decision analysis: maximin criterion, minimax criterion
4. Dynamic one-off games: extensive form games; the subgame-perfect Nash equilibrium

1. Static game theory
Static games: the players make their moves in isolation, without knowing what the other players have done.

1.1 Normal form games
In game theory there are two ways in which a game can be represented: first, the normal form (or strategic form) game; second, the extensive form game.
A normal form game is any game where we can identify the following three things:
1. The players.
2. The strategies available to each player.
3. The payoffs. A payoff is what a player will receive at the end of the game, contingent upon the actions of all players in the game.
Suppose that two people (A and B) are playing a simple game. A will write one of two words on a piece of paper, "Top" or "Bottom". At the same time, B will independently write "Left" or "Right". After they do this, the papers are examined and they get the payoffs depicted in Table 1.
Table 1
If A says Top and B says Left, we examine the top-left corner of the matrix. In this matrix, the payoff to A (B) is the first (second) entry in the box. For example, if A writes "Top" and B writes "Left", the payoff of A = 1 and the payoff of B = 2.
What is/are the equilibrium outcome(s) of this game?

1.2 Nash equilibrium approach to solving static games
Nash equilibrium was first defined by John Nash in 1951, building on the work of Cournot in 1838.
A pair of strategies is a Nash equilibrium if A's choice is optimal given B's choice, and B's choice is optimal given A's choice.
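This definition can be turned into a small search over a bimatrix game: a cell is a pure-strategy Nash equilibrium exactly when each player's payoff there is maximal against the other's choice. A sketch (generic illustration, not from Schotter), tested on a prisoners' dilemma with payoffs measured as negative months in prison:

```python
def pure_nash(payA, payB):
    """All pure-strategy Nash equilibria of a bimatrix game.
    payA[i][j], payB[i][j]: payoffs when row plays i and column plays j."""
    rows, cols = len(payA), len(payA[0])
    eq = []
    for i in range(rows):
        for j in range(cols):
            # i must be a best reply to j, and j a best reply to i
            best_i = all(payA[i][j] >= payA[k][j] for k in range(rows))
            best_j = all(payB[i][j] >= payB[i][l] for l in range(cols))
            if best_i and best_j:
                eq.append((i, j))
    return eq

# Prisoners' dilemma (action 0 = confess, 1 = deny; payoffs = -months):
payA = [[-3, 0], [-6, -1]]
payB = [[-3, -6], [0, -1]]
equilibria = pure_nash(payA, payB)   # the unique equilibrium is (confess, confess)
```

The same routine, fed the Table 1 payoffs once they are filled in, reproduces the underlining procedure described in the text.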
When this equilibrium outcome is reached, neither individual wants to change his behaviour.
Finding the Nash equilibrium for any game involves two stages.
1) Identify each player's optimal strategy in response to what the other players might do:
- Given B chooses Left, the optimal strategy for A is ...
- Given B chooses Right, the optimal strategy for A is ...
- Given A chooses Top, the optimal strategy for B is ...
- Given A chooses Bottom, the optimal strategy for B is ...
We show this by underlining the payoff element for each case.
2) A Nash equilibrium is identified when all players are playing their optimal strategies simultaneously.
In the case of Table 2: if A chooses Top, the best thing for B to do is choose Left, since the payoff to B from choosing Left is 1 and from choosing Right is 0. If B chooses Left, the best thing for A to do is choose Top, as A will get a payoff of 2 rather than 0. Thus if A chooses Top, B chooses Left; and if B chooses Left, A chooses Top. Therefore we have a Nash equilibrium: each person is making the optimal choice, given the other person's choice.
If the payoff matrix changes as in Table 2, then what is the Nash equilibrium?
Table 3
If the payoffs are changed as shown in Table 3, ...

2. Prisoners' dilemma
Pareto efficiency: an allocation is Pareto efficient if goods cannot be reallocated to make someone better off without making someone else worse off.
Two prisoners who were partners in a crime are questioned in separate rooms. Each prisoner can confess to the crime (implicating the other) or deny it. If only one confesses, he goes free and his partner spends 6 months in prison. If both deny, both spend 1 month in prison. If both confess, both are held for 3 months. The payoff matrix for this game is depicted in Table 4.
Table 4
The equilibrium outcome: ...

3. Decision analysis
Let N = 1 to 4 be a set of possible states of nature, and let S = 1 to 4 be the set of strategies you can choose from.
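The two decision criteria developed below in sections 3.1 and 3.2 can be sketched directly. Since the original Table 5 did not survive extraction, the matrix in the sketch is hypothetical; only its (N=1, S=1) and (N=1, S=2) entries are taken from the worked numbers in the text:

```python
def maximin(payoff):
    """Wald criterion: index of the strategy whose worst-case payoff is largest.
    payoff[s][n]: your payoff under strategy s and state of nature n."""
    return max(range(len(payoff)), key=lambda s: min(payoff[s]))

def minimax_regret(payoff):
    """Savage criterion: regret = (column maximum) - (actual payoff);
    pick the strategy with the smallest maximum regret."""
    col_max = [max(row[n] for row in payoff) for n in range(len(payoff[0]))]
    regret = [[col_max[n] - row[n] for n in range(len(row))] for row in payoff]
    return min(range(len(payoff)), key=lambda s: max(regret[s]))

# Hypothetical 4x4 payoff table standing in for the missing Table 5
# (rows = strategies S=1..4, columns = states N=1..4):
payoff = [[2, 2, 0, 1],
          [1, 1, 1, 1],
          [0, 4, 0, 0],
          [1, 3, 0, 0]]
```

On this matrix the maximin rule picks the second strategy (worst case 1, versus 0 for the others), while the minimax-regret rule picks the fourth (maximum regret 1), illustrating that the two cautious criteria need not agree.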
Now you have to decide which strategy to choose, given the following payoff matrix.
Table 5 (S = you, N = opponent)
In this case you do not care about the payoff of your opponent, i.e. nature.

3.1 The maximin decision rule (Wald criterion)
We look for the minimum payoff of each choice and then maximize the minimum payoff. Let us highlight the minimum payoff for each strategy.

3.2 The minimax decision rule (Savage criterion)
Under this rule we compute the losses, or regret matrix, from the payoff matrix. The losses are defined as the difference between the actual payoff and what that payoff would have been had the correct strategy been chosen:
Regret/loss = (maximum payoff in the column) - (actual payoff)
For example, if N=1 occurs and S=1 is chosen, the actual gain = 2 from Table 5. The best action given N=1 is also to choose S=1, which gives the best gain = 2; so for (N=1, S=1), regret = 0. If N=1 occurs but S=2 is chosen, the actual gain = 1; the best action given N=1 is still S=1 with best gain 2, so for (N=1, S=2), regret = 2 - 1 = 1.
Following the same analysis, we can compute the losses for each N and S, and hence the regret matrix.
Table 6: Regret matrix
After computing the regret matrix, we look for the maximum regret of each strategy and then take the minimum of these. Minimax is still very cautious, but less so than maximin.

4. Dynamic one-off games
A game can be dynamic for two reasons. First, players may be able to observe the actions of other players before deciding upon their optimal response.
Second, a one-off game may be repeated a number of times.

4.1 Extensive form games
Dynamic games cannot be represented by payoff matrices; we have to use a decision tree (extensive form) to represent a dynamic game.
Start with the concept of a dynamic one-off game: the game is played only once, but players can condition their optimal actions on what other players have done in the past.
Suppose that there are two firms (A and B) that are considering whether or not to enter a new market. If both firms enter the market, each makes a loss of $10 mil. If only one firm enters the market, it earns a profit of $50 mil. Suppose also that Firm B observes whether Firm A has entered the market before it decides what to do.
Any extensive form game has the following elements in common:
- Nodes: positions where a player must take a decision. The first position is called the initial node, and each node is labelled so as to identify who is making the decision.
- Branches: the alternative choices that the person faces, corresponding to the available actions.
- Payoff vectors: the payoffs for each player, listed in the order of the players. When we reach a payoff vector, the game ends.
In period 1, Firm A makes its decision. This is observed by Firm B, which decides to enter or stay out of the market in period 2. In this extensive form game, Firm B's decision nodes start the subgames; Firm B observes Firm A's action before making its own decision.

4.2 Subgame-perfect Nash equilibrium
The subgame-perfect Nash equilibrium is the predicted solution to a dynamic one-off game.
From the extensive form of this game, we can observe that there are two subgames, one starting from each of Firm B's decision nodes.
How do we identify the equilibrium outcome? We start with the last period and work backward through successive nodes until we reach the beginning of the game.
In the last period of the game there are two nodes. At each node, Firm B decides whether or not to enter the market based on what Firm A has already done.
At the node "Firm A enters", Firm B will either make a loss of $10 mil (if it enters) or receive a payoff of 0 (if it stays out); these are shown by the payoff vectors (-10, -10) and (50, 0). If Firm B is rational, it will stay out, so the node "Firm A enters" can be replaced by the vector (50, 0).
At the node "Firm A stays out", Firm A has not entered the market. Thus Firm B will either make a profit of $50 mil (if it enters) or receive a payoff of 0 (if it stays out); these are shown by the payoff vectors (0, 50) and (0, 0). If Firm B is rational, it will enter, so we can rule out the possibility that both firms stay out.
We can now move back to the initial node, where Firm A decides whether or not to enter. If Firm B is rational, the game will never reach the previously crossed-out vectors. Firm A knows that if it enters, the game will end at (A enters, B stays out), where A gets 50 and B gets 0. On the other hand, if Firm A stays out, the game will end at (A stays out, B enters), where A gets 0 and B gets 50. Firm A should therefore enter the market at the first stage. The eventual outcome is (A enters, B stays out).
How to find a subgame-perfect equilibrium of a dynamic one-off game:
1. Start with the last period of the game and cross out the irrelevant payoff vectors.
2. Replace the preceding nodes by the uncrossed payoff vectors until you reach the initial node.
3. The only uncrossed payoff vector(s) give the subgame-perfect Nash equilibrium.
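The three-step crossing-out procedure is ordinary backward induction, and the entry game above is small enough to solve mechanically. A sketch (the tree encoding is my own; payoffs are the $-millions from the text):

```python
# Extensive-form entry game as a nested dict: leaves are payoff tuples
# (Firm A, Firm B); decision nodes record which player moves there.
tree = {
    "player": 0,                                           # Firm A moves first
    "enter":    {"player": 1, "enter": (-10, -10), "stay out": (50, 0)},
    "stay out": {"player": 1, "enter": (0, 50),    "stay out": (0, 0)},
}

def solve(node):
    """Backward induction: return (payoffs, action path) of the SPNE outcome."""
    if isinstance(node, tuple):        # leaf: a payoff vector ends the game
        return node, []
    p = node["player"]
    best = None
    for action, child in node.items():
        if action == "player":
            continue
        payoffs, path = solve(child)   # value of the subgame after this action
        if best is None or payoffs[p] > best[0][p]:
            best = (payoffs, [action] + path)
    return best

outcome, path = solve(tree)            # (50, 0) via ["enter", "stay out"]
```

Running it reproduces the outcome derived above: Firm A enters, Firm B stays out, with payoffs (50, 0).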

Game Theory Courseware (Foreign): Lecture


Column s22 (played with probability 1 - q): the cells are u1(s11, s22), u2(s11, s22) and u1(s12, s22), u2(s12, s22).
Mixed strategy Nash equilibrium:
A pair of mixed strategies ((r*, 1-r*), (q*, 1-q*)) is a Nash equilibrium if (r*, 1-r*) is a best response to (q*, 1-q*), and (q*, 1-q*) is a best response to (r*, 1-r*). That is,
v1((r*, 1-r*), (q*, 1-q*)) ≥ v1((r, 1-r), (q*, 1-q*)) for all 0 ≤ r ≤ 1, and
v2((r*, 1-r*), (q*, 1-q*)) ≥ v2((r*, 1-r*), (q, 1-q)) for all 0 ≤ q ≤ 1.
Outline: introduction to games; normal-form (or strategic-form) representation; iterated elimination of strictly dominated strategies; Nash equilibrium; review of concave functions and optimization; applications of Nash equilibrium; mixed-strategy Nash equilibrium.
May 30, 2003
Mixed-strategy Nash equilibrium: two players, each with two pure strategies

Game Theory Courseware (Foreign): Lecture (10)

Game theory for complex systems
Game-theoretic research on complex systems will deepen further, exploring how game theory can be used to analyze the interactions and evolutionary dynamics within complex systems.
Combining artificial intelligence with game theory
As AI technology develops, game theory will be applied further in machine learning, data mining, and related fields, providing better decision support for artificial intelligence.
Game theory's impact on human society
Promoting social fairness and cooperation
Game-theoretic research helps reveal the interest relations and interaction mechanisms behind social phenomena, promoting fairness and cooperation and reducing conflict and friction.
Methods for computing Nash equilibria
1. Iteration: repeatedly compute each participant's optimal strategy until the process converges to a stable strategy profile.
2. Linear programming: transform the game into a linear program and find the Nash equilibrium by solving it.
3. Algebraic methods: solve the game's equilibrium conditions algebraically to find the Nash equilibrium.
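The first method, iteration, can be sketched as alternating best replies: each player in turn best-responds to the other's current choice, and if the pair stops changing it is a pure-strategy Nash equilibrium. Convergence is not guaranteed in general (matching pennies cycles forever), which is exactly why the slide lists the other two methods as alternatives:

```python
def iterate_best_response(payA, payB, start=(0, 0), rounds=20):
    """Alternating best replies in a bimatrix game; a fixed point of this
    process is a pure-strategy Nash equilibrium. payA[i][j], payB[i][j]:
    row / column payoffs when row plays i and column plays j."""
    i, j = start
    for _ in range(rounds):
        i = max(range(len(payA)), key=lambda k: payA[k][j])     # row's best reply
        j = max(range(len(payB[0])), key=lambda l: payB[i][l])  # column's best reply
    return i, j
```

Started anywhere in a prisoners' dilemma, the process settles on mutual confession; in a coordination game it stays at whichever equilibrium it starts near.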
Applications of Nash equilibrium
Economics: Nash equilibrium explains competitive behavior and price formation in markets, as well as firms' competitive strategies.
Legislative games: in the legislative process, game theory helps analyze the maneuvering of the various interest groups, predicting whether a bill will pass and the final legislative outcome. This helps interest groups design their best lobbying and legislative strategies.
Social issues
Environmental protection games: in environmental protection, game theory can be used to analyze the interactions among government, firms, and individuals. By building a game model, one can predict each side's behavioral responses, providing a basis for the government to design effective environmental policy.
Game Theory Courseware (Foreign): Lecture (10)
Contents
- Introduction to game theory
- Basic concepts
- Types of games
- Nash equilibrium
- Practical applications
- Conclusions and outlook

Introduction to game theory
Definition: game theory studies how decision-makers choose actions to maximize their own utility under a given information structure, and how the decisions of different decision-makers interact in equilibrium.

Game Theory: Complete Explanatory Courseware

Zero-sum games and non-zero-sum games (zero-sum game and non-zero-sum game)
- If, in every possible play of a game, the players' payoffs always sum to zero, the game is called a zero-sum game;
- conversely, if the players' payoffs do not always sum to zero in every play, the game is a non-zero-sum game.
- Zero-sum games are the games with the highest degree of conflict of interest.
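The definition is a simple cell-by-cell check on the bimatrix, which can be sketched as follows (matrices here are illustrative: matching pennies is zero-sum, the prisoners' dilemma is not):

```python
def is_zero_sum(payA, payB):
    """True iff the two players' payoffs sum to zero in every outcome."""
    return all(a + b == 0
               for row_a, row_b in zip(payA, payB)
               for a, b in zip(row_a, row_b))

matching_pennies_A = [[1, -1], [-1, 1]]
matching_pennies_B = [[-1, 1], [1, -1]]   # exact mirror: pure conflict
```

One player's gain is exactly the other's loss in every cell, which is the formal sense in which zero-sum games are maximally adversarial.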
- Even if decisions or actions come in sequence, as long as each player, at the moment of deciding, does not yet know the opponents' decisions or actions, the game still counts as a static game.
Games of complete information and games of incomplete information (games of complete information and games of incomplete information)
- Classified according to whether everyone knows each player's payoffs in every possible play of the game.
... provide a fool-proof way to respond.
Example 1: The GPA Rat Race
- A course you are enrolled in is graded on a curve: regardless of raw scores, only 40% of students can receive an A, and 40% a B.
- What if all the students agreed not to work too hard? A nice idea, but it cannot be enforced: a little extra effort is enough to beat the others, so the temptation is too great.
- In some games, incidental external factors can attach labels to strategies, or the players share certain common knowledge and experience, and this creates a focal point.
- Without such a hint, tacit cooperation is entirely impossible.
Example 3: Why are professors so harsh?
- Many professors firmly rule out make-up exams and refuse late submission of homework or papers.
- Why are professors so harsh?
- If some lateness were allowed, and the professor could not tell genuine excuses from fake ones, then students would always submit late.
- Wang Zeke and Li Jie (eds.), A Course in Game Theory (《博弈论教程》), China Renmin University Press, 2004.

Game Theory (Original Harvard Course Materials)


Lecture XIII: Repeated Games
Markus M. Möbius, April 19, 2004
Readings: Gibbons, chapters 2.3.B, 2.3.C; Osborne, chapter 14; Osborne and Rubinstein, sections 8.3-8.5

1 Introduction
So far one might get a somewhat misleading impression about SPE. When we first introduced dynamic games we noted that they often have a large number of (unreasonable) Nash equilibria. In the models we've looked at so far SPE has 'solved' this problem and given us a unique NE. In fact, this is not really the norm. We'll see today that many dynamic games still have a very large number of SPE.

2 Credible Threats
We introduced SPE to rule out non-credible threats. In many finite-horizon games, though, credible threats are common and cause a multiplicity of SPE. Consider a two-period game in which players observe the first-period actions before choosing the second-period actions. (The stage game referred to here, whose payoff matrix did not survive extraction, has several Nash equilibria, among them (T, L) and (M, C).) One way to get an SPE is to play any of the stage-game equilibria in period 1 followed by another of them (or the same one). We can also, however, use credible threats to get other actions played in period 1, such as: play (B, R) in period 1; if player 1 plays B in period 1, play (T, L) in period 2, otherwise play (M, C) in period 2.
It is easy to see that no single-period deviation helps here. In period 2 a NE is played, so deviating obviously doesn't help. In period 1 player 1 gets 4 + 3 if he follows the strategy and at most 5 + 1 if he doesn't; player 2 gets 4 + 1 if he follows and at most 2 + 1 if he doesn't. Therefore switching to the (M, C) equilibrium in period 2 is a credible threat.

3 Repeated Prisoner's Dilemma
Note that the PD doesn't have multiple NE, so in a finite horizon we don't have the same easy threats to use. Therefore the finitely repeated PD has a unique SPE in which every player defects in each period. In infinite-horizon games, other types of threats are credible.

Proposition 1. In the infinitely repeated PD with δ ≥ 1/2 there exists an SPE in which the outcome is that both players cooperate in every period.

Proof: Consider the following symmetric profile:
s_i(h_t) = C if both players have played C in every period along the path leading to h_t; D if either player has ever played D.
To see that there is no profitable single deviation, note that at any h_t such that s_i(h_t) = D, player i gets 0 + δ·0 + δ²·0 + ... if he follows his strategy, and -1 + δ·0 + δ²·0 + ... if he plays C instead and then follows s_i. At any h_t such that s_i(h_t) = C, player i gets 1 + δ·1 + δ²·1 + ... = 1/(1-δ) if he follows his strategy, and 2 + δ·0 + δ²·0 + ... = 2 if he plays D instead and then follows s_i. Neither of these deviations is worthwhile if δ ≥ 1/2. QED

Remark 1. While people sometimes tend to think of this as showing that people will cooperate if they repeatedly interact, it does not show this. All it shows is that there is one SPE in which they do. The correct moral to draw is that there are many possible outcomes.

3.1 Other SPE of the repeated PD
1. For any δ it is an SPE to play D every period.
2. For δ ≥ 1/2 there is an SPE where the players play D in the first period and then C in all future periods.
3. For δ > 1/√2 there is an SPE where the players play D in every even period and C in every odd period.
4. For δ ≥ 1/2 there is an SPE where the players play (C, D) in every even period and (D, C) in every odd period.

3.2 Recipe for Checking for SPE
Whenever you are supposed to check that a strategy profile is an SPE, you should first try to classify all histories (i.e. all information sets) on and off the equilibrium path. Then you have to apply the single-period deviation principle (SPDP) to each class separately.
- Assume you want to check that cooperation with grim-trigger punishment is SPE. There are two types of histories to check: along the equilibrium path there is just one history (everybody has cooperated so far); off the equilibrium path there is again only one class (someone has defected).
- Assume you want to check that cooperating in even periods and defecting in odd periods, with grim-trigger punishment in case any player deviates from this pattern, is SPE. There are three types of histories: even and odd periods along the equilibrium path, and off-equilibrium-path histories.
- Assume you want to check that TFT ('Tit for Tat') is SPE (which it isn't; see next lecture). Then you have to check four histories: only the play of both players in the last period matters for future play, so there are four relevant histories, such as "both players cooperated in the last period", "player 1 defected and player 2 cooperated", etc.

Sometimes the following result comes in handy.
Lemma 1. If players play Nash equilibria of the stage game in each period, in such a way that the particular equilibrium played in a period is a function of time only and does not depend on previous play, then this strategy profile is a Nash equilibrium.
The proof is immediate: we check the SPDP. Assume there is a profitable deviation. Such a deviation will not affect future play, by assumption: if the stage game has two NE, for example, and NE1 is played in even periods and NE2 in odd periods, then a deviation does not affect future play (if a deviation triggered a switch to only NE1, this statement would no longer be true). Therefore the deviation would have to be profitable in the current stage game; but since a NE is being played, no such profitable deviation can exist.
Corollary 1. A strategy profile which has players play the same NE in each period is always SPE. In particular, the grim-trigger strategy is SPE if the punishment strategy in each stage game is a NE (as is the case in the PD).

4 Folk Theorem
The examples in 3.1 suggest that the repeated PD has a tremendous number of equilibria when δ is large. Essentially, this means that game theory tells us we can't really tell what is going to happen. This turns out to be an accurate description of most infinitely repeated games.
Let G be a simultaneous move game with action sets A_1, A_2, ..., A_I, mixed strategy spaces Σ_1, Σ_2, ..., Σ_I, and payoff functions ũ_i.
Definition 1. A payoff vector v = (v_1, v_2, ..., v_I) ∈ R^I is feasible if there exist action profiles a^1, a^2, ..., a^k ∈ A and non-negative weights ω_1, ..., ω_k which sum to 1 such that
v_i = ω_1·ũ_i(a^1) + ω_2·ũ_i(a^2) + ... + ω_k·ũ_i(a^k).
Definition 2. A payoff vector v is strictly individually rational if
v_i > v̲_i = min over σ_{-i} ∈ Σ_{-i} of [ max over σ_i ∈ Σ_i of u_i(σ_i, σ_{-i}) ].   (1)
We can think of v̲_i as the lowest payoff a rational player could ever get in equilibrium if he anticipates his opponents' (possibly non-rational) play. Intuitively, the minmax payoff v̲_i is the payoff player i can guarantee to herself even if the other players try to punish her as badly as they can. The minmax payoff is a measure of the punishment the other players can inflict.
Theorem 1 (Folk Theorem). Suppose that the set of feasible payoffs of G is I-dimensional. Then for any feasible and strictly individually rational payoff vector v there exists δ̲ < 1 such that for all δ > δ̲ there exists an SPE s* of G^∞ whose average payoff is v, i.e.
u_i(s*) = v_i / (1 - δ).
The normalized (or average) payoff is defined as P = (1 - δ)·u_i(s*). It is the payoff which a stage game would have to generate in each period such that we are indifferent between that payoff stream and the one generated by s*: P + δP + δ²P + ... = u_i(s*).

4.1 Example: Prisoner's Dilemma
The feasible payoff set is the diamond bounded by (0,0), (2,-1), (-1,2) and (1,1); every point inside can be generated as a convex combination of these payoff vectors. The minmax payoff for each player is 0, as you can easily check: the other player can at most punish his rival by defecting. Note that the equilibria shown before generate payoffs inside this area.

4.2 Example: Battle of the Sexes
Here each player can guarantee herself at least payoff 2/3, which is the payoff from playing the mixed-strategy Nash equilibrium. You can check that whenever player 2 mixes with different probabilities, player 1 can guarantee herself more than this payoff by playing either F or O all the time.

4.3 Idea behind the Proof
1. Have players on the equilibrium path play an action with payoff v (or alternate, if necessary, to generate this payoff; for example, in the BoS it is not possible to generate (3/2, 3/2) in the stage game even with mixing, but if the players alternate between (O, F) and (F, O) they can get arbitrarily close for large δ).
2. If some player i deviates, punish him by having the other players choose σ_{-i} for T periods such that player i gets at most v̲_i.
3. After the end of the punishment phase, reward all players other than i for having carried out the punishment by switching to an action profile where each punisher gets a small bonus ε above her target payoff.
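The single-period deviation check in Proposition 1 reduces to comparing two discounted streams, which is easy to verify numerically. A sketch using the stage payoffs from the lecture (mutual cooperation 1, unilateral defection 2, mutual defection 0):

```python
def cooperation_value(delta):
    """Value of cooperating forever (stage payoff 1 per period):
    1 + delta + delta**2 + ... = 1 / (1 - delta)."""
    return 1 / (1 - delta)

def deviation_value():
    """Defect once for a stage payoff of 2, then earn 0 forever,
    because the grim trigger punishes with mutual defection."""
    return 2.0

def cooperation_sustainable(delta):
    """Single-period deviation check on the equilibrium path:
    no profitable deviation iff 1/(1-delta) >= 2, i.e. delta >= 1/2."""
    return cooperation_value(delta) >= deviation_value()
```

The threshold δ = 1/2 in Proposition 1 is exactly where the two streams cross: at δ = 1/2 the cooperative stream is worth 2, the same as deviating once.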

Game Theory (Lecture 11: Repeated Games and Reputation) Slides

11.2 Finitely repeated games
11.2.1 Finitely repeated games whose stage game has a unique Nash equilibrium
Suppose the repeated game lasts T stages and we are now at stage T-1; the total payoff is the sum of the payoffs of the individual stages. At stage T both sides will choose Shoot. Let A_{T-2} and G_{T-2} denote the Allies' and the Germans' sums of stage payoffs from stage 1 through stage T-2; the stage-T subgame is as shown below.
One can see that A_{T-2} and G_{T-2} have no effect whatsoever on the solution process.
11.2 Finitely repeated games
11.2.1 Finitely repeated games whose stage game has a unique Nash equilibrium
For the stage game, the importance of having a Nash equilibrium is that it pins down how the players behave in the final stage. Hence, in the repeated game, behavior in the next-to-last stage cannot influence the final stage, and this logic can be applied repeatedly back through every stage to the first.
If the stage game has a unique Nash equilibrium, then for the finitely repeated game the unique subgame-perfect Nash equilibrium is the repetition of the stage-game Nash equilibrium.
11.1 Basic concepts
A stage-game payoff is just the payoff of a single stage within the repeated game; a player's payoff in the repeated game is determined by the entire stream of stage payoffs.
Suppose T = 5 and the history of the repeated game is (Don't shoot, Don't shoot), (Shoot, Don't shoot), (Don't shoot, Don't shoot), (Don't shoot, Shoot), (Shoot, Shoot), where the first action in each pair is the Allied soldier's choice. Then the Allied soldier's stream of stage payoffs is 4, 6, 4, 0, 2.
In a repeated game, a player chooses as his final strategy the one that yields the highest payoff. The payoff stream {4,6,4,0,2} is dominated by {6,6,6,0,2}; but it is hard to compare the streams {6,4,4,2,6} and {4,6,4,0,2}.
11.1 Basic concepts
To compare different payoff streams we need a criterion. One criterion is simply to sum the stream, i.e. to take the total of the stage payoffs. This is not only simple; it also has the property that when any single stage payoff increases, the total increases as well. For the three payoff streams above we have: ...
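Both the sum criterion and its standard refinement, the discounted sum (which the later lectures in this collection use with a discount factor δ), can be sketched in a few lines; the streams are the ones from the example above:

```python
def total(stream):
    """Sum criterion: simply add up the stage payoffs."""
    return sum(stream)

def discounted(stream, delta):
    """Discounted-sum criterion: a payoff t stages ahead is weighted by delta**t."""
    return sum(p * delta ** t for t, p in enumerate(stream))

allied = [4, 6, 4, 0, 2]   # the Allied soldier's stream from the example
better = [6, 6, 6, 0, 2]   # stage-by-stage at least as good, so a higher total
```

The sum criterion ranks `better` (total 20) above `allied` (total 16), and with δ = 1 the discounted sum coincides with the plain sum, so the two criteria agree whenever one stream dominates the other stage by stage.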

Game Theory Courseware (Foreign): Lecture (22)


Scalable games
Allow players to "merge" strategies in large-scale, complex games by simply copying and comparing strategies.
Application examples
1. Game theory in political science: before a decision is taken, economic, social, environmental, and similar factors must be weighed.
2. Game theory in economics: the influence of multiple interacting market parties is a major issue in modern economies.
3. Game theory in biology: the dynamism and instability of ecosystems pose challenges for biological development, and natural selection affects the evolution of strategies in different ways.
Evolutionary games
- Memoryless evolutionary games: players evolve winning strategies by building and updating a table of encounter counts between the existing strategies.
- Repeatable evolutionary games: the order in which the players meet is determined by a repeatable time sequence.
- Continuous-time evolutionary games: allow us to study complex game dynamics whose rates of change can be expressed as differential equations.
Cooperative games
Stable coalitions: the search for a mechanism for sharing the gains that keeps a coalition stable and robust.
Summary
1. The significance and value of game theory: it helps us understand why people act as they do, and it describes the interplay between firm behavior and market behavior.
2. Research outlook: as game theory continues to develop, its range of applications will keep widening, and the diversity and complexity of strategies will increase.
Game Theory Courseware (Foreign): Lecture (22)
Game theory is a mathematical tool for studying the strategies people adopt in decision situations and how those strategies affect others' payoffs. In modern society, game theory is widely applied in economics, political science, biology, and many other fields. This lecture reviews the basic concepts of game theory, discusses examples of its application to real life, and looks at future research directions.
Information in games
Games of incomplete information: when the players' access to certain key information is incomplete, unequal, or partly concealed, the game is one of incomplete information.

Game Theory Courseware (Foreign): Lecture (20)


- Any node that has no successor is called a terminal node, which is a possible end of the game. Example 4: x4, x5, x6, x7, x8 are terminal nodes.
- An edge connects two nodes (these two nodes are said to be adjacent); for example, an edge connects nodes x1 and x5.
- For any pair of nodes, there is a unique path that connects these two nodes.
[Figure: a game tree with root x0, intermediate nodes x1, x2, x3, and terminal nodes x4 through x8.]
Game tree
- If a node x is a successor of another node y, then y is called a predecessor of x.
- In a game tree, any node other than the root has a unique predecessor.
[Figure: the entry game in extensive and normal form. The Challenger chooses In or Out; after In, the Incumbent chooses Accommodate (payoffs 2, 1) or Fight (payoffs 0, 0); Out yields payoffs 1, 2.]

Game Theory (Original Harvard Course Materials)


Lecture XII: Analysis of Infinite Games
Markus M. Möbius, April 7, 2004
Readings: Gibbons, chapters 2.1.A, 2.1.B, 2.2.A; Osborne, sections 14.1-14.4, 16; Osborne and Rubinstein, sections 6.5, 8.1 and 8.2

1 Introduction: Critique of SPE
The SPE concept eliminates non-credible threats, but it's worth stepping back for a minute and asking whether we think SPE is reasonable, or whether in throwing out threats we have been overzealous. Practically, for this course the answer will be that SPE restrictions are OK and we'll always use them in extensive form games. However, it's worth looking at situations where SPE has been criticized. Some of the worst anomalies disappear in infinite horizon games, which we study next.

1.1 Rationality off the Equilibrium Path
Is it reasonable to play NE off the equilibrium path? After all, if a player does not follow the equilibrium he is probably as stupid as a broomstick. Why should we trust him to play NE in the subgame? Consider a game in which (L, A, x) is the unique SPE. Player 2 has to put a lot of trust in player 1's rationality in order to play x: he must believe that player 1 is smart enough to figure out that A is a dominant strategy in the subgame following R. However, player 2 might have serious doubts about player 1's marbles after the guy has just foregone 5 utils by not playing L. (After all, any strategy in which L is played strictly dominates any strategy in which R is played in the normal form.)

1.2 Multi-Stage Games
Lemma 1. The unique SPE of the finitely repeated Prisoner's Dilemma game, in which players get the sum of their payoffs from each stage game, has every player defecting at each information set.
The proof proceeds by analyzing the last stage game, where we would see defection for sure. But then we would see defection in the second-to-last stage game, and so on. In the same way we can show that the finitely repeated Bertrand game results in pricing at marginal cost all the time.
Remark 1. The main reason for the breakdown of cooperation in the finitely repeated Prisoner's Dilemma is not so much SPE by itself but the fact that there is a final period in which agents would certainly defect. This raises the question whether an infinitely repeated PD game would allow us to cooperate. Essentially, we could cooperate as long as the other player does, and if there is defection, we defect from then on. This still looks like an SPE: in any subgame in which I have defected before, I might as well defect forever; if I haven't defected yet, I would jeopardize cooperation by defecting, and therefore should not do it as long as I care about the future sufficiently.
These results should make us suspicious. Axelrod's experiments (see a future lecture) showed that in the finitely repeated Prisoner's Dilemma people tend to cooperate until the last few periods, when the 'endgame effect' kicks in. Similarly, there are indications that rival firms can learn to collude if they interact repeatedly and set prices above marginal cost. This criticism of SPE is reminiscent of our criticism of IDSDS: in both cases we use an iterative procedure to find equilibrium, and we might have doubts whether real-world subjects are able (and inclined) to do this calculation. [A passage is missing here; the text resumes mid-example.] The SPE of that game is to drop out immediately (play L) at each stage. However, in experiments people typically continue until almost the last period before they drop out.

2 Infinite Horizon Games
Of the criticisms of SPE, the one we will take most seriously is that long finite-horizon models do not give reasonable answers. Recall that the problem was that the backward induction procedure tended to unravel 'reasonably' looking strategies from the end. It turns out that many of the anomalies go away once we model these games as infinite games, because there is no endgame to be played. The prototypical model is what Fudenberg and Tirole call an infinite horizon multi-stage game with observed actions:
- At times t = 0, 1, 2, ... some subset of the set of players simultaneously chooses actions.
- All players observe the period-t actions before choosing the period t+1 actions.
- Players' payoffs may be any function of the infinite sequence of actions (play no longer necessarily ends in terminal nodes).

2.1 Infinite Games with Discounting
Often we assume that player i's payoffs are of the form
u_i(s_i, s_{-i}) = u_i^0(s_i, s_{-i}) + δ·u_i^1(s_i, s_{-i}) + δ²·u_i^2(s_i, s_{-i}) + ...   (1)
where u_i^t(s_i, s_{-i}) is the payoff received at time t when the strategies are followed.

2.1.1 Interpretations of δ
1. Interest rate: δ = 1/(1+r). Having two dollars today or two dollars tomorrow makes a difference to you: your two dollars today are worth more, because you can take them to the bank and get 2(1+r) dollars tomorrow, where r is the interest rate. By discounting future payoffs with δ = 1/(1+r) we correct for the fact that future payoffs are worth less to us than present payoffs.
2. Probabilistic end of game: suppose the game is really finite, but the end of the game is not deterministic. Instead, given that stage t is reached, there is probability δ that the game continues and probability 1 - δ that the game ends after this stage. Note that the expected number of periods is then 1/(1-δ) and finite. However, we can't apply backward induction directly, because we can never be sure that any round is the last one. The probabilistic interpretation is particularly attractive for interpreting bargaining games with many rounds.

2.2 Example I: Repeated Games
Let G be a simultaneous move game with finite action spaces A_1, ..., A_I. The infinitely repeated game G^∞ is the game where in periods t = 0, 1, 2, ... the players simultaneously choose actions (a_1^t, ..., a_I^t) after observing all previous actions. We define payoffs in this game by
u_i(s_i, s_{-i}) = Σ_{t=0}^{∞} δ^t · ũ_i(a_1^t, ..., a_I^t)   (2)
where (a_1^t, ..., a_I^t) is the action profile taken in period t when players follow strategies s_1, ..., s_I, and ũ_i are the utilities of the players in each stage game. For example, in the infinitely repeated Prisoner's Dilemma game the ũ_i are simply the payoffs in the 'boxes' of the normal form representation.

2.3 Example II: Bargaining
Suppose there is one dollar to be divided up between two players. The following alternating-offer procedure is used:
I. In periods 0, 2, 4, ... player 1 offers the division (x_1, 1 - x_1). Player 2 then accepts, and the game ends, or he rejects, and play continues.
II. In periods 1, 3, 5, ... player 2 offers the division (1 - x_2, x_2). Player 1 then accepts or rejects.
Assume that if the division (y, 1 - y) is agreed to in period t, then the payoffs are δ^t·y and δ^t·(1 - y). Note that this is not a repeated game. First of all, the stage games are not identical (alternate players make offers). Second, there is no per-period payoff: players only get payoffs when one of them has agreed to an offer, and waiting to divide the pie is costly.
Remark 2. There are finite versions of this game in which play ends after period T. One has to make some assumption about what happens if there is no agreement at time T; typically, one assumes that the pie simply disappears. If T = 1 then we get the simple ultimatum game.

3 Continuity at Infinity
None of the tools we've discussed so far are easy to apply to infinite games.
First,backward induction isn’t feasible because there is no end to work back-ward from.Second,using the definition of SPE alone isn’t very easy.There are infinitely many subgames and uncountably many strategies that might do better.We will discuss a theorem which makes the analysis quite tractable in most infinite horizon games.To do so,we mustfirst discuss what continuity at infinity means.Definition1An infinite extensive form game G is continuous at∞iflim T→∞supi,σ,σ s.th.σ(h)=σ (h)for all h in peri-ods t≤T|u i(σ)−u i(σ )|=0In words:compare the payoffs of two strategies which are identical for all information sets up to time T and might differ thereafter.As T becomes large the maximal difference between any two such strategies becomes arbitrarily small.Essentially,this means that distant future events have a very small payoffeffect.3.1Example I:Repeated GamesIfσandσ agree in thefirst T periods then:|u i(σ)−u i(σ )|=∞t=Tδt˜u ia t−˜u i(at)≤∞t=Tδt maxi,a,a|˜u i(a)−˜u i(a )| 6Forfinite stage games we know that M=max i,a,a |˜u i(a)−˜u i(a )|isfinite. 
This implies thatlim T→∞|u i(σ)−u i(σ )|≤limT→∞∞t=Tδt M=limT→∞δT1−δM=0.3.2Example II:BargainingIt’s easy to check that the bargaining game is also continuous.3.3Example III:Non-discounted war of attritionThis is an example for an infinite game which is NOT continuous at infinity.Players1and2choose a ti ∈{Out,Fight}at time t=0,1,2,...The gameends whenever one player quits with the other being the’winner’.Assume the payoffs areu i(s i,s−i)=1−ct if player i’wins’in period t −ct if player i quits in period tNote,that players in this game can win a price of1by staying in the game longer than the other player.However,staying in the game is costly for both players.Each player wants the game tofinish as quickly as possible,but also wants the other player to drop outfirst.This game is not continuous at∞.LetσT=Bothfight for T periods and then1quitsσ T=Bothfight for T periods and then2quits.Then we haveuiσT−u iσ T=1.This expression does not go to zero as T→∞.4The Single-Period Deviation PrincipleThe next theorem makes the analysis of infinite games which are continuous at∞possible.7Theorem1Let G be an infinite horizon multistage game with observed ac-tions which is continuous at∞.A strategy profile(σ1,..,σI)is a SPE if and only if there is no player i and strategyˆσi that agrees withσi except at a sin-gle information set h ti and which gives a higher payoffto player i conditionalon h tibeing reached.We write u i(σi,σ−i|x)for the payoffconditional on x being reached.For example,in the entry game below we have u2(Accomodate,Out|node1is reached)= 1.Recall,that we can condition on nodes which are not on the equilibrium path because the strategy of each player defines play at each node.4.1Proof of SPDP for Finite GamesI start by proving the result forfinite-horizon games with observed actions.Step I:By the definition of SPE there cannot be a profitable deviationfor any player at some information set in games with observed actions.3 Step II:The reverse is a bit harder to show.We 
want to show thatu i(ˆσi,σ−i|x t)≤u i(σi,σ−i|x t)for all initial nodes x t of a subgame(subgameat some round t).We prove this by induction on T which is the number of periods in whichσi andˆσi differ.3We have to very careful at this point.We have defined SPE as NE in every subgame. Subgames can only originate at nodes and not information sets.However,in games with observed actions all players play simultaneous move games in each round t.Therefore any deviation by a player at an information set at round t which is not a singleton is on the equilibrium path of some subgame at round t.8T=1:In this case the result is clear.Supposeσi andˆσi differ only in the information set in period t .If t>t it is clear that u i(ˆσi,σ−i|x t)= u i(σi,σ−i|x t)because the two strategies are identical at all the relevant in-formation sets.If t≤t then:u i(ˆσi,σ−i|x t)=h itu i(ˆσi,σ−i|h it )P rob{h it |ˆσi,σ−i,x t}≤h itu i(σi,σ−i|h it )follows from onestage deviationcriterionP rob{h it |σi,σ−i,x t}follows fromσiandˆσi havingsame play be-tween t and t=u i(σi,σ−i|x t)T→T+1:Assuming that the result holds for T letˆσi be any strategy differing fromσi in T+1periods.Let t be the last period at which they differ and define˜σi by:˜σi(h it)=ˆσi(h it)if t<tσi(h it)if t≥tIn other words,˜σi differs fromσi only at T periods.Therefore we have for any x tu i(˜σi,σ−i|x t)≤u i(σi,σ−i|x t)by the inductive hypothesis since we assumed that the claim holds for T.However,we also know that˜σandˆσonly differ in a single deviation at round t .Therefore,we can use exactly the same argument as in the previous step to show thatu i(ˆσi,σ−i|x t)≤u i(˜σi,σ−i|x t)for any x t.4Combining both inequalities we get the desires result:u i(ˆσi,σ−i|x t)≤u i(σi,σ−i|x t)This proves the result forfinite games with observed actions.4It is important that we have defined˜σi in differing only in the last period deviation. 
Therefore,after time t strategy˜σi followsσi.This allows us to use the SPDP.94.2Proof for infinite horizon gamesNote,that the proof forfinite-horizon games also establishes that we for a profileσwhich satisfies SPDP player i cannot improve onσi in some subgame x t by considering a new strategyˆσi withfinitely many deviations fromσi. However,it is still possible that deviations at infinitely many periods might be an improvement for player i.Assume this would be the case for someˆσi.Let’s denote the gain from using this strategy with :=u i(ˆσi,σ−i|x t)−u i(σi,σ−i|x t)Because the game is continuous at∞this implies that if we choose T large enough we can define some strategy˜σi which agrees withˆσi up to period T and then follows strategyσi such that:|u i(ˆσi,σ−i|x t)−u i(˜σi,σ−i x t)|< 2This implies that the new strategy˜σi gives player i strictly more utility than σi.However,it can only differ fromσi for at most T periods.But this is a contradiction as we have shown above.QEDRemark3Games which are at continuous at∞are in some sense’the next best thing’tofinite games.5Analysis of Rubinstein’s Bargaining Game To illustrate the use of the single period deviation principle and to show the power of SPE in one interesting model we now return to the Rubinstein bargaining game introduced before.First,note that the game has many Nash equilibria.For example,player 2can implement any division by adopting a strategy in which he only accepts and proposes a share x2and rejects anything else.Proposition1The bargaining game has a unique SPE.In each period of the SPE the player i who proposes picks x i=11+δand the other player acceptsany division giving him at leastδ1+δand rejects any offer giving him less.Several observations are in order:101.We get a roughly even split forδclose to1(little discounting).Theproposer can increase his share the more impatient the other player is.2.Agreement is immediate and bargaining is therefore efficient.Playerscan perfectly predict play in the next 
period and will therefore choose a division which makes the other player just indifferent between accepting and making her own proposal.There is no reason to delay agreement because it just shrinks the pie.Immediate agreement is in fact not observed in most experiments-last year in a two stage bargaining game in class we observed that only2out of14bargains ended in agreement after thefirst period.There are extensions of the Rubinstein model which do not give immediate agreement.53.The division becomes less equal forfinite bargaining games.Essentially,the last proposer at period T can take everything for himself.Therefore, he will tend to get the greatest share of the pie in period1as well -otherwise he would continue to reject and take everything in the last period.In our two-period experiments we have in deed observed greater payoffs to the last proposer(ratio of2to1).However,half of all bargains resulted in disagreement after the second period and so zero payoffs for everyone.Apparently,people care about fairness as well as payoffs which makes one wonder whether monetary payoffs are the right way to describe the utility of players in this game.5.1Useful ShortcutsThe hard part is to show that the Rubinstein game has a unique SPE.If we know that it is much easier to calculate the actual strategies.5.1.1Backward Induction in infinite gameYou can solve the game by assuming that random payoffafter rejection of offer at period T.The game then becomes afinite game of perfect information which can be solved through backward induction.It turns out that as T→∞5For example,when there is uncertainty about a players discount factor the proposer might start with a low offer in order to weed out player2types with low discount factor. 
Players with highδwill reject low offers and therefore agreement is not immediate.To analyze these extension we have tofirst develop the notion of incomplete information games.11the backward solution converges to the Rubinstein solution.This technique also provides an alternative proof for uniqueness.5.1.2Using the Recursive Structure of GameThe game has a recursive structure.Each player faces essentially the same game,just with interchanged roles.Therefore,in the unique SPE player 1should proposes some(x,1−x)and player2should propose(1−x,x). Player1’s proposal to2should make2indifferent between accepting imme-diately or waiting to make her own offer(otherwise player1would bid higher or lower).This implies:1−x2’s payoffat period1=δx2’s discounted payofffrom period2x=1 1+δ5.2Proof of Rubinstein’s Solution5.2.1ExistenceWe show that there is no profitable single history deviation.Proposer:If he conforms at period t the continuation payoffisδt11+δ.If he deviates and asks for more he getsδt+1δ.If he deviates and asks for less hegets less.Either way he loses.Recipient:Look at payoffs conditional on y being proposes in period t.–If he rejects he getsδt+111+δ.–If he accepts he getsδt y.–If y≥δ1+δthe strategy says accept and this is better than reject-ing.–If y<δ1+δthe strategy says reject and accepting is not a profitabledeviation.125.2.2UniquenessThis proof illustrates a number of techniques which enable us to prove prop-erties about equilibrium without actually constructing it.Let v and v be the highest and lowest payoffs received by a proposer in any SPE.We first observe:1−v ≥δv (3)If this equation would not be true then no proposer could propose v because the recipient could always get more in any subgame.We also find:v ≥1−δv (4)If not then no proposer would propose v -she would rather wait for the other player to make her proposal because she would get a higher payoffthis way.We can use both inequalities to derive bounds on v and v :v ≥1−δv ≥1−δ(1−δv )(5) 1−δ2 v 
≥ 1 − δ   (6)
v ≥ 1/(1+δ)   (7)
Similarly, we find:
1 − v̄ ≥ δv ≥ δ(1 − δv̄)   (8)
v̄ ≤ 1/(1+δ)   (9)
Hence v̄ = v = 1/(1+δ). Clearly, no other strategies can generate this payoff in every subgame.⁶
⁶While being a nice result, it does not necessarily hold anymore when we change the game. For example, if both players make simultaneous proposals then any division is a SPE. Also, it no longer holds when there are several players.
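The backward-induction shortcut from section 5.1.1 is easy to verify numerically: in a T-period version where the final proposer keeps the whole pie, the current proposer's share converges to 1/(1+δ) as T grows. A minimal sketch (δ = 0.9 is an arbitrary illustration, not a value from the notes):

```python
# v = share of the current proposer in the truncated bargaining game.
# Ultimatum endgame: with one round left the proposer takes everything (v = 1).
# Adding a round, the proposer offers the responder exactly delta * v
# (her discounted value from proposing next period) and keeps 1 - delta * v.
delta = 0.9           # illustrative discount factor
v = 1.0               # one round remaining: last proposer takes the whole pie
for _ in range(200):  # add rounds one at a time
    v = 1 - delta * v

assert abs(v - 1 / (1 + delta)) < 1e-8  # converges to the Rubinstein share
```

Changing δ moves the split: the more patient the players (δ close to 1), the closer the division gets to an even split, matching observation 1 above.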

Game Theory (Original Harvard Course Notes)


Lecture VIII: Learning
Markus M. Möbius
March 10, 2004

Learning and evolution are the second set of topics which are not discussed in the two main texts. I tried to make the lecture notes self-contained.

• Fudenberg and Levine (1998), The Theory of Learning in Games, Chapters 1 and 2

1 Introduction

What are the problems with Nash equilibrium? It has been argued that Nash equilibria are a reasonable minimum requirement for how people should play games (although this is debatable, as some of our experiments have shown). It has been suggested that players should be able to figure out Nash equilibria starting from the assumption that the rules of the game, the players' rationality and the payoff functions are all common knowledge.¹ As Fudenberg and Levine (1998) have pointed out, there are some important conceptual and empirical problems associated with this line of reasoning:

1. If there are multiple equilibria it is not clear how agents can coordinate their beliefs about each other's play by pure introspection.

2. Common knowledge of rationality and about the game itself can be difficult to establish.

¹We haven't discussed the connection between knowledge and Nash equilibrium. Assume that there is a Nash equilibrium σ* in a two player game and that each player's best response is unique. In this case player 1 knows that player 2 will play σ*2 in response to σ*1, player 2 knows that player 1 knows this, etc. Common knowledge is important for the same reason that it matters in the coordinated attack game we discussed earlier. Each player might be unwilling to play her prescribed strategy if she is not absolutely certain that the other player will do the same.

3. Equilibrium theory does a poor job explaining play in early rounds of most experiments, although it does much better in later rounds. This shift from non-equilibrium to equilibrium play is difficult to reconcile with a purely introspective theory.

1.1 Learning or Evolution?

There are two main ways to model the processes according to which players change the strategies they are using
to play a game.A learning model is any model that specifies the learning rules used by individual players and examines their interaction when the game(or games)is played repeatedly. These types of models will be the subject of today’s lecture.Learning models quickly become very complex when there are many play-ers involved.Evolutionary models do not specifically model the learning process at the individual level.The basic assumption there is that some un-specified process at the individual level leads the population as a whole to adopt strategies that yield improved payoffs.These type of models will the subject of the next few lectures.1.2Population Size and MatchingThe natural starting point for any learning(or evolutionary)model is the case offixed players.Typically,we will only look at2by2games which are played repeatedly between these twofixed players.Each player faces the task of inferring future play from the past behavior of agents.There is a serious drawback from working withfixed agents.Due to the repeated interaction in every game players might have an incentive to influence the future play of their opponent.For example,in most learning models players will defect in a Prisoner’s dilemma because cooperation is strictly dominated for any beliefs I might hold about my opponent.However, if I interact frequently with the same opponent,I might try to cooperate in order to’teach’the opponent that I am a cooperator.We will see in a future lecture that such behavior can be in deed a Nash equilibrium in a repeated game.There are several ways in which repeated play considerations can be as-sumed away.1.We can imagine that players are locked into their actions for quite2a while(they invest infrequently,can’t build a new factory overnightetc.)and that their discount factors(the factor by which they weight the future)is small compared that lock-in length.It them makes sense to treat agents as approximately myopic when making their decisions.2.An alternative is to 
dispense with thefixed player assumption,andinstead assume that agents are drawn from a large population and are randomly matched against each other to play games.In this case,it is very unlikely that I encounter a recent opponent in a round in the near future.This breaks the strategic links between the rounds and allows us to treat agents as approximately myopic again(i.e.they maximize their short-term payoffs).2Cournot AdjustmentIn the Cournot adjustment model twofixed players move sequentially and choose a best response to the play of their opponent in the last period.The model was originally developed to explain learning in the Cournotmodel.Firms start from some initial output combination(q01,q02).In thefirst round bothfirms adapt their output to be the best response to q02.Theytherefore play(BR1(q02),BR2(q01)).This process is repeated and it can be easily seen that in the case of linear demand and constant marginal costs the process converges to the unique Nash equilibrium.If there are several Nash equilibria the initial conditions will determine which equilibrium is selected.2.1Problem with Cournot LearningThere are two main problems:•Firms are pretty dim-witted.They adjust their strategies today as if they expectfirms to play the same strategy as yesterday.•In each period play can actually change quite a lot.Intelligentfirms should anticipate their opponents play in the future and react accord-ingly.Intuitively,this should speed up the adjustment process.3Cournot adjustment can be made more realistic by assuming thatfirms are’locked in’for some time and that they move alternately.Firms1moves in period1,3,5,...andfirm2moves in periods2,4,6,..Starting from someinitial play(q01,q02),firms will play(q11,q02)in round1and(q11,q22)in round2.Clearly,the Cournot dynamics with alternate moves has the same long-run behavior as the Cournot dynamics with simultaneous moves.Cournot adjustment will be approximately optimal forfirms if the lock-in period is large compared 
to the discount rate offirms.The less locked-in firms are the smaller the discount rate(the discount rate is the weight on next period’s profits).Of course,the problem with the lock-in interpretation is the fact that it is not really a model of learning anymore.Learning is irrelevant becausefirms choose their optimal action in each period.3Fictitious PlayIn the process offictitious play players assume that their opponents strategies are drawn from some stationary but unknown distribution.As in the Cournot adjustment model we restrict attention to afixed two-player setting.We also assume that the strategy sets of both players arefinite.4Infictitious play players choose the best response to their assessment of their opponent’s strategy.Each player has some exogenously given weightingfunctionκ0i :S−i→ +.After each period the weights are updated by adding1to each opponent strategy each time is has been played:κt i (s−i)=κt−1i(s−i)if s−i=s t−1−iκt−1i(s−i)+1if s−i=s t−1−iPlayer i assigns probabilityγti(s−i)to strategy profile s−i:γt i (s−i)=κti(s−i)˜s−i∈S−iκti(˜s−i)The player then chooses a pure strategy which is a best response to his assessment of other players’strategy profiles.Note that there is not neces-sarily a unique best-response to every assessment-hencefictitious play is not always unique.We also define the empirical distribution d ti (s i)of each player’s strategiesasd t i (s i)=t˜t=0I˜t(s i)tThe indicator function is set to1if the strategy has been played in period˜tand0otherwise.Note,that as t→∞the empirical distribution d tj of playerj’s strategies approximate the weighting functionκti (since in a two playergame we have j=−i).Remark1The updating of the weighting function looks intuitive but also somewhat arbitrary.It can be made more rigorous in the following way. 
Assume,that there are n strategy profiles in S−i and that each profile is played by player i’s opponents’with probability p(s−i).Agent i has a prior belief according to which these probabilities are distributed.This prior is a Dirichlet distribution whose parameters depend on the weighting function. After each round agents update their prior:it can be shown that the posterior belief is again Dirichlet and the parameters of the posterior depend now on the updated weighting function.3.1Asymptotic BehaviorWillfictitious play converge to a Nash equilibrium?The next proposition gives a partial answer.5Proposition1If s is a strict Nash equilibrium,and s is played at date t in the process offictitious play,s is played at all subsequent dates.That is,strict Nash equilibria are absorbing for the process offictitious play.Furthermore, any pure-strategy steady state offictitious play must be a Nash equilibrium. Proof:Assume that s=(s i,s j)is played at time t.This implies that s i isa best-response to player i’s assessment at time t.But his next periodassessment will put higher relative weight on strategy s j.Because s i isa BR to s j and the old assessment it will be also a best-response to theupdated assessment.Conversely,iffictitious play gets stuck in some pure steady state then players’assessment converge to the empirical distribution.If the steady state is not Nash players would eventually deviate.A corollary of the above result is thatfictitious play cannot converge to a pure steady state in a game which has only mixed Nash equilibria such as matchingpennies.6distribution over each player’s strategies are converging to12,12-this isprecisely the unique mixed Nash equilibrium.This observation leads to a general result.Proposition2Underfictitious play,if the empirical distributions over each player’s choices converge,the strategy profile corresponding to the product of these distributions is a Nash equilibrium.Proof:Assume that there is a profitable deviation.Then in the 
limit at least one player should deviate-but this contradicts the assumption that strategies converge.These results don’t tell us whenfictitious play converges.The next the-orem does precisely that.Theorem1Underfictitious play the empirical distributions converge if the stage has generic payoffs and is22,or zero sum,or is solvable by iterated strict dominance.We won’t prove this theorem in this lecture.However,it is intuitively clear whyfictitious play observes IDSDS.A strictly dominated strategy can never be a best response.Therefore,in the limitfictitious play should put zero relative weight on it.But then all strategies deleted in the second step can never be best responses and should have zero weight as well etc.3.2Non-Convergence is PossibleFictitious play does not have to converge at all.An example for that is due to Shapley.7(M,R),(M,L),(D,L),(D,M),(T,M).One can show that the number of time each profile is played increases at a fast enough rate such that the play never converges.Also note,that the diagonal entries are never played.3.3Pathological ConvergenceConvergence in the empirical distribution of strategy choices can be mis-leading even though the process converges to a Nash equilibrium.Take the following game:8riod the weights are2,√2,and both play B;the outcome is the alternatingsequence(B,B),(A,A),(B,B),and so on.The empirical frequencies of each player’s choices converge to1/2,1/2,which is the Nash equilibrium.The realized play is always on the diagonal,however,and both players receive payoff0each period.Another way of putting this is that the empirical joint distribution on pairs of actions does not equal the product of the two marginal distributions,so the empirical joint distribution corresponds to correlated as opposed to independent play.This type of correlation is very appealing.In particular,agents don’t seem to be smart enough to recognize cycles which they could exploit.Hence the attractive property of convergence to a Nash equilibrium can 
be misleading if the equilibrium is mixed.9。
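The fictitious-play updating rule above is short to simulate. In a zero-sum stage game such as matching pennies the empirical frequencies converge to the mixed equilibrium (1/2, 1/2) even though realized play cycles; the initial weights below are arbitrary illustrations, and ties are broken by a fixed rule (H first) for determinism:

```python
# Fictitious play in matching pennies: player 1 wants actions to match,
# player 2 wants them to mismatch (u2 = -u1).
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

w1 = {"H": 1.0, "T": 1.5}  # player 1's weights over player 2's past actions
w2 = {"H": 2.0, "T": 1.0}  # player 2's weights over player 1's past actions
counts = {"H": 0, "T": 0}  # empirical distribution of player 1's choices

T = 20000
for _ in range(T):
    # each player best-responds to the opponent's empirical mixture
    a1 = max("HT", key=lambda a: sum(w1[b] * u1[(a, b)] for b in "HT"))
    a2 = max("HT", key=lambda b: sum(w2[a] * -u1[(a, b)] for a in "HT"))
    w1[a2] += 1  # update weights with the observed actions
    w2[a1] += 1
    counts[a1] += 1

freq_H = counts["H"] / T
assert abs(freq_H - 0.5) < 0.1  # empirical marginal near the mixed NE
```

This illustrates Proposition 2: when the empirical distributions converge, their product is a Nash equilibrium — though, as the pathological example above shows, convergent marginals can coexist with correlated realized play.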

Game Theory Lecture Slides (Lecture 14)


Stage 1 Stage 2
• Replace the smallest subgames by payoffs of player 1 and 2.
• Player 1 and 2 know that if they play a1 and a2, respectively, then
player 3 and 4 will play a3* = R3(a1, a2) and a4* = R4(a1, a2).
2021/9/6
3
An abstract game: generalization of the tariff game
• Four players: 1, 2, 3, 4. The timing of the game is as follows.
• Stage 1:
Player 1 and 2 simultaneously choose actions a1 and a2 from feasible action sets A1 and A2, respectively.
• More applications
• Repeated games
Today’s Agenda
• Review of previous class
• An abstract game: generalization of the tariff game
• Repeated games
• A set of players: {player 1, player 2}
• Sets of actions: A1, A2
• Payoffs: u1(a1, a2, R3(a1, a2), R4(a1, a2))
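Backward induction in this abstract game can be sketched numerically: replace stage 2 by the best-response functions R3 and R4, then solve the reduced stage-1 game between players 1 and 2. All functional forms below are made-up illustrations (they are not the tariff-game payoffs), chosen so that the stage-1 best responses form a contraction:

```python
# Stage 2: players 3 and 4 best-respond to (a1, a2).
# These linear forms are hypothetical stand-ins for R3 and R4.
def R3(a1, a2): return (a1 + a2) / 2
def R4(a1, a2): return (a1 - a2) / 2

# Stage 1: after substituting R3 and R4 into the payoffs, suppose the
# reduced game gives players 1 and 2 these (hypothetical) best responses:
def BR1(a2): return 1 + 0.5 * a2
def BR2(a1): return 1 + 0.5 * a1

a1, a2 = 0.0, 0.0
for _ in range(100):            # iterate best responses (a contraction here)
    a1, a2 = BR1(a2), BR2(a1)   # simultaneous update

assert abs(a1 - 2.0) < 1e-9 and abs(a2 - 2.0) < 1e-9  # stage-1 equilibrium
a3, a4 = R3(a1, a2), R4(a1, a2)                       # induced stage-2 play
assert abs(a3 - 2.0) < 1e-9 and abs(a4 - 0.0) < 1e-9
```

The outcome (a1*, a2*, R3(a1*, a2*), R4(a1*, a2*)) is the subgame-perfect outcome of the two-stage game under these assumed forms.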

For the case p11>0, p12>0, p21>0, p22>0, p23=0, Theorem 4 requires
2p11 + 2p12 = 3p11 + 1p12 ≥ 3p11 + 2p12, and p11 + p12 = 1.
We should also have
2p21 + 0p22 + 1p23 = 3p21 + 1p22 + 0p23, and p21 + p22 + p23 = 1 (here with p23 = 0).
Solve these. If we can get a solution that satisfies p11>0, p12>0, p21>0, p22>0, p23=0, then we have a mixed strategy Nash equilibrium. Otherwise, if we cannot find a solution, or we find a solution that does not satisfy these conditions, then we do not have such a mixed strategy Nash equilibrium.
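The support-by-support conditions can also be verified mechanically: a profile is a Nash equilibrium iff every pure strategy in each player's support earns the maximal expected payoff against the opponent's mix. A sketch (`is_nash` is a hypothetical helper, not from the slides):

```python
# Theorem-4-style verification for the 2x3 game of Exercise 138.1.
U1 = [[2, 0, 1], [3, 1, 0]]  # u1 for rows T, B against columns L, M, R
U2 = [[2, 3, 3], [2, 1, 2]]  # u2 for rows T, B against columns L, M, R

def is_nash(p, q, eps=1e-9):
    e1 = [sum(U1[i][j] * q[j] for j in range(3)) for i in range(2)]
    e2 = [sum(p[i] * U2[i][j] for i in range(2)) for j in range(3)]
    ok1 = all(e1[i] >= max(e1) - eps for i in range(2) if p[i] > 0)
    ok2 = all(e2[j] >= max(e2) - eps for j in range(3) if q[j] > 0)
    return ok1 and ok2

assert is_nash([0, 1], [1, 0, 0])          # pure profile (B, L)
assert is_nash([1, 0], [0, 0, 1])          # pure profile (T, R)
assert is_nash([1, 0], [0, 0.5, 0.5])      # T against a mix of M and R
assert not is_nash([0.5, 0.5], [1, 0, 0])  # fails: B is strictly better vs L
```

Besides the two pure equilibria, profiles such as (T, (0, q, 1−q)) with q ≤ 1/2 pass the check, so the game has a continuum of equilibria in which one player mixes.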
June 3, 2003
73-347 Game Theory--Lecture 11
3
Exercise 138.1 of Osborne
Player 1 \ Player 2    L (p21)    M (p22)    R (p23)
T (p11)                2, 2       0, 3       1, 3
B (p12)                3, 2       1, 1       0, 2
We first consider pure-strategy Nash equilibria. How many can you find?
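The question on this slide can be answered by brute force: check every pure strategy profile for mutual best responses. A minimal sketch:

```python
# Brute-force search for pure-strategy Nash equilibria of the 2x3 game.
U1 = [[2, 0, 1], [3, 1, 0]]  # u1 for rows T, B against columns L, M, R
U2 = [[2, 3, 3], [2, 1, 2]]  # u2 for rows T, B against columns L, M, R

pure_ne = []
for i in range(2):
    for j in range(3):
        row_best = all(U1[i][j] >= U1[k][j] for k in range(2))
        col_best = all(U2[i][j] >= U2[i][l] for l in range(3))
        if row_best and col_best:
            pure_ne.append(("TB"[i], "LMR"[j]))

print(pure_ne)  # [('T', 'R'), ('B', 'L')]
```

The check finds two pure-strategy Nash equilibria, (T, R) and (B, L).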
Exercise 138.1 of Osborne
Case 2: check whether there is a mixed strategy in which p11>0, p12>0, p21>0, p22>0, p23=0. By Theorem 4, we should have
2p21 + 0p22 + 1p23 = 3p21 + 1p22 + 0p23, and p21 + p22 + p23 = 1 (here with p23 = 0).
Solve these equations. If we can get a solution that satisfies the required signs, then we have a mixed strategy Nash equilibrium. Otherwise, if we cannot find a solution, or we find a solution that violates them, then we do not have such a mixed strategy Nash equilibrium.
Case 1: check whether there is a mixed strategy in which p11>0, p12>0, p21>0, p22>0, p23>0. By Theorem 4, we should have
2p11 + 2p12 = 3p11 + 1p12 = 3p11 + 2p12, and p11 + p12 = 1.
Case 3: check whether there is a mixed strategy in which p11>0, p12>0, p21>0, p22=0, p23>0. By Theorem 4, we should have
2p11 + 2p12 = 3p11 + 2p12 ≥ 3p11 + 1p12, and p11 + p12 = 1,
and 2p21 + 0p22 + 1p23 = 3p21 + 1p22 + 0p23, and p21 + p22 + p23 = 1 (here with p22 = 0).
Introduction to games
Normal-form (or strategic-form) representation
Iterated elimination of strictly dominated strategies
Nash equilibrium
Review of concave functions, optimization
Applications of Nash equilibrium
Mixed strategy Nash equilibrium
If we find a solution that does not satisfy p11>0, p12>0, p21>0, p22>0, p23=0, then we do not have such a mixed strategy Nash equilibrium.
Exercise 138.1 of Osborne
In order to find all Nash equilibria, we need to consider 15 more cases by Theorem 4 of Lecture 10!
We first consider complicated cases. Some cases are very easy.
Today’s Agenda
Review of previous class
Exercise 138.1 of Osborne
Review HW1
Static (or Simultaneous-Move) Games of Complete Information
Mixed Strategy Nash Equilibrium
Outline of Static Games of Complete Information