Lecture 21 - Repeated Games: Cooperation vs. the End Game


Lecture 6.2: Repeated Games (II): The Matching Pennies Game as an Example

Rational players with foresight all know the outcome of the last stage, so cooperation is impossible in that stage as well.

                   Guess heads   Guess tails
Cover heads           -1, 1         1, -1
Cover tails            1, -1       -1, 1

(Payoffs: covering player, guessing player.)
The finitely repeated matching pennies game

And so on, stage by stage.

Conclusion: throughout the finitely repeated version of this zero-sum game, the players' only choice is to play the mixed-strategy Nash equilibrium of the stage game in every repetition.

The last repetition is the original zero-sum game itself, and no further play follows it, so the players have neither any opportunity nor any need to cooperate; playing the mixed-strategy Nash equilibrium of the stage game is the only reasonable choice.
                   Guess heads   Guess tails
Cover heads           -1, 1         1, -1
Cover tails            1, -1       -1, 1

(Payoffs: covering player, guessing player.)
The finitely repeated matching pennies game

Working backward to the second-to-last stage:
Lecture 6: Repeated Games (II)
── The Matching Pennies Game as an Example
Strictly competitive games

In game theory, a game in which the players' interests and preferences are strictly opposed and which has no pure-strategy Nash equilibrium is usually called a "strictly competitive game". The matching pennies game is an example.
The matching pennies game

                   Guess heads   Guess tails
Cover heads           -1, 1         1, -1
Cover tails            1, -1       -1, 1

(Payoffs: covering player, guessing player.)
The infinitely repeated matching pennies game

The infinitely repeated game whose stage game is the matching pennies game.

Backward induction no longer applies, since an infinitely repeated game has no last stage from which to start the induction.

[Intuitive judgment] Repeating the game an unlimited number of times neither changes the opposition of interests between the players in the stage game nor creates any potential gains from cooperation. In the infinitely repeated version of this game, therefore, the players still act on their current best interest in every repetition and play the mixed-strategy Nash equilibrium of the stage game.
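The claim that the stage game's only equilibrium is the half-half mix can be checked directly: at those probabilities each side is exactly indifferent between its two actions. A minimal sketch (the player labels and helper names are mine, not from the slides):

```python
from fractions import Fraction

# Matching-pennies stage game: rows = coverer (盖硬币方), cols = guesser (猜硬币方).
# Entries are (coverer payoff, guesser payoff), as in the table above.
payoff = {
    ("H", "H"): (-1, 1), ("H", "T"): (1, -1),
    ("T", "H"): (1, -1), ("T", "T"): (-1, 1),
}

def guesser_value(p_cover_heads, guess):
    """Guesser's expected payoff when the coverer plays heads with prob. p."""
    p = p_cover_heads
    return p * payoff[("H", guess)][1] + (1 - p) * payoff[("T", guess)][1]

half = Fraction(1, 2)
# At p = 1/2 the guesser is indifferent between guessing heads and tails,
# so (1/2, 1/2) against (1/2, 1/2) is the mixed-strategy Nash equilibrium.
assert guesser_value(half, "H") == guesser_value(half, "T") == 0
# Any other mix breaks the indifference and invites exploitation:
assert guesser_value(Fraction(3, 4), "H") != guesser_value(Fraction(3, 4), "T")
```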

repeated games 1
The problem of cooperation
Finitely-repeated prisoner's dilemma
Infinitely-repeated games and cooperation
Folk Theorems
Reference: Fudenberg and Tirole, Section 5.1.
Game Theory: Lecture 15
Infinitely-Repeated Games
Now consider the infinitely-repeated game G∞, i.e., players play the game repeatedly at times t = 0, 1, 2, ...
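The excerpt does not fix a payoff criterion for G∞ at this point; a common choice, sketched below under that assumption, is the discounted sum of stage payoffs with discount factor δ ∈ (0, 1), sometimes normalized by (1 − δ):

```python
def discounted_payoff(stage_payoffs, delta):
    """Discounted sum: sum over t of delta**t * g_t, for t = 0, 1, ..."""
    return sum((delta ** t) * g for t, g in enumerate(stage_payoffs))

def average_payoff(stage_payoffs, delta):
    """Normalized ("average discounted") version: (1 - delta) times the sum."""
    return (1 - delta) * discounted_payoff(stage_payoffs, delta)

# A constant stream of stage payoff 1 has average payoff approaching 1:
stream = [1] * 10_000
assert discounted_payoff([1, 1, 1], 0.5) == 1.75   # 1 + 0.5 + 0.25
assert abs(average_payoff(stream, 0.9) - 1) < 1e-6
```

The normalization makes repeated-game payoffs directly comparable to stage-game payoffs, which is convenient when stating folk theorems.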
Introduction
Finitely-Repeated Prisoners’ Dilemma (continued)
In the last period, "defect" is a dominant strategy regardless of the history of the game. So the subgame starting at T has a dominant strategy equilibrium: (D, D). Then move to stage T − 1. By backward induction, we know that at T − 1 play at stage T will be (D, D) no matter what happens now, so "defect" is again a dominant strategy there. Iterating this argument back to the first stage, the unique subgame perfect equilibrium has both players defect in every period.
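The induction can be sketched numerically. The stage payoffs below are assumed for illustration (the excerpt does not give them), and the helper names are mine:

```python
# Assumed prisoners'-dilemma stage payoffs to the row player:
# g[(own action, opponent action)], with T > R > P > S.
g = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def dominant_action():
    """Return the strictly dominant stage-game action, if one exists."""
    for own in ("C", "D"):
        rival = "D" if own == "C" else "C"
        if all(g[(own, other)] > g[(rival, other)] for other in ("C", "D")):
            return own
    return None

def spe_path(T):
    """Backward induction: the final stage has a dominant-strategy
    equilibrium, so every earlier stage inherits it; the unique subgame
    perfect equilibrium plays it in all T periods."""
    return [dominant_action()] * T

assert dominant_action() == "D"
assert spe_path(5) == ["D"] * 5
```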
Mathematical Model
More formally, imagine that I players play a strategic form game G = ⟨I, (A_i)_{i∈I}, (g_i)_{i∈I}⟩ for T periods. At each period, the outcomes of all past periods are observed by all players ⇒ perfect monitoring. Let us start with the case in which T is finite, but we will be particularly interested in the case in which T = ∞. Here A_i denotes the set of actions at each stage, and g_i : A → R, where A = A_1 × · · · × A_I. That is, g_i(a_i^t, a_{−i}^t) is the stage payoff to player i when the action profile a^t = (a_i^t, a_{−i}^t) is played at stage t.

Mathematical Models: Game Theory

Mathematical Models: Game Theory
Mark Fey, University of Rochester

This course is designed to teach graduate students in political science the tools of game theory. The course will cover the standard group of essential concepts and some additional topics that are particularly important in formal theory. In addition, we will cover some specific applications of game theory in political science.

Students should have, at a minimum, a mathematical background of algebra (solving equations, graphing functions, etc.) and a basic knowledge of probability. Development of the theory does not require calculus, although some applications will be presented that utilize basic calculus (i.e., derivatives). A very brief summary of the necessary mathematics is in Appendix One of Morrow.

Game theory, as with most mathematical topics, is best learned by doing, rather than reading. Thus, an important part of the course will be the problem sets covering the lecture material and readings. These problem sets will count for 60% of the final grade, and a take-home exam at the end of the course will count for 40% of the final grade. Solutions to the problem sets will be covered in class. Auditors are welcome, and those who complete the problem sets and keep up with the lectures and reading will be entitled to seek help with problems and with the material.

There are two required texts for the course:
James Morrow, 1995. Game Theory for Political Scientists. Princeton University Press.
Robert Gibbons, 1992. Game Theory for Applied Economists. Princeton University Press.
Other readings will be made available to you for photocopying.

Schedule
June 27: Introduction, Basic Assumptions of Rational Choice (Morrow: Chs. 1-2)
June 28: Decision Theory, Optimization
June 29: Representing Games, Strategic Form Games (Morrow: Chs. 3-4)
June 30: Strategic Form Games
July 3 & 4: No class, but lots of fireworks!
July 5: Strategic Form Games, Dominance (Gibbons: Sec. 1.1)
July 6: Nash Equilibrium, Mixed Strategies (Gibbons: Sec. 1.3)
July 7: Zero-sum Games, Applications
July 10: Extensive Form Games, Backwards Induction (Morrow: Ch. 5)
July 11: Subgame Perfection, Forward Induction (Gibbons: Ch. 2)
July 12: Bayesian Games, Bayesian Equilibrium (Morrow: Ch. 6)
July 13: Bayesian Equilibrium (Gibbons: Ch. 3)
July 14: Perfect Bayesian Equilibrium and Sequential Equilibrium (Morrow: Ch. 7; Gibbons: Sec. 4.1)
July 17: Sequential Equilibrium
July 18: Signaling Games (Morrow: Ch. 8; Gibbons: Sec. 4.2)
July 19: Cheap Talk Games (Gibbons: Sec. 4.3)
July 20: Repeated Games (Morrow: Ch. 9)
July 21: Applications, Wrap-up

Elements of Decision-Making

GAME THEORY:
KEY ELEMENTS
(Brandenburger and Nalebuff, 1995)
According to game theory there are five main elements of the game:
• Players – customers, suppliers, competitors, employees etc.
• Added Value – what each player brings to the game
• Rules – gives structure to the game
• Strategies – moves used to shape the game and how it is perceived
• Payoffs – political, social and economic
• TV programming
• Airlines flying schedule
• Politics – Left vs Right
ONE STAGE GAMES
• Easiest game: 2 players, 1 stage, such as the coordination game.
• Nash equilibrium – each player is making an optimal choice given the other player's choice
• In one stage games the key to how co-ordination is achieved centres on whether decision-making is:
• simultaneous - lack of information makes co-ordination more difficult to attain
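The Nash condition just stated (each player making an optimal choice given the other's choice) can be checked mechanically for a small coordination game. The payoff numbers below are assumed for illustration, not taken from the text:

```python
from itertools import product

# Assumed 2x2 coordination game: matching is better than miscoordinating.
payoffs = {("A", "A"): (2, 2), ("A", "B"): (0, 0),
           ("B", "A"): (0, 0), ("B", "B"): (1, 1)}
actions = ("A", "B")

def is_pure_nash(profile):
    """True if each player's action is a best response to the other's."""
    a1, a2 = profile
    br1 = max(actions, key=lambda x: payoffs[(x, a2)][0])
    br2 = max(actions, key=lambda x: payoffs[(a1, x)][1])
    return (payoffs[(a1, a2)][0] == payoffs[(br1, a2)][0]
            and payoffs[(a1, a2)][1] == payoffs[(a1, br2)][1])

equilibria = [p for p in product(actions, repeat=2) if is_pure_nash(p)]
# Both coordination outcomes are equilibria; miscoordination is not.
assert equilibria == [("A", "A"), ("B", "B")]
```

That there are two pure equilibria is exactly why coordination (simultaneous vs. sequential decision-making) matters in these games.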

Cambridge Young Learners English Starter Level, Unit 16 Word Game
The game is based on the Cambridge English curriculum, which is internationally recognized and widely used in schools around the world
By playing this game, children can improve their word recognition, spelling, and pronunciation, as well as their understanding of grammar and sentence structure
The main objective of the game is to help children learn and practice new words and phrases related to the theme of the unit
The game encourages active participation and collaboration among players, promoting a positive learning environment
All other players are "Students" who must work together to correctly use the words and phrases drawn from the word cards
Scoring mechanism and reward and punishment measures
Player Character Setting and Responsibility Allocation

lecture17

[Payoff-matrix fragment: entries (-0.5, -0.5), (-K, -K), (-K, -K), (-K, -K).]
73-347 Game Theory--Lecture 17
Find subgame perfect Nash equilibria: backward induction
[Game-tree figure: Player 1 chooses L or R; Player 2 then chooses L' or R' (and L'' or R'' at later nodes); some terminal payoffs equal 3.]
What actions are available? What do players know when they move? Players' payoffs are determined by their choices. All these are common knowledge among the players.
June 12, 2003
Dynamic games of complete information
Perfect information
Perfect information: a player knows who has made what choices whenever she has an opportunity to make a choice.
Imperfect information: a player may not know exactly who has made what choices when she has an opportunity to make a choice.
Perfect information and imperfect information
A dynamic game in which every information set contains a single decision node is a game of perfect information; otherwise it is a game of imperfect information.

Lecture-4-Extensive-Games

Each player is informed of the history of what has happened so far, up to the point where it is her turn to move.
• An extensive game with perfect information consists of a set of players, a set of terminal histories, a player function (assigning a player to every nonterminal history), and the players' preferences over terminal histories.
Extensive Games
Subgame Perfect Equilibrium
Backward Induction
Illustrations
Extensions and Controversies
Nash equilibrium in extensive games: example • So the following extensive game
Definition • Perfect information (完美信息): each player is perfectly informed of the history of what has happened so far.
• An example: A challenger decides whether or not to enter (a
market); if the challenger enters, the incumbent decides to fight or acquiesce.
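The entry example can be solved by backward induction. The payoff numbers below are assumed for illustration (standard textbook values, not given in this excerpt):

```python
# Entry game sketch: terminal payoffs as (challenger, incumbent).
# Assumed numbers: staying out leaves the incumbent its monopoly profit.
OUT = (1, 2)
ACQUIESCE = (2, 1)
FIGHT = (0, 0)

def incumbent_best_reply():
    """After entry, the incumbent compares its own payoffs only."""
    return ACQUIESCE if ACQUIESCE[1] > FIGHT[1] else FIGHT

def backward_induction():
    after_entry = incumbent_best_reply()   # acquiescing beats fighting
    # The challenger anticipates that reply and compares its payoffs.
    return ("In", after_entry) if after_entry[0] > OUT[0] else ("Out", OUT)

# The challenger enters and the incumbent acquiesces.
assert backward_induction() == ("In", ACQUIESCE)
```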

lecture23 (Game Theory lecture notes, Carnegie Mellon University)

June 20, 2003, 73-347 Game Theory--Lecture 23
Cournot duopoly model of incomplete information (version one) cont'd
Firm 2's marginal cost depends on some factor (e.g. technology) that only firm 2 knows. Its marginal cost can be HIGH (cH) or LOW (cL); firm 1 knows only the probabilities of the two cases.
Firm 1 chooses q1*, which is its best response to firm 2's (q2*(cH), q2*(cL)) (and the probability distribution over firm 2's cost).
If firm 2's marginal cost is HIGH, then firm 2 chooses q2*(cH), which is its best response to firm 1's q1*.
If firm 2's marginal cost is LOW, then firm 2 chooses q2*(cL), which is its best response to firm 1's q1*.
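These three mutual best-response conditions pin down the Bayesian Nash equilibrium. A sketch under an assumed linear inverse demand P = a − Q and illustrative cost numbers (none of these constants come from the slides):

```python
# Bayesian Cournot sketch, ASSUMED inverse demand P = a - (q1 + q2).
a, c1 = 10.0, 1.0                 # demand intercept, firm 1's marginal cost
cH, cL, theta = 4.0, 2.0, 0.5     # firm 2's two costs and P(cost = cH)

# Iterate the three best responses to their fixed point.
q1, qH, qL = 1.0, 1.0, 1.0
for _ in range(200):
    q1 = (a - c1 - (theta * qH + (1 - theta) * qL)) / 2  # vs expected q2
    qH = (a - cH - q1) / 2        # firm 2's reply when its cost is high
    qL = (a - cL - q1) / 2        # firm 2's reply when its cost is low

# At the fixed point all three conditions hold simultaneously,
# and the low-cost type produces more than the high-cost type.
assert abs(q1 - (a - c1 - (theta * qH + (1 - theta) * qL)) / 2) < 1e-9
assert abs(qH - (a - cH - q1) / 2) < 1e-9
assert qL > qH
```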
Static (or simultaneous-move) games of complete information
A set of players (at least two players).
For each player, a set of strategies/actions.
Payoffs received by each player for each combination of the strategies (or, for each player, preferences over the combinations of the strategies).
All these are common knowledge among all the players.

2023 Beijing RDFZ Grade 9 Back-to-School English Exam (Teacher's Edition)

2023 Beijing RDFZ (the High School Affiliated to Renmin University of China) Grade 9 Back-to-School English Exam, September 2023

Part One. Listening: multiple choice (9 points, 1.5 points each). Listen to each dialogue or monologue and choose the best answer from the three options A, B and C. Each recording will be played twice.

Listen to a dialogue and answer questions 1 and 2.

1. What does the man think of Sichuan hotpot?
   A. It's too cold.  B. It's spicy.  C. It's special.
2. What Chinese food do the woman's family like?
   A. Cantonese dishes.  B. Sichuan dishes.  C. Sushi and sashimi.

Listen to a dialogue and answer questions 3 and 4.

3. What is the relationship between the two speakers?
   A. Husband and wife.  B. Workmates.  C. Mother and son.
4. What will the speakers do first?
   A. Get some medicine for the man's mother.
   B. Buy a chicken in the supermarket.
   C. Have a look at the new clothes in Blue Moon.

Listen to a monologue and answer questions 5 and 6.

5. What can we know from the speaker?
   A. Group 2 is for students in Grade 7.
   B. Students should tell the stories within 15 minutes.
   C. The 2nd prize winners will get 10 classic novels.
6. Why does the speaker make the speech?
   A. To tell the students the result of the competition.
   B. To introduce the information of the competition.
   C. To do a report on the competition last month.

Part Two. Listening: written answers (12 points, 2 points each). Listen to the dialogue and answer the questions in writing.

Lecture 6.1: Repeated Games (I): Basic Theory

Besides the number of repetitions, another important factor affecting the equilibrium outcome is the completeness of information. Simply put, when one player's payoff function (characteristics) is not known to the other players, that player may have an incentive to build a "good" reputation in exchange for long-run gains. This may perhaps explain why even people who are not good by nature do good deeds over fairly long periods of time. (Zhang Weiying, 2012, p. 124)
[Game tree of the lending game: Player A (甲) chooses to lend (借) or not lend (不借); not lending ends the game with payoffs (1, 0). If A lends, Player B (乙) chooses to split (分), ending with (2, 2), or not split (不分); after "not split", A chooses to fight (打), ending with (-1, 0), or not fight (不打), ending with (0, 4).]
Repeated Game (重复博弈)

A "repeated game" is a game in which a game of the same structure is played many times; each round of play is called a "stage game". (Zhang Weiying, 2012, p. 123)
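A repeated game in this sense can be sketched as a stage game played T times, with strategies allowed to condition on the observed history. All payoff numbers and strategy names below are illustrative assumptions, not from the text:

```python
# Assumed prisoners'-dilemma stage payoffs: g[(action1, action2)] = (u1, u2).
g = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
     ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_defect(history, me):
    return "D"

def tit_for_tat(history, me):
    """Cooperate first; afterwards copy the opponent's last action."""
    return "C" if not history else history[-1][1 - me]

def play(strat0, strat1, T):
    """Play the stage game T times; strategies see the full history."""
    history, totals = [], [0, 0]
    for _ in range(T):
        profile = (strat0(history, 0), strat1(history, 1))
        history.append(profile)
        totals[0] += g[profile][0]
        totals[1] += g[profile][1]
    return history, totals

hist, totals = play(tit_for_tat, always_defect, 4)
# Tit-for-tat is exploited once, then both defect in every later stage.
assert hist == [("C", "D"), ("D", "D"), ("D", "D"), ("D", "D")]
assert totals == [3, 8]
```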
Typical repeated games

Multi-round competitions in sports, e.g. table tennis matches and boxing matches.

Common feature: the same players repeatedly play a game under exactly the same circumstances and rules.
Typical repeated games (continued)

The same players repeatedly playing a game under exactly the same circumstances and rules:
• Repeat business from customers in commerce
• Long-run cooperation or competition between firms
Repeated Game (重复博弈)
Lecture 6: Repeated Games (I)
── Basic Theory
Sequential Game (序贯博弈)

A player's choice of action at an earlier stage determines the structure of the subgame that follows, so the subgame starting from a later decision node differs from the subgame starting from an earlier decision node; in other words, a subgame of any given structure appears only once. A dynamic game of this kind is called a "sequential game". (Zhang Weiying, 2012, p. 123)

lecture21: Dynamic Games of Complete and Imperfect Information (Game Theory, Carnegie Mellon University)
June 18, 2003, 73-347 Game Theory--Lecture 21
Trigger strategy: step 1
[Table: if both players follow the trigger strategy, the stage payoffs are (R1, R2) in every stage 1, 2, ...; if player 2 follows the trigger strategy through stage t-1 and deviates at stage t, the payoffs are (R1, R2) only in stages 1 through t-1.]
We use best response to do step 1: Suppose that player 1 plays the trigger strategy. Suppose that player 2 plays the trigger strategy up to stage t-1. Can player 2 be better-off if she deviates from the trigger strategy at stage t?
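Step 1's comparison can be made concrete: deviating at stage t trades a one-shot gain for the punishment payoff ever after, so it pays only when the discount factor is small. The payoff values below are assumed for illustration (the slides' R1, R2 are symbolic):

```python
# Assumed per-stage payoffs for player 2, with D > R > P:
# R while both cooperate, D in the stage she deviates, P (stage-game Nash)
# in every stage after the trigger is pulled.
R, D, P = 4.0, 6.0, 1.0

def cooperation_value(delta):
    return R / (1 - delta)

def deviation_value(delta):
    return D + delta * P / (1 - delta)

def trigger_sustains(delta):
    """Deviation is unprofitable iff R/(1-d) >= D + d*P/(1-d),
    i.e. iff delta >= (D - R) / (D - P)."""
    return cooperation_value(delta) >= deviation_value(delta)

threshold = (D - R) / (D - P)      # 0.4 with these assumed numbers
assert not trigger_sustains(0.3)   # too impatient: deviating pays
assert trigger_sustains(0.5)       # patient enough: trigger sustains R
```

Because the continuation game after any history is the same infinitely repeated game, the same comparison applies at every stage t, which is why checking a single deviation suffices.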

Dynamic games of complete information
• Extensive-form representation
• Dynamic games of complete and perfect information
• Game tree
• Subgame-perfect Nash equilibrium
• Backward induction
• Applications
• Dynamic games of complete and imperfect information
• More applications
• Repeated games

Hence, it is a subgame perfect Nash equilibrium.

The guide's discussion of cooperative games

English answer:

Cooperative games, also known as co-op games, are a type of game where players work together towards a common goal. Unlike competitive games where players compete against each other, cooperative games encourage collaboration and teamwork. In these games, players must communicate, strategize, and coordinate their actions in order to succeed.

One of the key elements of cooperative games is the shared objective. All players have a common goal that they must work towards achieving. This can be anything from solving a puzzle, completing a mission, or surviving against a common enemy. The shared objective creates a sense of unity among the players and fosters a cooperative mindset.

Communication is crucial in cooperative games. Players must constantly communicate with each other to share information, coordinate their actions, and plan strategies. Effective communication can greatly enhance the team's chances of success. It allows players to share their ideas, discuss different approaches, and make informed decisions together.

Another important aspect of cooperative games is the distribution of roles and responsibilities. Each player may have a unique role or set of abilities that contribute to the team's overall success. For example, one player may be responsible for healing, while another player focuses on dealing damage. By assigning specific roles, players can maximize their efficiency and complement each other's strengths and weaknesses.

Cooperative games also often incorporate elements of challenge and difficulty. The game may present obstacles or enemies that require the collective effort of the team to overcome. This can create a sense of excitement and accomplishment when the team successfully overcomes these challenges together.

Cooperative games can be enjoyed by players of all skill levels and ages. They provide a platform for players to work together, build relationships, and develop problem-solving and communication skills. These games can also be a great way to foster teamwork and cooperation in various settings, such as classrooms, workplaces, or family gatherings.

Chinese answer: Cooperative games, also called co-op games, are a type of game in which players work together to reach a common goal.

The guide's discussion of cooperative games

English answer:

Cooperative Gameplay.

Cooperative gameplay, also known as co-op, allows multiple players to work together to achieve a common goal. This mode of play can be found in a variety of genres, from first-person shooters to role-playing games. In cooperative games, players must coordinate their actions and communicate effectively in order to succeed. This can lead to a more challenging and engaging experience than playing alone.

There are many benefits to cooperative gameplay. First, it allows players to socialize with others and build relationships. Second, it can help players learn new skills and strategies. Third, it can provide a more challenging and rewarding experience than playing alone.

There are a few things to keep in mind when playing cooperative games. First, it is important to communicate effectively with your teammates. Second, it is important to be patient and understanding. Third, it is important to be willing to compromise.

Cooperative gameplay can be a great way to experience video games with friends. It can be a challenging and rewarding experience that can help players build relationships and learn new skills.

Chinese answer: Cooperative mode.

Game Theory 2

GAME THEORY
Thomas S. Ferguson

Part II. Two-Person Zero-Sum Games

1. The Strategic Form of a Game.
 1.1 Strategic Form.
 1.2 Example: Odd or Even.
 1.3 Pure Strategies and Mixed Strategies.
 1.4 The Minimax Theorem.
 1.5 Exercises.
2. Matrix Games. Domination.
 2.1 Saddle Points.
 2.2 Solution of All 2 by 2 Matrix Games.
 2.3 Removing Dominated Strategies.
 2.4 Solving 2 × n and m × 2 Games.
 2.5 Latin Square Games.
 2.6 Exercises.
3. The Principle of Indifference.
 3.1 The Equilibrium Theorem.
 3.2 Nonsingular Game Matrices.
 3.3 Diagonal Games.
 3.4 Triangular Games.
 3.5 Symmetric Games.
 3.6 Invariance.
 3.7 Exercises.
4. Solving Finite Games.
 4.1 Best Responses.
 4.2 Upper and Lower Values of a Game.
 4.3 Invariance Under Change of Location and Scale.
 4.4 Reduction to a Linear Programming Problem.
 4.5 Description of the Pivot Method for Solving Games.
 4.6 A Numerical Example.
 4.7 Exercises.
5. The Extensive Form of a Game.
 5.1 The Game Tree.
 5.2 Basic Endgame in Poker.
 5.3 The Kuhn Tree.
 5.4 The Representation of a Strategic Form Game in Extensive Form.
 5.5 Reduction of a Game in Extensive Form to Strategic Form.
 5.6 Example.
 5.7 Games of Perfect Information.
 5.8 Behavioral Strategies.
 5.9 Exercises.
6. Recursive and Stochastic Games.
 6.1 Matrix Games with Games as Components.
 6.2 Multistage Games.
 6.3 Recursive Games. ε-Optimal Strategies.
 6.4 Stochastic Movement Among Games.
 6.5 Stochastic Games.
 6.6 Approximating the Solution.
 6.7 Exercises.
7. Continuous Poker Models.
 7.1 La Relance.
 7.2 The von Neumann Model.
 7.3 Other Models.
 7.4 Exercises.
References.

Part II. Two-Person Zero-Sum Games

1. The Strategic Form of a Game.

The individual most closely associated with the creation of the theory of games is John von Neumann, one of the greatest mathematicians of this century. Although others preceded him in formulating a theory of games, notably Émile Borel, it was von Neumann who published in 1928 the paper that laid the foundation for the theory of two-person zero-sum games. Von Neumann's work culminated in a fundamental book on game theory written in collaboration with Oskar Morgenstern entitled Theory of Games and Economic Behavior, 1944. Other more current books on the theory of games may be found in the text book, Game Theory by Guillermo Owen, 2nd edition, Academic Press, 1982, and the expository book, Game Theory and Strategy by Philip D. Straffin, published by the Mathematical Association of America, 1993.

The theory of von Neumann and Morgenstern is most complete for the class of games called two-person zero-sum games, i.e. games with only two players in which one player wins what the other player loses. In Part II, we restrict attention to such games. We will refer to the players as Player I and Player II.

1.1 Strategic Form. The simplest mathematical description of a game is the strategic form, mentioned in the introduction. For a two-person zero-sum game, the payoff function of Player II is the negative of the payoff of Player I, so we may restrict attention to the single payoff function of Player I, which we call here A.

Definition 1. The strategic form, or normal form, of a two-person zero-sum game is given by a triplet (X, Y, A), where
(1) X is a nonempty set, the set of strategies of Player I,
(2) Y is a nonempty set, the set of strategies of Player II,
(3) A is a real-valued function defined on X × Y. (Thus, A(x, y) is a real number for every
x ∈ X and every y ∈ Y.)

The interpretation is as follows. Simultaneously, Player I chooses x ∈ X and Player II chooses y ∈ Y, each unaware of the choice of the other. Then their choices are made known and I wins the amount A(x, y) from II. Depending on the monetary unit involved, A(x, y) will be cents, dollars, pesos, beads, etc. If A is negative, I pays the absolute value of this amount to II. Thus, A(x, y) represents the winnings of I and the losses of II.

This is a very simple definition of a game; yet it is broad enough to encompass the finite combinatorial games and games such as tic-tac-toe and chess. This is done by being sufficiently broadminded about the definition of a strategy. A strategy for a game of chess, for example, is a complete description of how to play the game, of what move to make in every possible situation that could occur. It is rather time-consuming to write down even one strategy, good or bad, for the game of chess. However, several different programs for instructing a machine to play chess well have been written. Each program constitutes one strategy. The program Deep Blue, that beat then world chess champion Gary Kasparov in a match in 1997, represents one strategy. The set of all such strategies for Player I is denoted by X. Naturally, in the game of chess it is physically impossible to describe all possible strategies since there are too many; in fact, there are more strategies than there are atoms in the known universe. On the other hand, the number of games of tic-tac-toe is rather small, so that it is possible to study all strategies and find an optimal strategy for each player. Later, when we study the extensive form of a game, we will see that many other types of games may be modeled and described in strategic form.

To illustrate the notions involved in games, let us consider the simplest non-trivial case when both X and Y consist of two elements. As an example, take the game called Odd-or-Even.

1.2 Example: Odd or Even. Players I and II simultaneously call out one of the numbers one or two. Player I's name is Odd; he wins if the sum of the numbers is odd. Player II's name is Even; she wins if the sum of the numbers is even. The amount paid to the winner by the loser is always the sum of the numbers in dollars. To put this game in strategic form we must specify X, Y and A. Here we may choose X = {1, 2}, Y = {1, 2}, and A as given in the following table.

                II (even) y
                  1     2
I (odd) x   1    −2    +3
            2    +3    −4

A(x, y) = I's winnings = II's losses.

It turns out that one of the players has a distinct advantage in this game. Can you tell which one it is?

Let us analyze this game from Player I's point of view. Suppose he calls 'one' 3/5ths of the time and 'two' 2/5ths of the time at random. In this case,
1. If II calls 'one', I loses 2 dollars 3/5ths of the time and wins 3 dollars 2/5ths of the time; on the average, he wins −2(3/5) + 3(2/5) = 0 (he breaks even in the long run).
2. If II calls 'two', I wins 3 dollars 3/5ths of the time and loses 4 dollars 2/5ths of the time; on the average he wins 3(3/5) − 4(2/5) = 1/5.

That is, if I mixes his choices in the given way, the game is even every time II calls 'one', but I wins 20¢ on the average every time II calls 'two'. By employing this simple strategy, I is assured of at least breaking even on the average no matter what II does. Can Player I fix it so that he wins a positive amount no matter what II calls?

Let p denote the proportion of times that Player I calls 'one'. Let us try to choose p so that Player I wins the same amount on the average whether II calls 'one' or 'two'. Then since I's average winnings when II calls 'one' is −2p + 3(1 − p), and his average winnings when II calls 'two' is 3p − 4(1 − p), Player I should choose p so that

−2p + 3(1 − p) = 3p − 4(1 − p)
3 − 5p = 7p − 4
12p = 7
p = 7/12.

Hence, I should call 'one' with probability 7/12, and 'two' with probability 5/12. On the average, I wins −2(7/12) + 3(5/12) = 1/12, or 8 1/3 cents every time he plays the game, no matter what II does. Such a strategy that produces the same average winnings no matter what the opponent does is called an equalizing strategy.

Therefore, the game is clearly in I's favor. Can he do better
than 8 1/3 cents per game on the average? The answer is: Not if II plays properly. In fact, II could use the same procedure: call 'one' with probability 7/12, call 'two' with probability 5/12. If I calls 'one', II's average loss is −2(7/12) + 3(5/12) = 1/12. If I calls 'two', II's average loss is 3(7/12) − 4(5/12) = 1/12.

Hence, I has a procedure that guarantees him at least 1/12 on the average, and II has a procedure that keeps her average loss to at most 1/12. 1/12 is called the value of the game, and the procedure each uses to insure this return is called an optimal strategy or a minimax strategy.

If instead of playing the game, the players agree to call in an arbitrator to settle this conflict, it seems reasonable that the arbitrator should require II to pay 8 1/3 cents to I. For I could argue that he should receive at least 8 1/3 cents since his optimal strategy guarantees him that much on the average no matter what II does. On the other hand II could argue that he should not have to pay more than 8 1/3 cents since she has a strategy that keeps her average loss to at most that amount no matter what I does.

1.3 Pure Strategies and Mixed Strategies. It is useful to make a distinction between a pure strategy and a mixed strategy. We refer to elements of X or Y as pure strategies. The more complex entity that chooses among the pure strategies at random in various proportions is called a mixed strategy. Thus, I's optimal strategy in the game of Odd-or-Even is a mixed strategy; it mixes the pure strategies one and two with probabilities 7/12 and 5/12 respectively. Of course every pure strategy, x ∈ X, can be considered as the mixed strategy that chooses the pure strategy x with probability 1.

In our analysis, we made a rather subtle assumption. We assumed that when a player uses a mixed strategy, he is only interested in his average return. He does not care about his maximum possible winnings or losses, only the average. This is actually a rather drastic assumption. We are evidently assuming that a player is indifferent between receiving 5 million dollars outright, and receiving 10 million dollars with probability 1/2 and nothing with probability 1/2. I think nearly everyone would prefer the $5,000,000 outright. This is because the utility of having 10 megabucks is not twice the utility of having 5 megabucks.

The main justification for this assumption comes from utility theory and is treated in Appendix 1. The basic premise of utility theory is that one should evaluate a payoff by its utility to the player rather than on its numerical monetary value. Generally a player's utility of money will not be linear in the amount. The main theorem of utility theory states that under certain reasonable assumptions, a player's preferences among outcomes are consistent with the existence of a utility function and the player judges an outcome only on the basis of the average utility of the outcome.

However, utilizing utility theory to justify the above assumption raises a new difficulty. Namely, the two players may have different utility functions. The same outcome may be perceived in quite different ways. This means that the game is no longer zero-sum. We need an assumption that says the utility functions of two players are the same (up to change of location and scale). This is a rather strong assumption, but for moderate to small monetary amounts, we believe it is a reasonable one.

A mixed strategy may be implemented with the aid of a suitable outside random mechanism, such as tossing a coin, rolling dice, drawing a number out of a hat and so on. The seconds indicator of a watch provides a simple personal method of randomization provided it is not used too frequently. For example, Player I of Odd-or-Even wants an outside random event with probability 7/12 to implement his optimal strategy. Since 7/12 = 35/60, he could take a quick glance at his watch; if the seconds indicator showed a number between 0 and 35, he would call 'one', while if it were between 35 and 60, he would call 'two'.

1.4 The Minimax Theorem. A two-person zero-sum game (X, Y, A) is said to be a finite game if both strategy
sets X and Y are finite sets. The fundamental theorem of game theory due to von Neumann states that the situation encountered in the game of Odd-or-Even holds for all finite two-person zero-sum games. Specifically,

The Minimax Theorem. For every finite two-person zero-sum game,
(1) there is a number V, called the value of the game,
(2) there is a mixed strategy for Player I such that I's average gain is at least V no matter what II does, and
(3) there is a mixed strategy for Player II such that II's average loss is at most V no matter what I does.

This is one form of the minimax theorem to be stated more precisely and discussed in greater depth later. If V is zero we say the game is fair. If V is positive, we say the game favors Player I, while if V is negative, we say the game favors Player II.

1.5 Exercises.

1. Consider the game of Odd-or-Even with the sole change that the loser pays the winner the product, rather than the sum, of the numbers chosen (who wins still depends on the sum). Find the table for the payoff function A, and analyze the game to find the value and optimal strategies of the players. Is the game fair?

2. Player I holds a black Ace and a red 8. Player II holds a red 2 and a black 7. The players simultaneously choose a card to play. If the chosen cards are of the same color, Player I wins. Player II wins if the cards are of different colors. The amount won is a number of dollars equal to the number on the winner's card (Ace counts as 1). Set up the payoff function, find the value of the game and the optimal mixed strategies of the players.

3. Sherlock Holmes boards the train from London to Dover in an effort to reach the continent and so escape from Professor Moriarty. Moriarty can take an express train and catch Holmes at Dover. However, there is an intermediate station at Canterbury at which Holmes may detrain to avoid such a disaster. But of course, Moriarty is aware of this too and may himself stop instead at Canterbury. Von Neumann and Morgenstern (loc. cit.) estimate the value to Moriarty of these four possibilities to be given in the following matrix (in some unspecified units).

                         Holmes
                   Canterbury   Dover
Moriarty Canterbury    100       −50
         Dover           0       100

What are the optimal strategies for Holmes and Moriarty, and what is the value? (Historically, as related by Dr. Watson in "The Final Problem" in Arthur Conan Doyle's The Memoires of Sherlock Holmes, Holmes detrained at Canterbury and Moriarty went on to Dover.)

4. The entertaining book The Compleat Strategyst by John Williams contains many simple examples and informative discussion of strategic form games. Here is one of his problems.

"I know a good game," says Alex. "We point fingers at each other; either one finger or two fingers. If we match with one finger, you buy me one Daiquiri. If we match with two fingers, you buy me two Daiquiris. If we don't match I let you off with a payment of a dime. It'll help pass the time."

Olaf appears quite unmoved. "That sounds like a very dull game, at least in its early stages." His eyes glaze on the ceiling for a moment and his lips flutter briefly; he returns to the conversation with: "Now if you'd care to pay me 42 cents before each game, as a partial compensation for all those 55-cent drinks I'll have to buy you, then I'd be happy to pass the time with you."

Olaf could see that the game was inherently unfair to him so he insisted on a side payment as compensation. Does this side payment make the game fair? What are the optimal strategies and the value of the game?

2. Matrix Games. Domination.

A finite two-person zero-sum game in strategic form, (X, Y, A), is sometimes called a matrix game because the payoff function A can be represented by a matrix. If X = {x1, ..., xm} and Y = {y1, ..., yn}, then by the game matrix or payoff matrix we mean the matrix

A = ( a11 ··· a1n )
    (  ·········  )
    ( am1 ··· amn )

where aij = A(xi, yj). In this form, Player I chooses a row, Player II chooses a column, and II pays I the entry in the chosen row and column. Note that the entries of the matrix are the winnings of the row chooser
and losses of the column chooser.A mixed strategy for Player I may be represented by an m -tuple,p =(p 1,p 2,...,p m )of probabilities that add to 1.If I uses the mixed strategy p =(p 1,p 2,...,p m )and II chooses column j ,then the (average)payoffto I is m i =1p i a ij .Similarly,a mixed strategy for Player II is an n -tuple q =(q 1,q 2,...,q n ).If II uses q and I uses row i the payoffto I is n j =1a ij q j .More generally,if I uses the mixed strategy p and II uses the mixed strategy q ,the (average)payoffto I is p T Aq = m i =1 n j =1p i a ij q j .Note that the pure strategy for Player I of choosing row i may be represented as the mixed strategy e i ,the unit vector with a 1in the i th position and 0’s elsewhere.Similarly,the pure strategy for II of choosing the j th column may be represented by e j .In the following,we shall be attempting to ‘solve’games.This means finding the value,and at least one optimal strategy for each player.Occasionally,we shall be interested in finding all optimal strategies for a player.2.1Saddle points.Occasionally it is easy to solve the game.If some entry a ij of the matrix A has the property that(1)a ij is the minimum of the i th row,and(2)a ij is the maximum of the j th column,then we say a ij is a saddle point.If a ij is a saddle point,then Player I can then win at least a ij by choosing row i ,and Player II can keep her loss to at most a ij by choosing column j .Hence a ij is the value of the game.Example 1.A =⎛⎝41−3325016⎞⎠The central entry,2,is a saddle point,since it is a minimum of its row and maximum of its column.Thus it is optimal for I to choose the second row,and for II to choose the second column.The value of the game is 2,and (0,1,0)is an optimal mixed strategy for both players.For large m ×n matrices it is tedious to check each entry of the matrix to see if it has the saddle point property.It is easier to compute the minimum of each row and the maximum of each column to see if there is a match.Here is an example of 
the method.

        ( 3  2  1  0 )  row min: 0        ( 3  1  1  0 )  row min: 0
    A = ( 0  1  2  0 )           0    B = ( 0  1  2  0 )           0
        ( 1  0  2  1 )           0        ( 1  0  2  1 )           0
        ( 3  1  2  2 )           1        ( 3  1  2  2 )           1
    col max: 3  2  2  2               col max: 3  1  2  2

In matrix A, no row minimum is equal to any column maximum, so there is no saddle point. However, if the 2 in position a_12 were changed to a 1, then we have matrix B. Here, the minimum of the fourth row is equal to the maximum of the second column; so b_42 is a saddle point.

2.2 Solution of All 2 by 2 Matrix Games. Consider the general 2 × 2 game matrix

    A = ( a  b )
        ( d  c ).

To solve this game (i.e. to find the value and at least one optimal strategy for each player) we proceed as follows.

1. Test for a saddle point.
2. If there is no saddle point, solve by finding equalizing strategies.

We now prove that the method of finding equalizing strategies of Section 1.2 works whenever there is no saddle point, by deriving the value and the optimal strategies. Assume there is no saddle point. If a ≥ b, then b < c, as otherwise b is a saddle point. Since b < c, we must have c > d, as otherwise c is a saddle point. Continuing thus, we see that d < a and a > b. In other words, if a ≥ b, then a > b < c > d < a. By symmetry, if a ≤ b, then a < b > c < d > a. This shows that:

    If there is no saddle point, then either a > b, b < c, c > d and d < a, or a < b, b > c, c < d and d > a.

In equations (1), (2) and (3) below, we develop formulas for the optimal strategies and value of the general 2 × 2 game. If I chooses the first row with probability p (i.e. uses the mixed strategy (p, 1-p)), we equate his average return when II uses columns 1 and 2:

    ap + d(1-p) = bp + c(1-p).

Solving for p, we find

    p = (c - d) / ((a - b) + (c - d)).        (1)

Since there is no saddle point, (a-b) and (c-d) are either both positive or both negative; hence 0 < p < 1. Player I's average return using this strategy is

    v = ap + d(1-p) = (ac - bd) / (a - b + c - d).

If II chooses the first column with probability q (i.e. uses the strategy (q, 1-q)), we equate his average losses when I uses rows 1 and 2:

    aq + b(1-q) = dq + c(1-q).

Hence,

    q = (c - b) / (a - b + c - d).        (2)

Again, since there is no saddle point, 0 < q < 1. Player II's average loss using this
strategy is

    aq + b(1-q) = (ac - bd) / (a - b + c - d) = v,        (3)

the same value achievable by I. This shows that the game has a value, and that the players have optimal strategies (something the minimax theorem says holds for all finite games).

Example 2.

    A = ( -2   3 )
        (  3  -4 )

    p = (-4 - 3) / (-2 - 3 - 4 - 3) = 7/12
    q = same
    v = (8 - 9) / (-2 - 3 - 4 - 3) = 1/12

Example 3.

    A = ( 0  -10 )
        ( 1    2 )

    p = (2 - 1) / (0 + 10 + 2 - 1) = 1/11
    q = (2 + 10) / (0 + 10 + 2 - 1) = 12/11.

But q must be between zero and one. What happened? The trouble is we "forgot to test this matrix for a saddle point, so of course it has one". (J. D. Williams, The Compleat Strategyst, Revised Edition, 1966, McGraw-Hill, page 56.) The lower left corner is a saddle point. So p = 0 and q = 1 are optimal strategies, and the value is v = 1.

2.3 Removing Dominated Strategies. Sometimes, large matrix games may be reduced in size (hopefully to the 2 × 2 case) by deleting rows and columns that are obviously bad for the player who uses them.

Definition. We say the i-th row of a matrix A = (a_ij) dominates the k-th row if a_ij ≥ a_kj for all j. We say the i-th row of A strictly dominates the k-th row if a_ij > a_kj for all j. Similarly, the j-th column of A dominates (strictly dominates) the k-th column if a_ij ≤ a_ik (resp. a_ij < a_ik) for all i.

Anything Player I can achieve using a dominated row can be achieved at least as well using the row that dominates it. Hence dominated rows may be deleted from the matrix. A similar argument shows that dominated columns may be removed. To be more precise, removal of a dominated row or column does not change the value of a game. However, there may exist an optimal strategy that uses a dominated row or column (see Exercise 9). If so, removal of that row or column will also remove the use of that optimal strategy (although there will still be at least one optimal strategy left). However, in the case of removal of a strictly dominated row or column, the set of optimal strategies does not change.

We may iterate this procedure and successively remove several rows and columns. As an example, consider the matrix A below. The last column is dominated by the middle column. Deleting the last
column we obtain:

    A = ( 2  0  4 )        ( 2  0 )
        ( 1  2  3 )   ->   ( 1  2 )
        ( 4  1  2 )        ( 4  1 )

Now the top row is dominated by the bottom row. (Note this is not the case in the original matrix.) Deleting the top row we obtain:

    ( 1  2 )
    ( 4  1 ).

This 2 × 2 matrix does not have a saddle point, so p = 3/4, q = 1/4 and v = 7/4. I's optimal strategy in the original game is (0, 3/4, 1/4); II's is (1/4, 3/4, 0).

A row (column) may also be removed if it is dominated by a probability combination of other rows (columns). If for some 0 < p < 1, p a_{i1,j} + (1-p) a_{i2,j} ≥ a_{kj} for all j, then the k-th row is dominated by the mixed strategy that chooses row i1 with probability p and row i2 with probability 1-p. Player I can do at least as well using this mixed strategy instead of choosing row k. (In addition, any mixed strategy choosing row k with probability p_k may be replaced by the one in which k's probability is split between i1 and i2. That is, i1's probability is increased by p p_k and i2's probability is increased by (1-p) p_k.) A similar argument may be used for columns.

Consider the matrix

    A = ( 0  4  6 )
        ( 5  7  4 )
        ( 9  6  3 ).

The middle column is dominated by the outside columns taken with probability 1/2 each. With the central column deleted, the middle row is dominated by the combination of the top row with probability 1/3 and the bottom row with probability 2/3. The reduced matrix,

    ( 0  6 )
    ( 9  3 ),

is easily solved. The value is V = 54/12 = 9/2.

Of course, mixtures of more than two rows (columns) may be used to dominate and remove other rows (columns). For example, the mixture of columns one, two and three with probabilities 1/3 each in the matrix

    B = ( 1  3  5  3 )
        ( 4  0  2  2 )
        ( 3  7  3  5 )

dominates the last column, and so the last column may be removed.

Not all games may be reduced by dominance. In fact, even if the matrix has a saddle point, there may not be any dominated rows or columns. The 3 × 3 game with a saddle point found in Example 1 demonstrates this.

2.4 Solving 2 × n and m × 2 games. Games with matrices of size 2 × n or m × 2 may be solved with the aid of a graphical interpretation. Take the following example:

     p   ( 2  3  1  5 )
    1-p  ( 4  1  6  0 )

Suppose Player I
chooses the first row with probability p and the second row with probability 1-p. If II chooses Column 1, I's average payoff is 2p + 4(1-p). Similarly, choices of Columns 2, 3 and 4 result in average payoffs of 3p + (1-p), p + 6(1-p), and 5p respectively. We graph these four linear functions of p for 0 ≤ p ≤ 1. For a fixed value of p, Player I can be sure that his average winnings are at least the minimum of these four functions evaluated at p. This is known as the lower envelope of these functions. Since I wants to maximize his guaranteed average winnings, he wants to find the p that achieves the maximum of this lower envelope. According to the drawing, this should occur at the intersection of the lines for Columns 2 and 3. This essentially involves solving the game in which II is restricted to Columns 2 and 3. The value of the game with matrix

    ( 3  1 )
    ( 1  6 )

is v = 17/7, I's optimal strategy is (5/7, 2/7), and II's optimal strategy is (5/7, 2/7). Subject to the accuracy of the drawing, we conclude therefore that in the original game I's optimal strategy is (5/7, 2/7), II's is (0, 5/7, 2/7, 0) and the value is 17/7.

[Fig 2.1: the four column lines plotted against p on [0, 1], with the lower envelope maximized at p = 5/7.]

The accuracy of the drawing may be checked: given any guess at a solution to a game, there is a sure-fire test to see if the guess is correct, as follows. If I uses the strategy (5/7, 2/7), his average payoff if II uses Columns 1, 2, 3 and 4 is 18/7, 17/7, 17/7, and 25/7 respectively. Thus his average payoff is at least 17/7 no matter what II does. Similarly, if II uses (0, 5/7, 2/7, 0), her average loss is (at most) 17/7. Thus, 17/7 is the value, and these strategies are optimal.

We note that the line for Column 1 plays no role in the lower envelope (that is, the lower envelope would be unchanged if the line for Column 1 were removed from the graph).
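The lower-envelope calculation lends itself to automation: the maximum of the lower envelope must occur at p = 0, at p = 1, or at an intersection of two column lines, so it suffices to check finitely many candidate points. A rough sketch of this idea (the function name and structure are mine, not the text's):

```python
from fractions import Fraction

def solve_2xn(A):
    """Value and optimal p for Player I in a 2xn game, found by maximizing
    the lower envelope of the column lines f_j(p) = A[0][j]*p + A[1][j]*(1-p)."""
    n = len(A[0])

    def envelope(p):
        # Player I's guaranteed payoff when mixing (p, 1-p) over the rows
        return min(A[0][j] * p + A[1][j] * (1 - p) for j in range(n))

    candidates = {Fraction(0), Fraction(1)}
    for j in range(n):
        for k in range(j + 1, n):
            slope_diff = (A[0][j] - A[1][j]) - (A[0][k] - A[1][k])
            if slope_diff != 0:                       # lines j and k not parallel
                p = Fraction(A[1][k] - A[1][j], slope_diff)
                if 0 <= p <= 1:
                    candidates.add(p)
    p_star = max(candidates, key=envelope)
    return envelope(p_star), p_star

# The 2x4 example from the text:
value, p = solve_2xn([[2, 3, 1, 5], [4, 1, 6, 0]])
```

For the example above this recovers v = 17/7 at p = 5/7, matching the graphical solution.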
This is a test for domination. Column 1 is, in fact, dominated by Columns 2 and 3 taken with probability 1/2 each. The line for Column 4 does appear in the lower envelope, and hence Column 4 cannot be dominated.

As an example of an m × 2 game, consider the matrix associated with Figure 2.2. If q is the probability that II chooses Column 1, then II's average loss for each of I's three possible choices of rows is given in the accompanying graph. Here, Player II looks at the largest of her average losses for a given q. This is the upper envelope of the functions. II wants to find the q that minimizes this upper envelope. From the graph, we see that any value of q between 1/4 and 1/2 inclusive achieves this minimum. The value of the game is 4, and I has an optimal pure strategy: row 2.

[Fig 2.2: the matrix

     q   1-q
    ( 1   5 )
    ( 4   4 )
    ( 6   2 ),

with the three row lines plotted against q and the upper envelope minimized on the interval from 1/4 to 1/2.]

These techniques work just as well for 2 × ∞ and ∞ × 2 games.

2.5 Latin Square Games. A Latin square is an n × n array of n different letters such that each letter occurs once and only once in each row and each column. The 5 × 5 array below is an example. If in a Latin square each letter is assigned a numerical value, the resulting matrix is the matrix of a Latin square game. Such games have simple solutions. The value is the average of the numbers in a row, and the strategy that chooses each pure strategy with equal probability 1/n is optimal for both players. The reason is not very deep: the conditions for optimality are satisfied.

    ( a b c d e )                                 ( 1 2 3 3 6 )
    ( b e a c d )                                 ( 2 6 1 3 3 )
    ( c a d e b )   a=1, b=2, c=d=3, e=6   ->     ( 3 1 3 6 2 )
    ( d c e b a )                                 ( 3 3 6 2 1 )
    ( e d b a c )                                 ( 6 3 2 1 3 )

In the example above, the value is V = (1+2+3+3+6)/5 = 3, and the mixed strategy p = q = (1/5, 1/5, 1/5, 1/5, 1/5) is optimal for both players. The game of matching pennies is a Latin square game. Its value is zero and (1/2, 1/2) is optimal for both players.

2.6 Exercises.

1. Solve the game with matrix

    ( -1  -3 )
    ( -2   2 ),

that is, find the value and an optimal (mixed) strategy for both players.

2. Solve the game with matrix

    ( 0  2 )
    ( t  1 )

for an arbitrary real number t. (Don't forget to check for a saddle point!) Draw the
graph of v(t), the value of the game, as a function of t, for -∞ < t < ∞.

3. Show that if a game with an m × n matrix has two saddle points, then they have equal values.

4. Reduce by dominance to 2 × 2 games and solve.

    (a) (  5   4   1   0 )
        (  4   3   2  -1 )
        (  0  -1   4   3 )
        (  1  -2   1   2 )

    (b) ⎛⎝1007126476335⎞⎠

5. (a) Solve the game with matrix

        (  3   2   4   0 )
        ( -2   1  -4   5 ).

   (b) Reduce by dominance to a 3 × 2 matrix game and solve:

        (  0   8   5 )
        (  8   4   6 )
        ( 12  -4   3 ).

6. Players I and II choose integers i and j respectively from the set {1, 2, ..., n} for some n ≥ 2. Player I wins 1 if |i - j| = 1. Otherwise there is no payoff. If n = 7, for example, the game matrix is

    ( 0 1 0 0 0 0 0 )
    ( 1 0 1 0 0 0 0 )
    ( 0 1 0 1 0 0 0 )
    ( 0 0 1 0 1 0 0 )
    ( 0 0 0 1 0 1 0 )
    ( 0 0 0 0 1 0 1 )
    ( 0 0 0 0 0 1 0 )
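The section's three recipes (the saddle-point test, the 2 × 2 formulas (1)-(3), and iterated deletion of dominated rows and columns) can be collected into a small toolkit. This is a sketch under my own naming, not code from the text, and the dominance routine handles only domination by a single pure row or column, not the probability-mixture domination of Section 2.3:

```python
from fractions import Fraction

def saddle_points(A):
    """All (i, j), 0-indexed, where A[i][j] is a row minimum and a column maximum."""
    row_min = [min(r) for r in A]
    col_max = [max(c) for c in zip(*A)]
    return [(i, j) for i, r in enumerate(A) for j, a in enumerate(r)
            if a == row_min[i] and a == col_max[j]]

def solve_2x2(a, b, d, c):
    """Solve the 2x2 game ((a, b), (d, c)).  Returns (v, p, q), with
    p = P(row 1) for Player I and q = P(column 1) for Player II.
    Step 1: test for a saddle point.  Step 2: equalizing formulas (1)-(3)."""
    A = [[a, b], [d, c]]
    for (i, j) in saddle_points(A):
        return A[i][j], 1 - i, 1 - j          # pure strategies are optimal
    D = Fraction((a - b) + (c - d))
    return (Fraction(a * c - b * d) / D,      # value, equation (3)
            Fraction(c - d) / D,              # p, equation (1)
            Fraction(c - b) / D)              # q, equation (2)

def remove_dominated(A):
    """Iteratively delete rows/columns dominated by another pure row/column.
    Rows belong to the maximizer, columns to the minimizer."""
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(A[s][c] >= A[r][c] for c in cols) for s in rows if s != r):
                rows.remove(r); changed = True
        for c in cols[:]:
            if any(all(A[r][e] <= A[r][c] for r in rows) for e in cols if e != c):
                cols.remove(c); changed = True
    return [[A[r][c] for c in cols] for r in rows]
```

Run against the worked examples, the sketch finds the saddle point of Example 1, reproduces v = 1/12 and p = q = 7/12 for Example 2, returns the pure solution of Example 3, and reduces the dominance example to the 2 × 2 game ((1, 2), (4, 1)).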

Game Theory Lecture 20 (foreign course materials)


Today's Agenda
- Review of previous class
- Repeated games
- Infinitely repeated games
June 17, 2003
73-347 Game Theory--Lecture 20
Two-stage repeated game

Can the two players cooperate in a subgame-perfect Nash equilibrium in which M1 and M2 are played?

[Payoff table fragment: Player 1 chooses L1, M1, or R1; Player 2's visible column is L2; the legible entries are (1, 1), (0, 5), and (0, 0).]
- Dynamic games of complete information; extensive-form representation
- Dynamic games of complete and perfect information; game tree
- Subgame-perfect Nash equilibrium; backward induction
The payoffs of the second stage have been added to the first-stage game.

[Payoff table fragment: Player 1 chooses L1, M1, or R1; Player 2's visible column is L2; the legible entries are (2, 2), (1, 6), and (1, 1).]

Advanced Microeconomics: Game Theory Lecture Notes (Fudan University CCES, Yongqin Wang)
What is game theory?
We focus on games where:
- there are at least two rational players;
- each player has more than one choice;
- the outcome depends on the strategies chosen by all players: there is strategic interaction.

Example: Six people go to a restaurant. If each person pays for his or her own meal, each faces a simple decision problem. If, before the meal, every person agrees to split the bill evenly among them, they are playing a game.
At the separate workplaces, Chris and Pat must choose to
attend either an opera or a prize fight in the evening. Both Chris and Pat know the following:

Dec, 2006, Fudan University
Game Theory--Lecture 1
Definition: normal-form or strategic-form representation

The normal-form (or strategic-form) representation of a game G specifies: (1) the players in the game, (2) the strategies available to each player, and (3) the payoff received by each player for each combination of strategies that could be chosen by the players.

Introduction Mathematical Economics


4 Sequential move games
  4.1 Extensive form games
  4.2 Subgame perfect Nash equilibria
  4.3 Repeated games
5 Dynamic price competition
6 Market entry
7 Review exercises
A Solutions exercises
1.2 Games in strategic form

A (noncooperative) game in strategic form or normal form is described by a tuple (N, (X_i)_{i∈N}, (π_i)_{i∈N}), where N = {1, ..., n} is the finite set of players, X_i is the nonempty strategy set of player i ∈ N, and π_i : X → R with X = Π_{i∈N} X_i is the payoff function of player i, which assigns to each strategy profile x = (x_1, ..., x_n) ∈ X a real number π_i(x) representing agent i's utility level, usually thought of as money.

Example 1.2.1 A Cournot market can be modelled as a game in strategic form.

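Example 1.2.1 can be made concrete. The sketch below encodes a two-firm Cournot market as a strategic-form tuple (N, (X_i), (π_i)); the linear demand and unit-cost numbers are my own illustrative assumptions, not taken from the text:

```python
# Strategic form (N, (X_i), (pi_i)) for a two-firm Cournot market.
# Assumed inverse demand P(Q) = max(0, 10 - Q) and unit cost 1 (illustrative).
N = [1, 2]
X = {1: range(0, 11), 2: range(0, 11)}   # strategy sets: integer quantities

def pi(i, x):
    """Payoff function pi_i: profit of firm i at the strategy profile x = (x1, x2)."""
    Q = x[0] + x[1]
    price = max(0, 10 - Q)
    q_i = x[i - 1]
    return q_i * (price - 1)

# Best reply of firm 1 when firm 2 produces 3 units:
best = max(X[1], key=lambda q: pi(1, (q, 3)))
```

With these numbers, firm 1's best reply to an output of 3 is to produce 3 itself, earning a profit of 9; the point of the sketch is only that "players, strategy sets, payoff functions" is all the data a strategic-form game needs.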

Eco514 Game Theory
Lecture 13: Repeated Games (2)
Marciano Siniscalchi
October 28, 1999

Introduction

[Again, by and large, I will follow OR, Chap. 8, so I will keep these notes to a minimum.]

Review of key definitions

Recall our three payoff aggregation criteria:

- Discounting: (u_i^t)_{t≥1} ≻_i (w_i^t)_{t≥1} iff Σ_{t≥1} δ^{t-1}(u_i^t - w_i^t) > 0 (also recall that the payoff profile corresponding to a stream (u^t) is taken to be (1-δ) Σ_{t≥1} δ^{t-1} u(a^t));
- Limit of means: (u_i^t)_{t≥1} ≻_i (w_i^t)_{t≥1} iff liminf_{T→∞} Σ_{t=1}^T (u_i^t - w_i^t)/T > 0;
- Overtaking: (u_i^t)_{t≥1} ≻_i (w_i^t)_{t≥1} iff liminf_{T→∞} Σ_{t=1}^T (u_i^t - w_i^t) > 0.

Also recall the definition of machines:

Definition 1. Fix a normal-form game G. A machine for Player i ∈ N is a tuple M_i = (Q_i, q_i^0, f_i, τ_i), where: (i) Q_i is a finite set (whose elements should be thought of as labels); (ii) q_i^0 is the initial state of the machine; (iii) f_i : Q_i → A_i is the action function: it specifies what Player i does at each state; and (iv) τ_i : Q_i × A → Q_i is the transition function: if action a ∈ A is played and Player i's machine state is q_i ∈ Q_i, then at the next stage Player i's machine state will be τ_i(q_i, a).

Note: OR actually allows for arbitrary state spaces. For most proofs, finite state spaces are enough. There is one exception, which I will note below.

Definition 2. Fix a normal-form game G. The minmax payoff for player i, v_i, is defined by

    v_i = min_{a_{-i} ∈ A_{-i}} max_{a_i ∈ A_i} u_i(a_i, a_{-i}).

We are not going to worry about implementing mixtures of action profiles in this lecture, so I just remind you of the definition of enforceability:

Definition 3. Fix a normal-form game G. A payoff profile u ∈ R^N is enforceable (resp.
strictly enforceable) if u_i ≥ v_i (resp. u_i > v_i) for all i ∈ N.

Perfect Folk Theorems

The strategies in the proof of the Nash folk theorem (say, with discounting) call for indefinite punishment following a deviation from the equilibrium. This is fine in games such as the Prisoner's Dilemma, in which the minmax actions are actually an equilibrium (indeed, in PD, they are the only equilibrium!). That is, the threat of indefinite punishment is indeed credible.

          A      D
    A    2,3    1,6
    D    0,1    0,1

    Figure 1: Indefinite punishment is not credible

However, consider the game in Figure 1. The Row player can hold the Column player down to his minimum payoff by choosing D, but D is strictly dominated for her. Hence, while (2,3) can be enforced as a Nash equilibrium payoff profile in the infinitely repeated version of the game by assuming that players switch to (D,D) following a deviation, this would not work if we required subgame perfection, regardless of the payoff aggregation criterion.

A warm-up exercise: Perfect Folk Theorems for limit-of-means

Thus, we must be smarter than that. A key intuition is that, after all, a deviator need not be punished forever, but only long enough to wipe out any gains from his deviation. Formally, fix a game G = (N, (A_i, u_i)_{i∈N}) and let M = max_{i∈N, a∈A} u_i(a). Fix an outcome a* of G; clearly, a one-time deviation by Player i cannot yield more than M - u_i(a*). Thus, if Player i deviates, we need only punish him so as to wipe out this payoff differential; clearly, how long this will be depends on the payoff aggregation criterion.

Intuitively, limit-of-means should make things very simple because, following a deviation, punishers face only a finite number of periods in which they must forego their equilibrium payoff stream in order to punish Player i. Under limit-of-means aggregation, finite periods do not matter, so punishers are indifferent between punishing and not punishing. There is only one subtlety: of course, the same argument applies to Player i: no one-time deviation can be profitable for her! So, what exactly are we trying to
deter?

The answer is that, although no finite deviation from the infinite repetition of a* is profitable for Player i, the following strategy could be. Suppose that, in the putative equilibrium we wish to construct, a deviation is followed by L rounds of punishment (you can think of L = 1 if you wish). Thus, if Player i deviates once, she gets an extra payoff of (at most) M - u_i(a*), but then loses u_i(a*) - v_i utils in each of the subsequent L periods. Now suppose L is small, so that M - u_i(a*) > L[u_i(a*) - v_i]. For example, in the game of Figure 1, suppose that i is the Column player, that a* = (A,A), and that L = 1. Then a deviation yields 3 utils, whereas one round of punishment costs 2 utils. Then Player i can adopt the following strategy: deviate, then play a best response to p_{-i} (in Figure 1, play D) for L periods; then, as soon as play is supposed to return to a*, deviate and best-respond to the minmax profile, and so on. This is a profitable deviation. [Observe that it is also a neat example of a game in which the one-deviation property holds!]

Thus, we must choose L large enough so that

    for all i ∈ N,  M - u_i(a*) < L[u_i(a*) - v_i].

In Figure 1, it is enough to choose L = 2.

To complete the argument, we must specify what happens if more than one player deviates, or if somebody deviates from the punishment stage. As in the proof of the Nash Folk Theorem, multiple deviations lead to the lowest-index player being punished (they will not occur in equilibrium anyway, so players cannot hope to get away with deviating because somebody else will deviate, too). Finally, if one or more punishers fail to punish, we simply disregard these further (unprofitable) deviations: again, in equilibrium they are not going to occur, so players cannot count on them to improve their predicament after a first deviation. This proves:

Proposition 0.1 [OR, Proposition 146.2] Fix a game G. Any feasible, strictly enforceable payoff profile of G is a subgame-perfect equilibrium payoff profile of the limit-of-means infinitely repeated version of G.

Machines

You will undoubtedly notice that
describing these strategies verbally is awkward; doing so formally (as we have tried to do in the proof of the Nash Folk theorem) is even worse. Machines can help. For instance, here is a set of machines that players can use to implement the strategies in the proof of Proposition 0.1. Player i uses machine M_i = (Q, q^0, f_i, τ) (where Q, q^0 and τ are common to all players), defined as follows.

First, Q = {N} ∪ {P(j,t)}. N is the normal state, in which a* is played; P(j,t) is the state in which j is punished, and t more rounds of punishment are required. Second, q^0 = N. Third, f_i(N) = a*_i, f_i(P(j,t)) = p_{-j,i} if j ≠ i, and f_j(P(j,t)) = r_j(p_{-j}). This should be obvious. Finally, τ(·,·) is such that we remain in N if nobody deviates, we switch from N to P(j,L) if j is the lowest-index deviator, we always move from P(j,t) to P(j,t-1) if t ≠ 0, and we always move from P(j,0) back to N. Easy!

Discounting

The strategy profile thus constructed is not a subgame-perfect equilibrium of the game in Figure 1 if payoffs are aggregated via discounting, for any discount factor. Suppose the Column player deviates: then the Row player is supposed to choose D for 2 periods, and hence receive a flow payoff of 0 units. She will then receive 2 forever after. However, if she deviates to A, nothing happens: the Column player continues to choose D, so again the Row player can choose A. Obviously, she prefers (1,1,2,2,...) to (0,0,2,2,...)!

Thus, the key intuition is that we must somehow ensure that punishers are willing to punish. In principle, one could think of punishing punishers, but this may fail to work with discounting: essentially, second deviations might require longer punishment periods (because the burden of carrying out the first punishment lasts for L periods, and not just one), third deviations might require even longer punishments, and so on. This is certainly the case in the game of Figure 1.

The alternative is to reward punishers. This leads to OR's Proposition 151.1. I am only going to offer a few comments on the proof. The "conciliation" states C(j) serve to reward punishers, in
a way. However, this is subtle: we never go back to the Nash state C(0). What happens is, if j deviates and i punishes him, then after punishment we move to C(j), which i prefers to C(i). Otherwise, we go to C(i) and stay there until somebody else deviates. (Footnote: Again, let me repeat this because I first got it wrong in class, but then you guys spotted me! Whenever we are in state C(j), we remain there if nobody deviates, and move to the state P(k,L) in which Player k's punishment begins if k deviates from a(j). Thus, what supports a(j) as a continuation equilibrium after a deviation by Player j is the threat of further punishment; a(j) need not be a Nash equilibrium per se.) In my opinion, this is also a punishment of sorts, but of course you are welcome to differ.

Also: the remark about the first condition on δ being sufficient has to do with the fact that, after L periods of punishment, we move to C(j); since u_i(a(j)) < u_i(a(0)) by assumption, this is actually a further punishment (which, er, actually reinforces my interpretation, but never mind that). The point is that this punishment may not be enough, or may come too late, so we make sure that Player j is hit really hard for L periods before switching to the conciliation phase.

Finally, the second condition on δ has the following interpretation: by punishing Player j, Player i potentially loses M - u_i(p_{-j}, r_j(p_{-j})) for the L periods following Player j's deviation (be it a deviation from a(0) or from whatever j was supposed to play). On the other hand, after L rounds of punishment, the game will switch to the C(j) conciliation stage. Now, Player i prefers to be in the C(j) state rather than in the C(i) state, by assumption, so if the discount factor is large enough, she will not deviate. [The only subtle point is that she may actually deviate at any t ∈ {1, ..., L} (where time is measured starting from the first stage after Player j's deviation). If she deviates at t, she will actually be held down to v_i starting from t+1 and until t+L ≥ L+1; the condition in the text is thus stronger
than it needs be (it assumes a payoff of M from t+1 to L, and a punishment payoff of u_i(a(i)) from L+1 to t+L).]
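Machines in the sense of Definition 1 are straightforward to code, which is part of their appeal. The sketch below implements a grim-trigger machine for the Prisoner's Dilemma and plays two machines against each other; the state names, dictionary encoding, and stage payoffs are my illustrative choices, not taken from these notes:

```python
# Stage game: Prisoner's Dilemma (illustrative payoffs).
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

# A machine M_i = (Q_i, q_i^0, f_i, tau_i): grim trigger.
# States: "N" (normal, play C) and "P" (punishment, play D, absorbing).
grim = {
    "q0": "N",
    "f": {"N": "C", "P": "D"},                                   # action function
    "tau": lambda q, a: "N" if q == "N" and a == ("C", "C") else "P",
}

# A one-state machine that always defects.
allD = {"q0": "N", "f": {"N": "D"}, "tau": lambda q, a: "N"}

def run(m1, m2, T):
    """Play two machines against each other for T stages; return the action path."""
    q1, q2, path = m1["q0"], m2["q0"], []
    for _ in range(T):
        a = (m1["f"][q1], m2["f"][q2])
        path.append(a)
        q1, q2 = m1["tau"](q1, a), m2["tau"](q2, a)
    return path

# Stage payoffs accumulated along the path of two grim-trigger machines:
totals = [sum(PD[a][i] for a in run(grim, grim, 3)) for i in (0, 1)]
```

Two grim-trigger machines cooperate in every stage, while grim trigger against an unconditional defector switches to D from the second stage on.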

博弈论(哈佛大学原版教程)

博弈论(哈佛大学原版教程)

Lecture XIII: Repeated Games
Markus M. Möbius
April 19, 2004

Readings: Gibbons, chapters 2.3.B, 2.3.C; Osborne, chapter 14; Osborne and Rubinstein, sections 8.3-8.5

1 Introduction

So far one might get a somewhat misleading impression about SPE. When we first introduced dynamic games we noted that they often have a large number of (unreasonable) Nash equilibria. In the models we've looked at so far SPE has 'solved' this problem and given us a unique NE. In fact, this is not really the norm. We'll see today that many dynamic games still have a very large number of SPE.

2 Credible Threats

We introduced SPE to rule out non-credible threats. In many finite horizon games though credible threats are common and cause a multiplicity of SPE. Consider the following game:

[The stage-game payoff matrix is not legible in this copy.]

... actions before choosing the second period actions. Now one way to get a SPE is to play any of the three profiles above followed by another of them (or the same one). We can also, however, use credible threats to get other actions played in period 1, such as: play (B,R) in period 1; if player 1 plays B in period 1, play (T,L) in period 2, otherwise play (M,C) in period 2.

It is easy to see that no single period deviation helps here. In period 2 a NE is played, so obviously deviating doesn't help. In period 1, player 1 gets 4+3 if he follows the strategy and at most 5+1 if he doesn't. Player 2 gets 4+1 if he follows and at most 2+1 if he doesn't. Therefore switching to the (M,C) equilibrium in period 2 is a credible threat.

3 Repeated Prisoner's Dilemma

Note that the PD doesn't have multiple NE, so in a finite horizon we don't have the same easy threats to use. Therefore the finitely repeated PD has a unique SPE in which every player defects in each period. In the infinite horizon, other types of threats are credible.

Proposition 1. In the infinitely repeated PD with δ ≥ 1/2 there exists a SPE in which the outcome is that both players cooperate in every period.
Proof: Consider the following symmetric profile:

    s_i(h^t) = C if both players have played C in every period along the path leading to h^t;
    s_i(h^t) = D if either player has ever played D.

To see that there is no profitable single deviation, note that at any h^t such that s_i(h^t) = D, player i gets

    0 + δ·0 + δ²·0 + ...

if he follows his strategy and

    -1 + δ·0 + δ²·0 + ...

if he plays C instead and then follows s_i. At any h^t such that s_i(h^t) = C, player i gets

    1 + δ·1 + δ²·1 + ... = 1/(1-δ)

if he follows his strategy and

    2 + δ·0 + δ²·0 + ... = 2

if he plays D instead and then follows s_i. Neither of these deviations is worthwhile if δ ≥ 1/2. QED

Remark 1. While people sometimes tend to think of this as showing that people will cooperate if they repeatedly interact, it does not show this. All it shows is that there is one SPE in which they do. The correct moral to draw is that there are many possible outcomes.

3.1 Other SPE of repeated PD

1. For any δ it is a SPE to play D every period.
2. For δ ≥ 1/2 there is a SPE where the players play D in the first period and then C in all future periods.
3. For δ > 1/√2 there is a SPE where the players play D in every even period and C in every odd period.
4. For δ ≥ 1/2 there is a SPE where the players play (C,D) in every even period and (D,C) in every odd period.

3.2 Recipe for Checking for SPE

Whenever you are supposed to check that a strategy profile is an SPE, you should first try to classify all histories (i.e. all information sets) on and off the equilibrium path. Then you have to apply the SPDP for each class separately.

Assume you want to check that cooperation with grim trigger punishment is SPE. There are two types of histories you have to check. Along the equilibrium path there is just one history: everybody has cooperated so far. Off the equilibrium path, there is again only one class: one person has defected.

Assume you want to check that cooperating in even periods and defecting in odd periods, plus grim trigger punishment in case of deviation by any player from the above pattern, is SPE. There are three types of histories: even and odd periods along the equilibrium path, and off-the-equilibrium-path histories.

Assume you want to check that TFT ('Tit for Tat') is SPE (which it isn't; see next lecture). Then you have to check four histories: only the play of both players in the last period matters for future play, so there are four relevant histories, such as: both players cooperated in the last period; player 1 defected and player 2 cooperated; and so on.

Sometimes the following result comes in handy.

Lemma 1. If players play Nash equilibria of the stage game in each period, in such a way that the particular equilibrium being played in a period is a function of time only and does not depend on previous play, then this strategy is a Nash equilibrium.

The proof is immediate: we check for the SPDP. Assume that there is a profitable deviation. Such a deviation will not affect future play, by assumption: if the stage game has two NE, for example, and NE1 is played in even periods and NE2 in odd periods, then a deviation will not affect future play. (Footnote: If a deviation triggers a switch to only NE1, this statement would no longer be true.) Therefore the deviation would have to be profitable in the current stage game, but since a NE is being played no such profitable deviation can exist.

Corollary 1. A strategy which has players play the same NE in each period is always SPE. In particular, the grim trigger strategy is SPE if the punishment strategy in each stage game is a NE (as is the case in the PD).

4 Folk Theorem

The examples in 3.1 suggest that the repeated PD has a tremendous number of equilibria when δ is large. Essentially, this means that game theory tells us we can't really tell what is going to happen. This turns out to be an accurate description of most infinitely repeated games.

Let G be a simultaneous move game with action sets A_1, A_2, ..., A_I, mixed strategy spaces Σ_1, Σ_2, ..., Σ_I and payoff functions u_i.

Definition 1. A payoff vector v = (v_1, v_2, ..., v_I) ∈ R^I is feasible if there exist action profiles a^1, a^2, ..., a^k ∈ A and non-negative weights ω_1, ..., ω_k which sum up to 1 such that

    v_i = ω_1 u_i(a^1) + ω_2 u_i(a^2) + ... + ω_k u_i(a^k).

Definition 2. A payoff vector v is strictly individually rational if

    v_i > v_i_min = min_{σ_{-i} ∈ Σ_{-i}} max_{σ_i(σ_{-i}) ∈ Σ_i} u_i(σ_i(σ_{-i}), σ_{-i}).        (1)

We can think of this as the lowest payoff a rational player could ever get in equilibrium if he anticipates his opponents' (possibly non-rational) play. Intuitively, the minmax payoff v_i_min is the payoff player i can guarantee to herself even if the other players try to punish her as badly as they can. The minmax payoff is a measure of the punishment other players can inflict.

Theorem 1 (Folk Theorem). Suppose that the set of feasible payoffs of G is I-dimensional. Then for any feasible and strictly individually rational payoff vector v there exists δ* < 1 such that for all δ > δ* there exists a SPE s* of G^∞ such that the average payoff to s* is v, i.e.

    u_i(s*) = v_i / (1-δ).

The normalized (or average) payoff is defined as P = (1-δ) u_i(s*). It is the payoff which a stage game would have to generate in each period such that we are indifferent between that payoff stream and the one generated by s*:

    P + δP + δ²P + ... = u_i(s*).

4.1 Example: Prisoner's Dilemma

The feasible payoff set is the diamond bounded by (0,0), (2,-1), (-1,2) and (1,1). Every point inside can be generated as a convex combination of these payoff vectors. The minmax payoff for each player is 0, as you can easily check: the other player can at most punish his rival by defecting, and each player can then guarantee himself 0 by defecting as well. Note that the equilibria I showed before generate payoffs inside this area.

4.2 Example: BoS

Consider the Battle of the Sexes game. [The payoff matrix is not legible in this copy.] Here each player can guarantee herself at least payoff 2/3, which is the payoff from playing the mixed strategy Nash equilibrium. You can check that whenever player 2 mixes with different probabilities, player 1 can guarantee herself more than this payoff by playing either F or O all the time.

4.3 Idea behind the Proof

1. Have players on the equilibrium path play an action with payoff v (or alternate if necessary to generate this payoff; see the footnote below).

2. If some player deviates punish him by
having the other players choose σ_{-i} for T periods such that player i gets v_i_min.

3. After the end of the punishment phase, reward all players (other than i) for having carried out the punishment by switching to an action profile where each punisher j gets v_j^P + ε.

(Footnote 2: For example, in the BoS it is not possible to generate (3/2, 3/2) in the stage game even with mixing. However, if players alternate and play (O,F) and then (F,O), the players can get arbitrarily close for large δ.)
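The δ ≥ 1/2 cutoff in Proposition 1 is easy to verify symbolically: cooperating forever yields 1/(1-δ) per the proof above, while the most profitable one-shot deviation yields 2 followed by 0 forever. A sketch using exact rational arithmetic (the function names are mine):

```python
from fractions import Fraction

def coop_value(delta):
    """Discounted value of cooperating forever: 1 + d + d^2 + ... = 1/(1 - d)."""
    return 1 / (1 - delta)

def best_deviation_value(delta):
    """Deviate once for a payoff of 2; grim trigger then yields 0 forever,
    so the continuation adds nothing regardless of delta."""
    return 2

def cooperation_is_spe_path(delta):
    """The single-deviation check along the cooperative path."""
    return coop_value(delta) >= best_deviation_value(delta)
```

Evaluating with Fraction inputs keeps the comparison exact: the check holds at δ = 1/2 exactly, fails below it, and holds strictly above it.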

Overview. We discuss repeated games, aiming to unpack the intuition that the promise of rewards and the threat of punishment in the future of a relationship can provide incentives for good behavior today. In class, we play the prisoners' dilemma twice and three times, but this fails to sustain cooperation. The problem is that, in the last stage, since there is then no future, there is no incentive to cooperate, and hence the incentives unravel from the back. We relate this to the real-world problems of a lame-duck leader and of maintaining incentives for those close to retirement. But it is possible to sustain good behavior in the early stages of some repeated games (even if they are only played a few times) provided the stage games have two or more equilibria to be used as rewards and punishments. This may require us to play bad equilibria tomorrow. We relate this to the trade-off between ex ante and ex post efficiency in the law. Finally, we play a game in which the players do not know when the game will end, and we start to consider strategies for this potentially infinitely repeated game.
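The unraveling argument in the overview can be sketched in code: when the stage game has a unique Nash equilibrium, as in the prisoners' dilemma, backward induction forces that equilibrium in every period of the finitely repeated game. A rough sketch with illustrative payoffs (not the numbers used in class):

```python
# Stage game: Prisoners' Dilemma (illustrative payoffs).
PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}
ACTIONS = ["C", "D"]

def pure_nash(payoff):
    """All pure-strategy Nash equilibria of a 2-player stage game."""
    eq = []
    for a1 in ACTIONS:
        for a2 in ACTIONS:
            u1, u2 = payoff[(a1, a2)]
            if (u1 >= max(payoff[(b, a2)][0] for b in ACTIONS) and
                    u2 >= max(payoff[(a1, b)][1] for b in ACTIONS)):
                eq.append((a1, a2))
    return eq

def spe_path(payoff, T):
    """With a unique stage NE, incentives unravel from the back:
    the unique SPE plays that equilibrium in all T periods."""
    eq = pure_nash(payoff)
    assert len(eq) == 1, "this argument only applies with a unique stage NE"
    return eq * T
```

The same `spe_path` assertion fails for a stage game with two or more equilibria, which is exactly the case where the lecture shows early-stage cooperation can be sustained by choosing which equilibrium to play later.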