MASSACHVSETTS INSTITVTE OF TECHNOLOGY

Department of Electrical Engineering and Computer Science

6.001 – Structure and Interpretation of Computer Programs

Fall Semester, 2005

Project 2 – Prisoner's Dilemma

Purpose

 

Project 2 focuses on the use of higher order procedures, together with data structures. You will also further develop and demonstrate your ability to write clear, intelligible, well-documented procedures, as well as test cases for your procedures.

A Fable

In the mid-1920's, the Nebraska State Police achieved what may still be their finest moment. After a 400-mile car chase over dirt roads and through corn fields, they finally caught up with the notorious bank robbers Bunny and Clod. The two criminals were brought back to the police station in Omaha for further interrogation. Bunny and Clod were questioned in separate rooms, and each was offered the same deal by the police. The deal went as follows (since both are the same, we need only describe the version presented to Bunny):

“Bunny, here's the offer that we are making to both you and Clod. If you both hold out on us and don't confess to bank robbery, then we admit that we don't have enough proof to convict you. However, we will be able to jail you both for one year, for reckless driving and endangerment of corn. If you turn state's witness and help us convict Clod (assuming he doesn't confess), then you will go free, and Clod will get twenty years in prison. On the other hand, if you don't confess and Clod does, then he will go free and you will get twenty years.”

“What happens if both Clod and I confess?” asked Bunny.

“Then you both get ten years,” responded the police.

Bunny, who had been a math major at Cal Tech before turning to crime, reasoned this way: “Suppose Clod intends to confess. Then if I don't confess, I'll get twenty years, but if I do confess, I'll only get ten years. On the other hand, suppose Clod intends to hold out on the cops. Then if I don't confess, I'll go to jail for a year, but if I do confess, I'll go free. So no matter what Clod intends to do, I am better off confessing than holding out. So I'd better confess.”

Naturally, Clod employed the very same reasoning. Both criminals confessed, and both went to jail for ten years (Well, actually they didn't go to jail. When they were in court, and heard that they had both turned state's witness, they strangled each other. But that's another story.) The police, of course, were triumphant, since the criminals would have been free in a year had both remained silent.

The Prisoner's Dilemma

The Bunny and Clod story is an example of a situation known in mathematical game theory as the “prisoner's dilemma.” A prisoner's dilemma always involves two “game players,” and each has a choice between “cooperating” and “defecting.” If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. (In the case of the Bunny and Clod story, “cooperating” means cooperating with one's partner - i.e. holding out on the police - and “defecting” means confessing to bank robbery.) Before formalizing the prisoner's dilemma situation, we need to introduce some basic game theory notation.

A Crash Course in Game Theory

In game theory, we differentiate between a game, and a play. A game refers to the set of possible choices and outcomes for the entire range of situations. A play refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a two-person binary-choice game is represented by a two-by-two matrix. Here is a hypothetical game matrix.

 

B cooperates

B defects

A cooperates

A gets 5
B gets 5

A gets 2
B gets 3

A defects

A gets 3
B gets 2

A gets 1
B gets 1

The two players in this case are called A and B, and the choices are called “cooperate” and “defect.” Players A and B can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, he announces it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player A chooses to cooperate while Player B defects, then A gets 2 points and B gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player A knows before the game even starts that if he and B both choose to defect, they will each get 1 point.

In an iterated game, the two players play repeatedly; thus after finishing one game, A and B may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a “play,” which constitutes a single “round” of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, A and B play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both A and B are able to study the results of the previous nine rounds.

An Analysis of a Simple Game Matrix

The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player A's point of view (Player B's point of view is identical):

“Suppose B cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose B defects. I still do better by cooperating (since I get two points instead of one). So no matter what B does, I am better off cooperating.”

Player B will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both A and B have a dominant choice - i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. The matrix shown above, by the way, does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly.

To re-cap: in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round.

The Prisoner's Dilemma Game Matrix

Now consider the following game matrix:

 

B cooperates

B defects

A cooperates

A gets 3
B gets 3

A gets 0
B gets 5

A defects

A gets 5
B gets 0

A gets 1
B gets 1

In this case, Players A and B both have a dominant choice - namely, defection. No matter what Player B does, Player A improves his own score by defecting, and vice versa.

However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of winning only one point each, they could win three points each. So the “rational” choice of mutual defection has a puzzling self-destructive flavor.

The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions:

displaymath880

displaymath881

In the second game matrix, we have

displaymath882

so both conditions are met. In the Bunny and Clod story, by the way, you can verify that:

displaymath883

Again, these values satisfy the prisoner's dilemma conditions.

Axelrod's Tournament

 

In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation (Actually, there were two tournaments. Their rules and results are described in Axelrod's book: The Evolution of Cooperation.). Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds, using the second matrix above. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied.

The contestants in Axelrod's tournament included professors of political science, mathematics, computer science, and economics. The winning program - the program with the highest average score - was submitted by Anatol Rapoport, a professor of psychology at the University of Toronto. In this project, we will pursue Axelrod's investigations and make up our own Scheme programs to play the iterated prisoner's dilemma game.

As part of this project, we will be running a similar tournament, but now involving a three-person prisoner's dilemma.

Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples:

OscarTheGrouch - a program using the OscarTheGrouch strategy simply defects on every round of every game.

MrRogers - a program using the MrRogers strategy cooperates on every round of every game.

Unpredictable - this program cooperates or defects on a random basis.

MajorityRules - this program defects on the first round. On all subsequent rounds, MajorityRules examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber her cooperations, MajorityRules will defect; otherwise this strategy will cooperate.

Mimic - this program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) on the nth round, then Mimic will cooperate (defect) on the (n+1)st round.

All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. Rapoport's first-prize program employed the Mimic strategy, and achieved the highest average score in a field of far more complicated programs.

The Two-Player Prisoner's Dilemma Program

A Scheme program for an iterated prisoner's dilemma game is provided as part of the code for this project. The procedure play-loop pits two players (or, to be more precise, two “strategies”) against one another for approximately 100 games, then prints out the average score of each player.

Player strategies are represented as procedures. Each strategy takes two inputs - its own “history” (that is, a list of all its previous “plays,” where for convenience we will use "c" to represent cooperate, and "d" to represent defect) and its opponent's “history.” The strategy returns either the string “c” for “cooperate” or the string “d” for “defect.” (Note that we will need to use procedures appropriate for comparing strings when we analyze these results.)

At the beginning of an iterated game, each history is an empty list. As the game progresses, the histories grow (via extend-history) into lists of  “c”'s and “d”'s, thus each history is stored from most recent to least recent. Note how each strategy must have its own history as its first input. So in play-loop-iter, strat0 has history0 as its first input, and strat1 has history1 as its first input.

The values from the game matrix are stored in a list named *game-association-list*. This list is used to calculate the scores at the end of the iterated game.

(define *game-association-list*
  (list (list (list “c” “c”) (list 3 3))
        (list (list “c” “d”) (list 0 5))
        (list (list “d” “c”) (list 5 0))
        (list (list “d” “d”) (list 1 1))))

Thus, if both players cooperate, the payoff to each player is a 3, if one player cooperates and the other defects, the defecting player gets a payoff of 5, the cooperating player gets a zero payoff, if both players defect, each gets a payoff of 1.

Some sample strategies are given in the code. OscarTheGrouch and MrRogers are particularly simple; each returns a constant value regardless of the histories. Unpredictable also ignores the histories and chooses randomly between cooperation and defection. You should study MajorityRules and Mimic to see that their behavior is consistent with the descriptions in the previous section.

Problem 1

To be able to test out the system, we need to complete a definition for extract-entry. This procedure will retrieve the payoff information from the game association list.  The procedure's behavior is as follows: it takes as input a play, represented as a list of choices for each strategy (i.e., a “c” or a “d”), and the game association list. Thus a play will in this case be a list of two entries (since there are two players), each of which is the choice of action for that player. Each entry in the game association list is a list itself, with a first element representing a list of game choices, and the second element representing a list of scores (or payoffs) for each player. Thus extract-entry wants to search down the game association list trying to match its first argument against the first element of each entry in the game association list, one by one. When it succeeds, it returns that whole entry.

For example, we expect the following behavior:

(define a-play (make-play “c” “d”))

(extract-entry a-play *game-association-list*)
;Value: ((“c” “d”) (0 5))

Write the procedure extract-entry, and test it out using the above case *game-association-list*. Turn in a copy of your documented procedure and some test examples. You may want to use a diagram of the list structure to guide the creation of your code.

Problem 2

Use play-loop to play games among the five defined strategies. Notice how a strategy's performance varies sharply depending on its opponent. For example, MrRogers does quite well against Mimic or against another MrRogers, but it loses badly to OscarTheGrouch. Pay special attention to Mimic. Notice how it never beats its opponent - but it never loses badly. Create a matrix in which you show the average score for tournaments pitting all possible pairings of the five different strategies: OscarTheGrouch, MrRogers, Mimic, Unpredictable, MajorityRules.  Describe the behavior you observe for the different strategies.

Problem 3

Games involving MajorityRules tend to be slower than other games. Why is that so? Use order-of-growth notation to explain your answer.

Alyssa P. Hacker, upon seeing the code for MajorityRules, suggested the following iterative version of the procedure:

(define (MajorityRules my-history other-history)
  (define (majority-loop cs ds hist)
    (cond ((empty-history? hist) (if (> ds cs) “d” “c”))
          ((string=? (most-recent-play hist) “c”)
           (majority-loop (+ 1 cs) ds (rest-of-plays hist)))
          (else 
           (majority-loop cs (+ 1 ds) (rest-of-plays hist)))))
  (if (empty-history? my-history)
      “d”
      (majority-loop 0 0 other-history)))

Compare this procedure with the original version. Do the orders of growth (in time) for the two procedures differ? Is the newer version faster?

Problem 4

Write a new strategy slow-mimic. The strategy should cooperate on the first round, then defect on the second round, and then should do what the opponent did two turns previously (i.e. if the opponent defected two turns ago, it should defect; if the opponent cooperated two turns ago, it should cooperate). Play slow-mimic against other strategies. Describe the behavior you observe.

Problem 5

Write a procedure make-slow-mimic-n. This procedure should take a number as input and return the appropriate Mimic-like strategy. For example, (make-slow-mimic-n 2) should return a strategy equivalent to slow-mimic, and (make-slow-mimic-n 1) should return a strategy equivalent to mimic.   Use this procedure to create a new strategy and test it against the other strategies.  Describe the observed behavior.

Problem 6

Write a procedure make-alternating-strategy which takes as input two strategies (say, strat0 and strat1) and an integer (say freq). Make-alternating-strategy should return a strategy which plays strat0 for the first freq rounds in the iterated game, then switches to strat1 for the next freq rounds, and so on. (Hint: you may find it useful to think about the remainder procedure in order to decide which strategy to use at each iteration.) Test it against other strategies and describe the performance.

Problem 7

Write a new strategy, make-higher-order-Unpredictable, which takes a list of strategies as input.  It returns a new strategy that loops through this list of strategies, using the next one in the list for each play, and then starting again at the beginning of the list when it has used all the strategies. Test this new strategy against other strategies and describe the performance.

Problem 8

Write a procedure forgiving, which takes as input a strategy (say strat) and a number between 0 and 1 (call it forgiveness-factor). The forgiving procedure should return a strategy that plays the same as strat except: when strat defects, the new strategy should have a forgiveness-factor chance of cooperating. (If forgiveness-factor is 0, the return strategy performs exactly the same as strat; if forgiveness-factor is 0.5, the returned strategy cooperates half the time that strat defects; if forgiveness-factor is 1, the returned strategy performs the same as MrRogers.)

Use forgiving with a low value for forgiveness-factor - say, 0.1 - to create two new strategies: slightly-forgiving-Nasty and slightly-forgiving-Mimic. Test these new strateigies against other strategies and describe the performance.

Problem 9

A natural idea in creating a prisoner's dilemma strategy is to try and deduce what kind of strategies the other player might be using. In this problem, we will implement a simple version of this idea.

Let’s start by thinking about some simple examples.  Suppose we took our history and the history of our opponent, and for simplicity, let’s assume that if our opponent’s strategy is based in any way on what we have done, that it is reflected on the next play (i.e. our opponent may be using a strategy that is independent of what we do, but if they are reacting to our choices, then their decision at time n depends only on our decision at time n-1).  Thus, if we compare our decision at time n against our opponent’s decision at time n-1, our decision at time n-1 against our opponent’s at time n-2, and so on, we can count the number of times each possible pairing of our decision versus our opponent’s occurs – there will be some number of times each of  CC, CD, DC, and DD occur, where the first symbol is our action and the second is our opponents.

Now, let’s consider some simple cases.  If the only cases with a non-zero count are CC and DC, then there is a good chance that our opponent is playing MrRogers.  In that case, our best strategy would be to constantly defect.  If the only cases with a non-zero count are CD and DD, then there is a good chance that our opponent is playing OscarTheGrouch.  In that case, our best strategy would be to constantly defect.  If the only cases with a non-zero count are CC and DD, then there is a good chance that our opponent is playing Mimic.  What would your best strategy be in this case?

In this problem, we want you to implement this idea.  This will take a bit of planning.  You should design a data structure that you can use for counting the number of occurences of the different possible pairings of cooperate and defect that you find by comparing two histories; you should write procedure(s) that analyze a pair of histories to create a version of this data structure; you should write predicate procedures that test to see if an opponent is playing any of the three stratagies noted above; and finally you should write a new strategy that waits for some number of plays to occur, then tests to see if the opponent is playing either MrRogers or OscarTheGrouch, in which case it should defect, tests to see if the opponent is playing Mimic, in which case it should use whatever strategy you think is optimal, and otherwise should defect or cooperate at random. 

Test your procedure against other strategies.

The Three-Player Prisoner's Dilemma

So far, all of our prisoner's dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner's dilemma has focused on two-player games). But it is possible to create a prisoner's dilemma game involve three - or even more - players.

Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does Mimic mean? Should the player defect if either of the opponents defected on the previous round? Or only if both opponents defected? And are either of these strategies nearly as effective in the three-player game as Mimic is in the two-player game?

Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents.

Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary.

Now we are ready to discuss the payoffs for the three-player game. We impose three rules (Actually, there is no universal definition for the multi-player prisoner's dilemma. The constraints used here represent one possible version of the three-player prisoner's dilemma):

1) Defection should be the dominant choice for each player. In other words, it should always be better for a player to defect, regardless of what the opponents do. This rule gives three constraints:

displaymath900

displaymath901

displaymath902

2) A player should always be better off if more of his opponents choose to cooperate. This rule gives:

displaymath903

displaymath904

3) If one player's choice is fixed, the other two players should be left in a two-player prisoner's dilemma. This rule gives the following constraints:

displaymath905

displaymath906

displaymath907

displaymath908

We can satisfy all of these constraints with the following payoffs:

CDD = 0,      DDD = 1,     CCD = 3,     DCD = 5,       CCC = 8,      DCC = 12.

Problem 10

Revise the Scheme code for the two-player game to make a three-player iterated game. The program should take three strategies as input, keep track of three histories, and print out results for three players. You need to change only three procedures: play-loop, print-out-results and get-scores (although you may also have to change your definition of extract-entry if you did not write it in a general enough manner). We would suggest that you make copies of the necessary code and rename them so that you can separate the two person version from the three person one.

You also need to change *game-association-list* as follows – you can do this by uncommenting this part of the code (and commenting out the earlier version):

 

(define *game-association-list*
  (list (list (list “c” “c” “c”) (list 8 8 8))
        (list (list “c” “c” “d”) (list 3 3 12))
        (list (list “c” “d” “c”) (list 3 12 3))
        (list (list “d” “c” “c”) (list 12 3 3))
        (list (list “c” “d” “d”) (list 0 5 5))
        (list (list “d” “c” “d”) (list 5 0 5))
        (list (list “d” “d” “c”) (list 5 5 0))
        (list (list “d” “d” “d”) (list 1 1 1))))

Problem 11

Write strategies MrRogers-3, OscarTheGrouch-3, and Unpredictable-3 that will work in a three-player game. Try them out to make sure your code is working.

Write a new strategy:  double-Mimic, if both opponents did the same thing on the last play, this strategy should do the same thing on this play, otherwise it should cooperate or defect at random. Play some games using this new strategy. Describe the observed behavior.

Problem 12

Write a procedure make-combined-strategies which takes as input two two-player strategies and a “combining” procedure. Make-combined-strategies should return a three-player strategy that plays one of the two-player strategies against one of the opponents, and the other two-player strategy against the other opponent, then calls the “combining” procedure on the two two-player results. Here's an example:

(make-combined-strategies
  Mimic Mimic
  (lambda (r1 r2) (if (or (string=? r1 “d”) (string=? r2 “d”))
                      “d”
                      “c”)))

The resulting strategy plays Mimic against each opponent, and then calls the combining procedure on the two results. If either of the two two-player strategies has returned “d”, then the three-player strategy will also return “d”.

Here's another example. This call to make-combined-strategies returns a three-player strategy that plays Mimic against one opponent, MajorityRules against another, and chooses randomly between the two results:

(make-combined-strategies
  Mimic MajorityRules
  (lambda (r1 r2) (if (= (random 2) 0) r1 r2)))
 

Write a new procedure using make-combined-strategies.  Test it out against other strategies and describe the behavior.

Problem 13: The Three-Player Prisoner's Dilemma Tournament

As described earlier, Axelrod held two computer tournaments to investigate the two-player prisoner's dilemma. We are going to hold a three-player tournament. You need to design a strategy for the tournament. You might submit one of the strategies developed in the project (but not one of the standard strategies provided as part of the code), or develop a completely new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. To submit your entry strategy, you should:

The tournament will be a complete one, that is every strategy plays against every other pair. Each game will consist of approximately 100 rounds.

Submission of project

 

For each problem above, include your code (with identification of the problem number being solved), as well as comments and explanations of your code, and demonstrate your code’s functionality against a set of test cases. Once you have completed this project, your file should be submitted electronically on the 6.001 on-line tutor, using the Submit Project Files button.

 

Remember that this is Project 2; when you are have completed all the work and saved it in a file, upload that file and submit it for Project 2.