A new paper, “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent,” has just been published in the Proceedings of the National Academy of Sciences. Written by William Press and Freeman Dyson, it represents a substantial breakthrough in our understanding of strategies for the Prisoner’s Dilemma game.
The key point? There does exist a strategy where a player can “enforce a unilateral claim to an unfair share of rewards.”
The implications of this paper are fascinating. For biological evolution, it opens up new thinking about reproductive strategies and life history theory, as well as the direct impact on ideas about the evolution of cooperation.
For cultural evolution, it seems to provide some powerful insights into the evolution of inequality in human society. As the agricultural revolution and population growth led to the ability to monopolize social resources and create differential wealth, what happened with social class? Did human cooperation turn from fairness to enforcing the sort of unfair game that Press and Dyson outline?
Zero Determinant Strategies and Beating Tit-for-Tat
Here is the abstract of Press and Dyson (2012):
The two-player Iterated Prisoner’s Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y’s score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player’s best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner’s Dilemma is an Ultimatum Game.
PNAS also published an informative commentary by Alexander Stewart and Joshua Plotkin, entitled “Extortion and cooperation in the Prisoner’s Dilemma.”
One of the key points they highlight is Press and Dyson’s analysis of long-term versus short-term strategies, framed here in terms of memory of previous encounters. (From my side, I think the broader framing – of long versus short strategies – is useful, since it resonates with a great deal of evolutionary thinking, from reproductive strategies to optimality to life history strategies…)
First, they prove that any “long-memory” strategy is equivalent to some “short-memory” strategy, from the perspective of a short-memory player. This means that an opponent who decides his next move by analyzing a long sequence of past encounters might as well play a much simpler strategy that considers only the immediately previous encounter, when playing against a short-memory player. Thus, the possible outcomes of the IPD can be understood by analyzing strategies that remember only the previous round.
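To make the “previous round” idea concrete, here is a minimal Python sketch, an illustration rather than code from the paper. A memory-one strategy is just four numbers: the probability of cooperating after each outcome of the last round (CC, CD, DC, DD, listing the player’s own move first). Two such strategies define a Markov chain over the four outcomes, and the long-run average scores come from that chain’s stationary distribution. The payoff values (T, R, P, S) = (5, 3, 1, 0) are the conventional textbook choice and an assumption here, as are the helper names; exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction as F

# Conventional payoff values (an assumption): T > R > P > S.
R, S, T, P = 3, 0, 5, 1

# Memory-one strategies: probability of cooperating after CC, CD, DC, DD,
# with the player's own previous move listed first.
ALLC = (1, 1, 1, 1)
ALLD = (0, 0, 0, 0)
TFT = (1, 0, 1, 0)          # repeat the opponent's last move
RANDOM = (F(1, 2),) * 4     # cooperate with probability 1/2 regardless

def transition_matrix(p, q):
    """Markov chain over last-round outcomes (CC, CD, DC, DD), from X's side."""
    # Y lists his own move first, so X's CD is Y's DC: swap q's middle entries.
    qx = (q[0], q[2], q[1], q[3])
    return [[F(a * b), F(a * (1 - b)), F((1 - a) * b), F((1 - a) * (1 - b))]
            for a, b in zip(p, qx)]

def stationary(m):
    """Solve v M = v with sum(v) = 1 using exact Gauss-Jordan elimination."""
    n = len(m)
    rows = [[m[j][i] - (1 if i == j else 0) for j in range(n)] + [F(0)]
            for i in range(n)]
    rows[-1] = [F(1)] * (n + 1)  # swap one redundant equation for sum(v) = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("long-run outcome depends on how play starts")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def scores(p, q):
    """Long-run average payoffs (s_X, s_Y) when X plays p and Y plays q."""
    v = stationary(transition_matrix(p, q))
    s_x = sum(vi * s for vi, s in zip(v, (R, S, T, P)))
    s_y = sum(vi * s for vi, s in zip(v, (R, T, S, P)))
    return s_x, s_y

print(scores(TFT, RANDOM))   # (Fraction(9, 4), Fraction(9, 4))
```

Pairings whose long-run outcome depends on the opening move, such as Tit-for-Tat against itself, have no single stationary distribution, and this sketch raises ValueError for them rather than guessing.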
I want to focus in on that point – “considers only the immediately previous encounter.” That’s the language of math. But for evolutionary strategies, any situation or context in which a player can only take into account (or base his/her decisions on) the previous encounter puts that player in the short-memory position. If a person is structurally forced into that position – say, they have a limited amount of money and need to buy food to support their family, and so have to pay the “market price” – then they are playing a short-memory or short-term strategy.
What does that mean, according to the game theory math of Press and Dyson? That one player can determine another player’s scores, and make sure the pay-offs benefit him or her over the long run. Or, to put it another way, to systematically screw the other person.
That is, X can set Y’s score to any value in the range from the mutual noncooperation score to the mutual cooperation score. What is surprising is not that Y can, with X’s connivance, achieve scores in this range, but that X can force any particular score by a fixed strategy p, independent of Y’s strategy q. In other words, there is no need for X to react to Y, except on a timescale of her own choosing. A consequence is that X can simulate or “spoof” any desired fitness landscape for Y that she wants, thereby guiding his evolutionary path. For example, X might condition Y’s score on some arbitrary property of his last 1,000 moves, and thus present him with a simulated fitness landscape that rewards that arbitrary property.
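The claim that X can fix Y’s score “independent of Y’s strategy q” can be checked numerically. The sketch below is an illustration under assumptions, not code from the paper: it uses the conventional payoffs (T, R, P, S) = (5, 3, 1, 0), lets X pick p1 (cooperate after mutual cooperation) and p4 (cooperate after mutual defection) freely, and fills in p2 and p3 from formulas that follow from Press and Dyson’s zero-determinant condition. The helper names equalizer and scores are hypothetical.

```python
from fractions import Fraction as F

R, S, T, P = 3, 0, 5, 1  # conventional payoff values (assumed)

def equalizer(p1, p4):
    """Hypothetical helper: X's memory-one strategy that pins Y's score.

    p1, p4 are free choices; p2 and p3 follow from the zero-determinant
    condition. Returns the strategy and the score it forces on Y.
    """
    p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / F(R - P)
    p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / F(R - P)
    target = ((1 - p1) * P + p4 * R) / ((1 - p1) + p4)
    return (p1, p2, p3, p4), target

def scores(p, q):
    """Long-run (s_X, s_Y) for memory-one strategies p (X) and q (Y)."""
    qx = (q[0], q[2], q[1], q[3])  # Y lists his own move first
    m = [[F(a * b), F(a * (1 - b)), F((1 - a) * b), F((1 - a) * (1 - b))]
         for a, b in zip(p, qx)]
    # Stationary distribution: solve v M = v, sum(v) = 1, exactly.
    rows = [[m[j][i] - (1 if i == j else 0) for j in range(4)] + [F(0)]
            for i in range(4)]
    rows[-1] = [F(1)] * 5
    for col in range(4):
        pivot = next((r for r in range(col, 4) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("long-run outcome depends on how play starts")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(4):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    v = [row[-1] for row in rows]
    return (sum(vi * s for vi, s in zip(v, (R, S, T, P))),
            sum(vi * s for vi, s in zip(v, (R, T, S, P))))

p, target = equalizer(F(9, 10), F(1, 10))  # p = (9/10, 7/10, 1/5, 1/10), target 2
for q in [(1, 1, 1, 1), (0, 0, 0, 0), (1, 0, 1, 0),
          (F(1, 3), F(2, 3), F(1, 4), F(3, 4))]:
    s_x, s_y = scores(p, q)
    print(q, "X:", s_x, "Y:", s_y)  # Y's score is 2 every time; X's varies
```

Y’s choices still move X’s own score (11/3 against an always-cooperator, 3/4 against an always-defector), which is the “mischief” situation described later in the post: Y cannot change his own score, but he can affect hers.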
Press and Dyson call these “zero determinant” strategies because choosing one makes a particular determinant in their calculation equal to zero. The practical upshot is that the player can enforce a linear relationship between the two players’ pay-offs, one that can systematically favor the enforcer. Nothing the other player does can change that relationship, so long as the original player unilaterally commits to a strategy of her own that sets it up.
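For readers who want the mechanics, here is a compact sketch of where the linear relationship comes from. The notation is not the post’s own: p = (p_1, p_2, p_3, p_4) are X’s probabilities of cooperating after the outcomes CC, CD, DC, DD of the previous round (her move listed first), R, S, T, P are the usual reward, sucker, temptation and punishment payoffs, and v holds the long-run frequencies of the four outcomes. Press and Dyson reach the result through a determinant (hence the name); the argument below is a shorter route to the same condition.

```latex
\begin{aligned}
&\text{X cooperates in a round with frequency } v_1 + v_2 \text{, and in the next round with frequency } \textstyle\sum_i v_i p_i, \text{ so}\\
&\qquad v \cdot \tilde{p} = 0, \qquad \tilde{p} = (p_1 - 1,\; p_2 - 1,\; p_3,\; p_4).\\
&\text{If X chooses } \tilde{p} = \alpha S_X + \beta S_Y + \gamma \mathbf{1}, \text{ with } S_X = (R, S, T, P),\; S_Y = (R, T, S, P),\\
&\text{then } s_X = v \cdot S_X,\; s_Y = v \cdot S_Y \text{ and } v \cdot \mathbf{1} = 1 \text{ give}\\
&\qquad \alpha\, s_X + \beta\, s_Y + \gamma = 0 .
\end{aligned}
```

Two special cases correspond to strategies discussed in this post: setting α = 0 pins Y’s score at s_Y = −γ/β whatever he does, and choosing p̃ = φ[(S_X − P·1) − χ(S_Y − P·1)] with χ ≥ 1 and a small enough φ > 0 enforces the extortionate relation s_X − P = χ(s_Y − P).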
The outcome? Press and Dyson have discovered strategies that trump Tit-for-Tat, the Prisoner’s Dilemma strategy that has been a consistent winner in the past, beating out strategies that cheat or defect more often. Over the past thirty years, Tit-for-Tat has been a major impetus behind ideas about the evolution of cooperation.
Tit-for-Tat cooperates if you cooperate, and can create a long series of positive outcomes. Except now, Tit-for-Tat, a short-term memory player, loses when it encounters the sort of unlevel playing field that zero-determinant strategists create. As William Poundstone puts it in the illuminating coverage and discussion of this paper over on Edge:
Robert Axelrod’s 1980 tournaments of iterated prisoner’s dilemma strategies have been condensed into the slogan, Don’t be too clever, don’t be unfair. Press and Dyson have shown that cleverness and unfairness triumph after all.
New Strategies: Mischief, Extortion, and Willing Partner in Your Own Defeat
So, what unequal-determinant strategies work? (Yes, I think “unequal-determinant” is a much better name for those specific strategies that force others to play by your set of pay-offs.)
In their commentary, Stewart and Plotkin highlight three unilateral strategies that can all end in zero-determinant games (in other words, where the second player has zero control):
Unequal-Determinant #1: Mischief
If a player X is aware of ZD [zero-determinant] strategies, then she can choose a strategy that determines her opponent Y’s long term score, regardless of how Y plays. There is nothing Y can do to improve his score, although his choices may affect X’s score.
Unequal-Determinant #2: Extortion
Suppose once again that X is aware of ZD strategies, but that Y is an “evolutionary player,” who possesses no theory of mind and instead simply seeks to adjust his strategy to maximize his own score in response to whatever X is doing, without trying to alter X’s behavior. X can now choose to extort Y. Extortion strategies, whose existence Press and Dyson report, grant a disproportionate number of high payoffs to X at Y’s expense (example in Fig. 1). It is in Y’s best interest to cooperate with X, because Y is able to increase his score by doing so. However, in so doing, he ends up increasing X’s score even more than his own. He will never catch up to her, and he will accede to her extortion because it pays him to do so.
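A similar sketch shows the extortion case, again an illustration rather than the authors’ code, assuming the conventional payoffs (T, R, P, S) = (5, 3, 1, 0) and using hypothetical helper names. X fixes an extortion factor χ (chi) and a scale φ (phi); her four cooperation probabilities then enforce s_X − P = χ(s_Y − P) against any Y. With χ = 3 and φ = 1/26 this gives the strategy (11/13, 1/2, 7/26, 0).

```python
from fractions import Fraction as F

R, S, T, P = 3, 0, 5, 1  # conventional payoff values (assumed)

def extortioner(chi, phi):
    """Hypothetical helper: strategy enforcing s_X - P = chi * (s_Y - P)."""
    return (1 - phi * (chi - 1) * (R - P),
            1 - phi * ((P - S) + chi * (T - P)),
            phi * ((T - P) + chi * (P - S)),
            F(0))

def scores(p, q):
    """Long-run (s_X, s_Y) for memory-one strategies p (X) and q (Y)."""
    qx = (q[0], q[2], q[1], q[3])  # Y lists his own move first
    m = [[F(a * b), F(a * (1 - b)), F((1 - a) * b), F((1 - a) * (1 - b))]
         for a, b in zip(p, qx)]
    # Stationary distribution: solve v M = v, sum(v) = 1, exactly.
    rows = [[m[j][i] - (1 if i == j else 0) for j in range(4)] + [F(0)]
            for i in range(4)]
    rows[-1] = [F(1)] * 5
    for col in range(4):
        pivot = next((r for r in range(col, 4) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("long-run outcome depends on how play starts")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(4):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    v = [row[-1] for row in rows]
    return (sum(vi * s for vi, s in zip(v, (R, S, T, P))),
            sum(vi * s for vi, s in zip(v, (R, T, S, P))))

p = extortioner(chi=3, phi=F(1, 26))    # (11/13, 1/2, 7/26, 0)
ALLC, ALLD = (1, 1, 1, 1), (0, 0, 0, 0)
print(scores(p, ALLC))   # X gets 41/11, Y gets 21/11
print(scores(p, ALLD))   # both get P = 1
mixed = (F(1, 2), F(1, 3), F(1, 4), F(1, 5))  # an arbitrary Y strategy
s_x, s_y = scores(p, mixed)
print(s_x - P == 3 * (s_y - P))  # True: the 3-to-1 relation holds anyway
```

Cooperating fully lifts Y from 1 to 21/11, so an evolutionary player who simply follows his own score drifts toward cooperation; but each point he gains above P = 1 brings X three, which is the extortion Stewart and Plotkin describe.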
Unequal-Determinant #3: Willing Partner in Own Defeat
If both players are sentient and witting of ZD strategies, then each will initially try to extort the other, resulting in a low payoff for both. The rational thing to do, in this situation, is to negotiate a fair cooperation strategy… Knowledge of ZD strategies offers sentient players an even better option [than Tit-for-Tat]: both can agree to unilaterally set the other’s score to an agreed value (presumably the maximum possible). Neither player can then improve his or her score by violating this treaty, and each is punished for any purely malicious violation.
I almost labeled this strategy as “willing partner in victory and defeat,” which is probably a fairer label. After all, one can imagine a mother and fetus in this sort of negotiation, where the mother can often play a zero-determinant strategy toward the organism developing inside her. In this case, mothers and offspring likely engage in a cooperation strategy where a mother retains maximum reproductive potential after birth (the baby doesn’t take too much…) and the mother invests enough in the fetus to set its developmental potential to its maximum.
However, that label forgets the unequal relationship between the player who can set the conditions of a game and a player who cannot. For those who cannot set the conditions, the available options can systematically screw them. The options available to one player won’t be the same as those available to the other – the maximum possible for each will be different, and even rational cooperation ends in defeat. A z-d strategist, then, will often play so as to set the matrix of pay-offs, so that the “maximum” pay-offs still create a situation like the extortion one, where the other player gets some benefits while still losing out in the end.
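The treaty in Stewart and Plotkin’s third scenario can be sketched with the same machinery, again assuming (T, R, P, S) = (5, 3, 1, 0) and using hypothetical helper names. Taking p1 = 1 in the equalizer formulas sets the opponent’s score to R = 3, the mutual-cooperation payoff; p4 = 1/2 is an arbitrary choice that gives the strategy (1, 1/2, 3/4, 1/2). If both sides adopt it, they settle into mutual cooperation. If Y instead defects every round, or tries some other memory-one strategy, his own score stays pinned at 3, so breaking the treaty gains him nothing, while X’s score falls (to 1/2 against constant defection).

```python
from fractions import Fraction as F

R, S, T, P = 3, 0, 5, 1  # conventional payoff values (assumed)

def equalizer(p1, p4):
    """Hypothetical helper: strategy pinning the opponent's score, plus that score."""
    p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / F(R - P)
    p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / F(R - P)
    return (p1, p2, p3, p4), ((1 - p1) * P + p4 * R) / ((1 - p1) + p4)

def scores(p, q):
    """Long-run (s_X, s_Y) for memory-one strategies p (X) and q (Y)."""
    qx = (q[0], q[2], q[1], q[3])  # Y lists his own move first
    m = [[F(a * b), F(a * (1 - b)), F((1 - a) * b), F((1 - a) * (1 - b))]
         for a, b in zip(p, qx)]
    # Stationary distribution: solve v M = v, sum(v) = 1, exactly.
    rows = [[m[j][i] - (1 if i == j else 0) for j in range(4)] + [F(0)]
            for i in range(4)]
    rows[-1] = [F(1)] * 5
    for col in range(4):
        pivot = next((r for r in range(col, 4) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("long-run outcome depends on how play starts")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(4):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    v = [row[-1] for row in rows]
    return (sum(vi * s for vi, s in zip(v, (R, S, T, P))),
            sum(vi * s for vi, s in zip(v, (R, T, S, P))))

treaty, promised = equalizer(F(1), F(1, 2))   # (1, 1/2, 3/4, 1/2); promised = R
print(scores(treaty, treaty))                 # both honor it: 3 each
print(scores(treaty, (0, 0, 0, 0)))           # Y always defects: X 1/2, Y still 3
print(scores(treaty, (F(1, 2), F(1, 3), F(1, 4), F(1, 5)))[1])  # Y: 3 again
```

The sketch stops at the violation itself; it does not model X’s response, which is where the punishment mentioned in the quote would come in.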
Key Questions that Arise with Zero-Determinant Strategies
Whether or not zero-determinant strategies are specifically in play in any particular evolutionary case (and whether or not we can get the data to show that), it remains useful to ask the sorts of questions zero-determinant approaches raise.
For example, one of the most interesting points that arises from the Press and Dyson (2012) paper is the discrepancy created between players who can execute long-term strategies and those who can only execute short-term or immediate strategies. We can still do this type of analysis even without being able to ascertain whether a particular case hews to zero-determinant Prisoner’s Dilemma dynamics.
Important questions to ask include:
-Who is playing long versus short strategies?
-Who gets to set the pay-offs?
-What incentives exist for cooperation? And what punishments are there for cheating? (In other words, do conditions favor more extortion or more cooperation?)
-And, likely most crucial, are zero-determinant strategies available to the agents or organisms in question?
The last question goes beyond asking whether there are iterative interactions of the type described by the Prisoner’s Dilemma, or whether an animal has a theory of mind or the agentive powers needed to execute zero-determinant strategies, as Press and Dyson seem to indicate is necessary. (I’d actually be open to zero-determinant strategies being found through the brute, random processes of evolution as well; it doesn’t have to be a cognitive thing alone.) Rather, are such strategies accessible given differing fitness landscapes, types of reciprocal/dynamic interactions, and/or sets of pay-offs provided to social interactions in specific groups?
Or we might take the reverse engineering route, and find evolutionary scenarios that already exist which we might fruitfully analyze using the lens of skewed but linear pay-offs from iterative interactions between agents pursuing different strategies.
For example, could sexual reproduction be a zero-determinant strategy, specifically in cases where females can systematically skew pay-offs for males that treat intercourse as a short-term interaction?
Life History Theory and Z-D Strategies
One area where this type of analysis might fruitfully be applied is in life history theory, parental investment, and other evolutionary processes that rely on development.
Take the weanling’s dilemma, where developing infants face a trade-off between gaining access to greater nutritional intake through food and the greater risk of exposure to pathogens that comes with it. In one sense, this decision is an iterative one between mother and infant – they both have to agree to keep breastfeeding for another day. Here the mother is in a position to engage in zero-determinant strategizing that favors not only this child, but also other potential children she might have. For the infant, the trade-offs are largely immediate: whether to give up access to free calories.
The infant is in no position to engage in the sort of linear strategizing that a mother can impose. Particularly for infants that appear to be weaker or to not offer the same evolutionary pay-offs, mothers might cut off high levels of investment earlier. For infants that appear to offer greater pay-offs, the mother might do the opposite, and set the table to ensure the maximum pay-off from interactions, even if that ends up having a higher overall cost in terms of reproductive effort. Similarly, for species with large amounts of paternal investment, the same dynamics could play into how fathers invest in their offspring.
In other words, zero-determinant strategies are a way to think about facultative adjustments that organisms can make in reproductive and life history strategies.
As just a thought to throw out there, might zero-determinant approaches shed new light on the epidemiological transition? Where fitness pay-offs for offspring are high through investment and development, has it made sense to invest more as a parent and thus set the highest pay-offs for a child?
As another thought, are epigenetic mechanisms that last more than a generation a way to try to game the system, to try to create the sort of long-term skewing to ensure better adjustment to unpredictable or short-term environments?
On the Evolution of Social Inequality
If a partner can be forced into a “short-memory” strategy, then an agent opens up the field to enact zero-determinant strategies: to play at mischief, to extort others, or to set partners up to play and yet fail…
One mystery of human evolution is why systemic inequality appeared with the emergence of complex social structures. Hunter-gatherers have fewer social divisions than agricultural societies, and often have a series of cultural and behavioral mechanisms that help enforce cooperation. With the rise of agriculture, the ability to dominate concentrated resources, and the need for communal defense in war, many human societies developed social classes and structural inequality. Why?
Evolutionary analysis using zero-determinant strategies helps provide insight – people in positions of power could force short-term strategies on other members of society, even as they enacted their own longer-term linear tactics to determine the pay-offs that reciprocal interactions provided to everyone. In other words, the rich got richer…
I will be interested to see how these types of analyses develop.
I am also struck by some of the language in the Press and Dyson (2012) paper, which sounds remarkably close to analyses of power within cultural anthropology.
Here are two initial examples, developed to be provocative:
For any strategy of the longer-memory player Y, shorter-memory X’s score is exactly the same as if Y had played a certain shorter-memory strategy (roughly, the marginalization of Y’s long-memory strategy: its average over states remembered by Y but not by X), disregarding any history in excess of that shared with X.
Or, to draw on Eric Wolf’s magnum opus, Europe and the People without History: discounting shared history – shared accountability – is certainly one way to force short-term interactions, and that will benefit a dominant trading partner.
X can force any particular score by a fixed strategy p, independent of Y’s strategy q. In other words, there is no need for X to react to Y, except on a timescale of her own choosing… For example, X might condition Y’s score on some arbitrary property of his last 1,000 moves, and thus present him with a simulated fitness landscape that rewards that arbitrary property.
Cultural distinctions often draw on exactly those arbitrary but symbolically powerful distinctions that determine status. To draw on Pierre Bourdieu, Distinction: A Social Critique of the Judgment of Taste, here is the Amazon description:
In the course of everyday life people constantly choose between what they find aesthetically pleasing and what they consider tacky, merely trendy, or ugly… The different aesthetic choices people make are all distinctions – that is, choices made in opposition to those made by other classes. Taste is not pure. Bourdieu finds a world of social meaning in the decision to order bouillabaisse, in our contemporary cult of thinness, in the “California sports” such as jogging and cross-country skiing. The social world, he argues, functions simultaneously as a system of power relations and as a symbolic system in which minute distinctions of taste become the basis for social judgement.
Those social judgments – or traditions, as people call them – function marvelously as zero-determinant strategies that enforce greater pay-offs for the people on the “good” side of the judgments.
Coca-Cola and Capitalism
Can we also analyze capitalism in this same way? Is capitalism an example of a zero-determinant strategy?
On the idealized side, the rational cooperative strategy of fair play provides maximal pay-offs to everyone involved. On the critical side, companies that can extract excess profit from workers or from consumers can use that increased pay-off to set conditions that favor themselves.
Workers often play shorter-term strategies than companies – they have to eat every day. Companies can then use zero-determinant (or unequal-determinant) strategies. They can extract the maximum productivity from their workers, all the while setting up strategies that ensure the companies get much greater pay-offs over time than the workers. The workers buy into it, because they get what they need in an immediate sense. But they often lose in the long run.
Consumers are similar. You want a Coke or you don’t. But the Coca-Cola company isn’t playing that game. It wants to extract maximum profit from a very lucrative commodity. The iterative interaction – you buying Coke after Coke and Coca-Cola selling can after can – is played at different time-scales, and with very different objectives. The company can bring enormous resources to bear – placement of Coke in supermarkets and movie theaters, sugar price supports from the government, and so forth – that set up a long-term memory strategy that provides it great pay-offs.
The consumer, well, the consumer gets choice – an immediate decision between one product or another, or one price or another. Funny how the market works that way…
So this 2012 paper by William Press and Freeman Dyson, “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent,” brings up lots of interesting angles.
The implications of this work go beyond biological evolution into social and cultural evolution, human development and reproduction, and social analysis of social class, capitalism and inequality. I hope I’ve stirred up a little mischief with it.