Learning to cooperate in the Iterated Prisoner’s Dilemma by means of social attachments


The Iterated Prisoner’s Dilemma (IPD) has been used as a paradigm for studying the emergence of cooperation among individual agents. Many computer experiments show that cooperation does arise under certain conditions. In particular, the spatial version of the IPD has been used and analyzed to understand the role of local interactions in the emergence and maintenance of cooperation. It is known that individual learning leads players to the Nash equilibrium of the game, which means that cooperation is not selected. Therefore, in this paper we propose that when players have social attachments, learning may lead to a certain rate of cooperation. We perform experiments in which agents play the spatial IPD while taking into account social relationships such as belonging to a hierarchy or to a coalition. Results show that learners end up cooperating, especially when coalitions emerge.
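To make concrete why individual learners are drawn away from cooperation, the following minimal sketch enumerates the pure-strategy Nash equilibria of a single Prisoner’s Dilemma stage game. The payoff values (T = 5, R = 3, P = 1, S = 0) and all function names are illustrative assumptions; they are not taken from the paper and only satisfy the usual ordering T > R > P > S.

```python
from itertools import product

# Assumed payoff values (not from the paper): T > R > P > S and 2R > T + S.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_action, other_action)] -> my payoff; "C" = cooperate, "D" = defect.
# The game is symmetric, so the same table serves both players.
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}
ACTIONS = ("C", "D")


def best_responses(other_action):
    """Return the set of actions that maximise my payoff against other_action."""
    best = max(PAYOFF[(a, other_action)] for a in ACTIONS)
    return {a for a in ACTIONS if PAYOFF[(a, other_action)] == best}


def pure_nash_equilibria():
    """Return all action pairs in which each action is a best response to the other."""
    return [
        (a, b)
        for a, b in product(ACTIONS, ACTIONS)
        if a in best_responses(b) and b in best_responses(a)
    ]


if __name__ == "__main__":
    print("Pure Nash equilibria:", pure_nash_equilibria())
    print("Payoff for mutual cooperation:", PAYOFF[("C", "C")])
    print("Payoff for mutual defection:", PAYOFF[("D", "D")])
```

Under these assumed payoffs, defection is the best response to either action, so mutual defection is the only equilibrium even though mutual cooperation pays each player more. This is the gap that social attachments such as hierarchies and coalitions are meant to close.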


Author information



Corresponding author

Correspondence to Ana L. C. Bazzan.

Additional information

A previous version of this paper appeared at BWSS 2010, the Brazilian Symposium on Social Simulation.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bazzan, A.L.C., Peleteiro, A. & Burguillo, J.C. Learning to cooperate in the Iterated Prisoner’s Dilemma by means of social attachments. J Braz Comput Soc 17, 163–174 (2011).

Keywords


  • Game theory
  • Iterative Prisoner’s Dilemma
  • Reinforcement learning
  • Agent-based simulation