Comparison of Reinforcement Learning Algorithms: DQN vs PPO on the Atari Game Amidar

Corzo Acuña, Fabrizio Mario


authorProfile.id.code	202111240
dc.contributor.advisor	Takahashi Rodríguez, Silvia
dc.contributor.author	Corzo Acuña, Fabrizio Mario
dc.contributor.jury	Takahashi Rodríguez, Silvia
dc.date.accessioned	2026-01-27T19:30:59Z
dc.date.available	2026-01-27T19:30:59Z
dc.date.issued	2026-01-22
dc.description.abstract	En este proyecto implementé y comparé tres algoritmos de aprendizaje por refuerzo para jugar Amidar, un clásico de Atari: Q-Learning tabular (como punto de partida), Deep Q-Network (DQN) y Proximal Policy Optimization (PPO). Los experimentos arrojaron resultados claros: PPO obtuvo en promedio 293,88 ± 84,91 puntos, mientras que DQN apenas alcanzó 89,69 ± 15,45, una diferencia de más del 200%. También desarrollé versiones optimizadas de PPO (“Mejorado” y “Ultra”) usando técnicas recientes como normalización de recompensas y arquitectura Impala-CNN, que mejoraron aún más la estabilidad durante el entrenamiento. El trabajo documenta tanto los éxitos como los problemas encontrados, incluyendo un fenómeno interesante de colapso de entropía en PPO que limita la diversidad de estrategias aprendidas.	spa
dc.description.abstract	This project compares three reinforcement learning algorithms on Amidar, a classic Atari game: tabular Q-Learning as a baseline, Deep Q-Network (DQN), and Proximal Policy Optimization (PPO). The results were clear-cut: PPO scored 293.88 ± 84.91 points on average, while DQN reached only 89.69 ± 15.45, so PPO's mean score is more than 200% higher than DQN's (roughly 3.3 times as high). I also developed optimized PPO variants (“Improved” and “Ultra”) using recent techniques such as reward normalization and the Impala-CNN architecture, which further improved training stability. Beyond the numbers, this work documents practical challenges I encountered, including an entropy collapse phenomenon in PPO that caused the agent to learn only a handful of rigid strategies instead of adapting to different situations.	eng
dc.description.degreelevel	Undergraduate
dc.format.extent	42 pages
dc.format.mimetype	application/pdf
dc.identifier.instname	instname:Universidad de los Andes
dc.identifier.reponame	reponame:Repositorio Institucional Séneca
dc.identifier.repourl	repourl:https://repositorio.uniandes.edu.co/
dc.identifier.uri	https://hdl.handle.net/1992/77986
dc.language.iso	spa
dc.publisher	Universidad de los Andes
dc.publisher.department	Departamento de Ingeniería de Sistemas y Computación
dc.publisher.faculty	Facultad de Ingeniería
dc.publisher.program	Ingeniería de Sistemas y Computación
dc.relation.references	V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Playing Atari with Deep Reinforcement Learning,” NIPS Deep Learning Workshop, 2013.
dc.relation.references	V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
dc.relation.references	J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv preprint arXiv:1707.06347, 2017.
dc.relation.references	J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” arXiv preprint arXiv:1506.02438, 2015.
dc.relation.references	D. Silver, A. Huang, C. J. Maddison, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
dc.relation.references	L. Espeholt, H. Soyer, R. Munos, et al., “IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures,” ICML, 2018.
dc.relation.references	M. Hessel, J. Modayil, H. Van Hasselt, et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning,” AAAI, 2018.
dc.relation.references	M. Andrychowicz, A. Raichuk, P. Stańczyk, et al., “What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study,” ICLR, 2021.
dc.relation.references	H. Van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-Learning,” AAAI, 2016.
dc.relation.references	Z. Wang, T. Schaul, M. Hessel, et al., “Dueling Network Architectures for Deep Reinforcement Learning,” ICML, 2016.
dc.relation.references	R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.
dc.relation.references	G. Brockman, V. Cheung, L. Pettersson, et al., “OpenAI Gym,” arXiv preprint arXiv:1606.01540, 2016.
dc.relation.references	M. Towers, J. K. Terry, A. Kwiatkowski, et al., “Gymnasium: A Standard Interface for Reinforcement Learning Environments,” arXiv preprint arXiv:2407.17032, 2024.
dc.relation.references	M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The Arcade Learning Environment: An Evaluation Platform for General Agents,” Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, 2013.
dc.relation.references	L. Engstrom, A. Ilyas, S. Santurkar, et al., “Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO,” ICLR, 2020.
dc.rights	Attribution 4.0 International	en
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.coar	http://purl.org/coar/access_right/c_abf2
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject.keyword	Aprendizaje por refuerzo	spa
dc.subject.keyword	Deep Q-Network	eng
dc.subject.keyword	Proximal Policy Optimization	eng
dc.subject.keyword	Atari	spa
dc.subject.keyword	Amidar	spa
dc.subject.keyword	Redes neuronales convolucionales	spa
dc.subject.themes	Ingeniería	spa
dc.title	Comparación de Algoritmos de Aprendizaje por Refuerzo: DQN vs PPO en el Juego Atari Amidar	spa
dc.type	Undergraduate thesis
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content	Text
dc.type.driver	info:eu-repo/semantics/bachelorThesis
dc.type.redcol	http://purl.org/redcol/resource_type/TP
dc.type.version	info:eu-repo/semantics/acceptedVersion
dspace.entity.type	Publication
person.identifier.cvlac	https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000143898
person.identifier.gsid	https://scholar.google.es/citations?user=x7gjZ04AAAAJ
person.identifier.orcid	0000-0001-7971-8979
relation.isDirectorOfPublication	7ab9a4e1-60f0-4e06-936b-39f2bf93d8a0
relation.isDirectorOfPublication.latestForDiscovery	7ab9a4e1-60f0-4e06-936b-39f2bf93d8a0
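The abstract's claim that PPO outscored DQN by more than 200% can be checked directly from the reported means. The minimal Python sketch below uses only the figures quoted in the abstract; the helper names are hypothetical, and treating each ± value as one standard deviation is an assumption, since the record does not say what the spread represents or over how many episodes it was measured.

```python
# Check of the "more than 200%" gap quoted in the abstract.
# Means and spreads are the values reported there; reading the
# spread as one standard deviation is an assumption.

PPO_MEAN, PPO_SPREAD = 293.88, 84.91
DQN_MEAN, DQN_SPREAD = 89.69, 15.45


def relative_gap_percent(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


def bands_overlap(mean_a: float, spread_a: float,
                  mean_b: float, spread_b: float) -> bool:
    """True if the intervals [mean - spread, mean + spread] intersect."""
    lo_a, hi_a = mean_a - spread_a, mean_a + spread_a
    lo_b, hi_b = mean_b - spread_b, mean_b + spread_b
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    gap = relative_gap_percent(PPO_MEAN, DQN_MEAN)
    print(f"PPO mean exceeds DQN mean by {gap:.1f}%")   # 227.7%
    print(f"PPO/DQN ratio: {PPO_MEAN / DQN_MEAN:.2f}x")  # 3.28x
    print("Bands overlap:",
          bands_overlap(PPO_MEAN, PPO_SPREAD, DQN_MEAN, DQN_SPREAD))  # False
```

The "200%" figure is a relative difference; in absolute terms the gap is 204.19 points. Non-overlapping bands are consistent with a real difference but do not establish statistical significance on their own, which would also require the number of evaluation episodes.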

Files

Name: Comparacion de Algoritmos de Aprendizaje por Refuerzo DQN vs PPO en el Juego Atari Amidar.pdf
Size: 312.1 KB
Format: Adobe Portable Document Format