Publication: Comparación de Algoritmos de Aprendizaje por Refuerzo: DQN vs PPO en el Juego Atari Amidar
| authorProfile.id.code | 202111240 | |
| dc.contributor.advisor | Takahashi Rodríguez, Silvia | |
| dc.contributor.author | Corzo Acuña, Fabrizio Mario | |
| dc.contributor.jury | Takahashi Rodríguez, Silvia | |
| dc.date.accessioned | 2026-01-27T19:30:59Z | |
| dc.date.available | 2026-01-27T19:30:59Z | |
| dc.date.issued | 2026-01-22 | |
| dc.description.abstract | En este proyecto implementé y comparé tres algoritmos de aprendizaje por refuerzo para jugar Amidar, un clásico de Atari: Q-Learning tabular (como punto de partida), Deep Q-Network (DQN) y Proximal Policy Optimization (PPO). Los experimentos arrojaron resultados claros: PPO obtuvo en promedio 293,88 ± 84,91 puntos, mientras que DQN apenas alcanzó 89,69 ± 15,45, una diferencia de más del 200%. También desarrollé versiones optimizadas de PPO (“Mejorado” y “Ultra”) usando técnicas recientes como normalización de recompensas y arquitectura Impala-CNN, que mejoraron aún más la estabilidad durante el entrenamiento. El trabajo documenta tanto los éxitos como los problemas encontrados, incluyendo un fenómeno interesante de colapso de entropía en PPO que limita la diversidad de estrategias aprendidas. | spa |
| dc.description.abstract | This project compares three reinforcement learning algorithms on Amidar, a classic Atari game: tabular Q-Learning as a baseline, Deep Q-Network (DQN), and Proximal Policy Optimization (PPO). The results were striking: PPO scored 293.88 ± 84.91 points on average, while DQN managed only 89.69 ± 15.45, a gap of over 200%. I also developed optimized PPO variants (“Improved” and “Ultra”) using recent techniques such as reward normalization and the Impala-CNN architecture, which further improved training stability. Beyond the numbers, this work documents practical challenges I encountered, including an entropy collapse phenomenon in PPO that caused the agent to learn only a handful of rigid strategies instead of adapting to different situations. | eng |
| dc.description.degreelevel | Pregrado | |
| dc.format.extent | 42 páginas | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.instname | instname:Universidad de los Andes | |
| dc.identifier.reponame | reponame:Repositorio Institucional Séneca | |
| dc.identifier.repourl | repourl:https://repositorio.uniandes.edu.co/ | |
| dc.identifier.uri | https://hdl.handle.net/1992/77986 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad de los Andes | |
| dc.publisher.department | Departamento de Ingeniería de Sistemas y Computación | |
| dc.publisher.faculty | Facultad de Ingeniería | |
| dc.publisher.program | Ingeniería de Sistemas y Computación | |
| dc.relation.references | V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Playing Atari with Deep Reinforcement Learning,” NIPS Deep Learning Workshop, 2013. | |
| dc.relation.references | V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. | |
| dc.relation.references | J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv preprint arXiv:1707.06347, 2017. | |
| dc.relation.references | J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” arXiv preprint arXiv:1506.02438, 2015. | |
| dc.relation.references | D. Silver, A. Huang, C. J. Maddison, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016. | |
| dc.relation.references | L. Espeholt, H. Soyer, R. Munos, et al., “IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures,” ICML, 2018. | |
| dc.relation.references | M. Hessel, J. Modayil, H. Van Hasselt, et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning,” AAAI, 2018. | |
| dc.relation.references | M. Andrychowicz, A. Raichuk, P. Stańczyk, et al., “What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study,” ICLR, 2021. | |
| dc.relation.references | H. Van Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-Learning,” AAAI, 2016. | |
| dc.relation.references | Z. Wang, T. Schaul, M. Hessel, et al., “Dueling Network Architectures for Deep Reinforcement Learning,” ICML, 2016. | |
| dc.relation.references | R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018. | |
| dc.relation.references | G. Brockman, V. Cheung, L. Pettersson, et al., “OpenAI Gym,” arXiv preprint arXiv:1606.01540, 2016. | |
| dc.relation.references | M. Towers, J. K. Terry, A. Kwiatkowski, et al., “Gymnasium: A Standard Interface for Reinforcement Learning Environments,” arXiv preprint arXiv:2407.17032, 2024. | |
| dc.relation.references | M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The Arcade Learning Environment: An Evaluation Platform for General Agents,” Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, 2013. | |
| dc.relation.references | L. Engstrom, A. Ilyas, S. Santurkar, et al., “Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO,” ICLR, 2020. | |
| dc.rights | Attribution 4.0 International | en |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.coar | http://purl.org/coar/access_right/c_abf2 | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject.keyword | Aprendizaje por refuerzo | spa |
| dc.subject.keyword | Deep Q-Network | eng |
| dc.subject.keyword | Proximal Policy Optimization | eng |
| dc.subject.keyword | Atari | spa |
| dc.subject.keyword | Amidar | spa |
| dc.subject.keyword | Redes neuronales convolucionales | spa |
| dc.subject.themes | Ingeniería | spa |
| dc.title | Comparación de Algoritmos de Aprendizaje por Refuerzo: DQN vs PPO en el Juego Atari Amidar | spa |
| dc.type | Trabajo de grado - Pregrado | |
| dc.type.coar | http://purl.org/coar/resource_type/c_7a1f | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.content | Text | |
| dc.type.driver | info:eu-repo/semantics/bachelorThesis | |
| dc.type.redcol | http://purl.org/redcol/resource_type/TP | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| dspace.entity.type | Publication | |
| person.identifier.cvlac | https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000143898 | |
| person.identifier.gsid | https://scholar.google.es/citations?user=x7gjZ04AAAAJ | |
| person.identifier.orcid | 0000-0001-7971-8979 | |
| relation.isDirectorOfPublication | 7ab9a4e1-60f0-4e06-936b-39f2bf93d8a0 | |
| relation.isDirectorOfPublication.latestForDiscovery | 7ab9a4e1-60f0-4e06-936b-39f2bf93d8a0 | |
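
The gap of over 200% stated in the abstract can be checked directly from the two reported mean scores. The following is a minimal sketch of that arithmetic only. The function name `relative_gap_percent` is hypothetical and does not come from the thesis; the four figures are the ones quoted in the abstract.

```python
# Scores quoted in the abstract (mean ± standard deviation, in game points).
ppo_mean, ppo_std = 293.88, 84.91
dqn_mean, dqn_std = 89.69, 15.45


def relative_gap_percent(better: float, baseline: float) -> float:
    """Return the percentage by which `better` exceeds `baseline`."""
    return (better - baseline) / baseline * 100.0


# Relative improvement of PPO over DQN, and the plain ratio of the two means.
gap = relative_gap_percent(ppo_mean, dqn_mean)
ratio = ppo_mean / dqn_mean

print(f"PPO exceeds DQN by {gap:.1f}% ({ratio:.2f}x the DQN mean)")
```

Under these figures the relative gap comes out a little under 230%, which is consistent with the "over 200%" wording. The standard deviations are carried along only for reference; this check says nothing about statistical significance.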
Files
Original bundle
- Name: Comparacion de Algoritmos de Aprendizaje por Refuerzo DQN vs PPO en el Juego Atari Amidar.pdf
- Size: 312.1 KB
- Format: Adobe Portable Document Format

- Name: Formato autorización proyecto de grado.pdf
- Size: 284.18 KB
- Format: Adobe Portable Document Format
- Description: HIDE
License bundle
- Name: license.txt
- Size: 2.48 KB
- Format: Item-specific license agreed upon to submission
- Description: