Strategies for Mitigating Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability
Vincent Pineau, School of Computer Science, McGill University, University Street, Montreal, Canada
Raphael Rabusseau, Montefiore Institute, University of Liège, Allée de la découverte, Belgium

Abstract
In batch reinforcement learning, where agents must learn from a fixed dataset of transitions under partial observability, overfitting and asymptotic bias can substantially degrade the learning process. This paper examines strategies for addressing both issues and for improving the performance of agents operating under partial observability. We present techniques that mitigate overfitting while countering the inherent asymptotic bias, leading to more robust and reliable batch reinforcement learning algorithms.
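To make the bias-overfitting tension concrete, the sketch below runs tabular fitted Q-iteration on a fixed batch of transitions from a toy two-state MDP and varies the training discount factor, one simple knob for trading asymptotic bias against fitting error on limited data. This is a minimal illustrative sketch, not the method studied in the paper: the toy environment, the gamma_train values, and the fitted_q_iteration and make_batch helpers are all assumptions introduced here for demonstration.

# Minimal sketch (illustrative, not the paper's method): tabular fitted
# Q-iteration on a fixed batch. Training with a discount factor below the
# true one acts as a regularizer: it reduces sensitivity to a small batch
# at the cost of asymptotic bias. Environment details are assumptions.
import random
from collections import defaultdict

def fitted_q_iteration(batch, n_states, n_actions, gamma_train, n_iters=200):
    """batch: list of (s, a, r, s_next) transitions collected beforehand."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        # Build empirical Bellman targets from the fixed batch only,
        # then apply them synchronously.
        targets = defaultdict(list)
        for s, a, r, s_next in batch:
            targets[(s, a)].append(r + gamma_train * max(q[s_next]))
        for (s, a), ts in targets.items():
            q[s][a] = sum(ts) / len(ts)
    return q

if __name__ == "__main__":
    random.seed(0)

    # Toy 2-state MDP: in state 0, action 1 forgoes immediate reward but
    # reaches state 1, where action 1 pays 2 per step.
    def step(s, a):
        if s == 0:
            return (1.0, 0) if a == 0 else (0.0, 1)
        return (2.0, 1) if a == 1 else (0.0, 0)

    def make_batch(n):
        out = []
        for _ in range(n):
            s, a = random.randrange(2), random.randrange(2)
            r, s_next = step(s, a)
            out.append((s, a, r, s_next))
        return out

    batch = make_batch(50)
    for gamma_train in (0.5, 0.95):  # smaller gamma = more bias
        q = fitted_q_iteration(batch, 2, 2, gamma_train)
        greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
        print(f"gamma_train={gamma_train}: greedy policy per state = {greedy}")

With gamma_train = 0.95 the greedy policy recovers the far-sighted action in state 0, whereas with gamma_train = 0.5 the myopic action looks just as good: the bias side of the trade-off, accepted in exchange for a fit that depends less heavily on the particular batch.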
Keywords
Batch Reinforcement Learning, Partial Observability, Overfitting