Articles | Open Access

STRATEGIES FOR MITIGATING OVERFITTING AND ASYMPTOTIC BIAS IN BATCH REINFORCEMENT LEARNING WITH PARTIAL OBSERVABILITY

Vincent Pineau, School of Computer Science, McGill University, University Street, Montreal, Canada
Raphael Rabusseau, Montefiore Institute, University of Liège, Allée de la découverte, Belgium

Abstract

In batch reinforcement learning, where agents must learn from a fixed dataset under partial observability, overfitting and asymptotic bias can substantially degrade the learning process. This paper explores strategies for addressing both issues and improving the performance of agents operating under partial observability. We examine techniques that mitigate overfitting while countering the inherent asymptotic bias, paving the way for more robust and reliable batch reinforcement learning algorithms.
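The abstract does not specify which techniques the paper develops. As one illustrative sketch only (not the authors' method), the snippet below implements tabular fitted Q-iteration on a fixed batch of transitions and treats the discount factor as a regularization knob: a lower discount shortens the effective planning horizon, which can reduce overfitting to a small batch at the price of added asymptotic bias. The environment and batch here are hypothetical toy data.

```python
import numpy as np


def fitted_q_iteration(batch, n_states, n_actions, gamma, n_iters=100):
    """Tabular fitted Q-iteration on a fixed batch of (s, a, r, s') tuples.

    The discount factor gamma acts as a regularizer: lowering it shrinks
    the effective horizon, trading asymptotic bias against overfitting
    when the batch is small.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next in batch:
            # Bellman target built only from transitions in the batch.
            targets[s, a] += r + gamma * q[s_next].max()
            counts[s, a] += 1
        # Average the targets over the observed transitions per (s, a);
        # unseen pairs keep their previous (zero-initialized) value.
        seen = counts > 0
        q[seen] = targets[seen] / counts[seen]
    return q


# Toy two-state chain: action 1 moves toward state 1, which pays reward 1.
batch = [(0, 1, 0.0, 1), (1, 1, 1.0, 1), (1, 0, 0.0, 0), (0, 0, 0.0, 0)]
q = fitted_q_iteration(batch, n_states=2, n_actions=2, gamma=0.9)
```

On this toy batch the iteration converges to the usual fixed point (for example, Q(1, 1) approaches 1 / (1 - 0.9) = 10), and the greedy policy picks action 1 in both states; with a smaller gamma the value estimates shrink toward the immediate rewards.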

Keywords

Batch Reinforcement Learning, Partial Observability, Overfitting

How to Cite

STRATEGIES FOR MITIGATING OVERFITTING AND ASYMPTOTIC BIAS IN BATCH REINFORCEMENT LEARNING WITH PARTIAL OBSERVABILITY. (2023). International Journal of Artificial Intelligence, 3(04), 01-05. https://www.academicpublishers.org/journals/index.php/ijai/article/view/60