DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
The International Conference on Learning Representations (ICLR) 2021 [Spotlight]
Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern
Oregon State University
Abstract
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. The DAC-MDP is a non-parametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. Theoretically, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior of DAC-MDPs in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large, complex offline RL problems.
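To make the high-level idea in the abstract concrete, the sketch below derives a finite MDP from a static transition dataset and solves it exactly with value iteration. It is only an illustrative reading of the description above, not the paper's implementation: the function names, the Euclidean distance in feature space, the uniform weighting over k neighbors, and the linear distance-based cost are all assumptions.

```python
import numpy as np

def build_dac_mdp(S, A, R, S_next, n_actions, k=5, cost_coef=1.0):
    """Derive a finite MDP over the dataset's next-states (sketch).

    For each core state and action, the k nearest dataset transitions taken
    with that action are averaged; their rewards are penalized by cost_coef
    times the feature distance, discouraging the planner from exploiting
    under-represented parts of the model.  All design details here are
    illustrative assumptions, not the authors' code.
    """
    core = S_next                       # finite state set of the derived MDP
    n = len(core)
    P = np.zeros((n_actions, n, n))     # P[a, i, j] = Pr(core[j] | core[i], a)
    Rm = np.full((n_actions, n), -1e6)  # large penalty where no data exists
    for a in range(n_actions):
        idx_a = np.where(A == a)[0]     # dataset transitions that used action a
        if len(idx_a) == 0:
            continue
        for i, s in enumerate(core):
            d = np.linalg.norm(S[idx_a] - s, axis=1)
            order = np.argsort(d)[:k]
            nbrs = idx_a[order]
            # Averaged reward minus a cost for relying on distant neighbors.
            Rm[a, i] = np.mean(R[nbrs] - cost_coef * d[order])
            for j in nbrs:              # uniform average over neighbor outcomes
                P[a, i, j] += 1.0 / len(nbrs)
    return P, Rm

def value_iteration(P, Rm, gamma=0.99, iters=200):
    """Exactly solve the derived finite MDP by value iteration."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = Rm + gamma * (P @ V)        # Q[a, i]
        V = Q.max(axis=0)
    return V, Q.argmax(axis=0)          # optimal values and greedy policy
```

Under this reading, acting online would amount to encoding the current observation with the learned representation, finding its nearest core states, and following the greedy policy returned by value iteration; adjusting the cost coefficient or reward function only requires re-solving the small derived MDP, which is what enables the zero-shot adjustment mentioned above.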