- Title
- Coarse Q-Learning : Addressing the convergence problem when quantizing continuous state variables
- Creator
- Dazeley, Richard; Vamplew, Peter; Bignold, Adam
- Date
- 2015
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/162416
- Identifier
- vital:12644
- Identifier
- https://doi.org/10.13140/RG.2.1.1965.1041
- Abstract
- Value-based approaches to reinforcement learning (RL) maintain a value function that measures the long-term utility of a state or state-action pair. A long-standing issue in RL is how to create a finite representation in a continuous, and therefore infinite, state environment. The common approach is to use function approximators such as tile coding, or memory- or instance-based methods. These provide some balance between generalisation, resolution, and storage, but converge slowly in multidimensional state environments. Another approach, quantizing state into lookup tables, has been commonly regarded as highly problematic due to large memory requirements and poor generalisation. In particular, attempting to reduce memory requirements and increase generalisation by using coarser quantization forms a non-Markovian system that does not converge. This paper investigates the problem in using quantized lookup tables and presents an extension to the Q-Learning algorithm, referred to as Coarse Q-Learning (CQL), which resolves these issues. The presented algorithm will be shown to drastically reduce the memory requirements and increase generalisation by simulating the Markov property. In particular, this algorithm means the size of the input space is determined by the granularity required by the policy being learnt, rather than by the inadequacies of the learning algorithm or the nature of the state-reward dynamics of the environment. Importantly, the method presented addresses the problem represented by the curse of dimensionality.
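- The baseline the abstract critiques can be illustrated with a minimal sketch of standard tabular Q-learning over a coarsely quantized continuous state. Everything here (bin count, learning rate, state bounds, function names) is an illustrative assumption, not taken from the paper; coarse bins make the quantized process non-Markovian, which is precisely the convergence problem the paper's Coarse Q-Learning (CQL) extension is designed to resolve.

```python
import numpy as np

N_BINS = 10          # coarse quantization: 10 cells per state dimension (assumed)
N_ACTIONS = 2        # illustrative discrete action set
ALPHA, GAMMA = 0.1, 0.95  # assumed learning rate and discount factor

def quantize(x, lo=-1.0, hi=1.0, n=N_BINS):
    """Map a continuous scalar state in [lo, hi] onto a lookup-table index."""
    i = int((x - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)

# Q-table indexed by quantized state cell and action
Q = np.zeros((N_BINS, N_ACTIONS))

def q_update(x, a, r, x_next):
    """One standard tabular Q-learning step on quantized states.

    With coarse cells, many distinct continuous states share one row of Q,
    so the induced process is non-Markovian and convergence can fail --
    the issue the paper's CQL extension targets.
    """
    s, s_next = quantize(x), quantize(x_next)
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (td_target - Q[s, a])
    return s, s_next
```

- Note the trade-off the abstract describes: a larger `N_BINS` improves the Markov approximation but the table grows exponentially with state dimensions (the curse of dimensionality), while a smaller `N_BINS` saves memory at the cost of aliasing distinct states into one cell.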
- Relation
- 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Rights
- This metadata is freely available under a CC0 license
- Subject
- Reinforcement learning; Temporal difference learning; Continuous state; Quantized state; Function approximation
- Reviewed