Distributional Reinforcement Learning, Terra Incognita
Jan 2, 2022
3 minute read

My foray into the world of reinforcement learning may (or may not) have been caused by the Baader–Meinhof phenomenon. I’ve been interested in machine learning (ML) for as long as I can remember. However, my interest in ML has sat collecting dust alongside a host of other unpursued interests.

A while back, while learning about ML at a high level, I discovered the field of reinforcement learning. Reinforcement learning (RL) uses the concept of rewards to teach agents to take intelligent actions. I find the field fascinating.

Since learning about RL, educational resources on the topic have been popping up all around me. I started taking a course by DeepMind but got stuck after a few lectures due to a lack of proper ML fundamentals. I’ve decided to allocate some time this year to learning the fundamentals of ML and, eventually, RL. I am writing this post so I can hold myself accountable for this goal.

I came across the draft (under submission) of a book on Hacker News: Distributional Reinforcement Learning by Marc G. Bellemare, Will Dabney, and Mark Rowland. I read the first two chapters but got stuck after that for the same reason mentioned above. Still, I learned a lot from those two chapters and took some notes. Below are the notes I made while reading chapter one. They aren’t the most coherent you’ll see, but yeah, enjoy.

  • There’s a relationship between intelligence and external feedback.
  • In reinforcement learning, external feedback is quantifiable as a reward.
  • Agents look to maximise the cumulative reward from their interactions with an environment, i.e. the accumulation of rewards they receive by making certain decisions. An agent’s objective can be shaped by how rewards are assigned to decisions. Rewards → decisions → agent’s objective.
  • Agents, decisions, uncertainty. Agents making decisions introduce uncertainty into the mix. RL models this uncertainty by introducing chance into the rewards and into the effects of the agent’s decisions on its environment.
  • Return → the (random) sum of rewards the agent receives along the way.
  • Historically, the field of RL has devoted most of its effort to modelling the mean of the random return.
  • Maximisation of expectation → a common concept in RL (the reward hypothesis).
  • Great analogy answering "why distributional RL?" → just as a coloured photograph captures more detail and depth about a scene than a black & white photo, distributional reinforcement learning (DRL) works with the full, infinite-dimensional distribution of returns from the agent’s decisions rather than a single expected value.
  • DRL gives a fresh perspective on how optimal decisions should be made and on the consequences of interacting with other learning agents.
  • distributional dynamic programming — the process of computing return distributions
  • aleatoric uncertainty — the notion that the random nature of the interactions is intrinsically irreducible
  • partially observable — a scenario where an environment may appear random because parts of its state are not described to the agent
  • The fundamental object of distributional reinforcement learning is a probability distribution over returns, not a scalar as in traditional RL (the toy sketch after this list makes the difference concrete).
  • The Bellman equation differs from the distributional Bellman equation (I need to take a closer look at these equations, they are everywhere in RL; I’ve written both out after this list as I currently understand them).
  • when analysing a distributional reinforcement learning algorithm, we must measure the distance between probability distributions using a probability metric, rather than the distance between scalar values.
  • distributional reinforcement learning enables behaviour that is more robust to variations and, perhaps, better suited to real-world applications.
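
A couple of additions of my own to make these notes stick. The two Bellman equations come up everywhere, so here they are as I currently understand them: the classical one is a statement about the expected return, while the distributional one is an equality in distribution over the whole random return.

```latex
% Classical Bellman equation: the value of a state is the expected immediate
% reward plus the discounted value of whatever state comes next.
V^{\pi}(s) = \mathbb{E}\big[\, R_{t+1} + \gamma\, V^{\pi}(S_{t+1}) \,\big|\, S_t = s \,\big]

% Distributional Bellman equation: the random return itself satisfies a
% recursion, where \overset{D}{=} means "equal in distribution".
Z^{\pi}(s) \overset{D}{=} R_{t+1} + \gamma\, Z^{\pi}(S_{t+1}), \qquad S_t = s
```

To make the "distribution over returns, not a scalar" point concrete, here’s a tiny sketch (my own toy example, not from the book): a two-state episode with a coin-flip reward, where classical RL would summarise the outcome by its mean while distributional RL keeps the whole sample distribution of returns.

```python
# A made-up toy example: estimate the return of a fixed policy in a tiny
# two-state episode by Monte Carlo rollouts, then compare the single number
# classical RL cares about (the mean) with the full empirical distribution
# that distributional RL works with.
import random
from statistics import mean, quantiles

GAMMA = 0.9          # discount factor
N_EPISODES = 10_000  # number of sampled returns

def rollout():
    """Roll out one episode from state 'risky' and return its discounted return."""
    state, g, discount = "risky", 0.0, 1.0
    while state != "done":
        if state == "risky":
            # 50/50 chance of a big win or a small loss, then move on.
            reward = 10.0 if random.random() < 0.5 else -2.0
            state = "safe"
        else:  # "safe"
            reward = 1.0
            state = "done"
        g += discount * reward
        discount *= GAMMA
    return g

returns = [rollout() for _ in range(N_EPISODES)]

# Classical RL summarises this whole distribution by one scalar...
print("expected return:", round(mean(returns), 3))

# ...while distributional RL keeps the distribution itself (here, its quartiles).
print("return quartiles:", [round(q, 3) for q in quantiles(returns, n=4)])
```

The estimated mean comes out around 4.9, but the sampled returns themselves cluster around two very different values (roughly -1.1 and 10.9), which is exactly the kind of detail the expectation throws away.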

Here’s a link to the draft if you’re interested in reading it. Cheers.

