Using Deep Reinforcement Learning to Play Sonic the Hedgehog

An attempt to replicate the World Models paper to play Sonic for the OpenAI Retro Contest.

[Image: OpenAI meets Sonic The Hedgehog. Image by OpenAI and SEGA.]
[Image: A baseball batter doesn't have time to fully calculate the best swing to take. Instead, the batter relies on memories from years of practice to make the decision subconsciously. Source: World Models Explained by Siraj Raval.]

“The first principle is that you must not fool yourself, and you are the easiest person to fool.” — Richard Feynman

Learning how to learn

The World Model of a Hedgehog

[Image: Original cover art of the Sonic SEGA Genesis series. Source: Steam]

Why bother with games?

How does our Sonic-playing agent learn to think?

[Image: Observing the world.]

Artificial Intelligence = Reinforcement Learning + Deep Learning

Reinforcement learning

[Image: How one might use reinforcement learning to train a dog to go to the bathroom outside (method not guaranteed).]
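
In code, the same observe, act, reward loop is only a few lines. Below is a minimal sketch using the gym-retro library from the contest, with a random agent standing in for both the dog and Sonic. It assumes you have the Sonic ROM installed and that GreenHillZone.Act1 is available as a starting state:

```python
import retro

# The reinforcement learning loop: observe a state, take an action,
# receive a reward, repeat.
env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # a random "agent", for illustration
    obs, reward, done, info = env.step(action)  # the reward signals good behaviour
    total_reward += reward
env.close()
print(f"Episode reward: {total_reward}")
```

A real agent replaces env.action_space.sample() with a learned policy; everything else about the loop stays the same.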

Deep learning

[Image: Note that we can't directly choose which type of dirt (feature) gets filtered out at each layer. The network (funnel) learns this on its own.]
[Image: When you look at the Sonic game window, what's important to you?]
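
Part of deciding what's important is throwing detail away before the network ever sees it. Here's a sketch of the kind of preprocessing step this implies, assuming frames are shrunk to 64x64 pixels as in the original World Models paper (gym-retro's raw Genesis frames are larger, around 224x320x3):

```python
import numpy as np
from PIL import Image

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Shrink a raw Genesis frame to 64x64 and scale pixel values to [0, 1]."""
    small = Image.fromarray(frame).resize((64, 64))
    return np.asarray(small, dtype=np.float32) / 255.0
```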

Putting together our own World Model

[Image: Going straight to the source.]
[Image: A battle of complexity: Sonic vs. Car Racing. Sonic: 1, Car Racing: 0.]

Observe, Remember, Act, Repeat

[Image: An overview of the World Models architecture described in the original World Models paper.]

Vision Model (Variational Autoencoder — VAE)

[Image: Vision model portion of the World Model. The first half of the VAE architecture (dotted black line) is described in the figure below.]
[Image: The encoder portion of the VAE. See the supporting materials for the full VAE.]
[Image: z is essentially a list of numbers representing a 2D image of the Sonic game space.]
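
As a concrete sketch, here's roughly what the encoder half looks like in Keras. The layer sizes below (four convolutional layers, a 32-dimensional z) follow the original World Models paper and are assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers

Z_SIZE = 32  # latent dimensions (the World Models default)

inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 4, strides=2, activation="relu")(inputs)
x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
x = layers.Conv2D(128, 4, strides=2, activation="relu")(x)
x = layers.Conv2D(256, 4, strides=2, activation="relu")(x)
x = layers.Flatten()(x)
mu = layers.Dense(Z_SIZE)(x)       # mean of the latent distribution
log_var = layers.Dense(Z_SIZE)(x)  # log-variance of the latent distribution

def sample_z(args):
    """Reparameterisation trick: z = mu + sigma * epsilon, epsilon ~ N(0, I)."""
    mu, log_var = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(log_var / 2.0) * eps

z = layers.Lambda(sample_z)([mu, log_var])
encoder = tf.keras.Model(inputs, [mu, log_var, z], name="vae_encoder")
```

The encoder predicts a distribution (mu, log_var) rather than a point, and z is a sample from that distribution: that's what makes the autoencoder "variational".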

Memory Model (Mixture Density Network + Recurrent Neural Network, or MDN-RNN)

[Image: The Memory model section of the architecture is highlighted by the green dashed line and is expanded in the figure below.]
[Image: An expanded version of the Memory model.]
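
In Keras, the skeleton of this model is small: an LSTM whose input at each timestep is the latent vector z concatenated with the action taken, plus a dense "mixture density" head that outputs weights, means and standard deviations for K Gaussians per latent dimension. The sizes below (256 LSTM units, K = 5) follow the original paper and are assumptions here:

```python
import tensorflow as tf
from tensorflow.keras import layers

Z_SIZE, ACTION_SIZE = 32, 12  # latent vector size + 12 Genesis buttons (assumed)
HIDDEN, K = 256, 5            # LSTM units and mixture components (paper defaults)

# Input at each timestep: the current latent z concatenated with the action taken
inputs = layers.Input(shape=(None, Z_SIZE + ACTION_SIZE))
lstm_out = layers.LSTM(HIDDEN, return_sequences=True)(inputs)

# MDN head: K mixture weights, means and log-std-devs per latent dimension
mdn_params = layers.Dense(Z_SIZE * K * 3)(lstm_out)
mdn_rnn = tf.keras.Model(inputs, mdn_params, name="mdn_rnn")
```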

Controller Model (CMA-ES Feed-Forward Neural Network)

[Image: The single-layer linear function that calculates the most rewarding action at each time step.]
[Image: Highlighted portion of the Controller within the World Models architecture.]
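
Because the controller is a single linear layer, its entire parameter set is one flat vector, which is exactly the kind of object CMA-ES is good at evolving. A minimal sketch follows; the sizes are assumptions, and rollout_reward is a hypothetical function that plays one episode with a candidate parameter vector and returns its score:

```python
import numpy as np
import cma  # pip install cma

Z_SIZE, HIDDEN, ACTION_SIZE = 32, 256, 12  # assumed latent, RNN and action sizes

def controller(z, h, params):
    """Single linear layer: action = W @ [z, h] + b. No hidden layers to train."""
    split = ACTION_SIZE * (Z_SIZE + HIDDEN)
    W = params[:split].reshape(ACTION_SIZE, Z_SIZE + HIDDEN)
    b = params[split:]
    logits = W @ np.concatenate([z, h]) + b
    return (logits > 0).astype(np.int8)  # press each button whose logit is positive

n_params = ACTION_SIZE * (Z_SIZE + HIDDEN) + ACTION_SIZE

# CMA-ES evolves the flattened weights. rollout_reward is a hypothetical
# function: play one episode with these parameters, return the total reward.
es = cma.CMAEvolutionStrategy(np.zeros(n_params), 0.5)
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [-rollout_reward(p) for p in candidates])  # CMA-ES minimises
```

No backpropagation happens here at all: CMA-ES only ever sees a parameter vector in and a score out, which is why such a simple controller can sit on top of the two big pretrained models.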

Three models walk into a bar (putting it all together)
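
Strung together, one episode of play is a short loop: the Vision model compresses the frame, the Controller picks an action from that compressed frame plus the Memory model's hidden state, and the Memory model updates its hidden state. Here's a sketch of the loop, where encode, memory_step and controller stand in for the three trained models (illustrative interfaces, not the exact code):

```python
def play_episode(env, encode, memory_step, controller, h0):
    """Roll the Vision -> Memory -> Controller loop for one episode.

    encode, memory_step and controller stand in for the trained V, M and C
    models; their interfaces here are illustrative.
    """
    obs, h, total_reward, done = env.reset(), h0, 0.0, False
    while not done:
        z = encode(obs)                # V: compress the frame to a latent vector
        action = controller(z, h)      # C: pick an action from z and the memory h
        h = memory_step(z, action, h)  # M: update the memory of the world
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```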

How does our agent play?

Step 1 — Generating data
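
Before anything can be trained, the agent needs raw experience. Below is a minimal sketch of collecting (frame, action) pairs with a random policy in gym-retro; the rollout length and starting state are arbitrary choices here, and in practice you would resize frames (see the preprocessing sketch above) before saving:

```python
import numpy as np
import retro

# Roll out a random policy and record frames + actions for the V and M models.
env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
frames, actions = [], []
obs = env.reset()
for _ in range(10_000):
    action = env.action_space.sample()
    frames.append(obs)
    actions.append(action)
    obs, _, done, _ = env.step(action)
    if done:
        obs = env.reset()
env.close()
np.savez_compressed("rollouts.npz", frames=np.array(frames), actions=np.array(actions))
```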

Step 2 — Training the Vision model
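
The VAE is trained to reconstruct the frames collected in step 1 while keeping its latent codes close to a standard normal distribution. A sketch of the loss this implies, using the squared-error reconstruction term plus KL divergence formulation from the original paper:

```python
import tensorflow as tf

def vae_loss(frame, reconstruction, mu, log_var):
    """Reconstruction error plus KL divergence pulling the latents towards N(0, I)."""
    recon = tf.reduce_sum(tf.square(frame - reconstruction), axis=[1, 2, 3])
    kl = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
    return tf.reduce_mean(recon + kl)
```

The reconstruction term makes z informative; the KL term keeps the latent space smooth enough for the Memory model to predict over.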

Step 3 — Training the Memory model
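
The Memory model is trained to predict the next latent vector, and because the future is uncertain it predicts a mixture of Gaussians rather than a single point. A sketch of the negative log-likelihood loss this implies, with the latent size and mixture count assumed to follow the paper:

```python
import numpy as np
import tensorflow as tf

Z_SIZE, K = 32, 5  # assumed latent size and mixture components per dimension

def mdn_loss(z_next, mdn_params):
    """Negative log-likelihood of the true next latent under the predicted mixture."""
    pi, mu, log_sigma = tf.split(mdn_params, 3, axis=-1)
    pi = tf.reshape(pi, (-1, Z_SIZE, K))
    mu = tf.reshape(mu, (-1, Z_SIZE, K))
    log_sigma = tf.reshape(log_sigma, (-1, Z_SIZE, K))
    log_pi = tf.nn.log_softmax(pi, axis=-1)                  # mixture weights
    z = tf.expand_dims(tf.reshape(z_next, (-1, Z_SIZE)), -1)  # broadcast over K
    log_prob = -0.5 * (np.log(2.0 * np.pi) + 2.0 * log_sigma
                       + tf.square((z - mu) / tf.exp(log_sigma)))
    # Sum over mixture components in log space, then average
    return -tf.reduce_mean(tf.reduce_logsumexp(log_pi + log_prob, axis=-1))
```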

Step 4 — Defining the Controller and evolving weight and bias parameters

[Image: Possible button presses with the SEGA Genesis controller.]
[Image: For multiple simultaneous button presses, more than one number in the vector has a value of 1.]
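
In code, an action is just a 12-element binary vector. A small helper makes this concrete; the button ordering below is gym-retro's Genesis layout, which you can confirm on your install via env.buttons:

```python
import numpy as np

# gym-retro's Genesis button order (check env.buttons to confirm)
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]

def to_action(*pressed):
    """Encode a set of buttons as the 12-element binary vector gym-retro expects."""
    action = np.zeros(len(BUTTONS), dtype=np.int8)
    for button in pressed:
        action[BUTTONS.index(button)] = 1
    return action

to_action("RIGHT")       # one 1 in the vector: run right
to_action("RIGHT", "B")  # two 1s: jump while running right
```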

Step 5 — Observing the agent play

[Image: Our agent loved to jump.]
[Image: With these button combinations, one can successfully complete every level of Sonic.]
[Image: Instead of exploring the dotted red circle of the 8 best moves, our agent was searching for the right move in a pool of 4,096 possible combinations.]
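
The arithmetic is simple: 12 independent on/off buttons give 2^12 = 4,096 combinations every frame. Restricting the search to a handful of meaningful moves shrinks that space dramatically. The eight moves below (reusing the to_action helper from the sketch above) are an illustrative guess at such a subset, not the exact set from the figure:

```python
# 12 binary buttons -> 2 ** 12 == 4096 possible combinations per frame.
# A hand-picked subset covers most of what Sonic ever needs:
SONIC_MOVES = [
    to_action(),             # no-op
    to_action("LEFT"),
    to_action("RIGHT"),
    to_action("B"),          # jump
    to_action("LEFT", "B"),
    to_action("RIGHT", "B"),
    to_action("DOWN"),
    to_action("DOWN", "B"),  # spin dash
]
assert len(SONIC_MOVES) == 8 and 2 ** 12 == 4096
```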

Next steps

How could we improve?

A new world

Appendix

[Image: Our full VAE model (encoder and decoder layers).]

Key terms
