Reinforcement Learning Game

Vipul Vaibhaw
Apr 27, 2019


This post is about a small script I wrote to help you understand the basic concepts of reinforcement learning.

Recently, I came across a talk by Richard Sutton at Microsoft titled “Tutorial: Introduction to Reinforcement Learning with Function Approximation”. In this tutorial, he demonstrates a tool written in Common Lisp. I tried to replicate the same game in Python; the code is on GitHub (https://github.com/vaibhawvipul/tildy-mdp).

In this game, the player is an agent. In the world of reinforcement learning there are two components: an agent, and an environment that the agent interacts with.

The agent interacts with the environment by taking permissible actions, and it receives feedback (a reward) in return.

In this game, the user (agent) can take one of two actions: {1, 2}. The environment has two states: {A, B}. The action taken by the user affects the state of the environment.

How to play this game?

The system requirements (as if I had made FIFA 😛) are:

  1. python3
  2. numpy

Once your system is ready, follow these steps:

> git clone https://github.com/vaibhawvipul/tildy-mdp.git
> cd tildy-mdp
> python3 learn_mdp.py

That’s it!

About the Game -

As we saw, there are two states: state A and state B. The scenarios in this game are:

  • In state A with action 1, the final state is A and a small positive reward is earned.
  • In state A with action 2, there is an 80% chance that the final state is B, with a small negative reward. If the final state is A instead, a small positive reward is earned.
  • In state B with action 1, there is an 80% chance that the final state is A, with a big positive reward. If the final state is B instead, a small negative reward is earned.
  • In state B with action 2, there is an 80% chance that the final state is A, with a small negative reward. If the final state is B instead, the reward is also small and negative.
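
The four scenarios above can be sketched as a single Python function. This is only a minimal sketch, not the code from the repo: the name `step` and the reward values (`SMALL = 1`, `BIG = 10`) are assumptions, since the scenarios only describe rewards as small or big.

```python
import random

# Assumed reward magnitudes; the post only says "small" and "big".
SMALL, BIG = 1, 10


def step(state, action, rng=random):
    """Return (next_state, reward) for one move, following the four scenarios."""
    if state == "A" and action == 1:
        # Deterministic: stay in A and earn a small positive reward.
        return "A", +SMALL

    # Every other scenario has an 80% outcome and a 20% outcome.
    likely = rng.random() < 0.8

    if state == "A" and action == 2:
        return ("B", -SMALL) if likely else ("A", +SMALL)
    if state == "B" and action == 1:
        return ("A", +BIG) if likely else ("B", -SMALL)
    if state == "B" and action == 2:
        return ("A", -SMALL) if likely else ("B", -SMALL)

    raise ValueError(f"unknown state/action: {state!r}, {action!r}")
```

With this sketch, `step("A", 2)` returns `("B", -1)` roughly four times out of five and `("A", 1)` otherwise.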

The Optimal Strategy -

This game helps the user understand the exploration vs. exploitation dilemma and the Markov decision process (where the outcome is partly random and partly under the control of the user).

From the scenarios above, a good strategy is to remain in state A and keep taking action 1, which always yields a small positive reward.

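However, the optimal strategy is to take action 2 when in state A and accept a small negative reward. This will most likely change the state to B, where taking action 1 earns a big positive reward! This holds as long as the big reward outweighs the small penalties paid along the way.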
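
To see why the detour through state B can pay off, here is a rough sketch that computes the long-run average reward per move for a fixed strategy. The 80/20 splits come from the scenarios in this post; the reward magnitudes (1 for small, 10 for big), the names `make_model` and `average_reward`, and the assumption that the game starts in state A are all mine.

```python
def make_model(small=1.0, big=10.0):
    """(state, action) -> list of (probability, next_state, reward)."""
    return {
        ("A", 1): [(1.0, "A", +small)],
        ("A", 2): [(0.8, "B", -small), (0.2, "A", +small)],
        ("B", 1): [(0.8, "A", +big), (0.2, "B", -small)],
        ("B", 2): [(0.8, "A", -small), (0.2, "B", -small)],
    }


def average_reward(policy, model):
    """Long-run average reward per move for a policy {state: action}."""
    from_a = model[("A", policy["A"])]
    from_b = model[("B", policy["B"])]

    # Chance of switching state, and expected one-step reward, in each state.
    p_ab = sum(p for p, s, _ in from_a if s == "B")
    p_ba = sum(p for p, s, _ in from_b if s == "A")
    r_a = sum(p * r for p, _, r in from_a)
    r_b = sum(p * r for p, _, r in from_b)

    if p_ab == 0:
        # The agent never leaves A (assuming the game starts in A).
        return r_a

    # Stationary distribution of a two-state Markov chain.
    pi_a = p_ba / (p_ab + p_ba)
    return pi_a * r_a + (1 - pi_a) * r_b


model = make_model()
stay_in_a = average_reward({"A": 1, "B": 1}, model)
go_to_b = average_reward({"A": 2, "B": 1}, model)
```

With these assumed values, always taking action 1 averages 1.0 per move, while taking action 2 in A and action 1 in B averages about 3.6. If the big reward is only 2 (`make_model(big=2.0)`), the second strategy drops to about 0.4, so which strategy is optimal really does depend on how big the big reward is.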

I hope this was a fun read! I really enjoyed coding this up! Thanks to Richard Sutton for inspiration.

This GitHub repo is open for PRs. A couple of ideas:

  1. Make the game remember the rewards collected so far, so that a neural network can be trained on them.
  2. Make the probabilities random, so that every time the game starts up, not even the programmer knows the optimal strategy.
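
For anyone who wants to pick these up, here is one possible starting point. It is a hypothetical sketch, not code from the repo: the class name `TinyGame`, the reward values (1 and 10), and the 0.5 to 0.95 range for the randomized probabilities are all assumptions.

```python
import random


class TinyGame:
    """Hypothetical two-state game with randomized odds and a reward log."""

    # (state, action) -> ((likely next_state, reward), (other next_state, reward))
    # Reward magnitudes are assumed; the post only says "small" and "big".
    OUTCOMES = {
        ("A", 2): (("B", -1), ("A", +1)),
        ("B", 1): (("A", +10), ("B", -1)),
        ("B", 2): (("A", -1), ("B", -1)),
    }

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = "A"
        # Idea 2: draw the odds afresh on every start-up instead of a fixed 0.8.
        self.p_likely = {key: self.rng.uniform(0.5, 0.95) for key in self.OUTCOMES}
        # Idea 1: remember every move so a learner can be trained on it later.
        self.history = []  # (state, action, reward, next_state)
        self.total_reward = 0

    def step(self, action):
        key = (self.state, action)
        if key == ("A", 1):
            next_state, reward = "A", +1
        else:
            likely, other = self.OUTCOMES[key]
            is_likely = self.rng.random() < self.p_likely[key]
            next_state, reward = likely if is_likely else other
        self.history.append((self.state, action, reward, next_state))
        self.total_reward += reward
        self.state = next_state
        return next_state, reward
```

After a few calls to `game.step(...)`, `game.history` holds one `(state, action, reward, next_state)` tuple per move, which is the kind of data a learner could be trained on.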

Thanks for reading this! Happy coding! 🙂

If you liked this post, please share it on Twitter and follow this blog!


Originally published at http://vipulvaibhaw.com on April 27, 2019.

