LNDQ-bubble_bot

This is my attempt to speed up training time for reinforcement learning. It's the culmination of countless hours of testing and tweaking to get the algorithm to play the game Bubble Bobble. I play it through this site: https://www.retrogames.cz/play_216-NES.php?language=EN. I may come back to this, but for now this is where I'll be stopping.

I'd been working from the basic SAC (soft actor-critic) deep-Q reinforcement learning approach that I, among others, was taught by the YouTuber Sentdex. For a game such as Bubble Bobble, some of those techniques were insufficient for my purposes, so I've changed a few things. For one, the agent doesn't train on random samples of a minibatch of states; instead, it uses the states it has actually collected, up to 128 of them, as the batch for each update.

I use PyTesseract to do OCR (optical character recognition), which lets the model read the score right off the screen; that score is a big part of how I've been training the model. I don't want to plug into the game and take data from the game's internals, because that feels like a step in the wrong direction, but by all means don't let me stop you. I've provided the Ocrtest script I use to make sure my screen is aligned with the algorithm's capture region whenever I start a new session.

Recently a paper and accompanying code came out describing how reinforcement learning can teach a robot dog to walk in 20-30 minutes. That intrigued me, so some of their techniques are in this program as well. The main takeaway from that research seems to be that a large number of updates per batch of experience, together with layer normalization applied to the Q-networks (target model included), should reduce training time. Their GitHub page is ikostrikov/walk_in_the_park if you wish to review their work. Since theirs is a different use case, they manage 25k steps in 20-30 minutes; for a game like Bubble Bobble I've found the agent usually only gets in 13-16 steps per episode every 5 minutes, give or take how powerful your GPU is. That stretches their 20-30 minutes into roughly 2-6 days of training time, depending on a great many variables. For reinforcement learning on a semi-complex problem with one GPU those are actually really impressive results, but they still weren't sufficient for me, so I decided to add direct future prediction to the model.

In 2017 Intel published a paper titled "Learning to Act by Predicting the Future", in which an agent is trained with supervised learning to predict future measurements for goals that can be changed quickly depending on the situation. They entered a competition playing the game Doom against other state-of-the-art RL models and blew them out of the water. I borrowed the technique from that paper and from a project that adapted it, Direct Future Prediction; of course my code has been updated to serve my needs. The original paper only supplied pseudocode, but the DFP project has code at flyyufelix/Direct-Future-Prediction-Keras, so check their work if you find I've made mistakes, because I found the code difficult to adapt and rather fluffy. Once added, it did increase my model's overall performance, but not to the degree that I wanted.

Rough, simplified sketches of the main pieces follow.
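First, the batching change. This is a minimal sketch of the idea only; the names here (SequentialBatchBuffer, drain) are hypothetical and not the repo's actual code. The point is that updates run on the ordered transitions the agent has collected, up to 128 at a time, rather than on a random minibatch drawn from a large replay buffer.

```python
import numpy as np

BATCH_SIZE = 128

class SequentialBatchBuffer:
    """Collects transitions in order and hands them back as one batch."""

    def __init__(self, batch_size=BATCH_SIZE):
        self.batch_size = batch_size
        self.transitions = []

    def add(self, state, action, reward, next_state, done):
        self.transitions.append((state, action, reward, next_state, done))

    def ready(self):
        # Enough states collected for one full update.
        return len(self.transitions) >= self.batch_size

    def drain(self):
        # Take up to batch_size transitions, in the order they were seen,
        # and clear them from the buffer -- no random sampling involved.
        batch = self.transitions[:self.batch_size]
        self.transitions = self.transitions[self.batch_size:]
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones
```

Whenever ready() reports a full batch, the training step consumes the whole thing at once and the buffer starts collecting again.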
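Next, the score reading. The bounding box and Tesseract options below are placeholders for illustration; the repo's Ocrtest script is what actually checks that the capture region lines up with the emulator window, so treat the numbers here as assumptions.

```python
import pytesseract
from PIL import ImageGrab, ImageOps

# Placeholder region for the score readout; the real coordinates are whatever
# the Ocrtest alignment check settles on for your monitor and window position.
SCORE_BOX = (60, 20, 260, 60)  # (left, top, right, bottom) in screen pixels

def read_score():
    region = ImageGrab.grab(bbox=SCORE_BOX)            # capture just the score area
    img = ImageOps.invert(ImageOps.grayscale(region))  # dark-on-light text helps Tesseract
    text = pytesseract.image_to_string(
        img, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    )
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0
```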
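Then the two ideas borrowed from ikostrikov/walk_in_the_park: layer-normalize the critic (and its target copy) and run many gradient updates on each batch. This is a hedged sketch with made-up layer sizes and an illustrative update count, and it uses a plain Q-learning-style loss just to show where the pieces sit; it is not the repo's actual loss.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(state_dim, n_actions, hidden=256):
    # Critic with LayerNormalization after each hidden layer; the target
    # network is built the same way, so its Q-predictions pass through the
    # same normalization.
    inputs = layers.Input(shape=(state_dim,))
    x = layers.Dense(hidden)(inputs)
    x = layers.LayerNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dense(hidden)(x)
    x = layers.LayerNormalization()(x)
    x = layers.Activation("relu")(x)
    q_values = layers.Dense(n_actions)(x)
    return tf.keras.Model(inputs, q_values)

UPDATES_PER_BATCH = 20  # illustrative "large amount of updates" knob

def train_many_updates(model, target_model, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    rewards = tf.cast(rewards, tf.float32)
    dones = tf.cast(dones, tf.float32)
    for _ in range(UPDATES_PER_BATCH):
        # Bootstrapped targets from the layer-normalized target network.
        next_q = tf.reduce_max(target_model(next_states), axis=1)
        targets = rewards + gamma * (1.0 - dones) * next_q
        with tf.GradientTape() as tape:
            q = model(states)
            chosen = tf.gather(q, actions, axis=1, batch_dims=1)
            loss = tf.reduce_mean(tf.square(targets - chosen))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
```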
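Finally, the Direct Future Prediction head. Again, a rough sketch: the measurement set, time offsets, and layer sizes are placeholders, and flyyufelix/Direct-Future-Prediction-Keras remains the reference implementation. The network predicts future measurements (score, lives, and so on) for every action at several time offsets, and the action whose predicted future best matches the current goal vector is the one picked.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

N_ACTIONS = 8
N_MEASUREMENTS = 2              # e.g. score and lives (placeholder)
OFFSETS = [1, 2, 4, 8, 16, 32]  # future time offsets to predict (placeholder)
OUT_DIM = N_ACTIONS * N_MEASUREMENTS * len(OFFSETS)

def build_dfp_model(frame_shape=(84, 84, 4)):
    frame_in = layers.Input(shape=frame_shape)
    meas_in = layers.Input(shape=(N_MEASUREMENTS,))
    goal_in = layers.Input(shape=(N_MEASUREMENTS * len(OFFSETS),))

    # Perception stream for the screen, plus small streams for the current
    # measurements and the goal vector.
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(frame_in)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    m = layers.Dense(128, activation="relu")(meas_in)
    g = layers.Dense(128, activation="relu")(goal_in)

    joint = layers.Concatenate()([x, m, g])
    preds = layers.Dense(OUT_DIM)(joint)
    # One row of predicted future measurements per action.
    preds = layers.Reshape((N_ACTIONS, N_MEASUREMENTS * len(OFFSETS)))(preds)
    return tf.keras.Model([frame_in, meas_in, goal_in], preds)

def choose_action(model, frame, measurements, goal):
    preds = model.predict([frame[None], measurements[None], goal[None]], verbose=0)[0]
    # Score each action by how well its predicted future lines up with the goal.
    return int(np.argmax(preds @ goal))
```

The appeal of this setup is that changing the goal vector re-weights which measurements matter without retraining, which is what makes it useful when the objective shifts mid-game.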
Perhaps you can take what I've done and make it better, but I'm concluding this project a long-term failure but a short-term success. The model likely would have taken weeks, if not months, to train off SAC alone; with these changes it should take merely days, if my estimates are correct. I had hoped to get it down to around an hour, but I have other pressing projects to work on, and so I'm closing it here.
