Why RL?
First draft: 2022-12-16
Reinforcement learning (RL) is a fascinating subject to study because it represents a significant step forward in our ability to create intelligent machines.
One of the primary reasons that reinforcement learning is so fascinating is that it allows machines to learn through trial and error, much like a human would. Instead of being explicitly programmed with a set of rules or instructions, a reinforcement learning agent is given a goal and a set of possible actions, and it learns to achieve that goal through interaction with an environment. Through learning the agent can adapt and improve its performance over time, as it gathers more experiences and learns how to chose good actions to achieve tasks.
The section above was written by a Transformer (GPT3) tuned with Reinforcement Learning (GPT -> ChatGPT), demonstrating the ability of RL for text generation. [6]
(Prompt: “Write a section about why reinforcement learning is fascinating”)
Another reason that reinforcement learning is so cool is that it has the potential to revolutionize a wide range of fields, from healthcare and robotics to autonomous vehicles, games, natural language processing, finance, education and tutoring, and many more. By enabling machines to learn and adapt to new situations and environments, reinforcement learning could lead to significant advancements in a variety of industries, improving efficiency and productivity and potentially even producing creativity.
As hinted above, the potential of RL extends to creative solutions that can emerge for example through a combined approach of self-play with reinforcement learning. When an agent is given the freedom to explore and experiment with different actions and strategies in order to achieve a goal, typically in a competetive setting where it is playing against itself, the learning can lead to innovative, original and clever solutions, that even the builders of the system didn’t anticipate.
Last but not least, i also believe that we understand a great deal of stuff about our brains in the process of building AI systems. I’ll leave you with this quote from Richard Feynman, which sums it up nicely.
What I cannot create, I do not understand.
A few examples of creative solutions
that emerged through RL:
OpenAI hide and seek (agents learned to abuse the physics engine)
Videos are playing from the timestamps that show the strategies to exploit physics bugs.
Breakout tunnel strategy
Below, I have linked to the specific parts of the following videos that correspond to the text. However, if you find the video interesting, I highly recommend watching the entire thing.
Agent discovering a creative strategy (tunnel digging) to beat breakout quickly.
(Wait for it.)
The creative move 37 from AlphaGo, refuting thousands of years of Go theory
OpenAI gym ant walking upside down
Left: normal walking pattern [2]
Right: something similar to the discovered
head-walking gait (read text below) [3]
OpenAI gym coupled with the MuJoCo simulator offers realistic physics simulation, so you can try to train agents to walk or do all kinds of stunts. Usually when learning a walking gait (pattern), you want to give some additional reward when the agent is walking smoothly (because otherwise it often jitters and looks really weird). Here someone wanted to make the ant walk properly instead of jittering on the ground in order to move forward, and theirfore gave it negative reward for touching the ground. But instead of walking or jumping properly, it discovered to walk on its head and moved forward on the knees, so that its feet didn’t touch the ground. Unfortunately I couldn’t find the video for this, but the clip on the right should give you the gist of how that would look like.
Similarly to the ant example, here we want to learn to run
Just watch for yourself in how many ways that can go wrong :D
(My favorite is top right at 1:20)
Apart of creating insight, efficiency and creative solutions, the study of reinforcement learning also allows us to gain a deeper understanding of how intelligence works, both in humans and in machines. I find the thinking of Rich Sutton [4] (with timestamp; <- this is probably the most important link in this post) appealing, who says we should keep an open mind about intelligence and not predetermine that human intelligence is an optimal way of going about it. Because it’s not. Machines can already do a lot of intelligent things better than we can. I think there is still a big Delta of wiggle room before we create a system that resembles optimal intelligence, Artificial General Intelligence (AGI), or anything close to that.
By examining the learning systems, we can gain insights into the underlying mechanisms of learning and decision-making, and use this knowledge to:
- design more intelligent systems in the future
- understand ourselves better
Overall, the study of reinforcement learning is a fascinating and awe-inspiring field that has the potential to shape the future of artificial intelligence and the way we interact with machines.
And its results are sometimes just plain beautiful.
So i’m in.
Todo
Ressources:
- david silver lectures for an introduction to rl
-
sutton & barto (free online)
-
add GIFS for RL agents playing
- building, analyzing and understanding intelligence
-
super fascinating because it resembles how humans learn -> machines will be able to do it more efficiently/better
- AGI
-> help solve problems -> come up with new theorems and proofs -> help propose solutions to hard problems, e.g. mitigate/reduce climate change
- learning! (no.1. fact-based is limited by the knowledge put inside the system and the designers)
- attention
- curiosity
Accomplishments
-
add images for accomplishments
- ChatGPT (PPO + RLHF)
- AlphaGo (Deepmind-Go)
- AlphaZero (Deepmind-Any 2 player game)
- Google Datacenters (Deepmind)
- learning to drive in a day
- AlphaStar
- walking robodogs
References
- OpenAI: Emergent Tool Use from Multi-Agent Interaction
- Berkeley - normal ant walking gait
- Stuart Robinson with Isaac Sim - ant head spinning
- Rich Sutton - The Alberta Plan for AI Research: Tea Time Talk with Richard S. Sutton
- Thumbnail taken from this Google blog
- ChatGPT - GPT with RL from human feedback