Two teams of UC Berkeley researchers have used reinforcement learning, or RL, algorithms to teach quadruped robots, also known as robot dogs, to walk on their own in the real world in record time.
RL enables a robot to complete a task such as walking by rewarding it every time it discovers a desired behavior through trial-and-error interactions with its environment, according to Philipp Wu, a co-lead author on the team led by electrical engineering and computer sciences, or EECS, professor Pieter Abbeel.
“You can think of this as how you would train a dog,” Wu said. “Let’s say you were trying to teach a dog how to do tricks, you give it a treat every time it does what you want. And so the way that we try to train these robots is through a similar means.”
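As a rough sketch of that trial-and-error idea (illustrative only, and not the researchers' code), a learner can try actions at random, receive a reward when an action happens to move it forward, and gradually favor the actions that have paid off:

```python
import random

# Illustrative toy only: the "robot" tries one of three made-up leg motions
# each step and gets a reward (the "treat") only when that motion happens
# to move it forward.
MOTIONS = {"flail": 0.0, "crawl": 0.3, "walk": 1.0}  # hidden forward progress per motion

def try_motion(motion):
    # The reward is simply how far forward the robot moved this step.
    return MOTIONS[motion]

# Estimated value of each motion, refined purely from experienced rewards.
values = {motion: 0.0 for motion in MOTIONS}

for _ in range(500):
    # Explore sometimes; otherwise repeat the motion that has paid off best.
    if random.random() < 0.1:
        motion = random.choice(list(MOTIONS))
    else:
        motion = max(values, key=values.get)
    reward = try_motion(motion)
    values[motion] += 0.1 * (reward - values[motion])  # reinforce rewarded behavior

print(values)  # "walk" ends up with the highest estimated value
```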
Reinforcement learning for robots has previously been limited to controlled lab settings, which fail to capture the complexity of the real world; it can take robots up to several weeks to learn commands each time researchers alter the environment, according to Danijar Hafner, a co-lead author on Abbeel’s team.
Abbeel’s team taught a quadruped robot to roll over, stand up and walk on its own within one hour of training time, Hafner noted. The robot also learned to roll over and get back on its feet when pushed down.
This was accomplished through an artificial intelligence algorithm named Dreamer, in which a robot learns a model of its environment from past experience and uses that model to discover behaviors that earn high reward, Hafner added.
“The learned environment model allows the robot to imagine future outcomes of its actions, without having to try out all actions in the real world,” Hafner said in an email.
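The sketch below illustrates that general idea of planning inside a learned model; the one-dimensional "world," the crude fitted model and the reward here are toy assumptions for illustration, not Dreamer itself:

```python
import random

# Toy illustration of "imagining" outcomes in a learned model instead of
# trying every action on the real robot.

def real_world(pos, push):
    return pos + 0.8 * push  # the true dynamics, unknown to the agent

# 1) Collect a little real experience.
experience = []
pos = 0.0
for _ in range(50):
    push = random.uniform(-1, 1)
    new_pos = real_world(pos, push)
    experience.append((pos, push, new_pos))
    pos = new_pos

# 2) Fit a crude one-step model: estimate how far a push moves the robot.
gain = sum((n - p) * a for p, a, n in experience) / sum(a * a for _, a, _ in experience)

def imagined_step(pos, push):
    return pos + gain * push  # the learned model's prediction

# 3) Plan in imagination: score candidate pushes by imagined forward progress.
def imagined_reward(pos, push, horizon=5):
    total = 0.0
    for _ in range(horizon):
        new_pos = imagined_step(pos, push)
        total += new_pos - pos  # reward imagined forward movement
        pos = new_pos
    return total

candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
best = max(candidates, key=lambda push: imagined_reward(0.0, push))
print(best)  # the model predicts the strongest forward push works best
```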
The robot was rewarded when it was upright and moving forward, which substantially reduced the training time of the robots compared to previous methods, according to Wu.
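In code, a reward of this kind might look roughly like the following; the exact formula and variable names are assumptions for illustration, not the team's published reward:

```python
# Hedged sketch: pay the robot for being upright and for forward speed.
def walking_reward(body_upright: float, forward_velocity: float) -> float:
    # body_upright: hypothetical value in [0, 1], where 1 means fully upright
    # forward_velocity: hypothetical forward speed in meters per second
    upright_bonus = 1.0 if body_upright > 0.9 else 0.0
    return upright_bonus + max(0.0, forward_velocity)

print(walking_reward(body_upright=0.95, forward_velocity=0.4))  # 1.4
```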
Abbeel’s team also tested the algorithm on a wheeled robot and two robot arms.
The researchers hope to test the robot with multitask objectives and different rewards to complete more complex tasks such as walking forward and doing a backflip, according to Wu and Hafner.
Another team of researchers led by associate professor of EECS Sergey Levine took a different approach called off-policy model-free RL, according to Laura Smith, a co-lead author on Levine’s team.
With this approach, the robot starts completely untrained, with no prior experience, and relies solely on trial and error in the field to learn to walk, according to Smith. The team tested the quadruped robot on four different terrains, including a fire trail, and it learned to walk on each in under 20 minutes.
Levine’s team also used a reward function, allowing the robot to explore by moving its legs and learning which movements were rewarded.
“Just by getting this feedback from the environment as (the robot) wiggles around and explores in its environment, it learns how to optimize for that reward and as a result it learns some walking behavior,” Smith said.
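The sketch below shows, in generic form, what "off-policy" learning from that kind of feedback means: stored experience is replayed many times rather than used once. The leg commands, rewards and update rule here are made up for illustration and are not the team's code:

```python
import random
from collections import deque

# Generic off-policy, model-free sketch: every wiggle and its reward goes
# into a replay buffer, and the agent keeps re-learning from that stored
# experience rather than from the latest trial alone.
ACTIONS = ["small_step", "big_step", "stand_still"]        # made-up leg commands
TRUE_REWARD = {"small_step": 0.4, "big_step": 1.0, "stand_still": 0.0}

replay_buffer = deque(maxlen=1000)      # stored (action, reward) experience
q_values = {a: 0.0 for a in ACTIONS}    # learned value of each action

for step in range(300):
    # Act in the "real world": explore at random, record what happened.
    action = random.choice(ACTIONS) if random.random() < 0.3 else max(q_values, key=q_values.get)
    reward = TRUE_REWARD[action] + random.gauss(0, 0.05)   # noisy feedback from the environment
    replay_buffer.append((action, reward))

    # Off-policy update: learn from randomly sampled *past* experience,
    # not only from the action the current policy just took.
    batch = random.sample(list(replay_buffer), k=min(8, len(replay_buffer)))
    for past_action, past_reward in batch:
        q_values[past_action] += 0.05 * (past_reward - q_values[past_action])

print(q_values)  # "big_step", the most rewarding wiggle, ends up ranked highest
```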
The team demonstrated that model-free RL, despite its reputation for needing long training times to learn anything from its environment, can get a robot to walk in the real world in a short amount of time.
Levine’s team would like to expand the research by moving to a more “realistic” setting, in which a robot does not erase its memory each time it moves to a new environment, Smith said.
Wu added that combining ideas from both teams’ methods may result in even better performance in training a robot to walk on its own.