You are free to share this article under the Attribution 4.0 International license.
Using a training technique commonly used to teach dogs to sit and stand, researchers showed a robot how to teach itself several new tricks, including stacking blocks.
With the method, the robot, called Spot, was able to learn in days what usually takes a month.
By using positive reinforcement, an approach familiar to anyone who has changed a dog’s behavior with treats, the team dramatically improved the robot’s skills, and did so quickly enough to make training robots for real-world work more practical.
“The question here was how do we get the robot to learn a skill?” says lead author Andrew Hundt, a PhD student who works at the Computational Interaction and Robotics Laboratory at Johns Hopkins University. “I’ve had dogs so I know that rewards work, and that was the inspiration behind the development of the learning algorithm.”
The research appears in IEEE Robotics and Automation Letters.
Teaching a robot to learn
Unlike humans and animals, which are born with highly intuitive brains, computers are blank slates and have to learn everything from scratch. But real learning is often achieved through trial and error, and roboticists are still figuring out how robots can learn efficiently from their mistakes.
The team achieved this by developing a reward system that works for a robot the way it works for a dog. Where a dog gets a biscuit for a good job, the robot earns numerical points.
Hundt remembered once teaching his terrier mix puppy, Leah, the command “Leave it” so she could ignore squirrels on walks. He used two types of treats, regular trainer treats and something even better, like cheese.
When Leah got excited and sniffed at the treats, she got nothing. But when she calmed down and looked away, she got the good stuff. “So I gave her the cheese and said: ‘Leave it! Good Leah!’”
To stack blocks, Spot the robot had to learn to focus on constructive actions. As the robot explored the blocks, it quickly learned that stacking correctly earned high points, while wrong moves earned nothing. Reach toward a block but fail to grasp it? No points. Knock over a stack? Definitely no points. Spot earned the most points by placing the last block on top of a stack of four blocks.
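As a rough sketch, the scoring scheme described above might look like the following. The function name, point values, and inputs here are illustrative assumptions for this article, not the actual reward used in the study:

```python
def stacking_reward(height_before, height_after, goal_height=4):
    """Sparse-reward sketch: points only for constructive actions.

    Hypothetical scheme mirroring the article's description; the
    paper's real reward function may differ.
    """
    if height_after > height_before:
        if height_after == goal_height:
            return 10.0  # biggest payout: the final block tops off the stack
        return 1.0       # smaller payout for each correct placement
    # Reaching without grasping, or knocking the stack over: no points.
    return 0.0
```

The key design choice is that only progress is rewarded: a failed grab or a toppled stack simply scores zero, so the robot learns to repeat whatever raised the stack.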
Not only did the training tactic work, it took only days to teach the robot what used to take weeks. The team was able to reduce practice time by first training a simulated robot, much like a video game, and then running tests with Spot.
“The robot wants the higher number of points,” says Hundt. “It quickly learns the right behavior to get the best reward. Typically, it takes a month of practice for the robot to achieve 100% accuracy. We did it in two days.”
Positive reinforcement not only helped the robot teach itself to stack blocks, but with the scoring system the robot learned various other tasks just as quickly – even a simulated navigation game. The ability to learn from mistakes in all situations is critical to developing a robot that can adapt to new environments.
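The trial-and-error, points-driven learning described above can be illustrated with a toy navigation example. This is textbook tabular Q-learning on a tiny corridor, purely for illustration; the study itself used a deep-learning-based method, and every name and number below is an assumption of this sketch:

```python
import random

def train_corridor(episodes=500, size=4, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor: reach the rightmost cell.

    Illustrative only. The agent earns a point solely for reaching the
    goal, so, like Spot, it must discover rewarding behavior by trial
    and error.
    """
    rng = random.Random(seed)
    actions = (-1, +1)  # step left or right
    q = {(s, a): 0.0 for s in range(size) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != size - 1:
            # Epsilon-greedy: mostly exploit the best-known action,
            # sometimes explore a random one.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), size - 1)
            r = 1.0 if s2 == size - 1 else 0.0  # point only at the goal
            q[(s, a)] += alpha * (
                r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)]
            )
            s = s2
    return q
```

After training, the learned values favor stepping toward the goal from every cell: the numerical score has shaped the behavior, with no explicit programming of the route.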
“In the beginning the robot has no idea what it is doing, but it gets better and better with each attempt. It never gives up, trying again and again to stack, and is able to complete the task 100% of the time,” says Hundt.
The team envisions that these results could help teach household robots to do laundry and wash dishes – tasks that could be popular in the open market and help seniors live independently. It could also help develop improved self-driving cars.
“Our goal is ultimately to develop robots that can perform complex tasks in the real world – such as product assembly, elderly care and surgery,” says co-author Gregory D. Hager, professor of computer science.
“We don’t currently know how to program such tasks – the world is too complex. But such work shows us that the idea that robots can learn to do such real-world tasks in a safe and efficient way is very promising,” says Hager.
Source: Johns Hopkins University
DOI of the original study: 10.1109/LRA.2020.3015448