Learning to Catch Reactive Objects with a Behavior Predictor

Kai Lu¹, Jia-Xing Zhong¹, Bo Yang², Bing Wang², Andrew Markham¹

¹ K. Lu, J-X. Zhong, and A. Markham are with the Department of Computer Science, University of Oxford, Oxford, UK.
² B. Yang and B. Wang are with vLAR Group, Department of Computing, Hong Kong Polytechnic University, HKSAR.


Overview Video

Abstract & Method

Tracking and catching reactive objects is an important ability for robots in a dynamic world, where targets alter their behavior in response to the manipulator's motion. Reactive applications range from gently capturing living animals to smoothly assisting a person. In this work, we combine an explicit, yet learned, target-state predictor with reinforcement learning (RL). We further show how a tightly coupled predictor that ‘observes’ the state of the robot leads to significantly improved anticipatory action, especially with targets that follow a simple policy to evade the robot.

Our approach: prediction-based RL for robotic catching (3,4), with coupled learning of the target predictor (4). Advantages: enhanced RL efficiency and performance (4), and prediction of both non-reactive and reactive behaviors (4).
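To make "prediction-based RL" concrete, here is a minimal sketch, assuming the policy's observation is the robot position, the current object position, and H predicted future object positions produced by a coupled predictor that takes the robot state as input. The names `build_observation`, `rollout_predictions`, and `flee`, the 2-D setting, and holding the robot fixed over the prediction horizon are all hypothetical simplifications, not the paper's architecture.

```python
from typing import Callable, List, Tuple

Vec2 = Tuple[float, float]
# A coupled predictor maps (object position, robot position) -> next object position.
Predictor = Callable[[Vec2, Vec2], Vec2]


def rollout_predictions(predictor: Predictor, obj: Vec2, robot: Vec2, horizon: int) -> List[Vec2]:
    """Roll the predictor forward `horizon` steps, holding the robot fixed (simplifying assumption)."""
    preds = []
    for _ in range(horizon):
        obj = predictor(obj, robot)
        preds.append(obj)
    return preds


def build_observation(robot: Vec2, obj: Vec2, predictor: Predictor, horizon: int = 3) -> List[float]:
    """Concatenate robot state, current object state, and predicted future object states."""
    obs = [*robot, *obj]
    for p in rollout_predictions(predictor, obj, robot, horizon):
        obs.extend(p)
    return obs


def flee(obj: Vec2, robot: Vec2, gain: float = 0.1) -> Vec2:
    """Hypothetical stand-in for a learned predictor: the object moves away from the robot."""
    return (obj[0] + gain * (obj[0] - robot[0]), obj[1] + gain * (obj[1] - robot[1]))


obs = build_observation(robot=(0.0, 0.0), obj=(1.0, 0.0), predictor=flee, horizon=2)
print(obs)  # 8 values: robot (2) + object (2) + two predicted object positions (2 each)
```

Because `flee` takes the robot position as an input, the predicted trajectory changes whenever the robot moves; a predictor that sees only the object's history could not capture that dependence.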

Training Process - Predictor

In the predictor learning phase, the robot interacts with the object (left) and observes how it moves in response to the robot's actions. The video on the right shows an evaluation of the resulting predictions.
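A toy version of this phase can be sketched in plain Python. The sketch assumes a hypothetical 1-D evasion rule (`reactive_step`), collects (object, robot, displacement) samples, and fits two linear least-squares predictors: a "vanilla" one that sees only the object state, and a "coupled" one that also sees the robot state. The linear model, the sampling scheme, and all function names are illustrative assumptions; the page does not specify the predictor's architecture.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (adequate for a handful of features)."""
    k = len(features[0])
    ata = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    aty = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(k)]
    return solve(ata, aty)


def mse(w, features, targets):
    """Mean squared prediction error of a linear model with weights w."""
    errs = [(sum(wi * fi for wi, fi in zip(w, f)) - y) ** 2 for f, y in zip(features, targets)]
    return sum(errs) / len(errs)


def reactive_step(obj, robot, gain=0.1):
    """Hypothetical 1-D evasion rule: the object moves away from the robot."""
    return obj + gain * (obj - robot)


# "Interaction" data: random object/robot placements and the object's observed reaction.
random.seed(0)
samples = []
for _ in range(500):
    obj, robot = random.uniform(-1, 1), random.uniform(-1, 1)
    samples.append((obj, robot, reactive_step(obj, robot) - obj))

targets = [d for _, _, d in samples]
vanilla_x = [[o, 1.0] for o, _, _ in samples]    # object state only
coupled_x = [[o, r, 1.0] for o, r, _ in samples]  # object state + robot state

w_vanilla = fit_linear(vanilla_x, targets)
w_coupled = fit_linear(coupled_x, targets)
vanilla_err = mse(w_vanilla, vanilla_x, targets)
coupled_err = mse(w_coupled, coupled_x, targets)
print(f"vanilla MSE={vanilla_err:.2e}, coupled MSE={coupled_err:.2e}")
```

On this toy data the coupled predictor recovers the rule exactly, while the vanilla predictor keeps a residual error it cannot explain because the robot's position is missing from its input. This mirrors, in miniature, the Vanilla Predictor + RL failure mode described in the comparison.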

Training Process - Prediction-Based RL

During RL training, the robotic agent improves steadily as it learns to track and catch the object (left: recorded at episode 250; right: episode 500).

We also apply massively parallel training to our method in Isaac Gym¹ (left: predictor learning; right: RL training).
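Isaac Gym steps many environments at once as batched GPU tensors. The sketch below only mimics that batching pattern in plain Python so the idea is visible without a GPU: one `step` call advances every environment, and finished environments are reset in place so every slot keeps producing experience. It also shows how a success rate can be tallied over parallel episodes. `BatchedCatchEnv`, its 1-D arena, the flee rule, and the greedy pursuit policy are all hypothetical; this is not the Isaac Gym API or the paper's environment.

```python
import random


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


class BatchedCatchEnv:
    """Hypothetical toy: N independent 1-D catching episodes stepped as one batch."""

    def __init__(self, num_envs: int, catch_radius: float = 0.05,
                 robot_speed: float = 0.1, object_speed: float = 0.05,
                 max_steps: int = 50, seed: int = 0):
        self.rng = random.Random(seed)
        self.num_envs = num_envs
        self.catch_radius = catch_radius
        self.robot_speed = robot_speed
        self.object_speed = object_speed
        self.max_steps = max_steps
        self.robot = [0.0] * num_envs
        self.obj = [0.0] * num_envs
        self.t = [0] * num_envs
        for i in range(num_envs):
            self._reset(i)

    def _reset(self, i: int) -> None:
        self.robot[i] = 0.0
        self.obj[i] = self.rng.uniform(-1.0, 1.0)
        self.t[i] = 0

    def step(self, actions):
        """Advance every env one step; envs that finish are reset in place (auto-reset)."""
        caught = [False] * self.num_envs
        done = [False] * self.num_envs
        for i, a in enumerate(actions):
            # Robot moves first, with its velocity clipped to robot_speed.
            self.robot[i] += max(-self.robot_speed, min(self.robot_speed, a))
            self.t[i] += 1
            caught[i] = abs(self.obj[i] - self.robot[i]) < self.catch_radius
            if not caught[i]:
                # Reactive object: flee away from the robot, staying inside [-1, 1].
                flee = self.object_speed * sign(self.obj[i] - self.robot[i])
                self.obj[i] = max(-1.0, min(1.0, self.obj[i] + flee))
            done[i] = caught[i] or self.t[i] >= self.max_steps
            if done[i]:
                self._reset(i)
        return caught, done


env = BatchedCatchEnv(num_envs=8)
episodes = successes = 0
for _ in range(200):
    # Greedy pursuit baseline: move straight toward the object.
    actions = [o - r for o, r in zip(env.obj, env.robot)]
    caught, done = env.step(actions)
    episodes += sum(done)
    successes += sum(caught)
print(f"success rate: {successes / episodes:.2f} over {episodes} episodes")
```

The greedy baseline succeeds every time here only because the toy robot is twice as fast as the object and the walls let it corner the object; the example checks the batching and bookkeeping logic, not the catching problem studied on this page.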

Example Results

Experiments show that our approach can effectively learn a catching policy for reactive objects at different speeds, achieving an overall success rate of 80%. The top two videos show low-speed objects (40–50% of the robot's maximum speed), and the bottom two show high-speed objects (80% of the robot's maximum speed).

Comparison: while our method (above) succeeds at the catching task, the Monolithic RL method (below left), which has no explicit prediction, often lets objects escape the catching range. The Vanilla Predictor + RL method (below right), whose predictor does not account for the robot's own actions or the object's non-linear reactive behaviors, frequently fails to catch the object because of incorrect predictions.

Versatility Across Behaviors

We also show that our method generalizes across various behaviors: following a fixed circular path, bouncing off walls, reactive behavior, and reactive behavior with small added random noise (left to right, top to bottom).
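The four behavior types can be sketched as simple object motion models. This is a minimal sketch, assuming a 2-D square arena, a constant flee speed, and zero-mean Gaussian noise for the noisy reactive case; the function names, parameters, and default values are hypothetical and not taken from the paper.

```python
import math
import random
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


def circle_path(t: float, center: Vec2 = (0.0, 0.0), radius: float = 0.5, omega: float = 0.5) -> Vec2:
    """Non-reactive: position on a fixed circle at time t (angular speed omega, rad/s)."""
    return (center[0] + radius * math.cos(omega * t), center[1] + radius * math.sin(omega * t))


def bounce_step(pos: Vec2, vel: Vec2, half_width: float = 1.0, dt: float = 0.1) -> Tuple[Vec2, Vec2]:
    """Non-reactive: straight-line motion that reflects off the walls of a square arena."""
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt
    vx, vy = vel
    if abs(x) > half_width:
        x = math.copysign(2 * half_width, x) - x  # mirror the overshoot back inside
        vx = -vx
    if abs(y) > half_width:
        y = math.copysign(2 * half_width, y) - y
        vy = -vy
    return (x, y), (vx, vy)


def reactive_step(pos: Vec2, robot: Vec2, speed: float = 0.05,
                  noise_std: float = 0.0, rng: Optional[random.Random] = None) -> Vec2:
    """Reactive: move `speed` directly away from the robot, plus optional Gaussian noise."""
    dx, dy = pos[0] - robot[0], pos[1] - robot[1]
    dist = math.hypot(dx, dy) or 1.0  # avoid division by zero when positions coincide
    sx, sy = speed * dx / dist, speed * dy / dist
    if noise_std > 0.0:
        rng = rng or random.Random()
        sx += rng.gauss(0.0, noise_std)
        sy += rng.gauss(0.0, noise_std)
    return (pos[0] + sx, pos[1] + sy)


print(circle_path(0.0))                              # (0.5, 0.0)
print(bounce_step((0.95, 0.0), (1.0, 0.0)))          # x reflects back to about 0.95, vx flips sign
print(reactive_step((1.0, 0.0), (0.0, 0.0)))         # moves to x of about 1.05, away from the robot
print(reactive_step((1.0, 0.0), (0.0, 0.0), noise_std=0.01, rng=random.Random(0)))  # noisy variant
```

Only `reactive_step` depends on the robot's position, which is why the first two behaviors can in principle be predicted from the object's history alone while the reactive ones cannot.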

Adaptability in Diverse Environments

We further show the adaptability of our method in diverse environments when it is combined with a collision-avoidance module. With this module, the robot can catch moving objects in these challenging areas.
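The page does not describe how the collision-avoidance module works. As one illustrative possibility, the sketch below uses a classic artificial-potential-field (Khatib-style) repulsion as a safety filter on the policy's velocity command: near an obstacle it adds a velocity pointing away from it, then rescales the result to a speed limit. The circular-obstacle model, `filter_action`, `repulsive_velocity`, and every parameter value are hypothetical assumptions, not the module used in the paper.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Obstacle = Tuple[Vec2, float]  # (center, radius) of a circular obstacle


def repulsive_velocity(pos: Vec2, obstacles: List[Obstacle],
                       influence: float = 0.3, gain: float = 0.05) -> Vec2:
    """Artificial-potential-field repulsion, active within `influence` of an obstacle surface."""
    vx = vy = 0.0
    for (cx, cy), radius in obstacles:
        dx, dy = pos[0] - cx, pos[1] - cy
        dist = math.hypot(dx, dy)
        clearance = dist - radius
        # Positions inside an obstacle (clearance <= 0) are ignored in this sketch.
        if 0.0 < clearance < influence:
            # Negative gradient of U = 0.5 * gain * (1/clearance - 1/influence)^2
            mag = gain * (1.0 / clearance - 1.0 / influence) / clearance ** 2
            vx += mag * dx / dist
            vy += mag * dy / dist
    return vx, vy


def filter_action(pos: Vec2, policy_vel: Vec2, obstacles: List[Obstacle],
                  max_speed: float = 0.1) -> Vec2:
    """Add repulsion to the policy's velocity command, then rescale to the speed limit."""
    rx, ry = repulsive_velocity(pos, obstacles)
    vx, vy = policy_vel[0] + rx, policy_vel[1] + ry
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return vx, vy


obstacles = [((1.0, 0.0), 0.1)]
print(filter_action((0.0, 0.0), (0.05, 0.0), obstacles))   # far away: command unchanged
print(filter_action((0.75, 0.0), (0.05, 0.0), obstacles))  # close: pushed back, speed capped
```

Keeping the filter separate from the learned policy means the catching policy itself does not need retraining for each new obstacle layout; whether the paper's module is structured this way is not stated on this page.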