Decoupling Skill Learning from Robotic Control for Generalizable Object Manipulation


Kai Lu1, Bo Yang2, Bing Wang1, Andrew Markham1
1 K. Lu, B. Wang, and A. Markham are with the Department of Computer Science, University of Oxford, Oxford, UK. {kai.lu, bing.wang, andrew.markham}@cs.ox.ac.uk
2 B. Yang is with vLAR Group, Department of Computing, Hong Kong Polytechnic University, HKSAR. bo.yang@polyu.edu.hk

paper link


Summary Video (ICRA 2023)


Abstract

Robotic reinforcement learning (RL) and imitation learning (IL) have recently shown potential for tackling a range of tasks, e.g., opening a drawer or a cupboard, but they generalize poorly to unseen objects. In this paper, we separate learning 'what to do' from 'how to do it', i.e., whole-body control (WBC). We pose the RL problem as one of determining the skill dynamics for a disembodied virtual manipulator interacting with articulated objects (left panel). A QP-based WBC, optimized with singularity and kinematic constraints, executes the high-dimensional joint motion to reach the goals in the workspace (right panel). Experiments on manipulating complex articulated objects show that our approach generalizes better to unseen objects with large intra-class variations, generates more compliant robotic motion, and outperforms pure RL and IL baselines in task success rate (bottom panel).


Pipeline

As shown in the yellow block, we use two simple PointNet ensembles to separately perceive static states (e.g., the size of an object) and dynamic states (e.g., the current position of a handle). These are inputs to a SAC RL framework that learns to control the disembodied end-effector, realizing a 6-DoF motion skill. Using the robot's physical model, a QP then optimizes the joint dynamics of the whole-body robot.
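The perception-to-action interface above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: a PointNet-style encoder (shared per-point MLP followed by max-pooling, which makes the feature permutation-invariant) maps each point cloud to a fixed-size vector, and a small policy head maps the concatenated static/dynamic features to a bounded 6-DoF end-effector action. All layer sizes, weights, and function names here are assumptions for illustration; in the actual system the policy is trained with SAC.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_encode(points, w1, w2):
    """points: (N, 3) -> global feature (64,).

    Shared per-point MLP (ReLU) followed by max-pooling; the pooling is a
    symmetric function, so the output is invariant to point ordering.
    """
    h = np.maximum(points @ w1, 0.0)   # per-point features
    h = np.maximum(h @ w2, 0.0)
    return h.max(axis=0)               # order-invariant global feature

def policy(static_feat, dynamic_feat, w_pi):
    """Concatenate both state encodings and emit a bounded 6-DoF EE action
    (3 translation + 3 rotation components), squashed to [-1, 1] by tanh
    as is conventional for SAC policies."""
    s = np.concatenate([static_feat, dynamic_feat])
    return np.tanh(s @ w_pi)

# Illustrative (untrained) weights; shapes are assumptions.
w1 = rng.standard_normal((3, 32))
w2 = rng.standard_normal((32, 64))
w_pi = rng.standard_normal((128, 6))

cloud = rng.standard_normal((512, 3))              # e.g., a cabinet point cloud
f_static = pointnet_encode(cloud, w1, w2)          # static state (geometry)
f_dynamic = pointnet_encode(cloud + 0.01, w1, w2)  # dynamic state (current pose)
action = policy(f_static, f_dynamic, w_pi)         # 6-DoF EE motion command
print(action.shape)  # (6,)
```

The max-pool is what lets the same encoder handle cabinets with different numbers of observed points, which matters for generalizing across object instances.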


Training process

During training (left), the disembodied manipulator interacts with various cabinets. Test on unseen cabinets (right): by interacting with more cabinets, the RL model acquires a better understanding of the skill dynamics, resulting in smoother and more reasonable motions.


Whole-body control

At every time step, the ego-centric point cloud observation is obtained from the three RGB-D cameras mounted on the robot.

During the manipulation process, we optimize the joint-space actions of the robot so that its end-effector (EE) tracks the disembodied manipulator's trajectory. We use a QP-based WBC with robotic singularity and kinematic constraints to solve for the high-dimensional joint actions.
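The core of this tracking objective can be sketched as a damped least-squares solve. This is a simplified stand-in, not the paper's solver: the actual WBC is a QP with hard singularity and kinematic constraints, whereas the closed form below keeps the sketch dependency-free by regularizing the objective and clipping to joint-velocity limits afterwards. The function name `wbc_step` and all numbers are illustrative assumptions.

```python
import numpy as np

def wbc_step(J, v_ee_des, qdot_max, damping=1e-2):
    """One tracking step: find joint velocities qdot minimizing
        ||J @ qdot - v_ee_des||^2 + damping * ||qdot||^2
    where J is the (6 x n) end-effector Jacobian and v_ee_des the desired
    6-DoF EE twist. The damping term keeps the solve well-conditioned near
    kinematic singularities, where J loses rank. A full QP would enforce
    the velocity limits as hard constraints instead of clipping.
    """
    n = J.shape[1]
    # Closed-form solution of the regularized least-squares problem.
    qdot = np.linalg.solve(J.T @ J + damping * np.eye(n), J.T @ v_ee_des)
    return np.clip(qdot, -qdot_max, qdot_max)

# Illustrative usage: a random 6 x 9 Jacobian stands in for a mobile
# manipulator with 9 actuated DoF; v_des asks for a small forward+yaw twist.
rng = np.random.default_rng(1)
J = rng.standard_normal((6, 9))
v_des = np.array([0.05, 0.0, 0.0, 0.0, 0.0, 0.1])
qdot = wbc_step(J, v_des, qdot_max=5.0, damping=1e-8)
```

With negligible damping and a well-conditioned Jacobian, `J @ qdot` recovers the desired twist almost exactly; as the robot approaches a singularity, the damping trades tracking accuracy for bounded joint velocities, which is the behavior the singularity constraint is there to guarantee.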


Example Results

Generalizability to different unseen objects

Experiments show that our approach learns generalizable skills across different cabinets from the training sets and unseen test sets. We achieve an average success rate of 74% on training cabinets and 51% on test cabinets in the drawer-opening task, significantly outperforming existing techniques (e.g., the baseline methods in ManiSkill-Learn1 obtain a best performance of 37% on training cabinets and 12% on test cabinets).

Motion Compliance

We also compare the robotic motions produced by our method against pure RL and IL (left: BC in 0:00~0:12, BCQ in 0:13~0:25), showing that robot singularities are avoided in most cases by our method (right: ours). Note that BC and BCQ exhibit far higher joint velocities, while our approach generates more compliant, smoother, and more controllable robot motions.


Contact

If you have any questions, please feel free to contact Kai Lu


March, 2023
Copyright © Kai Lu