Bi-Adapt: Few-shot Bimanual Adaptation for Novel Categories of 3D Objects via Semantic Correspondence

¹National University of Singapore    ²Shanghai Qi Zhi Institute
³Peking University    ⁴The University of Hong Kong
*Equal Contribution    Corresponding Author

Video Presentation

Teaser


Abstract

Bimanual manipulation is essential yet challenging for robots executing complex tasks, as it requires coordinated collaboration between two arms. However, existing methods for bimanual manipulation often rely on costly data collection and training, and struggle to generalize efficiently to unseen objects from novel categories. In this paper, we present Bi-Adapt, a novel framework designed for efficient generalization of bimanual manipulation via semantic correspondence. Bi-Adapt achieves cross-category affordance mapping by leveraging the strong capabilities of vision foundation models. After fine-tuning with limited data on novel categories, Bi-Adapt exhibits notable zero-shot generalization to out-of-category objects. Extensive experiments conducted in both simulation and real-world environments validate the effectiveness and efficiency of our approach, which achieves high success rates on benchmark tasks across novel categories with limited data.

Pipeline


(Left) We first train the Action Learning Network on the supporting set; the resulting affordance and action distributions serve as our prior knowledge. (Middle) We then map contact points from training categories to novel categories by leveraging the vision foundation model. (Lower Right) The pre-trained network proposes actions based on the mapped contact-point pairs on novel categories and is fine-tuned with the interaction results. (Upper Right) Finally, the fine-tuned networks manipulate unseen instances from novel categories with improved performance.
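For concreteness, the contact-point mapping step can be viewed as a nearest-neighbor search in the feature space of a frozen vision foundation model. The following is a minimal, hypothetical sketch: the DINO-style patch features, shapes, and function names are our illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def map_contact_points(src_feats, tgt_feats, src_points):
    """Map contact points from a source (training-category) image to a
    target (novel-category) image via dense feature correspondence.

    src_feats, tgt_feats: (H, W, D) patch features from a frozen vision
        foundation model (assumed DINO-style features here).
    src_points: list of (row, col) contact-point locations on the source
        feature grid, one entry per gripper.
    Returns: list of (row, col) best-matching locations on the target grid.
    """
    H, W, D = tgt_feats.shape
    tgt_flat = F.normalize(tgt_feats.reshape(-1, D), dim=-1)  # (H*W, D)
    mapped = []
    for (r, c) in src_points:
        query = F.normalize(src_feats[r, c], dim=-1)          # (D,)
        sim = tgt_flat @ query                                # cosine-similarity map
        idx = int(sim.argmax())                               # nearest patch
        mapped.append((idx // W, idx % W))
    return mapped

# Toy usage with random stand-in features; in practice the features
# would come from the foundation model applied to RGB observations.
torch.manual_seed(0)
src = torch.randn(16, 16, 384)
tgt = torch.randn(16, 16, 384)
print(map_contact_points(src, tgt, [(3, 4), (10, 12)]))
```

In words: each source contact point is matched to the target patch whose foundation-model feature is most similar, which is what allows affordances to transfer across categories without new annotations.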

More Visuals

Cross-category affordance generalization results


Each group shows a source image with contact points (colored markers denote the first and second grippers) and a target image with the inferred points. Highlight intensity indicates correspondence strength.
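The highlight maps in this figure can be reproduced in spirit by rescaling a cosine-similarity map to [0, 1]. This is a minimal sketch under the same assumed feature shapes as above, not the paper's actual visualization code.

```python
import torch
import torch.nn.functional as F

def correspondence_heatmap(src_feat, tgt_feats):
    """Similarity between one source contact-point feature (D,) and all
    target patch features (H, W, D), rescaled to [0, 1] so it can be
    rendered as highlight intensity."""
    sim = F.normalize(tgt_feats, dim=-1) @ F.normalize(src_feat, dim=0)  # (H, W)
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)

# Toy usage with random stand-in features.
torch.manual_seed(0)
heat = correspondence_heatmap(torch.randn(384), torch.randn(16, 16, 384))
print(heat.shape, float(heat.max()))
```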

Adaptation Results


Visualization of manipulation results across tasks


Visualization of affordance comparison


Real Experiment Setup

(a) Robot Setup


(b) Objects


(a) We use two UFactory xArm6 robotic arms for manipulation and two RealSense D435 cameras to capture RGBD observations.
(b) We test on a diverse set of objects.
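For reference, below is a minimal sketch of grabbing one color-aligned RGBD frame from a D435 with the pyrealsense2 SDK; the stream resolutions and overall capture loop are our assumptions, as the page does not specify them.

```python
import numpy as np
import pyrealsense2 as rs

# Start one D435 stream (one pipeline per camera in a two-camera setup).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)  # assumed resolution
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)

# Align depth to the color frame so pixel (u, v) indexes both images.
align = rs.align(rs.stream.color)
frames = align.process(pipeline.wait_for_frames())

depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16, raw depth units
color = np.asanyarray(frames.get_color_frame().get_data())  # uint8 BGR

# Convert raw depth to meters using the device's depth scale.
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
depth_m = depth.astype(np.float32) * depth_scale

pipeline.stop()
```

In a two-camera setup like (a), one such pipeline would typically be opened per camera serial number.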

Visualization of real-world experiments


Each row represents a task. From left to right, we show the corresponding affordance on the novel category (colored markers denote the first and second grippers), followed by the manipulation progress.

Real-world Demo

Generalization


BibTeX
