Dexterous Hand Manipulation via Efficient Imitation-Bootstrapped Online Reinforcement Learning

Abstract

Dexterous hand manipulation in real-world scenarios is challenging because it demands both dexterity and precision. Imitation learning approaches have been studied extensively for these tasks, but they still require a large number of expert demonstrations and have a limited performance upper bound. In this paper, we propose a novel and efficient Imitation-Bootstrapped Online Reinforcement Learning (IBORL) method tailored for robotic dexterous hand manipulation in real-world environments. Specifically, we pretrain the policy on a limited set of expert demonstrations and then finetune it through reinforcement learning directly in the real world. To address the catastrophic forgetting caused by the distribution shift between expert demonstrations and real-world environments, we introduce a regularization term that balances the exploration of novel behaviors against the preservation of the pretrained policy. Our experiments on real-world tasks demonstrate that our method significantly outperforms existing approaches, achieving a nearly 100% success rate and a 23% improvement in cycle time. Furthermore, through reinforcement learning, our method can discover policies superior to those obtained solely by imitating expert demonstrations.
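To make the role of the regularization term concrete, the following is a minimal sketch of a regularized actor objective. The abstract does not give the form of the regularizer, so the squared distance between the current and pretrained actions is used here purely as a stand-in; the function name `regularized_actor_loss` and the parameter `reg_weight` are hypothetical and not taken from the paper.

```python
def regularized_actor_loss(q_value, policy_action, pretrained_action, reg_weight):
    """Illustrative actor loss: an RL term plus a penalty for drifting
    away from the pretrained (imitation) policy.

    ASSUMPTION: the squared-distance penalty is a stand-in; the paper's
    actual regularization term is not specified on this page.
    """
    # RL term: minimizing -Q pushes the policy toward higher-value actions.
    rl_term = -q_value
    # Preservation term: squared distance to the pretrained policy's action.
    drift = sum((a - b) ** 2 for a, b in zip(policy_action, pretrained_action))
    return rl_term + reg_weight * drift


# Same action as the pretrained policy: only the RL term remains.
loss_same = regularized_actor_loss(1.0, [0.5, 0.0], [0.5, 0.0], reg_weight=2.0)
# A deviating action pays a penalty that grows with reg_weight.
loss_drift = regularized_actor_loss(1.0, [0.5, 0.5], [0.5, 0.0], reg_weight=2.0)
```

Under these assumptions, a small `reg_weight` favors exploring new behaviors, while a large one keeps the finetuned policy close to the pretrained one.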

Experiments

Method	Grab Cup SR (%)	Grab Cup Cycle Time	Pinch Cube SR (%)	Pinch Cube Cycle Time	Grab Scanner SR (%)	Grab Scanner Cycle Time	Take Loopy SR (%)	Take Loopy Cycle Time
BC	70.0	122.21	7.5	129.67	80.0	204.25	95.0	116.11
SERL	0	N/A	0	N/A	0	N/A	0	N/A
IBORL^†	0	N/A	0	N/A	0	N/A	100.0	101.8
IBORL (Ours)	87.5	68.51	90.0	110.55	90.0	147.19	100.0	115.55
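The cycle-time gain over BC can be checked directly from the table. The sketch below computes the per-task reduction and an aggregate over summed cycle times; how the 23% figure in the abstract was actually computed is not stated on this page, so the aggregation used here is an assumption.

```python
# Cycle times copied from the results table (units are not stated there).
TASKS = ["Grab Cup", "Pinch Cube", "Grab Scanner", "Take Loopy"]
BC_CYCLE = [122.21, 129.67, 204.25, 116.11]
IBORL_CYCLE = [68.51, 110.55, 147.19, 115.55]


def reduction_pct(baseline, new):
    """Percentage by which `new` is shorter than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Per-task reduction of IBORL (Ours) relative to BC.
per_task = {t: reduction_pct(b, n) for t, b, n in zip(TASKS, BC_CYCLE, IBORL_CYCLE)}

# ASSUMPTION: aggregate by comparing cycle times summed over all four tasks.
aggregate = reduction_pct(sum(BC_CYCLE), sum(IBORL_CYCLE))

for task, pct in per_task.items():
    print(f"{task}: {pct:.1f}% shorter cycle time than BC")
print(f"All tasks (summed cycle times): {aggregate:.1f}%")
```

With this aggregation the reduction comes out at roughly 22.8%, which is consistent with the 23% reported in the abstract. The per-task gains vary widely, from under 1% on Take Loopy to over 40% on Grab Cup.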

Rollout comparisons (media not reproduced here)

Grab Cup: Ours, BC
Pinch Cube: Ours, BC
Take Loopy: Ours, BC
Grab Scanner: Ours, BC
ACT: Pinch Cube, Grab Cup

BibTeX

@inproceedings{huang2025iborl,
  author    = {Huang, Dongchi and Zhang, Tianle and Li, Yihang and Zhao, Ling and Li, Jiayi and Fang, Zhirui and Xia, Chunhe and Li, Lusong and He, Xiaodong},
  title     = {Dexterous Hand Manipulation via Efficient Imitation-Bootstrapped Online Reinforcement Learning},
  booktitle = {arXiv},
  year      = {2025},
}