General-purpose Clothes Manipulation with Semantic Keypoints

National University of Singapore;

Abstract

Clothes manipulation is a critical skill for household robots. Recent advances have been made in task-specific clothes manipulation, such as folding, flattening, and hanging. However, due to the complex geometries and deformability of clothes, building a general-purpose robot system that can manipulate a diverse range of clothes in many ways remains challenging. Since clothes are typically designed with distinct structural parts, we propose identifying these parts, such as ``left sleeve'', as semantic keypoints. Semantic keypoints provide semantic cues for task planning and geometric cues for low-level action generation. With this insight, we develop a hierarchical learning framework that uses a large language model (LLM) for general-purpose CLothes mAnipulation with Semantic keyPoints (CLASP). Extensive simulation experiments show that CLASP outperforms baseline methods on both seen and unseen tasks across diverse clothes manipulation tasks. Real-world experiments show that CLASP can be directly deployed in the real world and applied to a wide variety of clothes.

CLASP

We propose a general-purpose CLothes mAnipulation method with Semantic keyPoints (CLASP). Given a depth observation and a free-form language instruction, CLASP first detects semantic keypoints as a state abstraction. The semantic descriptions of the detected keypoints, together with the instruction, are fed into a large language model to generate a series of sub-tasks. For each sub-task, a library of low-level action primitives executes the action based on the semantic keypoint positions. The entire pipeline enables the robot to learn transferable language and visual concepts across clothes manipulation tasks, which allows CLASP to generalize to a wide variety of clothes categories and tasks.
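The pipeline above can be sketched as three stages: keypoint detection, LLM-based sub-task planning, and primitive execution grounded on keypoint positions. This is a minimal illustrative sketch, not the authors' implementation; all function names, the rule-based "planner" standing in for the LLM, and the example keypoints are assumptions.

```python
# Illustrative sketch of a CLASP-style pipeline (hypothetical names throughout).

def detect_semantic_keypoints(depth_image):
    """Stand-in for the learned detector: map an observation to named keypoints."""
    # A real system would predict these from the depth image.
    return {"left sleeve": (120, 80), "right sleeve": (40, 82), "collar": (80, 20)}

def plan_subtasks(instruction, keypoint_names):
    """Stand-in for the LLM planner: instruction + keypoint semantics -> sub-tasks."""
    # A real implementation would prompt an LLM with the instruction and keypoint list.
    if "fold" in instruction:
        return [("fold", "left sleeve"), ("fold", "right sleeve")]
    return []

# Low-level action primitive library, keyed by sub-task type.
ACTION_PRIMITIVES = {
    "fold": lambda pos: f"pick-and-place at {pos}",
}

def execute(instruction, depth_image):
    keypoints = detect_semantic_keypoints(depth_image)      # state abstraction
    subtasks = plan_subtasks(instruction, list(keypoints))  # high-level planning
    # Each primitive is grounded on the detected keypoint positions.
    return [ACTION_PRIMITIVES[act](keypoints[kp]) for act, kp in subtasks]

actions = execute("fold the sleeves", depth_image=None)
```

The key design point is the separation of concerns: keypoint semantics feed the planner, while keypoint geometry feeds the primitives, so neither component needs clothes-specific retraining.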


Videos of real-world experiments

We evaluate the performance of our framework on a dual-arm robot manipulation system. Our method can be directly transferred to the real world and applied to various clothes and task categories.


BibTeX

@article{deng2024clasp,
  title   = {General-purpose Clothes Manipulation with Semantic Keypoints},
  author  = {Yuhong Deng and David Hsu},
  journal = {arXiv preprint arXiv:2408.08160},
  year    = {2024},
}