Clothes manipulation, such as folding, flattening, and hanging, is ubiquitous in daily life and a critical skill for home service robots. Despite recent advances, existing methods often focus narrowly on a single task or a single type of clothes.
This work introduces CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over diverse clothes types—T-shirts, sweaters, shorts, skirts, long dresses, etc.—as well as multiple tasks: folding, flattening, hanging, and more. Our key insight for tackling the challenge of generalization is the concept of semantic keypoints, a general spatial-semantic representation that encodes the structural features of clothes, such as “left sleeve” and “right hem.” Semantic keypoints are salient for both perception and action, effectively captured in the commonsense knowledge of foundation models, and relatively easy to extract from observations.
CLASP integrates semantic keypoints with foundation models—both large language models (LLMs) and vision-language models (VLMs)—to achieve general-purpose clothes manipulation. In both simulation and real-robot experiments, CLASP demonstrates strong performance and generalization capabilities compared to baseline methods.
Inspired by the way humans manipulate clothes, we introduce semantic keypoints as a general spatial-semantic representation of clothes. These keypoints carry explicit semantic meaning, can be described using natural language, and provide a sparse yet succinct representation. As a result, they are easy to extract from visual observations, generalize well across different clothes instances, and effectively define affordances for manipulation.
CLASP overview. Given an RGB-D observation, CLASP extracts semantic keypoints as the state. These keypoints, along with the RGB image and task instruction, are fed to a VLM to generate a sub-task sequence. Once verified, the sub-tasks are executed. After each sub-task, CLASP updates the observation and decides whether to replan. This loop continues until the task is complete.
Our method achieves open-category semantic keypoint extraction even under irregular deformation and occlusion.
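The perceive–plan–verify–execute loop described above can be sketched as follows. This is a minimal illustrative sketch, not the actual CLASP implementation: `extract_keypoints`, `propose_subtasks`, `verify`, and `execute` are hypothetical stand-ins for the keypoint extractor, the VLM planner, the plan checker, and the low-level controller, and the scene state is reduced to a toy dictionary.

```python
# Hypothetical sketch of the CLASP control loop; all functions below are
# illustrative stubs standing in for the real components.

def extract_keypoints(scene):
    # Stand-in for open-category semantic keypoint extraction from RGB-D.
    # Returns named keypoints with (normalized) image coordinates.
    return {"left sleeve": (0.2, 0.5), "right hem": (0.7, 0.9)}

def propose_subtasks(keypoints, scene, instruction):
    # Stand-in for the VLM planner: maps (keypoints, image, instruction)
    # to a sub-task sequence; returns [] once the task looks complete.
    if scene["folded"]:
        return []
    return [("fold", "left sleeve", "right hem")]

def verify(plan, keypoints):
    # Reject plans that reference keypoints absent from the observation.
    return all(a in keypoints and b in keypoints for _, a, b in plan)

def execute(subtask, scene):
    # Stand-in for low-level execution; returns the updated scene state.
    scene["folded"] = True
    return scene

def clasp_loop(scene, instruction, max_steps=10):
    """Replan after every sub-task until the planner returns an empty plan."""
    executed = []
    for _ in range(max_steps):
        keypoints = extract_keypoints(scene)      # update the state
        plan = propose_subtasks(keypoints, scene, instruction)
        if not plan:                              # empty plan: task complete
            return executed
        if not verify(plan, keypoints):           # verification failed
            break
        scene = execute(plan[0], scene)           # run one sub-task, then replan
        executed.append(plan[0])
    return None                                   # failed or budget exhausted
```

A single sub-task suffices in this toy scene; a real fold of a long dress would replan through several pick-and-place sub-tasks between keypoints.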
@inproceedings{deng2025clasp,
  title     = {General-purpose Clothes Manipulation with Semantic Keypoints},
  author    = {Yuhong Deng and David Hsu},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2025},
}