Clothes manipulation, such as folding, flattening, and hanging, is ubiquitous in daily life and a critical skill for home service robots. Despite recent advances, existing methods often focus narrowly on a single task or a single type of clothes.
This work introduces CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over diverse clothes types—T-shirts, sweaters, shorts, skirts, long dresses, etc.—as well as multiple tasks: folding, flattening, hanging, and more. Our key insight for tackling the challenge of generalization is the concept of semantic keypoints, a general spatial-semantic representation that encodes the structural features of clothes, such as “left sleeve” and “right hem.” Semantic keypoints are salient for both perception and action, effectively captured in the commonsense knowledge of foundation models, and relatively easy to extract from observations.
CLASP integrates semantic keypoints with foundation models—both large language models (LLMs) and vision-language models (VLMs)—to achieve general-purpose clothes manipulation. In both simulation and real-robot experiments, CLASP demonstrates strong performance and generalization capabilities compared to baseline methods.
Inspired by the way humans manipulate clothes, we introduce semantic keypoints as a general spatial-semantic representation of clothes. These keypoints carry explicit semantic meaning, can be described using natural language, and provide a sparse yet succinct representation. As a result, they are easy to extract from visual observations, generalize well across different clothes instances, and effectively define affordances for manipulation.
CLASP overview. Given an RGB-D observation, CLASP extracts semantic keypoints as the state. These keypoints, along with the RGB image and task instruction, are fed to a VLM to generate a sub-task sequence. Once verified, the sub-tasks are executed. After each sub-task, CLASP updates the observation and decides whether to replan. This loop continues until the task is complete.
Our method achieves open-category semantic keypoint extraction even under irregular deformation and occlusion.
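The perceive–plan–verify–execute loop described above can be sketched as follows. This is a minimal illustrative sketch, not the actual CLASP implementation: `extract_keypoints`, `propose_subtasks`, `verify`, and `execute` are hypothetical stand-ins for the keypoint extractor, the VLM planner, the plan checker, and the low-level controller, and the scene state is reduced to a toy dictionary.

```python
# Hypothetical sketch of the CLASP control loop; all functions below are
# illustrative stubs standing in for the real components.

def extract_keypoints(scene):
    # Stand-in for open-category semantic keypoint extraction from RGB-D.
    # Returns named keypoints with (normalized) image coordinates.
    return {"left sleeve": (0.2, 0.5), "right hem": (0.7, 0.9)}

def propose_subtasks(keypoints, scene, instruction):
    # Stand-in for the VLM planner: maps (keypoints, image, instruction)
    # to a sub-task sequence; returns [] once the task looks complete.
    if scene["folded"]:
        return []
    return [("fold", "left sleeve", "right hem")]

def verify(plan, keypoints):
    # Reject plans that reference keypoints absent from the observation.
    return all(a in keypoints and b in keypoints for _, a, b in plan)

def execute(subtask, scene):
    # Stand-in for low-level execution; returns the updated scene state.
    scene["folded"] = True
    return scene

def clasp_loop(scene, instruction, max_steps=10):
    """Replan after every sub-task until the planner returns an empty plan."""
    executed = []
    for _ in range(max_steps):
        keypoints = extract_keypoints(scene)      # update the state
        plan = propose_subtasks(keypoints, scene, instruction)
        if not plan:                              # empty plan: task complete
            return executed
        if not verify(plan, keypoints):           # verification failed
            break
        scene = execute(plan[0], scene)           # run one sub-task, then replan
        executed.append(plan[0])
    return None                                   # failed or budget exhausted
```

A single sub-task suffices in this toy scene; a real fold of a long dress would replan through several pick-and-place sub-tasks between keypoints.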
@inproceedings{deng2025clasp,
  title     = {General-purpose Clothes Manipulation with Semantic Keypoints},
  author    = {Yuhong Deng and David Hsu},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2025},
}