Lightweight Language-driven Grasp Detection using Conditional Consistency Model

Nghia Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Research output: Chapter in Book or Conference Proceedings, Conference Proceedings with Oral Presentation, peer-review

Abstract

Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. This work presents a new approach to language-driven grasp detection that leverages lightweight diffusion models to achieve fast inference. By integrating diffusion processes with natural-language grasping prompts, our method effectively encodes visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time of diffusion models, we use the image and text features as the condition of a consistency model, which reduces the number of denoising timesteps needed during inference. Extensive experiments show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference.
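To make the "fewer denoising timesteps" idea concrete, the following is a minimal sketch of conditional consistency-model sampling, assuming the generic parameterization and multistep sampler of consistency models (Song et al., 2023). It is not the authors' implementation: the network `ToyDenoiser`, the 5-D grasp vector, the concatenated image/text condition vector, and the constants `SIGMA_DATA`, `EPS`, and `T_MAX` are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

# Assumed noise-schedule constants (placeholders, not taken from the paper).
SIGMA_DATA = 0.5   # assumed std of the grasp data
EPS = 0.002        # smallest noise level; f(x, EPS, c) must return x
T_MAX = 80.0       # largest noise level used to start sampling


def c_skip(t):
    # Skip-connection weight: equals 1 at t == EPS.
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Output weight: equals 0 at t == EPS.
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)


class ToyDenoiser:
    """Hypothetical stand-in for a learned network F(x, t, cond)."""

    def __init__(self, grasp_dim, cond_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.1, size=(grasp_dim, grasp_dim))
        self.W_c = rng.normal(scale=0.1, size=(grasp_dim, cond_dim))

    def __call__(self, x, t, cond):
        # Mixes the noisy grasp, the condition vector, and the noise level.
        return np.tanh(self.W_x @ x + self.W_c @ cond + np.log(t))


def consistency_fn(net, x, t, cond):
    # f(x, t, c) = c_skip(t) * x + c_out(t) * F(x, t, c)
    return c_skip(t) * x + c_out(t) * net(x, t, cond)


def sample_grasp(net, cond, grasp_dim, taus=(), seed=0):
    """One-step sample, optionally refined at decreasing noise levels taus."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=T_MAX, size=grasp_dim)   # x_T ~ N(0, T_MAX^2 I)
    x = consistency_fn(net, x, T_MAX, cond)       # single denoising jump
    for tau in taus:
        z = rng.normal(size=grasp_dim)
        x_tau = x + np.sqrt(tau**2 - EPS**2) * z  # re-noise to level tau
        x = consistency_fn(net, x_tau, tau, cond) # jump back to the data
    return x
```

A usage sketch, again with made-up features: build the condition by concatenating an image embedding and a text embedding, e.g. `cond = np.concatenate([image_feat, text_feat])`, then call `sample_grasp(net, cond, 5)` for a one-step grasp or pass `taus=(2.0, 0.5)` for two extra refinement steps. The design point is that each call to `consistency_fn` maps a noisy sample directly to a clean estimate, so sampling cost is one network evaluation per step rather than the many steps of a standard diffusion sampler.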
Original language: English
Title of host publication: Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Pages: 13719-13725
Publication status: Published, 25 Dec 2024
Event: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), Abu Dhabi, United Arab Emirates
Duration: 14 Oct 2024 to 18 Oct 2024

Conference

Conference: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 14/10/24 to 18/10/24

Research Field

  • Complex Dynamical Systems
