Language-driven Grasp Detection with Mask-guided Attention

Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › Peer-review

Abstract

Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not use language to guide grasping. Incorporating natural language into grasp detection remains challenging and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention, which combines the transformer attention mechanism with semantic segmentation features. Our approach integrates visual data, segmentation mask features, and natural language instructions, significantly improving grasp detection accuracy. Our work introduces a new framework for language-driven grasp detection, paving the way for language-driven robotic applications. Extensive experiments show that our method outperforms recent baselines by a clear margin, with a 10.0% improvement in success score. We further validate our method in real-world robotic experiments, confirming the effectiveness of our approach.
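The abstract says segmentation mask features are combined with transformer attention, but it does not say how. One common way to let a mask steer attention is to add a log-mask bias to the attention logits, so that image patches outside the predicted object region receive almost no weight. The sketch below illustrates only that generic idea for a single language query over a few visual patches. The function name `mask_guided_attention`, the per-patch `mask` scores in [0, 1], and the additive-bias design are assumptions for illustration, not the authors' architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_guided_attention(query: Vector, keys: List[Vector], values: List[Vector],
                          mask: List[float], eps: float = 1e-6) -> Vector:
    """Hypothetical single-query scaled dot-product attention with a mask bias.

    mask[i] is an assumed segmentation score in [0, 1] for patch i; adding
    log(mask[i] + eps) to the logit suppresses patches with near-zero mask.
    """
    d = len(query)
    logits = []
    for k, m in zip(keys, mask):
        score = sum(q * kk for q, kk in zip(query, k)) / math.sqrt(d)
        logits.append(score + math.log(m + eps))  # additive log-mask bias
    weights = softmax(logits)
    # Weighted sum of value vectors using the mask-biased attention weights.
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            out[j] += w * vj
    return out
```

With two patches whose keys match the query equally, a uniform mask `[1.0, 1.0]` splits the attention evenly and returns `[0.5]` for values `[[1.0], [0.0]]`, while the mask `[1.0, 0.0]` moves nearly all of the weight onto the first patch. An additive bias is used here because it keeps the softmax well defined even when the mask is exactly zero; the paper may fuse mask features differently.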
Original language: English
Title of host publication: Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Pages: 7492-7498
Publication status: Published, 25 Dec 2024
Event: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), Abu Dhabi, United Arab Emirates
Duration: 14 Oct 2024 to 18 Oct 2024

Conference

Conference: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 14/10/24 to 18/10/24

Research Field

  • Complex Dynamical Systems
