Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Publication: Contribution to book or conference proceedings; conference talk with paper in proceedings; peer-reviewed

Abstract

6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Extensive experiments show that our method surpasses other baselines in both benchmark evaluations and real-world scenarios. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.
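
The abstract does not spell out how the negative prompt enters the diffusion process. As a rough illustration only, the sketch below shows the generic negative-prompt form of classifier-free guidance that is common in diffusion models; the paper's learned strategy may differ. The function name `guided_noise`, the parameter `scale`, and the toy two-element vectors are assumptions made for this example and do not come from the paper or its code.

```python
from typing import List, Sequence


def guided_noise(
    eps_pos: Sequence[float],
    eps_neg: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two noise predictions from a denoiser (hypothetical helper).

    eps_pos: prediction conditioned on the desired-object prompt.
    eps_neg: prediction conditioned on the negative (unwanted-object) prompt.
    scale:   guidance weight; 0 returns eps_neg, 1 returns eps_pos, and
             values above 1 push further away from the negative prediction.
    """
    if len(eps_pos) != len(eps_neg):
        raise ValueError("predictions must have the same length")
    # Start from the negative prediction and move along the direction
    # (positive - negative), scaled by the guidance weight.
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


# Toy usage with made-up numbers standing in for denoiser outputs.
combined = guided_noise([1.0, 2.0], [0.0, 1.0], scale=2.0)
```

In a full sampler this combination would be applied at every denoising step to the predicted noise over the grasp pose; here it is reduced to plain lists so the arithmetic is easy to follow.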
Original language: English
Title: Lecture Notes in Computer Science
Pages: 363-381
Edition: 15077
DOIs
Publication status: Published - 6 Dec. 2024
Event: European Conference on Computer Vision: Computer Vision - Milan, Italy
Duration: 29 Sept. 2024 to 4 Oct. 2024

Conference

Conference: European Conference on Computer Vision
Short title: Computer Vision - ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24

Research Field

  • Complex Dynamical Systems

Keywords

  • Language-Driven 6-DoF Grasp Detection
  • Diffusion Models
