
Action Tokenizer Matters in In-Context Imitation Learning

  • An Dinh Vuong
  • Minh Nhat Vu
  • Dong An
  • Ian Reid
  • Department of Computer Vision, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

In-context imitation learning (ICIL) is a new paradigm that enables robots to generalize from demonstrations to unseen tasks without retraining. A well-structured action representation is key to capturing demonstration information effectively, yet action tokenization (the process of discretizing and encoding actions) remains largely unexplored in ICIL. In this work, we first systematically evaluate existing action tokenizers in ICIL and reveal a critical limitation: while they effectively encode action trajectories, they fail to preserve temporal smoothness, which is crucial for stable robotic execution. To address this, we propose LipVQ-VAE, a vector-quantized variational autoencoder (VQ-VAE) that enforces the Lipschitz condition in the latent action space via weight normalization. By propagating smoothness constraints from raw action inputs to a quantized latent codebook, LipVQ-VAE generates smoother actions. When integrated into ICIL, LipVQ-VAE improves performance by more than 5.3% in high-fidelity simulators, and real-world experiments confirm that it produces smoother, more reliable trajectories. Code and checkpoints are available at https://action-tokenizer-matters.github.io/.
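The abstract describes LipVQ-VAE only at a high level. The following is a minimal sketch of the general idea under our own assumptions, not the authors' implementation: Lipschitz continuity is enforced by one common weight-normalization scheme (capping each weight row's absolute sum at softplus(c), which bounds the layer's Lipschitz constant under the infinity-norm), latents are mapped to discrete tokens by nearest-neighbour lookup in a codebook, and temporal smoothness is scored by a simple second-difference "roughness" measure. All names (`LipschitzEncoder`, `quantize`, `roughness`), layer sizes, and the choice of `tanh` are hypothetical; the decoder and training (straight-through gradients, codebook and commitment losses) are omitted. The authors' released code at the project page above is the authoritative reference.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return np.logaddexp(0.0, x)


def lipschitz_normalize(W, c):
    """Rescale rows of W so every absolute row sum is at most softplus(c).

    The max absolute row sum is the operator norm of W under the
    infinity-norm, so x -> W @ x is softplus(c)-Lipschitz in that norm.
    (Assumed scheme for illustration; LipVQ-VAE's exact form may differ.)
    """
    bound = softplus(c)
    row_abs_sum = np.abs(W).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, bound / np.maximum(row_abs_sum, 1e-12))
    return W * scale


class LipschitzEncoder:
    """Hypothetical two-layer action encoder with a per-layer Lipschitz cap."""

    def __init__(self, action_dim, hidden_dim, latent_dim, c=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(hidden_dim, action_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(size=(latent_dim, hidden_dim))
        self.b2 = np.zeros(latent_dim)
        # In a trained model, c would be a learnable scalar per layer.
        self.c1 = c
        self.c2 = c

    def lipschitz_bound(self):
        # tanh is 1-Lipschitz elementwise, so the per-layer bounds multiply.
        return softplus(self.c1) * softplus(self.c2)

    def __call__(self, actions):
        """actions: (T, action_dim) -> latents: (T, latent_dim)."""
        W1 = lipschitz_normalize(self.W1, self.c1)
        W2 = lipschitz_normalize(self.W2, self.c2)
        h = np.tanh(actions @ W1.T + self.b1)
        return h @ W2.T + self.b2


def quantize(latents, codebook):
    """Map each latent row (T, D) to its nearest codebook entry (K, D)."""
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)  # discrete action tokens, shape (T,)
    return tokens, codebook[tokens]


def roughness(traj):
    """Mean squared second finite difference over time; lower is smoother."""
    return float(np.mean(np.diff(traj, n=2, axis=0) ** 2))


# Usage: tokenize a smooth toy trajectory of 3-D actions.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)[:, None]
actions = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])  # (50, 3)

enc = LipschitzEncoder(action_dim=3, hidden_dim=16, latent_dim=4)
codebook = rng.normal(size=(32, 4))
latents = enc(actions)
tokens, quantized = quantize(latents, codebook)
print(tokens.shape, roughness(actions), roughness(latents))
```

Because every layer is capped, actions that are close in time (and therefore close in value) are mapped to latents that are guaranteed to be close, which tends to keep consecutive steps on the same or nearby codes. This is one way to read "propagating smoothness constraints from raw action inputs to a quantized latent codebook".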
Original language: English
Title of host publication: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Pages: 13490-13496
Number of pages: 7
ISBN (Electronic): 979-8-3315-4393-8
Publication status: Published - 2025
Event: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems - Hangzhou, China
Duration: 19 Oct 2025 to 25 Dec 2025
https://www.iros25.org/

Publication series

Name: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Conference

Conference: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems
Abbreviated title: IROS
Country/Territory: China
City: Hangzhou
Period: 19/10/25 to 25/12/25

Research Field

  • Complex Dynamical Systems

Keywords

  • Codes
  • Imitation learning
  • Autoencoders
  • Stability analysis
  • Encoding
  • Trajectory
  • Reliability
  • Intelligent robots
