Three-Dimensional Waypoint Navigation of Multicopters by Attitude and Throttle Commands using Off-Policy Reinforcement Learning

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

Artificial intelligence, in particular machine learning, is becoming increasingly important in automation and robotics. Machine learning approaches are also becoming increasingly accepted in aviation. In particular, Reinforcement Learning is gaining attention in navigation and control problems, for example in training flight manoeuvres. This paper investigates the use of Off-Policy Reinforcement Learning techniques for three-dimensional waypoint navigation of multicopters by providing roll, pitch and throttle commands. It describes and compares the training runs performed using two well-known Off-Policy algorithms, namely the Deep Deterministic Policy Gradient (DDPG) and the Soft Actor Critic (SAC). Furthermore, we investigate the impact of the reward definition on the training outcome. For each algorithm, two agents are trained, each with a different reward definition. Finally, the paper presents the validations carried out to assess the four trained agents under different known and unknown conditions. Their performance is evaluated and compared with respect to the training algorithm and the reward definition used.
Original language: English
Title of host publication: 2022 International Conference on Unmanned Aircraft Systems (ICUAS)
Pages: 1359-1366
Number of pages: 8
Publication status: Published - 2022
Event: International Conference on Unmanned Aircraft Systems (ICUAS)
Duration: 21 Jun 2022 → 24 Jun 2022

Conference

Conference: International Conference on Unmanned Aircraft Systems (ICUAS)
Period: 21/06/22 → 24/06/22

Research Field

  • Assistive and Autonomous Systems

Keywords

  • Reinforcement Learning
  • Waypoint navigation
