Three-Dimensional Waypoint Navigation of Multicopters by Attitude and Throttle Commands using Off-Policy Reinforcement Learning

Publication: Contribution to book or conference proceedings; conference talk with paper in proceedings; peer-reviewed

Abstract

Artificial intelligence, and machine learning in particular, is becoming increasingly important in automation and robotics, and machine learning approaches are gaining acceptance in aviation. Reinforcement Learning in particular is attracting growing attention for navigation and control problems, for example in training flight manoeuvres. This paper investigates the use of Off-Policy Reinforcement Learning techniques for three-dimensional waypoint navigation of multicopters by means of roll, pitch and throttle commands. It describes and compares the training runs performed with two well-known Off-Policy algorithms, namely Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC). Furthermore, we investigate the impact of the reward definition on the training outcome: for each algorithm, two agents are trained, each with a different reward definition. Finally, the paper presents the validations carried out to assess the four trained agents under different known and unknown conditions. Their performance is evaluated and compared with respect to the training algorithm and the reward definition used.
Original language: English
Title of host publication: 2022 International Conference on Unmanned Aircraft Systems (ICUAS)
Pages: 1359-1366
Number of pages: 8
Publication status: Published - 2022
Event: International Conference on Unmanned Aircraft Systems (ICUAS)
Duration: 21 June 2022 to 24 June 2022

Conference

Conference: International Conference on Unmanned Aircraft Systems (ICUAS)
Period: 21/06/22 to 24/06/22

Research Field

  • Assistive and Autonomous Systems

Keywords

  • Reinforcement Learning
  • Waypoint navigation
