On multitask loss function for audio event detection and localization

Huy Phan, Lam Pham, Philipp Koch, Ngoc Duong, Ian McLoughlin, Alfred Mertins

Research output: Chapter in Book or Conference Proceedings, Conference Proceedings with Oral Presentation, peer-review

Abstract

Audio event localization and detection (SELD) has commonly been tackled using multitask models. Such a model usually consists of a multi-label event classification branch with a sigmoid cross-entropy loss for event activity detection and a regression branch with a mean squared error loss for direction-of-arrival estimation. In this work, we propose a multitask regression model in which both (multi-label) event detection and localization are formulated as regression problems, and the mean squared error loss is used homogeneously for model training. We show that the common combination of heterogeneous loss functions causes the network to underfit the data, whereas the homogeneous mean squared error loss leads to better convergence and performance. Experiments on the development and validation sets of the DCASE 2020 SELD task demonstrate that the proposed system also outperforms the DCASE 2020 SELD baseline across all detection and localization metrics, reducing the overall SELD error (the combined metric) by approximately 10% absolute.
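
The difference between the two training objectives described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (mse, sigmoid_bce, heterogeneous_loss, homogeneous_loss, seld_error), the loss weights w_sed and w_doa, and the flat per-class output layout (one activity value per class, DOA given as Cartesian x, y, z per class) are all hypothetical. The seld_error helper assumes the DCASE 2020 convention of averaging the error rate, 1 - F-score, localization error divided by 180 degrees, and 1 - localization recall.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sigmoid_bce(logits: Sequence[float], target: Sequence[float]) -> float:
    """Mean sigmoid cross-entropy between raw logits and 0/1 activity targets.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    if len(logits) != len(target):
        raise ValueError("logits and target must have the same length")
    total = 0.0
    for z, y in zip(logits, target):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def heterogeneous_loss(sed_logits, sed_target, doa_pred, doa_target,
                       w_sed: float = 1.0, w_doa: float = 1.0) -> float:
    """Common SELD objective: cross-entropy for detection plus MSE for DOA."""
    return w_sed * sigmoid_bce(sed_logits, sed_target) + w_doa * mse(doa_pred, doa_target)


def homogeneous_loss(sed_pred, sed_target, doa_pred, doa_target,
                     w_sed: float = 1.0, w_doa: float = 1.0) -> float:
    """Regression-only objective: detection activity is regressed with MSE as well."""
    return w_sed * mse(sed_pred, sed_target) + w_doa * mse(doa_pred, doa_target)


def seld_error(er: float, f: float, le_deg: float, lr: float) -> float:
    """Combined SELD error (assumed DCASE 2020 convention); lower is better."""
    return (er + (1.0 - f) + le_deg / 180.0 + (1.0 - lr)) / 4.0


if __name__ == "__main__":
    # Two hypothetical classes: class 0 active, class 1 inactive.
    sed_target = [1.0, 0.0]
    doa_target = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # (x, y, z) per class
    doa_pred = [0.1, 0.9, 0.0, 0.0, 0.0, 0.0]
    print(heterogeneous_loss([2.0, -2.0], sed_target, doa_pred, doa_target))
    print(homogeneous_loss([0.8, 0.1], sed_target, doa_pred, doa_target))
```

In the homogeneous variant both terms are squared errors, so the detection and localization branches are optimized under the same kind of objective; in the heterogeneous variant, the detection term is a log-loss on logits while the localization term is a squared error.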
Original language: English
Title of host publication: Detection and Classification of Acoustic Scenes and Events (DCASE), 2020
Pages: 160-164
Publication status: Published, Nov 2020
Event: Detection and Classification of Acoustic Scenes and Events 2020, Tokyo, Japan
Duration: 2 Nov 2020 to 3 Nov 2020

Other

Other: Detection and Classification of Acoustic Scenes and Events 2020
Country/Territory: Japan
City: Tokyo
Period: 2/11/20 to 3/11/20

Research Field

  • Former Research Field - Data Science
