A Robust Framework for Acoustic Scene Classification

Abstract
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter and constant Q transform. At the back-end, we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers as a post-trained feature adaptation model and classifier. We also explore the use of a data augmentation technique for these features that effectively generates a variety of intermediate data, reinforcing model learning abilities, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and remaining highly competitive with state-of-the-art systems.
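As an illustration of the log-energy Mel filter front end mentioned in the abstract, the following is a minimal numpy-only sketch. The parameter values (16 kHz sample rate, 512-point FFT, 40 Mel bands) are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # project onto the Mel filterbank, and take the log energy
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# Example: one second of noise -> (time frames, Mel bands) feature matrix
feat = log_mel_spectrogram(np.random.randn(16000))
print(feat.shape)
```

The resulting two-dimensional feature matrix is the kind of time-frequency image that would then be fed to a CNN feature extractor in a pipeline like the one the paper describes.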
| Original language | English |
|---|---|
| Title | INTERSPEECH 2019 |
| Pages | 3634-3638 |
| DOIs | |
| Publication status | Published - Sept. 2019 |
| Event | Interspeech 2019 - Graz, Austria, 15 Sept. 2019 → 19 Sept. 2019 |
Conference

| Conference | Interspeech 2019 |
|---|---|
| Country/Territory | Austria |
| City | Graz |
| Period | 15/09/19 → 19/09/19 |
Research Field
- Former Research Field - Data Science
Fingerprint
Explore the research topics of "A Robust Framework for Acoustic Scene Classification". Together they form a unique fingerprint.