Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach: Use Case of Riot or Violent Context Detection

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Poster Presentation › peer-review

Abstract

In this paper, we present a toolchain for comprehensive audio/video analysis that leverages a deep learning-based multimodal approach. To this end, several specific tasks — Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) — are conducted and integrated into the toolchain. By combining the individual tasks and analyzing both the audio and visual data extracted from an input video, the toolchain supports several audio/video-based applications: two general applications, audio/video clustering and comprehensive audio/video summarization, and one specific application, riot or violent context detection. Furthermore, the toolchain presents a flexible and adaptable architecture into which new models can be integrated for further audio/video-based applications.
Original language: English
Title of host publication: 2024 International Conference on Content-Based Multimedia Indexing (CBMI)
Pages: 349-352
Number of pages: 4
ISBN (Electronic): 979-8-3503-7844-3
DOIs
Publication status: Published - Feb 2025
Event: 21st International Conference on Content-based Multimedia Indexing - Reykjavik University (RU), Reykjavik, Iceland
Duration: 18 Sept 2024 - 20 Sept 2024
https://cbmi2024.org/

Conference

Conference: 21st International Conference on Content-based Multimedia Indexing
Abbreviated title: CBMI 2024
Country/Territory: Iceland
City: Reykjavik
Period: 18/09/24 - 20/09/24
Internet address: https://cbmi2024.org/

Research Field

  • Multimodal Analytics

Keywords

  • Deep learning model
  • multimodal
  • toolchain

