Research Article, July 2024

Multimodal Sentiment Analysis Using Transformer-Based Architectures: A Fusion of Text, Audio, and Visual Cues

Abstract

Multimodal Sentiment Analysis (MSA) seeks to interpret human emotions by integrating textual, auditory, and visual data. This study introduces a transformer-based framework that fuses these three modalities to improve sentiment classification accuracy. The proposed model uses fusion techniques and attention mechanisms to capture inter-modal relationships. Evaluated on benchmark datasets such as CMU-MOSI and CMU-MOSEI, the model outperforms existing state-of-the-art methods, indicating that transformer-based multimodal fusion is effective for sentiment analysis.
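
As a rough illustration of the kind of attention-based fusion the abstract refers to, the NumPy sketch below lets a text sequence attend to audio and visual sequences with scaled dot-product attention and then pools the result into one utterance vector. It is not the authors' model: the function names (`cross_modal_attention`, `fuse`), the feature dimension, the sequence lengths, and the random inputs are all assumptions made for illustration, and the learned query/key/value projections of a real transformer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_seq, context_seq):
    """Scaled dot-product attention: each query step attends over the context modality.

    query_seq:   (T_q, d) features of the attending modality
    context_seq: (T_c, d) features of the attended modality
    Learned projections are omitted here (assumption, for brevity).
    """
    d = query_seq.shape[-1]
    scores = query_seq @ context_seq.T / np.sqrt(d)   # (T_q, T_c)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ context_seq, weights             # (T_q, d), (T_q, T_c)

def fuse(text, audio, visual):
    """Hypothetical fusion: text attends to audio and to visual, then concatenate and pool."""
    text_from_audio, _ = cross_modal_attention(text, audio)
    text_from_visual, _ = cross_modal_attention(text, visual)
    fused = np.concatenate([text, text_from_audio, text_from_visual], axis=-1)  # (T_t, 3d)
    return fused.mean(axis=0)                         # (3d,) utterance-level vector

# Random stand-ins for per-modality features; the modalities have different lengths.
rng = np.random.default_rng(0)
text = rng.normal(size=(12, 16))
audio = rng.normal(size=(50, 16))
visual = rng.normal(size=(30, 16))

utterance_vec = fuse(text, audio, visual)
_, attn = cross_modal_attention(text, audio)
```

The pooled vector would typically be passed to a small classification or regression head to predict sentiment. Letting text act as the query is only one possible design; any modality could play that role.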

Keywords

Multimodal Sentiment Analysis; Transformer Architectures; Text-Audio-Visual Fusion; Attention Mechanisms; Deep Learning
Details
Volume 5
Issue 2
Pages 1-7
ISSN 3232-4536