Multimodal Sentiment Analysis Using Transformer-Based Architectures: A Fusion of Text, Audio, and Visual Cues
Abstract
Multimodal Sentiment Analysis (MSA) seeks to interpret human emotions by integrating textual, auditory, and visual data. Leveraging transformer-based architectures, this study introduces a novel framework that effectively fuses these modalities to enhance sentiment classification accuracy. The proposed model employs advanced fusion techniques and attention mechanisms to capture intricate inter-modal relationships. Evaluated on benchmark datasets such as CMU-MOSI and CMU-MOSEI, the model demonstrates superior performance compared to existing state-of-the-art methods, highlighting the efficacy of transformer-based multimodal fusion in sentiment analysis.
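The abstract's core idea, using attention to capture inter-modal relationships before fusing modalities for classification, can be illustrated with a minimal sketch. This is not the paper's actual model; the dimensions, the text-as-query cross-attention layout, and the late concatenation-then-pool fusion are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # Scaled dot-product attention: each query row attends over key/value rows
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ value

rng = np.random.default_rng(0)
d = 8  # shared embedding size (assumed)
text  = rng.normal(size=(5, d))   # 5 text tokens
audio = rng.normal(size=(7, d))   # 7 audio frames
video = rng.normal(size=(6, d))   # 6 video frames

# Cross-modal attention: text queries attend to the other modalities
text_audio = cross_attention(text, audio, audio)
text_video = cross_attention(text, video, video)

# Late fusion: concatenate modality-aware features, pool, then classify
fused = np.concatenate([text, text_audio, text_video], axis=-1)  # (5, 3*d)
pooled = fused.mean(axis=0)                                      # (3*d,)
sentiment_logits = pooled @ rng.normal(size=(3 * d, 3))          # 3 sentiment classes
```

In a transformer-based system these projections would be learned and stacked in layers, but the same attention pattern lets one modality reweight another frame by frame, which is the inter-modal mechanism the abstract describes.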
Keywords
Multimodal Sentiment Analysis
Transformer Architectures
Text-Audio-Visual Fusion
Attention Mechanisms
Deep Learning
Details
Volume 5, Issue 2, pp. 1-7, ISSN 3232-4536
mj iaeme, "Multimodal Sentiment Analysis Using Transformer-Based Architectures: A Fusion of Text, Audio, and Visual Cues", International Journal of Scientific Research in Computer Science and Information Technology, vol. 5, no. 2, Jul. 2024, pp. 1-7, https://scholar9.com/publication-detail/multimodal-sentiment-analysis-using-transformer-ba--33937