Paper Title

Multimodal Sentiment Analysis Using Transformer-Based Architectures: A Fusion of Text, Audio, and Visual Cues

Authors

mj iaeme

Keywords

Multimodal Sentiment Analysis
Transformer Architectures
Text-Audio-Visual Fusion
Attention Mechanisms
Deep Learning

Article Type

Research Article

Journal

International Journal of Scientific Research in Computer Science and Information Technology

Issue

Volume: 5 | Issue: 2 | Page No: 1-7

Published On

July, 2024

Abstract

Multimodal Sentiment Analysis (MSA) seeks to interpret human emotions by integrating textual, auditory, and visual data. This study introduces a novel transformer-based framework that fuses these three modalities to improve sentiment classification accuracy. The proposed model uses fusion techniques and attention mechanisms to capture inter-modal relationships. Evaluated on benchmark datasets such as CMU-MOSI and CMU-MOSEI, the model outperforms existing state-of-the-art methods, which supports the efficacy of transformer-based multimodal fusion in sentiment analysis.
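
The abstract does not specify how the modalities are fused, so the following is only a minimal illustrative sketch of one common approach, cross-modal attention, in which text tokens attend over audio and visual frames. It should not be read as the authors' architecture. All names (cross_modal_attention, fuse, params), the feature dimensions, and the random weights are hypothetical placeholders chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Scaled dot-product attention from one modality onto another.

    query_seq:   (T_q, d_query) features of the attending modality
    context_seq: (T_c, d_context) features of the attended modality
    Returns the attended features (T_q, d) and the weights (T_q, T_c).
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def fuse(text, audio, visual, params):
    """Assumed fusion: text attends to audio and to visual, then the
    three views are concatenated, mean-pooled over time, and mapped
    to a single sentiment score by a linear head."""
    from_audio, _ = cross_modal_attention(text, audio, *params["audio"])
    from_visual, _ = cross_modal_attention(text, visual, *params["visual"])
    text_proj = text @ params["text_proj"]
    fused = np.concatenate([text_proj, from_audio, from_visual], axis=-1)
    pooled = fused.mean(axis=0)
    return float(pooled @ params["head"])


# Hypothetical toy inputs: 5 text tokens, 12 audio frames, 9 video frames.
rng = np.random.default_rng(0)
d_text, d_audio, d_visual, d = 16, 8, 6, 4
text = rng.normal(size=(5, d_text))
audio = rng.normal(size=(12, d_audio))
visual = rng.normal(size=(9, d_visual))

# Random, untrained weights; a real model would learn these.
params = {
    "audio": (rng.normal(size=(d_text, d)),
              rng.normal(size=(d_audio, d)),
              rng.normal(size=(d_audio, d))),
    "visual": (rng.normal(size=(d_text, d)),
               rng.normal(size=(d_visual, d)),
               rng.normal(size=(d_visual, d))),
    "text_proj": rng.normal(size=(d_text, d)),
    "head": rng.normal(size=(3 * d,)),
}

score = fuse(text, audio, visual, params)
print(f"untrained sentiment score: {score:.3f}")
```

Because each row of the attention weights sums to one, every text token receives a convex combination of the audio or visual frames. This lets sequences of different lengths and sampling rates be combined without explicit alignment, which is one reason attention-based fusion is often used for this task.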