Research Article, September 2024

AI-POWERED SPEECH EMOTION RECOGNITION FOR PERSONALIZED ASSISTANTS

Abstract

AI-powered speech emotion recognition (SER) is a major enabler for personalized assistants, allowing them to recognize human emotions and respond accordingly. The objective of this research is to develop a robust SER model that combines deep learning, transfer learning, and models based on BERT (Bidirectional Encoder Representations from Transformers) for best results. The proposed model operates on multimodal inputs, fusing audio features with text-based embeddings to capture complex emotional patterns. Using supervised deep recurrent networks, it supports mental health condition monitoring and the detection of emotional shifts in user interactions. Speaker recognition models trained with transfer learning help the system generalize to diverse speech patterns, while synthetic emotional speech augmentation makes the model more resilient to data imbalance and further improves its predictive performance. Experimental results show that the system achieves state-of-the-art performance on key SER benchmarks, surpassing conventional models in overall accuracy, precision, and recall. Future emotion-aware AI systems are expected to build on advanced neural architectures that proactively model the causes of emotion and adapt in real time to the individual each assistant interacts with.
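The multimodal fusion the abstract describes (combining audio features with text embeddings before emotion classification) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the emotion label set, the feature dimensions (40 MFCC-like audio features, a 768-dimensional BERT-like embedding), and the single linear layer standing in for a deep recurrent classifier are all assumptions.

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set

def fuse_and_classify(audio_feats, text_emb, W, b):
    """Concatenate audio features with a text embedding (multimodal
    fusion) and apply a linear layer + softmax as a stand-in for a
    deep classifier."""
    x = np.concatenate([audio_feats, text_emb])   # fused feature vector
    logits = W @ x + b
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    probs = exp / exp.sum()
    return EMOTIONS[int(probs.argmax())], probs

# Toy inputs: 40 audio features, a 768-dim text embedding (both random
# here; in practice they would come from an audio front end and BERT).
rng = np.random.default_rng(0)
audio = rng.standard_normal(40)
text = rng.standard_normal(768)
W = rng.standard_normal((len(EMOTIONS), 40 + 768)) * 0.01
b = np.zeros(len(EMOTIONS))

label, probs = fuse_and_classify(audio, text, W, b)
print(label, probs.round(3))
```

In a trained system, `W` and `b` would be learned jointly with the upstream audio and text encoders; the sketch only shows how the two modalities are joined into one prediction.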

Keywords

speech emotion recognition, personalized assistants, deep learning, transfer learning, BERT models, synthetic speech augmentation, multimodal data integration
Details
Volume 5
Issue 2
Pages 13–23
ISSN: Awaited