Abstract
In this information age, opinion mining, also known as sentiment analysis, has become one of the most important tasks in natural language processing. Previous literature on sentiment analysis has mostly focused on a single modality, namely textual data, and almost all recent advances in the field rely only on textual datasets and resources. The growth of the internet has driven the spread of social media, and people now use vlogs, videos, pictures, audio, emojis and microblogs to express their opinions on different web platforms. In this new media age, 720k hours of video are uploaded every day to YouTube alone, and many other platforms like YouTube exist. Classical methods overlook the expressiveness of these other modalities and therefore fail to produce accurate results. Numerous commercial applications aggregate the sentiments and opinions of individuals to anticipate the views of a large population. It is therefore necessary to exploit the diverse modalities in the raw data available on the internet to mine opinions and identify sentiments. Multimodal Sentiment Analysis integrates the varied data available over the internet (i.e., text, speech, visual and code-mixed data). Multimodality refers to the use of more than one modality: a bimodal approach uses any two modalities, while a trimodal approach uses all three (audio, visual and textual). Each modality offers its own distinctive features, and the modalities can be used together to mine positive or negative sentiments, opinions or responses about an entity. The latest development in multimodal sentiment analysis is the fusion of diverse modalities, i.e., audio, visual and textual, to achieve better accuracy. Language-independent, culture-independent and speaker-independent models can also be built. In this survey, we describe various fusion techniques for multimodal sentiment analysis, together with the characteristics and features used in it.
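The abstract states that audio, visual and textual modalities are fused but does not say how. The sketch below is illustrative only and is not taken from the survey. It contrasts two commonly described fusion styles: feature-level (early) fusion, which concatenates per-modality feature vectors before a single classifier, and decision-level (late) fusion, which combines per-modality classifier scores. The feature values, the toy linear scorer and the modality weights are hypothetical assumptions, not real extracted features or trained models.

```python
from typing import Dict, List

# Hypothetical per-modality feature vectors, e.g. {"text": [...], "audio": [...]}.
Features = Dict[str, List[float]]


def early_fusion(features: Features, order: List[str]) -> List[float]:
    """Feature-level fusion: concatenate the chosen modality vectors in a fixed order.

    Passing two modality names gives a bimodal vector, three gives a trimodal one.
    """
    fused: List[float] = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def linear_score(vector: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Toy linear classifier (hypothetical): a positive score means positive sentiment."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return sum(v * w for v, w in zip(vector, weights)) + bias


def late_fusion(scores: Dict[str, float], modality_weights: Dict[str, float]) -> float:
    """Decision-level fusion: weighted average of per-modality classifier scores."""
    total = sum(modality_weights[m] for m in scores)
    return sum(scores[m] * modality_weights[m] for m in scores) / total


def label(score: float) -> str:
    """Map a fused score to a binary sentiment label."""
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    # Illustrative (made-up) features for one video clip.
    clip = {"text": [0.8, -0.1], "audio": [0.2], "visual": [-0.3, 0.5]}
    joint = early_fusion(clip, ["text", "audio", "visual"])
    print(label(linear_score(joint, [1.0] * len(joint))))

    # Illustrative (made-up) per-modality scores combined after classification.
    fused = late_fusion({"text": 0.9, "audio": -0.2, "visual": 0.1},
                        {"text": 0.5, "audio": 0.25, "visual": 0.25})
    print(label(fused))
```

Early fusion lets a single model learn cross-modal interactions but needs all modalities present at once. Late fusion keeps the per-modality models independent, so a missing modality can simply be left out of the weighted average.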