A new study has revealed a way to do sentiment analysis on a large number of social media images using unsupervised learning.
Unsupervised learning differs from supervised learning in that machines must work with unlabelled data, finding structure in it and producing an outcome without being shown the correct answers. Supervised learning, on the other hand, gives machines labelled data or examples to learn from when carrying out tasks such as classifying an object or predicting future outcomes.
The study, Unsupervised Sentiment Analysis for Social Media Images, was released as part of the International Joint Conference on Artificial Intelligence in Argentina this week. It reveals a novel framework, called Unsupervised Sentiment Analysis (USEA), that uses both textual and visual data in a single model for learning.
Images from social media sites offer rich data to work with when doing sentiment analysis. However, manually labelling millions of images is too labour- and time-intensive, meaning this data often goes untapped. This is why the study's authors focused their efforts on unsupervised learning.
“In order to utilise the vast amount of unlabelled social media images, an unsupervised approach would be much more desirable,” researchers from Arizona State University wrote in their paper.
“As of 2013, 87 millions of users have registered with Flickr. Also, it was estimated that about 20 billion Instagram photos are shared to 2014.
“To our best knowledge, USEA is the first unsupervised sentiment analysis framework for social media images.”
The framework infers sentiment by combining visual data with the accompanying textual data. That text is often incomplete, with hardly any tags, or noisy, with irrelevant comments, so relying on it alone for sentiment analysis is difficult.
Instead, the researchers used the text as supporting data that provides semantic information about the images, to enable unsupervised learning.
“Textual information bridges the semantic gap between visual features and sentiment labels.”
The researchers crawled images from Flickr and Instagram users, collecting 140,221 images from Flickr and 131,224 from Instagram.
They built a framework to classify images into three class labels (positive, negative and neutral), drawing on the captions and comments associated with each image.
“Some words may contain sentiment polarities. For example, some words are positive such as ‘happy’ and ‘terrific’; while others are negative such as ‘gloomy’ and ‘disappointed’.
“The sentiment polarities of words can be obtained via some public sentiment lexicons. For example, the sentiment lexicon MPQA [Multi-Perspective Question Answering] contains 7,504 human labeled words which are commonly used in the daily life with 2,721 positive words and 4,783 negative words.
“Second, some abbreviations and emoticons are strong sentiment indicators. For example, ‘lol’ [laugh out loud] is a positive indicator while ‘:(’ is a negative indicator.”
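The lexicon idea in the quote can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word lists below are a tiny hypothetical stand-in for a real lexicon such as MPQA, and the names `POSITIVE`, `NEGATIVE`, `tokenize` and `polarity` are invented for this example.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real lexicon such as
# MPQA is far larger and is not reproduced here.
POSITIVE = {"happy", "terrific", "lol", ":)"}
NEGATIVE = {"gloomy", "disappointed", ":("}

# Try to match a simple emoticon first, then fall back to a plain word.
TOKEN_RE = re.compile(r"[:;]-?[()]|[a-z']+")


def tokenize(text):
    """Split a caption or comment into lower-case words and emoticons."""
    return TOKEN_RE.findall(text.lower())


def polarity(text):
    """Label text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `polarity("so gloomy :(")` returns `"negative"`, while a caption with no lexicon words, such as `"a photo of a bridge"`, comes back as `"neutral"`. That second case shows why sparse or off-topic text is a weak signal on its own.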
Visual features were extracted from the images by large-scale visual attribute detectors, while text-based features were formed from term frequencies after stop-word removal (dropping words like ‘a’ and ‘the’).
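As a rough illustration of the text side of that step, the sketch below counts term frequencies after dropping stop words. The stop-word list is a short assumed sample, and `term_frequencies` and `to_vector` are hypothetical helper names, not part of the study's code.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "at", "in", "on", "is"}


def term_frequencies(text):
    """Count how often each non-stop word appears in the text."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    kept = [w for w in words if w and w not in STOP_WORDS]
    return Counter(kept)


def to_vector(counts, vocabulary):
    """Turn the counts into a fixed-length feature vector."""
    return [counts[term] for term in vocabulary]
```

For the caption "The sunset at the beach, a terrific sunset!", the counts are 2 for "sunset" and 1 each for "beach" and "terrific". Against the vocabulary `["beach", "sunset", "rain"]`, the resulting vector is `[1, 2, 0]`.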
The framework was compared with other sentiment analysis methods, including Senti API, used for unsupervised sentiment prediction, and USEA-T, a variant of the framework that takes only textual data into account.
Other methods compared with USEA were Sentibank with K-means clustering, which uses large-scale visual attribute detectors and adjective-noun pairs as visual sentiment descriptors; EL with K-means clustering, which is a topical graphical model for sentiment analysis; and Random, which predicts the sentiment labels of images by guessing at random.
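Two of those baselines rely on K-means clustering, which groups feature vectors without using any labels. The following is a bare-bones Python sketch of the general technique, not the baselines' actual code. It uses a simplified, deterministic initialisation (the first k points), and it leaves open how clusters would be mapped to sentiment labels, which is not described here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (assignments, centroids)."""
    # Simplified initialisation for this sketch: use the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignments, centroids
```

With k set to 3, to match the three sentiment classes, toy points such as `[[0, 0], [5, 5], [10, 10], [0.2, 0.1], [5.1, 4.9], [9.8, 10.2]]` fall into the clusters `[0, 1, 2, 0, 1, 2]`. The cluster numbers carry no meaning by themselves, so a further step is needed before they can be read as positive, negative or neutral.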
The results show that USEA performed better than all the other algorithms tested, achieving 56.18 per cent accuracy on the Flickr dataset compared to Senti API at 34.15 per cent and USEA-T at 40.22 per cent. On the Instagram dataset, it achieved 59.94 per cent accuracy compared to Senti API at 37.80 per cent and USEA-T at 36.41 per cent.
“The proposed framework often obtains better performance than baseline methods. There are two major reasons. First, textual information provides semantic meanings and sentiment signals for images. Second we combine visual and textual information for sentiment analysis.”
The researchers pointed out that deep learning approaches (artificial neural networks with many hidden layers) have been shown to be effective for this task, but are still mostly used in a supervised way, which depends on the availability of a good labelled training dataset.
“In the future, we will exploit more social media sources, such as link information, user history, geo-location, etc., for sentiment analysis.”