CCE Theses and Dissertations
Date of Award
2020
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
College of Computing and Engineering
Advisor
Sumitra Mukherjee
Committee Member
Francisco J. Mitropoulos
Committee Member
Michael J. Laszlo
Keywords
artificial intelligence, CNN, deep learning, embedding methods, emoticon, LSTM, sentiment analysis
Abstract
Businesses glean meaningful feedback about their products and services from social media posts, and use it to improve quality and meet customer expectations. Sentiment analysis is increasingly used to support this by assigning a positive or negative polarity to each post. Although methods exist for determining sentiment polarity, they are unreliable when posts contain terms that are not part of the standard dictionaries used for sentiment analysis, such as slang and informal language. This dissertation empirically investigates alternative methods for improving sentiment classification accuracy in such contexts. Specifically, it considers English-language posts that include emoticons.
The benchmark Sentiment140 English-language datasets, together with labeled tweets that included emoticons, were used for evaluation. Two types of deep neural networks, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, were used for classification because prior work has shown them to produce the best results. All terms in the tweets were represented with the pre-trained embedding vectors word2vec, GloVe, and fastText. Baseline models were trained and tested on tweets with their emoticons removed. For each baseline model, corresponding models were trained in which emoticons were either included as inputs or replaced with their English-language equivalents. The accuracy, precision, recall, and F1 scores of the models using emoticons were compared with those of their corresponding baseline models, which did not use emoticons.
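The three input conditions and the four evaluation metrics described above can be illustrated with a small sketch. It assumes a hypothetical emoticon-to-English mapping (`EMOTICON_WORDS`) and hypothetical helper names (`strip_emoticons`, `keep_emoticons`, `emoticons_to_words`, `binary_metrics`); the dissertation's actual emoticon dictionary, tokenizer, and label encoding are not given in this abstract.

```python
import re

# Hypothetical emoticon-to-English map (illustration only, not the dissertation's list).
EMOTICON_WORDS = {
    ":-)": "smile",
    ":-(": "sad",
    ":)": "smile",
    ":(": "sad",
    ":D": "laugh",
    ";)": "wink",
}

# Longer emoticons are tried first so ":-)" is matched as one token.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def strip_emoticons(text):
    """Baseline condition: delete every known emoticon."""
    return " ".join(_PATTERN.sub(" ", text).split())


def keep_emoticons(text):
    """Emoticons kept as inputs, split off as separate tokens."""
    return " ".join(_PATTERN.sub(lambda m: f" {m.group(0)} ", text).split())


def emoticons_to_words(text):
    """Emoticons replaced with (hypothetical) English equivalents."""
    return " ".join(
        _PATTERN.sub(lambda m: f" {EMOTICON_WORDS[m.group(0)]} ", text).split()
    )


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary polarity labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `emoticons_to_words("loved the show :) but the ending :-( hmm")` gives `"loved the show smile but the ending sad hmm"`. The baseline input would be `"loved the show but the ending hmm"`. Comparing a model's `binary_metrics` against its baseline's on the same test tweets mirrors the comparison described above.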
Experiments were conducted for all models on data both with and without emoticons. Our experiments showed that an LSTM with an attention model and fastText embeddings outperformed the linear models in identifying sentiment on all datasets used. We also found that replacing emoticons with their English-language equivalents improved sentiment classification accuracy. We therefore concluded that including emoticons as features achieves the highest sentiment classification accuracy in our research.
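The attention step of the best-performing model can be sketched as attention pooling over the LSTM's per-token outputs. This is a simplified dot-product variant, assumed for illustration: the abstract does not state the exact attention formulation. Here the LSTM hidden states (computed over fastText vectors) are taken as given. The `context` vector and the output weights `w_out`, `b_out` would be learned in a real model but are fixed inputs here. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Dot-product attention pooling.

    hidden_states: T vectors (lists of d floats), one LSTM output per token.
    context: a d-dimensional query vector (learned in practice, fixed here).
    Returns (weights, pooled), where pooled is the weighted sum of the states.
    """
    # One relevance score per token: dot product with the context vector.
    scores = [sum(h_k * c_k for h_k, c_k in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    d = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(d)]
    return weights, pooled


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(hidden_states, context, w_out, b_out):
    """Hypothetical output layer: probability that the tweet is positive."""
    _, pooled = attention_pool(hidden_states, context)
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)) + b_out)
```

Pooling lets tokens with high scores, which could include an emoticon or its English replacement, dominate the sentence representation. By contrast, a plain LSTM classifier typically relies only on its final hidden state.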
NSUWorks Citation
Mutharasu Narayanaperumal. 2020. Deep Neural Networks for Sentiment Analysis in Tweets with Emoticons. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, College of Computing and Engineering. (1117)
https://nsuworks.nova.edu/gscis_etd/1117.