CCE Theses and Dissertations

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


College of Computing and Engineering


Sumitra Mukherjee

Committee Member

Francisco J. Mitropoulos

Committee Member

Michael J. Laszlo


Businesses glean meaningful feedback about their products and services from social media posts in order to improve quality and meet customer expectations. Sentiment analysis is increasingly used to help businesses by assigning positive or negative polarity to such posts. Although methods exist to determine the polarity of sentiments, they are unreliable when posts contain terms, such as slang and informal language, that are not typically part of a standard dictionary used for sentiment analysis. This dissertation empirically investigates alternative methods to improve the classification accuracy of sentiments in such contexts. Specifically, it considers posts written in English that include emoticons.

The benchmark Sentiment140 English-language datasets and labeled tweets that included emoticons were used for evaluation. Two types of deep neural networks, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, were used for classification since they have been demonstrated to produce the best results. All terms in the tweets were represented using the pre-trained embedding vectors word2vec, GloVe, and fastText. Baseline models were trained and tested using tweets with their emoticons removed. For each baseline model, a corresponding model was trained that included emoticons as inputs; in other models, emoticons were replaced with English-language text. The accuracy, precision, recall, and F1 scores of models using emoticons were compared to those of their corresponding baseline models that did not use emoticons.
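The emoticon handling and the evaluation metrics described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the lookup table `EMOTICON_WORDS`, the function names, and the choice of English words for each emoticon are all hypothetical and are used only to show the idea of building a baseline input (emoticons removed), a replaced input (emoticons mapped to English words), and the four reported metrics for a binary polarity task.

```python
import re

# Hypothetical emoticon-to-English lookup (an assumption for illustration;
# the mapping actually used in the dissertation is not given in this abstract).
EMOTICON_WORDS = {
    ":)": "happy",
    ":-)": "happy",
    ":D": "laughing",
    ":(": "sad",
    ":-(": "sad",
    ";)": "wink",
}

# Escape each emoticon and try longer ones first in the alternation.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def _tidy(text):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def remove_emoticons(tweet):
    """Baseline input: the tweet with known emoticons stripped out."""
    return _tidy(_PATTERN.sub(" ", tweet))


def replace_emoticons(tweet):
    """Variant input: each known emoticon replaced by an English word."""
    return _tidy(
        _PATTERN.sub(lambda m: " " + EMOTICON_WORDS[m.group(0)] + " ", tweet)
    )


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary polarity task."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    tweet = "Missed my flight :("
    print(remove_emoticons(tweet))
    print(replace_emoticons(tweet))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a comparison of this kind, the same classifier would be trained once on the output of `remove_emoticons` and once on the raw or replaced text, and the resulting metric dictionaries compared pairwise.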

Experiments were conducted for all models on data with emoticons retained and with emoticons removed. Our experiments showed that an LSTM with an attention mechanism and fastText embeddings outperformed the linear models at identifying sentiment on all of the datasets used. We also learned that replacing emoticons with English-language text improved sentiment classification accuracy. We therefore concluded that including emoticons as features achieved the highest sentiment classification accuracy in our research.