CCE Theses and Dissertations
Date of Award
2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Information Systems (DISS)
Department
College of Computing and Engineering
Advisor
Sumitra Mukherjee
Committee Member
Michael Laszlo
Committee Member
Francisco Mitropoulos
Keywords
Artificial intelligence, information science, sentiment analysis
Abstract
The field of Natural Language Processing (NLP) has advanced significantly in recent decades, with text classification emerging as a critical task, particularly in sentiment analysis applications. However, a persistent challenge in sentiment analysis research is the scarcity of diverse and specialized labeled datasets. This dissertation addresses that gap by developing two novel, labeled textual datasets sourced from niche areas: BoardGameGeek.com's top 250 board game reviews and TrustPilot.com's car dealership reviews.
The main goal of this dissertation is to enrich sentiment analysis methodologies by providing unique datasets and insights into the performance of current models within specialized domains. Built from publicly available review data that was collected and preprocessed with Python libraries, the datasets broaden the diversity and applicability of sentiment analysis research.
The methodology involves a two-phase approach: data collection and data preparation. Data is extracted from the TrustPilot.com and BoardGameGeek.com websites and then prefiltered and processed to ensure relevance and quality. Each review undergoes language detection, removal of irrelevant elements, and labeling based on its associated rating, yielding datasets in which every review is categorized as Positive, Neutral, or Negative.
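The preparation step described above might look roughly like the following minimal sketch. It is not the dissertation's actual code: the function names, the five-point rating scale, and the cut-off values are all hypothetical, since the abstract does not state them. Language detection is left out because it would require a third-party library.

```python
import re

# Assumption: ratings are on a five-point scale and the cut-offs below are
# illustrative only; the abstract does not give the actual thresholds.
def label_from_rating(rating, neg_max=2, pos_min=4):
    """Map a numeric rating to a sentiment label."""
    if rating <= neg_max:
        return "Negative"
    if rating >= pos_min:
        return "Positive"
    return "Neutral"

def clean_text(text):
    """Remove HTML tags and URLs, then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def prepare(reviews):
    """Clean each review and attach a label; drop reviews left empty."""
    prepared = []
    for review in reviews:
        body = clean_text(review["text"])
        if not body:
            continue  # nothing usable remains after cleaning
        prepared.append({"text": body, "label": label_from_rating(review["rating"])})
    return prepared
```

A site that uses a different rating scale would need different `neg_max` and `pos_min` values, which is why they are parameters rather than constants.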
To support model testing and measurement, a Naïve Bayes (NB) baseline is established for illustrative purposes, followed by tests of deep learning models, including Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory (BiLSTM) networks, and hybrid CNN-BiLSTM models. Performance is evaluated using accuracy, precision, recall, and F1 score, along with Receiver Operating Characteristic (ROC) curves.
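For readers unfamiliar with the metrics named above, the sketch below computes accuracy, precision, recall, and F1 for a three-class labeling task using only the Python standard library. Macro averaging across classes is an assumption made here for illustration; the abstract does not say which averaging scheme the dissertation uses, and the function names are hypothetical.

```python
def per_class_counts(y_true, y_pred, label):
    """Return true positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn

def macro_metrics(y_true, y_pred, labels=("Positive", "Neutral", "Negative")):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        # Define a metric as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights each class equally regardless of size, which can matter when review ratings are skewed toward one sentiment.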
NSUWorks Citation
Kimon Andreou. 2024. Enhancing Sentiment Analysis in Niche Domains: Introducing Diverse Datasets and Evaluating Model Performance in Car Dealership and Board Game Reviews. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, College of Computing and Engineering. (1201)
https://nsuworks.nova.edu/gscis_etd/1201