CCAC Theses and Dissertations

Enhancing Sentiment Analysis in Niche Domains: Introducing Diverse Datasets and Evaluating Model Performance in Car Dealership and Board Game Reviews

Kimon Andreou, Nova Southeastern University

Date of Award

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Information Systems (DISS)

Department

College of Computing and Engineering

Advisor

Sumitra Mukherjee

Committee Member

Michael Laszlo

Committee Member

Francisco Mitropoulos

Keywords

Artificial intelligence, information science, sentiment analysis

Abstract

The field of Natural Language Processing (NLP) has advanced significantly in recent decades, with text classification emerging as a critical task, particularly in sentiment analysis applications. However, a persistent challenge in sentiment analysis research is the scarcity of diverse and specialized labeled datasets. This dissertation addresses that gap by developing two novel, labeled textual datasets sourced from niche areas: BoardGameGeek.com's top 250 board game reviews and TrustPilot.com's car dealership reviews.

The main goal of this dissertation is to enrich sentiment analysis methodologies by providing unique datasets and insights into the performance of current models within specialized domains. Built from publicly available review data that was collected and preprocessed using Python libraries, the datasets enhance the diversity and applicability of sentiment analysis research.

The methodology involves a two-phase approach: data collection and data preparation. Data is extracted from the TrustPilot.com and BoardGameGeek.com websites and then prefiltered and processed to ensure relevance and quality. Each review undergoes language detection, removal of irrelevant elements, and labeling based on its associated rating, resulting in datasets categorized by sentiment as Positive, Neutral, or Negative.
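
As an illustration of rating-based labeling, the following minimal sketch maps a numeric rating to one of the three sentiment classes. The abstract does not state the cut-offs that were used, so the function name `label_from_rating`, the `scale_max` parameter, and the 0.4 and 0.6 thresholds are hypothetical choices made only for this example.

```python
def label_from_rating(rating: float, scale_max: int) -> str:
    """Map a numeric rating to Positive, Neutral, or Negative.

    The thresholds below are assumptions for illustration; the
    dissertation's actual cut-offs are not given in the abstract.
    """
    if not 1 <= rating <= scale_max:
        raise ValueError(f"rating {rating} is outside 1..{scale_max}")

    # Normalize to [0, 1] so one rule can serve scales of different length.
    position = (rating - 1) / (scale_max - 1)

    if position < 0.4:   # hypothetical lower cut-off
        return "Negative"
    if position > 0.6:   # hypothetical upper cut-off
        return "Positive"
    return "Neutral"


if __name__ == "__main__":
    # A five-point scale and a ten-point scale, both assumed for the demo.
    for rating, scale_max in [(2, 5), (3, 5), (5, 5), (4, 10), (6, 10), (9, 10)]:
        print(rating, scale_max, label_from_rating(rating, scale_max))
```

Normalizing before thresholding is one possible design; separate per-site rules would work equally well if the two sources call for different cut-offs.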

To facilitate model testing and measurement, a baseline using Naïve Bayes (NB) is established for illustrative purposes, followed by testing of deep learning models, including Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory (BiLSTM) networks, and hybrid CNN-BiLSTM models. Performance is evaluated using fundamental metrics such as accuracy, precision, recall, and F1 score, as well as Receiver Operating Characteristic (ROC) curves.
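
For readers unfamiliar with the metrics named above, the sketch below computes accuracy and per-class precision, recall, and F1 for a three-class problem in plain Python. It is not the dissertation's evaluation code; the helper names `accuracy` and `per_class_scores` and the toy labels are assumptions introduced for this example.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each label (one-vs-rest)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


if __name__ == "__main__":
    # Toy labels, invented for the demo; not data from the dissertation.
    y_true = ["Positive", "Positive", "Neutral", "Negative"]
    y_pred = ["Positive", "Neutral", "Neutral", "Negative"]
    print("accuracy:", accuracy(y_true, y_pred))
    for label, s in per_class_scores(y_true, y_pred).items():
        print(label, s)
```

Per-class scores are reported separately here because a single accuracy figure can hide poor performance on a minority class such as Neutral.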

NSUWorks Citation

Kimon Andreou. 2024. Enhancing Sentiment Analysis in Niche Domains: Introducing Diverse Datasets and Evaluating Model Performance in Car Dealership and Board Game Reviews. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, College of Computing and Engineering. (1201)
https://nsuworks.nova.edu/gscis_etd/1201.
