CCE Theses and Dissertations
Date of Award
2015
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science (CISD)
Department
College of Engineering and Computing
Advisor
Michael J. Lazlo
Committee Member
Sumitra Mukherjee
Committee Member
Amon B. Seagull
Keywords
Authorship attribution, Machine learning, Natural language processing, Rhetoric, Computer Science
Abstract
Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models.
Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures to use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures paired with lexical features typically performed better at authorship attribution than models trained on either feature set alone. The rhetorical measures also provide new stylistic quantities for describing texts, authors, and genres.
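As an illustration of how a count of one rhetorical figure could become a classification feature, the following is a minimal sketch and not Rhetorica's implementation. It assumes anaphora (successive sentences opening with the same word) as the example figure; this record does not list which 14 figures Rhetorica detects. The function names `split_sentences`, `first_word`, `anaphora_rate`, and `feature_vector` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def first_word(sentence):
    """Return the lowercased first word of a sentence, or None if there is none."""
    match = re.search(r"[A-Za-z']+", sentence)
    return match.group(0).lower() if match else None


def anaphora_rate(text):
    """Fraction of adjacent sentence pairs that open with the same word."""
    openers = [first_word(s) for s in split_sentences(text)]
    pairs = list(zip(openers, openers[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if a is not None and a == b)
    return hits / len(pairs)


def feature_vector(text, lexical_features):
    """Append the rhetorical measure to an existing list of lexical features."""
    return list(lexical_features) + [anaphora_rate(text)]
```

A classifier could then be trained on vectors built this way, one per document, so that the rhetorical measure sits alongside whatever lexical features are already in use. A real detector would need a proper sentence tokenizer and would compare multi-word openings, not just the first word.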
NSUWorks Citation
James Java. 2015. Characterization of Prose by Rhetorical Structure for Machine Learning Classification. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, College of Engineering and Computing. (347)
https://nsuworks.nova.edu/gscis_etd/347