CCE Theses and Dissertations

Date of Award

2015

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science (CISD)

Department

College of Engineering and Computing

Advisor

Michael J. Laszlo

Committee Member

Sumitra Mukherjee

Committee Member

Amon B. Seagull

Keywords

Authorship attribution, Machine learning, Natural language processing, Rhetoric, Computer Science

Abstract

Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models.

Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures for use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures paired with lexical features typically performed better at authorship attribution than models trained on either feature set alone. The rhetorical measures also provide new stylistic quantities for describing texts, authors, and genres.
