CCE Theses and Dissertations
Campus Access Only
All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.
Date of Award
2013
Document Type
Dissertation - NSU Access Only
Degree Name
Doctor of Philosophy in Computer Information Systems (DCIS)
Department
Graduate School of Computer and Information Sciences
Advisor
Michael J. Lazlo
Committee Member
Amon B Seagull
Committee Member
Wei Li
Keywords
Authorship Attribution, collocations, frequent words, function words, n-grams, word order
Abstract
Prior research has considered the sequential order of function words, after the contextual words of the text have been removed, as a stylistic indicator of authorship. This research describes an effort to enhance authorship attribution accuracy based on this same information source with alternate classifiers, alternate n-gram construction methods, and a genetically tuned configuration.
The approach is original in that it is the first time that probabilistic versions of Burrows's Delta have been used. Instead of using z-scores as an input for a classifier, the z-scores were converted to probabilistic equivalents (since z-scores cannot be subtracted, added, or divided without the possibility of distorting their probabilistic meaning); this adaptation enhanced accuracy. Multiple versions of Burrows's Delta were evaluated; this includes a hybrid of the Probabilistic Burrows's Delta and the version proposed by Smith & Aldridge (2011); in this case accuracy was enhanced when individual frequent words were evaluated as indicators of style. Other novel aspects include alternate n-gram construction methods; a reconciliation process that allows texts of various lengths from different authors to be compared; and a GA selection process that determines which function (or frequent) words (see Smith & Rickards, 2008; see also Shaker, Corne, & Everson, 2007) may be used in the construction of function word n-grams.
NSUWorks Citation
Russell Clark Johnson. 2013. Authorship Attribution with Function Word N-Grams. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (188)
https://nsuworks.nova.edu/gscis_etd/188.