CCE Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award


Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Information Systems (DCIS)


Graduate School of Computer and Information Sciences


Wei Li

Committee Member

Sumitra Mukherjee

Committee Member

Easwar Nyshadham


email based attacks, information security, logistic regression, machine learning, phishing, privacy and security


The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning algorithms to detect phishing attack did increase the predictive accuracy of these learning algorithms. The same research also recommended further investigation of the impact of including an expanded set of email headers on the predictive accuracy of machine learning algorithms.

In addition, research has shown that the cost of misclassifying legitimate emails as phishing attacks--false positives--was far higher than that of misclassifying phishing emails as legitimate--false negatives, while the opposite was true in the case of fraud detection. Consequently, they recommended that cost sensitive measures be taken in order to further improve the weighted predictive accuracy of machine learning algorithms.

Motivated by the potentially high impact of the inclusion of email headers on the predictive accuracy of machines learning algorithms and the significance of enabling cost-sensitive measures as part of the learning process, the goal of this research was to quantify the impact of including an extended set of email headers and to investigate the impact of imposing penalty as part of the learning process on the number of false positives. It was believed that if email headers were included and cost-sensitive measures were taken as part of the learning process, than the overall weighted, predictive accuracy of the machine learning algorithm would be improved. The results showed that adding email headers as features did improve the overall predictive accuracy of machine learning algorithms and that cost-sensitive measure taken as part of the learning process did result in lower false positives.

To access this thesis/dissertation you must have a valid OR email address and create an account for NSUWorks.

Free My Thesis

If you are the author of this work and would like to grant permission to make it openly accessible to all, please click the Free My Thesis button.

  Contact Author

  Link to NovaCat