CEC Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award


Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Information Systems (DCIS)


Graduate School of Computer and Information Sciences


Sumitra Mukherjee

Committee Member

Michael J. Laszlo

Committee Member

Junping Sun


Hard disk drives are used in everyday life to store critical data. Although they are reliable, failure of a hard disk drive can be catastrophic, especially in applications like medicine, banking, air traffic control systems, missile guidance systems, computer numerically controlled machines, and more. The use of Self-Monitoring, Analysis and Reporting Technology (SMART) can aid in failure prediction by monitoring specific drive attributes and warning the user of an impending failure so that the user can back up data while there is still time. As a consequence, hard drive failure prediction has become an important problem and the subject of active research.

The best available approaches for hard drive failure prediction achieve acceptably low false alarm rates by first selecting a subset of features using non-parametric statistical methods such as reverse arrangements and then using the multiple-instance naïve Bayes classifier for the prediction task. However, the prediction accuracy of this approach is not sufficiently high.
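As an illustration of the reverse arrangements idea mentioned above, the following minimal sketch counts reverse arrangements in a single attribute's time series and converts the count to a z-score under the no-trend null hypothesis. The function names and the threshold-based selection rule are assumptions made for illustration; they are not taken from the dissertation.

```python
import math


def reverse_arrangements(series):
    """Count pairs (i, j) with i < j and series[i] > series[j]."""
    n = len(series)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if series[i] > series[j]
    )


def reverse_arrangements_z(series):
    """Z-score of the reverse arrangements count under the no-trend null.

    For n independent observations the count has mean n(n-1)/4 and
    variance n(2n+5)(n-1)/72.
    """
    n = len(series)
    count = reverse_arrangements(series)
    mean = n * (n - 1) / 4
    variance = n * (2 * n + 5) * (n - 1) / 72
    return (count - mean) / math.sqrt(variance)


def select_trending(attributes, threshold=1.96):
    """Hypothetical selection rule: keep attributes whose |z| exceeds threshold.

    `attributes` maps an attribute name to its list of readings.
    """
    return [
        name
        for name, readings in attributes.items()
        if abs(reverse_arrangements_z(readings)) > threshold
    ]
```

A strongly decreasing series yields a large positive z-score and a strongly increasing series a large negative one, so an attribute whose readings drift steadily before failure would pass the threshold while a trendless attribute would not.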

The focus of this dissertation was to improve the drive failure prediction accuracy while maintaining a low false alarm rate by using a genetic algorithm for feature set reduction in conjunction with the multiple-instance naïve Bayes classifier for the prediction task. This research achieved a failure detection rate of 81% with a 0% false alarm rate on 12 attributes selected by the genetic algorithm. As a secondary contribution, this dissertation investigated the tradeoff between feature subset reduction and prediction accuracy in the hard drive failure prediction problem. This research found that as the number of features decreased below 10, the detection accuracy decreased significantly.
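The general shape of genetic-algorithm feature selection can be sketched as follows. This is a generic illustration, not the dissertation's implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the toy fitness function are all assumptions. In a real setting the fitness function would train and score a classifier, such as the multiple-instance naïve Bayes classifier, on the selected attributes.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a feature mask (list of bools) that maximizes `fitness`.

    Requires n_features >= 2 so that one-point crossover has a cut point.
    """
    rng = random.Random(seed)
    # Each individual is a mask: True means the feature is selected.
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy fitness (hypothetical): reward three "informative" features and
# penalize subset size, standing in for a classifier-based score.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


best_mask = ga_select(10, toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
```

Because the best mask of each generation is copied unchanged into the next, the best fitness never decreases across generations. The size penalty in the toy fitness mirrors the tradeoff described above between shrinking the feature subset and preserving prediction accuracy.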
