CCE Theses and Dissertations

A New Statistical Approach for Anomaly Intrusion Detection

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Graduate School of Computer and Information Sciences


James D. Cannady

Committee Member

Sumitra Mukherjee

Committee Member

Junping Sun


Although statistical modeling techniques for detecting anomaly intrusions and profiling user behavior from network audit data have been studied for more than a decade, the minimum sample size required from each network site to fit a model remains unclear. A large sample requires more resources to collect and analyze, while a small sample increases both false positives and false negatives. Determining the minimum sample size and developing a correspondingly better classification algorithm are two essential tasks in intrusion detection. This research addressed both tasks using Markov Chain Monte Carlo, bootstrap simulation, and hierarchical random effects logistic regression modeling. The study cohorts were drawn from the 1998 Defense Advanced Research Projects Agency Intrusion Detection Evaluation offline data and the Third International Knowledge Discovery and Data Mining Tools Competition 1999 (KDD-cup) data. Sensitivity, specificity, area under the receiver operating characteristic curve, and misclassification rate were used to evaluate the performance of the proposed technique. The research demonstrated that a minimum sample size of 500 provides a sensitivity of 0.85, a specificity of 0.92, and a kappa statistic of 0.77 in classification, and that the results from the minimum-sample model were comparable with those from the full-sample model. The research also developed a multilevel classification algorithm that provides remarkably better classification performance. Finally, the research developed a standardized risk score to assess and evaluate the classification performance of the minimum sample size and the multilevel algorithm. Compared with the top winning KDD-cup 1999 results, the risk score performed similarly but in a far simpler format. In summary, the research provides statistically sound evidence for determining an appropriate sample size of audit data in intrusion detection.
Because audit data can be collected from network traffic log files, the risk score and the classification algorithm can be implemented in common computer languages. The results of the research could significantly improve and facilitate real-time anomaly intrusion detection, typically in high-speed mobile wireless network environments, and provide a new statistical approach to this area.
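The evaluation measures named in the abstract (sensitivity, specificity, misclassification rate, and the kappa statistic) can all be derived from a two-by-two confusion table of attack versus normal connections. The following is a minimal sketch under stated assumptions, not the dissertation's implementation: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage lines are made-up illustrative values, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for a 2x2 confusion table (attack = positive)."""
    tp: int  # attacks classified as attacks
    fn: int  # attacks classified as normal
    fp: int  # normal connections classified as attacks
    tn: int  # normal connections classified as normal


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, misclassification rate, and Cohen's kappa."""
    positives = c.tp + c.fn
    negatives = c.fp + c.tn
    n = positives + negatives
    if positives == 0 or negatives == 0:
        raise ValueError("both attack and normal connections are required")

    sensitivity = c.tp / positives
    specificity = c.tn / negatives
    misclassification = (c.fp + c.fn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (c.tp + c.tn) / n
    expected = (positives * (c.tp + c.fp) + negatives * (c.tn + c.fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "misclassification_rate": misclassification,
        "kappa": kappa,
    }


# Illustrative counts only (assumed, not taken from the DARPA or KDD-cup data).
example = ConfusionCounts(tp=40, fn=10, fp=5, tn=45)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Area under the receiver operating characteristic curve is left out of this sketch because it requires predicted scores for each connection rather than a single table of counts.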
