Date of Award
Doctor of Philosophy in Information Assurance (DIA)
College of Engineering and Computing
James D. Cannady
Paul S. Cerkez
Intrusion detection is the practice of examining information from computers and networks to identify cyberattacks. It is an important topic in practice, since the frequency and consequences of cyberattacks continues to increase and affect organizations. It is important for research, since many problems exist for intrusion detection systems. Intrusion detection systems monitor large volumes of data and frequently generate false positives. This results in additional effort for security analysts to review and interpret alerts. After long hours spent reviewing alerts, security analysts become fatigued and make bad decisions. There is currently no approach to intrusion detection that reduces the workload of human analysts by providing a probabilistic prediction that a computer is experiencing a cyberattack.
This research addressed this problem by estimating the probability that a computer system was being attacked, rather than alerting on individual events. This research combined concepts from cyber situation awareness by applying clustering ensembles, probability analysis, and active learning. The unique contribution of this research is that it provides a higher level of meaning for intrusion alerts than traditional approaches.
Three experiments were conducted in the course of this research to demonstrate the feasibility of these concepts. The first experiment evaluated cluster generation approaches that provided multiple perspectives of network events using unsupervised machine learning. The second experiment developed and evaluated a method for detecting anomalies from the clustering results. This experiment also determined the probability that a computer system was being attacked. Finally, the third experiment integrated active learning into the anomaly detection results and evaluated its effectiveness in improving the accuracy.
This research demonstrated that clustering ensembles with probabilistic analysis were effective for identifying normal events. Abnormal events remained uncertain and were assigned a belief. By aggregating the belief to find the probability that a computer system was under attack, the resulting probability was highly accurate for the source IP addresses and reasonably accurate for the destination IP addresses. Active learning, which simulated feedback from a human analyst, eliminated the residual error for the destination IP addresses with a low number of events that required labeling.
Steven M. McElwee. 2018. Probabilistic Clustering Ensemble Evaluation for Intrusion Detection. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, College of Engineering and Computing. (1044)