CCE Theses and Dissertations

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


College of Computing and Engineering


James D. Cannady

Committee Member

Ling Wang

Committee Member

Sumitra Mukherjee


Compression, Genomic, Network, Packets, PCAP


In cybersecurity, audit files are among the most important forensic tools; they contain a record of the cyber events that occur on systems throughout the enterprise. Threats to the enterprise have become one of the top concerns of IT professionals worldwide. Although there are various approaches to detecting anomalous insider behavior, these approaches are not always able to detect advanced persistent threats or even the exfiltration of sensitive data by insiders. The issue is the volume of network data required to identify this anomalous activity. It has been estimated that an average corporate user creates a minimum of 1.5 MB of audit data per day, or roughly 30 MB per business month and thus 90 MB or more in a three-month period. That volume is not unwieldy by itself, but it is for a single user. Across a large corporate network, the total could easily reach one-half petabyte or more, a size that could be unwieldy to store for any length of time. Normal compression techniques can reduce this size significantly, but the resulting file may still be large, and it must be decompressed to its original size before it can be analyzed.
In gene research, file size is also a major concern. An approach has been developed whereby common segments of a gene are stored as links to the original, augmented with edit scripts showing any differences, such that the resulting file is significantly smaller than the original, allowing for easier analysis. The purpose of this research was to apply dynamic compressive techniques utilized in genomic research to the issue of network data volume. All available data is required for gene sequencing, so compressive techniques have been developed where redundant information is replaced by links to that data, leaving only the differences intact for analysis.
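
The genomic idea described above, storing a sequence as links into a reference plus an edit script for the differences, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function names `make_edit_script` and `apply_edit_script` and the operation format are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever alignment method the genomic tools actually use.

```python
from difflib import SequenceMatcher


def make_edit_script(reference: bytes, target: bytes) -> list:
    """Encode `target` as operations against `reference` (illustrative only)."""
    ops = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared segment: keep only a link (offsets) into the reference.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Differing segment: keep the literal bytes from the target.
            ops.append(("data", target[j1:j2]))
        # "delete" needs no entry: those reference bytes are never copied.
    return ops


def apply_edit_script(reference: bytes, ops: list) -> bytes:
    """Rebuild the original target from the reference and the edit script."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


reference = b"ACGTACGTTAGCCGATAGGCTA"
target = b"ACGTACCTTAGCCGATAGGCTA"  # differs from the reference by one base
script = make_edit_script(reference, target)
restored = apply_edit_script(reference, script)
```

Only the `data` operations carry literal bytes; every `copy` operation is a pair of offsets, which is where the size reduction would come from when the two sequences are largely identical.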
Similarly, in this research, network traffic was processed such that redundant packet information was replaced by links to that information, leaving intact the pertinent information needed to reconstruct the packet information and the steps required for the access. To test the Genomic Network Compression System (GNCS), two datasets were chosen: packet captures from the 2012 Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) and a hybrid dataset from the University of New South Wales at the Australian Defense Force Academy, Canberra, Australia, the UNSW-NB15 17-2-2015 dataset. To test efficacy with request/reply message formats, Address Resolution Protocol packets were processed for both datasets, yielding file size savings of 54.8% and 49.6%, respectively. To test the GNCS with protocols that transfer large amounts of data, Transmission Control Protocol packets were processed for both datasets. The MACCDC 2012 dataset consistently exhibited file space savings of approximately 66%, while the UNSW-NB15 dataset showed savings that rose gradually from 10.3% for a sample of 1,000 packets to a plateau of approximately 46% for samples of 10,000,000 packets and larger. This shows that the GNCS can provide approximately 50% savings in storage space for network packets, giving organizations a significant decrease in the storage space required for audit files.
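
As a rough illustration of replacing redundant packet information with links, the following Python sketch stores each distinct header once and has every packet record point back to it. It is a simplified stand-in, not the GNCS itself: the `(header, payload)` packet representation, the function names, and the definition of space savings as 1 minus the ratio of compressed to original size are all assumptions made for the example.

```python
def compress_packets(packets):
    """Replace repeated headers with links into a table of unique headers.

    `packets` is a list of (header, payload) byte-string pairs; this
    representation is assumed here purely for illustration.
    """
    table = []    # each distinct header, stored once
    index = {}    # header -> position in `table`
    records = []  # one (link, payload) entry per packet
    for header, payload in packets:
        if header not in index:
            index[header] = len(table)
            table.append(header)
        records.append((index[header], payload))
    return table, records


def decompress_packets(table, records):
    """Follow each link to rebuild the original (header, payload) pairs."""
    return [(table[link], payload) for link, payload in records]


def space_savings(original_size, compressed_size):
    """Percentage reduction in size, assuming savings = 1 - compressed/original."""
    return 100.0 * (1 - compressed_size / original_size)


# Hypothetical request/reply traffic with heavily repeated headers.
packets = [
    (b"ARP who-has 10.0.0.1", b"req-1"),
    (b"ARP who-has 10.0.0.1", b"req-2"),
    (b"ARP is-at 10.0.0.1", b"rep-1"),
    (b"ARP who-has 10.0.0.1", b"req-3"),
]
table, records = compress_packets(packets)
restored = decompress_packets(table, records)
```

Under that definition of savings, a capture reduced from 200 units to 100 units gives `space_savings(200, 100)` equal to 50.0, the same order as the roughly 50% figure reported above.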

Available for download on Saturday, May 25, 2024