CCE Theses and Dissertations

Date of Award

2017

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science (CISD)

Department

College of Engineering and Computing

Advisor

Gregory Simco

Committee Member

Sumitra Mukherjee

Committee Member

Frank Mitropoulos

Keywords

file system benchmarking, synthetic benchmarks, synthetic workloads

Abstract

File system benchmarking plays an essential part in assessing the file system’s performance. It is especially difficult to measure and study the file system’s performance as it deals with several layers of hardware and software. Furthermore, different systems have different workload characteristics so while a file system may be optimized based on one given workload it might not perform optimally based on other types of workloads. Thus, it is imperative that the file system under study be examined with a workload equivalent to its production workload to ensure that it is optimized according to its usage. The most widely used benchmarking method is synthetic benchmarking due to its ease of use and flexibility. The flexibility of synthetic benchmarks allows system designers to produce a variety of different workloads that will provide insight on how the file system will perform under slightly different conditions. The downside of synthetic workloads is that they produce generic workloads that do not have the same characteristics as production workloads. For instance, synthetic benchmarks do not take into consideration the effects of the cache that can greatly impact the performance of the underlying file system. In addition, they do not model the variation in a given workload. This can lead to file systems not optimally designed for their usage. This work enhanced synthetic workload generation methods by taking into consideration how the file system operations are satisfied by the lower level function calls. In addition, this work modeled the variations of the workload’s footprint when present. The first step in the methodology was to run a given workload and trace it by a tool called tracefs. The collected traces contained data on the file system operations and the lower level function calls that satisfied these operations. Then the trace was divided into chunks sufficiently small enough to consider the workload characteristics of that chunk to be uniform. Then the configuration file that modeled each chunk was generated and supplied to a synthetic workload generator tool that was created by this work called FileRunner. The workload definition for each chunk allowed FileRunner to generate a synthetic workload that produced the same workload footprint as the corresponding segment in the original workload. In other words, the synthetic workload would exercise the lower level function calls in the same way as the original workload. Furthermore, FileRunner generated a synthetic workload for each specified segment in the order that they appeared in the trace that would result in a in a final workload mimicking the variation present in the original workload. The results indicated that the methodology can create a workload with a throughput within 10% difference and with operation latencies, with the exception of the create latencies, to be within the allowable 10% difference and in some cases within the 15% maximum allowable difference. The work was able to accurately model the I/O footprint. In some cases the difference was negligible and in the worst case it was at 2.49% difference.

Share

COinS