CEC Theses and Dissertations

Degree Name

Doctor of Philosophy in Computer Science (CISD)


Graduate School of Computer and Information Sciences


Gregory E. Simcoe

Committee Member

Sumitra Mukherjee

Committee Member

Francisco J. Mitropoulos


When a cloud user allocates a cluster to execute a map-reduce workload, the user must determine the number and type of virtual machine instances to minimize the workload's financial cost. The cloud user may rent on-demand instances at a fixed price or spot instances at a variable price to execute the workload. Although the cloud user may bid on spot virtual machine instances at a reduced rate, the spot market auction may delay the workload's start or terminate the spot instances before the workload completes. The cloud user requires a forecast for the workload's financial cost and completion time to analyze the trade-offs between on-demand and spot instances.

While existing estimation tools predict map-reduce workloads' completion times and costs, these tools do not provide spot instance estimates because a spot market auction determines each instance's start time and duration. Ephemeral spot instances also affect execution time estimates because the spot market auction forces map-reduce workloads to use different storage strategies to persist data after the spot instances terminate. Finally, the spot market reduces the accuracy of the existing tools' completion time and cost estimates because an accurate estimate must account for spot instance wait times and early terminations.
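To illustrate why wait times and early terminations matter, the following sketch compares the expected completion time and cost of on-demand and spot instances. It is not the dissertation's estimation model. The function names, the per-instance hourly pricing, and the fallback policy are hypothetical assumptions: after an early termination, the persisted output is kept, and the remaining work plus a fixed restart overhead runs on on-demand instances.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    hours: float  # expected completion time, including any spot wait
    cost: float   # expected monetary cost in dollars


def on_demand_estimate(work_hours: float, nodes: int, od_price: float) -> Estimate:
    """Hypothetical helper: fixed hourly price, no wait, no early termination."""
    return Estimate(hours=work_hours, cost=nodes * od_price * work_hours)


def spot_estimate(work_hours: float, nodes: int, spot_price: float,
                  od_price: float, wait_hours: float, p_terminate: float,
                  done_fraction: float, restart_hours: float) -> Estimate:
    """Hypothetical helper: expected time and cost under an assumed fallback.

    With probability 1 - p_terminate the spot instances survive and the whole
    workload runs at the spot price. Otherwise they are terminated after
    done_fraction of the work; the persisted output is kept, and the rest of
    the work plus a restart overhead runs on on-demand instances.
    """
    survive_cost = nodes * spot_price * work_hours
    remaining_hours = (1.0 - done_fraction) * work_hours + restart_hours
    terminate_cost = nodes * (spot_price * done_fraction * work_hours
                              + od_price * remaining_hours)
    cost = (1.0 - p_terminate) * survive_cost + p_terminate * terminate_cost
    # Only a terminated run pays the restart overhead in wall-clock time.
    hours = wait_hours + work_hours + p_terminate * restart_hours
    return Estimate(hours=hours, cost=cost)


if __name__ == "__main__":
    # Hypothetical prices and probabilities, for illustration only.
    print(on_demand_estimate(4.0, 10, 0.40))
    print(spot_estimate(4.0, 10, 0.12, 0.40, wait_hours=0.5, p_terminate=0.2,
                        done_fraction=0.5, restart_hours=0.25))
```

With these made-up inputs (a 4-hour workload on 10 nodes, a 0.5-hour wait, and a 20% chance of termination at the halfway point), the spot run comes to roughly $6.12 and 4.55 hours versus $16.00 and 4 hours on demand. A tool that ignores the wait and the termination branch would report $4.80 and 4 hours for spot, understating both figures.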

This dissertation updated an existing tool to forecast a map-reduce workload's monetary cost and completion time based on historical spot market traces. The enhanced estimation tool includes three enhancements over existing tools. First, it models the impact of the new storage strategies on execution time. Second, it calculates the additional execution time caused by early spot instance termination. Finally, it predicts the workload's wait time and early termination probabilities from historical traces. Based on two historical Amazon EC2 spot market traces, the enhancements reduce the average completion time prediction error by 96% and the average monetary cost prediction error by 99% compared with existing tools.
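Wait-time and early-termination estimates can be approximated from a price trace with a simple empirical sketch. It is not the dissertation's estimator. It assumes the bidding model mentioned above: a spot instance launches once the market price falls to or below the bid and is terminated when the price rises above it. It also assumes the trace is a list of (hour, price) change points whose last entry only marks the end of the trace. The function name `spot_statistics` and the fixed sampling step are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def spot_statistics(trace: List[Tuple[float, float]], bid: float,
                    runtime: float, step: float) -> Optional[Tuple[float, float]]:
    """Estimate (mean wait hours, early-termination probability) for a bid.

    Hypothetical helper. trace holds (hour, price) change points sorted by
    hour; each price holds until the next point, and the last point only
    marks the end of the trace. A spot request is simulated every `step` hours.
    """
    times = [t for t, _ in trace]
    prices = [p for _, p in trace]
    end = times[-1]
    waits: List[float] = []
    terminations = 0
    for n in range(int((end - times[0]) / step)):
        start = times[0] + n * step
        i = bisect_right(times, start) - 1  # price segment containing start
        # Wait: first segment at or after start whose price is within the bid.
        k = i
        while k < len(prices) and prices[k] > bid:
            k += 1
        if k == len(prices):
            continue  # the request is never fulfilled within the trace
        launch = max(start, times[k])
        if launch + runtime > end:
            continue  # skip runs the trace does not fully cover
        # Termination: first later segment whose price exceeds the bid.
        j = k + 1
        while j < len(prices) and prices[j] <= bid:
            j += 1
        waits.append(launch - start)
        if j < len(prices) and times[j] < launch + runtime:
            terminations += 1
    if not waits:
        return None
    return sum(waits) / len(waits), terminations / len(waits)


if __name__ == "__main__":
    # Hypothetical trace: the price rises above the bid between hours 2 and 3.
    trace = [(0.0, 0.10), (2.0, 0.30), (3.0, 0.10), (10.0, 0.10)]
    print(spot_statistics(trace, bid=0.20, runtime=2.0, step=1.0))
```

For this made-up trace, nine of the simulated requests fit inside the trace. One waits an hour for the price to drop, and one is terminated by the spike, so both estimates come out to 1/9. Sampling start times uniformly treats every submission time as equally likely. Runs the trace cannot fully cover are skipped rather than guessed.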