Our big datasets keep getting bigger, and processing all that data requires a careful synthesis of hardware and software. In this research project, I investigated how to effectively stand up a Hadoop cluster on a layer of virtual machines to improve flexibility without degrading performance. The benchmark tests were a success, and I followed this up by writing a series of scripts to simplify the process of setting up new clusters on the fly.