This paper presents APlug, a framework for automatically tuning large-scale applications composed of many independent tasks. APlug suggests the best decomposition of the original computation into smaller tasks and the best number of CPUs to use in order to meet user-specific constraints. We show that the problem is not trivial, because task execution times vary widely and a task may occupy a CPU while performing useless computations. APlug collects a sample of task execution times and builds a model, which a discrete event simulator then uses to calculate the optimal parameters. We provide a C++ API and a stand-alone implementation of APlug, and we integrate it with three typical applications from computational chemistry, bioinformatics, and data mining. We demonstrate the framework with a scenario for optimizing resource utilization. We run experiments on 16,384 CPUs of a supercomputer, 480 cores of a Linux cluster, and 80 cores on Amazon EC2, and show that APlug is very accurate with minimal overhead.
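The sample-then-simulate idea in the abstract can be illustrated with a minimal sketch. It is not APlug's API or its actual model: the function names (`simulate_makespan`, `draw_tasks`, `min_cpus_for_deadline`), the bootstrap resampling used as the "model", the greedy first-free-CPU scheduling policy, and the deadline constraint are all assumptions made here for illustration. Useless computations and task decomposition are not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <vector>

// Discrete event simulation of independent tasks on `cpus` identical CPUs.
// Each task is assigned to the CPU that becomes free first (assumed policy).
// Returns the makespan, i.e. the time at which the last task finishes.
double simulate_makespan(const std::vector<double>& task_times, std::size_t cpus) {
    if (cpus == 0 || task_times.empty()) return 0.0;
    // Min-heap holding the time at which each CPU becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (std::size_t i = 0; i < cpus; ++i) free_at.push(0.0);
    double makespan = 0.0;
    for (double t : task_times) {
        double finish = free_at.top() + t;  // task starts when the earliest CPU is free
        free_at.pop();
        free_at.push(finish);
        makespan = std::max(makespan, finish);
    }
    return makespan;
}

// Hypothetical "model": draw n task durations by resampling the observed sample.
std::vector<double> draw_tasks(const std::vector<double>& sample, std::size_t n,
                               std::mt19937& rng) {
    std::vector<double> tasks;
    if (sample.empty()) return tasks;
    std::uniform_int_distribution<std::size_t> pick(0, sample.size() - 1);
    tasks.reserve(n);
    for (std::size_t i = 0; i < n; ++i) tasks.push_back(sample[pick(rng)]);
    return tasks;
}

// Smallest CPU count in [1, max_cpus] whose mean simulated makespan over
// `runs` repetitions meets the deadline; returns 0 if no such count exists.
std::size_t min_cpus_for_deadline(const std::vector<double>& sample, std::size_t n_tasks,
                                  double deadline, std::size_t max_cpus,
                                  std::size_t runs = 20, unsigned seed = 42) {
    if (sample.empty() || runs == 0) return 0;
    std::mt19937 rng(seed);
    for (std::size_t k = 1; k <= max_cpus; ++k) {
        double total = 0.0;
        for (std::size_t r = 0; r < runs; ++r)
            total += simulate_makespan(draw_tasks(sample, n_tasks, rng), k);
        if (total / static_cast<double>(runs) <= deadline) return k;
    }
    return 0;
}
```

A caller would pass a handful of measured task durations as `sample`, the total number of tasks, and a deadline, and read back the suggested CPU count. A linear scan over CPU counts keeps the sketch simple; a real tuner would likely search more cleverly and account for the execution-time variability the abstract highlights.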
| Original language | English (US) |
| Title of host publication | 2015 IEEE 31st International Conference on Data Engineering |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Number of pages | 12 |
| State | Published - Apr 2015 |