Data Transfer Scheduling for Maximizing Throughput of Big-Data Computing in Cloud Systems

Abstract

Many big-data computing applications have been deployed on cloud platforms. These applications normally demand concurrent data transfers among computing nodes for parallel processing. It is important to find the transfer schedule that yields the least data retrieval time, or in other words, the maximum throughput. However, existing methods cannot achieve this because they ignore link bandwidths and the diversity of data replicas and paths. In this paper, we aim to develop a max-throughput data transfer scheduling scheme that minimizes the data retrieval time of applications. Specifically, the problem is formulated as a mixed integer program, and an approximation algorithm is proposed and its approximation ratio analyzed. Extensive simulations demonstrate that our algorithm can obtain near-optimal solutions.
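The abstract does not spell out the approximation algorithm itself. As a rough illustration of why link bandwidths and replica/path diversity matter, here is a minimal Python sketch of a simple greedy heuristic. It is not the paper's algorithm. It assumes that each request can be served over any one of a fixed set of candidate paths (one per replica/path combination), and it estimates retrieval time as the maximum over links of load divided by bandwidth (a fluid-model simplification). The names `Request`, `bottleneck_time`, and `greedy_schedule`, as well as the link names and numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical data request: `size` units of data that can be fetched
    over any one of several candidate paths (each a tuple of link names)."""
    name: str
    size: float
    candidate_paths: tuple[tuple[str, ...], ...]


def bottleneck_time(load: dict[str, float], bandwidth: dict[str, float]) -> float:
    """Assumed time model: the busiest link (load / bandwidth) determines
    the overall retrieval time."""
    return max((load[link] / bandwidth[link] for link in load), default=0.0)


def greedy_schedule(requests, bandwidth):
    """Greedy heuristic (not the paper's method): handle the largest requests
    first and give each one the candidate path that keeps the estimated
    bottleneck time smallest."""
    load = {link: 0.0 for link in bandwidth}
    assignment = {}
    for req in sorted(requests, key=lambda r: r.size, reverse=True):
        best_path, best_time = None, float("inf")
        for path in req.candidate_paths:
            # Tentatively place this request's data on every link of the path.
            trial = dict(load)
            for link in path:
                trial[link] += req.size
            t = bottleneck_time(trial, bandwidth)
            if t < best_time:
                best_path, best_time = path, t
        # Commit the best path found for this request.
        for link in best_path:
            load[link] += req.size
        assignment[req.name] = best_path
    return assignment, bottleneck_time(load, bandwidth)


# Hypothetical example: three links and three requests with replica/path choices.
bandwidth = {"a": 10.0, "b": 10.0, "c": 5.0}
requests = [
    Request("r1", 40.0, (("b",), ("a",))),
    Request("r2", 30.0, (("a",), ("c",))),
    Request("r3", 20.0, (("b",), ("c",), ("a", "c"))),
]
assignment, est_time = greedy_schedule(requests, bandwidth)
print(assignment, est_time)
```

In this toy instance the heuristic spreads the three requests over the three links, so no single link becomes the bottleneck. Processing large requests first is a common choice for bottleneck-minimizing assignment, but this sketch makes no optimality claim. Unlike the paper's algorithm, it has no analyzed approximation ratio.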

Publication
IEEE Transactions on Cloud Computing, vol. 6, no. 1, pp. 87-98 (Chinese Academy of Sciences journal ranking: Zone 1 in its major category)
Ruitao Xie
Associate Professor

My research interests include scheduling system resources to improve the performance of AI applications, edge computing, cloud computing, and mobile computing.