Data Transfer Scheduling for Maximizing Throughput of Big-Data Computing in Cloud Systems

Abstract

Many big-data computing applications have been deployed on cloud platforms. These applications normally require concurrent data transfers among computing nodes for parallel processing. It is important to find the transfer schedule that yields the shortest data retrieval time or, equivalently, the maximum throughput. However, existing methods cannot achieve this because they ignore link bandwidths and the diversity of data replicas and paths. In this paper, we aim to develop a max-throughput data transfer schedule that minimizes the data retrieval time of applications. Specifically, the problem is formulated as a mixed integer program, and an approximation algorithm is proposed and its approximation ratio analyzed. Extensive simulations demonstrate that our algorithm obtains near-optimal solutions.
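To make the problem concrete, here is a rough illustration in Python. It is not the paper's mixed integer program or its approximation algorithm. It assumes a simplified fluid model: each transfer request picks exactly one (replica, path) pair without splitting, and the data retrieval time of a schedule is taken to be the time the most congested link needs to carry all of its bytes, i.e. the maximum over links of load divided by bandwidth. The functions `retrieval_time`, `greedy_schedule`, and `optimal_schedule` and the toy topology are hypothetical. The greedy rule is a plain largest-first heuristic, swapped in only to show why choosing among replicas and paths with link bandwidths in mind matters.

```python
from itertools import product

# Hypothetical sketch under a simplified fluid model (NOT the paper's algorithm):
# a schedule's retrieval time = max over links of (bytes routed on link / bandwidth).


def retrieval_time(assignment, sizes, bandwidth):
    """Bottleneck time of a schedule; assignment[i] is the path (tuple of link ids) of request i."""
    load = {}
    for size, path in zip(sizes, assignment):
        for link in path:
            load[link] = load.get(link, 0.0) + size
    return max((load[l] / bandwidth[l] for l in load), default=0.0)


def greedy_schedule(sizes, candidates, bandwidth):
    """Largest request first; give each one the candidate path (replica + route)
    that keeps the current bottleneck time lowest. A heuristic, not an optimum."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    load = {l: 0.0 for l in bandwidth}
    chosen = [None] * len(sizes)
    for i in order:
        def cost(path, i=i):
            # bottleneck time if request i were routed on `path`
            return max(
                (load[l] + (sizes[i] if l in path else 0.0)) / bandwidth[l]
                for l in bandwidth
            )
        best = min(candidates[i], key=cost)
        for l in best:
            load[l] += sizes[i]
        chosen[i] = best
    return chosen


def optimal_schedule(sizes, candidates, bandwidth):
    """Exhaustive search over all path combinations (tiny instances only)."""
    return min(product(*candidates),
               key=lambda a: retrieval_time(a, sizes, bandwidth))


# Toy instance: three links with different bandwidths; each request can fetch
# its data from one of two replicas, each reached over a single link.
bandwidth = {"a": 10.0, "b": 5.0, "c": 10.0}
sizes = [40.0, 20.0, 20.0]
candidates = [[("a",), ("b",)], [("a",), ("c",)], [("b",), ("c",)]]

g = greedy_schedule(sizes, candidates, bandwidth)
o = optimal_schedule(sizes, candidates, bandwidth)
print(g, retrieval_time(g, sizes, bandwidth))  # [('a',), ('c',), ('b',)] 4.0
print(retrieval_time(o, sizes, bandwidth))     # 4.0
```

Routing every request over the first candidate link would load link `a` with 60 units (time 6.0) and link `b` with 20 units (time 4.0), giving a retrieval time of 6.0. Spreading the requests across replicas and links brings it down to 4.0 on this instance. The exhaustive search is exponential in the number of requests, which is why a polynomial-time approximation is needed at realistic scale.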

Publication
IEEE Transactions on Cloud Computing, vol. 6, no. 1, pp. 87-98 (Chinese Academy of Sciences journal ranking: Zone 1 in its major category)
Ruitao Xie
Associate Professor

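Research interests include optimizing the performance of machine learning applications through system resource scheduling, edge computing, cloud computing, and mobile computing.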