Task Graph Mapping and Scheduling on Heterogeneous Architectures Under Communication Constraints

17 Jul

A. Emeretlis, T. Tsakoulis, G. Theodoridis, P. Alefragis, N. Voros

An approach for mapping applications represented as Directed Acyclic Graphs (DAGs) on platforms consisting of heterogeneous cores considering the communication overhead between the cores is introduced. Each node of the DAG corresponds to an application’s task and each edge, e = (vi, vj), represents a data dependency between them. When two dependent tasks are assigned on the same core, the transferred data are stored and loaded internally and no communication delay is inferred. Otherwise, the communication delay of two dependent tasks that are assigned on different cores is fixed and depends on the amount of the transferred data between the tasks.

The goal is to find a mapping that minimizes the total execution time (makespan). The approach is based on the Benders decomposition principle and integrates Integer Linear Programming (ILP) and Constraint Programming (CP) formulations. Specifically, the original problem is decomposed into two sub-problems namely, the assignment of the tasks to the cores and their scheduling per core, which are solved iteratively through the ILP and CP models, respectively. Both formulations take into account the communication delay between dependent tasks that are assigned to different cores trying to optimize the application’s execution time.

The solution of the first sub-problem is fed to the second one, whose purpose is to minimize the global makespan with the given assignment solution. In each iteration, extra constraints, called Benders cuts, are generated by inference and added to the first sub-problem. Their purpose is prune the search space by excluding many inefficient solutions at once. The iterative process terminates when the whole solution space has been searched or a time limit is reached.

The proposed approach succeeds to provide the optimal solution in all cases of synthetic and real-application DAGs within the time limit of one hour, while the pure ILP model fails more than half of them. In addition, the average solution time of the proposed method is about 1 minute, whereas for instances solved by both models within the time limit, the speedup equals to 11× over the ILP model.

Accepted for publication at the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XVII), Samos, Greece, July 17-20, 2017.

Leave a Reply