A Parallel Scheduler for Block Iterative Solvers in Heterogeneous Computing Environments


Abstract
Nowadays, with the help of different programming tools, it is easier to develop parallel applications that run in heterogeneous environments. Among the advantages of these environments, we are interested in the ability to combine different computing capabilities into a single metacomputer, which provides a way of reusing resources that are already available at our computing sites. We introduce a scheduler for distributing a set of tasks among a set of processors contained in a metacomputer. We study a case for the scheduler where the tasks are subsystems of linear equations generated by a parallel implementation of the Block Cimmino method. The algorithm is regarded as a block row projection algorithm for solving linear systems of equations. Furthermore, we use a block Conjugate Gradient algorithm (Block-CG) to accelerate the rate of convergence of the Block Cimmino algorithm. In §2 we present a scheduler for heterogeneous computing environments. In §3 we introduce our parallel implementation of the Block Cimmino algorithm, and in §4 we analyse the performance of the scheduler inside the parallel Block Cimmino implementation. Lastly, we present general observations and conclusions in §5.
[Figure 1 content]
HES file:
  ibmwrkst1 sp=3000
  Alliant ct=SHARED np=2 sp=2500
  butterfly ct=DISTRIB np=4 sp=1000
  sunwrkst1
  sunwrkst2
Configuration: FX80 = 2 processors Alliant FX/80; SUN4 = SUN SPARC 10; TC2K = 4 processors BBN TC2000; RS6K = IBM RS6000.
Graph legend: monitor processor (M), cluster of processors, single processor.
Graph node labels: M, FX/80, RS6K, SUN4, SUN4, TC2K, TC2K, TC2K, TC2K.
‡ Istituto di Analisi Numerica CNR, via Abbiategrasso 209, 27100 Pavia, Italy. § Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS), 42 Ave.
Fig. 1. An example of a HES file and corresponding computing element graph.
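As an illustration of how a HES file such as the one in Figure 1 might be read, the following is a minimal sketch and not the scheduler's actual parser. It handles only the ct, np, and sp keywords described in the text and ignores any other keyword. Several details are assumptions made for the example: the `#` comment character, the defaults ct=SINGLE and np=1 when those keywords are omitted, leaving the speed unset when sp is absent, and the hypothetical names `ComputingElement` and `parse_hes`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# The three cluster types named in the text.
VALID_CLUSTER_TYPES = {"SINGLE", "SHARED", "DISTRIB"}

# keyword=value pairs; spaces around '=' are tolerated (assumption).
_OPTION = re.compile(r"(\w+)\s*=\s*(\S+)")


@dataclass
class ComputingElement:
    """One computer specification line of a HES file (hypothetical type)."""
    host: str
    cluster_type: str = "SINGLE"   # assumed default when ct is omitted
    num_procs: int = 1             # assumed default when np is omitted
    speed: Optional[int] = None    # left unset when sp is omitted


def parse_hes(text: str) -> List[ComputingElement]:
    """Parse HES text into a list of computing elements."""
    elements = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        host = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        options = dict(_OPTION.findall(rest))
        element = ComputingElement(host=host)
        if "ct" in options:
            cluster_type = options["ct"].upper()
            if cluster_type not in VALID_CLUSTER_TYPES:
                raise ValueError(f"unknown cluster type: {options['ct']}")
            element.cluster_type = cluster_type
        if "np" in options:
            element.num_procs = int(options["np"])
        if "sp" in options:
            element.speed = int(options["sp"])
        # Keywords other than ct, np, and sp are ignored in this sketch.
        elements.append(element)
    return elements


EXAMPLE_HES = """\
# metacomputer of Figure 1
ibmwrkst1 sp=3000
Alliant ct=SHARED np=2 sp = 2500
butterfly ct=DISTRIB np=4 sp = 1000
sunwrkst1
sunwrkst2
"""

elements = parse_hes(EXAMPLE_HES)
for element in elements:
    print(element)
```

Each parsed element corresponds to one node that a computing element graph could be built from; constructing the graph itself is outside the scope of this sketch.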
speed of a processor is specified with an sp keyword. Some parallel programming tools like PVM will supply a relative speed when the sp keyword is omitted. After the scheduler receives the task information, it sorts the tasks according to their computational weight factors. Using the communication factors, it creates a table of communication between the tasks. Afterwards, the scheduler parses the HES file and produces a computing element graph (ceg), which is used during the distribution of tasks to clusters or processors. In general, we will use the term computing elements to refer to both single processors and clusters of processors. Figure 1 illustrates an example of a HES file and the output ceg from the scheduler. For the example in Figure 1, the Alliant FX/80 is represented by a cluster node of type
G. Coriolis, 31057 Toulouse, France. ¶ Rutherford Appleton Laboratory, Chilton, Didcot, Oxon OX11 0QX, England. ‖ Ecole Nationale Supérieure d'Electrotechnique, d'Electronique, d'Informatique, et d'Hydraulique de Toulouse (ENSEEIHT), 31000 Toulouse, France.
between the tasks. The first parameter is used as a computational weight factor and the second as a communication factor. Henceforth, we will refer to the heterogeneous computing environment as the metacomputer. The metacomputer is defined in a HES file (Heterogeneous Environment Specification file). Every line in a HES file contains either a comment or a computer specification. A computer specification line is formed by a set of keywords and parameter values. Currently, the scheduler accepts keywords defined by different parallel programming tools like PVM [5] and P4 [7]. In addition, we include the keywords ct (cluster type), np (number of processors), and sp (speed of processor). These keywords are particular to the scheduler. We define three different types of processors or clusters that are valid parameter values for the ct keyword. SINGLE is used for a computer with only one processor. SHARED is used for shared memory architectures (for example, ALLIANT FX/80), or computers with more than one processor without direct communication links to each of its processors from outside the cluster (e.g., a workstation with more than two processors). DISTRIB is used for distributed memory computers (e.g., KSR1, TC2000), or a set of processors with a fast interconnection network (e.g., a group of workstations interconnected with an FDDI network). The number of processors in a cluster is specified with an np keyword. And the relative
A Parallel Scheduler for Block Iterative Solvers in Heterogeneous Computing Environments
Mario Arioli‡ Anthony Drummond§ Iain S. Duff§¶ Daniel Ruiz‖
We present a parallel scheduler for distributing work to a group of processors in a heterogeneous computing environment. Some of the processors in the heterogeneous computing environment can be clustered to take advantage of particular communication networks. Here, the scheduler has been used in the implementation of a parallel block iterative solver based on the Cimmino method. We have used PVM 3 to implement the communication between the heterogeneous processors.
1 Introduction
2 A Scheduler for Heterogeneous Computing Environments
We introduce a static scheduler for the distribution of a set of tasks among a set of processors. The scheduler takes as input information about the tasks to be computed in parallel and the definition of the heterogeneous computing environment. The information that the scheduler needs about each task is its size and the communication