Rewriting skeleton programs How to evaluate the data-parallel stream-parallel tradeo
合集下载
- 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
- 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
- 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
Rewriting skeleton programs: how to evaluate the data-parallel stream-parallel tradeo
M. Aldinucci, M. Coppola & M. Danelutto
Department of Computer Science { University of Pisa Corso Italia, 40 { 56125 Pisa { Italy Email: faldinuc,coppola,marcodg@di.unipi.it programmer to use both data and stream parallel skeletons within the same program. It is known that particular skeleton nestings can be formally rewritten into di erent nestings that preserve the functional semantics. Indeed, the kind and possibly the amount of parallelism usefully exploitable may change while rewriting takes place. Here we discuss an original framework allowing the user (and/or the compiling tools) of a skeleton based parallel programming language to evaluate whether or not the transformation of a skeleton program is worthwhile in terms of the nal program performance. We address, in particular, the evaluation of transformations exchanging data parallel and stream parallel skeleton subtrees.
In most skeleton frameworks equivalences can be established between di erent skeleton nestings. We can state that: Ei : X1 (X11 : : : X1n ) X2 (X21 : : : X2m ) meaning that skeleton X1 applied to skeletons X11 to X1n computes the same result of skeleton X2 applied to skeletons X21 to X2m . Equivalences such as this one are of interest to the skeleton programmer, as he can use them as rewriting rules to rewrite his programs in order to achieve a better program structure and/or a program sporting a better performance. These equivalences are also of interest to the designer of the skeleton system/language, as he can use them to improve the compilation process, once an algorithm is available stating which equivalences have to be exploited for better performance (e ciency). In both cases, it is fundamental to devise a framework that could be used to evaluate the impact of an equivalence rule over a given program. In other words, it is fundamental to have a formal way of understanding, given two skeleton programs P1 and P2 , such that P2 has been obtained transforming P1 by using a set of equivalences fE1 : : : Ek g, which one is the \better" in terms of the performance achieved (it is supposed to achieve). In order to be able to answer such kind of questions, we must be able to solve di erent problems: { we must be able to enumerate the programs that are \equivalent" to a given program modulo a set of rewriting rules. { we need a reliable algorithm to compute the expected performance of a skeleton program onto a target machine { we need an evaluation criteria able to discriminate, among equivalent programs, the one(s) exhibiting the best performance These problems are not easily dealt with. In this paper we want to investigate the possibility of providing the user with some formal reasoning methodology allowing him to understand whether a given rewrite rule can be e ciently applied to his programs. Furthermore, since the skeleton system we will take into account will contain skeletons exploiting both data and stream parallelism, we will focus on the rules that transform either kind of parallelism into the other. Finding the best interplay among them with respect to the program performance will be our goal. In the following we will at rst show our skeleton model, which is a subset of the P3L one, and a sample of its rewriting rules. We will then brie y outline the design of a performance model for skeleton implementations on real target machines, thus providing a formal way to \measure" the expected performance of left and right hand sides of a rewriting rule. The performance model is built using a logP -like cost calculus 5] that extends similar ones already developed, in particular within the P3L project 6, 7]. We will nally show how such a model can be used to tell the programmer what rewriting rule to use and in which direction (left hand to right hand side or vice-versa), depending on some parameters relative to the data, the program structure and the target architecture at hand.
Abstract. Some skeleton based parallel programming models allow the
ቤተ መጻሕፍቲ ባይዱ1 Introduction
Since the work of Cole on skeletons 1], many researchers worked out the idea of providing the parallel programmer with a set of skeletons that could be used to model the parallel structure of applications and algorithms. Di erent research teams have designed skeleton sets that vary in the number and specialisation of the skeletons, as well as in the possibility of performing arbitrary skeleton nestings within programs 2, 3]. The skeleton idea is very appealing mostly due to the fact that skeletons allow e cient development of parallel applications by proper speci cation of the skeleton nesting to be used. All the details of process decomposition, mapping and scheduling, as well as the communication and memory management issues are handled by the skeleton system in place of the programmer. This pleasant property is paid in two ways. On the one hand, programmers are not allowed to exploit any arbitrary parallel decompositions of a problem: they are forced to use just those patterns provided by the skeleton system (skeleton models belong to the \restricted programming model" class 4]). On the other hand, the implementation of an e cient skeleton based programming environment is clearly a hard task, as all the process and communication details the programmer has not to deal with must be handled by the skeleton system designer.
M. Aldinucci, M. Coppola & M. Danelutto
Department of Computer Science { University of Pisa Corso Italia, 40 { 56125 Pisa { Italy Email: faldinuc,coppola,marcodg@di.unipi.it programmer to use both data and stream parallel skeletons within the same program. It is known that particular skeleton nestings can be formally rewritten into di erent nestings that preserve the functional semantics. Indeed, the kind and possibly the amount of parallelism usefully exploitable may change while rewriting takes place. Here we discuss an original framework allowing the user (and/or the compiling tools) of a skeleton based parallel programming language to evaluate whether or not the transformation of a skeleton program is worthwhile in terms of the nal program performance. We address, in particular, the evaluation of transformations exchanging data parallel and stream parallel skeleton subtrees.
In most skeleton frameworks equivalences can be established between di erent skeleton nestings. We can state that: Ei : X1 (X11 : : : X1n ) X2 (X21 : : : X2m ) meaning that skeleton X1 applied to skeletons X11 to X1n computes the same result of skeleton X2 applied to skeletons X21 to X2m . Equivalences such as this one are of interest to the skeleton programmer, as he can use them as rewriting rules to rewrite his programs in order to achieve a better program structure and/or a program sporting a better performance. These equivalences are also of interest to the designer of the skeleton system/language, as he can use them to improve the compilation process, once an algorithm is available stating which equivalences have to be exploited for better performance (e ciency). In both cases, it is fundamental to devise a framework that could be used to evaluate the impact of an equivalence rule over a given program. In other words, it is fundamental to have a formal way of understanding, given two skeleton programs P1 and P2 , such that P2 has been obtained transforming P1 by using a set of equivalences fE1 : : : Ek g, which one is the \better" in terms of the performance achieved (it is supposed to achieve). In order to be able to answer such kind of questions, we must be able to solve di erent problems: { we must be able to enumerate the programs that are \equivalent" to a given program modulo a set of rewriting rules. { we need a reliable algorithm to compute the expected performance of a skeleton program onto a target machine { we need an evaluation criteria able to discriminate, among equivalent programs, the one(s) exhibiting the best performance These problems are not easily dealt with. In this paper we want to investigate the possibility of providing the user with some formal reasoning methodology allowing him to understand whether a given rewrite rule can be e ciently applied to his programs. Furthermore, since the skeleton system we will take into account will contain skeletons exploiting both data and stream parallelism, we will focus on the rules that transform either kind of parallelism into the other. Finding the best interplay among them with respect to the program performance will be our goal. In the following we will at rst show our skeleton model, which is a subset of the P3L one, and a sample of its rewriting rules. We will then brie y outline the design of a performance model for skeleton implementations on real target machines, thus providing a formal way to \measure" the expected performance of left and right hand sides of a rewriting rule. The performance model is built using a logP -like cost calculus 5] that extends similar ones already developed, in particular within the P3L project 6, 7]. We will nally show how such a model can be used to tell the programmer what rewriting rule to use and in which direction (left hand to right hand side or vice-versa), depending on some parameters relative to the data, the program structure and the target architecture at hand.
Abstract. Some skeleton based parallel programming models allow the
ቤተ መጻሕፍቲ ባይዱ1 Introduction
Since the work of Cole on skeletons 1], many researchers worked out the idea of providing the parallel programmer with a set of skeletons that could be used to model the parallel structure of applications and algorithms. Di erent research teams have designed skeleton sets that vary in the number and specialisation of the skeletons, as well as in the possibility of performing arbitrary skeleton nestings within programs 2, 3]. The skeleton idea is very appealing mostly due to the fact that skeletons allow e cient development of parallel applications by proper speci cation of the skeleton nesting to be used. All the details of process decomposition, mapping and scheduling, as well as the communication and memory management issues are handled by the skeleton system in place of the programmer. This pleasant property is paid in two ways. On the one hand, programmers are not allowed to exploit any arbitrary parallel decompositions of a problem: they are forced to use just those patterns provided by the skeleton system (skeleton models belong to the \restricted programming model" class 4]). On the other hand, the implementation of an e cient skeleton based programming environment is clearly a hard task, as all the process and communication details the programmer has not to deal with must be handled by the skeleton system designer.