A Fast Hierarchical Multi-Objective Mapping Approach for Mesh-Based Networks-on-Chip (in English)
Acta Scientiarum Naturalium Universitatis Pekinensis, Vol. 44, No. 5 (Sept. 2008)
Supported by the special fund of the High Technology Research and Development Program (2005AA111010)
Received: 2007-08-23; revised: 2008-04-18
A Fast Hierarchical Multi-Objective Mapping Approach for Mesh-Based Networks-on-Chip
LIN Hua, ZHANG Liang, TONG Dong, LI Xianfeng, CHENG Xu
Microprocessor Research and Development Center, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; Corresponding Author, E-mail: linhua86@/doc/94fe8d0002020740be1e9bdb.html
Abstract The authors propose a fast hierarchical multi-objective mapping approach (HMMap) for mesh-based NoC. Based on partition and multi-objective heuristic techniques, HMMap automatically maps a large number of IP cores onto the NoC architecture and makes good tradeoffs between communication energy and latency. Experimental results show that the proposed approach achieves shorter execution time and lower energy and latency than existing approaches. As the NoC size increases, the optimization effect of HMMap becomes more pronounced.
Key words Systems-on-Chip; Networks-on-Chip; topological mapping; multi-objective optimization
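Abstract (translated from the Chinese) To obtain near-optimal solutions to Network-on-Chip (NoC) topological mapping, a hierarchical multi-objective mapping method for mesh NoC, HMMap, is proposed. The method uses grouping and a multi-objective heuristic algorithm to automatically map the IP cores of a given application onto the NoC architecture. It effectively supports mapping large numbers of IP cores and achieves a good tradeoff between system communication energy and latency, two key design metrics. Experiments show that HMMap runs faster than existing methods and that the resulting topological mappings significantly reduce both communication energy and latency. The advantage of HMMap grows as the NoC size increases.
Key words Systems-on-Chip; Networks-on-Chip; topological mapping; multi-objective optimization
CLC number TP302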
With the advance of semiconductor technology, hundreds or even thousands of IP cores will be integrated into a single chip [1]. A communication-centric design paradigm called Networks-on-Chip (NoC) has recently emerged to solve the on-chip communication problems in such complex systems. NoC architectures can be classified into two major categories based on their topologies: custom or regular. Regular topologies, especially the mesh topology, have become popular architectures for homogeneous or heterogeneous NoC design due to their simple layout implementation, easy routing, high bandwidth, short clock cycle and overall scalability [2]. This paper focuses on mesh NoC architectures.
Implementing an application on a mesh-based NoC involves the following steps. First, simulate the application or statically analyze the traffic characteristics to obtain a Task Graph [3]. Second, split up the Task Graph into a set of concurrent communicating tasks. Third, select the IP cores, then assign and schedule the tasks on them. Fourth, map the IP cores onto the proper NoC tiles so that the design metrics of interest are optimized.
The first three steps described above have been well addressed in the area of hardware/software co-design and IP-reuse [3]. The last step, mapping, is new to the CAD community and has a strong impact on the system energy and latency, so this paper focuses on mapping.
However, the mapping problem is an instance of the constrained quadratic assignment problem, which is known to be NP-hard [4]. The search space of the problem increases factorially with the system size. Even for a 4×4 NoC, there can be 16! mapping results, so exhaustive search of the solution space is infeasible. Therefore, it is necessary to design new methods that make good tradeoffs between solution quality and execution time. In addition, there might be multiple design objectives, such as energy and latency. Instead of a single optimum, there is rather a set of alternative tradeoffs, generally known as Pareto-optimal solutions [5]. Heuristic techniques based on evolutionary algorithms have been proposed to approximate the Pareto-optimal set for the mapping problem and achieve good tradeoffs between energy and latency of communication. However, as the size of the NoC increases, their execution overheads become unaffordable. Moreover, they suffer from premature convergence and from getting trapped in local optima. These problems become especially severe in multi-objective optimization, so it is hard for those approaches to find the Pareto-optimal solutions.
This paper presents a novel approach named HMMap that maps large numbers of IP cores onto mesh-based NoC in a time-effective manner and with good tradeoffs between communication energy and latency. It employs a partition-based hierarchical mapping method to speed up the convergence toward Pareto-optimal solutions and thereby reduce the execution time. Moreover, in order to obtain lower energy cost and latency, it groups IP cores according to application-specific communication to guide the search, selects the proper group size to balance intensification and diversification in the heuristics [6], and introduces the nondominated sorting genetic algorithm 2 (NSGA2) [7] to find a good spread and convergence of the Pareto solutions.
1 Related Work
Topological mapping has been recognized as one of the key problems in mesh-based NoC research by recent surveys [8-10], and it has been addressed in recent research efforts. Some of them present mapping algorithms that optimize a single design objective only, such as Murali et al. [11], who aim at minimizing communication delay, Hu et al. [3], who aim at minimizing the total communication energy, and Tang et al. [12], who aim at reducing the overall execution time. Others propose techniques for multiple design objectives. Classical multi-objective optimization methods, such as multi-objective programming, which aggregates the objectives by forming a linear combination of them [5], have several disadvantages. The major one is that, since they usually require several independent runs to optimize the different objectives, these methods cannot exploit synergies among the objectives and may cause high computation overhead. For this reason, existing works mostly use evolutionary algorithms as the multi-objective optimization technique. Ascia et al. [13,14] formulate the mapping problem as a multi-objective combinatorial optimization problem and apply heuristics based on the Strength Pareto Evolutionary Algorithm (SPEA) and its improved version SPEA2 [15] to obtain the Pareto-optimal mappings. In particular, Ascia et al. [14] extend the single-objective algorithms of Refs. [3,11] and then compare them with their SPEA-based method. The results show that the SPEA-based method has better performance.
However, as this method has to use a simulator to evaluate the interim solutions during the search process, the execution time becomes prohibitive for large NoCs. Furthermore, as the NoC size increases, it becomes difficult to approximate the optimal solutions.
Instead of using a simulator, we apply energy and latency models to evaluate the interim solutions, which shortens the evaluation time. We also propose a hierarchical mapping approach to cut down the solution space and speed up the convergence of the search process. Furthermore, NSGA2, a multi-objective evolutionary algorithm with lower computational complexity, is applied to the mapping problem in our approach.
2 Approach Overview
The design flow of HMMap, illustrated in Fig. 1, has four phases.
In the first phase, we partition the core graph and group the application IP cores according to communication volume. The communication volume of each IP core is defined as the sum of the volume coming into and going out of the core. We assume that cores that exchange large communication volumes with each other should be placed on neighboring tiles to optimize the communication energy and latency [11,16]. Moreover, the group size is application-specific and has a great effect on solution quality, so it should be chosen carefully. This greedy partition algorithm is effective because it uses communication volume information to identify high-quality areas in the search space. This phase is described in detail in Section 4.1.
The second and third phases perform the hierarchical mapping to obtain near-optimal solutions to smaller subproblems. This hierarchical strategy not only cuts down the search space to accelerate the convergence but also reduces the computational complexity of the algorithm. In the second phase, a heuristic based on NSGA2 is employed to decide the relative locations of the partitioned groups. This step helps to intensively explore areas of the search space with high-quality solutions. The NSGA2-based heuristic is termed MMap and is explained in Section 4.2. In the third phase, we use MMap again to map the cores inside each group. This phase achieves a higher diversification, which prevents premature convergence toward sub-optimal solutions.
Finally, the fourth phase combines the results of the hierarchical mapping into final solutions of the original mapping problem. An example of our design flow is presented in Fig. 2.
Fig. 1 Design flow of HMMap
Fig. 2 An example design flow of HMMap
3 Problem Formulation
3.1 Graph definitions
To formulate our problem, we use and modify the definitions proposed in Ref. [3]. Specifically, the bandwidth property, which is not discussed in this paper, is omitted from Definition 1, and the communication latency is added to Definition 2.
The communication between the IP cores of an application is represented by the core graph:
Definition 1 A core graph is a directed graph $CG(C, A)$, with each vertex $c_i \in C$ representing an IP core and each directed arc $(c_i, c_j)$, denoted $a_{i,j} \in A$, representing a direct communication from $c_i$ to $c_j$. Each $a_{i,j}$ has the following property: $v(a_{i,j})$ is the arc volume from vertex $c_i$ to $c_j$, which represents the communication volume (bits) from $c_i$ to $c_j$.
The architecture and connectivity of the NoC are represented by the NoC architecture characterization graph.
Definition 2 An architecture characterization graph $AG(T, R)$ is a directed graph, where each vertex $t_i$ represents one tile in the architecture, and each directed arc $r_{i,j}$ represents the routing parameter from $t_i$ to $t_j$. Each $r_{i,j}$ has the following properties.
1) Let $P_{i,j}$ be the set of candidate minimal paths from tile $t_i$ to $t_j$. $\forall p_{i,j} \in P_{i,j}$, $L(p_{i,j})$ gives the set of links used by $p_{i,j}$.
2) $e(r_{i,j})$ stands for the average energy consumption of sending one bit of data from $t_i$ to $t_j$.
3) $d(r_{i,j})$ is the average communication latency of one bit of data from $t_i$ to $t_j$.
Using these definitions, the mapping of the core graph $CG(C, A)$ onto the architecture characterization graph $AG(T, R)$ is defined by the one-to-one mapping function $map$ [8]:
$$map: C \to T, \quad \text{s.t. } map(c_i) = t_j, \quad \forall c_i \in C,\ \exists t_j \in T,$$
which minimizes $\{E_{system}, T_{commlatency}\}$, where $E_{system}$ represents the total system communication energy and $T_{commlatency}$ is the system communication latency.
3.2 Energy and latency model
To evaluate the intermediate mapping solutions during the search process of HMMap, the communication energy and latency are modeled in this section.
The system communication energy is obtained by the following equations:
$$E_{system} = \sum_{\forall i,j} \left( v(a_{i,j}) \times e(r_{i,j}) \right),$$
$$e(r_{i,j}) = E_{S_{bit}} \times p_{i,j} + E_{L_{bit}} \times (p_{i,j} - 1),$$
where $E_{S_{bit}}$ and $E_{L_{bit}}$ represent the energy consumed in sending one bit of data through a switch and over the links between tiles, respectively. The links belong to $L(p_{i,j})$.
$$E_{S_{bit}} = \alpha_{switch} C_{switch} V^2,$$
$$E_{L_{bit}} = \alpha_{link} C_{link} V^2,$$
where $\alpha_{switch}$, $\alpha_{link}$ and $C_{switch}$, $C_{link}$ are the signal activities and the total capacitances of the switches and wire segments, respectively. It is assumed that data always takes the shortest path between mesh nodes $i$ and $j$. Under this assumption, $p_{i,j}$ can be estimated as the Manhattan distance between the two tiles:
$$p_{i,j} = |x_i - x_j| + |y_i - y_j|,$$
where $x_i$, $x_j$, $y_i$, $y_j$ represent the tile locations in the NoC mesh.
Transport latency is defined as the time that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node [17]. We use the average latency in our evaluation model. Let $d_{bit}(r_{i,j})$ denote the communication latency of one bit of data from $t_i$ to $t_j$; then $T_{commlatency}$ is:
$$T_{commlatency} = d(r_{i,j}) = \frac{\sum_{\forall i,j} d_{bit}(r_{i,j})}{v(a_{i,j})},$$
$$d_{bit}(r_{i,j}) = k \times p_{i,j},$$
where $k$ is a constant related to the properties of the links and switches [12].
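The model above is cheap to evaluate, which is what allows it to replace a simulator inside the search loop. The following is a minimal Python sketch of that evaluation, not the authors' implementation. The function names and the dictionary-based data layout are hypothetical. The energy function follows the equations for $E_{system}$ and $e(r_{i,j})$ directly. For latency, the sketch assumes a plain average of $d_{bit} = k \times p_{i,j}$ over the communicating pairs, which is only one possible reading of "average latency".

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]          # (x, y) position of a tile in the mesh
Arc = Tuple[str, str]           # (source core, destination core)


def hops(a: Tile, b: Tile) -> int:
    """p_{i,j}: Manhattan distance between two tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def system_energy(volume: Dict[Arc, float], placement: Dict[str, Tile],
                  e_sbit: float, e_lbit: float) -> float:
    """E_system = sum of v(a_ij) * (E_Sbit * p_ij + E_Lbit * (p_ij - 1))."""
    total = 0.0
    for (src, dst), bits in volume.items():
        p = hops(placement[src], placement[dst])
        total += bits * (e_sbit * p + e_lbit * (p - 1))
    return total


def mean_bit_latency(volume: Dict[Arc, float], placement: Dict[str, Tile],
                     k: float) -> float:
    """Assumed aggregate: unweighted mean of d_bit = k * p_ij over all arcs."""
    arcs = list(volume)
    return sum(k * hops(placement[s], placement[d]) for s, d in arcs) / len(arcs)


# Hypothetical three-core example on a small mesh.
volume = {("a", "b"): 10, ("b", "c"): 4}
placement = {"a": (0, 0), "b": (1, 0), "c": (1, 2)}
energy = system_energy(volume, placement, e_sbit=2.0, e_lbit=1.0)
latency = mean_bit_latency(volume, placement, k=3.0)
```

Because both objectives depend only on hop counts and volumes, one evaluation costs time linear in the number of arcs of the core graph.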
4 HMMap: Hierarchical Multi-objective Mapping
4.1 IP cores partition
The partition phase employs a greedy algorithm to partition the application IP cores into groups according to their communication volume. The algorithm is presented in Algorithm 1. Its computational complexity is $O(p \log p + qp)$, where $p$ is the number of application cores and $q$ is the group size. This is low and does not increase the computational complexity of HMMap.
Algorithm 1
1) Sort all given cores in non-increasing order of communication volume.
2) For each group:
a) Choose the core with the maximum communication volume from the current core graph, add it to this group and remove it from the core graph.
b) Choose the core from the current core graph that communicates most with the cores already selected in this group, add it to the group and remove it from the core graph.
c) Repeat step b until the expected number of cores has been selected.
3) Repeat step 2 until all groups are formed.
The group size must be specified for a given application. It should be large enough that sufficient communication information is included to guide the sampling of the most promising regions in the search space. At the same time, it cannot be too large, or the search will be trapped in confined areas.
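The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The name `greedy_partition` and the dictionary layout are hypothetical. "Communicates most with the selected cores" is interpreted here as the largest summed traffic (both directions) with the cores already in the group, and ties are broken by the initial volume ordering.

```python
from typing import Dict, FrozenSet, List, Tuple


def greedy_partition(volume: Dict[Tuple[str, str], int], cores: List[str],
                     group_size: int) -> List[List[str]]:
    """Group cores so that heavily communicating cores end up together."""
    # Symmetric pairwise traffic and per-core totals (incoming + outgoing).
    traffic: Dict[FrozenSet[str], int] = {}
    total = {c: 0 for c in cores}
    for (src, dst), bits in volume.items():
        key = frozenset((src, dst))
        traffic[key] = traffic.get(key, 0) + bits
        total[src] += bits
        total[dst] += bits

    # Step 1: non-increasing order of communication volume.
    remaining = sorted(cores, key=lambda c: -total[c])

    groups: List[List[str]] = []
    while remaining:                       # Step 3: until all groups are formed
        group = [remaining.pop(0)]         # Step 2a: heaviest remaining core
        while len(group) < group_size and remaining:
            # Step 2b: core with the most traffic to the current group.
            best = max(remaining,
                       key=lambda c: sum(traffic.get(frozenset((c, g)), 0)
                                         for g in group))
            remaining.remove(best)
            group.append(best)             # Step 2c: repeat until group is full
        groups.append(group)
    return groups


# Hypothetical four-core application split into groups of two.
example = greedy_partition({("a", "b"): 10, ("c", "d"): 8, ("b", "c"): 1},
                           ["a", "b", "c", "d"], group_size=2)
```

In this example the heaviest core `b` seeds the first group and pulls in `a`, its main communication partner, leaving `c` and `d` for the second group.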
4.2 MMap: NSGA2-based multi-objective mapping approach
This section describes the proposed NSGA2-based multi-objective mapping approach, named MMap.
Recently, there has been increasing interest in evolutionary multi-objective optimization, because evolutionary algorithms are able to find multiple Pareto-optimal solutions in a single run and obtain good tradeoffs between various objectives. To solve our mapping problem with two design objectives (minimum energy and latency of communication), we use one of the most promising multi-objective evolutionary algorithms, the nondominated sorting genetic algorithm 2 (NSGA2). NSGA2 can find a good spread of solutions and good convergence near the true Pareto-optimal front [5], with a lower computational complexity of $O(MN^2)$ (where $M$ is the number of objectives and $N$ is the population size) than SPEA2, whose computational complexity is $O(MN^2 \lg N)$ [15].
In MMap, a chromosome represents one topological mapping solution Map to our problem. Each tile in the mesh is associated with a gene which encodes the identifier of the IP core placed on it. For instance, in an $m \times n$ mesh NoC, the $i$th gene encodes the identifier of the core in the tile in row $\lceil i/m \rceil$ and column $i \bmod m$. Instead of a single Map, the search begins with a random set of Maps called the initial population. In the following sections, we first describe some key operators and processes in MMap based on NSGA2 and then present the main flow.
4.2.1 Rank assignment
Rank, which determines the survival chance of a specific Map, is related to all the design objectives. The two objective functions of the mapping problem are formulated as follows:
Definition 3 (objective function) For the multi-objective mapping problem:
1) The first objective function aims at minimizing the total communication energy of the NoC,
Obj. 1: $f_1(x) = E_{system}$,
2) The second objective function tries to minimize the average communication latency in the system,
Obj. 2: $f_2(x) = T_{commlatency}$,
where $x$ is the decision vector that represents the Map.
Maps with lower communication energy and delay are assigned to higher rank fronts and have a greater chance of being selected for the next generation.
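Rank assignment rests on Pareto dominance between objective vectors $(f_1, f_2)$. The sketch below shows the idea in Python with hypothetical function names. It peels off fronts naively for readability. NSGA2 itself uses a faster bookkeeping scheme to reach $O(MN^2)$, so this is an illustration of the ranking rule rather than of that algorithm's exact procedure.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every objective and better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def rank_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Return index lists: front 0 is rank 1 (nondominated), then rank 2, ..."""
    remaining = set(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i])
                                  for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (energy, latency) values for five Maps.
fronts = rank_fronts([(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)])
```

Here the first three Maps trade energy against latency without dominating one another, so they share rank 1, while `(3, 3)` and `(6, 6)` fall into later fronts.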
4.2.2 Diversity estimation
Along with convergence towards the Pareto-optimal set, it is also desired that the proposed technique maintains a diverse population of Maps in order to prevent premature convergence and achieve a well-distributed and well-spread non-dominated set. To estimate diversity in MMap, the population is sorted according to each objective function value in ascending order and a diversity estimation function is employed. For each Map, this function calculates the sum of the absolute normalized differences of the two adjacent Maps for each objective. The function is as follows:
$$I[i]_{diversity} = I[i]_{diversity} + \frac{I[i+1].m - I[i-1].m}{f_m^{max} - f_m^{min}},$$
where $I[i]_{diversity}$ is the diversity of the $i$th Map in the population, $I[i].m$ refers to the $m$th objective function value of the $i$th Map, and $f_m^{max}$ and $f_m^{min}$ are the maximum and minimum values of the $m$th objective function, respectively.
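The diversity estimate above can be sketched as follows, assuming a non-empty list of objective tuples. The function name `diversity` is hypothetical. Giving the two boundary Maps of each objective an infinite value follows the usual NSGA2 crowding-distance convention and is an assumption here, since the text does not state how boundary Maps are treated.

```python
from typing import List, Sequence


def diversity(objs: List[Sequence[float]]) -> List[float]:
    """Accumulate the normalized neighbour gap per objective for every Map."""
    n = len(objs)
    dist = [0.0] * n
    for m in range(len(objs[0])):
        # Sort Map indices by the m-th objective, ascending.
        order = sorted(range(n), key=lambda i: objs[i][m])
        f_min, f_max = objs[order[0]][m], objs[order[-1]][m]
        # Assumed convention: boundary Maps are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if f_max == f_min:
            continue                      # no spread in this objective
        for pos in range(1, n - 1):
            i = order[pos]
            gap = objs[order[pos + 1]][m] - objs[order[pos - 1]][m]
            dist[i] += gap / (f_max - f_min)
    return dist


# Hypothetical (energy, latency) values for three Maps on one front.
d = diversity([(1, 5), (2, 2), (5, 1)])
```

A larger value means the Map sits in a sparser region of the front, so keeping it helps preserve the spread of solutions.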
4.2.3 Selection phase
The selection phase of MMap preserves the high-quality Maps for the next generations. It consists of three steps. First, a combined population $C_t$ of the $n$th and $(n-1)$th generation populations is generated. Second, each Map in $C_t$ is assigned to a rank front. Since all previous and current population members are included in $C_t$, the best Maps can be preserved. Third, if the size of rank 1 is smaller than the population size $N$, we choose all Maps in rank 1 for the new population Generation $n+1$. The remaining members of Generation $n+1$ are chosen from subsequent fronts in order of their ranking. Thus, Maps of rank 2 are chosen after Maps of rank 1. This procedure continues until the number of Maps chosen exceeds $N$. To choose exactly $N$ population members, we estimate the diversity of the members in the last rank; the Maps with larger diversity are added to Generation $n+1$.
4.2.4 Crossover operator and mutation operator
Because the identifiers of the IP cores cannot be duplicated within a Map, the crossover and mutation genetic operators are redefined for our problem. Specifically, the single-point-order crossover between $Map_a$ and $Map_b$ ($Map_n$ is a chromosome which represents one mapping solution) is constructed in the following steps. First, a crossover point is generated randomly. Second, the genes of $Map_b$ behind the crossover point are added to the head of $Map_a$, and the genes of $Map_a$ behind the crossover point are added to the head of $Map_b$. Finally, of the duplicate genes, the ones originally belonging to each Map are deleted. Fig. 3(a) describes an example of the crossover process.
The mutation operator acts on a single Map as follows. Because the tile with the maximum communication volume has a large effect on the system communication energy and latency, it is chosen and exchanged with another randomly chosen tile. Fig. 3(b) shows an example of the mutation process.
4.2.5 Main loop
Initially, a random population Generation 0 of valid Maps is created. Since each gene represents an IP core, it must be unique within a chromosome (Map). Then, the first offspring population is created by traditional selection and the proposed crossover and mutation operators. The subsequent selection is performed as described in Section 4.2.3. The evolution of MMap is an iterative process that terminates if the improvement of the design objectives is less than 0.01% during the last 100 generations.
Fig. 3 Crossover and mutation process of MMap
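The single-point-order crossover and the mutation operator of Section 4.2.4 can be sketched in Python as follows. This is a sketch under assumptions, not the authors' code. A Map is modelled as a list whose position is the tile index and whose value is the core identifier. The function names and the `core_volume` dictionary are hypothetical, and the Map is assumed to have at least two tiles.

```python
import random
from typing import Dict, List, Tuple


def order_crossover(map_a: List[int], map_b: List[int],
                    point: int) -> Tuple[List[int], List[int]]:
    """Prepend the other parent's tail, then drop the duplicated originals."""
    def child(base: List[int], donor: List[int]) -> List[int]:
        tail = donor[point:]               # genes behind the crossover point
        moved = set(tail)
        return tail + [g for g in base if g not in moved]
    return child(map_a, map_b), child(map_b, map_a)


def mutate(mapping: List[int], core_volume: Dict[int, float],
           rng: random.Random) -> List[int]:
    """Swap the tile holding the busiest core with another random tile."""
    result = list(mapping)
    hot = max(range(len(result)), key=lambda t: core_volume[result[t]])
    other = rng.choice([t for t in range(len(result)) if t != hot])
    result[hot], result[other] = result[other], result[hot]
    return result


# Hypothetical six-tile Maps with the crossover point fixed at 4.
child_a, child_b = order_crossover([1, 2, 3, 4, 5, 6], [3, 5, 1, 6, 2, 4], 4)
mutated = mutate([1, 2, 3, 4], {1: 1, 2: 9, 3: 2, 4: 3}, random.Random(0))
```

Both operators return permutations of the parent genes, so every offspring remains a valid one-to-one mapping without a separate repair step.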
4.3 Hierarchical mapping approach
With technology scaling and the integration of large numbers of IP cores into a single chip, the solution space of the topological mapping problem increases drastically. Small populations do not provide enough diversity among the chromosomes. Increasing the population size, however, does not automatically yield an increase in solution quality. The same holds for the number of generations [18]. For these reasons, and because of the limitations of existing optimization strategies, it is difficult for state-of-the-art approaches and MMap to converge towards the Pareto-optimal front and to meet time-to-market requirements. To address these problems, we present HMMap for mapping problems that include large numbers of IPs. The approach overview is presented in Section 2, and Algorithm 2 shows the main loop of this approach.
Algorithm 2 Main loop for HMMap
1) Sort the IPs in non-increasing order of communication volume according to the core graph.
2) Greedily partition the IPs into groups.
3) Apply MMap to the partitioned groups to decide their relative locations.
4) Apply MMap to each group of IPs to decide the relative locations of its cores.
5) Merge the results of steps 3 and 4 to obtain the final solution.
For an $m \times n$ NoC partitioned into groups of size $a \times b$, the solution space of the group-level mapping reduces from $(m \times n)!$ to $((m/a) \times (n/b))!$, and the solution space within each partitioned group is $(a \times b)!$, which is also much smaller. The smaller search space greatly accelerates convergence. Furthermore, relatively small population sizes and numbers of generations are chosen to reduce the computational complexity and the execution time, respectively.
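To give a sense of scale for these factorials, the short Python sketch below computes the three search-space sizes for a given mesh and group shape. The function name `search_space_sizes` is hypothetical, and the sketch assumes that the group dimensions divide the mesh dimensions evenly.

```python
from math import factorial
from typing import Tuple


def search_space_sizes(m: int, n: int, a: int, b: int) -> Tuple[int, int, int]:
    """Return (flat, between-groups, within-one-group) mapping counts."""
    if m % a or n % b:
        raise ValueError("group shape must divide the mesh shape")
    flat = factorial(m * n)                           # (m x n)!
    between_groups = factorial((m // a) * (n // b))   # ((m/a) x (n/b))!
    within_group = factorial(a * b)                   # (a x b)!
    return flat, between_groups, within_group


# A 4x4 mesh split into 2x2 groups.
flat, between, within = search_space_sizes(4, 4, 2, 2)
```

For the 4×4 mesh with 2×2 groups, the flat problem has 16! = 20,922,789,888,000 candidate mappings, while each hierarchical subproblem has only 4! = 24.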
5 Experimental Results and Discussion
Next we present an extensive experimental study involving a set of benchmarks generated by Task Graphs for Free (TGFF) [19] and benchmark applications. The mapping results are simulated using PopNet [20], a C++ network simulator with Orion's power models embedded, to evaluate the communication energy and latency [21]. A deterministic, minimal, deadlock-free routing algorithm named XY routing is used in all these experiments. (In short, for 2D mesh networks, XY routing first routes packets along the X-axis. Once the packets reach the column in which the destination tile lies, they are routed along the Y-axis.)
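The XY routing rule described above can be written down in a few lines. The following Python sketch uses a hypothetical function name and represents tiles as (x, y) tuples. It only enumerates the tiles a packet visits and is not taken from PopNet.

```python
from typing import List, Tuple

Tile = Tuple[int, int]


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Tiles visited when routing along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # move along the X-axis to the column
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then move along the Y-axis
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

The number of hops on such a route always equals the Manhattan distance between the two tiles, which matches the shortest-path assumption of the energy and latency model in Section 3.2.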
5.1 Comparison between MMap and related work
We first compare MMap against a recent and very efficient related work [13] to demonstrate the effectiveness of MMap on the mapping problem. For brevity, we refer to that work as S2M, as it employs the improved algorithm SPEA2 as its evolutionary algorithm. Task graphs of the benchmarks are randomly generated by TGFF. As this paper concerns the mapping process and not assignment or scheduling, the output graphs are randomly assigned to selected IPs available from industry [14]. These IPs are then used to form core graphs. Ten benchmark task graphs are generated for each NoC size, ranging from 4×4 to 18×18. To make the comparison fair, the parameters, such as the population size, the number of generations, and the crossover and mutation probabilities, are carefully selected for both algorithms. For all test benchmarks with MMap and S2M, we use a crossover probability of 0.9 and a mutation probability of 0.01. Various population sizes and numbers of generations are used for different NoCs. For example, we run 100 generations for 6×6 NoCs and 600 generations for 8×8 NoCs. Table 1 and Fig. 4 present the results. In the table, the second and third columns present the runtime of S2M and MMap, the fourth and fifth columns present the energy consumption of S2M and MMap, and the last two columns show the latency of S2M and MMap, respectively.
As shown in the results, MMap generates solutions in much shorter time than S2M. Moreover, though the runtime of S2M is affordable for small systems (e.g. a 6×6 NoC), it increases dramatically as the system size scales up. For instance, for a NoC with 8×8 tiles, MMap needs just several minutes to complete, while the time of S2M becomes prohibitive. (In our experiments, S2M does not finish in 24 h of CPU time. The programs run on a 1.6 GHz IBM OpenPower 720 server.)
Table 1 and Fig. 4 also compare the energy and latency of MMap and S2M for nine different NoC sizes. For small NoCs, the best solutions are evaluated and compared. Although S2M is simulation-based and takes dynamic traffic effects into account, the energy and latency results of the two approaches are almost the same. MMap behaves very similarly to S2M due to the effective evaluation models and the NSGA2-based heuristic. For large NoCs, we only obtain intermediate solutions after one day with S2M, and compare the solutions generated by MMap against them. In this case, MMap improves on S2M in both energy and latency, as expected.
Table 1 Runtime, energy and latency comparisons of MMap against S2M
NoC size   T_S/s        T_M/s     E_S/nJ     E_M/nJ    L_S/ns    L_M/ns
4×4        509.14       0.67      21.46      21.11     13.35     13.15
6×6        882.10       3.08      109.01     107.14    19.09     18.75
8×8        10464.71     89.56     883.44     367.84    260.70    24.56
10×10      Not Finish   246.35    2101.79    736.26    411.78    27.75
12×12      Not Finish   566.14    3973.35    1244.06   726.73    30.67
14×14      Not Finish   1735.67   7681.82    2285.34   945.29    34.60
15×15      Not Finish   2424.09   10870.40   2689.34   1680.81   35.47
16×16      Not Finish   3299.61   34011.98   5690.21   3737.44   53.82
18×18      Not Finish   6527.61   66876.60   7697.50   7364.94   55.24
Fig. 4 Design objectives comparisons of MMap against S2M on various NoCs
5.2 Comparison between HMMap and MMap
To evaluate the scalability of the approaches, we perform experiments involving networks of sizes ranging from 4×4 to 18×18, each with ten benchmarks. HMMap with different group sizes is applied to these benchmarks and the best results are selected for comparison. For an $n \times n$ NoC, an empirical optimal group size of $\lfloor\sqrt{n}\rfloor \times \lfloor\sqrt{n}\rfloor$ is observed in our experiments. Table 2 and Fig. 5 show the average execution time, communication energy and latency results, and the improvements made by HMMap over MMap for different NoCs, which demonstrate the good scalability of HMMap.
It should also be pointed out that the hierarchical partition approach does not necessarily reduce energy and latency. To demonstrate this occasional case, we show the results from a benchmark for a 12×12 NoC in Table 3. HMMap with different group sizes is applied to this benchmark. As can be seen, the results deteriorate when the group size is set to 36. However, we can discard this setting and choose the optimal one, with a group size of 16. Furthermore, for HMMap applied to large NoCs, this kind of deterioration seldom happens, mainly for two reasons. First, HMMap utilizes the traffic characteristic information about which pairs of cores communicate most and which pairs seldom communicate. Second, although MMap performs a search in a larger solution space, Pareto-optimal solutions are not found when the NoC becomes complex, due to the relatively small parameters such as the population size and the number of generations. However, increasing these parameters and modifying the heuristic do not automatically lead to an increase in solution quality [18]. For these reasons, HMMap outperforms MMap in large NoCs (larger than 36 tiles in our experiments).
We should also point out that for smaller NoCs (up to 16 tiles) it is difficult for HMMap to reduce energy and latency while shortening execution time (as shown in Table 2 and Fig. 5). In fact, the execution time of MMap for small NoCs is affordable, so the partition-based hierarchical method is not needed. Moreover, the best approach to find the Pareto solutions for NoCs such as 2×2 is to enumerate the possible mappings, as the search space is only 4!. For systems like 4×4, exhaustive search of the solution space is infeasible, so a heuristic such as MMap can be used to obtain good solutions.
Table 2 Runtime, energy and latency comparisons of HMMap against MMap
NoC size   T_M/s     T_H/s   E_M/nJ    E_H/nJ    L_M/ns   L_H/ns
4×4        0.67      0.095   21.11     29.77     13.15    15.27
6×6        3.08      0.29    107.14    74.22     18.75    16.13
8×8        89.56     5.87    367.84    238.62    24.56    20.16
10×10      246.35    6.80    736.26    460.47    27.75    22.38
12×12      566.14    4.70    1244.06   691.04    30.67    23.51
14×14      1735.67   13.36   2285.34   1393.27   34.60    27.21
15×15      2424.09   15.51   2689.34   1466.99   35.47    26.97
16×16      3299.61   19.14   5690.21   1776.87   53.82    26.73
18×18      6527.61   36.55   7697.50   2381.92   55.24    27.76
Fig. 5 Design objectives comparisons of HMMap against MMap on various NoCs
Table 3 Energy and latency influence of partition on a 12×12 NoC observed from a benchmark
Group size   Energy reduction/%   Latency reduction/%
4            16.36                13.67
9            27.02                22.02
16           28.61                22.71
36           -1.71                -7.55
5.3 Experiments on benchmark applications
We also evaluate the performance of our proposed approach by applying it to benchmark applications. Two cases are retrieved from Ref. [22], namely Sparse matrix solver and SPEC fpppp, which contain 96 and 334 tasks, respectively. We first manually assigned them onto 36 and 144 selected IP cores, respectively. Next, we set the group sizes to four for Sparse matrix solver and sixteen for SPEC fpppp. Then, HMMap and S2M are applied to these applications. Results are shown in Table 4. As we can see, HMMap reduces the execution time, communication energy and latency for Sparse matrix solver by 99.97%, 10.70% and 8.05%, respectively. For SPEC fpppp, HMMap produces 99.99%, 71.04% and 97.06% reductions in execution time, energy and latency, respectively.
The above experiments limit the network topology to $n \times n$ mesh NoCs (where $n$ is not a prime number) for simplicity. HMMap can be applied to other mesh NoCs by modifying the partition method. For instance, a 7×7 NoC can be treated as an 8×8 NoC for partitioning. First, we partition the assumed 8×8 NoC according to the method presented in Section 4.1. Then, the virtual tiles are omitted. Rectangular mesh NoCs such as 4×6 can be partitioned into six 2×2 groups. Fig. 6 demonstrates these two examples.
Table 4 Design objectives comparisons of HMMap against S2M on real applications
Real App.       t_S/s    t_H/s   e_S/nJ   e_H/nJ   l_S/ns   l_H/ns
matrix solver   695.58   0.22    15.45    13.80    15.77    14.50
SPEC