ACM的论文写作格式标准

合集下载

全国大学生数学建模竞赛论文格式规范.doc

全国大学生数学建模竞赛论文格式规范.doc

全国大学生数学建模竞赛论文格式规范(全国大学生数学建模竞赛组委会,2019年修订稿)为了保证竞赛的公平、公正性,便于竞赛活动的标准化管理,根据评阅工作的实际需要,竞赛要求参赛队分别提交纸质版和电子版论文,特制定本规范。

一、纸质版论文格式规范第一条,论文用白色A4纸打印(单面、双面均可);上下左右各留出至少2.5厘米的页边距;从左侧装订。

第二条,论文第一页为承诺书,第二页为编号专用页,具体内容见本规范第3、4页。

第三条,论文第三页为摘要专用页(含标题和关键词,但不需要翻译成英文),从此页开始编写页码;页码必须位于每页页脚中部,用阿拉伯数字从“1”开始连续编号。

摘要专用页必须单独一页,且篇幅不能超过一页。

第四条,从第四页开始是论文正文(不要目录,尽量控制在20页以内);正文之后是论文附录(页数不限)。

第五条,论文附录至少应包括参赛论文的所有源程序代码,如实际使用的软件名称、命令和编写的全部可运行的源程序(含EXCEL、SPSS等软件的交互命令);通常还应包括自主查阅使用的数据等资料。

赛题中提供的数据不要放在附录。

如果缺少必要的源程序或程序不能运行(或者运行结果与正文不符),可能会被取消评奖资格。

论文附录必须打印装订在论文纸质版中。

如果确实没有源程序,也应在论文附录中明确说明“本论文没有源程序”。

第六条,论文正文和附录不能有任何可能显示答题人身份和所在学校及赛区的信息。

第七条,引用别人的成果或其他公开的资料(包括网上资料)必须按照科技论文写作的规范格式列出参考文献,并在正文引用处予以标注。

第八条,本规范中未作规定的,如排版格式(字号、字体、行距、颜色等)不做统一要求,可由赛区自行决定。

在不违反本规范的前提下,各赛区可以对论文增加其他要求。

二、电子版论文格式规范第九条,参赛队应按照《全国大学生数学建模竞赛报名和参赛须知》的要求提交以下两个电子文件,分别对应于参赛论文和相关的支撑材料。

第十条,参赛论文的电子版不能包含承诺书和编号专用页(纸质版的前两页),即电子版论文第一页为摘要专用页。

全国数学建模竞赛论文格式规范解读

全国数学建模竞赛论文格式规范解读
说明:摘要和关键词必须在一 页上,此页不能有正文。
7
6. 论文从第三页开始编写页码,页码必 须位于每页页脚中部,用阿拉伯数字从 “1”开始连续编号。
说明:页码必须有,一定要按 照要求,不要乱编。
8
7. 从第四页开始是论文正文,不要目录。 论文不能有页眉或任何可能显示答题人 身份和所在学校等的信息 。
示文稿。
Microsoft Office PowerPoint
做出来的东西叫演示文稿,其格式后缀名为:
ppt、pptx;或者也可以保存为:pdf、图片
格式等
16
卡盟排行榜 卡盟
网上资源的表述方式为:
[编号] 作者,资源标题,网址,/34566.htm,2011年9月 11日。
22
最后提示!
(4)邮件附件不得使用部分邮件服务商提供 的“云附件”或“超大附件”等临时存储的 功能,在发送邮件时注意正确选择操作。否 则,全国组委会可能无法接收到该参赛队的 论文。 (5)在发送邮件时,务必认真核对邮件主题、 论文编号等关键信息,确保准确无误。否则, 将无法识别该参赛队的论文,可能会被认定 为不成功参赛。
题号(A或B或C或D)赛区编号(简称“区号”,2 位)、学校编号(3位)和校内编号(3位)及参赛 队员姓名
例如:
参赛队是 太原理工大学25队的三名同学张三、李 四、王五选做了A题,相应的论文编号文件名为: A04001025_张三_李四_王五.pdf 压缩文件名:A04001025_张三_李四_王五 .rar
25
最后希望同学们在百忙之中一 定要严格按照《2014全国大学生数 学建模竞赛论文格式规范》的要求 写好自己的数学建模论文,这样就 有最大的希望获得国家奖及高获奖 名次。
26

数学建模论文格式

数学建模论文格式

数学建模论文格式①研究的主要问题;②建立的什么模型;③用的什么求解方法;④主要结果(简洁、主要的);⑤自我评价和推广。

数学建模竞赛章程规定,对竞赛论文的评价应以:①假设的合理性②建模的制造性③结果的正确性④文字表述的清楚性为主要标准。

所以论文中应努力反映出这些特点。

留意:整个版式要完全根据《全国大学生数学建模竞赛论文格式规范》的要求书写,否则无法送全国评奖。

一、问题的重述数学建模竞赛要求解决给定的问题,所以一般应以“问题的重述”开始。

此部分的目的是要吸引读者读下去,所以文字不可冗长,内容选择不要过于分散、琐碎,措辞要精练。

这部分的内容是将原问题进行整理,将已知和问题明确化即可。

留意:在写这部分的内容时,肯定不可照抄原题!应为:在认真理解了问题的基础上,用自己的语言重新将问题描述一篇。

应尽量简短,没有必要像原题一样面面俱到。

二、模型假设作假设时需要留意的问题:①为问题有帮忙的全部假设都应当在此消失,包括题目中给出的假设!②重述不能代替假设!也就是说,虽然你可能在你的问题重述中已经叙述了某个假设,但在这里仍旧要再次叙述!③与题目无关的假设,就不必在此写出了。

三、变量说明为了使读者能更充分的理解你所做的工作,对你的模型中所用到的变量,应一一加以说明,变量的输入必需使用公式编辑器。

留意:①变量说明要全即是说,在后面模型建立模型求解过程中使用到的全部变量,都应当在此加以说明。

②要与数学中的习惯相符,不要使用程序中变量的写法比如:一般表示圆周率;cba,,一般表示常量、已知量;zyx,,一般表示变量、未知量再比如:变量21,aa等,就不要写成:a[0],a[1]或a(1),a(2)四、模型的建立与求解这一部分是文章的重点,要特殊突出你的制造性的工作。

在这部分写作需要留意的事项有:①肯定要有分析,而且分析应在所建立模型的前面;②肯定要有明确的模型,不要让别人在你的文章中去找你的模型;③关系式肯定要明确;思路要清楚,易读易懂。

数学建模竞赛论文格式规范

数学建模竞赛论文格式规范

数学建模竞赛论文格式规范(1)本科组参赛队从A、B题中任选一题。

(2)论文第一页为承诺书,具体内容和格式见本规范第二页。

论文第二页为编号专用页,用于赛区和全国评阅前后对论文进行编号,具体内容和格式见本规范第三页。

论文题目、摘要和关键词写在论文第三页上(无需译成英文),并从此页开始编写页码;页码必须位于每页页脚中部,用阿拉伯数字从“1”开始连续编号。

注意:摘要应该是一份简明扼要的详细摘要,请认真书写(但篇幅不能超过一页)。

从第四页开始是论文正文(不要目录)。

论文不能有页眉或任何可能显示答题人身份和所在学校等的信息。

(3)论文应该思路清晰,表达简洁(正文尽量控制在20页以内,附录页数不限)。

引用别人的成果或其他公开的资料(包括网上查到的资料) 必须按照规定的参考文献的表述方式在正文引用处和参考文献中均明确列出。

正文引用处用方括号标示参考文献的编号,如[1][3]等;引用书籍还必须指出页码。

参考文献按正文中的引用次序列出,其中书籍的表述方式为:[编号] 作者,书名,出版地:出版社,出版年。

参考文献中期刊杂志论文的表述方式为:[编号] 作者,论文名,杂志名,卷期号:起止页码,出版年。

参考文献中网上资源的表述方式为:[编号] 作者,资源标题,网址,访问时间(年月日)。

在论文附录中,应给出参赛者实际使用的软件名称、命令和编写的全部计算机源程序(若有的话)。

同时,所有源程序文件必须放入论文电子版中备查。

论文及源程序电子版压缩在一个文件中,一般不要超过20MB。

(如果发现程序不能运行,或者运行结果与论文中报告的不一致,该论文可能会被认定为弄虚作假而被取消评奖资格。

)本规范中未作规定的,如排版格式(字号、字体、行距、颜色等)不做统一要求,可由参赛同学自行决定。

数学建模美赛论文格式中文版Word版

数学建模美赛论文格式中文版Word版

你的论文需要从此开始请居中使用Arial14字体第一作者,第二作者和其他(使用Arial14字体)1.第一作者的详细地址,包括国籍和email(使用Arial11)2.第二作者的详细地址,包括国籍和email(使用Arial11)3.将所有的详细信息标记为相同格式关键词列出文章的关键词。

这些关键词会被出版方用作关键词索引(使用Arial11字体)论文正文使用Times New Roman12字体摘要这一部分阐述说明了如何为TransTechPublications.准备手稿。

最好阅读这些用法说明并且整篇论文都是遵照这个提纲。

手稿的正文部分应该是17cm*25cm(宽*高)的格式(或者是6.7*9.8英尺)。

请不要在这个区域以外书写。

请使用21*29厘米或8*11英尺的质量较好的白纸。

你的手稿可能会被出版商缩减20%。

在制图和绘表格时候请特别注意这些准则。

引言所有的语言都应该是英语。

请备份你的手稿(以防在邮寄过程中丢失)我们收到手稿即默认为原作者允许我们在期刊和书报出版。

如果作者在论文中使用了其他刊物中的图表,他们需要联系原作者,获取使用权。

将单词或词组倾斜以示强调。

除了每一部分的标题(标记部分的标题),不要加粗正文或大写首字母。

使用激光打印机,而不是点阵打印机正文的组织:小标题小标题应该加粗并注意字母的大小写。

第二等级的小标题被视为后面段落的一部分(就像这一大段的一小部分的开头)页码不要打印页码。

请用淡蓝色铅笔在每一张纸的左下角(在打印区域以外)标注数字。

脚注脚注应该单独放置并且和正文分开理想地情况下,脚注应该出现在参考文献页,并且放在文章的末尾,和正文用分割线分开。

表格表格(如表一,表二,...)应该放在正文当中,是正文的一部分,但是,要避免文本混乱。

一个描述性的表格标题要放在图表的下方。

标题应该独立的放在表格的下方或旁边。

表中的单位应放在中括号中[兆伏]如果中括号不可用,需使用大括号{兆}或小括号(兆)。

XX大学数学建模竞赛论文格式规范【模板】(3)

XX大学数学建模竞赛论文格式规范【模板】(3)

**大学数学建模竞赛论文格式规范
●参赛队从A、B、C、D题中任选一题。

●论文用白色A4纸单面打印;上下左右各留出至少2.5厘米的页边距。

●论文第一页为参赛队信息,具体内容和格式见本规范第二页。

●论文题目用三号黑体字、一级标题用四号黑体字,并居中;二级、三级标题用小四
号黑体字,左端对齐(不居中)。

论文中其他汉字一律采用小四号宋体字,行距用单倍行距。

●提请大家注意:摘要应该是一份简明扼要的详细摘要(包括关键词),在整篇论文
评阅中占有重要权重,请认真书写(注意篇幅不能超过一页,且无需译成英文)。

评阅时将首先根据摘要和论文整体结构及概貌对论文优劣进行初步筛选。

●论文应该思路清晰,表达简洁(正文尽量控制在30页以内,附录页数不限)。

●引用别人的成果或其他公开的资料(包括网上查到的资料) 必须按照规定的参考文
献的表述方式在正文引用处和参考文献中均明确列出。

正文引用处用方括号标示参考文献的编号,如[1][3]等;引用书籍还必须指出页码。

参考文献按正文中的引用次序列出,其中书籍的表述方式为:
[编号] 作者,书名,出版地:出版社,出版年。

参考文献中期刊杂志论文的表述方式为:
[编号] 作者,论文名,杂志名,卷期号:起止页码,出版年。

参考文献中网上资源的表述方式为:
[编号] 作者,资源标题,网址,访问时间(年月日)。

2015年**大学数学建模竞赛
参赛队员信息
参赛题号(从A/B/C/D中选择一项填写):。

计算机科学与技术毕业论文排版格式要求

计算机科学与技术毕业论文排版格式要求

计算机科学与技术毕业论文排版格式要求本文根据计算机科学与技术专业毕业生要求编写符合写作规范的毕业论文格式排版,仅供计算机科学与技术专业学生参考使用。

摘要部分:标题“摘要”使用:黑体,居中,字号:小三,1.5倍行距,段后11磅,段前为0。

摘要内容设置为:每段落首行缩进2个汉字,字体:宋体,字号:小四,行距:多倍行距 1.25,间距:前段、后段均为0行,取消网格对齐选项。

字数为600-800字。

关键字:“关键词:”是关键词部分的引导,不可省略,用黑体字,小四号,加粗。

列出3-5个关键词即可,关键字之间用分号间隔,末尾不用加标点。

引言:标题“引言”设置成字体:黑体,居中,字号:小三,1.5倍行距,段后11磅,段前为0。

引言正文设置成每段落首行缩进2字,字体:宋体,字号:小四,行距:多倍行距 1.25,间距:前段、后段均为0行,取消网格对齐选项。

论文格式基本要求:(1)纸型:A4纸,双面打印;(2)页边距:上87.5px,下62.5px,左62.5px、右62.5px;(3)页眉:62.5px,页脚:50px,左侧装订;(4)字体:正文全部宋体、小四;(5)行距:多倍行距:1.25,段前、段后均为0,取消网格对齐选项。

页眉页脚编排:奇数页页眉,宋体,五号,居中。

偶数页页眉,宋体,五号,居中。

填写内容是论文的中文题目。

正文:每段落首行缩进2字,字体:宋体,字号:小四,行距:多倍行距 1.25,间距:前段、后段均为0行,取消网格对齐选项。

章节标题:(1) 每章的章标题选用模板中的样式所定义的“标题1”,居左;或者手动设置成字体:黑体,居左,字号:小三,1.5倍行距,段后11磅,段前为0。

每章另起一页。

章序号为阿拉伯数字。

在输入章标题之后,按回车键,即可直接输入每章正文。

(2) 每节的节标题选用模板中的样式所定义的“标题2”,居左;或者手动设置成字体:黑体,居左,字号:四号,1.5倍行距,段后为0,段前0.5行。

数学建模论文写作要求

数学建模论文写作要求

数学建模论文格式规范
1、论文用白色A4纸单面打印,上下左右各留出2.5厘米的页边距。

2、论文页码必须位于每页页脚中部,用阿拉伯数字从“1”开始连续编号,不
能有页眉。

3、论文题目用三号宋体加粗、一级标题用四号宋体加粗,并居中。

论文中其他
汉字一律采用小四号宋体字,行距用1.5倍行距。

4、摘要应该是简明扼要(包括关键词),要求用一两句话说明题目中解决的问
题是什么、用什么模型解决的、求解方法是什么、结果如何、有无改进和推广。

5、论文写作要求,应基本包括(1)-(9)个方面的内容。

(1)摘要;(2)
问题的提出;(3)模型的假设及符号说明;(4)模型的建立;(5)模型的求解;(6)结果分析及相关说明;(7)参考文献;(8)程序和附件。

6、引用别人的成果或其他公开的资料(包括网上查到的资料) 必须按照规定的
参考文献的表述方式在正文引用处和参考文献中均明确列出。

正文引用处用方括号标示参考文献的编号,如[1][3]等;引用书籍还必须指出页码。

参考文献按正文中的引用次序列出,其中书籍的表述方式为:
[编号] 作者,书名,出版地:出版社,出版年。

参考文献中期刊杂志论文的表述方式为:
[编号] 作者,论文名,杂志名,卷期号:起止页码,出版年。

参考文献中网上资源的表述方式为:
[编号] 作者,资源标题,网址,访问时间(年月日)。

全国大学生数学建模竞赛论文格式规范

全国大学生数学建模竞赛论文格式规范

全国大学生数学建模竞赛论文格式规范●本科组参赛队从A、B题中任选一题,专科组参赛队从C、D题中任选一题。

(全国评奖时,每个组别一、二等奖的总名额按每道题参赛队数的比例分配;但全国一等奖名额的一半将平均分配给本组别的每道题,另一半按每道题参赛队比例分配。

)●论文用白色A4纸单面打印;上下左右各留出至少2.5厘米的页边距;从左侧装订。

●论文第一页为承诺书,具体内容和格式见本规范第二页。

●论文第二页为编号专用页,用于赛区和全国评阅前后对论文进行编号,具体内容和格式见本规范第三页。

●论文题目、摘要和关键词写在论文第三页上,从第四页开始是论文正文,不要目录。

●论文从第三页开始编写页码,页码必须位于每页页脚中部,用阿拉伯数字从“1”开始连续编号。

●论文不能有页眉,论文中不能有任何可能显示答题人身份的标志。

●论文题目用三号黑体字、一级标题用四号黑体字,并居中;二级、三级标题用小四号黑体字,左端对齐(不居中)。

论文中其他汉字一律采用小四号宋体字,行距用单倍行距。

打印文字内容时,应尽量避免彩色打印(必要的彩色图形、图表除外)。

●提请大家注意:摘要应该是一份简明扼要的详细摘要(包括关键词),在整篇论文评阅中占有重要权重,请认真书写(注意篇幅不能超过一页,且无需译成英文)。

全国评阅时将首先根据摘要和论文整体结构及概貌对论文优劣进行初步筛选。

●论文应该思路清晰,表达简洁(正文尽量控制在20页以内,附录页数不限)。

●在论文纸质版附录中,应给出参赛者实际使用的软件名称、命令和编写的全部计算机源程序(若有的话)。

同时,所有源程序文件必须放入论文电子版中备查。

论文及程序电子版压缩在一个文件中,一般不要超过20MB,且应与纸质版同时提交。

●引用别人的成果或其他公开的资料(包括网上查到的资料) 必须按照规定的参考文献的表述方式在正文引用处和参考文献中均明确列出。

正文引用处用方括号标示参考文献的编号,如[1][3]等;引用书籍还必须指出页码。

大学生数学建模邀请赛论文格式规范

大学生数学建模邀请赛论文格式规范

大学生数学建模邀请赛论文格式规范●各参赛队从A、B、C题中任选一题。

●论文页面为A4纸纵向,上下左右各留出2.5厘米的页边距。

●论文题目和摘要写在第一页上,第一页不写页码,不写单位和作者姓名,不写关键词,不写英文摘要。

正文从第二页开始,页码位于每页页脚中部,用阿拉伯数字从"1"开始连续编号。

●论文题目用三号黑体字、一级标题用四号黑体字,并居中;二级、三级标题用小四号黑体字,左端对齐。

论文中其他汉字一律采用小四号宋体字,行距用单倍行距。

提交的论文电子稿一律是PDF文件,不得压缩,命名规则为"题号队号.pdf",题号为A,B,C之一。

论文封面不得出现学校名或人名,以队号为唯一识别方式。

为彰显比赛的公平性,队号(四位数)由电脑随机产生,比赛前发送到各队联系邮箱中。

如某参赛队选A题,则命名为"AXXXX.pdf",“XXXX”即为队号。

其他文件(如程序文件等)的命名规则同上。

●提请大家注意:摘要应该是一份简明扼要的详细摘要,在整篇论文评阅中占有重要权重,请认真书写(注意篇幅不能超过一页)。

评阅时将首先根据摘要和论文整体结构及概貌对论文优劣进行初步筛选。

●论文应该思路清晰,表达简洁(正文尽量控制在20页以内,附录页数不限)。

●引用别人的成果或其他公开的资料(包括网上查到的资料) 必须按照规定的参考文献的表述方式在正文引用处和参考文献中均明确列出。

正文引用处用方括号标示参考文献的编号,如[1][3]等;引用书籍还必须指出页码。

参考文献按正文中的引用次序列出,其中书籍的表述方式为:[编号] 作者,书名,出版地:出版社,出版年。

参考文献中期刊杂志论文的表述方式为:[编号] 作者,论文名,杂志名,卷期号:起止页码,出版年。

参考文献中网上资源的表述方式为:[编号] 作者,资源标题,网址,访问时间(年月日)。

●本规范的解释权归本届大学生数学建模邀请赛组委会所有。

数学建模论文格式

数学建模论文格式

数学建模论文格式①研究的主要问题;②建立的什么模型;③用的什么求解方法;④主要结果(简单、主要的);⑤自我评价和推广。

数学建模竞赛章程规定,对竞赛论文的评价应以:①假设的合理性②建模的创造性③结果的正确性④文字表述的清晰性为主要标准。

所以论文中应努力反映出这些特点。

注意:整个版式要完全按照《全国大学生数学建模竞赛论文格式规范》的要求书写,否则无法送全国评奖。

一、问题的重述数学建模竞赛要求解决给定的问题,所以一般应以“问题的重述”开始。

此部分的目的是要吸引读者读下去,所以文字不可冗长,内容选择不要过于分散、琐碎,措辞要精练。

这部分的内容是将原问题进行整理,将已知和问题明确化即可。

注意:在写这部分的内容时,绝对不可照抄原题!应为:在仔细理解了问题的基础上,用自己的语言重新将问题描述一篇。

应尽量简短,没有必要像原题一样面面俱到。

二、模型假设作假设时需要注意的问题:①为问题有帮助的所有假设都应该在此出现,包括题目中给出的假设!②重述不能代替假设!也就是说,虽然你可能在你的问题重述中已经叙述了一些假设,但在这里仍然要再次叙述!③与题目无关的假设,就不必在此写出了。

三、变量说明为了使读者能更充分的理解你所做的工作,①变量说明要全即是说,在后面模型建立模型求解过程中使用到的所有变量,都应该在此加以说明。

②要与数学中的习惯相符,不要使用程序中变量的写法比如:一般表示圆周率;cba,一般表示常量、已知量;zyx,一般表示变量、未知量再比如:变量21,aa等,就不要写成:a[0],a[1]或a(1),a(2)四、模型的建立与求解这一部分是文章的重点,要特别突出你的创造性的工作。

在这部分写作需要注意的事项有:①一定要有分析,而且分析应在所建立模型的前面;②一定要有明确的模型,不要让别人在你的文章中去找你的模型;③关系式一定要明确;思路要清晰,易读易懂。

④建模与求解一定要截然分开;⑤结果不能代替求解过程:必须要有必要的求解过程和步骤!最好能像写算法一样,一步一步的写出其步骤;⑥结果必须放在这一部分的结果中,不能放在附录里。

全国大学生数学建模竞赛 论文格式规范(含编辑步骤)

全国大学生数学建模竞赛 论文格式规范(含编辑步骤)
注意:论文中所有的英文、数字都用“Times New Roman”字体。
全国大学生数学建模竞赛论文格式规范
五、引用别人的成果或其他公开的资料(包括网上 查到的资料) 必须按照规定的参考文献的表述方式 在正文引用处和参考文献中均明确列出。正文引 用处用方括号标示参考文献的编号,如[1][3]等; 引用书籍还必须指出页码。参考文献按正文中的 引用次序列出。
全国大学生数学建模竞赛论文格式规范
3、插入页码 单击需要添加页码的位置—单击工具栏中 的“插入页码”—选中页码—调整页码的 段落位置为“居中对齐”。
全国大学生数学建模竞赛论文格式规范
三、从第四页开始是论文正文(不要目 录)。论文不能有页眉或任何可能显示答 题人身份和所在学校等的信息。 删除页眉的横线:
含格式编辑的具体步骤!好资料! 全国大学生数学建模竞赛论文格式规范
一、论文用白色A4纸单面打印;上下左右 各留出至少2.5厘米的页边距;从左侧装订。
全国大学生数学建模竞赛论文格式规范
二、论文从第三页开始编写页码,页码必 须位于每页页脚中部,用阿拉伯数字从“1” 开始范
全国大学生数学建模竞赛论文格式规范
(2)插入目录 在文章“摘要”后单独插入一页—“插 入”—“引用”—“索引和目录”—进入“目 录”选项卡—设置目录格式—“确定”。
如何插入页码:先把文章分成若干节,再把 节与节之间断开,最后对每一节分别插入页 码。 1、插入分节符 “插入”—“分隔符”—“连续”—“确定”。
在“编号专用页”最后一行插入一个分节符, 如果要添加目录,在目录的最后一页也需要插入一 个分节符。
全国大学生数学建模竞赛论文格式规范
2、将节与节之间断开
“视图”—“页眉和页脚”—切换到页脚—单击页 眉和页脚工具栏中的“链接到前一个”,取消链 接—单击页眉和页脚工具栏中的“设置页码格 式”—设置“数字格式”和“起始页码”。

ACM的论文写作格式标准

ACM的论文写作格式标准

ACM Word Template for SIG Site1st Author1st author's affiliation1st line of address2nd line of address Telephone number, incl. country code 1st author's E-mail address2nd Author2nd author's affiliation1st line of address2nd line of addressTelephone number, incl. country code2nd E-mail3rd Author3rd author's affiliation1st line of address2nd line of addressTelephone number, incl. country code3rd E-mailABSTRACTA s network speed continues to grow, new challenges of network processing is emerging. In this paper we first studied the progress of network processing from a hardware perspective and showed that I/O and memory systems become the main bottlenecks of performance promotion. Basing on the analysis, we get the conclusion that conventional solutions for reducing I/O and memory accessing latencies are insufficient for addressing the problems.Motivated by the studies, we proposed an improved DCA combined with INIC solution which has creations in optimized architectures, innovative I/O data transferring schemes and improved cache policies. Experimental results show that our solution reduces 52.3% and 14.3% cycles on average for receiving and transmitting respectively. Also I/O and memory traffics are significantly decreased. Moreover, an investigation to the behaviors of I/O and cache systems for network processing is performed. And some conclusions about the DCA method are also presented.KeywordsKeywords are your own designated keywords.1.INTRODUCTIONRecently, many researchers found that I/O system becomes the bottleneck of network performance promotion in modern computer systems [1][2][3]. Aim to support computing intensive applications, conventional I/O system has obvious disadvantages for fast network processing in which bulk data transfer is performed. The lack of locality support and high latency are the two main problems for conventional I/O system, which have been wildly discussed before [2][4].To overcome the limitations, an effective solution called Direct Cache Access (DCA) is suggested by INTEL [1]. It delivers network packages from Network Interface Card (NIC) into cache instead of memory, to reduce the data accessing latency. Although the solution is promising, it is proved that DCA is insufficient to reduce the accessing latency and memory traffic due to many limitations [3][5]. Another effective solution to solve the problem is Integrated Network Interface Card (INIC), which is used in many academic and industrial processor designs [6][7]. INIC is introduced to reduce the heavy burden for I/O registers access in Network Drivers and interruption handling. But recent report [8] shows that the benefit of INIC is insignificant for the state of the art 10GbE network system.In this paper, we focus on the high efficient I/O system design for network processing in general-purpose-processor (GPP). Basing on the analysis of existing methods, we proposed an improved DCA combined with INIC solution to reduce the I/O related data transfer latency.The key contributions of this paper are as follows:▪Review the network processing progress from a hardware perspective and point out that I/O and related last level memory systems have became the obstacle for performance promotion.▪Propose an improved DCA combined with INIC solution for I/O subsystem design to address the inefficient problem of a conventional I/O system.▪Give a framework of the improved I/O system architecture and evaluate the proposed solution with micro-benchmarks.▪Investigate I/O and Cache behaviors in the network processing progress basing on the proposed I/O system.The paper is organized as follows. In Section 2, we present the background and motivation. In Section 3, we describe the improved DCA combined INIC solution and give a framework of the proposed I/O system implementation. In Section 4, firstly we give the experiment environment and methods, and then analyze the experiment results. In Section 5, we show some related works. Finally, in Section 6, we carefully discuss our solutions with many existing technologies, and then draw some conclusions.2.Background and MotivationIn this section, firstly we revise the progress of network processing and the main network performance improvement bottlenecks nowadays. Then from the perspective of computer architecture, a deep analysis of network system is given. Also the motivation of this paper is presented.2.1Network processing reviewFigure 1 illustrates the progress of network processing. Packages from physical line are sampled by Network Interface Card (NIC). NIC performs the address filtering and stream control operations, then send the frames to the socket buffer and notifies OS to invoke network stack processing by interruptions. When OS receives the interruptions, the network stack accesses the data in socket buffer and calculates the checksum. Protocol specific operations are performed layer by layer in stack processing. Finally, data is transferred from socket buffer to the user buffer depended on applications. Commonly this operation is done by memcpy, a system function in OS.Figure 1. Network Processing FlowThe time cost of network processing can be mainly broke down into following parts: Interruption handling, NIC driver, stack processing, kernel routine, data copy, checksum calculation and other overheads. The first 4 parts are considered as packet cost, which means the cost scales with the number of network packets. The rests are considered as bit cost (also called data touch cost), which means the cost is in proportion to the total I/O data size. The proportion of the costs highly depends on the hardware platform and the nature of applications. There are many measurements and analyses about network processing costs [9][10]. Generally, the kernel routine cost ranges from 10% - 30% of the total cycles; the driver and interruption handling costs range from 15% - 35%; the stack processing cost ranges from 7% - 15%; and data touch cost takes up 20% - 35%. With the development of high speed network (e.g. 10/40 Gbps Ethernet), an increasing tendency for kernel routines, driver and interruption handling costs is observed [3].2.2 MotivationTo reveal the relationship among each parts of network processing, we investigate the corresponding hardware operations. From the perspective of computerhardware architecture, network system performance is determined by three domains: CPU speed, Memory speed and I/O speed. Figure 2 depicts the relationship.Figure 2. Network xxxxObviously, the network subsystem can achieve its maximal performance only when the three domains above are in balance. It means that the throughput or bandwidth ofeach hardware domain should be equal with others. Actually this is hard for hardware designers, because the characteristics and physical implementation technologies are different for CPU, Memory and I/O system (chipsets) fabrication. The speed gap between memory and CPU – a.k.a “the memory wall” – has been paid special attention for more than ten years, but still it is not well addressed. Also the disparity between the data throughput in I/O system and the computing capacity provided by CPU has been reported in recent years [1][2].Meanwhile, it is obvious that the major time costs of network processing mentioned above are associated with I/O and Memory speeds, e.g. driver processing, interruption handling, and memory copy costs. The most important nature of network processing is the “producer -consumer locality” between every two consecutive steps of the processing flow. That means the data produced in one hardware unit will be immediately accessed by another unit, e.g. the data in memory which transported from NIC will be accessed by CPU soon. However for conventional I/O and memory systems, the data transfer latency is high and the locality is not exploited.Basing on the analysis discussed above, we get the observation that the I/O and Memory systems are the limitations for network processing. Conventional DCA or INIC cannot successfully address this problem, because it is in-efficient in either I/O transfer latency or I/O data locality utilization (discussed in section 5). To diminish these limitations, we present a combined DCA with INIC solution. The solution not only takes the advantages of both method but also makes many improvements in memory system polices and software strategies.3. Design MethodologiesIn this section, we describe the proposed DCA combined with INIC solution and give a framework of the implementation. Firstly, we present the improved DCA technology and discuss the key points of incorporating it into I/O and Memory systems design. Then, the important software data structures and the details of DCA scheme are given. Finally, we introduce the system interconnection architecture and the integration of NIC.3.1 Improved DCAIn the purpose of reducing data transfer latency and memory traffic in system, we present an improved Direct Cache Access solution. Different with conventional DCA scheme, our solution carefully consider the following points. The first one is cache coherence. Conventionally, data sent from device by DMA is stored in memory only. And for the same address, a different copy of data is stored in cache which usually needs additional coherent unit to perform snoop operation [11]; but when DCA is used, I/O data and CPU data are both stored in cache with one copy for one memory address, shown in figure 4. So our solution modifies the cache policy, which eliminated the snoopingoperations. Coherent operation can be performed by software when needed. This will reduce much memory traffic for the systems with coherence hardware [12].I/O write *(addr) = bCPU write *(addr) = aCacheCPU write *(addr) = a I/O write with DCA*(addr) = bCache(a) cache coherance withconventional I/O(b) cache coherance withDCA I/OFigure 3. xxxxThe second one is cache pollution. DCA is a mixed blessing to CPU: On one side, it accelerates the data transfer; on the other side, it harms the locality of other programs executed in CPU and causes cache pollution. Cache pollution is highly depended on the I/O data size, which is always quite large. E.g. one Ethernet package contains a maximal 1492 bytes normal payload and a maximal 65536 bytes large payload for Large Segment Offload (LSO). That means for a common network buffer (usually 50 ~ 400 packages size), a maximal size range from 400KB to 16MB data is sent to cache. Such big size of data will cause cache performance drop dramatically. In this paper, we carefully investigate the relationship between the size of I/O data sent by DCA and the size of cache system. To achieve the best cache performance, a scheme of DCA is also suggested in section 4. Scheduling of the data sent with DCA is an effective way to improve performance, but it is beyond the scope of this paper.The third one is DCA policy. DCA policy refers the determination of when and which part of the data is transferred with DCA. Obviously, the scheme is application specific and varies with different user targets. In this paper, we make a specific memory address space in system to receive the data transferred with DCA. The addresses of the data should be remapped to that area by user or compilers.3.2 DCA Scheme and detailsTo accelerate network processing, many important software structures used in NIC driver and the stack are coupled with DCA. NIC Descriptors and the associated data buffers are paid special attention in our solution. The former is the data transfer interface between DMA and CPU, and the later contains the packages. For farther research, each package stored in buffer is divided into the header and the payload. Normally the headers are accessed by protocols frequently, but the payload is accessed only once or twice (usually performed as memcpy) in modern network stack and OS. The details of the related software data structures and the network processing progress can be found in previous works [13].The progress of transfer one package from NIC to the stack with the proposed solution is illustrated in Table 1. All the accessing latency parameters in Table 1 are based on a state of the art multi-core processor system [3]. One thing should be noticed is that the cache accessing latency from I/O is nearly the same with that from CPU. But the memory accessing latency from I/O is about 2/3 of that from CPU due to the complex hardware hierarchy above the main memory.Table 1. Table captions should be placed above the tabletransfer.We can see that DCA with INIC solution saves above 95% CPU cycles in theoretical and avoid all the traffic to memory controller. In this paper, we transfer the NIC Descriptors and the data buffers including the headers and payload with DCA to achieve the best performance. But when cache size is small, only transfer the Descriptors and the headers with DCA is an alternative solution.DCA performance is highly depended on system cache policy. Obviously for cache system, write-back with write-allocate policy can help DCA achieves better performance than write-through with write non-allocate policy. Basing on the analysis in section 3.1, we do not use the snooping cache technology to maintain the coherence with memory. Cache coherence for other non-DCA I/O data transfer is guaranteed by software.3.3 On-chip network and integrated NICFootnotes should be Times New Roman 9-point, and justified to the full width of the column.Use the “ACM Reference format” for references – that is, a numbered list at the end of the article, ordered alphabetically and formatted accordingly. See examples of some typical reference types, in the new “ACM Reference format”, at the end of this document. Within this template, use the style named referencesfor the text. Acceptable abbreviations, for journal names, can be found here: /reference/abbreviations/. Word may try to automatically ‘underline’ hotlinks in your references, the correct style is NO underlining.The references are also in 9 pt., but that section (see Section 7) is ragged right. References should be published materials accessible to the public. Internal technical reports may be cited only if they are easily accessible (i.e. you can give the address to obtain thereport within your citation) and may be obtained by any reader. Proprietary information may not be cited. Private communications should be acknowledged, not referenced (e.g., “[Robertson, personal communication]”).3.4Page Numbering, Headers and FootersDo not include headers, footers or page numbers in your submission. These will be added when the publications are assembled.4.FIGURES/CAPTIONSPlace Tables/Figures/Images in text as close to the reference as possible (see Figure 1). It may extend across both columns to a maximum width of 17.78 cm (7”).Captions should be Times New Roman 9-point bold. They should be numbered (e.g., “Table 1” or “Figure 2”), please note that the word for Table and Figure are spelled out. Figure’s captions should be centered beneath the image or picture, and Table captions should be centered above the table body.5.SECTIONSThe heading of a section should be in Times New Roman 12-point bold in all-capitals flush left with an additional 6-points of white space above the section head. Sections and subsequent sub- sections should be numbered and flush left. For a section head and a subsection head together (such as Section 3 and subsection 3.1), use no additional space above the subsection head.5.1SubsectionsThe heading of subsections should be in Times New Roman 12-point bold with only the initial letters capitalized. (Note: For subsections and subsubsections, a word like the or a is not capitalized unless it is the first word of the header.)5.1.1SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized and 6-points of white space above the subsubsection head.5.1.1.1SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized.5.1.1.2SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized.6.ACKNOWLEDGMENTSOur thanks to ACM SIGCHI for allowing us to modify templates they had developed. 7.REFERENCES[1]R. Huggahalli, R. Iyer, S. Tetrick, "Direct Cache Access forHigh Bandwidth Network I/O", ISCA, 2005.[2] D. Tang, Y. Bao, W. Hu et al., "DMA Cache: Using On-chipStorage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance", HPCA, 2010.[3]Guangdeng Liao, Xia Zhu, Laxmi Bhuyan, “A New ServerI/O Architecture for High Speed Networks,” HPCA, 2011. [4] E. A. Le´on, K. B. Ferreira, and A. B. Maccabe. Reducingthe Impact of the MemoryWall for I/O Using Cache Injection, In 15th IEEE Symposium on High-PerformanceInterconnects (HOTI’07), Aug, 2007.[5] A.Kumar, R.Huggahalli, S.Makineni, “Characterization ofDirect Cache Access on Multi-core Systems and 10GbE”,HPCA, 2009.[6]Sun Niagara 2,/processors/niagara/index.jsp[7]PowerPC[8]Guangdeng Liao, L.Bhuyan, “Performance Measurement ofan Integrated NIC Architecture with 10GbE”, 17th IEEESymposium on High Performance Interconnects, 2009. [9] A.Foong et al., “TCP Performance Re-visited,” IEEE Int’lSymp on Performance Analysis of Software and Systems,Mar 2003[10]D.Clark, V.Jacobson, J.Romkey, and H.Saalwen. “AnAnalysis of TCP processing overhead”. IEEECommunications,June 1989.[11]J.Doweck, “Inside Intel Core microarchitecture and smartmemory access”, Intel White Paper, 2006[12]Amit Kumar, Ram Huggahalli., Impact of Cache CoherenceProtocols on the Processing of Network Traffic[13]Wenji Wu, Matt Crawford, “Potential performancebottleneck in Linux TCP”, International Journalof Communication Systems, Vol. 20, Issue 11, pages 1263–1283, November 2007.[14]Weiwu Hu, Jian Wang, Xiang Gao, et al, “Godson-3: ascalable multicore RISC processor with x86 emulation,”IEEE Micro, 2009. 29(2): pp. 17-29.[15]Cadence Incisive Xtreme Series./products/sd/ xtreme_series.[16]Synopsys GMAC IP./dw/dwtb.php?a=ethernet_mac [17]ler, P.M.Watts, A.W.Moore, "Motivating FutureInterconnects: A Differential Measurement Analysis of PCILatency", ANCS, 2009.[18]Nathan L.Binkert, Ali G.Saidi, Steven K.Reinhardt.Integrated Network Interfaces for High-Bandwidth TCP/IP.Figure 1. Insert caption to place caption below figure.Proceedings of the 12th international conferenceon Architectural support for programming languages and operating systems (ASPLOS). 2006[19]G.Liao, L.Bhuyan, "Performance Measurement of anIntegrated NIC Architecture with 10GbE", HotI, 2009. [20]Intel Server Network I/O Acceleration./technology/comms/perfnet/downlo ad/ServerNetworkIOAccel.pdfColumns on Last Page Should Be Made As Close AsPossible to Equal Length。

全国大学生数学建模竞赛论文格式规范

全国大学生数学建模竞赛论文格式规范

全国大学生数学建模竞赛论文格式规范(全国大学生数学建模竞赛组委会,2019年修订稿)为了保证竞赛的公平、公正性,便于竞赛活动的标准化管理,根据评阅工作的实际需要,竞赛要求参赛队分别提交纸质版和电子版论文,特制定本规范。

一、纸质版论文格式规范第一条,论文用白色A4纸打印(单面、双面均可);上下左右各留出至少2.5厘米的页边距;从左侧装订。

第二条,论文第一页为承诺书,第二页为编号专用页,具体内容见本规范第3、4页。

第三条,论文第三页为摘要专用页(含标题和关键词,但不需要翻译成英文),从此页开始编写页码;页码必须位于每页页脚中部,用阿拉伯数字从“1”开始连续编号。

摘要专用页必须单独一页,且篇幅不能超过一页。

第四条,从第四页开始是论文正文(不要目录,尽量控制在20页以内);正文之后是论文附录(页数不限)。

第五条,论文附录至少应包括参赛论文的所有源程序代码,如实际使用的软件名称、命令和编写的全部可运行的源程序(含EXCEL、SPSS等软件的交互命令);通常还应包括自主查阅使用的数据等资料。

赛题中提供的数据不要放在附录。

如果缺少必要的源程序或程序不能运行(或者运行结果与正文不符),可能会被取消评奖资格。

论文附录必须打印装订在论文纸质版中。

如果确实没有源程序,也应在论文附录中明确说明“本论文没有源程序”。

第六条,论文正文和附录不能有任何可能显示答题人身份和所在学校及赛区的信息。

第七条,引用别人的成果或其他公开的资料(包括网上资料)必须按照科技论文写作的规范格式列出参考文献,并在正文引用处予以标注。

第八条,本规范中未作规定的,如排版格式(字号、字体、行距、颜色等)不做统一要求,可由赛区自行决定。

在不违反本规范的前提下,各赛区可以对论文增加其他要求。

二、电子版论文格式规范第九条,参赛队应按照《全国大学生数学建模竞赛报名和参赛须知》的要求提交以下两个电子文件,分别对应于参赛论文和相关的支撑材料。

第十条,参赛论文的电子版不能包含承诺书和编号专用页(纸质版的前两页),即电子版论文第一页为摘要专用页。

acm参考文献格式

acm参考文献格式

acm参考文献格式
ACM(Association for Computing
Machinery)参考文献格式遵循ACM Citation
Style。

下面是ACM参考文献格式的一般规则:
1.期刊文章:
[序号] 作者. 文章标题. 期刊名称, 年份, 卷号(期号), 页码.
2.会议论文:
[序号] 作者. 论文标题. 会议名称, 年份, 页码.
3.书籍:
[序号] 作者. 书名. 出版地: 出版社, 出版年份.
4.网页/在线资源:
[序号] 作者/组织名称. 文章标题. 网站名称,
发布日期/更新日期, URL.
请注意,具体的参考文献格式可能会根据不同的出版物和学术机构的要求而有所不同。

建议您在撰写论文或文章时参考ACM官方提供的参考文献格式指南或咨询您所在学校或出版机构的要求,以确保符合相应的格式要求。

ACM(五篇范例)

ACM(五篇范例)

ACM(五篇范例)第一篇:ACMDijkstra 模板/*************************************** * About:有向图的Dijkstra算法实现 * Author:Tanky Woo * Blog:t=0;if(flag == 0){printf(“Non”);}else{for(int i=min;i<=max;++i){if(mark[i]==1 && arr[i]==0)cnt++;}}if(cnt==1)printf(“Yesn”);elseprintf(“Non”);}} return 0;搜索算法模板BFS:1.#include2.#include3.#include4.#includeing namespace std;6.const int maxn=100;7.bool vst[maxn][maxn];// 访问标记8.int dir[4][2]={0,1,0,-1,1,0,-1,0};// 方向向量9.10.struct State // BFS 队列中的状态数据结构 11.{ 12.int x,y;// 坐标位置13.int Step_Counter;// 搜索步数统计器14.};15.16.State a[maxn];17.18.boolCheckState(State s)// 约束条件检验19.{ 20.if(!vst[s.x][s.y] &&...)// 满足条件 1: 21.return 1;22.else // 约束条件冲突 23.return 0;24.} 25.26.void bfs(State st)27.{ 28.queue q;// BFS 队列29.State now,next;// 定义 2 个状态,当前和下一个30.st.Step_Counter=0;// 计数器清零 31.q.push(st);// 入队32.vst[st.x][st.y]=1;// 访问标记33.while(!q.empty())34.{ 35.now=q.front();// 取队首元素进行扩展36.if(now==G)// 出现目标态,此时为Step_Counter 的最小值,可以退出即可37.{ 38.......// 做相关处理39.return;40.} 41.for(int i=0;i<4;i++)42.{ 43.next.x=now.x+dir[i][0];// 按照规则生成下一个状态44.next.y=now.y+dir[i][1];45.next.Step_Counter=now.Step_Coun ter+1;// 计数器加1 46.if(CheckState(next))// 如果状态满足约束条件则入队 47.{ 48.q.push(next);49.vst[next.x][next.y]=1;//访问标记 50.} 51.} 52.q.pop();// 队首元素出队53.} 54.return;55.} 56.57.int main()58.{ 59.......60.return 0;61.}代码:胜利大逃亡Ignatius被魔王抓走了,有一天魔王出差去了,这可是Ignatius逃亡的好机会.魔王住在一个城堡里,城堡是一个A*B*C的立方体,可以被表示成A个B*C的矩阵,刚开始Ignatius被关在(0,0,0)的位置,离开城堡的门在(A-1,B-1,C-1)的位置,现在知道魔王将在T分钟后回到城堡,Ignatius每分钟能从一个坐标走到相邻的六个坐标中的其中一个.现在给你城堡的地图,请你计算出Ignatius能否在魔王回来前离开城堡(只要走到出口就算离开城堡,如果走到出口的时候魔王刚好回来也算逃亡成功),如果可以请输出需要多少分钟才能离开,如果不能则输出-1.Input 输入数据的第一行是一个正整数K,表明测试数据的数量.每组测试数据的第一行是四个正整数A,B,C和T(1<=A,B,C<=50,1<=T<=1000),它们分别代表城堡的大小和魔王回来的时间.然后是A块输入数据(先是第0块,然后是第1块,第2块......),每块输入数据有B行,每行有C个正整数,代表迷宫的布局,其中0代表路,1代表墙.(如果对输入描述不清楚,可以参考Sample Input中的迷宫描述,它表示的就是上图中的迷宫) 特别注意:本题的测试数据非常大,请使用scanf输入,我不能保证使用cin能不超时.在本OJ上请使用Visual C++提交.Output 对于每组测试数据,如果Ignatius能够在魔王回来前离开城堡,那么请输出他最少需要多少分钟,否则输出-1.Sample Input 1 3 3 4 20 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 1 1 0Sample Output 11代码:#include #include #include #include #includeusing namespace std;int tx[] = {0,1,-1,0,0,0,0};int ty[] = {0,0,0,1,-1,0,0};int tz[] = {0,0,0,0,0,1,-1};int arr[55][55][55];int known[55][55][55];// 访问标记int a,b,c,d;struct state{int x,y,z;// 所在的坐标int step_count;//统计搜索步数。

acm引用注脚

acm引用注脚

acm引用注脚
在ACM(Association for Computing Machinery,计算机机器协会)的论文中,引用注脚有以下要求:
1. 注脚格式:
注脚应采用小字号,如字体大小为10pt。

2. 注脚编号:
注脚编号应以小写字母开头,例如:“footnote1”。

注脚编号应紧随逗号后,如:“This is a sample sentence., footnote1”。

3. 注脚内容:
注脚内容应包含所需的信息,例如:引用来源、解释、补充说明等。

4. 注脚位置:
注脚应位于页面底部,靠近页边距。

在文字中间的注脚应位于句号之后,紧邻第一个单词。

以下是一个ACM引用注脚的示例:
This is a sample sentence.^1
^1: This is a footnote with information related to the sentence.
请注意,上述示例中的注脚编号和内容仅作为示例,实际应用时应根据需要进行修改。

在撰写论文时,请遵循ACM的引用规范,并确保注脚内容的准确性和一致性。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

ACM Word Template for SIG Site1st Author1st author's affiliation1st line of address2nd line of address Telephone number, incl. country code 1st author's E-mail address2nd Author2nd author's affiliation1st line of address2nd line of addressTelephone number, incl. country code2nd E-mail3rd Author3rd author's affiliation1st line of address2nd line of addressTelephone number, incl. country code3rd E-mailABSTRACTA s network speed continues to grow, new challenges of network processing is emerging. In this paper we first studied the progress of network processing from a hardware perspective and showed that I/O and memory systems become the main bottlenecks of performance promotion. Basing on the analysis, we get the conclusion that conventional solutions for reducing I/O and memory accessing latencies are insufficient for addressing the problems.Motivated by the studies, we proposed an improved DCA combined with INIC solution which has creations in optimized architectures, innovative I/O data transferring schemes and improved cache policies. Experimental results show that our solution reduces 52.3% and 14.3% cycles on average for receiving and transmitting respectively. Also I/O and memory traffics are significantly decreased. Moreover, an investigation to the behaviors of I/O and cache systems for network processing is performed. And some conclusions about the DCA method are also presented.KeywordsKeywords are your own designated keywords.1.INTRODUCTIONRecently, many researchers found that I/O system becomes the bottleneck of network performance promotion in modern computer systems [1][2][3]. Aim to support computing intensive applications, conventional I/O system has obvious disadvantages for fast network processing in which bulk data transfer is performed. The lack of locality support and high latency are the two main problems for conventional I/O system, which have been wildly discussed before [2][4].To overcome the limitations, an effective solution called Direct Cache Access (DCA) is suggested by INTEL [1]. It delivers network packages from Network Interface Card (NIC) into cache instead of memory, to reduce the data accessing latency. Although the solution is promising, it is proved that DCA is insufficient to reduce the accessing latency and memory traffic due to many limitations [3][5]. Another effective solution to solve the problem is Integrated Network Interface Card (INIC), which is used in many academic and industrial processor designs [6][7]. INIC is introduced to reduce the heavy burden for I/O registers access in Network Drivers and interruption handling. But recent report [8] shows that the benefit of INIC is insignificant for the state of the art 10GbE network system.In this paper, we focus on the high efficient I/O system design for network processing in general-purpose-processor (GPP). Basing on the analysis of existing methods, we proposed an improved DCA combined with INIC solution to reduce the I/O related data transfer latency.The key contributions of this paper are as follows:▪Review the network processing progress from a hardware perspective and point out that I/O and related last level memory systems have became the obstacle for performance promotion.▪Propose an improved DCA combined with INIC solution for I/O subsystem design to address the inefficient problem of a conventional I/O system.▪Give a framework of the improved I/O system architecture and evaluate the proposed solution with micro-benchmarks.▪Investigate I/O and Cache behaviors in the network processing progress basing on the proposed I/O system.The paper is organized as follows. In Section 2, we present the background and motivation. In Section 3, we describe the improved DCA combined INIC solution and give a framework of the proposed I/O system implementation. In Section 4, firstly we give the experiment environment and methods, and then analyze the experiment results. In Section 5, we show some related works. Finally, in Section 6, we carefully discuss our solutions with many existing technologies, and then draw some conclusions.2.Background and MotivationIn this section, firstly we revise the progress of network processing and the main network performance improvement bottlenecks nowadays. Then from the perspective of computer architecture, a deep analysis of network system is given. Also the motivation of this paper is presented.2.1Network processing reviewFigure 1 illustrates the progress of network processing. Packages from physical line are sampled by Network Interface Card (NIC). NIC performs the address filtering and stream control operations, then send the frames to the socket buffer and notifies OS to invoke network stack processing by interruptions. When OS receives the interruptions, the network stack accesses the data in socket buffer and calculates the checksum. Protocol specific operations are performed layer by layer in stack processing. Finally, data is transferred from socket buffer to the user buffer depended on applications. Commonly this operation is done by memcpy, a system function in OS.Figure 1. Network Processing FlowThe time cost of network processing can be mainly broke down into following parts: Interruption handling, NIC driver, stack processing, kernel routine, data copy, checksum calculation and other overheads. The first 4 parts are considered as packet cost, which means the cost scales with the number of network packets. The rests are considered as bit cost (also called data touch cost), which means the cost is in proportion to the total I/O data size. The proportion of the costs highly depends on the hardware platform and the nature of applications. There are many measurements and analyses about network processing costs [9][10]. Generally, the kernel routine cost ranges from 10% - 30% of the total cycles; the driver and interruption handling costs range from 15% - 35%; the stack processing cost ranges from 7% - 15%; and data touch cost takes up 20% - 35%. With the development of high speed network (e.g. 10/40 Gbps Ethernet), an increasing tendency for kernel routines, driver and interruption handling costs is observed [3].2.2 MotivationTo reveal the relationship among each parts of network processing, we investigate the corresponding hardware operations. From the perspective of computerhardware architecture, network system performance is determined by three domains: CPU speed, Memory speed and I/O speed. Figure 2 depicts the relationship.Figure 2. Network xxxxObviously, the network subsystem can achieve its maximal performance only when the three domains above are in balance. It means that the throughput or bandwidth ofeach hardware domain should be equal with others. Actually this is hard for hardware designers, because the characteristics and physical implementation technologies are different for CPU, Memory and I/O system (chipsets) fabrication. The speed gap between memory and CPU – a.k.a “the memory wall” – has been paid special attention for more than ten years, but still it is not well addressed. Also the disparity between the data throughput in I/O system and the computing capacity provided by CPU has been reported in recent years [1][2].Meanwhile, it is obvious that the major time costs of network processing mentioned above are associated with I/O and Memory speeds, e.g. driver processing, interruption handling, and memory copy costs. The most important nature of network processing is the “producer -consumer locality” between every two consecutive steps of the processing flow. That means the data produced in one hardware unit will be immediately accessed by another unit, e.g. the data in memory which transported from NIC will be accessed by CPU soon. However for conventional I/O and memory systems, the data transfer latency is high and the locality is not exploited.Basing on the analysis discussed above, we get the observation that the I/O and Memory systems are the limitations for network processing. Conventional DCA or INIC cannot successfully address this problem, because it is in-efficient in either I/O transfer latency or I/O data locality utilization (discussed in section 5). To diminish these limitations, we present a combined DCA with INIC solution. The solution not only takes the advantages of both method but also makes many improvements in memory system polices and software strategies.3. Design MethodologiesIn this section, we describe the proposed DCA combined with INIC solution and give a framework of the implementation. Firstly, we present the improved DCA technology and discuss the key points of incorporating it into I/O and Memory systems design. Then, the important software data structures and the details of DCA scheme are given. Finally, we introduce the system interconnection architecture and the integration of NIC.3.1 Improved DCAIn the purpose of reducing data transfer latency and memory traffic in system, we present an improved Direct Cache Access solution. Different with conventional DCA scheme, our solution carefully consider the following points. The first one is cache coherence. Conventionally, data sent from device by DMA is stored in memory only. And for the same address, a different copy of data is stored in cache which usually needs additional coherent unit to perform snoop operation [11]; but when DCA is used, I/O data and CPU data are both stored in cache with one copy for one memory address, shown in figure 4. So our solution modifies the cache policy, which eliminated the snoopingoperations. Coherent operation can be performed by software when needed. This will reduce much memory traffic for the systems with coherence hardware [12].I/O write *(addr) = bCPU write *(addr) = aCacheCPU write *(addr) = a I/O write with DCA*(addr) = bCache(a) cache coherance withconventional I/O(b) cache coherance withDCA I/OFigure 3. xxxxThe second one is cache pollution. DCA is a mixed blessing to CPU: On one side, it accelerates the data transfer; on the other side, it harms the locality of other programs executed in CPU and causes cache pollution. Cache pollution is highly depended on the I/O data size, which is always quite large. E.g. one Ethernet package contains a maximal 1492 bytes normal payload and a maximal 65536 bytes large payload for Large Segment Offload (LSO). That means for a common network buffer (usually 50 ~ 400 packages size), a maximal size range from 400KB to 16MB data is sent to cache. Such big size of data will cause cache performance drop dramatically. In this paper, we carefully investigate the relationship between the size of I/O data sent by DCA and the size of cache system. To achieve the best cache performance, a scheme of DCA is also suggested in section 4. Scheduling of the data sent with DCA is an effective way to improve performance, but it is beyond the scope of this paper.The third one is DCA policy. DCA policy refers the determination of when and which part of the data is transferred with DCA. Obviously, the scheme is application specific and varies with different user targets. In this paper, we make a specific memory address space in system to receive the data transferred with DCA. The addresses of the data should be remapped to that area by user or compilers.3.2 DCA Scheme and detailsTo accelerate network processing, many important software structures used in NIC driver and the stack are coupled with DCA. NIC Descriptors and the associated data buffers are paid special attention in our solution. The former is the data transfer interface between DMA and CPU, and the later contains the packages. For farther research, each package stored in buffer is divided into the header and the payload. Normally the headers are accessed by protocols frequently, but the payload is accessed only once or twice (usually performed as memcpy) in modern network stack and OS. The details of the related software data structures and the network processing progress can be found in previous works [13].The progress of transfer one package from NIC to the stack with the proposed solution is illustrated in Table 1. All the accessing latency parameters in Table 1 are based on a state of the art multi-core processor system [3]. One thing should be noticed is that the cache accessing latency from I/O is nearly the same with that from CPU. But the memory accessing latency from I/O is about 2/3 of that from CPU due to the complex hardware hierarchy above the main memory.Table 1. Table captions should be placed above the tabletransfer.We can see that DCA with INIC solution saves above 95% CPU cycles in theoretical and avoid all the traffic to memory controller. In this paper, we transfer the NIC Descriptors and the data buffers including the headers and payload with DCA to achieve the best performance. But when cache size is small, only transfer the Descriptors and the headers with DCA is an alternative solution.DCA performance is highly depended on system cache policy. Obviously for cache system, write-back with write-allocate policy can help DCA achieves better performance than write-through with write non-allocate policy. Basing on the analysis in section 3.1, we do not use the snooping cache technology to maintain the coherence with memory. Cache coherence for other non-DCA I/O data transfer is guaranteed by software.3.3 On-chip network and integrated NICFootnotes should be Times New Roman 9-point, and justified to the full width of the column.Use the “ACM Reference format” for references – that is, a numbered list at the end of the article, ordered alphabetically and formatted accordingly. See examples of some typical reference types, in the new “ACM Reference format”, at the end of this document. Within this template, use the style named referencesfor the text. Acceptable abbreviations, for journal names, can be found here: /reference/abbreviations/. Word may try to automatically ‘underline’ hotlinks in your references, the correct style is NO underlining.The references are also in 9 pt., but that section (see Section 7) is ragged right. References should be published materials accessible to the public. Internal technical reports may be cited only if they are easily accessible (i.e. you can give the address to obtain thereport within your citation) and may be obtained by any reader. Proprietary information may not be cited. Private communications should be acknowledged, not referenced (e.g., “[Robertson, personal communication]”).3.4Page Numbering, Headers and FootersDo not include headers, footers or page numbers in your submission. These will be added when the publications are assembled.4.FIGURES/CAPTIONSPlace Tables/Figures/Images in text as close to the reference as possible (see Figure 1). It may extend across both columns to a maximum width of 17.78 cm (7”).Captions should be Times New Roman 9-point bold. They should be numbered (e.g., “Table 1” or “Figure 2”), please note that the word for Table and Figure are spelled out. Figure’s captions should be centered beneath the image or picture, and Table captions should be centered above the table body.5.SECTIONSThe heading of a section should be in Times New Roman 12-point bold in all-capitals flush left with an additional 6-points of white space above the section head. Sections and subsequent sub- sections should be numbered and flush left. For a section head and a subsection head together (such as Section 3 and subsection 3.1), use no additional space above the subsection head.5.1SubsectionsThe heading of subsections should be in Times New Roman 12-point bold with only the initial letters capitalized. (Note: For subsections and subsubsections, a word like the or a is not capitalized unless it is the first word of the header.)5.1.1SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized and 6-points of white space above the subsubsection head.5.1.1.1SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized.5.1.1.2SubsubsectionsThe heading for subsubsections should be in Times New Roman 11-point italic with initial letters capitalized.6.ACKNOWLEDGMENTSOur thanks to ACM SIGCHI for allowing us to modify templates they had developed. 7.REFERENCES[1]R. Huggahalli, R. Iyer, S. Tetrick, "Direct Cache Access forHigh Bandwidth Network I/O", ISCA, 2005.[2] D. Tang, Y. Bao, W. Hu et al., "DMA Cache: Using On-chipStorage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance", HPCA, 2010.[3]Guangdeng Liao, Xia Zhu, Laxmi Bhuyan, “A New ServerI/O Architecture for High Speed Networks,” HPCA, 2011. [4] E. A. Le´on, K. B. Ferreira, and A. B. Maccabe. Reducingthe Impact of the MemoryWall for I/O Using Cache Injection, In 15th IEEE Symposium on High-PerformanceInterconnects (HOTI’07), Aug, 2007.[5] A.Kumar, R.Huggahalli, S.Makineni, “Characterization ofDirect Cache Access on Multi-core Systems and 10GbE”,HPCA, 2009.[6]Sun Niagara 2,/processors/niagara/index.jsp[7]PowerPC[8]Guangdeng Liao, L.Bhuyan, “Performance Measurement ofan Integrated NIC Architecture with 10GbE”, 17th IEEESymposium on High Performance Interconnects, 2009. [9] A.Foong et al., “TCP Performance Re-visited,” IEEE Int’lSymp on Performance Analysis of Software and Systems,Mar 2003[10]D.Clark, V.Jacobson, J.Romkey, and H.Saalwen. “AnAnalysis of TCP processing overhead”. IEEECommunications,June 1989.[11]J.Doweck, “Inside Intel Core microarchitecture and smartmemory access”, Intel White Paper, 2006[12]Amit Kumar, Ram Huggahalli., Impact of Cache CoherenceProtocols on the Processing of Network Traffic[13]Wenji Wu, Matt Crawford, “Potential performancebottleneck in Linux TCP”, International Journalof Communication Systems, Vol. 20, Issue 11, pages 1263–1283, November 2007.[14]Weiwu Hu, Jian Wang, Xiang Gao, et al, “Godson-3: ascalable multicore RISC processor with x86 emulation,”IEEE Micro, 2009. 29(2): pp. 17-29.[15]Cadence Incisive Xtreme Series./products/sd/ xtreme_series.[16]Synopsys GMAC IP./dw/dwtb.php?a=ethernet_mac [17]ler, P.M.Watts, A.W.Moore, "Motivating FutureInterconnects: A Differential Measurement Analysis of PCILatency", ANCS, 2009.[18]Nathan L.Binkert, Ali G.Saidi, Steven K.Reinhardt.Integrated Network Interfaces for High-Bandwidth TCP/IP.Figure 1. Insert caption to place caption below figure.Proceedings of the 12th international conferenceon Architectural support for programming languages and operating systems (ASPLOS). 2006[19]G.Liao, L.Bhuyan, "Performance Measurement of anIntegrated NIC Architecture with 10GbE", HotI, 2009. [20]Intel Server Network I/O Acceleration./technology/comms/perfnet/downlo ad/ServerNetworkIOAccel.pdfColumns on Last Page Should Be Made As Close AsPossible to Equal Length。

相关文档
最新文档