面向OpenCL模型的DCT并行化_向阳霞

合集下载

clfft 编译

clfft 编译clfft 编译是一个用于高性能计算的开源软件库，具有快速、高效的特点。

它主要用于在并行计算环境下进行快速傅里叶变换计算，可广泛应用于信号处理、图像处理、数值模拟等领域。

本文将介绍clfft 编译的基本原理、使用方法以及优缺点。

我们来了解一下 clfft 编译的基本原理。

clfft 是基于 OpenCL（开放式计算语言）的快速傅里叶变换库，利用GPU 的并行计算能力来加速傅里叶变换的计算过程。

通过将计算任务分配到多个计算单元上并行执行，可以大大缩短计算时间，提高计算效率。

在使用clfft 编译时，首先需要安装OpenCL 运行时环境，并将clfft 库链接到您的项目中。

然后，您可以通过简单的API 调用来实现快速傅里叶变换的计算。

clfft 提供了丰富的配置选项，可以根据实际需求来调整计算的精度、数据格式等参数，以获得最佳的计算性能。

与传统的 CPU 计算相比，使用 clfft 编译进行快速傅里叶变换计算具有明显的优势。

首先，GPU 的并行计算能力远远高于CPU，在处理大规模数据时可以显著提高计算速度。

其次，由于clfft 是基于开放标准的OpenCL，可以在不同厂商的GPU 上进行跨平台运行，具有较好的可移植性。

当然，使用clfft 编译也存在一些局限性。

首先，对于小规模数据的计算，可能无法发挥出GPU 的并行计算优势，反而会因为数据传输等额外开销导致性能下降。

此外，由于OpenCL 的编程模型相对复杂，需要较深的理解和熟练掌握，对于初学者来说可能存在一定的学习曲线。

总的来说，clfft 编译是一个强大的工具，可以帮助我们在并行计算环境下高效地进行快速傅里叶变换计算。

通过合理的配置和优化，可以实现更快速、更高效的计算，提高计算任务的处理能力。

同时，也需要注意避免一些潜在的问题，如数据规模过小导致性能下降等。

希望本文能够帮助读者更好地理解和应用clfft 编译，提高计算效率，促进科学研究和工程实践的发展。

OpenCL异构并行计算：原理、机制与优化实践

02
8.3.2 存储器层次
9 OpenCL计算二维卷积
11
9 OpenCL计算二维卷积
9.1 测试平台信息
9.2 AMD X86 CPU串行实现
9.3 简单 OpenCL实现
9.4 使用常量存储器优化
9.5 使用局部存储器优化
9.6 一个工作项同时计算多个输出
9 OpenCL计算二维卷积
4.16.4 设备队列的使用示例
CD
4 OpenCL C语言
4.16 设备队列
5 OpenCL存储器对象
07
5 OpenCL存储器对象
5.1 缓冲区 5.3 管道
5.5 共享虚拟存储器
5.2 图像对象和采样器对象
5.4 存储器对象数据传输
5.6 存储器一致性模型
5 OpenCL存储器对象

5.5.1 SVM 缓冲操作
5.5.2 SVM 类型和特性
5.5.3 相关示例
5 OpenCL 存储器对象
5.6 存储器一致性模型

1
2
5.6.1 存储器次序规则 5.6.2 原子操作的存储器次序规则
3
4
5.6.3 栅栏操作的存储 5.6.4 工作组函数的存
8.3 ARM Mali GPU架构
8.4 本章小结
8 OpenCL到主流GPU处理器的映射
8.1.1 AMD Cayman架
构GPU
8.1.2 AMD GCN 架构的GPU
8.1 AMD家族GPU
8 OpenCL到主流GPU 处理器的映射
8.2 NVIDIA CUDA兼容的 GPU

1 异构并行计算的过去、现状和未来

基于OpenCL大规模种群并行遗传算法

ＲＮＡ二级结构预测的算法研究发展至今已经有几十年了，文献［１］总结了不同的ＲＮＡ二级结构预测算法及实现。目前算法大致可以归为４类：第１类
为比较序列分析，第２类为动态规划法，第３类为组合优化算法，第４类为启发式算法Ｊ。本文采用启发式算法，遗传算法是典型的启发式算法，最早由Ｈｏｌ— ｌａｎｄＪ．提出ｊ，ＶａｎＢａｔｅｎｂｕｒｇ等人最早用遗传算法实现ＲＮＡ二级结构预测＿４Ｊ。由于遗传算法具有潜在的并行性，Ｓｈａｐｉｒｏ等人先在１６３８４个处理器的ＳＩＭＤ超级计算机ＭａｓＰａｒＭＰ．２上实现了大规模并行遗传算法，并且和动态规划算法相比该算法结果更好。之后Ｓｈａｐｉｒｏ等人又在６４个处理器的ＭＩＭＤＳＧＩＯＲ— ＩＧＩＮ２０００和有５１２个处理器的ＭＩＭＤＣＲＡＹＴ３Ｅ上实现了大规模并行遗传算法，以及对改变相关参数的影响进行讨论ｊ。文献［７］介绍了粗粒度的分布式遗传算法并且通过实验结果证明该算法加速了ＲＮＡ二级结构自由能值收敛的速度。本文描述如何在异构平台下实现大规模种群并行遗传算法。实验结果的度量由２个参数来衡量，即敏感性（Ｓｅｎｓｉｔｉｖｉｔｙ）和阳性预测率（ＰＰＶ）Ｊ。此外也和传统的串行遗传算
Ａｂｓｔｒａｃｔ：ＩｎｏｒｄｅｒｔｏｉｍｐｒｏｖｅｔｈｅａｃｃｕｒａｃｙｒａｔｅｏｆＲＮＡｓｅｃｏｎｄａｒｙｓｔｒｕｃｔｕｒｅｐｒｅｄｉｃｔｉｏｎａｎｄａｃｃｅｌｅｒａｔｅｔｈｅｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍ，ｔｈｉｓｔｈｅｓｉｓｐｒｏｐｏｓｅｄｔｈｅｉｍｐｌｅｍｅｎｔａｔｉｏｎｏｆａｌａｒｇｅｐｏｐｕｌａｔｉｏｎｐａｒａｌｌｅｌｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍｂａｓｅｄｏｎＯｐｅｎＣＬ．Ｔｈｒｏｕｇｈｒｅｓｅａｒｃｈｉｎｇｔｈｅｐｏ— ｔｅｎｔｉａｌｐａｒａｌｌｅｌｉｓｍｏｆｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍ，ｔｈｉｓｔｈｅｓｉｓｕｓｅｓＡｃｅｒＴＭＰ２４６Ｍ—ＭＧ一５０８６ａｓｅｘｐｅｒｉｍｅｎｔａｌｐｌａｔｆｏｒｍ，ｆｉｒｓｔｌｙｒｅａｌｉｚｅｓｔｈｅｇｅ— ｎｅｔｉｃａｌｇｏｒｉｔｈｍｏｎＣＰＵ，ｔｈｅｎｒｅａｌｉｚｅｓｔｈｅｌａｒｇｅｐｏｐｕｌａｔｉｏｎｐａｒａｌｌｅｌｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍｏｎＧＰＵ．Ｔｅｓｔｒｅｓｕｌｔｓｓｈｏｗｔｈａｔｔｈｅａｃｃｕｒａｃｙｒａｔｅｏｆｐａｒａｌｌｅｌｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍｐｒｅｄｉｃｔｉｏｎｈａｓｂｅｅｎｉｎｃｒｅａｓｅｄａｂｏｕｔ４９．８８％，ａｎｄｔｈｅａｖｅｒａｇｅｓｐｅｅｄｕｐｏｆｕｓｉｎｇＧＰＵｉｓ９．７６ｘ．Ｋｅｙｗｏｒｄｓ：ｌａｒｇｅｐｏｐｕｌａｔｉｏｎ；ｐａｒａｌｌｅｌｇｅｎｅｔｉｃａｌｇｏｒｉｔｈｍ；ＲＮＡｓｅｃｏｎｄａｒｙｓｔｒｕｃｔｕｒｅｐｒｅｄｉｃｔｉｏｎ

基于OpenCL的并行计算在图像处理中的应用研究

基于OpenCL的并行计算在图像处理中的应用研究一、引言随着计算机技术的不断发展，图像处理在各个领域中扮演着越来越重要的角色。

而并行计算作为一种提高计算效率的重要手段，被广泛应用于图像处理领域。

OpenCL作为一种开放的并行计算框架，具有跨平台、高性能等优势，因此在图像处理中得到了广泛的应用。

本文将探讨基于OpenCL的并行计算在图像处理中的应用研究。

二、OpenCL简介OpenCL（Open Computing Language）是一种开放的并行计算框架，由Khronos Group组织制定并维护。

它允许开发人员利用多核CPU、GPU等异构设备进行并行计算，从而加速应用程序的运行速度。

OpenCL 具有跨平台、高性能、灵活性强等特点，适用于各种类型的并行计算任务。

三、图像处理中的并行计算图像处理是一种对图像进行获取、存储、传输和呈现等操作的技术，广泛应用于医学影像、数字摄影、视频处理等领域。

在图像处理过程中，往往需要对大量的像素数据进行处理，这就需要高效的计算方法来提高处理速度。

而并行计算正是能够满足这一需求的技术之一。

四、基于OpenCL的图像处理算法1. 图像滤波图像滤波是图像处理中常见的操作，可以用于去噪、平滑、锐化等目的。

基于OpenCL的并行计算可以加速图像滤波算法的执行，提高处理效率。

2. 图像分割图像分割是将图像划分为若干个具有独立特征的区域的过程，常用于目标检测、边缘检测等领域。

通过利用OpenCL进行并行计算，可以加快图像分割算法的运行速度。

3. 特征提取在图像处理中，特征提取是一项重要任务，可以帮助识别目标、分类等。

基于OpenCL的并行计算可以加速特征提取算法的执行，提高准确性和效率。

五、案例分析以某医学影像处理项目为例，该项目需要对大量医学影像数据进行处理和分析。

通过采用基于OpenCL的并行计算技术，可以显著提高影像处理速度，缩短诊断时间，提高工作效率。

六、未来展望随着硬件技术的不断进步和OpenCL框架的不断完善，基于OpenCL的并行计算在图像处理领域将有更广阔的应用前景。

基于OpenCL的图形处理器FDTD算法仿真研究

基于OpenCL的图形处理器FDTD算法仿真研究
龚兴全;李康;孔凡敏
【期刊名称】《系统仿真学报》
【年(卷),期】2014(0)8
【摘要】大型电磁仿真计算的时域有限差分(FDTD)仿真计算通常是十分耗时的,通用图形处理器(GPGPU)技术为其提供了一种合适的解决方案。

通过分析FDTD算法特征以及Courant稳定性及数值色散稳定条件,阐述其在并行计算方面的优势。

OpenCL是一种新的开放的行业标准,可以用来开发在CPUs,GPUs及其它各种平台上通用的程序。

通过阐述OpenCL硬件基础,执行环境,实现方法来增进对其概念的掌握。

为充分发挥异构处理平台下GPU的计算能力,提出了基于开放运算语言(OpenCL)模型,并且利用图形处理器并行FDTD仿真的实现方法。

并与传统CPU 计算相比较,验证计算结果的精确性。

通过分析不同网格数量的速度提升情况,结果表明基于OpenCL的GPU计算速度与单CPU相比可以提升几十倍。

【总页数】6页(P1639-1643)
【作者】龚兴全;李康;孔凡敏
【作者单位】山东大学信息科学与工程学院
【正文语种】中文
【中图分类】TP391.41
【相关文献】
1.基于GPU的高阶辛FDTD算法的并行仿真研究
2.基于FDTD算法的超宽频同轴辐照腔仿真设计
3.基于异构计算的三维FDTD并行算法及其在电磁仿真中的应用
4.图形处理器对ADI-FDTD算法的加速作用研究
5.基于CUDA的图形处理器FDTD算法仿真研究
因版权原因，仅展示原文概要，查看原文内容请购买。

基于OpenCL的NDVI算法的并行化实现

ＧＰＵｔｏＮＤＶＩａｌｇｏｒｉｔｈｍｔｏａｃｃｅｌｅｒａｔｅｔｈｅｏｐｅｒａｔｉｏｎ．Ｔｈｅｈｅｔｅｒｏｇｅｎｅｉｔｙ，ＯｐｅｎＣＬｆｒａｍｅｗｏｒｋｉｓｕｓｅｄｔｏｓｔｕｄｙ
（四川化工职业技术学院，四川泸州６４６０００）
摘
要：以ＮＤＶＩ算法为例，讲述了利用ＯｐｅｎＣＬ框架，使用ＧＰＵ对ＮＤＶＩ算法实现加速操作。利用ＯｐｅｎＣＬ框架
的异构性，研究是否能更加有效提高加速比。关键词：ＯｐｅｎＣＬ，ＮＤＶＩ算法，加速比中图分类号：ＴＰ２７４文献标识码：Ａ
第２６卷
第１１期
电脑开发与应用
（总０８８７）
・７７・
文章编号：１００３ — ５８５０ｆ２０１３）１１－００７７ — ０２
基于ＯｐｅｎＣＬ的ＮＤＶＩ算法的并行化实现
熊英，罗琼
基于ＯｐｅｎＣＬ的ＮＤＶＩ算法的并行化实现
２０１３年第１１期
图像拉伸是为了加强图像的对比度，一张图像被灰度化以后，可能看起来要么全黑、要么全白的情况，这属于对比度不明显。可以通过图像拉伸的方法，使
原本对比不明显的图像看着黑白对比明显，方便对图片进行进一步的处理。通过图像拉伸的方法，可以很

OpenCL Snort性能优化与并行设计说明书

Parallel Design and Performance Optimization based on OpenCLSnortHongying Xie*,Yangxia Xiang †, Caisen Chen †*Unit 61175, China†Academy of Armored Force Engineering, ChinaKeywords: OpenCL snort, GPU, AC algorithm, parallel programmingAbstractWith the rapid increasement of the network speed and number of threats which hide in the network poses enormous challenges to network intrusion detection systems (NIDS). As the most popular NIDS, snort can run as a single threaded application. However, it may not be able to detect intrusions in real-time especially in networks with high traffic. In this paper, a parallel module OpenCL Snort (OCLSnort) is introduced: realize parallel pattern matching algorithm using GPU and innovate new architecture which is more suitable for the parallel algorithm. The result showed that OCLSnort can detect the attacks correctly and effectively, the new system not only has markedly improved on throughput, also effectively reduced the CPU utilization and memory usage.1 IntroductionIntrusion detection systems (IDSs) are of critical importance to the integrity of computer networks due to massive growth in the data transmission speed and the frequency of attacks. With the rapid development of computer network, more and more data need to be searched, analyzed and detected whether they have threat or not. Such as network monitoring application snort, which is an open source network intrusion prevention and detection system (IDS/IPS) developed by Sourcefire. Combining the benefits of signature, protocol, and anomaly-based inspection, and as so far, Snort is the most widely deployed IDS/IPS technology worldwide [1].In snort, they are using pattern matching algorithm such as AC, BM algorithm to detect thread. Pattern matching is one of the core operations used by applications such as traffic classification [2], intrusion detection systems [3] and content monitoring filters [1]. Unfortunately, packet detecting part occupies the most of the time of the whole processing time in modern NIDSes [4,5] and this operation has significant overheads in terms of both memory space and CPU cycles, so when the data or packet which will be detected is very large, there will be packet-losing problem about snort.Several research efforts have used GPU for security purposes, including cryptography [6], data carving [7] and intrusion detection [8]. And Jacob and Brodley were the first that tried to use the GPU as a pattern matching engine for NIDS in PixelSnort [8]. They changed KMP algorithm to parallel version but the performance result is not very ideal. This paper is organized as follows: In Section 2 and 3, two methods to realize OpenCL snort are presented. In section 4, we evaluate our implementation and compare with the original snort. Experimental results and analysis are given. Finally, conclusions are given in Section 5.2 ArchitectureThe overall architecture of Snort NIDS is shown in Fig.1 and the OpenCL version Snort’s architecture is showed in Fig.2. From Fig.1 and Fig.2, there are some differences between the original snort and the new version snort, one is collecting packets at packet classification part; the other is detecting packet content at packet detecting part.g ss2.1 Packet detecting using OpenCL AC algorithmFor the multi-pattern matching algorithm, the first thing is to build DFA such as Fig.3, and this section is finished before2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)the beginning of the packet detected in snort. In our design of the OpenCL version Snort, the realized DFA is represented as a two-dimensional state table array that is mapped on the memory space of the GPU. The dimensions of the array are equal to the number of states and the size of the alphabet (256 in our case), respectively. Each cell contains the next state to move, as well as an indication of whether the state is a final state or not.Fig.3 AC State Machine of P atterns “he”, “hers”, “his”, “she” Fig.3 shows a state machine of patterns which used in our OpenCL AC algorithm, from this figure we can see that the difference between original AC state machine and OpenCL AC state machine is that whether it is needed about failure transitions. The failure transitions are used to back-track the state machine to recognize patterns in any location of an input stream. Given a current state and an input character, the original AC machine first looks up the valid transition table to check whether there is a valid transition for the input character; otherwise, the machine looks up the failure transition table and jumps to the failure state where the failure transition points. Then the machine regards the same input character until the character causes a valid transition. In our OpenCL version snort, we used OpenCL to change the AC algorithm for parallelism based on PFAC [9]. The idea of the parallel algorithm of AC is: Give an input stream have N byte, we will create N threads corresponding to N byte. And for each thread, it is only responsible for identifying the pattern starting at the thread starting position. So in OpenCL AC algorithm, the failure transitions of the AC state machine can all be removed as well as the self-loop transition of the initial state. And the whole process of the OpenCL AC is showed by Fig.4.Fig.4 OpenCL AC Algorithm Execution ProcessThere are several characteristics of the OpenCL AC algorithm. First, although it creates huge amounts of threads, most threads have a high probability of terminating very early because a thread in OpenCL AC is only responsible for matching the pattern beginning at its starting position. Second, the maximum detection length of each thread is the maximumlength in whole patterns, and based on this, when the larger the input stream is, the faster the detect speed is. And finally, the failure transitions are all removed when we are using OpenCL AC, and this simplifies the algorithm and the thread can detect the input stream automatically without rollback. 2.2 Packet collecting and transfer to GPUBefore the packet detecting in GPU, the first thing must to consider is how and how many packets will be transferred from the network to the GPU memory. The simplest method is according to the original snort architecture, transfer one packet to GPU for processing once time. However, as we know, the TCP or UDP packet size is usually hundreds byte,the performance is much better batching many small transfers into a large one than making each transfer separately [16].Thus, we realized the two methods (1) using original snort architecture, transferring one packet to GPU once time, and (2)change packet classification part, transferring more than onepackets to GPU once time and get the performancecomparison based on two methods. As we know, the process flow of original snort is showed as Fig.1: capture a packet from network once time, then packet analysis and classification, detecting packet and output the result finally. Using method (1), the process flow can be changed as follows: capture a packet from network, packet analysis and classification is not changed, then transfer the packet to GPU and detecting it using OpenCL AC algorithm, then transfer the results to CPU and output the result finally. So using method (1), we changed Detection part, using OpenCL AC take the place of original ac algorithm and the other part of snort ’s architecture are not changed, processing packet one by one.And using method 1, the performance improvement is not exciting, there are two reasons: (1) the DMA time occupied most of the time; (2) the input stream transferring to GPU only have hundreds byte each time. It does not make full use of GPU resources. Based on this, we proposed a new method that can transfer more than one packets to GPU, the architecture of OpenCL Snort are showed by Fig.2. From Fig.2 we can see the difference between OpenCL Snort and original snort is processing packet number once time. In OpenCL version Snort, we change the interface to realizecapture multi-packets at the beginning of snort and then deal with packets, transfer multi-packets to GPU once time, and finally output alerts/logs.3 Implementation In this section, we are showed the implementation details about the OpenCL version snort. In snort, they are using different rules to detecting whether the packet has threatened or not according to packet type. Different rules create different state transition tables. So we are focus on the packets collecting and the state transition tables correspond to packets part when using method (2) transfer multi-packets to GPU once time.3.1 Transferring a Single Packet to GPUIn this approach, when capturing a packet from network, snort will decode and classify it, then send it to GPU for detecting,send the result to CPU and finally output the result.Assume the packet has N characters, the algorithm will create N threads in GPU if the device has this ability, and else they will create maximum threads which under the device’s ability, then each thread will loop many times to detecting the wholeThis method is very simple, because there is only one packet, and the state transition table which transferred to GPU also has one. So this method need not to find out which state transition table is corresponding to which packet. A drawback of this approach is that the input stream is very small and the DMA time occupied most of the time, so the GPU is not utilized effectively.3.2 Transferring Multi-packets to GPUIn this approach, we will mark the packets which we interfered to GPU as unique packetID, and give a unique tableID for each state transition table which finished creation process and transfer all the state transition tables to GPU. The whole process will be finished at the initialization phase of snort.Using this approach to detect packets, the way to creating threads is the same as method (1), and the difference is the packet must correspond to the state transition table. And this could be solved adding elements packetID and tableID to struct ACSM, and we will also transfer those elements to GPU. In the OpenCL algorithm, we must to judge the packet boundaries in order to get the correct results. The process flow is showed by Fig.6, and example of packets collecting process is showed by Fig.7. From Fig.7, each packet corresponds to a state transition table, so when we transfer packets to GPU, we must to determine the transition table’s address corresponding to each packet.Although this method is complicated comparing with method (1), the input stream transferring into GPU once time is muchindex n4 EvaluationPattern matching is the most critical operation in the snort system. Usually pattern matching algorithm can be classified into single pattern matching algorithm (such as KMP) and multi-patterns algorithm (such as AC).In this section, we explore the performance of our implementation. We realize the two approaches in Snort and compare the two methods with the original snort respectively. In processing multi-packets method, we change the parameter about the collecting packet number once time then get the average time about processing one packet.In our experiments we used an AMD A10-4600m computer, the CPU in this computer is 2.3GHz APU with Radeon™ HD Graphics 4 processor , 8G memory and GPU is AMD Radeon HD 7660G card, the operating system is Ubuntu12.04 64-bit. We get the packets data LLS DDOS 1.0-inside.dump from MIT Lincoln Laboratory [17] as the detected data, we also using snort to dump some small packets date set using the detected data LLS DDOS 1.0-inside.dump, such as contain 200 packets date set, 1000 packets date set, 10000 packets date set and 20000 packets date set, and we read the packets from disk rather than network in order to get the same speed of capture packet in different experiments. We also using the default rules file when using different version snort and this can ensure the correctness of the result.For all experiments, we disregard the time spent in the initialization phase of snort as well as the log of the alerts to the disk or terminal. Even though it only takes just a few seconds to load rule files and build its internal structures. And we used the full AC implementation to measure the performance in original snort.4.1 Performance Comparison between One Packet OpenCL snort and Original snortIn this experiment, input1, input2 and input3 are three different size detected packets and the packets size is 200, 1000 and 10000 respectively. We change the input packet numbers to get the performance data about one packet OpenCL snort and original snort and the performance data is showed by Fig.8. From Fig.8, (1) with the increase of input packet size, the throughput of two methods becomes large; (2)using one packet OpenCL snort, the throughput is not batter than the original snort’s throughput, because the local memory is not large enough, the state transition table is stored in Global memory, when judge the current character meet the conversion criteria or not each time, the algorithm must access the global memory once time; and most of threads are terminated at the beginning of the algorithm, and the GPU’s utilization is not high.Fig.8 Performance Comparison4.2 Performance Analysis about Multi-Packets OpenCL snortIn this experiment, we get the performance comparison about multi-packets OpenCL snort and one packet OpenCL snort. Before this comparison, first thing we must to ensure is when we transfer how many packets to GPU, the algorithm will get the best performance and maximum throughput. Fig.9 showed the algorithm’s performance comparison when transferring different number of packets to GPU. From Fig.9 we can see with the number’s difference, the throughput has some difference as well. When the number which transfers to GPU once time is 30, the throughput is 4.78Gbits/sec, when the number is 100, the throughput is up to 6.43Gbits/sec. And when the number changes from 150 to 200, the throughput grows slowly and then it has a downward trend. So we select 200 as the number which transfers to GPU once time.Fig.9 Performance Comparison of Multi-packets OpenCLSnortThe next experiment we are focus on is the performance comparison about the three version snort: original snort, one packet OpenCL snort and multi-packets OpenCL snort. And the result is showed by Fig.10. In this figure, input1, input2 and input3 are three different size detected packets as the same as Fig.9, the packets size is 150, 1200 and 10000 respectively. From the result, we can see the multi-packets OpenCL snort’s throughput is about two times faster than other two methods. And the GPU’s utilization in multi-packets OpenCL snort is much higher than the one packet OpenCL snort.Fig.10 Performance Comparison on Different Versions 5 ConclusionsIn this paper, we have proposed two OpenCL version snort, one packet OpenCL snort and multi-packets OpenCL snort in order to accelerate packet detecting by GPU. And the result showed that although one packet OpenCL snort’s throughput is about 20% slower than original snort, multi-packets OpenCL snort is about 2 times faster than original snort, and this system was able to achieve a maximum throughput of 6.758Gbit/s.AcknowledgementsThis work is supported by the National Natural Science Foundation of China under Grant No. 61402528, all support is gratefully acknowledged.References[1] Snort: : Home Page. /.[2] Application Layer Packet Classifier for Linux. http://17-/.[3] Clam AntiVirus. /.[4] S. Antonatos, K. Anagnostakis, and E. Markatos.Generating realistic workloads for network intrusion detection systems. In Proceedings of the 4th ACM Workshop on Software and Performance, (2004).[5] J. B. D. Cabrera, J. Gosar,W. Lee, and R. K. Mehra. Onthe statistical distribution of processing times in network intrusion detection. In 43rd IEEE Conference on Decision and Control, 75-80, (2004).[6] D. L. Cook, J. Ioannidis, A. D. Keromytis, and J. Luck.Cryptographics: Secret key cryptography using graphics cards. In Proceedings of RSA Conference, Cryptographer’s Track. 334-350, (2005).[7] G. G. R. I. Lodovico Marziale and V. Roussev. Massivethreading: Using GPUs to increase the performance of digital forensics tools. Digital Investigation. 73–81. [8] N. Jacob and C. Brodley. Offloading IDS computation tothe GPU. In Proceedings of the 22nd Annual Computer Security Applications Conference on Annual Computer Security Applications Conference, Washington, DC, USA, IEEE Computer Society. 371–380, (2006).[9] Lin CH, Tsai SY, Liu CH, Chang SC, Shyu. JMAccelerating string matching using multi-threaded algorithm on gpu. In: GLOBECOM, 1-5, (2010). [10] C. IOS. IPS deployment guide. /.。

基于OpenCL的并行kNN算法设计与实现

基于OpenCL的并行kNN算法设计与实现杨朋霖;冯百明;周志阳;温向慧【期刊名称】《计算机工程与科学》【年(卷),期】2017(039)012【摘要】kNN算法是机器学习和数据挖掘程序中经常使用的经典算法.随着数据量的增大,kNN算法的执行时间急剧上升.为了有效利用现代计算机的GPU等计算单元减少kNN算法的计算时间,提出了一种基于OpenCL的并行kNN算法,该算法对距离计算和排序两个瓶颈点进行并行化,在距离计算阶段使用细粒度并行化策略和优化的线程模型,排序阶段使用优化内存模型的双调排序.以UCI数据集letter为测试集,分别使用E8400和GTS450运行kNN算法进行测试,采用GPU加速的并行kNN算法的计算速度比CPU版提高了40.79倍.%The kNN algorithm is a classical algorithm often used in machine learning and data mining programs.With the increasing amount of data,the execution time of thekNN algorithm increases sharply.In order to effectively utilize GPU and other computing units of modern computers to reduce the computation time of the kNN algorithm,we present a parallel kNN algorithm based on OpenCL,which parallelizes the two segments of bottleneck code:distance calculation and sorting.The algorithm adopts the fine-grained parallelization strategy and the optimized memory model in the phase of distance calculation and uses bitonic sort that can optimize memory model in the phase of sorting.We use Letter,one of UCI datasets,as the test set and E8400 AND GTS450 to run the kNN algorithm for testing.Thecomputing speed of the parallel kNN algorithm accelerated by GPU is 40.79 times faster than that of its CPU version.【总页数】5页(P2198-2202)【作者】杨朋霖;冯百明;周志阳;温向慧【作者单位】西北师范大学计算机科学与工程学院,甘肃兰州730070;西北师范大学计算机科学与工程学院,甘肃兰州730070;西北师范大学计算机科学与工程学院,甘肃兰州730070;西北师范大学计算机科学与工程学院,甘肃兰州730070【正文语种】中文【中图分类】TP301【相关文献】1.一种基于OpenCL的高能效并行KNN算法及其GPU验证 [J], 贺江;蒲宇亮;李海波;阎波2.基于OpenCL的JPEG压缩算法并行化设计与实现 [J], 张敏华;张剑贤;裘雪红;周端3.基于OpenCL的尺度不变特征变换算法的并行设计与实现 [J], 许川佩;王光4.基于OpenCL的图像灰度化并行算法研究 [J], 肖汉;郭宝云;李彩林;肖诗洋5.基于OpenCL并行的挡板对珠光体生长的相场法模拟 [J], 朱昶胜;李玉杰;马芳兰;冯力;雷鹏因版权原因，仅展示原文概要，查看原文内容请购买。

一种基于OpenCL的高能效并行KNN算法及其GPU验证

一种基于OpenCL的高能效并行KNN算法及其GPU验证贺江;蒲宇亮;李海波;阎波
【期刊名称】《电子技术应用》
【年(卷),期】2016(42)2
【摘要】近年来数据分类技术已经被广泛应用于各类问题中,作为最重要的分类算法之一,K最近邻法(KNN)也被广泛使用.在过去的近50年,人们就如何提高KNN的并行性能做出巨大努力.基于CUDA的KNN并行实现算法——CUKNN算法证明KNN在GPU上的并行实现比在CPU上串行实现的速度提升数十倍,然而,CUDA 在实现过程中包含了大量的冗余计算.提出了一种并行冒泡的新型KNN并行算法,并通过OpenCL,在以GPU作为计算核心的异构系统上进行验证,结果显示提出的方法比CUDA快16倍.
【总页数】3页(P14-16)
【作者】贺江;蒲宇亮;李海波;阎波
【作者单位】电子科技大学,四川成都610036;电子科技大学,四川成都610036;广东省公安厅,广东广州510050;电子科技大学,四川成都610036
【正文语种】中文
【中图分类】TP311
【相关文献】
1.一种基于OpenCL的Lukas-Kanade光流并行加速算法 [J], 吴进;李乔深;闵育;马思敏
2.基于OpenCL的并行kNN算法设计与实现 [J], 杨朋霖;冯百明;周志阳;温向慧
3.基于OpenCL的隐马尔可夫模型的GPU并行实现 [J], 刘华泓;姜克旺;蔡向高
4.一种基于格子玻尔兹曼前向模型的GPU并行加速荧光扩散断层成像的方法 [J], 吴焕迪; 严壮志; 岑星星
5.基于GPU并行的二维时空中子动力学MOC程序开发及验证 [J], 邹航;梁亮;张乾;宋佩涛;赵强
因版权原因，仅展示原文概要，查看原文内容请购买。

基于OpenCL的异构系统并行编程

ｒｏｒａｍｍｉｎＰａｒａｌｌｅｌｏｆｈｅｔｅｒｏｅｎｅｏｕｓｓｓｔｅｍｂａｓｅｄｏｎＯｅｎＣＬｐｇｇｇｙｐ
，，ＺＨＡＮＹｕｎＺＨＡＯＸｉｎｃａｎＴＡＮＴｏｎｄｅ－－ｇ
：Ａ，Ａｂｓｔｒａｃｔｉｍｉｎａｔｔｈｅｏｆｔｈｅｌｏｗｕｔｉｌｉｚａｔｉｏｎｏｆｈｅｔｅｒｏｅｎｅｏｕｓｉｎｔｒａｄｉｔｉｏｎａｌｃｏｍｕｔｉｎａｒｏｂｌｅｍｒｏｃｅｓｓｏｒｓｅｎｅｒａｌｕｒｏｓｅｇｇｐｇｐｐｇｐｐ）ｎｅｗｃｏｍｕｔｉｎｔｅｃｈｎｏｌｏｂａｓｅｄｏｎＯｅｎＣＬ（ｏｅｎｃｏｍｕｔｉｎｌａｎｕａｅｉｓｔｏａｕｎｉｆｉｅｄｅｎｅｒａｌｕｒｏｓｅｒｏｏｓｅｄｒｏｖｉｄｅｒｏ－ｐｇｇｙｐｐｐｇｇｇｇｐｐｐｐｐｐ，，，ｍｏｄｅｌ．ＦｉｒｓｔｌＯｅｎＣＬｆｅａｔｕｒｅｓｆｒａｍｅｗｏｒｋａｎｄａｒｅｄｅｓｃｒｉｂｅｄａｎｄｔｈｅｏｔｉｍｉｚａｒａｍｍｉｎｅｒｆｏｒｍａｎｃｅｒｉｎｃｉｌｅｓｅｒｆｏｒｍａｎｃｅ－ｙｐｐｇｇｐｐｐｐ，Ｏ）ｒｏｏｓｅｄ．ＡｔｔｉｏｎｓｔｒａｔｅｉｅｓｏｆＯｅｎＣＬａｒｅｌａｓｔｅｎＣＬａｎｄＣＵＤＡ（ｃｏｍｕｔｅｕｎｉｆｉｅｄｄｅｖｉｃｅａｒｃｈｉｔｅｃｔｕｒｅａｒｅｃｏｍａｒｅｄｗｉｔｈｐｐｇｐｐｐｐ ’ ｒｏｃｅｓｓｏｒｓｏｔｅｎｔｉａｌｏｔｈｅｒｃｏｍｕｔｉｎｔｅｃｈｎｏｌｏｉｅｓ．ＴｈｅｒｅｓｕｌｔｓｈｏｗｓｔｈａｔＯｅｎＣＬｃａｎｆｕｌｌｅｘｃａｖａｔｅｖａｒｉｏｕｓｉｎｈｅｔｅｒｏｅｎｅｏｕｓｐｐｐｇｇｐｙｇ，，ｌａｔｆｏｒｍｓａｎｄｄｉｓｔｒｉｂｕｔｅｔａｓｋｓｒｅａｓｏｎａｂｌａｎｄａｎｅｗｏｗｅｒｆｕｌｔｏｏｌｆｏｒｌａｒｅｓｃａｌｅａｒａｌｌｅｌｃｏｍｕｔｉｎｉｓｒｏｖｉｄｅｄ．ｒｏｃｅｓｓｉｎ－ｐｙｐｇｐｐｇｐｐｇ：；；；ＫｅｗｏｒｄｓｈｅｔｅｒｏｅｎｅｏｕｓｃｏｍｕｔｉｎＯｅｎＣＬ；ｏｔｉｍｉｚａｔｉｏｎＣＵＤＡｒｏｃｅｓｓｏｒｓｅｎｅｒａｌｕｒｏｓｅｅｒｆｏｒｍａｎｃｅｇｐｇｐｐｐｇｐｐｐｙ

基于OpenCL的隐马尔可夫模型的GPU并行实现

① ④ ④
隐马尔可夫模型作为一个时间序列的统计分析模型在语音识别、生物序列分析
间系统。
② 识别问题指的是在给出模型
＝
为了方便讨论，根据文献［１】，本文首先（１）隐藏的状态集合记为
三类关键算法的并行化设计，并在对三种关
｛，， … ），其中为可能取值的个数，并记时刻ｔ
＝
（Ｆｏｒｗａｒｄ）算法、Ｖｉｔｅｒｂｉ算法、Ｂａｕｍ—
ｗｅｌｃｈ算法。三种算法的具体描述均可在
一
键算法深入研究的基础上，开发了基于开放的观察值为Ｏ文找到。ｔ，０ｔ ∈Ｖ；式计算语言（ＯｐｅｎＣＬ）的并行实现。实验（３）当步长为ｌ时，条件概率（１）为转结果表明在各种情况下获得了良好的加速移概率，ＯｐｅｎＣＬ简介简记为研，所有的嘶构成转移概２比性能。０ＰｅｎＣＬ是一个面向异构平台编程的率矩阵Ａ，且ａ０＝ｌ；开放性计算语言，其将各类计算设备抽象』＝Ｉ
＝
ｃｕ可以看成为一个单指多数据流处（５）初始的状态概率分布向量记为组成，｛疗｝，其中＝Ｐ（ｑ：），１ ≤ Ｎｌ理器。在设备端执行的一个完整程序称为

基于OpenCL的并行方腔流加速性能分析

第２８卷第４期２１０１年４月
计算机应用研究
ＡｐｌａｉｎＲｅｅｒｈｏｏｕｅｓｐｉｔｓａｃｆＣｍｐｔｒｃｏ
Ｖ０Ｊ．Ｉ２８Ｎｏ４Ａｐｒ２．０１１
基于ＯｐｎＬ的并行方腔流加速性能分析冰ｅＣ
一
发布了支持ＯｅＣ．用计算规范的驱动程序，得人们ｐｎＬ１０通使
可以真正使用ＯｅＣｐｎＬ技术释放ＧＵ和ＣＵ强大的并行计算ＰＰ能力。ＯｅＣｐｎＬ允许开发人员把同样的代码移植到任何支持该标准的平台上运行，大大降低了程序移植性的复杂度。而开发
ｅａｉ，ＣｉｅｃｄｍｙＳｉｃ，ｅｉｇ１０９ＣｉａｈｎｓｈｎｓＡａｅｃｎｅＢｉｎ０１０，ｈｎ）ｃｅｅｓｊ
Ａｂｓｒｃ：Ｔｓｐａｅｒｐｓｄａｎｅａｒａｈｔｃｅｅａｅｃｖｔｏｐｏｌｍｉｅｎｅｈｏｏｙ．ｉａｒｔａｓｔａｔｈｉｐｒｐｏｏｅｗｐｐｏｃｏａｃｌｒｔａｉｙｆｗｒｂｅｕｓｎｇＯｐＣＬｔｃｎｌｇＴｈｓｐｐｅｒｎ — ｌｆｒｄｔｅｃｖｔｏｐｏｌｍｎｏＮ— ｑａｉｎａｄｓｌｅｔｂｎｔｉｆｒｎｅｍｅｈｏ．Ｄｕｎｈｓｐｏｅｕｅ，ｕｅｏａｏｍｅｈａｉｙｆｗｒｂｅｉｔＳｅｕｔｏｎｏｖｄｉｙｆｉｅｄｆｅｅｃｔｄｌｉｉｒｇｔｉｒｃｄｒｓｄｌｃｌｍｅｒｎｔｅｅｈｎｑｕｓｔｃｅｅａｅｔｏｅ．ｉａｅａｈｅｐｒｇａｏｔｍｏｙａｄｏｈｒｔｃｉｅｏａｃｌｒｔｈｅｃｄＴｈｓｐｐｒｒｎｔｏｒｍｎｂｏｈＮＶＩＡｎＤＩａｄＡＴＩｐａｆｒｓＴｈｘｌｔｏｍ．ｅｅ — ｐｒｍｅｔｓｏｘｔｍｅｐｅｄｕｅｉｎｈｗｓａ３０ｉｓｓｅｐ．

基于OpenCL的尺度不变特征变换算法的并行设计与实现

基于OpenCL的尺度不变特征变换算法的并行设计与实现许川佩;王光【期刊名称】《计算机应用》【年(卷),期】2016(036)007【摘要】针对尺度不变特征变换(SIFT)算法实时性差的问题,提出了利用开放式计算语言(OpenCL)并行优化的SIFT算法.首先,通过对原算法各步骤进行组合拆分、重构特征点在内存中的数据索引等方式对原算法进行并行化重构,使得算法的中间计算结果能够完全在显存中完成交互;然后,采用复用全局内存对象、共享局部内存、优化内存读取等策略对原算法各步骤进行并行设计,提高数据读取效率,降低传输延时;最后,利用OpenCL语言在图形处理单元(GPU)上实现了SIFT算法的细粒度并行加速,并在中央处理器(CPU)上完成了移植.与原SIFT算法配准效果相近时,并行化的算法在GPU和CPU平台上特征提取速度分别提升了10.51～19.33和2.34～4.74倍.实验结果表明,利用OpenCL并行加速的SIFT算法能够有效提高图像配准的实时性,并能克服统一计算设备架构(CUDA)因移植困难而不能充分利用异构系统中多种计算核心的缺点.【总页数】6页(P1801-1806)【作者】许川佩;王光【作者单位】桂林电子科技大学电子工程与自动化学院,广西桂林541004;广西自动检测技术与仪器重点实验室(桂林电子科技大学),广西桂林541004;桂林电子科技大学电子工程与自动化学院,广西桂林541004;广西自动检测技术与仪器重点实验室(桂林电子科技大学),广西桂林541004【正文语种】中文【中图分类】TP391.4【相关文献】1.基于改进尺度不变特征转换算法的合成孔径雷达图像配准并行研究 [J], 张建勋;孙权;李涛2.基于改进尺度不变特征转换算法的合成孔径雷达图像配准并行研究 [J], 张建勋;孙权;李涛3.基于尺度不变特征变换特征点应用于印刷检测的快速匹配算法 [J], 谢文吉;孙晓刚;张亮4.基于尺度不变特征变换特征点应用于印刷检测的快速匹配算法 [J], 谢文吉;孙晓刚;张亮;5.基于改进的尺度不变特征变换特征点匹配的电子稳像算法 [J], 孟勃;韩广良因版权原因，仅展示原文概要，查看原文内容请购买。

基于DCT预测编码的Epiphany-OpenCL大矩阵乘并行计算

基于DCT预测编码的Epiphany-OpenCL大矩阵乘并行计算龙卓群;王晓瑜;王昌明【期刊名称】《自动化与仪表》【年(卷),期】2017(032)007【摘要】为提高大矩阵乘的并行计算效率和计算精度,该文提出一种基于DCT预测编码的Epiphany-OpenCL大矩阵乘并行计算方法.首先,引入DCT预测编码技术,并利用其二维数据的DCT变换值及其逆变换的二维表达式,对未编码数据的预测来消除邻近数据间在时间域以及空间域上的相关性,以达到对数据进行压缩的目的;其次,基于Epiphany进行OpenCL的并行变换编码处理流程设计,实现矩阵乘的并行化计算;最后,通过在常用编程模型和大矩阵乘法上的试验,显示所提方法具有更高的计算效率和精度,验证了所提并行计算方法的性能优势.%In order to improve the parallel computing efficiency and computational precision of large matrix multiplication,a parallel computation method of Epiphany-OpenCL based on DCT predictive coding is proposed.Firstly,the DCT prediction encoding technology was introduced,and its two dimensional inverse DCT transform expression value of the two-dimensional data was used to predict the encoding data to eliminate the correlation in time domain and space domain of the adjacent data,which realized the compression of data;Secondly,based on Epiphany,the parallel transform coding process of OpenCL is designed to realize the parallel computation of matrix multiplication;Finally,experiments on the common programming modeland the large matrix multiplication show that the proposed method has higher computational efficiency and accuracy,and the performance advantage of the proposed parallel computing method is verified.【总页数】7页(P16-21,33)【作者】龙卓群;王晓瑜;王昌明【作者单位】西安航空学院电子工程学院,西安710077;西安航空学院电子工程学院,西安710077;西安电子科技大学电子工程学院,西安710077【正文语种】中文【中图分类】TP391【相关文献】1.基于预测编码与DCT变换编码的比较 [J], 鲁业频2.基于Fan模型非负矩阵分解的光谱解混并行计算 [J], 江子特;赵辽英;邹佳林3.基于GPU的多项式矩阵特征值并行计算 [J], 杨智诚;黄友钦;傅继阳;4.基于 GPU 的多项式矩阵特征值并行计算 [J], 杨智诚;黄友钦;傅继阳5.基于MPI的矩阵相乘并行计算的一种探究 [J], 张亮;赵妍因版权原因，仅展示原文概要，查看原文内容请购买。

基于OpenCL的Prewitt算法的并行实现

基于OpenCL的Prewitt算法的并行实现马歌;肖汉【期刊名称】《现代电子技术》【年(卷),期】2014(000)020【摘要】Prewitt算法是数字图像分割中最常用的边缘检测算法。

采用传统CPU 上的串行方法实现该算法需要较大的计算量、耗时较长，因此，通过GPU对其进行性能加速有着重要的意义。

然而由于GPU硬件体系结构的差异性，跨平台移植是一件非常困难的工作。

针对上述问题，提出了一种基于OpenCL异构框架的Prewitt图像边缘检测并行算法。

实验结果表明，该并行算法比CPU上的串行算法运行速度快，加速比可达30倍，有效地提高了大规模数据处理的效率，可移植性好，具有较高的应用价值。

%Prewitt algorithm is the most commonly used edge detection algorithm in digital image segmentation,but large amount of calculations and great time consumption are needed to be suffered if traditional CPU serial method is used to imple-ment the algorithm. Therefore,it is important to accelerate its performance by GPU. However,the cross platform transplantation is very difficult because of the difference of GPU hardware system structure. In view of the above questions,a parallel algorithm of Prewitt image edge detection based on OpenCL heterogeneous framework is proposed. The experimental results show that the running speed of the parallel algorithm is faster than that of the serial algorithm in CPU,and its speedup ratio is 30 times as the serialalgorithm. It improved the efficiency of large-scale data processing effectively. It has good portability and high application value.【总页数】4页(P103-106)【作者】马歌;肖汉【作者单位】郑州师范学院信息科学与技术学院，河南郑州 450044;郑州师范学院信息科学与技术学院，河南郑州 450044【正文语种】中文【中图分类】TN919-34;TP391【相关文献】1.基于OpenCL的JPEG压缩算法并行化设计与实现 [J], 张敏华;张剑贤;裘雪红;周端2.基于OpenCL的并行kNN算法设计与实现 [J], 杨朋霖;冯百明;周志阳;温向慧3.基于OpenCL的尺度不变特征变换算法的并行设计与实现 [J], 许川佩;王光4.基于OpenCL的NDVI算法的并行化实现 [J], 熊英;罗琼5.基于OpenCL的加速鲁棒特征算法并行实现 [J], 郭景;陈贤富因版权原因，仅展示原文概要，查看原文内容请购买。

opencl编程模型

opencl编程模型OpenCL编程模型OpenCL（Open Computing Language）是一种跨平台的并行编程模型，它允许开发者在不同的计算设备上进行并行计算。

OpenCL的出现使得开发者能够充分利用计算设备的并行能力，提高计算效率。

本文将介绍OpenCL编程模型的基本概念和使用方法。

一、OpenCL编程模型的基本概念1.1 平台（Platform）OpenCL支持多种计算平台，如CPU、GPU、FPGA等。

每个平台都有自己的设备和驱动程序。

开发者可以选择适合自己需求的平台进行并行计算。

1.2 设备（Device）设备是OpenCL编程的执行单元，包括CPU、GPU等。

每个设备都有自己的计算单元、内存等资源。

开发者可以在不同设备上进行并行计算，充分利用各种计算资源。

1.3 上下文（Context）上下文是OpenCL编程的环境，包括平台和设备的信息。

开发者需要创建一个上下文，以便在指定设备上进行并行计算。

上下文包含了与设备相关的参数和配置信息。

1.4 内核（Kernel）内核是OpenCL编程的计算单元，它是在设备上执行的并行函数。

开发者需要使用OpenCL C语言编写内核代码，并将其编译成设备可执行的形式。

内核函数可以接收参数，读取和写入设备内存。

1.5 命令队列（Command Queue）命令队列是OpenCL编程的执行单元，用于管理内核的执行。

开发者可以将多个内核函数加入到命令队列中，OpenCL会按照加入的顺序执行内核函数。

命令队列还可以控制内核的执行顺序和同步。

二、OpenCL编程的基本步骤2.1 创建上下文和命令队列开发者需要创建一个上下文，并选择一个设备。

可以通过OpenCL 提供的API函数来查询可用的平台和设备信息，然后选择合适的设备创建上下文。

同时，还需要创建一个命令队列，用于管理内核的执行。

2.2 创建内存对象在进行并行计算之前，开发者需要为输入和输出数据创建内存对象。

基于DCT并行加速算法图像渲染平台系统设计

基于DCT并行加速算法图像渲染平台系统设计
于艳东
【期刊名称】《计算机测量与控制》
【年(卷),期】2014(22)5
【摘要】此次主要研究了基于GPU的集群渲染系统平台设计;为了提高平台的工作效率、增强集群渲染系统平台的数据传输能力,提出了一种采用DCT变换的方法来加速图像渲染速度;该方法利用DCT变换算法加速图像的实时压缩,加入CPU监控器和任务分配器模块,让GPU和CPU共同承担了绘图和渲染的目的,这样有效地降低处理流程对CPU的占用,实现了三维绘图和特效渲染的加速;为了验证平台的有效性以及图像压缩处理的效果,做了相应的功能验证;对640×480的RCB图像使用上述压缩方法和JPEG标准库在不同压缩设置下进行实验;仿真实验结果表明所提方案具有更高的压缩效率.
【总页数】3页(P1516-1518)
【作者】于艳东
【作者单位】集宁师范学院计算机系,内蒙古乌兰察布012000
【正文语种】中文
【中图分类】TP302
【相关文献】
1.基于CUDA平台的DR图像增强处理加速算法 [J], 何祥彬;周荷琴;李方勇
2.研究基于并行渲染的虚拟现实开发平台设计与实现 [J], 陈衡
3.基于并行渲染的虚拟现实开发平台设计与实现 [J], 申闫春;王锐;郭富荣;刘茂朕
4.一种基于网络的并行渲染和跨平台同步展示系统 [J], 吕圣卿;陈一民;黄晨;高明柯
5.基于视觉传达效果的海报图像自动渲染系统设计 [J], 周先博
因版权原因，仅展示原文概要，查看原文内容请购买。