PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IMAGE COMPRESSION USING V


FPGA_ASIC: An Improved FPGA Implementation of the 2D-DCT

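$$\begin{bmatrix} \vdots \\ 2^{-4} + 2^{-5} + 2^{-9} + 2^{-10} + 2^{-11} + 2^{-12} \end{bmatrix}$$
The block diagram illustrates how $\frac{1}{2}\cos(\pi/16)\,[X(0)-X(7)]$ is computed by shift-and-add. Three register stages are inserted into the shift-and-add network to form a three-stage pipeline, so that the three levels of addition deliver one result per clock cycle.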
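The shift-and-add idea can be sketched in software as follows. This is a minimal illustration, not the authors' hardware: the function name `shift_add_multiply` and the 12-bit fixed-point scaling are assumptions made here. The shift set (4, 5, 9, 10, 11, 12) is the vector entry shown above; numerically it sums to about 0.09741, which is close to (1/2)cos(7π/16) ≈ 0.09755, so it is presumably the approximation of that coefficient.

```python
import math

def shift_add_multiply(x: int, shifts: tuple, frac_bits: int = 12) -> int:
    """Multiply integer x by sum(2**-s for s in shifts) using only shifts and adds.

    The result is a fixed-point integer scaled by 2**frac_bits.
    """
    if frac_bits < max(shifts):
        raise ValueError("frac_bits must be at least the largest shift")
    acc = 0
    for s in shifts:
        # x * 2**-s with frac_bits fractional bits equals x << (frac_bits - s)
        acc += x << (frac_bits - s)
    return acc

SHIFTS = (4, 5, 9, 10, 11, 12)            # entry from the coefficient vector
coeff = sum(2.0 ** -s for s in SHIFTS)    # value the shifts approximate
target = 0.5 * math.cos(7 * math.pi / 16)

if __name__ == "__main__":
    print(coeff, target)
    print(shift_add_multiply(100, SHIFTS) / 2 ** 12)  # approximately 100 * coeff
```

In hardware each term is a wired shift, so only the adders cost logic; the registers between adder levels give the pipeline described in the text.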
3.3 Transpose RAM module
The 2D-DCT requires two 1D-DCT passes, but the intermediate results produced by the first 1D-DCT are not …
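Let X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7) be one row of input data, and Y(0), Y(1), Y(2), Y(3), Y(4), Y(5), Y(6), Y(7) the corresponding row of output data after the DCT. From the mathematical definition of the 1D-DCT:
$$Y(0) = \frac{1}{2\sqrt{2}}\,[X(0) + X(7)] + \cdots$$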
$$\sqrt{2/N}\; c(v) \sum_{j=0}^{N-1} x(i,j)\,\cos\frac{(2j+1)v\pi}{2N}$$
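Because the 2D-DCT is separable into rows and columns, the 2D-DCT of an 8×8 data block can be decomposed into an 8-point row DCT followed by an 8-point column DCT. Figure 1 shows the overall block diagram of the 2D-DCT implementation. The hardware consists of five main modules: (1) a serial-to-parallel conversion module, (2) a 1D-DCT module, (3) a transpose RAM module, (4) a parallel-to-serial conversion module, and (5) a control module.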
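The row-column decomposition can be sketched in a few lines of plain Python. This is a floating-point software model written for illustration, not the FPGA design: the function names are assumptions, and the `transpose` step only plays the role that the transpose RAM plays in hardware.

```python
import math

N = 8

def dct_1d(x):
    """8-point DCT-II with the sqrt(2/N) * c(v) scaling."""
    out = []
    for v in range(N):
        c = 1 / math.sqrt(2) if v == 0 else 1.0
        s = sum(x[j] * math.cos((2 * j + 1) * v * math.pi / (2 * N)) for j in range(N))
        out.append(math.sqrt(2 / N) * c * s)
    return out

def transpose(block):
    """Swap rows and columns (the job of the transpose RAM)."""
    return [list(col) for col in zip(*block)]

def dct_2d(block):
    rows = [dct_1d(r) for r in block]             # first pass: 1D-DCT on each row
    cols = [dct_1d(c) for c in transpose(rows)]   # second pass: 1D-DCT on each column
    return transpose(cols)                        # restore the row-major layout

if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    print(round(dct_2d(flat)[0][0], 6))  # all energy lands in the DC coefficient
```

Reusing one 1D-DCT routine for both passes mirrors the hardware motivation: the same datapath serves rows and columns once the intermediate block has been transposed.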
$$\cdots\,X(5)] - \frac{1}{2}\cos\frac{5\pi}{16}\,[X(3) - X(4)]$$
$$Y(4) = \frac{1}{2\sqrt{2}}[X(0)+X(7)] - \frac{1}{2\sqrt{2}}[X(1)+X(6)] - \frac{1}{2\sqrt{2}}[X(2)+X(5)] + \frac{1}{2\sqrt{2}}[X(3)+X(4)]$$
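As a quick check of the symmetric form, the sketch below compares Y(0) and Y(4) computed from the pair sums X(j) + X(7−j) against the direct 8-point definition. The function names are illustrative assumptions, and the common factor 1/(2√2) follows from the scaling √(2/N)·c(v) with N = 8.

```python
import math

def dct8_direct(x, v):
    """Y(v) from the 8-point DCT definition, with sqrt(2/8) = 1/2."""
    c = 1 / math.sqrt(2) if v == 0 else 1.0
    return 0.5 * c * sum(x[j] * math.cos((2 * j + 1) * v * math.pi / 16) for j in range(8))

def y0_y4_from_pairs(x):
    """Y(0) and Y(4) using only the four pair sums X(j) + X(7 - j)."""
    a0, a1, a2, a3 = x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4]
    k = 1 / (2 * math.sqrt(2))
    return k * (a0 + a1 + a2 + a3), k * (a0 - a1 - a2 + a3)

if __name__ == "__main__":
    row = [3, -1, 4, 1, -5, 9, 2, 6]
    print(y0_y4_from_pairs(row))
    print(dct8_direct(row, 0), dct8_direct(row, 4))
```

Sharing the pair sums between outputs is what lets the hardware form each sum once and feed it to several coefficient multipliers.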

西安工程大学学报2023年总目次

西安工程大学学报2023年总目次Ә纺织科学与工程亚麻短纤维增强硅橡胶复合材料的力学性能周子祥,等第1期(1) 新媒体广告推送方式对服装购买意愿的影响周 捷,等第1期(6) 台湾高山族传统服饰中的刺绣针法赖文蕾,等第1期(14) 基于K A N O 模型的冲锋衣口袋款式需求周 捷,等第2期(1) 基于P S O 的G P C -P I D 的细纱机锭速控制算法王延年,等第2期(9) B i O B r 光热超疏水涂层制备及其防冰除冰性能张彩宁,等第3期(1) 基于C u NW s /A g NWs /棉纺织品的疏水性可穿戴压力传感器屈银虎,等第3期(7) P B O 纤维湿法非织造材料热压工艺李志刚,等第3期(15) 基于逆向工程的青年女性夜跑服设计薛 媛,等第3期(21) 可活动式男体立裁人台手臂的研制方法对比许 珂,等第3期(28) 基于岭回归的改良 新唐装 款式设计周 捷,等第4期(1) 基于感知风险与感知价值的婚纱租赁接受意愿影响因素张云鹤,等第4期(8) 服装品牌社交电商平台宣传策略对消费者购买意愿的影响:以小红书为例冯润榴,等第4期(16) 基于深度置信网络的缝纫平整度客观评价模型胡 胜,等第4期(25) 基于图像特征的纱线条干均匀度实时检测宋栓军,等第4期(32) 改进自抗扰下的细纱机卷绕系统控制策略廉继红,等第4期(40) Ә环境工程·化学化工面向I G B T 模块的冷却方式及微通道冷却在I G B T 中的应用研究吴曦蕾,等第1期(21) 自然条件下水冷捕获量的建模与验证孙铁柱,等第1期(38) 有机氟丙烯酸树脂/S i O 2超疏水涂层的制备与性能赵亚梅,等第1期(46) 低共熔溶剂辅助酶法制备稀有人参皂苷C K 樊雨柔,等第1期(54) 纳米Z r O 2/Z n -A l -C 涂层在模拟地热水中的防腐性能余 嵘,等第1期(62) R s -198液体有机菌肥制备及其促生性能研究朱双喜,等第1期(71) 好氧颗粒污泥对活性黑5染料的降解陈 希,等第2期(32) 基于A i r p a k 的某建筑工地活动板房室内热环境数值模拟狄育慧,等第2期(40) 延河底泥的重金属分布特征和生态风险评价王理明,等第2期(47) 酿酒酵母启动子的克隆及特性表征孙琳琳,等第3期(51) 复合微生物腐解菌剂的制备及其菌渣堆肥性能李方向,等第3期(59) 蒸发冷却空调水质及处理方法的适用性黄 翔,等第3期(66) I n 2S 3/U i O -67异质结的构筑及可见光催化清除C r (Ⅵ)和R h B 袁童乐,等第4期(64) MA -S A S -H E MA 三元共聚物的合成及其阻垢性能余 嵘,等第4期(74) Ә电子信息与机电工程基于改进U N e t 模型的原棉杂质图像分割方法许 涛,等第1期(77) 含典型缺陷的风电塔筒环焊缝强度分析成小乐,等第1期(84)动态调整蚁群算法启发因子的A G V 路径规划沈丹峰,等第1期(93) 基于改进E S O 的柔性机械臂自抗扰-滑模组合控制朱其新,等第1期(103) 智能投影电视意象耦合造型仿生设计高小针,等第1期(112) 基于纵向阻抗的变压器虚拟相位保护夏经德,等第2期(54) 电网频率控制的新型三电平光储一体机王 刚,等第2期(63) 自适应变分模态分解与R C N N -3结合的扬声器异常声分类方法周静雷,等第2期(71) 基于B P 神经网络的电磁阀多目标优化设计沈丹峰,等第2期(79) 渐进式深度网络下盲运动图像去模糊方法王晓华,等第3期(74) 改进D *算法下的无人机三维路径规划汪小帅,等第3期(83) 多尺度混合注意力网络的图像超分辨率重建李云红,等第3期(92) 融合直觉模糊灰色理论的制造云服务Q o S 评价方法陈 君,等第3期(101) 基于双源自适应知识蒸馏的轻量化图像分类方法张凯兵,等第4期(82) 结合先验知识与深度强化学习的机械臂抓取研究缪刘洋,等第4期(92) 基于浸入与不变自适应的机械臂轨迹跟踪控制方法汤元会,等第4期(102) 局部遮荫下基于I P &O -S S A 的M P P T 控制研究王延年,等第4期(110) 改进D e e p L a b V 3+下的轻量化烟雾分割算法陈 鑫,等第4期(118) 基于新型特征增强与融合的雾天目标检测方法朱 磊,等第6期(106) 用于自动驾驶的双注意力机制语义分割方法王延年,等第6期(114) 优化脉振高频信号注入的P M S M 无位置传感器控制方法张 蕾,等第6期(121) T 型受限微通道内液滴生成特性数值模拟袁越锦,等第6期(129) 联合边界感知和多特征融合的点云语义分割方法卢 健,等第6期(137) 基于改进R N N 多源融合算法的网络异构信息集成管理系统李 
麟,等第6期(145) 基于胶囊网络的入侵检测模型赵 旭,等第1期(119) 小数据集下基于改进QMA P 算法的B N 参数学习陈海洋,等第1期(126) 基于E f f i c i e n t F a c e N e t s 的大规模自然场景人脸识别张凯兵,等第2期(87) 多策略改进的麻雀搜索算法及应用薛 涛,等第2期(96) 多视角原型对比学习的小样本意图识别模型张晓滨,等第2期(105) Ә材料科学时效处理对20C r 渗碳钢制高速直线导轨组织及性能影响王俊勃,等第2期(17) 不同溅射气压下T i N 薄膜的制备及其性能徐 洁,等第2期(25) 包覆铜粉的制备及其电磁吸波性能刘 毅,等第3期(36) N i O 改性纳米多孔A g 电催化氧化硼氢化钠性能研究宋衍滟,等第3期(44) 不同溅射功率下C o C r F e N i C u 高熵合金涂层的耐腐蚀及其抗氧化性能王彦龙,等第4期(48) 钕钆变质镁铝基合金的固溶及时效行为杨建东第4期(56) Ә基础科学线性回归模型多变点的L A D -L A S S O 估计王 珊,等第2期(113) 引入正弦余弦算子和新自花授粉的花授粉算法张 超,等第2期(119)基于多源特征和双向门控循环单元的抗高血压肽识别贺兴时,等第3期(109) 一类具有时滞的S e l k o v 模型的H o p f 分歧分析马亚妮,等第3期(115) 具有恐惧和强A l l e e 效应的离散食饵-捕食者模型胡新利,等第4期(127) 一种具有执行器故障的非线性离散系统的迭代学习控制李丁巳,等第4期(134) 数据中心中机柜出风温度的快速模拟张 博,等第5期(1) 水蓄冷在珠三角地区数据中心应用的节能潜力分析董梓骏,等第5期(10) 间接蒸发冷却在湿热地区数据中心的节能分析马晓晨,等第5期(18) 藏区数据中心热回收式直接蒸发冷却机组的设计与测试黄 翔,等第5期(25) 数据中心气泵驱动复合冷却机组工作特性周 峰,等第5期(32) 声屏障及填料和配水协同优化对湿式冷却塔热力性能的影响步兆彬,等第5期(39) 数据中心间接蒸发冷却空调系统能效评价褚俊杰,等第5期(46) 地板下送风数据中心冷通道导流的结构研究许陆顺,等第5期(53) 基于模型预测控制的数据中心水蓄冷冷却系统节能优化模型郑浩然,等第5期(61) 回热式间接蒸发冷却地区适应性的数值模拟徐 鹏,等第5期(69) 基于线性S VM 算法的云数据中心蓄电池状态预测杨玉丽,等第5期(77) 数据中心送风冷通道的导流构件结构优化巩 莉,等第5期(83) 室内工况对蒸发冷凝气泵热管复合空调的影响王 飞,等第5期(92) 高热流密度多热源冷却用相变换热冷板实验研究刘 凯,等第5期(99) 基于全生命周期成本的装配式高效制冷机房设计凌荣武,等第5期(107)Ә建筑环境与舒适健康过渡季高校教室短期热经历对热舒适与热适应的影响蒋 婧,等第6期(1) 夏热冬冷地区办公建筑空气源热泵与太阳能复合供暖系统运行特性邓淑丹,等第6期(8) 基于G R A -P S O -B P 神经网络的办公建筑负荷率及冷冻水供水温度预测马静静,等第6期(17) 间歇用能特征下的干湿式地板辐射供暖热性能对比周文杰,等第6期(26) 传统村落微气候环境模拟应用与空间优化 以汉中市乐丰村为例李 晶,等第6期(34) 冬季产后女性热偏好及其影响因素王丽娟,等第6期(42) 中国不同地区居民节能意识影响因素调查常皓冉,等第6期(50) Ә电力安全与智能装备关键技术输电线路中污秽复合绝缘子异常发热研究曹 雯,等第6期(60) 恶劣环境下多参量融合的断路器操动机构辅助开关研究邱鹏锋,等第6期(69) 电力系统中全光纤电流传感器的研究进展高 超,等第6期(78) 光伏组件覆雪层的自然融化脱落条件朱永灿,等第6期(89) 直流微网中双有源桥变换器精确直接功率控制叶育林,等第6期(96)。

Single-Object Tracking Algorithm Based on Multi-Layer Feature Embedding

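1. Overview

Single-object tracking based on multi-layer feature embedding is a tracking technique widely used in computer vision. Its core idea is to extract a feature representation of the target through multi-layer feature embedding and to track the target with that representation.

The algorithm first preprocesses the input image with dimensionality reduction and enhancement, then feeds the reduced image into a neural network to obtain feature maps at several levels. Pooling these feature maps yields a low-dimensional feature vector. This feature vector is passed to the tracker, which follows the target in real time.

To improve single-object tracking performance, this study proposes a method based on multi-layer feature embedding. The method first introduces an adaptive learning-rate strategy so that the network adjusts its learning rate automatically according to the current training state. It then adds an attention mechanism so that the network focuses on the most informative features.

To further improve the tracker's robustness, the study also adopts a fusion scheme in which the results of several trackers are combined by weighted fusion to obtain a more accurate estimate of the target position.

Experiments show that the proposed method achieves significant performance gains on several datasets, demonstrating its effectiveness and feasibility for single-object tracking.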
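The weighted fusion of several trackers' outputs can be illustrated with a small sketch. The text does not say how the weights are chosen or how boxes are encoded, so the (x, y, w, h) box format, the function name `fuse_boxes` and the use of per-tracker confidence scores as weights are all assumptions made for illustration.

```python
def fuse_boxes(boxes, weights):
    """Weighted average of (x, y, w, h) boxes; weights need not sum to 1."""
    if len(boxes) != len(weights) or not boxes:
        raise ValueError("need one weight per box and at least one box")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Average each of the four box coordinates separately.
    return tuple(
        sum(w * b[k] for b, w in zip(boxes, weights)) / total
        for k in range(4)
    )

if __name__ == "__main__":
    estimates = [(0.0, 0.0, 10.0, 10.0), (10.0, 10.0, 10.0, 10.0)]
    confidences = [1.0, 3.0]   # hypothetical per-tracker confidence scores
    print(fuse_boxes(estimates, confidences))
```

A tracker with a higher weight pulls the fused box toward its own estimate, which is how an unreliable tracker can be down-weighted without being discarded.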

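1.1 Research background

With the rapid development of computer vision and deep learning, object tracking plays an increasingly important role in many fields, such as security, intelligent surveillance and autonomous driving.

Single-object tracking (SOT) is widely used in video analysis: it tracks a single target through a video sequence in real time and compares its position across adjacent frames to estimate the target's trajectory.

Traditional single-object tracking algorithms are not robust to complex scenes, occlusion or motion blur. To address these problems, researchers have proposed many improved algorithms, such as trackers based on the Kalman filter, on the extended Kalman filter, and on deep learning.

These methods improve single-object tracking to some extent but still have limitations, such as insufficient support for multi-object tracking and poor adaptability to non-stationary motion.

Developing a single-object tracking algorithm that both tracks a single target effectively and copes with these challenges is therefore of theoretical and practical significance.

1.2 Research objective

This study aims to design a single-object tracking algorithm based on multi-layer feature embedding that improves the accuracy and robustness of tracking.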

Probability-Based Method, Apparatus and Medium for Processing the Outer Contour of a 2D Point Cloud

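Four sample texts are provided for readers' reference.

Sample 1

In computer vision and graphics processing, a 2D point cloud is a common form of data: a set of 2D points, typically used to represent the shape and contour of an object. Outer-contour processing is the extraction of an object's boundary or contour from the point data, which is important for recognizing and analyzing objects. This text presents a probability-based method for processing the outer contour of a 2D point cloud, together with the associated apparatus and medium.

I. Probability-based outer-contour processing of a 2D point cloud

Conventional methods usually extract the contour with algorithms based on geometric shape or on neighbor relations between points. Such methods tend to perform poorly on noisy or irregularly shaped point clouds, and the extracted contour is often incomplete or wrong. To address this, we propose a probability-based method.

We first represent the 2D point cloud as a set of points P = {p1, p2, ..., pn}, where pi = (xi, yi) gives the coordinates of a point.

We then introduce a probability model to describe how the points are distributed. Specifically, we assume that the point cloud follows a Gaussian mixture model, in which each point is generated by a weighted combination of several Gaussian distributions. The mixture is described by the parameters θ = {μ1, Σ1, w1, ..., μk, Σk, wk}, where μi and Σi are the mean and covariance matrix of the i-th Gaussian and wi is its weight.

Next, we estimate θ with the EM algorithm so that the model fits the point cloud. EM alternates two steps, the E-step and the M-step. In the E-step we compute the posterior probability that each point belongs to each Gaussian, that is, the degree of association between each point and each component. In the M-step we update θ from the E-step results so that the model fits the data better.

Finally, we use the fitted mixture to extract the contour. From the Gaussian probability density we compute, for each point, the probability that it belongs to the contour. We then select the contour points according to these probabilities, which gives the object's outer contour.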
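The E-step and M-step described above can be sketched in plain Python for 2D points. This is a minimal illustration under stated assumptions: the initial means are supplied by the caller, covariances start as identity matrices, and a small regularizer `reg` keeps covariances invertible. The text does not specify how a contour probability is derived from the density, so `mixture_density` only exposes the fitted density; treating low-density points as contour candidates would be one possible reading, not the documented rule.

```python
import math

def gauss2d(p, mu, cov):
    """Density of a 2D Gaussian with mean mu and 2x2 covariance cov at point p."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    # Quadratic form with the inverse covariance written out explicitly.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def em_gmm(points, init_mus, iters=50, reg=1e-6):
    """Fit a k-component 2D Gaussian mixture with EM; returns (weights, means, covs)."""
    k, n = len(init_mus), len(points)
    mus = [tuple(m) for m in init_mus]
    covs = [((1.0, 0.0), (0.0, 1.0)) for _ in range(k)]
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for p in points:
            dens = [ws[j] * gauss2d(p, mus[j], covs[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([v / total for v in dens])
        # M-step: re-estimate weight, mean and covariance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my) for r, p in zip(resp, points)) / nj
            ws[j], mus[j], covs[j] = nj / n, (mx, my), ((sxx, sxy), (sxy, syy))
    return ws, mus, covs

def mixture_density(p, ws, mus, covs):
    """Density of the fitted mixture at point p."""
    return sum(w * gauss2d(p, m, c) for w, m, c in zip(ws, mus, covs))
```

EM only finds a local optimum, so the result depends on the initial means; in practice several initializations are usually compared.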

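II. Apparatus and medium

Implementing the probability-based outer-contour processing method for 2D point clouds requires a corresponding apparatus and medium.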

Face Recognition Combining 2DDCT and Compressed Sensing

(1. School of Electronic and Information Engineering, DLUT, Dalian 116024, China; 2. Curtin University, Perth, WA 6102, Australia; 3. Yili Normal …, Yining 835000, China)
… high dimension and computational complexity. In this paper the original face image is processed by 2DDCT to reduce the character dimensions effectively. Then, the image is processed by CS to obtain the face recognition features. Finally, the … face databases show that this method is … in face recognition, especially on the face database Yale B.

Density Peak Clustering Algorithm Based on Density Similarity [Invention Patent]

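Patent title: Density peak clustering algorithm based on density similarity
Patent type: Invention patent
Inventors: Wang Yanyan, Wan Jing, Tian Xinyu
Application number: CN202210264661.X
Filing date: 2022-03-17
Publication number: CN114638301A
Publication date: 2022-06-17
Patent content provided by Intellectual Property Publishing House

Abstract: The invention addresses three shortcomings of the density peak clustering algorithm (DPC): it is unsuitable for manifold datasets, the choice of cluster centers requires manual intervention, and the assignment of the remaining points is prone to a domino effect. It proposes a density peak clustering algorithm based on density similarity (DA-DPC). First, density similarity replaces Euclidean distance so that manifold datasets can be handled, which removes the influence of the cutoff distance dc on the result. Second, based on the characteristics of density clustering indices and the definition of a cluster, a new density clustering index (DCI) is designed that obtains the cluster centers automatically and reduces the influence of parameters on the result. For the remaining points, two matching strategies are proposed to improve the clustering. Experiments show that the algorithm clusters better than several commonly used clustering algorithms on synthetic datasets and on real UCI datasets.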
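For context, the classical DPC baseline that the abstract criticizes can be sketched as below. This is not the patented DA-DPC: it uses plain Euclidean distance and the cutoff distance `dc`, which are exactly the elements the patent replaces, and the function name `dpc_rho_delta` is an assumption made for illustration. Each point gets a local density rho (neighbors within `dc`) and a distance delta to the nearest point of higher density; points with both values large are candidate cluster centers.

```python
import math

def dpc_rho_delta(points, dc):
    """Return (rho, delta, parent) for classical density peak clustering."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta, parent = [0.0] * n, [-1] * n
    order = sorted(range(n), key=lambda i: -rho[i])   # decreasing density
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])   # densest point: use its largest distance
            continue
        # delta_i: distance to the nearest point ranked as denser than i.
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    return rho, delta, parent

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
    rho, delta, parent = dpc_rho_delta(pts, dc=1.0)
    gamma = sorted(range(len(pts)), key=lambda i: -rho[i] * delta[i])
    print(gamma[:2])   # indices of the two strongest center candidates
```

In classical DPC every remaining point inherits the label of its `parent`, so one wrong assignment propagates to all points that depend on it; this is the domino effect mentioned in the abstract.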

Applicant: Harbin University of Science and Technology
Address: No. 52 Xuefu Road, Nangang District, Harbin, Heilongjiang 150080
Country: CN

Overview of Main Research Directions, Research Content and Academic Achievements

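1. Main research directions

The scholar's main research directions are computer vision, image processing and machine learning. He applies computing techniques to a range of problems in vision and explores how machine learning algorithms can be used in image processing.

2. Research content

In computer vision, his work covers object detection, image classification and image segmentation. He proposed a deep-learning-based object detection method that performs well in a variety of complex scenes. He also studied image classification, raising classification accuracy by modifying conventional convolutional neural network architectures. In addition, he explored image segmentation techniques, which are widely applied in medical image processing and autonomous driving.

In image processing, his work covers image enhancement, image denoising and image reconstruction. He proposed a deep-learning-based image enhancement algorithm that improves visual quality and detail. He also studied image denoising, obtaining good results by denoising images with deep learning models. In image reconstruction, he proposed a method based on sparse representation that reconstructs high-quality images from low-quality ones.

In machine learning, his work covers pattern recognition, feature extraction and classification algorithms. He proposed a deep-learning-based pattern recognition algorithm that achieves good results in face recognition and handwritten character recognition. He also improved conventional feature extraction methods, strengthening the representation of image, text and other data. In addition, he optimized conventional classification algorithms, improving both the accuracy and the efficiency of classification tasks.

3. Academic achievements

The scholar has a series of academic achievements in computer vision, image processing and machine learning. His results have been published in several international journals and conferences and have been widely noted and cited by peers. One of his papers was rated an important breakthrough in the field and was selected as best paper at a major conference. He also received a young scientist award from an international academic organization in recognition of his outstanding contributions to the field.

Summary

This overview of the scholar's research directions, research content and academic achievements shows a high level of research and extensive experience in computer vision, image processing and machine learning.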

English Course for Electronic Information Engineering (5th Edition): Question Bank

Contents
Section A: Term Translation
Section B: Paragraph Translation
Section C: Reading Comprehension Materials
C.1 History of Tablets
C.2 A Brief History of Satellite Communication
C.3 Smartphones
C.4 Analog, Digital and HDTV
C.5 SoC

Section A: Term Translation

Section B: Paragraph Translation

Section C: Reading Comprehension Materials

C.1 History of Tablets

The idea of the tablet computer isn't new. Back in 1968, a computer scientist named Alan Kay proposed that with advances in flat-panel display technology, user interfaces, miniaturization of computer components and some experimental work in WiFi technology, you could develop an all-in-one computing device. He developed the idea further, suggesting that such a device would be perfect as an educational tool for schoolchildren. In 1972, he published a paper about the device and called it the Dynabook.

The sketches of the Dynabook show a device very similar to the tablet computers we have today, with a couple of exceptions. The Dynabook had both a screen and a keyboard all on the same plane. But Kay's vision went even further. He predicted that with the right touch-screen technology, you could do away with the physical keyboard and display a virtual keyboard in any configuration on the screen itself.

Kay was ahead of his time. It would take nearly four decades before a tablet similar to the one he imagined took the public by storm. But that doesn't mean there were no tablet computers on the market between the Dynabook concept and Apple's famed iPad.

One early tablet was the GRiDPad. First produced in 1989, the GRiDPad included a monochromatic capacitance touch screen and a wired stylus. It weighed just under 5 pounds (2.26 kilograms). Compared to today's tablets, the GRiDPad was bulky and heavy, with a short battery life of only three hours. The man behind the GRiDPad was Jeff Hawkins, who later founded Palm.

Other pen-based tablet computers followed but none received much support from the public. Apple first entered the tablet battlefield with the Newton, a device that's received equal amounts of love and ridicule over the years. Much of the criticism for the Newton focuses on its handwriting-recognition software.

It really wasn't until Steve Jobs revealed the first iPad to an eager crowd that tablet computers became a viable consumer product. Today, companies like Apple, Google, Microsoft and HP are trying to predict consumer needs while designing the next generation of tablet devices.

C.2 A Brief History of Satellite Communication

In an article in Wireless World in 1945, Arthur C. Clarke proposed the idea of placing satellites in geostationary orbit around Earth such that three equally spaced satellites could provide worldwide coverage. However, it was not until 1957 that the Soviet Union launched the first satellite, Sputnik 1, which was followed in early 1958 by the U.S. Army's Explorer 1. Both Sputnik and Explorer transmitted telemetry information.

The first communications satellite, the Signal Communicating Orbit Repeater Experiment (SCORE), was launched in 1958 by the U.S. Air Force. SCORE was a delayed-repeater satellite, which received signals from Earth at 150 MHz and stored them on tape for later retransmission. A further experimental communication satellite, Echo 1, was launched on August 12, 1960 and placed into inclined orbit at about 1500 km above Earth. Echo 1 was an aluminized plastic balloon with a diameter of 30 m and a weight of 75.3 kg. Echo 1 successfully demonstrated the first two-way voice communications by satellite.

On October 4, 1960, the U.S. Department of Defense launched Courier into an elliptical orbit between 956 and 1240 km, with a period of 107 min. Although Courier lasted only 17 days, it was used for real-time voice, data, and facsimile transmission. The satellite also had five tape recorders onboard; four were used for delayed repetition of digital information, and the other for delayed repetition of analog messages.

Direct-repeated satellite transmission began with the launch of Telstar I on July 10, 1962. Telstar I was an 87-cm, 80-kg sphere placed in low-Earth orbit between 960 and 6140 km, with an orbital period of 158 min. Telstar I was the first satellite to be able to transmit and receive simultaneously and was used for experimental telephone, image, and television transmission. However, on February 21, 1963, Telstar I suffered damage caused by the newly discovered Van Allen belts.

Telstar II was made more radiation resistant and was launched on May 7, 1963. Telstar II was a straight repeater with a 6.5-GHz uplink and a 4.1-GHz downlink. The satellite power amplifier used a specially developed 2-W traveling wave tube. Along with its other capabilities, the broadband amplifier was able to relay color TV transmissions. The first successful trans-Atlantic transmission of video was accomplished with Telstar II, which also incorporated radiation measurements and experiments that exposed semiconductor components to space radiation.

The first satellites placed in geostationary orbit were the synchronous communication (SYNCOM) satellites launched by NASA in 1963. SYNCOM I failed on injection into orbit. However, SYNCOM II was successfully launched on July 26, 1963 and provided telephone, teletype, and facsimile transmission. SYNCOM III was launched on August 19, 1964 and transmitted TV pictures from the Tokyo Olympics. The International Telecommunications by Satellite (INTELSAT) consortium was founded in July 1964 with the charter to design, construct, establish, and maintain the operation of a global commercial communications system on a nondiscriminatory basis. The INTELSAT network started with the launch on April 6, 1965, of INTELSAT I, also called Early Bird. On June 28, 1965, INTELSAT I began providing 240 commercial international telephone channels as well as TV transmission between the United States and Europe.

In 1979, INMARSAT established a third global system. In 1995, the INMARSAT name was changed to the International Mobile Satellite Organization to reflect the fact that the organization had evolved to become the only provider of global mobile satellite communications at sea, in the air, and on the land.

Early telecommunication satellites were mainly used for long-distance continental and intercontinental broadband, narrowband, and TV transmission. With the advent of broadband optical fiber transmission, satellite services shifted focus to TV distribution, and to point-to-multipoint and very small aperture terminal (VSAT) applications. Satellite transmission is currently undergoing further significant growth with the introduction of mobile satellite systems for personal communications and fixed satellite systems for broadband data transmission.

C.3 Smartphones

Think of a daily task, any daily task, and it's likely there's a specialized, pocket-sized device designed to help you accomplish it. You can get a separate, tiny and powerful machine to make phone calls, keep your calendar and address book, entertain you, play your music, give directions, take pictures, check your e-mail, and do countless other things. But how many pockets do you have? Handheld devices become as clunky as a room-sized supercomputer when you have to carry four of them around with you every day.

A smartphone is one device that can take care of all of your handheld computing and communication needs in a single, small package. It's not so much a distinct class of products as it is a different set of standards for cell phones to live up to.

Unlike many traditional cell phones, smartphones allow individual users to install, configure and run applications of their choosing. A smartphone offers the ability to conform the device to your particular way of doing things. Most standard cell-phone software offers only limited choices for re-configuration, forcing you to adapt to the way it's set up. On a standard phone, whether or not you like the built-in calendar application, you are stuck with it except for a few minor tweaks. If that phone were a smartphone, you could install any compatible calendar application you like.

Here's a list of some of the things smartphones can do:
• Send and receive mobile phone calls
• Personal Information Management (PIM) including notes, calendar and to-do list
• Communication with laptop or desktop computers
• Data synchronization with applications like Microsoft Outlook
• E-mail
• Instant messaging
• Applications such as word processing programs or video games
• Play audio and video files in some standard formats

C.4 Analog, Digital and HDTV

For years, watching TV has involved analog signals and cathode ray tube (CRT) sets. The signal is made of continually varying radio waves that the TV translates into a picture and sound. An analog signal can reach a person's TV over the air, through a cable or via satellite. Digital signals, like the ones from DVD players, are converted to analog when played on traditional TVs.

This system has worked pretty well for a long time, but it has some limitations:
• Conventional CRT sets display around 480 visible lines of pixels. Broadcasters have been sending signals that work well with this resolution for years, and they can't fit enough resolution to fill a huge television into the analog signal.
• Analog pictures are interlaced - a CRT's electron gun paints only half the lines for each pass down the screen. On some TVs, interlacing makes the picture flicker.
• Converting video to analog format lowers its quality.

United States broadcasting is currently changing to digital television (DTV). A digital signal transmits the information for video and sound as ones and zeros instead of as a wave. For over-the-air broadcasting, DTV will generally use the UHF portion of the radio spectrum with a 6 MHz bandwidth, just like analog TV signals do.

DTV has several advantages:
• The picture, even when displayed on a small TV, is better quality.
• A digital signal can support a higher resolution, so the picture will still look good when shown on a larger TV screen.
• The video can be progressive rather than interlaced - the screen shows the entire picture for every frame instead of every other line of pixels.
• TV stations can broadcast several signals using the same bandwidth. This is called multicasting.
• If broadcasters choose to, they can include interactive content or additional information with the DTV signal.
• It can support high-definition (HDTV) broadcasts.

DTV also has one really big disadvantage: Analog TVs can't decode and display digital signals. When analog broadcasting ends, you'll only be able to watch TV on your trusty old set if you have cable or satellite service transmitting analog signals or if you have a set-top digital converter.

C.5 SoC

The semiconductor industry has continued to make impressive improvements in the achievable density of very large-scale integrated (VLSI) circuits. In order to keep pace with the levels of integration available, design engineers have developed new methodologies and techniques to manage the increased complexity inherent in these large chips. One such emerging methodology is system-on-chip (SoC) design, wherein predesigned and pre-verified blocks often called intellectual property (IP) blocks, IP cores, or virtual components are obtained from internal sources, or third parties, and combined on a single chip.

These reusable IP cores may include embedded processors, memory blocks, interface blocks, analog blocks, and components that handle application specific processing functions. Corresponding software components are also provided in a reusable form and may include real-time operating systems and kernels, library functions, and device drivers.

Large productivity gains can be achieved using this SoC/IP approach. In fact, rather than implementing each of these components separately, the role of the SoC designer is to integrate them onto a chip to implement complex functions in a relatively short amount of time.

The integration process involves connecting the IP blocks to the communication network, implementing design-for-test (DFT) techniques and using methodologies to verify and validate the overall system-level design. Even larger productivity gains are possible if the system is architected as a platform in such a way that derivative designs can be generated quickly.

In the past, the concept of SoC simply implied higher and higher levels of integration. That is, it was viewed as migrating a multichip system-on-board (SoB) to a single chip containing digital logic, memory, analog/mixed signal, and RF blocks. The primary drivers for this direction were the reduction of power, smaller form factor, and lower overall cost. It is important to recognize that integrating more and more functionality on a chip has always existed as a trend by virtue of Moore's Law, which predicts that the number of transistors on a chip will double every 18-24 months. The challenge is to increase designer productivity to keep pace with Moore's Law. Therefore, today's notion of SoC is defined in terms of overall productivity gains through reusable design and integration of components.

Video Watermarking Scheme in the Multi-Level, Multi-Dimensional DCT Domain

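Abstract: This paper presents a novel video watermarking method based on multilevel and multidimensional DCT, by using the property of energy convergence of the discrete cosine transformation (DCT). Firstly, the video frames are partitioned into some classes. Then, according to the …

… 3D-DCT watermarking schemes, but their common problem is heavy computation and slow processing. This paper adopts a multi-level, multi-dimensional DCT approach that makes full use of the energy-compaction property of the DCT: a 2D-DCT is first applied to every frame of a video scene, and the corresponding coefficients in the upper-left corner of each frame's DCT coefficient matrix are then assembled into a coefficient cube …

… K coefficient cube, and a 3D-DCT is applied to the coefficient cube; this is the second-level DCT. If several DCTs are performed in the first stage, the level numbers of the second stage increase accordingly. As described above, the first and second DCT levels apply 2D and 3D DCTs respectively; the different dimensionalities target the different characteristics within frames and between frames, which is what this paper means by "multi-dimensional". This flexible choice of dimensionality makes better use of the DCT's energy …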

Semantic Segmentation of Unmanned Driving Scene Based on Spatial Channel Dual Attention

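Journal of Harbin University of Science and Technology, Vol. 28, No. 5, Oct. 2023

Semantic Segmentation of Unmanned Driving Scene Based on Spatial Channel Dual Attention

WANG Xiaoyu, LIN Peng
(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)

Abstract: An important problem in unmanned driving is how to run real-time, high-precision semantic segmentation models on low-power mobile devices. Existing semantic segmentation algorithms have too many parameters and use too much memory to meet the needs of real-world applications such as unmanned driving. Among the many factors that affect the accuracy and inference speed of a segmentation model, spatial information and contextual features are particularly important, and it is difficult to serve both at once. To address this, an incomplete ResNet18 is used as the backbone network; ResNet18 is a lightweight model with few parameters and a small memory footprint. A bilateral semantic segmentation model is adopted, and a channel-spatial dual attention mechanism is added to both paths to obtain more contextual and spatial information. In addition, an attention refinement module that refines the context information and a fusion module that merges the outputs of the two paths are used; the added modules have little effect on parameter count and memory and are plug-and-play. Cityscapes and CamVid are used as datasets. On Cityscapes, mIoU reaches 77.3%; on CamVid, mIoU reaches 66.5%. With an input image resolution of 1024×2048, the inference time is 37.9 ms.

Keywords: unmanned driving; real-time semantic segmentation; deep learning; attention mechanism; depthwise separable convolution

DOI: 10.15938/j.jhust.2023.05.013
CLC number: TP391.41; Document code: A; Article ID: 1007-2683(2023)05-0103-07

Received: 2022-04-04
Funding: National Natural Science Foundation of China (61772160); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12541177).
About the authors: LIN Peng (born 1997), male, master's student. Corresponding author: WANG Xiaoyu (born 1971), female, professor, master's supervisor, E-mail: wangxiaoyu@.

0 Introduction

As artificial intelligence merges with automotive transport, a wave of interest in autonomous driving has risen, and detecting road conditions, road signs and similar information accurately and quickly has become a current research focus [1]. Many researchers have turned their attention to understanding road scenes, and one of the main areas is semantic segmentation of road scenes [2]. Deep-learning-based image semantic segmentation is a basic task in computer vision: it estimates a class label for every pixel of the input image and presents the result as region masks of different colors.

In 2014, the fully convolutional network (FCN) proposed in [2], regarded as a foundational work for deep convolutional networks, marked the start of a new period for the segmentation field. Its main difference from all earlier segmentation algorithms is that FCN replaces every fully connected layer of the classification model with a convolutional layer and learns a pixel-to-pixel mapping. It also proposed combining the results of different pooling layers during upsampling to refine the final output [2]. Many of today's best deep-learning segmentation algorithms build on the FCN idea [3]. In 2015, the University of Cambridge built on FCN and proposed the SegNet model [3]. Since then more segmentation algorithms have been developed and accuracy has kept improving, for example the DeepLab series [4], the multi-path cascaded model RefineNet [4] and PSPNet [5].

In recent years, deep learning has made great progress in image semantic segmentation and has great application potential in areas such as autonomous driving. However, most models focus on raising segmentation accuracy; their computational cost and memory use are high, and real-time performance is not guaranteed [6]. Many practical applications also place high demands on real-time performance. In response, the now widely used ENet and MobileNet series were proposed [7], and real-time semantic segmentation has gradually become a field of its own. To raise inference speed, some real-time models shrink the input image and others prune feature-map channels, but both operations lose spatial information [7]. This is because the input image passes through many convolution and pooling stages, so the feature-map resolution decreases as the image moves through the model. For segmentation, rich contextual and spatial information, high-resolution features and the semantic information of deep features all help improve accuracy [8].

Among recent real-time algorithms, the bilateral segmentation network (BiSeNet) has achieved notable results [9]. This paper builds on BiSeNet, using the lightweight ResNet18 as the backbone of the context path, and introduces two spatial-channel dual attention modules, CBAMT and CSSE. The CBAMT module is added to the lightweight feature extraction network of the context path to decide, along both the spatial and channel dimensions, which features should be learned [10]. The attention refinement module (ARM) then strengthens feature learning at different stages of the lightweight feature extractor [11]. The CSSE module is added to the spatial path to capture more spatial features, and depthwise separable convolutions reduce the parameter count. Finally, the feature fusion module (FFM) fuses the outputs of the two paths.

1 Proposed algorithm

The structure of BiSeNet is shown in Fig. 1. The bilateral segmentation network has two branches: a spatial branch and a context branch. The spatial branch compensates for the loss of spatial information; the context branch addresses the small receptive field and captures rich contextual information [12]. In the spatial branch, the input image passes through three convolutional layers with large kernels and is compressed to a feature map 1/8 the size of the original, which preserves rich spatial information. The kernels of these layers use small strides, and the layers finally produce high-resolution features [13]. In the context branch, global average pooling is added to obtain the largest receptive field, and an attention mechanism is added to guide feature learning.

Fig. 1 Structure of the original model

1.1 Dual attention unit based on space and channel

Reference [3] proposes CBAM, a lightweight spatial-channel dual attention mechanism that attends along both the channel and spatial dimensions [14]. CBAM consists of two separate submodules, the channel attention module (CAM) and the spatial attention module (SAM); the former attends to channels and the latter to space. The advantage is that the parameter count stays well controlled and the module can be added to existing model structures. In short, CBAM is a plug-and-play module.

1.1.1 CAM

The input feature map G (H×W×C) undergoes global max pooling and global average pooling over width and height, giving two 1×1×C feature maps. These are fed to a shared two-layer neural network (MLP) [15]. The first layer has C/r neurons (r is the reduction ratio) with ReLU activation; the second layer has C neurons. The MLP outputs are summed and passed through a sigmoid to produce the final channel attention map M_c. Finally, M_c is multiplied with the input feature map G; the result is the input feature map G′ required by the spatial attention module.

1.1.2 SAM

SAM takes G′ as its input feature map. It first applies channel-wise global max pooling and global average pooling. The two resulting H×W×1 feature maps are concatenated along the channel dimension. A 7×7 convolution reduces the result to a single channel, H×W×1. A sigmoid then generates the feature map G″. Finally, G″ is multiplied with G′ to produce the final feature map.

1.2 Improved spatial branch

For better segmentation, accuracy can be improved by combining low-level spatial features with rich deep semantic information [15]. The spatial path proposed here consists of three convolutions. The first layer is a convolution with stride 2; the other two are depthwise separable convolutions with stride 1 [15]. Each is followed by batch normalization (BN) and a rectified linear unit (ReLU) activation. A channel-spatial module (CSSE) is also added to the spatial path. The algorithm is as follows. The H×W×C feature map is reduced by global average pooling to a 1×1×C feature map, which is processed by two 1×1×1 convolutions to give a C-dimensional vector. A sigmoid normalizes it into the corresponding mask, which is multiplied channel-wise to obtain the recalibrated feature map M′. The sSE module is similar to SAM: a 1×1×1 convolution is applied directly to M′ (H×W×C), turning it into an H×W×1 feature map, which is then activated by a sigmoid to give the spatial feature map. This map is applied directly to the original feature map to recalibrate the spatial information. The CSSE module connects the cSE and sSE modules in series, and experiments show that it improves segmentation. The CSSE structure is shown in Fig. 2.

Fig. 2 CSSE structure diagram

1.3 Improved context branch

In the original model, BiSeNet designs the context path to obtain a larger receptive field and more semantic information [15], with Xception as the feature extraction backbone [16]. Xception shrinks the feature map quickly to obtain a large receptive field and encode high-level semantic context [16]. The improved context path proposed here uses the lightweight ResNet18 as the feature extraction backbone and adds a CBAMT module to the path. The backbone consists of four blocks, each made of two 3×3 convolutions with BN layers and ReLU.

The CBAMT module is based on the triplet attention method proposed in [6], which uses a three-branch structure to capture cross-dimension interaction when computing attention weights, realizing interaction between channel and space [16]. The improved CBAMT module adopts the triplet attention idea. The three-branch structure has three parallel branches: two are mainly responsible for the interaction between dimension C and dimension H or W [17]; the last is similar to SAM and builds spatial awareness [17]. The outputs of all branches are aggregated by averaging. CBAMT passes the CAM output feature map F′ through two parallel branches that contain a Z-pool layer and perform dimension interaction between C and H or W, and sums the two outputs to obtain the feature map F″. F″ is then used as the input of SAM to obtain the final features. The Z-pool layer reduces the tensor along dimensions H and W to two dimensions and concatenates the average-pooled and max-pooled features of that dimension, so that the layer reduces depth while keeping a rich representation of the actual tensor, which benefits later computation [18]. Finally, the improved context path keeps the global average pooling structure, which provides global context and improves segmentation. The CBAMT structure is shown in Fig. 3 and the overall improved network in Fig. 4. Z-pool is computed as:

$$M_c(F) = \sigma\left(f^{7\times 7}\left([\mathrm{AvgPool}(F),\ \mathrm{MaxPool}(F)]\right)\right) \quad (1)$$

where F is the input feature map; σ is the sigmoid activation function; AvgPool and MaxPool denote global average pooling and global max pooling; and f^{7×7} denotes a convolution with kernel size 7.

Fig. 3 Structure of the spatial-channel attention module CBAMT

Fig. 4 Overall structure of the improved model

1.4 Feature fusion module (FFM)

The feature fusion module merges the features from the spatial branch with those from the context branch [18]. FFM is needed because the former are low-level features and the latter high-level features [18]. The procedure is as follows: the features from the two branches are concatenated to give a feature map H; global average pooling of H gives a 1×1×C vector; finally, H is re-weighted by channel-wise multiplication, as in SENet, to give the final feature map H′. Fig. 5 shows the structure of the module.

Fig. 5 FFM structure diagram

1.5 Attention refinement module (ARM)

The original model also designs an ARM for the context path, shown in Fig. 6. It uses global average pooling to capture global context, which guides feature learning and strengthens it at different stages of the feature extraction network. It also integrates global context simply, without upsampling, so its computational cost is negligible.

Fig. 6 ARM block diagram

1.6 Loss functions

Two auxiliary loss functions are added to the context path to supervise the output better. Both the principal and the auxiliary loss functions use the softmax loss of Eq. (2) [19]. The auxiliary losses supervise the training of the model, and the principal loss supervises the output of the whole BiSeNet (l_p). The two auxiliary losses supervise the output of the context path (l_i), and the parameter α balances the weights of the principal and auxiliary losses, as in Eq. (3):

$$\mathrm{Loss} = \frac{1}{n}\sum_i l_i = \frac{1}{n}\sum_i -\log\left(\frac{e^{p_i}}{\sum_j e^{p_j}}\right) \quad (2)$$

$$L(X \mid W) = l_p(X : W) + \alpha \sum_{i}^{K} l_i(X_i : W) \quad (3)$$

where l_p is the principal loss; l_i is an auxiliary loss; X_i is the output feature of stage i of ResNet; K = 3 and α = 1. The auxiliary losses are used only during training.

2 Experimental results and analysis

2.1 Datasets

Two datasets are used, both of urban road scenes: Cityscapes and CamVid. They are the datasets most commonly used to evaluate road-scene segmentation models [19]. CamVid has 11 classes. Cityscapes has two parts: 5000 finely annotated images with high-quality pixel-level labels and 20000 additional images with coarse labels; the experiments here use the 5000 finely annotated images. The model is compared with the baseline in terms of speed (inference time) and accuracy to analyze its segmentation performance, which is also illustrated with visual results.

2.2 Parameter settings

The experiments run on Windows 10 with an Nvidia RTX 1080Ti 6GB, Python 3.9 and PyTorch 1.9. The settings are batch size = 8, momentum = 0.9 and weight decay = 5×10^-4. The model is trained with stochastic gradient descent (SGD) and the "poly" learning-rate policy with power = 0.9:

$$\eta = \eta^{*}\left(1 - \frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}} \quad (4)$$

where the initial learning rate is 2.5×10^-2; iter is the current iteration and max_iter is the total number of iterations [19], set to 1000 (that is, 1000 epochs). The power is fixed at 0.9, and the balance parameter between the principal and auxiliary losses is set to 1.

2.3 Ablation study

Under identical conditions, the contribution of the CBAMT and CSSE modules to model performance was also tested; the results are given in Table 1. Both modules improve segmentation accuracy, and CBAMT improves it more than CSSE.

Table 1 Validation of each module on the CamVid dataset
CBAM  CSSE  CBAMT  FFM  ARM  mIoU/%
✓ ✓ ✓ ✓  66.5
✓ ✓ ✓  66.1
✓ ✓ ✓ ✓  65.9
✓ ✓ ✓  65.7
Note: ✓ indicates that the module is enabled.

2.4 Overall performance analysis and comparison

The baseline used here is our own implementation of the ResNet18 version of BiSeNet.

2.4.1 Segmentation accuracy

Model performance is measured by the mean intersection over union (mIoU):

$$\mathrm{mIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}} \quad (5)$$

Table 2 compares the segmentation results of the proposed algorithm with those of other algorithms. Compared with the original BiSeNet, the accuracy of the proposed model is higher by 1.6% on Cityscapes and by 1.1% on CamVid.

Table 2 Segmentation accuracy comparison (mIoU/%)
Model              Cityscapes  CamVid
SegNet             58.9        57.0
ENet               65.7        52.9
DFANet             71.3        61.5
MobileNet1         77.8        64.7
BiSeNet (Res-18)   75.7        65.4
Proposed           77.3        66.5

2.4.2 Inference speed

In the speed test, the inference time of the baseline model is 35.5 ms on Cityscapes and 21.5 ms on CamVid; the results are given in Table 3.

Table 3 Inference speed comparison
Model              Cityscapes/ms  CamVid/ms
SegNet             24.6           15.7
MobileNet1         32.3           10.5
BiSeNet (Res-18)   35.5           21.5
Proposed           37.9           24.5

The inference time of the proposed model is 37.9 ms on Cityscapes and 24.5 ms on CamVid, which shows that the network meets the requirements of real-time semantic segmentation.

Overall, considering both speed and accuracy, the proposed model achieves a better balance between inference speed and segmentation accuracy than BiSeNet (Res18) on the Cityscapes and CamVid datasets. Compared with ENet, accuracy is markedly higher; compared with the commonly used MobileNet1, inference time is close and accuracy somewhat higher. MobileNet1, however, uses grouped convolution, does not take spatial information into account, still has many layers, and places higher demands on hardware such as the GPU. In addition, because of grouped convolution, the segmentation was occasionally very poor in repeated experiments; according to the literature, this may be because grouped convolution can cause the model to learn degenerate features. This will be studied further.

2.4.3 Visualization results

Fig. 7 shows the segmentation results of the proposed model on CamVid and a comparison with the baseline model. The first three columns are the original image, the label image and the segmentation result of the model. They show that the improved model segments well. The segmentation quality differs between objects. Larger objects, such as trees, are segmented well and their class is generally identified correctly. Very small objects are more problematic; for example, some fine objects are not recognized. The model also shares the weakness of most current real-time segmentation models that unlabeled objects are segmented very inconsistently. Comparing the actual segmentation results of the proposed model with those of the baseline (the last column) shows that the improved model segments better than the base model.

Fig. 7 Visualization results

3 Conclusion

This paper analyzes the accuracy and real-time performance of semantic segmentation algorithms in depth and proposes a road-scene segmentation model with spatial-channel dual attention that maintains segmentation accuracy while keeping the model real-time. The CBAMT module on the context path captures more of the important contextual features, and the CSSE module on the spatial path captures richer spatial information. Experiments show that the proposed model balances accuracy and speed better than the original BiSeNet. The attention mechanisms and the lightweight model constructed here may serve as a reference for other researchers. Because the algorithm was tested in depth only on road-scene datasets, it is not tailored to other categories; future work will design the model around specific segmentation targets to further improve its practical performance, and will study and test real targets.

References:
[1] JIA Gengyun, ZHAO Haiying, LIU Feiduo, et al. Graph-Based Image Segmentation Algorithm Based on Superpixels[J]. Journal of Beijing University of Posts and Telecommunications, 2018, 41(3): 46.
[2] HUANG Furong. Semantic Segmentation Algorithm CBR-ENet for Real-time Road Scenes[J]. Journal of China Academy of Electronic Sciences, 2021, 16(3): 27.
[3] CANAYAZ M. C+EffxNet: A Novel Hybrid Approach for COVID-19 Diagnosis on CT Images Based on CBAM and EfficientNet[J]. Chaos, Solitons & Fractals, 2021, 151: 111310.
[4] ZU Hongliang. Research on Image Segmentation Algorithms Based on Fuzzy Clustering[D]. Harbin: Harbin University of Science and Technology, 2020.
[5] LÜ Peiqing. Research on Automatic Segmentation of Liver CT Images Based on Improved U-Net[D]. Harbin: Harbin University of Science and Technology, 2022.
[6] TANG X, TU W, LI K, et al. DFFNet: an IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation[J]. Information Sciences, 2021, 565: 326.
[7] ZHANG R X, ZHANG L M. Panoramic Visual Perception and Identification of Architectural Cityscape Elements in a Virtual-reality Environment[J]. Future Generation Computer Systems, 2021, 118: 107.
[8] A Method to Identify How Librarians Adopt a Technology Innovation, CBAM (Concern Based Adoption Model)[J]. Journal of the Korean Society for Library and Information Science, 2016, 50(3).
[9] ZHANG Liguo, CHENG Yao, JIN Mei, et al. Semantic Segmentation Method of Indoor Scene Based on Improved BiSeNet[J]. Acta Metrology, 2021, 42(4): 515.
[10] GAO Xiang, LI Chungeng, AN Jubai. Real-time Semantic Segmentation of Images Based on Attention and Multi-label Classification[J]. Journal of Computer-Aided Design and Graphics, 2021, 33(1): 59.
[11] YIN J, GUO L, JIANG W, et al. Shuffle Net-inspired Lightweight Neural Network Design for Automatic Modulation Classification Methods in Ubiquitous IoT Cyber-physical Systems[J]. Computer Communications, 2021, 176: 249.
[12] RÜNZ M, AGAPITO L. Co-fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects[C]// 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 4471.
[13] CHEN Y C, LAI K T, LIU D, et al. Tagnet: Triplet-attention Graph Networks for Hashtag Recommendation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(3): 1148.
[14] REN Tianci, HUANG Xiangsheng, DING Weili, et al. Semantic Segmentation Algorithm for Global Bilateral Networks[J]. Computer Science, 2020, 47(S1): 161.
[15] LI J, LIN Y, LIU R, et al. RSCA: Real-time Segmentation-based Context-aware Scene Text Detection[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 2349.
[16] SAFAE El Houfi, AICHA Majda. Efficient Use of Recent Progresses for Real-time Semantic Segmentation[J]. Machine Vision and Applications, 2020, 31(6): 45.
[17] MARTIN F. Grace, PING Juliann. Driverless Technologies and Their Effects on Insurers and the State: An Initial Assessment[J]. Risk Management and Insurance Review, 2018, 21(3): 1.
[18] WEI W, ZHOU B, POŁAP D, et al. A Regional Adaptive Variational PDE Model for Computed Tomography Image Reconstruction[J]. Pattern Recognition, 2019, 92: 64.
[19] FAN Borui, WU Wei. Sufficient Context for Real-Time Semantic Segmentation[J]. Journal of Physics: Conference Series, 2021, 1754(1): 012230.

(Editor: WEN Zeyu)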

Face recognition combining 2DDCT and compressed sensing

路翀;刘晓东;刘万泉
【Journal】《电子设计工程》
【Year (volume), issue】2011(019)021
【Abstract】The compressed sensing (CS) method must convert the image matrix into a vector before extracting features, which leads to high dimensionality and high computational complexity. To address this, the paper proposes an improved face recognition method that combines the two-dimensional discrete cosine transform (2DDCT) with CS. The original face image is first processed by 2DDCT to reduce the feature dimension effectively. The image is then processed by CS to obtain the face recognition features. Finally, a nearest neighbor (NN) classifier performs the recognition. Experimental results on the ORL, Yale and Feret face databases show that the method is robust and effective for face recognition, especially on the YaleB face database.

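A 2-D DCT algorithm based on systolic arrays and its VLSI design

孙阳;余锋
【Journal】《微电子技术》
【Year (volume), issue】2003(31)5
【Abstract】This paper presents a circuit design for the two-dimensional discrete cosine transform (2-D DCT) based on a systolic array algorithm. The circuit needs no complex transposition memory and uses a parallel-in, parallel-out structure. One N×N DCT takes only N cycles, so the throughput is N times that of a conventional DCT. The structure is modular, simple to route and small in chip area, which makes it well suited to VLSI implementation.
【Pages】7 (P21-26, 36)
【Authors】孙阳;余锋
【Affiliation】Institute of Digital Technology and Instrumentation, Department of Instrumentation, Zhejiang University, Hangzhou 310027 (both authors)
【Language】Chinese
【CLC classification】TN492
【Related literature】
1. 2-D DCT algorithm and its optimized VLSI design [J], 魏本杰;刘明业;章晓莉
2. Design and implementation of the HEVC 8×8 integer DCT based on a systolic array [J], 潘苏文;叶宇煌;郑明魁;陈志峰;杨秀芝
3. VLSI architecture design of the 2-D discrete 9/7 wavelet transform based on the lifting algorithm [J], 熊琦;方建超;王政道;文康益
4. VLSI architecture for matrix inversion by the LU algorithm based on a systolic array [J], 孙泉;赵明;张秀君
5. 2-D DCT algorithm and its streamlined VLSI design [J], 陈伟;卢贵主;郑灵翔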

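A novel area-optimized 2-D IDCT processor

于宝东;邹雪城
【Journal】《微处理机》
【Year (volume), issue】2005(026)005
【Abstract】This paper proposes an 8×8 two-dimensional inverse discrete cosine transform (IDCT) processor based on the row-column decomposition algorithm. The conventional input registers that hold the input column vectors and the parallel-to-serial conversion registers are no longer needed, which reduces both chip area and processing latency. The one-dimensional transform is implemented with look-up tables, and the ROM used for the look-up tables is much smaller than the ROM of conventional distributed arithmetic. The proposed 2-D IDCT processor is area optimized, has low latency and high throughput, and has a regular, fully pipelined structure, so it is well suited to VLSI and FPGA implementation.
【Pages】3 (P86-88)
【Authors】于宝东;邹雪城
【Affiliation】Department of Electronic Science and Technology, Huazhong University of Science and Technology, Wuhan 430074 (both authors)
【Language】Chinese
【CLC classification】TN4
【Related literature】
1. Design of a 2-D DCT/IDCT processor based on a highly parallel structure [J], 刘锋;代国定;庄奕琪
2. Development of a real-time high-speed 2-D DCT/IDCT processor based on an FPGA and 2-bit serial distributed arithmetic [J], 向晖;滕建辅
3. Design of a highly parallel multiplierless 2-D IDCT coprocessor [J], 穆荣;朱贺新;焦继业
4. Implementation of an area-optimized 2-D DCT/IDCT with dynamic precision matching [J], 刘峰;周荣政;陈学峰;洪志良
5. FPGA-based real-time high-speed 2-D DCT/IDCT processor [J], 卢?;陈旭昀;闵昊;章倩苓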

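A DCT-domain 2-D watermark embedding and blind extraction algorithm based on modular arithmetic

梁新生;江健
【Journal】《计算机与现代化》
【Year (volume), issue】2004(000)010
【Abstract】Digital watermarking is an effective means of protecting the copyright of digital products. This paper studies covert embedding and extraction algorithms for digital images in the discrete cosine transform domain and implements a two-dimensional watermark embedding and blind extraction algorithm that works on the mid-frequency DCT coefficients. The algorithm is based on modular arithmetic. Experimental results show that it is robust against attacks such as image compression and noise.
【Pages】3 (P114-116)
【Authors】梁新生;江健
【Affiliation】中讯邮电设计院, Zhengzhou, Henan 450007; Zhejiang University, Hangzhou, Zhejiang 310027
【Language】Chinese
【CLC classification】TP301.6
【Related literature】
1. Study of a DCT-domain watermark embedding and extraction algorithm [J], 王若蕙;傅圣雪
2. Blind extraction algorithm for information hiding in the image DCT domain [J], 胡敏;平西建;丁益洪
3. Blind extraction algorithm for DCT-domain information hiding based on human visual characteristics [J], 刘文杰
4. Blind extraction algorithm for DCT-domain information hiding based on human visual characteristics [J], 贾玉珍;王玥;郭红云
5. A video watermark embedding and blind extraction algorithm based on the 2-D discrete wavelet transform [J], 白林雪;宗良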

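DCT-domain trellis coded quantization and its application to image quantization

周正华;郑勇;朱维乐
【Journal】《信号处理》
【Year (volume), issue】2001(017)001
【Abstract】This paper proposes a new method of trellis coded quantization (TCQ) in the discrete cosine transform (DCT) domain. The method exploits not only the temporal correlation between signals but also the correlation in the transform domain. It first applies a one-dimensional or two-dimensional DCT to reduce the correlation in the transform domain, then uses convolutional coding and signal-space expansion to increase the Euclidean distance between quantized signals, and uses the Viterbi algorithm to find the optimal quantization sequence. Simulation results show that, at relatively high gray-level rates, TCQ based on the 2-D DCT is about 2 dB better than conventional TCQ, which in turn is about 2 dB better than optimal scalar quantization. The method also has moderate encoding complexity, simple decoding and low sensitivity to error propagation.
【Pages】4 (P27-30)
【Authors】周正华;郑勇;朱维乐
【Affiliation】Department of Electronic Technology, University of Electronic Science and Technology of China (all authors)
【Language】Chinese
【CLC classification】TN91
【Related literature】
1. Image coding algorithm based on improved DCT-domain classified vector quantization [J], 王展青;魏毅峰
2. 2-D trellis coded vector quantization and its application to still image quantization [J], 郑勇;周正华;朱维乐
3. Wavelet-domain trellis coded quantization of SAR images combined with speckle suppression [J], 谢海慧;纪中伟;黄顺吉
4. DCT-domain trellis coded vector quantization [J], 周正华;郑勇;朱维乐
5. DCT-domain image steganography based on diamond encoding and a modified quantization table [J], 金涛;何加铭;杨任尔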

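A classified adaptive hybrid DCT/DPCM image coding method

王春宁;段勇
【Journal】《西安电子科技大学学报》
【Year (volume), issue】1997(024)001
【Abstract】In image compression coding, how to bring classical Shannon coding theory into closer agreement with the characteristics of human vision has received increasing attention. Based on visual characteristics, and giving more weight to coding performance in the edge regions of an image, this paper proposes a classified adaptive DCT/DPCM image coding method. Experimental results show that, compared with DCT or DPCM coding alone, the method gives a higher compression ratio and good subjective quality of the restored image, and is easy to combine with existing international coding standards such as JPEG, H.261 and MPEG.
【Pages】7 (P108-114)
【Authors】王春宁;段勇
【Affiliation】Department of Measurement and Instrumentation, Xidian University (both authors)
【Language】Chinese
【CLC classification】TN919.8
【Related literature】
1. A neural-network image coding method based on DCT edge-pattern classification [J], 汪庆宝;董来生
2. Adaptive hybrid DPCM/DCT coding method for high-quality image compression [J], 赵德斌;陈耀强;高文
3. A fast DCT-based fractal image coding method [J], 严宁;李启炎
4. A DCT-based multiple description image coding method [J], 郁梅;贺赛龙;范良忠;蒋刚毅
5. An image coding method based on DT segmentation and adaptive DCT [J], 王养利;吴成柯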

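High-speed semantic segmentation with a dual-path feature-fusion encoder-decoder structure

胡学刚;龚宇;敬力源
【Journal】《计算机辅助设计与图形学学报》
【Year (volume), issue】2022(34)12
【Abstract】High-accuracy image semantic segmentation models based on deep learning have many parameters and segment slowly. To address this, a semantic segmentation model with a dual-path feature-fusion encoder-decoder structure is proposed. First, the encoder encodes a semantic path and a spatial path at the same time, so that it can fuse different kinds of feature information, overcome the difficulty of retaining both spatial and semantic information, and apply efficient convolutions to the feature maps. Second, the decoder fuses high-level semantic information with low-level spatial information, which compensates for the feature information lost by down-sampling during encoding. Experiments on the Cityscapes and Camvid datasets show that the whole model has only 3.91×10^6 parameters, achieves a mean intersection over union of 67.7% and 65.8% on the two datasets, and segments at 111 frames/s and 86 frames/s respectively. Compared with similar models, the proposed model has fewer parameters and higher accuracy, and its speed far exceeds the minimum of 24 frames/s required for real-time semantic segmentation.
【Pages】9 (P1911-1919)
【Authors】胡学刚;龚宇;敬力源
【Affiliation】School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications
【Language】Chinese
【CLC classification】TP391.41
【Related literature】
1. Sea-land segmentation of port imagery with an encoder-decoder network incorporating semantic flow fields
2. Dual-path semantic segmentation based on spatial feature extraction and attention mechanisms
3. Multi-feature fusion fundus image segmentation based on an encoder-decoder structure
4. Real-time street-scene semantic segmentation algorithm based on real-scene data augmentation and a dual-path fusion network
5. MAAUNet: exploring U-shaped encoder-decoder structures for medical image semantic segmentation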

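A contour detection model based on multi-level structural responses of the information flow

李健;范影乐
【Journal】《传感技术学报》
【Year (volume), issue】2024(37)2
【Abstract】Considering how the visual information flow is processed in the multi-level structure of the visual pathway, a new model for image contour detection is proposed. First, based on the property that simple cells in layer 4B of the primary visual cortex (V1) have a triple receptive-field structure and are sensitive to orientation, the orientation information of the image is perceived, and the edge contour response is extracted by complex cells. Second, based on the inhibitory properties of cells in layer 2/3 of V1, a sparsity measure and a dynamic synaptic coding mechanism of neurons are introduced to suppress the edge contour response, giving a texture-suppressed response. Finally, the fusion and correction mechanism of the higher visual cortex is used to combine the complementary strengths of the edge contour response and the texture-suppressed response, giving the final contour detection result. Experiments on the RuG40 and BSDS500 image datasets show that the proposed algorithm distinguishes contour from texture information effectively and highlights the contour of the main subject. The model offers a useful reference for later image analysis based on biological visual mechanisms.
【Pages】9 (P288-296)
【Authors】李健;范影乐
【Affiliation】Laboratory of Pattern Recognition and Image Processing, Hangzhou Dianzi University
【Language】Chinese
【CLC classification】TP391
【Related literature】
1. Study of a multi-level security policy model based on information flow
2. Petri-net-based concurrent information flow model for multi-level management and deadlock analysis
3. Multi-level dynamic trust measurement model based on information flow
4. Contour detection method based on a hierarchical response model of the primary visual pathway structure
5. A multi-level secure channel model based on information flow control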


PIPELINED ARCHITECTURE OF 2D-DCT, QUANTIZATION AND ZIGZAG PROCESS FOR JPEG IMAGE COMPRESSION USING VHDL

T. Pradeepthi and Addanki Purna Ramesh
Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (dt), Andhra Pradesh, India
purnarameshaddanki@, pradeepthi.t@

ABSTRACT

This paper presents the architecture and VHDL design of a two-dimensional discrete cosine transform (2D-DCT) with quantization and zigzag arrangement. The architecture serves as the core data path of JPEG image compression hardware. The 2D-DCT is computed using its separability property, so that the whole architecture is divided into two 1D-DCT calculations joined by a transpose buffer. The architecture for the quantization and zigzag process is also described. Quantization is performed with a division operation. The design is intended for implementation in a Spartan-3E XC3S500E FPGA. It uses 1891 slices, 51 I/O pins and 8 multipliers of one Xilinx Spartan-3E XC3S500E FPGA and reaches an operating frequency of 101.35 MHz. One input block of 8 x 8 elements of 8 bits each is processed in 6604 ns, and the pipeline latency is 140 clock cycles.

KEYWORDS

JPEG, discrete cosine transform (DCT), quantization, zigzag, FPGA

1. INTRODUCTION

JPEG, which stands for Joint Photographic Expert Group, is one of the most popular lossy compression methods. According to Magli [3], JPEG images are widely used on Internet web pages, because a JPEG picture can be accessed faster than an uncompressed image.

JPEG compression can be divided into five main steps [6], as shown in Fig. 1: color space conversion, down-sampling, 2-D DCT, quantization and entropy coding. The first two operations are used only for color images; for a gray-scale image only the last three steps are used. This paper concentrates on the hardware architecture of the 2D-DCT, quantization and zigzag arrangement.
To achieve high throughput, this paper uses a pipelined architecture rather than the single-clock architecture designed by Basri et al. [4].

DOI : 10.5121/vlsic.2011.2308

Fig. 1 JPEG Compression Steps for Colour Images

1.1. Color Space Converter

The JPEG process starts with color space conversion. This step does not apply to a gray-scale image, which has only one luminance component. Color image data in computers is usually represented in RGB (Red-Green-Blue) format. Each color component is stored in 8 bits, so a full-color pixel requires 24 bits. Because human eyes are more sensitive to changes in intensity than to changes in color, the JPEG algorithm converts the RGB format to another color space called YCbCr. Y is the luminance component, and Cb and Cr are the chrominance components. After the conversion, the encoder stores the luminance Y in more detail than the two chrominance components. The Y component represents the brightness of a pixel, and the Cb and Cr components represent the chrominance (split into blue and red components). This is the same color space as used by digital color television and digital video, including video DVDs, and it is similar to the way color is represented in analog PAL video and MAC, but not in analog NTSC, which uses the YIQ color space. The YCbCr conversion allows greater compression without a significant effect on perceptual image quality (or greater perceptual image quality for the same compression). Compression is more efficient because the brightness information, which matters most to the perceived quality of the image, is confined to a single channel that more closely represents the human visual system.

The RGB image is converted to YCbCr using the following equations:

Y = 0.299 R + 0.587 G + 0.114 B
Cb = 0.564 B - 0.564 Y
Cr = 0.713 R - 0.713 Y

1.2. Down Sampling

Because of the densities of color-sensitive and brightness-sensitive receptors in the human eye, humans can see considerably more fine detail in the brightness of an image (the Y component) than in its color (the Cb and Cr components). Encoders can use this to compress images more efficiently. The transformation into the YCbCr color model enables the next step, which is to reduce the spatial resolution of the Cb and Cr components (called "down sampling" or "chroma sub sampling"). The ratios at which down sampling can be done in JPEG are 4:4:4 (no down sampling), 4:2:2 (reduction by a factor of 2 in the horizontal direction) and, most commonly, 4:2:0 (reduction by a factor of 2 in the horizontal and vertical directions). For the rest of the compression process, Y, Cb and Cr are processed separately and in a very similar manner. Down sampling the chroma components saves 33% or 50% of the space taken by the image without drastically affecting perceptual image quality.

1.3. Block Splitting

After sub sampling, each channel must be split into 8×8 blocks of pixels. If the data for a channel does not represent an integer number of blocks, the encoder must fill the remaining area of the incomplete blocks with some form of dummy data. Filling the edge pixels with a fixed color (typically black) creates dark artifacts along the visible part of the border. Repeating the edge pixels is a common but non-optimal technique that avoids the visible border, but it still creates artifacts in the colorimetry of the filled cells.
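The front-end steps described in Sections 1.1 to 1.3 can be sketched in software as follows. This is only an illustrative model, not part of the hardware design: the function names are hypothetical, the 4:2:0 reduction is assumed to be a plain 2x2 average (the text does not say how the chroma samples are reduced), and padding uses the edge-repetition technique mentioned above.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel with the equations of Section 1.1."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * b - 0.564 * y
    cr = 0.713 * r - 0.713 * y
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 cells.

    Assumption: averaging is used, and width and height are even.
    """
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


def pad_to_blocks(plane, n=8):
    """Repeat edge pixels until both dimensions are multiples of n."""
    extra_rows = (-len(plane)) % n
    extra_cols = (-len(plane[0])) % n
    # Extend every row to the right with copies of its last pixel.
    rows = [list(row) + [row[-1]] * extra_cols for row in plane]
    # Extend downwards with copies of the last (already widened) row.
    rows += [list(rows[-1]) for _ in range(extra_rows)]
    return rows
```

A 10 x 10 plane passed to `pad_to_blocks` comes back as 16 x 16, that is, four 8 x 8 blocks, with the right and bottom margins filled from the nearest edge pixel.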
A better strategy is to fill the pixels with colors that preserve the DCT coefficients of the visible pixels, at least the low-frequency ones (for example, filling with the average color of the visible part preserves the first DC coefficient, but best fitting the next two AC coefficients gives much better results, with less visible 8×8 cell edges along the border).

1.4. Discrete Cosine Transform (DCT)

The discrete cosine transform (DCT) is a technique for converting a signal into elementary frequency components. It is widely used in image compression. Before compression, the image data in memory is divided into several blocks called MCUs (minimum code units). Each block consists of 8x8 pixels. The compression operations, including the 2D-DCT, are performed on each block [3]. Because of its advantages in image compression, the two-dimensional DCT is a research subject that has attracted many researchers [1], [2], [4], [6], and many DCT algorithms have been developed.

1.4.1. 2-D Discrete Cosine Transform (DCT)

There are several ways to compute the 2-D DCT. It can be computed directly, by multiplying the input vector by the raw DCT coefficients without any fast algorithm [1]. This method is fast but needs a large amount of logic, especially multipliers, and an FPGA chip usually has only a few of them; the Spartan-3E XC3S500E has only 20. The method is fully pipelined in this paper. This paper adopts the work of [2]. The discrete cosine transform is an orthogonal transform consisting of a set of basis vectors that are sampled cosine functions. The 2-D DCT of a data matrix X is defined as

$$Y = C^{t} X C \qquad (1)$$

where C is the matrix of cosine coefficients, C^t is its transpose, and

$$c_{k,l} = \sqrt{\frac{2}{N}}\,\cos\left[\frac{(2k-1)(l-1)\pi}{2N}\right] \qquad (2)$$

for k = 1, 2, …, N, l = 2, 3, …, N, and c_{k,l} = N^{-1/2} for l = 1.

The 2-D DCT (8 x 8 DCT) is implemented by the row-column decomposition technique. We first compute the 1-D DCT (8 x 1 DCT) of each column of the input data matrix X to yield X^t C.
After appropriate rounding or truncation, the transpose of the resulting matrix, C^t X, is stored in a transpose buffer. We then compute another 1-D DCT (8 x 1 DCT) of each row of C^t X to yield the desired 2-D DCT as defined in equation (1). A block diagram of the design is shown in Fig. 2.

Fig. 2 2D-DCT Architecture

1.5. Quantization

The 8x8 block of DCT coefficients is now ready for compression by quantization. A remarkable and highly useful feature of the JPEG process is that, in this step, varying levels of image compression and quality can be obtained by selecting specific quantization matrices. The user can choose a quality level from 1 to 100, where 1 gives the poorest image quality and highest compression, and 100 gives the best quality and lowest compression. The quality/compression ratio can therefore be tailored to different needs.

Subjective experiments involving the human visual system have resulted in the JPEG standard quantization matrix. At a quality level of 50, this matrix gives both high compression and excellent decompressed image quality. The quantization matrix is taken from [5].

Fig. 3 Quantization Matrix

1.6. Zigzag Reordering Buffer

Each block of data output by the quantization module needs to be reordered in a zigzag. This reordering is achieved using an 8 x 8 array of register pairs organized in a fashion similar to the transpose buffer. The quantized output is sent sequentially, byte by byte, in the zigzag pattern. The zigzag operation is done for every 8x8 block. The pattern is shown in Fig. 4 [5]. The numbers listed in the figure are the addresses of the 64 data values arranged in the zigzag pattern.

Fig. 4 Zigzag Pattern

2. FPGA IMPLEMENTATION

2.1. System Architecture

The entire system architecture to be implemented in the FPGA is shown in Fig. 5. Input data is inserted into the system sequentially, 8 bits at a time. Many DCT designs instead insert the input to the DCT in parallel, for example as 8 x 8 bits [1], [4], [6].
This is ideal for DCT computing because it takes only one clock cycle to insert data into the 1D-DCT unit. With sequential input, it takes 8 clock cycles to insert a set of data (8 points) into the DCT unit. The sequential architecture is chosen to save I/O ports on the FPGA chip. Some 2D-DCT intellectual property designs from Xilinx also use an 8-bit input [10]. The 8-bit input architecture also fits many camera modules.

The 2D-DCT architecture, combined with zigzag and quantization, used in this paper is shown in Fig. 5. The construction of the 2D-DCT module is modified from [2], which also feeds the data sequentially into the module. The 2D-DCT architecture is thus divided into two 1D-DCT modules and one transpose buffer. The same 1D-DCT module is used twice. The transpose buffer operates as a temporal barrier between the first and the second 1D-DCT. It is made from static RAM with two sets of data and address buses, one for the read process and the other for the write process.

2.2. 1D-DCT Pipeline Process

Since the DCT input and output have 8 points and data has to be entered and released sequentially, each input and output process takes 8 clock cycles. In total, an 8-point 1D-DCT computation needs 22 clock cycles. The design for data input and output in this paper is inspired by the design in [2]. The input and output process is visualized in Fig. 6. In this paper, the system computes every step in one clock cycle, so the DCT computation can be done faster.

Fig. 6 Data input/output process visualization of clock cycles for the 8-point 1D-DCT

2.3. Transpose Buffer

The transpose buffer is a static RAM designed with two sets of data and address buses: it has input and output data buses and input and output address buses. Its structure is shown in Fig. 7. The input to the transpose buffer comes from the output of the first 1D-DCT. The input address, output address and WE (write enable) are generated by the controller module.
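The arrangement of two 1-D DCT passes separated by a transpose step can be checked with a small software model. The sketch below is not the authors' VHDL: it uses the textbook orthonormal 8-point DCT-II in floating point, ignores the rounding or truncation performed by the hardware, and its function names are hypothetical. It reuses one 1-D routine for both passes, as the architecture reuses the same 1D-DCT module.

```python
import math

N = 8


def dct_1d(v):
    """Orthonormal 8-point DCT-II of one row or column."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        )
        out.append(scale * acc)
    return out


def transpose(m):
    """Software stand-in for the transpose buffer."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT of an 8x8 block by row-column decomposition."""
    first_pass = [dct_1d(row) for row in block]
    # Writing rows and reading columns is what the transpose buffer does.
    second_pass = [dct_1d(row) for row in transpose(first_pass)]
    # Transposing back is a software convenience to restore orientation.
    return transpose(second_pass)
```

For a block whose 64 samples all equal 1, `dct_2d` returns 8 at position (0, 0) and values within floating-point error of zero elsewhere, which is a quick sanity check that only the DC term survives for a flat block.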
The input address is generated in normal sequence (0, 1, 2, 3, 4, 5, 6, …, 63), but the output address is generated in transposed sequence (0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, …, 55, 63). The output process begins after all 64 inputs have been entered, which introduces a latency between the first input and the first output. The output of the transpose buffer is fed directly to the input of the second 1D-DCT unit, and the chain of sequences continues in the same order.

Fig. 7 Transpose Buffer Block Diagram

2.4. Quantizer

The quantization process in JPEG image compression divides every 2D-DCT coefficient by the corresponding quantizing value from the quantization table shown in Fig. 3. The quantizer module consists of a ROM and a divider. The quantizing values are first stored in the ROM. The divider carries out the division in a pipelined manner. The first DCT coefficient coming out of the 2D-DCT module is divided by the first value from the quantization table (already stored in the ROM), the second DCT coefficient is divided by the second value from the table, and so on until all 64 coefficients have been divided by the values in the quantization table. The quantization process also uses a pipelined architecture. A block diagram of the implementation is shown in Fig. 8.

2.5. Zigzag Buffer

The zigzag buffer is made from static RAM, and its construction is like that of the transpose buffer. It has two sets of data and address buses. The input address bus is accessed in normal sequence, but the output address follows the zigzag sequence described in Fig. 4. The zigzag address is generated by a zigzag RAM in which the sequence is stored: when the RAM address bus is accessed in the normal address sequence, the RAM data bus emits the zigzag value. Fig. 8 shows the construction of the zigzag buffer and RAM in the system.

Fig. 8 Quantization & Zigzag Architecture

3. SIMULATION RESULTS

The 2-D DCT, quantization and zigzag architecture was described in VHDL and synthesized for a Xilinx Spartan-3E family FPGA [7]. The system was tested with a gray-scale image.
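A software reference for the buffer addressing and the quantizer of Sections 2.3 to 2.5 can be useful when checking simulation output. The sketch below rests on several assumptions that are not confirmed by the text: the table `Q50` is the widely published JPEG luminance table for quality 50, taken here to be what Fig. 3 shows; the zigzag order is the conventional JPEG scan, taken to be what Fig. 4 shows; and the division result is rounded to the nearest integer, whereas a hardware divider may truncate. All names are hypothetical.

```python
# Assumed contents of Fig. 3: standard JPEG luminance table, row-major.
Q50 = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def transpose_read_addresses(n=8):
    """Read order of the transpose buffer: 0, 8, 16, ..., 56, 1, 9, ..."""
    return [(i % n) * n + i // n for i in range(n * n)]


def zigzag_addresses(n=8):
    """Read order of the zigzag buffer (conventional JPEG scan)."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run downwards (row rising),
    # even diagonals run upwards (column rising).
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [r * n + c for r, c in cells]


def quantize_and_scan(coeffs):
    """Divide 64 row-major coefficients by Q50, then emit in zigzag order."""
    quantized = [int(round(c / q)) for c, q in zip(coeffs, Q50)]
    return [quantized[a] for a in zigzag_addresses()]
```

In hardware the two address sequences would sit in a small ROM or RAM, as the zigzag RAM of Section 2.5 does, rather than being computed on the fly; generating them in software simply gives a table to compare against.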
The values from the VHDL simulation were compared with MATLAB values. The complete synthesis results for the Spartan-3E FPGA are presented in Table 1; the hardware fits in an XC3S500E device. Table 2 compares the present work with [1].

Table 1. Device utilization on the Xilinx Spartan-3E for the total architecture proposed in this paper.

Logic Units                      Used   Available   Utilization
Number of slices                 1891   4656        40%
Number of slice FFs              2450   9312        26%
Number of 4-input LUTs           1671   9312        17%
Number of bonded IOBs            51     231         21%
Number of multipliers (18x18)    8      20          40%

Table 2. The 2D-DCT result of this paper compared with the 2D-DCT design of [1].

Logic Units                      This paper (2D-DCT only)   Presented in [1]
Number of slices                 1235                       7260
Number of slice FFs              1551                       9644
Number of 4-input LUTs           1239                       11194
Number of bonded IOBs            23                         101
Number of multipliers (18x18)    8                          -

Table 3. The 2D-DCT result of this paper (with quantization and zigzag process) compared with the result of [8].

Logic Units               This paper (2D-DCT, quantization, zigzag)   Presented in [8]
Number of 4-input LUTs    1671                                        5276
Number of slices          1891                                        3070
Clock freq. (MHz)         101.35                                      31.1

According to the synthesis result, the maximum time delay produced is 9.004 ns. That constraint yields a minimum clock period of 9.866 ns, so the maximum usable clock frequency is 101.355 MHz. The maximum delay synthesized is much smaller than the delay produced in [4]: the 1D-DCT designed in [4] yields a maximum time delay of 76.03 ns. System [4] uses fully parallel processing without a clock to compute the 8-point 1D-DCT. It is used as a comparison reference because it uses the same FPGA as this system. Since the system in [1] uses a Virtex FPGA, which supports a higher frequency than the Spartan, its delay is much smaller than that of this system and its maximum frequency is higher: the maximum frequency in [1] is 308.182 MHz. This system is also faster than the 2D-DCT described in [6], which has a minimum period of 82.1 ns.

The use of a pipelined process gives the system latency.
Compared with [8], the numbers of slices and LUTs are reduced and the clock frequency is increased to 101 MHz. The output appears several clock cycles after the first input. The latency of the 2D-DCT is 94 clock cycles, and the quantizer output appears at 118 clock cycles. The overall system (quantized and zigzag-ordered 2D-DCT) has a latency of 140 clock cycles. For comparison, the 2D-DCT designed in [6] has a latency of 160 clock cycles. A better result is reached by the system in [1], which takes 37 clock cycles of system latency to compute the 2D-DCT.

Fig. 9 Output for the 2D-DCT only

Fig. 10 Output after quantization

Fig. 11 Output after zigzag

Fig. 12 Device utilization summary

4. CONCLUSION

The JPEG image compression data path was designed in VHDL and tested with a gray-scale image. The accuracy of the computation was compared with the result of a MATLAB computation of the same operation. The comparison yields a mean square error of MSE = 0.060552, computed for 64 bits of data. The pipelined process causes latency in the system; the latency produced by this system is 140 clock cycles. The maximum frequency that can be achieved by this system is 101.35 MHz. The sequential pipelined design gives a higher frequency than the fully parallel design in [4]. The design takes 1891 slices, 2450 slice FFs, 1671 LUTs and 8 multipliers, so the area is also reduced compared with previous work. It is suitable for implementation on an FPGA such as the Xilinx XC3S500E. The sequential operation of the DCT saves considerably more FPGA logic than [1]. Each step of the DCT algorithm is executed in one clock cycle, and every step consists of 8 to 9 operations. The method is therefore fast and has very low complexity.

5. REFERENCES

[1] Trang T.T. Do, Binh P. Nguyen, "A High-Accuracy and High-Speed 2-D 8x8 Discrete Cosine Transform Design", Proceedings of ICGCRCICT 2010, vol. 1, 2010, pp. 135-138.
[2] Sun, M., Ting C., and Albert M., "VLSI Implementation of a 16 X 16 Discrete Cosine Transform", IEEE Transactions on Circuits and Systems, Vol. 36, No. 4, April 1989.
[3] E. Magli, "The JPEG Family of Coding Standard", part of "Document and Image Compression", New York: Taylor and Francis, 2004.
[4] I. Basri, B. Sutopo, "Implementation 1D-DCT Algoritma Feig-Winograd di FPGA Spartan-3E (Indonesian)", Proceedings of CITEE 2009, vol. 1, 2009, pp. 198-203.
[5] Wallace, G. K., "The JPEG Still Picture Compression Standard", Communications of the ACM, Vol. 34, Issue 4, pp. 30-44, 1991.
[6] L. Agostini, S. Bampi, "Pipelined Fast 2-D DCT Architecture for JPEG Image Compression", Proceedings of the 14th Annual Symposium on Integrated Circuits and Systems Design, Pirenopolis, Brazil, IEEE Computer Society, 2001, pp. 226-231.
[7] Xilinx, Inc., "Spartan-3E FPGA Family: Data Sheet", Xilinx Corporation, 2009.
[8] Vijay Kumar Sharma, Umesh C. Pati, and K. K. Mahapatra, "A Simple VLSI Architecture for Computation of 2-D DCT, Quantization and Zig-zag ordering for JPEG".
[9] Omnivision, Inc., "OV9620/9120 Camera Chip Data Sheet", Xilinx Corporation, 2002.
[10] Xilinx, Inc., "2D Discrete Cosine Transform (DCT) V2.0", Logicore Product Specification, Xilinx Corporation, 2002.
[11] A. Shams, A. Chidanandan, W. Pan, and M. Bayoumi, "NEDA: A low power high throughput DCT architecture", IEEE Transactions on Signal Processing, vol. 54(3), Mar. 2006.
[12] Peng Chungan, Cao Xixin, Yu Dunshan, Zhang Xing, "A 250MHz optimized distributed architecture of 2D 8x8 DCT", 7th International Conference on ASIC, pp. 189-192, Oct. 2007.
[13] B.G. Lee, "A new algorithm to compute the discrete cosine transform", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[14] H.S. Hou, "A fast recursive algorithms for computing the discrete cosine transform", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[15] N.I. Cho and S.U. Lee, "DCT algorithms for VLSI parallel implementation", IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 121-127, Jan. 1990.
[16] Nam Ik Cho, Sang Uk Lee, "Fast Algorithm and Implementation of 2-D Discrete Cosine Transform", IEEE Transactions on Circuits and Systems, Vol. 38, No. 3, March 1991.
[17] N. Ahmed, T. Natarajan, and K.R. Rao, "Discrete Cosine Transform", IEEE Trans. Commun., vol. COM-23, pp. 90-93, Jan. 1974.
[18] S. Ramachandran, S. Srinivasan and R. Chen, "EPLD-based Architecture of Real Time 2D-Discrete Cosine Transform and Quantization for Image Compression", IEEE International Symposium on Circuits and Systems (ISCAS '99), Orlando, Florida, May-June 1999.
[19] Hassan EL-Banna, Alaa A. EL-Fattah, "An Efficient Implementation of the 1D DCT using FPGA Technology", Proceedings of the 11th IEEE International Conference and Workshop on the Engineering of Computer-Based Systems (ECBS'04), 2004.
[20] Y. Arai, T. Agui, and M. Nakajima, "A fast DCT-SQ scheme for images", Trans IEICE, Vol. E71, No. 11, pp. 1095-1097, 1998.
[21] Long-Wen Chang, Ching-Yang Wang and Shiuh-Ming Lee, "Designing JPEG Quantization Tables Based On Human Visual System", Proceedings of the 1999 International Conference on Image Processing, ICIP 99, pp. 376-380, vol. 2, Oct. 1999.
[22] Douglas J. Smith, "HDL Chip Design: A practical guide for designing, synthesizing, and simulating ASICs and FPGAs using VHDL or Verilog", Doone Publications.
