Eye Detection Based on Improved AD AdaBoost Algorithm
Eye detection based on combination of AdaBoost and mean shift
XIONG Jinshui1, SU Fei1, ZHANG Jian2
(1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Propaganda Department, Beijing University of Posts and Telecommunications, Beijing 100876)

Abstract: Eye detection is an important part of a face recognition system and plays a pivotal role in recognition accuracy. In this system, eye detection is divided into coarse detection and fine detection. Haar features, the AdaBoost algorithm, and an attentional cascade are used in coarse detection to generate a large set of candidates. Mean shift with a scale factor is then used in fine detection to find the position of maximum probability density, which is taken as the final eye position. Experiments show that the algorithm localizes eyes precisely for frontal faces in real time.

Key words: face recognition; eye detection; AdaBoost; mean shift
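The fine-detection stage described above finds the mode of a probability density with mean shift. A minimal sketch of that idea, assuming the coarse stage yields a 2-D score map of accumulated AdaBoost eye candidates (the map, the bandwidth, and all names here are illustrative, not the paper's implementation):

```python
import numpy as np

def mean_shift_mode(score_map, start, bandwidth=8, n_iters=50, tol=0.5):
    """Locate the mode of a 2-D score map by mean shift.

    score_map : non-negative weights, e.g. density of AdaBoost eye candidates.
    start     : (row, col) initial position from the coarse detection stage.
    bandwidth : window radius; stands in for the paper's scale factor.
    """
    h, w = score_map.shape
    y, x = float(start[0]), float(start[1])
    for _ in range(n_iters):
        y0, y1 = int(max(0, y - bandwidth)), int(min(h, y + bandwidth + 1))
        x0, x1 = int(max(0, x - bandwidth)), int(min(w, x + bandwidth + 1))
        win = score_map[y0:y1, x0:x1]
        total = win.sum()
        if total == 0:
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny = (win * ys).sum() / total      # weighted centroid = mean-shift update
        nx = (win * xs).sum() / total
        if abs(ny - y) < tol and abs(nx - x) < tol:
            y, x = ny, nx
            break
        y, x = ny, nx
    return int(round(y)), int(round(x))
```

Each iteration re-centers the window on the weighted centroid of the scores inside it, which converges to a local density maximum, the candidate taken as the eye position.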
Pupil localization in infrared images
Abstract: An improved pupil localization algorithm for infrared images is proposed. In an infrared dark-pupil image, threshold segmentation first yields candidate pupil regions; morphological operations and blob screening then isolate the pupil region, and edge detection, glint removal, and ellipse fitting refine it to localize the pupil precisely. Experimental results show that the algorithm copes well with interference from reflected glints, eyelashes, and shadows while maintaining a high accuracy rate.

Journal: Electronic Design Engineering, 2019, 27(1): 189-193
Key words: pupil localization; threshold segmentation; ellipse fitting; morphology

Pupil localization plays an important role in gaze tracking, iris recognition, and medical diagnosis.
In gaze tracking, the direction or point of gaze can be inferred from pupil movement, which in turn reveals aspects of a person's mental activity; in iris recognition, pupil localization is used to extract the iris region for subsequent feature extraction; in medicine, monitoring the pupil helps assess a person's state of consciousness.
In short, pupil localization is of considerable research value [1-2].
The practical pupil localization algorithms in current use are mainly the Hough transform, the gradient vector method, ellipse fitting, symmetry transforms, and integro-differential operators.
1) The Hough transform is a common pupil localization method, but its time and space costs are very high, so it cannot run in real time [3].
2) The gradient vector method is fast and suits low-resolution images under uncontrolled lighting, but it is easily disturbed by glints and image blur, so its robustness is limited.
3) Ellipse fitting is fast but sensitive to interference, with only moderate localization accuracy.
4) Symmetry transforms adapt to head-pose changes but are computationally complex and expensive, making them unsuitable for real-time gaze tracking.
5) Integro-differential operators are accurate but slow and demand high image quality [4].
This paper studies ellipse fitting in depth. To address its susceptibility to noise and poor robustness, an improved pupil localization algorithm suited to infrared images is proposed, in which morphological operations and glint removal improve resistance to interference.
1 Ellipse fitting
This method first obtains the pupil's edge points by edge detection or by emitting rays [5], then fits an ellipse to these points and takes the ellipse's center as the pupil center. Figure 1 illustrates the ray-emission method: a rough pupil center is located first, rays are then cast outward from that center, and each ray stops at the first point of large gray-level gradient, which is recorded as an edge point.
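The pipeline of Section 1 — rays cast from a rough center, stopped at the first strong gray-level jump, followed by an ellipse fit whose center gives the pupil — can be sketched as below. This is an illustrative NumPy reconstruction under stated assumptions (a fixed gradient threshold, an algebraic least-squares conic fit), not the paper's code:

```python
import numpy as np

def ray_edge_points(img, center, n_rays=32, grad_thresh=40):
    """Shoot rays from a rough pupil center; stop at the first strong
    gray-level jump along each ray (the pupil boundary in a dark-pupil image)."""
    h, w = img.shape
    cy, cx = center
    pts = []
    for ang in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        dy, dx = np.sin(ang), np.cos(ang)
        prev = img[cy, cx]
        for r in range(1, max(h, w)):
            y, x = int(round(cy + dy * r)), int(round(cx + dx * r))
            if not (0 <= y < h and 0 <= x < w):
                break
            if abs(int(img[y, x]) - int(prev)) > grad_thresh:
                pts.append((x, y))          # store as (col, row)
                break
            prev = img[y, x]
    return np.array(pts, dtype=float)

def ellipse_center(pts):
    """Algebraic least-squares fit of the conic a x^2 + b xy + c y^2 + d x + e y = 1;
    the ellipse center follows from setting the conic's gradient to zero."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x * x, x * y, y * y, x, y])
    a, b, c, d, e = np.linalg.lstsq(A, np.ones(len(pts)), rcond=None)[0]
    cx, cy = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return cx, cy
```

The center extraction works for any conic fitted this way; the paper's glint-removal step would discard edge points falling on specular highlights before the fit.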
A pupil localization algorithm for infrared ophthalmic disease screening
Supported by National Key R&D Program of China (No. 2017YFC0109901); Natural Science Foundation Project of Tianjin (No. 15JCQNJC14200)
By recognition principle, pupil localization algorithms can be divided into data-driven and knowledge-based methods [8]. The fundamental difference between the two is whether the criterion for judging the pupil center is built from prior knowledge. Data-driven methods do not rely on prior knowledge; instead they learn eye features from a sufficiently large sample set to localize the eye. Common methods include the support vector machine (SVM) [9-11], the convolutional neural network (CNN) [12-13], and AdaBoost (adaptive boosting) [14-17]. Among them, the Haar-feature-based AdaBoost eye detector is the most widely used: using an integral image and a cascade structure, it statistically learns the Haar features of a set of eye samples to localize the eye region [17], greatly improving both localization accuracy and speed. Data-driven methods are undemanding of image quality, but they require large training sets and a complex training process, and their localization accuracy is relatively low, so they are suitable only for coarse eye localization.
Abstract: For fast, accurate, automated pupil localization in ophthalmic disease screening, a pupil center localization algorithm based on an improved radial symmetry transform is proposed. First, gray-level integral projection combined with the maximum between-class variance (Otsu) method coarsely segments the eye image, and multi-blob screening criteria are then used to extract…
A face detection system based on an improved AdaBoost algorithm
Feng Xiaojian, Ma Mingdong, Wang Deyu (School of Communications and Information Engineering / School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Abstract: With the informatization and intelligentization of society, face detection technology plays an increasingly important role in business, culture, and other fields, and the performance demanded of face detection systems keeps rising. The open-source computer vision library OpenCV implements many image processing algorithms, including training a Haar classifier with the AdaBoost algorithm for high-accuracy face detection. The commonly used Haar-like-feature AdaBoost face detector still has shortcomings, such as high missed-detection and false-detection rates and low detection efficiency. To make up for these deficiencies, we add to the original Haar-like feature set new Haar-like features that match the distribution of facial organs, raising the detection rate and lowering the false-alarm rate. We also introduce skin-color detection to screen skin regions as candidate areas, removing most non-skin regions from the image and improving detection efficiency. Testing and performance comparison show that the improved face detection system achieves better accuracy and higher detection efficiency.

Journal: Computer Technology and Development, 2019, 29(3): 89-92
Key words: AdaBoost; Haar-like features; OpenCV; face detection; skin-color detection; image processing

0 Introduction
Face detection is a key component and technical foundation of face recognition.
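The skin-color prefiltering idea in the abstract — keep only skin-like regions as candidates before running the Haar cascade — can be sketched as follows. The Cr/Cb thresholds are the commonly cited Chai-Ngan ranges, an assumption rather than values from the paper, and the window scan is a simplified stand-in for the cascade's sliding window:

```python
import numpy as np

# Cr/Cb skin bounds from the widely used Chai-Ngan rule (assumed values).
CR_RANGE, CB_RANGE = (133, 173), (77, 127)

def skin_mask(rgb):
    """Boolean mask of likely skin pixels via a BT.601 RGB -> YCrCb conversion."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return ((cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]) &
            (cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]))

def candidate_windows(rgb, win=24, stride=12, min_skin=0.4):
    """Keep only windows whose skin-pixel ratio exceeds min_skin; the Haar
    cascade then runs on these instead of the whole image."""
    mask = skin_mask(rgb)
    h, w = mask.shape
    out = []
    for y0 in range(0, h - win + 1, stride):
        for x0 in range(0, w - win + 1, stride):
            if mask[y0:y0 + win, x0:x0 + win].mean() >= min_skin:
                out.append((x0, y0, win, win))
    return out
```

Discarding mostly non-skin windows up front is what yields the efficiency gain the abstract reports, since the cascade is evaluated on far fewer positions.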
Helmet-wearing detection based on an improved YOLOv5s algorithm
Journal of Shenyang Ligong University, Vol. 42 No. 5, Oct. 2023. Received 2022-09-21. Fund: Basic Scientific Research Project of the Liaoning Provincial Department of Education (LJKZ0241). Authors: Chen Yang (1997-), female, M.S. candidate; corresponding author Lyu Yanhui (1971-), female, professor, Ph.D., research interests in computer vision and artificial intelligence. Article ID: 1003-1251(2023)05-0011-07.

Algorithm of helmet wearing detection based on improved YOLOv5s
CHEN Yang, LYU Yanhui (School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China)

Abstract: Traffic regulations stipulate that electric-bicycle riders must wear safety helmets, but common detection algorithms miss small targets such as helmets. A helmet-wearing detection algorithm for electric-bicycle riders based on an improved YOLOv5s, HWD-YOLOv5s for short, is therefore proposed. Building on the YOLOv5s deep learning framework, it improves the downsampling and feature-fusion methods of the feature-extraction stage and modifies the computation of the GIoU bounding-box loss. A helmet dataset of 11,370 images was collected across multiple scenes, and HWD-YOLOv5s was compared with other mainstream algorithms on small-target detection. Compared with YOLOv5s, HWD-YOLOv5s improves precision, recall, and mean average precision by 0.4%, 1.1%, and 0.2% respectively, and its detection speed meets real-time requirements.
Key words: YOLOv5s; target detection; feature extraction; feature fusion
CLC: TP391.41. DOI: 10.3969/j.issn.1003-1251.2023.05.002

Target detection has wide application in military, transportation, medical, biological, and other fields, advancing society and easing daily life. In recent years electric-bicycle sales have kept growing and with them related traffic accidents; many riders lack safety awareness and ride without helmets. Many Chinese cities have issued regulations requiring riders to wear helmets to reduce casualties in accidents. This paper detects whether electric-bicycle riders wear helmets, with data collected in public places; datasets built in this setting contain many small targets, which are easily missed, and the application scenes change quickly, so the algorithm must run in real time. An object whose width and height are under one tenth of the image's width and height is called a small target; small-target detection is a recognized difficulty in target detection and draws growing attention. Reference [1] balanced data classes with an improved augmentation method, replaced the original backbone with the lightweight MobileNetV2, and applied channel pruning and knowledge distillation to raise detection speed, but accuracy needs improvement. Reference [2] proposed an adaptive bidirectional feature-fusion module to improve small-target detection, but its loss function ignores the dataset distribution and the imbalance of hard and easy samples. Reference [3] added channel attention to the feature-extraction network to better extract small-target features, but its attention uses only global average pooling to compress channel features and performs poorly on hard-to-distinguish objects. Reference [4] proposed an FPN-based optimization introducing a receptive-field module that mimics biological receptive fields so the network focuses on central features, achieving good recognition but at reduced speed, short of real time. In view of these shortcomings, this paper proposes HWD-YOLOv5s: a new downsampling method addresses the possible overfitting caused by the first slicing downsample; an improved fusion method addresses the original fusion's inability to weight feature-map contributions or focus on important features; and a suitable box loss replaces GIoU, whose convergence is slow.

1 The HWD-YOLOv5s algorithm
1.1 Shortcomings of YOLOv5s
YOLOv5s is chosen as the base network; input images are adaptively resized to 640 x 640 x 3 before feature extraction. First, a slicing downsample moves spatial information into the channel dimension, halving resolution while increasing channels. Placed as the first step of feature extraction, it loses almost no features but learns much useless information, which may cause overfitting [6]. Second, YOLOv5s fuses features with FPN and PANet [7]. This gives bidirectional feature transfer but cannot distinguish the contributions of feature maps at different resolutions, gives no special attention to important features [8], and cannot separate irrelevant noise from the information that deserves focus, so detection accuracy suffers. Third, YOLOv5s uses GIoU as its box-regression loss; when the predicted and ground-truth boxes intersect, GIoU cannot express how they intersect, i.e. their relative position [9], and when the boxes are separated and far apart, the large enclosing rectangle yields a large loss that is hard to optimize, slowing convergence [10].

1.2 HWD-YOLOv5s
1.2.1 The proposed downsampling method
HWD-YOLOv5s introduces a new downsampling method that removes the drawbacks of the first downsampling step of YOLOv5s. For convenience of later computation, the 640 x 640 x 3 input first undergoes a 1 x 1 convolution bringing the channel count to 4, giving a 640 x 640 x 4 feature map X0. X0 then passes through non-linear transformation modules, each consisting of batch normalization, ReLU activation, and convolution. Five modules (Block1-Block5) are used; Block5 additionally performs convolution and average pooling to realize the downsampling.

Figure 1. Structure of the non-linear transformation.

Each module outputs a feature matrix: Block1's transform is concatenated channel-wise with X0 to give X1; Block2's transform of X1 is concatenated with X0 and X1 to give X2; and so on, Block4's transform being concatenated with X0-X3 to give X4. In general,

X_l = Concat[H(X_{l-1}), X_{l-1}, ..., X_0]    (1)

where l = 1, ..., 4, H is the non-linear transformation, and Concat concatenates feature matrices along the channel dimension. One concatenation and one transformation at a time, the spatial size stays fixed while the channel count keeps growing. The number of feature maps (channels) a after a transformation is

a = K0 + K(n - 1)    (2)

where K0 is the number of input feature maps, n indexes the module, and K is the network growth rate (K = 4 here). Taking X0 as an example, Block1 applies batch normalization, ReLU, a 1 x 1 convolution, then batch normalization, ReLU, and a 3 x 3 convolution with padding 1; since batch normalization and ReLU leave the size unchanged, concatenating the result with X0 along the channel dimension gives X1 with 8 channels. By analogy, X2 has 12 channels, X3 has 16, and X4 has 20. Block5 differs slightly: it applies batch normalization, ReLU, a 1 x 1 convolution, and finally 2 x 2 average pooling with stride 2, producing a 320 x 320 x 32 output. The whole structure performs a twofold downsampling with closely coupled adjacent modules; matching feature sizes allow the channel-wise accumulation, and the 1 x 1 convolutions greatly reduce parameters, keeping the structure computationally efficient. The dense connections preserve shallow target features and improve gradient back-propagation, making the network easier to train, while the final average pooling lowers the feature dimension and the parameter count, guarding against overfitting.

1.2.2 The proposed feature-fusion method
Because YOLOv5s fuses features with FPN and PANet, it achieves bidirectional transfer but cannot distinguish the contributions of different-resolution feature maps [11]. The proposed method first optimizes the fusion network structure and then assigns weights to the feature maps before fusing, extending fusion from layers 5-7 to layers 3-7.

Figure 2. Structure of the HWD-YOLOv5s feature-fusion method.

Each node in Figure 2 is a feature map. The middle feature maps of layers 3 and 7 have only one input and no fusion, contributing little to the network, so they are deleted. Because low-level features may be distorted after repeated up- and down-sampling, skip connections are added between same-level nodes of layers 4-6 on top of the original bidirectional fusion, strengthening detail information in the high-level features and raising fusion efficiency. Figure 2 labels only some weights; in addition, P6td feeds P5td with weight W10, P5td feeds P4td with weight W11, and P4in, P5in, P6in feed P4out, P5out, P6out with weights W3, W5, W7. Considering that feature maps of different resolutions contribute differently, an adaptively weighted fusion is proposed: feature maps are up- or down-sampled to a common resolution, their weights computed, and the feature vectors fused as

O = sum_i [ W_i / (eps + sum_j W_j) ] * I_i    (3)

where O is the fused feature vector, i and j index feature vectors (positive integers), W_i and W_j are weights, eps = 0.0001 ensures numerical stability, and I_i are the input feature vectors. For example, the middle feature vector of layer 5 is computed as

P5td = Conv( (W6 * P5in + W10 * Resize(P6td)) / (W6 + W10 + eps) )    (4)

where Conv is a convolution and Resize an upsampling. This bidirectional adaptive weighting preserves feature information more fully, but still does not let YOLOv5s attend specially to the important content of a feature map. Attention modules [12] comprising channel and spatial attention are therefore inserted in the convolution layers of layers 3-7.

Figure 3. Attention mechanism introduced in the convolution layers.

In the channel sub-module, the input feature matrix undergoes global max pooling and global average pooling, taking the maximum and mean of each feature map to obtain max-pooled and average-pooled feature matrices [13], which are fed to a shared network to obtain the channel attention map F_t, i.e. the feature map carrying channel attention weights. To reduce the parameter count, it is normalized with a sigmoid and multiplied back onto the original input to give the final output:

F_t = sigma( MLP(AvgPool(F)) + MLP(MaxPool(F)) ) = sigma( S1(S0(F_avg_c)) + S1(S0(F_max_c)) )    (5)

where F is the input feature map, sigma the sigmoid function, MLP the multi-layer perceptron forming the shared network, AvgPool and MaxPool global average and max pooling, F_avg_c and F_max_c the channel-wise average- and max-pooled features, and S0, S1 the two shared-network layers. On entering the spatial sub-module, F_t again undergoes global max and average pooling to reduce the feature dimension to 1, then a 7 x 7 convolution with ReLU to lower the dimension, a second convolution to restore it, and a sigmoid; the result is merged with the channel attention map to give the spatial attention map F_k, the module's final output:

F_k = sigma( f7x7([AvgPool(F_t); MaxPool(F_t)]) ) = sigma( f7x7([(F_t)_avg_s; (F_t)_max_s]) )    (6)

where f7x7 is a 7 x 7 convolution and (F_t)_avg_s, (F_t)_max_s are the spatially average- and max-pooled features.

1.2.3 The proposed box-regression loss
YOLOv5s uses GIoU as its box-regression loss [14]:

GIoU loss = 1 - |Bp ∩ Bt| / |Bp ∪ Bt| + |C - (Bp ∪ Bt)| / |C|    (7)

where Bp is the predicted box, Bt the ground-truth box, and C the area of their smallest enclosing rectangle. The loss is 0 when the boxes coincide, 1 when their edges touch externally, and approaches 2 as the boxes move far apart. GIoU thus reflects overlap and distance, but when the boxes intersect it cannot express how they intersect, and when they are far apart the large enclosing rectangle makes the loss hard to optimize, slowing convergence. The loss is therefore modified to

L = 1 - |Bp ∩ Bt| / |Bp ∪ Bt| + rho^2(b, b_gt)/c^2 + rho^2(w, w_gt)/c'^2 + rho^2(h, h_gt)/c''^2    (8)

where L is the modified box-regression loss; rho^2(b, b_gt) is the squared Euclidean distance between the predicted-box center b and ground-truth center b_gt; c^2 is the squared diagonal of the smallest enclosing rectangle; c'^2 and c''^2 are the squared width and height of that rectangle; and rho^2(w, w_gt), rho^2(h, h_gt) are the squared Euclidean distances between the midpoints of the predicted and ground-truth widths and heights. The three penalty terms regress the Euclidean distances of center, width, and height directly: the center penalty removes the slow convergence caused by large enclosing boxes when the two boxes are far apart, and the width and height penalties express how the boxes intersect, raising convergence speed separately in the horizontal and vertical directions and improving regression accuracy.

2 Experimental results and analysis
2.1 Environment
Experiments ran on Windows 10 with Anaconda for environment management; PyTorch with GPU acceleration; PyCharm 2020.1.2 as the IDE; the CUDA toolkit for large-scale GPU parallelism; and training on a GPU server.
2.2 Dataset construction
Data were collected at intersections, markets, residential areas, and other scenes, with varied subjects, helmet colors, and electric-bicycle models and colors. All data were captured as videos of a few seconds to tens of seconds, about 400 in total. Frames were extracted at 10 per second with a free online converter, giving 26,954 images; after removing 5,674 images with no target or indistinct targets and 9,910 duplicates, 11,370 remained. Each image was annotated with the bicycle and rider as one target: riders wearing a helmet are labeled hat, those without no-hat. The dataset was randomly split 70% / 20% / 10% into training, validation, and test sets; the HWD-YOLOv5s weights were pre-trained and the iteration count set to 250. Targets of different sizes are grouped into large, medium, and small, with size-matched anchor boxes during training and size-specific outputs at prediction to improve accuracy across scales.
2.3 Experiments and comparisons
Recall, precision, and mean average precision at threshold 0.5 (mAP) are the evaluation metrics.
1) HWD-YOLOv5s was trained on the self-made helmet dataset and validated, with the results in Table 1.

Table 1. HWD-YOLOv5s experimental results
Class  | Validation samples | Precision | Recall | mAP
hat    | 1840 | 0.923 | 0.908 | 0.945
no-hat | 1688 | 0.942 | 0.908 | 0.947
all    | 2274 | 0.932 | 0.908 | 0.946

The all row averages the two classes over the metrics. The training curves (Figure 4) show precision, recall, and mAP all passing 90% after relatively few epochs and then growing steadily, with recall needing the most epochs to reach 90%.
2) Comparative experiments: the mainstream detectors YOLOv5s, SSD [14], and Faster R-CNN [15] were trained and validated on the same dataset; the averages over the two classes are compared with HWD-YOLOv5s in Table 2.

Table 2. Comparison of algorithms
Algorithm    | Precision | Recall | mAP   | Time/ms
Faster R-CNN | 0.816 | 0.855 | 0.827 | 167.55
SSD          | 0.855 | 0.902 | 0.925 | 21.77
YOLOv5s      | 0.928 | 0.897 | 0.944 | 14.64
HWD-YOLOv5s  | 0.932 | 0.908 | 0.946 | 14.89

HWD-YOLOv5s is best in precision and recall; compared with YOLOv5s it improves precision, recall, and mAP by 0.4%, 1.1%, and 0.2%. At 14.89 ms per image, it meets real-time detection requirements.

3 Conclusion
Common detectors may miss small targets such as safety helmets. Based on the distribution of helmets in the self-made dataset, this paper takes YOLOv5s as the base network, improves the downsampling and feature-fusion methods of feature extraction, and modifies the GIoU box-loss computation, yielding HWD-YOLOv5s. Experiments on the self-made dataset against YOLOv5s, SSD, and Faster R-CNN show that HWD-YOLOv5s generalizes well, handles dense helmet targets well, is more accurate than the other algorithms, and is fast enough for traffic applications. A limitation is that only the self-made dataset was used, whose sample diversity and balance are somewhat lacking.

References:
[1] Han K, Li S, Xiao Y. Helmet-wearing state detection on construction sites based on YOLOv3. Journal of Railway Science and Engineering, 2021, 18(1): 268-276.
[2] Xiao J, Zhang S, Chen Y, et al. Remote sensing image object detection with bidirectional feature fusion and feature selection. Acta Electronica Sinica, 2022, 50(2): 267-272.
[3] Wang Q, Jin H, Li W, et al. Few-shot image classification with a multi-scale channel attention mechanism. Journal of Hubei University of Technology, 2022, 37(1): 34-39, 70.
[4] Li L, Qiao L, Zhang H. Lung nodule detection combining FPN with improved R-FCN. Computer Applications and Software, 2022, 39(4): 179-184.
[5] Yu S, Li H, Gui F, et al. Real-time mask-wearing detection based on YOLOv5 in complex scenes. Computer Measurement & Control, 2021, 29(12): 188-194.
[6] Qian X, Li J, Tang Q, et al. Real-time detection of drug surface defects based on YOLOv5. Information Technology and Network Security, 2021, 40(12): 45-50.
[7] Li C. Small-target detection based on improved YOLOv5. Changjiang Information & Communications, 2021, 34(9): 30-33.
[8] Wang S, Gao L, Fu D, et al. Improved lightweight YOLOv5 insulator defect detection. Journal of Hubei Minzu University (Natural Science Edition), 2021, 39(4): 456-461.
[9] Ye X. Mask-wearing detection based on deep learning. China Computer & Communication, 2021, 33(18): 72-76.
[10] Ma L, Ma J, Han J, et al. Research on the YOLOv5s target detection algorithm. Computer Knowledge and Technology, 2021, 17(23): 100-103.
[11] Seol S, Ahn J, Lee H, et al. SSP based underwater CIR estimation with S-BiFPN. ICT Express, 2022, 8(1): 44-49.
[12] Fang Y, Huang H, Yang W J, et al. Nonlocal convolutional block attention module VNet for gliomas automatic segmentation. International Journal of Imaging Systems and Technology, 2021, 32(2): 528-543.
[13] Lan L, Liu Q, Lu S. Facial expression recognition based on attention mechanism and feature correlation. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(1): 147-155.
[14] Zhou Y, Li W, Hu R. Dual-channel SSD pedestrian head detection with multi-scale feature fusion. Laser & Optoelectronics Progress, 2021, 58(24): 383-394.
[15] Zhao Z, He S, Liang Y. Ship detection and recognition in remote sensing images based on Faster R-CNN. Bulletin of Surveying and Mapping, 2021(11): 59-64.
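The modified box-regression loss of Eq. (8) can be sketched as follows. This is an interpretation, not the paper's code: the width and height penalties are implemented as squared width/height differences normalized by the enclosing box (in the style of EIoU-type losses), which matches the stated intent, and the (x1, y1, x2, y2) box format and all names are illustrative:

```python
import numpy as np

def hwd_box_loss(p, t, eps=1e-9):
    """Sketch of the paper's modified box-regression loss (its Eq. (8)):
    1 - IoU plus squared distances of center, width, and height, each
    normalized by the enclosing box's diagonal, width, and height."""
    px1, py1, px2, py2 = p
    tx1, ty1, tx2, ty2 = t
    # IoU term
    ix = max(0.0, min(px2, tx2) - max(px1, tx1))
    iy = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = ix * iy
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)
    # smallest enclosing rectangle
    cx1, cy1 = min(px1, tx1), min(py1, ty1)
    cx2, cy2 = max(px2, tx2), max(py2, ty2)
    cw, ch = cx2 - cx1, cy2 - cy1
    diag2 = cw * cw + ch * ch
    # center, width, and height penalties
    pcx, pcy = (px1 + px2) / 2, (py1 + py2) / 2
    tcx, tcy = (tx1 + tx2) / 2, (ty1 + ty2) / 2
    center = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (diag2 + eps)
    width = ((px2 - px1) - (tx2 - tx1)) ** 2 / (cw * cw + eps)
    height = ((py2 - py1) - (ty2 - ty1)) ** 2 / (ch * ch + eps)
    return 1 - iou + center + width + height
```

Coinciding boxes give a loss near zero, while separated boxes are penalized through the center term even when the IoU term saturates at 1, which is the convergence behavior the paper argues for.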
Guidelines for an artificial-intelligence diabetic retinopathy screening system based on fundus photography
Chinese Journal of Experimental Ophthalmology, August 2019, Vol. 37, No. 8 · Standards and Specifications ·

Guidelines for an artificial-intelligence diabetic retinopathy screening system based on fundus photography
Intelligent Ophthalmology Group of the Intelligent Medicine Special Committee, China Medicine Education Association; project team of the National Key R&D Program "Development and Application of Ophthalmic Multimodal Imaging and Artificial Intelligence Diagnosis and Treatment Systems"
Corresponding author: Yuan Jin, Email: yuanjincornea@

[Abstract] Artificial intelligence (AI) aided diagnosis based on medical big data has matured in recent years, and AI-aided diagnosis systems based on color fundus photography have shown good sensitivity and specificity in screening for diabetic retinopathy (DR). To establish a unified standard for AI-assisted DR screening, promote the clinical application of AI diagnostic systems, and raise the level of AI-based DR diagnosis and treatment in China, the Intelligent Ophthalmology Group of the Intelligent Medicine Special Committee of the China Medicine Education Association drafted and adopted these guidelines, which set out specifications and recommendations for the hardware parameters, equipment configuration, data acquisition and standards, database construction, AI algorithm requirements, content and format of AI screening reports, and clinical AI screening follow-up plans of AI-assisted DR diagnosis platforms based on color fundus photography.
[Key words] Artificial intelligence; Diabetic retinopathy; Screening; Fundus photography
Fund program: National Key R&D Program of China (2017YFC0112400, 2017YFC0112405)
Guidelines register: International Practice Guideline Registry Platform, IPGRP-2019CN038
DOI: 10.3760/cma.j.issn.2095-0160.2019.08.001

1 Purpose and significance of developing and applying an AI diabetic retinopathy screening system
Diabetes is a common chronic disease affecting human health and quality of life. A report by the International Diabetes Federation (IDF) shows that 425 million adults worldwide have some type of diabetes.
Modern Electronics Technique, Nov. 2023, Vol. 46, No. 21

Irregular traffic object detection by improved DAB-DETR algorithm
LIN Feng, NING Qilin, ZHU Zhiqin (1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Abstract: Irregular traffic objects mainly refer to any objects that may obstruct a moving vehicle, such as potholes, falling rocks, tree branches, and other objects that affect normal driving. For detecting them on the road, an irregular traffic object detection algorithm based on improved DAB-DETR (dynamic anchor boxes are better queries for DETR) is proposed. Analysis of the original model shows that the absolute position encoding added before image features enter the encoder, meant to compensate for missing image location information, can only express the relative location of features implicitly; the improved DAB-DETR therefore adds image-oriented relative position encoding to the multi-head self-attention in the Transformer's encoder structure. Furthermore, in the original training strategy, when the detection and localization results are bipartitely matched with the category information and the loss is computed, the localization and classification losses are simply weighted and summed, which degrades performance; an AP loss function that integrates classification and localization losses in one unified parameterized formula is therefore added to the training strategy. Experimental results show that the improved DAB-DETR algorithm reaches a detection accuracy of 82.00%, 3.3% higher than the original model and 6.20% and 7.71% higher than the traditional models Faster R-CNN and YOLOv5, respectively.
Keywords: irregular traffic object; object detection; DAB-DETR algorithm; relative position encoding; AP loss function; ablation experiment
CLC: TN911.73-34; TP751. Article ID: 1004-373X(2023)21-0141-08. DOI: 10.16652/j.issn.1004-373x.2023.21.026
Citation: LIN Feng, NING Qilin, ZHU Zhiqin. Irregular traffic object detection by improved DAB-DETR algorithm. Modern Electronics Technique, 2023, 46(21): 141-148.
Received 2023-05-10; revised 2023-05-29. Fund: "Chengdu-Chongqing Twin-City Economic Circle Construction" science and technology innovation project of the Chongqing Municipal Education Commission (KJCXZD2020028).

0 Introduction
Transportation is the lifeline of the national economy; traffic safety bears directly on people's lives and property, social stability and lasting order, and high-quality economic development. Road accidents account for the great majority of traffic accidents: according to statistics, over the past five years China has averaged nearly 250,000 road traffic accidents per year, causing over 60,000 deaths and nearly 1.4 billion yuan in property losses annually, and the trend is still rising. This paper therefore aims to reduce road traffic accidents through research on detecting irregular traffic objects (any objects that may obstruct a vehicle while it is moving).
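The core modification above — adding relative position information to the encoder's multi-head self-attention — can be illustrated with a minimal single-head sketch, where a learned bias table indexed by the token offset i - j is added to the attention logits. This is a 1-D illustration of the general idea under assumed names, not the paper's 2-D image encoding:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_relative_bias(q, k, v, bias_table):
    """Single-head self-attention where a learned bias indexed by the
    relative offset i - j is added to the logits, so the attention map
    sees token offsets explicitly rather than only through absolute
    position encodings added to the inputs.

    bias_table : array of length 2n - 1, one entry per possible offset.
    """
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    rel = idx[:, None] - idx[None, :] + (n - 1)   # shift offsets into [0, 2n-2]
    logits = logits + bias_table[rel]             # relative position bias
    return softmax(logits) @ v
```

With a zero bias table this reduces to plain scaled dot-product attention; training the table lets nearby or distant positions be favored explicitly, which is what an absolute encoding can only express implicitly.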
Out-of-domain detection based on confidence measures from multiple topic classification
OUT-OF-DOMAIN DETECTION BASED ON CONFIDENCE MEASURES FROM MULTIPLE TOPIC CLASSIFICATION

Ian R. Lane 1,2, Tatsuya Kawahara 1,2, Tomoko Matsui 3,2, Satoshi Nakamura 2
1 School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
2 ATR Spoken Language Translation Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
3 The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan

ABSTRACT
One significant problem for spoken language systems is how to cope with users' OOD (out-of-domain) utterances, which cannot be handled by the back-end system. In this paper, we propose a novel OOD detection framework, which makes use of classification confidence scores of multiple topics and trains a linear discriminant in-domain verifier using GPD. Training is based on deleted interpolation of the in-domain data, and thus does not require actual OOD data, providing high portability. Three topic classification schemes of word N-gram models, LSA, and SVM are evaluated, and SVM is shown to have the greatest discriminative ability. In an OOD detection task, the proposed approach achieves an absolute reduction in EER of 6.5% compared to a baseline method based on a simple combination of multiple-topic classifications. Furthermore, comparison with a system trained using OOD data demonstrates that the proposed training scheme realizes comparable performance while requiring no knowledge of the OOD data set.

1. INTRODUCTION
Most spoken language systems, excluding general-purpose dictation systems, operate over definite domains as a user interface to a service provided by the back-end system. However, users, especially novice users, do not always have an exact concept of the domains served by the system. Thus, they often attempt utterances that cannot be handled by the system. These are referred to as OOD (out-of-domain) in this paper. Definitions of OOD for three typical spoken language systems are described in Table 1.

Table 1. Definitions of out-of-domain for various systems
System | Out-of-domain definition
Spoken dialogue | User's query does not relate to the back-end information source
Call routing | User's query does not relate to any call destination
Speech-to-speech translation | Translation system does not provide coverage for the offered topic

For an improved interface, spoken language systems should predict and detect such OOD utterances. In order to predict OOD utterances, the language model should allow some margin in its coverage. A mechanism is also required for the detection of OOD utterances, which is addressed in this paper. Performing OOD detection will improve the system interface by enabling users to determine whether to reattempt the current task after being confirmed as in-domain, or to halt attempts due to being OOD. For example, in a speech-to-speech translation system, an utterance may be in-domain but unable to be accurately translated by the back-end system; in this case the user is requested to re-phrase the input utterance, making translation possible. In the case of an OOD utterance, however, re-phrasing will not improve translation, so the user should be informed that the utterance is OOD and provided with a list of tractable domains.

Research on OOD detection is limited, and conventional studies have
typically focused on using recognition confidences for re-jecting erroneous recognition outputs(e.g.,[1],[2]).In these ap-proaches there is no discrimination between in-domain utterances that have been incorrectly recognized and OOD utterances,and thus effective user feedback cannot be generated.One area where OOD detection has been successfully applied is call routing tasks such as that described in[3].In this work,classification models are trained for each call destination,and a garbage model is ex-plicitly trained to detect OOD utterances.To train these models,a large amount of real-world data is required,consisting of both in-domain and OOD training examples.However,reliance on OOD training data is problematic:first,an operational on-line system is required to gather such data,and second,it is difficult to gain an appropriate distribution of data that will provide sufficient cover-age over all possible OOD utterances.In the proposed approach,the domain is assumed to consist of multiple sub-domain topics,such as call destinations in call-routing,sub-topics in translation systems,and sub-domains in com-plex dialogue systems.OOD detection is performed byfirst cal-culating classification confidence scores for all in-domain topic classes and then applying an in-domain verification model to this confidence vector,which results in an OOD decision.The verifi-cation model is trained using GPD(gradient probabilistic descent) and deleted interpolation,allowing the system to be developed by using only in-domain data.2.SYSTEM OVERVIEWIn the proposed framework,the training set is initially split into multiple topic classes.In the work described in this paper,topic classes are predefined and the training set is hand-labeled appropri-«by applying topic-dependent language models.We demonstratedthe effectiveness of such an approach in[4].An overview of the OOD detection framework is shown inFigure1.First,speech recognition is performed by applying ageneralized language model 
that covers all in-domain topics,andN-best recognition hypotheses(s1,...,s N)are generated.Next, topic classification confidence scores(C(t1|X),...,C(t M|X)) are generated for each topic class based on these hypotheses.Fi-nally,OOD detection is performed by applying an in-domain veri-fication model G in−domain(X)to the resulting confidence vector. The overall performance of the proposed approach is affected by the accuracy of the topic classification method and the in-domain verification model.These aspects are described in detail in the following sections.3.TOPIC CLASSIFICATIONIn this paper three topic classification schemes are evaluated:topic-dependent word N-gram,LSA(latent semantic analysis),and SVM (support vector machines).Based on a given feature set,topic models are trained using the above methods.Topic classification is performed and confidence scores(in the range[0,1])are calculated by applying a sigmoid transform to these results.When classifica-tion is applied to an N-best speech recognition result,confidence scores are calculated as shown in Equation1.Topic classification is applied independently to each N-best hypothesis,and these are linearly combined by weighting each with the posterior probability of that hypothesis given by ASR.C(t j|X)=NXi=1p(s i|X)C(t j|s i)(1)C(t j|X):confidence score of topic t j for input utterance Xp(s i|X):posterior probability of i-th best sentencehypothesis s i by ASRN:number of N-best hypotheses3.1.Topic Classification FeaturesVarious feature sets for topic classification are investigated.A feature vector consists of either word baseform(word token with no tense information;all variants are merged),full-word(surface form of words,including variants),or word+POS(part-of-speech) tokens.The inclusion of N-gram features that combine multiple neighboring tokens is also investigated.Appropriate cutoffs are applied during training to remove features with low occurrence.3.2.Topic-dependent Word N-gramIn this approach,N-gram 
language models are trained for each topic class.Classification is performed by calculating the log-likelihood of each topic model for the input sentence.Topic clas-sification confidence scores are calculated by applying a sigmoid transform to this log-likelihood measure.tent Semantic AnalysisLSA(latent semantic analysis)[5]is a popular technique for topic classification.Based on a vector space model,each sentence is represented as a point in a large dimension space,where vector components relate to the features described in Section3.1.Be-cause the vector space tends to be extremely large(10,000-70,000 features),traditional distance measures such as the cosine distance become unreliable.To improve performance,SVD(singular value decomposition)is applied to reduce the large space to100-300di-mensions.Each topic class is represented as a single document vector composed of all training sentences,and projected to this reduced space.Classification is performed by projecting the vector represen-tation of the input sentence to the reduced space and calculating the cosine distance between this vector and each topic class vec-tor.The resulting distance is normalized by applying a sigmoid transform generating classification confidence scores.3.4.Support Vector MachinesSVM(support vector machines)[6]is another popular classifica-tion ing a vector space model,SVM classifiers are trained for each in-domain topic class.Sentences that occur in the training set of that topic are used as positive examples and the remainder of the training set is used as negative examples.Classification is performed by feeding the vector representa-tion of the input sentence to each SVM classifier.The perpendicu-lar distance between this vector and each SVM hyperplane is used as the classification measure.This value is positive if the input sen-tence is in-class and negative otherwise.Again,confidence scores are generated by applying a sigmoid transform to this distance.4.IN-DOMAIN 
VERIFICATION

The final stage of OOD detection consists of applying an in-domain verification model G_in-domain(X) to the vector of confidence scores generated during topic classification. We adopt a linear discriminant model (Eqn. 2). Linear discriminant weights (λ_1, ..., λ_M) are applied to the confidence scores from topic classification (C(t_1|X), ..., C(t_M|X)), and a threshold (φ) is applied to obtain a binary decision of in-domain or OOD.

G_in-domain(X) = 1 if Σ_{j=1}^{M} λ_j C(t_j|X) ≥ φ (in-domain); 0 otherwise (OOD)    (2)

where C(t_j|X) is the confidence score of topic t_j for input utterance X and M is the number of topic classes.

4.1. Training using Deleted Interpolation

The in-domain verification model is trained using only in-domain data. An overview of the proposed training method combining GPD (gradient probabilistic descent) [7] and deleted interpolation is given in Table 2.

Table 2. Deleted Interpolation based Training
for each topic i in [1, M]:
    set topic i as temporary OOD
    set remaining topic classes as in-domain
    calculate (λ_1, ..., λ_M) using GPD (λ_i excluded)
average (λ_1, ..., λ_M) over all iterations

Table 3. Experiment Corpus
Domain: Basic Travel Expressions
In-Domain: 11 topics (transit, accommodation, ...)
OOD: 1 topic (shopping)
Training Set: 11 topics, 149,540 sentences (in-domain data only)
Lexicon Size: 17,000 words
Test Set: In-Domain: 1,852 utterances; OOD: 138 utterances

Each topic is iteratively set to be temporarily OOD, and the classifier corresponding to this topic is removed from the model. The discriminant weights of the remaining topic classifiers are estimated using GPD. In this step, the temporary OOD data is used as negative training examples, and a balanced set of the remaining topic classes is used as positive (in-domain) examples. Upon completion of estimation by GPD, the final model weights are calculated by averaging over all interpolation steps.
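The leave-one-topic-out loop of Table 2 can be sketched as follows. Here `estimate_weights` is an injected callable standing in for the GPD estimation step, and the topic names are illustrative, not from the corpus definition:

```python
def deleted_interpolation_weights(topics, estimate_weights):
    """Each topic is set as temporary OOD in turn; discriminant weights
    for the remaining topics are estimated (GPD in the paper, here a
    placeholder callable); final weights average over all iterations."""
    sums = {t: 0.0 for t in topics}
    counts = {t: 0 for t in topics}
    for held_out in topics:
        in_domain = [t for t in topics if t != held_out]
        # estimate lambda_j for every remaining topic, treating held_out as OOD
        for t, v in estimate_weights(in_domain, held_out).items():
            sums[t] += v
            counts[t] += 1
    return {t: sums[t] / counts[t] for t in topics}

# Toy estimator: weight 1.0 for every remaining topic (placeholder for GPD).
weights = deleted_interpolation_weights(
    ["transit", "accommodation", "basic"],
    lambda in_domain, ood: {t: 1.0 for t in in_domain})
```

Because each topic is excluded exactly once, every λ is averaged over M−1 estimates.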
In the experimental evaluation, a topic-independent class "basic" covering general utterances exists, which is not removed during deleted interpolation.

4.2. Incorporation of Topic-dependent Verifier

Improved OOD detection accuracy can be achieved by applying more elaborate verification models. In this paper, a model consisting of multiple linear discriminant functions is investigated. Topic-dependent functions are added for topics not modeled sufficiently, and their weights are trained specifically for verifying that topic. For verification, the topic with maximum classification confidence is selected, and a topic-dependent function is applied if one exists; otherwise a topic-independent function (Eqn. 2) is applied.

5. EXPERIMENTAL EVALUATION

The ATR BTEC corpus [8] is used to investigate the performance of the proposed approach. An overview of the corpus is given in Table 3. In this experiment, we use "shopping" as OOD of the speech-to-speech translation system. The training set, consisting of 11 in-domain topics, is used to train both the language model for speech recognition and the topic classification models. Recognition is performed with the Julius recognition engine.

The recognition performance for the in-domain (ID) and OOD test sets is shown in Table 4. Although the OOD test set has much greater error rates and out-of-vocabulary rate compared with the in-domain test set, more than half of the utterances are correctly recognized, since the language model covers the general travel domain. This indicates that the OOD set is related to the in-domain task, and discrimination between these sets will be difficult.

System performance is evaluated by the following measures:
FRR (False Rejection Rate): percentage of in-domain utterances classified as OOD
FAR (False Acceptance Rate): percentage of OOD utterances classified as in-domain
EER (Equal Error Rate): error rate at an operating point where FRR and FAR are equal

Table 4. Speech Recognition Performance
               #Utt.  WER(%)  SER(%)  OOV(%)
In-Domain      1852    7.26    22.4    0.71
Out-of-Domain   138   12.49    45.3    2.56
WER: Word Error Rate; SER: Sentence Error Rate; OOV: Out of Vocabulary

Table 5. Comparison of Feature Sets & Classification Models
Method  Token Set   Feature Set   #Feat.  EER(%)
SVM     base-form   1-gram         8771    29.7
SVM     full-word   1-gram         9899    23.9
SVM     word+POS    1-gram        10006    23.3
SVM     word+POS    1,2-gram      40754    21.7
SVM     word+POS    1,2,3-gram    73065    19.6
LSA     word+POS    1-gram        10006    23.3
LSA     word+POS    1,2-gram      40754    24.1
LSA     word+POS    1,2,3-gram    73065    23.0
NGRAM   word+POS    1-gram        10006    24.8
NGRAM   word+POS    1,2-gram      40754    25.2
NGRAM   word+POS    1,2,3-gram    73065    24.2
SVM: Support Vector Machines; LSA: Latent Semantic Analysis; NGRAM: Topic-dependent Word N-gram

5.1. Evaluation of Topic Classification and Feature Sets

First, the discriminative ability of the various feature sets described in Section 3.1 was investigated. Initially, SVM topic classification models were trained for each feature set. A closed evaluation was performed for this preliminary experiment. Topic classification confidence scores were calculated for the in-domain and OOD test sets using the above SVM models, and used to train the in-domain verification model using GPD. During training, in-domain data were used as positive training examples, and OOD data were used as negative examples. Model performance was evaluated by applying this closed model to the same confidence vectors used for training. The performance in terms of EER is shown in the first section of Table 5. The EER when word-baseform features were used was 29.7%.
Full-word and word+POS features improved detection accuracy significantly, with EERs of 23.9% and 23.3%, respectively. The inclusion of context-based 2-gram and 3-gram features further improved detection performance. A minimum EER of 19.6% was obtained when 3-gram features were incorporated.

Next, LSA and N-gram-based classification models were evaluated. Both approaches showed lower performance than SVM, and the inclusion of context-based features did not improve performance. SVM with a feature set containing 1-, 2-, and 3-grams offered the lowest OOD detection error rate, so it is used in the following experiments.

5.2. Deleted Interpolation-based Training

Next, the performance of the proposed training method combining GPD and deleted interpolation was evaluated. We compared the OOD detection performances of the proposed method (proposed), a reference method in which the in-domain verification model was trained using both in-domain and OOD data as described in Section 5.1 (closed-model), and a baseline system. In the baseline system, topic detection was applied and an utterance was classified as OOD if all binary SVM decisions were negative; otherwise it was classified as in-domain.

[Fig. 2. OOD Detection Performance on Correct Transcriptions: ROC curves (FRR vs. FAR) for the three systems]

[Fig. 3. OOD Detection Performance on ASR Results: error rates of the baseline, proposed, and closed-model verification methods]

The ROC graph of the three systems, obtained by altering the verification threshold (φ in Eqn. 2), is shown in Figure 2. The baseline system has an FRR of 25.2%, a FAR of 29.7%, and an EER of 27.7%. The proposed method provides an absolute reduction in EER of 6.5% compared to the baseline system. Furthermore, it offers comparable performance to the closed evaluation case (21.2% vs. 19.6%) while being trained with only in-domain data. This shows that the deleted interpolation approach is successful in training the OOD detection model in the absence of OOD data.

5.3. Evaluation with ASR Results

Next, the performances of
the above three systems were evaluated on a test set of 1,990 spoken utterances. Speech recognition was performed and the 10-best recognition results were used to generate a topic classification vector. The FRR, FAR and percentage of falsely rejected utterances with recognition errors are shown in Figure 3.

The EER of the proposed system when applied to the ASR results is 22.7%, an absolute increase of 1.5% compared to the case for the correct transcriptions. This small increase in EER suggests that the system is strongly robust against recognition errors. Further investigation showed that the falsely rejected set had an SER of around 43%, twice that of the in-domain test set. This suggests that utterances that incur recognition errors are more likely to be rejected than correctly recognized utterances.

5.4. Effect of Topic-dependent Verification Model

Finally, the topic-dependent in-domain verification model described in Section 4.2 was also incorporated. Evaluation was performed on spoken utterances as in the above section. The addition of a topic-dependent function (for the topic "basic") reduced the EER to 21.2%. The addition of further topic-dependent functions, however, did not provide significant improvement in performance over the two-function case. The topic class "basic" is the most vague and is poorly modeled by the topic-independent model. A topic-dependent function effectively models the complexities of this class.

6. CONCLUSIONS

We proposed a novel OOD (out-of-domain) detection method based on confidence measures from multiple topic classifications. A novel training method combining GPD and deleted interpolation was introduced to allow the system to be trained using only in-domain data. Three classification methods were evaluated (topic-dependent word N-gram, LSA and SVM), and SVM-based topic classification using word and N-gram features proved to have the greatest discriminative ability. The proposed approach reduced OOD detection errors by 6.5% compared to the baseline
system based on a simple combination of binary topic classifications. Furthermore, it provides similar performance to the same system trained on both in-domain and OOD data (EERs of 21.2% and 19.6%, respectively) while requiring no knowledge of the OOD data set. Addition of a topic-dependent verification model provides a further reduction in detection errors.

Acknowledgements: The research reported here was supported in part by a contract with the Telecommunications Advancement Organization of Japan entitled, "A study of speech dialogue translation technology based on a large corpus".

7. REFERENCES

[1] T. Hazen, S. Seneff, and J. Polifroni. Recognition confidence and its use in speech understanding systems. In Computer Speech and Language, 2002.
[2] C. Ma, M. Randolph, and J. Drish. A support vector machines-based rejection technique for speech recognition. In ICASSP, 2001.
[3] P. Haffner, G. Tur, and J. Wright. Optimizing SVMs for complex call classification. In ICASSP, 2003.
[4] Lane, T. Kawahara, et al. Language model switching based on topic detection for dialog speech recognition. In ICASSP, 2003.
[5] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. In Journal of the American Society for Information Science, 41, pp. 391-407, 1990.
[6] T. Joachims. Text categorization with support vector machines. In Proc. European Conference on Machine Learning, 1998.
[7] S. Katagiri, C.-H. Lee, and B.-H. Juang. New discriminative training algorithm based on the generalized probabilistic descent method. In IEEE Workshop NNSP, pp. 299-300, 1991.
[8] T. Takezawa, M. Sumita, F. Sugaya, H. Yamamoto, and S. Yamamoto. Towards a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In Proc. LREC, pp. 147-152, 2002.
基于视线角度的人眼视线检测研究
![基于视线角度的人眼视线检测研究](https://img.taocdn.com/s3/m/19e97b212af90242a895e5f1.png)
部图像进行阈值处 理后, 很 容易找 到瞳孔 与普 尔钦斑 的相对位置关系, 从而断 定视线 方向。当 头部 发生微 小转动时, 就要建 立视线 方向修 正系 统。这种 基于红 外的方法目前仍然比较流行。 1. 2. 2 角膜反射法
该法是实时采集 角膜表 面形 成的虚 像, 根 据虚像 的相对移动状况, 来判断眼球的行动情况, 进而判断视 线的 方 向。资 料 显 示: 台湾 的 成 大 研 究 所 的 Eyegaze! 系统就 是采 用这 种技 术。但是 这种 方 法非 常不 准确, 特别容易受到眼动时各种噪声信号的影响。 1. 2. 3 红外光电反射法
主要是利用电磁感应的原理测量眼睛的移动。此 法是将感 应线 圈做 成的 特 殊镜 片放 在试 验 者的 角膜 上, 然后在眼睛的周围加上固定的交变磁场, 当眼睛转 动时, 将使特制的镜片发生转动以切割附近的磁感线, 因而产生不同的感应电动势。依据感应电动势的变化 来记录分析眼球 的运动。该 方法 测量的 精度 高, 速度 快。使用这种方法很 容易受 到眼 睛状况 的影 响, 如眼 泪分泌量和过敏等。
目前采 用 这 种 方 法 的 仪器 已 经 商 用 化, 荷 兰 的 skalar 仪器公司和华盛顿的 C- N - C 技术公司都有成 熟的产品。
(《计算机技术与发展》, 第19卷)
1. 1. 2 眼电图法 这种方法 主 要是 利用 电 极的 原理 来 检测 视 线方
向。因为人的视网膜 前后 存在着 一个 电压差, 当眼球 转动时眼睛周围的电势肯定会发生变化。所以首先在 眼睛的四周装上四个电极, 当眼睛移动时, 电极将产生 相应的变化信号, 可依据这 些差异 的信 号来判 断眼球 运动的情况。根据资 料显 示, 这种 方法 大约可 以识别 出眼球 3 度左右的水 平偏 转, 以及 5 度 左右的 垂直偏 转。但是由于电极始 终在 眼睛四 周的 皮肤上, 而皮肤 受汗腺、皮肤分泌物的影 响, 它的 电阻 是因时 而异 的, 这就容易造成电信号的不稳定, 造成很大的测量误 差, 而且电极对人也有伤害作用。 1. 1. 3 接触镜法
基于霍夫变换的车道线检测的实验报告(一)
![基于霍夫变换的车道线检测的实验报告(一)](https://img.taocdn.com/s3/m/a2c41c4da517866fb84ae45c3b3567ec112ddc70.png)
基于霍夫变换的车道线检测的实验报告(一)基于霍夫变换的车道线检测的实验报告引言•对于自动驾驶和辅助驾驶等智能交通系统来说,准确检测和识别道路的车道线是至关重要的技术之一。
•本实验报告通过采用霍夫变换方法,对图像进行处理,以实现车道线的检测。
实验设备•一台计算机(配有Python环境)•视频或图像数据集实验步骤1.图像预处理–使用OpenCV库加载图像,并对其进行灰度化处理,以方便后续处理。
–对图像进行高斯模糊处理,以去除图像中的噪声。
2.边缘检测–利用Canny边缘检测算法对预处理后的图像进行边缘检测。
–通过调整边缘检测的参数,可以使得车道线在图像中更加清晰可见。
3.霍夫变换–使用霍夫变换算法对边缘检测后的图像进行直线检测。
–根据设定的阈值和参数,筛选出可能为车道线的直线。
4.车道线检测–根据霍夫变换得到的直线参数,将其画在原始图像上,实现车道线的可视化。
–可以利用额外的算法和技术,如滑动窗口法、曲线拟合等,对车道线进行进一步的处理和优化。
5.实验结果展示–展示处理后的图像和检测到的车道线。
–分析实验结果的准确性和稳定性。
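Steps 2-3 above (edge pixels voting in (ρ, θ) space) can be sketched with a minimal pure-Python Hough accumulator. This illustrates only the voting idea, not the OpenCV `HoughLinesP` API used in the report:

```python
import math

def hough_lines(edge_points, n_theta=180, threshold=3):
    """Each edge pixel (x, y) votes for every (rho, theta) line through
    it; accumulator bins with at least `threshold` votes are returned
    as (rho, theta, votes) peaks."""
    acc = {}
    for (x, y) in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return [(rho, math.pi * t / n_theta, votes)
            for (rho, t), votes in acc.items() if votes >= threshold]

# Three collinear points on the vertical line x = 2 should produce a
# peak at theta = 0, rho = 2 with all three votes.
peaks = hough_lines([(2, 0), (2, 1), (2, 2)])
```

In a real lane detector the accumulator is a dense 2D array and the peaks are post-processed (e.g. by slope filtering) before being drawn back onto the frame.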
实验总结•本实验使用了霍夫变换方法对图像进行处理,成功地实现了车道线的检测。
•霍夫变换方法具有一定的鲁棒性和稳定性,在处理不同条件下的图像时表现良好。
•通过本次实验,进一步验证了霍夫变换在图像处理中的重要性和潜力。
参考文献1.S. Xie, C. Wang, J. Wang, et al. (2019).Real-time lane detection based on improved Houghtransform. Sensors, 19(21), 4655.2.Y. Liang, Y. Huang, and C. Xu. (2016). Lanedetection algorithm based on adaptive Canny and Houghtransform. Journal of Electronic Measurement andInstrument, 30(6), .3.霍夫变换的原理–霍夫变换是一种数学变换,用于在图像中检测直线等几何形状。
人眼识别外文翻译译文
![人眼识别外文翻译译文](https://img.taocdn.com/s3/m/6006f2b07e21af45b207a81f.png)
译文:惺忪眼睛识别之睡意检测林信锋,林家仁,姚志国立东华大学,台湾花莲摘要:随着科学技术和汽车工业的进步,道路上有了越来越多的车辆。
其结果是,繁忙的交通经常导致越来越多的交通事故。
普通交通事故,司机注意力不集中通常是一个主要原因。
若要避免这种情况,本文提出了惺忪的眼识别系统的嗜睡检测。
首先,级联Adaboost算法与Haar特征分类器来找出人脸。
第二,眼睛区域位于主动形状模型(ASM)搜索算法.然后采用二进制的模式和边缘检测的眼睛特征提取和确定眼睛的状态。
实验结果表明即使没有系统训练阶段也能与其他方法的性能比较。
关键词:人脸检测;人眼识别;睡意。
一、引言在过去的几十年中,随着车辆技术的发展交通事故发生率越来越高.驾驶员疲劳驾驶被认为是一个重要因素。
许多研究显示长时间驾驶的危险是相当于醉酒驾驶.因此,驾驶员疲劳驾驶已成为一个普遍的问题。
其结果是,大量的研究一直致力于检测系统的不安全驾驶。
安全驾驶系统可以概括为两大类。
一种是车辆的"以车为本”的[1][2] 方法,其中着重论述,如车辆的道路上,位置状态变化的速度,等等.另一类是"以人为本”的方法,侧重于驱动程序的状态。
此方法分析了驱动程序的人脸图像与图像处理和模式识别,如眨眼频率和眼睛关闭[3] 的时间。
提出的方法基于这一类别。
林etal.[4]评估几个功能集和分类对于亲密关系的人眼检测。
他们采用灰度值,Gabor 小波、局部二进制模式(LBP)及直方图的面向梯度(HOG)来表示功能集,并与三种类型的分类器(即,邻近取样(NN),支持向量机(SVM) 和Adaboost算法)比较。
实验结果表明,各种特征描述符的结合大大提高了精度。
吴吴 et al.[5] 提出了一种识别眼睛的状态方法.他们用haar 特征和Adaboost 分类器 [6]来找出人脸区域.LBP 被考虑作为图像的特征和特点采用支持向量机训练。
然后利用支持向量机识别眼睛的状态。
扩瞳试验对阿尔茨海默病的预测效应
![扩瞳试验对阿尔茨海默病的预测效应](https://img.taocdn.com/s3/m/018b16d980eb6294dd886ca6.png)
论著扩瞳试验对阿尔茨海默病的预测效应项志清1,2 陈月敏2 徐一峰1,2 【摘要】 目的 利用扩瞳试验对轻度认知功能损害(M ild cognitive i m pair m ent,MC I )患者进行研究,了解MC I 、阿尔茨海默病(A D )与正常老年人在扩瞳试验结果之间的差异,分析MC I 与AD 之间的关系,并探讨扩瞳试验是否能作为MC I 发展成AD 的预测指标。
方法 收集A D 患者30例、MC I 患者28例以及健康对照34例。
分别进行神经心理学测验和扩瞳试验。
比较三组的神经心理学测验和扩瞳试验结果之间的差异。
计算扩瞳试验在诊断A D 和MC I 时的敏感性和特异性。
结果 MC I 组的神经心理学测验都显著好于AD 组(P <0.001),但都明显不如正常对照组(P <0.001)。
AD 患者和MC I 患者在滴入扩瞳剂后,瞳孔直径明显扩大,与NC 组有显著差异(P 值分别为P <0.05,P <0.001),而AD 组与MC I 组之间则无统计学上的差异(P >0.05)。
扩瞳试验诊断AD 的敏感性和特异性分别为60.0%和67.65%,诊断MC I 的敏感性和特异性分别为71.43%和67.65%。
结论 扩瞳试验可以将MC I 患者和AD 患者、与正常老年人区别开来,可以作为MC I 和AD 的一个筛选诊断标志。
MC I 是AD 与正常衰老之间的过渡状态;MC I 患者是AD 的高危人群。
【关键词】 扩瞳试验 阿尔茨海默病(AD ) 轻度认知功能损害(MC I )P up il d il a tio n test:a pr ed i ct i ve te st for Alzhe i m er ’s d is ea se Xia ng Z hiqing,Chen Yuem in,Xu Yifeng .D epa rt ment o f psychia try ,Hua shan H ospit a l,Fudan U ni versity .Sha ngha i 200040【Abstra c t 】 O b ject i ve:T o compare pa tients with Alzhei me r ’s disease,amne stic M ild Cogniti veI mpair ment (a -MC I),and a group of healthy elde rly pers ons on pupil dilati on test .T o de t e r m ine whethe r pup il re s ponse t o dilute tropicam i de could be used a s a diag nostic te st for AD and M C I,and to investigate whethe r pupil dilati on test could be used as a predic t or for pa tients with a -MC I who will p rogress to AD .M e thod s:30pa tients with AD,28pati entswith a -MC I and 34healt hy elde rl y pers ons were rec ruit ed .A ll of the pa rticipants finished a ba ttery of neuropsychol ogical tests and pu p il dilati on test .The results ofneuropsychol ogical t e sts of the s e three gr oupswere co mpared .The sensitivity and the s pecificity of pupil dila ti on test we re ca lculated .The diffe rences of pupil dilation degree among AD,MC I and nor m al controls wereana lysed .Re sults:The sco res ofMM SE,WMS and G DS ofMC I grou p we re signi f i cantly higher than tha t of AD gr oup (P <0.001),butwere si gnificantly lo wer than tha t of nor m al controls (P <0.001).AD L score ofMC I gr oup was significantly l owe r t han that of AD group (P <0.001),and wa s significantly highe r than that of nor m al controls (P >0.001).M ean pe rcent change i n pupil dia m ete r of the treated eye sho wed a trend w ith faste r m axi mzing dila ti on in AD group and MC I group,which we re significantly diffe rent fr om t he nor m al controls afte r instilla ti on (P <0.05and P <0.001re s pectively).The re wa s n o difference of pup il dila ti on degree be t ween AD gr oup and MC I gr oup (P >0.05).The sensiti vity and the s pec i f i c ity of p u p il dilation t e st for AD was 60.0%and 67.75%res pec tively 
.T he sensitivity and the s pec ifi c ity of pup il dil a tion t e st for MC I was 71.43%and 67.75%re s pectiv e ly .C o n clu si on:A D and MC I could be disting uished fro m t he hea lthy e l derly by pupil dilati on test which could be used as a screening di agnosing t ool .MC I repre sents a transiti onal state bet ween healthy agi ng and AD.Patients with MC I are individua ls wit h high risk t o progress to AD.【Key wor d s 】 Pupil dilati on test A lzhei me r ’s disease M ild cognitive i mpair ment 据流行病学调查,美国65岁以上的老年人,有6%~10%患有痴呆[1,2],主要都是AD 。
《汽车零部件》编辑部郑重声明
![《汽车零部件》编辑部郑重声明](https://img.taocdn.com/s3/m/1d4edd3158f5f61fb636669f.png)
图8车辆不同状态下实时检测结果4结论在Haar-like特征的基础上,提出了新的特征集以及其特征计算公式,使用新的特征集训练出更优的分类器,提高了检测精度。
通过Adaboosl算法训练出不同车辆姿态所对应的分类器,合并各个分类器的检测结果,实现多通道级联分类器。
实验结果表明:新增加Haar-like特征可以使Adaboost训练过程中的弱分类器精确度得到提高,特别在检测同一图像中出现各种状态的车辆时具有较高的鲁棒性。
训练出的强分类器较原始Haar-like特征分类器检测率得到明显提高,并且也能满足实时检测的要求。
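Haar-like features of the kind discussed above are computed efficiently with an integral image (summed-area table), so that any rectangle sum costs at most four lookups; a minimal sketch:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y][x] = row + (ii[y - 1][x] if y else 0)
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Inclusive rectangle sum from at most four corner lookups."""
    total = ii[y1][x1]
    if x0:
        total -= ii[y1][x0 - 1]
    if y0:
        total -= ii[y0 - 1][x1]
    if x0 and y0:
        total += ii[y0 - 1][x0 - 1]
    return total

img = [[1, 2], [3, 4]]
ii = integral_image(img)
```

A two-rectangle Haar-like feature is then just `rect_sum(white) - rect_sum(black)`, which is why adding new rectangle layouts, as the excerpt proposes, keeps evaluation cheap.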
参考文献:[1]张晖,董育宁.基于视频的车辆检测算法综述[J].南京邮电大学学报(自然科学版),2007,27(3):88-94.ZHANG H,DONG Y N.Survey on video based vehicle detection algorithms J.Journal of Nanjing University of Posts and Telecommunications(Natural Science),2007,27(3):88-94. [2]许庆,高峰,徐国艳.基于Haar特征的前车识别算法[J].汽车工程,2013,35(4):381-384.XU Q,GAO F,XU G Y.An algorithm for front-vehicle detection based on Haar-like feature[J].Automotive Engineering,2013,35(4):381-384.[3]金立生,贾敏,孙玉芹,等.日间高速公路侧后方车辆识别方法[J]•西南交通大学学报,2010,45(2):231-237.JIN L S,JIA M,SUN Y Q,et al.Detection of backside vehicle on freeway in daytime J;.Journal of Southwest Jiaotong University, 2010,45(2):231-237.[4]FLETCHER L,PETERSSON L,ZELINSKY A.Driver assistancesystems based on vision in and out of vehiclesf C]//Proceedings of Intelligent Vehicles Symposium.IEEE,2003.[5]VIOLA P,JONES M J.Robust real-time facedetection[J].InternationalJournal of Computer Vision, 2004,57(2):137-154.[6]甘玲,朱江,苗东•扩展Haar特征检测人眼的方法[J].电子科技大学学报,2010,39(2):247-250.GAN L,ZHU J,MIAO D.Application of the expansion Haar features in eye detection]J」.Journal of University of Electronic Science and Technology of China,2010,39(2):247-250.[7]刘晓克,孙燮华,周永霞.基于新Haar-like特征的多角度人脸检测[J].计算机工程,2009,35(19):195-197.LIU X K,SUN X H,ZHOU Y X.Multi-angle face detection based on new Haar-like feature[J puter Engineering,2009,35(19): 195-197.[8]IZADINIA H,RAMAKRISHNA V,KITANI KM,et al.Multi-posemulti-target tracking for activity understanding[C]//Proceedings of Applications of Computer Vision(WACV).Tampa:IEEE, 2013.[9]VIOLA P,JONES M J.Rapid object detection using a boostedcascade of simple features[C]//Proceedings of the2001IEEE Computer Society Conference on Computer Vision and Pattern Recognition.CVPR,2001.《汽车零部件》编辑部郑重声明近期有作者向《汽车零部件》编辑部反映,有人冒充编辑部工作人员向作者收取审稿费和版面费。
一种优化的自然光源下眼动视觉测量方法
![一种优化的自然光源下眼动视觉测量方法](https://img.taocdn.com/s3/m/f7a122e548649b6648d7c1c708a1284ac8500508.png)
图 1 为瘫痪病人在注视屏幕不同目标点时, 眼睛相应的状态。红色圆点表示虹膜中心在眼眶 中的位置,黄色圆点表示内眼角点在眼眶中的位 置,箭头表示“虹膜中心–眼角点”向量。可以看 到该眼动向量充分反映了瘫痪病人看向不同位置 时的眼球状态。实现对该向量的测量将有助于反 映病人内心的真实想法。图 2 为眼动向量示意图。
(c) 肤色检测
图8 优化算法定位效果
3 人眼定位
在得到原始图像后,即可进一步划分人眼
ROI。主流的人眼检测算法采用的也是基于 Haar-
like 特 征 的 Adaboost 方 法 , 这 种 方 法 依 然 会 存 在
将非人眼目标误检为人眼的情况。因此,本文采
用先验知识法进行人眼定位。前人根据人脸面部
这种方法对硬件要求高,设备昂贵,并且通常 需要一个安装有近红外光源的专用头盔。而本文需 要开发一套针对瘫痪病人及老年人方便使用的系统。
本文采用优化的“虹膜中心–眼角点向量法” 实现对眼动数据的测量。动点选取为虹膜中心
(《应用科技》, 第48卷)
点,静点选取内眼角点,由于系统面向的对象是 四肢瘫痪病人,所以静点选取为内眼角点是合理 且可行的。由于四肢瘫痪病人的特殊性,他们在 眼球运动过程中,眼角点相对于脸部是一个绝对 静止的点。该方法在系统的复杂度上要远远低 于 PCCR 法,仅需要一台带电荷耦合器件(charge coupled device,CCD)相机的笔记本即可实现自然 光源下对眼动的测量。
提出了一种优化的自然光源下的眼动视觉测量方法。 从笔记本 CCD 相机获取的低分辨率图像,结合图 像处理的方法,实现眼动测量的流程如图 3 所示。
原始图像
基于改进STDC的井下轨道区域实时分割方法
![基于改进STDC的井下轨道区域实时分割方法](https://img.taocdn.com/s3/m/875f13b84bfe04a1b0717fd5360cba1aa8118c39.png)
基于改进STDC 的井下轨道区域实时分割方法马天1, 李凡卉1, 杨嘉怡1, 张杰慧1, 丁旭涵2(1. 西安科技大学 计算机科学与技术学院,陕西 西安 710054;2. 西安科技大学 安全科学与工程学院,陕西 西安 710054)摘要:目前中国大部分井下轨道运输场景较为开放,存在作业人员、散落物料或煤渣侵入到轨道上的问题,从而给机车行驶带来威胁。
煤矿井下轨道区域多呈线性或弧形不规则区域,且轨道会逐渐收敛,采用目标识别框或检测轨道线的方法划分轨道区域难以精确获得轨道范围,采用轨道区域的分割可实现像素级别的精确轨道区域检测。
针对目前井下轨道区域分割方法存在边缘信息分割效果差、实时性低的问题,提出了一种基于改进短期密集连接(STDC )网络的轨道区域实时分割方法。
采用STDC 作为骨干架构,以降低网络参数量与计算复杂度。
设计了基于通道注意机制的特征注意力模块(FAM ),用于捕获通道之间的依赖关系,对特征进行有效的细化和组合。
使用特征融合模块(FFM )融合高级语义特征与浅层特征,并利用通道和空间注意力丰富融合特征表达,从而有效获取特征并减少特征信息丢失,提升模型性能。
采用二值交叉熵损失、骰子损失及图像质量损失来优化详细信息的提取,并通过消除冗余结构来提高分割效率。
在自建的数据集上对基于改进STDC 的轨道区域实时分割方法进行验证,结果表明:该方法的平均交并比(MIoU )为95.88%,较STDC 提高了3%;参数量为6.74 MiB ,较STDC 降低了18.3%;随着迭代次数增加,优化后的损失函数值持续减小,且较STDC 降低更为明显;基于改进STDC 的轨道区域实时分割方法的MIoU 达95.88%,帧速率为37.8帧/s ,参数量为6.74 MiB ,准确率为99.46%。
该方法可完整识别轨道区域,轨道被准确地分割且边缘轮廓完整准确。
关键词:井下轨道区域;语义分割;短期密集连接网络;特征注意力;特征融合;注意力机制中图分类号:TD67 文献标志码:AReal time segmentation method for underground track area based on improved STDCMA Tian 1, LI Fanhui 1, YANG Jiayi 1, ZHANG Jiehui 1, DING Xuhan 2(1. College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710054, China ;2. College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China)Abstract : Currently, most underground rail transportation scenarios in China are relatively open. There are problems of operators, scattered materials, or coal slag invading the track. It poses a threat to locomotive operation. The underground track area of coal mines often presents linear or arc-shaped irregular areas, and the track gradually converges. It is difficult to accurately obtain the track range by using object recognition boxes or detecting track lines to divide the track area. Using track area segmentation can achieve pixel level accurate track area detection. Aiming at the problems of poor edge information segmentation and low real-time performance in current underground track area segmentation methods, a real-time track area segmentation method based on improved network short-term dense concatenate (STDC) is proposed. STDC is adopted as the backbone收稿日期:2023-08-22;修回日期:2023-11-15;责任编辑:王晖,郑海霞。
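The MIoU figure reported in the abstract above can be computed from a pixel-level confusion matrix; a minimal sketch (the two-class layout below is illustrative, not the paper's evaluation code):

```python
def mean_iou(conf):
    """Mean intersection-over-union from confusion matrix `conf`, where
    conf[i][j] counts pixels of true class i predicted as class j.
    IoU_i = TP_i / (TP_i + FP_i + FN_i); MIoU averages over classes."""
    n = len(conf)
    ious = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                      # true i, predicted other
        fp = sum(conf[r][i] for r in range(n)) - tp # other, predicted i
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Toy 2-class case (background vs. track region), counts illustrative.
m = mean_iou([[8, 2],
              [1, 9]])
```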
基于边缘计算的疲劳驾驶检测方法
![基于边缘计算的疲劳驾驶检测方法](https://img.taocdn.com/s3/m/bc1a62560a1c59eef8c75fbfc77da26925c5963c.png)
基于边缘计算的疲劳驾驶检测方法
娄 平 1,2,杨 欣 1,胡 辑 伟 1,2,萧 筝 3,严 俊 伟 1,2
(1. 武汉理工大学 信息工程学院,武汉 430070;2. 宽带无线通信和传感器网络湖北省重点实验室,武汉 430070; 3. 武汉理工大学 机电工程学院,武汉 430070)
摘 要:现有疲劳驾驶检测方法通常将驾驶过程中采集的数据传输至云端进行分析,然而在车辆移动过程中网络 覆盖范围、响应速度等因素会造成检测实时性差。为在车载嵌入式设备上对驾驶人疲劳状态进行准确预警 ,提出 一 种 基 于 边 缘 计 算 的 疲 劳 驾 驶 检 测 方 法 。 通 过 改 进 的 多 任 务 卷 积 神 经 网 络 确 定 人 脸 区 域 ,根 据 人 脸 的 面 部 比 例 关 系定位驾驶人的眼部与嘴部区域,利用基于 Ghost 模块的轻量化 AlexNet 分类检测眼部与嘴部的开闭状态,并结合 PERCLOS 和 PMOT 指 标 值 实 现 疲 劳 检 测 。 在 NHTU-DDD 数 据 集 上 的 实 验 结 果 表 明 ,该 方 法 在 树 莓 派 4B 开 发 板 上 的 检 测 准 确 率 达 到 93.5% 且 单 帧 平 均 检 测 时 间 为 180 ms,在 保 障 检 测 准 确 率 的 同 时 大 幅 降 低 了 计 算 量 ,能 较 好 地满足疲劳驾驶的实时检测需求。 关键词:疲劳驾驶检测;边缘计算;多任务卷积神经网络;轻量化;AlexNet 结构
李 照 等[7]设 计 一 个 基 于 数 字 信 号 处 理(Digital Signal Processing,DSP)的嵌入式车载疲劳驾驶检测 系 统 ,该 系 统 利 用 AdaBoost 算 法 检 测 人 脸 并 定 位 驾 驶 人 眼 睛 区 域 ,通 过 计 算 人 眼 闭 合 程 度 ,并 结 合 PERCLOS 算 法 进 行 疲 劳 判 断 。 徐 明[8]利 用 ARM 开 发 板 设 计 了 一 套 疲 劳 驾 驶 检 测 系 统 ,该 系 统 首 先 利 用 AdaBoost 级 联 分 类 器 依 次 定 位 驾 驶 人 面 部 及 眼 部 区 域 ,接 着 采 用 积 分 投 影 算 法 定 位 嘴 部 区 域 ,并 根 据 眼 部 及 嘴 部 的 宽 高 比 确 定 开 闭 状 态 ,最 后 结 合 PERCLOS 算法及打哈欠分析完成疲劳检测。这类 方 法 可 在 车 载 嵌 入 式 设 备 上 独 立 完 成 疲 劳 检 测 ,但 基 于 手 工 提 取 的 特 征 ,准 确 率 会 受 到 个 体 差 异 、拍 摄 距 离 及 角 度 的 影 响 ,鲁 棒 性 较 差 。
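The PERCLOS measure used by both systems above is essentially the fraction of closed-eye frames over a time window; a minimal sketch (the 0.4 fatigue threshold is illustrative, not taken from the cited systems):

```python
def perclos(eye_states, closed=0):
    """PERCLOS over a frame window: fraction of frames whose eye state
    equals `closed` (here 0 = closed, 1 = open)."""
    return sum(1 for s in eye_states if s == closed) / len(eye_states)

window = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]  # per-frame eye states
score = perclos(window)                   # 4 closed frames out of 10
is_fatigued = score >= 0.4                # illustrative threshold
```

In a deployed system the window would slide over the video stream and the eye states would come from the per-frame eye-state classifier.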
一种新的挖掘眼部结构特征的人眼精定位方法
![一种新的挖掘眼部结构特征的人眼精定位方法](https://img.taocdn.com/s3/m/bdcc24cf6394dd88d0d233d4b14e852458fb3915.png)
一种新的挖掘眼部结构特征的人眼精定位方法张静;叶学义;张维笑;陈雪婷【期刊名称】《计算机工程与应用》【年(卷),期】2016(052)012【摘要】针对眼镜框对眼睛定位的干扰问题,充分挖掘眼部类圆形的结构信息,并结合眼镜框的固有特征,提出了一种新的人眼精确定位方法。
它在AdaBoost 检测算法已经得到的眼部区域的基础上,在眼部矩形区域内利用灰度投影法分割眉眼条形区域,运用自适应二值化圆拟合对眉眼条形区域实现二值化处理,通过二值化图像圆拟合的圆心来确定人眼精确位置。
该方法充分利用了眼部类圆形的结构特征,有效地避免了眼镜框对人眼定位的影响。
实验结果表明,该算法在有眼镜框和无眼镜框的情况下都具有很好的定位效果。
%For interference problems of the glasses in human eyes location, this paper fully exploits the eye structure infor-mation, combined with a fixed feature of glasses, proposes a new method for precise location of the human eyes. It obtains eyes regions by using AdaBoost detection algorithm. And on the basis of this, firstly, this paper divides the eye-eyebrow bar regions using gray projection function in the eyes rectangular regions. Secondly, it achieves binary processing of the eye-eyebrow bar regions by using adaptive binary circle fitting. Finally, it determines the exact position of the human eyes through the binary image circle fitting center. This method takes full advantage of the class structure of the eye round, and effectively avoids the impact of the glasses in the human eyes location. The experimental results show that thealgorithm has good positioning results in the case of with and without glasses.【总页数】5页(P158-162)【作者】张静;叶学义;张维笑;陈雪婷【作者单位】杭州电子科技大学模式识别与信息安全实验室,杭州 310018;杭州电子科技大学模式识别与信息安全实验室,杭州 310018;杭州电子科技大学模式识别与信息安全实验室,杭州 310018;杭州电子科技大学模式识别与信息安全实验室,杭州 310018【正文语种】中文【中图分类】TP391【相关文献】1.一种新的图像中人眼定位方法 [J], 张金敏;孟萍2.一种新的面向普通用户的多值属性关联规则可视化挖掘方法 [J], 郭晓波;赵书良;王长宾;陈敏3.一种新的利用人眼视觉特性去噪方法的研究 [J], 雷超阳;刘军华;张敏4.一种新的人眼定位方法研究 [J], 李孟歆;李东昊;5.一种新的结构特征全参考图像质量评价方法 [J], 丛波因版权原因,仅展示原文概要,查看原文内容请购买。
基于改进AdaBoost的快速人脸检测算法
![基于改进AdaBoost的快速人脸检测算法](https://img.taocdn.com/s3/m/08393a6e1611cc7931b765ce0508763231127417.png)
基于改进AdaBoost的快速人脸检测算法房宜汕【摘要】When applying in face detection,traditional AdaBoost has the problems of asking many feature numbers and slow speed in detection.In light of this,a rapid face detection algorithm based on improved AdaBoost is proposed.On the one hand,dual-threshold weak classifiers are used to replace the traditional single-threshold weak classifier and this has improved the classification capability on single feature.On the other hand,the information entropy is introduced as the metric means of feature relevance,during the feature selection,in each round of cycle only those features with low feature relevance to the selected features will be chosen,therefore the redundant information between the features is reduced.Experimental results show that compared with traditional AdaBoost face detection algorithm,this one can achieve higher detection correct rate using less features,and the detection speed is magnificently enhanced.%针对传统AdaBoost用于人脸检测时需要的特征数目多,检测速度慢的问题,提出一种基于改进AdaBoost的快速人脸检测算法.一方面,提出使用双阈值的弱分类器代替传统的单阈值弱分类器,提高单个特征的分类能力;另一方面,引入信息熵作为特征相关度的度量方法,在特征选择时每一轮循环中只选择与已选出特征相关度较低的特征,从而减少特征之间的冗余信息.实验结果表明,相对于传统AdaBoost人脸检测算法,该方法使用较少的特征即可达到较高的检测准确率,检测速度得到显著提高.【期刊名称】《计算机应用与软件》【年(卷),期】2013(030)008【总页数】4页(P271-274)【关键词】人脸检测;AdaBoost算法;特征选择;特征相关度;信息熵【作者】房宜汕【作者单位】嘉应学院计算机学院广东梅州514015【正文语种】中文【中图分类】TP317.4人脸检测是计算机视觉领域最为活跃的研究课题,在身份认证、安全防范、虚拟现实、可视通信等领域有着广泛的应用[1-3]。
基于快速傅里叶变换与人脸先验知识的改进人眼检测算法
![基于快速傅里叶变换与人脸先验知识的改进人眼检测算法](https://img.taocdn.com/s3/m/fa98bb2617fc700abb68a98271fe910ef12dae96.png)
基于快速傅里叶变换与人脸先验知识的改进人眼检测算法蔚东辰;郑伯川【期刊名称】《现代计算机(专业版)》【年(卷),期】2012(000)019【摘要】提出了一种针对DVF(距离向量域)眼睛检测算法的重要改进。
使用距离向量域来检测人眼在图像中的位置已经被证明是一种出色的方法,然而其拥有非常高的时间计算复杂度。
目前,两项重要改进被引入:首先,使用快速傅里叶变换与卷积定理,成功将检测复杂度降低。
其次,眼睛在人脸中粗略位置的合理假设被引入检测过程中,使得DVF检测算法在人脸在有旋转与偏斜的情况下仍然保持精确。
%Proposes an important improvement on detecting speed and accuracy on DVF (distance vector field) eye detection algorithm. Detecting eyes region using DVF has been proved to be excel- lent, but with very high computing cost. Introduces two improvements, first, using FFT and con- volution theorem we succeed in reducing the computing complexity of detection. Then applies an appropriate assumption of approximate location of eyes in the detecting procedure, which keeps the DVF eye detection accurate despite the rotation or skew of a human face.【总页数】5页(P25-29)【作者】蔚东辰;郑伯川【作者单位】四川大学计算机学院,成都610065;西华师范大学数学学院,南充637002【正文语种】中文【中图分类】TP391.41【相关文献】1.基于先验知识的人脸检测算法研究与应用 [J], 董立新2.基于先验知识的人脸检测算法研究与应用 [J], 董立新3.人眼定位与AdaBoost Gabor滤波的人脸检测算法 [J], 杨定礼;张宇林;周红标;赵环宇;白秋产4.基于MTCNN与改进Camshift相结合的人脸检测算法 [J], 黄新;高雷;宋博源;郭晓敏5.基于改进YOLOv5的人脸遮挡物目标检测算法 [J], 赵元章;耿生玲因版权原因,仅展示原文概要,查看原文内容请购买。
基于自适应Gabor滤波的红外弱小目标检测
![基于自适应Gabor滤波的红外弱小目标检测](https://img.taocdn.com/s3/m/6349beebf80f76c66137ee06eff9aef8941e48e5.png)
基于自适应Gabor滤波的红外弱小目标检测史漫丽;凌龙;吴南;原娜【摘要】针对复杂背景下的红外弱小目标检测,本文提出了一种改进的Gabor滤波的红外弱小目标检测方法.该方法在背景预测算法的基础上,通过构造Gabor核函数来自适应确定背景预测系数.该方法利用了更多的图像局部特性信息,使用对比度尺度模型和强度尺度传播模型分别确定Gabor核函数的两个轴,解决了Gabor滤波算子不能自适应调整滤波系数的问题.通过与传统的小目标检测方法的比较实验结果表明,本文方法能有效保留图像的边缘信息,能有效地突出目标,抑制背景杂波,提高了对红外弱小目标的检测能力,效果明显优于传统方法.【期刊名称】《红外技术》【年(卷),期】2018(040)007【总页数】6页(P632-637)【关键词】Gabor;背景预测;红外弱小目标;目标检测【作者】史漫丽;凌龙;吴南;原娜【作者单位】北京空间机电研究所,北京 100094;北京空间机电研究所,北京100094;北京空间机电研究所,北京 100094;北京空间机电研究所,北京 100094【正文语种】中文【中图分类】TP3910 引言近些年来,红外预警技术迅猛发展,在军事上的应用也越来越广泛。
如何能更早更准确地发现目标,并及时地采取行动精确地打击目标是军事侦察所期待的目标。
因此,研究人员对红外弱小目标的检测算法越来越关注,涌现出许多有效的目标检测算法。
相对于红外面目标的检测,红外弱小目标检测中的目标无形状、边缘等信息,可分辨的信息比较少,增加了检测难度。
另外,复杂多变的背景及强噪声也增大了检测难度。
所谓的弱小目标,即目标和系统的相对距离较远,图像在成像的平面上仅仅表现为几个或者几十个像素点,目标没有形状特征、或者表现为形状特征极其不明显,甚至在运动中出现目标的闪烁、间断等现象[1],图像的信噪比比较低,目标极易被复杂的背景杂波淹没,检测起来极为困难。
背景占据了红外图像的绝大部分面积。
Eye Detection Based on Improved AD AdaBoost Algorithm

Benke Xiang
College of Computer and Information Science, Southwest University
Chongqing, P.R. China
xiangbenke@

Xiaoping Cheng
College of Computer and Information Science, Southwest University
Chongqing, P.R. China
xpcheng@

Abstract—Eye detection is an important step in eye tracking and eye state recognition. An improved AD AdaBoost algorithm for eye detection is proposed to slow the degradation that occurs during training. The weight of correctly classified negative samples is released, and the weights of the remaining samples are then normalized, which slows the expansion of the weight of difficult samples. The experimental results show that the proposed approach runs in real time and has a higher detection accuracy.

Keywords—Eye Detection; AD AdaBoost; Cascaded Classifier

I. INTRODUCTION

The human eye, because of the extremely important information it carries, has become a hot topic in pattern recognition. Research on eye tracking and eye state recognition has been widely applied in areas such as fatigue detection, user-specific human-machine interfaces, animation synthesis, automated access control, and security monitoring [1]. Eye detection is the first step of eye tracking and eye state recognition, so it has attracted more and more attention.

In the detection process, statistics-based methods train and learn on a large number of target and non-target samples [2] to obtain a set of model parameters, and then build a classifier or filter based on the resulting model to detect targets. Such methods take full advantage of the weak learning theorem in machine learning [3], and extensive training greatly improves classification accuracy.
They have gradually become the mainstream technology in eye detection research.

The AD AdaBoost algorithm [4] is the latest development of AdaBoost among statistical learning methods. It reduces the error rate on negative samples and remedies the disadvantage of the traditional AdaBoost algorithm, which can only minimize an upper bound on the classification error rate, so it is better suited to target detection problems. However, during training, as the number of iterations increases, the weight of difficult samples expands within the entire sample set while the proportional weight of simple samples decreases, causing the classifiers to become biased toward the difficult samples [5]. Accurate classification rules that have already been generated are destroyed, the recognition rate of the whole system decreases, and degradation results.

This article applies AD AdaBoost to eye detection, taking into account the clear outline and sharp color contrast of the human eye, and uses a large number of eye samples for statistical learning. By releasing the weight of correctly classified negative samples during classifier training and normalizing the remaining weights, the expansion of the weight of difficult samples is buffered, slowing degradation in the classification process and improving detection accuracy.

II. CASCADE STRUCTURE CLASSIFIER TRAINING

A. AD AdaBoost Cascade Structure

The cascade structure of classifiers proposed by Paul Viola [6] is a set of serial classifiers. When classifying a sample, only samples judged positive by an earlier classifier are sent to the next classifier for further processing; otherwise, the sample is regarded as negative and output directly. In the end, only the samples judged positive at every level of the cascade are output as positive samples. The cascade structure is shown in Figure 1.
The cascade structure is shown in Figure 1.

Figure 1. Schematic diagram of the cascade. T indicates that the level judges the sample positive; F indicates that it is rejected as negative.

The earlier classifiers have simple structures and use few features, yet they can filter out the large number of negative samples that differ markedly from the target. The later classifiers use more features and more complex structures, and so can distinguish negative samples that closely resemble the target. In practical target detection, positive samples are usually only a small fraction of all samples to be tested, so most samples are filtered out by the front levels of the cascade and only a few must pass through every level; the cascade structure therefore greatly reduces the computational cost.

B. Weight-Boosting Mechanism in Sample Training

The basic idea of AD AdaBoost is to combine multiple weak classifiers into one strong classifier. At the start of training, all positive and negative samples are given equal initial weights. When one weak classifier has been trained, all sample weights are adjusted according to its results on the training set: samples classified correctly by the previous level have their weights decreased for the next iteration, while misclassified samples have their weights increased, so that the next weak classifier pays more attention to the samples that were misidentified [4].
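The weight mechanism just described can be illustrated numerically. In the toy run below the weak classifier outcomes are fabricated: one persistently "difficult" sample is misclassified every round (plus one rotating easy sample), and its share of the total weight keeps expanding after each normalization, which is exactly the premise of the degradation discussed next.

```python
import math

# Toy numeric illustration of the AdaBoost weight mechanism: each round a
# hypothetical weak classifier misclassifies one persistently "difficult"
# sample plus one rotating easy sample (outcomes are made up). Misclassified
# weights grow, correct ones shrink, and after normalization the difficult
# sample's share of the total weight keeps expanding.

m = 5
weights = [1.0 / m] * m            # equal initial weights D_1(i) = 1/m
difficult = 0                      # index misclassified in every round

for t in range(1, 5):
    wrong = {difficult, 1 + (t - 1) % (m - 1)}   # this round's mistakes
    eps = sum(weights[i] for i in wrong)         # weighted error rate
    alpha = 0.5 * math.log((1 - eps) / eps)
    weights = [w * math.exp(alpha if i in wrong else -alpha)
               for i, w in enumerate(weights)]
    z = sum(weights)                             # normalization factor
    weights = [w / z for w in weights]
    print(f"round {t}: difficult sample's weight share = {weights[difficult]:.3f}")

# the difficult sample's share rises: 0.250, 0.300, 0.339, 0.363
```

Starting from an equal 0.2 share, the difficult sample ends up holding well over a third of the total weight after only four rounds, squeezing the weight space left for the simple samples.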
The final judgment of the strong classifier is the weighted sum of the judgments of all the weak classifiers.

However, as the training iterations increase, the algorithm gradually shifts its focus to the samples that are difficult to classify, so the weights of the difficult samples expand within the whole sample set. When a difficult sample's weight becomes too large, each subsequent weak classifier emphasizes it more and more; meanwhile, because the sample weights are normalized after each round, the proportion of the simple samples' weights shrinks severely. The result is bias: accurate classification rules are destroyed, the recognition rate of the whole system drops, and degradation occurs [5].

C. Analysis of Cascade Structure Classifier Training

By learning from a large number of positive (target) and negative (non-target) samples during the training stage, AdaBoost acquires its classification rules.

Suppose the final cascade has N levels (N > 1), and the sample set contains a negative sample A = (x_a, -1) whose corresponding classification rule is R. During training, if the classifier at level t (t < N) classifies A correctly, that level acquires rule R, and A, with its weight reduced, is passed to level t+1 to be used for training the next classifier. Because the correctly classified negative sample A is studied again, the classifier at level t+1 also acquires the discrimination rule R. In the actual testing phase, however, if A is classified correctly at level t, it is output directly as a negative sample and never reaches the next level for further discrimination.
Thus the rule R for A acquired by the level t+1 classifier has no practical value.

In summary, A is an invalid sample for the training of the level t+1 classifier: it reduces the effective capacity of the training set, while the weight it still occupies compresses the weight space of the other, effective samples and distorts the weight distribution. The improvement to AD AdaBoost is therefore to mark the correctly classified negative samples, release their weights promptly, and then normalize, so as to buffer the expansion of the difficult samples' weights.

III. IMPROVED AD ADABOOST FOR HUMAN EYE DETECTION

A. Algorithm Improvement

The algorithm is improved in the following ways:

1) For each training sample, set an identification-state mark f_i with initial value +1;
2) During the iterations, if a negative sample is classified correctly at some level, set f_i = -1;
3) For samples with f_i = -1, release their weights, then normalize after updating the weights of the remaining samples;
4) Only samples with f_i = +1 continue to be used for training the next level's classifier.

B. The Improved AD AdaBoost for Human Eye Detection

The specific algorithm is as follows:

1. Given a calibrated training set (x_1, y_1, f_1), (x_2, y_2, f_2), ..., (x_m, y_m, f_m), where x_i ∈ X, y_i ∈ Y = {-1, +1}, f_i ∈ F = {-1, +1};
2. Initialize the sample weights and identification states: D_1(i) = 1/m, f_i = +1;
3. For t = 1, 2, ..., T:

3.1 Under the current weight distribution D_t, train one single-feature weak classifier for eye identification on the samples with f_i = +1 for each feature, and select the classifier with the smallest error rate as this round's weak classifier h_t;

3.2 For the selected h_t, compute the weighted error rate

    ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i)

and the weighted sum of correctly identified positive samples

    p_t = Σ_{i: y_i = +1, h_t(x_i) = +1} D_t(i)

3.3 Compute the weight parameter of the weak classifier h_t:

    α_t = (1/2) ln[ (1 - ε_t)/ε_t + k · e^{p_t} ]

where k is a constant that must reduce the upper bound of the minimum error rate in this round [4], often taken as 1/120.

3.4 Update the sample identification states, for i = 1, 2, ..., m:

    if h_t(x_i) = -1 and y_i = -1 then set f_i = -1; end if

3.5 Update the sample weights for the next round:

    D_{t+1}(i) = (D_t(i)/Z_t) × { e^{-α_t},  if h_t(x_i) = y_i and y_i = +1
                                  0,         if h_t(x_i) = y_i and y_i = -1
                                  e^{α_t},   if h_t(x_i) ≠ y_i }

where Z_t is the normalization factor:

    Z_t = Σ_{i: f_i = +1} D_t(i) · e^{-α_t · y_i · h_t(x_i)}

4. The final strong classifier is:

    H(x) = sign[ Σ_{t=1}^{T} α_t h_t(x) - Th ]

where Th is a manually set threshold, defaulting to 0.

For the training samples, an identification mark f_i with initial value +1 is added. In the subsequent iterations, if a negative sample is classified correctly at the current level, f_i is set to -1 and the sample no longer participates in the next rounds of training. Negative samples that have already been correctly classified are not taken into account when the sample weights are updated and normalized.

IV. EXPERIMENT

A. Training Samples

For training the AD AdaBoost classifier, the positive samples were selected from the FERET face database: 5000 of its 14051 images were used as positive samples. The negative samples came mainly from scene photos downloaded from the Internet, with the rest taken from the author's own photos of plants and animals.
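The training procedure of Section III can be sketched in simplified form as follows. Threshold stumps on 1-D toy data stand in for the paper's Haar-feature weak learners; the data, the early-stop rule, and the exact form of α_t here are illustrative assumptions rather than the paper's implementation.

```python
import math

# Simplified sketch of the improved AD AdaBoost loop of Section III on 1-D
# toy data. Threshold stumps stand in for the paper's Haar-feature weak
# learners; the data, the constant k, and the exact form of alpha are
# illustrative assumptions rather than the paper's implementation.

def stump_predict(theta, polarity, x):
    return polarity if x >= theta else -polarity

def best_stump(samples, weights, flags):
    """Lowest weighted error over the still-active (f_i = +1) samples."""
    best = None
    for theta in sorted({x for x, _ in samples}):
        for polarity in (+1, -1):
            err = sum(w for (x, y), w, f in zip(samples, weights, flags)
                      if f == +1 and stump_predict(theta, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, theta, polarity)
    return best

# toy set: two easy negatives, one hard negative at 0.75 near the positives
samples = [(0.1, -1), (0.2, -1), (0.75, -1), (0.7, +1), (0.8, +1), (0.9, +1)]
m = len(samples)
weights = [1.0 / m] * m            # D_1(i) = 1/m
flags = [+1] * m                   # identification state f_i

k = 1.0 / 120                      # constant from the paper, often 1/120
learned = []
for t in range(5):
    if not any(f == +1 and y == -1 for (_, y), f in zip(samples, flags)):
        break                      # every negative released: stop training
    eps, theta, pol = best_stump(samples, weights, flags)
    eps = max(eps, 1e-10)          # guard against a perfect stump
    p = sum(w for (x, y), w in zip(samples, weights)
            if y == +1 and stump_predict(theta, pol, x) == +1)
    alpha = 0.5 * math.log((1 - eps) / eps + k * math.exp(p))
    learned.append((alpha, theta, pol))
    for i, (x, y) in enumerate(samples):
        if flags[i] == -1:
            continue               # released samples stay out of the update
        h = stump_predict(theta, pol, x)
        if h == y and y == -1:
            flags[i] = -1          # mark the correctly classified negative
            weights[i] = 0.0       # ...and release its weight
        else:
            weights[i] *= math.exp(-alpha if h == y else alpha)
    z = sum(weights)               # normalize the remaining weights
    weights = [w / z for w in weights]

def strong_classify(x, th=0.0):    # H(x) = sign(sum alpha_t h_t(x) - Th)
    s = sum(a * stump_predict(theta, pol, x) for a, theta, pol in learned)
    return 1 if s - th >= 0 else -1

print(flags)                                        # [-1, -1, -1, 1, 1, 1]
print([strong_classify(x) for x, _ in samples])     # [-1, -1, -1, -1, 1, 1]
```

In this tiny run the two easy negatives are released after the first round and the hard negative at 0.75 after the second, so the released samples no longer occupy weight space; the borderline positive at 0.7 is still misclassified by the resulting two-stump classifier, which only shows how small the toy problem is.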
In total, about 8000 negative photos were used.

B. Experimental Results

The experiment was run on a 2.4 GHz desktop machine. Training took 7 days and produced a 12-level cascade for eye detection with 433 features in total; processing a 300 × 240 image takes about 16 ms. The test samples were selected from the FERET face database and from photos of people downloaded from the Internet, 323 images in total. The test results are shown in Table 1, and some detection examples are shown in Figure 2. The experimental results show that the accuracy of the method in this article is better than that of the previous methods.

Table 1. Test results for human eye detection

                         AdaBoost   AD AdaBoost   Improved AD AdaBoost
  Recognition rate       93.8%      96.3%         97.2%
  False detection rate   6.2%       3.7%          2.8%
  False reports          20         12            9

Figure 2. Examples of test results.

As can be seen from the table, AD AdaBoost improves considerably on traditional AdaBoost in recognition rate, and the improved AD AdaBoost method of this article, which marks the identification state of the samples, further improves the recognition rate over traditional AD AdaBoost.

V. SUMMARY

This paper presents an improved AD AdaBoost algorithm applied to human eye detection. Compared with traditional AdaBoost, AD AdaBoost effectively reduces the negative-sample error rate while keeping the positive-sample error rate relatively low, improving the discrimination ability of the classifiers as a whole. Aiming at the degradation phenomenon of AD AdaBoost observed in the experiments, this paper proposes an improved algorithm that marks the identification state of the samples to slow the spread of degradation, further improving the recognition capability of AD AdaBoost. This improved algorithm can only mitigate degradation: because there is no hard restriction on the growth of the difficult samples' weights over the iterations, marking the samples cannot fundamentally eliminate the degradation phenomenon.
So the weight-boosting mechanism of the samples remains a topic for further research.

REFERENCES

[1] TANG Jin, XU Hai-zhu, and WANG Li, "Survey on human eyes detection in images," Application Research of Computers, vol. 25, Apr. 2008, pp. 961-965.
[2] Freund Y. and Schapire R. E., "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, Jan. 1997, pp. 119-139.
[3] Schapire R. E., "The strength of weak learnability," Machine Learning, vol. 5, Feb. 1990, pp. 197-227.
[4] LI Chuang, DING Xiao-qing, and WU You-shou, "A Revised AdaBoost Algorithm-AD AdaBoost," Chinese Journal of Computers, vol. 30, Jan. 2007, pp. 103-109.
[5] Quinlan J. R., "Bagging, boosting, and C4.5," Proc. of the Thirteenth National Conference on Artificial Intelligence, 1996, pp. 725-730.
[6] Paul Viola and Michael J. Jones, "Robust Real-time Object Detection," Cambridge Research Laboratory, UK, Feb. 2001.