Object Detection Based on Deep Neural Networks

SPP-Net: Testing for Detection
Almost the same as R-CNN, except for Step 3.
• Speed: 64x faster than R-CNN using one scale, and 24x faster using a five-scale pyramid.
• mAP: +1.2 mAP vs R-CNN
Fast R-CNN: Joint Training Framework
Join the feature extractor, classifier, and regressor in a unified framework
Region of interest (RoI): image index + geometric position
brute force (single scale)
image pyramids (multi scale)
conv
Conv5 feature map
• In practice, a single scale is good enough. (This is the main reason it can be 10x faster than SPP-Net.)
Fast R-CNN: Motivation
JOINT TRAINING!!
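RoI pooling layer
Multi-task loss
Feature extraction and classification are placed in a single network and trained jointly
Ross Girshick, Fast R-CNN, arXiv tech report
YOLO can process 45 images per second
Each prediction of an object window uses information from the whole image
Regressing on only a 7x7 grid means objects cannot be localized very precisely
Detection accuracy is not very high
· YOLO performs poorly on objects that are very close together and on groups of small objects, because each grid cell predicts only two boxes and they belong to a single class.
· Generalization is relatively weak when objects of a class appear in test images with new or unusual aspect ratios or other configurations.
· Because of the loss function, localization error is the main factor hurting detection performance. The handling of large versus small objects in particular still needs improvement.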
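Detection ≈ localization + classification
Traditional object detection
R-CNN, SPP-Net
Region-proposal-based object detection
Fast R-CNN, Faster R-CNN
Regression-based deep learning object detection
YOLO
Traditional object detection
Region selection
Feature extraction
Classification with a classifier
Sliding-window strategy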
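Regression-based: YOLO
The enhanced version runs at 45 fps on a GPU; the simplified version runs at 155 fps
(1) Given an input image, first divide it into a 7x7 grid.
(2) For each grid cell, predict 2 bounding boxes (including each box's confidence that it contains an object and each box region's probabilities over the classes).
(3) The previous step yields 7x7x2 candidate object windows. Remove the low-probability windows with a threshold, then apply NMS to remove redundant windows.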
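The three YOLO steps can be sketched in Python. This is only a rough illustration, not the YOLO implementation. The helper names grid_cell and filter_windows, the 448x448 image size in the usage lines, and the 0.2 threshold are assumptions introduced here. Scoring a window as its confidence times its best class probability is one reading of step (2). NMS is left out.

```python
S, B = 7, 2  # grid size and boxes per cell, as stated on the slide


def grid_cell(cx, cy, img_w, img_h, s=S):
    """Return (row, col) of the s x s grid cell containing the point (cx, cy)."""
    col = min(int(cx / img_w * s), s - 1)  # clamp points on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp points on the bottom edge
    return row, col


def filter_windows(windows, threshold=0.2):
    """Drop unlikely windows (hypothetical helper).

    windows: list of (box, confidence, class_probs).
    Returns a list of (box, score, class_index) with score >= threshold,
    where score = confidence * highest class probability.
    """
    kept = []
    for box, conf, class_probs in windows:
        best = max(class_probs)
        score = conf * best
        if score >= threshold:
            kept.append((box, score, class_probs.index(best)))
    return kept


num_windows = S * S * B  # 7 * 7 * 2 candidate windows per image
center_cell = grid_cell(224, 224, 448, 448)
kept = filter_windows([
    ((0, 0, 100, 100), 0.9, [0.1, 0.8]),   # confident window, class 1
    ((50, 50, 80, 80), 0.1, [0.5, 0.5]),   # low-confidence window, dropped
])
```

The surviving windows would then go through NMS to remove duplicates.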
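In essence, a sliding window
Sliding window (over the last convolutional layer), anchor mechanism, bounding-box regression
Produces region proposals at multiple scales and aspect ratios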
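A minimal Python sketch of how an anchor mechanism can produce boxes at several scales and aspect ratios for every sliding-window position. The function name make_anchors, the stride of 16, the three scales, the three ratios, and the 40x60 feature-map size are illustrative assumptions, not values taken from these slides.

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates.

    One anchor is created per (position, scale, ratio). Each anchor has an
    area of about scale * scale and a height/width ratio equal to `ratio`.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell, mapped back to the image.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / r ** 0.5
                    h = s * r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(40, 60)
print(len(anchors))  # 40 * 60 positions * 9 anchors each
```

Under these assumed settings the count is 21,600, the same order of magnitude as the roughly 20,000 anchors quoted on the slides.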
20,000 anchors
The simple network detects at 17 fps with 59.9% accuracy on PASCAL VOC; the complex network reaches 5 fps with 78.8% accuracy
1. Training is multi-stage rather than end-to-end
Conv layers
FC layers
SVM
regressor
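2. Training incurs excessive disk-space and time costs
store
3. Training SPP-Net fine-tunes only the fully connected layers (detection needs location information in addition to semantic information, and repeated pooling blurs location information)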
Fast R-CNN
Thanks
TOO SLOWWWW !!! -> SPPNET
Three problems with R-CNN (multi-stage training, wasted storage, slow: 47 s)
SPP-Net: Motivation
Cropping may lose some information about the object
Warping may change the object’s appearance
Conv5 feature map Image Pyramid FeatMap Pyramids
SPP-Net: Training for Detection(2)
• Step 2, for each proposal, walk the image pyramid and find the projected version whose pixel count is closest to 224x224. (For scale invariance in training.)
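Step 2 can be sketched as a small search over pyramid scales. The function name best_scale, the five scale values, and the convention that each scale sets the length of the image's shorter side are assumptions made for this illustration.

```python
def best_scale(box_w, box_h, img_w, img_h,
               scales=(480, 576, 688, 864, 1200), target=224 * 224):
    """Pick the pyramid scale at which a proposal's pixel count is closest
    to `target`.

    Assumes each scale s resizes the image so that its shorter side is s,
    so a proposal is scaled by s / min(img_w, img_h) in both directions.
    """
    short_side = min(img_w, img_h)

    def scaled_area(s):
        f = s / short_side
        return (box_w * f) * (box_h * f)

    return min(scales, key=lambda s: abs(scaled_area(s) - target))


# A small proposal is matched to a large scale, a large one to a small scale.
small = best_scale(100, 100, 500, 375)
large = best_scale(300, 300, 500, 375)
```

Choosing the scale per proposal means the network sees objects at roughly the same pixel size during training, whatever their size in the original image.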
Hand-crafted features; SVM, AdaBoost, etc.
Two problems
Region-proposal-based object detection
Region proposals: selective search
IoU, NMS (non-maximum suppression)
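IoU and greedy NMS can be sketched in a few lines of Python. The function names, the (x1, y1, x2, y2) box layout, and the 0.5 threshold are assumptions for this illustration. Real detectors usually run NMS separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if its
    IoU with every box already kept is at most `iou_threshold`.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```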
Detection rate on PASCAL VOC improved from 35.1% to 53.7%
Step 1. Input an image.
Step 2. Use selective search to obtain ~2k proposals.
Step 3. Warp each proposal and apply a CNN to extract its features.
Step 4. Use class-specific SVMs to score each proposal.
Step 5. Rank the proposals and use NMS to get the bboxes.
Step 6. Use class-specific regressors to refine the bboxes’ positions.
SPP-Net: Training for Detection(1)
Step 1. Generate an image pyramid and extract the conv FeatMap of the whole image
Conv5 feature map
conv
Conv5 feature map
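1. Initialize the network parameters with an ImageNet-pretrained model and fine-tune the RPN.
2. Use the RPN from step 1 to extract region proposals and train Fast R-CNN.
3. Re-initialize the RPN with the Fast R-CNN from step 2, then fine-tune with the convolutional layers fixed.
4. With the convolutional layers of the Fast R-CNN from step 2 fixed, fine-tune using proposals extracted by the RPN from step 3.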
1. Cannot reach real-time speed 2. Region proposals are obtained in advance, and classifying every proposal is computationally expensive
• Step 3, find the corresponding FeatMap in Conv5 and use the SPP layer to pool it to a fixed size.
• Step 4, after obtaining all the proposals’ features, fine-tune the FC layers only.
• Step 5, train the class-specific SVMs.
Fast R-CNN: Other tricks
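Compared with R-CNN, Fast R-CNN cuts training time from 84 hours to 9.5 hours and test time from 47 seconds to 0.32 seconds. Accuracy on PASCAL VOC 2007 is nearly the same, at about 66%-67%.
- Jointly training classification and box refinement at the end of the network improves accuracy
- Using a multi-scale image pyramid yields almost no improvement
- Doubling the training data gives a 2%-3% accuracy gain
- Having the network output class probabilities directly (softmax) performs slightly better than an SVM classifier
- More proposal windows do not improve performance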
Fast R-CNN: RoI pooling layer
≈ one scale SPP layer
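A minimal NumPy sketch of RoI max pooling as a single-level SPP: the RoI is cut out of the feature map, split into a fixed grid of bins, and each bin is max-pooled. The function name roi_pool, the 7x7 output size, and RoI coordinates given as inclusive feature-map indices are assumptions for this illustration.

```python
import numpy as np


def roi_pool(feature_map, roi, out_size=(7, 7)):
    """Max-pool one RoI of a (C, H, W) feature map to (C, out_h, out_w).

    roi = (x1, y1, x2, y2) in feature-map coordinates, inclusive.
    Bin edges use floor/ceil so every bin contains at least one cell,
    even when the RoI is smaller than the output grid.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    c, h, w = region.shape
    out_h, out_w = out_size
    out = np.empty((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        ya = (i * h) // out_h                      # floor(i * h / out_h)
        yb = ((i + 1) * h + out_h - 1) // out_h    # ceil((i + 1) * h / out_h)
        for j in range(out_w):
            xa = (j * w) // out_w
            xb = ((j + 1) * w + out_w - 1) // out_w
            out[:, i, j] = region[:, ya:yb, xa:xb].max(axis=(1, 2))
    return out


fm = np.arange(2 * 20 * 30, dtype=float).reshape(2, 20, 30)
pooled = roi_pool(fm, (3, 4, 18, 15))  # a 16-wide, 12-tall RoI
tiny = roi_pool(fm, (0, 0, 2, 2))      # a 3x3 RoI still yields a 7x7 output
```

Whatever the RoI size, the output shape is the same, which is what lets the fully connected layers follow directly.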
Fast R-CNN: Regression Loss
Multi-task loss
A smooth L1 loss which is less sensitive to outliers than L2 loss
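A short Python sketch comparing the smooth L1 loss with an L2 loss on a single regression error x. The function names are illustrative. The piecewise form used here, 0.5 * x^2 for |x| < 1 and |x| - 0.5 otherwise, is the standard definition of smooth L1.

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def l2(x):
    """Squared-error loss with the same 0.5 factor, for comparison."""
    return 0.5 * x * x


for err in (0.5, 3.0, 10.0):
    print(err, smooth_l1(err), l2(err))
```

For small errors the two losses agree. For an outlier such as an error of 10, smooth L1 gives 9.5 while the L2 form gives 50, so outliers pull far less on the gradient.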
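1. Region proposal is time-consuming (generating region proposals takes 2-3 s, whereas feature extraction and classification take only 0.32 s)
2. Pseudo end-to-end training (region proposals are extracted beforehand with selective search and occupy disk storage)
Faster R-CNN
A convolutional network generates the region proposals directly: RPN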
• An FC layer needs a fixed-length input, while a conv layer can adapt to an arbitrary input size.
• Thus we need a bridge between the conv and FC layers.
• Here comes the SPP layer.
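A minimal NumPy sketch of how an SPP layer can bridge the two: whatever the spatial size of the conv feature map, pooling over a fixed set of grids yields a vector of the same length. The function name spp and the pyramid levels (1, 2, 4) are assumptions for this illustration, not the configuration used in SPP-Net.

```python
import numpy as np


def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling of a (C, H, W) feature map.

    For each level n the map is split into an n x n grid of bins and each
    bin is max-pooled per channel. The output length is
    C * sum(n * n for n in levels), independent of H and W.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        ys = np.linspace(0, h, n + 1)  # bin edges along the height
        xs = np.linspace(0, w, n + 1)  # bin edges along the width
        for i in range(n):
            y0, y1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            for j in range(n):
                x0, x1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
a = spp(rng.random((256, 13, 13)))
b = spp(rng.random((256, 10, 17)))
print(a.shape, b.shape)  # both vectors have the same length
```

With these assumed levels each channel contributes 1 + 4 + 16 = 21 values, so both inputs map to a vector of length 256 * 21 that a fully connected layer can consume.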