Action Recognition and Detection


SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY, CAS
Action Recognition
2D CNNs Recurrent Modeling 3D CNNs
Temporal Linear Encoding (CVPR 17)
Moments in Time
http://moments.csail.mit.edu/
2016 2017 2017 2018
Google
• Video Classification
Google (DeepMind)
• Trimmed Activity Recognition
Google MIT
• Spatio-temporal Action Localization • Trimmed Event Recognition
Video Benchmarks
1. The widely used datasets are small-scale.
2. It is hard to investigate the spatial-temporal representations of deep neural networks.
• 80 atomic actions • 192 clips (15 mins per clip) • 740k annotations
Large-Scale Video Sets
Benchmarks
ActivityNet
UCF101 (13,320 videos, 101 actions)
HMDB51 (6,849 videos, 51 actions)
Large-Scale Video Sets
Youtube8M
• 200 classes • 100 untrimmed videos per class • 1.54 activity instances per video • 648 video hours
Limin Wang et al., UntrimmedNets for Weakly Supervised Action Recognition and Detection, CVPR 2017
Recurrent Modeling
http://activity-net.org/index.html
Year
Team
Task
2015
Universidad del Norte & KAUST
• Untrimmed Action Recognition
• Temporal Action Proposals
• Temporal Action Localization
• Dense-Captioning Events in Videos
UntrimmedNets (CVPR 17)
UntrimmedNet:
1. Attention for proposal selection
2. Weakly-supervised detection
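The idea of attention for proposal selection can be illustrated with a small sketch. It assumes each clip proposal already has a vector of class scores and a single attention logit; the function names `softmax` and `attention_pool` are hypothetical, and this is a generic attention-weighted pooling, not necessarily the exact formulation used in the UntrimmedNets paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(proposal_scores, attention_logits):
    """Fuse per-proposal class scores into one video-level score vector.

    proposal_scores: list of N lists, each with C class scores (assumed input).
    attention_logits: list of N floats, one per proposal (assumed input).
    Returns (video_scores, weights), where weights sum to 1 over proposals.
    """
    weights = softmax(attention_logits)
    num_classes = len(proposal_scores[0])
    video_scores = [
        sum(w * scores[c] for w, scores in zip(weights, proposal_scores))
        for c in range(num_classes)
    ]
    return video_scores, weights


# Two proposals, two classes: the second proposal gets most of the attention.
video_scores, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])
```

Because only a video-level label is needed to train such a pooling, the learned weights can then be read as a rough indication of which proposals contain the action, which is what makes weakly supervised detection plausible under this setup.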
Deep Temporal Linear Encoding (TLE) Networks:
1. Aggregating K segments into a video representation
2. Bilinear encoding for feature interactions
Ali Diba et al., Deep Temporal Linear Encoding Networks, CVPR 2017
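The two TLE steps can be sketched as follows. This is a minimal illustration under stated assumptions: each of the K segments is represented by a small feature map stored as a list of spatial locations, each holding C channel values; aggregation is done by element-wise multiplication; and the bilinear encoding is a sum of outer products followed by signed square root and L2 normalization. The function names `aggregate_segments` and `bilinear_encode` are hypothetical, and the paper's networks may differ in detail.

```python
import math


def aggregate_segments(segment_maps):
    """Element-wise product of K feature maps, each shaped [locations][channels]."""
    agg = [row[:] for row in segment_maps[0]]
    for fmap in segment_maps[1:]:
        for i, row in enumerate(fmap):
            for j, value in enumerate(row):
                agg[i][j] *= value
    return agg


def bilinear_encode(fmap):
    """Sum outer products x x^T over locations, then normalize.

    Returns a flat vector of length C*C capturing pairwise channel interactions.
    """
    c = len(fmap[0])
    pooled = [0.0] * (c * c)
    for x in fmap:
        for a in range(c):
            for b in range(c):
                pooled[a * c + b] += x[a] * x[b]
    # Signed square root, then L2 normalization (a common bilinear-pooling recipe).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


# K = 3 segments, 2 spatial locations, 2 channels each (toy values).
segments = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [2.0, 0.5]],
    [[2.0, 1.0], [1.0, 1.0]],
]
video_map = aggregate_segments(segments)
video_repr = bilinear_encode(video_map)
```

The encoded vector has C*C entries, so in practice the channel dimension is usually reduced before this step; that reduction is omitted here for brevity.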
Youtube8M
https://research.google.com/youtube8m/index.html
Kinetics
https://deepmind.com/research/open-source/open-source-datasets/kinetics/
AVA
https://research.google.com/ava/index.html
Wenbin Du et al., Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos, IEEE TIP 2018
Recurrent Spatial-Temporal Attention Network (ours):
1. Spatial-temporal attention from global video context
2. Attention-driven two-stream fusion
3. Actor-attention regularization to highlight action regions
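VALSE 2018 - Dalian
Action Recognition and Detection: Progress in 2018
Yu QIAO (乔宇)
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, April 22, 2018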
2018.4
Outline
• Video action datasets
• Action recognition methods
• Action detection methods
• Future research directions
Large-Scale Video Sets
• 306,245 videos in total • 400 action classes • Each clip lasts around 10s
• over 1,000,000 videos • 339 Moment classes • 3-second video