Convolutional Neural Networks (CNNs): AlexNet
[1] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.
CNN
CNNs are basically layers of convolutions followed by subsampling and fully connected layers. Intuitively speaking, the convolution and subsampling layers work as feature extraction layers, while the fully connected layers classify which category the current input belongs to using the extracted features.
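As a rough illustration (not from the original slides), here is a minimal TensorFlow/Keras sketch of this convolution -> subsampling -> fully connected pattern; the layer sizes and the 10-class output are arbitrary assumptions:

# Minimal CNN sketch: convolution -> subsampling (pooling) -> fully connected.
# Layer sizes and the 10-class output are illustrative assumptions only.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),  # feature extraction
    tf.keras.layers.MaxPooling2D(2),                                            # subsampling
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),                            # classification
])
model.summary()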
Why so powerful?
Local Invariance
Loosely speaking, as the convolution filters 'slide' over the input image, the exact location of the object we want to find does not matter much.
Compositionality
There is a hierarchy in CNNs: simple features in early layers are composed into more complex features in deeper layers. It is GOOD!
Introduction to CNNs and AlexNet
Sungjoon Choi (sungjoon.choi@cpslab.snu.ac.kr)
CNN
Convolutional Neural Network
This is pretty much everything about the convolutional neural network: Convolution + Subsampling + Full Connection
[batch, in_height=4, in_width=4, in_channels=3] [filter_height=3, filter_width=3, in_channels=3, out_channels=7]
Conv2D
[batch, in_height=4, in_width=4, in_channels=3] [filter_height=3, filter_width=3, in_channels=3, out_channels=7]
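To make this shape convention concrete, here is a small sketch (my own illustration, not from the slides) using TensorFlow's tf.nn.conv2d, which uses exactly these input and filter layouts:

import tensorflow as tf

x = tf.random.normal([1, 4, 4, 3])      # [batch, in_height, in_width, in_channels]
w = tf.random.normal([3, 3, 3, 7])      # [filter_height, filter_width, in_channels, out_channels]

y_same  = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")   # -> [1, 4, 4, 7]
y_valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID")  # -> [1, 2, 2, 7]
print(y_same.shape, y_valid.shape)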
http://www.eddyazar.com/the-regrets-of-a-dropout-and-why-you-should-drop-out-too/
Must remember!
Conv2D
[batch, in_height, in_width, in_channels] [filter_height, filter_width, in_channels, out_channels]
Stride
Stride
(Left) Stride size: 1
(Right) Stride size: 2
If the stride size equals the filter size, there is no overlap between neighboring receptive fields.
Conv2D
[batch, in_height, in_width, in_channels] [filter_height, filter_width, in_channels, out_channels]
Get familiar with this
Zero-padding
Stride
Channel
Zero-padding
What is the size of the input?
What is the size of the output?
What is the size of the filter?
What is the size of the zero-padding?
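These sizes are related by the standard output-size formula; the helper below is an illustrative sketch of it (assuming square inputs and filters, not code from the slides):

# Output size of a convolution: out = floor((in - filter + 2 * pad) / stride) + 1
def conv_output_size(in_size: int, filter_size: int, pad: int, stride: int) -> int:
    return (in_size - filter_size + 2 * pad) // stride + 1

# Example: a 4x4 input, 3x3 filter, zero-padding of 1, stride 1 -> 4x4 output.
print(conv_output_size(4, 3, pad=1, stride=1))  # 4
# Without padding the same setup shrinks the feature map: 4 -> 2.
print(conv_output_size(4, 3, pad=0, stride=1))  # 2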
Why residual?
We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. Shortcut connections are used.
https://starwarsanon.wordpress.com/tag/darth-sidious-vs-yoda/
Convolution
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
What is the number of parameters in this convolution layer?
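As a worked example (assuming the [3, 3, 3, 7] filter bank from the Conv2D slides; the exact layer the slide refers to is not shown here):

# Parameters of a conv layer = filter_height * filter_width * in_channels * out_channels (+ out_channels biases)
weights = 3 * 3 * 3 * 7      # 189
biases  = 7
print(weights + biases)      # 196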
VGG
VGG?
GoogLeNet
GoogLeNet
GoogLeNet
22-layer deep network
Efficiently utilizes computing resources via the “Inception module”
Significantly outperforms previous methods on ILSVRC 2014
Why residual?
“The extremely deep residual nets are easy to optimize.”
“The deep residual nets can easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks.”
Dropout
Original dropout [1] sets the output of each hidden neuron to zero with a certain probability. In the AlexNet paper the dropout probability is 0.5, and at test time the outputs are simply multiplied by 0.5.
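A minimal NumPy sketch of this scheme, as an illustration rather than the original implementation: activations are zeroed with probability 0.5 during training and simply scaled by 0.5 at test time.

import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=None):
    """Classic dropout as described in AlexNet: drop units while training, rescale at test time."""
    rng = rng if rng is not None else np.random.default_rng()
    if training:
        mask = rng.random(h.shape) >= p_drop   # keep each unit with probability 1 - p_drop
        return h * mask
    return h * (1.0 - p_drop)                  # test time: multiply outputs by 0.5

h = np.ones((2, 4))
print(dropout(h, training=True))
print(dropout(h, training=False))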
LRN
Local Response Normalization
It implements a form of lateral inhibition inspired by real neurons.
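For reference, TensorFlow provides this operation directly; the call below is a sketch using the hyperparameters reported in the AlexNet paper (k = 2, n = 5, alpha = 1e-4, beta = 0.75), with an arbitrary input shape:

import tensorflow as tf

x = tf.random.normal([1, 8, 8, 64])
# depth_radius=2 means each channel is normalized over a window of 5 neighboring channels.
y = tf.nn.local_response_normalization(x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75)
print(y.shape)  # (1, 8, 8, 64)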
Regularization in AlexNet
Main objective is to reduce overfitting.
152-layer network
1st place on ILSVRC 2015 classification task
1st place on ImageNet detection
1st place on ImageNet localization
1st place on COCO detection
1st place on COCO segmentation
More details will be covered next week. In AlexNet, two regularization methods are used:
Data augmentation
Dropout
Data augmentation
http://www.slideshare.net/KenChatfield/chatfield14-devil
Inception v4
Inception-ResNet-v1
ResNet
Deep residual networks
One by one convolution
GoogLeNet
Network in Network!
Conclusion
Very clever idea of using one by one convolution for dimension reduction! Other than that...
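A quick sketch of how a one by one convolution reduces the channel dimension (the 256 -> 64 channel counts here are my own example, not values from the slides):

import tensorflow as tf

x = tf.random.normal([1, 28, 28, 256])          # feature map with 256 channels
w = tf.random.normal([1, 1, 256, 64])           # 1x1 filters: mix channels, keep spatial size
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")
print(y.shape)                                  # (1, 28, 28, 64): a cheaper input for the 3x3/5x5 convs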
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Residual inception blocks
Inception v4
Degradation problem
CIFAR-100 dataset
ImageNet
Residual learning building block
[Diagram: residual building block — input x → conv2d → ReLU → conv2d → add identity shortcut x → ReLU → output y]
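A minimal Keras sketch of this building block (the filter count and kernel size are illustrative, and the batch normalization used in the actual ResNet is omitted):

import tensorflow as tf

def residual_block(x, filters=64):
    """y = ReLU(F(x) + x), where F is conv -> ReLU -> conv (the residual mapping)."""
    f = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    f = tf.keras.layers.ReLU()(f)
    f = tf.keras.layers.Conv2D(filters, 3, padding="same")(f)
    y = tf.keras.layers.Add()([f, x])        # identity shortcut connection
    return tf.keras.layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()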
What is the number of parameters in this convolution layer?
CNN Architectures
AlexNet
VGG
GoogLeNet
ResNet
Top-5 Classification Error
AlexNet
What is the number of parameters?
Why are layers divided into two parts?
AlexNet
ReLU
Rectified Linear Unit: f(x) = max(0, x)
ReLU vs. tanh: ReLU gives much faster convergence during training.
Data augmentation in AlexNet
Color variation
Probabilistically, not a single patch will be the same during the training phase! (a factor of infinity!)
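A rough sketch of the two cheap augmentations AlexNet applies at training time, random 224x224 crops from 256x256 images plus horizontal flips (the PCA-based color variation is omitted here):

import tensorflow as tf

def augment(image):
    """Random 224x224 crop from a 256x256 image plus a random horizontal flip."""
    image = tf.image.random_crop(image, size=[224, 224, 3])
    image = tf.image.random_flip_left_right(image)
    return image

img = tf.random.uniform([256, 256, 3])
print(augment(img).shape)  # (224, 224, 3)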
Inception module
Naïve inception module
Actual inception module
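For concreteness, a sketch of such a module with 1x1 dimension reduction in Keras (the filter counts are illustrative placeholders, not the exact values from the GoogLeNet paper):

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_reduce=64, f3=128, f5_reduce=32, f5=32, f_pool=32):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus pooling, concatenated along channels."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(f_pool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(28, 28, 192))
print(inception_module(inputs).shape)  # (None, 28, 28, 256)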
One by one convolution
Deeper Network?
Is a deeper network always better?
What about vanishing/exploding gradients? Better initialization methods / batch normalization / ReLU largely address this.
Any other problems? Overfitting?
Degradation problem: more depth but lower performance.
[batch, in_height=4, in_width=4, in_channels=3] [filter_height=3, filter_width=3, in_channels=3, out_channels=7]
Conv2D
[batch, in_height=4, in_width=4, in_channels=3] [filter_height=3, filter_width=3, in_channels=3, out_channels=7]