Batch Normalization:
Accelerating Deep Network Training by Reducing Internal Covariate Shift
Background
[Figure: the red points are the data set; the dashed line is the initial random fit. Annotations: wasted training time, overfitting.]
Experiments
Figure 2: Single-crop validation accuracy of Inception and its batch-normalized variants.
Experiments
Figure 3: Batch-Normalized Inception compared with the previous state of the art on the provided validation set of 50,000 images; it exceeds the estimated accuracy of human raters. *The BN-Inception ensemble reached 4.82% top-5 error on the 100,000 images of the ImageNet test set, as reported by the test server.
Problem2:
How do we apply Batch Normalization to a CNN?
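The paper's answer: for convolutional layers, normalize jointly over all spatial locations of a feature map, so statistics are computed over the batch and spatial axes and one γ, β pair is learned per channel. A minimal NumPy sketch (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def batchnorm_conv(x, gamma, beta, eps=1e-5):
    """BN for conv feature maps: x has shape (N, C, H, W).

    Statistics are shared across all spatial locations, giving one
    (gamma, beta) pair per channel, as the paper prescribes.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # per-channel mean
    var = x.var(axis=(0, 2, 3), keepdims=True)     # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + eps)        # normalize each channel
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```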
Experiments
Figure 1: Predicting the digit class on the MNIST dataset. (a) Test accuracy of the MNIST network trained with and without BN. (b, c) The evolution of input distributions to a typical sigmoid over the course of training, shown as the {15, 50, 85}th percentiles.
Online learning: batch size = 1
(Mini-)batch learning: 1 < batch size < data size
- Memory is used efficiently
- Fewer iterations per epoch
- Quick convergence
What is batch normalization?
x(1) = {x1(1), x2(1), x3(1), x4(1), x5(1), x6(1), x7(1)} — the first dimension of each of the 7 examples in a batch.
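A toy worked example (the values are invented) of normalizing one dimension across a batch of 7 examples:

```python
import numpy as np

# Hypothetical values for the first dimension of a batch of 7 examples
x1 = np.array([1.0, 2.0, 4.0, 2.0, 3.0, 5.0, 4.0])

mean = x1.mean()                             # 3.0
var = x1.var()                               # population variance over the batch
x1_hat = (x1 - mean) / np.sqrt(var + 1e-5)   # normalized: mean ~0, variance ~1
print(x1_hat.mean(), x1_hat.var())
```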
Features of BN:
- Enables a higher learning rate
- Requires less careful initialization
- Eliminates the need for Dropout
Thank you!
Problem1:
We can obtain E[x(k)] during training because the data arrive in mini-batches.
How can we obtain E[x(k)] at test time, when x(k) is a single value?
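The paper resolves this by using population statistics at inference, estimated from the training mini-batches. A minimal sketch; the exponential moving average below is the common practical variant, an assumption rather than the paper's exact averaging procedure:

```python
import numpy as np

class BNInference:
    """Track running statistics during training; use them at test time."""

    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def update(self, batch):
        # Called once per training mini-batch of shape (m, dim).
        m = self.momentum
        self.running_mean = m * self.running_mean + (1 - m) * batch.mean(axis=0)
        self.running_var = m * self.running_var + (1 - m) * batch.var(axis=0)

    def normalize(self, x):
        # Works even for a single example, since no batch statistics are needed.
        return (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
```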
Apply BN to DNN
Background
Whitening: mean = 0, identity covariance (decorrelated inputs).
Computationally expensive.
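To see why full whitening is costly, here is a sketch (my illustration, not from the paper): it needs the full d x d covariance matrix and its inverse square root, an O(d^3) eigendecomposition, which is too expensive to repeat at every layer and every training step.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """ZCA-style whitening: zero mean, approximately identity covariance."""
    Xc = X - X.mean(axis=0)                    # center each dimension
    cov = Xc.T @ Xc / len(X)                   # d x d covariance matrix
    w, V = np.linalg.eigh(cov)                 # O(d^3) eigendecomposition
    W = V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T
    return Xc @ W                              # decorrelated, unit-variance data
```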
Background
Full batch learning: batch size = data size.
Attempt to achieve mean = 0 and unit variance.
Online learning: batch size = 1.
x(k) is the k-th dimension of the input x. A mini-batch contains x1(k), x2(k), …, xm(k). BN computes the mini-batch mean μ(k) = (1/m) Σi xi(k) and variance σ²(k) = (1/m) Σi (xi(k) − μ(k))², normalizes x̂(k) = (x(k) − μ(k)) / √(σ²(k) + ε), and then scales and shifts: y(k) = γ(k) x̂(k) + β(k), where γ(k) and β(k) are learned through training iterations.
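The same transform as a minimal NumPy sketch for a fully connected layer (argument names are illustrative):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """One BN training-time forward pass.

    x: (m, d) mini-batch; gamma, beta: (d,) learned scale and shift.
    """
    mu = x.mean(axis=0)                    # per-dimension batch mean
    var = x.var(axis=0)                    # per-dimension batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize: mean 0, variance 1
    return gamma * x_hat + beta            # restore representational power
```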
Apply BN to DNN