Random Matrix Theory Models of Deep Learning, v0.1


Page . 11
Early Theoretical Results on Deep Learning
▪ Approximation theory
▪ Perceptrons and multilayer feedforward networks are universal approximators: Cybenko ’89, Hornik ’89, Hornik ’91, Barron ‘93
Random Matrix Theory Models of Deep Learning
邱才明  12/29/2017
Deep Learning Theory - A Review
Page . 3
Multilayer Neural Networks
▪ A neural network connects many individual neurons together
▪ The output of one neuron serves as the input to another
Forward Propagation
▪ A multilayer neural network model can be understood as a "nesting" of multiple nonlinear functions:
$f(\cdots f(f(z, w_1, b_1), w_2, b_2)\cdots, w_n, b_n)$
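To make the nested-composition view concrete, here is a minimal NumPy sketch of forward propagation. It is not from the slides: the layer sizes, the choice of tanh as the nonlinearity f, and the names `forward` and `layers` are illustrative assumptions.

```python
import numpy as np

def forward(z, layers):
    """Nested composition f(... f(f(z, w1, b1), w2, b2) ..., wn, bn).

    Each element of `layers` is a (W, b) pair; one step applies the
    affine map W a + b followed by a pointwise nonlinearity (tanh here).
    """
    a = z
    for W, b in layers:
        a = np.tanh(W @ a + b)
    return a

# Toy example: three layers mapping R^4 -> R^8 -> R^8 -> R^2.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 2]
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]
x = rng.standard_normal(4)
print(forward(x, layers))   # output of the nested composition
```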
[1] Bruna, Mallat. Classification with scattering operators, CVPR’11; Invariant scattering convolution networks, arXiv’12; Mallat, Waldspurger. Deep Learning by Scattering, arXiv’13.
[2] Wiatowski, Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv, 2015.
[3] Giryes, Sapiro, A. Bronstein. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? arXiv:1504.08291.
[4] Sokolic. Margin Preservation of Deep Neural Networks, 2015.
[5] Montufar. Geometric and Combinatorial Perspectives on Deep Neural Networks, 2015.
[6] Neyshabur. The Geometry of Optimization and Generalization in Neural Networks: A Path-based Approach, 2015.
▪ More computing (GPUs)
▪ Better regularization: Dropout
▪ New nonlinearities
▪ Max pooling, Rectified linear units (ReLU)
▪ Theoretical understanding of deep networks remains shallow
▪ All these features are somehow present in Deep Learning systems
Neuroscience        Deep Network
Simple cells        First layer
Complex cells       Pooling Layer
Grandmother cells   Last layer
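The nonlinearity and regularization ingredients listed above (ReLU, max pooling, dropout) reduce to very small operations in code. A hedged NumPy sketch follows; the 2x2 pooling window, the dropout rate, and all function names are illustrative choices, not anything specified in the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    # Rectified linear unit: pass positives through, zero out negatives.
    return np.maximum(z, 0.0)

def dropout(a, p=0.5, training=True):
    # Inverted dropout: randomly zero units with probability p during
    # training and rescale so the expected activation is unchanged.
    if not training:
        return a
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling on an (H, W) feature map
    # (H and W assumed even), loosely analogous to "complex cells".
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

feature_map = rng.standard_normal((4, 4))
print(max_pool_2x2(relu(feature_map)))   # pooled, rectified responses
print(dropout(np.ones(8), p=0.5))        # roughly half the units survive
```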
Page . 9
Harmonic analysis
▪ Olshausen-Field representations bear a strong resemblance to mathematical objects defined in harmonic analysis: wavelets, ridgelets, curvelets
▪ Harmonic analysis: a long history of developing optimal representations via optimization
▪ Research in the 1990s: wavelets etc. are optimal sparsifying transforms for certain classes of images
Page . 13
Recent Theoretical Results on Deep Learning
▪ Optimization theory and algorithms
Page . 10
Approximation Theory
▪ A class prediction rule can be viewed as a function f(x) of a high-dimensional argument
▪ Curse of Dimensionality
▪ The traditional theoretical obstacle to high-dimensional approximation
▪ Functions of a high-dimensional x can wiggle in too many dimensions to be learned from finite datasets
[1] Cybenko. Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4), 303-314, 1989.
[2] Hornik, Stinchcombe, White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(3), 359-366, 1989.
[3] Hornik. Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks, 4(2), 251-257, 1991.
[4] Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930-945, 1993.
[5] Baldi, Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 1989.
[6] Brady, Raghavan, Slawny. Back propagation fails to separate where perceptrons succeed. IEEE Trans. Circuits & Systems, 36(5):665-674, 1989.
▪ Optimization theory
▪ No spurious local optima for linear networks: Baldi & Hornik ’89
▪ Stuck in local minima: Brady ’89
▪ Stuck in local minima, but convergence guarantees for linearly separable data: Gori & Tesi ’92
▪ Manifold of spurious local optima: Frasconi ’97
Page . 12
Recent Theoretical Results on Deep Learning
▪ Invariance, stability, and learning theory
▪ Scattering networks: Bruna ’11, Bruna ’13, Mallat ’13
▪ Deformation stability for Lipschitz non-linearities: Wiatowski ’15
▪ Distance and margin-preserving embeddings: Giryes ’15, Sokolic ’16
▪ Geometry, generalization bounds and depth efficiency: Montufar ’15, Neyshabur ’15, Shashua ’14 ’15 ’16
▪ ……
[1] Razavian, Azizpour, Sullivan, Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition. CVPRW’14.
[2] Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.
[3] Ioffe, Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448-456, 2015.
Page . 5
The number of layers has increased year by year
Today: >1000 layers
Page . 6
Why does deep learning perform so well?
▪ Features are learned rather than hand-crafted
▪ More layers capture more invariances
▪ More data to train deeper networks
Page . 7
Insights from Neuroscience
▪ Experimental Neuroscience uncovered:
▪ Neural architecture of Retina/LGN/V1/V2/V3/etc.
▪ Existence of neurons with weights and activation functions (simple cells)
▪ Pooling neurons (complex cells)
▪ The layers of a multilayer neural network can, in principle, be stacked without limit
▪ This gives essentially unlimited modeling capacity: with sufficient width and depth, such networks can approximate arbitrary continuous functions (see the sketch below)
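As a hedged numerical illustration of this approximation claim (not part of the original slides; the hidden width, learning rate, and step count are arbitrary choices), the sketch below trains a single-hidden-layer tanh network with hand-written gradient descent to fit sin(x) on an interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: approximate sin(x) on [-pi, pi] with one hidden tanh layer.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

H = 32                                   # hidden width (illustrative)
W1 = rng.standard_normal((1, H))
b1 = np.zeros(H)
W2 = rng.standard_normal((H, 1)) * 0.1
b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    # Forward pass: y_hat = tanh(x W1 + b1) W2 + b2
    h = np.tanh(x @ W1 + b1)             # (200, H)
    y_hat = h @ W2 + b2                  # (200, 1)
    err = y_hat - y

    # Backward pass for mean squared error, written out by hand.
    grad_y = 2.0 * err / len(x)
    gW2 = h.T @ grad_y
    gb2 = grad_y.sum(axis=0)
    gh = grad_y @ W2.T * (1.0 - h**2)    # tanh'(u) = 1 - tanh(u)^2
    gW1 = x.T @ gh
    gb1 = gh.sum(axis=0)

    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2
    b2 -= lr * gb2

# The MSE should drop well below its initial value (about 0.5).
print("final MSE:", float(np.mean(err**2)))
```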
Page . 4
Common Activation Functions
▪ Sigmoid
$f(z) = \frac{1}{1 + e^{-z}}$
▪ Tanh
$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
▪ Rectified linear units (ReLU)
$f(z) = \begin{cases} z, & z \ge 0 \\ 0, & z < 0 \end{cases}$
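The three activations above are one-liners in code. A small NumPy sketch (the function names are my own) that also lets the formulas be checked numerically:

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # f(z) = (e^z - e^{-z}) / (e^z + e^{-z}); equivalent to np.tanh(z)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    # f(z) = z for z >= 0, and 0 for z < 0
    return np.maximum(z, 0.0)

z = np.linspace(-3, 3, 7)
print(sigmoid(z))   # values in (0, 1)
print(tanh(z))      # values in (-1, 1), matches np.tanh(z)
print(relu(z))      # negative inputs clipped to 0
```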
Page . 8
Olshausen and Field’s Work (Nature, 1996)
▪ Olshausen and Field demonstrated that V1-like receptive fields can be learned from natural image patches.
▪ Olshausen and Field showed that an optimization process (sparse coding) can drive the learning of image representations.