Deep Learning Lecture Notes
Three Steps for Deep Learning
A function set containing the candidates for Handwriting Digit Recognition.
[Figure: Input Layer (x1, x2, …) → Hidden Layers → Output Layer (y1 "is 1", y2 "is 2", …, y10 "is 0"); for an image of "2", y2 should be the largest.]
Given a set of parameters, feed an image of "1" (x1, x2, …, x256) through the network; the Softmax outputs (y1, y2, …, y10) are compared with the one-hot target (1, 0, …, 0), and the loss for this example is the cross entropy between them:

$l = -\sum_{i=1}^{10} \hat{y}_i \ln y_i$
Total Loss
For all training data, sum the loss over every example:

$L = \sum_{n=1}^{N} l^n$

[Figure: examples x1, x2, …, xN each pass through the NN (Hidden Layers → Output Layer) to produce y1, y2, …, yN.]
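To make the loss computation concrete, here is a minimal NumPy sketch (the toy outputs and targets are illustrative, not from the slides): it evaluates the cross entropy for each example and sums over the training data to get the total loss L.

```python
import numpy as np

def cross_entropy(y, y_hat):
    """Loss for one example: l = -sum_i y_hat_i * ln(y_i),
    where y is the network output and y_hat is the one-hot target."""
    return -np.sum(y_hat * np.log(y))

# Toy training batch: softmax outputs and one-hot targets (illustrative).
outputs = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
targets = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]

# Total loss: L = sum of l^n over all training examples.
L = sum(cross_entropy(y, t) for y, t in zip(outputs, targets))
print(L)  # -ln(0.7) - ln(0.8) = 0.5798...
```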
Deep = Many hidden layers
AlexNet (2012): 8 layers, 16.4% error; VGG (2014): 19 layers, 7.3%; GoogleNet (2014): 22 layers, 6.7%.
(Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network) → Step 2: goodness of function → Step 3: pick the best function
Deep Learning is so simple ……
[Figure: a single neuron — inputs (1, −1) with weights (1, −2) and bias 1 give z = 4 and output σ(4) = 0.98; weights (−1, 1) with bias 0 give z = −2 and output σ(−2) = 0.12.]

Sigmoid Function:

$\sigma(z) = \frac{1}{1 + e^{-z}}$
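A quick numerical check of the neuron figure above (a sketch; only the weights, biases, and inputs shown in the figure are used):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, -1.0])                        # inputs from the figure
print(sigmoid(x @ np.array([1.0, -2.0]) + 1.0))  # z = 4  -> ~0.98
print(sigmoid(x @ np.array([-1.0, 1.0]) + 0.0))  # z = -2 -> ~0.12
```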
Fully Connect Feedforward Network
[Figure: the full network evaluated at input (1, −1) — activations (0.98, 0.12) → (0.86, 0.11) → outputs (0.62, 0.83), so f(1, −1) = (0.62, 0.83).]
Find a function in the function set that minimizes the total loss L.
[Figure: each training example x1, x2, x3, …, xN passes through the same NN to produce y1, y2, y3, …, yN.]
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network) → Step 2: goodness of function → Step 3: pick the best function
[Figure: parameters W1, b1, W2, b2, …, WL, bL map input x = (x1, x2, …, xN) to output y = (y1, y2, …, yM).]

Layer by layer, the network is a chain of matrix operations:

$a^1 = \sigma(W^1 x + b^1)$
$a^2 = \sigma(W^2 a^1 + b^2)$
$\vdots$
$y = \sigma(W^L a^{L-1} + b^L)$
Neural Network
As one composed function:

$y = f(x) = \sigma\left(W^L \cdots \sigma\left(W^2\, \sigma\left(W^1 x + b^1\right) + b^2\right) \cdots + b^L\right)$
Parallel computing techniques can be used to speed up these matrix operations.
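As a sketch of the chain of matrix operations above (the layer sizes and random weights are illustrative; a real network would use trained parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """y = sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)"""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # one layer = one matrix operation
    return a

# Illustrative 256 -> 500 -> 500 -> 10 network with random parameters.
rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

y = forward(rng.standard_normal(256), weights, biases)
print(y.shape)  # (10,) -- a 256-dim input becomes a 10-dim output
```

Because each layer is a single matrix product, the whole forward pass maps directly onto GPU-style parallel hardware.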
Ups and downs of Deep Learning
• 1958: Perceptron (linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers did not help
• 1989: 1 hidden layer is "good enough" — why go deep?
• 2006: RBM initialization
• 2009: GPU
• 2011: Starts to become popular in speech recognition
• 2012: Wins the ILSVRC image competition
• 2015.2: Image recognition surpasses human-level performance
• 2016.3: AlphaGo beats Lee Sedol
• 2016.10: Speech recognition systems as good as humans
Given a network structure, we define a function set.
Fully Connect Feedforward Network
[Figure: "neuron" units arranged as Input Layer → Layer 1 → Layer 2 → … → Layer L → Output, with inputs x1, x2, … and outputs y1, y2, … .]
• Thanks to Victor Chen for finding the typos on the slides.
Deep Learning
Lecturer: Hung-yi Lee (李宏毅)
Deep learning attracts lots of attention.
• I believe you have seen lots of exciting results before.
Deep learning trends at Google. Source: SIGMOD 2016/Jeff Dean
Backpropagation
libdnn
Developed by NTU student 周伯威
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
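Backpropagation is an efficient way to compute the gradient ∂L/∂w for every weight; see the lecture linked above. As a hedged sketch (a one-hidden-layer sigmoid network with squared-error loss; all names here are illustrative, and this is not libdnn's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y_hat, W1, b1, W2, b2, eta=0.1):
    """One gradient-descent step for a 1-hidden-layer sigmoid network
    with squared-error loss L = 0.5 * ||y - y_hat||^2.
    Updates W1, b1, W2, b2 in place."""
    # Forward pass, keeping the intermediate activations.
    a1 = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ a1 + b2)
    # Backward pass: propagate dL/dz from the output layer back.
    delta2 = (y - y_hat) * y * (1 - y)        # dL/dz at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)  # dL/dz at the hidden layer
    # Update every parameter in the direction that reduces L.
    W2 -= eta * np.outer(delta2, a1); b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x);  b1 -= eta * delta1
    return 0.5 * np.sum((y - y_hat) ** 2)
```

Calling backprop_step repeatedly on training pairs drives the loss down; modern frameworks compute the same gradients automatically.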
Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network) → Step 2: goodness of function → Step 3: pick the best function
Deep Learning is so simple ……
Acknowledgment
This is the "learning" of machines in deep learning …… Even AlphaGo uses this approach.
People imagine ……
Actually ……
I hope you are not too disappointed :p
Step 1: define a set of functions (Neural Network) → Step 2: goodness of function → Step 3: pick the best function
Deep Learning is so simple ……
Loss for an Example
You need to decide the network structure so that the function set contains a good function.
FAQ
• Q: How many layers? How many neurons for each layer? → Trial and error + intuition
• Q: Can the structure be automatically determined? → E.g. evolutionary artificial neural networks
• Q: Can we design the network structure? → Convolutional Neural Network (CNN)
Deep Learning is so simple ……
Gradient Descent
Start from initial parameter values, compute the gradient ∂L/∂w for each parameter, and update every parameter against its gradient, where η is the learning rate:

$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$

[Figure: parameters updated over successive iterations, e.g. 0.2 → 0.15 → 0.09; −0.1 → 0.05 → 0.15; 0.3 → 0.2 → 0.10.]
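In code, gradient descent is a short loop (a sketch; the toy loss, its gradient, and the learning rate η are illustrative):

```python
import numpy as np

def gradient_descent(grad_L, theta0, eta=0.1, steps=100):
    """Repeat theta <- theta - eta * dL/dtheta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad_L(theta)
    return theta

# Toy loss L(w) = (w1 - 1)^2 + (w2 + 2)^2 with gradient (2(w1-1), 2(w2+2)).
grad = lambda w: np.array([2 * (w[0] - 1), 2 * (w[1] + 2)])
print(gradient_descent(grad, [0.2, -0.1]))  # converges toward (1, -2)
```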
What is needed is a function …… Machine = Neural Network.
Input: a 256-dim vector (x1, x2, …, x256). Output: a 10-dim vector (y1 "is 1", y2 "is 2", …, y10 "is 0"); for an image of "2", y2 should have the highest confidence.
Example Application
Neural Network
[Figure: "Neuron" units, each computing an activation σ(z), arranged into Input, Layer 1, Layer 2, …, Layer L, and Output.]
Different connections lead to different network structures.
Fully Connect Feedforward Network
[Figure: with the weights shown, the network maps f(1, −1) = (0.62, 0.83) and f(0, 0) = (0.51, 0.85); for input (0, 0) the intermediate activations are (0.73, 0.5) → (0.72, 0.12) → (0.51, 0.85).]
This is a function: input a vector, output a vector.
[Figure: Input Layer (x1, …, xK) → Hidden Layers → Output Layer (y1, …, yM).]
Output Layer = Multi-class Classifier
Example Application
[Figure: input x1, x2, …, x256 (the image pixels); output y1 = 0.1, y2 = 0.7, …; y2 has the highest confidence, so the image is "2".]
The network computes

$y = f(x) = \sigma\left(W^L \cdots \sigma\left(W^2\, \sigma\left(W^1 x + b^1\right) + b^2\right) \cdots + b^L\right)$
Output Layer as Multi-Class Classifier
The hidden layers act as a feature extractor, replacing manual feature engineering.
[Figure: the hidden-layer features feed into the output layer, which applies Softmax to produce y1, y2, … .]
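A minimal softmax sketch (subtracting max(z) is a standard numerical-stability trick, not something shown on the slide):

```python
import numpy as np

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j): positive outputs that sum to 1."""
    e = np.exp(z - np.max(z))  # shift by max(z) for numerical stability
    return e / np.sum(e)

print(softmax(np.array([3.0, 1.0, -3.0])))  # ~[0.88, 0.12, 0.00]
```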
Deep = Many hidden layers
Residual Net (2015): 152 layers — a special structure, not a plain fully connected network.
Ref: https://www.youtube.com/watch?v=dxB6299gpvI
Example Application
• Handwriting Digit Recognition
The input image has 16 × 16 = 256 pixels, encoded as a 256-dim vector (ink → 1, no ink → 0). The output is a 10-dim vector; each dimension represents the confidence of a digit (e.g. y10 = 0.2 is the confidence that the image "is 0").
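A sketch of this input/output encoding (the helper names are illustrative; the y1 → "is 1", …, y10 → "is 0" ordering follows the slide):

```python
import numpy as np

def image_to_vector(img):
    """Flatten a 16x16 binary image (ink -> 1, no ink -> 0)
    into a 256-dim input vector."""
    assert img.shape == (16, 16)
    return img.reshape(256).astype(float)

def predicted_digit(y):
    """Return the digit whose output dimension is most confident:
    indices 0..8 map to digits 1..9, index 9 maps to digit 0."""
    i = int(np.argmax(y))
    return 0 if i == 9 else i + 1

y = np.array([0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2])
print(predicted_digit(y))  # 2 -> "the image is 2"
```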
ImageNet error rates: AlexNet (2012) 16.4% → VGG (2014) 7.3% → GoogleNet (2014) 6.7% → Residual Net (2015) 3.57%.
[Figure: the 152-layer Residual Net pictured beside the 101-storey Taipei 101 for scale.]
Matrix Operation
The neuron example above as a single matrix operation:

$\sigma\!\left( \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = \sigma\!\left( \begin{bmatrix} 4 \\ -2 \end{bmatrix} \right) = \begin{bmatrix} 0.98 \\ 0.12 \end{bmatrix}$
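Checking the matrix operation above in NumPy (weights, biases, and input taken from the figure):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])  # weight matrix from the figure
b = np.array([1.0, 0.0])     # bias vector from the figure
x = np.array([1.0, -1.0])    # input vector

print(sigmoid(W @ x + b))    # [0.982, 0.119] ~ (0.98, 0.12)
```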