Recurrent Neural Networks English Courseware - Chapter 2: Machine Learning Basics
∇_w MSE_train = 0 ⇒ ∇_w ||X^(train) w − y^(train)||_2^2 = 0 ⇒ w = (X^(train)^T X^(train))^(−1) X^(train)^T y^(train)

where X^(train) = [x_1^(train), …, x_N^(train)]^T and y^(train) = [y_1^(train), …, y_N^(train)]^T.
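As a sketch of this closed-form solution (the synthetic data and its dimensions below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Synthetic training set: N points, D features (illustrative values).
rng = np.random.default_rng(0)
N, D = 100, 3
X_train = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5])
y_train = X_train @ w_true          # noise-free targets for the sketch

# Normal equations: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train

# np.linalg.lstsq solves the same least-squares problem more stably
# than forming and inverting X^T X explicitly.
w_stable, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
```

Both recover w_true here because the targets are noise-free; with noisy targets they return the least-squares fit instead.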
Why not use the number of classification errors, #{i : f(x_i^(train)) ≠ y_i^(train)}, as the objective?
cuhk
Xiaogang Wang
Machine Learning Basics
Optimization
The choice of the objective function should be good for optimization. Take linear regression as an example.
y* = arg max_k P(y = k | x)
(Duda et al., Pattern Classification, 2000)
Regression
Predict real-valued output f : RD → RM
Generalization
We care more about the performance of the model on new, previously unseen examples
The training examples usually cannot cover all the possible input configurations, so the learner has to generalize from the training examples to new cases
Performance_test = (1/M) Σ_{i=1}^{M} Error(f(x_i^(test)), y_i^(test))
We hope that both test examples and training examples are drawn from p(x, y ) of interest, although it is unknown
Mean squared error (MSE) for linear regression
MSE_train = (1/N) Σ_i ||w^T x_i^(train) − y_i^(train)||_2^2
Cross entropy (CE) for classification
CE_train = −(1/N) Σ_i log P(y = y_i^(train) | x_i^(train))
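As a concrete sketch of both training objectives (the toy numbers are illustrative; `probs` stands in for a model's predicted P(y = k | x_i^(train))):

```python
import numpy as np

# Toy regression data: linear-model predictions vs. targets.
X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
y = np.array([2.0, 1.0, 0.5])
w = np.array([0.5, 0.5])

# MSE_train = (1/N) * sum_i ||w^T x_i - y_i||^2
mse_train = np.mean((X @ w - y) ** 2)

# Toy classification data: predicted P(y = k | x_i) over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])

# CE_train = -(1/N) * sum_i log P(y = y_i | x_i)
ce_train = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
```

Note that CE is small when the model puts high probability on the correct label, which is why it is minimized during training.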
GE_f = Σ_{x,y} p(x, y) Error(f(x), y)
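Since GE is an expectation under p(x, y), it can be illustrated with a synthetic distribution where p(x, y) is known. Everything below (the Gaussian inputs, the noise level, the squared-error choice) is an assumption made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])

def sample(n):
    # Synthetic p(x, y): Gaussian x, linear y plus Gaussian noise (std 0.1).
    x = rng.normal(size=(n, 2))
    y = x @ w_true + rng.normal(scale=0.1, size=n)
    return x, y

f = lambda x: x @ w_true  # the predictor being evaluated

# Monte-Carlo estimate of GE_f = E_{p(x,y)}[Error(f(x), y)]
x_big, y_big = sample(200_000)
ge_estimate = np.mean((f(x_big) - y_big) ** 2)
# For squared error with noise std 0.1, the true GE is 0.1**2 = 0.01.
```

A finite test set gives the same kind of estimate with far fewer samples, which is exactly why test-set performance is used as a proxy for GE.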
Generalization
However, in practice, p(x, y) is unknown. We assess the generalization performance with a test set {(x_i^(test), y_i^(test))}.
Machine Learning
Classification
f(x) predicts the category that x belongs to: f : R^D → {1, …, K}
f(x) is determined by the decision boundary. As a variant, f can also predict the probability distribution over classes given x, f(x) = P(y|x). The category is then predicted as y* = arg max_k P(y = k | x).
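A minimal sketch of this decision rule, assuming the class probabilities come from a softmax over linear scores (the weights and input below are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 3-class linear model: scores = W x + b.
W = np.array([[1.0, -0.5],
              [0.2,  0.3],
              [-1.0, 0.8]])
b = np.zeros(3)
x = np.array([0.5, 2.0])

p = softmax(W @ x + b)      # P(y = k | x) for k = 0, 1, 2
y_star = int(np.argmax(p))  # y* = arg max_k P(y = k | x)
```

The decision boundary is implicit here: it is the set of inputs x where two classes tie for the maximum probability.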
Decision boundary, parameters of P(y |x), and w in linear regression
Optimize an objective function on the training set. It is a performance measure on the training set and could be different from that on the test set.
Generalization error: the expected error over ALL examples
To obtain theoretical guarantees about generalization of a machine learning algorithm, we assume all the samples are drawn from a distribution p(x, y ), and calculate generalization error (GE) of a prediction function f by taking expectation over p(x, y )
Example: linear regression
y = w^T x = Σ_{d=1}^{D} w_d x_d + w_0
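In code this model is a single dot product plus a bias; the numbers below are illustrative:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # weights w_1 .. w_D
w0 = 0.1                        # bias term w_0
x = np.array([1.0, 2.0, 3.0])

y = w @ x + w0  # y = sum_d w_d * x_d + w_0
```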
(Bengio et al. Deep Learning 2014)
Training
Training: estimate the parameters of f from the training set {(x_i^(train), y_i^(train))}