Stanford University Machine Learning: Lecture 7.


The problem of overfitting
Machine Learning
Example: Linear regression (housing prices)
Overfitting:
If we have too many features, the learned hypothesis may fit the training set very well (J(\theta) \approx 0), but fail to generalize to new examples (e.g., predict prices on new examples).
[Figure: three Price vs. Size plots, showing underfitting ("high bias"), a good fit, and overfitting ("high variance").]
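A minimal Octave sketch (the data values and the polyfit-based approach are my own, not from the lecture) of the phenomenon: higher-degree polynomials drive training error toward zero while becoming wildly wiggly between points.

  % Fit polynomials of increasing degree to five made-up (size, price) points.
  sizes  = [1.0; 1.5; 2.0; 2.5; 3.0];     % size of house (1000s of sq. ft.)
  prices = [200; 280; 320; 360; 370];     % price (1000s of dollars)
  for degree = [1 2 4]
    p = polyfit(sizes, prices, degree);   % least-squares polynomial fit
    err = mean((polyval(p, sizes) - prices).^2) / 2;
    fprintf('degree %d: training error %.4f\n', degree, err);
  end
  % Degree 4 passes through all five points exactly (training error ~0)
  % but oscillates between them -- the overfit curve in the figure.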
Example: Logistic regression

h_\theta(x) = g(\theta^T x)   (g = sigmoid function)

[Figure: three x_2 vs. x_1 plots with decision boundaries of increasing complexity: a straight line (underfit), a moderate nonlinear boundary (good fit), and a highly contorted boundary (overfit).]
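A minimal Octave sketch (parameter values assumed for illustration) of the sigmoid hypothesis used here:

  g = @(z) 1 ./ (1 + exp(-z));     % sigmoid function
  h = @(theta, X) g(X * theta);    % h_theta(x) = g(theta' * x), one row per example
  theta = [-3; 1; 1];              % assumed parameters: decision boundary x1 + x2 = 3
  X = [1 2 2; 1 4 4];              % rows are [1, x1, x2] (intercept term first)
  disp(h(theta, X))                % estimated probabilities that y = 1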
Addressing overfitting:

[Figure: Price vs. Size of house plot with a highly wiggly, overfit curve.]

x_1 = size of house
x_2 = no. of bedrooms
x_3 = no. of floors
x_4 = age of house
x_5 = average income in neighborhood
x_6 = kitchen size
...
x_100
Options:
1. Reduce number of features.
―Manually select which features to keep.
―Model selection algorithm (later in course).
2. Regularization.
―Keep all the features, but reduce magnitude/values of parameters \theta_j.
―Works well when we have a lot of features, each of which contributes a bit to predicting y.
Cost function
Machine Learning
Intuition

[Figure: two Price vs. Size of house plots: a quadratic fit \theta_0 + \theta_1 x + \theta_2 x^2, and an overfit quartic \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4.]

Suppose we penalize and make \theta_3, \theta_4 really small:

\min_\theta \frac{1}{2m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2

With the penalty terms in place, minimizing the cost forces \theta_3 \approx 0 and \theta_4 \approx 0, leaving essentially the quadratic fit.
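A tiny Octave illustration (the numbers are made up) of why the penalty dominates: even modest values of \theta_3, \theta_4 add far more to the cost than the fit term, so the minimizer drives them toward zero.

  fit_error = 5.0;                               % assumed value of the sum-of-squares term
  theta3 = 0.1;  theta4 = 0.05;                  % modest higher-order coefficients
  penalty = 1000 * theta3^2 + 1000 * theta4^2;   % = 10 + 2.5 = 12.5
  total_cost = fit_error + penalty               % penalty already exceeds the fit term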
Small values for parameters \theta_0, \theta_1, \dots, \theta_n
―"Simpler" hypothesis
―Less prone to overfitting

Housing:
―Features: x_1, x_2, \dots, x_100
―Parameters: \theta_0, \theta_1, \dots, \theta_100

J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

Here \lambda is the regularization parameter; by convention the penalty sum starts at j = 1, so \theta_0 is not penalized.

[Figure: Price vs. Size of house plot showing the smoother fit produced by the regularized cost.]
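A minimal Octave sketch (the function name and signature are my own, not from the slides) of computing this regularized cost for a design matrix X whose first column is all ones:

  % regularizedCost: J(theta) for regularized linear regression.
  % X is m x (n+1); theta(1) plays the role of theta_0 and is
  % therefore excluded from the penalty.
  function J = regularizedCost(theta, X, y, lambda)
    m = length(y);
    errors = X * theta - y;                  % h_theta(x^(i)) - y^(i) for all i
    penalty = lambda * sum(theta(2:end).^2); % skip theta(1) = theta_0
    J = (errors' * errors + penalty) / (2 * m);
  end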
What if \lambda is set to an extremely large value (perhaps too large for our problem, say \lambda = 10^{10})?
―Algorithm works fine; setting \lambda to be very large can't hurt it.
―Algorithm fails to eliminate overfitting.
―Algorithm results in underfitting. (Fails to fit even the training data well.)
―Gradient descent will fail to converge.

[Figure: Price vs. Size of house plot of the nearly flat hypothesis that results: with \lambda huge, \theta_1, \dots, \theta_n \approx 0 and h_\theta(x) \approx \theta_0, i.e. underfitting.]
Regularized linear regression
Machine Learning
Regularized linear regression

J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right],  \min_\theta J(\theta)

Gradient descent

Repeat {
  \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_0^{(i)}
  \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]   (j = 1, 2, \dots, n)
}

The \theta_j update can be rewritten as \theta_j := \theta_j \big(1 - \alpha \frac{\lambda}{m}\big) - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)}: since 1 - \alpha\lambda/m is slightly less than 1, each iteration shrinks \theta_j a little before taking the usual gradient step.
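A minimal Octave sketch (function name and signature assumed, not from the slides) of one such simultaneous update:

  % gradientStep: one regularized gradient-descent update.
  % X is m x (n+1) with a leading column of ones; theta(1) is theta_0.
  function theta = gradientStep(theta, X, y, alpha, lambda)
    m = length(y);
    grad = (X' * (X * theta - y)) / m;   % unregularized gradient, all j
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % regularize j >= 1 only
    theta = theta - alpha * grad;        % simultaneous update of every theta_j
  end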
Normal equation

\theta = \left( X^T X + \lambda M \right)^{-1} X^T y

where M is the (n+1) \times (n+1) identity matrix with its top-left entry set to 0, so that \theta_0 is not regularized.

Non-invertibility (optional/advanced)

Suppose m \le n (m = #examples, n = #features). Then X^T X is non-invertible / singular. If \lambda > 0, however, X^T X + \lambda M is invertible, so regularization also takes care of this problem.
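A minimal Octave sketch (helper name assumed, not from the slides) of the regularized normal equation:

  % normalEqn: closed-form regularized linear regression.
  % X is m x (n+1) with a leading column of ones.
  function theta = normalEqn(X, y, lambda)
    M = eye(size(X, 2));
    M(1, 1) = 0;                                % leave theta_0 unregularized
    theta = (X' * X + lambda * M) \ (X' * y);   % backslash: solve, don't invert
  end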
Regularized logistic regression
Machine Learning
Regularized logistic regression.

[Figure: x_2 vs. x_1 plot of an overfit decision boundary from a high-order polynomial hypothesis.]

Cost function:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
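A minimal Octave sketch (helper name assumed, not from the slides) of this cost:

  % logisticCost: regularized logistic regression cost.
  % X is m x (n+1) with a leading column of ones; y holds 0/1 labels.
  function J = logisticCost(theta, X, y, lambda)
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));     % sigmoid hypothesis
    J = -(y' * log(h) + (1 - y)' * log(1 - h)) / m ...
        + (lambda / (2 * m)) * sum(theta(2:end).^2);
  end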
Gradient descent

Repeat {
  \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_0^{(i)}
  \theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]   (j = 1, 2, \dots, n)
}

This looks identical to the update for regularized linear regression, but it is not the same algorithm: here h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}.
Advanced optimization

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(θ)];
  gradient(1) = [code to compute ∂J(θ)/∂θ_0];
  gradient(2) = [code to compute ∂J(θ)/∂θ_1];
  gradient(3) = [code to compute ∂J(θ)/∂θ_2];
  ...
  gradient(n+1) = [code to compute ∂J(θ)/∂θ_n];
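A sketch (my own filling-in, not verbatim from the slides) of the template above for regularized logistic regression, together with the fminunc call the course uses for advanced optimization. X is assumed to be m x (n+1) with a leading column of ones, and y to hold 0/1 labels:

  function [jVal, gradient] = costFunction(theta, X, y, lambda)
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                % sigmoid hypothesis
    jVal = -(y' * log(h) + (1 - y)' * log(1 - h)) / m ...
           + (lambda / (2 * m)) * sum(theta(2:end).^2);
    gradient = (X' * (h - y)) / m;                   % ∂J/∂θ_j for every j
    gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);  % skip θ_0
  end

  % Usage, following the course's advanced-optimization recipe:
  options = optimset('GradObj', 'on', 'MaxIter', 100);
  initialTheta = zeros(size(X, 2), 1);
  [optTheta, functionVal, exitFlag] = ...
      fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);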
