Projected Subgradient
Pegasos algorithm:
Cutting planes method
Why does Pegasos have such an advantage?
Choose the method according to the properties of the objective function:
Special case (smooth objective):
Twice differentiable: w_{k+1} = w_k − [∇²f(w_k)]^{-1}·∇f(w_k)   --- Newton's method
Once differentiable: w_{k+1} = w_k − α_k·∇f(w_k)   --- gradient descent
Sample complexity:
In Pegasos, we aim at analyzing the computational complexity in terms of the accuracy ε, the regularization parameter λ, and the confidence δ (also in Bottou & Bousquet): roughly Õ(1/(λε)).
Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it could calculate the min value.
Proved using Markov's inequality (details omitted):
Prob[ f(w_r) − f(w*) ≥ c·ln(T)/(δλT) ] ≤ δ
Hence, to obtain a solution with accuracy ε and confidence 1 − δ, Õ(1/(δλε)) iterations are needed.
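A minimal sketch of the omitted Markov step, assuming the expected-suboptimality bound E[f(w_r) − f(w*)] ≤ c·ln(T)/(λT) obtained from the regret analysis below:

\[
\Pr\!\left[f(w_r) - f(w^*) \ge \frac{c\ln T}{\delta\lambda T}\right]
\;\le\; \frac{\mathbb{E}\!\left[f(w_r) - f(w^*)\right]}{c\ln T/(\delta\lambda T)}
\;\le\; \frac{c\ln T/(\lambda T)}{c\ln T/(\delta\lambda T)} \;=\; \delta .
\]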
‖w*‖ ≤ 1/√λ, i.e., the optimum lies in the ball B = {w : ‖w‖ ≤ 1/√λ}.
Why does the set B take this form?
Subgradient
Projection
w_{t+1/2} = (1 − η_t·λ)·w_t + η_t·𝟙[y_{i_t}·⟨w_t, x_{i_t}⟩ < 1]·y_{i_t}·x_{i_t}
Let's see how the two steps are connected.
For T ≥ 3, 1 + ln T ≤ 2·ln T, i.e., (1 + ln T)/2 ≤ ln T. Therefore
G²(1 + ln T)/(2λT) ≤ c(1 + ln T)/(2λT) ≤ c·ln(T)/(λT).
Given a training set S = {(x_i, y_i)}, i ∈ {1, …, m}, where x_i ∈ R^n and y_i ∈ {+1, −1}, minimize
f(w) = (λ/2)·‖w‖² + (1/m)·Σ_{i=1}^m max{0, 1 − y_i·⟨w, x_i⟩}.
(Here we first treat the case b = 0; the case b ≠ 0 is covered in the Extensions part. This work targets solving the primal SVM problem on large-scale data.)
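A minimal Python sketch of this objective, to make the pieces concrete (the function and variable names are mine, not from the paper):

import numpy as np

def svm_objective(w, X, y, lam):
    # X: (m, n) data matrix; y: labels in {+1, -1}; lam: regularization parameter
    margins = y * (X @ w)                          # y_i * <w, x_i>
    hinge = np.maximum(0.0, 1.0 - margins)         # hinge loss per example
    return 0.5 * lam * np.dot(w, w) + hinge.mean()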
Tong Zhang, Solving large scale linear prediction problems using stochastic gradient descent (ICML-04)
Memory: m, Time: super-linear in m
Online learning & Stochastic Gradient
Memory: O(1), Time: 1/ε² (linear kernel); Memory: 1/ε², Time: 1/ε⁴ (non-linear kernel).
Typically, online learning algorithms do not converge to the optimal solution of SVM.
PEGASOS
Primal Estimated sub-Gradient Solver for SVM
YASSO = Yet Another SVM Solver
2010.12.06
Support Vector Machines
QP form:
More “natural” form:
Regularization term
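For reference, a sketch of the two standard formulations these labels refer to, in standard SVM notation (my rendering, under the usual λ ↔ 1/(Cm) correspondence):

\[
\text{QP form:}\quad \min_{w,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^m \xi_i
\quad\text{s.t.}\quad y_i\langle w, x_i\rangle \ge 1-\xi_i,\ \ \xi_i \ge 0 ,
\]
\[
\text{More natural form:}\quad
\min_{w}\ \underbrace{\tfrac{\lambda}{2}\|w\|^2}_{\text{regularization term}}
\;+\; \underbrace{\tfrac{1}{m}\sum_{i=1}^m \max\{0,\,1-y_i\langle w, x_i\rangle\}}_{\text{empirical loss}} .
\]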
What's new in Pegasos?
1. The sub-gradient descent technique is 50 years old.
2. Soft-margin SVM is 14 years old.
3. Typically, gradient descent methods suffer from slow convergence.
4. The authors of Pegasos proved that an aggressive decrease of the learning rate η_t still leads to convergence. Previous works: η_t = 1/(λ√t); Pegasos: η_t = 1/(λt).
Better rates for finite dimensional instances (Murata, Bottou)
A popular classification learning tool: (1) Vapnik, 1998; (2) Cristianini & Shawe-Taylor, 2000.
Õ(ln(1/δ)/(λε))
Running the copies simultaneously is even more efficient!
Extensions
Popular approach: increase the dimension of x. Cons: we "pay" for b in the regularization term.
Because: the convergence rate is 1/ε², and |A_t| needs to be large.
1. Initialize w_1 = 0.
2. At step t, choose an example (x_{i_t}, y_{i_t}) uniformly at random and set η_t = 1/(λt).
3. Subgradient step: w_{t+1/2} = (1 − η_t·λ)·w_t + η_t·𝟙[y_{i_t}·⟨w_t, x_{i_t}⟩ < 1]·y_{i_t}·x_{i_t}.
4. Projection step: w_{t+1} = w_{t+1/2}·min{1, (1/√λ)/‖w_{t+1/2}‖}.
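A minimal, hedged Python sketch of these four steps (single-example Pegasos with projection onto the ball of radius 1/√λ; function and variable names are mine):

import numpy as np

def pegasos(X, y, lam, T, rng=None):
    # X: (m, n) data matrix; y: labels in {+1, -1}; lam: regularization; T: iterations
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w = np.zeros(n)                              # step 1: w_1 = 0
    for t in range(1, T + 1):
        i = rng.integers(m)                      # step 2: random example i_t
        eta = 1.0 / (lam * t)                    #         learning rate eta_t = 1/(lam*t)
        if y[i] * X[i].dot(w) < 1:               # step 3: subgradient step
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
        norm = np.linalg.norm(w)                 # step 4: project onto ball of radius 1/sqrt(lam)
        if norm > 0:
            w = w * min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w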
Let's see how it performs.
But tuning the parameter is more expensive than learning …
• The product k·T determines how close to the optimal value we get.
• If k·T is fixed, then k does not play a significant role!
Kernels, complex prediction problems, bias term
Previous Work
Dual-based methods
Interior Point methods
Memory: m², Time: m³·log(log(1/ε))
Decomposition methods (SMO, SVMlight)
Empirical loss
Outline
Previous work
The Pegasos algorithm
Analysis – faster convergence rates
Experiments – "outperforms state-of-the-art"
Extensions
η_t = 1/(λt)
5. Proved that the solution always lies in a ball of radius 1/√λ.
Could Newton's method and gradient descent also use a projection step, projecting each iterate onto a similarly compact feasible region?
Discussion
Pegasos: a simple & efficient solver for SVM
Sample complexity vs. computational complexity
Drawbacks
Whether or not a new sample x is classified correctly by w, Pegasos still adjusts w; this is somewhat counter-intuitive and rather heavy-handed.
Please point out any shortcomings.
What makes Pegasos so successful?
Pegasos is very capable!
The Pegasos algorithm
Original definition: the update uses a mini-batch A_t of examples.
In practice: |A_t| = 1.
How does the subgradient step work?
(figure: positively and negatively labeled training points)
How does the projection step work?
B = {w : ‖w‖ ≤ 1/√λ}, i.e.
with λ = 1/2 and w_{t+1/2} = (2, 2):
w_{t+1} = w_{t+1/2}·min{1, (1/√λ)/‖w_{t+1/2}‖} = (2, 2)·min{1, √2/(2√2)} = (1, 1)
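A quick numerical check of this example in Python (with λ = 1/2 and w_{t+1/2} = (2, 2) as assumed above):

import numpy as np

lam = 0.5
w_half = np.array([2.0, 2.0])                                     # w_{t+1/2}
scale = min(1.0, 1.0 / (np.sqrt(lam) * np.linalg.norm(w_half)))   # min{1, (1/sqrt(lam))/||w||}
w_next = scale * w_half                                           # projection onto the ball B
print(w_next)                                                     # -> [1. 1.]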
(Gradient descent is also known as steepest descent.)
General case: non-differentiable (but subdifferentiable) objective.
w_{k+1} = w_k − α_k·g(w_k), g(w_k) ∈ ∂f(w_k)   --- subgradient method
w_{k+1} = P[w_k − α_k·g(w_k)]   --- projected subgradient method
Conceptual distinction: online vs. stochastic vs. batch.
Theoretical guarantees:
Proof: page 4
Proof: page 9
Stochastic subgradient descent, with step size η_t:
1: Choose an example uniformly at random
2: Compute a subgradient g(x_t)
3: Update x_{t+1} = x_t − η_t·g(x_t)
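A generic sketch of this loop in Python, parameterized by a subgradient oracle (the names and the 1/√t step-size schedule are my assumptions, not from the slides):

import numpy as np

def stochastic_subgradient(subgrad, x0, m, T, rng=None):
    # subgrad(x, i): returns a subgradient of the i-th loss term at x
    # x0: initial point; m: number of examples; T: number of iterations
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        i = rng.integers(m)            # 1: choose an example at random
        g = subgrad(x, i)              # 2: compute a subgradient
        eta = 1.0 / np.sqrt(t)         # 3: step size (assumed schedule)
        x = x - eta * g
    return x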
1990-1992: Department of Mathematics, Peking University; 1992-1994: Cornell University; 1994-1998: Stanford (Ph.D.); IBM -> Yahoo -> Department of Statistics, Rutgers University -> …