Robustifying AdaBoost by adding the naive error rate


Takashi Takenouchi
Department of Statistical Science, Graduate University of Advanced Studies

Shinto Eguchi
Institute of Statistical Mathematics, Japan, and Department of Statistical Science, Graduate University of Advanced Studies

Abstract

AdaBoost can be derived by sequential minimization of the exponential loss function. It implements the learning process by exponentially reweighting examples according to their classification results. However, the weights are often too sharply tuned, so that AdaBoost suffers from non-robustness and over-learning. We propose a new boosting method that is a slight modification of AdaBoost. Its loss function is a mixture of the exponential loss and the naive error loss function. As a result, the proposed method incorporates an effect of forgetfulness into AdaBoost. The statistical significance of the method is discussed, and simulations are presented for confirmation.

Keywords: contamination model, discriminant analysis, mislabel, robustness, η-divergence

1 Introduction

In the present paper we investigate a classification problem using a boosting method. A boosting method aims to construct a strong learning machine by combining weak hypotheses. One typical boosting method is AdaBoost (Freund and Schapire, 1997). The key idea is to change the weight distribution over the training data set at each step of the learning process. Let y denote the class label, which is to be predicted from a feature vector x ∈ R^p. A weak hypothesis f(x) with values ±1 is taken from a given class F. The indicator function I is defined by

I(A) = 1 if A is true, and 0 otherwise.   (1)

For a given training data set {(xi, yi)}_{i=1}^N and a discriminant function F(x), the exponential loss is defined by

Lexp(F) = Σ_{i=1}^N exp(−yi F(xi)).   (2)
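
The loss (2) is straightforward to evaluate numerically. The following is a minimal Python sketch (our own illustration, not part of the paper), assuming labels yi in {−1, +1} and an array of real-valued scores F(xi):

import numpy as np

def exponential_loss(scores, y):
    # Exponential loss (2): sum_i exp(-y_i * F(x_i)).
    # scores : array of F(x_i); y : array of labels in {-1, +1}
    return np.sum(np.exp(-y * scores))

# A correctly classified example with a large margin contributes almost nothing,
# while a misclassified example contributes exp(+|F(x_i)|).
y = np.array([1, -1, 1])
scores = np.array([2.0, -1.0, -0.5])
print(exponential_loss(scores, y))   # exp(-2) + exp(-1) + exp(0.5)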

We introduce AdaBoost as the sequential minimization of (2), as follows:

1. Start with weights w1(i) = 1/N (i = 1, ..., N) and F0(x) = 0.   (3)

2. For m = 1, ..., M:

   (a) Find fm(x) = argmin_{f ∈ F} εm(f), where εm(f) = Σ_{i=1}^N wm(i) I(f(xi) ≠ yi).

   (b) Calculate

       αm = (1/2) log((1 − εm(fm)) / εm(fm)).   (4)
   (c) Update the weight distribution as

       wm+1(i) = exp(−Fm(xi) yi) / Zm+1,   (5)

       where Fm(x) = Σ_{s=1}^m αs fs(x) and Zm+1 = Σ_{i=1}^N exp(−Fm(xi) yi).
3. Output the discriminant function sgn(Σ_{m=1}^M αm fm(x)).
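
The following Python sketch puts steps 1-3 together, using decision stumps (single-feature threshold rules) as the weak hypothesis class F. It is our own illustrative implementation under these simplifications, not the authors' code; the clipping constant that guards the logarithm in (4) is likewise our own addition.

import numpy as np

def fit_stump(X, y, w):
    # Step 2(a): decision stump minimizing the weighted error eps_m(f).
    n, p = X.shape
    best_err, best_params = np.inf, None
    for j in range(p):                      # feature index
        for t in np.unique(X[:, j]):        # candidate threshold
            for s in (1, -1):               # orientation of the stump
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_params = err, (j, t, s)
    return best_err, best_params

def stump_predict(X, params):
    j, t, s = params
    return np.where(X[:, j] <= t, s, -s)

def adaboost(X, y, M=50):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                        # step 1: w_1(i) = 1/N
    alphas, stumps = [], []
    for _ in range(M):
        eps, params = fit_stump(X, y, w)           # step 2(a)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)       # avoid division by zero in (4)
        alpha = 0.5 * np.log((1 - eps) / eps)      # step 2(b), equation (4)
        pred = stump_predict(X, params)
        w = w * np.exp(-alpha * y * pred)          # step 2(c), equations (5)-(6)
        w = w / w.sum()                            # normalization by Z_{m+1}
        alphas.append(alpha)
        stumps.append(params)
    return alphas, stumps

def adaboost_predict(X, alphas, stumps):
    # Step 3: sign of the weighted vote sum_m alpha_m f_m(x).
    F = sum(a * stump_predict(X, p) for a, p in zip(alphas, stumps))
    return np.sign(F)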

In step 2(b), αm is the log-odds ratio of the weighted error rate εm(fm) and is the coefficient of fm(x) in the output function: when εm(fm) is small, αm takes a large value, and when εm(fm) is near 1/2, αm takes a small value. In step 2(c), the update rule (5) can be rewritten as

wm+1(i) ∝ wm(i) e^{αm} if fm(xi) ≠ yi, and wm(i) e^{−αm} otherwise,   (6)

where wm(i) = exp(−Fm−1(xi) yi) / Zm. In other words, the weight wm(i) goes up by the factor e^{αm} when the weak hypothesis fm(x) fails to classify the training example (xi, yi) correctly, and goes down by the factor e^{−αm} otherwise. In a subsequent discussion, we propose alternative updating methods in place of (6). In addition, the update rule (6) has the following property:

εm+1(fm) = 1/2   (m = 1, ..., M − 1).   (7)

This implies that the hypothesis fm(x), which was optimized with respect to the weight distribution wm(i), has no predictive accuracy, or is a random guess, under the updated distribution wm+1(i).
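
Property (7) is easy to verify numerically for a single round: after the update, the weighted error of the hypothesis that was just fitted is exactly 1/2. A small self-contained check (the numbers are an arbitrary example of our own):

import numpy as np

y    = np.array([ 1,  1, -1, -1,  1])            # labels
pred = np.array([ 1, -1, -1,  1,  1])            # predictions of f_m (two mistakes)
w    = np.array([0.1, 0.2, 0.3, 0.2, 0.2])       # current weights w_m(i)

eps = np.sum(w * (pred != y))                    # eps_m(f_m) = 0.4
alpha = 0.5 * np.log((1 - eps) / eps)            # equation (4)
w_new = w * np.exp(-alpha * y * pred)            # update (5)/(6)
w_new = w_new / w_new.sum()

print(np.sum(w_new * (pred != y)))               # 0.5, as stated in (7)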

While AdaBoost is a very simple and powerful method, it is sensitive to outliers. Robust statistics emerged in the 1960s in the statistical community (Huber, 1974; Hampel, 1986). The concept rests on a geometric interpretation when the data space is unbounded and continuous: an outlier is a point that lies far from the bulk of a given data set, a procedure is said to be robust if it is relatively unaffected by outliers, and an influence function is introduced to quantify the effect of a single observation on the procedure. However, the space of labels is limited to {−1, 1} in our context, so we cannot use the usual definition of robustness described above. Here, outlying occurs simply as a transposition between −1 and 1, in which the label y is changed to −y. Therefore, we have to take a probabilistic approach rather than a geometric one. Typically, the distribution of the label y given x, p(y|x), is contaminated as

(1 − η) p(y|x) + η p(−y|x)   (8)

with probability η (Copas, 1988). In a subsequent discussion, we will give another form of η depending on x.

We focus on a simple example in order to explore the weakness of AdaBoost under noisy conditions. We generate a data set of 200 examples that is completely separable by a linear separator, and then flip the label of a single instance. Figure 1 shows the data set and the classification result produced by AdaBoost. Since AdaBoost exponentially increases the weight of the outlier, some weak hypotheses with decision boundaries near the outlier are selected during the learning steps. Consequently, the output is strongly affected by the outlier, as shown in Figure 1, and leads to poor performance in the sense of the generalization error.
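
The flavor of this experiment is easy to reproduce. In the sketch below (our own illustration; scikit-learn's AdaBoostClassifier with its default decision-stump base learner stands in for the algorithm described above, and the data generator, sample sizes, and seed are arbitrary choices), a linearly separable two-dimensional data set is generated, one training label is flipped, and the training and test errors are compared:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)

def make_separable(n):
    # Two Gaussian clouds labeled by the sign of x1 + x2, then pushed apart
    # so that the classes are cleanly separable by a linear boundary.
    X = rng.normal(size=(n, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    X = X + 0.5 * y[:, None]
    return X, y

X_train, y_train = make_separable(200)
X_test, y_test = make_separable(2000)

y_noisy = y_train.copy()
y_noisy[0] = -y_noisy[0]          # flip the label of a single instance (the outlier)

clf = AdaBoostClassifier(n_estimators=200).fit(X_train, y_noisy)
print("training error:", np.mean(clf.predict(X_train) != y_noisy))
print("test error    :", np.mean(clf.predict(X_test) != y_test))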

In the present paper, we propose a new boosting algorithm called η-Boost. We define the naive loss function as Lnaive(F) = −