Contents of the chapter

2 Pattern Discrimination
2.1 Decision Regions and Functions
2.1.1 Generalized Decision Functions
2.1.2 Hyperplane Separability
2.2 Feature Space Metrics
2.3 The Covariance Matrix
The root set of d(x), the decision surface or discriminant, is now a linear (d−1)-dimensional surface called a hyperplane, which can be characterized by its distance D0 from the origin of the coordinates and by its unit normal vector n pointing in the positive direction (d(x) > 0), as follows (see e.g. Friedman and Kandel, 1999):

n = w/‖w‖,  D0 = |w0|/‖w‖    (2-2d)

where ‖w‖ denotes the length of the vector w. Notice also that |d(z)|/‖w‖ is precisely the distance of any point z to the hyperplane.
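As a minimal numerical sketch of (2-2d) and of the point-to-hyperplane distance, with made-up weight values:

```python
import numpy as np

# Hypothetical hyperplane d(x) = w'x + w0 = 0 in R^2
w = np.array([3.0, 4.0])            # weight vector w
w0 = -5.0                           # bias term w0

norm_w = np.linalg.norm(w)          # ||w||
n = w / norm_w                      # unit normal pointing into the d(x) > 0 half-space
D0 = abs(w0) / norm_w               # distance of the hyperplane from the origin, as in (2-2d)

z = np.array([2.0, 1.0])            # an arbitrary point
dist_z = abs(w @ z + w0) / norm_w   # |d(z)| / ||w||, distance of z to the hyperplane

print(n, D0, dist_z)                # [0.6 0.8] 1.0 1.0
```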
• Teaching purpose: introduce the basic concepts and approaches of pattern discrimination
• Requirement: master the basic approaches of pattern discrimination
• Focus: basic notions
Figure 2.3. (a) Quadratic decision function, d(x); (b) logarithmic decision function, ln(d(x)).
Figure 2.3b illustrates this logarithmic decision function for the quadratic classifier example, using the new threshold value ln(49) = 3.89. It is sometimes convenient to express a generalized decision function as a functional linear combination, as shown in equations (2-4) and (2-4a) below.
Imagine, for instance, that we had two classes with circular limits as shown in Figure 2.4a. A quadratic decision function capable of separating the classes is:

d(x) = (x1 − 1)² + (x2 − 1)² − 0.25    (2-5a)

Instead of working with a quadratic decision function in the original two-dimensional feature space, we may decide to work in a transformed one-dimensional feature space:

y = f(x) = (x1 − 1)² + (x2 − 1)²,  y* = [1, y]'    (2-5b)

In this one-dimensional space we can rewrite the decision function simply as a linear decision function:

g(y) = w*' y* = y − 0.25,  with w* = [−0.25, 1]'    (2-5c)

Figure 2.4b illustrates that, if there are small scaling differences in the original features x1 and x2, as well as deviations from the class centers, it would in principle be easier to perform the discrimination in the y space than in the x space. A particular case of interest is the polynomial expression of a decision function d(x); Figure 2.5 illustrates an example.
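A small sketch of this example, assuming the signs reconstructed above in (2-5a) and (2-5c); the sample points are arbitrary, and which class lies on which side of the boundary depends on Figure 2.4:

```python
import numpy as np

def d(x):
    # Quadratic decision function in the original 2-D space (2-5a)
    return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2 - 0.25

def f(x):
    # Feature transformation y = f(x) mapping to a 1-D space (2-5b)
    return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2

def g(y):
    # Linear decision function in the transformed space (2-5c): g(y) = w*' y*
    y_star = np.array([1.0, y])
    return np.array([-0.25, 1.0]) @ y_star

for x in [np.array([1.0, 1.2]), np.array([0.0, 0.0]), np.array([1.4, 1.4])]:
    assert np.isclose(d(x), g(f(x)))   # both formulations give the same decision value
    side = "d(x) > 0 side" if d(x) > 0 else "d(x) < 0 side"
    print(x, round(float(d(x)), 3), side)
```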
Exercises
2.1 Decision Regions and Functions
In the particular case of a classifier, the main goal is to divide the feature space into regions assigned to the classification classes. These regions are called decision regions. Let us assume two classes, ω1 and ω2, of patterns described by two-dimensional feature vectors (coordinates x1 and x2), as shown in Figure 2.1.
2.1.1 Generalized Decision Functions
As long as the classes do not overlap, one can always find a generalized decision function, defined in Rd, that separates a class ωi from the remaining classes, so that the following decision rule applies:

di(x) > 0 if x ∈ ωi; di(x) < 0 if x ∈ ωj, with j ≠ i    (2-3)

For some generalized decision functions we will establish a certain threshold Δ for class discrimination, as in (2-3a).
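A possible sketch of rule (2-3) with one discriminant function per class; the linear discriminants and the test point below are invented for illustration:

```python
import numpy as np

# One hypothetical linear discriminant d_i(x) = w_i'x + w_i0 per class
discriminants = {
    "omega_1": (np.array([ 1.0,  0.0]), -0.5),
    "omega_2": (np.array([-1.0,  1.0]), -0.5),
    "omega_3": (np.array([ 0.0, -1.0]), -0.5),
}

def classify(x):
    # Rule (2-3): assign x to omega_i if d_i(x) > 0 while all other d_j(x) < 0
    values = {name: w @ x + w0 for name, (w, w0) in discriminants.items()}
    positive = [name for name, v in values.items() if v > 0]
    return positive[0] if len(positive) == 1 else "indeterminate"

print(classify(np.array([2.0, 0.5])))   # falls in the omega_1 region
```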
Figure 2.2. Two-dimensional linear decision function, with normal vector n and distance D0 from the origin.
Figure 2.1. Two classes of patterns described by two-dimensional feature vectors (features x1 and x2).
In Figure 2.1 we used "o" to denote class ω1 patterns and "x" to denote class ω2 patterns. The ellipses represent the "boundaries" of the pattern distributions. The figure also shows a straight line separating the two classes. The equation of this straight line, in terms of the coordinates (features) x1, x2 and using coefficients or weights w1, w2 and a bias term w0, is given by equation (2-1). We say that d(x) is a linear decision function that divides (categorizes) Rd into two decision regions: the upper half-plane, corresponding to d(x) > 0, where each feature vector is assigned to ω1, and the lower half-plane, corresponding to d(x) < 0, where each feature vector is assigned to ω2. The classification is arbitrary for d(x) = 0.
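For instance, with hypothetical weight values in (2-1), feature vectors are assigned to the two decision regions by the sign of d(x):

```python
# Hypothetical weights for the straight line of equation (2-1)
w1, w2, w0 = 1.0, -1.0, 0.2

def region(x1, x2):
    d = w1 * x1 + w2 * x2 + w0
    if d > 0:
        return "omega_1"    # d(x) > 0 region, assigned to omega_1
    if d < 0:
        return "omega_2"    # d(x) < 0 region, assigned to omega_2
    return "arbitrary"      # on the decision surface d(x) = 0

print(region(1.0, 0.5), region(0.0, 1.0))   # omega_1 omega_2
```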
di(x) > Δ if x ∈ ωi; di(x) < Δ if x ∈ ωj, with j ≠ i    (2-3a)

For the quadratic classifier example with Δ = 49 (see (2-3b) below), the discriminated classes are:

ω1 = {x: |x| ≥ 7} and ω2 = {x: |x| < 7}
It is important to note that, as far as class discrimination is concerned, any functional composition of d(x) with a monotonic function will obviously separate the classes in exactly the same way. For the quadratic classifier (2-3b) we may, for instance, use a monotonic logarithmic composition:

If ln(d(x)) ≥ ln(Δ) then x ∈ ω1 else x ∈ ω2    (2-3c)
d(x) = w1 f1(x) + … + wk fk(x) + w0 = w*' y*    (2-4)

y* = [1, f1(x), f2(x), …, fk(x)]'    (2-4a)
Figure 2.4. A two-class discrimination problem in the original two-dimensional feature space (a) and in a transformed one-dimensional feature space (b).
d(x) = w*' x* = w' x + w0    (2-2)

where

w* = [w0 w1 … wd]'    (2-2a)

is the augmented weight vector, which includes the bias term;

w = [w1 … wd]'    (2-2b)

is the weight vector;

x* = [1 x1 … xd]'    (2-2c)

is the augmented feature vector.
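A quick numerical check, with made-up values, that the augmented form w*' x* in (2-2) equals w' x + w0:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])          # weight vector (2-2b)
w0 = 0.3                                # bias term
w_star = np.concatenate(([w0], w))      # augmented weight vector (2-2a)

x = np.array([1.0, 2.0, -1.0])          # feature vector
x_star = np.concatenate(([1.0], x))     # augmented feature vector (2-2c)

# Both expressions of (2-2) coincide
assert np.isclose(w_star @ x_star, w @ x + w0)
print(w_star @ x_star)                  # -3.2
```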
For instance, in a two-class, one-dimensional classification problem with a quadratic decision function d(x) = x², one would design the classifier by selecting an adequate threshold Δ so that the following decision rule would apply:

If d(x) = x² ≥ Δ then x ∈ ω1 else x ∈ ω2    (2-3b)

In this decision rule we chose to assign the equality case to class ω1. Figure 2.3a shows a two-class discrimination using a quadratic decision function with a threshold Δ = 49, which discriminates between the two classes ω1 and ω2 defined above.
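A minimal sketch of this threshold classifier with Δ = 49, which also checks that the logarithmic composition (2-3c) produces exactly the same decisions; the sample values are arbitrary:

```python
import math

DELTA = 49.0

def classify_quadratic(x):
    # Rule (2-3b): if d(x) = x^2 >= DELTA then omega_1 else omega_2
    return "omega_1" if x * x >= DELTA else "omega_2"

def classify_log(x):
    # Rule (2-3c): the monotonic ln(.) composition, with threshold ln(49) ~= 3.89
    return "omega_1" if math.log(x * x) >= math.log(DELTA) else "omega_2"

for x in [-9.0, -3.5, 6.9, 7.0, 12.0]:
    assert classify_quadratic(x) == classify_log(x)   # identical decisions
    print(x, classify_quadratic(x))
```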
Pattern Recognition Chapter 2
Pattern Discrimination (Pattern Recognition: Concepts, Methods and Applications, J.P. Marques de Sa, Springer)
d(x) = w1 x1 + w2 x2 + w0 = 0    (2-1)

The generalization of the linear decision function to a d-dimensional feature space Rd is straightforward: it is given by equation (2-2), with the weight vector, augmented weight vector and augmented feature vector defined in (2-2a) to (2-2c).