Computer-Aided Cytogenetic Method of Breast Cancer Diagnosis. Part II - Test Criteria
Mathematical simulations show that the interval $\hat{J}$ contains not less than 95% of the values from G when n ≥ 11.
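The kind of simulation mentioned here can be sketched as follows. This is only an illustrative Monte Carlo check, not the authors' simulation design: the function name `three_s_coverage`, its parameters (`sampler`, `trials`, `fresh`, `seed`) and the choice of a standard normal population are assumptions made for the example. It estimates the average share of fresh values from the population that fall inside $(\bar{x} - 3s, \bar{x} + 3s)$ built from a sample of size n.

```python
import random
import statistics


def three_s_coverage(n, sampler, trials=2000, fresh=200, seed=0):
    """Estimate the average share of new values inside (mean - 3s, mean + 3s).

    n       -- sample size used to build the interval
    sampler -- hypothetical callable drawing one value from the population G
    trials  -- number of independent samples (intervals) to build
    fresh   -- number of new values drawn to test each interval
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        sample = [sampler(rng) for _ in range(n)]
        centre = statistics.mean(sample)
        s = statistics.stdev(sample)  # square root of the unbiased s^2
        lo, hi = centre - 3 * s, centre + 3 * s
        # Count how many fresh draws from G land inside the interval.
        inside += sum(lo < sampler(rng) < hi for _ in range(fresh))
    return inside / (trials * fresh)


if __name__ == "__main__":
    # Assumption: a standard normal population, chosen only for illustration.
    normal = lambda rng: rng.gauss(0.0, 1.0)
    for n in (11, 30, 150):
        print(n, round(three_s_coverage(n, normal), 3))
```

Other unimodal populations can be tried by passing a different `sampler`, for example `lambda rng: rng.expovariate(1.0)`; the result is an average over intervals, so it says nothing about the worst single sample.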
The 3s-rule is closely connected with the 3s1-rule, which allows us to calculate, from a sample, a confidence interval for the unknown mathematical expectation $m(x)$ with significance level not exceeding 0.05. First, consider the problem of constructing the confidence interval on the basis of the 3σ-rule in the case when a value of the random variable x and its variance $\sigma^2(x)$ are known. By virtue of inequality (2) with k = 3, for which the bound equals 4/81 ≈ 0.049, we have:
Department of Mathematical Sciences and Center for Applied Mathematics and Statistics, New Jersey Institute of Technology, Newark, NJ, USA
Abstract. In this part we describe the statistical test criteria that are used in Part I in the construction of the computer-aided cytogenetic method of breast cancer diagnosis.
Keywords: breast cancer, fibroadenomatosis, buccal epithelium, discriminant analysis.
$$p\left(|x - m(x)| \ge 3\sigma(x)\right) \le 0.05, \qquad (1)$$
$$m(x) \approx \bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \sigma^2(x) \approx s^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2.$$
These estimates have good properties. They are unbiased, i.e. their mathematical expectations coincide with the exact values of the estimated parameters $m(x)$ and $D(x) = \sigma^2(x)$:
$$m(\bar{x}) = m(x), \qquad m\left(s^2(x)\right) = D(x).$$
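The two estimates can be computed directly from a sample. The sketch below is a minimal illustration; the function name `sample_estimates` and the data values are hypothetical and do not come from the paper.

```python
def sample_estimates(values):
    """Return (x_bar, s2): the sample mean and the unbiased sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed for s^2")
    x_bar = sum(values) / n
    # Division by n - 1 (not n) is what makes s^2 unbiased for D(x).
    s2 = sum((v - x_bar) ** 2 for v in values) / (n - 1)
    return x_bar, s2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
x_bar, s2 = sample_estimates(data)
print(x_bar, s2)
```

For these values the squared deviations sum to 32, so the mean is 5 and s² = 32/7.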
In constructing the confidence interval J containing the bulk of the general population G on the basis of the sample x1, x2, ..., xn, it is quite natural to replace the mathematical expectation $m(x)$ and the variance $\sigma^2(x)$ by their estimates $\bar{x}$ and $s^2$, respectively. So, we can formulate the so-called 3s-rule: the interval $\hat{J} = (\bar{x} - 3s, \bar{x} + 3s)$ contains not less than 95% of the values from G when n is large.

The interval $J = (x - 3\sigma(x), x + 3\sigma(x))$ is a random confidence interval for the unknown mathematical expectation $m(x)$ with significance level 0.05 (by virtue of the 3σ-rule). In the majority of cases we can put $x = \bar{x}$, so that $\sigma^2(\bar{x}) = \sigma^2(x)/n$.
$$p\left(|x - m(x)| \ge k\sigma(x)\right) \le \frac{4}{9}\cdot\frac{1}{k^2}, \qquad k \ge \sqrt{\frac{8}{3}}. \qquad (2)$$
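A short numerical check shows where the constant 0.05 comes from, assuming that the bound in (2) reads 4/(9k²) for k ≥ √(8/3). The function names below are illustrative. The classical Chebyshev bound 1/k² is not part of this paper and is included only for comparison.

```python
import math

K_MIN = math.sqrt(8.0 / 3.0)  # inequality (2) is stated for k >= sqrt(8/3)


def unimodal_tail_bound(k):
    """Upper bound from inequality (2) on p(|x - m(x)| >= k * sigma(x))."""
    if k < K_MIN:
        raise ValueError("inequality (2) requires k >= sqrt(8/3)")
    return 4.0 / (9.0 * k * k)


def chebyshev_tail_bound(k):
    """Classical Chebyshev bound 1/k^2 (comparison only, not from the paper)."""
    return 1.0 / (k * k)


print(unimodal_tail_bound(3))   # 4/81, about 0.0494
print(chebyshev_tail_bound(3))  # 1/9, about 0.1111
```

With k = 3 the unimodal bound is just below 0.05, whereas the Chebyshev bound alone would only give about 0.11.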
The 3s-rule
In order to construct the confidence interval
$$\hat{J} = \left(\bar{x} - 3s,\; \bar{x} + 3s\right),$$
R.I. Andrushkiw
D.A. Klyushin, K.N. Golubeva, M. Pokoyovy, A.V. Romanov
Kyiv National Taras Shevchenko University, Kyiv, Ukraine

Therefore, the probability that $m(x)$ lies outside the interval $\left(\bar{x} - 3\sigma(x)/\sqrt{n},\ \bar{x} + 3\sigma(x)/\sqrt{n}\right)$ does not exceed 0.05, i.e.
$$p\left(m(x) \in \left(\bar{x} - 3\frac{\sigma(x)}{\sqrt{n}},\; \bar{x} + 3\frac{\sigma(x)}{\sqrt{n}}\right)\right) \ge 0.95.$$
It is easy to see that the following estimate of the variance of the sample mean is unbiased and has the same properties as the estimate $s^2(x)$:
where m(x) is the expectation and σ(x) is the standard deviation of x. The constant 0.05 is chosen because in many applied sciences (for example, biology and medicine) the 5% significance level is the most widely used. The justification of the 3σ-rule was given in paper [1]. There also exist several different proofs of this empirical rule [2–4].

Theorem 1. For an arbitrary random variable x having a unimodal distribution and finite variance $\sigma^2(x) > 0$, the following inequality holds for all $k \ge \sqrt{8/3}$:
$$s_1^2(x) = \frac{1}{n}\,s^2(x) = \frac{1}{n(n-1)}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2.$$
Replacing the variance $\sigma^2(x)/n$ of the sample mean by its estimate $s_1^2(x)$, we obtain the 3s1-rule, which states that the confidence interval
The 3σ-rule
The empirical 3σ-rule, which is well known in mathematical statistics, states that for the overwhelming majority of commonly encountered random variables x the following inequality holds:
$$p\left(|x - m(x)| \le 3\sigma(x)\right) = p\left(-3\sigma(x) \le m(x) - x \le 3\sigma(x)\right) = p\left(x - 3\sigma(x) \le m(x) \le x + 3\sigma(x)\right) \ge 0.95.$$
Hence, it follows that
$$J = \left(x - 3\sigma(x),\; x + 3\sigma(x)\right)$$
Here $\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$ and $s^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2$. When n is large, the interval $\hat{J} = (\bar{x} - 3s, \bar{x} + 3s)$ contains not less than 95% of the values from G.

Now, let us consider the following question: for what n does the 3s-rule hold? According to practical recommendations, the estimate $\bar{x}$ almost coincides with $m(x)$ when n ≥ 30, and $s^2(x) \approx D(x)$ when n ≥ 150.
$$J_1 = \left(\bar{x} - \frac{3s(x)}{\sqrt{n}},\; \bar{x} + \frac{3s(x)}{\sqrt{n}}\right)$$
contains the unknown mathematical expectation $m(x)$ with probability not less than 0.95 when n is large. Since the estimate $s^2(x)$ has practically the same value as $\sigma^2(x)$ if n ≥ 150, we can assume that the estimate $s_1^2(x)$ coincides with the variance $\sigma^2(\bar{x}) = \sigma^2(x)/n$ of the sample mean and that the 3s1-rule holds when n ≥ 150. Nevertheless, this rule may be applied even for n ≥ 11.

In mathematical statistics samples are classified by their size: 1) small samples, when n ≤ 30; 2) middle samples, when 30 < n < 150; and 3) large samples, when n ≥ 150. To summarize, we can state that the 3s- and 3s1-rules hold for middle and large samples, and even for small samples if their size is at least n = 11.
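The 3s1-rule can be illustrated with a small sketch that builds the interval $J_1$ from a sample. The function name `three_s1_interval` and the sample values are hypothetical. The lower limit n ≥ 11 is taken from the text above and enforced here only as a reminder.

```python
import math


def three_s1_interval(values):
    """Return (lower, upper) of J1 = (x_bar - 3 s / sqrt(n), x_bar + 3 s / sqrt(n))."""
    n = len(values)
    if n < 11:
        raise ValueError("the text recommends the 3s1-rule only for n >= 11")
    x_bar = sum(values) / n
    s2 = sum((v - x_bar) ** 2 for v in values) / (n - 1)  # unbiased s^2
    half_width = 3.0 * math.sqrt(s2 / n)  # 3 * s1, where s1^2 = s^2 / n
    return x_bar - half_width, x_bar + half_width


data = list(range(1, 12))  # hypothetical sample 1, 2, ..., 11
print(three_s1_interval(data))  # (3.0, 9.0)
```

For this sample the mean is 6 and s² = 110/10 = 11, so s1² = 11/11 = 1 and the interval is 6 ± 3.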