Harbin Institute of Technology Mathematical Statistics PPT
The hazard function is the instantaneous rate of mortality of an individual alive at time $t$. The log of the empirical survival function is defined as
$$\log S_n(t) = \begin{cases} 0, & t < t_{(1)} \\ \log\left(1 - \frac{k}{n+1}\right), & t_{(k)} \le t < t_{(k+1)} \\ \log\left(1 - \frac{n}{n+1}\right), & t \ge t_{(n)} \end{cases}$$
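The piecewise definition of the log empirical survival function can be sketched in a few lines of Python. This is only an illustration: the function name `log_empirical_survival` and the sample lifetimes are hypothetical, and the sketch assumes the observations are distinct, so that $k$ is simply the number of observations not exceeding $t$.

```python
import math

def log_empirical_survival(sample, t):
    """Return log S_n(t), where S_n(t) = 1 - k/(n+1) and
    k is the number of observations less than or equal to t."""
    n = len(sample)
    k = sum(1 for x in sample if x <= t)
    # Dividing by n+1 rather than n keeps the argument of log positive
    # even when t is at or beyond the largest observation.
    return math.log(1 - k / (n + 1))

# Hypothetical lifetimes, used for illustration only.
lifetimes = [2.0, 5.0, 3.0]
for t in (1.0, 3.0, 10.0):
    print(t, log_empirical_survival(lifetimes, t))
```

Because $h(t) = -\frac{d}{dt}\log S(t)$, the negative slope of this curve gives a rough picture of the hazard.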
Chapter 1 Summarizing Data
Methods Based on the Cumulative Distribution Function
Histograms, Density Curves and Stem-and-Leaf Plots
Measures of Location
Measures of Dispersion
Example
SAS data set: beeswax.sas
Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Normal Q-Q plot
Summarizing Data
Comparing Two Samples
The Analysis of Variance
Linear Least Squares
What should we learn?
Mathematics (Statistics)
English
Computer (SAS: Statistical Analysis System)
$$E(F_n(x)) = F(x)$$
$$\mathrm{Var}(F_n(x)) = \frac{1}{n} F(x)\,(1 - F(x))$$
Theorem 2
$$P\left( \lim_{n \to \infty} \max_{x} \left| F_n(x) - F(x) \right| = 0 \right) = 1$$
That is, $F_n(x)$ tends to $F(x)$ simultaneously for all $x$ with probability one.
$$S_n(t) = 1 - F_n(t)$$
where $F_n(t)$ is the ecdf of a sample of the random variable $T$.
The Hazard Function
The hazard function is defined as
$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{F'(t)}{1 - F(t)} = -\frac{d}{dt} \log S(t)$$
Stem-and-Leaf Plots
Example: beeswax.sas
Measures of Location
The Arithmetic Mean
For a batch of numbers $x_1, x_2, \ldots, x_n$, the most commonly used measure of location is the arithmetic mean,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
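$$w_h(x) = \frac{1}{h}\, w\!\left(\frac{x}{h}\right) = \frac{1}{h} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{h}\right)^2} = \frac{1}{\sqrt{2\pi}\, h}\, e^{-\frac{x^2}{2h^2}}$$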
Let $x_1, x_2, \ldots, x_n$ be a sample from a probability density $f$. Then $w_h(x - x_i)$ is the normal density with mean $x_i$ and standard deviation $h$. The kernel probability density estimate of $f$ is then given by
$$F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ \frac{k}{n+1}, & x_{(k)} \le x < x_{(k+1)} \\ \frac{n}{n+1}, & x \ge x_{(n)} \end{cases}$$
Properties of the Empirical Cumulative Distribution Function
Theorem 1
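$$\text{density} = \begin{cases} 0.0565 = (1/59) \div 0.3, & 62.7 \le x < 63 \\ 0.452 = (8/59) \div 0.3, & 63 \le x < 63.3 \\ 1.3559 = (24/59) \div 0.3, & 63.3 \le x < 63.6 \\ 0.8475 = (15/59) \div 0.3, & 63.6 \le x < 63.9 \\ 0.339 = (6/59) \div 0.3, & 63.9 \le x < 64.2 \\ 0.2825 = (5/59) \div 0.3, & 64.2 \le x < 64.5 \end{cases}$$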
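Each density height is the bin's relative frequency divided by the bin width, so that the total area of the histogram is one. A minimal Python sketch of that arithmetic, using the beeswax bin counts (1, 8, 24, 15, 6, 5) and bin width 0.3 from the slides; the function name `histogram_density` is a hypothetical choice for illustration.

```python
def histogram_density(counts, edges):
    """Convert bin counts to density heights:
    (count / n) / bin width, one value per bin."""
    n = sum(counts)
    return [
        c / n / (hi - lo)
        for c, lo, hi in zip(counts, edges[:-1], edges[1:])
    ]

# Bin counts and bin edges of the beeswax histogram.
counts = [1, 8, 24, 15, 6, 5]
edges = [62.7, 63.0, 63.3, 63.6, 63.9, 64.2, 64.5]

density = histogram_density(counts, edges)
for lo, hi, d in zip(edges[:-1], edges[1:], density):
    print(f"{lo} <= x < {hi}: {d:.4f}")
```

The heights multiplied by the bin widths sum to one, which is what makes the histogram comparable to a density curve.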
For the given sample $x_1, x_2, \ldots, x_n$, the ecdf satisfies $F_n(x) = k/(n+1)$ for $x_{(k)} \le x < x_{(k+1)}$, that is, $F_n(x_{(k)}) = k/(n+1)$. Setting $F_n(x_{(k)}) = k/(n+1)$, the $k/(n+1)$th quantile of the data is assigned to $x_{(k)}$.
Mathematical Statistics and Data Analysis
John A. Rice
University of California, Berkeley
Arrangement of the Course
Chapter 1
Chapter 2
Chapter 3
Chapter 4
$$f_h(x) = \frac{1}{n} \sum_{i=1}^{n} w_h(x - x_i)$$
where h is a chosen bandwidth.
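The kernel estimate averages one normal density, with standard deviation $h$, centred at each observation. The following Python sketch shows the computation under that reading; the function names `normal_kernel` and `kernel_density`, the sample values, and the bandwidth are hypothetical and chosen only for illustration.

```python
import math

def normal_kernel(x, h):
    """Rescaled standard normal density w_h(x) = (1/h) w(x/h)."""
    return math.exp(-0.5 * (x / h) ** 2) / (h * math.sqrt(2 * math.pi))

def kernel_density(x, sample, h):
    """Kernel density estimate f_h(x) = (1/n) * sum of w_h(x - x_i)."""
    return sum(normal_kernel(x - xi, h) for xi in sample) / len(sample)

# Hypothetical data and bandwidth, for illustration only.
sample = [63.1, 63.4, 63.5, 63.8]
h = 0.2
print(kernel_density(63.5, sample, h))
```

A larger $h$ gives a smoother curve and a smaller $h$ a rougher one, so in practice the bandwidth is tuned by eye or by a data-driven rule.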
Example
Beeswax
Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Density Estimate > Normal (kernel density)
Example
Calculate the hazard function for the exponential distribution:
$$F(t) = \begin{cases} 1 - e^{-\lambda t}, & t \ge 0 \\ 0, & t < 0 \end{cases}$$
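One way to carry out this calculation, sketched here using the definition $h(t) = f(t) / (1 - F(t))$ of the hazard function, is the following, for $t \ge 0$:

```latex
\begin{align*}
f(t) &= F'(t) = \lambda e^{-\lambda t}, \\
1 - F(t) &= e^{-\lambda t}, \\
h(t) &= \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .
\end{align*}
```

The hazard of the exponential distribution is therefore constant: the instantaneous failure rate does not depend on how long the individual has already survived.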
Let $f$ denote the density function and $h$ the hazard function of a nonnegative random variable. Show that
$$f(t) = h(t)\, e^{-\int_0^t h(s)\, ds}$$
Comparing Two Samples by Using Q-Q Plot
Are the samples $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ from the same distribution? The empirical $k/(n+1)$th quantile of the $x$'s is $x_{(k)}$; the empirical $k/(n+1)$th quantile of the $y$'s is $y_{(k)}$. The points $(x_{(k)}, y_{(k)})$ fall approximately on a straight line if the two samples come from the same distribution.
Quantile-Quantile Plots
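The $p$th quantile of the distribution $F$ is the value $x_p$ such that $F(x_p) = p$, or $x_p = F^{-1}(p)$.
[Figure: graph of the cdf $F(x)$, with the level $p$ marked on the vertical axis and the $p$th quantile $x_p$ marked on the $x$-axis.]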
The empirical quantile of data
Examples
Plot the ecdf of this batch of numbers: 1, 14, 10, 9, 11, 9.
SAS data set: beeswax.sas
Solutions > Analysis > Interactive Data Analysis (find beeswax.sas in Work) > Analyze > Distribution > Output > Cumulative Distribution Function > Empirical
Density Curves—Kernel Probability Density Estimation
Let $w(x)$ be the standard normal density. The rescaled version of $w(x)$ is defined as $w_h(x) = \frac{1}{h}\, w\!\left(\frac{x}{h}\right)$, which is the normal density with standard deviation $h$.
Assessing Goodness of Fit by Using Q-Q Plot
Is the sample $x_1, x_2, \ldots, x_n$ from the distribution $F$? The empirical $k/(n+1)$th quantile is $x_{(k)}$. The theoretical $k/(n+1)$th quantile of $F$ is $x_{k/(n+1)}$, which satisfies $F(x_{k/(n+1)}) = k/(n+1)$. The points $(x_{(k)}, x_{k/(n+1)})$ fall approximately on a straight line if the sample comes from $F$.
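The points of such a Q-Q plot can be computed directly. The sketch below pairs each order statistic with the theoretical $k/(n+1)$th quantile, taking $F$ to be a normal distribution via Python's standard-library `statistics.NormalDist`. The function name `qq_points` and the parameters `mu=9`, `sigma=4` are assumptions made for illustration, not values from the slides.

```python
from statistics import NormalDist

def qq_points(sample, dist):
    """Pair the k-th order statistic x_(k) with the theoretical
    k/(n+1)th quantile of dist, for k = 1, ..., n."""
    xs = sorted(sample)
    n = len(xs)
    return [(xs[k - 1], dist.inv_cdf(k / (n + 1))) for k in range(1, n + 1)]

# Hypothetical reference distribution: normal with mean 9 and sd 4.
points = qq_points([1, 14, 10, 9, 11, 9], NormalDist(mu=9, sigma=4))
for empirical, theoretical in points:
    print(empirical, round(theoretical, 3))
```

Plotting these pairs and checking how closely they follow a straight line gives the visual assessment described above.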
Histograms
Example: beeswax.sas
$$\text{frequency} = \begin{cases} 1, & 62.7 \le x < 63 \\ 8, & 63 \le x < 63.3 \\ 24, & 63.3 \le x < 63.6 \\ 15, & 63.6 \le x < 63.9 \\ 6, & 63.9 \le x < 64.2 \\ 5, & 64.2 \le x < 64.5 \end{cases}$$
depth  count  stem : leaf
    1      1   628 : 5
    1      0   629 :
    4      3   630 : 358
    7      3   631 : 033
    9      2   632 : 77
   18      9   633 : 001446669
   23      5   634 : 01335
          10   635 : 0000113668
   26      7   636 : 0013689
   19      2   637 : 88
   17      6   638 : 334668
   11      5   639 : 22223
    6      0   640 :
    6      1   641 : 2
    5      3   642 : 147
    2      0   643 :
    2      2   644 : 02
Denote the ordered batch of numbers by $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$; then the ecdf can be expressed as
$$F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ \frac{k}{n}, & x_{(k)} \le x < x_{(k+1)} \\ 1, & x \ge x_{(n)} \end{cases}$$
The Empirical Cumulative Distribution Function (ecdf)
Suppose that $x_1, x_2, \ldots, x_n$ is a batch of numbers. The ecdf is defined as
$$F_n(x) = \frac{1}{n}\, \#\{x_i \le x\}$$
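The definition translates directly into code: count the observations not exceeding $x$ and divide by $n$. A minimal Python sketch, using the batch 1, 14, 10, 9, 11, 9 from the Examples slide; the function name `ecdf` is a hypothetical choice.

```python
def ecdf(sample, x):
    """Empirical cdf: the proportion of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

batch = [1, 14, 10, 9, 11, 9]
for x in (0, 1, 9, 10, 11, 14):
    print(x, ecdf(batch, x))
```

The function is a step function that jumps at each observed value; the repeated value 9 produces a jump of $2/6$ rather than $1/6$.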
The Survival Function
If $T$ denotes the time until failure or death, with cdf $F$, the survival function is defined as
$$S(t) = P(T > t) = 1 - F(t)$$
which is simply the probability that the lifetime will be longer than $t$. The empirical survival function is given by