Probability and Mathematical Statistics (English)
3. Random Variables
3.1 Definition of Random Variables
In engineering or scientific problems, we are interested not only in the probability of events, but also in variables that depend on the sample points (variables defined on the sample points).
For example, we may be interested in the life of bulbs produced by a certain company, or the weight of cows on a certain farm. These ideas lead to the definition of random variables.
1. Definition of a random variable
Here are some examples.
Example 3.1.1 A fair die is tossed. The number $X$ shown is a random variable; it takes values in the set $\{1, 2, \dots, 6\}$.
Example 3.1.2 The life $t$ of a bulb selected at random from the bulbs produced by company A is a random variable; it takes values in the interval $(0, \infty)$.
Since the outcome of a random experiment cannot be predicted in advance, the exact value of a random variable cannot be predicted before the experiment. We can only discuss the probability that it takes some value, or values in some subset of $\mathbb{R}$.
2. Distribution function
Definition 3.1.2 Let $X$ be a random variable on the sample space $S$. Then the function
$$F(x) = P(X \le x), \quad x \in \mathbb{R},$$
is called the distribution function of $X$.
Note The distribution function $F(x)$ is defined on the real numbers, not on the sample space.
Example 3.1.3 Let $X$ be the number we get from tossing a fair die. Then the distribution function of $X$ is (Figure 3.1.1)
$$F(x) = \begin{cases} 0, & \text{if } x < 1; \\ \dfrac{n}{6}, & \text{if } n \le x < n + 1,\ n = 1, 2, \dots, 5; \\ 1, & \text{if } x \ge 6. \end{cases}$$
Figure 3.1.1 The distribution function in Example 3.1.3
3. Properties
The distribution function $F(x)$ of a random variable $X$ has the following properties:
(1) $F(x)$ is non-decreasing.
Solution
By definition,
$$P(X \le 2000) = F(2000) = 1 - e^{-1} = 0.6321,$$
$$P(1000 < X \le 3000) = P(X \le 3000) - P(X \le 1000) = F(3000) - F(1000) = (1 - e^{-1.5}) - (1 - e^{-0.5}) = 0.3834.$$
Question: What are the probabilities $P(X < 2000)$ and $P(X = 2000)$?
Solution
Let $X_1$ be the total number shown. Then the event $\{X_1 = k\}$ contains $k - 1$ sample points, $k = 2, 3, 4, 5$. Thus
$$P(X_1 = k) = \frac{k - 1}{36}, \quad k = 2, 3, 4, 5,$$
and
$$\{X = -1\} = \bigcup_{k=2}^{5} \{X_1 = k\},$$
so
$$P(X = -1) = \sum_{k=2}^{5} P(X_1 = k) = \frac{5}{18}, \qquad P(X = 1) = 1 - P(X = -1) = \frac{13}{18}.$$
Thus
$$F(x) = P(X \le x) = \begin{cases} 0, & x < -1; \\ \dfrac{5}{18}, & -1 \le x < 1; \\ 1, & x \ge 1. \end{cases}$$
Figure 3.1.2 The distribution function in Example 3.1.5
The distribution function of a random variable connects probability with calculus. By means of the distribution function, the main tools of calculus, such as series and integrals, can be used to solve problems in probability and statistics.
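As a small illustration of how a distribution function is used, the following Python sketch evaluates $F(x)$ for the fair die of Example 3.1.3 and applies the identity $P(a < X \le b) = F(b) - F(a)$ that appears in the solution above. The function name `die_cdf` and the chosen interval are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction
import math


def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die (Example 3.1.3)."""
    if x < 1:
        return Fraction(0)
    if x >= 6:
        return Fraction(1)
    # For n <= x < n + 1 with n = 1, ..., 5 the value is n / 6.
    return Fraction(math.floor(x), 6)


# P(a < X <= b) = F(b) - F(a); here a = 2 and b = 4, i.e. X is 3 or 4.
p_2_to_4 = die_cdf(4) - die_cdf(2)

print(die_cdf(0.5), die_cdf(3.7), die_cdf(6), p_2_to_4)  # 0 1/2 1 1/3
```

Exact fractions are used so that the step heights $n/6$ can be compared directly with the formula.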
3.2 Discrete Random Variables
In this book, we study two kinds of random variables.
Assume a discrete random variable $X$ takes values from the set $\{a_1, a_2, \dots, a_n, \dots\}$. Let
$$P(X = a_n) = p_n, \quad n = 1, 2, \dots \tag{3.2.1}$$
Then we have $p_n \ge 0$, $n = 1, 2, \dots$, and $\sum_n p_n = 1$.
(3.2.1) is called the probability distribution of the discrete random variable $X$.
Note the conditions that the distribution of a random variable $X$ must satisfy:
(1) $p_i \ge 0$;
(2) $p_1 + p_2 + \dots + p_n = 1$.
Distribution function of a discrete random variable
The distribution function of $X$ is given by
$$F(x) = P(X \le x) = \sum_{a_n \le x} p_n \tag{3.2.2}$$
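Solution
$n = 3$, $p = 1/2$

X     p_r
0     1/8
1     3/8
2     3/8
3     1/8

Two-point distribution
A student scores 5 on an exam with probability $p$. Let $X$ be the number of the exam on which he first scores 5. Find the distribution of $X$.
Geometric distribution
Example 3 (firing 5 bullets)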
***** ,6. Find
Solution Assume that
$$P(X = k) = ck, \quad c = \text{constant}, \quad k = 1, 2, \dots, 6.$$
Since the events $\{X = k\}$, $k = 1, 2, \dots, 6$, are mutually exclusive and their union is the certain event, i.e., the sample space $S$, we have
$$\sum_{k=1}^{6} P(X = k) = 21c = 1,$$
thus $c = \frac{1}{21}$. The probability distribution of $X$ is (Figure 3.2.1)
$$P(X = k) = \frac{k}{21}, \quad k = 1, 2, \dots, 6.$$
Figure 3.2.1 Probability distribution in Example 3.2.1
Question. What is the difference between distribution functions and probability distributions?
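Example 2 There is a new method of blood testing: the blood of $k$ people is mixed and tested together. If the result is negative, one test is enough for these $k$ people. If the result is positive, each of the $k$ people must then be tested individually, so that $k + 1$ tests are needed in total for the $k$ people.
Assume that each person tests positive with probability $p$, and that the reactions of different people are independent.
Let $X$ be the number of tests needed per person. Find the distribution of $X$.
Binomial distribution
Example 3.2.2 A fair die is tossed 4 times. Let $X$ be the number of sixes obtained. Find the probability distribution of $X$.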
Solution. The possible values of $X$ are 0, 1, 2, 3, 4.
First we find the probability $P(X = 0)$. The event $X = 0$ means that no six occurs in 4 tosses. The probability that six fails to occur in a single toss is 5/6, and all trials are independent, so
$$P(X = 0) = \left(\frac{5}{6}\right)^4.$$
Now consider the probability $P(X = k)$, $k = 1, 2, 3, 4$. The event $X = k$ means that six occurs exactly $k$ times, and these may occur in any $k$ of the 4 tosses. The event that they occur in a specific order (for example, in the first $k$ tosses) has probability $(5/6)^{4-k}(1/6)^k$, and there are $C_4^k$ such combinations. Thus
$$P(X = k) = C_4^k \,(5/6)^{4-k} (1/6)^k,$$
i.e.
$$P(X = 0) = \frac{625}{1296}, \quad P(X = 1) = \frac{125}{324}, \quad P(X = 2) = \frac{25}{216}, \quad P(X = 3) = \frac{5}{324}, \quad P(X = 4) = \frac{1}{1296}.$$
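The five probabilities in Example 3.2.2 can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper name `binom_pmf` is an assumption introduced here, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_six = Fraction(1, 6)  # probability of a six in one toss of a fair die
dist = [binom_pmf(k, 4, p_six) for k in range(5)]

for k, pk in enumerate(dist):
    print(k, pk)
# 0 625/1296
# 1 125/324
# 2 25/216
# 3 5/324
# 4 1/1296
```

The probabilities sum to 1, as any probability distribution must.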
Binomial Distributions
An experiment often consists of repeated trials, each with two possible outcomes, “success” and “failure”. The most useful application deals with the testing of items as they come off an assembly line, where each test or trial may indicate a defective or a non-defective item. We may choose to define either outcome as a success. Such a process is referred to as a Bernoulli process, and each trial is called a Bernoulli trial.
Consider an experiment that consists of $n$ independent repeated trials, where each trial results in one of two outcomes, “success” or “failure”, and the probability of success, denoted by $p$, remains constant. Then this process is called a Bernoulli process.
The random variable in Example 3.2.2 is an example of a binomial random variable.
Proof First, consider the probability of obtaining $k$ consecutive successes followed by $n - k$ consecutive failures. These $n$ events are independent, therefore the desired probability is $p^k (1 - p)^{n-k}$.
The $k$ successes and $n - k$ failures may occur in any order, and for any specific order the probability is again $p^k (1 - p)^{n-k}$. We must now determine the total number of sample points in the experiment that have $k$ successes and $n - k$ failures. This number is equal to the number of partitions of $n$ outcomes into two groups with $k$ in one group and $n - k$ in the other, i.e. $C_n^k$. Because these partitions are mutually exclusive, we have
$$P(X = k) = b(k; n, p) = C_n^k\, p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \dots, n.$$
Let $q = 1 - p$. The binomial expansion of the expression $(q + p)^n$ gives
$$1 = (q + p)^n = C_n^0 q^n + C_n^1 p q^{n-1} + \dots + C_n^n p^n = b(0; n, p) + b(1; n, p) + \dots + b(n; n, p).$$
Each term corresponds to one value of the binomial distribution; this is the reason it is called the “binomial distribution”.
Example 2
Poisson distribution
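If a random variable $X$ takes values in the set $\{0, 1, 2, \dots\}$, and if
$$P(X = k) = p(k; \lambda) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots,$$
then $X$ is said to have the Poisson distribution with parameter $\lambda$.
Note that
$$\sum_{k=0}^{\infty} P(X = k) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} \cdot e^{\lambda} = 1.$$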
Here are some examples of Poisson random variables:
(a) the number of radioactive particles passing through a counter in a certain time period;
(b) the number of telephone calls received by an office in a certain time period;
(c) the number of bacteria in a given culture;
(d) the number of typing errors per page in a certain book.
Example 3.5.1
Solution The probability is
$$p(6; 4) = \frac{4^6}{6!} e^{-4} = 0.1042.$$
Example 3.5.2
Solution. The probability is
$$P = \sum_{k=0}^{3} p(k; 10) = \sum_{k=0}^{3} \frac{10^k}{k!} e^{-10} = \left(1 + 10 + \frac{100}{2} + \frac{1000}{6}\right) e^{-10} = 0.0103.$$
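The two Poisson computations above, together with the fact that the Poisson probabilities sum to 1, can be reproduced numerically. This is a minimal sketch; the function name `poisson_pmf` is an assumption introduced for illustration, and the infinite sum is truncated at 100 terms.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """p(k; lam) = lam**k / k! * exp(-lam)."""
    return lam**k / factorial(k) * exp(-lam)


p_6_4 = poisson_pmf(6, 4)                              # solution of Example 3.5.1
p_le3_10 = sum(poisson_pmf(k, 10) for k in range(4))   # solution of Example 3.5.2
total = sum(poisson_pmf(k, 4) for k in range(100))     # truncated sum over all k

print(round(p_6_4, 4), round(p_le3_10, 4))  # 0.1042 0.0103
```

The truncated total differs from 1 only by floating-point rounding, since the terms beyond $k = 100$ are negligible for $\lambda = 4$.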
Homework
Chapter 3 (P47) 1, 2, 3, 5,7, 21
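Relation between the binomial and Poisson distributions
Theorem 3.5.2
$$b(k; n, p_n) \to p(k; \lambda), \quad \text{when } n \to \infty \text{ and } n p_n \to \lambda.$$
Proof ****
Solution Let $X$ be the number of forms that contain an error. Then $X$ has the binomial distribution with parameters $n = 5000$ and $p = 0.001$. Using the Poisson distribution with $\lambda = np = 5$ as an approximation, we have
$$b(6; 5000, 0.001) \approx p(6; 5) = \frac{5^6}{6!} e^{-5} = 0.1462;$$
$$b(7; 5000, 0.001) \approx p(7; 5) = \frac{5^7}{7!} e^{-5} = 0.1044;$$
$$b(8; 5000, 0.001) \approx p(8; 5) = \frac{5^8}{8!} e^{-5} = 0.0653.$$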
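To see how close the Poisson approximation is in this example, one can compute the exact binomial probabilities next to the approximate ones. The sketch below is illustrative only; the dictionary name `results` is an assumption introduced here.

```python
from math import comb, exp, factorial

n, p = 5000, 0.001
lam = n * p  # Poisson parameter, lambda = np = 5

results = {}
for k in (6, 7, 8):
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)   # b(k; n, p)
    approx = lam**k / factorial(k) * exp(-lam)       # p(k; lambda)
    results[k] = (exact, approx)
    print(k, round(exact, 4), round(approx, 4))
```

For these parameters the two columns agree closely, which is why the much simpler Poisson formula is used in the solution.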
Examples of applications of the binomial distribution
Solution In this case, $n = 5$, $p = 0.15$.
(a) The probability is
$$b(2; 5, 0.15) = C_5^2 (0.15)^2 (0.85)^3 = 0.1382.$$
(b) The desired probability is the sum of the probabilities of getting 2, 3, 4, or 5 defective articles. Alternatively, we may first find the probability of the complementary event, i.e., getting 0 or 1 defective article. So, if we denote the number of defective articles by $X$, then
$$P(X \ge 2) = 1 - P(X \le 1) = 1 - 0.85^5 - C_5^1 (0.15)(0.85)^4 = 0.1648. \quad \square$$
Solution (a) The probability is
$$b(3; 6, 0.7) = C_6^3 (0.7)^3 (0.3)^3 = 0.1852.$$
(b) In $n$ shots, the probability that he hits the target at least once is
$$1 - b(0; n, 0.7) = 1 - 0.3^n.$$
Since $1 - 0.3^n \ge 0.95$ when $n \ge 3$, in 3 shots the probability that he hits the target at least once is
$$1 - 0.3^3 = 0.973 > 0.95.$$
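The threshold $n = 3$ in part (b) can be found by a direct search. The sketch below assumes, as read off from the solution, a hit probability of 0.7 per shot and a required probability of 0.95 for at least one hit; the function name `min_shots` is hypothetical.

```python
def min_shots(p_hit, target):
    """Smallest n with 1 - (1 - p_hit)**n >= target."""
    if not (0 < p_hit <= 1 and 0 <= target < 1):
        raise ValueError("need 0 < p_hit <= 1 and 0 <= target < 1")
    n = 1
    while 1 - (1 - p_hit) ** n < target:
        n += 1
    return n


print(min_shots(0.7, 0.95))  # 3
```

The guard on the arguments keeps the loop finite: with `target < 1` and `p_hit > 0` the probability $1 - (1 - p)^n$ eventually exceeds the target.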
3.3 Expectation and Variance
1. Expectation (mean)
Suppose that in the final exam you got 85 in calculus, 90 in algebra and 83 in statistics. Then your average score is $(85 + 90 + 83)/3 = 86$.
Consider the future games. Since we cannot predict the outcome of a game, we cannot predict the exact amount he will win in the game. But we can predict the average amount he will win. Assume he tosses the die 600 times. On average, six will occur 100 times, so the average amount he will win per toss is
$$\frac{1}{600}\bigl(11 \times 100 + (-1) \times 500\bigr) = 11 \times \frac{1}{6} + (-1) \times \frac{5}{6} = 1.$$
We say that on average he will win 1 dollar per toss.
Notice
In the case that $X$ takes values from an infinite set of numbers, (3.3.1) becomes an infinite series. If the series converges absolutely, we say that the expectation $E(X)$ exists; otherwise we say that the expectation of $X$ does not exist.
Example 3.3.1
Solution Since $X$ takes values from the set $\{1, 2, 3, 4, 5, 6\}$ and the distribution is
$$P(X = k) = \frac{1}{6}, \quad k = 1, 2, \dots, 6,$$
we have
$$E(X) = \sum_{k=1}^{6} k \cdot \frac{1}{6} = \frac{7}{2}.$$
If a discrete random variable assumes each of its values with equal probability, we say that its probability distribution is a discrete uniform distribution. The distribution in Example 3.3.1 is a discrete uniform distribution.
Example 3.3.2
Solution Let $X$ be the amount the player wins. Then $X$ takes values from the set $\{1, 2, \dots, n, \dots\}$. The player wins $k$ dollars if and only if he gets $k - 1$ tails first, followed by a head. Thus
$$P(X = k) = \frac{1}{2^k},$$
so
$$E(X) = \sum_{k=1}^{\infty} k \cdot \frac{1}{2^k} = 2.$$
Expectation of the binomial distribution
Now we give the expectation and variance of the binomial distribution.
Proof Consider the identity
$$(px + q)^n = \sum_{k=0}^{n} C_n^k (px)^k q^{n-k} = \sum_{k=0}^{n} C_n^k p^k q^{n-k} x^k.$$
Regard $p, q$ as constants and $x$ as a variable. Differentiating both sides of this identity with respect to $x$, we have
$$np(px + q)^{n-1} = \sum_{k=1}^{n} k C_n^k p^k q^{n-k} x^{k-1}. \tag{3.4.3}$$
Putting $x = 1$ (recall that $p + q = 1$), we get
$$\sum_{k=1}^{n} k C_n^k p^k q^{n-k} = np.$$
But $P(X = k) = C_n^k p^k q^{n-k}$, thus
$$E(X) = \sum_{k=0}^{n} k P(X = k) = \sum_{k=1}^{n} k C_n^k p^k q^{n-k} = np.$$
Differentiating both sides of (3.4.3) once more,
$$n(n-1)p^2 (px + q)^{n-2} = \sum_{k=2}^{n} k(k-1) C_n^k p^k q^{n-k} x^{k-2}.$$
Putting $x = 1$, we get
$$\sum_{k=2}^{n} k(k-1) C_n^k p^k q^{n-k} = n(n-1)p^2. \tag{3.4.4}$$
Since $k^2 = k(k-1) + k$, adding (3.4.4) to the identity obtained from (3.4.3) at $x = 1$ gives
$$D(X) = E(X^2) - (E(X))^2 = \sum_{k=0}^{n} k^2 P(X = k) - (np)^2 = \sum_{k=1}^{n} k^2 C_n^k p^k q^{n-k} - (np)^2$$
$$= n(n-1)p^2 + np - (np)^2 = np - np^2 = np(1 - p).$$
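The formulas $E(X) = np$ and $D(X) = np(1 - p)$ can be checked numerically by summing directly over the binomial distribution. The sketch below is only a sanity check under assumed example values $n = 10$ and $p = 0.3$; the function name `binom_mean_var` is hypothetical.

```python
from math import comb


def binom_mean_var(n, p):
    """Mean and variance of b(k; n, p), computed by direct summation."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean**2


n, p = 10, 0.3  # example values chosen for illustration
mean, var = binom_mean_var(n, p)

print(mean, n * p)            # both close to 3
print(var, n * p * (1 - p))   # both close to 2.1
```

The direct sums and the closed-form expressions agree up to floating-point rounding.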
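Expectation of the Poisson distribution
Proof By the definition,
$$E(X) = \sum_{k=0}^{\infty} k P(X = k) = \sum_{k=0}^{\infty} k \cdot \frac{\lambda^k}{k!} e^{-\lambda} = \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} e^{-\lambda}$$
$$= \sum_{k=0}^{\infty} \frac{\lambda^{k+1}}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \lambda.$$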
Homework chapter 3 8, 9, 10, 22, 27, 30 2008-3-19
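Blood test problem
The expectation of the number of tests $X$ per person is
$$E(X) = \frac{1}{k} q^k + \left(1 + \frac{1}{k}\right)(1 - q^k) = 1 - q^k + \frac{1}{k},$$
where $q = 1 - p$. The average number of tests needed for $N$ people is
$$N\left(1 - q^k + \frac{1}{k}\right).$$
Hence, as long as $k$ is chosen so that
$$1 - q^k + \frac{1}{k} < 1,$$
the average number of tests needed for $N$ people is less than $N$. When $p$ is fixed, we choose $k$ so that $L = 1 - q^k + \frac{1}{k}$ is less than 1 and attains its minimum; this gives the best grouping.
For example, if $p = 0.1$, then $q = 0.9$, and $L = 1 - q^k + \frac{1}{k}$ attains its minimum at $k = 4$, which gives the best grouping. If $N = 1000$ and groups of $k = 4$ are used, then under the second scheme the average number of tests needed is only
$$1000\left(1 - 0.9^4 + \frac{1}{4}\right) = 594.$$
Thus, on average, the workload is reduced by about 40%.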
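Supplementary example: statement of the problem
A factory needs to purchase 1000 tons of raw material within five weeks. The price of the raw material is estimated to be 500 yuan with probability 0.3, 600 yuan with probability 0.3, and 700 yuan with probability 0.4. Find the optimal purchasing strategy, i.e. the one that minimizes the expected purchase price.
Question for thought
If you knew the raw material prices for all 5 weeks in advance, you would of course buy all the raw material at the lowest price. What is the expected price in that case?
The expectation of discrete random variables has the following properties.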
(a) If $P(X = a) = 1$ for some constant $a \in \mathbb{R}$, then $E(X) = a$.
(b) Let $g(X)$ be a function of $X$. Then
$$E(g(X)) = \sum_x g(x) P(X = x) \tag{3.3.2}$$
(c) If $X_1, X_2, \dots, X_n$ are discrete random variables, then
$$E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n) \tag{3.3.3}$$
(d) If $P(X \ge 0) = 1$, then $E(X) \ge 0$.
(e) If $P(X \ge 0) = 1$ and $E(X) = 0$, then $P(X = 0) = 1$.
(f) For any constant $b$,
$$E(bX) = bE(X) \tag{3.3.4}$$
(g) Schwarz's inequality. Let $X, Y$ be random variables. Then
$$(E(XY))^2 \le E(X^2) E(Y^2) \tag{3.3.5}$$
The equality holds iff $P(X = 0) = 1$ or $P(Y = aX) = 1$ for some constant $a$.
Proof (a) By the definition,
$$E(X) = \sum_x x P(X = x).$$
In the summation, $P(X = a) = 1$ for the term $x = a$, and $P(X = x) = 0$ for the other terms $x \ne a$; thus $E(X) = a$.
(b), (c) The proofs are given in advanced probability theory, so they are omitted here.
(d) If $P(X \ge 0) = 1$, then in the summation
$$E(X) = \sum_x x P(X = x)$$
each term is non-negative, thus $E(X) \ge 0$.
(e) If $P(X \ge 0) = 1$ and $E(X) = 0$, then in the summation
$$E(X) = \sum_x x P(X = x)$$
the left side is 0 and each term on the right side is non-negative, so each term is 0. Thus, for the terms with $x \ne 0$, we must have $P(X = x) = 0$. This means $P(X = 0) = 1$.
(f) Setting $g(X) = bX$ in (b), we have
$$E(bX) = \sum_x bx P(X = x) = b \sum_x x P(X = x) = bE(X).$$
(g) Assume $P(X = 0) \ne 1$; then $E(X^2) > 0$. Consider the variable $Z = Y - sX$, where $s$ is a constant. Then
$$0 \le E(Z^2) = E\bigl((Y - sX)^2\bigr) = E(Y^2 - 2sXY + s^2 X^2) = E(Y^2) - 2sE(XY) + s^2 E(X^2).$$
Putting $s = E(XY)/E(X^2)$, we get $0 \le E(Y^2) - (E(XY))^2 / E(X^2)$, i.e.
$$(E(XY))^2 \le E(X^2) E(Y^2).$$
If $P(X = 0) = 1$, then $E(XY) = E(X^2) = 0$, and the equality holds in (3.3.5). If $P(Y = aX) = 1$, then
$$(E(XY))^2 = (E(aX^2))^2 = a^2 (E(X^2))^2 = E(X^2) E(Y^2),$$
so the equality holds. On the other hand, if the equality holds in (3.3.5) but $P(X = 0) \ne 1$, then $E(Z^2) = 0$ for $s = E(XY)/E(X^2)$, and by (e) we must have
$$P(Y = sX) = 1, \quad s = E(XY)/E(X^2).$$
2. Variance
Besides the expectation of a random variable, we are also interested in other quantities related to it. Let us consider an example.
The average nicotine content for both districts is the same: 24 milligrams.
But the manufacturer prefers the tobacco from district 1, because its nicotine content has smaller dispersion than that of district 2, i.e., it is more stable.
To measure the dispersion of a data set $X_1, X_2, \dots, X_n$ whose average is
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n),$$
we use the quantity called the variance, denoted by $\sigma^2$ and defined as
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
For example, if $\sigma_1^2$ and $\sigma_2^2$ are the variances of the nicotine content for districts 1 and 2, respectively, then
$$\sigma_1^2 = \frac{1}{5}\bigl((24-24)^2 + (27-24)^2 + (25-24)^2 + (22-24)^2 + (22-24)^2\bigr) = 3.6,$$
$$\sigma_2^2 = \frac{1}{5}\bigl((28-24)^2 + (27-24)^2 + (25-24)^2 + (20-24)^2 + (20-24)^2\bigr) = 11.6.$$
For many purposes it is desirable that a measure of dispersion be expressed in the same unit as the original data, so the square root of the variance, called the standard deviation, is used. Thus for the data set $X_1, X_2, \dots, X_n$, the standard deviation is
$$\sigma = \left(\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right)^{1/2}.$$
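The variance and standard deviation of the two nicotine data sets can be computed directly from the definition. In the sketch below the data values are read off from the variance computation in the text; the function name `variance` and the variable names are assumptions introduced for illustration.

```python
from math import sqrt


def variance(data):
    """Variance (1/n) * sum((x - mean)**2) of a data set."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


district_1 = [24, 27, 25, 22, 22]
district_2 = [28, 27, 25, 20, 20]

var_1, var_2 = variance(district_1), variance(district_2)
print(var_1, var_2)                                  # 3.6 11.6
print(round(sqrt(var_1), 2), round(sqrt(var_2), 2))  # 1.9 3.41
```

Both data sets have mean 24, but the second has the larger variance, which matches the remark that district 1 is more stable.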
To measure the dispersion of a random variable, we also use the variance and the standard deviation.
The variance of a random variable has the following properties.
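Proof (a) Let $\mu = E(X)$. Then
$$D(X) = E\bigl((X - \mu)^2\bigr) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2$$
$$= E(X^2) - 2(E(X))^2 + (E(X))^2 = E(X^2) - (E(X))^2.$$
(b)
$$E(aX + b) = \sum_x (ax + b) P(X = x) = a \sum_x x P(X = x) + b \sum_x P(X = x) = aE(X) + b = a\mu + b.$$
Thus
$$D(aX + b) = E\bigl(((aX + b) - (a\mu + b))^2\bigr) = E\bigl(a^2 (X - \mu)^2\bigr) = a^2 E\bigl((X - \mu)^2\bigr) = a^2 D(X).$$
(c) Since $P\bigl((X - \mu)^2 \ge 0\bigr) = 1$ and $E\bigl((X - \mu)^2\bigr) = D(X) = 0$, by Theorem 3.3.1(e) we have $P(X - \mu = 0) = 1$, i.e. $P(X = E(X)) = 1$.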
Example 3.3.3 A die is tossed. Find the variance and standard deviation of the number of spots $X$ shown, if
(a) this die is a fair die, i.e. the probability distribution of $X$ is
$$P(X = k) = \frac{1}{6}, \quad k = 1, 2, \dots, 6;$$
(b) the probability distribution of $X$ is

k          1     2     3     4     5     6
P(X = k)   0.1   0.1   0.1   0.2   0.2   0.3
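Solution (a) By Example 3.3.1, $E(X) = \frac{7}{2}$. Thus
$$D(X) = E(X^2) - (E(X))^2 = \sum_{k=1}^{6} k^2 P(X = k) - \left(\frac{7}{2}\right)^2 = \frac{35}{12},$$
and the standard deviation is $\sigma = \sqrt{35/12} \approx 1.71$.
(b)
$$E(X) = 1 \cdot 0.1 + 2 \cdot 0.1 + 3 \cdot 0.1 + 4 \cdot 0.2 + 5 \cdot 0.2 + 6 \cdot 0.3 = 4.2,$$
$$E(X^2) = \sum_{k=1}^{6} k^2 P(X = k) = 20.4,$$
$$D(X) = E(X^2) - (E(X))^2 = 20.4 - 4.2^2 = 2.76,$$
and the standard deviation is $\sigma = \sqrt{2.76} = 1.66$.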
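The computation in part (b) follows the pattern $D(X) = E(X^2) - (E(X))^2$, which is easy to mechanize for any finite distribution. The sketch below is an illustration; the function name `mean_var` and the dictionary `loaded_die` are assumptions, with the probabilities taken from the expression for $E(X)$ in the solution.

```python
from math import sqrt


def mean_var(dist):
    """E(X) and D(X) = E(X^2) - E(X)^2 for a distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    return mean, second_moment - mean**2


loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.2, 5: 0.2, 6: 0.3}
m, v = mean_var(loaded_die)

print(round(m, 2), round(v, 2), round(sqrt(v), 2))  # 4.2 2.76 1.66
```

Calling `mean_var({k: 1/6 for k in range(1, 7)})` gives the fair-die values of part (a) in the same way, up to floating-point rounding.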
Example 3.3.5 Find the variance of the amount won by the player in Example 3.3.2 (where $P(X = k) = \frac{1}{2^k}$).
Solution We have $E(X) = 2$, and
$$D(X) = E(X^2) - (E(X))^2 = \sum_{k=1}^{\infty} k^2 \cdot \frac{1}{2^k} - 2^2 = 6 - 4 = 2.$$
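The two series used here, $\sum k/2^k = 2$ and $\sum k^2/2^k = 6$, converge quickly, so partial sums give a simple numerical check. The sketch below truncates both series at 200 terms, which is an arbitrary choice made for illustration.

```python
# Partial sums of the series for E(X) and E(X^2) when P(X = k) = 1 / 2**k.
terms = range(1, 201)  # truncation point chosen for illustration

mean = sum(k / 2**k for k in terms)
second_moment = sum(k * k / 2**k for k in terms)
var = second_moment - mean**2

print(mean, second_moment, var)
```

The printed values should be very close to 2, 6 and 2, since the omitted tail of each series is far below floating-point precision.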
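Expectation and variance of the Poisson distribution
Proof By the definition,
$$E(X) = \sum_{k=0}^{\infty} k P(X = k) = \sum_{k=0}^{\infty} k \cdot \frac{\lambda^k}{k!} e^{-\lambda} = \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} e^{-\lambda}$$
$$= \sum_{k=0}^{\infty} \frac{\lambda^{k+1}}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \lambda.$$
Similarly,
$$E(X^2) = \sum_{k=0}^{\infty} k^2 P(X = k) = \sum_{k=1}^{\infty} k^2 \cdot \frac{\lambda^k}{k!} e^{-\lambda} = \sum_{k=1}^{\infty} \bigl(k(k-1) + k\bigr) \frac{\lambda^k}{k!} e^{-\lambda}$$
$$= \sum_{k=2}^{\infty} \frac{\lambda^k}{(k-2)!} e^{-\lambda} + \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} e^{-\lambda} = \lambda^2 + \lambda.$$
Therefore
$$D(X) = E(X^2) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$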
Variance of the binomial distribution