Maximum Likelihood Estimation (BUFN 758O, S2013)

Maximum Likelihood Estimator (MLE)
In general, the probability of the observed data, say X_1, ..., X_T, depends on the underlying parameter, say θ. This probability is denoted by p(X_1, ..., X_T | θ).
Viewed as a function of the parameter θ, this probability is called the likelihood function and is denoted by L(θ) ≡ p(X_1, ..., X_T | θ).
The value of the parameter θ that maximizes the likelihood function L(θ) is called the Maximum Likelihood Estimator (MLE).
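To make the definition concrete, here is a minimal numerical sketch (added for illustration, not part of the original slides): for ten coin tosses with 3 tails, the Bernoulli likelihood is evaluated on a grid of candidate values of p and the maximizer is read off. The particular data vector and the use of numpy are assumptions made only for this example.

```python
# Minimal sketch (illustrative, not from the slides): treat the likelihood
# L(theta) = p(X_1, ..., X_T | theta) as a function of theta and maximize it.
# Here the data are i.i.d. coin tosses (1 = tails), so theta is the tails
# probability p and the likelihood factorizes into Bernoulli terms.
import numpy as np

def log_likelihood(p, data):
    """Bernoulli log-likelihood: sum_t [X_t log(p) + (1 - X_t) log(1 - p)]."""
    return np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

data = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])   # hypothetical sample: 3 tails in 10 tosses
grid = np.linspace(0.001, 0.999, 999)              # candidate values of p
values = np.array([log_likelihood(p, data) for p in grid])
p_hat = grid[values.argmax()]
print(p_hat)   # close to 3/10, the value derived analytically in the coin-tossing example
```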
Maximum Likelihood Estimation
BUFN 758O
Prof. Skoulakis
Example: Coin Tossing
Consider tossing a (not necessarily fair) coin N times. The probability of tails is p = P[T], while the probability of heads is q = P[H] = 1 − p. The parameter of interest is p.
Suppose that the coin is tossed 10 times and we observe 3 tails. Suppose there are two candidate values of the parameter p to be considered: 1/3 and 2/3. Which one is more reasonable?
Example: Coin Tossing (cont’d)
The derivative of F(p) is
\[
  F'(p) = \frac{3}{p} - \frac{7}{1-p} = \frac{3 - 10p}{p(1-p)}.
\]
Setting the derivative F'(p) equal to 0, we obtain the estimate p̂ = 3/10.
In general, if we observe k tails in N trials, then the probability of the observed outcome is
\[
  L(p) = \binom{N}{k} p^k (1-p)^{N-k}.
\]
To maximize this probability (likelihood), we need to maximize
\[
  F(p) = k \log(p) + (N-k) \log(1-p).
\]
Setting
\[
  F'(p) = \frac{k}{p} - \frac{N-k}{1-p} = \frac{k - Np}{p(1-p)} = 0,
\]
we obtain the estimator p̂ = k/N.
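As a quick numerical check of the general result (an illustrative sketch, not from the slides; it assumes scipy is available), one can maximize F(p) numerically for given k and N and compare with k/N:

```python
# Numerical check (illustrative): maximizing F(p) = k log(p) + (N - k) log(1 - p)
# yields a value very close to the closed-form MLE p_hat = k / N.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(p, k, N):
    """Negative binomial log-likelihood, dropping the constant binomial coefficient."""
    return -(k * np.log(p) + (N - k) * np.log(1 - p))

k, N = 3, 10   # the observed counts from the coin-tossing example
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                         args=(k, N), method="bounded")
print(result.x, k / N)   # both approximately 0.3
```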
Maximum Likelihood Estimator (MLE)
The ML estimator has a number of nice properties. Under regularity conditions, the MLE is consistent and asymptotically normal. The asymptotic variance is the inverse of the Fisher information matrix I(θ), where
\[
  I(\theta) = -\mathrm{E}\!\left[ \frac{\partial^2}{\partial\theta\,\partial\theta'} \log f(X;\theta) \right]
            = \mathrm{E}\!\left[ \frac{\partial}{\partial\theta} \log f(X;\theta)\, \frac{\partial}{\partial\theta'} \log f(X;\theta) \right].
\]
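As a worked illustration (added here, not on the original slide), the Fisher information for a single Bernoulli observation with tails probability p can be computed directly and ties back to the coin-tossing example:

```latex
% Worked illustration (not on the original slide): Fisher information for one
% Bernoulli(p) observation, i.e. a single coin toss with tails probability p.
\[
  \log f(X;p) = X\log p + (1-X)\log(1-p),
  \qquad
  \frac{\partial^2}{\partial p^2}\log f(X;p) = -\frac{X}{p^2} - \frac{1-X}{(1-p)^2}.
\]
% Taking expectations and using E[X] = p:
\[
  I(p) = -\mathrm{E}\!\left[\frac{\partial^2}{\partial p^2}\log f(X;p)\right]
       = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)},
\]
% so the asymptotic variance of the MLE based on N independent tosses is
% I(p)^{-1}/N = p(1-p)/N, which matches the exact variance of \hat{p} = k/N.
```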
Example: Normal Distribution
Let X_1, ..., X_T be a random sample from a normal distribution with mean µ and variance σ². The density of the N(µ, σ²) distribution is
\[
  \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(X-\mu)^2}{2\sigma^2} \right).
\]
Likelihood function:
\[
  L(\mu,\sigma^2) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(X_t-\mu)^2}{2\sigma^2} \right).
\]
The log-likelihood function is
\[
  \ell(\mu,\sigma^2) = -\frac{T}{2}\log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(X_t-\mu)^2.
\]
Example: Normal Distribution (cont’d)
Given the MLE of µ, the MLE of σ² is obtained by maximizing
\[
  \ell(\hat\mu_{MLE}, \sigma^2) = -\frac{T}{2}\log\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(X_t-\bar X)^2.
\]
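A small numerical sketch of this step (illustrative only; the simulated sample, its parameters, and the use of numpy/scipy are assumptions): maximize the concentrated log-likelihood over σ² and compare with the average squared deviation.

```python
# Illustrative sketch (not from the slides): maximize the concentrated
# log-likelihood l(mu_hat_MLE, sigma2) over sigma2 for a simulated sample.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # hypothetical sample, T = 500
xbar = x.mean()                                # mu_hat_MLE = sample mean
T = x.size

def neg_concentrated_loglik(sigma2):
    """Negative of l(mu_hat_MLE, sigma2) as a function of sigma2 alone."""
    return 0.5 * T * np.log(2 * np.pi * sigma2) + np.sum((x - xbar) ** 2) / (2 * sigma2)

result = minimize_scalar(neg_concentrated_loglik, bounds=(1e-3, 50.0), method="bounded")
print(result.x, np.mean((x - xbar) ** 2))   # the two values agree (approximately)
```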
Example: Coin Tossing (cont’d)
Idea: select the value of p under which the observed outcome is more likely.
Let X be the number of tails in N trials. The random variable X takes values 0, 1, ..., N and follows the binomial distribution with probability function
\[
  \mathrm{P}[X = k] = \binom{N}{k} p^k (1-p)^{N-k}, \qquad k = 0, \dots, N.
\]
In our example, if p = 1/3, then the observed outcome has probability
\[
  \binom{10}{3} \left(\tfrac{1}{3}\right)^{3} \left(\tfrac{2}{3}\right)^{7} = \binom{10}{3} \frac{2^{7}}{3^{10}}.
\]
Moreover, if p = 2/3, then the observed outcome has probability
\[
  \binom{10}{3} \left(\tfrac{2}{3}\right)^{3} \left(\tfrac{1}{3}\right)^{7} = \binom{10}{3} \frac{2^{3}}{3^{10}}.
\]
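These two probabilities can also be checked numerically (an illustrative sketch assuming scipy is available):

```python
# Illustrative check: probability of observing 3 tails in 10 tosses under the
# two candidate values of p, using the binomial probability mass function.
from scipy.stats import binom

for p in (1 / 3, 2 / 3):
    print(p, binom.pmf(k=3, n=10, p=p))
# p = 1/3 gives about 0.260, p = 2/3 gives about 0.016,
# so the observed outcome is far more likely when p = 1/3.
```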
Example: Normal Distribution (cont'd)
As in the case of known variance, the log-likelihood function is maximized with respect to µ at µ = X̄, and so µ̂_MLE = X̄. This is the case regardless of the value of σ².
Example: Normal Distribution with known variance
Let X_1, ..., X_T be a random sample from a normal distribution with mean µ and variance 1. The density of the N(µ, 1) distribution is
\[
  \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(X-\mu)^2}{2} \right).
\]
The likelihood function is
\[
  L(\mu) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(X_t-\mu)^2}{2} \right).
\]
The log-likelihood function is
\[
  \ell(\mu) = \log L(\mu) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}(X_t-\mu)^2.
\]
Note that
\[
  \sum_{t=1}^{T}(X_t-\mu)^2 = \sum_{t=1}^{T}(X_t-\bar X)^2 + T(\bar X-\mu)^2,
\]
and so the MLE is µ̂_MLE = X̄ (the usual sample mean).
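The decomposition used above follows from adding and subtracting X̄; a short derivation (added for completeness, not on the original slide) is:

```latex
% Derivation of the decomposition (added for completeness):
\[
  \sum_{t=1}^{T}(X_t-\mu)^2
  = \sum_{t=1}^{T}\bigl[(X_t-\bar X)+(\bar X-\mu)\bigr]^2
  = \sum_{t=1}^{T}(X_t-\bar X)^2 + T(\bar X-\mu)^2,
\]
% since the cross term 2(\bar X-\mu)\sum_{t=1}^{T}(X_t-\bar X) vanishes because
% \sum_{t=1}^{T}(X_t-\bar X) = 0. The first term does not involve \mu, so
% \ell(\mu) is maximized by setting T(\bar X-\mu)^2 to zero, i.e. at \mu = \bar X.
```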
Example: Normal Distribution (cont'd)
The first (partial) derivative of ℓ(µ̂_MLE, σ²) with respect to σ² is
\[
  \frac{\partial}{\partial\sigma^2}\,\ell(\hat\mu_{MLE},\sigma^2) = -\frac{T}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(X_t-\bar X)^2.
\]
It follows that
\[
  \hat\sigma^2_{MLE} = \frac{1}{T}\sum_{t=1}^{T}(X_t-\bar X)^2.
\]
Both µ̂_MLE and σ̂²_MLE are consistent and asymptotically normal, but µ̂_MLE is also unbiased while σ̂²_MLE is not, since E[σ̂²_MLE] = ((T − 1)/T) σ².
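A small simulation (illustrative only; the sample size, variance, seed, and number of replications are arbitrary choices, and numpy is assumed) can be used to check that µ̂_MLE is unbiased while σ̂²_MLE carries the bias factor (T − 1)/T:

```python
# Simulation sketch (illustrative): mu_hat_MLE is unbiased, while
# sigma2_hat_MLE is biased downward by the factor (T - 1) / T in finite samples.
import numpy as np

rng = np.random.default_rng(1)
T, sigma2, n_reps = 10, 4.0, 200_000
mu_hats = np.empty(n_reps)
sigma2_hats = np.empty(n_reps)
for i in range(n_reps):
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=T)
    mu_hats[i] = x.mean()                          # MLE of mu
    sigma2_hats[i] = np.mean((x - x.mean()) ** 2)  # MLE of sigma^2

print(mu_hats.mean())                              # approximately 0 (unbiased)
print(sigma2_hats.mean(), (T - 1) / T * sigma2)    # both approximately 3.6
```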
Example: Coin Tossing (cont’d)
Hence, the observed outcome is more likely under p = 1/3, which we conclude to be the more reasonable selection for p.
But we do not have to focus on just two possibilities. Following the same logic, we can ask what value of p makes the observed outcome most likely. In other words, what value of p maximizes
\[
  L(p) = \binom{10}{3} p^{3} (1-p)^{7}\,?
\]
Since the logarithmic function is strictly increasing, and ignoring the constant binomial coefficient term, it suffices to maximize
\[
  F(p) = \log\!\left[ p^{3} (1-p)^{7} \right] = 3\log(p) + 7\log(1-p).
\]