Maximum Likelihood Estimation (MLE) is a method to estimate unknown parameters of a probability distribution or statistical model.
The principle: choose the parameter values that make the observed data most likely.
Definition
Suppose we have independent observations $x_1, \dots, x_n$ from a distribution with parameter $\theta$ and probability density/mass function $f(x \mid \theta)$.
The likelihood function is $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$. The MLE is the parameter value that maximizes $L(\theta)$: $\hat{\theta} = \arg\max_{\theta} L(\theta)$.
Often, we maximize the log-likelihood instead: $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$.
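As a small illustration (not part of the original notes), the following Python sketch maximizes a log-likelihood numerically for a sample assumed to follow an Exponential(λ) model with rate parameterization; the data values and the use of `scipy.optimize.minimize_scalar` are illustrative choices, not the only way to do this.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical sample, assumed i.i.d. Exponential(rate = lam)
x = np.array([0.8, 1.3, 0.2, 2.1, 0.9, 1.7])

def neg_log_likelihood(lam):
    # log L(lam) = n*log(lam) - lam * sum(x); we minimize its negative
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print("numerical MLE of the rate:", res.x)
print("closed form 1 / mean:     ", 1 / x.mean())
```

The numerical optimum agrees with the closed-form MLE $1/\bar{x}$ that appears later in the table.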
Fisher Information
The Fisher Information measures how much information an observable random variable carries about an unknown parameter.
It is defined as the expected squared score: $I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^{2}\right]$.
Equivalently, under standard regularity conditions: $I(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X \mid \theta)\right]$.
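As a quick numerical check (added here, not in the original), both expressions can be approximated by Monte Carlo for a Bernoulli(p) model, where the closed form is $I(p) = \frac{1}{p(1-p)}$; the chosen $p$ and sample size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                   # arbitrary true parameter
x = rng.binomial(1, p, size=200_000)      # Monte Carlo draws from Bernoulli(p)

# First definition: expected squared score, d/dp log f(x|p) = x/p - (1-x)/(1-p)
score = x / p - (1 - x) / (1 - p)
# Second definition: minus the expected second derivative of log f(x|p)
neg_second = x / p**2 + (1 - x) / (1 - p) ** 2

print("E[score^2]        ~", np.mean(score**2))
print("E[-d2/dp2 log f]  ~", np.mean(neg_second))
print("closed form 1/(p(1-p)):", 1 / (p * (1 - p)))
```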
Connection to MLE
- For large samples, the MLE is approximately normally distributed: $\hat{\theta} \;\dot\sim\; N\!\left(\theta, \tfrac{1}{n\,I(\theta)}\right)$ (a simulation sketch follows this list).
- The Fisher Information thus determines the variance of the MLE.
- A higher $I(\theta)$ means the data provide more information about $\theta$, leading to a more precise estimate.
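The following sketch (an illustrative addition, assuming a Bernoulli(p) model with arbitrary $p$, $n$, and replication count) simulates the sampling distribution of the MLE and compares its spread with $1/\sqrt{n\,I(\theta)}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 500, 10_000             # arbitrary parameter, sample size, replications

# For Bernoulli data the MLE of p is the sample mean (see the coin example below)
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

fisher = 1 / (p * (1 - p))                # Fisher information of a single Bernoulli trial
print("empirical std of the MLE:", p_hat.std())
print("1 / sqrt(n * I(p)):      ", 1 / np.sqrt(n * fisher))
```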
Example
Suppose we flip a coin $n$ times and observe $k$ heads.
Let $p$ = probability of heads.
The likelihood is: $L(p) = \binom{n}{k} p^{k} (1-p)^{n-k}$
Log-likelihood: $\ell(p) = \log\binom{n}{k} + k \log p + (n - k)\log(1-p)$
Differentiate and solve: $\frac{d\ell}{dp} = \frac{k}{p} - \frac{n-k}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{k}{n}$
So the MLE of $p$ is just the sample proportion of heads.
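A minimal sketch of this example (my addition; the counts $n = 100$, $k = 37$ are hypothetical) confirms that numerical maximization of the log-likelihood recovers $k/n$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, k = 100, 37   # hypothetical data: 37 heads in 100 flips

def neg_log_likelihood(p):
    # binomial coefficient omitted: it does not depend on p
    return -(k * np.log(p) + (n - k) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE:", res.x)    # ~0.37
print("k / n:        ", k / n)
```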
Properties of MLE
- Consistency: $\hat{\theta}_n \xrightarrow{\;p\;} \theta$ as $n \to \infty$ (a short simulation sketch follows this list).
- Asymptotic normality: For large $n$, $\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{\;d\;} N\!\left(0, \tfrac{1}{I(\theta)}\right)$, where $I(\theta)$ is the Fisher information.
- Efficiency: Achieves the lowest possible asymptotic variance (the Cramér–Rao lower bound).
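A short sketch illustrating consistency (an added example, assuming an Exponential model with an arbitrarily chosen true rate): the MLE $1/\bar{x}$ approaches the true value as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0                                 # arbitrary true rate of an Exponential model

for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=1 / lam, size=n)
    print(f"n = {n:>7}   MLE 1/mean = {1 / x.mean():.4f}   (true rate {lam})")
```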
MLE Table for Various Distributions
All rows assume $n$ i.i.d. observations $x_1, \dots, x_n$ (for the Binomial, a single count $x$ of successes in $n$ trials); where a distribution has two parameters, derivatives are taken with respect to the parameter indicated, with the other held fixed. A few of the closed-form MLEs are spot-checked numerically in the sketch after the table.

Distribution | Likelihood | Log-Likelihood | First Derivative | Second Derivative | MLE |
---|---|---|---|---|---|
Bernoulli(p) | $p^{\sum x_i}(1-p)^{n-\sum x_i}$ | $\sum x_i \log p + (n-\sum x_i)\log(1-p)$ | $\frac{\sum x_i}{p} - \frac{n-\sum x_i}{1-p}$ | $-\frac{\sum x_i}{p^2} - \frac{n-\sum x_i}{(1-p)^2}$ | $\hat{p} = \bar{x}$ |
Binomial(n, p) | $\binom{n}{x} p^{x}(1-p)^{n-x}$ | $\log\binom{n}{x} + x\log p + (n-x)\log(1-p)$ | $\frac{x}{p} - \frac{n-x}{1-p}$ | $-\frac{x}{p^2} - \frac{n-x}{(1-p)^2}$ | $\hat{p} = x/n$ |
Poisson(λ) | $\frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod x_i!}$ | $\sum x_i \log\lambda - n\lambda - \sum\log(x_i!)$ | $\frac{\sum x_i}{\lambda} - n$ | $-\frac{\sum x_i}{\lambda^2}$ | $\hat{\lambda} = \bar{x}$ |
Uniform(a, b) | $(b-a)^{-n}$ for all $x_i \in [a,b]$ | $-n\log(b-a)$ | not zero at any interior point (maximum on the boundary) | — | $\hat{a} = \min_i x_i$, $\hat{b} = \max_i x_i$ |
Geometric(p) | $p^{n}(1-p)^{\sum x_i - n}$ | $n\log p + (\sum x_i - n)\log(1-p)$ | $\frac{n}{p} - \frac{\sum x_i - n}{1-p}$ | $-\frac{n}{p^2} - \frac{\sum x_i - n}{(1-p)^2}$ | $\hat{p} = 1/\bar{x}$ |
Normal(μ, σ²) | $(2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2}$ | $-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i-\mu)^2$ | $\frac{1}{\sigma^2}\sum(x_i-\mu)$ (w.r.t. μ) | $-\frac{n}{\sigma^2}$ (w.r.t. μ) | $\hat{\mu} = \bar{x}$, $\hat{\sigma}^2 = \frac{1}{n}\sum(x_i-\bar{x})^2$ |
Exponential(λ) | $\lambda^{n} e^{-\lambda\sum x_i}$ | $n\log\lambda - \lambda\sum x_i$ | $\frac{n}{\lambda} - \sum x_i$ | $-\frac{n}{\lambda^2}$ | $\hat{\lambda} = 1/\bar{x}$ |
Gamma(α, β) | $\prod_i \frac{\beta^{\alpha}}{\Gamma(\alpha)} x_i^{\alpha-1} e^{-\beta x_i}$ | $n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum\log x_i - \beta\sum x_i$ | $\frac{n\alpha}{\beta} - \sum x_i$ (w.r.t. β) | $-\frac{n\alpha}{\beta^2}$ (w.r.t. β) | $\hat{\beta} = \alpha/\bar{x}$ (α known) |
Neg. Binomial(r, p) | $\prod_i \binom{x_i + r - 1}{x_i} p^{r}(1-p)^{x_i}$ | $\text{const} + nr\log p + \sum x_i\log(1-p)$ | $\frac{nr}{p} - \frac{\sum x_i}{1-p}$ | $-\frac{nr}{p^2} - \frac{\sum x_i}{(1-p)^2}$ | $\hat{p} = \frac{r}{r+\bar{x}}$ (r known) |
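As a rough spot-check (an added sketch; the distribution parameters and sample size are arbitrary), a few of the closed-form MLEs above can be verified by simulation.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000                                # arbitrary sample size

# Poisson(lambda): table MLE is the sample mean
x_pois = rng.poisson(lam=4.2, size=N)
print("Poisson   MLE (sample mean):", x_pois.mean())

# Geometric(p): table MLE is 1 / sample mean
x_geom = rng.geometric(p=0.25, size=N)
print("Geometric MLE (1 / mean):   ", 1 / x_geom.mean())

# Uniform(a, b): table MLEs are the sample minimum and maximum
x_unif = rng.uniform(-1.0, 3.0, size=N)
print("Uniform   MLEs (min, max):  ", x_unif.min(), x_unif.max())
```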
Summary
- MLE finds parameter values that maximize the likelihood of the observed data.
- It is simple to apply to many models, though it sometimes requires numerical optimization.
- Forms the basis for many other statistical methods (Wald test, likelihood ratio test, confidence intervals).
- Fisher Information quantifies the precision of those estimates.