Maximum Likelihood Estimation (MLE) is a method to estimate unknown parameters of a probability distribution or statistical model.  

The principle: choose the parameter values that make the observed data most likely.

Definition

Suppose we have independent observations $x_1, x_2, \ldots, x_n$ from a distribution with parameter $\theta$ and probability density/mass function $f(x \mid \theta)$.

The likelihood function is:

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

The MLE is the parameter value that maximizes $L(\theta)$:

$$\hat{\theta} = \arg\max_{\theta}\, L(\theta)$$

Often, we maximize the log-likelihood instead:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
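
To make the definition concrete, here is a minimal numerical sketch in Python (assuming NumPy and SciPy are available; the exponential model, the true rate, and the sample size are illustrative choices rather than anything specified above). It maximizes the log-likelihood numerically and compares the result with the closed-form MLE $1/\bar{x}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)   # illustrative sample, true rate = 0.5

def neg_log_likelihood(rate):
    # Exponential log-likelihood: n*log(rate) - rate*sum(x_i), negated for minimization
    return -(len(data) * np.log(rate) - rate * data.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print("numerical MLE     :", result.x)
print("closed form 1/mean:", 1.0 / data.mean())
```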

Fisher Information

The Fisher Information $I(\theta)$ measures how much information an observable random variable $X$ carries about an unknown parameter $\theta$.

It is defined as:

$$I(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^{2}\right]$$

Equivalently (under standard regularity conditions):

$$I(\theta) = -\,\mathbb{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X \mid \theta)\right]$$
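
As a rough numerical check of these two expressions (a sketch only; the Bernoulli model, the value of $p$, and the Monte Carlo sample size are illustrative assumptions), both expectations can be approximated by simulation and compared with the known value $I(p) = 1/\big(p(1-p)\big)$ for a single Bernoulli observation:

```python
import numpy as np

# Monte Carlo check of the two Fisher information definitions for Bernoulli(p).
# For one Bernoulli observation, log f(x | p) = x*log(p) + (1-x)*log(1-p).
p = 0.3
rng = np.random.default_rng(1)
x = rng.binomial(1, p, size=200_000)

score = x / p - (1 - x) / (1 - p)             # d/dp   log f(x | p)
second = -x / p**2 - (1 - x) / (1 - p)**2     # d^2/dp^2 log f(x | p)

print("E[score^2]           :", np.mean(score**2))
print("-E[second derivative]:", -np.mean(second))
print("analytic 1/(p(1-p))  :", 1.0 / (p * (1.0 - p)))
```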

Connection to MLE

  • For large samples, the MLE is approximately normally distributed:

    $$\hat{\theta} \;\approx\; \mathcal{N}\!\left(\theta,\ \frac{1}{n\, I(\theta)}\right)$$

  • The Fisher Information thus determines the asymptotic variance of the MLE.

  • A higher $I(\theta)$ means the data provide more information about $\theta$, leading to a more precise estimate (see the sketch below).
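
A small simulation (NumPy only; the Bernoulli model, $p$, $n$, and the number of repetitions are illustrative assumptions) compares the empirical spread of the MLE across repeated experiments with $1/\sqrt{n\, I(\theta)}$:

```python
import numpy as np

# For Bernoulli(p), I(p) = 1/(p(1-p)), so the MLE's standard deviation
# should be close to sqrt(p(1-p)/n) = 1/sqrt(n*I(p)) for large n.
rng = np.random.default_rng(2)
p_true, n, reps = 0.3, 400, 20_000

p_hat = rng.binomial(n, p_true, size=reps) / n     # MLE = sample proportion, repeated
predicted_sd = np.sqrt(p_true * (1 - p_true) / n)  # = 1/sqrt(n*I(p))

print("empirical sd of MLE:", p_hat.std())
print("1/sqrt(n*I(p))     :", predicted_sd)
```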

Example

Suppose we flip a coin $n$ times and observe $X$ heads.

Let $p$ = the probability of heads.

The likelihood is:

$$L(p) = \binom{n}{X} p^{X} (1-p)^{\,n-X}$$

Log-likelihood:

$$\ell(p) = \log\binom{n}{X} + X \log p + (n - X) \log(1 - p)$$

Differentiate and solve:

$$\frac{d\ell}{dp} = \frac{X}{p} - \frac{n - X}{1 - p} = 0 \quad\Longrightarrow\quad \hat{p} = \frac{X}{n}$$

So the MLE of $p$ is just the sample proportion of heads.
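
A short sketch (assuming NumPy and SciPy; the counts $n = 100$ and $X = 37$ are made-up numbers, not data from the text) recovers the same answer by maximizing the log-likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Coin-flip example: n flips, X heads. The closed-form MLE is X/n; we also
# recover it by numerically maximizing the log-likelihood (binomial constant dropped).
n, X = 100, 37

def neg_log_likelihood(p):
    return -(X * np.log(p) + (n - X) * np.log(1 - p))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE:", result.x)
print("X / n        :", X / n)
```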

Properties of MLE

  • Consistency: $\hat{\theta} \xrightarrow{\;p\;} \theta$ as $n \to \infty$ (illustrated in the sketch after this list).  
  • Asymptotic normality: For large $n$, $\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{\;d\;} \mathcal{N}\!\left(0,\ 1/I(\theta)\right)$, where $I(\theta)$ is the Fisher information.  
  • Efficiency: Achieves the lowest possible asymptotic variance (the Cramér–Rao lower bound).
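
The sketch below (NumPy only; the exponential model, true rate, and sample sizes are illustrative assumptions) shows consistency in action, with the estimation error shrinking as $n$ grows:

```python
import numpy as np

# Consistency: the exponential-rate MLE, 1 / sample mean, drifts toward
# the true rate as the sample size grows.
rng = np.random.default_rng(3)
true_rate = 0.5

for n in [10, 100, 1_000, 10_000, 100_000]:
    sample = rng.exponential(scale=1 / true_rate, size=n)
    mle = 1 / sample.mean()
    print(f"n = {n:>6}  MLE = {mle:.4f}  |error| = {abs(mle - true_rate):.4f}")
```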

MLE Table for Various Distributions

All rows assume $n$ i.i.d. observations $x_1, \ldots, x_n$ with $\bar{x} = \frac{1}{n}\sum_i x_i$ (the Binomial row uses a single count $x$ out of $n$ trials); the parameterization used is noted in each row where it matters.

| Distribution | Likelihood | Log-Likelihood | First Derivative | Second Derivative | MLE |
| --- | --- | --- | --- | --- | --- |
| Bernoulli($p$) | $p^{\sum x_i}(1-p)^{n-\sum x_i}$ | $\sum x_i \log p + (n-\sum x_i)\log(1-p)$ | $\frac{\sum x_i}{p} - \frac{n-\sum x_i}{1-p}$ | $-\frac{\sum x_i}{p^2} - \frac{n-\sum x_i}{(1-p)^2}$ | $\hat{p} = \bar{x}$ |
| Binomial($n, p$), $x$ successes | $\binom{n}{x} p^{x}(1-p)^{n-x}$ | $\log\binom{n}{x} + x\log p + (n-x)\log(1-p)$ | $\frac{x}{p} - \frac{n-x}{1-p}$ | $-\frac{x}{p^2} - \frac{n-x}{(1-p)^2}$ | $\hat{p} = x/n$ |
| Poisson($\lambda$) | $\prod_i \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ | $-n\lambda + \sum x_i \log\lambda - \sum \log x_i!$ | $-n + \frac{\sum x_i}{\lambda}$ | $-\frac{\sum x_i}{\lambda^2}$ | $\hat{\lambda} = \bar{x}$ |
| Uniform($a, b$) | $(b-a)^{-n}$ for $a \le x_i \le b$ | $-n\log(b-a)$ | no interior zero | — | $\hat{a} = \min_i x_i,\ \hat{b} = \max_i x_i$ |
| Geometric($p$), trials to first success | $p^{n}(1-p)^{\sum x_i - n}$ | $n\log p + (\sum x_i - n)\log(1-p)$ | $\frac{n}{p} - \frac{\sum x_i - n}{1-p}$ | $-\frac{n}{p^2} - \frac{\sum x_i - n}{(1-p)^2}$ | $\hat{p} = 1/\bar{x}$ |
| Normal($\mu, \sigma^2$) | $\prod_i \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_i-\mu)^2/(2\sigma^2)}$ | $-\frac{n}{2}\log(2\pi\sigma^2) - \frac{\sum (x_i-\mu)^2}{2\sigma^2}$ | $\frac{\partial \ell}{\partial \mu} = \frac{\sum (x_i-\mu)}{\sigma^2}$ | $\frac{\partial^2 \ell}{\partial \mu^2} = -\frac{n}{\sigma^2}$ | $\hat{\mu} = \bar{x},\ \hat{\sigma}^2 = \frac{1}{n}\sum (x_i-\bar{x})^2$ |
| Exponential($\lambda$), rate | $\lambda^{n} e^{-\lambda \sum x_i}$ | $n\log\lambda - \lambda\sum x_i$ | $\frac{n}{\lambda} - \sum x_i$ | $-\frac{n}{\lambda^2}$ | $\hat{\lambda} = 1/\bar{x}$ |
| Gamma($\alpha, \beta$), $\alpha$ known, rate $\beta$ | $\prod_i \frac{\beta^{\alpha} x_i^{\alpha-1} e^{-\beta x_i}}{\Gamma(\alpha)}$ | $n\alpha\log\beta + (\alpha-1)\sum\log x_i - \beta\sum x_i - n\log\Gamma(\alpha)$ | $\frac{n\alpha}{\beta} - \sum x_i$ (w.r.t. $\beta$) | $-\frac{n\alpha}{\beta^2}$ (w.r.t. $\beta$) | $\hat{\beta} = \alpha/\bar{x}$ |
| Neg. Binomial($r, p$), $r$ known, $x$ = failures before the $r$th success | $\prod_i \binom{x_i + r - 1}{x_i} p^{r}(1-p)^{x_i}$ | $\mathrm{const} + nr\log p + \sum x_i\log(1-p)$ | $\frac{nr}{p} - \frac{\sum x_i}{1-p}$ | $-\frac{nr}{p^2} - \frac{\sum x_i}{(1-p)^2}$ | $\hat{p} = \frac{r}{r + \bar{x}}$ |
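
As a quick sanity check on the table (a sketch assuming NumPy and SciPy; the Poisson rate and sample size are illustrative), the Poisson row can be verified by comparing the numerical maximizer of the log-likelihood with the closed-form MLE $\hat{\lambda} = \bar{x}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Poisson(lambda): the log-likelihood maximizer should match the sample mean.
rng = np.random.default_rng(4)
data = rng.poisson(3.2, size=1_000)

def neg_log_likelihood(lam):
    # Drops the constant sum(log(x_i!)) term, which does not affect the maximizer
    return -(data.sum() * np.log(lam) - len(data) * lam)

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20.0), method="bounded")
print("numerical MLE:", result.x)
print("sample mean  :", data.mean())
```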

Summary

  • MLE finds parameter values that maximize the likelihood of the observed data.  
  • Simple to apply to many models, though it sometimes requires numerical optimization.  
  • Forms the basis for many other statistical methods (Wald test, likelihood ratio test, confidence intervals).
  • Fisher Information quantifies the precision of those estimates.