Fix a variance $\sigma^2 > 0$ and let $n > 0$ and $\mu_1, \mu_2, \ldots, \mu_n \in \mathbb{R}$. Given $m = m_1 + m_2 + \cdots + m_n$ samples

$$

\begin{split}

X_{11}, X_{12}, \ldots, X_{1m_1} &\sim N(\mu_1, \sigma^2), \\

X_{21}, X_{22}, \ldots, X_{2m_2} &\sim N(\mu_2, \sigma^2), \\

&\vdots \\

X_{n1}, X_{n2}, \ldots, X_{nm_n} &\sim N(\mu_n, \sigma^2),

\end{split}

$$

where all samples are independent and $m_1, m_2, \ldots, m_n > 1$. What is the best estimator for the variance $\sigma^2$? In other words, we draw samples from $n$ normal distributions with the same variance but not necessarily equal means. What is the best way to estimate this common variance?

# Main result

Surprisingly, there is an answer. It is surprising because we are looking for the **best** estimator for the variance, and a priori there is no guarantee that a best estimator exists. Moreover, the answer is simple, though a bit harder to prove. Let $X = (X_{ij})$ denote the random vector of all samples $\{X_{ij}\}$. The unique uniformly minimum-variance unbiased estimator (UMVUE) for $\sigma^2$ is given by

$$

S(X) = \frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2,

$$

where $\overline{X}_i = \frac{1}{m_i} \sum_{j = 1}^{m_i} X_{ij}$. In other words, among all unbiased estimators for $\sigma^2$, $S(X)$ has the smallest variance for every value of the parameters.
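The formula translates almost literally into code. The following is a minimal sketch in Python; the function name `pooled_variance` and the toy data in the usage line are assumptions made for illustration, not part of the original text.

```python
from statistics import fmean


def pooled_variance(groups):
    """Estimate the common variance of several groups with possibly different means.

    `groups` is a list of lists of numbers; every group needs at least two values.
    """
    if any(len(g) < 2 for g in groups):
        raise ValueError("every group needs at least two samples")
    n = len(groups)                    # number of groups
    m = sum(len(g) for g in groups)    # total number of samples
    total = 0.0
    for g in groups:
        group_mean = fmean(g)
        # Squared deviations from the mean of the group the sample belongs to.
        total += sum((x - group_mean) ** 2 for x in g)
    return total / (m - n)


# Hypothetical toy data: two groups far apart in mean but with similar spread.
print(pooled_variance([[1.0, 3.0], [10.0, 12.0, 14.0]]))
```

Because each sample is centered at its own group mean, the large gap between the two groups in the usage line does not inflate the estimate.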

# An example

Fix $\sigma = 0.35$. Suppose

$$

\begin{split}

X_1 &= (1.69, 1.35, 1.75, 1.45, 1.77) \sim N(1.5, \sigma^2), \\

X_2 &= (2.27, 1.73, 1.46) \sim N(2.0, \sigma^2), \\

X_3 &= (-0.28, 0.32, 0.69, 0.14) \sim N(0.1, \sigma^2)

\end{split}

$$

We compute

$$

\begin{split}

m &= |X_1| + |X_2| + |X_3| = 5 + 3 + 4 = 12, \quad n = 3, \\

\overline{X}_1 &\approx 1.60, \quad \overline{X}_2 \approx 1.82, \quad \overline{X}_3 \approx 0.22.

\end{split}

$$

According to the main result, we estimate the variance $\sigma^2$ as follows:

$$

\begin{split}

\frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2

&= \frac{1}{12 - 3} \big( (1.69 - 1.60)^2 + (1.35 - 1.60)^2 + (1.75 - 1.60)^2 \\

&\qquad + (1.45 - 1.60)^2 + (1.77 - 1.60)^2 + (2.27 - 1.82)^2 \\

&\qquad + (1.73 - 1.82)^2 + (1.46 - 1.82)^2 + (-0.28 - 0.22)^2 \\

&\qquad + (0.32 - 0.22)^2 + (0.69 - 0.22)^2 + (0.14 - 0.22)^2 \big) \\

&\approx 0.108.

\end{split}

$$

Then the standard deviation is estimated by $\sqrt{0.108} \approx 0.329$, compared with the true value $\sigma = 0.35$. Not bad!
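The same computation can be checked by machine. The sketch below uses an equivalent way of writing the estimator, namely as the average of the per-group sample variances weighted by $m_i - 1$. It relies only on Python's standard `statistics` module; the variable names are chosen here for illustration. It recomputes $m$, $n$, the variance estimate and its square root directly from the data, without rounding the group means first.

```python
from math import sqrt
from statistics import variance

x1 = [1.69, 1.35, 1.75, 1.45, 1.77]
x2 = [2.27, 1.73, 1.46]
x3 = [-0.28, 0.32, 0.69, 0.14]
groups = [x1, x2, x3]

m = sum(len(g) for g in groups)   # total number of samples
n = len(groups)                   # number of groups

# statistics.variance divides by len(g) - 1, so (len(g) - 1) * variance(g)
# is the sum of squared deviations from the group mean.
s_squared = sum((len(g) - 1) * variance(g) for g in groups) / (m - n)
sigma_hat = sqrt(s_squared)

print(m, n, round(s_squared, 3), round(sigma_hat, 3))
```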

# The proof

Before we can start with the proof of the main result, we have to introduce some definitions and theorems from statistics.

## Sufficient statistic

A statistic $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if the conditional probability distribution of the data $X$, given the statistic $T(X)$, does not depend on the parameter $\theta$. The Fisher-Neyman factorization theorem gives the following equivalent characterization of sufficiency.

Let $f_{\theta}(x)$ be the probability density function for $X$. Then $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if there exist non-negative functions $g$ and $h$ such that

$$

f_{\theta}(x) = h(x) g_{\theta}(T(x)),

$$

where $h(x)$ does not depend on $\theta$, and $g_{\theta}(T(x))$ depends on $x$ only through $T(x)$.
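As a small worked illustration (a toy case assumed here, simpler than the setting of the main result), take $k$ independent samples $X_1, X_2, \ldots, X_k \sim N(\mu, 1)$ with unknown parameter $\theta = \mu$. Expanding the square in the joint density gives

$$

f_{\mu}(x) = \underbrace{(2\pi)^{-\frac{k}{2}} \mathrm{exp} \left( -\frac{1}{2} \sum_j x_j^2 \right)}_{h(x)} \cdot \underbrace{\mathrm{exp} \left( \mu \sum_j x_j - \frac{k}{2} \mu^2 \right)}_{g_{\mu}(T(x))}, \qquad T(x) = \sum_j x_j,

$$

so the sum of the samples is a sufficient statistic for $\mu$ in this toy case.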

## Complete statistic

A statistic $T(X)$ is complete if and only if for every measurable function $g$

$$

\big( \forall \theta: E_{\theta}(g(T)) = 0 \big) \Rightarrow \big( \forall \theta: P_{\theta}(g(T) = 0) = 1 \big).

$$

In other words, if the expected value of $g(T)$ is zero for every $\theta$, then $g(T) = 0$ almost surely for every $\theta$.
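To see how completeness can fail, consider a toy case (assumed here for illustration): two independent samples $X_1, X_2 \sim N(\mu, 1)$ with $\theta = \mu$ and the statistic $T(X) = (X_1, X_2)$. For $g(t_1, t_2) = t_1 - t_2$ we get

$$

E_{\mu}(g(T)) = \mu - \mu = 0 \quad \text{for every } \mu, \qquad \text{but} \qquad P_{\mu}(g(T) = 0) = P_{\mu}(X_1 = X_2) = 0.

$$

So $T = (X_1, X_2)$ is not complete for this family: it still carries information, such as the difference $X_1 - X_2$, whose distribution does not depend on $\mu$.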

The Lehmann-Scheffé theorem tells us when a function of a sufficient and complete statistic is a UMVUE. Let $X = (X_1, X_2, \ldots, X_n)$ be random samples from a distribution with probability density function $f_{\theta}(x)$. Suppose $T$ is a sufficient and complete statistic for $\theta$. If an estimator $\phi(T(X))$, which depends on the data only through $T$, satisfies $E_{\theta}(\phi(T(X))) = \tau(\theta)$ for every $\theta$, then $\phi(T(X))$ is the unique uniformly minimum-variance unbiased estimator (UMVUE) for $\tau(\theta)$. In particular, if $E_{\theta}(T(X)) = \theta$, then $T(X)$ itself is the UMVUE for $\theta$.
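A standard toy application (assumed here for illustration, not part of the main result): for $k$ independent samples $X_1, X_2, \ldots, X_k \sim N(\mu, 1)$, the sum $T(X) = \sum_j X_j$ is a sufficient and complete statistic for $\mu$. The sample mean is a function of $T$ and is unbiased,

$$

\overline{X} = \frac{1}{k} T(X), \qquad E_{\mu}(\overline{X}) = \frac{1}{k} \sum_j E_{\mu}(X_j) = \mu,

$$

so by the Lehmann-Scheffé theorem $\overline{X}$ is the UMVUE for $\mu$. The proof of the main result follows the same pattern with a more elaborate statistic.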

## Lemma 1

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, the statistics $S(X_1, X_2) = X_1^2 + X_2^2$ and $T(X_1, X_2) = X_1 + X_2$ are complete.

### Proof

We prove the completeness of $T$. The completeness of $S$ is left as an exercise. Write $\theta = (\mu_1, \mu_2, \sigma^2)$ and let $g$ be a measurable function such that, for every $\theta$,

$$

E_{\theta}(g(T(X))) = E_{\theta}(g(X_1 + X_2)) = 0.

$$

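The probability density of $T(X_1, X_2)$, a sum of two independent normal variables, is

$$

f_{\mu_1 + \mu_2, 2\sigma^2}(t) = (4\pi\sigma^2)^{-\frac{1}{2}} \mathrm{exp} \left( - \frac{(t - \mu_1 - \mu_2)^2}{4 \sigma^2} \right).

$$

Therefore

$$

0 = E_{\theta}(g(T(X))) = (4\pi\sigma^2)^{-\frac{1}{2}} \int g(t) \, \mathrm{exp} \left(-\frac{(t - \mu_1 - \mu_2)^2}{4 \sigma^2}\right) dt.

$$

Expanding the square and dropping the factors that do not depend on $t$, this says that $\int g(t) \, e^{-t^2 / (4\sigma^2)} \, e^{t (\mu_1 + \mu_2) / (2\sigma^2)} \, dt = 0$ for every value of $\mu_1 + \mu_2 \in \mathbb{R}$. In other words, the two-sided Laplace transform of $t \mapsto g(t) \, e^{-t^2 / (4\sigma^2)}$ vanishes identically, so by uniqueness of the Laplace transform this function is $0$ almost everywhere. Since the $\mathrm{exp}$ term is always greater than $0$, it follows that $g(t) = 0$ almost everywhere. Therefore $P_{\theta}(g(T) = 0) = 1$.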

## Lemma 2

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, $\ldots$, $X_n \sim N(\mu_n, \sigma^2)$, the statistics $\sum_i X_i$ and $\sum_i X_i^2$ are complete.

### Proof

Both statements follow by induction on $n$, applying Lemma 1.

## Proof of the main result

Let $X_{ij}$ be as defined in the main result, and let $X = (X_{ij})$ be the random vector of all samples, with joint probability density

$$

\begin{split}

f(x) &= f_{\mu_1, \mu_2, \ldots, \mu_n, \sigma^2}(x) \\

&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} (x_{ij} - \mu_i)^2 \right) \\

&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_i m_i \mu_i^2\right) \\

&\qquad \times \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} x_{ij}^2\right) \mathrm{exp} \left(\sum_i \frac{\mu_i}{\sigma^2} \sum_j x_{ij}\right).

\end{split}

$$

Notice that the equation above satisfies the requirements of the Fisher-Neyman factorization for

$$

T(X) = \left( \sum_{j} X_{1j}, \sum_{j} X_{2j}, \ldots, \sum_{j} X_{nj}, \sum_{i,j} X_{ij}^2 \right).

$$

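Therefore $T(X)$ is a sufficient statistic for $(\mu_1, \mu_2, \ldots, \mu_n, \sigma^2)$. The density above is an exponential family in $T(x)$ whose natural parameters $(\mu_1 / \sigma^2, \ldots, \mu_n / \sigma^2, -1 / (2\sigma^2))$ range over the open set $\mathbb{R}^n \times (-\infty, 0)$, so $T(X)$ is a complete statistic (this is the joint version of Lemmas 1 and 2). Define the statistic $S$ from $T$ as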

$$

\begin{split}

S(X) &= \frac{1}{m - n} \left( \sum_{i,j} X_{i,j}^2 - \sum_i \frac{1}{m_i} \Big( \sum_j X_{i,j} \Big)^2 \right) \\

&= \frac{1}{m - n} \sum_i \left( \sum_j X_{i,j}^2 - m_i \overline{X}_i^2 \right) \\

&= \frac{1}{m - n} \sum_i \sum_j (X_{i,j} - \overline{X}_i)^2 \\

&= \frac{1}{m - n} \sum_i \left( \sum_j (X_{i,j} - \mu_i)^2 - m_i (\overline{X}_i - \mu_i)^2 \right).

\end{split}

$$
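The algebra behind these rewritings is easy to get wrong, so a quick numerical check may help. The sketch below draws random groups and evaluates three forms of $S$: centered at the group means, written in terms of the components of $T(X)$ as $\sum_j X_{i,j}^2 - \frac{1}{m_i} (\sum_j X_{i,j})^2$, and centered at the true means with the correction $m_i (\overline{X}_i - \mu_i)^2$. The means, group sizes and seed are arbitrary values assumed for illustration.

```python
import random
from statistics import fmean

random.seed(0)
mus = [1.5, 2.0, 0.1]   # assumed true group means
sizes = [5, 3, 4]       # assumed group sizes m_i
sigma = 0.35
groups = [[random.gauss(mu, sigma) for _ in range(k)] for mu, k in zip(mus, sizes)]

m, n = sum(sizes), len(sizes)

# Form 1: squared deviations from the group means.
form_centered = sum(
    sum((x - fmean(g)) ** 2 for x in g) for g in groups
) / (m - n)

# Form 2: only sums of squares and group sums, i.e. a function of T(X).
form_sufficient = sum(
    sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups
) / (m - n)

# Form 3: deviations from the true means, corrected by m_i * (mean - mu_i)^2.
form_true_mean = sum(
    sum((x - mu) ** 2 for x in g) - len(g) * (fmean(g) - mu) ** 2
    for g, mu in zip(groups, mus)
) / (m - n)

print(form_centered, form_sufficient, form_true_mean)
```

The three printed numbers should agree up to floating-point rounding. The third form is the one that makes the expectation easy to compute, even though it cannot be evaluated in practice because the $\mu_i$ are unknown.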

Since $S$ is a function of $T$, it depends on the data only through a complete and sufficient statistic. Moreover, $S$ is unbiased:

$$

\begin{split}

E(S) &= \frac{1}{m - n} \sum_i \left( \sum_j E((X_{i,j} - \mu_i)^2) - m_i E((\overline{X}_i - \mu_i)^2) \right) \\

&= \frac{1}{m - n} \sum_i \left( \sum_j \sigma^2 - m_i \frac{\sigma^2}{m_i} \right) \\

&= \frac{1}{m - n} \sum_i (m_i - 1) \sigma^2 = \sigma^2,

\end{split}

$$

because, by independence of the samples (so that all covariances vanish),

$$

\begin{split}

E((X_{i,j} - \mu_i)^2) &= \mathrm{Var}(X_{i,j}) = \sigma^2, \\

E((\overline{X}_i - \mu_i)^2) &= \mathrm{Var}(\overline{X}_i) \\

&= \frac{1}{m_i^2} \left(\sum_j \mathrm{Var}(X_{i,j}) + \sum_{j \neq k} \mathrm{Cov}(X_{i,j}, X_{i,k}) \right) \\

&= \frac{1}{m_i^2} \sum_j \sigma^2 = \frac{\sigma^2}{m_i}.

\end{split}

$$

Therefore $S$ is an unbiased estimator for $\sigma^2$ that is a function of the complete, sufficient statistic $T$. From the Lehmann-Scheffé theorem it follows that $S$ is the UMVUE for $\sigma^2$. QED
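A simulation cannot prove the theorem, but it can make the statement tangible. The sketch below repeatedly draws data with the means, group sizes and $\sigma$ of the example above and compares $S$ with one competitor chosen here for illustration: the plain (unweighted) average of the per-group sample variances, which is also unbiased. The helper names `sample_variance` and `simulate`, the number of trials and the seed are assumptions of this sketch.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance (divides by len(xs) - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def simulate(trials=20000, seed=1):
    rng = random.Random(seed)
    mus, sizes, sigma = [1.5, 2.0, 0.1], [5, 3, 4], 0.35
    m, n = sum(sizes), len(sizes)
    pooled, unweighted = [], []
    for _ in range(trials):
        variances = [
            sample_variance([rng.gauss(mu, sigma) for _ in range(k)])
            for mu, k in zip(mus, sizes)
        ]
        # S(X): per-group sample variances weighted by m_i - 1.
        pooled.append(sum((k - 1) * v for k, v in zip(sizes, variances)) / (m - n))
        # Competitor: every group gets the same weight.
        unweighted.append(sum(variances) / n)
    return pooled, unweighted


pooled, unweighted = simulate()
print("true sigma^2            :", 0.35 ** 2)
print("mean of S               :", sum(pooled) / len(pooled))
print("mean of competitor      :", sum(unweighted) / len(unweighted))
print("variance of S           :", sample_variance(pooled))
print("variance of competitor  :", sample_variance(unweighted))
```

Both averages should land close to $\sigma^2 = 0.1225$, while the spread of $S$ across trials should come out smaller than that of the competitor, in line with the minimum-variance claim.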