# The estimator for the variance of $m$ normal distributions

Fix a variance $\sigma^2 > 0$ and let $n > 0$ and $\mu_1, \mu_2, …, \mu_n \in \mathbb{R}$. Given $m = m_1 + m_2 + … + m_n$ samples
$$\begin{split} X_{11}, X_{12}, …, X_{1m_1} &\sim N(\mu_1, \sigma^2), \\ X_{21}, X_{22}, …, X_{2m_2} &\sim N(\mu_2, \sigma^2), \\ &\vdots \\ X_{n1}, X_{n2}, …, X_{nm_n} &\sim N(\mu_n, \sigma^2), \\ \end{split}$$
where $m_1, m_2, …, m_n > 1$. What is the best estimator for the variance $\sigma^2$? In other words, we draw samples from $n$ normal distributions with the same variance, but with not necessary equal means. What is the best way to estimate the variance for the $n$ normal distributions?

# Main result

Surprisingly there is an answer. It is surprisingly, because we are looking for the best estimator for the variance. Beforehand there is no grantee that a best solution actually exists. Moreover the answer is simple, though a bit harder to prove. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$. The unique uniformly minimum-variance unbiased estimator (UMVUE) for $\sigma^2$ is given by
$$S(X) = \frac{1}{m – n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} – \overline{X}_{i})^2,$$
where $\overline{X}_i = \frac{1}{m_i} \sum_{j = 1}^{m_i} X_{ij}$. In other words, the best estimator for $\sigma^2$ is $S(X)$.

# An example

Fix $\sigma = 0.35$. Suppose
$$\begin{split} X_1 &= (1.69, 1.35, 1.75, 1.45, 1.77) \sim N(1.5, \sigma^2), \\ X_2 &= (2.27, 1.73, 1.46) \sim N(2.0, \sigma^2), \\ X_3 &= (-0.28, 0.32, 0.69, 0.14) \sim N(0.1, \sigma^2) \end{split}$$
We compute
$$\begin{split} m &= |X_1| + |X_2| + |X_3| = 5 + 3 + 4 = 11, n = 3, \\ \overline{X}_1 &\approx 1.60, \overline{X}_2 \approx 1.82, \overline{X}_3 \approx 0.21. \end{split}$$
According to the main result we approximate the variance $\sigma^2$ as follows
$$\begin{split} \frac{1}{m – n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} – \overline{X}_{i})^2 &= \frac{1}{11 – 3} ( (1.69 – 1.6)^2 + (1.35 – 1.6)^2 + (1.75 – 1.6)^2 \\ &\qquad + (1.45 – 1.6)^2 + (1.77 – 1.6)^2 + (2.27 – 1.82)^2 \\ &\qquad + (1.73 – 1.82)^2 + (1.46 – 1.82)^2 + (-0.28 – 0.21)^2 \\ &\qquad + (0.32 – 0.21)^2 + (0.69 – 0.21)^2 + (0.14 – 0.21)^2) \\ &\approx 0.12 \end{split}$$
Then the standard deviation is approximated by $\sigma \approx 0.346$. Not bad!

# The proof

Before we can start with the proof of the main result we have to introduce some Definitions and Theorems from statistics.

## Sufficient statistic

A statistic T(X) is sufficient for the underlying parameter $\theta$ if and only if the conditional probability distribution of the data $X$, given statistic $T(X)$, doesn’t depend on parameter $\theta$. We have the following equivalent definition for sufficiency of a statistic $T(X)$.

## Theorem (Fisher-Neyman factorization)

Let $f_{\theta}(x)$ be the probability density function for $X$. $T(X)$ is sufficient for underlying parameter $\theta$ if and only if there exists non-negative functions $g$ and $h$ such that
$$f_{\theta}(x) = h(x) g_{\theta}(T(x)),$$
where $h(x)$ doesn’t depend on $\theta$, and $g_{\theta}(T(x))$ depends only on $\theta$ and $T(x)$, but not on $x$.

## Complete statistic

A statistic T(X) is complete if and only if for every distribution of the data $X$ and every measurable function $g$
$$\forall \theta: E_{\theta}(g(T)) = 0 \rightarrow \forall \theta: P_{\theta}(g(T) = 0) = 1.$$
In other words, if the expected value of $g(T)$ is zero, $g(T) = 0$ almost everywhere on $X$.

## Theorem (Lehmann-Scheffe)

The Lehmann-Scheffe theorem provides us the tool to prove when a sufficient and complete statistic for $\theta$ is a UMVUE for $\theta$. Let $X = (X_1, X_2, …, X_n)$ be random samples from a distribution with probability distribution $f_{\theta}(x)$. Suppose $T$ is a sufficient and complete statistic for $\theta$. If $E(T(X)) = \theta$, then $T(X)$ is the unique uniformly minimum-variance unbiased estimator (UMVUE) for $\theta$.

## Lemma 1

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, then the statistic $S(X_1, X_2) = X_1 X_2$ and $T(X_1, X_2) = X_1 + X_2$ are complete.

### Proof

We prove the completeness of $S$. The completeness of $T$ is left as an exercise. Fix $\theta = (\mu_1, \mu_2, \sigma^2)$. Let $g$ be a measurable function such that
$$E_{\theta}(g(T(X))) = E_{\theta}(g(X_1 + X_2)) = 0.$$
The probability distribution of $T(X_1, X_2)$ is
$$f_{\mu_1 + \mu_2, 2\sigma^2}(x) = (4\pi\sigma^2)^{\frac{1}{2}} \mathrm{exp} \left( – \frac{(x – \mu_1 – \mu_2)^2}{4 \sigma^2} \right).$$
Therefore
$$0 = E_{\theta}(g(T(X))) = (4\pi\sigma^2)^{\frac{1}{2}} \int g(x_1 + x_2) \mathrm{exp} \left(-\frac{(x_1 + x_2 – \mu_1 – \mu_2)^2}{4 \sigma^2}\right) dx_1 dx_2.$$
The $\mathrm{exp}$ term is always greater than $0$, therefore $g(x) = 0$ almost everywhere. Therefore $P_{\theta}(g(x) = 0) = 1$.

## Lemma 2

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, …, $X_n \sim N(\mu_n, \sigma^2)$, then $\sum_i X_i$ and $\sum_i X_i^2$ are complete.

### Proof

Both statements follow from induction on $i$ and applying Lemma 1.

## Proof of the main result

Let $X_{ij}$ as defined in the main result. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$ with probability distribution
$$\begin{split} f(x) &= f_{\mu_1, \mu_2, …, \mu_n, \sigma^2}(x) \\ &= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} (x_{ij} – \mu_i)^2 \right) \\ &= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_i m_i \mu_i^2\right) \\ &\qquad \times \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} x_{ij}^2\right) \mathrm{exp} \left(\sum_i \frac{\mu_i}{\sigma^2} \sum_j x_{ij}\right). \end{split}$$
Notice that the equation above satisfies the requirements of the Fisher-Neyman factorization for
$$T(X) = (\sum_{j} X_{1j}, \sum_{j} X_{2j}, …, \sum_{j} X_{nj}, \sum_{i,j} X_{ij}^2).$$
Therefore $T(X)$ is sufficient statistic for $(\mu_1, \mu_2, …, \mu_n, \sigma^2)$. From Lemma 1 and 2 it follows that $T(X)$ is a complete statistic. Define statistic $S$ from $T$ as
$$\begin{split} S(X) &= \frac{1}{m – n} \left( \sum_{i,j} X_{i,j}^2 – \sum_i \frac{1}{m_i} \sum_j X_{i,j} \right) \\ &= \frac{1}{m – n} \sum_i \left( \sum_j X_{i,j}^2 – \overline{X}_i^2 \right) \\ &= \frac{1}{m – n} \sum_i \left( \sum_j (X_{i,j} – \mu_i)^2 – m_i (\overline{X}_i – \mu_i)^2 \right). \end{split}$$
With Lemma 2 $S$ is complete and sufficient. Moreover
$$\begin{split} E(S) &= \frac{1}{m – n} \sum_i \left( \sum_j E((X_{i,j} – \mu_i)^2) – m_i E((\overline{X}_i – \mu_i)^2) \right) \\ &= \frac{1}{m – n} \sum_i \left( \sum_j \sigma^2 – m_i \frac{\sigma^2}{m_i} \right) \\ &= \frac{1}{m – n} \sum_i (m_i – 1) \sigma^2 = \sigma^2, \end{split}$$
because
$$\begin{split} E((X_{i,j} – \mu_i)^2) &= \mathrm{Var}(X_{i,j}) = \sigma^2, \\ E((\overline{X}_{i,j} – \mu_i)^2) &= \mathrm{Var}(\overline{X}_{i,j}) \\ &= \frac{1}{m_i^2} \left(\sum_j \mathrm{Var}(X_{i,j}) + \sum_{j \neq k} \mathrm{Cov}(X_{i,j}, X_{i,k}) \right) \\ &= \frac{1}{m_i^2} \sum_j \sigma^2 = \frac{\sigma^2}{m_i}. \end{split}$$
Therefore $S$ is a complete, sufficient, statistic estimator for $\sigma^2$. From the Lehmann-Scheffe Theorem it follows that $S$ is a UMVUE for $\sigma^2$. QED