Fix a variance $\sigma^2 > 0$ and let $n > 0$ and $\mu_1, \mu_2, …, \mu_n \in \mathbb{R}$. Given $m = m_1 + m_2 + … + m_n$ samples
$$
\begin{split}
X_{11}, X_{12}, …, X_{1m_1} &\sim N(\mu_1, \sigma^2), \\
X_{21}, X_{22}, …, X_{2m_2} &\sim N(\mu_2, \sigma^2), \\
&\vdots \\
X_{n1}, X_{n2}, …, X_{nm_n} &\sim N(\mu_n, \sigma^2), \\
\end{split}
$$
where $m_1, m_2, …, m_n > 1$. What is the best estimator for the variance $\sigma^2$? In other words, we draw samples from $n$ normal distributions with the same variance, but with not necessarily equal means. What is the best way to estimate the common variance of the $n$ normal distributions?
Main result
Surprisingly, there is an answer. It is surprising because we are looking for the best estimator of the variance, and beforehand there is no guarantee that a best estimator even exists. Moreover, the answer is simple, though a bit harder to prove. Define $X = (X_{ij})$ to be the joint random vector of the samples $\{X_{ij}\}$. The unique uniformly minimum-variance unbiased estimator (UMVUE) for $\sigma^2$ is given by
$$
S(X) = \frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2,
$$
where $\overline{X}_i = \frac{1}{m_i} \sum_{j = 1}^{m_i} X_{ij}$. In other words, the best estimator for $\sigma^2$ is $S(X)$.
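The estimator is straightforward to compute. Below is a minimal Python sketch; the function name `pooled_variance` and its input format (one list of samples per distribution) are our own illustrative choices.

```python
def pooled_variance(groups):
    """Pooled variance estimator S(X).

    `groups` is a list of sample lists, one list per normal distribution;
    every group must contain at least two samples.
    """
    m = sum(len(g) for g in groups)  # total number of samples
    n = len(groups)                  # number of distributions
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)       # group mean, i.e. X-bar_i
        total += sum((x - mean) ** 2 for x in g)
    return total / (m - n)
```

For a single group ($n = 1$) this reduces to the usual unbiased sample variance with denominator $m_1 - 1$.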
An example
Fix $\sigma = 0.35$. Suppose
$$
\begin{split}
X_1 &= (1.69, 1.35, 1.75, 1.45, 1.77) \sim N(1.5, \sigma^2), \\
X_2 &= (2.27, 1.73, 1.46) \sim N(2.0, \sigma^2), \\
X_3 &= (-0.28, 0.32, 0.69, 0.14) \sim N(0.1, \sigma^2)
\end{split}
$$
We compute
$$
\begin{split}
m &= |X_1| + |X_2| + |X_3| = 5 + 3 + 4 = 12, \quad n = 3, \\
\overline{X}_1 &\approx 1.60, \quad \overline{X}_2 \approx 1.82, \quad \overline{X}_3 \approx 0.22.
\end{split}
$$
According to the main result, we approximate the variance $\sigma^2$ as follows:
$$
\begin{split}
\frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2
&= \frac{1}{12 - 3} ( (1.69 - 1.60)^2 + (1.35 - 1.60)^2 + (1.75 - 1.60)^2 \\
&\qquad + (1.45 - 1.60)^2 + (1.77 - 1.60)^2 + (2.27 - 1.82)^2 \\
&\qquad + (1.73 - 1.82)^2 + (1.46 - 1.82)^2 + (-0.28 - 0.22)^2 \\
&\qquad + (0.32 - 0.22)^2 + (0.69 - 0.22)^2 + (0.14 - 0.22)^2) \\
&\approx 0.11
\end{split}
$$
Then the standard deviation is approximated by $\sigma \approx \sqrt{0.11} \approx 0.33$, reasonably close to the true value $\sigma = 0.35$. Not bad!
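To double-check the arithmetic, here is a short, self-contained Python snippet that redoes the computation with exact (unrounded) group means; the small difference from the hand calculation comes only from rounding.

```python
import math

x1 = [1.69, 1.35, 1.75, 1.45, 1.77]
x2 = [2.27, 1.73, 1.46]
x3 = [-0.28, 0.32, 0.69, 0.14]
groups = [x1, x2, x3]

m = sum(len(g) for g in groups)  # 12 samples in total
n = len(groups)                  # 3 groups

# pooled sum of squared deviations from each group's own mean
total = 0.0
for g in groups:
    mean = sum(g) / len(g)
    total += sum((x - mean) ** 2 for x in g)

variance = total / (m - n)
print(variance, math.sqrt(variance))  # roughly 0.108 and 0.33
```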
The proof
Before we can start with the proof of the main result, we have to introduce some definitions and theorems from statistics.
Sufficient statistic
A statistic $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if the conditional probability distribution of the data $X$, given the statistic $T(X)$, does not depend on the parameter $\theta$. We have the following equivalent characterization of sufficiency, known as the Fisher-Neyman factorization theorem.
Let $f_{\theta}(x)$ be the probability density function of $X$. Then $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if there exist non-negative functions $g$ and $h$ such that
$$
f_{\theta}(x) = h(x) g_{\theta}(T(x)),
$$
where $h(x)$ does not depend on $\theta$, and $g_{\theta}(T(x))$ depends on $x$ only through $T(x)$.
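As a quick illustration (a worked example we add here, with $\sigma^2$ assumed known so that $\theta = \mu$): for independent samples $Y_1, Y_2, …, Y_k \sim N(\mu, \sigma^2)$ the joint density factors as
$$
f_{\mu}(y) = \underbrace{(2\pi\sigma^2)^{-\frac{k}{2}} \mathrm{exp}\left(-\frac{1}{2\sigma^2} \sum_i y_i^2\right)}_{h(y)} \cdot \underbrace{\mathrm{exp}\left(\frac{\mu}{\sigma^2} T(y) - \frac{k\mu^2}{2\sigma^2}\right)}_{g_{\mu}(T(y))},
$$
with $T(y) = \sum_i y_i$, so the sample sum is a sufficient statistic for $\mu$. The factorization in the proof of the main result below follows the same pattern.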
Complete statistic
A statistic $T(X)$ is complete if and only if for every measurable function $g$
$$
\forall \theta: E_{\theta}(g(T)) = 0 \rightarrow \forall \theta: P_{\theta}(g(T) = 0) = 1.
$$
In other words, if the expected value of $g(T)$ is zero for every $\theta$, then $g(T) = 0$ almost surely for every $\theta$.
The Lehmann-Scheffe theorem provides us with the tool to turn a sufficient and complete statistic for $\theta$ into a UMVUE for $\theta$. Let $X = (X_1, X_2, …, X_n)$ be random samples from a distribution with probability density $f_{\theta}(x)$, and suppose $T$ is a sufficient and complete statistic for $\theta$. If $S = \varphi(T)$ is an unbiased estimator of $\theta$, that is $E(S(X)) = \theta$, then $S(X)$ is the unique uniformly minimum-variance unbiased estimator (UMVUE) for $\theta$.
Lemma 1
Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the statistics $S(X_1, X_2) = X_1 X_2$ and $T(X_1, X_2) = X_1 + X_2$ are complete.
Proof
We prove the completeness of $T$; the completeness of $S$ is left as an exercise. Write $\theta = (\mu_1, \mu_2, \sigma^2)$ and let $g$ be a measurable function such that, for all $\theta$,
$$
E_{\theta}(g(T(X))) = E_{\theta}(g(X_1 + X_2)) = 0.
$$
The probability density of $T(X_1, X_2) = X_1 + X_2 \sim N(\mu_1 + \mu_2, 2\sigma^2)$ is
$$
f_{\mu_1 + \mu_2, 2\sigma^2}(x) = (4\pi\sigma^2)^{-\frac{1}{2}} \mathrm{exp} \left( - \frac{(x - \mu_1 - \mu_2)^2}{4 \sigma^2} \right).
$$
Therefore
$$
0 = E_{\theta}(g(T(X))) = (4\pi\sigma^2)^{-\frac{1}{2}} \int g(t) \, \mathrm{exp} \left(-\frac{(t - \mu_1 - \mu_2)^2}{4 \sigma^2}\right) dt.
$$
This identity holds for every value of $c = \mu_1 + \mu_2 \in \mathbb{R}$. Expanding the square and pulling out the non-zero factors that do not depend on $t$, it says that the two-sided Laplace transform of $t \mapsto g(t) \, \mathrm{exp}(-t^2 / (4\sigma^2))$ vanishes identically. By uniqueness of the Laplace transform, and since the $\mathrm{exp}$ factor is strictly positive, $g(t) = 0$ for almost every $t$. Therefore $P_{\theta}(g(T) = 0) = 1$.
Lemma 2
Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, …, $X_n \sim N(\mu_n, \sigma^2)$, the statistics $\sum_i X_i$ and $\sum_i X_i^2$ are complete.
Proof
Both statements follow by induction on $n$, applying Lemma 1 in the induction step.
Proof of the main result
Let $X_{ij}$ be as defined in the main result. Define $X = (X_{ij})$ to be the joint random vector of the samples $\{X_{ij}\}$, with probability density
$$
\begin{split}
f(x) &= f_{\mu_1, \mu_2, …, \mu_n, \sigma^2}(x) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} (x_{ij} - \mu_i)^2 \right) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_i m_i \mu_i^2\right) \\
&\qquad \times \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} x_{ij}^2\right) \mathrm{exp} \left(\sum_i \frac{\mu_i}{\sigma^2} \sum_j x_{ij}\right).
\end{split}
$$
Notice that the equation above satisfies the requirements of the Fisher-Neyman factorization for
$$
T(X) = (\sum_{j} X_{1j}, \sum_{j} X_{2j}, …, \sum_{j} X_{nj}, \sum_{i,j} X_{ij}^2).
$$
Therefore $T(X)$ is a sufficient statistic for $(\mu_1, \mu_2, …, \mu_n, \sigma^2)$. From Lemmas 1 and 2 it follows that $T(X)$ is a complete statistic. Define the statistic $S$ as a function of $T$:
$$
\begin{split}
S(X) &= \frac{1}{m - n} \left( \sum_{i,j} X_{ij}^2 - \sum_i \frac{1}{m_i} \Big( \sum_j X_{ij} \Big)^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j X_{ij}^2 - m_i \overline{X}_i^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j (X_{ij} - \mu_i)^2 - m_i (\overline{X}_i - \mu_i)^2 \right).
\end{split}
$$
Since $\sum_j X_{ij}^2 - m_i \overline{X}_i^2 = \sum_j (X_{ij} - \overline{X}_i)^2$, the second line shows that $S$ is exactly the estimator from the main result. Moreover, $S$ is unbiased:
$$
\begin{split}
E(S) &= \frac{1}{m - n} \sum_i \left( \sum_j E((X_{ij} - \mu_i)^2) - m_i E((\overline{X}_i - \mu_i)^2) \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j \sigma^2 - m_i \frac{\sigma^2}{m_i} \right) \\
&= \frac{1}{m - n} \sum_i (m_i - 1) \sigma^2 = \sigma^2,
\end{split}
$$
because
$$
\begin{split}
E((X_{ij} - \mu_i)^2) &= \mathrm{Var}(X_{ij}) = \sigma^2, \\
E((\overline{X}_{i} - \mu_i)^2) &= \mathrm{Var}(\overline{X}_{i}) \\
&= \frac{1}{m_i^2} \Big(\sum_j \mathrm{Var}(X_{ij}) + \sum_{j \neq k} \underbrace{\mathrm{Cov}(X_{ij}, X_{ik})}_{= \, 0 \text{ by independence}} \Big) \\
&= \frac{1}{m_i^2} \sum_j \sigma^2 = \frac{\sigma^2}{m_i}.
\end{split}
$$
Therefore $S$ is an unbiased estimator of $\sigma^2$ and a function of the complete and sufficient statistic $T$. From the Lehmann-Scheffe theorem it follows that $S$ is the unique UMVUE for $\sigma^2$. QED
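As an informal empirical sanity check (a simulation sketch, not part of the proof): the script below repeatedly simulates the model with arbitrarily chosen means and group sizes, and compares $S$ with another unbiased estimator of $\sigma^2$, namely the ordinary sample variance of the first group alone. Both should come out unbiased, and $S$ should show the visibly smaller variance.

```python
import random
import statistics

random.seed(0)

sigma = 0.35
mus = [1.5, 2.0, 0.1]   # arbitrary group means, for illustration only
sizes = [5, 3, 4]       # arbitrary group sizes m_i > 1

def pooled_estimate(groups):
    """S(X): pooled squared deviations from each group's own mean, divided by m - n."""
    m = sum(len(g) for g in groups)
    n = len(groups)
    total = 0.0
    for g in groups:
        mean = statistics.fmean(g)
        total += sum((x - mean) ** 2 for x in g)
    return total / (m - n)

pooled, single_group = [], []
for _ in range(20000):
    groups = [[random.gauss(mu, sigma) for _ in range(k)] for mu, k in zip(mus, sizes)]
    pooled.append(pooled_estimate(groups))
    single_group.append(statistics.variance(groups[0]))  # unbiased, but uses group 1 only

print(f"true sigma^2:  {sigma ** 2:.4f}")
print(f"pooled S:      mean {statistics.fmean(pooled):.4f}, variance {statistics.variance(pooled):.5f}")
print(f"group 1 alone: mean {statistics.fmean(single_group):.4f}, variance {statistics.variance(single_group):.5f}")
```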