
The secretary problem

You have a single secretary position to fill. There are $n$ applicants for the position. After interviewing an applicant you can immediately rank them from worst to best among the previously interviewed applicants. However, immediately after the interview you must decide whether to hire that applicant; you only get one chance, directly after the interview. The question is: how many applicants should you interview (and reject) before committing to a choice?

If you make your decision too early you might not hire the best applicant. But if you wait too long you might already have rejected the best applicant. More realistically, this is a problem that happens quite regularly in real life as well:

  • If you want to buy a house, how many potential houses must you visit before buying your favorite house? If you reject a house, someone else will buy it.
  • How often should you change jobs if you hope to end up with the best job you can get? If you resign from a job, you (usually) can't return to your old job.

There is a mathematical optimum for this problem. Suppose we have $n$ applicants. Our strategy is as follows: we reject the first $r - 1$ applicants. Then we continue with the remaining $n - r + 1$ applicants. When an applicant appears who is better than all of the first $r - 1$ applicants, we hire this applicant. The goal is to find the optimal $r$ such that the probability that we choose the best applicant is maximal. Let $P(r)$ be the probability that we select the best applicant after rejecting the first $r - 1$ applicants, then
$$
\begin{align*}
P(r) &= \sum_{i = r}^{n} P(\text{$i$ is the best applicant} \cap \text{we choose $i$}) \\
&= \sum_{i = r}^{n} P(\text{$i$ is the best applicant}) P(\text{we choose $i$} | \text{$i$ is the best applicant}).
\end{align*}
$$
We have $P(\text{$i$ is the best applicant}) = \frac{1}{n}$. The second part of the equation is a bit more work. Given that $i$ is the best applicant, what is the probability that we choose $i$? This is only possible if none of the applicants $r, r+1, \ldots, i-1$ was better than all of the first $r - 1$ applicants. In other words, the best of the first $i - 1$ applicants must be among the first $r - 1$ applicants. The probability of this is $\frac{r - 1}{i - 1}$. We now have
$$
\begin{align*}
P(r) &= \sum_{i = r}^{n} \frac{1}{n} \frac{r-1}{i-1} = \frac{r-1}{n} \sum_{i = r}^{n} \frac{1}{i-1}.
\end{align*}
$$

We can try to compute the sum for each $r$ and find the optimal $P(r)$, but there is another way: we approximate the sum with a continuous function. Take $x = \frac{r - 1}{n}$; for large $n$ we have
$$
P(r) = \frac{r-1}{n} \sum_{i = r}^{n} \frac{1}{i-1} \approx x \int_{x}^{1} \frac{dt}{t} = -x \log(x) = \tilde{P}(x).
$$
$\tilde{P}(x)$ takes its optimum when its derivative is zero. I.e.
$$
\begin{align*}
0 = \tilde{P}'(x) &= -\log(x) - 1 \\
\log(x) &= -1, \qquad x = \frac{1}{e}.
\end{align*}
$$
Because $x = \frac{r - 1}{n}$ we find that $r = \frac{n}{e} + 1$, so the optimal strategy is to reject the first $\left[\frac{n}{e}\right]$ (rounded down) applicants.
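
We can verify this numerically. The following Python sketch (my own; not part of the derivation above) estimates the success probability of the "reject the first $r - 1$ applicants" strategy by simulation and compares it with the exact sum $P(r)$ derived above. For $n = 100$ both values come out close to $1/e \approx 0.37$ when $r = \left[\frac{n}{e}\right] + 1$.

import math
import random

def simulated_probability(n, r, trials=100_000):
    # Estimate the probability that the strategy "reject the first r - 1
    # applicants, then take the first one better than all of them" picks the best.
    successes = 0
    for _ in range(trials):
        ranks = list(range(n))                   # 0 .. n - 1, higher is better
        random.shuffle(ranks)
        best_rejected = max(ranks[:r - 1], default=-1)
        # Take the first later applicant beating the rejected ones, else the last one.
        chosen = next((x for x in ranks[r - 1:] if x > best_rejected), ranks[-1])
        successes += chosen == n - 1
    return successes / trials

def exact_probability(n, r):
    # P(r) = ((r - 1) / n) * sum_{i=r}^{n} 1 / (i - 1), as derived above.
    if r == 1:
        return 1 / n
    return (r - 1) / n * sum(1 / (i - 1) for i in range(r, n + 1))

n = 100
r = math.floor(n / math.e) + 1                   # reject the first [n / e] applicants
print(r, simulated_probability(n, r), exact_probability(n, r))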

So what are the mathematically optimal solutions to the real-life questions asked above? It depends on the person, but you should first figure out the maximum number of options (applicants) you are willing to consider. Once you have fixed that maximum, you can use the secretary problem to figure out how many options to reject by default. After that, you should select the first option that is better than all the options you rejected before.

  • When should you bid on a house that you visited? It depends on how much you like visiting houses, but you probably don't want to visit more than 10 houses. This means that you should reject the first $\left[\frac{10}{e}\right] = 3$ houses outright. Afterwards, you should try to buy the first house you visit that is better than those first 3.
  • When should you change jobs? First estimate how many jobs you could have at most in your working life. You work for around 45 years, and I think you should stay at least 2.5 years with the same employer before you can say whether you like the job or not. This means you can have around 18 jobs at most in your working life. So you should switch jobs after 2.5 years for each of the first $\left[\frac{18}{e}\right] = 6$ jobs. Afterwards you should keep switching jobs until you find a job that is better than all of the previous ones. Another way to look at this problem is to say that during the first $\left[\frac{45}{e}\right] = 16$ years of your working life you can switch jobs regardless. Afterwards you should become more selective and stick with any job that is better than all of the previous jobs. You will probably not get anything better.

What is the main takeaway from the secretary problem? We probably make decisions too soon and stick too long with our current situation. Mathematically it is probably better to reject offers more often, and only after the first $\left[\frac{n}{e}\right]$ options should we become more picky. On the other hand, we cannot keep rejecting all our options in the hope that something better will eventually appear. There is a clearly defined threshold that tells you when to stop looking for other options.

What if you vaccinate for COVID-19 in the Netherlands to reduce pressure on hospitals?

The vaccination strategy for COVID-19 in the Netherlands is to vaccinate the people with the highest risks first. These are the elderly and people with underlying health problems; also see the advice of the Gezondheidsraad (in Dutch). With an exception for nurses who are in direct contact with corona patients, the first people to be vaccinated are people with underlying health problems, followed by ages 90+, then 80-90, and so on. Other countries seem to follow the same approach. In the UK a 91-year-old woman was the first to receive a COVID-19 vaccine. A 101-year-old woman in Germany was also among the first to get a vaccine.

What would the vaccination strategy be if, instead of vaccinating the people with the highest risks first, we vaccinated to reduce the pressure on hospitals as fast as possible, with the probable trade-off that more susceptible people will get COVID-19? In order to find such a strategy we need to compute the following probability: take a random person (a candidate for vaccination) of gender $G$ in age group $A$; what is the probability that this person gets COVID-19 and is hospitalized ($H$) for corona? According to this alternative strategy, the group $(G, A)$ with the highest probability should be the group to be vaccinated first. Mathematically we have to compute $P(H|G,A)$.

Using Bayes' rule we have
$$
P(H|G,A) = \frac{P(H,G,A)}{P(G,A)} = \frac{P(H)P(G|H)P(A|G,H)}{P(G,A)}.
$$

Because I don't know the exact distributions of $H$, $G$, and $A$, I have to approximate the probabilities by counting. These calculations are subjective, and there are other, probably better, ways of approximating the true values. Data on all COVID-19 patients is openly available on the site of the RIVM. General information on the population distribution can be found on the site of the CVB. The probability of being hospitalized in the Netherlands, $P(H)$, is by far the most subjective to approximate. First, it is time dependent. To keep it simple, $P(H)$ represents the probability that a random person is hospitalized for corona this year. Do we know what the probability is of getting corona and being hospitalized this year? I don't, because I cannot predict how COVID-19 will spread or mutate. The best I can do is to use the probability that a random person was hospitalized for corona last year:
$$
P(H) = \frac{\mathrm{number\ of\ people\ hospitalized}}{\mathrm{total\ number\ of\ people}}.
$$
Although $P(H)$ is likely to be estimated incorrectly, this does not matter for the results. Because $P(H)$ is independent of $G$ and $A$, it has no effect on the final conclusion: the absolute probabilities might be off, but not the relative ones. If the actual probability of being hospitalized for corona $P(H)$ differs from my approximation, the overall burden on healthcare changes, but the relative pressure per gender and age group does not. Next, given that someone is hospitalized, the probability of that person being male or female is
$$
P(G|H) = \frac{\mathrm{hospitalized\ gender\ G\ patients}}{\mathrm{total\ hospitalized\ patients}}.
$$
The age groups we consider are 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+. Given that a patient is hospitalized and of gender $G$, the probability of being in age group $A$ is
$$
P(A|G,H) = \frac{\mathrm{number\ of\ hospitalized\ gender\ G\ patients\ of\ age\ group\ A}}{\mathrm{total\ hospitalized\ gender\ G\ patients}}.
$$
The probability $P(A,G)$ of being of gender $G$ and in age group $A$ is computed directly from the CVB population distribution.
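
To make the computation concrete, here is a minimal Python sketch of how $P(H|G,A)$ can be computed from counts. The numbers below are made up purely for illustration; the real counts come from the RIVM case file and the CVB population tables, and the variable names are mine.

# Hypothetical counts of hospitalized patients per (gender, age group).
hospitalized = {
    ("male", "70-79"): 1200,
    ("male", "80-89"): 900,
    ("female", "80-89"): 800,
}
# Hypothetical population sizes per (gender, age group).
population = {
    ("male", "70-79"): 750_000,
    ("male", "80-89"): 300_000,
    ("female", "80-89"): 450_000,
}
total_population = 17_400_000                    # roughly the Dutch population
total_hospitalized = sum(hospitalized.values())

def p_hospitalized_given(gender, age):
    # P(H | G, A) = P(H) * P(G | H) * P(A | G, H) / P(G, A)
    p_h = total_hospitalized / total_population
    gender_total = sum(v for (g, _), v in hospitalized.items() if g == gender)
    p_g_given_h = gender_total / total_hospitalized
    p_a_given_gh = hospitalized[(gender, age)] / gender_total
    p_ga = population[(gender, age)] / total_population
    return p_h * p_g_given_h * p_a_given_gh / p_ga

# Rank the groups by their (hypothetical) probability of hospitalization.
for group in sorted(hospitalized, key=lambda g: p_hospitalized_given(*g), reverse=True):
    print(group, p_hospitalized_given(*group))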

Filling in the details, with the data available on the 23rd of January 2021, yields the following results.

Some interesting remarks:

  • There is a large difference between males and females. All males above 40 burden healthcare more than their female counterparts.
  • To relieve pressure on healthcare, the best group to vaccinate first is males aged 90+. Next are males between 80 and 90, then males between 70 and 80, and after that females between 80 and 90.
  • Notice that vaccinating males between 70 and 80 relieves more pressure on healthcare than vaccinating females aged 90+.

Note that I’m not claiming that we should change our strategy. I only want to show that there is a difference, with respect to vaccinating, between protecting the weakest first and releasing pressure on healthcare.

The Two Girl Paradox

A family has two children. You know that one of the two children is a girl. What is the probability that the other child is a girl as well? A classical problem for a probability course exam. Unfortunately, the problem is ambiguous, because it can be interpreted in multiple ways that lead to different results.

Let $G$ be the case that we know that one of the children is a girl. Let $GG$ be the case that both children in the family are a girl. By Bayes rule
$$
P(GG|G) = \frac{P(G|GG) P(GG)}{P(G)} = \frac{1 \times 1/4}{3/4} = \frac{1}{3}.
$$
Here $P(G|GG) = 1$, $P(GG) = 1/4$, and $P(G) = 3/4$, assuming that the boy/girl ratio is exactly fifty-fifty.

Let's go back to the child of which we know that she's a girl. She must have a name (we assume that every child has a name). Let her name be $N$ and let $G_N$ be the event that a girl is named $N$. The probability that a girl is called $N$ is set to $P(G_N) = p$; $p$ is close to zero but not equal to zero, because there are many girl names and none of them is used too often. Note that if a family has two girls, the chance that at least one of the girls is called $N$ is
$$
P(G_N|GG) = 1 - P(\textrm{not }G_N|GG) = 1 - (1 - p)(1 - p) = 2p - p^2.
$$
By Bayes rule
$$
P(GG|G) = P(GG|G_N) = \frac{P(G_N|GG) P(GG)}{P(G_N)} = \frac{(2p - p^2) \times 1/4}{p} = \frac{1}{4} (2 - p).
$$
Because $p$ is positive but close to zero, we find that $P(GG|G) \approx \frac{1}{2}$: a different result than the result in the previous paragraph.
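
Both interpretations are easy to check with a quick Monte Carlo simulation. The Python sketch below is my own illustration (the name probability of 0.01 is arbitrary); the first ratio tends to $1/3$ and the second to roughly $1/2$.

import random

TRIALS = 1_000_000
P_NAME = 0.01                      # arbitrary probability that a girl is named N

both = families_with_girl = 0
both_named = families_with_named_girl = 0

for _ in range(TRIALS):
    children = [random.choice("BG") for _ in range(2)]
    girl_named_n = any(c == "G" and random.random() < P_NAME for c in children)

    # Interpretation 1: we only know that at least one child is a girl.
    if "G" in children:
        families_with_girl += 1
        both += children == ["G", "G"]

    # Interpretation 2: we know that at least one child is a girl named N.
    if girl_named_n:
        families_with_named_girl += 1
        both_named += children == ["G", "G"]

print(both / families_with_girl)                 # close to 1/3
print(both_named / families_with_named_girl)     # close to 1/2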

Perl Weekly Challenge 46 Task #2

This week's Perl Weekly Challenge Task #2 has an interesting solution. The challenge is as follows:

Is the room open?

There are 500 rooms in a hotel with 500 employees having keys to all the rooms. The first employee opened the main entrance door of all the rooms. The second employee then closed the doors of room numbers 2, 4, 6, 8, 10 and so on to 500. The third employee then closed the door if it was opened, or opened the door if it was closed, of rooms 3, 6, 9, 12, 15 and so on to 500. Similarly the fourth employee did the same as the third but only for room numbers 4, 8, 12, 16 and so on to 500. This goes on until all employees have had a turn.

Write a script to find out all the rooms still open at the end.

Solution

The solution is given in four steps.

Consider a single room with number $N$. How many times did an employee open or close the door of that room? Look at the $k$-th employee: he opens or closes the door if and only if $k$ is a divisor of $N$. So if we want to know how often the door has been opened or closed, we must count the number of divisors of the room number $N$.

If the door of room $N$ has been opened or closed an even number of times, the door is closed. Likewise, if it has been opened or closed an odd number of times, the door is open. So this problem is equivalent to finding all $N$ less than or equal to 500 that have an odd number of divisors.

Take the prime decomposition of $N$
$$
N = p_1^{k_1} p_2^{k_2} \cdots p_i^{k_i}.
$$
The number of divisors of $N$ is given by $(k_1 + 1)(k_2 + 1)\cdots(k_i + 1)$.
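For example, $N = 12 = 2^2 \cdot 3$ has $(2 + 1)(1 + 1) = 6$ divisors: 1, 2, 3, 4, 6, and 12.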

Using the prime decomposition of $N$, the number of divisors of $N$ is odd if and only if $k_1$, $k_2$, …, $k_i$ are all even. Indeed, if some $k_j$ were odd, $k_j + 1$ would be even, hence every product containing $k_j + 1$ (such as the number of divisors of $N$) would be even. And if $k_1$, $k_2$, …, $k_i$ are all even, $N$ is a perfect square.

Therefore, the only open rooms are the rooms whose number is a perfect square less than or equal to 500. A one-liner in Raku (Perl 6) is:

say $_**2 for 1..(500.sqrt);
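
As a cross-check (a sketch of my own, not part of the challenge), a brute-force simulation in Python of all 500 employees confirms that exactly the square-numbered rooms remain open:

# Employee k toggles the doors of rooms k, 2k, 3k, ... up to 500.
doors = [False] * 501                       # doors[i] is True when room i is open
for employee in range(1, 501):
    for room in range(employee, 501, employee):
        doors[room] = not doors[room]

open_rooms = [room for room in range(1, 501) if doors[room]]
print(open_rooms)                           # 1, 4, 9, ..., 484: the squares up to 500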

The estimator for the variance of $m$ normal distributions

Fix a variance $\sigma^2 > 0$ and let $n > 0$ and $\mu_1, \mu_2, …, \mu_n \in \mathbb{R}$. Given $m = m_1 + m_2 + … + m_n$ samples
$$
\begin{split}
X_{11}, X_{12}, …, X_{1m_1} &\sim N(\mu_1, \sigma^2), \\
X_{21}, X_{22}, …, X_{2m_2} &\sim N(\mu_2, \sigma^2), \\
&\vdots \\
X_{n1}, X_{n2}, …, X_{nm_n} &\sim N(\mu_n, \sigma^2), \\
\end{split}
$$
where $m_1, m_2, …, m_n > 1$. What is the best estimator for the variance $\sigma^2$? In other words, we draw samples from $n$ normal distributions with the same variance but with not necessarily equal means. What is the best way to estimate the variance of these $n$ normal distributions?

Main result

Surprisingly, there is an answer. It is surprising because we are looking for the best estimator for the variance, and beforehand there is no guarantee that a best estimator actually exists. Moreover, the answer is simple, though a bit harder to prove. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$. The unique uniformly minimum-variance unbiased estimator (UMVUE) for $\sigma^2$ is given by
$$
S(X) = \frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2,
$$
where $\overline{X}_i = \frac{1}{m_i} \sum_{j = 1}^{m_i} X_{ij}$. In other words, the best estimator for $\sigma^2$ is $S(X)$.

An example

Fix $\sigma = 0.35$. Suppose
$$
\begin{split}
X_1 &= (1.69, 1.35, 1.75, 1.45, 1.77) \sim N(1.5, \sigma^2), \\
X_2 &= (2.27, 1.73, 1.46) \sim N(2.0, \sigma^2), \\
X_3 &= (-0.28, 0.32, 0.69, 0.14) \sim N(0.1, \sigma^2)
\end{split}
$$
We compute
$$
\begin{split}
m &= |X_1| + |X_2| + |X_3| = 5 + 3 + 4 = 12, n = 3, \\
\overline{X}_1 &\approx 1.60, \overline{X}_2 \approx 1.82, \overline{X}_3 \approx 0.21.
\end{split}
$$
According to the main result we approximate the variance $\sigma^2$ as follows
$$
\begin{split}
\frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2
&= \frac{1}{12 - 3} ( (1.69 - 1.6)^2 + (1.35 - 1.6)^2 + (1.75 - 1.6)^2 \\
&\qquad + (1.45 - 1.6)^2 + (1.77 - 1.6)^2 + (2.27 - 1.82)^2 \\
&\qquad + (1.73 - 1.82)^2 + (1.46 - 1.82)^2 + (-0.28 - 0.21)^2 \\
&\qquad + (0.32 - 0.21)^2 + (0.69 - 0.21)^2 + (0.14 - 0.21)^2) \\
&\approx 0.11
\end{split}
$$
Then the standard deviation is approximated by $\sigma \approx 0.33$. Not bad!
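
The same computation takes only a few lines of Python. The sketch below is my own, using the sample data from the example above:

import numpy as np

# The three samples from the example above.
samples = [
    np.array([1.69, 1.35, 1.75, 1.45, 1.77]),
    np.array([2.27, 1.73, 1.46]),
    np.array([-0.28, 0.32, 0.69, 0.14]),
]

m = sum(len(x) for x in samples)            # total number of observations (12)
n = len(samples)                            # number of groups (3)

# S(X) = 1 / (m - n) * sum_i sum_j (X_ij - mean_i)^2
pooled_variance = sum(((x - x.mean()) ** 2).sum() for x in samples) / (m - n)
print(pooled_variance, np.sqrt(pooled_variance))   # roughly 0.11 and 0.33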

The proof

Before we can start with the proof of the main result, we have to introduce some definitions and theorems from statistics.

Sufficient statistic

A statistic $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if the conditional probability distribution of the data $X$, given the statistic $T(X)$, does not depend on the parameter $\theta$. We have the following equivalent characterization of sufficiency of a statistic $T(X)$.

Theorem (Fisher-Neyman factorization)

Let $f_{\theta}(x)$ be the probability density function of $X$. $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if there exist non-negative functions $g$ and $h$ such that
$$
f_{\theta}(x) = h(x) g_{\theta}(T(x)),
$$
where $h(x)$ does not depend on $\theta$, and $g_{\theta}(T(x))$ depends on $x$ only through $T(x)$.

Complete statistic

A statistic $T(X)$ is complete if and only if for every measurable function $g$
$$
\forall \theta: E_{\theta}(g(T)) = 0 \rightarrow \forall \theta: P_{\theta}(g(T) = 0) = 1.
$$
In other words, if the expected value of $g(T)$ is zero for every $\theta$, then $g(T) = 0$ almost everywhere.

Theorem (Lehmann-Scheffé)

The Lehmann-Scheffé theorem gives us the tool to determine when a sufficient and complete statistic for $\theta$ is a UMVUE for $\theta$. Let $X = (X_1, X_2, …, X_n)$ be random samples from a distribution with probability density function $f_{\theta}(x)$. Suppose $T$ is a sufficient and complete statistic for $\theta$. If $E(T(X)) = \theta$, then $T(X)$ is the unique uniformly minimum-variance unbiased estimator (UMVUE) for $\theta$.

Lemma 1

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the statistics $S(X_1, X_2) = X_1 X_2$ and $T(X_1, X_2) = X_1 + X_2$ are complete.

Proof

We prove the completeness of $T$; the completeness of $S$ is left as an exercise. Fix $\theta = (\mu_1, \mu_2, \sigma^2)$. Let $g$ be a measurable function such that
$$
E_{\theta}(g(T(X))) = E_{\theta}(g(X_1 + X_2)) = 0.
$$
The probability density of $T(X_1, X_2) = X_1 + X_2$ is
$$
f_{\mu_1 + \mu_2, 2\sigma^2}(x) = (4\pi\sigma^2)^{-\frac{1}{2}} \mathrm{exp} \left( -\frac{(x - \mu_1 - \mu_2)^2}{4 \sigma^2} \right).
$$
Therefore
$$
0 = E_{\theta}(g(T(X))) = (4\pi\sigma^2)^{-\frac{1}{2}} \int g(x) \mathrm{exp} \left(-\frac{(x - \mu_1 - \mu_2)^2}{4 \sigma^2}\right) dx.
$$
The $\mathrm{exp}$ term is always greater than $0$, and the identity holds for every value of $\mu_1 + \mu_2$, therefore $g(x) = 0$ almost everywhere. Hence $P_{\theta}(g(T) = 0) = 1$.

Lemma 2

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, …, $X_n \sim N(\mu_n, \sigma^2)$, the statistics $\sum_i X_i$ and $\sum_i X_i^2$ are complete.

Proof

Both statements follow by induction on $n$, applying Lemma 1.

Proof of the main result

Let $X_{ij}$ be as defined in the main result. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$ with probability density
$$
\begin{split}
f(x) &= f_{\mu_1, \mu_2, …, \mu_n, \sigma^2}(x) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} (x_{ij} - \mu_i)^2 \right) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_i m_i \mu_i^2\right) \\
&\qquad \times \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} x_{ij}^2\right) \mathrm{exp} \left(\sum_i \frac{\mu_i}{\sigma^2} \sum_j x_{ij}\right).
\end{split}
$$
Notice that the equation above satisfies the requirements of the Fisher-Neyman factorization for
$$
T(X) = (\sum_{j} X_{1j}, \sum_{j} X_{2j}, …, \sum_{j} X_{nj}, \sum_{i,j} X_{ij}^2).
$$
Therefore $T(X)$ is a sufficient statistic for $(\mu_1, \mu_2, …, \mu_n, \sigma^2)$. From Lemmas 1 and 2 it follows that $T(X)$ is a complete statistic. Define the statistic $S$ in terms of $T$ as
$$
\begin{split}
S(X) &= \frac{1}{m - n} \left( \sum_{i,j} X_{ij}^2 - \sum_i \frac{1}{m_i} \Big( \sum_j X_{ij} \Big)^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j X_{ij}^2 - m_i \overline{X}_i^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j (X_{ij} - \mu_i)^2 - m_i (\overline{X}_i - \mu_i)^2 \right).
\end{split}
$$
Hence $S$ is a function of the complete and sufficient statistic $T$. Moreover
$$
\begin{split}
E(S) &= \frac{1}{m - n} \sum_i \left( \sum_j E((X_{ij} - \mu_i)^2) - m_i E((\overline{X}_i - \mu_i)^2) \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j \sigma^2 - m_i \frac{\sigma^2}{m_i} \right) \\
&= \frac{1}{m - n} \sum_i (m_i - 1) \sigma^2 = \sigma^2,
\end{split}
$$
because
$$
\begin{split}
E((X_{ij} - \mu_i)^2) &= \mathrm{Var}(X_{ij}) = \sigma^2, \\
E((\overline{X}_{i} - \mu_i)^2) &= \mathrm{Var}(\overline{X}_{i}) \\
&= \frac{1}{m_i^2} \left(\sum_j \mathrm{Var}(X_{ij}) + \sum_{j \neq k} \mathrm{Cov}(X_{ij}, X_{ik}) \right) \\
&= \frac{1}{m_i^2} \sum_j \sigma^2 = \frac{\sigma^2}{m_i}.
\end{split}
$$
Therefore $S$ is an unbiased estimator for $\sigma^2$ that is a function of a complete and sufficient statistic. From the Lehmann-Scheffé theorem it follows that $S$ is the UMVUE for $\sigma^2$. QED

The product of two uniform distributions

Recently someone asked me the following question: what is the distribution of the product of two uniform distributions? Not knowing the answer immediately, we discussed the problem and concluded that it is not a trivial question. Below I show my solution to this problem.

Numerical approach

We first want to get some intuition about what to expect. We do this by generating 10000 pairs of random uniform samples between 0 and 1, multiplying each pair, and plotting a histogram of the results. We can do this with the following piece of Python code.

import numpy as np
import matplotlib.pyplot as plt

rs = np.random.RandomState(37)

# Draw 10000 independent uniform samples for X and Y and multiply them pairwise.
X = rs.rand(10000)
Y = rs.rand(10000)
Z = X * Y

# Plot a histogram of the 10000 products.
plt.hist(Z, bins='auto')
plt.show()

The results are quite surprising

It seems that the product of two uniform distributions is distributed on a logarithmic scale. Now the question is: can we find the explicit distribution function, and can we prove its correctness?

The distribution function of the product of two uniform distributions

Let $X$ and $Y$ be uniformly distributed between $0$ and $1$. Define $Z = XY$. We write $f_X$ and $f_Y$ for the density functions of $X$ and $Y$, respectively. Let $0 \leq T \leq 1$, then
$$
\begin{split}
P(Z < T) &= \int_0^1 P(xY < T) f_X(x) dx = \int_0^1 P(Y < T/x) f_X(x) dx
\end{split}
$$
If $x < T$, then $P(Y < T/x) = 1$, hence we split the integral
$$
\begin{split}
P(Z < T) &= \int_0^T f_X(x) dx + \int_T^1 P(Y < T/x) f_X(x) dx \\
&= \int_0^T f_X(x) dx + \int_T^1 \frac{T}{x} f_X(x) dx \\
&= \int_0^T 1 dx + \int_T^1 \frac{T}{x} dx \\
&= T - T \log(T).
\end{split}
$$
Hence
$$
f_Z(x) = \frac{d}{dx} (x - x\log(x)) = 1 - (1 + \log(x)) = -\log(x).
$$
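
As a quick check (my own sketch, in the same style as the simulation above), the empirical histogram indeed follows the derived density $-\log(x)$:

import numpy as np
import matplotlib.pyplot as plt

rs = np.random.RandomState(37)
Z = rs.rand(100_000) * rs.rand(100_000)     # products of pairs of uniform samples

# Compare the normalized histogram with the derived density -log(x).
x = np.linspace(0.001, 1, 500)
plt.hist(Z, bins=100, density=True, alpha=0.5, label="simulated X*Y")
plt.plot(x, -np.log(x), label="-log(x)")
plt.legend()
plt.show()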

The distribution function of the product of an arbitrary number of uniform distributions

We can do better. Let $X_1$, $X_2$, …, $X_n$ be independent uniform distributions on $[0, 1]$. What is the distribution function of $Z_n = X_1 X_2 \ldots X_n$? We will show, in general, that
$$
f_{Z_n}(x) = \frac{(-1)^{n-1}}{(n-1)!} \log(x)^{n-1}, \quad 0 \leq x \leq 1.
$$
For $n = 2$ we already showed that $f_{Z_2}(x) = -\log(x)$. Assume that the formula holds for all $k \leq n$; we show that it then holds for $n+1$. By induction this proves that the formula holds for all $n$. We use the somewhat known identities (see https://en.wikipedia.org/wiki/List_of_integrals_of_logarithmic_functions)
$$
\begin{split}
\int \log(x)^n dx &= x \sum_{k=0}^n (-1)^{n-k} \frac{n!}{k!} (\log(x))^k, \\
\int \frac{\log(x)^n}{x} dx &= \frac{\log(x)^{n+1}}{n+1}.
\end{split}
$$
Using
$$
\begin{split}
P(Z_n < T) &= \int_0^1 P(x X_n < T) f_{Z_{n-1}}(x) dx \\
&= \int_0^1 P(X_n < T/x) f_{Z_{n-1}}(x) dx \\
&= \int_0^T f_{Z_{n-1}}(x) dx + \int_T^1 \frac{T}{x} f_{Z_{n-1}}(x) dx,
\end{split}
$$
where $Z_{n-1} = X_1 X_2 \ldots X_{n-1}$, we have
$$
\begin{split}
P(Z_{n+1} < T) &= \int_0^T f_{Z_{n}}(x) dx + \int_T^1 \frac{T}{x} f_{Z_{n}}(x) dx \\
&= \int_0^T \frac{(-1)^{n-1}}{(n-1)!} \log(x)^{n-1}dx
+ \int_T^1 \frac{T}{x} \frac{(-1)^{n-1}}{(n-1)!} \log(x)^{n-1}dx \\
&= \frac{(-1)^{n-1}}{(n-1)!} \left(
\int_0^T \log(x)^{n-1}dx + \int_T^1 \frac{T}{x} \log(x)^{n-1} dx
\right) \\
&= \frac{(-1)^{n-1}}{(n-1)!} \left(
T \sum_{k=0}^{n-1} (-1)^{n-1-k} \frac{(n-1)!}{k!} (\log(T))^k - \frac{T}{n} \log(T)^n
\right)
\end{split}
$$
The derivative of $P(Z_{n+1} < T)$ involves a telescoping sum, i.e. using
$$
\frac{d}{dx} x\log(x)^k = k\log(x)^{k-1} + \log(x)^k,
$$
we have
$$
\begin{split}
\frac{d}{dx} \left(
x \sum_{k = 0}^{n} (-1)^{n-k} \frac{n!}{k!} \log(x)^k
\right) &= \sum_{k = 0}^{n} (-1)^{n-k} \frac{n!}{k!} (k\log(x)^{k-1} + \log(x)^k) \\
&= \sum_{k = 0}^{n-1} (-1)^{n-k+1} \frac{n!}{k!} \log(x)^k
+ \sum_{k = 0}^{n} (-1)^{n-k} \frac{n!}{k!} \log(x)^k \\
&= \log(x)^{n}
\end{split}
$$
Therefore
$$
\begin{split}
f_{Z_{n+1}}(x) &= \frac{d}{dx} P(Z_{n+1} < x) \\
&= \frac{(-1)^{n-1}}{(n-1)!} \left(
\log(x)^{n-1}
- \log(x)^{n-1}
- \frac{1}{n} \log(x)^{n}
\right) = \frac{(-1)^n}{n!} \log(x)^n.
\end{split}
$$
Hence, by induction, the formula for $f_{Z_n}(x)$ holds for all $n$.
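
As a final sanity check (again a sketch of my own), we can compare the empirical distribution of a product of $n$ uniform samples with the CDF obtained by integrating the density above, $P(Z_n \leq t) = t \sum_{k=0}^{n-1} \frac{(-\log t)^k}{k!}$:

import math
import numpy as np

rs = np.random.RandomState(37)

n = 4
Z = rs.rand(1_000_000, n).prod(axis=1)      # products of n independent uniform samples

# Empirical P(Z_n <= t) versus the CDF obtained by integrating the density:
# integral_0^t (-log x)^(n-1) / (n-1)! dx = t * sum_{k=0}^{n-1} (-log t)^k / k!
t = 0.1
cdf = t * sum((-math.log(t)) ** k / math.factorial(k) for k in range(n))
print((Z <= t).mean(), cdf)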