Derivatives of Trigonometric and Hyperbolic functions

Recently, I have noticed that I often encounter trigonometric and hyperbolic derivatives and integrals. It’s not that I don’t recognize the expressions, but I often mix up the different equations. Therefore, I made a list of all derivatives of the trigonometric and hyperbolic functions, including their inverse functions.

Derivatives of Trigonometric functions
$$
\begin{align}
\frac{d}{dx} \sin(x) &= \cos(x), \\
\frac{d}{dx} \cos(x) &= -\sin(x), \\
\frac{d}{dx} \tan(x) &= \frac{d}{dx} \frac{\sin(x)}{\cos(x)} = \frac{1}{\cos^2(x)} = \sec^2(x), \\
\frac{d}{dx} \cot(x) &= \frac{d}{dx} \frac{\cos(x)}{\sin(x)} = -\frac{1}{\sin^2(x)} = -\csc^2(x), \\
\frac{d}{dx} \sec(x) &= \frac{d}{dx} \frac{1}{\cos(x)} = \frac{\sin(x)}{\cos^2(x)} = \sec(x) \tan(x), \\
\frac{d}{dx} \csc(x) &= \frac{d}{dx} \frac{1}{\sin(x)} = -\frac{\cos(x)}{\sin^2(x)} = -\csc(x) \cot(x).
\end{align}
$$

Derivatives of inverse Trigonometric functions
$$
\begin{align}
\frac{d}{dx} \arcsin(x) &= \frac{1}{\sqrt{1 - x^2}}, \quad -1 < x < 1, \\
\frac{d}{dx} \arccos(x) &= -\frac{1}{\sqrt{1 - x^2}}, \quad -1 < x < 1, \\
\frac{d}{dx} \arctan(x) &= \frac{1}{1 + x^2}, \\
\frac{d}{dx} \mathrm{arccot}(x) &= -\frac{1}{1 + x^2}, \\
\frac{d}{dx} \mathrm{arcsec}(x) &= \frac{1}{|x|\sqrt{x^2 - 1}}, \quad 1 < |x|, \\
\frac{d}{dx} \mathrm{arccsc}(x) &= -\frac{1}{|x|\sqrt{x^2 - 1}}, \quad 1 < |x|.
\end{align}
$$

Derivatives of Hyperbolic functions
$$
\begin{align}
\frac{d}{dx} \sinh(x) &= \cosh(x), \\
\frac{d}{dx} \cosh(x) &= \sinh(x), \\
\frac{d}{dx} \tanh(x) &= \frac{1}{\cosh^2(x)}, \\
\frac{d}{dx} \coth(x) &= -\frac{1}{\sinh^2(x)}, \\
\frac{d}{dx} \mathrm{sech}(x) &= -\tanh(x) \mathrm{sech}(x), \\
\frac{d}{dx} \mathrm{csch}(x) &= -\coth(x) \mathrm{csch}(x).
\end{align}
$$

Derivatives of inverse Hyperbolic functions
$$
\begin{align}
\frac{d}{dx} \mathrm{arcsinh}(x) &= \frac{1}{\sqrt{1 + x^2}}, \\
\frac{d}{dx} \mathrm{arccosh}(x) &= \frac{1}{\sqrt{x^2 - 1}}, \quad 1 < x, \\
\frac{d}{dx} \mathrm{arctanh}(x) &= \frac{1}{1 - x^2}, \quad -1 < x < 1, \\
\frac{d}{dx} \mathrm{arccoth}(x) &= \frac{1}{1 - x^2}, \quad 1 < |x|, \\
\frac{d}{dx} \mathrm{arcsech}(x) &= -\frac{1}{x\sqrt{1 - x^2}}, \quad 0 < x < 1, \\
\frac{d}{dx} \mathrm{arccsch}(x) &= -\frac{1}{|x|\sqrt{1 + x^2}}.
\end{align}
$$
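A quick way to double-check any entry in these lists is symbolic differentiation. A minimal sketch with SymPy (assuming SymPy is installed; extend the list to verify the other entries):

import sympy as sp

x = sp.symbols('x')

# A few entries from the tables above.
functions = [sp.tan(x), sp.sec(x), sp.acos(x), sp.acot(x), sp.tanh(x), sp.asinh(x)]

for f in functions:
    print(f, '->', sp.simplify(sp.diff(f, x)))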

An interesting integral

Show that
$$
\int_0^{\infty} e^{-x^2 - \frac{\alpha^2}{x^2}} dx = e^{-2\alpha} \frac{\sqrt{\pi}}{2}, \qquad \alpha \ge 0.
$$

Let
$$
I(\alpha) = \int_0^{\infty} e^{-x^2 - \frac{\alpha^2}{x^2}} dx.
$$
We differentiate with respect to $\alpha$,
$$
\frac{dI}{d\alpha} = -2 \int_0^{\infty} \frac{\alpha}{x^2} e^{-x^2 - \frac{\alpha^2}{x^2}} dx
= -2 \int_0^{\infty} e^{-y^2 - \frac{\alpha^2}{y^2}} dy
= -2 I(\alpha),
$$
where we substituted $y = \alpha / x$ (the substitution reverses the integration limits, and the minus sign from $dx = -\frac{\alpha}{y^2} dy$ flips them back). The solution to $\frac{dI}{d\alpha} = -2I$ is $I(\alpha) = C e^{-2\alpha}$. To find $C$, evaluate $I$ at $\alpha = 0$ and notice a well-known integral
$$
C = I(0) = \int_0^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}.
$$
So, we have
$$
I(\alpha) = C e^{-2\alpha} = e^{-2\alpha} \frac{\sqrt{\pi}}{2}.
$$
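A quick numerical sanity check of this identity, as a minimal sketch with scipy.integrate.quad (the values of $\alpha$ are arbitrary non-negative numbers):

import numpy as np
from scipy.integrate import quad

def integrand(x, alpha):
    # The integrand vanishes rapidly for x -> 0 and x -> infinity.
    return np.exp(-x**2 - alpha**2 / x**2) if x > 0 else 0.0

for alpha in [0.0, 0.5, 1.0, 2.0]:
    numeric, _ = quad(integrand, 0, np.inf, args=(alpha,))
    exact = np.exp(-2 * alpha) * np.sqrt(np.pi) / 2
    print(alpha, numeric, exact)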

The sum of two normal distributed samples

Let $X_1$ and $X_2$ be independently and identically drawn from the distribution $\mathcal{N}(0, \sigma^2)$ with mean $0$ and (unknown) variance $\sigma^2$. The sample mean is
$$
\overline{X} = \frac{1}{2} (X_1 + X_2),
$$
and the sample variance is
$$
S^2 = \frac{1}{n - 1} \sum_i (X_i - \overline{X})^2 = (X_1 - \overline{X})^2 + (X_2 - \overline{X})^2.
$$
The random variable
$$
\frac{\overline{X}}{\sigma / \sqrt{2}}
$$
is normally distributed with mean $0$ and standard deviation $1$. The approximated random variable
$$
\frac{\overline{X}}{S / \sqrt{2}}
$$
has a Student t-distribution with 1 degree of freedom. Be aware that for a Student t-distribution with 1 degree of freedom the random variable has neither a mean nor a standard deviation. The “true” random variable and the “approximated” random variable therefore behave very differently with respect to the mean and standard deviation.

This result is easily verified numerically

import numpy as np
rng = np.random.RandomState(37)

sigma = 0.42  # Some (unknown) standard deviation
x1 = rng.normal(0, sigma, size=10000)
x2 = rng.normal(0, sigma, size=10000)

xm = (x1 + x2)/2
s2 = (x1 - xm)**2 + (x2 - xm)**2
y = xm*np.sqrt(2)/np.sqrt(s2)  # = xm / (S / sqrt(2))

print(y.mean())  # Result: 1.981..., definitely not 0
print(y.std())   # Result: 182.112..., not even close to 1

# The "approximation" is not even close to the "true" random variable.
y_true = xm*np.sqrt(2)/sigma  # = xm / (sigma / sqrt(2))
print(np.sqrt(((y - y_true)**2).mean()))  # Result: 182.082...

This should be a warning when trying to calculate the mean and variance from only 2 samples of a normal distribution. For larger $n$ this behavior goes away: the Student t-distribution has a mean for more than 1 degree of freedom ($n \ge 3$) and a variance for more than 2 degrees of freedom ($n \ge 4$).

The secretary problem

You have a single position to fill for a secretary. There are $n$ applicants for the position. After interviewing each applicant you can immediately rank them from worst to best against the previously interviewed applicants. However, immediately after the interview you must decide whether to hire the applicant; you only get this one chance, directly after the interview. The question is: how many applicants should you interview (and reject) before hiring?

If you make your decision too early you might not hire the best applicant. But if you wait too long you might have rejected the best applicant already. More realistically, this is a problem that happens quite regularly in real life as well:

  • If you want to buy a house, how many potential houses must you visit before buying your favorite house? If you reject a house, someone else will buy it.
  • How many times should you change jobs, hoping you’ll end up with the best job you can get? If you resign from a job, you (usually) can’t return to your old job.

There is a mathematical optimum for this problem. Suppose we have $n$ applicants. Our strategy is as follows: we reject the first $r - 1$ applicants, and from the remaining $n - r + 1$ applicants we hire the first one who is better than all of the first $r - 1$ applicants. The goal is to find the optimal $r$ such that the probability that we choose the best applicant is maximal. Let $P(r)$ be the probability that we select the best applicant after rejecting the first $r - 1$ applicants, then
$$
\begin{align*}
P(r) &= \sum_{i = r}^{n} P(\text{$i$ is the best applicant} \cap \text{we choose $i$}) \\
&= \sum_{i = r}^{n} P(\text{$i$ is the best applicant}) P(\text{we choose $i$} | \text{$i$ is the best applicant}).
\end{align*}
$$
We have $P(\text{$i$ is the best applicant}) = \frac{1}{n}$. The second part of the equation is a bit more work. Given that $i$ is the best applicant, what is the probability that we would choose $i$? This is only possible if none of the applicants $r, r+1, …, i-1$ was better than all of the first $r - 1$ applicants. In other words, the best applicant among the first $i - 1$ must be among the first $r - 1$ applicants. The probability of this is $\frac{r - 1}{i - 1}$. We now have
$$
\begin{align*}
P(r) &= \sum_{i = r}^{n} \frac{1}{n} \frac{r - 1}{i - 1} = \frac{r - 1}{n} \sum_{i = r}^{n} \frac{1}{i - 1}.
\end{align*}
$$

We can try to compute the sum for each $r$ and find the optimal $P(r)$, but there is another way: approximate the sum with a continuous function. Take $x = \frac{r - 1}{n}$; for large $n$ we have
$$
P(r) = \frac{r - 1}{n} \sum_{i = r}^{n} \frac{1}{i - 1} \approx x \int_{x}^{1} \frac{dt}{t} = -x \, \mathrm{log}(x) = \tilde{P}(x).
$$
$\tilde{P}(x)$ attains its optimum where its derivative is zero, i.e.
$$
\begin{align*}
0 = \tilde{P}'(x) &= -\mathrm{log}(x) - 1, \\
\mathrm{log}(x) &= -1, \qquad x = \frac{1}{e}.
\end{align*}
$$
Because $x = \frac{r - 1}{n}$ we find that $r = \frac{n}{e} + 1$, so the optimal strategy is to reject the first $\left[\frac{n}{e}\right]$ (round down) applicants.
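To see how good the continuous approximation is, we can also compute $P(r)$ exactly. A minimal sketch (the value $n = 100$ is arbitrary); the exact optimum and the $n/e$ rule give nearly identical probabilities:

import math

def p(r, n):
    # Probability of hiring the best applicant when the first r - 1 are rejected.
    return (r - 1) / n * sum(1 / (i - 1) for i in range(r, n + 1))

n = 100
best_r = max(range(2, n + 1), key=lambda r: p(r, n))
approx_r = math.floor(n / math.e) + 1

print(best_r, p(best_r, n))      # exact optimum of P(r)
print(approx_r, p(approx_r, n))  # the n/e rule, almost the same probability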

So what are the mathematically optimal solutions to the real-life questions asked above? It depends on the person, but you should first figure out how many options (applicants) you want to consider at most. Once you have found your maximum, you can use the secretary problem to figure out how many options you have to reject by default. Then you should select the first option that’s better than all the options you rejected before.

  • When should you bid on a house that you visited? It depends on how much you like visiting houses, but you probably don’t want to visit more than 10 houses. This means that you should reject the first $\left[\frac{10}{e}\right] = 3$ houses straight away. Afterwards, you should try to buy the first house you visit that’s better than those first 3.
  • When should you change jobs? First compute how many jobs you could have at most in your life. You work for around 45 years, and I think you should work at least 2.5 years for the same employer before being able to say whether you like the job or not. This means that you can have around 18 jobs at most in your working life, so you should switch jobs after 2.5 years for each of the first $\left[\frac{18}{e}\right] = 6$ jobs. Afterwards you should continue switching jobs until you find a job that’s better than any of the previous jobs. Another way to look at this problem is to say that for the first $\left[\frac{45}{e}\right] = 16$ years of your working life you can switch jobs regardless. Afterwards you should become more selective and stick to any job that’s better than all of the previous jobs. You will probably not get anything better.

What is the main takeaway from the secretary problem? We probably make decisions too soon and stick too long with our current situation. Mathematically it’s probably better to reject offers more often, and only after the first $\left[\frac{n}{e}\right]$ options should we become more picky. On the other hand, we can’t keep rejecting all our options thinking something better will eventually appear. There is a clearly defined threshold that tells you when to stop looking for other options.

Eurovision Song Contest 2021 Jury vs Televoters looks normal

On the 22nd of May 2021 the Eurovision Song Contest 2021 took place. Italy won first place, with high votes from both the jury and the public. However, I was surprised to see quite a few countries with large differences between the jury and public scores. For example, Ukraine received only 97 points from the jury while receiving 2.75 times more from the public with 267 points; Finland got 83 points from the jury and 218 from the public, 2.6 times more; Malta convinced the jury with 208 points, but received only 47 points from the public, more than 4 times less; and similar results happened with Bulgaria and Portugal.

Place | Country | Song / Artist | Points | Tele | Jury
--- | --- | --- | --- | --- | ---
1 | Italy | Zitti e buoni / Måneskin | 524 | 318 | 206
2 | France | Voilà / Barbara Pravi | 499 | 251 | 248
3 | Switzerland | Tout l’Univers / Gjon’s Tears | 432 | 165 | 267
4 | Iceland | 10 Years / Daði & Gagnamagnið | 378 | 180 | 198
5 | Ukraine | Shum / Go_A | 364 | 267 | 97
6 | Finland | Dark Side / Blind Channel | 301 | 218 | 83
7 | Malta | Je Me Casse / Destiny | 255 | 47 | 208
8 | Lithuania | Discoteque / The Roop | 220 | 165 | 55
9 | Russia | Russian Woman / Manizha | 204 | 100 | 104
10 | Greece | Last Dance / Stefania | 170 | 79 | 91
11 | Bulgaria | Growing Up Is Getting Old / Victoria | 170 | 30 | 140
12 | Portugal | Love Is On My Side / The Black Mamba | 153 | 27 | 126
13 | Moldova | Sugar / Natalia Gordienko | 115 | 62 | 53
14 | Sweden | Voices / Tusse | 109 | 63 | 46
15 | Serbia | Loco Loco / Hurricane | 102 | 82 | 20
16 | Cyprus | El Diablo / Elena Tsagrinou | 94 | 44 | 50
17 | Israel | Set Me Free / Eden Alene | 93 | 20 | 73
18 | Norway | Fallen Angel / TIX | 75 | 60 | 15
19 | Belgium | The Wrong Place / Hooverphonic | 74 | 3 | 71
20 | Azerbaijan | Mata Hari / Efendi | 65 | 33 | 32
21 | Albania | Karma / Anxhela Peristeri | 57 | 35 | 22
22 | San Marino | Adrenalina / Senhit feat. Flo Rida | 50 | 13 | 37
23 | Netherlands | Birth Of A New Age / Jeangu Macrooy | 11 | 0 | 11
24 | Spain | Voy A Quedarme / Blas Cantó | 6 | 0 | 6
25 | Germany | I Don’t Feel Hate / Jendrik | 3 | 0 | 3
26 | United Kingdom | Embers / James Newman | 0 | 0 | 0

I wondered if this is an outlier, and hence whether the jury or the public is (too) biased. Assume, as the null hypothesis, that both are fair and unbiased. I’ve not done the rigorous calculations, but I expect that under this null hypothesis the difference between the jury and the televoters per country should be normally distributed. After normalization we have the following distribution.

It looks close to normal to me. To be sure I applied D’Agostino’s K-squared test on the data using SciPy’s normaltest, with results $K^2 \approx 0.82$ and p-value $0.66$. Therefore I cannot reject the null hypothesis (at least not with any significance). Perhaps someone should take a closer look to figure out whether this is the correct model, but at first sight the differences between the jury and televoters seem to be explainable as random variation.
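For reference, a minimal sketch of this test on the televote-minus-jury differences from the table above. I assume here that normalization means standardizing the differences; D’Agostino’s $K^2$ only depends on skewness and kurtosis, so this doesn’t change the statistic:

import numpy as np
from scipy.stats import normaltest

tele = np.array([318, 251, 165, 180, 267, 218, 47, 165, 100, 79, 30, 27, 62,
                 63, 82, 44, 20, 60, 3, 33, 35, 13, 0, 0, 0, 0])
jury = np.array([206, 248, 267, 198, 97, 83, 208, 55, 104, 91, 140, 126, 53,
                 46, 20, 50, 73, 15, 71, 32, 22, 37, 11, 6, 3, 0])

diff = (tele - jury).astype(float)
diff = (diff - diff.mean()) / diff.std()  # normalize

k2, p_value = normaltest(diff)
print(k2, p_value)  # a large p-value means we cannot reject normality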

What if you vaccinate for COVID-19 in the Netherlands to reduce pressure on hospitals?

The vaccination strategy for COVID-19 in the Netherlands is to vaccinate the people with the highest risks first. These are the elderly and people with underlying health problems; also see gezondheidsraad (in Dutch). With an exception for nurses who are in direct contact with corona patients, the first people to be vaccinated are people with underlying health problems, followed by age 90+, then 80-90, etc. Other countries seem to follow the same approach. In the UK a 91-year-old woman was the first to receive a COVID-19 vaccine, and a 101-year-old woman in Germany was among the first to get a vaccine as well.

What would the vaccination strategy be if, instead of vaccinating the people with the highest risks first, we vaccinated to reduce the pressure on the hospitals as fast as possible, with the probable trade-off that more susceptible people will get COVID-19? In order to find such a strategy we need to compute the following probability: take a random person (possibly to be vaccinated) of gender $G$ and in age group $A$; what is the probability that this person gets COVID-19 and is hospitalized ($H$) for corona? According to this alternative strategy, the group $(G, A)$ with the highest probability should be vaccinated first. Mathematically we have to compute $P(H|G,A)$.

Using Bayes’ rule and the chain rule of probability we have
$$
P(H|G,A) = \frac{P(H,G,A)}{P(G,A)} = \frac{P(H)P(G|H)P(A|G,H)}{P(G,A)}.
$$

Because I don’t know the exact distributions of $H$, $G$, and $A$, I have to approximate the probabilities by counting. These calculations are subjective, and there are other, probably better, ways of approximating the true values. Data on all COVID-19 patients is openly available on the site of the RIVM. General information on the population distribution can be found on the site of the CVB. The probability $P(H)$ to be hospitalized in the Netherlands is immediately the most subjective to approximate. First, it’s time dependent. To keep it simple, $P(H)$ represents the probability that a random person gets hospitalized for corona this year. Do we know what that probability is? I don’t, because I cannot predict how COVID-19 will spread or mutate. The best I can do is to use the probability that a random person was hospitalized for corona last year:
$$
P(H) = \frac{\mathrm{number\ of\ people\ hospitalized}}{\mathrm{total\ number\ of\ people}}.
$$
Although $P(H)$ is likely to be estimated wrongly, this won’t matter for the results. Because $P(H)$ is independent of $G$ and $A$, it has no effect on the final conclusion: the absolute probabilities might be off, but not the relative ones. If the probability $P(H)$ of being hospitalized for corona differs from my approximation, the overall burden on healthcare changes, but the relative pressure per gender and age group does not. Next, given that a person is hospitalized, the probability of being male or female is:
$$
P(G|H) = \frac{\mathrm{hospitalized\ gender\ G\ patients}}{\mathrm{total\ hospitalized\ patients}}.
$$
The age groups we’re considering are 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+. Given that a person is hospitalized and of gender $G$, the probability of being in age group $A$ is
$$
P(A|G,H) = \frac{\mathrm{number\ of\ hospitalized\ gender\ G\ patients\ of\ age\ group\ A}}{\mathrm{total\ hospitalized\ gender\ G\ patients}}.
$$
The probability $P(G,A)$ of being of gender $G$ and in age group $A$ is computed directly from the CVB population distribution.
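A minimal sketch of this bookkeeping in Python. The counts below are hypothetical placeholders, not the RIVM/CVB data, and only a few $(G, A)$ groups are included:

# Hypothetical counts per (gender, age group): hospitalized patients and population size.
hospitalized = {
    ("male", "70-79"): 900, ("male", "80-89"): 800,
    ("female", "70-79"): 500, ("female", "80-89"): 600,
}
population = {
    ("male", "70-79"): 700_000, ("male", "80-89"): 300_000,
    ("female", "70-79"): 750_000, ("female", "80-89"): 450_000,
}

total_population = sum(population.values())
total_hospitalized = sum(hospitalized.values())

def p_hospitalized_given(gender, age):
    # P(H|G,A) = P(H) * P(G|H) * P(A|G,H) / P(G,A), all estimated by counting.
    p_h = total_hospitalized / total_population
    hospitalized_g = sum(v for (g, _), v in hospitalized.items() if g == gender)
    p_g_given_h = hospitalized_g / total_hospitalized
    p_a_given_gh = hospitalized[(gender, age)] / hospitalized_g
    p_ga = population[(gender, age)] / total_population
    return p_h * p_g_given_h * p_a_given_gh / p_ga

for group in sorted(hospitalized):
    print(group, p_hospitalized_given(*group))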

Filling in the details, with the data as of the 23rd of January 2021, yields the following results.

Some interesting remarks:

  • There is a large difference between males and females. All males above 40 burden healthcare more than their female counterparts.
  • To release pressure from healthcare, the best group to vaccinate first is males aged 90+. Next are males between 80 and 90, then males between 70 and 80, and after that females between 80 and 90.
  • Notice that vaccinating males between 70 and 80 releases more pressure from healthcare than vaccinating 90+ females.

Note that I’m not claiming that we should change our strategy. I only want to show that there is a difference, with respect to vaccinating, between protecting the weakest first and releasing pressure on healthcare.

The Two Girl Paradox

A family has two children. You know that one of the two children is a girl. What is the probability that the other child is a girl as well? A classical problem for a probability course exam. Unfortunately, the problem is ambiguous, because it can be interpreted in multiple ways, which lead to different results.

Let $G$ be the event that at least one of the children is a girl. Let $GG$ be the event that both children in the family are girls. By Bayes’ rule
$$
P(GG|G) = \frac{P(G|GG) P(GG)}{P(G)} = \frac{1 \times 1/4}{3/4} = \frac{1}{3}.
$$
Here we used $P(G|GG) = 1$, $P(GG) = 1/4$, and $P(G) = 3/4$, assuming that the boy/girl ratio is exactly fifty-fifty.

Let’s go back to the child of which we know that she’s a girl. She must have a name (we assume that every child has a name). Let her name be $N$ and let $G_N$ be the event that a girl is named $N$. The probability that a girl is called $N$ is set to $P(G_N) = p$, where $p$ is close to zero but not equal to zero, because there are many girl names and none of them is used too often. Note that if a family has two girls, the chance that at least one of the girls is called $N$ is
$$
P(G_N|GG) = 1 - P(\textrm{not }G_N|GG) = 1 - (1 - p)(1 - p) = 2p - p^2.
$$
By Bayes’ rule
$$
P(GG|G) = P(GG|G_N) = \frac{P(G_N|GG) P(GG)}{P(G_N)} = \frac{(2p - p^2) \times 1/4}{p} = \frac{1}{4} (2 - p).
$$
Because $p$ is positive but close to zero, we find that $P(GG|G) \approx \frac{1}{2}$. (Strictly speaking the denominator should be the probability that the family contains a girl named $N$, which is $p - p^2/4 \approx p$ for small $p$, so the conclusion is unchanged.) A different result than the one in the previous paragraph.
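Both interpretations are easy to check with a Monte-Carlo simulation. A minimal sketch (girls’ names are modelled as one of $1/p$ equally likely names; the numbers are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n_families = 1_000_000
p = 0.01                # probability that a girl carries the specific name N
n_names = int(1 / p)

girls = rng.random((n_families, 2)) < 0.5                # each child is a girl with probability 1/2
names = rng.integers(0, n_names, size=(n_families, 2))   # each child gets a random name
girl_named_n = girls & (names == 0)                      # child is a girl named N

both_girls = girls.all(axis=1)

at_least_one_girl = girls.any(axis=1)
print(both_girls[at_least_one_girl].mean())          # close to 1/3

at_least_one_girl_named_n = girl_named_n.any(axis=1)
print(both_girls[at_least_one_girl_named_n].mean())  # close to 1/2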

Perl Weekly Challenge 46 Task #2

This week’s Perl Weekly Challenge Task #2 has an interesting solution. The challenge is as follows:

Is the room open?

There are 500 rooms in a hotel with 500 employees having keys to all the rooms. The first employee opened the main entrance door of all the rooms. The second employee then closed the doors of room numbers 2, 4, 6, 8, 10 and so on to 500. The third employee then closed the door if it was opened, or opened the door if it was closed, of rooms 3, 6, 9, 12, 15 and so on to 500. Similarly the fourth employee did the same as the third, but only for room numbers 4, 8, 12, 16 and so on to 500. This goes on until all employees have had a turn.

Write a script to find out all the rooms still open at the end.

Solution

The solution is given in four steps.

Given a single room with number $N$. How many times did an employee open or close the door of that room? Let’s look at the $k$-th employee. The employee opens or closes the door if and only if $k$ is a divisor of $N$. So if we want to know how often the door has been opened or closed we must count the number of divisors of the room number $N$.

If the door of room $N$ has been opened or closed an even number of times, the door is closed (all doors start closed). Likewise, if it has been opened or closed an odd number of times, the door is open. So this problem is equivalent to finding all $N$ equal to or below 500 that have an odd number of divisors.

Take the prime decomposition of $N$
$$
N = p_1^{k_1} p_2^{k_2} … p_i^{k_i}.
$$
The number of divisors of $N$ is given by $(k_1 + 1)(k_2 + 1)…(k_i + 1)$.

Using the prime decomposition of $N$, the number of divisors of $N$ is odd if and only if $k_1$, $k_2$, …, $k_i$ are all even. Indeed, if some $k_j$ were odd, $k_j + 1$ would be even, and hence every product containing $k_j + 1$ (such as the number of divisors of $N$) would be even. And if $k_1$, $k_2$, …, $k_i$ are all even, $N$ is a perfect square.

Therefore, the only open rooms are the rooms whose number is a perfect square equal to or below 500. A one-liner in Raku (Perl 6) is:

say $_**2 for 1..(500.sqrt);
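The reasoning can be cross-checked with a brute-force simulation of all 500 employees; a small sketch in Python:

# Simulate the toggling directly and compare with the perfect squares.
doors = [False] * 501  # doors[i] is True when room i is open; index 0 is unused
for employee in range(1, 501):
    for room in range(employee, 501, employee):
        doors[room] = not doors[room]

open_rooms = [room for room in range(1, 501) if doors[room]]
print(open_rooms)
print(open_rooms == [i * i for i in range(1, 23)])  # the squares 1, 4, 9, ..., 484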

Install (x)Ubuntu 18.04 on an Asus Zenbook 13

Recently I bought an Asus Zenbook 13 RX333FN. I removed Windows 10 and installed xUbuntu 18.04 on the laptop. However, not every feature I use works out of the box. In these notes I explain how to fix this. These notes probably apply to similar Asus Zenbooks as well, and I expect these fixes to become obsolete with newer versions of Ubuntu. Don’t forget to check the ArchLinux wiki on the Asus Zenbook UX333 as well.

Enable deep suspend mode

The Zenbook doesn’t go into deep suspend mode automatically when I close the lid. To enable deep suspend mode, open /etc/default/grub, add mem_sleep_default=deep to GRUB_CMDLINE_LINUX_DEFAULT, and run sudo update-grub afterwards. In my case I have

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_sleep_default=deep"

(partially) Reduce power usage and heat

The laptop overheats quickly. I followed the Most Effective Ways To Reduce Laptop Overheating In Linux. Still, the CPU easily heats up to 40 to 50 degrees Celsius. I’m guessing that this is the downside of having such a small laptop.

Fix sound

The sound card is supported by Linux kernel 4.20 and onward. I installed the 4.20.17 kernel using ukuu.

sudo add-apt-repository ppa:teejee2008/ppa
sudo apt update
sudo apt install ukuu
sudo ukuu-gtk  # Install Kernel 4.20.17

In order to boot the newly installed kernel, enter the BIOS, go to the “Advanced Menu”, then the “Security” tab, and set “Secure Boot” to Off.

Enable Nvidia CUDA support

To enable Nvidia CUDA support, first disable the nouveau driver. Open /etc/modprobe.d/blacklist-nvidia-nouveau.conf and add

blacklist nouveau
options nouveau modeset=0

You may need to regenerate the initramfs (sudo update-initramfs -u) and reboot for the blacklist to take effect. Next, open Software & Updates in the menu, go to Additional Drivers, and select nvidia-driver-418 (open source). Click on Apply Changes.

Go to the Nvidia home page and download CUDA Toolkit and cuDNN. To download cuDNN you’re required to create an Nvidia account. Install the packages:

sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.168-418.67_1.0-1_amd64.deb
sudo dpkg -i libcudnn7_7.6.0.64-1+cuda10.1_amd64.deb

Reboot your machine and check if CUDA works.

> nvidia-smi
Mon Jun 10 15:02:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX150       On   | 00000000:02:00.0 Off |                  N/A |
| N/A   57C    P0    N/A /  N/A |    111MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3348      G   /usr/lib/xorg/Xorg                           109MiB |
|    0      6656      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+

Retrieve the Windows 10 license key

The Zenbook comes with a Windows 10 license key hardcoded in the firmware. You can extract the license key as follows:

sudo grep -aPo '[\w]{5}-[\w]{5}-[\w]{5}-[\w]{5}-[\w]{5}' /sys/firmware/acpi/tables/MSDM

The estimator for the variance of $m$ normal distributions

Fix a variance $\sigma^2 > 0$ and let $n > 0$ and $\mu_1, \mu_2, …, \mu_n \in \mathbb{R}$. Given $m = m_1 + m_2 + … + m_n$ samples
$$
\begin{split}
X_{11}, X_{12}, …, X_{1m_1} &\sim N(\mu_1, \sigma^2), \\
X_{21}, X_{22}, …, X_{2m_2} &\sim N(\mu_2, \sigma^2), \\
&\vdots \\
X_{n1}, X_{n2}, …, X_{nm_n} &\sim N(\mu_n, \sigma^2), \\
\end{split}
$$
where $m_1, m_2, …, m_n > 1$. What is the best estimator for the variance $\sigma^2$? In other words, we draw samples from $n$ normal distributions with the same variance, but with not necessarily equal means. What is the best way to estimate the variance of these $n$ normal distributions?

Main result

Surprisingly, there is an answer. It is surprising because we are looking for the best estimator for the variance, and beforehand there is no guarantee that a best estimator actually exists. Moreover, the answer is simple, though a bit harder to prove. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$. The unique uniformly minimum-variance unbiased estimator (UMVUE) for $\sigma^2$ is given by
$$
S(X) = \frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2,
$$
where $\overline{X}_i = \frac{1}{m_i} \sum_{j = 1}^{m_i} X_{ij}$. In other words, the best estimator for $\sigma^2$ is $S(X)$.

An example

Fix $\sigma = 0.35$. Suppose
$$
\begin{split}
X_1 &= (1.69, 1.35, 1.75, 1.45, 1.77) \sim N(1.5, \sigma^2), \\
X_2 &= (2.27, 1.73, 1.46) \sim N(2.0, \sigma^2), \\
X_3 &= (-0.28, 0.32, 0.69, 0.14) \sim N(0.1, \sigma^2)
\end{split}
$$
We compute
$$
\begin{split}
m &= |X_1| + |X_2| + |X_3| = 5 + 3 + 4 = 12, \quad n = 3, \\
\overline{X}_1 &\approx 1.60, \overline{X}_2 \approx 1.82, \overline{X}_3 \approx 0.21.
\end{split}
$$
According to the main result we approximate the variance $\sigma^2$ as follows
$$
\begin{split}
\frac{1}{m - n} \sum_{i = 1}^n \sum_{j = 1}^{m_i} (X_{ij} - \overline{X}_{i})^2
&= \frac{1}{12 - 3} ( (1.69 - 1.6)^2 + (1.35 - 1.6)^2 + (1.75 - 1.6)^2 \\
&\qquad + (1.45 - 1.6)^2 + (1.77 - 1.6)^2 + (2.27 - 1.82)^2 \\
&\qquad + (1.73 - 1.82)^2 + (1.46 - 1.82)^2 + (-0.28 - 0.21)^2 \\
&\qquad + (0.32 - 0.21)^2 + (0.69 - 0.21)^2 + (0.14 - 0.21)^2) \\
&\approx 0.11
\end{split}
$$
Then the standard deviation is approximated by $\sigma \approx 0.33$. Not bad!
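A small numerical check of the main result with NumPy: draw many repetitions of the three groups above and average the estimator; the average should be close to $\sigma^2 = 0.1225$ (the seed and the number of repetitions are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
sigma = 0.35
means = [1.5, 2.0, 0.1]
sizes = [5, 3, 4]

def pooled_variance(groups):
    # S(X) = 1/(m - n) * sum_i sum_j (X_ij - mean_i)^2
    m = sum(len(g) for g in groups)
    n = len(groups)
    ss = sum(((g - g.mean())**2).sum() for g in groups)
    return ss / (m - n)

estimates = [
    pooled_variance([rng.normal(mu, sigma, size) for mu, size in zip(means, sizes)])
    for _ in range(10_000)
]
print(np.mean(estimates), sigma**2)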

The proof

Before we can start with the proof of the main result, we have to introduce some definitions and theorems from statistics.

Sufficient statistic

A statistic $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if the conditional probability distribution of the data $X$, given the statistic $T(X)$, doesn’t depend on the parameter $\theta$. We have the following equivalent characterization of sufficiency of a statistic $T(X)$.

Theorem (Fisher-Neyman factorization)

Let $f_{\theta}(x)$ be the probability density function of $X$. $T(X)$ is sufficient for the underlying parameter $\theta$ if and only if there exist non-negative functions $g$ and $h$ such that
$$
f_{\theta}(x) = h(x) g_{\theta}(T(x)),
$$
where $h(x)$ doesn’t depend on $\theta$, and $g_{\theta}(T(x))$ depends on $x$ only through $T(x)$.

Complete statistic

A statistic $T(X)$ is complete if and only if for every measurable function $g$
$$
\forall \theta: E_{\theta}(g(T)) = 0 \rightarrow \forall \theta: P_{\theta}(g(T) = 0) = 1.
$$
In other words, if the expected value of $g(T)$ is zero for every $\theta$, then $g(T) = 0$ almost surely.

Theorem (Lehmann-Scheffe)

The Lehmann-Scheffe theorem provides the tool to turn a sufficient and complete statistic for $\theta$ into a UMVUE. Let $X = (X_1, X_2, …, X_n)$ be random samples from a distribution with probability density $f_{\theta}(x)$. Suppose $T$ is a sufficient and complete statistic for $\theta$, and suppose the estimator $g(T(X))$ is unbiased, i.e. $E(g(T(X))) = \theta$. Then $g(T(X))$ is the unique uniformly minimum-variance unbiased estimator (UMVUE) for $\theta$.

Lemma 1

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the statistics $S(X_1, X_2) = X_1 X_2$ and $T(X_1, X_2) = X_1 + X_2$ are complete.

Proof

We prove the completeness of $T$; the completeness of $S$ is left as an exercise. Fix $\theta = (\mu_1, \mu_2, \sigma^2)$. Let $g$ be a measurable function such that
$$
E_{\theta}(g(T(X))) = E_{\theta}(g(X_1 + X_2)) = 0.
$$
The probability distribution of $T(X_1, X_2)$ is
$$
f_{\mu_1 + \mu_2, 2\sigma^2}(x) = (4\pi\sigma^2)^{-\frac{1}{2}} \mathrm{exp} \left( - \frac{(x - \mu_1 - \mu_2)^2}{4 \sigma^2} \right).
$$
Therefore
$$
0 = E_{\theta}(g(T(X))) = (4\pi\sigma^2)^{-\frac{1}{2}} \int g(t) \, \mathrm{exp} \left(-\frac{(t - \mu_1 - \mu_2)^2}{4 \sigma^2}\right) dt.
$$
The $\mathrm{exp}$ term is always greater than $0$, therefore $g(t) = 0$ almost everywhere, and so $P_{\theta}(g(T) = 0) = 1$.

Lemma 2

Given independent samples $X_1 \sim N(\mu_1, \sigma^2)$, $X_2 \sim N(\mu_2, \sigma^2)$, …, $X_n \sim N(\mu_n, \sigma^2)$, the statistics $\sum_i X_i$ and $\sum_i X_i^2$ are complete.

Proof

Both statements follow by induction on $n$ and repeated application of Lemma 1.

Proof of the main result

Let $X_{ij}$ be as defined in the main result. Define $X = (X_{ij})$ to be the joint distribution of $\{X_{ij}\}$ with probability density
$$
\begin{split}
f(x) &= f_{\mu_1, \mu_2, …, \mu_n, \sigma^2}(x) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} (x_{ij} – \mu_i)^2 \right) \\
&= (2\pi)^{-\frac{m}{2}} \sigma^{-m} \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_i m_i \mu_i^2\right) \\
&\qquad \times \mathrm{exp} \left(-\frac{1}{2\sigma^2} \sum_{i,j} x_{ij}^2\right) \mathrm{exp} \left(\sum_i \frac{\mu_i}{\sigma^2} \sum_j x_{ij}\right).
\end{split}
$$
Notice that the equation above satisfies the requirements of the Fisher-Neyman factorization for
$$
T(X) = (\sum_{j} X_{1j}, \sum_{j} X_{2j}, …, \sum_{j} X_{nj}, \sum_{i,j} X_{ij}^2).
$$
Therefore $T(X)$ is a sufficient statistic for $(\mu_1, \mu_2, …, \mu_n, \sigma^2)$. From Lemmas 1 and 2 it follows that $T(X)$ is a complete statistic. Define the statistic $S$ in terms of $T$ as
$$
\begin{split}
S(X) &= \frac{1}{m - n} \left( \sum_{i,j} X_{i,j}^2 - \sum_i \frac{1}{m_i} \Big( \sum_j X_{i,j} \Big)^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j X_{i,j}^2 - m_i \overline{X}_i^2 \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j (X_{i,j} - \mu_i)^2 - m_i (\overline{X}_i - \mu_i)^2 \right).
\end{split}
$$
Note that this is exactly the $S(X)$ of the main result, because $\sum_j (X_{i,j} - \overline{X}_i)^2 = \sum_j X_{i,j}^2 - m_i \overline{X}_i^2$. In particular, $S$ is a function of the complete and sufficient statistic $T$. Moreover, $S$ is unbiased:
$$
\begin{split}
E(S) &= \frac{1}{m - n} \sum_i \left( \sum_j E((X_{i,j} - \mu_i)^2) - m_i E((\overline{X}_i - \mu_i)^2) \right) \\
&= \frac{1}{m - n} \sum_i \left( \sum_j \sigma^2 - m_i \frac{\sigma^2}{m_i} \right) \\
&= \frac{1}{m - n} \sum_i (m_i - 1) \sigma^2 = \sigma^2,
\end{split}
$$
because
$$
\begin{split}
E((X_{i,j} - \mu_i)^2) &= \mathrm{Var}(X_{i,j}) = \sigma^2, \\
E((\overline{X}_{i} - \mu_i)^2) &= \mathrm{Var}(\overline{X}_{i}) \\
&= \frac{1}{m_i^2} \left(\sum_j \mathrm{Var}(X_{i,j}) + \sum_{j \neq k} \mathrm{Cov}(X_{i,j}, X_{i,k}) \right) \\
&= \frac{1}{m_i^2} \sum_j \sigma^2 = \frac{\sigma^2}{m_i}.
\end{split}
$$
The covariance terms vanish because the samples are independent. Therefore $S$ is an unbiased estimator of $\sigma^2$ and a function of the complete and sufficient statistic $T$. From the Lehmann-Scheffe theorem it follows that $S$ is the UMVUE for $\sigma^2$. QED