Central Limit Theorem
The central limit theorem explains why Gaussian distributions are so common, and in particular it is essential for thermodynamics. Any quantity that is the sum of a large number of terms, each drawn independently from almost any probability distribution, will be approximately described by a Gaussian distribution. The key requirement is that the distribution describing the individual values have a finite second moment.
Program CentralLimitTheorem lets you test the central limit theorem for three distributions and explore the distribution of the sum as a function of the number of terms $N$.
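If you do not have the program at hand, the following minimal Python sketch (assuming only NumPy and Matplotlib, and not a reproduction of the actual Program CentralLimitTheorem, which offers more distributions and interactive controls) performs the same basic experiment: accumulate many measurements of the sum $S$ and compare the estimated $p(S)$ with the Gaussian predicted by the central limit theorem.

```python
# Minimal sketch: estimate p(S) for the sum of N uniform variates on [0, 1]
# and compare with the Gaussian predicted by the central limit theorem.
import numpy as np
import matplotlib.pyplot as plt

N = 12            # number of terms in the sum S
trials = 100_000  # number of measurements of S

S = np.random.default_rng(0).uniform(0.0, 1.0, size=(trials, N)).sum(axis=1)

# Histogram estimate of p(S); density=True normalizes so that sum(p * dS) = 1.
p, edges = np.histogram(S, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# CLT prediction: Gaussian with mean N/2 and variance N/12 (see part (a)).
mu, var = N / 2, N / 12
gauss = np.exp(-(centers - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

plt.plot(centers, p, label="measured p(S)")
plt.plot(centers, gauss, "--", label="Gaussian, mean N/2, variance N/12")
plt.xlabel("S"); plt.ylabel("p(S)"); plt.legend(); plt.show()
```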
Problem: Central limit theorem. Use Program CentralLimitTheorem to test the applicability of the central limit theorem.
- (a) Assume that the variable $s$ is uniformly distributed between 0 and 1. Calculate analytically the mean and standard deviation of $s$, and compare your numerical results from the program with your analytical calculation. (A worked version of this calculation is sketched at the end of the problem.)
- (b) Use the default value of $N=12$, the number of terms in the sum $S = \sum_{i=1}^N s_i$, and describe the qualitative form of $p(S)$ that is computed and plotted by the program; $p(S)\Delta S$ is the probability that the sum $S$ is between $S$ and $S+\Delta S$. Does the qualitative form of $p(S)$ change as the number of measurements (trials) of $S$ is increased for a given value of $N$?
- (c) What is the approximate width of $p(S)$ for $N = 12$? Describe the changes, if any, in the width of $p(S)$ as $N$ is increased; increase $N$ by at least a factor of four. Do your results depend strongly on the number of measurements? (A numerical sketch of the width scaling appears at the end of the problem.)
- (d) To determine the generality of your results, consider the probability density $f(s) = e^{-s}$ for $s \geq 0$, and answer the same questions as in parts (a)-(c). (The width-scaling sketch at the end of the problem includes this density as well.)
- (e) Consider the Lorentz distribution $$ f(s) = \frac{1}{\pi}\,\frac{1}{s^2 + 1}, $$ where $-\infty < s < \infty$. What are the mean value and variance of $s$? Is the form of $p(S)$ consistent with the results that you found in parts (b)-(d)? (A sketch illustrating this case numerically appears at the end of the problem.)
- *(f) Each value of $S$ can be considered to be a measurement. The sample variance $\tilde \sigma_S^2$ is a measure of the squared differences between each measurement and the mean, and is given by \begin{equation} \tilde \sigma_S^2 = \frac{1}{N-1}\sum_{j=1}^N (S_j - \overline{S})^2. \label{eq:3/stadardmeans} \end{equation} The reason for the factor of $N - 1$ rather than $N$ in the definition of $\tilde \sigma_S^2$ is that to compute it, we first need to use the $N$ values of $S$ to compute the mean $\overline{S}$. Thus, loosely speaking, only $N - 1$ independent values of $S$ remain for calculating $\tilde \sigma_S^2$. Show that if $N \gg 1$, then $\tilde \sigma_S \approx \sigma_S$, where the standard deviation $\sigma_S$ is given by $\sigma_S^2 = \overline{S^2} - \overline{S}^2$. (A short derivation is sketched at the end of the problem.)
- *(g) The quantity $\tilde \sigma_S$ in \eqref{eq:3/stadardmeans} is known as the standard deviation of the means; that is, $\tilde \sigma_S$ is a measure of how much variation we expect to find if we make repeated measurements of $S$. How does the value of $\tilde \sigma_S$ compare with your estimate of the width of the probability density $p(S)$?
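As a check on part (a): for the uniform density $f(s) = 1$ on $0 \leq s \leq 1$,
\begin{equation}
\overline{s} = \int_0^1 s\,ds = \frac{1}{2}, \qquad \overline{s^2} = \int_0^1 s^2\,ds = \frac{1}{3}, \qquad \sigma_s^2 = \overline{s^2} - \overline{s}^2 = \frac{1}{12},
\end{equation}
so $\sigma_s = 1/\sqrt{12} \approx 0.289$. Because the $s_i$ are independent, the sum satisfies $\overline{S} = N\overline{s} = N/2$ and $\sigma_S^2 = N\sigma_s^2 = N/12$; for $N = 12$ this gives $\overline{S} = 6$ and $\sigma_S = 1$.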
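For parts (c) and (d), the following sketch (again NumPy-only, a stand-in for repeated runs of the program) estimates the width of $p(S)$ for several values of $N$. The central limit theorem predicts $\sigma_S = \sigma_s\sqrt{N}$, that is, $\sqrt{N/12}$ for the uniform density and $\sqrt{N}$ for the exponential density $f(s) = e^{-s}$, which has $\sigma_s = 1$.

```python
# Sketch: how the width of p(S) scales with N for the uniform density on [0, 1]
# and the exponential density f(s) = exp(-s).
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

for N in (12, 24, 48, 96):
    S_uniform = rng.uniform(0.0, 1.0, size=(trials, N)).sum(axis=1)
    S_expon = rng.exponential(1.0, size=(trials, N)).sum(axis=1)
    print(f"N = {N:3d}:  uniform sigma_S = {S_uniform.std():.3f} "
          f"(CLT predicts {np.sqrt(N / 12):.3f}),  "
          f"exponential sigma_S = {S_expon.std():.3f} "
          f"(CLT predicts {np.sqrt(N):.3f})")
```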
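For part (e), a similar numerical experiment illustrates what goes wrong when the finite-second-moment condition fails. The sum of $N$ independent standard Cauchy (Lorentz) variates is again Cauchy distributed, with width parameter $N$, so $p(S)$ never approaches a Gaussian. The sketch below uses NumPy's standard Cauchy sampler.

```python
# Sketch: the Lorentz (Cauchy) density has no finite second moment, so the
# central limit theorem does not apply and the sample standard deviation of S
# does not settle down as the number of trials grows.
import numpy as np

rng = np.random.default_rng(0)
N = 12
for trials in (10**3, 10**4, 10**5, 10**6):
    S = rng.standard_cauchy(size=(trials, N)).sum(axis=1)
    print(f"trials = {trials:7d}:  sample std of S = {S.std():.1f}")
# The printed widths fluctuate wildly and do not converge, in contrast to the
# uniform and exponential densities.
```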
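For part (f), one route to the result is to expand the square and collect the sample moments $\overline{S} = \frac{1}{N}\sum_j S_j$ and $\overline{S^2} = \frac{1}{N}\sum_j S_j^2$:
\begin{equation}
\tilde \sigma_S^2 = \frac{1}{N-1}\sum_{j=1}^N \big(S_j - \overline{S}\big)^2 = \frac{N}{N-1}\Big(\overline{S^2} - \overline{S}^2\Big) = \frac{N}{N-1}\,\sigma_S^2.
\end{equation}
Because $N/(N-1) = 1 + 1/(N-1) \to 1$ as $N \to \infty$, it follows that $\tilde \sigma_S \approx \sigma_S$ for $N \gg 1$.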