Why probability matters


In our reading on probability, we learned one way of mathematically modeling a system in which we did not have sufficient control of the important conditions to follow the causal chain of events that produced the result. The example we gave there was flipping a fair coin, where a "fair coin" is one that, if you flipped it very many times, would come up heads and tails equally often in the long run.

The basic result was that if heads and tails (H & T) are equally probable results for a single flip, then for many flips we can calculate the likelihood (the probability) that if you flip the coin N times you will get n heads and N-n tails.

The way you do this is to consider each string of N flips (HTTHHHTHTT....T) as equally probable (a microstate — everything is as specified as possible). Then define what you care about, say n heads and N-n tails (a macrostate — only the things you care about are specified). The probability of any particular macrostate is the fraction of the total number of microstates that corresponds to that macrostate.
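
To make the counting concrete, here is a short sketch in Python (the small N and the function name are just illustrative choices, not part of the reading). It enumerates every string of N = 4 flips as an equally probable microstate and computes the probability of the macrostate "n heads" by direct counting:

```python
from itertools import product
from math import comb

N = 4  # number of flips, kept small so we can list every microstate

# A microstate is one fully specified sequence, e.g. ('H', 'T', 'T', 'H').
microstates = list(product("HT", repeat=N))  # 2**N equally probable sequences

def macrostate_probability(n):
    """Fraction of microstates having exactly n heads (the macrostate)."""
    matching = [m for m in microstates if m.count("H") == n]
    return len(matching) / len(microstates)

# Direct counting agrees with the binomial formula C(N, n) / 2**N.
for n in range(N + 1):
    print(n, macrostate_probability(n), comb(N, n) / 2**N)
```

For N = 4, the macrostate with 2 heads contains 6 of the 16 microstates, so its probability is 6/16 = 0.375.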

There are a number of important general points to take away from this example.

  • The result given by a probabilistic law does NOT tell you what will happen in any given experiment (trial). It only applies if you REPEAT the experiment many times. And then it only tells you what fraction of the time you can expect the different results that are possible.
  • The model we have of the system is crucial. What are the hidden variable states (microstates) that are equally probable, and how many different ways can a result state (macrostate) be made up from different hidden variable states?
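
The first point can be seen in a simulation (a sketch; the 10 flips per trial and 100,000 trials are arbitrary choices). Any single trial either produces exactly 5 heads or it doesn't; only the fraction over many repeated trials approaches the predicted probability C(10,5)/2^10, about 0.246:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

N = 10            # flips per trial
trials = 100_000  # number of times the experiment is repeated

# Count the trials that come out with exactly 5 heads in 10 flips.
hits = sum(
    1 for _ in range(trials)
    if sum(random.random() < 0.5 for _ in range(N)) == 5
)

observed = hits / trials
predicted = 252 / 1024  # C(10, 5) / 2**10, from microstate counting

print(observed, predicted)  # the observed fraction lands close to 0.246
```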

Probabilistic models turn out to be immensely important for many groups of scientists. Here are some reasons why.

Probability in the context of a physics class

In this class, probabilistic models will play critical roles in understanding the physics behind some of the topics added because they are important for making sense of the mechanisms behind phenomena in biology.

  • The concept of how pressure arises in a gas follows directly from a model of the gas molecules moving randomly as a result of numerous collisions with other molecules.
    • This analysis results in the ideal gas law and shows us the meaning of temperature (the average kinetic energy of a molecule).
    • It also helps us understand the meaning of partial pressure and what happens when gases mix.
  • Diffusion is the phenomenon in which a set of molecules with a non-uniform density spreads itself out, becoming more and more uniform. What happens is well described by a probabilistic model in which each molecule moves at random as a result of being hit from all sides by the molecules of the fluid it is embedded in. Even though no individual molecule "knows" where it is going, the diffusion result (Fick's law) emerges from a probabilistic model of the system.
    • The Nernst equation describes the electric potential difference that arises across cell membranes. It follows directly from an understanding of how diffusion combines with electric forces.
  • Entropy is a critical concept that underlies Gibbs free energy, the function that tells us which direction a process will go spontaneously. It plays a critical role in biology. Understanding entropy requires building a strong understanding of how the microstates and macrostates for the distribution of energy can be described by a probability model.
    • This analysis helps us understand how even at thermal equilibrium the system fluctuates, with the fluctuations of the energy described by the Boltzmann factor. 
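
As a taste of the diffusion model mentioned above, here is a one-dimensional random-walk sketch (the step and molecule counts are invented for illustration). Each "molecule" takes random +1/-1 steps, standing in for kicks from the surrounding fluid; no molecule knows where it is going, but the spread of the whole collection grows in a predictable way:

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

steps = 1000      # random kicks each molecule receives
molecules = 5000  # independent walkers, all starting at x = 0

def final_position(steps):
    """Position after a 1D random walk of +1/-1 steps."""
    return sum(random.choice((-1, 1)) for _ in range(steps))

positions = [final_position(steps) for _ in range(molecules)]

# The mean stays near 0 (no preferred direction), but the variance grows
# in proportion to the number of steps: the signature of diffusion.
mean_x = statistics.mean(positions)
var_x = statistics.pvariance(positions)
print(mean_x, var_x)  # mean near 0, variance near 1000
```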

Probability for researchers

Researchers are often trying to find out what happens in some physical situation. The point of that research is not to find out "what happened" but rather "what will happen if I do the experiment again" (or if the situation occurs in the wild).

In every experiment, a researcher tries to control the variables that are important to a situation as well as they possibly can. But no variable can be controlled perfectly, and sometimes not all the variables are even known or controllable. In that case, repeating an experiment many times and seeing how the result varies when the input is "as much the same as you can make it" gives a good idea of what variation can be expected. Note that this variation is NOT "experimental error". It is experimental uncertainty, and that uncertainty is as much a part of the data as the result itself.

In your science laboratories, you will be introduced to the methods of statistical analysis that you can use to get an idea of the uncertainty in a given experimental result.
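
As a preview of that analysis, a common recipe (sketched here with invented measurement values) is to report the mean of repeated measurements together with the standard error of the mean as the uncertainty:

```python
import statistics

# Hypothetical repeated measurements of the same quantity (say, a pendulum
# period in seconds) with the inputs held as much the same as possible.
measurements = [2.05, 1.98, 2.11, 2.02, 1.95, 2.07, 2.00, 2.04]

mean = statistics.mean(measurements)
spread = statistics.stdev(measurements)           # spread of individual trials
uncertainty = spread / len(measurements) ** 0.5   # standard error of the mean

# The uncertainty is part of the data, not a mistake to apologize for.
print(f"{mean:.3f} +/- {uncertainty:.3f}")
```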

Probability for health-care professionals

Data about the success of health-care treatments — drugs, surgeries, etc. — are almost always reported using the mathematical tools for analyzing situations that have uncontrolled components: statistics. Interpreting reported statistical results can be very challenging, and the results can be misleading.

As we saw in our discussion above, to do a probabilistic (statistical) analysis of a system, you need to assume "a fair coin". The problem is, you can't determine whether a coin is truly "fair" without doing an infinite number of experiments. With a coin, you can do a LOT of flips — maybe 2000 in an hour — so in a couple of weeks you might be able to get 200,000 flips. Though it wouldn't be fun to do, that might be "close enough" to let you know that a coin is almost fair, say to within two tenths of a percent. You don't need a perfectly fair coin to make a reasonably fair choice.
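
That "two tenths of a percent" figure can be checked with the standard error of a proportion, sqrt(p(1-p)/N) (a standard statistics result, not derived in this reading). A quick sketch:

```python
from math import sqrt

def heads_fraction_uncertainty(n_flips, p=0.5):
    """Standard error of the measured heads fraction for a coin with
    true heads probability p, flipped n_flips times."""
    return sqrt(p * (1 - p) / n_flips)

for n in (2000, 200_000):
    se = heads_fraction_uncertainty(n)
    # Two standard errors gives roughly a 95%-confidence window.
    print(f"{n} flips: about +/- {200 * se:.2f}% at 95% confidence")
```

With 200,000 flips the window is about 0.22%, consistent with the "two tenths of a percent" above; with only 2,000 flips it is ten times wider.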

But with a study of patients, there are often only a small number. It's very rare to get hundreds of thousands of subjects in a medical experiment. Especially at the early stages of a new treatment, there may be only a few hundred or even only a few dozen patients receiving the treatment. What to do then?

The tool of analysis for such situations is called Bayesian probability. This goes far beyond what we will do with probability in this class, but it is often taught in medical schools as a tool for finding the best diagnosis given the information you have. The underlying idea is to assume an initial guess as to the probability (for example, assuming a "fair coin"), and then to correct that probability in the best way mathematically possible as more data comes in.
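
A minimal sketch of that underlying idea, using the standard Beta-Binomial update (the trial numbers here are invented): the initial guess is a distribution over possible success rates, and each new batch of data shifts it.

```python
def update(prior_a, prior_b, successes, failures):
    """Posterior Beta(a, b) parameters after observing new data.

    For a Beta prior on a success probability, Bayesian updating is
    simple counting: add successes to a and failures to b.
    """
    return prior_a + successes, prior_b + failures

a, b = 1, 1  # Beta(1, 1): a flat prior, no initial opinion about the rate

# Hypothetical small trial: 18 of 30 patients respond to a treatment.
a, b = update(a, b, successes=18, failures=12)

posterior_mean = a / (a + b)  # current best estimate of the success rate
print(posterior_mean)  # 19/32, a bit under the raw 18/30 because of the prior
```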

An important thing to remember when reading the results of medical experiments is that the statistical conclusions rely heavily on the assumptions made about both the population probed and the mechanism leading to the apparent randomness in the results. For example, a new medical treatment may have a 45% success rate, and that result may be highly statistically significant. That does NOT mean that every patient given the treatment will improve by 45%. It may mean that 45% of the patients will improve 100% because they have a particular element of their genome that was not known to the researchers, while the treatment has no effect on the rest of the population.
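
That hidden-subpopulation possibility is easy to simulate (a sketch with invented numbers): suppose 45% of patients respond completely and the rest not at all. The average improvement comes out near 45% even though no individual improves by 45%:

```python
import random

random.seed(1)  # fixed seed for reproducibility

patients = 10_000

# Hidden subpopulations: a 45% slice (say, carriers of an unrecognized
# genetic variant) improves 100%; the treatment does nothing for the rest.
improvements = [1.0 if random.random() < 0.45 else 0.0 for _ in range(patients)]

average = sum(improvements) / patients
print(average)  # near 0.45, yet every individual value is 0.0 or 1.0
```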

In evaluating and understanding medical statistics, it is essential to keep in mind both the model of the population being used and the mechanism behind the observed process in order to make evaluations that are useful for diagnosis.

Joe Redish 3/2019

Article 287
Last Modified: March 31, 2019