# Example: Sampling

## Understanding the situation

A powerful estimation technique common in both biology and medicine is sampling. When you want to estimate information about a large population, you sample that population by looking at a subset of the data. Let's see how it works in a simple case (a toy model). (Blood counts of red and white cells used to be done this way: by hand, using a microscope with a grid on the slide.)

## Presenting a sample problem

The figure at the right shows a large number of spots. Without counting all of them, can you fairly quickly get a good estimate of how many there are? Can you estimate how good your estimate is — again without counting all of them?

## Solving this problem

One way to do this is to draw a grid over the image to break it up into boxes, count the dots in a few of the boxes, and assume that most of the boxes hold similar numbers. This is basically a "density" argument: it assumes that the density of dots is reasonably uniform.

Of course, to do this your grid has to be an appropriate intermediate size. It can't have too many dots in each box, or you might as well count the whole thing. It can't have too few, or the number in each box won't be about the same and the "density" assumption will be no good.
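To get a feel for the "too few" side of this trade-off, here is a rough sketch on made-up data. It assumes 235 dots scattered uniformly at random on a unit square (a synthetic stand-in, not the dots in the figure) and compares how much the count varies from box to box on a 6 x 5 grid and on a much finer 30 x 25 grid. The function names are just for illustration.

```python
import random
import statistics

def box_counts(points, n_cols, n_rows):
    """Count how many points fall in each box of an n_cols x n_rows grid on the unit square."""
    counts = [0] * (n_cols * n_rows)
    for x, y in points:
        col = min(int(x * n_cols), n_cols - 1)
        row = min(int(y * n_rows), n_rows - 1)
        counts[row * n_cols + col] += 1
    return counts

def relative_spread(counts):
    """Box-to-box standard deviation divided by the mean count per box."""
    return statistics.pstdev(counts) / statistics.mean(counts)

rng = random.Random(0)  # fixed seed so the run is repeatable
# ASSUMPTION: synthetic dots, uniformly random; not the dots from the figure.
points = [(rng.random(), rng.random()) for _ in range(235)]

coarse = box_counts(points, 6, 5)    # about 8 dots per box on average
fine = box_counts(points, 30, 25)    # far fewer than 1 dot per box on average

print(f"6 x 5 grid:   relative spread {relative_spread(coarse):.2f}")
print(f"30 x 25 grid: relative spread {relative_spread(fine):.2f}")
```

With so few dots per box on the fine grid, most boxes are empty and a handful of sampled boxes says very little about the average, which is the sense in which the "density" assumption stops being useful.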

The grid shown below (6 x 5) ought to work pretty well for this situation.

There are 30 boxes. Let's pick 4 at random and get an average number per box. Since the dots are big and some cross box boundaries, let's count only those that are more than half inside the box. We should choose randomly (and if a box comes up again, we include it twice!). I'm using a random number generator that I found using Google. Here are the boxes I get and my count in each.

| Box | A3 | C5 | B1 | A4 |
|---|---|---|---|---|
| Count | 7 | 5 | 7 | 11 |

This is a total of 30 or an average number per box of 30/4 = 7.5. Multiplying by 30 gives an estimate of the total as 225.
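The pick-and-scale step can be sketched in a few lines of Python. This is only an illustration: the labels assume the letters A to F run one way across the grid and the numbers 1 to 5 the other, and the names `pick_boxes` and `estimate_total` are made up here. The counting itself still has to be done by eye.

```python
import random

COLUMNS = "ABCDEF"  # assumed: 6 letters label one side of the grid
ROWS = "12345"      # assumed: 5 numbers label the other side
N_BOXES = len(COLUMNS) * len(ROWS)  # 30 boxes in the 6 x 5 grid

def pick_boxes(k, rng=random):
    """Pick k box labels at random; the same box may be drawn more than once."""
    labels = [c + r for c in COLUMNS for r in ROWS]
    return rng.choices(labels, k=k)

def estimate_total(counts, n_boxes=N_BOXES):
    """Scale the average count per sampled box up to the whole grid."""
    return n_boxes * sum(counts) / len(counts)

print(pick_boxes(4))                  # four labels such as 'A3'
print(estimate_total([7, 5, 7, 11]))  # counts from the first trial: 225.0
```

`random.choices` samples with replacement, which matches the rule that a box drawn twice is counted twice.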

Let's do it again and see how stable our result is.

| Box | A4 | D1 | F2 | B5 |
|---|---|---|---|---|
| Count | 11 | 6 | 10 | 7 |

This is a total of 34 or an average number per box of 34/4 = 8.5. Multiplying by 30 gives an estimate of the total as 255. Try one more time.

| Box | B4 | C2 | A3 | C5 |
|---|---|---|---|---|
| Count | 11 | 7 | 7 | 5 |

This is again a total of 30, giving an estimate of 225. Averaging our 3 trials, we get an estimate of 235 for the total number of dots in the figure. I would report this as 230 ± 10.
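The text does not say how the ± 10 was arrived at. One common way to put a number on the scatter between trials is the standard error of the mean, and the sketch below (an assumption about method, not necessarily what was done here) shows that it gives a value consistent with the one quoted.

```python
import statistics

trial_estimates = [225, 255, 225]  # the three estimates from the text

mean = statistics.mean(trial_estimates)
# Sample standard deviation of the trials, then the standard error of their mean.
spread = statistics.stdev(trial_estimates)
std_error = spread / len(trial_estimates) ** 0.5

print(f"mean = {mean}, standard error = {std_error:.0f}")
print(f"relative uncertainty = {std_error / mean:.1%}")
```

With only three trials this uncertainty is itself rough, so it is best read as "about 10" rather than as a precise figure.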

It looks like we can get the result to about 5% by sampling.

Joe Redish 4/30/15
