Stats Tip: Understanding Normal Distributions (and “Husky-Sized” Tomatoes)

Today’s stats tip comes to you courtesy of… my amazing friend, Ingrid Gruett, who has joined the ranks of us Gen-Xers going back to grad school for a career change.  Ingrid got her bachelor of music degree as a classmate of mine at the Wheaton Conservatory of Music and is an extremely talented clarinetist.  She is now back in school to begin a new career in music therapy.

Ingrid is taking a stats course and passed along an interesting exam question she saw this week, which I thought was a particularly good one for digesting the concept of normally distributed data and how we decide which data are unusual.  I always find that explaining a concept helps me understand it better, and perhaps this will be helpful to some new stats students out there (or others who teach them).  For you seasoned liars, damned liars, and statisticians out there, this will be old hat, but for me it was helpful to have an opportunity to explain this concept in a way that might make sense to someone who is new to statistics.

Here’s the question Ingrid passed along…

Jody, a statistics major, grows tomatoes in her spare time.  She keeps a record of the weight of each tomato she grows.  One tomato is 2 standard deviations heavier than the mean weight.  Assume a Normal model is appropriate.  What percentile is it in?

  1. 99.7
  2. 95
  3. 97.5
  4. 68
  5. None of the above

To answer this question, I offer this crudely drawn figure and some steps to guide your thinking.

[Figure: TomatoDistributionSketch]

STEP 1. CONSIDER THE TOMATOES OF THE FIELD  – Think of all the tomatoes coming out of the garden.  When we assume their weights are normally distributed, that means that all of them must fall under the bell curve of our distribution somewhere.  A handful of the tomatoes will be especially big (like the one in your problem) and will be way out under the right tail of the curve.  About the same number will be especially small, so these will be way out under the left tail of the curve.  Most will be about average (really close to the mean), which is why there’s so much more area under the middle bump of the curve than there is way out in the tails.

STEP 2. WHICH TOMATOES ARE INSIDE 2 STANDARD DEVIATIONS AND WHICH ARE OUTSIDE? – If you look at the poorly drawn picture, you can work out that about 95% of all the tomatoes are your typical tomatoes (as we said above).  That 95% is the usual rule of thumb for a Normal model: these tomatoes aren’t so unusually big or small that they are more than 2 standard deviations away from the mean, so they sit under the big middle bump of the curve.  That means that all the freakishly big AND freakishly small tomatoes are outside of that big middle bump, under the TWO tails.  So, if 95% are inside that 2SD window, 5% are outside in the TWO tails…
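If you would rather check the 95% figure than trust my drawing, here is a minimal Python sketch. The function name share_within is my own invention for illustration; the only real library call is math.erf from the standard library, which gives the share of a Normal distribution within k standard deviations of the mean as erf(k / sqrt(2)).

```python
import math

def share_within(k: float) -> float:
    """Fraction of a Normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

inside = share_within(2)   # the big middle bump
outside = 1 - inside       # both tails combined

print(f"within 2 SD:  {inside:.4f}")   # 0.9545
print(f"outside 2 SD: {outside:.4f}")  # 0.0455
```

So the "95% inside, 5% outside" split is a rounded version of roughly 95.45% and 4.55%, which is close enough for an exam question built on the rule of thumb.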

STEP 3. WHICH FREAKISH TOMATOES ARE YOU TALKING ABOUT? – This is the key step… in this case you are ONLY looking at a freakishly BIG tomato way out on the right tail of the distribution.  As we’ve already said, though, if we assume our garden grows a crop of tomatoes that is normally distributed in terms of tomato size, there must also be freakishly SMALL tomatoes way out on the left tail (these are pitiful, green, sour tomatoes that deserve our pity).  Even though we don’t care about those little guys, they are still part of the 5% that are outside the 2SD window… in fact, because the bell curve is symmetric, they make up exactly HALF of that 5%, meaning there is 2.5% in that tail and 2.5% in the other tail.  So… that means that ONLY about 2.5% of ALL the tomatoes are 2SD+ BIGGER than average… meaning about 97.5% are smaller than the monster tomato you are looking at in this problem.  Hence, it is in the 97.5th percentile (answer 3).
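The same reasoning can be sketched in a few lines of Python. The mean and standard deviation below (150 g and 20 g) are numbers I made up purely for illustration, since the exam question gives no actual weights; statistics.NormalDist is part of the Python standard library (3.8 and later).

```python
from statistics import NormalDist

# Hypothetical garden: these weights are assumptions, not from the exam question.
mean_g = 150.0
sd_g = 20.0
tomatoes = NormalDist(mu=mean_g, sigma=sd_g)

# The monster tomato sits 2 standard deviations above the mean.
big_tomato = mean_g + 2 * sd_g

# Rule-of-thumb route: 95% in the middle, leftover 5% split evenly between two tails.
rule_of_thumb = 100 - (100 - 95) / 2

# Exact route: share of tomatoes lighter than the monster tomato.
exact = 100 * tomatoes.cdf(big_tomato)

print(f"rule of thumb: {rule_of_thumb:.1f}th percentile")  # 97.5
print(f"exact:         {exact:.2f}th percentile")          # 97.72
```

Try changing mean_g and sd_g: the percentile does not move, because "2 standard deviations above the mean" lands at the same spot on any Normal curve. The exact value is a hair above 97.5 only because the 95% in the rule of thumb is itself rounded.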

Thanks, Ingrid!  I have no doubt a great many hurting people will benefit from your work as a music therapist. Keep at it!

Image Credit – “Healthy Red Tomatoes are Wet and Organic”  by, used under CC BY 2 / Modified from original