## Wednesday, July 27, 2011

### The Center: Mean, Median, and Mode

Let's begin with what might be familiar territory: how people describe a list of numbers (data!) using a single central measure. There are three of these measures (mean, median, and mode), and which one is best depends heavily on the data you're describing. Let's go over how to find each one and discuss the pros and cons of each.

Consider the following very short list of test grades one of my students received last semester: 74, 84, 88, 71, and 88.

• To find the mean (also known as the average), add up all of the numbers and divide by how many numbers are in the set. That is: (74 + 84 + 88 + 71 + 88) divided by 5 = 405 / 5 = 81. The symbol we'll use for the mean is

The mean works well when there are no extreme values (numbers that lie far outside the span of the other numbers). This set has none, so we say there are no outliers, and the mean is a fine measure to use.
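If you'd like to check the arithmetic on a computer, here is a minimal Python sketch of the mean. The function name `mean` is my own, written for illustration; it simply adds the values and divides by how many there are.

```python
def mean(values):
    """Return the arithmetic mean (average) of a non-empty list of numbers."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    # Add up all of the numbers, then divide by how many there are.
    return sum(values) / len(values)

grades = [74, 84, 88, 71, 88]
print(sum(grades))   # 405
print(mean(grades))  # 81.0
```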
• To find the median (also known as the midpoint), arrange the grades in numerical order: 71, 74, 84, 88, 88. Note that duplicates are listed as many times as they occur. Once the numbers are arranged, find the number in the middle of the list. In this short list, it's 84.

Now, what if there is no single middle number? For example, suppose we add a sixth score: 94. The ordered list is now 71, 74, 84, 88, 88, 94. Looking for the middle, we find two scores instead of one: 84 and 88. In this case, take the average of the two: (84 + 88) / 2 = 86.
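The two-case rule above (an odd count has one middle value; an even count averages the two middle values) can be sketched in Python as follows. `median` here is a hypothetical helper written for illustration, not a library function.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)  # numerical order; duplicates are kept
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle number exists.
        return ordered[mid]
    # Even count: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([74, 84, 88, 71, 88]))      # 84
print(median([74, 84, 88, 71, 88, 94]))  # 86.0
```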

The median is a good choice almost anytime, and it is especially important when there are outliers. To see why the median is a more representative central measure than the mean when there are outliers, consider the salaries at a very small company, from the line workers to the CEO:

| Position | Salary |
|---|---:|
| Line worker 1 | \$28,000 |
| Line worker 2 | \$32,500 |
| Line worker 3 | \$33,100 |
| Supervisor | \$45,000 |
| Marketing person | \$62,300 |
| Sales person | \$70,000 |
| CEO | \$175,000 |

Compare the mean (\$63,700) to the median (\$45,000). The mean isn't realistic, because 5 of the 7 employees make less than it! This is because the outlier, \$175,000, inflates the mean. The median, on the other hand, reflects the central salary much better: 3 employees make more, and 3 make less.
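To see the outlier effect numerically, here is a short sketch that runs the `mean` and `median` functions from Python's standard `statistics` module on the salaries listed above, and counts how many employees earn less than the mean.

```python
from statistics import mean, median

# Salaries from the small-company example above.
salaries = {
    "Line worker 1": 28_000,
    "Line worker 2": 32_500,
    "Line worker 3": 33_100,
    "Supervisor": 45_000,
    "Marketing person": 62_300,
    "Sales person": 70_000,
    "CEO": 175_000,
}

values = list(salaries.values())
avg = mean(values)    # pulled upward by the $175,000 outlier
mid = median(values)  # middle value of the 7 sorted salaries
below_mean = sum(1 for s in values if s < avg)

print(f"Mean:   ${avg:,.0f}")  # Mean:   $63,700
print(f"Median: ${mid:,.0f}")  # Median: $45,000
print(f"Employees below the mean: {below_mean} of {len(values)}")  # 5 of 7
```

Removing the CEO's salary and rerunning is an easy way to watch the mean drop sharply while the median barely moves.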

• Finding the mode is easy, but it exists only if there are duplicates in the list. Simply find the number that occurs the most. In our original list of 5 test scores, 88 occurs twice, so the mode is 88. In our salary example directly above, there is no mode because no salary occurs more than once. If one number occurs twice and another number occurs four times, the latter is the mode because it occurs the most.

The mode is probably the least useful measure of center. It makes sense only when one value occurs many times compared with the other values. Example: 57, 66, 75, 75, 75, 75, 75, 75, 82.
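Here is a minimal sketch of finding the mode by counting occurrences with Python's `collections.Counter`. The helper name `modes` is hypothetical. Returning a list rather than a single number is my own design choice: it keeps a tie between equally common values from being broken silently, and an empty list stands for "no mode."

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no duplicates, so there is no mode
    return [v for v, c in counts.items() if c == top]

print(modes([74, 84, 88, 71, 88]))                                # [88]
print(modes([28000, 32500, 33100, 45000, 62300, 70000, 175000]))  # []
print(modes([57, 66, 75, 75, 75, 75, 75, 75, 82]))                # [75]
```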

Moral of the Story: When you see the terms "mean" and "median" used in articles, don't assume the writer is using the right term. If the data are given, you can check for yourself; if not, keep in mind that the author might have confused one measure for the other. Not everyone understands the difference, but now (hopefully) you do!

Self quiz: What are the measures of center (mean, median, and mode) for the following list of 8 student scores? 43, 77, 66, 73, 85, 75, 92, 81.

(Answer): Mean = 74; Median = 76; Mode = none. Which is the better measure? Technically, the median would be better because the low score of 43 is dragging the mean down a bit. However, it's pretty much a wash since the mean and median are so close to each other.
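To check your self-quiz answers, this sketch combines the standard library's `statistics.mean` and `statistics.median` with a `Counter` check for repeated scores (a mode exists only if some score appears more than once).

```python
from collections import Counter
from statistics import mean, median

scores = [43, 77, 66, 73, 85, 75, 92, 81]

counts = Counter(scores)
top = max(counts.values())
# A mode exists only if at least one score repeats.
mode_values = [v for v, c in counts.items() if c == top] if top > 1 else []

print(f"Mean = {mean(scores):g}")      # Mean = 74
print(f"Median = {median(scores):g}")  # Median = 76
print("Mode =", mode_values if mode_values else "none")  # Mode = none
```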