Sunday, August 7, 2011

Picturing Categorical Data

They say a picture is worth a thousand words, and that's certainly true with statistical data! The first thing to note is that different types of data are pictured in different ways. You might recall from an earlier posting that there are two types of data:
• Categorical data are the categories (levels) of a variable, which you do not combine with arithmetic. You usually count them. Examples: eye colors (blue, brown, green, hazel, gray), education levels (high school, undergrad, graduate, etc.). Some categorical variables are numbers, but not ones that make sense to add; examples are zip codes, numbers on athletic jerseys, and phone numbers.
• Quantitative data are numbers on which it makes sense to do arithmetic. For example, with test scores, dollars, miles, etc., it is meaningful to, say, find their average. Quantitative data usually have units, such as points, dollars, or miles.
In this post, we'll picture categorical data only; I'll cover quantitative data in another post.

There are two common ways to picture categorical data graphically: bar charts and pie charts. You see these all the time in articles. Let's use the following base data, which shows eye colors in a group of 115 people:
Bar Charts
The most straightforward type of display is a bar chart. Bar charts can portray the actual numbers...

...or they can portray percents of the whole (115)...
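Turning counts into percents of the whole is just division: each count over the total (115 here), times 100. Here's a minimal Python sketch, assuming the counts are stored in a dictionary that maps each category to its count; `to_percents` is just a hypothetical name for illustration.

```python
def to_percents(counts: dict[str, int]) -> dict[str, float]:
    """Convert category counts into percents of the whole group."""
    total = sum(counts.values())  # e.g. 115 people in this post's example
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Each category's share of the total, expressed out of 100.
    return {category: 100 * n / total for category, n in counts.items()}
```

The percents always add up to 100 (apart from rounding), which is a handy check before you draw the chart.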

Notice that the bars can be vertical or horizontal. In either case, the bars are arranged in order of size, either increasing or decreasing.
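If you want to check the ordering, or sketch a quick chart without graphing software, here's a minimal Python sketch that sorts the categories by count and prints a horizontal text bar chart with one `#` per person. It assumes the same kind of category-to-count dictionary; `sorted_bars` and `text_bar_chart` are hypothetical names.

```python
def sorted_bars(counts: dict[str, int], descending: bool = True) -> list[tuple[str, int]]:
    """Return (category, count) pairs ordered by bar size."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=descending)


def text_bar_chart(counts: dict[str, int], symbol: str = "#") -> str:
    """Render a horizontal bar chart, one symbol per person, largest bar first."""
    pairs = sorted_bars(counts)
    # Pad category names so all the bars start in the same column.
    width = max(len(name) for name, _ in pairs)
    lines = [f"{name:<{width}} | {symbol * n} {n}" for name, n in pairs]
    return "\n".join(lines)
```

Passing `descending=False` to `sorted_bars` gives the increasing order instead.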

If you have several very small bars, you can combine them into one larger bar labeled "Other." The "Other" bar doesn't need to be placed in order of size; it usually appears last. Suppose, for example, you want to put green, hazel, and gray together as an "Other" bar. Then your graph would look like this:
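Here's a minimal Python sketch of that bookkeeping, assuming you pick the small categories yourself (for the example above, green, hazel, and gray). The remaining bars are sorted largest first, and the combined bar is appended at the end regardless of its size. The name `combine_into_other` is hypothetical.

```python
def combine_into_other(
    counts: dict[str, int], small: set[str], label: str = "Other"
) -> list[tuple[str, int]]:
    """Merge the chosen small categories into one bar that goes last."""
    # Keep the larger categories and order them by size, biggest first.
    kept = [(c, n) for c, n in counts.items() if c not in small]
    kept.sort(key=lambda item: item[1], reverse=True)
    # Add up everything being folded into the combined bar.
    other_total = sum(n for c, n in counts.items() if c in small)
    if other_total > 0:
        kept.append((label, other_total))  # "Other" goes last, not in size order
    return kept
```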

Sometimes you will see a single bar divided into sections, one for each category, sized in proportion. This is called a segmented (or stacked) bar chart. It can show the actual counts (the height of the single bar equals the sum of the counts) or percentages (the height of the single bar represents 100%). Here's how a segmented bar chart with percents would look. Notice that it's good to arrange the segments in decreasing order of size from bottom to top:
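To see where each segment starts and stops in a percent-based segmented bar, you stack the percents from the bottom up, largest first. Here's a minimal Python sketch, again assuming a category-to-count dictionary; `segment_bounds` is a hypothetical name, and it returns each category's bottom and top edge as a percent of the bar's height.

```python
def segment_bounds(counts: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (category, bottom, top) in percent, largest segment at the bottom."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    bounds = []
    bottom = 0.0
    for category, n in ordered:
        # Each segment sits on top of the previous one.
        top = bottom + 100 * n / total
        bounds.append((category, bottom, top))
        bottom = top
    return bounds  # the last top edge is 100 (up to rounding)
```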
Segmented bar charts are especially helpful in comparing the same categories in two or more different groups. Suppose we had a second group of people whose segmented bar chart of eye colors looked slightly different. We could put the bars side by side in a single display, with the eye colors in the same order.

It is easy to see that Group 2 has a smaller share of brown-eyed people, a larger share of blue-eyed people, about the same share of green- and gray-eyed people, and a smaller share of hazel-eyed people.

One caution with two or more segmented bars: unless the groups are exactly the same size, you should use percents rather than counts. Otherwise the bars would have different heights, and it would be nearly impossible to compare the segments.
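A quick way to see why counts mislead: a category can have the same count in two groups but a different share of each group. Here's a minimal Python sketch, assuming each group is a category-to-count dictionary, that reports the difference in percentage points (Group 2 minus Group 1) for every category; `compare_percents` is a hypothetical name.

```python
def compare_percents(group1: dict[str, int], group2: dict[str, int]) -> dict[str, float]:
    """Percentage-point difference (group2 minus group1) for each category."""
    total1 = sum(group1.values())
    total2 = sum(group2.values())
    # Include categories that appear in either group; missing ones count as 0.
    categories = list(group1) + [c for c in group2 if c not in group1]
    return {
        c: 100 * group2.get(c, 0) / total2 - 100 * group1.get(c, 0) / total1
        for c in categories
    }
```

For example, 10 brown-eyed people out of 20 is 50%, but 10 out of 40 is only 25%, so equal counts can hide a 25-point drop.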

Pie Charts
A pie chart shows each category as a slice whose size is proportional to that category's share of the whole. Because the whole pie represents 100% of the group, pie charts always use percents. So, a pie chart for our eye color data would look like this:

Notice that the pieces are arranged in order of size as you go around the pie.
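Each slice's angle is its percent of 360 degrees, so a category with 25% of the people gets a 90-degree slice. Here's a minimal Python sketch, assuming a category-to-count dictionary, that lists each slice's start and end angle going around the pie from largest to smallest; `pie_slices` is a hypothetical name.

```python
def pie_slices(counts: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (category, start_angle, end_angle) in degrees, largest slice first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    slices = []
    start = 0.0
    for category, n in ordered:
        # Slice angle = (count / total) * 360 degrees.
        end = start + 360 * n / total
        slices.append((category, start, end))
        start = end
    return slices  # the last end angle is 360 (up to rounding)
```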

(Self-Test): Suppose there's another group -- this time consisting of 140 people -- whose eye colors are as follows. Make a "Group 3" segmented bar chart to show the differences.