How's it going so far?


Friday, August 12, 2011

Keeping it Simple -- the Area Principle

If you made it through the [rather long] post entitled "Picturing Categorical Data," this next post brings out a fine point about making graphical displays. If you look through articles and newspapers, or perhaps at slides that people in your office make for presentations, you might notice a tendency to make the displays fancy. While the intent is admirable -- trying to show potentially dry information in a more splashy way -- the result can end up being misleading. Read on.

Take a look at this pie chart, which graphs quantities that are 10%, 20%, 30%, and 40% of the whole.

Now compare it to its flashy 3-D counterpart:
"What's the difference?" you might ask. I would argue that the 3-D version makes the green slice (30%) look larger than the purple slice (40%). Can you see it? This doesn't happen with every 3-D display, but 3-D displays are prone to it. It's something to look out for, just in case.

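The Area Principle says that the area taken up by each part of a display should correspond to the size of the quantity it represents. When a smaller pie slice or bar (in a bar chart) looks larger than a slice or bar that represents a larger quantity, we say that the display violates the Area Principle. In the first display, each piece was proportionately sized relative to the others.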
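To make the idea concrete, here is a minimal Python sketch of a check for this kind of violation. Everything in it is an assumption for illustration: the function name, the slice names, and especially the "apparent areas" for the 3-D pie, which are made-up numbers rather than measurements of the chart above.

```python
from itertools import combinations

def area_principle_violations(values, drawn_areas):
    """Return (smaller, larger) pairs where the smaller value is drawn with the larger area."""
    violations = []
    for a, b in combinations(values, 2):
        # The true ordering and the visual ordering should agree.
        if (values[a] - values[b]) * (drawn_areas[a] - drawn_areas[b]) < 0:
            smaller, larger = (a, b) if values[a] < values[b] else (b, a)
            violations.append((smaller, larger))
    return violations

# True shares of the whole, in percent (from the pie chart in the post).
values = {"slice_10": 10, "slice_20": 20, "green_30": 30, "purple_40": 40}

# Flat pie: each drawn area is proportional to its value.
flat_areas = dict(values)

# Hypothetical apparent areas for a tilted 3-D pie (invented for this sketch).
tilted_areas = {"slice_10": 9, "slice_20": 18, "green_30": 39, "purple_40": 34}

print(area_principle_violations(values, flat_areas))    # []
print(area_principle_violations(values, tilted_areas))  # [('green_30', 'purple_40')]
```

With the made-up 3-D numbers, the only pair flagged is green versus purple, which is the same mismatch described above.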

Why make this point? When trying to communicate something graphically, the main point is to get the information across with as little potential confusion as possible -- not to impress people with fancy pictures. Statistics can be mind-boggling to many, so why not try to make things as straightforward as possible? It's the old K.I.S.S. principle. Statistics-challenged people will thank you! (As I'm sure you're thanking me for the short post!)

Sunday, August 7, 2011

Picturing Categorical Data

They say a picture is worth a thousand words, and that's certainly true with statistical data! The first thing to note is that different types of data are pictured in different ways. You might recall from an earlier posting that there are two types of data:
  • Categorical data are the levels of a variable that you do not combine with arithmetic. You usually count them. Examples: eye colors (blue, brown, green, hazel, gray), education levels (high school, undergrad, graduate, etc.). Some categorical variables are numbers, but not ones that make sense to add; examples are zip codes, numbers on athletic jerseys, and phone numbers.
  • Quantitative data are numbers on which it makes sense to do arithmetic. For example, with test scores, dollars, miles, etc., it is meaningful to, say, find their average. Quantitative data usually have units.
In this post, we'll picture categorical data only; I'll cover quantitative data in another post.

There are two ways to "graphically" picture categorical data: bar charts and pie charts. You see these all the time in articles. Let's use the following base data that shows eye colors in a group of 115 people:
Bar Charts
The most straightforward type of display is a bar chart. Bar charts can portray the actual numbers...

...or they can portray percents of the whole (115)...

Notice that the bars can be vertical or horizontal. Either way, the bars are arranged in order of size, whether increasing or decreasing.
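If you'd like to see the arithmetic behind the two versions, here is a small Python sketch that turns counts into percents and prints a rough text bar chart in decreasing order. The counts are made up for illustration (chosen only so that they total 115); they are not the actual data from this post, and the function names are my own.

```python
# Made-up eye color counts that total 115 (not the post's actual data).
eye_counts = {"brown": 50, "blue": 30, "green": 15, "hazel": 12, "gray": 8}

def to_percents(counts):
    """Convert each count to a percent of the whole."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

def sorted_bars(data, decreasing=True):
    """Order the bars by size."""
    return sorted(data.items(), key=lambda kv: kv[1], reverse=decreasing)

def text_bar_chart(data, fmt="{:.0f}"):
    """Draw one row of '#' characters per category, largest first."""
    rows = []
    for label, value in sorted_bars(data):
        rows.append(f"{label:<6} {'#' * round(value)} {fmt.format(value)}")
    return "\n".join(rows)

print(text_bar_chart(eye_counts))                              # actual counts
print(text_bar_chart(to_percents(eye_counts), fmt="{:.1f}%"))  # percents of 115
```

The two charts have the same shape; only the scale changes.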

If you have a number of very small bars, you can put them together as a larger combined bar and label it "Other." Such bars don't necessarily need to be placed in order; they usually appear as the last bar. Suppose, for example, you want to put green, hazel, and gray together as an "Other" bar. Then your graph would look like this:
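Here is a quick Python sketch of that grouping step. Again, the counts are made up for illustration (they simply total 115), and `combine_small` is a hypothetical helper name, not anything standard.

```python
# Made-up eye color counts that total 115 (not the post's actual data).
eye_counts = {"brown": 50, "blue": 30, "green": 15, "hazel": 12, "gray": 8}

def combine_small(counts, small_labels, other_label="Other"):
    """Merge the named categories into one 'Other' bar placed last."""
    kept = {k: v for k, v in counts.items() if k not in small_labels}
    # Keep the remaining bars in decreasing order of size.
    combined = dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))
    # 'Other' goes at the end regardless of how big it turns out to be.
    combined[other_label] = sum(counts[k] for k in small_labels)
    return combined

print(combine_small(eye_counts, {"green", "hazel", "gray"}))
# {'brown': 50, 'blue': 30, 'Other': 35}
```

With these made-up counts the "Other" bar ends up taller than the blue bar, yet it still sits last, which is the exception to the ordering rule mentioned above.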

Sometimes you will see a single bar with sections representing each category in proportion. This is called a segmented (or stacked) bar chart. They can appear with the actual counts (height of the single bar is equal to the sum of the counts), or as percentages (height of the single bar represents 100%). Here's how a segmented bar chart with percents would look. Notice that it's good to arrange the bars in decreasing order from bottom to top:
Segmented bar charts are especially helpful in comparing the same categories in two or more different groups. Suppose we had a second group of people whose segmented bar chart of eye colors looked slightly different. We could put the bars side by side in a single display, with the eye colors in the same order.

It is easy to see that Group 2 has fewer brown-eyed people and more blue-eyed people than Group 1, about the same number of green-eyed and gray-eyed people, and fewer people with hazel eyes.

One caution with two or more segmented bars: Unless each group is exactly the same size, you should use percents rather than counts. Otherwise it would be nearly impossible to compare the bars.
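A small Python sketch shows why. It computes where each segment of a stacked bar would end, first using counts and then using percents. Both groups here are invented for the illustration: the second is simply the first with every count doubled, so the two groups have identical proportions but different sizes.

```python
from itertools import accumulate

# Two made-up groups with the same proportions but different sizes.
group_a = {"brown": 50, "blue": 30, "green": 15, "hazel": 12, "gray": 8}     # 115 people
group_b = {"brown": 100, "blue": 60, "green": 30, "hazel": 24, "gray": 16}   # 230 people

def segment_tops(counts, as_percent):
    """Return the height at which each segment ends, stacking from the bottom up."""
    total = sum(counts.values())
    scale = 100 / total if as_percent else 1
    tops = accumulate(n * scale for n in counts.values())
    return {label: round(top, 1) for label, top in zip(counts, tops)}

# With counts, the bars differ in height and every segment boundary differs.
print(segment_tops(group_a, as_percent=False))
print(segment_tops(group_b, as_percent=False))

# With percents, both bars reach 100 and the boundaries line up exactly.
print(segment_tops(group_a, as_percent=True))
print(segment_tops(group_b, as_percent=True))
```

Drawn with counts, the two bars would look very different even though the mix of eye colors is the same; drawn with percents, they match.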

Pie Charts
A pie chart shows each category as a proportionally sized pie slice. Pie charts always use percents. So, a pie chart for our eye color data would look like this:

Notice that the pieces are arranged in order of size as you go around the pie.
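For the curious, here is a rough Python sketch of how slice sizes could be worked out: each category's share of the whole becomes the same share of the 360 degrees in a circle. The counts are made up for illustration (they total 115, but they are not this post's actual data), and `pie_slices` is a hypothetical name.

```python
# Made-up eye color counts that total 115 (not the post's actual data).
eye_counts = {"brown": 50, "blue": 30, "green": 15, "hazel": 12, "gray": 8}

def pie_slices(counts):
    """Return (label, percent, angle in degrees) for each slice, largest first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, 100 * n / total, 360 * n / total) for label, n in ordered]

for label, percent, angle in pie_slices(eye_counts):
    print(f"{label:<6} {percent:5.1f}%  {angle:6.1f} degrees")
```

Because each angle is proportional to its count, a flat pie drawn this way keeps the slices in proportion to the quantities they represent.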

(Self-Test): Suppose there's another group -- this time consisting of 140 people -- whose eye colors are as follows. Make a "Group 3" segmented bar chart to show the differences.


(Answer): See below.