Statistics Without Tears: Connection or Causation?

I was reading an article the other day about a possible connection between mental illness and nose jobs, and thought, "Aha! A perfect possible example of crazy statistics!" But I was not only disappointed on that count, but also pleasantly surprised that the article showed valid statistical practices. What I can focus on in this post are two things:

What valid statistical practices were used?
How to avoid common pitfalls when reading an article that involves statistics.

First, you might want to quickly read the article:
http://well.blogs.nytimes.com/2011/07/27/some-nose-job-patients-may-have-mental-illness/
Go ahead; I'll wait.

There are enough bad statistical articles out there, which I'm sure I'll happen upon and grab for another post, but as I said, this article is not one of them. First, the author was careful to qualify the difference between people who have valid medical reasons to seek plastic surgery on their noses and those who appear to be negatively obsessed with a perfectly normal-looking (from a plastic surgeon's standpoint) nose. The author was careful to separate the two.

This was a "controlled" study, as statisticians say. First, there was a rather large sample (266) of patients seeking nose jobs in Belgium, who filled out a diagnostic questionnaire that was intended to uncover the condition (called Body Dysmorpphic Disorder (BDD)). The author included all of the vital information -- number of people, location of the study, duration of the study, and a link to the full journal article so that the study could be reproduced if desired. Rather than worrying that a duplicate study will refute the findings of the original study, most statisticians welcome "do-overs" of the study, because it can either strengthen their findings or point out something they might have overlooked. After all, from an ethical standpoint, most researchers are after the truth, so they try to make the specifics as transparent as possible.

Separating those patients who had a "valid" complaint about their nose (for instance, a breathing problem) from those who showed signs of BDD was an excellent example of controlling the study. For example, if they hadn't done this and went on conducting the study, they couldn't have separated the "valid" patients from the BDD patients later. By controlling the study, they were able to determine that only 2% of the "valid" patients showed evidence of BDD while 43% of the "invalid" patients did. If everyone had been lumped together, the results might have been diluted. Researchers need to think of possible variables ahead of time that could muddy the waters later. The variable of "valid" versus "invalid" was one of these. We call these variables lurking variables, because they can work under the surface, making it look like one thing (seeking a nose job) is causing the other (BDD)... or vice versa.

Another interesting and admirable thing the article did was volunteer extra information that the reader might have had upon reading the article; that is, they singled out nose jobs from other plastic surgery as being notable with respect to this connection. They seem to have thought things through very well.

Now, on to you as a reader of such articles. My only caution in this case is that you don't read any type of cause and effect situation into the article. Having BDD doesn't necessarily cause people to seek a nose job, although there seems to be a very strong connection between the two. Further study and repetition of the study would be needed to prove cause and effect. To its credit, the author of this article was careful not to lead you in the causation direction.

Simply put: Just because two variables appear to be connected doesn't prove that one variable causes another. For those readers who recall the "old days" when people smoked cigarettes and didn't know about the dangers of lung cancer, remember how it took years to get even the mildest cautionary note placed on cigarette packs? The first such caution cited a connection between cigarette smoking and lung cancer. After years of further controlled study, researchers were finally able to put stronger cautions on cigarette packs: that cigarette smoking causes lung cancer.

Moral of the Story: This was an example of an article in which the author was very careful not to imply cause and effect. Other articles aren't so clear. As you read articles that involve numbers and statistics, be aware of this and don't jump to causation conclusions!

Self-Test: Name a lurking variable in this statement: "The more firefighters sent to a fire, the more damage is done to the structure."

Answer: Size of the fire is the lurking variable. It influences both variables: the amount of damage done, and the number of firefighters sent to the scene.

(Credit: Stats: Modeling the World, Bock, Velleman, deVeaux. Thanks!

Statistics Without Tears

How's it going so far?

Saturday, July 30, 2011

Connection or Causation?

No comments:

Post a Comment