If you’ve been reading here for a while, you know that the correlation does not imply causation thing really gets me into a tizzy. What you probably don’t know is that this issue is not my only pet peeve with a lot of empirical research. (Shocking, I know.) Let me start by giving a bit of a statistical introduction/refresher:

Say you’re trying to answer a research question. That question may be of the form “Does X have an effect on Y?”, “Is the average behavior of group X different from that of group Y?”, etc. You get a result that looks promising (i.e. that X does in fact have an effect on Y or that X and Y are different), but you still wonder if the effect is big enough to be, well, important or meaningful…significant even.

Overall, there are two types of significance: “statistical significance” and “economic significance.” (I would venture to guess that the latter is called something else in fields other than economics.) Statistical significance answers the question of whether the observed result is unlikely to appear by random chance, and economic significance answers the question of whether the observed result is large enough to have a meaningful impact. Let’s think about statistical significance for now…

If you are trying to answer whether X has an impact on Y, then, on a basic level, your two options for outcomes are “X has no impact on Y” and “X has an impact on Y.” (These options would be called “no” and “yes”, respectively, in normal-people terms.) Similarly, if you are trying to answer whether groups X and Y are different, then your two outcome options are “X and Y are not different” and “X and Y are different.” In statistics, we take somewhat of a roundabout approach to drawing positive conclusions about our observed effects or differences, in that we do so by rejecting the notion that there isn’t an effect or difference. To this end, the statistical significance of an outcome is reported using something called a p-value, which answers the question “if in reality there were no systematic effect or difference, what is the probability that we would see an outcome at least this far away from zero by random chance?” Logically, if this probability is very low, we can reject the idea that there is no effect of X on Y or no difference between X and Y. By a process of elimination, if we’ve ruled out the possibility that there is no effect or difference, then, well, there must be an effect or difference. See how that works?
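To make the “what are the odds if nothing is going on?” logic concrete, here is a minimal sketch of one way to compute a p-value: a permutation test for a difference in group averages. It is only an illustration under stated assumptions (two small groups of made-up numbers, compared on their means); the function name `permutation_p_value` and the sample data are hypothetical, and empirical papers typically use other tests, such as t-tests or regression coefficients, that rest on the same logic.

```python
import random
from statistics import fmean


def permutation_p_value(group_x, group_y, n_permutations=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in group means.

    If there were truly no difference between the groups, the labels "X" and
    "Y" would be arbitrary. So we reshuffle the labels many times and count
    how often chance alone produces a gap at least as far from zero as the
    one we actually observed.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed_gap = abs(fmean(group_x) - fmean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels mean nothing
        shuffled_gap = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if shuffled_gap >= observed_gap:
            at_least_as_extreme += 1

    # Adding 1 to both counts treats the observed labeling as one of the
    # possible shufflings, so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: outcome Y for a group that got treatment X and one that didn't.
treated = [12.1, 14.3, 13.8, 15.2, 13.0, 14.7]
untreated = [11.0, 12.4, 11.9, 12.8, 11.5, 12.2]
print(permutation_p_value(treated, untreated))
```

A small result means that randomly relabeling the data rarely produces a gap as large as the real one, which is exactly the evidence used to reject “no difference.”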

In economics, the typical threshold for an outcome to be considered statistically significant is a p-value of less than 0.05, or 5 percent. In other words, an effect counts as significant if, were there truly no effect, a result at least this far from zero would show up less than 5 percent of the time; put another way, when there really is nothing there, a researcher using this cutoff will get a false positive at most 5 percent of the time. Furthermore, academic journals really only want to publish studies that cross this threshold, since apparently “we couldn’t rule out the possibility that there’s no effect here, but please read anyway” doesn’t make for a particularly compelling narrative. (*I* would want to read these papers, but I think I’m pretty much alone in my viewpoint.) Therefore, the academic publishing world is more or less like the following:

I would like to point out that the video is not technically correct when the bear-bunny creature on the right says that a p-value just measures whether a sample is large enough. Instead, the p-value reflects both the size of the sample and the magnitude of the effect. In other words, you can achieve statistical significance with a smaller sample if your observed effect is really strong, but if your observed effect is subtle then you’re going to need a lot of data points to convince people that it’s actually there. On a related note, I am more favorable to the use of the p-value standard than is the bear-bunny in the video, but I think that the dialogue illustrates an important point regarding publication bias.
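The point that a p-value depends on both sample size and effect size can be seen in a few lines of arithmetic. This is a minimal sketch, assuming the simplest textbook setup rather than anything from the video: two equal-sized groups drawn from populations with a known common standard deviation (set to 1, so effects are measured in standard deviations), compared with a two-sided z-test. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_p_value(effect, n_per_group, sigma=1.0):
    """Two-sided p-value for an observed difference in means `effect` between
    two groups of size `n_per_group`, assuming a known common standard
    deviation `sigma` (a textbook z-test, chosen for simplicity)."""
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z = effect / standard_error
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def min_n_for_significance(effect, alpha=0.05, sigma=1.0):
    """Smallest per-group sample size at which an observed difference of
    `effect` would reach the two-sided `alpha` threshold."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil(2 * (z_crit * sigma / effect) ** 2)


# A subtle effect with a small and a large sample, then a strong effect with a small sample.
for effect, n in [(0.2, 20), (0.2, 400), (1.0, 20)]:
    print(f"effect={effect}, n per group={n}: p = {z_test_p_value(effect, n):.4f}")

print(min_n_for_significance(0.2))  # 193 per group for a subtle effect
print(min_n_for_significance(1.0))  # 8 per group for a strong effect
```

The same subtle effect gets different verdicts: with 20 observations per group it is nowhere near the 0.05 line, with 400 per group it clears it easily, and a strong effect clears it with only 20. The known-variance z-test is used purely to keep the arithmetic transparent; with estimated variances a t-test would be the usual choice, and the numbers would shift a little.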

If editors only publish papers that pass this test of statistical significance, the readers of these papers cannot assess the true significance of the result. For example, if you were reading a paper that reported a result significant at the 5 percent level, wouldn’t you want to know whether there are 19 other similar papers sitting in researchers’ desk drawers collecting dust because they didn’t show a significant result and thus got rejected? After all, if 20 researchers each test an effect that doesn’t exist, you would expect about one of them to clear the 5 percent bar by luck alone. If these papers existed, you would probably take the “significant” result with a grain of salt…or perhaps a salt lick. The following from xkcd illustrates the publication bias problem nicely:

Sigh. Think of all those people who avoid green jelly beans just because they don’t understand statistics. (Not to mention those who do so because they still don’t get the correlation vs. causation thing.)
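The jelly-bean problem (and the 19 papers in the desk drawer) comes down to simple arithmetic: each study of a nonexistent effect has a 5 percent chance of a false positive, so running many of them makes at least one “hit” likely. Below is a minimal simulation sketch under assumed conditions: every hypothetical study compares two groups of 30 drawn from the same normal population, uses a known-variance two-sided z-test, and is independent of the others. The function names and parameters are illustrative, not taken from the comic.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
STD_NORMAL = NormalDist()


def null_study_p_value(rng, n=30):
    """p-value from one hypothetical study where X truly has no effect on Y.

    Both groups come from the same N(0, 1) population; the test is a
    two-sided z-test that treats the unit variance as known.
    """
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(x) - fmean(y)) / (2 / n) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def chance_of_at_least_one_hit(k, alpha=ALPHA):
    """Probability that at least one of k independent null studies is 'significant'."""
    return 1 - (1 - alpha) ** k


def simulate_desk_drawer(num_batches=1000, studies_per_batch=20, seed=1):
    """Run batches of null studies (say, one per jelly-bean color).

    Returns (share of individual studies that came out significant,
             share of batches with at least one publishable 'finding').
    """
    rng = random.Random(seed)
    significant_studies = 0
    batches_with_a_hit = 0
    for _ in range(num_batches):
        hits = sum(null_study_p_value(rng) < ALPHA for _ in range(studies_per_batch))
        significant_studies += hits
        if hits > 0:
            batches_with_a_hit += 1
    total_studies = num_batches * studies_per_batch
    return significant_studies / total_studies, batches_with_a_hit / num_batches


print(chance_of_at_least_one_hit(20))  # about 0.64
print(simulate_desk_drawer())
```

Each study on its own is a false positive only about 5 percent of the time, but roughly two batches in three contain at least one “significant” color. If only that one gets published, the reader sees a 5 percent threshold and has no way of knowing about the other 19 tries.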

P.S. Now you can also understand what it means to be someone’s statistically significant other.

## 11 responses so far

1

Daniel // Apr 6, 2011 at 4:34 pm

When I was in middle school one guy spouted off in class that “Green M&Ms make you horny!” Let’s get the scientists away from their Minecraft game to investigate this pronto!

2

econgirl // Apr 6, 2011 at 4:37 pm

I was told that when I was younger as well. Unfortunately, I was told this by my high-school English teacher. I now separate out the green ones when I get a pack of M&M’s…I will leave it to you to decide whether I am going for positive or negative selection. =P

3

Laura // Apr 6, 2011 at 11:03 pm

I am 95% confident that, as a grad student, I waste a significant amount of time doing research that someone else has already done but that wasn’t published because of its insignificant results. I wish each field had a journal that reported insignificant findings (when they really are insignificant, like when the confidence intervals are huge and 0 falls between them).

4

steve // Apr 6, 2011 at 11:18 pm

on a slightly unrelated note you might be interested in this paper

https://editorialexpress.com/cgi-bin/conference/download.cgi?db_name=IIPF66&paper_id=315

Given your correlation-doesn’t-imply-causation thing, I’d be interested in your opinion on the reason for the differences, i.e. why does the right attract better-looking representatives? And/or why does being right-wing make you care more about your appearance?

5

Harriet R // Apr 7, 2011 at 4:02 am

Great post – I actually had a discussion about papers that don’t have statistically significant findings just yesterday, so I’m all kinds of happy to read your excellent thoughts.

6

Juan // Apr 7, 2011 at 7:56 pm

On a related note…

http://www.phdcomics.com/comics/archive.php?comicid=1174

7

Warren J // Apr 8, 2011 at 2:26 am

@Juan – I would laugh if it weren’t so true…

8

Warren J // Apr 8, 2011 at 2:31 am

@Steve – so, this study must prove the theory of “natural election?” Oh, yeah, I’m on fire tonight…

Actually, and I didn’t read the study – I just looked at the table – since right candidates are far more likely to be in the minority, how likely is it that appearance benefits the minority more than the majority, instead of it being a left vs. right thing?

9

Economics in Action » Blog Archive » A little bit on correlation // Nov 16, 2011 at 11:13 am

[…] http://www.economistsdoitwithmodels.com/2011/04/06/why-publication-bias-matters-in-one-cartoon/ […]

10

einarbt // Dec 5, 2011 at 2:28 am

Thank you for an interesting article.

I think teddy is fair when saying “Yes, irrelevant, the p value only tells us if a researcher had a big enough sample size or not”; the p value simply captures the N of the study, because as your N increases your p decreases, and vice versa, which captures the limitations of the p value perfectly. At least confidence intervals (CIs) give you a range that extends in both directions away from the point estimate, and they are not limited to a magic .05 number, despite attempts to make a 95% CI the default.

Granted, the p value reflects both the N and the effect, but the effect size does not reflect the p value; think it through.

Anyone interested in examining the uselessness of significance testing should start by reading

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

and

Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37-64). London: Lawrence Erlbaum Associates.

Kind regards,

einarbt
