If you’ve been reading here for a while, you know that the correlation does not imply causation thing really gets me into a tizzy. What you probably don’t know is that this issue is not my only pet peeve with a lot of empirical research. (Shocking, I know.) Let me start by giving a bit of a statistical introduction/refresher:
Say you’re trying to answer a research question. That question may be of the form “Does X have an effect on Y?”, “Is the average behavior of group X different from that of group Y?”, etc. You get a result that looks promising (i.e. that X does in fact have an effect on Y or that X and Y are different), but you still wonder if the effect is big enough to be, well, important or meaningful…significant even.
Overall, there are two types of significance: “statistical significance” and “economic significance.” (I would venture to guess that the latter is called something else in fields other than economics.) Statistical significance answers the question of whether the observed result is unlikely to appear by random chance, and economic significance answers the question of whether the observed result is large enough to have a meaningful impact. Let’s think about statistical significance for now…
If you are trying to answer whether X has an impact on Y, then, on a basic level, your two options for outcomes are “X has no impact on Y” and “X has an impact on Y.” (These options would be called “no” and “yes”, respectively, in normal people terms.) Similarly, if you are trying to answer whether groups X and Y are different, then your two outcome options are “X and Y are not different” and “X and Y are different.” In statistics, we take somewhat of a roundabout approach to drawing positive conclusions about our observed effects or differences in that we do so by rejecting the notion that there isn’t an effect or difference. To this end, the statistical significance of an outcome is reported using something called a p-value, which answers the question “if in reality there were no systematic effect or difference, what is the probability that we would see an outcome this far away from zero by random chance?” Logically, if this probability is very low, we can reject the idea that there is no effect of X on Y or no difference between X and Y. By process of elimination, if we’ve ruled out the possibility that there is no effect or difference, then, well, there must be an effect or difference. See how that works?
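For the code-inclined, here is a rough sketch of that roundabout logic in Python. It uses a permutation test, which is just one of several ways to get a p-value and not necessarily what any given paper does. The function name and the two groups of numbers are made up purely for illustration. The idea: if there really were no difference between X and Y, the group labels would be meaningless, so we shuffle them many times and count how often chance alone produces a gap at least as big as the one we observed.

```python
import random
from statistics import fmean

def permutation_p_value(group_x, group_y, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: shuffles the pooled data to mimic
    a world in which group membership has no effect on the outcome.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(fmean(group_x) - fmean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign the group labels at random
        shuffled_gap = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if shuffled_gap >= observed:
            at_least_as_extreme += 1

    # Share of "no effect" worlds that look at least as extreme as our data
    return at_least_as_extreme / n_shuffles

# Made-up measurements for two groups
group_x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
group_y = [4.2, 4.8, 4.5, 5.0, 4.1, 4.7, 4.4, 4.9]

print(permutation_p_value(group_x, group_y))
```

If the printed number is very small, the shuffled "no difference" worlds almost never look like the real data, which is exactly the situation in which we reject the idea that there is no difference.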
In economics, the typical threshold for an outcome to be considered statistically significant is a p-value of less than 0.05, or 5 percent. In other words, an effect is significant if, in a world where there were truly no effect, a result at least this extreme would show up by chance less than 5 percent of the time. Furthermore, academic journals really only want to publish studies that cross this threshold, since apparently “we couldn’t rule out the possibility that there’s no effect here, but please read anyway” doesn’t make for a particularly compelling narrative. (*I* would want to read these papers, but I think I’m pretty alone in my viewpoint.) Therefore, the academic publishing world is more or less like the following:
I would like to point out that the video is not technically correct when the bear-bunny creature on the right says that a p-value just measures whether a sample is large enough. Instead, the p-value reflects both the size of the sample and the magnitude of the effect. In other words, you can achieve statistical significance with a smaller sample if your observed effect is really strong, but if your observed effect is subtle then you’re going to need a lot of data points to convince people that it’s actually there. On a related note, I am more favorable to the use of the p-value standard than is the bear-bunny in the video, but I think that the dialogue illustrates an important point regarding publication bias.
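To see the sample-size-versus-effect-size tradeoff in numbers, here is a minimal sketch using a textbook one-sample z-test. It assumes the standard deviation is known and the data are roughly normal, which real studies rarely get to assume, and the function name and inputs are hypothetical.

```python
import math

def two_sided_p_value(effect, std_dev, n):
    """Two-sided p-value for a one-sample z-test of 'the true effect is zero'.

    Assumes a known standard deviation (a simplification for illustration).
    """
    standard_error = std_dev / math.sqrt(n)
    z = abs(effect) / standard_error
    # For a standard normal, P(|Z| >= z) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

# Strong effect, small sample
print(two_sided_p_value(effect=1.0, std_dev=1.0, n=10))
# Subtle effect, same small sample
print(two_sided_p_value(effect=0.1, std_dev=1.0, n=10))
# Subtle effect, much larger sample
print(two_sided_p_value(effect=0.1, std_dev=1.0, n=1000))
```

The first and third cases land on essentially the same p-value: an effect one-tenth the size needs one hundred times the data to look equally convincing. The middle case, a subtle effect in a small sample, does not come close to the 0.05 threshold.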
If editors only publish papers that pass this test of statistical significance, the readers of these papers cannot assess the true significance of the result. For example, if you were reading a paper that said that a result like the observed one would show up by chance only 5 percent of the time if there were no real effect, wouldn’t you want to know whether there are 19 other similar papers sitting in researchers’ desk drawers collecting dust because they didn’t show a significant result and thus got rejected? If these papers existed, you would probably take the “significant” result with a grain of salt…or perhaps a salt lick. The following from xkcd illustrates the publication bias problem nicely:
Sigh. Think of all those people who avoid green jelly beans just because they don’t understand statistics. (Not to mention those who do so because they still don’t get the correlation vs. causation thing.)
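If you want to put a number on the desk-drawer problem, here is a back-of-the-envelope sketch. It assumes the studies are independent and that there is truly no effect in any of them, which is a simplification, and the function name is my own invention.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests comes up
    'significant' at level alpha when there is truly no effect anywhere."""
    chance_all_come_up_empty = (1 - alpha) ** n_tests
    return 1 - chance_all_come_up_empty

# One published paper plus 19 in the desk drawer makes 20 attempts
print(chance_of_any_false_positive(20))
```

With 20 attempts at the 5 percent threshold, the chance that at least one of them looks significant purely by luck comes out to about 64 percent. Hence the salt lick.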
P.S. Now you can also understand what it means to be someone’s statistically significant other.