Causal Friday: Fun with Gender Discrimination, Now with More Bad Econometrics…

On Fridays, we examine a research paper that uses (or fails to use) a clever method to perform causal inference, i.e. to tease out cause and effect.

Disclaimer: I’m kind of stretching the definition of both “causal analysis” and “research paper” here, but I guess you could interpret the analysis as relating to the causal impact of being female.

In case you haven’t heard, Google is the target of a class-action lawsuit based on gender discrimination. (Shocking, I know, given what we know about Silicon Valley more generally. =P) Part of the impetus for the lawsuit is an employee-led effort to collect compensation data that shows that men are paid more than women at the company:

At Google, Employee-Led Effort Finds Men Are Paid More Than Women

A spreadsheet created by employees to share salary information shows pay for women is falling short of what men make at various levels.


From a data perspective, proving discrimination can be somewhat difficult- for example, we hear the often-quoted “women make 77 cents for every dollar a man makes” statistic, but this in itself doesn’t really tell us anything about discrimination. It could instead be the case that women sort into lower-paying occupations and jobs of their own volition, choose to work fewer hours, and so on. (On the other hand, we can’t rule out the discrimination hypothesis either.)

Ideally, what one would do to look for discrimination would be to compare otherwise equivalent men and women and see whether compensation differences still exist within the matched groups. Mathematically, this is essentially what economists do when they run a regression with “control variables”- variables that suck up the differences that are accounted for by stuff other than gender.
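To make the “control variables” idea concrete, here’s a minimal sketch on simulated data. Everything in it is an assumption for illustration: the numbers are made up, `hours` is a hypothetical control, and the fake world is built so that everyone earns the same hourly rate but women work fewer hours on average. The raw regression shows a pay gap; adding hours as a control makes it go away.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated (made-up) workforce: 1 = woman, 0 = man
female = rng.integers(0, 2, size=n)

# Assumption: women work about 4 fewer hours per week on average
hours = 40 - 4 * female + rng.normal(scale=3, size=n)

# Assumption: identical hourly rate for everyone, so no discrimination in pay
pay = 25 * hours + rng.normal(scale=50, size=n)

# Regression without controls: intercept + female dummy
X_raw = np.column_stack([np.ones(n), female])
raw_gap = np.linalg.lstsq(X_raw, pay, rcond=None)[0][1]

# Regression with hours as a control variable
X_ctrl = np.column_stack([np.ones(n), female, hours])
adj_gap = np.linalg.lstsq(X_ctrl, pay, rcond=None)[0][1]

print(f"raw gap: {raw_gap:.1f}, gap controlling for hours: {adj_gap:.1f}")
```

In this toy setup the raw gap is roughly the 4 missing hours times the hourly rate, and the adjusted gap hovers around zero, which is what “sucking up the differences” looks like in practice.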

Google employees seem to be up on their applied math, since they put together an analysis so that they could make the following statement:

Based upon its own analysis from January, Google said female employees make 99.7 cents for every dollar a man makes, accounting for factors like location, tenure, job role, level and performance.

On the surface, this seems to suggest that significant gender discrimination just doesn’t show up in the data. BUT…and this is important…this example highlights the difference between doing math and doing data analysis (or, more charitably, data science)- while this conclusion may be mathematically correct, it’s basically a “garbage in, garbage out” use of econometric tools. Simply put, if you’re trying to isolate gender discrimination, you can’t just blindly control for things that themselves are likely the result of gender discrimination! It’d be like looking at the impact of diet on health and using weight as a control variable- sure, you’d get an “all else being equal” sort of result, but it wouldn’t make sense since weight is likely a step in the chain between diet and health outcomes.

In this way, Google tipped its hand quite a bit regarding the particular nature of gender discrimination at the company- if men and women are paid the same once job title and performance reviews are taken into account, then gender discrimination (if it exists) is taking place either by herding women into jobs with different roles/levels or showing anti-female (or pro-male) bias in performance reviews. (Also, if the “levels” have set pay bands, which the article kind of suggests, doesn’t controlling for level largely amount to assuming your conclusion?)
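Here’s a rough simulation of that “bad control” problem. To be clear, this is not Google’s data or method; it’s a made-up world where I assume discrimination happens only through who gets promoted to the higher level, and pay is set purely by level (pay bands). The variable names and magnitudes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

female = rng.integers(0, 2, size=n)   # 1 = woman, 0 = man
skill = rng.normal(size=n)            # same skill distribution for everyone

# Assumed discrimination: women need more skill to reach the higher level
level = (skill - 0.5 * female + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Pay bands: pay depends on level only, never directly on gender
pay = 100 + 30 * level + rng.normal(scale=5, size=n)

def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

gap_raw = ols(pay, female)[1]                # total gap, including the promotion channel
gap_controlled = ols(pay, female, level)[1]  # gap "holding level fixed"

print(f"raw gap: {gap_raw:.2f}, gap controlling for level: {gap_controlled:.2f}")
```

Controlling for level makes the gap vanish even though this fake world is discriminatory by construction, because the control variable is itself the channel through which the discrimination operates.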

Turns out my suspicions are pretty on point, given the specific claim of the lawsuit:

Google ‘segregates’ women into lower-paying jobs, stifling careers, lawsuit says

Exclusive: Women say Google denied them promotions, telling the Guardian they were forced into less prestigious jobs despite qualifications


It’s amazing what you can learn from data IF you look at it properly. In a semi-previous life, I worked as an economic consultant, which basically means that I helped prepare expert testimony to be used in lawsuits involving economic matters. What I wouldn’t give to be the expert witness who gets to offer up a rebuttal to Google’s crap econometrics here.

Update: This is amazing:

In case you’re curious, the excerpt is from this book, which I highly recommend.

Causal Friday: The Most Depressing Instrument Ever, Fox News Edition…

On Fridays, we examine a research paper that uses (or fails to use) a clever method to perform causal inference, i.e. to tease out cause and effect.

Economists Gregory J. Martin and Ali Yurukoglu have a new paper published in the American Economic Review (also available in working paper form here) that shows that the existence of Fox News has a (statistically) significant impact on Republican vote share. Here’s the abstract:

We measure the persuasive effects of slanted news and tastes for like-minded news, exploiting cable channel positions as exogenous shifters of cable news viewership. Channel positions do not correlate with demographics that predict viewership and voting, nor with local satellite viewership. We estimate that Fox News increases Republican vote shares by 0.3 points among viewers induced into watching 2.5 additional minutes per week by variation in position. We then estimate a model of voters who select into watching slanted news, and whose ideologies evolve as a result. We use the model to assess the growth over time of Fox News influence, to quantitatively assess media-driven polarization, and to simulate alternative ideological slanting of news channels.

Ok sure, that’s a lot to unpack, but let’s work through it. I think we can all agree that people who watch Fox News are more likely to vote Republican than others, but on that basis we can’t tell whether Fox News actually causes them to vote Republican, Republican ideology attracts them to Fox News, or something else both causes them to watch Fox News and vote Republican. In an ideal world (at least from a research standpoint), we could run an experiment to examine cause and effect where we take a group of people and randomly choose half of them to sit in front of Fox News for a while (and disallow the other group from watching) while keeping everything else about their lives the same as before. (This might actually be hard if the Fox News group doesn’t watch a lot of TV and goes outside instead, etc.) To my knowledge, no one has tried to do this yet, perhaps because watching Fox News is too hazardous to get IRB clearance. (That said, I will admit I was too lazy to read the lit review of the paper.)

So do researchers just give up? Well, sociologists might. 🙂 (I kid because I love.) But economists get creative, and one thing they do is try to find an instrumental variable– simply put, a source of randomization. In this case, the researchers asserted that people are more likely to watch a given channel when it has a lower channel number (perhaps the result of the typical channel-surfing process), and they noticed that what channel Fox News is on differs by geography in a fairly random way. (In other words, it’s not correlated with how inclined people already are to watch Fox News, vote Republican, etc.) These two observations together mean that we basically do have a world where some people are randomly subjected to more Fox News than others, and, as it turns out, there is a (negative) relationship between Fox News channel number and Republican vote share.

Obviously, there is no direct link between Fox News channel number and voting patterns; instead, the hypothesis is that channel number impacts viewing time, which in turn affects the votes. Kind of fancy econometrics stuff enables the researchers to isolate the part of watching Fox News that is essentially random and then determine the impact of that random part on voting. They estimate that this impact is 0.3 percentage points in vote share as a result of a random extra 2.5 minutes per week of Fox News watching (for example, 55.3% to 55.6% voting Republican). A few things to note:

  • This doesn’t seem like a huge effect, but it’s statistically significantly different from zero, and there are people who are randomly subjected to more than an extra 2.5 minutes per week of Fox News, in which case the effect would be larger. (2.5 minutes is the increase in viewing time associated with a one standard deviation reduction in channel number.)
  • Similar analysis was done for, say, MSNBC, but an analogous effect was not observed.
  • The paper itself tries harder than I do here to rule out alternative explanations and such.
  • If cable/broadcast companies know that the channel numbers work in this way, they could use them as a manipulative tool, since that’s how causality works. (Good thing this paper happened first, since non-randomization would kind of screw things up.)
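The instrumental-variable logic above can be sketched with a toy simulation. This is not the paper’s model or data: I’m assuming a single unobserved `ideology` variable that drives both viewing and voting, a standardized channel position that is random, and a made-up true effect of 0.12 points per minute (chosen only so that 2.5 minutes lines up with 0.3 points). The naive regression gets fooled by ideology, while the simple IV ratio (covariance of instrument and outcome over covariance of instrument and treatment) does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

ideology = rng.normal(size=n)   # unobserved confounder (assumption)
channel = rng.normal(size=n)    # standardized channel position, assumed random

# Lower channel number -> more viewing; conservative ideology -> more viewing
minutes = 10 - 2.5 * channel + 3 * ideology + rng.normal(size=n)

true_effect = 0.12              # made-up: vote-share points per weekly minute
vote = 50 + true_effect * minutes + 2 * ideology + rng.normal(size=n)

# Naive regression slope of vote on minutes (biased by ideology)
ols_slope = np.cov(minutes, vote)[0, 1] / np.var(minutes, ddof=1)

# IV (Wald) estimate: use only the variation in minutes driven by channel position
iv_slope = np.cov(channel, vote)[0, 1] / np.cov(channel, minutes)[0, 1]

print(f"naive OLS: {ols_slope:.3f}, IV: {iv_slope:.3f}, truth: {true_effect}")
```

The naive slope comes out several times too large because ideology pushes both viewing and voting in the same direction, while the IV estimate lands close to the assumed truth. It only works because channel position is random and affects votes solely through viewing, which is exactly what the paper has to argue for.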

I’m a little conflicted here- on one hand, given that Fox News is heavy on the misinformation, it’s pretty depressing to learn that it actually shapes ideology and actions. On the other hand, math is SO COOL.

(Sidenote: If you think this sort of thing is neat, you can see a whole talk about it here.)

File Under #socialscienceproblems…

I feel like this is something that most social scientists can relate to:

I mean, that’s pretty much how this came about:

Seriously though, I do think a lot about how we could make economics more visually interesting than repeated supply and demand diagrams, but it’s hard…for example, I tried to implement a “CSI: Regression Analysis” parody (as proposed by Charles Wheelan in Naked Statistics)- the script was pretty straightforward, but when I got people together to act it out we were like “heh, maybe add some sort of white board and write random stuff on it I guess?” I know, you’ve never thought of that before.

The end product wasn’t terrible, but visual suggestions are certainly welcome. Unless the suggestion is “tables of regression coefficients,” then just GFY.