Economists Do It With Models

Warning: “graphic” content…

Causal Friday: Well That Explains My Student Evaluations…

December 12th, 2014 · 4 Comments

Well, file this experiment in both the “cool” and “depressing” categories:

What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching

Lillian MacNell, Adam Driscoll, Andrea N. Hunt

Student ratings of teaching play a significant role in career outcomes for higher education instructors. Although instructor gender has been shown to play an important role in influencing student ratings, the extent and nature of that role remains contested. While difficult to separate gender from teaching practices in person, it is possible to disguise an instructor’s gender identity online. In our experiment, assistant instructors in an online class each operated under two different gender identities. Students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender, demonstrating gender bias. Given the vital role that student ratings play in academic career trajectories, this finding warrants considerable attention.

I think if you asked a random person what data they would want to see in order to determine whether female instructors are discriminated against (or, equivalently in this case, whether male instructors are discriminated in favor of), they would ask for statistics on the average ratings of male instructors versus the average ratings of female instructors. Noting that the male average is higher, however, doesn’t really tell us anything about discrimination unless we can rule out the possibility that the male instructors are simply better teachers. (Unlikely, of course, but a theoretical possibility nonetheless. =P) If we wanted to do better, we could have trained observers objectively rate the instructors and then see whether the gap between the objective ratings and the student ratings differs for male versus female instructors. Of course, this presumes that an objective way to rate instructors exists, and it assumes that said trained observers are somehow immune from subconscious discrimination themselves. Ideally, then, we’d want some Mrs. Doubtfire-type setup in which an instructor could randomize across classes whether or not he or she was wearing an opposite-gender disguise. If we did this, we would essentially hold teaching quality constant across classes and could therefore attribute any difference in ratings to discrimination on the part of students.
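The logic of that ideal design can be sketched in a few lines: because each instructor’s teaching quality is held constant across their own sections, any within-instructor gap between the “perceived male” and “perceived female” sections isolates the effect of the label. (The ratings below are invented purely for illustration; they are not the study’s data.)

```python
from statistics import mean

# Hypothetical 1-5 ratings, invented for illustration only.
# Each instructor teaches two sections; the only thing that varies
# between a given instructor's sections is the gender label shown.
ratings = {
    # (actual_gender, perceived_gender): that section's ratings
    ("female", "female"): [3, 4, 3, 4],
    ("female", "male"):   [4, 5, 4, 4],
    ("male",   "female"): [3, 3, 4, 4],
    ("male",   "male"):   [4, 4, 5, 4],
}

# Within each instructor, teaching quality is constant, so the gap
# between their two sections reflects only the perceived-gender label.
for actual in ("female", "male"):
    gap = mean(ratings[(actual, "male")]) - mean(ratings[(actual, "female")])
    print(f"{actual} instructor: perceived-male premium = {gap:+.2f}")
```

Averaging the two within-instructor gaps gives an estimate of the perceived-gender premium that is, by construction, uncontaminated by differences in actual teaching quality.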

This is basically what the paper did, sadly without the costumes: the online courses were structured so that the instructors could simply state what gender they were and didn’t have to back it up with visual evidence. (On a half sidenote, one of the reasons I find this study depressing is that we live in a world where classes can be structured so that students never see their instructors. Technology being what it is, we can certainly do better than that.) The researchers did indeed find what appears to be differential treatment of male versus female instructors. From the NC State press office:

To address whether students judge female instructors differently than male instructors, the researchers evaluated a group of 43 students in an online course. The students were divided into four discussion groups of 8 to 12 students each. A female instructor led two of the groups, while a male instructor led the other two.

However, the female instructor told one of her online discussion groups that she was male, while the male instructor told one of his online groups that he was female. Because of the format of the online groups, students never saw or heard their instructor.

At the end of the course, students were asked to rate the discussion group instructors on 12 different traits, covering characteristics related to their effectiveness and interpersonal skills.

“We found that the instructor whom students thought was male received higher ratings on all 12 traits, regardless of whether the instructor was actually male or female,” MacNell says. “There was no difference between the ratings of the actual male and female instructors.”

In other words, students who thought they were being taught by women gave lower evaluation scores than students who thought they were being taught by men. It didn’t matter who was actually teaching them.

The instructor that students thought was a man received markedly higher ratings on professionalism, fairness, respectfulness, giving praise, enthusiasm and promptness.

“The difference in the promptness rating is a good example for discussion,” MacNell says. “Classwork was graded and returned to students at the same time by both instructors. But the instructor students thought was male was given a 4.35 rating out of 5. The instructor students thought was female got a 3.55 rating.”

I agree that the promptness category is a good one to point out, since it’s one of the few places where the underlying behavior can be objectively measured. Maybe I’m exaggerating a bit, but I now kind of feel like my students think I have nothing better to do than return their papers immediately, whereas male instructors have other big and important things to attend to. Anecdotally, I’ve certainly gotten the impression that not everyone is cool with a female instructor, due to the authority-figure nature of the situation if nothing else. This sometimes shows up in evaluation comments that are not exactly appropriate in nature. I think my favorite was “she would make a good wife if only she learned how to cook.” I don’t think the intention was to make me annoyed that the student assumed I cannot cook, whereas in reality I am quite proficient, thankyouverymuch. I’ll have you over for a nice short rib risotto if you don’t believe me.

I found it interesting that, even after reading an article about this experiment, a number of people responded with “but I gave a male/female professor a really low/high rating that one time!” This observation doesn’t actually run counter to the findings of the study, of course, since the study examines whether that low/high rating would have been even lower or higher had the instructor been of the other gender. Some of the comments on the article about the study are also worth reading: not necessarily the ones about the small sample size, since statistical significance takes that into account, but the ones about the study’s lack of what social scientists would call “double-blind” protocols. It’s entirely possible (and, again, depressing) that the instructors actually behaved differently when they were presenting as male than when they were presenting as female.
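The point about small samples and statistical significance can be made concrete with a permutation test: it asks how often a gap as large as the observed one would arise by chance if the gender labels were irrelevant, and its p-value remains valid even for groups of 8 to 12. (The ratings below are invented for illustration; the paper’s own analysis may well use different statistical machinery.)

```python
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sample permutation test on the difference in mean ratings.

    Returns the two-sided p-value: the share of random relabelings
    whose mean gap is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the gender labels
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical 1-5 promptness ratings for two small discussion groups
# (numbers invented for illustration; not the study's data).
perceived_male = [5, 4, 5, 4, 5, 4, 4, 5]
perceived_female = [4, 3, 4, 3, 4, 3, 4, 3]
p = permutation_test(perceived_male, perceived_female)
print(f"p-value: {p:.3f}")
```

Even with only eight ratings per group, a gap this consistent is rare under random relabeling, which is exactly what “statistically significant despite the small sample” means.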

I also have to admit that the line from the article about student evaluations being important “because they’re used to guide higher education decisions related to hiring, promotions and tenure” made me laugh due to its adorably naive nature. =P Perhaps someday we can also have a discussion about the usefulness of student evaluations in the first place, since I can think of a lot of ways to get higher student ratings that don’t actually increase learning. (In fact, there is evidence that lower evaluations correlate with higher performance in subsequent courses, suggesting that students punish instructors in the short run for making them do things that are good for them in the long run. But no, tell me more about the “student as customer” mentality…)

Tags: Causal Friday

4 responses so far

  • 1 David // Dec 13, 2014 at 3:30 am

    So the sample size is 8 people per cell? Yikes!

  • 2 Nicholas Henkey // Dec 13, 2014 at 5:19 am

Now I’m curious how the trend changes if the professor/teacher is instructing on a subject within or outside of the student’s interest. What I mean is: if a student is interested in Economics, would they rate a female professor the same as a male one? Conversely, if the student is uninterested in a subject, would they penalize the female professor more than the male one?

    I ask this question primarily because I enjoy your articles. My favorite Econ professor in college was a conspiracy theorist and he questioned EVERYTHING, including some economic assumptions which are considered foundational among certain factions. I like your articles as much as I liked his courses because you also question some of the basic assumptions of economics (example: your “popcorn trivia article” where you question whether or not the economy reverts back to previous states after the government reverses policy decisions).

    Mulling over it, I cannot seem to recall many female professors teaching courses that I didn’t like during my university career… except for English… but she was kind of weird and hyperobsessed with political cartoons and introduced me to the movie Casablanca, so I liked her course a lot (run-on + made-up word). Something tells me that she could have been the exception and not the rule, particularly due to her natural eccentricity.

    I should have become an academic so I could get support to take this test to the next level…

  • 3 econgirl // Dec 13, 2014 at 5:25 pm

    @ David: I’m being optimistic and assuming that the differences are statistically significant, which makes the 8 per cell issue less troubling. But I do think we would all like to see this tried on a larger group.

  • 4 Michael // Dec 14, 2014 at 1:38 am

    Given how consistent it is with every other study on a similar issue ever . . .
