Friday, March 28, 2014

Research: Do We Shy Away From Pivotal Calls?

A new study shows that at least one type of decision-maker — the Major League Baseball umpire — is biased when the stakes are high.

We like to think of judges and other arbitrators as unbiased, making decisions based solely on an examination of the facts of the case and the rule of law. This is, after all, what they’re charged with doing and what they spend years training to perfect. But there’s one problem: They’re human.

Past behavioral studies have shown that extraneous factors, such as whether the judge is hungry or the person being judged has a particular skin color, do influence some decision-making scenarios. But new research by Stanford Graduate School of Business PhD students Etan Green and David P. Daniels suggests that a decision-maker’s call can be swayed by another factor: the magnitude of the stakes involved.

Green and Daniels analyzed ball and strike calls made by Major League Baseball umpires for more than a million pitches between 2009 and 2011. In their study, which recently won second place at the MIT Sloan Sports Analytics Conference, they show that an umpire’s strike zone shrinks in counts when the batter already has two strikes (and therefore a third strike would result in an out) and expands when the batter has three balls (with a fourth ball then resulting in a walk).

“Oftentimes, the umpires face a choice between a call that would be really pivotal and a call that would be relatively inconsequential,” says Green. “And what we find is that they err on the side of the inconsequential call unless they’re absolutely certain that the pivotal call is the right one.”

The research focuses on pitches around the edges of the strike zone, which the authors call the “ring of uncertainty.” When pitches cross home plate squarely in the strike zone, umpires call them strikes with a probability greater than 99%. When pitches veer widely from the strike zone, umpires call them strikes with a probability less than 1%. But within a roughly 1-foot-wide band on the edges of the strike zone, the probability of a called strike falls steeply, from 90% to 10%.

When the batter already has two strikes, the authors show, a borderline pitch that would typically be called a strike 50% of the time is called a strike only about 30% of the time. In other words, when an umpire isn’t quite sure that the pitch was a strike, he or she becomes biased toward making calls that maintain the status quo (keeping the at-bat going) rather than potentially making an important mistake (ending the at-bat early by calling an uncertain strike).
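To make those two figures concrete, here is a minimal Python sketch of a toy model. It is not the authors' model: it simply assumes the two-strike effect acts as a constant shift in the log-odds of a called strike, calibrated so that a 50% pitch becomes a 30% pitch. The function names and the constant-shift assumption are hypothetical, chosen only for illustration.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumption: the two-strike effect is a constant log-odds shift,
# calibrated from the article's figures (50% falls to about 30%).
TWO_STRIKE_SHIFT = logit(0.30) - logit(0.50)


def strike_prob(baseline_prob, two_strikes):
    """Called-strike probability under the toy constant-shift model."""
    x = logit(baseline_prob)
    if two_strikes:
        x += TWO_STRIKE_SHIFT
    return inv_logit(x)


if __name__ == "__main__":
    for baseline in (0.10, 0.50, 0.90):
        shifted = strike_prob(baseline, two_strikes=True)
        print(f"baseline {baseline:.0%} -> two-strike count {shifted:.1%}")
```

By construction the sketch reproduces the 50% to 30% drop. What it predicts for other pitch locations depends entirely on the constant-shift assumption, so those numbers should not be read as findings of the study.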

The images above show one umpire's strike zone for right-handed hitters: on the left for counts with fewer than two strikes; and on the right for two-strike counts. The dotted rectangle shows the official strike zone, and each band represents pitch locations for which the umpire calls strikes with the listed probability. As you can see, with two strikes the umpire's strike zone shrinks for the next pitch. In other words, when there are fewer than two strikes, he calls a pitch in the red-lined circle a strike 90% of the time. When there are two strikes, he will only call a pitch in that location a strike about 70% of the time.

In counts where the batter has three balls, the authors found, umpires are slightly more likely to call a borderline pitch a strike instead of a ball. “This is consistent with a bias against making a pivotal call,” says Daniels. “It happens in both directions.”

They also discovered that every single umpire exhibits the effect, some to a dramatic extent. “The most biased umpires, for some borderline calls, are going from a 50-50 chance they’ll call a strike to calling balls basically every time in a two-strike count,” says Green. Furthermore, the effect grows in game scenarios in which the umpire faces time pressure to make a quick call, suggesting that it’s likely a hard-wired, intuitive reaction rather than a behavior that could be changed.

Green and Daniels say that the game, with its clearly defined scenarios, is a particularly rich microcosm for studying decision-making behavior. The rules are clear, and home plate umpires are supposed to call strikes based solely on the location of the pitch. What’s more, Major League Baseball documents every pitch with a series of 60 3-D images — “so we have very specific data to go back and check the extent to which the umpire’s strike zone changes with extraneous factors,” says Green.

In the real world, virtually every decision made by a professional arbitrator is more complex than calling balls and strikes. “But there’s a reasonable argument that if an effect happens in a decision as simple as balls and strikes, we would expect it to manifest in other places, too,” says Daniels.

One possible scenario would be in cases involving states’ “three strikes” laws (which get their nickname from baseball). In some borderline cases in which a defendant potentially faces a third felony conviction, a judge could become less likely to rule that a violation is a felony rather than a misdemeanor, for example. Judges or juries choosing between guilty and innocent, or even doctors making certain diagnoses, may be inclined to err on the side of the inconsequential call, to the extent that they have discretion to do so.

Green and Daniels determined that umpire bias flips about 1% of ball and strike calls, so, they write, “almost once a game, an at-bat ends in something other than a strikeout after a strike should have been called.” But since the bias is consistently applied, and doesn’t favor one team over another, it’s debatable whether the effect creates a fairness problem.
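As a rough consistency check on the 1% figure, the back-of-envelope arithmetic below estimates how many flipped calls per game it implies. This is a sketch under stated assumptions, not a calculation from the paper: it treats "more than a million" called pitches as exactly one million (a lower bound), counts only regular-season games (30 teams playing 162 games each, or 2,430 games a season), and uses hypothetical variable names.

```python
# Figures from the article, with the assumptions labeled.
TOTAL_CALLS = 1_000_000   # "more than a million" calls, treated as a lower bound
SEASONS = 3               # 2009 through 2011
GAMES_PER_SEASON = 2430   # assumption: regular season only (30 teams * 162 / 2)
FLIP_RATE = 0.01          # "about 1%" of ball and strike calls

# Average number of umpire calls in the data set per game.
calls_per_game = TOTAL_CALLS / (SEASONS * GAMES_PER_SEASON)

# Calls per game that the bias would flip at the quoted rate.
flipped_per_game = FLIP_RATE * calls_per_game

print(f"calls per game:   {calls_per_game:.0f}")
print(f"flipped per game: {flipped_per_game:.2f}")
```

Under these assumptions the estimate comes out a little above one flipped call per game. The authors' "almost once a game" figure is narrower, since it counts only at-bats that should have ended in a strikeout, so a somewhat smaller number is unsurprising.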

“We don’t have a stance one way or the other about what baseball should do about this issue,” says Green. “But in other societal contexts, the stakes are considerably higher. You can imagine if you’re a team and you lose a game because of this, you might say, ‘OK, maybe next time the calls will go in our favor.’ But if a judge lets off a guilty offender, or a doctor makes the wrong diagnosis because he or she is erring on the side of not doing something consequential — those effects don’t get recalibrated.”