Interpreting the EBP plots: The Threshold Effect, The Fixed-Outs Explosion, and other features

Expected Binomial Production (EBP) was created to show how run production in baseball varies with on-base percentage. In this article we’ll display this relationship graphically and interpret what we’re seeing. We plot three run production measures against p (the probability that the average team batter reaches base on each trip to the plate): runs per inning, runs per plate appearance, and runs per base reach.

Runs Per Inning for the four components of EBP

Formula and plot

The main formula provides runs per inning as a function of p, and is
EBP_{IN}(L,p) =\ \dfrac{p^{L+1}}{1-p}  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]
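
The formula is easy to evaluate numerically. Here is a minimal Python sketch; the function name `ebp_per_inning` and the sample values of p are my own choices for illustration, not part of EBP itself.

```python
def ebp_per_inning(L, p):
    """Runs per inning from the closed-form EBP formula, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    q = 1.0 - p  # probability that a plate appearance ends in an out
    bracket = 3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2
    return p ** (L + 1) / q * bracket


# Runs per inning for each component at a few on-base rates.
for L, name in enumerate(["Homers", "Doublers", "Singlers", "Walkers"]):
    values = [round(ebp_per_inning(L, p), 3) for p in (0.2, 0.35, 0.5)]
    print(f"L={L} ({name}): {values}")
```

The function rejects p = 1 because the formula divides by (1-p) there.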

Here are the plots of runs per inning versus p for each of the L = 0, 1, 2, and 3 components of the formula (The Homers, The Doublers, The Singlers, and The Walkers, respectively):

Runs per inning for EBP components

What should happen at the end points

Does this align with what we expect to see? Just what do we expect to see? We expect 0 runs when p = 0 — nobody getting on base means no runs scoring. When p = 1, nobody is making outs at the plate. If nobody makes outs on the bases either, the inning will never end, and so runners will keep scoring, and runs per inning increases to infinity. Steal attempts and attempts to take an extra base will cease because the cost of failure would be too high; nevertheless, the possibility of making an out on the bases means run production wouldn’t quite go to infinity as p gets close to 1, but it should instead become a very large finite number. We therefore expect something like this:

Runs per inning versus p, expectation

The Fixed-Outs Explosion

The sharp upward curve as p approaches 1 is expected: in the ideal case of no runners lost on the bases, there is a vertical asymptote at p = 1. A function with a vertical asymptote grows without bound as p approaches a finite value, which is faster than any exponential, since an exponential is still finite there. I will refer to this sharp increase as “The Fixed-Outs Explosion”, because it happens due to the rules fixing the number of failures (outs) you must have before you’re done, instead of fixing the number of overall attempts (plate appearances).
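
One way to see the asymptote in the formula itself is to multiply the runs per inning formula by (1-p) and let p approach 1. The bracketed term tends to 3 and the power of p tends to 1, so for every L:

```latex
\lim_{p \to 1^{-}} (1-p)\, EBP_{IN}(L,p)
  = \lim_{p \to 1^{-}} p^{L+1}\left[3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2\right]
  = 3
```

In other words, near p = 1 every component behaves like 3/(1-p), which is the expected number of plate appearances in an inning that lasts until three outs are made.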

Contrast this to free throws in basketball, for which you get your two attempts and then you’re done – there’s no potential for things to go on forever, so there’s no explosion in scoring when success rates grow high. Expected free throw points per quarter as a function of free throw percentage would be a straight line:

Basketball analogy - points per quarter
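
The contrast can be put in numbers with a small Python sketch. It compares one trip to the line (two attempts, one point each) with one inning for The Homers (the L = 0 case of the formula above). The function names are mine, and scaling to a full quarter would only multiply the free throw line by the number of trips, which I leave out.

```python
def points_from_two_free_throws(p):
    """Fixed number of attempts: expected points grow linearly in p."""
    return 2 * p


def homers_runs_per_inning(p):
    """Fixed number of failures (3 outs): the L = 0 case of the EBP formula."""
    return 3 * p / (1 - p)


for p in (0.5, 0.9, 0.99):
    print(f"p={p}: free throws {points_from_two_free_throws(p):.2f}, "
          f"Homers {homers_runs_per_inning(p):.1f}")
```

The free throw value can never exceed 2, while the Homers value has no ceiling as p approaches 1.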

The Fixed-Outs Explosion is one of the two phenomena I was pondering that got me curious about the magnitude of the synergy in OBP. The other I call the Threshold Effect, which we’ll see soon. It’s the fact that you can have offensive success with no results to show for it in the score: in each inning, you must get over a threshold of offensive production before anything reaches the scoreboard. We’ll get back to Runs Per Inning plots soon, and show what happens when we linearly combine these components to simulate a real team.

Runs Per Plate Appearance for the four components of EBP

Formula and plot

Multiplying the runs per inning formulas by (1-p) is the same as multiplying by outs per plate appearance. Additionally dividing by 3, the number of outs per inning, gives us formulas for runs per plate appearance.
EBP_{PA}(L,p) = \dfrac{p^{L+1}}{3}  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]
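
Written out with units, the conversion is a chain of rates. This only restates the step above, using the facts that an inning has 3 outs and that a plate appearance ends in an out with probability 1-p:

```latex
\frac{\text{runs}}{\text{PA}}
  = \frac{\text{runs}}{\text{inning}}
    \times \frac{1\ \text{inning}}{3\ \text{outs}}
    \times \frac{(1-p)\ \text{outs}}{1\ \text{PA}}
\quad\Longrightarrow\quad
EBP_{PA}(L,p) = \frac{1-p}{3}\, EBP_{IN}(L,p)
```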

Here are the plots of runs per plate appearance versus p for each of the L = 0, 1, 2, and 3 components of the formula:

Runs per PA for EBP components

What should happen at the end points

Again, at p=0, we expect runs per plate appearance to be zero, because zero base reaches per plate appearance means zero runs per plate appearance. These plots agree with that.

At p=1, nobody makes an out at the plate, innings go on almost forever, and so just about everybody who steps to the plate scores. There must be some outs on the basepaths, however, so we expect that some batters don’t score, and runs per plate appearance will be slightly less than one. These plots have it at exactly one because one of our simplifying assumptions was that outs never happen on the basepaths. That assumption causes these curves to show 1 instead of just below 1.

Can we quantify where they ought to be? We note that usually between 9% and 14% of a team’s baserunners are out on the basepaths, not counting “runners” who hit home runs. At p=1, we’d expect extreme caution on the basepaths, and so this number would get much lower (and then home runs would have a small additional impact). If we presume it falls to a 3% to 10% range, we would see runs per plate appearance (and runs per base reach, for that matter) on the upper end reaching .90 to .97. A potential future adjustment to EBP that introduces an accounting of outs on the bases can be judged in part by whether it brings its p=1 value down to somewhere in or near the .90 to .97 range.
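
The arithmetic behind that range is simple. As a rough sketch, write f for the fraction of baserunners put out on the bases (f is my own symbol, not part of EBP), and assume that at p=1 every batter reaches base and every baserunner who is not put out eventually scores:

```latex
\frac{\text{runs}}{\text{plate appearance}} \approx 1 - f,
\qquad f \in [0.03,\ 0.10] \;\Longrightarrow\; 1 - f \in [0.90,\ 0.97]
```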

What should happen in the middle

Note that nowhere do these curves exceed 1. That’s good, because you can’t have more runs than batters, and that’s exactly what we’d be saying happens if runs per plate appearance were more than one.

You also can’t have more runs than base reaches, and that explains why none of these curves is higher than the straight diagonal line. For The Homers (who only ever hit home runs), base reaches and runs are the same thing; so runs per plate appearance (the thing we’re plotting) is the same as base reaches per plate appearance. But base reaches per plate appearance is actually the thing we’re plotting against, because it is taken to be equal to p. So the thing we’re plotting equals the thing we’re plotting against, which means we get a straight line for our graph.

That’s in a home-runs-only world. In the real world, only some of the runners who reach base score, so runs are less than base reaches, and so runs per plate appearance is less than base reaches per plate appearance. Thus the actual curve should fall below this straight line, starting out more horizontal, then turning upward to become more vertical – just like we see here.
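
These bounds can be checked directly from the runs per plate appearance formula. The sketch below is only a numerical spot check on a grid of p values, not a proof, and the function name `ebp_per_pa` is my own.

```python
def ebp_per_pa(L, p):
    """Runs per plate appearance from the closed-form EBP formula."""
    q = 1.0 - p  # probability that a plate appearance ends in an out
    return p ** (L + 1) / 3 * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


grid = [i / 100 for i in range(101)]  # p = 0.00, 0.01, ..., 1.00
tol = 1e-12  # allowance for floating-point rounding

for L in range(4):
    for p in grid:
        value = ebp_per_pa(L, p)
        # Runs can exceed neither base reaches (the diagonal) nor batters (1).
        assert -tol <= value <= p + tol

# The Homers (L = 0) sit exactly on the diagonal.
assert all(abs(ebp_per_pa(0, p) - p) < tol for p in grid)

# At a fixed p, a larger L means fewer runs, so The Walkers are lowest.
assert ebp_per_pa(3, 0.5) < ebp_per_pa(2, 0.5) < ebp_per_pa(1, 0.5) < 0.5
```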

Notice the lowest line belongs to The Walkers. This is as we’d expect, because they’re the ones who’ll have the greatest fraction of base reaches that don’t turn out to be runs.

Finally, let’s compare this to its analogy for free throws, which would be points per free throw attempted, as a function of p (p = the probability of making a free throw):

Basketball analogy - points per free throw

Whereas the runs per inning charts looked very different from the points per quarter plot, in these per-attempt charts The Homers’ line looks identical to the free throws line. That’s because they actually function the same way. The reason these plots are the same, while the per-inning and per-quarter charts look so different, is that the fixed-outs explosion occurs in the baseball version and not the basketball version. If we wanted to make the basketball points per quarter plot look like the baseball runs per inning plot, we could do it by changing the rules so that each time you’re fouled, you keep shooting free throws and scoring points until you miss one. Then basketball would have a “fixed-failures explosion”, too. A high-percentage shooter could be at the line for a loooooooong time …
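
To see that a shoot-until-you-miss rule really would produce the same kind of explosion, here is a short derivation. It assumes each free throw is worth one point and is made independently with probability p (the independence assumption is mine). The number of makes before the first miss is then geometric, so the expected points per trip to the line are:

```latex
E[\text{points}] = \sum_{k=0}^{\infty} k\, p^{k}(1-p) = \frac{p}{1-p}
```

This has the same vertical asymptote at p=1 as the L=0 runs per inning formula, 3p/(1-p). The only difference is the factor of 3, because an inning allows three failures instead of one.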

This makes baseball more interesting

As for these points per attempt charts, it’s not the home runs, but rather the other kinds of hits and base reaches, that make things more interesting. These other kinds of base reaches are what give us the threshold effect, and therefore the curvy lines, and also therefore the frustration of offense without points. It’s a frustration that you don’t get in other sports. It gives baseball a unique kind of drama and range of emotions for the fans.

Runs Per Base Reach for the components

Dividing the runs per plate appearance formulas by p converts them to runs per base reach formulas.
EBP_{BR}(L,p)\ =\ \dfrac{p^L}{3}  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]

Here are the plots of runs per base reach versus p for each of the L = 0, 1, 2, and 3 components of the formula:

Runs per base reach for EBP components

The reason Expected Binomial Production was created

These to me are the most interesting plots. They’re the ones I originally sought out, based on the realization that if there was no synergy in raising an entire team’s OBP, then you’d get horizontal straight lines on this plot. Indeed, that’s what we see for The Homers, and it makes sense when you think about it. Because every base reach is also a home run, one extra base reach always brings with it the same amount of additional runs (precisely 1), no matter how frequently or infrequently they occur. But you don’t get horizontal lines for the other teams.

The Threshold Effect

That’s because unlike The Homers, the other teams experience the threshold effect. The threshold effect says innings with few base reaches don’t score their baserunners as efficiently as innings with many base reaches.

Basketball free throws are not subject to the threshold effect. The analogous plot in this case would be points per made free throw as a function of p. But points per made free throw doesn’t depend on p; by rule, it’s always 1. So the plot looks just like The Homers’ plot:

Basketball analogy - points per made free throw

Taking a closer look at an example may help show how the threshold effect works. The Walkers could have three players reach base by walk every inning, and they would score no runs despite a high .500 on-base percentage. But add just one more player reaching base each inning, and suddenly they’re scoring a very high 1 run per inning. That’s because they just made it over the threshold. From there, if they increase those base reaches by 50%, they triple that run production (6 base reaches and 3 runs). That’s the threshold effect: getting no scoring out of your initial offensive production, but then seeing that scoring rapidly increase as you cross a threshold of production. Once you’re over the threshold, scoring becomes roughly proportional to production, but as you’re crossing it, the ratio of overall scoring in the inning to overall production in the inning rapidly increases.
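
The arithmetic in this example can be laid out in a few lines of Python. This sketch assumes every inning is identical: a fixed number of walks and exactly three outs. The function name `walkers_inning` is just a label I chose.

```python
from fractions import Fraction

OUTS_PER_INNING = 3
STRANDED = 3  # Walkers: the first three base reachers only load the bases


def walkers_inning(reaches):
    """Return (on-base percentage, runs) for an inning with `reaches` walks."""
    obp = Fraction(reaches, reaches + OUTS_PER_INNING)
    runs = max(0, reaches - STRANDED)
    return obp, runs


for reaches in (3, 4, 6):
    obp, runs = walkers_inning(reaches)
    print(f"{reaches} reaches: OBP {float(obp):.3f}, {runs} run(s), "
          f"{runs / reaches:.3f} runs per base reach")
```

The last column is the ratio of scoring to production, which is the quantity that jumps as the threshold is crossed.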

I interpret the “S”-like shape of these curves as expressing that threshold effect (except of course for The Homers, who are not subject to it). The left portion of the curves is below the threshold, so increasing the on-base rate does little to increase the rate at which those who reach base score. The middle portion is around the threshold, and this is why we see run production make its steepest climb here. On the right side of these plots, representing the highest on-base rates, we are well clear of the threshold, and most base reaches now lead to runs; increasing on-base percentage in this region doesn’t change that fact, so there’s little change in the rate at which base reaches become runs. The curves therefore become more horizontal again.
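
The threshold reading can also be checked against the runs per inning formula. The sketch below rests on an assumption of mine that this article does not state outright: in each inning the first L base reachers are stranded and every later one scores, which generalizes the Walkers example (L = 3). Under that assumption, and with independent plate appearances, the number of base reaches before the third out is negative binomial, and the expected runs can be summed directly.

```python
from math import comb


def ebp_per_inning(L, p):
    """Runs per inning from the closed-form EBP formula, for 0 <= p < 1."""
    q = 1.0 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def expected_runs_by_sum(L, p, max_reaches=2000):
    """Expected runs per inning by direct summation (ASSUMED stranding model)."""
    q = 1.0 - p
    total = 0.0
    for n in range(L + 1, max_reaches):
        # P(exactly n base reaches before the third out): negative binomial.
        prob = comb(n + 2, 2) * p ** n * q ** 3
        total += (n - L) * prob  # the first L reachers are stranded
    return total


for L in range(4):
    for p in (0.2, 0.5, 0.8):
        assert abs(ebp_per_inning(L, p) - expected_runs_by_sum(L, p)) < 1e-9
```

The sum is truncated at `max_reaches`, which is harmless for the p values used here because the omitted terms are vanishingly small.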

What should happen at the end points

Do these graphs align correctly with what real world teams would do? The points p=0 and p=1 are interesting to look at.

At p=1, every plate appearance results in a base reach, and so these runs-per-base-reach curves should take exactly the same values as the runs-per-plate-appearance curves there. Everything we said in the previous section about values at p=1 therefore applies here. For a “real world” team that makes outs on the basepaths (“real world” in quotes because p=1 isn’t ever really going to happen – at least not in professional baseball), we expect the curves to peak at p=1, taking values probably between .90 and .97 runs per base reach. For our hypothetical team that never makes outs on the basepaths, we expect the curve to take the value of 1 run per base reach at p=1. As mentioned for the runs per plate appearance charts, we may judge any potential future adjustment to EBP that introduces an accounting of outs on the bases in part by whether it brings its value for runs per base reach at p=1 down to somewhere in or near the .90 to .97 range.

At p=0, there’s no production at all, so let’s consider what happens when p is just barely above 0. Getting more than one base reach in an inning would almost never happen. So without being able to score by chaining hits together, runs would score either by home run, or by a runner advancing on outs, steals, or errors. If we look at what fraction of a team’s base reaches are home runs, we expect the team’s curve at p=0 to take a value somewhere above that fraction. For most teams, that fraction falls in the 4% to 10% range. I don’t expect the contribution from advancing runners to be as large as this, in part because EBP is pretty much free of biases (which tells me that in actual historical play, outs made on the basepaths roughly offset base advances that don’t result from other base reaches). So we expect a value of runs per base reach that’s a little higher than a team’s home run fraction.

Now notice that none of the four curves for L= 0, 1, 2, or 3 come close to that value. They’re either 0 or 1.
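
Evaluating the runs per base reach formula at the two end points makes this concrete. This is a minimal sketch with a function name of my own choosing; it relies on Python treating 0.0 ** 0 as 1.0, which is what gives The Homers their value of 1 at p=0.

```python
def ebp_per_base_reach(L, p):
    """Runs per base reach from the closed-form EBP formula."""
    q = 1.0 - p  # probability that a plate appearance ends in an out
    return p ** L / 3 * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


for L in range(4):
    at_zero = ebp_per_base_reach(L, 0.0)
    at_one = ebp_per_base_reach(L, 1.0)
    print(f"L={L}: p=0 gives {at_zero:.2f}, p=1 gives {at_one:.2f}")
```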

When I first came up with the basic EBP formula, I toyed with the idea of allowing L to take on any value, speculating that if I found the right value for a particular team’s season (always between 1 and 2), I would have a prediction of how their run production varies with on-base percentage. But any value of L above 0 leads to 0 runs per base reach at p=0, something that shouldn’t happen if home runs are properly accounted for. If there were instead a way to use the L=0 formula sometimes, and the formulas for other L values at other times, we might be able to properly model a real team’s runs per base reach at p=0. Fortunately, there is a very appropriate and logical way to do so, which is described in the introductory article on EBP.
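
To give a feel for why mixing the components helps at p=0, here is a rough Python sketch of a weighted blend. It is not the method from the introductory article, which this article does not spell out. The weights are hypothetical numbers I made up for illustration, and both function names are my own.

```python
def ebp_per_base_reach(L, p):
    """Runs per base reach from the closed-form EBP formula."""
    q = 1.0 - p
    return p ** L / 3 * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def blended_runs_per_base_reach(weights, p):
    """Weighted average of the L = 0..3 components; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * ebp_per_base_reach(L, p) for L, w in enumerate(weights))


# HYPOTHETICAL weights for L = 0, 1, 2, 3; not fitted to any real team.
example_weights = [0.07, 0.25, 0.50, 0.18]

for p in (0.0, 0.3, 1.0):
    value = blended_runs_per_base_reach(example_weights, p)
    print(f"p={p}: {value:.3f} runs per base reach")
```

Because only the L=0 component is nonzero at p=0, the blended value there equals the weight placed on The Homers, which is how a blend can land near a team’s home run fraction.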