Expected Binomial Production (EBP) was created to show how run production in baseball varies with on-base percentage. In this article we’ll display this relationship graphically and interpret what we’re seeing. There are three run production measures that we plot against p (the probability that the average team batter reaches base each trip to the plate): runs per inning, runs per plate appearance, and runs per base reach.
Runs Per Inning for the four components of EBP
Formula and plot
The main formula provides runs per inning as a function of p, and is
Here are the plots of runs per inning versus p for each of the L = 0, 1, 2, and 3 components of the formula (The Homers, The Doublers, The Singlers, and The Walkers, respectively):
What should happen at the end points
Does this align with what we expect to see? Just what do we expect to see? We expect 0 runs when p = 0 — nobody getting on base means no runs scoring.
When p = 1, and therefore when OBP = 1.000, nobody is making outs at the plate. If nobody makes outs on the bases either, the inning will never end, and so runners will keep scoring, and runs per inning increases to infinity. When creating these formulas, it was assumed there are no outs made on the basepaths, so by going to infinity at p=1, these formulas correctly reflect that assumption.
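The endpoint behavior described above can be checked numerically. What follows is a minimal sketch, not the EBP formula itself. It assumes that every plate appearance is an independent trial that reaches base with probability p, that an inning ends at the third out with no outs made on the bases, and that the L component strands its first L base reaches, so an inning with n reaches scores max(0, n - L) runs. The function name and the `stranded` parameter are illustrative.

```python
from math import comb

OUTS_PER_INNING = 3


def runs_per_inning(p: float, stranded: int) -> float:
    """Expected runs per inning under the ASSUMED model (not taken from the article).

    Assumption: the number of base reaches n before the third out follows
    P(n) = C(n + 2, 2) * p**n * (1 - p)**3, and an inning with n reaches
    scores max(0, n - stranded) runs.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1); the expectation is infinite at p = 1")
    q = 1.0 - p
    mean_reaches = OUTS_PER_INNING * p / q  # expected base reaches per inning
    # Innings with fewer than `stranded` reaches score 0 rather than a negative
    # number, so add back the expected shortfall E[max(0, stranded - n)].
    shortfall = sum(
        (stranded - n)
        * comb(n + OUTS_PER_INNING - 1, OUTS_PER_INNING - 1)
        * p**n
        * q**OUTS_PER_INNING
        for n in range(stranded)
    )
    return mean_reaches - stranded + shortfall


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.5, 0.9, 0.99, 0.999):
        row = "  ".join(f"L={L}: {runs_per_inning(p, L):9.3f}" for L in range(4))
        print(f"p={p:<5}  {row}")
```

Under these assumptions every component gives 0 runs at p = 0, and the values grow without bound as p approaches 1, which is why the function refuses p = 1 outright.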
However, even at p=1, where everyone safely reaches base, that assumption that baserunners make no outs may be just a little bit wrong. Can we quantify what percentage of baserunners will make outs when OBP = 1.000? We can start by noting that in actual MLB games, usually between 9% and 14% of a team’s baserunners are out on the basepaths, not counting “runners” who hit home runs. But now let’s consider the cost of something like getting caught stealing a base. Success stealing means there’s an extremely small chance you got your team 1 extra run in an inning where you’re scoring potentially hundreds of runs. Failure means you probably just cost your team hundreds of runs. Given that situation, nobody would ever attempt to steal a base.
Neither would any runner ever take a lead, so runners would never get picked off anymore. Sacrifice flies would never happen. Going first to third on a single? No way. Runners would never take the extra base. Would doubles even happen anymore on a ball that stayed in play? Lots of what would have been doubles at lower OBPs would become singles at OBP = 1.
It’s hard to think of scenarios in which an out on the basepaths happens. Fielder’s choices don’t happen because if they did, we wouldn’t have p=1 and OBP = 1. Hidden ball trick? Doesn’t work when runners don’t take leads at all. But I can imagine that once in a while, maybe a runner stumbles and overslides the bag and gets tagged out. Or perhaps they think the third out has been made, so they wander off the bag and get tagged out. Or there is an easy double but the runner on first stops at second, and with two runners at second, one is easily tagged out.
What does this translate to? The numbers should be WAY below the current 9% to 14% range. Could as many as 1 in 25 runners be out? As few as 1 in 1000? It’s hard to say, but I’ll take that range. That gives us anywhere from 0.1% to 4% of baserunners being out on the basepaths. We might therefore expect a runs per inning curve that looks like this (which reflects about 3% of baserunners making outs):

The Fixed-Outs Explosion
Regardless of whether it goes infinite or finite, the sharply upward curve as p approaches 1 is expected. In the ideal case of no runners lost on the bases, there is a vertical asymptote at p = 1, and a function with a vertical asymptote grows without bound as p nears a finite value, which outpaces even an exponential, since an exponential stays finite at every finite p. I will refer to this sharp upward increase as “The Fixed-Outs Explosion”, because it happens due to the rules fixing the number of failures (outs) you must have before you’re done, instead of fixing the number of overall attempts (plate appearances).
Contrast this to free throws in basketball, for which you get your two attempts and then you’re done – there’s no potential for things to go on forever, so there’s no explosion in scoring when success rates grow high. Expected free throw points per quarter as a function of free throw percentage would be a straight line:
The fixed-outs explosion is one of the two phenomena I was pondering that got me curious about the magnitude of the synergy in OBP. The other I call the Threshold Effect, which we’ll see soon. It’s basically the fact that you can have offensive success, but no results to show for it in the score. In each inning, you must get over a threshold of offensive production before you have something to show for it. We’ll get back to Runs Per Inning plots soon, and show what happens when we linearly combine these components to simulate a real team.
Runs Per Plate Appearance for the four components of EBP
Formula and plot
Multiplying the runs per inning formulas by the quantity (1-p) is like multiplying by outs per plate appearance, because each plate appearance ends in an out with probability 1-p. Dividing that by the 3 outs in an inning gives us formulas for runs per plate appearance.
Here are the plots of runs per plate appearance versus p for each of the L = 0, 1, 2, and 3 components of the formula:
What should happen at the end points
Again, at p=0, we expect runs per plate appearance to be zero, because zero base reaches per plate appearance means zero runs per plate appearance. So these plots agree with that.
At p=1, nobody makes an out at the plate, innings go on almost forever, and so just about everybody who steps to the plate scores. With our assumption that no outs are made on the basepaths, innings do go on forever at p=1, and so everybody scores. All the curves converge to a value of 1 run per plate appearance. If we remove our assumption that no outs are made on the basepaths, we expect a few rare blunders, as explained above. Using the guess we made above of 0.1% to 4% of baserunners making outs, we’d see The Homers with a runs per PA of .96 to .999, The Walkers with a rate of .92 to .998, and The Singlers and The Doublers in between.
What should happen in the middle
Note that nowhere do these curves exceed 1. That’s good, because you can’t have more runs than batters, and that’s exactly what we’d be saying happens if runs per plate appearance were more than one.
You also can’t have more runs than base reaches, and that explains why none of these curves is higher than the straight diagonal line. For The Homers (who only ever hit home runs), base reaches and runs are the same thing; so runs per plate appearance (the thing we’re plotting) is the same as base reaches per plate appearance. But base reaches per plate appearance is actually the thing we’re plotting against, because it is taken to be equal to p. So for The Homers, the thing we’re plotting equals the thing we’re plotting against, which means we get a straight line for our graph.
That’s in a home-runs-only world. In the real world, only some of the runners who reach base score, so runs are less than base reaches, and so runs per plate appearance is less than base reaches per plate appearance. Thus the actual curve should fall below this straight line, starting out more horizontal, then turning upward to become more vertical – just like we see here.
Notice the lowest line belongs to The Walkers. This is as we’d expect, because they’re the ones who’ll have the greatest fraction of base reaches that don’t turn out to be runs.
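The bounds discussed in this section can be checked with a short sketch. It rests on assumptions that are mine rather than the article’s stated formula: plate appearances are independent, innings end at the third out with no outs on the bases, and the L component strands its first L base reaches. Runs per inning is then converted to runs per plate appearance with the factor (1 - p) / 3 described earlier. All function and parameter names are illustrative.

```python
from math import comb

OUTS_PER_INNING = 3


def runs_per_inning(p: float, stranded: int) -> float:
    """Expected runs per inning under the ASSUMED model (independent plate
    appearances, no outs on the bases, first `stranded` base reaches never score)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    q = 1.0 - p
    # E[max(0, n - L)] = E[n] - L + E[max(0, L - n)], with n negative binomial.
    shortfall = sum(
        (stranded - n) * comb(n + 2, 2) * p**n * q**OUTS_PER_INNING
        for n in range(stranded)
    )
    return OUTS_PER_INNING * p / q - stranded + shortfall


def runs_per_pa(p: float, stranded: int) -> float:
    """Runs per plate appearance: runs per inning times (1 - p), divided by 3 outs."""
    return runs_per_inning(p, stranded) * (1.0 - p) / OUTS_PER_INNING


if __name__ == "__main__":
    print("p      Homers  Doublers  Singlers  Walkers")
    for p in (0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        values = "   ".join(f"{runs_per_pa(p, L):.4f}" for L in range(4))
        print(f"{p:<5}  {values}")
```

In this sketch the L = 0 column reproduces p itself (the diagonal), each larger L sits lower, and every column approaches 1 as p approaches 1.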
Finally, let’s compare this to its analogue for free throws, which would be points per free throw attempted, as a function of p (p = the probability of making a free throw):
Whereas the runs per inning charts looked very different from the free throw points per quarter plot, in these per-attempt charts, The Homers’ line looks identical to the free throws line. That’s because they actually function the same way. The reason why these plots are the same, but the points per game/inning/quarter charts look so different, is the fixed-outs explosion occurring in the baseball version and not the basketball version. If we wanted to make the basketball version of points per inning/quarter look like the baseball version, we could do it by changing the rules so that each time you’re fouled, you keep shooting free throws and scoring points until you miss one. Then basketball would have a “fixed-failures explosion”, too. A high-percentage shooter could be at the line for a loooooooong time …
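The contrast between the two rule sets can be made concrete with a small sketch. It assumes each free throw is an independent make with probability p, and the function names are illustrative. With a fixed number of attempts, expected points are linear in p. Under the hypothetical shoot-until-you-miss rule, the number of makes before the first miss is geometric with mean p / (1 - p), which blows up as p approaches 1.

```python
def expected_points_fixed_attempts(p: float, attempts: int = 2) -> float:
    """Expected points from a fixed number of one-point free throws."""
    return attempts * p


def expected_points_until_miss(p: float) -> float:
    """Expected points when the shooter keeps going until the first miss
    (a hypothetical rule; makes before the first miss are geometric)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1); the expectation is infinite at p = 1")
    return p / (1.0 - p)


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 0.99):
        fixed = expected_points_fixed_attempts(p)
        until_miss = expected_points_until_miss(p)
        print(f"p={p:<4}  fixed attempts: {fixed:5.2f}  until miss: {until_miss:6.2f}")
```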
This makes baseball more interesting
As for these points per attempt charts, it’s not the home runs, but rather the other kinds of hits and base reaches, that make things more interesting. These other kinds of base reaches are what give us the threshold effect, and therefore the curvy lines, and also therefore the frustration of offense without points. It’s a frustration that you don’t get in other sports. It gives baseball a unique kind of drama and range of emotions for the fans.
Runs Per Base Reach for the components
Dividing the runs per plate appearance formulas by p converts them to runs per base reach formulas.
Here are the plots of runs per base reach versus p for each of the L = 0, 1, 2, and 3 components of the formula:
The reason Expected Binomial Production was created
These to me are the most interesting plots. They’re the ones I originally sought out, based on the realization that if there was no synergy in raising an entire team’s OBP, then you’d get horizontal straight lines on this plot. Indeed, that’s what we see for The Homers, and it makes sense when you think about it. Because every base reach is also a home run, one extra base reach always brings with it the same amount of additional runs (precisely 1), no matter how frequently or infrequently they occur. But you don’t get horizontal lines for the other teams.
The Threshold Effect
That’s because unlike The Homers, the other teams experience the threshold effect. The threshold effect says innings with few base reaches don’t score their baserunners as efficiently as innings with many base reaches.
Basketball free throws are not subject to the threshold effect. The analogous plot in this case would be points per basket made as a function of p. But points per basket made doesn’t depend on p; by rule, it’s always 1. So the plot looks just like the Homers’ plot:
Taking a closer look at an example may help show how the threshold effect works. The Walkers could have three players reach base by walk every inning, and they would score no runs despite a high .500 on-base percentage. But just one more player reaching base each inning, and suddenly they’re scoring a very high 1 run per inning. That’s because they just made it over the threshold. From there, if they increase those base reaches by 50%, they triple that run production (6 base reaches and 3 runs). That’s the threshold effect: getting no scoring out of your initial offensive production, but then suddenly seeing that scoring rapidly increase as you cross a threshold of production. Once you’re over the threshold, scoring becomes roughly proportional to production, but as you’re crossing it, the ratio of overall scoring in the inning to overall production in the inning rapidly increases.
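The arithmetic in this example can be spelled out in a few lines. This is a minimal sketch that treats an inning as deterministic, with the first `stranded` base reaches never scoring (3 for The Walkers, as in the example) and 3 outs per inning. The names are illustrative.

```python
OUTS_PER_INNING = 3
WALKERS_STRANDED = 3  # the first three walks only load the bases


def runs_in_inning(base_reaches: int, stranded: int) -> int:
    """Runs scored in one inning when the first `stranded` base reaches never score."""
    return max(0, base_reaches - stranded)


if __name__ == "__main__":
    for reaches in (3, 4, 6):
        runs = runs_in_inning(reaches, WALKERS_STRANDED)
        obp = reaches / (reaches + OUTS_PER_INNING)
        print(
            f"{reaches} base reaches: OBP {obp:.3f}, {runs} runs, "
            f"{runs / reaches:.2f} runs per base reach"
        )
```

Going from 3 to 4 to 6 base reaches moves runs per base reach from 0 to 0.25 to 0.5, which is the rapid climb across the threshold.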
I interpret the “S”-like shape of these curves as expressing that threshold effect (except of course for The Homers who are not subject to it). The portion of the curves to the left are below-threshold, and so increasing the on-base rate does little to increase the rate at which those who reach base score. The middle portion is around the threshold, and this is why we see run production make its steepest climb here. On the right side of these plots, representing the highest on-base rates, we are well clear of the threshold, and most base reaches now lead to runs; increasing on-base percentage in this region doesn’t change that fact, so there’s little change in the rate at which base reaches become runs. The curves therefore become more horizontal again.
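The flat line for The Homers and the rising curves for the other components can be tabulated with a short sketch. It is written under assumptions of my own, not the article’s stated formula: independent plate appearances, innings that end at the third out with no outs on the bases, and an L component that strands its first L base reaches. Runs per base reach is runs per plate appearance divided by p, as described above. The names are illustrative.

```python
from math import comb

OUTS_PER_INNING = 3


def runs_per_inning(p: float, stranded: int) -> float:
    """Expected runs per inning under the ASSUMED model (see the lead-in)."""
    q = 1.0 - p
    # E[max(0, n - L)] = E[n] - L + E[max(0, L - n)], with n negative binomial.
    shortfall = sum(
        (stranded - n) * comb(n + 2, 2) * p**n * q**OUTS_PER_INNING
        for n in range(stranded)
    )
    return OUTS_PER_INNING * p / q - stranded + shortfall


def runs_per_base_reach(p: float, stranded: int) -> float:
    """Runs per plate appearance divided by p (base reaches per plate appearance)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    runs_per_pa = runs_per_inning(p, stranded) * (1.0 - p) / OUTS_PER_INNING
    return runs_per_pa / p


if __name__ == "__main__":
    print("p      Homers  Doublers  Singlers  Walkers")
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        values = "   ".join(f"{runs_per_base_reach(p, L):.4f}" for L in range(4))
        print(f"{p:<5}  {values}")
```

In this sketch the L = 0 column stays at 1 for every p, while the other columns rise with p.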
What should happen at the end points
Do these graphs align correctly with what real world teams would do? The points p=0 and p=1 are interesting to look at.
At p=1, every plate appearance results in a base reach, and so these runs-per-base-reach curves should take exactly the same values as the runs-per-plate-appearance curves there. Everything we said in the previous section about values at p=1 therefore applies here. That means for our idealized team that makes no outs on the basepaths, runs per base reach is 1 when p=1. But for teams that do make outs on the basepaths, our guess that 0.1% to 4% of baserunners make outs means we’d see The Homers with a runs per base reach of .96 to .999, The Walkers with a rate of .92 to .998, and The Singlers and The Doublers in between.
At p=0, there’s no production at all, so let’s consider what happens when production is just barely above p=0. Getting more than one base reach in an inning would almost never happen. So without being able to score by chaining hits together, runs would score either by home run, or by advancing a runner on outs, by steal, or by error. If we look at what fraction of a team’s base reaches are home runs, we expect the team’s curve at p=0 to take a value somewhere above that fraction. For most teams, that fraction falls in the 4% to 10% range. I don’t expect the contribution from advancing runners to be as large as this, in part because of the fact that EBP is pretty much free of biases (which tells me that in actual historical play, outs made on the basepaths roughly offset base advances that don’t result from other base reaches). So at p=0 (OBP=.000) we expect a value of runs per base reach that’s a little higher than a team’s home run fraction.
Now notice that none of the four curves for L= 0, 1, 2, or 3 come close to that value. They’re either 0 or 1.
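The 0-or-1 behavior near p = 0 can be seen by evaluating a sketch of the components at a very small p. The model here is an assumption of mine, not the article’s stated formula: independent plate appearances, three outs per inning with no outs on the bases, and an inning with n base reaches scoring max(0, n - L) runs. The sum over n is truncated at a hypothetical `max_reaches`, which is adequate only for small p.

```python
from math import comb


def runs_per_base_reach_small_p(p: float, stranded: int, max_reaches: int = 200) -> float:
    """Runs per base reach under the ASSUMED model, by direct summation.

    P(n base reaches before the third out) = C(n + 2, 2) * p**n * (1 - p)**3.
    The truncation at `max_reaches` is only safe when p is small.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    q = 1.0 - p
    runs_per_inning = sum(
        (n - stranded) * comb(n + 2, 2) * p**n * q**3
        for n in range(stranded + 1, max_reaches + 1)
    )
    # Convert to runs per plate appearance, then divide by p.
    return runs_per_inning * q / (3.0 * p)


if __name__ == "__main__":
    p = 0.001
    for L, name in enumerate(("Homers", "Doublers", "Singlers", "Walkers")):
        print(f"L={L} ({name}): {runs_per_base_reach_small_p(p, L):.3e}")
```

Under these assumptions only the L = 0 component stays near 1 as p shrinks, and every component with L above 0 falls toward 0.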
When I first came up with the basic EBP formula, I toyed with the idea of allowing L to take on any value, speculating that if I found the right value for a particular team’s season (always between 1 and 2), I would have a prediction of how their run production varies with on-base percentage. But any value of L above 0 leads to 0 runs per base reach at p=0, something that shouldn’t happen if home runs are properly accounted for. If there were instead a way to use the L=0 formula sometimes, and the formulas for other L values other times, we might be able to properly model a real team’s runs per base reach at p=0. Fortunately, there is a very appropriate and logical way to do so, which is described in the introductory article on EBP.






