Other things you can do with EBP

Elsewhere we’ve discussed using EBP to study the relationship between OBP and run production, and for estimating runs. Here are a few other things you can do with it.

Predicting runs for college teams, little league teams, minor league teams, etc.
How changing the number of outs per inning would affect run production
Strategy and lineup construction
Predicting percentages of innings with a particular number of runs scored
Predicting left on base rates
Separating what happens at the plate from what happens on the basepaths

Predicting runs for college teams, little league teams, minor league teams, etc.

Elsewhere I’ve compared and contrasted EBP with many of the better-known run estimators, both by comparing correlations to actual run production when using EBP as a run estimator, and by comparing how they vary with OBP when a p-dependence is introduced to the run estimators. The creators of the run estimators discussed on those pages (with the possible exception of the simplest and oldest of these, Runs Created) all made use of actual major league gameplay data in their development. No such data was referenced in the development of EBPt (and only average rates of taking an extra base were referenced in the development of EBPf). As such, the formulas can be applied to baseball played in any league, including little leagues, without modification. How well they make predictions in those contexts will probably vary considerably depending on rates of errors, wild pitches, stolen bases, etc. In major league baseball, EBP’s omission of base advances and outs on the basepaths does not appear to have a great effect on its overall accuracy. This may not be the case in other contexts, and it would be interesting to find out how EBP fares in other leagues. The possibility that it might not fare as well is one reason I’d like to study how base advances and outs on the basepaths might be added to the model.

How changing the number of outs per inning would affect run production

Ever wonder how scoring in baseball would change if we had 4 outs per inning instead of 3? No? Well, if you had, Expected Binomial Production can provide an answer for you. All you have to do is replace the component formulas for 3-out innings:

$EBP(L,p) = \dfrac{p^{L+1}}{1-p} [3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2]$

… with the component formulas for 4-out innings:

$EBP(L,p) = \dfrac{p^{L+1}}{1-p} [4 + 3L(1-p) + L(L+1)(1-p)^2 + \tfrac{1}{6}L(L+1)(L+2)(1-p)^3]$

You would then linearly combine the four L=0,1,2,3 formulas using exactly the same coefficients as before, to get the expected runs per 4-out inning of a real team. Divide that by the team’s EBP in 3-out innings to get the factor by which run production would go up. Examination will show that The Homers will have their production go up by a factor of 4/3, an increase of 1/3. Any real team will have it go up by even more, because inside the brackets the first term increases by 1/3; the second term (the one that’s linear in 1-p) goes up by 50%; the third term (quadratic in 1-p) goes up by 100%; and you get an additional fourth term. The largest values of L get the biggest increases.
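Here is a minimal Python sketch of that ratio calculation. The two component functions are transcribed from the formulas above; the coefficient list and the OBP value are made-up placeholders for illustration, not any real team’s numbers.

```python
def ebp3(L, p):
    """3-out component formula, transcribed from the text."""
    q = 1 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def ebp4(L, p):
    """4-out component formula, transcribed from the text."""
    q = 1 - p
    return p ** (L + 1) / q * (
        4 + 3 * L * q + L * (L + 1) * q ** 2
        + L * (L + 1) * (L + 2) * q ** 3 / 6
    )


def team_runs(coeffs, p, component):
    """Linear combination of the L = 0, 1, 2, 3 components."""
    return sum(c * component(L, p) for L, c in enumerate(coeffs))


# Hypothetical inputs: these coefficients and this OBP are assumptions
# chosen only to exercise the functions.
coeffs = [0.10, 0.25, 0.35, 0.30]
p = 0.320

ratio_homers = ebp4(0, p) / ebp3(0, p)  # pure L = 0 team
ratio_team = team_runs(coeffs, p, ebp4) / team_runs(coeffs, p, ebp3)

print(f"The Homers: {ratio_homers:.4f}")
print(f"Hypothetical team: {ratio_team:.4f}")
```

The same coefficients are used in the numerator and the denominator, so only the component formulas change between the 3-out and 4-out cases.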

If instead of the ratio of these two numbers you are seeking the difference between them, their formulas line up nicely. You can just take the differences of the four L-component formulas, and then linearly combine those differences. Subtracting the 3-out versions from the 4-out versions, the differences of the component formulas are:

$Difference = \dfrac{p^{L+1}}{1-p} [1 + L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2 + \tfrac{1}{6}L(L+1)(L+2)(1-p)^3]$

And interestingly, this can be rewritten as

$Difference = \dfrac{p^{L+1}}{1-p} [1 + L(1-p)(1 + \tfrac{1}{2}(L+1)(1-p)(1 + \tfrac{1}{3}(L+2)(1-p)))]$
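As a quick check of the difference formula, here is what it gives at the two extreme values of L. For L=0 every term containing L vanishes, and for L=3 the coefficients work out to 3, 6 and 10:

$Difference(0,p) = \dfrac{p}{1-p}$

$Difference(3,p) = \dfrac{p^{4}}{1-p} [1 + 3(1-p) + 6(1-p)^2 + 10(1-p)^3]$

The L=0 result is just 4p/(1-p) minus 3p/(1-p), which agrees with The Homers going up by a factor of 4/3.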

In general, if OPI = the number of outs per inning, EBP says that on average, the runs per inning is expected to be

$EBP(L,p) = \dfrac{p^{L+1}}{1-p} \displaystyle\sum_{i=0}^{OPI-1} [(OPI-i) \binom{L-1+i}{L-1} (1-p)^i]$

You would then linearly combine the four L=0,1,2,3 formulas using exactly the same coefficients as before, to model real teams.
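The general formula can be sketched in Python as follows. Two things here are assumptions of mine rather than statements from the text. First, the binomial coefficient is written as C(L-1+i, i), which equals C(L-1+i, L-1) for L ≥ 1, and the i=0 term is taken to be 1 so that the L=0 case reduces to the 3-out and 4-out formulas above. Second, `ebp_direct` is a cross-check that assumes a pure-L team strands its first L base reachers each inning and scores the rest, with the number of reaches following a negative binomial distribution.

```python
from math import comb


def ebp(L, p, opi=3):
    """Expected runs per inning for one L component with opi outs per inning."""
    q = 1 - p
    total = 0.0
    for i in range(opi):
        # C(L-1+i, L-1) == C(L-1+i, i); treat the i = 0 term as 1 so L = 0 works.
        weight = 1 if i == 0 else comb(L - 1 + i, i)
        total += (opi - i) * weight * q ** i
    return p ** (L + 1) / q * total


def ebp_direct(L, p, opi=3, max_reaches=400):
    """Cross-check (assumed interpretation): truncated sum of (reaches - L)
    over a negative binomial distribution of base reaches per inning."""
    q = 1 - p
    return sum(
        (k - L) * comb(k + opi - 1, opi - 1) * p ** k * q ** opi
        for k in range(L + 1, max_reaches)
    )


for outs in (3, 4, 5):
    row = ", ".join(f"L={L}: {ebp(L, 0.320, outs):.4f}" for L in range(4))
    print(f"{outs} outs per inning -> {row}")
```

If the two functions agree for a given L, p and OPI, that suggests the closed form was transcribed correctly.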

Strategy and lineup construction

Obviously, you could also use it for the purpose for which it was designed, which is to see how run production changes in especially low or high OBP situations, such as when a very low OBP-against pitcher faces a team with a collectively low OBP, or when the lowest-OBP portion of a team’s lineup is coming up. This could help inform a team whether it makes sense to use sacrifices and other techniques for manufacturing a single run, or to aim for a multiple-run inning.

You could also use it in trying to decide whether it is better to add a particular power-oriented hitter versus a particular on-base type of hitter to a given team’s lineup. One team may see more improvement by adding the power hitter, whereas another team may benefit more from adding the on-base hitter.
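A rough sketch of how such a comparison might be set up, using the 3-out component formula from earlier. Every number here is hypothetical: the starting coefficients, the shift in coefficients meant to represent the power hitter, and the OBP bump meant to represent the on-base hitter are all placeholders. Working out what a real player does to a lineup’s coefficients and OBP is the hard part, and it is not shown here.

```python
def ebp3(L, p):
    """3-out component formula from the text."""
    q = 1 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def team_runs(coeffs, p):
    """Linear combination of the L = 0, 1, 2, 3 components."""
    return sum(c * ebp3(L, p) for L, c in enumerate(coeffs))


def compare_additions(coeffs, p, power_coeffs, onbase_p):
    """Return (gain from the power hitter, gain from the on-base hitter)
    in expected runs per inning, relative to the current lineup."""
    base = team_runs(coeffs, p)
    return team_runs(power_coeffs, p) - base, team_runs(coeffs, onbase_p) - base


# Hypothetical lineup and hypothetical effects of each candidate hitter.
current_coeffs = [0.10, 0.25, 0.35, 0.30]
current_p = 0.320
with_power = [0.12, 0.26, 0.34, 0.28]  # weight shifted toward lower L
with_onbase_p = 0.325                  # same coefficients, higher OBP

power_gain, onbase_gain = compare_additions(
    current_coeffs, current_p, with_power, with_onbase_p
)
print(f"power hitter: {power_gain:+.4f} runs per inning")
print(f"on-base hitter: {onbase_gain:+.4f} runs per inning")
```

Running the same comparison from different starting lineups is how you would look for teams that favor one type of hitter over the other.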

Predicting percentages of innings with a particular number of runs scored

Deciding which of two possible offenses is “better” isn’t necessarily just a matter of figuring out which lineup ought to produce more runs per inning. A lineup that more consistently scores something may win more games than a lineup that scores the same amount overall, but has more frequent “big innings” as well as more frequent innings in which it does not score. My theory is that the former is what you’ll get with a high-power team, and the latter is what you’ll get with a high-OBP team. Add to that the notion that big innings are likely to produce a lot of extraneous runs, that is to say, runs beyond what is necessary to win the game, and we arrive at the conclusion that a high-power team might be expected to win more games than a high-OBP team that scores the same average number of runs per inning.

It’s a hard idea to test using game data, because there are many other factors involved in arriving at a win. However, there is a variation we can make to the way the EBP formulas are derived that may be able to help decide this. It produces the likely fraction of innings in which a given team will score 0 runs, 1 run, 2 runs, etc. For example, here is what it predicts for 2016 MLB teams, alongside the actual fractions from Baseball-Reference.com.

Fractions of innings with particular run totals as predicted by EBP – 2016
Team	0 R	1 R	2 R	3 R	4 R	5+ R
LAA	74.65%	12.57%	6.78%	3.32%	1.53%	1.15%
HOU	73.59%	13.03%	7.08%	3.49%	1.60%	1.21%
OAK	75.86%	12.41%	6.44%	3.03%	1.33%	0.93%
TOR	72.67%	13.19%	7.34%	3.70%	1.74%	1.36%
ATL	76.13%	12.01%	6.36%	3.07%	1.40%	1.03%
MIL	74.50%	12.64%	6.84%	3.35%	1.53%	1.14%
STL	71.97%	13.58%	7.54%	3.78%	1.77%	1.37%
CHC	71.84%	13.21%	7.54%	3.92%	1.91%	1.58%
ARI	73.27%	13.23%	7.17%	3.51%	1.61%	1.20%
LAD	74.29%	12.76%	6.89%	3.37%	1.54%	1.15%
SFG	74.79%	12.38%	6.73%	3.34%	1.56%	1.20%
CLE	72.74%	13.20%	7.31%	3.67%	1.73%	1.35%
SEA	72.78%	13.24%	7.31%	3.65%	1.71%	1.31%
MIA/FLA	75.34%	12.29%	6.58%	3.21%	1.47%	1.11%
NYM	74.13%	12.93%	6.94%	3.37%	1.52%	1.11%
WSN	73.40%	13.05%	7.14%	3.54%	1.64%	1.24%
BAL	72.28%	13.67%	7.46%	3.67%	1.68%	1.24%
SDP	76.63%	12.22%	6.23%	2.87%	1.23%	0.82%
PHI	76.89%	12.05%	6.15%	2.84%	1.23%	0.83%
PIT	73.91%	12.61%	6.97%	3.52%	1.67%	1.32%
TEX	72.71%	13.36%	7.33%	3.64%	1.68%	1.28%
TBR	74.00%	13.22%	6.97%	3.31%	1.47%	1.03%
BOS	69.51%	13.97%	8.16%	4.33%	2.16%	1.86%
CIN	74.98%	12.61%	6.69%	3.22%	1.45%	1.05%
COL	71.10%	13.79%	7.76%	3.96%	1.89%	1.51%
KCR	75.71%	12.38%	6.48%	3.08%	1.37%	0.98%
DET	72.23%	13.39%	7.45%	3.76%	1.77%	1.39%
MIN	73.98%	13.02%	6.98%	3.38%	1.53%	1.12%
CHW	74.71%	12.69%	6.76%	3.27%	1.48%	1.09%
NYY	74.87%	12.60%	6.73%	3.25%	1.47%	1.07%

Fractions of innings with particular run totals – 2016 actual
Team	0 R	1 R	2 R	3 R	4 R	5+ R
LAA	73.41%	14.31%	6.56%	2.79%	1.61%	1.33%
HOU	71.54%	15.76%	7.74%	3.26%	1.02%	0.68%
OAK	74.86%	14.02%	5.52%	3.38%	1.38%	0.83%
TOR	72.30%	14.36%	7.35%	3.30%	1.37%	1.31%
ATL	73.22%	15.64%	7.04%	2.46%	1.23%	0.41%
MIL	73.40%	14.58%	7.15%	2.92%	1.25%	0.69%
STL	71.73%	14.79%	6.43%	3.59%	2.21%	1.24%
CHC	69.69%	15.67%	8.18%	3.40%	1.80%	1.25%
ARI	70.95%	16.80%	6.87%	2.86%	1.16%	1.36%
LAD	72.86%	14.57%	6.77%	3.04%	1.59%	1.17%
SFG	73.29%	13.90%	7.81%	2.53%	1.23%	1.23%
CLE	70.58%	15.51%	7.16%	4.03%	1.81%	0.90%
SEA	72.29%	14.81%	6.24%	3.57%	1.78%	1.30%
MIA/FLA	73.76%	14.96%	6.12%	3.34%	1.25%	0.56%
NYM	73.91%	13.77%	7.61%	2.84%	1.25%	0.62%
WSN	70.74%	16.63%	6.35%	3.38%	1.86%	1.04%
BAL	71.54%	15.38%	7.13%	3.36%	1.40%	1.19%
SDP	73.49%	15.04%	6.11%	3.09%	1.37%	0.89%
PHI	74.53%	15.60%	5.31%	3.11%	0.90%	0.55%
PIT	72.73%	14.42%	7.01%	3.37%	1.44%	1.03%
TEX	70.41%	16.40%	7.12%	3.42%	1.33%	1.33%
TBR	72.29%	16.39%	6.74%	2.50%	1.25%	0.83%
BOS	68.14%	15.76%	8.61%	3.92%	1.82%	1.75%
CIN	73.47%	14.39%	6.37%	3.29%	1.10%	1.37%
COL	71.03%	14.69%	6.06%	4.67%	1.67%	1.88%
KCR	74.43%	13.96%	6.50%	2.63%	1.31%	1.17%
DET	71.29%	15.34%	7.00%	4.20%	1.33%	0.84%
MIN	70.77%	17.81%	6.46%	2.86%	1.16%	0.95%
CHW	71.47%	17.23%	6.13%	3.65%	0.90%	0.62%
NYY	72.45%	16.22%	5.94%	3.29%	1.33%	0.77%

At first glance, these numbers are pretty close. But look at this table of EBP’s overestimates, that is, each predicted fraction minus the corresponding actual fraction (negative values are underestimates):

EBP’s overestimates of fractions of innings with particular run totals – 2016
Team	0 R	1 R	2 R	3 R	4 R	5+ R
LAA	1.24%	-1.73%	0.22%	0.53%	-0.08%	-0.18%
HOU	2.05%	-2.73%	-0.66%	0.23%	0.59%	0.53%
OAK	1.00%	-1.61%	0.92%	-0.35%	-0.05%	0.10%
TOR	0.37%	-1.18%	-0.01%	0.40%	0.37%	0.05%
ATL	2.91%	-3.64%	-0.68%	0.62%	0.17%	0.62%
MIL	1.10%	-1.94%	-0.32%	0.43%	0.28%	0.45%
STL	0.23%	-1.21%	1.11%	0.19%	-0.44%	0.12%
CHC	2.14%	-2.46%	-0.64%	0.52%	0.10%	0.34%
ARI	2.32%	-3.57%	0.29%	0.66%	0.46%	-0.16%
LAD	1.43%	-1.81%	0.12%	0.33%	-0.05%	-0.03%
SFG	1.50%	-1.52%	-1.08%	0.80%	0.32%	-0.03%
CLE	2.16%	-2.30%	0.14%	-0.36%	-0.08%	0.44%
SEA	0.49%	-1.57%	1.07%	0.09%	-0.08%	0.01%
MIA/FLA	1.57%	-2.67%	0.45%	-0.13%	0.22%	0.55%
NYM	0.22%	-0.84%	-0.67%	0.53%	0.28%	0.49%
WSN	2.66%	-3.58%	0.79%	0.15%	-0.23%	0.21%
BAL	0.74%	-1.71%	0.33%	0.31%	0.28%	0.05%
SDP	3.14%	-2.82%	0.12%	-0.22%	-0.14%	-0.07%
PHI	2.36%	-3.55%	0.84%	-0.26%	0.33%	0.28%
PIT	1.18%	-1.82%	-0.03%	0.15%	0.23%	0.29%
TEX	2.30%	-3.03%	0.21%	0.22%	0.36%	-0.05%
TBR	1.71%	-3.17%	0.24%	0.81%	0.22%	0.19%
BOS	1.37%	-1.78%	-0.45%	0.41%	0.34%	0.11%
CIN	1.51%	-1.79%	0.31%	-0.07%	0.35%	-0.32%
COL	0.07%	-0.90%	1.70%	-0.71%	0.22%	-0.37%
KCR	1.28%	-1.58%	-0.02%	0.45%	0.06%	-0.20%
DET	0.94%	-1.94%	0.45%	-0.44%	0.44%	0.55%
MIN	3.21%	-4.79%	0.52%	0.52%	0.37%	0.17%
CHW	3.24%	-4.54%	0.63%	-0.38%	0.59%	0.47%
NYY	2.42%	-3.62%	0.78%	-0.03%	0.14%	0.31%
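The overestimates are just the predicted fractions minus the actual ones. Here is a short sketch that recomputes them for three teams, with the rows copied from the two tables above. Because it starts from rounded table values, a result can differ from the overestimate table by 0.01 in the last digit.

```python
# Rows copied from the predicted and actual tables (percent of innings
# with 0, 1, 2, 3, 4 and 5+ runs).
predicted = {
    "LAA": [74.65, 12.57, 6.78, 3.32, 1.53, 1.15],
    "BOS": [69.51, 13.97, 8.16, 4.33, 2.16, 1.86],
    "COL": [71.10, 13.79, 7.76, 3.96, 1.89, 1.51],
}
actual = {
    "LAA": [73.41, 14.31, 6.56, 2.79, 1.61, 1.33],
    "BOS": [68.14, 15.76, 8.61, 3.92, 1.82, 1.75],
    "COL": [71.03, 14.69, 6.06, 4.67, 1.67, 1.88],
}


def overestimates(pred, act):
    """Predicted minus actual, in percentage points, for each team."""
    return {
        team: [round(p - a, 2) for p, a in zip(pred[team], act[team])]
        for team in pred
    }


diffs = overestimates(predicted, actual)
for team, row in diffs.items():
    print(team, row)
```

A positive entry means EBP predicted that run total more often than it actually happened.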

EBP consistently underestimates the number of 1-run innings, while consistently overestimating the number of 0-run innings, and it overestimates the number of multiple-run innings about two-thirds of the time. By considering the simplifying assumptions made in the derivation of EBP, we can speculate why these consistent biases occur. Zero-run innings occur less frequently than EBP predicts because, I speculate, it does not account for run-manufacturing activities, such as stolen bases, bunts, hitting behind the runner, and sacrifices. It also doesn’t account for wild pitches and balks, and some errors. On the other hand, multiple-run innings occur less frequently than EBP predicts because, I speculate, it does not account for the extra outs made on the bases, such as in double plays, caught stealings, and getting thrown out taking an extra base. Such outs comprise between 5% and 7% of all outs made. They bring innings to an end more quickly, and thus will cause fewer runs to be scored than EBP would predict; but since these outs can’t reduce the number of runs scored in innings in which nobody would have scored anyway, and can have at most a one-run impact on innings in which one run would have scored, this omission will selectively have a greater impact on what EBP says would have been multiple-run innings. I’m hoping to do some future research on how to compensate for these simplifying assumptions and arrive at truer predictions.

The formula for this variation is presented and explained over here.
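The actual formula lives on that linked page and is not reproduced here. What follows is only a minimal sketch of the idea for a single L component, under assumptions of my own: the number of base reaches in an inning follows a negative binomial distribution, and a pure-L team strands its first L reachers and scores every reacher after that. How the four components get combined into a real team’s distribution is left out.

```python
from math import comb


def reach_pmf(k, p, opi=3):
    """P(exactly k batters reach base before the opi-th out)."""
    return comb(k + opi - 1, opi - 1) * p ** k * (1 - p) ** opi


def run_distribution(L, p, max_runs=4, opi=3):
    """[P(0 R), P(1 R), ..., P(max_runs R), P(more than max_runs R)]
    for a pure-L team, under the assumptions stated above."""
    # No runs score when L or fewer batters reach base.
    p_zero = sum(reach_pmf(k, p, opi) for k in range(L + 1))
    probs = [p_zero] + [reach_pmf(L + r, p, opi) for r in range(1, max_runs + 1)]
    probs.append(1.0 - sum(probs))  # everything above max_runs
    return probs


for L in range(4):
    row = "  ".join(f"{x:6.2%}" for x in run_distribution(L, 0.320))
    print(f"L={L}: {row}")
```

With `max_runs=4` the columns line up with the 0 R through 5+ R columns of the tables above.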

Predicting left on base rates

Another variation will produce predictions of numbers of runners left on base per inning. I’m not sure of the value in that, but it comes out pretty naturally as part of the derivation. It is not as simple as taking a weighted average of the coefficients of the linear combination, perhaps like this:

Average runners LOB per inning = LOB1 + 2 * LOB2 + 3 * LOB3

That’s because we’d be assuming a certain number of runners got on base in the first place. For example, The Walkers, a team that only ever walks and strikes out, would have LOB0 = 0, LOB1 = 0, LOB2 = 0, and LOB3 = 1. But that doesn’t mean that their average runners LOB per inning = 3 * LOB3 = 3 * 1 = 3. That’s because in some innings they don’t get on base at all, or they get on base fewer than three times. We have to account for those scenarios, but the calculation is similar to the one that produced the runs per inning formulas. I worked out those formulas in November 2025, and hope to publish them on another page on this site soon.
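The published LOB formulas are not shown here, so the following is only a sketch under assumptions of my own. It supposes that in the EBP model every base reacher either scores or is left on base (no outs on the basepaths), and that the expected number of reaches per 3-out inning is 3p/(1-p). Expected LOB for a pure-L component is then expected reaches minus expected runs. The OBP value is a placeholder.

```python
def ebp3(L, p):
    """3-out component formula from the text (expected runs per inning)."""
    q = 1 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def expected_reaches(p, opi=3):
    """Expected base reaches per inning: the mean of a negative binomial."""
    return opi * p / (1 - p)


def expected_lob(L, p):
    """Assumed identity: reaches = runs + LOB, so LOB = reaches - runs."""
    return expected_reaches(p) - ebp3(L, p)


p = 0.320  # hypothetical OBP for The Walkers
naive_lob = 3 * 1  # the weighted-average guess from the text
model_lob = expected_lob(3, p)
print(f"naive: {naive_lob}, sketch: {model_lob:.3f}")
```

The sketch comes in well under 3 because it counts the innings in which fewer than three Walkers reach base.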

Of course, because of the simplifying assumptions used in constructing these formulas, there will be some built-in errors. In the original version of this article, I wondered how large those errors are. Well, now I have an idea, as of November 2025: the estimates (at least when applied to team data from 1972) come out consistently 24% to 34% high. And this is a good thing.

Huh? What? Good? Well yes, I’ll explain. EBP works pretty well when used as a run estimator, even though it ignores base advances that are not the direct result of a base reach by the batter, and also ignores all outs made by baserunners. It considers no steals, advances on a fly out, wild pitches, double plays, caught stealings, runners thrown out trying to take an extra base, etc. For it to give those good results, the effect of the ignored base advances on run production and the effect of the ignored outs on the basepaths must be about equally large, so that they cancel out. And it definitely seems that they do. That works in part because one increases expected runs, and the other decreases them.

But in the case of left-on-base numbers, both of these categories of neglected events will tend to decrease those numbers, so by ignoring them, they combine to cause an overestimate in left-on-base rates.

And here’s why that’s good. I’ve wanted to find a way to create an expanded version of EBP that accounts for these events on the basepaths. I’ve had a few different ideas in that regard. I’d like to do it in a way that can be applied across all on-base percentages. But whether or not it does, it has to work well. Which idea is best? Well, ideas that overestimate both the effects of base advances and OOB (outs on the basepaths) may cancel each other for predicting runs per inning, but they’d tend to make LOB predictions too low. Ideas that underestimate both effects will likewise make LOB predictions too high. Alternatively, ideas that overestimate one of these while underestimating the other may make LOB predictions just right, but will make runs per inning predictions off. The winning adjustment ideas need to make both LOB predictions and runs per inning predictions on target. Testing the LOB predictions will allow for more certainty that I got these adjustments right.

If I get to a good LOB formulation, perhaps it could be used in a measure of the timeliness of a team’s hitting. Just calculate expected LOB rate, and subtract from it the actual LOB rate to get a “timeliness factor” for a team. The higher the better. Or maybe the expression should be reversed, so the lower the better? And would we find that there is any skill involved, or that these timeliness numbers are just random noise? Regardless, I suspect Baseball Savant’s Batting Run Value captures timeliness of hitting in a much more precise way. I’d have to look at the formulations to be sure, but off the top of my head I’d think (Batting Run Value)/xwOBA might give a good individual and team “timeliness factor”. Might be interesting to compare this to the EBP-based version.

Separating what happens at the plate from what happens on the basepaths

The oversimplifying assumptions of Expected Binomial Production may actually be of some use. It’s possible that we can get a good idea about how baserunning events and outs-on-the-bases events affect run production by looking at the differences between EBP’s predictions and actual run production. There is evidence that the separation from the context of those events is cleaner with EBPf used as a run estimator than with other run estimators. For example, FanGraphs has a statistic it calls BsR that estimates how many more runs above or below average a team’s baserunning prowess earned them in comparison to an average team’s baserunning. Intuitively, it would seem that the run estimator that is most devoid of information of the effects of baserunning on scoring would gain the most from the addition of BsR to its predictions. I did these calculations, and they showed EBPf definitely gaining more in its correlation coefficient by the addition of BsR than did any of the run estimators to which I compared it. Those that already had stolen base and caught stealing information as part of their formula actually got worse. Those that did not got better, but none by as much as EBP did. My impression from this result is that EBPf, used as a run estimator, is the estimator that is most devoid of the effects of base advances and outs on the bases. I suspect it may be useful in some circumstances to have that separation.
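The comparison described above can be sketched roughly as follows. This is a guess at the shape of the calculation, not the exact procedure used: it assumes the correlation in question is a plain Pearson coefficient between team run estimates and actual team runs, and that “adding BsR” means adding each team’s BsR to its estimate. The function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_gain(estimated_runs, bsr, actual_runs):
    """Change in correlation with actual runs after adding BsR
    to each team's estimated runs."""
    before = pearson_r(estimated_runs, actual_runs)
    adjusted = [est + b for est, b in zip(estimated_runs, bsr)]
    after = pearson_r(adjusted, actual_runs)
    return after - before
```

You would call `correlation_gain` once per run estimator, with the same BsR and actual-run lists each time, and compare the gains.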

Wrapping Up

I welcome your ideas and comments. On Twitter I am @tomisphere.

Nearly all data used in the calculations described in this article came from Baseball-Reference.com, with a small amount of additional data coming from FanGraphs. In particular, calculating EBPt would have been impossible without the base-advance numbers provided by Baseball-Reference.com. I owe them my extreme gratitude.

The Baseballsphere Blog

The (sometimes mathematical) baseball thoughts, analysis, and ideas of Tom McIntyre
