Derivation of Extended Binomial Production left-on-base fractions

A previous article provided a full derivation of the four component formulas of Extended Binomial Production (EBP), those being

EBP_{IN}(L,p)\ =\ \dfrac{p^{L+1}}{1-p}\  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]

Setting L = 0, 1, 2, or 3 gives the four component formulas. Each one provides a runs per inning expectation for a team that must always get L base reaches in an inning before it starts scoring.

We also have two variations, each providing four component formulas of its own: one variation gives expected runs per plate appearance, and the other gives expected runs per base reach, as follows:

EBP_{PA}(L,p)\ =\ \dfrac{p^{L+1}}{3}\  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]

EBP_{BR}(L,p)\ =\ \dfrac{p^L}{3}\  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]
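As a sanity check on the three variants, here is a minimal Python sketch. It assumes (our reading of the binomial model, not restated in this article) that each plate appearance independently reaches base with probability p, so the number of base reaches B before the third out follows a negative binomial distribution, and that an inning scores max(0, B - L) runs. Under that assumption, the closed-form per-inning formula can be compared against a brute-force sum. The function names are illustrative only.

```python
import math


def bracket(L: int, p: float) -> float:
    """The bracketed polynomial shared by all three EBP variants."""
    q = 1.0 - p
    return 3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2


def ebp_in(L: int, p: float) -> float:
    """Expected runs per inning."""
    return p ** (L + 1) / (1.0 - p) * bracket(L, p)


def ebp_pa(L: int, p: float) -> float:
    """Expected runs per plate appearance."""
    return p ** (L + 1) / 3.0 * bracket(L, p)


def ebp_br(L: int, p: float) -> float:
    """Expected runs per base reach."""
    return p ** L / 3.0 * bracket(L, p)


def ebp_in_direct(L: int, p: float, kmax: int = 400) -> float:
    """Brute-force runs per inning under the stated assumption:
    P(B = k) = C(k+2, 2) * p^k * (1-p)^3, runs = max(0, k - L).
    Innings with k <= L contribute nothing, so the sum starts at L + 1.
    The tail beyond kmax is negligible for realistic on-base rates."""
    q = 1.0 - p
    return sum((k - L) * math.comb(k + 2, 2) * p ** k * q ** 3
               for k in range(L + 1, kmax))


if __name__ == "__main__":
    p = 0.330  # an illustrative on-base rate, not tied to any team
    for L in range(4):
        print(L, ebp_in(L, p), ebp_in_direct(L, p), ebp_pa(L, p), ebp_br(L, p))
```

Under the same assumption, an inning has 3/(1-p) plate appearances and 3p/(1-p) base reaches on average, which is why the PA and BR variants are the per-inning formula multiplied by (1-p)/3 and (1-p)/(3p), respectively.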

As explained in the introductory article to EBP, to model real teams we must take our four component formulas and add together fractions of each of them. We’re essentially taking a weighted average of the four formulas, with the fractions as the weights. For these fractions, we use the percentage of innings in which we expect the team to leave L=0, 1, 2, or 3 runners on base, and we estimate each percentage with its own formula. Some of these formulas are extremely simple; others are very complex.

So how do we come up with these formulas for left-on-base rates?

We start with the last base reach, and work backwards.

It’s easy for the L=0 case, in which no runners are left on base. Because our world has no base advances except on a base reach, the only way to leave no runners on the bases at the end of an inning is for the last base reach to be a home run. We therefore want to estimate the fraction of innings whose last base reach is a home run. A good assumption is that this happens just as often as any base reach for that team is a home run. So as the coefficient for the L=0 formula we just use the fraction of all base reaches that are home runs.

“Hang on a moment”, you may say, “what about innings in which no batter reaches base? You’re using the fraction of base reaches that are home runs, but some of your innings don’t have a base reach. You’re taking a fraction of the wrong thing!”

It’s a good point. But we can neglect innings that don’t have a base reach: they produce no runs, so our formulas don’t count them. Our formulas are basically a sum of probabilities of run-scoring innings, weighted by the number of runs they produce, and innings that produce no runs don’t factor into that. For purposes of determining these coefficients, we can therefore always assume at least one base reach for the L=0 formula, two for the L=1 formula, and so on.

From here, it will be useful to create some variables representing how frequently each of the different types of base reaches occur. Here’s what we’ll use:

  fHR = fraction of all base reaches that are home runs
  f3B = fraction of all base reaches that are triples
  f2B = fraction of all base reaches that are doubles
  f1B = fraction of all base reaches that are singles or reaches on error (ROE)
  fFP = fraction of all base reaches that are either a walk, a hit-by-pitch, or catcher’s interference
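To turn these definitions into numbers you need a team’s season totals. The sketch below computes the five fractions; the `BaseReaches` field names and the counts in the example are hypothetical, made up for illustration rather than taken from any real team.

```python
from dataclasses import dataclass


@dataclass
class BaseReaches:
    """Season totals for one team (hypothetical field names)."""
    hr: int
    triples: int
    doubles: int
    singles: int
    roe: int   # reached on error, grouped with singles
    bb: int
    hbp: int
    ci: int    # catcher's interference, grouped with walks


def reach_fractions(t: BaseReaches) -> dict:
    """Fractions of all base reaches by type; the five values sum to 1."""
    singles_like = t.singles + t.roe
    free_passes = t.bb + t.hbp + t.ci
    total = t.hr + t.triples + t.doubles + singles_like + free_passes
    return {
        "fHR": t.hr / total,
        "f3B": t.triples / total,
        "f2B": t.doubles / total,
        "f1B": singles_like / total,
        "fFP": free_passes / total,
    }


if __name__ == "__main__":
    team = BaseReaches(hr=200, triples=20, doubles=280, singles=880,
                       roe=60, bb=500, hbp=58, ci=2)
    print(reach_fractions(team))
```

These made-up totals add up to 2,000 base reaches, giving fHR = 0.10, f3B = 0.01, f2B = 0.14, f1B = 0.47, and fFP = 0.28.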

Also, let LOB0, LOB1, LOB2, and LOB3 represent the fractions of innings in which there end up being 0, 1, 2, and 3 runners left on base, respectively.

So what we just figured out above is that

LOB0 = fHR.

Nice and simple!

Before we go on to LOB1, note that our final linear combination will be

EBP(p) = LOB0*EBP(0,p) + LOB1*EBP(1,p) + LOB2*EBP(2,p) + LOB3*EBP(3,p)

where the four EBP(L,p) are one of the three previously derived sets of component formulas (per inning, per plate appearance, or per base reach).
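The linear combination is just a weighted sum. Here is a minimal sketch using the per-inning components; the LOB weights in the example are placeholders, not estimates for any real team, and the check that the weights sum to 1 is our own addition, reflecting the weighted-average interpretation.

```python
import math
from typing import Callable, Sequence


def ebp_in(L: int, p: float) -> float:
    """Per-inning component formula EBP_IN(L, p)."""
    q = 1.0 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


def ebp(p: float, lob: Sequence[float],
        component: Callable[[int, float], float] = ebp_in) -> float:
    """LOB0*EBP(0,p) + LOB1*EBP(1,p) + LOB2*EBP(2,p) + LOB3*EBP(3,p)."""
    if len(lob) != 4:
        raise ValueError("expected four fractions: LOB0, LOB1, LOB2, LOB3")
    if not math.isclose(sum(lob), 1.0, abs_tol=1e-9):
        raise ValueError("LOB fractions should sum to 1 for a weighted average")
    return sum(weight * component(L, p) for L, weight in enumerate(lob))


if __name__ == "__main__":
    # Placeholder weights, for illustration only.
    print(ebp(0.330, (0.10, 0.30, 0.35, 0.25)))
```

Passing a different `component` function (a per-PA or per-base-reach version) switches the result to that set of units without changing the weights.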

Now on to figuring out LOB1, the fraction of innings that end with one runner on base. What sequences of base reaches can lead to this?

Finishing with a home run won’t do; we already used that for LOB0.

What if the last base reach is a triple? That’s perfect. It clears off whoever else may be on the bases and leaves one runner on base. The fraction of innings whose last base reach is a triple is f3B, so we have a portion of LOB1:

LOB1 = f3B + ?

The ? represents innings whose last base reach is some lesser hit, and that leave one runner on.

What if the last base reach is a double? If the double is preceded by a home run, triple, or another double, then we’ll end with one runner on. But if it was preceded by a single or some form of free pass, we might end up with runners on second and third, which is two runners on. It depends on whether the runner on first takes an extra base or not. So let’s call f14D the fraction of times that a runner goes from first to home on a double, which we take to be the probability that any particular runner on the team does the same. The probability of the last base reach being a double is f2B. The probability that the runner from the second-to-last base reach is no longer on base after that double is fHR+f3B+f2B+(f1B+fFP)(f14D). So putting it all together, we now have

LOB1 = f3B + (f2B)[fHR+f3B+f2B+(f1B+fFP)(f14D)] + ?

We can simplify this a little bit by noting that fHR+f3B+f2B+f1B+fFP = 1, and rewrite this as

LOB1 = f3B + (f2B)(1 - (fFP+f1B)(1-f14D)) + ?
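The simplification is easy to confirm numerically. This sketch draws random base-reach mixes whose fractions sum to 1 (random inputs purely for testing, not realistic values) and compares the two forms of the bracketed double term.

```python
import random

KEYS = ("fHR", "f3B", "f2B", "f1B", "fFP")


def random_fractions(rng: random.Random) -> dict:
    """Random base-reach mix whose five fractions sum to 1."""
    weights = [rng.random() for _ in KEYS]
    total = sum(weights)
    return {k: w / total for k, w in zip(KEYS, weights)}


def cleared_long(f: dict, f14D: float) -> float:
    """fHR + f3B + f2B + (f1B + fFP) * f14D, as first written."""
    return f["fHR"] + f["f3B"] + f["f2B"] + (f["f1B"] + f["fFP"]) * f14D


def cleared_short(f: dict, f14D: float) -> float:
    """1 - (fFP + f1B) * (1 - f14D), using the fact that the fractions sum to 1."""
    return 1 - (f["fFP"] + f["f1B"]) * (1 - f14D)


rng = random.Random(2024)
worst = max(abs(cleared_long(f, d) - cleared_short(f, d))
            for f, d in ((random_fractions(rng), rng.random()) for _ in range(10_000)))
print(f"largest difference: {worst:.1e}")
```

Any difference that shows up is floating-point rounding, since the two expressions are algebraically identical whenever the five fractions sum to 1.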

And what about when the runner on first only goes to third? We can add that probability to LOB2, so let’s get that expression started:

LOB2 = (f2B)(f1B+fFP)(1-f14D) + ?
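Every inning that ends on a double now lands in either LOB1 or LOB2, so the two double terms should add back up to f2B. Here is a small sketch of that split; `double_last_split` is a hypothetical helper name, and the input values (f14D in particular) are made up for illustration.

```python
def double_last_split(f2B: float, f1B: float, fFP: float, f14D: float) -> tuple:
    """Split the innings that end on a double between LOB1 and LOB2."""
    trailing = f1B + fFP                          # previous reach left a runner on first
    to_lob1 = f2B * (1 - trailing * (1 - f14D))   # no runner remains besides the batter
    to_lob2 = f2B * trailing * (1 - f14D)         # runner from first stopped at third
    return to_lob1, to_lob2


lob1_part, lob2_part = double_last_split(f2B=0.14, f1B=0.47, fFP=0.28, f14D=0.40)
print(lob1_part, lob2_part, lob1_part + lob2_part)
```

With these inputs, 0.077 goes to LOB1 and 0.063 goes to LOB2, which together make up the 0.14 of innings that end on a double.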

Things will get even more complicated when the last base reach of the inning is a single, ROE, or some form of free pass.

Now it’s your turn

What we’ve done so far should give you the gist of how to proceed with deriving the rest of these coefficient formulas, if you’d really like to. If not, just know that there’s nothing to the rest of this derivation that we haven’t already done to this point.

For LOB1, we end up with:

LOB1 = f3B + (f2B)(fHR+f3B+f2B+(f1B+fFP)(f14D)) + (f1B)(fHR+f3B+f2B*f24S) + (fFP)(fHR)

… where f24S is the frequency with which runners go from second base to home on a single.
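Transcribing LOB1 into code makes its structure easier to see: there is one term for each possible last base reach. The base-reach mix and the running rates below are hypothetical, chosen only to exercise the formula.

```python
def lob1(f: dict, f14D: float, f24S: float) -> float:
    """Fraction of innings ending with exactly one runner on base."""
    H, T, D, S, W = f["fHR"], f["f3B"], f["f2B"], f["f1B"], f["fFP"]
    last_triple = T                                  # a triple clears everyone else
    last_double = D * (H + T + D + (S + W) * f14D)   # runner from previous reach (if any) scores
    last_single = S * (H + T + D * f24S)             # runner from previous reach (if any) scores
    last_free_pass = W * H                           # only if the previous reach was a home run
    return last_triple + last_double + last_single + last_free_pass


mix = {"fHR": 0.10, "f3B": 0.01, "f2B": 0.14, "f1B": 0.47, "fFP": 0.28}
print(lob1(mix, f14D=0.40, f24S=0.60))
```

For this made-up mix, LOB1 comes out to about 0.206, meaning roughly one inning in five would end with a single runner stranded.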

Not “nice and simple”, like before with LOB0. I hope this doesn’t get too much worse …

For LOB2, we end up with:

LOB2 = (f2B)(f1B+fFP)(1-f14D) + (f1B)(f2B)(1-f24S) + (f1B)(f1B) - (f1B)(f1B)(fFP+f1B)(1-f13S)(1-f24S) + 2(f1B)(fFP)(fHR+f3B) + (f1B)(fFP)(fFP+f1B+2(f2B))(f24S) + (fFP)(f2B+f3B) - (fFP)(f2B)(1-f14D)(f1B+fFP) + (fFP)(fFP)(fHR)

It got much worse. (Here f13S is the frequency with which runners go from first base to third on a single.) A couple of the terms could be combined, but I left them separate for clarity. Maybe it will be harder to leave the bases loaded:

LOB3 = (f1B)(f1B)(fFP+f1B)(1-f13S)(1-f24S) + (f1B)(fFP)(1-f24S)(f1B+fFP+2(f2B)) + (fFP)(f2B)(1-f14D)(f1B+fFP) + (fFP)(f1B+fFP)(f1B+fFP) + (fFP)(fFP)(f2B+f3B)

Yes, not as many ways to leave the bases loaded. Not quite as bad.
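To put all four fractions together, here is a sketch that transcribes LOB0 through LOB3, taking f13S to be the rate at which runners go from first to third on a single. Because every inning with a base reach ends with exactly 0, 1, 2, or 3 runners left on, the four fractions should sum to 1. Expanding the formulas shows that they do for any base-reach mix and any running rates, and the random check below confirms this numerically. The helper names are hypothetical.

```python
import random

KEYS = ("fHR", "f3B", "f2B", "f1B", "fFP")


def lob_fractions(f: dict, f14D: float, f24S: float, f13S: float) -> tuple:
    """Return (LOB0, LOB1, LOB2, LOB3) from the base-reach mix and running rates."""
    H, T, D, S, W = (f[k] for k in KEYS)
    lob0 = H
    lob1 = T + D * (H + T + D + (S + W) * f14D) + S * (H + T + D * f24S) + W * H
    lob2 = (D * (S + W) * (1 - f14D)
            + S * D * (1 - f24S)
            + S * S - S * S * (W + S) * (1 - f13S) * (1 - f24S)
            + 2 * S * W * (H + T)
            + S * W * (W + S + 2 * D) * f24S
            + W * (D + T) - W * D * (1 - f14D) * (S + W)
            + W * W * H)
    lob3 = (S * S * (W + S) * (1 - f13S) * (1 - f24S)
            + S * W * (1 - f24S) * (S + W + 2 * D)
            + W * D * (1 - f14D) * (S + W)
            + W * (S + W) ** 2
            + W * W * (D + T))
    return lob0, lob1, lob2, lob3


def random_fractions(rng: random.Random) -> dict:
    """Random base-reach mix whose five fractions sum to 1 (test input only)."""
    weights = [rng.random() for _ in KEYS]
    total = sum(weights)
    return {k: w / total for k, w in zip(KEYS, weights)}


rng = random.Random(7)
for _ in range(10_000):
    lobs = lob_fractions(random_fractions(rng), rng.random(), rng.random(), rng.random())
    assert all(x > -1e-12 for x in lobs)   # non-negative, up to rounding
    assert abs(sum(lobs) - 1) < 1e-12      # the four fractions always sum to 1
print("LOB0..LOB3 were non-negative and summed to 1 on every random trial")
```

The resulting tuple can be passed directly as the weights of the linear combination EBP(p) = LOB0*EBP(0,p) + ... + LOB3*EBP(3,p).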