The meaning of each part of the EBP formulas

In the introduction to Expected Binomial Production, and in the full derivation of Expected Binomial Production, we saw the following formula for the dependence of a team’s expected runs per inning on the probability p that each batter reaches base:

Expected\ runs\ per\ inning\ =\ \dfrac{p^{L+1}}{1-p}  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]

Here, L is the number of runners left on base in every inning in which the team scores (real teams leave different numbers of runners on base in different innings, so we’re talking about some fictional, idealized teams here).

This may look quite complicated, but if we break it down into three factors and examine each, we’ll be able to attach meanings to each part, in such a way that the entire formula makes sense.

The breakdown, with an overview of meanings

By making a slight change, we can write this as

EBP(L,p) = \dfrac{p}{1-p} p^L [3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2]

This can be broken into three factors that have distinct meanings:

\dfrac{p}{1-p}

p^L

3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2

The first portion, \tfrac{p}{1-p}, is a fraction that represents base reaches per out made. For a team that only hits home runs (The Homers), it is equal to runs scored per out.

The second portion, p^L, is a threshold penalty. For a team whose first L base reaches in an inning produce no runs, this is the factor by which scoring is reduced. It may be better in the end, however, to think of this as a left-on-base penalty, where L is the number of runners left on base at the end of an inning.

The third portion, 3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2, could be called an “outs per inning multiplier”. It is best made sense of by comparing to a game of baseball in which there is only one out per inning instead of three. A team that only hits home runs, and therefore never leaves runners on base, will score the same number of runs in three one-out innings as it will in one three-out inning that has the same pattern of base reaches and outs. By contrast, a team that hits only doubles (L=1), singles (L=2), or walks (L=3) will often score more in one three-out inning than it will in three equivalent one-out innings. The reason is that some of the runners who would have been left on base in the first two one-out innings might now score in the one three-out inning. So in this portion, the first term of “3” represents the tripling in the number of outs per inning; the 2L(1-p) and \tfrac{1}{2}L(L+1)(1-p)^2 terms represent those runners who would have been left on base in three one-out innings but score in the one three-out inning.

So in summary, the EBP formula boils down to

EBP(L,p) = (Base reaches per out) (Threshold penalty) (Outs per inning + Factor for runners who scored but would have been left on base in a one-out inning)
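As a rough sketch of this breakdown, the three factors can be computed separately and multiplied together in a few lines of Python. The function name `ebp`, the variable names, and the sample value p = 0.5 are my own choices for illustration; they do not come from the derivation itself.

```python
def ebp(L, p):
    """Expected runs per three-out inning for an idealized team.

    L: runners left on base in every scoring inning (0 to 3 for the four teams).
    p: probability that a batter reaches base, with 0 <= p < 1.
    """
    q = 1 - p                                   # probability of an out
    base_reaches_per_out = p / q                # first factor
    threshold_penalty = p ** L                  # second factor
    outs_per_inning_multiplier = 3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2
    return base_reaches_per_out * threshold_penalty * outs_per_inning_multiplier


# p = 0.5 is an arbitrary illustrative value, not a realistic on-base rate.
for name, L in [("Homers", 0), ("Doublers", 1), ("Singlers", 2), ("Walkers", 3)]:
    print(f"The {name:9s} L={L}  EBP={ebp(L, 0.5):.4f}")
```

Keeping the factors as separate variables makes it easy to print each one and see which of them is responsible for a change in the total.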

A closer look at the base reaches per out factor

The first factor, \tfrac{p}{1-p}, I have referred to as base reaches per out. Let’s see why. We can approximate p, the probability of reaching base in a plate appearance, by actual total base reaches divided by actual total plate appearances:
p ~ BR/PA

In our fictional ideal universe, all outs are made at the plate, so outs are the same thing as plate appearances that don’t end in a base reach. Therefore
Outs = PA – BR

But
1-p ~ 1 – BR/PA = PA/PA – BR/PA = (PA-BR)/PA = Outs/PA

To get \frac{p}{1-p} we just divide one of these expressions by the other:
\dfrac{p}{1-p} \approx \dfrac{BR/PA}{Outs/PA} = \dfrac{BR}{Outs}.
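A quick numerical check of this identity, as a minimal sketch: the season totals below are hypothetical numbers chosen only for illustration, not data from any real team.

```python
# Hypothetical season totals, chosen only for illustration.
plate_appearances = 6000
base_reaches = 1950

# In the idealized universe, every plate appearance that is not a
# base reach is an out made at the plate.
outs = plate_appearances - base_reaches

p = base_reaches / plate_appearances  # estimate of p

print(p / (1 - p))          # first factor, computed from p
print(base_reaches / outs)  # base reaches per out, computed directly
```

The two printed values agree up to floating-point rounding, which is all the algebra above is saying.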

Okay, so the first factor represents base reaches per out. Does it make intuitive sense that this relates to runs per inning? Think of each base reach as progress toward a run, and each out as progress toward the end of an inning. Then base reaches per out is like portions of runs divided by portions of innings. It then makes sense that there would be some proportionality between them.

But baseball isn’t as straightforward as other sports, such as basketball. In basketball, every successful basket brings you points. In baseball, many successful plate appearances lead to no points. We must account for that.

A closer look at the threshold penalty factor

The second factor, p^L, is what accounts for that. I refer to it as the threshold penalty factor, because it represents a threshold of base reaches a team must reach before it starts getting runs in an inning.

Note that for The Homers, who only hit home runs, there is no threshold penalty. Every base reach produces a run.

Let’s compare them to The Walkers, who only get walks (or strikeouts). Their first three base reaches in an inning will load the bases without scoring any runs. On their fourth base reach, and on every base reach after, they score a run. The Walkers therefore have a threshold of 3 base reaches (L=3). Their first three base reaches get them to the threshold of scoring, and only then do they start scoring runs.

So why does this reduce their scoring by p^L?

Before we get into that, let’s have a look at a result from the full derivation of Expected Binomial Production. There I showed that in a game of baseball that is played with OPI outs per inning (so OPI usually equals 3), the expected runs per inning is given by

Expected\ runs\ per\ inning\ =\ \dfrac{p^{L+1}}{1-p} \sum_{i=0}^{OPI-1} [(OPI-i)\binom{L+i-1}{L-1}(1-p)^i]

When we put OPI=3 into this formula, we get the formula we’ve already seen. But if we look at one-out games by setting OPI=1, the first two factors stay the same, but the third factor goes away:
Expected\ runs\ per\ one-out\ inning\ =\ \dfrac{p}{1-p} p^L

So we should be able to demonstrate the threshold penalty factor by looking at one-out innings, without the additional complications you get with three-out innings (one of which would be having to explain the third factor at the same time).
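The general OPI formula can be sketched in Python to confirm both reductions numerically. The function names are hypothetical. One assumption needs flagging: for L=0 the binomial coefficient in the sum has a negative lower index, and I take it to be 1 for i=0 and 0 otherwise, so that the sum for The Homers is just OPI.

```python
from math import comb


def expected_runs(L, p, opi=3):
    """Expected runs per inning with `opi` outs per inning (general sum)."""
    q = 1 - p
    total = 0.0
    for i in range(opi):
        if L == 0:
            # Assumed convention for the L=0 case: only the i=0 term survives.
            coeff = 1 if i == 0 else 0
        else:
            coeff = comb(L + i - 1, L - 1)
        total += (opi - i) * coeff * q ** i
    return p ** (L + 1) / q * total


def closed_form_three_outs(L, p):
    """The three-out formula quoted at the top of the post."""
    q = 1 - p
    return p ** (L + 1) / q * (3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2)


p = 0.35  # arbitrary illustrative value
for L in range(4):
    # OPI=3 reproduces the closed form; OPI=1 leaves only the first two factors.
    assert abs(expected_runs(L, p, opi=3) - closed_form_three_outs(L, p)) < 1e-12
    assert abs(expected_runs(L, p, opi=1) - p / (1 - p) * p ** L) < 1e-12
print("OPI=3 and OPI=1 reductions agree for L = 0..3")
```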

In our simplified universe, a one-out inning consists of a string of base reaches followed by one out. For example, if we represent base reaches for The Homers with an H, and outs with an O, they could have the following one-out innings:
O (0 runs)
HO (1 run)
HHO (2 runs)
HHHO (3 runs)

and so on.

For The Walkers, if we represent base reaches with a W, they could have the following innings:
O (0 runs)
WO (0 runs)
WWO (0 runs)
WWWO (0 runs)
WWWWO (1 run)
WWWWWO (2 runs)
WWWWWWO (3 runs)

and so on.

We can disregard all innings that produce 0 runs, as they add 0 to our expected average runs per inning. By doing that, we can line up all the remaining innings by how many runs they produce.
1 run:   WWWWO,   HO
2 runs:  WWWWWO,  HHO
3 runs:  WWWWWWO, HHHO

and so on.

Now notice that for each number of runs scored, The Walkers have three base reaches more than The Homers. As we’ve discussed before, that’s their threshold penalty, their first three base reaches of the inning. Notice that, in terms of base reaches and outs, if you cross off those first three base reaches of each inning for The Walkers, then their innings look exactly like The Homers’ innings.

What are the probabilities of each of these innings occurring? With p as the probability of a base reach, and (1-p) the probability of an out, we just take the product of these probabilities:
1 run:     pppp(1-p) for The Walkers     p(1-p) for The Homers
2 runs:   ppppp(1-p) for The Walkers    pp(1-p) for The Homers
3 runs:  pppppp(1-p) for The Walkers   ppp(1-p) for The Homers

and so on.

For each number of runs, the probability of The Walkers reaching that run total is precisely ppp, or p^3, times the probability of The Homers reaching that run total. Check each line and you’ll see this. And p^3 is just the probability of the first three batters reaching base in the inning – in other words, the probability of The Walkers reaching their threshold. For The Singlers that threshold probability is p^2, and for The Doublers, p.

Once over the threshold, The Walkers score just like The Homers do. In the sum of probabilities times runs that gives us our expected runs on the inning, the p^3 factors out of the sum, and the sum becomes \tfrac{p}{1-p}.
So that’s how we get p^L as the threshold penalty. It’s the probability that The Walkers or Singlers or Doublers reach the point of scoring runs in the inning.
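The sum of probabilities times runs for one-out innings can also be added up directly. This is a minimal sketch: the function name and the cutoff `max_reaches` are assumptions for illustration, with the cutoff standing in for "and so on" since the remaining terms are negligibly small.

```python
def one_out_expected_runs(L, p, max_reaches=500):
    """Sum (runs scored) x (probability) over one-out innings.

    An inning with k base reaches followed by one out has probability
    p**k * (1 - p) and scores max(k - L, 0) runs.
    """
    q = 1 - p
    return sum(max(k - L, 0) * p ** k * q for k in range(max_reaches + 1))


p = 0.35  # arbitrary illustrative value
homers = one_out_expected_runs(0, p)
for name, L in [("Doublers", 1), ("Singlers", 2), ("Walkers", 3)]:
    team = one_out_expected_runs(L, p)
    # The ratio to The Homers should be the threshold penalty p**L.
    print(f"The {name}: ratio to The Homers = {team / homers:.6f}, p^L = {p ** L:.6f}")
```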

A closer look at the outs per inning multiplier

We just had a glimpse at how run production works when baseball is played with one-out innings. What happens when we move from one-out innings to three-out innings? We saw this already in the last section by comparing what the Outs Per Inning (or OPI) formula gives us for one-out innings (OPI=1):

Expected\ runs\ per\ inning\ =\ \dfrac{p^{L+1}}{1-p}

… and for three-out innings (OPI=3):

Expected\ runs\ per\ inning\ =\ \dfrac{p^{L+1}}{1-p}  [3 + 2L(1-p) + \frac{1}{2}L(L+1)(1-p)^2]

In the jump from one-out to three-out innings, run production is multiplied by the third factor in our formula:

3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2

Just from that, we can see that the name “outs per inning multiplier” is appropriate for this factor. But let’s look closer to see if we can ascribe meaning to the parts of this multiplier.
Let’s start by listing what this looks like for each of the four fictional teams:
The Homers (L=0): 3
The Doublers (L=1): 3 + 2(1-p) + (1-p)^2
The Singlers (L=2): 3 + 4(1-p) + 3(1-p)^2
The Walkers (L=3): 3 + 6(1-p) + 6(1-p)^2
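As a small sanity check, the four expansions listed above can be compared with the general multiplier in Python. The names `multiplier` and `listed` and the sample value of p are hypothetical choices for illustration.

```python
def multiplier(L, p):
    """Outs per inning multiplier: the third factor of the EBP formula."""
    q = 1 - p
    return 3 + 2 * L * q + 0.5 * L * (L + 1) * q ** 2


# The per-team expansions from the list above, written in terms of q = 1 - p.
listed = {
    "Homers":   (0, lambda q: 3),
    "Doublers": (1, lambda q: 3 + 2 * q + q ** 2),
    "Singlers": (2, lambda q: 3 + 4 * q + 3 * q ** 2),
    "Walkers":  (3, lambda q: 3 + 6 * q + 6 * q ** 2),
}

p = 0.35  # arbitrary illustrative value
for team, (L, expansion) in listed.items():
    general = multiplier(L, p)
    assert abs(general - expansion(1 - p)) < 1e-12
    print(f"The {team} (L={L}): multiplier = {general:.4f}")
```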

Things look simplest for The Homers, and indeed they do make the easiest place to start our analysis. For The Homers, tripling the number of outs in an inning triples the expected number of runs scored per inning. Why is that?
Let’s suppose The Homers play three one-out innings that go like this:
HHHHO HHO HO

Each H represents a home run, and each O represents an out. Over these three one-out innings, they score seven runs.

What if this same sequence had played out as a single, three-out inning? They would still hit seven home runs, so they would still score seven runs. There would never be any runners left on base. If you think about it, for The Homers, it doesn’t really matter when the outs happen, and when the base reaches happen. It doesn’t matter if you stop innings after one out, or after three. Their number of runs scored per out made will always be the same. So the expected number of runs in one three-out inning is the same as the expected number of runs in three one-out innings. And therefore, the formula for the expected number of runs in one three-out inning is three times the formula for their expected number of runs in a one-out inning.

Now let’s consider the same sequence of base reaches and outs for The Walkers, this time using W to represent the walks:
WWWWO WWO WO

In the first one-out inning, they score one run and leave the bases loaded. In the second, they score none and leave two. In the third, they score none and leave one. Seven base reaches and only one run – life for The Walkers sure can be frustrating.

But we can make it less frustrating for them by playing three-out innings instead of one-out innings. Now all three of the runners who were left on base in the first inning of the one-out innings game score in the three-out inning. Now they score four while stranding three, instead of scoring one and stranding six. A much better runs-to-base-reaches ratio.
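The bookkeeping in this example can be sketched in Python. The function `score` is hypothetical, and it relies on the idealized rule that an inning with n base reaches scores n minus L runs (or none if n is at most L) and strands the rest. It also assumes the sequence ends exactly at the close of an inning.

```python
def score(sequence, L, outs_per_inning):
    """Return (runs, stranded) for a string of base reaches and outs.

    'O' is an out; any other letter is a base reach. Spaces are ignored,
    so the same string can be replayed with a different inning length.
    """
    runs = stranded = 0
    reaches = outs = 0
    for event in sequence.replace(" ", ""):
        if event == "O":
            outs += 1
            if outs == outs_per_inning:
                # Inning over: the first L base reaches are left on base.
                runs += max(reaches - L, 0)
                stranded += min(reaches, L)
                reaches = outs = 0
        else:
            reaches += 1
    return runs, stranded


walkers = "WWWWO WWO WO"
print(score(walkers, L=3, outs_per_inning=1))  # (1, 6): one run, six stranded
print(score(walkers, L=3, outs_per_inning=3))  # (4, 3): four runs, three stranded

homers = "HHHHO HHO HO"
print(score(homers, L=0, outs_per_inning=1))   # (7, 0)
print(score(homers, L=0, outs_per_inning=3))   # (7, 0)
```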

The takeaway here is that we can’t just multiply their expected runs in one-out innings by three to get their expected runs in three-out innings. We have to multiply by more than three, to account for the fact that some of the runners who would have been left on base over three one-out innings now score in the one three-out inning. That’s where the extra terms in the third factor of their three-out inning formula come from. Notice that because 1-p always has a value between 0 and 1, those extra terms are never negative, and they are strictly positive whenever L is at least 1 and p is less than 1. That’s because the only change that happens when you switch to three-out innings is that some runners who would previously have been left on base now score.

Notice also that those extra terms disappear (go to zero) when p gets close to 1. That’s because when p goes to 1, almost every plate appearance ends up as a base reach, and so almost everyone who gets on base scores. Three-out innings, in that case, behave a lot like three one-out innings.

Can we actually derive these extra terms, just by counting up the number of baserunners who are no longer left on base? We can indeed, and now we will. However, this will be much easier to demonstrate for the switch from one-out innings to two-out innings, so that’s what we’ll do here. The line of reasoning is similar for the one-to-three-out inning switch, but more complicated to do in that case.
Let’s plug OPI=2 into our previous formula to get the expected runs formulas for two-out innings:

Expected runs per inning for two-out innings = \frac{p^{L+1}}{1-p}[2 + L(1-p)]
Expected runs per inning for The Homers (L=0) = \frac{p}{1-p}(2)
Expected runs per inning for The Doublers (L=1) = \frac{p^2}{1-p}(2 + (1-p))
Expected runs per inning for The Singlers (L=2) = \frac{p^3}{1-p}(2 + 2(1-p))
Expected runs per inning for The Walkers (L=3) = \frac{p^4}{1-p}(2 + 3(1-p))

Now let’s consider The Doublers. In each of their two one-out innings, they may leave up to one runner on base. If they leave nobody on base in either one-out inning, then those are both innings with just one plate appearance, that one plate appearance being an out. The equivalent two-out inning consists of just a pair of outs. In both ways of going about it, no runs score, and no runners are left on base, so the switch from one-out to two-out innings makes no difference in this case.

Now let’s consider the case in which only one of the innings has a runner left on base. Then the number of runners left on base in the two-out inning with the same sequence of base reaches and outs is also one. So in this case, too, there is no difference in the overall number of runners left on base.

The only difference in runners left on base comes when each one-out inning has a runner left on. Then we have two runners left on over the two one-out innings, but only one runner left on in the one two-out inning. In those situations, because one fewer runner is left on base, one more runner scores in the two-out inning than in the two one-out innings.

So we just need to figure out how frequently those situations happen, and multiply that frequency by the number of extra runs scored (1) to get the amount we add to our formula. The fraction of one-out innings with a baserunner left on, for The Doublers, is just p; that’s the probability that the first batter of the inning gets on base. So the probability of having baserunners in each of two consecutive one-out innings is pp = p^2. That is therefore the fraction of all two-out innings in which we expect to get an extra run. Multiplying by the number of extra runs scored in that situation (1) gives p^2 as the additional expected runs for the two-out inning.

Given that the expected number of runs for two one-out innings is

\dfrac{p^2}{1-p}(2),

then the expected number of runs for one two-out inning must be this plus p^2:
\dfrac{p^2}{1-p}(2) + p^2

A little algebra converts this into the formula we showed above:
\dfrac{p^2}{1-p}(2) + p^2 = \dfrac{p^2}{1-p}(2) + p^2\dfrac{1-p}{1-p} = \dfrac{p^2}{1-p}(2 + (1-p))

The same reasoning can be used to derive any of the other formulas for 2-out innings or 3-out innings.
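The two-out result can be cross-checked by adding up innings directly instead of counting stranded runners. This is a minimal sketch under two assumptions of mine: an inning with n base reaches scores max(n - L, 0) runs, and the number of base reaches before the last out follows the standard negative binomial distribution. The function name and the cutoff `max_reaches` are hypothetical.

```python
from math import comb


def expected_runs_by_counting(L, p, opi, max_reaches=500):
    """Sum runs x probability over the number of base reaches in an inning.

    The probability of exactly n base reaches before the opi-th out is
    comb(n + opi - 1, n) * p**n * (1 - p)**opi.
    """
    q = 1 - p
    total = 0.0
    for n in range(max_reaches + 1):
        prob = comb(n + opi - 1, n) * p ** n * q ** opi
        total += max(n - L, 0) * prob
    return total


p = 0.35  # arbitrary illustrative value
one_out = expected_runs_by_counting(L=1, p=p, opi=1)  # The Doublers
two_out = expected_runs_by_counting(L=1, p=p, opi=2)

print(two_out)                              # direct count for a two-out inning
print(2 * one_out + p ** 2)                 # two one-out innings plus the extra p^2
print(p ** 2 / (1 - p) * (2 + (1 - p)))     # the two-out formula for The Doublers
```

All three printed values agree up to floating-point rounding. Changing `L` and `opi` gives a way to check the other two-out and three-out formulas against the same count.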

Summing (or multiplying) it all up

So now we can break down our formula into meaningful parts, as follows:

Expected\ runs\ per\ inning\
=\ \dfrac{p}{1-p} p^L [3 + 2L(1-p) + \tfrac{1}{2}L(L+1)(1-p)^2]
= (Base reaches per out) (Threshold penalty) (Outs per inning + Factor for runners who scored but would have been left on base in a one-out inning)

Here,
Base reaches per out = \frac{p}{1-p},
Threshold penalty = p^L,
Outs per inning = 3,
Factor for runners scoring on extra chances = 2L(1-p)\ +\ \tfrac{1}{2}L(L+1)(1-p)^2
