Diverging paths forward for EBP

Though the potential for improving Expected Binomial Production (EBP) is clear, how to go about it is not. It has become evident to me that, rather than thinking of improving EBP itself, I should think of EBP as a launching point for different enhancements that would meet different needs.

One need would be to study how each independent offensive statistic affects run production, regardless of whether that statistic is under the offensive player’s or team’s control. Most attempts to explain the connection between basic stats and runs scored are purposely limited. They focus only on those statistics deemed to be under the offensive player’s control, such as doubles or stolen bases. They ignore statistics that count events outside of the offensive team’s control, such as wild pitches and balks. These are mistakes made by the defensive team that are not induced by the skill of the offensive team, but that can nonetheless help the offensive team score more runs.

Why wouldn’t somebody studying run production consider all the statistics that contribute to runs? Because including all those statistics may actually run counter to their goal. Most systems for estimating run production were motivated by a desire to assess the value of a player’s offensive contributions. Players can’t be said to have “contributed” the wild pitches and balks that they benefitted from, however. Their contributions are those things that result from how well they play the game, such as hits, walks, and stolen bases, or from how poorly they play it, as when they are caught stealing or strike out.

The benefit of including all the stats, however, is that it may give us a clearer picture of how many runs resulted from a team’s or player’s contributions, by removing from the equation those runs that resulted from something else.  By understanding the extent to which things like wild pitches contribute to runs, we can eliminate the noise that they cause.

Another need would be to study only how team-predictive stats affect run production. This would be from the perspective of someone wanting to put together the best team they can.

Another would be to study only how the predictive stats that can be attributed to individual players affect team run production. This would be from the perspective of determining how much a specific player contributes to a team’s run production.

Each of the three needs above would allow a different set of inputs, with each need being more restrictive than the one before it.
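As a rough illustration of that nesting, here is a minimal Python sketch. The `Scope` labels, the `STAT_SCOPE` catalog, and the `allowed_inputs` helper are all hypothetical names invented for this example, and the classification of each stat is only my reading of the examples mentioned above, not a settled list.

```python
from enum import IntEnum


class Scope(IntEnum):
    """How restrictive a study is about its inputs (hypothetical labels)."""
    ANY = 0      # every stat that affects runs, even wild pitches and balks
    TEAM = 1     # only stats that are predictive at the team level
    PLAYER = 2   # only predictive stats attributable to individual players


# Assumed classification, based only on the examples named in the text.
# No team-only stat is named there, so none appears in this catalog.
STAT_SCOPE = {
    "hits": Scope.PLAYER,
    "doubles": Scope.PLAYER,
    "walks": Scope.PLAYER,
    "stolen_bases": Scope.PLAYER,
    "caught_stealing": Scope.PLAYER,
    "strikeouts": Scope.PLAYER,
    "wild_pitches": Scope.ANY,
    "balks": Scope.ANY,
}


def allowed_inputs(need: Scope) -> set[str]:
    """Return the stats a study with the given need may use as inputs."""
    return {stat for stat, scope in STAT_SCOPE.items() if scope >= need}


all_stats = allowed_inputs(Scope.ANY)
team_stats = allowed_inputs(Scope.TEAM)
player_stats = allowed_inputs(Scope.PLAYER)

# Each need admits a subset of the inputs allowed by the one before it.
print(player_stats <= team_stats <= all_stats)
```

Ordering the labels as integers makes the "more restrictive than the one before it" relationship a simple comparison, so adding a stat to the catalog never breaks the nesting.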

Separately, there’s the matter of whether to pursue better curves for the relationship between on-base percentage (OBP) and run production, or to forget about the OBP dependence and simply attempt to create a more accurate run estimator. This choice may lead to very different approaches. For example, to incorporate the number of times a team grounds into double plays (GIDP) as an input while improving the curves for OBP dependence, we’d have to construct a complicated model for how GIDP depends on p, and then weave that into the probabilistic fabric of the EBP formulas. By contrast, if we were just making a better run estimator, we could ignore the p dependence and incorporate a team’s GIDP total in a much more direct way.
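To make the "direct" option concrete, here is a minimal sketch of one form it could take: a linear estimator in which GIDP is just one more weighted term. This is a generic linear-weights form chosen for illustration, not a part of EBP. The function name `estimate_runs`, the stat keys, and the idea of passing weights in as a dictionary are all assumptions, and no weight values are proposed here, since they would have to be fitted to empirical data.

```python
def estimate_runs(totals, weights, intercept=0.0):
    """Estimate runs as intercept + sum(weight * season total).

    totals  -- dict mapping stat name to a team's total, e.g. {"gidp": ...}
    weights -- dict mapping stat name to runs per event (caller-supplied;
               a stat that costs runs, such as GIDP, would carry a
               negative weight)
    """
    missing = set(weights) - set(totals)
    if missing:
        raise KeyError(f"totals missing for: {sorted(missing)}")
    # Each stat contributes independently; no dependence on p is modeled.
    return intercept + sum(weights[stat] * totals[stat] for stat in weights)
```

Adding GIDP to such an estimator means adding one key to `weights`, whereas the curve-based route would require a model of how GIDP varies with p.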

Finally, there’s the question of maintaining independence from empirical data. Because no data from any baseball league was used to create EBP, it should be equally useful for all baseball leagues. It probably won’t be possible to preserve this league-agnosticism once base advances and outs on the basepaths are factored in, though I do have some ideas on how that might be attempted. It remains to be seen which way to go.

I welcome any comments with ideas on how to proceed, conversation on the same, and the results of any work you may do along one of these paths forward. On Twitter I am @tomisphere.

Nearly all data used in the calculations described in this article came from Baseball-Reference.com, with a small amount of additional data coming from FanGraphs. In particular, calculating EBPt would have been impossible without the base-advance numbers provided by Baseball-Reference.com. I owe them my extreme gratitude.