Accuracy, in brief
When used as a run estimator on actual major league team seasonal data (1955 through 2016), Expected Binomial Production (EBP) falls about where you’d expect among other leading run estimators in terms of accuracy. If you rank the estimators by the number of inputs they use, you’d expect the list from most to least accurate to look something like the following (go to my page of information about each of these run estimators to learn what these are):
Extrapolated Runs (XR)
Runs Created – 2002 (RC2002)
Runs Created – technical (RCtech)
Estimated Runs Produced (ERP)
New Estimated Runs Produced (NERP)
Base runs (BaseRun)
Equivalent Average runs (EqR)
Runs Created – stolen bases (RCsb)
Total Offensive Productivity (TOP)
Expected Binomial Production – team (EBPt)
Expected Binomial Production – fixed (EBPf)
Extrapolated Runs Reduced (XRR)
Runs based on Weighted Runs Above Average (wRAA)
Runs Created (RC)
Here, the ones that use the most inputs are on the top, and the ones that use the fewest are on the bottom. When evaluating accuracy based on correlation coefficient, you get the following order:
Extrapolated Runs (XR)
Total Offensive Productivity (TOP)
Base runs (BaseRun)
New Estimated Runs Produced (NERP)
Runs Created – technical (RCtech)
Estimated Runs Produced (ERP)
Runs Created – 2002 (RC2002)
Extrapolated Runs Reduced (XRR)
Equivalent Average runs (EqR)
Expected Binomial Production – team (EBPt)
Expected Binomial Production – fixed (EBPf)
Runs based on Weighted Runs Above Average (wRAA)
Runs Created – stolen bases (RCsb)
Runs Created (RC)
Each run estimator’s placement in this list is based on the average of its correlation coefficients to individual seasons of team runs-per-inning data from 1955 through 2016. So that’s the average of 62 correlation coefficients for each estimator, with each correlation coefficient having been calculated over 16 to 30 data points. (Converting each estimator’s output of runs per season to runs per inning actually increased the correlations of every estimator in a very consistent manner.)
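The averaging described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used for the published figures: the function names and the row layout (season, estimated runs, actual runs, innings) are assumptions made for the example.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_seasonal_correlation(rows):
    """rows: iterable of (season, estimated_runs, actual_runs, innings),
    one row per team-season (a hypothetical layout for this sketch).

    Returns (average of the per-season correlations, dict of per-season values).
    """
    by_season = defaultdict(lambda: ([], []))
    for season, est_runs, act_runs, innings in rows:
        # Convert season totals to runs per inning before correlating.
        by_season[season][0].append(est_runs / innings)
        by_season[season][1].append(act_runs / innings)

    per_season = {s: pearson(est, act) for s, (est, act) in by_season.items()}
    average = sum(per_season.values()) / len(per_season)
    return average, per_season
```

With 62 seasons of data, `per_season` would hold 62 coefficients, each computed over that season's 16 to 30 teams, and `average` would be the single number used to place an estimator in the list.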
EBP falls right about where you’d expect it to. More importantly, the correlation coefficients of the methods on this list fall within a tight range, from .9415 through .9568. By comparison, batting average and on-base percentage have correlations of .7731 and .8516, respectively (slugging percentage has .8860, and OPS an excellent .9393). We conclude that EBP gives good results on real major league data since 1955 and, because it doesn’t take into account outs on the bases or base advances (except in one small way), it leaves plenty of room for improvement.
Correlation coefficients don’t tell the whole story on accuracy. One can also look at the root-mean-square error (RMSE) and what I call “bias”. It turns out that RMSE can be broken down into two components, like the x-coordinate and the y-coordinate of a point on an x-y graph. One of these components is described by the correlation coefficient, and the other is described by the bias. Bias is basically the amount by which the average of your estimates deviates from the average of the actual runs produced. Some run estimators do additional manipulations on their formulas to eliminate the biases, something that all run estimators could do if they wanted. Of those that don’t, EBP does about as well with bias as the other good estimators. Of the 62 annual biases I calculated for each estimator, I looked to see if about half were positive and half were negative; this is the ideal. EBP had more negative ones, but also a significant number of positive ones. This tells me that it tends to be low, but that the reducing effect of not including base advances is mostly offset by the increasing effect of not including outs on the bases.
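One standard way to make the two-component picture concrete is the identity RMSE² = bias² + spread², where spread is the standard deviation of the errors around the bias. The Python sketch below assumes that reading; it may not be exactly the split used in the charts referred to in this article, and the function name is made up for the example.

```python
import math


def rmse_components(estimates, actuals):
    """Return (bias, spread, rmse) for paired estimates and actual values.

    bias   = mean of (estimate - actual)
    spread = population standard deviation of the errors around the bias
    rmse   = root-mean-square error; rmse**2 == bias**2 + spread**2
    """
    n = len(estimates)
    errors = [e - a for e, a in zip(estimates, actuals)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    spread = math.sqrt(sum((err - bias) ** 2 for err in errors) / n)
    return bias, spread, rmse
```

The spread term is where the correlation coefficient enters: spread² equals s_est² + s_act² − 2·r·s_est·s_act, where s_est and s_act are the population standard deviations of the estimates and the actual values, so a higher r shrinks the spread while leaving the bias untouched.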
Please visit why I use correlation coefficient to evaluate accuracy to learn why I prefer this over other commonly used measures of accuracy like RMSE.
Please visit A comparison of run estimators and Expected Binomial Production for a fuller evaluation of the accuracy of EBP in the context of other run estimators, including charts that show each run estimator’s annual bias and RMSE graphically.
Potential for improvement
I have a goal of adding in the effects of double plays, stolen bases, caught stealing, and other baserunning events to the model in a way that won’t introduce biases, and that preserves the integrity of the three p-dependence plots shown earlier in this article. In the interim, however, I ran some ad hoc experiments to gauge the potential for improvement of EBP, based just on its correlation coefficient when used as a run estimator.
Of these, I think the best-designed and most telling one came from the observation that run estimators that include more data inputs in their formulas tend to achieve higher correlations to runs produced. So for the experiment I modified each formula by adding terms for the inputs it lacked, but only those inputs that helped it. Rather than customize each input’s coefficient to the formula I was adding it to, I simply used the coefficient from the formula for Extrapolated Runs, which is the king of the linear run estimators. Doing it this way was simpler and had the feel of a more controlled experiment. So for example, for BaseRuns I added terms from the XR formula for sacrifice flies and strikeouts, terms that benefitted the BaseRuns correlation coefficient. However, I did not add a term for sacrifice hits, which did not benefit its correlation coefficient and is the only other input it lacks.
I also checked what happened if I added reach-on-error (ROE) numbers to the singles totals used in each formula, as these numbers are included in EBP. Every formula benefitted from the addition of ROE to the number of singles, so this was the other modification I made to all the formulas (except of course to EBP which already included it).
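The "add a term only if it helps" idea can be sketched as follows. This is a simplified illustration under several assumptions: it scores each trial with a single correlation rather than the average of seasonal correlations, it tests candidate terms one after another in the order given, and the coefficients are passed in by the caller rather than hard-coded (no actual XR coefficient is reproduced here). The function and parameter names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def add_helpful_terms(base_estimates, actual_runs, candidate_terms):
    """Add borrowed linear terms to an estimator, keeping only those that help.

    candidate_terms: dict mapping an input name to (coefficient, per-team counts).
    The coefficient is supplied by the caller, e.g. taken from another
    linear formula; nothing here is fitted to the data.
    Returns (modified estimates, resulting correlation, names of kept terms).
    """
    current = list(base_estimates)
    best_r = pearson(current, actual_runs)
    kept = []
    for name, (coef, counts) in candidate_terms.items():
        trial = [est + coef * x for est, x in zip(current, counts)]
        r = pearson(trial, actual_runs)
        if r > best_r:  # keep the term only if the correlation improves
            current, best_r = trial, r
            kept.append(name)
    return current, best_r, kept
```

Folding reach-on-error into singles would happen one step earlier, by adding the ROE counts to the singles totals before the base estimates are computed, so it is not shown as a separate term here.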
For full details on the terms added and the resulting correlations, please see Assessing Expected Binomial Production’s potential for improvement.
To summarize the results, putting these estimators on a more equal footing in terms of the amount of input data brought their correlations significantly closer together, as one would expect. It also pushed EBPf up to the middle of the pack of correlation coefficients, and EBPt to the top, at 0.9606.
Though I would not put these modifications into practice for any of these estimators due to the biases they introduce in the results, and other absurdities that show up, I am encouraged by this and other experiments that show that EBP has great potential for improvement by incorporation of what happens on the basepaths, both outs and base advances.