## Approach

I could have attempted a park-by-park or a team-by-team assessment of home field advantage, but either would be difficult. It would be hard to untangle the teams’ differing abilities from each park’s home field advantage, or the different park effects from each team’s home field advantage. Looking at home and away splits for the league as a whole, by contrast, gives a balanced view. A single team’s skill level does not skew the results, because it plays as many games as the home team as it does as the away team. And a single park’s effects don’t skew the results, because with whole-season, all-teams results every game in every park counts once toward the home totals and once toward the away totals.

I used Fangraphs’ leaderboard for team stats, selecting the Home and Away splits in turn. This Fangraphs page lets you select a range of years over which to provide cumulative statistics. The Home and Away splits there only seem to go back to 2002, which gave me a 17-year span to work with (2002 through 2018). To get a sense of any change or progression in home field advantage, I grouped the data into three portions: 6 years (2002 through 2007), 6 years (2008 through 2013), and 5 years (2014 through 2018). I used multiple years per group to limit the effects of small sample sizes.

I divided all cumulative statistics by plate appearances to turn them into rate statistics, to create a good basis for comparison. First I’ll present these rates, followed by the percent differences between them.
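A minimal sketch of the two steps just described. The helper names (`rate`, `pct_increase`) are hypothetical, and since the raw event counts aren’t listed, the example divides AB by PA for the 2002-2007 Away split and then compares the 2002-2007 strikeout rates from the hitting table.

```python
def rate(count: float, plate_appearances: float) -> float:
    """Turn a cumulative count into a per-PA rate statistic."""
    return count / plate_appearances


def pct_increase(away_rate: float, home_rate: float) -> float:
    """Percentage change going from the Away rate to the Home rate."""
    return (home_rate / away_rate - 1.0) * 100.0


# 2002-2007 Away: at-bats and plate appearances from the hitting table.
print(f"AB/PA: {rate(512_295, 573_544):.4f}")

# 2002-2007 strikeout rates (SO/PA), Away then Home.
so_away, so_home = 0.1726, 0.1622
print(f"SO/PA change: {pct_increase(so_away, so_home):+.1f}%")  # SO/PA change: -6.0%
```

The percentage is taken relative to the Away rate, so a negative value means the home team strikes out less often.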

## Home and Away rate statistics

The hitting stats:

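| Stat | 2002-2007 Away | 2002-2007 Home | 2008-2013 Away | 2008-2013 Home | 2014-2018 Away | 2014-2018 Home |
|---|---|---|---|---|---|---|
| PA | 573,544 | 552,045 | 567,637 | 546,923 | 470,077 | 452,489 |
| AB | 512,295 | 488,837 | 508,995 | 485,948 | 423,243 | 404,424 |
| 1B/PA | .1546 | .1575 | .1525 | .1551 | .1472 | .1503 |
| 2B/PA | .0474 | .0479 | .0454 | .0464 | .0441 | .0454 |
| 3B/PA | .0045 | .0053 | .0042 | .0053 | .0043 | .0051 |
| HR/PA | .0270 | .0282 | .0249 | .0266 | .0281 | .0291 |
| BB/PA | .0818 | .0880 | .0803 | .0869 | .0781 | .0840 |
| IBB/PA | .0068 | .0076 | .0059 | .0067 | .0049 | .0054 |
| HBP/PA | .0094 | .0098 | .0082 | .0086 | .0093 | .0094 |
| SO/PA | .1726 | .1622 | .1917 | .1821 | .2158 | .2071 |
| AVG | .2614 | .2699 | .2532 | .2626 | .2485 | .2573 |
| OBP | .3275 | .3399 | .3181 | .3317 | .3129 | .3252 |
| SLG | .4153 | .4317 | .3966 | .4165 | .4007 | .4172 |
| OPS | .7428 | .7717 | .7147 | .7482 | .7136 | .7424 |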

The baserunning stats:

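| Stat | 2002-2007 Away | 2002-2007 Home | 2008-2013 Away | 2008-2013 Home | 2014-2018 Away | 2014-2018 Home |
|---|---|---|---|---|---|---|
| SB/PA | .0143 | .0145 | .0159 | .0163 | .0135 | .0143 |
| CS/PA | .0061 | .0058 | .0062 | .0058 | .0056 | .0052 |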

The team collaborative stats:

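| Stat | 2002-2007 Away | 2002-2007 Home | 2008-2013 Away | 2008-2013 Home | 2014-2018 Away | 2014-2018 Home |
|---|---|---|---|---|---|---|
| R/PA | .1185 | .1268 | .1109 | .1196 | .1112 | .1195 |
| RBI/PA | .1129 | .1209 | .1056 | .1140 | .1060 | .1138 |
| SF/PA | .0071 | .0076 | .0067 | .0072 | .0064 | .0069 |
| SH/PA | .0084 | .0090 | .0079 | .0087 | .0057 | .0058 |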

## Analysis misgivings

Now, before going any further, let me state that I have misgivings about what I’m about to do. Taking percentage increases in rate numbers that have an upper bound can be misleading. For example, suppose that in some fictional league, players reached base between 98% and 99% of the time. In such a league, a 1% increase in on-base rate (say, from .9800 to .9898) is a tremendous increase, and the highest possible increase is only about 2%. But in the leagues we know, with on-base rates ranging from about .280 to .400, a 1% increase means a change from, say, .300 to .303, which is a fairly modest change. There are better ways to handle this, but they’re not something I can cover in a quick aside here. So note that all the statistics we’re talking about here take values well under half their possible maximums (the maximum is 1.000 for most of these, 4.000 for slugging percentage, and 5.000 for OPS). When that’s the case, talking about percentage increases in rate numbers is good enough, and it makes intuitive sense to people.

So to be clear, an increase from .200 to .400 would be a doubling, and therefore a 100% increase (not a 20% increase).
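To make the bounded-rate caveat concrete, here is a small sketch that compares a percentage increase with the largest increase the ceiling allows, reusing the figures from the paragraph above. The helper names (`pct_increase`, `max_pct_increase`) are hypothetical.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


def max_pct_increase(rate: float, ceiling: float = 1.0) -> float:
    """Largest percentage increase possible before the rate hits its ceiling."""
    return (ceiling / rate - 1.0) * 100.0


# The fictional league: a ~1% gain uses up about half of the available headroom.
print(f"{pct_increase(0.9800, 0.9898):.2f}% of a possible {max_pct_increase(0.9800):.2f}%")
# A real-world OBP: the same ~1% gain is a sliver of the headroom.
print(f"{pct_increase(0.300, 0.303):.2f}% of a possible {max_pct_increase(0.300):.2f}%")
# Slugging tops out at 4.000 (2002-2007 Away SLG from the hitting table).
print(f"SLG headroom: {max_pct_increase(0.4153, ceiling=4.0):.0f}%")
# The .200 -> .400 example is a doubling, i.e. a 100% increase.
print(f"{pct_increase(0.200, 0.400):.0f}%")
```

When a rate sits far below its ceiling, as every stat in these tables does, the headroom is so large that the percentage change behaves intuitively.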

## Percentage increase in Home stats over Away stats

Here are the percentage differences in these numbers, from the Away numbers to the Home numbers:

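The hitting stats:

| Stat | 2002-2007 | 2008-2013 | 2014-2018 |
|---|---|---|---|
| H/PA | 2.4% ± 0.5% | 2.8% ± 0.5% | 2.8% ± 0.6% |
| 1B/PA | 1.9% ± 0.6% | 1.7% ± 0.6% | 2.1% ± 0.7% |
| 2B/PA | 1.2% ± 1.2% | 2.2% ± 1.3% | 2.8% ± 1.4% |
| 3B/PA | 19.0% ± 4.4% | 25.2% ± 4.6% | 18.0% ± 4.9% |
| HR/PA | 4.4% ± 1.6% | 6.8% ± 1.7% | 3.7% ± 1.8% |
| BB/PA | 7.6% ± 0.9% | 8.2% ± 0.9% | 7.5% ± 1.1% |
| IBB/PA | 11.8% ± 3.4% | 14.9% ± 3.8% | 9.9% ± 4.4% |
| SO/PA | -6.0% ± 0.6% | -5.0% ± 0.5% | -4.0% ± 0.6% |
| HBP/PA | 4.4% ± 2.8% | 5.0% ± 3.0% | 1.1% ± 3.1% |
| AVG | 3.3% ± 0.5% | 3.7% ± 0.5% | 3.5% ± 0.6% |
| OBP | 3.8% ± 0.4% | 4.3% ± 0.4% | 3.9% ± 0.4% |
| SLG | 4.0% ± 0.7% | 5.0% ± 0.7% | 4.1% ± 0.7% |
| OPS | 3.9% ± 0.4% | 4.7% ± 0.4% | 4.0% ± 0.5% |

The baserunning stats:

| Stat | 2002-2007 | 2008-2013 | 2014-2018 |
|---|---|---|---|
| SB/PA \* | 1.4% ± >2.3% | 2.1% ± >2.1% | 5.6% ± >2.6% |
| CS/PA \* | -5.0% ± >3.4% | -5.9% ± >3.4% | -5.8% ± >3.9% |

The team collaborative stats:

| Stat | 2002-2007 | 2008-2013 | 2014-2018 |
|---|---|---|---|
| R/PA \*\* | 7.0% ± 0.8% | 7.8% ± 0.8% | 7.5% ± 0.9% |
| RBI/PA \*\* | 7.1% ± 0.8% | 7.9% ± 0.8% | 7.3% ± 0.9% |
| SF/PA \* | 7.7% ± >3.3% | 6.5% ± >3.4% | 8.3% ± >3.9% |
| SH/PA \* | 6.3% ± >3.0% | 9.8% ± >3.2% | 1.3% ± >3.9% |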

I calculated the ± error bars using the formula for the standard deviation of a binomial distribution, with Plate Appearances representing the number of trials, and the actual rate of occurrence as the probability of occurrence.

Each error bar spans two of these standard deviations, which corresponds to roughly a 95% confidence interval. However, this approach isn’t quite right for the statistics marked with an asterisk (*), because PA doesn’t properly represent the number of trials. Because the actual number of trials ought to be smaller, I’ve placed a greater-than symbol in front of these error bars to indicate that the PA-based error bars are too small.
It also isn’t right for those statistics marked with a double asterisk (**), because a “successful trial” can add more than 1 to the statistic’s total, and because the probability of a successful trial varies greatly given the men on base and the number of outs. I’ve provided the PA-based error bars for these anyway, as a reference point.
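A sketch of the error-bar formula as I read the description above: treating PA as the number of trials n and the observed rate p as the probability of occurrence, the binomial standard deviation of the count is √(n·p·(1−p)), and dividing by the expected count n·p expresses it relative to the rate. The text doesn’t say whether the Away or Home PA and rate were used, so using the Away values is an assumption here; with them, this reproduces the ±0.6%, ±0.9%, and ±1.6% shown for 2002-2007 strikeouts, walks, and home runs. `binomial_rel_error` is a hypothetical helper name.

```python
import math


def binomial_rel_error(rate: float, trials: int, n_sigma: float = 2.0) -> float:
    """n_sigma binomial standard deviations, as a percentage of the rate.

    The count's standard deviation is sqrt(n * p * (1 - p)); dividing it by
    the expected count n * p gives the relative spread of the rate itself.
    """
    sd_count = math.sqrt(trials * rate * (1.0 - rate))
    return n_sigma * sd_count / (trials * rate) * 100.0


# Assumption: Away PA and Away rates for 2002-2007.
away_pa = 573_544
for stat, away_rate in [("SO", 0.1726), ("BB", 0.0818), ("HR", 0.0270)]:
    print(f"{stat}: ±{binomial_rel_error(away_rate, away_pa):.1f}%")
```

Rarer events (small p) get wider relative error bars, which is why triples and intentional walks carry much larger ± values than walks or strikeouts.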

## Analysis

I must say I was surprised by how all-encompassing home field advantage turned out to be.

I’m not surprised that the biggest effect was for triples. Triples usually happen when the ball is hit to very particular parts of a ballpark, and these parts are, in many cases, unique to each park. The home team players will therefore have a better idea of when they should try to stretch a double into a triple.

But that the home team would have both more stolen bases and fewer caught stealings? That’s harder to explain. Do baserunners simply run faster in their home park? Or run smarter? Is it easier for them to focus on the pitcher’s tells at home because the familiar backdrop is less distracting? Perhaps a little of all of these. Perhaps the umpires also slightly favor the home team on close plays.

Maybe the baserunners run faster because the visiting team’s locker room is smaller, more cramped? Or the visitors are more tired from travel?

You can’t make that case for the hit-by-pitch rate, though. If anything, the better rested team should be able to dodge an errant pitch more effectively, so the home team should have fewer of these. So it’s either the umpire or the pitcher.

Hey, maybe I’ve been focusing on the wrong thing. Maybe it’s not that the hitters and runners on the home team do better. Maybe it’s that the pitchers on the visiting team do worse. Standing in the middle of all those thousands of people who don’t like you: the pressure has to be felt by the pitcher most. Throw a major league pitcher’s finely tuned control off ever so slightly, and sixty feet six inches away you get the difference between a ball and a strike, or between a hittable strike and an unhittable one. Could this be the psychological effect that Jasoncards was thinking of?

But look at those error bars on SB, CS, and HBP. The home field advantages for the stats I’ve been discussing are not very significant.

They are significant, however, for walks and strikeouts. And these are some of the larger effects we’re seeing: around -5% for strikeouts, and about +7.5% for walks. So is it the hitters, the pitchers, or the umpires making the biggest difference here? I’ll guess pitchers first, then umpires second, and hitters last.

One thing I haven’t mentioned is cheating. You’d think the home team would be more likely to have devices, people, or both planted to let them pick up signs, pitch grips, or what have you, and to relay that information to the players. It depends on how much of that sort of thing you think goes on. Some does, but how much?

There’s one number here that is surely immune to the effects of umpires’ calls and to cheating, and that’s intentional walks. Nor can intentional walks be directly attributed to the skill of the pitcher or the hitter. Yet this stat has one of the largest home team boosts, between 10% and 15% over the visiting team’s rate of intentional walks! There are two causes I can think of here:
1. Because the overall offensive performance is better for the home team, they more often end up in situations that call for an intentional walk;
2. Because the home team bats last, the visiting team has clearer choices about the trade-offs it can make in the final inning to win the game, including intentional walks.

In the end, the more important stats are the traditional ones like on-base percentage, slugging percentage, and OPS, and these all show about a 4% boost for the home team. But interestingly, the most important stat of all gets a bigger boost: runs per plate appearance is 7% to 8% higher for the home team. You’d think it would be more in line with OPS, but it’s not. The nonlinear nature of run production, versus the linear nature of OPS, could explain this difference.
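One way to see how a ~4% OPS boost can go with a 7% to 8% runs boost: OPS adds OBP and SLG, while run estimators tend to multiply them. As an illustration (my choice of estimator, not the method used in this article), Bill James’ basic Runs Created, (H + BB) × TB / (AB + BB), is roughly OBP × SLG × AB, so per PA it scales like OBP × SLG × AB/PA. The `Split`, `ops`, and `rc_per_pa` names are hypothetical; the inputs are the Away and Home values from the hitting table.

```python
from typing import NamedTuple


class Split(NamedTuple):
    obp: float
    slg: float
    ab: int
    pa: int


# (Away, Home) for each span, from the hitting table.
SPLITS = {
    "2002-2007": (Split(0.3275, 0.4153, 512_295, 573_544), Split(0.3399, 0.4317, 488_837, 552_045)),
    "2008-2013": (Split(0.3181, 0.3966, 508_995, 567_637), Split(0.3317, 0.4165, 485_948, 546_923)),
    "2014-2018": (Split(0.3129, 0.4007, 423_243, 470_077), Split(0.3252, 0.4172, 404_424, 452_489)),
}


def pct_increase(away: float, home: float) -> float:
    return (home / away - 1.0) * 100.0


def ops(s: Split) -> float:
    """Linear in its parts: OBP + SLG."""
    return s.obp + s.slg


def rc_per_pa(s: Split) -> float:
    """Rough multiplicative proxy for basic Runs Created per PA.

    (H + BB) * TB / (AB + BB) is about OBP * SLG * AB; divide by PA for a rate.
    """
    return s.obp * s.slg * s.ab / s.pa


for span, (away, home) in SPLITS.items():
    print(f"{span}: OPS {pct_increase(ops(away), ops(home)):+.1f}%, "
          f"RC/PA proxy {pct_increase(rc_per_pa(away), rc_per_pa(home)):+.1f}%")
```

This gives roughly +3.9%/+7.0%, +4.7%/+8.5%, and +4.0%/+7.4% (OPS/proxy), against the measured runs-per-PA gains of 7.0%, 7.8%, and 7.5%. The proxy is crude (it ignores HBP, sacrifice flies, and baserunning), but it shows how multiplying two ~4% gains lands near 8% while adding them stays near 4%.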