Predicting Astros batter regression in 2018

The Astros had an historic offense in 2017. The club ranked first in the major leagues in Runs, first in RBI, best in strikeout percentage, first in Isolated Power, first in batting average, first in On-Base Percentage, first in Slugging Percentage, first in weighted Runs Created Plus (wRC+), and first in fWAR.

In fact, the Astros’ offensive component of fWAR was 161.8, which was not only more than twice as high as that of the second-place Yankees (72.8) but also ranks as the 7th-best offensive season out of 2,482 since 1901, when the modern era of baseball began.

Whew!

Superlatives do not do it justice, so suffice it to say the Astros’ offense was quite good.

The 2018 Astros return with that entire offense intact. One may even argue that the offense returns improved. Carlos Correa and Alex Bregman remain on the up-slope of their major league development curve. Gone are Carlos Beltran, Nori Aoki, Cameron Maybin, and Juan Centeno, who accounted for nearly one-sixth of the Astros’ plate appearances, while batting .218/.300/.374 collectively.

The Astros have the talent to remain the best offensive club in baseball during this upcoming season.

Caution!

Despite this, fans who believe it is a foregone conclusion that the 2018 Astros can repeat or better 2017’s level of offense are betting on a long shot.

The Astros should be the odds-on favorite to repeat as the best offense in baseball, but they are not likely to repeat as a top-10 all-time squad. That is too much to ask.

For one, the Astros were able to maintain a largely injury-free lineup, only losing Correa for a short stretch in the middle of the season. Completing a season so relatively unscathed is rare. The Astros’ depth at most positions will help them maintain their place in the standings and keep a high overall output, but it will not turn Tony Kemp into an Altuve-quality replacement if number 27 is forced to miss any significant time. [note: the author just knocked on wood.]

Were the 2017 Astros Lucky?

Besides their unusual ability to keep an All-Star roster healthy, several Astros posted incredible numbers on balls in play.

The BABIP stat (Batting Average on Balls in Play) is a useful proxy for determining how lucky a player or team was during a long stretch of play. There’s more to it than that, but this post is going to become thick enough without getting into detail. Fangraphs has an excellent breakdown of BABIP.

The key takeaway is that batters only have a certain amount of control over the percentage of times they actually make it to base when they put a ball into the field of play, as opposed to making an out. Too much depends on defensive quality, positioning, and the pitcher’s ability to generate weak contact. Certain skillsets can make a player sustain batting averages up to 30 points higher or lower than average, but it isn’t typical, and it isn’t something the player can do anything about unless he fundamentally changes his skill set.
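For readers who want to see the arithmetic, here is a minimal sketch of the standard BABIP calculation: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The function name and the stat line in the example are made up for illustration and do not belong to any Astros player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting Average on Balls in Play.

    Home runs are removed from both hits and balls in play because
    fielders never get a chance at them; strikeouts are removed because
    the ball was never put in play at all.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line (an assumption, not real data):
# 200 hits, 25 home runs, 600 at-bats, 100 strikeouts, 5 sacrifice flies.
example = babip(hits=200, home_runs=25, at_bats=600, strikeouts=100, sac_flies=5)
print(f"BABIP: {example:.3f}")
```

Note that nothing in the formula knows how hard the ball was hit or where the fielders stood, which is exactly why the number bounces around so much from year to year.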

The Astros’ team BABIP of .309 is not alarming, as it was not too much higher than the league average of .300. But the individual BABIPs of key players are pretty out there.

[Table: 2017 BABIP for the Astros’ top six batters by plate appearances]

Here we see that the Astros’ top six batters by plate appearances posted a BABIP of .328. Granted, that is largely propped up by Altuve’s monstrous .370–more on that in a second.

But based on these numbers it is fair to ask how far back towards the league average these key Astros will regress, which would negatively impact their output compared to 2017.

Luckily, we can answer that…mostly.

Through the magic powers of folks far better versed in statistical analysis than I am, we have a tool to determine how far off a player’s BABIP was compared to what it should have been based on his batted ball data. In other words – we can determine how lucky he was.

That tool is called xBABIP, or expected BABIP.

DON’T STOP READING YET, I PROMISE THIS WON’T BE OBTUSE.

The idea behind xBABIP is to use other statistics to determine what a player’s BABIP should have been, with as much of the random variation (pitcher, defense, etc.) taken out as possible. They do this with regression analysis blah blah blah by looking at massive quantities of historical data and figuring out which stats tend to correlate most closely with BABIP. In other words – historically speaking, on average, which stats go along with a higher BABIP, and which go along with a lower one? Those are the stats that are used in the complicated-looking xBABIP formula.
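To make the regression idea a little less hand-wavy, here is a toy sketch that fits a straight line between one batted-ball stat and BABIP, then uses the line to produce an "expected" BABIP. This is not any published xBABIP formula: real versions use many inputs and far more data. The choice of line drive rate as the single input, the function names, and every number in the sample are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_babip(line_drive_rate, slope, intercept):
    """Apply the fitted line to one hitter's line drive rate."""
    return intercept + slope * line_drive_rate


# Made-up sample of hitters (assumption, not real data):
# line drive rate and the BABIP each one posted.
line_drive_rates = [0.17, 0.19, 0.20, 0.22, 0.24, 0.25]
babips = [0.275, 0.290, 0.305, 0.310, 0.330, 0.335]

slope, intercept = fit_line(line_drive_rates, babips)
print(f"expected BABIP at a 21% line drive rate: "
      f"{expected_babip(0.21, slope, intercept):.3f}")
```

The real formulas work the same way in spirit, only with several predictors at once instead of one.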

A perfect formula for xBABIP is something of a holy grail among nerdy baseball types. Every couple of years, a young statistician is able to improve slightly on it.

Until recently, the best xBABIP formulas were far more predictive of NEXT year’s BABIP than BABIP itself is, but nerds still felt like they could be greatly improved. The missing factor? Defensive shift data.

See, it has long been known that defensive shifts greatly affect a player’s hit-to-out ratio when putting balls in play.  This is obvious – hitting into a shift gives a higher chance of making an out than not hitting into shifts. But until very recently, batted ball splits that took shifting into account were unavailable.

BUT NO MORE!

Using shift data and a lot of math, Mike Podhorzer crafted a new xBABIP formula that is even better than the previously existing ones. If you’re into that kind of thing, read his detailed explanation of how he came up with it. Long story short, it improves on previous versions by accounting for ground balls that are pulled into defensive shifts, an often-occurring situation that has a massive effect on BABIP.

It took me a while to deconstruct and reconstruct his xBABIP formula using Fangraphs’ splits tools, but I figured it out.

And, well, it has things to say about the 2018 Astros.

[Table: 2017 BABIP versus xBABIP for Astros batters]

Boooo!!! With caveats.

The big one that jumps out is our MVP Altuve. But there’s a big asterisk there. While this xBABIP formula projects him to drop back to a .313 BABIP next season, Altuve has never posted a BABIP that low during a full season in his major league career.

That said, considering his career average BABIP of .339, a large regression back in that direction is likely. It is entirely possible, and not insulting to suggest, that 2017 will end up as the best season of Altuve’s career.  But even with regression, Altuve is still one of the five best hitters in major league baseball, so who cares?

More concerning are a couple of players who were very good last season but whom xBABIP frowns on.

Marwin Gonzalez had a sudden breakout season in 2017, and it is fair to think that he will be unable to match his .303/.377/.530 line. xBABIP projects him to drop by a fair margin, and the publicly-available projection systems agree.

Josh Reddick also had one of the most productive offensive seasons of his career in 2017, and unfortunately xBABIP sees him dropping back to Earth as well.

In both cases, this isn’t the end of the world–both batters still project to be well above average next season compared to other major leaguers. But it does indicate a probable drop-off from 2017’s historic team numbers.

Let’s not talk about Jake Marisnick. He’ll need to make some adjustments.

But xBABIP sees some big positives too.

The one that jumps out to me is Carlos Correa. Yes, xBABIP sees him coming down somewhat from his lofty .352, and that is expected. However, the formula also pegs him as being able to carry an extremely high .331 in the future. That is great news. As Correa continues developing, if he is able to maintain a well-above-average rate of getting on base when he puts the ball in play, it could make him an MVP-caliber monster. Yay!
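For anyone who wants the "luck" gap spelled out, here is a small sketch that subtracts xBABIP from actual BABIP using only the Altuve and Correa figures quoted in this post. The function and variable names are my own, and treating the raw difference as "luck" is a simplification, since some of the gap can be real skill the formula does not capture.

```python
def babip_gap(actual, expected):
    """Points of BABIP above (positive) or below (negative) expectation."""
    return round(actual - expected, 3)


# (2017 BABIP, xBABIP) pairs taken from the text of this post.
players = {
    "Jose Altuve": (0.370, 0.313),
    "Carlos Correa": (0.352, 0.331),
}

for name, (actual, expected) in players.items():
    gap = babip_gap(actual, expected)
    print(f"{name}: {gap:+.3f} versus xBABIP")
```

A positive gap suggests a hitter who is likely to give some of it back next season; a negative gap, like the one described for Fisher, points the other way.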

There’s also an interesting note about Derek Fisher. He had an intriguing rookie season, but xBABIP tells us that he was unlucky during his 166 major league plate appearances. Granted, his small sample of balls in play–especially when adding in cascading splits–makes it difficult to know if this data is real or just noise. But Fisher has been described as one of the fastest guys in the majors, and he hits the ball pretty hard. Perhaps he can maintain a high BABIP, which would make it reasonable to expect him to improve next season.

What does all this mean?

Probably nothing we shouldn’t have already known. The Astros should be great on offense next season, but it might be foolish to expect historic greatness. Still, statistical tools that have proven to be fairly predictive are useful for making general statements.

  1. Some of the Astros’ best hitters last season will regress, but still project to be extremely good, or even great.
  2. Correa is very good at hitting baseballs, and so is Altuve.
  3. Derek Fisher is an intriguing prospect, and xBABIP may give us some insight into why the Astros kept him out of the Justin Verlander and Gerrit Cole trades.