Using xAVG to Look Deeper into Astros Spring Training

My, what a World Series victory does to fans in the lead-up to the next season. The expectations. The hype. The prospect of going back-to-back. Ever since the Astros convened in West Palm Beach to begin their long campaign to return to the postseason, fans have been digesting, analyzing, debating every morsel of information that comes out of spring training camp. Be it a statement from manager A.J. Hinch, or how many errors the outfield has committed, or how many dingers Preston Tucker’s little brother has jacked into the Florida sky, nothing has gone unnoticed by fans hungry for the regular season to begin in South Oklahoma on March 29.

As baseball fans, we usually like to play with statistics in this time of eager expectation. But during spring training, we have a problem: Do spring training stats really mean anything? If not, why bother keeping them at all? If they mean something, what conclusions can we draw from them?

Here are some of the fundamental problems with spring training stats:

  • Small sample size: Few hitters get more than 60 plate appearances, and many receive fewer than a dozen. Pitchers are lucky to get in ten innings of work. Many spend fewer than three on the mound.
  • The whimsical nature of the baseball gods: You know, those invisible spirits that determine which balls sneak under a shortstop’s outstretched glove, which balls find their way into a random hole in the outfield wall, and which home runs are stolen at the fence by a leaping center fielder.
  • Experimentation: Many players, especially the established vets who are “locks” to make the opening day roster, may use their spring training games for things other than building the best-looking statistical resume. Batters may want to try out new stances, new swing patterns, or new grips on the bat. Pitchers might want to perfect their changeup or try to throw their fastball with more deceptive movement. In these cases, we wouldn’t expect to find high batting averages or low ERAs. For these players, such things are beside the point.

While we can’t easily calculate the extent to which experimentation or indifference affects spring training stats, we might be able to apply one type of statistical information, calculated from previous years’ playing results, in a way that helps reduce the “luck” factor inherent in those small spring training sample sizes.

This project attempts to begin to rectify, using the sparse spring training batted-ball data available, the hitting stats of batters who seem to be unusually favored – or disfavored – by the baseball gods. Some players, like Tony Kemp, have hit the ball an awful lot in spring training, but have little to show for it in the “batting average” column. Other players, like Preston Tucker’s little brother Ted, can seem to do no wrong. However, years and years of watching baseball have taught us that no player remains blessed or cursed by the baseball gods forever. BABIP – the percentage of batted balls in play that turn into hits – varies from player to player, but each player’s actual on-field results tend, over time, towards his own percentage in spite of short-term fluctuations such as those we sometimes see in spring training.

The conclusions that can be drawn from the following attempt to calculate “Expected Batting Average” for each Astros hitter in spring training are admittedly very tentative and limited. Please use your own judgment in deciding if these data ultimately mean anything at all. It may, however, be a worthwhile exercise to see if we can massage the Astros’ spring training batting data, if for no other reason than to bring some cold, objective analysis to the emotional and sometimes mud-slinging debates that have surrounded the most beloved (and, perhaps, loathed?) Astros. You know the young ones I’m talking about: JD Davis, Derek Fisher, Tony Kemp, AJ Reed, Kyle Tucker, Tyler White. We want only the most worthy players to make it to Minute Maid Park, so let’s at least be clear about our numbers as we attempt to decide who makes the grade.

With the team’s announcements that Derek Fisher and J.D. Davis will make the opening day roster, while Tony Kemp, A.J. Reed, and Tyler White will begin the year in the minors, there is some joy and also some disappointment among fans. The following data may help us understand some of the less obvious things Hinch must take into consideration when making those difficult choices about who begins the season in Fresno or Corpus Christi instead of taking on the South Oklahoma Mall Cops on March 29.

How the Expected Batting Average (xAVG) calculations were done

The chief goal of this project was to calculate what the batting average of each Astros player in 2018 spring training (limited to those with at least a chance of cracking the opening day roster) would be if their batted balls in play fell for hits at the rates projected by Steamer, as opposed to the actual rates at which they have fallen for hits in real games. The general idea here is that some players’ spring batting averages have been inflated by sheer luck or defensive ineptitude, while other players’ averages have been deceptively suppressed by having an unusually high number of balls hit at fielders in a position to convert an out.

The basic formula applied here for xAVG is simply: “(Expected Hits plus Actual Home Runs) divided by At-Bats”. The number of At-Bats for each player is already known. Expected Hits is calculated as “Projected BABIP times the number of Batted-Balls-in-Play”. And, if you didn’t already know, Batted-Balls-in-Play is defined as: “At-Bats minus Strikeouts minus Home Runs plus Sacrifice Flies”, all of which are known values in the ordinary stat sheets. Again, we rely on Steamer for Projected BABIP (xBABIP) numbers in this exercise.

[Table image: xAVG]

Please note that the xAVG formula used above leads us to numbers of expected hits given with decimals. Of course, in real life there can be no such thing as a “partial hit”. We could round these numbers to make them more realistic. I have chosen to leave them unrounded, as we are dealing in the realm of “imaginary hits” anyway, so we might as well imagine there is such a thing as a partial hit.

Caveats and Cautions

Admittedly, there is a tremendous amount that the above xAVG figures don’t tell us. Batting Average (BA) is inherently an ugly, outdated statistic, and I personally dislike it. It would be much more interesting to know something like Expected On-base Plus Slugging (xOPS) or Expected Weighted On-Base Average (xwOBA), but those would be more difficult and speculative calculations. While working in the world of “imaginary hits,” we would need to determine how many of those hits would likely be singles, doubles, and triples; and we might even want to predict things like walks and the likelihood of beating a close throw for an infield hit. It would take a more math-savvy person than this writer to calculate xOPS or xwOBA, but in theory the players with a high slugging percentage (from 2017, these would include Correa, Altuve, Gonzalez, White, and Springer) and speed (Fisher, Marisnick) would “look better” according to these metrics than they do according to xAVG.

There are many other factors that limit the usefulness of the xAVG data. For example:

  • How pitching quality affected the number of home runs and batted balls in play. Baseball-Reference handily calculates an “Opponent Quality” (OppQual) score which assigns a value of 10 to an at-bat against a major league pitcher and progressively smaller values to at-bats against minor league pitchers of successively lower levels. No attempt has been made here to adjust the xAVG figures for OppQual, but the OppQual values are provided here in case someone would like to try to do that.
  • There is precious little detailed batted-ball data available to the public for spring training games. If we knew more about each player’s percentage of ground balls, line drives, fly balls, and pop-ups, that would tell us which players are putting the ball into play in ways that are more likely or less likely to lead to hits. Ditto with stats on soft, medium, and hard contact. Exit velocities and launch angles – all of these things would be helpful to know. While teams probably collect such information for their internal use, it is not freely available to the public.
  • Bases on balls are one important kind of offensive production that does not boost a player’s BA or xAVG. Usually, working a walk requires plate discipline and a good eye; and potentially, a walk can score a run. The fact that xAVG does not value walks means that Astros hitters who have drawn a lot of walks this spring have been disadvantaged when viewed by this metric. Chief among them are Evan Gattis, Alex Bregman, Derek Fisher, and Max Stassi.
  • Because the number of spring training games is so small, weather can affect players’ performance to a certain extent. We have all seen days in Florida where the wind either made it conducive to hitting home runs or nearly impossible to hit home runs. Hitting homers is a great way for a player to increase his xAVG; but imagine, for example, that A.J. Hinch primarily played you on days with the wind blowing heavily toward the infield, while the player competing with you for the same job (say… first base) tended to get played on days with the wind blowing out to right. Your teammate could easily get a couple of cheap home runs while you, with the same exit velocities and launch angles, go home with two outs on fly balls caught at the warning track.
  • An offensive player’s running speed can weaken the significance of the xAVG statistic. If a player is fast and steals a lot of bases, a single for him can easily allow him to end up on second base, whereas for his slower teammates, it may be necessary to hit a double to get to second. Therefore a fast player with a slightly lower xAVG than a slow player may actually take more total bases and generate more offense for his team.
  • Steamer’s BABIP projections sometimes prove to be quite inaccurate. Particularly for younger players who have not created a very large data set for Steamer to consider, actual regular season BABIP may differ significantly from the projection. Also, if a player is nursing an undisclosed injury, his actual BABIP may be lower than the projection simply because the projection assumed normal health.
  • The xAVG statistic says nothing about the hitter’s defensive abilities. Particularly when it comes to making roster decisions, A.J. Hinch must consider both the offensive and the defensive value a player brings. Above-average defensive players like Josh Reddick and Jake Marisnick will probably be forgiven if their offensive contributions are not among the team’s best, but below-average defensive players like J.D. Davis, Evan Gattis, and Tyler White will need to help the offense score more runs than their errors and failed defensive plays create for the opponent.
  • Particularly in the late innings of spring training games, umpires sometimes expand the strike zone egregiously. Those hitters unfortunate enough to have a home plate umpire eager to get the game concluded may find that they strike out when, had the strike zone been more rigorously enforced, they might have put the ball in play and thus improved their xAVG.

Spring training xAVG may ultimately prove irrelevant to the batting averages we see in the regular season. Opposing pitching in the regular season will tend to be more challenging than in the spring, and even the seasoned veterans will be doing their utmost in the regular season to get hits and score runs.

In 2017’s spring training, the Astros hitters in competition for a roster spot collectively hit .264 (weighted BA), while in the regular season the team batted .282, an improvement of .018 BA points. We will have to wait and see whether a similar relationship holds up in 2018. So far this year, the Astros hitters surveyed here have put up an average BA of .269 (weighted) and .263 (unweighted) in spring training.
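For readers unsure of the difference between the two group averages, here is a minimal sketch. It assumes that “weighted” means weighting each hitter by his at-bats (total hits divided by total at-bats) and that “unweighted” is the plain mean of the individual batting averages. The three (hits, at-bats) lines are hypothetical and are not the Astros’ actual spring numbers.

```python
def weighted_ba(lines):
    """Group BA weighted by at-bats: total hits / total at-bats."""
    total_hits = sum(hits for hits, _ in lines)
    total_at_bats = sum(at_bats for _, at_bats in lines)
    return total_hits / total_at_bats


def unweighted_ba(lines):
    """Plain mean of each hitter's own BA; every hitter counts equally."""
    averages = [hits / at_bats for hits, at_bats in lines if at_bats > 0]
    return sum(averages) / len(averages)


# Hypothetical (hits, at_bats) pairs, for illustration only.
sample = [(15, 50), (2, 10), (9, 40)]

print(f"weighted:   {weighted_ba(sample):.3f}")
print(f"unweighted: {unweighted_ba(sample):.3f}")
```

In this made-up sample, the hitter with only 10 at-bats pulls the unweighted figure down more than the weighted one, which is why the two numbers can differ for the same group of players.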

Players of Interest

What, if anything, does xAVG tell us about the Astros offense this spring? Perhaps, it tells us a little bit about which players might be doing a good job of putting the ball in play but are looking bad in the stat sheets due to being on the wrong side of the baseball gods for the time being. Conversely, xAVG might help us temper our expectations of players wowing us with hits all over the ball park as they put up actual batting averages that look worthy of Cooperstown one day.

The spring’s offensive heroes to date, Kyle “Ted” Tucker and J.D. Davis, are good even by the xAVG metric, just perhaps not quite as good as their actual batting averages would suggest. MVP Jose Altuve and “Woo-man” Josh Reddick also force themselves into the conversation. Kyle Tucker hit well at AA Corpus Christi last season, but he earned a more modest BA of .265 over 317 plate appearances – far short of the lofty .410 he has produced in the Florida spring this year. Tucker’s xAVG of .312 still leads the team, though – not bad for a young man just barely old enough to order a beer.

J.D. Davis’s five home runs have greatly boosted his xAVG, since we add home runs at full value into the numerator of the xAVG calculation. The reason for this is that there is no way for a fielder to defend against a home run, so luck plays no part in where the ball lands or whether the batter reaches base. If, for example, Davis had hit only two home runs, his xAVG would be a mere .265. It may be relevant to Davis’s home run count that he has faced some of the most lightweight competition in terms of opposing pitching (relative to other Astros hitters), but in the present model we are not taking that factor into consideration. This should not be taken as a knock on Davis, though: “Bubbles”, as he’s sometimes called, has hit with authority all spring long, and looks ready to contribute at the big league level this year.
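To see why home runs move xAVG so much, consider this rough sketch of a “what if he had hit fewer homers” calculation. It rests on an assumption of mine that the article does not spell out: each removed home run is treated as an ordinary ball in play instead. The stat line is hypothetical and is not Davis’s actual spring line.

```python
def xavg(at_bats, strikeouts, home_runs, sac_flies, projected_babip):
    """xAVG = (Projected BABIP * Batted-Balls-in-Play + Home Runs) / At-Bats."""
    bip = at_bats - strikeouts - home_runs + sac_flies
    return (projected_babip * bip + home_runs) / at_bats


def xavg_with_fewer_homers(at_bats, strikeouts, home_runs, sac_flies,
                           projected_babip, homers_removed):
    """Recompute xAVG, assuming each removed homer becomes a ball in play."""
    if not 0 <= homers_removed <= home_runs:
        raise ValueError("homers_removed must be between 0 and home_runs")
    return xavg(at_bats, strikeouts, home_runs - homers_removed,
                sac_flies, projected_babip)


# Hypothetical stat line, for illustration only.
line = dict(at_bats=50, strikeouts=12, home_runs=5, sac_flies=0,
            projected_babip=0.300)

baseline = xavg(**line)
two_homers = xavg_with_fewer_homers(**line, homers_removed=3)
print(f"with 5 HR: {baseline:.3f}")
print(f"with 2 HR: {two_homers:.3f}")
```

Under that assumption, each homer turned into a ball in play costs (1 − projected BABIP) / At-Bats points of xAVG, since a sure hit is replaced by a hit that only happens at the projected BABIP rate.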

Two hitters with low actual batting averages – Tony Kemp and Tyler White – stand out as victims of momentary misfortune on the baseball field. They have not only put the ball into play in 75% or more of their plate appearances, but they have also successfully elevated the ball, posting GO/AO (Ground Out to Air Out) ratios of 0.84 (Kemp) and 0.75 (White). These ratios suggest that if maintained in the regular season, both batters will tend to avoid being put out easily on ground balls and instead have more chances to reach base on line drives and deep fly balls. The good news for Kemp and White is that over time, their actual on-field results should start to come closer to those predicted by their projected BABIP figures. This number is a little higher for the speedy Kemp, but even White’s is not very low given his tendency to hit the ball hard and for distance.

The xAVG metric may or may not suggest that a few Astros sluggers could struggle out of the gate in 2018. Jake Marisnick, who had always been a below-average offensive player at the major league level prior to 2017 (posting a wRC+ of just 59 in 2016), broke out in a big way in 2017, hitting .243 and achieving a wRC+ of 117. Skeptics have contended that Jake from Rake Farm will never match last year’s feat, while others believe he has made lasting changes that will keep him raking for years to come. Marisnick’s spring xAVG of .192 lends some ammunition to the skeptics, but the man who posted some of the team’s highest exit velocities of 2017 could quickly quiet this pessimistic crowd by launching a few balls up onto the train tracks at Minute Maid Park.

Derek Fisher, who flashed hot out of the gate in 2017 but then struggled through the end of the season, has had difficulty putting the ball in play this spring, thus driving his xAVG down to just .208. However, he has enjoyed good fortune on those balls he has put into play, such that his actual batting average comes in at a very good-looking .292. Thanks in part to a healthy number of walks, Fisher has managed to provide an above-average amount of offense to his team, currently sitting on a spring OPS of .886. Fisher is one of the Astros’ strongest hitters as measured by exit velocity; if he can just maintain good contact rates and avoid the ground ball, he stands to be a top contributor of home runs this year. His monster HR/FB rate of 26.3% over 166 plate appearances last year is probably not sustainable over the long term, but it is a sign of the incredible power that Fisher stands to bring to the game.

None of the Astros’ three likely catchers for 2018 (Brian McCann, Max Stassi, Evan Gattis) has batted especially effectively this spring, though McCann’s xAVG of .185 suggests he has been doing much better than his paltry .042 actual batting average indicates. McCann remained absent from spring training games from March 10 to 18, so perhaps he is taking a more carefully regulated approach to his preparations. Let’s not forget that Max Stassi, who in 2013 famously became the Astros’ only rookie to earn his first RBI by taking a pitch to the head, packs a real punch with the bat. In limited at-bats last year, Stassi reached exit velocities as high as 110.5 mph and launched monster-shot home runs of 428 and 442 feet in estimated distance. If Stassi can apply that power to more batted balls in 2018, he may well achieve a high OPS even if his batting average doesn’t crack .250.

Finally, spring xAVG gives us an early look at how Marwin Gonzalez – another of 2017’s breakout stars – is performing at the plate. Prior to last year, Gonzalez had never posted more than a 111 wRC+, which he did in 2015. When his wRC+ jumped to 144 in 2017, many fans wondered whether the season was a fluke or a sign of things to come. Gonzalez’s spring xAVG of .234 is more in line with how he hit back in 2012, a year in which he posted a wRC+ of 65. I like to think that Marwin is just experimenting this spring and taking it easy in sunny Florida.

While spring training statistics may have little predictive value, by using expected BABIP to turn regular batting average into xAVG, we might be able to compare players’ performance over the small spring sample sizes a little more fairly and accurately. This is especially the case for players who are not locks for the opening day roster and who are indeed trying very hard to produce as much offense as possible. In the end, we may regret having to watch some of our favorite players start the season in the minors, but at least the perspective that xAVG provides may help us understand how spring training might have gone had the BABIP luck been spread a little more evenly among teammates.