Batter's Box Interactive Magazine
MLEs, translations of minor league stats to a major league context, were pioneered by Bill James. Conceptually, they are the first cousins of park neutral batting stats. Both attempt to adjust for distortions of the statistics created by the various venues in which professional baseball is played.

James found that by applying the appropriate park, league and level of competition factors, a player's performance in AA or AAA, translated to the big leagues, was a surprisingly accurate predictor of what that player would do in the major leagues.

Testing the accuracy of MLEs can only be done by translating the minor league stats of players who have had significant playing time in the major leagues. This is the heart of the problem, since only a small percentage of regulars in AA and AAA become regulars in the majors. This introduces selection bias. We know fairly well what factors to apply to the players who do make the transition to the majors, but we can't be sure if these are valid for players who never get to the majors or bounce between triple A and the majors over several years. It is likely that the players who make a successful transition to the major leagues have qualities that help them do this, identified by scouts and front office personnel but not necessarily reflected in the batting line.

Take the case of Reed Johnson, Toronto's new temporary (phantom) 4th outfielder. Craig Burley graciously provided an MLE for Johnson's 2001 season in AA Tennessee - .346 OBP/.407 SLG (753 OPS). How can we tell if this is a realistic translation of Johnson's actual performance? One concern is that Johnson was 24 when he put up those numbers in AA: most players who eventually become big league regulars are a couple of years younger when they reach AA.

To answer the question posed in the title would require a detailed evaluation of each of the various MLE methods. Hopefully the reader will find the following analysis akin to a sabermetric stroll in the park - not to be taken too seriously, merely to be enjoyed.

I took that MLE at face value and tried to identify players who, at roughly the same age, actually played in the majors and displayed the hitting ability suggested by that MLE.

I identified all major league outfielders, 1994-2002, who had at least 300 PA for a given team, were around the same age as the 2001 version of Reed Johnson, and produced an OPS between 723 and 783. There were 17 such players, but four of them were eliminated because they had no seasons in AA of 300 PA or more. The following 13 formed the comparison group for Johnson's provisional MLE:

John (F.) Mabry (1995), Tony Tarasco (1995), Brian L. Hunter (1995), Rondell White (1996), Alex Ochoa (1996), Johnny Damon (1998), Magglio Ordonez (1998), Wilton Guerrero (1999), Chad Allen (1999), Torii Hunter (2000), Juan Encarnacion (2000), Kevin Mench (2002), Milton Bradley (2002)

These 13, on average, put up a .332 OBP/.415 SLG (747 OPS) in the majors in their age 24.5 (±0.5) season, which was a little worse than Johnson's provisional 2001 MLE (753 OPS). The average performance of the 13 players in AA was .352 OBP/.463 SLG (815 OPS), compared to Reed Johnson's untranslated .381 OBP/.453 SLG (834 OPS). However, Johnson's 2001 age was 24.56, while the comparison group's average age in AA was 22.03 - a full 2.5 years younger.

As you might imagine, the average hitter progresses quite a bit between age 22.0 and 24.5 - but how much is that exactly? It's difficult to tell, and various methods might be applied to come up with an estimate.

I identified players who were major league regulars before age 22.5 (plus 4 outfielders between 22.5 and 22.7). I whittled down the comparison group by extracting pairs of seasons either 2 or 3 years apart, with each player being no older than 25.2 years in the "old" season.

Guys who hold down regular jobs in the majors at age 22 or younger are often star players, so this group has a lot of top flight talent (A-Rod, Manny, Chavez, Andruw, Jeter, Tejada, Vlad, Green, Rolen and others) but also lesser players (Cristian Guzman, the two Alex G's, Grieve, Wilton Guerrero). This group was selected in order to study how players hit at age 22 compared to a few years later.

The young group (age 22.2) hit .341 OBP/.433 SLG (774 OPS) and the same players 2.5 years later (age 24.7) hit .359 OBP/.471 SLG (830 OPS). I'm going to assume that improvement of this magnitude is typical for professional hitters as a whole, and that a player's OPS typically improves by about 7.2% between those ages. Intuitively, it seems like a good estimate.
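For anyone who wants to check the arithmetic, the 7.2% figure is simply the ratio of the two group OPS values. A minimal Python sketch, using only the numbers quoted above (the function and variable names are mine, chosen for illustration):

```python
def aging_factor(young_ops, old_ops):
    """Ratio of a group's later OPS to its earlier OPS."""
    return old_ops / young_ops

young_ops = 774  # group OPS at age 22.2
old_ops = 830    # same players' OPS at age 24.7

factor = aging_factor(young_ops, old_ops)
print(f"aging factor: {factor:.3f}")
print(f"improvement: {(factor - 1) * 100:.1f}%")
```

Note that this treats the improvement as a single multiplier on OPS, which is the simplifying assumption made in the text; it says nothing about how OBP and SLG develop separately.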

Let's return to our original comparison group. They hit .332 OBP/.415 SLG (747 OPS) in the majors; assuming that they developed normally as hitters, what would they have hit 2.5 years earlier if they had been in the majors? Their estimated OPS would have been 747/1.072 = 697 OPS at age 22.0 (assuming the 7.2% factor applies to them). But at that age, they were in AA and actually produced an 815 OPS (on average).

Based on their performance in AA and the reverse-projection of their major league numbers at age 24.5, the AA to Majors conversion factor for OPS should be in the neighbourhood of 697/815 = 0.855.
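The two steps above (age the major league OPS backwards, then compare it with the actual AA OPS) can be sketched in Python. This is only a restatement of the arithmetic in the text, with variable names of my own choosing, and it rests on the same assumption that the 7.2% aging factor applies to the comparison group:

```python
AGING_FACTOR = 1.072   # assumed OPS growth between age 22.0 and 24.5

mlb_ops_at_24 = 747    # comparison group, majors, age 24.5
aa_ops_at_22 = 815     # same players, AA, age 22.0

# Reverse-project the major league OPS back to age 22.0.
est_mlb_ops_at_22 = round(mlb_ops_at_24 / AGING_FACTOR)

# Ratio of estimated major league OPS to actual AA OPS at the same age.
conversion = est_mlb_ops_at_22 / aa_ops_at_22

print(est_mlb_ops_at_22)
print(f"{conversion:.3f}")
```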

The final assumption is that this conversion factor applies to Reed Johnson's 2001 OPS at AA Tennessee. This is not meant to be a by-the-book sabermetric study: there are things like park factors and league factors to take into account. As well, a sophisticated MLE system needs to take into account the skill set of each player as well as his age. But ...

The conversion factor (ideally done for each individual batting stat) from one level of competition to another is THE most important thing, as well as the most difficult, to pin down. Reed Johnson's 834 OPS translates to an MLE of 713 OPS, fully 40 points lower than the one Craig provided.
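As a quick check of that last number, here is a short Python sketch that applies the 0.855 factor to Johnson's raw AA line and compares the result with the MLE Craig provided. The names are illustrative only, and the single OPS-level factor is the rough shortcut used in this article, not a full MLE method:

```python
AA_TO_MLB = 0.855          # rough AA-to-majors OPS conversion factor

johnson_aa_ops = 834       # .381 OBP + .453 SLG at AA Tennessee, 2001
craig_mle_ops = 753        # .346 OBP + .407 SLG, the provided MLE

johnson_mle = round(johnson_aa_ops * AA_TO_MLB)
gap = craig_mle_ops - johnson_mle

print(johnson_mle)
print(gap)
```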

This is not to say that that's what Reed will hit this year. He's 2 years older and there would be a very good chance that he'd hit better than that, though of course he won't get enough PA to provide a decent sample.

For comparison purposes, here are some other 2001 Tennessee OPS conversions (using 0.855 as the conversion factor):

Josh Phelps .......(588 PA) 828 OPS
Jayson Werth ....(443 PA) 758 OPS
Orlando Hudson ..(349 PA) 732 OPS
Reed Johnson .....(624 PA) 713 OPS
Ryan Fleming .....(398 PA) 675 OPS
Jerson Perez ......(466 PA) 614 OPS
Glenn Williams ....(544 PA) 599 OPS
Dewayne Wise ...(379 PA) 566 OPS
Matt Logan .......(313 PA) 549 OPS

Are MLEs over-optimistic? | 6 comments
The following comments are owned by whomever posted them. This site is not responsible for what they say.
_Jonny German - Friday, April 18 2003 @ 01:43 AM EDT (#20490) #
Nicely done as always, Robert.

What I'm wondering is if this is something that sabermetrics will never conquer, something that will mean traditional scouts never become obsolete. As Robert states, the only way to test MLEs for their accuracy is to see how well they translate for the players who end up spending significant time in the majors. But the average major leaguer is far from an average person or an average athlete. By definition there are many more players who post an .800 OPS in AA than will ever post an .800 OPS in the majors. When we say that an .800 OPS at AA translates to a .684 OPS in the majors we have to add this very large corollary: if that player continues to develop such that he actually does make it to the majors. That's crucial, because the ones that do continue the development, that make it on to AAA and the majors, are the exceptions, the ones that have most everything go right... they overcome or don't have major injuries, their growth curve coincides smoothly with their promotions, they don't have a fundamental flaw in their game that they can only overcome against minor league pitching, they didn't just post the great OBP because a disproportionate number of their grounders skipped through the infield. And so on.

A traditional scout can tell you, or at least give you a feel, that Billy mashes all kinds of pitches from all kinds of pitchers, whereas the numbers will tell you he hit 20 homers but won't tell you that 90% of those came off fastballs from the bottom quartile of pitching talent in the league. Okay, sabermetrics could overcome this, but you'd have to take a large step in the statistical record keeping from the already dizzying detail. Not to say that they won't.

Many factors cannot be readily measured and / or are mostly about luck, good or bad. Robert mentions age compared to level of competition, which is valid but not as accurate as saying "level of development" or "developmental maturity" in relation to the league... and how on earth do you measure that by the numbers? When Reed Johnson put up those good numbers in AA was he at the peak of his developmental curve or was he still learning and getting better as a hitter? In this case it's easy to guess "peak" given his age, but even if he had been exactly league average age we couldn't say with certainty how much more development was left in him.
Gitz - Friday, April 18 2003 @ 02:47 AM EDT (#20491) #
Scouting will never be eliminated, nor should it; whether or not it becomes less important than statistical analysis remains to be seen. Like anything else, a balance of the two would seem the best approach.
_Chuck Van Den C - Friday, April 18 2003 @ 08:01 AM EDT (#20492) #
When we say that an .800 OPS at AA translates to a .684 OPS in the majors we have to add this very large corollary: if that player continues to develop such that he actually does make it to the majors.

Jonny, with all due respect, I think your corollary is superfluous.

If I understand the concept correctly, an MLE is simply meant to answer the question: given a player's performance in the minors in a given season, how would he have hit in the majors that season?

An MLE is not meant to be a predictive tool in and of itself, though it could be used as an input in a forecasting tool.

If Reed Johnson posts what seems to be a high MLE, it is not the MLE's fault if his performance is not considered within a broader context. Was he old for his level? Was his season out of character with his preceding seasons? Etc.
Coach - Friday, April 18 2003 @ 09:12 AM EDT (#20493) #
This is great, Robert, but in Reed Johnson's case, let's remember that he's not here to be a slugger. He got the call because he's a superior defender to Aven and Ryan, and the Jays are hoping he hits about as well. Even his adjusted MLE suggests he's better with the stick than comparable glove men Colangelo and Wise. So until Werth is in a hitting groove, or Gross has dominated AAA, or a better option becomes available, Reed's the 25th man. Actually, he and Linton are tied at 24.5 on my list.

Is there a free Internet source for "the appropriate park, league and level of competition factors" that go into calculating MLE? I rely on generalities (I know the FSL is a pitcher's league from experience and observation, but I couldn't tell you the "best" hitting park in the loop). I'm always curious about the differences between the PCL and the IL: who would prevail in a neutral park between the irresistible force of Edmonton hitters and the immovable object of Richmond pitchers?

The old school/new wave balance is essential to evaluate talent, but statistical analysis also saves time and money. The Jays don't need a bird dog at every sandlot from Aruba to Zimbabwe, looking for needles in haystacks. They are going to "miss" some 16-year-old phenoms that way -- big deal. In Canada, every high school and midget Rep coach (including me) thinks he has a pro prospect or two. These kids should recognize the odds against them, and try to get a college scholarship. If they excel at that level, the Jays (and other teams) will find them. If they don't prove good enough for a ML organization to invest in, they'll get an education and can play in the CBL. Sending dozens of scouts (on expense accounts) to check out tens of thousands of teenage players is not exactly a science; you can still miss Rich Harden. By the time a player is 20 and has played in the NCAA, even a good junior college, or the lowest levels of the minors, his stats become useful indicators -- though not as accurate as AA MLEs -- and your scouts can start taking closer looks at fewer prospects with a better chance of success. The older the player, the more the balance should tilt from tools to data.
_Jonny German - Friday, April 18 2003 @ 12:23 PM EDT (#20494) #
I think we agree on the concept but not on the application, Chuck. The MLE says the player would have had a .684 OPS had he been in the majors, agreed. But if that's where it ends, I say "So what? He wasn't in the majors.". It's not meant to be the complete predictive package, no, but it has zero value other than predictive value.
_Chuck Van Den C - Friday, April 18 2003 @ 01:35 PM EDT (#20495) #
It's not meant to be the complete predictive package, no, but it has zero value other than predictive value.

I disagree, though I feel that we're getting into matters of semantics.

Being able to translate a 975 OPS in Albuquerque to a 650 MLB OPS is very valuable. But not as a predictive tool in and of itself, but rather as an input into a predictive model.

Hopefully such a model would be sophisticated enough to factor in the context of the MLE. Were his numbers achieved in the majors? Are they an MLE for a minor league season? If so, was the player old for his league? How long has he been at that level? What is the player's age? What are his recent MLEs? How do players of a similar profile develop? Etc.

I am quite sure that MLEs serve as inputs into the more sophisticated predictive models of late: Nate Silver's PECOTA model (one that compares a player to others fitting his profile), Gary Huckaby's Vlads (a neural network based model) and BP's (Clay Davenport's?) Wiltons (a neural net model as well, I believe).