Are MLEs over-optimistic?

Friday, April 18 2003 @ 12:14 PM EDT

Contributed by: robertdudek

MLEs, translations of minor league stats to a major league context, were pioneered by Bill James. Conceptually, they are the first cousins of park neutral batting stats. Both attempt to adjust for distortions of the statistics created by the various venues in which professional baseball is played.

James found that by applying the appropriate park, league and level of competition factors, a player's performance in AA or AAA, translated to the big leagues, was a surprisingly accurate predictor of what that player would do in the major leagues.

Testing the accuracy of MLEs can only be done by translating the minor league stats of players who have had significant playing time in the major leagues. This is the heart of the problem: only a small percentage of regulars in AA and AAA become regulars in the majors, which introduces selection bias. We know fairly well what factors to apply to the players who do make the transition to the majors, but we can't be sure whether those factors are valid for players who never get to the majors or who bounce between AAA and the majors over several years. It is likely that the players who make a successful transition to the major leagues have qualities that help them do this, identified by scouts and front office personnel but not necessarily reflected in the batting line.

Take the case of Reed Johnson, Toronto's new temporary (phantom) 4th outfielder. Craig Burley graciously provided an MLE for Johnson's 2001 season in AA Tennessee - .346 OBP/.407 SLG (753 OPS). How can we tell if this is a realistic translation of Johnson's actual performance? One concern is that Johnson was 24 when he put up those numbers in AA: most players who eventually become big league regulars are a couple of years younger when they reach AA.

To answer the question posed in the title would require a detailed evaluation of each of the various MLE methods. Hopefully the reader will find the following analysis akin to a sabermetric stroll in the park - not to be taken too seriously, merely to be enjoyed.

I took that MLE at face value and looked for players who actually played in the majors at roughly the same age and displayed the hitting ability suggested by that MLE.

I identified all major league outfielders, 1994-2002, who had at least 300 PA for a given team, were around the same age as the 2001 version of Reed Johnson, and produced an OPS between 723 and 783. There were 17 such players, but four of them were eliminated because they had no seasons in AA of 300 PA or more. The following 13 formed the comparison group for Johnson's provisional MLE:
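The selection rule above can be sketched as a simple filter. This is only an illustration: the `Season` record, its field names, and the placeholder rows are hypothetical and are not real stat lines, and the age window of 24.5 plus or minus 0.5 is an assumption borrowed from the age range quoted for the group's major league seasons.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    age: float   # seasonal age in the major league season
    pa: int      # plate appearances for one team in that season
    ops: int     # OPS x 1000, as quoted in the article
    aa_pa: int   # most PA the player had in any single AA season


TARGET_OPS = 753   # Johnson's provisional MLE
OPS_WINDOW = 30    # gives the 723-783 range
TARGET_AGE = 24.5  # assumed centre of "around the same age"
AGE_WINDOW = 0.5   # assumed width of the age window
MIN_PA = 300


def in_comparison_group(s: Season) -> bool:
    """Apply the playing-time, age, OPS and AA playing-time screens."""
    return (
        s.pa >= MIN_PA
        and abs(s.age - TARGET_AGE) <= AGE_WINDOW
        and abs(s.ops - TARGET_OPS) <= OPS_WINDOW
        and s.aa_pa >= MIN_PA
    )


# Placeholder rows, invented purely to exercise each screen.
seasons = [
    Season("Placeholder A", 24.3, 512, 760, 450),  # passes every screen
    Season("Placeholder B", 24.6, 410, 741, 0),    # no 300-PA season in AA
    Season("Placeholder C", 26.1, 600, 750, 500),  # too old
    Season("Placeholder D", 24.9, 305, 790, 380),  # OPS outside 723-783
]

group = [s.player for s in seasons if in_comparison_group(s)]
print(group)
```

With real season records in place of the placeholders, the same filter would reproduce the step from 17 candidates down to 13 once the AA playing-time screen is applied.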

John (F.) Mabry (1995), Tony Tarasco (1995), Brian L. Hunter (1995), Rondell White (1996), Alex Ochoa (1996), Johnny Damon (1998), Magglio Ordonez (1998), Wilton Guerrero (1999), Chad Allen (1999), Torii Hunter (2000), Juan Encarnacion (2000), Kevin Mench (2002), Milton Bradley (2002)

These 13, on average, put up a .332 OBP/.415 SLG (747 OPS) in the majors in their age 24.5 (±0.5) season, which was a little worse than Johnson's provisional 2001 MLE (753 OPS). The average performance of the 13 players in AA was .352 OBP/.463 SLG (815 OPS), compared to Reed Johnson's untranslated .381 OBP/.453 SLG (834 OPS). However, Johnson's 2001 age was 24.56, while the comparison group's average age was 22.03 - a full 2.5 years younger.
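As a quick check on the figures quoted so far, OPS is simply OBP plus SLG, written here times 1000 as in the article. The sketch below recomputes each OPS and the age gap; the dictionary keys and helper name are illustrative labels only.

```python
def ops(obp: float, slg: float) -> int:
    """OPS expressed as an integer (x 1000), matching the article's style."""
    return round((obp + slg) * 1000)


lines = {
    "Johnson MLE (2001)": (0.346, 0.407),
    "Group in majors":    (0.332, 0.415),
    "Group in AA":        (0.352, 0.463),
    "Johnson in AA":      (0.381, 0.453),
}

ops_by_line = {label: ops(obp, slg) for label, (obp, slg) in lines.items()}
for label, value in ops_by_line.items():
    print(f"{label:20s} {value}")

# Age gap between Johnson in AA and the comparison group in AA.
age_gap = round(24.56 - 22.03, 2)
print("age gap:", age_gap)
```

The gap of about 2.5 years is what makes the raw AA lines hard to compare directly, even though the two OPS figures (834 and 815) look similar.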

As you might imagine, the average hitter progresses quite a bit between age 22.0 and 24.5 - but how much is that exactly? It's difficult to tell, and various methods might be applied to come up with an estimate.

For a rough estimate, I identified players who were major league regulars before age 22.5 (plus 4 outfielders between 22.5 and 22.7). I whittled down this second group by extracting pairs of seasons either 2 or 3 years apart, with each player being no older than 25.2 years in the "old" season.

Guys who hold down regular jobs in the majors at age 22 or younger are often star players, so this group has a lot of top flight talent (A-Rod, Manny, Chavez, Andruw, Jeter, Tejada, Vlad, Green, Rolen and others) but also lesser players (Cristian Guzman, the two Alex G's, Grieve, Wilton Guerrero). This group was selected in order to study how players hit at age 22 compared to a few years later.

The young group (age 22.2) hit .341 OBP/.433 SLG (774 OPS) and the same players 2.5 years later (age 24.7) hit .359 OBP/.471 SLG (830 OPS). I'm going to assume that improvement of this magnitude is typical for professional hitters as a whole, and that a player's OPS typically improves by about 7.2% between those ages. Intuitively, it seems like a good estimate.
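The 7.2% figure follows directly from the two group OPS numbers. A minimal sketch of that arithmetic, with variable names chosen only for illustration:

```python
young_ops = 774   # group OPS (x 1000) at age 22.2
older_ops = 830   # same players at age 24.7

growth = older_ops / young_ops        # ratio of later OPS to earlier OPS
pct_improvement = (growth - 1) * 100  # the same ratio as a percentage gain

print(f"growth factor: {growth:.3f}")
print(f"improvement:   {pct_improvement:.1f}%")
```

Note that this treats the improvement as multiplicative, so it scales with the starting OPS rather than adding a fixed number of points.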

Let's return to our original comparison group. They hit .332 OBP/.415 SLG (747 OPS) in the majors; assuming that they developed normally as hitters, what would they have hit 2.5 years earlier if they had been in the majors? Their estimated OPS would have been 747/1.072 = 697 OPS at age 22.0 (assuming the 7.2% factor applies to them). But at that age, they were in AA and actually produced an 815 OPS (on average).

Based on their performance in AA and the reverse-projection of their major league numbers at age 24.5, the AA to Majors conversion factor for OPS should be in the neighbourhood of 697/815 = 0.855.
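The two steps behind the conversion factor, aging the group's major league OPS backwards and then dividing by what they actually did in AA, can be laid out as follows. This is a sketch of the arithmetic in the text, with illustrative variable names, and it inherits the assumption that the 7.2% aging factor applies to this group.

```python
AGING_FACTOR = 1.072   # OPS growth from about age 22.0 to 24.5 (assumed to apply)

mlb_ops_at_24 = 747    # comparison group, actual major league OPS
aa_ops_at_22 = 815     # same players, actual AA OPS about 2.5 years earlier

# Step 1: what the group might have hit in the majors at age 22.0.
est_mlb_ops_at_22 = mlb_ops_at_24 / AGING_FACTOR

# Step 2: ratio of estimated major league OPS to actual AA OPS at the same age.
conversion_factor = est_mlb_ops_at_22 / aa_ops_at_22

print(f"estimated MLB OPS at 22.0: {est_mlb_ops_at_22:.0f}")
print(f"AA-to-majors factor:       {conversion_factor:.3f}")
```

Rounding the intermediate value to 697 before dividing, as the text does, gives the same factor to three decimal places.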

The final assumption to make is to apply that conversion factor to Reed Johnson's 2001 OPS at AA Tennessee. This is not meant to be a by-the-book sabermetric study - there are things like park and league factors to take into account. As well, a sophisticated MLE system needs to take into account the skill set of each player as well as their age. But ...

The conversion factor (ideally done for each individual batting stat) from one level of competition to another is THE most important thing to pin down, as well as the most difficult. Reed Johnson's 834 OPS translates to an MLE of 713 OPS (834 x 0.855), fully 40 points lower than the one Craig provided.
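Applying the factor to Johnson's AA line is a single multiplication. The sketch below uses illustrative names and a flat factor on OPS only, which is the simplification the text itself flags.

```python
CONVERSION_FACTOR = 0.855   # AA-to-majors OPS factor estimated above

johnson_aa_ops = 834        # untranslated 2001 OPS at AA Tennessee
provided_mle_ops = 753      # the MLE Craig Burley supplied

johnson_mle_ops = round(johnson_aa_ops * CONVERSION_FACTOR)
gap = provided_mle_ops - johnson_mle_ops

print("converted OPS:", johnson_mle_ops)
print("points below the provided MLE:", gap)
```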

This is not to say that that's what Reed will hit this year. He's 2 years older and there would be a very good chance that he'd hit better than that, though of course he won't get enough PA to provide a decent sample.

For comparison purposes, here are some other 2001 Tennessee OPS conversions (using 0.855 as the conversion factor):

Player            PA    Converted OPS
Josh Phelps       588   828
Jayson Werth      443   758
Orlando Hudson    349   732
Reed Johnson      624   713
Ryan Fleming      398   675
Jerson Perez      466   614
Glenn Williams    544   599
Dewayne Wise      379   566
Matt Logan        313   549


https://www.battersbox.ca/article.php?story=20030418121416999