Batter's Box Interactive Magazine - Pitcher Evaluation Tools-Part 3-Quality Measurement

Pitcher Evaluation Tools-Part 3-Quality Measurement

At Hall Watch, evaluating quality is Job 1. For starting pitchers, it is not easy, because there are so many choices. You can start with the components (walks, home runs, strikeouts, ground balls, fly balls and line drives) and work from there. You can start with the runs allowed and work from there. Or you can look at the won-loss record and work from there. You might very well use a different method for career evaluation than you would use for seasonal evaluation.

In Part 1 of this series, starter standards 1905-2005 were calculated. In Part 2, we looked at quantity measures for starters throughout the ages. Now, it's quality time.

The Components of Runs Allowed

For batters, we look first and foremost at the components of run production, on-base percentage and slugging percentage, rather than runs scored or driven in. Why is that? For any specific batter, the number of runs that he scores or drives in will depend to a great extent on the performance of the hitters around him in the lineup. We could try to adjust runs scored or driven in to reflect batting order context, but nobody does, because it's just too difficult, even though for Rickey Henderson or Tim Raines, say, it might provide very meaningful information. On the other hand, measuring the components for a particular batter is easy as pie, is totally independent of the performance of other batters in the lineup and provides the information that we need to evaluate quality. It is not quite the same for pitchers.

Walks, home runs allowed and strikeouts are components of runs allowed and are special because they are independent of team defence. They are, aren't they? Well, no. I am not speaking of inside-the-park homers either, but rather of the important contribution that catchers make through pitch selection and through the fine art of framing. Still, the quality of the pitcher is the most important factor in these components. What are the other components of runs allowed?

The other components are:
1) opposition batting average on balls in play (BABIP), more commonly expressed, from a pitcher's perspective, as its approximate complement, defensive efficiency ratio (DER), the share of balls in play turned into outs
2) stolen base/caught stealing, wild pitch/passed ball/balk, double plays per opportunity and runner advancement ratio (first to third on singles, scoring from first on doubles...)
3) errors
4) situational performance differences, and,
5) bullpen support (ratio of bequeathed runners who score).

Defensive Efficiency Ratio for a Pitcher

It is well known that both pitchers and team defence contribute to the pitcher's DER, but over a single season a pitcher's DER is mostly independent of the pitcher's own performance; luck and defence play a greater role over that time frame. Over a number of seasons, though, how important is luck versus the quality of the pitcher in the number of hits that opponents get on balls in play? I did a small cohort study using the Delta H (dH) statistic provided by Baseball Prospectus. Delta H compares the number of hits on balls in play surrendered by a pitcher with the number that would be expected at the team's rate. A negative Delta H is good and signifies that the pitcher has given up fewer hits than would be expected from the team defence. At the same time, I thought that it would be useful to look at the Delta R (dR) statistic to gauge the importance of the factors other than opposition BABIP in runs scored against a pitcher.
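As a rough illustration of the idea behind Delta H as described above, the sketch below compares a pitcher's hits on balls in play with what the team's rate would predict for the same number of balls in play. This is an assumption about the general shape of the calculation, not Baseball Prospectus's published formula, which may differ in detail. The function name and the sample figures are hypothetical.

```python
def delta_h(pitcher_hits_bip, pitcher_bip, team_hits_bip, team_bip):
    """Hits on balls in play above (+) or below (-) the team-rate expectation.

    Assumption: expected hits = pitcher's balls in play * team hit rate on
    balls in play. This mirrors the description in the article, not
    necessarily the published Baseball Prospectus calculation.
    """
    team_rate = team_hits_bip / team_bip
    expected_hits = pitcher_bip * team_rate
    return pitcher_hits_bip - expected_hits


# Made-up season: the team allows hits on 30% of balls in play,
# while this pitcher allows 170 hits on 600 balls in play.
example = delta_h(pitcher_hits_bip=170, pitcher_bip=600,
                  team_hits_bip=1350, team_bip=4500)
print(round(example, 1))  # negative means fewer hits than expected
```

Whether the team rate should exclude the pitcher's own balls in play is a design choice that this sketch leaves open.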

I wanted to find out about power pitchers and the impact of handedness, so I started with five well-known power pitchers from different eras: Walter Johnson, Lefty Grove, Bob Feller, Sandy Koufax and Tom Seaver. For each one's cohort, I chose the pitcher of the opposite hand on his club who pitched the most innings over a period of years. The cohorts were Joe Boehling (1912-1916) and Tom Zachary (1919-1925) for Walter Johnson, George Earnshaw (1928-1933) for Lefty Grove, Al Milnar (1938-1941) for Bob Feller, Don Drysdale (1956-1966) for Sandy Koufax and Jerry Koosman (1967-1977) for Tom Seaver.

Before comparing the pitchers and their cohorts during the specific seasons, here are their career dH and dR statistics:

Pitcher             dH     dR
Walter Johnson    -320    -62
Joe Boehling        42    -46
Tom Zachary         93   -146

Lefty Grove        -44   -137
George Earnshaw     -3     -8

Bob Feller         -52    -56
Al Milnar           21      6

Sandy Koufax       -94      6
Don Drysdale       -12    -48

Tom Seaver        -199    -34
Jerry Koosman       61    -32

At first glance, the power pitcher seems to have a marked advantage in preventing hits on balls in play. Let's compare the power pitchers and their teammate cohorts during the seasons under review:

Pitcher   Seasons     IP     H    dH     R    dR
Johnson   1912-16   1793  1326  -106   421   -53
Boehling  1912-16    805   732    30   329   -40

Johnson   1919-25   1746  1631   -91   682   -21
Zachary   1919-25   1383  1576    21   677   -54

Grove     1928-33   1683  1577    -9   596   -78
Earnshaw  1928-33   1353  1345    -5   723    -1

Feller    1938-41   1237   981   -61   472   -38
Milnar    1938-41    749   780     5   392    -8

Koufax    1956-66   2282  1722   -93   791    10
Drysdale  1956-66   2848  2543    -8  1089   -29

Seaver    1967-77   2814  2230   -65   849   -56
Koosman   1967-77   2310  2060    17   884   -28

Three of the five power pitchers (Johnson, Feller and Koufax) showed significant negative dH (more than 5% of hits allowed). Tom Seaver also showed a marginally significant result of almost 3%. It could be said that quality rather than power pitching is the key factor, as the power pitchers were all better than their cohorts. Comparing Catfish Hunter, who rang up amazing dHs of -35, -22, -28 and -49 in successive years from 1972 to 1975, with Vida Blue and his more typical single-digit figures lends support to this.
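For readers who want to check the percentages, here is a minimal sketch that expresses each power pitcher's dH as a share of hits allowed, using the figures from the table above. The 5% cut-off is the one used in the text; the function and variable names are only illustrative.

```python
# (pitcher, seasons, hits allowed, dH) copied from the table above
rows = [
    ("Johnson", "1912-16", 1326, -106),
    ("Johnson", "1919-25", 1631, -91),
    ("Grove", "1928-33", 1577, -9),
    ("Feller", "1938-41", 981, -61),
    ("Koufax", "1956-66", 1722, -93),
    ("Seaver", "1967-77", 2230, -65),
]


def dh_share_of_hits(hits, dh):
    """dH as a percentage of hits allowed (negative = fewer hits than expected)."""
    return 100.0 * dh / hits


shares = {(name, seasons): dh_share_of_hits(hits, dh)
          for name, seasons, hits, dh in rows}

for (name, seasons), pct in shares.items():
    flag = "over 5%" if pct < -5.0 else ""
    print(f"{name:8s} {seasons}  {pct:6.1f}%  {flag}")
```

Both Johnson periods, Feller and Koufax clear the 5% mark, Seaver lands just under 3%, and Grove is under 1%.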

Whatever the cause, it is pretty clear that some pitchers are able to sustain the ability to surrender many fewer hits on balls in play than would be expected from the defence behind them. I suspect that this is disproportionately so among pitchers who we will be evaluating for Hall Watch purposes. This is one reason why I would not use fielding independent measures for this purpose.

For current pitchers, we could use ball in play information (line-drive rate, pop-up rate) to refine the fielding independent measures. Unfortunately, this information is not accessible for most of the pitchers who we will be comparing today's contenders with.

Next week, we will look at the other run components and the transition from runs allowed to wins.

Posted by Mike Green on Friday, December 30 2005 @ 08:00 AM EST.

Pitcher Evaluation Tools-Part 3-Quality Measurement | 6 comments

The following comments are owned by whomever posted them. This site is not responsible for what they say.

birdwatcher - Friday, December 30 2005 @ 02:51 PM EST (#138231) #

Interesting idea, but you need a lot more than just 4 or 5 comparisons before drawing any conclusions. Also, the power pitcher designation seems unnecessary. Why not just run team-by-team comparisons covering all starting pitchers who reach some minimum workload, say 100 IP, and then see where the variations occur? I suspect the results will look pretty random, reconfirming McCracken’s original hypothesis that pitchers have minimal control over hits on balls in play.

Also, be careful before calling 3% to 5% differences in Delta H “significant.” If I toss a coin 1000 times and get 525 heads, that’s still not large enough to conclude it’s a defective coin. In fact, it’s not even close.

Finally, making percentage comparisons between Delta H scores and total hits allowed tends to give a misleading picture. Let’s take the minus 94 of Koufax. That might look like a big number compared to his total hits allowed of 1722, but it’s worth remembering that Koufax had almost 2300 IP. That means he probably gave up at least 7000 balls in play. So, his 94 score amounts to little more than one per cent of all his balls in play.

McCracken's work has held up well over the years, so if power pitchers have an advantage over soft tossers, it likely has more to do with their disparate K-rates than with differences in hits on balls in play. But, hey, half the fun is in the chase, so let's keep looking!
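The coin-toss comparison in this comment can be checked with a short calculation. The sketch below uses the normal approximation to the binomial, which is an assumption of this example rather than something the commenter specified, and the helper names are illustrative. It also works out the commenter's "little more than one per cent" figure from his own estimate of 7000 balls in play.

```python
import math


def binomial_z(successes, trials, p):
    """Standard score of an observed count under a binomial(trials, p) model."""
    expected = trials * p
    sd = math.sqrt(trials * p * (1.0 - p))
    return (successes - expected) / sd


def two_sided_p(z):
    """Two-sided tail probability under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_coin = binomial_z(525, 1000, 0.5)
p_coin = two_sided_p(z_coin)
print(f"z = {z_coin:.2f}, two-sided p = {p_coin:.3f}")

# Koufax's -94 relative to the commenter's estimate of 7000 balls in play
koufax_share = 100 * 94 / 7000
print(f"{koufax_share:.1f}% of balls in play")
```

Under these assumptions, 525 heads is a little under 1.6 standard deviations from the expected 500, short of the usual 2-standard-deviation threshold.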

Mike Green - Friday, December 30 2005 @ 11:26 PM EST (#138264) #

McCracken has acknowledged that pitchers do influence the outcome of balls in play to a relatively modest degree. I'd be interested in his take on Walter Johnson's -320.

birdwatcher - Saturday, December 31 2005 @ 03:13 PM EST (#138320) #

There’s no doubt WJ’s score is statistically significant; in fact, it’s probably about 6 standard deviations removed from the mean! But I wonder about the relevance of these early-century stats when discussing pitchers’ control over balls in play. WJ’s career was spent with some of the worst teams in MLB, so his dH score may have more to do with the gross incompetence of his pitching colleagues than with his own ability to control balls in play. Stephen Jay Gould and others have noted and explained that there was a much greater disparity in skill levels during the early years of baseball compared to today, and I suspect these disparities may affect the interpretation of statistical measures such as the dH score. Think about it as if we did a study today based on A-level pitchers. At this level, pitchers probably vary widely in their ability to control balls in play, but these differences tend to disappear at the MLB level as the underperformers are weeded out. I’d argue that testing various ball-in-play theories/hypotheses should definitely focus on post-1920 data, and I suspect using post-1950 or later data is probably good enough.

Mike Green - Saturday, December 31 2005 @ 06:59 PM EST (#138343) #

Birdwatcher, what's interesting about Walter Johnson is the Tom Zachary comparison. Zachary seems to have been a pretty fair pitcher himself and threw into his 40s. Notice the dHs and dRs.

My reading of the data (and it's a purely subjective one) is that Johnson was a fabulous athlete in his early 20s, and that he did everything very well. He was an above-average hitter (not for a pitcher, but as a hitter, period). I'll bet that he was a fine fielder and held runners on. This led to his excellent dR as a young pitcher. Since he was the pre-eminent strikeout artist of his time, I am guessing that he pitched up in the strike zone more than most and generated many more pop-ups and fewer line drives than others.

As he aged, he was a less impressive fielder, and he lost a mile or two on his fastball. He still got more than his share of pop-ups and fewer line drives than typical. Hence, the less impressive dH and dR numbers.

Incidentally, the heads/tails analogy doesn't really work for dH. If Koufax had 7000 BIP, most of those would be expected to be outs. It's not a 50-50 coin. That's why the relevant dH comparison is to hits.

Happy new year to all Bauxites.

birdwatcher - Saturday, December 31 2005 @ 09:12 PM EST (#138346) #

It’s true that hit probability on balls in play is not 50% as in the coin toss (more like 30%) but that doesn’t invalidate the relevance of the statistical tests. The point is, regardless of initial probabilities, there are always going to be outliers in any population, and you use statistical testing to determine if they carry any special significance (i.e. the coin is not fair / the pitcher has some control over balls in play).
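To illustrate that the same kind of test works when the probability is not 50%, here is a small sketch that puts Koufax's dH in standard-deviation units. It uses the rough figures quoted in this thread (about 7000 balls in play, a hit rate near 30%) and assumes each ball in play is an independent trial, which is a simplification made for this example.

```python
import math

HIT_RATE = 0.30   # commenter's rough figure for hits on balls in play
BIP = 7000        # commenter's rough estimate for Koufax
DELTA_H = -94     # Koufax's career dH from the article

# Binomial standard deviation of the hit count at this rate
sd_hits = math.sqrt(BIP * HIT_RATE * (1.0 - HIT_RATE))
z = DELTA_H / sd_hits
print(f"SD of hits: {sd_hits:.1f}, dH in SD units: {z:.2f}")
```

With these inputs the gap is a bit under two and a half standard deviations, though the result moves with the assumed number of balls in play and hit rate.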

For example, if you do a study of 1000 starting pitchers, you’d expect to find about 25 pitchers with negative D-Scores more than 2 standard deviations removed from the mean, and a handful of pitchers (one-half of 1% or about 5 pitchers) would be 3 standard deviations from the mean. Similarly, out of 1000 fair coins, there will be 25 coins that end up showing an unusually large number of heads, and there will be a handful which in the Old West would probably cost the coin tosser his life even though they are actually fair coins.
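The expected number of outliers can be estimated directly if the scores are assumed to follow a normal distribution, which is an assumption of this sketch and not something established in the thread. The names below are illustrative.

```python
import math


def lower_tail(z):
    """P(Z < -z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


N_PITCHERS = 1000
expected = {k: N_PITCHERS * lower_tail(k) for k in (2, 3)}
for k, count in expected.items():
    print(f"more than {k} SD below the mean: about {count:.1f} of {N_PITCHERS}")
```

Under a normal assumption this gives roughly 23 pitchers beyond 2 standard deviations on the negative side, close to the 25 quoted, and between 1 and 2 pitchers beyond 3 standard deviations, so the "about 5" figure looks generous unless the tails are fatter than normal.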

That’s the problem with zeroing in on specific pitchers. Without some idea what the overall population looks like, it’s impossible to know whether we’re looking at a truly unique athlete as opposed to just normal statistical variations. All sorts of reasons may be cited why Pitcher A really was different and actually could control balls in play – but in the absence of some supporting statistical evidence, we really are just guessing.

Koufax, Seaver, Feller and Grove - all had big, negative dH scores, all are HOF superstars, so obviously they had some control over balls in play, right? Well, let me give you some other names: Luis Tiant (3500+ IP, -151 dH), Andy Messersmith (2000+ IP, -131 dH), Earl Wilson (2000+ IP, -46 dH). And that’s without even looking too hard. All solid, serviceable pitchers but not a superstar among them. Despite the impressive numbers, I’m willing to bet they had no more control over balls in play than any other pitcher. But at least they can say they share the same standard deviation with Koufax, Seaver, Feller and Grove!

OK – it’s off to New Year’s now. See you in 2006 Bauxites.

Mike Green - Sunday, January 01 2006 @ 05:12 PM EST (#138375) #

I checked BBRef's list of career pitching leaders by ERA+, and looked at starters from 1950 on. Out of 29 starters with ERA+ over 120, 3 had positive dHs: Jose Rijo (29), Kevin Brown (16) and Whitey Ford (3). On the negative side, Dave Stieb had a dH of -142 (we all knew that that slider was tough to hit). Jim Palmer had a dH of -169. Andy Messersmith and Sandy Koufax we covered. Bob Gibson had a dH of -73. Greg Maddux has a -69. Tom Glavine has a -60. John Smoltz has a -44. I haven't done the total negative dH, but it's likely over 1500 for the 26 pitchers.

Looking at the Messersmith, Catfish Hunter, Palmer and 90s Atlanta starter numbers suggests that it's the overall quality of the pitcher, rather than the mph on the fastball, that correlates with a negative dH.
