Batter's Box Interactive Magazine - What Does Pythagoras Say?

There is a fairly predictable relationship between runs scored and allowed, and games won and lost. It does stand to reason, no?

Well, way back in the dawn of pre-history (I believe it was in the 1980 Baseball Abstract, to be precise), Bill James developed a very simple formula to derive a team's expected winning percentage, based on how many runs they scored and allowed.

This formula has been passed down to us through the years. It is known as The Pythagorean Method, and there is a pithy explication of it over at The Hardball Times:

     A formula for converting a team’s Run Differential into a projected Won/Loss 
     record. The formula is RS^2/(RS^2+RA^2). Teams’ actual won/loss records tend 
     to mirror their Pythagorean records, and variances can usually be attributed
     to luck.

Variances can usually be attributed to luck - because as a rule a team that wins more (or fewer) games than its run differential suggests does not make a habit of it. A team that is five games over one year is just as likely to be five games under the next year. There's nothing that suggests that it's a tangible ability, or the product of a style of play... it's mainly just random luck.
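
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula quoted above. The function names are made up for illustration and the 162-game default is an assumption; the sample inputs are Toronto's 2005 runs scored and allowed.

```python
def pythagorean_pct(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


def expected_record(runs_scored, runs_allowed, games=162):
    """Round the expected wins to a whole number; the rest are losses."""
    wins = round(pythagorean_pct(runs_scored, runs_allowed) * games)
    return wins, games - wins


# Toronto, 2005: 775 runs scored, 705 allowed
toronto_pct = pythagorean_pct(775, 705)
toronto_record = expected_record(775, 705)
print(f"{toronto_pct:.3f}", toronto_record)
```

Calling expected_record with any other team's run totals gives that team's projected record the same way.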

So what should the 2005 standings have looked like?

2005 American League Standings
                Expected                     Actual
EAST             W   L  PCT  GB   RS   RA      W   L   PCT  Diff
Boston          91  71 .561   -  910  805     95  67  .586    +4
NY Yankees      90  72 .558   1  886  789     95  67  .586    +5
Toronto         89  73 .547   2  775  705     80  82  .494    -9
Baltimore       73  89 .454  18  729  800     74  88  .457    +1
Tampa Bay       63  99 .391  28  750  936     67  95  .414    +4

CENTRAL
Cleveland       97  65 .602   -  790  643     93  69  .574    -4
Chicago Sox     92  70 .569   5  741  645     99  63  .611    +7
Minnesota       84  78 .519  13  688  662     83  79  .512    -1
Detroit         74  88 .458  23  723  787     71  91  .438    -3
Kansas City     58 104 .360  39  702  935     56 106  .346    -2

WEST
LA Angels       95  67 .583   -  761  643     95  67  .586     0
Oakland         94  68 .579   1  772  658     88  74  .543    -6
Texas           82  80 .504  13  865  858     79  83  .488    -3
Seattle         75  87 .464  20  699  751     69  93  .426    -6

2005 National League Standings
EAST
Atlanta         92  70 .566   -  769  674     90  72  .556    -2
NY Mets         90  72 .554   2  722  648     83  79  .512    -7
Philadelphia    90  72 .553   2  807  726     88  74  .543    -2
Florida         79  83 .490  13  717  732     83  79  .512    +4
Washington      77  85 .474  15  639  673     81  81  .500    +4

CENTRAL
St. Louis      100  62 .617   -  805  634    100  62  .617     0
Houston         91  71 .564   9  693  609     89  73  .549    -2
Milwaukee       84  78 .520  16  726  697     81  81  .500    -3
Chicago Cubs    80  82 .492  20  703  714     79  83  .488    -1
Cincinnati      74  88 .460  26  820  889     73  89  .451    -1
Pittsburgh      71  91 .439  29  680  769     67  95  .414    -4

WEST
San Diego       76  86 .470   -  684  726     82  80  .506    +6
LA Dodgers      73  89 .452   3  685  755     71  91  .438    -2
San Francisco   70  92 .431   6  649  745     75  87  .463    +5
Colorado        69  93 .424   7  740  862     67  95  .414    -2
Arizona         64  98 .398  12  696  856     77  85  .475   +13

Yep, while the four NL post-season teams would be unchanged, the Yankees and White Sox would have missed the post-season, replaced by Cleveland and Oakland. (Note: the "winning percentage" given above is actually an expression of the relationship between runs scored and allowed. That figure is then multiplied by 162 to produce the expected wins and losses, which are of course rounded off. And this is why the three 90-72 teams all have slightly different winning percentages. In case you were wondering!)
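
The rounding is easy to see in a few lines of Python. This is only a sketch: the variable names are illustrative, and the run totals are the ones in the table for the three teams with a 90-72 expected record.

```python
GAMES = 162

# (runs scored, runs allowed) for the three teams projected at 90-72
teams = {
    "NY Yankees": (886, 789),
    "NY Mets": (722, 648),
    "Philadelphia": (807, 726),
}

results = {}
for name, (rs, ra) in teams.items():
    pct = rs**2 / (rs**2 + ra**2)   # the Pythagorean "winning percentage"
    raw_wins = pct * GAMES          # fractional expected wins
    wins = round(raw_wins)          # rounded off for the standings
    results[name] = (round(pct, 3), wins, GAMES - wins)
    print(f"{name:<13} {pct:.3f}  {raw_wins:5.2f} wins -> {wins}-{GAMES - wins}")
```

All three fractional win totals sit within half a game of 90, which is why three different percentages collapse to the same record.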

I think this is one of the two best reasons out there for Toronto fans to be waiting impatiently for 2006. (The other, of course, is the thought of Roy Halladay starting 33 games instead of 19. The Blue Jays had the best pitching in the division by far even though Halladay missed the entire second half.)

Why did the Blue Jays miss their expected Won-Loss record by the largest margin of any team in the major leagues?

Well, no one really knows for sure. But we do know this. The Blue Jays went 16-31 in games decided by one run this past season. No team in baseball had a worse winning percentage in one-run games. If they had just broken even, if they had gone 24-23... we're looking at an 88-74 record.
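
The what-if is simple bookkeeping, sketched here in Python with illustrative variable names. "Breaking even" over an odd number of games is taken to mean one game over .500, which is the 24-23 used above.

```python
actual_wins, actual_losses = 80, 82
one_run_wins, one_run_losses = 16, 31

one_run_games = one_run_wins + one_run_losses   # 47 one-run games
break_even_wins = (one_run_games + 1) // 2      # 24, the nearest thing to even over 47 games
extra_wins = break_even_wins - one_run_wins     # losses that would have flipped to wins

what_if = (actual_wins + extra_wins, actual_losses - extra_wins)
print(what_if)
```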

A team's record in one-run games is also mainly a matter of random chance - it's not something that repeats year after year.

I promise to explore this specific subject at a later date. But for now, let's just take note of this. It is not the case that a team with, say, a superior bullpen does better in one-run games. The Cleveland Indians had arguably the best bullpen in the AL this past year, and they went 22-36 in one-run games. That was the most losses by a single run in all of the majors, and an excellent reason to make them the pre-season favourite in the AL Central next year.

It is also not necessarily the case that a small-ball team - a team in the habit of playing for one run at a time - can be counted on to have a superior record in close games. It is indeed true that the team with the best record in one-run games this past season (35-19, .648) were the Chicago White Sox. The White Sox did lead the AL in sac hits, but they also hit more home runs than the Boston Red Sox. The Red Sox, who scored more runs than any team in the universe, had a record in close games (27-15, .643) almost as good as Chicago's. The best team in the NL (28-18, .609) in one-run games was the Arizona Diamondbacks, a team whose offense is built almost entirely around drawing walks and hitting home runs.

It might be a coincidence - but I doubt it. The two teams that won the most games over and above what their runs scored and allowed led us to expect were the Chicago White Sox and the Arizona Diamondbacks. Do not expect this pattern to be repeated next year.

Posted by Magpie on Monday, October 03 2005 @ 09:30 AM EDT.



Mike Green - Monday, October 03 2005 @ 09:51 AM EDT (#129484) #

The idea that teams that underperform their Pythagorean record tend to improve the following season and those that overperform tend to decline has been proven, and is known as the Johnson effect, after former Globe and Mail writer and early Bill James contributor Bryan Johnson.

Flex - Monday, October 03 2005 @ 10:41 AM EDT (#129493) #

This is fascinating and encouraging. Is there any chance you data table gurus could give us a chart showing the Blue Jay trends on the Pythagorean front? Just to get a sense of how the team has risen and fallen compared to their Pythagorean record of the year before. If it's as you say, the team with the worst relationship to their Pythagorean record -- our Jays -- would stand to take the biggest leap next year.

Something makes me wonder, though, whether certain intangibles, and not just luck, come into play. In other words, the Red Sox and the Yankees both outperformed their Pythagorean stats, and might not this have something to do with their assumption of winning, and could this help explain the streak of the Atlanta Braves?

I know a lot of the posters here discount the effect of such intangibles, but I wonder if the stats themselves might prove that certain teams with a winning tradition (not that the Red Sox have a winning tradition, but let's say they've acquired a winning mindset) tend to outperform their Pythagorean records more often than not.

And, therefore, might a recent but well entrenched tradition of losing, as our Jays have built, tend to push actual performance in the other direction?

Rob - Monday, October 03 2005 @ 10:47 AM EDT (#129494) #

I find it interesting that while it's accepted as fact that the Jays play hard in every game (or 155 out of 162, whatever), they had the worst record relative to their Pythagorean record in all of major league baseball.

Mike Green - Monday, October 03 2005 @ 10:52 AM EDT (#129495) #

The Yankees have beaten their Pythagorean projection easily each of the last 2 years. I attribute it to Rivera and Gordon and totally useless middle relief, combined with a strong and old offence. The Yankees did get absolutely trounced a couple of times; Torre pulled his veterans, put in the useless middle relief and the score got out of hand. As a result, the Yanks did not do as well in blowouts, as you'd expect a really good team to do.

But, the team that blew away Pythagoras was Arizona. And if you can come up with a rational explanation for that, more power to you.

Pistol - Monday, October 03 2005 @ 11:06 AM EDT (#129496) #

While the Jays underperformed relative to R/RA, according to BP the Jays were right in line with their record of 80-82 based on EqA.

http://www.baseballprospectus.com/statistics/standings.php

And, for what it's worth, the Jays record in:
1 run games 16-31
2-3 run games 29-29
4+ run games 35-22

Mike Green - Monday, October 03 2005 @ 11:10 AM EDT (#129499) #

Actually, by the time yesterday's results are accounted for, the Jays' third order record will probably be 82-80.

Craig B - Monday, October 03 2005 @ 11:18 AM EDT (#129502) #

I find it interesting that while it's accepted as fact that the Jays play hard in every game... they had the worst record relative to their Pythagorean record in all of major league baseball.

That's exactly what I would anticipate. A team that plays hard every inning will tend to underperform its Pythagorean, because they will continue to chip away when they're down a lot of runs, and won't let up when they're winning by a lot of runs.

This won't make a big impact on the win column, but it will make a considerable impact in runs scored and allowed. So yes, the Jays' outperforming their Pythags is a sign of a team that plays hard all the time. However, it's also a sign that improvement may be a little harder than it seems.

Mike Green - Monday, October 03 2005 @ 11:26 AM EDT (#129506) #

Craig meant " the Jays underperforming their Pythags". I agree with him. It's the converse situation to the Yankees, who pull their veterans when the game gets way out of hand. The Jays keep fighting. It doesn't account for the full 8 games, but it is some of it.

StephenT - Monday, October 03 2005 @ 01:01 PM EDT (#129525) #

Great thread. I agree with some of the explanations for the Yankees' and Blue Jays' "luck".

On a different theory topic: Does anyone have 2005 park factors? The ones at ESPN seem flaky ( http://sports.espn.go.com/mlb/stats/parkfactor ), e.g. they change after a Refresh.

Magpie - Monday, October 03 2005 @ 01:31 PM EDT (#129530) #

the team that blew away Pythagoras was Arizona. And if you can come up with a rational explanation for that, more power to you.

I don't have a rational explanation, but I can point to the two main factors. One, as noted above, is their very good record in one-run games. The other is their utterly awful record in blowouts.

The D'Backs won 1 game this year by eight runs (8-0).

They lost two games by eight runs (11-3, 11-3).

They lost three games by ten runs (10-0, 10-0, 16-6).

They lost one game by twelve runs (14-2).

They lost one game by thirteen runs (14-1).

They lost three games by fourteen runs (16-2, 17-3, 18-4)

They lost one game by fifteen runs (18-3).

In these twelve games, the D'Backs went 1-11: they scored 35 runs and allowed 155 runs.

In their other 150 games, they went 76-74. They scored 661 runs and gave up 701. They're still over-performing, of course...
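
Here is a short Python sketch that redoes this split and then runs the standard Pythagorean formula on the remaining 150 games. The game scores are the twelve listed above and the season totals come from the standings table; the variable names are illustrative only.

```python
# (runs scored, runs allowed) in the twelve games decided by eight or more runs
blowouts = [
    (8, 0),                       # the lone blowout win
    (3, 11), (3, 11),             # lost by eight
    (0, 10), (0, 10), (6, 16),    # lost by ten
    (2, 14),                      # lost by twelve
    (1, 14),                      # lost by thirteen
    (2, 16), (3, 17), (4, 18),    # lost by fourteen
    (3, 18),                      # lost by fifteen
]

season_rs, season_ra = 696, 856
season_w, season_l = 77, 85

blow_rs = sum(rs for rs, _ in blowouts)
blow_ra = sum(ra for _, ra in blowouts)
blow_w = sum(1 for rs, ra in blowouts if rs > ra)
blow_l = len(blowouts) - blow_w

# Everything that is left once the blowouts are set aside
rest_rs, rest_ra = season_rs - blow_rs, season_ra - blow_ra
rest_w, rest_l = season_w - blow_w, season_l - blow_l
rest_games = rest_w + rest_l

rest_pct = rest_rs**2 / (rest_rs**2 + rest_ra**2)
rest_expected_w = round(rest_pct * rest_games)
print(rest_w, rest_expected_w, rest_w - rest_expected_w)
```

Even with the blowouts removed, the remaining games project to fewer wins than Arizona actually collected in them.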

Mike Green - Monday, October 03 2005 @ 01:38 PM EDT (#129532) #

Well, that was a surprise. Prior to yesterday's game, the Jays had, according to BP, 81 third order wins. After the 7-2 victory over Kansas City, they now have 80. Do you have to defeat KC (in the memorable words of Moon Martin) "21 to zip" to improve your third order record?

Rob - Monday, October 03 2005 @ 01:46 PM EDT (#129536) #

A team that plays hard every inning will tend to underperform its Pythagorean, because they will continue to chip away when they're down a lot of runs, and won't let up when they're winning by a lot of runs.

You're right -- I was thinking that if the manager is "good" his team will play harder for him, and thus outperform the projection...but now that I think about it, that doesn't quite make sense because as you say, they continue to chip away even if they lose. It was clearly too early in the morning for non-algebraic reasoning...

Craig B - Monday, October 03 2005 @ 01:53 PM EDT (#129538) #

Yeah, see it's one thing to say that a good manager will help the team outperform the Pythag, because that's saying that he'll make good strategic decisions and turn around games where the score is close. I don't think it's a big factor, but it's there in a small way.

Playing hard every inning of every game really isn't the done thing around MLB - it takes a particular environment to produce it. At any rate, it's certainly not a particular trait of good managers to have their teams hustling all the time - most managers who are widely perceived as good have indifferently motivated teams.

Mike Green - Monday, October 03 2005 @ 02:10 PM EDT (#129542) #

Stephen T, I could not find park factors for 2005. I am sure they'll be up somewhere shortly. The ones on espn.com are way off, as you said. I checked the numbers for Rogers Centre, Yankee Stadium and Fenway Park; they all played as moderately favourable offensive parks this season.

Jordan - Monday, October 03 2005 @ 02:17 PM EDT (#129543) #

In these twelve games, the D'Backs went 1-11: they scored 35 runs and allowed 155 runs.

Is there a way to factor in these kinds of blowouts into the Pythag formula? You could set it up that for every game that involves a run differential of more than, say, eight runs at any given time, the Pythag calculation is varied by some percentage or other.

That would help account not just for the 11-1 wins and 13-2 losses that throw off the straight Pythag calculation, but also those games where a "playing hard" team is down 10-1 by the third but scores enough runs late to lose 10-8 (as per the Blue Jays). Any number-friendly folks out there think this is doable?

Craig B - Monday, October 03 2005 @ 03:30 PM EDT (#129550) #

I can see doing a "win expectation analysis" from a team's run-scoring profile... saying that when you score 18 runs, your chance of winning is .9999, when you score 5 runs, it's .5481, when you allow 8 runs it's .0985 and when you allow 0 runs, it's 1.0000 ... and then saying OK, you scored 10 runs six times, 9 runs twice, 3 runs 31 times... etc. and then the same for defense.

Seems cumbersome, but it might provide a more meaningful analysis than straight Pythag...

AWeb - Monday, October 03 2005 @ 03:39 PM EDT (#129552) #

I'm pretty number-friendly (a couple of degrees in stats), but I tend to doubt that it would add much to the analysis. That said, it may be that someone has already done it. Other models, mentioned above, just look at other numbers, and make predictions about how many runs a team should have scored.

I could see "downweighting" runs scored/allowed after a certain deficit is achieved. This deficit would probably best be used in such a way as to account for the time of game. I.e., 5 run leads in the 8th might be equivalent to 8 run leads in the 3rd. Cutoffs, based on game winning percentage perhaps, could be created for each situation. All sorts of rules and parameters could be created, making the model more and more accurate.
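
A rough Python sketch of the crudest version of this idea: cap every game's margin at a fixed number of runs and feed the trimmed totals into the usual formula. Everything here is an assumption made for illustration. The cap of eight is borrowed from Jordan's "say, eight runs", the time-of-game adjustment is ignored entirely, and the Arizona scores are the ones Magpie listed, taken to be every game that team played that was decided by eight or more.

```python
CAP = 8  # assumed cutoff; not a value anyone in the thread settled on


def excess_runs(rs, ra, cap=CAP):
    """Return (scored, allowed) runs beyond the point where the margin hits the cap."""
    margin = rs - ra
    if margin > cap:
        return margin - cap, 0
    if -margin > cap:
        return 0, -margin - cap
    return 0, 0


def pythag_wins(rs, ra, games=162):
    """Fractional expected wins from the standard exponent-2 formula."""
    return rs**2 / (rs**2 + ra**2) * games


# Arizona 2005: the lopsided games listed earlier in the thread
arizona_lopsided = [
    (8, 0), (3, 11), (3, 11), (0, 10), (0, 10), (6, 16),
    (2, 14), (1, 14), (2, 16), (3, 17), (4, 18), (3, 18),
]
season_rs, season_ra = 696, 856

trim_rs = sum(excess_runs(rs, ra)[0] for rs, ra in arizona_lopsided)
trim_ra = sum(excess_runs(rs, ra)[1] for rs, ra in arizona_lopsided)
adj_rs, adj_ra = season_rs - trim_rs, season_ra - trim_ra

print(round(pythag_wins(season_rs, season_ra)), round(pythag_wins(adj_rs, adj_ra)))
```

Capping moves Arizona's projection up by a few wins but still leaves it well short of the 77 games the team actually won, which fits the point that this kind of tweak only goes so far.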

I think this is sort of a dead end, as increasing the complexity of this type of model could eventually lead to just recreating final record (adding variables leads to a more accurate model, just not a better one necessarily). Given the turnover of players, aging effects, September callups, injuries, and luck, any attempt to predict this year's or next year's record is going to fail occasionally.

The simplicity, and overall accuracy, of the Pythagorean record makes it an excellent model. Basically, it takes very little data (two numbers), and estimates the final record very well (most of the time). And it's fairly easy to explain to non-math types. That makes it exactly the type of model that one wants for widespread baseball use.

So my 2 cents of opinion is that it's doable, just not something that would really help out that much. And the conclusion, based on Pythagoras and BP's method, is that the Jays were lucky to score as many runs as they did (so says BP's model), but were unlucky that the extra runs didn't translate into more wins (so says James' model). Which sort of fits with my feelings on the year. It seemed like the Jays weren't good enough to really be a playoff team, but they had a chance to make it anyway. Sort of frustrating, sort of not.

Magpie - Monday, October 03 2005 @ 04:18 PM EDT (#129553) #

The Yankees have beaten their Pythagorean projection easily each of the last 2 years.

Each of the last five years, actually. Ever since they stopped winning world championships.

Over the past ten seasons, the Yankees have exceeded their Pythagorean projection by 30 games, by far the largest amount in the AL. That's come about almost entirely (29 games) over these last five seasons.

Behold a Data Table!

Team	R	RA	W	L	Py Pct	Ex W	Ex L	Diff

NYY	8859	7399	982	634	0.589	952	664	+30
MIN	7625	7917	792	825	0.481	778	839	+14
CHW	8344	8051	847	771	0.518	838	780	 +9
OAK	8286	7713	875	744	0.536	867	752	 +8
LAA	7828	7709	829	790	0.508	822	797	 +7
MIL	1575	1641	158	165	0.479	155	168	 +3
TBD	5649	6941	518	775	0.398	515	778	 +3
TEX	8752	8797	805	815	0.497	806	814	 -1
CLE	8612	7975	867	751	0.538	871	747	 -4
DET	7299	8764	652	956	0.410	659	949	 -7
BOS	8734	7762	897	722	0.559	905	714	 -8
TOR	7948	7976	796	823	0.498	807	812	-11
BAL	7891	8149	770	849	0.484	783	836	-13
KCR	7666	8812	679	937	0.431	696	920	-17
SEA	8476	7805	855	763	0.541	876	742	-21

I have year-by-year for all of the individual teams, of course. The biggest positive single-season aberrations were posted by the 2004 Yankees (who went 101-61 instead of 89-73) and the 1998 Kansas City Royals (72-89 instead of 62-99). Biggest under-performers? The 1999 Kansas City Royals (64-97 instead of 75-86). And, yes, the 2005 Blue Jays (80-82 instead of 89-73).

In case you're curious, the 2000 Royals finished one game above their Pythag projection, after going from 10 over to 11 under in the previous two seasons.

And I must give a Big Huge shout-out to Rob for telling me about Excel's Text-to-Columns feature. The young rook teaches the old dog a New Trick.

Mike Green - Monday, October 03 2005 @ 04:31 PM EDT (#129554) #

All of which will be filed in Mariano Rivera's Hall of Fame dossier. The Twins have had a very good pen over the last 10 years as well, while the Mariners, Royals and Jays have on the whole suffered in that department. I suspect that if you look at Pythagorean vs. actual W-L over a 10 year span, luck plays a much smaller role in the difference. The Royals' 98 and 99 seasons are a good example of why this might be the case.

The Yankees 2001-2005 number of +29 might very well be one of the highest 5 year totals.

John Northey - Monday, October 03 2005 @ 04:32 PM EDT (#129555) #

Interesting that the 1999 Royals were the only team in the past 10 years to do worse than the Jays with Runs For/Against vs record. Let's see the 99 Royals vs 00 Royals to get some idea for the Jays in 2006...

99 - Scored 856 Allowed 921 Record 64-97
00 - Scored 879 Allowed 930 Record 77-85

Stats via ESPN.

Wow. Similar runs scored/allowed for the two years (23 more scored, 9 more allowed) but a 13-win difference. Gives one hope for 2006.

Magpie - Monday, October 03 2005 @ 04:42 PM EDT (#129556) #

All of which will be filed in Mariano Rivera's Hall of Fame dossier.

What, there's actually room for more?

Magpie - Monday, October 03 2005 @ 04:46 PM EDT (#129557) #

I suspect that if you look at Pythagorean vs. actual W-L over a 10 year span...

It's one honking big Data Table, but I can post it easily enough if y'all want to have a look. Should I sort it by difference in projected as opposed to actual win-loss record? Or just by team?

studes - Monday, October 03 2005 @ 04:51 PM EDT (#129558) #

I could see the argument that a "hard-working" team would underperform its Pythagorean projection, I guess. But I'd hate to use it to make subjective judgements like that.

On a regular basis, I do look at the distribution of runs scored and allowed, and add them into a projected won/loss record. I wrote a long article or two about it earlier this year at the Hardball Times (by the way, "pithy"????). Sometimes it helps; lots of times it doesn't. But it does explain the Diamondbacks' variance. Lots of blowout losses.

I agree with Mike that the Yankees' recent dominance may be historic. Also, using the "Pythagopat" formula, last year's Yankees and this year's Diamondbacks were both at +12. The highest ever was +13, by the 1905 Tigers. So we've just seen two of the highest variance years ever in the past two years.
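
The exact formula behind these numbers is not spelled out in this thread. One commonly cited form of it lets the exponent float with the scoring level, exponent = ((RS + RA) / G)^0.287, and the Python sketch below assumes that form. The 0.287 constant and the function names are assumptions, not something taken from the comment above.

```python
def pythagenpat_pct(rs, ra, games, k=0.287):
    """Pythagorean percentage with an exponent tied to runs per game (assumed form)."""
    exponent = ((rs + ra) / games) ** k
    return rs**exponent / (rs**exponent + ra**exponent)


def games_over(wins, losses, rs, ra):
    """Actual wins minus the fractional wins the formula projects."""
    games = wins + losses
    return wins - pythagenpat_pct(rs, ra, games) * games


# Actual records and run totals from the tables earlier in the thread
yankees_2004 = games_over(101, 61, 897, 808)
dbacks_2005 = games_over(77, 85, 696, 856)
print(round(yankees_2004, 1), round(dbacks_2005, 1))
```

Under that assumed form, both teams come out roughly twelve games over, in line with the +12 quoted above.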

Craig B - Monday, October 03 2005 @ 04:55 PM EDT (#129559) #

But I'd hate to use it to make subjective judgements like that.

Studes, you've got the explanation arrow pointed the wrong way 'round... we're using the team characteristics to look for explanation for why the Pythag is the way it is, not looking at Pythag to tell us something about the teams.

Again, we are fully cognizant that all this stuff is only tendencies.

Craig B - Monday, October 03 2005 @ 04:58 PM EDT (#129560) #

I suspect that if you look at Pythagorean vs. actual W-L over a 10 year span, luck plays a much smaller role in the difference.

I suspect that if you looked at Pythagorean vs actual W/L over a 10-year span, the effects would be so vanishingly small as to disappear in the noise.

Craig B - Monday, October 03 2005 @ 05:01 PM EDT (#129561) #

The Yankees, though, are 2.18 standard deviations above the norm. (Everyone else is well within 2 standard deviations). So maybe, just maybe, there is something to this Yankee dominance.

Magpie - Monday, October 03 2005 @ 05:14 PM EDT (#129562) #

Actually, I forgot that Flex had already asked to see the Blue Jays performance gathered together. So I'll sort by teams:

	Team	R	RA	W	L	Py Pct	Ex W	Ex L	Diff
									
1996	BAL	949	903	88	74	0.525	85	77	+3
1997	BAL	812	681	98	64	0.587	95	67	+3
1998	BAL	817	785	79	83	0.520	84	78	-5
1999	BAL	851	815	78	84	0.522	84	78	-6
2000	BAL	794	913	74	88	0.431	70	92	+4
2001	BAL	687	829	63	98	0.407	66	95	-3
2002	BAL	667	773	67	95	0.427	69	93	-2
2003	BAL	743	820	71	91	0.451	73	89	-2
2004	BAL	842	830	78	84	0.507	82	80	-4
2005	BAL	729	800	74	88	0.454	73	89	+1

1996	BOS	928	921	85	77	0.504	82	80	+3
1997	BOS	851	857	78	84	0.496	80	82	-2
1998	BOS	876	729	92	70	0.591	96	66	-4
1999	BOS	836	718	94	68	0.575	93	69	+1
2000	BOS	792	745	85	77	0.531	86	76	-1
2001	BOS	772	745	82	79	0.518	83	78	-1
2002	BOS	859	665	93	69	0.625  101	61	-8
2003	BOS	961	809	95	67	0.585	95	67	 0
2004	BOS	949	768	98	64	0.604	98	64	 0
2005	BOS	910	805	95	67	0.561	91	71	+4

1996	CHW	898	794	85	77	0.561	91	71	-6
1997	CHW	779	833	80	81	0.467	75	86	+5
1998	CHW	861	931	80	82	0.461	75	87	+5
1999	CHW	777	870	75	86	0.444	71	90	+4
2000	CHW	978	839	95	67	0.576	93	69	+2
2001	CHW	798	795	83	79	0.502	81	81	+2
2002	CHW	856	798	81	81	0.535	87	75	-6
2003	CHW	791	715	86	76	0.550	89	73	-3
2004	CHW	865	831	83	79	0.520	84	78	-1
2005	CHW	741	645	99	63	0.569	92	70	+7

1996	CLE	952	769	99	62	0.605	97	64	+2
1997	CLE	868	815	86	75	0.531	86	75	 0
1998	CLE	850	779	89	73	0.544	88	74	+1
1999	CLE    1009	860	97	65	0.579	94	68	+3
2000	CLE	950	816	90	72	0.575	93	69	-3
2001	CLE	897	821	91	71	0.544	88	74	+3
2002	CLE	739	837	74	88	0.438	71	91	+3
2003	CLE	699	778	68	94	0.447	72	90	-4
2004	CLE	858	857	80	82	0.501	81	81	-1
2005	CLE	790	643	93	69	0.602	97	65	-4

1996	DET	783    1103	53     109	0.335	54     108	-1
1997	DET	784	790	79	83	0.496	80	82	-1
1998	DET	722	863	65	97	0.412	67	95	-2
1999	DET	747	882	69	92	0.418	67	94	+2
2000	DET	823	827	79	73	0.498	76	76	+3
2001	DET	724	876	66	96	0.406	66	96	 0
2002	DET	575	864	55     106	0.307	49     112	+6
2003	DET	591	928	43     119	0.289	47     115	-4
2004	DET	827	844	72	90	0.490	79	83	-7
2005	DET	723	787	71	91	0.458	74	88	-3

1996	KCR	746	786	75	86	0.474	76	85	-1
1997	KCR	747	820	67	94	0.454	73	88	-6
1998	KCR	714	899	72	89	0.387	62	99     +10
1999	KCR	856	921	64	97	0.463	75	86     -11
2000	KCR	879	930	77	85	0.472	76	86	+1
2001	KCR	729	858	65	97	0.419	68	94	-3
2002	KCR	737	891	62     100	0.406	66	96	-4
2003	KCR	836	867	83	79	0.482	78	84	+5
2004	KCR	720	905	58     104	0.388	63	99	-5
2005	KCR	702	935	56     106	0.360	58     104	-2

1996	LAA	762	943	70	91	0.395	64	97	+6
1997	LAA	829	794	84	78	0.522	84	78	 0
1998	LAA	787	783	85	77	0.503	81	81	+4
1999	LAA	711	826	70	92	0.426	69	93	+1
2000	LAA	864	869	82	80	0.497	81	81	+1
2001	LAA	691	730	75	87	0.473	77	85	-2
2002	LAA	851	644	99	63	0.636	103	59	-4
2003	LAA	736	743	77	85	0.495	80	82	-3
2004	LAA	836	734	92	70	0.565	91	71	+1
2005	LAA	761	643	95	67	0.583	95	67	 0

1996	MIL	894	899	80	82	0.497	81	81	-1
1997	MIL	681	742	78	83	0.457	74	87	+4

1996	MIN	877	900	78	84	0.487	79	83	-1
1997	MIN	772	861	68	94	0.446	72	90	-4
1998	MIN	734	818	70	92	0.446	72	90	-2
1999	MIN	686	845	63	97	0.397	64	96	-1
2000	MIN	748	880	69	93	0.419	68	94	+1
2001	MIN	771	766	85	77	0.503	82	80	+3
2002	MIN	768	712	94	67	0.538	87	74	+7
2003	MIN	801	758	90	72	0.528	85	77	+5
2004	MIN	780	715	92	70	0.543	88	74	+4
2005	MIN	688	662	83	79	0.519	84	78	-1

1996	NYY	871	787	92	70	0.551	89	73	+3
1997	NYY	891	688	96	66	0.626  101	61	-5
1998	NYY	965	656    114	48	0.684  111	51	+3
1999	NYY	900	731	98	64	0.603	98	64	 0
2000	NYY	871	814	87	74	0.534	86	75	+1
2001	NYY	804	713	95	65	0.560	90	70	+5
2002	NYY	897	697    103	58	0.624  100	61	+3
2003	NYY	877	716    101	61	0.600	97	65	+4
2004	NYY	897	808    101	61	0.552	89	73     +12
2005	NYY	886	789	95	67	0.558	90	72	+5

1996	OAK	861	900	78	84	0.478	77	85	+1
1997	OAK	764	946	65	97	0.395	64	98	+1
1998	OAK	804	866	74	88	0.463	75	87	-1
1999	OAK	893	846	87	75	0.527	85	77	+2
2000	OAK	947	813	91	70	0.576	93	68	-2
2001	OAK	884	645    102	60	0.653  106	56	-4
2002	OAK	800	654    103	59	0.599	97	65	+6
2003	OAK	768	643	96	66	0.588	95	67	+1
2004	OAK	793	742	91	71	0.533	86	76	+5
2005	OAK	772	658	88	74	0.579	94	68	-6

1996	SEA	993	895	85	76	0.552	89	72	-4
1997	SEA	925	833	90	72	0.552	89	73	+1
1998	SEA	859	855	76	85	0.502	81	80	-5
1999	SEA	859	905	79	83	0.474	77	85	+2
2000	SEA	907	780	91	71	0.575	93	69	-2
2001	SEA	927	627    116	46	0.686  111	51	+5
2002	SEA	814	699	93	69	0.576	93	69	 0
2003	SEA	795	637	93	69	0.609	99	63	-6
2004	SEA	698	823	63	99	0.418	68	94	-5
2005	SEA	699	751	69	93	0.464	75	87	-6

1998	TBD	620	751	63	99	0.405	66	96	-3
1999	TBD	772	913	69	93	0.417	68	94	+1
2000	TBD	733	842	69	92	0.431	69	92	 0
2001	TBD	672	887	62     100	0.365	59     103	+3
2002	TBD	673	918	55     106	0.350	56     105	-1
2003	TBD	715	852	63	99	0.413	67	95	-4
2004	TBD	714	842	70	91	0.418	67	94	+3
2005	TBD	750	936	67	95	0.391	63	99	+4

1996	TEX	928	799	90	72	0.574	93	69	-3
1997	TEX	807	823	77	85	0.490	79	83	-2
1998	TEX	940	871	88	74	0.538	87	75	+1
1999	TEX	945	859	95	67	0.548	89	73	+6
2000	TEX	848	974	71	91	0.431	70	92	+1
2001	TEX	890	968	73	89	0.458	74	88	-1
2002	TEX	843	882	72	90	0.477	77	85	-5
2003	TEX	826	969	71	91	0.421	68	94	+3
2004	TEX	860	794	89	73	0.540	87	75	+2
2005	TEX	865	858	79	83	0.504	82	80	-3

1996	TOR	766	809	74	88	0.473	77	85	-3
1997	TOR	654	694	76	86	0.470	76	86	 0
1998	TOR	816	768	88	74	0.530	86	76	+2
1999	TOR	883	862	84	78	0.512	83	79	+1
2000	TOR	861	908	83	79	0.473	77	85	+6
2001	TOR	767	753	80	82	0.509	82	80	-2
2002	TOR	813	828	78	84	0.491	80	82	-2
2003	TOR	894	826	86	76	0.539	87	75	-1
2004	TOR	719	823	67	94	0.433	70	91	-3
2005	TOR	775	705	80	82	0.547	89	73	-9

The gravitational pull of .500... like a black hole! Invisible, unseen, but exerting limitless power!

studes - Monday, October 03 2005 @ 05:24 PM EDT (#129563) #

Studes, you've got the explanation arrow pointed the wrong way 'round... we're using the team characteristics to look for explanation for why the Pythag is the way it is, not looking at Pythag to tell us something about the teams.

Point taken, Craig. I wasn't saying you guys were doing it, just laying out a concern.

studes - Monday, October 03 2005 @ 05:45 PM EDT (#129564) #

Well, I was curious as to whether the Yankees do indeed have the best five-year stretch of all time. According to my calculations, through 2004, they had the second-best five-year stretch of all time. First is the Oriole team from 1977-1981 (29, using PythagoPat). The Yankees from 2000-2004 had 27.

Replace 2000 with 2005, however, and the Yankees will now sit in first. It's interesting that there are a number of teams near the top. The 1976-1980 Orioles, then the 1970-1974 Tigers. The Yankee run isn't quite as remarkable as I thought.

Here's the list of all five-year runs over 23 games favorable (through 2004):

 yearID franchID  5 Years
  1981    BAL       28.9
  2004    NYY       26.7
  1980    BAL       26.1
  1974    DET       25.6
  1933    PIT       25.2
  1963    LAD       23.9
  1982    BAL       23.6
  1978    SDP       23.3
  1909    DET       23.0
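
A five-year run like these is just a rolling sum of single-season differences. The Python sketch below applies that to the Yankees using the whole-game Diff column from Magpie's year-by-year table, which is based on the plain exponent-2 formula, so the totals will not match the PythagoPat figures in the list exactly. Variable names are illustrative.

```python
# Yankees, actual wins minus expected wins, 1996 through 2005 (Diff column)
first_year = 1996
nyy_diff = [3, -5, 3, 0, 1, 5, 3, 4, 12, 5]
WINDOW = 5

rolling = {}
for start in range(len(nyy_diff) - WINDOW + 1):
    end_year = first_year + start + WINDOW - 1   # label each run by its final season
    rolling[end_year] = sum(nyy_diff[start:start + WINDOW])

for end_year, total in rolling.items():
    print(end_year, f"{total:+d}")
```

On these figures the run ending in 2004 comes to +25 and the run ending in 2005 to +29, the same 29 games Magpie cites for the last five seasons.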

Sheldon - Monday, October 03 2005 @ 07:35 PM EDT (#129566) #

I love this stuff. Thanks for all the hard work gang....Damnit next season can't come soon enough!!!

Flex - Monday, October 03 2005 @ 09:54 PM EDT (#129571) #

Hey, thanks for the data table! One thing I was looking for was what you might call the "elastic band effect" and it shows up bigtime in the case of the KC Royals, from 97-99 and again from 2002 to 2004. We can only hope the same effect holds true for the Jays in 06.

Another thing I read from Magpie's table is that the numbers of over or under performance seem to reflect the personalities of the teams themselves. It's fair to say that Minnesota has been playing all out for the past few years, and that's the impression I'd take from the numbers. Oakland too, has overachieved, and there it is in the data. Similarly in Toronto, except for 2003, it's felt as if the team has failed to play up to its potential for the past several years, and the performance versus Pythagorean projections reflects it. Easy to generalize, of course. But that's what I'm here for.

Most amazing stat from the table though, for me? Nothing to do with Pythagoras. I'd forgotten how bad the Yankees were in 2000, and yet still won the division. That was the missed opportunity year.

Craig B - Monday, October 03 2005 @ 11:01 PM EDT (#129574) #

Interesting group, Studes. Mostly very good teams in there - those 07-09 Tigers won three pennants, I think those Dodgers won a couple as well...

Craig B - Tuesday, October 04 2005 @ 09:01 AM EDT (#129581) #

By the way, Blue Jays 2004 Win Shares are available at The Hardball Times. Those should be the final numbers - let me know if you see any mistakes!

Craig B - Tuesday, October 04 2005 @ 09:08 AM EDT (#129582) #

2004 Win Shares Gold Gloves - AL

C Joe Mauer
1B Mark Teixeira
2B Orlando Hudson
SS Juan Uribe
3B Eric Chavez (Chone Figgins had more DWS but played only 56 games at third)
OF Aaron Rowand
OF Vernon Wells
OF Grady Sizemore

That's quite an excellent list. Mauer, Teix, O-Dog, Uribe, Rowand and Wells all deserve Gold Gloves in my book. I'd go with Crede over Chavez and maybe Jeremy Reed or Ichiro over Sizemore, but I haven't seen Sizemore enough, or enough of Chavez this year.

Craig B - Tuesday, October 04 2005 @ 09:12 AM EDT (#129583) #

That list should be "2005" not 2004, obviously. Whoops.

2005 NL Win Shares Gold Gloves

C Mike Matheny (led the majors in defensive win shares)
1B Derrek Lee
2B Craig Counsell (Craig Counsell?)
SS Rafael Furcal
3B David Wright
OF Brady Clark
OF Carlos Beltran
OF Willy Taveras

OK, I didn't watch as much NL baseball as I usually do, but I have to say this is one mighty strange list. Matheny is a good pick (I'll take Yadier Molina for the next decade though), as is Lee, and Wright. The others? Really?

None of them are bad defenders, I guess.

Cristian - Tuesday, October 04 2005 @ 11:25 AM EDT (#129590) #

OF Carlos Beltran

If you injure a great defensive player, shouldn't this count against your defensive win shares? Shouldn't Beltran's win shares be calculated like this: Final Beltran DWS = Beltran raw DWS - 50 games(Cameron DWS) + 50 games(V.Diaz DWS)

R Billie - Tuesday, October 04 2005 @ 09:28 PM EDT (#129632) #

I'm tending to think that BP is absolutely right with their third order wins pegging the Jays wins as right on track with their actual talent level.

Looking at raw runs scored and allowed for the Jays is deceptive because a great deal of the games they played featured them scoring very few runs. They were shut out the most often in the league I believe.

Their Pythagorean results are propped up by a number of blowout wins which were anomalous with their usual day-to-day performance.

Taking the example of a relief pitcher, you can have two different guys who each make 70 one-inning appearances and end the year with a 4.50 ERA.

One guy consistently gives up 1 run every other inning. The other guy goes on a streak of 9 scoreless innings at a time before giving up 5 runs in the tenth outing.

Both equal in overall value? Apparently. But the latter guy is actually the more reliable performer. He actually shuts out the competition a great deal of the time before melting down in the odd game. He would make the more reliable closer while the first guy (let's call him Miguel Batista) will make you tear your hair out.
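
To put rough numbers on that comparison, here is a minimal Python sketch. It assumes every one of the 70 outings is exactly one inning and treats all runs as earned; the two run patterns are idealized versions of the ones described above, and the function names are made up for illustration.

```python
# Two idealized relievers, each making 70 one-inning appearances.
# Both patterns are simplifications of the ones described above.

def era(runs_by_outing, innings_per_outing=1):
    """Runs allowed per nine innings (all runs treated as earned)."""
    innings = len(runs_by_outing) * innings_per_outing
    return 9 * sum(runs_by_outing) / innings

def scoreless_share(runs_by_outing):
    """Fraction of outings in which no runs scored."""
    return sum(1 for r in runs_by_outing if r == 0) / len(runs_by_outing)

# Reliever A: one run in every other outing.
steady = [0, 1] * 35
# Reliever B: nine scoreless outings, then five runs in the tenth.
streaky = ([0] * 9 + [5]) * 7

for name, outings in (("steady", steady), ("streaky", streaky)):
    print(f"{name}: ERA {era(outings):.2f}, "
          f"scoreless in {scoreless_share(outings):.0%} of outings")
```

Both come out at a 4.50 ERA, but the second pitcher has a clean inning in 63 of his 70 outings, against 35 of 70 for the first.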

The Blue Jays offensively are NOT a reliable team. Every handful of games they might have a blowout or two to prop up their runs scored but then they go on to have a 12 inning scoreless streak. They go on a run of 8 games scoring 3 or fewer runs.

So it's not the performance over 162 games by itself that matters. It's HOW OFTEN you're able to perform within an arbitrary 9 inning set. I think with the general lack of standout players the Jays have a team profile where they will hammer the odd pitcher every few days but will struggle on a regular basis as their lineup just isn't dangerous enough to come through regularly.

The other question is, can they next year expect as good of a performance out of the back of the rotation? Will the relievers continue to improve from respectable to actually good? The Jays need everything working in their favour. But the bottom line is they still need a lot more talent.

Magpie - Tuesday, October 04 2005 @ 10:12 PM EDT (#129635) #

Looking at raw runs scored and allowed for the Jays is deceptive because a great deal of the games they played featured them scoring very few runs. They were shut out the most often in the league I believe.

I'm not completely sold on this point. See, I'm pretty much stuck on two related ideas: 1) the Jays' Pythagorean under-performance is a random fluke, because; 2) it's the result of their utterly awful record in one-run games, and that is a random fluke.

The Jays were shut out more often than anybody else in the league, but it's just two more games than Oakland. The Jays really didn't have an unusually large number of blowout wins - they won two games by more than 10 runs, while losing one themselves. The offense was really good enough to score 775 runs - which is in fact pretty mediocre by the standards of this division. But on the other hand, their pitching really was good enough to allow only 705 runs. Which is outstanding by the standards of this division. And that combination should result in a much better record than 80-82.

I think your idea is much more applicable to Oakland. The A's were shut out 12 times, second most in the league. They had three wins by more than 10 runs, and no losses. They underperformed their Pythagorean expectation by a full six games. And unlike Toronto, the A's had a decent record in one-run games.

...it's not the performance over 162 games by itself that matters. It's HOW OFTEN you're able to perform within an arbitrary 9 inning set.

What you're talking about is arranging the runs into more productive groups and here I think we're in complete agreement. The joker in the pack, I say yet again, is the Jays' abysmal 16-31 record in games decided by one run.

What I need to do (and I will, I promise - I plan on updating all of my season-in-progress studies, and this was one) is look again at the Jays' record every time they scored x number of runs, and every time they allowed x number of runs.

And I also need to take a good hard look at those 47 games decided by a single run... (I did that before too, when there were only 19 of them.)
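
A rough Python sketch of that kind of tabulation, assuming the game results are available as (runs scored, runs allowed) pairs. The five games listed are invented placeholders, not the Jays' actual results, and the function names are made up for illustration.

```python
from collections import defaultdict

# Placeholder results as (runs_scored, runs_allowed); NOT real 2005 data.
games = [(3, 4), (7, 2), (3, 2), (0, 1), (5, 6)]

def record_by(games, key_index):
    """W-L record grouped by runs scored (key_index=0) or allowed (key_index=1)."""
    table = defaultdict(lambda: [0, 0])  # runs -> [wins, losses]
    for game in games:
        rs, ra = game
        table[game[key_index]][0 if rs > ra else 1] += 1
    return dict(table)

def one_run_record(games):
    """W-L record in games decided by exactly one run."""
    close = [(rs, ra) for rs, ra in games if abs(rs - ra) == 1]
    wins = sum(1 for rs, ra in close if rs > ra)
    return wins, len(close) - wins

print(record_by(games, 0))    # record at each runs-scored total
print(record_by(games, 1))    # record at each runs-allowed total
print(one_run_record(games))  # (wins, losses) in one-run games
```

Fed a full season of real results, the same three calls would give the record at each run total and the one-run record in one pass.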

Lots to do, lots to do. And less than six months til opening day....

Magpie - Tuesday, October 04 2005 @ 10:18 PM EDT (#129637) #

Oops. Oakland actually had five wins by more than 10 runs (a couple of late ones since I originally looked) and no losses. So I think, sure, Oakland missed their Pythagorean expectation partially because of their skewed record in blowouts (much as Arizona exceeded theirs for the opposite reason - the D'Backs lost lots of games this way.)

But I don't think it's all that applicable to the Blue Jays.

Mike Green - Wednesday, October 05 2005 @ 09:54 AM EDT (#129651) #

Thanks, studes. I am a bit surprised that the Yankees' performance 2001-05 vs. their Pythagorean projection was not clearly the best.

One factor that might come into play with the older teams in particular is starting pitchers who outperform their runs allowed rate. Bill James did a study on Sandy Koufax showing that in 63-64, Koufax significantly outperformed his runs allowed rate (i.e. when the game was tight, he pitched even better). In the age of the 4 man rotation and starters with 15-20 complete games per season, having two such starters could influence the won-loss record significantly. Other factors you might think have played a role include bullpen (Rivera for the current Yanks, Hiller for the early 70s Tigers), bench (late 70s-early 80s O's), and managerial acumen (late 70s-early 80s O's, the Tigers of Hughie Jennings). I can think of no ready explanation for the '70s Padres and the '30s Pirates.

Pistol - Wednesday, October 05 2005 @ 10:25 AM EDT (#129655) #

"What I need to do (and I will, I promise - I plan on updating all of my season-in-progress studies, and this was one) is look again at the Jays record every time they scored x number of runs, and every time they allowed x number of runs."

Well, here's the data:

http://www.baseballprospectus.com/statistics/team_game_results2005.php

Magpie - Wednesday, October 05 2005 @ 12:07 PM EDT (#129668) #

Bloody hell, Pistol. I was just about to go through the ESPN game logs and laboriously copy and paste all of that information into a spreadsheet. Hours of time-wasting drudgery.

You've saved me so much time that... I may actually have to get a life or something.

TangoTiger - Wednesday, October 05 2005 @ 03:08 PM EDT (#129700) #

If you are going to use a fixed exponent of 2, you are going to bias the results. This becomes even more problematic when you aggregate the results by team.

The better method would be to have a variable exponent. The best estimator, so far, is the following:
exponent = (RS+RA) ^ .287

So, a team that scores and allows a total of 9 runs will have an exponent of 1.88.
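
As a quick check of that arithmetic, here is a minimal Python sketch of the estimator. It assumes RS and RA are per-game figures, as in the 9-run example; the function name is only for illustration.

```python
def pythag_exponent(rs_per_game, ra_per_game):
    """Variable exponent: (RS + RA) ** 0.287, with runs expressed per game."""
    return (rs_per_game + ra_per_game) ** 0.287

# 4.5 runs scored and 4.5 allowed per game: a 9-run environment.
print(round(pythag_exponent(4.5, 4.5), 2))  # 1.88

# Lower-scoring environments give a smaller exponent,
# higher-scoring environments a larger one.
print(round(pythag_exponent(3.5, 3.5), 2))
print(round(pythag_exponent(5.5, 5.5), 2))
```

The exponent only depends on the combined runs per game, so how the 9 runs split between the two teams does not matter here.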

Mike Green - Wednesday, October 05 2005 @ 03:28 PM EDT (#129701) #

I am assuming that for seasonal purposes, one divides the total runs by the number of games. The exponent would be ((RS + RA)/162) ^ .287, or for the Jays 1.887. This would make their Pythagorean win total 88 rather than 89. Have I got the numbers right, Tango?
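
For anyone who wants to check that arithmetic, here is a rough Python sketch using the Jays' 2005 totals from the table above (775 runs scored, 705 allowed, 162 games). The function names are made up for illustration.

```python
def expected_win_pct(rs, ra, exponent):
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def variable_exponent(rs, ra, games):
    """((RS + RA) / games) ** 0.287, i.e. the exponent from runs per game."""
    return ((rs + ra) / games) ** 0.287

RS, RA, GAMES = 775, 705, 162  # 2005 Blue Jays

fixed = expected_win_pct(RS, RA, 2)
x = variable_exponent(RS, RA, GAMES)
variable = expected_win_pct(RS, RA, x)

print(f"exponent 2:     {fixed:.3f} -> {round(fixed * GAMES)} wins")
print(f"exponent {x:.3f}: {variable:.3f} -> {round(variable * GAMES)} wins")
```

With these inputs the fixed exponent of 2 rounds to 89 wins and the variable exponent of about 1.887 rounds to 88.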

TangoTiger - Wednesday, October 05 2005 @ 03:38 PM EDT (#129702) #

Yes, it would be per game, though if you want to get really technical about it, it should be per 27 outs. (Easy enough to figure on the RA side, and a bit more work on the RS side.)
