Dissecting Pythagoras
A Historical Examination
Kerry's Calculus for June 22, 2005

If you're a devotee of The Stats That Matter Most, you're already familiar with the principles behind baseball's Pythagorean Theorem.  In the extraordinarily unlikely event that you haven't read every word of the Stats That Matter Most feature, here's a brief description of the theorem:

Many moons ago, trendsetting sabermetrician Bill James demonstrated a relationship between runs scored, runs allowed and wins which, because of the necessity of squaring the elements, he dubbed the Pythagorean Theorem (of baseball).  The formula is that the square of runs scored divided by the sum of the square of runs scored and the square of runs allowed equals a team's expected winning percentage.  From there, you simply multiply the winning percentage and games played and you have a projected win total.  The measure works remarkably well, and, obviously, the more games a team plays, the better the relationship ordinarily is.
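For readers who'd rather see the arithmetic than read it, here is a minimal sketch of the formula in Python.  The function names are mine, not anything official, and rounding the projection to the nearest whole win is an assumption about how the totals in this piece were produced.

```python
def pythagorean_pct(runs_scored: int, runs_allowed: int) -> float:
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


def projected_wins(runs_scored: int, runs_allowed: int, games: int) -> int:
    """Expected wins; rounding to the nearest whole win is an assumption."""
    return round(pythagorean_pct(runs_scored, runs_allowed) * games)


# 2004 New York Yankees: 897 runs scored, 808 runs allowed, 162 games
print(f"{pythagorean_pct(897, 808):.3f}")  # 0.552
print(projected_wins(897, 808, 162))       # 89
```

The Yankees actually won 101 games that year, so the sketch puts them twelve wins ahead of their projection.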

That there is a relationship between the number of runs a team scores and allows--essentially, its run differential--and its record is intuitive.  That the relationship is close to perfect when the elements are squared, added and divided in the particular manner described above is less obvious, but there it is.

When analyzing the numbers for The Stats That Matter Most, it's been commonplace to make statements about win differentials--that is, the difference between a club's actual wins and the wins that the Theorem projects it should have based on the runs it has scored and allowed.  I thought it might be interesting to take a look at how well the Theorem holds up when applied to the past and what sorts of things we can say about various win differentials.

The inspiration to run the Pythagorean numbers for every team in the modern (post-1900) history of baseball was the performance of the 2004 New York Yankees, who finished with twelve more wins in real life than the Theorem projected.  I'd been calculating the Stats That Matter Most numbers for about five years and that was easily the largest win differential I'd seen.  I had a feeling it was one of the largest ever, but I really didn't know for certain.  So, I decided to take a look.

Disclaimer

There are no wondrous conclusions to draw from what follows.  It's worth mentioning, I suppose, that this is pretty well-trodden ground.  The Theorem is, I would say, a reliable, useful tool for quickly identifying whether a team is performing significantly above or below what one could reasonably expect, which is pretty much the way we've been using it all along.  I have found that the Theorem is notably more predictive of baseball in the divisional era than it is of the more distant past, as you'll see below, but I don't think that's worth getting particularly excited about.  The information in this piece is more of the whimsical "isn't that interesting" variety than the epiphanic "omigosh" sort.  (Well, I hope it's interesting anyway.)

The Numbers

Here are some basic stats to chew on, covering all teams since 1901:

WDIFF

1901-2004
AVG -0.40882
ST DEV 4.176301

What this tells us is that the Theorem slightly overestimates the value of run differential.  Teams come up, on average, roughly 2/5 of a win short of what the Theorem projects.  The standard deviation for WDIFF is just under 4.2 wins.

After eyeballing the entire set of statistics, I decided to see if the numbers changed when broken down into a set of distinct eras.  Here's what I found:

YEAR ST DEV AVG
1901-2004 4.176301 -0.40882
1901-45 4.097816 -0.15147
1946-60 4.485843 -0.41867
1961-68 4.132084 -0.2656
1969-2004 3.990205 -0.0659

What you can see is that the Theorem becomes far more accurate in the post-expansion era of baseball.  It still overestimates wins, but by the time you come to the era of divisional play, beginning in 1969, the discrepancy is well below 1/10 of a win per season per team.  The model is, by far, least predictive during the 1946-60 era.  The greater number of games played after the 1961/1962 major league expansion would be a possible partial explanation, but the number of contests per season was the same (154 games) for virtually all of the 1901-45 era as for the 1946-60 era.  Also note that the WDIFF standard deviation has oscillated over time, but has gradually shrunk since the 1946-60 era to the point where it has dropped just below four in the era of divisional play.
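The era breakdown is nothing more than a group-and-summarize job.  Here's a rough sketch of how it could be done in Python.  The function names are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since I haven't specified which flavor produced the figures above.  The eight sample seasons are WDIFF values lifted from the outlier tables later in this piece, so they only show the mechanics and are in no way representative of the full data set.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Era boundaries as used in the breakdown above: (label, first year, last year)
ERAS = [
    ("1901-45", 1901, 1945),
    ("1946-60", 1946, 1960),
    ("1961-68", 1961, 1968),
    ("1969-2004", 1969, 2004),
]


def era_of(year: int) -> str:
    """Return the label of the era that contains the given season."""
    for label, first, last in ERAS:
        if first <= year <= last:
            return label
    raise ValueError(f"{year} is outside 1901-2004")


def summarize_by_era(team_seasons):
    """Map era label -> (average WDIFF, standard deviation of WDIFF)."""
    grouped = defaultdict(list)
    for year, wdiff in team_seasons:
        grouped[era_of(year)].append(wdiff)
    return {era: (mean(vals), pstdev(vals)) for era, vals in grouped.items()}


# Illustrative (year, WDIFF) pairs only: a handful of outlier seasons,
# NOT the full set of team seasons behind the published averages.
sample = [
    (1905, -16), (1906, -14), (1949, -11), (1955, -10),
    (1967, -12), (1993, -14), (1984, 12), (2004, 12),
]

for era, (avg, sd) in summarize_by_era(sample).items():
    print(era, round(avg, 2), round(sd, 2))
```

Feed it every team season since 1901 rather than this toy sample and it should produce a table shaped like the one above.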

Of course, there's the obvious question: why does the model do so much "better" with these more recent seasons?  I'm not certain, but I think it has to do with the greater parity that has marked the last 35+ seasons of big league ball compared with prior eras.  For all the carping in recent years about the dominance of teams like the Yankees and Red Sox, the fact is that the 1970s, 1980s and at least the first half of the 1990s represented as evenly matched a period as has ever been seen in the modern history of baseball.  Evenly distributed talent pools presumably mean fewer of the blowout games that most dramatically confound the predictive computations of the Theorem.  The flip side, of course, implies that the so-called "Golden Age of Baseball" (from 1946 or '47 through the late 1950s) coincided with an unprecedented stratification of talent across major league teams, which led to larger margins of victory in individual games and, in turn, to a less predictive model.

So, we know that the average team from 1969 on had a -.0659 WDIFF and the standard deviation for this segment of the population is almost exactly four wins.  If the sample is normally distributed--which is a pretty reasonable expectation--we'd expect roughly 2/3 of the cases to fall within one standard deviation of the mean and approximately 95% of them to fall within two standard deviations of the mean.  712 of the cases in this segment are within four games--plus or minus--of zero in the WDIFF category, which is 74%.  925 of the cases in this segment are within plus or minus eight games of the mean; that's roughly 97% of the sample.  It's not perfect, but for all practical purposes this segment resembles a normal distribution and can be treated as such.
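That comparison can be checked with a few lines of Python.  This is only a sketch: the function name is made up, and it leans on the same shortcut used above of treating four wins as one standard deviation and eight wins as two.  The 958 is the number of team seasons since 1969.

```python
import math


def normal_share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


TEAM_SEASONS = 958   # team seasons, 1969-2004
WITHIN_FOUR = 712    # WDIFF within plus or minus four wins
WITHIN_EIGHT = 925   # WDIFF within plus or minus eight wins

# Assumption: 4 wins ~ 1 standard deviation, 8 wins ~ 2 standard deviations
for label, count, k in [("4 wins", WITHIN_FOUR, 1), ("8 wins", WITHIN_EIGHT, 2)]:
    observed = count / TEAM_SEASONS
    expected = normal_share_within(k)
    print(f"within {label}: observed {observed:.1%}, normal curve {expected:.1%}")
```

The observed shares run a little high against the textbook 68% and 95%, most noticeably inside one standard deviation, which is the "not perfect" part.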

Given that the Theorem is so much more predictive since divisional play began and given that there have been 958 "team seasons" since 1969 (a nice sample), I'll focus the lion's share of the analysis on this era.

Negative Outliers:  The Underachievers

18 teams since 1969 have finished more than eight games below their projected Pythagorean win totals:

Year Team R OR RDIFF G R/G OR/G RDIFF/G PW% EXP W ACT W WDIFF
1993 New York Mets 672 744 -72 162 4.15 4.59 -0.44 .449 73 59 -14
1986 Pittsburgh Pirates 663 700 -37 162 4.09 4.32 -0.23 .473 77 64 -13
1984 Pittsburgh Pirates 615 567 48 162 3.80 3.50 0.30 .541 88 75 -13
1975 Houston Astros 664 711 -47 162 4.10 4.39 -0.29 .466 75 64 -11
1972 Baltimore Orioles 519 430 89 154 3.37 2.79 0.58 .593 91 80 -11
1970 Chicago Cubs 806 679 127 162 4.98 4.19 0.78 .585 95 84 -11
1999 Kansas City Royals 856 921 -65 161 5.32 5.72 -0.40 .463 75 64 -11
1980 St. Louis Cardinals 738 710 28 162 4.56 4.38 0.17 .519 84 74 -10
1997 Houston Astros 777 660 117 162 4.80 4.07 0.72 .581 94 84 -10
1972 San Francisco Giants 662 649 13 155 4.27 4.19 0.08 .510 79 69 -10
1993 San Diego Padres 679 772 -93 162 4.19 4.77 -0.57 .436 71 61 -10
2001 Colorado Rockies 923 906 17 162 5.70 5.59 0.10 .509 83 73 -10
1985 Boston Red Sox 800 720 80 163 4.91 4.42 0.49 .552 90 81 -9
1980 Milwaukee Brewers 811 682 129 162 5.01 4.21 0.80 .586 95 86 -9
1984 Houston Astros 693 630 63 162 4.28 3.89 0.39 .548 89 80 -9
1975 New York Yankees 681 588 93 160 4.26 3.68 0.58 .573 92 83 -9
1990 New York Mets 775 613 162 162 4.78 3.78 1.00 .615 100 91 -9
1974 California Angels 618 657 -39 163 3.79 4.03 -0.24 .469 77 68 -9
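The derived columns in the table all follow from four inputs: runs, opponents' runs, games and actual wins.  As a sketch (the function and constant names are hypothetical, and rounding expected wins before subtracting is an assumption), three of the rows can be reproduced like so:

```python
# (year, team, runs scored, runs allowed, games, actual wins), copied from the table above
ROWS = [
    (1993, "New York Mets", 672, 744, 162, 59),
    (1984, "Pittsburgh Pirates", 615, 567, 162, 75),
    (1990, "New York Mets", 775, 613, 162, 91),
]

OUTLIER_THRESHOLD = 8  # "more than eight games" in either direction


def win_differential(runs, opp_runs, games, actual_wins):
    """Return (expected wins, WDIFF), assuming expected wins are rounded first."""
    pw_pct = runs ** 2 / (runs ** 2 + opp_runs ** 2)
    expected = round(pw_pct * games)
    return expected, actual_wins - expected


for year, team, runs, opp_runs, games, wins in ROWS:
    expected, wdiff = win_differential(runs, opp_runs, games, wins)
    flag = "outlier" if abs(wdiff) > OUTLIER_THRESHOLD else ""
    print(year, team, expected, wins, wdiff, flag)
```

All three come out as outliers, matching the EXP W and WDIFF columns for those rows.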

This is a fairly balanced list of clubs.  Only one (the 1990 Mets) projects as an extremely good team; the Theorem had them winning 100 games and they settled for 91 in real life.  There are some other pretty good clubs on this list--the 1970 Cubs, the 1980 Brewers, the 1997 Astros.  There are some high scoring clubs (the 2001 Rockies, the 1999 Royals, the 1980 Brewers, the 1970 Cubs, etc.) and some low scoring teams (the 1975 Yankees, the 1984 Pirates, the 1972 Orioles, the 1974 Angels, etc.).

What is missing from the list is a truly dreadful club...and that shouldn't be a surprise.  Imagine a team that projected to lose 105 games; to make this list, that club would have to actually lose 115-odd games.

Positive Outliers:  The Overachievers

15 teams since 1969 have finished more than eight wins above their projected win totals:

Year Team R OR RDIFF G R/G OR/G RDIFF/G PW% EXP W ACT W WDIFF
1977 Baltimore Orioles 719 653 66 161 4.47 4.06 0.41 .548 88 97 9
1970 Philadelphia Phillies 594 730 -136 161 3.69 4.53 -0.84 .398 64 73 9
1978 Oakland Athletics 530 690 -160 162 3.27 4.26 -0.99 .371 60 69 9
1978 Cincinnati Reds 710 688 22 161 4.41 4.27 0.14 .516 83 92 9
1981 Cincinnati Reds 464 440 24 108 4.30 4.07 0.22 .527 57 66 9
2001 New York Mets 642 713 -71 162 3.96 4.40 -0.44 .448 73 82 9
1998 Kansas City Royals 714 898 -184 161 4.43 5.58 -1.14 .387 62 72 10
1972 California Angels 454 533 -79 155 2.93 3.44 -0.51 .420 65 75 10
1997 San Francisco Giants 784 793 -9 162 4.84 4.90 -0.06 .494 80 90 10
2004 Cincinnati Reds 750 907 -157 162 4.63 5.60 -0.97 .406 66 76 10
1970 Cincinnati Reds 775 681 94 162 4.78 4.20 0.58 .564 91 102 11
2004 New York Yankees 897 808 89 162 5.54 4.99 0.55 .552 89 101 12
1974 San Diego Padres 541 830 -289 162 3.34 5.12 -1.78 .298 48 60 12
1984 New York Mets 652 676 -24 162 4.02 4.17 -0.15 .482 78 90 12
1972 New York Mets 528 578 -50 156 3.38 3.71 -0.32 .455 71 83 12

This isn't merely the flip side of the underachievers' list.  There are several genuinely lousy clubs here--the 1974 Padres, for instance, are one of the worst teams of all time in terms of run differential.  Only two teams since 1969 (the 1996 Tigers and the 2003 Tigers) have had a worse run differential than the '74 Padres.  (22 pre-1969 clubs had run differentials worse than -289; only four of those clubs were post-WWII.)  The 1998 Royals, the 1978 Athletics, the 2004 Reds and the 1970 Phillies were all truly poor clubs.  There were a couple of pretty good teams: the 1970 National League champion Reds and the 2004 American League East champion Yankees, the only truly high scoring team on the list.  (The Reds, in fact, are on this list four separate times and the Mets three times.)  There are numerous low scoring clubs--the '74 Padres, the incredibly offensively challenged 1972 Angels (since the beginning of divisional play, only the '69 Padres have scored fewer runs per game than the Angels of 1972), the '72 Mets, the 1978 Athletics, the 2001 Mets.

Not surprisingly, in line with what we saw with the underachievers, there are no exceptionally good teams on this list (no club above projected to win more than 91 games).  Imagine a team that projected to win 105 games; how often is a squad that good going to actually win 115?

Regressing Toward the Mean

Bill James always speculated that clubs that were dramatically worse (or better) than their Pythagorean projection would regress toward the mean in the succeeding season.  Such clubs, it would be expected, would not only see their WDIFF land far closer to zero the following season, but also see their record move in the direction of the previous season's projection.  For instance, on balance, teams that finished far below their win expectations would win more games the following season.  Overachieving teams, naturally, could be expected to sink back in the direction of their projected number of wins.

Obviously this is an imperfect system because franchises don't simply remain static from year to year.  They make personnel changes, and so do the teams they compete with.  Players develop.  Players become injured.  Players heal.  All of these things, and others, impact the performance of clubs.

Still, the expectation is that underachieving teams--particularly dramatically underachieving teams--will perform better in succeeding years and the overachieving teams--especially substantially overachieving teams--will perform worse in succeeding seasons.  There's a sense that, historically, clubs have been easily fooled by biases; teams playing in good offensive ballparks have a tendency to overrate their offense and underrate their pitching, for instance.  There's an analogous notion that teams are easily misled by their record.  A 90-win club with a +10 WDIFF, for example, is probably going to regard itself as just a move or two away from pushing 100 wins rather than as a .500 club in need of an overhaul.

How does this play out with these outlying clubs, the ones who should be most likely to regress?

First, the underachievers

Year Team EXP W ACT W WDIFF NEXT EXP W NEXT ACT W NEXT WDIFF CHANGE IN W
1993 New York Mets 73 59 -14 54 55* 1 -4*
1986 Pittsburgh Pirates 77 64 -13 79 80 1 16
1984 Pittsburgh Pirates 88 75 -13 63 57 -6 -18
1975 Houston Astros 75 64 -11 77 80 3 16
1972 Baltimore Orioles 91 80 -11 104 97 -7 17
1970 Chicago Cubs 95 84 -11 80 83 3 -1
1999 Kansas City Royals 75 64 -11 76 77 1 13
1980 St. Louis Cardinals 84 74 -10 57 59 2 -15**
1997 Houston Astros 94 84 -10 108 102 -6 18
1972 San Francisco Giants 79 69 -10 85 88 3 19
1993 San Diego Padres 71 61 -10 52 47 -5 -14***
2001 Colorado Rockies 83 73 -10 69 73 4 0
1985 Boston Red Sox 90 81 -9 91 95 4 14
1980 Milwaukee Brewers 95 86 -9 58 62 4 -24****
1984 Houston Astros 89 80 -9 83 83 0 3
1975 New York Yankees 92 83 -9 98 97 -1 14
1990 New York Mets 100 91 -9 80 77 -3 -14
1974 California Angels 77 68 -9 69 72 3 4

*--because of the work stoppage in 1994, the Mets only played 113 games; the team's Win% went from .364 to .487
**--because of the work stoppage in 1981, the Cardinals played only 103 games; the team's Win% went from .457 to .578
***--because of the work stoppage in 1994, the Padres played only 117 games; the team's Win% went from .377 to .402
****--because of the work stoppage in 1981, the Brewers played only 109 games; the team's Win% went from .531 to .569

14 of the 18 teams above improved in their actual winning percentage from year 1 to year 2.  Only two teams clearly got worse--the 1984-85 Pirates and the 1990-91 Mets.  The 2001-02 Rockies also slid back in terms of overall play if not wins and, to a more modest extent, so did the 1970-71 Cubs, the 1984-85 Astros and the 1974-75 Angels.
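The count of 14 can be reproduced from the table and its footnotes.  In this sketch (the names are hypothetical) the four strike-shortened follow-up seasons are compared on the Win% figures from the footnotes, while every other team is compared on raw wins.  Treating a rise in wins as a rise in winning percentage for those teams is an assumption, since year 2 games played aren't listed in the table.

```python
# (team season, year-1 wins, year-2 wins, year-1 Win%, year-2 Win%)
# Win% is filled in only for the footnoted strike seasons; None elsewhere.
FOLLOW_UPS = [
    ("1993 Mets", 59, 55, .364, .487),
    ("1986 Pirates", 64, 80, None, None),
    ("1984 Pirates", 75, 57, None, None),
    ("1975 Astros", 64, 80, None, None),
    ("1972 Orioles", 80, 97, None, None),
    ("1970 Cubs", 84, 83, None, None),
    ("1999 Royals", 64, 77, None, None),
    ("1980 Cardinals", 74, 59, .457, .578),
    ("1997 Astros", 84, 102, None, None),
    ("1972 Giants", 69, 88, None, None),
    ("1993 Padres", 61, 47, .377, .402),
    ("2001 Rockies", 73, 73, None, None),
    ("1985 Red Sox", 81, 95, None, None),
    ("1980 Brewers", 86, 62, .531, .569),
    ("1984 Astros", 80, 83, None, None),
    ("1975 Yankees", 83, 97, None, None),
    ("1990 Mets", 91, 77, None, None),
    ("1974 Angels", 68, 72, None, None),
]


def improved(wins_1, wins_2, pct_1, pct_2):
    """True if the club's winning percentage rose from year 1 to year 2."""
    if pct_1 is not None and pct_2 is not None:
        return pct_2 > pct_1   # strike season: compare footnoted Win%
    return wins_2 > wins_1     # assumption: more wins means a higher Win%


improvers = [team for team, *rest in FOLLOW_UPS if improved(*rest)]
print(len(improvers), "of", len(FOLLOW_UPS))  # 14 of 18
```

The four clubs left over are the 1984 Pirates, the 1970 Cubs, the 2001 Rockies (who matched their win total exactly) and the 1990 Mets.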

No clubs had consecutive outlying seasons, though a few (most notably the 1972-73 Orioles) came pretty close.

The overachievers:

Year Team EXP W ACT W WDIFF NEXT EXP W NEXT ACT W NEXT WDIFF CHANGE IN W
1977 Baltimore Orioles 88 97 9 84 90 6 -7
1970 Philadelphia Phillies 64 73 9 64 67 3 -6
1978 Oakland Athletics 60 69 9 50 54 4 -15
1978 Cincinnati Reds 83 92 9 91 90 -1 -2
1981 Cincinnati Reds 57 66 9 66 61 -5 -5*
2001 New York Mets 73 82 9 79 75 -4 -7
1998 Kansas City Royals 62 72 10 75 64 -11 -8
1972 California Angels 65 75 10 77 79 2 4
1997 San Francisco Giants 80 90 10 92 89 -3 -1
2004 Cincinnati Reds 66 76 10 ? ? ? ?
1970 Cincinnati Reds 91 102 11 82 79 -3 -23
2004 New York Yankees 89 101 12 ? ? ? ?
1974 San Diego Padres 48 60 12 64 71 7 11
1984 New York Mets 78 90 12 97 98 1 8
1972 New York Mets 71 83 12 83 82 -1 -1

*--because of the work stoppage in 1981, the Reds played only 108 games; the team's Win% went from .611 to .377

Two of the 15 entries in the above list are incomplete since we don't yet have a follow-up season...though it's highly likely that both the Reds and the Yankees will win fewer games in 2005 than they did in 2004.  If that happens, nine of the clubs on this list will have regressed in the expected direction, but several improved to a surprising degree (the '74 Padres and the '84 Mets in particular).  The overall trend is in the anticipated direction but the consistency of that tendency is less pronounced than expected.

The 1998-99 Royals are one of the more intriguing teams I've seen, going from a +10 to a -11 WDIFF in one season.  They are the only club in the segment to have consecutive outlying seasons, even if they are in opposite directions.  Very odd.  The Royals were almost certainly a significantly better team in 1999 when they went 64-97 than in 1998 when they went 72-89.  What is the explanation for this?  This is a unique case out of a sample of nearly 1000.  Chalk it up to randomness.

A Wider Historical View

The Theorem may be less accurate when applied to the pre-divisional years of Major League Baseball, but I thought it would be interesting to take a brief look at some of the pre-1969 outlying clubs.

Underachievers

YEAR TEAM R OR RDIFF G R/G OR/G RDIFF/G PW% EXP W ACT W WDIFF
1905 Chicago Cubs 667 442 225 155 4.30 2.85 1.45 .695 108 92 -16
1906 Cleveland Indians 664 482 182 157 4.23 3.07 1.16 .655 103 89 -14
1911 Pittsburgh Pirates 744 561 183 155 4.80 3.62 1.18 .638 99 85 -14
1904 Cleveland Indians 647 482 165 154 4.20 3.13 1.07 .643 99 86 -13
1907 Cincinnati Reds 524 519 5 156 3.36 3.33 0.03 .505 79 66 -13
1967 Baltimore Orioles 654 592 62 161 4.06 3.68 0.39 .550 88 76 -12
1911 Chicago White Sox 717 620 97 154 4.66 4.03 0.63 .572 88 77 -11
1924 St. Louis Cardinals 740 750 -10 154 4.81 4.87 -0.06 .493 76 65 -11
1949 New York Giants 736 691 45 156 4.72 4.43 0.29 .532 83 72 -11
1905 Chicago White Sox 613 450 163 158 3.88 2.85 1.03 .650 103 92 -11
1947 Cleveland Indians 687 588 99 157 4.38 3.75 0.63 .577 91 80 -11
1955 Detroit Tigers 775 658 117 154 5.03 4.27 0.76 .581 89 79 -10
1937 Cincinnati Reds 612 707 -95 155 3.95 4.56 -0.61 .428 66 56 -10
1915 Pittsburgh Pirates 557 520 37 156 3.57 3.33 0.24 .534 83 73 -10
1913 Pittsburgh Pirates 673 585 88 155 4.34 3.77 0.57 .570 88 78 -10
1955 Cincinnati Reds 761 684 77 154 4.94 4.44 0.50 .553 85 75 -10
1948 Cleveland Indians 840 567 273 156 5.38 3.63 1.75 .687 107 97 -10
1932 New York Giants 755 706 49 154 4.90 4.58 0.32 .534 82 72 -10
1905 St. Louis Browns 508 608 -100 156 3.26 3.90 -0.64 .411 64 54 -10
1919 Washington Senators 533 571 -38 142 3.75 4.02 -0.27 .466 66 56 -10
1915 Chicago White Sox 717 509 208 155 4.63 3.28 1.34 .665 103 93 -10
1962 St. Louis Cardinals 774 664 110 163 4.75 4.07 0.67 .576 94 84 -10
1935 Boston Braves 575 852 -277 153 3.76 5.57 -1.81 .313 48 38 -10
1966 New York Yankees 611 612 -1 160 3.82 3.83 -0.01 .499 80 70 -10
1908 Boston Red Sox 563 512 51 155 3.63 3.30 0.33 .547 85 75 -10
1953 New York Giants 768 747 21 155 4.95 4.82 0.14 .514 80 70 -10
1958 Cincinnati Reds 695 621 74 154 4.51 4.03 0.48 .556 86 76