Monday, April 5, 2010

2010 MLB Wins Predictions Summary

I was able to find win predictions from the same group as last year, except for Blyleven. The data can also be found as a Google Doc here.

A much stronger display of counting skills than a year ago, particularly by the Yahoo! guys. Here is the standard deviation for each set of projections, taken from the table above:

This is the best sign yet for a strong bounceback year from PECOTA. In 2005-2008, PECOTA always had one of the lowest standard deviations of the predictions I have collected. But last year BP's projection system dropped to the lower half of this list, even behind some human predictions. This is only one metric, but in this sense it appears that whatever was ailing PECOTA last year has been fixed.
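For anyone who wants to reproduce the StDev row, here is a minimal sketch of the calculation. The win totals are made up for illustration and are not the real 2010 projections, and the population form of the standard deviation (`pstdev`) is an assumption; a spreadsheet's sample standard deviation would give slightly larger numbers.

```python
from statistics import pstdev

# Hypothetical win totals for five teams, for illustration only.
# These are NOT the actual 2010 projections from the table.
projections = {
    "System A": [95, 88, 81, 74, 67],
    "System B": [88, 84, 81, 78, 74],
}


def spread(wins):
    """Population standard deviation of one set of win predictions."""
    return pstdev(wins)


# One spread value per predictor, rounded to one decimal place.
spreads = {name: round(spread(wins), 1) for name, wins in projections.items()}

# List predictors from the most tightly packed to the most spread out.
for name, value in sorted(spreads.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value}")
```

A lower number just means the predicted win totals sit closer to the mean; it says nothing by itself about accuracy.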

Here's how far each set of projections deviates from the average of all the predictions, for each team:

Just like last year, the over/unders top the list rather easily, which makes sense; if any of the totals were way "off", they'd be bet to a more reasonable number. The O/U that was the furthest from the consensus was Houston's; Pinnacle had the Astros at 73.5, while the average was 69.2. Sheehan had the lowest prediction at 64 wins, and somewhat surprisingly (to the point I just went back to double-check) PECOTA has Houston going 78-84.
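The "vs. average" comparison can be sketched roughly as follows. This assumes the per-predictor figure is the mean absolute difference from the per-team consensus, which may not be the exact formula behind the table; the predictor names, team labels, and win totals are hypothetical.

```python
from statistics import mean

# Hypothetical predictions for three teams, for illustration only.
predictions = {
    "System A": {"Team 1": 80, "Team 2": 70, "Team 3": 96},
    "System B": {"Team 1": 72, "Team 2": 60, "Team 3": 98},
    "System C": {"Team 1": 85, "Team 2": 77, "Team 3": 91},
}


def consensus(preds):
    """Average predicted wins for each team across all predictors."""
    teams = next(iter(preds.values())).keys()
    return {team: mean(p[team] for p in preds.values()) for team in teams}


def mean_abs_deviation(preds):
    """Average distance from the consensus, per predictor (assumed metric)."""
    avg = consensus(preds)
    return {
        name: mean(abs(p[team] - avg[team]) for team in avg)
        for name, p in preds.items()
    }


deviations = mean_abs_deviation(predictions)

# Predictors closest to the consensus come first.
for name, value in sorted(deviations.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.1f}")
```

Applied to a single cell, this is the same arithmetic as the Houston over/under above: 73.5 against an average of 69.2 is a gap of 4.3 wins.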

ZiPS always comes up with some interesting projections, but doesn't have anything that looks totally nuts. Szymborski's system is very low on the White Sox (76.4, vs. an average of 81.9) and the Marlins (74.5, vs. 79.9). It's the humans that have predictions that really stand out. Sheehan has the Royals winning 60 games, and Bukiet pegged the Indians at 67 wins.

Yahoo's Henson is all over the place as usual, projecting the Rangers to win fewer games (75) than the Blue Jays (77). Part of the reason for PECOTA's very low standard deviation is actually a few unconventional predictions: 77 victories for the Royals, 90 for the Yankees, and the aforementioned Astros projection all keep its numbers tightly packed.


  1. If you wanted to add Neyer's predictions, here they are:

  2. Hadn't seen that, thanks. Not really much of interest in his numbers, as he basically just combines other projection systems.

  3. VW, Could you use these for CHONE? They were published later, and I consider them my "official", on the record projections.

    A column for each team's average would be cool.

  4. Main table and Google Doc have been updated with CHONE Optimist projections, Neyer's predictions, and "average" column. StDev and Vs Avg. tables now include the new CHONE projections as well.

  5. VW,

    Is that StDev row merely the standard deviation for each individual set of predictions? If so, how is that really informative? Has a lower stdev of predictions been correlated in the past with a lower RMSE? This may be interesting to look at if it has not been examined already.

  6. I have found a correlation in the past between StDev and RMSE. I think this is for two reasons. One is that it's just easier to have a lower RMSE if your predictions are less spread out; Sheehan picking the Royals to win 60 games has a decent chance of really killing his RMSE, and PECOTA doesn't have something like that. Second, a lower StDev, particularly for a projection system, shows that it is sufficiently regressing players' stats from previous years, which likely correlates with having an accurate set of projections.

  7. VW,

    A lot of that makes sense... and Sheehan's ridiculous projections certainly don't help his RMSE.

    I did go back and run the actual StDevs from 2009 and 2008. Interestingly enough, they were 11.4 and 11.1 respectively, much larger than the StDevs from this year's projection systems. Thoughts?
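
The StDev and RMSE relationship discussed in the comments above can be illustrated with a small sketch. Everything here is hypothetical: the "actual" win totals, the "bold" predictions, and the `regress` helper that pulls each prediction halfway toward 81 wins. The numbers are constructed so that the bold picks overshoot in the right direction, so this shows the mechanism only and is not evidence that it holds for real projections.

```python
from math import sqrt
from statistics import pstdev


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual win totals."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def regress(wins, weight, mean_wins=81):
    """Shrink each prediction toward mean_wins; weight=1 leaves it unchanged."""
    return [mean_wins + weight * (w - mean_wins) for w in wins]


# Hypothetical data for five teams, for illustration only.
actual = [92, 70, 85, 77, 81]
bold = [99, 60, 90, 70, 86]      # widely spread picks
cautious = regress(bold, 0.5)    # same picks, pulled halfway to 81

for label, picks in (("bold", bold), ("cautious", cautious)):
    print(f"{label}: StDev {pstdev(picks):.1f}, RMSE {rmse(picks, actual):.1f}")
```

In this constructed case, halving the spread also cuts the RMSE, because the bold picks had the right sign but too much magnitude. With real data, the comparison would need each predictor's full set of picks and the final standings.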