November 24, 2009

The Significance of Small Samples

Filed under: Uncategorized — wwinston @ 10:50 am

Warning: This is a fairly technical post. You can ignore the math and skip to the last paragraph (in bold) if you wish.

My posts often get criticized for drawing inferences from small samples. But small samples can often yield significant results. Suppose Drug A is given to 6 patients with stage 4 cancer and 5 of the 6 survive 5 years or more. Suppose Drug B is given to 6 patients whose cancer has advanced to a similar stage and only 2 survive 5 years or more. What is the chance (based on this small sample) that Drug A is better than Drug B? If you know statistics you can show that there is a 94% chance that Drug A is better based only on these 12 patients.
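For readers who want to check a number like this themselves, here is a rough sketch of one way to do it. The post does not say which method produced the 94% figure, so the approach below is an assumption: treat each drug's true survival rate as unknown with a uniform prior, update on the 6 patients per drug, and ask how likely it is that Drug A's rate exceeds Drug B's. The function names are made up for illustration.

```python
from fractions import Fraction
from math import factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def prob_a_beats_b(surv_a, n_a, surv_b, n_b):
    """P(rate_A > rate_B) assuming independent uniform priors (an assumption)."""
    # Posterior for each drug is Beta(survivors + 1, deaths + 1)
    a_a, b_a = surv_a + 1, n_a - surv_a + 1
    a_b, b_b = surv_b + 1, n_b - surv_b + 1
    # Closed-form sum for P(rate_B > rate_A), valid for integer parameters
    p_b_better = sum(
        beta_fn(a_a + i, b_a + b_b)
        / ((b_b + i) * beta_fn(1 + i, b_b) * beta_fn(a_a, b_a))
        for i in range(a_b)
    )
    return 1 - p_b_better


p = prob_a_beats_b(surv_a=5, n_a=6, surv_b=2, n_b=6)
print(p, float(p))  # 37/39, about 0.949
```

Under these assumptions the answer lands in the mid-90s, in the same neighborhood as the figure quoted above. A different prior or a different test would move the number somewhat.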

Let’s now return to a less important topic: basketball. Suppose Player X was injured for 7 games, and while he was out our best estimate is that his team (after adjusting for strength of opponents and home court) played 0.08 points better than the NBA average. In the ten games before his injury and the ten games after his return, our best estimate is that his team played 6.8 points worse than average. Based on this data, let’s test the following hypotheses:

Null: When Player X was injured, the team played worse than or the same as they did when Player X was not injured.

Alternative: The team played better with Player X injured.

The standard deviation of a team’s performance is 12 points per game. Cranking through the math, we reject the null hypothesis with a p-value of around .01.
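If you want to crank through a version of the math yourself, the sketch below runs a plain one-sided two-sample z-test. It is not necessarily the calculation behind the p-value above: it assumes the games are independent, that the 12-point standard deviation applies to every game, that the two averages come from 7 and 20 games respectively, and that a normal approximation is good enough. The function name is made up for illustration.

```python
from math import erf, sqrt


def one_sided_p(mean_out, n_out, mean_in, n_in, sd_per_game):
    """z and p-value for H1: team was better with the player out (known SD)."""
    # Standard error of the difference between two independent sample means
    se = sd_per_game * sqrt(1 / n_out + 1 / n_in)
    z = (mean_out - mean_in) / se
    # Upper-tail probability of a standard normal
    p = 0.5 * (1 - erf(z / sqrt(2)))
    return z, p


z, p = one_sided_p(mean_out=0.08, n_out=7, mean_in=-6.8, n_in=20, sd_per_game=12)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With only these inputs the simple test gives a p-value of roughly 0.10, noticeably larger than the figure quoted above, so the post's own calculation presumably rests on details not spelled out here.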

Player X is Kevin Durant and the team is the 2008-2009 OKC Thunder. Put another way, if the Thunder really had played no better with KD out, we would see a gap this large only about 1 time in 100. Of course, there could be other explanations for why the Thunder improved when KD was injured. But they played poorly before he was injured and played poorly after he returned. Is not the most logical explanation that the team played better last year when KD was out?

Of course, this year KD is playing much better. But it is hard to make a case that he improved the Thunder’s performance during the 2008-2009 season.


  1. [...] The Lampshade: Wayne Winston warns against putting too much stock into small sample sizes. [mathletics] [...]

    Pingback by The Mid-Afternoon Milk Mustache, featuring early season MVP candidates | Stacheketball, an NBA Blog — November 24, 2009 @ 5:21 pm

  2. I have one problem with this analysis.

    Let’s assume we were going to evaluate the 50 best players in the NBA as voted by fans, sports writers etc… Among them we are reasonably likely to find one KD. That is, one player that rates poorly on Adj +/- despite almost everyone being convinced he’s extremely talented and already a very good ball player.

    That player would then be the focus of articles saying that based on stats he’s not as good as he looks.

    So are the odds really 1 out of 100?

    I think not.

    If you picked one player at random, evaluated him, and he came up bad, THEN we could be way more confident that the stats were telling us something significant.

    Comment by Italian Stallion — November 24, 2009 @ 5:58 pm

  3. By the way, all that said, I don’t necessarily disagree with the stats either.

    Comment by Italian Stallion — November 24, 2009 @ 5:58 pm

  4. If the question was to name the top 50 team assets, I’d have no problem with 08-09 Durant being on that list. If the question was to name the 50 best offensive players or the 50 most valuable offensive players, again no problem picking Durant.

    But if the question was to name the 50 best performing players in the NBA last season overall (offense and defense, individual stats and team impact), I have no problem with the Adjusted +/- finding being used to question or reject Durant’s inclusion for last season.

    And if the experts answered the other way I’m sorry but I am taking Adjusted +/-, with its limits and errors, over their opinions but I don’t trust them to really know team offensive impact and defensive impact and to value it properly. At least for 75% of them.

    Comment by Crow — November 24, 2009 @ 7:46 pm

  5. should be … “because” I don’t trust them to really know…

    Comment by Crow — November 24, 2009 @ 7:48 pm

  6. His apparent defensive impact was just too big to ignore or overcome last season, though it could have been overstated. But ignore or greatly undervalue its significance, most of the fans, media and insiders would do that.

    This season there is less to disagree about because the defense (his and/or the defense around him) appears better.

    Comment by Crow — November 24, 2009 @ 7:52 pm
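Italian Stallion's point in comment 2, about picking out the one surprising player after looking at 50 of them, can be put in rough numbers. The sketch below assumes each player's test is independent and has a 1-in-100 chance of crossing the threshold by luck alone. Both are simplifying assumptions for illustration and do not come from the post.

```python
def chance_of_at_least_one(n_players, false_alarm_rate):
    """P(at least one of n independent tests crosses the threshold by luck)."""
    return 1 - (1 - false_alarm_rate) ** n_players


p_any = chance_of_at_least_one(n_players=50, false_alarm_rate=0.01)
print(f"{p_any:.3f}")  # 0.395
```

So under these assumptions, scanning 50 players would turn up at least one "1 in 100" result by chance a bit under 40% of the time, which is why the order of picking and testing matters.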
