Warning: This is a fairly technical post. You can ignore the math and skip to the last paragraph (in bold) if you wish.
My posts often get criticized for drawing inferences from small samples. But small samples can often yield significant results. Suppose Drug A is given to 6 patients with stage 4 cancer and 5 of the 6 survive 5 years or more. Suppose Drug B is given to 6 patients whose cancer has advanced to a similar stage and only 2 survive 5 years or more. What is the chance (based on this small sample) that Drug A is better than Drug B? If you know statistics, you can show that there is a 94% chance that Drug A is better based only on these 12 patients.
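For readers who want to check a figure like this themselves, here is a minimal sketch of one way to do it. The post does not say which method produces the 94%, so everything here is an assumption: it puts a uniform prior on each drug's survival rate, draws from the resulting Beta posteriors, and counts how often Drug A's rate comes out higher. The function name prob_a_better and its parameters are made up for illustration.

```python
import random


def prob_a_better(surv_a, n_a, surv_b, n_b, draws=200_000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B).

    Assumption (not from the post): a uniform Beta(1, 1) prior on each
    survival rate, so each posterior is Beta(survivors + 1, deaths + 1).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(surv_a + 1, n_a - surv_a + 1)
        rate_b = rng.betavariate(surv_b + 1, n_b - surv_b + 1)
        if rate_a > rate_b:
            wins += 1
    return wins / draws


# Drug A: 5 of 6 survive. Drug B: 2 of 6 survive.
print(round(prob_a_better(5, 6, 2, 6), 3))
```

Under these assumptions the estimate comes out at roughly 0.95, in the neighborhood of the 94% quoted above. A different prior or a classical test would give a somewhat different number.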
Let’s now return to a less important topic: basketball. Suppose Player X was injured for 7 games, and while he was out our best estimate is that his team (after adjusting for strength of opponents and home court) played 0.08 points better than the NBA average. In the ten games before his injury and the ten games after his injury, our best estimate is that his team played 6.8 points worse than average. Based on this data, let’s test the following hypotheses:
Null: When Player X was injured, the team played worse than or the same as it did when Player X was not injured.
Alternative: The team played better with Player X injured.
The standard deviation of a team’s performance is 12 points per game. Cranking through the math, we reject the null hypothesis with a p-value of around 0.01.
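The post does not show its arithmetic, so the sketch below is only one plausible way to set up the test, not necessarily the calculation behind the figure above. It assumes the 7-game and 20-game averages are independent, that each game has a known standard deviation of 12 points, and that a one-sided two-sample z-test is appropriate. The function name one_sided_z_test is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(mean_out, n_out, mean_in, n_in, sd):
    """One-sided z-test that the team was better with the player out.

    Assumptions (not from the post): independent games and a known,
    common per-game standard deviation `sd`.
    """
    # Standard error of the difference between the two sample means
    se = sd * sqrt(1 / n_out + 1 / n_in)
    z = (mean_out - mean_in) / se
    # Upper-tail probability under the standard normal distribution
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# 7 games out at +0.08, 20 games in at -6.8, per-game SD of 12
z, p = one_sided_z_test(0.08, 7, -6.8, 20, 12.0)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With these particular assumptions the script prints z = 1.31 and a one-sided p of about 0.096, which is noticeably larger than 0.01. The smaller figure presumably rests on a different standard error or model than this simple version.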
Player X is Kevin Durant and the team is the 2008-2009 OKC Thunder. Put another way, if the Thunder truly played no better with KD out, a gap this large would show up less than 1 time in 100. Of course, there could be other explanations for why the Thunder improved when KD was injured. But they played poorly before he was injured and played poorly after he returned. Is not the most logical explanation that the team played better last year when KD was out?
Of course, this year KD is playing much better. But it is hard to make a case that he improved the Thunder’s performance during the 2008-2009 season.