Friday, 13 May 2016

T-tests with Python

The t-test is a mainstay of basic analysis in many fields. In Python, the scipy.stats module offers 1-sample, unpaired 2-sample and paired t-tests. The examples below use Python 3.

1-sample t-test

The 1-sample t-test is used when we want to compare a sample mean to a population mean (which we already know). The average British man is 175.3 cm tall. A survey recorded the heights of 10 UK men and we want to know whether the mean of the sample is different from the population mean.

In [14]:
# 1-sample t-test
from scipy import stats
one_sample_data = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]

one_sample = stats.ttest_1samp(one_sample_data, 175.3)

print("The t-statistic is %.3f and the p-value is %.3f." % one_sample)
The t-statistic is 2.296 and the p-value is 0.047.

Here we can conclude that the mean height of our sample is significantly different (p < 0.05) from the average British male height. The return value (one_sample) is a tuple containing the t-statistic and the p-value from a two-sided t-test.
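To make the statistic less of a black box, here is a minimal sketch that recomputes the 1-sample t-statistic by hand using only the standard library. The variable names (heights, standard_error and so on) are illustrative and not part of scipy; the p-value is left to scipy, since it requires the t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics

heights = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]
population_mean = 175.3

n = len(heights)
sample_mean = statistics.mean(heights)
sample_sd = statistics.stdev(heights)        # sample SD (n - 1 in the denominator)
standard_error = sample_sd / math.sqrt(n)    # standard error of the mean

# t = (sample mean - population mean) / standard error
t_stat = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1

print("t = %.3f with %d degrees of freedom" % (t_stat, degrees_of_freedom))
```

The t-statistic should agree with the value reported by stats.ttest_1samp above.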

Unpaired t-test

This test compares two unrelated samples. In the example below data was collected on the weight (kg) of 8 elderly women and 8 elderly men. We are interested in whether the average male and female weights are different.

In [15]:
female = [63.8, 56.4, 55.2, 58.5, 64.0, 51.6, 54.6, 71]
male = [75.5, 83.9, 75.7, 72.5, 56.2, 73.4, 67.7, 87.9]

two_sample = stats.ttest_ind(male, female)

print("The t-statistic is %.3f and the p-value is %.3f." % two_sample)

# assuming unequal population variances
two_sample_diff_var = stats.ttest_ind(male, female, equal_var=False)

print("If we assume unequal variances then the t-statistic is %.3f and the p-value is %.3f." % two_sample_diff_var)
The t-statistic is 3.588 and the p-value is 0.003.
If we assume unequal variances then the t-statistic is 3.588 and the p-value is 0.004.

Here we can conclude that the mean weights of the men and women are indeed different (p < 0.01). As with the 1-sample test above, the return value is a tuple containing the t-statistic and the p-value, and these are the results of a two-sided test. If we suspect that the samples come from populations with unequal variances, we can set the equal_var parameter to False. The results are then from Welch's t-test, which does not assume equal population variances.
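The two calls above report the same t-statistic but slightly different p-values. The sketch below, which uses only the standard library and illustrative variable names, suggests why: with equal group sizes the pooled and Welch standard errors coincide, but Welch's test uses the Welch-Satterthwaite approximation for the degrees of freedom, which gives fewer than the pooled n1 + n2 - 2.

```python
import math
import statistics

female = [63.8, 56.4, 55.2, 58.5, 64.0, 51.6, 54.6, 71]
male = [75.5, 83.9, 75.7, 72.5, 56.2, 73.4, 67.7, 87.9]

n_m, n_f = len(male), len(female)
var_m, var_f = statistics.variance(male), statistics.variance(female)
mean_diff = statistics.mean(male) - statistics.mean(female)

# Student's t-test: pool the two sample variances
pooled_var = ((n_m - 1) * var_m + (n_f - 1) * var_f) / (n_m + n_f - 2)
t_pooled = mean_diff / math.sqrt(pooled_var * (1 / n_m + 1 / n_f))
df_pooled = n_m + n_f - 2

# Welch's t-test: keep the variances separate
se2_m, se2_f = var_m / n_m, var_f / n_f
t_welch = mean_diff / math.sqrt(se2_m + se2_f)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (se2_m + se2_f) ** 2 / (se2_m ** 2 / (n_m - 1) + se2_f ** 2 / (n_f - 1))

print("Pooled: t = %.3f, df = %d" % (t_pooled, df_pooled))
print("Welch:  t = %.3f, df = %.1f" % (t_welch, df_welch))
```

With unequal group sizes the two t-statistics would generally differ as well.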

Paired t-test

The paired t-test is used when we have two sets of repeated measures, i.e. we have measured some parameter on the same subjects at two different times (or under two different conditions). In the example below, the weights (kg) of 9 people were recorded before they had abdominal surgery and again 5 months later. We are asking whether surgery leads to a change in weight.

In [16]:
baseline = [67.2, 67.4, 71.5, 77.6, 86.0, 89.1, 59.5, 81.9, 105.5]
follow_up = [62.4, 64.6, 70.4, 62.6, 80.1, 73.2, 58.2, 71.0, 101.0]

paired_sample = stats.ttest_rel(baseline, follow_up)

print("The t-statistic is %.3f and the p-value is %.3f." % paired_sample)
The t-statistic is 3.668 and the p-value is 0.006.

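From the results we see that weight changed significantly between the pre- and post-surgery measurements (p < 0.01). Because the test was computed as baseline minus follow-up, the positive t-statistic indicates that weights were lower, on average, at follow-up.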
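A paired t-test is equivalent to a 1-sample t-test on the per-subject differences, tested against a mean of zero. The following is a minimal sketch of that calculation using only the standard library; the variable names are illustrative.

```python
import math
import statistics

baseline = [67.2, 67.4, 71.5, 77.6, 86.0, 89.1, 59.5, 81.9, 105.5]
follow_up = [62.4, 64.6, 70.4, 62.6, 80.1, 73.2, 58.2, 71.0, 101.0]

# One difference per subject: baseline minus follow-up
differences = [b - f for b, f in zip(baseline, follow_up)]

n = len(differences)
mean_diff = statistics.mean(differences)
standard_error = statistics.stdev(differences) / math.sqrt(n)

# 1-sample t-statistic for the differences against a mean of 0
t_stat = mean_diff / standard_error

print("Mean change = %.2f kg, t = %.3f with %d degrees of freedom"
      % (mean_diff, t_stat, n - 1))
```

The t-statistic should match the one returned by stats.ttest_rel above.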