T-Test in Python
A T-test is a statistical test that assesses whether a difference between means is statistically significant, either between two groups or between one group and a reference value. It is one of the most widely used tools in hypothesis testing. The following describes the types of T-test and how to perform each of them in Python.
Types of T-tests
1. One-Sample T-test:
- Compares the mean of one group to a known value or a theoretical population mean.
- Example: Test whether the class average score differs from 75.
2. Two-Sample (Independent) T-test:
- Tests whether the means of two independent groups differ.
- Example: Test whether there is a difference between the test scores of two different classes.
3. Paired T-test:
- Compares the means of two related groups, such as before-and-after measurements on the same subjects.
- Example: Comparing participants' weight before and after a diet program.
Assumptions of T-tests
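- Data are continuous and approximately normally distributed (for a paired T-test, this applies to the pairwise differences).
- Observations are independent (for a paired T-test, the pairs are independent of one another).
- Homogeneity of variance: the variances of the two groups are approximately equal (required for the standard two-sample T-test).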
Steps to Perform a T-test
1. Define hypotheses:
- Null Hypothesis (H0): The population means are equal (no difference).
- Alternative Hypothesis (Ha): The population means are not equal (there is a difference).
2. Set the significance level (α):
- Commonly α=0.05
3. Calculate T-statistic and p-value.
4. Interpret results:
- Reject H0 if p≤α.
- Fail to reject H0 if p>α.
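Steps 3 and 4 can be sketched by hand for the one-sample case. The snippet below is a minimal illustration, assuming the two-sided alternative stated above and reusing the test scores from the one-sample example in this article. The variable names (mu0, std_error, reject_h0) are illustrative and not part of any library.

```python
import numpy as np
from scipy import stats

# Test scores from the one-sample example in this article
data = np.array([85, 90, 88, 92, 87, 89, 84, 91])
mu0 = 88        # hypothesized population mean (H0: mean == mu0)
alpha = 0.05    # significance level

n = data.size
sample_mean = data.mean()
sample_sd = data.std(ddof=1)          # sample standard deviation (n - 1 denominator)
std_error = sample_sd / np.sqrt(n)    # standard error of the mean

# Step 3: T-statistic and two-sided p-value
t_stat = (sample_mean - mu0) / std_error
df = n - 1                                   # degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)    # P(|T| >= |t_stat|) under H0

# Step 4: decision rule
reject_h0 = p_value <= alpha
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

The same quantities are returned by scipy.stats.ttest_1samp, so this manual version is useful mainly for seeing where the numbers come from.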
Python Implementation
1. One-Sample T-Test
This test checks if the mean of a dataset is significantly different from a known value (e.g., population mean).
from scipy.stats import ttest_1samp
# Example data: Test scores
data = [85, 90, 88, 92, 87, 89, 84, 91]
population_mean = 88
# Perform one-sample T-test
t_stat, p_value = ttest_1samp(data, population_mean)
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the sample mean and the population mean.")
Output:
T-statistic: 0.25
P-value: 0.8089
Fail to reject the null hypothesis: No significant difference between the sample mean and the population mean.
2. Two-Sample (Independent) T-Test
This test checks if the means of two independent groups are significantly different.
from scipy.stats import ttest_ind
# Example data: Test scores of two classes
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]
# Perform two-sample T-test
t_stat, p_value = ttest_ind(class_A, class_B)
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: The two groups have significantly different means.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the means of the two groups.")
Output:
T-statistic: 4.82
P-value: 0.0013
Reject the null hypothesis: The two groups have significantly different means.
3. Paired T-Test
This test compares the means of two related groups, such as measurements before and after treatment.
from scipy.stats import ttest_rel
# Example data: Scores before and after treatment
before = [85, 88, 86, 90, 87]
after = [89, 91, 88, 94, 90]
# Perform paired T-test
t_stat, p_value = ttest_rel(before, after)
print(f"T-statistic: {t_stat:.2f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: There is a significant difference between the paired samples.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the paired samples.")
Output:
T-statistic: -8.55
P-value: 0.0010
Reject the null hypothesis: There is a significant difference between the paired samples.
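A paired T-test is equivalent to a one-sample T-test on the pairwise differences, tested against a mean of 0. The sketch below illustrates this with the same before/after scores; the variable names diff, paired and one_sample are only illustrative.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Same scores as in the paired example above
before = np.array([85, 88, 86, 90, 87])
after = np.array([89, 91, 88, 94, 90])

# Pairwise differences (before - after), one value per subject
diff = before - after

paired = ttest_rel(before, after)    # paired T-test on the two samples
one_sample = ttest_1samp(diff, 0)    # one-sample T-test: is the mean difference 0?

print(f"Paired:     t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"One-sample: t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
```

Both lines print the same statistic and p-value. This is also why the normality assumption for a paired test concerns the differences rather than each group separately.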
How to Check Assumptions?
1. Normality Test:
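- Use the Shapiro-Wilk test to check whether the data are consistent with a normal distribution. Its null hypothesis is that the data are normally distributed, so a large p-value means only that no departure from normality was detected.
from scipy.stats import shapiro
# `data` is the list of test scores from the one-sample example
stat, p = shapiro(data)
if p > 0.05:
    print("No evidence against normality (fail to reject H0).")
else:
    print("Data do not appear to be normally distributed (reject H0).")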
2. Equal Variance Test (for Two-Sample T-Test):
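- Use Levene’s test to check whether the variances are equal. Its null hypothesis is that the groups have equal variances.
from scipy.stats import levene
# `class_A` and `class_B` are the score lists from the two-sample example
stat, p = levene(class_A, class_B)
if p > 0.05:
    print("No evidence of unequal variances (fail to reject H0).")
else:
    print("Variances appear to be unequal (reject H0).")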
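The two checks above can be gathered into one small helper. The following is only a sketch: check_assumptions is a hypothetical function written for this article, not a SciPy API, and the dictionary keys are illustrative. For paired data it tests the pairwise differences, since that is where the normality assumption applies.

```python
import numpy as np
from scipy.stats import levene, shapiro

def check_assumptions(group_a, group_b, paired=False):
    """Hypothetical helper: return p-values of the assumption checks."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    if paired:
        # Paired T-test: normality is checked on the pairwise differences
        return {"shapiro_diff_p": float(shapiro(a - b).pvalue)}
    # Two-sample T-test: normality per group, plus equality of variances
    return {
        "shapiro_a_p": float(shapiro(a).pvalue),
        "shapiro_b_p": float(shapiro(b).pvalue),
        "levene_p": float(levene(a, b).pvalue),
    }

# Data from the two-sample and paired examples in this article
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]
before = [85, 88, 86, 90, 87]
after = [89, 91, 88, 94, 90]

print(check_assumptions(class_A, class_B))
print(check_assumptions(before, after, paired=True))
```

Each p-value is then compared with the chosen significance level. With samples as small as these, the checks have little power, so a large p-value should be read as an absence of evidence rather than as proof that the assumption holds.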
Key Points
- A p-value at or below the significance level (commonly 0.05) leads to rejecting the null hypothesis; the result is then called statistically significant.
- Check the assumptions (normality and, for two-sample tests, equality of variances) before performing a T-test.
- For unequal variances in a two-sample T-test, set equal_var=False in ttest_ind (this performs Welch's T-test).
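As a short illustration of the equal_var option, the sketch below runs both variants of ttest_ind on the class scores from the two-sample example. The names student and welch are only illustrative.

```python
from scipy.stats import ttest_ind

# Same class scores as in the two-sample example
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]

# Default: assumes equal variances (standard two-sample T-test)
student = ttest_ind(class_A, class_B)
# equal_var=False: does not assume equal variances (Welch's T-test)
welch = ttest_ind(class_A, class_B, equal_var=False)

print(f"Standard: t = {student.statistic:.2f}, p = {student.pvalue:.4f}")
print(f"Welch:    t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

Because both groups here have the same size, the two variants give the same T-statistic. Welch's version uses fewer degrees of freedom, so its p-value is slightly larger. With unequal group sizes the statistics themselves would generally differ as well.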
Summary of Outputs:
| Test Type | T-Statistic | P-value | Conclusion |
|---|---|---|---|
| One-Sample T-Test | 0.25 | 0.8089 | Fail to reject H0 |
| Two-Sample T-Test | 4.82 | 0.0013 | Reject H0: Means are different |
| Paired T-Test | -8.55 | 0.0010 | Reject H0: Significant difference |