T-Test in Python

A T-test is a statistical test that tells you whether the mean of a group differs significantly from a reference value or from the mean of another group. It is one of the most common tools in hypothesis testing. The following is an in-depth description of the T-test and how to perform it in Python.

Types of T-tests

1. One-Sample T-test:

  • Compares the mean of one group to a known value or a theoretical population mean.
  • Example: Test whether a class's average score differs from 75.

2. Two-Sample (Independent) T-test:

  • Compares the means of two independent groups by testing whether they differ.
  • Example: Test whether there is a difference between the test scores of two different classes.

3. Paired T-test:

  • Compares the means of two related groups, such as measurements taken on the same subjects before and after a treatment.
  • Example: Compare participants' weights before and after a diet program.

Assumptions of T-tests

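  1. The data are continuous and approximately normally distributed (for a paired T-test, this applies to the differences between pairs).
  2. Observations are independent (for a paired T-test, each pair is independent of the other pairs).
  3. Homogeneity of variance: the variances of the two groups should be approximately equal (for the standard two-sample T-test; Welch's T-test relaxes this).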

Steps to Perform a T-test

1. Define hypotheses:

  • Null Hypothesis (H0): The means are equal (for a one-sample test, the mean equals the reference value).
  • Alternative Hypothesis (Ha): The means are not equal.

2. Set the significance level (α):

  • Commonly α = 0.05.

3. Calculate the T-statistic and p-value.

4. Interpret the results:

  • Reject H0 if p ≤ α.
  • Fail to reject H0 if p > α.
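For reference, step 3 relies on the standard textbook formulas below. These are the statistics that SciPy's `ttest_1samp`, `ttest_ind` (with its default `equal_var=True`) and `ttest_rel` compute for a two-sided alternative.

The symbols are labels chosen here:

- x̄ is a sample mean.
- s is the sample standard deviation, computed with an n − 1 denominator.
- μ0 is the reference value.
- d_i are the paired differences.

```latex
% One-sample T-test: sample of size n, reference value \mu_0
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mathrm{df} = n - 1

% Two-sample (independent) T-test with pooled variance (Student's version)
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \qquad \mathrm{df} = n_1 + n_2 - 2

% Paired T-test: d_i = x_i - y_i for each of the n pairs
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1

% Two-sided p-value, where T_{df} follows a t distribution with df degrees of freedom
p = 2 \, P\left(T_{\mathrm{df}} \ge |t|\right)
```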

Python Implementation

1. One-Sample T-Test

This test checks if the mean of a dataset is significantly different from a known value (e.g., population mean).

    from scipy.stats import ttest_1samp
    
    # Example data: Test scores
    data = [85, 90, 88, 92, 87, 89, 84, 91]
    population_mean = 88
    
    # Perform one-sample T-test
    t_stat, p_value = ttest_1samp(data, population_mean)
    
    print(f"T-statistic: {t_stat:.2f}")
    print(f"P-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
    else:
        print("Fail to reject the null hypothesis: No significant difference between the sample mean and the population mean.")

Output:

    T-statistic: 0.25
    P-value: 0.8089
    Fail to reject the null hypothesis: No significant difference between the sample mean and the population mean.
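To see what `ttest_1samp` does internally, here is a minimal sketch that computes the one-sample statistic by hand and compares it with SciPy. It assumes SciPy is installed. The names `t_manual`, `p_manual` and `t_dist` are illustrative only. The two-sided p-value comes from the t distribution's survival function, `scipy.stats.t.sf`.

```python
import math
from statistics import mean, stdev

from scipy.stats import t as t_dist, ttest_1samp

# Same example data and reference value as the one-sample example above
data = [85, 90, 88, 92, 87, 89, 84, 91]
population_mean = 88

n = len(data)
sample_mean = mean(data)              # 88.25
sample_sd = stdev(data)               # sample standard deviation (n - 1 denominator)
std_error = sample_sd / math.sqrt(n)  # standard error of the mean

# t = (sample mean - hypothesized mean) / standard error, with n - 1 degrees of freedom
t_manual = (sample_mean - population_mean) / std_error
df = n - 1

# Two-sided p-value: probability of a |t| at least this large if H0 is true
p_manual = 2 * t_dist.sf(abs(t_manual), df)

t_scipy, p_scipy = ttest_1samp(data, population_mean)
print(f"Manual: t = {t_manual:.2f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.2f}, p = {p_scipy:.4f}")
```

Both lines should print the same values, which is a quick way to confirm that the formula and the library agree.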

2. Two-Sample (Independent) T-Test

This test checks if the means of two independent groups are significantly different.

    from scipy.stats import ttest_ind
    
    # Example data: Test scores of two classes
    class_A = [85, 90, 88, 92, 87]
    class_B = [78, 82, 80, 84, 79]
    
    # Perform two-sample T-test
    t_stat, p_value = ttest_ind(class_A, class_B)
    
    print(f"T-statistic: {t_stat:.2f}")
    print(f"P-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print("Reject the null hypothesis: The two groups have significantly different means.")
    else:
        print("Fail to reject the null hypothesis: No significant difference between the means of the two groups.")

Output:

    T-statistic: 4.82
    P-value: 0.0013
    Reject the null hypothesis: The two groups have significantly different means.
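By default `ttest_ind` uses a pooled variance, which is Student's version of the test. The sketch below reproduces that calculation by hand so the pooled standard error is visible. It assumes SciPy is installed, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

from scipy.stats import t as t_dist, ttest_ind

# Same example data as the two-sample example above
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]

n_a, n_b = len(class_A), len(class_B)

# Pooled variance: weighted average of the two sample variances (n - 1 denominators)
pooled_var = ((n_a - 1) * variance(class_A) + (n_b - 1) * variance(class_B)) / (n_a + n_b - 2)
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (mean(class_A) - mean(class_B)) / std_error
df = n_a + n_b - 2
p_manual = 2 * t_dist.sf(abs(t_manual), df)  # two-sided p-value

t_scipy, p_scipy = ttest_ind(class_A, class_B)  # equal_var=True is the default
print(f"Manual: t = {t_manual:.2f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.2f}, p = {p_scipy:.4f}")
```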

3. Paired T-Test

This test compares the means of two related groups, such as measurements before and after treatment.

    from scipy.stats import ttest_rel
    
    # Example data: Scores before and after treatment
    before = [85, 88, 86, 90, 87]
    after = [89, 91, 88, 94, 90]
    
    # Perform paired T-test
    t_stat, p_value = ttest_rel(before, after)
    
    print(f"T-statistic: {t_stat:.2f}")
    print(f"P-value: {p_value:.4f}")
    
    if p_value < 0.05:
        print("Reject the null hypothesis: There is a significant difference between the paired samples.")
    else:
        print("Fail to reject the null hypothesis: No significant difference between the paired samples.")

Output:

    T-statistic: -8.55
    P-value: 0.0010
    Reject the null hypothesis: There is a significant difference between the paired samples.
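A paired T-test is the same as a one-sample T-test on the per-pair differences, tested against 0. This is why normality matters for the differences rather than for each group. The minimal sketch below checks this equivalence with SciPy, reusing the example data. The variable names are illustrative.

```python
from scipy.stats import ttest_1samp, ttest_rel

# Same example data as the paired example above
before = [85, 88, 86, 90, 87]
after = [89, 91, 88, 94, 90]

# Per-subject differences, in the order ttest_rel uses (first argument minus second)
differences = [x - y for x, y in zip(before, after)]  # [-4, -3, -2, -4, -3]

paired = ttest_rel(before, after)
one_sample = ttest_1samp(differences, 0)

print(f"ttest_rel(before, after):   t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"ttest_1samp(differences, 0): t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
```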

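How to Check the Assumptions

1. Normality Test:

  • Use the Shapiro-Wilk test to check whether the data are consistent with a normal distribution (its null hypothesis is that the data are normally distributed).

    from scipy.stats import shapiro
    
    # Example data from the one-sample T-test above
    data = [85, 90, 88, 92, 87, 89, 84, 91]
    
    stat, p = shapiro(data)
    if p > 0.05:
        print("No evidence against normality (fail to reject H0 of the Shapiro-Wilk test).")
    else:
        print("Data deviate significantly from a normal distribution.")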
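Which values to test depends on the T-test being run:

- One-sample test: the sample itself.
- Two-sample test: each group separately.
- Paired test: the before-minus-after differences.

The sketch below runs `scipy.stats.shapiro` on each of these, reusing the article's example data. The dictionary labels are illustrative. With only five observations per group, the test has little power to detect non-normality, so a large p-value here is weak evidence.

```python
from scipy.stats import shapiro

# Example data reused from the three T-test examples
data = [85, 90, 88, 92, 87, 89, 84, 91]
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]
before = [85, 88, 86, 90, 87]
after = [89, 91, 88, 94, 90]

# What to check for each kind of T-test
samples_to_check = {
    "one-sample: data": data,
    "two-sample: class_A": class_A,
    "two-sample: class_B": class_B,
    "paired: before - after": [x - y for x, y in zip(before, after)],
}

alpha = 0.05
results = {}
for label, values in samples_to_check.items():
    p = shapiro(values).pvalue
    results[label] = p
    verdict = "no evidence against normality" if p > alpha else "departs from normality"
    print(f"{label:<24} p = {p:.4f}  ({verdict})")
```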

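2. Equal Variance Test (for Two-Sample T-Test):

  • Use Levene’s test to check whether the variances are equal (its null hypothesis is that they are).

    from scipy.stats import levene
    
    # Example data from the two-sample T-test above
    class_A = [85, 90, 88, 92, 87]
    class_B = [78, 82, 80, 84, 79]
    
    stat, p = levene(class_A, class_B)
    if p > 0.05:
        print("No evidence of unequal variances.")
    else:
        print("Variances differ significantly; consider Welch's T-test (equal_var=False).")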

Key Points

  • A low p-value (≤ α, commonly 0.05) means we reject the null hypothesis; the result is statistically significant.
  • Check the assumptions (normality and, for two-sample tests, equality of variances) before performing a T-test.
  • For unequal variances in a two-sample T-test, set equal_var=False in ttest_ind; this runs Welch's T-test.
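The sketch below runs Welch's T-test, which is `ttest_ind` with `equal_var=False`, next to the default pooled-variance version on the same two classes. It assumes SciPy is installed, and the variable names are illustrative.

```python
from scipy.stats import ttest_ind

# Same example data as the two-sample example above
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]

# Student's T-test (pooled variance) vs. Welch's T-test (separate variances)
student = ttest_ind(class_A, class_B)                # equal_var=True is the default
welch = ttest_ind(class_A, class_B, equal_var=False)

print(f"Student: t = {student.statistic:.2f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

The group sizes here are equal, so the two T-statistics coincide. Only the degrees of freedom differ: Welch's are slightly fewer. That makes Welch's p-value slightly larger.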

Summary of Outputs:

Test Type            T-statistic   P-value   Conclusion
One-Sample T-Test     0.25         0.8089    Fail to reject H0
Two-Sample T-Test     4.82         0.0013    Reject H0: means are different
Paired T-Test        -8.55         0.0010    Reject H0: significant difference
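The summary can be regenerated in one run with the minimal sketch below. It reuses the article's example data and applies the decision rule "reject H0 if p ≤ α". The `results` list and the column layout are illustrative choices.

```python
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel

# Example data reused from the three T-test examples
data = [85, 90, 88, 92, 87, 89, 84, 91]
class_A = [85, 90, 88, 92, 87]
class_B = [78, 82, 80, 84, 79]
before = [85, 88, 86, 90, 87]
after = [89, 91, 88, 94, 90]

alpha = 0.05
results = [
    ("One-Sample T-Test", ttest_1samp(data, 88)),
    ("Two-Sample T-Test", ttest_ind(class_A, class_B)),
    ("Paired T-Test", ttest_rel(before, after)),
]

print(f"{'Test Type':<20}{'T-statistic':>12}{'P-value':>10}  Conclusion")
for name, res in results:
    # Decision rule from step 4: reject H0 if p <= alpha
    conclusion = "Reject H0" if res.pvalue <= alpha else "Fail to reject H0"
    print(f"{name:<20}{res.statistic:>12.2f}{res.pvalue:>10.4f}  {conclusion}")
```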