Hypothesis testing

Let’s do the t-test

For a t-test, we need one continuous variable and one categorical variable with two categories.

# Exclude the virginica species to create a new data set with only two species (setosa and versicolor)
filtered_data <- subset(iris, Species %in% c("setosa", "versicolor"))

# Perform a t-test
t.test(Sepal.Length ~ Species, data = filtered_data) # null: the mean sepal length is equal for setosa and versicolor
	Welch Two Sample t-test

data:  Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.1057074 -0.7542926
sample estimates:
    mean in group setosa mean in group versicolor 
                   5.006                    5.936 
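As a rough check of what the Welch test computes, the statistic and degrees of freedom can be reproduced by hand from the group means and variances. This is a minimal sketch, not part of the original analysis; the variable names (`setosa`, `versicolor`, `t_stat`, `welch_df`) are introduced here for illustration only.

```r
# Sepal lengths for the two species being compared
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# Squared standard error of each group mean (variance / sample size)
se2_setosa <- var(setosa) / length(setosa)
se2_versicolor <- var(versicolor) / length(versicolor)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(setosa) - mean(versicolor)) / sqrt(se2_setosa + se2_versicolor)

# Welch-Satterthwaite approximation for the degrees of freedom
welch_df <- (se2_setosa + se2_versicolor)^2 /
  (se2_setosa^2 / (length(setosa) - 1) +
     se2_versicolor^2 / (length(versicolor) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

t_stat    # compare with t = -10.521 in the output above
welch_df  # compare with df = 86.538 in the output above
```

The fractional degrees of freedom arise because the Welch test does not assume equal variances in the two groups.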

Let’s do the ANOVA test

anova <- aov(Sepal.Length ~ Species, data = iris) # null: the mean sepal length is equal for all species
summary(anova)
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
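To see where the numbers in the ANOVA table come from, the sums of squares and the F statistic can be rebuilt by hand. This is a sketch for illustration only; the names `ss_between`, `ss_within`, and `f_stat` are not from the original code.

```r
# Overall mean and per-species means and sizes
grand_mean <- mean(iris$Sepal.Length)
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
group_sizes <- tapply(iris$Sepal.Length, iris$Species, length)

# Between-group sum of squares: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: spread of each observation around its own group mean
# (ave() returns the group mean for every row)
ss_within <- sum((iris$Sepal.Length - ave(iris$Sepal.Length, iris$Species))^2)

# Degrees of freedom: groups - 1, and observations - groups
df_between <- nlevels(iris$Species) - 1
df_within <- nrow(iris) - nlevels(iris$Species)

# F statistic is the ratio of the two mean squares
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within, f_stat = f_stat)
```

The values should match the Sum Sq and F value columns of the table above, up to rounding.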

Perform Tukey’s HSD test for post hoc analysis

Tukey’s Honestly Significant Difference (HSD) test compares every pair of group means, which allows you to determine which specific groups differ significantly from each other.

TukeyHSD(anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sepal.Length ~ Species, data = iris)

$Species
                      diff       lwr       upr p adj
versicolor-setosa    0.930 0.6862273 1.1737727     0
virginica-setosa     1.582 1.3382273 1.8257727     0
virginica-versicolor 0.652 0.4082273 0.8957727     0
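The `p adj` column above is displayed as 0, which most likely reflects rounding in the printed output rather than p-values that are exactly zero. As a small sketch, the underlying matrix can be pulled out of the result and the adjusted p-values shown in scientific notation; `fit` and `tukey_table` are names introduced here for illustration.

```r
# Refit the model under a name that does not shadow R's anova() function
fit <- aov(Sepal.Length ~ Species, data = iris)

# TukeyHSD() returns a list with one matrix per factor in the model
tukey_table <- TukeyHSD(fit)$Species

# Pairwise differences in mean sepal length
tukey_table[, "diff"]

# Adjusted p-values without the rounding applied by the print method
format(tukey_table[, "p adj"], scientific = TRUE)
```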

In addition to Tukey’s HSD test, you can perform other post hoc tests, such as pairwise t-tests with the Bonferroni correction.

pairwise.t.test(iris$Sepal.Length, iris$Species, p.adjust.method = "bonferroni")
	Pairwise comparisons using t tests with pooled SD 

data:  iris$Sepal.Length and iris$Species 

           setosa  versicolor
versicolor 2.6e-15 -         
virginica  < 2e-16 8.3e-09   

P value adjustment method: bonferroni 
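The Bonferroni adjustment itself is simple: each raw p-value is multiplied by the number of comparisons and capped at 1. The sketch below illustrates this with made-up p-values (they are hypothetical and not taken from the iris analysis).

```r
# Hypothetical raw p-values from three pairwise comparisons
raw_p <- c(0.01, 0.02, 0.5)

# Built-in adjustment
adjusted_p <- p.adjust(raw_p, method = "bonferroni")

# The same adjustment written out by hand
manual_p <- pmin(1, raw_p * length(raw_p))

adjusted_p  # 0.03 0.06 1.00
```

With three species there are three pairwise comparisons, so each raw p-value in the table above was multiplied by 3.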

Let’s do the chi-squared test

We need two categorical variables to build a contingency table. Let’s split Petal.Length into two bins to turn it into a categorical variable.

# Create a contingency table (3 species by 2 petal-length bins)
contingency_table <- table(iris$Species, cut(iris$Petal.Length, breaks = c(0, 5, Inf)))
contingency_table

# Perform the chi-squared test
chisq.test(contingency_table)
            
             (0,5] (5,Inf]
  setosa        50       0
  versicolor    49       1
  virginica      9      41
	Pearson's Chi-squared test

data:  contingency_table
X-squared = 108.53, df = 2, p-value < 2.2e-16
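The usual rule of thumb for trusting the chi-squared approximation is stated in terms of expected counts rather than observed counts. As a minimal sketch, the expected counts can be read from the `expected` component of the object returned by `chisq.test()`; the name `chi_result` is introduced here for illustration.

```r
# Rebuild the same contingency table as above
contingency_table <- table(iris$Species, cut(iris$Petal.Length, breaks = c(0, 5, Inf)))

# Store the test result so its components can be inspected
chi_result <- chisq.test(contingency_table)

# Expected counts under independence: row total * column total / grand total
# Here each row is 50 * 108 / 150 = 36 and 50 * 42 / 150 = 14
chi_result$expected

# Observed counts, for comparison
chi_result$observed
```

Comparing the two tables shows how far the observed counts are from what independence between species and petal-length bin would predict.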

Let’s do the Fisher’s exact test

# Some cell counts are below 5, so let's also perform Fisher's exact test
fisher.test(contingency_table)
	Fisher's Exact Test for Count Data

data:  contingency_table
p-value < 2.2e-16
alternative hypothesis: two.sided