Hypothesis testing
Let’s do the t-test
For a two-sample t-test, we need one continuous variable and one categorical variable with exactly two categories.
# Exclude the virginica species to get a data set with only two species (setosa and versicolor);
# droplevels() also removes the now-empty virginica level from the Species factor
filtered_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
# Perform Welch's two-sample t-test (t.test() does not assume equal variances by default)
t.test(Sepal.Length ~ Species, data = filtered_data) # H0: mean sepal length is the same for setosa and versicolor
Welch Two Sample t-test
data: Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.1057074 -0.7542926
sample estimates:
    mean in group setosa mean in group versicolor
                   5.006                    5.936
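To see where the Welch numbers come from, here is a minimal sketch that recomputes them by hand. The t statistic is the difference in means divided by the unpooled standard error, and the degrees of freedom use the Welch-Satterthwaite approximation. The variable names (two_species, se2_x, t_welch, and so on) are our own illustrative choices.

```r
# Illustrative sketch: variable names are our own, not from the tutorial
# Rebuild the setosa/versicolor subset so this block runs on its own
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
x <- two_species$Sepal.Length[two_species$Species == "setosa"]
y <- two_species$Sepal.Length[two_species$Species == "versicolor"]

# Squared standard error of each sample mean (variances are NOT pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch's t statistic: difference in means over the combined standard error
t_welch <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_welch <- 2 * pt(-abs(t_welch), df_welch)

c(t = t_welch, df = df_welch, p = p_welch)
```

The t and df values should agree with the t = -10.521 and df = 86.538 printed by t.test() above.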
Let’s do a one-way ANOVA
anova <- aov(Sepal.Length ~ Species, data = iris) # H0: mean sepal length is the same for all three species
summary(anova)
             Df Sum Sq Mean Sq F value Pr(>F)
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
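The ANOVA table splits the total variation into a between-species part and a within-species (residual) part, then compares their mean squares. The following minimal sketch rebuilds the Sum Sq, F value and p-value columns by hand. The variable names are illustrative.

```r
# Illustrative sketch: variable names are our own, not from the tutorial
y <- iris$Sepal.Length
g <- iris$Species

grand_mean  <- mean(y)
groups      <- split(y, g)
group_means <- sapply(groups, mean)  # mean sepal length per species
group_n     <- lengths(groups)       # 50 flowers per species

# Between-group (Species) and within-group (Residuals) sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)  # ave() gives each flower its species mean

# Degrees of freedom, mean squares, F statistic and upper-tail p-value
df_between <- nlevels(g) - 1
df_within  <- length(y) - nlevels(g)
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(SS_species = ss_between, SS_residuals = ss_within, F = f_stat, p = p_val)
```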
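Perform Tukey’s HSD test for post hoc analysis
A significant ANOVA only tells us that at least one species mean differs. Tukey’s Honestly Significant Difference (HSD) test compares every pair of species while controlling the family-wise error rate, so it shows which specific groups differ significantly from each other.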
TukeyHSD(anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sepal.Length ~ Species, data = iris)
$Species
                      diff       lwr       upr p adj
versicolor-setosa    0.930 0.6862273 1.1737727     0
virginica-setosa     1.582 1.3382273 1.8257727     0
virginica-versicolor 0.652 0.4082273 0.8957727     0
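Each Tukey interval is the difference in means plus or minus a half-width. That half-width combines the studentized-range critical value from qtukey() with the residual mean square from the ANOVA. The following minimal sketch reproduces the diff, lwr and upr columns. The names fit, mse, pairs and tukey_manual are illustrative.

```r
# Illustrative sketch: variable names are our own, not from the tutorial
fit <- aov(Sepal.Length ~ Species, data = iris)

# Pooled error variance (residual mean square) and its degrees of freedom
df_res <- df.residual(fit)
mse    <- sum(residuals(fit)^2) / df_res

groups <- split(iris$Sepal.Length, iris$Species)
means  <- sapply(groups, mean)  # named vector of species means
n      <- lengths(groups)       # named vector of group sizes

# 95% critical value of the studentized range for three groups
q_crit <- qtukey(0.95, nmeans = length(means), df = df_res)

# Every pair of species; each column of `pairs` is (first, second)
pairs      <- combn(names(means), 2)
mean_diff  <- means[pairs[2, ]] - means[pairs[1, ]]
half_width <- q_crit / sqrt(2) *
  sqrt(mse * (1 / n[pairs[1, ]] + 1 / n[pairs[2, ]]))

tukey_manual <- cbind(diff = mean_diff,
                      lwr  = mean_diff - half_width,
                      upr  = mean_diff + half_width)
rownames(tukey_manual) <- paste(pairs[2, ], pairs[1, ], sep = "-")
tukey_manual
```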
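The adjusted p-values print as 0. Read them as very small (p < 0.001), not as exactly zero. In addition to Tukey’s HSD test, you can run other post hoc comparisons, such as pairwise t-tests with a Bonferroni correction.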
pairwise.t.test(iris$Sepal.Length, iris$Species, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: iris$Sepal.Length and iris$Species
           setosa  versicolor
versicolor 2.6e-15 -
virginica  < 2e-16 8.3e-09
P value adjustment method: bonferroni
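The Bonferroni correction itself is simple. It multiplies each unadjusted p-value by the number of comparisons (three species pairs here) and caps the result at 1. This minimal sketch checks that against the output above. The names raw, n_comparisons and bonf_manual are illustrative.

```r
# Illustrative sketch: variable names are our own, not from the tutorial
# Unadjusted pairwise t-tests (pooled SD, as above); the upper triangle is NA
raw <- pairwise.t.test(iris$Sepal.Length, iris$Species,
                       p.adjust.method = "none")$p.value

# Bonferroni: multiply by the number of comparisons, then cap at 1
n_comparisons <- sum(!is.na(raw))  # 3 species pairs
bonf_manual   <- pmin(raw * n_comparisons, 1)
bonf_manual
```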
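Let’s do the chi-squared test
The chi-squared test of independence needs two categorical variables so that we can build a contingency table. Species is already categorical, so let’s cut Petal.Length into two categories: (0, 5] cm and above 5 cm.
# Create a 3 x 2 contingency table (3 species by 2 petal-length categories)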
contingency_table <- table(iris$Species, cut(iris$Petal.Length, breaks = c(0, 5, Inf)))
contingency_table
# Perform the chi-squared test
chisq.test(contingency_table)
             (0,5] (5,Inf]
  setosa        50       0
  versicolor    49       1
  virginica      9      41
Pearson's Chi-squared test
data: contingency_table
X-squared = 108.53, df = 2, p-value < 2.2e-16
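The Pearson statistic compares each observed count with the count expected if species and petal-length category were independent. The expected count for a cell is (row total × column total) / grand total. This minimal sketch recomputes the expected counts and X-squared. With 50 flowers per species and column totals of 108 and 42, every expected count works out to 36 or 14. The variable names are illustrative.

```r
# Illustrative sketch: variable names are our own, not from the tutorial
# Rebuild the species-by-petal-length table so this block runs on its own
tab <- table(iris$Species, cut(iris$Petal.Length, breaks = c(0, 5, Inf)))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells
x2     <- sum((tab - expected)^2 / expected)
df_chi <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_chi  <- pchisq(x2, df_chi, lower.tail = FALSE)
c(X2 = x2, df = df_chi, p = p_chi)
```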
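Let’s do the Fisher’s exact test
# Some observed counts are small (0 and 1), so let's also run Fisher's exact test, which does not rely on a large-sample approximation.
# (The usual rule of thumb for the chi-squared test concerns expected counts below 5; here every expected count is 14 or more.)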
fisher.test(contingency_table)
Fisher's Exact Test for Count Data
data: contingency_table
p-value < 2.2e-16
alternative hypothesis: two.sided
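fisher.test() handles the 3 x 2 table above with an exact enumeration. The idea is easiest to see on a 2 x 2 table. Here is a minimal sketch on an illustrative sub-table, versicolor vs virginica only, using the same petal-length cut. With the row and column totals fixed, the top-left count follows a hypergeometric distribution. The two-sided p-value adds up the probabilities of all tables that are no more likely than the observed one. The names tab2, m, n, k, x and p_manual are our own.

```r
# Illustrative sketch: tab2, m, n, k, x and p_manual are our own names
# 2 x 2 sub-table: versicolor vs virginica, same petal-length cut as above
sub  <- subset(iris, Species %in% c("versicolor", "virginica"))
tab2 <- table(droplevels(sub$Species), cut(sub$Petal.Length, breaks = c(0, 5, Inf)))
tab2

# With both margins fixed, the top-left cell follows a hypergeometric distribution
m <- sum(tab2[, 1])  # column 1 total, (0,5] cm
n <- sum(tab2[, 2])  # column 2 total, above 5 cm
k <- sum(tab2[1, ])  # row 1 total, versicolor
x <- tab2[1, 1]      # observed versicolor count in (0,5]

# Probability of every possible top-left count given the margins
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)

# Two-sided p-value: add up the tables no more likely than the observed one
# (the 1 + 1e-7 factor is a small relative tolerance for floating-point ties)
p_obs    <- dhyper(x, m, n, k)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_manual
```

Comparing p_manual with fisher.test(tab2)$p.value is a quick way to confirm the sketch matches R's result for this 2 x 2 case.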