Handling Missing Data
Use na.omit() to remove rows with NA values
Purpose: na.omit() removes rows that contain NA (missing) values from a dataset. Any row in which at least one column is NA is dropped, and the rows that remain keep their original row names.
# Creating a sample dataset
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)
# Printing the sample dataset
print(data)
# Example: na.omit() drops every row that contains at least one NA
clean_data <- na.omit(data)
print(clean_data)
     Name Age Height
1   Alice  25    165
2     Bob  NA    175
3 Charlie  30     NA
4   David  22    180
5     Eve  NA    160
   Name Age Height
1 Alice  25    165
4 David  22    180
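The output above keeps row names 1 and 4, so you can still tell which original rows survived. The sketch below shows two related details: na.omit() records the dropped row numbers in an "na.action" attribute, and complete.cases() from the stats package selects the same rows when used as a row index. It redefines the sample data so it runs on its own; `dropped` and `same_rows` are illustrative names only.

```r
# Same sample dataset as above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

clean_data <- na.omit(data)

# na.omit() stores the indices of the removed rows in the "na.action" attribute;
# as.integer() strips its names and class, leaving plain row numbers
dropped <- as.integer(attr(clean_data, "na.action"))
print(dropped)

# complete.cases() is TRUE for rows with no NA, so indexing with it
# keeps the same rows as na.omit() (without adding the na.action attribute)
same_rows <- data[complete.cases(data), ]
print(identical(rownames(same_rows), rownames(clean_data)))
```

If you prefer consecutive row numbers after filtering, `rownames(clean_data) <- NULL` resets them to 1, 2, ...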
Use is.na() to find NA values
Purpose: is.na() identifies where NA values are located in your data. It returns a logical vector when applied to a vector (such as a single column) and a logical matrix when applied to a data frame. TRUE marks a missing value and FALSE marks a value that is present.
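Applied to a single column, is.na() returns a logical vector, which can be negated with `!` to keep only the rows where that column is present. This is a minimal sketch reusing the sample data; `age_missing` and `has_age` are illustrative names. Unlike na.omit(), this filter only looks at Age, so Charlie's row (missing Height) is kept.

```r
# Same sample dataset as above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

# On a single column, is.na() returns a logical vector (one value per row)
age_missing <- is.na(data$Age)
print(age_missing)

# Keep rows where Age is present; NAs in other columns are ignored
has_age <- data[!age_missing, ]
print(has_age)
```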
# Example: a logical matrix marking the position of every NA
na_positions <- is.na(data)
print(na_positions)
      Name   Age Height
[1,] FALSE FALSE  FALSE
[2,] FALSE  TRUE  FALSE
[3,] FALSE FALSE   TRUE
[4,] FALSE FALSE  FALSE
[5,] FALSE  TRUE  FALSE
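Because TRUE counts as 1 in arithmetic, the logical matrix from is.na() can be summarized directly. The sketch below, again using the sample data and illustrative variable names, counts missing values per column with colSums() and lists their (row, column) positions with which(..., arr.ind = TRUE).

```r
# Same sample dataset as above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

# Number of NAs in each column (TRUE is summed as 1)
na_per_column <- colSums(is.na(data))
print(na_per_column)

# Row and column index of every NA cell, in column-major order
na_cells <- which(is.na(data), arr.ind = TRUE)
print(na_cells)

# Total number of missing values in the data frame
total_na <- sum(is.na(data))
print(total_na)
```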