Handling Missing Data

Use na.omit() to remove rows with NA values

Purpose: na.omit() removes rows that contain NA (missing) values. It checks every row of the dataset and drops any row in which at least one element is NA, so a single missing value is enough to discard the whole row.

# Creating a sample dataset
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

# Printing the sample dataset
print(data)

# Example for na.omit()
clean_data <- na.omit(data)
print(clean_data)
     Name Age Height
1   Alice  25    165
2     Bob  NA    175
3 Charlie  30     NA
4   David  22    180
5     Eve  NA    160
   Name Age Height
1 Alice  25    165
4 David  22    180
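
To check which rows na.omit() dropped, one option is to read the "na.action" attribute that it attaches to its result. The following is a minimal sketch that reuses the sample data from the example above. The variable names dropped_rows and same_rows are illustrative only, and the complete.cases() comparison is shown as an equivalent way to keep the same rows, not as the method this tutorial prescribes.

```r
# Same sample dataset as in the example above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

clean_data <- na.omit(data)

# na.omit() stores the indices of the removed rows in the "na.action" attribute
dropped_rows <- as.integer(attr(clean_data, "na.action"))
print(dropped_rows)                    # 2 3 5

# Number of rows that were removed
print(nrow(data) - nrow(clean_data))   # 3

# complete.cases() returns TRUE for rows without any NA;
# subsetting with it keeps the same rows as na.omit()
same_rows <- data[complete.cases(data), ]
print(identical(rownames(same_rows), rownames(clean_data)))  # TRUE
```

Checking the dropped rows before continuing is useful here because three of the five rows are lost, even though each of those rows is missing only one value.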

Use is.na() to find NA values

Purpose: is.na() identifies where NA values are located in your dataset. It returns a logical vector when given a vector and a logical matrix when given a data frame. In both cases, TRUE marks a missing value and FALSE marks a value that is present.

# Example for is.na()

na_positions <- is.na(data)
print(na_positions)
      Name   Age Height
[1,] FALSE FALSE  FALSE
[2,] FALSE  TRUE  FALSE
[3,] FALSE FALSE   TRUE
[4,] FALSE FALSE  FALSE
[5,] FALSE  TRUE  FALSE
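
Because TRUE counts as 1 in arithmetic, the logical result of is.na() can be summarised directly. The sketch below, again using the sample data from above, counts the missing values per column and locates the affected rows for one column. The variable names na_per_column, missing_age_rows, and total_na are illustrative assumptions, not part of the original example.

```r
# Same sample dataset as in the example above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, NA, 30, 22, NA),
  Height = c(165, 175, NA, 180, 160)
)

# Logical matrix: TRUE where a value is missing
na_positions <- is.na(data)

# Number of missing values in each column
na_per_column <- colSums(na_positions)
print(na_per_column)      # Name: 0, Age: 2, Height: 1

# Applied to a single column, is.na() returns a logical vector;
# which() converts it to the row indices of the missing values
missing_age_rows <- which(is.na(data$Age))
print(missing_age_rows)   # 2 5

# Total number of missing values in the dataset
total_na <- sum(na_positions)
print(total_na)           # 3
```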