Data Structures

Now that you’re familiar with the basic data types in R, let’s explore some of the main structures used for storing these data.

Vectors

The simplest data structure in R is the vector. An atomic vector holds elements of a single type, such as numeric, character, or logical values; if you combine different types with c(), R silently coerces them to the most general type rather than raising an error. Factors, used for categorical data, are built on top of integer vectors. R has no separate scalar type: a single value is simply a vector of length 1. Although a vector cannot mix data types, it can include NA values, which represent missing data.

# For example
numbers <- c(1, 2, 3, 4, 5)  # Numeric vector
words <- c("apple", "banana", "cherry")  # Character vector
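The single-type rule and the role of NA can be checked with a short sketch. The variable names (mixed, ages, x) are illustrative and not part of the lesson; the behaviour shown is standard base R.

```r
# Combining different types in c() does not fail:
# R coerces every element to the most general type (here, character).
mixed <- c(1, "two", TRUE)
class(mixed)              # "character"

# NA marks a missing value without changing the vector's type.
ages <- c(28, NA, 35)
class(ages)               # "numeric"
is.na(ages)               # FALSE  TRUE FALSE

# Many functions return NA unless told to skip missing values.
mean(ages)                # NA
mean(ages, na.rm = TRUE)  # 31.5

# A single value is just a vector of length 1.
x <- 42
length(x)                 # 1
is.vector(x)              # TRUE
```

Checking class() after building a vector is a quick way to catch unintended coercion, for example a number that was accidentally quoted.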

Matrices and Arrays

Matrices are another common data structure in R, particularly useful in fields like statistics and ecology. A matrix is essentially a vector with added dimensions, forming a two-dimensional table. Arrays extend this concept to more than two dimensions. Like vectors, all elements within a matrix or array must be of the same data type.

Matrices and arrays are created with the matrix() and array() functions, respectively. By default, matrix() fills values column by column; pass byrow = TRUE to fill row by row instead. You can also assign row and column names to a matrix, which helps organize and interpret the data.

# For example
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(matrix_data)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
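As a minimal sketch of naming dimensions and moving to three dimensions, the example below uses byrow and dimnames, both standard arguments of matrix(). The site and season labels are made up for illustration.

```r
# Fill row by row and attach row/column names in one call.
m <- matrix(
  1:6, nrow = 2, ncol = 3, byrow = TRUE,
  dimnames = list(c("site_a", "site_b"),
                  c("spring", "summer", "autumn"))
)
dim(m)                   # 2 3
m["site_b", "summer"]    # 5  (index by name)
m[2, 2]                  # 5  (same cell, index by position)

# An array generalizes this to more than two dimensions.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)                 # 2 3 4
arr[2, 3, 4]             # 24 (last element)
```

Named indexing tends to be more robust than positional indexing when rows or columns are later reordered.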

Lists

Lists are a flexible data structure that can store a mixture of different data types. Unlike atomic vectors and matrices, a list can hold elements of different types and lengths, including other lists, vectors, or data frames. This makes lists well suited to storing irregular or nested data.

You can create a list using the list() function and name the elements within the list for easier reference.

# For example
my_list <- list(name = "John", age = 30, married = TRUE)
print(my_list)
$name
[1] "John"

$age
[1] 30

$married
[1] TRUE
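Once a list exists, its elements can be read and extended in a few ways. The sketch below rebuilds the list from the example; the added elements (scores, address) and their values are hypothetical and only serve to show nesting.

```r
my_list <- list(name = "John", age = 30, married = TRUE)

# $ and [[ ]] return the element itself.
my_list$name             # "John"
my_list[["age"]]         # 30

# Single brackets return a smaller list, not the element.
class(my_list["age"])    # "list"

# Elements can be added after creation, including vectors and nested lists.
my_list$scores  <- c(90, 85)                            # hypothetical values
my_list$address <- list(city = "Lyon", zip = "69001")   # hypothetical values
my_list$address$city     # "Lyon"
length(my_list)          # 5

# str() gives a compact overview of a nested structure.
str(my_list)
```

The difference between [ ] and [[ ]] is a common source of confusion: the former keeps the list wrapper, the latter removes it.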

Data Frames

Data frames are perhaps the most commonly used data structure in R. They are two-dimensional tables that resemble matrices, but each column can hold a different data type, while all values within a single column share one type. Typically, each row in a data frame represents an observation, and each column represents a variable.

Data frames are especially useful for organizing and analyzing data, and they are similar in structure to spreadsheets used in applications like Excel. You can create a data frame using the data.frame() function. All columns must have the same number of observations; data.frame() raises an error if column lengths cannot be reconciled. Missing data should be represented as NA.

# For example
df <- data.frame(
  id = c(1, 2, 3),
  name = c("John", "Jane", "Doe"),
  age = c(28, 24, 35)
)
print(df)
  id name age
1  1 John  28
2  2 Jane  24
3  3  Doe  35
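To illustrate inspecting a data frame and handling missing values, here is a short sketch based on the example above, with one age replaced by NA for demonstration. The stringsAsFactors = FALSE argument is set explicitly so the name column is character regardless of R version; the mismatch variable is a hypothetical name.

```r
df <- data.frame(
  id   = c(1, 2, 3),
  name = c("John", "Jane", "Doe"),
  age  = c(28, NA, 35),          # NA marks a missing observation
  stringsAsFactors = FALSE
)

str(df)                          # column types at a glance
nrow(df)                         # 3
ncol(df)                         # 3

# Columns are vectors; rows can be filtered with a logical condition.
df$age                           # 28 NA 35
df[df$id == 2, "name"]           # "Jane"

# Count missing values per column, then summarize while skipping them.
colSums(is.na(df))
mean(df$age, na.rm = TRUE)       # 31.5

# Columns of incompatible lengths produce an error rather than a ragged table.
mismatch <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
mismatch                         # the error message as a string
```

Using NA instead of a placeholder such as 0 or an empty string keeps the column's type intact and lets functions like mean() handle the gap explicitly.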