Chapter 1 Objects

In the world of programming, an object is like a container that holds information. This information can be of different types: numbers, text, complex data, and even code. The important thing is that an object groups everything necessary to represent an entity or concept.

In R, practically everything is an object. The variables we will use to store data, the functions we will use to process that data, and even the data itself, are objects.

1.1 What are objects in R?

Imagine you are organizing your move to the United States. Each item you pack in a box (clothes, books, appliances) can be considered an object. Each object has characteristics that define it: a name, a type, a size, a weight, etc.

In R, objects also have characteristics that define them. These characteristics are called attributes. For instance, every object has a Name so we can refer to it, and a Type that indicates what kind of data it contains (numeric, character, logical, etc.). Objects also have a Class defining their structure and behavior (such as vector, list, or data frame) and a Length indicating the number of elements they contain.

1.1.1 R as an object-oriented language

R supports object-oriented programming, meaning it relies on the concept of objects to organize and process information. This approach offers several advantages, such as Modularity, allowing us to divide a program into smaller, manageable parts. It also promotes Reusability, as objects can be used in different parts of the program or even in other projects. Furthermore, objects provide Encapsulation, hiding implementation details to facilitate their use and maintenance.

1.1.2 The power of abstraction

The concept of an object allows us to abstract the complexity of the real world. Instead of thinking about the details of how data is stored and processed in computer memory, we can think in terms of objects representing real-world entities.

For example, instead of thinking of a series of numbers representing the temperatures of different cities, we can think of a “temperatures” object containing all that information.

This abstraction facilitates understanding and handling information, allowing us to focus on the logic of the problem we want to solve.

1.2 Variables: The first objects on your journey

Before we start packing for our move to the United States, we need to know what things we will take. Each object we decide to take is represented in R as a variable.

Think of variables as labels we put on each object. For example, we could use the variable state to save the name of the state we are moving to, or the variable num_suitcases to save the number of suitcases we will take.

1.2.1 Creating variables in R

In R, we don’t need to declare a variable before using it. We simply assign it a value using the <- symbol.

Note: You might see the = symbol used for assignment in other programming languages or even in some R code. While = works in R, the <- operator is the standard and idiomatic way to assign values to variables. It helps distinguish between assigning a value to a variable and passing arguments to a function (where = is always used).

Example:

# Assign the value "California" to the variable "state"
state <- "California"

# Assign the value 5 to the variable "num_suitcases"
num_suitcases <- 5

To see the value we have saved in a variable, we simply type its name in the RStudio console and press Enter.

Example:

state <- "California"

state
#> [1] "California"

When executing this code, you will see the value "California" appear in the console.

1.2.2 Operations with variables

We can also use variables to perform operations. For example, if we want to calculate the total cost of our plane trip, we could use the variables ticket_price and num_people.

Example:

ticket_price <- 300
num_people <- 4

total_cost <- ticket_price * num_people

total_cost
#> [1] 1200

In this example, we first assign values to the variables ticket_price and num_people. Then, we multiply these variables to calculate the total_cost and display its value in the console.

1.2.3 Best practices for naming variables

Watch out for capitalization!

R is case-sensitive. If you create a variable called state and then try to access it as State, R will not find it.

Descriptive names

It is important to use descriptive names for variables, clearly indicating what information they contain. Instead of using variables like x or y, it is better to use names like ticket_price or num_suitcases.

Rules for naming variables

When naming your variables, remember that names can contain letters, numbers, periods (.), and underscores (_). A name cannot start with a number or an underscore, and it cannot contain spaces.
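One way to check a name against these rules, sketched here with base R's make.names() function, is to compare it with the syntactically valid version R would generate for it. The candidate names below are only illustrations.

```r
# Hypothetical candidate names; the last two break the naming rules
candidates <- c("state", "num_suitcases", "2nd_city", "num suitcases")

# make.names() returns a syntactically valid version of each name,
# so a name that comes back unchanged was already valid
candidates == make.names(candidates)
#> [1]  TRUE  TRUE FALSE FALSE
```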

1.2.4 Data types

Variables in R can contain different types of data. Numeric variables are used for numbers, such as the population of a city or the cost of a plane ticket. Character variables store text, like the name of a state (“California”) or a city (“Los Angeles”). Logical variables represent binary truth values, TRUE or FALSE, which are useful for conditions, such as indicating whether we want to visit a specific city.
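As a quick sketch, the class() function reports which of these types a variable holds. The variable names and values here are illustrative.

```r
# One illustrative variable of each basic type
city <- "Los Angeles"   # character
ticket_price <- 300     # numeric
visit_city <- TRUE      # logical

class(city)
#> [1] "character"
class(ticket_price)
#> [1] "numeric"
class(visit_city)
#> [1] "logical"
```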

1.3 Object types for complex data

The variables we have seen so far are very useful for storing individual information, such as the name of a city or the number of suitcases we will carry on our move. However, in the real world, we often need to work with more complex datasets.

Imagine you want to save the names of all the cities you plan to visit on your trip to the United States. Would you have to create a variable for each city? That would be very tedious!

Fortunately, R offers other types of objects that allow us to organize and manipulate information more efficiently. Let’s look at some of them:

1.3.1 Vectors: organizing information of the same type

Vectors are like trains transporting a series of objects of the same type. They can be numbers, text, or logical values, but all elements of a vector must be of the same type. For example, we could use a vector to save the name of each state in the United States, or a vector to save the population of each state.

Creating vectors: To create a vector, we can use the c() function (which stands for “combine”) and list the elements we want to include, separated by commas.

# Create a vector with the names of some states
states <- c("California", "Texas", "Florida", "New York")

# Create a vector with the population of each state (in millions)
population <- c(39.2, 29.0, 21.4, 19.4)

To find out how many elements a vector has, that is, its length, we use the length() function. The class() function tells us the class of the object; for a simple vector, this reflects the type of data it contains.

length(population)  
#> [1] 4
class(states)     
#> [1] "character"
class(population)   
#> [1] "numeric"

We can use the names() function to assign names to the elements of a vector. This can be useful for identifying each element.

names(population) <- states
population
#> California      Texas    Florida   New York 
#>       39.2       29.0       21.4       19.4

In addition to c(), there are other useful functions for creating vectors. The seq() function creates a sequence of numbers, allowing us to specify the start value, the end value, and the increment.

# Create a vector with numbers from 1 to 10
numbers <- seq(1, 10)

# Create a vector with numbers from 2 to 20, by 2
even_numbers <- seq(2, 20, by = 2)

Another useful function is rep(), which repeats a value or a vector a specified number of times.

# Create a vector with the value 1 repeated 5 times
ones <- rep(1, 5)

# Create a vector with the sequence "A", "B" repeated 3 times
# (named ab_pattern to avoid masking R's built-in `letters` constant)
ab_pattern <- rep(c("A", "B"), 3)  # Output: "A" "B" "A" "B" "A" "B"

Accessing vector elements: Each element of a vector has a position, indicated by a number in brackets. The first element is at position 1, the second at position 2, and so on.

# Show the first element of the "states" vector
states[1]  # Output: "California"

# Show the third element of the "population" vector
population[3]  # Output: 21.4

We can also access multiple elements at once using the : operator. For example, to access elements from the second to the fourth of the states vector:

states[2:4]
#> [1] "Texas"    "Florida"  "New York"
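Brackets accept more than a single position or a range. The sketch below shows three other common forms of indexing in base R: a vector of positions, a negative index, and a logical condition. It redefines the states and population vectors so that it can be run on its own.

```r
states <- c("California", "Texas", "Florida", "New York")
population <- c(39.2, 29.0, 21.4, 19.4)

# Pick specific positions with a vector of indices
states[c(1, 4)]
#> [1] "California" "New York"

# Drop an element with a negative index
states[-1]
#> [1] "Texas"    "Florida"  "New York"

# Keep only the elements where a condition is TRUE
# (here: states with more than 25 million inhabitants)
states[population > 25]
#> [1] "California" "Texas"
```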

Operations with vectors: We can perform mathematical operations with numeric vectors. For example, if we want to calculate the total population of the four states, we can use the + operator to sum the elements of the population vector.

population <- c(39.2, 29.0, 21.4, 19.4) 

population[1] + population[2] + population[3] + population[4]  
#> [1] 109

If we want to perform the same operation more concisely, R allows us to sum all elements of a vector directly:

population <- c(39.2, 29.0, 21.4, 19.4) 

sum(population)  
#> [1] 109

R also offers other tools for performing operations with vectors. For example, if we want to calculate the square root of the population of each state:

sqrt(population)  
#> [1] 6.260990 5.385165 4.626013 4.404543

In this case, the sqrt() function calculates the square root of each element of the population vector individually. This is possible because many functions in R are vectorized, meaning they can operate directly on vectors, element by element. Vectorized functions are very efficient as they avoid the need to write loops to process each element of the vector separately. We will explore functions in R and how to use them for more complex data analysis in greater depth later.
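Arithmetic operators are vectorized in the same way. As a small sketch, the following computes each state's share of the combined population in one expression, without a loop. The name population_share is an illustrative choice.

```r
population <- c(39.2, 29.0, 21.4, 19.4)

# Each element is divided by the total, multiplied by 100, and rounded
population_share <- round(population / sum(population) * 100, 1)
population_share
#> [1] 36.0 26.6 19.6 17.8
```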

Vector coercion: Because all elements of a vector must share one type, R converts (coerces) values to a compatible type where it can, instead of stopping with an error. For example, if we convert a character vector to numeric with as.numeric(), R converts the elements that look like numbers and replaces the rest with NA, issuing a warning.

example <- c("3", "b", "6", "a", "bridge", "4")
as.numeric(example)
#> Warning: NAs introduced by coercion
#> [1]  3 NA  6 NA NA  4
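Coercion also happens silently when values of different types are combined with c(). The following sketch, with illustrative variable names, shows the two most common cases: everything becomes text if any element is text, and logical values become 1 and 0 when mixed with numbers.

```r
# Mixing a number, text, and a logical value: all become character
mixed <- c(1, "two", TRUE)
class(mixed)
#> [1] "character"

# Mixing numbers and logical values: TRUE becomes 1 and FALSE becomes 0
flags_and_numbers <- c(10, TRUE, FALSE)
flags_and_numbers
#> [1] 10  1  0
```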

Sorting vectors: We can sort the elements of a vector using the sort() function.

districts <- c("Comas", "Lince", "Miraflores", "Lurigancho", "Chorrillos")
sort(districts) 
#> [1] "Chorrillos" "Comas"      "Lince"      "Lurigancho" "Miraflores"

We can also sort a vector through its indices with the order() function. It returns the positions of the original elements in the order they must be taken to produce the sorted vector: here, the first value is 5 because “Chorrillos”, the fifth element, comes first alphabetically. This is useful when we want to sort one vector based on another, or when we want to keep the original vector unchanged.

indices <- order(districts)  # Output: 5 1 2 4 3
districts[indices]
#> [1] "Chorrillos" "Comas"      "Lince"      "Lurigancho" "Miraflores"
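Sorting one vector based on another could look like the sketch below, which reuses the states and population data from earlier in the chapter. The name by_population is an illustrative choice.

```r
states <- c("California", "Texas", "Florida", "New York")
population <- c(39.2, 29.0, 21.4, 19.4)

# Indices that put the populations in increasing order
by_population <- order(population)
by_population
#> [1] 4 3 2 1

# Use those indices to reorder the state names
states[by_population]
#> [1] "New York"   "Florida"    "Texas"      "California"

# decreasing = TRUE reverses the direction
states[order(population, decreasing = TRUE)]
#> [1] "California" "Texas"      "Florida"    "New York"
```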

NA in vectors: If a vector contains NA values, some operations may return NA. We can use the is.na() function to identify NA values and filter them.

example_na <- c(28, 3, 19, NA, 89, 45, NA, 86, 5, 18, 28, NA)
example_no_na <- example_na[!is.na(example_na)]
mean(example_no_na)
#> [1] 35.66667
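Many summary functions in base R, including mean() and sum(), also accept an na.rm argument that skips NA values without creating a filtered copy first. A short sketch with the same data:

```r
example_na <- c(28, 3, 19, NA, 89, 45, NA, 86, 5, 18, 28, NA)

# Without na.rm, a single NA makes the result NA
mean(example_na)
#> [1] NA

# Count how many values are missing
sum(is.na(example_na))
#> [1] 3

# na.rm = TRUE ignores the missing values
mean(example_na, na.rm = TRUE)
#> [1] 35.66667
```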

1.3.2 Lists: grouping objects of different types

Lists are like containers that can hold different types of objects. Imagine a box where you can put clothes, books, tools, and any other object you need. In R, lists allow you to group diverse information into a single object.

Creating lists: To create a list, we use the list() function and specify the elements we want to include, separated by commas. Each element can have a name, indicated with the = symbol.

# Create a list with information about a city
city_info <- list(name = "San Francisco", 
                  population = 880000, 
                  cost_of_living = 3.8, 
                  climate = "Temperate")

Accessing list elements: To access the elements of a list, we can use their names or their positions.

# Access the "name" element of the "city_info" list
city_info$name  # Output: "San Francisco"

# Access the second element of the "city_info" list
city_info[[2]]  # Output: 880000
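Single and double brackets behave differently on lists, which is a frequent source of confusion. The sketch below uses a shortened version of city_info so that it runs on its own.

```r
city_info <- list(name = "San Francisco", population = 880000)

# Double brackets (or $) return the element itself
class(city_info[["population"]])
#> [1] "numeric"

# Single brackets return a smaller list that contains the element
class(city_info["population"])
#> [1] "list"

# names() lists the element names
names(city_info)
#> [1] "name"       "population"
```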

1.3.3 Matrices: organizing data in rows and columns

Matrices are like tables that organize information in rows and columns. All elements of a matrix must be of the same type.

Creating matrices: To create a matrix, we use the matrix() function. We must specify the data we want to include, the number of rows (nrow), and the number of columns (ncol).

# Create a matrix with distances between cities (in miles)
city_distances <- matrix(c(0, 2600, 2100, 950, 
                           2600, 0, 1100, 2700, 
                           2100, 1100, 0, 2100, 
                           950, 2700, 2100, 0), 
                         nrow = 4, ncol = 4)
city_distances
#>      [,1] [,2] [,3] [,4]
#> [1,]    0 2600 2100  950
#> [2,] 2600    0 1100 2700
#> [3,] 2100 1100    0 2100
#> [4,]  950 2700 2100    0

Accessing matrix elements: To access the elements of a matrix, we use brackets and specify the row and column of the element we want.

# Access the element in row 1, column 3 of the "city_distances" matrix
city_distances[1, 3] 
#> [1] 2100
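By default, matrix() fills the matrix column by column. The distance matrix above is symmetric, so the fill order does not change the result, but for other data it matters. The sketch below contrasts the default with byrow = TRUE and shows access by row and column name; all object names and labels are illustrative.

```r
# Default: values are placed down the columns
by_column <- matrix(1:6, nrow = 2)
by_column
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

# byrow = TRUE: values are placed across the rows
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

# Hypothetical labels; named rows and columns can be used for access
rownames(by_row) <- c("trip_1", "trip_2")
colnames(by_row) <- c("Mon", "Tue", "Wed")
by_row["trip_2", "Tue"]
#> [1] 5
```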

1.3.4 Arrays: multidimensional matrices

Arrays are like matrices that have more than two dimensions. Imagine a matrix that, in addition to rows and columns, has depth. In R, arrays allow you to organize data in more complex structures.

Creating arrays: To create an array, we use the array() function.

# Create an array with maximum and minimum temperatures of 
# three cities during the summer months (June, July, August).
# array() fills the first dimension fastest, so each group of six values
# is one month: three maximums (cities 1 to 3), then three minimums.
temperatures <- array(c(25, 28, 30, 22, 25, 28,  # June
                        28, 20, 32, 25, 18, 30,  # July
                        22, 25, 28, 18, 23, 25), # August
                      dim = c(3, 2, 3))  # 3 cities, 2 temperatures (max/min), 3 months
temperatures
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]   25   22
#> [2,]   28   25
#> [3,]   30   28
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]   28   25
#> [2,]   20   18
#> [3,]   32   30
#> 
#> , , 3
#> 
#>      [,1] [,2]
#> [1,]   22   18
#> [2,]   25   23
#> [3,]   28   25

Accessing array elements: To access the elements of an array, we use brackets and specify the position of the element in each dimension.

# Access the maximum temperature of city 2 in July
temperatures[2, 1, 2] 
#> [1] 20

1.3.5 Factors: representing categorical data

Factors are a special type of object used to represent categorical data, that is, data that can be classified into groups. For example, the type of climate (“warm”, “temperate”, “cold”), the region of a country (“north”, “south”, “east”, “west”), or the type of housing (“house”, “apartment”).

Creating factors: To create a factor, we use the factor() function.

# Create a factor with climate types of different cities
climate_types <- factor(c("Temperate", "Warm", "Cold"))

Levels of a factor: The different values a factor can take are called levels. In the previous example, the levels of the climate_types factor are “Cold”, “Temperate”, and “Warm”; by default, R stores them in alphabetical order rather than in order of appearance.

Utility of factors: Factors are very useful for data analysis, as they allow grouping and comparing information efficiently. For example, we could use the climate_types factor to analyze how the cost of living varies in cities with different climates.
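As a minimal sketch of how factors support grouping, the following inspects the levels of a factor and counts how many observations fall into each one. The climate values are made up for illustration.

```r
# Illustrative climate types for four cities
climate_types <- factor(c("Temperate", "Warm", "Cold", "Warm"))

# The distinct categories
levels(climate_types)
#> [1] "Cold"      "Temperate" "Warm"

# Number of categories
nlevels(climate_types)
#> [1] 3

# table() counts the observations per level; here we extract one count
table(climate_types)[["Warm"]]
#> [1] 2
```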

1.4 The Universe of Objects in R

Throughout this chapter, we have explored the different types of objects inhabiting the R universe. From the simplest variables to multidimensional arrays, each object plays an important role in building our data analyses.

1.4.1 Philosophy of objects in R

In R, everything is an object. This philosophy has profound implications for how code is written and executed. By treating everything as an object, R promotes consistency, modularity, and reuse.

Objects allow us to encapsulate information and behavior, facilitating code organization and maintenance. Furthermore, the ability to create our own objects gives us great power to model and solve complex problems.

By understanding the philosophy of objects in R, we can make the most of the language’s capabilities for data analysis.

1.4.2 Comparison with other languages

While many modern programming languages use the object-oriented paradigm, R has a particular approach. In languages like Python or Java, creating classes and objects is a fundamental part of the language. In R, while it is possible to create classes and objects, the language focuses more on the use of functions to manipulate and transform data.

This difference is due in part to R’s history as a language for statistical analysis. In this context, functions are a natural tool for performing calculations and analyses.
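To make the function-centered style concrete, here is a rough sketch of a generic function in R's S3 system, where one function name selects a different implementation depending on the class of its argument. The describe() function and its methods are hypothetical names invented for this illustration; they are not part of R.

```r
# A hypothetical generic function: UseMethod() picks the method
# that matches the class of x
describe <- function(x, ...) UseMethod("describe")

# Method used for numeric vectors
describe.numeric <- function(x, ...) {
  paste("numeric vector with mean", mean(x))
}

# Method used for character vectors
describe.character <- function(x, ...) {
  paste("character vector with", length(x), "elements")
}

describe(c(10, 20))
#> [1] "numeric vector with mean 15"
describe(c("Texas", "Florida"))
#> [1] "character vector with 2 elements"
```

In contrast to Python or Java, the behavior lives in functions that are defined outside the object, and the object only carries its class.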

1.5 Exercises

Now that you know the different types of objects in R, it’s time to put your knowledge to the test.

Create four variables to plan your move. Define city_name with the city you would like to move to, population with its number of inhabitants, and distance with the kilometers from your current location. Also, create a logical variable want_to_live_there indicating if you truly want to live there.

Solution

city_name <- "Seattle"
population <- 724745 
distance <- 8340  # Approximate distance from Lima, Peru
want_to_live_there <- TRUE

Create a vector called nearby_cities containing the names of three cities near the city you chose in the previous exercise.

Solution

nearby_cities <- c("Tacoma", "Bellevue", "Everett")

Construct a list called my_list that groups different types of information about yourself. It should include your name, your age, a vector with your three favorite colors, and a logical value indicating whether you like chocolate.

Solution

my_list <- list(name = "Ana", 
                age = 30, 
                favorite_colors = c("blue", "green", "red"), 
                likes_chocolate = TRUE)

Create a matrix called monthly_expenses containing your estimated monthly expenses in the following categories:

Category	January	February	March
Housing
Transport
Food
Entertainment

Complete the matrix with numerical values.

Solution

monthly_expenses <- matrix(c(1500, 1500, 1500,  # Housing
                             300,  250,  350,   # Transport
                             500,  400,  550,   # Food
                             200,  150,  250),  # Entertainment
                           nrow = 4, ncol = 3,
                           byrow = TRUE,  # the values above are listed row by row
                           dimnames = list(c("Housing", "Transport", "Food", "Entertainment"),
                                           c("January", "February", "March")))

Create a factor called climate_types containing the names of the different climate types in the United States (you can use “Temperate”, “Warm”, “Cold”, etc.). Assign labels to the factor levels to make them more descriptive (for example, “Cold climate”, “Temperate climate”, etc.).

Solution

climate_types <- factor(c("Temperate", "Warm", "Cold", "Warm", "Temperate"),
                     levels = c("Cold", "Temperate", "Warm"),
                     labels = c("Cold climate", "Temperate climate", "Warm climate"))

climate_types

Create a vector called cities_to_visit with the names of 5 cities you would like to visit in the United States. Then, create another vector called days_per_city with the number of days you would like to spend in each city. Finally, create a third vector called daily_cost with the estimated daily cost in each city (in dollars).

Solution

cities_to_visit <- c("New York", "Los Angeles", "Chicago", "San Francisco", "Miami")
days_per_city <- c(5, 4, 3, 6, 2)  
daily_cost <- c(200, 180, 150, 220, 170)

Create a vector called max_temperatures with the average maximum temperatures (in Celsius) of the cities you want to visit during the month of July. Then, create a vector called min_temperatures with the average minimum temperatures. Finally, create a matrix containing these two vectors as columns, and name the rows with the names of the cities.

Solution

max_temperatures <- c(29, 28, 27, 22, 31)  # Max temperatures in July
min_temperatures <- c(21, 18, 19, 15, 25)  # Min temperatures in July

# Create the matrix
temperatures <- matrix(c(max_temperatures, min_temperatures), nrow = 5, ncol = 2,
                       dimnames = list(cities_to_visit, c("Maximum", "Minimum")))

temperatures
#>               Maximum Minimum
#> New York           29      21
#> Los Angeles        28      18
#> Chicago            27      19
#> San Francisco      22      15
#> Miami              31      25

Create a three-dimensional array containing information about the climate of the cities you want to visit. The first dimension should represent the cities, the second dimension should represent the months of the year (“January”, “February”, …, “December”), and the third dimension should represent two variables: “Temperature” and “Precipitation”. You can use dummy values to fill the array.

Solution

# Create an array with dimensions 5 cities x 12 months x 2 variables
climate <- array(dim = c(5, 12, 2),
                dimnames = list(cities_to_visit,
                                month.name,
                                c("Temperature", "Precipitation")))

# Fill the array with dummy values (example)
climate[,, "Temperature"] <- sample(10:35, 60, replace = TRUE)  # Temperatures between 10 and 35 degrees
climate[,, "Precipitation"] <- sample(0:100, 60, replace = TRUE)  # Precipitation between 0 and 100 mm

climate
#> , , Temperature
#> 
#>               January February March April May June July August September
#> New York           31       19    10    14  12   35   19     16        13
#> Los Angeles        18       27    21    31  33   17   30     17        15
#> Chicago            30       19    31    15  24   15   23     20        11
#> San Francisco      18       23    32    19  26   19   19     17        19
#> Miami              20       32    11    16  19   11   25     26        18
#>               October November December
#> New York           26       15       32
#> Los Angeles        25       29       22
#> Chicago            16       34       12
#> San Francisco      25       19       19
#> Miami              29       26       33
#> 
#> , , Precipitation
#> 
#>               January February March April May June July August September
#> New York           68       39    11    71   0   81   92     36        64
#> Los Angeles        15       73    42    90  74   92   90     77         1
#> Chicago            42       68    20     1  31   56   35      4        76
#> San Francisco      56       30    31   100  10   48   22     98        11
#> Miami              87      100    55    97  73   12   42      7        69
#>               October November December
#> New York           89       33       58
#> Los Angeles        27       56       44
#> Chicago            68       61       77
#> San Francisco      69       81       19
#> Miami              36       86        3

Imagine you have a vector with the daily maximum temperatures of a US city for a year. Create a program that, using only the concepts learned in this chapter (variables, vectors, matrices, arrays, and factors), identifies the longest streak of consecutive days with maximum temperatures above a given threshold (for example, 25 degrees Celsius).

Solution

This exercise requires efficient vector handling and algorithmic logic to identify the longest streak. Here is a possible solution:

# Create a vector with dummy maximum temperatures for a year
temperatures <- sample(10:35, 365, replace = TRUE)

# Define the temperature threshold
threshold <- 25

# Create a logical vector indicating if the temperature exceeds the threshold
hot_days <- temperatures > threshold

# Initialize variables to track the longest streak
current_streak <- 0
longest_streak <- 0
start_longest_streak <- 0

# Iterate through the hot days vector
for (i in seq_along(hot_days)) {
  if (hot_days[i]) {
    current_streak <- current_streak + 1
    # Update the record while the streak is running, so that a streak
    # ending on the last day of the year is also counted
    if (current_streak > longest_streak) {
      longest_streak <- current_streak
      start_longest_streak <- i - current_streak + 1
    }
  } else {
    current_streak <- 0
  }
}

# Show the longest streak and its position
cat("The longest streak of hot days is:", longest_streak, "\n")
#> The longest streak of hot days is: 4
cat("Starts on day:", start_longest_streak, "\n")
#> Starts on day: 48

This code uses a for loop to traverse the hot days vector and three variables (current_streak, longest_streak, and start_longest_streak) to track the longest streak and the day on which it starts.
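As an alternative sketch, base R's rle() function (run-length encoding) summarizes consecutive repeated values, which allows the same question to be answered without a loop. It goes slightly beyond the functions covered in this chapter. The short temperature vector is made up so that the result is easy to verify by hand, and the sketch assumes at least one day exceeds the threshold.

```r
# Illustrative data: eight days instead of a full year
temperatures <- c(20, 26, 27, 24, 28, 29, 30, 22)
threshold <- 25
hot_days <- temperatures > threshold

# rle() returns the length and value of each run of identical values
runs <- rle(hot_days)

# Keep the lengths of the hot runs only (TRUE counts as 1, FALSE as 0)
hot_lengths <- runs$lengths * runs$values

longest_streak <- max(hot_lengths)
longest_streak
#> [1] 3

# The streak starts right after all the runs that come before it
run_index <- which.max(hot_lengths)
start_longest_streak <- sum(runs$lengths[seq_len(run_index - 1)]) + 1
start_longest_streak
#> [1] 5
```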

Imagine you have a vector with the daily stock prices of a company for a year. Create a program that, using only the concepts learned in this chapter, determines the time period in which you could have bought and sold the shares to obtain the maximum profit. Assume you can only buy and sell once.

Solution

This exercise is a variant of the classic “maximize stock profit” problem. Solving it in the most efficient way takes some care, but with simple tools we can write a brute-force algorithm that checks every possible pair of buy and sell days. It always finds the maximum profit, although it becomes slow for long price series.

# Create a vector with dummy stock prices for a year
prices <- sample(50:150, 365, replace = TRUE)

# Initialize variables to track max profit
max_profit <- 0
buy_day <- 1
sell_day <- 1

# Iterate through the prices vector
for (i in 1:(length(prices) - 1)) {
  for (j in (i + 1):length(prices)) {
    profit <- prices[j] - prices[i]
    if (profit > max_profit) {
      max_profit <- profit
      buy_day <- i
      sell_day <- j
    }
  }
}

# Show max profit and buy/sell days
cat("Maximum profit:", max_profit, "\n")
#> Maximum profit: 100
cat("Buy day:", buy_day, "\n")
#> Buy day: 280
cat("Sell day:", sell_day, "\n")
#> Sell day: 290

This code uses two nested for loops to compare all possible pairs of buy and sell days.
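For comparison, here is a sketch of a faster approach that avoids the nested loops by using the vectorized cummin() function, which returns the lowest value seen so far at each position. The price vector is a small made-up example, and when several pairs of days give the same profit this version may report a different pair than the loops.

```r
# Illustrative prices for six days
prices <- c(100, 80, 120, 70, 110, 90)

# Lowest price available on or before each day
lowest_so_far <- cummin(prices)

# Profit from selling on each day after buying at the lowest earlier price
profit_if_sold <- prices - lowest_so_far

max_profit <- max(profit_if_sold)
sell_day <- which.max(profit_if_sold)

# The buy day is the cheapest day up to the sell day
buy_day <- which.min(prices[1:sell_day])

max_profit
#> [1] 40
buy_day
#> [1] 2
sell_day
#> [1] 3
```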