Chapter 4 Advanced Techniques
4.1 Metaprogramming: writing code that writes code
In previous chapters, we explored different object types in R and how to use functions to manipulate them. Now, we are going to delve into a more advanced concept: metaprogramming.
Metaprogramming is a technique that allows us to write code that generates other code. It’s like having a code factory where we can create new functions and expressions dynamically.
Why is this useful? Metaprogramming lets us automate repetitive tasks by generating boilerplate code dynamically. It also lets us build more flexible functions that adapt to different data structures, and write concise, expressive code that captures complex logic simply.
In R, metaprogramming is based on the manipulation of expressions. An expression is a representation of R code as an object. We can create expressions, modify them, and evaluate them to generate new code.
4.1.1 Manipulating expressions: The art of sculpting code
Instead of simply executing code, we can capture it as an object and work with it like a block of clay, shaping and modifying it to create new expressions and functions.
Think of an expression like a cooking recipe. The recipe contains a set of instructions (ingredients and steps to follow) to create a dish. Similarly, an expression in R contains instructions to perform a task.
R provides a toolkit for sculpting expressions, as if its functions were the hands of a sculptor shaping clay. The quote() function captures code as an unevaluated object (a call or a symbol) without running it, like saving a recipe for later. substitute() builds an expression while replacing names in it: variables are replaced by their values, and a function's arguments are replaced by the expressions the caller supplied. To execute a stored expression, we use eval(), which runs the code and returns the result, optionally inside a chosen environment or list. Finally, parse(text = ...) turns a character string into an expression object that eval() can run.
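A minimal sketch of the four tools applied to a toy "recipe". The variable names are illustrative, and the values are passed to eval() and substitute() as plain lists.

``` r
# quote(): capture code without running it
recipe <- quote(price * quantity)
class(recipe)
#> [1] "call"

# eval(): run the captured code, here with the values supplied in a list
eval(recipe, list(price = 2.5, quantity = 4))
#> [1] 10

# substitute(): replace the names in an expression with values from a list
filled <- substitute(price * quantity, list(price = 3, quantity = 5))
filled
#> 3 * 5
eval(filled)
#> [1] 15

# parse(): turn text into an expression object, then evaluate it
code <- parse(text = "sqrt(16) + 1")
eval(code)
#> [1] 5
```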
With these tools, we can manipulate expressions to create new functions, modify the behavior of existing functions, and generate code dynamically.
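A captured call behaves much like a list: the first element is the function, and the remaining elements are its arguments. This means we can also reshape a call directly. The sketch below, with illustrative names, turns mean(x) into median(x, na.rm = TRUE) before evaluating it.

``` r
# Capture a call; element 1 is the function name, the rest are arguments
expr <- quote(mean(x))

# Swap the function being called: mean(x) becomes median(x)
expr[[1]] <- as.name("median")
expr
#> median(x)

# Add a named argument to the call
expr$na.rm <- TRUE
expr
#> median(x, na.rm = TRUE)

# Evaluate the reshaped call, supplying x through a list
eval(expr, list(x = c(1, 5, NA, 3)))
#> [1] 3
```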
4.1.2 Examples
Metaprogramming might seem like an abstract concept at first, but its applications are very concrete and powerful. Let’s look at some examples of how we can use metaprogramming in R to create dynamic functions and generate code automatically.
Example 1: Creating a function that generates other functions
Imagine you need to create several functions that perform similar operations but differ in a parameter: for example, functions that add different constants to a number. Instead of writing each function separately, you can use metaprogramming to write one function that generates these functions dynamically.
``` r
create_sum_function <- function(n) {
  # Build the unevaluated code `function(x) x + n`, with n replaced by
  # the expression the caller passed (here the literal 5 or 10)
  expression <- substitute(function(x) x + n)
  # Evaluate that code to create an actual function
  eval(expression)
}
sum_5 <- create_sum_function(5)
sum_10 <- create_sum_function(10)
sum_5(10)
#> [1] 15
sum_10(10)
#> [1] 20
```

In this example, create_sum_function() receives a number n and returns a new function that adds n to its argument. substitute() builds an expression for the function we want to generate, replacing n with the expression supplied by the caller. eval() then evaluates that expression to create the function.
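Because substitute() splices in the expression the caller wrote rather than its value, the generated function looks that name up later, starting from its own environment. The sketch below illustrates this pitfall and two common alternatives: a plain closure, which needs no expression manipulation at all, and substitute() with an explicit list, which inserts the value itself. The names create_sum_closure(), create_sum_value(), and step_size are hypothetical, and the pitfall demo assumes no global variable named step_size exists.

``` r
create_sum_function <- function(n) {
  eval(substitute(function(x) x + n))
}

# Pitfall: the generated body is `x + step_size`, but step_size only exists
# inside local(), which the generated function cannot see
sum_step <- local({
  step_size <- 3
  create_sum_function(step_size)
})
result_pitfall <- tryCatch(sum_step(1), error = function(e) "error")
result_pitfall
#> [1] "error"

# Alternative 1: a closure remembers the value of n in its environment
create_sum_closure <- function(n) {
  force(n)  # evaluate n now rather than when the new function is first called
  function(x) x + n
}
adders <- lapply(1:3, create_sum_closure)
adders[[2]](10)
#> [1] 12

# Alternative 2: substitute() with a list inserts the value of n into the body
create_sum_value <- function(n) {
  eval(substitute(function(x) x + n, list(n = n)))
}
sum_7 <- create_sum_value(7)
sum_7(10)
#> [1] 17
body(sum_7)
#> x + 7
```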
Example 2: Generating code for data analysis
Suppose you want to perform a data analysis involving several steps, such as filtering data, calculating statistics, and generating a plot. You can use metaprogramming to generate the code for this analysis dynamically, based on specified parameters.
``` r
analyze_data <- function(data, filter_cond, column_to_analyze, statistic, plot_type) {
  # Filter data
  filtered_data <- substitute(data[filter_cond, ][[column_to_analyze]])
  filtered_data <- eval(filtered_data)
  # Calculate statistic
  calculated_statistic <- substitute(statistic(filtered_data))
  calculated_statistic <- eval(calculated_statistic)
  # Generate plot
  plot_expression <- substitute(plot_type(filtered_data))
  eval(plot_expression)
  # Return calculated statistic
  return(calculated_statistic)
}
# Usage example
df <- data.frame(
  x = c(1, 3, 2, 5.5, 4, 3.5, 8, 7, 9, 10),
  y = c(10, 8, 9, 6, 7, 5, 3.6, 4, 2, 1)
)
# We want to filter data where x > 5, calculate mean of y and generate a histogram
result <- analyze_data(df, df$x > 5, "y", mean, hist)
result
#> [1] 3.32
```
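In the example above, the caller still has to write df$x > 5, naming the data frame twice. A common refinement is to evaluate the captured condition with the data frame's columns in scope, so the caller can simply write x > 5. The sketch below uses the hypothetical name analyze_data_nse(). It relies on the same idea as base R's subset(), and it omits the plotting step to stay short.

``` r
analyze_data_nse <- function(data, filter_cond, column_to_analyze, statistic) {
  # Capture the condition as code, e.g. `x > 5`, without evaluating it
  cond <- substitute(filter_cond)
  # Evaluate it with the columns of `data` visible; names that are not
  # columns are looked up in the caller's environment
  rows <- eval(cond, data, parent.frame())
  statistic(data[rows, ][[column_to_analyze]])
}

df <- data.frame(
  x = c(1, 3, 2, 5.5, 4, 3.5, 8, 7, 9, 10),
  y = c(10, 8, 9, 6, 7, 5, 3.6, 4, 2, 1)
)
analyze_data_nse(df, x > 5, "y", mean)
#> [1] 3.32

# The threshold can come from a variable in the caller's environment
limit <- 8
analyze_data_nse(df, x > limit, "y", mean)
#> [1] 1.5
```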
Example 3: Creating a function to generate plots with dynamic variable names and advanced options
Imagine you need to create a function generating different types of plots (scatter, histograms, boxplots) with custom options like titles, labels, colors, and legends, and that can also handle different datasets and variables. In this case, metaprogramming can be very useful to create a flexible function adapting to these needs.
``` r
create_plot <- function(data, plot_type, var_x, var_y = NULL,
                        title = NULL, color = "blue",
                        labels_x = NULL, labels_y = NULL,
                        legend = NULL) {
  # Create base plot expression
  if (plot_type == "scatter") {
    expression <- substitute(plot(data[[var_x]], data[[var_y]],
                                  xlab = labels_x, ylab = labels_y,
                                  main = title, col = color))
  } else if (plot_type == "histogram") {
    expression <- substitute(hist(data[[var_x]], main = title,
                                  xlab = labels_x, col = color))
  } else if (plot_type == "boxplot") {
    expression <- substitute(boxplot(data[[var_x]], main = title,
                                     ylab = labels_y, col = color))
  } else {
    stop("Invalid plot type.")
  }
  # Evaluate base expression
  eval(expression)
  # Add legend if specified. `legend` is also an argument name here, but when
  # it is called, R skips non-function objects and finds graphics::legend()
  if (!is.null(legend)) {
    legend("topright", legend = legend, fill = color)
  }
}
# Usage example
df <- data.frame(
  x = c(1, 3, 2, 5.5, 4, 3.5, 8, 7, 9, 10),
  y = c(10, 8, 9, 6, 7, 5, 3.6, 4, 2, 1)
)
create_plot(df, "scatter", "x", "y",
            title = "Scatter Plot", color = "red",
            labels_x = "Variable X", labels_y = "Variable Y")
create_plot(df, "histogram", "x",
            title = "Histogram of X", color = "green",
            labels_x = "Variable X")
create_plot(df, "boxplot", "y",
            title = "Boxplot of Y", color = "blue",
            labels_y = "Variable Y",
            legend = c("Group A"))
```

In this example, the create_plot() function can generate different types of plots with custom options. The function uses substitute() to construct the base plot expression, and then eval() to evaluate the expression and generate the plot. Additionally, the function can add a legend to the plot if the legend argument is specified.
This example illustrates how metaprogramming can be useful for creating more flexible and complex functions that adapt to different needs.
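Instead of writing a separate substitute() template for each plot type, the call itself can be assembled from pieces with call(), which takes a function name as a string followed by the arguments. The sketch below uses the hypothetical name build_plot_call() and covers only the histogram and boxplot cases. It returns the call without running it, so the call can be printed or logged first and evaluated later with eval().

``` r
build_plot_call <- function(plot_type, data_name, var_x, title = NULL, color = "blue") {
  # Map the plot type to the name of a base graphics function
  fun_name <- switch(plot_type,
    histogram = "hist",
    boxplot = "boxplot",
    stop("Invalid plot type.")
  )
  # Build `data_name[[var_x]]` as code, e.g. df[["x"]]
  column <- call("[[", as.name(data_name), var_x)
  call(fun_name, column, main = title, col = color)
}

df <- data.frame(x = c(1, 3, 2, 5.5, 4), y = c(10, 8, 9, 6, 7))
plot_call <- build_plot_call("histogram", "df", "x",
                             title = "Histogram of X", color = "green")
plot_call
#> hist(df[["x"]], main = "Histogram of X", col = "green")
eval(plot_call)  # draws the histogram
```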
4.2 Functional programming: a new paradigm
In previous chapters, we explored different object types in R and how to use functions to manipulate them. In the previous section, we saw how metaprogramming allows us to write code that generates other code. Now, we are going to delve into a different programming paradigm: functional programming.
Functional programming is a programming style built on two core concepts: pure functions and immutability. A pure function is consistent and free of side effects: it always produces the same output for the same input and does not modify any external state. Immutability means that data is not changed after it is created. Instead of modifying an existing object, we create a new one with the desired changes.
These principles make functional code easier to reason about, debug, and maintain. They also make concurrent and parallel code easier to write, because pure functions have no side effects that could interfere with other processes.
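The sketch below illustrates these ideas in base R, using illustrative function names. R's copy-on-modify behaviour gives most objects immutable semantics: when a function changes one of its arguments, it changes a copy, not the caller's object. Environments and reference-based objects such as R6 classes are the exception.

``` r
# Pure: the result depends only on the inputs, and nothing outside is changed
add_tax <- function(price, rate = 0.25) price * (1 + rate)
add_tax(100)
#> [1] 125

# Impure: <<- modifies a variable outside the function (a side effect),
# so calling it twice with the same input gives different results
total <- 0
add_to_total <- function(x) {
  total <<- total + x
  total
}
add_to_total(5)
#> [1] 5
add_to_total(5)
#> [1] 10

# Copy-on-modify: changing `v` inside the function modifies a copy,
# and the caller's vector is left untouched
double_first <- function(v) {
  v[1] <- v[1] * 2
  v
}
original <- c(1, 2, 3)
doubled <- double_first(original)
original
#> [1] 1 2 3
doubled
#> [1] 2 2 3
```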
4.2.1 Basic principles of functional programming
Functional programming rests on several pillars. First, functions are first-class citizens: like any other data, they can be assigned to variables, passed as arguments, and returned from other functions. Second, it relies on pure functions that produce consistent outputs without side effects. Third, it emphasizes immutability, creating new data rather than modifying existing objects. Finally, it favors higher-order functions (functions that take or return other functions) over explicit loops for processing collections of data.
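The first pillar can be sketched in a few lines of base R, using illustrative names. The example stores a function in a variable, passes a function to another function, and returns a function from a function.

``` r
square <- function(x) x^2

# Assign a function to another variable
f <- square

# Pass a function as an argument
apply_twice <- function(fun, x) fun(fun(x))
apply_twice(f, 3)
#> [1] 81

# Return a function from a function
make_multiplier <- function(k) {
  force(k)
  function(x) x * k
}
triple <- make_multiplier(3)
triple(4)
#> [1] 12

# sapply() is a base R higher-order function that replaces an explicit loop
sapply(1:4, square)
#> [1]  1  4  9 16
```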
4.2.2 Higher-order functions in R
R offers several higher-order functions that are especially useful for functional programming. Base R includes lapply(), sapply(), vapply(), Map(), Filter(), and Reduce(), and the purrr package provides a consistent family of tools built on the same ideas. These functions let us process vectors, lists, and other objects concisely, without writing explicit for and while loops.
The most commonly used purrr tools work as follows. map() applies a function to each element of a list or vector and always returns a list; its variants map_dbl(), map_chr(), and map_lgl() return a numeric, character, or logical vector instead. reduce() performs a cumulative operation, combining elements one by one until a single result remains. keep() acts as a filter, retaining only those elements that satisfy a given condition.
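A short sketch of these purrr functions on small vectors. It assumes the purrr package is installed.

``` r
library(purrr)

# map() always returns a list
squares_list <- map(1:3, ~ .x^2)
class(squares_list)
#> [1] "list"

# map_dbl() and map_chr() return atomic vectors of the stated type
map_dbl(1:3, ~ .x^2)
#> [1] 1 4 9
map_chr(c("a", "b"), toupper)
#> [1] "A" "B"

# reduce() combines elements pairwise: ((1 + 2) + 3) + 4
reduce(1:4, `+`)
#> [1] 10

# keep() retains the elements for which the condition is TRUE
keep(1:6, ~ .x %% 2 == 0)
#> [1] 2 4 6
```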
In purrr functions, the ~ symbol creates an anonymous function "on the fly". You write a formula such as ~ .x * 2, and purrr converts it into a function without you having to give it a name. The part following ~ is the body of the function, specifying the operation to perform on each element of the vector or list. Inside the body, .x (or its shorter alias .) refers to the current element. When a function iterates over two inputs, as map2() does, .y refers to the corresponding element of the second input.
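The sketch below shows several equivalent ways to pass the same anonymous function to map_dbl(). It assumes purrr is installed, and the \(value) shorthand requires R 4.1 or later (the same version that introduced |>).

``` r
library(purrr)

x <- c(1, 2, 3)
with_formula  <- map_dbl(x, ~ .x^2)                   # formula: .x is the current element
with_dot      <- map_dbl(x, ~ .^2)                    # . is an alias for .x
with_function <- map_dbl(x, function(value) value^2)  # ordinary anonymous function
with_lambda   <- map_dbl(x, \(value) value^2)         # shorthand syntax, R >= 4.1
with_formula
#> [1] 1 4 9

# With two inputs, .x is the element from the first and .y from the second
map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x * .y)
#> [1] 10 40 90
```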
These functions, along with other higher-order functions like map2(), pmap(), accumulate(), and every(), give us great flexibility for processing data functionally in R.
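A brief sketch of three of these functions. It assumes purrr is installed, and the input values are illustrative.

``` r
library(purrr)

# pmap_dbl(): iterate over several inputs in parallel, matched by name
params <- list(a = c(1, 2), b = c(3, 4), d = c(5, 6))
pmap_dbl(params, function(a, b, d) a * b + d)
#> [1]  8 14

# accumulate(): like reduce(), but keeps every intermediate result
accumulate(c(1, 2, 3, 4), `+`)
#> [1]  1  3  6 10

# every(): TRUE only if the condition holds for all elements
every(c(2, 4, 6), ~ .x %% 2 == 0)
#> [1] TRUE
```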
4.2.3 Examples
Let’s put these concepts into practice with some concrete examples. First, consider a scenario where we want to calculate the sum of squares of the even numbers in a vector.
``` r
library(purrr)

numbers <- c(1, 2, 3, 4, 5)
sum_squares_evens <- numbers |>
  keep(~ .x %% 2 == 0) |>   # keep the even numbers: 2, 4
  map_dbl(~ .x^2) |>        # square them: 4, 16
  reduce(`+`)               # add them up
sum_squares_evens
#> [1] 20
```
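For comparison, the same pipeline can be written with base R's own higher-order functions, with no package needed. For simple numeric work like this, R's vectorized operators are often the most direct option.

``` r
numbers <- c(1, 2, 3, 4, 5)

# Base R higher-order functions: Filter() keeps, vapply() maps, Reduce() folds
evens <- Filter(function(x) x %% 2 == 0, numbers)
squares <- vapply(evens, function(x) x^2, numeric(1))
Reduce(`+`, squares)
#> [1] 20

# Vectorized arithmetic gives the same answer without any higher-order function
sum(numbers[numbers %% 2 == 0]^2)
#> [1] 20
```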
For a second example, let’s filter a list of cities to find those with a population greater than 5 million.
``` r
cities <- list(
  list(name = "New York", population = 8.4e6),
  list(name = "Los Angeles", population = 3.9e6),
  list(name = "Chicago", population = 2.7e6)
)
big_cities <- cities |>
  keep(~ .x$population > 5e6)
big_cities
#> [[1]]
#> [[1]]$name
#> [1] "New York"
#>
#> [[1]]$population
#> [1] 8400000
```
In this example, .x acts as a placeholder for each element of the `cities` list as keep() iterates over it. In each iteration, .x holds one city, so .x$population is that city's population. A formula with .x has three advantages. It defines an anonymous function (~ .x$population > 5e6) on the spot. It gives direct access to properties of the current element, such as .x$population. It also keeps the code compact, without formally defining a separate function for a simple operation. Inside a ~ formula, purrr only recognizes its own placeholders: ., .x, and .y, plus ..1, ..2, and so on. If you prefer a descriptive name such as city, write an ordinary anonymous function instead.
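The sketch below filters the same `cities` list with ordinary anonymous functions, which allow descriptive argument names. It uses both purrr's keep() and base R's Filter(), and it assumes purrr is installed.

``` r
library(purrr)

cities <- list(
  list(name = "New York", population = 8.4e6),
  list(name = "Los Angeles", population = 3.9e6),
  list(name = "Chicago", population = 2.7e6)
)

# An ordinary anonymous function lets us choose a descriptive argument name
big_cities <- keep(cities, function(city) city$population > 5e6)

# Base R's Filter() accepts the same kind of function
big_cities_base <- Filter(function(city) city$population > 5e6, cities)

# Extract the names of the matching cities as a character vector
map_chr(big_cities, ~ .x$name)
#> [1] "New York"
```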
Functional programming is a powerful paradigm that can help you write cleaner, more predictable, and more maintainable code. As you become familiar with its principles and tools, you will be able to apply them to a wide variety of data analysis problems.
4.3 R6: The future of OOP in R
For advanced Object-Oriented Programming (OOP) using the R6 package, please refer to Appendix B.
4.4 Exercises
Below, you will find a series of exercises with different levels of difficulty. It is time to put into practice what you have learned in this chapter.
- Formulate an expression that represents the sum of two variables, a and b.
- Compose an expression for the multiplication of x and y, then execute it to find the result.
- Generate a numeric vector and apply the map() function to compute the square of each element.
- Define a vector of numbers and use keep() from the purrr package to retain only the even values.
- Build a function named create_power_function() that takes a number n and returns a new function that raises its input to the power of n.
- Construct a numeric vector and apply reduce() to calculate the product of all its elements.
- Design a function create_flexible_sum_function() that accepts a number n and returns a function that adds n to the sum of any arguments passed to it.
- Develop a create_dynamic_plot() function that takes a data frame, a plot type (“scatter”, “histogram”, or “boxplot”), and a list of options (such as the variables to plot, title, and color), and generates the requested plot dynamically.
Solution
``` r
create_dynamic_plot <- function(data, plot_type, options) {
  # Create base plot expression
  if (plot_type == "scatter") {
    expression <- quote(plot(data[[options$var_x]], data[[options$var_y]],
                             xlab = options$labels_x, ylab = options$labels_y,
                             main = options$title, col = options$color))
  } else if (plot_type == "histogram") {
    expression <- quote(hist(data[[options$var_x]],
                             main = options$title,
                             xlab = options$labels_x, col = options$color))
  } else if (plot_type == "boxplot") {
    expression <- quote(boxplot(data[[options$var_x]],
                                main = options$title,
                                ylab = options$labels_y, col = options$color))
  } else {
    stop("Invalid plot type.")
  }
  # Evaluate base expression inside the function, where `data` and `options` exist
  eval(expression)
}
# Create sample data (set.seed() makes the random values reproducible)
set.seed(123)
data <- data.frame(x = rnorm(100), y = rnorm(100))
# Tests
# Scatter plot
options_scatter <- list(var_x = "x", var_y = "y",
                        title = "Scatter Plot",
                        labels_x = "Variable X",
                        labels_y = "Variable Y",
                        color = "blue")
create_dynamic_plot(data, "scatter", options_scatter)
# Histogram
options_histogram <- list(var_x = "x",
                          title = "Histogram",
                          labels_x = "Values",
                          color = "green")
create_dynamic_plot(data, "histogram", options_histogram)
# Boxplot
options_boxplot <- list(var_x = "y",
                        title = "Boxplot",
                        labels_y = "Values",
                        color = "red")
create_dynamic_plot(data, "boxplot", options_boxplot)
```
