Chapter 17 Google Analytics from R

Understanding the audience that visits our website helps us make better decisions, whether commercial or editorial. A simple visit counter only gives us a total, whereas Google Analytics lets us collect much more than the number of visits.

17.1 Problem

We have a website where the Google Analytics tracking code is already installed, but we want reports that the Google Analytics web interface does not currently provide. We need access to the raw data so that we can build our own reports and consult them without having to log in to the Google Analytics website.

[!IMPORTANT] Google sunset Universal Analytics in July 2023, and the googleAnalyticsR package now supports GA4. The concepts here remain valid, but specific function parameters may differ from older Universal Analytics examples. See the package documentation for GA4-specific usage.

17.2 Access to data

We assume that we already have a Google Analytics account and that we are already collecting data from our website in a GA4 property. For this case I will use the statistics of the website that you are currently reading.

To access the Google Analytics data we will use the googleAnalyticsR package. We will also use lubridate to handle the start and end dates, and the tidyverse to manipulate and plot the results.

# Run once; lubridate and tidyverse must also be installed
install.packages("googleAnalyticsR")
library(googleAnalyticsR)
library(lubridate)
library(tidyverse)

Then we have to authenticate. The ga_auth() function opens a web page where we log in with the account that has access to Google Analytics.

ga_auth()
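For scripts that run unattended, the browser login is not practical. The following is a minimal sketch, assuming a Google Cloud service account that has been granted access to the property. It uses the json_file argument of ga_auth(); the helper name authenticate_ga and the environment variable GA_SERVICE_ACCOUNT_JSON are hypothetical names chosen for this example.

```r
# Authenticate without a browser when a service-account key is available.
# GA_SERVICE_ACCOUNT_JSON is a hypothetical variable name used in this sketch.
authenticate_ga <- function(json_path = Sys.getenv("GA_SERVICE_ACCOUNT_JSON")) {
  if (!nzchar(json_path) || !file.exists(json_path)) {
    message("No service-account key found; run ga_auth() interactively instead.")
    return(invisible(FALSE))
  }
  googleAnalyticsR::ga_auth(json_file = json_path)
  invisible(TRUE)
}

authenticated <- authenticate_ga()
```

The function returns FALSE instead of failing when no key is found, so the same script can fall back to the interactive login on a personal machine.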

Now that we are authenticated, we can list all our accounts with the ga_account_list() function. Passing "ga4" returns the GA4 properties; the default lists Universal Analytics views.

account_list <- ga_account_list("ga4")

From this list we find the row for the website that interests us and read its propertyId column. The GA4 Property ID for this website is the following:

property_id <- 123456789 # Replace with your GA4 Property ID
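Instead of copying the ID by hand, the lookup can be done in code. The sketch below runs on a mock data frame, so it does not need authentication. The column names property_name and propertyId are assumptions about the shape of the GA4 account list, and all values are invented.

```r
library(dplyr)

# Mock of the table returned by ga_account_list("ga4").
# Column names and values are assumptions made for illustration.
account_list <- data.frame(
  account_name  = c("Personal", "Personal"),
  property_name = c("My book site", "My blog"),
  propertyId    = c("123456789", "987654321")
)

# Keep the row for the site of interest and extract its Property ID
property_id <- account_list |>
  filter(property_name == "My book site") |>
  pull(propertyId)

property_id
```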

Finally, we need two variables with the start and end dates of the period we want.

from_date <- "2024-01-01"
to_date <- "2024-03-31"

Alternatively, we can compute the range relative to today, for example the last two months or the last two days.

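# Two months ago until now
# %m-% rolls back to the last valid day of the month, so the result is
# never NA (a plain "- months(2)" gives NA for dates such as April 30)
from_date <- (today() %m-% months(2)) |> as.character()
to_date <- today() |> as.character()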

from_date
to_date
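When the same kind of range is needed in several reports, it can be wrapped in a small helper. This is a sketch in base R only; last_n_days is a hypothetical name, not a function from lubridate or googleAnalyticsR.

```r
# Return c(from, to) as "YYYY-MM-DD" strings covering the n days before `end`.
last_n_days <- function(n, end = Sys.Date()) {
  format(c(end - n, end), "%Y-%m-%d")
}

# Relative to today
last_n_days(90)

# Relative to a fixed end date, useful for reproducible reports
last_n_days(7, end = as.Date("2024-03-31"))
```

The returned vector has the same shape as c(from_date, to_date), so it could be passed directly as a date range.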

With this we can request the data using the ga_data() function, the package's query function for GA4.

history <- ga_data(property_id,
                 date_range = c(from_date, to_date),
                 metrics = "activeUsers",
                 dimensions = "date")

With this data frame we can filter or visualize the data, depending on what we need.
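As an illustration of filtering and plotting such a result, the sketch below uses a mock data frame in place of a real query. It assumes the result has a date column of class Date and an activeUsers column; the values are invented.

```r
library(dplyr)
library(ggplot2)

# Mock of a ga_data() result with the "date" dimension.
# The values are invented for illustration.
history <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 14),
  activeUsers = c(12, 15, 9, 20, 18, 7, 5, 14, 16, 11, 22, 19, 8, 6)
)

# Keep only the second week
second_week <- history |>
  filter(date >= as.Date("2024-01-08"))

# Line chart of daily active users
ggplot(second_week) +
  aes(date, activeUsers) +
  geom_line() +
  labs(x = NULL, y = "Active users", title = "Daily active users")
```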

17.3 Visualization

With access to the data, we can use any of the available metrics and dimensions. As an example, we will visualize the cities from which this website was visited in the last 90 days. This is related to the figures given in the preface on the main page of this website, although those were calculated for a different period.

# Last 90 days to date
from_date <- (today() - days(90)) |> as.character()
to_date <- today() |> as.character()

# We use the city as the dimension
# limit = -1 requests all rows instead of the default first 100
history <- ga_data(property_id,
                 date_range = c(from_date, to_date),
                 metrics = "activeUsers",
                 dimensions = "city",
                 limit = -1)

The dimensions argument also accepts a character vector, so several dimensions can be requested at once. We will now create a bar chart of the top 5 cities by active users on this website in the last 90 days.

# Proportions are computed among users whose city was identified
history |>
  filter(city != "(not set)") |>
  group_by(city) |>
  summarise(total = sum(activeUsers)) |>
  mutate(proportion = total / sum(total)) |>
  slice_max(proportion, n = 5) |>
  mutate(city = reorder(city, proportion)) |>
  ggplot() +
  aes(proportion, city) +
  geom_col() +
  labs(
    x = "Proportion of active users",
    title = "Top 5 cities by proportion of active users",
    y = NULL
  )

Keep in mind that there is an issue with the recognition of IP addresses from Lima, Peru, which is why Lima does not appear as the top city. At the time of this analysis, those visits all appeared as "(not set)". However, if the same analysis is done by country instead of by city, Peru is recognized and appears among the top countries visiting the website.
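The country-level version of the analysis can be sketched as follows. The commented call shows how the data would be requested with a real property. The aggregation itself runs on a mock data frame with invented values, so the numbers are illustrative and are not results from this website.

```r
library(dplyr)

# With a real property, the data would come from a call such as:
# history_country <- ga_data(property_id,
#                            date_range = c(from_date, to_date),
#                            metrics = "activeUsers",
#                            dimensions = "country")

# Mock result, invented for illustration
history_country <- data.frame(
  country = c("Peru", "Mexico", "Spain", "(not set)"),
  activeUsers = c(40, 30, 20, 10)
)

# Share of active users per identified country, largest first
top_countries <- history_country |>
  filter(country != "(not set)") |>
  mutate(proportion = activeUsers / sum(activeUsers)) |>
  arrange(desc(proportion))

top_countries
```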

17.4 Conclusion

Accessing Google Analytics data programmatically opens powerful possibilities:

  • Automated Reporting: Schedule R scripts to generate weekly/monthly reports.
  • Custom Metrics: Combine GA data with internal business data for richer analysis.
  • Interactive Dashboards: Use Shiny to create real-time analytics dashboards.

With the googleAnalyticsR package, you can query any metric or dimension available in your GA account and turn raw clickstream data into actionable business insights, all without leaving your R environment.
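As a starting point for the automated reporting idea, the sketch below saves a report to a date-stamped CSV file. The function name write_report and the file naming scheme are hypothetical, and a mock data frame stands in for the result of ga_data().

```r
# Write a report data frame to a date-stamped CSV file and return its path.
write_report <- function(data, dir = tempdir(), date = Sys.Date()) {
  path <- file.path(dir, paste0("ga-report-", format(date, "%Y-%m-%d"), ".csv"))
  write.csv(data, path, row.names = FALSE)
  path
}

# Mock data standing in for the result of ga_data()
report <- data.frame(city = c("A", "B"), activeUsers = c(10, 5))

report_path <- write_report(report)
report_path
```

A script containing a real query followed by a call like this could then be run on a schedule, for example with cron or the Windows Task Scheduler.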