How to Use Google Analytics in R

Deepanshu Bhalla

This tutorial explains how to use Google Analytics in R. It includes several examples that show how to pull your Google Analytics data into R and analyze it there.

In this article, we will use Google Analytics 4 (GA4), which is the latest version of Google Analytics. It was designed to improve tracking of what people do on websites and mobile apps.

First, install the following packages.

  1. googleAnalyticsR
  2. gargle

Install these packages using the syntax below.

install.packages("googleAnalyticsR")
install.packages("gargle")
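
If you rerun the script often, you may prefer to install only what is missing. The following is a minimal sketch in base R; `missing_packages` is a helper name made up for this example, and the actual install call is left commented out so nothing is installed by accident.

```r
# Return the subset of `pkgs` that is not installed yet.
# `missing_packages` is a helper name made up for this sketch.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("googleAnalyticsR", "gargle"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```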

Steps to Integrate Google Analytics into R

Step 1: Authentication

First, you need to authenticate with Google in your browser. Run the code below. It opens a browser window and asks you to log in with your Google account. This is required only the first time. In subsequent runs, authentication happens automatically.

# Libraries
library(googleAnalyticsR)
library(gargle)

# Authenticate, then list the GA4 accounts and properties you can access
ga_auth(email="deepanshuxxxxxx@gmail.com")
ga_account_list("ga4")
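
If you would rather not hard-code your email address in the script, one option is to read it from an environment variable. This is only a sketch: `GA_AUTH_EMAIL` and `get_auth_email` are names invented for this example, not something the package defines.

```r
# Read the Google account email from an environment variable.
# GA_AUTH_EMAIL is a hypothetical variable name chosen for this sketch.
get_auth_email <- function(var = "GA_AUTH_EMAIL") {
  email <- Sys.getenv(var, unset = "")
  if (!nzchar(email)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  email
}

# Usage, once the variable is set (for example in ~/.Renviron):
# library(googleAnalyticsR)
# ga_auth(email = get_auth_email())
```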

Step 2: Fetch Google Analytics Data

The following code pulls Google Analytics data and stores it in an R data frame. You need to specify your GA4 property ID, along with the start and end dates of the data you wish to extract.

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

overall <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
)

Output

# A tibble: 1 × 4
  activeUsers newUsers sessions screenPageViews
1       56399    44636    81366          114172
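
Once the totals are in a data frame, ratios are easy to derive. The sketch below rebuilds the one-row result from the output above in base R and adds two columns; the column names `viewsPerSession` and `newUserShare` are made up for this example.

```r
# Totals copied from the output above
overall <- data.frame(
  activeUsers = 56399,
  newUsers = 44636,
  sessions = 81366,
  screenPageViews = 114172
)

# Derived ratios (column names below are made up for this sketch)
overall$viewsPerSession <- overall$screenPageViews / overall$sessions
overall$newUserShare <- overall$newUsers / overall$activeUsers

round(overall[, c("viewsPerSession", "newUserShare")], 2)
```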

Country-wise Breakdown

To view Google Analytics traffic data by country, add the dimensions argument to the ga_data() function.

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

# By Country
country <- ga_data(
  my_property_id,
  dimensions = c("country"),
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
)

Output

# A tibble: 100 × 5
   country        activeUsers newUsers sessions screenPageViews
 1 United States        18102    14378    25746           29730
 2 India                 9814     7021    17427           30698
 3 United Kingdom        3243     2567     4549            5477
 4 Canada                2327     1889     3061            3879
 5 Australia             1865     1493     2606            2952
 6 Germany               1587     1278     2137            2540
 7 Singapore             1117      955     1526            1887
 8 France                1067      885     1427            1600
 9 Netherlands           1017      854     1329            1510
10 Brazil                 881      651     1248            1504
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
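
A country table is often easier to read as percentages. As a rough sketch, the code below takes the first five rows of the output above and divides by the overall activeUsers figure from the first report (56399). The column name `sharePct` is invented here, and the shares are approximate because a user seen in two countries may be counted in both.

```r
# First five rows copied from the output above
country <- data.frame(
  country = c("United States", "India", "United Kingdom", "Canada", "Australia"),
  activeUsers = c(18102, 9814, 3243, 2327, 1865),
  stringsAsFactors = FALSE
)

# Overall activeUsers from the first report in this article
total_active_users <- 56399

# Percentage of all active users, rounded to one decimal place
country$sharePct <- round(100 * country$activeUsers / total_active_users, 1)
country[order(-country$sharePct), ]
```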

Day-wise Breakdown

To see Google Analytics data for each day, add the dimensions date and dayOfWeek. The dimension "dayOfWeek" shows the day of the week as a number from 0 to 6, where Sunday is 0 and Saturday is 6.

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

library(dplyr)

# Named daily rather than sample, to avoid confusion with base R's sample()
daily <- ga_data(
  my_property_id,
  dimensions = c("date","dayOfWeek"),
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
) %>%
  arrange(desc(date))  # newest day first

Output

# A tibble: 8 × 6
  date       dayOfWeek activeUsers newUsers sessions screenPageViews
1 2023-09-19 2                9954     7053    12343           16108
2 2023-09-18 1                7844     5583    10193           13643
3 2023-09-17 0                3213     2473     4104            5965
4 2023-09-16 6                4019     3022     5236            7041
5 2023-09-15 5                8625     5933    11407           16009
6 2023-09-14 4                9750     6745    12493           19848
7 2023-09-13 3                9910     6938    12973           18394
8 2023-09-12 2                9857     6889    12934           17164
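
The numeric codes are harder to read than day names. The sketch below maps them to names with a lookup vector, using four date and code pairs from the output above. It assumes dayOfWeek arrives as text, as the left-aligned column suggests; `dayName` is a column name made up for this example.

```r
# Dates and dayOfWeek codes copied from the output above
daily <- data.frame(
  date = as.Date(c("2023-09-19", "2023-09-18", "2023-09-17", "2023-09-16")),
  dayOfWeek = c("2", "1", "0", "6"),
  stringsAsFactors = FALSE
)

# Position 1 is Sunday because dayOfWeek starts at 0
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
daily$dayName <- day_names[as.integer(daily$dayOfWeek) + 1]

# Cross-check: format(date, "%w") also numbers weekdays 0-6 from Sunday
stopifnot(all(format(daily$date, "%w") == daily$dayOfWeek))
daily
```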

How to Add Multiple Dimensions

You can specify multiple dimensions in the dimensions argument of the ga_data() function. The following code returns daily web traffic data by city.

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

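library(dplyr)

# Named city_daily rather than sample, to avoid confusion with base R's sample()
city_daily <- ga_data(
  my_property_id,
  dimensions = c("date","city","dayOfWeek"),
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  date_range = c(from_date, to_date)
) %>%
  arrange(desc(date)) %>%        # newest day first
  filter(city == "New York")     # keep one city, after the download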

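In the code above, the filter() call keeps only the rows for the city "New York". This filter runs in R after the data has been downloaded, and ga_data() returns only 100 rows unless you set the limit argument, so some days can be missing from the result. To filter before the download, pass dim_filters = ga_data_filter(city == "New York") to ga_data() instead.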

Output

# A tibble: 6 × 7
  date       city     dayOfWeek activeUsers newUsers sessions screenPageViews
1 2023-09-19 New York 2                 189      138      240             271
2 2023-09-18 New York 1                 146      100      192             219
3 2023-09-15 New York 5                 182      122      231             266
4 2023-09-14 New York 4                 180      124      227             251
5 2023-09-13 New York 3                 194      142      228             284
6 2023-09-12 New York 2                 186      138      228             290
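
The output has six rows for an eight-day date range, so two days are absent. A quick way to see which ones is to compare the returned dates against the full sequence, as in this base R sketch. Why they are absent is not something the code can tell you; it could be the row limit applied before the city filter, or simply no recorded traffic.

```r
from_date <- as.Date("2023-09-12")
to_date <- as.Date("2023-09-19")

# Dates present in the New York output above
returned_dates <- as.Date(c("2023-09-19", "2023-09-18", "2023-09-15",
                            "2023-09-14", "2023-09-13", "2023-09-12"))

# Every day in the requested range, then the ones that were not returned
all_dates <- seq(from_date, to_date, by = "day")
missing_dates <- all_dates[!all_dates %in% returned_dates]
missing_dates
```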

Web Traffic Data by Posts

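To see post-level performance on your website, specify pagePath in the dimensions argument of the ga_data() function. Make sure to specify a high number in the "limit" argument of the function so that all posts are returned. The example below also keeps only organic traffic by using the dim_filters argument.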

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  dimensions = c("pagePath"),
  date_range = c(from_date, to_date),
  limit = 1000,
  dim_filters = ga_data_filter(sessionMedium == "organic")
  )
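
How high is high enough? One simple heuristic is to check whether the result has exactly as many rows as the limit you asked for, which suggests that more rows were available. The sketch below uses a stand-in data frame instead of a real ga_data() result, and `hit_row_limit` is a helper name made up for this example. The package documentation also describes limit = -1 as a way to request all rows, which may be simpler if your site is not very large.

```r
# TRUE when the number of returned rows reaches the requested limit,
# which suggests (but does not prove) that more rows were available.
# `hit_row_limit` is a helper name made up for this sketch.
hit_row_limit <- function(df, limit) {
  nrow(df) >= limit
}

# Stand-in data frame with 1000 rows in place of a real ga_data() result
fake_result <- data.frame(pagePath = paste0("/post-", seq_len(1000)))
if (hit_row_limit(fake_result, limit = 1000)) {
  message("Result has as many rows as the limit; consider raising it.")
}
```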

Post-Level Performance by Country

To see the post-level performance of your website by country, specify two dimensions, pagePath and country, in the dimensions argument of the ga_data() function. Make sure to specify a high number in the "limit" argument of the function, because the report has one row for every combination of post and country and can therefore become very large.

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  dimensions = c("pagePath","country"),
  date_range = c(from_date, to_date),
  limit = 10000,
  dim_filters = ga_data_filter(sessionMedium == "organic")
  )

Filters

You can use the dim_filters argument of the ga_data() function to apply a filter to any dimension. The function ga_data_filter() builds the filter expression.

In this example, we select organic traffic by applying a filter on "sessionMedium".

my_property_id <- 3819XXXXX
from_date <- "2023-09-12"
to_date <- "2023-09-19"

basic <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "newUsers","sessions", "screenPageViews"),
  date_range = c(from_date, to_date),
  dim_filters = ga_data_filter(sessionMedium == "organic")
  )

Multiple Filters

To apply multiple filters to a Google Analytics report, use the symbols &, |, and ! for AND, OR, and NOT conditions in the ga_data_filter() function.

# OR condition
ga_data_filter(city=="New York" | city == "Los Angeles")

# Select multiple values
ga_data_filter(city==c("New York","Los Angeles"))

# AND condition
ga_data_filter(city=="Los Angeles" & sessionMedium == "organic")

# NOT Condition
ga_data_filter(!(city=="New York" | city == "Los Angeles"))
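
If the logic of a combined filter is unclear, it can help to try the same conditions on a small local data frame first. The rows below are made up purely for illustration. Note one difference: in plain R, == against a vector recycles element by element, so %in% is the local counterpart of the "select multiple values" form shown above.

```r
# Made-up rows, used only to illustrate the logic
visits <- data.frame(
  city = c("New York", "Los Angeles", "Chicago", "Los Angeles"),
  sessionMedium = c("organic", "referral", "organic", "organic"),
  stringsAsFactors = FALSE
)

# OR condition
either_city <- visits$city == "New York" | visits$city == "Los Angeles"

# Multiple values (local counterpart of the list form)
in_list <- visits$city %in% c("New York", "Los Angeles")

# AND condition
la_organic <- visits$city == "Los Angeles" & visits$sessionMedium == "organic"

# NOT condition
neither_city <- !either_city

visits[la_organic, ]
```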

Real Time Data

To see real-time reports, set the realtime argument of the ga_data() function to TRUE. The real-time report covers activity from the past 30 minutes, such as the number of users and views.

my_property_id <- 3819XXXXX

overall <- ga_data(
  my_property_id,
  metrics = c("activeUsers", "screenPageViews"),
  realtime = TRUE
)

Note - The real-time report does NOT include the metrics "newUsers" and "sessions".

See the output below for the real-time report.

Output

# A tibble: 1 × 2
  activeUsers screenPageViews
1         332             475

Real Time Data by Country/City

The real-time report supports only a limited set of dimensions. You can group the report by country or city.

overall <- ga_data(
  my_property_id,
  dimensions = c("country"),
  metrics = c("activeUsers", "screenPageViews"),
  realtime = TRUE
)

See the output below for the real-time report by country.

Output

# A tibble: 59 × 3
   country        activeUsers screenPageViews
 1 India                  135             169
 2 United Kingdom          33              48
 3 (other)                 23              24
 4 United States           19              18
 5 Singapore               10              12
 6 Denmark                  9               9
 7 Italy                    9              13
 8 Netherlands              9              11
 9 Spain                    9              11
10 Germany                  8              16
# ℹ 49 more rows
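
A single real-time call is only a snapshot. To watch how the numbers move, you could call it repeatedly and stack the results, as in the sketch below. `poll_realtime` and `fetch_fun` are names made up for this example. The stand-in fetcher returns the figures from the first real-time output instead of calling the API; a real one would wrap the ga_data() call with realtime = TRUE.

```r
# Call `fetch_fun` several times and stack the results with a timestamp.
# `poll_realtime` and `fetch_fun` are names made up for this sketch.
poll_realtime <- function(fetch_fun, times = 3, wait_seconds = 60) {
  snapshots <- vector("list", times)
  for (i in seq_len(times)) {
    snapshot <- fetch_fun()
    snapshot$fetchedAt <- rep(Sys.time(), nrow(snapshot))
    snapshots[[i]] <- snapshot
    if (i < times) Sys.sleep(wait_seconds)  # no wait after the last call
  }
  do.call(rbind, snapshots)
}

# Stand-in fetcher returning the numbers from the first real-time output.
# A real one would wrap the ga_data(..., realtime = TRUE) call.
fake_fetch <- function() {
  data.frame(activeUsers = 332, screenPageViews = 475)
}

history <- poll_realtime(fake_fetch, times = 2, wait_seconds = 0)
nrow(history)
```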

Fetch Hourly Data

By specifying "hour" in the dimensions argument of the ga_data() function, you can fetch hourly data from Google Analytics using R. The example below also adds "country", so each row is one hour for one country. To sort the data by sessions in descending order, put a minus sign before the metric in the ga_data_order() function, for example ga_data_order(-sessions).

my_property_id <- 3819XXXXX
start_date <- Sys.Date()
end_date <- Sys.Date()

hourly.df <- ga_data(
  my_property_id,
  metrics = c("sessions"),
  dimensions = c("hour","country"),
  date_range =  c(start_date, end_date),
  orderBys = ga_data_order(-sessions)
  )
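
The example above uses Sys.Date() for both ends of the range. The same idea extends to rolling windows such as the last 7 days. The helper below is a small sketch; `last_n_days` is a name made up for this example, and it ends the window on yesterday by default on the assumption that today's data is still incomplete.

```r
# Date range covering the `n` days that end on `end` (inclusive).
# `last_n_days` is a helper name made up for this sketch.
last_n_days <- function(n, end = Sys.Date() - 1) {
  stopifnot(n >= 1)
  c(end - (n - 1), end)
}

# A fixed end date keeps the example reproducible
last_n_days(7, end = as.Date("2023-09-19"))
```

The returned vector has the same shape as the c(start_date, end_date) value passed to date_range above.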