How to Transpose Data in R (With Examples)

This tutorial explains how to transpose a data frame in R using tidyr package. It helps to convert data from long to wide format or wide to long format.

Transpose Data with R

Sample Data

The code below creates a sample data frame that would be used to explain examples in this tutorial.

data <- read.table(text="X Y    Z
                   ID12   2012-06    566
                   ID1    2012-06  10239
                   ID6    2012-06    524
                   ID12   2012-07   2360
                   ID1    2012-07   13853
                   ID6    2012-07    2352
                   ID12   2012-08   3950
                   ID1    2012-08   14738
                   ID6    2012-08   4104",header=TRUE)

Converting Data from Long to Wide Format

Suppose you have data containing three variables such as X, Y and Z. The variable 'X' contains IDs and the variable 'Y' contains dates and the variable 'Z' contains income. The data is structured in a long format and you need to convert it to wide format. The snapshot of data and desired output is shown below -

R : Convert Long to Wide Format

pivot_wider()

In tidyr package, pivot_wider( ) function converts data from long to wide Format. See the arguments of this function below:

pivot_wider(
  data,
  names_from,
  values_from,
  values_fill = NULL,
  id_cols = everything(),
  names_sep = "_"
)

data: Data frame to be transposed.
names_from: Column(s) to be used for naming columns in the output.
values_from: Column(s) to get the values from.
values_fill: If specified, this value will replace any missing values.
id_cols: Identifier columns. By default, it takes all columns except for the columns in the 'names_from' and 'values_from' arguments.
names_sep: Separator to use between values in the resulting column names.

You can install 'tidyr' package by using the command : install.packages('tidyr')

library(tidyr)
mydt <- pivot_wider(data, names_from = Y, values_from = Z)

spread()

There is one more function in tidyr named spread() that widens data by increasing the number of columns while decreasing the number of rows from the input to the transposed output data.

The syntax of spread() function is as follows :

spread(data, key, value, fill = NA)

data: data frame name.
key: Column name which will be used for column headings.
value: Column name which will be used for filling the rows.
fill: If used, missing values will be replaced with this value.

mydt <- spread(data, Y, Z)

Output


# A tibble: 3 × 4
  X     `2012-06` `2012-07` `2012-08`
1 ID12        566      2360      3950
2 ID1       10239     13853     14738
3 ID6         524      2352      4104

pivot_wider vs. spread : We recommend using the pivot_wider() function since it is actively maintained by developers and it's easier to use and has more features than the spread() function.

Handling Missing Values While Transposing

In this section, we will see how to treat missing values in the output data. In the data below, we have information about income generated from 3 products - Product A, B and C, reported semi-annually. We have limited data about Product C.

df <- read.table(text="Year SemiYear Product Income
1 1 ProductA 13377
1 2 ProductA 14069
1 1 ProductB 11426
1 2 ProductB 11750
1 2 ProductC 11750
2 1 ProductA 11122
2 2 ProductA 11202
2 1 ProductB 14712
2 2 ProductB 10169
",header=TRUE)

pivot_wider()

library(tidyr)
mydt_pivot <- pivot_wider(df, 
                          names_from = Product, 
                          values_from = Income)

Output


# A tibble: 4 × 5
   Year SemiYear ProductA ProductB ProductC
1     1        1    13377    11426       NA
2     1        2    14069    11750    11750
3     2        1    11122    14712       NA
4     2        2    11202    10169       NA

As you can see in the output, we have NAs in the 'ProductC' column. To replace them with 0, we can use the argument values_fill in the 'pivot_wider' function.


mydt_pivot <- pivot_wider(df, 
                          names_from = Product, 
                          values_from = Income,
                          values_fill = 0)

spread()

We can use the argument fill in the 'spread' function to replace missing values with 0.

library(tidyr)
mydt_spread <- spread(df, Product, Income, fill = 0)

Output


# A tibble: 4 × 5
   Year SemiYear ProductA ProductB ProductC
1     1        1    13377    11426        0
2     1        2    14069    11750    11750
3     2        1    11122    14712        0
4     2        2    11202    10169        0

Converting Data from Wide to Long Format

Suppose you have data containing information of species and their sepal length. The data of sepal length of species are in columns.

Wide to Long Format

Create Sample Data

mydata = read.table(text= "ID setosa versicolor virginica
                    1 5.1 NA NA
                    2 4.9 NA NA
                    3 NA 7 NA
                    4 NA 6.4 NA
                    5 NA NA 6.3
                    6 NA NA 5.8
                    ", header=TRUE)

pivot_longer()

The pivot_longer() function transforms data from wide to long format. See the arguments below :

pivot_longer(
  data,
  cols,
  names_to = "name",
  values_to = "value",
  names_prefix = NULL,
  names_sep = NULL,
  values_drop_na = FALSE
)

data: Data frame to be reshaped to long format.
cols: Column(s) to be transformed into longer format.
names_to: New column to be created from the columns specified in the 'cols' argument.
values_to: Name of the column having values in the output.
names_prefix: Pattern used to remove matching text from the start of each column name.
names_sep: Separator used in column names.
values_drop_na: Removing rows containing only NAs in the 'value_to' column.

mydt_longer <- pivot_longer(
  mydata,
  cols = c(setosa, versicolor, virginica),
  names_to = "Species",
  values_to = "Sepal.Length",
  values_drop_na = TRUE
)

gather()

The syntax of the gather() function is as follows:

gather(data, key, value, ..., na.rm = FALSE)

data: Data frame
key: Name of new key column
value: Name of new value column
...: Columns to gather.
na.rm: na.rm=TRUE means removing rows from the output where the value is missing

mydt_gather <- gather(mydata, 
                      Species, 
                      Sepal.Length, 
                      c("setosa", "versicolor","virginica" ), 
                      na.rm = TRUE)

Output


# A tibble: 6 × 3
     ID Species    Sepal.Length
1     1 setosa              5.1
2     2 setosa              4.9
3     3 versicolor          7  
4     4 versicolor          6.4
5     5 virginica           6.3
6     6 virginica           5.8

Note : Missing values are removed in the above output since we have set the argument values_drop_na to TRUE.

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

While I love having friends who agree, I only learn from those who don't
Let's Get Connected Email LinkedIn

Post Comment 2 Responses to "How to Transpose Data in R (With Examples)"

UnknownFebruary 21, 2017 at 11:59 PM
Hi Deepashu,
Thanks for sharing such a wonderful article on Transposing data.
However, I still have some doubt associated with, if I will compare it with SAS.
The variable which you are saying that its an "ID" variable, by sas I can say that it a "BY" variable because that's something which we are keeping it on the left hand side and rest we are "ID" by sas.
please reply to clear my small dubiousness.
Thanks
Rishabh
daveNovember 1, 2020 at 3:48 AM
One of the best overall tutorials I have seen. I promise this is getting bookmarked for my reference.