Transpose in R

Deepanshu Bhalla 2 Comments
In R, we can transpose data very easily. There are many R packages such as tidyr and reshape2 that helps to reshape data from long to wide format or vice versa. Like many of us, I was also searching transpose function in dplyr package but didn't get any success. Then I realized there is no such function in dplyr to transpose data. Instead there are separate packages built for this function. Both these packages dplyr and tidyr are a part of tidyverse which is a collection of R packages designed to make data manipulation and exploration cakewalk for R users. In this article, i would use these two packages with several examples. These package were written by the most popular R expert Hadley Wickham.
Transpose Data with R
Let's go through some examples -

Sample Data

The code below would create a sample data that would be used for demonstration.
data <- read.table(text="X Y    Z
                   ID12   2012-06    566
                   ID1    2012-06  10239
                   ID6    2012-06    524
                   ID12   2012-07   2360
                   ID1    2012-07   13853
                   ID6    2012-07    2352
                   ID12   2012-08   3950
                   ID1    2012-08   14738
                   ID6    2012-08   4104",header=TRUE)

Convert Long to Wide Format

Suppose you have data containing three variables such as X, Y and Z. The variable 'X' contains IDs and the variable 'Y' contains dates and the variable 'Z' contains income. The data is structured in a long format and you need to convert it to wide format so that the dates moved to column names and the income information comes under these dates. The snapshot of data and desired output is shown below -
R : Convert Long to Wide Format

In reshape2 package, there are two function for transforming long-format data to wide format. The functions are "dcast" and "acast". The only difference between these two functions are as follows :
  1. dcast function returns a data frame as output.
  2. acast function returns a vector, matrix or array as output.

Install reshape2 package if not installed already
if (!require(reshape2)){
install.packages('reshape2')
library(reshape2)
}
R Code : Transform Long to Wide Format
mydt = dcast(data,X~Y,value.var = "Z")
How dcast function works
  1. The first parameter of dcast function refers to a data frame
  2. The left hand side of the casting function refers to ID variables.
  3. The right hand side refers to the variable to move to column name
  4. The value.var would contain a name of the variable that stores values.
tidyr : Spread( ) Function

In tidyr package, spread( ) function spreads data from Long to Wide Format. See the common parameters of this function below:

  • data: data frame name.
  • key: Column name which will be used for column headings.
  • value: Column name which will be used for filling the rows.
  • fill: If used, missing values will be replaced with this value.
library(tidyr)
mydt = spread(data, Y, Z)
Example 2 : More than 1 ID Variable
Let's see another example wherein we have more than 1 ID variable. It contains information about Income generated from 2 products - Product A and B reported semi-annually.
Example of Transforming Data
data <- read.table(text="Year SemiYear Product Income
1 1 ProductA 13377
1 2 ProductA 14069
1 1 ProductB 11426
1 2 ProductB 11750
2 1 ProductA 11122
2 2 ProductA 11202
2 1 ProductB 14712
2 2 ProductB 10169
",header=TRUE)
library(reshape2)
xx=dcast(data, Year + SemiYear ~ Product, value.var = "Income")
In the above code, "Year + SemiYear" are the 2 ID variables. We want "Product" variable to be moved to columns. The output is shown below -
Output
It is straightforward in spread( ) function of tidyr package.
mydt = spread(data, Product, Income)
If you want the final output to be reported at year level
It seems to be VERY EASY (just remove the additional ID variable 'SemiYear'). But it's a little tricky. See the explanation below -
dcast(data, Year ~ Product, value.var = "Income")
Warning : Aggregation function missing: defaulting to length
Year ProductA ProductB
1 2 2
2 2 2

The income values are incorrect in the above table.
We need to define the statistics to aggregate income at year level. Let's sum the income to report annual score.
dcast(data, Year ~ Product, fun.aggregate = sum, value.var = "Income")
Year ProductA ProductB
1 27446 23176
2 22324 24881

Convert Wide Format Data to Long Format

Suppose you have data containing information of species and their sepal length. The data of sepal length of species are in columns.
Wide to Long Format

Create Sample Data
mydata = read.table(text= "ID setosa versicolor virginica
                    1 5.1 NA NA
                    2 4.9 NA NA
                    3 NA 7 NA
                    4 NA 6.4 NA
                    5 NA NA 6.3
                    6 NA NA 5.8
                    ", header=TRUE)
The following program would reshape data from wide to long format.
library(reshape2)
x = colnames(mydata[,-1])
t = melt(mydata,id.vars = "ID",measure.vars = x , variable.name="Species", value.name="Sepal.Length",na.rm = TRUE)
How melt function works :
  1. id.vars - ID variables to keep in the final output.
  2. measure.vars - variables to be transformed
  3. variable.name - name of variable used to store measured variable names
  4. value.name - name of variable used to store values
tidyr : gather( ) function
This function transforms data from wide to long format. See the syntax parameters below :
  • data: Data frame
  • key: Name of new key column
  • value: Name of new value column
  • na.rm: na.rm=TRUE means removing rows from the output where the value is missing
gather(mydata, Species, Sepal.Length, x, na.rm = TRUE)
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

2 Responses to "Transpose in R"
  1. Hi Deepashu,
    Thanks for sharing such a wonderful article on Transposing data.
    However, I still have some doubt associated with, if I will compare it with SAS.
    The variable which you are saying that its an "ID" variable, by sas I can say that it a "BY" variable because that's something which we are keeping it on the left hand side and rest we are "ID" by sas.
    please reply to clear my small dubiousness.
    Thanks
    Rishabh

    ReplyDelete
  2. One of the best overall tutorials I have seen. I promise this is getting bookmarked for my reference.

    ReplyDelete
Next → ← Prev