Transpose Data in R

In R, we can transpose our data very easily. In R, there are many packages such as tidyr and reshape2 that helps to make it easy. In this article, i would use 'reshape2' package. This package was written by the most popular R expert Hadley Wickham.

Transpose Data with R
Let's go through some examples -

Sample Data

The code below would create a sample data that would be used for demonstration.
data <- read.table(text="X Y    Z
                   ID12   2012-06    566
                   ID1    2012-06  10239
                   ID6    2012-06    524
                   ID12   2012-07   2360
                   ID1    2012-07   13853
                   ID6    2012-07    2352
                   ID12   2012-08   3950
                   ID1    2012-08   14738
                   ID6    2012-08   4104",header=TRUE)

Convert Long to Wide Format

Suppose you have data containing three variables such as X, Y and Z. The variable 'X' contains IDs and the variable 'Y' contains dates and the variable 'Z' contains income. The data is structured in a long format and you need to convert it to wide format so that the dates moved to column names and the income information comes under these dates. The snapshot of data and desired output is shown below -
R : Convert Long to Wide Format

In reshape2 package, there are two function for transforming long-format data to wide format. The functions are "dcast" and "acast". The only difference between these two functions are as follows :
  1. dcast function returns a data frame as output.
  2. acast function returns a vector, matrix or array as output.

Install reshape2 package if not installed already
if (!require(reshape2)){
install.packages('reshape2')
library(reshape2)
}
R Code : Transform Long to Wide Format
mydt = dcast(data,X~Y,value.var = "Z")
How dcast function works
  1. The first parameter of dcast function refers to a data frame
  2. The left hand side of the casting function refers to ID variables.
  3. The right hand side refers to the variable to move to column name
  4. The value.var would contain a name of the variable that stores values.

Example 2 : More than 1 ID Variable

Let's see another example wherein we have more than 1 ID variable. It contains information about Income generated from 2 products - Product A and B reported semi-annually.
Example of Transforming Data
library(reshape2)
xx=dcast(data, Year + SemiYear ~ Product,  value.var = "Income")
In the above code, "Year + SemiYear" are the 2 ID variables. We want "Product" variable to be moved to columns.

The output is shown below -
Output

If you want the final output to be reported at year level

It seems to be VERY EASY (just remove the additional ID variable 'SemiYear'). But it's a little tricky. See the explanation below -
dcast(data, Year ~ Product, value.var = "Income")
Warning : Aggregation function missing: defaulting to length
Year ProductA ProductB
1 2 2
2 2 2

The income values are incorrect in the above table.

We need to define the statistics to aggregate income at year level. Let's sum the income to report annual score.
dcast(data, Year ~ Product, fun.aggregate = sum, value.var = "Income")
Year ProductA ProductB
1 27446 23176
2 22324 24881


Convert Wide Format Data to Long Format

Suppose you have data containing information of species and their sepal length. The data of sepal length of species are in columns.
Wide to Long Format

Create Sample Data
mydata = read.table(text= "ID setosa versicolor virginica
                    1 5.1 NA NA
                    2 4.9 NA NA
                    3 NA 7 NA
                    4 NA 6.4 NA
                    5 NA NA 6.3
                    6 NA NA 5.8
                    ", header=TRUE)
The following program would reshape data from wide to long format. 
library(reshape2)
x = colnames(mydata[,-1])
t = melt(mydata,id.vars = "ID",measure.vars = x , variable.name="Species", value.name="Sepal.Length",na.rm = TRUE)
 How melt function works :

  1. id.vars - ID variables to keep in the final output.
  2. measure.vars - variables to be transformed
  3. variable.name - name of variable used to store measured variable names
  4. value.name - name of variable used to store values

R Tutorials : 75 Free R Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

1 Response to "Transpose Data in R"

  1. Hi Deepashu,
    Thanks for sharing such a wonderful article on Transposing data.
    However, I still have some doubt associated with, if I will compare it with SAS.
    The variable which you are saying that its an "ID" variable, by sas I can say that it a "BY" variable because that's something which we are keeping it on the left hand side and rest we are "ID" by sas.
    please reply to clear my small dubiousness.
    Thanks
    Rishabh

    ReplyDelete

Next → ← Prev