In R, we can transpose data very easily. There are many R packages such as
tidyr
and
reshape2
that helps to reshape data from long to wide format or vice versa. Like many of us, I was also searching transpose function in dplyr package but didn't get any success. Then I realized there is no such function in dplyr to transpose data. Instead there are separate packages built for this function. Both these packages dplyr and tidyr are a part of
tidyverse
which is a collection of R packages designed to make data manipulation and exploration cakewalk for R users.
In this article, i would use these two packages with several examples. These package were written by the most popular R expert
Hadley Wickham.
 |
Transpose Data with R |
Let's go through some examples -
Sample Data
The code below would create a sample data that would be used for demonstration.
data <- read.table(text="X Y Z
ID12 2012-06 566
ID1 2012-06 10239
ID6 2012-06 524
ID12 2012-07 2360
ID1 2012-07 13853
ID6 2012-07 2352
ID12 2012-08 3950
ID1 2012-08 14738
ID6 2012-08 4104",header=TRUE)
Convert Long to Wide Format
Suppose you have data containing three variables such as X, Y and Z. The variable 'X' contains IDs and the variable 'Y' contains dates and the variable 'Z' contains income. The data is structured in a long format and you need to convert it to wide format so that the dates moved to column names and the income information comes under these dates. The snapshot of data and desired output is shown below -
 |
R : Convert Long to Wide Format |
In
reshape2 package, there are two function for transforming long-format data to wide format. The functions are
"dcast" and
"acast". The only difference between these two functions are as follows :
- dcast function returns a data frame as output.
- acast function returns a vector, matrix or array as output.
Install reshape2 package if not installed already
if (!require(reshape2)){
install.packages('reshape2')
library(reshape2)
}
R Code : Transform Long to Wide Format
mydt = dcast(data,X~Y,value.var = "Z")
 |
How dcast function works |
- The first parameter of dcast function refers to a data frame
- The left hand side of the casting function refers to ID variables.
- The right hand side refers to the variable to move to column name
- The value.var would contain a name of the variable that stores values.
tidyr : Spread( ) Function
In tidyr package, spread( )
function spreads data from Long to Wide Format. See the common parameters of this function below:
data
: data frame name.
key
: Column name which will be used for column headings.
value
: Column name which will be used for filling the rows.
fill
: If used, missing values will be replaced with this value.
library(tidyr)
mydt = spread(data, Y, Z)
Example 2 : More than 1 ID Variable
Let's see another example wherein we have more than 1 ID variable. It contains information about Income generated from 2 products - Product A and B reported semi-annually.
 |
Example of Transforming Data |
data <- read.table(text="Year SemiYear Product Income
1 1 ProductA 13377
1 2 ProductA 14069
1 1 ProductB 11426
1 2 ProductB 11750
2 1 ProductA 11122
2 2 ProductA 11202
2 1 ProductB 14712
2 2 ProductB 10169
",header=TRUE)
library(reshape2)
xx=dcast(data, Year + SemiYear ~ Product, value.var = "Income")
In the above code, "Year + SemiYear" are the 2 ID variables. We want "Product" variable to be moved to columns.
The output is shown below -
 |
Output |
It is straightforward in spread( ) function of tidyr package.
mydt = spread(data, Product, Income)
If you want the final output to be reported at year level
It seems to be
VERY EASY (just remove the additional ID variable 'SemiYear'). But it's a little tricky. See the explanation below -
dcast(data, Year ~ Product, value.var = "Income")
Warning : Aggregation function missing: defaulting to length
Year |
ProductA |
ProductB |
1 |
2 |
2 |
2 |
2 |
2 |
The income values are incorrect in the above table.
We need to define the statistics to aggregate income at year level. Let's sum the income to report annual score.
dcast(data, Year ~ Product, fun.aggregate = sum, value.var = "Income")
Year |
ProductA |
ProductB |
1 |
27446 |
23176 |
2 |
22324 |
24881 |
Convert Wide Format Data to Long Format
Suppose you have data containing information of species and their sepal length. The data of sepal length of species are in columns.
 |
Wide to Long Format |
Create Sample Data
mydata = read.table(text= "ID setosa versicolor virginica
1 5.1 NA NA
2 4.9 NA NA
3 NA 7 NA
4 NA 6.4 NA
5 NA NA 6.3
6 NA NA 5.8
", header=TRUE)
The following program would reshape data from wide to long format.
library(reshape2)
x = colnames(mydata[,-1])
t = melt(mydata,id.vars = "ID",measure.vars = x , variable.name="Species", value.name="Sepal.Length",na.rm = TRUE)
How melt function works :
- id.vars - ID variables to keep in the final output.
- measure.vars - variables to be transformed
- variable.name - name of variable used to store measured variable names
- value.name - name of variable used to store values
tidyr : gather( ) function
This function transforms data from wide to long format. See the syntax parameters below :
data
: Data frame
key
: Name of new key column
value
: Name of new value column
na.rm
: na.rm=TRUE means removing rows from the output where the value is missing
gather(mydata, Species, Sepal.Length, x, na.rm = TRUE)
About Author:
Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
Hi Deepashu,
ReplyDeleteThanks for sharing such a wonderful article on Transposing data.
However, I still have some doubt associated with, if I will compare it with SAS.
The variable which you are saying that its an "ID" variable, by sas I can say that it a "BY" variable because that's something which we are keeping it on the left hand side and rest we are "ID" by sas.
please reply to clear my small dubiousness.
Thanks
Rishabh
One of the best overall tutorials I have seen. I promise this is getting bookmarked for my reference.
ReplyDelete