R : Keep / Drop Columns from Data Frame

This article explains how to keep or drop variables (columns) from a data frame. R offers multiple ways to select or drop columns.

Create a sample data frame

The following code creates a sample data frame that is used for demonstration.
mydata <- data.frame(a=letters[1:5], x=runif(5,10,50), y=sample(5), z=rnorm(5))
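Because runif(), sample() and rnorm() draw random numbers, the values in the sample data change on every run. A minimal sketch, assuming an arbitrary seed of 123 (not part of the original code), that makes the data reproducible and checks its structure:

```r
# Fix the random seed so the sample data is identical on every run.
# The seed value 123 is an arbitrary choice for illustration.
set.seed(123)
mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

# Inspect the column names, the dimensions and the column types
names(mydata)
dim(mydata)
str(mydata)
```

The outputs shown in this article came from an unseeded run, so the numbers will differ from a seeded one.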

R : Delete column by name

Method I :

The easiest way to drop columns is with the subset() function. In the code below, we are telling R to drop variables x and z. The '-' sign indicates dropping variables. Make sure the variable names are NOT specified in quotes when using the subset() function.
df = subset(mydata, select = -c(x,z) )
  a y
1 a 2
2 b 1
3 c 4
4 d 3
5 e 5

Method II :

In this method, we create a character vector named drop that stores the column names x and z. We then tell R to select all the variables except those named in the vector drop. The function names() returns all the column names and the '!' sign indicates negation.
drop <- c("x","z")
df = mydata[,!(names(mydata) %in% drop)]
It can also be written as: df = mydata[,!(names(mydata) %in% c("x","z"))]
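One thing to watch with the comma form of Method II: when only one column is left, R simplifies the result to a plain vector. A minimal sketch, assuming the same sample data frame (drop3, v and d are illustrative names), showing how drop = FALSE keeps the result as a data frame:

```r
mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

# Dropping three of the four columns leaves a single column,
# which the comma form simplifies to a plain vector
drop3 <- c("a", "x", "z")
v <- mydata[, !(names(mydata) %in% drop3)]
is.data.frame(v)    # FALSE

# drop = FALSE keeps the result as a one-column data frame
d <- mydata[, !(names(mydata) %in% drop3), drop = FALSE]
is.data.frame(d)    # TRUE
names(d)            # "y"
```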

R : Drop columns by column index numbers

It is also easy to remove variables by their position number. All you need to do is specify the column index numbers. In the following code, we are telling R to drop the variables in the first, third and fourth columns. The minus sign drops variables.
df <- mydata[ -c(1,3:4) ]
         x
1 13.58206
2 18.42049
3 39.31821
4 44.08534
5 41.53592
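When the positions are not known in advance, they can be looked up from the column names. A minimal sketch, assuming the same sample data frame (idx, none and safe are illustrative names). It also shows a pitfall: a negative index built from an empty lookup selects no columns at all.

```r
mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

# Translate column names into positions, then drop by position
idx <- match(c("a", "y", "z"), names(mydata))
idx                 # 1 3 4
df <- mydata[-idx]
names(df)           # "x"

# Pitfall: if no name matches, the index is empty and
# mydata[-integer(0)] returns a data frame with zero columns
none <- which(names(mydata) %in% "not_a_column")
ncol(mydata[-none]) # 0

# Guard against the empty case before applying the negative index
safe <- if (length(none) > 0) mydata[-none] else mydata
ncol(safe)          # 4
```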

R : Keep column by name

Method I :

In this section, we are retaining variables x and z.
keeps <- c("x","z")
df = mydata[keeps]
The above code is equivalent to df = mydata[c("x","z")]

Method II :

We can also keep variables with the subset() function. As before, the variable names are not quoted.
df = subset(mydata, select = c(x,z))
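Both methods select the same columns. A minimal sketch, assuming the same sample data frame (df1, df2, df3 and wanted are illustrative names), that compares them and shows one way to avoid an "undefined columns selected" error when a requested name does not exist:

```r
mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

# Method I and Method II return the same columns
df1 <- mydata[c("x", "z")]
df2 <- subset(mydata, select = c(x, z))
identical(names(df1), names(df2))   # TRUE

# mydata[c("x", "not_a_column")] would stop with an error.
# intersect() keeps only the requested names that actually exist.
wanted <- c("x", "z", "not_a_column")
df3 <- mydata[intersect(wanted, names(mydata))]
names(df3)                          # "x" "z"
```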

Keep columns by column index number

In this case, we are telling R to keep only the variables in the second and fourth positions.
df <- mydata[c(2,4)]

Keep or Delete columns with dplyr package

In R, the dplyr package is one of the most popular packages for data manipulation. It makes data wrangling easy. You can install and load the package by using the commands below:
install.packages("dplyr")
library(dplyr)

1. How to delete the first, third and fourth columns
mydata2 = select(mydata, -1, -3:-4)
2. How to delete columns a, x and y

This can be written in three ways -
mydata2 = select(mydata, -a, -x, -y)
mydata2 = select(mydata, -c(a, x, y))
mydata2 = select(mydata, -a:-y)

3. How to keep columns a, y and z
mydata2 = select(mydata, a, y:z)
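The select() calls above use unquoted column names. When the names are stored in a character vector, the all_of() helper can be used instead. A minimal sketch, assuming dplyr version 1.0.0 or later is installed (keep_cols, drop_cols, kept and dropped are illustrative names):

```r
library(dplyr)

mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

# Column names held as strings
keep_cols <- c("a", "z")
drop_cols <- c("x", "y")

# all_of() selects the columns named in a character vector
kept <- select(mydata, all_of(keep_cols))

# A leading minus drops those columns instead
dropped <- select(mydata, -all_of(drop_cols))

names(kept)      # "a" "z"
names(dropped)   # "a" "z"
```

all_of() stops with an error if any of the listed names is missing from the data frame, which helps catch typos early.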

Keep / Drop Columns by Name Pattern

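The code below creates data for 4 variables named INC_A, SAC_A, INC_B and ASD_A :
# The header row is restored from the column names referenced later in this section
mydata = read.table(text="
INC_A SAC_A INC_B ASD_A
2 1 5 12
3 4 2 13
", header=TRUE)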

Keeping columns whose name starts with "INC"
mydata1 = mydata[,grepl("^INC",names(mydata))]
The grepl() function searches for matches to a pattern. Here, the "^" anchors the match at the start of the string, so it looks for column names of the data frame mydata that begin with "INC". It returns INC_A and INC_B.

Dropping columns whose name starts with "INC"

The '!' sign indicates negation. It returns SAC_A and ASD_A.
mydata2 = mydata[,!grepl("^INC",names(mydata))]

Keeping columns whose names end with "_A"

The "$" anchors the match at the end of the string. It returns INC_A, SAC_A and ASD_A.
mydata12 = mydata[,grepl("_A$",names(mydata))]

Dropping columns whose names end with "_A"
mydata22 = mydata[,!grepl("_A$",names(mydata))]

Keeping columns whose names contain the letter "S"
mydata32 = mydata[,grepl("S",names(mydata))]

Dropping columns whose names contain the letter "S"
mydata33 = mydata[,!grepl("S",names(mydata))]
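For plain prefixes and suffixes, base R (3.3.0 and later) also provides startsWith() and endsWith(), which need no regular expression syntax. A minimal sketch, assuming the four column names INC_A, SAC_A, INC_B and ASD_A mentioned in this section and that column order (starts_inc, ends_a and has_s are illustrative names):

```r
# Assumed header row: the four column names referenced in this section
mydata <- read.table(text = "
INC_A SAC_A INC_B ASD_A
2 1 5 12
3 4 2 13
", header = TRUE)

# Names that begin with "INC"
starts_inc <- mydata[, startsWith(names(mydata), "INC"), drop = FALSE]

# Names that end with "_A"
ends_a <- mydata[, endsWith(names(mydata), "_A"), drop = FALSE]

# Names that contain the letter "S"; fixed = TRUE treats the pattern as plain text
has_s <- mydata[, grepl("S", names(mydata), fixed = TRUE), drop = FALSE]

names(starts_inc)   # "INC_A" "INC_B"
names(ends_a)       # "INC_A" "SAC_A" "ASD_A"
names(has_s)        # "SAC_A" "ASD_A"
```

Here drop = FALSE keeps each result a data frame even when only one column matches.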

R Function : Keep / Drop Column Function

The following program automates keeping or dropping columns from a data frame.
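KeepDrop = function(data=df,cols="var",newdata=df2,drop=1) {
  # Capture the name of the output data frame as a string
  t = deparse(substitute(newdata))

  # Split the space-separated column names into a character vector
  vars = scan(text=cols, what="", sep=" ", quiet=TRUE)

  # Drop Columns (drop=FALSE in the brackets keeps the result a data frame)
  if(drop == 1){
    newdata = data [ , !(names(data) %in% vars), drop=FALSE]}

  # Keep Columns
  else {
    newdata = data [ , names(data) %in% vars, drop=FALSE]}

  # Create the output data frame in the global environment
  assign(t, newdata, .GlobalEnv)
}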

How to use the above function

To keep variables 'a' and 'x', use the code below. Setting drop = 0 keeps the variables specified in the parameter "cols". The parameter "data" refers to the input data frame, "cols" refers to the variables you want to keep or remove, and "newdata" refers to the output data frame.
KeepDrop(data=mydata,cols="a x", newdata=dt, drop=0)

To drop variables, use the code below. Setting drop = 1 removes the variables specified in the parameter "cols".
KeepDrop(data=mydata,cols="a x", newdata=dt, drop=1)
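The function above writes its result into the global environment. An alternative is to return the data frame and let the caller assign it. A minimal sketch of that variant (keep_drop_cols is a hypothetical name, not part of the original article; it takes a character vector of column names and a logical drop flag):

```r
# Hypothetical variant of KeepDrop that returns its result
# instead of assigning it into the global environment
keep_drop_cols <- function(data, cols, drop = TRUE) {
  # Fail early if any requested column does not exist
  unknown <- setdiff(cols, names(data))
  if (length(unknown) > 0) {
    stop("Columns not found: ", paste(unknown, collapse = ", "))
  }
  selected <- names(data) %in% cols
  if (drop) selected <- !selected
  # drop = FALSE here keeps the result a data frame even with one column
  data[, selected, drop = FALSE]
}

mydata <- data.frame(a = letters[1:5], x = runif(5, 10, 50), y = sample(5), z = rnorm(5))

kept <- keep_drop_cols(mydata, c("a", "x"), drop = FALSE)
dropped <- keep_drop_cols(mydata, c("a", "x"), drop = TRUE)

names(kept)      # "a" "x"
names(dropped)   # "y" "z"
```

Returning the value avoids silently overwriting an existing object with the same name in the workspace.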


About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


7 Responses to "R : Keep / Drop Columns from Data Frame"

  1. df = df[,-c(21,6,5,7,15,2,3)]

    This works too. The numbers are the column index numbers.

  2. Best computer science site ever, Thank you from Quito Ecuador.

