How to Drop Columns from Pandas Dataframe

In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. Pandas is a Python package that provides many functions for data analysis.

Syntax to Drop Columns


import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)

In pandas, the drop() function removes column(s) from a dataframe. axis=1 tells pandas to apply the operation to columns instead of rows.
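Recent pandas versions also accept a columns keyword, which saves you from remembering the axis number. The sketch below compares the two forms on a small made-up dataframe; the column names and values are placeholders for illustration only.

```python
import pandas as pd

# Hypothetical dataframe, used only to illustrate the syntax
df = pd.DataFrame({
    'column_name1': [1, 2],
    'column_name2': [3, 4],
    'column_name3': [5, 6],
})

# Two equivalent ways to drop the same columns
by_axis = df.drop(['column_name1', 'column_name2'], axis=1)
by_keyword = df.drop(columns=['column_name1', 'column_name2'])

print(by_axis.equals(by_keyword))
print(list(by_keyword.columns))
```

Both calls return a new dataframe that contains only column_name3.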

Drop Columns from Pandas Dataframe in Python

Let's create a sample dataframe to explain examples in this tutorial. The code below creates 4 columns named 'A' through 'D'.


import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

The following code removes column 'A' from the dataframe named 'df' and stores the result in a new dataframe named 'newdf'.

newdf = df.drop(['A'], axis=1)

          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779


#Check columns in newdf after dropping column A
newdf.columns

# Output
# Index(['B', 'C', 'D'], dtype='object')
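Note that drop() returns a new dataframe and leaves the original untouched. The sketch below, which rebuilds the same sample dataframe, shows this and one common way to make the removal stick by reassigning the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

newdf = df.drop(['A'], axis=1)

# df still has all four columns; only newdf lacks 'A'
print(list(df.columns))
print(list(newdf.columns))

# To discard the column for good, assign the result back to the same name
original_columns = list(df.columns)
df = df.drop(['A'], axis=1)
print(list(df.columns))
```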

Remove Multiple Columns in Python

You can list all the columns you want to remove and pass that list to the drop() function.

Method I

df2 = df.drop(['B','C'], axis=1)

Method II

cols = ['B','C']
df2 = df.drop(cols, axis=1)
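If the list may contain a name that is not in the dataframe, drop() raises a KeyError by default. A minimal sketch of how the errors='ignore' option handles that case; the column 'Z' is a made-up name that does not exist in df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

# 'Z' is a hypothetical column name that is not present in df
cols = ['B', 'C', 'Z']

try:
    df.drop(cols, axis=1)
    raised = False
except KeyError:
    raised = True  # default behaviour: a missing label raises KeyError

# errors='ignore' drops the labels that exist and skips the rest
df2 = df.drop(cols, axis=1, errors='ignore')

print(raised)
print(list(df2.columns))
```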

Dropping Columns by Column Position

You can find the name of the first column with df.columns[0]. Indexing in Python starts from 0.

df.drop(df.columns[0], axis =1)

To drop multiple columns by position (for example, the first and third columns), specify the positions in a list such as [0,2].

cols = [0,2]
df.drop(df.columns[cols], axis =1)
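To see what the position-based approach does, the sketch below rebuilds the four-column sample dataframe from earlier in the tutorial, prints the names found at positions 0 and 2, and then drops them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

cols = [0, 2]

# Positions are translated into column names first
names = list(df.columns[cols])
print(names)

dropped = df.drop(df.columns[cols], axis=1)
print(list(dropped.columns))
```

Because the positions are converted to names before dropping, a dataframe with duplicate column names would lose every column that shares a selected name.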

Dropping Columns by Name Pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})


   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9

Dropping Columns Starting with 'X'

df.loc[:,~df.columns.str.contains('^X')]

How does it work?

^X is a regular expression that matches names beginning with the letter 'X'.
df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the column name matches, otherwise False.
The ~ sign negates the condition.
df.loc[ ] selects the columns for which the negated condition is True.

It can also be written as:

df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
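The steps above can be checked one at a time. This sketch prints the boolean mask, applies it, and also tries str.startswith, shown here as one possible alternative that avoids regex for a simple prefix match.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Boolean array: True for column names that start with 'X'
mask = df.columns.str.contains('^X')
print(mask)

# Keep only the columns where the mask is False
kept = df.loc[:, ~mask]
print(list(kept.columns))

# Same result with a plain prefix check instead of a regex
same = df.loc[:, ~df.columns.str.startswith('X')]
print(kept.equals(same))
```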

Other Examples

#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains string either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]
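As a quick check, the sketch below runs the three patterns against the same sample dataframe and prints the columns that remain in each case. The variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Name contains 'X' anywhere
no_x = df.loc[:, ~df.columns.str.contains('X')]
# Name contains 'X' or 'Y'
no_x_or_y = df.loc[:, ~df.columns.str.contains('X|Y')]
# Name ends with 'X'
no_trailing_x = df.loc[:, ~df.columns.str.contains('X$')]

print(list(no_x.columns))
print(list(no_x_or_y.columns))
print(list(no_trailing_x.columns))
```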

Dropping Columns with Missing Values Greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })

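The share of missing values in each column can be calculated as the mean of its missing-value indicators: df.isnull().mean() returns the fraction of missing values per column. Here column 'A' is 40% missing and column 'B' is 60% missing, so only 'B' is dropped.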

cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
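To make the threshold logic visible, the sketch below prints the fraction of missing values per column before dropping. The names missing_share and result are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, np.nan, 5, np.nan],
                   'B': [4, np.nan, np.nan, 5, np.nan]})

# Fraction of missing values in each column
missing_share = df.isnull().mean()
print(missing_share)

# Columns where more than half the values are missing
cols = df.columns[missing_share > 0.5]
result = df.drop(cols, axis=1)
print(list(result.columns))
```

The 0.5 cutoff can be changed to any fraction between 0 and 1 to make the rule stricter or looser.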

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
