How to Drop Columns from Pandas Dataframe

Deepanshu Bhalla

In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. pandas is a Python package that provides many functions for data analysis.

Syntax to Drop Columns


import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)

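In pandas, the drop() function removes one or more columns from a dataframe. axis=1 tells pandas to apply the operation to columns instead of rows. drop() returns a new dataframe and leaves the original unchanged, so assign the result to a variable if you want to keep it.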
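
As a minimal sketch of the syntax above, the example below builds a small hypothetical dataframe (the column names name, age and city are illustrative and not part of this tutorial's data) and shows that drop() returns a new dataframe while the original keeps all its columns. It also uses the columns= keyword, which pandas accepts as an equivalent to passing labels with axis=1.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_demo = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [31, 45],
    "city": ["Pune", "Oslo"],
})

# Drop two columns by label using axis=1
by_axis = df_demo.drop(["age", "city"], axis=1)

# Equivalent call using the columns= keyword
by_keyword = df_demo.drop(columns=["age", "city"])

print(list(by_axis.columns))     # columns left after the drop
print(list(df_demo.columns))     # the original dataframe is unchanged
```

Which form to use is largely a matter of taste; columns= can read more clearly because the axis is implied by the keyword.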

Drop Columns from Pandas Dataframe in Python

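Let's create a sample dataframe to use in the examples in this tutorial. The code below creates a dataframe with 6 rows and 4 columns named 'A' through 'D'. Because the values are random, your numbers will differ from the output shown here.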


import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

The following code removes column 'A' from the dataframe named 'df' and stores the result in a new dataframe named 'newdf'.

newdf = df.drop(['A'], axis=1)
          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779

#Check columns in newdf after dropping column A
newdf.columns

# Output
# Index(['B', 'C', 'D'], dtype='object')
Remove Multiple Columns in Python

You can list all the columns you want to remove and pass that list to the drop() function.

Method I
df2 = df.drop(['B','C'], axis=1)
Method II
cols = ['B','C']
df2 = df.drop(cols, axis=1)
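
One thing to watch when passing a list: by default drop() raises a KeyError if any name in the list is not a column of the dataframe. The sketch below, which uses a small hypothetical dataframe and a deliberately missing name 'Z', shows the default behaviour and the errors='ignore' option that skips names that are not found.

```python
import pandas as pd

# Hypothetical dataframe with four columns, used only for illustration
df_demo = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

cols = ["B", "C", "Z"]  # "Z" is not a column of df_demo

# Default behaviour: a missing label raises KeyError
try:
    df_demo.drop(cols, axis=1)
    raised = False
except KeyError:
    raised = True

# errors="ignore" drops the labels that exist and skips the rest
safe = df_demo.drop(cols, axis=1, errors="ignore")

print(raised)
print(list(safe.columns))
```

Silently ignoring missing names can hide typos, so this option is probably best kept for cases where the column list is known to be a superset.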
Dropping Columns by Column Position

You can get the name of the first column with df.columns[0]. Indexing in Python starts at 0.

df.drop(df.columns[0], axis=1)

To drop multiple columns by position (for example the first and third columns), put the positions in a list such as [0,2].

cols = [0,2]
df.drop(df.columns[cols], axis=1)
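
The position-based approach can be checked on a small deterministic dataframe. The sketch below uses hypothetical data, looks up the labels at positions 0 and 2, drops them, and then shows an alternative that selects the remaining positions with iloc. The iloc variant is an addition for comparison, not part of the method described above.

```python
import pandas as pd

# Hypothetical dataframe with four columns, used only for illustration
df_demo = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

positions = [0, 2]

# Translate positions into labels, then drop by label
labels = df_demo.columns[positions]
dropped = df_demo.drop(labels, axis=1)

# Alternative: keep every position that is not in the list
keep = [i for i in range(df_demo.shape[1]) if i not in positions]
kept = df_demo.iloc[:, keep]

print(list(labels))
print(list(dropped.columns))
```

Because drop() works on labels, a dataframe with duplicate column names would lose every column sharing a dropped name; the iloc variant works purely on positions and avoids that.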
Dropping Columns by Name Pattern
df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})

   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9
Dropping Columns Starting with 'X'
df.loc[:,~df.columns.str.contains('^X')]
How does it work?
  1. ^X is a regular expression that matches column names beginning with the letter 'X'.
  2. df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the name matches the pattern, otherwise False.
  3. The ~ sign negates the condition, so True marks the columns to keep.
  4. df.loc[ ] selects the columns for which the negated condition is True.

It can also be written as:

df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
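
The steps above can be traced on the sample dataframe. The sketch below rebuilds it under the name df_demo (a name chosen here for illustration), stores the boolean mask, and confirms that the loc and drop versions give the same result. It also compares the regex with str.startswith('X'), a non-regex option that pandas provides for the same prefix test.

```python
import pandas as pd

# Same data as the sample dataframe in this section
df_demo = pd.DataFrame({
    "X1": range(1, 6),
    "X_2": range(2, 7),
    "YX": range(3, 8),
    "Y_1": range(2, 7),
    "Z": range(5, 10),
})

# Boolean mask: True for column names that begin with 'X'
mask = df_demo.columns.str.contains("^X")

# Keep the columns where the mask is False
via_loc = df_demo.loc[:, ~mask]

# Same result by dropping the matched labels
via_drop = df_demo.drop(df_demo.columns[mask], axis=1)

# Prefix test without a regular expression
starts = df_demo.columns.str.startswith("X")

print(mask.tolist())
print(list(via_loc.columns))
```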
Other Examples
#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]
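
To see which columns survive each of these patterns, the sketch below applies them to the sample dataframe. The helper drop_matching is a hypothetical convenience function written for this example, not a pandas function.

```python
import pandas as pd

# Same data as the sample dataframe in this section
df_demo = pd.DataFrame({
    "X1": range(1, 6),
    "X_2": range(2, 7),
    "YX": range(3, 8),
    "Y_1": range(2, 7),
    "Z": range(5, 10),
})


def drop_matching(frame, pattern):
    """Return frame without the columns whose names match the regex pattern."""
    return frame.loc[:, ~frame.columns.str.contains(pattern)]


no_x = list(drop_matching(df_demo, "X").columns)           # name contains 'X'
no_x_or_y = list(drop_matching(df_demo, "X|Y").columns)    # contains 'X' or 'Y'
no_trailing_x = list(drop_matching(df_demo, "X$").columns) # name ends with 'X'

print(no_x)
print(no_x_or_y)
print(no_trailing_x)
```

Note that 'YX' is removed by all three patterns, since it contains 'X', contains 'Y', and ends with 'X'.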
Dropping Columns with Missing Values Greater than 50%
df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })

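The share of missing values in each column can be calculated as the mean of the missing-value indicator: df.isnull() returns True for each missing cell, and the column mean of those booleans is the fraction missing. Here column 'A' is 40% missing and column 'B' is 60% missing, so only 'B' is dropped.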

cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
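
The calculation can be checked step by step. The sketch below rebuilds the same data under the name df_demo (chosen here for illustration), stores the missing share per column, and drops the columns above 50%. It then shows a possible alternative using dropna with thresh, which takes the minimum number of non-missing values a column must have to be kept; converting the 50% rule into that count with math.ceil is a choice made for this example.

```python
import math

import numpy as np
import pandas as pd

# Same data as the dataframe in this section
df_demo = pd.DataFrame({
    "A": [1, 3, np.nan, 5, np.nan],
    "B": [4, np.nan, np.nan, 5, np.nan],
})

# Fraction of missing values per column
missing_share = df_demo.isnull().mean()

# Drop the columns where more than half of the values are missing
cols = df_demo.columns[missing_share > 0.5]
result = df_demo.drop(cols, axis=1)

# Alternative: keep columns with at least half of their values present
min_non_na = math.ceil(0.5 * len(df_demo))
alt = df_demo.dropna(axis=1, thresh=min_non_na)

print(missing_share)
print(list(result.columns))
```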
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

16 Responses to "How to Drop Columns from Pandas Dataframe"
  1. Careful of the API future of `inplace` https://github.com/pandas-dev/pandas/issues/16529
    1. Thanks for highlighting the same. I added it in the post to discourage the use of it. Cheers!
  2. Thanks for a nice article. Now, a request. If you find time, can you also write on operations on rows.
  3. Sure. Thank you for stopping by my blog!
  4. can u tell me how to apply only one feature on your dataset with code.i hope u you will response as soon as possible.
  5. Really easy to understand sir !!!
  6. great tips, very well presented and easy to understand ! thanks !
  7. great; this really helped me a lot as a beginner.
  8. Hi I have removed missing values from dataset permanently by using inplace=True. i want to restore back to original data frame data,how do i do that