In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. pandas is a Python package that provides many functions for data analysis.
Syntax to Drop Columns
import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)
In pandas, the drop( ) function is used to remove column(s) from a dataframe. axis=1 tells pandas to apply the function to columns instead of rows.
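As a minimal sketch of the syntax above, the example below uses a small made-up dataframe (the values are illustrative and not part of this tutorial's data). It also shows the `columns=` keyword, which pandas accepts as an alternative to passing `axis=1`.

```python
import pandas as pd

# Illustrative dataframe; the values are made up for this sketch
df = pd.DataFrame({'column_name1': [1, 2],
                   'column_name2': [3, 4],
                   'column_name3': [5, 6]})

# Drop two columns by label, telling pandas to look along the columns axis
new_df = df.drop(['column_name1', 'column_name2'], axis=1)

# Same operation written with the columns= keyword instead of axis=1
new_df_alt = df.drop(columns=['column_name1', 'column_name2'])

print(list(new_df.columns))       # columns left after the drop
print(new_df.equals(new_df_alt))  # both spellings give the same result
print(list(df.columns))           # df itself is not modified
```

Note that drop( ) returns a new dataframe, so the result has to be assigned to a name if you want to keep it.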
Let's create a sample dataframe to use in the examples in this tutorial. The code below creates a dataframe of random numbers with 6 rows and 4 columns named 'A' through 'D'.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
The following code removes column 'A' from the dataframe named 'df' and stores the result in a new dataframe named 'newdf'.
newdf = df.drop(['A'], axis=1)
B C D
0 -1.656038 1.655995 -1.413243
1 0.710933 -1.335381 0.832619
2 -0.411327 0.098119 0.768447
3 -0.093217 1.077528 0.196891
4 0.302687 0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779
#Check columns in newdf after dropping column A
newdf.columns
# Output
# Index(['B', 'C', 'D'], dtype='object')
You can specify all the columns you want to remove in a list and pass it to the drop( ) function.
df2 = df.drop(['B','C'], axis=1)
cols = ['B','C']
df2 = df.drop(cols, axis=1)
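One thing worth knowing when passing a list of names: by default drop( ) raises a KeyError if any name in the list is not a column of the dataframe. The sketch below assumes a list that contains a hypothetical name 'E' that does not exist in df, and uses the `errors='ignore'` option of drop( ) to skip it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

# 'E' is a hypothetical name that is not a column of df
cols = ['B', 'C', 'E']

# With the default errors='raise', a missing label raises KeyError
try:
    df.drop(cols, axis=1)
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and silently skips the rest
df2 = df.drop(cols, axis=1, errors='ignore')

print(raised)
print(list(df2.columns))
```

Whether silently skipping missing names is desirable depends on your data pipeline; the default error can be a useful check against typos in column names.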
You can find the name of the first column with df.columns[0]. Indexing in Python starts from 0.
df.drop(df.columns[0], axis=1)
To drop multiple columns by position (for example, the first and third columns), specify the positions in a list such as [0,2].
cols = [0,2]
df.drop(df.columns[cols], axis=1)
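The position-based approach above first translates positions into column names and then drops by name. As a rough sketch, the code below repeats that step and compares it with an alternative that selects the remaining positions with `iloc`. The `iloc` variant is not part of the original examples; it is shown here as one possible option, which can be safer when a dataframe has duplicate column names, since dropping by name removes every column carrying that name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

cols = [0, 2]                 # positions of the first and third columns
labels = df.columns[cols]     # the names found at those positions
df_pos = df.drop(labels, axis=1)

# Alternative sketch: keep every position that is not listed in cols
keep = [i for i in range(df.shape[1]) if i not in cols]
df_iloc = df.iloc[:, keep]

print(list(labels))
print(list(df_pos.columns))
print(df_pos.equals(df_iloc))
```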
df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})
X1 X_2 YX Y_1 Z
0 1 2 3 2 5
1 2 3 4 3 6
2 3 4 5 4 7
3 4 5 6 5 8
4 5 6 7 6 9
df.loc[:,~df.columns.str.contains('^X')]
^X is a regular expression that matches names beginning with the letter 'X'.
df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the condition is met, otherwise False.
The ~ sign negates the condition.
df.loc[ ] is used to select columns.
It can also be written as:
df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]
#Removing columns whose name contains string either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]
#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]
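To see what each pattern above actually removes, the sketch below applies all four to the same dataframe and prints the columns that remain. The helper `drop_matching` is a hypothetical name introduced only for this illustration; it simply wraps the `df.loc[:, ~mask]` idiom used above.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

def drop_matching(frame, pattern):
    """Return frame without the columns whose names match the regex pattern.

    Hypothetical helper for this sketch, not a pandas function.
    """
    mask = frame.columns.str.contains(pattern)  # True for names that match
    return frame.loc[:, ~mask]                  # keep the non-matching columns

remaining = {}
for pattern in ['^X', 'X', 'X|Y', 'X$']:
    remaining[pattern] = list(drop_matching(df, pattern).columns)
    print(pattern, remaining[pattern])
```

With this dataframe, 'X$' removes only 'YX', because it is the only name that ends with 'X', while 'X|Y' leaves just 'Z'.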
df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
'B':[4,np.nan,np.nan,5,np.nan]
})
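The share of missing values in each column can be calculated as the mean of df.isnull() for that column, since True counts as 1 and False as 0. The code below drops the columns in which more than 50% of the values are missing.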
cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
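The following sketch walks through the same steps one at a time, so the intermediate share of missing values per column is visible. The variable names `na_share` and `result` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, np.nan, 5, np.nan],
                   'B': [4, np.nan, np.nan, 5, np.nan]})

# Share of missing values per column: A has 2 of 5 missing, B has 3 of 5
na_share = df.isnull().mean()

# Names of the columns where more than half of the values are missing
cols = df.columns[na_share > 0.5]

# Drop those columns; only 'B' exceeds the 0.5 threshold here
result = df.drop(cols, axis=1)

print(na_share)
print(list(cols))
print(list(result.columns))
```

The 0.5 threshold is the one used in the example above; a stricter or looser cutoff can be substituted depending on how much missing data you are willing to tolerate.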