R Function : Outlier Treatment

To deal with outliers, we can winsorize extreme values. Winsorizing at the 1st and 99th percentiles means that values below the 1st percentile are replaced by the value at the 1st percentile, and values above the 99th percentile are replaced by the value at the 99th percentile. Note that the function below caps at the 5th and 95th percentiles; change the probabilities passed to quantile() to use other cut-offs.
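As a quick illustration of that definition, here is a minimal sketch that winsorizes a single numeric vector at the 1st and 99th percentiles. The helper name `winsorize_vec`, its `lower` and `upper` arguments, and the sample vector are assumptions made for this example; they are not part of the original function.

```r
# Hypothetical helper (not from the original post): winsorize one numeric vector
winsorize_vec <- function(v, lower = 0.01, upper = 0.99) {
  # Cut-off values at the requested percentiles, ignoring missing values
  cuts <- quantile(v, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # Raise values below the lower cut-off, then lower values above the upper one
  pmin(pmax(v, cuts[1]), cuts[2])
}

# Made-up data: the integers 1 to 100 plus one extreme value
v <- c(1:100, 1000)
w <- winsorize_vec(v)

range(v)  # extremes before winsorizing
range(w)  # extremes after winsorizing
```

Missing values stay missing here, because `pmin()` and `pmax()` propagate `NA` by default.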
########################################################
# R Function for Outlier Treatment : Percentile Capping
########################################################

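pcap <- function(x){
  # Loop over the numeric columns only
  for (i in which(sapply(x, is.numeric))) {
    # 5th and 95th percentiles of the column, ignoring missing values
    quantiles <- quantile(x[[i]], c(.05, .95), na.rm = TRUE)
    # x[[i]] returns a plain vector for both data frames and tibbles
    x[[i]] <- ifelse(x[[i]] < quantiles[1], quantiles[1], x[[i]])
    x[[i]] <- ifelse(x[[i]] > quantiles[2], quantiles[2], x[[i]])
  }
  x
}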

# Replacing extreme values with percentiles
abcd = pcap(mydata)
  
# Checking percentile values of the 7th variable
quantile(abcd[[7]], c(0.25, 0.5, 0.95, 0.99, 1), na.rm = TRUE)
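
Since `mydata` is not defined in the post, the following self-contained sketch shows the same idea on made-up data. The function name `cap_percentiles`, its `probs` argument, and the `demo` data frame are hypothetical and only serve this illustration; the capping logic follows the approach of `pcap` above but uses `pmin()` and `pmax()` in place of `ifelse()`.

```r
# Hypothetical variant of the capping function with adjustable percentiles
cap_percentiles <- function(df, probs = c(0.05, 0.95)) {
  # Names of the numeric columns; other columns are left untouched
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  for (col in numeric_cols) {
    # Lower and upper cut-offs for this column, ignoring missing values
    cuts <- quantile(df[[col]], probs = probs, na.rm = TRUE, names = FALSE)
    # Clamp the column to the interval [cuts[1], cuts[2]]
    df[[col]] <- pmin(pmax(df[[col]], cuts[1]), cuts[2])
  }
  df
}

# Made-up data: a numeric column with one outlier and one missing value,
# plus a character column that should pass through unchanged
set.seed(42)
demo <- data.frame(
  income = c(rnorm(98, mean = 50, sd = 10), 500, NA),
  region = rep(c("north", "south"), 50),
  stringsAsFactors = FALSE
)

capped <- cap_percentiles(demo)

# Compare the distribution before and after capping
summary(demo$income)
summary(capped$income)
```

Passing `probs = c(0.01, 0.99)` would give the 1st and 99th percentile capping described at the top of the post.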
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.

2 Responses to "R Function : Outlier Treatment"
  1. Hey thanks for your post. I tried your code but it gave an error. I am trying to pass a data frame as an argument and winsorise each column. I copied your code and the following error was displayed:

    Error in check_names_df(j, x) : object 'i' not found

    Any help would be appreciated. Thanks

  2. Thanks for the code! It worked well and helped me sort out a great mess.
