How to Extract Character Variables from DataFrame in R

Deepanshu Bhalla

In R, you can extract character columns from a data frame using various methods.

Let's create a sample data frame called mydata with 3 variables (name, city, height).

# Create a sample data frame
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  city = c("Los Angeles", "New York", "Dallas"),
  height = c(165.5, 180.0, 172.3)
)
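
Before extracting anything, it can help to confirm how R stored each column. The sketch below rebuilds the same data frame and lists the class of every column. Passing stringsAsFactors = FALSE explicitly is an addition made for this sketch: in R versions before 4.0.0, data.frame() converted strings to factors by default, and is.character() returns FALSE for factors. The name column_classes is hypothetical.

```r
# Rebuild the sample data frame; stringsAsFactors = FALSE is set explicitly
# (an addition for this sketch) so strings stay character in any R version
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  city = c("Los Angeles", "New York", "Dallas"),
  height = c(165.5, 180.0, 172.3),
  stringsAsFactors = FALSE
)

# Class of each column, returned as a named character vector
column_classes <- sapply(mydata, class)
print(column_classes)
```

If name or city shows up as "factor" in your own data, converting with as.character() first is one option before applying the methods that follow.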

How to Extract ALL Character Variables in R

The data frame "mydata" has two character columns, "name" and "city". In a data frame with many variables, the names of the character columns are often not known in advance, so it is more practical to select them by type than by name. Two approaches are covered here: base R and dplyr.

Base R

character_columns <- mydata[sapply(mydata, is.character)]
print(character_columns)

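In base R, sapply applies is.character to every column of the data frame and returns a logical vector with one TRUE or FALSE per column. Using that vector inside mydata[ ] keeps only the columns marked TRUE. sapply belongs to the apply family of functions, which run a function over each element of an object without an explicit loop.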
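
To make the two steps visible, the sketch below stores the logical vector first and then uses it to subset the columns. The names is_char and is_char_strict are hypothetical, and vapply is shown only as a stricter alternative (it fails loudly if the function ever returns something other than a single logical value).

```r
# Sample data from above; stringsAsFactors = FALSE is an explicit safeguard
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  city = c("Los Angeles", "New York", "Dallas"),
  height = c(165.5, 180.0, 172.3),
  stringsAsFactors = FALSE
)

# Step 1: one TRUE/FALSE per column, named after the columns
is_char <- sapply(mydata, is.character)
print(is_char)

# Step 2: keep only the columns flagged TRUE
character_columns <- mydata[is_char]

# Stricter alternative: vapply requires each result to be a single logical
is_char_strict <- vapply(mydata, is.character, logical(1))
```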

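dplyr

In the dplyr package, the select_if function selects columns that satisfy a condition. Here, is.character keeps only the character columns. In dplyr 1.0.0 and later, select_if is superseded by select(where(is.character)), although it still works.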
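
A minimal sketch of the dplyr approach, assuming the dplyr package (version 1.0.0 or later for where()) is installed. The variable names are hypothetical; both calls should return the same two columns.

```r
library(dplyr)

# Sample data from above; stringsAsFactors = FALSE is an explicit safeguard
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  city = c("Los Angeles", "New York", "Dallas"),
  height = c(165.5, 180.0, 172.3),
  stringsAsFactors = FALSE
)

# select_if keeps the columns for which the predicate returns TRUE
character_columns <- mydata %>% select_if(is.character)

# Equivalent form using select() with where()
character_columns_where <- mydata %>% select(where(is.character))

print(character_columns)
```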

[Figure: Extract Character Columns from DataFrame in R]

Extract Character Variables with more than 2 Unique Categories in R

Let's redefine the "mydata" data frame so that it includes a character variable, "product", with only two unique values, for demonstration purposes.

mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Jon"),
  product = c("A", "A", "A", "B"),
  sales = c(21, 32, 45, 36)
)

Base R

In this code, we use the sapply function to iterate through each column of the "mydata" data frame. For each column, we check whether it is of character type (is.character(col)) and whether it has more than 2 unique values (length(unique(col)) > 2). Only "name", with 4 unique values, passes both checks; "product" has just 2 unique values and "sales" is numeric.

# Extract character columns with more than 2 unique categories
character_cols0 <- sapply(mydata, function(col) is.character(col) && length(unique(col)) > 2)

# Select columns based on the extracted character column indicators
character_cols <- mydata[character_cols0]
print(character_cols)

[Figure: Extract Character Columns with more than 2 Unique Categories in R]
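
If the threshold needs to vary, the same check can be wrapped in a small function. This is only a sketch: extract_char_cols and its min_unique argument are hypothetical names, not part of base R. Note that unique() counts NA as a value of its own, so a column with missing entries may have one more unique value than expected.

```r
# Sample data from above; stringsAsFactors = FALSE is an explicit safeguard
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Jon"),
  product = c("A", "A", "A", "B"),
  sales = c(21, 32, 45, 36),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep character columns that have more than
# `min_unique` distinct values
extract_char_cols <- function(df, min_unique = 2) {
  keep <- vapply(
    df,
    function(col) is.character(col) && length(unique(col)) > min_unique,
    logical(1)
  )
  df[keep]
}

# Default threshold: more than 2 unique values
result <- extract_char_cols(mydata)
print(names(result))

# Lower threshold: more than 1 unique value
result_loose <- extract_char_cols(mydata, min_unique = 1)
print(names(result_loose))
```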

Extracting Character Variables with No Missing Values in R

Let's say you want to keep character variables that have no missing values in R.

# Create a sample data frame
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Jon"),
  city = c("Los Angeles", "New York", "Dallas", NA),
  height = c(165.5, 180.0, 172.3, 181)
)

Base R

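# Flag the character columns
character_cols <- sapply(mydata, is.character)
# Among the character columns, flag those with zero missing values
character_no_missing <- colSums(is.na(mydata[character_cols])) == 0
# Keep character columns that have no missing values
character_no_missing_cols <- mydata[character_cols][character_no_missing]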

Let's see how this code works:

  1. sapply(mydata, is.character) returns a logical vector indicating whether each column in mydata is of character type.
  2. colSums(is.na(mydata[character_cols])) == 0 counts the missing values in each character column and flags the columns where that count is zero.
  3. mydata[character_cols][character_no_missing] selects the character columns without missing values, and the result is stored in a new data frame.

[Figure: Extracting Character Columns with No Missing Values in R]
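
The three steps can also be combined into a single pass over the columns. The sketch below is an alternative to the code above, not a replacement for it: it uses anyNA(), a base R function that returns TRUE if a vector contains any missing value. The name keep is hypothetical.

```r
# Sample data from above; stringsAsFactors = FALSE is an explicit safeguard
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Jon"),
  city = c("Los Angeles", "New York", "Dallas", NA),
  height = c(165.5, 180.0, 172.3, 181),
  stringsAsFactors = FALSE
)

# TRUE only for columns that are character AND contain no NA
keep <- vapply(
  mydata,
  function(col) is.character(col) && !anyNA(col),
  logical(1)
)

character_no_missing_cols <- mydata[keep]
print(character_no_missing_cols)
```

Here "city" is dropped because its last value is NA, and "height" is dropped because it is numeric, so only "name" remains.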