In R, you can extract numeric columns from a data frame using various methods. Here are a few common ways to achieve this:
Let's create a sample data frame called mydata with 3 variables (name, age, height).
# Create a sample data frame
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  height = c(165.5, 180.0, 172.3)
)
How to Extract all Numeric Variables in R
In the data frame mydata, there are two numeric columns, "age" and "height". In a data frame with many variables, however, we often don't know the names of the numeric columns in advance, so we need to select them by type.
In base R, you can extract multiple numeric columns (variables) using the sapply function. sapply is part of the apply family of functions, which call a function on each element of an object (here, each column of the data frame) without an explicit loop.
In the dplyr package, the select_if function selects columns based on a condition. In this case, is.numeric selects only the numeric columns.
Base R
numeric_columns <- mydata[sapply(mydata, is.numeric)]
print(numeric_columns)
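To see what the base R approach is doing, it can help to look at the logical vector that sapply returns before it is used as a column index. The sketch below also uses vapply, a stricter member of the same family that declares the expected return type. Using vapply here is a suggestion, not part of the original example, and the names is_num and is_num_strict are illustrative.

```r
# Same sample data as above
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  height = c(165.5, 180.0, 172.3)
)

# One TRUE/FALSE per column, named after the columns
is_num <- sapply(mydata, is.numeric)
print(is_num)

# vapply() does the same job but raises an error if any result
# is not a single logical value
is_num_strict <- vapply(mydata, is.numeric, logical(1))

# Index the columns with the logical vector
numeric_columns <- mydata[is_num_strict]
print(names(numeric_columns))
```

vapply tends to be the safer choice inside functions or scripts, because sapply can silently change its return type when the input is empty or irregular.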
dplyr
library(dplyr)

# Select numeric columns using select_if()
numeric_columns <- mydata %>% select_if(is.numeric)
print(numeric_columns)
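In dplyr 1.0.0 and later, select_if() is marked as superseded in favour of select() combined with where(). select_if() still works, so the following is only a sketch of the newer spelling, assuming dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Same sample data as above
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  height = c(165.5, 180.0, 172.3)
)

# where() wraps a predicate; select() keeps the columns for which it is TRUE
numeric_columns <- mydata %>% select(where(is.numeric))
print(numeric_columns)
```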
Extracting Numeric Variables with No Missing Values in R
Let's say you want to keep numeric columns that have no missing values in R.
# Create a sample data frame
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Dave"),
  age = c(25, 30, 28, NA),
  height = c(165.5, 180.0, 172.3, 189),
  weight = c(NA, NA, 72, 74)
)
Base R
numeric_cols <- sapply(mydata, is.numeric)
numeric_no_missing <- colSums(is.na(mydata[numeric_cols])) == 0
numeric_no_missing_cols <- mydata[numeric_cols][numeric_no_missing]
Here's a step-by-step breakdown of the code:
numeric_cols <- sapply(mydata, is.numeric):
- This line creates a logical vector numeric_cols where each element corresponds to a column in the data frame mydata.
- It checks whether each column is numeric using the is.numeric() function.
numeric_no_missing <- colSums(is.na(mydata[numeric_cols])) == 0:
- This line calculates a logical vector numeric_no_missing which indicates, for each numeric column, whether it has no missing values (NA).
- mydata[numeric_cols] subsets the original data frame to include only the numeric columns.
- is.na(mydata[numeric_cols]) creates a logical matrix with TRUE where there are missing values and FALSE otherwise.
- colSums(is.na(mydata[numeric_cols])) calculates the count of missing values in each numeric column.
- colSums(is.na(mydata[numeric_cols])) == 0 checks whether the count of missing values in each column is equal to zero.
numeric_no_missing_cols <- mydata[numeric_cols][numeric_no_missing]:
- This line creates a new data frame numeric_no_missing_cols.
- mydata[numeric_cols] subsets the original data frame to include only the numeric columns.
- [numeric_no_missing] then further subsets these numeric columns using the numeric_no_missing logical vector.
- This subset operation effectively keeps only the columns that are both numeric and have no missing values.
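The breakdown above can be traced on the sample data by storing each intermediate result. The sketch below repeats the same steps and notes the values in comments. The variable na_counts and the one-pass variant at the end (using anyNA) are illustrative additions, not part of the original code.

```r
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Dave"),
  age = c(25, 30, 28, NA),
  height = c(165.5, 180.0, 172.3, 189),
  weight = c(NA, NA, 72, 74)
)

# Step 1: which columns are numeric?
numeric_cols <- sapply(mydata, is.numeric)
# name: FALSE, age: TRUE, height: TRUE, weight: TRUE

# Step 2: count the NAs in each numeric column
na_counts <- colSums(is.na(mydata[numeric_cols]))
# age: 1, height: 0, weight: 2

# Step 3: flag the numeric columns with zero NAs
numeric_no_missing <- na_counts == 0
# age: FALSE, height: TRUE, weight: FALSE

# Step 4: keep only those columns
numeric_no_missing_cols <- mydata[numeric_cols][numeric_no_missing]
print(names(numeric_no_missing_cols))  # "height"

# One-pass variant (illustrative): test both conditions per column
keep <- sapply(mydata, function(x) is.numeric(x) && !anyNA(x))
same_result <- mydata[keep]
```

Only height survives, because age has one missing value and weight has two.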
dplyr
If you want to keep columns that have no missing values, you can use the select() function with where() in dplyr. select(where(is.numeric)) selects only the numeric columns. select(where(~ all(!is.na(.)))) selects columns where all values are not missing (NA).
library(dplyr)

numeric_no_missing_cols <- mydata %>%
  select(where(is.numeric)) %>%
  select(where(~ all(!is.na(.))))
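The two select() calls can also be merged into a single predicate. This is a sketch of one possible variation, assuming dplyr 1.0.0 or later. It uses base R's anyNA() in place of all(!is.na(.)), which should give the same result for ordinary atomic columns.

```r
library(dplyr)

mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Dave"),
  age = c(25, 30, 28, NA),
  height = c(165.5, 180.0, 172.3, 189),
  weight = c(NA, NA, 72, 74)
)

# Keep a column only if it is numeric AND contains no NA
numeric_no_missing_cols <- mydata %>%
  select(where(~ is.numeric(.x) && !anyNA(.x)))

print(names(numeric_no_missing_cols))
```

Checking is.numeric first means non-numeric columns are rejected before they are scanned for missing values.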