In R, you can extract numeric columns from a data frame using various methods. Here are a few common ways to achieve this:

Let's create a sample data frame called `mydata` with three variables (`name`, `age`, `height`).

    # Create a sample data frame
    mydata <- data.frame(
      name = c("Alice", "Bob", "Charlie"),
      age = c(25, 30, 28),
      height = c(165.5, 180.0, 172.3)
    )

## How to Extract all Numeric Variables in R

The data frame `mydata` has two numeric columns, `age` and `height`. In a data frame with many variables, however, we often don't know the names of the numeric columns in advance, so we need to select them by type rather than by name.

In **base R**, you can extract all numeric columns (variables) using the `sapply` function. `sapply` is part of the apply family of functions, which apply a function to each element of an object (here, each column of the data frame) without writing an explicit loop.

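In the **dplyr** package, the `select_if` function selects columns based on a condition. In this case, the condition is `is.numeric`, so only the numeric columns are selected. In dplyr 1.0.0 and later, `select_if` is superseded by `select(where(is.numeric))`, although it still works.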

Base R

    numeric_columns <- mydata[sapply(mydata, is.numeric)]
    print(numeric_columns)
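To see what `sapply` contributes, it can help to inspect the intermediate logical vector before it is used for subsetting. The following is a minimal sketch that rebuilds the sample data frame. The names `is_num` and `is_num_strict` are illustrative, and `vapply` is swapped in here as a stricter alternative that the original example does not use: it requires each result to be a single logical value.

```r
# Recreate the sample data frame so this block runs on its own
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 28),
  height = c(165.5, 180.0, 172.3)
)

# sapply() calls is.numeric() on every column and returns a named logical vector
is_num <- sapply(mydata, is.numeric)
print(is_num)  # name: FALSE, age: TRUE, height: TRUE

# The names of the numeric columns
numeric_names <- names(mydata)[is_num]
print(numeric_names)  # "age" "height"

# vapply() does the same job but enforces one logical value per column
is_num_strict <- vapply(mydata, is.numeric, logical(1))
print(identical(is_num, is_num_strict))
```

Using `vapply` is a design choice rather than a requirement: it fails loudly if the applied function ever returns something other than a single `TRUE` or `FALSE`, which can make it a safer default inside functions.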

dplyr

    library(dplyr)
    # Select numeric columns using select_if()
    numeric_columns <- mydata %>% select_if(is.numeric)
    print(numeric_columns)

## Extracting Numeric Variables with No Missing Values in R

Let's say you want to keep numeric columns that have no missing values in R.

    # Create a sample data frame
    mydata <- data.frame(
      name = c("Alice", "Bob", "Charlie", "Dave"),
      age = c(25, 30, 28, NA),
      height = c(165.5, 180.0, 172.3, 189),
      weight = c(NA, NA, 72, 74)
    )

Base R

    numeric_cols <- sapply(mydata, is.numeric)
    numeric_no_missing <- colSums(is.na(mydata[numeric_cols])) == 0
    numeric_no_missing_cols <- mydata[numeric_cols][numeric_no_missing]

Here's a step-by-step breakdown of the code:

- `numeric_cols <- sapply(mydata, is.numeric)`
  - This line creates a logical vector `numeric_cols` in which each element corresponds to a column of the data frame `mydata`.
  - It checks whether each column is numeric using the `is.numeric()` function.
- `numeric_no_missing <- colSums(is.na(mydata[numeric_cols])) == 0`
  - This line calculates a logical vector `numeric_no_missing`, which indicates for each numeric column whether it has no missing values (`NA`).
  - `mydata[numeric_cols]` subsets the original data frame to include only the numeric columns.
  - `is.na(mydata[numeric_cols])` creates a logical matrix with `TRUE` where there are missing values and `FALSE` otherwise.
  - `colSums(is.na(mydata[numeric_cols]))` calculates the count of missing values in each numeric column.
  - `colSums(is.na(mydata[numeric_cols])) == 0` checks whether the count of missing values in each column is equal to zero.
- `numeric_no_missing_cols <- mydata[numeric_cols][numeric_no_missing]`
  - This line creates a new data frame `numeric_no_missing_cols`.
  - `mydata[numeric_cols]` subsets the original data frame to include only the numeric columns.
  - `[numeric_no_missing]` then further subsets these numeric columns using the `numeric_no_missing` logical vector.
  - This subset operation effectively keeps only the columns that are both numeric and have no missing values.
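The steps in this breakdown can also be wrapped in a small reusable function. The following is a sketch under stated assumptions, not part of the original code: the helper name `numeric_complete_cols` is hypothetical, and it uses `vapply` with `anyNA` instead of `colSums(is.na(...))` to test each column in a single pass.

```r
# Recreate the sample data frame with missing values so this block runs on its own
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Dave"),
  age = c(25, 30, 28, NA),
  height = c(165.5, 180.0, 172.3, 189),
  weight = c(NA, NA, 72, 74)
)

# Hypothetical helper (illustrative name): keeps the columns of `df`
# that are numeric AND contain no NA values
numeric_complete_cols <- function(df) {
  keep <- vapply(
    df,
    function(col) is.numeric(col) && !anyNA(col),
    logical(1)
  )
  df[keep]
}

result <- numeric_complete_cols(mydata)
print(names(result))  # "height"
```

Only `height` survives here, because `age` and `weight` each contain at least one `NA` and `name` is not numeric.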

dplyr

To keep only the numeric columns that have no missing values, you can use the **select()** function with **where()** in dplyr. **select(where(is.numeric))** selects only the numeric columns, and **select(where(~ all(!is.na(.))))** then keeps the columns in which no value is missing (NA).

    library(dplyr)
    numeric_no_missing_cols <- mydata %>%
      select(where(is.numeric)) %>%
      select(where(~ all(!is.na(.))))
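The two `select()` calls can also be combined into a single predicate. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where `where()` is available inside `select()`). The combined lambda is an illustrative variation rather than the form used above.

```r
library(dplyr)

# Recreate the sample data frame with missing values so this block runs on its own
mydata <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Dave"),
  age = c(25, 30, 28, NA),
  height = c(165.5, 180.0, 172.3, 189),
  weight = c(NA, NA, 72, 74)
)

# One predicate: TRUE only for columns that are numeric and have no NA.
# && short-circuits, so anyNA() is not evaluated for non-numeric columns.
numeric_no_missing_cols <- mydata %>%
  select(where(~ is.numeric(.x) && !anyNA(.x)))

print(names(numeric_no_missing_cols))  # "height"
```

Chaining two `select()` calls, as in the original, keeps each condition easy to read. The single predicate is more compact and may suit cases where the same condition is reused.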
