
**Data Types**

Unlike SAS and SPSS, R has several different data types (structures) including vectors, factors, data frames, matrices, arrays, and lists. The data frame is most like a dataset in SAS.

**1. Vectors**

A vector is an object that contains a set of values called its elements.

**Numeric vector**

x <- c(1,2,3,4,5,6)

*The assignment operator <- is equivalent to the "=" sign.*

**Character vector**

State <- c("DL", "MU", "NY", "DL", "NY", "MU")

To calculate the frequency of each value in the State vector, you can use the **table** function.
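As a quick illustration of the frequency count described above, here is a minimal sketch using the State vector from this section. The name `freq` is only an illustrative choice.

```r
# Character vector from the text above
State <- c("DL", "MU", "NY", "DL", "NY", "MU")

# table() counts how often each distinct value occurs
freq <- table(State)
print(freq)

# Individual counts can be looked up by name
freq[["DL"]]   # 2
```

Each of the three states appears twice in this vector, so every count is 2.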

To calculate the mean of a vector, you can use the **mean** function. If the vector contains an NA (not available) value, the mean function returns NA.

To calculate the mean of a vector **excluding NA values**, include the **na.rm = TRUE** parameter in the mean function.
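The effect of a missing value on the mean can be sketched as follows. The vector `y` is a hypothetical example with one NA; it is not taken from the original post.

```r
# Hypothetical vector with one missing value (an assumption for illustration)
y <- c(1, 2, 3, 4, 5, NA)

m_with_na <- mean(y)                 # NA, because one element is missing
m_no_na   <- mean(y, na.rm = TRUE)   # 3, the mean of 1, 2, 3, 4, 5

print(m_with_na)
print(m_no_na)
```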

**Convert a column "x" to numeric**

data$x = as.numeric(data$x)
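Here is a small sketch of that conversion on a made-up data frame (`data` and its values are assumptions for illustration). It also shows a common pitfall: calling as.numeric directly on a factor returns the internal level codes, so a factor is usually converted through as.character first.

```r
# Hypothetical data frame: column x holds numbers stored as text
data <- data.frame(x = c("10", "20", "30"), stringsAsFactors = FALSE)
class(data$x)                  # "character"

data$x <- as.numeric(data$x)
class(data$x)                  # "numeric"

# If the column is a factor, as.numeric() gives the level codes
f <- factor(c("10", "20", "30"))
codes  <- as.numeric(f)                 # 1 2 3
values <- as.numeric(as.character(f))   # 10 20 30
```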

**2. Factors**

R has a special data structure to store *categorical variables*. Making a variable a factor tells R that it is nominal or ordinal. The factor function has three main parameters:

- Vector Name
- Values (Optional)
- Value labels (Optional)
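The three parameters listed above appear to correspond to the `x`, `levels` and `labels` arguments of factor(). A minimal sketch, assuming a hypothetical coded variable `gender` where 1 means male and 2 means female:

```r
# Hypothetical coded variable (an assumption for illustration)
gender <- c(1, 2, 2, 1, 2)

g <- factor(gender,
            levels = c(1, 2),               # the values
            labels = c("Male", "Female"))   # the value labels

levels(g)   # "Male" "Female"
table(g)    # Male 2, Female 3
```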

**Convert a column "x" to factor**

data$x = as.factor(data$x)

**3. Matrices**

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.

The **cbind** function joins columns together into a matrix.
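A minimal sketch of cbind in use, with two hypothetical vectors `a` and `b` (not from the original post):

```r
# Two hypothetical numeric vectors of equal length
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Join them column-wise into a 3 x 2 matrix
m <- cbind(a, b)
print(m)
#      a b
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
```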

When a matrix is printed, the numbers in brackets on the left side are the row numbers. The form [1, ] means row number one, and the blank after the comma means that R has displayed all the columns.

To see the dimensions of a matrix, you can use the **dim** function.

To see the correlations between the columns of a matrix, you can use the **cor** function.
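Both functions can be tried on a small made-up matrix. The columns `a`, `b` and `c` below are assumptions chosen so that the correlations are easy to read: `b` is twice `a`, and `c` decreases as `a` increases.

```r
# Hypothetical 4 x 3 numeric matrix
m <- cbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 8),
           c = c(4, 3, 2, 1))

d <- dim(m)   # 4 3 (rows, columns)
r <- cor(m)   # 3 x 3 correlation matrix of the columns

print(d)
print(round(r, 2))
```

In this example the correlation between a and b is 1, and between a and c it is -1.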

You can use subscripts to identify rows or columns.
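A short sketch of subscripting, again with a hypothetical matrix `m`. Rows go before the comma and columns after it.

```r
# Hypothetical 3 x 2 matrix
m <- cbind(a = c(1, 2, 3), b = c(4, 5, 6))

row2  <- m[2, ]     # second row, all columns: a = 2, b = 5
col_b <- m[, "b"]   # column b, all rows: 4 5 6
cell  <- m[3, 1]    # row 3, column 1: 3
```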

**4. Arrays**

Arrays are similar to matrices but can have more than two dimensions.
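As an illustration, the sketch below builds a three-dimensional array with the array function. The name `arr` and the dimensions are assumptions for the example.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)                 # 2 3 4
elem  <- arr[1, 2, 3]    # row 1, column 2, layer 3: 15
slice <- arr[, , 1]      # the first layer, a 2 x 3 matrix
```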

**5. Data Frames**

A data frame is similar to SAS and SPSS datasets. It contains variables and records.

It is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).

The **data.frame** function is used to combine variables (vectors and factors) into a data frame.
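For example, the two vectors from the Vectors section can be combined like this. The name `df` is an illustrative choice, and State is turned into a factor here only to show columns of different types.

```r
x <- c(1, 2, 3, 4, 5, 6)
State <- c("DL", "MU", "NY", "DL", "NY", "MU")

# One numeric column and one factor column
df <- data.frame(x = x, State = factor(State))

str(df)             # structure: 6 obs. of 2 variables
nrow(df)            # 6 records
ncol(df)            # 2 variables
sapply(df, class)   # class of each column
```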

**6. Lists**

A list allows you to store a variety of objects.

You can use subscripts to select the specific component of the list.
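A minimal sketch of creating a list and selecting its components. The list `mylist` and its contents are hypothetical.

```r
# A list can mix objects of different types and lengths
mylist <- list(numbers = c(1, 2, 3), name = "R", flag = TRUE)

nums <- mylist[["numbers"]]   # the component itself (a numeric vector)
nm   <- mylist$name           # same idea, selected by name: "R"
sub  <- mylist[1]             # single brackets return a list of length 1
```

Double brackets (or $) return the component itself, while single brackets return a smaller list.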

**How to know data type of a column**

1. **'class'** is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification.

2. **'mode'** is a mutually exclusive classification of objects according to their basic structure. The 'atomic' modes are numeric, complex, character and logical.

> x <- 1:16

> x <- factor(x)

> class(x)

[1] "factor"

> mode(x)

[1] "numeric"
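The same comparison can be made for a whole data frame and for each of its columns. This is a small sketch on a hypothetical data frame `df`; sapply with class is one common way to list the type of every column.

```r
# Hypothetical data frame with a numeric and a factor column
df <- data.frame(x = c(1.5, 2.5), s = c("a", "b"), stringsAsFactors = TRUE)

class(df)       # "data.frame"
mode(df)        # "list"

class(df$s)     # "factor"
mode(df$s)      # "numeric"

col_classes <- sapply(df, class)   # x: "numeric", s: "factor"
print(col_classes)
```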
