Data Types
Unlike SAS and SPSS, R has several different data types (structures) including vectors, factors, data frames, matrices, arrays, and lists. The data frame is most like a dataset in SAS.
1. Vectors
A vector is an object that contains a set of values called its elements.
Numeric vector
x <- c(1,2,3,4,5,6)
The operator <- is equivalent to the "=" sign.
Character vector
State <- c("DL", "MU", "NY", "DL", "NY", "MU")
To calculate frequencies for the State vector, you can use the table function.
To calculate the mean of a vector, you can use the mean function.
If a vector contains an NA (not available) value, the mean function returns NA.
To calculate the mean excluding NA values, include the na.rm = TRUE parameter in the mean function.
You can use subscripts to refer to elements of a vector.
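The following is a minimal sketch of the table and mean functions described above. It also shows subscripts, which refer to individual elements of a vector. The score vector and its values are assumptions made up for illustration; they are not from the original post.

```r
# State is the character vector defined above
State <- c("DL", "MU", "NY", "DL", "NY", "MU")

# 'score' is a hypothetical numeric vector with one missing value
score <- c(10, 20, NA, 40)

# Frequency of each distinct value in State
freq <- table(State)
print(freq)

# mean() returns NA when any element is NA
m_with_na <- mean(score)
print(m_with_na)

# na.rm = TRUE removes NA values before the mean is computed
m_no_na <- mean(score, na.rm = TRUE)
print(m_no_na)

# Subscripts: the first element, then elements 2 to 3
first_state <- State[1]
some_states <- State[2:3]
print(first_state)
print(some_states)
```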
Convert a column "x" to numeric
data$x = as.numeric(data$x)
2. Factors
R has a special data structure to store categorical variables. Making a variable a factor tells R that it is nominal or ordinal.
The factor function is most often called with three arguments:
- Vector name (x)
- Values (levels, optional)
- Value labels (labels, optional)
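As a rough illustration of the three arguments listed above, the sketch below builds one nominal and one ordinal factor. The variable names and values (gender_code, size) are hypothetical.

```r
# Hypothetical numeric codes for a categorical variable
gender_code <- c(1, 2, 2, 1, 2)

# levels = the values found in the data; labels = the text shown for each value
gender <- factor(gender_code, levels = c(1, 2), labels = c("Male", "Female"))
print(gender)
print(levels(gender))   # value labels, in level order
print(table(gender))    # frequency of each label

# ordered = TRUE marks the factor as ordinal, so levels can be compared
size <- factor(c("Low", "High", "Medium", "Low"),
               levels = c("Low", "Medium", "High"),
               ordered = TRUE)
is_smaller <- size[1] < size[2]   # compares "Low" with "High"
print(is_smaller)
```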
Convert a column "x" to factor
data$x = as.factor(data$x)
3. Matrices
All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
The cbind function joins columns together into a matrix.
When R prints a matrix, the numbers in brackets on the left are the row numbers. The form [1, ] refers to row one; the blank after the comma means that all the columns are displayed.
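A small sketch of cbind in use may help here. The y vector is an assumed example; printing the result shows the bracketed row labels such as [1,] on the left.

```r
# Two numeric vectors of the same length; y is a hypothetical example
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 5, 7)

# cbind() joins the vectors as columns of a matrix
mat <- cbind(x, y)

# Printing shows column names on top and row labels like [1,] on the left
print(mat)
```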
To see the dimensions of a matrix (number of rows and columns), you can use the dim function.
To see the correlations between the columns of a matrix, you can use the cor function.
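The sketch below applies dim and cor to a small matrix. The data are made up for illustration. cor returns a square matrix holding the correlation of every pair of columns, so the diagonal is always 1.

```r
# Hypothetical matrix built from two numeric vectors
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 4, 5, 7)
mat <- cbind(x, y)

# dim() returns the number of rows, then the number of columns
d <- dim(mat)
print(d)

# cor() returns the correlation between each pair of columns
r <- cor(mat)
print(r)
```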
You can use subscripts to identify rows or columns.
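For example, subscripts on a matrix take the form [row, column], and leaving one side blank selects everything on that side. The matrix below is a hypothetical example.

```r
# Hypothetical 3 x 2 matrix with named columns
mat <- cbind(x = c(1, 2, 3), y = c(10, 20, 30))

row1  <- mat[1, ]       # first row, all columns
col_y <- mat[, "y"]     # all rows of the column named "y"
cell  <- mat[2, 2]      # single value in row 2, column 2

# drop = FALSE keeps the result as a matrix instead of a vector
top2 <- mat[1:2, , drop = FALSE]

print(row1)
print(col_y)
print(cell)
print(top2)
```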
4. Arrays
Arrays are similar to matrices but can have more than two dimensions.
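A minimal sketch of a three-dimensional array follows. The dimensions and values are arbitrary choices for illustration.

```r
# Fill a 2 x 3 x 4 array with the numbers 1 to 24
arr <- array(1:24, dim = c(2, 3, 4))

# dim() now returns three numbers instead of two
print(dim(arr))

# Subscripts take one position per dimension: [row, column, layer]
v <- arr[1, 2, 3]
print(v)

# Leaving rows and columns blank returns a whole layer as a matrix
layer1 <- arr[, , 1]
print(layer1)
```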
5. Data Frames
A data frame is similar to SAS and SPSS datasets. It contains variables and records.
It is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).
The data.frame function is used to combine variables (vectors and factors) into a data frame.
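The sketch below combines three vectors of different modes into one data frame. All names and values are hypothetical. stringsAsFactors = FALSE is set explicitly because the default for that argument differs between R versions.

```r
# Hypothetical variables of different modes
id     <- c(1, 2, 3, 4)
state  <- c("DL", "MU", "NY", "DL")
gender <- factor(c("M", "F", "F", "M"))

# Combine them into a data frame; keep 'state' as character
mydata <- data.frame(id, state, gender, stringsAsFactors = FALSE)

# str() lists each column with its type
str(mydata)

# Columns are selected with $, rows with [row, ]
print(mydata$state)
print(mydata[2, ])
```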
6. Lists
A list allows you to store a variety of objects, which can be of different types and lengths.
You can use subscripts to select a specific component of the list.
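As a sketch, the list below holds three components of different types. The component names and values are assumptions for illustration. Double brackets or $ return the component itself, while single brackets return a shorter list.

```r
# Hypothetical list with a numeric vector, a character vector, and a logical
mylist <- list(numbers = c(1, 2, 3),
               states  = c("DL", "MU"),
               flag    = TRUE)

nums   <- mylist[["numbers"]]  # component by name
states <- mylist$states        # same idea, using $
third  <- mylist[[3]]          # component by position
sub    <- mylist[1]            # single brackets keep the list wrapper

print(nums)
print(states)
print(third)
print(class(sub))
```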
How to find the data type of a column
1. 'class' is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification.
2. 'mode' is a mutually exclusive classification of objects according to their basic structure. The 'atomic' modes are numeric, complex, character and logical.
> x <- 1:16
> x <- factor(x)
> class(x)
[1] "factor"
> mode(x)
[1] "numeric"
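To apply the same check to the columns of a data frame, one possible approach is sketched below. The data frame df and its columns are hypothetical. sapply calls class on every column in turn.

```r
# Hypothetical data frame with three column types
df <- data.frame(id    = 1:3,
                 state = c("DL", "MU", "NY"),
                 grp   = factor(c("a", "b", "a")),
                 stringsAsFactors = FALSE)

# class and mode of a single column
grp_class <- class(df$grp)
grp_mode  <- mode(df$grp)
print(grp_class)
print(grp_mode)

# class of every column at once
col_classes <- sapply(df, class)
print(col_classes)

# The data frame itself has class "data.frame" but mode "list"
print(class(df))
print(mode(df))
```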