This tutorial covers some of the most useful string (character) functions in R: concatenating strings, extracting a portion of text or a word from a string, converting text to uppercase or lowercase, replacing text, and more.
Character Functions in R
Basics
In R, strings are stored in character vectors. You can create a string with either single or double quotes.
For example, x = "I love R Programming"
1. Convert object into character type
The as.character function converts its argument to character type. In the example below, we store 25 as a character.
Y = as.character(25)
class(Y)
Output : "character"
class(Y) returns "character" because 25 was stored as a character string in the previous line of code.
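As a small illustrative sketch (the variable names are made up for this example), the conversion also works on a whole vector and can be reversed with as.numeric:

```r
# Convert a numeric vector to character; each element becomes a string
nums <- c(25, 3.5, 100)
chars <- as.character(nums)
print(chars)          # "25" "3.5" "100"
print(class(chars))   # "character"

# Converting back with as.numeric recovers the original numbers
back <- as.numeric(chars)
print(identical(back, nums))  # TRUE
```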
2. Check the character type
To check whether a vector is of character type, use the is.character function.
x = "I love R Programming"
is.character(x)
Output : TRUE
Like is.character, other functions such as is.numeric, is.integer and is.array check whether an object is a numeric vector, an integer vector or an array.
3. Concatenate Strings
The paste function joins two or more strings. It is one of the most important string manipulation tasks, and analysts use it almost daily to structure data.
Paste Function Syntax
paste (objects, sep = " ", collapse = NULL)
The sep= keyword denotes the separator (delimiter) placed between the objects being joined. The default separator is a single space. The collapse= keyword, when supplied, joins the elements of the result into a single string, separated by the given value.
Example 1
x = "Deepanshu"
y ="Bhalla"
paste(x, y)
Output : Deepanshu Bhalla
paste(x, y, sep = ",")
Output : Deepanshu,Bhalla
Example 2 : To create column names from x1 through x10
paste("x", seq(1,10), sep = "")
Output : "x1" "x2" "x3" "x4" "x5" "x6" "x7" "x8" "x9" "x10"
Example 3 : Use of 'Collapse' keyword
paste("x", seq(1,10), sep="", collapse=",")
Output : "x1,x2,x3,x4,x5,x6,x7,x8,x9,x10"
Comparing the output of Example 2 and Example 3 shows what the collapse keyword does in the paste function: instead of a vector of ten strings, the result is a single string in which the elements are separated by ",".
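The difference between sep and collapse can be sketched as follows. This is only an illustration. paste0 is the base R shorthand for paste with sep = "", and the separators "_" and " + " are arbitrary choices for the example.

```r
# paste0() is base R shorthand for paste(..., sep = "")
cols <- paste0("x", 1:10)
print(cols)

# sep goes between the arguments of each element,
# collapse goes between the resulting elements
joined <- paste("x", 1:3, sep = "_", collapse = " + ")
print(joined)  # "x_1 + x_2 + x_3"
```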
4. String Formatting
Suppose a value is stored as a fraction and you need to display it as a percentage. The sprintf function performs C-style string formatting.
Sprintf Function Syntax
sprintf(fmt, ...)
The keyword fmt denotes the format string. Each format specification starts with the symbol % followed by numbers and letters.
x = 0.25
sprintf("%.0f%%",x*100)
Output : 25%
Note : '%.0f' indicates 'fixed point' decimal notation with 0 decimal places. The '%%' after 'f' tells R to print a literal percentage sign after the number.
If you change the code to sprintf("%.2f%%",x*100), it would return 25.00%.
Other Examples
a = seq(1, 5)
sprintf("x%03d", a)
Output : "x001" "x002" "x003" "x004" "x005"
The letter 'd' in the format is used for an integer value; '03' pads the number with leading zeros to a width of 3.
sprintf("%s has %d rupees", "Ram", 500)
Output : "Ram has 500 rupees"
The letter 's' in the format is used for a character string.
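A short sketch of two further sprintf features, using made-up values: the function is vectorized over its inputs, and a number before the decimal point in the format sets the minimum total width.

```r
# sprintf() is vectorized: one formatted string per input element
shares <- c(0.25, 0.5, 0.125)
pct <- sprintf("%.1f%%", shares * 100)
print(pct)  # "25.0%" "50.0%" "12.5%"

# %5.2f pads the result with spaces to a total width of 5, with 2 decimals
padded <- sprintf("%5.2f", 3.14159)
print(padded)  # " 3.14"
```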
5. Extract or replace substrings
substr Syntax - substr(x, starting position, end position)
x = "abcdef"
substr(x, 1, 3)
Output : abc
In the above example, we are telling R to extract the 1st through 3rd characters of the string.
Replace Substring - substr(x, starting position, end position) = Value
substr(x, 1, 2) = "11"
After this assignment, x holds 11cdef.
In the above example, we are telling R to replace the first 2 characters with 11.
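The following sketch, with illustrative values, shows one detail of the replacement form that is easy to miss: assigning through substr never changes the length of the string, so a replacement that is too long is cut to fit the target range.

```r
x <- "abcdef"

# Extract characters 2 through 4
middle <- substr(x, 2, 4)
print(middle)  # "bcd"

# The replacement form keeps the string length fixed:
# only the first 2 characters of "12345" fit into positions 1 to 2
y <- "abcdef"
substr(y, 1, 2) <- "12345"
print(y)  # "12cdef"
```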
6. String Length
The nchar function is used to compute the length of a character value.
x = "I love R Programming"
nchar(x)
Output : 20
It returns 20 as the string in 'x' contains 20 characters (17 letters and 3 spaces).
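As a quick check of that count, here is a sketch that applies nchar to each word separately (nchar is vectorized) and adds the separating spaces back:

```r
words <- c("I", "love", "R", "Programming")

# nchar() returns one length per element
lens <- nchar(words)
print(lens)  # 1 4 1 11

# Total sentence length = all letters + one space between each pair of words
total <- sum(lens) + (length(words) - 1)
print(total)  # 20
```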
7. Replace the first match of the string
sub Syntax - sub(sub-string, replacement, x, ignore.case = FALSE)
If ignore.case is FALSE, the pattern matching is case-sensitive; if TRUE, case is ignored during matching. Only the first match in each string is replaced.
sub("okay", "fine", "She is okay.")
Output : She is fine.
In the above example, we are replacing the word 'okay' with 'fine'.
Let's replace all values of a vector
In the example below, we replace the prefix 'x' with 'Year' in every value of a vector. sub is applied to each element separately.
cols = c("x1", "x2", "x3")
sub("x", "Year", cols)
Output : "Year1" "Year2" "Year3"
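Because sub only replaces the first match within each string, the base R function gsub is the usual choice when every match should be replaced. A minimal sketch with a made-up input string:

```r
s <- "x1 and x2 and x3"

# sub() replaces only the first match in the string
first_only <- sub("x", "Year", s)
print(first_only)   # "Year1 and x2 and x3"

# gsub() replaces every match
all_matches <- gsub("x", "Year", s)
print(all_matches)  # "Year1 and Year2 and Year3"
```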
8. Extract Word from a String
Suppose you need to pull a first or last word from a character string.
Word Function Syntax (Library : stringr)
word(string, position of word to extract, separator)
Example
x = "I love R Programming"
library(stringr)
word(x, 1,sep = " ")
Output : I
In the example above, '1' denotes the first word to be extracted from the string. sep=" " denotes a single space as the delimiter (it is the default delimiter in the word function).
Extract Last Word
x = "I love R Programming"
library(stringr)
word(x, -1,sep = " ")
Output : Programming
In the example above, '-1' denotes the first word counting from the right of the string, i.e. the last word. sep=" " denotes a single space as the delimiter (it is the default delimiter in the word function).
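If stringr is not available, a similar result can be obtained with base R alone. This is only a sketch of one possible alternative, not what the word function does internally: it splits on a literal single space and picks the first and last pieces.

```r
x <- "I love R Programming"

# Split on a literal space; strsplit() returns a list, so take element 1
parts <- strsplit(x, " ", fixed = TRUE)[[1]]

first_word <- parts[1]
last_word <- parts[length(parts)]
print(first_word)  # "I"
print(last_word)   # "Programming"
```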
9. Convert Character to Uppercase / Lowercase / Propercase
We often need to change the case of a word, for example to convert it to uppercase or lowercase.
Examples
x = "I love R Programming"
tolower(x)
Output : "i love r programming"
The tolower() function converts letters in a string to lowercase.
toupper(x)
Output : "I LOVE R PROGRAMMING"
The toupper() function converts letters in a string to uppercase.
library(stringr)
str_to_title(x)
Output : "I Love R Programming"
The str_to_title() function converts the first letter of each word in a string to uppercase and the remaining letters to lowercase.
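For plain ASCII text, proper case can also be approximated in base R. The sketch below is an assumption-laden alternative rather than a replacement for str_to_title: it lowercases the string and then uppercases each lowercase letter that follows a word boundary, using the Perl-style \\U replacement that gsub supports with perl = TRUE.

```r
x <- "I love R Programming"

# Lowercase everything, then uppercase the first letter of each word
title <- gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
print(title)  # "I Love R Programming"
```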
10. Remove Leading and Trailing Spaces
The trimws() function is used to remove leading and/or trailing spaces.
Syntax :
trimws(x, which = c("both", "left", "right"))
Default Option : both. It removes both leading and trailing whitespace.
If you want to remove only leading spaces, specify "left". To remove only trailing spaces, specify "right".
a = " Deepanshu Bhalla "
trimws(a)
It returns "Deepanshu Bhalla".
The str_trim() function from the stringr package eliminates leading and trailing spaces.
x= " deepanshu bhalla "
library(stringr)
str_trim(x)
Output : "deepanshu bhalla"
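The effect of the which argument of trimws can be seen by comparing string lengths. The input below is an illustrative string with two spaces on each side.

```r
a <- "  Deepanshu Bhalla  "

left  <- trimws(a, which = "left")   # leading spaces removed only
right <- trimws(a, which = "right")  # trailing spaces removed only
both  <- trimws(a)                   # default: both sides

print(nchar(c(a, left, right, both)))  # 20 18 18 16
```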
11. Converting Multiple Spaces to a Single Space
Collapsing runs of multiple spaces in a string into a single space can be tedious to do by hand. In R, it is easy to do with the qdap package.
x= "deepanshu    bhalla"
library(qdap)
Trim(clean(x))
Output : deepanshu bhalla
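The same clean-up can be sketched without an extra package. This base R alternative replaces every run of whitespace with one space and then trims the ends. Note that "\\s+" also matches tabs and newlines, which may or may not be what you want.

```r
x <- "  deepanshu    bhalla  "

# Collapse each run of whitespace to one space, then trim both ends
single <- trimws(gsub("\\s+", " ", x))
print(single)  # "deepanshu bhalla"
```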
12. Repeat the character N times
If you need to repeat a character a number of times, you can do it with the base R function strrep.
strrep("x",3)
Output : "xxx"
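strrep also accepts longer strings and a vector of repeat counts. A brief sketch with arbitrary inputs:

```r
# Repeat a multi-character string
ab <- strrep("ab", 2)
print(ab)    # "abab"

# A vector of counts gives one result per count
bars <- strrep("-", 1:3)
print(bars)  # "-" "--" "---"
```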
13. Find String in a Character Variable
The str_detect() function checks whether a sub-string exists in a string. It is equivalent to the 'contain' function of SAS. It returns TRUE/FALSE for each value.
x = c("Aon Hewitt", "Aon Risk", "Hewitt", "Google")
library(stringr)
str_detect(x,"Aon")
Output : TRUE TRUE FALSE FALSE
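A comparable check is possible in base R with grepl. In this sketch, fixed = TRUE is an added choice that treats "Aon" as a literal string rather than a regular expression.

```r
x <- c("Aon Hewitt", "Aon Risk", "Hewitt", "Google")

# One TRUE/FALSE per element, like str_detect()
has_aon <- grepl("Aon", x, fixed = TRUE)
print(has_aon)     # TRUE TRUE FALSE FALSE

# Use the logical vector to keep only the matching values
print(x[has_aon])  # "Aon Hewitt" "Aon Risk"
```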
14. Splitting a Character Vector
In text mining, you often need to split a string into words, for example to calculate the most frequently used keywords. The strsplit() function in base R performs this operation.
x = c("I love R Programming")
strsplit(x, " ")
Output : "I" "love" "R" "Programming"
strsplit returns a list with one character vector per input string.
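The keyword-counting use case mentioned above might look like the sketch below. The two short "documents" are invented for illustration, and the words are lowercased first so that "I" and "i" would count as the same keyword.

```r
docs <- c("I love R", "I love Programming")

# Split every document on spaces and flatten the list into one vector
words <- unlist(strsplit(tolower(docs), " ", fixed = TRUE))

# Count how often each word occurs, most frequent first
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```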
15. Selecting Multiple Values
The %in% operator is used to select multiple values. It works like the IN keyword in SAS and SQL.
x = sample(LETTERS,100, replace = TRUE)
x[x %in% c("A","B","C")]
In the example above, we generate a random sample of 100 letters and then subset it to keep only A, B and C.
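Since the sample above is random, its output changes on every run. The sketch below uses a small fixed vector instead, so the logical vector that %in% produces is easy to follow. Negating it with ! selects everything else.

```r
x <- c("A", "D", "B", "Z", "C", "A")

# %in% returns one TRUE/FALSE per element of x
keep <- x %in% c("A", "B", "C")
print(keep)      # TRUE FALSE TRUE FALSE TRUE TRUE

print(x[keep])   # "A" "B" "C" "A"
print(x[!keep])  # "D" "Z"
```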
16. Pattern Matching
String manipulation often becomes a daunting task because we need to match patterns in strings. Regular expressions (regex) are a popular language for describing such patterns. In R, they can be used with the grepl function.
Example -
x = c("Deepanshu", "Dave", "Sandy", "drahim", "Jades")
1. Keeping strings that start with the letter 'D'
x[grepl("^D",x)]
Output : "Deepanshu" "Dave"
Note : It does not return 'drahim' as the pattern mentioned above is case-sensitive.
To make it case-insensitive, we can add (?i) before ^D.
x[grepl("(?i)^d",x)]
Output : "Deepanshu" "Dave" "drahim"
2. Keeping strings that do not start with the letter 'D'
x[!grepl("(?i)^d",x)]
Output : "Sandy" "Jades"
3. Keeping strings that end with 's'
x[grepl("s$",x)]
Output : "Jades"
4. Keeping strings that contain "S" (in either case)
x[grepl("(?i)s",x)]
Output : "Deepanshu" "Sandy" "Jades"
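As an alternative to the inline (?i) flag, grepl has an ignore.case argument that makes the whole pattern case-insensitive. The sketch below repeats the filters above in that style. The result variable names are made up for this example.

```r
x <- c("Deepanshu", "Dave", "Sandy", "drahim", "Jades")

# Starts with d or D
starts_d <- x[grepl("^d", x, ignore.case = TRUE)]
print(starts_d)    # "Deepanshu" "Dave" "drahim"

# Does not start with d or D
not_d <- x[!grepl("^d", x, ignore.case = TRUE)]
print(not_d)       # "Sandy" "Jades"

# Contains s or S anywhere
contains_s <- x[grepl("s", x, ignore.case = TRUE)]
print(contains_s)  # "Deepanshu" "Sandy" "Jades"
```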