R : Character Functions

Deepanshu Bhalla
This tutorial lists some of the most useful string (character) functions in R. It covers concatenating strings, extracting part of a string, extracting a word from a string, converting text to uppercase or lowercase, replacing text, and more.
Character Functions in R

Basics


In R, strings are stored in character vectors. You can create a string with either single or double quotes.

For example, x = "I love R Programming"


1. Convert object into character type

The as.character function converts its argument to character type. In the example below, the number 25 is stored as a character string.
Y = as.character(25)
class(Y)
The class(Y) call returns "character" because 25 was stored as a character string in the previous line of code.
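A short sketch of conversions in both directions. The vector v is only illustrative. Numbers become their printed text, and as.numeric() parses that text back. Text that does not look like a number becomes NA, and R issues a warning.

```r
# Numbers become their printed text
v <- as.character(c(1.5, 2, 10))
v                 # "1.5" "2" "10"
class(v)          # "character"

# Text that looks like a number can be parsed back
back <- as.numeric(v)
back              # 1.5 2 10

# Text that is not a number becomes NA (R normally warns here)
bad <- suppressWarnings(as.numeric("abc"))
bad               # NA
```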

2. Check the character type

To check whether an object is of character type, use the is.character function.
x = "I love R Programming"
is.character(x)
Output : TRUE

Similarly, is.numeric, is.integer and is.array check whether an object is numeric, integer or an array.
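The checks below illustrate a common surprise. A plain number such as 25 is numeric but not integer, because R stores number literals as doubles unless they carry the L suffix. The values are illustrative only.

```r
is.character("25")        # TRUE
is.character(25)          # FALSE: 25 is a number, not text
is.numeric(25)            # TRUE
is.integer(25)            # FALSE: number literals are stored as double
is.integer(25L)           # TRUE: the L suffix creates an integer
is.array(matrix(1:4, 2))  # TRUE: a matrix is a two-dimensional array
```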

3. Concatenate Strings

The paste function joins strings together. It is one of the most common string manipulation tasks; analysts use it almost daily to build labels, keys and column names.

Paste Function Syntax
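paste(..., sep = " ", collapse = NULL)
The ... argument takes the objects to join, which are converted to character. The sep= argument sets the separator placed between the objects; the default is a single space. The collapse= argument is optional: when supplied, all elements of the result are joined into a single string, with the collapse value between them.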

Example 1
x = "Deepanshu"
y ="Bhalla"
paste(x, y)
Output : Deepanshu Bhalla
paste(x, y, sep = ",") 
Output : Deepanshu,Bhalla

Example 2 : To create column names from x1 through x10
paste("x", seq(1,10), sep = "")
Output :  "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"


Example 3 : Use of the collapse argument
paste("x", seq(1,10), sep="", collapse=",")
Output : "x1,x2,x3,x4,x5,x6,x7,x8,x9,x10"

Compare Example 2 and Example 3 to see what collapse does. Example 2 returns a vector of ten separate strings. Example 3 joins them into a single string, with each value separated by ",".
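The base R shortcut paste0() is paste() with sep = "" built in. The sketch below also shows recycling: a single "Q" is reused for every number. It also shows collapse applied to one character vector. The variable names are illustrative.

```r
# paste0() is paste() with sep = ""
cols <- paste0("x", 1:3)
cols                              # "x1" "x2" "x3"
same <- identical(cols, paste("x", 1:3, sep = ""))   # TRUE

# The shorter input ("Q") is recycled across 1:4, then collapsed
formula_text <- paste0("Q", 1:4, collapse = "+")
formula_text                      # "Q1+Q2+Q3+Q4"

# collapse also joins the elements of a single vector
joined <- paste(c("a", "b", "c"), collapse = "-")
joined                            # "a-b-c"
```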

4. String Formatting

Suppose a value is stored as a fraction and you need to display it as a percentage. The sprintf function performs C-style string formatting.

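sprintf Function Syntax
sprintf(fmt, ...)
The fmt argument is the format string. Each conversion specification starts with the symbol %, followed by optional flags, width and precision, and ends with a letter that gives the type (for example f, d or s).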
x = 0.25
sprintf("%.0f%%",x*100)
Output : 25%

Note : '%.0f' means fixed-point decimal notation with 0 decimal places. The '%%' that follows is an escaped percent sign, which prints a literal % after the number.

If you change the code to sprintf("%.2f%%",x*100), it would return 25.00%.

Other Examples
a = seq(1, 5)
sprintf("x%03d", a)
Output :  "x001" "x002" "x003" "x004" "x005"

The letter 'd' in the format is used for integer values, and %03d pads the number with leading zeros to a width of 3.
sprintf("%s has %d rupees", "Ram", 500)
Output : "Ram has 500 rupees"

The letter 's' in the format is used for character strings.
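sprintf is vectorised, so one format string can produce many formatted values at once. The sketch below applies the percentage format to a vector, shows field width and alignment with %5s and %-5s, and recycles two argument vectors. All input values are illustrative.

```r
# One output string per input value
p <- c(0.125, 0.5)
pct <- sprintf("%.1f%%", p * 100)
pct                                # "12.5%" "50.0%"

# Width and alignment: %5s right-aligns in 5 characters, %-5s left-aligns
right <- sprintf("[%5s]", "ab")    # "[   ab]"
left  <- sprintf("[%-5s]", "ab")   # "[ab   ]"

# Arguments are processed element by element
labels <- sprintf("%s: %d", c("A", "B"), c(10L, 20L))
labels                             # "A: 10" "B: 20"
```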


5. Extract or replace substrings

substr Syntax - substr(x, start, stop)
x = "abcdef"
substr(x, 1, 3)
Output : abc 

In the above example, we are telling R to extract the 1st through 3rd characters.

Replace Substring - substr(x, start, stop) = value
substr(x, 1, 2) = "11"
x
Output : 11cdef

In the above example, we are telling R to replace the first 2 characters with "11".
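Two details of substr are easy to miss. First, it works element-wise on a vector. Second, the replacement form never changes the length of the string: a short value replaces only as many characters as it has, and a long value is truncated to fit. The strings below are illustrative.

```r
fruit <- c("apple", "banana")
parts <- substr(fruit, 2, 4)
parts                     # "ppl" "ana"

y <- "abcdef"
substr(y, 1, 3) <- "Z"    # replacement shorter than the range
y                         # "Zbcdef": only 1 character replaced

z <- "abcdef"
substr(z, 1, 2) <- "XYZ"  # replacement longer than the range
z                         # "XYcdef": the extra "Z" is dropped
```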


6. String Length

The nchar function is used to compute the length of a character value.
x = "I love R Programming"
nchar(x)
Output : 20

It returns 20 because the string stored in 'x' contains 20 characters (including 3 spaces).
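A common mix-up is between length() and nchar(). length() counts the elements of a vector, while nchar() counts the characters in each element. The sketch below contrasts them and shows that numbers are converted to text before counting. The inputs are illustrative.

```r
x <- "I love R Programming"
n_elem  <- length(x)    # 1: the vector has one element
n_chars <- nchar(x)     # 20: characters in that element

sizes <- nchar(c("R", "Python", ""))
sizes                   # 1 6 0

digits <- nchar(12345)  # 5: 12345 is converted to "12345" first
```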

7. Replace the first match of the string

sub Syntax - sub(pattern, replacement, x, ignore.case = FALSE)

The pattern is a regular expression. If ignore.case is FALSE (the default), matching is case-sensitive; if TRUE, case is ignored during matching.
sub("okay", "fine", "She is okay.")
Output : "She is fine."

In the above example, we are replacing the word 'okay' with 'fine'.

Let's replace values across a whole vector

sub works on every element of a vector. In the example below, we replace the prefix 'x' with 'Year' in each value.
cols = c("x1", "x2", "x3")
sub("x", "Year", cols)
Output : "Year1" "Year2" "Year3"


8. Extract Word from a String

Suppose you need to pull the first or last word from a character string.

word Function Syntax (Library : stringr)
word(string, start = 1L, end = start, sep = fixed(" "))
Example
x = "I love R Programming"
library(stringr)
word(x, 1,sep = " ")
Output : I

In the example above, 1 is the position of the word to extract. sep = " " sets a single space as the delimiter (a single space is also the default in the word function).


Extract Last Word
x = "I love R Programming"
library(stringr)
word(x, -1,sep = " ")
Output : Programming

In the example above, -1 counts from the end of the string, so it extracts the last word. As before, sep = " " sets a single space as the delimiter.
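word also accepts a range through its start and end arguments, and negative positions count back from the last word. The sketch below assumes the stringr package is installed. It also shows a base R alternative that splits the string and indexes the last piece.

```r
library(stringr)

x <- "I love R Programming"
middle   <- word(x, 2, 3)    # "love R": words 2 through 3
all_but  <- word(x, 1, -2)   # "I love R": everything except the last word

# Base R alternative without stringr: split on spaces, then index
parts <- strsplit(x, " ")[[1]]
last  <- parts[length(parts)]   # "Programming"
```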


9. Convert Character to Uppercase / Lowercase / Proper Case

We often need to change the case of text, for example to convert it to uppercase or lowercase.

Examples
x = "I love R Programming"
tolower(x)
Output : "i love r programming"

The tolower() function converts letters in a string to lowercase.
toupper(x)
Output : "I LOVE R PROGRAMMING"

The toupper() function converts letters in a string to uppercase.
library(stringr)
str_to_title(x)
Output : "I Love R Programming"

The str_to_title() function converts the first letter of each word to uppercase and the remaining letters to lowercase.
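A practical use of tolower() is case-insensitive comparison: lowercase both sides before comparing. The sketch below uses illustrative values and only base R. It also shows casefold(), a base R wrapper that calls toupper() when upper = TRUE.

```r
a <- c("Apple", "APPLE", "apple")

exact  <- a == "apple"             # FALSE FALSE TRUE: case matters
nocase <- tolower(a) == "apple"    # TRUE TRUE TRUE: case ignored

upper <- casefold(a, upper = TRUE)
upper                              # "APPLE" "APPLE" "APPLE"
```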



10. Remove Leading and Trailing Spaces

The trimws() function is used to remove leading and/or trailing spaces.

Syntax :
trimws(x, which = c("both", "left", "right"))
Default Option : "both" removes both leading and trailing whitespace.
To remove only leading spaces, specify "left". To remove only trailing spaces, specify "right".
a = " Deepanshu Bhalla "
trimws(a)
It returns "Deepanshu Bhalla".

The str_trim() function from the stringr package eliminates leading and trailing spaces.
x= " deepanshu bhalla "
library(stringr)
str_trim(x)
Output : "deepanshu bhalla"


11. Converting Multiple Spaces to a Single Space

Collapsing multiple spaces into a single space is a common cleaning step. The qdap package can do it with clean(), combined with Trim() to drop leading and trailing spaces.
x= "deepanshu    bhalla"
library(qdap)
Trim(clean(x))
Output : deepanshu bhalla
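The same cleanup is possible in base R with a regular expression, with no extra package. The sketch below replaces each run of spaces with a single space. The second example uses \\s+, which also matches tabs, and then trims the ends. The stringr function str_squish() combines both steps, if you prefer that package. The strings are illustrative.

```r
x <- "deepanshu    bhalla"
one_space <- gsub(" +", " ", x)   # "deepanshu bhalla"

# \\s+ also catches tabs and newlines; trimws() then drops edge spaces
y <- "  deepanshu \t  bhalla  "
squished <- trimws(gsub("\\s+", " ", y))   # "deepanshu bhalla"
```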

12. Repeat the character N times

To repeat a string a given number of times, use the base R function strrep.
strrep("x",3)
Output : "xxx"

13. Find String in a Character Variable


The str_detect() function (stringr package) checks whether a substring is present in a string. It is similar to the CONTAINS operator in SAS and returns TRUE or FALSE for each element.
x = c("Aon Hewitt", "Aon Risk", "Hewitt", "Google")
library(stringr)
str_detect(x,"Aon")

Output : TRUE  TRUE FALSE FALSE
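Base R gives the same result with grepl(). With fixed = TRUE, the pattern is treated as plain text rather than a regular expression. The logical result can be used directly to subset the vector. The sketch below reuses the example vector and adds a case-insensitive check with ignore.case = TRUE.

```r
x <- c("Aon Hewitt", "Aon Risk", "Hewitt", "Google")
has_aon <- grepl("Aon", x, fixed = TRUE)    # TRUE TRUE FALSE FALSE

# Keep only the matching values
hewitt <- x[grepl("Hewitt", x, fixed = TRUE)]   # "Aon Hewitt" "Hewitt"

# Case-insensitive match
any_case <- grepl("aon", x, ignore.case = TRUE) # TRUE TRUE FALSE FALSE
```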

14. Splitting a Character Vector

In text mining, you often need to split a string into words, for example to count the most frequently used keywords. The base R function strsplit() performs this operation.
x = c("I love R Programming")
strsplit(x, " ")

Output : "I"           "love"        "R"           "Programming"


15. Selecting Multiple Values


The %in% operator checks whether each element of a vector belongs to a set of values, which makes it useful for selecting multiple values. It works like the IN keyword in SAS and SQL.

x = sample(LETTERS,100, replace = TRUE)
x[x %in% c("A","B","C")]
In the example above, we generate a random sample of 100 uppercase letters and keep only A, B and C.
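Because the example above is random, its output changes on every run. The sketch below uses a small fixed vector, chosen only for illustration, to show the logical result of %in%. It then uses that result to subset, and negates it with ! to get the equivalent of NOT IN.

```r
wanted <- c("A", "B", "C")
v <- c("A", "D", "B", "Z")

flags   <- v %in% wanted       # TRUE FALSE TRUE FALSE
kept    <- v[v %in% wanted]    # "A" "B"
dropped <- v[!(v %in% wanted)] # "D" "Z": like NOT IN
```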


16. Pattern Matching

String manipulation often requires matching a pattern rather than fixed text. Regular expressions (regex) are the standard language for describing such patterns. In R, the grepl function returns TRUE or FALSE for each element, depending on whether it matches a pattern.

Example -
x = c("Deepanshu", "Dave", "Sandy", "drahim", "Jades")
1. Keeping values that start with the letter 'D'
x[grepl("^D",x)]
Output : "Deepanshu" "Dave"

Note : It does not return 'drahim' as pattern mentioned above is case-sensitive.

To make the match case-insensitive, add (?i) at the start of the pattern (or pass ignore.case = TRUE to grepl).
x[grepl("(?i)^d",x)]
Output : "Deepanshu" "Dave" "drahim"

2. Keeping values that do not start with the letter 'D' (in either case)
x[!grepl("(?i)^d",x)]
Output : "Sandy" "Jades"


3. Keeping values that end with 's'
x[grepl("s$",x)]
Output : "Jades"

4. Keeping values that contain 's' (in either case)
x[grepl("s", x, ignore.case = TRUE)]
Output : "Deepanshu" "Sandy" "Jades"
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
