A word cloud is a text mining technique that lets us visualize the most frequently used keywords in a body of text.
How to Create a Word Cloud with R
Step 1 : Install the required packages
install.packages("wordcloud")
install.packages("tm")
install.packages("ggplot2")
Note : If these packages are already installed, you don't need to install them again.
Step 2 : Load the above installed packages
library(wordcloud)
library(tm)
library(ggplot2)
Step 3 : Import data into R
Import a single file
cname <- read.csv("C:/Users/Deepanshu Bhalla/Documents/Text.csv", header=TRUE)
Import multiple files from a folder
setwd("C:\\Users\\Deepanshu Bhalla\\Documents\\text mining")
cname <- getwd()
Note : In the above syntax, "text mining" is the folder name. All the text files are placed in this folder.
## Number of documents
length(dir(cname))
## list file names
dir(cname)
Step 4 : Locate and load the corpus
If you imported a single file
docs <- Corpus(VectorSource(cname[,1]))
If you imported multiple files from a folder
docs <- Corpus(DirSource(cname))
docs
summary(docs)
inspect(docs[1])
Step 5 : Data Cleaning
# Simple Transformation
for (j in seq(docs))
{
docs[[j]] = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", docs[[j]])  # retweet markers and the handles that follow them
docs[[j]] = gsub("@\\w+", "", docs[[j]])                        # remaining @mentions
docs[[j]] = gsub("http\\w+", "", docs[[j]])                     # link-like tokens
docs[[j]] = gsub("[ \t]{2,}", " ", docs[[j]])                   # collapse runs of spaces/tabs into one space
docs[[j]] = gsub("^\\s+|\\s+$", "", docs[[j]])                  # trim leading/trailing whitespace
docs[[j]] = gsub("[^\x20-\x7E]", "", docs[[j]])                 # drop non-printable / non-ASCII characters
}
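To make the regular expressions less cryptic, here is a small trace of cleaning rules like the ones above on a single made-up tweet. This is a sketch only: the tweet text is hypothetical, the whitespace run is collapsed to a single space, and the trimming pattern is assumed to be `^\\s+|\\s+$` (leading/trailing whitespace).

```r
# Hypothetical tweet used only to trace the cleaning rules
x <- "RT @alice wordclouds are neat @bob check https123"
x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x)  # retweet marker + handle
x <- gsub("@\\w+", "", x)                        # remaining @mentions
x <- gsub("http\\w+", "", x)                     # link-like tokens
x <- gsub("[ \t]{2,}", " ", x)                   # collapse repeated spaces/tabs
x <- gsub("^\\s+|\\s+$", "", x)                  # trim leading/trailing whitespace
x
# "wordclouds are neat check"
```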
# Specify stopwords in addition to the built-in English stopwords
skipwords = c(stopwords("english"), "system","technology")
kb.tf <- list(weighting = weightTf, stopwords = skipwords,
removePunctuation = TRUE,
tolower = TRUE,
minWordLength = 4,
removeNumbers = TRUE, stripWhitespace = TRUE,
stemDocument= TRUE)
# term-document matrix
docs <- tm_map(docs, PlainTextDocument)
tdm = TermDocumentMatrix(docs, control = kb.tf)
Step 6 : Create WordCloud with R
# convert as matrix
tdm = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(tdm), decreasing=TRUE)
# create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
# Fix the random seed so the word cloud layout is reproducible
set.seed(123)
#Plot Histogram
p <- ggplot(subset(dm, freq>10), aes(word, freq))
p <- p + geom_bar(stat="identity")
p <- p + theme(axis.text.x=element_text(angle=45, hjust=1))
p
#Plot Wordcloud
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(6, "Dark2"),
          min.freq=10, scale=c(4,.2), rot.per=.15, max.words=100)
Note : You can remove sparse terms with the following code :
tdm.frequent = removeSparseTerms(tdm, 0.1)
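For intuition on what the second argument does: removeSparseTerms() drops every term whose sparsity (the share of documents it does not appear in) exceeds the threshold, so 0.1 keeps only terms present in roughly 90% or more of the documents. A minimal sketch on a made-up three-document corpus:

```r
library(tm)

# Toy corpus: "apple" occurs in all 3 documents (sparsity 0),
# "banana" in 2 of 3 (~0.33), "cherry" in 1 of 3 (~0.67)
docs2 <- Corpus(VectorSource(c("apple banana", "apple cherry", "apple banana")))
tdm2  <- TermDocumentMatrix(docs2)

Terms(removeSparseTerms(tdm2, 0.4))  # drops "cherry"; keeps "apple", "banana"
Terms(removeSparseTerms(tdm2, 0.1))  # keeps only "apple"
```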
Awesome article.
Thank you! Glad you liked it :-)
Hi, I followed all the steps but failed at the transformation in Step 5.
I got this error message:
Error in UseMethod("content", x) :
no applicable method for 'content' applied to an object of class "character"
Steps 1 to 4 run fine.
Could you help me fix this problem?
Thank you in advance.
I have updated the code. That should solve your problem. Thanks!
Hi Deepanshu,
I followed your steps but I got the following error. Please help:
Error in eval(expr, envir, enclos) : object 'word' not found
May I know which section of the code returns this error?
Hey, I am using your code, but I can't add my own stopwords. I want to build a word cloud from a chat and keep certain words out of it. Can you please tell me how to add my own words to the stopwords? Thank you in advance.
ReplyDeleteHi !
ReplyDeleteGreat work.
Well, I am trying to create a word cloud from tweets, but all it shows in the word cloud is words like "object", "class" and "status".
LINK to screenshot: https://drive.google.com/file/d/0B4ZhibK97rv0SE9XOXgwOERYNjQ/view?usp=sharing
Help appreciated.
Thanks
Hi, thanks for sharing this.
When I executed the Step 5 syntax I got the error below:
Error: unexpected string constant in:
" docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])
docs[[j]] = gsub(""
> }
Error: unexpected '}' in "}"
The above error was resolved, but when I executed the next set of syntax in Step 5 I got another error. Please see below for more details. Can you let me know why I am getting this error?
> tdm = TermDocumentMatrix(docs, control = kb.tf)
Error: inherits(doc, "TextDocument") is not TRUE
>
> # convert as matrix
> tdm = as.matrix(tdm)
Error in as.matrix(tdm) : object 'tdm' not found
>
> # get word counts in decreasing order
> word_freqs = sort(rowSums(tdm), decreasing=TRUE)
Error in is.data.frame(x) : object 'tdm' not found
>
> # create a data frame with words and their frequencies
> dm = data.frame(word=names(word_freqs), freq=word_freqs)
Error in data.frame(word = names(word_freqs), freq = word_freqs) :
object 'word_freqs' not found
This is fantastic!!!
There is a small error in Step 5 (a ; is used where a " should be):
for (j in seq(docs))
{
docs[[j]] = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", docs[[j]])
docs[[j]] = gsub("@\\w+", "", docs[[j]])
docs[[j]] = gsub("http\\w+", "", docs[[j]])
docs[[j]] = gsub("[ \t]{2,}", "", docs[[j]])
docs[[j]] = gsub("^\\s+|\\s+quot;", "", docs[[j]])
docs[[j]] = gsub("[^\x20-\x7E]", "", docs[[j]])
}
Yes George, but the next set of syntax in Step 5 still gives me the error below. Not sure why:
> tdm = TermDocumentMatrix(docs, control = kb.tf)
Error: inherits(doc, "TextDocument") is not TRUE
>
> # convert as matrix
> tdm = as.matrix(tdm)
Error in as.matrix(tdm) : object 'tdm' not found
>
> # get word counts in decreasing order
> word_freqs = sort(rowSums(tdm), decreasing=TRUE)
Error in is.data.frame(x) : object 'tdm' not found
>
> # create a data frame with words and their frequencies
> dm = data.frame(word=names(word_freqs), freq=word_freqs)
Error in data.frame(word = names(word_freqs), freq = word_freqs) :
object 'word_freqs' not found
Hi George,
I am sorry, I didn't get your point. What exactly have you changed in this code?
for (j in seq(docs))
{
docs[[j]] = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", docs[[j]])
docs[[j]] = gsub("@\\w+", "", docs[[j]])
docs[[j]] = gsub("http\\w+", "", docs[[j]])
docs[[j]] = gsub("[ \t]{2,}", "", docs[[j]])
docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])
docs[[j]] = gsub("[^\x20-\x7E]", "", docs[[j]])
}
Hi Deepanshu,
This line in Step 5
docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])
is missing the " after "^\\s+|\\s+quot;
This is an issue with the version of tm you are using. Run
docs <- tm_map(docs, PlainTextDocument)
before running
tdm = TermDocumentMatrix(docs, control = kb.tf)
Thanks a lot George, it does work perfectly.
Just one question about the note mentioned in this step:
Note : You can remove sparse terms with the following code :
tdm.frequent = removeSparseTerms(tdm, 0.1)
What is the use of this?
Thanks a lot for your help
One more question: I increased my data points (I included more comments in the .csv file) but got only three words in the word cloud.
Earlier there were many more words displayed in the chart. Why does the word cloud show fewer words after I added more comments?
I'm using wordcloud in Shiny. When I select a data set, it should show the word cloud corresponding to that selection.
Can anyone help me with this issue?
ReplyDeleteWould appreciate it if you could provide a link to the data file(s)?
ReplyDelete