Create WordCloud with R

Deepanshu Bhalla
A wordcloud is a text mining technique that lets us visualize the most frequently used keywords in a piece of text: the more often a word appears, the larger it is displayed.

An example wordcloud is shown below:
[Figure: Create WordCloud with R Programming]
How to create Word Cloud with R

Step 1 : Install the required packages
Note : If these packages are already installed, you don't need to install them again.
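The install commands are not shown in the post. Judging from the functions called in the later steps, the packages involved are tm, SnowballC (used by tm for stemming), wordcloud, RColorBrewer and ggplot2; this list is an inference, not the author's original code. A minimal sketch that installs only the ones that are missing, assuming the public CRAN mirror URL below is reachable:

```r
# Packages used in Steps 2-6 (list inferred from the functions called later)
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer", "ggplot2")

# Install only the packages that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```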

Step 2 : Load the installed packages
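The loading code is also missing. A sketch that loads the packages whose functions appear in Steps 4 to 6; the package list (tm, SnowballC, wordcloud, RColorBrewer, ggplot2) is inferred from those calls rather than copied from the post:

```r
library(tm)           # Corpus(), tm_map(), TermDocumentMatrix()
library(SnowballC)    # stemming backend used by tm
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal() colour palettes
library(ggplot2)      # bar chart of frequent words
```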

Step 3 : Import data into R

Import a single file 
cname <- read.csv("C:/Users/Deepanshu Bhalla/Documents/Text.csv", header = TRUE, stringsAsFactors = FALSE)

Import multiple files from a folder

setwd("C:\\Users\\Deepanshu Bhalla\\Documents\\text mining")
cname <-getwd()
## Number of documents
length(dir(cname))
## list file names
dir(cname)

Note : In the above syntax, "text mining" is a folder name. I have placed all the text files in this folder.
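The "Number of documents" and "list file names" comments correspond to plain base-R calls. A self-contained sketch that builds a throwaway folder under tempdir() with two made-up text files (the folder and file names are hypothetical), then counts, lists and reads them. Keep in mind that DirSource() in Step 4 reads every file in the folder by default, so the folder should contain only your text files.

```r
# Throwaway folder with two made-up text files (illustration only)
folder <- file.path(tempdir(), "text_mining_demo")
dir.create(folder, showWarnings = FALSE)
writeLines("Word clouds show frequent words", file.path(folder, "doc1.txt"))
writeLines(c("Text mining", "with R"), file.path(folder, "doc2.txt"))

## Number of documents
n_docs <- length(dir(folder))

## list file names (dir() returns them sorted alphabetically)
file_names <- dir(folder)

# Read each file into a single string, one element per document
texts <- vapply(file.path(folder, file_names),
                function(f) paste(readLines(f), collapse = " "),
                character(1), USE.NAMES = FALSE)
texts
```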

Step 4 : Locate and load the corpus

If you imported a single file
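The code for this case is missing from the post. A minimal sketch, assuming the CSV from Step 3 stores one document per row in a column called `Text` (a hypothetical name; use your own column). The small data frame `example_df` stands in for `cname` so the snippet runs on its own:

```r
library(tm)

# Toy stand-in for the data frame imported with read.csv() in Step 3.
# 'Text' is a hypothetical column name holding one document per row.
example_df <- data.frame(
  Text = c("Text mining with R", "Word clouds show frequent words"),
  stringsAsFactors = FALSE
)

# One corpus document per row of the text column.
# With your own data: docs <- Corpus(VectorSource(cname$Text))
docs_single <- Corpus(VectorSource(example_df$Text))
length(docs_single)
```

With real data, assign the result to `docs` so that Step 5 picks it up.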


If you imported multiple files from a folder

docs <- Corpus(DirSource(cname))

Step 5 : Data Cleaning

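# Simple transformations: wrap gsub() in content_transformer() so that each
# document is still a tm TextDocument after cleaning
clean <- function(pattern, replacement = "") {
  content_transformer(function(x) gsub(pattern, replacement, x, perl = TRUE))
}
docs <- tm_map(docs, clean("(RT|via)((?:\\b\\W*@\\w+)+)"))  # retweet / "via" headers
docs <- tm_map(docs, clean("@\\w+"))                         # remaining @mentions
docs <- tm_map(docs, clean("http\\S+"))                      # whole URLs
docs <- tm_map(docs, clean("[ \t]{2,}", " "))                # collapse repeated spaces/tabs
docs <- tm_map(docs, clean("^\\s+|\\s+$"))                   # trim leading/trailing whitespace
docs <- tm_map(docs, clean("[^\x20-\x7E]"))                  # drop non-printable/non-ASCII characters

# Specify stopwords other than the built-in English stopwords
skipwords = c(stopwords("english"), "system", "technology")
# Control options for TermDocumentMatrix()
ctrl <- list(weighting = weightTf, stopwords = skipwords,
             removePunctuation = TRUE,
             tolower = TRUE,
             wordLengths = c(4, Inf),   # keep words with at least 4 characters
             removeNumbers = TRUE,
             stemming = TRUE)

# term-document matrix
tdm = TermDocumentMatrix(docs, control = ctrl)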

# convert to a regular matrix
tdm = as.matrix(tdm)

# get word counts in decreasing order
word_freqs = sort(rowSums(tdm), decreasing=TRUE)

# create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
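To see what the cleaning patterns and the word counting do without building a corpus, here is a base-R sketch on three made-up tweets. It applies the same gsub() patterns as the cleaning step (with URLs matched by `http\\S+` so the whole link is removed), then approximates the term-document counts with table(). The helper names `clean_text` and `count_words` are hypothetical, and the sketch deliberately skips stopword removal and stemming, so its counts will not match TermDocumentMatrix() exactly.

```r
# Made-up tweets for illustration only
tweets <- c("RT @user1: Text mining in R is fun http://t.co/abc",
            "Text mining via @user2 makes word clouds",
            "  Word   clouds show frequent words in text  ")

# Same patterns as the cleaning step, applied to a plain character vector
clean_text <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # RT/via + @handles
  x <- gsub("@\\w+", "", x)                                      # other @mentions
  x <- gsub("http\\S+", "", x)                                   # URLs
  x <- gsub("[ \t]{2,}", " ", x)                                 # collapse whitespace
  x <- gsub("^\\s+|\\s+$", "", x)                                # trim both ends
  gsub("[^\x20-\x7E]", "", x)                                    # non-ASCII characters
}

# Rough stand-in for TermDocumentMatrix() + rowSums(): lowercase, split on
# non-letters, keep words of 4+ characters, count and sort
count_words <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  words <- words[nchar(words) >= 4]
  freqs <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(freqs), freq = as.integer(freqs),
             stringsAsFactors = FALSE)
}

dm_demo <- count_words(clean_text(tweets))
head(dm_demo)
```

Here "text" comes out on top with a count of 3, while the handles, the URL and short words such as "in" and "is" are gone.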

Step 6 : Create WordCloud with R
# Keep wordcloud the same on every run (fixed random seed)
set.seed(1234)

# Plot a bar chart of words that occur more than 10 times
p <- ggplot(subset(dm, freq>10), aes(word, freq))
p <- p + geom_bar(stat="identity")
p <- p + theme(axis.text.x=element_text(angle=45, hjust=1))
print(p)
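The bar chart above orders the bars alphabetically by word. If you would rather see the most frequent words first, one option is to reorder the x axis by frequency with reorder(). A sketch on a small made-up frequency table (`dm_toy` and its counts are hypothetical), using geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Made-up word frequencies standing in for dm from Step 5
dm_toy <- data.frame(word = c("text", "data", "model", "cloud"),
                     freq = c(30, 25, 18, 8),
                     stringsAsFactors = FALSE)

# Same filter as above: keep words occurring more than 10 times
top_words <- subset(dm_toy, freq > 10)

# reorder(word, -freq) sorts the bars from most to least frequent
p_demo <- ggplot(top_words, aes(x = reorder(word, -freq), y = freq)) +
  geom_col() +
  labs(x = "word", y = "frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p_demo)
```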


#Plot Wordcloud
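# min.freq: skip words seen fewer than 10 times; max.words: plot at most 100 words;
# scale: range of font sizes; rot.per: share of words rotated by 90 degrees
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(6, "Dark2"),
          min.freq=10, scale=c(4,.2), rot.per=.15, max.words=100)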

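Note : You can remove sparse terms with the following code. removeSparseTerms() expects the TermDocumentMatrix object, so run it before tdm is overwritten by as.matrix() in Step 5. With sparse = 0.1, only terms that occur in more than 90% of the documents are kept:
 tdm.frequent = removeSparseTerms(tdm, 0.1)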
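To see which terms a given sparse value keeps, here is a base-R sketch that mimics removeSparseTerms() on a small made-up count matrix. The matrix `m` and the helper `keep_dense` are hypothetical; the rule follows the tm documentation, which keeps only terms whose sparsity (share of documents in which the term does not occur) is below sparse.

```r
# Made-up term-document count matrix: 4 terms x 5 documents
m <- matrix(c(2, 1, 3, 1, 1,
              0, 0, 4, 0, 0,
              1, 1, 1, 0, 1,
              0, 1, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("data", "shiny", "model", "cloud"),
                            paste0("doc", 1:5)))

# Sparsity of a term = share of documents in which it does not occur
sparsity <- rowMeans(m == 0)
sparsity

# Mimic of removeSparseTerms(x, sparse): keep a term only if it occurs in
# more than ncol(m) * (1 - sparse) documents, i.e. sparsity < sparse
keep_dense <- function(m, sparse) {
  m[rowSums(m > 0) > ncol(m) * (1 - sparse), , drop = FALSE]
}

rownames(keep_dense(m, 0.1))  # only "data" (present in all 5 documents)
rownames(keep_dense(m, 0.5))  # "data" and "model"
```

A small sparse value is therefore very strict and can leave only a handful of words for the wordcloud; values closer to 1 keep more terms.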
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

22 Responses to "Create WordCloud with R"
  1. Hi, I followed all the steps but failed to transform sentences into terms (Step 5, 8).
    I got the error message:
    Error in UseMethod("content", x) :
    no applicable method for 'content' applied to an object of class "character"

    I can run Steps 1 to 4 so far.

    Could you help me fix this problem?
    Thank you in advance.

    1. I have updated the code. That should solve your problem. Thanks!

  2. Hi Deepanshu

    I followed your steps but got the following error. Please help.

    Error in eval(expr, envir, enclos) : object 'word' not found

    1. May I know which section of the code returns this error?

  3. Hey, I am using your code, but I can't add my own stopwords. I want to make a wordcloud of a chat and specify some words that shouldn't go into it. Can you please tell me how to add my own words to the stopwords? Thank you in advance.

  4. Hi!
    Great work.
    I am trying to create a wordcloud using tweets, but all the wordcloud shows are the words object, class and status.
    LINK to screenshot:
    Help appreciated.

  5. Hi, thanks for sharing this.

    When I executed the Step 5 syntax I got the error below:

    Error: unexpected string constant in:
    " docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])
    docs[[j]] = gsub(""
    > }
    Error: unexpected '}' in "}"

    1. The above error was resolved in Step 5, but when I executed the next set of syntax I got an error. Please see below for more details. Can you please let me know why I am getting this error?

      > tdm = TermDocumentMatrix(docs, control =
      Error: inherits(doc, "TextDocument") is not TRUE
      > # convert as matrix
      > tdm = as.matrix(tdm)
      Error in as.matrix(tdm) : object 'tdm' not found
      > # get word counts in decreasing order
      > word_freqs = sort(rowSums(tdm), decreasing=TRUE)
      Error in : object 'tdm' not found
      > # create a data frame with words and their frequencies
      > dm = data.frame(word=names(word_freqs), freq=word_freqs)
      Error in data.frame(word = names(word_freqs), freq = word_freqs) :
      object 'word_freqs' not found

  6. This is fantastic!!!

    There is a small error in Step 5 (; is used instead of "):

    for (j in seq(docs))
    docs[[j]] = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", docs[[j]])
    docs[[j]] = gsub("@\\w+", "", docs[[j]])
    docs[[j]] = gsub("http\\w+", "", docs[[j]])
    docs[[j]] = gsub("[ \t]{2,}", "", docs[[j]])

    docs[[j]] = gsub("^\\s+|\\s+quot;", "", docs[[j]])

    docs[[j]] = gsub("[^\x20-\x7E]", "", docs[[j]])

    1. Yes George, but in Step 5 I am getting the error below for the next set of syntax, and I am not sure why:

      > tdm = TermDocumentMatrix(docs, control =
      Error: inherits(doc, "TextDocument") is not TRUE
      > # convert as matrix
      > tdm = as.matrix(tdm)
      Error in as.matrix(tdm) : object 'tdm' not found
      > # get word counts in decreasing order
      > word_freqs = sort(rowSums(tdm), decreasing=TRUE)
      Error in : object 'tdm' not found
      > # create a data frame with words and their frequencies
      > dm = data.frame(word=names(word_freqs), freq=word_freqs)
      Error in data.frame(word = names(word_freqs), freq = word_freqs) :
      object 'word_freqs' not found

    2. Hi George,

      I am sorry, I didn't get your point. What exactly have you changed in the code?

      for (j in seq(docs))
      docs[[j]] = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", docs[[j]])
      docs[[j]] = gsub("@\\w+", "", docs[[j]])
      docs[[j]] = gsub("http\\w+", "", docs[[j]])
      docs[[j]] = gsub("[ \t]{2,}", "", docs[[j]])
      docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])
      docs[[j]] = gsub("[^\x20-\x7E]", "", docs[[j]])

    3. Hi Deepanshu,

      This code in Step 5
      docs[[j]] = gsub("^\\s+|\\s+quot;, "", docs[[j]])

      is missing the closing " after "^\\s+|\\s+quot;

  7. This is an issue with the version of tm you are using.

    Before running
    tdm = TermDocumentMatrix(docs, control =
    run the following command:

    docs <- tm_map(docs, PlainTextDocument)

    1. Thanks a lot George, it does work perfectly.

      Just one question about the note at the end of the post:

      Note : You can remove sparse terms with the following code :
      tdm.frequent = removeSparseTerms(tdm, 0.1)

      What is the use of this?

      Thanks a lot for your help

  8. One more question: I increased my data, i.e. I included more comments in the .csv file, but now I get only three words in the wordcloud, whereas earlier many more words were displayed in the chart.

    Why does the wordcloud show fewer words after I added more comments?

  9. I'm using a wordcloud in Shiny. If I select one dataset, it should show the wordcloud corresponding to the dataset I selected.

  10. I would appreciate it if you could provide a link to the data file(s).
