Facebook Data Mining using R

Deepanshu Bhalla
In this tutorial, we will see how to extract and analyze Facebook data using R. Facebook has more than 1 billion active users and has gathered the most extensive data set ever about human behavior. In R, we can extract data from Facebook and then analyze it. Social media mining is one of the most interesting areas of data science. For example, you can analyze sentiment around an important event by pulling information about the event from Facebook and exploring the data in R.
Extract Facebook Data using R
Step by Step Guide : Extract Data from Facebook

Step 1 : Facebook Developer Registration

Go to https://developers.facebook.com and register by clicking the Get Started button at the top right of the page (see the snapshot below). This opens a registration form, which you need to fill in to complete your registration.
Facebook  Developer Registration



Step 2 : Add a new App

Once you have registered as shown in Step 1, click the My Apps button (see the snapshot below). Then select Add a New App from the drop-down.

Facebook : My Apps

Then enter a Display Name for the app (type any name) and pick an option from the Category drop-down (choose Education). Press the Create App ID button.
Create a new App

Step 3 : Get App ID and App Secret

In this step, we need to note down our App ID and App Secret (refer to the screenshot below).
Fb App ID and App Secret

Step 4 : OAuth Settings

  1. On the left hand side menu, click on Add Product Button
  2. Click on Facebook Login link
  3. Under Settings, make sure YES is selected in Client OAuth Login
  4. Type http://localhost:1410/ in Valid OAuth redirect URIs box
  5. Click on Save Changes button

OAuth redirect URIs

If you don't enter this information correctly, you will get the following error:
Can't Load URL: The domain of this URL isn't included in the app's domains. To be able to load this URL, add all domains and subdomains of your app to the App Domains field in your app settings. 
Step 5 :  Write R Script

1. Install required packages

Open R and run the following code to install the Rfacebook and RCurl packages.
install.packages("Rfacebook")
install.packages("RCurl")
The Rfacebook package lets you access your Facebook App from R.
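
If you run the script more than once, there is no need to reinstall both packages every time. Here is a minimal sketch that works out which packages are still missing. The helper name missing_packages() is hypothetical and is not part of Rfacebook.

```r
# Hypothetical helper: returns the required packages that are not yet installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

to_install <- missing_packages(c("Rfacebook", "RCurl"))
if (length(to_install) > 0) {
  message("Packages still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install them
}
```

The install.packages() call is left commented out so that the check itself never touches the network.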

2. Load desired packages

In this step, we will load the above installed packages.
library(Rfacebook)
library(RCurl)
3. Paste your app id and app secret below 
fb_oauth <- fbOAuth(app_id="183xxxxxxxx3748", app_secret="7bfxxxxxxxxcf0",extended_permissions = TRUE)
Press ENTER in the R Console or CTRL+ENTER in RStudio.

It returns the following message:
Copy and paste into Site URL on Facebook App Settings: http://localhost:1410/ 
When done, press any key to continue...
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
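
Since the App Secret works like a password, you may prefer not to type it directly into a script that you share. One possible approach, sketched below, is to read both values from environment variables. The variable names FB_APP_ID and FB_APP_SECRET and the helper read_credential() are assumptions made for this example. They are not something Rfacebook requires.

```r
# Hypothetical helper: read a credential from an environment variable and
# stop with a clear message if it has not been set.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set")
  }
  value
}

# FB_APP_ID and FB_APP_SECRET are assumed names, e.g. defined in ~/.Renviron.
# app_id     <- read_credential("FB_APP_ID")
# app_secret <- read_credential("FB_APP_SECRET")
# fb_oauth   <- fbOAuth(app_id = app_id, app_secret = app_secret,
#                       extended_permissions = TRUE)
```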

Authentication in Browser

Authentication Status

4. Check your profile account information
me <- getUsers("me",token=fb_oauth, private_info=TRUE)
me$name
[1] "Deepanshu Bhalla"

Fix : Error

Are you getting the error below?
Error in callAPI(query, token) :  An active access token must be used to query information about the current user.
Facebook recently changed its API, which causes errors in some functions of the Rfacebook package. Use the method below to fix it.

Step 1 : Run the following program
fbOAuth <- function(app_id, app_secret, extended_permissions=FALSE, legacy_permissions=FALSE, scope=NULL)
{
  ## getting callback URL
  full_url <- oauth_callback()
  full_url <- gsub("(.*localhost:[0-9]{1,5}/).*", x=full_url, replacement="\\1")
  message <- paste("Copy and paste into Site URL on Facebook App Settings:",
                   full_url, "\nWhen done, press any key to continue...")
  ## prompting user to introduce callback URL in app page
  invisible(readline(message))
  ## a simplified version of the example in httr package
  facebook <- oauth_endpoint(
    authorize = "https://www.facebook.com/dialog/oauth",
    access = "https://graph.facebook.com/oauth/access_token")
  myapp <- oauth_app("facebook", app_id, app_secret)
  if (is.null(scope)) {
    if (extended_permissions==TRUE){
      scope <- c("user_birthday", "user_hometown", "user_location", "user_relationships",
                 "publish_actions","user_status","user_likes")
    }
    else { scope <- c("public_profile", "user_friends")}
 
    if (legacy_permissions==TRUE) {
      scope <- c(scope, "read_stream")
    }
  }

  if (packageVersion('httr') < "1.2"){
    stop("Rfacebook requires httr version 1.2.0 or greater")
  }

  ## with early httr versions
  if (packageVersion('httr') <= "0.2"){
    facebook_token <- oauth2.0_token(facebook, myapp,
                                     scope=scope)
    fb_oauth <- sign_oauth2.0(facebook_token$access_token)
    if (GET("https://graph.facebook.com/me", config=fb_oauth)$status==200){
      message("Authentication successful.")
    }
  }

  ## less early httr versions
  if (packageVersion('httr') > "0.2" & packageVersion('httr') <= "0.6.1"){
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE)
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    }
  }

  ## httr version from 0.6 to 1.1
  if (packageVersion('httr') > "0.6.1" & packageVersion('httr') < "1.2"){
    Sys.setenv("HTTR_SERVER_PORT" = "1410/")
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE)
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    }
  }

  ## httr version after 1.2
  if (packageVersion('httr') >= "1.2"){
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE)
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    }
  }

  ## identifying API version of token
  error <- tryCatch(callAPI('https://graph.facebook.com/pablobarbera', fb_oauth),
                    error = function(e) e)
  if (inherits(error, 'error')){
    class(fb_oauth)[4] <- 'v2'
  }
  if (!inherits(error, 'error')){
    class(fb_oauth)[4] <- 'v1'
  }

  return(fb_oauth)
}
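
The function above picks a code path by comparing the installed httr version with fixed version numbers. The short sketch below shows why such checks rely on version objects rather than plain text. The version numbers are example values only, not a lookup of your installation.

```r
# package_version() compares versions component by component.
installed <- package_version("1.10.0")   # example value, not a real lookup
required  <- package_version("1.2.0")

# TRUE: in the second component, 10 is greater than 2.
installed >= required

# Comparing a version object with a string converts the string first,
# which is what packageVersion('httr') < "1.2" does in the function above.
package_version("0.6.1") < "1.2"
```

Comparing the same values as plain character strings would go character by character, so "1.10.0" would not be treated as newer than "1.2.0".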

Step 2 :  Run fbOAuth function again

Make sure you put in your own app_id and app_secret before running the code below.
fb_oauth <- fbOAuth(app_id="183385******33748", app_secret="7bf18f8********4cf7def77cf0",extended_permissions = TRUE)

Now, getUsers() function will work.


5. List of all the pages you have liked

Suppose you want to see all the pages you have liked in the past.
likes = getLikes(user="me", token = fb_oauth)
sample(likes$names, 10)
The sample() function lists 10 randomly chosen pages from those you have liked.

 [1] "The Hindu"                  "ADGPI - Indian Army"        "Brain Humor"            
 [4] "Jokes Corner"               "The New York Times"         "Oye! Extra Pen Hai?"    
 [7] "So You Think You Can Dance" "Shankar Tucker"             "Rihanna"                
[10] "Lindsey Stirling"
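
Because sample() draws at random, you get a different list on every run, and it fails if you ask for more items than exist. The sketch below shows one way to make the draw repeatable and safe. It uses a small made-up data frame in place of the real getLikes() result, with page names borrowed from the output above.

```r
# Mock data standing in for the result of getLikes().
likes <- data.frame(
  names = c("The Hindu", "Brain Humor", "Jokes Corner",
            "The New York Times", "Shankar Tucker", "Rihanna"),
  stringsAsFactors = FALSE
)

n_pick <- min(3, nrow(likes))     # never ask for more rows than exist

set.seed(42)                      # fix the random number generator
first_draw <- sample(likes$names, n_pick)

set.seed(42)                      # same seed, so the same draw again
second_draw <- sample(likes$names, n_pick)

identical(first_draw, second_draw)
```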


6. Update Facebook Status from R

You can also update status in Facebook via R.
updateStatus("this is just a test", token=fb_oauth)

7. Search Pages that contain a particular keyword
pages <- searchPages(string="trump", token=fb_oauth, n=200)
In the above code, we ask R to search for pages that contain 'trump' as a keyword. The argument n=200 sets the maximum number of pages to return.

It returns 16 variables. Here is the list of variables:

[1] "id"                  "about"               "category"        
 [4] "description"         "general_info"        "likes"            
 [7] "link"                "city"                "state"            
[10] "country"             "latitude"            "longitude"        
[13] "name"                "talking_about_count" "username"        
[16] "website"
head(pages$name)
[1] "Donald J. Trump"                 "Ivanka Trump"                
[3] "President Donald Trump Fan Club" "President Donald J. Trump"    
[5] "Donald Trump Is My President"    "Donald Trump For President"  
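
Once the search results are in a data frame, you can narrow them down with ordinary R subsetting. The sketch below uses made-up rows with three of the column names listed above (name, category and likes). The page names, categories and counts are invented for illustration.

```r
# Mock data standing in for the result of searchPages().
pages <- data.frame(
  name = c("Page A", "Page B", "Page C", "Page D"),
  category = c("Politician", "Community", "Politician", "Media"),
  likes = c(1500, 320, 9800, 45),
  stringsAsFactors = FALSE
)

# Keep one category, then sort by number of likes, largest first.
politicians <- pages[pages$category == "Politician", ]
politicians <- politicians[order(politicians$likes, decreasing = TRUE), ]
politicians$name
```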


8. Extract list of posts from a Facebook page

Let's look at the posts published by BBC News. The Facebook page name of BBC News is bbcnews.
page <- getPage(page="bbcnews", token=fb_oauth, n=200) 
Posts Details
The above image is truncated. In total, it returns 11 variables. Here is the list of variables:

 [1] "from_id"        "from_name"      "message"        "created_time"
 [5] "type"           "link"           "id"             "story"
 [9] "likes_count"    "comments_count" "shares_count"

9. Get all the posts from a particular date

You can also specify the start and end dates of the posts you want to extract.
page <- getPage("bbcnews", token=fb_oauth, n=100,
                since='2016/06/01', until='2017/03/20')
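
After downloading posts for a date range, you can use the created_time column to check how the posts are spread over time. The sketch below uses made-up rows and assumes created_time arrives as text in the form shown. This format is an assumption about the API response, so check it against your own data.

```r
# Mock data standing in for the result of getPage().
page <- data.frame(
  created_time = c("2016-06-15T09:30:00+0000",
                   "2016-06-28T18:00:00+0000",
                   "2017-03-18T14:05:00+0000"),
  stringsAsFactors = FALSE
)

# Parse the text timestamps; the trailing "+0000" is matched literally.
page$datetime <- as.POSIXct(page$created_time,
                            format = "%Y-%m-%dT%H:%M:%S+0000", tz = "GMT")

# Count posts per calendar month.
page$month <- format(page$datetime, "%Y-%m")
posts_per_month <- table(page$month)
posts_per_month
```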

10. Which of these posts got maximum likes?

To find the most popular BBC News post, you can run the following lines of code.
summary = page[which.max(page$likes_count),]
summary$message
[1] "Could circular runways take off? (via BBC World Hacks)"

11. Which of these posts got maximum comments?

Some posts are not very popular in terms of likes but attract the most comments. This might be because they are controversial.
summary1 = page[which.max(page$comments_count),]
summary1$message
[1] "When Angela Merkel met Donald J. Trump, did her reactions speak louder than words?"

12. Which post was shared the most?
summary2 = page[which.max(page$shares_count),]
summary2$message
[1] "Islam will be the world's largest religion by 2070, new research suggests."

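The which.max() calls in steps 10 to 12 return only the single top post. If you want the top few posts instead, one option is to sort the data frame and take the first rows. The sketch below uses made-up rows, and the helper top_posts() is a hypothetical name, not an Rfacebook function.

```r
# Mock data standing in for the result of getPage().
page <- data.frame(
  message = c("Post A", "Post B", "Post C", "Post D"),
  likes_count = c(120, 950, 430, 75),
  comments_count = c(300, 40, 15, 8),
  shares_count = c(10, 60, 220, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n rows with the largest values in `column`.
top_posts <- function(df, column, n = 3) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

top_posts(page, "likes_count", n = 2)$message
top_posts(page, "comments_count", n = 1)$message
```
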
13. Extract a list of users who liked the maximum liked posts

For marketing or for growing a website, it is very useful to know which users liked a certain post.
post <- getPost(summary$id[1], token=fb_oauth, comments = FALSE, n.likes=2000)
To view the list of people:
likes <- post$likes
head(likes)
Result:
from_name        from_id
Tommy Johnson    10154527932013108
Mirtunjay Raj    399490251425210
Sony Joseph      142559101272027

Note: I have edited the IDs to maintain privacy.


14. Extract FB comments on a specific post

To know what users think about a post, it is important to analyze their comments.
post <- getPost(page$id[1], token=fb_oauth, n.comments=1000, likes=FALSE)
comments <- post$comments
fix(comments)

15. What is the comment that got the most likes?
comments[which.max(comments$likes_count),]
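
Beyond finding the most liked comment, a simple first step toward understanding what users think is to count the words they use. The sketch below does this with base R on made-up comments. It assumes the comment text sits in a column named message, and the four-letter cutoff for filler words is an arbitrary choice.

```r
# Mock data standing in for post$comments.
comments <- data.frame(
  message = c("Great idea, great design",
              "Not a great idea",
              "Would this design be safe?"),
  stringsAsFactors = FALSE
)

text  <- tolower(paste(comments$message, collapse = " "))
text  <- gsub("[^a-z ]", " ", text)      # replace punctuation and digits
words <- strsplit(text, " +")[[1]]       # split on runs of spaces
words <- words[nchar(words) > 3]         # drop short filler words

word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 3)
```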
16. What are the most common first names in the user list?
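The users data frame is not created in the earlier steps. One way to build it, assuming likes still holds the list of users from step 13, is to look those users up first.
users <- getUsers(likes$from_id, token=fb_oauth)
head(sort(table(users$first_name), decreasing=TRUE), n=3)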
  David   John Daniel
    14     13     10

17. Extract Reactions for most recent post

Facebook has more than a Like button. Last year, it launched emoji reactions. If a post gets 1k likes, it does not mean that everyone really loves the post. The reaction can also be happy, sad or angry.
post <- getReactions(post=page$id[1], token=fb_oauth)
love_count = 60, haha_count = 286, wow_count = 62, sad_count = 169, angry_count = 532
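
Raw counts are easier to compare as percentages. The sketch below takes the five reaction counts shown above and works out each reaction's share of the total. The like count is left out because it is not shown in the output.

```r
# Reaction counts copied from the output above.
reactions <- c(love_count = 60, haha_count = 286, wow_count = 62,
               sad_count = 169, angry_count = 532)

# Share of each reaction as a percentage of these five reactions.
shares <- round(100 * reactions / sum(reactions), 1)
shares

# Name of the most frequent reaction.
names(which.max(reactions))
```

For this post, angry is the most frequent of the five reactions.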

18. Get Posts of a particular group

First, the searchGroup() function finds the ID of the group from which you want to pull posts. The group ID is then used as an input value in the getGroup() function.
# Extract posts from Machine Learning Facebook group
ids <- searchGroup(name="machinelearningforum", token=fb_oauth)
group <- getGroup(group_id=ids[1,]$id, token=fb_oauth, n=25)
If the searchGroup() function cannot find the group ID, you can look it up on the lookup-id website.
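
When the search finds nothing, passing an empty result on to getGroup() would fail in a confusing way. The sketch below shows one possible guard. The helper first_group_id() is hypothetical, and the exact shape of an empty searchGroup() result is an assumption, so the check covers both NULL and a data frame with zero rows.

```r
# Hypothetical helper: return the first group id, or NA when nothing matched.
first_group_id <- function(ids) {
  if (is.null(ids) || nrow(ids) == 0) {
    return(NA_character_)
  }
  as.character(ids$id[1])
}

# Mock search results: one with a match and one without.
found <- data.frame(id = "1234567890", name = "Example group",
                    stringsAsFactors = FALSE)
empty <- found[0, ]

first_group_id(found)
first_group_id(empty)
```

You can then call getGroup() only when the returned value is not NA.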

End Notes

Social text mining has gained a lot of interest in the last couple of years. Many companies now analyze customers' opinions about their products and what customers say about the company on social media. This helps the marketing team define marketing strategies and the development team adapt upcoming products based on customer feedback.
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
