How to Scrape Google News into Excel

Deepanshu Bhalla Add Comment ,

This tutorial explains how to scrape Google News into Excel using VBA.

You can download the workbook by clicking on the link below.

VBA Code : Google News Scraper

The code below prompts the user to enter the topic for which they want related articles from Google News. It returns the following information about the articles.

  1. Title
  2. Source
  3. Time
  4. Author
  5. Link
Web Scrape Google News in MS Excel
Sub GetNewsData()
    Dim query As String
    Dim url As String
    Dim xmlHttp As Object
    Dim htmlDoc As Object
    Dim articles As Object
    Dim article As Object
    Dim links As Collection
    Dim link As String
    Dim mytext As String
    Dim newsText As Collection
    Dim newsTextSplit As Variant
    Dim newsData As Collection
    Dim data As Collection

    ' Search Query
    query = InputBox("Please enter topic for news articles:", "User Input")
    If query = "" Then
        MsgBox "You didn't enter anything."
        Exit Sub
    End If

    ' Encode special characters in a text string
    query = EncodeSpecialCharacters(query)
    url = "https://news.google.com/search?q=" & query & "&hl=en-US&gl=US&ceid=US%3Aen"

    ' Create a new XML HTTP request
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
    xmlHttp.Open "GET", url, False
    xmlHttp.send

    ' Create a new HTML document
    Set htmlDoc = CreateObject("htmlfile")
    htmlDoc.body.innerHTML = xmlHttp.responseText

    ' Find all articles
    Set articles = htmlDoc.getElementsByTagName("article")

    ' Initialize collections
    Set links = New Collection
    Set newsData = New Collection

    ' Loop through each article
    For Each article In articles
        ' Get the link
        link = article.getElementsByTagName("a")(0).href
        link = Replace(link, "about:", "")
        link = Replace(link, "./articles/", "https://news.google.com/articles/")
        links.Add link

        ' Get the news text
        mytext = CleanTrim(article.innerText)

        ' Split the news text into lines
        newsTextSplit = Split(mytext, "\n")
        
        ' Add the data to the collection
        Set data = New Collection
        data.Add IIf(UBound(newsTextSplit) >= 2, Trim(newsTextSplit(2)), "Missing") ' Title
        data.Add Trim(newsTextSplit(0)) ' Source
        
        If UBound(newsTextSplit) >= 3 Then
            data.Add Trim(newsTextSplit(3)) ' Time
        Else
            data.Add "Missing"
        End If
            
        If UBound(newsTextSplit) >= 4 Then
            data.Add Trim(Split(newsTextSplit(4), "By ")(UBound(Split(newsTextSplit(4), "By ")))) ' Author
        Else
            data.Add "Missing"
        End If

        data.Add link ' Link
        newsData.Add data

    Next article

    Dim ws As Worksheet
    Dim mydata As Collection
    Dim i As Integer
    Dim j As Integer
    Dim response As VbMsgBoxResult

    ' Set the active worksheet
    Set ws = ActiveSheet

    ' Check if the sheet has data
    If WorksheetFunction.CountA(ws.UsedRange) > 0 Then
        ' Prompt the user to replace the data
        response = MsgBox("The active sheet contains data. Do you want to replace it?", vbQuestion + vbYesNo, "Replace Data")
        If response = vbYes Then
            ' Clear the data
            ws.Cells.ClearContents
         Else
           MsgBox "Data was not replaced. No New Data Added.", vbInformation, "Canceled"
         Exit Sub
        End If
    End If
    
    With ws
        .Cells(1, 1).Value = "Title"
        .Cells(1, 2).Value = "Source"
        .Cells(1, 3).Value = "Time"
        .Cells(1, 4).Value = "Author"
        .Cells(1, 5).Value = "Link"
    End With

    ' Get the data from the newsData collection
    Set mydata = newsData

    ' Write the data to the worksheet
    For i = 1 To mydata.Count
        For j = 1 To mydata(i).Count
            ws.Cells(i + 1, j).Value = mydata(i)(j)
        Next j
    Next i
End Sub


Function EncodeSpecialCharacters(text As String) As String
    Dim encodedText As String
    Dim char As String
    Dim i As Integer

    ' Encode the special characters
    encodedText = ""
    For i = 1 To Len(text)
        char = Mid(LCase(text), i, 1)
        Select Case char
            Case "&"
                encodedText = encodedText & "%26"
            Case "="
                encodedText = encodedText & "%3D"
            Case "+"
                encodedText = encodedText & "%2B"
            Case " "
                encodedText = encodedText & "%20"
            Case Else
                encodedText = encodedText & char
        End Select
    Next i

    EncodeSpecialCharacters = encodedText
End Function

Function CleanTrim(ByVal S As String) As String
  Dim X As Long, CodesToClean As Variant
  CodesToClean = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, _
                       21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 129, 141, 143, 144, 157, 160)
  For X = LBound(CodesToClean) To UBound(CodesToClean)
    If InStr(S, Chr(CodesToClean(X))) Then S = Replace(S, Chr(CodesToClean(X)), "\n")
  Next
  ' Replace multiple "\n" with a single "\n"
  temp = Replace(S, "\n\n", "\n")
  While InStr(temp, "\n\n") > 0
    temp = Replace(temp, "\n\n", "\n")
  Wend
  
  If Left(temp, 2) = "\n" Then temp = Mid(temp, 3, Len(temp) - 2)
  CleanTrim = WorksheetFunction.Trim(temp)
   
End Function
Steps to Enter and Run VBA Code
  1. Open a New Excel Workbook.
  2. Press Alt + F11 to access the Visual Basic Editor (or click on the 'Visual Basic' icon in the 'Developer' tab)
  3. Select Insert > Module.
  4. In the module window that appears, enter the above VBA code.
  5. Close the Visual Basic Editor window if it's still open.
  6. In Excel, press Alt + F8 shortcut key and then select the GetNewsData macro and hit 'Run' button to run macro.
How to Customize the above VBA Code

If you don't want the user prompt to enter the topic for articles every time you run the code, you can hard code it by replacing the code - query = InputBox("Please enter topic for news articles:", "User Input") with query = "Enter Your Topic"

Explanation
  1. Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
    This line creates an XML HTTP request object which is used to fetch news articles' information from Google News.
  2. Set htmlDoc = CreateObject("htmlfile")
    This object will be used to parse and manipulate the HTML content.
  3. htmlDoc.body.innerHTML = xmlHttp.responseText
    It fills the HTML document with the content fetched from the Google News.
  4. Set articles = htmlDoc.getElementsByTagName("article")
    This line extracts all the HTML elements with the tag name "article" from HTML. Every news article is wrapped within this tag.
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 0 Response to "How to Scrape Google News into Excel"
Next → ← Prev