VBA Data Import from Google into Excel: Custom Time Ranges

For a VBA application in Excel, I am trying to include the "custom time range" option Google offers when narrowing down a search. So far I am using the following code (see below), which lets me import "resultStats" from Google into Excel for a given search term but lacks the time-range option.
In this specific case, I would need to determine the number of results/articles, e.g. for "Elon Musk", between 01/01/2015 and 12/31/2015. Is there any practicable addition to the code below? And can this also be applied to the Google News tab instead of the regular Google Search results?
Many thanks in advance!
Sub Gethits()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date
    Dim var As String
    Dim var1 As Object
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    Dim cookie As String
    Dim result_cookie As String
    start_time = Time
    Debug.Print "start_time:" & start_time
    For i = 2 To lastRow
        url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set var1 = html.getElementById("resultStats")
        Cells(i, 2).Value = var1.innerText
        DoEvents
    Next
    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

It seems you need URL encoding, so a string as shown below works when you include your cd_max and cd_min parameters. You specify news with the parameter tbm=nws.
As @chillin mentions, you can achieve encoding of parameters with Application.EncodeURL().
I also tried the API method, but with limited success. Although the date range can be passed in the sort parameter, you need to register for an API key, set up a custom search engine and configure your requirements. Results are capped at 10 per query, and there is a call limit for free usage; you can specify a start number to get blocks of 10. You can also see what gets URL encoded by running the query through the Google APIs Explorer for Custom Search. I found it only returned 2 results, which was clearly not in the region of the expected number. A sketch of such an API request follows the code below.
Option Explicit
Public Sub GetResultCount()
    Dim sResponse As String, html As HTMLDocument '<== requires a reference to Microsoft HTML Object Library (VBE > Tools > References)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.co.uk/search?q=elon+musk&safe=strict&biw=1163&bih=571&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2015%2Ccd_max%3A12%2F31%2F2015&tbm=nws", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Debug.Print .querySelector("#resultStats").innerText
    End With
End Sub
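For completeness, here is a minimal, untested sketch of the Custom Search JSON API route described above. YOUR_API_KEY and YOUR_CX are placeholders for the key and custom search engine ID you receive when registering, and the date range is expressed through the sort parameter as mentioned; treat the exact parameter values as assumptions to verify against the API documentation.

'Sketch only: Google Custom Search JSON API with a custom date range.
'Results come back as JSON, max 10 per request, with further pages requested
'via the start parameter.
Public Sub GetApiResultCountSketch()
    Dim url As String, json As String
    url = "https://www.googleapis.com/customsearch/v1" & _
          "?key=YOUR_API_KEY&cx=YOUR_CX" & _
          "&q=" & WorksheetFunction.EncodeUrl("Elon Musk") & _
          "&sort=date:r:20150101:20151231" & _
          "&num=10&start=1"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        json = .responseText
    End With
    'The estimated count should sit in searchInformation.totalResults of the JSON
    'response; parse it with a JSON library (e.g. JSONConverter.bas used later in this thread).
    Debug.Print json
End Sub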

Thanks for your feedback. I have now amended the URL line as follows (applying the Excel ENCODEURL function directly to the input cells of the spreadsheet) and it works perfectly:
url = "https://www.google.com/search?q=" & Cells(i, 1) & "&source=lnt&tbs=cdr%3A1%2Ccd_min%3A" & Cells(i, 2) & "%2Ccd_max%3A" & Cells(i, 3) & "&tbm=nws"

Related

Automated search in google

I have over 20,000 searches I need to do in Google. I want to use VBA to automate the search in Google or Internet Explorer and return the link to Excel. I have tried multiple VBA routines and none of them seem to work. Is there one that will run an automated search and return the link to the first site in the Google search results to Excel? Below is the code I am currently using, but it isn't working. I am searching addresses in column A and need the link returned to column B.
Sub XMLHTTP_Count()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object
    Dim start_time As Date
    Dim end_time As Date
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    Dim cookie As String
    Dim result_cookie As String
    start_time = Time
    Debug.Print "start_time:" & start_time
    For i = 2 To lastRow
        url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        If html.getElementById("resultStats") Is Nothing Then
            str_text = "0 Results"
        Else
            str_text = html.getElementById("resultStats").innerText
        End If
        Cells(i, 2) = str_text
        DoEvents
    Next
    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Well, you don't need the randomizer, and it looks like the 'resultStats' id has changed to 'result-stats'. Try the code below and see if it does what you want.
Sub GetSearchStats()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date
    Dim var As String
    Dim var1 As Object
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    Dim cookie As String
    Dim result_cookie As String
    start_time = Time
    Debug.Print "start_time:" & start_time
    For i = 2 To lastRow
        url = "https://www.google.com/search?q=" & Cells(i, 1)
        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set var1 = html.getElementById("result-stats")
        Cells(i, 2).Value = var1.innerText
        DoEvents
    Next
    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Result:
I think I answered your initial question. This sounds like a new question, and it probably warrants a new post, but I'll go ahead and offer a second answer here to address it.
Sub WebPage()
    Dim internet As Object
    Dim internetdata As Object
    Dim div_result As Object
    Dim header_links As Object
    Dim link As Object
    Dim URL As String
    Set internet = CreateObject("InternetExplorer.Application")
    internet.Visible = True
    URL = "https://www.google.co.in/search?q=how+to+program+in+vba"
    internet.Navigate URL
    Do Until internet.ReadyState >= 4
        DoEvents
    Loop
    Application.Wait Now + TimeSerial(0, 0, 5)
    Set internetdata = internet.Document
    Set div_result = internetdata.getElementById("res")
    Set header_links = div_result.getElementsByTagName("h3")
    For Each h In header_links
        Set link = h.ChildNodes.Item(0)
        Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
    Next
    MsgBox "done"
End Sub
Result:
You can easily convert each text field to a hyperlink if you want to make these all clickable links. Feel free to modify the code to suit your needs.
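For instance, a minimal sketch (not part of the original answer) of turning the URLs already written to column A into clickable links with the worksheet's Hyperlinks collection:

'Sketch: convert the plain-text URLs in column A into clickable hyperlinks.
Sub MakeLinksClickable()
    Dim r As Long, lastRow As Long
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    For r = 1 To lastRow
        If Len(Cells(r, 1).Value) > 0 Then
            ActiveSheet.Hyperlinks.Add Anchor:=Cells(r, 1), _
                                       Address:=Cells(r, 1).Value, _
                                       TextToDisplay:=Cells(r, 1).Value
        End If
    Next r
End Sub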

Excel macro only executing to ~60th row, then failing while making get requests

I modified code I found online to find a ClassName on an HTML page and return its text when doing a Google search. I need to do this for about 10,000 companies, but when testing it with only 100 rows it works and then stops around the ~60th row. After that I am unable to get any results, and the only way I have found to resolve it is to wait about an hour and then execute it again. I tested this on another computer and had the same results and issue. It doesn't have to do with what is in the ~60th row, because I use a different set of 100 companies each test; even changing the loop to i = 2 To 101 still causes the same problem.
Col A would have a company name such as: "Buchart Horn"
Col B returns "Architect in Baltimore, Maryland"
Col C would be blank (that's fine)
Col D returns "Baltimore, Maryland - Buchart Horn: Engineers, Architects and Planners"
I'm very new to VBA so any help would be appreciated. Thank you.
'References enabled:
'Microsoft Internet Controls, Microsoft HTML Object Library
Sub GoogleSearch()
    Dim URL As String
    Dim objHTTP As Object
    Dim htmlDoc As HTMLDocument
    Set htmlDoc = CreateObject("htmlfile")
    Dim objResults1 As Object
    Dim objResults2 As Object
    Dim objResults3 As Object
    On Error Resume Next
    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    For I = 2 To lastRow
        URL = "https://www.google.com/search?q=" & Cells(I, 1)
        Set objHTTP = CreateObject("MSXML2.XMLHTTP")
        With objHTTP
            .Open "GET", URL, False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .send
            htmlDoc.body.innerHTML = .responseText
        End With
        Set objResults1 = htmlDoc.getElementsByClassName("YhemCb")
        Set objResults2 = htmlDoc.getElementsByClassName("wwUB2c kno-fb-ctx")
        Set objResults3 = htmlDoc.getElementsByClassName("LC20lb")
        Cells(I, 2) = objResults1(0).innerText
        Cells(I, 3) = objResults2(0).innerText
        Cells(I, 4) = objResults3(0).innerText
    Next
    Set htmlDoc = Nothing
    Set objResults1 = Nothing
    Set objResults2 = Nothing
    Set objResults3 = Nothing
    Set objHTTP = Nothing
End Sub
The problem here turned out to be too many requests sent too quickly to Google using the GET request below inside a For loop:
With objHTTP
    .Open "GET", URL, False
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    .send
    htmlDoc.body.innerHTML = .responseText
End With
To make these requests at a pace acceptable to the server, we add a pause inside the loop. The easiest built-in way to pause in VBA is:
For I = 2 To lastRow
    ... 'Lines omitted for clarity of purpose
    ...
    Application.Wait (Now + TimeValue("0:00:6"))
Next
The application will then wait until that time before continuing execution. An unfortunate limitation of this method is that one second is the minimum wait time, so smaller delays need a different approach; see the documentation for Application.Wait and for techniques that can create a shorter delay.
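One common alternative for sub-second pauses (a sketch, not from the original answer) is the Windows Sleep API, declared at the top of a standard module:

'Sketch: millisecond-resolution pause via the Windows API, for delays shorter
'than the one-second minimum of Application.Wait.
#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub PausedLoopExample()
    Dim i As Long
    For i = 2 To 10
        '... send the GET request and read the response here ...
        Sleep 500    'pause for half a second between requests
        DoEvents
    Next i
End Sub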

Downloading data from a web page (list) to an Excel

I have to download data from here:
http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda.asp
Then I have to save all the data in Excel. The problem is that I have to choose several dates and several currencies; for example, I have to select 12/31/2018, Dolar, Euro and Pesos. Moreover, I have to choose one currency at a time, and I have many to download.
I've tried Import External Data in Excel, but it didn't work.
I've also tried this VBA code:
Sub descarga_monedas()
    Fecha = "2018.06.05"
    Moneda = 313
    Path = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Evolucion_moneda_3.asp?tipo=E&Fecha=" & Fecha & "&Moneda=" & Moneda & """"
    Application.Workbooks.Open (Path)
End Sub
The page seems to block this kind of code.
Is there any way to solve this?
You can do it the following way. I have grabbed all the dates but used only one of them, in conjunction with all the currencies; to pull every date, add an outer loop over the inputDates collection (a sketch of that loop follows the code below).
Option Explicit
Public Sub GetData()
    Dim body As String, html As HTMLDocument, http As Object, i As Long
    Dim codes As Object, inputCurrency As Object, inputDates As Object, dates As Object
    Const BASE_URL As String = "http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda_3.asp?tipo=E&"
    Set codes = CreateObject("scripting.dictionary")
    Set inputDates = New Collection
    Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object library
    Set http = CreateObject("MSXML2.XMLHTTP")
    With http
        .Open "GET", "http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda.asp", False
        .send
        html.body.innerHTML = .responseText
        Set inputCurrency = html.querySelectorAll("[name=Moneda] option[value]")
        Set dates = html.querySelectorAll("[name=Fecha] option[value]")
        For i = 0 To inputCurrency.Length - 1
            codes(inputCurrency.item(i).innerText) = inputCurrency.item(i).Value
        Next
        For i = 0 To dates.Length - 1
            inputDates.Add dates.item(i).Value
        Next
        Dim fecha As String, moneda As String, key As Variant, downloadURL As String
        Dim clipboard As Object, ws As Worksheet
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        For Each key In codes.keys
            DoEvents
            fecha = inputDates.item(1) '<== use an outer loop over inputDates collection to get each date
            moneda = key
            downloadURL = BASE_URL & "Fecha=" & fecha & "&Moneda=" & moneda '2019.02.11 ,79
            .Open "GET", downloadURL, False
            .send
            html.body.innerHTML = StrConv(http.responseBody, vbUnicode)
            clipboard.SetText html.querySelector("table").outerHTML
            clipboard.PutInClipboard
            Set ws = ThisWorkbook.Worksheets.Add
            ws.Name = fecha & "_" & moneda
            ws.Cells(1, 1).PasteSpecial
        Next
    End With
End Sub
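As mentioned above, here is a minimal, untested sketch of the outer date loop. It reuses the variable names from the procedure above and simply wraps the existing currency loop so every date/currency pair is downloaded to its own sheet; only dateItem is new.

'Sketch: replace the single "For Each key In codes.keys ... Next" block above
'with a nested loop over every date and every currency.
Dim dateItem As Variant
For Each dateItem In inputDates
    For Each key In codes.keys
        DoEvents
        fecha = dateItem
        moneda = key
        downloadURL = BASE_URL & "Fecha=" & fecha & "&Moneda=" & moneda
        http.Open "GET", downloadURL, False
        http.send
        html.body.innerHTML = StrConv(http.responseBody, vbUnicode)
        clipboard.SetText html.querySelector("table").outerHTML
        clipboard.PutInClipboard
        Set ws = ThisWorkbook.Worksheets.Add
        ws.Name = fecha & "_" & moneda
        ws.Cells(1, 1).PasteSpecial
    Next key
Next dateItem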

Excel VBA Scraping - HTML tables are not visible

I am trying to get data from "https://in.tradingview.com/symbols/NSE-ABB/technicals/" using Excel VBA web scraping. Even though I am getting a response, body.innerHTML does not show the required table, yet if I inspect the page in Chrome I can see the table with that name.
What is wrong with the code?
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
WriteTxtFile sResponse
With html
    .body.innerHTML = sResponse
    Set tElementC = .getElementsByClassName("table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-")(0).getElementsByTagName("td")
End With
URL --> https://in.tradingview.com/symbols/NSE-ABB/technicals/
classname to access = "table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-"
The webpage source HTML at the link provided (https://in.tradingview.com/symbols/NSE-ABB/technicals/) doesn't contain the necessary data; the page loads it via AJAX. The website has a sort of API available, and the response is returned in JSON format, so you first need to do some reverse-engineering to find out how the website works. In a browser, e.g. Chrome, press F12 to open DevTools, navigate to the webpage, go to the Network tab and set the filter to XHR; it will look as shown below:
Examine the logged responses. The one with the largest size actually contains all the necessary data:
To make such an XHR yourself, you need to keep the entire payload structure and add the relevant headers:
In the Form Data section there are a lot of quote field titles located within an array, so you may choose which ones you actually need. To find more available titles, click the Initiator link (first screenshot above); you will see the JS code that initiated the XHR. Click Pretty print {} at the bottom to make the code readable, type any title you already pulled from the Form Data into the search box, e.g. Recommend.Other, and find the others next to it in the code:
Here is a VBA example showing how such scraping could be done. Import the JSON.bas module into the VBA project for JSON processing.
Option Explicit

Sub Test()
    Dim aQuoteFieldTitles()
    Dim aQuoteFieldData()
    Dim sPayload As String
    Dim sJSONString As String
    Dim vJSON
    Dim sState As String
    Dim i As Long
    ' Put the necessary field titles into array
    aQuoteFieldTitles = Array( _
"name", "description", "country", "type", "after_tax_margin", "average_volume", "average_volume_30d_calc", "average_volume_60d_calc", "average_volume_90d_calc", "basic_eps_net_income", "beta_1_year", "beta_3_year", "beta_5_year", "current_ratio", "debt_to_assets", "debt_to_equity", "dividends_paid", "dividends_per_share_fq", _
"dividends_yield", "dps_common_stock_prim_issue_fy", "earnings_per_share_basic_ttm", "earnings_per_share_diluted_ttm", "earnings_per_share_forecast_next_fq", "earnings_per_share_fq", "earnings_release_date", "earnings_release_next_date", "ebitda", "enterprise_value_ebitda_ttm", "enterprise_value_fq", "exchange", "expected_annual_dividends", _
"gross_margin", "gross_profit", "gross_profit_fq", "industry", "last_annual_eps", "last_annual_revenue", "long_term_capital", "market_cap_basic", "market_cap_calc", "net_debt", "net_income", "number_of_employees", "number_of_shareholders", "operating_margin", _
"pre_tax_margin", "preferred_dividends", "price_52_week_high", "price_52_week_low", "price_book_ratio", "price_earnings_ttm", "price_revenue_ttm", "price_sales_ratio", "quick_ratio", "return_of_invested_capital_percent_ttm", "return_on_assets", "return_on_equity", "return_on_invested_capital", "revenue_per_employee", "sector", _
"eps_surprise_fq", "eps_surprise_percent_fq", "total_assets", "total_capital", "total_current_assets", "total_debt", "total_revenue", "total_shares_outstanding_fundamental", "volume", "relative_volume", "pre_change", "post_change", "close", "open", "high", "low", "gap", "price_earnings_to_growth_ttm", "price_sales", "price_book_fq", _
"price_free_cash_flow_ttm", "float_shares_outstanding", "total_shares_outstanding", "change_from_open", "change_from_open_abs", "Perf.W", "Perf.1M", "Perf.3M", "Perf.6M", "Perf.Y", "Perf.YTD", "Volatility.W", "Volatility.M", "Volatility.D", "RSI", "RSI7", "ADX", "ADX+DI", "ADX-DI", "ATR", "Mom", "High.All", "Low.All", "High.6M", "Low.6M", _
"High.3M", "Low.3M", "High.1M", "Low.1M", "EMA5", "EMA10", "EMA20", "EMA30", "EMA50", "EMA100", "EMA200", "SMA5", "SMA10", "SMA20", "SMA30", "SMA50", "SMA100", "SMA200", "Stoch.K", "Stoch.D", "MACD.macd", "MACD.signal", "Aroon.Up", "Aroon.Down", "BB.upper", "BB.lower", "goodwill", "debt_to_equity_fq", "CCI20", "DonchCh20.Upper", _
"DonchCh20.Lower", "HullMA9", "AO", "Pivot.M.Classic.S3", "Pivot.M.Classic.S2", "Pivot.M.Classic.S1", "Pivot.M.Classic.Middle", "Pivot.M.Classic.R1", "Pivot.M.Classic.R2", "Pivot.M.Classic.R3", "Pivot.M.Fibonacci.S3", "Pivot.M.Fibonacci.S2", "Pivot.M.Fibonacci.S1", "Pivot.M.Fibonacci.Middle", "Pivot.M.Fibonacci.R1", _
"Pivot.M.Fibonacci.R2", "Pivot.M.Fibonacci.R3", "Pivot.M.Camarilla.S3", "Pivot.M.Camarilla.S2", "Pivot.M.Camarilla.S1", "Pivot.M.Camarilla.Middle", "Pivot.M.Camarilla.R1", "Pivot.M.Camarilla.R2", "Pivot.M.Camarilla.R3", "Pivot.M.Woodie.S3", "Pivot.M.Woodie.S2", "Pivot.M.Woodie.S1", "Pivot.M.Woodie.Middle", "Pivot.M.Woodie.R1", _
"Pivot.M.Woodie.R2", "Pivot.M.Woodie.R3", "Pivot.M.Demark.S1", "Pivot.M.Demark.Middle", "Pivot.M.Demark.R1", "KltChnl.upper", "KltChnl.lower", "P.SAR", "Value.Traded", "MoneyFlow", "ChaikinMoneyFlow", "Recommend.All", "Recommend.MA", "Recommend.Other", "Stoch.RSI.K", "Stoch.RSI.D", "W.R", "ROC", "BBPower", "UO", "Ichimoku.CLine", _
"Ichimoku.BLine", "Ichimoku.Lead1", "Ichimoku.Lead2", "VWMA", "ADR", "RSI[1]", "Stoch.K[1]", "Stoch.D[1]", "CCI20[1]", "ADX-DI[1]", "AO[1]", "Mom[1]", "Rec.Stoch.RSI", "Rec.WR", "Rec.BBPower", "Rec.UO", "Rec.Ichimoku", "Rec.VWMA", "Rec.HullMA9" _
    )
    ' Field titles exactly as in the table MOVING AVERAGES
    ' aQuoteFieldTitles = Array("EMA5", "SMA5", "EMA10", "SMA10", "EMA20", "SMA20", "EMA30", "SMA30", "EMA50", "SMA50", "EMA100", "SMA100", "EMA200", "SMA200", "Ichimoku.BLine", "VWMA", "HullMA9")
    ' Compose payload
    sPayload = "{""symbols"":{""tickers"":[""NSE:ABB""],""query"":{""types"":[]}},""columns"":" & JSON.Serialize(aQuoteFieldTitles) & "}"
    ' Retrieve JSON response
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://scanner.tradingview.com/india/scan", True
        .setRequestHeader "content-type", "application/x-www-form-urlencoded"
        .setRequestHeader "user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
        .setRequestHeader "content-length", Len(sPayload)
        .send (sPayload)
        Do Until .readyState = 4: DoEvents: Loop
        sJSONString = .responseText
    End With
    ' Parse JSON response
    JSON.Parse sJSONString, vJSON, sState
    ' Check response validity
    Select Case True
        Case sState <> "Object"
            MsgBox "Invalid JSON response"
        Case IsNull(vJSON("data"))
            MsgBox vJSON("error")
        Case Else
            ' Output data to worksheet #1
            aQuoteFieldData = vJSON("data")(0)("d")
            With ThisWorkbook.Sheets(1)
                .Cells.Delete
                .Cells.WrapText = False
                For i = 0 To UBound(aQuoteFieldTitles)
                    .Cells(i + 1, 1).Value = aQuoteFieldTitles(i)
                    .Cells(i + 1, 2).Value = aQuoteFieldData(i)
                Next
                .Columns.AutoFit
            End With
            MsgBox "Completed"
    End Select
End Sub
The output for me is as follows:
BTW, a similar approach is applied in the other answers.
As mentioned in the comments, JavaScript has to run on the page to update the required content, and there doesn't appear to be a freely available API, so you can use a browser instead. You need to go to VBE > Tools > References and add a reference to Microsoft Internet Controls.
Option Explicit
Public Sub GetInfo()
    Dim IE As InternetExplorer, ws As Worksheet, hTable As Object, tRow As Object, td As Object, r As Long, c As Long, headers()
    headers = Array("name", "value", "action")
    Set ws = ThisWorkbook.Worksheets("Sheet1"): Set IE = New InternetExplorer
    With IE
        .Visible = True
        .Navigate2 "https://in.tradingview.com/symbols/NSE-ABB/technicals/"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set hTable = IE.document.querySelector("table + .tableWithAction-2OCRQQ8y-")
        ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        For Each tRow In hTable.getElementsByTagName("tr")
            r = r + 1: c = 1
            For Each td In tRow.getElementsByTagName("td")
                ws.Cells(r, c).Value = td.innerText
                c = c + 1
            Next td
        Next tRow
        .Quit
    End With
End Sub

VBA - Number of Google News Search Results

I have a cell that contains something I would like searched in Google News, and I want the code to return the number of results for that search. Currently I have the code below, which I found elsewhere on the site and which does not use Google News, but even then I sometimes get a
runtime error -2147024891 (80070005)
after 70 or so searches and can't run it again.
Sub HawkishSearch()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object
    Dim start_time As Date
    Dim end_time As Date
    lastRow = Range("B" & Rows.Count).End(xlUp).Row
    Dim cookie As String
    Dim result_cookie As String
    start_time = Time
    Debug.Print "start_time:" & start_time
    For i = 2 To lastRow
        url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send
        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        If html.getElementById("resultStats") Is Nothing Then
            str_text = "0 Results"
        Else
            str_text = html.getElementById("resultStats").innerText
        End If
        Cells(i, 3) = str_text
        DoEvents
    Next
    end_time = Time
    Debug.Print "end_time:" & end_time
    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
The best option (IMO) is to use the Google News API and register for an API key. You can then use a query string including your search term and parse the JSON response to get the result count. I do that below and also populate a collection with the article titles and links. I use a JSON parser called JSONConverter.bas, which you download and add to your project; you then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Sample JSON response from API:
The {} denotes a dictionary, which you access by key; the [] denotes a collection, which you access by index or loop over with For Each.
I use the key totalResults to retrieve the total result count from the initial dictionary returned by the API.
I then loop the collection of dictionaries (articles) and pull the story titles and URLs.
You can then inspect the results in the Locals window or print them out.
Sample of results in locals window:
Option Explicit
Public Sub GetStories()
    Dim articles As Collection, article As Object
    Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
    Set finalResults = New Collection
    searchTerm = "Obama"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With
    Debug.Print "total results = " & json("totalResults")
    Set articles = json("articles")
    For Each article In articles
        arr(0) = article("title")
        arr(1) = article("url")
        finalResults.Add arr
    Next
    Stop '<== Delete me later
End Sub
Loop:
If deploying this in a loop, you can use a class, clsHTTP, to hold the XMLHTTP object; this is more efficient than repeatedly creating and destroying it. I give this class a GetString method to retrieve the JSON response from the API, and a GetInfo method to parse the JSON and retrieve the result count and the result URLs and titles.
Example of results structure in locals window:
Class clsHTTP:
Option Explicit
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        GetString = .responseText
    End With
End Function

Public Function GetInfo(ByVal json As Object) As Variant
    Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object
    finalResults(0) = json("totalResults")
    Set articles = json("articles")
    ReDim results(1 To articles.Count, 1 To 2)
    For Each article In articles
        counter = counter + 1
        results(counter, 1) = article("title")
        results(counter, 2) = article("url")
    Next
    finalResults(1) = results
    GetInfo = finalResults
End Function
Standard module:
Option Explicit
Public Sub GetStories()
    Dim http As clsHTTP, json As Object
    Dim finalResults(), searchTerms(), searchTerm As Long, url As String
    Set http = New clsHTTP
    With ThisWorkbook.Worksheets("Sheet1")
        searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
    End With
    ReDim finalResults(1 To UBound(searchTerms))
    For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)
        url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"
        Set json = JsonConverter.ParseJson(http.GetString(url))
        finalResults(searchTerm) = http.GetInfo(json)
        Set json = Nothing
    Next
    Stop '<==Delete me later
End Sub
Otherwise:
I would use the following, where I grab the story links by their class name; I get the count and write the links to a collection.
Option Explicit
Public Sub GetStories()
    Dim sResponse As String, html As HTMLDocument, articles As Collection
    Const BASE_URL As String = "https://news.google.com/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument: Set articles = New Collection
    Dim numberOfStories As Long, nodeList As Object, i As Long
    With html
        .body.innerHTML = sResponse
        Set nodeList = .querySelectorAll(".VDXfz")
        numberOfStories = nodeList.Length
        Debug.Print "number of stories = " & numberOfStories
        For i = 0 To nodeList.Length - 1
            articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
        Next
    End With
    Debug.Print articles.Count
End Sub
Standard Google search:
The following works for an example standard Google search, but you will not always get the same HTML structure, depending on your search term. You would need to provide some failing cases to help determine whether there is a consistent selector method that can be applied.
Option Explicit
Public Sub GetResultsCount()
    Dim sResponse As String, html As HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.com/search?q=mitsubishi", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Debug.Print .querySelector("#resultStats").innerText
    End With
End Sub