Excel VBA Scraping: HTML tables are not visible

I am trying to get data from "https://in.tradingview.com/symbols/NSE-ABB/technicals/" using Excel VBA web scraping. Even though I am getting a response, body.innerHTML does not contain the required table, yet in Chrome, if I inspect the page, I can see the table with that class name.
What is wrong with the code?
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
WriteTxtFile sResponse
With html
.body.innerHTML = sResponse
Set tElementC = .getElementsByClassName("table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-")(0).getElementsByTagName("td")
End With
URL --> https://in.tradingview.com/symbols/NSE-ABB/technicals/
classname to access = "table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-"

The page source returned for https://in.tradingview.com/symbols/NSE-ABB/technicals/ doesn't contain the necessary data, because the page loads it via AJAX. The website has a kind of API available, which returns its responses in JSON format, so you first need to do some reverse engineering to find out how the website works. In a browser, e.g. Chrome, press F12 to open DevTools, navigate to the web page, go to the Network tab and set the filter to XHR.
Examine the logged responses. The one with the largest size actually contains all the necessary data.
To make such an XHR yourself, you need to keep the entire payload structure and add the relevant headers.
In the Form Data section there are a lot of quote field titles located within the array, so you can choose the ones you actually need. To find more available titles, click the Initiator link of that request; you will see the JS code which initiated that XHR. Click Pretty print {} at the bottom to make the code readable. Type any title you have already pulled out of the Form Data into the search box, e.g. Recommend.Other, and find others next to it in the code.
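If you would rather not depend on JSON.bas just to build the request body, it can be composed by hand. The following is a minimal sketch. It assumes the payload shape used in the answer's code (symbols.tickers, symbols.query.types, columns) and field titles that need only quote and backslash escaping. BuildScanPayload and JsonQuote are hypothetical helper names, not part of any library.

```vba
Option Explicit

' Quote a string as a JSON string literal.
' Escapes backslash and double quote only; assumes no control characters in the input.
Public Function JsonQuote(ByVal s As String) As String
    s = Replace$(s, "\", "\\")
    s = Replace$(s, """", "\""")
    JsonQuote = """" & s & """"
End Function

' Build the scanner request body for one ticker and a non-empty list of column titles.
Public Function BuildScanPayload(ByVal ticker As String, ByVal columns As Variant) As String
    Dim parts() As String, i As Long
    ReDim parts(LBound(columns) To UBound(columns))
    For i = LBound(columns) To UBound(columns)
        parts(i) = JsonQuote(CStr(columns(i)))
    Next i
    BuildScanPayload = "{""symbols"":{""tickers"":[" & JsonQuote(ticker) & "],""query"":{""types"":[]}}," & _
                       """columns"":[" & Join(parts, ",") & "]}"
End Function
```

For example, BuildScanPayload("NSE:ABB", aQuoteFieldTitles) should yield a body of the same shape as the sPayload line in the macro.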
Here is a VBA example showing how such scraping could be done. Import the JSON.bas module into the VBA project for JSON processing.
Option Explicit
Sub Test()
Dim aQuoteFieldTitles()
Dim aQuoteFieldData()
Dim sPayload As String
Dim sJSONString As String
Dim vJSON
Dim sState As String
Dim i As Long
' Put the necessary field titles into array
aQuoteFieldTitles = Array( _
"name", "description", "country", "type", "after_tax_margin", "average_volume", "average_volume_30d_calc", "average_volume_60d_calc", "average_volume_90d_calc", "basic_eps_net_income", "beta_1_year", "beta_3_year", "beta_5_year", "current_ratio", "debt_to_assets", "debt_to_equity", "dividends_paid", "dividends_per_share_fq", _
"dividends_yield", "dps_common_stock_prim_issue_fy", "earnings_per_share_basic_ttm", "earnings_per_share_diluted_ttm", "earnings_per_share_forecast_next_fq", "earnings_per_share_fq", "earnings_release_date", "earnings_release_next_date", "ebitda", "enterprise_value_ebitda_ttm", "enterprise_value_fq", "exchange", "expected_annual_dividends", _
"gross_margin", "gross_profit", "gross_profit_fq", "industry", "last_annual_eps", "last_annual_revenue", "long_term_capital", "market_cap_basic", "market_cap_calc", "net_debt", "net_income", "number_of_employees", "number_of_shareholders", "operating_margin", _
"pre_tax_margin", "preferred_dividends", "price_52_week_high", "price_52_week_low", "price_book_ratio", "price_earnings_ttm", "price_revenue_ttm", "price_sales_ratio", "quick_ratio", "return_of_invested_capital_percent_ttm", "return_on_assets", "return_on_equity", "return_on_invested_capital", "revenue_per_employee", "sector", _
"eps_surprise_fq", "eps_surprise_percent_fq", "total_assets", "total_capital", "total_current_assets", "total_debt", "total_revenue", "total_shares_outstanding_fundamental", "volume", "relative_volume", "pre_change", "post_change", "close", "open", "high", "low", "gap", "price_earnings_to_growth_ttm", "price_sales", "price_book_fq", _
"price_free_cash_flow_ttm", "float_shares_outstanding", "total_shares_outstanding", "change_from_open", "change_from_open_abs", "Perf.W", "Perf.1M", "Perf.3M", "Perf.6M", "Perf.Y", "Perf.YTD", "Volatility.W", "Volatility.M", "Volatility.D", "RSI", "RSI7", "ADX", "ADX+DI", "ADX-DI", "ATR", "Mom", "High.All", "Low.All", "High.6M", "Low.6M", _
"High.3M", "Low.3M", "High.1M", "Low.1M", "EMA5", "EMA10", "EMA20", "EMA30", "EMA50", "EMA100", "EMA200", "SMA5", "SMA10", "SMA20", "SMA30", "SMA50", "SMA100", "SMA200", "Stoch.K", "Stoch.D", "MACD.macd", "MACD.signal", "Aroon.Up", "Aroon.Down", "BB.upper", "BB.lower", "goodwill", "debt_to_equity_fq", "CCI20", "DonchCh20.Upper", _
"DonchCh20.Lower", "HullMA9", "AO", "Pivot.M.Classic.S3", "Pivot.M.Classic.S2", "Pivot.M.Classic.S1", "Pivot.M.Classic.Middle", "Pivot.M.Classic.R1", "Pivot.M.Classic.R2", "Pivot.M.Classic.R3", "Pivot.M.Fibonacci.S3", "Pivot.M.Fibonacci.S2", "Pivot.M.Fibonacci.S1", "Pivot.M.Fibonacci.Middle", "Pivot.M.Fibonacci.R1", _
"Pivot.M.Fibonacci.R2", "Pivot.M.Fibonacci.R3", "Pivot.M.Camarilla.S3", "Pivot.M.Camarilla.S2", "Pivot.M.Camarilla.S1", "Pivot.M.Camarilla.Middle", "Pivot.M.Camarilla.R1", "Pivot.M.Camarilla.R2", "Pivot.M.Camarilla.R3", "Pivot.M.Woodie.S3", "Pivot.M.Woodie.S2", "Pivot.M.Woodie.S1", "Pivot.M.Woodie.Middle", "Pivot.M.Woodie.R1", _
"Pivot.M.Woodie.R2", "Pivot.M.Woodie.R3", "Pivot.M.Demark.S1", "Pivot.M.Demark.Middle", "Pivot.M.Demark.R1", "KltChnl.upper", "KltChnl.lower", "P.SAR", "Value.Traded", "MoneyFlow", "ChaikinMoneyFlow", "Recommend.All", "Recommend.MA", "Recommend.Other", "Stoch.RSI.K", "Stoch.RSI.D", "W.R", "ROC", "BBPower", "UO", "Ichimoku.CLine", _
"Ichimoku.BLine", "Ichimoku.Lead1", "Ichimoku.Lead2", "VWMA", "ADR", "RSI[1]", "Stoch.K[1]", "Stoch.D[1]", "CCI20[1]", "ADX-DI[1]", "AO[1]", "Mom[1]", "Rec.Stoch.RSI", "Rec.WR", "Rec.BBPower", "Rec.UO", "Rec.Ichimoku", "Rec.VWMA", "Rec.HullMA9" _
)
' Field titles exactly as in the table MOVING AVERAGES
' aQuoteFieldTitles = Array("EMA5", "SMA5", "EMA10", "SMA10", "EMA20", "SMA20", "EMA30", "SMA30", "EMA50", "SMA50", "EMA100", "SMA100", "EMA200", "SMA200", "Ichimoku.BLine", "VWMA", "HullMA9")
' Compose payload
sPayload = "{""symbols"":{""tickers"":[""NSE:ABB""],""query"":{""types"":[]}},""columns"":" & JSON.Serialize(aQuoteFieldTitles) & "}"
' Retrieve JSON response
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", "https://scanner.tradingview.com/india/scan", True
.setRequestHeader "content-type", "application/x-www-form-urlencoded"
.setRequestHeader "user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
.setRequestHeader "content-length", Len(sPayload)
.send (sPayload)
Do Until .readyState = 4: DoEvents: Loop
sJSONString = .responseText
End With
' Parse JSON response
JSON.Parse sJSONString, vJSON, sState
' Check response validity
Select Case True
Case sState <> "Object"
MsgBox "Invalid JSON response"
Case IsNull(vJSON("data"))
MsgBox vJSON("error")
Case Else
' Output data to worksheet #1
aQuoteFieldData = vJSON("data")(0)("d")
With ThisWorkbook.Sheets(1)
.Cells.Delete
.Cells.WrapText = False
For i = 0 To UBound(aQuoteFieldTitles)
.Cells(i + 1, 1).Value = aQuoteFieldTitles(i)
.Cells(i + 1, 2).Value = aQuoteFieldData(i)
Next
.Columns.AutoFit
End With
MsgBox "Completed"
End Select
End Sub
Running the macro writes each field title to column A of the first worksheet and the corresponding value to column B.
By the way, the same approach has been applied in other answers.
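The output loop in that macro writes one cell at a time. As an alternative (this is not the original answer's method), the titles and values can be paired into a 1-based 2-D array and written with a single Range assignment, which is usually faster for large field lists. PairTitlesWithValues is a hypothetical helper, and it assumes both arrays have the same number of elements.

```vba
Option Explicit

' Pair each field title with its value in a 1-based, two-column array so the whole
' block can be written to a worksheet in one assignment.
' Assumes titles and values have the same number of elements.
Public Function PairTitlesWithValues(ByVal titles As Variant, ByVal values As Variant) As Variant
    Dim n As Long, i As Long, result() As Variant
    n = UBound(titles) - LBound(titles) + 1
    ReDim result(1 To n, 1 To 2)
    For i = 0 To n - 1
        result(i + 1, 1) = titles(LBound(titles) + i)
        result(i + 1, 2) = values(LBound(values) + i)
    Next i
    PairTitlesWithValues = result
End Function
```

Usage inside the Case Else branch might look like: pairs = PairTitlesWithValues(aQuoteFieldTitles, aQuoteFieldData), then .Range("A1").Resize(UBound(pairs, 1), 2).Value = pairs.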

As mentioned in the comments, JavaScript has to run on the page to update the required content, and there doesn't appear to be a freely available API. You can automate a browser instead: go to VBE > Tools > References and add a reference to Microsoft Internet Controls.
Option Explicit
Public Sub GetInfo()
Dim IE As InternetExplorer, ws As Worksheet, hTable As Object, tRow As Object, td As Object, r As Long, c As Long, headers()
headers = Array("name", "value", "action")
Set ws = ThisWorkbook.Worksheets("Sheet1"): Set IE = New InternetExplorer
With IE
.Visible = True
.Navigate2 "https://in.tradingview.com/symbols/NSE-ABB/technicals/"
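While .Busy Or .readyState < 4: DoEvents: Wend
' The table is rendered by JavaScript after the load completes, so poll for it for up to 10 seconds
Dim t As Single
t = Timer
Do
DoEvents
On Error Resume Next
Set hTable = .document.querySelector("table + .tableWithAction-2OCRQQ8y-")
On Error GoTo 0
If Timer - t > 10 Then Exit Do
Loop While hTable Is Nothing
If hTable Is Nothing Then .Quit: Exit Sub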
ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
For Each tRow In hTable.getElementsByTagName("tr")
r = r + 1: c = 1
For Each td In tRow.getElementsByTagName("td")
ws.Cells(r, c).Value = td.innerText
c = c + 1
Next td
Next tRow
.Quit
End With
End Sub

Related

Web scraping a hyperlinked page

I want to scrape data from a local website. The code collects the table on that page, but I also want to collect the data behind the hyperlinks in the table's "Name of VO/NGO" field.
This is the main table. The other fields I want are on the page that appears when you click "Name of VO/NGO".
I read material online but couldn't correct the code.
The output should contain one such record per NGO, starting with the first one.
Option Explicit
Public Sub GetInfo()
Const URL As String = "https://ngodarpan.gov.in/index.php/home/statewise_ngo/76/35/1"
Dim html As HTMLDocument, hTable As HTMLTable, ws As Worksheet, headers()
headers = Array("Sr No.", "Name of VO/NGO", "Address", "City", "State", "Telephone", "Mobile No.", "Website", "Email")
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
html.body.innerHTML = .responseText
End With
Set hTable = html.querySelector("table.dvdtbl")
Dim td As Object, tr As Object, r As Long, c As Long
r = 1
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
For Each tr In hTable.getElementsByTagName("tr")
r = r + 1: c = 1
If r > 3 Then
For Each td In tr.getElementsByTagName("td")
.Cells(r - 2, c) = IIf(c = 2, "'" & td.innerText, td.innerText)
c = c + 1
Next
End If
Next
End With
End Sub
There are several things to do to achieve the results you are after.
1) Parse the id number from each link so it can be reused in the POST requests.
2) Fetch a CSRF token from https://ngodarpan.gov.in/index.php/ajaxcontroller/get_csrf to send with each POST request.
3) Finally, use a JSON converter or the Script Control to dig the individual fields out of the JSON response.
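These steps can be factored into small, testable helpers. This sketch reuses the same string-splitting rules as the answer's code. The onclick format show_ngo_info("…") and the one-field CSRF reply {"<key>":"<token>"} are assumptions inferred from those Split calls. The function names are hypothetical.

```vba
Option Explicit

' Pull the record id out of an onclick attribute such as: show_ngo_info("12345")
' (format assumed from the Split calls in the answer's code).
Public Function ExtractNgoId(ByVal onclickValue As String) As String
    ExtractNgoId = Split(Split(onclickValue, "(""")(1), """)")(0)
End Function

' Pull the token out of a one-field JSON reply shaped like {"<key>":"<value>"}.
' Takes everything after the first colon, drops quotes and cuts at the closing brace.
Public Function ExtractCsrfToken(ByVal jsonText As String) As String
    Dim afterColon As String
    afterColon = Mid$(jsonText, InStr(1, jsonText, ":") + 1)
    ExtractCsrfToken = Split(Replace$(afterColon, """", vbNullString), "}")(0)
End Function

' Build the form-encoded body for the show_ngo_info POST request.
Public Function BuildNgoInfoBody(ByVal ngoId As String, ByVal csrfToken As String) As String
    BuildNgoInfoBody = "id=" & ngoId & "&csrf_test_name=" & csrfToken
End Function
```

In the loop, the POST line could then read .send BuildNgoInfoBody(col, ExtractCsrfToken(.responseText)), with the token taken from the get_csrf response.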
The following attempt fetches the JSON response. All that remains is to parse the JSON to meet your requirements:
Sub FetchTabularInfo()
Dim Http As New XMLHTTP60, Html As New HTMLDocument
Dim col As Variant, icol As New Collection
Dim csrf As Variant, I&
With Http
.Open "GET", "https://ngodarpan.gov.in/index.php/home/statewise_ngo/76/35/1", False
.send
Html.body.innerHTML = .responseText
End With
With Html.querySelectorAll(".table tr a[onclick^='show_ngo_info']")
For I = 0 To .Length - 1
icol.Add Split(Split(.item(I).getAttribute("onclick"), "(""")(1), """)")(0)
Next I
End With
For Each col In icol
With Http
.Open "GET", "https://ngodarpan.gov.in/index.php/ajaxcontroller/get_csrf", False
.send
csrf = .responseText
End With
csrf = Split(Replace(Split(csrf, ":")(1), """", ""), "}")(0)
With Http
.Open "POST", "https://ngodarpan.gov.in/index.php/ajaxcontroller/show_ngo_info", False
.setRequestHeader "X-Requested-With", "XMLHttpRequest"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.send "id=" & col & "&csrf_test_name=" & csrf
End With
Debug.Print Http.responseText
Next col
End Sub
Output for the first record:
{"status":1,"infor":{"0":{"UniqueID":"AN\/2017\/0161456","Mobile":"9476076176","Email":"anaportblair#gmail.com","ngo_url":"http:\/\/www.adityanatyaacademy.com","ngo_name":"AdityaNatyaAcademy","pan_updDocId":"220156","reg_updDocId":"221361","Off_phone1":null,"Major_Activities1":".Drama\nJatrapala\nStreetplays\nAwareness Programe"},"issues_working_db":"","operational_states_db":"ANDAMAN & NICOBAR ISLANDS, ","operational_district_db":"ANDAMAN & NICOBAR ISLANDS->South Andaman, "},"member_info":[{"SalCode":null,"FName":"ASHUTOSH KARMAKAR","MName":null,"LName":null,"DesigName":"President","EmailId":"nicorajberg#gmail.com","MobileNo":"9434262953","pan_updDocId":"223392","aadhaar_updDocId":"223393"},{"SalCode":null,"FName":"KAVERI DEBSHARMA","MName":null,"LName":null,"DesigName":"Member","EmailId":"rajeshdebsharma#gmail.com","MobileNo":"9474299901","pan_updDocId":"223400","aadhaar_updDocId":"223401"},{"SalCode":null,"FName":"SATYAJIT BAIN","MName":null,"LName":null,"DesigName":"Asisstant Secretary","EmailId":"anaportblair#gmail.com","MobileNo":"9434271746","pan_updDocId":"223408","aadhaar_updDocId":"223409"}],"registeration_info":[{"nr_orgName":"AdityaNatyaAcademy","nr_add":"31 M.G. Road,\nOpp. Sun Sea Resort,\nMiddle Point.","nr_city":"Port Blair","StateName":"ANDAMAN & NICOBAR ISLANDS","reg_name":"Registrar of Companies","TypeDescription":"Registered Societies (Non-Government)","nr_regNo":"888","nr_updDocId":"0","nr_actName":"Society Registration Act 1860","nr_isFcra":"N","fcrano":"","ngo_reg_date":"05-12-1995"}],"source_info":[{"sourcefund":"S","deptt_name":"Directorate of Art and Culture","purpose":"To Promote Art and Culture in Andaman and Nicobar Islands.","datefrom":"2013-04-01","dateto":"2014-03-31","amount_sanctioned":"25000"},{"sourcefund":"S","deptt_name":"Directorate of Art and Culture","purpose":"To promote Art and Culture","datefrom":"2014-04-01","dateto":"2015-03-31","amount_sanctioned":"25000"},{"sourcefund":"S","deptt_name":"Directorate of Art and Culture","purpose":"To promote Art and Culture","datefrom":"2015-04-01","dateto":"2016-03-31","amount_sanctioned":"35000"},{"sourcefund":"S","deptt_name":"Directorate of Art and Culture","purpose":"To promote Art and Culture","datefrom":"2016-04-01","dateto":"2017-03-31","amount_sanctioned":"25000"}]}
References to add (VBE > Tools > References) to run the above script:
Microsoft HTML Object Library
Microsoft XML, v6.0
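For the final step, digging fields out of the JSON, a real JSON parser is the robust choice. JsonConverter.bas, used in another answer on this page, is one such parser. If you only need a few flat string fields such as ngo_name or Mobile, a regular-expression sketch may be enough. Be aware that this is not a JSON parser: it returns the first match anywhere in the text, ignores nesting, does not unescape sequences like \/ and assumes the key has no regex metacharacters. GetJsonStringValue is a hypothetical name.

```vba
Option Explicit

' Crude extraction of the first string value stored under a key in a JSON text.
' Returns "" when the key is missing or its value is not a string (e.g. null).
' Late-binds VBScript.RegExp, which is available on Windows.
Public Function GetJsonStringValue(ByVal jsonText As String, ByVal key As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    ' Pattern: "key" : "value", where value may contain escaped characters such as \"
    re.Pattern = """" & key & """\s*:\s*""((?:[^""\\]|\\.)*)"""
    re.Global = False
    Set matches = re.Execute(jsonText)
    If matches.Count > 0 Then GetJsonStringValue = matches(0).SubMatches(0)
End Function
```

For example, inside the loop you could write Debug.Print GetJsonStringValue(Http.responseText, "ngo_name") instead of printing the whole response.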

Scraping data from website to Excel using a macro...lost

I am totally new to this but here is my scope.
I am running a macro to pull data from a business system.
After this info is pulled, I want a macro to take certain fields, put them into a website form, click submit and then scrape and paste certain data results back into excel.
Everything works minus the scraping and pasting back into excel.
Help please!
I have searched all over Stack Overflow and watched videos to try to figure out what I need to do, but I must be misunderstanding something.
Sub Track()
Range("B2").Select
'This should call to PT and deliver tracking info
Dim IE As Object
Dim tbl As Object, td As Object
Set IE = CreateObject("InternetExplorer.Application") 'Set IEapp = InternetExplorer
IE.Visible = True
IE.Navigate "https://www.partstown.com/track-my-order"
With IEapp
Do
DoEvents
Loop Until IE.readyState = 4
'Input PO and zip
Call IE.Document.getElementById("orderNo").SetAttribute("value", "4500969111")
'ActiveCell.Offset(0, 2).Select
Call IE.Document.getElementById("postalCode").SetAttribute("value", "37040")
IE.Document.forms(7).Submit
Application.Wait Now + TimeValue("00:00:09")
'this is where i am stuck. I know this isnt right but tried to piece it together
Set elemCollection = IE.Document.getelElementsByTagname("table.account-table details _tc_table_highlighted")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows.Cells(c).innertext
Next c
Next r
Next t
End With
End Sub
Here is what I want it to pull:
Shipping column
QTY ordered
QTY shipped
Product
And to display in a linear fashion:
Shipping, QTY ordered, QTY shipped, Product
Internet Explorer:
I have made this a little more verbose than usual so you can see each step.
Key things:
1) Proper page-load waits with While .Busy Or .readyState < 4: DoEvents: Wend
2) Selecting elements by id where possible. The # is a CSS id selector. CSS selectors are applied with the querySelector method of .document, which returns the first element in the page that matches the specified pattern.
3) A timed loop is needed to wait for the results to be present.
4) The order qty etc. info is a newline-delimited string. It seemed easiest to split on these newlines and then access the individual items of the resulting array by index.
5) Per your specification, I put the results in an array in the requested order and write that array out to the sheet in one go.
6) The "." is a class selector in .order-history__item-descript--min, i.e. return the first element with class order-history__item-descript--min.
7) The [x=y] is an attribute = value selector in [data-label=Shipping], i.e. return the first element whose data-label attribute has the value Shipping.
8) The combination .details-table a uses a descendant combinator, " ", to select a tag elements that have an ancestor with class details-table.
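Point 4 can be isolated into a helper so that it can be tested without opening the browser. This is a sketch under two assumptions taken from the answer's code: the labels "Qty Ordered:" and "Qty Shipped:" appear in that order on the first two lines, and the lines are separated by CRLF or LF. ParseOrderQuantities is a hypothetical name.

```vba
Option Explicit

' Split the "Qty Ordered: n" / "Qty Shipped: n" text block into its two numbers.
' Returns a 0-based Long array: (0) = ordered, (1) = shipped.
Public Function ParseOrderQuantities(ByVal orderText As String) As Variant
    Dim lines() As String, result(0 To 1) As Long
    ' Normalise CRLF to LF so either separator works
    lines = Split(Replace$(orderText, vbCrLf, vbLf), vbLf)
    result(0) = CLng(Trim$(Replace$(lines(0), "Qty Ordered:", vbNullString)))
    result(1) = CLng(Trim$(Replace$(lines(1), "Qty Shipped:", vbNullString)))
    ParseOrderQuantities = result
End Function
```

With it, the results line becomes results = Array(shipping, q(0), q(1), product) after q = ParseOrderQuantities(order).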
VBA:
Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
Dim ie As InternetExplorer, ele As Object, t As Single
Const MAX_WAIT_SEC As Long = 5
Set ie = New InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.partstown.com/track-my-order"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("#orderNo").Value = "4500969111"
.querySelector("#postalCode").Value = "37040"
.querySelector("#orderLookUpForm").submit
End With
While .Busy Or .readyState < 4: DoEvents: Wend
Dim shipping As String, order As String, items() As String
With .document
t = Timer
Do
On Error Resume Next
Set ele = .querySelector("[data-label=Shipping]")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While ele Is Nothing
If ele Is Nothing Then ie.Quit: Exit Sub
shipping = ele.innerText
order = .querySelector(".order-history__item-descript--min").innerText
items = Split(order, vbNewLine)
Dim qtyOrdered As Long, qtyShipped As Long, product As String
qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
product = .querySelector(".details-table a").Title
Dim results()
results = Array(shipping, qtyOrdered, qtyShipped, product)
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
End With
.Quit
End With
End Sub
If you are new to HTML, please look at:
https://developer.mozilla.org/en-US/docs/Web/HTML
If you are new to CSS selectors, please look at:
https://flukeout.github.io/
XMLHTTP:
The whole thing can also be done with XHR. This is much faster than opening a browser.
XHR:
Use XMLHttpRequest (XHR) objects to interact with servers. You can retrieve data from a URL without having to do a full page [render].
In this case I make an initial GET request to the landing page to retrieve the CSRFToken, which I use in my re-enactment of the POST request the page makes to the server when you manually input the data and press submit. The data you want is in the server response. I pass a query string in the body of the POST request, in the line .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft; you can see your parameters there.
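The answer concatenates the token into the body as-is. In an application/x-www-form-urlencoded body, characters such as +, / and = would be misread if a token ever contained them. I have not checked what kind of token partstown.com issues. As a precaution, each value can be percent-encoded. This sketch assumes ASCII values. FormEncode and BuildTrackOrderBody are hypothetical names.

```vba
Option Explicit

' Percent-encode a value for an application/x-www-form-urlencoded body.
' Assumes ASCII input: unreserved characters pass through, a space becomes "+",
' everything else becomes %XX.
Public Function FormEncode(ByVal s As String) As String
    Dim i As Long, code As Long, out As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1))
        Select Case code
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126 ' 0-9 A-Z a-z - . _ ~
                out = out & ChrW$(code)
            Case 32
                out = out & "+"
            Case Else
                out = out & "%" & Right$("0" & Hex$(code), 2)
        End Select
    Next i
    FormEncode = out
End Function

' Build the POST body for the track-my-order form, encoding each value.
Public Function BuildTrackOrderBody(ByVal orderNo As String, ByVal postalCode As String, _
                                    ByVal csrfToken As String) As String
    BuildTrackOrderBody = "orderNo=" & FormEncode(orderNo) & _
                          "&postalCode=" & FormEncode(postalCode) & _
                          "&CSRFToken=" & FormEncode(csrfToken)
End Function
```

The send line would then be .send BuildTrackOrderBody("4500969111", "37040", csrft).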
Option Explicit
Public Sub GetInfo()
Dim html As HTMLDocument, csrft As String '< VBE > Tools > References > Microsoft HTML Object Library
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.partstown.com", False
.send
html.body.innerHTML = .responseText
csrft = html.querySelector("[name=CSRFToken]").Value
.Open "POST", "https://www.partstown.com/track-my-order", False
.setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
.setRequestHeader "Accept-Encoding", "gzip, deflate"
.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
.send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft
html.body.innerHTML = .responseText
End With
Dim shipping As String, order As String, items() As String
shipping = html.querySelector("[data-label=Shipping]").innerText
order = html.querySelector(".order-history__item-descript--min").innerText
items = Split(order, vbNewLine)
Dim qtyOrdered As Long, qtyShipped As Long, product As String
qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
product = html.querySelector(".details-table a").Title
Dim results()
results = Array(shipping, qtyOrdered, qtyShipped, product)
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
End Sub
Example with a loop (order numbers in column B, postal codes in column D):
Option Explicit
Public Sub GetInfo()
Dim html As HTMLDocument, csrft As String, lastRow As Long, sourceValues() '< VBE > Tools > References > Microsoft HTML Object Library
Set html = New HTMLDocument
Dim ws As Worksheet, i As Long
Set ws = ThisWorkbook.Worksheets("Sheet4")
lastRow = ws.Cells(ws.rows.Count, "B").End(xlUp).Row
sourceValues = ws.Range("B2:D" & lastRow).Value
Dim results()
ReDim results(1 To UBound(sourceValues, 1), 1 To 4)
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.partstown.com", False
.send
html.body.innerHTML = .responseText
csrft = html.querySelector("[name=CSRFToken]").Value
For i = LBound(sourceValues, 1) To UBound(sourceValues, 1)
If sourceValues(i, 1) <> vbNullString And sourceValues(i, 3) <> vbNullString Then
DoEvents
.Open "POST", "https://www.partstown.com/track-my-order", False
.setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
.setRequestHeader "Accept-Encoding", "gzip, deflate"
.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
.send "orderNo=" & sourceValues(i, 1) & "&postalCode=" & sourceValues(i, 3) & "&CSRFToken=" & csrft
html.body.innerHTML = .responseText
Dim shipping As String, order As String, items() As String
shipping = html.querySelector("[data-label=Shipping]").innerText
order = html.querySelector(".order-history__item-descript--min").innerText
items = Split(order, vbNewLine)
Dim qtyOrdered As Long, qtyShipped As Long, product As String
qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
product = html.querySelector(".details-table a").Title
results(i, 1) = shipping
results(i, 2) = qtyOrdered
results(i, 3) = qtyShipped
results(i, 4) = product
End If
'Application.Wait Now + TimeSerial(0, 0, 1)
Next
End With
'results written out from row 2 column E
ws.Cells(2, 5).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

VBA Data Import from Google into Excel: Custom Time Ranges

For a VBA application in Excel, I am trying to include the "custom time range" function Google offers when narrowing down a search. So far, I am using the following code (see below), which imports "resultStats" from Google for a given search term into Excel but lacks the time range option.
In this specific case, I need to determine the number of results/articles, e.g. for "Elon Musk" between 01/01/2015 and 12/31/2015. Is there a practical addition to the code below? And can this also be applied to the Google News tab instead of the regular Google Search results?
Many thanks in advance!
Sub Gethits()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
Dim var As String
Dim var1 As Object
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set objResultDiv = html.getElementById("rso")
Set var1 = html.getElementById("resultStats")
Cells(i, 2).Value = var1.innerText
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
It seems you need URL encoding: a string like the one below works when you include your cd_min and cd_max parameters. You specify news with the parameter tbm=nws.
As @chillin mentions, you can URL-encode the parameters with Application.EncodeURL().
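Instead of hard-coding the encoded URL, you can build it from a search term and two dates. This sketch reproduces the tbs=cdr:1,cd_min:…,cd_max:… and tbm=nws parameters from the URL in the code below. It writes the dates as m/d/yyyy, as that URL does, and uses its own small ASCII-only encoder so it does not depend on EncodeURL. BuildDateRangeSearchUrl and QueryEncode are hypothetical names, and Google may change or throttle this interface at any time.

```vba
Option Explicit

' Percent-encode a query-string value (ASCII assumed; a space becomes "+").
Private Function QueryEncode(ByVal s As String) As String
    Dim i As Long, code As Long, out As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1))
        Select Case code
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126 ' 0-9 A-Z a-z - . _ ~
                out = out & ChrW$(code)
            Case 32
                out = out & "+"
            Case Else
                out = out & "%" & Right$("0" & Hex$(code), 2)
        End Select
    Next i
    QueryEncode = out
End Function

' Google search URL restricted to a custom date range, optionally on the News tab.
Public Function BuildDateRangeSearchUrl(ByVal term As String, ByVal fromDate As Date, _
                                        ByVal toDate As Date, ByVal newsOnly As Boolean) As String
    Dim tbs As String, url As String
    ' Dates are written as m/d/yyyy, matching cd_min/cd_max in the answer's URL
    tbs = "cdr:1,cd_min:" & Month(fromDate) & "/" & Day(fromDate) & "/" & Year(fromDate) & _
          ",cd_max:" & Month(toDate) & "/" & Day(toDate) & "/" & Year(toDate)
    url = "https://www.google.com/search?q=" & QueryEncode(term) & "&tbs=" & QueryEncode(tbs)
    If newsOnly Then url = url & "&tbm=nws"
    BuildDateRangeSearchUrl = url
End Function
```

In the question's loop this could become url = BuildDateRangeSearchUrl(Cells(i, 1), Cells(i, 2), Cells(i, 3), True), assuming columns B and C hold real dates.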
I also tried the API method, but with limited success. Although a date-range filter can be passed in the sort parameter, you need to register for an API key, set up a custom search engine and set your requirements. Results are limited to 10 per query, and free use is subject to an API call limit; you can specify a start number to get further blocks of 10. You can also see how the request is URL-encoded by running it through the Google APIs Explorer for Custom Search. In my test it returned only 2 results, clearly nowhere near the expected number.
Option Explicit
Public Sub GetResultCount()
Dim sResponse As String, html As HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.google.co.uk/search?q=elon+musk&safe=strict&biw=1163&bih=571&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2015%2Ccd_max%3A12%2F31%2F2015&tbm=nws", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument
With html
.body.innerHTML = sResponse
Debug.Print .querySelector("#resultStats").innerText
End With
End Sub
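The Custom Search API route mentioned in this answer can be sketched as a URL builder. It assumes the endpoint https://www.googleapis.com/customsearch/v1 with the key, cx, q, sort and start parameters, and the date-range sort syntax date:r:YYYYMMDD:YYYYMMDD, as I understand Google's Programmable Search documentation; please verify against the current docs. The API key and search-engine id (cx) are placeholders you must obtain yourself. BuildCseUrl is a hypothetical name, and only spaces in the query are encoded.

```vba
Option Explicit

' Build a Custom Search JSON API request URL for one page of date-restricted results.
' pageNumber is 1-based; each page holds at most 10 results, so start = 1, 11, 21, ...
' fromYmd / toYmd are dates written as yyyymmdd strings, e.g. "20150101".
Public Function BuildCseUrl(ByVal apiKey As String, ByVal engineId As String, ByVal query As String, _
                            ByVal fromYmd As String, ByVal toYmd As String, ByVal pageNumber As Long) As String
    BuildCseUrl = "https://www.googleapis.com/customsearch/v1" & _
                  "?key=" & apiKey & "&cx=" & engineId & _
                  "&q=" & Replace$(query, " ", "+") & _
                  "&sort=date:r:" & fromYmd & ":" & toYmd & _
                  "&start=" & (1 + (pageNumber - 1) * 10)
End Function
```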
Thanks for your feedback. I have now amended the URL line as follows, applying the Excel ENCODEURL function directly to the input cells of the spreadsheet, and it works perfectly:
url = "https://www.google.com/search?q=" & Cells(i, 1) & "&source=lnt&tbs=cdr%3A1%2Ccd_min%3A" & Cells(i, 2) & "%2Ccd_max%3A" & Cells(i, 3) & "&tbm=nws"

VBA - Number of Google News Search Results

I have a cell containing something I would like to search for in Google News, and I want the code to return the number of results for that search. Currently I have this code, which I found elsewhere on the site. It does not use Google News, and even so I sometimes get runtime error -2147024891 (80070005) after 70 or so searches, and then I can't run it again.
Sub HawkishSearch()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("B" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
If html.getElementById("resultStats") Is Nothing Then
str_text = "0 Results"
Else
str_text = html.getElementById("resultStats").innerText
End If
Cells(i, 3) = str_text
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
The best option (IMO) is to use a news API, such as NewsAPI (newsapi.org) used below, and register for an API key. You can then use a query string including your search term and parse the JSON response to get the result count. I do that below and also populate a collection with the article titles and links. I use a JSON parser called JsonConverter.bas, which you download and add to your project. Then go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
In the JSON response from the API, {} denotes a dictionary, which you access by key, and [] denotes a collection, which you access by index or loop over with For Each.
I use the key totalResults to retrieve the total results count from the initial dictionary returned by the API.
I then loop the collection of dictionaries (articles) and pull the story titles and URLs.
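To make the dictionary/collection distinction concrete, the following sketch builds an object by hand with made-up data. It has the same shape that VBA-JSON's JsonConverter.ParseJson produces ({} as Scripting.Dictionary, [] as Collection) for a reply with totalResults and articles. It then reads the object the way the answer describes. MakeSampleResponse and ListTitles are hypothetical names, and the sample data is illustrative only.

```vba
Option Explicit

' Mimic the parsed shape of a NewsAPI-style reply: {} -> Dictionary, [] -> Collection.
Public Function MakeSampleResponse() As Object
    Dim root As Object, article As Object, articles As Collection
    Set root = CreateObject("Scripting.Dictionary")
    Set articles = New Collection
    Set article = CreateObject("Scripting.Dictionary")
    article("title") = "Sample title"
    article("url") = "https://example.com/story"
    articles.Add article
    root("totalResults") = 1
    Set root("articles") = articles
    Set MakeSampleResponse = root
End Function

' Read the count by key, then loop the collection of article dictionaries.
Public Function ListTitles(ByVal json As Object) As String
    Dim article As Object, out As String
    For Each article In json("articles")
        out = out & article("title") & vbLf
    Next article
    ListTitles = "total=" & json("totalResults") & vbLf & out
End Function
```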
You can then inspect the results in the Locals window or print them out.
Option Explicit
Public Sub GetStories()
Dim articles As Collection, article As Object
Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1)
Set finalResults = New Collection
searchTerm = "Obama"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
Debug.Print "total results = " & json("totalResults")
Set articles = json("articles")
For Each article In articles
arr(0) = article("title")
arr(1) = article("url")
finalResults.Add arr
Next
Stop '<== Delete me later
End Sub
Loop:
If deploying in a loop, you can use a class, clsHTTP, to hold the XMLHTTP object. This is more efficient than creating and destroying an XMLHTTP object for every request. I supply this class with a GetString method to retrieve the JSON response from the API, and a GetInfo method to extract the results count and the article titles and URLs from the parsed JSON.
Class clsHTTP:
Option Explicit
Private http As Object
Private Sub Class_Initialize()
Set http = CreateObject("MSXML2.XMLHTTP")
End Sub
Public Function GetString(ByVal url As String) As String
With http
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
GetString = .responseText
End With
End Function
Public Function GetInfo(ByVal json As Object) As Variant
Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object
finalResults(0) = json("totalResults")
Set articles = json("articles")
ReDim results(1 To articles.Count, 1 To 2)
For Each article In articles
counter = counter + 1
results(counter, 1) = article("title")
results(counter, 2) = article("url")
Next
finalResults(1) = results
GetInfo = finalResults
End Function
Standard module:
Option Explicit
Public Sub GetStories()
Dim http As clsHTTP, json As Object
Dim finalResults(), searchTerms(), searchTerm As Long, url As String
Set http = New clsHTTP
With ThisWorkbook.Worksheets("Sheet1")
searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
End With
ReDim finalResults(1 To UBound(searchTerms))
For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)
url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"
Set json = JsonConverter.ParseJson(http.GetString(url))
finalResults(searchTerm) = http.GetInfo(json)
Set json = Nothing
Next
Stop '<==Delete me later
End Sub
Otherwise:
I would use the following, which grabs the story links by their class name, gets their count and writes the links to a collection.
Option Explicit
Public Sub GetStories()
Dim sResponse As String, html As HTMLDocument, articles As Collection
Const BASE_URL As String = "https://news.google.com/"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument: Set articles = New Collection
Dim numberOfStories As Long, nodeList As Object, i As Long
With html
.body.innerHTML = sResponse
Set nodeList = .querySelectorAll(".VDXfz")
numberOfStories = nodeList.Length
Debug.Print "number of stories = " & numberOfStories
For i = 0 To nodeList.Length - 1
articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
Next
End With
Debug.Print articles.Count
End Sub
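The nested Replace$ calls in the loop above strip MSHTML's "about:" prefix from relative links and swap "./" for BASE_URL, but they replace every "./" in the string, not only a leading one. A slightly stricter sketch (ToAbsoluteNewsUrl is a hypothetical name) only rewrites the prefix and leaves absolute links untouched:

```vba
Option Explicit

' Hypothetical helper: turn an href read from an MSHTML document into an absolute URL.
' Assumes, like the original loop, that relative links come back prefixed with "about:".
Public Function ToAbsoluteNewsUrl(ByVal href As String, ByVal baseUrl As String) As String
    ' Drop the "about:" prefix MSHTML adds to relative links parsed from innerHTML
    If Left$(href, 6) = "about:" Then href = Mid$(href, 7)
    ' Only rewrite a leading "./"; any later "./" in the path is left alone
    If Left$(href, 2) = "./" Then href = baseUrl & Mid$(href, 3)
    ToAbsoluteNewsUrl = href
End Function
```

Inside the loop this becomes articles.Add ToAbsoluteNewsUrl(nodeList.item(i).href, BASE_URL).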
Standard Google search:
The following works for an example standard Google search, but you will not always get the same HTML structure, depending on your search term. You will need to provide some failing cases to help me determine whether there is a consistent selector method that can be applied.
Option Explicit
Public Sub GetResultsCount()
Dim sResponse As String, html As HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.google.com/search?q=mitsubishi", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Set html = New HTMLDocument
With html
.body.innerHTML = sResponse
Debug.Print .querySelector("#resultStats").innerText
End With
End Sub
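One way to cope with a structure that varies between searches is to try several CSS selectors in order and take the first match. This is a sketch: FirstMatchText is a hypothetical helper, and the second selector in the usage line ("#result-stats") is an assumption about newer Google markup, not something verified here.

```vba
Option Explicit

' Hypothetical helper: return the innerText of the first element matched by any of the
' given CSS selectors (tried in order), or "" if none of them match.
Public Function FirstMatchText(ByVal doc As Object, ByVal selectors As Variant) As String
    Dim i As Long, el As Object
    For i = LBound(selectors) To UBound(selectors)
        ' querySelector returns Nothing when the selector matches no element
        Set el = doc.querySelector(CStr(selectors(i)))
        If Not el Is Nothing Then
            FirstMatchText = el.innerText
            Exit Function
        End If
    Next i
End Function
```

Usage inside GetResultsCount: Debug.Print FirstMatchText(html, Array("#resultStats", "#result-stats")). When you find a failing case, add its selector to the array rather than rewriting the Sub.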

VBA Excel pulling new webpage data after clicking on "submit"

I'm trying to pull some info from a website that provides oil well data by API number (an API number is a unique identifier for every well in the US).
Website: http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1
API example: 1708300502
The issue is, when I get to the 2nd page, IE.document.getElementsByTagName("body")(0).innerText still returns data from the initial page. How do I fetch the updated page data?
The ultimate goal is to get to the 2nd page, click on "30570" via IE.document.getElementsByTagName("a")(0).Click and then read the final 3rd page. I just cannot figure out how to read the updated page :(
Option Explicit
Sub sonris_WellData()
Dim IE As InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
This seems to be working. Rather than DoEvents, use the WinAPI Sleep function. I also added a call to the Sleep function after the form submit.
Increasingly, we are seeing sites that are served dynamically by JavaScript. In these cases the browser may report READYSTATE_COMPLETE (or not Busy) even though the page has not yet rendered the "new" results.
Option Explicit
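' PtrSafe is required by 64-bit Office (VBA7); older versions use the plain Declare
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If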
Sub sonris_WellData()
Dim IE As Object 'InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
Sleep 1000
' Wait until the next page opens; also check Busy, because readyState can still read 4 just after .submit
Application.StatusBar = "Opening Website"
Do While IE.Busy Or IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
' Read the results page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
You could also experiment with a slightly longer Sleep after the .submit.
Alternatively, I notice that after you submit, the URL changes, so you could also try changing the second waiting loop to:
Do While IE.LocationURL = "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Sleep 1000
Loop
This should make Excel wait until the URL has changed.
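A loop like that will spin forever if the submit fails and the URL never changes. A sketch of the same idea with a timeout, which then also waits for the new document to finish loading, could look like this (WaitForUrlChange and ElapsedSeconds are hypothetical names). It declares its own Private Sleep, so paste it into a separate standard module.

```vba
Option Explicit

' Private, so it does not clash with a Public Sleep declared in another module
#If VBA7 Then
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Seconds between two Timer readings, allowing for Timer resetting to 0 at midnight
Public Function ElapsedSeconds(ByVal startTime As Single, ByVal nowTime As Single) As Single
    Dim elapsed As Single
    elapsed = nowTime - startTime
    If elapsed < 0 Then elapsed = elapsed + 86400
    ElapsedSeconds = elapsed
End Function

' Returns True once IE has left oldUrl and finished loading the new page,
' or False if timeoutSeconds pass first
Public Function WaitForUrlChange(ByVal ie As Object, ByVal oldUrl As String, _
                                 ByVal timeoutSeconds As Single) As Boolean
    Dim started As Single
    started = Timer
    ' Phase 1: wait until the browser has navigated away from the old URL
    Do While ie.LocationURL = oldUrl
        If ElapsedSeconds(started, Timer) > timeoutSeconds Then Exit Function
        Sleep 250
    Loop
    ' Phase 2: the URL has changed; wait for the new document to finish loading
    Do While ie.Busy Or ie.readyState <> 4
        If ElapsedSeconds(started, Timer) > timeoutSeconds Then Exit Function
        Sleep 250
    Loop
    WaitForUrlChange = True
End Function
```

To use it, store IE.LocationURL in a variable before calling IE.document.forms(0).submit, then pass that variable as oldUrl together with a timeout such as 30 seconds, and stop if the function returns False.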
Alternatively, you may have better luck using an XMLHTTPRequest (there are many examples of this here on SO and elsewhere on the internet). This allows you to send a request just like the browser, without actually using a web browser. Then you can simply parse the return text as HTML or XML. I would use the Microsoft XML, v6.0 library reference for this.
POST requests:
① Entering the Well API number
I examined the web page, making the selections you mention, and inspected the web traffic using Fiddler. I noticed that the initial request, made when you submit the API number, is handled by a POST request.
② POST request:
The POST body has the following parameter:
p_apinum is the key and the associated value is the original Well API number.
Using this info, I formulated a direct POST request, thus bypassing your first landing page.
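GetPage builds this body by simple concatenation (p_apinum=1708300502), which is fine for digits. If you ever need to post values containing spaces, & or =, a helper along these lines would percent-encode each name/value pair. BuildFormBody is a hypothetical name, and it assumes Excel 2013 or later for WorksheetFunction.EncodeURL.

```vba
Option Explicit

' Hypothetical helper: build an application/x-www-form-urlencoded body from an array of
' alternating names and values, e.g. Array("p_apinum", 1708300502).
Public Function BuildFormBody(ByVal pairs As Variant) As String
    Dim i As Long, body As String
    For i = LBound(pairs) To UBound(pairs) - 1 Step 2
        If Len(body) > 0 Then body = body & "&"   ' separate pairs with &
        ' Percent-encode both name and value (EncodeURL needs Excel 2013+)
        body = body & Application.WorksheetFunction.EncodeURL(CStr(pairs(i))) & "=" & _
               Application.WorksheetFunction.EncodeURL(CStr(pairs(i + 1)))
    Next i
    BuildFormBody = body
End Function
```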
③ Pressing the hyperlink:
Next, I looked at the element you wanted to press.
Its HTML contains a relative hyperlink:
I use a helper function to parse the page HTML to get this relative link and construct the absolute path: GetNextURL(page.body.innerHTML).
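GetNextURL splits the raw innerHTML on "href=" and ">", so it depends on how MSHTML serializes the attribute, and a query string may come through with &amp; instead of &. A DOM-based alternative, sketched here with the hypothetical name GetFirstLinkHref, reads the href property of the first <a> element instead. Like the original, it assumes the link you want is the first one on the page and that MSHTML prefixes relative links with "about:".

```vba
Option Explicit

' Hypothetical alternative to GetNextURL: read the first <a> element's href via the DOM.
' Returns "" when the page contains no links.
Public Function GetFirstLinkHref(ByVal page As Object) As String
    Dim links As Object
    Set links = page.getElementsByTagName("a")
    If links.Length = 0 Then Exit Function
    ' The href property has entities decoded; strip the "about:" prefix that
    ' MSHTML adds to relative links (the same assumption GetNextURL makes)
    GetFirstLinkHref = Replace$(links.Item(0).href, "about:", vbNullString)
End Function
```

In GetWellInfo the second call would then read Set page = GetPage(BASESTRING & GetFirstLinkHref(page)).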
④ Making a new request:
I reuse my HTTP request function GetPage to send a second request, with an empty body, and grab all the tables from the returned HTML document via page.getElementsByTagName("table").
⑤ Writing the tables to the Excel worksheet:
I loop all the tables on the page using helper function AddHeaders to write out the table headers, and WriteTables to write the current table to the sheet.
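Writing cell by cell is slow for large tables. A common alternative is to collect the <td> text into a 2-D array and write it in one assignment. This sketch (TableToArray is a hypothetical name) keeps the same layout as WriteTables: a row with no <td> cells, such as a header row built from <th>, becomes a blank row.

```vba
Option Explicit

' Hypothetical helper: copy an HTML table's <td> text into a 1-based 2-D Variant array.
' Returns Empty if the table contains no <td> cells at all.
Public Function TableToArray(ByVal hTable As Object) As Variant
    Dim trRows As Object, tdCells As Object, r As Long, c As Long, maxCols As Long
    Set trRows = hTable.getElementsByTagName("tr")
    ' First pass: find the widest row so ragged tables fit
    For r = 0 To trRows.Length - 1
        Set tdCells = trRows.Item(r).getElementsByTagName("td")
        If tdCells.Length > maxCols Then maxCols = tdCells.Length
    Next r
    If maxCols = 0 Then Exit Function
    Dim out() As Variant
    ReDim out(1 To trRows.Length, 1 To maxCols)
    ' Second pass: fill the array; cells missing from short rows stay Empty
    For r = 0 To trRows.Length - 1
        Set tdCells = trRows.Item(r).getElementsByTagName("td")
        For c = 0 To tdCells.Length - 1
            out(r + 1, c + 1) = tdCells.Item(c).innerText
        Next c
    Next r
    TableToArray = out
End Function
```

To use it in place of WriteTables, assign arr = TableToArray(targetTable) and, if Not IsEmpty(arr), write ws.Cells(startRow, 1).Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr.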
Example page content:
Example code output:
VBA:
Option Explicit
Public Sub GetWellInfo()
Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
Const PARAM1 As String = "p_apinum"
Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
apiNumbers = Array(1708300502, 1708300503)
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ws
.Cells.ClearContents
For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
Set page = GetPage(BASESTRING & GetNextURL(page.body.innerHTML))
Dim allTables As Object
Set allTables = page.getElementsByTagName("table")
For Each targetTable In allTables
AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
WriteTables targetTable, GetLastRow(ws, 1), ws
Next targetTable
Next currNumber
End With
Application.ScreenUpdating = True
End Sub
Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
Dim objHTTP As Object, html As New HTMLDocument
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
Dim sBody As String
If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber
With objHTTP
.SetTimeouts 10000, 10000, 10000, 10000
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
' Trap only errors raised by .send (timeouts, DNS failures, etc.)
On Error Resume Next
.send sBody
If Err.Number <> 0 Then
Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
Exit Function
End If
On Error GoTo 0
' Status is numeric; anything other than 200 leaves GetPage as Nothing
If .Status = 200 Then
html.body.innerHTML = .responseText
Set GetPage = html
Else
Debug.Print "HTTP " & .Status & " " & .statusText
End If
End With
End Function
Public Function GetNextURL(ByVal inputString As String) As String
GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
End Function
Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
ws.Cells(startRow, columnCounter) = header.innerText
Next header
End Sub
Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
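Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
r = startRow: c = 1 ' start at column 1; Cells(r, 0) would raise error 1004
With ws ' write to the sheet that was passed in, not whichever sheet is active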
Set tRow = hTable.getElementsByTagName("tr")
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
For Each td In tCell
.Cells(r, c).Value = td.innerText
c = c + 1
Next td
r = r + 1: c = 1
Next tr
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
References:
VBE > Tools > References > Microsoft HTML Object Library.