Excel VBA - Scrape Google using XMLHTTP (MSXML2.XMLHTTP) - html

I am trying to pull together a daily table of events for certain news topics on Google News.
In a single module I have the following:
Option Explicit
Private Declare Function URLDownloadToFile Lib "urlmon" _
Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, _
ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Dim ret As Long
Sub Go()
Dim url As String, i As Integer, numb_H3 As Integer, lastRow As Long, XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object, j1 As Object
url = "https://www.google.co.uk/search?q=" & "Wearables" & "&tbm=nws" ' "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getElementById("rso")
numb_H3 = objResultDiv.GetElementsByTagname("H3").Length
For i = 0 To numb_H3 - 1
If numb_H3 > 0 Then
Set objH3 = objResultDiv.GetElementsByTagname("H3")(i)
Set link = objH3.GetElementsByTagname("a")(0)
'get thumbnail image location
Cells(ActiveCell.Row + i, 1).Value = objResultDiv.GetElementsByTagname("img")(i).src
'get news title
Cells(ActiveCell.Row + i, 2).Value = objH3.InnerText
'get news link
Cells(ActiveCell.Row + i, 3).Value = link.href
'get source name
Cells(ActiveCell.Row + i, 5).Value = "need help"
'get source time
Cells(ActiveCell.Row + i, 6).Value = "need help"
'get news paragraph
Cells(ActiveCell.Row + i, 7).Value = "need help"
End If
DoEvents
Next i
html.Close
End Sub
I am able to return the following objects:
I know where the objects are in red that I am trying to get, I'm just struggling with the syntax when using GetElementsByClassName :
so for example, I know that the text "ZDNet" lies in:
?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(0).InnerText
And that the date "7 Jan 2017" lies in:
?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(2).InnerText
But I can't get the correct syntax.
I'm hoping I've made a really simple mistake, but I am also open to other methods if they are more efficient.
Thanks for reading,
Mr. J

Related

Executing URL from within ms access [duplicate]

I need to download a CSV file from a website using VBA in Excel. The server also needed to authenticate me since it was data from a survey service.
I found a lot of examples using Internet Explorer controlled with VBA for this. However, it was mostly slow solutions and most were also convoluted.
Update:
After a while I found a nifty solution using Microsoft.XMLHTTP object in Excel. I thought to share the solution below for future reference.
This solution is based from this website:
http://social.msdn.microsoft.com/Forums/en-US/bd0ee306-7bb5-4ce4-8341-edd9475f84ad/excel-2007-use-vba-to-download-save-csv-from-url
It is slightly modified to overwrite existing file and to pass along login credentials.
Sub DownloadFile()
Dim myURL As String
myURL = "https://YourWebSite.com/?your_query_parameters"
Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", myURL, False, "username", "password"
WinHttpReq.send
If WinHttpReq.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.responseBody
oStream.SaveToFile "C:\file.csv", 2 ' 1 = no overwrite, 2 = overwrite
oStream.Close
End If
End Sub
Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
(ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Sub Example()
DownloadFile$ = "someFile.ext" 'here the name with extension
URL$ = "http://some.web.address/" & DownloadFile 'Here is the web address
LocalFilename$ = "C:\Some\Path" & DownloadFile !OR! CurrentProject.Path & "\" & DownloadFile 'here the drive and download directory
MsgBox "Download Status : " & URLDownloadToFile(0, URL, LocalFilename, 0, 0) = 0
End Sub
Source
I found the above when looking for downloading from FTP with username and address in URL. Users supply information and then make the calls.
This was helpful because our organization has Kaspersky AV which blocks active FTP.exe, but not web connections. We were unable to develop in house with ftp.exe and this was our solution. Hope this helps other looking for info!
A modified version of above to make it more dynamic.
Public Function DownloadFileB(ByVal URL As String, ByVal DownloadPath As String, ByRef Username As String, ByRef Password, Optional Overwrite As Boolean = True) As Boolean
On Error GoTo Failed
Dim WinHttpReq As Object: Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", URL, False, Username, Password
WinHttpReq.send
If WinHttpReq.Status = 200 Then
Dim oStream As Object: Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.responseBody
oStream.SaveToFile DownloadPath, Abs(CInt(Overwrite)) + 1
oStream.Close
DownloadFileB = Len(Dir(DownloadPath)) > 0
Exit Function
End If
Failed:
DownloadFileB = False
End Function
I was struggling for hours on this until I figured out it can be done in one line of powershell:
invoke-webrequest -Uri "http://myserver/Reports/Pages/ReportViewer.aspx?%2fClients%2ftest&rs:Format=PDF&rs:ClearSession=true&CaseCode=12345678" -OutFile "C:\Temp\test.pdf" -UseDefaultCredentials
I looked into doing it purely in VBA but it runs to several pages, so I just call my powershell script from VBA every time I want to download a file.
Simple.
Public Sub Test_DownloadFile()
Dim URLStr As String, DLPath As String, UName As String, PWD As String, DontOverWrite As Boolean
URLStr = "http.."
DLPath = Environ("USERPROFILE") & "\Downloads\TEST.PDF"
UName = ""
PWD = ""
DontOverWrite = False
Call DownloadFile(URLStr, DLPath, UName, PWD, DontOverWrite)
End Sub
Public Sub DownloadFile(ByVal URLStr As String, ByVal DLPath As String, Optional ByVal UName As String, Optional ByVal PWD As String, Optional DontOverWrite As Boolean)
On Error GoTo Failed
Dim WinHttpReq As Object
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", URLStr, False, UName, PWD
WinHttpReq.send
If WinHttpReq.status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.responseBody
Dim OWrite As Integer
If DontOverWrite = True Then
OWrite = 1
Else
OWrite = 2
End If
oStream.SaveToFile DLPath, OWrite
oStream.Close
Debug.Print "Downloaded " & URLStr & " To " & DLPath
Exit Sub
End If
Failed:
Debug.Print "Failed to DL " & URLStr
End Sub
A modified version of above solution to make it more dynamic.
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Public Function DownloadFileA(ByVal URL As String, ByVal DownloadPath As String) As Boolean
On Error GoTo Failed
DownloadFileA = False
'As directory must exist, this is a check
If CreateObject("Scripting.FileSystemObject").FolderExists(CreateObject("Scripting.FileSystemObject").GetParentFolderName(DownloadPath)) = False Then Exit Function
Dim returnValue As Long
returnValue = URLDownloadToFile(0, URL, DownloadPath, 0, 0)
'If return value is 0 and the file exist, then it is considered as downloaded correctly
DownloadFileA = (returnValue = 0) And (Len(Dir(DownloadPath)) > 0)
Exit Function
Failed:
End Function

Working code gives error when run on any other PC

I have working code that requests information from a website.
When I send the file to another PC and run the code, I get:
"Run-time error'91': Object variable or With block variable not set"
I ensured:
Macro security levels are the same (Enable all macros & trust access to VBA project object model)
All the checked boxes in VBA editor > Tools > References are the same (Specifically Microsoft HTML Object Library & Microsoft XML, V6.0 is checked)
Sub Macro1()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim Current As Variant
website = "https://www.thalia.de/shop/home/artikeldetails/A1062020980"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "GET", website, False
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.body.innerHTML = response
Current = html.getElementsByClassName("element-text-standard value").Item(0).innerText
MsgBox (Current)
End Sub
The line on which I get the error:
Current = html.getElementsByClassName("element-text-standard value").Item(0).innerText
WinHttp
I've tried a ton of various solutions, in the end, it came just to replacing MSXML2.XMLHTTP with WinHttp.WinHttpRequest.5.1 to make it work on my computer. While I was researching, I rewrote the whole thing a little bit. I'm a noob at this so I can't explain why one works and the other does not.
Option Explicit
Sub Macro1()
Const URL As String _
= "https://www.thalia.de/shop/home/artikeldetails/A1062020980"
'Const URL As String _
= "https://www.thalia.de/shop/home/artikeldetails/A1060523771"
Const ClassName As String _
= "element-text-standard value"
Dim WhrResponseText As String
WhrResponseText = GetWhrResponseText(URL)
If Len(WhrResponseText) = 0 Then
MsgBox "Could not get a response.", vbExclamation
Exit Sub
End If
' ' Write the response string to a worksheet.
' Dim ws As Worksheet: Set ws = ActiveSheet ' improve!
' Dim arr() As String: arr = Split(WhrResponseText, vbLf)
' ws.Range("A1").Resize(UBound(arr) + 1).Value = Application.Transpose(arr)
Dim Elements As Object
With CreateObject("htmlfile")
.body.innerHTML = WhrResponseText
Set Elements = .getElementsByClassName(ClassName)
End With
' Using 'Length' to determine if a result was found and returning
' the first element.
Dim Result As Variant
With Elements
If .Length > 0 Then
Result = .Item(0).innerText
MsgBox Result
Else
MsgBox "Nothing found."
End If
End With
Dim i As Long
' Loop through the elements using 'For Each... Next'.
Dim Element As Object
For Each Element In Elements
Debug.Print i, Element.innerText
i = i + 1
Next Element
' ' Loop through the elements using 'For... Next'.
' With Elements
' For i = 0 To .Length - 1
' Debug.Print i, .Item(i).innerText
' Next i
' End With
End Sub
Function GetWhrResponseText( _
ByVal URL As String) _
As String
Const ProcName As String = "GetWhrResponseText"
On Error GoTo ClearError
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", URL, False
.send
GetWhrResponseText = StrConv(.responseBody, vbUnicode)
End With
ProcExit:
Exit Function
ClearError:
Debug.Print "'" & ProcName & "' Run-time error '" _
& Err.Number & "':" & vbLf & " " & Err.Description
Resume ProcExit
End Function

Downloading data from a web page (list) to an Excel

I have to download data from here:
[http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda.asp][1]
Then I have to save all the data in an Excel. The problem is that I have to choose several dates and several currencies. For example, I have to select 12/31/2018, Dolar, Euro and Pesos. Moreover, I have to choose one currency at a time, and I have many to download.
I've tried Import External Data with Excel, but it didn't work.
I've also tried with this VBA code
Sub descarga_monedas()
Fecha = "2018.06.05"
Moneda = 313
Path = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Evolucion_moneda_3.asp?tipo=E&Fecha=" & Fecha & "&Moneda=" & Moneda & """"
Application.Workbooks.Open (Path)
End Sub
The page seems to block this kind of code.
Is any way to solve this?
You can do it the following way. I have grabbed all the dates but included only one date to be used in conjunction with all currencies. Add another outer loop over dates to add in the dates values i.e. use an outer loop over inputDates collection to get each date.
Option Explicit
Public Sub GetData()
Dim body As String, html As HTMLDocument, http As Object, i As Long
Dim codes As Object, inputCurrency As Object, inputDates As Object, dates As Object
Const BASE_URL As String = "http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda_3.asp?tipo=E&"
Set codes = CreateObject("scripting.dictionary")
Set inputDates = New Collection
Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object library
Set http = CreateObject("MSXML2.XMLHTTP")
With http
.Open "GET", "http://www.bcra.gov.ar/PublicacionesEstadisticas/Evolucion_moneda.asp", False
.send
html.body.innerHTML = .responseText
Set inputCurrency = html.querySelectorAll("[name=Moneda] option[value]")
Set dates = html.querySelectorAll("[name=Fecha] option[value]")
For i = 0 To inputCurrency.Length - 1
codes(inputCurrency.item(i).innerText) = inputCurrency.item(i).Value
Next
For i = 0 To dates.Length - 1
inputDates.Add dates.item(i).Value
Next
Dim fecha As String, moneda As String, key As Variant, downloadURL As String
Dim clipboard As Object, ws As Worksheet
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
For Each key In codes.keys
DoEvents
fecha = inputDates.item(1) '<== use an outer loop over inputDates collection to get each date
moneda = key
downloadURL = BASE_URL & "Fecha=" & fecha & "&Moneda=" & moneda '2019.02.11 ,79
.Open "GET", downloadURL, False
.send
html.body.innerHTML = StrConv(http.responseBody, vbUnicode)
clipboard.SetText html.querySelector("table").outerHTML
clipboard.PutInClipboard
Set ws = ThisWorkbook.Worksheets.Add
ws.NAME = fecha & "_" & moneda
ws.Cells(1, 1).PasteSpecial
Next
End With
End Sub

Use XML HTTP Request On Sites That Require Object Interaction

I am working on a project to scrape information from a number of websites. I have a number of sites working with no issue, largely processing them by amending the URL to pass through the relevant criteria or by posting AJAX requests. I am fairly new to this so I am seeking some assistance.
I have come across a website where I need to interact with objects on a page in order to obtain further information. An example of this is the below site:
Example Site
If you visit the site and go to the bottom there are more brands and clicking "view" will display additional products. The HTML for these is only returned once clicked.
With other sites I have sourced information from I have used the below approach. Is there a way to process the page via the XML HTTP Method after a page object action has been performed?
Any help would be greatly appreciated. At the moment I am assuming I will have to stick to scraping such sites using an Internet Explorer object.
Option Explicit
Public Sub sbKF()
Dim conn As ADODB.Connection
Dim rsIn As ADODB.Recordset
Dim HTMLDoc As HTMLDocument
Dim strUrl As String
Dim strPost As String
Set conn = CurrentProject.Connection
Set rsIn = New ADODB.Recordset
Set HTMLDoc = New MSHTML.HTMLDocument
rsIn.Open pcstrInput, conn, adOpenStatic, adLockReadOnly
rsIn.MoveLast: rsIn.MoveFirst
Do While Not rsIn.EOF
' Create the URL and Post submission for input size.
strUrl = "http://www.[Site].com"
strPost = "Stage=2&sop=TyreSize&ssq=1&vnp=&vmk=&vch=&vmo=&drd="
' Return the Document body results
HTMLDoc.body.innerHTML = fnPostXmlHttp(strUrl, strPost)
rsIn.MoveNext
Loop
End Sub
Public Function fnPostXmlHttp(ByVal strUrl As String, ByVal strScript As String)
Dim XMLHttpRequest As Object
Dim strOut As String
Set XMLHttpRequest = CreateObject("MSXML2.XMLHTTP")
XMLHttpRequest.Open "POST", strUrl, False
XMLHttpRequest.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLHttpRequest.send (strScript)
While XMLHttpRequest.ReadyState <> 4
DoEvents
Wend
fnPostXmlHttp = XMLHttpRequest.responseText
End Function
If you take a look at www.blackcircles.com HTML response, you will see the javascript snippet:
...
var newTyresActionUrl;
var lookupAddress;
$(document).ready(function () {
newTyresActionUrl = new BC.classes.productV6SearchPage('https://www.blackcircles.com/order/tyres',
{"Error":false,"VariantFitments":[{"Name":"All Season","VariantType":11,"SeasonalType":true,"TruckType":false,"FriendlyName":"allseason","Count":17,
...
"TakeoverCss":"\u003clink id=\"brandtakeover-css\" rel=\u0027stylesheet\u0027 type=\u0027text/css\u0027 href=\u0027/templates/bcstyles/css/goodyear-effgrip-perf.css\u0027\u003e"},
"Width",
"Profile",
"Rim",
"Speed",
"Method",
true,
""
);
addToBasket = new BC.classes.addtobasket('https://www.blackcircles.com/order/tyres', "order", '/truck/garages');
...
Actually the portion within curly braces represents a JSON object containing all displayed on the webpage data. So you can extract that JSON string from HTML content by Instr(), parse it, convert to arrays and output to the worksheet, as shown in the example code below. Import JSON.bas module into the VBA project for JSON processing.
Option Explicit
Sub Test_blackcircles()
Dim sResp As String
Dim vJSON As Variant
Dim sState As String
Dim i As Long
Dim vItem
Dim aData()
Dim aHeader()
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.blackcircles.com/order/tyres/search?width=205&profile=55&rim=R16&speed=V&vehicle-make=&postcode=&delivery=1&findTyre=", False
.send
sResp = .responseText
End With
sResp = getFragment(sResp, "new BC.classes.productV6SearchPage", "new BC.classes.addtobasket")
sResp = getFragment(sResp, "{", "}")
sResp = "{" & sResp & "}"
JSON.Parse sResp, vJSON, sState
i = 1
With Sheets(1)
.Cells.Delete
.Cells.WrapText = False
For Each vItem In Array( _
"Manufacturers", _
"CarManufacturers", _
"All", _
"Deals", _
"Best", _
"Rest", _
"SearchParams" _
)
.Cells(i, 1).Value = vItem
JSON.ToArray vJSON(vItem), aData, aHeader
OutputArray .Cells(i + 2, 1), aHeader
Output2DArray .Cells(i + 3, 1), aData
.Columns.AutoFit
i = i + UBound(aData, 1) + 5
Next
End With
End Sub
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "#"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "#"
.Value = aCells
End With
End With
End Sub
Function getFragment( _
sourceText As String, _
startPattern As String, _
endPattern As String _
) As String
Dim startPos
startPos = InStr(sourceText, startPattern)
If startPos = 0 Then Exit Function
Dim partText
partText = Mid(sourceText, startPos + Len(startPattern))
Dim endPos
endPos = InStrRev(partText, endPattern)
If endPos = 0 Then Exit Function
getFragment = Left(partText, endPos - 1)
End Function
BTW, the similar approach applied in other answers.

VBA Excel pulling new webpage data after clicking on "submit"

I'm trying to pull some info from a website that provides oil well data by API number (API is a unique number for every well in the US)
Website: http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1
API example: 1708300502
The issue is, when I get to the 2nd page, IE.document.getElementsByTagName("body")(0).innerText still returns data from the initial page. How do I fetch the updated page data?
The ultimate goal is to get to the 2nd page, click on "30570" via IE.document.getElementsByTagName("a")(0).Click and then read the final 3rd page. I just cannot figure out how to read the updated page :(
Option Explicit
Sub sonris_WellData()
Dim IE As InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
This seems to be working. Rather than DoEvents use the WinAPI Sleep function. I also added a call to the Sleep function after the form submit.
MOre often we are seeing sites that are dynamically served by some javascript/etc., in these cases the browser may appear to be READYSTATE_COMPLETE or not Busy but the page has not yet rendered the "new" results.
Option Explicit
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub sonris_WellData()
Dim IE As Object 'InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
Sleep 1000
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
You can experiment maybe with a slightly longer Sleep after the .submit.
Alternatively, I notice that after you submit, the URL changes, so you could also try changing the second waiting loop to:
Do While IE.LocationURL ="http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Sleep 1000
Loop
This should put the Excel.Application to wait until the URL has changed.
Alternatively, you may have better luck using an XMLHTTPRequest (there are many examples of this here on SO and elsewhere on the internet). This allows you to send a request just like the browser, without actually using a web browser. Then you can simply parse the return text as HTML or XML. I would use the Microsoft XML, v6.0 library reference for this.
POST requests:
① Entering the Well API number
I examined the web page making the selections you mention. I inspected the web traffic using fiddler and noticed that the initial request, when you submit the API number is handled by a POST request.
② POST request:
The POST body has the following parameter:
p_apinum is the key and the associated value is the original Well API number.
Using this info I formulated a POST request direct thus avoiding your first landing page.
③ Pressing the hyperlink:
Next, I noticed that the element you wanted to press:
Looking at the associated HTML it has an associated relative hyperlink:
I use a helper function to parse the page HTML to get this relative link and construct the absolute path: GetNextURL(page.body.innerHTML).
④ Making a new request:
I re-use my HTTPRequest function GetPage to send a second request, with an empty body, and grab all the tables from the HTML document returned via: page.getElementsByTagName("table").
⑤ Writing the tables to the Excel worksheet:
I loop all the tables on the page using helper function AddHeaders to write out the table headers, and WriteTables to write the current table to the sheet.
Example page content:
Example code output:
VBA:
Option Explicit
Public Sub GetWellInfo()
Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
Const PARAM1 As String = "p_apinum"
Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
apiNumbers = Array(1708300502, 1708300503)
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ws
.Cells.ClearContents
For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
Set page = GetPage(BASESTRING & GetNextURL(page.body.innerHTML))
Dim allTables As Object
Set allTables = page.getElementsByTagName("table")
For Each targetTable In allTables
AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
WriteTables targetTable, GetLastRow(ws, 1), ws
Next targetTable
Next currNumber
End With
Application.ScreenUpdating = True
End Sub
Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
Dim objHTTP As Object, html As New HTMLDocument
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
Dim sBody As String
If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber
With objHTTP
.SetTimeouts 10000, 10000, 10000, 10000
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
On Error Resume Next
.send (sBody)
If Err.Number = 0 Then
If .Status = "200" Then
html.body.innerHTML = .responseText
Set GetPage = html
Else
Debug.Print "HTTP " & .Status & " " & .statusText
Exit Function
End If
Else
Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
Exit Function
End If
On Error GoTo 0
End With
End Function
Public Function GetNextURL(ByVal inputString As String)
GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
End Function
Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
ws.Cells(startRow, columnCounter) = header.innerText
Next header
End Sub
Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
r = startRow
With ActiveSheet
Set tRow = hTable.getElementsByTagName("tr")
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
For Each td In tCell
.Cells(r, c).Value = td.innerText
c = c + 1
Next td
r = r + 1: c = 1
Next tr
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
References:
VBE > Tools > References > HTML Object Library.