Copy data from Web to excel using VBA - html

I have a webpage and need my code to copy all the data from the page and paste it into an Excel sheet, which is not happening right now: the sheet ends up completely blank. I think the ^a shortcut is not selecting the data in IE, so there is nothing to copy.
Any help is much appreciated. Below is the code I am using.
Sub Webdata()
Dim assetname As String, country As String, area As String, region As String, pth As String, folname As Variant, assetname1 As String
Website = "http://website.com/"
Set myIE = CreateObject("InternetExplorer.Application")
myIE.Navigate source
myIE.Visible = True
Application.Wait Now + TimeSerial(0, 0, 10)
SendKeys "^a"
Application.Wait Now + TimeSerial(0, 0, 2)
SendKeys "^c"
Application.Wait Now + TimeSerial(0, 0, 2)
Sheets.Add
ActiveSheet.Name = "Webdata"
ActiveSheet.Paste
Application.Wait Now + TimeSerial(0, 0, 2)
Range("A1").Select
Cells.Find(What:="Api Number", After:=ActiveCell, LookIn:= _
xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:= _
xlNext, MatchCase:=False, SearchFormat:=False).Activate
ActiveCell.Offset(1, 0).Select
Selection.Copy
Sheets("Sheet1").Activate
Range("C2").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Application.CutCopyMode = False
myIE.Quit
Set myIE = Nothing
Err.Clear
Sheets("Webdata").Select
ActiveSheet.Delete
End Sub

That table is a mess, so rather than spending time perfecting how to write the table out to the sheet the way I normally would (looping the table's rows and the cells within each row), I will stick with your idea of copying the table, but put it on the clipboard with .SetText rather than using SendKeys.
The table of interest is within nested frames so you have to negotiate those first.
Set hTable = .frames(2).document.getElementsByTagName("table")(0)
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, html As HTMLDocument, hTable As HTMLTable, clipboard As Object
Application.ScreenUpdating = False
With IE
.Visible = True
.navigate "http://pipeline.wyo.gov/Wellapi.cfm?oops=IDxxxxx&nAPINO=xxxxxx" '<==Input your personal URL here
While .Busy Or .readyState < 4: DoEvents: Wend
Set html = .document
With html
Set hTable = .frames(2).document.getElementsByTagName("table")(0)
Set clipboard = New MSForms.DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
.Quit
End With
Application.ScreenUpdating = True
End Sub
References (VBE > Tools > References):
Microsoft Forms 2.0 Object Library
Microsoft HTML Object Library
Microsoft Internet Controls
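For comparison, the row-and-cell loop mentioned at the top of this answer could look roughly like the sketch below. It is a minimal illustration rather than the code used above: WriteTableByLoop is a hypothetical helper name, and it assumes hTable is the table element already retrieved from the frame.

```vba
Option Explicit

' Sketch: write an HTML table to a worksheet one cell at a time.
' Assumes hTable is an MSHTML table element (for example the hTable object above).
Public Sub WriteTableByLoop(ByVal hTable As Object, ByVal ws As Worksheet, _
                            Optional ByVal startRow As Long = 1)
    Dim r As Long, c As Long
    ' HTML row and cell collections are 0-based; worksheet cells are 1-based
    For r = 0 To hTable.Rows.Length - 1
        For c = 0 To hTable.Rows(r).Cells.Length - 1
            ws.Cells(startRow + r, c + 1).Value = hTable.Rows(r).Cells(c).innerText
        Next c
    Next r
End Sub
```

Called as WriteTableByLoop hTable, ActiveSheet in place of the three clipboard lines. It is slower than pasting but gives control over where each value lands.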

Related

Cannot click search result elements after submitting HTML web form with embedded results table - VBA web scrape

I am trying to scrape data from the following URL: http://iswdataclient.azurewebsites.net/webSearchID.aspx?dbkey=parkercad.
I can insert and query a property ID but after loading the search results, I am unable to successfully click the "View Property" link in the results table.
My initial debugging suggested that the form had not actually submitted, meaning the link was not present on the webpage. However, the HTML in the subsequent results page shows the additional elements for the search results. I have unsuccessfully tried the following to wait for the webpage to load, but I do not think it is a timing issue:
Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
Do While oIE.ReadyState = 4: WScript.Sleep 100: Loop
Do While oIE.ReadyState <> 4: WScript.Sleep 100: Loop
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
Application.Wait (Now + TimeValue("00:00:03"))
I have parsed the HTML a number of ways, also considering an event handling issue, beginning with a drill down at the form level:
Set ie = CreateObject("internetexplorer.application")
With ie
.navigate "http://iswdataclient.azurewebsites.net/webProperty.aspx?dbkey=parkercad"
.Visible = True
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
For Each propid In Range(Cells(2, 8), Cells(2, 8)) 'Cells(Range("H" & Rows.Count).End(xlUp).Row, 8)) 'propid = R000001972
If propid <> "N/A" Then
On Error Resume Next
With ie.document.body
For iFRM = 0 To .getElementsByTagName("form").Length - 1
If .getElementsByTagName("form")(iFRM).ID = "searchForm" Then
With .getElementsByTagName("form")(iFRM)
For iNPT = 0 To .getElementsByTagName("input").Length - 1
Select Case .getElementsByTagName("input")(iNPT).Name
Case "ucSearchID$searchid"
.getElementsByTagName("input")(iNPT).Value = propid
Case "ucSearchID$ButtonSearch"
.getElementsByTagName("input")(iNPT).Click
End Select
Next iNPT
Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
Application.Wait (Now + TimeValue("00:00:02"))
Exit For
End With
Exit For
End If
Next iFRM
End With
As well as a simple parse of the required elements:
Set ie = CreateObject("internetexplorer.application")
With ie
.navigate "http://iswdataclient.azurewebsites.net/webProperty.aspx?dbkey=parkercad"
.Visible = True
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Set intags = ie.document.getElementsByTagName("input")
For Each intag In intags
If intag.classname = "searchid" Then
intag.Value = propid
Set evt = ie.document.createEvent("keyboardevent")
evt.initEvent "change", True, False
intag.dispatchEvent evt
End If
Next intag
ie.document.getelementbyid("ucSearchID_ButtonSearch").Click
While ie.readyState <> 4 Or ie.Busy: DoEvents: Wend
I also tried drilling down through the table cells, but I have since deleted that code. I suspected an event handling issue, but the webpage does update; I just cannot parse the updated HTML from the results table.
Debug.Print ie.document.getelementbyid("lblResults").innerText
The Debug.Print returns "Your search of ' ' returned 0 result(s)", while the webpage reflects a successful search with "Your search of 'R000001972' returned 1 result(s)". So my code successfully submits the form but does not click the "View Property" link on the results page, because it fails to parse the updated HTML:
For at = 0 To ie.document.getElementsByTagName("a").Length - 1
Select Case ie.document.getElementsByTagName("a")(at).ID
Case "ucResultsGrid_" & propid
ie.document.getElementsByTagName("a")(at).Click
End Select
Next at
It does not seem to be either a timing or event handling issue. Unsure of how to proceed. Any help would be much appreciated.
It's an aspx page. You can perform the same GET and POST requests it does in a simplified form. I use clipboard to write out sample tables. You can amend as you choose.
Option Explicit
Public Sub GetPropertyInfo()
Dim html As MSHTML.HTMLDocument, xhr As Object
Application.ScreenUpdating = False
Set html = New MSHTML.HTMLDocument
Set xhr = CreateObject("MSXML2.ServerXMLHTTP")
Dim body As String, propertyId As String
propertyId = "R000001972"
With xhr
.Open "GET", "http://iswdataclient.azurewebsites.net/webSearchID.aspx?dbkey=parkercad&stype=id&sdata=" & propertyId, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
If html.querySelectorAll("#dvPrimary table tr").Length <= 1 Then Exit Sub
body = GetPostBody(html, propertyId)
.Open "POST", "http://iswdataclient.azurewebsites.net/webProperty.aspx?dbkey=parkercad&stype=id&sdata=" _
& propertyId & "&id=" & propertyId, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send body
html.body.innerHTML = .responseText
End With
Dim ws As Worksheet, clipboard As Object, i As Long
Set ws = ThisWorkbook.Worksheets(1)
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
With ws.Cells
.ClearContents
.ClearFormats
End With
With html.querySelectorAll("table")
For i = 8 To .Length - 1
clipboard.SetText .Item(i).outerHTML
clipboard.PutInClipboard
ws.Range("A" & GetLastRow(ws) + 2).PasteSpecial
Next
End With
Application.ScreenUpdating = True
End Sub
Public Function GetPostBody(ByVal html As MSHTML.HTMLDocument, ByVal propertyId As String) As String
Dim i As Long, result As String
With html.querySelectorAll("input[type=hidden]")
For i = 0 To .Length - 1
result = result & .Item(i).ID & "=" & .Item(i).Value & "&"
Next
End With
result = result & "__EVENTTARGET=ucResultsGrid$" & propertyId
GetPostBody = result
End Function
Public Function GetLastRow(ByVal sh As Worksheet) As Long
On Error Resume Next
GetLastRow = sh.Cells.Find(What:="*", _
After:=sh.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Row
On Error GoTo 0
End Function
References (VBE > Tools > References):
Microsoft HTML Object Library
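One thing to watch with GetPostBody: hidden ASP.NET fields such as __VIEWSTATE typically hold Base64 text, which can contain +, / and =, and those characters can be misread in a form-encoded body. If the POST comes back without the expected tables, percent-encoding each value before concatenating is worth trying. The helper below is only a sketch: UrlEncodeAscii is a hypothetical name, and it handles ASCII input only.

```vba
Option Explicit

' Sketch: percent-encode a string for use in a form-encoded POST body.
' Only ASCII characters are encoded; anything else is passed through unchanged.
Public Function UrlEncodeAscii(ByVal source As String) As String
    Dim i As Long, ch As String, code As Long, result As String
    For i = 1 To Len(source)
        ch = Mid$(source, i, 1)
        code = AscW(ch)
        Select Case code
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                ' unreserved characters: 0-9 A-Z a-z - . _ ~
                result = result & ch
            Case 0 To 127
                ' every other ASCII character becomes %XX
                result = result & "%" & Right$("0" & Hex$(code), 2)
            Case Else
                ' non-ASCII: left as-is in this sketch
                result = result & ch
        End Select
    Next i
    UrlEncodeAscii = result
End Function
```

In GetPostBody the value would then be appended as UrlEncodeAscii(.Item(i).Value). In Excel 2013 and later, Application.WorksheetFunction.EncodeURL is a built-in alternative.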

Extract table from webpage using VBA

I would like to extract the table from the HTML of a web page into Excel using VBA.
I have tried the following code several times, changing parts of it, but I keep getting an error.
Sub GrabTable()
'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = False
'navigate to page with needed data
objIE.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'we will output data to excel, starting on row 1
y = 1
'look at all the 'tr' elements in the 'table' with id 'InputTable2',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementByClassName("InputTable2").getElementsByTagName("tr")
'show the text content of 'td' element being looked at
Debug.Print ele.textContent
'each 'tr' (table row) element contains 2 children ('td') elements
'put text of 1st 'td' in col A
Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
'put text of 2nd 'td' in col B
Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
y = y + 1
'repeat until last ele has been evaluated
Next
End Sub
I show you two methods:
1) Using IE: the data is inside an iframe, which needs to be negotiated.
2) Using an XMLHTTP request: much faster and without opening a browser. It uses the first part of the iframe document URL, which is what the iframe is navigating to.
In both cases I access the company name and then the disclosure info table. For the main disclosure info table I copy the outerHTML to the clipboard and paste it into Excel to avoid looping all the rows and columns. You can simply loop the tr (table rows) and the td (table cells) within them instead.
IE:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, clipboard As Object
With IE
.Visible = True
.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
While .Busy Or .readyState < 4: DoEvents: Wend
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
With .document.getElementById("bm_ann_detail_iframe").contentDocument
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .getElementsByClassName("company_name")(0).innerText
clipboard.SetText .getElementsByTagName("table")(1).outerHTML
clipboard.PutInClipboard
End With
ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
.Quit
End With
End Sub
XMLHTTP:
You can extract a different URL from the front-end of the iframe URL and use that as shown below.
Option Explicit
Public Sub GetTable()
Dim sResponse As String, html As HTMLDocument, clipboard As Object
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = sResponse
With html
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .querySelector(".company_name").innerText
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText .querySelector(".InputTable2").outerHTML
clipboard.PutInClipboard
End With
ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
End Sub
Try it this way.
Sub Web_Table_Option_Two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for JavaScript to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub

Scrape values from website using VBA

I need help scraping some data from a website.
As a first step I managed to open the website and pass in my variables, but:
1. I don't know how to press the "Convert currencies" button
2. and afterwards how to get the "Converted Amount" and "Rate" into Excel.
Any help will be appreciated!
Sub Test()
Dim IE As InternetExplorer
Dim Amount As String
Dim Source As String
Dim Target As String
Dim Datestring As String
Amount = 10000
Source = "Euro"
Target = "UK pound sterling"
Datestring = "03-08-2018"
'Open Browser and download data
Set IE = New InternetExplorer
With IE
.Visible = True
.Navigate "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=" & _
Amount & _
"&sourceCurrency=" & _
Source & _
"&targetCurrency=" & _
Target & _
"&inputDate=" & _
Datestring & _
"&submitConvert.x=209&submitConvert.y=10"
submitConvert.Click
While .Busy Or .readyState < 4: DoEvents: Wend
End With
End Sub
XmlHttpRequest (XHR):
XHR is faster because no browser has to open.
Option Explicit
Public Sub GetRates()
Dim sResponse As String, i As Long, html As New HTMLDocument, clipboard As Object
Dim sourceAmount As String, sourceCurrency As String, targetCurrency As String, inputDate As String
sourceAmount = "10000"
sourceCurrency = "EUR"
targetCurrency = "GBP"
inputDate = "03-08-2018"
Dim url As String
url = "http://sdw.ecb.europa.eu//curConverter.do?sourceAmount=" & sourceAmount & "&sourceCurrency=" & sourceCurrency & _
"&targetCurrency=" & targetCurrency & "&inputDate=" & inputDate & "&submitConvert.x=52&submitConvert.y=8"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With html
.body.innerHTML = sResponse
Set clipboard = New MSForms.DataObject
clipboard.SetText .querySelectorAll("table.tableopenpage").item(1).outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
End Sub
Less robust, but if you just want the converted amount:
.querySelectorAll("table.tableopenpage").item(1).getElementsbytagname("td")(7).innertext
And the rate:
.querySelectorAll("table.tableopenpage").item(1).getElementsbytagname("td")(10).innertext
References:
Microsoft Forms 2.0 Object Library (or add a userform to your project)
Microsoft HTML Object Library
Internet Explorer:
The data is already there due to the query string you are using in the URL. No need for a click.
Just use the correct abbreviations for currencies.
Option Explicit
Public Sub Test()
Dim IE As InternetExplorer, Amount As String, Source As String, Target As String
Dim Datestring As String, hTable As HTMLTable
Amount = 10000
Source = "EUR"
Target = "GBP"
Datestring = "03-08-2018"
Dim url As String
url = "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=" & _
Amount & _
"&sourceCurrency=" & _
Source & _
"&targetCurrency=" & _
Target & _
"&inputDate=" & _
Datestring & _
"&submitConvert.x=209&submitConvert.y=10"
Set IE = New InternetExplorer
With IE
.Visible = True
.navigate url
While .Busy Or .readyState < 4: DoEvents: Wend
Dim clipboard As Object
Set clipboard = New MSForms.DataObject
clipboard.SetText .document.getElementsByClassName("tableopenpage")(1).outerHTML
clipboard.PutInClipboard
ActiveSheet.Cells(1, 1).PasteSpecial
End With
End Sub
If interested in how to click though:
1) Use the correct 3 letter abbreviations for the currencies.
2) You can click the submit button with:
.document.querySelector("input[name=submitConvert]").Click
It uses a CSS selector of
input[name=submitConvert]
This says
an element with an input tag that has a name attribute whose value is submitConvert.
3) You then need a
While .Busy Or .readyState < 4: DoEvents: Wend
to allow the page to refresh.
4) You can then grab the results table with:
.document.querySelectorAll("table.tableopenpage").item(1)
This collects all elements with a tag table and class tableopenpage. You want the second of these, which is 1 on a 0-based index system.
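Put together, steps 1 to 4 could be sketched as below. This is an untested outline under stated assumptions: it assumes the form is pre-filled when the page is opened with the query string but without the submitConvert coordinates, and ClickConvertAndPaste is a hypothetical name. The selectors and the clipboard class ID are the ones used elsewhere in this answer and thread.

```vba
Option Explicit

' Sketch: open the converter, click the submit button, wait, then paste the results table.
Public Sub ClickConvertAndPaste()
    Dim ie As Object, resultTables As Object, clipboard As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        ' Assumption: these parameters pre-fill the form without submitting it
        .navigate "http://sdw.ecb.europa.eu/curConverter.do?sourceAmount=10000" & _
                  "&sourceCurrency=EUR&targetCurrency=GBP&inputDate=03-08-2018"
        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Step 2: click the submit button via its name attribute
        .document.querySelector("input[name=submitConvert]").Click

        ' Step 3: let the page refresh
        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Step 4: the second table with class tableopenpage holds the results
        Set resultTables = .document.querySelectorAll("table.tableopenpage")
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText resultTables.item(1).outerHTML
        clipboard.PutInClipboard
        ActiveSheet.Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub
```

The late-bound clipboard object avoids needing the Forms reference for this sketch.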
References required:
Microsoft Internet Controls
Microsoft HTML Object Library
Microsoft Forms 2.0 Object Library
Other:
I find it simpler to grab the table in one go but you could target the rate, for example, more specifically with a CSS selector of:
a[target*=quickview]
Be aware that Excel may swap dates from dd/mm/yyyy to mm/dd/yyyy on output, so check the pasted values and correct them if needed.
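One way to sidestep the day/month swap is to parse the date text yourself and write a real Date value to the cell. A minimal sketch, assuming the text is always in day-month-year order with - or / separators; ParseDayMonthYear is a hypothetical helper name:

```vba
Option Explicit

' Sketch: turn "dd-mm-yyyy" or "dd/mm/yyyy" text into a Date without relying on locale settings.
Public Function ParseDayMonthYear(ByVal dateText As String) As Date
    Dim parts() As String
    parts = Split(Replace$(Trim$(dateText), "/", "-"), "-")
    If UBound(parts) <> 2 Then Err.Raise 5, , "Expected a date in dd-mm-yyyy form"
    ' DateSerial takes year, month, day explicitly, so no guessing is involved
    ParseDayMonthYear = DateSerial(CLng(parts(2)), CLng(parts(1)), CLng(parts(0)))
End Function
```

For example, ActiveSheet.Range("A1").Value = ParseDayMonthYear("03-08-2018") stores 3 August 2018 regardless of regional settings; the cell's number format then only controls how it is displayed.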

Web Query where there are multiple Frames

My goal is to scrape the source code of a web page.
The site seems to have different Frames which is why my code won't work properly.
I tried to modify a code which I found online which should solve the Frame issue.
The following code creates an error (object required) at:
Set profileFrame = .document.getElementById("profileFrame")
Public Sub IE_Automation()
'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library
Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement
'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
Set IE = New InternetExplorer
With IE
.Visible = True
'Navigate to the main page
.navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
'Get the profileFrame iframe and navigate to it
Set profileFrame = .document.getElementById("profileFrame")
.navigate baseURL & profileFrame.src
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
'Display all the text in the profileFrame iframe
MsgBox HTMLdoc.body.innerText
'Display just the text in the slots_container div
Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText
End Sub
Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy = True
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
General:
I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.
I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.
CSS Selectors
To target the elements of interest I use the following two CSS selectors. These use style information on the page to target the elements:
.SearchResults-link
.SearchResults-item
"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.
With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.
Code:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"
While .Busy Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object
' exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
' Do
' DoEvents
' On Error Resume Next
' Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
' On Error GoTo 0
' If Now > exitTime Then Exit Do
' Loop While linksNodeList Is Nothing
Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
Set profileNodeList = .document.querySelectorAll(".SearchResults-item")
Dim i As Long
For i = 0 To profileNodeList.Length - 1
Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
Debug.Print "Basic info: " & profileNodeList.item(i).innerText
Next i
End With
End Sub
Public Function GetURL(ByVal htmlSection As String) As String
Const BASESTRING As String = "https://www.xing.com/profile/"
Dim arr() As String
arr = Split(htmlSection, "/")
GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function

VBA Excel pulling new webpage data after clicking on "submit"

I'm trying to pull some info from a website that provides oil well data by API number (API is a unique number for every well in the US)
Website: http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1
API example: 1708300502
The issue is, when I get to the 2nd page, IE.document.getElementsByTagName("body")(0).innerText still returns data from the initial page. How do I fetch the updated page data?
The ultimate goal is to get to the 2nd page, click on "30570" via IE.document.getElementsByTagName("a")(0).Click and then read the final 3rd page. I just cannot figure out how to read the updated page :(
Option Explicit
Sub sonris_WellData()
Dim IE As InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
This seems to be working. Rather than DoEvents, use the WinAPI Sleep function. I also added a call to Sleep after the form submit.
More and more often we see sites whose content is served dynamically by JavaScript or similar. In these cases the browser may report READYSTATE_COMPLETE and not Busy even though the page has not yet rendered the "new" results.
Option Explicit
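'PtrSafe is required for Declare statements in 64-bit Office (VBA7)
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If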
Sub sonris_WellData()
Dim IE As Object 'InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
Sleep 1000
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
You can experiment maybe with a slightly longer Sleep after the .submit.
Alternatively, I notice that after you submit, the URL changes, so you could also try changing the second waiting loop to:
Do While IE.LocationURL = "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Sleep 1000
Loop
This should make Excel wait until the URL has changed.
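A loop that waits on the URL will spin forever if the submit fails and the URL never changes. A sketch of the same idea with a timeout is shown below; WaitForUrlChange is a hypothetical helper, the 15 second default is an arbitrary assumption, and it pauses with Application.Wait so that it needs no API declaration.

```vba
Option Explicit

' Sketch: wait until the browser has left oldUrl and finished loading, or give up.
' Returns True on success and False if the timeout is reached first.
Public Function WaitForUrlChange(ByVal ie As Object, ByVal oldUrl As String, _
                                 Optional ByVal timeoutSeconds As Long = 15) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    ' First wait for the address to change
    Do While ie.LocationURL = oldUrl
        If Now > deadline Then Exit Function
        Application.Wait Now + TimeSerial(0, 0, 1)
    Loop

    ' Then wait for the new page to finish loading
    Do While ie.Busy Or ie.readyState <> 4
        If Now > deadline Then Exit Function
        Application.Wait Now + TimeSerial(0, 0, 1)
    Loop

    WaitForUrlChange = True
End Function
```

Usage would be along the lines of: If Not WaitForUrlChange(IE, "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1") Then MsgBox "Timed out".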
Alternatively, you may have better luck using an XMLHTTPRequest (there are many examples of this here on SO and elsewhere on the internet). This allows you to send a request just like the browser, without actually using a web browser. Then you can simply parse the return text as HTML or XML. I would use the Microsoft XML, v6.0 library reference for this.
POST requests:
① Entering the Well API number
I examined the web page while making the selections you mention. I inspected the web traffic using Fiddler and noticed that the initial request, made when you submit the API number, is a POST request.
② POST request:
The POST body has the following parameter:
p_apinum is the key and the associated value is the original Well API number.
Using this info I formulated a POST request directly, thus avoiding your first landing page.
③ Pressing the hyperlink:
Next, I looked at the element you wanted to press. Its HTML has an associated relative hyperlink.
I use a helper function to parse the page HTML to get this relative link and construct the absolute path: GetNextURL(page.body.innerHTML).
④ Making a new request:
I re-use my HTTPRequest function GetPage to send a second request, with an empty body, and grab all the tables from the HTML document returned via: page.getElementsByTagName("table").
⑤ Writing the tables to the Excel worksheet:
I loop all the tables on the page using helper function AddHeaders to write out the table headers, and WriteTables to write the current table to the sheet.
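The link parsing in step ③ relies on the page containing at least one href=; if it does not, the Split chain in GetNextURL raises a "Subscript out of range" error. A more defensive variant could be sketched as follows. TryGetFirstHref is a hypothetical name, not part of the code below, and it returns an empty string when no link is found.

```vba
Option Explicit

' Sketch: return the value of the first href attribute in an HTML string,
' or vbNullString if there is none. Strips the "about:" prefix that MSHTML can add.
Public Function TryGetFirstHref(ByVal html As String) As String
    Dim startPos As Long, endPos As Long, quoteChar As String, href As String

    startPos = InStr(1, html, "href=", vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len("href=")

    quoteChar = Mid$(html, startPos, 1)
    If quoteChar = Chr$(34) Or quoteChar = "'" Then
        ' quoted value: read up to the matching quote
        endPos = InStr(startPos + 1, html, quoteChar)
        If endPos = 0 Then Exit Function
        href = Mid$(html, startPos + 1, endPos - startPos - 1)
    Else
        ' unquoted value: read up to the end of the tag
        endPos = InStr(startPos, html, ">")
        If endPos = 0 Then Exit Function
        href = Mid$(html, startPos, endPos - startPos)
    End If

    TryGetFirstHref = Replace$(href, "about:", vbNullString)
End Function
```

The caller can then skip an API number whose results page has no link instead of stopping on an error.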
VBA:
Option Explicit
Public Sub GetWellInfo()
Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
Const PARAM1 As String = "p_apinum"
Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
apiNumbers = Array(1708300502, 1708300503)
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ws
.Cells.ClearContents
For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
Set page = GetPage(BASESTRING & GetNextURL(page.body.innerHTML))
Dim allTables As Object
Set allTables = page.getElementsByTagName("table")
For Each targetTable In allTables
AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
WriteTables targetTable, GetLastRow(ws, 1), ws
Next targetTable
Next currNumber
End With
Application.ScreenUpdating = True
End Sub
Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
Dim objHTTP As Object, html As New HTMLDocument
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
Dim sBody As String
If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber
With objHTTP
.SetTimeouts 10000, 10000, 10000, 10000
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
On Error Resume Next
.send (sBody)
If Err.Number = 0 Then
If .Status = "200" Then
html.body.innerHTML = .responseText
Set GetPage = html
Else
Debug.Print "HTTP " & .Status & " " & .statusText
Exit Function
End If
Else
Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
Exit Function
End If
On Error GoTo 0
End With
End Function
Public Function GetNextURL(ByVal inputString As String) As String
GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
End Function
Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
ws.Cells(startRow, columnCounter) = header.innerText
Next header
End Sub
Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
r = startRow: c = 1 'columns are 1-based; starting at 0 would make .Cells(r, 0) fail
With ws 'write to the sheet that was passed in rather than whichever sheet is active
Set tRow = hTable.getElementsByTagName("tr")
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
For Each td In tCell
.Cells(r, c).Value = td.innerText
c = c + 1
Next td
r = r + 1: c = 1
Next tr
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
References:
VBE > Tools > References > Microsoft HTML Object Library.