I am trying to extract data from a website and parse it using VBA. The code is working absolutely fine when stepping through the code however, when running it gives an 'Object Required' run-time error. Here is the code:
Function AuthToken()
Dim XMLHTTP
Dim result As String
Dim csrf As String
Dim token As Object
Dim objHTML As Object
Set objHTML = CreateObject("htmlfile")
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
With XMLHTTP
.Open "GET", "some website", False
.setRequestHeader "Cookie", cf
.SEND
result = .responseText
End With
With objHTML
.Open
.Write result
.Close
Set token = .getElementsByName("csrf-token")
End With
AuthToken = token(0).Content
AuthToken = Replace(Replace(Replace(AuthToken, "+", "%2B"), "/", "%2F"), "=", "%3D")
Set XMLHTTP = Nothing
End Function
The error is triggered by this line in the code:
AuthToken = token(0).Content
If I just click debug and resume, it works fine so I am really struggling to debug this. Can anyone help?
Please, instead of
result = .responseText
use
If .status = 200 Then
result = .responseText
End If
Your code needs to be sure that URL has been loaded completely...4
If you place the next debugging line:
Debug.Print token Is Nothing: Stop
immediately after
Set token = .getElementsByName("csrf-token")
Edited:
If not a matter of .status not "OK", you can try to wait a little before reaching the problematic code line. Please call the next Sub, just before that line:
copy the next API on top of the module keeping the code (in the declarations area):
Private Declare PtrSafe Function GetTickCount Lib “kernel32.dll” () As Long
Add the next waiting Sub:
Sub WaitALittle(Finish As Long)
Dim NowTick As Long,EndTick As Long
EndTick = GetTickCount + (Finish * 1000)
Do
NowTick = GetTickCount
DoEvents
Loop Until NowTick >= EndTick
End Sub
It can be simple called as WaitALittle 2. You may play a little with that 2 (time to wait)...
Related
I'm currently working on an excel table that reads various API's and processes the results.
I'm trying to adapt an api request for this table, but unfortunately I can't do it.
I'm assuming this good working code:
Public Sub Main()
On Error Resume Next
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://api.binance.com/api/v3/ticker/price", False
On Error Resume Next
http.Send
On Error GoTo error
Set json = ParseJson(http.ResponseText)
i = 10
For Each Item In json
If Item("symbol") = Workbooks(1).Worksheets("Tabelle1").Range("A1").Value And Workbooks(1).Worksheets("Tabelle1").Range("A1").Value <> "" Then
Sheets(1).Cells(1, i).Value = Item("price")
i = i + 1
End If
Next
Exit Sub
error:
End Sub
CELL A1 says: ETHBTC and I get the corresponding value in J1
Unfortunately, the following code doesn't work and I don't understand why:
Sub GetVolume()
On Error Resume Next
'List of all symbols
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://api.coincap.io/v2/candles", False
On Error Resume Next
http.Send
Set json = ParseJson(http.ResponseText)
i = 10
For Each Item In json("data")
If Item("exchange") = Workbooks(1).Worksheets("Tabelle1").Range("A1").Value And Workbooks(1).Worksheets("Tabelle1").Range("A1").Value <> "" Then
Sheets(1).Cells(1, i).Value = Item("volume")
i = i + 1
End If
Next
Exit Sub
error:
End Sub
In this case, the content of cell A1 looks like this:
binance&interval=m5&baseId=monero"eId=bitcoin&start=1649894400000&end=1649898000000
The request for this looks like this:
https://api.coincap.io/v2/candles?exchange=poloniex&interval=h1&baseId=ethereum"eId=bitcoin&start=1649894400000&end=1649898000000
I would be very grateful for a tip
I've tried different combinations, but get no answer
"https://api.coincap.io/v2/candles" HTTPRequest.responseText:
"{"error":"missing quote","timestamp":1672666785053}"
The code is returning an missing quote error, caused by missing query parameters.
CoinCap Docs has a query builder that you can use. Start with a working query and than start modifying parameters until it fits your specifications.
I recommend creating a function to handle returning the json data. This simplifies development by making it easier to test the code.
Function QueryCoinCap(URL As String) As Object
On Error GoTo QueryCoinCap_Error
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
Set QueryCoinCap = ParseJson(.ResponseText)
End With
On Error GoTo 0
Exit Function
QueryCoinCap_Error:
Rem CoinCap was unable to process the request
Rem to simplify the error handle I'll create a Scripting.Dictionary and ADD A CUSTOM ERROR TO IT
Rem This allows us to treat all errors in the same way
Dim Dictionary As Object
Set Dictionary = CreateObject("Scripting.Dictionary")
Dictionary("error") = "CoinCap was unable to process the request"
Set QueryCoinCap = Dictionary
End Function
Sub WorkingExampleQueryCoinCap()
Dim URL As String
URL = "https://api.coincap.io/v2/assets"
Dim JsonDictionary As Object
Set JsonDictionary = QueryCoinCap(URL)
If JsonDictionary.Exists("error") Then
Debug.Print JsonDictionary("error")
Else
Stop
Rem Do Something
End If
End Sub
Sub ErrorExampleQueryCoinCap()
Dim URL As String
URL = "https://api.coincap.io/v2/candles"
Dim JsonDictionary As Object
Set JsonDictionary = QueryCoinCap(URL)
If JsonDictionary.Exists("error") Then
Debug.Print JsonDictionary("error")
Else
Rem Do Something
End If
End Sub
I'm getting
run-time error 424
in 68th row (line)
request.Open "GET", Url, False
and I don't know how to fix it.
My previous question I posted ;
How to scrape specific part of online english dictionary?
My final goal is to get result like this;
A B
beginning bɪˈɡɪnɪŋ
behalf bɪˈhæf
behave bɪˈheɪv
behaviour bɪˈheɪvjər
belong bɪˈlɔːŋ
below bɪˈloʊ
bird bɜːrd
biscuit ˈbɪskɪt
Here's code I wrote, and it's mostly based on someone else's code I found on internet.
' Microsoft ActiveX Data Objects x.x Library
' Microsoft XML, v3.0
' Microsoft VBScript Regular Expressions
Sub ParseHelp()
' Word reference from
Dim Url As String
Url = "https://www.oxfordlearnersdictionaries.com/definition/english/" & Cells(ActiveCell.Row, "B").Value
' Get dictionary's html
Dim Html As String
Html = GetHtml(Url)
' Check error
If InStr(Html, "<TITLE>Not Found</Title>") > 0 Then
MsgBox "404"
Exit Sub
End If
' Extract phonetic alphabet from HTML
Dim wrapPattern As String
wrapPattern = "<span class='name' (.*?)</span>"
Set wrapCollection = FindRegexpMatch(Html, wrapPattern)
' MsgBox StripHtml(CStr(wrapCollection(1)))
' Fill phonetic alphabet into cell
If Not wrapCollection Is Nothing Then
Dim wrap As String
On Error Resume Next
wrap = StripHtml(CStr(wrapCollection(1)))
If Err.Number <> 0 Then
wrap = ""
End If
Cells(ActiveCell.Row, "C").Value = wrap
Else
MsgBox "not found"
End If
End Sub
Public Function StripHtml(Html As String) As String
Dim RegEx As New RegExp
Dim sOut As String
Html = Replace(Html, "</li>", vbNewLine)
Html = Replace(Html, " ", " ")
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "<[^>]+>"
End With
sOut = RegEx.Replace(Html, "")
StripHtml = sOut
Set RegEx = Nothing
End Function
Public Function GetHtml(Url As String) As String
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP")
Dim converter As New ADODB.stream
' Get
request.Open "GET", Url, False
request.send
' raw bytes
converter.Open
converter.Type = adTypeBinary
converter.Write request.responseBody
' read
converter.Position = 0
converter.Type = adTypeText
converter.Charset = "utf-8"
' close
GetHtml = converter.ReadText
converter.Close
End Function
Public Function FindRegexpMatch(txt As String, pat As String) As Collection
Set FindRegexpMatch = New Collection
Dim rx As New RegExp
Dim matcol As MatchCollection
Dim mat As Match
Dim ret As String
Dim delimiter As String
txt = Replace(txt, Chr(10), "")
txt = Replace(txt, Chr(13), "")
rx.Global = True
rx.IgnoreCase = True
rx.MultiLine = True
rx.Pattern = pat
Set matcol = rx.Execute(txt)
'MsgBox "Match:" & matcol.Count
On Error GoTo ErrorHandler
For Each mat In matcol
'FindRegexpMatch.Add mat.SubMatches(0)
FindRegexpMatch.Add mat.Value
Next mat
Set rx = Nothing
' Insert code that might generate an error here
Exit Function
ErrorHandler:
' Insert code to handle the error here
MsgBox "FindRegexpMatch. " & Err.GetException()
Resume Next
End Function
Any kind of help would be greatly appreciated.
The following is an example of how to read in values from column A and write out pronounciations to column B. It uses css selectors to match a child node then steps up to parentNode in order to ensure entire pronounciation is grabbed. There are a number of ways you could have matched on the parent node to get the second pronounciation. Note that I use a parent node and Replace as the pronounciation may span multiple childNodes.
If doing this for lots of lookups please be a good netizen and put some waits in the code so as to not bombard the site with requests.
Option Explicit
Public Sub WriteOutPronounciations()
Dim html As MSHTML.HTMLDocument, i As Long, ws As Worksheet
Dim data As String, lastRow As Long, urls()
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row 'you need at least two words in column A or change the redim.
urls = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(urls))
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(urls) To UBound(urls)
.Open "GET", "https://www.oxfordlearnersdictionaries.com/definition/english/" & urls(i), False
.send
html.body.innerHTML = .responseText
data = Replace$(Replace$(html.querySelector(".name ~ .wrap").ParentNode.innerText, "/", vbNullString), Chr$(10), Chr$(32))
results(i) = Right$(data, Len(data) - 4)
Next
End With
With ThisWorkbook.Worksheets(1)
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub
Required references (VBE>Tools>References):
Microsoft HTML Object Library
Should you go down the API route then here is a small example. You can make 1000 free calls in a month with Prototype account. The next best, depending on how many calls you wish to make looks like the 10,001 calls (that one extra PAYG call halves the price). # calls will be affected by whether word is head word or needs lemmas lookup call first. The endpoint construction you need is GET /entries/{source_lang}/{word_id}?fields=pronunciations though that doesn't seem to filter massively. You will need a json parser to handle the json returned e.g. github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas. Download raw code from there and add to standard module called JsonConverter. You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Option Explicit
Public Sub WriteOutPronounciations()
Dim html As MSHTML.HTMLDocument, i As Long, ws As Worksheet
Dim data As String, lastRow As Long, words()
'If not performing lemmas lookup then must be head word e.g. behave, behalf
Const appId As String = "yourAppId"
Const appKey As String = "yourAppKey"
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row
words = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(words))
Set html = New MSHTML.HTMLDocument
Dim json As Object
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(words) To UBound(words)
.Open "GET", "https://od-api.oxforddictionaries.com/api/v2/entries/en-us/" & LCase$(words(i)) & "?fields=pronunciations", False
.setRequestHeader "app_id", appId
.setRequestHeader "app_key", appKey
.setRequestHeader "ContentType", "application/json"
.send
Set json = JsonConverter.ParseJson(.responseText)
results(i) = IIf(json("results")(1)("type") = "headword", json("results")(1)("lexicalEntries")(1)("pronunciations")(2)("phoneticSpelling"), "lemmas lookup required")
Set json = Nothing
Next
End With
With ThisWorkbook.Worksheets(1)
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub
I want to scrape this URL https://www.realtor.com/realestateandhomes-search/06510 using the VBA InStr function and extract all URLs with this substring "06510"
Here's is a sample code I've been trying to make work.
Option Explicit
Sub GetLinks()
'
'To use HTMLDocument you need to set a reference to Tools -> References -> Microsoft HTML Object Library
Dim HTML As New HTMLDocument
Dim http As Object
Dim links As Object
Dim link As HTMLHtmlElement
Dim counter As Long
Dim website As Range
Dim LastRange As Range
Dim row As Long
Dim continue As Boolean
Dim respHead As String
Dim lRow As Long
Application.ScreenUpdating = False
' The row where website addresses start
row = 30
continue = True
lRow = Cells(Rows.count, 1).End(xlUp).row + 1
' XMLHTTP gives errors where ServerXMLHTTP does not
' even when using the same URL's
'Set http = CreateObject("MSXML2.XMLHTTP")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Do While continue
' Could set this to first cell with URL then OFFSET columns to get next web site
Set website = Range("A" & row)
Set LastRange = Range("B" & lRow)
If Len(website.Value) < 1 Then
continue = False
Exit Sub
End If
If website Is Nothing Then
continue = False
End If
'Debug.Print website
With http
On Error Resume Next
.Open "GET", website.Value, False
.send
' If Err.Num is not 0 then an error occurred accessing the website
' This checks for badly formatted URL's. The website can still return an error
' which should be checked in .Status
'Debug.Print Err.Number
' Clear the row of any previous results
Range("B" & row & ":e" & row).Clear
' If the website sent a valid response to our request
If Err.Number = 0 Then
If .Status = 200 Then
HTML.body.innerHTML = http.responseText
Set links = HTML.getElementsByTagName("a")
For Each link In links
If InStr(link.outerHTML, "06510") Then
LastRange.Value = link.href
End If
Next
End If
Set website = Nothing
Else
'Debug.Print "Error loading page"
LastRange.Value = "Error with website address"
End If
On Error GoTo 0
End With
row = row + 1
Loop
Application.ScreenUpdating = True
End Sub
After inspecting the page, here's a sample of the kind of URL to extract - https://www.realtor.com/realestateandhomes-detail/239-Bradley-St_New-Haven_CT_06510_M36855-92189. Any help will be appreciated
Using QHarr's code in a simplified way...
Sub GetLinks()
Dim url As String, links_count As Integer
Dim j As Integer, row As Integer
Dim XMLHTTP As Object, html As Object
'Dim tr_coll As Object, tr As Object
'Dim elements As Object
Dim i As Long, allLinksOfInterest As Object
'Dim td_coll As Object, td As Object, td_col, objT
url = "https://www.realtor.com/realestateandhomes-search/06510"
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
End Sub
Please check if I'm missing anything. I'm still getting the error "Object doesn't support this property or method"
Don't use Instr on entire node outerHTML during a loop of all a tags. There are times when this is required but this shouldn't be one of them (hopefully).
You want to use attribute = value css selector with contains, *, operator. It is specifically for the purpose of matching on substrings in attribute values. This is more efficient.
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
So,
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
Attribute = value with contains operator:
[attr*=value]
Represents elements with an attribute name of attr whose
value contains at least one occurrence of value within the string.
VBA:
Produces 26 links currently.All are relative links so need domain added as shown in loop. Some are duplicates so consider adding to a dictionary as keys so as remove duplicates.
Option Explicit
Public Sub GetLinks()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.realtor.com/realestateandhomes-search/06510", False
.send
html.body.innerHTML = .responseText
End With
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print Replace$(allLinksOfInterest.item(i).href,"about:","https://www.realtor.com")
Next
End Sub
If InStr(link.outerHTML, "06510") Then
In the code above, InStr function was used like boolean function. But it is not boolean, instead it returns integer. So, you should add comparison operator after function. May be like:
If InStr(link.outerHTML, "06510")>0 Then
I'm trying to import the information in JSON format from the following Url by WinHttpRequest:
https://bet.hkjc.com/football/getJSON.aspx?jsontype=odds_allodds.aspx&matchid=default
Sub test()
Dim xmlhttp As Object
Dim strUrl As String: strUrl = "https://bet.hkjc.com/football/getJSON.aspx?jsontype=odds_allodds.aspx&matchid=default"
Dim objRequest As Object
Set objRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With objRequest
.Open "GET", strUrl, False
.send
End With
Debug.Print objRequest.responseText
End Sub
However, it just shows nothing similar to the Url but a lot of garbled messages.
I would like to know how to address this problem.
The code works fine if I use other Url.
XHR:
I believe the page has bot prevention measures in place whereby, if it suspects you are a bot a challenge is raised which requires javascript to run. If that runs successfully an XHR request is issued with info from the challenge in the headers and that, were you to be using a browser, would lead to your content being correctly updated to show expected values.
The first time I ran GET request I got the expected json response and after that I got the following:
<HTML>
<head>
<script>
Challenge=649275;
ChallengeId=473313563;
GenericErrorMessageCookies="Cookies must be enabled in order to view this page.";
</script>
<script>
function test(var1)
{
var var_str=""+Challenge;
var var_arr=var_str.split("");
var LastDig=var_arr.reverse()[0];
var minDig=var_arr.sort()[0];
var subvar1 = (2 * (var_arr[2]))+(var_arr[1]*1);
var subvar2 = (2 * var_arr[2])+var_arr[1];
var my_pow=Math.pow(((var_arr[0]*1)+2),var_arr[1]);
var x=(var1*3+subvar1)*1;
var y=Math.cos(Math.PI*subvar2);
var answer=x*y;
answer-=my_pow*1;
answer+=(minDig*1)-(LastDig*1);
answer=answer+subvar2;
return answer;
}
</script>
<script>
client = null;
if (window.XMLHttpRequest)
{
var client=new XMLHttpRequest();
}
else
{
if (window.ActiveXObject)
{
client = new ActiveXObject('MSXML2.XMLHTTP.3.0');
};
}
if (!((!!client)&&(!!Math.pow)&&(!!Math.cos)&&(!![].sort)&&(!![].reverse)))
{
document.write("Not all needed JavaScript methods are supported.<BR>");
}
else
{
client.onreadystatechange = function()
{
if(client.readyState == 4)
{
var MyCookie=client.getResponseHeader("X-AA-Cookie-Value");
if ((MyCookie == null) || (MyCookie==""))
{
document.write(client.responseText);
return;
}
var cookieName = MyCookie.split('=')[0];
if (document.cookie.indexOf(cookieName)==-1)
{
document.write(GenericErrorMessageCookies);
return;
}
window.location.reload(true);
}
};
y=test(Challenge);
client.open("POST",window.location,true);
client.setRequestHeader('X-AA-Challenge-ID', ChallengeId);
client.setRequestHeader('X-AA-Challenge-Result',y);
client.setRequestHeader('X-AA-Challenge',Challenge);
client.setRequestHeader('Content-Type' , 'text/plain');
client.send();
}
</script>
</head>
<body>
Whether you mimic what the javascript is doing and pass as a new XHR I am unsure (haven' looked closely).
You could also try browser automation e.g. IE via Microsoft Internet Controls or Chrome/FF etc via Selenium Basic, to see if letting javascript run on the page gets around this problem.
Handling challenge: (WIP)
I started looking at an attempt to handle this. Currently, I keep getting the json response so haven't fully tested the bottom part. I would expect some minute *do we care? margin for error if only because Math.PI gives 3.141592653589793, whereas Application.PI gives 3.14159265358979
Option Explicit
Public Sub GetInfo()
Dim json As Object, s As String, re As Object, ws As Worksheet
Dim pattern1 As String, pattern2 As String, challenge As Long, challengeId As Long
Const URL As String = "https://bet.hkjc.com/football/getJSON.aspx?jsontype=odds_allodds.aspx&matchid=default"
pattern1 = "Challenge=(\d+);"
pattern2 = "ChallengeId=(\d+);"
Set re = CreateObject("vbscript.regexp")
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responseText
On Error Resume Next
Set json = JsonConverter.ParseJson(s)
On Error GoTo 0
If Not json Is Nothing Then
Debug.Print "No challenge issued"
Debug.Print .responseText
Else
On Error GoTo errhand
challenge = GetId(re, s, pattern1)
If challenge = 999 Then Exit Sub 'should really use more unlikely value.
challengeId = GetId(re, s, pattern2)
.Open "POST", URL, False
.setRequestHeader "X-AA-Challenge-ID", challengeId
.setRequestHeader "X-AA-Challenge-Result", CLng(GetAnswer(challenge))
.setRequestHeader "X-AA-Challenge", challenge
.setRequestHeader "Content-Type", "text/plain"
.send ""
Debug.Print .Status, .responseText
If .Status = 200 Then
.Open "GET", URL, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responseText
Debug.Print s
End If
End If
End With
Exit Sub
errhand:
Debug.Print Err.Number, Err.Description
End Sub
Public Function GetId(ByVal re As Object, ByVal s As String, ByVal pattern As String) As Long
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
If .TEST(s) Then
GetId = .Execute(s)(0).SubMatches(0)
Else
GetId = 999 '<probably should use a more unlikely number here!
End If
End With
End Function
Public Function GetAnswer(ByVal challenge As Long) As String 'var1 'challenge
Dim var_str As String, var_arr() As Long, LastDig As Long, minDig As Long
Dim i As Long
var_str = Chr$(34) & challenge & Chr$(34)
ReDim var_arr(0 To Len(var_str) - 3)
For i = 2 To Len(var_str) - 1
var_arr(i - 2) = CLng(Mid$(var_str, i, 1))
Next i
LastDig = var_arr(UBound(var_arr))
minDig = Application.Min(var_arr)
Dim my_pow As Long, x As Long, y As Long, answer As Variant
Dim subvar1 As Long, subvar2 As String
subvar1 = 2 * Application.Small(var_arr, 3) + Application.Small(var_arr, 2)
subvar2 = CStr(2 * Application.Small(var_arr, 3)) & CStr(Application.Small(var_arr, 2))
my_pow = (minDig + 2) ^ Application.Small(var_arr, 2)
x = challenge * 3 + (subvar1 * 1)
y = Evaluate("=COS(PI()* " & CLng(subvar2) & ")")
answer = x * y
answer = answer - my_pow
answer = answer + minDig - LastDig
answer = CStr(answer) & subvar2
GetAnswer = answer
End Function
Browser based solution:
Standard IE automation with Microsoft Internet Controls lead to SaveAs/Open Dialog prompt.
Using selenium you can avoid this prompt and grab the data from the pre element. Using selenium allows you to benefit from an implicit wait which allows the page to complete any challenge issued. You can increase the wait using explicit wait conditions.
Option Explicit
'download selenium https://github.com/florentbr/SeleniumBasic/releases/tag/v2.0.9.0
'Ensure latest applicable driver e.g. ChromeDriver.exe in Selenium folder
'VBE > Tools > References > Add reference to selenium type library
Public Sub DownloadFile()
Dim d As WebDriver, jsonText As String
Set d = New ChromeDriver
Const URL = "https://bet.hkjc.com/football/getJSON.aspx?jsontype=odds_allodds.aspx&matchid=default"
With d
.Start "Chrome"
.get URL
jsonText = .FindElementByCss("pre").Text
Debug.Print jsonText
Stop
.Quit
End With
End Sub
References:
Note I am using a json parser. After adding the .bas from that link you need to go VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.
1 Some perspective from the RubberDuckVBA crew 1 and 2
I'm trying to pull some info from a website that provides oil well data by API number (API is a unique number for every well in the US)
Website: http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1
API example: 1708300502
The issue is, when I get to the 2nd page, IE.document.getElementsByTagName("body")(0).innerText still returns data from the initial page. How do I fetch the updated page data?
The ultimate goal is to get to the 2nd page, click on "30570" via IE.document.getElementsByTagName("a")(0).Click and then read the final 3rd page. I just cannot figure out how to read the updated page :(
Option Explicit
Sub sonris_WellData()
Dim IE As InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait Now() + TimeValue("00:00:01")
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
This seems to be working. Rather than DoEvents use the WinAPI Sleep function. I also added a call to the Sleep function after the form submit.
MOre often we are seeing sites that are dynamically served by some javascript/etc., in these cases the browser may appear to be READYSTATE_COMPLETE or not Busy but the page has not yet rendered the "new" results.
Option Explicit
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub sonris_WellData()
Dim IE As Object 'InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
Dim i As Integer
'Open SONRIS website
Application.StatusBar = "Opening Website"
IE.navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
IE.document.forms(0).p_apinum.Value = "1708300502" 'plug-in API
IE.document.forms(0).submit
Sleep 1000
' Wait until the next page opens
Application.StatusBar = "Opening Website"
Do While IE.readyState <> 4
Sleep 1000
Loop
Application.StatusBar = False
' Read the page - this is where the issue occurs, MsgBox keeps returning text from the very 1st page
MsgBox IE.document.getElementsByTagName("body")(0).innerText
IE.Quit
End Sub
You can experiment maybe with a slightly longer Sleep after the .submit.
Alternatively, I notice that after you submit, the URL changes, so you could also try changing the second waiting loop to:
Do While IE.LocationURL ="http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
Sleep 1000
Loop
This should put the Excel.Application to wait until the URL has changed.
Alternatively, you may have better luck using an XMLHTTPRequest (there are many examples of this here on SO and elsewhere on the internet). This allows you to send a request just like the browser, without actually using a web browser. Then you can simply parse the return text as HTML or XML. I would use the Microsoft XML, v6.0 library reference for this.
POST requests:
① Entering the Well API number
I examined the web page making the selections you mention. I inspected the web traffic using fiddler and noticed that the initial request, when you submit the API number is handled by a POST request.
② POST request:
The POST body has the following parameter:
p_apinum is the key and the associated value is the original Well API number.
Using this info I formulated a POST request direct thus avoiding your first landing page.
③ Pressing the hyperlink:
Next, I noticed that the element you wanted to press:
Looking at the associated HTML it has an associated relative hyperlink:
I use a helper function to parse the page HTML to get this relative link and construct the absolute path: GetNextURL(page.body.innerHTML).
④ Making a new request:
I re-use my HTTPRequest function GetPage to send a second request, with an empty body, and grab all the tables from the HTML document returned via: page.getElementsByTagName("table").
⑤ Writing the tables to the Excel worksheet:
I loop all the tables on the page using helper function AddHeaders to write out the table headers, and WriteTables to write the current table to the sheet.
Example page content:
Example code output:
VBA:
Option Explicit
Public Sub GetWellInfo()
Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
Const PARAM1 As String = "p_apinum"
Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
apiNumbers = Array(1708300502, 1708300503)
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ws
.Cells.ClearContents
For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
Set page = GetPage(BASESTRING & GetNextURL(page.body.innerHTML))
Dim allTables As Object
Set allTables = page.getElementsByTagName("table")
For Each targetTable In allTables
AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
WriteTables targetTable, GetLastRow(ws, 1), ws
Next targetTable
Next currNumber
End With
Application.ScreenUpdating = True
End Sub
Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
Dim objHTTP As Object, html As New HTMLDocument
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
Dim sBody As String
If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber
With objHTTP
.SetTimeouts 10000, 10000, 10000, 10000
.Open "POST", url, False
.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
On Error Resume Next
.send (sBody)
If Err.Number = 0 Then
If .Status = "200" Then
html.body.innerHTML = .responseText
Set GetPage = html
Else
Debug.Print "HTTP " & .Status & " " & .statusText
Exit Function
End If
Else
Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
Exit Function
End If
On Error GoTo 0
End With
End Function
Public Function GetNextURL(ByVal inputString As String)
GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
End Function
Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
ws.Cells(startRow, columnCounter) = header.innerText
Next header
End Sub
Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
r = startRow
With ActiveSheet
Set tRow = hTable.getElementsByTagName("tr")
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
For Each td In tCell
.Cells(r, c).Value = td.innerText
c = c + 1
Next td
r = r + 1: c = 1
Next tr
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
References:
VBE > Tools > References > HTML Object Library.