Get the whole html file and extract a JSON paragraph - html

I want to retrieve a table from the URL of https://s.cafef.vn/screener.aspx#data using VBA. This task is difficult because the table contains JSON data embedded in an html file.
Taking #Tomalak ‘s advice, I am trying to split up my task; solving four following individual problems one after another:
Send an HTTP request to have the HTML
Locate the JSON string
Parse JSON with VBA and then
Loop over the raw data from the JSON and write into an Excel table.
Extract a JSON DATA table in html using VBA; converting Apps Script into VBA
However, I get stuck at Step 2, the response text that I get is stored in htmlTEXT. Its print-out looks like below attached, but the problem is as a string variable, htmlTEXT can hold up only a small part of the html page content. The JSON paragraph does not lie on the top part of the html page and is therefore not returned into htmlTEXT.
My questions are:
How can we get the whole content of the html page (with the JSON paragraph included)?
Once the JSON paragraph is captured, what Regular Expression can be used to extract the JSON paragraph ?
Noticing that the JSON paragraph starts with [{ and ends with }], I therefore use the pattern [{*}] but it does not work at all, (though it works with pattern like (D.C); resulting in DOC for my testing purpose)
What is wrong with my code?
Sub ExtractJSON_in_html()
' =====send an HTTP request with VBA ====
Dim JSONtext As String
Dim htmlTEXT As String
Dim SDI As Object
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Url = "https://s.cafef.vn/screener.aspx#data"
objHTTP.Open "GET", Url, False
objHTTP.send
htmlTEXT = objHTTP.responsetext
MsgBox htmlTEXT
' ===== Locate the JSON string =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Pattern = "[{*}]"
SDI.Global = True
Set theMatches = SDI.Execute(htmlTEXT)
For Each Match In theMatches
'MsgBox Match.Value
JSONtext = Match.Value
Next
End Sub
htmlTEXT:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
-- JASON Paragraph var jsonData = [{"Url":"http://s.cafef.vn/upcom/A32-cong-ty-co-phan-32.chn","CenterName":"UpCom","Symbol":"A32","TradeCenterID":9,"ChangePrice":0,"VonHoa":212.84,"ChangeVolume":400,"EPS":6.19220987764706,"PE":5.0547382305287,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562652463)\/","FullName":"Công ty cổ phần 32","ParentCategoryId":0
{"Url":"http://s.cafef.vn/upcom/YTC-cong-ty-co-phan-xuat-nhap-khau-y-te-thanh-pho-ho-chi-minh.chn","CenterName":"UpCom","Symbol":"YTC","TradeCenterID":9,"ChangePrice":0,"VonHoa":170.8,"ChangeVolume":200,"EPS":-4.29038514857143,"PE":-14.217837766922,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562969277)\/","FullName":"Công ty Cổ phần Xuất nhập khẩu Y tế Thành phố Hồ Chí Minh","ParentCategoryId":0}];

This will return the JSON string as a Dictionary object for you to work through:
You will need JsonConverter (and reference to Microsoft Scripting Runtime for Dictionary object)
Private Sub Test()
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
xmlhttp.Open "GET", "https://s.cafef.vn/screener.aspx"
xmlhttp.send
Dim jsonStr As String
jsonStr = Mid$(xmlhttp.responseText, InStr(xmlhttp.responseText, "[{"))
jsonStr = Left$(jsonStr, InStr(jsonStr, "}]") + 1)
Dim jsDict As Scripting.Dictionary
Set jsDict = JsonConverter.ParseJson("{""results"":" & jsonStr & "}")
Debug.Print jsDict("results").Count '1874
End Sub
Note: The original URL in your question returns 404 error, you just need to remove #data from the URL.

I would want more certainty over matching the correct JavaScript object than given by the current Instr methods (which could be extended to include the var jsonData pattern as well.) In case of using regex then the following pattern can be used, which will allow for line break matching. Note, you only need one entire match then parse the JavaScript array returned with a json parser.
Public Sub ExtractJSON_in_html()
' =====send an HTTP request with VBA ====
Dim JSONtext As String
Dim htmlTEXT As String
Dim SDI As Object
Set OBJHTTP = CreateObject("MSXML2.XMLHTTP")
URL = "https://s.cafef.vn/screener.aspx"
OBJHTTP.Open "GET", URL, False
OBJHTTP.setRequestHeader "User-Agent", "Mozilla/5.0"
OBJHTTP.send
htmlTEXT = OBJHTTP.responseText
MsgBox htmlTEXT
' ===== Locate the JSON string =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Pattern = "var\sjsonData\s=\s([\s\S].*)?;"
WriteTxtFile SDI.Execute(htmlTEXT)(0).SubMatches(0)
End Sub
Public Sub WriteTxtFile(ByVal aString As String, Optional ByVal filePath As String = "C:\Users\<user>\Desktop\Test.txt")
Dim fso As Object, Fileout As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Set Fileout = fso.CreateTextFile(filePath, True, True)
Fileout.Write aString
Fileout.Close
End Sub
Regex:
Sample of treeview of result:
Array with 1874 elements; 1 expanded.

Edited your macro.. This will add a worksheet and parse JSON text from Range A1
Option Explicit
Sub ExtractJSON_in_html()
Dim JSONtext As String, JSONtextarr() As String, Url As String
Dim htmlTEXT As String, colHead As String
Dim SDI As Object, objHTTP As Object, theMatches As Object, Match As Variant
Dim StartPos As Long, endPos As Long, i As Long
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Url = "https://s.cafef.vn/screener.aspx"
' =====send an HTTP request with VBA ====
objHTTP.Open "GET", Url, False
objHTTP.send
htmlTEXT = objHTTP.responseText
StartPos = InStr(1, htmlTEXT, "var jsonData = [", vbTextCompare)
endPos = InStr(StartPos, htmlTEXT, "]", vbTextCompare)
htmlTEXT = Replace(Mid(htmlTEXT, StartPos, endPos - StartPos + 1), ",""", ";")
' ===== Make the JSON strings collecton =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Global = True
SDI.Pattern = "[^a-zA-Z0-9&{}./:;,-]"
htmlTEXT = SDI.Replace(htmlTEXT, "")
SDI.Pattern = "\{([^}]+)\}"
Set theMatches = SDI.Execute(htmlTEXT)
JSONtext = ""
Debug.Print theMatches.Count
For Each Match In theMatches
JSONtext = JSONtext & Match.Value & ","
Next
' ===== Populate new worksheet with parsed JSON =======
JSONtext = Replace(Mid(JSONtext, 2, Len(JSONtext) - 3), ",ParentCategoryId", ",,ParentCategoryId", , , vbTextCompare)
JSONtextarr = Split(JSONtext, "},{", , vbTextCompare)
Worksheets.Add
Range("A2").Resize(UBound(JSONtextarr) + 1, 1).Value = Application.Transpose(JSONtextarr)
Range("A2").CurrentRegion.TextToColumns Destination:=Range("A2"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
Semicolon:=True, Comma:=False, Space:=False, Other:=False, TrailingMinusNumbers:=True
Debug.Print Range("A2").CurrentRegion.Columns.Count
For i = 1 To Range("A2").CurrentRegion.Columns.Count
colHead = Split(Cells(2, i), ":")(0)
Cells(1, i) = colHead
Range("A2").CurrentRegion.Columns(i).Replace What:=colHead & ":", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Next i
End Sub

Related

I'm getting stuck at vba runtime error 424

I'm getting
run-time error 424
in 68th row (line)
request.Open "GET", Url, False
and I don't know how to fix it.
My previous question I posted ;
How to scrape specific part of online english dictionary?
My final goal is to get result like this;
A B
beginning bɪˈɡɪnɪŋ
behalf bɪˈhæf
behave bɪˈheɪv
behaviour bɪˈheɪvjər
belong bɪˈlɔːŋ
below bɪˈloʊ
bird bɜːrd
biscuit ˈbɪskɪt
Here's code I wrote, and it's mostly based on someone else's code I found on internet.
' Microsoft ActiveX Data Objects x.x Library
' Microsoft XML, v3.0
' Microsoft VBScript Regular Expressions
Sub ParseHelp()
' Word reference from
Dim Url As String
Url = "https://www.oxfordlearnersdictionaries.com/definition/english/" & Cells(ActiveCell.Row, "B").Value
' Get dictionary's html
Dim Html As String
Html = GetHtml(Url)
' Check error
If InStr(Html, "<TITLE>Not Found</Title>") > 0 Then
MsgBox "404"
Exit Sub
End If
' Extract phonetic alphabet from HTML
Dim wrapPattern As String
wrapPattern = "<span class='name' (.*?)</span>"
Set wrapCollection = FindRegexpMatch(Html, wrapPattern)
' MsgBox StripHtml(CStr(wrapCollection(1)))
' Fill phonetic alphabet into cell
If Not wrapCollection Is Nothing Then
Dim wrap As String
On Error Resume Next
wrap = StripHtml(CStr(wrapCollection(1)))
If Err.Number <> 0 Then
wrap = ""
End If
Cells(ActiveCell.Row, "C").Value = wrap
Else
MsgBox "not found"
End If
End Sub
Public Function StripHtml(Html As String) As String
Dim RegEx As New RegExp
Dim sOut As String
Html = Replace(Html, "</li>", vbNewLine)
Html = Replace(Html, " ", " ")
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "<[^>]+>"
End With
sOut = RegEx.Replace(Html, "")
StripHtml = sOut
Set RegEx = Nothing
End Function
Public Function GetHtml(Url As String) As String
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP")
Dim converter As New ADODB.stream
' Get
request.Open "GET", Url, False
request.send
' raw bytes
converter.Open
converter.Type = adTypeBinary
converter.Write request.responseBody
' read
converter.Position = 0
converter.Type = adTypeText
converter.Charset = "utf-8"
' close
GetHtml = converter.ReadText
converter.Close
End Function
Public Function FindRegexpMatch(txt As String, pat As String) As Collection
Set FindRegexpMatch = New Collection
Dim rx As New RegExp
Dim matcol As MatchCollection
Dim mat As Match
Dim ret As String
Dim delimiter As String
txt = Replace(txt, Chr(10), "")
txt = Replace(txt, Chr(13), "")
rx.Global = True
rx.IgnoreCase = True
rx.MultiLine = True
rx.Pattern = pat
Set matcol = rx.Execute(txt)
'MsgBox "Match:" & matcol.Count
On Error GoTo ErrorHandler
For Each mat In matcol
'FindRegexpMatch.Add mat.SubMatches(0)
FindRegexpMatch.Add mat.Value
Next mat
Set rx = Nothing
' Insert code that might generate an error here
Exit Function
ErrorHandler:
' Insert code to handle the error here
MsgBox "FindRegexpMatch. " & Err.GetException()
Resume Next
End Function
Any kind of help would be greatly appreciated.
The following is an example of how to read in values from column A and write out pronounciations to column B. It uses css selectors to match a child node then steps up to parentNode in order to ensure entire pronounciation is grabbed. There are a number of ways you could have matched on the parent node to get the second pronounciation. Note that I use a parent node and Replace as the pronounciation may span multiple childNodes.
If doing this for lots of lookups please be a good netizen and put some waits in the code so as to not bombard the site with requests.
Option Explicit
Public Sub WriteOutPronounciations()
Dim html As MSHTML.HTMLDocument, i As Long, ws As Worksheet
Dim data As String, lastRow As Long, urls()
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row 'you need at least two words in column A or change the redim.
urls = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(urls))
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(urls) To UBound(urls)
.Open "GET", "https://www.oxfordlearnersdictionaries.com/definition/english/" & urls(i), False
.send
html.body.innerHTML = .responseText
data = Replace$(Replace$(html.querySelector(".name ~ .wrap").ParentNode.innerText, "/", vbNullString), Chr$(10), Chr$(32))
results(i) = Right$(data, Len(data) - 4)
Next
End With
With ThisWorkbook.Worksheets(1)
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub
Required references (VBE>Tools>References):
Microsoft HTML Object Library
Should you go down the API route then here is a small example. You can make 1000 free calls in a month with Prototype account. The next best, depending on how many calls you wish to make looks like the 10,001 calls (that one extra PAYG call halves the price). # calls will be affected by whether word is head word or needs lemmas lookup call first. The endpoint construction you need is GET /entries/{source_lang}/{word_id}?fields=pronunciations though that doesn't seem to filter massively. You will need a json parser to handle the json returned e.g. github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas. Download raw code from there and add to standard module called JsonConverter. You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Option Explicit
Public Sub WriteOutPronounciations()
Dim html As MSHTML.HTMLDocument, i As Long, ws As Worksheet
Dim data As String, lastRow As Long, words()
'If not performing lemmas lookup then must be head word e.g. behave, behalf
Const appId As String = "yourAppId"
Const appKey As String = "yourAppKey"
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row
words = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(words))
Set html = New MSHTML.HTMLDocument
Dim json As Object
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(words) To UBound(words)
.Open "GET", "https://od-api.oxforddictionaries.com/api/v2/entries/en-us/" & LCase$(words(i)) & "?fields=pronunciations", False
.setRequestHeader "app_id", appId
.setRequestHeader "app_key", appKey
.setRequestHeader "ContentType", "application/json"
.send
Set json = JsonConverter.ParseJson(.responseText)
results(i) = IIf(json("results")(1)("type") = "headword", json("results")(1)("lexicalEntries")(1)("pronunciations")(2)("phoneticSpelling"), "lemmas lookup required")
Set json = Nothing
Next
End With
With ThisWorkbook.Worksheets(1)
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub

get web page data through class name

I need to get the dates and temp from a weather website and record it on cells but I am getting a object variable or with block variable not set error.
I tried to data from web in excel but I think the website is protected or something because I keep getting "under maintenance" page when trying to load the page from excel. I got the codes below from a tutorial but I can't make it work.
Sub record()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim websie As String
Dim temps As Variant
'provide link
'website = "https://finance.yahoo.com/quote/EURUSD=X?p=EURUSD=X"
website = "https://www.accuweather.com/en/us/chicago/60608/september-weather/348308"
'create the object that will make the webpage request
Set request = CreateObject("MSXML2.XMLHTTP")
'go to the link
request.Open "GET", website, False
'send request for webpage
request.send
'get web response data to variable
response = StrConv(request.responseBody, vbUnicode)
'put webpage to an html object
html.body.innerHTML = response
'get temperature from specified element
'temps = html.getElementsByClassName("Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")(0).innerText
temps = html.getElementsByClassName("high")(0).innerText
Sheets("record").Range("A1") = temps
End Sub
Sample lines from the website:
<a class="monthly-daypanel is-past">
<div class="date">2</div>
<div class="icon-container"...</div>
<div class="temp">
<div class="high">83</div>
<div class="low">83</div>
</div>
</a>
I want to get the date, high and low.
You need an User-Agent header. I would also extract the json string from one of the script tags (I use regex for this) and use that as source. I add in a date comparison to work out if it is a forecast or actual value. I read the json string into json object using json library and loop the resultant collection storing items of interest in an array for faster writing out to sheet at end.
json library:
I use jsonconverter.bas. Download raw code from here and add to standard module called jsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime.
Option Explicit
Public Sub GetWeatherListings()
Dim s As String, re As Object, ws As Worksheet
Set re = CreateObject("vbscript.regexp")
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.accuweather.com/en/us/chicago/60608/september-weather/348308", False
' .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responsetext
End With
Dim results(), r As Long, jsonSource As String, json As Object, item As Object
jsonSource = GetString(re, s, "dailyForecast = (.*?\])")
If jsonSource = "No match" Then Exit Sub
Set json = JsonConverter.ParseJson(jsonSource)
ReDim results(1 To json.count, 1 To 4) 'date, datetime, day > dActual, night > dActual
Dim dateTime() As String, datePart As String, forecast As Boolean
For Each item In json
r = r + 1
dateTime = Split(item("dateTime"), "T")
datePart = dateTime(LBound(dateTime))
forecast = CDate(datePart) >= Date
results(r, 1) = datePart
results(r, 2) = item("dateTime")
results(r, 3) = IIf(forecast, item("day")("dTemp"), item("day")("dActual"))
results(r, 4) = IIf(forecast, item("night")("dTemp"), item("night")("dActual"))
Next
Dim headers()
headers = Array("Date", "DateTime", "Day temp", "Night temp")
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetString(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
Dim matches As Object
With re
.Global = True
.MultiLine = True
.IgnoreCase = True
.pattern = pattern
If .Test(inputString) Then
Set matches = .Execute(inputString)
GetString = matches(0).SubMatches(0)
Exit Function
End If
End With
GetString = "No match"
End Function
Sample of end of output:

How to retrieve JSON response using VBA?

I make a request to a website and paste the JSON response into a single cell.
I get an object required 424 error.
Sub GetJSON()
Dim hReq As Object
Dim JSON As Dictionary
Dim var As Variant
Dim ws As Worksheet
Set ws = Title
'create our URL string and pass the user entered information to it
Dim strUrl As String
strUrl = Range("M24").Value
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", strUrl, False
.Send
End With
'wrap the response in a JSON root tag "data" to count returned objects
Dim response As String
response = "{""data"":" & hReq.responseText & "}"
Set JSON = JsonConverter.ParseJson(response)
'set array size to accept all returned objects
ReDim var(JSON("data").Count, 1)
Cells(25, 13) = JSON
Erase var
Set var = Nothing
Set hReq = Nothing
Set JSON = Nothing
End Sub
The URL that gives me the response in cell "M24":
https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic
The code after Qharr's response. I get a run time 0 error even though the error says it ran successfully. Nothing is copied to my cells.
Public Sub GetInfo()
Dim URL As String, json As Object
Dim dict As Object
URL = "https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
Set json = JsonConverter.ParseJson(.responseText) '<== dictionary
ThisWorkbook.Worksheets("Title").Cells(1, 1) = .responseText
Set dict = json("response")("data")
ws.Cells(13, 27) = "ss: " & dict("ss") & Chr$(10) & "s1: " & dict("s1")
End With
End Sub
I'm not clear what you mean. The entire response can go in a cell as follows.
JSON is an object so you would need Set keyword but you can't set a cell range to the dictionary object - the source of your error.
Option Explicit
Public Sub GetInfo()
Dim URL As String, json As Object
URL = "https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
Set json = JsonConverter.ParseJson(.responseText) '<== dictionary
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .responseText
End With
End Sub
When you use parsejson you are converting to a dictionary object which you need to do something with. There is simply too much data nested inside to write anything readable (if limit not exceeded) into one cell.
Inner dictionary data quickly descends into nested collections. The nested collection count comes from
Dim dict As Object
Set dict = json("response")("data")
Debug.Print "nested collection count = " & dict("sdSpectrum").Count + dict("smSpectrum").Count
To get just s1 and ss values parse them out:
Dim dict As Object
Set dict = json("response")("data")
ws.Cells(1, 2) = "ss: " & dict("ss") & Chr$(10) & "s1: " & dict("s1")
I have figured out the solution to pasting the response text with Excel 2003. Below is my finished code.
Public Sub datagrab()
Dim URL As String
Dim ws As Object
Dim xmlhttp As New MSXML2.XMLHTTP60
URL = Range("M24").Value 'This is the URL I'm requesting from
xmlhttp.Open "GET", URL, False
xmlhttp.Send
Worksheets("Title").Range("M25").Value = xmlhttp.responseText
End Sub

VBA access to a json property without name property

I'm trying to access in VBA to over 508 "tank_id"s from a JSON file as you can see here.
I'm using cStringBuilder, cJSONScript and JSONConverter to parse the JSON file.
My main issue is that I can't pass threw all those ids because I don't know how to get the "1" "33" "49" "81" that are without names.
Here si the code I tried to get them, without success.
Const myurl2 As String = "https://api.worldoftanks.eu/wot/encyclopedia/vehicles/?application_id=demo&fields=tank_id"
Sub List_id_vehicules()
Dim strRequest
Dim xmlHttp: Set xmlHttp = CreateObject("msxml2.xmlhttp")
Dim response As Object
Dim rows As Integer
Dim counter As Integer
Dim j As String
Dim k As Integer: k = 2
Dim url As String
url = myurl2
xmlHttp.Open "GET", url, False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send
While Not xmlHttp.Status = 200 '<---------- wait
Wend
Set response = ParseJson(xmlHttp.ResponseText)
rows = response("meta")("count")
For counter = 1 To rows
j = counter
Dim yop As String
yop = "data[" & j & "][" & j & "]"
Sheets(2).Cells(1 + counter, 1).Value = response('data[counter]')['tank_id']
Next counter
END Sub
Could someone help me ?
The JSONConverter essentially parses the JSON text string into a set of nested Dictionary objects. So when the ParseJson function returns an Object, it's really a Dictionary. Then, when you access response("meta"), the "meta" part is the Key to the Dictionary object. It's the same thing as you nest down through the JSON.
So when you try to access response("data")("3137"), you're accessing the Dictionary returned by response("data") with the key="3137". Now the trick becomes how to get all the Keys from the response("data") object.
Here's a sample bit of code to illustrate how you can list all the tank IDs in the JSON data section:
Option Explicit
Sub ListVehicleIDs()
Const jsonFilename As String = "C:\Temp\tanks.json"
Dim fileHandle As Integer
Dim jsonString As String
fileHandle = FreeFile
Open jsonFilename For Input As #fileHandle
jsonString = Input$(LOF(fileHandle), #fileHandle)
Close #fileHandle
Dim jsonObj As Object
Set jsonObj = ParseJson(jsonString)
Dim tankCount As Long
tankCount = jsonObj("meta")("count")
Dim tankIDs As Dictionary
Set tankIDs = jsonObj("data")
Dim tankID As Variant
For Each tankID In tankIDs.keys
Debug.Print "Tank ID = " & tankID
Next tankID
End Sub

Web scraping html page with no tags as delimiter

I'm trying to import into a string array all lines of text in a web page. The URL is here: Vaticano-La Sacra Bibbia-Genesi-Cap.1.
Unfortunately (maybe a choice of the web designer), in the tag there aren't ID's or CLASS. All the rows are separated by 1 or more < BR > element. Start and end text is separated from a simple menu by 2 tag < HR >.
A clean extract of page code is here: jsfiddle.
I find a way to bring the text. And now what I do in VBA till now:
Note: objDoc is a Public variable coming from another module, fill with a .responseText without problems.
Public Sub ScriviXHTML(strBook As String, intNumCap As Integer)
Dim strDati2 As String
Dim TagBr As IHTMLElementCollection
Dim BrElement As IHTMLElement
Dim intElement As Integer
Dim objChild as Object
Dim strData, strTextCont, strNodeVal, strWholeText As String
Set objDoc2 = New HTMLDocument
Set objDoc2 = objDoc
Set objDoc = Nothing
'Put in variable string HTML code of the web page.
strDati2 = objDoc2.body.innerHTML
'Set in the variable object TAG type BR.
Set TagBr = objDoc2.body.getElementsByTagName("BR")
'Loop for all BRs in the page.
For Each BrElement In TagBr
'Here I try to get the NextSibling element of the <br>
' because seems contain the text I'm looking for.
Set objChild = BrElement.NextSibling
With objChild
' Here I try to put in the variables
strData = Trim("" & .Data & "")
strTextCont = Trim("" & .textContent & "")
strNodeVal = Trim("" & .NodeValue & "")
strWholeText = Trim("" & .wholeText & "")
End With
intElement = intElement + 1
Next BrElement
Two questions:
1) Is it, about you, the best way to achieve what I'm trying to do?
2) Sometimes the Element.NextSibling.Data doesn't exist, with an Error of runtime '438', so I manually move the point of sospension of the routine to by-pass the error. How can I intercept this error? [Please not with a simple On Error Resume Next!]... better: how can I use an If...Then... End If statement to check if in NextSibling exist the Data member?
Thanks at all.
Well you can get all the text as follows:
Public Sub GetInfo()
Dim sResponse As String, xhr As Object, html As New HTMLDocument
Set xhr = CreateObject("MSXML2.XMLHTTP")
With xhr
.Open "GET", "http://www.vatican.va/archive/ITA0001/__P1.HTM", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
html.body.innerHTML = sResponse
[A1] = Replace$(Replace$(regexRemove(html.body.innerHTML, "<([^>]+)>"), " ", Chr$(32)), Chr$(10), Chr$(32))
End With
End Sub
Public Function regexRemove(ByVal s As String, ByVal pattern As String) As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If regex.test(s) Then
regexRemove = regex.Replace(s, vbNullString)
Else
regexRemove = s
End If
End Function