I want to retrieve a table from the URL of https://s.cafef.vn/screener.aspx#data using VBA. This task is difficult because the table contains JSON data embedded in an html file.
Taking #Tomalak ‘s advice, I am trying to split up my task; solving four following individual problems one after another:
Send an HTTP request to have the HTML
Locate the JSON string
Parse JSON with VBA and then
Loop over the raw data from the JSON and write into an Excel table.
Extract a JSON DATA table in html using VBA; converting Apps Script into VBA
However, I get stuck at Step 2, the response text that I get is stored in htmlTEXT. Its print-out looks like below attached, but the problem is as a string variable, htmlTEXT can hold up only a small part of the html page content. The JSON paragraph does not lie on the top part of the html page and is therefore not returned into htmlTEXT.
My questions are:
How can we get the whole content of the html page (with the JSON paragraph included)?
Once the JSON paragraph is captured, what Regular Expression can be used to extract the JSON paragraph ?
Noticing that the JSON paragraph starts with [{ and ends with }], I therefore use the pattern [{*}] but it does not work at all, (though it works with pattern like (D.C); resulting in DOC for my testing purpose)
What is wrong with my code?
Sub ExtractJSON_in_html()
' =====send an HTTP request with VBA ====
Dim JSONtext As String
Dim htmlTEXT As String
Dim SDI As Object
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Url = "https://s.cafef.vn/screener.aspx#data"
objHTTP.Open "GET", Url, False
objHTTP.send
htmlTEXT = objHTTP.responsetext
MsgBox htmlTEXT
' ===== Locate the JSON string =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Pattern = "[{*}]"
SDI.Global = True
Set theMatches = SDI.Execute(htmlTEXT)
For Each Match In theMatches
'MsgBox Match.Value
JSONtext = Match.Value
Next
End Sub
htmlTEXT:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
-- JASON Paragraph var jsonData = [{"Url":"http://s.cafef.vn/upcom/A32-cong-ty-co-phan-32.chn","CenterName":"UpCom","Symbol":"A32","TradeCenterID":9,"ChangePrice":0,"VonHoa":212.84,"ChangeVolume":400,"EPS":6.19220987764706,"PE":5.0547382305287,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562652463)\/","FullName":"Công ty cổ phần 32","ParentCategoryId":0
{"Url":"http://s.cafef.vn/upcom/YTC-cong-ty-co-phan-xuat-nhap-khau-y-te-thanh-pho-ho-chi-minh.chn","CenterName":"UpCom","Symbol":"YTC","TradeCenterID":9,"ChangePrice":0,"VonHoa":170.8,"ChangeVolume":200,"EPS":-4.29038514857143,"PE":-14.217837766922,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562969277)\/","FullName":"Công ty Cổ phần Xuất nhập khẩu Y tế Thành phố Hồ Chí Minh","ParentCategoryId":0}];
This will return the JSON string as a Dictionary object for you to work through:
You will need JsonConverter (and reference to Microsoft Scripting Runtime for Dictionary object)
Private Sub Test()
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
xmlhttp.Open "GET", "https://s.cafef.vn/screener.aspx"
xmlhttp.send
Dim jsonStr As String
jsonStr = Mid$(xmlhttp.responseText, InStr(xmlhttp.responseText, "[{"))
jsonStr = Left$(jsonStr, InStr(jsonStr, "}]") + 1)
Dim jsDict As Scripting.Dictionary
Set jsDict = JsonConverter.ParseJson("{""results"":" & jsonStr & "}")
Debug.Print jsDict("results").Count '1874
End Sub
Note: The original URL in your question returns 404 error, you just need to remove #data from the URL.
I would want more certainty over matching the correct JavaScript object than given by the current Instr methods (which could be extended to include the var jsonData pattern as well.) In case of using regex then the following pattern can be used, which will allow for line break matching. Note, you only need one entire match then parse the JavaScript array returned with a json parser.
Public Sub ExtractJSON_in_html()
' =====send an HTTP request with VBA ====
Dim JSONtext As String
Dim htmlTEXT As String
Dim SDI As Object
Set OBJHTTP = CreateObject("MSXML2.XMLHTTP")
URL = "https://s.cafef.vn/screener.aspx"
OBJHTTP.Open "GET", URL, False
OBJHTTP.setRequestHeader "User-Agent", "Mozilla/5.0"
OBJHTTP.send
htmlTEXT = OBJHTTP.responseText
MsgBox htmlTEXT
' ===== Locate the JSON string =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Pattern = "var\sjsonData\s=\s([\s\S].*)?;"
WriteTxtFile SDI.Execute(htmlTEXT)(0).SubMatches(0)
End Sub
Public Sub WriteTxtFile(ByVal aString As String, Optional ByVal filePath As String = "C:\Users\<user>\Desktop\Test.txt")
Dim fso As Object, Fileout As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Set Fileout = fso.CreateTextFile(filePath, True, True)
Fileout.Write aString
Fileout.Close
End Sub
Regex:
Sample of treeview of result:
Array with 1874 elements; 1 expanded.
Edited your macro.. This will add a worksheet and parse JSON text from Range A1
Option Explicit
Sub ExtractJSON_in_html()
Dim JSONtext As String, JSONtextarr() As String, Url As String
Dim htmlTEXT As String, colHead As String
Dim SDI As Object, objHTTP As Object, theMatches As Object, Match As Variant
Dim StartPos As Long, endPos As Long, i As Long
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Url = "https://s.cafef.vn/screener.aspx"
' =====send an HTTP request with VBA ====
objHTTP.Open "GET", Url, False
objHTTP.send
htmlTEXT = objHTTP.responseText
StartPos = InStr(1, htmlTEXT, "var jsonData = [", vbTextCompare)
endPos = InStr(StartPos, htmlTEXT, "]", vbTextCompare)
htmlTEXT = Replace(Mid(htmlTEXT, StartPos, endPos - StartPos + 1), ",""", ";")
' ===== Make the JSON strings collecton =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Global = True
SDI.Pattern = "[^a-zA-Z0-9&{}./:;,-]"
htmlTEXT = SDI.Replace(htmlTEXT, "")
SDI.Pattern = "\{([^}]+)\}"
Set theMatches = SDI.Execute(htmlTEXT)
JSONtext = ""
Debug.Print theMatches.Count
For Each Match In theMatches
JSONtext = JSONtext & Match.Value & ","
Next
' ===== Populate new worksheet with parsed JSON =======
JSONtext = Replace(Mid(JSONtext, 2, Len(JSONtext) - 3), ",ParentCategoryId", ",,ParentCategoryId", , , vbTextCompare)
JSONtextarr = Split(JSONtext, "},{", , vbTextCompare)
Worksheets.Add
Range("A2").Resize(UBound(JSONtextarr) + 1, 1).Value = Application.Transpose(JSONtextarr)
Range("A2").CurrentRegion.TextToColumns Destination:=Range("A2"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
Semicolon:=True, Comma:=False, Space:=False, Other:=False, TrailingMinusNumbers:=True
Debug.Print Range("A2").CurrentRegion.Columns.Count
For i = 1 To Range("A2").CurrentRegion.Columns.Count
colHead = Split(Cells(2, i), ":")(0)
Cells(1, i) = colHead
Range("A2").CurrentRegion.Columns(i).Replace What:=colHead & ":", Replacement:="", LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Next i
End Sub
I am trying to pull JSON values from a URL that I am working with at the moment. I may have done something like this before but I dont know what I'm missing here.
Here is the URL - https://eu-offering.kambicdn.org/offering/v2018/888/listView/golf.json?lang=en_GB&market=GB&client_id=2&channel_id=1&ncid=1568916879040&useCombined=true
And an image for clarity of what is needed to be extracted.
I ran a test using Tinman's approach as can be found here - How to get, JSON values to Work in VBA-JSON? , but i can't even apply his function, PrintJSONAccessors(), here
Public Sub exceljson()
Dim http As Object
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET",
"https://eu-offering.kambicdn.org/offering/v2018/888/listView/golf.json?lang=en_GB&market=GB&client_id=2&channel_id=1&ncid=1568916879040&useCombined=true", False
http.Send
Dim results As Variant
results = BitfinexTextToArray(http.responseText)
Worksheets(1).Range("A1").Resize(UBound(results), UBound(results,2)).Value = results
MsgBox ("complete")
End Sub
Function BitfinexTextToArray(responseText As String) As Variant
Dim item As Variant, JSON As Object
Dim MaxColumns As Long
Set JSON = ParseJson(responseText)
For Each item In JSON
If item.Count > MaxColumns Then MaxColumns = item.Count
Next
Dim results As Variant
ReDim results(1 To JSON.Count, 1 To MaxColumns)
Dim c As Long, r As Long
For Each item In JSON
r = r + 1
For c = 1 To item.Count
results(r, c) = item(c)
Next
Next
BitfinexTextToArray = results
End Function
I need help with pulling the following item values from each of the JSON "event"
1. "englishName"
2. "participant"
3. "oddsFractional"
NOTE: my example uses the JsonConverter library and requires you to add a reference to the Microsoft Scripting Runtime to access the Dictionary object.
I set up a test file with JSON loaded from your URL above. After parsing the JSON data, the exercise becomes understanding how the various levels are nested and what type of data structure is being used. In your JSON, it's a mix of Collection, Array, and Dictionary in various combinations. My example below shows how you have to stack up these nested references to get the data you're looking for.
Review the information in this answer to understand how the JSON is parsed into a hierarchical data structure.
Option Explicit
Public Sub test()
Dim fileNum As Long
fileNum = FreeFile()
Dim filename As String
filename = "C:\Temp\testdata.json"
Dim jsonInput As String
Open filename For Input As #fileNum
jsonInput = Input$(LOF(fileNum), fileNum)
Close fileNum
Dim json As Object
Set json = ParseJson(jsonInput)
Debug.Print " English Name = " & json("events")(1)("event")("englishName")
Debug.Print " Participant = " & json("events")(1)("betOffers")(1)("outcomes")(2)("participant")
Debug.Print "Odds Fractional = " & json("events")(1)("betOffers")(1)("outcomes")(2)("oddsFractional")
End Sub
An even better solution will be to create an intermediate variable and then loop over the contents in an array (or collection or dictionary).
I make a request to a website and paste the JSON response into a single cell.
I get an object required 424 error.
Sub GetJSON()
Dim hReq As Object
Dim JSON As Dictionary
Dim var As Variant
Dim ws As Worksheet
Set ws = Title
'create our URL string and pass the user entered information to it
Dim strUrl As String
strUrl = Range("M24").Value
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", strUrl, False
.Send
End With
'wrap the response in a JSON root tag "data" to count returned objects
Dim response As String
response = "{""data"":" & hReq.responseText & "}"
Set JSON = JsonConverter.ParseJson(response)
'set array size to accept all returned objects
ReDim var(JSON("data").Count, 1)
Cells(25, 13) = JSON
Erase var
Set var = Nothing
Set hReq = Nothing
Set JSON = Nothing
End Sub
The URL that gives me the response in cell "M24":
https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic
The code after Qharr's response. I get a run time 0 error even though the error says it ran successfully. Nothing is copied to my cells.
Public Sub GetInfo()
Dim URL As String, json As Object
Dim dict As Object
URL = "https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
Set json = JsonConverter.ParseJson(.responseText) '<== dictionary
ThisWorkbook.Worksheets("Title").Cells(1, 1) = .responseText
Set dict = json("response")("data")
ws.Cells(13, 27) = "ss: " & dict("ss") & Chr$(10) & "s1: " & dict("s1")
End With
End Sub
I'm not clear what you mean. The entire response can go in a cell as follows.
JSON is an object so you would need Set keyword but you can't set a cell range to the dictionary object - the source of your error.
Option Explicit
Public Sub GetInfo()
Dim URL As String, json As Object
URL = "https://earthquake.usgs.gov/ws/designmaps/asce7-10.json?latitude=36.497452&longitude=-86.949479&riskCategory=III&siteClass=C&title=Seismic"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
Set json = JsonConverter.ParseJson(.responseText) '<== dictionary
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .responseText
End With
End Sub
When you use parsejson you are converting to a dictionary object which you need to do something with. There is simply too much data nested inside to write anything readable (if limit not exceeded) into one cell.
Inner dictionary data quickly descends into nested collections. The nested collection count comes from
Dim dict As Object
Set dict = json("response")("data")
Debug.Print "nested collection count = " & dict("sdSpectrum").Count + dict("smSpectrum").Count
To get just s1 and ss values parse them out:
Dim dict As Object
Set dict = json("response")("data")
ws.Cells(1, 2) = "ss: " & dict("ss") & Chr$(10) & "s1: " & dict("s1")
I have figured out the solution to pasting the response text with Excel 2003. Below is my finished code.
Public Sub datagrab()
Dim URL As String
Dim ws As Object
Dim xmlhttp As New MSXML2.XMLHTTP60
URL = Range("M24").Value 'This is the URL I'm requesting from
xmlhttp.Open "GET", URL, False
xmlhttp.Send
Worksheets("Title").Range("M25").Value = xmlhttp.responseText
End Sub
I’ve run into an issue retrieving values from a non-itemized JSON object. I thought it was simple do so … Just reference the object with the field you want (e.g. JSON(“title”). But I cannot retrieve a value even though it IS there.
This code demonstrates what I’m talking about. (Be sure to put a breakpoint on the “next” line, or it will run for a while.) The strID and strTitle values are not assigned and do not print out. But if you go to the immediate window and type
? JSON2("ID")
? JOON2(“title”)
You get the values. What am I doing wrong? Why can’t I get these values into variables?
Sub testMovie2()
Dim Url As String, data As String, data2 As String
Dim xml As Object, JSON As Object, JSON2 As Object, colObj As Object, colobj2 As Object, item, item2
Dim strID As String, strTitle As String
Url = "https://www.tiff.net/data/films-events-2018.json"
data = getHTTP(Url)
Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")
For Each item In colObj
Url = "https://www.tiff.net/data/films/" & item("id") & ".JSON"
data2 = getHTTP(Url)
Set JSON2 = JsonConverter.ParseJson(data2)
strID = JSON2("ID")
Debug.Print strID
strTitle = JSON2("Title")
Debug.Print strTitle
Next
End Sub
JSON2 is a dictonary object and to retrieve element from dictonary use below
with key
JSON2.item("id")
JSON2.item("title")
OR
with index
JSON2.Items()(4)
JSON2.Items()(5)
By default dictionary objects are case sensitive
So JSON2("ID") not equals to JSON2("id")
To make it case insensitive use:
JSON2.CompareMode = vbTextCompare
Code:
Sub testMovie2()
Dim url As String, data As String, data2 As String
Dim xml As Object, JSON As Object, JSON2 As Object, colObj As Object, colobj2 As Object, item, item2
Dim strID As String, strTitle As String
url = "https://www.tiff.net/data/films-events-2018.json"
data = getHTTP(url)
Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")
For Each item In colObj
url = "https://www.tiff.net/data/films/" & item("id") & ".JSON"
data2 = getHTTP(url)
Set JSON2 = JsonConverter.ParseJson(data2)
strID = JSON2.item("id")
Debug.Print strID
strTitle = JSON2.item("title")
Debug.Print strTitle
Next
End Sub
Function getHTTP(url) As String
Dim data As String
Dim xml As Object
Set xml = CreateObject("MSXML2.ServerXMLHTTP")
With xml
.Open "GET", url, False
.setRequestHeader "Content-Type", "text/json"
.send
data = .responseText
End With
getHTTP = data
End Function
I was trying to get some information from a JSON API and everything was going OK. So I started to get mismatch errors when I try to parse values that are inside the “root” of the JSON.
The code I use is below:
Public Sub Times()
Dim http As Object, JSON As Object, i As Integer
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://api.cartolafc.globo.com/time/id/1084847/7", False
http.Send
Set JSON = ParseJson(http.responseText)
i = 2
Application.ScreenUpdating = False
Sheets("Youtube").Select
For Each Item In JSON
Sheets("Mais Escalados").Cells(i, 2).value = Item("pontos")
i = i + 1
Next
Application.ScreenUpdating = True
MsgBox ("Atualização Completa")
End Sub
I can parse the data inside atletas sub-items or any other header changing the code like this:
Sheets("Mais Escalados").Cells(i, 2).value = Item("atletas")("nome")
But when I try to parse information like pontos on the root I get the mismatch error.
This will give you the root value for the key "pontos":
JSON("pontos")
You can't loop over the root keys like you show in your posted code: you would need to check the type of each key's value before you try to write it to the sheet:
Public Sub Times()
Dim http As Object, JSON As Object, i As Integer, k
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://api.cartolafc.globo.com/time/id/1084847/7", False
http.Send
Set JSON = ParseJson(http.responseText)
For Each k In JSON
Debug.Print k, TypeName(JSON(k))
Next
End Sub
Output:
atletas Collection
clubes Dictionary
posicoes Dictionary
status Dictionary
capitao_id Double
time Dictionary
patrimonio Double
esquema_id Double
pontos Double
valor_time Double
rodada_atual Double