I am trying to retrieve a JSON string using XMLHTTP in VBA from the website URLs shown below. Loading the first URL creates a session, whose ID I retrieve from the HTML body. A call to the second URL, using the session ID and the other request headers visible in the browser's developer tools, results in a 403 error. I have tried multiple combinations of headers with no effect. For deployment purposes a VBA solution is required. Any input or ideas would be much appreciated.
Sub test()
Dim wbk_TB As Workbook
Dim var_array As Variant
Dim url As String
Dim data As Variant
Dim XMLHTTP As MSXML2.XMLHTTP
Dim hdoc As MSHTML.HTMLDocument
Set wbk_TB = ThisWorkbook
Set XMLHTTP = New MSXML2.XMLHTTP
url = "http://www.eex-transparency.com/homepage/power/germany/production/availability/non-usability"
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Accept", "application/json, text/plain, */*"
XMLHTTP.send
data = XMLHTTP.responseText
Dim HTMLdoc As MSHTML.HTMLDocument
Set HTMLdoc = New MSHTML.HTMLDocument
HTMLdoc.body.innerHTML = XMLHTTP.responseText
Name = "session=" & HTMLdoc.getElementsByName("session").Item(0).Value
url = "http://www.eex-transparency.com/dsp/tem-12?country=de&limit=50&offset=50"
XMLHTTP.Open "GET", url, True
XMLHTTP.setRequestHeader "Host", "www.eex-transparency.com"
XMLHTTP.setRequestHeader "Proxy-Connection", "keep-alive"
XMLHTTP.setRequestHeader "Accept", "application/json, text/plain, */*"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"
XMLHTTP.setRequestHeader "Referer", "http://www.eex-transparency.com/homepage/power/germany/production/availability/non-usability"
XMLHTTP.setRequestHeader "Accept-Encoding", "gzip,deflate,sdch"
XMLHTTP.setRequestHeader "Cache-Control", "max-age=0"
XMLHTTP.setRequestHeader "Accept-Language", "en-US,en;q=0.8"
XMLHTTP.setRequestHeader "Cookie", Name
XMLHTTP.send
While XMLHTTP.readyState <> 4
DoEvents
Wend
data = XMLHTTP.responseText
End Sub
The XMLHTTP object does not allow unsafe headers to be set, including a spoofed Referer header. Details are available in this answer.
Because the Referer header is missing from the request, the server returns status 403. If you need to get the JSON from VBA, you would need to use an Internet Explorer object: browse to the first URL and, once it has loaded, navigate to the second URL by programmatically emulating a click on the correct link, then try to capture the data.
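An alternative that is sometimes suggested for this kind of header restriction is the WinHttp.WinHttpRequest.5.1 object, which passes a Referer header through as given. The following is only a sketch under assumptions: the function name FetchWithReferer is hypothetical, the session value is assumed to have been read from the first page already, and the server is assumed to accept it as a cookie named "session" (as the question's code does). Whether the site then answers with something other than 403 has not been verified.

```vba
' Sketch only: assumes WinHttp.WinHttpRequest.5.1 is available on the machine
' and that the server accepts the session value as a cookie named "session".
Function FetchWithReferer(ByVal sessionId As String) As String
    Const PAGE_URL As String = "http://www.eex-transparency.com/homepage/power/germany/production/availability/non-usability"
    Const DATA_URL As String = "http://www.eex-transparency.com/dsp/tem-12?country=de&limit=50&offset=50"

    Dim req As Object
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Synchronous request, so no readyState polling loop is needed
    req.Open "GET", DATA_URL, False
    req.setRequestHeader "Accept", "application/json, text/plain, */*"
    req.setRequestHeader "Referer", PAGE_URL
    req.setRequestHeader "Cookie", "session=" & sessionId
    req.send

    If req.Status = 200 Then
        FetchWithReferer = req.responseText
    Else
        ' Surface the failure instead of returning an empty string
        Err.Raise vbObjectError + 513, "FetchWithReferer", _
                  "HTTP " & req.Status & " " & req.statusText
    End If
End Function
```

It could be called as `data = FetchWithReferer(HTMLdoc.getElementsByName("session").Item(0).Value)` after the first request in the question's code.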
I want to extract the following items from a single website response to an Excel sheet:
Value of one of the cookies.
A value of an ID from the body of the response.
Cookie value to capture:
ID value to capture from HTML body:
I have searched for a solution, but I can only find one piece of code that pulls a cookie and a separate piece of code that pulls the ID value from the HTML response body.
However, combining the two does not work, and I need the cookie value and the ID value from the same response for the subsequent POST request.
To make the flow easier to understand, I will summarise my expectation below:
Visit "Site 1" grab the "Cookie" value and unique "ID" value from the response.
Pass the two values received in the previous response to the request of "Site 2".
Grab the link from the response of "Site 2" and visit "Site 3".
Below is the code I have used to retrieve the cookie values and the HTML body content; it throws an error when I uncomment the lines that pull the HTML body content. Kindly let me know where I am making a mistake, or suggest a different approach. (I have tried several variations, which I have kept as comments.)
Sub Cookie_and_HTMLbody()
Dim strCookie As Variant
Dim strToken As Variant
Dim Doc As Object
Dim pontod As Object
'Dim Elements As IHTMLElementCollection
'Dim Element As IHTMLElement
On Error Resume Next
Set Doc = New HTMLDocument
With CreateObject("WinHttp.WinHttpRequest.5.1")
'With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://example.com", False
.setRequestHeader "Upgrade-Insecure-Requests", "1"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
.setRequestHeader "Sec-Fetch-Site", "none"
.setRequestHeader "Sec-Fetch-Mode", "navigate"
.setRequestHeader "Sec-Fetch-User", "?1"
.setRequestHeader "Sec-Fetch-Dest", "document"
.setRequestHeader "Accept-Encoding", "gzip, deflate"
.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
.setRequestHeader "Connection", "close"
.send
Doc.body.innerHTML = .responseText
Set pontod = Doc.getElementById("trialrequestlanding").getElementsByTagName("div")(1).getElementsByTagName("div")(1).getElementsByTagName("div")(1).getElementsByTagName("div")(1).getElementsByTagName("form")(1).getElementsByTagName("div")(1).getElementsByTagName("input")(1)
strCookie = .getAllResponseHeaders
'strCookie = .getResponseHeader("Set-Cookie:")
'strCookie = Split(strCookie, "Set-Cookie:")
'strCookie = Trim(strCookie(UBound(strCookie)))
strCookie = Split(strCookie, vbCrLf)
strCookie = Trim(Split(Split(strCookie(5), ";")(0), ":")(1)) & "; " & Trim(Split(Split(strCookie(6), ";")(0), ":")(1))
MsgBox strCookie
'.responseType = document
'Doc = .responseText
strToken = pontod.getAttribute("value")
'strToken = Doc.querySelector("input[name='RequestVerificationToken']").getAttribute("value")
'strToken = document.getElementsByTagName("input")
'Set Doc = ie.document
MsgBox strToken
'Set Elements = .getElementsByTagName("input")
'For Each Element In Elements
' If Element.ID = "RequestVerificationToken" Then
'Range("c2").Value = Element.innerText
' MsgBox Element.Value
' End If
'Next Element
'Set Elements = Nothing
'Doc.Quit
'Set Doc = Nothing
End With
End Sub
Another piece of code, which works for retrieving a value from the HTML body, is given below.
Sub Generate_Email()
Dim Shell As Object
Dim i As Variant
Dim bie As Object
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
'Set ie = New InternetExplorerMedium
Set ie = CreateObject("InternetExplorer.Application")
'Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
'Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "https://randomsite.com/"
Do
DoEvents
Loop Until ie.readyState = 4
'Do While ie.Busy Or ie.readyState <> 4
'DoEvents
'Loop
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("span")
For Each Element In Elements
If Element.ID = "email_ch_text" Then
Range("c2").Value = Element.innerText
End If
Next Element
Set Elements = Nothing
ie.Visible = True
ie.Quit
Set ie = Nothing
Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
Set colItems = objWMIService.ExecQuery("Select * From Win32_Process")
On Error Resume Next
For Each objItem In colItems
'msgbox objItem.name & " " & objItem.ProcessID & " " & objItem.CommandLine
If objItem.Name = "ielowutil.exe" Then objItem.Terminate
Next
For Each objItem In colItems
'msgbox objItem.name & " " & objItem.ProcessID & " " & objItem.CommandLine
If objItem.Name = "iexplore.exe" Then objItem.Terminate
Next
End Sub
How can I retrieve both values with a single piece of code?
UPDATE (02 May 2021):
I have rewritten the code so that it extracts the cookie properly, but it still has an issue pulling the element's "value" attribute, as shown in image 2.
Kindly help me identify the mistake that prevents the code below from extracting the element attribute.
Sub Test_Cookie_and_HTML()
Dim pontod As Object
Dim html As Object
On Error Resume Next
Set html = New HTMLDocument
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", "https://portswigger.net/burp/pro/trial", False
.setRequestHeader "Upgrade-Insecure-Requests", "1"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
.setRequestHeader "Sec-Fetch-Site", "none"
.setRequestHeader "Sec-Fetch-Mode", "navigate"
.setRequestHeader "Sec-Fetch-User", "?1"
.setRequestHeader "Sec-Fetch-Dest", "document"
.setRequestHeader "Accept-Encoding", "gzip, deflate"
.setRequestHeader "Accept-Language", "en-US,en;q=0.9"
.setRequestHeader "Connection", "close"
.send
html.body.innerHTML = .responseText
Dim strCookie As String
Dim sessionidCookie As String
strCookie = .getResponseHeader("Set-Cookie") ' --> "SESSIONID=40DD2DFCAF24A2D64544F55194FCE04E;path=/pamsservices;HttpOnly"
sessionidCookie = GetsessionIdCookie(strCookie) ' Strips to "SESSIONID=40DD2DFCAF24A2D64544F55194FCE04E"
MsgBox sessionidCookie
MsgBox RequestVerificationToken
End With
Set pontod = html.getElementById("trialrequestlanding").getElementsByTagName("input")(1)
MsgBox pontod.getAttribute("value")
End Sub
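The updated code calls GetsessionIdCookie without showing it, so it will not compile as posted. A minimal sketch of such a helper follows; it is hypothetical, written only from the two comments in the code above that describe its input and output, and it assumes the cookie's name=value pair is everything before the first semicolon.

```vba
' Hypothetical helper: the post calls GetsessionIdCookie but does not define it.
' Returns the "name=value" pair that precedes the first ";" in a Set-Cookie value.
Function GetSessionIdCookie(ByVal setCookieValue As String) As String
    Dim pos As Long
    pos = InStr(1, setCookieValue, ";")
    If pos > 0 Then
        GetSessionIdCookie = Trim$(Left$(setCookieValue, pos - 1))
    Else
        ' No attributes present, so the whole value is the pair
        GetSessionIdCookie = Trim$(setCookieValue)
    End If
End Function
```

For example, `GetSessionIdCookie("SESSIONID=40DD2DFCAF24A2D64544F55194FCE04E;path=/pamsservices;HttpOnly")` returns `SESSIONID=40DD2DFCAF24A2D64544F55194FCE04E`. Two other points are worth checking in the code above. RequestVerificationToken is never assigned, so that MsgBox shows an empty value. Collections returned by getElementsByTagName are zero-based, so `(1)` selects the second input, not the first. Removing `On Error Resume Next` while debugging should make the failing line visible.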
My JSON response:
{"type":"FeatureCollection","totalFeatures":1,"features":[{"type":"Feature","id":"pand3d.6317078","geometry":{"type":"Polygon","coordinates":[[[125290.418,479247.512,0],[125289.696,479248.817,0],[125287.842,479247.791,0],[125264.136,479234.672,0],[125262.123,479233.558,0],[125262.675,479232.56,0],[125281.04,479199.376,0],[125281.226,479199.039,0],[125283.611,479200.359,0],[125286.739,479202.09,0],[125287.237,479202.366,0],[125289.944,479203.864,0],[125290.547,479204.198,0],[125299.388,479209.09,0],[125299.965,479209.409,0],[125302.747,479210.949,0],[125303.148,479211.171,0],[125306.787,479213.185,0],[125308.799,479214.298,0],[125308.765,479214.36,0],[125290.418,479247.512,0]]]},"geometry_name":"geovlak","properties":{"gid":6317078,"identificatie":"0363100012148538","aanduidingrecordinactief":false,"aanduidingrecordcorrectie":0,"officieel":false,"inonderzoek":false,"documentnummer":"GV00000406","documentdatum":"2010-09-09Z","bouwjaar":"2005-01-01Z","begindatumtijdvakgeldigheid":"2010-09-08T22:00:00Z","einddatumtijdvakgeldigheid":null,"gemeentecode":"0363","ground-0.00":-3.64,"ground-0.10":-0.9,"ground-0.20":-0.89,"ground-0.30":-0.88,"ground-0.40":-0.88,"ground-0.50":-0.87,"roof-0.25":27.52,"rmse-0.25":1.09,"roof-0.50":27.57,"rmse-0.50":1.09,"roof-0.75":29.91,"rmse-0.75":1.08,"roof-0.90":30.24,"rmse-0.90":1.08,"roof-0.95":30.35,"rmse-0.95":1.08,"roof-0.99":30.36,"rmse-0.99":1.08,"roof_flat":false,"nr_ground_pts":3515,"nr_roof_pts":6432,"ahn_file_date":"2014-02-01T23:00:00Z","ahn_version":3,"height_valid":true,"tile_id":"25gz2","bbox":[125262.123,479199.039,125308.799,479248.817]}}],"crs":{"type":"name","properties":{"name":"urn:ogc:def:crs:EPSG::28992"}},"bbox":[125262.123,479199.039,125308.799,479248.817]}
I want to parse "roof-0.99": 30.36 from the properties object, which sits inside the features array, but I am getting an "Invalid arguments" error.
Here is my attempt:
Sub BAG3D()
Dim ws As Worksheet: Set ws = Blad1
'Variabeles for function
Dim BAG3D As Object
Dim objHTTP As Object
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
objHTTP.Open "GET", "http://3dbag.bk.tudelft.nl/data/wfs?SERVICE=WFS&REQUEST=GetFeature&TYPENAMES=BAG3D:pand3d&CQL_FILTER=identificatie=%270363100012148538%27&outputFormat=json", False
' objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
' objHTTP.setRequestHeader "Accept", "application/hal+json"
' objHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
' objHTTP.setRequestHeader "X-Api-Key", KEY
objHTTP.send
Set BAG3D = ParseJson(objHTTP.responseText)
Debug.Print objHTTP.responseText
For Each Item In BAG3D("features")(0)("0")("properties")
'Look for "roof-0.99" and get value
ws.Range("Y3").Value = BAG3D("roof-0.99") 'probably not right
Next
End Sub
What should I modify in my code to properly parse this roof-0.99 value into my spreadsheet?
Sub BAG3D()
Dim ws As Worksheet: Set ws = Blad1
'Variabeles for function
Dim BAG3D As Object
Dim objHTTP As Object
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
objHTTP.Open "GET", "http://3dbag.bk.tudelft.nl/data/wfs?SERVICE=WFS&REQUEST=GetFeature&TYPENAMES=BAG3D:pand3d&CQL_FILTER=identificatie=%270363100012148538%27&outputFormat=json", False
' objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
' objHTTP.setRequestHeader "Accept", "application/hal+json"
' objHTTP.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
' objHTTP.setRequestHeader "X-Api-Key", KEY
objHTTP.send
Set BAG3D = ParseJson(objHTTP.responseText)
Debug.Print objHTTP.responseText
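' Assuming ParseJson is VBA-JSON's: arrays become 1-based Collections, so the first feature is item 1
ws.Range("Y3").Value = BAG3D("features")(1)("properties")("roof-0.99")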
End Sub
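The corrected line works because of how the parser maps JSON onto VBA types. The sketch below illustrates this on a trimmed-down literal instead of the live response. It assumes that ParseJson comes from the VBA-JSON JsonConverter module and that a reference to Microsoft Scripting Runtime is set; the procedure name ReadRoofHeight is made up for the example.

```vba
' Assumes the VBA-JSON JsonConverter module is imported and a reference to
' Microsoft Scripting Runtime is set (objects -> Dictionary, arrays -> Collection).
Sub ReadRoofHeight()
    Dim json As String
    ' Trimmed-down stand-in for the real response
    json = "{""features"":[{""properties"":{""roof-0.99"":30.36}}]}"

    Dim parsed As Object
    Set parsed = JsonConverter.ParseJson(json)

    ' "features" is a JSON array, so it is a Collection; iterate it rather
    ' than hard-coding an index when more than one feature may come back.
    Dim feature As Object
    For Each feature In parsed("features")
        ' "properties" is a JSON object, so it is a Dictionary keyed by name
        Debug.Print feature("properties")("roof-0.99")
    Next feature
End Sub
```

Iterating with For Each avoids the off-by-one question entirely; the index form `parsed("features")(1)` is equivalent when exactly one feature is expected.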
I got excellent help from "asmitu", who helped me log in to the USGA GHIN site with the following code. However, my real end goal is to log in and then move on to a second URL ("https://www.ghin.com/golfer-lookup/following") to extract the data stored on that page. Is there a way to modify the code below so that I land on the second page after login?
Sub GHIN_Login()
Const Url = "https://api2.ghin.com/api/v1/public/login.json?"
Dim Http As New XMLHTTP60, ghinNum$, lastName$
ghinNum = "" 'put your ghinNum here
lastName = "" 'put your lastName here
With Http
.Open "GET", Url & "ghinNumber=" & ghinNum & "&lastName=" & lastName & "&remember_me=false", False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
.setRequestHeader "Referer", "https://www.ghin.com/login"
.send
End With
MsgBox Http.responseText
End Sub
The content of that target page is generated dynamically, so you can't parse the required fields from it using XHR. However, the following attempt should let you grab the JSON response that contains the required fields.
Sub GetInformation()
Const Url = "https://api2.ghin.com/api/v1/public/login.json?"
Const Link = "https://api2.ghin.com/api/v1/followed_golfers/"
Dim Http As New XMLHTTP60, ghinNum$, lastName$
ghinNum = ""
lastName = ""
With Http
.Open "GET", Url & "ghinNumber=" & ghinNum & "&lastName=" & lastName & "&remember_me=false", False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
.setRequestHeader "Referer", "https://www.ghin.com/login"
.send
.Open "GET", Link & ghinNum & ".json", False
.send
End With
MsgBox Http.responseText
End Sub
I have tried two VBA XML methods for logging on to the USGA website. It seems straightforward, but neither works. To test this, you will need your own GHIN number and last name. Can someone please point out what I am doing wrong?
website = "https://www.ghin.com/login"
Sub Get_GHIN_Data()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
website = "https://www.ghin.com/login"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "GET", website, False
'request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.body.innerHTML = response
'********* Method 1 ************************************
'Dim oLogin As Object, oPassword As Object
'Set oLogin = .document.getElementsByName("ghinNumber")(0)
'Set oPassword = .document.getElementsByName("lastName")(0)
'oLogin.Value = ghinNumber 'real GHIN NUMBER
'oPassword.Value = LastName 'real Last Name
'html.document.forms(0).submit
'********* Method 2 ************************************
'html.getElementById("ghinNumber").Value = "ghinNumber" 'real GHIN NUMBER
'html.getElementById("lastName").Value = "Last name" 'real Last Name
'html.getElementClassName("btn fill cardinal").Click
'html.forms(0).submit
End Sub
Did you try this way? I think it will work.
Sub GetInformation()
Const Url = "https://api2.ghin.com/api/v1/public/login.json?"
Dim Http As New XMLHTTP60, ghinNum$, lastName$
ghinNum = "" 'put your ghinNum here
lastName = "" 'put your lastName here
With Http
.Open "GET", Url & "ghinNumber=" & ghinNum & "&lastName=" & lastName & "&remember_me=false", False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
.setRequestHeader "Referer", "https://www.ghin.com/login"
.send
End With
MsgBox Http.responseText
End Sub
I'm trying to download files (PDFs) silently from a website with VBA. So far I log in without issue by entering the username and password on the initial screen, navigate to the reports page within the site, and successfully get my list of files in a table. I also get the URL of the file in question without issue. Here's where I hit a wall: I do download a file, but when I open it I get a security warning that I must be logged in to view it. I can reproduce this warning by pasting the URL into any browser when I'm not logged in, and the result looks the same. So I'm downloading but not authenticating.
The code for just the download issue:
Dim strCookie As String
Dim strResponse As String
Dim xobj As Object
Dim WinHttpReq As Object
Dim WinHttpReq2 As Object
Dim oStream As Object
' Set xobj = New WinHttp.WinHttpRequest
strDocLink = "https://atlasbridge.com" & strDocLink & "&RT=PREVMAIL"
Debug.Print strDocLink
' launch tab & goto url/doc
' try to download the link(this is the url of the file)
' strDocLink
Set WinHttpReq = CreateObject("WINHTTP.WinHTTPRequest.5.1")
strUrl = "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.Open "GET", strUrl, False
' 6 = WinHttpRequestOption_EnableRedirects; the named constant is undefined with late binding unless a WinHTTP reference is set
WinHttpReq.Option(6) = False
WinHttpReq.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq.setRequestHeader "Connection", "keep-alive"
WinHttpReq.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq.Send
If WinHttpReq.Status = 200 Then
strResponse = WinHttpReq.responseText
Debug.Print strResponse
strCookie = WinHttpReq.getResponseHeader("Set-Cookie") ' returns a single Set-Cookie value; it seems to include a session ID
resp = WinHttpReq.getAllResponseHeaders
' resp = WinHttpReq.responseBody
' strCookie = WinHttpReq.getResponseHeader("Cookie") ' doesnt find the requested header
Debug.Print strCookie
Debug.Print resp
End If
' then open second session & try to get document
Set WinHttpReq2 = CreateObject("WINHTTP.WinHTTPRequest.5.1")
WinHttpReq2.Open "GET", strDocLink, False
WinHttpReq2.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq2.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq2.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq2.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq2.setRequestHeader "Connection", "keep-alive"
WinHttpReq2.setRequestHeader "Host", "atlasbridge.com:443" '
WinHttpReq2.setRequestHeader "Accept-Encoding", "gzip, deflate, br"
' WinHttpReq2.setRequestHeader "Transfer-Encoding", "chunked"
' doesnt like this one causes error on the .send
WinHttpReq2.setRequestHeader "Cache-Control", "private"
WinHttpReq2.setRequestHeader "Upgrade-Insecure-Requests", "1"
WinHttpReq2.setRequestHeader "Content-Type", "application/pdf"
WinHttpReq2.setRequestHeader "Cookie", strCookie
WinHttpReq2.Send
If WinHttpReq2.Status = 200 Then
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq2.responseBody
oStream.SaveToFile "C:\Users\MyUserName\Desktop\DownloadedMail\atlasreportdownload.ashx.pdf", 1 ' 1 = no overwrite, 2 = overwrite
oStream.Close
End If
I've tried a few different things, but I don't believe I'm getting the full cookie & session ID.
The cookie I get back in WinHttpReq.getResponseHeader("Set-Cookie") or getAllResponseHeaders looks like:
NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a000000000005e445a4a423660;Version=1;Max-Age=2400;path=/;secure;httponly
But when I use LiveHeaders in Firefox I see:
Cookie: ASP.NET_SessionId=z2e4adilfjgiyynx2mntnh1k; NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a000000000005e445a4a423660; AuthToken=0be22946-a97a-442e-bd93-c80f0c96a525; AtlasLastMessage=1173; lc_sso7549731=1546651094987; __lc.visitor_id.7549731=S1546651090.26728e19e6
But I can't seem to expose that full cookie with AuthToken & Session ID, etc. when I Debug.Print the response. Can someone point me in the right direction so I can test a variation on what I'm doing? Thank you in advance.
Update: The response headers from the first request:
Cache-Control: private
Date: Wed, 16 Jan 2019 22:04:54 GMT
Content-Length: 164
Content-Type: text/html; charset=utf-8
Location: /default.aspx?err=Expired&dest=%2fhome.aspx
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=mo0owzztbul5of0litxox5kx; path=/; secure; HttpOnly
Set-Cookie: NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a1a45525d5f4f58455e445a4a423660;Version=1;Max-Age=2400;path=/;secure;httponly
X-AspNet-Version: 4.0.30319
X-UA-Compatible: IE=edge
X-Powered-By: ASP.NET
I'm working on downloading the response body now.
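The headers above contain two Set-Cookie lines, while getResponseHeader("Set-Cookie") in the code returned only one value. A sketch of one way to collect every cookie from the text returned by getAllResponseHeaders follows. The function name BuildCookieHeader is hypothetical, and the sketch assumes one header per CRLF-separated line, with the name=value pair ending at the first semicolon.

```vba
' Sketch: builds a Cookie request header from every Set-Cookie line in the
' raw header text returned by getAllResponseHeaders.
Function BuildCookieHeader(ByVal allHeaders As String) As String
    Dim headerLines() As String
    Dim i As Long, pos As Long
    Dim pair As String, result As String

    headerLines = Split(allHeaders, vbCrLf)
    For i = LBound(headerLines) To UBound(headerLines)
        ' "set-cookie:" is 11 characters long; header names are case-insensitive
        If LCase$(Left$(headerLines(i), 11)) = "set-cookie:" Then
            pair = Trim$(Mid$(headerLines(i), 12))
            ' Drop attributes such as path, secure, HttpOnly
            pos = InStr(1, pair, ";")
            If pos > 0 Then pair = Left$(pair, pos - 1)
            If Len(result) > 0 Then result = result & "; "
            result = result & Trim$(pair)
        End If
    Next i

    BuildCookieHeader = result
End Function
```

It could be used as `strCookie = BuildCookieHeader(WinHttpReq.getAllResponseHeaders)` before the second request. Note that this only forwards the cookies this response actually sets. The AuthToken cookie seen in Firefox does not appear in these headers, and the Location header points to /default.aspx?err=Expired, which suggests the WinHttp request is not part of the logged-in browser session. If so, the login itself would also have to be performed through the same request object before the download can succeed.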