I'm trying to scrape elements from xmlhttp.
I'm not too bad with vba, but relatively new to data scraping.
I have previously been using ie.
I can import the html into a cell, but would like to import specifically, the name, id, price and stock level.
The code I'm using to import the data is
Private Sub HTML_VBA_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code
sURL = "https://www.superdrug.com/Make-Up/Lips/Lip-Kits/Flower-Beauty-Mix-N%27-Matte-Lipstick-Duo-Tickled-Pink-687/p/769466"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
'Get webpage data into Excel
sh02.Cells(1, 1) = sPageHTML
End Sub
Thanks in advance for any help received.
Ian
You cannot extract the information reliably from an xmlhttp request issued against the url you show as the content is javascript loaded and will not have run.
Not sure how sustainable the token is (doesn't seem to matter the value used) but you can join the productid, which is the end of your url, with the ajax token present in the page and issue and xmlhttp request using querystring parameters and parse a json response for the items of interest. I use jsonconverter.bas. After downloading and installing the .bas you need to go VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Some testing seems to indicate any number can be added after the hyphen in place of the token so you could randomly generate a number on the fly to use.
It's worth noting you can comma separate multiple products in the query string and thus do a bulk request. You would need then do a For Each Loop over the collection of dictionaries returned.
Option Explicit
Public Sub GetInfo()
Const URL As String = "https://www.superdrug.com/micrositeProduct/bulk/769466-1548702898380"
Dim json As Object, title As String, price As String, stocking As String, id As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
Set json = jsonconverter.ParseJson(.responsetext)(1)
End With
title = json("name")
price = json("price")("formattedValue") 'json("price")("value")
stocking = json("stockLevel")
id = json("code")
End Sub
If you use a browser then the json string is present within one the script tags as the .innerHTML and you can easily extract from there.
Related
I'm new to web scraping and the HTML language.
I'm trying to write a code in VBA to extract data from the following website:
https://companies.govmu.org:4343/MNSOnlineSearch/
I have an Excel sheet with over 5000 company names and their respective "File No" in columns A and B respectively, and I need to input their "Status" (either "Live" or "Defunct") in column C. This will be done after searching for each company by "File No" and then extracting their status to the Excel sheet.
The issue is that I can't seem to get the element containing the data that I need.
I've already written the bit of code which will extract the "File No" from my Excel sheet, paste it on the webpage in the "File No" search box, and run the search. (You can try searching C5113, as an example).
However, on the resulting webpage, I've tried getting the element containing the data that I need, but it does not work.
For example, I tried to MsgBox (MsgBox is my personal way to check whether my variable contains the data I need) the inner HTML of the tag fieldset (fs) with ID "CompanyList" as shown in the code below, but it returns an error.
I've also tried with another variable named div, of data type HTMLDivElement, and then getting the element by ID "companies".
And finally, I've also tried looping through a variable of type IHTMLElementCollection to look for the element that I need, but it still does not show the element that I need (it shows other elements that I don't need).
Option Explicit
Sub ExtractStatusDetails()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim resultHtml As HTMLDocument
Dim fs As IHTMLElement
Dim searchBoxes As IHTMLElementCollection
Dim searchButton As Object
Dim homePage As String
homePage = "https://companies.govmu.org:4343/MNSOnlineSearch/"
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate homePage
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set html = ie.document
Set searchBoxes = html.getElementsByClassName("col-md-6 col-lg-4")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "placeholder", "value")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "Search company by File No...", "C63")
Set searchButton = searchBoxes(0).getElementsByClassName("btn btn-large btn-primary btn-raised")
searchButton(0).Click
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set resultHtml = ie.document
Set fs = resultHtml.getElementById("CompanyList")
MsgBox fs.innerHTML
ie.Quit
End Sub
The page does an xmlhttp POST request which retrieves data from a backend data store (likely Oracle GlassFish > JDBC API > data repository e.g. MySQL) . It returns all similar matches, possibly including exact.
You can find the POST request in the network traffic of browser dev tools after you enter the fileNo and press the search button.
Below is a function you can call in a loop over your fileNos to retrieve the company status
Option Explicit
Public Sub test()
Dim fileNo As String, xmlhttp As Object
fileNo = "C5113"
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
MsgBox GetCompanyStatus(fileNo, xmlhttp)
End Sub
Public Function GetCompanyStatus(ByVal fileNo As String, ByVal xmlhttp As Object) As String
Dim html As HTMLDocument, body As String, fileNos As Object, i As Long
Set html = New HTMLDocument
body = "tabs=tab-1&searchByName=&searchByFileNo=PLACEHOLDER&submitCompanies=&searchByBusName=&searchByBRN=&searchByIncDateFrom=&searchByIncDateTo=&doAction=search"
With xmlhttp
.Open "POST", "https://companies.govmu.org:4343/MNSOnlineSearch/GetCompanies", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send Replace$(body, "PLACEHOLDER", fileNo)
html.body.innerHTML = .responseText
Set fileNos = html.querySelectorAll("td.tdFileNo")
If fileNos.Length > 0 Then
For i = 0 To fileNos.Length - 1
If fileNos.item(i).innerText = fileNo Then
GetCompanyStatus = html.querySelectorAll("td.tdStatus").item(i).innerText
Exit Function
End If
Next i
End If
GetCompanyStatus = "Not found"
End With
End Function
I would instead consider how you can group your requests. As you can post partial file numbers you could cut down on the number of requests considerably by doing in batches with partial file numbers e.g. search for C5 or C51; the backend then does something like "C5%" to return all matches starting with the specified string, and then loop those results searching for your file numbers of interest that fall within that range.
You could have a dictionary with fileNo as key and status as value and update this as you loop the results returned by a request. I think the number of keys is constrained by Long, so no problem I think for storing all your fileNos at the start, in a dictionary, and updating later during requests. You could even have multiple dictionaries that host ranges of fileNos, like the volumes of the good old Encyclopædia Britannica. This would limit loops to dicts you hope to populate from the same request, for example. Is something to play around with an explore.
A website has changed so I can't scrape data from it anymore. Just need to change the set line below I believe but have tried a number of things and none have worked. I'm not very knowledgeable about this area I'm afraid but rest of code is working. Eg url is https://www.gurufocus.com/stock/CFWFF/insider and it is the table of insider transactions I am trying to press the dropdown for and change it to 100 instead of 10:
t = Timer
noTable = 0
Do
Set DropDown = doc.querySelectorAll(".el-dropdown-menu__item")
lastDropDrownItemIndex = DropDown.Length - 1
If Timer - t > MAX_WAIT_SEC Then
noTable = 1
Exit Do
End If
Loop While lastDropDrownItemIndex < 1
If noTable = 1 Then GoTo noTableEscape
DropDown.Item(lastDropDrownItemIndex).Click
Thanks
Ok so, not sure what you are after exactly, but the website you're scraping offers an API which in my opinion could probably make your life a lot easier. To put it simply, this means that it provides an easy way to request for data with the use of some parameters embedded in a URL. It returns the requested data in JSON format.
In the following code I will be using the XMLHTTP request method and a JSON Parser. For this you will need the following:
JSON parser , it helps you parse the downloaded data. Follow the installation instructions to import it in your project
A reference to the Microsoft Scripting Runtime library. The JSON parser needs it.
A reference to the Microsoft WinHTTP Services, Version 5.1 library. It lets you use an HTTP request object.
For demonstration purposes, the following code will only print in the immediate window the first entry's name and position. You can modify the code to fit your needs.
Sub test()
Dim req As New WinHttpRequest
Dim jsonResponse As String
Dim jsonParsed As Object
Dim url As String
Dim pageNum As Integer
Dim numPerPage As Integer
pageNum = 1 'You can change this parameter to navigate in different pages
numPerPage = 100 'You can change this parameter to control the number of entries
url = "https://www.gurufocus.com/reader/_api/stocks/OTCPK:CFWFF/insider?page=" & pageNum & "&per_page=" & numPerPage & "&sort=date%7Cdesc"
With req
.Open "GET", url, False
.setRequestHeader "Accept", "application/json, text/plain, */*"
.setRequestHeader "Authorization", ThisWorkbook.Worksheets("The name of your Worksheet").Range("A1").Value 'I have stored a string that is essential to the request in cell A1
.send
jsonResponse = .responseText
End With
Set jsonParsed = JsonConverter.ParseJson(jsonResponse)
Debug.Print jsonParsed("data")(1)("name") 'get the name parameter of the first entry
Debug.Print jsonParsed("data")(1)("position") 'get the position parameter of the first entry
End Sub
Please note that there's a very long string which is essential to the request, which I have stored in cell A1. This string looks like so:
Bearer
eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImp0aSI6ImUxYjAwMmYxMjczMGRiMTBmMmZkYjJkNDk0YTU4NjRmZDZjOWY3ZGI4ZmI1NDY1NTQ2MzZlMGJhNzkxODUxNmY4NTM2ZWIzZDNhODhmN2VmIn0.eyJhdWQiOiIyIiwianRpIjoiZTFiMDAyZjEyNzMwZGIxMGYyZmRiMmQ0OTRhNTg2NGZkNmM5ZjdkYjhmYjU0NjU1NDYzNmUwYmE3OTE4NTE2Zjg1MzZlYjNkM2E4OGY3ZWYiLCJpYXQiOjE1NTkwNzA3OTcsIm5iZiI6MTU1OTA3MDc5NywiZXhwIjoxNTY5NDM4NzkzLCJzdWIiOiIiLCJzY29wZXMiOltdfQ.mZ4DqhUk9YAU6JYDBScF8MJ_zHPyL94bAec7LxZTaWipcWf9uesdGDMDC9v_7W-6zrtXAUWhk4YAL70E5rpPjM7gusYH0RfO48O2PnaV8gsqXoNCFwFBOHuxh109q7X0YsNkfX2wX8m3XigtK9A_YAGID7wxgX96lwzBevsDJ3borHMcJlQtxidF_Bq2D5WPASsuy3jdY80HkOCR1y4eaSIswBEtK5rPj_xy7VXRbYGhLklqw4wgHgq4blfaHnVVmPXf6k8mx45ye8vPecS-w9kjuDOHVn2mvU6mpBzqEpbH4lqpiqmYG7M-CvB1joEAcMQtcilCvsdfKOusoC2MU4_vPtF3Q4ZFVaEcXIQgomdKtFa_XGpCudit45b2rEFacKMUENqLj_sPwYkgM1IPl1lQfR-VpigqnCHPAxVQAPzqwJvS6CxuYOPmvnrx23fBAillP7LtDHwHtlMpgZUjdB5y6IWsia76crM4kbkrKn3zc8xoAGb1fIrgJlY-9hOzrwsmrchantEdYOFZjcMJvhCnlfvnEm6kT2Sdcu4o6TndTZJjrVmD4mb-jNGy4kw_mAx1DfyqR7GLtCVSzcSLKgrrwCJEL22K2bfXH2HExXvgLFbPXivVZJc70TnF9lJmx_dx79cxAm7szFGIdrs56bAC4mdKpvKL3BNmVY-J-G0
The same string should work for you as well.
The result looks like so:
Brown, James Michael
Senior Officer
Each one of the 100 data entries has the following structure:
It's fairly easy to loop through all the entries as well. For example, to print the name of all the entries you would have to do this:
Dim item As Object
For Each item In jsonParsed("data")
Debug.Print item("name")
Next item
Finally, you can also loop through all the parameters of each entry. For example, the following code prints all the parameters and their corresponding values for the first entry:
Dim key As Variant
For Each key In jsonParsed("data")(1).Keys
Debug.Print key & ": " & jsonParsed("data")(1)(key)
Next key
So this way you can basically access any parameter you want for each entry.
I'm trying to get dollar exchange rate from http://www4.bcb.gov.br/pec/taxas/port/ptaxnpesq.asp?id=txcotacao into a Excel spreadsheet.
I tried to paste as refreshable web query, however, the page opens one step earlier with a form, which has default inputs (that work for me) and then the query copies stuff from this page.
I tried to write a code to submit the form. I tried the .submit, .Click, .FireEvent and many other things I found on internet.
I tried to refer to the button by its name, class, tag, ...
<input title="Pesquisar" class="botao" onclick="limparVazio()" type="submit" value="Pesquisar">
I tried to trigger the form directly or bypass it
<form name="consultarBoletimForm" action="/ptax_internet/consultaBoletim.do?method=consultarBoletim" method="post">
You can use the bcb.gov.br Open Data Portal.
Send a request for a JSON response with the conversion rates from their Exchange rates – daily bulletins.
With the received response, amongst other methods, you can then:
Use the JSON Converter and set the convert the response into a JSON object and work with that;
Parse the response as a string with a regex to get the values
Looking at the results for today's rate on the site:
Input:
Output:
Result:
You can see USD 1 = 3,7048 BRL
① Using JSON object:
Example string to make request:
"https://olinda.bcb.gov.br/olinda/service/PTAX/version/v1/odata/ExchangeRatePeriod(moeda=#moeda,dataInicial=#dataInicial,dataFinalCotacao=#dataFinalCotacao)?%40moeda=%27" & TARGET_CURRENCY & "%27&%40dataInicial=%27" & START_DATE & "%27&%40dataFinalCotacao=%27" & END_DATE & "%27&%24format=json"
I include the start date, end date and currency in the string as well as specify the response format as JSON. I have selected the date to match the website view shown in the images above.
The JSON response is as follows:
I read the response into a string variable and then use JsonConverter.ParseJson(strJSON) to convert to a JSON object, stored in json variable. A quick inspection of the structure:
The begining "{" tells me that json is a dictionary.
I can also see that json("value") is a collection of dictionaries and that the value I am interested in, 3,7048 - remember from the website images above, is stored as "cotacaoCompra".
I can thus use the following script to access that value. The JSON response actually gives rates at 5 different times on that date in question. These are all printed out. The Fechamento (Closing) bulletin rate of 3,7048 we can see matches.
Code:
Option Explicit
Public Sub GetInfo()
Dim strURL As String, strJSON As String, item As Variant, http As Object, json As Object
Const TARGET_CURRENCY As String = "USD"
Const START_DATE As String = "06-13-2018"
Const END_DATE As String = "06-13-2018"
strURL = "https://olinda.bcb.gov.br/olinda/service/PTAX/version/v1/odata/ExchangeRatePeriod(moeda=#moeda,dataInicial=#dataInicial,dataFinalCotacao=#dataFinalCotacao)?%40moeda=%27" & TARGET_CURRENCY & "%27&%40dataInicial=%27" & START_DATE & "%27&%40dataFinalCotacao=%27" & END_DATE & "%27&%24format=json"
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", strURL, False
http.send
strJSON = http.responseText
Set json = JsonConverter.ParseJson(strJSON)
For Each item In json("value")
Debug.Print "rate " & item("cotacaoCompra") & " at " & item("dataHoraCotacao")
Next item
End Sub
Script output:
Notes:
Requires JSONConverter bas added and VBE > Tools > References > Microsoft Scripting RunTime)
② Parsing the responseText with a regex to get the rates:
The regex I will use is
"cotacaoCompra":\d{1,}.\d{1,}
This looks for the literal string "cotacaoCompra":, followed by 1 or more numbers then a ".", then one of more numbers.
I then have to remove the string "cotacaoCompra": with a straight forward replace. Ideally, I would just extract the numbers with "(?<=""cotacaoCompra"":)\d{1,}.\d{1,}"; basically, that says after, but not including "cotacaoCompra":. But that doesn't appear to be supported.
With that in mind the script to get the rates with regex:
Code:
Public Sub GetInfo2()
Dim strURL As String, strJSON As String, item As Variant, http As Object, json As Object
Const TARGET_CURRENCY As String = "USD"
Const START_DATE As String = "06-13-2018"
Const END_DATE As String = "06-13-2018"
strURL = "https://olinda.bcb.gov.br/olinda/service/PTAX/version/v1/odata/ExchangeRatePeriod(moeda=#moeda,dataInicial=#dataInicial,dataFinalCotacao=#dataFinalCotacao)?%40moeda=%27" & TARGET_CURRENCY & "%27&%40dataInicial=%27" & START_DATE & "%27&%40dataFinalCotacao=%27" & END_DATE & "%27&%24format=json"
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", strURL, False
http.send
strJSON = http.responseText
Dim Matches As Object
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = """cotacaoCompra"":\d{1,}.\d{1,}" 'The pattern I really wanted, "(?<=""cotacaoCompra"":)\d{1,}.\d{1,}", doesn't appear to be supported
If Not .test(strJSON) Then Exit Sub
Set Matches = .Execute(strJSON)
Dim match As Object
For Each match In Matches
Debug.Print Replace(match, """cotacaoCompra"":", vbNullString)
Next
End With
End Sub
The "reference URL list" part of the code is where I can drop an individual URL in and the code works fine. But I'd like to make the code more flexible where I can loop through my list of URLs (ideally only changing that portion of my code, or perhaps another small tweak). Here is the code:
Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
MyRequest.Open "GET", "reference URL list"
MyRequest.Send
Dim Json As Object
Set Json = JsonConverter.ParseJson(MyRequest.ResponseText)
I realize there are multiple ways to approach this -- though I can't find specific information that will slightly augment my approach. I really appreciate the help.
Kyle
This code should do what you want. The key idea is to use an Array to hold the list of website you want to send an HTTP request.
You don't have to use an array typed into VBA as I show below, you could also use a Range in Excel.
Here's the code:
Public Sub HTTP_Req()
Dim MyRequest As Object: Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
'Add all Urls you want to send a HTTP request to, in an Array
Dim MyUrls: MyUrls = Array("www.google.com", "www.yahoo.com", "www.bing.com")
Dim i As Long
Dim Json As Object
For i = LBound(MyUrls) To UBound(MyUrls)
With MyRequest
.Open "GET", MyUrls(i)
.Send
Set Json = JsonConverter.ParseJson(.ResponseText)
'Do something with the JSON object here
End With
Next
End Sub
I'm looking for a solution to automatically update a cell in my Excel Table via a JSON Webservice. I've found a way to do this via XML, but unfortunately no way to do this via JSON.
The Webservice I want to use is: https://mtgox.com/api/1/BTCUSD/ticker
And I would like to use the value of return.avg.value in my table.
I even thought of using RegEx to get the value but I could't find a way to do so, also.
I really hope there is a solution to my problem without using a lot of different Addons and Plugins for Excel, so my table remains portable.
Thank you for your help,
Frederick
Sub TestJson()
Dim json As String
Dim sc As Object, o
json = WebResponse("https://mtgox.com/api/1/BTCUSD/ticker")
Set sc = CreateObject("ScriptControl")
sc.Language = "JScript"
o = sc.Eval("eval(" & json & ")['return'].avg.value")
Debug.Print o
End Sub
Public Function WebResponse(sURL As String) As String
Dim XmlHttpRequest As Object
Set XmlHttpRequest = CreateObject("Msxml2.ServerXMLHTTP.4.0")
XmlHttpRequest.Open "GET", sURL, False
XmlHttpRequest.send
WebResponse = XmlHttpRequest.responseText
End Function