Extract value from HTML Source - html

I had a macro that used to go to a website pull a value from the A column, for example 517167000, from a particular part of the code and returning that value to a cell.
The html source has changed now and i cant seem to get it to work.
My original code was
Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
With request
.Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
.send
UnitPerBox = Trim(Split(Split(.responseText, "Units per box</td>")(1), "<tr")(0))
End With
End Function
So a working example of the website is
https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-517167000
So that you can go to the website and view the source.
The new html code looks like the below, but its been so long since i did the original macro, that i assumed that i could change
"Units per box</td>")(1), "<tr"
to
"Units per pack</td> <td class="value">")(1), "<tr"
as the below new html code is what is now on the site, and i need the value 2.74 for example, but its not working.
<tr>
<td class="name">Units per pack</td>
<td class="value">2.74</td>
</tr>
Any help would be much appreciated.
An example of
Cheers

If you go and work with .responseText using Split() doing text manipulation you might as well use a regular expression without setting it's Global parameter:
Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Pattern = "\d+(?:\.\d+)?"
With request
.Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
.send
UnitPerBox = RegEx.Execute(Split(.responsetext, "Units per pack</td>")(1))(0)
End With
End Function
Neater (IMO) however is to avoid text manipulation on the .responseText alltogether and work through the HTML document, retrieve the appropriate data straigt from the HTML-table by element-ID and table indexes:
Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")
With request
.Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
.send
htmlResponse.body.innerHTML = .responseText
UnitPerBox = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")(10).getElementsByTagName("td")(1).innerText
End With
End Function
Note that the table is 0-indexed meaning we are actually retrieving our value from the 11th row, second column. In case you are not sure that the tablecontent is always found on the same indexes, you could also just loop the child nodes:
Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")
Dim Rws As Object
With request
.Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
.send
htmlResponse.body.innerHTML = .responseText
Set Rws = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")
For Each Rw In Rws
If Rw.getElementsByTagName("td")(0).InnerText = "Units per pack" Then
UnitPerBox = Rw.getElementsByTagName("td")(1).InnerText
Exit For
End If
Next
End With
End Function
Where I personally would prefer to use HTML document over text manipulation, all above options work to retrieve your value =)

Related

Extract data from HTML Element - VBA

I'm new to web scraping and the HTML language.
I'm trying to write a code in VBA to extract data from the following website:
https://companies.govmu.org:4343/MNSOnlineSearch/
I have an Excel sheet with over 5000 company names and their respective "File No" in columns A and B respectively, and I need to input their "Status" (either "Live" or "Defunct") in column C. This will be done after searching for each company by "File No" and then extracting their status to the Excel sheet.
The issue is that I can't seem to get the element containing the data that I need.
I've already written the bit of code which will extract the "File No" from my Excel sheet, paste it on the webpage in the "File No" search box, and run the search. (You can try searching C5113, as an example).
However, on the resulting webpage, I've tried getting the element containing the data that I need, but it does not work.
For example, I tried to MsgBox (MsgBox is my personal way to check whether my variable contains the data I need) the inner HTML of the tag fieldset (fs) with ID "CompanyList" as shown in the code below, but it returns an error.
I've also tried with another variable named div, of data type HTMLDivElement, and then getting the element by ID "companies".
And finally, I've also tried looping through a variable of type IHTMLElementCollection to look for the element that I need, but it still does not show the element that I need (it shows other elements that I don't need).
Option Explicit
Sub ExtractStatusDetails()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim resultHtml As HTMLDocument
Dim fs As IHTMLElement
Dim searchBoxes As IHTMLElementCollection
Dim searchButton As Object
Dim homePage As String
homePage = "https://companies.govmu.org:4343/MNSOnlineSearch/"
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate homePage
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set html = ie.document
Set searchBoxes = html.getElementsByClassName("col-md-6 col-lg-4")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "placeholder", "value")
searchBoxes(0).innerHTML = Replace(searchBoxes(0).innerHTML, "Search company by File No...", "C63")
Set searchButton = searchBoxes(0).getElementsByClassName("btn btn-large btn-primary btn-raised")
searchButton(0).Click
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set resultHtml = ie.document
Set fs = resultHtml.getElementById("CompanyList")
MsgBox fs.innerHTML
ie.Quit
End Sub
The page does an xmlhttp POST request which retrieves data from a backend data store (likely Oracle GlassFish > JDBC API > data repository e.g. MySQL) . It returns all similar matches, possibly including exact.
You can find the POST request in the network traffic of browser dev tools after you enter the fileNo and press the search button.
Below is a function you can call in a loop over your fileNos to retrieve the company status
Option Explicit
Public Sub test()
Dim fileNo As String, xmlhttp As Object
fileNo = "C5113"
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
MsgBox GetCompanyStatus(fileNo, xmlhttp)
End Sub
Public Function GetCompanyStatus(ByVal fileNo As String, ByVal xmlhttp As Object) As String
Dim html As HTMLDocument, body As String, fileNos As Object, i As Long
Set html = New HTMLDocument
body = "tabs=tab-1&searchByName=&searchByFileNo=PLACEHOLDER&submitCompanies=&searchByBusName=&searchByBRN=&searchByIncDateFrom=&searchByIncDateTo=&doAction=search"
With xmlhttp
.Open "POST", "https://companies.govmu.org:4343/MNSOnlineSearch/GetCompanies", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send Replace$(body, "PLACEHOLDER", fileNo)
html.body.innerHTML = .responseText
Set fileNos = html.querySelectorAll("td.tdFileNo")
If fileNos.Length > 0 Then
For i = 0 To fileNos.Length - 1
If fileNos.item(i).innerText = fileNo Then
GetCompanyStatus = html.querySelectorAll("td.tdStatus").item(i).innerText
Exit Function
End If
Next i
End If
GetCompanyStatus = "Not found"
End With
End Function
I would instead consider how you can group your requests. As you can post partial file numbers you could cut down on the number of requests considerably by doing in batches with partial file numbers e.g. search for C5 or C51; the backend then does something like "C5%" to return all matches starting with the specified string, and then loop those results searching for your file numbers of interest that fall within that range.
You could have a dictionary with fileNo as key and status as value and update this as you loop the results returned by a request. I think the number of keys is constrained by Long, so no problem I think for storing all your fileNos at the start, in a dictionary, and updating later during requests. You could even have multiple dictionaries that host ranges of fileNos, like the volumes of the good old Encyclopædia Britannica. This would limit loops to dicts you hope to populate from the same request, for example. Is something to play around with an explore.

Using XMLHTTP object to parse some websites in VBA

I am trying to pick up "key people" field from a Wikipedia page: https://en.wikipedia.org/wiki/Abbott_Laboratories and to copy that value in my Excel spread sheet.
I managed to do it using xml http which is a method I like for its speed, you can see the code below that is working.
The code is however not flexible enough as the structure of the wiki page can change, for example it doesn't work on this page: https://en.wikipedia.org/wiki/3M
as the tr td structure is not exactly the same (key people is no longer 8th TR for the 3M page)
How can I improve my code?
Public Sub parsehtml()
Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
Dim i As Integer
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False
http.send
html.body.innerHTML = http.responseText
Set topic = html.getElementsByTagName("tr")(8)
Set titleElem = topic.getElementsByTagName("td")(0)
ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText
End Sub
If row of the table is not fixed for "Key people", then why don't loop the table for "Key people"
I tested with followings modification, it is found working correctly.
In declaration section
Dim topics As HTMLTable, Rw As HTMLTableRow
and then finally
html.body.innerHTML = http.responseText
Set topic = html.getElementsByClassName("infobox vcard")(0)
For Each Rw In topic.Rows
If Rw.Cells(0).innerText = "Key people" Then
ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
Exit For
End If
Next
There is a better faster way. At least for given urls. Match on class name of element and index into returned nodeList. Less returned items to deal with, the path to the element is shorter, and matching with class name is faster than matching on element type.
Option Explicit
Public Sub GetKeyPeople()
Dim html As HTMLDocument, body As String, urls(), i As Long, keyPeople
Set html = New HTMLDocument
urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(urls) To UBound(urls)
.Open "GET", urls(i), False
.send
html.body.innerHTML = .responseText
keyPeople = html.querySelectorAll(".agent").item(1).innerText
ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
Next
End With
End Sub

VBA Parse String Response From API

I'm getting below type of responses from an API:
"{"success":false,"error":"Incorrect apikey"}"
"{"success":true,"error":"Correct apikey"}"
In VBA, how to know if response is true or false ?
My current code is below and am getting response in hReq.ResponseText and it is not always fixed (more keys can be added to json). Although the part below is fixed and I have to get only if it returning true or false.
"{"success":false
Code:
Dim hReq As Object
Dim strUrl As String
strUrl = "https://myWebAPI/myMethod?apikey=123456"
Set hReq = CreateObject("MSXML2.XMLHTTP")
With hReq
.Open "GET", strUrl, False
.Send
End With
MsgBox hReq.ResponseText
I know this is basic question but sorry am new to VBA.
Purely based on what you have shown you can get True/False with
Split(Split(hReq.ResponseText, ":")(1), ",")(0)

VBA input data in the Web field

I am coding vba to open the central bank's website and input the values ​​and extract the data, I usually do this on the mail site, bank of Brazil etc ...
() of the central bank
I can not give the input value via vba in the textbox I've already tried:
Ie.Document.all.Item("valueConverter").Innertext="1"
Ie.Document.getElementById("valueConverter").Value="1"
Ie.Document.getElementById("valueConverter")(0).Value="1"
Ie.Document.getElementByName("valueConverter").Value = "1"
The Elements of this site is this:
<Input type = "text" name = "valueConverter" maxlength = "17" size "20" value onkeypress = "return (MascaraMoeda (this, '.', ',', Event)
Does anyone know how?
tl;dr;
I cannot mark this as a duplicate as there is no accepted answer to where I posted an answer to a similar question.
Not sure of the protocol as simply posting a link in the comments doesn't mean it will be found again.
My full answer is here: Excel Web Query Submit Issues
To summarize:
You can use the bcb.gov.br Open Data Portal.
Send a request for a JSON response with the conversion rates from their Exchange rates – daily bulletins.
With the received response, amongst other methods, you can then:
Use the JSON Converter .basa and set the convert the response into a JSON object and work with that
Parse the response as a string with a regex to get the values
For brevity, I will give you just the second method here and you can view my other answer for both methods:
Public Sub GetInfo2()
Dim strURL As String, strJSON As String, item As Variant, http As Object, json As Object
Const TARGET_CURRENCY As String = "USD"
Const START_DATE As String = "06-13-2018"
Const END_DATE As String = "06-13-2018"
strURL = "https://olinda.bcb.gov.br/olinda/service/PTAX/version/v1/odata/ExchangeRatePeriod(moeda=#moeda,dataInicial=#dataInicial,dataFinalCotacao=#dataFinalCotacao)?%40moeda=%27" & TARGET_CURRENCY & "%27&%40dataInicial=%27" & START_DATE & "%27&%40dataFinalCotacao=%27" & END_DATE & "%27&%24format=json"
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", strURL, False
http.send
strJSON = http.responseText
Dim Matches As Object
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = """cotacaoCompra"":\d{1,}.\d{1,}" 'The pattern I really wanted, "(?<=""cotacaoCompra"":)\d{1,}.\d{1,}", doesn't appear to be supported
If Not .test(strJSON) Then Exit Sub
Set Matches = .Execute(strJSON)
Dim match As Object
For Each match In Matches
Debug.Print Replace(match, """cotacaoCompra"":", vbNullString)
Next
End With
End Sub

Loop through list of URLs using .Open (Parsing JSON in VBA)

The "reference URL list" part of the code is where I can drop an individual URL in and the code works fine. But I'd like to make the code more flexible where I can loop through my list of URLs (ideally only changing that portion of my code, or perhaps another small tweak). Here is the code:
Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
MyRequest.Open "GET", "reference URL list"
MyRequest.Send
Dim Json As Object
Set Json = JsonConverter.ParseJson(MyRequest.ResponseText)
I realize there are multiple ways to approach this -- though I can't find specific information that will slightly augment my approach. I really appreciate the help.
Kyle
This code should do what you want. The key idea is to use an Array to hold the list of website you want to send an HTTP request.
You don't have to use an array typed into VBA as I show below, you could also use a Range in Excel.
Here's the code:
Public Sub HTTP_Req()
Dim MyRequest As Object: Set MyRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
'Add all Urls you want to send a HTTP request to, in an Array
Dim MyUrls: MyUrls = Array("www.google.com", "www.yahoo.com", "www.bing.com")
Dim i As Long
Dim Json As Object
For i = LBound(MyUrls) To UBound(MyUrls)
With MyRequest
.Open "GET", MyUrls(i)
.Send
Set Json = JsonConverter.ParseJson(.ResponseText)
'Do something with the JSON object here
End With
Next
End Sub