I'm getting stuck at VBA run-time error 424 - HTML

I'm getting run-time error 424 on the 68th line,
request.Open "GET", Url, False
and I don't know how to fix it.
My previous question:
How to scrape specific part of online english dictionary?
My final goal is to get a result like this:
A B
beginning bɪˈɡɪnɪŋ
behalf bɪˈhæf
behave bɪˈheɪv
behaviour bɪˈheɪvjər
belong bɪˈlɔːŋ
below bɪˈloʊ
bird bɜːrd
biscuit ˈbɪskɪt
Here's the code I wrote; it's mostly based on someone else's code I found on the internet.
' Microsoft ActiveX Data Objects x.x Library
' Microsoft XML, v3.0
' Microsoft VBScript Regular Expressions
Sub ParseHelp()
' Word reference from
Dim Url As String
Url = "https://www.oxfordlearnersdictionaries.com/definition/english/" & Cells(ActiveCell.Row, "B").Value
' Get dictionary's html
Dim Html As String
Html = GetHtml(Url)
' Check error
If InStr(Html, "<TITLE>Not Found</Title>") > 0 Then
MsgBox "404"
Exit Sub
End If
' Extract phonetic alphabet from HTML
Dim wrapPattern As String
wrapPattern = "<span class='name' (.*?)</span>"
Set wrapCollection = FindRegexpMatch(Html, wrapPattern)
' MsgBox StripHtml(CStr(wrapCollection(1)))
' Fill phonetic alphabet into cell
If Not wrapCollection Is Nothing Then
Dim wrap As String
On Error Resume Next
wrap = StripHtml(CStr(wrapCollection(1)))
If Err.Number <> 0 Then
wrap = ""
End If
Cells(ActiveCell.Row, "C").Value = wrap
Else
MsgBox "not found"
End If
End Sub
Public Function StripHtml(Html As String) As String
Dim RegEx As New RegExp
Dim sOut As String
Html = Replace(Html, "</li>", vbNewLine)
Html = Replace(Html, "&nbsp;", " ")
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "<[^>]+>"
End With
sOut = RegEx.Replace(Html, "")
StripHtml = sOut
Set RegEx = Nothing
End Function
Public Function GetHtml(Url As String) As String
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP")
Dim converter As New ADODB.stream
' Get
request.Open "GET", Url, False
request.send
' raw bytes
converter.Open
converter.Type = adTypeBinary
converter.Write request.responseBody
' read
converter.Position = 0
converter.Type = adTypeText
converter.Charset = "utf-8"
' close
GetHtml = converter.ReadText
converter.Close
End Function
Public Function FindRegexpMatch(txt As String, pat As String) As Collection
Set FindRegexpMatch = New Collection
Dim rx As New RegExp
Dim matcol As MatchCollection
Dim mat As Match
Dim ret As String
Dim delimiter As String
txt = Replace(txt, Chr(10), "")
txt = Replace(txt, Chr(13), "")
rx.Global = True
rx.IgnoreCase = True
rx.MultiLine = True
rx.Pattern = pat
Set matcol = rx.Execute(txt)
'MsgBox "Match:" & matcol.Count
On Error GoTo ErrorHandler
For Each mat In matcol
'FindRegexpMatch.Add mat.SubMatches(0)
FindRegexpMatch.Add mat.Value
Next mat
Set rx = Nothing
' Insert code that might generate an error here
Exit Function
ErrorHandler:
' Insert code to handle the error here
MsgBox "FindRegexpMatch. " & Err.Description
Resume Next
End Function
Any kind of help would be greatly appreciated.
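For reference, the immediate cause of error 424 ("Object required") appears to be in GetHtml: the request object is created and stored in xmlhttp, but the code then calls request.Open, and request is never declared or set, so VBA treats it as an empty Variant. A minimal corrected sketch of GetHtml, assuming the same references listed at the top of the question (in particular the ADO library for ADODB.Stream), uses one consistent variable name:

```vba
' Sketch: GetHtml with the request object actually declared and created.
' Requires reference: Microsoft ActiveX Data Objects x.x Library (ADODB.Stream).
Public Function GetHtml(ByVal Url As String) As String
    Dim request As Object
    Set request = CreateObject("MSXML2.ServerXMLHTTP")   ' the object request.Open needs
    request.Open "GET", Url, False
    request.send

    ' Decode the raw response bytes as UTF-8 text
    Dim converter As New ADODB.Stream
    converter.Open
    converter.Type = adTypeBinary
    converter.Write request.responseBody
    converter.Position = 0          ' must be at 0 before switching Type/Charset
    converter.Type = adTypeText
    converter.Charset = "utf-8"
    GetHtml = converter.ReadText
    converter.Close
End Function
```

Putting Option Explicit at the top of the module would have reported request as an undeclared variable at compile time instead of failing at run time.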

The following is an example of how to read in values from column A and write out pronunciations to column B. It uses CSS selectors to match a child node, then steps up to parentNode to ensure the entire pronunciation is grabbed. There are a number of ways you could have matched on the parent node to get the second pronunciation. Note that I use a parent node and Replace because the pronunciation may span multiple childNodes.
If you are doing this for lots of lookups, please be a good netizen and put some waits in the code so as not to bombard the site with requests.
Option Explicit
Public Sub WriteOutPronounciations()
Dim html As MSHTML.HTMLDocument, i As Long, ws As Worksheet
Dim data As String, lastRow As Long, urls()
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row 'you need at least two words in column A or change the redim.
urls = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(urls))
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(urls) To UBound(urls)
.Open "GET", "https://www.oxfordlearnersdictionaries.com/definition/english/" & urls(i), False
.send
html.body.innerHTML = .responseText
data = Replace$(Replace$(html.querySelector(".name ~ .wrap").ParentNode.innerText, "/", vbNullString), Chr$(10), Chr$(32))
results(i) = Right$(data, Len(data) - 4)
Next
End With
With ws
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub
Required references (VBE>Tools>References):
Microsoft HTML Object Library
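As a sketch of the polite-waits advice, here is a hypothetical helper, LookupPronunciation, that performs one lookup with the same URL and selector as the answer, returns an empty string instead of failing when the selector matches nothing (for example for a word with no entry), and pauses about a second afterwards with Excel's Application.Wait. The one-second pause is an arbitrary choice, and the helper returns the cleaned text without the answer's Right$ trimming.

```vba
' Sketch: one dictionary lookup with a guard and a pause afterwards.
' Requires reference: Microsoft HTML Object Library.
' LookupPronunciation is a hypothetical helper, not part of the answer above.
Public Function LookupPronunciation(ByVal http As Object, ByVal html As MSHTML.HTMLDocument, _
                                   ByVal word As String) As String
    http.Open "GET", "https://www.oxfordlearnersdictionaries.com/definition/english/" & word, False
    http.send
    html.body.innerHTML = http.responseText

    Dim node As Object
    Set node = html.querySelector(".name ~ .wrap")
    If Not node Is Nothing Then                 ' no match: leave the result empty
        LookupPronunciation = Replace$(Replace$(node.ParentNode.innerText, "/", vbNullString), _
                                       Chr$(10), Chr$(32))
    End If

    Application.Wait Now + TimeSerial(0, 0, 1)  ' pause about one second (Excel only)
End Function
```

Usage: create the request object once with Set http = CreateObject("MSXML2.ServerXMLHTTP") and the document with Set html = New MSHTML.HTMLDocument, then call LookupPronunciation(http, html, word) for each word in column A.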
Should you go down the API route, here is a small example. You can make 1000 free calls a month with a Prototype account. The next best option, depending on how many calls you wish to make, looks like the 10,001-call plan (that one extra PAYG call halves the price). The number of calls will be affected by whether the word is a headword or needs a lemmas lookup call first. The endpoint construction you need is GET /entries/{source_lang}/{word_id}?fields=pronunciations, though that doesn't seem to filter massively. You will need a JSON parser to handle the JSON returned, e.g. github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas. Download the raw code from there and add it to a standard module called JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Option Explicit
Public Sub WriteOutPronounciations()
Dim i As Long, ws As Worksheet
Dim lastRow As Long, words()
'If not performing a lemmas lookup first, the word must be a headword, e.g. behave, behalf
Const appId As String = "yourAppId"
Const appKey As String = "yourAppKey"
Set ws = ThisWorkbook.Worksheets("Sheet1")
lastRow = ws.Cells(ws.rows.Count, "A").End(xlUp).row
words = Application.Transpose(ws.Range("A1:A" & lastRow).Value)
ReDim results(1 To UBound(words))
Dim json As Object
With CreateObject("MSXML2.ServerXMLHTTP")
For i = LBound(words) To UBound(words)
.Open "GET", "https://od-api.oxforddictionaries.com/api/v2/entries/en-us/" & LCase$(words(i)) & "?fields=pronunciations", False
.setRequestHeader "app_id", appId
.setRequestHeader "app_key", appKey
.setRequestHeader "ContentType", "application/json"
.send
Set json = JsonConverter.ParseJson(.responseText)
results(i) = IIf(json("results")(1)("type") = "headword", json("results")(1)("lexicalEntries")(1)("pronunciations")(2)("phoneticSpelling"), "lemmas lookup required")
Set json = Nothing
Next
End With
With ws
.Cells(1, 2).Resize(UBound(results, 1), 1) = Application.Transpose(results)
End With
End Sub
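The endpoint construction described above can be wrapped in two small hypothetical helpers: EntriesUrl builds GET /entries/{source_lang}/{word_id}?fields=pronunciations, and GetEntriesJson only returns the body when the API answers with status 200, so an error response (for example an authentication failure or an unknown word) is not handed to ParseJson. This is a sketch; the base URL and headers are the ones used in the answer, and treating every non-200 status as "no result" is an assumption.

```vba
' Sketch: build the entries URL and only hand successful responses to a JSON parser.
' The helper names are hypothetical; base URL and headers follow the answer above.
Public Function EntriesUrl(ByVal sourceLang As String, ByVal wordId As String) As String
    EntriesUrl = "https://od-api.oxforddictionaries.com/api/v2/entries/" & _
                 sourceLang & "/" & LCase$(wordId) & "?fields=pronunciations"
End Function

Public Function GetEntriesJson(ByVal wordId As String, ByVal appId As String, _
                               ByVal appKey As String) As String
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", EntriesUrl("en-us", wordId), False
        .setRequestHeader "app_id", appId
        .setRequestHeader "app_key", appKey
        .send
        If .Status = 200 Then GetEntriesJson = .responseText   ' otherwise return ""
    End With
End Function
```

In the loop, s = GetEntriesJson(words(i), appId, appKey) followed by If Len(s) > 0 Then Set json = JsonConverter.ParseJson(s) keeps failed lookups from raising parser errors.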

Related

Working code gives error when run on any other PC

I have working code that requests information from a website.
When I send the file to another PC and run the code, I get:
"Run-time error '91': Object variable or With block variable not set"
I ensured:
Macro security levels are the same (Enable all macros & trust access to VBA project object model)
All the checked boxes in VBA editor > Tools > References are the same (specifically, Microsoft HTML Object Library and Microsoft XML, v6.0 are checked)
Sub Macro1()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim Current As Variant
website = "https://www.thalia.de/shop/home/artikeldetails/A1062020980"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "GET", website, False
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.body.innerHTML = response
Current = html.getElementsByClassName("element-text-standard value").Item(0).innerText
MsgBox (Current)
End Sub
The line on which I get the error:
Current = html.getElementsByClassName("element-text-standard value").Item(0).innerText
WinHttp
I tried a ton of different solutions; in the end, it came down to just replacing MSXML2.XMLHTTP with WinHttp.WinHttpRequest.5.1 to make it work on my computer. While I was researching, I rewrote the whole thing a little. I'm a noob at this, so I can't explain why one works and the other does not.
Option Explicit
Sub Macro1()
Const URL As String _
= "https://www.thalia.de/shop/home/artikeldetails/A1062020980"
'Const URL As String _
= "https://www.thalia.de/shop/home/artikeldetails/A1060523771"
Const ClassName As String _
= "element-text-standard value"
Dim WhrResponseText As String
WhrResponseText = GetWhrResponseText(URL)
If Len(WhrResponseText) = 0 Then
MsgBox "Could not get a response.", vbExclamation
Exit Sub
End If
' ' Write the response string to a worksheet.
' Dim ws As Worksheet: Set ws = ActiveSheet ' improve!
' Dim arr() As String: arr = Split(WhrResponseText, vbLf)
' ws.Range("A1").Resize(UBound(arr) + 1).Value = Application.Transpose(arr)
Dim Elements As Object
With CreateObject("htmlfile")
.body.innerHTML = WhrResponseText
Set Elements = .getElementsByClassName(ClassName)
End With
' Using 'Length' to determine if a result was found and returning
' the first element.
Dim Result As Variant
With Elements
If .Length > 0 Then
Result = .Item(0).innerText
MsgBox Result
Else
MsgBox "Nothing found."
End If
End With
Dim i As Long
' Loop through the elements using 'For Each... Next'.
Dim Element As Object
For Each Element In Elements
Debug.Print i, Element.innerText
i = i + 1
Next Element
' ' Loop through the elements using 'For... Next'.
' With Elements
' For i = 0 To .Length - 1
' Debug.Print i, .Item(i).innerText
' Next i
' End With
End Sub
Function GetWhrResponseText( _
ByVal URL As String) _
As String
Const ProcName As String = "GetWhrResponseText"
On Error GoTo ClearError
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", URL, False
.send
GetWhrResponseText = StrConv(.responseBody, vbUnicode)
End With
ProcExit:
Exit Function
ClearError:
Debug.Print "'" & ProcName & "' Run-time error '" _
& Err.Number & "':" & vbLf & " " & Err.Description
Resume ProcExit
End Function
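A possible variation on GetWhrResponseText for cases where a response comes back but is not the page you expect: the hypothetical GetTextIfOk below also reports the HTTP status through an optional argument and only returns the body for status 200, so an error page is not passed on to getElementsByClassName (where .Item(0) would then be Nothing and trigger error 91). Sending a User-Agent header is an assumption borrowed from another answer in this thread, not something this site is known to require.

```vba
' Sketch: like GetWhrResponseText, but returns "" unless the status is 200
' and hands the status code back to the caller for diagnostics.
Public Function GetTextIfOk(ByVal URL As String, Optional ByRef statusCode As Long) As String
    On Error GoTo Failed
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"   ' assumption: harmless if not needed
        .send
        statusCode = .Status
        If .Status = 200 Then GetTextIfOk = StrConv(.responseBody, vbUnicode)
    End With
    Exit Function
Failed:
    statusCode = 0   ' network-level failure (DNS, TLS, timeout, ...)
End Function
```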

Webscraping in VBA where some HTML information has no way to refer to it

I have this VBA script scraping from this URL https://accessgudid.nlm.nih.gov/devices/10806378034350
I want the LOT, SERIAL, and EXPIRATION information, which (in the picture below) has a "Yes" or "No" inside the HTML.
How do I return just that Yes or No information?
Sub GetInnerInformation(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLResult As MSHTML.IHTMLElement
Dim HTMLResults As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim RowNum As Long, ColNum As Integer
Set HTMLResults = HTMLPage.getElementsByClassName("device-attribute")
For Each HTMLResult In HTMLResults
If (HTMLResult.innerText Like "*Lot*") = True Then
Debug.Print HTMLResult.innerText, HTMLResult.outerText, HTMLResult.innerHTML
End If
Next HTMLResult
End Sub
In my Immediate Window I get:
Lot or Batch Number: Lot or Batch Number: Lot or Batch Number:
So no reference to the Yes or No that is in the HTML.
HTML Parser:
You could use a CSS attribute = value selector to target the span with [?] that is just before the div of interest. Then climb up to the shared parent with parentElement, and move to the div of interest with NextSibling. You can then use getElementsByTagName to grab the label nodes and loop over that nodeList to write out the required info. To get the values associated with the labels, you again need to use NextSibling, because the values sit between the br children within the parent div.
I use XMLHTTP to make the request, which is faster than opening a browser.
Option Explicit
Public Sub WriteOutYesNos()
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://accessgudid.nlm.nih.gov/devices/10806378034350", False
.send
html.body.innerHTML = .responseText
End With
Dim nodes As Object, i As Long
Set nodes = html.querySelector("[title*='A production identifier (PI) is a variable']").parentElement.NextSibling.getElementsByTagName("LABEL")
For i = 0 To nodes.Length - 3
With ActiveSheet
.Cells(i + 1, 1) = nodes(i).innerText
.Cells(i + 1, 2) = nodes(i).NextSibling.NodeValue
End With
Next
End Sub
JSON Parser:
The data is also available as JSON, which means you can use a JSON parser to handle it. I use JsonConverter.bas as the JSON parser to handle the response. Download the raw code from the VBA-JSON repository (github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas) and add it to a standard module called JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
Option Explicit
Public Sub WriteOutYesNos()
Dim json As Object, ws As Worksheet, results(), i As Long, s As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
results = Array("lotBatch", "serialNumber", "manufacturingDate")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://accessgudid.nlm.nih.gov/devices/10806378034350.json", False
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
With ws
For i = LBound(results) To UBound(results)
.Cells(i + 1, 1) = results(i)
.Cells(i + 1, 2).Value = IIf(json(results(i)), "Yes", "No")
Next
End With
End Sub
XML Parser:
Results are also available as XML, which you can parse with an XML parser, provided you handle the default namespace appropriately:
Option Explicit
Public Sub WriteOutYesNos()
Dim xmlDoc As Object, ws As Worksheet, results(), i As Long
Set xmlDoc = CreateObject("MSXML2.DOMDocument")
Set ws = ThisWorkbook.Worksheets("Sheet1")
results = Array("lotBatch", "serialNumber", "manufacturingDate")
With xmlDoc
.validateOnParse = True
.setProperty "SelectionLanguage", "XPath"
.setProperty "SelectionNamespaces", "xmlns:i='http://www.fda.gov/cdrh/gudid'"
.async = False
If Not .Load("https://accessgudid.nlm.nih.gov/devices/10806378034350.xml") Then
Err.Raise .parseError.ErrorCode, , .parseError.reason
Exit Sub
End If
End With
With ws
For i = LBound(results) To UBound(results)
.Cells(i + 1, 1) = results(i)
.Cells(i + 1, 2).Value = IIf(xmlDoc.SelectSingleNode("//i:" & results(i)).Text, "Yes", "No")
Next
End With
End Sub
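To see why the namespace mapping matters: the document's elements live in the default namespace http://www.fda.gov/cdrh/gudid, and in XPath 1.0 an unprefixed name such as //lotBatch only matches elements in no namespace, so it finds nothing. The sketch below maps that namespace to the prefix i and queries //i:fieldName on an inline XML string. ReadFlag is a hypothetical helper, and the wrapper element name in the test strings is made up, not the real GUDID schema.

```vba
' Sketch: read a true/false flag from XML whose elements use a default namespace.
' The prefix "i" is arbitrary; it only has to match the one used in the XPath.
Public Function ReadFlag(ByVal xmlText As String, ByVal fieldName As String) As String
    Dim doc As Object, node As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    doc.setProperty "SelectionLanguage", "XPath"
    doc.setProperty "SelectionNamespaces", "xmlns:i='http://www.fda.gov/cdrh/gudid'"
    If Not doc.LoadXML(xmlText) Then Exit Function       ' malformed XML -> ""
    Set node = doc.SelectSingleNode("//i:" & fieldName)  ' //lotBatch alone would match nothing
    If node Is Nothing Then Exit Function                ' field absent -> ""
    ReadFlag = IIf(LCase$(node.Text) = "true", "Yes", "No")
End Function
```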
I tinkered around and found it. I had to hardcode the results a little, but here is what I got. Let me know if you've found a more elegant answer!
Sub GetInnerInformation(HTMLPage As MSHTML.HTMLDocument)
Dim HTMLResult As MSHTML.IHTMLElement
Dim HTMLResults As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim RowNum As Long, ColNum As Integer
Dim Lot As Boolean
Dim Serial As Boolean
Dim Expiration As Boolean
Set HTMLResults = HTMLPage.getElementsByClassName("expandable-device-content")
For Each HTMLResult In HTMLResults
If (HTMLResult.innerText Like "*Lot or Batch Number*") = True Then
Debug.Print HTMLResult.innerText
If HTMLResult.innerText Like "*Lot or Batch Number: Yes*" Then
Lot = True
End If
If HTMLResult.innerText Like "*Lot or Batch Number: No*" Then
Lot = False
End If
If HTMLResult.innerText Like "*Serial Number: Yes*" Then
Serial = True
End If
If HTMLResult.innerText Like "*Serial Number: No*" Then
Serial = False
End If
If HTMLResult.innerText Like "*Expiration Date: Yes*" Then
Expiration = True
End If
If HTMLResult.innerText Like "*Expiration Date: No*" Then
Expiration = False
End If
Debug.Print Lot, Serial, Expiration
End If
Next HTMLResult
End Sub
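The repeated If ... Like blocks above can be folded into one hypothetical helper, FlagAfter, that returns True for "Label: Yes", False for "Label: No", and Null when the label is absent, so a missing field is not silently reported as No. It relies on the same "Label: Yes" text layout the self-answer matched on.

```vba
' Sketch: look up "<label>: Yes" / "<label>: No" in a block of text.
' Returns True, False, or Null when the label is not present.
' Note: Like treats [ ] ? # * specially, so labels must not contain them.
Public Function FlagAfter(ByVal text As String, ByVal label As String) As Variant
    If text Like "*" & label & ": Yes*" Then
        FlagAfter = True
    ElseIf text Like "*" & label & ": No*" Then
        FlagAfter = False
    Else
        FlagAfter = Null
    End If
End Function
```

Because the result can be Null, store it in a Variant (for example Dim Lot As Variant) rather than a Boolean, then call Lot = FlagAfter(HTMLResult.innerText, "Lot or Batch Number").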

get web page data through class name

I need to get the dates and temperatures from a weather website and record them in cells, but I am getting an "Object variable or With block variable not set" error.
I tried Excel's "From Web" data import, but I think the website is protected or something, because I keep getting an "under maintenance" page when trying to load it from Excel. I got the code below from a tutorial, but I can't make it work.
Sub record()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim temps As Variant
'provide link
'website = "https://finance.yahoo.com/quote/EURUSD=X?p=EURUSD=X"
website = "https://www.accuweather.com/en/us/chicago/60608/september-weather/348308"
'create the object that will make the webpage request
Set request = CreateObject("MSXML2.XMLHTTP")
'go to the link
request.Open "GET", website, False
'send request for webpage
request.send
'get web response data to variable
response = StrConv(request.responseBody, vbUnicode)
'put webpage to an html object
html.body.innerHTML = response
'get temperature from specified element
'temps = html.getElementsByClassName("Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)")(0).innerText
temps = html.getElementsByClassName("high")(0).innerText
Sheets("record").Range("A1") = temps
End Sub
Sample lines from the website:
<a class="monthly-daypanel is-past">
<div class="date">2</div>
<div class="icon-container"...</div>
<div class="temp">
<div class="high">83</div>
<div class="low">83</div>
</div>
</a>
I want to get the date, high and low.
You need a User-Agent header. I would also extract the JSON string from one of the script tags (I use a regex for this) and use that as the source. I add a date comparison to work out whether each value is a forecast or an actual value. I read the JSON string into a JSON object using a JSON library and loop over the resulting collection, storing the items of interest in an array for faster writing out to the sheet at the end.
json library:
I use JsonConverter.bas. Download the raw code from the VBA-JSON repository (github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas) and add it to a standard module called jsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Option Explicit
Public Sub GetWeatherListings()
Dim s As String, re As Object, ws As Worksheet
Set re = CreateObject("vbscript.regexp")
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.accuweather.com/en/us/chicago/60608/september-weather/348308", False
' .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
s = .responsetext
End With
Dim results(), r As Long, jsonSource As String, json As Object, item As Object
jsonSource = GetString(re, s, "dailyForecast = (.*?\])")
If jsonSource = "No match" Then Exit Sub
Set json = JsonConverter.ParseJson(jsonSource)
ReDim results(1 To json.count, 1 To 4) 'date, datetime, day > dActual, night > dActual
Dim dateTime() As String, datePart As String, forecast As Boolean
For Each item In json
r = r + 1
dateTime = Split(item("dateTime"), "T")
datePart = dateTime(LBound(dateTime))
forecast = CDate(datePart) >= Date
results(r, 1) = datePart
results(r, 2) = item("dateTime")
results(r, 3) = IIf(forecast, item("day")("dTemp"), item("day")("dActual"))
results(r, 4) = IIf(forecast, item("night")("dTemp"), item("night")("dActual"))
Next
Dim headers()
headers = Array("Date", "DateTime", "Day temp", "Night temp")
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetString(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
Dim matches As Object
With re
.Global = True
.MultiLine = True
.IgnoreCase = True
.pattern = pattern
If .Test(inputString) Then
Set matches = .Execute(inputString)
GetString = matches(0).SubMatches(0)
Exit Function
End If
End With
GetString = "No match"
End Function
Sample of end of output:

Exploring the Instr VBA Function In Webscraping

I want to scrape this URL https://www.realtor.com/realestateandhomes-search/06510 using the VBA InStr function and extract all URLs with this substring "06510"
Here's a sample of the code I've been trying to make work.
Option Explicit
Sub GetLinks()
'
'To use HTMLDocument you need to set a reference to Tools -> References -> Microsoft HTML Object Library
Dim HTML As New HTMLDocument
Dim http As Object
Dim links As Object
Dim link As HTMLHtmlElement
Dim counter As Long
Dim website As Range
Dim LastRange As Range
Dim row As Long
Dim continue As Boolean
Dim respHead As String
Dim lRow As Long
Application.ScreenUpdating = False
' The row where website addresses start
row = 30
continue = True
lRow = Cells(Rows.count, 1).End(xlUp).row + 1
' XMLHTTP gives errors where ServerXMLHTTP does not
' even when using the same URL's
'Set http = CreateObject("MSXML2.XMLHTTP")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Do While continue
' Could set this to first cell with URL then OFFSET columns to get next web site
Set website = Range("A" & row)
Set LastRange = Range("B" & lRow)
If Len(website.Value) < 1 Then
continue = False
Exit Sub
End If
If website Is Nothing Then
continue = False
End If
'Debug.Print website
With http
On Error Resume Next
.Open "GET", website.Value, False
.send
' If Err.Num is not 0 then an error occurred accessing the website
' This checks for badly formatted URL's. The website can still return an error
' which should be checked in .Status
'Debug.Print Err.Number
' Clear the row of any previous results
Range("B" & row & ":e" & row).Clear
' If the website sent a valid response to our request
If Err.Number = 0 Then
If .Status = 200 Then
HTML.body.innerHTML = http.responseText
Set links = HTML.getElementsByTagName("a")
For Each link In links
If InStr(link.outerHTML, "06510") Then
LastRange.Value = link.href
End If
Next
End If
Set website = Nothing
Else
'Debug.Print "Error loading page"
LastRange.Value = "Error with website address"
End If
On Error GoTo 0
End With
row = row + 1
Loop
Application.ScreenUpdating = True
End Sub
After inspecting the page, here's a sample of the kind of URL to extract: https://www.realtor.com/realestateandhomes-detail/239-Bradley-St_New-Haven_CT_06510_M36855-92189. Any help will be appreciated.
Using QHarr's code in a simplified way...
Sub GetLinks()
Dim url As String, links_count As Integer
Dim j As Integer, row As Integer
Dim XMLHTTP As Object, html As Object
'Dim tr_coll As Object, tr As Object
'Dim elements As Object
Dim i As Long, allLinksOfInterest As Object
'Dim td_coll As Object, td As Object, td_col, objT
url = "https://www.realtor.com/realestateandhomes-search/06510"
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
End Sub
Please check if I'm missing anything. I'm still getting the error "Object doesn't support this property or method".
Don't use InStr on each node's entire outerHTML while looping over all a tags. There are times when that is required, but this shouldn't be one of them (hopefully).
You want to use an attribute = value CSS selector with the contains (*) operator. It exists specifically for matching substrings in attribute values, and it is more efficient.
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
So,
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print allLinksOfInterest.Item(i).href
Next
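One likely cause of the "Object doesn't support this property or method" error in the simplified version is the document object rather than the selector: CreateObject("htmlfile") typically gives a late-bound document in an older compatibility mode that does not expose querySelectorAll, whereas the answer's own code uses an early-bound New HTMLDocument. A sketch of that change, assuming a reference to Microsoft HTML Object Library (CountLinksContaining is a hypothetical name):

```vba
' Sketch: parse the response into an early-bound HTMLDocument and use
' querySelectorAll with the [href*='...'] contains selector.
' Requires reference: Microsoft HTML Object Library.
Public Function CountLinksContaining(ByVal responseText As String, ByVal fragment As String) As Long
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument     ' instead of CreateObject("htmlfile")
    html.body.innerHTML = responseText

    Dim nodes As Object, i As Long
    Set nodes = html.querySelectorAll("[href*='" & fragment & "']")
    For i = 0 To nodes.Length - 1
        Debug.Print nodes.Item(i).href
    Next i
    CountLinksContaining = nodes.Length
End Function
```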
Attribute = value with contains operator:
[attr*=value]
Represents elements with an attribute name of attr whose value contains at least one occurrence of value within the string.
VBA:
This currently produces 26 links. All are relative links, so the domain needs to be added, as shown in the loop. Some are duplicates, so consider adding them to a dictionary as keys to remove the duplicates.
Option Explicit
Public Sub GetLinks()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.realtor.com/realestateandhomes-search/06510", False
.send
html.body.innerHTML = .responseText
End With
Dim i As Long, allLinksOfInterest As Object
Set allLinksOfInterest = html.querySelectorAll("[href*='06510']")
For i = 0 To allLinksOfInterest.Length - 1
Debug.Print Replace$(allLinksOfInterest.item(i).href,"about:","https://www.realtor.com")
Next
End Sub
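The dictionary suggestion could look like the hypothetical helper below, which converts each href to an absolute URL with the same Replace as the loop and keeps only the first occurrence by using the URL as a Scripting.Dictionary key. The dictionary is late-bound, so no extra reference is needed.

```vba
' Sketch: turn matched links into absolute URLs and drop duplicates.
Public Function UniqueAbsoluteLinks(ByVal nodes As Object) As Object
    Dim seen As Object, i As Long, absUrl As String
    Set seen = CreateObject("Scripting.Dictionary")
    For i = 0 To nodes.Length - 1
        ' Same fix-up as the answer: relative hrefs come back prefixed with "about:"
        absUrl = Replace$(nodes.Item(i).href, "about:", "https://www.realtor.com")
        If Not seen.Exists(absUrl) Then seen.Add absUrl, i   ' keep first occurrence
    Next i
    Set UniqueAbsoluteLinks = seen
End Function
```

Usage: with link declared As Variant, For Each link In UniqueAbsoluteLinks(allLinksOfInterest).Keys: Debug.Print link: Next link.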
If InStr(link.outerHTML, "06510") Then
In the code above, the InStr function is used like a Boolean function. But it is not Boolean; it returns a number (the position of the first match, or 0 if there is none). So you should add a comparison operator after the function, for example:
If InStr(link.outerHTML, "06510")>0 Then
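A short illustration of the point: InStr returns the 1-based position of the first match (0 when there is none), and the explicit > 0 comparison turns that into a real Boolean. ContainsText is a hypothetical helper name.

```vba
' InStr returns a position, not a Boolean:
'   InStr("a06510b", "06510") = 2, InStr("abc", "06510") = 0
Public Function ContainsText(ByVal haystack As String, ByVal needle As String) As Boolean
    ContainsText = InStr(1, haystack, needle, vbBinaryCompare) > 0
End Function
```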

Web scraping html page with no tags as delimiter

I'm trying to import all lines of text in a web page into a string array. The URL is here: Vaticano-La Sacra Bibbia-Genesi-Cap.1.
Unfortunately (maybe a choice of the web designer), the tags have no IDs or classes. All the rows are separated by one or more <BR> elements. The start and end of the text are separated from a simple menu by two <HR> tags.
A clean extract of the page code is here: jsfiddle.
I found a way to get the text. Here is what I have in VBA so far:
Note: objDoc is a Public variable coming from another module, filled with a .responseText without problems.
Public Sub ScriviXHTML(strBook As String, intNumCap As Integer)
Dim strDati2 As String
Dim TagBr As IHTMLElementCollection
Dim BrElement As IHTMLElement
Dim intElement As Integer
Dim objChild as Object
Dim strData, strTextCont, strNodeVal, strWholeText As String
Set objDoc2 = New HTMLDocument
Set objDoc2 = objDoc
Set objDoc = Nothing
'Put in variable string HTML code of the web page.
strDati2 = objDoc2.body.innerHTML
'Set in the variable object TAG type BR.
Set TagBr = objDoc2.body.getElementsByTagName("BR")
'Loop for all BRs in the page.
For Each BrElement In TagBr
'Here I try to get the NextSibling element of the <br>
' because seems contain the text I'm looking for.
Set objChild = BrElement.NextSibling
With objChild
' Here I try to put in the variables
strData = Trim("" & .Data & "")
strTextCont = Trim("" & .textContent & "")
strNodeVal = Trim("" & .NodeValue & "")
strWholeText = Trim("" & .wholeText & "")
End With
intElement = intElement + 1
Next BrElement
End Sub
Two questions:
1) In your opinion, is this the best way to achieve what I'm trying to do?
2) Sometimes Element.NextSibling.Data doesn't exist, raising run-time error '438', so I manually move the execution point of the routine to bypass the error. How can I intercept this error? (Please, not with a simple On Error Resume Next!) Better: how can I use an If...Then...End If statement to check whether the Data member exists on NextSibling?
Thanks to all.
Well, you can get all the text as follows:
Public Sub GetInfo()
Dim sResponse As String, xhr As Object, html As New HTMLDocument
Set xhr = CreateObject("MSXML2.XMLHTTP")
With xhr
.Open "GET", "http://www.vatican.va/archive/ITA0001/__P1.HTM", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
html.body.innerHTML = sResponse
[A1] = Replace$(Replace$(regexRemove(html.body.innerHTML, "<([^>]+)>"), "&nbsp;", Chr$(32)), Chr$(10), Chr$(32))
End With
End Sub
Public Function regexRemove(ByVal s As String, ByVal pattern As String) As String
Dim regex As Object
Set regex = CreateObject("VBScript.RegExp")
With regex
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If regex.test(s) Then
regexRemove = regex.Replace(s, vbNullString)
Else
regexRemove = s
End If
End Function
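The second question (checking for Data without triggering error 438) is not covered by the answer above. One way to do it without On Error, sketched here as a hypothetical helper, is to test the sibling before reading it: NextSibling can be Nothing after the last node, and only text nodes (nodeType 3) carry the line text, which nodeValue returns. Element siblings such as another <br> or a <b> simply yield an empty string.

```vba
' Returns the trimmed text that directly follows brNode, or "" when the next
' sibling is missing or is not a text node (nodeType 3).
Public Function TextAfter(ByVal brNode As Object) As String
    Dim nxt As Object
    Set nxt = brNode.NextSibling
    If nxt Is Nothing Then Exit Function       ' last node: no sibling at all
    If nxt.nodeType = 3 Then                   ' 3 = text node, the kind that has .Data
        TextAfter = Trim$(nxt.NodeValue & "")
    End If
End Function

' Example: print the text that follows every <br> in a document.
Public Sub PrintLinesAfterBr(ByVal doc As Object)
    Dim brs As Object, i As Long, s As String
    Set brs = doc.getElementsByTagName("BR")
    For i = 0 To brs.Length - 1
        s = TextAfter(brs.Item(i))
        If Len(s) > 0 Then Debug.Print s
    Next i
End Sub
```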