Issue With Looping/Variables Through HTML Table Elements

I have this project I'm working on which involves tracking the order statuses of individual shipments from a list of URLs on an Excel sheet. Currently, the code is able to loop through the URLs fine and extract information from them, but when I try to add exceptions and variables to the loop for each different URL, I get a type mismatch. Currently, there are 3 variables I need in order to extract the HTML information from the HTMLCollection/Table. What I'm trying to do here is:
Cycle through each URL (inserts URL in IE and keeps going without opening new tabs)
Obtain the status of the item for each URL from the HTML element
FedEx: Delivered (td class="status")
UPS: Delivered (id="tt_spStatus")
USPS: Arrived at USPS Facility (class="info-text first")
My code:
Sub TrackingDeliveryStatusUpdate()
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = True
Dim rngURL As Range
Dim wb1 As Workbook, ws1 As Worksheet, ws2 As Worksheet
Dim MyURL As String
Dim Rows As Long, links As Variant, IE As InternetExplorer, link As Variant
Dim i As Long
Dim sID As String
Dim rngLinks As Range, rngLink As Range
Dim filterRange As Range
Dim copyRange As Range
'Dim doc As Object
Set wb1 = Application.Workbooks.Open("\\S51\Store51\Employee Folders\Jason\TrackingDeliveryStatus.xls")
Set ws1 = wb1.Worksheets("TrackingDeliveryStatusResults")
'Set rngURL = ws1.Range("C2:C" & lastRow)
'Arr = rngURL
Set IE = New InternetExplorer
Rows = ws1.Cells(ws1.Rows.Count, "A").End(xlUp).Row
Set rngLinks = ws1.Range("C2:C" & Rows)
'Loop through each used cell in column C on sheet URLs
'For Each rngURL In Worksheets("TrackingDeliveryStatusResults").Range("C2", Worksheets("TrackingDeliveryStatusResults").Range("C" & Rows.Count).End(xlUp))
'Set HTMLDoc = IE.Document
With IE
.Visible = True
i = 2
For Each rngLink In rngLinks
.Navigate (rngLink)
While .Busy Or .ReadyState <> 4: DoEvents: Wend
'If InStr(1, URL.Value, "fedex") Then sID = "status"
Dim doc As Object
Set doc = IE.Document
'I know this line right below is a big mess. Not sure what to do here.
If InStr(1, rngLink.Value2, "ups") Then
sID = "tt_spStatus"
'If InStr(3, URL.Value, "usps") Then sID = "info-text first"
ws1.Range("D" & i).Value = doc.getElementById(sID).Items(0).Value
'ws1.Range("D" & i).Value = .Document.body.innerText
End If
Next rngLink
i = i + 1
End With
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
I figured this would work, but apparently it doesn't with referencing URLs from an Excel spreadsheet.

You should dim Link and Links as range and change:
links = ws1.Range("C2:C" & Rows)
to:
Set Links = ws1.Range("C2:C" & Rows)
Without Set, a Variant receives the values of the range (an array) rather than the Range object itself, so the loop has no cells to work with. Once Links is set to the range of cells, loop through each cell and reference the URL with link.Value2.

The reason it is not working is that several concepts are mixed up in your statements. I have refactored the relevant parts of the code and made some suggestions on variable types and names:
Dim rngLinks as Range, rngLink as Range
Set rngLinks = ws1.Range("C2:C" & Rows)
For each rngLink in rngLinks
These are Range variables because here you are looping through cells in a worksheet range.
Then, inside the For Each rngLink In rngLinks loop, add the following after the statement that waits for IE to load:
Dim doc as Object
Set doc = .document
If InStr(1, rngLink.Value2, "ups") Then
sID = "tt_spStatus"
ws1.Range("D" & i).Value = doc.getElementById(sID).innerText
End If
You can also delete the For Each TDelement In sID block. It is not needed.
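To cover all three carriers from the question, the check on the URL could be moved into a small helper. This is only a sketch: the function name CarrierSelector is made up for illustration, and the selectors are built from the class and id values listed in the question, which may not match the live carrier pages.

```vba
Option Explicit

' Hypothetical helper: returns a CSS selector for the status element,
' chosen by which carrier name appears in the tracking URL.
' The selectors come from the question and are assumptions about the pages.
Public Function CarrierSelector(ByVal url As String) As String
    Dim u As String
    u = LCase$(url)
    If InStr(1, u, "fedex") > 0 Then
        CarrierSelector = "td.status"
    ElseIf InStr(1, u, "usps") > 0 Then
        ' Test "usps" before "ups": every "usps" URL also contains "ups"
        CarrierSelector = ".info-text.first"
    ElseIf InStr(1, u, "ups") > 0 Then
        CarrierSelector = "#tt_spStatus"
    Else
        ' Unknown carrier: the caller should skip this row
        CarrierSelector = vbNullString
    End If
End Function
```

Inside the loop you would then call doc.querySelector(CarrierSelector(rngLink.Value2)), check the result against Nothing, and write its innerText to column D. That assumes the page renders in a document mode where IE supports querySelector. Note that a plain InStr test for "ups" also matches USPS links, which is why the order of the checks matters.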

Related

Extract the details from a Table using VBA gets object variable or with block variable not set

My script runs for a few rows and then I get an "Object variable or With block variable not set" error.
I am using the script below to extract the values at td indexes 5, 6 and 7 from the NSE India website.
I read the stock symbol from a worksheet and update the same workbook with the values from the nseindia website.
Sub Stock_Basic_Update_NSE()
Dim ie As InternetExplorer
Dim webpage As HTMLDocument
Dim ws As Worksheet
For Item = 23 To 1505
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
sSearch = ws.Range("A" & Item).Value
'sSearch = Filestk.Worksheets("Sheet1").Range("E1").Value
Set ie = New InternetExplorer
'ie.Visible = True
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & sSearch)
Do While ie.readyState = 4: DoEvents: Loop
Do Until ie.readyState = 4: DoEvents: Loop
While ie.Busy
DoEvents
Wend
Set webpage = ie.document
ws.Cells(Item, 3).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(5).innerText
ws.Cells(Item, 4).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(6).innerText
ws.Cells(Item, 5).Value = webpage.getElementsByClassName("eq-series table-fullwidth w-100")(0).getElementsByTagName("td")(7).innerText
ie.Quit
Set ie = Nothing
Next Item
End Sub
Your code had some errors, and it did not wait for the full document to load. Try the following code. I have commented it so you can see what I changed and why. I tested it with the top 50 symbols.
Sub Stock_Basic_Update_NSE()
'Always declare all variables
Dim ie As Object 'I switched this from early to late binding (not required)
Dim nodeTable As Object
Dim ws As Worksheet
Dim item As Long
Dim sSearch As String
'Use this outside the loop. You only need it once
Set ws = ThisWorkbook.Worksheets("NSE Stocks Details")
For item = 23 To 1505
sSearch = ws.Range("A" & item).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
'Encode symbols that are restricted for using in URLs. Like &, : or ?
ie.navigate ("https://www.nseindia.com/get-quotes/equity?symbol=" & WorksheetFunction.EncodeURL(sSearch))
'It's not "While = 4" because 4 means "readyState = complete"
'If you want to test "= 4" you must use "Until" instead of "While"
'Either form works
Do While ie.readyState <> 4: DoEvents: Loop
'Manual break to load dynamic content after the IE reports the page load was complete
'This was your main problem
Application.Wait (Now + TimeSerial(0, 0, 2))
'The needed HTML table has an ID. If possible, always use that instead of class names
'because an HTML ID is unique if the standard is followed
'Also store the element in a variable
'so the document only has to be queried once
Set nodeTable = ie.document.getElementByID("equityInfo")
ws.Cells(item, 3).Value = nodeTable.getElementsByTagName("td")(5).innerText
ws.Cells(item, 4).Value = nodeTable.getElementsByTagName("td")(6).innerText
ws.Cells(item, 5).Value = nodeTable.getElementsByTagName("td")(7).innerText
'Clean up
ie.Quit
Set ie = Nothing
Next item
End Sub

Grabbing a single piece of data from a website's HTML and assign it to a variable

I'm working on a project where I grab data that I stored in an excel sheet and search a specific website that can be seen in the code below. Once the website completes the search, I want to grab the "worth" from the top right of the page. I'm fairly new to using VBA with HTML, so I'm not sure how to take the element (worth) that I'm looking for from the web page, and assign it to a variable in VBA so I can paste it into my excel sheet.
Right now I'm able to open IE, insert my data into the search bar of the specific website that I'm using, and click search. What I have is seen below. Thank you in advance!
Sub BrowsetoSite()
Dim IE As New SHDocVw.InternetExplorer
Dim website As String
Dim i As Integer
i = 2
'Set ie = New SHDocVw.InternetExplorer
website = "https://cardmavin.com/category/football"
IE.navigate website
IE.Visible = False
Do While IE.readyState <> READYSTATE_COMPLETE
'assign info to variable to enter into the search bar
Loop
Dim idoc As MSHTML.HTMLDocument
Set idoc = IE.document
Dim Brand As String
Dim Year As String
Dim Num As String
Dim Name As String
Dim search As String
Dim value As Variant
Brand = Range("A" & i).value
Year = Range("B" & i).value
Num = Range("D" & i).value
Name = Range("E" & i).value
search = (Year & " " & Brand & " " & Name & " " & Num)
i=i+1
idoc.getElementById("search-field").value = search
idoc.getElementById("to-mavin").Click
While IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Dim value As Variant
value = idoc.getElementsByTagName("h4")(0).innerText
MsgBox value
IE.Quit
End Sub
The issue that I'm having is the value = idoc.getElementsByTagName("h4")(0).innerText. I've tried to get the element a few different ways, but have been unsuccessful so far.
You need Set idoc = IE.document after you've submitted the search, to get a reference to that new page. Otherwise you're still trying to access the previous page.
i=i+1
idoc.getElementById("search-field").value = search
idoc.getElementById("to-mavin").Click
Do While IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set idoc = IE.document '<<<<<<<<<<<<<<
Dim value As Variant
value = idoc.getElementById("worthBox") _
.getElementsByTagName("h4")(0).innerText
MsgBox value
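Try this approach, which reads the price from the JSON-LD block in the page head instead of automating IE. It assumes references to Microsoft XML, v6.0 and Microsoft HTML Object Library, plus a JSON parser module named JSONConverter (for example the open-source VBA-JSON). Suppose cell A1 contains the string 2008 Topps Thomas DeCoud.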
Sub Test()
Const sURL As String = "https://mavin.io/search?q="
Dim json As Object
Set json = GetJSONFromHTMLHead(sURL & Application.WorksheetFunction.EncodeURL(Range("A1").Value))
Debug.Print json("offers")("priceCurrency")
Debug.Print json("offers")("price")
End Sub
Function GetJSONFromHTMLHead(ByVal sURL As String) As Object
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, re As Object, json As Object
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "<head>([\s\S]+)<\/head>"
With http
.OPEN "Get", sURL, False
.send
html.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
End With
Set json = JSONConverter.ParseJson(html.querySelector("script[type='application/ld+json']").innerHTML)
Set GetJSONFromHTMLHead = json
End Function

Can we fetch the specific data via using urls in vba

I have 15 different URLs and need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy a web scraping macro and reuse it for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for each page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You also need some knowledge of HTML and CSS and, ideally, JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set the references via Tools -> References... in the VBA editor. For background, look up the differences between early binding and late binding; Microsoft's article "Using early binding and late binding in Automation" goes deeper.
To get the prices from the URL shown, you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany, and Excel automatically changed the currency sign from dollar to euro. This is of course wrong. Depending on your regional settings, you may have to handle this.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
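One way to sidestep the currency sign problem mentioned above is to write a plain number to the cell instead of the scraped text. The following is a minimal sketch, assuming the page shows US-formatted prices such as "$1,299.95"; the function name ParseUsPrice is made up for illustration.

```vba
Option Explicit

' Hypothetical helper: turns a US-formatted price such as "$1,299.95"
' into a Double without depending on the regional settings.
Public Function ParseUsPrice(ByVal priceText As String) As Double
    Dim cleaned As String
    cleaned = Trim$(priceText)
    cleaned = Replace(cleaned, "$", vbNullString)   ' drop the currency sign
    cleaned = Replace(cleaned, ",", vbNullString)   ' drop thousands separators
    ' Val always treats "." as the decimal separator, whatever the locale
    ParseUsPrice = Val(cleaned)
End Function
```

In myfile2 the last assignment would then become ActiveSheet.Cells(lastRow, 1).Value = ParseUsPrice(tagElements.innerText), and the cell can be given whatever number format you want afterwards.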

How do I pull value from external website by Element ID with Excel / VBA?

I'm trying to retrieve values from external websites by element ID using VBA and add them to my excel table. The website URL's are indicated in column A. Column B and C are for my retrieved values.
Element ID name: "youtube-user-page-country"
Below is my poor attempt:
Sub getCountry()
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate Worksheets("Sheet1").Range(A3).Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim getCountry As String
getCountry = Trim(Doc.getElementsByTagName("youtube-user-page-country").innerText)
Worksheets("Sheet1").Range(B31).Value = getCountry
End Sub
The code isn't working; it reports problems with object definition.
Could anyone give me tips on where I'm going wrong?
I've been a macro recorder user and the switch has quite a steep learning curve :-)
Thanks for any help !
I think I get what you are after. There were a few issues:
You want to use getElementByID.
Naming a string getCountry and the SubRoutine getCountry containing it is not a good idea. You can do it, but don't.
Always fully qualify your sheet references so you know which workbook and sheet you are working with.
Here's the revised code, I have it working on my end.
Sub getCountry()
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets("Sheet1")
Dim Country As String
With IE
.Visible = False
.navigate ws.Range("A3").Value
Do
DoEvents
Loop Until .readyState = 4
End With
Country = Trim$(IE.document.getElementByID("youtube-user-page-country").innerText)
ws.Range("B31").Value2 = Country
IE.Quit
End Sub
You can use this to dump the data to your spreadsheet.
Sub DumpData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "http://finance.yahoo.com/q?s=sbux&ql=1"
'Wait for site to fully load
IE.Navigate2 URL
Do While IE.Busy = True
DoEvents
Loop
RowCount = 1
With Sheets("Sheet1")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub

Fetching data from web page

I am trying to fetch the publication date corresponding to each patent number.
The database is espacenet.com
Here's the link for the first patent you see in the Excel sheet:
http://worldwide.espacenet.com/searchResults?compact=false&PN=US7055777B2&ST=advanced&locale=en_EP&DB=EPODOC
Under the "Publication Info" header, I need to get the date after matching the patent number with the one in the Excel sheet.
Here's the code:
Sub tryextraction()
Dim ie As New InternetExplorer
Dim sdd As String
Dim tdd() As String
Dim num0 As Integer
Dim num1 As Integer
Dim doc As HTMLDocument
Dim i As Integer
Dim j As Integer
ie.Visible = True
num1 = ActiveSheet.UsedRange.Rows.Count
For num0 = 2 To num1
ie.navigate "http://worldwide.espacenet.com/searchResults?compact=false&PN=" & Range("A" & num0) & "&ST=advanced&locale=en_EP&DB=EPODOC"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
sdd = Trim(doc.getElementsByTagName("td")(5).innerText)
tdd() = Split(sdd, vbLf)
j = UBound(tdd)
For i = 0 To j
If InStr(tdd(i), "(") <> 0 Then
tdd(i) = Replace(tdd(i), " ", "")
tdd(i) = Replace(tdd(i), "(", "")
tdd(i) = Replace(tdd(i), ")", "")
If tdd(i) = Range("A" & num0).Value Then
Range("B" & num0).Value = tdd(i + 1)
End If
End If
Next i
Next num0
ie.Quit
End Sub
The code is not giving any error. The column "Publication Date" remains blank after the code finishes running.
The html tag which contains the publication info has been taken correctly.
There are some trailing whitespace characters after the ID you are searching for in the document, so tdd(i) = Range("A" & num0).Value never evaluates to True. It's not just a space, so a simple Trim(tdd(i)) = Range("A" & num0).Value does not help. Try InStr(tdd(i), Range("A" & num0).Value) > 0 instead. If that is not good enough, you'll have to specifically remove the CRLF from the end of the string before doing the comparison.
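If InStr turns out to be too loose (it also matches when the cell value is only part of a longer identifier), one option is to strip whitespace from both sides before comparing. A minimal sketch, assuming the stray characters are spaces, tabs, CR, LF or non-breaking spaces; the helper name StripWhitespace is made up here:

```vba
Option Explicit

' Hypothetical helper: removes spaces, tabs, CR, LF and non-breaking
' spaces so that two identifiers can be compared reliably.
Public Function StripWhitespace(ByVal s As String) As String
    Dim ch As Variant
    For Each ch In Array(" ", vbTab, vbCr, vbLf, ChrW$(160))
        s = Replace(s, ch, vbNullString)
    Next ch
    StripWhitespace = s
End Function
```

The comparison in the loop would then read If StripWhitespace(tdd(i)) = StripWhitespace(Range("A" & num0).Value) Then. If the page uses some other invisible character, it has to be added to the list.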
There are often multiple publication dates under the publication info header.
The following script obtains all of these and the preceding line (so you have the associated publication along with the date).
It loops from row 2 of the ActiveSheet to the last populated row, picking up the publication numbers from column A and writing out the results starting from column B. Depending on how many dates there are, the data will extend across multiple columns from B.
Regex:
A regex of ^(.*)\s\d{4}-\d{2}-\d{2} is used to retrieve the date pattern and the preceding line, i.e. the publication identifier and the date.
VBA:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer, html As New HTMLDocument, url As String, pubInfo As Object
Dim loopRange As Range, iRow As Range, counter As Long
'example US7055777B2
Application.ScreenUpdating = False
With ActiveSheet
Set loopRange = .Range("A2:A" & .Cells(.Rows.Count, "A").End(xlUp).Row)
End With
With IE
.Visible = True
counter = 2 '<== start from row 2
For Each iRow In loopRange
If Not IsEmpty(iRow) Then
url = "https://worldwide.espacenet.com/searchResults?compact=false&PN=" & iRow.Value & "&ST=advanced&locale=en_EP&DB=EPODOC"
.navigate url
While .Busy Or .readyState < 4: DoEvents: Wend
Set html = .document
Do
DoEvents
On Error Resume Next
Set pubInfo = html.querySelector(".publicationInfoColumn")
On Error GoTo 0
Loop While pubInfo Is Nothing
Dim tempArr()
tempArr = GetDateAndPatent(pubInfo.innerText, "^(.*)\s\d{4}-\d{2}-\d{2}") '"(?m)^(.*)\s\d{4}-\d{2}-\d{2}" '<==This is not supported
With ActiveSheet
.Cells(counter, 2).Resize(1, UBound(tempArr) + 1) = tempArr
End With
End If
counter = counter + 1
Next iRow
.Quit '<== Remember to quit application
End With
Application.ScreenUpdating = True
End Sub
Public Function GetDateAndPatent(ByVal inputString As String, ByVal sPattern As String) As Variant
Dim matches As Object, iMatch As Object, s As String, arrMatches(), i As Long
With CreateObject("vbscript.regexp")
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = sPattern
If .test(inputString) Then
Set matches = .Execute(inputString)
For Each iMatch In matches
ReDim Preserve arrMatches(i)
arrMatches(i) = iMatch.Value
i = i + 1
Next iMatch
End If
End With
GetDateAndPatent = arrMatches
End Function