Accessing a website's table with a WinHTTPRequest in Excel VBA - html

I have written code that grabs a table from a website, pulls out each cell from that table, and drops the values into an Excel spreadsheet. The code works flawlessly when the website loads correctly.
The issue is that the website does not play nicely with Internet Explorer, so the code only executes successfully about half of the time. I could write a routine that checks whether the website loaded successfully and retries if it did not; however, I want to see if I can get it to work with a WinHTTPRequest.
The lines below are how I access the table using Internet Explorer based web scraping, with the last line being how I load the table into a variable.
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
I can load the website in question via WinHTTPRequest using the code below.
Set doc = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
doc.body.innerHTML = .responseText
End With
However, when I try to grab the table using the line below, I get "Run-time error '438': Object doesn't support this property or method".
Set WeatherTable = doc.getElementByclassname("twc-table")(0)
Basically, I need the equivalent of this line for WinHTTP web scraping.
I have looked at descending through the HTML document (doc.body.all.item(1), etc.), but I don't get very far before I run into errors. I have also looked at the Selenium add-on, but I don't remember being able to download and install it successfully, and I am not sure whether it is still maintained for current versions of Chrome / Firefox.
Here is the full code that allows me to get the table via internet explorer webscraping and then drop it onto an excel spreadsheet.
Any help is appreciated.
Sub GetTable2()
Dim IE As Object
Dim doc As HTMLDocument
Dim WeatherTable As HTMLTable
Dim WeatherTableRows As HTMLTableRow
Dim HTMLTableCell As HTMLTableCell
Dim HeaderRow As Boolean
Dim RowCount As Long
Dim ColumnCount As Long
Dim i As Long
RowCount = 1
ColumnCount = 1
HeaderRow = True
Set IE = CreateObject("InternetExplorer.Application")
IE.navigate "https://weather.com/weather/tenday/l/12345:4:US"
IE.Visible = True
'Application.Wait (Now() + TimeValue("00:02:00"))
Set doc = IE.document
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)
For Each WeatherTableRows In WeatherTable.Rows
i = 1
For Each HTMLTableCell In WeatherTableRows.Cells
If HeaderRow = True Then
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
Else
If i = 1 Then
i = i + 1
Else
ThisWorkbook.Sheets("Sheet5").Cells(RowCount, ColumnCount).Value = HTMLTableCell.innerText
ColumnCount = ColumnCount + 1
End If
End If
Next HTMLTableCell
HeaderRow = False
ColumnCount = 1
RowCount = RowCount + 1
Next WeatherTableRows
IE.Quit
Set IE = Nothing
Set doc = Nothing
End Sub

You missed an s. The method name is plural, getElementsByClassName, because it returns a collection of elements with that class name.
Set WeatherTable = doc.getElementsByClassName("twc-table")(0)

To make your approach slightly cleaner, you can try this way as well.
Sub FetchTabularData()
Dim elem As Object, trow As Object, S$, R&, C&
[B1:G1] = [{"Day","Description","High/Low","Precip","Wind","Humidity"}]
With New WinHttp.WinHttpRequest
.Open "GET", "https://weather.com/weather/tenday/l/12345:4:US", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
For Each elem In .querySelector(".twc-table").getElementsByTagName("tr")
For Each trow In elem.getElementsByTagName("td")
C = C + 1: Cells(R + 1, C) = trow.innerText
Next trow
C = 0: R = R + 1
Next elem
End With
End Sub
References to add:
Microsoft HTML Object Library
Microsoft WinHTTP Services, version 5.1

Related

VBA automate Edge Browser without downloading any external things

I have the VBA code below to automate IE, extract the figures from the HTML table, and populate an Excel table with the data. Is it possible to do the same thing by automating the Edge browser? Since my company doesn't allow us to install any third-party applications, Selenium is not an option. As I am not very familiar with coding, I would highly appreciate it if someone could offer some sample code.
Sub sfc_esg_list()
Dim IE As New InternetExplorer
Dim doc As New MSHTML.HTMLDocument
IE.Visible = True
'use IE browser to navigate SFC website
IE.navigate "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds"
Do
DoEvents
'Application.Wait (Now() + TimeValue("00:00:04"))
Loop Until IE.readyState = 4
Set doc = IE.Document
Set TRs = doc.getElementsByTagName("tr")
Sheets("ESG list_SFC").Activate
'copy and paste the ESG fund list from SFC website to sheets<ESG list_SFC>
With Sheets("ESG list_SFC")
.Cells.Clear
For Each TR In TRs
r = r + 1
For Each Cell In TR.Children
C = C + 1
.Cells(r, C).NumberFormat = "#"
.Cells(r, C) = Cell.innerText
Next Cell
C = 0
Next TR
End With
IE.Quit
Set doc = Nothing
Set IE = Nothing
'Save the file
Application.ScreenUpdating = False
Application.DisplayAlerts = False
'ActiveWorkbook.Save
End Sub
IE is pretty much dead at this point. I think it should be something like this.
Sub TryMe()
Dim request As Object
Set request = CreateObject("MSXML2.XMLHTTP")
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim htmlText As String
Set oHtml = New HTMLDocument
request.Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds/", False
request.send
oHtml.body.innerHTML = request.responseText
htmlText = oHtml.getElementsByClassName("tablesorter tablesorter-default tablesorterfcd4c178102ad8")(0).outerhtml
With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard
.SetText htmlText
.PutInClipboard
Sheets(1).Range("A1").Select
Sheets(1).PasteSpecial Format:="Unicode Text"
End With
End Sub
I thought the class name was 'tablesorter tablesorter-default tablesorterfcd4c178102ad8' but it doesn't seem to work, and I'm not sure why. Can you play around with some other class names? When you hit F12, you will see the HTML code behind the page.
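One way to follow the suggestion above is to print the class names that actually appear in the HTML returned by the request. These can differ from what F12 shows if scripts on the page add classes after it loads. The following is only a minimal sketch: it assumes the Microsoft HTML Object Library reference is set, the procedure name ListTableClassNames is made up for illustration, and what it prints depends entirely on what the site returns.

```vba
Sub ListTableClassNames()
    ' Assumes a reference to "Microsoft HTML Object Library" (for HTMLDocument).
    Dim request As Object
    Dim oHtml As HTMLDocument
    Dim tbl As Object

    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds/", False
    request.send

    Set oHtml = New HTMLDocument
    oHtml.body.innerHTML = request.responseText

    ' Print the class attribute and row count of every table in the raw response.
    For Each tbl In oHtml.getElementsByTagName("table")
        Debug.Print "class=""" & tbl.className & """, rows=" & tbl.Rows.Length
    Next tbl
End Sub
```

If the Immediate window lists a table with a shorter or different class name, that name (or simply getElementsByTagName("table") with the right index) may be a better selector than the one copied from F12.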

Question about extracting text from a specific website and printing it in excel using VBA

The webpage is "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461". Below is the VBA code and questions.
Edit1: For QHarr: the way it's set up, it takes the data from the first 3 columns of the table. The xmlhttp is just there because I copy-pasted it from a previous VBA project I was working on.
Edit2: Thank you for the advice, Ron.
Edit3: @QHarr: yes, I would like to be able to grab the first 3 columns from all the tables.
Sub Horse2()
Dim IE As InternetExplorer
Application.ScreenUpdating = False
Set IE = New InternetExplorer
IE.Visible = True
Set ws = ThisWorkbook.Worksheets("Sheet1")
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim nodeRaceResultsTable As HTMLHtmlElement
Dim nodeTr As HTMLHtmlElement
Dim nodeDiv As HTMLHtmlElement
Dim Element1 As HTMLHtmlElement
Dim node1 As HTMLHtmlElement
Dim currentUrl As String
With IE
IE.Visible = True
IE.Navigate "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461"
Do Until .readyState = 4: DoEvents: Loop
End With
With http
.Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
.send
html.body.innerHTML = .responseText
End With
The issue I have is that there are multiple instances of the same class name, so it prints only the first table on the page. The class name here, along with the inner text, is what enables me to get text from the tables on the webpage. However, I would like to be able to extract the first 3 data points from all tables on the webpage and have them written to Excel.
For Each nodeRaceResultsTable In html.getElementsByClassName("col-md-12 table-responsive")
r = r + 1: c = 4
For Each nodeTr In nodeRaceResultsTable.getElementsByTagName("tr")
With nodeTr.getElementsByTagName("td")
If .Length Then
ws.Cells(r + 1, c + 3) = .Item(0).innerText
ws.Cells(r + 1, c + 4) = .Item(1).innerText
ws.Cells(r + 1, c + 5) = .Item(2).innerText
r = r + 1
End If
End With
Next
Next
IE.Quit
Set IE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
MsgBox "Input complete."
End Sub

Can we fetch the specific data via using urls in vba

I have 15 different URLs, and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program, and when I try to run it, it shows me a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy an arbitrary web scraping macro for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for each page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You will also need knowledge of HTML and CSS and, ideally, of JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. You can set a reference via Tools -> References... in the VBA editor. For the differences between early and late binding, see "Early Binding v/s Late Binding" and, for deeper information from Microsoft, "Using early binding and late binding in Automation".
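To make the early/late binding distinction concrete, here is a small sketch that parses an HTML string either way. It is only an illustration: the USE_EARLY_BINDING switch, the procedure name, and the sample HTML string are made up for this example. With the switch set to True, the code compiles only if the Microsoft HTML Object Library reference is checked; with False, no reference is needed.

```vba
' Hypothetical compile-time switch; must sit at the top of the module.
#Const USE_EARLY_BINDING = False

Sub ParseWithEitherBinding()
#If USE_EARLY_BINDING Then
    ' Early binding: needs Tools -> References -> Microsoft HTML Object Library.
    ' Without that reference this branch raises "User-defined type not defined".
    Dim doc As HTMLDocument
    Set doc = New HTMLDocument
#Else
    ' Late binding: the type is resolved at run time, so no reference is required.
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
#End If

    ' Sample markup standing in for a downloaded page.
    doc.body.innerHTML = "<table><tr><td>$99.99</td></tr></table>"
    Debug.Print doc.getElementsByTagName("td")(0).innerText
End Sub
```

Early binding gives IntelliSense and compile-time checks; late binding trades those away for a workbook that runs on machines where the references have not been set.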
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
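Regarding the currency sign that Excel changed in the answer above, one possible way to intercept it (an assumption on my part, not something the original answer uses) is to format the target cell as text before writing, so Excel stores the scraped string as is instead of interpreting it as a currency value. The procedure name and the sample price below are placeholders.

```vba
Sub WritePriceAsText()
    Dim target As Range
    Set target = ActiveSheet.Cells(1, 1)

    ' "@" is the text number format: Excel keeps the string exactly as assigned.
    target.NumberFormat = "@"

    ' Sample value standing in for Trim(tagElements.innerText).
    target.Value = "$99.99"
End Sub
```

The trade-off is that the cell then holds text, so it has to be converted explicitly before it can be used in calculations.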

VBA Get HTML Element Info for Changing Id

I am trying to create an Excel web scraper that logs into my company's ticket tracking system and records certain information on the sheet (lead assigned, desired date for the project, etc.). I was doing fine until I had to pull a field off the website that has a changing ID.
For example, on two pages the same field will have the IDs:
"cq_widget_CqFilteringSelect_32"
"cq_widget_CqFilteringSelect_9"
Can somebody provide guidance to how I should search and paste the "IT Lead" value into excel?
HTML snippet of div
Snippet of actual website
Setup in excel
Below is what I have so far
I get confused in this area:
lead = objCollection(i).Value
Sub CQscrub()
Dim i As Long
Dim objElement As Object
Dim objCollection As Object
Dim objCollection2 As Object
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim numbers() As String
Dim size As Integer
Dim row As Integer
Dim objLead As Object
Dim objLead2 As Object
Dim lead As String
Dim counter As Integer
size = WorksheetFunction.CountA(Worksheets(1).Columns(1)) - 4
ReDim numbers(size)
For row = 10 To (size + 10)
numbers(row - 10) = Cells(row, 1).Value
'Cells(row, 2) = numbers(row - 10)
Next row
Set ie = New InternetExplorer
ie.Height = 1000
ie.Width = 1000
ie.Visible = True
ie.navigate "http://clearquest/cqweb/"
Application.StatusBar = "Loading http://clearquest/cqweb"
Do While ie.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.StatusBar = "Searching form. Please wait..."
'Had these below as comment
Dim WRnumber1 As String
WRnumber1 = Range("A10").Value
'Range("A6").Value = WRnumber1
Dim iLastRow As Integer
Dim Rng As Range
iLastRow = Cells(Rows.Count, "a").End(xlUp).row 'last row of A
'Set objCollection = ie.document.getElementsByTagName("input") originally here
For counter = 0 To size - 1
Set objCollection = ie.document.getElementsByTagName("input")
i = 0
While i < objCollection.Length
If objCollection(i).Name = "cqFindRecordString" Then
objCollection(i).Value = numbers(counter)
End If
i = i + 1
Wend
'''''''''''''''''' Find Label ''''''''''''''''''''''''''''
Set objCollection = ie.document.getElementsByTagName("label")
i = 0
While i < objCollection.Length
If objCollection(i).innerText = "IT Lead/Assigned To" Then
lead = objCollection(i).Value
'Set objLead = objCollection(i)
End If
i = i + 1
Wend
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Range("B" & (iLastRow - (size - counter - 1))).Value = lead
Set objElement = ie.document.getElementById("cqFindRecordButton")
objElement.Click
Do While ie.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Application.Wait (Now + TimeValue("0:00:02"))
Next counter
ie.Quit
Set ie = Nothing
Set objElement = Nothing
Set objCollection = Nothing
Application.StatusBar = ""
MsgBox "Done!"
End Sub
Note: Website is internal only
Goal: Select Name under "IT Lead/Assigned To" field and paste to Excel
Thanks
Regarding the supplied code, tl;dr.
But if you are wanting the scratched out portion you supplied in your HTML snippet, the following may work (I can't test something that I don't have access to :D).
There are many different ways to grab an element; with this method you are grabbing the first instance of the class name dijitReset dijitInputField dijitInputContainer. Class names are not always unique, but given how specific this class name is, I feel reasonably safe that in your case it is.
You could use a single line to Set yourObj..., but for demonstration purposes I decided to break it up. The one-line way to Set your obj:
Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0).getElementsByTagName("input")(1)
Code Snippet:
Sub getElementFromIE()
Dim ie As InternetExplorer
' ... your above code pulls up webpage ...
'''''''''''''''''' Find Label ''''''''''''''''''''''''''''
Dim doc As HTMLDocument, yourObj As Object
Set doc = ie.document
' I assume the class name is unique? If so, just append (0) as I did below
Set yourObj = doc.getElementsByClassName("dijitReset dijitInputField dijitInputContainer")(0)
Set yourObj = yourObj.getElementsByTagName("input")(1)
lead = yourObj.Value
End Sub
The reason for the (1) on Set yourObj = yourObj.getElementsByTagName("input")(1) is because there are 2 input tags after your class dijitReset.... You are wanting the 2nd instance of this tag, which contains your value; and as you are probably already aware, you are using Base 0, meaning the 2nd instance is actually the number 1.
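Since the question notes that only the numeric suffix of the ID changes, another option is to match on the stable prefix instead of the class name. This is a different technique from the one in the answer above, sketched under assumptions: the helper name FindByIdPrefix is hypothetical, and it assumes the element carrying the cq_widget_CqFilteringSelect_ ID is an input tag, which cannot be verified without access to the page.

```vba
' Returns the first element with the given tag whose id starts with idPrefix,
' or Nothing if no such element exists.
Function FindByIdPrefix(ByVal doc As Object, ByVal tagName As String, _
                        ByVal idPrefix As String) As Object
    Dim el As Object
    For Each el In doc.getElementsByTagName(tagName)
        ' & "" guards against a Null or missing id.
        If Left$(el.ID & "", Len(idPrefix)) = idPrefix Then
            Set FindByIdPrefix = el
            Exit Function
        End If
    Next el
End Function
```

It could be called as Set yourObj = FindByIdPrefix(ie.document, "input", "cq_widget_CqFilteringSelect_"), followed by a check such as If Not yourObj Is Nothing Then lead = yourObj.Value. If several widgets share the prefix, the first match may not be the IT Lead field, so the result should be checked against the page.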

Copy data from HTML

I am trying to learn how to parse data from HTML using Excel VBA. I found an example online that works fine, but when I change the URL from www.yahoo.com to a local HTML file on the C: drive, it gives me the error "Method 'Busy' of object 'IWebBrowser2' failed". The code is:
Sub GetBodyText()
Dim URL As String
Dim Data As String
URL = "file:///C:/test.html"
Dim ie As Object
Dim ieDoc As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.navigate URL
Do Until (ie.readyState = 4 And Not ie.Busy)
DoEvents
Loop
Set ieDoc = ie.Document
Data = ieDoc.body.innerText
'Split Data into separate lines
'or just use Range("A1")=data
Dim myarray As Variant
myarray = Split(Data, vbCrLf)
For i = 0 To UBound(myarray)
'Start writing in cell A1
Cells(i + 1, 1) = myarray(i)
Next
ie.Quit
Set ie = Nothing
Set ieDoc = Nothing
End Sub
For IE, just use:
URL = "c:\test.html"