Scrape HTML without an id in VBA

I'm trying to get month-to-date and year-to-date return values from the website
http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index
into an Excel spreadsheet using VBA. The problem is that there is no "id= " in the code of the page, which I understand would make this process a lot simpler. There is also the matter of which time period (year-to-date or month-to-date) is visible, but I'd be happy with scraping just the MTD values for now.
Here is my code:
Sub Get_Change()
'attempting to scrape Barclay's website
Dim appIE As Object
Dim MyVar As String
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Range("A1").Value = "Working..."
Loop
Set TDelements = appIE.document.getElementsbyClassName("performance-chart-table")
For Each TDelement In TDelements
If TDelement.class = "change" Then
MyVar = TDelement.class.innerText("Value")
End If
Next
Range("A1").Value = MyVar
appIE.Quit
Set appIE = Nothing
End Sub
If I can get a way to set the 'MyVar' variable to the current MTD or YTD value, I'll be done, but I'm having a hard time since there is not a unique identifier for either of these values. Any ideas?

I've recently watched some CSS training videos, and I can tell you that the CSS selector syntax is powerful; I'd recommend it. It is the same syntax that JavaScript/web developers use to select elements with jQuery.
I think you should try using
document.querySelectorAll
or, in your case, because you have already drilled into the document to get the "performance-chart-table" element, call querySelectorAll on that element (an item of TDelements) instead of on the document.
Documentation is at http://www.w3schools.com/jsref/met_document_queryselectorall.asp
You supply a CSS selector string as the parameter; the selector syntax is described at http://www.w3schools.com/cssref/css_selectors.asp
And I've gone and done it for you:
Sub Get_Change()
    '* Tools-References Microsoft HTML Object Library
    'attempting to scrape the index page
    Dim appIE As Object
    Dim MyVar As String
    Set appIE = CreateObject("internetexplorer.application")
    With appIE
        .Navigate "http://us.spindices.com/indices/equity/sp-oil-gas-exploration-production-select-industry-index"
        .Visible = True
    End With
    Range("A1").Value = "Working..."
    Do While appIE.Busy
        DoEvents
    Loop
    Dim htmlDoc As MSHTML.HTMLDocument
    Set htmlDoc = appIE.document
    'the table is filled in after the page loads, so poll until it exists
    Dim TDelements2 As MSHTML.IHTMLElementCollection
    Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")
    While TDelements2.Length < 1
        DoEvents
        Application.Wait (Now() + TimeSerial(0, 0, 3))
        Set TDelements2 = htmlDoc.getElementsByClassName("performance-chart-table")
    Wend
    Dim oHTMLTablePerformanceChartTable As MSHTML.HTMLTable
    Set oHTMLTablePerformanceChartTable = TDelements2.Item(0)
    'select every element with class "change" inside that table
    Dim objChangeCollection As MSHTML.IHTMLDOMChildrenCollection
    Set objChangeCollection = oHTMLTablePerformanceChartTable.querySelectorAll(".change")
    'Debug.Assert objChangeCollection.Length = 2
    'Item is zero-based, so Item(1) is the second "change" element
    Dim objChange2 As Object
    Set objChange2 = objChangeCollection.Item(1)
    MyVar = objChange2.innerText
    Range("A1").Value = MyVar
    appIE.Quit
    Set appIE = Nothing
End Sub
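As a more compact variant of the same idea, a descendant selector can go from the table class to the cells in a single call. The sketch below runs against an in-memory document so it can be tried without a browser. The HTML snippet and its values are invented for illustration and are only an assumption about how the real page is structured; QuerySelectorSketch and changeCells are hypothetical names. It needs a reference to the Microsoft HTML Object Library.

```vba
'* Tools-References: Microsoft HTML Object Library
Sub QuerySelectorSketch()
    Dim doc As MSHTML.HTMLDocument
    Set doc = New MSHTML.HTMLDocument

    ' Hypothetical markup: a table carrying the class from the question,
    ' with two cells marked "change" (the values are invented)
    doc.body.innerHTML = _
        "<table class=""performance-chart-table""><tr>" & _
        "<td class=""change"">1.23%</td>" & _
        "<td class=""change"">4.56%</td>" & _
        "</tr></table>"

    ' "A B" selects every B nested anywhere inside an A
    Dim changeCells As MSHTML.IHTMLDOMChildrenCollection
    Set changeCells = doc.querySelectorAll(".performance-chart-table .change")

    ' Walk the collection by index (it is zero-based)
    Dim i As Long
    For i = 0 To changeCells.Length - 1
        Debug.Print i, changeCells.Item(i).innerText
    Next i
End Sub
```

Against the live page the same selector string would be passed to htmlDoc.querySelectorAll once the table has been rendered.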

Related

VBA: set HTML as Internet Explorer object ERROR

I'm trying to get the content from a website with VBA, however I keep getting an error.
I already tried several other approaches that I found in similar questions, but nothing seems to work. I also tried Set ie = New InternetExplorer, but that didn't work either.
Can you help me? My goal is further to find a specific key word and count the number of times it appears.
Thanks,
M
Sub website()
Dim ie As Object
Dim ht As HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.navigate ("https://poetenalinha.pt/2021/01/13/gratinado-de-massa-com-peru-e-legumes/")
Set ht = ie.document ' <-- here is the error!!
Set elems = ht.getElementsByClassName("entry-content")
For Each elem In elems
Debug.Print (elem.innerText)
Next
End Sub
You need to wait until IE has fully loaded the page after navigating to the website. The complete code is below and works well:
Sub website()
Dim ie As Object
Dim ht As HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate ("https://poetenalinha.pt/2021/01/13/gratinado-de-massa-com-peru-e-legumes/")
'add this part
Do Until ie.readyState = 4
DoEvents
Loop
Set ht = ie.document
Set elems = ht.getElementsByClassName("entry-content")
For Each elem In elems
Debug.Print (elem.innerText)
Next
End Sub
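The loop above waits forever if the page never finishes loading. A minimal sketch of the same wait with a timeout follows; WaitForIE and its timeoutSeconds parameter are hypothetical names, not part of any library, and the 30-second default is an arbitrary assumption.

```vba
' Returns True once the page has loaded, False if the timeout expires first.
' ie is a late-bound InternetExplorer.Application object.
Function WaitForIE(ByVal ie As Object, _
                   Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds per day

    Do While ie.Busy Or ie.readyState <> 4      ' 4 = READYSTATE_COMPLETE
        If Now > deadline Then Exit Function    ' result is still False
        DoEvents                                ' keep Excel responsive
    Loop
    WaitForIE = True
End Function

Sub WaitForIEUsage()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "https://poetenalinha.pt/2021/01/13/gratinado-de-massa-com-peru-e-legumes/"
    If WaitForIE(ie, 20) Then
        Debug.Print ie.document.Title
    Else
        Debug.Print "Page did not finish loading in time"
    End If
    ie.Quit
End Sub
```

Checking both Busy and readyState is a common precaution, since either one alone can report "done" a little early.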
This worked for me. For some reason the ie object gets disconnected from the browser instance; if you re-establish the connection (using the GetIE function below), it works.
Sub website()
Dim ie As Object, elems, elem
Dim ht As HTMLDocument
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "https://poetenalinha.pt/2021/01/13/gratinado-de-massa-com-peru-e-legumes/"
Application.Wait Now + TimeSerial(0, 0, 10) ' wait 10 sec
Set ie = GetIE("https://poetenalinha.pt") ' "reconnect" to IE object
Set ht = ie.document
Set elems = ht.getElementsByClassName("entry-content")
For Each elem In elems
Debug.Print (elem.innerText)
Next
End Sub
'get a reference to an existing IE window, given a partial URL
Function GetIE(sLocation As String) As Object
Dim objShell As Object, objShellWindows As Object, o As Object
Dim sURL As String
Dim retVal As Object
Set retVal = Nothing
Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows
For Each o In objShellWindows
sURL = ""
On Error Resume Next 'because may not have a "document" property
'Check the URL and if it's the one you want then
' assign the window object to the return value and exit the loop
sURL = o.document.Location
On Error GoTo 0
If sURL Like sLocation & "*" Then
Set retVal = o
Exit For
End If
Next o
Set GetIE = retVal
End Function
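The question's stated end goal, counting how often a keyword appears in the page text, can be handled separately from the browser code. The following is a minimal sketch; CountOccurrences is a hypothetical helper name, and the case-insensitive comparison is an assumption about what is wanted.

```vba
' Counts non-overlapping occurrences of keyword in source (case-insensitive).
Function CountOccurrences(ByVal source As String, ByVal keyword As String) As Long
    Dim pos As Long
    If Len(keyword) = 0 Then Exit Function      ' nothing to search for

    pos = InStr(1, source, keyword, vbTextCompare)
    Do While pos > 0
        CountOccurrences = CountOccurrences + 1
        ' continue searching just past the match that was found
        pos = InStr(pos + Len(keyword), source, keyword, vbTextCompare)
    Loop
End Function
```

Inside the For Each loop it could be called as Debug.Print CountOccurrences(elem.innerText, "massa"), where "massa" is just an example keyword.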

HTML Webpage Query with Various Search Parameters

I am attempting to import data from a website that requires certain search parameters. About five options must be selected on the site. I'm trying to build a script that queries the website, selects specific search parameters, runs the search, and then imports the results into my Excel sheet.
The code I created is not working. I am new to VBA, so I would appreciate the help.
This is what I have:
Private Sub RegulatoryDataPull_Click()
Dim eRow As Long
Dim objIE As Object
Dim HDoc As HTMLDocument
Dim HEle As HTMLUListElement
Set objIE = CreateObject("InternetExplorer.Application") ' Create document object.
Set HDoc = objIE.document ' Create HTML element (<ul>) object.
Set HEle = HDoc.getElementById("dnn_ctr85406_StateNetDB_resultsCount") ' Get the element reference using its ID.
Set sht = Sheets("Sheet1")
eRow = Sheet1.Cells(Rows.Count, 7).End(x1Up.Offset(7, 0)).Row
With objIE
.Visible = True
.navigate "https://www.ncsl.org/research/energy/energy-legislation-tracking-database.aspx"
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Var arr = [document.querySelectorAll('["name=dnn$ctr85406$StateNetDB$ckBxTopics$16"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$5"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$3"],[name="dnn$ctr85406$StateNetDB$ckBxTopics$8"]')]
Topics.Item(0).Value = Topicchoice
Set States = .document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxAllStates")
States.Item(0).Value = Stateschoice
Set Status = .document.getElementsByName("dnn$ctr85406$StateNetDB$ddlStatus")
Status.Item(0).Value = Statuschoice
Set Year = .document.getElementsByName("dnn$ctr85406$StateNetDB$ddlYear")
Year.Item(0).Value = Yearchoice
.document.getElementById("dnn_ctr85406_StateNetDB_btnSearch").Click
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Dim ele As Object
' Loop through elements inside the <ul> element and find <br>, which has the texts we want.
With HEle
For ele = 0 To .getElementsByTagName("br").Length - 1
Debug.Print .getElementsByTagName("br").Item(ele).getElementsByTagName("br").Item(0).innerHTML
End Select
Next ele
End With
Set objIE = Nothing
End Sub
Welcome to SO! I copy-pasted your code in Excel-VBA and it indeed crashed. In that case the easiest thing to do is step through it with F8 (don't just run the code with F5/a button). That does help in finding the line where the code blocks/crashes. After some modifications I came up with this code that works on my machine. It's by no means finished, but should give you a good start.
Private Sub RegulatoryDataPullTWO()
Dim eRow As Long
Dim objIE As Object
Dim HDoc As HTMLDocument
Dim HEle As HTMLUListElement
Set objIE = CreateObject("InternetExplorer.Application") ' Create document object.
objIE.Visible = True
objIE.navigate "https://www.ncsl.org/research/energy/energy-legislation-tracking-database.aspx"
Do While objIE.Busy Or objIE.readyState <> 4
DoEvents
Loop
Set HDoc = objIE.document ' Create HTML element (<ul>) object.
Set Top1 = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxTopics$16")
Top1.Item(0).Value = True
Set States = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ckBxAllStates")
States.Item(0).Value = True
Set Status = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ddlStatus")
Status.Item(0).Value = "Adopted"
Set yr = objIE.document.getElementsByName("dnn$ctr85406$StateNetDB$ddlYear")
yr.Item(0).Value = "2019"
objIE.document.getElementById("dnn_ctr85406_StateNetDB_btnSearch").Click
Do While objIE.Busy Or objIE.readyState <> 4
DoEvents
Loop
Set HEle = HDoc.getElementById("dnn_ctr85406_StateNetDB_resultsCount") ' Get the element reference using its ID.
Set HList = HDoc.getElementById("dnn_ctr85406_StateNetDB_linkList")
Set Sht = Sheets("Sheet1")
Debug.Print HEle.outerText
Sht.Range("B2").Value = HEle.outerText
ResRw = 3
For e = 0 To HList.getElementsByTagName("a").Length - 1
Set lnk = HList.getElementsByTagName("a").Item(e)
'Debug.Print lnk.outerText, lnk.outerHTML
If lnk.outerText <> "Bill Text Lookup" And lnk.outerText <> "*" Then
Debug.Print Replace(Replace(lnk.ParentNode.innerText, Chr(10), ""), Chr(13), "")
Debug.Print lnk.ParentNode.NextSibling.NextSibling.innerText
Sht.Range("A" & ResRw).Value = Replace(Replace(lnk.ParentNode.innerText, Chr(10), ""), Chr(13), "")
Sht.Range("B" & ResRw).Value = lnk.ParentNode.NextSibling.NextSibling.innerText
ResRw = ResRw + 1
End If
Next e
Set objIE = Nothing
End Sub

Can we fetch the specific data via using urls in vba

I have 15 different URLs, and I need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows me a syntax error.
Sub myfile()
    Dim IE As New InternetExplorer
    Dim url As String
    Dim item As HTMLHtmlElement
    Dim Doc As HTMLDocument
    Dim tagElements As Object
    Dim element As Object
    Dim lastRow
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    Application.EnableEvents = False
    Application.Calculation = xlCalculationManual
    url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
    IE.navigate url
    IE.Visible = True
    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE
    Set Doc = IE.document
    lastRow = Sheet1.UsedRange.Rows.Count + 1
    Set tagElements = Doc.all.tags("tr")
    For Each element In tagElements
        If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
            Sheet1.Cells(lastRow, 1).Value = element.innerText
            ' Exit the for loop once you get the temperature to avoid unnecessary processing
            Exit For
        End If
    Next
    IE.Quit
    Set IE = Nothing
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
    Application.EnableEvents = True
    Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply copy a web scraping macro and reuse it for your purposes. Every page has its own HTML structure, so you must write a separate scraping macro for every page.
I can't explain everything about web scraping with VBA here. Please start your research with the search terms "excel vba web scraping" and "document object model". You also need some knowledge of HTML and CSS and, ideally, of JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without references to the libraries Microsoft HTML Object Library and Microsoft Internet Controls. References are set via Tools -> References... in the VBA editor. For the differences between early and late binding, see the article "Early Binding v/s Late Binding" and, for more depth, Microsoft's "Using early binding and late binding in Automation".
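To make the difference concrete, here is a minimal sketch of both styles side by side. The procedure names are hypothetical. EarlyBoundSketch compiles only after both references named above have been set, which is exactly the condition that raises "User-defined type not defined" when it is not met; LateBoundSketch needs no references.

```vba
' Early binding: requires Tools -> References...
'   Microsoft Internet Controls   (library SHDocVw)
'   Microsoft HTML Object Library (library MSHTML)
Sub EarlyBoundSketch()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer
    ' Library constants such as READYSTATE_COMPLETE are available here
    Debug.Print READYSTATE_COMPLETE
    ie.Quit
End Sub

' Late binding: no references, types are resolved at run time
Sub LateBoundSketch()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ' Library constants are not defined without the reference,
    ' so the literal value 4 stands in for READYSTATE_COMPLETE
    Debug.Print ie.readyState = 4
    ie.Quit
End Sub
```

Early binding gives IntelliSense and compile-time checks; late binding keeps the workbook portable to machines where the references are not set.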
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany and Excel has automatically set the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, this may have to be intercepted.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub

How to get span id value into excel VBA?

Ok so here is my entire code:
Private Sub CommandButton1_Click()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://finance.yahoo.com/q/ks?s=" & "AAPL"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set getPrice = appIE.Document.getElementById("yfs_l84_aapl")
Dim myValue As String: myValue = getPrice.Cells(1).innerHTML
appIE.Quit
Set appIE = Nothing
Range("B1").Value = myValue
End Sub
And here is the HTML that I'm trying to read into Excel (specifically, I need the 113.92):
<span id="yfs_l84_aapl">113.92</span>
What do I have to change in these two lines of code to read a "span id"?
Set getPrice = appIE.Document.getElementById("yfs_l84_aapl")
Dim myValue As String: myValue = getPrice.Cells(1).innerHTML
Or, alternatively, is there a way just to read whatever is directly after "yfs_l84"?
I'm brand new to coding and am working very hard to get better, so any help is really appreciated!! Thanks! :)
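A <span> has no Cells collection (that belongs to table rows), so read the text of the element itself. Use this:
myValue = getPrice.innerText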
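The behaviour can be checked without a browser. The sketch below loads the span from the question into an in-memory document and reads it back; SpanTextSketch and priceSpan are hypothetical names, and it assumes a reference to the Microsoft HTML Object Library.

```vba
'* Tools-References: Microsoft HTML Object Library
Sub SpanTextSketch()
    Dim doc As MSHTML.HTMLDocument
    Set doc = New MSHTML.HTMLDocument
    ' The markup quoted in the question
    doc.body.innerHTML = "<span id=""yfs_l84_aapl"">113.92</span>"

    Dim priceSpan As MSHTML.IHTMLElement
    Set priceSpan = doc.getElementById("yfs_l84_aapl")

    If priceSpan Is Nothing Then
        ' getElementById returns Nothing when the id does not exist
        Debug.Print "id not found"
    Else
        Debug.Print priceSpan.innerText
        ' Val reads "." as the decimal separator regardless of locale
        Debug.Print Val(priceSpan.innerText) * 2
    End If
End Sub
```

The Nothing check is worth keeping in the real macro too, because the id changes with the ticker symbol.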

Ignore elements in certain tags when getting elements by id using VBA

I have a vba module for extracting all the links in a page. I would however like to ignore all the links in certain tags such as <header> and <footer> (and all their child tags). Can anyone tell me how can this be done?
Sub Fetch_click()
Dim LinkArr As Variant
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate Cells(1, 1).Text
While IE.Busy
DoEvents
Wend
Dim i As Integer
i = 3
Set LinkArr = IE.Document.getElementsByTagName("a")
For Each LinkObj In LinkArr
Cells(i, 1).Value = LinkObj.href
i = i + 1
Next
End Sub
Thank you
I would prefer to use objects from the Microsoft HTML Object Library and the Microsoft Internet Controls library (add references to both!), e.g.
Sub StartTest()
    Dim Browser As SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    ' start browser
    Set Browser = New SHDocVw.InternetExplorer
    Browser.Visible = True
    Browser.navigate "www.dauda.at"
    ' wait until the page has loaded before touching the document
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = Browser.document
    Dim ECol As MSHTML.IHTMLElementCollection
    Dim IFld As MSHTML.IHTMLElement
    ' search all <a> tags
    Set ECol = HTMLDoc.getElementsByTagName("a")
    For Each IFld In ECol
        ' etc ...
    Next IFld
    ' clean up
    Set IFld = Nothing
    Set ECol = Nothing
    Set HTMLDoc = Nothing
    Browser.Quit
    Set Browser = Nothing
End Sub
Checking where your <a> tag sits can be as easy as inspecting IFld.ParentNode.nodeName to get the tag of the enclosing parent.
If it is unclear how deeply nested your <a> is, you can use a recursive function that examines the next higher parent all the way up to the document root ("#document") or the containing "HTML" element, e.g.
Function BadParentRec(TestFld As MSHTML.IHTMLElement) As Boolean
    Dim MyTag As String, MyTempResult As Boolean
    BadParentRec = False
    MyTag = TestFld.ParentNode.nodeName
    ' Debug.Print MyTag
    If MyTag = "#document" Then
        MyTempResult = False ' reached the root without a bad tag: good
    ElseIf MyTag = "XXX" Then ' your own criteria for bad tags go here
        MyTempResult = True ' send "bad" back up the recursion chain
    Else
        MyTempResult = BadParentRec(TestFld.parentElement) ' next level up
    End If
    BadParentRec = MyTempResult
End Function
... so inside the For Each loop you would say
If Not BadParentRec(IFld) Then
    Debug.Print IFld.href ' check here for href = ""
End If
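The same ancestor check can also be written as a loop instead of recursion. This is a minimal sketch, not the answer's method: HasAncestorTag is a hypothetical helper, it takes the element late-bound (As Object) so it does not depend on a particular MSHTML interface, and it assumes that parentElement returns Nothing once the top of the tree is reached.

```vba
' True if any ancestor of node has one of the given tag names
' (compared case-insensitively), e.g. "header", "footer".
Function HasAncestorTag(ByVal node As Object, _
                        ParamArray tagNames() As Variant) As Boolean
    Dim ancestor As Object
    Dim i As Long

    Set ancestor = node.parentElement
    Do Until ancestor Is Nothing
        For i = LBound(tagNames) To UBound(tagNames)
            If StrComp(ancestor.tagName, tagNames(i), vbTextCompare) = 0 Then
                HasAncestorTag = True
                Exit Function
            End If
        Next i
        Set ancestor = ancestor.parentElement   ' climb one level
    Loop
End Function
```

In the question's loop this would read: If Not HasAncestorTag(LinkObj, "header", "footer") Then Cells(i, 1).Value = LinkObj.href, with the row counter incremented inside the same condition.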