VBA: referencing HTML elements by XPath

I am a beginner at web scraping with Excel VBA and need some help.
I am trying to reference an element. If it had an id, I could use getElementById, but sometimes there is no id. I could use getElementsByClassName, but sometimes there are too many elements with the same class.
Is there some way to refer to an element by XPath?
(I can't post the actual website because it contains personal information, so let's say this is the HTML:)
<!DOCTYPE html>
<html>
<body>
<a>Link</a>
</body>
</html>
Is there something like ie.document.getElementByXPath("/html/body/a").Click?
I've searched all over the web and can't seem to find anything on the topic.

This is not meant to be a full answer, but here are a couple of subs that may give you some ideas.
Sub google()
' add references: Microsoft XML, v6.0 and Microsoft HTML Object Library
Const url = "https://www.google.co.in"
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
http.Open "GET", url, False
http.Send
html.body.innerHTML = http.responseText
Dim elem As Object
Set elem = html.getElementsByClassName("ctr-p") ' HTMLElementCollection
Debug.Print elem.Length
Set elem = html.getElementsByClassName("ctr-p")("viewport") ' HTMLDivElement <div class="ctr-p" id="viewport">
Debug.Print elem.Children.Length
Dim aaa As Object
Set aaa = elem.getElementsByTagName("div")("hplogo") ' HTMLDivElement
Debug.Print aaa.Children.Length
Debug.Print aaa.outerHTML
End Sub
' add references Microsoft HTML Object Library
' Microsoft Internet Controls
Sub ieGoogle()
Const url = "https://www.google.co.in"
Dim iE As InternetExplorer
Set iE = New InternetExplorer
iE.Navigate url
iE.Visible = True
Do While iE.ReadyState <> 4: DoEvents: Loop
Dim doc As HTMLDocument
Set doc = iE.Document
Debug.Print doc.ChildNodes.Length ' DOMChildrenCollection
Debug.Print doc.ChildNodes(1).ChildNodes.Item(0).nodeName ' HEAD
Debug.Print doc.ChildNodes(1).ChildNodes.Item(1).nodeName ' BODY
' for querySelector arguments see: https://www.w3schools.com/cssref/css_selectors.asp
Dim elm As Object ' generic element: the matches below are <div>s, not <input>s
Set elm = doc.querySelector("*") ' all elements
Debug.Print Left(elm.outerHTML, 40)
Set elm = doc.querySelector("div.ctr-p#viewport") ' <div class="ctr-p" id="viewport">
Debug.Print Left(elm.outerHTML, 40)
Set elm = doc.querySelector(".ctr-p#viewport") ' <div class="ctr-p" id="viewport">
Debug.Print Left(elm.outerHTML, 40)
Debug.Print elm.ChildNodes.Length
Debug.Print elm.Children.Length
Set elm = doc.querySelector("#viewport") ' id="viewport"
Debug.Print Left(elm.outerHTML, 40)
Debug.Print elm.ID
Dim elem As Object ' the match is a <div>, so HTMLInputElement would cause a type mismatch
Set elem = doc.getElementsByClassName("ctr-p")("viewport")
Debug.Print elem.Children.Length
Dim aaa As Object
Set aaa = elem.getElementsByTagName("div")("hplogo")
Debug.Print aaa.Children.Length
Debug.Print aaa.outerHTML
iE.Quit
Set iE = Nothing
End Sub
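MSHTML (the IE document model used in both subs above) has no getElementByXPath method. For a fixed path like /html/body/a, the CSS selector "html > body > a" passed to querySelector does the same job. If you really want to pass XPath-style strings, here is a minimal sketch of a hypothetical helper, GetElementByXPath (my own name, not a library function), that only understands simple absolute paths made of tag names with an optional 1-based [n] index. Keep in mind that XPaths copied from browser DevTools describe the live DOM, which may contain elements (such as tbody or script-generated nodes) that are not in the raw HTML.

```vba
Option Explicit

' Hypothetical helper, not an MSHTML method: resolves a simple absolute XPath
' such as "/html/body/a" or "/html/body/div[2]/a" by walking child elements.
' Only tag names with an optional 1-based [n] index are supported
' (no "//", attributes or text()). Late bound, so no references are required.
Public Function GetElementByXPath(ByVal doc As Object, ByVal xpath As String) As Object
    Dim steps() As String
    Dim stepText As String, wantedTag As String
    Dim wanted As Long, seen As Long, i As Long, j As Long, p As Long
    Dim current As Object, kids As Object, child As Object, found As Object

    steps = Split(Mid$(xpath, 2), "/")           ' "/html/body/a" -> html, body, a
    Set current = doc.documentElement            ' the <html> element
    If LCase$(current.tagName) <> LCase$(steps(0)) Then Exit Function

    For i = 1 To UBound(steps)
        ' Split "div[2]" into tag "div" and index 2 (index defaults to 1)
        stepText = steps(i)
        wanted = 1
        p = InStr(stepText, "[")
        If p > 0 Then
            wantedTag = Left$(stepText, p - 1)
            wanted = CLng(Mid$(stepText, p + 1, InStr(stepText, "]") - p - 1))
        Else
            wantedTag = stepText
        End If

        ' Find the wanted-th child element with this tag name
        Set found = Nothing
        seen = 0
        Set kids = current.Children
        For j = 0 To kids.Length - 1
            Set child = kids.Item(j)
            If LCase$(child.tagName) = LCase$(wantedTag) Then
                seen = seen + 1
                If seen = wanted Then
                    Set found = child
                    Exit For
                End If
            End If
        Next j
        If found Is Nothing Then Exit Function   ' no match: returns Nothing
        Set current = found
    Next i

    Set GetElementByXPath = current
End Function
```

With the question's page loaded in IE, GetElementByXPath(ie.document, "/html/body/a").Click would click the link; check the result with Is Nothing first, because a missing step returns Nothing.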

You can do this in Excel VBA using Selenium WebDriver (https://www.selenium.dev/).
WebDriver has a FindElementByXPath method. It has the advantage of being able to control browsers other than Internet Explorer, but the disadvantage is that Selenium must be installed on every machine that will run your VBA script.
Here is a walkthrough for installing Selenium and adding its library reference to your project (this is the tutorial I used; it's a Brazilian Portuguese page, so the link goes through Google's automatic translation): https://translate.google.com/translate?sl=pt&tl=en&u=https%3A%2F%2Fwww.tomasvasquez.com.br%2Fblog%2Fmicrosoft-office%2Fexcel%2Fvba-interagindo-com-paginas-web-com-o-selenium-webdriver%2F
And here is another quickstart from Coding is Love (it doesn't have the installation walkthrough): https://codingislove.com/browser-automation-in-excel-selenium/
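A minimal sketch of the XPath lookup, assuming the SeleniumBasic library (the VBA wrapper used in the Coding is Love quickstart) is installed and referenced, and that its chromedriver matches the installed Chrome version. TextAtXPath is a hypothetical name; the URL and XPath are whatever you pass in.

```vba
Option Explicit

' Sketch assuming SeleniumBasic is installed and referenced
' (VBE > Tools > References > Selenium Type Library) and that its chromedriver
' matches the installed Chrome version. TextAtXPath is a hypothetical name.
Public Function TextAtXPath(ByVal url As String, ByVal xpath As String) As String
    Dim driver As New Selenium.ChromeDriver
    driver.Get url
    ' FindElementByXPath raises an error if nothing matches the expression
    TextAtXPath = driver.FindElementByXPath(xpath).Text
    ' To click instead of reading: driver.FindElementByXPath(xpath).Click
    driver.Quit
End Function
```

If FindElementByXPath raises an error, the browser window is left open; in real code you would add error handling that still calls driver.Quit.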

Related

VBA automate Edge Browser without downloading any external things

I have the VBA code below to automate IE, extract the figures from an HTML table and populate an Excel table with them. Is it possible to do the same thing by automating the Edge browser? My company doesn't allow us to install any third-party applications, so Selenium is not an option. As I am not very familiar with coding, I would highly appreciate it if someone could offer some sample code.
Sub sfc_esg_list()
Dim IE As New InternetExplorer
Dim doc As New MSHTML.HTMLDocument
IE.Visible = True
'use IE browser to navigate SFC website
IE.navigate "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds"
Do
DoEvents
'Application.Wait (Now() + TimeValue("00:00:04"))
Loop Until IE.readyState = 4
Set doc = IE.Document
Set TRs = doc.getElementsByTagName("tr")
Sheets("ESG list_SFC").Activate
'copy and paste the ESG fund list from SFC website to sheets<ESG list_SFC>
With Sheets("ESG list_SFC")
.Cells.Clear
For Each TR In TRs
r = r + 1
For Each Cell In TR.Children
C = C + 1
.Cells(r, C).NumberFormat = "#"
.Cells(r, C) = Cell.innerText
Next Cell
C = 0
Next TR
End With
IE.Quit
Set doc = Nothing
Set IE = Nothing
'Save the file
Application.ScreenUpdating = False
Application.DisplayAlerts = False
'ActiveWorkbook.Save
End Sub
IE is pretty much dead at this point. I think it should be something like this.
Sub TryMe()
' requires VBE > Tools > References > Microsoft HTML Object Library
Dim request As Object
Dim oHtml As HTMLDocument
Dim htmlText As String
Set request = CreateObject("MSXML2.XMLHTTP")
Set oHtml = New HTMLDocument
request.Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds/", False
request.send
oHtml.body.innerHTML = request.responseText
htmlText = oHtml.getElementsByClassName("tablesorter tablesorter-default tablesorterfcd4c178102ad8")(0).outerHTML
With CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") 'Clipboard (MSForms.DataObject)
.SetText htmlText
.PutInClipboard
Sheets(1).Activate ' Select only works on the active sheet
Sheets(1).Range("A1").Select
Sheets(1).PasteSpecial Format:="Unicode Text"
End With
End Sub
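I thought the class name was 'tablesorter tablesorter-default tablesorterfcd4c178102ad8', but it doesn't seem to work, and I'm not sure why. The last class looks like a random namespace that the tablesorter script adds in the browser, so it is probably not in the raw HTML that XMLHTTP returns. Can you play around with some other class names? When you press F12, you will see the HTML code behind the page; keep in mind that DevTools shows the live DOM after scripts have run, which can differ from what XMLHTTP receives.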
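One way around script-generated class names is to select the table by tag instead. This is a minimal sketch, late bound so it needs no references, and it assumes (unverified) that the fund list is present in the raw server response and is the first <table> on the page. FirstTableToArray and ImportEsgList are hypothetical names; the sheet name is taken from the question.

```vba
Option Explicit

' Sketch, late bound (no references). Assumes the fund list is present in the
' raw server response and is the first <table> on the page; neither is verified.
Public Function FirstTableToArray(ByVal doc As Object) As Variant
    Dim tables As Object, trs As Object, tds As Object
    Dim result() As Variant
    Dim r As Long, c As Long, maxCols As Long

    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then Exit Function          ' returns Empty
    Set trs = tables.Item(0).getElementsByTagName("tr")
    If trs.Length = 0 Then Exit Function

    For r = 0 To trs.Length - 1                        ' widest row sets the column count
        If trs.Item(r).Children.Length > maxCols Then maxCols = trs.Item(r).Children.Length
    Next r
    If maxCols = 0 Then Exit Function

    ReDim result(1 To trs.Length, 1 To maxCols)
    For r = 0 To trs.Length - 1
        Set tds = trs.Item(r).Children                 ' the <th>/<td> cells of this row
        For c = 0 To tds.Length - 1
            result(r + 1, c + 1) = Trim$(tds.Item(c).innerText)
        Next c
    Next r
    FirstTableToArray = result
End Function

Public Sub ImportEsgList()
    Dim doc As Object, data As Variant

    Set doc = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.sfc.hk/en/Regulatory-functions/Products/List-of-ESG-funds", False
        .send
        doc.body.innerHTML = .responseText
    End With

    data = FirstTableToArray(doc)
    If IsEmpty(data) Then
        MsgBox "No table found in the response; it may be built by JavaScript."
        Exit Sub
    End If

    With ThisWorkbook.Worksheets("ESG list_SFC").Range("A1").Resize(UBound(data, 1), UBound(data, 2))
        .Parent.Cells.Clear          ' Parent is the worksheet
        .NumberFormat = "@"          ' store cell values as text
        .Value = data                ' write the whole table in one step
    End With
End Sub
```

Writing the array in one assignment is much faster than filling cells one by one. If the message box appears, the table is rendered client-side and a browser-based approach is still needed.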

How to fetch iframe data using Excel VBA

I am using the code below in Excel VBA for IE navigation. I get the following error while fetching data from an iframe.
Error detail:
Object does not support this property or method
Option Explicit
Public Sub Cgg_Click()
Dim Ie As New InternetExplorer
Dim WebURL
Dim Docx As HTMLDocument
Dim productDesc
Dim productTitle
Dim price
Dim RcdNum
Ie.Visible = True
WebURL = "https://www.google.com/maps/place/parlour+beauty+parlour+beauty/#40.7314166,-74.13182,11z/data=!4m8!1m2!2m1!1sParlour+NY!3m4!1s0x89c2599bd4c1d2e7:0x20873676f6334189!8m2!3d40.7314166!4d-73.9917443"
Ie.Navigate2 WebURL
Do Until Ie.readyState = READYSTATE_COMPLETE
DoEvents
Loop
Application.Wait (Now + TimeValue("00:00:25"))
For N = 0 To Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Length - 1
If Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Item(N).innerText = "Web results" Then
Ie.document.getElementsByClassName("section-subheader-header GLOBAL__gm2-subtitle-alt-1").Item(N).ScrollIntoView (False)
End If
Next N
Application.Wait (Now + TimeValue("00:00:25"))
Set Docx = Ie.document
productDesc = Docx.Window.frames("section-iframe-iframe").contentWindow.document.getElementsByClassName("trex")(0).outerHTML
End Sub
Here is the HTML:
Please help me resolve this error.
I want to extract the HTML content of the "trex" class element from the URL above.
Thanks.
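The error most likely comes from Docx.Window: an HTMLDocument has no Window property (its window is parentWindow), and each item in frames is already a window object, so it has no contentWindow. You can change the line that extracts the "trex" element to either of the following; both should work:
Use the getElementsByTagName method to get the iframe first, then use its contentDocument property to reach the element by class name: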
productDesc = Docx.getElementsByTagName("iframe")(0).contentDocument.getElementsByClassName("trex")(0).outerHTML
Use the querySelector method to get the iframe by its class, then reach the element the same way as above:
productDesc = Docx.querySelector(".section-iframe-iframe").contentDocument.getElementsByClassName("trex")(0).outerHTML
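Both one-liners fail with run-time error 91 if the iframe or the "trex" element is not there yet. A minimal sketch that checks each step instead, late bound; FirstClassInFirstIframe is a hypothetical name, and it assumes the frame is same-origin, because a cross-origin frame's document cannot be read and raises "Access is denied".

```vba
Option Explicit

' Sketch, late bound: returns the outerHTML of the first element with the given
' class inside the first <iframe>, or "" if the frame or element is missing.
' Assumes the frame is same-origin; a cross-origin frame raises "Access is denied".
Public Function FirstClassInFirstIframe(ByVal doc As Object, ByVal className As String) As String
    Dim frames As Object, inner As Object, hits As Object

    Set frames = doc.getElementsByTagName("iframe")
    If frames.Length = 0 Then Exit Function
    Set inner = frames.Item(0).contentDocument
    If inner Is Nothing Then Exit Function            ' frame document not available yet
    Set hits = inner.getElementsByClassName(className)
    If hits.Length > 0 Then FirstClassInFirstIframe = hits.Item(0).outerHTML
End Function
```

In Cgg_Click this would replace the last line with productDesc = FirstClassInFirstIframe(Ie.document, "trex"), followed by a check for an empty string.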

Excel VBA IE NAVIGATE method does not return full HTML Page

I am trying to extract the name of the actor from the URL I am passing;
for my URL, I need to extract "Will Smith" from the HTML page.
I know how to extract elements from an HTML page using tag, class name, etc.
But the problem is that when I pass the URL "https://ssl.ofdb.de/film/138627,I-Am-Legend",
I don't receive the full HTML page in the response text, so I am not able to extract the content "Will Smith".
I tried other methods, such as MSXML2.XMLHTTP60, but they also return only a partial HTML page.
I have attached my code here; can anyone please help?
Sub Fetch_Info()
Dim ie As New InternetExplorer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Top = 0
ie.Left = 700
ie.Width = 1000
ie.Height = 750
ie.AddressBar = 0
ie.StatusBar = 0
ie.Toolbar = 0
ie.navigate "https://ssl.ofdb.de/film/138627,I-Am-Legend"
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Application.Wait Now + TimeValue("00:00:04")
Dim doc As HTMLDocument
Set doc = ie.document
doc.Focus
Debug.Print doc.DocumentElement.innerHTML
End Sub
You can use the following CSS selector. querySelector returns the first node that matches the CSS pattern. The pattern [itemprop=actor] span looks for a span nested anywhere inside an element whose itemprop attribute has the value actor. Note that I am working off the ie.document node.
Debug.Print ie.document.querySelector("[itemprop=actor] span").innerText
That content is static, so you could use XHR and avoid the overhead of a browser. The response headers don't specify a charset, so read responseBody and convert it with StrConv rather than relying on responseText.
Option Explicit
Public Sub GetActor()
Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ; Microsoft XML, v6 (your version may vary)
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://ssl.ofdb.de/film/138627,I-Am-Legend", False
.send
html.body.innerHTML = StrConv(.responseBody, vbUnicode)
End With
ActiveSheet.Cells(1, 1) = html.querySelector("[itemprop=actor] span").innerText
End Sub
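If the selector matches nothing (for example after a layout change on the site), calling .innerText on the result raises run-time error 91. A minimal sketch that wraps the same selector in a function returning an empty string instead; FirstActor is a hypothetical name, and it uses the same early-bound Microsoft HTML Object Library reference as GetActor.

```vba
Option Explicit

' Sketch: the selector from the answer wrapped in a function that returns ""
' instead of failing with error 91 when nothing matches.
' Requires VBE > Tools > References > Microsoft HTML Object Library.
Public Function FirstActor(ByVal html As MSHTML.HTMLDocument) As String
    Dim el As Object
    Set el = html.querySelector("[itemprop=actor] span")   ' span inside itemprop="actor"
    If Not el Is Nothing Then FirstActor = el.innerText
End Function
```

In GetActor, the last line would become ActiveSheet.Cells(1, 1) = FirstActor(html).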

Can we fetch the specific data via using urls in vba

I have 15 different URLs and need to fetch the price from each website into a particular column in Excel. Can you please help me out? It's my first VBA program; I tried, but it shows a syntax error.
Sub myfile()
Dim IE As New InternetExplorer
Dim url As String
Dim item As HTMLHtmlElement
Dim Doc As HTMLDocument
Dim tagElements As Object
Dim element As Object
Dim lastRow
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
IE.navigate url
IE.Visible = True
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
lastRow = Sheet1.UsedRange.Rows.Count + 1
Set tagElements = Doc.all.tags("tr")
For Each element In tagElements
If InStr(element.innerText, "ciq-price") > 0 And InStr(element.className, "ciq-product-name") > 0 Then
Sheet1.Cells(lastRow, 1).Value = element.innerText
' Exit the for loop once you get the temperature to avoid unnecessary processing
Exit For
End If
Next
IE.Quit
Set IE = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
You can't simply reuse any web scraping macro for your purposes. Every page has its own HTML structure, so you must write a separate web scraping macro for every page.
I can't explain everything about web scraping with VBA here. Please start your research with "excel vba web scraping" and "document object model". You also need knowledge of HTML and CSS, and ideally of JavaScript.
The error message "User-defined type not defined" occurs because you use early binding without a reference to the Microsoft HTML Object Library and Microsoft Internet Controls libraries. You can read how to set a reference via Tools -> References..., about the differences between early and late binding in "Early Binding v/s Late Binding", and deeper information from Microsoft in "Using early binding and late binding in Automation".
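To illustrate the difference: with early binding you declare the library's own types, which only compiles once the reference is set; with late binding you declare As Object and create the object by its ProgID, so no reference is needed. A side effect is that named constants such as READYSTATE_COMPLETE come from the referenced libraries, which is why late-bound code uses the literal 4 instead. CountTags below is a hypothetical example function.

```vba
Option Explicit

' Early binding (needs Tools > References > Microsoft HTML Object Library):
'     Dim doc As MSHTML.HTMLDocument
'     Set doc = New MSHTML.HTMLDocument
' Late binding (no reference; members are resolved at run time):
Public Function CountTags(ByVal htmlText As String, ByVal tagName As String) As Long
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = htmlText
    CountTags = doc.getElementsByTagName(tagName).Length
End Function
```

The trade-off is that late-bound code loses IntelliSense and compile-time checking of member names, but it runs on machines where nobody has set the references.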
To get the prices from the shown url you can use the following macro. I use late binding:
Option Explicit
Sub myfile()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim element As Object
Dim item As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://wtb.app.channeliq.com/buyonline/D_nhoFMJcUal_LOXlInI_g/TOA-60?html=true"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
Set tagElements = IE.document.getElementsByClassName("ciq-online-offer-item ")
For Each element In tagElements
Set item = element.getElementsByTagName("td")(1)
ActiveSheet.Cells(lastRow, 1).Value = Trim(item.innerText)
lastRow = lastRow + 1
Next
IE.Quit
Set IE = Nothing
End Sub
Edit for a second Example:
The new link leads to an offer. I assume the price of the product is to be fetched. No loop is needed for this. You just have to find out in which HTML segment the price is and then you can decide how to get it. In the end there are only two lines of VBA that write the price into the Excel spreadsheet.
I'm in Germany, and Excel has automatically changed the currency sign from Dollar to Euro. This is of course wrong. Depending on where you are, you may need to handle this.
Sub myfile2()
Dim IE As Object
Dim url As String
Dim tagElements As Object
Dim lastRow As Long
lastRow = ActiveSheet.UsedRange.Rows.Count + 1
url = "https://www.wayfair.com/kitchen-tabletop/pdx/cuisinart-air-fryer-toaster-oven-cui3490.html"
Set IE = CreateObject("internetexplorer.application")
IE.navigate url
IE.Visible = True
Do: DoEvents: Loop Until IE.readyState = 4
'Break for 3 seconds
Application.Wait (Now + TimeSerial(0, 0, 3))
Set tagElements = IE.document.getElementsByClassName("BasePriceBlock BasePriceBlock--highlight")(0)
ActiveSheet.Cells(lastRow, 1).Value = Trim(tagElements.innerText)
IE.Quit
Set IE = Nothing
End Sub
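One way to stop Excel from reinterpreting the currency is to write a plain number and apply the number format yourself. A minimal sketch, assuming the price text holds a single US-style price such as "$1,299.99" ("." as decimal separator, "," for thousands); ParsePrice is a hypothetical name. Val always reads "." as the decimal separator regardless of regional settings, which is why it is used here instead of CDbl.

```vba
Option Explicit

' Sketch assuming a single US-style price such as "$1,299.99" in the text.
' Keeps only digits and "." and converts with Val, which ignores regional settings.
' Text holding two prices (e.g. sale and list price) would be merged, so trim it first.
Public Function ParsePrice(ByVal priceText As String) As Double
    Dim i As Long, ch As String, digits As String
    For i = 1 To Len(priceText)
        ch = Mid$(priceText, i, 1)
        If ch Like "[0-9.]" Then digits = digits & ch
    Next i
    ParsePrice = Val(digits)        ' Val("") returns 0
End Function
```

In myfile2, the write would become ActiveSheet.Cells(lastRow, 1).Value = ParsePrice(tagElements.innerText), followed by ActiveSheet.Cells(lastRow, 1).NumberFormat = "$#,##0.00".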

How to get META keywords content with VBA from source code in an EXCEL file

I have to download the source code of several hundred websites to an Excel file (for example to Cells(1, 1) in Worksheets(1)) and then extract the content of the META keywords tag into, let's say, Cells(1, 2).
For downloading I use the following code in VBA:
Dim htm As Object
Set htm = CreateObject("HTMLfile")
URL = "https://www.insolvenzbekanntmachungen.de/cgi-bin/bl_aufruf.pl?PHPSESSID=8ecbeb942c887974468b9010531fc7ab&datei=gerichte/nw/agkoeln/16/0071_IN00181_16/2016_06_10__11_53_26_Anordnung_Sicherungsmassnahmen.htm"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
htm.body.innerHTML = .responseText
Cells(1, 1) = .responseText
End With
I've found the following code on this website but, unfortunately, I'm unable to adapt it to solve my problem:
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Worksheets(1)
str = "https://www.insolvenzbekanntmachungen.de/cgi-bin/bl_aufruf.pl?PHPSESSID=8ecbeb942c887974468b9010531fc7ab&datei=gerichte/nw/agkoeln/16/0071_IN00181_16/2016_06_10__11_53_26_Anordnung_Sicherungsmassnahmen.htm"
ie.Visible = True
ie.navigate str
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
'Find the proper meta element --------------
Const META_TAG As String = "META"
Const META_NAME As String = "keywords"
Dim Doc As HTMLDocument
Dim metaElements As Object
Dim element As Object
Dim kwd As String
Set Doc = ie.Document
Set metaElements = Doc.all.tags(META_TAG)
For Each element In metaElements
If element.Name = META_NAME Then
kwd = element.Content
End If
Next
MsgBox kwd
End Sub
I think I have to modify this line, but don't know how:
Set Doc = ie.Document
Can you please help me out?
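A minimal sketch of that adaptation: instead of ie.Document, read the meta tags from an htmlfile object filled with the XMLHTTP response. It is late bound, so no references are needed. MetaKeywords and DownloadAndExtract are hypothetical names, and it assumes that htmlfile's write method parses the complete document, including the <head> where meta tags live. Because a cell holds at most 32,767 characters, the source is cut to that length before being stored.

```vba
Option Explicit

' Sketch: reads the META keywords from downloaded HTML instead of ie.Document.
' Late bound; assumes htmlfile.write parses the whole document, including <head>.
Public Function MetaKeywords(ByVal htmlText As String) As String
    Dim htm As Object, element As Object
    Set htm = CreateObject("htmlfile")
    htm.write htmlText
    htm.Close
    For Each element In htm.getElementsByTagName("meta")
        If LCase$(element.Name) = "keywords" Then
            MetaKeywords = element.Content
            Exit For
        End If
    Next element
End Function

' Usage sketch: source code in column A, keywords in column B of the given row.
Public Sub DownloadAndExtract(ByVal url As String, ByVal rowNum As Long)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        Worksheets(1).Cells(rowNum, 1).Value = Left$(.responseText, 32767)   ' cell size limit
        Worksheets(1).Cells(rowNum, 2).Value = MetaKeywords(.responseText)
    End With
End Sub
```

For several hundred sites, you would loop over a list of URLs and call DownloadAndExtract once per row.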
Embed a WebBrowser control into an Excel spreadsheet or UserForm:
How to add a WebBrowser to Excel
Set up references to the HTML Object Library:
How to add VBA References – Internet Controls, HTML Object Library
Grab Greg Truby's code from this post: WebBrowser Control
You'll have access to the Document Object Model (DOM). This exposes most of the HTML elements' properties and events.
Option Explicit
' Requires references: Microsoft HTML Object Library and Microsoft Internet Controls.
' Goes in the module of the worksheet or UserForm that hosts WebBrowser1.
Private WithEvents htmDocument As HTMLDocument
Private WithEvents MyButton As HTMLButtonElement
Private Function MyButton_onclick() As Boolean
MsgBox "Somebody clicked MyButton on WebBrowser1"
End Function
Private Sub WebBrowser1_NavigateComplete2(ByVal pDisp As Object, URL As Variant)
Dim aTags As IHTMLElementCollection
Do Until WebBrowser1.ReadyState = READYSTATE_COMPLETE
DoEvents
Loop
' Assign the document first; MyButton is looked up in it
Set htmDocument = WebBrowser1.Document
Set MyButton = htmDocument.getElementById("MyButtonID")
Set aTags = htmDocument.getElementsByTagName("a")
End Sub
Google Web API, HTA, MDN (https://developer.mozilla.org/en-US/docs/Web/API), and if you get stuck, try to refactor JavaScript code to VBScript. It's