Extract email HTML Element - html

I am trying to scrape a page and am stuck at one point. Here is the relevant part of the page's HTML:
<article class="mod mod-Treffer" data-teilnehmerid="122085958708">
<div data-wipe="{"listener": "click", "name": "Trefferliste Eintrag zur Detailseite", "id": "122085958708", "synchron": true}" data-realid="2aeca1d2-2bc5-4070-ac4d-e16b10badca5" data-tnid="122085958708" target="_self">
<div class="mod-hervorhebung">
<p class="mod-hervorhebung--partnerHervorhebung" data-hervorhebungsstufe="3">Silber Partner</p>
</div>
<picture class="trefferlisten_logo">
<source media="(min-width: 768px)" srcset="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png">
<img alt="" data-lazy-src="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png" src="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png">
</picture>
<h2 data-wipe-name="Titel">A & S Billing Pflege-Service GmbH</h2>
<p class="d-inline-block mod-Treffer--besteBranche">Ambulante Pflegedienste</p>
<div class="mod mod-Stars mod-Stars--" title="2.9/5" data-float="2,9">
<span class="mod-Stars__text" style="width: 58.000001907348632812500%;">2.9</span>
</div>
<span>2.9</span>
<span>(8)</span>
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Kirchenberg 2‑4,
<span class="nobr">
90482
Nürnberg
</span>
(Mögeldorf)
</p>
<p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">(0911) 60 00 99 77</p>
</address>
</div>
<div class="aktionsleiste_kompakt">
<div class="mod-gsSlider mod-gsSlider--noneOnWhite">
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe="{"listener":"click","name":"Trefferliste: Aktionleiste-button-links"}"></span>
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe="{"listener":"click","name":"Trefferliste: Aktionleiste-button-rechts"}"></span>
<div class="mod-gsSlider__slider" data-initialized="true">
<a class="contains-icon-homepage gs-btn" target="_blank" rel=" noopener" href="http://www.as-billing.de" data-wipe="{"listener":"click", "name":"Trefferliste Webseite-Button", "id":"122085958708"}" data-isneededpromise="false">Webseite</a>
<a class="contains-icon-email gs-btn" href="mailto:info#as-billing.de?subject=Anfrage%20%C3%BCber%20Gelbe%20Seiten" data-wipe="{"listener":"click", "name":"Trefferliste Email-Button", "id":"122085958708"}" data-isneededpromise="false">E-Mail</a>
<span class="contains-icon-route_finden gs-btn" data-wipe="{"listener":"click", "name":"Trefferliste Navigation-Button", "id":"122085958708"}" data-parameters="{"partner": "googlemaps", "searchquery": "A%20%26%20S%20Billing%20Pflege-Service%20GmbH%20Kirchenberg%202-4%2090482%20N%C3%BCrnberg"}" data-target="_blank">Route</span>
<span class="contains-icon-details gs-btn" data-wipe="{"listener":"click", "name":"Trefferliste Actionbutton Mehr Details", "id":"122085958708"}" data-parameters="{"partner": "gs", "realId": "2aeca1d2-2bc5-4070-ac4d-e16b10badca5", "tnId": "122085958708"}">Mehr Details</span>
</div>
</div>
</div>
</article>
I first used these lines
Dim post As Object
Set post = html.querySelectorAll(".mod-Treffer")
For i = 0 To post.Length - 1
Debug.Print post.Item(i).getElementsByTagName("h2")(0).innerText
Debug.Print post.Item(i).getElementsByTagName("Address")(0).getElementsByTagName("p")(1).innerText
'I am stuck with extracting the email
'HERE
Next i
Moreover, sometimes the post object doesn't contain the email information, so I need to extract it only if it is found.
Here is the code so far:
Const sURL As String = "https://www.gelbeseiten.de/Suche/Ambulante%20Pflegedienste/Bundesweit"
Dim http As MSXML2.XMLHTTP60, html As HTMLDocument
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With http
.Open "Get", sURL, False
.send
html.body.innerHTML = .responseText
End With
Dim post As Object
Set post = html.querySelectorAll(".mod-Treffer")
Dim i As Long, r As Long
Range("A1").Resize(1, 3).Value = Array("Title", "Phone", "Email")
r = 2
For i = 0 To post.Length - 1
Cells(r, 1).Value = post.Item(i).getElementsByTagName("h2")(0).innerText
Cells(r, 2).Value = post.Item(i).getElementsByTagName("Address")(0).getElementsByTagName("p")(1).innerText
Next i

Original question:
In this case I would use an attribute = value selector with the contains operator (*=) to target the href attribute by the string mailto. The CSS selector is: [href*=mailto]
If you use querySelectorAll("[href*=mailto]"), you can test whether the .Length property is greater than 0. Alternatively, use querySelector and test that the result is not Nothing, i.e. Not querySelector("[href*=mailto]") Is Nothing.
If you set the match to a variable:
Dim ele As Object
Set ele = html.querySelector("[href*=mailto]") 'html is the MSHTML.HTMLDocument from the question
If Not ele Is Nothing Then
    Debug.Print ele.href 'do something with the href to parse out email
End If
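To parse the email out of that href, one possible sketch is a small string helper. EmailFromMailto is a hypothetical name, not part of any library; it assumes the href has the form mailto:address?query, as in the sample HTML, and returns an empty string for anything else.

```vba
Option Explicit

' Hypothetical helper: returns the address part of a mailto: link,
' or an empty string when href does not start with "mailto:".
Public Function EmailFromMailto(ByVal href As String) As String
    Const PREFIX As String = "mailto:"
    Dim address As String, queryPos As Long

    ' Not a mailto: link, so leave the result empty
    If StrComp(Left$(href, Len(PREFIX)), PREFIX, vbTextCompare) <> 0 Then Exit Function

    ' Drop the prefix, then cut off any "?subject=..." query string
    address = Mid$(href, Len(PREFIX) + 1)
    queryPos = InStr(1, address, "?")
    If queryPos > 0 Then address = Left$(address, queryPos - 1)

    EmailFromMailto = address
End Function
```

Inside the If block it could be called as Debug.Print EmailFromMailto(ele.href).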
Updated question:
For the updated question, I would transfer the outerHTML of the current node in the nodeList into a surrogate HTMLDocument variable, so that I can use the querySelector method again on just that listing. I would then target the email link by its class.
Option Explicit

Public Sub GetListingInfo()
    Const URL As String = "https://www.gelbeseiten.de/Suche/Ambulante%20Pflegedienste/Bundesweit"
    Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument

    Set http = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    With http
        .Open "Get", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim post As Object, html2 As MSHTML.HTMLDocument
    Set post = html.querySelectorAll(".mod-Treffer")
    Set html2 = New MSHTML.HTMLDocument

    Dim i As Long, emailNode As Object
    With ActiveSheet
        .Range("A1").Resize(1, 3).Value = Array("Title", "Phone", "Email")
        For i = 0 To post.Length - 1
            'load the current listing into the surrogate document
            html2.body.innerHTML = post.Item(i).outerHTML
            .Cells(i + 2, 1).Value = html2.querySelector("h2").innerText
            .Cells(i + 2, 2).Value = html2.querySelector(".mod-AdresseKompakt__phoneNumber").innerText
            Set emailNode = html2.querySelector(".contains-icon-email")
            'write the email only if the listing has one: drop the mailto: prefix and any ?subject=... query
            '(appending "?" guarantees Split returns at least one element)
            If Not emailNode Is Nothing Then .Cells(i + 2, 3).Value = Split(Replace$(emailNode.href, "mailto:", vbNullString) & "?", "?")(0)
        Next i
    End With
End Sub
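As an alternative that avoids the surrogate document, a sketch like the following loops over the anchors of one listing and returns the first mailto: link, wherever it sits in the list. FirstMailtoHref is a hypothetical helper name; it assumes node is an MSHTML element such as post.Item(i).

```vba
Option Explicit

' Hypothetical helper: returns the href of the first <a> inside node whose
' href starts with "mailto:", or an empty string if there is none.
Public Function FirstMailtoHref(ByVal node As Object) As String
    Dim anchors As Object, i As Long, href As String

    Set anchors = node.getElementsByTagName("a")
    For i = 0 To anchors.Length - 1
        ' Concatenating with vbNullString coerces a missing href to ""
        href = anchors.Item(i).href & vbNullString
        If InStr(1, href, "mailto:", vbTextCompare) = 1 Then
            FirstMailtoHref = href
            Exit Function
        End If
    Next i
End Function
```

In the loop it could be used as href = FirstMailtoHref(post.Item(i)), followed by a check that href is not empty before writing to the sheet, so the position of the email anchor no longer matters.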

Thanks a lot.
I could figure it out using these lines
If InStr(post.Item(i).getElementsByTagName("a")(1).href, "mailto:") Then
Debug.Print Split(Split(post.Item(i).getElementsByTagName("a")(1).href, "mailto:")(1), "?")(0)
End If
But I welcome any other suggestions to improve and learn more.
* After testing, I encountered an error when the email is not found within the element. How can I avoid the error? I could use On Error Resume Next, but I would rather handle the error than skip over it.
** Edit:
I could solve the second point by using this structure
Dim emailObj As Object
Set emailObj = post.Item(i).getElementsByTagName("a")(1)
If Not emailObj Is Nothing Then
    If InStr(emailObj.href, "mailto:") Then
        Debug.Print Split(Split(emailObj.href, "mailto:")(1), "?")(0)
    End If
End If
The code works, but sometimes the email is not grabbed correctly. That is because of this line:
Set emailObj = post.Item(i).getElementsByTagName("a")(1)
Sometimes the email link is not the anchor at index 1. So my last question: how can I get the email data regardless of its index?
Inside the loop, I tried this line and played around with it, to no avail:
Set aNodeList = post.Item(i).querySelectorAll(".contains-icon-email")(0)


Related

How to get elements that are out of Parent Class

I am trying to extract some data from the web. However, not all of the information that I need is in the parent class. I can get the information that is in the parent class.
QUESTION - Is there a way to get data if it is outside of the parent class? Or is there a way to set up the code below to extract without using a parent class?
Link
I am using IE as it allows me to search the site. I have tried several code variations; however, the extra information is not in the parent class that I am trying to extract from.
I am after the name, location and social media links. The location is at the top of the webpage, outside the class.
I tried to use shop-home as the parent class, as all the other classes fall inside it, but it did not work. I have never tried to get data that is not in the parent class, so I am not 100% sure how to do it. SIM helped with this: element.ParentNode.ParentNode.getElementsByClassName, as the product URL was before the parent. I have been trying to use this for all the other data that is outside the parent, but I cannot get it to work. I do not fully understand it; if someone could explain what the .ParentNode.ParentNode. is doing, that would help my understanding and I might be able to work the rest out myself.
The code below is for the first two items, which pull off fine. The code layout is the same for all the other items, except that it is as If element.getElementsByClassName("CLASS HERE")(0). I have tried using ID, tag, span and so on, e.g. If element.getElementsByClassName("CLASS HERE")(0).getElementsByTagName("Span")(0)
Application.ScreenUpdating = False
Set HTML = objIE.document
''''########## Setting the Parent Class HERE ##########
Set elements = HTML.getElementsByClassName("v2-listing-card__info")
''''Scrolls Down the Browser
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
''''FOR LOOP
For Each element In elements
''' Element 1
If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
Else
HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
End If
''' Element 2
If element.getElementsByTagName("h3")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
Else
HtmlText = element.getElementsByTagName("h3")(0).innerText ' Get CLASS and Child Nod 'src
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
End If
''' Element 3
RESULTS - Data in red is wrong or missing, as it is not in the above parent class.
The shipping in column H pulls off fine, as it is in the parent. If there is no shipping info, then a hyphen goes into the cell. The items for columns C, D and E are outside the parent class that I am using.
<div class="flex-grow-1">
<div class="max-width-760px ">
</div>
<div class="max-width-676px">
<div class="">
<p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." data-use-inplace-input="1"
data-add-class="normal story-headline-edit-link"></p>
</div>
<div class="">
<div id="about-story" class="" aria-hidden="false">
<p class="about-story text-body-larger text-gray-lighter ">
<span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span>
</p>
</div>
<div class="wt-text-center-xs">
</div>
</div>
</div>
<div class="wt-mb-xs-6 wt-mb-md-8">
<div class="clearfix"></div>
<div>
<h3 class="wt-text-title-01"></h3>
<div class="pt-xs-2 pt-lg-4">
<div class="display-flex-md flex-wrap max-width-760px">
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span>
<span>Facebook</span>
</a>
</div>
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z 
M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span>
<span>Instagram</span>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="wt-mb-xs-8 wt-mb-md-10">
<div class="clearfix"></div>
<div class="about-section display-flex-md flex-direction-column-md mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members">
<div class="p-xs-0">
<h3 class="wt-text-title-01">Shop members</h3>
</div>
<div class="pl-xs-0 pr-xs-0 pt-xs-2 pt-lg-4">
<div class="max-width-760px">
<ul class="list-unstyled block-grid-md-2" data-region="shop-member-list">
<li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner"
data-member-name="Lucky Plum Studio">
<div class="flag">
<div class="flag-img vertical-align-top pr-lg-3">
<img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48">
</div>
<div class="flag-body">
<h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6>
<p class="prose" data-region="member-role">Owner</p>
<p class="text-gray-lighter mb-xs-0" data-region="member-bio">
</p>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="">
</div>
</div>
As Always thanks in advance
''######### updated today 22/3/2021 at 6pm uk time #########
In reply to QHarr's answer: I had this for the location and nothing was collected. Could you please explain where I went wrong? I should then be able to fix the rest.
''' Element 4
DoEvents
If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then ' Get CLASS and Child Nod
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
Else
HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
End If
Not sure what to say except read up on html and html document methods/ css selectors so you understand the patterns you need to apply. The rest is just practice and learning which are the fastest and more robust methods.
CSS:
Location: .shop-location span selects a span element that is a descendant of an element with class shop-location.
Social media links: #about .text-decoration-none selects elements with the class text-decoration-none that are descendants of the element with id about.
Name: [data-region='member-name'] selects the element whose data-region attribute has the value member-name.
Read about css selectors and descendant combinator here
Practice css selectors here
Learn about html here
VBA:
Option Explicit
Public Sub GetInfo()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .querySelector(".shop-location span").innerText 'location
Dim i As Long, socialMedias As Object
Set socialMedias = .querySelectorAll("#about .text-decoration-none")
For i = 0 To socialMedias.Length - 1 'media links
Debug.Print socialMedias.Item(i).href
Next
Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
End With
.Quit
End With
End Sub
Less optimal methods for selecting:
Option Explicit
Public Sub GetInfo()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
Dim i As Object, socialMedias As Object
Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
For Each i In socialMedias 'media links
Debug.Print i.href
Next
Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
End With
.Quit
End With
End Sub

CSS selector QuerySelector alternative

I have searched extensively for material about how to get meta data using XMLHTTP, and I think it is impossible to do that using the early-binding method. The only approach that seems to work is late binding via CreateObject("HTMLFile") and working with that late-bound HTML document. The disadvantage of this approach is that it doesn't support querySelector or querySelectorAll.
Now I am trying to find an alternative to this CSS selector, without using querySelector:
Set post = .querySelector("table div span[itemprop='lowPrice']")
This raises an error, and I can't find an easier way to find the element.
Here's the HTML content
<table class="p">
<tbody><tr>
<td class="foto">
<div class="foto">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/#gallery-open" target="_blank" class="gallery-link product-detail__gallery-link" onclick="dataLayer.push({'event':'sendEvent','event_category':'Product Detail - Desktop','event_action':'Gallery','event_label':'Otev\u0159en\u00ed galerie','event_value':0});">
<img src="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9--mmf250x250.jpg" alt="Brit Premium by Nature Adult L 15 kg" width="250" height="250" id="picture-main">
<span class="image-hover">
<span class="image-overlay"></span>
<span class="js-test-image-count-info image-count-info">Galerie <span class="picture-count">(2)</span></span>
</span>
<span class="product-detail__gallery-link__image__count-info">Galerie
<span class="product-detail__gallery-link__image__count-info__count">(2)</span>
</span>
</a>
<span>Top</span><strong>1.</strong>
<div class="poty-ico">
<img src="https://im9.cz/iR/recenze-externi/107.png" alt="Produkt Roku 2019" class="product-of-year-badge"></div>
</div>
</td>
<td>
<div class="main-info">
<div class="text-cover">
<div id="n649054946" data-id="649054946" class="item js-public-product-id">
<h2 itemprop="name">Brit Premium by Nature Adult L 15 kg</h2>
</div>
<div class="rating-box" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<p class="eval">
<strong itemprop="ratingValue">95%</strong>
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section">
<span class="rating"><span class="hidden">Hodnocení produktu: 95%</span><span class="over" title="Hodnocení produktu: 95%"><span style="width: 75px;"></span></span></span>
</a>
</p>
<span class="hidden-microdata" itemprop="ratingCount">
456
</span>
<p class="review-count delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/recenze/#section" class="gtm-header-link" data-gtm-link-description="Počet recenzí">
<span itemprop="reviewCount">344</span>
recenzí
</a>
</p>
<div class="cleaner"></div>
<p class="rating-box__item rating-box__favourite">
Přidat do oblíbených
</p>
<p id="cli649054946" class="rating-box__item rating-box__compare delimiter-blank cl-add">
<a class="checkbox gtm-header-link" data-gtm-link-description="Akce - porovnání" href="#" title="Porovnat">Přidat do porovnání</a>
</p>
<p class="delimiter-blank rating-box__item rating-box__price-watch js-price-watch-button">
<a href="#" title="Hlídat cenu" class="gtm-header-link" data-gtm-link-description="Akce - hlídat cenu">
Hlídat cenu
</a>
</p>
<p class="add-review rating-box__item rating-box__add-review delimiter-blank">
<a href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/pridat-uzivatelskou-recenzi/#section" class="gtm-header-link" data-gtm-link-description="Akce - přidat recenzi">
Přidat recenzi
</a>
</p>
</div>
<div id="top-shop-info" class="top-shop-info">
<div class="inner">
<div class="guar">
<div>
<img class="guar-badge" src="https://im9.cz/css-v2/images/guaranty-seal.png?1" alt="Garance nákupu - SpokojenyPes.cz" width="27" height="34">
</div>
</div>
<div class="shop-claim bold">
<strong>Produkt vám dodá:</strong>
</div>
<div class="shop-logo">
<a href="https://www.heureka.cz/exit/spokojenypes-cz/3180319922/?z=41" target="_blank" rel="nofollow noopener" class="gtm-header-link" data-gtm-link-description="Exit - produkt vám dodá">
<img src="https://im9.cz/iR/importobchod-orig/1983_logo--mmf130x40.png" alt="SpokojenyPes.cz" width="130" height="40">
</a>
</div>
<div class="recommendation">
<a href="https://obchody.heureka.cz/spokojenypes-cz/recenze/" class="gtm-header-link" data-gtm-link-description="Hodnocení - Produkt vám dodá">
99% zákazníků doporučuje obchod
</a>
</div>
<div class="delivery-info bold price-delivery-free">
Doprava zdarma
</div>
<div class="availability-info bold in-stock">
skladem
</div>
</div>
<a data-gtm-link-description="Další nabídky" id="top-shop-count-info" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/porovnat-ceny/#section" class="top-shop-count-info box-active gtm-header-link">Dalších 134 nabídek od 728 Kč</a>
</div>
<p class="desc">
<span id="product-short-description">
Kompletní krmivo Brit Premium pro dospělé psy. Kuřecí receptura pro dospělé psy velkých plemen (25 - 45 kg).
<a id="product-short-description-button" href="https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section" title="celá specifikace Brit Premium by Nature Adult L 15 kg">celá specifikace</a>
</span>
</p>
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/AggregateOffer" style="display:none">
<span itemprop="lowPrice">728.00</span>
<span itemprop="highPrice">1579.00</span>
<span itemprop="offerCount">135</span>
<link itemprop="availability" href="http://schema.org/InStock">
</div>
<div itemprop="offers" itemscope="" itemtype="http://schema.org/Offer" class="price-from shopping-cart">
<link itemprop="itemCondition" href="http://schema.org/OfferItemCondition" content="http://schema.org/NewCondition">
<link itemprop="availability" href="http://schema.org/InStock">
<link itemprop="category" href="http://schema.org/category" content="Hobby / Chovatelství / Pro psy / Krmivo pro psy">
<link itemprop="image" href="http://schema.org/image" content="https://im9.cz/iR/importprodukt-orig/4c2/4c2b1733c8b233edd5052d3063ac46d9.jpg">
<div class="top-left">
<div id="top-button" class="buy-click-observed">
<p class="buy">
<a href="#" class="flat-button flat-button--top-position flat-button--orange buy-btn hb hb-3180319922 js-top-pos-btn" data-cart-position="0">
<i class="ico basket"></i>
<i class="ico check"></i>
<span class="in">Koupit na Heurece</span>
<span class="in replace">Přidáno do košíku</span>
</a>
</p>
</div>
<div class="n" id="top-offer-price">
<p class="buy-price">
<span itemprop="price" class="js-top-price" content="839.00">839 Kč</span>
<span class="price-vat-title small">s DPH</span>
<span itemprop="priceCurrency" content="CZK"></span>
</p>
</div>
<div class="clear"></div>
<div class="js-top-gifts-info top-shop-gifts-info-box">
</div>
</div>
<div class="clear"></div>
<div class="clear"></div>
</div>
<span id="new-pd"></span>
<script>
(function() {
loadScript("https:\/\/im9.cz\/js\/cache\/7e39f733-1-42bd9e7837b830d87e1af94da6d0e4a82055c56f.hash.js", function () {
var productHeadObserver = new ProductHeadObserver({ 'topShortDescElm': $('product-short-description'), 'topShopBox': $('top-shop-info'), 'maxOfferNameLength': 90 });
productHeadObserver.oneOfferInit();
});
H.Awards._reviewClick($$('#awards-list span.pa'));
var notSelectedCallback = function() {
if ('undefined' != typeof H.ShoppingCartHelper.BuyMoreOptions &&
typeof H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback == 'function') {
H.ShoppingCartHelper.BuyMoreOptions.buyClickNotSelectedCallback();
}
};
H.ShoppingCartHelper.observeBuyClick($('top-button'), new H.ShoppingCart(), notSelectedCallback, 'js-top-pos-btn');
})();
</script>
<div class="clear"></div>
</div>
</td>
</tr>
</tbody></table>
This is the whole HTML
https://pastebin.com/Dgu1wk2b
Here's the code so far
Sub MyTest()
Dim source As Object
Dim obj As Object
Dim resp As String
Dim post As Object
Dim a, i As Long
With CreateObject("MSXML2.xmlHttp")
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
resp = .responseText
End With
With CreateObject("HTMLFile")
.write resp
Set post = .getElementsByTagName("meta")
For i = 0 To post.Length - 1
On Error Resume Next
Debug.Print post.item(i).getAttribute("name")
If post.item(i).getAttribute("name") = "gtm:product_id" Then
Cells(2, 1).Value = post.item(i).Value
End If
If post.item(i).getAttribute("name") = "gtm:product_name" Then
Cells(2, 3).Value = post.item(i).Value
End If
If post.item(i).getAttribute("name") = "gtm:product_brand" Then
Cells(2, 4).Value = post.item(i).Value
End If
On Error GoTo 0
Next i
Set post = Nothing
Set post = .getElementsByTagName("link")
For i = 0 To post.Length - 1
On Error Resume Next
If post.item(i).getAttribute("rel") = "canonical" Then
Cells(2, 2).Value = post.item(i).href
End If
On Error GoTo 0
Next i
'I am stuck here
'Set post = .querySelector("table div span[itemprop='lowPrice']")
'Debug.Print .getElementsByTagName("table")(0).innerHTML
End With
End Sub
As you have discovered, the HEAD tag info (where the meta elements live) is stripped out when you use document.body.innerHTML = .responseText with an early-bound MSHTML.HTMLDocument. That is what you would expect, considering what you are populating (document.body), and it is why you are unable to select the meta info. With your late-bound HTMLFile (where you can't use querySelector), you are using the .write method, which writes to the whole document (HTMLFile) and thereby retains the HEAD info.
If you wish to use early binding, you need to ensure that the HEAD info ends up within BODY tags: either as part of the response body, or by extracting the HEAD, wrapping it in new BODY tags and writing it to an HTMLDocument.
E.g., for clarity, I am writing only the HEAD info between BODY tags (without the rest of the existing response):
Option Explicit
Public Sub MetaInfoEarlyBound()
Dim html As MSHTML.HTMLDocument, htmlHead As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60
Dim re As VBScript_RegExp_55.RegExp
Set htmlHead = New MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
Set xhr = New MSXML2.XMLHTTP60
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "<head>([\s\S]+)<\/head>"
With xhr
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
htmlHead.body.innerHTML = Replace$(Replace$(re.Execute(.responseText)(0), "<head>", "<body>"), "</head>", "</body>")
html.body.innerHTML = .responseText
End With
Debug.Print htmlHead.querySelector("[name='gtm:product_price']").Value
Debug.Print html.querySelector("[itemprop=lowPrice]").innerText
End Sub
As an aside, here are two late-bound methods to achieve your goal, both shorter than the one in the current other answer. Note that I have commented one of them out.
Public Sub MetaInfoLateBound()
Dim resp As String
With CreateObject("MSXML2.xmlHttp")
.Open "GET", "https://krmivo-psy.heureka.cz/brit-premium-by-nature-adult-l-15-kg/specifikace/#section", False
.send
resp = .responseText
End With
With CreateObject("HTMLFile")
.write resp
' Dim post As Object
'
' Set post = .getElementById("new-pd")
' Debug.Print post.PreviousSibling.PreviousSibling.getElementsByTagName("span")(0).innertext
'
Dim metas As Object, i As Long
Set metas = .getElementsByTagName("meta")
For i = 0 To metas.Length - 1
If metas.Item(i).Name = "gtm:product_price" Then
Debug.Print metas.Item(i).Value
Exit For
End If
Next
End With
End Sub
Try this:
With CreateObject("HTMLFile")
.Open
.write resp
.Close
For Each tbl In .getElementsByTagName("table")
For Each dv In tbl.getElementsByTagName("div")
If dv.getattribute("itemprop") = "offers" Then '<<EDIT
For Each spn In dv.getElementsByTagName("span")
attr = ""
attr = spn.getattribute("itemprop")
If Len(attr) > 0 Then
If attr = "lowPrice" Then
Debug.Print spn.outerhtml
Debug.Print spn.innerText
End If
End If
Next spn
End If
Next dv
Next tbl
End With

PreviousSibling doesn't return value using querySelector

I am trying to extract two parts from a local HTML file located at C:\Sample.html.
I used QHarr's code from another thread, like this:
Sub Test()
Dim html As HTMLDocument, post As Object, i As Long
Set html = GetHTMLFileContent("C:\Sample.html")
Set post = html.querySelectorAll("span.course-player__chapter-item__completion")
For i = 0 To post.Length - 1
ActiveSheet.Cells(i + 1, 1) = Trim(post.item(i).innerText)
ActiveSheet.Cells(i + 1, 2) = post.item(i).PreviousSibling.innerText
Next i
End Sub
Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
Dim fso As Object, hFile As Object, hString As String, html As HTMLDocument
Set html = New HTMLDocument
Set fso = CreateObject("Scripting.FileSystemObject")
Set hFile = fso.OpenTextFile(filePath)
Do Until hFile.AtEndOfStream
hString = hFile.ReadAll()
Loop
html.body.innerHTML = hString
Set GetHTMLFileContent = html
End Function
The code works fine and grabs the innerText of the element in the post.item(i).innerText part.
But when I try to get the innerText of the previous sibling, it doesn't return anything.
Here's a snapshot of the HTML:
<div class="course-player__chapter-item__header _chapter-item__header_d57kmg ui-accordion-header ui-corner-top ui-state-default ui-accordion-icons ui-accordion-header-active ui-state-active" role="tab" id="ui-id-1" aria-controls="ui-id-2" aria-selected="true" aria-expanded="true" tabindex="0"><span class="ui-accordion-header-icon ui-icon ui-icon-triangle-1-s"></span>
<h2 tabindex="-1" class="course-player__chapter-item__title _chapter-item__title_d57kmg">
<span class="course-player__progress _chapter-item__progress_d57kmg">
<span data-percentage-completion="100" class="_chapter-item__progress-ring_d57kmg">
<span class="progress-ring__ring _progress-ring__ring_jgsecr">
<span class="progress-ring__mask progress-ring--full _progress-ring__mask_jgsecr _progress-ring--full_jgsecr">
<span class="progress-ring--fill brand-color__background _progress-ring--fill_jgsecr"></span>
</span>
<span class="progress-ring__mask progress-ring--half _progress-ring__mask_jgsecr ">
<span class="progress-ring--fill brand-color__background _progress-ring--fill_jgsecr"></span>
<span class="progress-ring--fill progress-ring--fix _progress-ring--fill_jgsecr _progress-ring--fix_jgsecr"></span>
</span>
</span>
<span class="progress-ring__ring-inset _progress-ring__ring-inset_jgsecr"></span>
<span class="progress-ring__checkmark brand-color__text _progress-ring__checkmark_jgsecr"><i aria-label="Completed" class="toga-icon toga-icon-checkmark"></i></span>
</span>
</span>
INTRO TO VBA - Overview
<!---->
<span class="course-player__chapter-item__completion _chapter-item__completion_d57kmg">
10 / 10
</span>
<span class="course-player__chapter-item__toggle _chapter-item__toggle_d57kmg">
<i aria-hidden="true" class="chapter-item__toggle-icon toga-icon toga-icon-caret-stroke-down _chapter-item__toggle-icon_d57kmg"></i>
</span>
</h2>
</div>
As a workaround, I used a CSS selector that returns the whole heading text, h2[class='course-player__chapter-item__title _chapter-item__title_d57kmg'], and then split the output into two columns:
Sub Test()
Dim x, html As HTMLDocument, post As Object, s As String, i As Long
Set html = GetHTMLFileContent("C:\Sample.html")
Set post = html.querySelectorAll("h2[class='course-player__chapter-item__title _chapter-item__title_d57kmg']")
For i = 0 To post.Length - 1
x = Split(Trim(post.item(i).innerText), " ")
s = Join(Array(x(UBound(x)), x(UBound(x) - 1), x(UBound(x) - 2)), " ")
ReDim Preserve x(0 To UBound(x) - 3)
ActiveSheet.Cells(i + 1, 1) = Trim(Join(x, " "))
ActiveSheet.Cells(i + 1, 2) = Trim(s)
Next i
End Sub
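A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the HTML shown, the node directly before the completion span is not an element. It is a text node (the chapter title or plain whitespace) or the <!----> comment, and text nodes expose their content through NodeValue rather than innerText. The sub below walks backwards through the siblings until it finds a non-empty text node. It assumes a reference to the Microsoft HTML Object Library and reuses the GetHTMLFileContent function above; the names ChapterTitlesViaSiblings and CleanText are made up for illustration.

```vba
Option Explicit

' Sketch only: assumes GetHTMLFileContent (from the question) is in the same module.
Public Sub ChapterTitlesViaSiblings()
    Dim html As MSHTML.HTMLDocument, spans As Object, node As Object
    Dim i As Long, title As String

    Set html = GetHTMLFileContent("C:\Sample.html")
    Set spans = html.querySelectorAll("span.course-player__chapter-item__completion")

    For i = 0 To spans.Length - 1
        title = vbNullString
        Set node = spans.item(i).PreviousSibling
        ' Step back over comments and whitespace until a text node with content is found
        Do While Not node Is Nothing
            If node.NodeType = 3 Then               ' 3 = text node
                title = CleanText(node.NodeValue)
                If Len(title) > 0 Then Exit Do
            End If
            Set node = node.PreviousSibling
        Loop
        ActiveSheet.Cells(i + 1, 1).Value = CleanText(spans.item(i).innerText)
        ActiveSheet.Cells(i + 1, 2).Value = title
    Next i
End Sub

' Hypothetical helper: replaces line breaks and tabs with spaces, then trims.
Private Function CleanText(ByVal s As String) As String
    s = Replace(Replace(Replace(s, vbCr, " "), vbLf, " "), vbTab, " ")
    CleanText = Trim$(s)
End Function
```

Whether whitespace-only text nodes are present at all depends on how MSHTML parses the file, which is why the loop checks the node type instead of hard-coding PreviousSibling.PreviousSibling.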

Excel VBA - Get Link from HTML Anchor Where There is No ID Getting [object HTMLDivElement]

I'm trying to get the href value from a link on a webpage, where the anchor tag does not have an ID.
The method that I'm using below returns [object HTMLDivElement] as the value for aEle. How do I get the actual link from the anchor tag within the h2 class="title" tag?
The HTML looks like this:
<div id="searchResults" class="searchResults_clear">
<div class="prodFeatures">
<div class="inner">
<a title="Product Name 1" class="img" href="/product1.html"><img alt='' src="/products/product1.jpg" /></a>
<div class="details">
<div class="info">
<p class="mfg">Mfg 1</p>
<h2 class="title">Product Name 1</h2>
<div class="SKU">
SKU #1
</div>
<p class="model">Model #1</p>
</div>
</div>
</div>
<div class="prodFeatures">
<div class="inner">
<a title="Product Name 2" class="img" href="/product2.html"><img alt='' src="/products/product2.jpg" /></a>
<div class="details">
<div class="info">
<p class="mfg">Mfg 2</p>
<h2 class="title">Product Name 2</h2>
<div class="SKU">
SKU #2
</div>
<p class="model">Model #2</p>
</div>
</div>
</div>
</div>
I've tried a few different methods that I found through StackOverflow. This method appears to come the closest. (I didn't save the links, but that would have been helpful for everyone, wouldn't it?)
It seems like this should work:
(I've only shown that part of the sub that is responsible for grabbing the data. The rest of it works fine.)
Dim objIE As InternetExplorer
Dim aEle As HTMLLinkElement
Dim y As Integer
Dim result As String
'set iteration counter
i = 1
'for each <a> element in the collection of objects with class of 'info'...
For Each aEle In objIE.document.getElementsByClassName("info")
MsgBox (aEle)
' limit to 3 items returned per search term
If i = 4 Then
Exit For
End If
'...count result and print it to Sheet2 in col A
Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = i
Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).EntireColumn.AutoFit
Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).EntireColumn.HorizontalAlignment = xlCenter
Debug.Print i
'...print search terms used in Sheet2 in col B
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 1).Value = searchCell
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 1).WrapText = False
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 1).EntireColumn.AutoFit
Debug.Print searchCell
If InStr(aEle, " ") = 1 Then
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 2).Value = "Nothing Found"
Debug.Print "Nothing Found"
GoTo nextSearch
Else
'...get the description within the element and print it to Sheet2 in col C
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 2).Value = aEle.innerText
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 2).WrapText = False
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 2).EntireColumn.AutoFit
Debug.Print paraText
End If
'...get the href link and print it to Sheet2 in col D, next blank row
result = aEle
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 3).Value = result
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 3).WrapText = False
Worksheets("Results").Range("A1048576").End(xlUp).Offset(0, 3).EntireColumn.AutoFit
Debug.Print result
'increment our iteration counter
i = i + 1
'repeat times the # of ele's we have in the collection
Next aEle
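For what it's worth, getElementsByClassName("info") returns the div elements that carry that class, not anchors, and passing such an element to MsgBox or assigning it to a String gives its string form, [object HTMLDivElement]. A minimal sketch follows, assuming the markup is as posted, so that the link is the a element with class "img" inside #searchResults. If the real page nests a link inside h2.title instead, the same .href property would apply to that anchor. PrintProductLinks and its doc parameter are hypothetical names.

```vba
Option Explicit

' Sketch only: doc is expected to be a loaded HTML document, e.g. objIE.document.
Public Sub PrintProductLinks(ByVal doc As Object)
    Dim container As Object, a As Object, n As Long

    Set container = doc.getElementById("searchResults")
    If container Is Nothing Then Exit Sub          ' results block not found

    For Each a In container.getElementsByTagName("a")
        If a.className = "img" Then                ' the product link in the posted HTML
            n = n + 1
            ' title attribute holds the product name; href is the link itself
            Debug.Print n, a.getAttribute("title"), a.href
            If n = 3 Then Exit For                 ' same limit of 3 as in the question
        End If
    Next a
End Sub
```

It would be called as PrintProductLinks objIE.document once the page has finished loading.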

VBA - How can I import a List on a website into VBA

I am trying to import a list of names and results from a website. The website requires a login, so I have attached the page source below:
<div class="row reports">
<div class="twelve columns end">
<h2 class="green section-title">Student completion</h2>
<div class="sub-menu">
<ul class="inline-list completetion-filter">
<li>View:</li>
<li>All</li>
<li>Completed</li>
<li>Not completed</li>
<li>Marked</li>
</ul>
</div>
<div class="scroll-list-wrapper scrollable">
<ol class="filter-list">
<li class="not-completed " >
<span class="number purple"></span>
<span class="name">
StudentA </span>
<span class="status">
Awaiting submission </span>
</li>
<li class="complete marked" >
<span class="number purple"></span>
<span class="name">
StudentB </span>
<span class="status">
62.5% </span>
View answers
</li>
<li class="not-completed " >
<span class="number purple"></span>
<span class="name">
StudentC </span>
<span class="status">
Awaiting submission </span>
</li>
<li class="complete marked" >
<span class="number purple"></span>
<span class="name">
StudentD </span>
<span class="status">
100% </span>
I have tried using outerHTML and similar approaches, but this list is never included in the string for some reason. HTML really isn't my forte, so if you could give me some help I would greatly appreciate it.
Thanks
My VBA as requested:
Sub getData()
Const Hyper As String = "http://members.gcsepod.com/teachers/assignments"
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("http://members.gcsepod.com/podauth/login")
Do
If ie.ReadyState = 4 Then
'ie.Visible = False
Exit Do
Else
DoEvents
End If
Loop
ie.Visible = True
ie.document.Forms(0).all("username").Value = "****"
ie.document.Forms(0).all("password").Value = "****"
ie.document.Forms(0).submit
Do
If ie.ReadyState = 4 Then
'ie.Visible = False
Exit Do
Else
DoEvents
End If
Loop
ie.Navigate ("http://members.gcsepod.com/teachers/assignments")
ie.Visible = True
Do
If ie.ReadyState = 4 Then
'ie.Visible = False
Exit Do
Else
DoEvents
End If
Loop
Const MASK$ = "data-id="
Dim txt As String, i As Long
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://members.gcsepod.com/teachers/assignments", False
.Send
txt = .ResponseText
End With
Do
i = InStr(i + 1, txt, MASK)
If i = 0 Then Exit Do
'Debug.Print Val(Mid$(txt, i + Len(MASK), 15))
Dim idNum(0 To 10) As String
idNum(0) = Mid(txt, i + 8, 6)
idCount = 0
Do While i > 0
idCount = idCount + 1
txt = Right(txt, Len(txt) - (i + 8))
i = InStr(txt, "data-id=")
'ReDim Preserve idNum(0 To idCount) As String
idNum(Count) = Mid(peopleData, pos + 8, 4)
Loop
Loop
I have also tried the following:
txt = ie.document.all.tags("ol").Item(0).outerHTML
Are you authorized to access the relevant database(s) at that domain? If you are, it would save a lot of time to do this with an Access + SQL Server (or MySQL) solution.
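If scraping the page is the only option, one approach is to read the list from the document Internet Explorer has already rendered after login, rather than from a separate MSXML2.XMLHTTP request, which may not carry the logged-in session. The sketch below assumes the list markup is as posted (li items containing span.name and span.status) and that it is present in ie.document once the page has loaded. ListStudents and its doc parameter are hypothetical names.

```vba
Option Explicit

' Sketch only: doc is expected to be the loaded assignments page, e.g. ie.document.
Public Sub ListStudents(ByVal doc As Object)
    Dim li As Object, sp As Object
    Dim r As Long, nm As String, st As String

    For Each li In doc.getElementsByTagName("li")
        nm = vbNullString
        st = vbNullString
        ' Pick out the name and status spans inside this list item
        For Each sp In li.getElementsByTagName("span")
            Select Case sp.className
                Case "name": nm = Trim$(sp.innerText)
                Case "status": st = Trim$(sp.innerText)
            End Select
        Next sp
        ' Items without a name span (e.g. the View/All/Completed filter) are skipped
        If Len(nm) > 0 Then
            r = r + 1
            ActiveSheet.Cells(r, 1).Value = nm
            ActiveSheet.Cells(r, 2).Value = st
        End If
    Next li
End Sub
```

It would be called as ListStudents ie.document after the last ReadyState wait loop. If the list is filled in by script after the page loads, an extra wait may be needed before the spans appear.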