VBA Webscrape HTML Coinmarketcap

I am trying to scrape the number of cryptocurrencies shown in the top-left corner of https://coinmarketcap.com/. I tried to find the "tr" element but could not, and I am not sure how to grab that value at the top left of the page.
Here is what I have so far. It throws runtime error 438: Object doesn't support this property or method.
Sub cRYP()
Dim appIE As Object
Dim allRowOfData As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://coinmarketcap.com/"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.Document.getElementById("__next")
' Error 438 is raised on the next line: a DIV element has no Cells property
Dim myValue As String: myValue = allRowOfData.Cells(16).innerHTML
appIE.Quit
Range("A1").Value = myValue
End Sub

There is no tr tag, because there is no table. First you must get the HTML structure that contains the value you want, because there is no way to get it directly. That is the structure with the class name container. Because the method getElementsByClassName() returns a node collection, you must pick the right structure by its index in the collection. That's easy, because it's the first one. The first index of a collection is 0, like in an array.
Then you have this HTML structure:
<div class="container">
<div><span class="sc-2bz68i-0 cVPJov">Cryptos
<!-- -->: 17.826
</span><span class="sc-2bz68i-0 cVPJov">Exchanges
<!-- -->: 459
</span><span class="sc-2bz68i-0 cVPJov">Market Cap
<!-- -->: €1,536,467,483,857
</span><span class="sc-2bz68i-0 cVPJov">24h Vol
<!-- -->: €105,960,257,048
</span><span class="sc-2bz68i-0 cVPJov">Dominance
<!-- -->: <a href="/charts/#dominance-percentage" class="cmc-link">BTC
<!-- -->:
<!-- -->42.7%
<!-- -->
<!-- -->ETH
<!-- -->:
<!-- -->18.2%
</a>
</span><span class="sc-2bz68i-0 cVPJov"><span class="icon-Gas-Filled" style="margin-right:4px;vertical-align:middle"></span>ETH Gas
<!-- -->: <a>35
<!-- -->
<!-- -->Gwei<span class="sc-2bz68i-1 cEFmtT icon-Chevron-down"></span>
</a>
</span></div>
<div class="rz95fb-0 jKIeAa">
<div class="sc-16r8icm-0 cPgeGh nav-item"></div>
<div class="rz95fb-1 rz95fb-2 eanzZL">
<div class="cmc-popover">
<div class="cmc-popover__trigger"><button title="Change your language" class="sc-1kx6hcr-0 eFEgkr"><span class="sc-1b4wplq-1 kJnRBT">English</span><span class="sc-1b4wplq-0 ifkbzu"><span class="icon-Caret-down"></span></span></button></div>
</div>
</div>
<div class="rz95fb-1 cfBxiI">
<div><button title="Select Currency" data-qa-id="button-global-currency-picker" class="sc-1kx6hcr-0 eFEgkr"><span class="sc-1q0bpva-0 hEPBWj"></span><span class="sc-1bafwtq-1 dUQeWc">EUR</span><span class="sc-1bafwtq-0 cIzAJN"><span class="icon-Caret-down"></span></span></button></div>
</div><button type="button" class="sc-1kx6hcr-0 rz95fb-6 ccLqrB cmc-theme-picker cmc-theme-picker--day"><span class="icon-Moon"></span></button>
</div>
</div>
As you can see, the value you want is part of the first a tag in the scraped structure. We can simply get that tag with the method getElementsByTagName(). This also returns a node collection, and again we need its first element, at index 0.
Then we have this:
17.826
Now we only need the innerText of this element, and that's it.
Here is the VBA code. I don't use IE, because it has finally reached end of life (EOL) and shouldn't be used anymore. You can load CoinMarketCap without any parameters via XHR (XML HTTP request):
Sub CryptosCount()
Const url As String = "https://coinmarketcap.com/"
Dim doc As Object
Dim nodeCryptosCount As Object
' Late-bound HTML document that the response is parsed into
Set doc = CreateObject("htmlFile")
' Synchronous GET request; no browser is needed
With CreateObject("MSXML2.XMLHTTP.6.0")
.Open "GET", url, False
.Send
If .Status = 200 Then
doc.body.innerHTML = .responseText
' First element with class "container", then the first <a> inside it
Set nodeCryptosCount = doc.getElementsByClassName("container")(0).getElementsByTagName("a")(0)
MsgBox "Number of cryptocurrencies on Coinmarketcap: " & nodeCryptosCount.innerText
Else
MsgBox "Page not loaded. HTTP status " & .Status
End If
End With
End Sub
Edit:
As I see now, you can also get the value directly by using
getElementsByClassName("cmc-link")(0)
You can experiment with the code to learn more.
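Whichever selector is used, innerText comes back as text such as "Cryptos: 17.826" or just "17.826", where the separator appears to be a thousands separator (the page in the question was rendered with EUR formatting). A minimal sketch of a hypothetical helper, ParseCryptoCount, that turns that text into a number, assuming "." and "," are only ever thousands separators in this figure:

```vba
' Hypothetical helper, not part of the answer above.
' Turns text such as "Cryptos: 17.826" or "17,826" into 17826.
' Assumes "." and "," are only thousands separators in this figure.
Public Function ParseCryptoCount(ByVal txt As String) As Long
    Dim s As String
    ' Keep only the part after the last colon, if there is one
    If InStr(txt, ":") > 0 Then s = Mid$(txt, InStrRev(txt, ":") + 1) Else s = txt
    ' Remove thousands separators, line breaks, non-breaking spaces and outer spaces
    s = Replace(Replace(s, ".", vbNullString), ",", vbNullString)
    s = Replace(Replace(s, vbCr, vbNullString), vbLf, vbNullString)
    s = Trim$(Replace(s, ChrW$(160), vbNullString))
    If Len(s) = 0 Or s Like "*[!0-9]*" Then
        ParseCryptoCount = -1   ' text was not a plain integer after cleaning
    Else
        ParseCryptoCount = CLng(s)
    End If
End Function
```

In CryptosCount you could then write, for example, MsgBox ParseCryptoCount(nodeCryptosCount.innerText). Returning -1 rather than raising an error keeps the caller in control when the page layout changes.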

Related

How to get elements that are outside the parent class

I am trying to extract some data from the web. However, NOT all of the information I need is in the parent class. I can get the information that is in the parent class.
QUESTION: Is there a way to get data that is outside the parent class? Or is there a way to change the code below so it extracts without using a parent class?
I am using IE as it allows me to search the site. I have tried several code variations; however, the extra information is not in the parent class that I am extracting from.
I am after the name, location and social media links. The location is at the top of the webpage, outside the class.
I tried to use shop-home as the parent class, since all the other classes fall inside it, but it did not work. I have never tried to get data that is not in the parent class, so I am not 100% sure how to do it. SIM helped with element.ParentNode.ParentNode.getElementsByClassName, as the product URL was before the parent. I have been trying to use this for all the other data that is outside the parent, but I cannot get it to work. I do not fully understand it; if someone could explain what .ParentNode.ParentNode. is doing, that would help my understanding and I might be able to work the rest out myself.
The code below is for the first two items, which pull off fine. The code layout is the same for all items; only the class changes: If element.getElementsByClassName("CLASS HERE")(0). I have tried using ID, tag, span and so on, e.g. If element.getElementsByClassName("CLASS HERE")(0).getElementsByTagName("Span")(0)
Application.ScreenUpdating = False
Set HTML = objIE.document
''''########## Setting the Parent Class HERE ##########
Set elements = HTML.getElementsByClassName("v2-listing-card__info")
''''Scrolls down the browser
objIE.document.parentWindow.Scroll 0&, 9999
''''FOR LOOP
For Each element In elements
''' Element 1
If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
Else
HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
End If
''' Element 2
If element.getElementsByTagName("h3")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
Else
HtmlText = element.getElementsByTagName("h3")(0).innerText ' text of the first h3 inside the element
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
End If
''' Element 3
RESULTS: the data in red is wrong or missing, as it is not in the parent class above.
The shipping in column H pulls off fine because it is in the parent; if there is no shipping info, a hyphen goes into the cell. Items for columns C, D and E are outside the parent class that I am using.
<div class="flex-grow-1">
<div class="max-width-760px ">
</div>
<div class="max-width-676px">
<div class="">
<p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." data-use-inplace-input="1"
data-add-class="normal story-headline-edit-link"></p>
</div>
<div class="">
<div id="about-story" class="" aria-hidden="false">
<p class="about-story text-body-larger text-gray-lighter ">
<span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span>
</p>
</div>
<div class="wt-text-center-xs">
</div>
</div>
</div>
<div class="wt-mb-xs-6 wt-mb-md-8">
<div class="clearfix"></div>
<div>
<h3 class="wt-text-title-01"></h3>
<div class="pt-xs-2 pt-lg-4">
<div class="display-flex-md flex-wrap max-width-760px">
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span>
<span>Facebook</span>
</a>
</div>
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z 
M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span>
<span>Instagram</span>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="wt-mb-xs-8 wt-mb-md-10">
<div class="clearfix"></div>
<div class="about-section display-flex-md flex-direction-column-md mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members">
<div class="p-xs-0">
<h3 class="wt-text-title-01">Shop members</h3>
</div>
<div class="pl-xs-0 pr-xs-0 pt-xs-2 pt-lg-4">
<div class="max-width-760px">
<ul class="list-unstyled block-grid-md-2" data-region="shop-member-list">
<li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner"
data-member-name="Lucky Plum Studio">
<div class="flag">
<div class="flag-img vertical-align-top pr-lg-3">
<img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48">
</div>
<div class="flag-body">
<h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6>
<p class="prose" data-region="member-role">Owner</p>
<p class="text-gray-lighter mb-xs-0" data-region="member-bio">
</p>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="">
</div>
</div>
As always, thanks in advance.
Update (22/3/2021, 6 pm UK time):
In reply to QHarr's answer: I had this for location and nothing was collected. Could you please explain where I went wrong? Then I should be able to fix the rest.
''' Element 4
DoEvents
If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then ' Get CLASS and Child Nod
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
Else
HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
End If
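Your Element 4 code looks for shop-location inside element, i.e. inside each v2-listing-card__info listing card, but the location is not inside those cards, so the lookup finds nothing; search the whole document instead. Beyond that, not sure what to say except read up on HTML, HTML document methods and CSS selectors so you understand the patterns you need to apply. The rest is just practice and learning which methods are the fastest and most robust.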
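The question also asks what .ParentNode.ParentNode does. Each .ParentNode moves one level up the DOM tree, so two of them reach the grandparent element, and any getElementsBy... call made from there searches that larger subtree. A minimal sketch on a small made-up snippet (the markup and the GrandparentLink name are hypothetical, not Etsy's real page), using a late-bound htmlfile document:

```vba
' Demonstrates walking up the DOM with parentNode on a hypothetical snippet.
Public Function GrandparentLink() As String
    Dim doc As Object, info As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<div id='card'>" & _
          "<a class='listing-link' href='http://example.com/item'>item</a>" & _
          "<div id='wrapper'><div id='info'><h3>Title</h3></div></div>" & _
        "</div>"
    Set info = doc.getElementById("info")
    ' info.parentNode            -> div#wrapper (no link inside)
    ' info.parentNode.parentNode -> div#card, which also contains the link
    GrandparentLink = info.parentNode.parentNode.getElementsByTagName("a")(0).href
End Function
```

The link is a sibling of the wrapper, not a descendant of div#info, which is why searching from info itself would find nothing and the code has to climb two levels first.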
CSS:
Location: .shop-location span selects span elements that are descendants of an element with class shop-location.
Social media links: #about .text-decoration-none selects elements with the class text-decoration-none that are descendants of the element with id about.
Name: [data-region='member-name'] selects the element whose data-region attribute has the value member-name.
Read about CSS selectors and the descendant combinator.
Practice CSS selectors.
Learn about HTML.
VBA:
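Option Explicit
' VBE > Tools > References: Microsoft Internet Controls (SHDocVw.InternetExplorer, READYSTATE_COMPLETE)
Public Sub GetInfo()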
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .querySelector(".shop-location span").innerText 'location
Dim i As Long, socialMedias As Object
Set socialMedias = .querySelectorAll("#about .text-decoration-none")
For i = 0 To socialMedias.Length - 1 'media links
Debug.Print socialMedias.Item(i).href
Next
Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
End With
.Quit
End With
End Sub
Less optimal methods for selecting:
Option Explicit
Public Sub GetInfo()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
Dim i As Object, socialMedias As Object
Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
For Each i In socialMedias 'media links
Debug.Print i.href
Next
Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
End With
.Quit
End With
End Sub
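The question's If ... Is Nothing Then "-" Else ... pattern can be folded into one function that searches the whole document rather than a single listing card. A sketch of a hypothetical helper, TextOrDash, assuming early binding to the Microsoft HTML Object Library; the sample markup in the demo is made up:

```vba
' VBE > Tools > References: Microsoft HTML Object Library
' Hypothetical helper: innerText of the first match, or "-" when nothing matches.
Public Function TextOrDash(ByVal doc As MSHTML.HTMLDocument, ByVal selector As String) As String
    Dim node As Object
    Set node = doc.querySelector(selector)   ' returns Nothing when there is no match
    If node Is Nothing Then
        TextOrDash = "-"
    Else
        TextOrDash = Trim$(node.innerText)
    End If
End Function

Public Sub DemoTextOrDash()
    Dim doc As MSHTML.HTMLDocument
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = "<div class='shop-location'><span>London</span></div>"
    Debug.Print TextOrDash(doc, ".shop-location span")   ' London
    Debug.Print TextOrDash(doc, ".not-there")            ' -
End Sub
```

With IE you would pass objIE.document (or ie.document) as doc and write the result straight into the sheet, so the missing-value check lives in one place instead of in every Element block.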

Extract email HTML Element

I am trying to scrape a page and I am stuck at one point. First, here is the relevant part of the page's HTML:
<article class="mod mod-Treffer" data-teilnehmerid="122085958708">
<div data-wipe="{&quot;listener&quot;: &quot;click&quot;, &quot;name&quot;: &quot;Trefferliste Eintrag zur Detailseite&quot;, &quot;id&quot;: &quot;122085958708&quot;, &quot;synchron&quot;: true}" data-realid="2aeca1d2-2bc5-4070-ac4d-e16b10badca5" data-tnid="122085958708" target="_self">
<div class="mod-hervorhebung">
<p class="mod-hervorhebung--partnerHervorhebung" data-hervorhebungsstufe="3">Silber Partner</p>
</div>
<picture class="trefferlisten_logo">
<source media="(min-width: 768px)" srcset="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png">
<img alt="" data-lazy-src="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png" src="https://ies.v4all.de/0122/GS/0122/5/8335/49428335_310x190.png">
</picture>
<h2 data-wipe-name="Titel">A & S Billing Pflege-Service GmbH</h2>
<p class="d-inline-block mod-Treffer--besteBranche">Ambulante Pflegedienste</p>
<div class="mod mod-Stars mod-Stars--" title="2.9/5" data-float="2,9">
<span class="mod-Stars__text" style="width: 58.000001907348632812500%;">2.9</span>
</div>
<span>2.9</span>
<span>(8)</span>
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Kirchenberg 2‑4,
<span class="nobr">
90482
Nürnberg
</span>
(Mögeldorf)
</p>
<p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">(0911) 60 00 99 77</p>
</address>
</div>
<div class="aktionsleiste_kompakt">
<div class="mod-gsSlider mod-gsSlider--noneOnWhite">
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Aktionleiste-button-links&quot;}"></span>
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Aktionleiste-button-rechts&quot;}"></span>
<div class="mod-gsSlider__slider" data-initialized="true">
<a class="contains-icon-homepage gs-btn" target="_blank" rel=" noopener" href="http://www.as-billing.de" data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Webseite-Button&quot;, &quot;id&quot;:&quot;122085958708&quot;}" data-isneededpromise="false">Webseite</a>
<a class="contains-icon-email gs-btn" href="mailto:info#as-billing.de?subject=Anfrage%20%C3%BCber%20Gelbe%20Seiten" data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Email-Button&quot;, &quot;id&quot;:&quot;122085958708&quot;}" data-isneededpromise="false">E-Mail</a>
<span class="contains-icon-route_finden gs-btn" data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Navigation-Button&quot;, &quot;id&quot;:&quot;122085958708&quot;}" data-parameters="{&quot;partner&quot;: &quot;googlemaps&quot;, &quot;searchquery&quot;: &quot;A%20%26%20S%20Billing%20Pflege-Service%20GmbH%20Kirchenberg%202-4%2090482%20N%C3%BCrnberg&quot;}" data-target="_blank">Route</span>
<span class="contains-icon-details gs-btn" data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Actionbutton Mehr Details&quot;, &quot;id&quot;:&quot;122085958708&quot;}" data-parameters="{&quot;partner&quot;: &quot;gs&quot;, &quot;realId&quot;: &quot;2aeca1d2-2bc5-4070-ac4d-e16b10badca5&quot;, &quot;tnId&quot;: &quot;122085958708&quot;}">Mehr Details</span>
</div>
</div>
</div>
</article>
I first used these lines
Dim post As Object
Set post = html.querySelectorAll(".mod-Treffer")
For i = 0 To post.Length - 1
Debug.Print post.Item(i).getElementsByTagName("h2")(0).innerText
Debug.Print post.Item(i).getElementsByTagName("Address")(0).getElementsByTagName("p")(1).innerText
'I am stuck with extracting the email
'HERE
Next i
Moreover, sometimes the post doesn't have the email information, so I need to extract it only if it is found.
Here is the code so far:
Const sURL As String = "https://www.gelbeseiten.de/Suche/Ambulante%20Pflegedienste/Bundesweit"
Dim http As MSXML2.XMLHTTP60, html As HTMLDocument
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With http
.Open "Get", sURL, False
.send
html.body.innerHTML = .responseText
End With
Dim post As Object
Set post = html.querySelectorAll(".mod-Treffer")
Dim i As Long, r As Long
Range("A1").Resize(1, 3).Value = Array("Title", "Phone", "Email")
r = 2
For i = 0 To post.Length - 1
Cells(r, 1).Value = post.Item(i).getElementsByTagName("h2")(0).innerText
Cells(r, 2).Value = post.Item(i).getElementsByTagName("Address")(0).getElementsByTagName("p")(1).innerText
r = r + 1 ' move to the next output row
Next i
Original question:
In this case I would use an attribute = value CSS selector with the contains (*=) operator to target an href attribute containing the string mailto: [href*=mailto]
If you use querySelectorAll("[href*=mailto]"), you can test whether the .Length property is greater than 0; or use querySelector and test If Not querySelector("[href*=mailto]") Is Nothing.
If you assign the result to a variable:
Dim ele As Object
Set ele = html.querySelector("[href*=mailto]")
If Not ele Is Nothing Then
Debug.Print ele.href 'do something with the href to parse out email
End If
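The "do something with the href" step can be a small string function. A sketch of a hypothetical helper, EmailFromMailto, that strips the mailto: scheme and the query string (such as ?subject=... in the sample HTML); it does not decode percent-encoded characters:

```vba
' Hypothetical helper: "mailto:info@example.com?subject=Hi" -> "info@example.com".
' Returns "" when the href is not a mailto: link.
Public Function EmailFromMailto(ByVal href As String) As String
    Const PREFIX As String = "mailto:"
    Dim s As String, q As Long
    If LCase$(Left$(href, Len(PREFIX))) <> PREFIX Then Exit Function
    s = Mid$(href, Len(PREFIX) + 1)   ' drop the scheme
    q = InStr(s, "?")                 ' drop ?subject=... and any other parameters
    If q > 0 Then s = Left$(s, q - 1)
    EmailFromMailto = s
End Function
```

With the snippet above, that would be Debug.Print EmailFromMailto(ele.href).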
Updated question:
For the updated question, I would copy the outerHTML of the current node in the NodeList into a surrogate HTMLDocument variable, so I can use the querySelector method on it again. I would target the email link by its class.
Option Explicit
' VBE > Tools > References: Microsoft XML, v6.0 and Microsoft HTML Object Library
Public Sub GetListingInfo()
Const URL As String = "https://www.gelbeseiten.de/Suche/Ambulante%20Pflegedienste/Bundesweit"
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
Set http = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With http
.Open "Get", URL, False
.send
html.body.innerHTML = .responseText
End With
Dim post As Object, html2 As MSHTML.HTMLDocument
Set post = html.querySelectorAll(".mod-Treffer")
Set html2 = New MSHTML.HTMLDocument
Dim i As Long, emailNode As Object
With ActiveSheet
.Range("A1").Resize(1, 3).Value = Array("Title", "Phone", "Email")
For i = 0 To post.Length - 1
html2.body.innerHTML = post.Item(i).outerHTML
.Cells(i + 2, 1).Value = html2.querySelector("h2").innerText
.Cells(i + 2, 2).Value = html2.querySelector(".mod-AdresseKompakt__phoneNumber").innerText
Set emailNode = html2.querySelector(".contains-icon-email")
If Not emailNode Is Nothing Then .Cells(i + 2, 3).Value = Split(Replace$(emailNode.href, "mailto:", vbNullString), "?")(0) ' drop ?subject=...
Next i
End With
End Sub
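The asker's follow-up below wants the email even when the mailto link is not always the second a element. Besides selecting it by class, another option is to loop over every anchor in the result and keep the first one whose href starts with mailto:. A sketch with a hypothetical function name, late-bound so it needs no extra references:

```vba
' Hypothetical helper: returns the address of the first mailto: link found
' inside container (an element or a document), or "" if there is none.
Public Function FirstMailtoAddress(ByVal container As Object) As String
    Dim a As Object, href As String, q As Long
    For Each a In container.getElementsByTagName("a")
        href = a.href
        If LCase$(Left$(href, 7)) = "mailto:" Then
            href = Mid$(href, 8)              ' remove the scheme
            q = InStr(href, "?")              ' remove ?subject=...
            If q > 0 Then href = Left$(href, q - 1)
            FirstMailtoAddress = href
            Exit Function
        End If
    Next a
End Function
```

Inside the loop this would be used as .Cells(i + 2, 3).Value = FirstMailtoAddress(post.Item(i)); a listing without an email simply yields an empty cell, so no error handling is needed.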
Thanks a lot.
I could figure it out using these lines
If InStr(post.Item(i).getElementsByTagName("a")(1).href, "mailto:") Then
Debug.Print Split(Split(post.Item(i).getElementsByTagName("a")(1).href, "mailto:")(1), "?")(0)
End If
But I welcome any other suggestions to improve and learn more.
* After testing, I get an error when the email is not found within the element. How can I avoid the error? I could use On Error Resume Next, but I would rather handle the error than skip it.
** Edit:
I could solve the second point with this structure:
Dim emailObj As Object
Set emailObj = post.Item(i).getElementsByTagName("a")(1)
If Not emailObj Is Nothing Then
If InStr(emailObj.href, "mailto:") Then
Debug.Print Split(Split(emailObj.href, "mailto:")(1), "?")(0)
End If
End If
The code works, but sometimes the email is not grabbed correctly, because of this line:
Set emailObj = post.Item(i).getElementsByTagName("a")(1)
Sometimes the email link is not at index 1. So my last question: how can I get the email regardless of its index?
Inside the loop, I tried this line and experimented with it, with no success:
Set aNodeList = post.Item(i).querySelectorAll(".contains-icon-email")(0)

Using MSXML2.XMLHTTP in access VBA not extracting all page data

Currently we are using the code below for data extraction, but it does not extract the complete data from the webpage: it ignores data that is visible when I enable JavaScript and DOM storage in Internet Explorer.
So far, the code extracts everything except the images from the web page.
My code is below:
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", url, False ' url: the page being scraped (not shown in the question)
http.Send
html.body.innerHTML = http.ResponseText
On Error GoTo 0
html1 = html.body.innerHTML
brand5 = html.documentElement.innerHTML
If html1 Like "*media__thumb*" Then
other_img = html.getElementsByClassName("media__thumb")(0).innerText
'other_img = other_img.innerHTML
End If
On the webpage, the HTML for the multiple images is given below (please note that my code above does not extract data from this HTML):
<a class="media__thumbnail" data-media_type="IMAGE" data-media_id="orbit-bagged-53017-64" data-target="IMAGE" data-has-index="true">
<img src="https://images.yourweb/_145.jpg">
</a>
<a class="media__thumbnail media__thumbnail--selected" data-media_type="IMAGE" data-media_id="orbit-bagged-53017-e1" data-target="IMAGE" data-has-index="true">
<img src="https://images.yourweb1_145.jpg">
</a>
</span></a>
The corresponding part of http.ResponseText is given below:
<div id="thumbnails" class="media__thumbnails" data-component="thumbnails"></div>
<script type="text/template" id="media__thumbnails">
{{#thumbnails}}
<a class="media__thumbnail" data-media_type="{{type}}" data-media_id="{{id}}" data-target="{{type}}" data-has-index="true">
<img src="{{{thumb}}}"/>
{{# hasIcon}}
{{# threeSixtyIcon}} <div class="whitespace"><span class="threesixtyIcon"></span></div>{{/ threeSixtyIcon}}
{{^ threeSixtyIcon}} <span class="videoIcon"></span>{{/ threeSixtyIcon}}
{{/ hasIcon}}
</a>
{{/thumbnails}}
{{#additionalThumbnailsThumbnail}}
<a class="media__thumbnail media__thumbnail-additional-count" data-media_type="{{type}}" data-media_id="{{id}}" data-target="{{type}}" data-has-index="true">
<img src="{{{thumb}}}"/>
{{# hasIcon}}
{{# threeSixtyIcon}} <div class="whitespace"><span class="threesixtyIcon"></span></div>{{/ threeSixtyIcon}}
{{^ threeSixtyIcon}} <span class="videoIcon"></span>{{/ threeSixtyIcon}}
{{/ hasIcon}}
{{#additionalImagesCount}}
<div class="media__thumbnail-overlay"></div>
<span class="media__thumbnail-count">+{{additionalImagesCount}}</span>
{{/additionalImagesCount}}
</a>
That content requires JavaScript to run on the page, so you will need to either:
Search the web traffic for the source of those URLs to see if you can obtain them from elsewhere (you can, as shown below); or,
Automate a browser, e.g. with Microsoft Internet Controls.
You can see from the following that the content is loaded dynamically:
<script type="text/template" id="media__thumbnails">
{{#thumbnails}}
<a class="media__thumbnail" data-media_type="{{type}}" data-media_id="{{id}}" data-target="{{type}}" data-has-index="true">
<img src="{{{thumb}}}"/>
{{# hasIcon}}
{{# threeSixtyIcon}} <div class="whitespace"><span class="threesixtyIcon"></span></div>{{/ threeSixtyIcon}}
{{^ threeSixtyIcon}} <span class="videoIcon"></span>{{/ threeSixtyIcon}}
{{/ hasIcon}}
</a>
{{/thumbnails}}
{{#additionalThumbnailsThumbnail}}
......
{{/additionalThumbnails}}
</script>
<script type="text/template"
1) Network tab - different url
Use a different URL, found in the network tab, which returns JSON containing the links. As the response is JSON, a JSON parser is required.
'VBE>Tools> References> Add reference to Microsoft Scripting Runtime
'Download and add in jsonconverter.bas from https://github.com/VBA-tools/VBA-JSON/blob/master/JsonConverter.bas
Place the JsonConverter.bas code in Module2 and comment out the line: Attribute VB_Name = "JsonConverter"
In Module1, place the GetInfo sub.
Option Compare Database
Option Explicit
Public Sub GetInfo()
Dim json As Object, url1 As String, url2 As String, url3 As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.homedepot.com/p/svcs/frontEndModel/100001020?_=1556447908065", False
.send
Set json = Module2.ParseJson(.responseText)
End With
'Parse json object (see paths shown below for example)
url1 = json("primaryItemData")("media")("mediaList")(2)("location")
url2 = json("primaryItemData")("media")("mediaList")(3)("location")
url3 = json("primaryItemData")("media")("mediaList")(4)("location") 'example
Stop '<==delete me later
End Sub
Paths to first 3 thumbnails:
json►primaryItemData►media►mediaList►2►location
json►primaryItemData►media►mediaList►3►location
json►primaryItemData►media►mediaList►4►location
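Rather than hard-coding indices 2 to 4, you can loop over mediaList. VBA-JSON returns JSON arrays as Collection objects and JSON objects as Scripting.Dictionary objects, so a sketch like the following (hypothetical function name) collects every location. It assumes an entry may lack a location key, and it does not filter by media type because the response's other keys are not shown here:

```vba
' Hypothetical helper: collects every "location" under
' json("primaryItemData")("media")("mediaList").
' Assumes json came from VBA-JSON's ParseJson (Dictionary for objects, Collection for arrays).
Public Function MediaLocations(ByVal json As Object) As Collection
    Dim result As New Collection, entry As Object
    For Each entry In json("primaryItemData")("media")("mediaList")
        If entry.Exists("location") Then result.Add entry("location")
    Next entry
    Set MediaLocations = result
End Function
```

In GetInfo you could then write: Dim loc As Variant: For Each loc In MediaLocations(json): Debug.Print loc: Next loc.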
Explore the JSON in a JSON viewer to find other paths.
2) Automated browser (IE version):
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub GetImageLinks()
Dim ie As New InternetExplorer, images As Object, i As Long
With ie
.Visible = True
.Navigate2 "https://www.homedepot.com/p/Orbit-Sandstone-Rock-Valve-Box-Cover-53017/100001020"
While .Busy Or .readyState < 4: DoEvents: Wend
Set images = .document.querySelectorAll(".media__thumbnail img")
For i = 0 To images.Length - 1
Debug.Print images.item(i).src
Next
Stop
.Quit
End With
End Sub

How to correctly write a CSS Attribute selector to extract all id attributes?

Situation:
I am currently attempting to reproduce, in VBA, the attribute selector with syntax [attr] from the W3Schools CSS selector tester (the page loaded by the code below).
The selector is intended to select elements that have the given attribute, whatever its value.
Expected result:
In the HTML sample I include, the expected result of getting ALL elements with an id attribute, using html.querySelectorAll("[id]"), is what the page highlights in yellow when you run it.
Problem:
Instead of getting just the information associated with id elements (the yellow highlighted bits), I am getting far more text: pretty much everything, with some material repeated.
What I have tried:
I have read through plenty of CSS resources on this, and they all give the same syntax. I haven't found a closely matching VBA example, so I may not be converting the syntax correctly.
In line with the above, and only as a test, I tried altering the selector syntax to target a specific id. That worked perfectly.
For example:
Set a = html.querySelectorAll("[id=""my-Address""]")
This, in my code sample, yields the expected value of:
<p id="my-Address">I live in Duckburg</p>
I tried removing the [] from [id], which printed nothing to the Immediate window.
An answer to another SO question mentions Chrome, the browser I am using, as being problematic with some CSS selectors, but I don't think this applies to my scenario.
Question:
How do I correctly write a CSS selector, in VBA, to extract all the elements with id attribute from the given webpage?
Code:
Option Explicit
'Requires VBE > Tools > References: Microsoft HTML Object Library, Microsoft XML, v6.0
'[attribute] [target] Selects all elements with a target attribute e.g. [id]
Public Sub Test13()
    Dim html As MSHTML.HTMLDocument, i As Long
    Set html = GetTestHTML()
    Dim a As Object
    'Set a = html.querySelectorAll("[id=""my-Address""]")
    Set a = html.querySelectorAll("[id]")
    For i = 0 To a.Length - 1
        Debug.Print a(i).innerText
    Next i
End Sub
Public Function GetTestHTML(Optional ByVal url As String = "https://www.w3schools.com/cssref/trysel.asp") As HTMLDocument
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    With http 'Late-bound alternative: Set http = CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False
        .send
        'Load the whole page's HTML into the document
        html.body.innerHTML = .responseText
        Set GetTestHTML = html
    End With
End Function
HTML expected result in yellow:
<div class="noSel">
<h1><span class="markup"><h1></span>Welcome to My Homepage<span class="markup"></h1></span></h1>
<div id="helpIntro">
<span class="markup"><div class="intro"></span>
<div class="intro">
<p style="margin-top:4px;"><span class="markup"><p></span>My name is Donald <span id="Lastname" style="border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);"><span class="markup"><span id="Lastname"></span>Duck.<span class="markup"></span></span></span><span class="markup"></p></span></p>
<p id="my-Address" style="border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);"><span class="markup"><p id="my-Address"></span>I live in Duckburg<span class="markup"></p></span></p>
<p style="margin-bottom:4px;"><span class="markup"><p></span>I have many friends:<span class="markup"></p></span></p>
</div>
<span class="markup"></div></span>
</div>
<br>
<div class="helpUl" style="border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);">
<span class="markup"><ul id="Listfriends></span>
<ul id="Listfriends" style="margin-top:0px;margin-bottom:0px;">
<li><span class="markup"><li></span>Goofy<span class="markup"></li></span></li>
<li><span class="markup"><li></span>Mickey<span class="markup"></li></span></li>
<li><span class="markup"><li></span>Daisy<span class="markup"></li></span></li>
<li><span class="markup"><li></span>Pluto<span class="markup"></li></span></li>
</ul>
<span class="markup"></ul></span>
</div>
<ul style="display:none;"></ul>
<p><span class="markup"><p></span>All my friends are great!<span class="markup"><br></span><br>But I really like Daisy!!<span class="markup"></p></span></p>
<p lang="it" title="Hello beautiful"><span class="markup"><p lang="it" title="Hello beautiful"></span>Ciao bella<span class="markup"></p></span></p>
<h3><span class="markup"><h3></span>We are all animals!<span class="markup"></h3></span></h3>
<p><span class="markup"><p></span><span><b><span class="markup"><b></span>My latest discoveries have led me to believe that we are all animals:<span class="markup"></b></span></b></span><span class="markup"></p></span></p>
<div class="helpTable" style="width:220px;">
<span class="markup"><table></span>
<ul style="display:none;"></ul>
<div class="noSel" style="margin-top:10px;">
References:
Mozilla: CSS selectors
w3schools CSS Selector Reference
VBA/DOM - Get elements based on attribute
Unable to get CSS Attribute selector to work
Chrome and CSS attribute selector
Project references:
*via VBE > Tools > References
It turns out two errors needed to be corrected:
1) The source website's HTML was missing the closing " in the section <ul id="Listfriends>; it should have been <ul id="Listfriends">. This meant the CSS selector carried on matching.
2) I brought in all the HTML from the page and queried that, when in fact I wanted only the HTML within a specific iframe, so that I would work with only the expected ids.
Code:
Option Explicit
Public Sub GetInfo()
    Dim html As MSHTML.HTMLDocument, i As Long
    Set html = GetTestHTML()
    Dim a As Object
    'Replace the document body with just the selector result area, then query that
    html.body.innerHTML = html.querySelector("#iframeResult").document.getElementById("selectorResult").innerHTML
    Set a = html.querySelectorAll("[id]")
    For i = 0 To a.Length - 1
        Debug.Print a(i).innerText
    Next i
End Sub
Public Function GetTestHTML(Optional ByVal url As String = "https://www.w3schools.com/cssref/trysel.asp") As HTMLDocument
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    With http 'Late-bound alternative: Set http = CreateObject("MSXML2.XMLHTTP.6.0")
        .Open "GET", url, False
        .send
        'Patch the unterminated quote after "Listfriends before parsing
        html.body.innerHTML = Replace(.responseText, """Listfriends", """Listfriends""")
        Set GetTestHTML = html
    End With
End Function
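The two fixes can also be sketched outside VBA. The following Python version uses `html.parser` on a hypothetical sample page. First, it repairs the exact broken fragment `<ul id="Listfriends>` quoted above before parsing; this is a narrower string replacement than the answer's `Replace` call. Second, it collects id attributes only from descendants of the element with id `selectorResult`, the container the answer's code reads from. It assumes the markup is well-formed once patched:

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Exact broken fragment quoted in the answer, and its repaired form.
BROKEN = '<ul id="Listfriends>'
FIXED = '<ul id="Listfriends">'

class ScopedIdCollector(HTMLParser):
    """Collects id values of elements nested inside the element whose id
    equals container_id (the container itself is not included)."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.stack = []  # one bool per open element: is it the container?
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if any(self.stack) and "id" in a:
            self.ids.append(a["id"])
        if tag not in VOID:
            self.stack.append(a.get("id") == self.container_id)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

def ids_in(raw_html, container_id):
    parser = ScopedIdCollector(container_id)
    parser.feed(raw_html.replace(BROKEN, FIXED))  # fix 1: repair the quote
    parser.close()
    return parser.ids                             # fix 2: scoped to container

# Hypothetical page: only the selectorResult block should be searched.
page = """
<div id="header"><a id="logo" href="/">Home</a></div>
<div id="selectorResult">
  <p>My name is Donald <span id="Lastname">Duck.</span></p>
  <p id="my-Address">I live in Duckburg</p>
  <ul id="Listfriends><li>Goofy</li></ul>
</div>
"""

print(ids_in(page, "selectorResult"))  # ['Lastname', 'my-Address', 'Listfriends']
```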

"Post" XML Data like HTML with Hidden Values using ContentType ="txt/html"

I want to do from .NET Windows Forms what already works from a plain HTML page.
When I submit this HTML, it works:
<html>
<head>
</head>
<body>
<form name="TestForm" action="http://staging.csatravelprotection.com/ws/policyrequest" method="POST">
<input type="hidden" name="xmlrequeststring" value="
<quoterequest>
<aff>COSTAMAR</aff> <!-- required -->
<producer>10527930</producer> <!-- optional -->
<productclass>85FL</productclass> <!-- required -->
<bookingreservno>0123456789AB</bookingreservno> <!-- optional -->
<numinsured>3</numinsured> <!-- required -->
<tripcost>5000.00</tripcost> <!-- required -->
<departdate>2010-11-01</departdate> <!-- required -->
<returndate>2010-11-20</returndate> <!-- required -->
<triptype>Cruise</triptype> <!-- optional -->
<destination>Europe/ Mediterranean</destination> <!-- required -->
<supplier>Carnival Cruise Lines</supplier> <!-- optional -->
<airline>American</airline> <!-- optional-->
<travelers>
<traveler>
<age>45</age> <!-- required -->
</traveler>
<traveler>
<age>43</age> <!-- required -->
</traveler>
<traveler>
<age>15</age> <!-- required -->
</traveler>
</travelers>
</quoterequest>
">
<input type="submit" name="submit" value="submit">
</form>
</body>
</html>
But when I try to send the XML via POST from .NET, it appears to fail, because I don't know how to post the hidden input as part of the request.
Imports System.IO
Imports System.Text
Imports System.Net
Public Class Form2
    Private Shared URL As String = "http://staging.csatravelprotection.com/ws/policyrequest"
    Private Sub Form2_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        'Build the POST request
        Dim oHttpWebRequest As WebRequest = WebRequest.Create(New Uri(URL))
        oHttpWebRequest.Method = "POST"
        oHttpWebRequest.ContentType = "text/xml"
        'Read the XML file, prefix it with the hidden field's name and write it to the request body
        Dim oStream As Stream = oHttpWebRequest.GetRequestStream()
        Dim Reader As StreamReader = New StreamReader("C:\TEST.XML", Encoding.Default)
        Dim Postdata As String = String.Format("xmlrequeststring={0}", Reader.ReadToEnd)
        Reader.Close()
        oStream.Write(Encoding.ASCII.GetBytes(Postdata), 0, Postdata.Length)
        oStream.Close()
        'Send the request, read the response as UTF-8 text and display it
        Dim oHttpWebResponse As HttpWebResponse = CType(oHttpWebRequest.GetResponse(), HttpWebResponse)
        Dim oStreamResponse As Stream = oHttpWebResponse.GetResponseStream()
        Dim oStreamRead As StreamReader = New StreamReader(oStreamResponse, Encoding.UTF8)
        Dim strReturnedXML As String = oStreamRead.ReadToEnd()
        MessageBox.Show(strReturnedXML)
        oStreamResponse.Close()
        oStreamRead.Close()
        oHttpWebResponse.Close()
    End Sub
End Class
XML:
<quoterequest>
<aff>COSTAMAR</aff>
<producer>10527930</producer>
<productclass>TBD</productclass>
<bookingreservno>0123456789AB</bookingreservno>
<numinsured>3</numinsured>
<tripcost>5000.00</tripcost>
<departdate>2009-11-01</departdate>
<returndate>2009-11-20</returndate>
<initdate>2008-09-30</initdate>
<finalpaymentdate>2008-10-30</finalpaymentdate>
<triptype>Cruise</triptype>
<destination>Europe/ Mediterranean</destination>
<supplier>Carnival Cruise Lines</supplier>
<airline>American</airline>
</quoterequest>
Is there a way to make it work as expected on .NET?
Thanks
It is possible, but you should not post only the XML data; post it the way the original HTML file does, with your XML data embedded.
The receiving page expects the data in that form. It cannot (and does not) tell the difference between a browser posting and your program posting.
It could be that they have a different URL for posting XML-format data.
MarcelDevG
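For context on "in that form": a `<form method="POST">` with no `enctype` attribute is submitted as `application/x-www-form-urlencoded`. The browser sends each named field, hidden or not, as a percent-encoded `name=value` pair. It also includes the name and value of the submit button that triggered the submission. Here is a minimal Python sketch of that request body, with the XML trimmed for brevity. It only builds the string and contacts no server:

```python
from urllib.parse import urlencode, parse_qs

# Trimmed stand-in for the quoterequest XML from the question.
xml = ("<quoterequest><aff>COSTAMAR</aff>"
       "<destination>Europe/ Mediterranean</destination></quoterequest>")

# Same names as the form's inputs: the hidden field, then the submit button.
body = urlencode({"xmlrequeststring": xml, "submit": "submit"})
print(body)

# Decoding the body gives back exactly what the form fields held.
print(parse_qs(body)["xmlrequeststring"][0] == xml)  # True
```

The `<`, `>`, `/` and space characters in the XML come out percent-encoded (or as `+` for spaces). The receiving side decodes them back, as the `parse_qs` round-trip shows.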
That's impossible, but you can work with POST using ContentType = "text/xml".
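The two request shapes discussed in this thread differ in both header and body. The following Python sketch uses `urllib.request` and does not actually send anything. It builds a form-style request that matches the HTML page and a raw `text/xml` request as suggested here. Whether the endpoint accepts the raw variant is not confirmed in the thread. For comparison, the question's code sends a form-style body (`xmlrequeststring=...`) without percent-encoding and under a `text/xml` content type, which matches neither shape:

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://staging.csatravelprotection.com/ws/policyrequest"

def form_request(xml):
    """Mimics the HTML form: hidden field xmlrequeststring, form-encoded body."""
    body = urlencode({"xmlrequeststring": xml}).encode("ascii")
    return Request(URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

def raw_xml_request(xml):
    """Sends the XML document itself as the body, labelled text/xml."""
    return Request(URL, data=xml.encode("utf-8"), method="POST",
                   headers={"Content-Type": "text/xml; charset=utf-8"})

x = "<quoterequest><aff>COSTAMAR</aff></quoterequest>"
for req in (form_request(x), raw_xml_request(x)):
    # urllib stores header names via str.capitalize(), hence "Content-type".
    print(req.get_method(), req.get_header("Content-type"))
    print(req.data)
```

Sending either one would be a matter of passing it to `urllib.request.urlopen`. In .NET, the equivalent change to the question's code would be to set the content type and encode the body to match one of these two shapes.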