I am trying to extract some data from the web. However, not all of the information that I need is in the parent class. I can get the information that is inside the parent class.
QUESTION - Is there a way to get data that is outside the parent class? Or is there a way to set up the code below to extract without using a parent class?
Link
I am using IE as it allows me to search the site. I have tried several code variations; however, the extra information is not in the parent class that I am extracting from.
I am after the name, location and social media links. The location is at the top of the webpage, outside the class.
I tried to use shop-home as the parent class, since all the other classes fall inside it, but it did not work. I have never tried to get data that is not in the parent class, so I am not 100% sure how to do it. SIM helped with element.ParentNode.ParentNode.getElementsByClassName, as the product URL comes before the parent. I have been trying to use this for all the other data that is outside the parent, but I cannot get it to work. I do not fully understand it; if someone could explain what .ParentNode.ParentNode is doing, that would help my understanding and I might be able to work the rest out myself.
The code below is for the first two items, which pull off fine. The code layout is the same for all items, following the pattern If element.getElementsByClassName("CLASS HERE")(0). I have also tried using ID, tag, span and so on, e.g. If element.getElementsByClassName("CLASS HERE")(0).getElementsByTagName("Span")(0)
Application.ScreenUpdating = False
Set HTML = objIE.document
''''########## Setting the Parent Class HERE ##########
Set elements = HTML.getElementsByClassName("v2-listing-card__info")
''''Scrolls Down the Browser
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
''''FOR LOOP
For Each element In elements
    ''' Element 1
    If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
    Else
        HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
    End If
    ''' Element 2
    If element.getElementsByTagName("h3")(0) Is Nothing Then
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
    Else
        HtmlText = element.getElementsByTagName("h3")(0).innerText ' Get the h3 child node's text
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText ' return value in column
    End If
    ''' Element 3
RESULTS - Data in red is wrong or missing, as it is not in the above parent class.
The shipping in column H pulls off fine as it is in the parent; if there is no shipping info, a hyphen goes into the cell. The items for columns C, D and E are outside the parent class that I am using.
<div class="flex-grow-1">
<div class="max-width-760px ">
</div>
<div class="max-width-676px">
<div class="">
<p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." data-use-inplace-input="1"
data-add-class="normal story-headline-edit-link"></p>
</div>
<div class="">
<div id="about-story" class="" aria-hidden="false">
<p class="about-story text-body-larger text-gray-lighter ">
<span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span>
</p>
</div>
<div class="wt-text-center-xs">
</div>
</div>
</div>
<div class="wt-mb-xs-6 wt-mb-md-8">
<div class="clearfix"></div>
<div>
<h3 class="wt-text-title-01"></h3>
<div class="pt-xs-2 pt-lg-4">
<div class="display-flex-md flex-wrap max-width-760px">
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span>
<span>Facebook</span>
</a>
</div>
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z 
M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span>
<span>Instagram</span>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="wt-mb-xs-8 wt-mb-md-10">
<div class="clearfix"></div>
<div class="about-section display-flex-md flex-direction-column-md mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members">
<div class="p-xs-0">
<h3 class="wt-text-title-01">Shop members</h3>
</div>
<div class="pl-xs-0 pr-xs-0 pt-xs-2 pt-lg-4">
<div class="max-width-760px">
<ul class="list-unstyled block-grid-md-2" data-region="shop-member-list">
<li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner"
data-member-name="Lucky Plum Studio">
<div class="flag">
<div class="flag-img vertical-align-top pr-lg-3">
<img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48">
</div>
<div class="flag-body">
<h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6>
<p class="prose" data-region="member-role">Owner</p>
<p class="text-gray-lighter mb-xs-0" data-region="member-bio">
</p>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="">
</div>
</div>
As Always thanks in advance
''######### updated today 22/3/2021 at 6pm uk time #########
In reply to QHarr's answer: I had this for location and nothing was collected. Could you please explain where I went wrong? I should then be able to fix the rest.
''' Element 4
DoEvents
If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then ' Get class and child node
    wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
Else
    HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
    wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
End If
Not sure what to say except read up on HTML, HTML document methods and CSS selectors so you understand the patterns you need to apply. The rest is just practice and learning which methods are the fastest and most robust.
CSS:
Location: .shop-location span matches a span element that is a descendant of an element with class shop-location.
Social media links: #about .text-decoration-none matches elements that have the class text-decoration-none and are descendants of the element with id about.
Name: [data-region='member-name'] matches an element whose data-region attribute has the value member-name.
Read about css selectors and descendant combinator here
Practice css selectors here
Learn about html here
VBA:
Option Explicit

Public Sub GetInfo()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        With .document
            Debug.Print .querySelector(".shop-location span").innerText 'location
            Dim i As Long, socialMedias As Object
            Set socialMedias = .querySelectorAll("#about .text-decoration-none")
            For i = 0 To socialMedias.Length - 1 'media links
                Debug.Print socialMedias.Item(i).href
            Next
            Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
        End With
        .Quit
    End With
End Sub
Less optimal methods for selecting:
Option Explicit

Public Sub GetInfo()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        With .document
            Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
            Dim i As Object, socialMedias As Object
            Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
            For Each i In socialMedias 'media links
                Debug.Print i.href
            Next
            Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
        End With
        .Quit
    End With
End Sub
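On the two things asked in the question (what .ParentNode.ParentNode does, and why the "Element 4" location check collected nothing), here is a minimal offline sketch. The markup is hypothetical and heavily simplified, not the real page, and the sub name ParentNodeDemo is made up for illustration. It assumes a reference to Microsoft HTML Object Library is set under Tools > References. Each .parentNode moves one level up the tree, so two of them reach the grandparent, and anything searched from there covers that grandparent's descendants. A search that starts from element only ever sees element's own descendants, which is likely why shop-location was never found: it sits outside the card, so it has to be looked up from the document instead.

```vba
Option Explicit

' Assumption: reference to "Microsoft HTML Object Library" is set.
' The HTML below is a made-up, simplified stand-in for the real page.
Public Sub ParentNodeDemo()
    Dim doc As MSHTML.HTMLDocument
    Dim info As Object, card As Object, links As Object

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = _
        "<div class='shop-location'><span>Some Town</span></div>" & _
        "<div class='card'>" & _
        "<a class='listing-link' href='https://example.com/item'>item</a>" & _
        "<div class='wrap'><div class='v2-listing-card__info'><h3>Title</h3></div></div>" & _
        "</div>"

    Set info = doc.querySelector(".v2-listing-card__info")

    ' One step up is the direct parent, two steps up is the grandparent.
    Debug.Print info.parentNode.className
    Set card = info.parentNode.parentNode
    Debug.Print card.className

    ' The link is not inside info, but it is inside the grandparent.
    Set links = card.getElementsByClassName("listing-link")
    If links.Length > 0 Then Debug.Print links(0).href

    ' The location is outside the card altogether, so search from the document.
    Debug.Print doc.querySelector(".shop-location span").innerText
End Sub
```

In the loop from the question, that would mean reading page-level values such as location, name and social links once from HTML (the document) rather than from element, and keeping element for the values that belong to each listing.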
Good Day,
I've searched for answers, but the solutions provided on this site did not seem to help, including selectedIndex and looping through arrays.
I've got the following HTML code making up a table, from which I want to select the second option, "Vorige Week".
<table cellspacing="0" cellpadding="0" title="" class="mstrListBlock"
id="id_mstr51" style="display: table; width: auto;">
<tbody>
<tr>
<td class="mstrListBlockCell">
<span class="">
<div class="mstrListBlockCaption" style="display: none;"/>
<div class="mstrListBlockHeader" style="display: none;">
<div style="" class="mstrListBlockContents"
id="ListBlockContents_id_mstr51">
<div oncontextmenu="return
mstr.behaviors.Generic.oncontextmenu(arguments[0], self, 'id_mstr51');"
onmouseup="try{mstr.$obj('id_mstr51').focus();}catch(localerr){}; return
mstr.behaviors.Generic.clearBrowserHighlights(self)" onmousedown="var retVal
= mstr.behaviors.ListView.onmousedown(arguments[0], self, 'id_mstr51');
try{mstr.$obj('id_mstr51').focus();}catch(localerr){}; return retVal"
ondblclick="return mstr.behaviors.ListView.ondblclick(arguments[0], self,
'id_mstr51')" class="mstrListBlockListContainer" id="id_mstr51ListContainer"
style="display: block;">
<div class="mstrListBlockItem" title="Huidige Week">
<div class="mstrListBlockItemSelected" title="Vorige Week">
<div class="mstrBGIcon_fi mstrListBlockItemName" style="background-position:
2px 50%; padding-left: 23px;">Vorige Week</div>
</div>
<div class="mstrListBlockItem" title="Afgesloten 4 Weken">
<div class="mstrListBlockItem" title="Afgesloten 8 Weken">
<div class="mstrListBlockItem" title="Huidige Periode">
<div class="mstrListBlockItem" title="Vorige Periode">
<div class="mstrListBlockItem" title="Afgesloten 2 Perioden">
<div class="mstrListBlockItem" title="Selectie Datum Hiërarchie. Aangepast
wegens IServer crash icm. Metric prompts.">
<div class="mstrListBlockItem" title="Gisteren">
I think my problem is in deciding which element I need to use to get the desired outcome
Sub JDWReport()
    Dim objIE As InternetExplorer
    Set objIE = New InternetExplorerMedium
    objIE.Visible = True
    objIE.navigate "URL"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    objIE.document.getElementById("Uid").Value = "username"
    objIE.document.getElementById("Pwd").Value = "password"
    objIE.document.getElementById("3054").Click
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    objIE.navigate "URL2"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    objIE.document.getElementsClassName("mstrBGIcon_fi mstrListBlockItemName")(0).Click
    objIE.Quit
End Sub
See the code above, which I'm currently using.
It gets stuck on the line objIE.document.getElementsClassName("mstrBGIcon_fi mstrListBlockItemName")(0).Click
I tried changing this line to target different elements based on the HTML code, and tried both .Click and .selectedIndex = 2, but those don't work.
<div class="mstrListBlockItemSelected" title="Vorige Week">
Currently it says mstrListBlockItemSelected; however, when first navigating to the site, the class is the same as the rest, mstrListBlockItem.
It only changes to selected when you click on the item in question (from a list of items). My ultimate goal is to get the class of the element with title "Vorige Week" to change from mstrListBlockItem to mstrListBlockItemSelected.
I can see that the list is built from DIVs inside an HTML table.
As far as I could find, a DIV has no method or property for selecting it the way a form control has.
I suggest using an HTML control whose value can be selected, for example a select element with options.
You could create a drop-down using select and then use the code below to select a value in it.
Sub Select_Item()
    Dim post As Object, elem As Object
    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .navigate "C:\Users\WCS\Desktop\element.html"
        While .Busy = True Or .ReadyState < 4: DoEvents: Wend
        Set post = .Document.getElementById("ctl00_ContentPlaceHolder1_ddlCycleID")
        For Each elem In post.getElementsByTagName("option")
            If InStr(elem.Value, "10") > 0 Then elem.Selected = True: Exit For
        Next elem
    End With
End Sub
You can try an attribute = value CSS selector:
ie.document.querySelector("[title='Vorige Week']").Selected = True
Or
ie.document.querySelector("[title='Vorige Week']").Click
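A small offline sketch of the attribute selector, with a Nothing check so a missing element does not raise an error. The markup is a cut-down, hypothetical copy of the list items from the question, the sub name ClickByTitleDemo is made up, and it assumes a reference to Microsoft HTML Object Library. Separately, the line in the question calls getElementsClassName, which is not a DOM method; the method is getElementsByClassName, so that line would fail regardless of which element is targeted.

```vba
Option Explicit

' Assumption: reference to "Microsoft HTML Object Library" is set.
' The HTML is a simplified, made-up version of the list in the question.
Public Sub ClickByTitleDemo()
    Dim doc As MSHTML.HTMLDocument
    Dim item As Object

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = _
        "<div class='mstrListBlockItem' title='Huidige Week'>" & _
        "<div class='mstrListBlockItemName'>Huidige Week</div></div>" & _
        "<div class='mstrListBlockItem' title='Vorige Week'>" & _
        "<div class='mstrListBlockItemName'>Vorige Week</div></div>"

    ' Attribute = value selector: first element whose title is exactly "Vorige Week".
    Set item = doc.querySelector("[title='Vorige Week']")

    If item Is Nothing Then
        Debug.Print "No element with that title was found"
    Else
        Debug.Print item.className
        item.Click   ' in this offline document no page script is attached
    End If
End Sub
```

On the live page the same selector would be run against objIE.document once loading has finished. Whether .Click alone flips the class to mstrListBlockItemSelected is untested here: the container in the question's HTML wires up onmousedown and onmouseup handlers, so the page may be listening for those events instead of click.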