How to bypass <li> clear tag - html
I am using the code below for data extraction, but because of the <li class="clear"> element I am unable to copy the complete data using Access VBA. I need guidance on getting past the clear element. My code is given below.
Set my_data = html4.getElementsByClassName("right_box")
For Each Item In my_data
    Set my_data1 = Item.getElementsByTagName("li")
    For Each item1 In my_data1
        If item1.innerHTML Like "*href*" Then
            href11 = item1.getElementsByTagName("a")
        Else
            Exit For
        End If
    Next item1
Next Item
The HTML is given below.
<div class="right_box"> <div class="right_box_title"> <div class="title_left"></div> <a class="title_right" href="products.php?disp=1"></a> </div> <ul class="pro_list">
<li> <a title="NEW Handbags Handbags7" href="/index.php/NEW-Handbags-Handbags72-p20253745.html" class="pic"><img title="NEW Handbags Handbags7" alt="NEW Handbags Handbags7" src="/image.php?pic=2017-08-27%2F2017082722393889955047.jpg&style=1&folder=uploadImage%2F" border="0" /></a>
</li>
<li class="clear"></li>
<li>
<a title="NEW Handbags Handbags6" href="/index.php/NEW-Handbags-Handbags6-p2025361.html" class="pic"><img title="NEW Handbags Handbags6" alt="NEW Handbags Handbags6" src="/image.php?pic=2017-08-27%2F201708272239272285106.jpg&style=1&folder=uploadImage%2F" border="0" /></a>
</li>
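The code above stops collecting data at the clear item.
A CSS class should not prevent you from gathering data. The loop stops because the empty <li class="clear"> contains no href, so the Else branch runs Exit For and the remaining items are never visited.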
Note: You should set a reference to the Microsoft HTML Object Library.
The following line should fail because item1.getElementsByTagName("a") returns an object (a collection of elements), not a scalar value:
href11 = item1.getElementsByTagName("a")
Here is a better pattern for iterating over anchor tags:
Dim a As HTMLAnchorElement
Set my_data = html4.getElementsByClassName("right_box")(0)
For Each a In my_data.getElementsByTagName("a")
Debug.Print a.href
Next
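If you would rather keep the per-<li> loop from the question, here is a minimal sketch of skipping items that have no anchor instead of leaving the loop. It assumes a reference to the Microsoft HTML Object Library and parses an HTML string in memory so it can be tried without a browser. CollectHrefs is a hypothetical helper name, not something from the original code.

```vba
Option Explicit

' Sketch only: requires a reference to "Microsoft HTML Object Library".
' CollectHrefs is a hypothetical helper introduced for illustration.
Public Function CollectHrefs(ByVal listHtml As String) As Collection
    Dim doc As MSHTML.HTMLDocument
    Dim items As Object, anchors As Object
    Dim i As Long
    Dim result As New Collection

    ' Parse the markup in memory instead of driving a browser.
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = listHtml

    Set items = doc.getElementsByTagName("li")
    For i = 0 To items.Length - 1
        Set anchors = items.Item(i).getElementsByTagName("a")
        ' An <li class="clear"> has no anchor: skip it, do not Exit For.
        If anchors.Length > 0 Then
            result.Add anchors.Item(0).href
        End If
    Next i

    Set CollectHrefs = result
End Function
```

With the question's list (two product items separated by one clear item) this should return two entries, because the clear item is skipped rather than ending the loop.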
Remove the li with the clear class and manage the layout with CSS applied to the other li elements.
In this case you could use querySelectorAll and pass an appropriate selector such as div[class='right_box'] ul[class='pro_list'] li a, which selects every a inside an li inside a ul with class pro_list inside a div with class right_box. The empty <li class="clear"> contains no a, so it is skipped automatically. For more information about selectors see e.g. this page. HTH
Set html4 = ie.document
Dim selector As String
selector = "div[class='right_box'] ul[class='pro_list'] li a"
Dim anchors As IHTMLDOMChildrenCollection
Set anchors = html4.querySelectorAll(selector)
Dim anchor, i
If Not anchors Is Nothing Then
For i = 0 To anchors.Length - 1
Set anchor = anchors.Item(i)
Debug.Print "anchor-" & i & " href: " & anchor.href
Next i
End If
Output:
anchor-0 href: file:///C:/index.php/NEW-Handbags-Handbags72-p20253745.html
anchor-1 href: file:///C:/index.php/NEW-Handbags-Handbags6-p2025361.html
Related
How to get elements that are out of Parent Class
I am trying to extract some data from the web. However, not all of the information that I need is in the parent class. I can get the information that is in the parent class.
QUESTION: Is there a way to get data that is outside of the parent class? Or is there a way to set up the code below to extract without using a parent class?
Link
I am using IE as it allows me to search the site. I have tried several code variations; however, the extra information is not in the parent class that I am extracting from. I am after the name, location and social media links. The location is at the top of the webpage, outside the class.
I tried to use shop-home as the parent class, as all the other classes fall into it, but it did not work. I have never tried to get data that is not in the parent class, so I am not 100% sure how to do it.
SIM helped with element.ParentNode.ParentNode.getElementsByClassName, as the product URL was before the parent. I have been trying to use this for all the other data that is outside the parent, but I cannot get it to work. I do not fully understand it; if someone could explain what .ParentNode.ParentNode. is doing, that would help my understanding and I might be able to work the rest out myself.
The code below is for the first two items, which pull off fine. The code layout is the same for all items, except that it takes the form If element.getElementsByClassName("CLASS HERE")(0) . 
I have tried using ID, tag, span and so on, e.g. If element.getElementsByClassName("CLASS HERE")(0).getElementsByTagName("Span")(0)
Application.ScreenUpdating = False
Set HTML = objIE.document
''''########## Setting the Parent Class HERE ##########
Set elements = HTML.getElementsByClassName("v2-listing-card__info")
''''Scrolls Down the Browser
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
''''FOR LOOP
For Each element In elements
    ''' Element 1
    If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
    Else
        HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
    End If
    ''' Element 2
    If element.getElementsByTagName("h3")(0) Is Nothing Then
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
    Else
        HtmlText = element.getElementsByTagName("h3")(0).innerText ' Get CLASS and Child Nod 'src
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
    End If
    ''' Element 3
RESULTS - Data in red is wrong or missing as it is not in the above parent class. The shipping in column H pulls off fine as it is in the parent; if there is no shipping info then a hyphen goes into the cell. Items for C, D, E are outside the parent class that I am using.
<div class="flex-grow-1"> <div class="max-width-760px "> </div> <div class="max-width-676px"> <div class=""> <p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." 
data-use-inplace-input="1" data-add-class="normal story-headline-edit-link"></p> </div> <div class=""> <div id="about-story" class="" aria-hidden="false"> <p class="about-story text-body-larger text-gray-lighter "> <span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span> </p> </div> <div class="wt-text-center-xs"> </div> </div> </div> <div class="wt-mb-xs-6 wt-mb-md-8"> <div class="clearfix"></div> <div> <h3 class="wt-text-title-01"></h3> <div class="pt-xs-2 pt-lg-4"> <div class="display-flex-md flex-wrap max-width-760px"> <div class="mb-xs-2 text-body mr-md-6"> <a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener"> <span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span> <span>Facebook</span> </a> </div> <div class="mb-xs-2 text-body mr-md-6"> <a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener"> <span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 
c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span> <span>Instagram</span> </a> </div> </div> </div> </div> </div> <div class="wt-mb-xs-8 wt-mb-md-10"> <div class="clearfix"></div> <div class="about-section display-flex-md flex-direction-column-md mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members"> <div class="p-xs-0"> <h3 class="wt-text-title-01">Shop 
members</h3> </div> <div class="pl-xs-0 pr-xs-0 pt-xs-2 pt-lg-4"> <div class="max-width-760px"> <ul class="list-unstyled block-grid-md-2" data-region="shop-member-list"> <li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner" data-member-name="Lucky Plum Studio"> <div class="flag"> <div class="flag-img vertical-align-top pr-lg-3"> <img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48"> </div> <div class="flag-body"> <h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6> <p class="prose" data-region="member-role">Owner</p> <p class="text-gray-lighter mb-xs-0" data-region="member-bio"> </p> </div> </div> </li> </ul> </div> </div> </div> </div> <div class=""> </div> </div>
As always, thanks in advance.
''######### updated today 22/3/2021 at 6pm uk time #########
In reply to Qharr's answer: I had this for location and nothing was collected. Could you please explain where I went wrong? I should then be able to fix the rest.
''' Element 4
DoEvents
If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then ' Get CLASS and Child Nod
    wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
Else
    HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
    wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
End If
Not sure what to say except read up on HTML, HTML document methods and CSS selectors so you understand the patterns you need to apply. The rest is just practice and learning which are the fastest and most robust methods.
CSS:
Location: .shop-location span matches a span that is a descendant of an element with class shop-location.
Social media links: #about .text-decoration-none matches nodes with the class name text-decoration-none inside the element with id about.
Name: [data-region='member-name'] matches the element whose data-region attribute has the value member-name.
Read about css selectors and descendant combinator here
Practice css selectors here
Learn about html here
VBA:
Option Explicit
Public Sub GetInfo()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        With .document
            Debug.Print .querySelector(".shop-location span").innerText 'location
            Dim i As Long, socialMedias As Object
            Set socialMedias = .querySelectorAll("#about .text-decoration-none")
            For i = 0 To socialMedias.Length - 1 'media links
                Debug.Print socialMedias.Item(i).href
            Next
            Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
        End With
        .Quit
    End With
End Sub
Less optimal methods for selecting:
Option Explicit
Public Sub GetInfo()
    Dim ie As SHDocVw.InternetExplorer
    Set ie = New SHDocVw.InternetExplorer
    With ie
        .Visible = True
        .Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        With .document
            Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
            Dim i As Object, socialMedias As Object
            Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
            For Each i In socialMedias 'media links
                Debug.Print i.href
            Next
            Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
        End With
        .Quit
    End With
End Sub
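The question falls back to "-" when a value is missing, so here is a rough sketch of pairing a document-level selector with a Nothing check. It queries from the document rather than from a listing card, so it does not matter which parent element the value sits under. It assumes a reference to the Microsoft HTML Object Library and uses an in-memory document. ShopLocationOrDash is a hypothetical helper name.

```vba
Option Explicit

' Sketch only: requires a reference to "Microsoft HTML Object Library".
' ShopLocationOrDash is a hypothetical helper introduced for illustration.
Public Function ShopLocationOrDash(ByVal pageHtml As String) As String
    Dim doc As MSHTML.HTMLDocument
    Dim node As Object

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = pageHtml

    ' Query the whole document, not a single listing card, so the
    ' element is found wherever it sits in the page.
    Set node = doc.querySelector(".shop-location span")

    If node Is Nothing Then
        ShopLocationOrDash = "-"   ' same placeholder the question uses
    Else
        ShopLocationOrDash = node.innerText
    End If
End Function
```

In the spreadsheet loop, the returned string could be written straight to the cell, which avoids chaining (0) lookups on a collection that may be empty.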
How to get the text of img alt inside <a> tag
I have a URL with the following HTML part:
<div class="shop cf">
    <a class="shop-logo js-shop-logo" href="/m/3870/GMobile">
        <noscript>
            <img alt="GMobile" class="js-lazy" data-src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" />
        </noscript>
        <img alt="GMobile" class="js-lazy" data-src="//a.scdn.gr/ds/shops/logos/3870/mid_20160920155600_71ff515d.jpeg" src="//c.scdn.gr/assets/transparent-325472601571f31e1bf00674c368d335.gif" />
    </a>
</div>
I want to get the alt of the first img inside the div with class shop cf, and I do:
Set seller = Doc.querySelectorAll("img")
wks.Cells(i, "D").Value = seller.getAttribute("alt").Content(0)
I get nothing. What did I forget to include? Can I get it from the <noscript> tag?
I tried the following as well:
Set seller = Doc.getElementsByClassName("js-lazy")
wks.Cells(i, "D").Value = seller.getAttribute("alt")
Use an element selector combined with an attribute selector.
CSS:
img[alt]
VBA:
ie.document.querySelector("img[alt]")
You may need to add getAttribute to read the value:
ie.document.querySelector("img[alt]").getAttribute("alt")
To include the class, use:
ie.document.querySelector("img.js-lazy[alt]")
If there is more than one element, use querySelectorAll and index into the returned nodeList, e.g.
Set list = ie.document.querySelectorAll("img.js-lazy[alt]")
list.item(0).getAttribute("alt")
list.item(1).getAttribute("alt")
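As a rough illustration of the nodeList indexing described above, the sketch below scopes the selector to the shop div and returns the first alt value, or an empty string when nothing matches. It assumes a reference to the Microsoft HTML Object Library and parses a string in memory. FirstLogoAlt and the div.shop scoping are assumptions made for this example.

```vba
Option Explicit

' Sketch only: requires a reference to "Microsoft HTML Object Library".
' FirstLogoAlt is a hypothetical helper introduced for illustration.
Public Function FirstLogoAlt(ByVal pageHtml As String) As String
    Dim doc As MSHTML.HTMLDocument
    Dim images As Object

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = pageHtml

    ' Every img with class js-lazy and an alt attribute inside div.shop.
    Set images = doc.querySelectorAll("div.shop img.js-lazy[alt]")

    ' querySelectorAll returns a list, so index into it before
    ' calling getAttribute on a single element.
    If images.Length > 0 Then
        FirstLogoAlt = images.Item(0).getAttribute("alt")
    Else
        FirstLogoAlt = vbNullString
    End If
End Function
```

The question's attempts call getAttribute on the collection itself; going through Item(0) first is the difference this sketch is meant to show.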
Have you tried it this way?
let lazy1 = document.querySelectorAll(".js-lazy")[0];
let lazyalt = lazy1.getAttribute("alt");
let shop = document.querySelector('.shop');
shop.classList.add(lazyalt);
console.log(lazyalt);
VBA does not click <a> within <li>
I am navigating to a webpage with an unordered list. Now I have to click an anchor tag within a specific li tag. The relevant part of the source code is:
<UL class="x-tab-strip x-tab-strip-top" id=ext-gen151>
    <LI id=infoPageinfoPanelID__infoPage_myTab_pubst_pubstructStructureGWT _nodup="30817">
        <A class=x-tab-strip-close></A>
        <A class=x-tab-right href="#">
            <EM class=x-tab-left>
                <SPAN class=x-tab-strip-inner>
                    <SPAN class="x-tab-strip-text ">Structure</SPAN>
                </SPAN>
            </EM>
        </A>
    </LI>
</UL>
The anchor tag does not have a name or ID, but it has a class name ("x-tab-right"). I tried the following VBA code to simulate a click on that tag:
Dim targetSpan As HTMLObjectElement
Set targetSpan = doc.getElementById("infoPageinfoPanelID__infoPage_myTab_pubst_pubstructStructureGWT").getElementsByTagName("a")(1)
targetSpan.click
Second code:
Dim AllSpanElements As IHTMLElementCollection
Dim spanCounter As Long
Set AllSpanElements = doc.getElementsByTagName("li")
For spanCounter = 0 To AllSpanElements.Length - 1
    With AllSpanElements(spanCounter)
        If (.innerText) = "Structure" Then
            .ParentElement.ParentElement.ParentElement.Click
            Exit For
        End If
    End With
Next
I got the second code from StackOverflow. Neither piece of code does anything. What am I doing wrong? Thanks in advance.
Doubts in clicking Unordered List in IE using VBA
I have an unordered list with links in IE, and the list is dynamic. Clicking on the 1st level in the list opens the 2nd level and so on. My task is to select one link within the 4th level, so I have to click down to the 4th level and then click another option after that.
Initially, without clicking anything, the HTML of the list is:
<div class="tree" id="treeDiv" style="width: 288px; border-top-color: transparent; border-top-width: 195px; border-top-style: solid;" ondragstart="return false" onselectstart="return false" oncontextmenu="return false"> <ul style="width: 915px;"> <li> <a class=" open" id="level1st" onclick="resetTree(0);" href="#">GENERAL </a> <ul></ul> </li> </ul> </div>
After clicking "GENERAL" (1st level), the HTML of the list is:
<div class="tree" id="treeDiv" style="width: 288px; border-top-color: transparent; border-top-width: 195px; border-top-style: solid;" ondragstart="return false" onselectstart="return false" oncontextmenu="return false"> <ul style="width: 915px;"> <li> <a class=" open" id="level1st" onclick="resetTree(0);" href="#">GENERAL </a> <ul> <li> <a class=" open" id="0/2" style="padding-left: 13px;" href="#">GENERAL </a> <ul></ul> </li> </ul> </ul> </div>
To click the 1st level, I tried writing two kinds of code.
CODE 1-----
Sub A()
    Dim ie As InternetExplorerMedium
    Dim availableLinks As MSHTML.HTMLElementCollection
    Dim cLinks As MSHTML.HTMLLIElement
    Set ie = New InternetExplorerMedium
    ie.Navigate 'URL
    Do While ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set availableLinks = ie.document.getElementsByClassName("tree").getElementsByTagName("li")
    For Each cLinks in availableLinks
        If cLinks.innerText = "General" Then
            cLinks.click
        End If
    Next cLinks
End Sub
CODE 2-----
Sub B()
    Dim ie As InternetExplorerMedium
    Dim availableLinks As MSHTML.HTMLElementCollection
    Dim cLinks As MSHTML.HTMLLIElement
    Set ie = New InternetExplorerMedium
    ie.Navigate 'URL
    Do While ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set availableLinks = ie.document.getElementsByClassName("tree").getElementsByTagName("a")
    For Each cLinks in availableLinks
        If cLinks.ID = "level1st" Then
            cLinks.Focus
            cLinks.FireEvent("onclick")
        End If
    Next cLinks
End Sub
Now about my doubts:
Assuming my CODE 1 is correct, I am able to select my 1st level through the innerText 'General'. But my 2nd level also has the same innerText 'General'. So how do I specify in code that it is the 2nd level?
Also, in CODE 1, I have used cLinks.click. Is that correct, or should I use cLinks.FireEvent("onclick")?
In CODE 2, I have declared cLinks As MSHTML.HTMLLIElement, but I am not using any <li> tag in the code. So what should I declare it as: MSHTML.HTMLLIElement or MSHTML.HTMLElement?
Finally, are CODE 1 and CODE 2 correct for my unordered list? I cannot execute my code and check now, as I have to go to my office to do that. So what do you think? Which one is suitable? I think CODE 2 would be appropriate for all the levels. If both are wrong, can you suggest a suitable one?
Sorry for the very long post. I am just learning web scraping, so I am full of doubts for which I cannot find solutions on the internet. Can anyone please help? Thanks in advance.
Selecting Dropdown list in IE with VBA
I have been trying to select a drop down list in a web page using VBA, which I'm new to. In the HTML the drop down menu is declared as a button and not a select. Here is the HTML code:
<span class='btn-group'>
    <button id='str_listing-btn' name='str_listing-btn' type='button' class='btn btn-default dropdown-toggle' data-toggle='dropdown' data-value='For Sale'> For Sale <span class='caret' style='margin-left:5px;'> </span> </button>
    <ul class='dropdown-menu wptk_crud_dropdown' id='str_listing' data-value='str_listing' role='menu'>
        <li> <a href='#' data-value='For Sale'>For Sale</a> </li>
        <li> <a href='#' data-value='For Rent'>For Rent</a> </li>
        <li> <a href='#' data-value='Wanted To Buy'>Wanted To Buy</a> </li>
        <li> <a href='#' data-value='Wanted To Rent'>Wanted To Rent</a> </li>
    </ul>
</span>
I have tried a few VBA codes to select one of the options. Below is the latest code that I have used, but after running it nothing seems to happen to the drop down menu:
Private Sub InsertPropwall_Click()
    Dim objIE1 As Object
    objIE1.navigate ("http://www.propwall.my/classifieds/post_ad?action=add")
    Do
        DoEvents
    Loop Until objIE1.ReadyState = 4
    objIE1.Document.getElementById("str_listing-btn").Value = "For Rent"
    Do
        DoEvents
    Loop Until objIE1.ReadyState = 4
End Sub
The dropdown isn't a dropdown (as you've mentioned). Instead, it's an unordered list with a number of links as clickable options. With that said, the For Rent option isn't contained in str_listing-btn. Notice how that tag closes before any of the options are available. Instead, it's contained in the list below it.
Since I couldn't get into your link without creating an account, I tested out code to select the For Rent option on the main page. Take a look at my code, test it to see if it does what you need, then try to accommodate it into your code. Let us know if you need additional help.
Sub NavigateIt()
    Dim oIE As Object
    Set oIE = CreateObject("InternetExplorer.Application")
    oIE.navigate ("http://www.propwall.my/classifieds")
    oIE.Visible = True
    Do
        DoEvents
    Loop Until oIE.ReadyState = 4
    Set AvailableLinks = oIE.document.getelementbyid("list-listing").getelementsbytagname("a")
    For Each cLink In AvailableLinks
        If cLink.innerhtml = "For Rent" Then
            cLink.Click
        End If
    Next cLink
End Sub
CSS selector:
You could have used a CSS selector, e.g. a[data-value='For Sale'] or a[data-value='For Rent']
Used with querySelector, this returns the first a element that has a data-value attribute whose value is 'For Sale' (or 'For Rent' in the second example). The value must be quoted because it contains a space.
VBA:
You apply the CSS selector using the querySelector method of document:
objIE1.Document.querySelector("a[data-value='For Sale']").Click
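To make the quoting concrete, here is a minimal sketch that builds the attribute selector from an option name and looks the link up in an in-memory copy of the menu. It assumes a reference to the Microsoft HTML Object Library. FindOptionText is a hypothetical helper, and it returns the link text instead of clicking so it can be tried without a browser.

```vba
Option Explicit

' Sketch only: requires a reference to "Microsoft HTML Object Library".
' FindOptionText is a hypothetical helper introduced for illustration.
Public Function FindOptionText(ByVal menuHtml As String, _
                               ByVal optionValue As String) As String
    Dim doc As MSHTML.HTMLDocument
    Dim link As Object

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = menuHtml

    ' Wrap the value in single quotes: values such as "For Rent"
    ' contain a space and are not valid unquoted.
    Set link = doc.querySelector("a[data-value='" & optionValue & "']")

    If link Is Nothing Then
        FindOptionText = vbNullString   ' no matching option
    Else
        FindOptionText = link.innerText
    End If
End Function
```

Against a live page the same selector string would be passed to objIE1.Document.querySelector, and .Click called on the result once it is confirmed not to be Nothing.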