XPath select parent element only if child element exists - html

<element_1>
<element_2>Text</element_2>
<element_3>
<element_4>
<element_5>Test Text</element_5>
</element_4>
<element_4>
</element_4>
</element_3>
<element_6>
<element_7>
<element_8>0</element_8>
How would I write an XPath expression to find all instances of element_4 that contain an instance of element_5?
Context
URL: https://www.amazon.com/b/ref=s9_acss_bw_cg_KOTHLPCG_1a1_w?node=565108&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=merchandised-search-6&pf_rd_r=PASRJV57NJ97XPYZW0GS&pf_rd_t=101&pf_rd_p=1e1598d2-28c3-4a64-91af-254d7a033ada&pf_rd_i=541966\
I am using Selenium and I am trying to grab the names of only the laptops that are on sale. The laptops that are on sale have an old price, written with a strike-through, underneath the current price. I want the names of only the laptops that have that strike-through price in their listing.
I am very new to Selenium and XPath, so I hope that made sense.

To complete the picture, this will select the names of laptops on sale (it works for all pages; featured items on page 1 are excluded):
//span[@data-a-strike="true" or contains(@class,"text-strike")][.//text()]/preceding::h2[@class][1]
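For the generic sample at the top of the question, the usual XPath form is a child predicate, //element_4[element_5], which keeps only the element_4 nodes that have an element_5 child. Below is a minimal sketch of that predicate using Python's standard-library ElementTree rather than Selenium. The sample XML is closed by hand and the truncated element_6 branch is left out, so the exact document shape is an assumption; the helper name parents_with_child is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-closed version of the sample from the question. The element_6
# branch is omitted because the original snippet is truncated there.
SAMPLE = """
<element_1>
  <element_2>Text</element_2>
  <element_3>
    <element_4>
      <element_5>Test Text</element_5>
    </element_4>
    <element_4>
    </element_4>
  </element_3>
</element_1>
"""


def parents_with_child(xml_text, parent_tag, child_tag):
    """Return every parent_tag element that has a direct child_tag child."""
    root = ET.fromstring(xml_text)
    # The [child_tag] predicate filters on the existence of a child element.
    return root.findall(f".//{parent_tag}[{child_tag}]")


matches = parents_with_child(SAMPLE, "element_4", "element_5")
for match in matches:
    print(match.tag, "->", match.find("element_5").text)
```

Only the first element_4 is returned, because the second one has no element_5 child.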

Related

Primefaces Treenode How to select father node only?

Is there any way to select a father (parent) node without its children in a PrimeFaces tree?
I am using PrimeFaces tree nodes to display product categories, but not all products belong to child categories. For example, one product belongs to category 1.1.1, but another belongs to 1.1.
I want to update the product's category, but when I select the father node, its child nodes are selected too.
This is when I select father category:
This is what I want it to be:
Please help me, thank you.
p/s: I want to use checkbox.
See the documentation: https://primefaces.github.io/primefaces/10_0_0/#/components/tree?id=tree
Look at the propagateSelectionUp and propagateSelectionDown properties. Set both to false to get the behavior you want.

Insert a header from a HTML into VB

I am currently doing a school project for an elite program that serves every user's aviation purposes, from customers to pilots. I wanted to grab a text from a website called Fuel Planner. The user will input their departure and destination and then the website will load how much fuel is needed for that flight. However, I only need one part of that HTML which is the part where it prints the amount of fuel needed. The HTML code for that is as shown below:
<!-- end #menu -->
<div id="about">
<h3>Airbus A300-600-PW4158 Fuel Planner</h3>
<p>Sydney to Brisbane YSSY-YBBN (406 NM)<br></p>
<h2>Total Fuel: 26608 POUNDS</h2>
The line that I need to grab is
<h2>Total Fuel: 26608 POUNDS</h2>
I want this line to be inserted into the textbox txtFOB.Text.
Both are on the same form but on different tabs, so we don't have to worry about that. The web browser is called webFuel in the form frmPilots.
For this example, the departure ORIG is going to be
YSSY (Sydney)
And the arrival DEST will be
YBBN (Brisbane)
And the aircraft would be
A300-600
Both can be inserted into the two text boxes on the website's home page. Any help would be greatly appreciated. Thanks!
Since the element doesn't have an ID, you need to loop through the elements collection and look for the element you are interested in.
Try the following:
For Each element As HtmlElement In webFuel.Document.GetElementsByTagName("H2")
    If element.InnerText Like "Total Fuel: * POUNDS" Then
        'this is the element we are looking for...
        txtFOB.Text = element.InnerText.Replace("Total Fuel:","").Replace("POUNDS","").Trim()
        Exit For
    End If
Next
NB: You have put a lot of tags in the question which is confusing. The above code is for VB.NET and is not tested.
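The same idea (scan every H2, keep the one that matches the "Total Fuel: ... POUNDS" pattern, strip everything except the number) can be sketched outside the WebBrowser control. This is an illustration in Python using only the standard library, not the VB.NET code for the form; the names H2Collector and total_fuel_pounds are hypothetical, and the page text is the fragment quoted in the question with a closing div added by hand.

```python
import re
from html.parser import HTMLParser

# Fragment from the question; the closing </div> is added by hand.
PAGE = """<div id="about">
<h3>Airbus A300-600-PW4158 Fuel Planner</h3>
<p>Sydney to Brisbane YSSY-YBBN (406 NM)<br></p>
<h2>Total Fuel: 26608 POUNDS</h2>
</div>"""


class H2Collector(HTMLParser):
    """Collects the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headings[-1] += data


def total_fuel_pounds(html_text):
    """Return the fuel figure as an int, or None if no heading matches."""
    parser = H2Collector()
    parser.feed(html_text)
    parser.close()
    for heading in parser.headings:
        found = re.fullmatch(r"\s*Total Fuel:\s*(\d+)\s*POUNDS\s*", heading)
        if found:
            return int(found.group(1))
    return None


print(total_fuel_pounds(PAGE))
```

Returning None when nothing matches leaves it to the caller to decide what to show while the page has not finished loading.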

Trouble with Xpath in Google Spreadsheets (ImportXML)

This is a great site, and I've already had a lot of questions answered simply by scrolling and searching through other postings. Unfortunately, I can't seem to track down an answer that specifically helps this problem, and figured I would try posting and looking for help-
I'm using ImportXML and Google Spreadsheets to 'scrape' a few product descriptions from a retail site. It's been working fine for the most part, and I have done it in 2 ways:
1) Specific call to the description part of a post:
=ImportXML(A1,"//div[@class='desc']")
2) Call to the entire 'product Card', which also returns info such as product title, price, time posted, and places these items in adjacent cells in my Google spreadsheet:
=ImportXML(A1,"//div[@class='productCard']")
Both have worked fine, but I've run into a different problem with each method. If I can resolve even one of these problems, I'll happily scrap the other method; I just need one of them to work. The problems are:
Method 1) The website prohibits sellers from including contact information in product postings. When they include an email address anyway, the site automatically blocks it, so that in the posting it simply appears as "...you can reach me at [obscured]" or something like that. The [obscured] appears in a different colour text and is obviously treated differently somehow. When I scrape these descriptions using Method 1, ImportXML appears to get 'bumped' when it hits the word [obscured], and it passes the remaining text from that product description to the next cell over in my spreadsheet. This ruins the entire organization of the sheet, and I'd like to find a way to get ImportXML to just ignore the [obscured] and still place the entire text of the product description in one cell.
Method 2) My call for the entire 'product Card' is as follows:
=ImportXML(A1,"//div[@class='productCard']")
As mentioned, this works fine (for most products), and I don't mind the additional info (price, date, etc.) being posted in adjacent cells.
However, the website also allows certain products to be 'featured', where they appear in a different colour box on the site, and are therefore more likely to get a buyer's attention.
Using this method, the 'featured' products are not scraped or imported into my spreadsheet, but are simply passed over.
The source code (on actual site) (via 'inspect element' in Safari) for both the description (Method 1) and product card (Method 2) look as follows (for a normal product (a) and a featured product (b)):
(a)
<div id="productSearchResults">
<div class="productCard tracked">
<div>...</div>
<div class="stats">...</div>
<div class="desc collapsed descFull">...</div>
</div>
(b)
<div id="productSearchResults">
<div class="productCard featured tracked">
<div>...</div>
<div class="stats">...</div>
<div class="desc collapsed descFull">...</div>
</div>
You can see in both (a) and (b) the 'desc' class that I call in Method 1, which seems to work fine.
From my reading on this site, I think I've learned that a given class can't have more than one word, and therefore the use of "desc collapsed descFull" and "productCard tracked" and "productCard featured tracked" don't represent classes with 3, 2 and 3 words in the title, but instead cases where multiple classes have been assigned?
Regardless, the call to 'desc' (Method 1) works fine and seems to get all descriptions.
In method 2 therefore, I would have thought that a call to 'productCard' would get the info for all products, both featured and regular, as 'featured' is an extra class assigned to some 'productCard's. If I call all 'productCard's, shouldn't the normal AND featured ones be returned? This is currently not the case. I've tried calling just 'tracked' and just 'featured' as classes, and neither returns anything, so my logic that they are their own class equivalent to 'productCard' may be flawed.
In summary, the 'desc' call in Method 1 works fine, and even gets descriptions for 'featured' products. However, when contact information is included in the description and is displayed as [obscured] it bumps my data into the next cell in the spreadsheet, immediately following the word. This throws off and ruins all organization.
In Method 2, I am not getting the featured products at all, which greatly weakens what I am trying to do. Can either (or both!) of these problems be fixed??
Thanks so so much for any help you can give me.
***UPDATE: As seen in the comments below, use of the 'contain' as suggested improved Method 2 by retrieving both regular and featured products. However, featured product cards have extra text elements, and since the entire card is being scraped in this method, featured products do not match the cell alignment that regular products do. If there is a way to fix Method 1, this would therefore be much better.
As outlined in the comments below, the [obscured] text appears in a 'span' that follows underneath/indented from the
<div class="desc descFull collapsed"
as
<span class="obscureText">[obscured]</span>
Is there any way that I can import the 'desc's as I have been, but tell the XPath to essentially 'ignore' the [obscured] span, or at least deal with it in a way that doesn't make description text immediately after [obscured] appear one cell over?
Thanks so much everyone!
You can wrap your function with the concatenate()-function to make sure it all shows up in one cell:
=concatenate(ImportXML(A1,"//div[@class='productCard']"))
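Two points from this thread can be illustrated outside the spreadsheet. First, class is a space-separated list of tokens, so an exact comparison such as @class='productCard' does not match "productCard featured tracked"; in XPath 1.0 the common token-safe idiom is contains(concat(' ', normalize-space(@class), ' '), ' productCard '), though whether a given ImportXML call behaves identically is not verified here. Second, the description text is split across a text node, the obscureText span, and a trailing text node, which is why it needs to be joined back into one string. The sketch below uses Python's standard-library ElementTree; the description wording is made up for illustration, and has_class and full_text are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Markup modelled on samples (a) and (b); the description text is invented.
SAMPLE = """
<div id="productSearchResults">
  <div class="productCard tracked">
    <div class="desc collapsed descFull">Regular item, reach me at <span class="obscureText">[obscured]</span> today</div>
  </div>
  <div class="productCard featured tracked">
    <div class="desc collapsed descFull">Featured item</div>
  </div>
</div>
"""


def has_class(element, token):
    """True if token is one of the space-separated class names."""
    return token in element.get("class", "").split()


def full_text(element):
    """Join all nested text nodes into one whitespace-normalized string."""
    return " ".join("".join(element.itertext()).split())


root = ET.fromstring(SAMPLE)

# Exact attribute comparison: neither card has class exactly "productCard".
exact = root.findall(".//div[@class='productCard']")

# Token comparison: both the regular and the featured card match.
cards = [d for d in root.iter("div") if has_class(d, "productCard")]

# One string per description, with the span text kept inline.
descriptions = [full_text(d) for d in root.iter("div") if has_class(d, "desc")]

print(len(exact), len(cards))
print(descriptions)
```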

Excel Vba – how verify html child element is there or not

I am building an Excel file that gets product prices from a site and compares them. So far I have managed to parse the product name and price, but a problem comes up when a product is on sale, because the price then uses different elements, as shown below (1 is a normal product, 2 is on sale):
1.
<div class="price">
<span>$87</span>
</div>
2
<div class="price">
<del>100</del>
<ins>80</ins>
</div>
I am doing
Set hPrice = hPord(r).getElementsByClassName("price")
for loop
ActiveSheet.Range("H6").Offset(r, 0).Value = hPrice(0).innerText
This works fine for a normal product price, but for an on-sale product it returns "100 80".
I tried to use
If Not hPrice(0).getElementsByTagName("ins") Then
but this gives an error when "ins" is not present.
Please let me know how to verify whether a child tag is there or not, or whether you have a better alternative.
Thanks
You have forgotten to index into the collection returned for the <ins> tag, and the result has to be compared against Nothing rather than negated directly. The check could look something like this:
If Not hPrice(0).getElementsByTagName("ins")(0) Is Nothing Then
In other words, when the element is present, the sale price is read with this line:
hPrice(0).getElementsByTagName("ins")(0).innerText
Try one of the checks below (hPrice is a collection, so it needs an index first):
If Not hPrice(0).getElementsByTagName("ins")(0) Is Nothing Then
OR
If hPrice(0).getElementsByTagName("ins").Length > 0 Then
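The underlying pattern is "look the child up, test whether the lookup found anything, and only then read it". A minimal sketch of that pattern in Python with the standard-library ElementTree, using the two price snippets from the question; this is an illustration of the logic, not VBA, and the function name current_price is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two price blocks from the question.
NORMAL = '<div class="price"><span>$87</span></div>'
ON_SALE = '<div class="price"><del>100</del><ins>80</ins></div>'


def current_price(price_html):
    """Return the sale price if an <ins> child exists, else the plain price."""
    price = ET.fromstring(price_html)
    sale = price.find("ins")
    # find() returns None when the child is missing, so test explicitly
    # instead of touching .text on a missing element.
    if sale is not None:
        return sale.text
    return "".join(price.itertext()).strip()


print(current_price(NORMAL))
print(current_price(ON_SALE))
```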

Handling Multiple Images with ColdFusion and MySQL

This is an architecture question, but its solution lies in ColdFusion and MySQL structure--or at least I believe so.
I have a products table in my database, and each product can have any number of screen-shots. My current method to display product screen-shots is the following:
I have a single folder where all screen-shots associated with all products are contained. All screen-shots are named exactly the same as their productID in the database, plus a prefix.
For example: Screen-shots of a product whose productID is 15 are found in the folder images, with the name 15_screen1.jpg, 15_screen2.jpg, etc...
In my ColdFusion page I have hard-coded the image path into the HTML (images/); the image name is broken into two parts; part one is dynamically generated using the productID from the query; and part two is a suffix, and is hard-coded. For example:
<img src="/images/#QueryName.productID#_screen1.jpg">
<img src="/images/#QueryName.productID#_screen2.jpg"> etc...
This method works, but it has several limitations, the biggest of which are listed below:
I have to hard-code the exact number of screen-shots in my HTML template. This means the number of screen shots I can display will always be the same. This does not work if one product has 10 screen shots, and another has 5.
I have to hard-code image prefixes into my HTML. For example, I can have up to five types of screen-shots associated with one product: productID=15 may have 15_screen1.jpg, 15_screen2.jpg, and 15_FrontCover.jpg, 15_BackCover.jpg, and 15_Backthumb.jpg, etc...
I thought about creating a paths column in my products table, but that meant creating several hundreds of folders for each product, something that also does not seem efficient.
Does anyone have any suggestions or ideas on the correct method to approach this problem?
Many thanks!
How about...
use an Image table, one product to many images (with optional sortOrder column?), and use imageID as the jpeg file name?
update:
Have a ImageClass table, many Image to one ImageClass.
Image
-----
ID
productID
imageClassID (FK to ImageClass)
Use back-end business logic to enforce that some classes can only have one image.
Or... if you really want the schema itself to enforce that some classes can only have one image, then you can go for a more complex design:
Product
------
ID
name
...
frontCoverImageID
backCoverImageID
frontThumbImageID
backThumbImageID
Image
-----
ID
productID
isScreenShot (bit) // optional, but easier to query later...
However, I like the first one better, since you can add as many classes as you see fit later without refactoring the DB.
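The first design (one product to many images, each image tagged with a class, image ID used as the file name) can be sketched with an in-memory SQLite database. This is only an illustration of the table shape and the lookup query, not MySQL or ColdFusion code; the sample rows, the class names, and the function name image_files are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ImageClass (
        ID   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Image (
        ID           INTEGER PRIMARY KEY,
        productID    INTEGER NOT NULL,
        imageClassID INTEGER NOT NULL REFERENCES ImageClass(ID),
        sortOrder    INTEGER NOT NULL DEFAULT 0  -- the optional ordering column
    );
    """
)

# Hypothetical sample data: product 15 has two screen-shots and a front cover.
conn.executemany(
    "INSERT INTO ImageClass (ID, name) VALUES (?, ?)",
    [(1, "screen"), (2, "FrontCover")],
)
conn.executemany(
    "INSERT INTO Image (ID, productID, imageClassID, sortOrder) VALUES (?, ?, ?, ?)",
    [(101, 15, 1, 1), (102, 15, 1, 2), (103, 15, 2, 0), (104, 16, 1, 1)],
)


def image_files(connection, product_id):
    """Return (file name, class name) pairs for one product, in display order."""
    rows = connection.execute(
        "SELECT i.ID, c.name FROM Image AS i "
        "JOIN ImageClass AS c ON c.ID = i.imageClassID "
        "WHERE i.productID = ? ORDER BY c.name, i.sortOrder",
        (product_id,),
    ).fetchall()
    # The image ID doubles as the file name, as suggested in the answer.
    return [(f"{image_id}.jpg", class_name) for image_id, class_name in rows]


print(image_files(conn, 15))
```

The page template then loops over whatever rows come back, so the number of images per product no longer has to be hard-coded.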
Keeping information on how many and what images in the database is definitely the way to go.
Barring that, if you want to use naming conventions to associate images with products, and the number of images is arbitrary, then it's probably a better idea to create one folder per product:
/images/products/{SKU1}/frontview.jpg
/images/products/{SKU1}/sideview.jpg
/images/products/{SKU2}/frontview.jpg
and so forth. Then use <cfdirectory> to collect the images for a given product. You might also want to name your images 00_frontview.jpg, 01_sideview.jpg and such so that you can sort and control what order they'll display on the page.
Use the cfdirectory tag to inspect the filesystem:
<!--- get a query resultset of images in filesystem --->
<cfdirectory action="list" name="images" directory="images">
<!--- get images for specific product --->
<cfquery name="productImages" dbtype="query">
    select *
    from images
    where name like '#productid#%'
</cfquery>
<cfoutput query="productImages">
    <img src="#productimages.directory#/#productimages.name#" />
</cfoutput>
You could even try using the filter attribute to cfdirectory to try and omit the QoQ
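The filesystem approach boils down to "list the folder, keep the files whose names start with the product ID". A minimal sketch of that filtering step in Python, run against a temporary directory with made-up file names; it illustrates the naming-convention lookup only and is not a ColdFusion equivalent. The function name product_images is hypothetical. The underscore is included in the prefix so that product 15 does not also pick up files belonging to product 150.

```python
import tempfile
from pathlib import Path


def product_images(image_dir, product_id):
    """Return the sorted image file names that belong to one product."""
    prefix = f"{product_id}_"
    return sorted(
        entry.name
        for entry in Path(image_dir).iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )


# Build a throwaway folder with hypothetical screen-shot files.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "15_screen1.jpg",
        "15_screen2.jpg",
        "15_FrontCover.jpg",
        "150_screen1.jpg",
        "16_screen1.jpg",
    ]:
        (Path(tmp) / name).touch()
    found = product_images(tmp, 15)

print(found)
```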