How to find XPath for the banner text? - html

The text is Investors/ Lenders get access to creditworthy borrowers to lend funds as per their risk appetite and gain attractive stable returns or monthly income to create wealth.
How can I find the XPath for this text?

This is only possible with XPath 2.0 or above, because you need the fn:string-join function to merge the text() values in one XPath expression:
string-join(//p[@class='banner-text']//text()/normalize-space(), ' ')
I haven't tested this expression, because you chose to include the code as an image rather than as text. The == $0 text is probably included in the result; you can fix this with the fn:substring-after function.

Here is the xpath:
//p[@class='banner-text']
Here is the css:
p.banner-text
Tested in the Chrome console with the XPath and CSS below.
XPath:
document.evaluate("//p[@class='banner-text']", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotItem(0).innerText
CSS:
document.querySelector(".banner-text").innerText
Using the $x() helper of the Chrome DevTools console (a console utility, not jQuery):
$x("//p[@class='banner-text']")[0].innerText
Result for all the three above:
Investors/ Lendors "get access to creditworty" borrowers "to lend funds per their risk appetite and gain attractive stable returns or monthly income to create wealth."
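For anyone doing this outside the browser, here is a minimal Python sketch of the same idea: locate the paragraph by class, then merge all of its descendant text nodes into one string. The markup in SAMPLE is a guess (the question only showed the HTML as an image), and ElementTree needs well-formed markup, so treat this as an illustration rather than a drop-in scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real HTML was only posted as an image.
SAMPLE = """
<div>
  <p class="banner-text">
    Investors/ Lenders <b>get access to creditworthy</b> borrowers
    <b>to lend funds as per their risk appetite</b>
  </p>
</div>
"""


def banner_text(markup):
    """Return the whitespace-normalized text of the first p.banner-text, or None."""
    root = ET.fromstring(markup)
    node = root.find(".//p[@class='banner-text']")
    if node is None:
        return None
    # itertext() yields every descendant text node in document order;
    # split/join collapses runs of whitespace and trims both ends.
    return " ".join("".join(node.itertext()).split())


print(banner_text(SAMPLE))
```

The split/join step plays the role that normalize-space plays in XPath.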

Related

IMPORTXML in Google Sheets cannot seem to dive multiple spans

I have a fun spreadsheet for playing with PowerBall (potential) winnings. I discovered that I could use IMPORTXML to auto-fill the cash value of the current jackpot. But something changed and I am trying to fix it, but there's a problem:
=IMPORTXML(A4,A?) where A4= https://www.powerball.com/powerball-prize-estimate and A? = the XPATH.
The problem is that the XPATH comes to this: //*[@id="block-winningnumbersmodule"]/div[2]/div[2]/span[3]
The Full XPATH is: /html/body/div[2]/div/header/div[2]/div/div[2]/div[2]/span[3]
The relevant div[2] looks like this from the page:
<div class="estimated-jackpot">
<span class="estimated">Estimated Jackpot</span>
<span class="number"></span>
<span class="cash-value" data-value="Cash Value:"></span>
</div>
The second span is empty and I can't access the third span.
I solved this issue for myself in that the same info turned out to be elsewhere in the page. The old XPATH changed very little:
From:
/html/body/div[2]/div/main/div/div[1]/article/div/div[3]
To:
/html/body/div[2]/div/main/div/div[2]/article/div/div[3]
I switched to the, presumably, less likely to change randomly:
//div[@class="field_prize_amount_cash"]
Which lands me the same place.
But I continue with the question because I think it's still valid...why can't I get past the first span?
These are the XPATHs that I tried and the results for each:
XPATH | Result | Note
//*[@id="block-winningnumbersmodule"]/div[2]/div[2]/span[3] | #N/A | XPATH
/html/body/div[2]/div/header/div[2]/div/div[2]/div[2]/span[3] | #N/A | Full XPATH
//*[@id="block-winningnumbersmodule"]/div[2]/div[2]/span | Estimated Jackpot |
/html/body/div[2]/div/header/div[2]/div/div[2]/div[2]/span | Estimated Jackpot |
//div[@class="estimated-jackpot"] | Estimated Jackpot |
//span[@class="estimated"] | Estimated Jackpot |
//span[@class="number"] | #N/A | This is an empty span
//span[@class="cash-value"] | #N/A | This is what I am wanting
//*[@id="block-winningnumbersmodule"]/div[2]/div[2]/span[1] | Estimated Jackpot | For completeness' sake
//*[@id="block-winningnumbersmodule"]/div[2]/div[2]/span[2] | #N/A | Maybe it doesn't count the empty span? Poop!
So, help? That I don't need anymore except to scratch an itch :)
Well I figured it out! "The relevant div[2] looks like this from the page:" showed empty spans! Looking at the elements they were all filled in...they never made it to the page source! The part I found later in the page was in the elements AND the source. "There's your problem, right there!"
<div class="field_next_draw_date">
<time datetime="2021-05-16T02:59:59Z">2021-05-16T02:59:59+0000</time>
</div>
<div class="field_prize_amount">
$183 Million
</div>
<div class="field_prize_amount_cash">
$127.4 Million
</div>
Look! Numbers! Not nothing! So when I was getting #N/A it was because I was trying to read nothing. IMPORTXML is cool, and it will read nothing as nothing every time!
Now I leave this here as...a cautionary tale? ...a reminder? (of what?)
I know... A funny story about how the asking of a question, if done thoroughly, giving all information to help those answering with every conceivable piece of information so that they are able to answer, can wind up leading you to your own answer.
I guess.
Thanks for your help! :)
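To make the "reading nothing" point concrete, here is a small Python sketch that runs XPath-style lookups against the served source quoted above. It assumes, as the post concludes, that the lookup only sees the page source and not what scripts fill in afterwards. The PAGE_SOURCE wrapper and the text_at helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the post, wrapped in a <body> so they parse as one document.
PAGE_SOURCE = """
<body>
  <div class="estimated-jackpot">
    <span class="estimated">Estimated Jackpot</span>
    <span class="number"></span>
    <span class="cash-value" data-value="Cash Value:"></span>
  </div>
  <div class="field_prize_amount_cash">
    $127.4 Million
  </div>
</body>
"""


def text_at(markup, path):
    """Return the stripped text under the first match of path, or None if empty or missing."""
    node = ET.fromstring(markup).find(path)
    if node is None:
        return None
    text = "".join(node.itertext()).strip()
    return text or None


for path in (
    ".//span[@class='estimated']",
    ".//span[@class='cash-value']",
    ".//div[@class='field_prize_amount_cash']",
):
    print(path, "->", text_at(PAGE_SOURCE, path))
```

The cash-value span is found, but it has no text in the source, so there is nothing to return; the field_prize_amount_cash div carries its text in the source itself.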

Filtering name with two condition

I have a sample list of students and their grades per subject in this file:
https://docs.google.com/spreadsheets/d/1NeHlUaRnbvdJ2yJ38fUETGgBoYseQ8CuXmwRCwObAlM/edit#gid=0
In the range A16:A, I'd like to see the names of the students who scored 90-100 in the subjects whose checkboxes I tick in B15:K15.
The first example is when I check all of the boxes:
I will only see the first name on the list, because he is the only one who scored 90-100 in every subject.
The second example is when I check B15 and C15:
I will only see the first and second names on the list, because they are the only ones who scored 90-100 in those two subjects.
Is there a way to do this kind of filtering? Thank you so much.
Since this is your first post, I'm going to go with the approach I think you'll find easiest to understand. It's a long formula (which I've placed in a new sheet called "Erik Help" in A16), but it's just a repeat of the same element several times:
=FILTER(A2:K11, IF(B15=TRUE, B2:B11>=90, B2:B11^0), IF(C15=TRUE, C2:C11>=90, C2:C11^0), IF(D15=TRUE, D2:D11>=90, D2:D11^0), IF(E15=TRUE, E2:E11>=90, E2:E11^0), IF(F15=TRUE, F2:F11>=90, F2:F11^0), IF(G15=TRUE, G2:G11>=90, G2:G11^0), IF(H15=TRUE, H2:H11>=90, H2:H11^0), IF(I15=TRUE, I2:I11>=90, I2:I11^0), IF(J15=TRUE, J2:J11>=90, J2:J11^0), IF(K15=TRUE, K2:K11>=90, K2:K11^0))
The first argument of FILTER tells the function what to filter (in this case A2:K11).
After that, an IF statement is set up to check each checkbox. If the checkbox is checked, the FILTER will only include students who obtained a 90 or higher in that subject.
If the checkbox is NOT checked, then the student is automatically included (that's the part that says "B2:B11^0" etc., since anything to the zero-power equals 1, and 1 and TRUE are the same to Google Sheets). In other words, if no checkboxes were checked, then all students would read TRUE for all subjects, i.e., all students would be included (or, to think of it another way, no one is ruled out). While the ^0 is not strictly necessary (i.e., any number other than zero is the same as TRUE), I think it's better formula practice and easier for others to understand if TRUE is represented either as TRUE or as 1.
I also set conditional formatting on A15:A, to bold the name as you had it. (The conditional formatting rule says, in English, "If anything is there, use bold.") You can see the rule by clicking anywhere in the range A15:A, then selecting Format > Conditional formatting from the menu and clicking to open the rule that appears in the panel on the right of the screen.
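If it helps to see the logic outside of Sheets, here is a rough Python sketch of what the FILTER formula does: a checked subject imposes a "90 or higher" condition, and an unchecked subject imposes none. The names, subjects, and scores are invented for the example and are not taken from the shared spreadsheet.

```python
def filter_students(rows, checked, threshold=90):
    """Return the names whose score meets the threshold in every checked subject.

    rows:    {name: {subject: score}}
    checked: {subject: bool}, mirroring the checkbox row
    """
    selected = [subject for subject, is_on in checked.items() if is_on]
    # all() over an empty list is True, which matches the "^0" branch:
    # with no boxes checked, nobody is ruled out.
    return [
        name
        for name, scores in rows.items()
        if all(scores[subject] >= threshold for subject in selected)
    ]


# Invented sample data.
rows = {
    "Ana": {"Math": 95, "English": 92},
    "Ben": {"Math": 91, "English": 80},
    "Cy": {"Math": 70, "English": 99},
}

print(filter_students(rows, {"Math": True, "English": True}))
print(filter_students(rows, {"Math": True, "English": False}))
print(filter_students(rows, {"Math": False, "English": False}))
```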

How to select from a selection box with a variable in the name?

I am having trouble selecting an option from this select element.
<select name="vehicle_attrs[position_count]" class="mb1"><option>Position / Quantity</option><option>Front</option><option>Rear</option></select>
I have tried
select('Front', :from=>'mb1')
select('Front', :from=>'vehicle_attrs[position_count]')
select('Front', :from=>'vehicle_attrs[1]')
All of them result in a "cannot find select box" error.
I've never liked how restrictive Capybara's concept of a 'locator' is (i.e. must have a name/id/label), but if you dig into the source code, those helpful methods like select, click_on, and fill_in are just wrappers for find and some native method of Element, which takes arbitrary CSS, and works in almost all situations. In this case, you could use:
find('[name="vehicle_attrs[position_count]"]').find('option', text: 'Front').select_option
Since dropdowns often have multiple similar options, where one is a substring of the other, you might consider using an exact string match, like
find('[name="vehicle_attrs[position_count]"]').find('option', text: /\AFront\z/).select_option
From the docs for select - https://www.rubydoc.info/github/teamcapybara/capybara/Capybara/Node/Actions#select-instance_method - we can see that the from option takes "The id, Capybara.test_id attribute, name or label of the select box".
Neither 'mb1' nor 'vehicle_attrs[1]' is any of those, so they would be expected to fail.
'vehicle_attrs[position_count]' is the name, so assuming the box is actually visible on the page (not replaced with a JS-driven select widget, etc.), that should work. If it doesn't, then edit your question and add the full, exact error message you get when trying to use it. Of course, if there is only one select box on the page with an option of 'Front', then you don't need to specify the from option at all and can just do
select 'Front'

How can I use "Interpolated Absolute Discounting" for a bigram model in language modeling?

I want to compare two smoothing methods for a bigram model:
Add-one smoothing
Interpolated Absolute Discounting
For the first method, I found some code.
def calculate_bigram_probabilty(self, previous_word, word):
    bigram_word_probability_numerator = self.bigram_frequencies.get((previous_word, word), 0)
    bigram_word_probability_denominator = self.unigram_frequencies.get(previous_word, 0)
    if self.smoothing:
        bigram_word_probability_numerator += 1
        bigram_word_probability_denominator += self.unique__bigram_words
    return 0.0 if bigram_word_probability_numerator == 0 or bigram_word_probability_denominator == 0 else float(
        bigram_word_probability_numerator) / float(bigram_word_probability_denominator)
However, I found nothing for the second method except for some references for 'KneserNeyProbDist'. However, this is for trigrams!
How can I change my code above to calculate it? The parameters of this method must be estimated from a development-set.
In this answer I'll just clear up a few things I found about your problem; I can't provide a coded solution.
With KneserNeyProbDist you seem to be referring to a Python implementation of this smoothing method: https://kite.com/python/docs/nltk.probability.KneserNeyProbDist
There is a Wikipedia article about Kneser–Ney smoothing: https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
That article links to this tutorial: https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf but it has a small fault on its most important page, page 29. The plain text there is:
Modified Kneser-Ney
Chen and Goodman introduced modified Kneser-Ney:
- Interpolation is used instead of backoff.
- Uses a separate discount for one- and two-counts instead of a single discount for all counts.
- Estimates discounts on held-out data instead of using a formula based on training counts.
Experiments show all three modifications improve performance.
Modified Kneser-Ney consistently had best performance.
Regrettably, the modified version is not explained in that document.
Luckily, the original paper by Chen & Goodman is available; modified Kneser–Ney smoothing is explained on page 370 of this document: http://u.cs.biu.ac.il/~yogo/courses/mt2013/papers/chen-goodman-99.pdf.
I copy the most important text and formula here as a screenshot:
So modified Kneser–Ney smoothing is now documented and seems to be the best solution; translating the description and the formula into running code is the one step still left to do.
It may also help that, in the linked document, the passage shown in the screenshot is followed by further explanation of the raw description.
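Coming back to what the question literally asks for, here is a minimal sketch of plain interpolated absolute discounting for bigrams. It uses a single discount d and is not modified Kneser–Ney. It uses P(w|v) = max(c(v,w) - d, 0) / c(v) + (d * N(v) / c(v)) * P_uni(w), where N(v) is the number of distinct words seen after v. The class name, the add-one smoothed unigram as lower-order model, the single reserved slot for unknown words, and the candidate grid for d are all my own assumptions, so check them against your course material.

```python
import math
from collections import Counter


class AbsoluteDiscountBigram:
    """Interpolated absolute discounting for bigrams (illustrative sketch)."""

    def __init__(self, tokens, discount=0.75):
        self.discount = discount
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)
        # Assumption: reserve one extra vocabulary slot for unknown words.
        self.vocab_size = len(self.unigrams) + 1
        self.context_counts = Counter()  # c(v): times v occurs as a context
        self.followers = Counter()       # N(v): distinct words seen after v
        for (prev, _), count in self.bigrams.items():
            self.context_counts[prev] += count
            self.followers[prev] += 1

    def unigram_prob(self, word):
        # Add-one smoothed unigram, used as the lower-order distribution.
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        context = self.context_counts[prev]
        if context == 0:
            # Unseen context: fall back to the unigram model entirely.
            return self.unigram_prob(word)
        discounted = max(self.bigrams[(prev, word)] - self.discount, 0) / context
        # The mass removed by discounting is handed to the unigram model.
        interpolation_weight = self.discount * self.followers[prev] / context
        return discounted + interpolation_weight * self.unigram_prob(word)


def perplexity(model, tokens):
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(model.prob(prev, word)) for prev, word in pairs)
    return math.exp(-log_sum / len(pairs))


def tune_discount(train, dev, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the discount with the lowest perplexity on the development set."""
    return min(candidates, key=lambda d: perplexity(AbsoluteDiscountBigram(train, d), dev))


train = "the cat sat on the mat the cat ate".split()
dev = "the cat sat on the mat".split()
model = AbsoluteDiscountBigram(train, discount=tune_discount(train, dev))
print(model.discount, model.prob("the", "cat"), model.prob("the", "dog"))
```

The tune_discount function is where the "estimated from a development set" requirement comes in: it simply tries a handful of discounts between 0 and 1 and keeps the one with the lowest development-set perplexity. A finer grid or a proper optimizer would work the same way.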

Trouble with Xpath in Google Spreadsheets (ImportXML)

This is a great site, and I've already had a lot of questions answered simply by scrolling and searching through other postings. Unfortunately, I can't seem to track down an answer that specifically helps this problem, and figured I would try posting and looking for help-
I'm using ImportXML and Google Spreadsheets to 'scrape' a few product descriptions from a retail site. It's been working fine for the most part, and I have done it in 2 ways:
1) Specific call to the description part of a post:
=ImportXML(A1,"//div[@class='desc']")
2) Call to the entire 'product Card', which also returns info such as product title, price, time posted, and places these items in adjacent cells in my Google spreadsheet:
=ImportXML(A1,"//div[@class='productCard']")
Both have worked fine, but I've run into a different problem using each method. If I can resolve even one of these problems, then I'll happily scrap the other method; I just need one of them to work. The problems are:
Method 1) The website prohibits sellers from including contact information in product postings-- when they include an email address anyways, the site automatically blocks it, so that in the posting it simply appears as "...you can reach me at [obscured]" or something like that. The [obscured] appears in a different colour text and is obviously treated differently somehow. When I scrape these descriptions using Method 1, ImportXML appears to get 'bumped' when it hits the word [obscured], and it passed the remaining text from that product description to the next cell over in my spreadsheet. This ruins the entire organization of the sheet, and I'd like to find a way where I can get ImportXML to just ignore the [obscured], and still place the entire text of the product description in one cell.
Method 2) My call for the entire 'product Card' is as follows:
=ImportXML(A1,"//div[@class='productCard']")
As mentioned, this works fine (for most products), and I don't mind the additional info (price, date, etc.) being posted in adjacent cells.
However, the website also allows certain products to be 'featured', where they appear in a different colour box on the site, and are therefore more likely to get a buyer's attention.
Using this method, the 'featured' products are not scraped or imported into my spreadsheet, but are simply passed over.
The source code (on actual site) (via 'inspect element' in Safari) for both the description (Method 1) and product card (Method 2) look as follows (for a normal product (a) and a featured product (b)):
(a)
<div id="productSearchResults">
<div class="productCard tracked">
<div>...</div>
<div class="stats">...</div>
<div class="desc collapsed descFull">...</div>
</div>
(b)
<div id="productSearchResults">
<div class="productCard featured tracked">
<div>...</div>
<div class="stats">...</div>
<div class="desc collapsed descFull">...</div>
</div>
You can see in both (a) and (b) the 'desc' class that I call in Method 1, which seems to work fine.
From my reading on this site, I think I've learned that a given class can't have more than one word, and therefore the use of "desc collapsed descFull" and "productCard tracked" and "productCard featured tracked" don't represent classes with 3, 2 and 3 words in the title, but instead cases where multiple classes have been assigned?
Regardless, the call to 'desc' (Method 1) works fine and seems to get all descriptions.
In method 2 therefore, I would have thought that a call to 'productCard' would get the info for all products, both featured and regular, as 'featured' is an extra class assigned to some 'productCard's. If I call all 'productCard's, shouldn't the normal AND featured ones be returned? This is currently not the case. I've tried calling just 'tracked' and just 'featured' as classes, and neither returns anything, so my logic that they are their own class equivalent to 'productCard' may be flawed.
In summary, the 'desc' call in Method 1 works fine, and even gets descriptions for 'featured' products. However, when contact information is included in the description and is displayed as [obscured] it bumps my data into the next cell in the spreadsheet, immediately following the word. This throws off and ruins all organization.
In Method 2, I am not getting the featured products at all, which greatly weakens what I am trying to do. Can either (or both!) of these problems be fixed??
Thanks so so much for any help you can give me.
***UPDATE: As seen in the comments below, use of 'contains' as suggested improved Method 2 by retrieving both regular and featured products. However, featured product cards have extra text elements, and since the entire card is being scraped in this method, featured products do not match the cell alignment that regular products do. If there is a way to fix Method 1, this would therefore be much better.
As outlined in the comments below, the [obscured] text appears in a 'span' that follows underneath/indented from the
<div class="desc descFull collapsed"
as
<span class="obscureText">[obscured]</span>
Is there any way that I can import the 'desc's as I have been, but tell the XPath to essentially 'ignore' the [obscured] span, or at least deal with it in a way that doesn't make description text immediately after [obscured] appear one cell over?
Thanks so much everyone!
You can wrap your function with the concatenate()-function to make sure it all shows up in one cell:
=concatenate(ImportXML(A1,"//div[@class='productCard']"))
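On the multi-class part of the question: an attribute test such as @class='productCard' compares the whole attribute string, so it does not match class="productCard featured tracked". The usual XPath 1.0 idiom for "has this class among others" is contains(concat(' ', normalize-space(@class), ' '), ' productCard '). Here is a small Python sketch of the difference. The sample markup is adapted from the question (the third, single-class card is added for contrast), and the two helper functions are hypothetical names for the illustration.

```python
import xml.etree.ElementTree as ET

# Adapted from the question; the last card is an invented single-class example.
SAMPLE = """
<div id="productSearchResults">
  <div class="productCard tracked"><div class="desc collapsed descFull">Regular</div></div>
  <div class="productCard featured tracked"><div class="desc collapsed descFull">Featured</div></div>
  <div class="productCard"><div class="desc">Plain</div></div>
</div>
"""


def exact_class(root, value):
    """Like //div[@class='value']: the entire attribute string must be equal."""
    return root.findall(f".//div[@class='{value}']")


def has_class(root, name):
    """Token-based check: name appears as one whole word in the class list."""
    return [el for el in root.iter("div") if name in el.get("class", "").split()]


root = ET.fromstring(SAMPLE)
print(len(exact_class(root, "productCard")))  # only the single-class card
print(len(has_class(root, "productCard")))    # all three cards
print(len(has_class(root, "featured")))       # just the featured card
```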