Picking the first element using XPath in Capybara

I have the following line of code:
link = find(:xpath, "//div[@id='tree']//a[contains(.,'#{peril}')]")
The step above matches two elements. How do I pick the first one? I am getting an "Ambiguous match, found 2 elements matching xpath" error. Here is the HTML (the markup was stripped on export; only the link text survives):
"ShipCase_US_MortalityRatingGroup_Life Portfolio result Earthquake Infectious Disease"

You need to surround the entire XPath in parentheses and add the [1] after it.
(//div[@id='tree']//a[contains(.,'#{peril}')])[1]

find(".active", match: :first).click
This solution uses Capybara's (quite important) waiting capabilities.
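Putting both suggestions together, here is a minimal sketch, assuming a Capybara session and a peril string defined elsewhere; the selectors mirror the question, so adjust them to your page:

# Option 1: wrap the whole XPath in parentheses and take the first match with [1].
find(:xpath, "(//div[@id='tree']//a[contains(.,'#{peril}')])[1]").click

# Option 2: keep the original XPath and let Capybara resolve the ambiguity,
# preserving its waiting/retry behaviour.
find(:xpath, "//div[@id='tree']//a[contains(.,'#{peril}')]", match: :first).click

# Option 3: `first` also returns the first matching element.
first(:xpath, "//div[@id='tree']//a[contains(.,'#{peril}')]").click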

Related

How to use by.css with XPath, and how to differentiate between two elements with identical paths/classes

I have these two XPaths:
/html/body/ion-app/ng-component/ion-nav/page-settings/ion-content/div[2]/ion-grid/ion-row[2]/ion-col[1]/ion-input
/html/body/ion-app/ng-component/ion-nav/page-settings/ion-content/div[2]/ion-grid/ion-row[2]/ion-col[2]/ion-input
How do I use them with element(by.css(...)) to test with Protractor?
And how do I differentiate between them when both paths are "equal", like the ones presented above and in the picture?
You cannot use XPath with by.css, only with by.xpath.
If you want the CSS equivalent of a given XPath, that is fairly basic knowledge; look it up. Here is a very good source: https://devhints.io/xpath
Even though the second question should have been a separate post, I'll answer that also. In your particular example, the inputs are not identical.
Xpath:
//*[@ng-reflect-model="GG"]/input
//*[@ng-reflect-model="Test"]/input
But in a hypothetical scenario where you have two elements with the same attributes, you can specify which occurrence to use, e.g. (//xpath)[1] selects the first match, or say which child it is, e.g. //xpath/to/firstChild[1].
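To illustrate those XPath ideas outside of Protractor, here is a small sketch using Nokogiri in Ruby; the markup and class names are made up for the example and only stand in for the structure described above:

require "nokogiri"

# Made-up markup with two structurally identical inputs.
html = '<div class="row">
          <div class="col"><input ng-reflect-model="GG"></div>
          <div class="col"><input ng-reflect-model="Test"></div>
        </div>'
doc = Nokogiri::HTML(html)

doc.at_xpath("//input[@ng-reflect-model='GG']")   # distinguish by attribute value
doc.at_xpath("(//div[@class='col']/input)[1]")    # first occurrence of an ambiguous match
doc.at_xpath("//div[@class='col'][2]/input")      # a specific occurrence by position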

Having trouble selecting some specific xpath... (html table, scrapy, xpath)

I'm trying to scrape data (using scrapy) from tables that can be found here:
http://www.bettingtools.co.uk/tipster-table/tipsters
My spider works when I parse the response with the following XPath:
//*[@id="imagetable"]/tbody/tr
Every table on the page shares that id, so I'm basically grabbing all the table data.
However, I only want the table data for the current month (tables in the right column).
When I try to be more specific with my XPath, I get an invalid XPath error even though it seems to be correct. I've tried:
- //*[@id="content"]/[contains(@class, "column2")]/[contains(@class, "table3")]/[@id="imagetable"]/tbody/tr
- //*[@id="content"]/div[contains(@class, "column2")]/div[contains(@class, "table3")]/[@id="imagetable"]/tbody/tr
- //*[@id="content"]/div[2]/div[1]/[@id="imagetable"]/tbody/tr
Also, when I try to select the XPath of a specific table on the page with Chrome, I just get //*[@id="imagetable"].
Am I missing something obvious here? Why are the three XPath examples above not valid?
Thanks
What makes those three XPaths invalid is the part with this pattern:
/[predicate expression here]
The XPath above fails to select a node for the predicate to apply to. It should instead look like this:
/*[predicate expression here]
Here are some examples of valid ones:
1. /table[@id="imagetable"]
2. /div[contains(@class, "column2")]
3. /*[contains(@class, "table3")]
For this specific task, you can try the following XPath, which selects rows from the table inside <div class="column2">:
//div[@class='column2']//table[@id="imagetable"]/tbody/tr
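As a sanity check outside scrapy, here is a sketch using Nokogiri in Ruby against minimal made-up markup that mirrors the layout described above; the point is only that a predicate must follow a node test (an element name or *):

require "nokogiri"

# Minimal stand-in markup; the real page has several such tables.
html = '<div id="content">
          <div class="column2">
            <div class="table3">
              <table id="imagetable"><tbody><tr><td>row 1</td></tr></tbody></table>
            </div>
          </div>
        </div>'
doc = Nokogiri::HTML(html)

# Invalid: a bare predicate with no node test in front of it is a syntax error.
# doc.xpath('//*[@id="content"]/[contains(@class, "column2")]')

# Valid: the predicate follows a node test.
rows = doc.xpath('//div[contains(@class, "column2")]//table[@id="imagetable"]/tbody/tr')
rows.each { |tr| puts tr.text }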
Check my answer to "Selenium automation: finding the best XPath". In short, check it in the browser; the browser can give you a unique locator, and then you can verify it.

How to parse HTML/XML tags according to NOT conditions in R

Dearest StackOverflow homies,
I'm playing with HTML that was output by EverNote and need to parse the following:
1. Note Title
2. Note anchor (the hyperlink identities of the notes themselves)
3. Note Creation Date
4. Note Content, and
5. Intra-notebook hyperlinks (the links within the content of a note to another note's anchor)
According to examples by Duncan Temple Lang, author of the R XML package, and an SO answer by @jdharrison, I have been able to parse the Note Title, Note anchor, and Note Creation Dates with relative ease. For those who may be interested, the commands to do so are:
require("XML")
rawHTML <- paste(readLines("EverNotebook.html"), collapse="\n") #Yes... this is noob code
doc = htmlTreeParse(rawHTML,useInternalNodes=T)
#Get Note Titles
html.titles<-xpathApply(doc, "//h1", xmlValue)
#Get Note Title Anchors
html.tAnchors<-xpathApply(doc, "//a[@name]", xmlGetAttr, "name")
#Get Note Creation Date
html.Dates<-xpathApply(doc, "//table[@bgcolor]/tr/td/i", xmlValue)
Here's a fiddle of an example HTML EverNote export.
I'm stuck on parsing 1. Note Contents and 2. Intra-notebook hyperlinks.
Taking a closer look at the code, it is apparent that the solution for the first part is to return every upper-most* div that does NOT include a table with attribute bgcolor="#D4DDE5". How is this accomplished?
Duncan says that it is possible to use XPath to parse XML according to NOT conditions:
"It allows us to express things such as "find me all nodes named a" or "find me all nodes named a that have no attribute named b" or "nodes a that have an attribute b equal to 'bob'" or "find me all nodes a which have c as an ancestor node""
However he does not go on to describe how the XML package can parse exclusions... so I'm stuck there.
Addressing the second part, consider the format of anchors to other notes in the same notebook:
<a href="#13178">
The goal with these is to procure their number, yet this is difficult because they are distinguished from www links solely by the # prefix. Information on how to parse for these particular anchors via partial matching of their value (in this case #) is sparse, and it may even require grep(). How can one use the XML package to parse for these special hrefs? I describe both problems here since it's possible a solution to the first part may aid the second... but perhaps I'm wrong. Any advice?
UPDATE 1
By upper-most div I mean outer-most div. The contents of every note in an EverNote HTML export are within the DOM's outer-most divs. Thus the interest is to return every outer-most div that does NOT include a table with attribute bgcolor="#D4DDE5".
"....to return every upper-most div that does NOT include a table with attribute bgcolor="#D4DDE5". How is this accomplished?"
One possible way, ignoring 'upper-most' since I don't know exactly how you would define it:
//div[not(table[@bgcolor='#D4DDE5'])]
The XPath above reads: select all <div> elements not having a child <table> element whose bgcolor attribute equals #D4DDE5.
I'm not sure about what you mean by "parse" in the 2nd part of the question. If you simply want to get all of those links having the special href, you can partially match the href attribute using starts-with() or contains():
//a[starts-with(@href, '#')]
//a[contains(@href, '#')]
UPDATE :
Taking the "outer-most" div into consideration:
//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]
Side note: I don't know exactly how XPath not() is defined, but if it works like negation in general (and it did, as confirmed by the OP in the comment below), you can apply one of De Morgan's laws:
"not (A or B)" is the same as "(not A) and (not B)".
so that the updated XPath can be slightly simplified to:
//div[not(table[@bgcolor='#D4DDE5'] or ancestor::div)]
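Although the question is about the R XML package, the XPath itself can be checked with any XPath 1.0 engine; here is a sketch using Nokogiri in Ruby against made-up markup that imitates the export structure described above:

require "nokogiri"

# Made-up markup imitating the EverNote export: a header table inside one div,
# and note content with an intra-notebook link in another.
html = '<body>
          <div><table bgcolor="#D4DDE5"><tr><td><i>date</i></td></tr></table>Header</div>
          <div>Note body with <a href="#13178">an intra-notebook link</a> and
               <a href="http://example.com">a www link</a></div>
        </body>'
doc = Nokogiri::HTML(html)

# Outer-most divs that do not contain the header table.
doc.xpath("//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]")

# Intra-notebook links only: hrefs that begin with '#'.
doc.xpath("//a[starts-with(@href, '#')]").map { |a| a["href"] }  # => ["#13178"]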

Regex selects first to last instead of just first

I'm trying to use String#sub! in Ruby and it substitutes way too much.
Here is the regex I'm using; you can see it's matching too much: http://rubular.com/r/IUav4KEFWH
<rb>.+<\/rb>
It selects from the first <rb> to the last </rb>, and I want it to select just the first pair.
Is there another version of sub I'm not aware of, or a better way to substitute?
It would be easy to turn off multi-line matching and put the tags on separate lines, but I don't want to sacrifice multi-lining.
Your regex is too greedy:
<rb>.+<\/rb>
Make it non-greedy using:
<rb>.+?<\/rb>
Rubular Demo
It matches from the first <rb> tag up until the very last </rb> tag because + is a greedy operator meaning it will match as much as it can and still allow the remainder of the regular expression to match.
You want to use +? for a non-greedy match meaning "one or more — preferably as few as possible".
<rb>.+?</rb>
Note: a parser is recommended for extracting data from HTML rather than regular expressions.
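A quick way to see the difference with String#sub on made-up input:

html = "<rb>first</rb> then <rb>second</rb>"

# Greedy: .+ runs from the first <rb> all the way to the last </rb>.
html.sub(/<rb>.+<\/rb>/, "X")   # => "X"

# Non-greedy: .+? stops at the earliest possible </rb>.
html.sub(/<rb>.+?<\/rb>/, "X")  # => "X then <rb>second</rb>"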
You can try this variant:
<rb>(?>(?!<\/rb>).)*+<\/rb>
Demo
Or if you want:
<rb>[^<]+<\/rb>
Demo
See the difference between .*? and [^<]+ in this DEMO

Get Image with Xpath using class of Div

How do I write the xpath to get the main news image in this article?
The below one failed for me.
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//*img"]
I want it to return all images in the case of a slideshow, and I want it to be flexible, as some of the classes change when the news changes.
Without looking at "this article", there is an obvious syntax error in your XPath expression:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//*img"]
The substring of the above, *img"], contains two errors: * followed by a name, and an unbalanced quote.
Probably you want:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//img
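For completeness, here is a sketch applying that XPath with Nokogiri in Ruby; the class names come from the question and the markup is invented for the example:

require "nokogiri"

# Invented markup mirroring the classes mentioned in the question.
html = '<div class="sectionColumns">
          <div class="column2 someOtherClass">
            <img src="slide1.jpg"><img src="slide2.jpg">
          </div>
        </div>'
doc = Nokogiri::HTML(html)

srcs = doc.xpath("//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//img")
          .map { |img| img["src"] }
# => ["slide1.jpg", "slide2.jpg"]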