How do I write the xpath to get the main news image in this article?
The below one failed for me.
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2']//*img"]
I want it to return all images in the case of a slideshow, and I want it to be flexible, as some classes change when the news changes.
Without looking at "this article", there is an obvious syntax error in your XPath expression:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2']//*img"]
The substring of the above, *img", contains two errors: a * followed by a name, and an unbalanced quote.
Probably you want:
//div[contains(@class,'sectionColumns')]//div[contains(@class,'column2')]//img
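A quick sanity check of that corrected expression with Python's lxml; the markup below is hypothetical and only mimics the structure described in the question.
from lxml import html
# Hypothetical markup mimicking the structure described in the question.
page = """
<div class="sectionColumns wide">
  <div class="column2 slideshow">
    <img src="slide1.jpg"/><img src="slide2.jpg"/>
  </div>
</div>
"""
doc = html.fromstring(page)
# contains() on @class keeps the expression working when extra class names are
# added, and //img picks up every image, e.g. all slides of a slideshow.
for img in doc.xpath("//div[contains(@class,'sectionColumns')]"
                     "//div[contains(@class,'column2')]//img"):
    print(img.get("src"))  # slide1.jpg, slide2.jpg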
I have a bunch of XPath locators for the elements I want to extract, and they share a similar structure:
/div/ul/li[1]/div/div[2]/a
/div/ul/li[2]/div/div[2]/a
/div/ul/li[3]/div/div[2]/a
...
They are simplified from a Pixiv user page. Each /div/div[2]/a element has a title string, so they are actually artwork titles.
I want to use a single expression to fetch all the above a elements in a WebExtension called PageProbe. Although I've tried a bunch of methods, none of them returns the wanted result.
However, the following expression does return all the a elements, including the ones I don't need.
/div/
The following expression returns the a element under only the first li item.
/div/ul/li/div/div[2]/a
Sorry for not providing enough info earlier. Hope someone can help me out. Thanks.
According to the information you gave here, you can simply use this XPath:
/div/ul/li/div/div[2]/a
However, I'm quite sure there should be a better locator based on other attributes, like class names.
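As a quick illustration with Python's lxml (the markup below is hypothetical and simplified like the locators in the question), the un-indexed expression matches the a element in every list item, not just the first:
from lxml import etree
# Hypothetical markup, simplified like the locators in the question.
doc = etree.fromstring("""
<div>
  <ul>
    <li><div><div/><div><a>Title 1</a></div></div></li>
    <li><div><div/><div><a>Title 2</a></div></div></li>
    <li><div><div/><div><a>Title 3</a></div></div></li>
  </ul>
</div>
""")
# Without an index on li, the expression selects the <a> under every li.
titles = doc.xpath("/div/ul/li/div/div[2]/a/text()")
print(titles)  # ['Title 1', 'Title 2', 'Title 3']
If the extension still returns only the first node, it may be evaluating the XPath with a first-node result type rather than a node-set snapshot.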
I am trying to enter data in a table using Robot Framework. The table has an ID, but it changes every time I load the page (it is some kind of UUID), so I can't use it as an "anchor" for my XPath. However, there is a heading for this table with a fixed ID that seems reasonable to start from. In between the heading and the table there are a couple of divs. So something like this (a mix of pseudo code and what I get when I copy the selector and XPath in Chrome) gets to the first cell in the first line of the table:
//*[#id="heading"] (a bunch of divs) /*[#id="random string of letters"]/div[3]/div/div/div[2]
I would like to write an xpath that looked something like this
//*[#id="heading"] [wildcard for the random ID and divs] /div[3]/div/div/div[2]
How do I write this?
Thank you.
If only one element inside the "heading" contains an id attribute, you could use
//*[@id="heading"]//*[@id]/div[3]/div/div/div[2]
If there is more than one element with an id attribute, you need something more specific, e.g. if the id contains a certain token:
//*[@id="heading"]//*[contains(@id, "tag")]/div[3]/div/div/div[2]
or (if using XPath 2.0), if only this @id within the heading contains a UUID:
//*[@id="heading"]//*[matches(@id,"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")]/div[3]/div/div/div[2]
Otherwise you will have to find something unique (within the context of "heading") from which to start the div[3]/div/div/div[2] search (if you are lucky, div[3]/div/div/div[2] is unique enough on its own).
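A minimal sketch of the first approach with Python's lxml (note that lxml only speaks XPath 1.0, so matches() is not available there); the markup is hypothetical and shallower than the real page, so the trailing positional steps are shorter than div[3]/div/div/div[2]:
from lxml import html
# Hypothetical markup: the inner element's id changes on every load,
# but the heading id is fixed.
page = """
<div id="heading">
  <div><div>
    <div id="a1b2c3d4-0000-4000-8000-feedfacebeef">
      <div>row 1, cell 1</div><div>row 1, cell 2</div>
    </div>
  </div></div>
</div>
"""
doc = html.fromstring(page)
# Anchor on the fixed heading id, use a wildcard with an @id predicate to hop
# over the element whose id is random, then continue with positional steps.
cell = doc.xpath('//*[@id="heading"]//*[@id]/div[2]')[0]
print(cell.text)  # row 1, cell 2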
I'm trying to scrape data (using scrapy) from tables that can be found here:
http://www.bettingtools.co.uk/tipster-table/tipsters
My spider works when I parse the response with the following XPath:
//*[@id="imagetable"]/tbody/tr
Every table on the page shares that id, so I'm basically grabbing all the table data.
However, I only want the table data for the current month (tables in the right column).
When I try and be more specific with my xpath, I get an invalid xpath error even though it seems to be correct. I've tried:
- //*[#id="content"]/[contains(#class, "column2")]/[contains(#class, "table3")]/[#id="imagetable"]/tbody/tr
- //*[#id="content"]/div[contains(#class, "column2")]/div[contains(#class, "table3")]/[#id="imagetable"]/tbody/tr
- //*[#id="content"]/div[2]/div[1]/[#id="imagetable"]/tbody/tr
Also, when I try to select the XPath of a specific table on the page with Chrome, I just get //*[@id="imagetable"].
Am I missing something obvious here? Why are the 3 above xpath examples I've tried not valid?
Thanks
What makes those 3 XPath expressions invalid is the part with this pattern:
/[predicate expression here]
The above fails to select a node to which the predicate would apply. It should rather look like this:
/*[predicate expression here]
Here are some examples of valid ones:
1. /table[@id="imagetable"]
2. /div[contains(@class, "column2")]
3. /*[contains(@class, "table3")]
For this specific task, you can try the following XPath, which selects rows from the table inside <div class="column2">:
//div[@class='column2']//table[@id="imagetable"]/tbody/tr
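A rough Scrapy sketch of how that selector could be used; the spider name and the yielded dictionary are hypothetical, only the XPath restriction is the point:
import scrapy

class TipsterSpider(scrapy.Spider):
    # Hypothetical spider; only the XPath is taken from the answer above.
    name = "tipsters"
    start_urls = ["http://www.bettingtools.co.uk/tipster-table/tipsters"]

    def parse(self, response):
        # Limit the search to tables inside <div class="column2"> so that only
        # the current month's rows are selected, even though every table on
        # the page shares id="imagetable".
        rows = response.xpath(
            '//div[@class="column2"]//table[@id="imagetable"]/tbody/tr'
        )
        for row in rows:
            yield {"cells": row.xpath("./td//text()").getall()}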
Check my answer to "Selenium automation - finding best xpath". In short, check it in the browser: the browser can give you a unique locator, and then you can verify it.
I have the following line of code
link = find(:xpath, "//div[@id='tree']//a[contains(.,'#{peril}')]")
The above step yields two elements. How do I pick the first one? I am getting an "Ambiguous match, found 2 elements matching xpath" error. Here is the HTML:
"ShipCase_US_MortalityRatingGroup_Life Portfolio result Earthquake Infectious Disease"
You need to surround the entire XPath in parentheses and add the [1] after it.
(//div[@id='tree']//a[contains(.,'#{peril}')])[1]
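To illustrate why the parentheses matter, a minimal sketch with Python's lxml (the markup and peril value are hypothetical):
from lxml import html
# Hypothetical markup with two matching links, as in the question.
doc = html.fromstring("""
<div id="tree">
  <a>Earthquake result</a>
  <a>Earthquake detail</a>
</div>
""")
peril = "Earthquake"
# Wrapping the whole expression in parentheses makes [1] apply to the complete
# result set, so exactly one node comes back instead of an ambiguous match.
first = doc.xpath(f"(//div[@id='tree']//a[contains(.,'{peril}')])[1]")
print(len(first), first[0].text)  # 1 Earthquake result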
find(".active", match: :first).click
This solution uses Capybara's (quite important) waiting capabilities.
I want to do THIS, just a little bit more complicated:
Lets say, I have an HTML input:
Don't break!
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.
You can't reach me at blam4c@example.com.
Is there a good RegEx to replace the Twitter username mentions by links to Twitter, but leave @example (the email address at the bottom) AND @test (in the link title, i.e. in HTML tags)?
It probably should also try to not add links inside existing links, i.e. not break this:
Hello @someone there!
My current attempt is to add ">" at the beginning of the string, then use this RegEx:
Search: '/>([^<]*\s)\@([a-z0-9_]+)([\s,.!?])/i'
Replace: '>\1<a href="http://twitter.com/\2">@\2</a>\3'
Then remove the ">" I added in step 1.
But that won't match anything but the "@blam4c". I know WHY it does so, that's not the problem.
I would like to find a solution that finds and replaces all Twitter user name mentions without destroying the HTML. Perhaps it would even be better to code this without RegEx?
First, keep the angle brackets out of your regexps.
Use an HTML parser and XPath to select the text nodes you are interested in processing, then consider a regexp for matching only @refs in those nodes.
I'll let other people try to give a specific answer to the regex part.
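A minimal sketch of that parser-plus-XPath approach in Python with lxml (the input string is hypothetical): select only the text nodes that are not inside an existing link, then run a mention regexp over those nodes alone.
import re
from lxml import html
# Hypothetical input resembling the question's example.
doc = html.fromstring(
    '<p>Some Twitter Users: @codinghorror and @blam4c. '
    "You can't reach me at blam4c@example.com. "
    '<a href="http://twitter.com/someone" title="@test">Hello @someone there!</a></p>'
)
# \B refuses to match when the @ is preceded by a word character, so the
# email address blam4c@example.com is left alone.
mention = re.compile(r"\B@(\w+)")
# text() nodes outside <a> exclude existing link text, and attribute values
# such as the @test in the title are never text nodes anyway.
for text_node in doc.xpath("//text()[not(ancestor::a)]"):
    for name in mention.findall(text_node):
        print(name)  # codinghorror, blam4c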
I agree with ddaa: there's almost no sane way to attack this without stripping the HTML links out first.
Presumably you'd be starting out with an actual Twitter message, which cannot by definition include any manually entered hyperlinks.
For example, here's how I found this question (the link resolves to this question so don't bother clicking it!)
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1
In this case, it's easy:
var msg = "Some Twitter Users: #codinghorror, #spolsky, #jarrod_dixon and #blam4c. http://bit.ly/2phvZ1";
var html = Regex.Replace(msg, "(?<!\w)(#(\w+))",
"$1");
(this might need some tweaking, I'd like to test it against a corpus, but it seems correct for the average Twitter message)
As for your more complicated cases (with HTML markup embedded in the tweets), I have no idea. Way too hard for me.
This regexp might work a bit better: /\B\@([\w\-]+)/gim
Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/4/
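For comparison, a rough Python translation of that pattern; the link format in the substitution is only an illustration:
import re
# The /i and /m modifiers map to IGNORECASE and MULTILINE; /g corresponds to
# using findall() or sub() rather than stopping at the first match.
mention = re.compile(r"\B@([\w\-]+)", re.IGNORECASE | re.MULTILINE)
msg = "Some Twitter Users: @codinghorror and @blam4c. Mail me at blam4c@example.com."
print(mention.findall(msg))  # ['codinghorror', 'blam4c']
print(mention.sub(r'<a href="http://twitter.com/\1">@\1</a>', msg))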