Watir get INNER html of <span> - html

I'm trying to find a navigation link by iterating through a handful of spans with class 'menu-item-text'. My goal is to compare what is inside the span tags to see if it is the right navigation control to click (there are no hard ids to go by.) My code is like this:
navlinks = @browser.spans(:class, 'menu-item')
navlinks.each do |this|
  puts "'#{this.text}'"
  if this.text == link_name
    this.click
    break
  end
end
I know for sure I'm getting the correct elements. However, text is always an empty string. My second idea was to use .html instead of .text, but that returns something like this:
<span class="menu-item">Insights</span>
What I want is the "Insights" text inside the span, not the full html that includes the tag markup. I have also tried using this.span.text, but that did not work either.
How can I target exclusively the inner html of an element through watir's content grabbing methods?
Thanks!

Assuming you are using Watir-Webdriver v0.6.9 or later, an inner_html method has been added for getting the inner HTML.
For the span:
<span class="menu-item">Insights</span>
You could do:
@browser.span(:class => 'menu-item').inner_html
#=> "Insights"
Similarly, you could try using this method in your loop instead of .text.
Note that, depending on the uniqueness of your text, you might be able to simply check whether the text appears in the element's (outer) html:
@browser.span(:class => 'menu-item', :html => /#{link_name}/).click


How can I get the element of a-tag in the div class with selenium?

I am working on a project where I have to get an element from a specific website.
I want to get the text shown in the markup below.
<div class="block-content">
<div class="block-heading">
<a href="https://www~~~~~~">
<i class="fa fa-map">
::before
</i>
"Text I want to get"
</a>
</div>
</div>
I have been trying to solve this for a while, but I could not find anything working fine.
I would love you if you could help me.
Thank you.
According to the information you provided, the text you are looking for is inside an a element, so the XPath for this element is something like:
//a[contains(@href,'https://www')]
But since there is also an i element inside it, getting the text from the a element will give you both the text contained in a itself and the text inside i.
So you should get the text from i (which looks like just a space here) and remove it from the text you receive from a.
If you want to perform this action on all the a elements that have an href and contain an i element, you can use the following XPath:
//a[@href and ./i]
If there are more specific criteria for the elements you are looking for, the XPath above should be updated accordingly.
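As a rough illustration of the difference between an element's own text and the text of its nested children, here is a minimal Python sketch. It parses a static snippet with the standard library's xml.etree.ElementTree instead of driving a browser with Selenium, so it is not the approach from the answer above. The own_text helper, the example.invalid URL, and the "icon" placeholder inside i are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def own_text(element):
    """Return the text that belongs to `element` itself, skipping child elements.

    In ElementTree, text that follows a child element is stored in that
    child's `tail`, so the parent's own text is `element.text` plus the
    tails of its direct children.
    """
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


# Hypothetical markup modelled on the question: an <a> with an <i> inside it.
SNIPPET = (
    '<a href="https://example.invalid">'
    '<i class="fa fa-map">icon</i> Text I want to get </a>'
)

link = ET.fromstring(SNIPPET)
print(own_text(link))                    # text of <a> only
print("".join(link.itertext()).strip())  # text of <a> plus its children
```

With Selenium the same idea applies: read the text of the parent, read the text of the child, and remove the child's part.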
From your comment, I understood that you would like to extract that text. So here is the code for you which would extract the text you want.
Selenium::WebDriver::Wait
  .new(timeout: 60)
  .until { !driver.find_element(xpath: "//i[@class='fa fa-map-marker']/..").text.empty? }
p driver.find_element(xpath: "//i[@class='fa fa-map-marker']/..").text[/(?<=before \")\w+ \w+ \w+ \w+ \w+/]
Output:
"Text I want to get"
I couldn't get the elements I wanted directly, so here's what I did.
I take the text of the enclosing block and trim it down with a few string methods.
def seller_name
  shop_info_elements = @driver.find_elements(:class_name, "block-content")
  shop_info_text = shop_info_elements.first.text
  shop_info_text_array = shop_info_text.lines
  seller_name = shop_info_text_array.first.chomp
  seller_name
end
It is not beautiful, but it can work for any other pages on the same site.

XPath : How to get text between 2 html tags with same level?

I'm new to xpath and I'm working with scrapy to get text from different html pages that are generated.
I get the {id} of a header tag from the user (<h1|2|.. id="title-{id}">text</h1|2|3..>). I need to get text from all html tags between this header and the next header of same level. So if the header is h1 I need to get all text of all tags until next h1 header.
All headers ids have same pattern "title-{id}" where {id} is generated.
To make it more clear here is an example :
<html>
<body>
...
<h2 id="tittle-id1">id1</h2>
bunch of tags containing text I want to get
<h2 id="tittle-id2">id2</h2>
...
</body>
</html>
NOTE : I don't know what header it might be. It could be any of the html header tags from <h1> to <h6>
UPDATE :
While trying a few things I noticed that I can't be sure the next header is of the same level, or that it even exists, since the headers are used as titles and sub-titles. The given id may belong to the last sub-title, in which case the next header is of a higher level, or the section may be the last one on the page. So basically I only have the id of the header and I need to get all the text of its "paragraph".
Workaround:
I found a kind of workaround.
I do it in 3 steps:
First, I use //*[@id='title-{id}'], which returns the full line including the tag, so now I know which header tag it is.
Second, I use //*[@id='title-{id}']/following-sibling::* to look for the next header of the same or a higher level, {myHeader}.
Last, I use //*[@id='title-{id}']/following-sibling::* and //{myHeader}//preceding-sibling::* to get what's in between, or go to the end of the page if no header is found.
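The three steps above can be sketched outside of XPath as well. The following is a minimal Python sketch, assuming the page is well-formed enough to be parsed by the standard library's xml.etree.ElementTree (real HTML often is not, and scrapy's selectors are not used here). The section_text function and the sample document are hypothetical and only illustrate the logic: find the header by id, then collect sibling text until a header of the same or a higher level, or the end of the parent.

```python
import re
import xml.etree.ElementTree as ET

HEADER_TAG = re.compile(r"^h([1-6])$")


def section_text(markup, header_id):
    """Collect the text between the header with `header_id` and the next
    header of the same or a higher level (or the end of the parent)."""
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    # Step 1: find the header and work out its level (h1..h6).
    header = next((el for el in root.iter() if el.get("id") == header_id), None)
    if header is None or header not in parents:
        return []
    match = HEADER_TAG.match(header.tag)
    if match is None:
        return []
    level = int(match.group(1))

    # Steps 2 and 3: walk the following siblings until a header of the
    # same or a higher level (a smaller or equal number) shows up.
    siblings = list(parents[header])
    texts = []
    if header.tail and header.tail.strip():
        texts.append(header.tail.strip())
    for el in siblings[siblings.index(header) + 1:]:
        other = HEADER_TAG.match(el.tag)
        if other and int(other.group(1)) <= level:
            break
        texts.extend(t.strip() for t in el.itertext() if t.strip())
        if el.tail and el.tail.strip():
            texts.append(el.tail.strip())
    return texts


# Hypothetical sample page.
SAMPLE = """<body>
<h2 id="title-id1">id1</h2>
<p>first</p>
<h3 id="title-id3">sub</h3>
<p>second</p>
<h2 id="title-id2">id2</h2>
<p>third</p>
</body>"""

print(section_text(SAMPLE, "title-id1"))
print(section_text(SAMPLE, "title-id3"))
print(section_text(SAMPLE, "title-id2"))
```

Lower-level headers inside the section are kept as part of its text; only a header of the same or a higher level ends it.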
Here is the xpath to get all the elements between h2 tags.
//h2/following-sibling::*[count(following-sibling::h2)=1]
Here is the sample html I used to simulate the scenario (update the id to check the different options shown below).
//*[@id='tittle-id1']/following::*[count(following-sibling::*[name()=name(preceding-sibling::*[@id='tittle-id1'])])=1]
<html><head></head><body>
...
<h2 id="tittle-id1">id1</h2>
<h3 id="tittle-id3"> h3 tag</h3>
<h4 id="tittle-id4"> h4 tag</h4>
<h3 id="tittle-id5"> 2nd h3 tag</h3>
bunch of tags containing text I want to get
<h5 id="tittle-id6"> h5 tag </h5>
<h2 id="tittle-id2">id2</h2>
<h4 id="tittle-id7"> 2nd h4 tag</h4>
...
</body></html>
output if User input: {id1}
output if user input: {id4}
output if user input: {id3}
Note: This xpath is designed to suit the scenario in the original post.
Because predicates in XPath filter the context node list, you can't perform a join selection unless you are able to reintroduce target values from a relative context of your source values. Example: selecting all the elements with the same name as the one that has a specific id attribute:
//*[name()=name(//*[@id=$generated-id-string])]
Now, for the "in between marks" problem, use the Kaysian method for intersection as usual:
//*[name()=name(//*[@id=$generated-id-string])]/preceding-sibling::node()[
   count(.|//*[@id=$generated-id-string]/following-sibling::node())
   =
   count(//*[@id=$generated-id-string]/following-sibling::node())
]
Test in http://www.xpathtester.com/xpath/0dcfdf59dccb8faf3705c22167ae45f1
This is what worked for me :
For this keep in mind that I'm using scrapy with python-2.7 :
name_query = u"//*[name()=name(//*[@id='"+id+"'])]"
all = response.xpath(name_query)
for selector in all.getall():
    if self.id in selector:
        position = all.getall().index(selector)
        balise = "h" + all.getall()[position].split("<h")[1][0]
        title = all.getall()[position].split(">")[1].split("<")[0]
        query = u"//*[preceding-sibling::"+balise+"[1] ='"+title+"' and following-sibling::"+balise+"]"
        self.log('query = '+query)
        results = response.xpath(query)
        results.pop(len(results)-1)
        with open(filename,'wb') as f:
            for text in results.css("::text").getall():
                f.write(text.encode('utf-8')+"\n")
This should work in general. I tested it against multiple headers with different levels and it works fine for me.

Select the content of a HTML tag not containing children tags

I am writing some code to remove span tags with a specific class from my database. Whenever I remove the opening tag, I need to remove the closing tag as well. For example, I'd like to turn this:
<span class="someClass">Hello</span><span></span>
<span class="someClass">My <span>name</span> is Joe</span>
Into:
Hello<span></span>
My <span>name</span> is Joe
I'm trying to do this with regex, but I've come to the conclusion that it's not possible. So my second guess was to select only the cases where the content between the opening and closing tags doesn't contain a span tag itself.
/<span class="someClass">(.*?)<\/span>/g works well for the first case but would cause a problem on the second one. However, /<span class="someClass">(.*)<\/span>/g would cause a problem on the first one.
Is there a way to make a regex that will only match the first case? I want it to skip a span only if it contains child span tags, which means something like this
<span class="someClass">Hello world</span>
would be selected as well.
For this solution to work, you'll need to consider whole string as a single line, maybe with s (singleline) option.
/<span((?!>).)*>((?!<span).)*?<\/span>/s
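To see how the negative lookahead behaves, here is a minimal Python sketch using the standard re module. The pattern is an adaptation of the one above, not the same expression: it is narrowed to class="someClass" and captures the content so the wrapper can be dropped. The unwrap_leaf_spans name is an assumption made for the example, and re.DOTALL plays the role of the s (singleline) option.

```python
import re

# Match <span class="someClass">...</span> only when the content contains
# no "<span": the lookahead (?!<span) is checked before every character.
LEAF_SPAN = re.compile(
    r'<span class="someClass">((?:(?!<span).)*?)</span>',
    re.DOTALL,  # let "." match newlines too, like the s option
)


def unwrap_leaf_spans(html):
    """Replace each matching span with its content; leave other spans alone."""
    return LEAF_SPAN.sub(r"\1", html)


print(unwrap_leaf_spans('<span class="someClass">Hello</span><span></span>'))
print(unwrap_leaf_spans('<span class="someClass">My <span>name</span> is Joe</span>'))
print(unwrap_leaf_spans('<span class="someClass">Hello world</span>'))
```

The second input is left unchanged because its content contains a child span, which is the "ignore if there are children" behaviour asked for. Unwrapping that case as well needs an HTML parser rather than a regex.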

How do I get Mithril.js v0.2.5 to render raw HTML extracted from json? [duplicate]

Suppose I have a string <span class="msg">Text goes here</span>. I need to use this string as an HTML element in my webpage. Any ideas on how to do it?
Mithril provides the m.trust method for this. At the place in your view where you want the HTML output, write m.trust( '<span class="msg">Text goes here</span>' ) and you should be sorted.
Mithril is powerful thanks to its virtual DOM. In the view, if you want to create an HTML element, you use:
m("htmlattribute.classeCss" , "value");
So in your case:
m("span.msg" , "Text goes here");
Try creating a container you wish to hold your span in.
1. Use jQuery to select it.
2. On that selection, call the jQuery .html() method, and pass in your HTML string.
($('.container').html(//string-goes-here), for example)
You should be able to assign the inner HTML of the container with the string, resulting in the HTML element you want.
See the jQuery documentation for .html().

how can I write a string like: <space> in html text

I have many pages of html and I want to write a text like this: 'DOM<space>LUNCH'.
But when I write the text as above, the browser shows a space instead of <space>,
because it treats <space> as an html tag and prints it as a space.
Like this: 'DOM LUNCH'.
I also tried 'DOM \<space\> LUNCH', so that it would ignore the next character, but nothing worked.
Please tell me how I can write a string in html like this: 'DOM<space>LUNCH'
I am not even able to post the question as I want, because the browser interprets space and <> as an actual space.
Use &lt; and &gt;.
&lt;mytag&gt;
The closing part of the tag doesn't absolutely have to be replaced, however.
&lt;mytag>
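If the pages are generated by a script rather than written by hand, the escaping can be automated. A minimal Python sketch using the standard library's html.escape is shown below; the variable names are assumptions made for the example.

```python
import html

raw = "DOM<space>LUNCH"

# html.escape replaces &, < and > (and, by default, quotes) with entities,
# so the browser displays the angle brackets instead of parsing a tag.
escaped = html.escape(raw)
print(escaped)

# html.unescape reverses the transformation.
print(html.unescape(escaped))
```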