css selector - get the value - html

What is the css selector to get the text value '2017-10-09' ?
<div class="col-xs-6">
<strong>
<!--ko text: date -->
2017-10-09
<!--/ko-->
<!--ko text: time -->
12:55
<!--/ko-->
</strong><br>
<!--ko text: locationName -->
City
<!--/ko-->
</div>

There is no such selector.
With a few exceptions (such as :first-line), a selector only allows you to select an element.
The text 2017-10-09 is not an element, it isn't even the whole text content of an element.
strong would allow you to select <strong><!--ko text: date -->2017-10-09<!--/ko--><!--ko text: time -->12:55<!--/ko--></strong> but that is more than you are asking for.
You could select the strong element, then read its text content, and then parse that (e.g. by splitting it across space characters or using a regular expression such as /(\d{4}-\d{2}-\d{2})/).
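The select-then-parse approach described above can be sketched outside the browser as well. This is a minimal illustration using only Python's standard library, not the asker's actual stack; the `StrongText` helper class is a hypothetical name introduced for the example.

```python
import re
from html.parser import HTMLParser

HTML = """<div class="col-xs-6">
<strong>
<!--ko text: date -->
2017-10-09
<!--/ko-->
<!--ko text: time -->
12:55
<!--/ko-->
</strong><br>
<!--ko text: locationName -->
City
<!--/ko-->
</div>"""


class StrongText(HTMLParser):
    """Hypothetical helper: collects text found inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <strong> elements are currently open
        self.chunks = []    # text pieces seen while inside <strong>

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Comments (the Knockout markers) never reach this method.
        if self.depth:
            self.chunks.append(data)


parser = StrongText()
parser.feed(HTML)
parser.close()

# Step 1: the text content of the element a selector can reach.
strong_text = "".join(parser.chunks)

# Step 2: parse the date out of that text.
match = re.search(r"\d{4}-\d{2}-\d{2}", strong_text)
date = match.group(0) if match else None
print(date)
```

The selector only gets you as far as the element; the regular expression does the rest.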

As far as I know, no CSS selector can check the format of the data in a text node. You should wrap a tag around the date and then use a proper selector for that tag, like:
<div class="col-xs-6">
<strong>
<!--ko text: date -->
<span class="date">
2017-10-09
</span>
<!--/ko-->
<!--ko text: time -->
12:55
<!--/ko-->
</strong><br>
<!--ko text: locationName -->
City
<!--/ko-->
</div>
Alternatively, you can target the strong tag inside the div and then process the date out of text node.

Related

Xpath issues selecting <spans> nested in <td>

I'm trying to extract text from a lot of XHTML documents with a program that uses XPath queries to map the text into a structured table. The XHTML document looks like this:
<td class="td-3 c12" valign="top">
<p class="pa-4">
<span class="ca-5">text I would like to select </span>
</p>
</td>
<td class="td-3 c13" valign="top">
<p class="pa-2">
<span class="ca-0">some more text I want to select </span>
</p>
<p class="pa-2">
<span class="ca-0">
<br>
</br>
</span>
</p>
<p class="pa-2">
<span class="ca-5">text and values I don't want to select.</span>
</p>
<p class="pa-2">
<span class="ca-5"> also text and values I don't want to </span>
</p>
</td>
I'm able to select the spans by their class and retrieve the text/values, but they're not unique enough, so I need to filter by table classes. For example, I want only the text from span class ca-0 that is a child of td class td-3 c13,
which would be <span class="ca-0">some more text I want to select </span>
I've tried all these combinations
//xhtml:td[@class="td-3 c13"]/xhtml:span[@class = "ca-0"]
//xhtml:span[@class = "ca-0"] //ancestor::xhtml:td[@class= "td-3 c13"]
//xhtml:td[@class="td-3 c6"]//xhtml:span[@class = "ca-0"]
I'm not sure how much your sample xml reflects your actual xml, but strictly based on your sample xml (AND disregarding possible namespaces issues you will probably face), the following xpath expression:
//td[contains(@class,"td-3")]/p[1]/span/text()
selects
text I would like to select
some more text I want to select
According to the doc, and to support namespaces, you should write something like this (fn:...) :
//*:td[fn:contains(@class,"td-3")]/*:p[1]/*:span
Or with a namespace binding:
node.xpath("//xhtml:td[fn:contains(@class,'td-3')]/xhtml:p[1]/xhtml:span", {"xhtml":"http://example.com/ns"})
This expression should work too (select the first span of the first p of each td element) :
//*:td/*:p[1]/*:span[1]
Side notes :
Your XPath expressions could be fixed. Span is not a child but a descendant, so we use //. We use () to keep the first result only.
(//xhtml:td[@class="td-3 c13"]//xhtml:span[@class = "ca-0"])[1]
(//xhtml:td[@class="td-3 c6"]//xhtml:span[@class = "ca-0"])[1]
Replace // with a predicate [] :
(//xhtml:span[@class = "ca-0"][ancestor::xhtml:td[@class= "td-3 c13"]])[1]
Test your XPath with : https://docs.marklogic.com/cts.validIndexPath
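The child-versus-descendant point from the side notes can be checked with a small sketch. It uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath (no `contains()` and no positional `(...)[1]` wrapper), so it is not a stand-in for the MarkLogic API. The namespace URI and the trimmed document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the standard XHTML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"xhtml": XHTML_NS}

# Trimmed version of the sample from the question.
DOC = f"""<table xmlns="{XHTML_NS}"><tr>
<td class="td-3 c12"><p class="pa-4"><span class="ca-5">text I would like to select </span></p></td>
<td class="td-3 c13">
  <p class="pa-2"><span class="ca-0">some more text I want to select </span></p>
  <p class="pa-2"><span class="ca-5">text and values I don't want to select.</span></p>
</td>
</tr></table>"""

root = ET.fromstring(DOC)

# "/" only looks at direct children of the td, and the span sits inside a p.
child_hits = root.findall(
    ".//xhtml:td[@class='td-3 c13']/xhtml:span[@class='ca-0']", NS
)

# "//" searches all descendants of the td, so the nested span is found.
descendant_hits = root.findall(
    ".//xhtml:td[@class='td-3 c13']//xhtml:span[@class='ca-0']", NS
)

texts = [span.text.strip() for span in descendant_hits]
print(len(child_hits), texts)
```

The first query comes back empty for the same reason the asker's first attempt did: the span is a grandchild of the td, not a child.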
The solution is
//td[(@class ="td-3") and (@class = "c13")]/p/span
for some reason it sees the
<td class="td-3 c13">
as separate classes e.g.
<td class = "td-3" and class = "c13"
so you need to treat them as such
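One way to read that observation is that a class attribute is a space-separated list of tokens rather than a single value. The sketch below, in plain Python with the standard library, filters on tokens instead of the full attribute string. The `class_tokens` helper is a hypothetical name, and the second td deliberately lists its classes in reverse order to show that order does not matter with this approach.

```python
import xml.etree.ElementTree as ET

# Reduced sample; the reversed class order on the second td is an
# illustrative change, not taken from the original document.
DOC = """<tr>
<td class="td-3 c12"><p><span class="ca-5">text I would like to select </span></p></td>
<td class="c13 td-3"><p><span class="ca-0">some more text I want to select </span></p></td>
</tr>"""


def class_tokens(element):
    """Hypothetical helper: the class attribute as a set of tokens."""
    return set(element.get("class", "").split())


root = ET.fromstring(DOC)
required = {"td-3", "c13"}

selected = [
    span.text.strip()
    for td in root.iter("td")
    if required <= class_tokens(td)      # td carries both class tokens
    for span in td.iter("span")
    if "ca-0" in class_tokens(span)      # span carries the wanted token
]
print(selected)
```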
Thanks to @E.Wiest and @JackFleeting for validating and pointing me in the right direction.

Inserting HTML content into a Vue Tooltip

I'm trying to render a tooltip for a table cell in Vue, for when the table cell's content exceeds 22 characters.
I need to use the v-tooltip library (https://www.npmjs.com/package/v-tooltip)
I can set the content of the tooltip to a simple string correctly using the 'content' attribute, however, when I try to set the html content, the tooltip is blank, and the html content which was supposed to appear inside the tooltip, appears constantly in the td.
<td v-if="cellContent !== null && cellContent.length>22">
<div>
<!-- <span v-tooltip.right="{content: 'This works, but is just a simple string', class:'mytooltip'}">{{cellContent.substring(0, 19)}}...</span> -->
<span class="icon-info-outline" v-tooltip.right="{ html: 'wildcardContent', class:'mytooltip' }"></span>
<div id="wildcardContent">
<p>This Fails and is displayed in the td</p>
</div>
</div>
</td>
<td v-else >{{cellContent }}</td>
v-tooltip="{ content: `<h1>Hellow World</h1>` }"

Is there a way to parse html that include javascript in tags in ruby?

I am working on a web scraping problem in Ruby. I have seen multiple questions and answers related to this, but none of them deal with HTML that includes a JavaScript framework, and I cannot figure out how to handle it. I just want to select the HTML and return an array of objects. My script and the HTML code follow. The HTML classes for values such as name, currency and balance are similar, so how can this be done?
content = document.css("div.acc-list").map do |parameters|
name = parameters.at_css("p.s3.bold.row.acDesc").text.strip, # argument?
currency = parameters.at_css(".row.ccy").text.strip, # argument?
balance = parameters.at_css(".row.acyOpeningBal").text.strip # argument?
Account.new name, currency, balance
end
pp content
These HTML paragraphs are inside multiple other classes which I think is due to the framework. However, they are inside a <div class = acc-list div>...</div> and I think I did correctly when I assigned "div.acc-list" to "content" variable.
<!-- HTML for name -->
<td bindonce="" ng-repeat="col in gridOptions.columns" sg-bind-html-compile="col.cellTemplate" bo-class="col.className" bo-style="{width: col.remWidth }"
class="ng-scope icon-two-line-col" style="width: 17.3333rem;">
<div style="width: 17.333333333333332rem" class="first-cell cellText ng-scope">
<i bo-class="{'active':row.selected }" class="i-32 active icon i-circle-account"></i>
<div class="info-wrapper" style="">
<p class="s3 bold" bo-bind="row.acDesc">Name_value</p> # value
<a ui-sref="app.layout.ACCOUNTS.DETAILS.{ID}({id:'091601003439274'})" href="/Bank/accounts/details/BG37FINV91503006938102">
<span bo-bind="row.iban">BG37FINV91503006938102</span>
<i class="i-arrow-right-5x8"></i>
</a>
</div>
</div>
</td>
<!-- HTML for currency -->
<td bindonce="" ng-repeat="col in gridOptions.columns" sg-bind-html-compile="col.cellTemplate" bo-class="col.className" bo-style="{width: col.remWidth }"
class="ng-scope" style="width: 4.4rem;">
<div style="width: 4.4rem" class="text-center cellText ng-scope">
<span bo-bind="row.ccy">EUR</span> # value
</div>
</td>
<!-- HTML for balance -->
<td bindonce="" ng-repeat="col in gridOptions.columns" sg-bind-html-compile="col.cellTemplate" bo-class="col.className" bo-style="{width: col.remWidth }"
class="ng-scope" style="width: 8.73333rem;">
<div style="width: 8.733333333333333rem" class="text-right cellText ng-scope">
<span bo-bind="row.acyAvlBal | sgCurrency">1 523.08</span> # value
</div>
</td>
Using CSS:
require 'nokogiri'
document = Nokogiri::HTML(<<EOT)
<div class="acc-list">
<!-- HTML for name -->
<td>
<div class="first-cell cellText ng-scope">
<div class="info-wrapper">
<!-- # value -->
<p class="s3 bold">Name_value</p>
</div>
</div>
</td>
<!-- HTML for currency -->
<td>
<div class="text-center cellText ng-scope">
<!-- # value -->
<span>EUR</span>
</div>
</td>
<!-- HTML for balance -->
<td>
<div class="text-right cellText ng-scope">
<!-- # value -->
<span>1 523.08</span>
</div>
</td>
</div>
EOT
Now that the DOM is loaded:
content = document.css('div.acc-list').map do |div|
name = div.at("p.s3.bold").text.strip # => "Name_value"
currency = div.at("div.text-center > span").text.strip # => "EUR"
balance = div.at("div.text-right > span").text.strip # => "1 523.08"
[ name, currency, balance ]
end
# => [["Name_value", "EUR", "1 523.08"]]
Your HTML sample has a lot of extraneous information that obscures the trees in this particular forest. I stripped it out because it wasn't useful. (And, when submitting a question you should automatically do that as part of simplifying the non-essential information so we can all focus on the actual problem.)
CSS doesn't care about parameters other than the node name, class and id. The class can chain the parameters in the definition of the class if you need that granular access, but often you can get away with a more general class selector; it just depends on the HTML.
Most XML and HTML parsing is basically the same tactic: Find an outer placeholder, look inside it and iterate grabbing the information needed. I can't demonstrate that completely because your example only has the outer div, but you can probably imagineer the necessary code to handle an inner loop.
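The outer-placeholder-plus-inner-loop tactic can be sketched as follows. This is an illustration in Python's standard library rather than Nokogiri, and the markup is hypothetical: the inner `account` wrapper and the `ccy` and `bal` classes are assumptions, since the question's sample only shows the outer div.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one "account" div per account inside the outer list.
DOC = """<div class="acc-list">
  <div class="account">
    <p class="s3 bold">Name_value</p>
    <span class="ccy">EUR</span>
    <span class="bal">1 523.08</span>
  </div>
  <div class="account">
    <p class="s3 bold">Other_name</p>
    <span class="ccy">USD</span>
    <span class="bal">42.00</span>
  </div>
</div>"""

root = ET.fromstring(DOC)
accounts = []

# Outer step: find the placeholder that wraps all the records.
for outer in root.iter("div"):
    if outer.get("class") != "acc-list":
        continue
    # Inner step: iterate over the records and grab the fields from each one.
    for row in outer.findall("div[@class='account']"):
        name = row.findtext("p").strip()
        currency = row.findtext("span[@class='ccy']").strip()
        balance = row.findtext("span[@class='bal']").strip()
        accounts.append((name, currency, balance))

print(accounts)
```

Scoping each field lookup to the current row is what keeps values from different accounts from getting mixed up.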
at_css is almost equivalent to at, and Nokogiri is smart enough 99.9% of the time to determine whether a selector is CSS or XPath, so I tend toward using at because my fingers are lazy.

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to navigate to a span by its class and get the text from it, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried below code:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
please ask me , if my content was not very clear to you. thanks in advance.
There are several things to note about your code:
A) you can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are the same. Your selector only picks up the first. This is why there is a class syntax in css, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Further, you can leave out the first > if you like.
Proposed solution
// first() returns a single Element, not an Elements collection
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
// the price text lives in the span's parent div
Element priceDiv = span.parent();
System.out.println(priceDiv.text());
}
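The two points above (the text belongs to the parent div, and classes should be matched as tokens) can be tried out without jsoup. The following is a rough sketch in Python's standard library, using a trimmed copy of the question's markup; `has_classes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the carousel markup from the question.
SNIPPET = """<ul id="ucHome">
  <li modelID="587">
    <h3 class="margin-bottom10"> Ford Figo Aspire</h3>
    <div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
  </li>
  <li modelID="899">
    <h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
    <div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
  </li>
</ul>"""

WANTED = {"cw-sprite", "rupee-medium"}


def has_classes(element, wanted):
    """Hypothetical helper: True if every wanted class token is present."""
    return wanted <= set(element.get("class", "").split())


prices = []
for div in ET.fromstring(SNIPPET).iter("div"):
    # The span itself is empty; it only marks which div holds the price.
    if any(has_classes(span, WANTED) for span in div.findall("span")):
        full_text = "".join(div.itertext())        # all text under the div
        prices.append(" ".join(full_text.split())) # collapse whitespace

print(prices)
```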
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());

How to get a single text node in Selenium WebDriver [duplicate]

I want to get the text from a tag but without the text from nested tags. I.e. in the example below, I only want to get the string 183591 from inside the <small> tag and exclude the text Service Request ID: from the <span> tag. This is not trivial because the <span> tag is nested in the <small> tag. Is this possible with WebDriver and XPath?
The text in the tag is going to change every time.
<div id="claimInfoBox" style="background-color: transparent;">
<div class="col-md-3 rhtCol">
<div class="cib h530 cntborder">
<h4 class="no-margin-bottom">
<p>
<small style="background-color: transparent;">
<span class="text-primary" style="background-color: transparent;">Service Request ID:</span>
183591
</small>
</p>
<div class="border-bottom" style="background-color: transparent;"></div>
<div id="CIB_PersonalInfo_DisplayMode" class="cib_block">
<div id="CIB_PersonalInfo_EditMode" class="cib_block" style="display: none">
</div>
</div>
<script type="text/javascript">
</div>
</div>
You are going to have to use String manipulation. Something like:
// you will need to adjust these XPaths to suit your needs
String outside = driver.findElement(By.xpath("//small")).getText();
String inside = driver.findElement(By.xpath("//span")).getText();
String edge = outside.replace(inside, "");
The simplest way I've found is by getting the parent small node and the child span node and removing the number of characters in the child from the text of the parent:
public String getTextNode() {
WebElement parent = driver.findElement(By.xpath("//small")); //or By.tagName("small")
WebElement child = parent.findElement(By.xpath(".//span")); //or By.tagName("span")
return parent.getText().substring(child.getText().length()).trim();
}
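Both string-manipulation answers amount to keeping only the text nodes that belong directly to the parent element. That idea can be sketched in Python's standard library, outside WebDriver; `own_text` is a hypothetical helper, and the snippet is cut down to the `<small>` element from the question.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<small>
  <span class="text-primary">Service Request ID:</span>
  183591
</small>"""


def own_text(element):
    """Hypothetical helper: text directly inside `element`, skipping child elements."""
    # element.text is the text before the first child; each child's tail
    # is the text that follows that child but still belongs to the parent.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


small = ET.fromstring(SNIPPET)
request_id = own_text(small)
print(request_id)
```

Unlike subtracting the child's text by length or by `replace`, this does not depend on the child's text appearing only once or only at the start.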
The actual simplest way is using javascript executor as below:
JavascriptExecutor js = ((JavascriptExecutor)driver);
js.executeScript("return $(\"small\").clone().children().remove().end().text();");
This will return only the text associated with the parent element 'small'. Note that the $ call requires jQuery to be loaded on the page under test. Use trim() to remove leading and trailing whitespace. For a full explanation of what is happening here, please refer to the link below.
Reference:
http://exploreselenium.com/selenium/exclude-text-content-of-child-elements-of-the-parent-element-in-selenium-webdriver/