Remove an element's class attribute with Hpricot - html

How do I do it? E.g.,
<span class="selected" id="hi">HELLO</span>
should become
<span id="hi">HELLO</span>

span = Hpricot(some_html) % "span#hi"
span.remove_attribute("class")
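For comparison, here is a minimal sketch of the same operation in Python using the standard library's ElementTree instead of Hpricot. It assumes the fragment is well-formed XML; real-world HTML may need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment is well-formed XML, so ElementTree can parse it.
span = ET.fromstring('<span class="selected" id="hi">HELLO</span>')

# pop() with a default does nothing if the attribute is already absent.
span.attrib.pop("class", None)

result = ET.tostring(span, encoding="unicode")
print(result)
```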

Related

XPath selection by value

I want to get the value of "Square" (201 in the example below). I tried the following, as described here, but it doesn't work:
./li[attributeTitle='Floor']
HTML code:
<div class = "A">
<ui class = "B">
<li>
<span class = "attributeTitle"> Floor </span>
<span class = "attributeValue"> 3 </span>
</li>
<!-- any number of additional "li" items -->
<li>
<span class = "attributeTitle"> Square </span>
<span class = "attributeValue"> 201 </span>
</li>
<li>
<span class = "attributeTitle"> Nrooms </span>
<span class = "attributeValue"> 4 </span>
</li>
</ui>
</div>
Thanks for any help.
You can use the contains() function in XPath to check whether an element's text contains a given string:
"//span[@class='attributeTitle'][contains(text(),'Square')]"
This gets you this node:
<span class = "attributeTitle"> Square </span>
To get the value node that comes right after it, you can use following-sibling::span:
"//span[@class='attributeTitle'][contains(text(),'Square')]/following-sibling::span[1]"
The [1] selects only the first following sibling, in case there is more than one. You can also use [@class='attributeValue'] instead to keep only siblings with that class, or omit the predicate entirely if you trust there will only ever be one sibling.
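As a rough cross-check, the same lookup can be sketched in Python with the standard library's ElementTree. ElementTree supports only a subset of XPath (no contains() and no following-sibling axis), so the text comparison is done in Python instead. The markup is a cleaned-up copy of the question's HTML, and the helper name attribute_value is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the question's markup (assumed well-formed, with <ul> for the list).
HTML = """
<div class="A">
  <ul class="B">
    <li><span class="attributeTitle"> Floor </span><span class="attributeValue"> 3 </span></li>
    <li><span class="attributeTitle"> Square </span><span class="attributeValue"> 201 </span></li>
    <li><span class="attributeTitle"> Nrooms </span><span class="attributeValue"> 4 </span></li>
  </ul>
</div>
"""


def attribute_value(root, title):
    """Return the stripped attributeValue text of the <li> whose title matches, else None."""
    for li in root.iter("li"):
        title_span = li.find("span[@class='attributeTitle']")
        value_span = li.find("span[@class='attributeValue']")
        if title_span is None or value_span is None:
            continue
        if (title_span.text or "").strip() == title:
            return (value_span.text or "").strip()
    return None


root = ET.fromstring(HTML)
square = attribute_value(root, "Square")
print(square)
```

Pairing the title and value spans inside each li plays the role of the following-sibling step: the value is always read from the same list item as the matching title.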

Use regular expressions to add new class to element using search/replace

I want to add a NewClass value to the class attribute and modify the text of the span using find/replace functionality with a pair of regular expressions.
<div>
<span class='customer' id='phone$0'>Home</span>
<br/>
<span class='customer' id='phone$1'>Business</span>
<br/>
<span class='customer' id='phone$2'>Mobile</span>
</div>
I am trying to get the following result after search/replace:
<span class='customer NewClass' id='phone$1'>Organization</span>
I am also curious to know whether a single find/replace operation can be used for both tasks.
Regex can do this, but be aware that using regex to change HTML has a lot of edge cases that you may not have accounted for.
This regex101 example shows those three <span> elements changed to add NewClass and the contents to be changed to Organization.
Other technologies, however, would be safer. jQuery, for example, could replace them regardless of the order of the attributes:
$("span#phone\\$1").addClass("NewClass");
$("span#phone\\$1").text("Organization");
So just be careful with it, and you should be fine.
EDIT
According to comments on the OP, you want to only change the span containing ID phone$1, so the regex101 link has been updated to reflect this.
EDIT 2
Permalink was too long to fit into a comment, so adding the permalink here. Click on the "Content" tab at the bottom to see the replacement.
You can use a regex like this:
'.*?' id='phone\$1'>.*?<
With substitution string:
'customer NewClass' id='phone\$1'>Organization<
Working demo
PHP code
$re = "/'.*?' id='phone\\$1'>.*?</";
$str = "<div>\n <span class='customer' id='phone\$0'>Home</span>\n<br/>\n <span class='customer' id='phone\$1'>Business</span>\n<br/>\n <span class='customer' id='phone\$2'>Mobile</span>\n</div>";
// Escape the dollar sign so preg_replace does not treat $1 as a backreference.
$subst = "'customer NewClass' id='phone\\\$1'>Organization<";
$result = preg_replace($re, $subst, $str);
Result
<div>
<span class='customer' id='phone$0'>Home</span>
<br/>
<span class='customer NewClass' id='phone$1'>Organization</span>
<br/>
<span class='customer' id='phone$2'>Mobile</span>
</div>
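A variant of the regex approach captures the existing class value instead of hard-coding it, so both changes happen in a single substitution. The sketch below uses Python's re module rather than PHP, and it assumes the attributes appear in exactly this order with single quotes, which is the kind of edge case that makes regex fragile for HTML.

```python
import re

html = """<div>
<span class='customer' id='phone$0'>Home</span>
<br/>
<span class='customer' id='phone$1'>Business</span>
<br/>
<span class='customer' id='phone$2'>Mobile</span>
</div>"""

# Group 1 captures the existing class list; \$ matches a literal dollar sign.
pattern = re.compile(r"class='([^']*)' id='phone\$1'>[^<]*<")

# \g<1> re-inserts the captured class list; "$" is not special in Python replacements.
result = pattern.sub(r"class='\g<1> NewClass' id='phone$1'>Organization<", html)
print(result)
```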
Since your tags include preg_match and preg_replace, I think you are using PHP.
Regex is generally not a good way to manipulate HTML or XML. See the "RegEx match open tags except XHTML self-contained tags" SO post.
In PHP, you can use DOMDocument and DOMXPath with the //span[@id="phone$1"] XPath (which selects all span tags whose id attribute value equals phone$1):
$html =<<<DATA
<div>
<span class='customer' id='phone$0'>Home</span>
<br/>
<span class='customer' id='phone$1'>Business</span>
<br/>
<span class='customer' id='phone$2'>Mobile</span>
</div>
DATA;
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$sps = $xp->query('//span[@id="phone$1"]');
foreach ($sps as $sp) {
$sp->setAttribute('class', $sp->getAttribute('class') . ' NewClass');
$sp->nodeValue = 'Organization';
}
echo $dom->saveHTML();
See IDEONE demo
Result:
<div>
<span class="customer" id="phone$0">Home</span>
<br>
<span class="customer NewClass" id="phone$1">Organization</span>
<br>
<span class="customer" id="phone$2">Mobile</span>
</div>
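The same parse, select, and modify pattern can be sketched in Python with the standard library's ElementTree. This is only an illustration of the DOM-based idea, not a port of the PHP code, and it assumes the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<div>
<span class='customer' id='phone$0'>Home</span>
<br/>
<span class='customer' id='phone$1'>Business</span>
<br/>
<span class='customer' id='phone$2'>Mobile</span>
</div>""")

# Select by id, independent of attribute order or quoting style.
for span in root.findall(".//span[@id='phone$1']"):
    classes = span.get("class", "").split()
    if "NewClass" not in classes:
        classes.append("NewClass")
    span.set("class", " ".join(classes))
    span.text = "Organization"

target = root.find(".//span[@id='phone$1']")
print(target.get("class"), target.text)
```

Treating the class attribute as a whitespace-separated list avoids adding NewClass twice if the code runs more than once.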

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to navigate to a span by its class and get its text, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried below code:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask if anything in my question is unclear. Thanks in advance.
There are several things to note about your code:
A) you can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are the same. Your selector only picks up the first. This is why CSS has a class syntax, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Further, you can leave out the first > if you like.
Proposed solution
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
    // The span itself is empty; the price text belongs to its parent div.
    Element priceDiv = span.parent();
    System.out.println(priceDiv.text());
}
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());
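Both points from the first answer (the text lives outside the empty span, and class order should not matter) can be illustrated with a small Python sketch using the standard library's ElementTree rather than jsoup. The markup is a trimmed copy of one list item from the question, and the helper has_classes is a hypothetical name.

```python
import xml.etree.ElementTree as ET

li = ET.fromstring("""<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
</li>""")


def has_classes(element, *wanted):
    """True if the element's class attribute contains every wanted class, in any order."""
    return set(wanted) <= set(element.get("class", "").split())


prices = []
for div in li.iter("div"):
    for span in div.findall("span"):
        if has_classes(span, "rupee-medium", "cw-sprite"):
            # The span is empty; the price is the text that follows it inside the div.
            prices.append((span.tail or "").strip())

print(prices)
```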

xpath: how to get partial data from within tag

How do I get the $10.99 without getting the $11.50?
my code gets both and I do not want what's in the strike tag:
$price = trim($tmp_xpath->query("//div[@class='ProductPriceRating']/em")->item(0)->nodeValue);
here's the html:
<div class="ProductPriceRating">
<em><strike class="RetailPriceValue">$11.50</strike> $10.99</em>
<span class="Rating Rating5"><img src="images/IcoRating5.gif" alt="" style="" /></span>
</div>
Just select the text() node of the em element, which excludes the text inside the nested strike tag:
//div[@class='ProductPriceRating']/em/text()
Here's what you would have for $price:
$price = trim($tmp_xpath->query("//div[@class='ProductPriceRating']/em/text()")->item(0)->nodeValue);
Or, as @adamretter noted, better use normalize-space() instead of trim(). Because that expression returns a string rather than a node list, call evaluate() instead of query():
$price = $tmp_xpath->evaluate("normalize-space(//div[@class='ProductPriceRating']/em/text())");
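The idea of reading only the text node that belongs directly to em can also be sketched in Python with the standard library's ElementTree, which exposes text following a child element as that child's tail. This is an illustration under the assumption that the markup is well-formed, not a replacement for the PHP code.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring(
    '<div class="ProductPriceRating">'
    '<em><strike class="RetailPriceValue">$11.50</strike> $10.99</em>'
    '</div>'
)

em = div.find("em")
strike = em.find("strike")

# ElementTree stores text that follows a child element in that child's .tail.
price = (strike.tail or "").strip()
print(price)
```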

Format dom as expression using css

I'm creating a site that uses tags and needs to perform basic tag algebra with the operators not, and, or. I have a DOM element that describes the expression, but I can't display the expression using CSS.
Consider the following expression:
([Green] or ((not [Blue]) and ([Red] or (not [Yellow]))))
Which is represented in the dom as:
<span class="tag-expression">
<span class="tag-or">
<span class="tag" value="green">Green</span>
<span class="tag-and">
<span class="tag-not">
<span class="tag" value="blue">Blue</span>
</span>
<span class="tag-or">
<span class="tag" value="red">Red</span>
<span class="tag-not">
<span class="tag" value="yellow">Yellow</span>
</span>
</span>
</span>
</span>
</span>
I've managed to include the parentheses using the CSS :before and :after selectors together with the content property (jsFiddle demo), but have had no luck showing the operators ¬, &, |. I've been toying with including a <span class="operator"/> with a background image, but I was wondering if there is another way to do this using the :before and :after selectors.
Any ideas?
Here you go. It works with the example you provided, but you should test it on more complex expressions to make sure it is correct.
I added some more complex CSS selectors at the end of your stylesheet to show the operators:
.tag-expression .tag-or > span:nth-child(2):before {
content: ' | (';
}
.tag-expression .tag-and > span:first-child:after {
content: ' ) & ';
}
.tag-expression .tag-not:before {
content: ' ( ¬ ';
}
You can check it out in this fiddle. Let me know if that solves your problem.