Set parent element of a group of nodes (wrap whole group) - html

Can Jsoup set a parent element for a group of nodes? I mean wrapping, but not wrapping every matched element separately: I want to create only one parent element that contains several elements.
Example: before
<b>some text<i> blabla </i> other text </b>
After
<span id='something'><b>some text<i> blabla </i> other text </b></span>
<b>some te
<span id="cke_bm_69S" style="display: none;"> </span>
xt</b>
aaa
<i>bb
<span id="cke_bm_69S" style="display: none;"> </span>
b</i>
The span tags are bookmarks (selection start and selection end) added by CKEDITOR. I then have to process the markup on the server side. The goal is to add the final span and remove the temporary spans (the bookmarks):
<b>some te</b>
<span id="something"><b>
xt</b>
aaa
<i>bb
</i></span><i>
b</i>
As you can see, it has to solve the tag-crossing problem.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WrapExample {
    public static void main(String... args) {
        Document document = Jsoup.parse("<div>"
                + "<b>some text<i> blabla </i> other text </b>" + "</div>");
        Element b = document.select("b").first();
        // Create the wrapper, put it where <b> was, then move <b> inside it.
        Element span = document.createElement("span");
        span.attr("id", "something");
        b.replaceWith(span);
        span.appendChild(b);
        System.out.println(document);
    }
}
Output
<html>
<head></head>
<body>
<div>
<span id="something"><b>some text<i> blabla </i> other text </b></span>
</div>
</body>
</html>
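For several adjacent nodes rather than a single element, the same "insert the wrapper, then move the nodes into it" idea can be sketched as follows. This is only a sketch under stated assumptions: it uses the JDK's built-in org.w3c.dom API instead of Jsoup (swapped in so it runs without dependencies), it expects well-formed XML input, and the names WrapGroupSketch, wrapRange, parse and toXml are made up for illustration. It does not handle the tag-crossing case, where the crossing elements would have to be split first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WrapGroupSketch {

    // Parses a well-formed XML string (assumption: the input is not tag soup).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Moves `first` and its following siblings, up to and including `last`,
    // into one new <span id="..."> that takes their place.
    // If `last` is not a following sibling of `first`, every remaining sibling is moved.
    static Element wrapRange(Node first, Node last, String id) {
        Element span = first.getOwnerDocument().createElement("span");
        span.setAttribute("id", id);
        first.getParentNode().insertBefore(span, first);
        Node current = first;
        while (current != null) {
            Node next = current.getNextSibling(); // read before the node is moved
            span.appendChild(current);            // appendChild moves an attached node
            if (current == last) {
                break;
            }
            current = next;
        }
        return span;
    }

    // Serializes a node without the XML declaration.
    static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><b>one</b> two <i>three</i><u>four</u></div>");
        Node div = doc.getDocumentElement();
        Node first = div.getFirstChild();          // <b>
        Node last = div.getChildNodes().item(2);   // <i>
        wrapRange(first, last, "something");
        System.out.println(toXml(doc));
    }
}
```

Text nodes between the elements are moved along with them, which is why the loop walks siblings as nodes rather than as elements.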

Related

Angular *ngFor problems on showing result

I'm trying to show tags entered by the user (for example, when the user writes #name).
My HTML code is this:
<div *ngIf="tagsArr" style="display: flex;">
<p>
<span>Tags: </span>
</p>
<p *ngFor="let tag of tagsArr">
<span> {{tag}}, </span>
</p>
</div>
The result looks like this:
Tags: #name ,#name ,#name < / p >,
(without space)
Why is the closing </p> tag displayed there?
I want to display the tags like this:
Tags: #name, #name, #name
It shouldn't show <p>. Maybe you have a <p> string in tagsArr.
It works fine on my device:
.ts
tagsArr = ['#tag1, #tag2, #tag3' ,'#tag4' ]
.html
<div *ngIf="tagsArr">
<p>
<span>Tags: </span>
</p>
<p *ngFor="let tag of tagsArr">
<span> {{tag}}, </span>
</p>
</div>

XPath selection by value

I want to get a value of "square" (for example, 201). I tried to do so, as described here, but it doesn't work:
./li[attributeTitle='Этаж']
Html code:
<div class = "A">
<ui class = "B">
<li>
<span class = "attributeTitle"> Floor </span>
<span class = "attributeValue"> 3 </span>
</li>
<! A random more items "li" >
<li>
<span class = "attributeTitle"> Square </span>
<span class = "attributeValue"> 201 </span>
</li>
<li>
<span class = "attributeTitle"> Nrooms </span>
<span class = "attributeValue"> 4 </span>
</li>
</ui>
</div>
Thanks for any help.
You can use the contains() function in XPath to check whether a node's text contains some string:
"//span[@class='attributeTitle'][contains(text(),'Square')]"
This gets you this node:
<span class = "attributeTitle"> Square </span>
To get the value node that comes right after it, use following-sibling::span:
"//span[@class='attributeTitle'][contains(text(),'Square')]/following-sibling::span[1]"
The [1] indicates that we want only the first sibling in case there is more than one. You can also use [@class='attributeValue'] instead to select only siblings that have this particular class, or omit the predicate entirely if you trust there will be only one sibling.
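To try the expression outside a browser, here is a minimal sketch using the JDK's javax.xml.xpath package. It assumes the markup is well-formed XML (real-world HTML usually needs a tolerant parser first), and the class name XPathSiblingSketch, the method valueFor and the embedded sample markup are illustrative only. The title is pasted into the expression as-is, so it must not contain an apostrophe.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSiblingSketch {

    // Sample markup modeled on the question, written as well-formed XML.
    static final String MARKUP =
            "<div class=\"A\"><ul class=\"B\">"
          + "<li><span class=\"attributeTitle\"> Floor </span>"
          + "<span class=\"attributeValue\"> 3 </span></li>"
          + "<li><span class=\"attributeTitle\"> Square </span>"
          + "<span class=\"attributeValue\"> 201 </span></li>"
          + "<li><span class=\"attributeTitle\"> Nrooms </span>"
          + "<span class=\"attributeValue\"> 4 </span></li>"
          + "</ul></div>";

    // Returns the trimmed text of the value span that follows the matching title span.
    static String valueFor(String title) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//span[@class='attributeTitle'][contains(text(),'" + title + "')]"
                + "/following-sibling::span[@class='attributeValue'][1]";
        try {
            // evaluate(String, InputSource) returns the string value of the first match.
            return xpath.evaluate(expr, new InputSource(new StringReader(MARKUP))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueFor("Square")); // prints 201
    }
}
```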

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to navigate to a span by its class and get its text, but the selection returns nothing and its size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried below code:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask if my question is not clear. Thanks in advance.
There are several things to note about your code:
A) you can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are the same. Your selector only picks up the first. This is why there is a class syntax in css, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Further, you can leave out the first > if you like.
Proposed solution
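// first() returns a single Element, not an Elements collection.
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
    // The price text lives in the parent <div>, not in the empty <span>.
    Element priceDiv = span.parent();
    System.out.println(priceDiv.text());
}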
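Point B can be illustrated without any HTML library. The sketch below (plain Java; ClassTokenSketch and hasClasses are hypothetical names, not jsoup API) contrasts an exact string comparison of the class attribute, which is roughly what [class=...] does, with a token-based check, which is the idea behind the .cw-sprite.rupee-medium syntax.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClassTokenSketch {

    // True when the class attribute contains every required class name, in any order.
    static boolean hasClasses(String classAttr, String... required) {
        Set<String> tokens = new HashSet<>(Arrays.asList(classAttr.trim().split("\\s+")));
        return tokens.containsAll(Arrays.asList(required));
    }

    public static void main(String[] args) {
        String a = "cw-sprite rupee-medium";
        String b = "rupee-medium cw-sprite";

        // Exact attribute comparison is order-sensitive.
        System.out.println(a.equals(b));                                   // false

        // Token-based comparison treats both spellings as the same set of classes.
        System.out.println(hasClasses(a, "cw-sprite", "rupee-medium"));    // true
        System.out.println(hasClasses(b, "cw-sprite", "rupee-medium"));    // true
    }
}
```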
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());

How to split string containing comma using span tag

There is a string containing tags separated by comma:
<div class="article" id="1">
<span class="tags">Dog, Cat, Bird, Pig</span>
</div>
<div class="article" id="2">
<span class="tags">Asia, Africa, Australia, Europe</span>
</div>
...
I would like to wrap each item using span tag within the tags class so that each of them can be individually styled like this
<span class="tags"><span>Dog</span><span>Cat</span>...</span>
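// Process each .tags element separately; html is that element's current content.
$('.tags').html(function(i, html) {
  return html.split(', ').map(function(el) { return '<span>' + el + '</span>'; }).join('');
});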
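If the markup is generated on the server instead, the same split-and-wrap step can be sketched in plain Java. TagSpanSketch and wrapTags are hypothetical names, and the sketch assumes the tag names contain no characters that need HTML escaping.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class TagSpanSketch {

    // Splits a comma-separated tag string and wraps each non-empty item in <span>.
    static String wrapTags(String tags) {
        return Arrays.stream(tags.split(","))
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .map(t -> "<span>" + t + "</span>")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(wrapTags("Dog, Cat, Bird, Pig"));
        // prints <span>Dog</span><span>Cat</span><span>Bird</span><span>Pig</span>
    }
}
```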

goquery- Extract text from one html tag and add it to the next tag

Yeah, sorry that the title explains nothing. I'll need to use an example.
This is a continuation of another question I posted which solved one problem but not all of them. I've put most of the background info from that question into this one. Also, I've only been looking into Go for about 5 days (and I only started learning code a couple months ago), so I'm 90% sure that I'm close to figuring out what I want and that the problem is that I've got some silly syntax mistakes.
Situation
I'm trying to use goquery to parse a webpage. (Eventually I want to put some of the data in a database). Here's what it looks like:
<html>
<body>
<h1>
<span class="text">Go </span>
</h1>
<p>
<span class="text">totally </span>
<span class="post">kicks </span>
</p>
<p>
<span class="text">hacks </span>
<span class="post">its </span>
</p>
<h1>
<span class="text">debugger </span>
</h1>
<p>
<span class="text">should </span>
<span class="post">be </span>
</p>
<p>
<span class="text">called </span>
<span class="post">ogle </span>
</p>
<h3>
<span class="statement">true</span>
</h3>
</body>
</html>
Objective
I'd like to:
Extract the content of <h1..."text".
Insert (and concatenate) this extracted content into the content of <p..."text".
Only do this for the <p> tag that immediately follows the <h1> tag.
Do this for all of the <h1> tags on the page.
Once again, an example explains ^this better. This is what I want it to look like:
<html>
<body>
<p>
<span class="text">Go totally </span>
<span class="post">kicks </span>
</p>
<p>
<span class="text">hacks </span>
<span class="post">its </span>
</p>
<p>
<span class="text">debugger should </span>
<span class="post">be </span>
</p>
<p>
<span class="text">called </span>
<span class="post">ogle</span>
</p>
<h3>
<span class="statement">true</span>
</h3>
</body>
</html>
Solution Attempts
Because distinguishing further the <h1> tags from the <p> tags would provide more parsing options, I've figured out how to change the class attributes of the <h1> tags to this:
<html>
<body>
<h1>
<span class="title">Go </span>
</h1>
<p>
<span class="text">totally </span>
<span class="post">kicks </span>
</p>
<p>
<span class="text">hacks </span>
<span class="post">its </span>
</p>
<h1>
<span class="title">debugger </span>
</h1>
<p>
<span class="text">should </span>
<span class="post">be </span>
</p>
<p>
<span class="text">called </span>
<span class="post">ogle </span>
</p>
<h3>
<span class="statement">true</span>
</h3>
</body>
</html>
with this code:
html_code := strings.NewReader(`
code_example_above
`)
doc, _ := goquery.NewDocumentFromReader(html_code)
doc.Find("h1").Each(func(i int, s *goquery.Selection) {
s.SetAttr("class", "title")
class, _ := s.Attr("class")
if class == "title" {
fmt.Println(class, s.Text())
}
})
I know that I can select the <p..."text" following the <h1..."title" with either doc.Find("h1+p") or s.Next() inside the doc.Find("h1").Each function:
doc.Find("h1").Each(func(i int, s *goquery.Selection) {
s.SetAttr("class", "title")
class, _ := s.Attr("class")
if class == "title" {
fmt.Println(class, s.Text())
fmt.Println(s.Next().Text())
}
})
I can't figure out how to insert the text from <h1..."title" to <p..."text". I've tried using quite a few variations of s.After(), s.Before(), and s.Append(), e.g., like this:
doc.Find("h1").Each(func(i int, s *goquery.Selection) {
s.SetAttr("class", "title")
class, _ := s.Attr("class")
if class == "title" {
s.After(s.Text())
fmt.Println(s.Next().Text())
}
})
but I can't figure out how to do exactly what I want.
If I use s.After(s.Next().Text()) instead, I get this error output:
panic: expected identifier, found 5 instead
goroutine 1 [running]:
code.google.com/p/cascadia.MustCompile(0xc2082f09a0, 0x62, 0x62)
/home/*/go/src/code.google.com/p/cascadia/selector.go:59 +0x77
github.com/PuerkitoBio/goquery.(*Selection).After(0xc2082ea630, 0xc2082f09a0, 0x62, 0x5)
/home/*/go/src/github.com/PuerkitoBio/goquery/manipulation.go:18 +0x32
main.func·001(0x0, 0xc2082ea630)
/home/*/go/test2.go:78 +0x106
github.com/PuerkitoBio/goquery.(*Selection).Each(0xc2082ea600, 0x7cb678, 0x2)
/home/*/go/src/github.com/PuerkitoBio/goquery/iteration.go:7 +0x173
main.ExampleScrape()
/home/*/go/test2.go:82 +0x213
main.main()
/home/*/go/test2.go:175 +0x1b
goroutine 9 [runnable]:
net/http.(*persistConn).readLoop(0xc208047ef0)
/usr/lib/go/src/net/http/transport.go:928 +0x9ce
created by net/http.(*Transport).dialConn
/usr/lib/go/src/net/http/transport.go:660 +0xc9f
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:2232 +0x1
goroutine 10 [select]:
net/http.(*persistConn).writeLoop(0xc208047ef0)
/usr/lib/go/src/net/http/transport.go:945 +0x41d
created by net/http.(*Transport).dialConn
/usr/lib/go/src/net/http/transport.go:661 +0xcbc
exit status 2
(The lines of my script don't match the lines of the examples above, but "line 72" of my script contains the code s.After(s.Next().Text()). I don't know what exactly panic: expected identifier, found 5 instead means.)
Summary
In summary, my problem is that I can't quite wrap my head around how to use goquery to add text to a tag.
I think I'm close. Would any gopher Jedis be able and willing to help this padawan?
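Something like this code does the job. It finds all <h1> nodes, then all <span> nodes inside them, looking for one with class text. It then takes the element that follows the <h1>; if that is a <p> whose first child is a <span> with class text, it replaces that <span> with a new <span> containing the combined text and removes the <h1>. As for the panic: After() expects a CSS selector string (the trace shows it calling cascadia.MustCompile), so arbitrary text fails to compile as a selector; the ...Html variants such as ReplaceWithHtml take markup instead.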
I wonder if it's possible to create nodes using goquery without writing html...
package main
import (
"fmt"
"strings"
"github.com/PuerkitoBio/goquery"
)
var htmlCode string = `<html>
...
</html>`
func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(htmlCode))
	if err != nil {
		panic(err)
	}
	doc.Find("h1").Each(func(i int, h1 *goquery.Selection) {
		h1.Find("span").Each(func(j int, s *goquery.Selection) {
			if !s.HasClass("text") {
				return
			}
			// Next() never returns nil; check what the selection matched instead.
			p := h1.Next()
			if !p.Is("p") {
				return
			}
			ps := p.Children().First()
			if ps.Length() > 0 && ps.HasClass("text") {
				// Replace the <p>'s first span with one holding the combined text.
				ps.ReplaceWithHtml(
					fmt.Sprintf("<span class=\"text\">%s%s</span>", s.Text(), ps.Text()))
				h1.Remove()
			}
		})
	})
	htmlResult, _ := doc.Html()
	fmt.Println(htmlResult)
}
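On the question of creating nodes without writing HTML strings: a DOM-style API lets you change the text of the existing span in place. The sketch below shows that variant of the algorithm using the JDK's org.w3c.dom API rather than goquery (a swapped-in library, so this is not goquery's API). It assumes well-formed XML input, which the page in the question is not until its closing tags are fixed, and HeadingMergeSketch, parse, firstElementFrom and mergeHeadings are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HeadingMergeSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // First element found at n or among its following siblings, or null.
    static Element firstElementFrom(Node n) {
        for (; n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // Prepends each <h1>'s text to the first "text" span of the <p> right after it,
    // then removes that <h1>. Headings without such a <p> are left alone.
    static void mergeHeadings(Document doc) {
        // getElementsByTagName returns a live list, so copy it before removing nodes.
        NodeList live = doc.getElementsByTagName("h1");
        List<Element> headings = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            headings.add((Element) live.item(i));
        }
        for (Element h1 : headings) {
            Element p = firstElementFrom(h1.getNextSibling());
            if (p == null || !p.getTagName().equals("p")) {
                continue;
            }
            Element span = firstElementFrom(p.getFirstChild());
            if (span == null || !"text".equals(span.getAttribute("class"))) {
                continue;
            }
            span.setTextContent(h1.getTextContent() + span.getTextContent());
            h1.getParentNode().removeChild(h1);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body>"
                + "<h1><span class=\"text\">Go </span></h1>"
                + "<p><span class=\"text\">totally </span><span class=\"post\">kicks </span></p>"
                + "<p><span class=\"text\">hacks </span><span class=\"post\">its </span></p>"
                + "<h1><span class=\"text\">debugger </span></h1>"
                + "<p><span class=\"text\">should </span><span class=\"post\">be </span></p>"
                + "</body>");
        mergeHeadings(doc);
        NodeList spans = doc.getElementsByTagName("span");
        for (int i = 0; i < spans.getLength(); i++) {
            Element s = (Element) spans.item(i);
            System.out.println(s.getAttribute("class") + ": " + s.getTextContent());
        }
    }
}
```

Setting the text content directly also sidesteps escaping problems that can arise when heading text is pasted into an HTML string.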