XPath to extract all the text inside all div elements

What XPath should I use to extract only the inner text of all the div elements?

//div/descendant-or-self::*/text()
This expression returns every text node inside the div elements, no matter how deeply nested.
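As a rough sketch of how that expression behaves, the snippet below evaluates it with the JDK's built-in javax.xml.xpath API against a small sample document. The class name DivTextDemo, the helper extractTexts, and the sample markup are made up for illustration. The JDK parser only accepts well-formed XML, so real-world HTML would probably need tidying first or an HTML-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DivTextDemo {
    // Hypothetical sample markup; it must be well-formed for the JDK parser.
    static final String SAMPLE =
        "<html><body>"
        + "<div>Top <span>nested <b>deep</b></span></div>"
        + "<p>outside</p>"
        + "<div>Second</div>"
        + "</body></html>";

    // Returns the trimmed, non-empty text nodes found at any depth inside any div.
    static List<String> extractTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//div/descendant-or-self::*/text()", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
            }
            return texts;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The text of the <p> element is not selected because it is not inside a div.
        System.out.println(extractTexts(SAMPLE));
    }
}
```

The shorter //div//text() should select the same nodes, since // is an abbreviation for /descendant-or-self::node()/.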

Related

Issue in writing xpath to refer to only text of li element

I have written the XPath below to refer to only the text of the 'li' element, but it refers to the entire 'li' element, as highlighted in the image. What XPath would refer to only the text of the 'li' element in my case?
//div[@class='parent']//ul[@class='list-none']//li[contains(text(),'7')]/text()

Selecting both links and anchors using XPath

I am using screaming frog and I want to do this using XPath.
Extract all links and anchors containing a certain class from the main body content
but I want to exclude all links within div.list
Right now I am trying this, but it's not working too well; I would also like it to output the result as text if possible.
//div[@class="page-content"]/*[not(class="list")]//a[@data-wpel-link="internal"]
Anyone got an idea?
This XPath,
//a[@data-wpel-link="internal"][not(ancestor::div[@class="list"])]
will select all a elements with the given attribute value that do not have an ancestor div of the given class.
You can, of course, prefix the path with any ancestor to restrict the selection, e.g.:
//div[@class="page-content"]//a[@data-wpel-link="internal"]
[not(ancestor::div[@class="list"])]
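To see the ancestor filter in action outside Screaming Frog, here is a minimal sketch using the JDK's javax.xml.xpath API. The class InternalLinksDemo, the helper selectLinks, the sample markup, and the "text -> href" output format are all assumptions for illustration, and the input has to be well-formed XML for this parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InternalLinksDemo {
    // Hypothetical page: two links should be kept, two should be skipped.
    static final String SAMPLE =
        "<html><body><div class=\"page-content\">"
        + "<p><a data-wpel-link=\"internal\" href=\"/a\">Keep A</a></p>"
        + "<div class=\"list\"><ul><li>"
        + "<a data-wpel-link=\"internal\" href=\"/b\">Skip B</a>"
        + "</li></ul></div>"
        + "<a data-wpel-link=\"external\" href=\"http://example.com\">Skip C</a>"
        + "<section><a data-wpel-link=\"internal\" href=\"/d\">Keep D</a></section>"
        + "</div></body></html>";

    // Internal links inside the page content that have no div.list ancestor.
    static final String QUERY =
        "//div[@class=\"page-content\"]//a[@data-wpel-link=\"internal\"]"
        + "[not(ancestor::div[@class=\"list\"])]";

    // Returns each matching link as "anchor text -> href".
    static List<String> selectLinks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element a = (Element) nodes.item(i);
                links.add(a.getTextContent().trim() + " -> " + a.getAttribute("href"));
            }
            return links;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        selectLinks(SAMPLE).forEach(System.out::println);
    }
}
```

Note that @class="list" only matches when the attribute is exactly "list"; a div with several classes would need something like contains(concat(" ", normalize-space(@class), " "), " list ") instead.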

How to select the text node/content using CSS selectors

Is it possible to select the content/text node of a tag using css selectors?
For example, I have some content such as <div><p class="x">Hello world</p></div>. How can I get/select the text node ("Hello world") using CSS?
I know that I could get the p element using the .x class selector and then use innerHTML with JavaScript, but I would like to know if it's possible to get the exact text node using CSS and just set the node data (which is basically text, as the node is a text node). Is it possible?
No, it is not possible.
However, if you're trying to figure out how to select the text node with querySelector, do what @j08691 suggested:
document.querySelector('.x').textContent
That or
document.querySelector('.x').firstChild.nodeValue
should work.
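The two calls agree when the element holds a single text node, but they can differ once the element contains nested markup. As a hedged illustration, the sketch below uses Java's org.w3c.dom API, which exposes the same DOM properties as getTextContent() and getFirstChild().getNodeValue(). It is not browser code: the class TextNodeDemo, the helper firstParagraph, and the variant with an <em> element are assumptions added for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeDemo {
    // Parses a well-formed fragment and returns its first <p> element.
    static Node firstParagraph(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("p").item(0);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Markup from the question: the <p> has exactly one child, a text node.
        Node plain = firstParagraph("<div><p class=\"x\">Hello world</p></div>");
        System.out.println(plain.getTextContent());
        System.out.println(plain.getFirstChild().getNodeValue());

        // Hypothetical variant with nested markup: the first child is now
        // only the leading text node, while textContent spans all descendants.
        Node mixed = firstParagraph("<div><p class=\"x\">Hello <em>world</em></p></div>");
        System.out.println(mixed.getTextContent());
        System.out.println(mixed.getFirstChild().getNodeValue());
    }
}
```

So textContent is the safer choice when the whole visible text is wanted, and firstChild.nodeValue when the actual text node itself is the target.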

How to find a child html element by id with jsoup?

I am parsing the HTML of a site with Jsoup. I need to find some HTML elements that have a specific id, but their deep tree of parents complicates the task. So I would like to know whether it is possible to find a specific HTML element without first having to select all of its parents.
For instance, I am currently doing the following:
Elements el=elements.select(".scroller.context-inplay").select(".zone.grid-1-1").select(".grid-1").select(".module-placeholder");
I would like to know if there is a simpler way to get the same element by searching for its id.
The id of an HTML element should be unique within the page. Unfortunately, some HTML found in the wild breaks this requirement. However, if your HTML source follows the standard, you can simply use the # CSS selector to select the element in question:
Element el = doc.select("#someID").first();
Alternatively, you can use Jsoup's getElementById method directly:
Element el = doc.getElementById("someID");
Also, if you decide to go by class names as you suggest in your question, it is easy to combine all selects into one selector:
Elements els = elements.select(".scroller.context-inplay .zone.grid-1-1 .grid-1 .module-placeholder");
The spaces in the CSS selector are descendant combinators: whatever matches the subselector to the right of a space must be a descendant (at any depth, not necessarily a direct child) of an element matching the left side.

Paragraph tag being displayed in new line

I need the value "1" to be displayed next to the "Id" field, but it is displayed on a new line. The tag is supposed to be inline; I'm not sure why it is being moved to a new line.
jsfiddle
HTML
<b>Id : <p id="productid">1</p></b>
A <p> element is a paragraph, which by default is a block element.
In this case, you can't use <p> because:
It is not allowed inside <b> elements (because <p> can only be used where flow content is expected, but the content model of <b> is phrasing content). Always remember to validate your code.
Semantically, it's clear that it isn't a paragraph.
I suggest using
<b>Id: <span id="productid">1</span></b>
Demo
#productid {
  display: inline-block;
}
p is a block level element by default. You can set it to display inline-block to make it do as you describe using basic css.
I'm not sure whether you are able to edit the CSS, so in case you cannot, see Oriol's answer. There is no reason not to just make it a span.
As a side note, it is a little odd to put a p tag inside a b tag. Browsers will render it, but it is not valid HTML, so using a span tag is the proper way to handle this.