How to remove an unnecessary space between elements? - html

In the code below, the second <h3> tag starts on a new line, so the layout looks wrong. How do I get rid of the gap between the 80 and the + sign? The counter-up module I used is from this link: https://www.jqueryscript.net/animation/Animating-Numbers-Counting-Up-with-jQuery-Counter-Up-Plugin.html
<section id = "cta" class = "wrapper style3">
<div class = "row uniform">
<div class = "4u 6u$(2) 12u$(3)">
<h2><u>Students</u></h2>
</div>
<div class = "4u 6u(2) 12u$(3)">
<h3 class = "counter">80</h3><h3>+</h3>
</div>
</div>
<script>
jQuery(document).ready(function( $ ) {
$('.counter').counterUp({
delay: 30,
time: 1500
});
});
</script>
</section>

Headings are block-level elements, meaning two adjacent headings are rendered on two separate lines (unless you have changed their presentation via CSS).
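You can use CSS to append the "+" to the H3 content instead of using another adjacent H3 element. Because the "+" is generated by CSS rather than stored in the element's text, it is not affected when the counter script updates the number: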
h3.counter:after {
content: "+";
display: "inline";
}
<h2>Without CSS:</h2>
<h3>80</h3><h3>+</h3>
<h2>With CSS:</h2>
<h3 class="counter">80</h3>

Related

How to replace contents of p tag to div tag?

I need to put the content of my P tag into a DIV tag in almost 600 HTML pages. Each page has a different name and topic.
<div id = "topicname"></div>
<P><A NAME="4u_uvt"></A><B>FiBu-Übergabe</B></P>
I wrote this JavaScript and call it from my HTML, but it is not working:
<script>
var mydivpchange=document.querySelectorAll("p");
document.getElementByID("topicname").innerHTML=mydivpchange[1].innerHTML;
</script>
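First, the method is spelled getElementById (lowercase "d"); document.getElementByID does not exist, so your script throws a TypeError on that line. Also, mydivpchange[1] is the second <p> on the page; the first one is mydivpchange[0].
If you would rather turn the <p> itself into a <div>, you can replace the element:
// get the first <p> element in the document
let test=document.getElementsByTagName('p')[0];
// create a new div element
let y=document.createElement('div');
// copy the paragraph's inner HTML into the div
y.innerHTML=test.innerHTML;
// replace the <p> (test) with the new <div> (y)
test.parentNode.replaceChild(y, test);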
div{ background:red;}
<p>test</p>
<br/>
<p>test2</p>
<br/>
<p>test3</p>
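If the goal is to change all ~600 files once rather than run JavaScript in every page, the copy can also be done offline. The Python sketch below is only an illustration under stated assumptions: each page is UTF-8 encoded, contains the empty `<div id = "topicname"></div>`, and its first `<p>` has no nested `<p>`. `fill_topic_div` and `process_folder` are hypothetical names. Regular expressions only work here because the markup is this regular; for messier pages an HTML parser is the safer choice. Like your innerHTML assignment, it copies the content and leaves the original <p> in place.

```python
import re
from pathlib import Path

# Assumed page layout: an empty <div id="topicname"></div> somewhere, and the
# topic markup inside the first <p>...</p> (no nested <p> inside it).
TOPIC_DIV = re.compile(r'<div\s+id\s*=\s*"topicname"\s*>\s*</div>', re.IGNORECASE)
FIRST_P = re.compile(r'<p\b[^>]*>(.*?)</p>', re.IGNORECASE | re.DOTALL)


def fill_topic_div(html: str) -> str:
    """Copy the inner HTML of the first <p> into the empty topicname div."""
    p_match = FIRST_P.search(html)
    if p_match is None or TOPIC_DIV.search(html) is None:
        return html  # page does not have the expected layout: leave it unchanged
    inner = p_match.group(1)
    # A function replacement keeps backslashes in `inner` from being read as escapes.
    return TOPIC_DIV.sub(lambda _m: f'<div id="topicname">{inner}</div>', html, count=1)


def process_folder(folder: str) -> None:
    """Rewrite every *.html file directly inside `folder` in place (hypothetical helper)."""
    for path in Path(folder).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = fill_topic_div(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Try it on a copy of the pages first, since process_folder overwrites files in place.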

How to merge/collapse child DOM node into parent with BeautifulSoup / lxml?

I am writing some HTML pre-processing scripts that are cleaning/tagging HTML from a web crawler, for use in a semantic/link analysis step that follows. I have filtered out undesired tags from the HTML, and simplified it to contain only visible text and <div> / <a> elements.
I am now trying to write a "collapseDOM()" function that walks through the DOM tree and performs the following actions:
(1) destroy leaf nodes without any visible text
(2) collapse any <div>, replacing it with its child, if it (a) directly contains no visible text AND (b) has only a single <div> child
So for instance if I have the following HTML as input:
<html>
<body>
<div>
<div>
not collapsed into empty parent: only divs
</div>
</div>
<div>
<div>
<div>
inner div not collapsed because this contains text
<div>some more text ...</div>
but the outer nested divs do get collapsed
</div>
</div>
</div>
<div>
<div>This won't be collapsed into parent because </div>
<div>there are two children ...</div>
</div>
</body>
</html>
It should get transformed into this "collapsed" version:
<html>
<body>
<div>
not collapsed into empty parent: only divs
</div>
<div>
inner div not collapsed because this contains text
<div>some more text ...</div>
but the outer nested divs do get collapsed
</div>
<div>
<div>This won't be collapsed into parent because </div>
<div>there are two children ...</div>
</div>
</body>
</html>
I have been unable to figure out how to do this. I tried writing a recursive tree-walking function using BeautifulSoup's unwrap() and decompose() methods, but this modified the DOM while iterating over it and I couldn't figure out how to get it to work ...
Is there a simple way to do what I want? I am open to solutions either in BeautifulSoup or lxml. Thanks!
You can start with this and adjust to your own needs:
import re
from bs4 import BeautifulSoup, NavigableString
def stripTagWithNoText(soup):
    def significant(children):
        # keep tags, plus strings that contain something other than whitespace
        return [c for c in children
                if not isinstance(c, NavigableString) or len(re.sub(r'\s+', '', c)) > 0]
    def remove(node):
        for item in node.contents:
            if not isinstance(item, NavigableString):
                currentNodes = significant(item.contents)
                parentNodes = significant(item.parent.contents)
                # item has a single meaningful child and the same tag name as its parent
                if len(currentNodes) == 1 and item.name == item.parent.name:
                    if len(parentNodes) > 1:
                        continue  # the parent holds other content: leave both alone
                    if item.name == currentNodes[0].name:
                        item.unwrap()  # item's only child is the same tag: drop item too
                    node.unwrap()  # the parent holds nothing but item: drop the parent
    for tag in soup.find_all():
        remove(tag)
    print(soup)
# data holds the input HTML from the question as a string
soup = BeautifulSoup(data, "lxml")
stripTagWithNoText(soup)
<html>
<body>
<div>
not collapsed into empty parent: only divs
</div>
<div>
inner div not collapsed because this contains text
<div>some more text ...</div>
but the outer nested divs do get collapsed
</div>
<div>
<div>This won't be collapsed into parent because </div>
<div>there are two children ...</div>
</div>
</body>
</html>
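For comparison, here is a different technique (not the one used in the answer above): a post-order walk that first collapses each child and then decides what should replace the current element. It always iterates over a snapshot, list(el), so the tree is never changed under a running loop, which is the problem described in the question. This is a sketch using Python's standard-library `xml.etree.ElementTree` and assumes the cleaned markup is well-formed XML; for real-world HTML you would parse with lxml.html or BeautifulSoup instead. `collapse`, `has_own_text` and `keep_tail` are hypothetical names. ElementTree stores text that follows a child element in that child's `tail`, which is why `has_own_text` checks the tails.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def has_own_text(el: ET.Element) -> bool:
    """True if `el` itself (not a descendant) holds non-whitespace text."""
    if el.text and el.text.strip():
        return True
    # Text that follows a child element is stored in that child's .tail
    return any(child.tail and child.tail.strip() for child in el)


def keep_tail(parent: ET.Element, index: int, tail: Optional[str]) -> None:
    """Re-attach the tail text of the child just removed from position `index`."""
    if not tail:
        return
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + tail


def collapse(el: ET.Element) -> Optional[ET.Element]:
    """Return what should stand in for `el`: itself, its only <div> child, or None."""
    # Post-order walk over a snapshot of the children, so replacing a child
    # never disturbs the loop that is iterating over them.
    for child in list(el):
        tail = child.tail
        replacement = collapse(child)
        if replacement is child:
            continue
        index = list(el).index(child)
        el.remove(child)
        if replacement is None:
            keep_tail(el, index, tail)      # rule (1): the empty leaf is gone
        else:
            replacement.tail = tail         # rule (2): splice the surviving <div> in
            el.insert(index, replacement)
    # Rule (1): destroy leaf nodes without any visible text.
    if len(el) == 0 and not (el.text and el.text.strip()):
        return None
    # Rule (2): a <div> with no direct text and a single <div> child collapses.
    if el.tag == "div" and len(el) == 1 and el[0].tag == "div" and not has_own_text(el):
        return el[0]
    return el


if __name__ == "__main__":
    doc = ET.fromstring("<html><body><div><div>only child</div></div></body></html>")
    print(ET.tostring(collapse(doc), encoding="unicode"))
    # <html><body><div>only child</div></body></html>
```

Because each element is handled only after its children, a chain of text-free nested divs collapses all the way down in one pass.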

How to get all HTML elements with specific text using jsoup

How can I get all elements with a specific text (inner HTML) in an HTML document using jsoup?
For example, all elements with the text "test":
<html><head><title>for example></title></head>
<body>
<div id="div1" class='test'>
test
<p id='p1'>test<a id='a1'>test</a></p>
<a id='a2'>test</a>
<img src='' id='img1' alt='test'>
<p id='p2'>example</p>
</div>
</body></html>
Note that I don't want to use the tags' ids or tag names to select elements!
If I understand you correctly:
String html = "<html><head><title>for example></title></head><body><div id=\"div1\" class='test'>test<p id='p1'>test<a id='a1'>test</a></p><a id='a2'>test</a><img src='' id='img1' alt='test'><p id='p2'>example</p></div></body></html>";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("*:containsOwn(test)");
for(Element element:elements)
{
System.out.println(element.toString()+"\n");
}
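This prints the outer HTML of the elements with ids div1, p1, a1 and a2. The <img> is not matched, because "test" appears only in its alt attribute, not in its text. Note that :containsOwn is case-insensitive.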
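To illustrate what :containsOwn does (match an element's own text, ignoring text inside its children, case-insensitively), here is a rough Python analogue built on the standard-library `html.parser`. It is a sketch, not a jsoup replacement: `OwnTextFinder` and `elements_with_own_text` are hypothetical names, it assumes every non-void element is closed (elements left open at the end are not reported), and it only reports each element's tag and id.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Void elements have no end tag and no text of their own.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OwnTextFinder(HTMLParser):
    """Record elements whose own text (not their children's) contains `needle`."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle.lower()  # compare case-insensitively, like :containsOwn
        self.count = 0                # numbers start tags in document order
        self.stack = []               # open elements: (order, tag, id, own text parts)
        self.matches = []             # (order, tag, id) of matching elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.count += 1
        self.stack.append((self.count, tag, dict(attrs).get("id"), []))

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][3].append(data)  # text belongs to the innermost open element

    def handle_endtag(self, tag):
        if not any(entry[1] == tag for entry in self.stack):
            return  # stray end tag: ignore it
        # Close elements up to and including the matching start tag.
        while self.stack:
            order, name, el_id, parts = self.stack.pop()
            if self.needle in "".join(parts).lower():
                self.matches.append((order, name, el_id))
            if name == tag:
                break


def elements_with_own_text(html: str, needle: str) -> List[Tuple[str, Optional[str]]]:
    """Return (tag, id) pairs of matching elements in document order."""
    finder = OwnTextFinder(needle)
    finder.feed(html)
    finder.close()
    return [(name, el_id) for _, name, el_id in sorted(finder.matches)]
```

Run on the question's markup with "test", it returns the div, p and two a elements (ids div1, p1, a1, a2) in document order, the same set the jsoup selector finds.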

Select the most nested element with the same classes

I'm trying to make a special menu, but I have a problem selecting the most deeply nested element (div). The menu will be dynamic, so the number of divs nested in one div can change (parents will be created with new children). I need to select the last one (the most nested) without adding more classes or ids.
Here is the code I have written so far:
<div id="strategy">
<div class="selected">
0
<div class="selected">
some text
<div class="selected"> this is the last div, but it can be anytime changed and more childs of this element can be created</div>
</div>
</div>
<div class="selected">
1
</div>
<div>
2
</div>
</div>
And here is some CSS I tried:
div.selected:only-of-type {background: #F00;}
I also tried :nth-last-child, :only-child and so on. I think I've tried everything, but there must be some way to do it.
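CSS has no selector for "the most deeply nested element", but if you're open to jQuery, you can walk down one level of children at a time until there are none left: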
$(document).ready(function() {
var $target = $('#strategy').children();
while( $target.length ) {
$target = $target.children();
}
var last = $target.end(); // the loop stops on an empty set; .end() steps back to the deepest non-empty one
var lastHtml = last.html();
$('body').append('<strong>deepest child is: ' + lastHtml + '</strong>');
last.css('color', 'blue');
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="strategy">
<div class="selected">
0
<div class="selected">
some text
<div class="selected"> this is the last div, but it can be anytime changed and more childs of this element can be created</div>
</div>
</div>
<div class="selected">
1
</div>
<div>
2
</div>
</div>
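The jQuery answer finds the deepest level by descending one generation of children at a time. The same depth-tracking idea can be sketched in Python with the standard-library `html.parser` when you have the markup as a string. `DeepestFinder` and `deepest_texts` are hypothetical names, and the sketch assumes well-formed markup in which every non-void element is closed. Like the jQuery version, it keeps every element at the maximum depth, not just one.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Void elements have no end tag, so they must not change the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DeepestFinder(HTMLParser):
    """Collect the text of the most deeply nested element(s)."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0        # nesting depth of the element currently open
        self.max_depth = 0    # deepest level seen so far
        self.deepest = []     # text of each element found at max_depth

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if self.depth > self.max_depth:
            self.max_depth = self.depth
            self.deepest = []          # a deeper level makes earlier finds obsolete
        if self.depth == self.max_depth:
            self.deepest.append("")    # start collecting text for this element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # Text seen at max_depth belongs to the most recently opened element at
        # that depth; if a deeper element turns up later, the list is reset anyway.
        if self.depth == self.max_depth and self.deepest:
            self.deepest[-1] += data


def deepest_texts(html: str) -> Tuple[int, List[str]]:
    """Return the maximum depth and the stripped text of the elements at that depth."""
    finder = DeepestFinder()
    finder.feed(html)
    finder.close()
    return finder.max_depth, [text.strip() for text in finder.deepest]
```

For the menu markup in the question, the deepest level is 4 (counting #strategy as 1), and the only element there is the div whose text starts with "this is the last div".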

How to access div element text based on adjacent text

I have the following HTML and am trying to access "QA1234", which is the value of the Serial Number field. How can I access this text?
<div class="dataField">
<div class="dataName">
<span id="langSerialNumber">Serial Number</span>
</div>
<div class="dataValue">QA1234</div>
</div>
<div class="dataField">
<div class="dataName">
<span id="langHardwareRevision">Hardware Revision</span>
</div>
<div class="dataValue">05</div>
</div>
<div class="dataField">
<div class="dataName">
<span id="langManufactureDate">Manufacture Date</span>
</div>
<div class="dataValue">03/03/2011</div>
</div>
I assume you want to get the "QA1234" text by way of its "Serial Number" label. If that is correct, you basically need to:
1. Locate the "dataField" div that includes the serial number span.
2. Get the "dataValue" div within that div.
One way is to get all the "dataField" divs and find the one that includes the span:
parent = browser.divs(class: 'dataField').find { |div| div.span(id: 'langSerialNumber').exists? }
p parent.div(class: 'dataValue').text
#=> "QA1234"
parent = browser.divs(class: 'dataField').find { |div| div.span(id: 'langManufactureDate').exists? }
p parent.div(class: 'dataValue').text
#=> "03/03/2011"
Another option is to find the serial number span and then traverse up to the parent "dataField" div:
parent = browser.span(id: 'langSerialNumber').parent.parent
p parent.div(class: 'dataValue').text
#=> "QA1234"
parent = browser.span(id: 'langManufactureDate').parent.parent
p parent.div(class: 'dataValue').text
#=> "03/03/2011"
I find the first approach more robust to changes, since it does not depend on exactly how deeply the serial number span is nested within the "dataField" div. However, on pages with many fields it may be slower, because it checks the "dataField" divs one by one until it finds a match.
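If you only have the page source as a string rather than a live Watir browser, the same label-to-value pairing can be sketched in Python with the standard-library `html.parser`. This is an illustration only: `DataFieldParser` and `parse_data_fields` are hypothetical names, and the sketch assumes, as in the markup above, that each label span has an id and comes before its "dataValue" div, and that the "dataValue" div contains no nested divs.

```python
from html.parser import HTMLParser


class DataFieldParser(HTMLParser):
    """Map each label span id (e.g. "langSerialNumber") to the following dataValue text."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}        # span id -> value text
        self.label_id = None    # id of the most recent label span
        self.in_value = False   # True while inside a dataValue div
        self.value_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and attributes.get("id"):
            self.label_id = attributes["id"]
        elif tag == "div" and "dataValue" in classes:
            self.in_value = True
            self.value_parts = []

    def handle_data(self, data):
        if self.in_value:
            self.value_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.in_value:
            self.in_value = False
            if self.label_id is not None:
                self.fields[self.label_id] = "".join(self.value_parts).strip()
                self.label_id = None  # each label is paired with one value only


def parse_data_fields(html: str) -> dict:
    """Return {label span id: value text} for every dataField in `html`."""
    parser = DataFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

With the markup from the question, parse_data_fields(html)["langSerialNumber"] gives "QA1234". Pairing on the span id mirrors the first Watir approach, which also identifies the field by its label span rather than by position.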