Html text control (DIV not recognizing another div created on the fly??) - html

<div id='imgCaption' style='background-color:grey;padding:5px 20px;color:white;'>
</div>
<script language="javascript">
// lots of code
$('#imgCaption').html(imgCaption + "<div style='width:87px; float:right; text-align:right;'>" + nextImgNum + ' of ' + totalNo + "</div>");
</script>
The results are not stable.
First result
Some caption text 1 of 11
Second result
Some caption text continuing.....................................................
.. 1 of 11
Third result
Some caption text continuing ...............................................
1 of 11
The first and second results are OK because the caption text and the index text are on the same line.
The third result is not OK: they are on different lines, and the caption div's background color does not cover the index div.
As a result, the index cannot be seen in the third result.
Is there a workaround?
Thanks.

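This looks like a float issue. A floated child does not add to its parent's height, so when the index div wraps onto its own line the grey background stops above it. Try adding overflow:hidden to <div id="imgCaption" /> so that the parent contains the float.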

Related

select all text nodes inside an element without text in child elements

On scraping a site, I have an HTML like this:
<div class="classA classB classC">
<div class="classD classE">
<h1 class="classF classD">Text I don't want</h1>
<ul>....</ul> <!-- containing more text in nested children, don't want -->
</div>
Text I want to grab.
<br>
More text I want to grab
</div>
How can I select only the text I want to grab, i.e. ["Text I want to grab", "More text I want to grab"], without also selecting "Text I don't want"? I am trying a CSS selector like this:
text = response.css('.classA:not(.classD) *::text').getall()
Does anyone know what to do in this case? I am not familiar with XPath, but please suggest an XPath solution if you have one.
You are almost there. Using :not is the right idea, but first select the element that contains the text you want, <div class="classA classB classC">, and take ::text from it directly. ::text on that div returns only its own text nodes, so the text inside the nested h1 and ul is left out. The CSS expression should look like this:
response.css('div.classA.classB.classC:not(.classF)::text').getall()
Or, to get a single cleaned-up string:
' '.join([x.strip() for x in response.css('div.classA.classB.classC:not(.classF)::text').getall()])
Proven by scrapy shell:
In [1]: from scrapy.selector import Selector
In [2]: %paste
html='''
<div class="classA classB classC">
<div class="classD classE">
<h1 class="classF classD">Text I don't want</h1>
<ul>....</ul> <!-- containing more text in nested children, don't want -->
</div>
Text I want to grab.
<br>
More text I want to grab
</div>
'''
## -- End pasted text --
In [3]: resp=Selector(text=html)
In [4]: ''.join(resp.css('div.classA.classB.classC:not(.classF)::text').getall()).strip()
Out[4]: 'Text I want to grab.\n \n More text I want to grab'
In [5]: ''.join(resp.css('div.classA.classB.classC:not(.classF)::text').getall()).replace('\n','').strip()
Out[5]: 'Text I want to grab. More text I want to grab'
In [6]: ''.join(resp.css('div.classA.classB.classC:not(.classF)::text').getall()).strip().replace('\n','').strip()
Out[6]: 'Text I want to grab. More text I want to grab'
In [7]: [x.strip() for x in resp.css('div.classA.classB.classC:not(.classF)::text').getall()]
Out[7]: ['', 'Text I want to grab.', 'More text I want to grab']
In [8]: ''.join([x.strip() for x in resp.css('div.classA.classB.classC:not(.classF)::text').getall()])
Out[8]: 'Text I want to grab.More text I want to grab'
In [9]: ''.join([x.strip() for x in resp.css('div.classA.classB.classC:not(.classF)::text').getall()])
Out[9]: 'Text I want to grab.More text I want to grab'
In [10]: ' '.join([x.strip() for x in resp.css('div.classA.classB.classC:not(.classF)::text').getall()])
Out[10]: ' Text I want to grab. More text I want to grab'
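The reason the expression above works is that it reads only the text nodes that are direct children of the outer div. Below is a minimal sketch of that same idea using only Python's standard library instead of Scrapy. It assumes the markup is adjusted to be well-formed XML (<br/> self-closed, comment dropped), and direct_text is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Same markup as in the question, with <br/> self-closed so it parses as XML.
SNIPPET = """
<div class="classA classB classC">
  <div class="classD classE">
    <h1 class="classF classD">Text I don't want</h1>
    <ul>....</ul>
  </div>
  Text I want to grab.
  <br/>
  More text I want to grab
</div>
"""


def direct_text(element):
    """Return the stripped, non-empty text nodes that are direct children of element."""
    # element.text is the text before the first child; each child's .tail is
    # the text that follows that child but still belongs to the parent.
    pieces = [element.text] + [child.tail for child in element]
    return [p.strip() for p in pieces if p and p.strip()]


outer = ET.fromstring(SNIPPET)
wanted = direct_text(outer)
print(wanted)
```

Text inside the nested div, h1 and ul is never visited, because only the outer element's own text and its children's tails are read. In XPath terms this corresponds to a text() step taken directly on the outer div.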

How to get content of unnamed nested div inside div with class? (Use Scrapy or BeautifulSoup)

I'm trying to obtain the nested unnamed div inside:
<div class="pod three columns closable content-sizes clearfix">
The nested unnamed div is also the first div inside the div above (see image)
I have tried the following:
for div in soup.findAll('div', attrs={'class':'pod three columns closable content-sizes clearfix'}):
    print(div.text)
The length of
soup.findAll('div', attrs={'class':'pod three columns closable content-sizes clearfix'})
is just one, even though this div has many nested divs, so the for loop runs only once and prints everything.
I need all the text inside only the first nested div (see image):
Project...
Reference Number...
Other text
Try:
from bs4 import BeautifulSoup
html_doc = """
<div class="pod three columns closable content-sizes clearfix">
<div>
<b>Project: ...</b>
<br>
<br>
<b>Reference Number: ...</b>
<br>
<br>
<b>Other text ...</b>
<br>
<br>
</div>
<div>
Other text I don't want
</div>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
print(
    soup.select_one("div.pod.three.columns > div").get_text(
        strip=True, separator="\n"
    )
)
Prints:
Project: ...
Reference Number: ...
Other text ...
Or without CSS selector:
print(
    soup.find(
        "div",
        attrs={"class": "pod three columns closable content-sizes clearfix"},
    )
    .find("div")
    .get_text(strip=True, separator="\n")
)
Try this:
result = soup.find('div', class_="pod three columns closable content-sizes clearfix").find("div")
print(result.text)
Output:
Project: .............
Reference Number: ....
Other text ...........
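Both answers rely on the same step: take the outer div, then ask for its first nested div and read only that element's text. Here is a rough stdlib-only sketch of that step, using xml.etree.ElementTree rather than BeautifulSoup. It assumes the markup is made well-formed (<br/> self-closed), which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Markup from the answer above, with <br/> self-closed so it parses as XML.
SNIPPET = """
<div class="pod three columns closable content-sizes clearfix">
  <div>
    <b>Project: ...</b><br/><br/>
    <b>Reference Number: ...</b><br/><br/>
    <b>Other text ...</b><br/><br/>
  </div>
  <div>Other text I don't want</div>
</div>
"""

outer = ET.fromstring(SNIPPET)

# find() returns the first matching direct child, i.e. the unnamed nested div.
first_div = outer.find("div")

# itertext() walks every text node under that div; drop whitespace-only ones.
lines = [t.strip() for t in first_div.itertext() if t.strip()]
print("\n".join(lines))
```

The second nested div is never touched, because the text is read from first_div rather than from the outer element.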

xpath text within a span within text within a div

I'm trying desperately to extract text from within a span which is within text which is within a div (underlined in the image)
This is the relevant part of the code ...
<div id="groupBlock3">
<div class="groupBlockTitle">
::before
"
ALL TEACHERES ("
<span class="activeTeachers">12</span>
" ACTIVE, "
<span class="archivedTeachers">1</span>
" ARCHIVED)
"
<div>...</div>
<div>+ enroll a teacher</div>
</div>
<div>...</div>
</div>
I can retrieve the text from within the first div with this ...
"normalize-space(//div[@id='groupBlock3']/div[1])"
... which gives me ...
'ALL TEACHERES ( ACTIVE, ARCHIVED) + enroll a teacher'
... but try as I might, I cannot get the text from within the first or second span; it just returns an empty string. Please help!
Try one of these XPath-1.0 expressions:
normalize-space(//div[@id='groupBlock3']/div[1]/span[1]/text())
which results in 12, or, for the second span
normalize-space(//div[@id='groupBlock3']/div[1]/span[2]/text())
which results in 1.
But if you want all text of the first div, use this expression
normalize-space(string(//div[@id='groupBlock3']/div[1]))
which gives you the result
::before " ALL TEACHERES (" 12 " ACTIVE, " 1 " ARCHIVED) " ...+ enroll a teacher
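To try the span lookups outside a browser, here is a small sketch using Python's xml.etree.ElementTree, which supports only a subset of XPath: attribute and position predicates work, but normalize-space() and text() do not, so whitespace is collapsed in Python instead. The snippet is a simplified, assumed version of the question's markup, without the ::before marker and inspector quotes.

```python
import xml.etree.ElementTree as ET

# Simplified version of the markup in the question (the ::before marker and
# the quote characters shown by the browser inspector are left out).
SNIPPET = """
<div id="groupBlock3">
  <div class="groupBlockTitle">
    ALL TEACHERES (<span class="activeTeachers">12</span> ACTIVE,
    <span class="archivedTeachers">1</span> ARCHIVED)
    <div>...</div>
    <div>+ enroll a teacher</div>
  </div>
  <div>...</div>
</div>
"""

# Wrap the snippet so the div with the id is a descendant of the search root.
root = ET.fromstring("<body>" + SNIPPET + "</body>")
title = root.find(".//div[@id='groupBlock3']/div[1]")

# span[1] and span[2] are the first and second span children of that div.
active = title.find("span[1]").text.strip()
archived = title.find("span[2]").text.strip()

# Rough stand-in for normalize-space(string(...)): join all descendant
# text, then collapse runs of whitespace into single spaces.
full_text = " ".join("".join(title.itertext()).split())

print(active, archived)
print(full_text)
```

Unlike the first expression in the question, full_text includes the span contents, because itertext() descends into child elements.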

BeautifulSoup - Trying to get text inside span tags

I want to pull the text inside the span tags, but when I try to use .text or get_text() I get errors (either after print(spans) or in the for loop). What am I missing? For now I only do this for the first div of class col, to test whether it works, but I will want it to work for the second one as well.
Thanks
My Code -
premier_soup1 = player_soup.find('div', {'class': 'row-table details -bp30'})
premier_soup_tr = premier_soup1.find_all('div', {'class': 'col'})
for x in premier_soup_tr[0]:
    spans = x.find('span')
    print (spans)
Output
-1
<span itemprop="name">Alisson Ramses Becker</span>
-1
<span itemprop="birthDate">02/10/1992</span>
-1
<span itemprop="nationality"> Brazil</span>
-1
>>>
The HTML
<div class="col">
<p>Name: <strong><span itemprop="name">Alisson Ramses Becker</span> </strong></p>
<p>Date of birth:<span itemprop="birthDate">02/10/1992</span></p>
<p>Place of birth:<span itemprop="nationality"> Brazil</span></p>
</div>
<div class="col">
<p>Club: <span itemprop="affiliation">Liverpool</span></p>
<p>Squad: 13</p><p>Position: Goal Keeper</p>
</div>
If you just want the text in the spans you can search specifically for the spans:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
spans = soup.find_all('span')
for span in spans:
    print(span.text)
If you want to find the spans with the specific divs, then you can do:
divs = soup.find_all('div', {'class': 'col'})
for div in divs:
    spans = div.find_all('span')
    for span in spans:
        print(span.text)
If you just want all of the values after the colons, you can search for the paragraph tags:
soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all('div', {'class': 'col'})
for div in divs:
    ps = div.find_all('p')
    for p in ps:
        print(p.text.split(":")[1].strip())
Kyle's answer is good, but to avoid printing the same value multiple times, as you said happened, you need to change the logic a little. First parse the page and add every match you find to a list, and only then loop through that list and print the matches.
Another thing that you may have to consider is this problem:
<div class=col>
<div class=col>
<span/>
</div>
</div>
By using a list instead of printing right away, you can handle any matches that are identical to existing records.
In the HTML example above, the span could be added twice with the matching approach from Kyle's answer. It's all about making sure your logic finds only the matches you need. How you do it almost always depends on how the HTML is formatted, but it's also important to be creative!
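The collect-then-deduplicate idea can be sketched as follows. This is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, the span text is made up for the example, and duplicates are detected by element identity rather than by text.

```python
import xml.etree.ElementTree as ET

# Nested "col" divs, as in the example above, with some text in the span.
SNIPPET = """
<div class="col">
  <div class="col">
    <span>Liverpool</span>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# Every div with class "col", including the root itself and nested ones.
cols = [d for d in root.iter("div") if d.get("class") == "col"]

# Naive approach: the span is reached once through each enclosing div.
naive = [span.text for div in cols for span in div.iter("span")]

# Collect first, keeping each span element only once (compared by identity).
seen = set()
unique = []
for div in cols:
    for span in div.iter("span"):
        if id(span) not in seen:
            seen.add(id(span))
            unique.append(span.text)

print(naive)
print(unique)
```

Comparing by identity keeps two different spans that happen to hold the same text, which comparing by text alone would merge.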
Good luck.

Single text overlapping over elements

I have the following code:
<div class="row-fluid">
<div class="span6"><img src = "<?php echo $photo ?>"/><?php echo $row['comment'];?></div>
<div class="span6"></div>
</div>
which produces this: http://d.pr/i/g4y7
Why would the text "thissssssss..sss" overlap the other span while the text "this this this" with spaces display just fine?
The expected result is the first part of the text. The problem is the second one.
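Because the browser does not know how to wrap this long word: it contains no spaces, so there is no point at which the line can break.
To avoid this, declare:
<element> { word-wrap: break-word }
(word-wrap is the legacy name of the standard overflow-wrap property.)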