Parse Classes with similar names using Beautiful Soup in python

I have an HTML page from which I want to extract only the td element whose class attribute is exactly bold. Instead, other td elements that merely include bold among their classes, such as dark bold, are returned as well.
When I use the findAll method in BeautifulSoup,
scores= soup.findAll(lambda tag: tag.name == 'td', { "class" : "bold"})
I get all of these elements:
<td class="dark bold">
<span class="hide-for-tablet">Sebastian</span>
<span class="hide-for-mobile">Vettel</span>
<span class="uppercase hide-for-desktop">VET</span>
</td>
<td class="bold hide-for-mobile">78</td>
<td class="dark bold">1:44:44.340</td>
<td class="bold">25</td>
whereas all I really want is
<td class="bold">25</td>
How do I narrow down my results?

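BeautifulSoup treats class as a multi-valued attribute, so a filter on "bold" matches any tag that has bold among its classes. To match only tags whose class list is exactly ['bold'], compare the whole list: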
scores= soup.findAll(lambda tag: tag.name == 'td' and tag.get('class') == ['bold'])
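To see the difference between the two kinds of match, here is a minimal sketch. The table markup is a shortened, hypothetical version of the rows in the question, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the rows shown in the question.
html = """
<table><tr>
<td class="dark bold">Vettel</td>
<td class="bold hide-for-mobile">78</td>
<td class="dark bold">1:44:44.340</td>
<td class="bold">25</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_="bold" matches every td that has "bold" among its classes.
loose = soup.find_all("td", class_="bold")

# Comparing the full class list keeps only the td whose class is exactly "bold".
exact = soup.find_all(lambda tag: tag.name == "td" and tag.get("class") == ["bold"])

loose_texts = [td.get_text() for td in loose]
exact_texts = [td.get_text() for td in exact]
print(loose_texts)
print(exact_texts)
```

The exact comparison works because the parser stores class as a list of strings, so a tag with class="dark bold" has the list ['dark', 'bold'], which is not equal to ['bold'].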

Related

Thymeleaf multiple conditions, change background color

Table cells in columns accountDTO.state and accountDTO.accountDTO should change background color, depending on the text value of accountDTO.state.
I've tried this:
Thymeleaf - How to apply two (or more) styles based on mutually exclusive conditions
and I've searched for other solutions, but they all use conditionals for two possible values, or, as in the question above, combine multiple conditions. I have only one condition, but that field has 9 possible values. The user selects a value for the field on a different web page; this page shows a list loaded from the database.
I've tried a th:switch statement inside the td tag, but then I need to add a span or div inside the td, and in that case it doesn't work: the cells are not colored.
I've searched for something like an if/else-if structure or elif, but there seems to be none, and I can't repeat th:if in a single tag. I hope I can use th:switch, but why doesn't it work?
Below is a shortened version of the switch with just some of the cases; I'll need nine cases in total.
<tbody>
<tr th:each = "accountDTO : ${listOfAccounts}">
<td th:text = "${accountDTO.accountDTO}"></td>
<td th:text = "${accountDTO.startBalance}"></td>
<td th:text = "${accountDTO.currentBalance}"></td>
<td th:text = "${accountDTO.state}" th:switch = "${accountDTO.state}"
><span th:case = "${accountDTO.state} == 'OUT_OF_USE'" th:appendstyle = "'background: red'"></span>
<span th:case = "${accountDTO.state} == 'DEVICE'" th:appendstyle = "'background: green'"></span>
<span th:case = "${accountDTO.state} == 'LOCKED'" th:appendstyle = "'background: blue'"></span>
</td>
<td th:text = "${accountDTO.employee.employeeName}"></td>
<td><a th:href = "@{/editAccount/{id}(id=${accountDTO.idAccount})}" class="btn btn-primary btn-lg" id = "eBtn" th:data = "${accountDTO.idAccount}">Edit</a></td>
</tr>
</tbody>

Consider using the variable as class to simplify the Thymeleaf and HTML. Then you can set the format consistently with your CSS. Your HTML would look something like this:
<tbody>
<tr th:each = "accountDTO : ${listOfAccounts}">
<td th:text = "${accountDTO.accountDTO}"></td>
<td th:text = "${accountDTO.startBalance}"></td>
<td th:text = "${accountDTO.currentBalance}"></td>
<td th:text = "${accountDTO.state}" th:classappend = "${accountDTO.state}"></td>
<td th:text = "${accountDTO.employee.employeeName}"></td>
<td><a th:href = "@{/editAccount/{id}(id=${accountDTO.idAccount})}" class="btn btn-primary btn-lg" id = "eBtn" th:data = "${accountDTO.idAccount}">Edit</a></td>
</tr>
</tbody>
Then define a matching rule for each state in your CSS:
td.OUT_OF_USE {
background: red;
}
td.DEVICE {
background: green;
}
td.LOCKED {
background: blue;
}
This is also useful for keeping similar output consistent across separate pages.

Beautiful soup finding the first sibling of a known object with a known attribute

I have the following code to select a certain cell in a table element:
tag = soup.find_all('td', attrs={'class': 'I'})
As shown in the attached image 1, I would like to find its first sibling within the same row of class "even_row". Ideally, the selection would output only the value of data-seconds, in this case "58". Not every "even_row" row has an element with class I, and some have more than one, so I need the data-seconds value only for the "even_row" rows that contain an element with class "I".
Any help would be appreciated, as I've been banging my head against the wall looking through the documentation to no avail.
The HTML looks like this:
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>

One way to do this is to pass True as the attribute value, which matches any tag that has the attribute, whatever its value:
from bs4 import BeautifulSoup
html = """
<tr class='even_row'>
<td class='row_labels' data-seconds="58">
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
<tr class='even_row'>
<td class='row_labels' >
<div class='celldiv slots1'></div>
</td>
<td class='new'>...</td>
<td class='I'>...</td>
<td class='new'>...</td>
<td class='new'>...</td>
</tr>
"""
soup = BeautifulSoup(html,'html.parser')
even_rows = soup.find_all('tr', attrs={'class': 'even_row'})
for row in even_rows:
    tag = row.find("td", {"data-seconds" : True})
    if tag is not None:
        print(tag.get('data-seconds'))
Output :
58
Another way to do it is to use a regular expression:
import re
tds = [tag.get('data-seconds') for tag in soup.findAll("td", {"data-seconds" : re.compile(r".*")})]
print(tds)
Output :
['58']
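Neither snippet above checks whether the row actually contains a cell with class "I". A minimal sketch of that extra filter follows; the three sample rows and the values 120 and 200 are hypothetical and only serve to exercise the three cases from the question (one "I" cell, none, and more than one).

```python
from bs4 import BeautifulSoup

# Hypothetical rows: one with a single "I" cell, one with none, one with two.
html = """
<table>
<tr class='even_row'>
  <td class='row_labels' data-seconds="58"></td>
  <td class='new'>...</td><td class='I'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="120"></td>
  <td class='new'>...</td><td class='new'>...</td>
</tr>
<tr class='even_row'>
  <td class='row_labels' data-seconds="200"></td>
  <td class='I'>...</td><td class='I'>...</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

seconds = []
for row in soup.find_all("tr", class_="even_row"):
    # Skip rows that have no cell with class "I".
    if row.find("td", class_="I") is None:
        continue
    # Take the first cell in the row that carries a data-seconds attribute.
    label = row.find("td", attrs={"data-seconds": True})
    if label is not None:
        seconds.append(label["data-seconds"])

print(seconds)
```

Because the loop works per row, a row with several "I" cells still contributes its data-seconds value only once.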

I cannot test this properly without the full HTML, but with bs4 4.7.1+ you can use :has to meet your requirements. .even_row:has(.I) selects a parent with class even_row that has a descendant with class I, and appending [data-seconds] then selects the descendants of that row that carry a data-seconds attribute:
print([i['data-seconds'] for i in soup.select('.even_row:has(.I) [data-seconds]')])

How to filter url links with criteria via beautifulsoup? is it possible? YES indeed

Every forum always has some new posts. The one I visit marks such posts with a "new" sticker. How do I filter and retrieve only the URLs that carry the new sticker?
So far I have just grabbed everything on the first page, which seems unprofessional. Each entry also has author and date stickers. Can these be used as filtering criteria in BeautifulSoup? I still have a lot to learn.
This is the DOM:
<!-- 三級置頂分開 -->
<tbody id="stickthread_10432064">
<tr>
<td class="folder"><img src="images/green001/folder_new.gif"/></td>
<td class="icon">
  </td>
<th class="new">
<label>
<img alt="" src="images/green001/agree.gif"/>
<img alt="本版置顶" src="images/green001/pin_1.gif"/>
 </label>
<em>[痴女]</em> <span id="thread_10432064">(セレブの友)(CESD-???)大槻ひびき</span>
<img alt="附件" class="attach" src="images/attachicons/common.gif"/>
<span class="threadpages"> <img src="images/new2.gif"/></span> ### new sticker
</th>
<td class="author"> ### author sticker
<cite>
新片<img align="absmiddle" border="0" src="images/thankyou.gif"/>12 </cite>
<em>2019-4-23</em> ### date sticker
</td>
<td class="nums"><strong>6</strong> / <em>14398</em></td>
<td class="nums">7.29G / MP4
</td>
<td class="lastpost">
<em>2019-4-25 14:11</em>
<cite>by 22811</cite>
</td>
</tr>
</tbody><!-- 三級置頂分開 -->
Let me put it another way, since I may not have expressed myself well. For example, I want to find every 'tbody' that has an 'author' of 新片, or a 'date' of 2019-4-23, or a sticker image "images/new2.gif". That would presumably give me a list of tbody elements, and then I want to find the href in each of them via
blue = soup.find_all('a', style="font-weight: bold;color: blue")
Thanks chiefs!

There is a class new so I am wondering if you could just use that? That would be:
items = soup.select('tbody:has(.new)')
for item in items:
    print([i['href'] for i in item.select('a')])
Otherwise, you can use the :has and :contains pseudo-classes (bs4 4.7.1+) to specify those patterns. Newer Soup Sieve releases deprecate :contains in favor of :-soup-contains.
items = soup.select('tbody:has(.author a:contains("新片")), tbody:has(em:contains("2019-4-23")), tbody:has([src="images/new2.gif"])')
You can then get the hrefs with a loop:
for item in items:
    print([i['href'] for i in item.select('a')])
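If the installed bs4 is too old for these pseudo-classes, the same three criteria can be checked with plain find calls. This is only a sketch: the four tbody blocks, the thread-N.html links, and the helper name is_wanted are hypothetical, since the markup in the question shows no a tags.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified thread list; the hrefs are made up for illustration.
html = """
<table>
<tbody id="t1"><tr>
  <th class="new"><a href="thread-1.html">First</a><img src="images/new2.gif"/></th>
  <td class="author"><cite>alice</cite><em>2019-4-20</em></td>
</tr></tbody>
<tbody id="t2"><tr>
  <th class="common"><a href="thread-2.html">Second</a></th>
  <td class="author"><cite>新片</cite><em>2019-4-21</em></td>
</tr></tbody>
<tbody id="t3"><tr>
  <th class="common"><a href="thread-3.html">Third</a></th>
  <td class="author"><cite>bob</cite><em>2019-4-23</em></td>
</tr></tbody>
<tbody id="t4"><tr>
  <th class="common"><a href="thread-4.html">Fourth</a></th>
  <td class="author"><cite>carol</cite><em>2019-4-22</em></td>
</tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def is_wanted(tbody):
    """Return True if the tbody has the new sticker, the author, or the date."""
    has_new = tbody.find("img", src="images/new2.gif") is not None
    author_cell = tbody.find("td", class_="author")
    cite = author_cell.find("cite") if author_cell else None
    em = author_cell.find("em") if author_cell else None
    by_author = cite is not None and "新片" in cite.get_text()
    on_date = em is not None and em.get_text(strip=True) == "2019-4-23"
    return has_new or by_author or on_date


# Collect the links of every matching tbody, in document order.
hrefs = [
    a["href"]
    for tbody in soup.find_all("tbody")
    if is_wanted(tbody)
    for a in tbody.find_all("a", href=True)
]
print(hrefs)
```

Keeping the criteria in one predicate makes it easy to switch from "any of" to "all of" by replacing or with and.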

First find the parent tag, then move to its next sibling, and then find the tag you need inside it. Try the code below.
from bs4 import BeautifulSoup
import re
data='''<tbody id="stickthread_10432064">
<tr>
<td class="folder"><img src="images/green001/folder_new.gif"/></td>
<td class="icon">
</td>
<th class="new">
<label>
<img alt="" src="images/green001/agree.gif"/>
<img alt="本版置顶" src="images/green001/pin_1.gif"/>
</label>
<em>[痴女]</em> <span id="thread_10432064">(セレブの友)(CESD-???)大槻ひびき</span>
<img alt="附件" class="attach" src="images/attachicons/common.gif"/>
<span class="threadpages"> <img src="images/new2.gif"/></span> ### new sticker
</th>
<td class="author"> ### author sticker
<cite>
新片<img align="absmiddle" border="0" src="images/thankyou.gif"/>12 </cite>
<em>2019-4-23</em> ### date sticker
</td>
<td class="nums"><strong>6</strong> / <em>14398</em></td>
<td class="nums">7.29G / MP4
</td>
<td class="lastpost">
<em>2019-4-25 14:11</em>
<cite>by 22811</cite>
</td>
</tr>
</tbody>'''
soup=BeautifulSoup(data,'html.parser')
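# Find each "new" sticker image, then climb from <img> to <span> to the enclosing <th>.
for item in soup.find_all('img',src=re.compile('images/new')):
    parent=item.parent.parent
    # The first <td> sibling after the <th> is the author cell.
    author_cell=parent.find_next_siblings('td')[0]
    # The sample markup has no <a> in this cell, so the author is read from <cite>.
    print(author_cell.find('cite').get_text(strip=True))
    print(author_cell.find('em').text)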

XPath for a span based on its text?

I am unable to locate the first span with XPath. I tried:
//*[@id='student-grid']/div[2]/div[1]/table/tbody/tr[1]/td/span/span[contains(text(), 'Edit School')]
to select the span with the text Edit Student (the button):
<tbody role="rowgroup">
<tr class="" data-uid="2f3646c6-213a-4e91-99f9-0fbaa5f7755d" role="row" aria-selected="false">
<td class="select-row" role="gridcell">
<td class="font-md" role="gridcell">marker, Lion</td>
<td role="gridcell">TESTLINK_1_ArchScenario</td>
<td role="gridcell">1st</td>
<td role="gridcell">Not Started</td>
<td role="gridcell"/>
<td role="gridcell"/>
<td role="gridcell">QA Automation TestLink Folders</td>
<td class="k-cell-action" role="gridcell"/>
<td class="k-cell-action detail-view-link font-md" role="gridcell">
<span class="button-grid-action kendo-lexia-tooltip icon-pencil" role="button" title="Edit Student">
<span>Edit Student</span>
</span>
</td>
<td class="k-cell-action archive-link font-md" role="gridcell">
<span class="button-grid-action kendo-lexia-tooltip icon-archive" role="button" title="Archive Student">
<span>Archive Student</span>
</span>
</td>
</tr>
</tbody>

If you want to select the span with the text Edit Student, try either of these:
//span[@title='Edit Student']/span
//span[text()='Edit Student']
If you want to select the Edit Student span with role="button", try any of these:
//span[@title='Edit Student'][@role='button']
//span[@role='button'][./span[text()='Edit Student']]
//span[@role='button'][./span[.='Edit Student']]

You can simply use any of these XPaths:
//span[contains(text(),'Edit Student')]
//*[contains(text(),'Edit Student')]
//span[@class='button-grid-action kendo-lexia-tooltip icon-pencil']/span
//span[@title='Edit Student']/span
//span[contains(@title,'Edit Student')]/span
//span[contains(@class,'button-grid-action kendo-lexia-tooltip icon-pencil')]/span

To select the outer span, this XPath,
//span[@role='button' and normalize-space()='Edit School']
will select span elements with a button @role and a normalized string value of Edit School.
To select the inner span, this XPath,
//span[text()='Edit School']
will select span elements with an immediate text node child whose value is Edit School.
You can, of course, further qualify the heritage in either case as needed.
See also
Testing text() nodes vs string values in XPath

This should get the span text
//span[.='Edit Student']
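Two of the expressions from the answers above can be tried offline with Python's standard library. This is a rough sketch only: xml.etree.ElementTree implements a small subset of XPath (no contains() or normalize-space()), paths must start with ".//" instead of "//", and the markup below is a trimmed, hypothetical version of the table in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the grid row in the question.
markup = """
<tbody>
  <tr>
    <td class="k-cell-action detail-view-link font-md">
      <span class="icon-pencil" role="button" title="Edit Student">
        <span>Edit Student</span>
      </span>
    </td>
    <td class="k-cell-action archive-link font-md">
      <span class="icon-archive" role="button" title="Archive Student">
        <span>Archive Student</span>
      </span>
    </td>
  </tr>
</tbody>
"""
root = ET.fromstring(markup.strip())

# Inner span: child of the span whose title attribute is "Edit Student".
inner = root.findall(".//span[@title='Edit Student']/span")

# Outer span: two chained attribute predicates on the same element.
outer = root.findall(".//span[@role='button'][@title='Edit Student']")

print([s.text for s in inner])
print([s.get("class") for s in outer])
```

In a real browser session the full XPath 1.0 syntax is available, so the contains() and normalize-space() variants above need a complete XPath engine rather than ElementTree.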

How to find element in selenium

I want to find the element containing "Holiday4" in span class="fc-title". How can I do this? Should I use a CSS selector or XPath?
<tr>
<td class="fc-day-number fc-sun fc-future hp-cal-selected" data-date="2016-02-14">14</td>
<td class="fc-day-number fc-mon fc-future" data-date="2016-02-15">15</td>
<td class="fc-day-number fc-tue fc-future" data-date="2016-02-16">16</td>
<td class="fc-day-number fc-wed fc-future" data-date="2016-02-17">17</td>
<td class="fc-day-number fc-thu fc-future" data-date="2016-02-18">18</td>
<td class="fc-day-number fc-fri fc-future" data-date="2016-02-19">19</td>
<td class="fc-day-number fc-sat fc-future" data-date="2016-02-20">20</td>
</tr>
<tr>
<td class="fc-event-container">
<a class="fc-day-grid-event fc-h-event fc-event fc-start fc-end holiday" style="color:65280" title="Holiday">
<div class="fc-content">
<span class="fc-title">Holiday4</span>
</div>
</a>

Use a CSS selector. Standard CSS has no :contains pseudo-class for matching text, so select the span by its class instead:
.fc-content .fc-title
Also check this out
Need to find element in selenium by css
https://saucelabs.com/selenium/css-selectors

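If you are using Python:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('http://your_url.com')
# The text in the markup is "Holiday4", with no space.
element = driver.find_element(By.XPATH, '//span[@class="fc-title"][contains(text(),"Holiday4")]')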

Use the XPath below:
//span[@class='fc-title'][.='Holiday4']
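The predicate compares the text exactly, so "Holiday4" and "Holiday 4" are different strings. The sketch below checks this without a browser, using the standard library's ElementTree on a trimmed, hypothetical copy of the markup. ElementTree needs ".//" instead of "//", and its [.='text'] predicate requires Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the calendar row in the question.
markup = """
<tr>
  <td class="fc-event-container">
    <a class="fc-day-grid-event holiday" title="Holiday">
      <div class="fc-content">
        <span class="fc-title">Holiday4</span>
      </div>
    </a>
  </td>
</tr>
"""
root = ET.fromstring(markup.strip())

# Exact text as it appears in the markup.
exact = root.findall(".//span[@class='fc-title'][.='Holiday4']")

# Same expression with a space in the text: no element matches.
spaced = root.findall(".//span[@class='fc-title'][.='Holiday 4']")

print([s.text for s in exact])
print(len(spaced))
```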

To get the text "Holiday4", you can locate the element in either of the following ways.
Using a CSS selector:
string str = driver.FindElement(By.CssSelector("[class='fc-title']")).Text;
Using XPath:
string str = driver.FindElement(By.XPath("//tr/td/a/div/span")).Text;
