The goal is to modify only the text content of existing HTML.
For example, given the current markup:
<html lang="en" op="item">
<head>
<meta name="referrer" content="origin">
<title>The Scientific Case for Two Spaces After a Period (2018)</title>
</head>
<body>
<center>
<table class="fatitem" border="0">
<tr class='athing' id='25581282'>
<td class="title">
<a class="titlelink">The Scientific Case for Two Spaces After a Period (2018)</a>
</td>
</tr>
</table>
</center>
</body>
</html>
Suppose I want to append the string "™" to each word whose length is 6.
The expected result:
<html lang="en" op="item">
<head>
<meta name="referrer" content="origin">
<title>The Scientific Case for Two Spaces™ After a Period™ (2018)</title>
</head>
<body>
<center>
<table class="fatitem" border="0">
<tr class='athing' id='25581282'>
<td class="title">
<a class="titlelink">The Scientific Case for Two Spaces™ After a Period™ (2018)</a>
</td>
</tr>
</table>
</center>
</body>
</html>
I'm fairly new to Python and am having trouble with this. Because the content is nested, I'm struggling to access the elements properly and return the expected outcome.
This is what I have tried so far:
soup = BeautifulSoup(markup, 'html.parser')
new_html = []
for tags in soup.contents:
    for tag in tags:
        if type(tag) != str:
            split_tag = re.split(r"(\W+)", str(tag.string))
            for word in split_tag:
                if len(word) == 6 and word.isalpha():
                    word += "™"
            tag.string = "".join(split_tag)
        else:
            str_obj.append(tag)
        new_html.append(str(tag))
You can use .find_all(text=True) in combination with .replace_with():
import re
from bs4 import BeautifulSoup
html_doc = """
<html lang="en" op="item">
<head>
<meta name="referrer" content="origin">
<title>The Scientific Case for Two Spaces After a Period (2018)</title>
</head>
<body>
<center>
<table class="fatitem" border="0">
<tr class='athing' id='25581282'>
<td class="title">
<a class="titlelink">The Scientific Case for Two Spaces After a Period (2018)</a>
</td>
</tr>
</table>
</center>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for s in soup.find_all(text=True):
    # \b ... \b with {6} matches words of exactly six letters
    new_s = re.sub(r"\b([a-zA-Z]{6})\b", r"\1™", s)
    s.replace_with(new_s)
print(soup.prettify())
# to have HTML entities:
# print(soup.prettify(formatter="html"))
Prints:
<html lang="en" op="item">
 <head>
  <meta content="origin" name="referrer"/>
  <title>
   The Scientific Case for Two Spaces™ After a Period™ (2018)
  </title>
 </head>
 <body>
  <center>
   <table border="0" class="fatitem">
    <tr class="athing" id="25581282">
     <td class="title">
      <a class="titlelink">
       The Scientific Case for Two Spaces™ After a Period™ (2018)
      </a>
     </td>
    </tr>
   </table>
  </center>
 </body>
</html>
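Note that the question asks for words of exactly six letters. A quantifier such as {6,} would also match longer words like "Scientific", so the pattern needs a fixed count plus word boundaries. A minimal sketch of just the text transformation, using only the standard re module (the function name mark_six_letter_words is made up for illustration):

```python
import re

# Exactly six ASCII letters, bounded on both sides by a word boundary.
EXACT_SIX = re.compile(r"\b([A-Za-z]{6})\b")


def mark_six_letter_words(text, mark="™"):
    """Append `mark` to every word that is exactly six letters long."""
    # A callable replacement avoids any escaping issues inside `mark`.
    return EXACT_SIX.sub(lambda m: m.group(1) + mark, text)


if __name__ == "__main__":
    title = "The Scientific Case for Two Spaces After a Period (2018)"
    print(mark_six_letter_words(title))
```

The same function could be applied to each string returned by soup.find_all(text=True) before calling .replace_with(). Keep in mind that \b treats digits and underscores as word characters, so a token like "Period2" would not be marked.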
I have a Django admin action that takes the selected Transactions in the queryset, renders them in an HTML page, and converts that HTML to a PDF.
In the PDF, I want each transaction object to fit on its own A4 page, with no other transaction object on the same page. Here is the code:
def report_pdf(self, request, queryset):
    if request.user.is_superuser:
        transaction = queryset
    else:
        transaction = queryset.filter(user=request.user)
    template_path = "single-pdf.html"
    context = {"transactions": transaction}
    template = get_template(template_path)
    html = template.render(context)
    file = open('test.pdf', "w+b")
    pisaStatus = pisa.CreatePDF(html.encode('utf-8'), dest=file,
                                encoding='utf-8')
    file.seek(0)
    pdf = file.read()
    file.close()
    return HttpResponse(pdf, 'application/pdf')
And here is my HTML:
{% load staticfiles %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Report</title>
<style>
</style>
</head>
{% for transaction in transactions %}
<body>
<div class="container">
{% if transaction.complete %}
<table class="tg" >
<thead>
<tr>
<th class="tg-4rlv">Report for Tenant</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-c6of">CHP Reference</td>
<td class="tg-c6of">{{transaction.chp_reference}}</td>
</tr>
<tr>
<td class="tg-c6of"> Rent Effective From(dd/mm/yyyy)</td>
<td class="tg-c6of">{{transaction.rent_effective_date}}</td>
</tr>
<tr>
<td class="tg-c6of"> CRA Fortnightly Rates valid for 6 months from</td>
<td class="tg-c6of">{{transaction.cra_rate_from}}</td>
</tr>
<tr>
<td class="tg-l8qj">Market Rent of the property :</td>
<td class="tg-c6of">{{transaction.property_market_rent}}</td>
</tr>
<tr>
<td class="tg-l8qj" >Number of Family Group(s) :</td>
<td class="tg-c6of">{{transaction.number_of_groups}}</td>
</tr>
</tbody>
</table>
{% endif %}
</div>
</body>
{% endfor %}
</html>
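One possible approach, assuming the pisa import here is xhtml2pdf: that library documents a <pdf:nextpage /> tag that forces a page break, so each transaction's markup could be separated by it. The sketch below only builds the HTML string; the helper names join_as_pages and build_document are hypothetical and not part of Django or xhtml2pdf.

```python
# Hypothetical helpers for illustration; not part of Django or xhtml2pdf.
PAGE_BREAK = "<pdf:nextpage />"  # page-break tag, assuming xhtml2pdf is the renderer


def join_as_pages(fragments):
    """Join per-transaction HTML fragments with a page break between them."""
    return PAGE_BREAK.join(fragments)


def build_document(fragments, title="Report"):
    """Wrap all fragments in a single <html>/<body> pair."""
    body = join_as_pages(fragments)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body>{body}</body></html>"
    )


if __name__ == "__main__":
    pages = ["<div>Transaction 1</div>", "<div>Transaction 2</div>"]
    print(build_document(pages))
```

In the template itself, the same effect would presumably come from emitting {% if not forloop.last %}<pdf:nextpage />{% endif %} at the end of each loop iteration, with the {% for %} loop placed inside a single <body> rather than around it.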
I am trying to pass parameters from my input text control to the called XQuery by adding them as parameters in the URL. I have tried many ways, always without success. Could you have a look and tell me what I am doing wrong?
xquery version "3.0";
declare option exist:serialize "method=xhtml media-type=application/xhtml+xml indent=yes";
import module namespace xmldb="http://exist-db.org/xquery/xmldb";
declare variable $collection as xs:string := '/db/junitReports';
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:bf="http://betterform.sourceforge.net/xforms" xmlns:xf="http://www.w3.org/2002/xforms" bf:toaster-position="tl-down">
<head>
<title>IB interfaces regression testing report</title>
<meta name="author" content="test"/>
<meta name="author" content="test"/>
<meta name="description" content="IB interfaces regression testing report"/>
<link rel="stylesheet" type="text/css" href="styles/demo.css"/>
<!-- INPUT CONTROLS -->
<xf:model>
<xf:instance id="default">
<data xmlns="">
<InterfaceName constraint="true" readonly="false" required="false" relevant="true">
</InterfaceName>
<trigger1 constraint="true" readonly="false" required="false" relevant="true">
</trigger1>
</data>
</xf:instance>
<xf:instance id="table" xmlns="">
<data>
</data>
</xf:instance>
<xf:bind nodeset="InterfaceName" type="string">
</xf:bind>
<xf:submission id="showTable"
method="post"
action="{concat('/exist/rest/db/xquery/returnTable.xq?interface=',InterfaceName)}"
replace="instance"
ref="instance('table')"
instance="table">
</xf:submission>
</xf:model>
</head>
<body class="soria" style="margin:30px;">
<div class="Headline">IB test report</div>
<div class="description">
<p>You can restrict report output:</p>
</div>
<p>2. By typing in particular interface name</p>
<div class="Interface">
<xf:input id="InterfaceName" ref="InterfaceName" incremental="false">
<xf:label></xf:label>
<xf:hint>(S|R)xxxxxYYYZZZ</xf:hint>
<xf:help>Enter interface name</xf:help>
<xf:alert>Enter interface name</xf:alert>
</xf:input>
</div>
<br/>
<div>
<xf:trigger id="trigger1" ref="trigger1" incremental="true">
<xf:label>Filter output</xf:label>
<xf:hint>a Hint for this control</xf:hint>
<xf:help>help for trigger1</xf:help>
<xf:action ev:event="DOMActivate">
<xf:send submission="showTable"/>
</xf:action>
</xf:trigger>
</div>
<div>
<table border="1">
<thead>
<tr>
<th>Inteface Name</th>
<th>Test Date</th>
<th>Test Result</th>
<th>Report Link</th>
</tr>
</thead>
<tbody xf:repeat-nodeset="instance('table')//result">
<tr>
<td>
<xf:output ref="interfaceName"></xf:output>
</td>
<td>
<xf:output ref="reportDate"></xf:output>
</td>
<td>
<xf:output ref="testResult"></xf:output>
</td>
<td>
<li>
<xf:output ref="fileLink"></xf:output>
</li>
</td>
</tr>
</tbody>
</table>
</div>
<label>{InterfaceName}</label>
</body>
</html>
My URI ends up without the parameter value:
"resource-uri":"http://localhost:8080/exist/rest/db/xquery/returnTable.xq?interface="
The right way to do it was to replace the action attribute with this element inside the submission:
<xf:resource value="concat('/exist/rest/db/xquery/returnTable.xq?interface=',instance('defaultInstance')//InterfaceName,'&amp;','date=',instance('defaultInstance')//CalendarDate)"/>
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Achievements</title>
</head>
<body>
<h2>Certificates and Awards</h2>
<hr>
<ul>
<h3><li>Certificates</li></h3>
<table border="1">
<tr>
<th>Course</th>
<th>Professor</th>
<th>University</th>
</tr>
<tr>
<td>Programming for Everybody(Getting Started with Python)</td>
<td>Charles Severance</td>
<td>University of Michigan</td>
</tr>
</table>
<h3><li>Awards</li></h3>
<table border="1">
<tr>
<th>Contest</th>
<th>Award date</th>
</tr>
<tr>
<td>The 5th MIRROR Essay Contest</td>
<td>11/23/15</td>
</tr>
</table>
</ul>
<hr>
Back to main
</body>
</html>
I'm using Eclipse JEE Mars, and I get warnings saying "Multiple annotations found at this line: Invalid location of tag (h3), Invalid location of tag (h3)" and "Invalid location of (table)". When I open the file in Internet Explorer it works fine. What's the problem, and how should I fix it?
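The warnings most likely come from the HTML content model: a <ul> may only have <li> elements (plus script-supporting elements) as direct children, so <h3> and <table> placed directly inside <ul> get flagged, while browsers quietly tolerate the markup. A rough way to reproduce that check with Python's standard html.parser follows. The names ListChildChecker and find_list_problems are made up for this sketch, and it only inspects direct children of <ul> and <ol>, so it is far from a full validator.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
ALLOWED_IN_LIST = {"li", "script", "template"}


class ListChildChecker(HTMLParser):
    """Record every element that sits directly inside <ul>/<ol> but is not allowed there."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.problems = []   # (offending tag, parent tag) pairs

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent in ("ul", "ol") and tag not in ALLOWED_IN_LIST:
            self.problems.append((tag, parent))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_list_problems(markup):
    checker = ListChildChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    bad = "<ul><h3><li>A</li></h3><table><tr><td>x</td></tr></table></ul>"
    print(find_list_problems(bad))
```

Moving the heading and the table inside the list item, as in <li><h3>...</h3><table>...</table></li>, gives no findings with this sketch and should also satisfy the Eclipse validator.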
I'm new to Thymeleaf and am trying to make a simple table using an array and an each loop.
My code looks like this:
<!DOCTYPE HTML>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<title>Smoke Tests</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body>
<table border="1" style="width:300px">
<tr>
<td>Test Name</td>
</tr>
<tr th:each="smokeTest : ${smokeTests}">
<td>
th:text="${smokeTest.name}">A Smoke Test'
</td>
</tr>
</table>
</body>
</html>
Basically my problem is that I can't run the loop as <td>s within <tr>s. Is there any way that this code could work?
You must put th:text as an attribute of a tag, so
<tr th:each="smokeTest : ${smokeTests}">
<td th:text="${smokeTest.name}">A Smoke Test'</td>
</tr>
should run.
Simple solution which comes to mind first:
<th:block th:each="smokeTest : ${smokeTests}">
<tr>
<td th:text="${smokeTest.name}">A Smoke Test'</td>
</tr>
</th:block>
Details: http://www.thymeleaf.org/whatsnew21.html#bloc
Although this is a late answer, it also works with a more specific nested element, like:
<tr th:each="smokeTest : ${smokeTests}">
<td><p th:text="${smokeTest.name}"></p></td>
</tr>
I'm reading a local HTML document with Nokogiri like so:
f = File.open(local_xml)
@doc = Nokogiri::XML(f)
f.close
@doc contains a Nokogiri XML object that I can parse using at_css.
I want to modify it using Nokogiri's XML::Node, and I'm absolutely stuck. How do I take this Nokogiri XML document and work with it using node methods?
For example:
@doc.at_css('rates tr').add_next_sibling(element)
returns:
undefined method `add_next_sibling' for nil:NilClass (NoMethodError)
despite the fact that @doc.class is Nokogiri::XML::Document.
For completeness, here is the markup I'm trying to edit.
<html>
<head>
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
<tr>
<td class="up"><div></div></td>
<td class="date">Friday, Jan 11</td>
<td class="rate up">3.70</td>
</tr>
<tr>
<td class="down"><div></div></td>
<td class="date">Thursday, Jan 10</td>
<td class="rate down">3.68</td>
</tr>
<tr>
<td class="down"><div></div></td>
<td class="date">Wedensday, Jan 9</td>
<td class="rate down">3.70</td>
</tr>
<tr>
<td class="up"><div></div></td>
<td class="date">Tuesday, Jan 8</td>
<td class="rate up">3.66</td>
</tr>
</table>
</body>
</html>
Here is an example of how to do what you are trying to do, starting with f containing a shortened version of the HTML you want to parse:
require 'nokogiri'
f = '
<html>
<head>
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
</table>
</body>
</html>
'
doc = Nokogiri::HTML(f)
doc.at('.rates tr').add_next_sibling('<p>foobar</p>')
puts doc.to_html
Your selector 'rates tr' looks for a <rates> element rather than a <table> with class="rates". In CSS, a class is selected with .rates. An alternative way to write it in CSS is table[class="rates"].
Your example didn't define the node you were trying to add to the HTML, so I appended <p>foobar</p>. Nokogiri will let you build a node from scratch and append it, or use markup and add that, or you could find a node from one place in the HTML, remove it, and then insert it somewhere else.
That code outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Exchange Rates</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<table class="rates">
<tr>
<td class="up"><div></div></td>
<td class="date">Saturday, Jan 12</td>
<td class="rate up">3.83</td>
</tr>
<p>foobar</p>
</table>
</body>
</html>
It's not necessary to use at_css or at_xpath instead of at. Nokogiri senses what type of accessor you're using and handles it. The same applies using xpath or css instead of search. Also, at is equivalent to search('some accessor').first, so it finds the first occurrence of the matching node.
Try loading it as HTML instead of XML: Nokogiri::HTML(f)
Without going into much detail on how Nokogiri works: the file is HTML, so the HTML parser is the better fit. (at_css is also available on XML documents, so the parser alone may not be the cause.)
Update
I just noticed one thing: you want at_css('.rates tr') instead of at_css('rates tr'), because that is how you select a class in CSS. With that change it may work with XML too.