Convert from HTML::Template to PDF::FromHTML says invalid XML

I create an HTML file using HTML::Template. The resulting code is valid XML/HTML (checked against an XML validator). But when converting it to PDF with PDF::FromHTML, I get an "invalid token in xml file" message.
I tried changing the first declaration line from doctype to xml, and also removing it, but nothing works. XML::Simple, PDF::API2 and XML::Writer are all at their latest versions.
Any idea what is happening?
# create template object and store to verify
shout('s',"create template from $str_filepath") if ($bool_DEBUG);
$str_mytemplate = HTML::Template->new(filename => $str_filepath, case_sensitive => 0, no_includes => 1 );
$str_mytemplate->param(\%strct_toreplace);
$str_filepath = envDir('temp').newID().'.html';
shout('',"template created, storing to : $str_filepath") if ($bool_DEBUG);
if (open(FILE, '>', $str_filepath)) {
    print FILE $str_mytemplate->output;
    close (FILE);
}
# generate pdf from created file
shout('p',"Creating PDF ") if ($bool_DEBUG);
$pdf_this = PDF::FromHTML->new( encoding => 'utf-8' );
$pdf_this->load_file($str_filepath);
$pdf_this->convert( LineHeight => 10, Landscape => 1, PageSize => 'Letter', );
shout('p',"Display PDF") if ($bool_DEBUG);
print header(-type=>'application/pdf', -charset=>'UTF-8');
print $pdf_this->write_file();
$bool_DEBUG and shout() are a variable and a procedure used to set and display messages in debugging mode.
Html code generated via template: http://www.etoxica.com/examplecode.html
Template used: http://www.etoxica.com/exampletemplate.tmpl
Message displayed:
SECTION: Creating PDF
Software error:
not well-formed (invalid token) at line 19, column 13, byte 430 at /usr/local/lib64/perl5/XML/Parser.pm line 187.
at /home/grupo/perl/usr/share/perl5/PDF/FromHTML.pm line 141.

Summary: Found the problem (I guess) ;)
Consider the following lines:
<td>
Some line of data
<br/>
A second line of data
</td>
When PDF::FromHTML reads this, it reports a malformed token on the 5th line, specifically at the slash '/' of the </td> tag. But that is not the real problem: the error is caused by the <br/> tag inside the <td></td>.
If it is changed to <br> or <br /> no error is reported. I don't know whether using <br> is good HTML practice for XML compatibility, even though it is defined that way in the W3C br semantics.

How do I pass pandoc_options as output_options to rmarkdown::render()

I have an Rmd file that renders into HTML correctly almost all of the time. However, it does not render correctly when pandoc (used in the rendering process) finds 4 spaces in the HTML: at that point it assumes I want a markdown code block rather than HTML.
I have been told that I can turn off the markdown_in_html_blocks feature by doing something like this:
pandoc -f markdown-markdown_in_html_blocks
I have tried calling pandoc directly rather than it being called implicitly by
rmarkdown::render()
but couldn't get that syntax to work, and I would prefer to specify this option (-markdown_in_html_blocks) directly when I call render(). Here is the latest of what I have tried, without success:
Base case: works but HTML output file is malformed / has a code block instead of the data that I want to display in the table.
render("reports/Pacing.Rmd")
Attempted fix: not working
rmdFmt <- rmarkdown_format("-markdown_in_html_blocks")
pandocOpts <- pandoc_options(to = "html", from = rmdFmt)
render("reports/Pacing.Rmd",output_format = "html_document",output_file = NULL, output_dir = NULL, output_options = pandocOpts)
Error message: Error in (function (toc = FALSE, toc_depth = 3, toc_float = FALSE, number_sections = FALSE, :
argument 1 matches multiple formal arguments
I have tried other syntax to express that I want to turn off markdown_in_html_blocks but no luck.
Given the following document test.Rmd...
---
title: Test
output: html_document
---
<table>
<tr>
<td>*one*</td>
<td>[a link](https://google.com)</td>
</tr>
</table>
...you can disable the markdown_in_html_blocks extension via
rmarkdown::render("test.Rmd",
output_options = list(md_extensions = "-markdown_in_html_blocks"))
md_extensions is one of the arguments that can be passed to rmarkdown::html_document (see ?rmarkdown::html_document for other arguments).
That seems to be an open issue, but a simpler way to turn off/on such a feature is to directly update the YAML in Rmd file. This should work in your case:
output:
  html_document:
    pandoc_args: [
      "-f", "markdown-markdown_in_html_blocks"
    ]

Keep Special Characters when parsing JSON response

I have retrieved this key/value from a hash using the Facebook API:
"message":"Next Practice:\n\nDate: 04.05.2014\nTime: 10:00-12:00\nVenue: Llandaff Fields\n\nAll welcome
but when I save it to my model I seem to lose all the special characters, i.e. \n. Is there a way to save the value as it is returned, so that I can use the \n when outputting to my view using .html_safe?
This is how i am retrieving the data
def get_feed
  fb_access_token = access_token
  uri = URI(URI.escape "https://graph.facebook.com/#{VANDALS_ID}/posts/?#{fb_access_token}")
  response = HTTParty.get(uri)
  results = JSON.parse(response.body)
  formatted_data(results)
end
Is there anything I need to do to keep the \n in that string?
Thanks
When I run the following code:
raw_json = '{"message":"Next Practice:\n\nDate: 04.05.2014\nTime: 10:00-12:00\nVenue: Llandaff Fields\n\nAll welcome"}'
parsed_json = JSON.parse(raw_json)
puts parsed_json['message']
# => Next Practice:
# =>
# => Date: 04.05.2014
# => Time: 10:00-12:00
# => Venue: Llandaff Fields
# =>
# => All welcome
So the \n is kept (it is parsed, and shown as real new-line). I also don't believe that saving this to your model erased the new lines.
Where I think your real problem lies is that in HTML new lines (\n) are not rendered as new lines at all, but as spaces. To render them as new lines, you need to replace them with breaks (<br>).
So you can try using the following on your ERB:
<div class=message><%= feed.message.gsub("\n", "<br>").html_safe %></div>
Your new-lines will now be rendered on the page.
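One caveat with the snippet above: calling html_safe on text that came from an external API also marks any markup inside that text as trusted. A minimal sketch, in plain Ruby using only the standard library, of escaping the message first and inserting the breaks afterwards. The helper name message_to_html and the sample message are hypothetical; in a Rails view the built-in simple_format helper is a common way to get a similar effect.

```ruby
require 'cgi'
require 'json'

# Hypothetical helper: escape HTML special characters first,
# then turn each newline into a <br> tag.
def message_to_html(message)
  CGI.escapeHTML(message).gsub("\n", "<br>")
end

# Made-up sample payload; the <b> tag stands in for untrusted markup.
raw_json = '{"message":"Next Practice:\n\nDate: 04.05.2014\n<b>All welcome</b>"}'
message = JSON.parse(raw_json)['message']

html = message_to_html(message)
puts html
# => Next Practice:<br><br>Date: 04.05.2014<br>&lt;b&gt;All welcome&lt;/b&gt;
```

Only the <br> tags added by the helper are real markup in the result, so marking this string as html_safe exposes nothing that came from the feed itself.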

How to use Erlang gen_smtp to send HTML-formatted email?

gen_smtp can be found here
What I want is for the content of the email to support HTML tags, so that <strong>Hello</strong>
is displayed as a bold Hello.
Look at https://github.com/selectel/pat. It's an easy to use SMTP client and you can use any text, including html tags as body of the message.
See the gen_smtp mimemail tests for an example of multipart/alternative messages:
Email = {<<"text">>, <<"html">>, [
    {<<"From">>, <<"me@example.com">>},
    {<<"To">>, <<"you@example.com">>},
    {<<"Subject">>, <<"This is a test">>}],
  #{content_type_params => [
      {<<"charset">>, <<"US-ASCII">>}],
    disposition => <<"inline">>
  },
  <<"This is a <strong>HTML</strong> message with some non-ascii characters øÿ\r\nso there">>},
Encoded = mimemail:encode(Email)
The answer given by @Ward Bekker is fundamentally correct, but it took me a while to make it work, because mimemail:encode/1 expects a proplist, not the map that the example shows.
I used Erlang/OTP 23 [erts-11.0.3] and it failed with:
** exception error: no function clause matching proplists:get_value(<<"content-type-params">>, #{disposition => <<"inline">>,<<"content-type-params">> => [{<<"charset">>,<<"US-ASCII">>}]},[]) (proplists.erl, line 215)
in function mimemail:ensure_content_headers/7 (/Users/sean/Documents/code/erlang/scofblog/_build/default/lib/gen_smtp/src/mimemail.erl, line 661)
The following is the modified code and the encoded output:
Email = {
  <<"text">>,
  <<"html">>,
  [
    {<<"From">>, <<"me@example.com">>},
    {<<"To">>, <<"you@example.com">>},
    {<<"Subject">>, <<"This is a test">>}
  ],
  [{<<"content-type-params">>, [{<<"charset">>, <<"US-ASCII">>}]},
   {<<"disposition">>, <<"inline">>}
  ],
  <<"This is a <strong>HTML</strong> øÿ\r\nso there">>
}.
62> mimemail:encode(Email).
<<"From: me@example.com\r\nTo: you@example.com\r\nSubject: This is a test\r\nContent-Type: text/html;\r\n\tcharset=US-ASCII\r\nCon"...>>
Hope that saves some head scratching.

jtidy fails to parse html - options

So I was trying to evaluate a couple of the HTML parsers and gave JTidy a try. Trying to parse this URL:
http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner/TagNode.html
Gives these errors:
line 1 column 56,258 - Error: missing '>' for end of tag
line 1 column 56,258 - Error: is not recognized!
It says line one as it reads it in as one line, but this is the line that JTidy pukes/fails on:
<li>//div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a</li>
My code is pretty simple:
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;
Tidy tidy = new Tidy();
Document document = tidy.parseDOM(new ByteArrayInputStream(this.getHtml().getBytes()), null);
NodeList anchorTags = document.getElementsByTagName("A");
Is this just a bug in JTidy or am I doing something wrong? I've evaluated about 6 others so far and none of them have had a problem on this page.

How could/should I state colon (punctuation) in a YAML file?

I am using Ruby on Rails 3.1.0 and I would like to know how to correctly write a colon (punctuation) in a YAML file. I tried adding the following to my config/locales/defaults/en.yml file
en:
  # '&#58;' is the HTML code for ':'
  test_key_html: Test value&#58;
and in my view file I used
t('test_key_html')
but it doesn't work (the front end displays the "plain" Test value&#58; text).
Is it possible? If so, how?
You should be able to double quote the value:
test_key_html: "Test value:"
This avoids colon-confusion in the YAML and gets your colon into your HTML.
Consider this in irb:
>> { 'en' => { 'test_key_html' => 'Test value:' } }.to_yaml
=> "--- \nen: \n  test_key_html: \"Test value:\"\n"
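The same point can be checked from the reading side. A minimal sketch, assuming two made-up locale snippets (neither comes from a real application): one with the double-quoted value suggested above, and one that uses the HTML entity &#58; in an unquoted value. Both load as ordinary strings with Ruby's standard YAML library.

```ruby
require 'yaml'

# Hypothetical locale snippet with the value in double quotes.
quoted_source = <<~LOCALE
  en:
    test_key_html: "Test value:"
LOCALE

# Hypothetical locale snippet using the entity instead of a literal colon.
entity_source = <<~LOCALE
  en:
    test_key_html: Test value&#58;
LOCALE

quoted = YAML.safe_load(quoted_source)
entity = YAML.safe_load(entity_source)

puts quoted['en']['test_key_html']   # => Test value:
puts entity['en']['test_key_html']   # => Test value&#58;
```

The quoted form stores a real colon, so nothing has to be decoded later in the view. The entity form stores the literal characters &#58;, which only become a colon if the browser receives them unescaped.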
Try
raw(t('test_key_html'))
Rails 3+ automatically escapes HTML markup.
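To see what that escaping does to an entity, here is a rough sketch using ERB::Util.html_escape from Ruby's standard library as a stand-in for the escaping Rails applies in views (Rails ships its own variant, so treat this as an approximation). The translation string is an assumed example value.

```ruby
require 'erb'

# Assumed translation value containing an HTML entity for ':'.
translation = 'Test value&#58;'

# Escaping turns the leading '&' into '&amp;', so the entity is no longer
# an entity by the time it reaches the browser.
escaped = ERB::Util.html_escape(translation)

puts escaped   # => Test value&amp;#58;
```

A browser renders &amp;#58; as the literal text &#58;, which matches the "plain" output described in the question. Wrapping the call in raw skips this escaping step.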