How do I replace entity references with character references?

I'm looking for a way in Ruby or Rails to replace entity references (&nbsp;) in a file with their character reference equivalents (&#160;).
&nbsp; is the main offender, but I'd like to do the replacement systematically rather than hand-coding a bunch of gsubs.

You can use the HtmlEntities gem:
gem install htmlentities
require 'htmlentities'
decoded = HTMLEntities.new.decode '&nbsp;Hello'
decoded[0].ord #=> 160
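If adding a gem is not an option, the re-encoding half can be sketched with core Ruby alone once the text holds real characters (for example after a decode step like the one above): each non-ASCII character is replaced by a decimal reference built from `String#ord`, the same value the `decoded[0].ord` line prints. This is a sketch under assumptions, not what HTMLEntities does: the method name `non_ascii_to_decimal` is hypothetical, and it deliberately leaves every ASCII character, including `&`, `<` and `>`, as it is.

```ruby
# Hypothetical helper: turn each non-ASCII character into a decimal
# character reference; ASCII (including & < > and quotes) is left alone.
def non_ascii_to_decimal(text)
  text.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

decoded = "\u00A0Hello"           # what decoding "&nbsp;Hello" yields
p decoded[0].ord                  # → 160
p non_ascii_to_decimal(decoded)   # → "&#160;Hello"
```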
As Stefan mentioned in the comments, if you want numeric character references in the output, decode the string and then re-encode it with the :decimal option:
require 'htmlentities'
text = '&nbsp;Hello'
coder = HTMLEntities.new
# Decode entity references to characters, then re-encode them as decimal references
final_text = coder.encode(coder.decode(text), :decimal)
p final_text #=> "&#160;Hello"

"Max Williams".html_safe => "Max Williams"
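This is functionality of Rails's Active Support. Note that html_safe only marks the string as safe so Rails outputs it without escaping; it does not convert entity references to character references by itself.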
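When only a handful of named entities occur (the question singles out &nbsp;), a small lookup table and one `gsub` with a block also work without the gem. A minimal sketch: `ENTITY_CODEPOINTS` and `named_to_decimal` are hypothetical names, the table lists only a few example entries (not the full HTML entity set), and unknown names such as `&amp;` are deliberately left unchanged.

```ruby
# Hypothetical, deliberately small table: entity name => Unicode code point.
ENTITY_CODEPOINTS = {
  "nbsp"   => 160,
  "copy"   => 169,
  "eacute" => 233
}.freeze

# Replace known named references (&name;) with decimal references (&#N;).
def named_to_decimal(html)
  html.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = ENTITY_CODEPOINTS[Regexp.last_match(1)]
    code ? "&##{code};" : match # unknown names stay as they were
  end
end

puts named_to_decimal("&nbsp;Hello Caf&eacute; &amp; bar")
# → &#160;Hello Caf&#233; &amp; bar
```

For a whole file, something like `File.write(path, named_to_decimal(File.read(path)))` would apply it in place, where `path` is whichever file you are converting. Extending the table is the only change needed to cover more entities.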

Related

Read a file in R with mixed character encodings

I'm trying to read tables into R from HTML pages that are mostly encoded in UTF-8 (and declare <meta charset="utf-8">) but have some strings in some other encodings (I think Windows-1252 or ISO 8859-1). Here's an example. I want everything decoded properly into an R data frame. XML::readHTMLTable takes an encoding argument but doesn't seem to allow one to try multiple encodings.
So, in R, how can I try several encodings for each line of the input file? In Python 3, I'd do something like:
with open('file', 'rb') as o:
    for line in o:
        try:
            line = line.decode('UTF-8')
        except UnicodeDecodeError:
            line = line.decode('Windows-1252')
There do seem to be R library functions for guessing character encodings, like stringi::stri_enc_detect, but when possible, it's probably better to use the simpler deterministic method of trying a fixed set of encodings in order. It looks like the best way to do this is to take advantage of the fact that when iconv fails to convert a string, it returns NA.
linewise.decode = function(path)
  sapply(readLines(path), USE.NAMES = FALSE, function(line) {
    # Lines that are already valid UTF-8 are returned unchanged
    if (validUTF8(line))
      return(line)
    # Otherwise try each fallback encoding; iconv returns NA on failure
    l2 = iconv(line, "Windows-1252", "UTF-8")
    if (!is.na(l2))
      return(l2)
    l2 = iconv(line, "Shift-JIS", "UTF-8")
    if (!is.na(l2))
      return(l2)
    stop("Encoding not detected")
  })
If you create a test file with
$ python3 -c 'with open("inptest", "wb") as o: o.write(b"This line is ASCII\n" + "This line is UTF-8: I like π\n".encode("UTF-8") + "This line is Windows-1252: Müller\n".encode("Windows-1252") + "This line is Shift-JIS: ハローワールド\n".encode("Shift-JIS"))'
then linewise.decode("inptest") indeed returns
[1] "This line is ASCII"
[2] "This line is UTF-8: I like π"
[3] "This line is Windows-1252: Müller"
[4] "This line is Shift-JIS: ハローワールド"
To use linewise.decode with XML::readHTMLTable, just say something like XML::readHTMLTable(linewise.decode("http://example.com")).
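For comparison with the main Ruby question on this page, the same try-in-order idea can be sketched in Ruby: keep a line if its bytes are valid UTF-8, otherwise attempt each fallback encoding and keep the first conversion that raises no error. `FALLBACK_ENCODINGS`, `try_decode` and `linewise_decode` are hypothetical names, and the fallback order simply mirrors the R answer. Order matters here as in R, because Windows-1252 defines almost every byte value and will accept most non-UTF-8 lines.

```ruby
# Fallback encodings tried, in order, after UTF-8 (mirrors the R answer).
FALLBACK_ENCODINGS = ["Windows-1252", "Shift_JIS"].freeze

# Return the line converted from +encoding+ to UTF-8, or nil if that fails.
def try_decode(line, encoding)
  line.dup.force_encoding(encoding).encode(Encoding::UTF_8)
rescue EncodingError
  nil
end

# Split raw bytes into lines and decode each line independently.
def linewise_decode(bytes)
  bytes.b.lines.map do |line|
    utf8 = line.dup.force_encoding(Encoding::UTF_8)
    next utf8 if utf8.valid_encoding?

    decoded = FALLBACK_ENCODINGS.lazy.map { |enc| try_decode(line, enc) }.find { |s| s }
    decoded || raise(ArgumentError, "Encoding not detected: #{line.inspect}")
  end
end

sample = "ASCII\n".b + "I like \u03C0\n".b + "M\u00FCller\n".encode("Windows-1252").b
p linewise_decode(sample) # → ["ASCII\n", "I like π\n", "Müller\n"]
```

The test file generated above could be read the same way with `linewise_decode(File.binread("inptest"))`.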

LuaLaTex using fontspec package and luacode reading JSON file

I've been using LaTeX for years, but I'm new to embedded luacode (with LuaLaTeX). Below you can see a simplified example:
\begin{filecontents*}{data.json}
[
{"firstName":"Max", "lastName":"Möller"},
{"firstName":"Anna", "lastName":"Smith"}
];
\end{filecontents*}
\documentclass[11pt]{article}
\usepackage{fontspec}
%\setmainfont{Carlito}
\usepackage{tikz}
\usepackage{luacode}
\begin{document}
\begin{luacode}
require("lualibs.lua")
local file = io.open('data.json','rb')
local jsonstring = file:read('*a')
file:close()
local jsondata = utilities.json.tolua(jsonstring)
tex.print('\\begin{tabular}{cc}')
for key, value in pairs(jsondata) do
tex.print(value["firstName"] .. ' & ' .. value["lastName"] .. '\\\\')
end
tex.print('\\hline\\end{tabular}')
\end{luacode}
\end{document}
When executing Lualatex following error occurs:
LuaTeX error [\directlua]:6: attempt to index field 'json' (a nil value) [\directlua]:6: in main chunk. \end{luacode}
When I comment out the line \usepackage{fontspec}, the output is produced. Alternatively, the error can be avoided by commenting out utilities.json.tolua(jsonstring) and all following Lua lines.
So the question is: how can I use both the fontspec package and JSON data without getting an error? I also have another question: how do I get German umlauts in the output of luacode (see the first "lastName" in the example: Möller)?
I'm using TeX Live 2015/Debian on Ubuntu 16.04.
Thank you,
Jerome

Getting JRuby-internal Java object from Ruby code

I'm wondering if I could get JRuby-internal Java objects (e.g. org.jruby.RubyString, org.jruby.RubyTime) in Ruby code, and call their Java methods from Ruby. Does anyone know how to do it?
str = "foobar"
rubystring_str = str.toSomethingConversion # <== What I want
# http://jruby.org/apidocs/org/jruby/RubyString.html#getEncoding()
rubystring_str.getEncoding() # Java::org.jcodings.Encoding
# http://jruby.org/apidocs/org/jruby/RubyString.html#getBytes()
rubystring_str.getBytes() # [Java::byte]
time = Time.now
rubytime_time = time.toSomethingConversion # <== What I want
# http://jruby.org/apidocs/org/jruby/RubyTime.html#getDateTime()
rubytime_time.getDateTime() # Java::org.joda.time.DateTime
I know I can do this with Java code as below, but I'd like to do it purely in Ruby.
public org.joda.time.DateTime getJodaDateTime(RubyTime rubytime) {
return rubytime.getDateTime();
}
I found the answer through trial and error.
The following works.
"foobar".to_java(Java::org.jruby.RubyString).getEncoding()
Time.now.to_java(Java::org.jruby.RubyTime).getDateTime()

Converting RSA keys to JSON in Perl

I need to find a way of transferring an RSA public key to a server for my network communication program. I have done some research, and it seems that the easiest way to do this is to convert the public key (which is stored as some kind of hash reference) to JSON for transmission. However, in my test code I cannot get the key to convert to JSON. Here is my test program:
use strict;
use warnings;
use Crypt::RSA;
use JSON;
my %hash = ( name => "bob",
age => 123,
hates=> "Perl"
);
my $hash_ref = \%hash;
my $hash_as_json = to_json($hash_ref);
print $hash_as_json, "\n"; # Works fine for a normal hash
my $rsa = new Crypt::RSA;
my ($public, $private) = $rsa->keygen (
Identity => 'client',
Size => 512,
Password => 'password',
Verbosity => 1,
) or die $rsa->errstr();
my $key_hash_as_json = to_json($public, {allow_blessed => 1, convert_blessed => 1});
print $key_hash_as_json, "\n";
Before I found the line {allow_blessed => 1, convert_blessed => 1} I got an error message saying
encountered object 'Crypt::RSA::Key::Public=HASH(0x3117128)', but
neither allow_blessed, convert_blessed nor allow_tags settings are
enabled (or TO_JSON/FREEZE method missing) at
/home/alex/perl5/lib/perl5/JSON.pm line 154.
What does this mean and why did that line fix it?
After adding that option, it just gives null when I try to print the JSON. Why is this happening, and how do I fix it?
Alternatively, is there a better way of doing what I am trying here?
The most common way of representing an RSA public key as text is the PEM encoding. Unfortunately, Crypt::RSA does not provide any way to convert to or from this format, or indeed any other standard format. Don't use it!
Instead, I'd recommend that you use Crypt::OpenSSL::RSA. Generating a private key and printing its public form with this module is simple:
use Crypt::OpenSSL::RSA;
my $key = Crypt::OpenSSL::RSA->generate_key(512);
print $key->get_public_key_string;
This will output a PEM encoding like the following:
-----BEGIN RSA PUBLIC KEY-----
MEgCQQDd/5F9Rc5vsNuKBrd4gfI4BDgre/sTBKu3yXpk+8NjByKpClsi3IQEGYeG
wmv/q/1ZjflFby1MPxMhXZo/82CbAgMBAAE=
-----END RSA PUBLIC KEY-----
Apart from the already mentioned PEM, there is also the JWK format (JSON Web Key). Have a look at Crypt::PK::RSA (my module), which supports generating, importing and exporting RSA keys in both PEM and JWK.
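To make the JWK format concrete: an RSA public key in JWK is a JSON object whose `kty` is `"RSA"` and whose `n` (modulus) and `e` (public exponent) are big-endian unsigned integers written as unpadded base64url (RFC 7518, section 6.3.1). The sketch below uses Ruby's standard `openssl` and `json` libraries rather than Perl and is not how Crypt::PK::RSA works internally; `base64url_uint` and `rsa_public_jwk` are hypothetical names, and a real JWK may carry further optional members such as `kid`.

```ruby
require "openssl"
require "json"

# Unpadded base64url of an OpenSSL::BN's big-endian bytes, as JWK requires.
def base64url_uint(bn)
  [bn.to_s(2)].pack("m0").tr("+/", "-_").delete("=")
end

# Hypothetical helper: the public-key JWK members (kty, n, e) as a Hash.
def rsa_public_jwk(key)
  { "kty" => "RSA", "n" => base64url_uint(key.n), "e" => base64url_uint(key.e) }
end

key = OpenSSL::PKey::RSA.generate(2048)
puts JSON.generate(rsa_public_jwk(key))
```

With the default public exponent 65537, the `e` member comes out as `"AQAB"`; only `n` changes from key to key.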

Rails html encoding

I am using the h helper method in Rails to encode/escape a string that contains an apostrophe ('). In my view I am using it like this:
<%=h "Mike's computer" %>
My understanding is that the HTML source should contain Mike%27s computer, but the generated HTML has a literal apostrophe in it: Mike's computer
Am I missing something obvious?
How do I get my desired result of Mike%27s computer?
Help is always appreciated.
An apostrophe is a valid character in HTML text. It is not encoded because it does not need to be.
If you want to encode a URL, use the u helper:
fer#:~/$ script/console
Loading development environment (Rails 2.3.8)
>> include ERB::Util
=> Object
>> h "Mike's computer"
=> "Mike's computer"
>> u "Mike's computer"
=> "Mike%27s%20computer"
>>
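The console session above is from Rails 2.3.8. The same two helpers exist in Ruby's standard library as `ERB::Util.html_escape` (aliased `h`) and `ERB::Util.url_encode` (aliased `u`), so the difference can be checked without Rails. Note that current Ruby versions do escape the apostrophe when HTML-escaping, as the character reference `&#39;`, so that line differs from the 2.3.8 transcript.

```ruby
require "erb"

name = "Mike's computer"

# URL (percent) encoding: the form the question expected to see.
puts ERB::Util.url_encode(name)  # → Mike%27s%20computer

# HTML escaping: recent Ruby versions turn ' into the reference &#39;.
puts ERB::Util.html_escape(name) # → Mike&#39;s computer
```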
If we look at the source code of the h method (it is an alias for html_escape), it is not hard to open the file and add the single quote (') to the HTML_ESCAPE constant.
Below is the source code of the method and its location in the file. Find the constant and add the quote to it; you can add other characters as well.
HTML_ESCAPE = { '&' => '&amp;', '>' => '&gt;', '<' => '&lt;', '"' => '&quot;' }
File actionpack/lib/action_view/template_handlers/erb.rb, line 17
def html_escape(s)
  s.to_s.gsub(/[&"><]/) { |special| HTML_ESCAPE[special] }
end
CAVEAT: This modification will affect all projects that use the library.
Alternatively, create a view helper method, for example in ApplicationHelper:
def h_with_quote(s)
  # A constant cannot be assigned inside a method body, so replace the quote directly
  h(s).gsub("'", "%27")
end
That approach should be safer.
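A further variant keeps the table-driven `gsub` technique of `html_escape` in your own code without patching Rails. This is a sketch under assumptions: `ESCAPE_WITH_QUOTE` and `escape_with_quote` are hypothetical names, and it maps the apostrophe to the HTML character reference `&#39;` rather than the URL-style `%27`, since `%27` has no special meaning inside HTML text.

```ruby
# Hypothetical local table: the Rails 2.3 HTML_ESCAPE entries plus the apostrophe.
ESCAPE_WITH_QUOTE = {
  "&" => "&amp;", ">" => "&gt;", "<" => "&lt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# Same technique as html_escape: one gsub, each match looked up in the table.
def escape_with_quote(s)
  s.to_s.gsub(/[&"'<>]/, ESCAPE_WITH_QUOTE)
end

puts escape_with_quote(%q{Mike's <b>"PC"</b> & co})
# → Mike&#39;s &lt;b&gt;&quot;PC&quot;&lt;/b&gt; &amp; co
```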