How to escape quotes passed from a JSON file to a Jade template?

I have some variables stored in a JSON file which will be injected into my generated HTML later. Those variables would be put in places like:
var str = '#{content.str}';
In the JSON file, content.str may contain a ' character, which causes a JavaScript error after the HTML is rendered.
What should I do to prevent this from happening?
Thanks.

It's simple -
"I've done it".replace("'", "\\'")
//output "I\'ve done it"
Using a regex to replace all occurrences:
"I've done it haven't you".replace(/'/g, "\\'")
//output "I\'ve done it haven\'t you"

Related

Windows (command line) tool to convert non-ascii characters to HTML numeric codes [duplicate]

I need to replicate the exact function this website http://www.unicodetools.com/unicode/convert-to-html.php performs, in a hybrid JavaScript/Windows batch script. I have zero knowledge of JavaScript, but it seems to be the easiest possible way (for those who know it) to replace special non-ASCII characters with their HTML entity equivalents within text files: "têxt" becomes "t&#234;xt", for example, but using input and output text files instead of web forms. I've seen the wonders JREPL.bat (a regex find-and-replace tool) can do, so I thought this could be achieved.
Pardon me for asking this question, but it is part of a problem I could not wrap my head around for days. It is in regard to this unanswered question: https://stackoverflow.com/questions/35121949/curl-data-urlencode-posts-broken-non-english-characters. I figured out that the Japanese and other UTF-8 characters in the text file can be passed through a curl POST request without being garbled by first encoding them as HTML entities before the --data-urlencode step.
That said, I am asking if someone would be so kind as to create a simple JScript/Windows batch hybrid script, incorporating the JavaScript code the above-mentioned website uses, that encodes only non-ASCII characters to HTML entities within a text file and that I can call from another batch file using a line of code like this:
CALL EncodetoHTML.bat -i "input.txt" -o "output.txt"
Here it is. Brand new and fresh.
You can pass only the file you want to encode (the result will be printed to the console), or pass both input and output files. Examples:
call toHtmlEnt.bat input.txt output.txt
call toHtmlEnt.bat input.txt
I wrote my own script. It took me a whole day basically scouring the Internet for useful pieces of code I could find and combining them to achieve the effect I wanted.
Save the code below to tohtmlent.bat. Use it from CMD like tohtmlent.bat filename.txt, or call it from another batch file like call tohtmlent.bat filename.txt, where "filename.txt" is the input file. Output is displayed in the console, so use > if you would like to redirect it to a file. The input file must be encoded in UTF-8; output is ANSI. The script converts every Unicode character with a code point of 128 or higher to its numeric HTML entity equivalent.
The code is nowhere near elegant considering I am not a programmer and it still has a lot more room for improvement. But hey, it does its job!
@if (@X)==(@Y) @end /*
@echo off
cscript //E:JScript //nologo "%~f0" %*
exit /b 0
*/
if (WScript.Arguments.Length < 1) {
    WScript.Echo("No file specified.");
    WScript.Quit(0);
}

var fso = new ActiveXObject("Scripting.FileSystemObject");
var inputFile = WScript.Arguments.Item(0);
if (!fso.FileExists(inputFile)) {
    WScript.Echo(inputFile + " does not exist.");
    WScript.Quit(1);
}

// Read the input file as UTF-8 text via an ADO stream
var objAdoS = WScript.CreateObject("ADODB.Stream");
objAdoS.Type = 2;          // text stream
objAdoS.CharSet = "utf-8";
objAdoS.Open();
objAdoS.LoadFromFile(inputFile);
var strInput = objAdoS.ReadText();
objAdoS.Close();

// Replace every character above code point 127 with its decimal HTML entity
var strOutput = '';
for (var i = 0; i < strInput.length; i++) {
    if (strInput.charCodeAt(i) > 127) {
        strOutput += '&#' + strInput.charCodeAt(i) + ';';
    } else {
        strOutput += strInput.charAt(i);
    }
}
WScript.Echo(strOutput);
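For comparison, the same conversion is only a few lines of Python. This is just a hedged illustration of the technique, not part of the original batch answer: it reads the file as UTF-8 and replaces every character above code point 127 with its decimal HTML entity.
import sys

def to_html_entities(text):
    # keep plain ASCII, turn everything else into &#NNN; numeric entities
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)

if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(to_html_entities(f.read()))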

Replacing a string in HTML file in Python

I'm trying to replace a string stored in a list with an HTML tag in a file by doing:
links=[http://hexagon-dashboard-gbc-01/vboard/latest?regs=3281546<!--V68NUR-->]
str1="""%s<!--V68NUR-->"""%(vboard['V68N']['perf.tl'],vboard['V68N']['perf.tl'])
with open(html_file, 'r+') as f:
    content = f.read()
    f.seek(0)
    f.truncate()
    f.write(content.replace(links[0], str1))
But I get the following error:
TypeError: replace() argument 1 must be str, not Tag.
What am I missing, and what modification do I need to make?
Updated:
From what you posted, I suppose you are treating the HTML file as plain text and performing a string replacement on it.
The replace() method only works when both of its arguments are strings.
The reason you got an error is that links[0] is not a string but a Tag object.
If you manage to get links like this (note the single quotes)
links=['http://hexagon-dashboard-gbc-01/vboard/latest?regs=3281546<!--V68NUR-->']
then
content.replace(links[0],str1)
would not produce any errors.
To edit HTML files, you can also use an HTML parser instead, as in the sketch below.
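A minimal sketch with BeautifulSoup (assuming the bs4 package is installed and the link sits in an <a> tag; the URL and replacement text here are only illustrative):
from bs4 import BeautifulSoup

with open(html_file, "r+") as f:
    soup = BeautifulSoup(f.read(), "html.parser")
    # find the anchor whose href matches the old link and swap in the new content
    tag = soup.find("a", href="http://hexagon-dashboard-gbc-01/vboard/latest?regs=3281546")
    if tag is not None:
        tag.string = "new link text"  # or tag["href"] = "new url"
    f.seek(0)
    f.truncate()
    f.write(str(soup))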

I need a good regex for HTML file parsing in ruby

Here is a Ruby question, guys. I need to parse through an HTML file and catch URLs and emails, but I can't come up with a proper regex. I've tried 100+ regexes, and every time I catch something other than the URL.
File.open("/Desktop/file.html").each_line do |line|
if line.split("href=\"") =~ /???/
puts line
end
end
# I can use line.split("href=\"") so each new line will start with url =>
(https://www.facebook.com/students">
The question is: what regex can I use to catch everything from https to the end of the URL, which ends with a double quote (")? (There could be one or more occurrences of the same URL, so something like {1,2} is needed.)
Try this
file = File.open('filename_path')
links = file.read().scan(/href=\"(?<url>.*?)\"/)
You get the links in an array.
It also works if you remove ?<url> from the pattern above (it's just a named capture group).
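The same idea in Python, for illustration only (a hedged equivalent of the Ruby scan above, run against a made-up snippet of HTML):
import re

html = '<a href="https://www.facebook.com/students">Students</a>'
# non-greedy capture of everything between href=" and the closing quote
links = re.findall(r'href="(.*?)"', html)
print(links)  # ['https://www.facebook.com/students']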

What is the proper method for reading and writing HTML/XML (byte string) with Python and lxml and etree?

EDIT: Now that the problem is solved, I realize that it had more to do with properly reading/writing byte-strings, rather than HTML. Hopefully, that will make it easier for someone else to find this answer.
I have an HTML file that's poorly formatted. I want to use a Python lib to just make it tidy.
It seems like it should be as simple as the following:
import sys
from lxml import etree, html
#read the unformatted HTML
with open('C:/Users/mhurley/Portable_Python/notebooks/View_Custom_Report.html', 'r', encoding='utf-8') as file:
    #read the file contents into a single string
    file_text = ''.join(file.readlines())

#format the HTML
document_root = html.fromstring(file_text)
document = etree.tostring(document_root, pretty_print=True)

#write the nice, pretty, formatted HTML
with open('C:/Users/mhurley/Portable_Python/notebooks/Pretty.html', 'w') as file:
    #write the pretty XML to a file
    file.write(document)
But this chunk of code complains that file_lines is not a string or bytes-like object. Okay, it makes sense that the function can't take a list, I suppose.
But then it's 'bytes', not a string. No problem: str(document).
But then I get HTML that's full of '\n' sequences that are not newlines... they're a literal backslash followed by the letter n. And there are no actual line breaks in the result; it's just one long line.
I've tried a number of other weird things like specifying the encoding, trying to decode, etc. None of which produce the desired result.
What's the right way to read and write this kind of (is non-ASCII the right term?) text?
What you are missing is that etree's tostring method returns bytes, and you need to take that into account when writing (a bytestring) to a file. Use the 'b' flag in the open call, like this, and forget about the str() conversion:
with open('Pretty.html', 'wb') as file:
    #write the pretty XML to a file
    file.write(document)
Addendum
Even though this answer solves the immediate problem at hand and teaches about bytestrings, the solution by Padraic Cunningham is the cleaner and faster way to write lxml etrees to a file.
This can all be done using lxml in a couple of lines of code, without ever needing to use open; the .write method does exactly what you are trying to do:
# parse using the file name, which is also the recommended way
tree = html.parse("C:/Users/mhurley/Portable_Python/notebooks/View_Custom_Report.html")
# call write on the tree
tree.write("C:/Users/mhurley/Portable_Python/notebooks/Pretty.html", pretty_print=True, encoding="utf-8")
Also file_text = ''.join(file.readlines()) is exactly the same as file_text = file.read()

Json parsing with unicode characters

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried the JSON library in Flash CS5, and I've tried it at http://json.parser.online.fr/, and I always get "unexpected token - eval fails".
I'm sorry, there really was a problem with the syntax; the file came this way from the client.
Can someone please help me? Thanks
Quoth the RFC:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
So a correctly encoded Unicode character should not be a problem. Which leads me to believe that it's not correctly encoded (maybe it uses latin-1 instead of UTF-8). How did you create the file? In a text editor?
There might be an obscure Unicode whitespace character hidden in your string.
This URL contains more detail:
http://timelessrepo.com/json-isnt-a-javascript-subset
In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.
The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it. So in the end I wrote a routine that converts the hex escape (0x27) to its decimal code (39), which gave me HTML encoding, and then HTML-decoded that.
string Padding = "000";
// The decimal loop counter is treated as the hex digits of the \uXXXX escape:
// e.g. f = 27 builds "\u0027", and Int32.Parse("27", NumberStyles.HexNumber) = 39,
// so "\u0027" becomes "&#39;" (an apostrophe). Only escapes whose hex digits
// are all 0-9 are covered.
for (int f = 1; f <= 256; f++)
{
    string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
    string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
    HTML = HTML.Replace(Hex, Dec);
}
// Finally decode the numeric HTML entities back to plain characters
HTML = System.Web.HttpUtility.HtmlDecode(HTML);
Ugly as sin, I know, but without the latest framework (not available on the ISP's server) it was the best I could do, and someone must know a better solution.
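For what it's worth, a real JSON parser already decodes \u escapes, so this kind of round trip is usually unnecessary. A quick Python illustration (not part of the original answer):
import json

# the JSON text literally contains the six characters \u0027
print(json.loads('"Paul\\u0027s"'))  # Paul's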
I had the same problem, and I just changed the file encoding from Mac-Roman/Windows-1252 to UTF-8, and it worked.
I had the same problem with Twitter JSON files. I was parsing them in Python with json.loads(tweet), but it failed for half of the records.
I changed to Python 3 and it works well now.
If you have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting): it outputs ASCII by default and escapes the non-ASCII characters! See "python json unicode - how do I eval using javascript" (and "python: json.dumps can't handle utf-8?" and "Why does json.dumps escape non-ascii characters with \uxxxx").
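A short illustration of that default behaviour (hypothetical data; Python assumed, since the linked questions are about Python):
import json

data = {"name": "Müller"}
print(json.dumps(data))                      # {"name": "M\u00fcller"}  -- escaped by default
print(json.dumps(data, ensure_ascii=False))  # {"name": "Müller"}       -- keeps the UTF-8 characters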