I need to render a string exactly as I get it from the server, for example if I get a string that contains "\t" I need it to be rendered as "\t" and not as space/s.
In the state of the component I see that the string appears with the special characters but rendered without:
state:
'\"id\"\t\"name\n key\"'
what is rendered:
"id" "name key"
How can I prevent this from happening?
Since JS and DOM by default parse special characters such as \n, you can define special characters that you want to prevent from behaving in their default way and replace them with original plus backslash before it:
Take a look at this runnable snippet:
let textWithSpecialChars = `"id"\t"name\n key\f \r \b"`;
const specialChars = ['\\b', '\\r', '\\f', '\\n', '\\t'];
let modifiedTextWithSpecialChars = JSON.stringify(textWithSpecialChars);
specialChars.forEach((char) => {
modifiedTextWithSpecialChars = modifiedTextWithSpecialChars.replace(char, '\\' + char);
});
console.log(modifiedTextWithSpecialChars);
// "\"id\"\\t\"name\\n key\\f \\r \\b\""
console.log(JSON.parse(modifiedTextWithSpecialChars));
// "id"\t"name\n key\f \r \b"
console.log(textWithSpecialChars);
// "id" "name
// key
// "
document.body.innerHTML = JSON.parse(modifiedTextWithSpecialChars)
Analysis
You collect all special characters in that array and escape them.
Stringify your string, in this way you can use that string in JS without JS parsing special characters
Take this string and loop through all special characters that you added above
For every special character take the original string and replace special character such as \n with \\n. Return that value to modifying string and continue until all special characters are replaced. Stringified result of this will be '\"id\"\\t\"name\\n key\\f \\r \\b\"'
Parse your string back and JS will not parse special characters as special characters, rathe they will be plain strings.
Related
I have a file that have an HTMl code, the HTML tags are encoded like the following content:
\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
The decoded HTML should be:
<div data-name="region-name" class="main-id">UK</div>
In Ruby, I used cgi library to unescapeHTML however it does not work because when it read the content it does not identify the encoded tags, here is another example:
require 'cgi'
single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"
puts 'unescape single_quoted_string ' + CGI.unescapeHTML(single_quoted_string)
puts 'unescape double_quoted_string ' + CGI.unescapeHTML(double_quoted_string)
The output of the previous code is:
unescape single_quoted_string \x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
unescape double_quoted_string <div data-name="region-name" class="main-id">UK</div>
My question is, how can I make the single_quoted_string act as if its content is double-quoted to make the function understand the encoded tags?
Thanks
Ruby's parser allows certain escape sequences in string literals.
The double-quoted string literal "\x3c" is recognized as containing a hexadecimal pattern \xnn which represents the single character <. (0x3C in ASCII)
The single-quoted string literal '\x3c' however is treated literally, i.e. it represents four characters: \, x, 3, and c.
how can I make the single_quoted_string act as if its content is double-quoted
You can't. In order to turn these four characters into < you have to parse the string yourself:
str = '\x3c'
str[2, 2] #=> "3c" take hex part
str[2, 2].hex #=> 60 convert to number
str[2, 2].hex.chr #=> "<" convert to character
You can apply this to gsub:
str = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
str.gsub(/\\x\h{2}/) { |m| m[2, 2].hex.chr }
#=> "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
/\\x\h{2}/ matches a literal backslash (\\) followed by x and two ({2}) hex characters (\h).
Just for reference, a CGI encoded string would look like this:
str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
CGI.escapeHTML(str)
#=> "<div data-name="region-name" class="main-id">UK</div>"
It uses &...; style character references.
Your problem has nothing to do with HTML, \x3c represent the hex number '3c' in the ascii table.
Double-quoted strings look for this patterns and convert them to the desired value, single-quoted strings treat it the final outcome.
You can check for yourself that CGI is not doing anything.
CGI.unescapeHTML(double_quoted_string) == double_quoted_string
The easiest way I know to solve your problem is through gsub
def convert(str)
str.gsub(/\\x(\w\w)/) do
[Regexp.last_match(1)].pack("H*")
end
end
single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
puts convert(single_quoted_string)
What convert does is to get every pair of hex escaped values and pack them as characters.
I have a csv similar to this (original file is proprietary, cannot share). Separator is Tab.
It contains a description column, whose text is wrapped in double quotes, can contain quoted strings, where, wait for it, escape sequence is also double quote.
id description other_field
12 "Some Description" 34
56 "Some
Multiline
""With Escaped Stuff""
Description" 78
I am parsing the file with this code
let mut reader = csv::ReaderBuilder::new()
.from_reader(file)
.deserialize().unwrap();
I'm consistently getting CSV deserialize error :
CSV deserialize error: record 43747 (line: 43748, byte: 21082563): missing field 'id'
I tried using flexible(true), double_quotes(true) with no luck.
Is it possible to parse this type of field, and if yes, how ?
Actually the issue was unrelated, rust-serde perfectly parses this. Just forgot to define the delimiter (tab in this case). This code works :
let mut reader = csv::ReaderBuilder::new()
.delimiter(b'\t')
.from_reader(file)
.deserialize().unwrap();
A CSV style quoted string, for the purposes of this question, is a string in which:
The string starts and ends with exactly one ".
Two double quotes inside the string are collapsed to one double quote. "Alo""ha"→Alo"ha.
"" on its own is an empty string.
Error inputs, such as "A""" e", cannot be parsed. It's an A", followed by junk e".
I've tried several things, none of which have worked fully.
The closest I've gotten, thanks to some help from user pinkieval in #nom on the Mozilla IRC:
use std::error as stderror; /* Avoids needing nightly to compile */
named!(csv_style_string<&str, String>, map_res!(
terminated!(tag!("\""), not!(peek!(char!('"')))),
csv_string_to_string
));
fn csv_string_to_string(s: &str) -> Result<String, Box<stderror::Error>> {
Ok(s.to_string().replace("\"\"", "\""))
}
This does not catch the end of the string correctly.
I've also attempted to use the re_match! macro with r#""([^"]|"")*""#, but that always results in an Err::Incomplete(1).
I've determined that the given CSV example for Nom 1.0 doesn't work for a quoted CSV string as I'm describing it, but I do know implementations differ.
Here is one way of doing it:
use nom::types::CompleteStr;
use nom::*;
named!(csv_style_string<CompleteStr, String>,
delimited!(
char!('"'),
map!(
many0!(
alt!(
// Eat a " delimiter and the " that follows it
tag!("\"\"") => { |_| '"' }
| // Normal character
none_of!("\"")
)
),
// Make a string from a vector of chars
|v| v.iter().collect::<String>()
),
char!('"')
)
);
fn main() {
println!(r#""Alo\"ha" = {:?}"#, csv_style_string(CompleteStr(r#""Alo""ha""#)));
println!(r#""" = {:?}"#, csv_style_string(CompleteStr(r#""""#)));
println!(r#"bad format: {:?}"#, csv_style_string(CompleteStr(r#""A""" e""#)));
}
(I wrote it in full nom, but a solution like yours, based on an external function instead of map!() each character, would work too, and may be more efficient.)
The magic here, that would also solve your regexp issue, is to use CompleteStr. This basically tells nom that nothing will come after that input (otherwise, nom assumes you're doing a streaming parser, so more input may follow).
This is needed because we need to know what to do with a " if it is the last character fed to nom. Depending on the character that comes after it (another ", a normal character, or EOF), we have to take a different decision -- hence the Incomplete result, meaning nom does not have enough input to make the decision. Telling nom that EOF comes next solves this indecision.
Further reading on Incomplete on nom's author's blog: http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html#dealing-with-incomplete-usage
You may note that this parser does not actually rejects the invalid input, but parses the beginning and returns the rest. If you use this parser as a subparser in another parser, the latter would then feed the remainder to the next subparser, which would crash as well (because it would expect a comma), causing the overall parser to fail.
If you don't want that, you could make csv_style_string match peek!(alt!(char!(',')|char!('\n")|eof!())).
I've tried a couple of different solutions to fix my problem with some "funny" newlines within my json dictionary and none of them works, so I thought I might make a post. The dictionary is achieved by scraping a website.
I have a json dictionary:
my_dict = {
u"Danish title": u"Avanceret",
u"Course type": u"MScTechnol",
u"Type of": u"assessmen",
u"Date": u"\nof exami",
u"Evaluation": u"7 step sca",
u"Learning objectives": u"\nA studen",
u"Participants restrictions": u"Minimum 10",
u"Aid": u"No Aid",
u"Duration of Course": u"13 weeks",
u"name": u"Advanced u",
u"Department": u"31\n",
u"Mandatory Prerequisites": u"31545",
u"General course objectives": u"\nThe cour",
u"Responsible": u"\nMartin C",
u"Location": u"Campus Lyn",
u"Scope and form": u"Lectures, ",
u"Point( ECTS )": u"10",
u"Language": u"English",
u"number": u"31548",
u"Content": u"\nThe cour",
u"Schedule": u"F4 (Tues 1"
}
I have stripped the value content to [:10] to reduce clutter, but some of the values have a length of 300 characters. It might not be portrayed well here, but some of values have a lot of newline characters in them and I've tried a lot of different solutions to remove them, such as str.strip and str.replace but without success because my 'values' are unicode. And by values I mean key, value in my_dict.items().
How do I remove all the newlines appearing in my dictionary? (With the values in focus as some of the newlines are trailing, some are leading and others are in the middle of the content: e.i \nI have a\ngood\n idea\n).
EDIT
I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.
for key, value in test.items():
value = str(value[:10]).replace("\n", " ")
print key, value
If you're trying to remove all \n or any junk character apart from numbers or letters then use regex
for key in my_dict.keys():
my_dict[key] = mydict[key].replace('\\n', '')
my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict
If you wish to keep anything apart from those then add it on to the character class inside the regex
for remove '\n' try this ....
for key, value in my_dict.items():
my_dict[key] = ''.join(value.split('\n'))
you need to put the updated value back to your dictionary ( similar to "by value vs. by reference" situation ;) ) ...
to remove the "/n" this one liner may be more "pythonic" :
new_test ={ k:v.replace("\n", "") for k,v in test.iteritems()}
to do what you try to do in your loop try something like:
new_test ={ k:str(value[:10]).replace("\n", " ") for k,v in test.iteritems()}
In your code, value takes the new value, but you never write it back...
So for example, this would work (but be slower, also you would be changing the values inside the loop, which should not cause problems, but the interpreter might not like...):
for key, value in test.items():
value = str(value[:10]).replace("\n", " ")
#now put it back to the dictionary...
test[key]=value
print key, value
I'm trying to construct json text as show below. But the variables such as $token, $state, $failedServers are not been replaced with its value. Note- I don't want to use any module specifically for this to work, I just want some plain string to work. Can anyone help me ?
my $json = '{"serverToken":"$token", "state":"$state","parameters" :"$failedServers"}';
current output was:
{"serverToken":"$token", "state":"$state","parameters" :"$failedServers"}
needed output format:
{"serverToken":"1213", "state":"failed","parameters" :"oracleapps.veeralab.com,suntrust.com"}
Your variables are not being replaced, because they are inside of a single-quoted string--that is, they are inside a string quoted by ' characters. This prevents variable substitution.
You will also be much better off creating JSON using a JSON library, such as this one. Simply using a quoted string is very dangerous. Suppose your one of your variables ends up containing a special character; you will end up with invalid JSON.
{"serverToken":"123"ABC", "state":"offline", "paramameters":"bugs"}
If your variables come from user input, really bad things could happen. Imagine that $token is set to equal foo", "state":"online", "foo":"bar. Your resulting JSON structure would be:
{"serverToken":"foo", "state":"online", "foo":"bar", "state":"offline" ...
Certainly not what you want.
Possible solutions:
The most blatantly obvious solution is simply not to the ' quote character. This has the drawback of requiring you to escape your double quote (") characters, though, but it's easy:
my $json = "{\"serverToken\":\"$token\", \"state\":\"$state\",\"parameters\" :\"$failedServers\"}";
Another option is to use sprintf:
my $json = sprintf('{"serverToken":"%s", "state":"%s", "parameters":"%s"}', $token, $state, $failedServers);
But by far, the best solution, because it won't break with wonky input, is to use a library:
use JSON;
my $json = encode_json( {
serverToken => $token,
state => $state,
paramaters => $failedServers
} );