Make a string both valid JSON and shell escaped - json

I have an array which I want to convert to a JSON string. One of the elements has a backtick. This would cause an error when I try to run the command in the shell:
data = [["305", "John Smith", "Amy Smith`", "10/11/2008", "Medical", {"page_count"=>4}]]
json_str = data.to_json.gsub('"','\"')
cmd = "node myscript.js #{json_str}"
Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
output = [stdout.read, stderr.read]
end
Error retrieving data: sh: 1: Syntax error: EOF in backquote substitution
An obvious solution is to escape the backtick:
json_str = data.to_json.gsub('"','\"').gsub('`','\\\`')
But I want to escape all special shell characters that could raise an issue. Ruby's shellescape escapes a string so that it can be safely used in a Bourne shell command line. Here's an example:
argv = "It's better to give than to receive".shellescape
argv #=> "It\\'s\\ better\\ to\\ give\\ than\\ to\\ receive"
But look what happens when I apply it to a JSON string:
data = [["305", "John Smith", "Amy Smith`", "10/11/2008", "Medical", {"page_count"=>4}]]
data = data.to_json
=> "[[\"305\",\"John Smith\",\"Amy Smith`\",\"10/11/2008\",\"Medical\",{\"page_count\":4}]]"
data = data.to_json.shellescape
=> "\\"\\\\"\[\[\\\\\\\\"305\\\\\\\\",\\\\\\\\"John\ Smith\\\\\\\\",\\\\\\\\"Amy\ Smith\`\\\\\\\\",\\\\\\\\"10/11/2008\\\\\\\\",\\\\\\\\"Medical\\\\\\\\",\{\\\\\\\\"page_count\\\\\\\\":4\}\]\]\\\\"\\""
Clearly, this would raise an error like:
SyntaxError: Unexpected token \ in JSON at position 0
What happens is that shellescape will also escape spaces, since the shell requires spaces to be escaped. But having spaces is valid and necessary JSON. So how could I escape shell characters that would cause an error in my command without it breaking the JSON?

Shells are for humans, not for machines. Having a machine produce shell commands is a code smell indicating that you're automating at the wrong layer.
Skip the shell, and just run your program with the required arguments:
data = [["305", "John Smith", "Amy Smith`", "10/11/2008", "Medical", {"page_count"=>4}]]
json_str = data.to_json
Open3.popen3("node", "myscript.js", json_str) do |stdin, stdout, stderr, wait_thr|
output = [stdout.read, stderr.read]
end
Since there is no shell involved, there's no human silliness like escaping to care about.
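For comparison, here is a minimal sketch of the same idea in Python (the answer above is Ruby; this is only an illustration). Passing the command as a list of arguments means no shell ever parses the JSON string. The child process is a stand-in Python one-liner rather than node myscript.js, so that the sketch runs without Node; that substitution is an assumption, not part of the original.

```python
import json
import subprocess
import sys

data = [["305", "John Smith", "Amy Smith`", "10/11/2008", "Medical", {"page_count": 4}]]
json_str = json.dumps(data)

# Hypothetical stand-in for "node myscript.js": a child process that parses
# its first argument as JSON and prints one field back.
child_code = "import json, sys; print(json.loads(sys.argv[1])[0][2])"

# The argument list is handed to the program directly. No shell is involved,
# so the backtick, the double quotes and the spaces need no escaping.
result = subprocess.run(
    [sys.executable, "-c", child_code, json_str],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The child receives the JSON text byte for byte, backtick included, which is the property the Open3.popen3("node", "myscript.js", json_str) form relies on.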

Related

Use variables in JQ queries

I want to use the value of a variable USER_PROXY in the JQ query statement.
export USER_PROXY= "proxy.zyz.com:122"
By referring to SO answer no. 1 from here, and also the link, I made the following shell script.
jq -r --arg UPROXY ${USER_PROXY} '.proxies = {
"default": {
"httpProxy": "http://$UPROXY\",
"httpsProxy": "http://$UPROXY\",
"noProxy": "127.0.0.1,localhost"
}
}' ~/.docker/config.json > tmp && mv tmp ~/.docker/config.json
However, I get an error from this. What is missing here? Why is the jq variable UPROXY not getting its value from the USER_PROXY bash variable?
export USER_PROXY= "proxy.zyz.com:122"
You can't have a space here. This sets USER_PROXY to an empty string and tries to export a nonexistent variable named 'proxy.zyz.com:122'. You probably want
export USER_PROXY="proxy.zyz.com:122"
jq -r --arg UPROXY ${USER_PROXY} '.proxies = {
"default": {
"httpProxy": "http://$UPROXY\",
"httpsProxy": "http://$UPROXY\",
"noProxy": "127.0.0.1,localhost"
}
}' ~/.docker/config.json > tmp && mv tmp ~/.docker/config.json
You need quotes around ${USER_PROXY} otherwise any whitespace in it will break it. Instead use --arg UPROXY "${USER_PROXY}".
This isn't the syntax for using variables inside a string in jq. Instead of "...$UPROXY..." you need "...\($UPROXY)..."
You are escaping the " at the end of the string by putting a \ before it. I am not sure what you meant here. I think you perhaps meant to use a forward slash instead?
This last issue is the immediate cause of the error message you're seeing. It says "syntax error, unexpected IDENT, expecting '}' at line 4" and then shows you what it found on line 4: "httpsProxy": .... jq parsed the string that starts on line 3 as "http://$UPROXY\",\n " because the escaped double quote doesn't end the string; the string only ends at the next double quote, the one just before httpsProxy on line 4. After the end of that string, jq expects to find a } to close the object, or a , before the next key-value pair, but it finds httpsProxy, which looks like an identifier. So that's what the error message is saying: it found an IDENTifier when it was expecting a } (or a , but it doesn't mention that).
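The corrections above can be collected into one command. The sketch below builds it as an argument list in Python, purely to show the corrected filter text and the --arg usage without any shell quoting in the way; jq_proxy_command and the config.json path are hypothetical names for illustration, and the sketch does not run jq itself.

```python
import shlex


def jq_proxy_command(proxy, config_path):
    """Return an argv list for jq (hypothetical helper, for illustration)."""
    # Raw string, so the backslash in \($UPROXY) reaches jq untouched.
    # \($UPROXY) is jq's string interpolation syntax; a bare $UPROXY inside
    # a jq string would be taken literally.
    jq_filter = r'''.proxies = {
  "default": {
    "httpProxy": "http://\($UPROXY)",
    "httpsProxy": "http://\($UPROXY)",
    "noProxy": "127.0.0.1,localhost"
  }
}'''
    return ["jq", "-r", "--arg", "UPROXY", proxy, jq_filter, config_path]


cmd = jq_proxy_command("proxy.zyz.com:122", "config.json")
# shlex.join shows how the same command has to be quoted when typed in a shell.
print(shlex.join(cmd))
```

In a shell script the equivalent is to pass --arg UPROXY "${USER_PROXY}" and keep the filter in single quotes, as the original script already does.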

Trying to dump information to a json, but getting double backslashes

I have some info stored in a MySQL database, something like: AHmmgZq\n/+AH+G4
We get that using an API, so when I read it in Python I get: AHmmgZq\\n/+AH+G4 The backslash is doubled!
Now I need to put that into a JSON file. How can I remove the extra backslash?
EDIT: let me show my full code:
json_dict = {
"private_key": "AHmmgZq\\n/+AH+G4"
}
print(json_dict)
print(json_dict['private_key'])
with open(file_name, "w", encoding="utf-8") as f:
json.dump(json_dict, f, ensure_ascii=False, indent=2)
In the first print I have the doubled backslash, but in the second one there's only one. When I dump it to the JSON file, the backslash is doubled.
"AHmmgZq\\n/+AH+G4" in python is equivalent to the literal string "AHmmgZq\n/+AH+G4". print("AHmmgZq\\n/+AH+G4") => "AHmmgZq\n/+AH+G4"
\n is a new line character in python. So to represent \n literally it needs to be escaped with a \. I would first try to convert to json as is and see if that works.
Otherwise, to remove extra backslashes:
string_to_json.replace("\\\\","\\")
Remember that \\ = escaped \ = \
But in the above string that will not help you, because python reads "AHmmgZq\\n/+AH+G4" as "AHmmgZq\n/+AH+G4" and so finds no double backslash.
What solved my problem was this:
string_to_json.replace("\\n","\n")
Thanks to everybody!
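The behaviour discussed in this thread can be checked with a short script. This is a minimal sketch using only the value from the question; the variable names raw and fixed are made up for illustration.

```python
import json

# One backslash followed by "n": two characters, not a newline.
raw = "AHmmgZq\\n/+AH+G4"

print(repr(raw))        # repr shows the escaped form, with a doubled backslash
print(raw)              # print shows the actual characters, with one backslash
print(json.dumps(raw))  # JSON has to escape the backslash, so it is doubled

# If the two characters were meant to be a line break, convert them first.
fixed = raw.replace("\\n", "\n")
print(json.dumps(fixed))  # the real newline is written as backslash-n in JSON
```

Printing a dict uses repr for its values, which is why the first print in the question shows two backslashes while the string itself holds only one.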

Invalid numeric literal with jq

I have a large amount of JSON from a 3rd party system which I would like to pre-process with jq, but I am having difficulty composing the query, test case follows:
$ cat test.json
{
"a": "b",
"c": "d",
"e": {
"1": {
"f": "g",
"h": "i"
}
}
}
$ cat test.json|jq .e.1.f
jq: error: Invalid numeric literal at EOF at line 1, column 3 (while parsing '.1.') at <top-level>, line 1:
.e.1.f
How would I get "g" as my output here? Or how do I cast that 1 to a "1" so it is handled correctly?
From the jq manual:
You can also look up fields of an object using syntax like .["foo"]
(.foo above is a shorthand version of this, but only for
identifier-like strings).
You also need to quote the filter, and use -r if you want raw output:
jq -r '.e["1"].f' test.json
I wrote a shell script function that calls the curl command, and pipes it into the jq command.
function getName {
curl http://localhost:123/getname/$1 | jq;
}
export -f getName
When I ran this from the CLI,
getName jarvis
I was getting this response:
parse error: Invalid numeric literal at line 1, column 72
I tried removing the | jq from the curl command, and I got back the result without jq parsing:
<Map><timestamp>1234567890</timestamp><status>404</status><error>Not Found</error><message>....
I first thought that I had a bad character in the curl command, or that I was using the function param $1 wrong.
Then I counted the number of chars in the result string, and I noticed that the 72nd char in that string was the empty space between "Not Found".
The underlying issue was that I didn't have a method named getname yet in my Spring REST controller, so the response was coming back as a 404 Not Found error body in XML rather than JSON. jq only parses JSON, so it failed on that body; as far as I can tell, the space inside "Not Found" is just where jq stopped reading the unparseable text and reported the error, not the cause of it.
I'm new to jq, so there may be a cleaner way to detect a non-JSON response before piping it in, but that's for another day.
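jq only accepts JSON input, so an XML error body like the one above cannot be parsed. As a rough illustration of guarding against that, here is a sketch in Python that tries to decode a response body and reports a clear message when it is not JSON. The function name parse_json_body and the sample XML string are hypothetical; the real response in the post is truncated and is not reproduced here.

```python
import json


def parse_json_body(body):
    """Return the decoded JSON value, or None if the body is not JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        print(f"response is not JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None


# Hypothetical error body, shaped like a Spring 404 response rendered as XML.
xml_error = "<Map><status>404</status><error>Not Found</error></Map>"

print(parse_json_body(xml_error))             # not JSON, so None
print(parse_json_body('{"name": "jarvis"}'))  # valid JSON, so a dict
```

Checking the HTTP status code before parsing is another option, since a 404 is already known not to carry the expected payload.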

Parsing JSON file using sed or perl

I have a json file that requires parsing.
Using a scripting tool like sed/awk or perl, how can I extract value30 and substitute it for value6, prefixed by the string "XX" (e.g. XX + value30)?
Where:
field6 = fixed string
value6 = fixed string
value30 = varying string
[
{"field6" : "value6", "field30" : "value30" },
{ "field6" : "value6", "field30" : "value30" }
]
If I understand you correctly, this program should do what you're after:
use JSON qw(decode_json encode_json);
use strict;
use warnings;
# set the input line separator to undefined so the next read (<>) reads the entire file
undef $/;
# read the entire input (stdin or a file passed on the command line) and parse it as JSON
my $data = decode_json(<>);
my $from_field = "field6";
my $to_field = "field30";
for (@$data) {
    $_->{$to_field} = $_->{$from_field};
}
print encode_json($data), "\n";
It relies on the JSON module being installed, which you can install via cpanm (which should be available in most modern Perl distributions):
cpanm JSON
If the program is in the file substitute.pl and your json array is in data.json, then you would run it as:
perl substitute.pl data.json
# or
cat data.json | perl substitute.pl
It should produce:
[{"field30":"value6","field6":"value6"},{"field30":"value6","field6":"value6"}]
This replaces field30's value with field6's.
Is this what you were attempting to do?
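Another reading of the question is that field6 should become "XX" followed by field30's value. Under that assumption (it is a guess at the asker's intent, not something the thread confirms), a minimal Python sketch using a JSON parser rather than sed could look like this; prefix_from_field30 is a hypothetical helper name.

```python
import json


def prefix_from_field30(records, prefix="XX"):
    """Set field6 to prefix + field30's value in every record."""
    for record in records:
        record["field6"] = prefix + record["field30"]
    return records


# Sample input taken from the question.
text = '''[
{"field6" : "value6", "field30" : "value30" },
{ "field6" : "value6", "field30" : "value30" }
]'''

records = prefix_from_field30(json.loads(text))
print(json.dumps(records))
```

As with the Perl program, parsing the file as JSON avoids depending on the exact spacing of each line, which differs between the two records in the sample.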

Retrieving original JSON code from a decoded array node in Perl

I'm working on a script which receives JSON code for an array of objects similar to this:
{
"array":[
{ "id": 1, "text": "Some text" },
{ "id": 2, "text": "Some text" }
]
}
I decode it using JSON::XS and then filter out some of the results. After this, I need to store the JSON code for each individual node into a queue for later processing. The format this queue requires is JSON too, so the code I'd need to insert for each node would be something like this:
{ "id": 1, "text": "Some text" }
However, after decode_json has decoded a node, all that's left are hash references for each node:
print $json->{'array'}[0]; # Would print something like HASH(0x7ffa80c83270)
I know I could get something similar to the original JSON code using encode_json on the hash reference, but the resulting code is different from the original code, UTF-8 characters get all weird, and it seems like a lot of extra processing, especially considering the amount of data this script has to deal with.
Is there a way to retrieve the original JSON code from a decoded array node? Does JSON::XS keep the original chunks somewhere after they have been decoded?
EDIT
About the weird UTF-8 characters, they just look weird on the screen:
#!/usr/bin/perl
use utf8;
use JSON::XS;
binmode STDOUT, ":utf8";
$old_json = '{ "text": "Drag\u00f3n" }';
$json = decode_json($old_json);
print $json->{'text'}; # Dragón
$new_json = encode_json($json);
print $new_json; # {"text":"Dragón"}
$json = decode_json($new_json);
print $json->{'text'}; # Dragón
encode_json will produce equivalent JSON to what you originally had before you decoded it with decode_json. Characters encoded using UTF-8 do not get all weird.
$ cat a.pl
use Encode qw( encode_utf8 );
use JSON::XS qw( decode_json encode_json );
my $json = encode_utf8(qq!{"name":"\x{C9}ric" }!);
print($json, "\n");
print(encode_json(decode_json($json)), "\n");
$ perl a.pl | od -c
0000000 { " n a m e " : " 303 211 r i c "
0000020 } \n { " n a m e " : " 303 211 r i c
0000040 " } \n
0000043
If you want a parser that preserves the original JSON, you'll surely have to write your own; the existing ones don't do that.
No, it doesn't exist anywhere. The "original JSON" isn't stored per-element; it's decoded in a single pass.
No, this is not possible. Every JSON object can have multiple but equivalent representations:
{ "key": "abc" }
and
{
"key" : "abc"
}
are pretty much the same.
So just use the re-encoded JSON your module gives you.
Even if JSON::XS caches the chunks, extracting them would be a breach of encapsulation, therefore having no guarantee of working if the module is upgraded. And it is bad design.
Don't care about performance. The XS modules have exceptional performance, as they are coded in C. And if you were paranoid about performance, you wouldn't use JSON but some binary format. And you wouldn't be using Perl, but Fortran ;-)
You should treat equivalent data as equivalent data. Even if the presentation is different.
If the unicode chars look weird, but process fine, there is no problem. If they don't get processed correctly, you might have to specify an exact encoding.
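The point about equivalent representations can be checked directly. The sketch below uses Python's json module rather than JSON::XS, so it illustrates the principle and not the Perl module's behaviour; the two sample texts are made up from the examples in the question.

```python
import json

# Same data, written two ways: compact with a \u escape, spaced with a literal character.
compact = '{"id":1,"text":"Drag\\u00f3n"}'
spaced = '{ "id": 1,\n  "text" : "Dragón" }'

a = json.loads(compact)
b = json.loads(spaced)
print(a == b)  # both texts decode to the same data

# Re-encoding yields yet another text, which still decodes to the same data.
reencoded = json.dumps(a, ensure_ascii=False)
print(reencoded)
print(json.loads(reencoded) == a)
```

Once decoded, nothing records which of the input texts the data came from, which is why the re-encoded form is the only one available.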