Say I have a JSON document in which the byte 0xb7 is written as the Unicode escape \u00b7:
{"key":"_\u00b7_"}
If I extract the value of "key" with jq, the output keeps the UTF-8 encoding of that code point, which is "c2 b7":
$ echo '{"key":"_\u00b7_"}' | ./jq '.key' -r | xxd
0000000: 5fc2 b75f 0a _.._.
Is there any jq command that extracts the decoded "5f b7 5f" byte sequence out of this JSON?
I can solve this with extra tools like iconv but it's a bit ugly:
$ echo '{"key":"_\u00b7_"}' | ./jq '.key' -r \
| iconv -f utf8 -t utf32le \
| xxd -ps | sed -e 's/000000//g' | xxd -ps -r \
| xxd
0000000: 5fb7 5f0a _._.
def hx:
def hex: [if . < 10 then 48 + . else 55 + . end] | implode ;
tonumber | "\(./16 | floor | hex)\(. % 16 | hex)";
{"key":"_\u00b7_"} | .key | explode | map(hx)
produces:
["5F","B7","5F"]
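For comparison, the same "one hex string per code point" idea can be sketched in Python. This is only an illustration; the function name hex_code_points is made up for this example. Like the jq version, it reports code points, not bytes of any particular encoding.

```python
def hex_code_points(text):
    """Return each code point of `text` as an uppercase hex string (at least 2 digits)."""
    return [format(ord(ch), "02X") for ch in text]


print(hex_code_points("_\u00b7_"))  # ['5F', 'B7', '5F']
```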
"Raw Bytes" (caveat emptor)
Since jq only supports UTF-8 strings, you would have to use some external tool to obtain the "raw bytes". Maybe this is closer to what you want:
jq -nrj '{"key":"_\u00b7_"} | .key' | iconv -f utf-8 -t ISO8859-1
This produces the three bytes.
And here's an iconv-free solution:
jq -nrj '{"key":"_\u00b7_"} | .key' | php -r 'print utf8_decode(readline());'
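If Python is available, the same re-encoding step can be sketched without iconv or PHP. The helper name latin1_bytes is hypothetical, and the sketch assumes every character of the value is at or below U+00FF; anything higher has no single-byte Latin-1 form and raises UnicodeEncodeError.

```python
import json


def latin1_bytes(json_text, key):
    """Decode the JSON, then re-encode the string value as ISO-8859-1 bytes."""
    value = json.loads(json_text)[key]
    # Latin-1 maps code points U+0000..U+00FF one-to-one onto single bytes.
    return value.encode("latin-1")


raw = latin1_bytes('{"key":"_\\u00b7_"}', "key")
print(raw.hex())  # 5fb75f
```

To emit the raw bytes themselves instead of their hex form, writing raw to sys.stdout.buffer would be the usual route.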
Alternate
Addressing the character encoding scenario outside of jq:
Though you didn't want extra tools, iconv and hexdump are indeed readily available - I for one frequently lean on iconv when I require certain parts of a pipeline to be completely known to me, and hexdump when I want control of the formatting of the representation of those parts.
So an alternative is:
jq -njr '{"key":"_\u00b7_"} | .key' | iconv -f utf8 -t UTF-32LE | hexdump -ve '1/1 "%.X"'
Result:
5FB75F
Related
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/5317139/property/IsomericSMILES/JSON
For the above JSON, the following jq prints 5317139 CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1.
.PropertyTable.Properties
| .[]
| [.CID, .IsomericSMILES]
| @tsv
But there are two \ before the first 1. Is that wrong; should there be just one \? How do I get the correct number of backslashes?
The extra backslash in the output is the result of the request to produce TSV: "\" introduces escape sequences in jq's TSV output (e.g. "\t" signifies a tab character inside a field), so a literal backslash has to be written as "\\".
By contrast, consider:
jq -r '
.PropertyTable.Properties
| .[]
| [.CID, .IsomericSMILES]
| join("\t")' smiles.json
5317139 CCC/C=C\1/C2=C(C3C(O3)CC2)C(=O)O1
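To make the difference concrete, here is a rough Python sketch of the escaping that the jq manual documents for @tsv (backslash, tab, newline and carriage return become \\, \t, \n and \r). It is not jq's implementation, and the names tsv_escape and tsv_row are invented for the illustration.

```python
def tsv_escape(field):
    """Escape one field the way the jq manual describes for @tsv."""
    # Backslash goes first so the escapes added afterwards are not doubled.
    return (str(field)
            .replace("\\", "\\\\")
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r"))


def tsv_row(fields):
    """Join escaped fields with a single tab."""
    return "\t".join(tsv_escape(f) for f in fields)


smiles = "CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1"  # contains ONE literal backslash
print(tsv_row([5317139, smiles]))       # escaped row: the backslash is doubled
print("\t".join(["5317139", smiles]))   # plain join: the backslash is left alone
```

A consumer that understands these TSV escapes turns "\\" back into a single backslash, so the doubled form is not wrong; it only looks wrong when read as plain text.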
I have a response trace file containing below response:
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
I need to fetch the value of the "id" key in a variable which I can put in my further code.
Expected result is
echo $id - should give me 70EA96FB313349279EB089BA9DE2EC3B value
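Assuming the body is valid JSON (the sample as posted has a trailing comma before the closing }, which jq rejects), delete the first two lines with sed and parse the rest with jq: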
id=$(sed '1,2d' file | jq -r '.member[]|.id')
Output to variable id:
70EA96FB313349279EB089BA9DE2EC3B
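Where jq is not installed but Python is, roughly the same steps (drop the "#" header lines, parse the remainder, pick out the ids) could look like the sketch below. The function name extract_ids is made up, and the sample is a shortened version of the trace with the trailing comma removed, since json.loads rejects it just as jq does.

```python
import json


def extract_ids(trace_text):
    """Return the "id" of every entry in the "member" array of a trace body."""
    # Drop the "#..." header lines; what remains is assumed to be valid JSON.
    body = "\n".join(line for line in trace_text.splitlines()
                     if not line.startswith("#"))
    return [member["id"] for member in json.loads(body)["member"]]


sample = """#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","id":"70EA96FB313349279EB089BA9DE2EC3B"}]}"""

print(extract_ids(sample)[0])
```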
I would strongly suggest using jq to parse json.
But given that json is mostly compatible with python dictionaries and arrays, this HACK would work too:
$ cat resp
#RESPONSE BODY
#--------------------
{"totalItems":1,"member":[{"name":"name","title":"PatchedT","description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product","modified":"2019 Jul 23 10:22:15","created":"2019 Jul 23 10:21:54",}]}
$ awk 'NR==3{print "a="$0;print "print(a[\"member\"][0][\"id\"])"}' resp | python
70EA96FB313349279EB089BA9DE2EC3B
$ sed -n '3s|.*|a=\0\nprint(a["member"][0]["id"])|p' resp | python
70EA96FB313349279EB089BA9DE2EC3B
Note that this code is:
1. a dirty hack, for when your system does not have the right tool (jq);
2. susceptible to code injection, because the response is executed as Python. Hence use it ONLY IF you trust the response received from your service.
Quick and dirty (don't use eval):
eval $(cat response_file | tail -1 | awk -F , '{ print $5 }' | sed -e 's/"//g' -e 's/:/=/')
It is based on the exact structure you gave, and assumes there is no , in any value before "id".
Or assign it yourself:
id=$(cat response_file | tail -1 | awk -F , '{ print $5 }' | cut -d: -f2 | sed -e 's/"//g')
Note that you can't access the name field with that trick: it is the first item of the member array, so it shares a comma-separated field with "member" and gets "swallowed" by { print $2 }. You can use an even-uglier hack to retrieve it, though, by first turning ":[" into a comma so that name lands in its own field ($3):
name=$(cat response_file | tail -1 | sed -e 's/:\[/,/g' -e 's/}\]//g' | awk -F , '{ print $3 }' | cut -d: -f2 | sed -e 's/"//g')
But, if you can, jq is the right tool for that work instead of ugly hacks like that (but if it works...).
When you can't use jq, you can consider
id=$(grep -Eo "[0-9A-F]{32}" file)
This only works when the file looks the way I expect, so you might need to add extra checks like
id=$(grep "My des_" file | grep -Eo "[0-9A-F]{32}" | head -1)
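The same pattern-matching fallback can be sketched in Python. The names ID_PATTERN and first_id are hypothetical, and the sketch shares the caveat above: it simply returns the first run of 32 uppercase hex digits, whatever field it belongs to.

```python
import re

# Same pattern as the grep example: exactly 32 uppercase hex digits in a row.
ID_PATTERN = re.compile(r"[0-9A-F]{32}")


def first_id(text):
    """Return the first 32-character uppercase hex run in `text`, or None."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None


print(first_id('"description":"My des_","id":"70EA96FB313349279EB089BA9DE2EC3B","type":"Product"'))
```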
How do I convert these two text strings into separate json objects
Text strings:
start process: Mon May 15 03:14:09 UTC 2017
logfilename: log_download_2017
Json output:
{
"start process": "Mon May 15 03:14:09 UTC 2017",
}
{
"logfilename": "log_download_2017",
}
Shell script:
logfilename="log_download_2017"
echo "start process: $(date -u)" | tee -a $logfilename.txt | jq -R split(:) >> $logfilename.json
echo "logfilename:" $logfilename | tee -a $logfilename.txt | jq -R split(:) >> $logfilename.json
One approach would be to use index/1, e.g. along these lines:
jq -R 'index(":") as $ix | {(.[:$ix]) : .[$ix+1:]}'
Or, if your jq supports regex, you might like to consider:
jq -R 'match( "([^:]*):(.*)" ) | .captures | {(.[0].string): .[1].string}'
or:
jq -R '[capture( "(?<key>[^:]*):(?<value>.*)" )] | from_entries'
How do I convert these two text strings into a single json object
Text strings:
start process: Mon May 15 03:14:09 UTC 2017
logfilename: log_download_2017
Json output:
{
"start process": "Mon May 15 03:14:09 UTC 2017",
"logfilename": "log_download_2017",
}
Shell script:
logfilename="log_download_2017"
echo "start process: $(date -u)" | tee -a $logfilename.txt | jq -R . >> $logfilename.json
echo "logfilename:" $logfilename | tee -a $logfilename.txt | jq -R . >> $logfilename.json
As mentioned e.g. at Use jq to turn x=y pairs into key/value pairs, the basic task of converting a key:value string can be accomplished in a number of ways. For example, you could start with:
index(":") as $ix | {(.[:$ix]) : .[$ix+1:]}
You evidently want to trim some spaces, which can be done using sub/2.
To combine the objects, you could use add. To do this in a single pass, you would run jq with -R -s, which reads the entire input as one raw string.
Putting it all together, you could do worse than:
def trim: sub("^ +";"") | sub(" +$";"");
def s2o:
(index(":") // empty) as $ix
| {(.[:$ix]): (.[$ix+1:]|trim)};
split("\n") | map(s2o) | add
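For readers who want to check the logic outside jq, here is a rough Python equivalent of the program above: split each line at its first colon, trim spaces from the value, skip lines that have no colon, and merge everything into one object. The function name lines_to_object is made up for this sketch.

```python
import json


def lines_to_object(text):
    """Merge "key: value" lines into a single dict, splitting at the first colon."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # No colon on this line: skip it, like `// empty` in the jq version.
            continue
        result[key] = value.strip(" ")
    return result


sample = "start process: Mon May 15 03:14:09 UTC 2017\nlogfilename: log_download_2017\n"
print(json.dumps(lines_to_object(sample), indent=2))
```

Splitting at the first colon only matters here because the timestamp itself contains colons.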
Input :-
{"Timestamp":140,
"DateTime":"2014-06-02 14:32:34.440 PDT",
"CustomerId":"01",
"VisitorId":"78"}
Desired Output
Timestamp; DateTime; CustomerId; VisitorId
140; 2014-06-02 14:32:34.440 PDT; 01; 78
I tried the following code:-
results.txt
| (map(keys) | add | unique) as $cols
| map(. as $row | $cols | map($row[.])) as $rows
| $cols, $rows[] | @csv
Error:-
'add' is not recognized as an internal or external command,
operable program or batch file."
I don't know what is wrong. I am using window platform with cygwin.
With your input, and the following program in tocsv.jq:
(keys_unsorted | join(",")),
([.[]] | @csv)
the command:
$ jq -r -f tocsv.jq input.json
produces:
Timestamp,DateTime,CustomerId,VisitorId
140,"2014-06-02 14:32:34.440 PDT","01","78"
Eliminating the quotation marks in the second line is left as an exercise for the interested reader :-) [Hint: use join(",") again.]
WARNING: the above program is intended only for jq version 1.5 or later. When using an earlier version of jq, using to_entries or explicitly specifying the key names may be required.
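As a cross-check of the "; "-separated layout asked for in the question, here is a small Python sketch. The name object_to_rows is invented for this example, and it deliberately does no quoting, so a value that itself contains the separator would make the row ambiguous.

```python
import json


def object_to_rows(json_text, sep="; "):
    """Return (header_line, value_line) for a flat JSON object."""
    record = json.loads(json_text)  # key order of the input is preserved
    header = sep.join(record.keys())
    values = sep.join(str(v) for v in record.values())
    return header, values


sample = ('{"Timestamp":140,"DateTime":"2014-06-02 14:32:34.440 PDT",'
          '"CustomerId":"01","VisitorId":"78"}')

for row in object_to_rows(sample):
    print(row)
```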