How to parse a JSON string with jq (or other alternatives)?

I'm trying to get jq to parse a JSON structure like:
{
  "a" : 1,
  "b" : 2,
  "c" : "{\"id\":\"9ee ...\",\"parent\":\"abc...\"}\n"
}
That is, an element in the JSON is a string with escaped json.
So, I have something along the lines of
$ jq [.c] myFile.json | jq [.id]
But that crashes with jq: error: Cannot index string with string
This is because the output of .c is a string, not more JSON.
How do I get jq to parse this string?
My initial solution is to use sed to replace all the escape chars (\":\", \",\" and \"), but that's messy. I assume there's a way built into jq to do this?
Thanks!
edit:
Also, the jq version available here is:
$ jq --version
jq version 1.3
I guess I could update it if required.

jq has the fromjson builtin for this:
jq '.c | fromjson | .id' myFile.json
fromjson was added in version 1.4.
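As a quick self-contained check, here is a sketch that feeds jq a document of the same shape from a shell variable instead of a file. The id and parent values are invented for illustration, since the real ones are elided in the question.

```shell
# Hypothetical sample document; the "id" and "parent" values are made up.
sample='{"a":1,"b":2,"c":"{\"id\":\"x1\",\"parent\":\"p1\"}\n"}'

# Decode the JSON text stored in .c, then index into the resulting object.
id=$(printf '%s\n' "$sample" | jq -r '.c | fromjson | .id')
printf '%s\n' "$id"   # prints: x1
```

The trailing \n inside the string is harmless here, because fromjson treats it as ordinary whitespace after the object.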

You can use the raw output option (-r), which prints the decoded string without its JSON escapes, so that a second jq invocation can parse it:
jq -r .c myFile.json | jq .id
ADDENDUM: This has the advantage that it works in jq 1.3 and up; indeed, it should work in every version of jq that has the -r option.
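The two-process pipeline also works on a stream of several documents, because the second jq simply parses whatever JSON texts the first one prints. A minimal sketch, with two invented input documents standing in for myFile.json:

```shell
# Two hypothetical input documents, one per line; the ids are made up.
ids=$(printf '%s\n' \
  '{"c":"{\"id\":\"x1\"}\n"}' \
  '{"c":"{\"id\":\"x2\"}\n"}' |
  jq -r '.c' |   # first jq: print the decoded text of each .c
  jq -r '.id')   # second jq: parse that text as JSON and pick .id
printf '%s\n' "$ids"
```

This should print x1 and x2 on separate lines.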

Motivation: you have a JSON string, that is, a JSON object that has been escaped, wrapped in quotes, and stored as a string, and you want to unescape it and convert it back into a valid JSON object. For example:
an escaped JSON string:
"{\"name\":\"John Doe\",\"position\":\"developer\"}"
the expected result (a JSON object):
{"name":"John Doe","position":"developer"}
Solution: to unescape the JSON string, you can use sed on the command line with regular expressions that remove or replace specific characters. Note that this only handles escaped quotes; other escape sequences inside the string (such as \\ or \n) are left untouched.
cat current_json.txt | sed -e 's/\\\"/\"/g' -e 's/^.//g' -e 's/.$//g'
s/\\\"/\"/g replaces every backslash-quote pair ( \" ) with a plain quote (")
s/^.//g deletes the first character of each line
s/.$//g deletes the last character of each line
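To illustrate where the sed approach can break, here is a sketch with an invented input whose inner object contains a quoted word. It uses a lightly simplified but equivalent form of the sed command above and compares it with jq's fromjson.

```shell
# Hypothetical escaped string; the inner value contains quotes of its own.
escaped='"{\"note\":\"say \\\"hi\\\"\"}"'

# sed approach: unescape quotes, strip the first and last character,
# then ask jq whether the result is valid JSON.
if printf '%s\n' "$escaped" |
   sed -e 's/\\"/"/g' -e 's/^.//' -e 's/.$//' |
   jq . >/dev/null 2>&1
then sed_ok=yes
else sed_ok=no
fi

# jq approach: let jq decode the string and parse its contents.
note=$(printf '%s\n' "$escaped" | jq -r 'fromjson | .note')

printf 'sed result parses: %s\n' "$sed_ok"
printf 'fromjson note: %s\n' "$note"
```

With this input the sed output is no longer valid JSON, while fromjson recovers the value say "hi". For simple strings without nested escapes, both approaches give the same result.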

Related

How to prettify json with jq given a string with escaped double quotes

I would like to pretty print a json string I copied from an API call which contains escaped double quotes.
Similar to this:
"{\"name\":\"Hans\", \"Hobbies\":[\"Car\",\"Swimming\"]}"
However, when I execute pbpaste | jq "." the result is unchanged and still on one line.
I guess the problem is the escaped double quotes, but I don't know how to tell jq to remove them, or whether there is another workaround.
What you have is a JSON string which happens to contain a JSON object. You can decode the contents of the string with the fromjson function.
$ pbpaste | jq "."
"{\"name\":\"Hans\", \"Hobbies\":[\"Car\",\"Swimming\"]}"
$ pbpaste | jq "fromjson"
{
  "name": "Hans",
  "Hobbies": [
    "Car",
    "Swimming"
  ]
}
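If you are not sure whether the clipboard holds a JSON-encoded string or an ordinary JSON object, one possible variation is to check the type first. This is only a sketch: the filter variable name is made up, and printf stands in for pbpaste so the example runs anywhere.

```shell
# Decode only when the input is a string; pass other values through as-is.
filter='if type == "string" then fromjson else . end'

# -c prints compact output so the two results are easy to compare.
from_string=$(printf '%s\n' '"{\"name\":\"Hans\"}"' | jq -c "$filter")
from_object=$(printf '%s\n' '{"name":"Hans"}' | jq -c "$filter")

printf '%s\n' "$from_string" "$from_object"
```

Both lines should show {"name":"Hans"}; drop -c to get the pretty-printed form.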

Parsing JSON String without separator using jq

I have a JSON string without the separator ,. How do I parse it using jq?
$ echo '{"access_token":"XXXX""expires_in":300"token_type":"Bearer"}' | jq -r .access_token
The above line gives me the below error:
parse error: Expected separator between values at line 1
I understand that the issue is that the JSON string provided is not comma-separated, but this is what I am getting as a response from the server. How do I parse such a string? I want to retrieve the value for the key "access_token".
You can use a regular expression with sed if you know the access token never contains quotes.
echo '{"access_token":"XXXX""expires_in":300"token_type":"Bearer"}' |
sed 's/.*"access_token":"\([^"]*\)".*/\1/'
The capture group between \( and \) captures the string between the quotes, and \1 in the replacement string extracts it. The leading and trailing .* make the pattern match the whole line, so everything except the token is discarded.
Here are two just-jq solutions, each with its own degree of brittleness. The first one attempts to convert each entire input line into valid JSON:
Using fromjson
echo '{"access_token":"XXXX""expires_in":300"token_type":"Bearer"}' |
jq -rR 'gsub("(?<k>\"[^\"]*\")"; "," + .k )
| gsub("{,\"";"{\"") | gsub(":,\""; ":\"")
| fromjson | .access_token'
XXXX
Assume the value is a string on the same line
jq -rR 'sub(".*\"access_token\" *: *\"(?<v>[^\"]*)\".*"; .v )'
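A possible variation on the second solution uses capture instead of sub. The difference is what happens when the key is missing: sub returns the input line unchanged, whereas capture produces no output at all. The sketch below assumes a jq version with regex support (1.5 or later); the second input is invented to show the no-match case.

```shell
response='{"access_token":"XXXX""expires_in":300"token_type":"Bearer"}'
re_filter='capture("\"access_token\" *: *\"(?<v>[^\"]*)\"") | .v'

# Normal case: the named group v holds the token.
token=$(printf '%s\n' "$response" | jq -rR "$re_filter")

# Hypothetical response without the key: capture emits nothing.
missing=$(printf '%s\n' '{"expires_in":300}' | jq -rR "$re_filter")

printf 'token=%s missing=%s\n' "$token" "$missing"
```

An empty result is easier to test for in a script than an unchanged input line, though which behaviour you want depends on how you handle errors.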

Convert json filtered into csv with jq

I have file that looks like this:
$ cat sample-test.json |jq .
{
  "logRef": "c4fa4367-23f6-462f-b5fd-f972d0916a30",
  "timestamp": 1563268297545,
  "someOtherField": "nonImportantValue"
}
{
  "logRef": "c4fa4367-23f6-462f-b5fd-f972d0916a31",
  "timestamp": 1563268297595,
  "someOtherField2": "nonImportantValue3"
}
And I would like to convert it to csv like this:
logRef;timestamp
c4fa4367-23f6-462f-b5fd-f972d0916a30;1563268297545
c4fa4367-23f6-462f-b5fd-f972d0916a31;1563268297595
I was trying
$ cat sample-test.json |jq '.logRef, .timestamp |@csv'
jq: error (at <stdin>:1): string ("c4fa4367-2...) cannot be csv-formatted, only array
jq: error (at <stdin>:2): string ("c4fa4367-2...) cannot be csv-formatted, only array
Your input is fine (it's a JSON stream).
The problem with your filter is that @csv expects an array. So this will work:
[.logRef,.timestamp] | @csv
However, @csv quotes strings, so if you want your strings unquoted (which might mean the result is no longer valid CSV), you could use string interpolation instead:
"\(.logRef),\(.timestamp)"
In all cases, you'll need to use jq's -r command-line option.
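To get the exact layout asked for in the question (a header line and ; as the separator), the interpolation approach can be sketched as follows. The logRef values are shortened placeholders, not the real ones, and the header is printed by the shell rather than by jq.

```shell
csv=$(
  # Header line, written by the shell.
  printf '%s\n' 'logRef;timestamp'
  # Hypothetical two-document JSON stream, one object per line.
  printf '%s\n' \
    '{"logRef":"ref-1","timestamp":1563268297545,"someOtherField":"x"}' \
    '{"logRef":"ref-2","timestamp":1563268297595,"someOtherField2":"y"}' |
    jq -r '"\(.logRef);\(.timestamp)"'
)
printf '%s\n' "$csv"
```

Because nothing is quoted or escaped here, this is only safe when the values themselves never contain ; or newlines.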
Alternatively, you can turn your JSON file into a single document by wrapping the objects in a root array [] and separating them with commas. (jq also accepts the original stream of objects, so this step is optional.)
> cat sample-test.json
[{
  "logRef": "c4fa4367-23f6-462f-b5fd-f972d0916a30",
  "timestamp": 1563268297545,
  "someOtherField": "nonImportantValue"
},
{
  "logRef": "c4fa4367-23f6-462f-b5fd-f972d0916a31",
  "timestamp": 1563268297595,
  "someOtherField2": "nonImportantValue3"
}]
cat sample-test.json |jq -r 'map(.logRef), map(.timestamp) | @csv'
"c4fa4367-23f6-462f-b5fd-f972d0916a30","c4fa4367-23f6-462f-b5fd-f972d0916a31"
1563268297545,1563268297595
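I've also changed the command to use the map() function. Note that this prints all logRef values on the first row and all timestamps on the second, rather than one row per record.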
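If you keep the array form but want one CSV row per record, one option is to iterate over the array with .[] and build a two-element array for each object. A small sketch with shortened, made-up logRef values:

```shell
# Hypothetical array input with placeholder logRef values.
array='[{"logRef":"ref-1","timestamp":1563268297545},
        {"logRef":"ref-2","timestamp":1563268297595}]'

# .[] emits each object in turn; @csv formats one array per output row.
rows=$(printf '%s\n' "$array" |
  jq -r '.[] | [.logRef, .timestamp] | @csv')
printf '%s\n' "$rows"
```

The strings are quoted by @csv and the numbers are not, so the first row should read "ref-1",1563268297545.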

jq double backslash sometime removed

I have a first json file like this:
{
  "env_vars": {
    "TERRAFORM_CFG_TLS_CERT": "-----BEGIN CERTIFICATE----\\nMIIIqzCCB5O"
  }
}
If I use the command:
cat <file> | jq -r '.env_vars'
The result is as expected (the backslashes are still there):
{
  "TERRAFORM_CFG_TLS_CERT": "-----BEGIN CERTIFICATE----\\nMIIIqzCCB5O"
}
But if I execute this command:
cat <file> | jq -r '.env_vars' | jq -r 'keys[] as $k | "\($k)=\"\(.[$k])\""'
The result is:
TERRAFORM_CFG_TLS_CERT="-----BEGIN CERTIFICATE----\nMIIIqzCCB5O"
=> One backslash has been removed. Why?
How can I avoid this?
Thanks.
Using the -r option tells jq to "translate" the JSON string into a "raw" string by interpreting the escape sequences that are special to JSON (see e.g. http://json.org). Thus, following the minimal, complete, verifiable example (MCVE) guidelines a bit more closely, we could start with:
$ jq . <<< '"X\\nY"'
"X\\nY"
$ jq -r . <<< '"X\\nY"'
X\nY
If you check the json.org specification of strings, you'll see this is exactly correct.
So if for some reason you want each occurrence of \\ in the JSON string to be replaced by two backslash characters (i.e. JSON: "\\\\"), you could use sub or gsub. That's a bit tricky, because the first argument of these functions is a regex. Behold:
$ jq -r 'gsub("\\\\"; "\\\\")' <<< '"X\\nY"'
X\\nY
You should output the string as json to preserve the escapes. By taking a string and outputting it raw, you're getting exactly what that string was, a literal backslash followed by an n.
$ ... | jq -r '.env_vars | to_entries[] | "\(.key): \(.value | tojson)"'
If any of the values are non-strings, add a tostring to the filter.
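Here is a small side-by-side sketch of the two behaviours, using the KEY="value" layout from the question's command. The document is the one from the question, held in a shell variable instead of a file.

```shell
doc='{"env_vars":{"TERRAFORM_CFG_TLS_CERT":"-----BEGIN CERTIFICATE----\\nMIIIqzCCB5O"}}'

# Plain interpolation: the value is inserted as the decoded string,
# so the JSON escape \\ comes out as a single backslash.
raw=$(printf '%s\n' "$doc" |
  jq -r '.env_vars | to_entries[] | "\(.key)=\"\(.value)\""')

# tojson re-encodes the value as JSON text, so the escapes (and the
# surrounding quotes) are preserved.
kept=$(printf '%s\n' "$doc" |
  jq -r '.env_vars | to_entries[] | "\(.key)=\(.value | tojson)"')

printf '%s\n' "$raw" "$kept"
```

The first line should show \n with one backslash and the second \\n with two. printf '%s\n' is used instead of echo because some shells' echo would interpret the backslashes again.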