How to transform nested JSON to csv using jq

I have tried to transform JSON in the following format to CSV using jq on the Linux command line, but with no success. Any help or guidance would be appreciated.
{
"dir/file1.txt": [
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Rule": "Rulechecker.Rule15",
"Description": "",
"Line": 11,
"Link": "www.sample.com",
"Message": "Some message",
"Severity": "error",
"Span": [
1,
3
],
"Match": "[id"
},
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Check": "Rulechecker.Rule16",
"Description": "",
"Line": 27,
"Link": "www.sample.com",
"Message": "Fix the rule",
"Severity": "error",
"Span": [
1,
3
],
"Match": "[id"
}
],
"dir/file2.txt": [
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Rule": "Rulechecker.Rule17",
"Description": "",
"Line": 51,
"Link": "www.example.com",
"Message": "Fix anoher 'rule'?",
"Severity": "error",
"Span": [
1,
18
],
"Match": "[source,terminal]\n----\n"
}
]
}
Ultimately, I want to present a matrix with dir/file1.txt, dir/file2.txt as rows on the left of the matrix, and all the keys to be presented as column headings, with the corresponding values.
| Filename | SettingA | SettingB | Rule | More columns... |
| -------- | -------------- | -------------- | -------------- | -------------- |
| dir/file1.txt | | null | Rulechecker.Rule15 | |
| dir/file1.txt | | null | Rulechecker.Rule16 | |
| dir/file2.txt | | null | Rulechecker.Rule17 | |

Iterate over the top-level key-value pairs obtained by to_entries to get access to the key names, then once again over its content array in .value to get the array items. Also note that newlines as present in the sample's last .Match value cannot be used as is in a line-oriented format such as CSV. Here, I chose to replace them with the literal string \n using gsub.
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
(.Setting | .SettingA, .SettingB),
.Rule // .Check, .Description, .Line, .Link,
.Message, .Severity, .Span[], .Match
| strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
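To cross-check the jq output, the same flattening can be sketched in Python. This is only a rough equivalent under stated assumptions: the function names flatten_rows and to_csv are made up for illustration, the input is assumed to be already parsed (for example with json.load), and Python's csv module quotes fields differently from jq's @csv, so the text will not match byte for byte.

```python
import csv
import io

def flatten_rows(doc):
    """Yield one flat row per array item, prefixed with its top-level key."""
    for filename, items in doc.items():        # plays the role of to_entries[]
        for item in items:                     # plays the role of .value[]
            rule = item.get("Rule")
            if rule is None:                   # rough stand-in for `.Rule // .Check`
                rule = item.get("Check")
            row = [
                filename,
                item["Setting"]["SettingA"],
                item["Setting"]["SettingB"],
                rule,
                item["Description"], item["Line"], item["Link"],
                item["Message"], item["Severity"],
                *item["Span"],
                item["Match"],
            ]
            # Escape newlines in string fields only, like `strings |= gsub(...)`.
            yield [v.replace("\n", "\\n") if isinstance(v, str) else v for v in row]

def to_csv(rows):
    """Serialize rows with the csv module (quoting differs from jq's @csv)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Calling to_csv(flatten_rows(doc)) on the parsed sample should give one line per array item, with the file name in the first column.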
If you just want to dump all the values in the order they appear, you can simplify this by using .. | scalars to traverse the levels of the document:
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
(.. | scalars) | strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
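What .. | scalars does can be pictured as a depth-first walk that emits only the leaf values. A minimal Python sketch of that idea follows; the function name scalars is borrowed from jq for readability, and it assumes the parsed document keeps its keys in file order, as json.load does.

```python
def scalars(node):
    """Yield every non-container value in document order, like `.. | scalars`."""
    if isinstance(node, dict):
        for value in node.values():
            yield from scalars(value)
    elif isinstance(node, list):
        for value in node:
            yield from scalars(value)
    else:
        # Strings, numbers, booleans and None (JSON null) are all leaves.
        yield node
```

A row is then simply [filename, *scalars(item)] for each array item.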
As for the column headings: in the first case I'd add them manually, since you spell out each value path anyway. In the latter case it is a little more complicated, as not all columns have immediate names (what should the items of the array Span be called?), and some seem to change (in the second record, the column Rule is called Check). You could, however, stick to the names of the first record, taking the deepest field name either as is or with the array indices appended. Something along these lines would do:
jq -r '
to_entries[0].value[0] | ["Filename", (
path(..|scalars) | .[.[[map(strings)|last]]|last:] | join(".")
)] | @csv
'
"Filename","SettingA","SettingB","Rule","Description","Line","Link","Message","Severity","Span.0","Span.1","Match"
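The header rule (deepest field name, plus any array indices that follow it) may be easier to follow when spelled out step by step. The Python sketch below is an illustration only: scalar_paths, column_name and header are hypothetical names, and it assumes the first record is an object, so every path contains at least one string key.

```python
def scalar_paths(node, prefix=()):
    """Yield the path (tuple of keys and indices) to each leaf, like `path(.. | scalars)`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from scalar_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from scalar_paths(value, prefix + (index,))
    else:
        yield prefix

def column_name(path):
    """Keep the deepest field name and whatever array indices come after it."""
    last_key = max(i for i, part in enumerate(path) if isinstance(part, str))
    return ".".join(str(part) for part in path[last_key:])

def header(first_record):
    """Build the header row from the first record only."""
    return ["Filename"] + [column_name(p) for p in scalar_paths(first_record)]
```

As in the jq version, records whose keys differ from the first one (Check instead of Rule) do not influence the header.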

Related

how to jq by the desired key is inside nested json

Here is the id.json
{
"name": "peter",
"path": "desktop/name",
"description": "male",
"env1": {
"school": "AAA",
"height": "150",
"weight": "80"
},
"env2": {
"school": "BBB",
"height": "160",
"weight": "70"
}
}
There can be more entries (env3, env4, etc.), which are created automatically.
I am trying to print, for each env entry, its height and weight keys together with their values,
so that the output looks like:
env1:height:150
env1:weight:80
env2:height:160
env2:weight:70
env3:height:xxx
.
.
.
The shell command I tried, jq .env1.height... id.json, only produces output when env1, env2 are named explicitly as keys, so it cannot handle env3, env4. I also tried jq to_entries[] to convert the JSON into key/value pairs, but because of the first few rows I cannot get .value.weight as output. Any ideas, please?
Update:
I edited the JSON to remove these three lines:
"name": "peter",
"path": "desktop/name",
"description": "male",
Then I ran the command below:
jq 'to_entries[] | select(.value.height!=null) | [.key, .value.height, .value.weight]' id2.json
and got the result below:
[
"dev",
"1",
"1"
]
[
"sit",
"1",
"1"
]
This is almost what I need, but is there a way to remove the outer-level JSON, please?

Using your data as initially presented, the following jq program:
keys_unsorted[] as $k
| select($k|startswith("env"))
| .[$k] | to_entries[]
| select(.key|IN("height","weight"))
| [$k, .key, .value]
| join(":")
produces
env1:height:150
env1:weight:80
env2:height:160
env2:weight:70
An answer to the supplementary question
According to one interpretation of the supplementary question, a solution would be:
keys_unsorted[] as $k
| .[$k]
| objects
| select(.height and .weight)
| to_entries[]
| select(.key|IN("height","weight"))
| [$k, .key, .value]
| join(":")
Equivalently, but without the redundancy:
["height","weight"] as $hw
| keys_unsorted[] as $k
| .[$k]
| objects
| . as $object
| select(all($hw[]; $object[.]))
| $hw[]
| [$k, ., $object[.]]
| join(":")
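For comparison, the logic of the last program can be sketched in Python. The names env_lines and truthy are hypothetical, and the input is assumed to be the parsed id.json; truthy mirrors jq's rule that only null and false count as false.

```python
def truthy(value):
    """jq treats only null and false as falsy; mirror that here."""
    return value is not None and value is not False

def env_lines(doc, fields=("height", "weight")):
    """Return key:field:value lines for each nested object that has all the fields."""
    lines = []
    for key, value in doc.items():                     # like keys_unsorted[] as $k
        if not isinstance(value, dict):                # like `objects`: skip name, path, ...
            continue
        if not all(truthy(value.get(f)) for f in fields):
            continue                                   # like select(all($hw[]; $object[.]))
        for f in fields:
            lines.append(f"{key}:{f}:{value[f]}")      # like join(":")
    return lines
```

Because the selection is based on the shape of each value rather than on the key name, env3, env4 and so on are picked up without further changes.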

Reducing after filtering doesn't add up with jq

I have the following data:
[
{
"name": "example-1",
"amount": 4
},
{
"name": "foo",
"amount": 42
},
{
"name": "example-2",
"amount": 6
}
]
I would like to filter objects with a .name containing "example" and reduce the .amount property.
This is what I tried to do:
json='[{"name":"example-1","amount":4}, {"name": "foo","amount":42}, {"name": "example-2","amount":6}]'
echo $json | jq '.[] | select(.name | contains("example")) | .amount | add'
I get this error:
jq: error (at :1): Cannot iterate over number (4)
I think that the output of .[] | select(.name | contains("example")) | .amount is a stream, not an array, so I cannot add the values together.
But how can I output an array instead, after the select and the lookup?
I know there is a map function, and map(.amount) | add works, but then the filtering is missing.
I can't do a select without .[] | before it, and I think that's where the "stream" problem comes from...

As you say, add/0 expects an array as input.
Since it's a useful idiom, consider using map(select(_)):
echo "$json" | jq 'map(select(.name | contains("example")) | .amount) | add'
However, sometimes it's better to use a stream-oriented approach:
def add(s): reduce s as $x (null; . + $x);
add(.[] | select(.name | contains("example")) | .amount)
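The array versus stream distinction has a loose analogue in Python, where a sum can consume either a list or a lazily evaluated generator. The sketch below is only an analogy, with total_amount as a made-up name; it corresponds to the stream-oriented variant, since no intermediate list is built.

```python
def total_amount(items, needle="example"):
    """Sum .amount over the items whose name contains the needle."""
    # The generator expression is consumed one value at a time,
    # much like reduce consumes the stream in the jq definition above.
    return sum(item["amount"] for item in items if needle in item["name"])
```

With the sample data, the two matching amounts are 4 and 6.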

How to not let jq interpret the newline character when exporting to CSV

I want to convert the following JSON content stored in a file tmp.json
{
"results": [
[
{
"field": "field1",
"value": "value1-1"
},
{
"field": "field2",
"value": "value1-2\n"
}
],
[
{
"field": "field1",
"value": "value2-1"
},
{
"field": "field2",
"value": "value2-2\n"
}
]
]
}
into a CSV output
"field1","field2"
"value1-1","value1-2\n"
"value2-1","value2-2\n"
When I use this jq command, however,
cat tmp.json | jq -r '.results | (first | map(.field)), (.[] | map(.value)) | @csv'
I get this result:
"field1","field2"
"value1-1","value1-2
"
"value2-1","value2-2
"
How should the jq command be written to get the desired CSV result?

For a jq-only solution, you can use gsub("\n"; "\\n"). I'd go with something like this:
.results
| (.[0] | map(.field)),
(.[] | map( .value | gsub("\n"; "\\n")))
| @csv
Using your JSON and invoking this with the -r command line option yields:
"field1","field2"
"value1-1","value1-2\n"
"value2-1","value2-2\n"
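The shape of this transformation (header from the first record, then one row of escaped values per record) can also be sketched in Python for comparison. The name results_to_rows is hypothetical, the input is assumed to be the parsed tmp.json, and all values are assumed to be strings, as in the sample.

```python
def results_to_rows(doc):
    """Return the header row followed by one row of escaped values per record."""
    results = doc["results"]
    header = [cell["field"] for cell in results[0]]        # like .[0] | map(.field)
    rows = [
        # Replace each real newline with the two characters backslash and n.
        [cell["value"].replace("\n", "\\n") for cell in record]
        for record in results
    ]
    return [header] + rows
```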

If newlines are the only thing you need to handle, you could do a string replacement.
cat tmp.json | jq -r '.results | (first | map(.field)), (.[] | map(.value) | map(gsub("\\n"; "\\n"))) | @csv'

SQL query to return an attribute as an array of objects

My DB (MySQL) looks as follows:
TASKS:
-----------------
| id | desc |
-----------------
| 1 | 'dishes' |
| 2 | 'dust' |
-----------------
IMAGES:
---------------------------
| id | task_id | url |
---------------------------
| 1 | 1 | 'http1' |
| 2 | 1 | 'http2' |
---------------------------
I would like to get a response in the following structure (nested array of objects with id, url):
"tasks": [
{
"id": 1,
"desc": "dishes",
"images": [
{
"id": 1,
"url": "http1"
},
{
"id": 2,
"url": "http2"
}
]
},
...
]
The closest I have got was with this code:
SELECT
t.id,
t.desc,
JSON_ARRAYAGG(i.url) AS images
FROM tasks AS t
LEFT JOIN images AS i ON t.id=i.task_id
GROUP BY t.id
And got in return:
[
{
"id": 1,
"desc": "dishes",
"images": [
"http1",
"http2"
]
}
...
]
The response above is problematic, as I also need the image ids.
I have also tried using JSON_OBJECTAGG (which is not ideal); however, I got the SQL error below:
"JSON documents may not contain NULL member names."
Indeed, some tasks may have no matching images, and I want them included in the response.
How should I refactor my code to get the desired response from the server?

Summing values from a JSON array in Snowflake

I have source data that contains the following type of JSON array:
[
[
"source 1",
250
],
[
"other source",
58
],
[
"more stuff",
42
],
...
]
There can be 1..N pairs of strings and values like this. How can I sum all the values together from this JSON?

You can use FLATTEN, which produces a single row for each element of the input array. Then you can access the number in that element directly.
Imagine you have this input table:
create or replace table input as
select parse_json($$
[
[
"source 1",
250
],
[
"other source",
58
],
[
"more stuff",
42
]
]
$$) as json;
FLATTEN will do this:
select index, value from input, table(flatten(json));
-------+-------------------+
INDEX | VALUE |
-------+-------------------+
0 | [ |
| "source 1", |
| 250 |
| ] |
1 | [ |
| "other source", |
| 58 |
| ] |
2 | [ |
| "more stuff", |
| 42 |
| ] |
-------+-------------------+
And so you can just use VALUE[1] to access what you want:
select sum(value[1]) from input, table(flatten(json));
---------------+
SUM(VALUE[1]) |
---------------+
350 |
---------------+
