Convert JSON to CSV with only selected keys

I have a JSON file with the following content.
{
"syncToken": "1612442658",
"createDate": "2021-02-04-12-44-18",
"prefixes": [
{
"ip_prefix": "3.5.140.0/22",
"region": "ap-northeast-2",
"service": "AMAZON",
"network_border_group": "ap-northeast-2"
},
{
"ip_prefix": "35.180.0.0/16",
"region": "eu-west-3",
"service": "AMAZON",
"network_border_group": "eu-west-3"
},
{
"ip_prefix": "52.93.178.234/32",
"region": "us-west-1",
"service": "AMAZON",
"network_border_group": "us-west-1"
}
]
}
My requirement is to convert this JSON to CSV in the format below.
ip_prefix region service
3.5.140.0/22 ap-northeast-2 AMAZON
35.180.0.0/16 eu-west-3 AMAZON
52.93.178.234/32 us-west-1 AMAZON
I used jq to convert the data with the command below:
jq -r '(.prefixes[0] | keys_unsorted), (.prefixes[] | to_entries | map(.value)) | @tsv' ip-ranges.json
But it's exporting all the keys. I need help exporting only a few of the many keys.

Say you had a list of fields in $fields. Then all you'd need is this:
.prefixes |               # The array of records from which we will build our rows.
(
  $fields,                # The header row
  (                       # The data rows
    .[] |                 # For each input,
    [ .[ $fields[] ] ]    # create an array of the selected values
  )
) |
@tsv
This will serve as the basis of all the following solutions. What will differ is how we build $fields.
Allow list:
[ "ip_prefix", "region", "service" ] as $fields |
.prefixes |
( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
Example use:
jq -r --argjson fields '[ "ip_prefix", "region", "service" ]' '
.prefixes |
( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
' data.json
Blocked item:
.prefixes |
( .[0] | keys_unsorted | map(select( . != "network_border_group" )) ) as $fields |
( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
Block list:
[ "network_border_group" ] as $blocked |
.prefixes |
( .[0] | keys_unsorted - $blocked ) as $fields |
( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
Example use:
jq -r --argjson blocked '[ "network_border_group" ]' '
.prefixes |
( .[0] | keys_unsorted - $blocked ) as $fields |
( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
' data.json
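As a sanity check, here is the block-list variant run end to end against a trimmed copy of the question's data (the two-record sample and the file name are assumptions for this sketch; it needs jq on the PATH):

```shell
#!/bin/sh
# Recreate a trimmed version of the question's input.
cat > ip-ranges.json <<'EOF'
{
  "prefixes": [
    { "ip_prefix": "3.5.140.0/22",  "region": "ap-northeast-2", "service": "AMAZON", "network_border_group": "ap-northeast-2" },
    { "ip_prefix": "35.180.0.0/16", "region": "eu-west-3",      "service": "AMAZON", "network_border_group": "eu-west-3" }
  ]
}
EOF

# Subtract the block list from the first record's keys, then emit header + rows.
jq -r --argjson blocked '[ "network_border_group" ]' '
  .prefixes |
  ( .[0] | keys_unsorted - $blocked ) as $fields |
  ( $fields, ( .[] | [ .[ $fields[] ] ] ) ) | @tsv
' ip-ranges.json
```

This prints a tab-separated header (`ip_prefix`, `region`, `service`) followed by one row per prefix.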

The following variations assume a "whitelist" approach is appropriate:
Without a header row
.prefixes[] | [.ip_prefix, .region, .service] | @csv
Or, specifying the whitelist as an argument:
jq -r --argjson whitelist '["ip_prefix","region","service"]' '
.prefixes[] | [ .[$whitelist[]] ] | @csv'
With a header row:
["ip_prefix","region","service"] as $whitelist
| $whitelist,
(.prefixes[] | [getpath($whitelist[]|[.])])
| @csv
Etc.
Note that none of the above require any special ordering of keys in the objects in the source.
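That ordering claim is easy to verify: `.[ $fields[] ]` looks each value up by name, so a record whose keys arrive in a different order still lands in the right columns. A minimal sketch (the reordered second record is invented for illustration):

```shell
#!/bin/sh
# Second object deliberately lists its keys in a different order.
echo '{"prefixes":[
  {"ip_prefix":"3.5.140.0/22","region":"ap-northeast-2","service":"AMAZON"},
  {"service":"AMAZON","region":"eu-west-3","ip_prefix":"35.180.0.0/16"}
]}' | jq -r '
  ["ip_prefix","region","service"] as $whitelist
  | $whitelist, (.prefixes[] | [ .[$whitelist[]] ])
  | @csv'
```

Both rows come out in the same column order as the header.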

One way to do it would be to use the update operator |= to delete the unwanted key-value pair.
jq -r '.prefixes[] |= del(.network_border_group)
| (.prefixes[0] | keys_unsorted),
(.prefixes[] | to_entries | map(.value))
| @tsv'
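A runnable sketch of the `del` approach against a trimmed copy of the question's data (the file name is an assumption): `del` removes the key-value pair from every record, after which the generic header/row logic applies.

```shell
#!/bin/sh
cat > ip-ranges.json <<'EOF'
{
  "prefixes": [
    { "ip_prefix": "3.5.140.0/22",  "region": "ap-northeast-2", "service": "AMAZON", "network_border_group": "ap-northeast-2" },
    { "ip_prefix": "35.180.0.0/16", "region": "eu-west-3",      "service": "AMAZON", "network_border_group": "eu-west-3" }
  ]
}
EOF

# Delete the unwanted key from each record, then build header + rows.
jq -r '.prefixes[] |= del(.network_border_group)
  | (.prefixes[0] | keys_unsorted),
    (.prefixes[] | to_entries | map(.value))
  | @tsv' ip-ranges.json
```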

Related

How to transform nested JSON to csv using jq

I have tried to transform JSON in the following format to CSV using jq on the Linux command line, but with no success. Any help or guidance would be appreciated.
{
"dir/file1.txt": [
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Rule": "Rulechecker.Rule15",
"Description": "",
"Line": 11,
"Link": "www.sample.com",
"Message": "Some message",
"Severity": "error",
"Span": [
1,
3
],
"Match": "[id"
},
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Check": "Rulechecker.Rule16",
"Description": "",
"Line": 27,
"Link": "www.sample.com",
"Message": "Fix the rule",
"Severity": "error",
"Span": [
1,
3
],
"Match": "[id"
}
],
"dir/file2.txt": [
{
"Setting": {
"SettingA": "",
"SettingB": null
},
"Rule": "Rulechecker.Rule17",
"Description": "",
"Line": 51,
"Link": "www.example.com",
"Message": "Fix anoher 'rule'?",
"Severity": "error",
"Span": [
1,
18
],
"Match": "[source,terminal]\n----\n"
}
]
}
Ultimately, I want to present a matrix with dir/file1.txt, dir/file2.txt as rows on the left of the matrix, and all the keys to be presented as column headings, with the corresponding values.
| Filename | SettingA | SettingB | Rule | More columns... |
| -------- | -------------- | -------------- | -------------- | -------------- |
| dir/file1.txt | | null | Rulechecker.Rule15 | |
| dir/file1.txt | | null | Rulechecker.Rule16 | |
| dir/file2.txt | | null | Rulechecker.Rule17 | |
Iterate over the top-level key-value pairs obtained by to_entries to get access to the key names, then once again over its content array in .value to get the array items. Also note that newlines as present in the sample's last .Match value cannot be used as is in a line-oriented format such as CSV. Here, I chose to replace them with the literal string \n using gsub.
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
(.Setting | .SettingA, .SettingB),
.Rule // .Check, .Description, .Line, .Link,
.Message, .Severity, .Span[], .Match
| strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
Demo
If you just want to dump all the values in the order they appear, you can simplify this by using .. | scalars to traverse the levels of the document:
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
(.. | scalars) | strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
Demo
As for the column headings: for the first case I'd add them manually, as you spell out each value path anyway. The latter case is a little more complicated, as not all columns have immediate names (what should the items of the array Span be called?), and some seem to change (in the second record, the column Rule is called Check). You could, however, stick to the names of the first record, taking the deepest field name either as is or adding the array indices. Something along these lines would do:
jq -r '
to_entries[0].value[0] | ["Filename", (
path(..|scalars) | .[.[[map(strings)|last]]|last:] | join(".")
)] | @csv
'
"Filename","SettingA","SettingB","Rule","Description","Line","Link","Message","Severity","Span.0","Span.1","Match"
Demo

Mapping an array of objects to plain array with jq

I'm working on implementing translations in my app; however, the format of the translations is not usable as-is. I made a shell script that uses jq to try to reshape this array, but I cannot get the output that I want.
The JSON I get from my service looks something like this.
{
"result": {
"terms": [
{
"term": "title",
"translation": {
"content": "Welcome to {{user}}"
}
},
{
"term": "car",
"translation": {
"content": {
"one": "car",
"other": "cars"
}
}
}
]
}
}
The output that I want is something like this.
{
"title": "Welcome to {{user}}",
"car_one": "car",
"car_other": "cars"
}
I've managed to strip away the unneeded parts of my objects, but I can't figure out how to append something to the key, e.g. turning "car" into "car_one", or how to properly add the keys to the resulting object.
This is currently where I'm at https://jqplay.org/s/P6KIEVX5sWp
Probably not the most efficient solution, but this ought to work and is fairly readable:
.result.terms
| map(
(select(.translation.content | type == "object")
| .term as $term | .translation.content | to_entries[] | .key |= "\($term)_\(.)"),
(select(.translation.content | type == "string")
| { key: .term, value: .translation.content })
)
| from_entries
Or with if-then-else:
.result.terms
| map(
if .translation.content | type == "object" then
.term as $term | .translation.content | to_entries[] | .key |= "\($term)_\(.)"
else
{ key: .term, value: .translation.content }
end
)
| from_entries
Cleverly place (and name) the variable and you get down to:
.result.terms
| map(.term as $key
| .translation.content
| if type == "object" then
to_entries[] | .key |= "\($key)_\(.)"
else
{ $key, value: . }
end
)
| from_entries
Even more concise by combining the optional/error suppression operator with the alternative operator:
.result.terms
| map(.term as $key
| .translation.content
| (to_entries[] | .key |= "\($key)_\(.)")? // { $key, value: . }
)
| from_entries
Or, if you prefer (so many possibilities):
.result.terms
| map(
.term as $key
| .translation.content as $value | $value
| (to_entries[] | .key |= "\($key)_\(.)")? // { $key, $value }
)
| from_entries
"\($key)_\(.)" is string interpolation and equivalent to ($key + "_" + .)
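For reference, here is the most concise variant run against the question's sample, inlined so the sketch is self-contained:

```shell
#!/bin/sh
echo '{"result":{"terms":[
  {"term":"title","translation":{"content":"Welcome to {{user}}"}},
  {"term":"car","translation":{"content":{"one":"car","other":"cars"}}}
]}}' | jq '
  .result.terms
  | map(.term as $key
    | .translation.content
    # Object content: explode and prefix each key; string content: keep as-is.
    | (to_entries[] | .key |= "\($key)_\(.)")? // { $key, value: . }
    )
  | from_entries'
```

This produces exactly the desired object, with `title`, `car_one`, and `car_other` as keys.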
The requirements aren't entirely clear to me, but the following does follow the logic of your approach and does produce the required output:
.result
| [.terms[]
| if .translation.content|type == "string" then {title: .translation.content}
else .term as $term
| .translation
| (.content|keys) as $keys
| ([$keys[] as $key | {($term + "_" + $key): .content[$key]}] | add)
end ]
| add

How to select with jq when the desired key is inside nested JSON

Here is the id.json
{
"name": "peter",
"path": "desktop/name",
"description": "male",
"env1": {
"school": "AAA",
"height": "150",
"weight": "80"
},
"env2": {
"school": "BBB",
"height": "160",
"weight": "70"
}
}
There can be more (env3, env4, etc.), created automatically.
I am trying to get each env by using height and weight as keys,
so the output looks like:
env1:height:150
env1:weight:80
env2:height:160
env2:weight:70
env3:height:xxx
.
.
.
The shell command I tried, jq .env1.height... id.json, can only get the output by naming env1, env2 as keys, so it cannot handle env3, env4. I also tried using jq's to_entries[] to convert the JSON into key/value pairs, but the first few rows (name, path, description) prevent me from getting .value.weight as output. Any ideas?
Update:
I edited the JSON to remove these three lines:
"name": "peter",
"path": "desktop/name",
"description": "male",
Then I ran the command below:
jq 'to_entries[] | select(.value.height!=null) | [.key, .value.height, .value.weight]' id2.json
I get the result below:
[
"dev",
"1",
"1"
]
[
"sit",
"1",
"1"
]
This is almost what I need, but any idea how to remove the outer JSON array structure?
Using your data as initially presented, the following jq program:
keys_unsorted[] as $k
| select($k|startswith("env"))
| .[$k] | to_entries[]
| select(.key|IN("height","weight"))
| [$k, .key, .value]
| join(":")
produces
env1:height:150
env1:weight:80
env2:height:160
env2:weight:70
An answer to the supplementary question
According to one interpretation of the supplementary question, a solution would be:
keys_unsorted[] as $k
| .[$k]
| objects
| select(.height and .weight)
| to_entries[]
| select(.key|IN("height","weight"))
| [$k, .key, .value]
| join(":")
Equivalently, but without the redundancy:
["height","weight"] as $hw
| keys_unsorted[] as $k
| .[$k]
| objects
| . as $object
| select(all($hw[]; $object[.]))
| $hw[]
| [$k, ., $object[.]]
| join(":")
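Put together as a runnable sketch (using the sample id.json from the question; `all/2` and generic object indexing need a reasonably recent jq):

```shell
#!/bin/sh
cat > id.json <<'EOF'
{
  "name": "peter",
  "path": "desktop/name",
  "description": "male",
  "env1": { "school": "AAA", "height": "150", "weight": "80" },
  "env2": { "school": "BBB", "height": "160", "weight": "70" }
}
EOF

# Only object values that have both fields are considered, whatever their key name,
# so the top-level string fields (name, path, description) are skipped automatically.
jq -r '
  ["height","weight"] as $hw
  | keys_unsorted[] as $k
  | .[$k]
  | objects
  | . as $object
  | select(all($hw[]; $object[.]))
  | $hw[]
  | [$k, ., $object[.]]
  | join(":")
' id.json
```

This prints the same four `env1:height:150` … `env2:weight:70` lines as the first program.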

Reducing after filtering doesn't add up with jq

I have the following data:
[
{
"name": "example-1",
"amount": 4
},
{
"name": "foo",
"amount": 42
},
{
"name": "example-2",
"amount": 6
}
]
I would like to filter objects with a .name containing "example" and reduce the .amount property.
This is what I tried to do:
json='[{"name":"example-1","amount":4}, {"name": "foo","amount":42}, {"name": "example-2","amount":6}]'
echo $json | jq '.[] | select(.name | contains("example")) | .amount | add'
I get this error:
jq: error (at <stdin>:1): Cannot iterate over number (4)
I think that the output of .[] | select(.name | contains("example")) | .amount is a stream, and not an array, so I cannot add the values together.
But how could I do to output an array instead, after the select and the lookup?
I know there is a map function and map(.amount) | add works, but the filtering isn't here.
I can't do a select without .[] | before, and I think that's where the "stream" problem comes from...
As you say, add/0 expects an array as input.
Since it's a useful idiom, consider using map(select(...)):
echo "$json" | jq 'map(select(.name | contains("example")) | .amount) | add'
However, sometimes it's better to use a stream-oriented approach:
def add(s): reduce s as $x (null; . + $x);
add(.[] | select(.name | contains("example")) | .amount)
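Both variants applied to the question's data, as a self-contained sketch (4 + 6 = 10):

```shell
#!/bin/sh
json='[{"name":"example-1","amount":4}, {"name": "foo","amount":42}, {"name": "example-2","amount":6}]'

# Array-oriented: filter inside map, then add the collected amounts.
echo "$json" | jq 'map(select(.name | contains("example")) | .amount) | add'

# Stream-oriented: reduce the stream directly, no intermediate array.
echo "$json" | jq '
  def add(s): reduce s as $x (null; . + $x);
  add(.[] | select(.name | contains("example")) | .amount)'
```

Both invocations print 10.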

How to not let jq interpret the newline character when exporting to CSV

I want to convert the following JSON content stored in a file tmp.json
{
"results": [
[
{
"field": "field1",
"value": "value1-1"
},
{
"field": "field2",
"value": "value1-2\n"
}
],
[
{
"field": "field1",
"value": "value2-1"
},
{
"field": "field2",
"value": "value2-2\n"
}
]
]
}
into a CSV output
"field1","field2"
"value1-1","value1-2\n"
"value2-1","value2-2\n"
When I use this jq command, however,
cat tmp.json | jq -r '.results | (first | map(.field)), (.[] | map(.value)) | @csv'
I get this result:
"field1","field2"
"value1-1","value1-2
"
"value2-1","value2-2
"
How should the jq command be written to get the desired CSV result?
For a jq-only solution, you can use gsub("\n"; "\\n"). I'd go with something like this:
.results
| (.[0] | map(.field)),
(.[] | map( .value | gsub("\n"; "\\n")))
| @csv
Using your JSON and invoking this with the -r command line option yields:
"field1","field2"
"value1-1","value1-2\n"
"value2-1","value2-2\n"
If newlines are the only thing you need to handle, maybe you can do a string replacement.
cat tmp.json | jq -r '.results | (first | map(.field)), (.[] | map(.value) | map(gsub("\\n"; "\\n"))) | @csv'
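For completeness, here is the gsub-based fix run against the question's tmp.json as a self-contained sketch:

```shell
#!/bin/sh
cat > tmp.json <<'EOF'
{
  "results": [
    [ { "field": "field1", "value": "value1-1" },
      { "field": "field2", "value": "value1-2\n" } ],
    [ { "field": "field1", "value": "value2-1" },
      { "field": "field2", "value": "value2-2\n" } ]
  ]
}
EOF

# Escape embedded newlines before @csv so each record stays on one line.
jq -r '.results
  | (.[0] | map(.field)),
    (.[] | map(.value | gsub("\n"; "\\n")))
  | @csv' tmp.json
```

This prints the three desired lines, with each embedded newline rendered as a literal \n.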