How to flatten JSON array values as CSV using JQ

I have a JSON file containing application clients and their associated application features:
{
"client-A": [
"feature-x"
],
"client-B": [
"feature-x",
"feature-y"
],
"client-C": [
"feature-z"
],
"client-D": [
"feature-x",
"feature-z"
],
...
}
I'm trying to turn this into the following CSV:
client,feature
client-A,feature-x
client-B,feature-x
client-B,feature-y
client-C,feature-z
client-D,feature-x
client-D,feature-z
What's an easy way using jq to get this done?

I'm not sure whether this is the most efficient way of doing it, but you can use the following pipeline:
<yourfile.json jq -r 'to_entries | .[] | { key: .key, value: .value[] } | [ .key, .value ] | @csv'
to_entries converts the structure into "key value" pairs, which can then be operated on. The { key: .key, value: .value[] } bit will convert the array into multiple rows...
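As a rough sketch of the whole conversion, including the header row the question asks for, something like the following should work. The inline sample data and the shell variable names json and out are only for illustration. Note that @csv wraps every string field in double quotes, so this sketch uses join(",") to get the unquoted output shown in the question; that is only safe when no field contains a comma or a quote.

```shell
# Sample input mirroring the four clients shown in the question (illustrative only).
json='{"client-A":["feature-x"],"client-B":["feature-x","feature-y"],"client-C":["feature-z"],"client-D":["feature-x","feature-z"]}'

# Emit the header row, then one [client, feature] row per array element.
# The comma binds tighter than the pipe, so join(",") is applied to every row.
out=$(printf '%s' "$json" | jq -r '
  ["client", "feature"],
  (to_entries[] | .key as $client | .value[] | [$client, .])
  | join(",")
')
printf '%s\n' "$out"
```

Replacing join(",") with @csv gives properly escaped CSV at the cost of quoted fields.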

Related

Transforming a list containing key/value strings with jq

So, I basically have a file test.json
[
"Name=TestName",
"Tag=TestTag"
]
Which I'd like to transform into
[
{
"ParameterKey": "Name",
"ParameterValue": "TestName"
},
{
"ParameterKey": "Tag",
"ParameterValue": "TestTag"
}
]
With jq. Any idea?
You don't need to call split() twice. Call it once and access the resulting array by index, .[0] for the key and .[1] for the value:
jq -n '[ inputs[] | split("=") | {ParameterKey: .[0], ParameterValue: .[1]} ]' test.json
I tried with the following jq. It should work as long as you are sure of the format of the array.
[.[] | {ParameterKey: split("=")[0], ParameterValue: split("=")[1]}]
If you are running it from a terminal, you can use the following:
cat test.json | jq '[.[] | {ParameterKey: split("=")[0], ParameterValue: split("=")[1]}]'
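Both answers drop part of the value if it contains an "=" itself, because split("=") then returns more than two pieces. A possible variation, not taken from the answers above, is to rejoin everything after the first piece. The inline sample and the variable names json and out are assumptions for the sake of a runnable sketch:

```shell
# Sample input from the question, inlined for illustration.
json='["Name=TestName","Tag=TestTag"]'

# Split once per string; the key is the first piece, and the value is
# everything after it, rejoined with "=" so embedded "=" signs survive.
out=$(printf '%s' "$json" | jq -c '
  map(split("=") | {ParameterKey: .[0], ParameterValue: (.[1:] | join("="))})
')
printf '%s\n' "$out"
```

For the sample input this gives the same two objects as the answers above.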

Printing all keys and values in a single line after sorting the keys

I've a folder with more than 1000 request logs (generated per hour/day) which are in the following format:
[
{
"input": {
"random_param_name_1": "random_value_1",
"random_param_name_2": "random_value_2",
"random_param_name_3": "random_value_3",
"random_param_name_4": "random_value_4"
},
"output": {
"some_key_we_dont_care_about": "some_value_we_dont_care_about"
},
"status_code": 200
},
{
"input": {
"random_param_name_1": "random_value_1",
"random_param_name_4": "random_value_4",
"random_param_name_3": "random_value_3",
"random_param_name_5": "random_value_5"
},
"output": {
"some_key_we_dont_care_about": "some_value_we_dont_care_about"
},
"status_code": 200
}
]
And I need to find all the input requests that are unique. For this, I need to do two things:
sort the keys in input as different inputs might have same keys but in different order
print all the key and value in a single line, so that I can pipe the output to sort | uniq to get all the unique input combinations.
Please note that the input keys are random. Most existing questions of this kind on Stack Overflow know the keys in advance, but that's not the case here.
I can print the key and values like this:
jq -r 'keys[] as $k | "\($k):\(.[$k])"'
but they end up being on new lines.
To summarise, for the above JSON, I need a magic_expression
$ jq 'magic_expression' log.json
that will return
"random_param_name_1":"random_value_1","random_param_name_2":"random_value_2","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4"
"random_param_name_1":"random_value_1","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4","random_param_name_5":"random_value_5"
Here is a "magic expression" to get you started.
It uses to_entries to make the objects appearing in .input more manageable.
def format: "\"\(.key)\":\"\(.value)\"" ;
map(.input) | unique | map(to_entries)[] | map(format) | join(",")
When run with -r / --raw-output it produces
"random_param_name_1":"random_value_1","random_param_name_2":"random_value_2","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4"
"random_param_name_1":"random_value_1","random_param_name_4":"random_value_4","random_param_name_3":"random_value_3","random_param_name_5":"random_value_5"
EDIT: If, as customcommander points out, you want the keys to be sorted, you can move the format before the unique, e.g.
def format: "\"\(.key)\":\"\(.value)\"" ;
map(.input | to_entries | map(format) | sort ) | unique[] | join(",")
which produces
"random_param_name_1":"random_value_1","random_param_name_2":"random_value_2","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4"
"random_param_name_1":"random_value_1","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4","random_param_name_5":"random_value_5"
when run with -r / --raw-output
Consider this:
/workspaces # jq 'map(.input)' data.json
[
{
"random_param_name_1": "random_value_1",
"random_param_name_2": "random_value_2",
"random_param_name_3": "random_value_3",
"random_param_name_4": "random_value_4"
},
{
"random_param_name_1": "random_value_1",
"random_param_name_4": "random_value_4",
"random_param_name_3": "random_value_3",
"random_param_name_5": "random_value_5"
}
]
You can sort the keys of each object with --sort-keys:
/workspaces # jq --sort-keys 'map(.input)' data.json
[
{
"random_param_name_1": "random_value_1",
"random_param_name_2": "random_value_2",
"random_param_name_3": "random_value_3",
"random_param_name_4": "random_value_4"
},
{
"random_param_name_1": "random_value_1",
"random_param_name_3": "random_value_3",
"random_param_name_4": "random_value_4",
"random_param_name_5": "random_value_5"
}
]
Then pipe this into another jq filter:
/workspaces # jq --sort-keys 'map(.input)' data.json | jq -r 'map(to_entries)[] | map("\"\(.key)\":\"\(.value)\"") | join(",")'
"random_param_name_1":"random_value_1","random_param_name_2":"random_value_2","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4"
"random_param_name_1":"random_value_1","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4","random_param_name_5":"random_value_5"
I need to find all the input requests that are unique.
This can be done within jq, without any sorting of keys, since jq's == operator ignores the key order. For example, the following will produce the unique input requests in their original form (i.e. without the keys being sorted):
map(.input)
| group_by(.)
| map(.[0])
Since group_by uses ==, uniqueness is guaranteed.
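The key-order claim is easy to check with a couple of throwaway objects. This is just a small sanity check; the objects and the variable names eq and count are made up for the demonstration:

```shell
# Two objects with the same pairs in a different key order compare as equal.
eq=$(jq -n '{"a": 1, "b": 2} == {"b": 2, "a": 1}')

# Consequently group_by(.) puts them in one group, leaving a single survivor.
count=$(jq -n '[{"a": 1, "b": 2}, {"b": 2, "a": 1}] | group_by(.) | map(.[0]) | length')

printf '%s %s\n' "$eq" "$count"
```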
If you really want the keys to be sorted, then you could use the -S command-line option:
jq -S -f program.jq input.json
And if for some reason you really want the non-standard output format, you could use the following modification of the above program:
map(.input)
| group_by(.)
| map(.[0])
| .[]
| . as $in
| [ keys[] as $k | "\"\($k)\":\"\($in[$k])\"" ] | join(",")
With your sample input, this last produces:
"random_param_name_1":"random_value_1","random_param_name_2":"random_value_2","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4"
"random_param_name_1":"random_value_1","random_param_name_3":"random_value_3","random_param_name_4":"random_value_4","random_param_name_5":"random_value_5"
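If the exact output format is negotiable, a shorter route to the original goal (one line per input, ready for sort | uniq) might be to combine -S with -c, so that each input object is printed on a single line with its keys sorted. The result keeps the surrounding braces, so it is not the format requested in the question. The sample data below is a made-up miniature of the log file, and json and out are illustrative names:

```shell
# Three inputs; the first two differ only in key order (illustrative data).
json='[{"input":{"b":"2","a":"1"}},{"input":{"a":"1","b":"2"}},{"input":{"a":"1","c":"3"}}]'

# -S sorts the keys of every object, -c prints each result on one line,
# and sort -u then drops the duplicate lines.
out=$(printf '%s' "$json" | jq -c -S '.[].input' | sort -u)
printf '%s\n' "$out"
```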

Convert JSON to headed TSV

Given a JSON file like this,
[
{
"h1": "x1",
"h2": "x2"
},
{
"h1": "y1",
"h2": "y2"
}
]
I extract it as a headed TSV using the following jq code. But I need to specify the header names twice. Is there a way to just specify the header names once? Thanks.
[
"h1"
, "h2"
], (.[] | [
.h1
, .h2
]) | @tsv
Here's a relatively robust jq script for printing the TSV with headers using the key names in the first object:
(.[0] | keys_unsorted) as $keys
| $keys, (.[] | [.[$keys[]]])
| @tsv
This of course assumes the -r command-line option.
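Run against the sample from the question, the script can be tried as follows. The data is inlined and the variable names json and out are only for illustration. Since the column list comes from the first object alone, a key missing from a later object is looked up as null, and keys that only appear in later objects are ignored.

```shell
# Sample input from the question, inlined for illustration.
json='[{"h1":"x1","h2":"x2"},{"h1":"y1","h2":"y2"}]'

# Take the column names from the first object, print them as the header,
# then print each object's values in that same column order.
out=$(printf '%s' "$json" | jq -r '
  (.[0] | keys_unsorted) as $keys
  | $keys, (.[] | [.[$keys[]]])
  | @tsv
')
printf '%s\n' "$out"
```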

"Transposing" objects in jq

I'm unsure if "transpose" is the correct term here, but I'm looking to use jq to transpose a 2-dimensional object such as this:
[
{
"name": "A",
"keys": ["k1", "k2", "k3"]
},
{
"name": "B",
"keys": ["k2", "k3", "k4"]
}
]
I'd like to transform it to:
{
"k1": ["A"],
"k2": ["A", "B"],
"k3": ["A", "B"],
"k4": ["B"]
}
I can split out the object with .[] | {key: .keys[], name} to get a list of keys and names, or I could use .[] | {(.keys[]): [.name]} to get a collection of key–value pairs {"k1": ["A"]} and so on, but I'm unsure of the final concatenation step for either approach.
Is either of these approaches heading in the right direction? Is there a better way?
This should work:
map({ name, key: .keys[] })
| group_by(.key)
| map({ key: .[0].key, value: map(.name) })
| from_entries
The basic approach is to convert each object to name/key pairs, regroup them by key, then map them out to entries of an object.
This produces the following output:
{
"k1": [ "A" ],
"k2": [ "A", "B" ],
"k3": [ "A", "B" ],
"k4": [ "B" ]
}
Here is a simple solution that may also be easier to understand. It is based on the idea that a dictionary (a JSON object) can be extended by adding details about additional (key -> value) pairs:
# input: a dictionary to be extended by key -> value
# for each key in keys
def extend_dictionary(keys; value):
  reduce keys[] as $key (.; .[$key] += [value]);
reduce .[] as $o ({}; extend_dictionary($o.keys; $o.name) )
$ jq -c -f transpose-object.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}
Here is a better solution for the case that all the values of "name"
are distinct. It is better because it uses a completely generic
filter, invertMapping; that is, invertMapping could be a built-in or
library function. With the help of this function, the solution
becomes a simple three-liner.
Furthermore, if the values of "name" are not all unique, then the solution
below can easily be tweaked by modifying the initial reduction of the input
(i.e. the line immediately above the invocation of invertMapping).
# input: a JSON object of (key, values) pairs, in which "values" is an array of strings;
# output: a JSON object representing the inverse relation
def invertMapping:
  reduce to_entries[] as $pair
    ({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key] ));
map( { (.name) : .keys} )
| add
| invertMapping
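For completeness, here is how the invertMapping version might be run against the sample input from the question. The data is inlined and the variable names json and out are illustrative. The add step merges the per-name objects into one, which is why this form relies on the names being distinct: a repeated name would overwrite the earlier entry.

```shell
# Sample input from the question, inlined for illustration.
json='[{"name":"A","keys":["k1","k2","k3"]},{"name":"B","keys":["k2","k3","k4"]}]'

# Build {"A": [...], "B": [...]} first, then invert it so that each key
# maps to the list of names it appeared under.
out=$(printf '%s' "$json" | jq -c '
  def invertMapping:
    reduce to_entries[] as $pair
      ({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key]));
  map({(.name): .keys}) | add | invertMapping
')
printf '%s\n' "$out"
```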