JQ - Groupby and concatenate text objects - json

Not quite getting it. I can produce multiple lines but cannot get multiple entries to combine. Looking to take Source JSON and output to CSV as shown:
Source JSON:
[{"State": "NewYork","Drivers": [
{"Car": "Jetta","Users": [{"Name": "Steve","Details": {"Location": "Home","Time": "9a-7p"}}]},
{"Car": "Jetta","Users": [{"Name": "Roger","Details": {"Location": "Office","Time": "3p-6p"}}]},
{"Car": "Ford","Users": [{"Name": "John","Details": {"Location": "Home","Time": "12p-5p"}}]}
]}]
Desired CSV:
"NewYork","Jetta","Steve;Roger","Home;Office","9a-7p;3p-6p"
"NewYork","Ford","John","Home","12p-5p"
JQ code that does not work:
.\[\] | .Drivers\[\] | .Car as $car |
.Users\[\] |
\[$car, .Name\] | #csv

You're looking for something like this:
.[] | [.State] + (
.Drivers | group_by(.Car)[] | [.[0].Car] + (
map(.Users) | add | [
map(.Name),
map(.Details.Location),
map(.Details.Time)
] | map(join(";"))
)
) | #csv
$ jq -r -f tst.jq file
"NewYork","Ford","John","Home","12p-5p"
"NewYork","Jetta","Steve;Roger","Home;Office","9a-7p;3p-6p"
$

Not quite optimised, but I though't I'd share the general idea:
jq -r 'map(.State as $s |
(.Drivers | group_by(.Car))[]
| [
$s,
(map(.Users[].Name) | join(";")),
(map(.Users[].Details.Location) | join(";")),
(map(.Users[].Details.Time) | join(";"))
])
[] | #csv' b
map() over each state, remember the name (map(.State as $s | )
group_by(.Car)
Create an array containing all your fields that is passed to #csv
Use map() and join() to create the fields for Name, Location and Time
This part could be improved so you don't need that duplicated part
Output (with --raw-output:
"NewYork","John","Home","12p-5p"
"NewYork","Steve;Roger","Home;Office","9a-7p;3p-6p"
JqPlay seems down, so I'm still searching for an other way of sharing a public demo

Far from perfect, but it builds the result incrementally so it should be easily debuggable and extensible:
map({State} + (.Drivers[] | {Car} + (.Users[] | {Name} + (.Details | {Location, Time}))))
| group_by(.Car)
| map(reduce .[] as $item (
{State:null,Car:null,Name:[],Location:[],Time:[]};
. + ($item | {State,Car}) | .Name += [$item.Name] | .Location += [$item.Location] | .Time += [$item.Time]))
| .[]
| [.State, .Car, (.Name,.Location,.Time|join(","))]
| #csv

Related

Reducing after filtering doesn't add up with jq

I have the following data:
[
{
"name": "example-1",
"amount": 4
},
{
"name": "foo",
"amount": 42
},
{
"name": "example-2",
"amount": 6
}
]
I would like to filter objects with a .name containing "example" and reduce the .amount property.
This is what I tried to do:
json='[{"name":"example-1","amount":4}, {"name": "foo","amount":42}, {"name": "example-2","amount":6}]'
echo $json | jq '.[] | select(.name | contains("example")) | .amount | add'
I get this error:
jq: error (at :1): Cannot iterate over number (4)
I think that the output of .[] | select(.name | contains("example")) | .amount is a stream, and not an array, so I cannot add the values together.
But how could I do to output an array instead, after the select and the lookup?
I know there is a map function and map(.amount) | add works, but the filtering isn't here.
I can't do a select without .[] | before, and I think that's where the "stream" problem comes from...
As you say, add/0 expects an array as input.
Since it's a useful idiom, consider using map(select(_)):
echo "$json" | jq 'map(select(.name | contains("example")) | .amount) | add'
However, sometimes it's better to use a stream-oriented approach:
def add(s): reduce s as $x (null; . + $x);
add(.[] | select(.name | contains("example")) | .amount)

Output semicolon-separated string

Lets say we have this file:
{
"persons": [
{
"friends": 4,
"phoneNumber": 123456,
"personID": 11111
},
{
"friends": 2057,
"phoneNumber": 432100,
"personID": 22222
},
{
"friends": 50,
"phoneNumber": 147258,
"personID": 55555
}
]
}
I now want to extract the phone numbers of the persons 11111, 22222, 33333, 44444 and 55555 as a semicolon-separated string:
123456;432100;;;147258
While running
cat persons.txt | jq ".persons[] | select(.personID==<ID>) | .phoneNumber"
once for each <ID> and glueing the results together with the ; afterwards works, this is terribly slow, because it has to reload the file for each of the IDs (and other fields I want to extract).
Concatenating it in a single query:
cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (.persons[] | select(.personID==22222) | .phoneNumber), (.persons[] | select(.personID==33333) | .phoneNumber), (.persons[] | select(.personID==44444) | .phoneNumber), (.persons[] | select(.personID==55555) | .phoneNumber)"
This also works, but it gives
123456
432100
147258
so I do not know which of the fields are missing and how many ; I have to insert.
With your sample input in input.json, and using jq 1.6 (or a jq with INDEX/2), the following invocation of jq produces the desired output:
jq -r --argjson ids '[11111, 22222, 33333, 44444, 55555]' -f tossv.jq input.json
assuming tossv.jq contains the program:
INDEX(.persons[]; .personID) as $dict
| $ids
| map( $dict[tostring] | .phoneNumber)
| join(";")
Program notes
INDEX/2 produces a JSON object that serves as a dictionary. Since JSON keys must be strings, tostring must be used in line 3 above.
When using join(";"), null values effectively become empty strings.
If your jq does not have INDEX/2, then now might be a good time to upgrade. Otherwise you can snarf its definition by googling: jq "def INDEX" builtin.jq
Unfortunately I couldn't test if peak's answer works since I only have jq 1.5. Here's what I came up with yesterday evening:
For each semicolon, add the following query
(\";\" as \$a | \$a)
Resulting command (abstract):
cat persons.txt | jq "(<1's phone number>), (\";\" as \$a | \$a),
(<2's phone number>), (\";\" as \$a | \$a), ..."
Resulting command (concrete):
cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==22222) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==33333) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==44444) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==55555) | .phoneNumber)"
Result:
123456
";"
432100
";"
";"
";"
147258
Delete the newlines and ":
<commandAsAbove> | tr --delete "\n\""
Result:
123456;432100;;;147258
Do not get me wrong, this is far uglier than peak's answer, but it worked for me yesterday.
Without jq solution:
for i in $(seq 11111 11111 55555)
do
string=$(grep -B1 "$i" persons.txt | head -1 | sed 's/.* \(.*\),/\1/g')
echo "$string;" >> output
done
cat output | tr -d '\n' | rev | cut -d';' -f2- | rev > tmp && mv tmp output
This little script will yield the result you want and you can adapt it quickly if the input data varies
cat output
123456;432100;;;147258

jq - How to filter a json that does not contain

I have an aws query that I want to filter in jq.
I want to filter all the imageTags that don't end with "latest"
So far I did this but it filters things containing "latest" while I want to filter things not containing "latest" (or not ending with "latest")
aws ecr describe-images --repository-name <repo> --output json | jq '.[]' | jq '.[]' | jq "select ((.imagePushedAt < 14893094695) and (.imageTags[] | contains(\"latest\")))"
Thanks
You can use not to reverse the logic
(.imageTags[] | contains(\"latest\") | not)
Also, I'd imagine you can simplify your pipeline into a single jq call.
All you have to do is | not within your jq
A useful example, in particular for mac brew users:
List all bottled formulae
by querying the JSON and parsing the output
brew info --json=v1 --installed | jq -r 'map(
select(.installed[].poured_from_bottle)|.name) | unique | .[]' | tr '\n' ' '
List all non-bottled formulae
by querying the JSON and parsing the output and using | not
brew info --json=v1 --installed | jq -r 'map(
select(.installed[].poured_from_bottle | not) | .name) | unique | .[]'
In this case contains() doesn't work properly, is better use the not of index() function
select(.imageTags | index("latest") | not)
This .[] | .[] can be shorten to .[][] e.g.,
$ jq --null-input '[[1,2],[3,4]] | .[] | .[]'
1
2
3
4
$ jq --null-input '[[1,2],[3,4]] | .[][]'
1
2
3
4
To check whether a string does not contain another string, you can combine contains and not e.g.,
$ jq --null-input '"foobar" | contains("foo") | not'
false
$ jq --null-input '"barbaz" | contains("foo") | not'
true
You can do something similar with an array of strings with either any or all e.g.,
$ jq --null-input '["foobar","barbaz"] | any(.[]; contains("foo"))'
true
$ jq --null-input '["foobar","barbaz"] | any(.[]; contains("qux"))'
false
$ jq --null-input '["foobar","barbaz"] | all(.[]; contains("ba"))'
true
$ jq --null-input '["foobar","barbaz"] | all(.[]; contains("qux"))'
false
Say you had file.json:
[ [["foo", "foo"],["foo", "bat"]]
, [["foo", "bar"],["foo", "bat"]]
, [["foo", "baz"],["foo", "bat"]]
]
And you only want to keep the nested arrays that don't have any strings with "ba":
$ jq --compact-output '.[][] | select(all(.[]; contains("bat") | not))' file.json
["foo","foo"]
["foo","bar"]
["foo","baz"]

json parsing with jq and convert to csv

I need to get some values from a json file with JQ. I need to get a csv (Time, Data.key, Lat, Lng, Qline)
Input:
{
"Time":"14:16:23",
"Data":{
"101043":{
"Lat":49,
"Lng":15,
"Qline":420
},
"101044":{
"Lat":48,
"Lng":15,
"Qline":421
}
}
}
Example output of csv:
"14:16:23", 101043, 49, 15, 420
"14:16:23", 101044, 48, 15, 421
Thanks a lot.
I tried only to:
cat test.json | jq '.Data[] |[ .Lat, .Lng, .Qline ] | #csv'
Try this:
{ Time } + (.Data | to_entries[] | { key: .key | tonumber } + .value)
| [ .Time, .key, .Lat, .Lng, .Qline ]
| #csv
Make sure you get the raw output by using the -r switch.
Here's another solution that doesn't involve the +'s.
{Time, Data: (.Data | to_entries)[]}
| [.Time, (.Data.key | tonumber), .Data.value.Lat, .Data.value.Lng, .Data.value.Qline]
| #csv
Here is another solution. If data.json contains the sample data then
jq -M -r '(.Data|keys[]) as $k | {Time,k:$k}+.Data[$k] | [.[]] | #csv' data.json
will produce
"14:16:23","101043",49,15,420
"14:16:23","101044",48,15,421

filtering data using parameters

I have this command that is working..
cat ~/Desktop/results.json | jq '.[] | .environmentStatuses[].deploymentResult | select(.key.entityKey.key=="39583746-39747586") | .lifeCycleState '
I want to pass the entity key as variable , tried the below ones ,but none seems to work-
enkey="39583746-39747586"
cat ~/Desktop/results.json | jq '.[] | .environmentStatuses[].deploymentResult | select(.key.entityKey.key=="""${enkey}""") | .lifeCycleState '
cat ~/Desktop/results.json | jq '.[] | .environmentStatuses[].deploymentResult | select(.key.entityKey.key=="${enkey}") | .lifeCycleState '
When trying to use extra parameters in your filters, use the --arg option to pass them in. Don't rely on the shell to insert it into your filter string, keep that separate.
jq --arg key "$enkey" '.[] |
.environmentStatuses[].deploymentResult |
select(.key.entityKey.key == $key) |
.lifeCycleState' ~/Desktop/results.json
This worked for me :
jq '.[] | .environmentStatuses[].deploymentResult |
select(.key.entityKey.key == "'$key'") |
.lifeCycleState' ~/Desktop/results.json
--arg does not works as expected ...