Join a list of JSON values using jq

I'm attempting to reduce this list of names to a single line of text.
I have JSON like this:
{
  "speakers": [
    {
      "firstName": "Abe",
      "lastName": "Abraham"
    },
    {
      "firstName": "Max",
      "lastName": "Miller"
    }
  ]
}
Expected output:
Abe Abraham and Max Miller
One of the many attempts I've made is this:
jq -r '.speakers[] | ["\(.firstName) \(.lastName)"] | join(" and ")'
The results are printed out on separate lines like this:
Abe Abraham
Max Miller
I think the join command is just joining the single-element array piped to it (one name per array). How can I get the full list of names passed to join as a single array, so I get the expected output shown above?

You're getting a separate array for each speaker that way. What you want is a single array containing all of the names so that you can join them, which is done like this:
.speakers | map("\(.firstName) \(.lastName)") | join(" and ")

$ jq -c '.speakers[] | [ "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham"]
["Max Miller"]
If you move the opening [ to the start of the filter, you get a single array with all the names.
$ jq -c '[ .speakers[] | "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham","Max Miller"]
You can then pass that array to join():
$ jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")' speakers.json
Abe Abraham and Max Miller
If each speaker object has no other keys (and firstName comes before lastName), you can also write it like this:
$ jq -r '[.speakers[] | join(" ")] | join(" and ")' speakers.json
Abe Abraham and Max Miller
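Both answers rest on the same fact: jq's map(f) is defined as [.[] | f], so `.speakers | map(...)` and `[ .speakers[] | ... ]` build the same single array before join runs. A minimal self-contained check, assuming bash and jq are on the PATH; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sample input from the question.
json='{"speakers":[{"firstName":"Abe","lastName":"Abraham"},{"firstName":"Max","lastName":"Miller"}]}'

# Collect every name into ONE array first, then join that array.
speakers_line() {
  jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")' <<<"$1"
}

# Same result with map, which is shorthand for [ .[] | f ].
speakers_line_map() {
  jq -r '.speakers | map("\(.firstName) \(.lastName)") | join(" and ")' <<<"$1"
}

speakers_line "$json"      # Abe Abraham and Max Miller
speakers_line_map "$json"  # Abe Abraham and Max Miller
```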


Bash group csv data by values

I have a CSV that looks like this:
created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7
I want to parse it so I have the following result, setting ids as columns and grouping by date:
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
Missing values should be set to nan, and each id appears at most once per date.
How can I do this? I also have the data from the first CSV as equivalent JSON, if this is easier to do with jq.
The JSON is composed of similar elements:
{
"created": "2022-12-16 09:15",
"value": "10.4",
"id": "60"
}
Using the great Miller (version >= 6), running
mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv
you get
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
The core command here is reshape -s id,value, to transform your input from long to wide structure.
This is how I would do it in jq, based on the JSON input stream:
reduce inputs as {$created, $value, $id} ({head: [], body: {}};
.head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
)
| (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
.body | to_entries[] | [.key, .value[$head[]]]
)
Then, either use the @csv builtin, which wraps the values in quotes and produces empty values for missing combinations:
jq -nr '
⋮
| @csv
'
"created","58","59","60"
"2022-12-16 11:55","10","2",
"2022-12-16 11:50","11","3","7"
Or produce the nan values and the commas yourself by mapping and joining:
jq -nr '
⋮
| map(. // "nan") | join(",")
'
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
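Putting the pieces together, here is a sketch of the whole pipeline starting from the CSV, assuming bash and jq are available. The csv_to_json step is an addition that is not part of either answer: it naively splits on commas, so it assumes no quoted fields or embedded commas. The function names are hypothetical; the reduce program is the one above with the nan ending.

```shell
#!/usr/bin/env bash
# Sample CSV from the question.
csv='created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7'

# Hypothetical helper (assumes simple CSV, no quoting): turn each data row into
# an object keyed by the header, e.g. {"created":"...","id":"58","value":"10"}.
csv_to_json() {
  jq -Rnc 'input | split(",") as $hdr
           | inputs | split(",") as $row
           | [range(0; $hdr | length) | {($hdr[.]): $row[.]}] | add'
}

# The reduce program from the answer: collect ids into .head (in first-seen
# order) and values into .body[created][id], then emit one row per date.
long_to_wide() {
  jq -nr '
    reduce inputs as {$created, $value, $id} ({head: [], body: {}};
      .head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
    )
    | (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
      .body | to_entries[] | [.key, .value[$head[]]]
    )
    | map(. // "nan") | join(",")
  '
}

printf '%s\n' "$csv" | csv_to_json | long_to_wide
```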

Is there a way to filter a JSON object using jq to only include those with a key matching a value from a known list?

I have a JSON array, and another text file that contains a list of values.
[
{
"key": "foo",
"detail": "bar"
},
...
]
I need to filter the array elements to only those that have a "key" value that is found in the list of values.
The list of values is a text file containing one item per line:
foo
baz
Is this possible to do using jq?
You can use the following:
jq --rawfile to_keep_file to_keep.txt '
( [ $to_keep_file | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
map(select($to_keep_lkup[.key]))
' to_filter.json
or
(
jq -sR . to_keep.txt
cat to_filter.json
) | jq -n '
( [ input | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
inputs | map(select($to_keep_lkup[.key]))
'
The former requires jq v1.6, the first version to provide --rawfile.
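A throwaway end-to-end run of the --rawfile variant, assuming bash and jq 1.6 or later. The extra "qux"/"baz" records and the filter_by_keys name are hypothetical sample material:

```shell
#!/usr/bin/env bash
# Work in a temporary directory that is removed when the script exits.
dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT

printf 'foo\nbaz\n' > "$dir/to_keep.txt"
cat > "$dir/to_filter.json" <<'EOF'
[
  {"key": "foo", "detail": "bar"},
  {"key": "qux", "detail": "dropped"},
  {"key": "baz", "detail": "kept"}
]
EOF

# Build a {"foo": true, "baz": true} lookup object from the non-empty lines of
# the text file, then keep only the array elements whose .key is in it.
filter_by_keys() {
  jq -c --rawfile to_keep_file "$1" '
    ( [ $to_keep_file | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup
    | map(select($to_keep_lkup[.key]))
  ' "$2"
}

filter_by_keys "$dir/to_keep.txt" "$dir/to_filter.json"
# [{"key":"foo","detail":"bar"},{"key":"baz","detail":"kept"}]
```

The lookup object makes each membership test a single key lookup rather than a scan of the whole list.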

replace string with jq

I have the following file file.txt:
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
I extract the values with jq (I run this command on massive 100 GB files):
jq -r '[.a, .b, .time] | @tsv'
This returns the correct result:
a a 20210210T10:10:00
b b 20210210T11:10:00
The output I would like is:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
The problem is that I want to change the format of the date in the most efficient way possible.
How do I do that?
You can do it in sed, but you can also call sub directly in jq:
jq -r '[.a, .b,
( .time
| sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
.y+"-"+.m+"-"+.d+" ")
)
] | @tsv'
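To see what .y, .m and .d refer to inside sub, it can help to run capture with the same regex: it returns an object with one key per named group, and sub evaluates its replacement expression against an object of that shape. A small sketch, assuming bash and jq; date_parts and fix_date are hypothetical names:

```shell
#!/usr/bin/env bash
# Show the named-capture object for the regex used in the answer.
date_parts() {
  jq -c 'capture("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T")'
}

# Replace the matched "YYYYMMDDT" prefix with "YYYY-MM-DD ".
fix_date() {
  jq -r 'sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T"; .y + "-" + .m + "-" + .d + " ")'
}

echo '"20210210T10:10:00"' | date_parts  # {"y":"2021","m":"02","d":"10"}
echo '"20210210T10:10:00"' | fix_date    # 2021-02-10 10:10:00
```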
Use strptime for date interpretation and strftime for formatting:
parse.jq
[
.a,
.b,
( .time
| strptime("%Y%m%dT%H:%M:%S")
| strftime("%Y-%m-%d %H:%M:%S")
)
] | @tsv
Run it like this:
<input.json jq -rf parse.jq
Or as a one-liner:
<input.json jq -r '[.a,.b,(.time|strptime("%Y%m%dT%H:%M:%S")|strftime("%Y-%m-%d %H:%M:%S"))]|@tsv'
Output:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
Since speed is an issue, and there does not appear to be a need for anything more than string splitting, you could compare string slicing done in jq:
jq -r '[.a, .b, (.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")] | @tsv'
with similar splitting done by piping jq's TSV output through awk -F\\t 'BEGIN{OFS=FS} ....' (awk for ease of handling the TSV).
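A rough way to run that comparison on your own machine, assuming bash and jq. It covers only the two pure-jq variants (slicing and strptime/strftime), not the awk variant; N and the function names are hypothetical, and no timing results are claimed here:

```shell
#!/usr/bin/env bash
# Slicing by position: "20210210T10:10:00" -> "2021-02-10 10:10:00".
# .[:4] = year, .[4:6] = month, .[6:8] = day, .[9:] = time (skips the "T").
reformat_times_slice() {
  jq -r '[.a, .b, (.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")] | @tsv'
}

# For comparison: parse and reformat as dates (note %m before %d).
reformat_times_strftime() {
  jq -r '[.a, .b, (.time | strptime("%Y%m%dT%H:%M:%S") | strftime("%Y-%m-%d %H:%M:%S"))] | @tsv'
}

# Generate N synthetic records; raise N toward your real file size.
N=${N:-10000}
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
jq -nc --argjson n "$N" 'range($n) | {a: "a", b: "b", time: "20210210T10:10:00"}' > "$sample"

# Timings are printed to stderr by the bash "time" keyword.
time reformat_times_slice < "$sample" > /dev/null
time reformat_times_strftime < "$sample" > /dev/null

# Both variants should produce identical output.
cmp <(reformat_times_slice < "$sample") <(reformat_times_strftime < "$sample") && echo "outputs match"
```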
With sed:
$ echo "20210427T19:23:00" | sed -r 's|([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})T|\1-\2-\3 |'
2021-04-27 19:23:00

Using jq to get json values

Input json:
{
  "food_group": "fruit",
  "glycemic_index": "low",
  "fruits": {
    "fruit_name": "apple",
    "size": "large",
    "color": "red"
  }
}
The two jq commands below work:
# jq -r 'keys_unsorted[] as $key | "\($key), \(.[$key])"' food.json
food_group, fruit
glycemic_index, low
fruits, {"fruit_name":"apple","size":"large","color":"red"}
# jq -r 'keys_unsorted[0:2] as $key | "\($key)"' food.json
["food_group","glycemic_index"]
How can I get the values for the first two keys in the same manner? I tried this:
# jq -r 'keys_unsorted[0:2] as $key | "\($key), \(.[$key])"' food.json
jq: error (at food.json:9): Cannot index object with array
Expected output:
food_group, fruit
glycemic_index, low
To iterate over an object, you can use to_entries, which transforms it into an array of {"key": ..., "value": ...} objects.
Then you can use select to keep only the rows you want, here the ones whose value is a string:
jq -r 'to_entries[] | select((.value | type) == "string") | "\(.key), \(.value)"' food.json
You can also use to_entries and select the keys by name:
to_entries[] | select(.key=="food_group" or .key=="glycemic_index") | "\(.key), \(.value)"
Demo
https://jqplay.org/s/Aqvos4w7bo
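For the attempt in the question itself, the error comes from keys_unsorted[0:2] being a single array, so .[$key] tries to index the object with an array. Iterating the slice with [] binds $key to one key at a time. A sketch, assuming bash and jq; first_two is a hypothetical name:

```shell
#!/usr/bin/env bash
food='{"food_group":"fruit","glycemic_index":"low","fruits":{"fruit_name":"apple","size":"large","color":"red"}}'

# keys_unsorted[0:2] is ["food_group","glycemic_index"]; the trailing []
# streams its elements, so $key is a string each time and .[$key] works.
first_two() {
  jq -r 'keys_unsorted[0:2][] as $key | "\($key), \(.[$key])"' <<<"$1"
}

first_two "$food"
# food_group, fruit
# glycemic_index, low
```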

Pad JSON array with JQ to obtain rectangular result

I have JSON (see the jqplay link), and in the end I want to build CSV that looks like this (a reproducible sample is at the bottom):
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2, , ,...
"SO302995",items1,item2,item3.1,item3.2,item3.3, ,...
The item3 elements are in an array, and my current solution:
.[] | [.number, .item1, .item2, .item3[]?]
gives me this:
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2,...
"SO302995",items1,item2,item3.1,item3.2,item3.3,...
which creates an uneven number of columns in the CSV.
I tried a Python-style slice, .item3[:]?, but it didn't work.
Any help would be much appreciated! If anything is unclear, please ask. My snippet and toy data are in the link above.
{
  "items": [
    {
      "name": "Mr Simon Mackin",
      "country_of_residence": "Scotland",
      "natures_of_control": [
        "voting-rights-25-to-50-percent-limited-liability-partnership",
        "significant-influence-or-control-limited-liability-partnership"
      ],
      "premises": "4"
    }
  ]
}
{
  "items": [
    {
      "name": "Mrs Simonne Mackinni",
      "country_of_residence": "France",
      "natures_of_control": [
        "significant-influence-or-control-limited-liability-partnership"
      ],
      "premises": "4"
    }
  ]
}
with this query:
.items[] | [.name, .country_of_residence, .natures_of_control[]?, .premises] | @csv
I get this result:
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence","4"
But I'd like to get this (the second line has an extra comma after "significant-influence"):
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence",,"4"
Since you want a rectangular result, you will have to "pad" the "natures_of_control" array. Based on the sample input, you will need to "slurp" the input in order to obtain a global maximum.
To pad the array, you could use the helper function:
# emit a stream of exactly $n items
def pad($n): range(0;$n) as $i | .[$i];
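Here is pad applied to the two-object toy data from the question, as a sketch assuming bash and jq. Only the fields present in the toy data are used, and to_rect_csv is a hypothetical name. -s slurps both objects so that $mx is the global maximum, and .[$i] past the end of a shorter array is null, which @csv writes as an empty field:

```shell
#!/usr/bin/env bash
# Toy data from the question: two top-level objects in one stream.
data='{"items":[{"name":"Mr Simon Mackin","country_of_residence":"Scotland","natures_of_control":["voting-rights-25-to-50-percent-limited-liability-partnership","significant-influence-or-control-limited-liability-partnership"],"premises":"4"}]}
{"items":[{"name":"Mrs Simonne Mackinni","country_of_residence":"France","natures_of_control":["significant-influence-or-control-limited-liability-partnership"],"premises":"4"}]}'

to_rect_csv() {
  jq -sr '
    # emit a stream of exactly $n items (null where the array is shorter)
    def pad($n): range(0; $n) as $i | .[$i];
    ([.[] | .items[] | .natures_of_control | length] | max) as $mx
    | .[] | .items[]
    | [.name, .country_of_residence, (.natures_of_control | pad($mx)), .premises]
    | @csv'
}

printf '%s\n' "$data" | to_rect_csv
```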
The solution to the problem as posted on jqplay then becomes:
([.[] | .items[] | .natures_of_control | length] | max) as $mx
| .[]
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv
Invocation
With the pad definition and the program above saved in program.jq, the invocation would look like this:
jq -sr -f program.jq input.json
Handling missing data
To ignore objects that have no "items" you could tweak the above, e.g. as follows:
([.[] | .items[]? | .natures_of_control | length] | max) as $mx
| .[]
| select(.items)
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv