Group CSV data by values, pivoting ids to columns (bash / jq)

I have a CSV that looks like this:
created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7
I want to parse it so I have the following result, setting ids as columns and grouping by date:
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
Missing values are set to nan, and each id appears at most once per date.
How can I do it? I also have the same data in an equivalent JSON form, if this is easier to do with jq.
The JSON is composed of similar elements:
{
"created": "2022-12-16 09:15",
"value": "10.4",
"id": "60"
}

Using the great Miller (version >= 6), running
mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv
you get
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
The core command here is reshape -s id,value, to transform your input from long to wide structure.

This is how I would do it in jq, based on the JSON input stream:
reduce inputs as {$created, $value, $id} ({head: [], body: {}};
.head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
)
| (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
.body | to_entries[] | [.key, .value[$head[]]]
)
Then, either use the @csv builtin, which wraps the values in quotes and produces empty fields for missing combinations:
jq -nr '
⋮
| @csv
'
"created","2","3","10","11","50","55","58","59"
"2022-12-16 11:55","6",,"3",,,"4","2","5"
"2022-12-16 11:50",,"12",,"9","10",,"8","11"
Or generate nan and the comma separators manually by mapping and joining accordingly:
jq -nr '
⋮
| map(. // "nan") | join(",")
'
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
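Putting the pieces together, a complete invocation might look like this (input.json is an assumed file name; the records mirror the question's CSV):

```shell
# Sample records from the question, as a stream of JSON objects.
cat > input.json <<'EOF'
{"created": "2022-12-16 11:55", "id": "58", "value": "10"}
{"created": "2022-12-16 11:55", "id": "59", "value": "2"}
{"created": "2022-12-16 11:50", "id": "58", "value": "11"}
{"created": "2022-12-16 11:50", "id": "59", "value": "3"}
{"created": "2022-12-16 11:50", "id": "60", "value": "7"}
EOF

# Collect ids into .head and values into .body, then emit the header row
# and one row per date, filling missing id/date combinations with "nan".
result=$(jq -nr '
  reduce inputs as {$created, $value, $id} ({head: [], body: {}};
    .head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
  )
  | (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
    .body | to_entries[] | [.key, .value[$head[]]]
  )
  | map(. // "nan") | join(",")
' input.json)
printf '%s\n' "$result"
# created,58,59,60
# 2022-12-16 11:55,10,2,nan
# 2022-12-16 11:50,11,3,7
```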

Related

jq - converting json to csv - how to treat "null" as string?

I have the following json file which I would like to convert to csv:
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
Converting it to csv with the following jq produces:
$ jq -sr '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' < test.json
"date","id"
"2014-05-05T19:07:48.577",1
,2
Unfortunately, for the line with "id" equal to "2", the date column was not set to "null" - instead, it was empty. This in turn makes MySQL error on import if it's a datetime column (it expects a literal "null" if we don't have a date, and errors on "").
How can I make jq print the literal "null", and not ""?
I'd go with:
(map(keys_unsorted) | add | unique) as $cols
| $cols,
(.[] | [.[$cols[]]] | map(. // "null") )
| @csv
First, using keys_unsorted avoids useless sorting.
Second, [.[$cols[]]] is an important, recurrent and idiomatic pattern, used to ensure an array is constructed in the correct order without resorting to the reduce sledge-hammer.
Third, although map(. // "null") seems to be appropriate here, it should be noted that this expression will also replace false with "null", so, it would not be appropriate in general. Instead, to preserve false, one could write map(if . == null then "null" else . end).
Fourth, it should be noted that using map(. // "null") as above will also mask missing values of any of the keys, so if one wants some other behavior (e.g., raising an error if id is missing), then an alternative approach would be warranted.
The above assumes the stream of JSON objects shown in the question is "slurped", e.g. using jq's -s command-line option.
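A minimal end-to-end run of this filter, assuming the two objects are stored one per line in test.json:

```shell
# The two objects from the question, one per line.
cat > test.json <<'EOF'
{"id": 1, "date": "2014-05-05T19:07:48.577"}
{"id": 2, "date": null}
EOF

# -s slurps the stream into an array; null cells become the string "null".
csv=$(jq -sr '
  (map(keys_unsorted) | add | unique) as $cols
  | $cols,
    (.[] | [.[$cols[]]] | map(. // "null"))
  | @csv
' test.json)
printf '%s\n' "$csv"
# "date","id"
# "2014-05-05T19:07:48.577",1
# "null",2
```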
Use // as alternative operator for your cell value:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.] // "null")) as $rows | $cols, $rows[] | @csv' < test.json
(The whole expression is explained well here: https://stackoverflow.com/a/32965227/16174836)
You can "stringify" the value using tostring by changing map($row[.]) into map($row[.]|tostring):
$ cat so2332.json
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
$ jq --slurp --raw-output '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.]|tostring)) as $rows | $cols, $rows[] | @csv' so2332.json
"date","id"
"2014-05-05T19:07:48.577","1"
"null","2"
Note that the use of tostring will cause the numbers to be converted to strings.

replace string with jq

I have the following file file.txt:
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
I extract the values with bash command jq (I use this command on massive 100g files):
jq -r '[.a, .b, .time] | @tsv'
This returns good result of:
a a 20210210T10:10:00
b b 20210210T11:10:00
The output I would like is:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
The problem is that I want to change the format of the date in the most efficient way possible.
How do I do that?
You can do it in sed, but you can also call sub directly in jq:
jq -r '[.a, .b,
( .time
| sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
.y+"-"+.m+"-"+.d+" ")
)
] | @tsv'
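Run against the question's two records (file.txt as in the question), this produces the desired format:

```shell
# The two records from the question.
cat > file.txt <<'EOF'
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
EOF

# sub() captures year/month/day as named groups and rebuilds the prefix.
tsv=$(jq -r '[.a, .b,
  ( .time
  | sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
        .y+"-"+.m+"-"+.d+" ")
  )
] | @tsv' file.txt)
printf '%s\n' "$tsv"
```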
Use strptime for date interpretation and strftime for formatting:
parse.jq
[
.a,
.b,
( .time
| strptime("%Y%m%dT%H:%M:%S")
| strftime("%Y-%m-%d %H:%M:%S")
)
] | @tsv
Run it like this:
<input.json jq -rf parse.jq
Or as a one-liner:
<input.json jq -r '[.a,.b,(.time|strptime("%Y%m%dT%H:%M:%S")|strftime("%Y-%m-%d %H:%M:%S"))]|@tsv'
Output:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
Since speed is an issue, and since there does not appear to be a need for anything more than string splitting, you could compare string splitting done with jq using
[.a, .b,
(.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")]
against similar splitting done with awk -F\\t 'BEGIN{OFS=FS} ....' (awk for ease of handling the TSV).
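A runnable sketch of the string-splitting approach; the slice indices assume the fixed-width %Y%m%dT prefix:

```shell
# Same records as in the question; slicing avoids regex and date parsing.
cat > file.txt <<'EOF'
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
EOF

# .[:4] is the year, .[4:6] the month, .[6:8] the day, .[9:] the time part.
out=$(jq -r '[.a, .b,
  (.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")] | @tsv' file.txt)
printf '%s\n' "$out"
```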
With sed:
$ echo "20210427T19:23:00" | sed -r 's|([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})T|\1-\2-\3 |'
2021-04-27 19:23:00

Join a list of json values using jq

I'm attempting to reduce this list of names to a single line of text.
I have JSON like this:
{
"speakers": [
{
"firstName": "Abe",
"lastName": "Abraham"
},
{
"firstName": "Max",
"lastName": "Miller"
}
]
}
Expected output:
Abe Abraham and Max Miller
One of the many attempts I've made is this:
jq -r '.speakers[] | ["\(.firstName) \(.lastName)"] | join(" and ")'
The results are printed out on separate lines like this:
Abe Abraham
Max Miller
I think the join command is just joining the single-element array piped to it (one name per array). How can I get the full list of names passed to join as a single array, so I get the expected output shown above?
You're getting an array for each speaker that way. What you want is a single array containing all so that you can join them, which is done like this:
.speakers | map("\(.firstName) \(.lastName)") | join(" and ")
$ jq -c '.speakers[] | [ "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham"]
["Max Miller"]
If you move your opening [ you get a single array with all the names.
$ jq -c '[ .speakers[] | "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham","Max Miller"]
Which you can pass to join()
$ jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")' speakers.json
Abe Abraham and Max Miller
If there are no other keys you can also write it like:
$ jq -r '[.speakers[] | join(" ")] | join(" and ")' speakers.json
Abe Abraham and Max Miller

Using jq to get json values

Input json:
{
"food_group": "fruit",
"glycemic_index": "low",
"fruits": {
"fruit_name": "apple",
"size": "large",
"color": "red"
}
}
Below two jq commands work:
# jq -r 'keys_unsorted[] as $key | "\($key), \(.[$key])"' food.json
food_group, fruit
glycemic_index, low
fruits, {"fruit_name":"apple","size":"large","color":"red"}
# jq -r 'keys_unsorted[0:2] as $key | "\($key)"' food.json
["food_group","glycemic_index"]
How to get values for the first two keys using jq in the same manner? I tried below
# jq -r 'keys_unsorted[0:2] as $key | "\($key), \(.[$key])"' food.json
jq: error (at food.json:9): Cannot index object with array
Expected output:
food_group, fruit
glycemic_index, low
To iterate over an object, you can use to_entries, which transforms it into an array of key/value pairs.
Then you can use select to filter the entries you want to keep.
jq -r 'to_entries[]| select( ( .value | type ) == "string" ) | "\(.key), \(.value)" '
You can use to_entries
to_entries[] | select(.key=="food_group" or .key=="glycemic_index") | "\(.key), \(.value)"
Demo: https://jqplay.org/s/Aqvos4w7bo
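If the goal is really "the first two keys" by position rather than by name, a positional variant (relying on jq preserving the object's key order) could be:

```shell
# food.json from the question.
cat > food.json <<'EOF'
{
  "food_group": "fruit",
  "glycemic_index": "low",
  "fruits": {"fruit_name": "apple", "size": "large", "color": "red"}
}
EOF

# to_entries keeps insertion order; [:2] takes the first two entries.
pairs=$(jq -r 'to_entries[:2][] | "\(.key), \(.value)"' food.json)
printf '%s\n' "$pairs"
# food_group, fruit
# glycemic_index, low
```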

How to make it work with filter in jq

I'd like to filter the JSON file below to get all keys that start with "tag_Name":
{
...
"tag_Name_abc": [
"10_1_4_3",
"10_1_6_2",
"10_1_5_3",
"10_1_5_5"
],
"tag_Name_efg": [
"10_1_4_5"
],
...
}
I tried something but it failed:
$ cat output.json |jq 'map(select(startswith("tag_Name")))'
jq: error (at <stdin>:1466): startswith() requires string inputs
There are plenty of ways to do this, but the simplest is to convert the object to entries so you get access to the keys, filter the entries by the names you want, and then convert back.
with_entries(select(.key | startswith("tag_Name")))
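A quick run of this filter against a small stand-in for output.json (the real file is elided in the question):

```shell
# Stand-in for the elided output.json: one non-matching key plus two tag_Name keys.
cat > output.json <<'EOF'
{
  "other_key": ["x"],
  "tag_Name_abc": ["10_1_4_3", "10_1_6_2", "10_1_5_3", "10_1_5_5"],
  "tag_Name_efg": ["10_1_4_5"]
}
EOF

# with_entries filters on .key, so startswith() gets a string input as required.
filtered=$(jq -c 'with_entries(select(.key | startswith("tag_Name")))' output.json)
printf '%s\n' "$filtered"
# {"tag_Name_abc":["10_1_4_3","10_1_6_2","10_1_5_3","10_1_5_5"],"tag_Name_efg":["10_1_4_5"]}
```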
Here are a few more solutions:
1) combining values for matching keys with add
. as $d
| keys
| map( select(startswith("tag_Name")) | {(.): $d[.]} )
| add
2) filtering out non-matching keys with delpaths
delpaths([
keys[]
| select(startswith("tag_Name") | not)
| [.]
])
3) filtering out non-matching keys with reduce and del
reduce keys[] as $k (
.
; if ($k|startswith("tag_Name")) then . else del(.[$k]) end
)
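All three variants should agree; a quick sanity check on a small stand-in file:

```shell
# Minimal stand-in for output.json.
cat > output.json <<'EOF'
{"other_key":["x"],"tag_Name_abc":["10_1_4_3"],"tag_Name_efg":["10_1_4_5"]}
EOF

# 1) rebuild matching keys with add
a=$(jq -c '. as $d | keys | map(select(startswith("tag_Name")) | {(.): $d[.]}) | add' output.json)
# 2) delete non-matching paths with delpaths
b=$(jq -c 'delpaths([keys[] | select(startswith("tag_Name") | not) | [.]])' output.json)
# 3) delete non-matching keys one by one with reduce/del
c=$(jq -c 'reduce keys[] as $k (.; if ($k|startswith("tag_Name")) then . else del(.[$k]) end)' output.json)
printf '%s\n%s\n%s\n' "$a" "$b" "$c"
```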