I have the following file file.txt:
{"a": "a", "b": "a", "time": "20210210T10:10:00"}
{"a": "b", "b": "b", "time": "20210210T11:10:00"}
I extract the values with jq (I run this command on massive 100 GB files):
jq -r '[.a, .b, .time] | @tsv' file.txt
This returns the correct result:
a a 20210210T10:10:00
b b 20210210T11:10:00
The output I would like is:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
I want to change the format of the date in the most efficient way possible.
How do I do that?
You can do it in sed, but you can also call sub directly in jq:
jq -r '[.a, .b,
( .time
| sub("(?<y>\\d{4})(?<m>\\d{2})(?<d>\\d{2})T";
.y+"-"+.m+"-"+.d+" ")
)
] | @tsv'
Use strptime for date interpretation and strftime for formatting:
parse.jq
[
.a,
.b,
( .time
| strptime("%Y%m%dT%H:%M:%S")
| strftime("%Y-%m-%d %H:%M:%S")
)
] | @tsv
Run it like this:
<input.json jq -rf parse.jq
Or as a one-liner:
<input.json jq -r '[.a,.b,(.time|strptime("%Y%m%dT%H:%M:%S")|strftime("%Y-%m-%d %H:%M:%S"))]|@tsv'
Output:
a a 2021-02-10 10:10:00
b b 2021-02-10 11:10:00
Since speed is an issue, and nothing more than string splitting appears to be needed, you could compare string slicing done in jq:
[.a, .b,
(.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")] | @tsv
versus the same splitting done in awk on jq's TSV output, with awk -F\\t 'BEGIN{OFS=FS} ....' (awk makes the TSV easy to handle).
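A minimal sketch of that comparison, assuming a POSIX shell with jq on the PATH. The function names reformat_slice and reformat_awk are hypothetical, and the awk body is my own filling-in of the elided program above; it assumes the time is always the third TSV field and always has the fixed layout 20210210T10:10:00.

```shell
# Variant 1: slice the timestamp string entirely inside jq.
reformat_slice() {
  jq -r '[.a, .b, (.time | "\(.[:4])-\(.[4:6])-\(.[6:8]) \(.[9:])")] | @tsv'
}

# Variant 2: let jq emit plain TSV and do the slicing in awk.
# Assumes field 3 is always "YYYYMMDDTHH:MM:SS".
reformat_awk() {
  jq -r '[.a, .b, .time] | @tsv' |
    awk -F'\t' 'BEGIN { OFS = FS }
      { t = $3; $3 = substr(t, 1, 4) "-" substr(t, 5, 2) "-" substr(t, 7, 2) " " substr(t, 10) }
      1'
}
```

To compare them, time each one on a representative slice of the real data, e.g. `time reformat_slice < sample.json > /dev/null`, where sample.json is a placeholder for a chunk of the 100 GB file.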
With sed:
$ echo "20210427T19:23:00" | sed -r 's|([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})T|\1-\2-\3 |'
2021-04-27 19:23:00
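Applied to the TSV that jq already produces, the same substitution can run as a streaming post-processing step. A sketch, assuming the timestamp is the last field: it uses -E (accepted by both GNU and BSD sed) and anchors the pattern at the end of the line, so it can only match inside the last field. The function name reformat_sed is hypothetical.

```shell
# Emit plain TSV from jq, then rewrite "YYYYMMDDTHH:MM:SS" at the end of each line.
reformat_sed() {
  jq -r '[.a, .b, .time] | @tsv' |
    sed -E 's/([0-9]{4})([0-9]{2})([0-9]{2})T([0-9:]+)$/\1-\2-\3 \4/'
}
```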
I have a CSV that looks like this:
created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7
I want to parse it so I have the following result, setting ids as columns and grouping by date:
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
Missing values are set to nan; each id appears at most once per date.
How can I do it? I also have the first CSV as equivalent JSON, if this is easier to do with jq.
The JSON is composed of similar elements:
{
"created": "2022-12-16 09:15",
"value": "10.4",
"id": "60"
}
Using the great Miller (version >= 6), running
mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv
you get
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
The core command here is reshape -s id,value, to transform your input from long to wide structure.
This is how I would do it in jq, based on the JSON input stream:
reduce inputs as {$created, $value, $id} ({head: [], body: {}};
.head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
)
| (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
.body | to_entries[] | [.key, .value[$head[]]]
)
Then either use the @csv builtin, which wraps the values in quotes and produces empty values for missing combinations:
jq -nr '
⋮
| @csv
'
"created","2","3","10","11","50","55","58","59"
"2022-12-16 11:55","6",,"3",,,"4","2","5"
"2022-12-16 11:50",,"12",,"9","10",,"8","11"
Or generate the nan values and the commas manually by mapping and joining:
jq -nr '
⋮
| map(. // "nan") | join(",")
'
created,2,3,10,11,50,55,58,59
2022-12-16 11:55,6,nan,3,nan,nan,4,2,5
2022-12-16 11:50,nan,12,nan,9,10,nan,8,11
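For reference, here is the reduce program above combined with the nan/join ending into one runnable function. It assumes the JSON input is a stream of one object per line with string-valued created, id and value fields, as in the question's JSON example; the name pivot_csv is hypothetical.

```shell
# Pivot a stream of {created, id, value} objects into a wide CSV, filling gaps with "nan".
pivot_csv() {
  jq -nr '
    reduce inputs as {$created, $value, $id} ({head: [], body: {}};
      # Record each id once, in first-seen order, and file the value under its date.
      .head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
    )
    # Ids are strings, so sort them numerically for the header.
    | (.head | sort_by(tonumber)) as $head
    | ["created", $head[]], (.body | to_entries[] | [.key, .value[$head[]]])
    | map(. // "nan") | join(",")
  '
}
```

Fed the five rows of the question's CSV converted to JSON objects, this should print the created,58,59,60 table the question asks for, with dates in first-seen order.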
I've been using jq to parse the output from the AWS CLI.
The output looks something like this:
{
"Vpcs": [
{
"CidrBlock": "10.29.19.64/26",
"State": "available",
"VpcId": "vpc-0ba51bd29c41d41",
"IsDefault": false,
"Tags": [
{
"Key": "Name",
"Value": "CloudEndure-Europe-Development"
}
]
}
]}
The script I am using looks like this:
.Vpcs[] | [.VpcId, .CidrBlock, (.Tags[]|select(.Key=="Name")|.Value)]
If I run it under Windows, it fails like this:
jq: error: Name/0 is not defined at <top-level>, line 1:
.Vpcs[] | [.VpcId, .CidrBlock, (.Tags[]|select(.Key==Name)|.Value)]
jq: 1 compile error
But it works fine in jqplay.org.
Any ideas? On Windows I'm using jq 1.6.
Thanks
Bruce.
The correct jq program is
.Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == "Name" ) | .Value ) ]
You didn't show the command you used, but you provided the following to jq:
.Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == Name ) | .Value ) ]
That's incorrect. (Notice the missing quotes.)
Not only did you not show the command you used, you didn't specify whether the program was passed through the Windows API (CreateProcess), the Windows shell (cmd) or PowerShell.
I'm guessing cmd. In order to provide the above program to jq, you can use the following cmd command:
jq ".Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == \"Name\" ) | .Value ) ]" file.json
I don't agree with the cmd command ikegami provided, because the escape character in cmd is ^, not \ as in C/C++. I hope one of these will work (I haven't tested them on my own slow machine):
jq .Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == "Name" ) | .Value ) ] file.json
or this:
jq .Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == ^"Name^" ) | .Value ) ] file.json
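Another way around the quoting differences between cmd, PowerShell and POSIX shells is to keep the filter in a file and pass it with jq's -f option, so no shell ever parses the quotes around "Name". A sketch; the file name vpcs.jq and the function name vpc_names are hypothetical, and on Windows you would create vpcs.jq in an editor rather than with the heredoc shown here.

```shell
# Write the filter to a file; quoting 'EOF' stops the shell from altering its contents.
cat > vpcs.jq <<'EOF'
.Vpcs[] | [.VpcId, .CidrBlock, (.Tags[] | select(.Key == "Name") | .Value)]
EOF

# Run the filter from the file against the JSON file given as the first argument.
vpc_names() {
  jq -c -f vpcs.jq "$1"
}
```

In cmd the equivalent call is simply `jq -c -f vpcs.jq file.json`.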
I'm attempting to reduce this list of names to a single line of text.
I have JSON like this:
{
"speakers": [
{
"firstName": "Abe",
"lastName": "Abraham"
},
{
"firstName": "Max",
"lastName": "Miller"
}
]
}
Expected output:
Abe Abraham and Max Miller
One of the many attempts I've made is this:
jq -r '.speakers[] | ["\(.firstName) \(.lastName)"] | join(" and ")'
The results are printed out on separate lines like this:
Abe Abraham
Max Miller
I think the join command is just joining the single-element array piped to it (one name per array). How can I get the full list of names passed to join as a single array, so I get the expected output shown above?
You're getting a separate array for each speaker that way. What you want is a single array containing all the names, so that you can join them, which is done like this:
.speakers | map("\(.firstName) \(.lastName)") | join(" and ")
$ jq -c '.speakers[] | [ "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham"]
["Max Miller"]
If you move your opening [, you get a single array with all the names:
$ jq -c '[ .speakers[] | "\(.firstName) \(.lastName)" ]' speakers.json
["Abe Abraham","Max Miller"]
which you can pass to join():
$ jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")' speakers.json
Abe Abraham and Max Miller
If there are no other keys, you can also write it like this:
$ jq -r '[.speakers[] | join(" ")] | join(" and ")' speakers.json
Abe Abraham and Max Miller
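A quick check that the map form and the bracketed form produce the same string, assuming the speakers.json layout from the question; join_map and join_brackets are hypothetical names.

```shell
# map() builds the array of full names from .speakers directly.
join_map() {
  jq -r '.speakers | map("\(.firstName) \(.lastName)") | join(" and ")'
}

# The outer [ ] collects every name produced by .speakers[] into one array.
join_brackets() {
  jq -r '[ .speakers[] | "\(.firstName) \(.lastName)" ] | join(" and ")'
}
```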
Input json:
{
"food_group": "fruit",
"glycemic_index": "low",
"fruits": {
"fruit_name": "apple",
"size": "large",
"color": "red"
}
}
The two jq commands below work:
# jq -r 'keys_unsorted[] as $key | "\($key), \(.[$key])"' food.json
food_group, fruit
glycemic_index, low
fruits, {"fruit_name":"apple","size":"large","color":"red"}
# jq -r 'keys_unsorted[0:2] as $key | "\($key)"' food.json
["food_group","glycemic_index"]
How can I get the values for the first two keys with jq in the same manner? I tried:
# jq -r 'keys_unsorted[0:2] as $key | "\($key), \(.[$key])"' food.json
jq: error (at food.json:9): Cannot index object with array
Expected output:
food_group, fruit
glycemic_index, low
To iterate over an object, you can use to_entries, which transforms it into an array of key/value entries.
Then you can use select to filter the entries you want to keep:
jq -r 'to_entries[]| select( ( .value | type ) == "string" ) | "\(.key), \(.value)" '
You can use to_entries:
to_entries[] | select(.key=="food_group" or .key=="glycemic_index") | "\(.key), \(.value)"
Demo: https://jqplay.org/s/Aqvos4w7bo
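The original attempt fails because keys_unsorted[0:2] is a single array, so .[$key] tries to index the object with an array. Iterating over the slice with [] binds $key to each of the first two keys in turn. A sketch; first_two is a hypothetical name.

```shell
# Take the first two keys in document order and print "key, value" for each.
first_two() {
  jq -r 'keys_unsorted[0:2][] as $key | "\($key), \(.[$key])"'
}
```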
I have the following code, which lists all the current AWS Lambda functions on my account:
aws lambda list-functions --region eu-west-1 | jq -r '.Functions | .[] | .FunctionName' | xargs -L1 -I {} aws logs describe-log-streams --log-group-name /aws/lambda/{} | jq 'select(.logStreams[-1] != null)' | jq -r '.logStreams | .[] | [.arn, .lastEventTimestamp] | @csv'
that returns
"arn:aws:logs:eu-west-1:****:log-group:/aws/lambda/admin-devices-block-master:log-stream:2018/01/23/[$LATEST]64965367852942f490305cb8707d81b4",1516717768514
I am only interested in admin-devices-block-master, and I want to convert the timestamp 1516717768514 with strflocaltime("%Y-%m-%d %I:%M%p"),
so it should just return:
"admin-devices-block-master",1516717768514
I tried:
aws lambda list-functions --region eu-west-1 | jq -r '.Functions | .[] | .FunctionName' | xargs -L1 -I {} aws logs describe-log-streams --log-group-name /aws/lambda/{} | jq 'select(.logStreams[-1] != null)' | jq -r '.logStreams | .[] | [.arn,[.lastEventTimestamp|./1000|strflocaltime("%Y-%m-%d %I:%M%p")]]'
jq: error: strflocaltime/1 is not defined at <top-level>, line 1:
.logStreams | .[] | [.arn,[.lastEventTimestamp|./1000|strflocaltime("%Y-%m-%d %I:%M%p")]]
jq: 1 compile error
^CException ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
Any advice is much appreciated.
strflocaltime requires jq 1.6 (thanks to @oliv for pointing this out).
Here is a very simple example that replaces an epoch timestamp in milliseconds with a local time:
date -d @1572892409
Mon Nov 4 13:33:29 EST 2019
echo '{ "ts" : 1572892409356 , "id": 2 , "v": "foobar" } ' | \
jq '.ts|=( ./1000|strflocaltime("%Y-%m-%d %I:%M%p")) '
{
"ts": "2019-11-04 01:33PM",
"id": 2,
"v": "foobar"
}
A second version that tests whether ts exists:
(
echo '{ "ts" : 1572892409356 , "id": 2 , "v": "foobar" } ' ;
echo '{ "id":3 }' ;
echo '{ "id": 4 , "v": "barfoo" }'
) | jq 'if .ts != null
then ( .ts|=( ./1000|strflocaltime("%Y-%m-%d %I:%M%p")) )
else .
end '
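Putting the pieces together for the question's pipeline: a sketch that takes the describe-log-streams JSON, pulls the function name out of the log stream ARN and formats the timestamp with strflocaltime (jq 1.6 or later). It assumes the ARN layout arn:aws:logs:REGION:ACCOUNT:log-group:/aws/lambda/NAME:log-stream:..., so the log group is the seventh colon-separated field; streams_to_csv is a hypothetical name.

```shell
# Turn describe-log-streams output into "function-name","local time" CSV rows.
streams_to_csv() {
  jq -r '
    .logStreams[]
    # Field 6 (0-based) of the ARN is the log group, e.g. /aws/lambda/admin-devices-block-master.
    | [ (.arn | split(":") | .[6] | ltrimstr("/aws/lambda/")),
        # lastEventTimestamp is in milliseconds; strflocaltime expects seconds.
        (.lastEventTimestamp / 1000 | strflocaltime("%Y-%m-%d %I:%M%p")) ]
    | @csv'
}
```

It would replace the last jq call in the pipeline (after the select step). To keep only one function, append something like `| select(startswith("\"admin-devices-block-master\""))` as a filter on the CSV lines, or add a select on the array before @csv. The printed time depends on the TZ environment variable, since strflocaltime uses local time.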