I want to create a simpler JSON with the same structure as the original, but containing only a small sample of the data.
For example, if I have this JSON:
{
"field1": [
{
"a": "F1A1",
"b": "F1B1"
},
{
"a": "F1A2",
"b": "F1B2"
},
{
"a": "F1A3",
"b": "F1B3"
},
{
"a": "F1A4",
"b": "F1B4"
}
],
"field2": [
{
"a": "F2A1",
"b": "F2B1"
},
{
"a": "F2A2",
"b": "F2B2"
}
],
"field3": [
{
"a": "F3A1",
"b": "F3B1"
},
{
"a": "F3A2",
"b": "F3B2"
}
]
}
I want to get the first array element from the first field. So I was expecting this:
{
"field1": [
{
"a": "F1A1",
"b": "F1B1"
}
]
}
I executed jq "select(.field1[0])" tmp.json but it returned the original JSON.
Bonus:
How would I do the same, but extracting field1 with only the array elements where a == "F1A1" or a == "F1A4"? I would expect:
{
"field1": [
{
"a": "F1A1",
"b": "F1B1"
},
{
"a": "F1A4",
"b": "F1B4"
}
]
}
Reduce the outer object to your field using {field1}, then map this field to an array containing only its first item:
jq '{field1} | map_values([first])'
{
"field1": [
{
"a": "F1A1",
"b": "F1B1"
}
]
}
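For reference, an equivalent formulation that spells out the index instead of using first (same sample data, trimmed to two fields; first is simply .[0]):

```shell
# Build the object explicitly, keeping only the first array element.
echo '{"field1":[{"a":"F1A1","b":"F1B1"},{"a":"F1A2","b":"F1B2"}],"field2":[]}' \
  | jq -c '{field1: [.field1[0]]}'
# → {"field1":[{"a":"F1A1","b":"F1B1"}]}
```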
To filter for certain items use select:
jq '{field1} | map_values(map(select(.a == "F1A1" or .a == "F1A4")))'
{
"field1": [
{
"a": "F1A1",
"b": "F1B1"
},
{
"a": "F1A4",
"b": "F1B4"
}
]
}
As you can see, select does something different. It passes on its entire input if its argument evaluates to a truthy value (anything but false or null). Therefore its output is either all or nothing, never just a filtered part. (Of course, you can use select to achieve specific filtering, as shown above.)
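A quick sketch of that all-or-nothing behavior on inline samples:

```shell
# Truthy argument: select passes the whole input through unchanged.
echo '{"field1":[1,2,3]}' | jq -c 'select(.field1[0])'
# → {"field1":[1,2,3]}

# .field1[0] is null here, so select emits nothing at all.
echo '{"field1":[]}' | jq -c 'select(.field1[0])'
# → (no output)
```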
I originally had an issue with a JSON (let's call it "raw_data.json") that looked like this:
[
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533712,
"field2": "",
"field3": "hello",
"field4": "other",
"field5": 2022,
"field6": "0",
"field7": "0",
"field8": "0",
"field9": "0",
"field10": "0",
"field11": "0"
}
},
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533713,
"field2": "Y2JjLTIwMjItdzA1LXRyZi1vZmZyZXMtcmVuZm9ydC1jaGVxdWllci13MDU=",
"field3": "A0AVB",
"field4": "other",
"field5": "HJlbmZvcnQgY2hlcXVpZXIgVzA1",
"field6": "",
"field7": "02/02/2022",
"field8": "14/02/2022",
"field9": "Ticket"
}
}
]
The dictionaries were pretty big and usually spanned several lines on my monitor. That was an issue because I needed to encode the whole thing in Base64 and send the result with an HTTP POST method. But the encoding tended to wrap lines by adding newline characters, which caused my POST method to fail.
I fortunately found a question with the following solution:
export DATA=$(cat 'raw_data.json')
export ENCODED_DATA=$(echo "$DATA" | jq -c . | base64 -w 0)
Problem is, my JSON has now changed to this:
{
"request_id": 1234,
"data_to_be_encoded": [
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533712,
"field2": "",
"field3": "hello",
"field4": "other",
"field5": 2022,
"field6": "0",
"field7": "0",
"field8": "0",
"field9": "0",
"field10": "0",
"field11": "0"
}
},
{
"metadata": {
"project": [
"someProject1",
"someProject2"
],
"table_name": [
"someTable1",
"someTable2"
],
"sys_insertdatetime": "someDate",
"sys_sourcesystem": "someSourceSystem"
},
"data": {
"field1": 63533713,
"field2": "Y2JjLTIwMjItdzA1LXRyZi1vZmZyZXMtcmVuZm9ydC1jaGVxdWllci13MDU=",
"field3": "A0AVB",
"field4": "other",
"field5": "HJlbmZvcnQgY2hlcXVpZXIgVzA1",
"field6": "",
"field7": "02/02/2022",
"field8": "14/02/2022",
"field9": "Ticket"
}
}
]
}
Basically I needed to keep the request_id key-value pair as is, while what was inside the data_to_be_encoded key had to be encoded. Again, another post offered a nice solution:
export DATA=$(cat 'raw_data.json')
export ENCODED_DATA=$(echo "$DATA" | jq '.data_to_be_encoded |= @base64')
Except for the fact that this solution adds line wrapping and I haven't found a way to disable that feature like I managed to do with the first solution.
I did try this:
export ENCODED_DATA=$(echo "$DATA" | jq -c .data_to_be_encoded | base64 -w 0)
But it only returns the value inside the data_to_be_encoded key and not the whole JSON. So I'm back to square one.
How can I get the best of both worlds? In other words, how can I disable the line wrapping while at the same time encoding a specific part of my JSON?
Use jq's @base64 builtin rather than doing the conversion outside of jq, because outside you can only encode all of the document or none of it.
jq '.data_to_be_encoded |= @base64'
{
"request_id": 1234,
"data_to_be_encoded": "W3sibWV0YWRhdGEiOnsicHJvamVjdCI6WyJzb21lUHJvamVjdDEiLCJzb21lUHJvamVjdDIiXSwidGFibGVfbmFtZSI6WyJzb21lVGFibGUxIiwic29tZVRhYmxlMiJdLCJzeXNfaW5zZXJ0ZGF0ZXRpbWUiOiJzb21lRGF0ZSIsInN5c19zb3VyY2VzeXN0ZW0iOiJzb21lU291cmNlU3lzdGVtIn0sImRhdGEiOnsiZmllbGQxIjo2MzUzMzcxMiwiZmllbGQyIjoiIiwiZmllbGQzIjoiaGVsbG8iLCJmaWVsZDQiOiJvdGhlciIsImZpZWxkNSI6MjAyMiwiZmllbGQ2IjoiMCIsImZpZWxkNyI6IjAiLCJmaWVsZDgiOiIwIiwiZmllbGQ5IjoiMCIsImZpZWxkMTAiOiIwIiwiZmllbGQxMSI6IjAifX0seyJtZXRhZGF0YSI6eyJwcm9qZWN0IjpbInNvbWVQcm9qZWN0MSIsInNvbWVQcm9qZWN0MiJdLCJ0YWJsZV9uYW1lIjpbInNvbWVUYWJsZTEiLCJzb21lVGFibGUyIl0sInN5c19pbnNlcnRkYXRldGltZSI6InNvbWVEYXRlIiwic3lzX3NvdXJjZXN5c3RlbSI6InNvbWVTb3VyY2VTeXN0ZW0ifSwiZGF0YSI6eyJmaWVsZDEiOjYzNTMzNzEzLCJmaWVsZDIiOiJZMkpqTFRJd01qSXRkekExTFhSeVppMXZabVp5WlhNdGNtVnVabTl5ZEMxamFHVnhkV2xsY2kxM01EVT0iLCJmaWVsZDMiOiJBMEFWQiIsImZpZWxkNCI6Im90aGVyIiwiZmllbGQ1IjoiSEpsYm1admNuUWdZMmhsY1hWcFpYSWdWekExIiwiZmllbGQ2IjoiIiwiZmllbGQ3IjoiMDIvMDIvMjAyMiIsImZpZWxkOCI6IjE0LzAyLzIwMjIiLCJmaWVsZDkiOiJUaWNrZXQifX1d"
}
If you need it all on one line, use the -c option as before:
jq -c '.data_to_be_encoded |= @base64'
{"request_id":1234,"data_to_be_encoded":"W3sibWV0YWRhdGEiOnsicHJvamVjdCI6WyJzb21lUHJvamVjdDEiLCJzb21lUHJvamVjdDIiXSwidGFibGVfbmFtZSI6WyJzb21lVGFibGUxIiwic29tZVRhYmxlMiJdLCJzeXNfaW5zZXJ0ZGF0ZXRpbWUiOiJzb21lRGF0ZSIsInN5c19zb3VyY2VzeXN0ZW0iOiJzb21lU291cmNlU3lzdGVtIn0sImRhdGEiOnsiZmllbGQxIjo2MzUzMzcxMiwiZmllbGQyIjoiIiwiZmllbGQzIjoiaGVsbG8iLCJmaWVsZDQiOiJvdGhlciIsImZpZWxkNSI6MjAyMiwiZmllbGQ2IjoiMCIsImZpZWxkNyI6IjAiLCJmaWVsZDgiOiIwIiwiZmllbGQ5IjoiMCIsImZpZWxkMTAiOiIwIiwiZmllbGQxMSI6IjAifX0seyJtZXRhZGF0YSI6eyJwcm9qZWN0IjpbInNvbWVQcm9qZWN0MSIsInNvbWVQcm9qZWN0MiJdLCJ0YWJsZV9uYW1lIjpbInNvbWVUYWJsZTEiLCJzb21lVGFibGUyIl0sInN5c19pbnNlcnRkYXRldGltZSI6InNvbWVEYXRlIiwic3lzX3NvdXJjZXN5c3RlbSI6InNvbWVTb3VyY2VTeXN0ZW0ifSwiZGF0YSI6eyJmaWVsZDEiOjYzNTMzNzEzLCJmaWVsZDIiOiJZMkpqTFRJd01qSXRkekExTFhSeVppMXZabVp5WlhNdGNtVnVabTl5ZEMxamFHVnhkV2xsY2kxM01EVT0iLCJmaWVsZDMiOiJBMEFWQiIsImZpZWxkNCI6Im90aGVyIiwiZmllbGQ1IjoiSEpsYm1admNuUWdZMmhsY1hWcFpYSWdWekExIiwiZmllbGQ2IjoiIiwiZmllbGQ3IjoiMDIvMDIvMjAyMiIsImZpZWxkOCI6IjE0LzAyLzIwMjIiLCJmaWVsZDkiOiJUaWNrZXQifX1d"}
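For completeness: on the receiving side, the inverse filter @base64d (available since jq 1.6) undoes the encoding; a minimal round-trip sketch with a small inline payload:

```shell
# Encode the array, then decode it back and re-parse it with fromjson.
echo '{"request_id":1234,"data_to_be_encoded":[{"x":1}]}' \
  | jq -c '.data_to_be_encoded |= @base64' \
  | jq -c '.data_to_be_encoded |= (@base64d | fromjson)'
# → {"request_id":1234,"data_to_be_encoded":[{"x":1}]}
```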
I need to transform an array of this kind of elements:
[
{
"Field1": "value1",
"Field2": "value2"
},
{
"Field1": "value3",
"Field2": "value4"
},
...
]
To:
[
"PutRequest": {
"Item": {
"Field1": {
"S": "value1"
},
"Field2": {
"S": "value2"
}
}
},
"PutRequest": {
"Item": {
"Field1": {
"S": "value3"
},
"Field2": {
"S": "value4"
}
}
},
...
]
I was thinking about using jq, but I can't quite figure out how to do it.
EDIT
Up to now, I've been able to get that:
[.[] | {"Field1": {"S": .Field1}, "Field2": {"S": .Field2}}]
Is there any way to say: for each field, add a key like {"S": .value}?
EDIT 2
Using the map({PutRequest: {Item: map_values({S: .})}}) approach, it's generating:
{
"S": {
"Field1": "value1",
"Field2": "value2"
}
}
I need:
"Item": {
"Field1": {
"S": "value3"
},
"Field2": {
"S": "value4"
}
}
Any ideas?
It does not exactly match your expected output, but you're probably looking for something like this:
map({PutRequest: {Item: map_values({S: .})}})
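For reference, a self-contained run against the two-element sample:

```shell
# map/1 visits each array element; map_values/1 then rewraps every
# value of that element as {"S": value}.
echo '[{"Field1":"value1","Field2":"value2"},{"Field1":"value3","Field2":"value4"}]' \
  | jq -c 'map({PutRequest: {Item: map_values({S: .})}})'
```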
I have been playing around with jq to format a JSON file, but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
I was wondering, how can I format it this way using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I have followed this thread, Using jq, convert array of name/value pairs to object with named keys, and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at the different levels, building out your result objects as you go.
group_by(.name) | map({
key: .[0].name,
values: (group_by(.city) | map({
key: .[0].city,
values: map({number,type})
}))
})
Just keep in mind that group_by/1 yields its groups in sorted order. If you need to preserve the input order instead, you'll want an implementation like this:
def group_by_unsorted(key_selector):
reduce .[] as $i ({};
.["\($i|key_selector)"] += [$i]
)|[.[]];
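Plugging the unsorted variant into the filter above, a minimal sketch on an inline two-element input (note that "B" stays before "A", which group_by would have sorted):

```shell
# group_by_unsorted buckets items by key in first-seen order.
echo '[{"name":"B","number":3,"type":"apple","city":"NYC"},{"name":"A","number":1,"type":"apple","city":"NYC"}]' \
  | jq -c 'def group_by_unsorted(key_selector):
      reduce .[] as $i ({}; .["\($i|key_selector)"] += [$i]) | [.[]];
    group_by_unsorted(.name) | map({
      key: .[0].name,
      values: (group_by_unsorted(.city) | map({
        key: .[0].city,
        values: map({number,type})
      }))
    })'
```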
I've read all the related posts and have been playing around with this for hours, but I still can't get a grip on this tool, which seems to be exactly what I need if I can just find a way to make it work.
So here's a sample of my JSON:
{
"res": "0",
"main": {
"All": [
{
"field1": "a",
"field2": "aa",
"field3": "aaa",
"field4": "0",
"active": "true",
"id": "1"
},
{
"field1": "b",
"field2": "bb",
"field3": "bbb",
"field4": "0",
"active": "false",
"id": "2"
},
{
"field1": "c",
"field2": "cc",
"field3": "ccc",
"field4": "0",
"active": "true",
"id": "3"
},
{
"field1": "d",
"field2": "dd",
"field3": "ddd",
"field4": "0",
"active": "true",
"id": "4"
}
]
}
}
I'd like to selectively extract some of the fields and get a csv output like this:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
Please note that I've skipped some fields, and I'm also not interested in the parent arrays and such.
Thanks a lot in advance.
First, your JSON needs to be fixed as follows:
{
"main": {
},
"table": {
"All": [
{
"field1": "a",
"field2": "aa",
"field3": "aaa",
"field4": "0",
"active": "true",
"id": "1"
},
{
"field1": "b",
"field2": "bb",
"field3": "bbb",
"field4": "0",
"active": "false",
"id": "2"
},
{
"field1": "c",
"field2": "cc",
"field3": "ccc",
"field4": "0",
"active": "true",
"id": "3"
},
{
"field1": "d",
"field2": "dd",
"field3": "ddd",
"field4": "0",
"active": "true",
"id": "4"
}
]
},
"res": "0"
}
Second, using jq you can do the following to generate tabular output with column:
{ echo Field1 Field2 Field3 ID ; cat data.json | jq -r '.table.All[] | (.field1, .field2, .field3, .id)' | xargs -L4 ; } | column -t
Output:
Field1 Field2 Field3 ID
a aa aaa 1
b bb bbb 2
c cc ccc 3
d dd ddd 4
Using sed:
echo "field1,field2,field3,id" ;cat data.json | jq -r '.table.All[] | (.field1, .field2, .field3, .id)' | xargs -L4 | sed 's/ /,/g'
Output:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
Update:
Without using sed or xargs, jq can format the output as CSV directly:
cat data.json | jq -r '.table.All[] | [.field1, .field2, .field3, .id] | @csv'
Output:
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
Thanks to chepner, who mentioned in the comments that the header can also be added with jq directly:
jq -r '(([["field1", "field2", "field3", "id"]]) + [(.table.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json
Output:
"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
This command should work correctly with the last JSON data you provided in your question:
jq -r '(([["field1", "field2", "field3", "id"]]) + [(.main.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json
([["field1", "field2", "field3", "id"]]) : the first part of the command builds the CSV header.
(.main.All[] | [.field1,.field2,.field3,.id]) : since main is the parent object of your JSON, you select it with .main, which contains the array All. Appending [] to the array name, giving .main.All[], iterates over its elements; each element is a dictionary, and piping it into [.field1,.field2,.field3,.id] collects just the keys we want into an array.
Here's a jq-only solution that requires specifying the desired keys only once, e.g. on the command line:
jq -r --argjson f '["field1", "field2", "field3", "id"]' '
$f, (.table.All[] | [getpath( $f[]|[.])]) | @csv'
Output:
"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
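To see what the getpath part does in isolation: $f[]|[.] turns each name in $f into a one-step path such as ["field1"], and getpath looks each path up in the current object (inline sample):

```shell
echo '{"field1":"a","field4":"0","id":"1"}' \
  | jq -c --argjson f '["field1","id"]' '[getpath( $f[]|[.] )]'
# → ["a","1"]
```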
Losing the quotation marks
One way to avoid quoting the strings would be to pipe into join(",") (or join(", ")) instead of @csv:
field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
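Spelled out, that variant could look like this (inline one-row sample with the same field names):

```shell
echo '{"table":{"All":[{"field1":"a","field2":"aa","field3":"aaa","id":"1"}]}}' \
  | jq -r '( ["field1","field2","field3","id"],
             (.table.All[] | [.field1, .field2, .field3, .id]) ) | join(",")'
# → field1,field2,field3,id
#   a,aa,aaa,1
```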
Of course, this might be unacceptable if the values contain commas. In general, if avoiding the quotation marks around strings is important, a good option to consider is @tsv.
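A quick sketch of the @tsv alternative, which leaves strings unquoted and instead escapes embedded tabs and newlines (inline one-row sample):

```shell
# The comma in field1 survives untouched; columns are tab-separated.
echo '{"table":{"All":[{"field1":"a,b","id":"1"}]}}' \
  | jq -r '.table.All[] | [.field1, .id] | @tsv'
```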