Transform JSON array to object with jq

I'm trying to transform an array into an object keyed by a specific field. It works fine without streaming, but fails once --stream is applied.
Data:
[
{
"id": "1",
"userId": "fa51531d"
}
,
{
"id": "2",
"userId": "a167869a"
}
]
I tried running this command but it throws an error.
jq -n --stream 'fromstream(1|truncate_stream(inputs)) | INDEX(.id)' test.json > result.json
Data above should be transformed to:
{
"1": {
"userId": "fa51531d",
"id": "1"
},
"2": {
"userId": "a167869a",
"id": "2"
}
}
I want to achieve the same result as with jq 'INDEX(.id)', but I need to use --stream because the JSON file is big.

If you are trying to recreate the whole input object, the stream-based approach is rendered pointless. That said, using this approach, there's no need to truncate. So either replace 1 with 0:
jq -n --stream 'fromstream(0|truncate_stream(inputs)) | INDEX(.id)'
Or just omit it entirely (which reveals its futility):
jq -n --stream 'fromstream(inputs) | INDEX(.id)'
What would make more sense is to output a stream of objects, each indexed as with INDEX. Maybe you were looking for this:
jq -n --stream 'fromstream(1|truncate_stream(inputs)) | {(.id):.}'
{
"1": {
"id": "1",
"userId": "fa51531d"
}
}
{
"2": {
"id": "2",
"userId": "a167869a"
}
}
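If a single combined object is still wanted, the streamed elements can be folded into it one at a time with reduce. The following is only a sketch, assuming jq 1.5 or later and the sample data from the question; the result object still has to fit in memory, but the input array is never rebuilt as a whole:

```shell
# Recreate the sample input from the question.
cat > test.json <<'EOF'
[
  {"id": "1", "userId": "fa51531d"},
  {"id": "2", "userId": "a167869a"}
]
EOF

# 1|truncate_stream(inputs) drops the outer array index from every path,
# so fromstream emits one top-level array element at a time.
# reduce then stores each element in the result under its own .id.
result=$(jq -n -c --stream '
  reduce fromstream(1|truncate_stream(inputs)) as $o ({}; .[$o.id] = $o)
' test.json)

echo "$result"
```

Whether this saves enough memory depends on how large the indexed result is compared with the input; the per-object output above remains the lighter option.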

To transform your JSON array into a JSON object, you could use this:
jq 'reduce .[] as $item ({}; .[$item.id] = $item)' test.json
But if you want to stream the JSON, I don't have a solution.
Correct me if I'm wrong.

If your input really looks like the sample in your question, this should do:
jq 'INDEX(.id)' test.json
Output:
{
"1": {
"id": "1",
"userId": "fa51531d"
},
"2": {
"id": "2",
"userId": "a167869a"
}
}

Related

How can I merge matching keys into arrays via another key?

I have a GraphQL schema file with deeply nested object metadata that I'd like to extract into arrays of child properties. The original file is over 75000 lines long but I was able to successfully extract the Types & fields for each object using this command:
jq '.data.__schema.types[] | {name: .name, fields: .fields[]?.name?}' schema.json > output.json
Output:
{
"name": "UsersConnection",
"fields": "nodes"
}
{
"name": "UsersConnection",
"fields": "edges"
}
{
"name": "UsersConnection",
"fields": "pageInfo"
}
{
"name": "UsersConnection",
"fields": "totalCount"
}
{
"name": "UsersEdge",
"fields": "cursor"
}
{
"name": "UsersEdge",
"fields": "node"
}
...
But the output I want looks more like this:
[{
"name": "UsersConnection",
"fields": [ "nodes", "edges", "pageInfo", "totalCount" ]
},
{
"name": "UsersEdge",
"fields": [ "cursor", "node" ]
}]
I was able to do this by comma-separating the objects, wrapping the output in { "data": [ -OUTPUT- ]}, and running this command:
jq 'map(. |= (group_by(.name) | map(first + {fields: map(.fields)})))' output.json > output2.json
How can I do this with a single command?
Assuming .data.__schema.types is an array, and so is .fields, you could try map in both cases:
.data.__schema.types | map({name: .name, fields: (.fields | map(.name))})
I totally missed that I could put the fields expression inside brackets, like this:
jq '.data.__schema.types[] | {name: .name, fields: [.fields[]?.name?]}'
Keeping this up for posterity in case someone else is trying to do the same thing.
Update: I was able to get a cleaner result, a single object that maps each type name to its array of field names, like this:
jq 'reduce .data.__schema.types[] as $d (null; .[$d.name] += [$d.fields[]?.name?])'
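To see the bracketed form produce the array-of-objects shape asked for in the question, here is a small sketch. The schema.json content is a hypothetical, heavily reduced stand-in for a real introspection result, and the outer brackets are an addition that collects the per-type objects into one array:

```shell
# Hypothetical miniature of the introspection result; real files are far larger.
cat > schema.json <<'EOF'
{"data": {"__schema": {"types": [
  {"name": "UsersConnection", "fields": [{"name": "nodes"}, {"name": "edges"}]},
  {"name": "UsersEdge", "fields": [{"name": "cursor"}, {"name": "node"}]},
  {"name": "String", "fields": null}
]}}}
EOF

# The inner brackets gather all field names of one type into an array.
# .fields[]? yields nothing when fields is null, so such types get [].
grouped=$(jq -c '[.data.__schema.types[] | {name: .name, fields: [.fields[]?.name?]}]' schema.json)

echo "$grouped"
```

Types without fields appear with an empty array; appending something like select(.fields | length > 0) inside the outer brackets would drop them if that is preferred.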

Aggregate json arrays from multiple files using jq, grouping by key

I would like to aggregate two or more files into a single json, and aggregate arrays under a same key.
file1.json
{
"shapes": [
{
"id": "1",
"name": "circle"
},
{
"id": "2",
"name": "square"
}
]
}
file2.json
{
"shapes": [
{
"id": "3",
"name": "triangle"
}
]
}
Expected result :
{
"shapes": [
{
"id": "1",
"name": "circle"
},
{
"id": "2",
"name": "square"
},
{
"id": "3",
"name": "triangle"
}
]
}
I can do this with the following jq command :
jq -s '{shapes: map(.shapes)|add }' file*.json
But this requires me to know the shapes attribute and hardcode it. Is there a simple way I can get the same result without ever using the key name explicitly?
Here is a solution that’s suitable when each top-level object has only one key, and that is both efficient and conceptually simple. It assumes jq is invoked with the -n option.
reduce inputs as $in (null;
($in|keys_unsorted[0]) as $k | { ($k): (.[$k] + $in[$k]) })
or slightly more compactly:
reduce inputs as $in (null; ($in|keys_unsorted[0]) as $k | .[$k] += $in[$k] )
Here is a solution that also solves a more general problem: first, it handles arbitrarily many input files; and second, it forms the "sum" by key, for every key, on the assumption that every top-level key is array-valued.
The generic function:
# the values at each key are assumed to be arrays
def aggregate(stream):
  reduce stream as $o ({};
    reduce ($o|keys_unsorted[]) as $k (.;
      .[$k] += $o[$k] ));
To avoid "slurping", we will use inputs:
aggregate(inputs)
The invocation must therefore use the -n command-line option:
jq -n -f program.jq *.json
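Put together, the pieces above might look like the following sketch. It assumes jq 1.5 or later (for inputs and keys_unsorted) and recreates the two sample files from the question; the files are named explicitly instead of using a glob so that no unrelated JSON file is picked up:

```shell
# Recreate the two sample files from the question.
cat > file1.json <<'EOF'
{"shapes": [{"id": "1", "name": "circle"}, {"id": "2", "name": "square"}]}
EOF
cat > file2.json <<'EOF'
{"shapes": [{"id": "3", "name": "triangle"}]}
EOF

# program.jq holds the generic function plus the call that consumes all inputs.
cat > program.jq <<'EOF'
# the values at each key are assumed to be arrays
def aggregate(stream):
  reduce stream as $o ({};
    reduce ($o|keys_unsorted[]) as $k (.;
      .[$k] += $o[$k] ));

aggregate(inputs)
EOF

# -n keeps jq from reading the first file into "." before inputs runs.
merged=$(jq -n -c -f program.jq file1.json file2.json)

echo "$merged"
```

Without -n, the first file would be consumed as the program's input and silently left out of the result.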
Try the following code. It can handle any number of files. All inputs are assumed to be JSON objects whose values are all arrays. The arrays are grouped by key and concatenated, and the output is a single object that maps each key to its aggregated array.
jq -s 'map(to_entries)|add|group_by(.key)|
map( { "key": (.[0].key), "value": (map(.value)|add)})|
from_entries' file1.json file2.json
For your sample input this gives:
{
"shapes": [
{
"id": "1",
"name": "circle"
},
{
"id": "2",
"name": "square"
},
{
"id": "3",
"name": "triangle"
}
]
}

jq: How to combine disjoint object values as a single object of key/value pairs?

If I have a JSON input data:
input.json
{
"metadata": {
"guid": "07f90eed-105d-41b2-bc20-4c20dfb51653"
},
"entity": {
"name": "first"
}
}
{
"metadata": {
"guid": "da187e3a-8db9-49fd-8c05-41f29cf87f51"
},
"entity": {
"name": "second"
}
}
{
"metadata": {
"guid": "6685c3af-5427-4add-8764-7b18ae3c23bb"
},
"entity": {
"name": "third"
}
}
and I want to create from it the following:
{
"first": "07f90eed-105d-41b2-bc20-4c20dfb51653",
"second": "da187e3a-8db9-49fd-8c05-41f29cf87f51",
"third": "6685c3af-5427-4add-8764-7b18ae3c23bb"
}
That is, the input data is a collection of separate JSON objects, each of which has the structure shown. I want the output to be a single JSON object where the key is the .entity.name and the value is the .metadata.guid.
I have tried:
jq -r '{.entity.name: .metadata.guid}' input.json
jq -r 'map({(.entity.name): .metadata.guid})' input.json
but both of these just fail with an error. The closest I got was:
jq -r '.entity.name as $name|.metadata.guid as $guid | { ($name) : ($guid) }' input.json
{
"first": "07f90eed-105d-41b2-bc20-4c20dfb51653"
}
{
"second": "da187e3a-8db9-49fd-8c05-41f29cf87f51"
}
{
"third": "6685c3af-5427-4add-8764-7b18ae3c23bb"
}
But there are still 3 objects (not 1).
I did get one form to give me what I want, but I suspect there is an easier way to do this:
jq -r '.entity.name as $name|.metadata.guid as $guid | { ($name) : ($guid) }' input.json | jq -s add
{
"first": "07f90eed-105d-41b2-bc20-4c20dfb51653",
"second": "da187e3a-8db9-49fd-8c05-41f29cf87f51",
"third": "6685c3af-5427-4add-8764-7b18ae3c23bb"
}
Any thoughts on how to do this properly?
With single jq command:
jq -s '[.[] | { (.entity.name): .metadata.guid }] | add' input.json
-s (--slurp) - instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once.
The output:
{
"first": "07f90eed-105d-41b2-bc20-4c20dfb51653",
"second": "da187e3a-8db9-49fd-8c05-41f29cf87f51",
"third": "6685c3af-5427-4add-8764-7b18ae3c23bb"
}
A simpler way to get three objects is jq '{(.entity.name): .metadata.guid}' input.json.
Wrapping the key (.entity.name) into parentheses tells jq to evaluate it as an expression, not as a string.
This leads to a simpler form of what you already have (using two invocations of jq):
$ jq '{(.entity.name): .metadata.guid}' input.json | jq -s add
{
"first": "07f90eed-105d-41b2-bc20-4c20dfb51653",
"second": "da187e3a-8db9-49fd-8c05-41f29cf87f51",
"third": "6685c3af-5427-4add-8764-7b18ae3c23bb"
}
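The parenthesized-key form also works in a single invocation that neither slurps nor pipes into a second jq. This is a sketch under the assumption of jq 1.5 or later, where inputs is available; it folds each object into the result as it is read:

```shell
# Recreate the sample input from the question: three separate JSON objects.
cat > input.json <<'EOF'
{"metadata": {"guid": "07f90eed-105d-41b2-bc20-4c20dfb51653"}, "entity": {"name": "first"}}
{"metadata": {"guid": "da187e3a-8db9-49fd-8c05-41f29cf87f51"}, "entity": {"name": "second"}}
{"metadata": {"guid": "6685c3af-5427-4add-8764-7b18ae3c23bb"}, "entity": {"name": "third"}}
EOF

# -n stops jq from consuming the first object as "."; reduce then merges
# one single-key object per input into the accumulator.
combined=$(jq -n -c 'reduce inputs as $o ({}; . + {($o.entity.name): $o.metadata.guid})' input.json)

echo "$combined"
```

If two inputs share the same .entity.name, the later one wins, which matches the behaviour of add on the slurped array.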

json array into json stream with jq

This task is similar to this one, but in my case I would like to go the other way around.
So say we have input:
[
{
"name": "John",
"email": "john#company.com"
},
{
"name": "Brad",
"email": "brad#company.com"
}
]
and desired output is:
{
"name": "John",
"email": "john#company.com"
}
{
"name": "Brad",
"email": "brad#company.com"
}
I tried to write a bash function which will do it in loop:
#!/bin/bash
json=`cat $1`
length=`echo $json | jq '. | length'`
for (( i=0; i<$length ; i++ ))
do
echo $json | jq ".[$i]"
done
but it is obviously extremely slow...
Is there a better way to use jq for this?
You can use this :
jq '.[]' file
If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array.
Test:
$ jq '.[]' file
{
"email": "john#company.com",
"name": "John"
}
{
"email": "brad#company.com",
"name": "Brad"
}
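When the stream is meant for line-oriented tools, adding -c prints each element on a single line. A minimal sketch, using the sample data from the question in a hypothetical file called people.json:

```shell
# Sample array from the question; people.json is just an illustrative name.
cat > people.json <<'EOF'
[
  {"name": "John", "email": "john#company.com"},
  {"name": "Brad", "email": "brad#company.com"}
]
EOF

# .[] emits every array element; -c keeps each one on its own line.
lines=$(jq -c '.[]' people.json)

printf '%s\n' "$lines"
```

This replaces the whole loop from the question with one pass over the file, so jq is started once instead of once per element.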
You can apply the ".[]" filter.
This tutorial is very informative:
https://stedolan.github.io/jq/tutorial/

Update inner attribute of JSON with jq

Could somebody help me use the jq command-line utility to update an inner value of a JSON object?
I want to alter the object interpreterSettings.2B188AQ5T.properties by adding several key-value pairs, like "spark.executor.instances": "16".
So far I have only managed to replace this object entirely, not add new properties, with this command:
cat test.json | jq ".interpreterSettings.\"2B188AQ5T\".properties |= { \"spark.executor.instances\": \"16\" }"
This is input JSON:
{
"interpreterSettings": {
"2B263G4Z1": {
"id": "2B263G4Z1",
"name": "sh",
"group": "sh",
"properties": {}
},
"2B188AQ5T": {
"id": "2B188AQ5T",
"name": "spark",
"group": "spark",
"properties": {
"spark.cores.max": "",
"spark.yarn.jar": "",
"master": "yarn-client",
"zeppelin.spark.maxResult": "1000",
"zeppelin.dep.localrepo": "local-repo",
"spark.app.name": "Zeppelin",
"spark.executor.memory": "2560M",
"zeppelin.spark.useHiveContext": "true",
"spark.home": "/usr/lib/spark",
"zeppelin.spark.concurrentSQL": "false",
"args": "",
"zeppelin.pyspark.python": "python"
}
}
},
"interpreterBindings": {
"2AXUMXYK4": [
"2B188AQ5T",
"2AY8SDMRU"
]
}
}
I also tried the following, but this only prints the contents of interpreterSettings.2B188AQ5T.properties, not the full object.
cat test.json | jq ".interpreterSettings.\"2B188AQ5T\".properties + { \"spark.executor.instances\": \"16\" }"
The following works using jq 1.4 or jq 1.5 with a Mac/Linux shell:
jq '.interpreterSettings."2B188AQ5T".properties."spark.executor.instances" = "16" ' test.json
If you have trouble adapting the above for Windows, I'd suggest putting the jq program in a file, say my.jq, and invoking it like so:
jq -f my.jq test.json
Notice that there is no need to use "cat" in this case.
P.S. You were on the right track: try replacing |= with +=.
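The += variant is convenient when several properties are added at once, because the right-hand side can be an object that is merged into the existing one. A sketch on a trimmed-down copy of the input; "spark.executor.cores" and its value are hypothetical and only there to show a second key:

```shell
# Trimmed-down version of the input JSON from the question.
cat > test.json <<'EOF'
{
  "interpreterSettings": {
    "2B188AQ5T": {
      "id": "2B188AQ5T",
      "properties": {"master": "yarn-client", "spark.executor.memory": "2560M"}
    }
  }
}
EOF

# += merges the new keys into properties and returns the whole document.
# "spark.executor.cores" is a made-up second setting for illustration.
updated=$(jq -c '.interpreterSettings."2B188AQ5T".properties += {
  "spark.executor.instances": "16",
  "spark.executor.cores": "4"
}' test.json)

echo "$updated"
```

Single quotes around the jq program avoid the backslash escaping needed in the double-quoted commands from the question.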