Is this even possible with `jq`?

I have some JSON data that looks like this:
[
{"Key": "fruits/red/apple", "Value": "Red apples"},
{"Key": "fruits/green/lime", "Value": "Green Limes"},
{"Key": "fruits/blue/berries/blueberry", "Value": "Blue Berries"},
{"Key": "vegetables/red/tomato", "Value": "Red Tomatoes"},
{"Key": "vegetables/green/cucumber", "Value": "Green Cucumbers"}
]
I am trying to transform this data into a nested JSON tree structure like this:
{
"fruits": {
"id": 1,
"name": "fruits",
"children": [
{
"id": 2,
"name": "red",
"path": 1.2,
"children": [ { "id": 3, "name": "apple", "path": 1.2.3 } ]
},
{
"id": 4,
"name": "green",
"path": 1.4,
"children": [ {"id": 5, "name": "lime", "path": 1.4.5} ]
},
{
"id": 6,
"name": "blue",
"path": 1.6,
"children": [ {"id": 7, "name": "berries", "path": 1.6.7, "children": [{...}] } ]
}
]
},
"vegetables": {...}
}
I am new to jq and have something like the following, which gives me one level of data, but I am lost on how to do running counters and recursion:
[ .[] | { name: .Key, description: .Value, children: ( [.Key | split("/")] | .[0] | to_entries ) } ]
I'd appreciate any pointers on this.

The desired output is not JSON, and it would be difficult to produce those non-numeric paths (e.g. 1.2.3). You could obviously add quotation marks to make them strings, but it would be much better to choose a more standard or convenient path representation.
Other than that, you can rest assured that jq is up to the task, though it requires some general programming expertise, or at least fluency with jq.
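To make the running counter and the recursion concrete, here is a rough sketch of the algorithm in Python (used here only to show the logic, not as a jq answer). It assumes ids are handed out in first-seen order and stores each path as a string such as "1.2.3", since a bare 1.2.3 is not valid JSON; build_tree and its temporary _kids field are hypothetical names.

```python
import json


def build_tree(records):
    """Turn slash-separated "Key" strings into a nested tree.

    Assumptions (not from the question): ids come from one running counter
    in first-seen order, "path" is a dot-joined *string* of ancestor ids,
    top-level nodes have no "path", and leaves have no "children" key,
    mirroring the sample output. "Value" is ignored.
    """
    roots = {}   # name -> node at the top level
    next_id = 1  # running counter shared by every node

    for rec in records:
        level = roots   # name -> node mapping at the current depth
        path_ids = []   # ids of the nodes visited so far along this key
        for name in rec["Key"].split("/"):
            node = level.get(name)
            if node is None:
                node = {"id": next_id, "name": name}
                if path_ids:  # only non-root nodes get a path
                    node["path"] = ".".join(str(i) for i in path_ids + [next_id])
                node["_kids"] = {}  # temporary name -> child mapping
                level[name] = node
                next_id += 1
            path_ids.append(node["id"])
            level = node["_kids"]

    def finish(node):
        # Replace the temporary mapping with a "children" list, if any.
        kids = node.pop("_kids")
        if kids:
            node["children"] = [finish(child) for child in kids.values()]
        return node

    return {name: finish(node) for name, node in roots.items()}


data = [
    {"Key": "fruits/red/apple", "Value": "Red apples"},
    {"Key": "fruits/green/lime", "Value": "Green Limes"},
    {"Key": "fruits/blue/berries/blueberry", "Value": "Blue Berries"},
    {"Key": "vegetables/red/tomato", "Value": "Red Tomatoes"},
    {"Key": "vegetables/green/cucumber", "Value": "Green Cucumbers"},
]
tree = build_tree(data)
print(json.dumps(tree["fruits"]["children"][0]))
# {"id": 2, "name": "red", "path": "1.2", "children": [{"id": 3, "name": "apple", "path": "1.2.3"}]}
```

With the sample data the numbering matches the question (fruits 1, red 2, apple 3, green 4, lime 5, blue 6, berries 7), and vegetables continues from 9.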


Filtering deeply within tree

I'm trying to prune nodes deep within a JSON structure, and I'm puzzled why empty seems to behave differently from a normal value here.
Input
[
{
"name": "foo",
"children": [{
"name": "foo.0",
"color": "red"
}]
},
{
"name": "bar",
"children": [{
"name": "bar.0",
"color": "green"
},
{
"name": "bar.1"
}]
},
{
"name": "baz",
"children": [{
"name": "baz.0"
},
{
"name": "baz.1"
}]
}
]
Program
jq '(.[].children|.[])|=if has("color") then . else empty end' foo.json
Actual output
[
{
"name": "foo",
"children": [
{
"name": "foo.0",
"color": "red"
}
]
},
{
"name": "bar",
"children": [
{
"name": "bar.0",
"color": "green"
}
]
},
{
"name": "baz",
"children": [
{
"name": "baz.1"
}
]
}
]
Expected output
The output I get, except without the baz.1 child, as that one doesn't have a color.
Question
Apart from the right solution, I'm also curious about this: if I replace empty in the script with a regular value like 42, the children without colors are replaced with 42 as expected, but with empty it looks as though the else branch doesn't get executed. Why?
.[].children |= map(select(.color))
This removes children that do not have a color, so the output becomes:
[
{
"name": "foo",
"children": [
{
"name": "foo.0",
"color": "red"
}
]
},
{
"name": "bar",
"children": [
{
"name": "bar.0",
"color": "green"
}
]
},
{
"name": "baz",
"children": []
}
]
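For comparison, here is a small Python analogue of the map(select(.color)) filter (the function names are hypothetical). Note that jq's select(.color) keeps a child only when .color is neither null nor false, so a child with "color": null would also be dropped; if only the presence of the key matters, has("color") is the stricter test.

```python
import json


def jq_truthy(value):
    # In jq only null and false are falsey; "" and 0 count as true,
    # unlike Python's built-in truthiness.
    return value is not None and value is not False


def prune_uncolored(items):
    """Rough Python analogue of: .[].children |= map(select(.color))"""
    return [
        {**item, "children": [c for c in item["children"] if jq_truthy(c.get("color"))]}
        for item in items
    ]


data = [
    {"name": "foo", "children": [{"name": "foo.0", "color": "red"}]},
    {"name": "bar", "children": [{"name": "bar.0", "color": "green"}, {"name": "bar.1"}]},
    {"name": "baz", "children": [{"name": "baz.0"}, {"name": "baz.1"}]},
]
result = prune_uncolored(data)
print(json.dumps(result))
```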
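Regarding why your filter does not seem to like empty: this appears to be a known jq bug, reported as an issue on GitHub.
When |= assigns empty to multiple paths, the matching elements seem to be deleted one at a time, so the indices of the elements after each deleted one shift and some of them are skipped.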
In this case you can use del instead:
del(.[].children[] | select(has("color") | not))
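The skipping can be reproduced with a toy Python model. This is an assumption about the mechanism, not jq's actual implementation: the buggy update walks indices taken from the original array but deletes each element immediately, whereas del collects every matching path first and removes them from the end. Applied to baz's children, the model leaves the same stray baz.1 that appears in the question.

```python
def delete_one_at_a_time(arr, should_delete):
    """Model of the suspected bug: indices come from the ORIGINAL array,
    but each deletion happens immediately, shifting later elements."""
    arr = list(arr)
    for i in range(len(arr)):  # range is fixed from the original length
        if i < len(arr) and should_delete(arr[i]):
            del arr[i]
    return arr


def delete_all_at_once(arr, should_delete):
    """Model of del(...): collect every matching index first, then delete
    from the highest index down so no deletion shifts a pending one."""
    arr = list(arr)
    doomed = [i for i, v in enumerate(arr) if should_delete(v)]
    for i in reversed(doomed):
        del arr[i]
    return arr


def no_color(child):
    return "color" not in child


baz_children = [{"name": "baz.0"}, {"name": "baz.1"}]
print(delete_one_at_a_time(baz_children, no_color))  # [{'name': 'baz.1'}]
print(delete_all_at_once(baz_children, no_color))    # []
```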

Iterate over array and output TSV report

I have a file with 30,000 newline-delimited JSON lines, which I am processing with jq.
Below is the schema of each line (new.json):
{
"indexed": {
"date-parts": [
[
2020,
8,
13
]
],
"date-time": "2020-08-13T06:27:26Z",
"timestamp": 1597300046660
},
"reference-count": 42,
"publisher": "American Chemical Society (ACS)",
"issue": "3",
"content-domain": {
"domain": [],
"crossmark-restriction": false
},
"short-container-title": [
"Org. Lett."
],
"published-print": {
"date-parts": [
[
2005,
2
]
]
},
"DOI": "10.1021/ol047829t",
"type": "journal-article",
"created": {
"date-parts": [
[
2005,
1,
27
]
],
"date-time": "2005-01-27T05:53:29Z",
"timestamp": 1106805209000
},
"page": "383-386",
"source": "Crossref",
"is-referenced-by-count": 38,
"title": [
"Liquid-Crystalline [60]Fullerene-TTF Dyads"
],
"prefix": "10.1021",
"volume": "7",
"author": [
{
"given": "Emmanuel",
"family": "Allard",
"affiliation": []
},
{
"given": "Frédéric",
"family": "Oswald",
"affiliation": []
},
{
"given": "Bertrand",
"family": "Donnio",
"affiliation": []
},
{
"given": "Daniel",
"family": "Guillon",
"affiliation": []
}
],
"member": "316",
"container-title": [
"Organic Letters"
],
"original-title": [],
"link": [
{
"URL": "https://pubs.acs.org/doi/pdf/10.1021/ol047829t",
"content-type": "unspecified",
"content-version": "vor",
"intended-application": "similarity-checking"
}
],
"deposited": {
"date-parts": [
[
2020,
4,
7
]
],
"date-time": "2020-04-07T13:39:55Z",
"timestamp": 1586266795000
},
"score": null,
"subtitle": [],
"short-title": [],
"issued": {
"date-parts": [
[
2005,
2
]
]
},
"references-count": 42,
"alternative-id": [
"10.1021/ol047829t"
],
"URL": "http://dx.doi.org/10.1021/ol047829t",
"relation": {},
"ISSN": [
"1523-7060",
"1523-7052"
],
"issn-type": [
{
"value": "1523-7060",
"type": "print"
},
{
"value": "1523-7052",
"type": "electronic"
}
],
"subject": [
"Physical and Theoretical Chemistry",
"Organic Chemistry",
"Biochemistry"
]
}
For every DOI, I need the values of the given and family keys, each collected into a single cell on that DOI's row, in CSV/TSV format.
The expected output for the above JSON (in CSV/TSV format) is:
|DOI| givenName|familyName|
|10.1021/ol047829t|Emmanuel; Frédéric; Bertrand; Daniel;|Allard; Oswald; Donnio; Guillon|
I am using the command line below, but it throws an error, and when I try to alter it I cannot get CSV/TSV output at all.
cat new.json | jq -r "[.DOI, .publisher, .author[] | .given] | @tsv" > manage.tsv
The same logic applies to the subject key. I am using the command line below to output the values of the subject key to CSV, but it only outputs the first element (in this case, "Physical and Theoretical Chemistry"):
cat new.json | jq -c -r "[.DOI, .publisher, .subject[0]] | @csv" > manage.csv
Any pointers to the right jq command line would be a great help.
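Join the given names and the family names separately with semicolons, then pass the resulting strings as fields to @tsv. (In your attempt, the comma binds more tightly than the pipe, so .given is also applied to .DOI and .publisher, which are strings; that is what raises the error.)
["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv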
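Note that you need to invoke jq with the -r and -n flags for this to work and produce valid TSV output: -n lets inputs read every line (without it, the first line is consumed as . and never reaches inputs), and -r prints the @tsv strings raw instead of as JSON strings. The subject array can be handled the same way, with (.subject | join("; ")) in place of .subject[0].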
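If you want to cross-check the jq output, a rough Python equivalent of the same join-then-tab-separate approach might look like this. The function names are hypothetical; the escaping only approximates @tsv (backslash, tab, carriage return and newline become two-character escape sequences); and unlike the jq filter, it tolerates records without an author array.

```python
import json


def tsv_field(value):
    """Escape a field roughly the way recent jq versions' @tsv do."""
    return (str(value).replace("\\", "\\\\").replace("\t", "\\t")
            .replace("\r", "\\r").replace("\n", "\\n"))


def ndjson_to_tsv(lines):
    """Header row, then one row per JSON line: DOI, given names, family names."""
    rows = [["DOI", "givenName", "familyName"]]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        authors = rec.get("author") or []  # assumption: tolerate a missing author list
        rows.append([
            rec["DOI"],
            "; ".join(a.get("given", "") for a in authors),
            "; ".join(a.get("family", "") for a in authors),
        ])
    return "\n".join("\t".join(tsv_field(f) for f in row) for row in rows)


sample = json.dumps({
    "DOI": "10.1021/ol047829t",
    "author": [
        {"given": "Emmanuel", "family": "Allard"},
        {"given": "Frédéric", "family": "Oswald"},
        {"given": "Bertrand", "family": "Donnio"},
        {"given": "Daniel", "family": "Guillon"},
    ],
})
print(ndjson_to_tsv([sample]))
# For the real file:
#   with open("new.json", encoding="utf-8") as f:
#       print(ndjson_to_tsv(f))
```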

How to add values to a JSON array without the full path?

I have a problem with jq that I could not solve after searching for a few hours. Let's take this simple JSON as an example, with "Category4" nested within "Category2":
{
"categories": [
{
"Id": 1,
"Name": "Category1",
"Children": []
},
{
"Id": 2,
"Name": "Category2",
"Children": [
{
"Id": 4,
"Name": "Category4",
"Children": []
}
]
},
{
"Id": 3,
"Name": "Category3",
"Children": []
}
]
}
I would like to add a "Category5" child within the "Category4" object, like this:
{
"categories": [
{
"Id": 1,
"Name": "Category1",
"Children": []
},
{
"Id": 2,
"Name": "Category2",
"Children": [
{
"Id": 4,
"Name": "Category4",
"Children": [
{
"Id": 5,
"Name": "Category5",
"Children": []
}
]
}
]
},
{
"Id": 3,
"Name": "Category3",
"Children": []
}
]
}
I can do it by using the full path of the "Category4" object with:
jq --argjson a '[{"Id":5,"Name":"Category5","Children":[]}]' '.categories[1].Children[0].Children += $a' "myfile.json"
But how can I achieve the same result if I don't know the position of "Category4" (which could be at root level or nested deep inside other objects)? This command was my best guess:
jq --argjson a '[{"Id":5,"Name":"Category5","Children":[]}]' '.. | select(.Id?==4) | .Children += $a' "myfile"
but it only retrieves "Category4" and "Category5" objects (Category1, 2 and 3 are missing from the output). I feel like I am missing something stupid...
Thanks for any help!
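Use the walk builtin to apply a filter to values at arbitrary depth without changing the overall structure. For nodes that don't match select(.Id? == 4), the left-hand side of += has no paths, so those nodes come back unchanged ($a is passed with --argjson, as in the question):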
walk(select(.Id? == 4) .Children += $a)
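Here is the same bottom-up idea in Python, as a sketch of what walk does rather than a reimplementation of jq: children are rebuilt first, then the function is applied to each node, and nodes that don't match are returned unchanged. walk and append_children are hypothetical Python helpers, not jq builtins.

```python
import json


def walk(value, f):
    """Bottom-up traversal in the spirit of jq's walk/1: rebuild the
    children first, then apply f to the (possibly rebuilt) node."""
    if isinstance(value, dict):
        value = {k: walk(v, f) for k, v in value.items()}
    elif isinstance(value, list):
        value = [walk(v, f) for v in value]
    return f(value)


def append_children(target_id, new_children):
    """Hypothetical helper mirroring select(.Id? == N).Children += $a."""
    def f(node):
        if isinstance(node, dict) and node.get("Id") == target_id:
            # In jq, null + array is the array, hence the "or []".
            return {**node, "Children": (node.get("Children") or []) + new_children}
        return node  # non-matching nodes pass through unchanged
    return f


doc = {"categories": [
    {"Id": 1, "Name": "Category1", "Children": []},
    {"Id": 2, "Name": "Category2", "Children": [
        {"Id": 4, "Name": "Category4", "Children": []}]},
    {"Id": 3, "Name": "Category3", "Children": []},
]}
new = [{"Id": 5, "Name": "Category5", "Children": []}]
result = walk(doc, append_children(4, new))
print(json.dumps(result, indent=2))
```

Because every level is rebuilt rather than mutated, the original doc is left untouched.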

Using jq to convert object to key with values

I have been playing around with jq to format a JSON file, but I am having trouble with one particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
How can I reshape it into the following format using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I followed the thread "Using jq, convert array of name/value pairs to object with named keys" and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at each level and build out your result objects as you go:
group_by(.name) | map({
  key: .[0].name,
  values: (group_by(.city) | map({
    key: .[0].city,
    values: map({number, type})
  }))
})
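Here is a rough Python analogue of this two-level grouping. nest and group_sorted are hypothetical helpers, and the sketch assumes the name and city values are all strings so that they sort together. Like group_by, it sorts each level by key, which is why LA comes before NYC for A:

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(items, key):
    """Like jq's group_by(.key): sort on the key, then split into runs.
    Assumes the key values are mutually comparable (here: all strings)."""
    ordered = sorted(items, key=itemgetter(key))  # sorted() is stable
    return [list(run) for _, run in groupby(ordered, key=itemgetter(key))]


def nest(items):
    # Two levels: name, then city; leaves keep only number and type.
    return [
        {
            "key": by_name[0]["name"],
            "values": [
                {
                    "key": by_city[0]["city"],
                    "values": [{"number": x["number"], "type": x["type"]} for x in by_city],
                }
                for by_city in group_sorted(by_name, "city")
            ],
        }
        for by_name in group_sorted(items, "name")
    ]


rows = [
    {"name": "A", "number": 1, "type": "apple", "city": "NYC"},
    {"name": "A", "number": "5", "type": "apple", "city": "LA"},
    {"name": "A", "number": 2, "type": "apple", "city": "NYC"},
    {"name": "B", "number": 3, "type": "apple", "city": "NYC"},
]
result = nest(rows)
print([v["key"] for v in result[0]["values"]])  # ['LA', 'NYC'] (sorted, not input order)
```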
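Just keep in mind that group_by/1 sorts the groups by key, so they will not necessarily appear in input order. If you want the groups in the order their keys first appear, you'll need an implementation that preserves that order, such as the following (note that it converts each key to a string, so 1 and "1" land in the same group):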
def group_by_unsorted(key_selector):
  reduce .[] as $i ({};
    .["\($i | key_selector)"] += [$i]
  ) | [.[]];
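An insertion-order variant in Python might look like this, assuming first-seen order is what you want. group_by_unsorted is a hypothetical Python function mirroring the jq definition above, except that keys are not converted to strings:

```python
def group_by_unsorted(items, key):
    """Group items by item[key], keeping groups in first-seen order.
    Unlike the jq version, keys are not stringified, so 1 and "1" stay apart."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for item in items:
        groups.setdefault(item[key], []).append(item)
    return list(groups.values())


rows = [
    {"name": "A", "number": 1, "type": "apple", "city": "NYC"},
    {"name": "A", "number": "5", "type": "apple", "city": "LA"},
    {"name": "A", "number": 2, "type": "apple", "city": "NYC"},
    {"name": "B", "number": 3, "type": "apple", "city": "NYC"},
]
by_name = group_by_unsorted(rows, "name")
print([g[0]["city"] for g in group_by_unsorted(by_name[0], "city")])  # ['NYC', 'LA']
```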

Convert a complex JSON file into a simple JSON file using jq without getting a Cartesian product

I want to convert a complex JSON file into a simple JSON file using jq. However, the query I'm using produces incorrect output.
My (cut down) JSON file:
[
{
"id": 100,
"foo": [
{
"bar": [
{"type": "read"},
{"type": "write"}
],
"users": ["admin_1"],
"groups": []
},
{
"bar": [
{"type": "execute"},
{ "type": "read"}
],
"users": [],
"groups": ["admin_2"]
}
]
},
{
"id": 101,
"foo": [
{
"bar": [
{"type": "read"}
],
"users": [
"admin_3"
],
"groups": []
}
]
}
]
I need to generate a flatter JSON file and combine the users and groups into one field, similar to this:
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read"]
},
{
"id": 100,
"users_groups": ["admin_1"],
"bar": ["write"]
},
{
"id": 100,
"users_groups": ["admin_2"],
"bar": ["execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
Everything I try in jq produces incorrect output similar to the following (where admin_1 incorrectly has bar=execute and admin_2 incorrectly has bar=write):
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read", "write", "execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
I have tried many variants of this query. Any idea what I should be doing instead?
cat file.json | jq -r '[.[] | select(has("foo")) |{"id", "users":(.foo[] | .users), "groups":(.foo[] | .groups), "bar":([.foo[].bar[] | .type])} ] '
The following filter groups by "type" as the question seems to require:
map(.id as $id
  | [.foo[]
     | {id: $id, bar: .bar[].type} +
       {"users_groups": (.users + .groups)[]} ]
  | group_by(.bar)
  | map(.[0] + {"users_groups": [.[].users_groups]}) )
Output
[
[
{
"id": 100,
"bar": "execute",
"users_groups": [
"admin_2"
]
},
{
"id": 100,
"bar": "read",
"users_groups": [
"admin_1",
"admin_2"
]
},
{
"id": 100,
"bar": "write",
"users_groups": [
"admin_1"
]
}
],
[
{
"id": 101,
"bar": "read",
"users_groups": [
"admin_3"
]
}
]
]
Variations
To get the flat array-of-objects format from the question, simply append | [.[][]].
It would be similarly easy to make .bar array-valued, though that might be pointless since each group contains a single type.
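The key step is pairing each bar type with the users and groups of the same foo entry before grouping, which avoids the Cartesian product across entries. Below is a rough Python analogue of the filter plus | [.[][]]. flatten is a hypothetical name, and the function also tolerates missing foo, users, groups or bar fields, some of which the jq filter would reject.

```python
def flatten(records):
    """Rough Python analogue of the jq filter above followed by | [.[][]]."""
    out = []
    for rec in records:
        pairs = []  # (bar type, user or group), paired only within one foo entry
        for foo in rec.get("foo") or []:
            principals = (foo.get("users") or []) + (foo.get("groups") or [])
            for bar in foo.get("bar") or []:
                for principal in principals:
                    pairs.append((bar["type"], principal))
        groups = {}  # type -> principals, built from a stable sort like group_by(.bar)
        for bar_type, principal in sorted(pairs, key=lambda pair: pair[0]):
            groups.setdefault(bar_type, []).append(principal)
        for bar_type, principals in groups.items():
            out.append({"id": rec["id"], "bar": bar_type, "users_groups": principals})
    return out


data = [
    {"id": 100, "foo": [
        {"bar": [{"type": "read"}, {"type": "write"}], "users": ["admin_1"], "groups": []},
        {"bar": [{"type": "execute"}, {"type": "read"}], "users": [], "groups": ["admin_2"]},
    ]},
    {"id": 101, "foo": [
        {"bar": [{"type": "read"}], "users": ["admin_3"], "groups": []},
    ]},
]
for row in flatten(data):
    print(row)
```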