Nested JSON with variable keys to TSV using jq

I have the following nested JSON file labs.json with variable keys (lab001, lab002, etc.), which I would like to convert into a TSV using jq:
{
"lab001": {
"tags": {
"T1": [],
"T2": ["k26","e23"],
"T3": ["s92"]
},
"code": "8231"
},
"lab002": {
"tags": {
"T1": ["t32","y55"],
"T2": ["q78"],
"T3": ["b24"]
},
"code": "9112"
}
}
The resulting table should look like:
ID      T1       T2       T3
lab001           k26,e23  s92
lab002  t32,y55  q78      b24
Currently I am using a rather pedestrian approach by pasting two calls of jq and doing some cleanup with tr:
paste <(jq -r 'keys_unsorted | @csv' labs.json | tr ',' '\n') <(jq -r '.[].tags | map(tostring) | @tsv' labs.json) | tr -d '[]"'
Is there any more elegant way to get this done purely with jq?

Join the elements of each tag array with commas, put the resulting strings into an array with the lab ID as the first element, and pipe it to the @tsv filter like so:
keys_unsorted[] as $id | [$id, (.[$id].tags[] | join(","))] | @tsv
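The filter above prints the data rows only. As a minimal sketch, the header row from the question could be added by taking the tag names of the first lab. This assumes every lab carries the same tag names in the same order; the variable names labs_json and labs_tsv are only illustrative, and the sample data is inlined so the snippet runs without labs.json.

```shell
# Sample input from the question, inlined so the sketch runs without labs.json.
labs_json='{
  "lab001": {"tags": {"T1": [], "T2": ["k26","e23"], "T3": ["s92"]}, "code": "8231"},
  "lab002": {"tags": {"T1": ["t32","y55"], "T2": ["q78"], "T3": ["b24"]}, "code": "9112"}
}'

# Header row: "ID" plus the tag names of the first lab (assumption: all labs
# share the same tag names in the same order).
# Data rows: the lab ID followed by each tag array joined with commas.
labs_tsv=$(printf '%s\n' "$labs_json" | jq -r '
  (["ID"] + (first(.[]) | .tags | keys_unsorted)),
  (keys_unsorted[] as $id | [$id, (.[$id].tags[] | join(","))])
  | @tsv')

printf '%s\n' "$labs_tsv"
```

An empty tag array such as T1 of lab001 becomes an empty field, so the columns stay aligned.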

Related

How to convert Json Objects to table using keys as rows and nested keys as columns [duplicate]

Given a JSON file like this,
[
{
"h1": "x1",
"h2": "x2"
},
{
"h1": "y1",
"h2": "y2"
}
]
I extract it as a headed TSV using the following jq code. But I need to specify the header names twice. Is there a way to just specify the header names once? Thanks.
[
"h1"
, "h2"
], (.[] | [
.h1
, .h2
]) | @tsv
Here's a relatively robust jq script for printing the TSV with headers using the key names in the first object:
(.[0] | keys_unsorted) as $keys
| $keys, (.[] | [.[$keys[]]])
| @tsv
This of course assumes the -r command-line option.
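A small sketch of why this is robust: each row is built by looking up the header keys, so the order of keys inside later objects does not matter. The input below is illustrative (the second object lists its keys in reverse order), and the variable names are not from the original post.

```shell
# Illustrative input: the second object lists its keys in a different order.
rows_json='[{"h1":"x1","h2":"x2"},{"h2":"y2","h1":"y1"}]'

# Take the header names from the first object, then look up those same keys
# in every object so that each value lands in the matching column.
headed_tsv=$(printf '%s\n' "$rows_json" | jq -r '
  (.[0] | keys_unsorted) as $keys
  | $keys, (.[] | [.[$keys[]]])
  | @tsv')

printf '%s\n' "$headed_tsv"
```

A key that is missing from a later object yields null, which @tsv renders as an empty field.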

parse json and get values with same type

In the JSON below, I'm unable to get only the value that contains reporter.
The output should be just jhoncena, which should be written to a file.
jq -r '.values' response.json | grep reporter
The output for this is
"name": "reporter-jhoncena"
{
"size": 3,
"limit": 25,
"isLastPage": true,
"values": [
{
"name": "hello-world"
},
{
"name": "test-frame"
},
{
"name": "reporter-jhoncena"
}
],
"start": 0
}
You can use capture:
jq -r '.values[].name
| capture("^reporter-(?<name>.*)").name
' response.json
You can also use split, for example:
jq -r '.values[2].name | split("-")[1]' response.json
Edit: Alternatively, you can use
jq -r '.values[].name | select(.|split("-")[0]=="reporter")|split("-")[1]' response.json > outfile.txt
which works without knowing the position of the name element within the array.
jq -r '.values[]
| select(.name|index("reporter"))
| .name
| sub("reporter-";"")' in.json > out.txt
Of course you might wish to use a different selection criterion, e.g. using startswith or test.
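As a sketch of the startswith variant mentioned above: select the names that begin with the prefix and strip it with ltrimstr. The sample response is inlined here so the snippet is self-contained, and the variable names are illustrative.

```shell
# Sample response from the question, inlined for a self-contained run.
response_json='{"size":3,"limit":25,"isLastPage":true,"values":[{"name":"hello-world"},{"name":"test-frame"},{"name":"reporter-jhoncena"}],"start":0}'

# Keep only names that start with "reporter-", then drop that prefix.
reporter=$(printf '%s\n' "$response_json" | jq -r '
  .values[].name
  | select(startswith("reporter-"))
  | ltrimstr("reporter-")')

printf '%s\n' "$reporter"
```

Unlike index("reporter"), startswith only matches when the prefix is at the beginning of the name. Redirect the final output to a file if it should be saved.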

why not print csv headers?

CentOS, jq
https://stedolan.github.io/jq/manual/
I want to export JSON to CSV.
I use the jq tool for this.
Here is an example of the JSON:
{
"page": {
"id": "kctbh9vrtdwd",
"name": "GitHub",
"url": "https://www.githubstatus.com",
"time_zone": "Etc/UTC",
"updated_at": "2021-05-27T16:56:02.461Z"
},
"status": {
"indicator": "none",
"description": "All Systems Operational"
}
}
I get it with:
curl -s https://www.githubstatus.com/api/v2/status.json
Here I convert the JSON to CSV:
curl -s https://www.githubstatus.com/api/v2/status.json | jq -r '.page | [.id, .name] | @csv'
And here is the result:
"kctbh9vrtdwd","GitHub"
But why are the CSV headers not printed?
There is quite a lot of noise on the SO page that is provided as a link in one of the comments,
so here are two safe jq-only solutions ("safe" in the sense that it does not matter how the keys are ordered in the input JSON):
Manually add the headers
["id", "name"],
(.page | [.id, .name])
| @csv
Include the headers based on the specification of the relevant columns
["id", "name"] as $headers
| $headers, (.page | [.[$headers[]]])
| @csv

Where clause issue when parsing JSON file using jq

I'm trying to parse a JSON file that has 6 million lines and looks something like this:
temp.json
{
"bbc.com": {
"Reputation": "2.1",
"Rank": "448",
"Category": [
"News"
]
},
"amazon.com": {
"Reputation": "2.1",
"Rank": "448",
"Category": [
"Shopping"
]
}
}
I know how to extract the keys alone. To get the keys of this JSON structure, I tried:
jq -r 'keys[]' temp.json
Result :
amazon.com
bbc.com
To get the "Category" values in the above JSON file, I tried:
jq -r '.[].Category[]' temp.json
Result :
Shopping
News
How can I get only the keys whose "Category" contains "Shopping"?
Use the to_entries function as in:
jq -r 'to_entries[] | select(.value.Category | index("Shopping") != null) | .key' temp.json
In this particular case, to_entries and its overhead can be avoided while still yielding a concise and clear solution:
keys[] as $k | select( .[$k].Category | index("Shopping") != null) | $k
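A runnable sketch of the same idea, using any(generator; condition) for the membership test instead of index. The sample data is inlined, the variable names are illustrative, and keys_unsorted is an assumption made here to skip the sorting that keys performs.

```shell
# Sample input from the question, inlined for a self-contained run.
temp_json='{
  "bbc.com": {"Reputation": "2.1", "Rank": "448", "Category": ["News"]},
  "amazon.com": {"Reputation": "2.1", "Rank": "448", "Category": ["Shopping"]}
}'

# Emit each key whose Category array contains the string "Shopping".
# The trailing ? skips entries that have no Category array instead of failing.
shopping_keys=$(printf '%s\n' "$temp_json" | jq -r '
  keys_unsorted[] as $k
  | select(any(.[$k].Category[]?; . == "Shopping"))
  | $k')

printf '%s\n' "$shopping_keys"
```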