Where clause issue when parsing JSON file using jq

I'm trying to parse a JSON file that has 6 million lines. It looks something like this:
temp.json
{
  "bbc.com": {
    "Reputation": "2.1",
    "Rank": "448",
    "Category": [
      "News"
    ]
  },
  "amazon.com": {
    "Reputation": "2.1",
    "Rank": "448",
    "Category": [
      "Shopping"
    ]
  }
}
I know how to parse the "Keys" alone. To get the "Keys" of this JSON structure, I tried:
jq -r 'keys[]' temp.json
Result :
amazon.com
bbc.com
To get the "Category" in the above JSON file . I tried ,
jq -r '.[].Category[]' temp.json
Result :
Shopping
News
How to get the "Keys" where the "Category" only with "Shopping"?

Use the to_entries function as in:
jq -r 'to_entries[] | select(.value.Category | index("Shopping") != null) | .key'
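Applied to the sample temp.json above, this should print only the matching key:
jq -r 'to_entries[] | select(.value.Category | index("Shopping") != null) | .key' temp.json
amazon.com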

In this particular case, to_entries and its overhead can be avoided while still yielding a concise and clear solution:
keys[] as $k | select( .[$k].Category | index("Shopping") != null) | $k
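Since the file is large, keys_unsorted (available in jq 1.5 and later) may also be worth considering, as it skips the sort that keys performs. The same filter with that substitution:
jq -r 'keys_unsorted[] as $k | select(.[$k].Category | index("Shopping") != null) | $k' temp.json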

Related

Skip or Ignore non-existing key with to_entries in jq

I'm trying to create a massive CSV file converted from each *.json file. This snippet works until it encounters a file that doesn't have the hobby key.
Original
{
  "name": "bob",
  "hobby": [
    "baseball",
    "baseketball"
  ]
}
jq snippet
cat *.json | jq '.name as $n | .hobby | to_entries[] | [ $n, .value]'
It works; the sequence of [...] arrays is the intermediate format used when creating CSV with jq:
[
  "bob",
  "baseball"
]
[
  "bob",
  "baseketball"
]
https://jqplay.org/s/L-SmqiN-jw
However, if the .hobby key doesn't exist, it fails miserably.
jq: error (at <stdin>:6): null (null) has no keys
exit status 5
https://jqplay.org/s/gapUv1Tpmb
I tried to use an if block but couldn't get it right. How can I either
return [] (an empty array), or
skip jq execution for the current file with this problem and go on to the next?
One of many possibilities would be to use try:
.name as $n | .hobby | try to_entries[] | [ $n, .value]
try EXP is equivalent to try EXP catch empty.
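If you'd rather follow the question's first option and treat a missing .hobby as an empty array, the alternative operator // works as well; a minimal sketch of that variant:
.name as $n | ((.hobby // []) | to_entries[]) | [$n, .value]
When .hobby is missing, .hobby // [] yields [], to_entries[] then produces nothing, and the file is skipped cleanly.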
Given the following input:
{
  "name": "bob",
  "hobby": [
    "baseball",
    "baseketball"
  ]
}
$ cat file | jq -c "{name,hobby:.hobby[]}|[.[]]"
["bob","baseball"]
["bob","baseketball"]

Expand large array and select elements in JQ

This may just not be possible due to how streaming/filtering JSON conceptually works, but let's suppose I have something like the following JSON:
[
  {
    "name": "account_1",
    "type": "account"
  },
  {
    "name": "account_2",
    "type": "account"
  },
  {
    "name": "user_1",
    "type": "user"
  },
  {
    "name": "user_2",
    "type": "user"
  }
]
And now I want to print out only the user objects.
I know I can filter to just the streaming type entities with something like this:
cat file.json | jq --stream 'select(.[0][1] == "type" and .[1] == "user" | .)'
Which would produce:
[
  [
    2,
    "type"
  ],
  "user"
]
[
  [
    3,
    "type"
  ],
  "user"
]
Is there any way I can print out the parent objects of those types instead of the type entities? E.g. I'd like to get out:
[
  {
    "name": "user_1",
    "type": "user"
  },
  {
    "name": "user_2",
    "type": "user"
  }
]
Without streaming, this is a pretty straightforward exercise. E.g.:
cat file.json | jq '.[] | select(.type=="user")'
In reality the actual input file is around 5GB, so I need to use streaming input, but I can't seem to get the jq syntax right with --stream enabled. E.g.
cat file.json | jq --stream '.[] | select(.type=="user")'
Produces:
jq: error (at <stdin>:3): Cannot index array with string "type"
jq: error (at <stdin>:5): Cannot index array with string "type"
...
Just truncate the top-level array.
jq -n --stream 'fromstream(1 | truncate_stream(inputs)) | select(.type == "user")'
Online demo
jqplay does not support the --stream option, so the above demo has the output of --stream as the JSON input.
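For intuition, here is what the stream events look like on a tiny input, and how truncating one level of the path lets fromstream rebuild the top-level array's elements (jq 1.5+):
$ echo '[{"type":"user"}]' | jq -c --stream '.'
[[0,"type"],"user"]
[[0,"type"]]
[[0]]
$ echo '[{"type":"user"}]' | jq -cn --stream 'fromstream(1 | truncate_stream(inputs))'
{"type":"user"}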

How to group a JSON by a key and sort by its count?

I start from a jsonlines file similar to this:
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "bar", "age": 1}
{ "kw": "bar", "age": 1}
Please note each line is a valid json, but the whole file is not.
The output I'm seeking is a list of keywords sorted by their occurrence count, like this:
[
  {"kw": "foo", "count": 3},
  {"kw": "bar", "count": 2}
]
I'm able to group and count the keywords using the slurp option:
jq --slurp '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }'
Output:
{"kw":"bar","count":2}
{"kw":"foo","count":3}
But:
This is not sorted
This is not a valid JSON array
A very stupid solution I've found is to pass through jq twice :)
jq --slurp --compact-output '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }' sample.json \
| jq --slurp --compact-output '. | sort_by(.count)'
But I'm pretty sure someone smarter than me can find a more elegant solution.
This is not sorted
That is not quite correct: group_by(.foo) internally does a sort by .foo, so the results are shown in sorted order of that field. See jq Manual - group_by(path_expression).
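A quick way to verify this:
$ echo '[{"kw":"foo"},{"kw":"bar"}]' | jq -c 'group_by(.kw)'
[[{"kw":"bar"}],[{"kw":"foo"}]]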
This is not valid JSON array
Just enclose the operation within [..]; also, the leading . is optional. So just do:
jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length } ]'
If you mean sorting by .count, you can do an ascending sort and then reverse:
jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length }] | sort_by(.count) | reverse'
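With the sample file above, that single pass produces:
[{"kw":"foo","count":3},{"kw":"bar","count":2}]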

Using jq with 'contains' in a 'select' inside a 'del' is not working

I'm trying to remove some entries from a dict in a JSON file. It works when using ==, but with contains it doesn't.
Working jq call:
jq 'del(.entries[] | select(.var == "foo"))' input.json
Failing jq call:
jq 'del(.entries[] | select(.var | contains("foo")))' input.json
input.json:
{
  "entries": [
    {
      "name": "test1",
      "var": "foo"
    },
    {
      "name": "test2",
      "var": "bar"
    }
  ]
}
Output:
{
  "entries": [
    {
      "name": "test2",
      "var": "bar"
    }
  ]
}
The result of jq '.entries[] | select(.var == "foo")' input.json and jq '.entries[] | select(.var | contains("foo"))' input.json is the same, so I would expect the two del calls to behave the same as well.
Is this a bug in jq, or did I do something wrong?
This must be a bug as it seems to work perfectly on jq 1.6 (try it here).
If you're unable to update to jq 1.6, you should be able to use the following command instead, which I've successfully tested on jq 1.5:
jq '.entries |= map(select(.var | contains("foo") | not))' file.json
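One caveat: on strings, contains is a substring test, so contains("foo") would also match a var like "foobar". If an exact match is intended, one option is an anchored regex with test (available when jq is built with regex support):
jq '.entries |= map(select(.var | test("^foo$") | not))' file.json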

How to use the jq @csv filter to generate a CSV file with values from different JSON levels

I have a JSON file from the Spotify API that lists all the songs on a specific album. The file is organized as follows:
.
.name
.tracks.items
.tracks.items[]
.tracks.items[].artists
.tracks.items[].artists[].name
.tracks.items[].duration_ms
.tracks.items[].name
I'm using jq to create a csv with the following information: song's artist, song's title, and song's duration. I can do this using the following syntax:
jq -r '.tracks.items[] | [.artists[].name, .name, .duration_ms] | @csv' myfile.json
Output:
"Michael Jackson","Wanna Be Startin' Somethin'",363400
"Michael Jackson","Baby Be Mine",260666
...
However, I would like to also add the value under .name (which represents the name of the album the songs are from) to every row of my csv file. Something that would look like this:
"Thriller","Michael Jackson","Wanna Be Startin' Somethin'",363400
"Thriller","Michael Jackson","Baby Be Mine",260666
...
Is it possible to do this using the @csv filter? I can do it by hand by hardcoding the name of the album, like this:
jq -r '.tracks.items[] | ["Thriller", .artists[].name, .name, .duration_ms] | @csv' myfile.json
But I was hoping there might be a nicer way to do it.
EDIT:
Here's what the file looks like:
{
  "name": "Thriller",
  "tracks": {
    "items": [
      {
        "artists": [
          {
            "name": "Michael Jackson"
          }
        ],
        "duration_ms": 363400,
        "name": "Wanna Be Startin' Somethin'"
      },
      {
        "artists": [
          {
            "name": "Michael Jackson"
          }
        ],
        "duration_ms": 260666,
        "name": "Baby Be Mine"
      }
    ]
  }
}
See the "Variable / Symbolic Binding Operator" section in jq's documentation
jq -r '
.name as $album_name ### <- THIS RIGHT HERE
| .tracks.items[]
| [$album_name, .artists[].name, .name, .duration_ms]
| @csv
' myfile.json
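With the sample file from the EDIT above, this emits:
"Thriller","Michael Jackson","Wanna Be Startin' Somethin'",363400
"Thriller","Michael Jackson","Baby Be Mine",260666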