Grep for a name-value pair inside a JSON object

I have a shell script running on Unix that goes through a list of JSON objects like the following, collecting values like <init>() # JSONInputData.java:82. There are also other objects with other values that I need to retrieve.
Is there a better option than grepping for "STACKTRACE_LINE",\n\s*.* and then splitting up that result?
inb4: "add X package to the OS". Need to run generically.
. . .
"probableStartLocationView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "<init>() # JSONInputData.java:82"
    } ],
    "text" : "<init>() # JSONInputData.java:82"
  } ],
  "nested" : false
},
. . . .
What if I were looking for "description" : "Dangerous Data Received" in a series of objects like the following, where I need to know that it is associated with event 12345 and not with another event listed in the same file?
. . .
"events" : [ {
  "id" : "12345",
  "important" : true,
  "type" : "Creation",
  "description" : "Dangerous Data Received",
. . .

Is there a better option than grepping for "STACKTRACE_LINE",\n\s*.* and then splitting up that result?
Yes. Use jq to filter and extract the interesting parts.
Example 1, given this JSON:
{
  "probableStartLocationView": {
    "lines": [
      {
        "fragments": [
          {
            "type": "STACKTRACE_LINE",
            "value": "<init>() # JSONInputData.java:82"
          }
        ],
        "text": "<init>() # JSONInputData.java:82"
      }
    ],
    "nested": false
  }
}
Extract value where type is "STACKTRACE_LINE":
jq -r '.probableStartLocationView.lines[] | .fragments[] | select(.type == "STACKTRACE_LINE") | .value' file.json
This produces one line per value.
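For the sample above, the output is:
<init>() # JSONInputData.java:82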
Example 2, given this JSON:
{
  "events": [
    {
      "id": "12345",
      "important": true,
      "type": "Creation",
      "description": "Dangerous Data Received"
    }
  ]
}
Extract the id where description starts with "Dangerous":
jq -r '.events[] | select(.description | startswith("Dangerous")) | .id'
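For the sample above, this prints 12345. Because the select runs inside each .events[] object, the description is only ever matched against the event it belongs to, so the id you get back is guaranteed to be the one associated with that description.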
And so on.
See the jq manual for more examples and capabilities.
There are also many jq questions on Stack Overflow that should help you find the right combination of filters for extracting the relevant parts.
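If the extracted values then need to feed the rest of the shell script, a minimal sketch could look like the following (the reports/*.json glob is hypothetical):

for f in reports/*.json; do
  # -r prints raw strings, one value per line
  jq -r '.probableStartLocationView.lines[].fragments[]
         | select(.type == "STACKTRACE_LINE")
         | .value' "$f"
done

Note that jq is also available as a single standalone binary, which may help if adding packages to the OS is not an option.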

Related

Extract top-level key and contents from large JSON using stream

One procedure in a system is to 'extract' one key and its (object) value to a dedicated file, to subsequently process it in some way in an (irrelevant) script.
A representative subset of the original JSON file looks like:
{
  "version" : null,
  "produced" : "2021-01-01T00:00:00+0000",
  "other": "content here",
  "items" : [
    {
      "code" : "AA",
      "name" : "Example 1",
      "prices" : [ "other", "content", "here" ]
    },
    {
      "code" : "BB",
      "name" : "Example 2",
      "prices" : [ "other", "content", "here" ]
    }
  ]
}
And the current output, given that subset as input, is simply:
[
  {
    "code" : "AA",
    "name" : "Example 1",
    "prices" : [ "other", "content", "here" ]
  },
  {
    "code" : "BB",
    "name" : "Example 2",
    "prices" : [ "other", "content", "here" ]
  },
  ...
]
Previously, we would extract the whole portion of "items" using jq with a very straightforward command (which worked fine):
cat file.json | jq '.items' > file.items.json
However, the original JSON file has recently grown drastically in size, causing the script to fail with an out-of-memory error. One obvious solution is to use jq's 'stream' option. However, I am somewhat stuck on how to convert the above command into a valid filter in jq's stream syntax.
cat file.json | jq --stream '...' > file.items.json
Any advice on what to use as a filter for this command would be greatly appreciated. Thanks in advance!
You should use the --stream flag in combination with the fromstream builtin:
jq --stream --null-input '
  fromstream(inputs | select(.[0][0] == "items"))[]
' file.json
[
{
"code": "AA",
"name": "Example 1",
"prices": [
"other",
"content",
"here"
]
},
{
"code": "BB",
"name": "Example 2",
"prices": [
"other",
"content",
"here"
]
}
]
Demo (not for efficiency or memory consumption, but rather for the syntax, as I had to stream your original input using tostream for lack of a --stream option on jqplay.org).
Note: Although it works for the sample data, do not try to shortcut using
jq --stream --null-input 'fromstream(inputs).items' file.json
directly on your large JSON file, as it merely reconstructs the entire input JSON entity, thus defeating the purpose of using --stream (clarified by @peak).
If a stream of the {code, name, prices} objects is acceptable, then you could go with:
< input.json jq --stream -n '
  fromstream( 2 | truncate_stream(inputs | select(.[0][0] == "items")) )'
This would have minimal memory requirements, which may or may not be significant depending on the value of .items|length
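Note that, unlike the first answer, which reproduces .items as a single array, this variant emits the {code, name, prices} objects one by one ({"code": "AA", ...}, then {"code": "BB", ...}); adding the -c flag gives one compact object per line, which is often exactly what you want for line-by-line downstream processing.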

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
  {
    "Tags": [
      {
        "Key": "Name",
        "Value": "example1"
      },
      {
        "Key": "Irrelevant",
        "Value": "irrelevant"
      }
    ],
    "c7n:MatchedFilters": [
      "tag: example_tag_rule"
    ],
    "another_key": "another_value_I_dont_want"
  },
  {
    "Tags": [
      {
        "Key": "Name",
        "Value": "example2"
      }
    ],
    "c7n:MatchedFilters": [
      "tag:example_tag_rule",
      "tag: example_tag_rule2"
    ]
  }
]
I'd like to create a CSV file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't quite get the output I expect. There's some example code and its output below:
# Prints the key I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
# Prints all the filters, in an array, that I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
  "tag: example_tag_rule"
]
[
  "tag:example_tag_rule",
  "tag: example_tag_rule2"
]
# Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
  {
    "Key": "Name",
    "Value": "example1"
  },
  {
    "Key": "Irrelevant",
    "Value": "irrelevant"
  },
  {
    "c7n:MatchedFilters": [
      "tag: example_tag_rule"
    ]
  }
]
[
  {
    "Key": "Name",
    "Value": "example2"
  },
  {
    "c7n:MatchedFilters": [
      "tag:example_tag_rule",
      "tag: example_tag_rule2"
    ]
  }
]
I hope this makes sense; let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}], constructing a single array that contains all the tags plus an object containing the filters. You then struggle to process this entire array at once because it jumbles together unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
  .Tags[] |
  select(.Key == "Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
  ."c7n:MatchedFilters" |
  join(" ")
) as $filters
You can combine all of this as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
  .[] |
  first(
    .Tags[] |
    select(.Key == "Name")
  ).Value as $name |
  (
    ."c7n:MatchedFilters" |
    join(" ")
  ) as $filters |
  [$name, $filters] |
  @csv
'
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
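With the sample input, this produces:
"example1","tag: example_tag_rule"
"example2","tag:example_tag_rule tag: example_tag_rule2"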
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.
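For comparison, here is a minimal sketch of that variable-free form, equivalent to the program above:

jq -r '
  .[] |
  [ first(.Tags[] | select(.Key == "Name")).Value,
    (."c7n:MatchedFilters" | join(" ")) ] |
  @csv
'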

InfluxDB query in json format transform to csv with jq including tags and fields

I want to process data with a bash script but am having trouble getting the InfluxDB output into the desired CSV output with all tags and fields.
Below is an example output from an influx query:
{
  "results": [
    {
      "series": [
        {
          "name": "tickerPrice",
          "tags": {
            "symbol": "AAVE",
            "symbolTo": "EUR"
          },
          "columns": [
            "time",
            "priceMean"
          ],
          "values": [
            [
              1614402874120627200,
              282.398263888889
            ]
          ]
        },
        {
          "name": "tickerPrice",
          "tags": {
            "symbol": "BTC",
            "symbolTo": "EUR"
          },
          "columns": [
            "time",
            "priceMean"
          ],
          "values": [
            [
              1614402874120627200,
              39189.756944444445
            ]
          ]
        }
      ]
    }
  ]
}
And I would like to transform it to:
"name","symbol","symbolTo","time","priceMean"
"tickerPrice","AAVE","EUR",1614402874120627200,282.398263888889
"tickerPrice","BTC","EUR",1614402874120627200,39189.756944444445
I have managed (via Google) to get the fields into CSV format, but so far I have not managed to get all of the data into the CSV. Here is the command I use for that:
$ jq -r '(.results[0].series[0].columns), (.results[0].series[].values[])'
Because this is not the only query I want to run, it would be nice if the solution were universal with respect to the content, since the number of fields and tags can differ.
Why don't you just specify the CSV format directly in the InfluxDB CLI (https://docs.influxdata.com/influxdb/v1.8/tools/shell/)?
-format 'json|csv|column' Specifies the format of the server responses.
That way you won't need any post-processing of the results.
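For example (the database name and query here are hypothetical; the point is the -format flag):

influx -format csv -database mydb -execute 'SELECT "priceMean" FROM "tickerPrice" GROUP BY *'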
The following produces the required output in a way that
allows for multiple values of "time" in each .values array, but does not refer to the specific headers except for "name":
def headers:
  (.tags | keys_unsorted) as $tags
  | (["name"] + $tags + .columns);

.results[0]
| (.series[0] | headers),
  (.series[] | ([.name, .tags[]] + .values[]))
| @csv
This of course assumes that the separate "series" are conformal.
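To run it, pass the program to jq with the -r option so the @csv rows come out as plain text, e.g. (the input file name is hypothetical):

jq -r 'def headers: (.tags | keys_unsorted) as $tags | (["name"] + $tags + .columns); .results[0] | (.series[0] | headers), (.series[] | ([.name, .tags[]] + .values[])) | @csv' influx.json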

How to retrieve recursive path to a specific key (not displaying the parents' key name, but the value from a different key of each parent)

I have the following JSON
[
  {
    "name": "alpha"
  },
  {
    "fields": [
      {
        "name": "beta_sub_1"
      },
      {
        "name": "beta_sub_2"
      }
    ],
    "name": "beta"
  },
  {
    "fields": [
      {
        "fields": [
          {
            "name": "gamma_sub_sub_1"
          }
        ],
        "name": "gamma_sub_1"
      }
    ],
    "name": "gamma"
  }
]
and I would like to get, for each "name" value, the path of "name" values needed to reach it. Given the above input, I would like the following result:
"alpha"
"beta.beta_sub_1"
"beta.beta_sub_2"
"beta"
"gamma.gamma_sub_1.gamma_sub_sub_1"
"gamma.gamma_sub_1"
"gamma"
I've been searching around but I couldn't get to this result. So far, I have this:
tostream as [$p,$v] | select($p[-1] == "name" and $v != null) | "\([$p[0,1]] | join(".")).\($v)"
but this gives me the path with the parents' key names (and doesn't keep all the intermediate parents):
"0.name.alpha"
"1.fields.beta_sub_1"
"1.fields.beta_sub_2"
"1.name.beta"
"2.fields.gamma_sub_sub_1"
"2.fields.gamma_sub_1"
"2.name.gamma"
Any ideas?
P.S.: I've been searching for very detailed documentation on jq but couldn't find anything good enough. If anyone has any recommendations, I'd appreciate it.
The problem description does not seem to match the sample input and output, but the following jq program produces the required output:
def descend:
  select( type == "object" and has("name") )
  | if has("fields") then ([.name] + (.fields[] | descend)) else empty end,
    [.name] ;

.[]
| descend
| join(".")
With your input, and using the -r command-line option, this produces:
alpha
beta.beta_sub_1
beta.beta_sub_2
beta
gamma.gamma_sub_1.gamma_sub_sub_1
gamma.gamma_sub_1
gamma
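A complete invocation, with the program inlined, might look like this (the input file name is hypothetical):

jq -r 'def descend: select(type == "object" and has("name")) | if has("fields") then ([.name] + (.fields[] | descend)) else empty end, [.name]; .[] | descend | join(".")' input.json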
Resources
Apart from the jq manual, FAQ, and Cookbook, you might find the following helpful:
"jq Language Description"
"A Stream-Oriented Introduction to jq"

Using jq to list keys in a JSON object

I have a hierarchically deep JSON object created by a scientific instrument, so the file is somewhat large (1.3MB) and not readily readable by people. I would like to get a list of keys, up to a certain depth, for the JSON object. For example, given an input object like this
{
  "acquisition_parameters": {
    "laser": {
      "wavelength": {
        "value": 632,
        "units": "nm"
      }
    },
    "date": "02/03/2525",
    "camera": {}
  },
  "software": {
    "repo": "github.com/username/repo",
    "commit": "a7642f",
    "branch": "develop"
  },
  "data": [{},{},{}]
}
I would like output like the following.
{
  "acquisition_parameters": [
    "laser",
    "date",
    "camera"
  ],
  "software": [
    "repo",
    "commit",
    "branch"
  ]
}
This is mainly for the purpose of being able to enumerate what is in a JSON object. After processing, the JSON objects from the instrument begin to diverge: for example, some may have a field like .frame.cross_section.stats.fwhm, while others may have .sample.species, so it would be convenient to be able to interrogate the JSON object on the command line.
The following should do exactly what you want:
jq '[(keys - ["data"])[] as $key | { ($key): .[$key] | keys }] | add'
This will give the following output, using the input you described above:
{
  "acquisition_parameters": [
    "camera",
    "date",
    "laser"
  ],
  "software": [
    "branch",
    "commit",
    "repo"
  ]
}
Given your purpose you might have an easier time using the paths builtin to list all the paths in the input and then truncate at the desired depth:
$ echo '{"a":{"b":{"c":{"d":true}}}}' | jq -c '[paths|.[0:2]]|unique'
[["a"],["a","b"]]
Here is another variation using reduce and setpath which assumes you have a specific set of top-level keys you want to examine:
. as $v
| reduce ("acquisition_parameters", "software") as $k (
    {}; setpath([$k]; $v[$k] | keys)
  )
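With the sample input, this builds the same object as the first answer, one top-level key at a time: {"acquisition_parameters": ["camera", "date", "laser"], "software": ["branch", "commit", "repo"]} (keys, like the first answer, returns each key list sorted).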