Have jq detect errors in InfluxDB JSON output

I have a jq filter that converts (InfluxDB) JSON input to CSV for further parsing. However, this filter fails when InfluxDB returns an error. I'm trying to improve the filter to detect this case. I think I need something like https://stackoverflow.com/a/41829748, but I can't get it to work. Any ideas?
Example data
{"results":[{"statement_id":0,"series":[{"name":"energyv3","columns":["time","value"],"values":[["2015-07-30T23:59:00Z",56980800],["2015-07-31T23:59:00Z",95108400]]}]}]}
{"error":"error parsing query: found EOF, expected integer at line 1, char 34"}
Desired outcome
"\"time\",\"value\""
"\"2015-07-30T23:59:00Z\",56980800"
"\"2015-07-31T23:59:00Z\",95108400"
"error parsing query: found EOF, expected integer at line 1, char 34"
That is:
For input with a .results key: the data formatted as CSV (works OK)
For input with an .error key: only the error string (doesn't work)
Current filter used
select(.results) | (.results[0].series[0].columns), (.results[0].series[0].values[]) | @csv
Attempt to combine filters
((select(.error) | {error}) // null) + select(.results) | (.results[0].series[0].columns), (.results[0].series[0].values[]) | @csv

Based on your attempts, and the assumption that each object contains either results or error, this should do it:
( .results[0].series | .[0].columns, .[]?.values[] ) // [ .error ] | @csv
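For a quick check, the filter can be run on both sample responses from the question in a single jq invocation. The following is a sketch that assumes a POSIX shell with jq on the PATH; influx_to_csv is a made-up wrapper name, not part of jq or InfluxDB. With -r, the rows are printed as plain CSV lines, and the error response comes out as a single quoted CSV field:

```shell
# Hypothetical wrapper: reads InfluxDB JSON responses on stdin (one JSON text
# per response) and prints CSV rows, or the error string as a one-field row.
influx_to_csv() {
  jq -r '( .results[0].series | .[0].columns, .[]?.values[] ) // [ .error ] | @csv'
}

# The two sample responses from the question.
influx_to_csv <<'EOF'
{"results":[{"statement_id":0,"series":[{"name":"energyv3","columns":["time","value"],"values":[["2015-07-30T23:59:00Z",56980800],["2015-07-31T23:59:00Z",95108400]]}]}]}
{"error":"error parsing query: found EOF, expected integer at line 1, char 34"}
EOF
```

Because // only falls back to its right-hand side when the left-hand side produces nothing but null or false, a successful response never reaches [ .error ]. For the error response, .results is null, so the left-hand side yields only null (the ? suppresses the "cannot iterate over null" error).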


Maintain order of jq CSV output and include empty values

I am currently working on a bash script that combines the output of the aws iam list-users and aws iam list-user-tags commands into a CSV file containing all users along with their respective information and assigned tags. To parse the JSON output of those commands, I chose to use jq.
Retrieving, parsing and converting (JSON to CSV) the list-users output works fine and produces the expected comma-separated list of values.
The output of list-user-tags does not quite behave that way. Its JSON output has the following schema:
{
  Tags: [
    {
      Key: "Name",
      Value: "NameOfUser"
    },
    {
      Key: "Email",
      Value: "EmailOfUser"
    },
    {
      Key: "Company",
      Value: "CompanyOfUser"
    }
  ]
}
Unfortunately, the order of the tags is not consistent across users (and possibly across queries), which currently makes it impossible for me to maintain the column order defined in the CSV file. On top of that, one or more tags may be missing.
What I am looking for is a way to achieve the following (preferably using jq):
Select a tag's "Value" by its "Key"
Check whether it exists and, if not, add an empty entry
Put the value in the exact same place every time (maintain a certain order)
Repeat for every entry in the original output
Convert the resulting array of values into CSV
What I tried so far:
aws iam list-user-tags --user-name abcdef --no-cli-pager \
| jq -r '[.Tags[] | select(.Key=="Name"),select(.Key=="Email"),select(.Key=="Company") | .Value // ""] | @csv'
Any help is much appreciated!
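Let's suppose your sample "schema" is in a file named schema.jq. With its unquoted keys it is not valid JSON, but it is a valid jq expression that constructs the corresponding JSON object.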
Then
jq -n -f schema.jq | jq -r '
def get($key):
map(select(.Key == $key))
| if length == 0 then null else .[0].Value end;
.Tags | [get("Name"), get("Email"), get("Company")] | @csv
'
produces the following CSV:
"NameOfUser","EmailOfUser","CompanyOfUser"
It should be easy to adapt this illustration to your needs.
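To check the point about missing tags, here is the same get/1 helper run on a made-up Tags array in which the tags are out of order and the Email tag is absent. It is a sketch assuming a POSIX shell and jq 1.5 or later (for the $key parameter syntax); tags_to_csv is a hypothetical wrapper name. Since get returns null when no tag matches, and @csv renders null as an empty field, the Email column keeps its position:

```shell
# Hypothetical wrapper: print Name, Email, Company (always in that order)
# as one CSV row, leaving a field empty when the tag is missing.
tags_to_csv() {
  jq -r '
    def get($key):
      map(select(.Key == $key))
      | if length == 0 then null else .[0].Value end;
    .Tags | [get("Name"), get("Email"), get("Company")] | @csv
  '
}

# Made-up input: tags out of order, "Email" missing.
# prints: "alice",,"ACME"
echo '{"Tags":[{"Key":"Company","Value":"ACME"},{"Key":"Name","Value":"alice"}]}' | tags_to_csv
```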

Fuzzy match string with jq

Let's say I have some JSON in a file. It's a subset of JSON data extracted from a larger JSON file (which is why I use --stream in my attempted solution below), and it looks like this:
[
{"_id":"1","#":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"2","#":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
],
[
{"_id":"55","#":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"56","#":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
]
It describes 4 posts written by 2 different authors, each post with a unique _id field. Each author wrote 2 posts: one says "Hello world" and the other says "Goodbye world".
I want to match on the word "Hello" and return the _id only for posts whose body contains "Hello". The expected result is:
1
55
The closest I could come in my attempt was:
jq -nr --stream '
fromstream(1|truncate_stream(inputs))
| select(.body %like% "Hello")
| ._id
' <input_file
Assuming the input is modified slightly to make it a stream of the arrays shown in the Q (that is, with the comma between the two arrays removed):
jq -nr --stream '
fromstream(1|truncate_stream(inputs))
| select(.body | test("Hello"))
| ._id
'
produces the desired output.
test uses regex matching. In your case, it seems you could use simple substring matching instead.
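As a sketch of the substring approach: when both operands are strings, jq's contains/1 is a plain substring test, so it can replace test/1 here. This assumes, as above, that the comma between the two arrays has been removed; the sample posts are trimmed to the _id and body fields, and hello_ids is a hypothetical function name:

```shell
# Print the _id of every post whose body contains "Hello".
# contains/1 on two strings is a plain substring test (no regex involved).
hello_ids() {
  jq -nr --stream '
    fromstream(1|truncate_stream(inputs))
    | select(.body | contains("Hello"))
    | ._id
  '
}

# Trimmed sample: two arrays of posts, with the comma between them removed.
hello_ids <<'EOF'
[{"_id":"1","body":"Hello world"},{"_id":"2","body":"Goodbye world"}]
[{"_id":"55","body":"Hello world"},{"_id":"56","body":"Goodbye world"}]
EOF
```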
Handling extraneous commas
If the input has commas between the JSON texts exactly as shown, you could presumably remove them first with sed.
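For example, if the separating comma always sits alone on a line as "],", as in the sample, sed can turn that line back into "]" before jq sees the input. This sketch assumes exactly that layout (a comma anywhere else would not be removed) and then uses a non-streaming filter, since the cleaned-up input is a plain stream of arrays; hello_ids_sed is a hypothetical name:

```shell
# Rewrite any line that is exactly "]," to "]", then let jq read the
# resulting stream of arrays without --stream.
hello_ids_sed() {
  sed 's/^],$/]/' | jq -r '.[] | select(.body | test("Hello"))._id'
}

hello_ids_sed <<'EOF'
[
{"_id":"1","body":"Hello world"},
{"_id":"2","body":"Goodbye world"}
],
[
{"_id":"55","body":"Hello world"},
{"_id":"56","body":"Goodbye world"}
]
EOF
```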
Or, if you want a jq-only solution, use the following with the -n, -r and --stream command-line options:
def iterate:
fromstream(1|truncate_stream(inputs?))
| select(.body | test("Hello"))
| ._id,
iterate;
iterate
(Notice the "?".)
The streaming parser (invoked with --stream) is usually not needed for the kind of task you describe, so in this response, I'm going to assume that the following (or a variant thereof) will suffice:
.[]
| select( .body | test("Hello") )._id
This of course assumes that the input is valid JSON.
Handling comma-delimited JSON
If your input is a comma-delimited stream of JSON as shown in the Q, you could use the following with the -n command-line option (add -r to print the _id values without quotes):
# This is a variant of the built-in `recurse/1`:
def iterate(f): def r: f | (., r); r;
iterate( inputs? | .[] | select( .body | test("Hello") )._id )
Please note that this assumes that whatever occurs on a line after a delimiting comma can be ignored.

JSON will not convert with jq in Unix

I'm having difficulty converting this JSON to CSV. The file is multi-line, with records similar to the one below. The example data at the bottom is what the file contains as-is once unzipped.
An example of what has been tried:
jq -r '(([["user_id","server_received_time","app","device_carrier","$schema","city","uuid","event_time","platform","os_version","amplitude_id","processed_time","user_creation_time","version_name","ip_address","paying","dma","group_properties","user_properties","client_upload_time","$insert_id","event_type","library","amplitude_attribution_ids","device_type","device_manufacturer","start_version","location_lng","server_upload_time","event_id","location_lat","os_name","amplitude_event_type","device_brand","groups","event_properties","data","device_id","language","device_model","country","region","is_attribution_event","adid","session_id","device_family","sample_rate","idfa","client_event_time"]]) + [(.table.All[] | [.user_id,.server_received_time,.app,.device_carrier,.$schema,.city,.uuid,.event_time,.platform,.os_version,.amplitude_id,.processed_time,.user_creation_time,.version_name,.ip_address,.paying,.dma,.group_properties,.user_properties,.client_upload_time,.$insert_id,.event_type,.library,.amplitude_attribution_ids,.device_type,.device_manufacturer,.start_version,.location_lng,.server_upload_time,.event_id,.location_lat,.os_name,.amplitude_event_type,.device_brand,.groups,.event_properties,.data,.device_id,.language,.device_model,.country,.region,.is_attribution_event,.adid,.session_id,.device_family,.sample_rate,.idfa,.client_event_time])])[]|@csv' test.json > test.csv
I've also tried some other jq options. I need every column regardless of its value, with the values as-is. Does anyone have thoughts on why we are running into these issues? One error we get is:
jq: error: try .["field"] instead of .field for unusually named fields at <top-level>, line 1:
Other jq lines have given the following error:
string (...) cannot be csv-formatted, only array
This is an excerpt from one of the JSON files:
{"groups":{},"country":"United States","device_id":"3d-88c-45-b6-ed81277eR","is_attribution_event":false,"server_received_time":"2019-12-17 17:29:11.113000","language":"English","event_time":"2019-12-17 17:27:49.047000","user_creation_time":"2019-11-08 13:15:32.919000","city":"Sure","uuid":"someID","device_model":"Windows","amplitude_event_type":null,"client_upload_time":"2019-12-17 17:29:21.958000","data":{},"library":"amplitude-js\/5.2.2","device_manufacturer":null,"dma":"Washington, DC (Townville, USA)","version_name":null,"region":"Virginia","group_properties":{},"location_lng":null,"device_family":"Windows","paying":null,"client_event_time":"2019-12-17 17:27:59.892000","$schema":12,"device_brand":null,"user_id":"email@gmail.com","event_properties":{"title":"Name","id":"1-253251","applicationName":"SomeName"},"os_version":"18","device_carrier":null,"server_upload_time":"2019-12-17 17:29:11.135000","session_id":1576603675620,"app":231165,"amplitude_attribution_ids":null,"event_type":"CHANGE_PERSPECTIVE","user_properties":{},"adid":null,"device_type":"Windows","$insert_id":"e308c923-d8eb-48c6-8ea5-600","event_id":24,"amplitude_id":515,"processed_time":"2019-12-17 17:29:12.760372","platform":"Web","idfa":null,"os_name":"Edge","location_lat":null,"ip_address":"123.456.78.90","sample_rate":null,"start_version":null}
Thank you!
There are several problems with your attempt.
First, the keys with "$" in their names cannot be specified using the abbreviated .foo syntax; you could use .["$foo"] instead.
Second, @csv expects an array of atomic values. Thus the keys with JSON objects as values must be handled specially.
Third, the "+" is incorrect. The relevant connector here is ",".
With your sample JSON, the following will work:
(["user_id","server_received_time","app","device_carrier","$schema","city","uuid","event_time","platform","os_version","amplitude_id","processed_time","user_creation_time","version_name","ip_address","paying","dma","group_properties","user_properties","client_upload_time","$insert_id","event_type","library","amplitude_attribution_ids","device_type","device_manufacturer","start_version","location_lng","server_upload_time","event_id","location_lat","os_name","amplitude_event_type","device_brand","groups","event_properties","data","device_id","language","device_model","country","region","is_attribution_event","adid","session_id","device_family","sample_rate","idfa","client_event_time"]),
([.user_id,.server_received_time,.app,.device_carrier,.["$schema"],.city,.uuid,.event_time,.platform,.os_version,.amplitude_id,.processed_time,.user_creation_time,.version_name,.ip_address,.paying,.dma,.group_properties,.user_properties,.client_upload_time,.["$insert_id"],.event_type,.library,.amplitude_attribution_ids,.device_type,.device_manufacturer,.start_version,.location_lng,.server_upload_time,.event_id,.location_lat,.os_name,.amplitude_event_type,.device_brand,.groups,.event_properties,.data,.device_id,.language,.device_model,.country,.region,.is_attribution_event,.adid,.session_id,.device_family,.sample_rate,.idfa,.client_event_time]
| map(if type=="object"
then to_entries
| map( "\(.key):\(.value)" )
| join(";")
else . end))
| @csv
A less error-prone solution
Specifying the long list of keys twice makes the above solution error-prone. It would be better to specify the keys just once, and then generate the rows programmatically.
Here's a utility function that can be used to this end:
def toa($headers):
. as $in | $headers | map($in[.]);
Or you could handle the object-valued keys inside toa:
def toa($headers):
def flat:
if type == "object" or type == "array"
then to_entries | map( "\(.key):\(.value)" ) | join(";")
else .
end;
. as $in | $headers | map($in[.] | flat);
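Here is a sketch of how the second toa definition might be used to emit a header row and one data row per object. It is illustrative only: the $headers list is cut down to five of the question's keys (including the "$"-prefixed $schema, which needs no special syntax when looked up via $in[.]), the sample record is trimmed from the question's excerpt, and to_csv_with_header is a hypothetical wrapper name. The key list now appears only once, and the header row and data row are joined with "," as explained above:

```shell
# Hypothetical wrapper; extend the key list with the remaining keys.
to_csv_with_header() {
  jq -r '
    def toa($headers):
      # Flatten objects/arrays into "key:value;key:value" strings for @csv.
      def flat:
        if type == "object" or type == "array"
        then to_entries | map( "\(.key):\(.value)" ) | join(";")
        else .
        end;
      . as $in | $headers | map($in[.] | flat);

    ["user_id", "$schema", "event_type", "event_properties", "groups"] as $headers
    | $headers, toa($headers)
    | @csv
  '
}

to_csv_with_header <<'EOF'
{"groups":{},"user_id":"email@gmail.com","$schema":12,"event_type":"CHANGE_PERSPECTIVE","event_properties":{"title":"Name","id":"1-253251","applicationName":"SomeName"}}
EOF
```

With this record, the second line of output should be "email@gmail.com",12,"CHANGE_PERSPECTIVE","title:Name;id:1-253251;applicationName:SomeName","" (the empty groups object flattens to an empty string).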
JSONL
If the input is a stream of JSON objects of the type illustrated in the question, an efficient solution would use inputs with the -n command-line option. This could be along the lines of the following, where print_header and print_row stand for your own definitions:
print_header,
(inputs | print_row)
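One way to flesh out that outline is sketched below. print_header and print_row are not jq built-ins; they are hypothetical helpers defined here on top of the first (non-flattening) toa, with a shortened key list, so object-valued fields would still need the flat variant. The -n option matters: the program starts with null as its input, and inputs then reads the objects from the file one at a time:

```shell
# -n: start with null input; `inputs` then yields each JSON object in turn.
jsonl_to_csv() {
  jq -nr '
    def toa($headers): . as $in | $headers | map($in[.]);
    # Hypothetical helpers named after the outline above.
    def headers: ["user_id", "$schema", "event_type"];
    def print_header: headers | @csv;
    def print_row: toa(headers) | @csv;

    print_header,
    (inputs | print_row)
  '
}

jsonl_to_csv <<'EOF'
{"user_id":"a","$schema":12,"event_type":"X"}
{"user_id":"b","$schema":12,"event_type":"Y"}
EOF
```

A key that is missing from an object comes out as an empty CSV field, because $in[.] yields null and @csv renders null as nothing.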

Bash script to extract all specific key values from an unstructured JSON file

I was trying to extract all the values of a specific key from the JSON file below.
{
  "tags": [
    {
      "name": "xxx1",
      "image_id": "yyy1"
    },
    {
      "name": "xxx2",
      "image_id": "yyy2"
    }
  ]
}
I used the code below to get the image_id values.
echo new.json | jq '.tags[] | .["image_id"]'
I'm getting the below error message.
parse error: Invalid literal at line 2, column 0
I think either the JSON file is not properly formatted or the echo command I use to read the JSON file is wrong.
Given the above input, my intended/desired output is:
yyy1
yyy2
What needs to be fixed to make this happen?
When you run:
echo new.json | jq '.tags[] | .["image_id"]'
...the string new.json -- not the contents of the file named new.json -- is fed to jq's stdin, and is thus what it tries to parse as JSON text.
Instead, run:
jq -r '.tags[] | .["image_id"]' <new.json
...to connect the contents of new.json directly to jq's stdin (and, with -r, to print the strings without surrounding quotes).
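A self-contained way to try this: recreate new.json from the question and redirect it into jq. The sketch assumes a POSIX shell and writes new.json in the current directory purely for the demonstration. Passing the file name as an argument to jq works just as well as the redirect:

```shell
# Recreate the sample file from the question.
cat > new.json <<'EOF'
{
  "tags": [
    { "name": "xxx1", "image_id": "yyy1" },
    { "name": "xxx2", "image_id": "yyy2" }
  ]
}
EOF

# Wrong: jq would try to parse the literal text "new.json" as JSON.
#   echo new.json | jq '.tags[] | .["image_id"]'

# Right: give jq the file's contents (via a redirect or a file argument).
jq -r '.tags[] | .["image_id"]' <new.json
jq -r '.tags[].image_id' new.json
```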
Your filter .tags[] | .["image_id"] is valid, but can be abbreviated to:
.tags[] | .image_id
or even:
.tags[].image_id
If you want the values associated with the "image_id" key, wherever that key occurs, you could go with:
.. | objects | select(has("image_id")) | .image_id
Or, if you don't mind throwing away false and null values:
.. | .image_id? // empty
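To see how the two recursive forms differ, the sketch below runs both on a made-up document in which image_id appears at different depths, once with a null value (POSIX shell and jq assumed). The first form reports the null; the second drops it, as described above:

```shell
# Made-up document: image_id at two depths, plus one null value.
doc='{"tags":[{"image_id":"yyy1"},{"nested":{"image_id":"yyy2"}}],"other":{"image_id":null}}'

# Every image_id value, wherever the key occurs (the null is kept).
echo "$doc" | jq -c '[.. | objects | select(has("image_id")) | .image_id]'

# Same search, but `// empty` discards the false/null values.
echo "$doc" | jq -c '[.. | .image_id? // empty]'
```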

JQ: Numeric field names

I use jq 1.5 in a Windows 10 environment (PowerShell).
I built a jq statement that works on the example data on jqplay but throws an error in my environment.
Code:
. | { last_update: .starbase_detailed_scan.last_update_time, user_name: .starbase_detailed_scan.owner_name, alliance_id: .starbase_detailed_scan.owner_alliance_id, drydocks: .starbase_detailed_scan.num_drydocks, tier: .starbase_detailed_scan.owner_level, defence_plattform: .starbase_detailed_scan.num_defence_platforms, shield_triggered: .starbase_detailed_scan.player_shield.triggered_on, shield_end: .starbase_detailed_scan.player_shield.expiry_time, parsteel: .resources."325683920".current_amount, tritanium: .resources."743985951".current_amount, dilithium: .resources."2614028847".current_amount, user_id: .starbase_detailed_scan.owner_user_id, defence_rating: .starbase_detailed_scan.defense_rating }
The problem is the JSON objects with numeric keys. On jqplay I get the correct values; with jq in PowerShell I get an error. I suspected this was a PowerShell problem, so I tried moving the filter into a filter file. The error is then gone, but I only get null as the value of the three numeric-keyed fields.
Numeric keys in the JSON path need to be written in the old-school bracket notation, like:
.starbase_detailed_scan.resources["2614028847"]
BR
Timo
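A minimal way to check the bracket notation from the answer, using a made-up fragment with a numeric-looking key (the question's full JSON is not shown, so the real path may differ). This assumes a POSIX shell; it does not address how PowerShell passes quotes to jq, which is a separate issue:

```shell
# Made-up sample: JSON object keys are always strings, even when they look
# like numbers, so they are looked up with a quoted string.
doc='{"starbase_detailed_scan":{"resources":{"2614028847":{"current_amount":42}}}}'

# Bracket syntax with a quoted string key.
echo "$doc" | jq '.starbase_detailed_scan.resources["2614028847"].current_amount'

# The same lookup written as explicit pipeline steps.
echo "$doc" | jq '.starbase_detailed_scan.resources | .["2614028847"] | .current_amount'
```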