Select JSON values with special characters

I am looking to detect anomalies in my JSON values.
Here's an example of the data, queried via jq:
"2014-03-26 01:58:00"
"9019549360"
"109092812_20150626"
"134670164"
""
"97695498"
"680561513"
I would like to display all the values that contain a - or a _ or are blank.
In other words, I'd like to display the following output
"2014-03-26 01:58:00"
"109092812_20150626"
""
Now, I have tried the following:
jq 'select(. | contains("-","_"," "))'
This appears to work, but in order to make it more robust, I'd like to expand this to include all special characters.

Your query won't detect empty strings, and will possibly emit the same string more than once. It would be easier to use test, e.g.:
select( length==0 or test("[-_ ]") )
Note also that the preliminary '.' in your query is unnecessary.
Addendum
From one of the comments, it would appear that you will want to specify "[^a-zA-Z0-9]" or similar as the argument of test.
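For example, a minimal sketch using that broader character class (the sample values are piped in via printf purely for demonstration):
printf '%s\n' '"2014-03-26 01:58:00"' '"9019549360"' '""' |
  jq 'select( length==0 or test("[^a-zA-Z0-9]") )'
"2014-03-26 01:58:00"
""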

Finding the location (line, column) of a field value in a JSON file

Consider the following JSON file example.json:
{
    "key1": ["arr value 1", "arr value 2", "arr value 3"],
    "key2": {
        "key2_1": ["a1", "a2"],
        "key2_2": {
            "key2_2_1": 1.43123123,
            "key2_2_2": 456.3123,
            "key2_2_3": "string1"
        }
    }
}
The following jq command extracts a value from the above file:
jq ".key2.key2_2.key2_2_1" example.json
Output:
1.43123123
Is there an option in jq that, instead of printing the value itself, prints the location (line and column, start and end position) of the value within a (valid) JSON file, given an Object Identifier-Index (.key2.key2_2.key2_2_1 in the example)?
The output could be something like:
some_utility ".key2.key2_2.key2_2_1" example.json
Output:
(6,25) (6,35)
Given JSON data and a query, there is no option in jq that, instead of printing the value itself, prints the location of possible matches.
This is because JSON parsers providing an interface to developers usually focus on processing the logical structure of a JSON input, not the textual stream conveying it. You would have to instruct it to explicitly treat its input as raw text, while properly parsing it at the same time in order to extract the queried value. In the case of jq, the former can be achieved using the --raw-input (or -R) option, the latter then by parsing the read-in JSON-encoded string using fromjson.
The -R option alone would read the input linewise into an array of strings, which would have to be concatenated (e.g. using add) in order to provide the whole input at once to fromjson. The other way round, you could also provide the --slurp (or -s) option which (in combination with -R) already concatenates the input to a single string which then, after having parsed it with fromjson, would have to be split again into lines (e.g. using /"\n") in order to provide row numbers. I found the latter to be more convenient.
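To illustrate the difference between the two modes, a minimal sketch (the two-line input is contrived):
printf '{"a":\n1}\n' | jq -R .
"{\"a\":"
"1}"
printf '{"a":\n1}\n' | jq -Rs 'fromjson | .a'
1
With -R alone, each line arrives as a separate JSON string; with -Rs, the whole input arrives as one string that fromjson can parse.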
That said, this could give you a starting point (the --raw-output (or -r) option outputs raw text instead of JSON):
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query # save the query value as string
| ($query | length) as $length # save its length by counting its characters
| ./"\n" | to_entries[] # split into lines and provide 0-based line numbers
| {row: .key, col: .value | indices($query)[]} # find occurrences of the query
| "(\(.row),\(.col)) (\(.row),\(.col + $length))" # format the output
'
(5,24) (5,34)
Now, this works for the sample query, but how about the general case? Your example queried a number (1.43123123), which is an easy target as it has the same textual representation when encoded as JSON. Therefore, a simple string search and length count did a fairly good job (not a perfect one, because it would still find any occurrence of that character stream, not just "values").
Thus, for more precision, and especially with more complex JSON datatypes being queried, you would need to develop a more sophisticated searching approach, probably involving more JSON conversions, whitespace stripping and other normalizing shenanigans. So, unless your goal is to rebuild a full JSON parser within another one, you should narrow it down to the kind of queries you expect, and compose an appropriately tailored searching approach. This solution provides you with concepts to simultaneously process the input textually and structurally, and with a simple search and output integration.

Print from jq using a wild card (or coalesce to first non null)

I have the following command:
kubectl get pod -A -o=json | jq -r '.items[]|select(any( .status.containerStatuses[]; .state.waiting or .state.terminated))|[.metadata.namespace, .metadata.name]|@csv'
This command works great. It outputs both the namespace and name of my failing pods.
But now I want to add one more column to the results. The column I want is located in one (and only one) of two places:
.status.containerStatuses[].state.waiting.reason
.status.containerStatuses[].state.terminated.reason
I first tried adding .status.containerStatuses[].state.*.reason to the results fields array. But that gave me an unexpected '*' compile error.
I then got to thinking about how I would do this with SQL or another programming language. They frequently have a function that will return the first non-null value of its parameters. (This is usually called coalesce). However I could not find any such command for jq.
How can I return the reason as a result of my query?
jq has a counterpart to "coalesce" in the form of //.
For example, null // 0 evaluates to 0, and chances are that it will suffice in your case, perhaps:
.status.containerStatuses[].state | (.waiting // .terminated) | .reason
or
.status.containerStatuses[].state | (.waiting.reason // .terminated.reason )
or similar.
However, // should only be used with some understanding of what it does, as explained in detail on the jq FAQ at https://github.com/stedolan/jq/wiki/FAQ#or-versus-
If // is inapplicable for some reason, then the obvious alternative would be an if ... then ... else ... end statement, which is quite like C's _ ? _ : _ in that it can be used to produce a value, e.g. something along the lines of:
.status.containerStatuses[].state
| if has("waiting") then .waiting.reason
else .terminated.reason
end
However, if containerStatuses is an array, then some care may be required.
In case you want to go with coalesce:
# Stream-oriented version
def coalesce(s):
  first(s | select(. != null)) // null;
or if you prefer to work with arrays:
# Input: an array
# Output: the first non-null element if any, else null
def coalesce: coalesce(.[]);
Using the stream-oriented version, you could write something along the lines you had in mind with the wildcard, e.g.
coalesce(.status.containerStatuses[].state[].reason?)
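Putting it all together, an untested sketch of the full pipeline from the question (the filter and first two columns are exactly as in your command; only the reason column is new):
kubectl get pod -A -o=json | jq -r '
  def coalesce(s): first(s | select(. != null)) // null;
  .items[]
  | select(any(.status.containerStatuses[]; .state.waiting or .state.terminated))
  | [.metadata.namespace, .metadata.name,
     coalesce(.status.containerStatuses[].state[].reason?)]
  | @csv'
A missing reason comes out as null, which @csv renders as an empty field.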

AWS CLI / jq - transforming JSON with tags, and showing information even for non-defined tags

I'm facing an issue when trying to process the output of the 'aws ec2 describe-instances' command with 'jq', and I really need some help.
I want to transform JSON output into CSV file with the list of all instances, with
columns 'Name,InstanceId,Tag-Client,Tag-CostCenter'.
I've been using jq's select with a command like:
aws ec2 describe-instances |
jq -r '.Reservations[].Instances[]
| (.Tags[]|select(.Key=="Name")|.Value) + "," + .InstanceId + ","
+ (.Tags[]|select(.Key=="Client")|.Value) + ","
+ (.Tags[]|select(.Key=="CostCenter")|.Value)'
However, using selects in this way, only those entries containing all the tags are displayed; entries that contain only some of the tags are dropped.
I understand the behavior, which is similar to a grep, but I'm trying to figure out if it's possible to perform this operation using jq, so that in the case where one tag is not defined, it would just return the string "" instead of removing the whole line.
I've found a reference about using 'if' clauses in jq (https://ilya-sher.org/2016/05/11/most-jq-you-will-ever-need/), but I'm wondering if anyone has resolved such a case without having to add this logic or split the command into different executions.
Whenever you are given an array of key/value pairs (the tags here) and you want to extract values by their key, it'll be easier to map them into an object so you can access them directly. Functions like from_entries will work well with this.
However, since you're also trying to retrieve values not within this tag array, you can approach it a little differently to save some steps. Using reduce or foreach, you can go through each of the tags and add it to an object that holds all the values you're interested in. Then you can map the values you want into an array, then convert to a csv row.
So if your goal is to create rows of Tags[Name], InstanceId, Tags[Client], Tags[CostCenter] for each instance, you could do this:
# for each instance
.Reservations[].Instances[]
# map each instance to an object where we can easily extract the values
| reduce .Tags[] as $t (
    { InstanceId };        # we want the InstanceId from the instance
    .[$t.Key] = $t.Value   # add the values to the object
  )
# map the desired values to an array
| [ .Name, .InstanceId, .Client, .CostCenter ]
# convert to csv
| @csv
And the good news is, if Name, Client, or CostCenter doesn't exist in the tag array, or even InstanceId, then they'll just be null, which becomes an empty field when converted to csv.
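For completeness, the from_entries route mentioned at the top could look something like this (a sketch; the Key/Value tag pairs are first mapped to the key/value names that from_entries expects):
.Reservations[].Instances[]
| . as $i
| (.Tags | map({key: .Key, value: .Value}) | from_entries) as $tags
| [$tags.Name, $i.InstanceId, $tags.Client, $tags.CostCenter]
| @csv
Missing tags again come out as null, i.e. as empty CSV fields.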

Postgres regex not matching comma as expected

I have a text field containing JSON and I need to replace and eliminate some fields. Below is an example of the JSON format. I would like to remove certain fields suffixed by '-op', along with the trailing comma, but the comma is not being picked up for some reason.
{
  "miscId":[],
  "otherActivityData":{"activityDate-op":"eq","activityDate":"11/28/2017"}
}
I used a nice online tool, regexpal, to confirm that my pattern should work in most languages.
The pattern is:
"activityDate-op":".+?",?
It picks everything up except the comma. I did a regexp_match and then printed the result via RAISE NOTICE, and it produced
{"\"activityDate-op\":\"eq\""}
Can anyone help point out how I can pick up the comma?
Sometimes the -op field is last in the object, so I need to keep the 0-or-1 (question mark) quantifier in place. If I remove the ?, then it sometimes picks up the comma, but that causes other issues.
You don't need a regex. There is an operator to remove an element from a JSONB based on the element's path:
select j #- array['otherActivityData', 'activityDate-op']
from (
  values ('{"miscId":[],
            "otherActivityData":{"activityDate-op":"eq","activityDate":"11/28/2017"}
           }'::jsonb)
) as t(j);
returns:
{"miscId": [], "otherActivityData": {"activityDate": "11/28/2017"}}

Finding a string between two strings in a file

This is a bit of a .json file I need to find information in:
"title":
"Spring bank holiday","date":"2012-06-04","notes":"Substitute day","bunting":true},
{"title":"Queen\u2019s Diamond Jubilee","date":"2012-06-05","notes":"Extra bank holiday","bunting":true},
{"title":"Summer bank holiday","date":"2012-08-27","notes":"","bunting":true},
{"title":"Christmas Day","date":"2012-12-25","notes":"","bunting":true},
{"title":"Boxing Day","date":"2012-12-26","notes":"","bunting":true},
{"title":"New Year\u2019s Day","date":"2013-01-01","notes":"","bunting":true},
{"title":"Good Friday","date":"2013-03-29","notes":"","bunting":false},
{"title":"
The file is much longer, but it is one long line of text.
I would like to display what bank holiday it is after a certain date, and also if it involves bunting.
I've tried grep and sed but I can't figure it out.
I'd like something like this:
[command] between [date] and [}] display [title] and [bunting]/[no bunting]
[title] should be just "Christmas Day" or something else
Forgot to mention:
I would like to achieve this in bash shell, either from the prompt or from a short bit of code.
You should use a proper JSON parser in a decent programming language; then you can do a lot of work in a safe way without too much code. How about this little Python code:
#!/usr/bin/env python
import json

with open('my.json') as jsonFile:
    holidays = json.load(jsonFile)

for holiday in holidays:
    if holiday['date'] > '2012-05-06':
        print(holiday['date'], ':', holiday['title'],
              ("bunting" if holiday['bunting'] else "no bunting"))
        break  # in case you only want one line of output
I could not figure out what exactly the output should be; if you can be more specific, I can adjust my example.
You can try this with awk:
awk -F"}," '{for(i=1;i<=NF;i++){print $i}}' file.json | awk -F"\"[:,]\"?" '$4>"2013-01-01"{printf "%s:%s:%s\n" ,$2,$4,$8}'
Seeing that the json file is one long string, we first split this line into multiple json records on "},". Then each individual record is split on a combination of ":, characters with an optional closing ". We then only output a record if it's after a certain date.
This will find all records after Jan 1 2013.
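As a quick illustration of the record-splitting step (using a contrived three-record input):
echo '{"a":1},{"b":2},{"c":3}' | awk -F"}," '{for(i=1;i<=NF;i++){print $i}}'
{"a":1
{"b":2
{"c":3}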
EDIT:
The 2nd awk splits each individual json record into key-value pairs using a sub-string starting with ", followed by either a : or ,, and an optional ending ".
So in your example it will split on ",", on ":", or on ": (when the value is unquoted, like true or false).
All odd fields are keys, and all even fields are values (hence $4 being the date in your example). We then check whether $4 (the date) is after 2013-01-01.
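To make the field layout concrete, here is what the second awk sees for a single record from your data:
echo '{"title":"Good Friday","date":"2013-03-29","notes":"","bunting":false' |
  awk -F"\"[:,]\"?" '{for(i=1;i<=NF;i++) printf "$%d=[%s]\n", i, $i}'
$1=[{"title]
$2=[Good Friday]
$3=[date]
$4=[2013-03-29]
$5=[notes]
$6=[]
$7=[bunting]
$8=[false]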
I noticed I made a mistake with the optional " in the split (it should be followed by ? instead of *), which I have now corrected; I also used the printf function to display the values.