Linux regex: snake case to camel case with word matching - JSON

I am stuck on a problem where I need to convert snake case to camel case inside a JSON key. This all has to happen in a Linux shell script.
The JSON is as below:
[{"name":"THE_NAME1","value":"Value_1"}, {"name":"THE_NAME_NEW2","value":"Value_2"}]
The expected output is:
[{"name":"theName1","value":"Value_1"}, {"name":"theNameNew2","value":"Value_2"}]
The values should not change.

You can use jq to manipulate the appropriate values:
jq '
  map(
    .name |= (
      ascii_downcase |
      gsub(
        "_(?<a>[a-z])";
        .a|ascii_upcase
      )
    )
  )
' <json
map(filter) - apply filter to each element of array
path|=(filter1|filter2|...) - update path by applying filters
filter1|filter2 - chain filters
ascii_downcase - convert A..Z to a..z
gsub(regex; string) - replace all occurrences of regex by string after interpolation of capture variables
ascii_upcase - convert a..z to A..Z
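Putting it together, a runnable one-liner check of the above (requires jq 1.5+ for named captures in gsub):

```shell
# Feed the question's sample JSON through the answer's filter.
json='[{"name":"THE_NAME1","value":"Value_1"},{"name":"THE_NAME_NEW2","value":"Value_2"}]'
printf '%s' "$json" |
jq -c 'map(.name |= (ascii_downcase | gsub("_(?<a>[a-z])"; .a|ascii_upcase)))'
# [{"name":"theName1","value":"Value_1"},{"name":"theNameNew2","value":"Value_2"}]
```

The -c flag prints compact (single-line) output; the values are untouched because only .name is updated.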

How to insert JSON as string into another JSON

I am writing a script (a bash script for an Azure pipeline) and I need to combine JSON from different variables. For example, I have:
TYPE='car'
COLOR='blue'
ADDITIONAL_PARAMS='{"something": "big", "etc":"small"}'
So, as you can see, I have several string variables and one that contains JSON.
I need to combine these variables in this format (and I can't :( ):
some_script --extr-vars --extra_vars '{"var_type": "'$TYPE'", "var_color": "'$COLOR'", "var_additional_data": "'$ADDITIONAL_PARAMS'"}'
But this combination is not working, I have a string something like:
some_script --extr-vars --extra_vars '{"var_type": "car", "var_color": "blue", "var_additional_data": " {"something": "big", "etc":"small"} "}'
which is not correct, valid JSON.
How can I combine existing JSON (already formatted with double quotes ") with other variables? I am using bash / console / the yq utility (to convert YAML to JSON).
Use jq to generate the JSON. (You can probably do this in one step with yq, but I'm not as familiar with that tool.)
ev=$(jq --arg t "$TYPE" \
        --arg c "$COLOR" \
        --argjson ap "$ADDITIONAL_PARAMS" \
        -n '{var_type: $t, var_color: $c, var_additional_data: $ap}')
some_script --extr-vars --extra_vars "$ev"
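A self-contained sketch of this approach, using the variable values from the question (note that --argjson makes jq parse ADDITIONAL_PARAMS as JSON, rather than embedding it as an escaped string):

```shell
TYPE='car'
COLOR='blue'
ADDITIONAL_PARAMS='{"something": "big", "etc":"small"}'

# -n: no input; -c: compact output; --arg binds strings, --argjson binds parsed JSON.
ev=$(jq -cn --arg t "$TYPE" \
            --arg c "$COLOR" \
            --argjson ap "$ADDITIONAL_PARAMS" \
            '{var_type: $t, var_color: $c, var_additional_data: $ap}')
echo "$ev"
# {"var_type":"car","var_color":"blue","var_additional_data":{"something":"big","etc":"small"}}
```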

Fuzzy match string with jq

Let's say I have some JSON in a file, it's a subset of JSON data extracted from a larger JSON file - that's why I'll use stream later in my attempted solution - and it looks like this:
[
{"_id":"1","#":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"2","#":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
],
[
{"_id":"55","#":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"56","#":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
]
It describes 4 posts written by 2 different authors, with unique _id fields for each post. Both authors wrote 2 posts, where 1 says "Hello World" and the other says "Goodbye World".
I want to match on the word "Hello" and return the _id only for fields containing "Hello". The expected result is:
1
55
The closest I could come in my attempt was:
jq -nr --stream '
fromstream(1|truncate_stream(inputs))
| select(.body %like% "Hello")
| ._id
' <input_file
Assuming the input is modified slightly to make it a stream of the arrays as shown in the Q:
jq -nr --stream '
fromstream(1|truncate_stream(inputs))
| select(.body | test("Hello"))
| ._id
'
produces the desired output.
test uses regex matching. In your case, it seems you could use simple substring matching instead.
Handling extraneous commas
Assuming the input has commas between a stream of valid JSON exactly as shown, you could presumably use sed to remove them first.
Or, if you want an only-jq solution, use the following in conjunction with the -n, -r and --stream command-line options:
def iterate:
fromstream(1|truncate_stream(inputs?))
| select(.body | test("Hello"))
| ._id,
iterate;
iterate
(Notice the "?".)
The streaming parser (invoked with --stream) is usually not needed for the kind of task you describe, so in this response, I'm going to assume that the following (or a variant thereof) will suffice:
.[]
| select( .body | test("Hello") )._id
This of course assumes that the input is valid JSON.
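As a quick check, here is the same filter run on a trimmed-down, valid-JSON sample (only the _id and body fields are kept for illustration):

```shell
# Three posts; two contain "Hello" and should have their _id printed.
json='[{"_id":"1","body":"Hello world"},{"_id":"2","body":"Goodbye world"},{"_id":"55","body":"Hello world"}]'
printf '%s' "$json" |
jq -r '.[] | select( .body | test("Hello") )._id'
# 1
# 55
```

For a plain substring match, `.body | contains("Hello")` would work equally well here and avoids regex metacharacter surprises.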
Handling comma-delimited JSON
If your input is a comma-delimited stream of JSON as shown in the Q, you could use the following in conjunction with the -n command-line option:
# This is a variant of the built-in `recurse/1`:
def iterate(f): def r: f | (., r); r;
iterate( inputs? | .[] | select( .body | test("Hello") )._id )
Please note that this assumes that whatever occurs on a line after a delimiting comma can be ignored.

jq --stream filter on multiple values of same key

I am processing a very large JSON wherein I need to filter the inner JSON objects using a value of a key. My JSON looks like as follows:
{"userActivities":{"L3ATRosRdbDgSmX75Z":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-20"},"L3ATSFGrpAYRkIIKqrh":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-21"},"L3AVHvmReBBPNGluvHl":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-22"},"L3AVIcqaDpZxLf6ispK":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-19"}}}
I want to filter on localDate values such that localDate is "2018-01-20" or "2018-01-21", so that the output looks like:
{"userActivities":{"L3ATRosRdbDgSmX75Z":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-20"},"L3ATSFGrpAYRkIIKqrh":{"deviceId":"60ee32c2fae8dcf0","dow":"Friday","localDate":"2018-01-21"}}}
I have asked a similar question here and realised that I need to put filter on multiple values and retain the original structure of JSON.
https://stackoverflow.com/questions/52324497/how-to-filter-json-using-jq-stream
Thanks a ton in advance!
From the jq Cookbook, let's borrow def atomize(s):
# Convert an object (presented in streaming form as the stream s) into
# a stream of single-key objects
# Examples:
# atomize({a:1,b:2}|tostream)
# atomize(inputs) (used in conjunction with "jq -n --stream")
def atomize(s):
  fromstream(foreach s as $in ( {previous:null, emit: null};
    if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
    then {emit: [[.previous]], previous: $in|.[0][0]}
    else { previous: ($in|.[0][0]), emit: null}
    end;
    (.emit // empty), $in) ) ;
Since the top-level object described by the OP contains just one key, we can select the objects with the desired localDate values as follows:
atomize(1|truncate_stream(inputs))
| select( .[].localDate | . == "2018-01-20" or . == "2018-01-21" )
If you want these collected into a composite object, you might have to be careful about memory, so you might want to pipe the selected objects to another program (e.g. awk or jq). Otherwise, I'd go with:
def add(s): reduce s as $x (null; .+$x);
{"userActivities": add(
  atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
  | select( .[].localDate | . == "2018-01-20" or . == "2018-01-21" ) ) }
Variation
If the top-level object has more than one key, then the following variation would be appropriate:
atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
| select( .[].localDate | . == "2018-01-20" or . == "2018-01-21" )
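If the file is small enough to process without --stream, a non-streaming sketch of the same selection is much simpler (the objects here are cut down to just localDate for illustration):

```shell
# with_entries keeps only the key/value pairs whose value passes the select.
json='{"userActivities":{"A":{"localDate":"2018-01-20"},"B":{"localDate":"2018-01-19"},"C":{"localDate":"2018-01-21"}}}'
printf '%s' "$json" |
jq -c '.userActivities |= with_entries(select(.value.localDate | . == "2018-01-20" or . == "2018-01-21"))'
# {"userActivities":{"A":{"localDate":"2018-01-20"},"C":{"localDate":"2018-01-21"}}}
```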

Using jq to count

Using jq-1.5 if I have a file of JSON that looks like
[{... ,"sapm_score":40.776, ...} {..., "spam_score":17.376, ...} ...]
How would I get a count of the ones where sapm_score > 40?
Thanks,
Dan
Update:
I looked at the input file and the format is actually
{... ,"sapm_score":40.776, ...}
{..., "spam_score":17.376, ...}
...
Does this change how one needs to count?
[UPDATE: If the input is not an array, see the last section below.]
count/1
I'd recommend defining a count filter (and maybe putting it in your ~/.jq), perhaps as follows:
def count(s): reduce s as $_ (0;.+1);
With this, assuming the input is an array, you'd write:
count(.[] | select(.sapm_score > 40))
or slightly more efficiently:
count(.[] | (.sapm_score > 40) // empty)
This approach (counting items in a stream) is usually preferable to using length as it avoids the costs associated with constructing an array.
count/2
Here's another definition of count that you might like to use (and perhaps add to ~/.jq as well):
def count(stream; cond): count(stream | cond // empty);
This counts the elements of the stream for which cond is neither false nor null.
Now, assuming the input consists of an array, you can simply write:
count(.[]; .sapm_score > 40)
"sapm_score" vs "spam_score"
If the point is that you want to normalize "sapm_score" to "spam_score", then (for example) you could use count/2 as defined above, like so:
count(.[]; .spam_score > 40 or .sapm_score > 40)
This assumes all the items in the array are JSON objects. If that is not the case, then you might want to try adding "?" after the key names:
count(.[]; .spam_score? > 40 or .sapm_score? > 40)
Of course all the above assumes the input is valid JSON. If that is not the case, then please see https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
If the input is a stream of JSON objects ...
The revised question indicates the input consists of a stream of JSON objects (whereas originally the input was said to be an array of JSON objects). If the input consists of a stream of JSON objects, then the above solutions can easily be adapted, depending on the version of jq that you have. If your version of jq has inputs then (2) is recommended.
(1) All versions: use the -s command-line option.
(2) If your jq has inputs: use the -n command line option, and change .[] above to inputs, e.g.
count(inputs; .spam_score? > 40 or .sapm_score? > 40)
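Combining the count/1 and count/2 definitions above, a runnable sketch over a stream of objects:

```shell
# Two newline-delimited objects; only the first has spam_score > 40.
printf '%s\n' '{"spam_score":40.776}' '{"spam_score":17.376}' |
jq -n '
  def count(s): reduce s as $_ (0; .+1);
  def count(stream; cond): count(stream | cond // empty);
  count(inputs; .spam_score? > 40)'
# 1
```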
Filter the items that satisfy the condition then get the length.
map(select(.sapm_score > 40)) | length
Here is one way:
reduce .[] as $s(0; if $s.spam_score > 40 then .+1 else . end)
Try it online at jqplay.org
If instead of an array the input is a sequence of newline delimited objects (jsonlines)
reduce inputs as $s(0; if $s.spam_score > 40 then .+1 else . end)
will work if jq is invoked with the -n flag. Here is an example:
$ cat data.json
{ "spam_score":40.776 }
{ "spam_score":17.376 }
$ jq -Mn 'reduce inputs as $s(0; if $s.spam_score > 40 then .+1 else . end)' data.json
1
Try it online at tio.run
cat input.json | jq -c '. | select(.sapm_score > 40)' | wc -l
should do it.
The -c option prints a one-liner compact json representation of each match, and we count the number of lines jq prints.

How to delete the last character of prior line with sed

I'm trying to delete a line, together with the last character of the prior line, with sed.
I have a JSON file:
{
"name":"John",
"age":"16",
"country":"Spain"
}
I would like to delete country from all entries; to do that, I also have to delete the comma on the prior line to keep the JSON syntax valid.
I'm using this pattern:
sed '/country/d' test.json
sed -n '/resolved//.$//{x;d;};1h;1!{x;p;};${x;p;}' test.json
Editor's note:
The OP later clarified the following additional requirements, which invalidated some of the existing answers:
- multiple occurrences of country properties should be removed
- across all levels of the object hierarchy
- whitespace variations should be tolerated
Using a proper JSON parser such as jq is generally the best choice (see below), but if installing a utility is not an option, try this GNU sed command:
$ sed -zr 's/,\s*"country":[^\n]+//g' test.json
{
"name":"John",
"age":"16"
}
-z splits the input into records by NULs, which, in this case means that the whole file is read at once, which enables cross-line substitutions.
-r enables extended regular expressions for a more modern syntax with more features.
s/,\s*"country":[^\n]+//g replaces all occurrences of a comma, followed by a (possibly empty) run of whitespace (possibly including a newline), then "country": and everything up to the end of that line, with the empty string, i.e., effectively removes the matched strings.
Note that this assumes that no other property or closing } follows such a country property on the same line.
Below is a more robust solution based on jq.
Bertrand Martel's helpful answer contains a jq solution, which, however, does not address the requirement (added later) of replacing country attributes anywhere in the input object hierarchy.
In jq versions after v1.5 (walk/1 became a builtin in jq 1.6), the following simple solution is available:
# Walk all nodes and remove a "country" property from any object.
jq 'walk(if type == "object" then del (.country) else . end)' test.json
In v1.5 and below, you can define a simplified variant of walk yourself:
jq '
# Define recursive function walk_objects/1 that walks all objects in the
# hierarchy.
def walk_objects(f): . as $in |
  if type == "object" then
    reduce keys[] as $key
      ( {}; . + { ($key): ($in[$key] | walk_objects(f)) } ) | f
  elif type == "array" then map( walk_objects(f) )
  else . end;
# Walk all objects and remove a "country" property, if present.
walk_objects(del(.country))
' test.json
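As a quick check of the walk-based approach (jq 1.6+, where walk/1 is builtin), on a nested variant of the question's sample (the address object is an invented extension for illustration):

```shell
# country should be removed at every level of the hierarchy.
json='{"name":"John","age":"16","country":"Spain","address":{"city":"Madrid","country":"Spain"}}'
printf '%s' "$json" |
jq -c 'walk(if type == "object" then del(.country) else . end)'
# {"name":"John","age":"16","address":{"city":"Madrid"}}
```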
As pointed out before, you should really consider using a JSON parser to parse JSON.
That said, you can slurp the whole file, remove the newlines, and then replace accordingly:
$ sed ':a;N;$!ba;s/\n//g;s/,"country"[^}]*//' test.json
{"name":"John","age":"16"}
Breakdown:
:a; # Define label 'a'
N; # Append next line to pattern space
$!ba; # Goto 'a' unless it's the last line
s/\n//g; # Replace all newlines with nothing
s/,"country"[^}]*// # Replace ',"country...' with nothing
This might work for you (GNU sed):
sed 'N;s/,\s*\n\s*"country".*//;P;D' file
Read two lines into the pattern space at a time and remove the matched string (the trailing comma and the following "country" line).
N.B. This allows for whitespace on either side of the line break.
You can use a JSON parser like jq to parse json file. The following will return the document without the country field and write the new document in result.json :
jq 'del(.country)' file.json > result.json
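For the sample object from the question:

```shell
# del removes the named key from the top-level object.
printf '%s' '{"name":"John","age":"16","country":"Spain"}' | jq -c 'del(.country)'
# {"name":"John","age":"16"}
```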