How to parse multidimensional JSON array in bash using jsawk?


I have an array like the one below. I want to parse the entire data into a bash array,
so I can access the first "addressLineOne" value with ${bashaddr[0]}, and so on.
[
{
"id":"f0c546d5-0ce4-55ee-e043-516e0f0afdc1",
"cardType":"WMUSGESTORECARD",
"lastFour":"1682",
"cardExpiryDate":"2012-01-16",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Apt venue",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"534534",
"isDefault":false
},
{
"id":"f0c546d5-0ce0-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"2731",
"cardExpiryDate":"2009-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"119 maple ave.",
"addressLineTwo":"",
"city":"uncasville",
"state":"CT",
"postalCode":"06382",
"phone":"7676456",
"isDefault":false
},
{
"id":"f0c546d5-0ce2-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"6025",
"cardExpiryDate":"2011-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Angeline Street",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"7867876",
"isDefault":false
}
]
I have tried this:
#!/bin/bash
addressLineOne="$(echo $card | jsawk 'return this.addressLineOne')"
but it gives me all of the addresses at once:
["address 1","address 2","address 3"]
Thank you.

I wrote the answer below before reading the comments; it is the same answer @4ae1e1 provided, except that I don't use the -r flag, in case you want the values to remain quoted (e.g. to pass them as an argument somewhere else).
I know this is not jsawk, but do consider jq:
jq '.[].addressLineOne' yourfile.txt
To access a specific value, put the record number in the square brackets (starting with 0 for the first address, and so on). For example, to get the address of the third record:
jq '.[2].addressLineOne' yourfile.txt
To learn more about jq and its advanced uses, check: http://jqplay.org
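For comparison, here is a minimal Python sketch of the same extraction. It is an illustration only (jq is not involved), it uses a trimmed copy of the question's data, and it reuses the question's bashaddr name purely as a label. It also shows the difference mentioned above between raw output (jq -r) and JSON-encoded, quoted output (plain jq).

```python
import json

# Trimmed copy of the question's data: only a few of each card's fields are kept.
RAW = """[
  {"cardType": "WMUSGESTORECARD", "lastFour": "1682", "addressLineOne": "Apt venue"},
  {"cardType": "MASTERCARD", "lastFour": "2731", "addressLineOne": "119 maple ave."},
  {"cardType": "MASTERCARD", "lastFour": "6025", "addressLineOne": "Angeline Street"}
]"""


def address_lines(raw):
    """Return every card's addressLineOne in array order (like jq '.[].addressLineOne')."""
    return [card["addressLineOne"] for card in json.loads(raw)]


bashaddr = address_lines(RAW)
print(bashaddr[0])              # raw text, as with jq -r: Apt venue
print(json.dumps(bashaddr[2]))  # JSON-encoded, as jq prints without -r: "Angeline Street"
```

Index 2 corresponds to jq '.[2].addressLineOne'; the list keeps the array order, so position 0 is the first card.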

What you need to do is use the -a switch to apply some post-processing and filter the output array, like this:
jsawk 'return this.addressLineOne' -a 'return this[0]'
From the documentation:
-b <script> | -a <script>
Run the specified snippet of JavaScript before (-b) or after (-a)
processing JSON input. The `this` object is set to the whole JSON
array or object. This is used to preprocess (-b) or postprocess
(-a) the JSON array before or after the main script is applied.
This option can be specified multiple times to define multiple
before/after scripts, which will be applied in the order they
appeared on the command line.
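The two phases can be modelled in a few lines of Python to show why the plain command printed a whole array and why -a 'return this[0]' narrows it. This is a hypothetical model, not jsawk's source: it assumes, as the question's output suggests, that the main script maps each element to its return value, and, per the quoted documentation, that each -a snippet then receives the whole resulting array, in command-line order.

```python
import json

# Trimmed copy of the question's data.
CARDS = json.loads("""[
  {"lastFour": "1682", "addressLineOne": "Apt venue"},
  {"lastFour": "2731", "addressLineOne": "119 maple ave."},
  {"lastFour": "6025", "addressLineOne": "Angeline Street"}
]""")


def jsawk_model(doc, main, after=()):
    """Hypothetical model: run main once per element, then each 'after' snippet on the result."""
    result = [main(item) for item in doc]
    for snippet in after:  # -a may be given several times; applied in order
        result = snippet(result)
    return result


# jsawk 'return this.addressLineOne'  -> the whole array of addresses
print(jsawk_model(CARDS, lambda card: card["addressLineOne"]))
# jsawk 'return this.addressLineOne' -a 'return this[0]'  -> only the first address
print(jsawk_model(CARDS, lambda card: card["addressLineOne"], after=[lambda arr: arr[0]]))
```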

Related

Display an object if a nephew element array contains a value

The question "Select objects based on value of variable in object using jq" shows how to return values directly above the selection criteria, but how would I get another object that is adjacent to a value above my selection criteria?
Given the data below, what jq invocation would return the French names of planets whose moon(s) have been spoiled? (This is a structural reproduction of the live data I am working with, which really does use the word "value" this way, so that's not helping.)
{"kind":"solarsystem","name":"Sol",
"Planets": [
{ "kind":"habitable",
"names": { "english":"Earth","french":"Terre"},
"satellites" : [
{"name":"The Moon",
"parameters": [
{"name":"diameter", "intValue":"3476"},
{"name":"diameter_units", "value":"km"},
{"name":"unspoiled","value":"no"}]}]},
{"kind":"uninhabitable",
"names": {"english":"Mars","french":"Mars"},
"satellites" : [
{"name":"Phobos",
"parameters": [
{"name":"diameter", "intValue":"2200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]},
{"name":"Deimos",
"parameters": [
{"name":"diameter", "intValue":"1200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]}]}]}

The program below selects planets whose moons have all been spoiled. As each parameter is a name-value pair, we can use from_entries to transform the array of parameters into an object and retrieve the unspoiled status with just .unspoiled, and thus avoid another select to find the parameter we're interested in.
.Planets[] | select(.satellites | all(.parameters | from_entries .unspoiled == "no")) .names.french
If a single spoiled moon is enough, change all to any.
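To check the logic outside jq, here is a rough Python reading of the same filter (an illustration, not part of the answer). from_entries becomes a dict built from each parameter's name and value, and the all/any choice becomes a function argument. As with jq's from_entries, a parameter with no "value" key (the intValue entries) maps to None, the counterpart of jq's null.

```python
import json

# The question's data, compacted onto fewer lines.
SOL = json.loads("""
{"kind": "solarsystem", "name": "Sol", "Planets": [
  {"kind": "habitable", "names": {"english": "Earth", "french": "Terre"},
   "satellites": [
     {"name": "The Moon", "parameters": [
       {"name": "diameter", "intValue": "3476"},
       {"name": "diameter_units", "value": "km"},
       {"name": "unspoiled", "value": "no"}]}]},
  {"kind": "uninhabitable", "names": {"english": "Mars", "french": "Mars"},
   "satellites": [
     {"name": "Phobos", "parameters": [
       {"name": "diameter", "intValue": "2200"},
       {"name": "diameter_units", "value": "m"},
       {"name": "unspoiled", "value": "yes"}]},
     {"name": "Deimos", "parameters": [
       {"name": "diameter", "intValue": "1200"},
       {"name": "diameter_units", "value": "m"},
       {"name": "unspoiled", "value": "yes"}]}]}]}
""")


def from_entries(parameters):
    """Like jq's from_entries on name/value pairs; a missing "value" becomes None (jq: null)."""
    return {p["name"]: p.get("value") for p in parameters}


def french_names_of_spoiled(system, quantifier=all):
    """Python reading of:
    .Planets[] | select(.satellites | all(.parameters | from_entries .unspoiled == "no")) .names.french
    Pass quantifier=any for the "a single spoiled moon is enough" variant.
    """
    return [
        planet["names"]["french"]
        for planet in system["Planets"]
        if quantifier(from_entries(moon["parameters"]).get("unspoiled") == "no"
                      for moon in planet["satellites"])
    ]


print(french_names_of_spoiled(SOL))       # ['Terre']
print(french_names_of_spoiled(SOL, any))  # ['Terre']
```

With this data all and any agree; they differ only when a planet has a mix of spoiled and unspoiled moons.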

And here is a solution to the same JSON query using an alternative tool (jtc):
In the simplest form, the following will do:
bash $ <file.json jtc -w'[value]:<no>:[-5][names][french]'
"Terre"
However, that solution returns the planet's French name once for each matching moon; e.g., for unspoiled moons it gives this:
bash $ <file.json jtc -w'[value]:<yes>:[-5][names][french]'
"Mars"
"Mars"
bash $
When there are multiple moons but the name is required only once, strengthen the query like this (shown here for unspoiled moons):
bash $ <file.json jtc -w'<satellites>l:[value]:<yes>[-5][names][french]'
"Mars"
bash $
PS. I'm the developer of jtc, a Unix JSON processor.
PPS. The above disclaimer is required by SO.
Update:
The answer was updated based on a discussion in the comments with @oguzismail, to strengthen the structural relationship between the value and french labels so that other (irrelevant) value matches won't trigger false positives.
If, by chance, the structural relation [-5][names] is not enough, the query can be tightened further by inserting <unspoiled>[-1] before the [value]... lexeme.
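The repeated "Mars" becomes clearer if the walk is approximated in Python. This is a hypothetical model, not jtc: it treats [value]:<x> as "every leaf labelled value that equals x", treats [-5] as moving five levels up from the matched string (to the parameter object, the parameters array, the satellite, the satellites array and finally the planet), and replaces the stricter <satellites>l: query with plain de-duplication by planet.

```python
import json

# Trimmed copy of the question's data (the intValue diameter entries are dropped).
SOL = json.loads("""
{"Planets": [
  {"names": {"english": "Earth", "french": "Terre"},
   "satellites": [
     {"name": "The Moon", "parameters": [
       {"name": "diameter_units", "value": "km"},
       {"name": "unspoiled", "value": "no"}]}]},
  {"names": {"english": "Mars", "french": "Mars"},
   "satellites": [
     {"name": "Phobos", "parameters": [
       {"name": "diameter_units", "value": "m"},
       {"name": "unspoiled", "value": "yes"}]},
     {"name": "Deimos", "parameters": [
       {"name": "diameter_units", "value": "m"},
       {"name": "unspoiled", "value": "yes"}]}]}]}
""")


def leaves(node, path=()):
    """Yield (path, value) for every scalar; path is a tuple of keys and indices."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaves(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaves(child, path + (index,))
    else:
        yield path, node


def follow(node, path):
    """Walk down from node along path."""
    for step in path:
        node = node[step]
    return node


def french_names(system, wanted, once_per_planet=False):
    names, seen = [], set()
    for path, value in leaves(system):
        if path[-1:] == ("value",) and value == wanted:  # roughly [value]:<wanted>
            # roughly [-5]: parameter, parameters, satellite, satellites, planet
            planet_path = path[:-5]
            if once_per_planet and planet_path in seen:
                continue
            seen.add(planet_path)
            names.append(follow(system, planet_path)["names"]["french"])  # [names][french]
    return names


print(french_names(SOL, "no"))         # ['Terre']
print(french_names(SOL, "yes"))        # ['Mars', 'Mars'] (once per moon)
print(french_names(SOL, "yes", True))  # ['Mars']
```

The fixed five-step climb is also why the answer tightens the structural relation: a "value" leaf at a different depth would land on the wrong ancestor.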

Get the element after/before the matched one in json with jq

Suppose I have the following json input to the jq command:
[
{"type":"dog", "set":"foo"},
{"type":"bird", "set":"bar"},
{"type":"cat", "set":"blaz"},
{"type":"fish", "set":"mor"}
]
I know that there is an element in this array whose type is "bird", in this case, the second element. But I want its next (or previous) sibling, that is, the element after (before) it, in this case, the third (first) element. How can I get it in jq?
Also, I have another question: if the matched element (that is, the element whose type value I know) is the last one in the array, I want to get the first one as the next element (that is, cycle through the array) instead of getting nothing. Likewise, if the matched element is the first one, I want to get the last element as the previous one.

For the sake of specificity, let's suppose you want to extract the (before, focus, after) triple as an array, where before and after wrap around as described. For simplicity, let's also suppose the source array has a length of at least 2.
Next, for ease of exposition and understanding, we will define a helper function for extracting the three items:
# $i is assumed to be a valid index into the input array,
# which is assumed to be of length at least 2
def triple($i):
if $i == 0 then [last] + .[0:2]
elif $i == (length-1) then .[$i-1: $i+2] + [first]
else .[$i-1: $i+2]
end;
Now we have only to find the index, and use it:
(map(.type) | index("bird")) as $i
| if $i then triple($i) else empty end
Using this approach, other variants can easily be obtained.
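The same idea can be sketched in Python for comparison (an illustration, not a jq translation of every detail). Instead of the three explicit branches of triple, modular arithmetic does the wrapping: (i - 1) % n turns -1 into the last index, and (i + 1) % n turns n into 0. For arrays of length at least 2, as assumed above, this gives the same triples.

```python
PETS = [
    {"type": "dog", "set": "foo"},
    {"type": "bird", "set": "bar"},
    {"type": "cat", "set": "blaz"},
    {"type": "fish", "set": "mor"},
]


def triple(items, wanted, key="type"):
    """Return [before, match, after] around the first element whose key equals wanted.

    Neighbours wrap around the ends of the list. Returns None when nothing
    matches, playing the role of jq's `empty`.
    """
    types = [item[key] for item in items]  # map(.type)
    if wanted not in types:
        return None
    i = types.index(wanted)                # index("bird")
    n = len(items)
    return [items[(i - 1) % n], items[i], items[(i + 1) % n]]


print(triple(PETS, "bird"))  # the dog, bird and cat records
print(triple(PETS, "fish"))  # the cat, fish and dog records (wraps to the start)
```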

Let me also offer you an alternative solution based on jtc, a walk-path Unix tool for JSON; there you "encode" your query logic right into the path.
E.g., to find a "type":"bird" record and then its preceding sibling (in the parent's array):
bash $ <file.json jtc -w'[type]:<bird> [-1]<idx>k [-1]>idx<t-1' -r
{ "set": "foo", "type": "dog" }
let me break it down for you:
[type]:<bird> - recursively finds a record "type":"bird"
[-1]<idx>k - steps up one level in the JSON tree (selecting the parent, i.e. the entire record {"type":"bird", "set":"bar"}) and memorizes its array index in the namespace idx
[-1]>idx<t-1 - steps up one more level (selecting the top-level array) and searches, non-recursively, for the entry whose index is the one stored in idx, offset by -1
Likewise, one can select the next sibling:
bash $ <file.json jtc -w'[type]:<bird>[-1]<idx>k[-1]>idx<t1'
{ "set": "blaz", "type": "cat" }
Or, select the first entry (based on the last match):
bash $ <file.json jtc -w'[type]:<fish>[-1]<idx>k[-1]>idx<t-1000' -r
{ "set": "foo", "type": "dog" }
(just use a sufficiently low value as the relative quantifier; it will be normalized to the first entry)
PS. Disclosure: I'm the creator of the jtc tool.

Convert plain text with a specific format into JSON in VIM

All my university notes are in JSON format and when I get a set of practical questions from a pdf it is formatted like this:
1. Download and compile the code. Run the example to get an understanding of how it works. (Note that both
threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this
is an interface issue, not of concern in this course.)
2. Explore the classes SumTask and StringTask as well as the abstract class Task.
3. Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is
called.
4. Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have
to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.)
Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger
than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer
for a discussion.
5. Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times,
but “pop()”s off only the first task in the queue and executes it.
6. Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as
the following to the SumTask class definition:
private static final String taskType = "SumTask";
Investigate what “static” and “final” mean.
7. More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they
implement this interface. Here’s an example interface:
What I would like to do is copy it into vim and execute a find and replace to convert it into this:
"1": {
"Task": "Download and compile the code. Run the example to get an understanding of how it works. (Note that both threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this is an interface issue, not of concern in this course.)",
"Solution": ""
},
"2": {
"Task": "Explore the classes SumTask and StringTask as well as the abstract class Task.",
"Solution": ""
},
"3": {
"Task": "Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is called.",
"Solution": ""
},
"4": {
"Task": "Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.) Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer for a discussion.",
"Solution": ""
},
"5": {
"Task": "Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times, but “pop()”s off only the first task in the queue and executes it.",
"Solution": ""
},
"6": {
"Task": "Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as the following to the SumTask class definition: private static final String taskType = 'SumTask'; Investigate what “static” and “final” mean.",
"Solution": ""
},
"7": {
"Task": "More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they implement this interface. Here’s an example interface:",
"Solution": ""
}
After trying to figure this out during the practical (instead of actually doing the practical) this is the closest I got:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\)--end--/"\1": {\r "Task": "\2",\r"Solution": "" \r},/g
The three problems with this are:
1. I have to add --end-- to the end of each question. I would like it to know where a question ends by looking ahead to a line that starts with [1-9][1-9]*; unfortunately, when I search for that, it also replaces that part.
2. This keeps all the newlines within the question (which is invalid in JSON). I would like it to remove the newlines.
3. The last entry should not have a "," after it, because that would also be invalid JSON. (I don't mind this much, as it is easy to remove the last "," manually.)
Please keep in mind I am very bad at regular expressions and one of the reasons I am doing this is to learn more about regex so please explain any regex you post as a solution.

In two steps:
%s/\n/\ /g
to solve problem 2, and then:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\([1-9][1-9]*\. \|\%$\)\@=\)/"\1": {\r "Task": "\2",\r"Solution": "" \r},\r/g
to solve problem 1.
You can solve problem 3 with another replace round. Also, my solution inserts an unwanted extra space at the end of the task entries. Try to remove it yourself.
Short explanation of what I have added:
\|: or;
\%$: end of file;
\@=: zero-width lookahead (the text must follow, but it is not included in the match).
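The lookahead trick carries over to other regex engines. Here is a Python sketch of the same idea, for readers learning regex (an illustration, not the Vim command itself): Python writes Vim's \_.\{-} as a non-greedy .*?, Vim's \| as |, Vim's \%$ roughly as \Z, and Vim's \@= lookahead as (?=...). Two small changes are assumptions of this sketch: \d+ replaces [1-9][1-9]*, which cannot match numbers containing a 0 such as 10, and the \s in the lookahead keeps the trailing space out of each task.

```python
import re

# (\d+)\.\s+   the task number and its dot, e.g. "3. "
# (.*?)        the task text, matched as little as possible (Vim: \_.\{-})
# (?=...)      stop just before the next " N. " or the end of the text,
#              without consuming it (Vim: \@=), so the next match can start there
TASK = re.compile(r"(\d+)\.\s+(.*?)(?=\s+\d+\.\s|\s*\Z)")


def split_tasks(text):
    flat = text.replace("\n", " ")  # first step of the answer: %s/\n/ /g
    return TASK.findall(flat)


print(split_tasks("1. Download and compile\nthe code.\n2. Explore the classes.\n"))
# [('1', 'Download and compile the code.'), ('2', 'Explore the classes.')]
```

Like the Vim version, this assumes a task's text never contains a number followed by ". " mid-sentence.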

If each item sits on a single line, I would transform the text with a macro; it is shorter and more straightforward than :s:
I"<esc>f.s": {<enter>"Task": "<esc>A"<enter>"Solution": ""<enter>},<esc>+
Record this macro in a register, such as q; then you can replay it with 100@q to do the transformation.
Note that the result leaves a trailing comma at the end; just remove it.
You can also add indentation while recording the macro, so that your JSON is "pretty printed", or tidy it up later with another tool.

You could probably do this with one large regular expression, but that quickly becomes unmaintainable. I would break the task up into 3 steps instead:
1. Separate each numbered step into its own paragraph.
2. Put each paragraph on its own line.
3. Generate the JSON.
Taken together:
%s/^[0-9]\+\./\r&/
%s/\(\S\)\n\(\S\)/\1 \2/
%s/^\([0-9]\+\)\. *\(.*\)/"\1": {\r "Task": "\2",\r "Solution": ""\r},/
This solution also leaves a comma after the last element. This can be removed with:
$s/,//
Explanation
%s/^[0-9]\+\./\r&/ this matches a line starting with a number followed by a dot, e.g. 1., 8., 13., 131., etc., and replaces it with a newline (\r) followed by the match (&).
%s/\(\S\)\n\(\S\)/\1 \2/ this removes any newline that is flanked by non-white-space characters (\S).
%s/^\([0-9]\+\)\. *\(.*\) ... capture the number and text in \1 and \2.
... /"\1": {\r "Task": "\2",\r "Solution": ""\r},/ format text appropriately.
Alternative way using sed, awk and jq
You can perform steps one and two from above straightforwardly with sed and awk:
sed 's/^[0-9]\+\./\n&/' infile
awk '$1=$1; { print "\n" }' RS= ORS=' '
Using jq for the third step ensures that the output is valid JSON:
jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
Here as one command line:
sed 's/^[0-9]\+\./\n&/' infile |
awk '$1=$1; { print "\n" }' RS= ORS=' ' |
jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
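The same three steps can also be sketched in Python, assuming python3 is available (this is an alternative, not part of the answer above). The function name notes_to_json is hypothetical. Letting json.dumps build the output handles quoting and escaping (for example the "SumTask" string in task 6) and never leaves a trailing comma. Unlike the requested layout, the result is wrapped in one enclosing { } so it is a complete JSON document.

```python
import json
import re

NUMBERED = re.compile(r"(\d+)\.\s*(.*)")  # a line that starts a new task, e.g. "3. Modify ..."


def notes_to_json(text):
    """Return one JSON object mapping task numbers to {"Task": ..., "Solution": ""}."""
    tasks = {}
    current = None
    for line in text.splitlines():
        start = NUMBERED.match(line)
        if start:                                   # step 1: a new numbered paragraph
            current = start.group(1)
            tasks[current] = [start.group(2)]
        elif current is not None and line.strip():  # step 2: join continuation lines
            tasks[current].append(line.strip())
    # Step 3: json.dumps handles quoting, escaping and commas.
    return json.dumps(
        {num: {"Task": " ".join(parts).strip(), "Solution": ""} for num, parts in tasks.items()},
        indent=2,
        ensure_ascii=False,
    )


print(notes_to_json("1. Download the code. (Note that both\nthreads write.)\n2. Explore it.\n"))
```

As with the Vim and sed versions, a continuation line that itself begins with "N." would be mistaken for a new task.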

Decoding and using JSON data in Perl

I am confused about accessing the contents of some JSON data that I have decoded. Here is an example
I don't understand why this solution works and my own does not. My questions are rephrased below
my $json_raw = getJSON();
my $content = decode_json($json_raw);
print Dumper($content);   # requires use Data::Dumper;
At this point my JSON data has been transformed into this
$VAR1 = { 'items' => [ 1, 2, 3, 4 ] };
My guess tells me that, once decoded, the object will be a hash with one element that has the key items and an array reference as the value.
$content{'items'}[0]
where $content{'items'} would obtain the array reference, and the outer $...[0] would access the first element in the array and interpret it as a scalar. However, this does not work; I get the error message "Use of uninitialized value [...]".
However, the following does work:
$content->{items}[0]
where $content->{items} yields the array reference and [0] accesses the first element of that array.
Questions
Why does $content{'items'} not return an array reference? I even tried @{content{'items'}}, thinking that, once I got the value from content{'items'}, it would need to be interpreted as an array. But I still get the uninitialized value error.
How can I access the array reference without using the arrow operator?

A beginner's answer to a beginner :) Surely not as professional as it should be, but maybe it helps you.
use strict;   # always use this
use warnings; # this too - helps a lot!
use JSON;
my $json_str = ' { "items" : [ 1, 2, 3, 4 ] } ';
my $content = decode_json($json_str);
You wrote:
My guess tells me that, once decoded, the object will be a hash with
one element that has the key items and an array reference as the value.
Yes, it is a hash, but decode_json returns a reference, in this case a reference to a hash. From the docs:
expects an UTF-8 (binary) string and tries to parse that
as an UTF-8 encoded JSON text,
returning the resulting reference.
In the line
my $content = decode_json($json_str);
you are assigning to a SCALAR variable (not to a hash).
Because it is a reference, you can check its type:
printf "reftype:%s\n", ref($content);
# prints: reftype:HASH
# so $content is a SCALAR value containing a reference to a hash
It is a hashref - you can dump all keys
print "key: $_\n" for keys %{$content}; #or in short %$content
#prints: key: items
You can also assign the value of "items" (an arrayref) to a scalar variable:
my $aref = $content->{items}; #$hashref->{key}
#or
#my $aref = ${$content}{items}; #$hash{key}
but NOT
#my $aref = $content{items}; #throws error if "use strict;"
#Global symbol "%content" requires explicit package name at script.pl line 20.
$content{items} requests a value from the hash %content, which you never declared or assigned. $content is a scalar variable, not the hash variable %content.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "key-postderef: $_\n" for keys $content->%*;
}
Now step deeper - to the arrayref - again you can print out the reference type
printf "reftype:%s\n", ref($aref);
#reftype:ARRAY
print all elements of array
print "arr-item: $_\n" for @{$aref};
but again NOT
#print "$_\n" for @aref;
#dies: Global symbol "@aref" requires explicit package name at script.pl line 37.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "aref-postderef: $_\n" for $aref->@*;
}
Here is a simple rule:
my @arr;              # array variable
my $arr_ref = \@arr;  # scalar containing a reference to @arr
@{$arr_ref} is the same as @arr (the array reference inside curly brackets).
If you have an array reference, use @{$arr_ref} wherever you would use the array.
my %hash;               # hash variable
my $hash_ref = \%hash;  # scalar containing a reference to %hash
%{$hash_ref} is the same as %hash (the hash reference inside curly brackets).
If you have a hash reference, use %{$hash_ref} wherever you would use the hash.
For the whole structure, each of the following lines (say needs use feature 'say'; or use 5.010; to be in effect)
say $content->{items}->[0];
say $content->{items}[0];
say ${$content}{items}->[0];
say ${$content}{items}[0];
say ${$content->{items}}[0];
say ${${$content}{items}}[0];
prints the same value, 1.

$content is a hash reference, so you always need to use an arrow to access its contents. $content{items} would refer to a %content hash, which you don't have. That's where you're getting that "use of uninitialized value" error from.

I actually asked a similar question here
The answer:
In Perl, a function can only really return a scalar or a list.
Since hashes can be initialized or assigned from lists (e.g. %foo = (a => 1, b => 2)), I guess you're asking why json_decode returns something like { a => 1, b => 2 } (a reference to an anonymous hash) rather than (a => 1, b => 2) (a list that can be copied into a hash).
I can think of a few good reasons for this:
in Perl, an array or hash always contains scalars. So in something like { "a": { "b": 3 } }, the { "b": 3 } part has to be a scalar; and for consistency, it makes sense for the whole thing to be a scalar in the same way.
if the hash is quite large (many keys at top-level), it's pointless and expensive to iterate over all the elements to convert it into a list, and then build a new hash from that list.
in JSON, the top-level element can be either an object (= Perl hash) or an array (= Perl array). If json_decode returned a list in the former case, it's not clear what it would return in the latter case. After decoding the JSON string, how could you examine the result to know what to do with it? (And it wouldn't be safe to write %foo = json_decode(...) unless you already knew that you had a hash.) So json_decode's behavior works better for any general-purpose library code that has to use it without already knowing very much about the data it's working with.
I have to wonder exactly what you passed as an array to json_decode, because my results differ from yours.
#!/usr/bin/perl
use JSON qw (decode_json);
use Data::Dumper;
my $json = '["1", "2", "3", "4"]';
my $fromJSON = decode_json($json);
print Dumper($fromJSON);
The result is $VAR1 = [ '1', '2', '3', '4' ];
which is an array ref, whereas your result is a hash ref.
So did you pass in a hash with element items which was a reference to an array?
In my example you would get the array by doing
my @array = @{ $fromJSON };
In yours
my @array = @{ $content->{'items'} };

I don't understand why you dislike the arrow operator so much!
The decode_json function from the JSON module will always return a data reference.
Suppose you have a Perl program like this
use strict;
use warnings;
use JSON;
my $json_data = '{ "items": [ 1, 2, 3, 4 ] }';
my $content = decode_json($json_data);
use Data::Dump;
dd $content;
which outputs this text
{ items => [1 .. 4] }
showing that $content is a hash reference. Then you can access the array reference, as you found, with
dd $content->{items};
which shows
[1 .. 4]
and you can print the first element of the array by writing
print $content->{items}[0], "\n";
which, again as you have found, shows just
1
which is the first element of the array.
As @cjm mentions in a comment, it is imperative that you use strict and use warnings at the start of every Perl program. If you had those in place in the program where you tried to access $content{items}, your program would have failed to compile, and you would have seen the message
Global symbol "%content" requires explicit package name
which is a (poorly-phrased) way of telling you that there is no %content so there can be no items element.
The scalar variable $content is completely independent from the hash variable %content, which you are trying to access when you write $content{items}. %content has never been mentioned before and it is empty, so there is no items element. If you had tried @{$content->{items}} then it would have worked, as would @{${$content}{items}}.
If you really have a problem with the arrow operator, then you could write
print ${$content}{items}[0], "\n";
which produces the same output; but I don't understand what is wrong with the original version.

BASH - Parsing values from a single line database file (

I'm trying to parse a database file retrieved from a website via curl, but I'm having trouble figuring out how to get the values.
This is an example of the file
{"Databasename":[{"Var1":"Var1Value","Var2":"Var2Value","Var3":"Var3Value"},{"Var1b":"Var1bValue","Var2b":"Var2bValue","Var3b":"Var3bValue"}],"foldername":{"dbTblcountvar":"dbTblcountvalue","filecountsize":"filecountsizvalue"}}
and with line breaks for better readability:
{
"Databasename":
[
{
"Var1":"Var1aValue",
"Var2":"Var2aValue",
"Var3":"Var3aValue"
},
{
"Var1":"Var1bValue",
"Var2":"Var2bValue",
"Var3":"Var3bValue"
},
{
"Var1":"Var1cValue",
"Var2":"Var2cValue",
"Var3":"Var3cValue"
}
],
"foldername":
{
"dbTblcountvar":"dbTblcountvalue",
"filecountsize":"filecountsizvalue"
}
}
Assuming the key Var2 is always the same, how can I get its values (Var2aValue, Var2bValue, Var2cValue, Var2dValue, ...)?
In the example above, the value I'm trying to get is an id for a file that I need to send back to the server to download the file and perform other operations on it.
Thanks

cat DownloadedFile.Ext | perl -pe 's/"Var2[abc]?":"(.+?)(?<![\\])"/\n$1\n/g' | grep -vPe '(?<!\\)"'
These commands first put each Var2 value (the key may carry an optional a, b or c suffix) on its own line, then filter out every line that still contains an unescaped ".
Since this is presumably a JSON file, I avoid matching escaped quotes (\") with this part of the regex:
(?<!\\)
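If python3 is available, a JSON parser sidesteps the escaped-quote corner cases entirely. This is a minimal sketch, not part of the answer above: it assumes the key is exactly Var2 (as in the formatted example) and that the records sit under Databasename; var2_values is a hypothetical name.

```python
import json

RAW = """{"Databasename": [
  {"Var1": "Var1aValue", "Var2": "Var2aValue", "Var3": "Var3aValue"},
  {"Var1": "Var1bValue", "Var2": "Var2bValue", "Var3": "Var3bValue"},
  {"Var1": "Var1cValue", "Var2": "Var2cValue", "Var3": "Var3cValue"}],
 "foldername": {"dbTblcountvar": "dbTblcountvalue", "filecountsize": "filecountsizvalue"}}"""


def var2_values(raw):
    """Collect the Var2 value of every record under Databasename, in order."""
    data = json.loads(raw)  # the parser decodes \" inside strings for us
    return [record["Var2"] for record in data["Databasename"] if "Var2" in record]


for value in var2_values(RAW):
    print(value)  # one id per line, ready for a shell "while read" loop
```

In practice RAW would come from the downloaded file (or from sys.stdin when piped from curl) rather than from a literal.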
