I am trying to parse JSON that comes to me as an array of arrays (think a table of data). The issue is that this table may contain arrays or maps as elements, and these elements may be empty. Here is an example:
json <- '[[1,"foo",[],{}],[1,"bar",[1],{"foo":"bar"}]]'
# Result is a list of 2 where each sublist is of length 4
jsonlite::fromJSON(json)
# Result is a character vector of length 6
> unname(unlist(jsonlite::fromJSON(json)))
[1] "1" "foo" "1" "bar" "1" "bar"
So when I try to cast this to a 2 by 4 matrix, I get the wrong answer. I would like [] to map to the string "[]" and {} to "{}" so I don't lose elements. It is totally fine for me to return the nested array as "[1]" instead of parsing it as a list. It seems to me that I need to tell the JSON parser to stop recursing at a certain point and treat the elements as characters, but I can't figure out how to do this. I'm not tied to the jsonlite package, so basically anything is fair game as long as it is not slow.
You could recursively walk the parsed JSON to find the empty lists and replace them with the values you want. For example:
renameEmptyLists <- function(x) {
  if (is.list(x)) {
    if (length(x) == 0) {
      # an empty named list came from {}, an unnamed one from []
      return(if (!is.null(names(x))) "{}" else "[]")
    } else {
      return(lapply(x, renameEmptyLists))
    }
  } else {
    x
  }
}
jj <- jsonlite::fromJSON(json)
unname(unlist(renameEmptyLists(jj)))
# [1] "1" "foo" "[]" "{}" "1" "bar" "1" "bar"
And to be clear, you were "losing" them during the unlist(). If you look at the jj object in my example, you will see that the parser correctly identified the empty list and the empty named list.
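For comparison, the same idea can be sketched in Python with the standard json module (a swapped-in language, not part of the R answer): walk the parsed structure, replace empty arrays with "[]" and empty objects with "{}", then flatten the result the way unlist() does. The helper names rename_empty and flatten are illustrative only.

```python
import json

def rename_empty(x):
    """Recursively replace empty arrays with "[]" and empty objects with "{}"."""
    if isinstance(x, dict):
        return {k: rename_empty(v) for k, v in x.items()} if x else "{}"
    if isinstance(x, list):
        return [rename_empty(v) for v in x] if x else "[]"
    return x  # scalars are left untouched

def flatten(x):
    """Depth-first flatten of nested lists/dicts into a flat list of strings, like unlist()."""
    if isinstance(x, dict):
        return [s for v in x.values() for s in flatten(v)]
    if isinstance(x, list):
        return [s for v in x for s in flatten(v)]
    return [str(x)]

doc = '[[1,"foo",[],{}],[1,"bar",[1],{"foo":"bar"}]]'
cells = flatten(rename_empty(json.loads(doc)))
print(cells)
```

As with the R version, this only keeps the 2 by 4 shape because each non-empty nested value here contributes exactly one scalar. If a cell could hold several values, serializing each cell with json.dumps (so [1, 2] becomes "[1, 2]") would keep every row at four columns.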
Original Data
I have the following JSON:
{
"foo":[
"asd",
"fgh"
],
"bar":[
"abc",
"xyz",
"ert"
],
"baz":[
"something"
]
}
Now I want to transform it to a "flat" CSV, such that for every key in my object the list in the value is expanded to n rows with n being the number of entries in the respective list.
Expected Output
foo;asd
foo;fgh
bar;abc
bar;xyz
bar;ert
baz;something
Approaches
I guess I need to use to_entries and then for each .value repeat the same .key for the first column. The jq docs state that:
Thus "as" functions as something of a foreach loop.
So I tried combining:
to_entries to give the keys and values from my dictionary an accessible name
then build a kind of foreach loop around the .values
and pass the result to @csv
to_entries|map(.value) as $v|what goes here?|@csv
I prepared something that at least compiles here
You don't need the to_entries function; a simple key/value lookup and string interpolation should suffice:
keys_unsorted[] as $k | "\($k);\( .[$k][])"
The construct .[$k][] first looks up the value associated with each key (e.g. .foo) and then expands that array, so the string interpolation produces one result per array element, each paired with the key stored in the $k variable. Run it with jq -r so the lines are printed as raw text rather than as JSON-quoted strings.
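The jq program is effectively a nested loop: the outer iteration runs over the keys in their original order (keys_unsorted[]) and the inner one over the array stored under each key (.[$k][]). A rough Python equivalent, relying on json.loads and dicts preserving key order (Python 3.7+), makes the two loops explicit:

```python
import json

doc = """
{"foo": ["asd", "fgh"],
 "bar": ["abc", "xyz", "ert"],
 "baz": ["something"]}
"""

data = json.loads(doc)

# Outer loop ~ keys_unsorted[] as $k; inner loop ~ .[$k][]
lines = [f"{key};{value}" for key in data for value in data[key]]

print("\n".join(lines))
```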
I'm very new to jq and this post is a result of not understanding the mechanics behind jq.
I could develop a bash script which does what I want, but jq and its JSON super-powers have intrigued me, and I'd like to learn it by applying it to real-world scenarios. Here's one...
BTW, I've tried to make use of the existing jq related SO solutions for merging/joining JSONs but have failed.
The closest I came to what I needed was to use INDEX and a concatenation of $x + ., however I was only getting the LAST item from my second (c2) JSON.
So, my problem is as follows:
There are Two JSON files:
JSON #1 will have unique "id" and "type" keys - among other key/value pairs, which I've removed for better clarity of my post.
JSON #2 will contain multiples/non-unique "type" keys, which I'd like to match these two JSON files on. This JSON #2 will also contain other key/value pairs, which are expected to be contained in the resultant output.
My output requirements are:
I'd like to obtain a list (one per line or a single array) of all combinations of matching key/value pairs between the c1 and c2 arrays, where the value of the "type" key (a string) matches exactly between c1 and c2.
One more question, how much more difficult would it be to scale the solution to perform similar matching/joining between three JSON files at once - again on the same value of a particular key?
Any assistance or even just hints on how to solve and understand how to solve this would be greatly appreciated!
1st input file: JSON #1, Array c1 (collection 1)
{ "c1":
[
{ "c1id":1, "type":"alpha" },
{ "c1id":2, "type":"beta" }
]
}
2nd input file: JSON #2, Array c2 (collection 2)
{
"c2":
[
{ "c2id":1,"type":"alpha","serial":"DDBB001"} ,
{ "c2id":2,"type":"beta","serial":"DDBB007"} ,
{ "c2id":3,"type":"alpha","serial":"DDTT005"} ,
{ "c2id":4,"type":"beta","serial":"DDAA002"} ,
{ "c2id":5,"type":"yotta","serial":"DDCC017"}
]
}
Expected output:
{"c1id":1,"type":"alpha","c2id":1,"serial":"DDBB001"}
{"c1id":1,"type":"alpha","c2id":3,"serial":"DDTT005"}
{"c1id":2,"type":"beta","c2id":2,"serial":"DDBB007"}
{"c1id":2,"type":"beta","c2id":4,"serial":"DDAA002"}
You will notice that type "yotta" from the c2 is not included in the output. This is expected. Only "types" which exist in c1 and match c2 are expected to be in the results. I guess this is implied by this being a matching/joining exercise - I added it just for clarity - I hope it worked.
Here's an example of using INDEX and JOIN:
jq --compact-output --slurpfile c1 c1.json '
  INDEX(
    $c1[0].c1[];
    .type
  ) as $index |
  JOIN(
    $index;
    .c2[];
    .type;
    # skip c2 rows with no c1 match (e.g. "yotta"), then merge the pair
    select(.[1] != null) | reverse | add
  )
' c2.json
The first argument to INDEX needs to produce a stream of items, which is why we apply [] to get the items from the array individually. The second argument selects our index key.
We use the four argument version of JOIN. The first argument is the index itself, the second is a stream of objects to be joined to the index, the third argument selects the lookup key from the streamed objects, and the fourth argument is an expression to assemble the join object. The input to that expression is a stream of two-item arrays, each looking something like this:
[{"c2id":1,"type":"alpha","serial":"DDBB001"},{"c1id":1,"type":"alpha"}]
Since we just want to combine all the keys and values from the objects, we use add, but we first reverse the array so the c1 fields come before the c2 fields. The end result is as you hoped, with the rows in c2 order because JOIN streams over .c2[]:
{"c1id":1,"type":"alpha","c2id":1,"serial":"DDBB001"}
{"c1id":2,"type":"beta","c2id":2,"serial":"DDBB007"}
{"c1id":1,"type":"alpha","c2id":3,"serial":"DDTT005"}
{"c1id":2,"type":"beta","c2id":4,"serial":"DDAA002"}
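INDEX and JOIN together amount to a hash join: build a lookup table keyed on "type" from one collection, then stream the other collection through it. A Python sketch of the same idea, using the data from the question, may make the mechanics easier to follow; the dict comprehension plays the role of INDEX, the loop plays the role of JOIN, and rows whose type has no partner in c1 (such as "yotta") are skipped.

```python
import json

c1 = {"c1": [{"c1id": 1, "type": "alpha"},
             {"c1id": 2, "type": "beta"}]}
c2 = {"c2": [{"c2id": 1, "type": "alpha", "serial": "DDBB001"},
             {"c2id": 2, "type": "beta", "serial": "DDBB007"},
             {"c2id": 3, "type": "alpha", "serial": "DDTT005"},
             {"c2id": 4, "type": "beta", "serial": "DDAA002"},
             {"c2id": 5, "type": "yotta", "serial": "DDCC017"}]}

# Like INDEX(...; .type): map each (unique) type to its c1 object
index = {row["type"]: row for row in c1["c1"]}

# Like JOIN: stream the c2 rows and look each one up in the index
joined = []
for row in c2["c2"]:
    match = index.get(row["type"])
    if match is None:
        continue  # inner join: no c1 object has this type
    # c1 fields first, then c2 fields (the role of reverse|add)
    joined.append({**match, **row})

for obj in joined:
    print(json.dumps(obj, separators=(",", ":")))
```

Scaling this to a third file, as asked in the question, would mean building one index per additional collection and doing one lookup per index for each streamed row.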
Most examples deal with the book store example from Stefan Gössner, however I'm struggling to define the correct JsonPath expression for a simple object (no array):
{ "Id": 1, "Name": "Test" }
To check if this json contains Id = 1.
I tried the following expression: $..?[(@.Id == 1]), but this does not find any matches using Json.NET.
I also tried Manatee.Json for parsing, and there it seems the JSONPath expression could be $[?($.Id == 1)]?
The path that you posted is not valid. I think you meant $..[?(@.Id == 1)] (some characters were out of order). My answer assumes this.
The JSON Path that you're using indicates that the item you're looking for should be in an array.
$ start
.. recursive search (1)
[ array item specification
?( item-based query
@.Id == 1 where the item is an object whose top-level "Id" property has the value 1
) end item-based query
] end array item specification
(1) the conditions following this could match a value no matter how deep in the hierarchy it exists
You want to just navigate the object directly. Using $.Id will return 1, which you can validate in your application.
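To see why the recursive filter finds nothing in a bare object, here is a rough Python model of the reading described above, in which [?(...)] only inspects array items. It is an illustration only: find_items_with_id is a hypothetical helper, and real JSONPath implementations may differ in such details.

```python
import json

def find_items_with_id(node, target):
    """Rough analogue of $..[?(@.Id == target)]: recurse through the document and
    collect array items that are objects whose "Id" equals target."""
    found = []
    if isinstance(node, list):
        for item in node:
            if isinstance(item, dict) and item.get("Id") == target:
                found.append(item)
            found.extend(find_items_with_id(item, target))
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(find_items_with_id(value, target))
    return found

doc = json.loads('{ "Id": 1, "Name": "Test" }')

print(find_items_with_id(doc, 1))    # no array anywhere, so nothing matches
print(find_items_with_id([doc], 1))  # wrapped in an array, the object matches
print(doc["Id"] == 1)                # direct navigation, like $.Id
```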
All of that said...
It sounds to me like you want to validate that the Id property is 1 rather than to search an array for an object where the Id property is 1. To do this, you want JSON Schema, not JSON Path.
JSON Path is a query language for searching for values which meet certain conditions (e.g. an object where Id == 1).
JSON Schema is for validating that the JSON meets certain requirements (your data is in the right shape). A JSON Schema to validate that your object has an Id of 1 could be something like:
{
"properties": {
"Id": {"const":1}
}
}
Granted, this isn't very useful, because it only validates that the Id property is 1, which ideally should only be true for one object.
I'm parsing a JSON string that is stored in a database.
{"name":"simon", "age":"23", "height":"tall"}
I'm pulling the data, then decoding. When running the code below, I'm receiving weird 'HASH' values back.
use JSON;
$data = decode_json($row->{'address'});
for my $key (keys %$data){
if($data->{$key} ne ''){
$XML .= " <$key>$data->{$key}</$key>";
}
}
This returns data like so:
<company_type>HASH(0x27dbac0)</company_type>
<county>HASH(0x27db7c0)</county>
<address1>HASH(0x27dba90)</address1>
<company_name>HASH(0x27db808)</company_name>
The error happens when I have a data set like so:
{"name":"", "age":{}, "height":{}}
I don't understand why JSON / Arrays / Hashes have to be so difficult to work with in Perl. What point am I missing?
You are processing a flat hash, while your data in fact has another, nested, hashref. In the line
{ "name":"", "age":{}, "height":{} }
the {} may be intended to mean "nothing" but are in fact JSON "object", the next level of nested data (which are indeed empty). In Perl we get a hashref for it and that's what your code prints.
The other pillar of JSON is an "array", and in Perl we get an arrayref. And that's that: decode_json gives us back the top-level hashref, which when dereferenced into a hash may contain further hash or array references as values. If you print the whole structure with Data::Dumper you'll see that.
To negotiate this we have to test each time for a reference. Since a dereferenced hash or array may contain yet further levels (more references), we need to use either a recursive routine (see this post for an example) or a module for complex data structures. But for the first level:
use feature 'say';  # say() needs this (or use 5.010;)
for my $key (keys %$data)
{
    next if $data->{$key} eq '';
    my $ref_type = ref $data->{$key};
    # if $data->{$key} is not a reference, ref() returns an empty string (false)
    if (not $ref_type) {
        $XML .= " <$key>$data->{$key}</$key>";
    }
    elsif ($ref_type eq 'HASH') {
        # hashref, unpack and parse. it may contain references
        say "$_ => $data->{$key}{$_}" for keys %{ $data->{$key} };
    }
    elsif ($ref_type eq 'ARRAY') {
        # arrayref, unpack and parse. it may contain references
        say "@{$data->{$key}}";
    }
    else { say "Reference is to type: $ref_type" }
}
If the argument of ref is not a reference (but a string or a number) ref returns an empty string, which evaluates as false, which is when you have plain data. Otherwise it returns the type the reference is to. Coming from JSON it can be either a HASH or an ARRAY. This is how nesting is accomplished.
In the shown example you are running into hashrefs. Since the ones you show are empty, you can just discard them, and the code for this specific example could be reduced to a single statement. However, I'd leave the other tests in place. The code above should also work as it stands with the posted example.
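The core idea of this answer, asking at every level what kind of value you have and recursing into containers, is not Perl-specific. As an illustration only, here is a Python sketch of the recursive routine mentioned above, with isinstance standing in for Perl's ref. The function names, repeating the key for each array item, and skipping empty top-level values are assumptions modelled on the question's output, not part of the answer.

```python
import json

def to_xml(key, value):
    """Render one decoded JSON value as XML-like text, recursing into containers."""
    if isinstance(value, dict):       # Perl: ref($v) eq 'HASH'
        inner = "".join(to_xml(k, v) for k, v in value.items())
        return f"<{key}>{inner}</{key}>"
    if isinstance(value, list):       # Perl: ref($v) eq 'ARRAY'
        # one element per array item, all sharing the parent key
        return "".join(to_xml(key, item) for item in value)
    return f"<{key}>{value}</{key}>"  # plain scalar: ref($v) returns ''

def row_to_xml(data):
    """Top level: skip empty strings and empty objects/arrays, as in the question."""
    return "".join(
        to_xml(k, v) for k, v in data.items() if v not in ("", {}, [])
    )

empty_row = json.loads('{"name":"", "age":{}, "height":{}}')
# hypothetical nested record, for illustration only
nested = json.loads('{"name":"simon", "address":{"town":"Leeds"}, "tags":["a","b"]}')

print(repr(row_to_xml(empty_row)))  # '' because every value was empty
print(row_to_xml(nested))
```

A production version would also need to escape <, > and & in keys and values, which this sketch does not do.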
I am confused about accessing the contents of some JSON data that I have decoded. Here is an example:
I don't understand why this solution works and my own does not. My questions are rephrased below
use JSON;
use Data::Dumper;
my $json_raw = getJSON();
my $content = decode_json($json_raw);
print Dumper($content);
At this point my JSON data has been transformed into this
$VAR1 = { 'items' => [ 1, 2, 3, 4 ] };
My guess tells me that, once decoded, the object will be a hash with one element that has the key items and an array reference as the value.
$content{'items'}[0]
where $content{'items'} would obtain the array reference, and the outer $...[0] would access the first element in the array and interpret it as a scalar. However this does not work. I get an error message use of uninitialized value [...]
However, the following does work:
$content->{items}[0]
where $content->{items} yields the array reference and [0] accesses the first element of that array.
Questions
Why does $content{'items'} not return an array reference? I even tried @{content{'items'}}, thinking that, once I got the value from content{'items'}, it would need to be interpreted as an array. But I still get the same uninitialized value error.
How can I access the array reference without using the arrow operator?
A beginner's answer to a beginner :) Surely not as professional as it should be, but maybe it helps you.
use strict; #use this all times
use warnings; #this too - helps a lot!
use JSON;
my $json_str = ' { "items" : [ 1, 2, 3, 4 ] } ';
my $content = decode_json($json_str);
You wrote:
My guess tells me that, once decoded, the object will be a hash with
one element that has the key items and an array reference as the value.
Yes, it is a hash, but decode_json returns a reference, in this case a reference to a hash. From the docs:
expects an UTF-8 (binary) string and tries to parse that
as an UTF-8 encoded JSON text,
returning the resulting reference.
In the line
my $content = decode_json($json_str);
you are assigning to a SCALAR variable (not to a hash).
Because you know it is a reference, you can do the following:
printf "reftype:%s\n", ref($content);
#prints: reftype:HASH
#therefore $content is a SCALAR value containing a reference to a hash
It is a hashref, so you can dump all the keys:
print "key: $_\n" for keys %{$content}; #or in short %$content
#prints: key: items
You can also assign the value of "items" (an arrayref) to a scalar variable:
my $aref = $content->{items}; #$hashref->{key}
#or
#my $aref = ${$content}{items}; #$hash{key}
but NOT
#my $aref = $content{items}; #throws error if "use strict;"
#Global symbol "%content" requires explicit package name at script.pl line 20.
$content{items} requests a value from the hash %content, and you never defined or assigned such a variable. $content is a scalar variable, not the hash variable %content.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "key-postderef: $_\n" for keys $content->%*;
}
Now step deeper, to the arrayref. Again, you can print out the reference type:
printf "reftype:%s\n", ref($aref);
#reftype:ARRAY
Print all elements of the array:
print "arr-item: $_\n" for @{$aref};
but again NOT:
#print "$_\n" for @aref;
#dies: Global symbol "@aref" requires explicit package name at script.pl line 37.
{
#in perl 5.20 you can also
use 5.020;
use experimental 'postderef';
print "aref-postderef: $_\n" for $aref->@*;
}
Here is a simple rule:
my @arr; #array variable
my $arr_ref = \@arr; #scalar - containing a reference to @arr
@{$arr_ref} is the same as @arr
 ^^^^^^^^^^ - array reference in curly brackets
If you have an $arr_ref, use @{$arr_ref} everywhere you want to use the array.
my %hash; #hash variable
my $hash_ref = \%hash; #scalar - containing a reference to %hash
%{$hash_ref} is the same as %hash
 ^^^^^^^^^^^ - hash reference in curly brackets
If you have a $hash_ref, use %{$hash_ref} everywhere you want to use the hash.
For the whole structure, the following
say $content->{items}->[0];
say $content->{items}[0];
say ${$content}{items}->[0];
say ${$content}{items}[0];
say ${$content->{items}}[0];
say ${${$content}{items}}[0];
prints the same value 1.
$content is a hash reference, so you always need to use an arrow to access its contents. $content{items} would refer to a %content hash, which you don't have. That's where you're getting that "use of uninitialized value" error from.
I actually asked a similar question here
The answer:
In Perl, a function can only really return a scalar or a list.
Since hashes can be initialized or assigned from lists (e.g. %foo = (a => 1, b => 2)), I guess you're asking why decode_json returns something like { a => 1, b => 2 } (a reference to an anonymous hash) rather than (a => 1, b => 2) (a list that can be copied into a hash).
I can think of a few good reasons for this:
in Perl, an array or hash always contains scalars. So in something like { "a": { "b": 3 } }, the { "b": 3 } part has to be a scalar; and for consistency, it makes sense for the whole thing to be a scalar in the same way.
if the hash is quite large (many keys at top-level), it's pointless and expensive to iterate over all the elements to convert it into a list, and then build a new hash from that list.
in JSON, the top-level element can be either an object (= Perl hash) or an array (= Perl array). If decode_json returned a list in the former case, it's not clear what it would return in the latter case. After decoding the JSON string, how could you examine the result to know what to do with it? (And it wouldn't be safe to write %foo = decode_json(...) unless you already knew that you had a hash.) So decode_json's behavior works better for any general-purpose library code that has to use it without already knowing very much about the data it's working with.
I have to wonder exactly what you passed as an array to decode_json, because my results differ from yours.
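Python's json.loads makes the same design choice, which may help illustrate the last point: it always returns a single value, and because the top level may be an object or an array (or even a bare scalar), the caller inspects what came back. A minimal sketch; describe is a hypothetical helper:

```python
import json

def describe(text):
    """Decode JSON and report what kind of top-level value came back."""
    value = json.loads(text)
    if isinstance(value, dict):   # JSON object (Perl: hash reference)
        return f"object with keys {sorted(value)}"
    if isinstance(value, list):   # JSON array (Perl: array reference)
        return f"array of {len(value)} items"
    return f"scalar {value!r}"    # json.loads also accepts a bare scalar

print(describe('{ "items": [ 1, 2, 3, 4 ] }'))
print(describe('["1", "2", "3", "4"]'))
```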
#!/usr/bin/perl
use JSON qw (decode_json);
use Data::Dumper;
my $json = '["1", "2", "3", "4"]';
my $fromJSON = decode_json($json);
print Dumper($fromJSON);
The result is $VAR1 = [ '1', '2', '3', '4' ];
That is an array ref, whereas your result is a hash ref.
So did you pass in a hash with element items which was a reference to an array?
In my example you would get the array by doing:
my @array = @{ $fromJSON };
In yours:
my @array = @{ $content->{'items'} };
I don't understand why you dislike the arrow operator so much!
The decode_json function from the JSON module will always return a data reference.
Suppose you have a Perl program like this
use strict;
use warnings;
use JSON;
my $json_data = '{ "items": [ 1, 2, 3, 4 ] }';
my $content = decode_json($json_data);
use Data::Dump;
dd $content;
which outputs this text
{ items => [1 .. 4] }
showing that $content is a hash reference. Then you can access the array reference, as you found, with
dd $content->{items};
which shows
[1 .. 4]
and you can print the first element of the array by writing
print $content->{items}[0], "\n";
which, again as you have found, shows just
1
which is the first element of the array.
As @cjm mentions in a comment, it is imperative that you use strict and use warnings at the start of every Perl program. If you had those in place in the program where you tried to access $content{items}, your program would have failed to compile, and you would have seen the message
Global symbol "%content" requires explicit package name
which is a (poorly-phrased) way of telling you that there is no %content so there can be no items element.
The scalar variable $content is completely independent from the hash variable %content, which you are trying to access when you write $content{items}. %content has never been mentioned before and it is empty, so there is no items element. If you had tried @{$content->{items}} then it would have worked, as would @{${$content}{items}}.
If you really have a problem with the arrow operator, then you could write
print ${$content}{items}[0], "\n";
which produces the same output; but I don't understand what is wrong with the original version.