How can I stream JSON from a file?

I will have a possibly very large JSON file and I want to stream from it instead of load it all into memory. Based on the following statement (I added the emphasis) from JSON::XS, I believe it won't suit my needs. Is there a Perl 5 JSON module that will stream the results from the disk?
In some cases, there is the need for incremental parsing of JSON texts. While this module always has to keep both JSON text and resulting Perl data structure in memory at one time, it does allow you to parse a JSON stream incrementally. It does so by accumulating text until it has a full JSON object, which it then can decode. This process is similar to using decode_prefix to see if a full JSON object is available, but is much more efficient (and can be implemented with a minimum of method calls).
To clarify, the JSON will contain an array of objects. I want to read one object at a time from the file.

In terms of ease of use and speed, JSON::SL seems to be the winner:
#!/usr/bin/perl
use strict;
use warnings;
use JSON::SL;

my $p = JSON::SL->new;

#look for everything past the first level (i.e. everything in the array)
$p->set_jsonpointer(["/^"]);

local $/ = \5; #read only 5 bytes at a time

while (my $buf = <DATA>) {
    $p->feed($buf); #parse what you can
    #fetch anything that completed the parse and matches the JSON Pointer
    while (my $obj = $p->fetch) {
        print "$obj->{Value}{n}: $obj->{Value}{s}\n";
    }
}

__DATA__
[
    { "n": 0, "s": "zero" },
    { "n": 1, "s": "one" },
    { "n": 2, "s": "two" }
]
JSON::Streaming::Reader was okay, but it is slower and suffers from too verbose an interface (all of these coderefs are required even though many do nothing):
#!/usr/bin/perl
use strict;
use warnings;
use JSON::Streaming::Reader;

my $p = JSON::Streaming::Reader->for_stream(\*DATA);

my $obj;
my $attr;
$p->process_tokens(
    start_array    => sub {}, #who cares?
    end_array      => sub {}, #who cares?
    end_property   => sub {}, #who cares?
    start_object   => sub { $obj = {}; },     #clear the current object
    start_property => sub { $attr = shift; }, #get the name of the attribute
    #add the value of the attribute to the object
    add_string     => sub { $obj->{$attr} = shift; },
    add_number     => sub { $obj->{$attr} = shift; },
    #object has finished parsing, it can be used now
    end_object     => sub { print "$obj->{n}: $obj->{s}\n"; },
);

__DATA__
[
    { "n": 0, "s": "zero" },
    { "n": 1, "s": "one" },
    { "n": 2, "s": "two" }
]
To parse 1,000 records, JSON::SL took 0.2 seconds and JSON::Streaming::Reader took 3.6 seconds (note: JSON::SL was being fed 4k at a time, and I had no control over JSON::Streaming::Reader's buffer size).

Have you looked at JSON::Streaming::Reader? It shows up first when searching for 'JSON Stream' on search.cpan.org.
Alternatively, there is JSON::SL, found by searching for 'JSON SAX' - not quite as obvious a search term, but what you describe sounds like a SAX parser for XML.

It does so by accumulating text until it has a full JSON object, which it then can decode.
This is what screws you over. A JSON document is one object.
You need to define more clearly what you want from incremental parsing. Are you looking for one element of a large mapping? What are you trying to do with the information you read out/write?
I don't know of any library that will incrementally parse JSON data by reading one array element at a time. However, this is quite simple to implement yourself with a finite state automaton (basically, your file has the format \s*\[\s*([^,]+,)*([^,]+)?\s*\]\s*, except that you need to parse commas inside strings correctly).

Did you try to skip the opening bracket [ first and then the commas ,:
$json->incr_text =~ s/^ \s* \[ //x;
...
$json->incr_text =~ s/^ \s* , //x;
...
$json->incr_text =~ s/^ \s* \] //x;
as in the third example:
http://search.cpan.org/dist/JSON-XS/XS.pm#EXAMPLES
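For context, here is how those three substitutions fit together in a complete, minimal sketch modeled on that JSON::XS documentation example. The record shape matches the sample data used in the answers above; treat it as an illustration under those assumptions, not production code:

#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
local $/ = \64;    # read the stream in small fixed-size chunks

# 1. accumulate text until the opening "[" can be stripped off
while (<DATA>) {
    $json->incr_parse($_);                        # void context: buffer only
    last if $json->incr_text =~ s/^ \s* \[ //x;
}

# 2. repeatedly parse one element, then consume the "," (or stop at "]")
OUTER: while (1) {
    my $obj;
    until ( defined( $obj = $json->incr_parse ) ) {   # scalar context: one element or undef
        defined( my $buf = <DATA> ) or last OUTER;    # ran out of data before "]"
        $json->incr_parse($buf);
    }
    print "$obj->{n}: $obj->{s}\n";

    # consume the separator, refilling the buffer as needed
    while (1) {
        $json->incr_text =~ s/^\s+//;
        last OUTER if $json->incr_text =~ s/^\]//;    # end of the array
        last       if $json->incr_text =~ s/^,//;     # another element follows
        defined( my $buf = <DATA> ) or last OUTER;
        $json->incr_parse($buf);
    }
}

__DATA__
[
    { "n": 0, "s": "zero" },
    { "n": 1, "s": "one"  },
    { "n": 2, "s": "two"  }
]

Only one element is ever held in memory at a time; the surrounding array punctuation is handled with the incr_text substitutions shown above.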

If you have control over how you're generating your JSON, then I suggest turning pretty formatting off and printing one object per line. This makes parsing simple, like so:
use Data::Dumper;
use JSON::Parse 'json_to_perl';
use JSON;
use JSON::SL;
my $json_sl = JSON::SL->new();
use JSON::XS;
my $json_xs = JSON::XS->new();
$json_xs = $json_xs->pretty(0);
#$json_xs = $json_xs->utf8(1);
#$json_xs = $json_xs->ascii(0);
#$json_xs = $json_xs->allow_unknown(1);

my ($file) = @ARGV;

unless( defined $file && -f $file )
{
    print STDERR "usage: $0 FILE\n";
    exit 1;
}

my @cmd = ( qw( CMD ARGS ), $file );
open my $JSON, '-|', @cmd or die "Failed to exec @cmd: $!";

# local $/ = \4096; #read 4k at a time

while( my $line = <$JSON> )
{
    if( my $obj = json($line) )
    {
        print Dumper($obj);
    }
    else
    {
        die "error: failed to parse line - $line";
    }
    exit if( $. == 5 );
}

exit 0;

sub json
{
    my ($data) = @_;
    return decode_json($data);
}

sub json_parse
{
    my ($data) = @_;
    return json_to_perl($data);
}

sub json_xs
{
    my ($data) = @_;
    return $json_xs->decode($data);
}

sub json_xs_incremental
{
    my ($data) = @_;
    my $result = [];
    $json_xs->incr_parse($data); # void context, so no parsing
    push( @$result, $_ ) for( $json_xs->incr_parse );
    return $result;
}

sub json_sl_incremental
{
    my ($data) = @_;
    my $result = [];
    $json_sl->feed($data);
    push( @$result, $_ ) for( $json_sl->fetch );
    # ? error: JSON::SL - Got error CANT_INSERT at position 552 at json_to_perl.pl line 82, <$JSON> line 2.
    return $result;
}
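For the generating side of that suggestion, a minimal sketch of writing one compact object per line (the output file name and the records are made up for illustration; canonical() is optional and only gives stable key order):

use strict;
use warnings;
use JSON::XS;

# Writer-side sketch: one non-pretty JSON object per line, so a reader
# like the one above can decode_json() each line independently.
my $json_xs = JSON::XS->new->pretty(0)->canonical(1);

open my $out, '>', 'objects.jsonl' or die "Can't write objects.jsonl: $!";
for my $n (0 .. 2) {
    print {$out} $json_xs->encode( { n => $n, s => "record $n" } ), "\n";
}
close $out or die "Error closing objects.jsonl: $!";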

Related

Getting individual key values from Perl JSON one at a time

I'm pulling JSON data in which the key names repeat, and I'm trying to pull the values from those keys one at a time and pass them to variables (in a loop) in a Perl script, but it pulls all of the values at once instead of iterating through them. I'd like to pull a value from a key, pass it to a variable, then iterate through the loop again for the next value. The amount of data in the JSON changes, so the number of identical keys will grow.
Perl Script Snippet
#!/usr/bin/perl
use warnings;
use strict;
use JSON::XS;

my $res = "test.json";
my $txt = do {
    local $/;
    open my $fh, "<", $res or die $!;
    <$fh>;
};

my $json = decode_json($txt);

for my $mdata (@{ $json->{results} }) {
    my $sitedomain = "$mdata->{custom_fields}->{Domain}";
    my $routerip   = "$mdata->{custom_fields}->{RouterIP}";

    #vars
    my $domain = $sitedomain;
    my $host   = $routerip;
    print $domain;
    print $host;
}
Printing the $host variable:
print $host;
192.168.201.1192.168.202.1192.168.203.1
Printing the $domain variable:
print $domain;
site1.global.localsite2.global.localsite3.global.local
JSON (test.json)
{
    "results": [
        {
            "id": 37,
            "url": "http://global.local/api/dcim/sites/37/",
            "display": "Site 1",
            "name": "Site 1",
            "slug": "site1",
            "custom_fields": {
                "Domain": "site1.global.local",
                "RouterIP": "192.168.201.1"
            }
        },
        {
            "id": 38,
            "url": "http://global.local/api/dcim/sites/38/",
            "display": "Site 2",
            "name": "Site 2",
            "slug": "site2",
            "custom_fields": {
                "Domain": "site2.global.local",
                "RouterIP": "192.168.202.1"
            }
        },
        {
            "id": 39,
            "url": "http://global.local/api/dcim/sites/39/",
            "display": "Site 3",
            "name": "Site 3",
            "slug": "site3",
            "custom_fields": {
                "Domain": "site3.global.local",
                "RouterIP": "192.168.203.1"
            }
        }
    ]
}
Your code produces the expected result if you add \n to the print statements. You can use say instead of print if no special formatting is required.
use warnings;
use strict;
use feature 'say';
use JSON::XS;

my $res = "test.json";
my $txt = do {
    local $/;
    open my $fh, "<", $res or die $!;
    <$fh>;
};

my $json = decode_json($txt);

for my $mdata (@{ $json->{results} }) {
    my $sitedomain = "$mdata->{custom_fields}->{Domain}";
    my $routerip   = "$mdata->{custom_fields}->{RouterIP}";

    #vars
    my $domain = $sitedomain;
    my $host   = $routerip;
    say "$domain $host";
}
The code can be rewritten in shorter form as follows:
use strict;
use warnings;
use feature 'say';
use JSON;

my $fname = 'router_test.json';
my $txt = do {
    local $/;
    open my $fh, "<", $fname or die $!;
    <$fh>;
};

my $json = from_json($txt);

say "$_->{custom_fields}{Domain} $_->{custom_fields}{RouterIP}" for @{ $json->{results} };
It sounds like you want to "slice" the data. You could buffer in code, or collect unique values later. Let's modify what you started with, and make some tweaks:
n.b. No need to quote my $sitedomain = "$mdata->{custom_fields}->{Domain}";. The content of the JSON is already a string, and forcing Perl to make another string by interpolating it is unnecessary.
n.b.2 JSON::XS works automatically if it's installed.
my %domains;
my %ips;

for my $mdata (@{ $json->{results} }) {
    my $sitedomain = $mdata->{custom_fields}->{Domain};
    my $routerip   = $mdata->{custom_fields}->{RouterIP};

    # Collect and count all the unique domains and IPs by storing them as hash keys
    $domains{$sitedomain} += 1;
    $ips{$routerip}       += 1;
}

for my $key (keys %domains) {
    printf "%s %s\n", $key, $domains{$key};
    # and so on
}
If we don't know the custom fields, we can play with nested hashes to collect it all:
my %fields;
for my $mdata (@{ $json->{results} }) {
    for my $custom_field (keys %{ $mdata->{custom_fields} }) {
        $fields{$custom_field}{$mdata->{custom_fields}{$custom_field}} += 1;
    }
}

for my $custom_field (keys %fields) {
    print "$custom_field:\n";
    for my $unique_value (keys %{ $fields{$custom_field} }) {
        printf "%s - %s\n", $unique_value, $fields{$custom_field}{$unique_value};
    }
}
Example output:
RouterIP:
192.168.201.1 - 1
192.168.203.1 - 1
192.168.202.1 - 1
Domain:
site2.global.local - 1
site1.global.local - 1
site3.global.local - 1
... or something like that. Nested structures lead very quickly to messy code. You can mitigate it by dereferencing the substructures. It could also be more predictable if we work with a known list of keys e.g.
my @known_keys = qw/RouterIP Domain/;
for my $mdata (@{ $json->{results} }) {
    for my $custom_field (@known_keys) {
        if (exists $fields{$custom_field}) {
            $fields{$custom_field}{$mdata->{custom_fields}{$custom_field}} += 1;
        }
    }
}
If the JSON file is massive you may run out of memory. For this you would need to look into a package like JSON::SL or JSON::Streaming::Reader. They're more involved to use but prevent you from needing to load the whole file into memory. There are also unix tools like jq that provide the same powers.
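To make that concrete, a minimal JSON::SL sketch against the test.json structure shown above (the JSON pointer and file name follow that sample data; this is an illustration, not a drop-in replacement for the code in the answer):

use strict;
use warnings;
use JSON::SL;

# Stream each element of the "results" array without slurping the file.
my $p = JSON::SL->new;
$p->set_jsonpointer(['/results/^']);   # "^" matches every element under results

open my $fh, '<', 'test.json' or die "Can't open test.json: $!";
local $/ = \4096;                      # feed the parser 4k at a time
while (my $buf = <$fh>) {
    $p->feed($buf);
    while (my $match = $p->fetch) {
        my $cf = $match->{Value}{custom_fields};
        print "$cf->{Domain} $cf->{RouterIP}\n";
    }
}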

looping through json in perl

I'm trying to grab some information out of a json export from Ping. My rusty Perl skills are failing me as I'm getting lost in the weeds with the dereferencing. Rather than bang my head against the wall some more I thought I'd post a question since all the google searches are leading here.
My understanding is that decode_json converts items into an array of hashes and each hash has strings and some other arrays of hashes as contents. This seems to bear out when attempting to get to an individual string value but only if I manually specify a specific array element. I can't figure out how to loop through the items.
The JSON comes back like this:
{
    "items": [
        {
            #lots of values here; these are some examples
            "type": "SP",
            "contactInfo": {
                "company": "Acme",
                "email": "john.doe@acme.com"
            }
        }
    ]
}
I had no problems getting to actual values
#!/usr/bin/perl
use JSON;
use Data::Dumper;
use strict;
use warnings;
use LWP::Simple;

my $json;
{
    local $/; #Enable 'slurp' mode
    open my $fh, "<", "idp.json";
    $json = <$fh>;
    close $fh;
}

my $data = decode_json($json);

#array: print $data->{'items'};
#hash:  print $data->{'items'}->[0];
#print $data->{'items'}->[0]->{'type'};
But, I can't figure out how to iterate through the array of items. I've tried for and foreach and various combinations of dereferencing, and it keeps telling me that the value I'm looping through is still an array. If $data->{'items'} is an array, then presumably I should be able to do some variation of
foreach my $item ($data->{'items'})

or

my @items = $data->{'items'};
for (@items)
{
    # stuff
}
But, I keep getting arrays back and I have to add in the ->[0] to get to a specific value.
$data->{'items'} is a reference to an array (of hash references). You need to dereference it, with @{ }:
use JSON;
use strict;
use warnings;

my $json;
{
    local $/; #Enable 'slurp' mode
    $json = <DATA>;
}

my $data = decode_json($json);

for my $item (@{ $data->{items} }) {
    print "$item->{type}\n";
}
__DATA__
{
    "items": [
        {
            "type": "SP",
            "contactInfo": {
                "company": "Acme",
                "email": "john.doe@acme.com"
            }
        }
    ]
}
Output:
SP
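If you also need the nested contactInfo values, the same dereferencing applies one level further down. A small sketch against the sample data (field names taken from the question's JSON):

for my $item (@{ $data->{items} }) {
    my $contact = $item->{contactInfo};   # hash reference
    print "$item->{type}: $contact->{company} <$contact->{email}>\n";
}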

DBM::Deep is failing to import hashref having 'true' or 'false' values

I have the JSON text as given below :
test.json
{
    "a" : false
}
I want to create the DBM::Deep hash for the above JSON. My code looks like this:
dbm.pl
use strict;
use warnings;
use DBM::Deep;
use JSON;
use Data::Dumper;

# create the DBM::Deep object
my $db = DBM::Deep->new(
    file => 'test.db',
    type => DBM::Deep->TYPE_HASH
);

my $json_text = do {
    open( my $json_fh, $path )
        or die("Can't open \"$path\": $!\n");
    local $/;
    <$json_fh>;
};

my $json = JSON->new;
my $data = $json->decode($json_text);
print Dumper($data);

# create the DBM::Deep hash
eval { $db->{$path} = $data; };
if ($@) {
    print "error : $@\n";
}
I get the following output/error when I run this code:
Error
$VAR1 = {
    'a' => bless( do{(my $o = 0)}, 'JSON::XS::Boolean' )
};
error : DBM::Deep: Storage of references of type 'SCALAR' is not supported. at dbm.pl line 26
It seems that JSON internally uses JSON::XS, which converts the 'true'/'false' values into JSON::XS::Boolean objects, and DBM::Deep is not able to handle these, while it can handle null values.
The above code works fine for the following inputs:
{
    "a" : "true"   # if true is in quotes
}
or
{
    "a" : null
}
I tried many things, but nothing worked. Does anyone have a workaround?
The JSON parser you are using, among others, returns an object that works as a boolean when it encounters true or false in the JSON. This allows the data to be re-encoded into JSON without change, but it can cause this kind of issue.
null doesn't have this problem because Perl has a native value (undef) that can be used to represent it unambiguously.
The following converts these objects into simple values:
sub convert_json_bools {
    local *_convert_json_bools = sub {
        my $ref_type = ref($_[0])
            or return;
        if ($ref_type eq 'HASH') {
            _convert_json_bools($_) for values(%{ $_[0] });
        }
        elsif ($ref_type eq 'ARRAY') {
            _convert_json_bools($_) for @{ $_[0] };
        }
        elsif ($ref_type =~ /::Boolean\z/) {
            $_[0] = $_[0] ? 1 : 0;
        }
        else {
            warn("Unsupported type $ref_type\n");
        }
    };
    &_convert_json_bools;
}

convert_json_bools($data);
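Putting it together with the code from the question, the flow would look roughly like this (a sketch reusing the question's $json_text, $db and $path variables):

my $data = JSON->new->decode($json_text);
convert_json_bools($data);   # turn JSON::XS::Boolean objects into plain 0/1
$db->{$path} = $data;        # DBM::Deep no longer sees unsupported SCALAR refs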
Your code works fine for me, with the only change being to set
my $path = 'test.json';
You should check your module version numbers. These are the ones that I have
print $DBM::Deep::VERSION, "\n"; # 2.0013
print $JSON::VERSION, "\n"; # 2.90
print $JSON::XS::VERSION, "\n"; # 3.02
and I am running Perl v5.24.0
The dumped output is as follows
Newly-created DBM::Deep database
$VAR1 = bless( {}, 'DBM::Deep::Hash' );
output of $json->decode
$VAR1 = {
    'a' => undef
};
Populated DBM::Deep database after the eval
$VAR1 = bless( {
    'test.json' => bless( {
        'a' => undef
    }, 'DBM::Deep::Hash' )
}, 'DBM::Deep::Hash' );
All of that looks to be as it should

Perl XML2JSON : How to preserve XML element order?

I have a configuration file which is in XML format. I need to parse the XML and convert it to JSON. I'm able to convert it with the Perl XML::XML2JSON module, but the problem is that it does not maintain the order of the XML elements. I strictly need the elements in order, otherwise I cannot configure things correctly.
My XML file is something like this. I have to configure an IP address and set that IP as a gateway to certain route.
<Config>
    <ip>
        <address>1.1.1.1</address>
        <netmask>255.255.255.0</netmask>
    </ip>
    <route>
        <network>20.20.20.0</network>
        <netmask>55.255.255.0</netmask>
        <gateway>1.1.1.1</gateway>
    </route>
</Config>
This is my Perl code to convert it to JSON:
use Data::Dumper;
use XML::XML2JSON;

my $file = 'config.xml';
open my $fh, '<', $file or die;
$/ = undef;
my $data = <$fh>;

my $XML      = $data;
my $XML2JSON = XML::XML2JSON->new();
my $Obj      = $XML2JSON->xml2obj($XML);
print Dumper($Obj);
The output I'm getting is,
$VAR1 = {
    'Config' => {
        'route' => {
            'netmask' => { '$t' => '55.255.255.0' },
            'gateway' => { '$t' => '1.1.1.1' },
            'network' => { '$t' => '20.20.20.0' }
        },
        'ip' => {
            'netmask' => { '$t' => '255.255.255.0' },
            'address' => { '$t' => '1.1.1.1' }
        }
    },
    '@encoding' => 'UTF-8',
    '@version' => '1.0'
};
I have a script which reads the JSON object and performs the configuration.
But it fails, because it first tries to set the gateway IP address on a route while that IP address is not yet configured, and only then adds the IP address.
I strictly want the ip key to come first and then route, for proper configuration without errors. I have many dependencies like this where the order of keys is a must.
Is there any way I can tackle this problem? I tried almost all the XML parsing modules, like XML::Simple, XML::Twig and XML::Parser, but nothing helped.
Here's a program that I hacked together that uses XML::Parser to parse some XML data and generate the equivalent JSON in the same order. It ignores any attributes, processing instructions etc., and requires that every XML element contain either a list of child elements or a text node. Mixing text and elements won't work, and this isn't checked, except that the program will die trying to dereference a string.
It's intended to be a framework for you to enhance as you require, but works fine as it stands with the XML data you show in your question
use strict;
use warnings 'all';

use XML::Parser;

my $parser = XML::Parser->new(Handlers => {
    Start => \&start_tag,
    End   => \&end_tag,
    Char  => \&text,
});

my $struct;
my @stack;

$parser->parsefile('config.xml');

print_json($struct->[1]);

sub start_tag {
    my $expat = shift;
    my ($tag, %attr) = @_;

    my $elem = [ $tag => [] ];

    if ( $struct ) {
        my $content = $stack[-1][1];
        push @{ $content }, $elem;
    }
    else {
        $struct = $elem;
    }

    push @stack, $elem;
}

sub end_tag {
    my $expat = shift;
    my ($elem) = @_;

    die "$elem <=> $stack[-1][0]" unless $stack[-1][0] eq $elem;

    for my $content ( $stack[-1][1] ) {
        $content = "@$content" unless grep ref, @$content;
    }

    pop @stack;
}

sub text {
    my $expat = shift;
    my ($string) = @_;

    return unless $string =~ /\S/;

    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;

    push @{ $stack[-1][1] }, $string;
}

sub print_json {
    my ($data, $indent, $comma) = (@_, 0, '');

    print "{\n";

    for my $i ( 0 .. $#$data ) {

        # Note that $data, $indent and $comma are overridden here
        # to reflect the inner context
        #
        my $elem  = $data->[$i];
        my $comma = $i < $#$data ? ',' : '';
        my ($tag, $data) = @$elem;
        my $indent = $indent + 1;

        printf qq{%s"%s" : }, ' ' x $indent, $tag;

        if ( ref $data ) {
            print_json($data, $indent, $comma);
        }
        else {
            printf qq{"%s"%s\n}, $data, $comma;
        }
    }

    # $indent and $comma (and $data) are restored here
    #
    printf "%s}%s\n", ' ' x $indent, $comma;
}
Output:
{
"ip" : {
"address" : "1.1.1.1",
"netmask" : "255.255.255.0"
},
"route" : {
"network" : "20.20.20.0",
"netmask" : "55.255.255.0",
"gateway" : "1.1.1.1"
}
}
The problem isn't so much to do with XML parsing as with the fact that Perl hashes are not ordered, so when you write out some JSON, the keys can come out in any order.
The way to avoid this is to apply a sort function to your JSON.
You can do this by using sort_by to explicitly sort:
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
use JSON::PP;
use Data::Dumper;

sub order_nodes {
    my %rank_of = ( ip => 0, route => 1, address => 2, network => 3, netmask => 4, gateway => 5 );
    print "$JSON::PP::a <=> $JSON::PP::b\n";
    return $rank_of{$JSON::PP::a} <=> $rank_of{$JSON::PP::b};
}

my $twig = XML::Twig->parse( \*DATA );
my $json = JSON::PP->new;
$json->sort_by( \&order_nodes );

print $json->encode( $twig->simplify );

__DATA__
<Config>
    <ip>
        <address>1.1.1.1</address>
        <netmask>255.255.255.0</netmask>
    </ip>
    <route>
        <network>20.20.20.0</network>
        <netmask>55.255.255.0</netmask>
        <gateway>1.1.1.1</gateway>
    </route>
</Config>
In some scenarios, setting canonical can help, as that sets ordering to lexical order. (And means your JSON output would be consistently ordered). This doesn't apply to your case.
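For reference, enabling it is a one-line change to the sort_by example above (shown only for completeness, since alphabetical order doesn't solve the dependency problem here):

my $json = JSON::PP->new->canonical(1);   # lexical (alphabetical) key order
print $json->encode( $twig->simplify );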
You could build the node ordering via XML::Twig, either with an xpath expression or by using twig_handlers. I gave it a quick go, but got slightly stuck figuring out how you'd 'tell' the right ordering based on getting address/netmask and then network/netmask/gateway.
As a simple example you could:
my $count = 0;
foreach my $node ( $twig->get_xpath('./*') ) {
    $rank_of{ $node->tag } = $count++ unless $rank_of{ $node->tag };
}
print Dumper \%rank_of;
This will ensure ip and route are always the right way around. However it doesn't order the subkeys.
That actually gets a bit more complicated, as you'd need to recurse... and then decide how to handle 'collisions' (like netmask - address comes before, but how does it sort compared to network).
Or alternatively:
my $count = 0;
foreach my $node ( $twig->get_xpath('.//*') ) {
    $rank_of{ $node->tag } = $count++ unless $rank_of{ $node->tag };
}
This walks all the nodes, and puts them in order. It doesn't quite work, because netmask appears in both stanzas though.
You get:
{"ip":{"address":"1.1.1.1","netmask":"255.255.255.0"},"route":{"netmask":"55.255.255.0","network":"20.20.20.0","gateway":"1.1.1.1"}}
I couldn't figure out a neat way of collapsing both lists.

Looping through JSON data and printing out a field using Perl (Not a HASH ref)

I need to loop through several JSON records and print out index 1 of each...
I start with a generated string containing my JSON data. I then decode the string and dump it to show exactly what I'm working with:
my $decoded = decode_json $string;
print Dumper $decoded;
This results in the following output:
$VAR1 = [
    {
        'hdr' => [
            1,
            'acknowledged',
            '',
            '/home/clanier/dev/test/sds-test/data/JPLIDR2015169.64575',
            '2015/271-19:10:39.0101355',
            '2015/271-19:10:39.2599252',
            ''
        ]
    },
    {
        'hdr' => [
            2,
            'acknowledged',
            '',
            '/home/clanier/dev/test/sds-test/data/JPLIDR2015169.64575',
            '2015/271-19:10:39.3928414',
            '2015/271-19:10:39.6397269',
            ''
        ]
    },
    {
        'hdr' => [
            3,
            'acknowledged',
            '',
            '/home/clanier/dev/test/sds-test/data/JPLIDR2015169.64575',
            '2015/271-19:10:39.7726375',
            '2015/271-19:10:40.0162758',
            ''
        ]
    }
];
Now I try to loop through this and print the word acknowledged for each one:
foreach my $hdr ( $decoded->{hdr} ) {
    print $hdr->[1];
}
I looked at this solution for help, but it appears I cannot even get as far as the original poster, due to "Not a HASH reference" errors. I was able to print a specific entry previously, but I need to loop through and print all of them. This was the code for that: print $$decoded[0]->{'hdr'}->[1];
$decoded is an array reference, not a hash reference. You can't dereference it as a hash with ->{...}.
You can dereference it as an array, though:
for my $hdr (@$decoded) {
    print $hdr->{hdr}[1];
}
See perlreftut for more information about references in Perl.
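With the structure dumped above, that loop prints acknowledged once per element; adding a newline makes the output easier to read (a minimal variation):

for my $hdr (@$decoded) {
    print "$hdr->{hdr}[1]\n";   # prints "acknowledged" for each element
}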