File path into JSON data structure - json

I'm doing a disk space report that uses File::Find to collect cumulative sizing in a directory tree.
What I get (easily) from File::Find is the directory name.
e.g.:
/path/to/user/username/subdir/anothersubdir/etc
I'm running File::Find to collect sizes beneath:
/path/to/user/username
And build a cumulative size report of the directory and each of the subdirectories.
What I've currently got is:
while ( $dir_tree ) {
    $results{$dir_tree} += $blocks * $block_size;
    my @path_arr = split ( "/", $dir_tree );
    pop ( @path_arr );
    $dir_tree = join ( "/", @path_arr );
}
(And yes, I know that's not very nice.)
The purpose of doing this is that when I stat each file, I add its size to the current node and to each parent node in the tree.
This is sufficient to generate:
username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M
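The roll-up loop above can also be written with File::Spec, which the question reaches for later. This is a minimal sketch that assumes directory paths are already relative to the scan root. `add_to_ancestors` is a hypothetical helper name, and the sizes are illustrative.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: add $bytes to $dir and to every ancestor of $dir.
# Assumes $dir is already relative to the scan root, e.g. "username/documents/excel".
sub add_to_ancestors {
    my ( $totals, $dir, $bytes ) = @_;
    my @parts = File::Spec->splitdir($dir);
    while (@parts) {
        $totals->{ File::Spec->catdir(@parts) } += $bytes;
        pop @parts;    # move up one level
    }
}

my %totals;
add_to_ancestors( \%totals, 'username/documents/excel', 51200 );
add_to_ancestors( \%totals, 'username/documents/word',  40960 );
print "$_,$totals{$_}\n" for sort keys %totals;
```

On a Unix system this prints `username,92160`, `username/documents,92160`, `username/documents/excel,51200` and `username/documents/word,40960`, one per line.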
But now I'd like to turn that into JSON more like this:
{
"name" : "username",
"size" : "307200",
"children" : [
{
"name" : "documents",
"size" : "153750",
"children" : [
{
"name" : "excel",
"size" : "51200"
},
{
"name" : "word",
"size" : "81920"
}
]
}
]
}
That's because I'm intending to do a D3 visualisation of this structure, loosely based on D3 Zoomable Circle Pack.
So my question is this: what is the neatest way to collate my data so that I get cumulative (and ideally non-cumulative) size information while populating a hash hierarchically?
I was thinking in terms of a 'cursor' approach (and using File::Spec this time):
use File::Spec;
my $data = {};
my $cursor = $data;
foreach my $element ( File::Spec->splitdir( $File::Find::dir ) ) {
    $cursor->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element} //= {};
}
Although... that's not quite creating the data structure I'm looking for, not least because we basically have to search by hash key to do the 'rolling up' part of the process.
Is there a better way of accomplishing this?
Edit: a more complete example of what I already have:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Data::Dumper;
my $block_size = 1024;
sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;
    # Strip the last component ("/home/user1" -> "/home/") so keys start with the username
    $starting_path =~ s,/\w+$,/,;
    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my ($dev, $ino, $mode, $nlink, $uid,
            $gid, $rdev, $size, $atime, $mtime,
            $ctime, $blksize, $blocks
        ) = stat($File::Find::name);
        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^\Q$starting_path\E||;
        # Add this file's size to its directory and to every parent directory
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $$results_ref{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}
my @users = qw ( user1 user2 );
foreach my $user (@users) {
    my $path = "/home/$user";
    print "$path\n";
    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1
        },
        $path
    );
    print Dumper \%results;
    # would print this to a file in the homedir - to STDOUT for convenience
    foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "$key => $results{$key}\n";
    }
}
And yes, I know this isn't portable and does a few somewhat nasty things. Part of what I'm doing here is trying to improve on that. (But currently it's a Unix-based homedir structure, so that's fine.)

If you do your own dir scanning instead of using File::Find, you naturally get the right structure.
sub _scan {
    my ($qfn, $fn) = @_;
    my $node = { name => $fn };
    lstat($qfn)
        or die "Can't stat \"$qfn\": $!\n";
    my $size   = -s _ || 0;    # -s returns false for empty files
    my $is_dir = -d _;
    if ($is_dir) {
        my @child_fns = do {
            opendir(my $dh, $qfn)
                or die "Can't open \"$qfn\": $!\n";
            grep !/^\.\.?\z/, readdir($dh);    # skip "." and ".."
        };
        my @children;
        for my $child_fn (@child_fns) {
            my $child_node = _scan("$qfn/$child_fn", $child_fn);
            $size += $child_node->{size};      # roll child sizes up (cumulative)
            push @children, $child_node;
        }
        $node->{children} = \@children;
    }
    $node->{size} = $size;
    return $node;
}
Rest of the code:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
use File::Basename qw( basename );
use JSON qw( encode_json );
...
sub scan { _scan($_[0], basename($_[0])) }
print(encode_json(scan($ARGV[0] // '.')));
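The question also asks for non-cumulative sizes. This is a sketch of one way to extend the recursive scan so each directory records both values. The names `scan_node` and `own_size` are hypothetical, and directory entries' own sizes are deliberately ignored (an assumption). File::Temp builds a throwaway tree so the sketch runs on its own, and JSON::PP (core) stands in for JSON.

```perl
use strict;
use warnings;
no warnings 'recursion';
use File::Spec;
use File::Temp qw( tempdir );
use JSON::PP;

# Hypothetical variant of _scan: "size" is cumulative (files only),
# "own_size" counts only the files directly inside the directory.
sub scan_node {
    my ( $qfn, $fn ) = @_;
    my $node = { name => $fn };
    lstat($qfn) or die "Can't stat \"$qfn\": $!\n";
    unless ( -d _ ) {                       # plain file (or symlink): a leaf
        $node->{size} = ( -s _ ) || 0;
        return $node;
    }
    opendir( my $dh, $qfn ) or die "Can't open \"$qfn\": $!\n";
    my @child_fns = sort grep { !/^\.\.?\z/ } readdir($dh);
    closedir($dh);
    my ( $size, $own ) = ( 0, 0 );
    my @children;
    for my $child_fn (@child_fns) {
        my $child = scan_node( File::Spec->catfile( $qfn, $child_fn ), $child_fn );
        $size += $child->{size};
        if ( $child->{children} ) { push @children, $child }    # subdirectory
        else                      { $own += $child->{size} }    # file
    }
    @{$node}{qw( size own_size children )} = ( $size, $own, \@children );
    return $node;
}

# Tiny demo tree: <root>/a.txt (100 bytes), <root>/documents/b.txt (250 bytes)
my $root = tempdir( CLEANUP => 1 );
mkdir( File::Spec->catdir( $root, 'documents' ) ) or die "mkdir: $!";
my %files = ( 'a.txt' => 100, File::Spec->catfile( 'documents', 'b.txt' ) => 250 );
for my $rel ( keys %files ) {
    open( my $fh, '>', File::Spec->catfile( $root, $rel ) ) or die "open: $!";
    print {$fh} 'x' x $files{$rel};
    close($fh);
}

my $tree = scan_node( $root, 'username' );
print JSON::PP->new->canonical->pretty->encode($tree);
```

Only subdirectories go into `children`, which matches the directory-only JSON in the question. Files are folded into `own_size` instead.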

In the end, I have done it like this:
In the File::Find wanted sub collate_sizes:
my $cursor = $data;
foreach my $element (
File::Spec->splitdir( $File::Find::dir =~ s/^$starting_path//r ) )
{
$cursor->{$element}->{name} = $element;
$cursor->{$element}->{size} += $blocks * $block_size;
$cursor = $cursor->{$element}->{children} //= {};
}
This generates a hash of nested directory names. (The name subelement is probably redundant, but whatever.)
And then I post-process it (using JSON):
my $json_structure = {
'name' => $user,
'size' => $data->{$user}->{size},
'children' => [],
};
process_data_to_json( $json_structure, $data->{$user}->{children} );
open( my $json_out, '>', "homedir.json" ) or die $!;
print {$json_out} to_json( $json_structure, { pretty => 1 } );
close($json_out);
sub process_data_to_json {
    my ( $json_cursor, $data_cursor ) = @_;
    return unless ref $data_cursor eq "HASH";
    # Each key of the data hash is a child directory name
    foreach my $key ( sort keys %$data_cursor ) {
        print "Traversing $key\n";
        my $newelt = {
            'name' => $key,
            'size' => $data_cursor->{$key}->{size},
        };
        push( @{ $json_cursor->{children} }, $newelt );
        process_data_to_json( $newelt, $data_cursor->{$key}->{children} );
    }
}
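Here is a self-contained sketch of the same two steps: the cursor build, then conversion to D3's name/size/children shape. It assumes sample (path, bytes) pairs in place of real File::Find results. `to_d3` is a hypothetical name, and JSON::PP (core) replaces JSON so it runs without CPAN modules.

```perl
use strict;
use warnings;
use File::Spec;
use JSON::PP;

# Sample (path, bytes) pairs standing in for File::Find results; paths are
# assumed to be relative to the starting directory already.
my @samples = (
    [ 'username/documents/excel', 51200 ],
    [ 'username/documents/word',  40960 ],
    [ 'username/work',            71680 ],
);

# Build the nested hash with the cursor technique.
my $data = {};
for my $sample (@samples) {
    my ( $dir, $bytes ) = @$sample;
    my $cursor = $data;
    for my $element ( File::Spec->splitdir($dir) ) {
        $cursor->{$element}{name} = $element;
        $cursor->{$element}{size} += $bytes;
        $cursor = $cursor->{$element}{children} //= {};
    }
}

# Recursively turn { name, size, children => { dir => {...} } } into
# { name, size, children => [ ... ] }, omitting empty children lists.
sub to_d3 {
    my ($node) = @_;
    my $out  = { name => $node->{name}, size => $node->{size} };
    my @kids = map { to_d3( $node->{children}{$_} ) }
               sort keys %{ $node->{children} || {} };
    $out->{children} = \@kids if @kids;
    return $out;
}

my $tree = to_d3( $data->{username} );
print JSON::PP->new->canonical->pretty->encode($tree);
```

Each call returns a new node instead of pushing into a cursor passed in. Because of that, the root needs no special-casing: `to_d3($data->{username})` builds it exactly like every child.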


Parsing JSON data in Perl

I am parsing JSON data from a .json file. I have two formats of JSON data file.
I can parse the first JSON file, shown below:
file1.json
{
"sequence" : [ {
"type" : "type_value",
"attribute" : {
"att1" : "att1_val",
"att2" : "att2_val",
"att3" : "att3_val",
"att_id" : "1"
}
} ],
"current" : 0,
"next" : 1
}
Here is my script:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use JSON;
my $filename = $ARGV[0]; #Pass json file as an argument
print "FILE:$filename\n";
my $json_text = do {
open(my $json_fh, "<:encoding(UTF-8)", $filename)
    or die("Can't open \"$filename\": $!\n");
local $/;
<$json_fh>
};
my $json = JSON->new;
my $data = $json->decode($json_text);
my $aref = $data->{sequence};
my %Hash;
for my $element (@$aref) {
my $a = $element->{attribute};
next if(!$a);
my $aNo = $a->{att_id};
$Hash{$aNo}{'att1'} = $a->{att1};
$Hash{$aNo}{'att2'} = $a->{att2};
$Hash{$aNo}{'att3'} = $a->{att3};
}
print Dumper \%Hash;
Everything gets stored in %Hash, and when I print %Hash with Dumper I get the following result:
$VAR1 = {
'1' => {
'att1' => 'att1_val',
'att2' => 'att2_val',
'att3' => 'att3_val'
}
};
But when I parse the second JSON file with the same script, I get an empty hash.
Output:
$VAR1 = {};
Here is that JSON file:
file2.json
{
"sequence" : [ {
"type" : "loop",
"quantity" : 8,
"currentIteration" : 0,
"sequence" : [ {
"type" : "type_value",
"attribute" : {
"att1" : "att1_val",
"att2" : "att2_val",
"att3" : "att3_val",
"att_id" : "1"
}
} ]
} ]
}
As you can see, there are two nested sequence arrays in the JSON above, which is what causes the problem.
Can somebody tell me what I am missing in the script in order to parse file2.json?
One possibility might be to check the type field to differentiate between the two file formats:
# [...]
for my $element (@$aref) {
    if ( $element->{type} eq "loop" ) {
        # A "loop" element wraps its own nested sequence array
        my $aref2 = $element->{sequence};
        for my $element2 ( @$aref2 ) {
            get_attrs( $element2, \%Hash );
        }
    }
    else {
        get_attrs( $element, \%Hash );
    }
}
sub get_attrs {
    my ( $element, $hash ) = @_;
    my $a = $element->{attribute};
    return if(!$a);
    my $aNo = $a->{att_id};
    $hash->{$aNo}{'att1'} = $a->{att1};
    $hash->{$aNo}{'att2'} = $a->{att2};
    $hash->{$aNo}{'att3'} = $a->{att3};
}
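Checking `type eq "loop"` handles exactly one level of nesting. This sketch takes a different approach: it recurses into any `sequence` array it finds, so loops nested at any depth are covered and the `type` value is never consulted. `collect_attrs` is a hypothetical name, file2.json is inlined so the sketch runs standalone, and JSON::PP (core) replaces JSON.

```perl
use strict;
use warnings;
use JSON::PP;

# file2.json from the question, inlined.
my $json_text = <<'END_JSON';
{ "sequence" : [ { "type" : "loop", "quantity" : 8, "currentIteration" : 0,
    "sequence" : [ { "type" : "type_value",
      "attribute" : { "att1" : "att1_val", "att2" : "att2_val",
                      "att3" : "att3_val", "att_id" : "1" } } ] } ] }
END_JSON

# Visit every element of a "sequence" array; store its attributes if it has
# any, and recurse if it carries its own nested "sequence".
sub collect_attrs {
    my ( $sequence, $hash ) = @_;
    for my $element (@$sequence) {
        if ( my $attr = $element->{attribute} ) {
            my $entry = $hash->{ $attr->{att_id} } //= {};
            $entry->{$_} = $attr->{$_} for qw(att1 att2 att3);
        }
        collect_attrs( $element->{sequence}, $hash )
            if ref $element->{sequence} eq 'ARRAY';
    }
}

my %Hash;
my $data = JSON::PP->new->decode($json_text);
collect_attrs( $data->{sequence} || [], \%Hash );
```

The same call also works unchanged for file1.json, because there the attributes sit in the top-level sequence.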
See whether the following code fits your requirements:
#!/usr/bin/env perl
#
# vim: ai:sw=4:ts=4
#
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
use JSON;
my $debug = 0; # debug flag
my $filename = shift; # Pass json file as an argument
say "FILE: $filename";
open(my $json_fh, "<:encoding(UTF-8)", $filename)
    or die("Can't open \"$filename\": $!\n");
my $json_data = do { local $/; <$json_fh> };
close $json_fh;
my $json = JSON->new;
my $data = $json->decode($json_data);
say Dumper($data) if $debug;
my $data_ref;
my %Hash;
$data_ref = $data->{sequence}[0]{attribute}
if $filename eq 'file1.json';
$data_ref = $data->{sequence}[0]{sequence}[0]{attribute}
if $filename eq 'file2.json';
say Dumper($data_ref) if $debug;
my @fields = qw/att1 att2 att3/;
my $aNo = $data_ref->{att_id};
my %data_hash;
@data_hash{@fields} = @{$data_ref}{@fields};    # hash slice copies all three fields
$Hash{$aNo} = \%data_hash;
say Dumper(\%Hash);
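The `@data_hash{@fields}` assignment in the answer is a hash slice. This small standalone sketch uses a hand-written stand-in for the decoded `attribute` hash. It shows a slice copying several keys at once, and shows that a plain `$data_ref->{@fields}` is a single-element lookup rather than a slice.

```perl
use strict;
use warnings;

# Stand-in for $data->{sequence}[0]{attribute} from file1.json
my $data_ref = { att1 => 'att1_val', att2 => 'att2_val', att3 => 'att3_val', att_id => '1' };
my @fields   = qw/att1 att2 att3/;

# Not a slice: this looks up ONE key, so it does not return the three
# attribute values (no such key exists here, so the result is undef).
my $single = $data_ref->{@fields};

# Hash slice on both sides: copies all three keys in one statement.
my %data_hash;
@data_hash{@fields} = @{$data_ref}{@fields};
```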

Determine the Moose Type for providing conversion to JSON

I have a class, MyClass:
package MyClass;
use Moose;
has 'IntegerMember' => (
is => 'rw',
isa => 'Int'
);
has 'BooleanMember' => (
is => 'rw',
isa => 'Bool'
);
sub TO_JSON {
my $self = shift;
return { % { $self } };
}
Currently, when I instantiate MyClass and pass the new object to the JSON encoder, I get the expected JSON string back. I was hoping that Perl booleans (1, 0) would be converted to (true, false), but that is not how the JSON module is designed:
use JSON;
use MyClass;
my $object_to_encode = MyClass->new (
IntegerMember => 10,
BooleanMember => 1
);
my $json_encoder = JSON->new->convert_blessed;
my $json_data = $json_encoder->encode( $object_to_encode );
In MyClass, I want to improve my TO_JSON subroutine so that it converts any Moose 'Bool' member from (1 or 0) to (true or false):
sub TO_JSON {
    my $self = shift;
    for my $member ( keys %$self ) {
        if ( ... ) {    # pseudocode: is this member's Moose type 'Bool'?
            # Convert '1' to 'true'
        } else {
            # Keep the member as is
        }
    }
}
How can I determine each member's Moose type as I iterate through MyClass's members, so that I can do the conversion?
Here's one way to do it:
package MyClass;
use Moose;
has 'IntegerMember' => (
is => 'rw',
isa => 'Int'
);
has 'BooleanMember' => (
is => 'rw',
isa => 'Bool'
);
sub TO_JSON {
my $self = shift;
my $meta = $self->meta;
my $result = {};
for my $attr ($meta->get_all_attributes) {
my $name = $attr->name;
my $value = $attr->get_value($self);
my $type = $attr->type_constraint;
if ($type && $type->equals('Bool')) {
$value = $value ? \1 : \ 0;
}
$result->{$name} = $value;
}
return $result;
}
1
We use the metaclass object (accessible via ->meta) to introspect the class and get a list of all attributes (in the form of meta-attribute objects).
For each attribute, we get the name, current value, and type constraint (if any). If the type is Bool, we convert the value to either \1 or \0. The JSON module understands these values and converts them to true or false.
And a test program:
use strict;
use warnings;
use JSON::MaybeXS;
use MyClass;
my $object_to_encode = MyClass->new (
IntegerMember => 10,
BooleanMember => 1
);
my $json_encoder = JSON->new->convert_blessed;
my $json_data = $json_encoder->encode( $object_to_encode );
print "Result: $json_data\n";
Output:
Result: {"IntegerMember":10,"BooleanMember":true}
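The answer relies on the encoder turning `\1` and `\0` into JSON `true` and `false`. Here is a minimal check of that behaviour without Moose, using the core JSON::PP module. The `canonical` option sorts keys so the output is stable.

```perl
use strict;
use warnings;
use JSON::PP;

my $encoder = JSON::PP->new->canonical;    # sorted keys for predictable output

my $plain  = $encoder->encode( { BooleanMember => 1 } );                 # a plain number
my $as_ref = $encoder->encode( { BooleanMember => \1, Other => \0 } );   # refs to 1/0
print "$plain\n$as_ref\n";
```

This prints `{"BooleanMember":1}` followed by `{"BooleanMember":true,"Other":false}`. JSON::PP::true and JSON::PP::false are alternatives to `\1` and `\0` when a named constant reads better.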

perl fast json parser program

My JSON file contains some 3000 lines of content like this:
{
"product": [
{
"data": [
{
"number":"111",
"price":"3170",
"stock":"1"
},
{
"number":"222",
"price":"3170",
"stock":"1"
},
{
"number":"333",
"price":"3749",
"stock":"1"
}
],
"object":"apple",
"id":"54529"
},
{
"data":[],
"object":"orange",
"id":"54524"
}
]
}
I need to parse it really quickly.
Below is my code. It's not working:
use strict;
use warnings;
use JSON qw( );
my $filename = 'mob.json';
my $json_text = do
{
    open(my $json_fh, "<:encoding(UTF-8)", $filename)
        or die("Can't open \"$filename\": $!\n");
local $/;
<$json_fh>
};
my $json = JSON->new;
my $data = $json->decode($json_text);
for ( @{$data->{'product'}} )
{
print $_->{data}[0]->{number};
}
I need to get the number, price and stock, as well as the object and id.
Your code works fine. Almost. I made a couple of tweaks.
You alluded to speed at the beginning. Not clear if you wanted a quick answer, or a quicker way to parse lots of information. If it's the former, read on. If it's the latter, make sure you have JSON::XS installed.
Style-wise I find it painful to look at.
The use of a do{} to read the file makes me want to hurt myself. But, you used 3-param open. Kudos.
You need to dereference the array value from the hash.
You need to handle empty values in the data, or you'll keep getting warnings.
This code parses your JSON and outputs it, substituting 'undefined' for empty values:
use strict;
use warnings;
use JSON qw( );
my $filename = 'mob.json';
my $json_text = do {
    open(my $json_fh, "<:encoding(UTF-8)", $filename)
        or die("Can't open \"$filename\": $!\n");
    local $/;
    <$json_fh>;
};
# The :encoding(UTF-8) layer already decodes the text, so don't also set utf8(1)
my $json = JSON->new();
my $data = $json->decode($json_text);
for my $product ( @{$data->{'product'}} ){
    # The product's name is stored under the "object" key
    my ($name, $id) = map { $product->{$_} // 'undefined' } qw(object id);
    print sprintf("Product: %s (%s)\n", $name, $id);
    foreach my $item ( @{$product->{'data'}} ) {
        my ($number, $price, $stock) =
            map { $item->{$_} // 'undefined' } qw(number price stock);
        print sprintf(
            "  number: %s, price: %s, stock: %s\n",
            $number,
            $price,
            $stock,
        );
    }
    print "\n";
}

Convert csv tree structure to json

I have a large Excel document (2000+ lines) that uses its cells to specify a tree structure, and I would like to parse that into a .json file. The Excel-exported .csv document is formatted as follows, where empty Excel cells appear as empty fields between commas:
Layer1category1,,,,,
,Layer2category,,,,
...
,,,,Layer5category1,
,,,,,item1
,,,,,item2
,,,,Layer5category2,
,,,,,item1
,,,Layer4category2,,
...
Layer1category2,,,,,
...
Layer1category8,,,,, // this is the last category in the uppermost layer
In summary, Layer n categories are prefaced with n-1 commas and followed by 6-n commas, and rows prefaced with 5 commas are the final layer, which is in the format of a string and has many fields other than its name.
I would like this to be converted to a .json file similar to the following. I use "name" because, aside from a name, each field is also tied to a lot of statistics that also need to go into the JSON file.
{"name" : "Layer1category1",
 "children" : [
   {"name" : "Layer2category1",
    "children" : [
      {"name" : "Layer3category1",
       "children" : [
         ...
         {"name" : "Layer5category1",
          "children" : [{"name" : "item1"}, {"name" : "item2"}]},
         {"name" : "Layer5category2",
          "children" : [{"name" : "item1"}]},
         {"name" : "Layer4category2",
          "children" : [
            ...
          ]}
 ...
},
{"name" : "Layer1category2",
 "children" : [ ... ]
}
Does anyone have any suggestions for how I can approach this? The CSV-to-JSON converters I have found do not support multi-layered structures. Thanks!
I faced the same issue and wrote a simple PHP script:
Input
Level I,Level II,Level III,Level IV,Level V,Level VI,Level VII,Level VIII,,,,,,,,,,,,,,,,,,
Role Profile,,,,,,,,,,,,,,,,,,,,,,,,,
,Development,,,,,,,,,,,,,,,,,,,,,,,,
,,Security,,,,,,,,,,,,,,,,,,,,,,,
,,Cloud,,,,,,,,,,,,,,,,,,,,,,,
,,,Cloud technologies,,,,,,,,,,,,,,,,,,,,,,
,,,,IaaS,,,,,,,,,,,,,,,,,,,,,
,,,,,Amazon Web Service (AWS),,,,,,,,,,,,,,,,,,,,
,,,,,Microsoft Azure,,,,,,,,,,,,,,,,,,,,
,,,,,Google Compute Engine (GCE),,,,,,,,,,,,,,,,,,,,
,,,,,OpenStack,,,,,,,,,,,,,,,,,,,,
Output
{
"Role Profile":{
"Development":{
"Security":{},
"Cloud":{
"Cloud technologies":{
"IaaS":{
"Amazon Web Service (AWS)":{},
"Microsoft Azure":{},
"Google Compute Engine (GCE)":{},
"OpenStack":{}
}
}
}
}
}
}
Code
<?php
$fn = "skills.csv"; // input file name
$hasHeader = true; // input file has header, we will skip first line
//
function appendItem( &$r, $lvl, $item ) {
if ( $lvl ) {
$lvl--;
end( $r );
appendItem( $r[key($r)], $lvl, $item );
} else {
$r[$item] = array();
}
}
$r = array();
if ( ( $handle = fopen( $fn, "r" ) ) !== FALSE ) {
$header = true;
while ( ( $data = fgetcsv( $handle, 1000, "," ) ) !== FALSE ) {
if ( $header and $hasHeader ) {
$header = false;
} else {
$lvl = 0;
foreach( $data as $dv ) {
$v = trim( $dv );
if ( $v !== '' ) { // "0" is a valid cell value, so don't test truthiness
appendItem( $r, $lvl, $v );
break;
}
$lvl++;
}
}
}
}
echo json_encode( $r );
?>
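This is the same depth-tracking idea in Perl, producing the name/children shape the question asked for. The depth of a row is the column of its first non-blank cell, and a stack remembers the latest node at each depth. The rows are illustrative: the category names after Layer1 are made up. The sketch also assumes cells never contain quoted commas. For such cells, a real CSV parser such as Text::CSV (CPAN) would be needed.

```perl
use strict;
use warnings;
use JSON::PP;

# Illustrative rows in the question's layout (intermediate names are made up).
my @lines = (
    'Layer1category1,,,,,',   ',Layer2category1,,,,',
    ',,Layer3category1,,,',   ',,,Layer4category1,,',
    ',,,,Layer5category1,',   ',,,,,item1',
    ',,,,,item2',             ',,,,Layer5category2,',
    ',,,,,item1',             ',,,Layer4category2,,',
    'Layer1category2,,,,,',
);

my @roots;    # top-level (Layer1) nodes
my @stack;    # $stack[$d] holds the most recent node seen at depth $d
for my $line (@lines) {
    my @cells = split /,/, $line, -1;    # -1 keeps trailing empty cells
    # Depth = index of the first non-blank cell (0 for Layer1, 5 for items)
    my ($depth) = grep { $cells[$_] =~ /\S/ } 0 .. $#cells;
    next unless defined $depth;          # skip completely blank rows
    ( my $name = $cells[$depth] ) =~ s/^\s+|\s+$//g;
    my $node = { name => $name };
    if ( $depth == 0 ) {
        push @roots, $node;
    }
    else {
        my $parent = $stack[ $depth - 1 ]
            or die "Row '$line' has no parent at depth " . ( $depth - 1 ) . "\n";
        push @{ $parent->{children} }, $node;
    }
    $stack[$depth] = $node;
    $#stack = $depth;    # forget deeper levels left over from the previous branch
}

print JSON::PP->new->canonical->pretty->encode( \@roots );
```

Extra statistics on the item rows (the cells after the name) could be copied onto `$node` at the point where it is created.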

json formatting string to number

My JSON output looks like this:
[
{
"a1_id":"7847TK10",
"output2":"7847TK10",
"output4":"something",
"output5":"3stars.gif",
"output9": "269000",
...
etc. etc.
The Google Visualization API requires a number for the output9 element, e.g.:
"output9": 269000 instead of "output9": "269000". How can I achieve this for this element?
My json.php generates the JSON output like this:
?>
{
"total": <?php echo $total ?>,
"success": true,
"rows": [
<?php
// Iterate over the rows
$nextRow= $result->nextRow();
$r = 1;
$info = array();
while ( $nextRow ) {
$nextColumn = $result->nextColumn();
// Has this column been printed already
if ( $unique )
{
$d = $result->getDataForField($unique);
if ( array_key_exists($d, $already) )
{
$nextRow= $result->nextRow();
continue;
}
$already[$d] = true;
}
echo '{';
// Iterate over the columns in each row
while ( $nextColumn )
{
// Get the variable
$variable = $result->getOutputVariable();
$name = $variable->getName(true);
$data = $result->getDataForField();
if ( !isset($info[$name]) ) {
$info[$name]['translate'] = $variable->shouldTranslate();
$info[$name]['type'] = $variable->getDataType();
$info[$name]['linkable'] = $variable->isLinkable();
}
// Translate the data if requested
if ( $info[$name]['translate'] ) {
$data = LQMTemplate::_($data);
}
$data = $variable->format($data, false);
$type = $info[$name]['type'];
if ( ($type == 'bool') or ($type == 'boolean') )
{
$data = $data ? '1' : '0';
echo "\"$name\":$data"; // JSON keys must be double-quoted
} elseif ( $encode ) {
// Can we use json_encode ?
// str_replace because some versions of PHP have a bug that will over escape forward slashes
echo "\"$name\":".str_replace('\\/', '/', json_encode($data));
} else {
$data = LQMUtility::jsonEscape($data, '"');
//echo "'$name':\"$data\"";
echo "\"$name\":\"$data\"";
}
// Conditionally print the next column
$nextColumn = $result->nextColumn();
if ( $nextColumn ) echo ",\n ";
}
// Conditionally print the next row
$nextRow = $result->nextRow();
echo $nextRow ? "},\n" : "}\n";
$r++;
}
unset($result);
echo ']}';
}
}
This depends on how you are generating your JSON.
For example, if you were using a Ruby backend, you could call:
"output9" => output9.to_i
Most languages have helper functions for turning a string into an integer (e.g. Java's Integer.parseInt() and JavaScript's parseInt()).
Edit:
If your JSON is being generated by PHP, cast the string to an integer:
$json['output9'] = (int) $output9_value;
That should get rid of the quotation marks.
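For the Perl-generated JSON elsewhere on this page, the equivalent fix is to force numeric context before encoding. With the core JSON::PP module, a value created as a string is emitted quoted, while `0 + $value` yields a number. The value below is illustrative.

```perl
use strict;
use warnings;
use JSON::PP;

my $output9 = "269000";                   # e.g. a value read from a template or database
my $encoder = JSON::PP->new->canonical;

my $as_string = $encoder->encode( { output9 => $output9 } );        # stays a string
my $as_number = $encoder->encode( { output9 => 0 + $output9 } );    # numeric copy
print "$as_string\n$as_number\n";
```

This prints `{"output9":"269000"}` and then `{"output9":269000}`, the form the Google Visualization API expects.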