I have a CSV file, which contains data like below:
I want to parse the data from the above CSV file and store it in a hash. Dumped with Data::Dumper, %hash should look like this:
$VAR1 = {
'1' => {
'Name' => 'Name1',
'Time' => '7/2/2020 11:00',
'Cell' => 'NCell1',
'PMR' => '1001',
'ISD' => 'ISDVAL1',
'PCO' => 'PCOVAL1'
},
'2' => {
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell2',
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'STD' => 'STDVAL2'
},
'3' => {
'Name' => 'Name3',
'Time' => '7/4/2020 20:15',
'Cell' => 'NCell3',
'PMR' => '1003',
'ISD' => 'ISDVAL3',
'MKR' => 'MKRVAL3'
},
};
Script is below:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
while (my $row = $csv->getline ($fh)) {
my @fields = @$row;
$hash{$fields[0]}{"Time"} = $fields[1];
$hash{$fields[0]}{"Name"} = $fields[2];
$hash{$fields[0]}{"Cell"} = $fields[3];
}
close $fh;
print Dumper(\%hash);
Here id is the key for each line, and each value should be stored under its respective name within that id.
The problem is that my script can only parse the data up to column D (Cell). After column D there is no header line: column E acts as the header and column F holds the value of that header for the given id. The same pattern repeats in pairs until the end of the row. Some values can also be missing. For example, there is no MKR value for id 1.
How can I parse this data and store it in a hash so that it looks like the one above? TIA.
I changed the posted script to read the header line separately, so that it does not become part of the result, and added a for loop to store the rest of the data.
Test Data Used:
id,Time,Name,Cell,,,,,
1,7/2/2020 11:00,Name1,NCell1,PMR,1001,ISD,ISDVAL1
2,7/3/2020 13:10,Name2,NCell3,PMR,1002,PCO,PCOVAL2,MKR,MKRVAL2
Updated script (this was the first version; I suggest using the improved version in the edit):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]}{Time} = $row->[1];
$hash{$row->[0]}{Name} = $row->[2];
$hash{$row->[0]}{Cell} = $row->[3];
for (my $i = 4; $i < scalar (@{$row}); $i += 2) {
$hash{$row->[0]}{$row->[$i]} = $row->[$i + 1];
}
}
close $fh;
print Dumper(\%hash);
Output:
$VAR1 = {
'2' => {
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'PCO' => 'PCOVAL2',
'Cell' => 'NCell3',
'Time' => '7/3/2020 13:10',
'PMR' => '1002'
},
'1' => {
'Name' => 'Name1',
'ISD' => 'ISDVAL1',
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'PMR' => '1001'
}
};
Edit:
Thanks to a comment from @choroba, here is an improved version of the script. It first fills the hash with all the additional key/value pairs from the row, and then adds the first values (Time, Name, Cell) using the header line read from the file.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]} = { @$row[4 .. $#$row] };
@{$hash{$row->[0]}}{@$headers[1, 2, 3]} = @$row[1, 2, 3];
}
close $fh;
print Dumper(\%hash);
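To see what the two slice lines in the improved version do, here is a minimal sketch that skips Text::CSV and works on one hard-coded row taken from the test data. The variable name $record is only for illustration. It assumes the trailing fields always come in complete key/value pairs; an odd number of trailing fields would trigger an "Odd number of elements in anonymous hash" warning.

```perl
use strict;
use warnings;

my $headers = [ 'id', 'Time', 'Name', 'Cell' ];
my $row     = [ 1, '7/2/2020 11:00', 'Name1', 'NCell1', 'PMR', '1001', 'ISD', 'ISDVAL1' ];

# Array slice: everything from index 4 to the last index is a flat
# key, value, key, value, ... list, which { } turns into an anonymous hash.
my $record = { @$row[ 4 .. $#$row ] };

# Hash slice: assign several keys at once, taking the key names from the
# header row and the values from the same positions in the data row.
@{$record}{ @$headers[ 1, 2, 3 ] } = @$row[ 1, 2, 3 ];

print join( ', ', map { "$_=$record->{$_}" } sort keys %$record ), "\n";
```

Because the pairs are assigned first and the fixed columns second, a pair whose key happened to be Time, Name or Cell would be overwritten by the fixed column.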
There are some Text::CSV features that you can use to make this a bit simpler. There's a lot of readability to gain by removing density in the loop.
First, you can set the column names for the missing header values. I don't know what those columns represent, so I've called them K1, V1, and so on. You can substitute better names for them. How I do that isn't as important as the fact that I do it. I'm using v5.26 because I'm using postfix dereferencing:
use v5.26;
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
If I knew the names, I could use those instead of numbers. I merely change the stuff in @kv_range:
my @kv_range = qw(machine test regression ice_cream);
And, when the data file changes, I handle all of that here. When it's outside the loop, there's much less to miss.
Now that I have all columns named, I use getline_hr to get back a hash reference of the line. The keys are the column names I just set. This does a lot of the work for you already. You have to handle the pairs at the end, but that's going to be easy too:
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
Now to handle the pairs at the end: I want to take the value in the column K1 and make it a key, then take the value in V1 and make that the value. At the same time, I need to remove those K1 and V1 columns. delete has the nice behavior in that it returns the value for the key you deleted. This way doesn't require any sort of pointer math or knowledge about positions. Those things might change and I've handled all of that before I got this far:
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
You could also do this in a couple steps if that statement is too much for you:
my( $key, $value ) = delete $row->@{ "K$_", "V$_" };
$row->{$key} = $value;
I'd leave the id column in there, but if you don't want it, get rid of it. Also, that step with the deletes might have made some empty string keys for the cells that had no values. Instead of guarding against that and making the foreach more complicated, I let it happen and get rid of it at the end:
delete $row->@{ 'id', '' };
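The delete steps can be tried in isolation. This is a small sketch on a hand-written hash reference shaped like what getline_hr might return for one row; the K3/V3 entries with undef values are my assumption for an unused trailing pair. It keeps the id column so only the empty-string key is removed at the end.

```perl
use v5.26;
use warnings;

# One row as a hash reference; K3/V3 stand in for an unused pair.
my $row = {
    id => 1,     Name => 'Name1',
    K1 => 'PMR', V1   => '1001',
    K2 => 'ISD', V2   => 'ISDVAL1',
    K3 => undef, V3   => undef,
};

foreach my $n ( 1 .. 3 ) {
    no warnings 'uninitialized';    # an undef key is stored as ''

    # delete on a hash slice returns the deleted values, in order
    my( $key, $value ) = delete $row->@{ "K$n", "V$n" };
    $row->{$key} = $value;
}

delete $row->{''};    # drop the entry created by the empty pair

say join ', ', sort keys $row->%*;
```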
Altogether, it looks like this. It's doing the same thing as Piet Bosch's answer, but I've pushed a lot of the complexity back into the module as well as doing a little pre-loop work:
use v5.26;
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1
});
open my $fh, "<:encoding(utf8)", "input_file.csv"
or die "input_file.csv: $!";
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
say Dumper( \%Grand );
And the output looks like this:
$VAR1 = {
'2' => {
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell3'
},
'1' => {
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'ISD' => 'ISDVAL1',
'PMR' => '1001',
'Name' => 'Name1'
}
};
Package JSON::XS uses JSON::XS::Boolean objects to represent true/false. Is it possible to force decoding true/false json values as 1/0 Perl numbers?
#!/usr/bin/env perl
use JSON::XS;
use Data::Dumper;
my $json = decode_json(join('', <DATA>));
print Dumper $json;
__DATA__
{
"test_true": true,
"test_false": false
}
Output:
$VAR1 = {
'test_true' => bless( do{\(my $o = 1)}, 'JSON::XS::Boolean' ),
'test_false' => bless( do{\(my $o = 0)}, 'JSON::XS::Boolean' )
};
I want something like this after decode_json:
$VAR1 = {
'test_true' => 1,
'test_false' => 0
};
Reason: In some cases, it's hard to predict how JSON::XS::Boolean will be serialized with, for example, SOAP serializer or another one.
PerlMonks discussion.
No. The values are blessed objects. They can only have the values allowed in JSON::XS::Boolean.
With Cpanel::JSON::XS, the unblessed_bool option controls this. So, you could use the following:
use Cpanel::JSON::XS qw( );
my $j = Cpanel::JSON::XS->new->utf8->unblessed_bool;
my $data = $j->decode( $json );
JSON::XS doesn't (currently) have an equivalent option. You would have to traverse the returned data structure and fix it up.
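Such a traversal could look roughly like the sketch below. It uses the core module JSON::PP so that it runs without extra installs; JSON::XS offers JSON::XS::is_bool for the same check. The helper name unbless_bools is hypothetical, and the sketch assumes the structure contains only hashes, arrays and scalars, as decoded JSON does.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: walk a decoded structure and replace boolean
# objects with plain 1/0. Hashes and arrays are modified in place.
sub unbless_bools {
    my ($node) = @_;
    return $node ? 1 : 0 if JSON::PP::is_bool($node);
    if ( ref $node eq 'HASH' ) {
        $_ = unbless_bools($_) for values %$node;    # values are aliases
    }
    elsif ( ref $node eq 'ARRAY' ) {
        $_ = unbless_bools($_) for @$node;
    }
    return $node;
}

my $data = JSON::PP->new->decode(
    '{"test_true": true, "test_false": false, "list": [true, 1, "x"]}'
);
unbless_bools($data);

print "test_true=$data->{test_true} test_false=$data->{test_false}\n";
```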
I get an error when attempting to access the contents of my JSON array.
Here are the contents of my JSON array assets.json:
[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]
Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS;
use File::Slurp;
my $json_source = "assets.json";
my $json = read_file( $json_source ) ;
my $json_array = decode_json $json;
foreach my $item( @$json_array ) {
print $item->{id};
print "\n";
print $item->{interfaces}->{ip_addresses}->{value};
print "\n\n";
}
I get the expected output for $item->{id}, but when accessing the nested element I get the error "Not a HASH reference".
Data::Dumper is your friend here:
Trying this:
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::XS;
use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
my $json_array = decode_json ( do { local $/; <DATA> } );
print Dumper $json_array;
__DATA__
[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]
Gives:
[
{
'interfaces' => [
{
'ip_addresses' => [
{
'value' => '172.16.77.239'
}
]
}
],
'id' => 1002
},
{
'interfaces' => [
{
'ip_addresses' => [
{
'value' => '192.168.0.2'
}
]
}
],
'id' => 1003
}
]
An important point to note: you have nested arrays ([] denotes an array, {} a hash).
So you can extract your thing with:
print $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};
Or as friedo notes:
Note that you may omit the -> operator after the first one, so $item->{interfaces}[0]{ip_addresses}[0]{value} will also work.
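The [0] indexes only reach the first interface and the first address. If an asset can have several of either, which is an assumption since the sample data has exactly one of each, nested loops over the array references are one way to visit them all. This sketch uses the core module JSON::PP and an inline copy of the data so that it is self-contained:

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

my $json = '[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},'
         . '{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]';

my $assets = decode_json($json);

my @lines;
for my $item (@$assets) {
    # interfaces and ip_addresses are array references, so dereference with @{ }
    for my $interface ( @{ $item->{interfaces} } ) {
        for my $address ( @{ $interface->{ip_addresses} } ) {
            push @lines, "$item->{id}: $address->{value}";
        }
    }
}

print "$_\n" for @lines;
```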
Below is my Perl code containing JSON data:
use Data::Dumper;
use JSON;
my $var = '{
"episode1": {
"title":"Cartman Gets an Anal Probe",
"id":"103511",
"airdate":"08.13.97",
"episodenumber":"101",
"available":"true",
"when":"08.13.97"
}
},
{
"episode2": {
"title":"Weight Gain 4000",
"id":"103516",
"airdate":"08.20.97",
"episodenumber":"102",
"available":"true",
"when":"08.20.97"
}
}';
my $resp = JSON::jsonToObj( $var );
print Dumper ($resp);
The output is:
$VAR1 = {
'episode1' => {
'when' => '08.13.97',
'episodenumber' => '101',
'airdate' => '08.13.97',
'title' => 'Cartman Gets an Anal Probe',
'id' => '103511',
'available' => 'true'
}
};
I am dumping JSON data, but only episode1 appears in the output. I want both episode1 and episode2 to be displayed when I dump. How can I do that?
Write valid JSON.
From JSON Lint
Parse error on line 14:
...: "08.13.97" }},{ "episode2":
---------------------^
Expecting 'EOF'
If you want an array of objects, you need an array in the data: [...].
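As a rough illustration, here is the same data wrapped in [...] so that it forms a single valid JSON document. The episode fields are shortened for brevity, and the sketch uses the core module JSON::PP with decode_json rather than the older JSON::jsonToObj call from the question.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

# The two objects now sit inside one top-level array.
my $var = '[
  { "episode1": { "title": "Cartman Gets an Anal Probe", "id": "103511" } },
  { "episode2": { "title": "Weight Gain 4000", "id": "103516" } }
]';

my $episodes = decode_json($var);    # an array reference with two elements

for my $entry (@$episodes) {
    for my $name ( sort keys %$entry ) {
        print "$name: $entry->{$name}{title}\n";
    }
}
```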
I have JSON that prints:
{"d":{"success":true,"drivers":[{"FIRST_NAME":"JOHN","LAST_NAME":"SMITH"},{"FIRST_NAME":"JANE","LAST_NAME":"DOE"}]}}
The names change depending on what was found in the database. I need to push this in the following format for each result returned in the JSON:
push(@$dummy_data, {'name' => 'testname', 'key' => 'somekey-1234'});
push(@$dummy_data, {'name' => 'testname2', 'key' => 'somekey-5678'});
So for this example it would be John Smith in place of testname and Jane Doe in place of testname2.
How would I do this so that each first and last name in the JSON gets pushed in the format above?
Let's try this new game
use strict; use warnings;
use JSON::XS;
use Data::Dumper;
# creating a reference to an empty ARRAY
my $dummy_data = [];
# creating $json string
my $json = '{"d":{"success":true,"drivers":[{"FIRST_NAME":"JOHN","LAST_NAME":"SMITH"},{"FIRST_NAME":"JANE","LAST_NAME":"DOE"}]}}';
# converting JSON -> Perl data structure
my $perl_hash = decode_json $json;
# feeding $dummy_data ARRAY ref with a HASH
push @$dummy_data, {
name => $perl_hash->{d}->{drivers}->[0]->{FIRST_NAME},
key => $perl_hash->{d}->{drivers}->[1]->{FIRST_NAME}
};
# print what we have finally
print Dumper $dummy_data;
Output
$VAR1 = [
{
'name' => 'JOHN',
'key' => 'JANE'
}
];
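To get one entry per driver, as the question asks, a loop over the drivers array is one option. This is only a sketch: it uses the core module JSON::PP, and since the question does not say where the key values come from, the "somekey-N" strings are placeholders.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );

my $json = '{"d":{"success":true,"drivers":[{"FIRST_NAME":"JOHN","LAST_NAME":"SMITH"},'
         . '{"FIRST_NAME":"JANE","LAST_NAME":"DOE"}]}}';

my $decoded    = decode_json($json);
my $dummy_data = [];

my $n = 0;
for my $driver ( @{ $decoded->{d}{drivers} } ) {
    $n++;
    push @$dummy_data, {
        name => "$driver->{FIRST_NAME} $driver->{LAST_NAME}",
        key  => "somekey-$n",    # placeholder, not taken from the JSON
    };
}

print "$_->{name} ($_->{key})\n" for @$dummy_data;
```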