Read CSV to parse data and store it in Hash - csv

I have a CSV file, which contains data like below:
I want to parse the data from the above CSV file and store it in a hash. My %hash, dumped with Data::Dumper, would look like this:
$VAR1 = {
'1' => {
'Name' => 'Name1',
'Time' => '7/2/2020 11:00',
'Cell' => 'NCell1',
'PMR' => '1001',
'ISD' => 'ISDVAL1',
'PCO' => 'PCOVAL1'
},
'2' => {
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell2',
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'STD' => 'STDVAL2'
},
'3' => {
'Name' => 'Name3',
'Time' => '7/4/2020 20:15',
'Cell' => 'NCell3',
'PMR' => '1003',
'ISD' => 'ISDVAL3',
'MKR' => 'MKRVAL3'
},
};
Script is below:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
while (my $row = $csv->getline ($fh)) {
my @fields = @$row;
$hash{$fields[0]}{"Time"} = $fields[1];
$hash{$fields[0]}{"Name"} = $fields[2];
$hash{$fields[0]}{"Cell"} = $fields[3];
}
close $fh;
print Dumper(\%hash);
Here id is the key element in each line, and each value should be stored under its respective name for that id.
The problem is that my script can parse the data only up to column D (Cell). After column D there is no header line: column E acts as the header and column F holds the value for that header for the particular id, and the same pattern repeats in pairs until the end of the line. Some values may also be missing; for example, there is no MKR value for id 1.
How can I parse this data and store it in a hash so that it looks like the one above? TIA.

The changes made to the posted script were to read the header line separately, so that it does not form part of the result, and to add a for loop that sets the rest of the data.
Test Data Used:
id,Time,Name,Cell,,,,,
1,7/2/2020 11:00,Name1,NCell1,PMR,1001,ISD,ISDVAL1
2,7/3/2020 13:10,Name2,NCell3,PMR,1002,PCO,PCOVAL2,MKR,MKRVAL2
Updated Script: (This was the first version; I suggest using the improved version in the edit.)
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]}{Time} = $row->[1];
$hash{$row->[0]}{Name} = $row->[2];
$hash{$row->[0]}{Cell} = $row->[3];
for (my $i = 4; $i < scalar (@{$row}); $i += 2) {
$hash{$row->[0]}{$row->[$i]} = $row->[$i + 1];
}
}
close $fh;
print Dumper(\%hash);
Output:
$VAR1 = {
'2' => {
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'PCO' => 'PCOVAL2',
'Cell' => 'NCell3',
'Time' => '7/3/2020 13:10',
'PMR' => '1002'
},
'1' => {
'Name' => 'Name1',
'ISD' => 'ISDVAL1',
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'PMR' => '1001'
}
};
Edit:
Thanks to a comment from @choroba, here is an improved version of the script. It first populates the hash with all the additional key/value pairs from the row, then adds the leading values (Time, Name, Cell) using the header line read from the file.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]} = { @$row[4 .. $#$row] };
@{$hash{$row->[0]}}{@$headers[1, 2, 3]} = @$row[1, 2, 3];
}
close $fh;
print Dumper(\%hash);
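The two slice expressions in the improved version are dense, so here is a minimal sketch of them in isolation. It assumes in-memory stand-ins ($headers and $row are filled by hand here instead of being read with Text::CSV) and that the trailing fields always come in complete key/value pairs:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the parsed header line and one data row.
my $headers = [ 'id', 'Time', 'Name', 'Cell' ];
my $row     = [ 1, '7/2/2020 11:00', 'Name1', 'NCell1', 'PMR', '1001', 'ISD', 'ISDVAL1' ];

my %hash;

# Fields 4 .. end are already key/value pairs, so the array slice can be
# turned directly into an anonymous hash.
$hash{ $row->[0] } = { @$row[ 4 .. $#$row ] };

# Hash-slice assignment: the keys come from the header, the values from the row.
@{ $hash{ $row->[0] } }{ @$headers[ 1, 2, 3 ] } = @$row[ 1, 2, 3 ];

# Print the resulting inner hash in sorted key order.
print join( ', ', map { "$_=$hash{1}{$_}" } sort keys %{ $hash{1} } ), "\n";
```

The order of the two statements matters: the first one replaces the whole inner hash, so the fixed columns have to be added afterwards.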

There are some Text::CSV features that you can use to make this a bit simpler. There's a lot of readability to gain by removing density in the loop.
First, you can set the column names for the missing header values. I don't know what those columns represent, so I've called them K1, V1, and so on. You can substitute better names for them. How I do that isn't as important as that I do it. I'm using v5.26 because I'm using postfix dereferencing:
use v5.26;
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
If I knew the names, I could use those instead of numbers. I merely change the stuff in @kv_range:
my @kv_range = qw(machine test regression ice_cream);
And, when the data file changes, I handle all of that here. When it's outside the loop, there's much less to miss.
Now that I have all columns named, I use getline_hr to get back a hash reference of the line. The keys are the column names I just set. This does a lot of the work for you already. You have to handle the pairs at the end, but that's going to be easy too:
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
Now to handle the pairs at the end: I want to take the value in the column K1 and make it a key, then take the value in V1 and make that the value. At the same time, I need to remove those K1 and V1 columns. delete has the nice behavior in that it returns the value for the key you deleted. This way doesn't require any sort of pointer math or knowledge about positions. Those things might change and I've handled all of that before I got this far:
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
You could also do this in a couple steps if that statement is too much for you:
my( $key, $value ) = delete $row->@{ "K$_", "V$_" };
$row->{$key} = $value;
I'd leave the id column in there, but if you don't want it, get rid of it. Also, that step with the deletes might have made some empty string keys for the cells that had no values. Instead of guarding against that and making the foreach more complicated, I let it happen and get rid of it at the end:
delete $row->@{ 'id', '' };
Altogether, it looks like this. It's doing the same thing as Piet Bosch's answer, but I've pushed a lot of the complexity back into the module as well as doing a little pre-loop work:
use v5.26;
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1
});
open my $fh, "<:encoding(utf8)", "input_file.csv"
or die "input_file.csv: $!";
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
say Dumper( \%Grand );
And the output looks like this:
$VAR1 = {
'2' => {
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell3'
},
'1' => {
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'ISD' => 'ISDVAL1',
'PMR' => '1001',
'Name' => 'Name1'
}
};

Related

How to access data on Perl Object structures

I have the following Perl code, in which I have a Perl structure as follows:
use Data::Dumper;
my %data = (
'status' => 200,
'message' => '',
'response' => {
'name' => 'John Smith',
'id' => '1abc579',
'ibge' => '3304557',
'uf' => 'XY',
'status' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' )
}
);
my $resp = $data{'status'};
print "Response is $resp \n";
print Dumper(%data->{'response'});
Getting the status field works; however, if I try something like this:
my $resp = $data{'response'};
I get Response is HASH(0x8b6640)
So I'm wondering if there's a way to extract all the data of the 'response' field the same way I can for 'status', without getting that HASH...
I've tried all sorts of combinations when accessing the data, but I still get the HASH back when I try to get the content of 'response'.
$data{'response'} is the correct way to access that field on a hash called %data. It's returning a hash reference, which prints out by default in the (relatively unhelpful) HASH(0x8b6640) syntax you've seen. But if you pass that reference to Dumper, it'll show you everything.
print Dumper($data{'response'});
To actually access those subfields, you need to dereference, which is done with the -> operator.
print $data{'response'}->{'name'};
The first access doesn't need the -> because you're accessing a field on a hash variable (i.e. a variable with the % sigil). The second one does because you're dereferencing a reference, which, at least in spirit, has the $ sigil like other scalars.
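To make the difference between the two kinds of access concrete, here is a small sketch. It assumes a cut-down version of the %data structure from the question (only a couple of fields, and no JSON::PP::Boolean object):

```perl
use strict;
use warnings;

# Cut-down, hypothetical version of the structure from the question.
my %data = (
    status   => 200,
    response => { name => 'John Smith', uf => 'XY' },
);

# %data is a real hash, so the first level is accessed without an arrow.
my $response = $data{response};    # this is a hash reference

# $response is a reference, so its fields need the arrow ...
my $name = $response->{name};

# ... although between two adjacent subscripts the arrow is optional.
my $uf = $data{response}{uf};

# Dereference the whole reference with %$response to iterate over it.
for my $key ( sort keys %$response ) {
    print "$key: $response->{$key}\n";
}
```

Printing $response itself would still give the HASH(0x...) form; it is the individual fields, or the dereferenced hash, that give you the data.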
Thanks for your posts. I fixed the code as follows:
use Data::Dumper;
my %data = (
'status' => 200,
'message' => '',
'response' => {
'name' => 'John Smith',
'id' => '1abc579',
'ibge' => '3304557',
'uf' => 'XY',
'status' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' )
}
);
my $resp = $data{'response'};
print Dumper($resp);
Now it works like a charm, and I'm able to get the data I want.

Perl CSV to Hash of Arrays natively

I'm trying to build an associative array from a csv file that stores only unique keys. All without using extra features like Text::CSV
An example text file:
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
I would like the data to be organized by dept to look like
$hash = {
dept=>[dept1, dept2, dept3]
}
and within each dept to have its respective emp and ids
So far, I have tried
my %hash;
while (<$fh>){
my @data = split(/,/, $fh);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
However, this does not seem to fill the arrays properly and instead just initializes the arrays with no data in them. I've looked all over for examples of how to do this but my searches always contain people mentioning Text::CSV
Your first problem is with this line:
my @data = split(/,/, $fh);
You are splitting the filehandle, not the line read by the while statement. That line is stored in $_.
Below is your code, changed to fix the split line. I'm also using the inline DATA filehandle to make it easier on myself. Finally, I've added a call to Data::Dumper to see what is getting stored in the hash.
use Data::Dumper ;
my %hash;
while (<DATA>){
my @data = split(/,/, $_);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
print "Hash is " . Dumper(\%hash);
__DATA__
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
Running that gives this, which shows the second issue -- you are including a newline in the id column
Hash is $VAR1 = {
'dept' => [
'dept1',
'dept2',
'dept1',
'dept2'
],
'emp' => [
'emp1',
'emp2',
'emp3',
'emp3'
],
'id' => [
'1090
',
'8920
',
'3213
',
'3234
'
]
};
Fix that with a call to chomp before the split line
use Data::Dumper ;
my %hash;
while (<DATA>){
chomp;
my @data = split(/,/, $_);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
print "Hash is " . Dumper(\%hash);
__DATA__
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
output is now
Hash is $VAR1 = {
'id' => [
'1090',
'8920',
'3213',
'3234'
],
'emp' => [
'emp1',
'emp2',
'emp3',
'emp3'
],
'dept' => [
'dept1',
'dept2',
'dept1',
'dept2'
]
};
That looks better, but you have duplicates in the hash. To deal with that, I'm going to store the data read from the CSV as a hash-of-hashes. That will get rid of the duplicates
my %hash;
my @cols = qw( emp dept id);
while (<DATA>)
{
chomp $_;
my @data = split /,/, $_ ;
for my $i (0 .. @cols-1)
{
# Store as a hash of hashes
$hash{ $cols[$i] }{ $data[$i] } ++;
}
}
print "Hash is " . Dumper(\%hash);
That looks better - the duplicates are removed
Hash is $VAR1 = {
'dept' => {
'dept2' => 2,
'dept1' => 2
},
'emp' => {
'emp3' => 2,
'emp2' => 1,
'emp1' => 1
},
'id' => {
'3213' => 1,
'8920' => 1,
'1090' => 1,
'3234' => 1
}
};
Your requirement was to have a hash of arrays, so add a final step to dump the hash-of-hashes into the format you require
my %result;
for my $col (keys %hash)
{
push @{ $result{$col} }, sort keys %{ $hash{$col} } ;
}
print "Hash is " . Dumper(\%result);
That outputs this
Hash is $VAR1 = {
'dept' => [
'dept1',
'dept2'
],
'emp' => [
'emp1',
'emp2',
'emp3'
],
'id' => [
'1090',
'3213',
'3234',
'8920'
]
};

Perl: hash from import JSON data, Dumper Outputs right data, However I can not access it

I have the following data in .json; actual values substituted.
{ "Mercury": [
{
"Long": "0.xxxxxx",
"LongP": "0.xxxxx",
"Eccent": "0.xxxx",
"Semi": "0.xxxx",
"Inclin": "0.xxxx",
"ascnode": "0.xx.xxxx",
"adia": "0.xxx",
"visual": "-0.xx"
}
]
}
This works fine:
my %data = ();
my $json = JSON->new();
my $data = $json->decode($json_text);
my $planet = "Mercury";
print Dumper $data; # prints:
This is all fine:
$VAR1 = {
'Mercury' => [
{
'Inclin' => '7.',
'Semi' => '0.8',
'adia' => '6.7',
'LongP' => '77.29',
'visual' => '-0.00',
'Long' => '60.000',
'Eccent' => '0.0000',
'ascnode' => '48.0000'
}
]
};
However when I try to access the hash:
my $var = $data{$planet}{Long};
I get empty values, why?
Problem 1
$data{$planet} accesses hash %data, but you populated scalar $data.
You want $data->{$planet} instead of $data{$planet}.
Always use use strict; use warnings;. It would have caught this error.
Problem 2
$data->{$planet} returns a reference to an array.
You want $data->{$planet}[0]{Long} (first element) or $data->{$planet}[-1]{Long} (last element) instead of $data->{$planet}{Long}. Maybe. An array suggests the number of elements isn't always going to be one, so you might want a loop.
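If there can be more than one element in the array, a loop along these lines might be what you want. This is only a sketch: it uses the core JSON::PP module rather than the JSON module from the question, and the JSON text and its values are made-up placeholders:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample document; the values are placeholders.
my $json_text = '{ "Mercury": [ { "Long": "60.000", "Semi": "0.8" } ] }';

my $data   = decode_json($json_text);    # a hash reference, not a hash
my $planet = 'Mercury';

# $data->{$planet} is an array reference, so dereference it and visit
# every element instead of assuming there is exactly one.
my @longs;
for my $entry ( @{ $data->{$planet} } ) {
    push @longs, $entry->{Long};
    print "$planet Long: $entry->{Long}\n";
}
```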

Creating hash of hashes from list of network interface config files in perl

I'm trying to load the list of network interface configuration files on Linux into the hash of hashes and further encode them into JSON. This is the code that I'm using:
#!/usr/bin/env perl
use strict;
use diagnostics;
use JSON;
use Data::Dumper qw(Dumper);
opendir (DIR, "/etc/sysconfig/network-scripts/");
my @configs =grep(/^ifcfg-*/, readdir(DIR));
my $output = "metadata/json_no_comment";
my %configuration;
my $key;
my $value;
my %temp_hash;
foreach my $input ( @configs) {
$input= "/var/tmp/rhel6.8/" . $input;
open (my $JH, '<', $input) or die "Cannot open the input file $!\n";
while (<$JH>) {
s/#.*$//g;
next if /^\s*#/;
next if /^$/;
for my $field (split ) {
($key, $value) = split /\s*=\s*/, $field;
$temp_hash{$key} = $value;
}
$configuration{$input} = \%temp_hash;
}
close $JH;
}
print "-----------------------\n";
print Dumper \%configuration;
print "-----------------------\n";
my $json = encode_json \%configuration;
open (my $JNH, '>', $output) or die "Cannot open the output file $!\n";
print $JNH $json;
close $JNH;
The data structure, that I'm getting is following:
$VAR1 = {
'/etc/sysconfig/network-scripts/ifcfg-lo' => {
'BOOTPROTO' => 'dhcp',
'NAME' => 'loopback',
'TYPE' => 'Ethernet',
'IPV6INIT' => 'yes',
'HWADDR' => '"52:54:00:65:e7:8c"',
'DEVICE' => 'lo',
'NETBOOT' => 'yes',
'NETMASK' => '255.0.0.0',
'BROADCAST' => '127.255.255.255',
'IPADDR' => '127.0.0.1',
'NETWORK' => '127.0.0.0',
'ONBOOT' => 'yes'
},
'/etc/sysconfig/network-scripts/ifcfg-eth0' => $VAR1->{'/etc/sysconfig/network-scripts/ifcfg-lo'}
};
The data structure, I'm looking for is the following:
$VAR1 = {
'/etc/sysconfig/network-scripts/ifcfg-lo' => {
'BOOTPROTO' => 'dhcp',
'NAME' => 'loopback',
'TYPE' => 'Ethernet',
'IPV6INIT' => 'yes',
'HWADDR' => '"52:54:00:65:e7:8c"',
'DEVICE' => 'lo',
'NETBOOT' => 'yes',
'NETMASK' => '255.0.0.0',
'BROADCAST' => '127.255.255.255',
'IPADDR' => '127.0.0.1',
'NETWORK' => '127.0.0.0',
'ONBOOT' => 'yes'
},
'/etc/sysconfig/network-scripts/ifcfg-eth0' => {
'BOOTPROTO' => 'dhcp',
'NAME' => '"eth0"',
'TYPE' => 'Ethernet',
'IPV6INIT' => 'yes',
'HWADDR' => '"52:54:00:65:e7:8c"',
'NETBOOT' => 'yes',
'ONBOOT' => 'yes'
}
};
Any idea what I am doing wrong? Why is the first nested hash created correctly and the second one not? I suspect that it has something to do with reading the files line by line, but I have to do it that way, because I need to filter out the commented lines before the JSON conversion.
Thanks for any help.
Edit: I have modified the script as suggested by Borodin and it works. Thanks!
The problem is that $configuration{$input} always refers to the same hash, %temp_hash, because you have declared it at file level. You need to create a new hash for each config file by declaring %temp_hash inside the for loop.
Also note that next if /^\s*#/ can have no effect, because you have just deleted everything from the first # character onwards. Your sanitisation should look like
s/#.*//;
next unless /\S/;
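Putting both fixes together, the loop could be sketched roughly like this. To keep it self-contained it assumes two made-up config files held in memory (the %files hash and its contents are hypothetical) instead of reading from /etc/sysconfig/network-scripts/:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for two ifcfg files.
my %files = (
    'ifcfg-lo'   => "DEVICE=lo\n# a comment\nONBOOT=yes\n",
    'ifcfg-eth0' => "NAME=\"eth0\"   # trailing comment\n\nONBOOT=yes\n",
);

my %configuration;

for my $input ( sort keys %files ) {
    # Open the string as a file; real code would open the path instead.
    open my $fh, '<', \$files{$input} or die "Cannot open $input: $!\n";

    my %temp_hash;    # declared inside the loop: a new hash for every file

    while (<$fh>) {
        s/#.*//;             # strip comments
        next unless /\S/;    # skip lines that are now blank
        for my $field (split) {
            my ( $key, $value ) = split /\s*=\s*/, $field;
            $temp_hash{$key} = $value;
        }
    }
    close $fh;

    # Each file now gets a reference to its own hash.
    $configuration{$input} = \%temp_hash;
}
```

Because %temp_hash is a fresh lexical on every iteration, each entry in %configuration points at its own hash, and Dumper no longer shows the second entry as a reference back to the first.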

Perl - from JSON to object/ hash

I have the code below:
#!/usr/intel/bin/perl
use strict;
use warnings;
use JSON::XS;
my $json = '{"Object1":{"Year":"2012","Quarter":"Q3","DataType":"Other 3","Environment":"STEVE","Amount":125},"Object2":{"Year":"2012","Quarter":"Q4","DataType":"Other 2","Environment":"MIKE","Amount":500}}';
my $arrayref = decode_json $json;
for my $array(@$arrayref){
for my $key (keys(%$array)){
my $val = $array->{$key};
print "$key: $val\n";
}
}
When I run it, it prints the error "Not an ARRAY reference at generator.pl line 12.".
I want to parse the JSON to an object and get data according to the object with the attributes. How can I do it?
I expect after I parse it, I can use to compare string, print, loop it and so on.
It is not an array reference; it is a hash reference:
#!/usr/intel/bin/perl
use strict;
use warnings;
use JSON::XS;
use Data::Dumper;
my $json = '{"Object1":{"Year":"2012","Quarter":"Q3","DataType":"Other 3","Environment":"STEVE","Amount":125},"Object2":{"Year":"2012","Quarter":"Q4","DataType":"Other 2","Environment":"MIKE","Amount":500}}';
my $arrayref = decode_json $json;
print Data::Dumper->Dump([$arrayref], [qw(arrayref)]);
And output:
$arrayref = {
'Object2' => {
'Quarter' => 'Q4',
'Year' => '2012',
'Amount' => 500,
'DataType' => 'Other 2',
'Environment' => 'MIKE'
},
'Object1' => {
'Amount' => 125,
'DataType' => 'Other 3',
'Year' => '2012',
'Environment' => 'STEVE',
'Quarter' => 'Q3'
}
};
There are no arrays there; it is a hash of hashes.
my $hashref = decode_json $json;
for my $object_name (sort keys %$hashref){
print "In $object_name:\n";
for my $key (sort keys %{ $hashref->{$object_name} }){
my $val = $hashref->{$object_name}{$key};
print "$key: $val\n";
}
}