Perl CSV to Hash of Arrays natively - csv

I'm trying to build an associative array that stores only unique keys from a CSV file, without using extra modules like Text::CSV.
An example text file:
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
I would like the data to be organized by dept to look like
$hash = {
dept=>[dept1, dept2, dept3]
}
and within each dept to have its respective emp and ids
So far, I have tried
my %hash;
while (<$fh>){
my @data = split(/,/, $fh);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
However, this does not seem to fill the arrays properly and instead just initializes the arrays with no data in them. I've looked all over for examples of how to do this but my searches always contain people mentioning Text::CSV

Your first problem is with this line
my @data = split(/,/, $fh);
You are splitting the filehandle, not the line read by the while statement. That line is stored in $_
Below is your code, changed to fix the split line. I'm also using the inline DATA filehandle to make it easier on myself. Finally, I've added a call to Data::Dumper to see what is getting stored in the hash.
use Data::Dumper ;
my %hash;
while (<DATA>){
my @data = split(/,/, $_);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
print "Hash is " . Dumper(\%hash);
__DATA__
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
Running that gives this, which shows the second issue -- you are including a newline in the id column
Hash is $VAR1 = {
'dept' => [
'dept1',
'dept2',
'dept1',
'dept2'
],
'emp' => [
'emp1',
'emp2',
'emp3',
'emp3'
],
'id' => [
'1090
',
'8920
',
'3213
',
'3234
'
]
};
Fix that with a call to chomp before the split line
use Data::Dumper ;
my %hash;
while (<DATA>){
chomp;
my @data = split(/,/, $_);
push @{$hash{$_}}, shift @data
for qw(emp dept id);
}
print "Hash is " . Dumper(\%hash);
__DATA__
emp1,dept1,1090
emp2,dept2,8920
emp3,dept1,3213
emp3,dept2,3234
The output is now
Hash is $VAR1 = {
'id' => [
'1090',
'8920',
'3213',
'3234'
],
'emp' => [
'emp1',
'emp2',
'emp3',
'emp3'
],
'dept' => [
'dept1',
'dept2',
'dept1',
'dept2'
]
};
That looks better, but you have duplicates in the hash. To deal with that, I'm going to store the data read from the CSV as a hash-of-hashes. That will get rid of the duplicates
my %hash;
my @cols = qw( emp dept id);
while (<DATA>)
{
chomp $_;
my @data = split /,/, $_ ;
for my $i (0 .. $#cols)
{
# Store as a hash of hashes
$hash{ $cols[$i] }{ $data[$i] } ++;
}
}
print "Hash is " . Dumper(\%hash);
That looks better - the duplicates are removed
Hash is $VAR1 = {
'dept' => {
'dept2' => 2,
'dept1' => 2
},
'emp' => {
'emp3' => 2,
'emp2' => 1,
'emp1' => 1
},
'id' => {
'3213' => 1,
'8920' => 1,
'1090' => 1,
'3234' => 1
}
};
Your requirement was to have a hash of arrays, so add a final step to dump the hash-of-hashes into the format you require
my %result;
for my $col (keys %hash)
{
push @{ $result{$col} }, sort keys %{ $hash{$col} } ;
}
print "Hash is " . Dumper(\%result);
That outputs this
Hash is $VAR1 = {
'dept' => [
'dept1',
'dept2'
],
'emp' => [
'emp1',
'emp2',
'emp3'
],
'id' => [
'1090',
'3213',
'3234',
'8920'
]
};
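If the goal is the layout described in the question, with each dept holding its own emp and id values, the same chomp-and-split loop can key the hash on the dept column instead. What follows is only a minimal sketch under that assumption: the sub name group_by_dept and the dept => { emp => [...], id => [...] } layout are illustrative choices that the question does not spell out, and a plain split on commas will not cope with quoted fields.

```perl
use strict;
use warnings;
use Data::Dumper;

# Group "emp,dept,id" lines by department (hypothetical helper).
# Returns a hash ref shaped like: dept => { emp => [...], id => [...] }
sub group_by_dept {
    my @lines = @_;
    my %by_dept;
    for my $line (@lines) {
        chomp $line;                      # drop the trailing newline
        next unless length $line;         # skip blank lines
        my ($emp, $dept, $id) = split /,/, $line;
        push @{ $by_dept{$dept}{emp} }, $emp;
        push @{ $by_dept{$dept}{id}  }, $id;
    }
    return \%by_dept;
}

my $by_dept = group_by_dept(
    "emp1,dept1,1090\n",
    "emp2,dept2,8920\n",
    "emp3,dept1,3213\n",
    "emp3,dept2,3234\n",
);

$Data::Dumper::Sortkeys = 1;              # stable key order in the dump
print Dumper($by_dept);
```

Because the values are pushed in file order, the emp and id arrays for a dept stay aligned by index.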

Related

Perl: hash from import JSON data, Dumper Outputs right data, However I can not access it

I have the following data in .json; actual values substituted.
{ "Mercury": [
{
"Long": "0.xxxxxx",
"LongP": "0.xxxxx",
"Eccent": "0.xxxx",
"Semi": "0.xxxx",
"Inclin": "0.xxxx",
"ascnode": "0.xx.xxxx",
"adia": "0.xxx",
"visual": "-0.xx"
}
]
}
This works fine:
my %data = ();
my $json = JSON->new();
my $data = $json->decode($json_text);
my $planet = "Mercury";
print Dumper $data; # prints:
This is all fine:
$VAR1 = {
'Mercury' => [
{
'Inclin' => '7.',
'Semi' => '0.8',
'adia' => '6.7',
'LongP' => '77.29',
'visual' => '-0.00',
'Long' => '60.000',
'Eccent' => '0.0000',
'ascnode' => '48.0000'
}
]
};
However when I try to access the hash:
my $var = $data{$planet}{Long};
I get empty values, why?
Problem 1
$data{$planet} accesses hash %data, but you populated scalar $data.
You want $data->{$planet} instead of $data{$planet}.
Always use use strict; use warnings;. It would have caught this error.
Problem 2
$data->{$planet} returns a reference to an array.
You want $data->{$planet}[0]{Long} (first element) or $data->{$planet}[-1]{Long} (last element) instead of $data->{$planet}{Long}. Maybe. An array suggests the number of elements isn't always going to be one, so you might want a loop.
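A loop over the array could look like the sketch below. It assumes the core JSON::PP module rather than JSON, uses a shortened JSON string with values taken from the Dumper output above, and introduces a hypothetical helper named longs_for purely for illustration.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, assumed here in place of JSON

# Shortened sample; the values are copied from the Dumper output above.
my $json_text = '{ "Mercury": [ { "Long": "60.000", "Semi": "0.8" } ] }';
my $data      = decode_json($json_text);   # $data is a hash reference
my $planet    = "Mercury";

# Collect the "Long" value of every element in the planet's array.
# An unknown planet yields an empty list instead of an error.
sub longs_for {
    my ($data, $planet) = @_;
    return map { $_->{Long} } @{ $data->{$planet} || [] };
}

print "Long: $_\n" for longs_for($data, $planet);
```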

Read CSV to parse data and store it in Hash

I have a CSV file, which contains data like below:
I want to parse the data from the above CSV file and store it in a hash initially, so that a Dumper of %hash would look like this:
$VAR1 = {
'1' => {
'Name' => 'Name1',
'Time' => '7/2/2020 11:00',
'Cell' => 'NCell1',
'PMR' => '1001',
'ISD' => 'ISDVAL1',
'PCO' => 'PCOVAL1'
},
'2' => {
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell2',
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'STD' => 'STDVAL2'
},
'3' => {
'Name' => 'Name3',
'Time' => '7/4/2020 20:15',
'Cell' => 'NCell3',
'PMR' => '1003',
'ISD' => 'ISDVAL3',
'MKR' => 'MKRVAL3'
},
};
Script is below:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
while (my $row = $csv->getline ($fh)) {
my @fields = @$row;
$hash{$fields[0]}{"Time"} = $fields[1];
$hash{$fields[0]}{"Name"} = $fields[2];
$hash{$fields[0]}{"Cell"} = $fields[3];
}
close $fh;
print Dumper(\%hash);
Here id is the key element in each line, and each value on that line should be stored under its respective name for that id.
The problem is that I can parse the data up to column D (Cell) with the script above, but after column D there is no header line. Instead, column E acts as the header and column F holds the value of that header for that id, and the same pattern repeats for the remaining columns to the end of the line. Some values can also be missing in the middle; for example, there is no MKR value for id 1.
How can I parse this data and store it in a hash so that the hash looks like the one above? TIA.
The changes made to the posted script were to skip the header line so that it does not form part of the result, and to add a for loop to set the rest of the data.
Test Data Used:
id,Time,Name,Cell,,,,,
1,7/2/2020 11:00,Name1,NCell1,PMR,1001,ISD,ISDVAL1
2,7/3/2020 13:10,Name2,NCell3,PMR,1002,PCO,PCOVAL2,MKR,MKRVAL2
Updated Script: (This was the first version; I suggest using the improved version in the edit.)
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]}{Time} = $row->[1];
$hash{$row->[0]}{Name} = $row->[2];
$hash{$row->[0]}{Cell} = $row->[3];
for (my $i = 4; $i < scalar (@{$row}); $i += 2) {
$hash{$row->[0]}{$row->[$i]} = $row->[$i + 1];
}
}
close $fh;
print Dumper(\%hash);
Output:
$VAR1 = {
'2' => {
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'PCO' => 'PCOVAL2',
'Cell' => 'NCell3',
'Time' => '7/3/2020 13:10',
'PMR' => '1002'
},
'1' => {
'Name' => 'Name1',
'ISD' => 'ISDVAL1',
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'PMR' => '1001'
}
};
Edit:
Thanks to a comment from @choroba, here is an improved version of the script. It sets the hash with all the additional row values first and then adds the first values (Time, Name, Cell) using the header line read from the file.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %hash;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", "input_file.csv" or die "input_file.csv: $!";
my $headers = $csv->getline ($fh);
while (my $row = $csv->getline ($fh)) {
$hash{$row->[0]} = { @$row[4 .. $#$row] };
@{$hash{$row->[0]}}{@$headers[1, 2, 3]} = @$row[1, 2, 3];
}
close $fh;
print Dumper(\%hash);
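The two slice expressions in that loop can be tried in isolation. The sketch below applies them to one literal row so that no CSV file or Text::CSV is needed; the sub name row_to_record is hypothetical, and the row is copied from the test data above.

```perl
use strict;
use warnings;

# Build one record from a header row and a data row (hypothetical helper).
sub row_to_record {
    my ($headers, $row) = @_;
    # Array slice: fields from index 4 onward are alternating key/value
    # pairs, so the list can initialise a hash directly.
    my %record = @$row[4 .. $#$row];
    # Hash slice: assign the fixed columns under their header names.
    @record{ @$headers[1, 2, 3] } = @$row[1, 2, 3];
    return \%record;
}

my $headers = [qw(id Time Name Cell)];
my $row     = ['1', '7/2/2020 11:00', 'Name1', 'NCell1',
               'PMR', '1001', 'ISD', 'ISDVAL1'];
my $record  = row_to_record($headers, $row);

print "$_ => $record->{$_}\n" for sort keys %$record;
```

Note that a row with an odd number of trailing fields would trigger an "Odd number of elements in hash assignment" warning, so this relies on the pairs being complete.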
There are some Text::CSV features that you can use to make this a bit simpler. There's a lot of readability to gain by removing density in the loop.
First, you can set the column names for missing header values. I don't know what those columns represent so I've called them K1, V1, and so on. You can substitute better names for them. How I do that isn't as important as that I do that. I'm using v5.26 because I'm using postfix dereferencing:
use v5.26;
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
If I knew the names, I could use those instead of numbers. I merely change the stuff in @kv_range:
my @kv_range = qw(machine test regression ice_cream);
And, when the data file changes, I handle all of that here. When it's outside the loop, there's much less to miss.
Now that I have all columns named, I use getline_hr to get back a hash reference of the line. The keys are the column names I just set. This does a lot of the work for you already. You have to handle the pairs at the end, but that's going to be easy too:
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
Now to handle the pairs at the end: I want to take the value in the column K1 and make it a key, then take the value in V1 and make that the value. At the same time, I need to remove those K1 and V1 columns. delete has the nice behavior in that it returns the value for the key you deleted. This way doesn't require any sort of pointer math or knowledge about positions. Those things might change and I've handled all of that before I got this far:
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
You could also do this in a couple steps if that statement is too much for you:
my( $key, $value ) = delete $row->@{ "K$_", "V$_" };
$row->{$key} = $value;
I'd leave the id column in there, but if you don't want it, get rid of it. Also, that step with the deletes might have made some empty string keys for the cells that had no values. Instead of guarding against that and making the foreach more complicated, I let it happen and get rid of it at the end:
delete $row->@{ 'id', '' };
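To see the delete idiom on its own, here is a small sketch that runs on a hand-written hash instead of a Text::CSV row. It is a variation rather than the exact code above: the helper name fold_pairs is made up, it uses the older @{$row}{...} slice syntax so it does not need v5.26, and it skips empty pairs up front instead of deleting the '' key afterwards.

```perl
use strict;
use warnings;

# Fold K1/V1, K2/V2, ... column pairs into real key/value entries
# (hypothetical helper; modifies and returns the same hash ref).
sub fold_pairs {
    my ($row, @kv_range) = @_;
    for my $n (@kv_range) {
        # delete on a hash slice returns the removed values in slice order.
        my ($key, $value) = delete @{$row}{ "K$n", "V$n" };
        next unless defined $key && length $key;   # skip empty pairs
        $row->{$key} = $value;
    }
    return $row;
}

my $row = fold_pairs(
    { id => 1, Name => 'Name1', K1 => 'PMR', V1 => '1001', K2 => '', V2 => '' },
    1 .. 2,
);
print join(', ', map { "$_=$row->{$_}" } sort keys %$row), "\n";
```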
Altogether, it looks like this. It's doing the same thing as Piet Bosch's answer, but I've pushed a lot of the complexity back into the module as well as doing a little pre-loop work:
use v5.26;
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
auto_diag => 1
});
open my $fh, "<:encoding(utf8)", "input_file.csv"
or die "input_file.csv: $!";
my $headers = $csv->getline($fh);
my @kv_range = 1 .. 4;
$headers->@[4..11] = map { ("K$_", "V$_") } @kv_range;
$csv->column_names( $headers );
my %Grand;
while( my $row = $csv->getline_hr($fh) ) {
foreach ( @kv_range ) {
no warnings 'uninitialized';
$row->{ delete $row->{"K$_"} } = delete $row->{"V$_"};
}
$Grand{ $row->{id} } = $row;
delete $row->@{ 'id', '' };
}
say Dumper( \%Grand );
And the output looks like this:
$VAR1 = {
'2' => {
'PMR' => '1002',
'PCO' => 'PCOVAL2',
'MKR' => 'MKRVAL2',
'Name' => 'Name2',
'Time' => '7/3/2020 13:10',
'Cell' => 'NCell3'
},
'1' => {
'Cell' => 'NCell1',
'Time' => '7/2/2020 11:00',
'ISD' => 'ISDVAL1',
'PMR' => '1001',
'Name' => 'Name1'
}
};

Perl - Accessing data from decode_json output

I supply my script with a file of JSON data. I have then decoded the JSON data using decode_json...
open my $fh, '<:encoding(UTF-8)', $file or die;
my $jsondata = do {local $/; <$fh> };
my $data = decode_json($jsondata);
#print Dumper $data
#I am trying to write a foreach loop in here to pull particular bits of
#the information out that I want to display (as detailed further below)
close $fh;
The Dumper output looks like this...
$VAR1 = [
{
'DataName' => 'FileOfPetsAcrossTheWorld',
'Information001' => [
{
'Name' => 'Steve',
'Sex' => 'Male',
'Age' => 24,
'Animals' => [
'Dog',
'Cat',
'Hamster',
'Parrot'
],
'Location' => 'London',
},
{
'Name' => 'Dave',
'Sex' => 'Male',
'Age' => 59,
'Animals' => [
'Fish',
'Horse',
'Budgie',
],
'Location' => 'Paris',
},
{
'Name' => 'Sandra',
'Sex' => 'Female',
'Age' => 44,
'Animals' => [
'Snake',
'Crocodile',
'Flamingo',
],
'Location' => 'Syndey',
}
]
}
];
I am trying to retrieve output from this data structure using a foreach loop so that I can print the output...
Dataname: FileOfPetsAcrossTheWorld
Name: Steve
Animals: Dog, Cat, Parrot, Hamster
Location: London
Name: Dave
Animals: Fish, Horse, Budgie
Location: Paris
Name: Sandra
Animals: Snake, Crocodile, Flamingo
Location: Sydey
I have tried various different foreach loops and hash referencing code snippets from online sources (and some that I have used and had working previously) to iterate through and pull data from hashes etc, but I cannot seem to get it working in this case. Amongst other errors, I receive errors such as 'Not a HASH reference at...'.
What is the correct method I should be using to pull this information out of this type of data structure?
use feature 'say';   # say is not enabled by default
for my $hash (@$data) {
say "Dataname: $hash->{DataName}";
for my $info (@{ $hash->{Information001} }) {
say "Name: $info->{Name}";
say 'Animals: ', join ', ', @{ $info->{Animals} };
say "Location: $info->{Location}";
say "";
}
}
Note that in your expected output the order of Animals is different for Steve, and Sydney is spelled "Sydey".

Accessing nested JSON elements in Perl

I get an error when attempting to access the contents of my JSON array.
Here is the contents of my JSON array assets.json:
[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]
Here is my code
#!/usr/bin/perl
use strict;
use warnings;
use JSON::XS;
use File::Slurp;
my $json_source = "assets.json";
my $json = read_file( $json_source ) ;
my $json_array = decode_json $json;
foreach my $item( @$json_array ) {
print $item->{id};
print "\n";
print $item->{interfaces}->{ip_addresses}->{value};
print "\n\n";
}
I get the expected output for $item->{id} but when accessing the nested element
I get the error "Not a HASH reference"
Data::Dumper is your friend here:
Trying this:
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::XS;
use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
my $json_array = decode_json ( do { local $/; <DATA> } );
print Dumper $json_array;
__DATA__
[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]
Gives:
[
{
'interfaces' => [
{
'ip_addresses' => [
{
'value' => '172.16.77.239'
}
]
}
],
'id' => 1002
},
{
'interfaces' => [
{
'ip_addresses' => [
{
'value' => '192.168.0.2'
}
]
}
],
'id' => 1003
}
]
Important point of note - you have nested arrays (the [] denotes array, the {} a hash).
So you can extract your thing with:
print $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};
Or as friedo notes:
Note that you may omit the -> operator after the first one, so $item->{interfaces}[0]{ip_addresses}[0]{value} will also work.
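The [0] index only reaches the first interface and its first address. If an asset can carry several of either, a nested map is one way to collect them all. This is a sketch only: it assumes the core JSON::PP module in place of JSON::XS (decode_json is used the same way), and the helper name ip_values is invented for the example.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, assumed here in place of JSON::XS

my $json = '[{"id":1002,"interfaces":[{"ip_addresses":[{"value":"172.16.77.239"}]}]},'
         . '{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]}]';

# Return every IP address value found under one asset (hypothetical helper).
# Missing "interfaces" or "ip_addresses" keys simply contribute nothing.
sub ip_values {
    my ($item) = @_;
    return map { $_->{value} }
           map { @{ $_->{ip_addresses} || [] } }
           @{ $item->{interfaces} || [] };
}

for my $item (@{ decode_json($json) }) {
    print "$item->{id}: ", join(', ', ip_values($item)), "\n";
}
```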

JSON Data output

Below is the Perl code having JSON data:
use Data::Dumper;
use JSON;
my $var = '{
"episode1": {
"title":"Cartman Gets an Anal Probe",
"id":"103511",
"airdate":"08.13.97",
"episodenumber":"101",
"available":"true",
"when":"08.13.97"
}
},
{
"episode2": {
"title":"Weight Gain 4000",
"id":"103516",
"airdate":"08.20.97",
"episodenumber":"102",
"available":"true",
"when":"08.20.97"
}
}';
my $resp = JSON::jsonToObj( $var );
print Dumper ($resp);
The output is:
$VAR1 = {
'episode1' => {
'when' => '08.13.97',
'episodenumber' => '101',
'airdate' => '08.13.97',
'title' => 'Cartman Gets an Anal Probe',
'id' => '103511',
'available' => 'true'
}
};
I am dumping JSON data, but only episode1 appears in the output. I want both episode1 and episode2 to be displayed when I dump. How do I do that?
Write valid JSON.
From JSON Lint
Parse error on line 14:
...: "08.13.97" }},{ "episode2":
---------------------^
Expecting 'EOF'
If you want an array of objects, you need an array in the data: [...].
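As a rough illustration, wrapping the two objects in [...] makes the text valid JSON, and both episodes then survive decoding. The sketch assumes the core JSON::PP module and its decode_json function rather than the older JSON::jsonToObj call from the question, and it keeps only two fields per episode for brevity.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, assumed here in place of JSON

# The two objects from the question, now inside a JSON array.
my $var = '[
  { "episode1": { "title": "Cartman Gets an Anal Probe", "id": "103511" } },
  { "episode2": { "title": "Weight Gain 4000",           "id": "103516" } }
]';

my $resp = decode_json($var);   # an array ref of hash refs

for my $entry (@$resp) {
    for my $name (sort keys %$entry) {
        print "$name: $entry->{$name}{title}\n";
    }
}
```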