DBI convert fetched arrayref to hash - mysql

I'm trying to write a program to fetch a big MySQL table, rename some fields and write it to JSON. Here is what I have for now:
use strict;
use JSON;
use DBI;

# here goes some statement preparations and db initialization

my $rowcache;
my $max_rows       = 1000;
my $LIMIT_PER_FILE = 100000;

while ( my $res = shift( @$rowcache )
    || shift( @{ $rowcache = $sth->fetchall_arrayref( undef, $max_rows ) } ) ) {

    if ( $cnt % $LIMIT_PER_FILE == 0 ) {
        if ( $f ) {
            print "CLOSE $fname\n";
            close $f;
        }
        $filenum++;
        $fname = "$BASEDIR/export-$filenum.json";
        print "OPEN $fname\n";
        open $f, ">$fname";
    }

    $res->{some_field} = $res->{another_field};
    delete $res->{another_field};
    print $f $json->encode( $res ) . "\n";
    $cnt++;
}
I used the database row-caching technique from
Speeding up the DBI
and everything seems good.
The only problem I have for now is that on $res->{some_field} = $res->{another_field}, the interpreter complains and says that $res is Not a HASH reference.
Could anybody point me to my mistake?

If you want fetchall_arrayref to return an array of hashrefs, the first parameter should be a hashref. Otherwise an array of arrayrefs is returned, resulting in the "Not a HASH reference" error. So, in order to return full rows as hashrefs, simply pass an empty hashref:
$rowcache = $sth->fetchall_arrayref( {}, $max_rows );
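For completeness, here is the loop from the question with just that one change applied (a sketch; $sth and the JSON/file handling are assumed to be set up as in the question):

my $rowcache;
my $max_rows = 1000;

while ( my $res = shift( @$rowcache )
    || shift( @{ $rowcache = $sth->fetchall_arrayref( {}, $max_rows ) } ) ) {

    # each $res is now a hashref keyed by column name,
    # so hash-style access works
    $res->{some_field} = $res->{another_field};
    delete $res->{another_field};
}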

Related

Duplicate records in CSV - how to ignore duplicates whose values are identical in a hash and warn only for different values in Perl

The following code checks for duplicates in a CSV file where the TO column is "USD". I need your help to figure out how to compare the resulting duplicate values: if the duplicates carry the same value, as in the case below, Perl should not give any warning. The Perl file name is Source; just change the directory and run it.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use List::MoreUtils qw/ uniq /;

my %seen = ();
my @uniq = ();
my %uniq;
my %data;
my %dupes;
my @rows;

my $csv = Text::CSV->new()
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, "<", 'D:\Longview\ENCDEVD740\DataServers\ENCDEVD740\lvaf\inbound\data\enc_meroll_fxrate_soa_load.csv'
    or die "Cannot use CSV: $!";

while ( my $row = $csv->getline( $fh ) ) {
    # insert row into row list
    push @rows, $row;
    # join the unique keys with the
    # perl 'multidimensional array emulation'
    # subscript character
    my $key = join( $;, @{$row}[0,1] );
    # if it was just one field, just use
    # my $key = $row->[$keyfieldindex];
    # if you were checking for full line duplicates (header lines):
    # my $key = join($;, @$row);
    # if %data has an entry for the record, add it to dupes
    #print "@{$row}\n ";
    if (exists $data{$key}) { # duplicate
        # if it isn't already duplicated
        # add this row and the original
        if (not exists $dupes{$key}) {
            push @{$dupes{$key}}, $data{$key};
        }
        # add the duplicate row
        push @{$dupes{$key}}, $row;
    } else {
        $data{ $key } = $row;
    }
}
$csv->eof or $csv->error_diag();
close $fh;

# print out duplicates:
warn "Duplicate Values:\n";
warn "-----------------\n";
foreach my $key (keys %dupes) {
    my @keys = split($;, $key);
    if (($keys[1] ne 'USD') or ($keys[0] eq 'FROMCURRENCY')) {
        #print "Rejecting record since duplicate records are for Outofscope currencies\n";
        #print "\$keys[0] = $keys[0]\n";
        #print "\$keys[1] = $keys[1]\n";
        next;
    }
    else {
        print "Key: @keys\n";
        foreach my $dupe (@{$dupes{$key}}) {
            print "\tData: @$dupe\n";
        }
    }
}
Sample data:
FROMCURRENCY,TOCURRENCY,RATE
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
AUD,USD,0.68635
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
Different Values for duplicates
Like @Håkon wrote, it seems like all your duplicates are in fact the same rate, so they should not be considered duplicates. However, it could be an idea to store the rate in a hash mapped to each from- and to-currency. That way you don't need to check for duplicates on every iteration and can rely on the uniqueness of the hash.
It's great that you use a proper CSV parser, but here's an example using a single hash to keep track of duplicates, just splitting on , since the data seems reliable.
#!/usr/bin/env perl
use warnings;
use strict;

my $result = {};
my $format = "%-4s | %-4s | %s\n";

while ( my $line = <DATA> ) {
    chomp $line;
    my ( $from, $to, $rate ) = split( /,/x, $line );
    $result->{$from}->{$to}->{$rate} = 1;
}

printf( $format, "FROM", "TO", "RATES" );
printf( "%s\n", "-" x 40 );

foreach my $from ( keys %$result ) {
    foreach my $to ( keys %{ $result->{$from} } ) {
        my @rates = keys %{ $result->{$from}->{$to} };
        next if @rates < 2;
        printf( $format, $from, $to, join( ", ", @rates ) );
    }
}

__DATA__
AED,USD,0.272257011
ANG,USD,0.557584545
ANG,USD,1.557584545
ARS,USD,0.01421147
ARS,USD,0.01421147
ARS,USD,0.01421147
AUD,USD,0.68635
AUD,USD,1.68635
AUD,USD,2.68635
I changed the test data to contain duplicates with the same rate and with different rates, and the result prints:
FROM | TO | RATES
----------------------------------------
ANG | USD | 1.557584545, 0.557584545
AUD | USD | 1.68635, 0.68635, 2.68635

Can I use Text::CSV_XS to parse a csv-format string without writing it to disk?

I am getting a "csv file" from a vendor (using their API), but what they do is just spew the whole thing into their response. It wouldn't be a significant problem except that, of course, some of those pesky humans entered the data and put in "features" like line breaks. What I am doing now is creating a file for the raw data and then reopening it to read the data:
open RAW, ">", $rawfile or die "ERROR: Could not open $rawfile for write: $!\n";
print RAW $response->content;
close RAW;

my $csv = Text::CSV_XS->new({ binary => 1, always_quote => 1, eol => $/ });
open my $fh, "<", $rawfile or die "ERROR: Could not open $rawfile for read: $!\n";
while ( my $line = $csv->getline($fh) ) { ...
Somehow this seems ... inelegant. It seems that I ought to be able to just read the data from $response->content (a multiline string) as if it were a file, but I'm drawing a total blank on how to do this.
A pointer would be greatly appreciated.
Thanks,
Paul
You could use a string filehandle:
use Carp;    # provides croak

my $data = $response->content;
open my $fh, "<", \$data or croak "unable to open string filehandle: $!";
my $csv = Text::CSV_XS->new({ binary => 1, always_quote => 1, eol => $/ });
while ( my $line = $csv->getline($fh) ) { ... }
Yes, you can use Text::CSV_XS on a string, via its functional interface:
use warnings;
use strict;
use feature 'say';

use Text::CSV_XS qw(csv);    # must use the _XS version

my $csv = qq(a,line\nand,another);

my $aoa = csv(in => \$csv)
    or die Text::CSV_XS->error_diag;

say "@$_" for @$aoa;
Note that this indeed needs Text::CSV_XS (normally Text::CSV works but not with this).
I don't know why this isn't available in the OO interface (or perhaps is but is not documented).
While the above parses the string directly, as asked, you can also lessen the "inelegant" aspect of your example by writing the content directly to a file as it's acquired, which most libraries support, for example with the :content_file option of LWP::UserAgent's get method.
Let me also note that most of the time you want the library to decode the content, so with LWP::UserAgent use decoded_content (see HTTP::Response).
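A minimal sketch of that approach, assuming $url and $rawfile are set as in the question (:content_file streams the response body straight to disk instead of keeping it in memory):

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get( $url, ':content_file' => $rawfile );
die "GET failed: ", $res->status_line unless $res->is_success;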
I cooked up this example with Mojo::UserAgent. For the CSV input I used various data sets from the NYC Open Data. This is also going to appear in the next update for Mojo Web Clients.
I build the request without making the request right away, and that gives me the transaction object, $tx. I can then replace the read event so I can immediately send the lines into Text::CSV_XS:
#!perl
use v5.10;

use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new;
my $url = ...;

my $tx = $ua->build_tx( GET => $url );

$tx->res->content->unsubscribe('read')->on( read => sub {
    state $csv = do {
        require Text::CSV_XS;
        Text::CSV_XS->new;
    };
    state $buffer;
    state $reader = do {
        open my $r, '<:encoding(UTF-8)', \$buffer;
        $r;
    };

    my ( $content, $bytes ) = @_;
    $buffer .= $bytes;
    while ( my $row = $csv->getline($reader) ) {
        say join ':', $row->@[2,4];
    }
} );

$tx = $ua->start($tx);
That's not as nice as I'd like it to be, because all the data still show up in the buffer. This is slightly more appealing, but it's fragile in the ways I note in the comments. I'm too lazy at the moment to make it any better, because it gets hairy very quickly once you have to figure out when you have enough data to process a record. My particular code isn't as important as the idea that you can do whatever you like as the transactor reads data and passes it into the content handler:
use v5.10;

use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new;
my $url = ...;

my $tx = $ua->build_tx( GET => $url );
$tx->res->content
    ->unsubscribe('read')
    ->on( read => process_bytes_factory() );

$tx = $ua->start($tx);

sub process_bytes_factory {
    return sub ( $content, $bytes ) {
        state $csv = do {
            require Text::CSV_XS;
            Text::CSV_XS->new( { decode_utf8 => 1 } );
        };
        state $buffer  = '';
        state $line_no = 0;

        $buffer .= $bytes;

        # fragile if the entire content does not end in a
        # newline (or whatever the line ending is)
        my $last_line_incomplete = $buffer !~ /\n\z/;

        # will not work if the format allows embedded newlines
        my @lines = split /\n/, $buffer;
        $buffer = pop @lines if $last_line_incomplete;

        foreach my $line ( @lines ) {
            my $status = $csv->parse($line);
            my @row    = $csv->fields;
            say join ':', $line_no++, @row[2,4];
        }
    };
}

Calculations cannot be performed with values passed from two different subroutines to a new subroutine: Perl

I used an anonymous hash to pass values from two different subroutines to a new subroutine, but now I'm not able to perform calculations using the passed variables.
use warnings;
use strict;
use feature 'say';
use DBI;
use autodie;
use Data::Dumper;
use CGI;

print "Enter sequence";
my $seq = <STDIN>;
chomp $seq;
$len = length $seq;
my $f = nuc($seq);
perc({ len => $len });

sub nuc {
    my ($c) = @_;
    chomp $c;
    my $len = length $c;
    for (my $i = 0; $i< = $len; $i++) {
        my $seq2 = substr($c, $i, 1);
        $nuc = $nuc . $seq2;
        chomp $nuc;
    }
    my $l = perc({nuc => $nuc});
}

sub perc {
    my $params = shift;
    my $k = $params->{nuc};
    my $w = $params->{len};

    my $db = "hnf1a";
    my $user = "root";
    my $password = "";
    my $host = "localhost";
    my $dbh = DBI->connect("DBI:mysql:database=$db:$host", $user, $password);
    my $sth = $dbh->prepare('SELECT COUNT(*) FROM mody where nm = ?');
    for (1..100) {
        $sth->execute(int(rand(10)));
    }

    chomp (my $input = $k);
    my @num = split /':'/, $input;
    for my $num (@num) {
        say "rows matching input nuc <$num>:";
        $sth->execute($num);
        my $count = $sth->fetchrow_array;
        say "$count";
        $u += $count;
    }
}

$h = $u / $w;
print $h;
I passed the variables $nuc and $len to the last subroutine, perc, in an anonymous hash.
When I use these variables to perform calculations I don't get a proper answer.
For the division above I got an "Illegal division by zero" error.
Please help me out. Thanks in advance.
You are making two separate calls to perc, each with only one of the required values in the hash. You can't do that: the subroutine won't "remember" a value passed to it across separate calls unless you write the code to do that.
You need to collect all the values and pass them in a single call to perc.
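One hypothetical shape for this (a sketch using the question's variable names, and assuming nuc() is changed to return the assembled string):

my $nuc = nuc($seq);                              # have nuc() return the string
my $h   = perc({ nuc => $nuc, len => $len });     # perc() gets both keys in one call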
There are rather a lot of misunderstandings here. Let's go through your code.
use CGI;
Using CGI.pm is a bit dated, but it's not a terrible idea if you're writing a CGI program. But this isn't a CGI program, so this isn't necessary.
print "Enter sequence";
my $seq = <STDIN>;
chomp $seq;
$len = length $seq;
my $f = nuc($seq);
This looks OK. You prompt the user, get some input, remove the newline from the end of the input, get the length of the input and then pass your input into nuc().
So, let's look at nuc() - which could probably have a better name!
sub nuc {
    my ($c) = @_;
    chomp $c;
    my $len = length $c;
    for (my $i = 0; $i< = $len; $i++) {
        my $seq2 = substr($c, $i, 1);
        $nuc = $nuc . $seq2;
        chomp $nuc;
    }
    my $l = perc({nuc => $nuc});
}
You get the parameter that has been passed in and remove the newline from the end of it (which does nothing as this is $seq which has already had its newline removed). You then get the length of this string (again!)
Then it gets very strange. Firstly, there's a syntax error (< = should be <=). Then you use a C-style for loop together with substr() to... well, basically you just copy $c to $nuc in a really inefficient manner. So this subroutine could be written as:
sub nuc {
    my ($c) = @_;
    $nuc = $c;
    my $l = perc({ nuc => $nuc });
}
Oh, and I don't know why you chomp($nuc) each time round the loop.
Two more strange things. Firstly, you don't declare $nuc anywhere, and you have use strict in your code. Which means that this code doesn't even compile. (Please don't waste our time with code that doesn't compile!) And secondly, you don't explicitly return a value from nuc(), but you store the return value in $f. Because of the way Perl works, this subroutine will return the value in $l. But it's best to be explicit.
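For instance, a sketch of the simplified sub above with an explicit return (which also removes the need for the undeclared $nuc):

sub nuc {
    my ($c) = @_;
    return perc({ nuc => $c });    # return the value explicitly rather than implicitly
}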
Then there's your perc() subroutine.
sub perc {
    my $params = shift;
    my $k = $params->{nuc};
    my $w = $params->{len};

    my $db = "hnf1a";
    my $user = "root";
    my $password = "";
    my $host = "localhost";
    my $dbh = DBI->connect("DBI:mysql:database=$db:$host", $user, $password);
    my $sth = $dbh->prepare('SELECT COUNT(*) FROM mody where nm = ?');
    for (1..100) {
        $sth->execute(int(rand(10)));
    }

    chomp (my $input = $k);
    my @num = split /':'/, $input;
    for my $num (@num) {
        say "rows matching input nuc <$num>:";
        $sth->execute($num);
        my $count = $sth->fetchrow_array;
        say "$count";
        $u += $count;
    }
}
You get the hash ref which is passed in and store it in $params. You then extract the nuc and len values from that hash and store them in variables called $k and $w (you really need to improve your variable and subroutine names!). But each call to perc has only one of those values set, so only one of your two variables gets a value; the other will be undef.
So then you connect to the database. And you run a select query a hundred times passing in random integers between 0 and 9. And ignore the value returned from the select statement. Which is bizarre and pointless.
Eventually, you start doing something with one of your input parameters, $k (the other, $w, is completely ignored). You copy it into another scalar variable before splitting it into an array. You then run the same SQL select statement once for each element in that array and add the number you get back to the running total in $u. And $u is another variable that you never declare, so (once again) this code doesn't compile.
Outside of your subroutines, you then do some simple maths with $u (an undeclared variable) and $w (a variable that was declared in a different scope) and store the result in $h (another undeclared variable).
I really don't understand what this code is supposed to do. And, to be honest, I don't think you do too. If you're at school, then you need to go back to your teacher and say that you have no idea what you are doing. If you're in a job, you need to tell your boss that you're not the right person for this task.
Either way, if you want to be a programmer, you need to go right back to the start and cover the very basics again.

Perl JSON issue when string starts with [ and not {

Hope some Perl gurus out there can help me out here. Basically, my issue is that when a JSON string starts with "[" instead of "{", Perl doesn't treat the variable as a hash after I use decode_json.
Here's a sample code.
#!/usr/bin/perl
use JSON;
use Data::Dumper;
$string1 = '{"Peti Bar":{"Literature":88,"Mathematics":82,"Art":99},"Foo Bar":{"Literature":67,"Mathematics":97}}';
$string = '[{"ActionID":5,"ActionName":"TEST- 051017"},{"ActionID":10,"ActionName":"Something here"},{"ActionID":13,"ActionName":"Some action"},{"ActionID":141,"ActionName":"Email Reminder"}]';
print "First string that starts with \"{\" below:\n$string1\n\n";
my $w = decode_json $string1;
my $count = keys %$w;
print "printing \$count's value -> $count\n\n";
print "Second string starts with \"[\" below:\n$string\n\n";
my $x = decode_json $string;
my $count2 = keys %$x;
print "printing \$count2's value -> $count2\n\n";
Both $w and $x work, though. It's just that I have to use keys $x instead of keys %$x on the second JSON string.
Now, the issue with using that is I get a "keys on reference is experimental at tests/jsontest.pl" warning. It won't stop the script, but I'm worried about future compatibility issues.
What's the best way to approach this?
Use the ref function to determine what type the reference is. See perldoc -f ref.
my $w = decode_json $string1;

my $count = 1;
if ( my $ref = ref($w) ) {
    if ( $ref eq 'HASH' ) {
        $count = keys %$w;
    }
    elsif ( $ref eq 'ARRAY' ) {
        $count = scalar @$w;
    }
    else {
        die "invalid reference '$ref'\n";
    }
}
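To illustrate with the second string from the question, a short sketch:

my $x = decode_json $string;    # a JSON array decodes to an arrayref
print ref($x), "\n";            # prints "ARRAY"
my $count2 = scalar @$x;        # 4 records, no "keys on reference" warning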

Perl variable becoming undefined in loop

Edit: modified code and output to make it more clear
Edit 2: Added example input for reproduction
I have a JSON file and a CSV file and I am running comparisons on the two. The problem is that $asset_ip is correctly defined in the outer foreach loop, but in the nested loop $asset_ip becomes undefined.
Why is $asset_ip becoming undefined?
#!/usr/bin/perl
# perl -e'use CPAN; install "Text::CSV"'

use strict;
use warnings;
use JSON::XS;
use File::Slurp;
use Text::CSV;

my $csv = Text::CSV->new( { sep_char => ',' } );

my $csv_source  = "servers.csv";
my $json_source = "assets.json";
my $dest        = "servers_for_upload.csv";

# defined these here as I need to use them in foreach loop and if statement:
my $csv_ip;
my @fields;

open( my $csv_fh,  '<', $csv_source ) or die "$! error trying to read";
open( my $dest_fh, '>', $dest )       or die "$! error trying to read";

my $json = read_file($json_source);
my $json_array = decode_json $json;

foreach my $item (@$json_array) {
    my $id = $item->{id};
    my $asset_ip = $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};

    # test the data is there:
    if ( defined $asset_ip ) {
        print "id: " . $id . "\nip: " . $asset_ip . "\n";
    }

    while (my $line = <$csv_fh>) {
        chomp $line;
        if ( $csv->parse($line) ) {
            @fields = $csv->fields();
            $csv_ip = $fields[0];
        }
        else {
            warn "Line could not be parsed: $line\n";
        }
        if ( $csv_ip eq $asset_ip ) {
            # prepend id to csv array and write these lines to new file
            unshift( @fields, $id );
            print $dest_fh join( ", ", @fields );
        }
    }
}
close $csv_fh;
Output:
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 1.
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 2.
Use of uninitialized value $asset_ip in string eq at script.pl line 43, <$csv_fh> line 3.
id: 1003
ip: 192.168.0.2
id: 1004
ip: 192.168.0.3
id: 1005
ip: 192.168.0.4
assets.json:
[{"id":1001,"interfaces":[]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]},{"id":1004,"interfaces":[{"ip_addresses":[{"value":"192.168.0.3"}]}]},{"id":1005,"interfaces":[{"ip_addresses":[{"value":"192.168.0.4"}]}]}]
Note that for the first iteration $asset_ip will be undefined. I will therefore alter the code to run the eq comparison only if $asset_ip is defined. However, for this example I am not doing the check, because $asset_ip comes out undefined on all iterations.
servers.csv:
192.168.0.3,Brian,Germany
192.168.0.4,Billy,UK
192.168.0.5,Ben,UK
I think your problem will be this:
foreach my $line (<$csv_fh>) {
You execute this within your outer loop. But when you do, $csv_fh ends up at end of file.
Once that has happened, subsequent iterations of your outer loop will not execute this inner loop, because there's nothing left to read from $csv_fh.
An easy test of whether this is your problem is to add a seek, e.g. seek( $csv_fh, 0, 0 );.
But this isn't an efficient thing to do, because then you'll be looping through the file multiple times; you should instead read it into a data structure and use that.
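For instance, a minimal sketch of that approach, reusing $csv and $csv_fh from the question (%server_by_ip is a name introduced here purely for illustration):

my %server_by_ip;
while ( my $row = $csv->getline($csv_fh) ) {
    $server_by_ip{ $row->[0] } = $row;    # key each row by its IP column
}
close $csv_fh;

# then, inside the JSON loop, a hash lookup replaces the repeated file scan:
if ( defined $asset_ip and exists $server_by_ip{$asset_ip} ) {
    print $dest_fh join( ", ", $id, @{ $server_by_ip{$asset_ip} } ), "\n";
}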
Edit: Here is your problem:
[{"id":1001,"interfaces":[]},{"id":1003,"interfaces":[{"ip_addresses":[{"value":"192.168.0.2"}]}]},{"id":1004,"interfaces":[{"ip_addresses":[{"value":"192.168.0.3"}]}]},{"id":1005,"interfaces":[{"ip_addresses":[{"value":"192.168.0.4"}]}]}]
And specifically:
[{"id":1001,"interfaces":[]}
Your first element in that array doesn't have an $asset_ip defined.
This means that on your first pass $asset_ip is undefined and generates the errors (no line is printed because of your if defined test).
But then the code proceeds to traverse $csv_fh, reading to the end of file, looking for matches (and failing three times, generating the three error messages).
Second iteration - for id 1003 - the IP isn't in servers.csv anyway, but $csv_fh has already been read to end of file (EOF), so the inner loop doesn't execute at all.
This can be made workable by:
adding else next; after that if defined.
adding a seek( $csv_fh, 0, 0 ) after the while loop.
But really - a rewrite would be in order so you're not re-reading a file over and over anyway.
Very crudely:
#!/usr/bin/perl
# perl -e'use CPAN; install "Text::CSV"'

use strict;
use warnings;
use JSON::XS;
use File::Slurp;
use Text::CSV;

my $csv = Text::CSV->new( { sep_char => ',' } );

my $csv_source  = "servers.csv";
my $json_source = "assets.json";
my $dest        = "servers_for_upload.csv";

# defined these here as I need to use them in foreach loop and if statement:
my $csv_ip;
my @fields;

open( my $csv_fh,  '<', $csv_source ) or die "$! error trying to read";
open( my $dest_fh, '>', $dest )       or die "$! error trying to read";

my $json = read_file($json_source);
my $json_array = decode_json $json;

foreach my $item (@$json_array) {
    my $id = $item->{id};
    my $asset_ip = $item->{interfaces}->[0]->{ip_addresses}->[0]->{value};

    # test the data is there:
    if ( defined $asset_ip ) {
        print "id: " . $id . "\nip: " . $asset_ip . "\n";
    }
    else {
        print "asset_ip undefined for id $id\n";
        next;
    }

    while ( my $line = <$csv_fh> ) {
        chomp $line;
        if ( $csv->parse($line) ) {
            @fields = $csv->fields();
            $csv_ip = $fields[0];
        }
        else {
            warn "Line could not be parsed: $line\n";
        }
        if ( $csv_ip eq $asset_ip ) {
            # prepend id to csv array and write these lines to new file
            unshift( @fields, $id );
            print {$dest_fh} join( ", ", @fields ), "\n";
        }
    }
    seek( $csv_fh, 0, 0 );
}
close $csv_fh;
I would suggest this also needs:
a change to the while loop so you're not re-reading the file each time.
consistent output handling: you're using Text::CSV, so a bare print join( ", ", ... ) seems out of place. If your data warrants Text::CSV, it's worth keeping it for output too, as sketched below.
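For example, a short sketch of that last point, assuming the $csv object and $dest_fh filehandle from the code above:

$csv->eol("\n");                              # terminate each output record
$csv->print( $dest_fh, [ $id, @fields ] );    # Text::CSV handles quoting and escaping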