Reading CSV file to hash gives incorrect result - csv

I have a CSV file containing the following (the SQL query is simplified):
SQL_QRY
"insert into TTL_CCL_DB_INF.D_1244_CRB_PARTY_EXT (CD_REC_PARTY, CD_REC_OBJECT_TYPE, T_FULL_NAME, ..) select * from TTL_CCL_DB_STG.D_1244_CRB_PARTY aa1 inner join TTL_CCL_DB_STG.D_1244_CRB_COMPANYFACT bb1 on aa1.CD_REC_PARTY = bb1.CD_REC_PARTY"
I have a Perl script that reads the file into a hash:
sub somesub {
...
my $fh;
my $key;
eval { open($fh, '<', $tmp_file); };
if ($@) {
$errmsg = $@;
croak {message=>$errmsg};
};
while(my $lines = <$fh>) {
chomp $lines;
my @data = split(/|/, $lines);
$key = shift @data;
$data_rt{$key} = \@data;
};
close $fh;
unlink $tmp_file;
return %data_rt;
}
But the returned hash looks like:
RETURN >>>>$VAR1 = {
' => [], 'SQL_QRY
'"insert into TTL_CCL_DB_INF.D_1244_CRB_PARTY_EXT (CD_REC_PARTY, CD_REC_OBJECT_TYPE, T_FULL_NAME, CD_TAX_IDENTIFIER, CD_VAT_IDENTIFIER,DWH_TRANSFRM_ID, DWH_TRANSFRM_RUN_ID, DWH_BD, DWH_VSN_NO ) select aa1.CD_REC_PARTY, aa1.CD_REC_OBJECT_TYPE, aa1.T_FULL_NAME, bb1.CD_TAX_IDENTIFIER, bb1.CD_VAT_IDENTIFIER, $transfrmId, \'$transfrmRunId\', cast(\'$bnsDt\' as date format \'YYYYMMDD\'), $occNbr from TTL_CCL_DB_STG.D_1244_CRB_PARTY aa1 inner join TTL_CCL_DB_STG.D_1244_CRB_COMPANYFACT bb1 on aa1.C' => []ARTY = bb1.CD_REC_PARTY and cast(aa1.DWH_BD as date format \'YYYYMMDD\') = \'$10000050_dwh_bd\' and aa1.DWH_VSN_NO = $10000050_vsn_no and cast(bb1.DWH_BD as date format \'YYYYMMDD\') = \'$10000034_dwh_bd\' and bb1.DWH_VSN_NO = $10000034_vsn_no"
};
Could anybody help me do this correctly? It needs to be as flexible as possible (the CSV file can contain more rows and more columns).

The sample data that you show us doesn't seem to match your code. The code wants to split the data on pipe symbols. But the data doesn't contain any pipe symbols. So I'm not sure how that's going to work.
However, I'm pretty confident that the problem is on this line:
my @data = split(/|/, $lines);
The first argument to split() is a regex. And a pipe symbol is a regex metacharacter. In order to use a metacharacter as itself, you need to escape it with a backslash.
my @data = split(/\|/, $lines);
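To see why the missing backslash matters, here is a minimal sketch using a made-up pipe-delimited line (the sample string is an assumption, not data from the question). An unescaped /|/ is an alternation of two empty patterns, so it matches the empty string and split() cuts the line into single characters:

```perl
use strict;
use warnings;

# Hypothetical pipe-delimited line, for illustration only.
my $line = 'key1|alpha|beta';

# Unescaped: /|/ means "empty pattern OR empty pattern", which matches
# the empty string, so split() returns one element per character.
my @wrong = split /|/, $line;

# Escaped: the pipe is matched literally, giving the three fields.
my @right = split /\|/, $line;

print scalar(@wrong), " pieces without the backslash\n";
print join(', ', @right), "\n";
```

If the real file is comma-separated with quoted fields, as the sample suggests, a CSV parser such as Text::CSV is probably a better fit than split(), because the SQL text itself contains commas.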

Related

Duplicate records in .CSV - how to ignore duplicates with identical values in a hash and warn only for different values in Perl

The following code checks for duplicates in a CSV file where the TO column is “USD”. I need help figuring out how to compare the values of the duplicate rows: if the duplicates carry the same value, as in the case below, Perl should not give any warning. The Perl file is named Source; just change the directory and run it.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use List::MoreUtils qw/ uniq /;
my %seen = ();
my @uniq = ();
my %uniq;
my %data;
my %dupes;
my @rows;
my $csv = Text::CSV->new ()
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'D:\Longview\ENCDEVD740\DataServers\ENCDEVD740\lvaf\inbound\data\enc_meroll_fxrate_soa_load.csv' or die "Cannot use CSV: $!";
while ( my $row = $csv->getline( $fh ) ) {
# insert row into row list
push @rows, $row;
# join the unique keys with the
# perl 'multidimensional array emulation'
# subscript character
my $key = join( $;, @{$row}[0,1] );
# if it was just one field, just use
# my $key = $row->[$keyfieldindex];
# if you were checking for full line duplicates (header lines):
# my $key = join($;, @$row);
# if %data has an entry for the record, add it to dupes
#print "@{$row}\n ";
if (exists $data{$key}) { # duplicate
# if it isn't already duplicated
# add this row and the original
if (not exists $dupes{$key}) {
push @{$dupes{$key}}, $data{$key};
}
# add the duplicate row
push @{$dupes{$key}}, $row;
} else {
$data{ $key } = $row;
}
}
$csv->eof or $csv->error_diag();
close $fh;
# print out duplicates:
warn "Duplicate Values:\n";
warn "-----------------\n";
foreach my $key (keys %dupes) {
my @keys = split($;, $key);
if (($keys[1] ne 'USD') or ($keys[0] eq 'FROMCURRENCY')){
#print "Rejecting record since duplicate records are for Outofscope currencies\n";
#print "\$keys[0] = $keys[0]\n";
#print "\$keys[1] = $keys[1]\n";
next;
}
else {
print "Key: @keys\n";
foreach my $dupe (@{$dupes{$key}}) {
print "\tData: @$dupe\n";
}
}
}
Sample data:
FROMCURRENCY,TOCURRENCY,RATE
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
AUD,USD,0.68635
AED,USD,0.272257011
ANG,USD,0.557584544
ARS,USD,0.01421147
Different Values for duplicates
Like @Håkon wrote, it seems that all your duplicates have the same rate, so they should not be considered duplicates. However, it could be an idea to store the rates in a hash keyed by the from and to currencies. That way you don't need to check for duplicates on every iteration and can rely on the uniqueness of hash keys.
It's great that you use a proper CSV parser, but here's an example that uses a single hash to keep track of duplicates and simply splits on commas, since the data seems reliable.
#!/usr/bin/env perl
use warnings;
use strict;
my $result = {};
my $format = "%-4s | %-4s | %s\n";
while ( my $line = <DATA> ) {
chomp $line;
my ( $from, $to, $rate ) = split( /,/x, $line );
$result->{$from}->{$to}->{$rate} = 1;
}
printf( $format, "FROM", "TO", "RATES" );
printf( "%s\n", "-" x 40 );
foreach my $from ( keys %$result ) {
foreach my $to ( keys %{ $result->{$from} } ) {
my @rates = keys %{ $result->{$from}->{$to} };
next if @rates < 2;
printf( $format, $from, $to, join( ", ", @rates ) );
}
}
__DATA__
AED,USD,0.272257011
ANG,USD,0.557584545
ANG,USD,1.557584545
ARS,USD,0.01421147
ARS,USD,0.01421147
ARS,USD,0.01421147
AUD,USD,0.68635
AUD,USD,1.68635
AUD,USD,2.68635
I changed the test data to contain duplicates with both identical and differing rates. The result prints only the pairs whose rates differ:
FROM | TO | RATES
----------------------------------------
ANG | USD | 1.557584545, 0.557584545
AUD | USD | 1.68635, 0.68635, 2.68635
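The same idea can be narrowed to what the question asks for: stay silent when duplicate rows agree, and warn only when a USD pair shows up with more than one distinct rate. The following is a rough sketch under assumptions: the rows are hard-coded, hypothetical stand-ins for parsed CSV records, and rates are compared as strings, so 0.5 and 0.50 would count as different.

```perl
use strict;
use warnings;

# Hypothetical sample rows; in practice these would come from the CSV parser.
my @rows = (
    [ 'AED', 'USD', '0.272257011' ],
    [ 'ANG', 'USD', '0.557584545' ],
    [ 'ANG', 'USD', '1.557584545' ],
    [ 'ARS', 'USD', '0.01421147' ],
    [ 'ARS', 'USD', '0.01421147' ],
    [ 'AUD', 'EUR', '0.61' ],
    [ 'AUD', 'EUR', '0.62' ],
);

my %rates;    # "FROM,TO" => { rate => number of times seen }
for my $row (@rows) {
    my ( $from, $to, $rate ) = @$row;
    next unless $to eq 'USD';    # only USD targets are in scope
    $rates{"$from,$to"}{$rate}++;
}

# Keep only the pairs that were seen with more than one distinct rate.
my @conflicts;
for my $pair ( sort keys %rates ) {
    my @seen = sort { $a <=> $b } keys %{ $rates{$pair} };
    push @conflicts, "$pair: " . join( ', ', @seen ) if @seen > 1;
}

warn "Conflicting rate: $_\n" for @conflicts;
```

Sorting the keys makes the report order stable between runs, which the plain keys() loop above does not guarantee.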

Can I use Text::CSV_XS to parse a csv-format string without writing it to disk?

I am getting a "csv file" from a vendor (using their API), but what they do is just spew the whole thing into their response. It wouldn't be a significant problem except that, of course, some of those pesky humans entered the data and put in "features" like line breaks. What I am doing now is creating a file for the raw data and then reopening it to read the data:
open RAW, ">", "$rawfile" or die "ERROR: Could not open $rawfile for write: $! \n";
print RAW $response->content;
close RAW;
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
open my $fh, "<", "$rawfile" or die "ERROR: Could not open $rawfile for read: $! \n";
while ( $line = $csv->getline ($fh) ) { ...
Somehow this seems ... inelegant. It seems that I ought to be able to just read the data from $response->content (a multiline string) as if it were a file, but I'm drawing a total blank on how to do this.
A pointer would be greatly appreciated.
Thanks,
Paul
You could use a string filehandle:
my $data = $response->content;
open my $fh, "<", \$data or croak "unable to open string filehandle : $!";
my $csv = Text::CSV_XS->new({ binary=>1,always_quote=>1,eol=>$/ });
while ( $line = $csv->getline ($fh) ) { ... }
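As a small self-contained check of this approach, the sketch below feeds Text::CSV_XS from an in-memory filehandle where one quoted field contains a line break. The $data string is a made-up stand-in for $response->content, and the option set is trimmed to what parsing needs (always_quote only affects output, so it is left out here).

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Stand-in for $response->content: the second record has a line break
# inside a quoted field.
my $data = qq{id,comment\n1,"first line\nsecond line"\n2,plain\n};

# Opening a reference to a scalar gives a read handle over the string.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# binary => 1 allows embedded newlines inside quoted fields.
my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

my @records;
while ( my $row = $csv->getline($fh) ) {
    push @records, $row;
}
close $fh;

printf "%d records\n", scalar @records;
```

Because getline() reads from the handle itself rather than from pre-split lines, the quoted line break stays inside its field instead of starting a new record.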
Yes, you can use Text::CSV_XS on a string, via its functional interface
use warnings;
use strict;
use feature 'say';
use Text::CSV_XS qw(csv); # must use _XS version
my $csv = qq(a,line\nand,another);
my $aoa = csv(in => \$csv)
or die Text::CSV_XS->error_diag;
say "@$_" for @$aoa;
Note that this indeed needs Text::CSV_XS (normally Text::CSV works but not with this).
I don't know why this isn't available in the OO interface (or perhaps is but is not documented).
While the above parses the string directly as asked, you can also lessen the "inelegant" aspect of your example by writing the content directly to a file as it's acquired, which most libraries support, for example with the :content_file option of the LWP::UserAgent get method.
Let me also note that most of the time you want the library to decode the content, so with LWP::UserAgent use decoded_content (see HTTP::Response).
I cooked up this example with Mojo::UserAgent. For the CSV input I used various data sets from the NYC Open Data. This is also going to appear in the next update for Mojo Web Clients.
I build the request without making the request right away, and that gives me the transaction object, $tx. I can then replace the read event so I can immediately send the lines into Text::CSV_XS:
#!perl
use v5.10;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content->unsubscribe('read')->on(read => sub {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new;
};
state $buffer;
state $reader = do {
open my $r, '<:encoding(UTF-8)', \$buffer;
$r;
};
my ($content, $bytes) = @_;
$buffer .= $bytes;
while (my $row = $csv->getline($reader) ) {
say join ':', $row->@[2,4];
}
});
$tx = $ua->start($tx);
That's not as nice as I'd like it to be because all the data still show up in the buffer. This is slightly more appealing, but it's fragile in the ways I note in the comments. I'm too lazy at the moment to make it any better because that gets hairy very quickly as you figure out when you have enough data to process a record. My particular code isn't as important as the idea that you can do whatever you like as the transactor reads data and passes it into the content handler:
use v5.10;
use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $url = ...;
my $tx = $ua->build_tx( GET => $url );
$tx->res->content
->unsubscribe('read')
->on( read => process_bytes_factory() );
$tx = $ua->start($tx);
sub process_bytes_factory {
return sub ( $content, $bytes ) {
state $csv = do {
require Text::CSV_XS;
Text::CSV_XS->new( { decode_utf8 => 1 } );
};
state $buffer = '';
state $line_no = 0;
$buffer .= $bytes;
# fragile if the entire content does not end in a
# newline (or whatever the line ending is)
my $last_line_incomplete = $buffer !~ /\n\z/;
# will not work if the format allows embedded newlines
my @lines = split /\n/, $buffer;
$buffer = pop @lines if $last_line_incomplete;
foreach my $line ( @lines ) {
my $status = $csv->parse($line);
my @row = $csv->fields;
say join ':', $line_no++, @row[2,4];
}
};
}

How to convert a string to hash in perl without using regex or split

I have a function I cannot control that returns a string which is actually a hash. It looks something like this:
{"offset":0,"limit":500,"count":0,"virtual_machines":[]}
I need to check whether the count is greater than 0. Because the output is a string and not a hash, I am trying to split the string and pull the value out of it.
The snippet for the same is below:
my $output = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
$output =~ s/ +/ /g;
my @words = split /[:,"\s\/]+/, $output;
print Dumper(@words);
The output for this is:
$VAR1 = '{';
$VAR2 = 'offset';
$VAR3 = '0';
$VAR4 = 'limit';
$VAR5 = '500';
$VAR6 = 'count';
$VAR7 = '0';
$VAR8 = 'virtual_machines';
$VAR9 = '[]}';
Now I can take the value in $VAR7 to get the count.
Is there a way to convert the string to a hash and then use the keys to get the values, instead of using regex and split? Can someone help me out here?
That string is in JSON format. I'd simply do
use strict;
use warnings;
use JSON::PP qw(decode_json);
my $output = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
my $data = decode_json $output;
print $data->{count}, "\n";
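Since the goal in the question is to test whether count is greater than 0, a slightly fuller sketch might look like this. It assumes the function always returns a JSON object with a numeric count key; the $has_vms name is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $output = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';

# decode_json dies on malformed input, so trap that with eval.
my $data = eval { decode_json($output) };
die "Could not parse response: $@" unless defined $data;

# The decoded value is an ordinary hash reference.
my $has_vms = $data->{count} > 0 ? 1 : 0;
print $has_vms ? "virtual machines found\n" : "no virtual machines\n";
```

JSON::PP ships with Perl, so nothing needs to be installed; nested values such as virtual_machines come back as array references.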
If all colons are just separators, then you can replace them with '=>'s and eval the string.
That's probably unrealistic, though. So you can use JSON ... looks like the string is in JSON format. Try the following (worked for me :-):
#!/usr/bin/perl
use JSON::Parse 'parse_json';
# the string is JSON
my $jstr = '{"offset":0,"limit":500,"count":0,"virtual_machines":[]}';
# oversimplified (not using json ... o.k. if no colons anywhere but as separators
my $sstr = $jstr;
$sstr =~ s/:/=>/g;
my $href = eval "$sstr";
printf("From oversimplified eval, limit == %d\n", $href->{limit});
# using JSON (looks like string is in JSON format).
# get JSON::Parse from CPAN (sudo cpan JSON::Parse)
my $jref = parse_json($jstr);
printf("From JSON::Parse, limit == %d\n", $jref->{limit});
1;
Output:
From oversimplified eval, limit == 500
From JSON::Parse, limit == 500

JSON delimiter tool

I have a series of JSON objects and I need to replace all the commas at the end of each object with a pipe |
Obviously I can't use Find & Replace because that would replace every comma in the JSON, but I only want to replace those at the end of each object.
For example:
{
"id":123,
"name":Joe,
"last":Smith
} , <----- I want to replace this comma only
{"id":454
"name":Bill,
"last":Smith
}
You could parse the JSON by adding '[]' around it and then re-serialize it.
With a PHP script you could do something like this:
$content = file_get_contents('/path/to/yourfile.json');
// Add [] around the JSON to make it valid:
$json = json_decode('[' . $content . ']', true);
$result = '';
foreach ($json as $j) {
if ($result != '') $result .= '|';
$result .= json_encode($j);
}
echo $result;
There is already a PHP solution. Here's a Regex solution in case.
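string s1 = "{\"id\":123,\"name\":Joe,\"last\":Smith} , {\"id\":454,\"name\":Bill,\"last\":Smith}";
// Match only a comma that sits between a closing and an opening brace,
// and put the braces back so the objects stay intact.
string pattern = @"\}\s*,\s*\{";
string s3 = Regex.Replace(s1, pattern, "}|{");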

DBI convert fetched arrayref to hash

I'm trying to write a program to fetch a big MySQL table, rename some fields and write it to JSON. Here is what I have for now:
use strict;
use JSON;
use DBI;
# here goes some statement preparations and db initialization
my $rowcache;
my $max_rows = 1000;
my $LIMIT_PER_FILE = 100000;
while ( my $res = shift( @$rowcache )
|| shift( @{ $rowcache = $sth->fetchall_arrayref( undef, $max_rows ) } ) ) {
if ( $cnt % $LIMIT_PER_FILE == 0 ) {
if ( $f ) {
print "CLOSE $fname\n";
close $f;
}
$filenum++;
$fname = "$BASEDIR/export-$filenum.json";
print "OPEN $fname\n";
open $f, ">$fname";
}
$res->{some_field} = $res->{another_field};
delete $res->{another_field};
print $f $json->encode( $res ) . "\n";
$cnt++;
}
I used the database row caching technique from
Speeding up the DBI
and everything seems good.
The only problem I have for now is that on $res->{some_field} = $res->{another_field}, the Perl interpreter complains that $res is Not a HASH reference.
Please could anybody point me to my mistakes?
If you want fetchall_arrayref to return an array of hashrefs, the first parameter should be a hashref. Otherwise, an array of arrayrefs is returned resulting in the "Not a HASH reference" error. So in order to return full rows as hashref, simply pass an empty hash:
$rowcache = $sth->fetchall_arrayref({}, $max_rows)
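The difference between the two return shapes can be reproduced without a database. In this sketch the literal data mimics what fetchall_arrayref hands back in each mode; the column names and values are hypothetical, and no DBI call is actually made.

```perl
use strict;
use warnings;

# Shape returned by fetchall_arrayref(undef, $max_rows): rows are arrayrefs.
my $array_rows = [ [ 1, 'x' ] ];

# Shape returned by fetchall_arrayref({}, $max_rows): rows are hashrefs
# keyed by column name.
my $hash_rows = [ { id => 1, another_field => 'x' } ];

# Hash-style access on an arrayref row dies with "Not a HASH reference".
my $ok = eval { my $v = $array_rows->[0]{another_field}; 1 };
my $error = $ok ? '' : $@;
print "arrayref row: $error";

# With hashref rows the rename from the question works; delete returns
# the removed value, so the copy and the delete fit in one statement.
my $res = $hash_rows->[0];
$res->{some_field} = delete $res->{another_field};
print "renamed value: $res->{some_field}\n";
```

Hashref rows cost a little more per row than arrayref rows, so for a very large table it may be worth measuring both before settling on one.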