Why do I get this error, when I use the pretty print version?
'"' expected, at character offset 2 (before "(end of string)") at ./perl.pl line 29.
#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Data::Dumper;
use JSON;
my $json = JSON->new->utf8;
my $hashref = {
'muster, hanß' => {
'hello' => {
year => 2000,
color => 'green'
}
}
};
my $utf8_encoded_json_text = $json->pretty->encode( $hashref ); # leads to a die
#my $utf8_encoded_json_text = $json->encode( $hashref ); # works
open my $fh, '>', 'testfile.json' or die $!;
print $fh $utf8_encoded_json_text;
close $fh;
open $fh, '<', 'testfile.json' or die $!;
$utf8_encoded_json_text = readline $fh;
close $fh;
$hashref = decode_json( $utf8_encoded_json_text );
say Dumper $hashref;
Because when you read the file back in, you're using readline, and only reading the first line of the file. When pretty is off, the entire output is on one line. When pretty is on, the JSON is spread out over multiple lines, so you're passing invalid truncated JSON to decode_json.
Read the entire file instead, for example by setting local $/ = undef; before reading (slurp mode), or with any other slurping method you prefer.
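A minimal sketch of the slurp approach. It assumes the core modules JSON::PP and File::Temp as stand-ins for JSON and for the question's testfile.json; both substitutions are only there to keep the snippet self-contained, and the data is a cut-down version of the question's hash.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Assumption: a temporary file stands in for 'testfile.json'.
my ($out, $file) = tempfile( UNLINK => 1 );
binmode $out;    # JSON produced with ->utf8 is bytes, so write it raw

my $json = JSON::PP->new->utf8->pretty;
print {$out} $json->encode( { muster => { hello => { year => 2000, color => 'green' } } } );
close $out or die "Can't close $file: $!";

# Slurp: with $/ undefined, <$in> returns the whole file, not just the first line.
my $text = do {
    open my $in, '<:raw', $file or die "Can't open $file: $!";
    local $/;
    <$in>;
};

my $hashref = $json->decode($text);
print "year: $hashref->{muster}{hello}{year}\n";
```

Because local $/ sits inside the do block, the default line-by-line behaviour of readline is restored as soon as the block ends.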
I want to read a csv using perl excluding the first row. Further, col 2 and col3 variables need to be stored in another file and the row read must be deleted.
Edit : Following code has worked. I just want the deletion part.
use strict;
use warnings;
my ($field1, $field2, $field3, $line);
my $file = 'D:\Patching_test\ptch_file.csv';
open( my $data, '<', $file ) or die;
while ( $line = <$data> ) {
next if $. == 1;
( $field1, $field2, $field3 ) = split ',', $line;
print "$field1 : $field2 : $field3 ";
my $filename = 'D:\DB_Patch.properties';
unlink $filename;
open( my $sh, '>', $filename )
or die "Could not open file '$filename' $!";
print $sh "Patch_id=$field2\n";
print $sh "Patch_Name=$field3";
close($sh);
close($data);
exit 0;
}
The OP's problem is poorly presented:
no sample input data is provided
no desired output is shown
no example of the input file after processing is shown
Based on the problem description, the following code is offered:
use strict;
use warnings;
use feature 'say';
my $input = 'D:\Patching_test\ptch_file.csv';
my $output = 'D:\DB_Patch.properties';
my $temp = 'D:\script_temp.dat';
open my $in, '<', $input
or die "Couldn't open $input";
open my $out, '>', $output
or die "Couldn't open $output";
open my $tmp, '>', $temp
or die "Couldn't open $temp";
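while ( my $line = <$in> ) {
    chomp $line;    # strip the newline; say adds one back
    if ( $. == 1 ) {
        say $tmp $line;    # only the header row survives in the rewritten input
    } else {
        my ($patch_id, $patch_name) = (split /,/, $line)[1,2];
        say $out "Patch_id=$patch_id";
        say $out "Patch_Name=$patch_name";
    }
}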
close $in;
close $out;
close $tmp;
rename $temp, $input
or die "Couldn't rename $temp to $input: $!";
exit 0;
Apologies if this is a really stupid question or already asked elsewhere. I'm reading in some JSON and using decode_json on it, then extracting text from it and outputting that to a file.
My problem is that Unicode characters are encoded as, e.g., \u2019 in the JSON, and decode_json appears to convert this to \x{2019}. When I grab this text and output it to a UTF-8-encoded file, it appears as garbage.
Sample code:
use warnings;
use strict;
use JSON qw( decode_json );
use Data::Dumper;
open IN, $file or die;
binmode IN, ":utf8";
my $data = <IN>;
my $json = decode_json( $data );
open OUT, ">$outfile" or die;
binmode OUT, ":utf8";
binmode STDOUT, ":utf8";
foreach my $textdat (@{ $json->{'results'} }) {
print STDOUT Dumper($textdat);
my $text = $textdat->{'text'};
print OUT "$text\n";
}
The Dumper output shows that the \u encoding has been converted to \x encoding. What am I doing wrong?
decode_json needs UTF-8 encoded input (bytes), but your :utf8 layer has already decoded the data, so use from_json instead, which accepts decoded Unicode text:
my $json = from_json($data);
Another option would be to encode the data yourself:
use Encode;
my $encoded_data = encode('UTF-8', $data);
...
my $json = decode_json($encoded_data);
But it makes little sense to encode data just to decode it.
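To illustrate the difference, here is a small sketch. It assumes the core module JSON::PP in place of JSON and uses the object interface, where ->utf8->decode behaves like decode_json and plain ->decode behaves like from_json. The sample JSON string is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP;

# Decoded text (Unicode code points), as a read through a :utf8 layer yields.
my $chars = qq({"text":"it\x{2019}s"});
# The same JSON as UTF-8 bytes, as a raw read yields.
my $bytes = encode( 'UTF-8', $chars );

my $from_bytes = JSON::PP->new->utf8->decode($bytes);    # what decode_json does
my $from_chars = JSON::PP->new->decode($chars);          # what from_json does

binmode STDOUT, ':encoding(UTF-8)';
print "same text\n" if $from_bytes->{text} eq $from_chars->{text};
printf "length: %d\n", length $from_chars->{text};
```

Both calls end up with the same four-character string; what matters is that each decoder is handed the kind of input it expects.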
decode_json expects UTF-8, but you're passing decoded text (Unicode Code Points) instead.
So, you could remove the existing character decoding.
use feature qw( say );
use open ':std', ':encoding(UTF-8)';
use JSON qw( decode_json );
my $json_utf8 = do {
open(my $fh, '<:raw', $in_qfn)
or die("Can't open \"$in_qfn\": $!\n");
local $/;
<$fh>;
};
my $data = decode_json($json_utf8);
{
open(my $fh, '>', $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
for my $result (@{ $data->{results} }) {
say $fh $result->{text};
}
}
Or, you could use from_json (or JSON->new->decode) instead of decode_json.
use feature qw( say );
use open ':std', ':encoding(UTF-8)';
use JSON qw( from_json ); # <---
my $json_ucp = do {
open(my $fh, '<', $in_qfn) # <---
or die("Can't open \"$in_qfn\": $!\n");
local $/;
<$fh>;
};
my $data = from_json($json_ucp); # <---
{
open(my $fh, '>', $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
for my $result (@{ $data->{results} }) {
say $fh $result->{text};
}
}
The arrows point to the three minor differences between the two snippets.
I made a number of cleanups.
Missing local $/; in case there are line breaks in the JSON.
Don't use 2-arg open.
Don't needlessly use global variables.
Use better names for variables. $data and $json were notably reversed, and $file didn't contain a file.
Limit the scope of your variables, especially if they use up system resources (e.g. file handles).
Use :encoding(UTF-8) (the standard encoding) instead of :encoding(utf8) (an encoding only used by Perl). :utf8 is even worse as it uses the internal encoding rather than the standard one, and it can lead to corrupt scalars if provided bad input.
Get rid of the noisy quotes around identifiers used as hash keys.
My custom Perl code produces the following invalid JSON, with a comma missing between the blocks:
{
"data": [{
"{#LOGFILEPATH}": "/tmp/QRZ2007.tcserverlogs",
"{#LOGFILE}": "QRZ2007"
} **missing comma** {
"{#LOGFILE}": "ARZ2007",
"{#LOGFILEPATH}": "/tmp/ARZ2007.tcserverlogs"
}]
}
My terrible code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use utf8;
use JSON;
binmode STDOUT, ":utf8";
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
opendir(DIR, $dir) or die $!;
print '{"data": [';
while (my $file = readdir(DIR)) {
next unless (-f "$dir/$file");
next unless ($file =~ m/\.tcserverlogs$/);
my $fullPath = "$dir/$file";
my $filenameshort = basename($file, ".tcserverlogs");
my $data_to_json = {"{#LOGFILEPATH}"=>$fullPath,"{#LOGFILE}"=>$filenameshort};
print $json->encode($data_to_json);
}
print ']}'."\n";
closedir(DIR);
exit 0;
Dear team, I am not a programmer. Does anyone have an idea how to fix it? Thank you!
If you do not print a comma, you will not get a comma.
You are trying to build your own JSON string from pre-encoded pieces of smaller data structures. That will not work unless you tell Perl when to put commas. You could do that, but it's easier to just collect all the data into a Perl data structure that is equivalent to the JSON string you want to produce, and encode the whole thing in one go when you're done.
my $dir = $ARGV[0];
my $json = JSON->new->utf8->space_after;
my @data;
opendir( DIR, $dir ) or die $!;
while ( my $file = readdir(DIR) ) {
    next unless ( -f "$dir/$file" );
    next unless ( $file =~ m/\.tcserverlogs$/ );
    my $fullPath = "$dir/$file";
    my $filenameshort = basename( $file, ".tcserverlogs" );
    my $data_to_json = { "{#LOGFILEPATH}" => $fullPath, "{#LOGFILE}" => $filenameshort };
    push @data, $data_to_json;
}
closedir(DIR);
print $json->encode( { data => \@data } );
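For comparison, here is a sketch of both routes side by side: placing the commas yourself with join, and encoding the whole structure in one go. It assumes the core module JSON::PP in place of JSON, and the two records are hypothetical stand-ins for the files that readdir would find.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical sorts hash keys, so both routes emit keys in the same order.
my $json = JSON::PP->new->utf8->canonical;

# Hypothetical records standing in for the files found by readdir.
my @records = (
    { '{#LOGFILE}' => 'QRZ2007', '{#LOGFILEPATH}' => '/tmp/QRZ2007.tcserverlogs' },
    { '{#LOGFILE}' => 'ARZ2007', '{#LOGFILEPATH}' => '/tmp/ARZ2007.tcserverlogs' },
);

# Manual route: encode each piece, then put the commas in yourself.
my $by_hand = '{"data":[' . join( ',', map { $json->encode($_) } @records ) . ']}';

# Whole-structure route: let the encoder place every comma and bracket.
my $in_one_go = $json->encode( { data => \@records } );

print "$in_one_go\n";
print "identical\n" if $by_hand eq $in_one_go;
```

The manual route only works as long as the hand-written wrapper and separators stay in sync with the data, which is why building the full structure first is usually the safer choice.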
I'm parsing a CSV file in which each line looks something like the one below.
10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
There seem to be trailing commas at the end of each line.
I want to get the first term, in this case "10998" and get the number of GO terms related to it.
So my output in this case should be,
Output:
10998,7
But instead it shows 299. I realized overall there are 303 commas in each line. And I'm not able to figure out an easy way to remove trailing commas. Can anyone help me solve this issue?
Thanks!
My Code:
use strict;
use warnings;
open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();
my $mf;
foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}
sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}
I'd use juanrpozo's solution for counting but if you still want to go your way, then remove the commas with regex substitution.
$line =~ s/,+$//;
I suggest this more concise way of coding your program.
Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so it will produce the same result whether or not trailing commas are removed beforehand.
use strict;
use warnings;
open my $in, '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";
foreach my $line (<$in>) {
    chomp $line;
    my @data = split /,/, $line;
    printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
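The behaviour of split with trailing empty fields can be checked with a short sketch. The line below is a shortened, made-up stand-in for one row of test.csv, with two GO terms and four trailing commas.

```perl
use strict;
use warnings;

# Shortened stand-in for one line of test.csv: 2 GO terms, 4 trailing commas.
my $line = '10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,,,,';

my @default = split /,/, $line;        # trailing empty fields are dropped
my @all     = split /,/, $line, -1;    # a negative limit keeps them

# grep in scalar context returns the number of matching fields.
my $go_count = grep { /^GO:/ } @default;

printf "default: %d fields, limit -1: %d fields, GO terms: %d\n",
    scalar @default, scalar @all, $go_count;
```

This prints "default: 6 fields, limit -1: 10 fields, GO terms: 2", which shows that the default split already ignores the trailing commas.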
You can apply grep to @array:
my $mf = grep { /^GO:/ } @array;
assuming $array[0] never matches /^GO:/.
For each line:
foreach my $line (<$IN>) {
    my ($first_term) = ($line =~ /(\d+),/);
    my @tmp = split('GO', " $line ");
    my $nr_of_GOs = @tmp - 1;
    print CSV "$first_term,$nr_of_GOs\n";
}
The following Perl script currently reads in an HTML file and strips out what I don't need. It also opens a CSV document, which is blank.
My problem is that I want to write the stripped-down results into the CSV's three fields, using Name as field 1, Lives in as field 2 and Commented as field 3.
The results are displayed in the command prompt but not written to the CSV.
use warnings;
use strict;
use DBI;
use HTML::TreeBuilder;
use Text::CSV;
open (FILE, 'file.htm');
open (F1, ">file.csv") || die "couldn't open the file!";
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 })
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'file.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2', 'field3');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s Field3: %s\n",
$l->{'field1'}, $l->{'field2'}, $1->{'field3'}
}
close $fh;
my $tree = HTML::TreeBuilder->new_from_content( do { local $/; <FILE> } );
for ( $tree->look_down( 'class' => 'postbody' ) ) {
my $location = $_->look_down
( 'class' => 'posthilit' )->as_trimmed_text;
my $comment = $_->look_down( 'class' => 'content' )->as_trimmed_text;
my $name = $_->look_down( '_tag' => 'h3' )->as_text;
$name =~ s/^Re:\s*//;
$name =~ s/\s*$location\s*$//;
print "Name: $name\nLives in: $location\nCommented: $comment\n";
}
An example of the html is:
<div class="postbody">
<h3><a href "foo">Re: John Smith <span class="posthilit">England</span></a></h3>
<div class="content">Is C# better than Visula Basic?</div>
</div>
You don't actually write anything to the CSV file. Firstly, it isn't clear why you open the file for writing and then later for reading. You then read from the (now empty) file. Then you read from the HTML, and display the contents you want.
Surely you will need to write to the CSV file somewhere if you want data to appear in it!
Also, it's best to avoid barewords for file handles if you want to use them through Text::CSV.
Maybe you need something like:
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })   # eol makes print() end each row with a newline
    or die "Cannot use CSV: " . Text::CSV->error_diag();
open my $fh, ">", "file.csv" or die "file.csv: $!";
...
# As you handle the HTML
$csv->print($fh, [$name, $location, $comment]);
...
close $fh or die "file.csv: $!";