HTML POST to Perl script: write output to a text file

I have a Perl script (I didn't write it) that takes a POST from an HTML page and displays a certain section of a text file on a web page. Now I need it to also write that section to a text file on our Unix server. Any help? Code below.
#!/usr/bin/perl
#
print "Content-type: text/html\n\n";
print '<pre>';
read(STDIN, $buf, $ENV{'CONTENT_LENGTH'});
#print "$buf\n";
#print "$REMOTE_USER = $REMOTE_USER\n";
@pairs = split(/&/, $buf);
#print "$pairs\n";
($txt_HRcode, $lc_HRcode) = split(/=/,$pairs[0]);
#print "$txt_HRcode\n";
#$HRcode = " HRcode: E2PSYAA0";
$HRcode = " HRcode: F8".uc($lc_HRcode)."0";
#print "$HRcode\n";
open(LINEFIND, "grep -n \"$HRcode\" /release/ucpmr/ucpmr.txt |") or die print "Can't Open File" ;
$line_num = <LINEFIND>;
#print "$line_num\n";
#if($line_num !~ m/$HRcode/) {print "SEQUENCE CODE NOT FOUND"; die()};
($sline, $hrd, $lin_text) = split(/:/, $line_num);
$beg_line = ($sline - 2);
$end_line = ($beg_line + 10000);
#print "$beg_line\n";
#print "$end_line\n";
close(LINEFIND);
open(DISP, "/release/ucpmr/ucpmr.txt") or die print "File is no longer in History. Press Back to Return";
for($incr=1; $incr <= $end_line; $incr +=1)
{$line = <DISP>;
if($incr > $beg_line) {
if($incr >$sline){
if($line =~ m/HRcode: F8/){
if($line !~ m/$HRcode/) {$quit_line = $incr-3 ; last;
close(DISP);}}}}}
open(PRINTFIND, "/release/ucpmr/ucpmr.txt") or die print "File is no longer in History. Press Back to Return";
for($incr=1; $incr <= $quit_line; $incr +=1)
{$line = <PRINTFIND>;
#$line =~ s/\d\d\d-\d\d-/XXX-XX-/;
if($incr > $beg_line) {print"$line";}}
#print "quit line is : $quit_line\n";
print "</pre>";

Change the end part, starting from open(PRINTFIND, ... like this:
open(PRINTFIND, "/release/ucpmr/ucpmr.txt") or die print "File is no longer in History. Press Back to Return";
open(my $fh, '>', '/release/ucpmr/TEXT_FILE_NAME.txt') or die "Can't write TEXT_FILE_NAME.txt: $!";
for($incr=1; $incr <= $quit_line; $incr +=1)
{$line = <PRINTFIND>;
#$line =~ s/\d\d\d-\d\d-/XXX-XX-/;
if($incr > $beg_line) {print"$line"; print $fh $line; }}
#print "quit line is : $quit_line\n";
print "</pre>";
... but dude, if you're a .NET guy, do yourself a favor and rewrite this mess in .NET, seriously...

open(my $fh, '>', '/release/ucpmr/TEXT_FILE_NAME.txt');
print $fh FILE_CONTENT;
close $fh;
like this?
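The approach in the answers above (print each selected line to the browser and to a second filehandle) can be sketched as a small helper. This is only a sketch under assumptions: the name copy_line_range is hypothetical, and in-memory filehandles stand in for /release/ucpmr/ucpmr.txt and the output file so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines $first..$last (1-based, inclusive)
# from $src to every filehandle in @sinks. Returns the number copied.
sub copy_line_range {
    my ($src, $first, $last, @sinks) = @_;
    my ($n, $copied) = (0, 0);
    while (my $line = <$src>) {
        $n++;
        next if $n < $first;
        last if $n > $last;
        print {$_} $line for @sinks;   # same line goes to each sink
        $copied++;
    }
    return $copied;
}

# In-memory stand-ins for the report file, the web page and the saved file.
my $report = join '', map { "line $_\n" } 1 .. 6;
my ($page, $saved) = ('', '');
open my $in,      '<', \$report or die "Can't open in-memory input: $!";
open my $to_page, '>', \$page   or die "Can't open page buffer: $!";
open my $to_file, '>', \$saved  or die "Can't open file buffer: $!";

copy_line_range($in, 3, 5, $to_page, $to_file);
close $_ for $in, $to_page, $to_file;
print $saved;
```

In the original script the equivalent call would presumably be copy_line_range($in, $beg_line + 1, $quit_line, \*STDOUT, $fh), with $in opened on the report file and $fh on the file to be written.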

Related

How to read a csv using Perl?

I want to read a CSV file using Perl, skipping the first row. In addition, the values in columns 2 and 3 need to be stored in another file, and each row that has been read must be deleted from the CSV.
Edit: The following code works. I just need the deletion part.
use strict;
use warnings;
my ($field1, $field2, $field3, $line);
my $file = 'D:\Patching_test\ptch_file.csv';
open( my $data, '<', $file ) or die;
while ( $line = <$data> ) {
next if $. == 1;
( $field1, $field2, $field3 ) = split ',', $line;
print "$field1 : $field2 : $field3 ";
my $filename = 'D:\DB_Patch.properties';
unlink $filename;
open( my $sh, '>', $filename )
or die "Could not open file '$filename' $!";
print $sh "Patch_id=$field2\n";
print $sh "Patch_Name=$field3";
close($sh);
close($data);
exit 0;
}
The OP's problem is poorly presented: no input data sample, no desired output, and no example of the modified input file after processing are provided.
Based on the problem description, the following code is offered:
use strict;
use warnings;
use feature 'say';
my $input = 'D:\Patching_test\ptch_file.csv';
my $output = 'D:\DB_Patch.properties';
my $temp = 'D:\script_temp.dat';
open my $in, '<', $input
or die "Couldn't open $input";
open my $out, '>', $output
or die "Couldn't open $output";
open my $tmp, '>', $temp
or die "Couldn't open $temp";
while ( <$in> ) {
chomp; # strip the newline so say() does not double it and the last field stays clean
if( $. == 1 ) {
say $tmp $_;
} else {
my($patch_id, $patch_name) = (split ',')[1,2];
say $out "Patch_id=$patch_id";
say $out "Patch_Name=$patch_name";
}
}
close $in;
close $out;
close $tmp;
rename $temp,$input;
exit 0;

How to start reading CSV from beginning again?

use Text::CSV_XS;
my $csv = Text::CSV_XS->new;
open my $fh, '<', 'test.csv' or die "test.csv: $!";
while (my $row = $csv->getline($fh)) {
my @fields = @$row;
if ($fields[0] eq "A1") {
print "Found A1", "\n";
last;
}
}
# now start searching the CSV again
If I have gone through some of a CSV using Text::CSV_XS, how can I then start again from the beginning? Is there some way to return the pointer/window to the beginning of the file?
use Fcntl qw( SEEK_SET );
seek($fh, 0, SEEK_SET);
You could also just re-open the file.
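A minimal sketch of the rewind, assuming an in-memory file and a hypothetical find_first helper; a plain split stands in for Text::CSV_XS->getline here so the example has no non-core dependencies. Since getline reads from the same filehandle, the same seek call should apply there too.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: return the first line whose first comma-separated
# field equals $wanted, or undef if no such line remains in the handle.
sub find_first {
    my ($fh, $wanted) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($first) = split /,/, $line;
        return $line if defined $first && $first eq $wanted;
    }
    return;
}

my $csv_text = "A0,x\nA1,y\nA2,z\n";
open my $fh, '<', \$csv_text or die "Can't open in-memory CSV: $!";

my $hit = find_first($fh, 'A1');                  # reading stops after line 2
seek($fh, 0, SEEK_SET) or die "seek failed: $!";  # back to the start
my $again = find_first($fh, 'A0');                # line 1 is reachable again
close $fh;

print "$hit | $again\n";
```

Without the seek call the second search would start at the third line and never see A0.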

Parsing HTML link from around 600 URLs using Perl

I have a list of around 600 drugs as input, and I have written a Perl script that builds a URL for each drug and grabs the content of each URL. Each page also contains a link named Share/Embed Graph that can be clicked to view some HTML source code. I need the script to follow that link on all 600 pages and print the 600 HTML sources to STDOUT. Right now my script is:
<c>
#!/usr/bin/perl
use strict;
use warnings;
#use LWP::Simple qw(get);
#use HTML::Parse;
use YAML;
use WWW::Mechanize;
use Data::Dumper;
use Template::Extract;
my $infile = $ARGV[0];
my $outfile = $ARGV[1];
open (IFILE, "<", $infile) || die "Could not open $infile\n";
open (OFILE, ">", $outfile) || die "Could not open $outfile\n";
my @arrayofdrugterms;
while (<IFILE>) {
chomp;
push (@arrayofdrugterms, $_);
}
#print "@arrayofdrugterms\n";
my $url;
foreach my $arrayofdrugterms( @arrayofdrugterms) {
$url = "http://www.drugcite.com/?q=$arrayofdrugterms\n";
print OFILE "$url\n";
}
close OFILE;
#open outfile for reading
open (OFILE, "<", $outfile) || die "Cannot open $outfile\n";
my @arrayofurls;
my $mech;
my $ext;
my @result;
my @link;
my $template;
my $content;
while (<OFILE>) {
chomp;
@arrayofurls = split ( ' ', $_);
#print "@arrayofurls\n";
foreach my $arrayofurls ( @arrayofurls) {
#print "$arrayofurls\n";
$mech = WWW::Mechanize->new(autocheck => 0);
$mech->get( "$arrayofurls" );
#print $mech->get( "$arrayofurls" ). "\n";
$ext = Template::Extract->new;
#print "$ext\n";
</c>
<b>
$template = "<div id="[% DrugCite %]" style="[% padding:10px %]">
<img src="[% http://www.drugcite.com/img/? item=vg&q=$arrayofdrugterms&meddra=pt style=border;0px; alt=Top 10 $arrayofdrugterms ADVERSE EVENTS - DrugCite.com %]">
<br />
<a href="[% http://www.drugcite.com/?q=$arrayofdrugterms style=font-size:7pt;margin-left:20px;color:#c0c0c0;text-decoration:none %]">"[% Source DrugCite %]"
</a>
</div>";
</b>
<c>
@result = $ext->extract($template, $mech->content);
print "@result\n";
#print Dumper "\@result" . "\n";
foreach ($mech->links) {
if( $_->[0] =~ /^Share\/Embed Graph$/) {
$mech->get($_->[0]);
}
=cut
#else {
#print "Not found the required link\n";
#}
#else {
#push (@link, $_->[0]) . "\n";
#}
=cut
}#end foreach
#print STDOUT "$mech->content\n";
#print Dumper \@link . "\n";
foreach (@result) {
#print YAML::Dump $_;
}
}
}
</c>
Any help is appreciated. Thanks.

Remove trailing commas at the end of the string using Perl

I'm parsing a CSV file in which each line looks something like the one below.
10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
There seem to be trailing commas at the end of each line.
I want to get the first term, in this case "10998", and the number of GO terms related to it.
So my output in this case should be,
Output:
10998,7
But instead it shows 299. I realized overall there are 303 commas in each line. And I'm not able to figure out an easy way to remove trailing commas. Can anyone help me solve this issue?
Thanks!
My Code:
use strict;
use warnings;
open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();
my $mf;
foreach my $line (<$IN>) {
chomp $line;
my @array = split(/,/, $line);
my @GO = splice(@array, 4);
my $GO = join(',', @GO);
$mf = count($GO);
print CSV "$array[0],$mf\n";
}
sub count {
my $go = shift @_;
my $count = my @go = split(/,/, $go);
return $count;
}
I'd use juanrpozo's solution for counting but if you still want to go your way, then remove the commas with regex substitution.
$line =~ s/,+$//;
I suggest this more concise way of coding your program.
Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so it will produce the same result whether or not trailing commas are removed beforehand.
use strict;
use warnings;
open my $in, '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";
foreach my $line (<$in>) {
chomp $line;
my @data = split /,/, $line;
printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
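The note above about split discarding trailing empty fields can be checked with a short sketch. The sample line is a shortened, made-up variant of the one in the question; passing a negative limit as the third argument is the documented way to make split keep the trailing empty fields.

```perl
use strict;
use warnings;

# Shortened sample: 6 real fields followed by 4 trailing commas.
my $line = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,,,,";

my @default = split /,/, $line;       # trailing empty fields are dropped
my @all     = split /,/, $line, -1;   # negative limit keeps them
my $go_count = grep { /^GO:/ } @default;

printf "%d fields by default, %d with limit -1, %d GO terms\n",
    scalar @default, scalar @all, $go_count;
```

Counting with grep on /^GO:/ is unaffected either way, because empty fields never match the pattern.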
You can apply grep to @array
my $mf = grep { /^GO:/ } @array;
assuming $array[0] never matches /^GO:/
For each line:
foreach my $line (<$IN>) {
my ($first_term) = ($line =~ /(\d+),/);
my @tmp = split('GO', " $line ");
my $nr_of_GOs = @tmp - 1;
print CSV "$first_term,$nr_of_GOs\n";
}

How can I extract data from HTML tables in Perl?

I'm trying to use regular expressions in Perl to parse a table with the following structure. The first line is as follows:
<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>
Here I wish to take out "Time Played", "Artist", "Title", and "Label", and print them to an output file.
I've tried many regular expressions such as:
$lines =~ / (<td>) /
OR
$lines =~ / <td>(.*)< /
OR
$lines =~ / >(.*)< /
My current program looks like so:
#!perl -w
open INPUT_FILE, "<", "FIRST_LINE_OF_OUTPUT.txt" or die $!;
open OUTPUT_FILE, ">>", "PLAYLIST_TABLE.txt" or die $!;
my $lines = join '', <INPUT_FILE>;
print "Hello 2\n";
if ($lines =~ / (\S.*\S) /) {
print "this is 1: \n";
print $1;
if ($lines =~ / <td>(.*)< / ) {
print "this is the 2nd 1: \n";
print $1;
print "the word was: $1.\n";
$Time = $1;
print $Time;
print OUTPUT_FILE $Time;
} else {
print "2ND IF FAILED\n";
}
} else {
print "THIS FAILED\n";
}
close(INPUT_FILE);
close(OUTPUT_FILE);
Do NOT use regexps to parse HTML. There are a very large number of CPAN modules which do this for you much more effectively.
Can you provide some examples of why it is hard to parse XML and HTML with a regex?
Can you provide an example of parsing HTML with your favorite parser?
HTML::Parser
HTML::TreeBuilder
HTML::TableExtract
Use HTML::TableExtract. Really.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
my $file = 'Table3.htm';
unless ( -e $file ) {
my $rc = getstore(
'http://www.ntsb.gov/aviation/Table3.htm',
$file);
die "Failed to download document\n" unless $rc == 200;
}
my @headers = qw( Year Fatalities );
my $te = HTML::TableExtract->new(
headers => \@headers,
attribs => { id => 'myTable' },
);
$te->parse_file($file);
my ($table) = $te->tables;
print join("\t", @headers), "\n";
for my $row ($te->rows ) {
print join("\t", @$row), "\n";
}
This is what I meant in another post by "task-specific" HTML parsers.
You could have saved a lot of time by directing your energy to reading some documentation rather than throwing regexes at the wall and seeing if any stuck.
That's an easy one:
my $html = '<tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</td><td width="1%"></td><td>Title</td><td>Label</td></tr>';
my @stuff = $html =~ />([^<]+)</g;
print join (", ", @stuff), "\n";
See http://codepad.org/qz9d5Bro if you want to try running it.