I want to store text/string in a text field in a database.
This string has the variable $name in it.
When I pull it out of the database, I want that variable to be substituted with the value I define before I print the string.
# Variable I want to substitute #
1. $name='John';
# needs to be read from database #
2. $txt{'hello'}="Hello ${name}, How are you?";
3. print "<tag>$txt{'hello'}</tag>";
It prints Hello John, How are you? as required, but when the 2nd line is read from the database, it displays Hello ${name}, How are you? instead.
Some things I found are:
Locale::Maketext
$string =~ s/(\$\w+)/$1/eeg;
my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
Can someone guide me about how to go about it?
What you're describing is a template. There are lots of template systems on CPAN of varying degrees of complexity. Text::Template and Template Toolkit are a couple of popular ones. You don't want to let your templates access arbitrary variables; that's a security hole. Instead, put the variables they're allowed to access into a hash.
If all you need is a very simple system, you can do something like this:
sub fill_in_template
{
my ($text, $values) = @_;
$text =~ s/ \$\{ ( [^}\s]+ ) \} /$values->{$1}/gx;
return $text;
}
my %txt;
my %values = (name => 'Your Name');
my $template = 'Hello ${name}, How are you?'; # $name NOT interpolated
$txt{'hello'} = fill_in_template($template, \%values);
print "<tag>$txt{'hello'}</tag>\n";
You might add some error checking in case the template uses a field that's not defined. But if you need something more complicated than that, you're probably better off picking an existing template system from CPAN.
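A minimal sketch of that error checking, building on the fill_in_template function above (dying on an unknown field is just one choice; you might prefer a warning or leaving the placeholder intact):

```perl
use strict;
use warnings;

# Like fill_in_template above, but dies if the template references
# a field that is not present in the values hash.
sub fill_in_template {
    my ($text, $values) = @_;
    $text =~ s{ \$ \{ ( [^\}\s]+ ) \} }
              { exists $values->{$1} ? $values->{$1}
                                     : die "Unknown template field '$1'\n" }gex;
    return $text;
}

my %values = (name => 'John');
print fill_in_template('Hello ${name}, How are you?', \%values), "\n";
# prints: Hello John, How are you?

# fill_in_template('Hi ${nickname}!', \%values) would now die with
# "Unknown template field 'nickname'".
```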
Locale::Maketext is intended for internationalization (so your app can produce output in multiple languages without your translators needing to work on the code directly) and is not the sort of template engine you're looking for.
Related
I have an assignment for school. We need to use the POST method to pass data from an embedded HTML form to a Perl script. The Perl script will then loop through the data and display it back to the user.
I was using a separate file and the GET method but my instructor said I had to embed the HTML and use POST. But the information is no longer stored in the environment variable.
Is there a built-in variable I can use to access this information?
#!/usr/bin/perl -w
print "Content-Type:text/html\n\n";
$qstring = $ENV{'QUERY_STRING'};
$rmethod = $ENV{'REQUEST_METHOD'};
if ( $rmethod eq "GET" ) {
&displayform();
}
elsif ( $rmethod eq "POST" ) {
@pairs = split(/&/, $qstring);
foreach (@pairs) {
($key, $value) = split(/=/);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$form{$key} = $value;
}
print "<html><head><title>Student Survey</title></head>\n";
print "<body>\n";
&displayInfo();
print "</body></html>\n";
}
sub displayInfo {
print "Full Name:", $form{"person"}, "<br>";
print "Favourite:", $form{"sport"}, "<br>";
print "Favourite Course:", $form{"course"}, "<br>";
print "GPA:", $form{"gpa"}, "<br>";
}
sub displayform {
print qq~ (FORM GOES HERE)
Someone is teaching you very bad practices. I don't know whether your sample code is following examples supplied by your school or whether you have cobbled it together from bad examples on the internet - but either way, this code uses techniques that have been out of date for twenty years.
There's a whole debate to be had about the wisdom of teaching CGI programming in 2017 (see CGI::Alternatives for a brief discussion of some better approaches) but let's ignore that and assume that CGI is a good idea here.
If you're writing a CGI program, then you should use the CGI.pm library, which was part of the standard Perl distribution for over twenty years (it was removed from core recently, but the chances of your school using a version that up to date are tiny).
A standard CGI program, using CGI.pm looks like this:
#!/usr/bin/env perl
use strict;
use warnings;
use CGI qw[header param]; # Load the two functions we're going to use
# Display the content-type header
print header;
# See if we have been passed parameters.
if (param) {
# Display the parameters
# Note: I've ignored HTML here. You shouldn't
print 'Name: ', param('person');
print 'Sport: ', param('sport');
# etc...
} else {
# Display the form
...
}
I ignored HTML in my example because embedding HTML in your Perl code is a terrible idea. It's a much better idea to use a templating system (I recommend the Template Toolkit).
A few other points:
Always use strict and use warnings.
-w on the shebang was obsoleted by use warnings in 2000.
Using & on subroutine calls has been unnecessary since Perl 5 was released in 1994.
I know that you don't know any better and that you're just following what your teacher is telling you to do. But it's really depressing to see such outdated practices being taught in school.
Update: And just to add the answer to your original question. You're right that in a POST request, the parameter data is no longer available in the QUERY_STRING environment variable - you need to read it from STDIN instead. That's one of the many advantages of CGI.pm - you use the same method (the param() subroutine) to access both GET and POST parameters.
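If CGI.pm really is unavailable, here is a minimal sketch of reading a POST body by hand (parse_form is a made-up helper name; the decoding logic mirrors the loop in the question, and in real code you should still prefer CGI.pm or a modern framework):

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into a hash.
sub parse_form {
    my ($data) = @_;
    my %form;
    foreach my $pair (split /&/, $data) {
        my ($key, $value) = split /=/, $pair, 2;
        for ($key, $value) {
            next unless defined;
            tr/+/ /;                              # '+' encodes a space
            s/%([a-fA-F0-9]{2})/chr hex $1/eg;    # %XX escapes
        }
        $form{$key} = $value;
    }
    return %form;
}

# For POST, the body arrives on STDIN; CONTENT_LENGTH says how much to read.
my $data = '';
if (($ENV{REQUEST_METHOD} // '') eq 'POST') {
    read STDIN, $data, $ENV{CONTENT_LENGTH} // 0;
} else {
    $data = $ENV{QUERY_STRING} // '';
}
my %form = parse_form($data);
```

The key point is that for POST you read CONTENT_LENGTH bytes from STDIN, whereas for GET the same encoded string arrives in QUERY_STRING.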
I have 5 different variables coming from different if and loop statements. When I use "puts" to write the output to a text file, all the characters and digits run together, like this: alphaclass112098voip, where the variables are:
name = alpha
category = class1
number = 12098
service = voip
I want the output in the file like this, with spaces, on the same line:
Alpha class1 12098 voip
Beta class1 12093 DHCP SIP
Also, at a certain point, I want to throw in delimiters for future purposes.
The easiest way to deal with this is to construct a Tcl list that represents the record. You can do this piecemeal with lappend, or all at once with list. Or mix them.
foreach name $names category $categories number $numbers service $services {
set record [list $name $category]
lappend record $number
lappend record $service
puts $record
}
This shows the record for each line in a format that Tcl finds easy to parse (you'll see what I mean if you have a name with a space in it). To use a delimiter to separate the values instead, the join command is very useful:
puts [join $record $delimiter]
The default delimiter is space, but try a : instead to see how it works.
If you're generating a CSV file, do use the csv package in Tcllib. It handles the tricky edge-cases (e.g., embedded commas) for you.
I have an HTML page that contains URLs like:
<h3><a href="http://site.com/path/index.php" h="blablabla">
<h3><a href="https://www.site.org/index.php?option=com_content" h="vlavlavla">
I want to extract:
site.com/path
www.site.org
between <h3><a href=" and /index.php.
I've tried this code:
#!/usr/local/bin/perl
use strict;
use warnings;
open (MYFILE, 'MyFileName.txt');
while (<MYFILE>)
{
my $values1 = split('http://', $_); # VALUE WILL BE: www.site.org/path/index2.php
my @values2 = split('index.php', $values1); # VALUE WILL BE: www.site.org/path/ ?option=com_content
print $values2[0]; # here it must print www.site.org/path/ but it doesn't
print "\n";
}
close (MYFILE);
but this give an output :
2
1
2
2
1
1
and it doesn't parse https websites.
Hope you understand. Regards.
The main thing wrong with your code is that you call split in scalar context, as in your line:
my $values1 = split('http://', $_);
In scalar context, split returns the size of the list it created. See the documentation for split.
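A quick demonstration of the difference context makes:

```perl
use strict;
use warnings;

my $line = 'http://www.site.org/path/index.php';

# List context: you get the pieces. Note the leading empty field,
# because the pattern matches at the very start of the string.
my @parts = split 'http://', $line;

# Scalar context: you get the NUMBER of pieces, which is why the
# original code printed 2s and 1s instead of URLs.
my $count = split 'http://', $line;

print "parts: @parts\n";   # parts:  www.site.org/path/index.php
print "count: $count\n";   # count: 2
```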
But I don't think split is appropriate for this task anyway. If you know that the value you are looking for will always lie between 'http[s]://' and '/index.php', you just need a regex substitution in your loop (you should also be more careful when opening your file):
open(my $myfile_fh, '<', 'MyFileName.txt') or die "Couldn't open $!";
while(<$myfile_fh>) {
s{.*http[s]?://(.*)/index\.php.*}{$1} && print;
}
close($myfile_fh);
It's likely you will need a more general regex than that, but I think this would work based on your description of the problem.
This feels to me like a job for modules:
HTML::LinkExtor
URI
Generally, using regexps to parse HTML is risky.
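A sketch of that approach, assuming the HTML::LinkExtor and URI modules from CPAN are installed (the sample markup is taken from the question; extract_sites is a made-up helper name):

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

my $html = <<'HTML';
<h3><a href="http://site.com/path/index.php" h="blablabla">
<h3><a href="https://www.site.org/index.php?option=com_content" h="vlavlavla">
HTML

# Let HTML::LinkExtor find the links, then let URI pick them apart,
# instead of parsing HTML with regexes.
sub extract_sites {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;
    $parser->parse($html);
    $parser->eof;
    my @out;
    for my $link ($parser->links) {       # each entry: [tag, attr => value, ...]
        my ($tag, %attrs) = @$link;
        next unless $tag eq 'a' and defined $attrs{href};
        my $uri = URI->new($attrs{href});
        next unless $uri->scheme and $uri->scheme =~ /^https?$/;
        (my $path = $uri->path) =~ s{/index\.php\z}{};   # drop trailing /index.php
        push @out, $uri->host . $path;
    }
    return @out;
}

print "$_\n" for extract_sites($html);
# prints:
# site.com/path
# www.site.org
```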
dms explained in his answer why using split isn't the best solution here:
It returns the number of items in scalar context
A normal regex is better suited for this task.
However, I do not think that line-based processing of the input is valid for HTML, or that using a substitution makes sense (it does not, especially when the pattern looks like .*Pattern.*).
Given a URL, we can extract the required information like this:
if ($url =~ m{^https?://(.+?)/index\.php}s) { # domain+path now in $1
say $1;
}
But how do we extract the URLs? I'd recommend the wonderful Mojolicious suite.
use strict; use warnings;
use feature 'say';
use File::Slurp 'slurp'; # makes it easy to read files.
use Mojo;
my $html_file = shift @ARGV; # take file name from command line
my $dom = Mojo::DOM->new(scalar slurp $html_file);
for my $link ($dom->find('a[href]')->each) {
say $1 if $link->attr('href') =~ m{^https?://(.+?)/index\.php}s;
}
The find method can take CSS selectors (here: all a elements that have an href attribute). The each method flattens the result set into a list which we can loop over.
Since the script prints to STDOUT, we can use shell redirection to put the output into the file we want, e.g.
$ perl the-script.pl html-with-links.html >only-links.txt
The whole script as a one-liner:
$ perl -Mojo -E'$_->attr("href") =~ m{^https?://(.+?)/index\.php}s and say $1 for x(b("test.html")->slurp)->find("a[href]")->each'
I asked a question a couple of days ago about stripping HTML from files with Perl. I am a n00b, and I've searched the site for answers to my question, but unfortunately I couldn't find anything. This is probably because I'm a n00b and I didn't see the answer when I was looking at it.
So, here is the situation. I have a directory with around 20 GB of text files. I want to strip the HTML from each file and output each file to a unique text file. I've written the program below, which seems to do the trick for the first 12 text files in the directory (there are about 12,000 text files in total). However, I run into a couple of snags. The first snag is that after the 12th text file has been parsed, I start getting warnings about deep recursion, and shortly after that the program quits because I've run out of memory. I imagine that my programming is extremely inefficient. So, I'm wondering if any of you see any obvious errors with my code below that would cause me to run out of memory. ...once I figure things out, then hopefully I'll be able to contribute.
#!/usr/bin/perl -w
#use strict;
use Benchmark;
#get the HTML-Format package from the package manager.
use HTML::Formatter;
#get the HTML-TREE from the package manager
use HTML::TreeBuilder;
use HTML::FormatText;
$startTime = new Benchmark;
my $direct="C:\\Directory";
my $slash='\\';
opendir(DIR1,"$direct")||die "Can't open directory";
my @New1 = readdir(DIR1);
foreach $file (@New1)
{
if ($file=~/^\./){next;}
#Initialize the variable names.
my $HTML=0;
my $tree="Empty";
my $data="";
#Open the file and put the file in variable called $data
{
local $/;
open (SLURP, "$direct$slash"."$file") or die "can't open $file: $!";
#read the contents into data
$data = <SLURP>;
#close the filehandle called SLURP
close SLURP or die "cannot close $file: $!";
if($data=~m/<HTML>/i){$HTML=1;}
if($HTML==1)
{
#the following steps strip out any HTML tags, etc.
$tree=HTML::TreeBuilder->new->parse($data);
$formatter=HTML::FormatText->new(leftmargin=> 0, rightmargin=>60);
$Alldata=$formatter->format($tree);
}
}
#print
my $outfile = "out_".$file;
open (FOUT, "> $direct\\$outfile");
print FOUT "file: $file\nHTML: $HTML\n$Alldata\n","*" x 40, "\n" ;
close(FOUT);
}
$endTime = new Benchmark;
$runTime = timediff($endTime, $startTime);
print ("Processing files took ", timestr($runTime));
You are using up a lot of space with the list of files in @New1.
In addition, if you are using an older version of HTML::TreeBuilder then your objects of this class may need explicitly deleting, as they used to be immune to Perl's automatic garbage collection.
Here is a program that avoids both of these problems, by reading the directory incrementally, and by using HTML::FormatText->format_string to format the text, which implicitly deletes any HTML::TreeBuilder objects that it creates.
In addition, File::Spec makes a tidier job of building absolute file paths, and it is a core module, so it will not need installing on your system.
use strict;
use warnings;
use File::Spec;
use HTML::FormatText;
my $direct = 'C:\Directory';
opendir my $dh, $direct or die "Can't open directory";
while ( readdir $dh ) {
next if /^\./;
my $file = File::Spec->catfile($direct, $_);
my $outfile = File::Spec->catfile($direct, "out_$_");
next unless -f $file;
my $html = do {
open my $fh, '<', $file or die qq(Unable to open "$file" for reading: $!);
local $/;
<$fh>;
};
next unless $html =~ /<html/i;
my $formatted = HTML::FormatText->format_string(
$html, leftmargin => 0, rightmargin => 60);
open my $fh, '>', $outfile or die qq(Unable to open "$outfile" for writing: $!);
print $fh "File: $file\n\n";
print $fh "$formatted\n";
print $fh "*" x 40, "\n" ;
close $fh or die qq(Unable to close "$outfile" after writing: $!);
}
What was wrong with the answer to your previous question?
You're opening files for writing without checking the return code. Are you sure they succeed? And in which directory do you think the files are created?
A better approach would be to:
read files 1 by 1
strip the HTML
write out the new file in the correct directory and checking the return code
something like:
while ( my $file = readdir DIR ) {
.... process file
open my $newfile, '>', "$direct/out_$file" or die "cannot open out_$file: $!\n";
... etc
}
How to reduce the memory footprint of this application:
Does the problem persist when you add $tree = $tree->delete to the end of your loop?
The Perl garbage collector cannot resolve circular references, so you have to destroy the tree manually to avoid running out of memory.
(See the first example in the module documentation at http://metacpan.org/pod/HTML::TreeBuilder)
You should put the readdir inside the loop. The way you coded it, you first read in this gigantic list of files. When you say
my $file;
while (defined($file = readdir DIR1)) {..}
only one entry is actually read at a time. That should save some extra memory.
A few other comments on style:
default values
You give $tree the default value of "Empty". That is completely unnecessary. If you want to show how undefined a variable is, set it to undef, which it is by default. Perl guarantees this initialization.
backslashes
You use backslashes as a directory separator? Stop worrying and just use normal slashes. Unless you are on DOS, you can use normal slashes as well; Windows isn't that dumb.
statement modifiers
This line
if ($file=~/^\./){next;}
can be written far more readable as
next if $file =~ /^\./;
consistent use of parens
Your use of parens for function argument lists is inconsistent. You can omit the parens for all built-in functions unless there is ambiguity. I prefer avoiding them; others may find them easier to read. But please stick to one style!
better regex
You test for the existence of /<HTML>/i. What if I told you the html tag can have attributes? You should rather consider testing for /<html/i.
simplification (removes another bug)
Your test
if($data=~m/<HTML>/i){$HTML=1;}
if($HTML==1) {...}
can be written as
$HTML = $data =~ /<html/i;
if ($HTML == 1) {...}
can be written as
$HTML = $data =~ /<html/i;
if ($HTML) {...}
can be folded into
if ($data =~ /<html/i) {...}
The way you implemented it, the $HTML variable was never reset to a false value. So once a file contained html, all subsequent files would have been treated as html as well. You can counteract such problems by defining your vars in the innermost sensible scope.
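A small demonstration of that scoping point, using made-up in-memory "files":

```perl
use strict;
use warnings;

my @contents = ('<html><body>hi</body></html>', 'plain text, no markup');

# Wrong: a flag declared outside the loop is never reset, so the second
# "file" is misclassified as HTML.
my $HTML = 0;
my @stale;
for my $data (@contents) {
    $HTML = 1 if $data =~ /<html/i;
    push @stale, $HTML;
}
print "stale: @stale\n";    # stale: 1 1

# Right: declare the flag in the innermost sensible scope, so each
# iteration starts fresh.
my @fresh;
for my $data (@contents) {
    my $is_html = $data =~ /<html/i ? 1 : 0;
    push @fresh, $is_html;
}
print "fresh: @fresh\n";    # fresh: 1 0
```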
use HTML::FormatText, tribute to @pavel
Use the modules you use to the fullest. Look what I found in the example for HTML::FormatText:
my $string = HTML::FormatText->format_file(
'test.html',
leftmargin => 0, rightmargin => 50
);
You can easily adapt that to circumvent building the tree manually. Why hadn't you tried this approach, as @pavel suggested in your other post? It would have saved you the memory problem...
use strict
Why did you comment out use strict? Getting as many fatal warnings as possible is important when learning a language, or when writing solid code. That would force you to declare all your variables, like $file, sensibly. And rather use warnings than the -w switch, which is a bit outdated.
well done
But a very big "well done" on checking the return value of close ;-) That is very un-n00bish!
I have a .SQL file containing a large number of queries. They are being run against a database containing data for multiple states over multiple years. The machine I am running this on can only handle running the queries for one state, in one year, at a time.
I am trying to create a Perl script that takes user input for the state abbreviation, the state id number, and the year. It then creates a directory for that state and year. Then it opens the "base" .SQL file and searches and replaces the base state id and year with the user input, and saves this new .SQL file to the created directory.
The current script I have (below) stops at
open(IN,'<$infile')
with
"Can't open [filename]"
It seems that it is having difficulty finding or opening the .SQL file. I have quadruple-checked to make sure the paths are correct, and I have even tried replacing the
$path
with an absolute path for the base file. If it was having trouble with creating the new file I'd have more direction, but since it can't find/open the base file I do not know how to proceed.
#!/usr/local/bin/perl
use Cwd;
$path = getcwd();
#Cleans up the path
$path =~ s/\\/\//sg;
#User inputs
print "What is the 2 letter state abbreviation for the state? Ex. 'GA'\n";
$stlet = <>;
print "What is the 2 digit state abbreviation for the state? Ex. '13'\n";
$stdig = <>;
print "What four-digit year are you doing the calculations for? Ex. '2008'\n";
$year = <>;
chomp $stlet;
chomp $stdig;
chomp $year;
#Creates the directory
mkdir($stlet);
$new = $path."\/".$stlet;
mkdir("$new/$year");
$infile = '$path/Base/TABLE_1-26.sql';
$outfile = '$path/$stlet/$year/TABLE_1-26.sql';
open(IN,'<$infile') or die "Can't open $infile: $!\n";
open(OUT,">$infile2") or die "Can't open $outfile: $!\n";
print "Working...";
while (my $search = <IN>) {
chomp $search;
$search =~ s/WHERE pop.grp = 132008/WHERE pop.grp = $stdig$year/g;
print OUT "$search\n";
}
close(IN);
close(OUT);
I know I also probably need to tweak the regular expression some, but I'm trying to take things one at a time. This is my first Perl script, and I haven't really been able to find anything that handles .SQL files like this that I can understand.
Thank you!
$infile = '$path/Base/TABLE_1-26.sql';
The string on that line is single-quoted, so $path won't interpolate, and your program is looking for a file literally named $path/Base/TABLE_1-26.sql.
You want
$infile = "$path/Base/TABLE_1-26.sql";
Or, better,
use File::Spec;
....
$infile = File::Spec->catfile($path, 'Base', 'TABLE_1-26.sql');
and similarly for $outfile -- or was that $infile2 ? :) I'd strongly recommend putting a use strict; and use warnings; at the top of this and your future scripts.