I have the following code at the top of every one of my PHP pages:
<?php
function name_format($str)
{
    return trim(mysql_real_escape_string(htmlspecialchars($str, ENT_QUOTES)));
}
foreach ($_POST as $key => $value) {
    if (!is_array($value))
    {
        $_POST[$key] = name_format($value);
    }
}
?>
This was pretty useful until now. I found that if I display text from a <textarea> before writing it to the database, it shows a literal "\r\n" instead of normal line breaks.
Even if I try to do the following, it doesn't work:
$str = str_replace("\r\n", "<br>", $str);
The mistake you're making here is over-writing $_POST with a version of the string which you are hoping will be appropriate for all contexts (using mysqli_real_escape_string and htmlspecialchars at the same time).
You should leave the original value untouched, and escape it where it is used, using the appropriate function for that context. (This is one reason why the "magic quotes" feature of early versions of PHP is universally acknowledged to have been a bad idea.)
So in your database code, you would prepare a variable for use with SQL (specifically, MySQL):
$comment = mysqli_real_escape_string($link, trim($_POST['comment'])); // $link is your open mysqli connection
And in your template, you would prepare a variable for use with HTML:
$comment = htmlspecialchars(trim($_POST['comment']));
Possibly adding a call to nl2br() in the HTML context, as desired.
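To illustrate the idea of keeping the raw value and escaping only at the point of output, here is a minimal sketch in Python rather than PHP. The helper name render_comment is hypothetical; html.escape stands in for htmlspecialchars, and the newline replacement plays the role of nl2br().

```python
import html

def render_comment(raw):
    """Escape a raw form value for HTML output, then turn line breaks into <br>."""
    escaped = html.escape(raw.strip(), quote=True)
    # Normalise Windows (\r\n) and old Mac (\r) line endings first.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>\n")

raw_comment = 'First line\r\nSecond <b>"line"</b>'  # untouched form input
print(render_comment(raw_comment))
```

The raw value is never overwritten, so the same string can still be handed to the database layer with its own escaping or, better, bound parameters.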
I currently have a script named "test.pl" that does a variety of things and prints out HTML code to view on a webpage. One of the things I want it to do is let the user type in a comment and select which type of comment it is; the form's processing code should then append the comment to a file. I am not sure I am doing this right, because it doesn't seem to be working and I'm getting some errors. Here are the relevant snippets of the code:
#!/usr/bin/perl
use warnings;
use CGI qw(:cgi-lib :standard); # Use CGI modules that let people read data passed from a form
#Initiate comment processing
&ReadParse(%in);
if ($in("comment") && $in("type") ! == "") {
$comment = $in("comment");
$type = $in("type");
WritetoFile($comment,$type);
}
sub WritetoFile {
my $input = shift;
my $type = shift;
my $file = "$type" . "_comment.txt";
open (my $fh, '>>', $file) or die "Could not open file '$file' $!";
print $fh "$input\n";
close $fh
}
The form I am using is this:
<FORM ACTION=test.pl METHOD=POST>
Comment:
<INPUT TYPE=TEXT NAME="comment" LENGTH=60>
<P>
Select Type
<SELECT NAME ="type">
<OPTION SELECTED> Animal
<OPTION> Fruit
<OPTION> Vegetable
<OPTION> Meat
<OPTION> Other
<INPUT TYPE=SUBMIT VALUE="Submit"></FORM>
Any suggestions on how to make this work, or even improve the process, would be greatly appreciated! I would prefer to keep the processing code and the rest of my subs in the same script (test.pl), unless this is something I have to keep separate.
Your code is a bizarre mixture of old- and new-style Perl. You're using the cgi-lib compatibility layer in CGI.pm and calling its ReadParse() function using the (unnecessary since 1994) leading ampersand. On the other hand, you're using three-arg open() and lexical filehandles. I'd be interested to hear how you developed that style.
Your problem comes from your (mis-)handling of the %in hash. Your call to ReadParse() puts all of the CGI parameters into the hash, but you're using the wrong syntax to get the values out of the hash. Hash keys are looked up using braces ({ ... }), not parentheses (( ... )).
You also have some confusion over your boolean equality operators. != is used for numeric comparisons. You want ne for string comparisons.
You probably wanted something like:
ReadParse(%in);
if ($in{comment} ne "" and $in{type} ne "") {
$comment = $in{comment};
$type = $in{type};
WritetoFile($comment,$type);
}
Your $comment and $type variables are unnecessary as you can pass the hash lookups directly into your subroutine.
WritetoFile($in{comment}, $in{type});
Finally, as others have pointed out, learning CGI in 2014 is like learning to use a typewriter - it'll still work, but people will think you're rather old-fashioned. Look at CGI::Alternatives for some more modern approaches.
My scraped content is not displaying special characters. It shows junk values in their place (€ printed as -aA). Thanks in advance.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(agent => "Mozilla/5.0");
my $req = HTTP::Request->new(GET => 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html');
my $res = $ua->request($req);
die("error") unless $res->is_success;
my $xp = HTML::TreeBuilder::XPath->new_from_content($res->content);
my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
die("node doesn't exist") if $#node == -1; # Line 18
open HTML, ">C:/Users/jeyakuma/Desktop/kjk.html";
foreach(<@node>)
{
    print HTML "$_";
}
close HTML;
Here are some observations on your code that I hope will help you.
You must always check that a call to open succeeded, otherwise your program will just continue to run silently without any input or output. Rather than the idiomatic open ... or die $!, you may prefer just to add use autodie at the top of your code.
If the HTTP request fails, it is more informative if your program indicates why it failed instead of just saying "error". I suggest you write this instead:
$res->is_success or die $res->status_line;
If you don't need any special LWP or parse options, then you can just write
my $url = 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html';
my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
although that doesn't give you any way to specify the user agent string as you do currently
Rather than testing $#node for equality to -1, it is much neater to check for the truth of @node, so
die "node doesn't exist" unless @node; # Line 18
If your data contains UTF-8 characters then your output file handle must be set to the appropriate mode. You can change the mode using binmode, like this
open HTML, ">C:/Users/jeyakuma/Desktop/kjk.html";
binmode HTML, ':encoding(utf-8)';
But the best way is to use the preferred three-parameter form of open, which would look like this, assuming that you have use autodie in place at the start of your program
open HTML, '>:encoding(utf-8)', 'C:/Users/jeyakuma/Desktop/kjk.html';
Lexical file handles are far superior to the old-fashioned global file handles.
The loop foreach(<@node>) { ... } is completely wrong because it is equivalent to foreach (glob join ' ', @node) { ... } and only appears to work because, in general, glob will leave a filename untouched if it doesn't contain any wildcards. What you meant was just for (@node) { ... }
In addition, it is bad practice to enclose a variable in quotes unless you specifically want to call its stringification method, so "$_" should be just $_
You may as well write your final output loop as
print HTML @node;
Putting these changes in place, the result looks like this, which I believe will fix your problem
use strict;
use warnings;
use autodie;
use HTML::TreeBuilder::XPath;
my $url = 'http://www.infanziabimbo.it/costi-modalita-e-tempi-di-spedizione.html';
my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
my @node = $xp->findnodes_as_strings('//div[@class="mainbox-body"]');
die "node doesn't exist" unless @node;
open my $html_fh, '>:encoding(utf-8)', 'C:/Users/jeyakuma/Desktop/kjk.html';
print $html_fh @node;
close $html_fh;
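The encoding point above can be reproduced in a few lines. This is only an illustrative sketch in Python rather than Perl, with a made-up sample string and a temporary file path: it shows how UTF-8 bytes read under the wrong encoding turn into junk characters, and how an explicit encoding on the file handle avoids that.

```python
import os
import tempfile

text = "Spedizione: 5 €"  # made-up sample text containing a non-ASCII character

# UTF-8 bytes interpreted as Windows-1252 give the typical junk characters.
garbled = text.encode("utf-8").decode("cp1252")

# Writing and reading with an explicit encoding keeps the text intact.
path = os.path.join(tempfile.mkdtemp(), "out.html")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)
with open(path, encoding="utf-8") as fh:
    round_tripped = fh.read()

print(garbled != text)        # the wrongly decoded text differs from the original
print(round_tripped == text)  # the explicitly encoded round trip matches
```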
I've recently used a pattern to replace straight double quotes by pairs of opening/closing double quotes.
$string = preg_replace('/(\")([^\"]+)(\")/','“$2”',$string);
It works fine when $string is a sentence, even a paragraph.
But…
My function can also be called to do the job on a chunk of HTML code, and then it no longer works as expected:
$string = preg_replace('/(\")([^\"]+)(\")/','“$2”','<a href="page.html">Something "with" quotes</a>');
returns
<a href=“page.html”>Something “with” quotes</a>
And that's a problem…
So I thought I could do it in two passes: extract text within tags, then replace quotes.
I tried this
$pattern='/<[^>]+>(.*)<\/[^>]+>/';
And it works for instance if the string is
$string='Something "with" quotes';
But it's not working with strings like:
$string='Something "with" quotes Something "with" quotes';
Any idea?
Bertrand
The usual reply, I guess: as has already been pointed out, you should not parse HTML with regex. You can take a look at the PHP Simple HTML DOM Parser to extract the text and then apply your regex, which, from what you have said, seems to be working just fine.
This tutorial should put you in the right direction.
I'm quite sure that this will end in a flame war but this works:
echo do_replace('Something "with" quotes')."\n";
echo do_replace('Something "with" quotes Something "with" quotes')."\n";
function do_replace($string){
preg_match_all('/<([^"]*?|"[^"]*")*>/', $string, $matches);
$matches = array_flip($matches[0]);
$uuid = md5(mt_rand());
while(strpos($string, $uuid) !== false) $uuid = md5(mt_rand());
// if you want better (time) guarantees you could build a prefix tree and search it for a string not in it (would be O(n))
foreach($matches as $key => $value)
$matches[$key] = $uuid.$value;
$string = str_replace(array_keys($matches), $matches, $string);
$string = preg_replace('/\"([^\"<]+)\"/','“$1”', $string);
return str_replace($matches, array_keys($matches), $string);
}
output (I replaced &ldquo; and &rdquo; with “ and ”):
Something “with” quotes
Something “with” quotes Something “with” quotes
With a custom state machine you could even do it without the first replace and then the replace back. I recommend using a parser anyway.
I finally found a way:
extract text that can be inside, or outside (before, after) any tag (if any)
use a callback to find quotes by pair and replace them.
code
$string = preg_replace_callback('/[^<>]*(?!([^<]+)?>)/sim', function ($matches) { return preg_replace('/(\")([^\"]+)(\")/', '“$2”', $matches[0]); }, $string); // closure instead of create_function(), which was removed in PHP 8
Bertrand, resurrecting this question because it had a simple solution that lets you do the replace in one go—no need for a callback. (Found your question while doing some research for a general question about how to exclude patterns in regex.)
Here's our simple regex:
<[^>]*>(*SKIP)(*F)|"([^"]*)"
The left side of the alternation matches complete <tags> then deliberately fails. The right side matches double-quoted strings, and we know they are the right strings because they were not matched by the expression on the left.
This code shows how to use the regex (see the results at the bottom of the online demo):
<?php
$regex = '~<[^>]*>(*SKIP)(*F)|"([^"]*)"~';
$subject = 'Something "with" quotes Something "with" quotes';
$replaced = preg_replace($regex,"“$1”",$subject);
echo $replaced."<br />\n";
?>
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
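The same "match what you want to skip first" idea can be sketched in engines without (*SKIP)(*F). The following is a Python illustration, not the PHP solution above: Python's re module has no backtracking control verbs, so a replacement callback returns tags unchanged instead. The function name curl_quotes is hypothetical.

```python
import re

# Left branch: a whole tag. Right branch: a double-quoted string (captured).
PATTERN = re.compile(r'<[^>]*>|"([^"]*)"')

def curl_quotes(text):
    def swap(match):
        if match.group(1) is None:   # the tag branch matched: leave it untouched
            return match.group(0)
        return "“" + match.group(1) + "”"
    return PATTERN.sub(swap, text)

print(curl_quotes('<a href="page.html">Something "with" quotes</a>'))
```

Because the tag branch consumes the whole tag, quotes inside attributes are never offered to the right-hand branch.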
Will this do the trick if I am sanitizing data that goes from a form into a MySQL table? The data that should be entered will be school subjects and professors' first and last names. Any other suggestions on how to do this?
/*
Sanitize() function removes any potential threat from the
data submitted. Prevents email injections or any other hacker attempts.
if $remove_nl is true, newline characters are removed from the input.
*/
function Sanitize($str,$remove_nl=true)
{
$str = $this->StripSlashes($str);
if($remove_nl)
{
$injections = array('/(\n+)/i',
'/(\r+)/i',
'/(\t+)/i',
'/(%0A+)/i',
'/(%0D+)/i',
'/(%08+)/i',
'/(%09+)/i'
);
$str = preg_replace($injections,'',$str);
}
return $str;
}
function StripSlashes($str)
{
if(get_magic_quotes_gpc())
{
$str = stripslashes($str);
}
return $str;
}
I recommend PHP's PDO class. You would do something like:
try
{
$sql = 'INSERT INTO whatever(a,b,c) VALUES(:a,:b,:c)';
//or if you prefer...
$sql = 'INSERT INTO whatever(a,b,c) VALUES(?,?,?)';
$stmt = db::db()->prepare($sql);
$stmt->execute(array(123,234,345));
}
catch(PDOException $e){library::sql_error($e,$sql);}
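For readers who want to see bound parameters end to end, here is a small self-contained sketch. It uses Python's sqlite3 module with an in-memory database rather than PDO and MySQL, and the sample values are invented; only the table name whatever and its columns come from the snippet above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (a TEXT, b TEXT, c TEXT)")

# Named placeholders: values are bound, never spliced into the SQL text.
conn.execute(
    "INSERT INTO whatever (a, b, c) VALUES (:a, :b, :c)",
    {"a": "Maths", "b": "Ada", "c": "O'Brien"},
)
# Positional placeholders work the same way.
conn.execute(
    "INSERT INTO whatever (a, b, c) VALUES (?, ?, ?)",
    ("History", "Robert'); DROP TABLE whatever;--", "Tables"),
)

rows = conn.execute("SELECT a, b, c FROM whatever ORDER BY a").fetchall()
print(rows)
```

The quote characters and the attempted injection are stored as ordinary text, with no manual escaping involved.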
Thanks to everyone for taking the time to help. I went with the preg_replace function, which limits characters to only what I want people to use: preg_replace("~" . "[^a-zA-Z0-9\-\_\.\ ]" . "~iU", "", $string). I also used mysql_real_escape_string, so I'm doing two levels of filtering before sending the data on to the database.
Why don't you use mysql_real_escape_string(), which escapes all characters that can cause issues? Besides being built in, it calls MySQL's own mysql_real_escape_string, so you know you'll always be up to date on what needs to be escaped for your installed database.
The best option is to use PDO's bindValue method:
http://www.php.net/manual/en/pdostatement.bindvalue.php
This sorts out all your escaping.
For forms, you can also look at this:
http://semlabs.co.uk/docs/xfl/xfl-elements/sanitise
It's a set of PHP classes to handle forms with less hassle, though it will take a while to get your head round.
Try this:
function sanatize($value) {
$value = preg_replace("~" . "[^a-zA-Z0-9\-\_\.]" . "~iU", "", $value);
return $value;
}
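To make the effect of such a whitelist visible, here is a rough Python equivalent. The function name keep_allowed is hypothetical, and the character class includes a space, as in the variant quoted earlier in this thread.

```python
import re

# Anything that is not a letter, digit, hyphen, underscore, dot or space is dropped.
NOT_ALLOWED = re.compile(r"[^a-zA-Z0-9\-_. ]")

def keep_allowed(value):
    return NOT_ALLOWED.sub("", value)

print(keep_allowed("Dr. O'Brien <script>"))
```

Note that the apostrophe in a legitimate name is removed along with the angle brackets, so a filter like this changes valid input and is best seen as a complement to bound parameters rather than a replacement for them.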
I want to parse a Website into a Perl data structure.
First I load the page with
use LWP::Simple;
my $html = get("http://f.oo");
Now I know two ways to deal with it.
The first is regular expressions and the second is modules.
I started with reading about HTML::Parser and found some examples.
But I'm not that sure about my Perl knowledge.
My code example goes on
my @links;
my $p = HTML::Parser->new();
$p->handler(start => \&start_handler,"tagname,attr,self");
$p->parse($html);
foreach my $link(@links){
    print "Linktext: ",$link->[1],"\tURL: ",$link->[0],"\n";
}
sub start_handler{
    return if(shift ne 'a');
    my ($class) = shift->{href};
    my $self = shift;
    my $text;
    $self->handler(text => sub{$text = shift;},"dtext");
    $self->handler(end => sub{push(@links,[$class,$text]) if(shift eq 'a')},"tagname");
}
I don't understand why shift is called twice. The second one should be the self pointer. But the first makes me think that the self reference has already been shifted, used as a hash, and the value for href stored in $class. Could someone explain this line (my ($class) = shift->{href};)?
Besides this gap in my understanding, I do not want to parse all the URLs. I want to put all the code between <div class ="foo"> and </div> into a string. There is a lot of code in between, especially other <div></div> tags, so either I or a module has to find the right closing tag.
After that I plan to scan the string again to find specific elements and classes, like <h1>, <h2>, <p class ="foo2"></p>, etc.
I hope this information helps you give me some useful advice. Please bear in mind that, first of all, I want an easy-to-understand approach; it does not need to perform well at this stage!
HTML::Parser is more of a tokenizer than a parser. It leaves a lot of hard work up to you. Have you considered using HTML::TreeBuilder (which uses HTML::Parser) or XML::LibXML (a great library which has support for HTML)?
Use HTML::TokeParser::Simple.
Untested code based on your description:
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TokeParser::Simple;
my $p = HTML::TokeParser::Simple->new(url => 'http://example.com/example.html');
my $level;
while (my $tag = $p->get_tag('div')) {
my $class = $tag->get_attr('class');
next unless defined($class) and $class eq 'foo';
$level += 1;
while (my $token = $p->get_token) {
$level += 1 if $token->is_start_tag('div');
$level -= 1 if $token->is_end_tag('div');
print $token->as_is;
unless ($level) {
last;
}
}
}
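The depth counter used above is the key idea: increment on every nested opening div, decrement on every closing div, and stop when the count returns to zero. As an illustration only, here is the same technique sketched with Python's standard html.parser module instead of HTML::TokeParser::Simple. The class name DivCollector and the sample markup are made up for this example.

```python
from html.parser import HTMLParser

class DivCollector(HTMLParser):
    """Collect the tags and text inside each <div> carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # 0 means we are outside a wanted div
        self.parts = []     # pieces of the div currently being collected
        self.results = []   # one string per collected div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # nested div: go one level deeper
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and self.wanted_class in (dict(attrs).get("class") or "").split():
            self.depth = 1               # entering a wanted div

    def handle_startendtag(self, tag, attrs):
        if self.depth:                   # self-closing tags such as <br/>
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:          # matching close of the wanted div
                self.results.append("".join(self.parts))
                self.parts = []
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

page = ('<p>skip</p><div class="foo"><h1>Title</h1>'
        '<div><p class="foo2">inner</p></div></div><div class="bar">no</div>')
parser = DivCollector("foo")
parser.feed(page)
parser.close()
print(parser.results)
```

Each collected string can then be fed to a second parser pass to pick out the headings and paragraphs of interest.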
No need to get so complicated. You can retrieve and find elements in the DOM using CSS selectors with Mojo::UserAgent:
say Mojo::UserAgent->new->get('http://f.oo')->res->dom->find('div.foo');
or, loop through the elements found:
say $_ for Mojo::UserAgent->new->get('http://f.oo')->res->dom
->find('div.foo')->each;
or, loop using a callback:
Mojo::UserAgent->new->get('http://f.oo')->res->dom->find('div.foo')->each(sub {
my ($el, $count) = @_;
say "$count: $el";
});
According to the docs, the handler's signature is (\%attr, \@attr_seq, $text). There are three shifts, one for each argument.
my ($class) = shift->{href};
is equivalent to:
my $class;
my %attr_seq;
my $attr_seq_ref;
$attr_seq_ref = shift;
%attr_seq = %$attr_seq_ref;
$class = $attr_seq{'href'};