I am trying to validate form data on the server side.
I want the user to fill in the form using only Persian characters.
I am using this code:
$name=trim($_POST['name']);
$name= mysql_real_escape_string($name);
if (preg_match('/^[\u0600-\u06FF]+$/',str_replace("\\\\","",$name))){$err.= "Please use Persian characters!";}
But it is not working!
Here is the warning:
Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in C:\xampp\htdocs\site\form.php on line 38
What can I do?
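This should work. PCRE does not understand \u escapes; use \x{...} code points together with the /u modifier so that the pattern and subject are treated as UTF-8.
Edit: added a ^ after the opening [ so that the match succeeds (and the error message is shown) when the input consists only of characters outside the Arabic/Farsi range.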
if (preg_match('/^[^\x{600}-\x{6FF}]+$/u', str_replace("\\\\","",$name)))
http://utf8-chartable.de/unicode-utf8-table.pl?start=1536&number=1024&utf8=0x&addlinks=1&htmlent=1
The Persian-specific letters پ ژ گ چ are also in the 0600-06FF range.
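For comparison, the same range check can be sketched outside PHP. This is a minimal Python sketch, not the answer's own code: it assumes that "Persian characters" means any code point in the Arabic block U+0600-U+06FF plus whitespace, and the name is_persian_text is hypothetical.

```python
import re

# Assumption: the Arabic block U+0600-U+06FF plus whitespace counts as "Persian" input.
PERSIAN_ONLY = re.compile(r"[\u0600-\u06FF\s]+")


def is_persian_text(value: str) -> bool:
    """Return True when every character is in U+0600-U+06FF or is whitespace."""
    # fullmatch anchors the pattern at both ends, like ^...$ in the PHP version.
    return PERSIAN_ONLY.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    print(is_persian_text("\u0633\u0644\u0627\u0645"))  # Persian word "salam"
    print(is_persian_text("hello"))
```

Note that the zero-width non-joiner (U+200C), which Persian text often uses, lies outside this range and would have to be added to the character class if it should be accepted.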
Persian (fa) characters only:
preg_match('/^[پچجحخهعغفقثصضشسیبلاتنمکگوئدذرزطظژؤإأءًٌٍَُِّ\s]+$/u', $input);
English letters, digits, and Persian characters:
preg_match('/^([a-zA-Z0-9 پچجحخهعغفقثصضشسیبلاتنمکگوئدذرزطظژؤإأءًٌٍَُِّ])+$/u', $input);
You can also add Persian digits or the Arabic forms ي and ك to the character class.
You should use this:
if(preg_match("/^[آ ا ب پ ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی]+$/u", $_POST['name']))
I am trying to find the encoding scheme for this page (and others) which are surely Arabic, using lower ASCII range Latin characters to encode the contents.
http://www.saintcyrille.com/2011a.htm
http://www.saintcyrille.com/2011b.htm (English version/translation of that same page)
I have seen several sites and even PDF documents with this encoding, but I can't find the name or method of it.
This specific page is from 2011 and I think this is a pre-Unicode method of encoding Arabic that has fallen out of fashion.
Some sample text:
'D1J'6) 'D1H-J) 'DA5-J)
*#ED'* AJ 3A1 'D*CHJF
JDBJG'
'D#( / 3'EJ -D'B 'DJ3H9J
'D0J J#*J .5J5'K EF -D( @ 3H1J'
An extraordinary mojibake case. It looks like the high byte of each Unicode code point is missing from the Arabic text. For instance: ا (U+0627, Arabic Letter Alef) appears as ' (U+0027, Apostrophe).
Let's suppose that the missing high byte is always 0x06 in the following PowerShell script (I added some more strings from the very end of the page http://www.saintcyrille.com/2011a.htm to your sample text):
$mojibakes = @'
E3'!K
'D1J'6) 'D1H-J) 'DA5-J)
*#ED'* AJ 3A1 'D*CHJF
JDBJG'
'D#( / 3'EJ -D'B 'DJ3H9J
'D0J J#*J .5J5'K EF -D( @ 3H1J'
ED'-8'* :
'D#CD 'D5J'EJ 7H'D 'D#3(H9 'D98JE E-(0 ,/'K H'D5HE JF*GJ (9/ B/'3 'D9J/
J-(0 'D*B/E DD'9*1'A (9J/'K 9F JHE 'D9J/ (B/1 'D%EC'F -*I *3*7J9H' 'DE4'1C) AJ 'D5DH'* HB/'3 'D9J/ HFF5- D0DC 'D'3*A'/) EF -AD) 'D*H() 'D,E'9J) JHE 'D,E9) 15 '(1JD 2011 -J+ JGJ# 'D,EJ9 E9'K DFH'D 31 'DE5'D-) ( 9// EF 'D#('! 'DCGF) 3JCHF -'61'K )
(5F/HB 'D5HE) 9F/ E/.D 'DCFJ3) AAJ A*1) 'D#9J'/ *8G1 AJF' #9E'D 'D1-E) H'D5/B'* HE' JB'(DG' H0DC 9ED EB(HD HEE/H-
HDF' H7J/ 'D#ED #F *4'1CH' 'D'-*A'D'* AJ 19J*CE HCD 9'E H#F*E (.J1
'DE3J- B@'E ... -@B'K B@'E
'@ -split [System.Environment]::NewLine
Function highByte ([byte]$lowByte, [switch]$moreInfo) {
if ( $moreInfo.IsPresent -and (
$lowByte -lt 0x20 -or $lowByte -gt 0x7f )) {
Write-Host $lowByte -ForegroundColor Cyan
}
if ( $lowByte -eq 0x20 ) { 0,$lowByte } else { 6,$lowByte }
}
foreach ( $mojibake in $mojibakes ) {
$aux = [System.Text.Encoding]::
GetEncoding( 1252).GetBytes( [char[]]$mojibake )
[System.Text.Encoding]::BigEndianUnicode.GetString(
$aux.ForEach({(highByte -lowByte $_)})
)
'' # new line separator for better readability
}
Output (using Google Translate) seems to give a sense roughly similar to English version of the page, after a fashion…
Output: .\SO\70062779.ps1
مساءً
الرياضة الروحية الفصحية
تأملات في سفر التكوين
يلقيها
الأب د سامي حلاق اليسوعي
الذي يأتي خصيصاً من حلب ـ سوريا
ملاحظات غ
الأكل الصيامي طوال الأسبوع العظيم محبذ جداً والصوم ينتهي بعد قداس
العيد
يحبذ التقدم للاعتراف بعيداً عن يوم العيد بقدر الإمكان حتى تستطيعوا
المشاركة في الصلوات وقداس العيد ، وننصح لذلك ، الاستفادة من حفلة
التوبة الجماعية يوم الجمعة رص ابريل زذرر حيث يهيأ الجميع معاً لنوال سر
المصالحة ب عدد من الأباء الكهنة سيكون حاضراً ة
بصندوق الصومة عند مدخل الكنيسة ففي فترة الأعياد تظهر فينا أعمال الرحمة
والصدقات وما يقابلها وذلك عمل مقبول وممدوح
ولنا وطيد الأمل أن تشاركوا الاحتفالات في رعيتكم وكل عام وأنتم بخير
المسيح قـام خخخ حـقاً قـام
Please keep in mind that I do not understand Arabic. Known limitations:
The script does not handle numbers: the year 2011 in note #2 is incorrectly transformed to زذرر, for instance.
Handling of spaces is unclear: is 0x20 always a space, or should it be transformed to ؠ (U+0620, Arabic Letter Kashmiri Yeh)?
Moreover, the script presumes that every character lies in the range U+0600-U+067F; characters in U+0680-U+06FF and in other blocks are not handled.
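The same idea can be sketched in Python. This is a rough port under the answer's assumption that the lost high byte is always 0x06; it is not a verified decoder for this encoding. The name restore_high_byte is hypothetical, and unlike the PowerShell script it leaves whitespace, control characters, and non-ASCII characters unchanged.

```python
def restore_high_byte(text: str) -> str:
    """Map printable ASCII characters back into the Arabic block.

    Assumption: each character lost a high byte of 0x06, so the apostrophe
    (U+0027) becomes U+0627, "D" (U+0044) becomes U+0644, and so on.
    """
    restored = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            # Printable, non-space ASCII: put the presumed high byte back.
            restored.append(chr(0x0600 + code))
        else:
            # Spaces, newlines, and anything else are kept as they are.
            restored.append(ch)
    return "".join(restored)


if __name__ == "__main__":
    print(restore_high_byte("AJ 3A1"))
```

Because ASCII digits share their low byte with Arabic letters (0x32 is both "2" and the low byte of U+0632), this sketch has the same ambiguity with numbers as the original script.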
How can I remove characters from a string that are not supported by MySQL's utf8 character set? In other words, characters with four bytes, such as "𝜀", that are only supported by MySQL's utf8mb4 character set.
For example,
𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰
should become
C = -2.4‰ ± 0.3‰; H = -57‰
I want to load a data file into a MySQL table that has CHARSET=utf8.
MySQL's utf8mb4 encoding is what the world calls UTF-8.
MySQL's utf8 encoding is a subset of UTF-8 that only supports characters in the BMP (meaning characters U+0000 to U+FFFF inclusive).
Reference
So, the following will match the unsupported characters in question:
/[^\N{U+0000}-\N{U+FFFF}]/
Here are three different techniques you can use to clean your input:
1: Remove unsupported characters:
s/[^\N{U+0000}-\N{U+FFFF}]//g;
2: Replace unsupported characters with U+FFFD:
s/[^\N{U+0000}-\N{U+FFFF}]/\N{REPLACEMENT CHARACTER}/g;
3: Replace unsupported characters using a translation map:
my %translations = (
"\N{MATHEMATICAL ITALIC SMALL EPSILON}" => "\N{GREEK SMALL LETTER EPSILON}",
# ...
);
s{([^\N{U+0000}-\N{U+FFFF}])}{ $translations{$1} // "\N{REPLACEMENT CHARACTER}" }eg;
For example,
use utf8; # Source code is encoded using UTF-8
use open ':std', ':encoding(UTF-8)'; # Terminal and files use UTF-8.
use strict;
use warnings;
use 5.010; # say, //
use charnames ':full'; # Not needed in 5.16+
my %translations = (
"\N{MATHEMATICAL ITALIC SMALL EPSILON}" => "\N{GREEK SMALL LETTER EPSILON}",
# ...
);
$_ = "𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰";
say;
s{([^\N{U+0000}-\N{U+FFFF}])}{ $translations{$1} // "\N{REPLACEMENT CHARACTER}" }eg;
say;
Output:
𝜀C = -2.4‰ ± 0.3‰; 𝜀H = -57‰
εC = -2.4‰ ± 0.3‰; εH = -57‰
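For readers not using Perl, the three techniques above can be sketched in Python as well. This is a minimal sketch under the same premise (MySQL's utf8 only stores U+0000 to U+FFFF); the function names and the one-entry TRANSLATIONS map are hypothetical.

```python
import re

# Matches any code point outside the BMP (above U+FFFF).
NON_BMP = re.compile(r"[^\u0000-\uFFFF]")

# Hypothetical translation map; extend it with whatever substitutions make sense.
TRANSLATIONS = {
    "\N{MATHEMATICAL ITALIC SMALL EPSILON}": "\N{GREEK SMALL LETTER EPSILON}",
}


def remove_non_bmp(text: str) -> str:
    """Technique 1: drop unsupported characters."""
    return NON_BMP.sub("", text)


def replace_non_bmp(text: str) -> str:
    """Technique 2: replace unsupported characters with U+FFFD."""
    return NON_BMP.sub("\N{REPLACEMENT CHARACTER}", text)


def translate_non_bmp(text: str) -> str:
    """Technique 3: use the map, falling back to U+FFFD for unknown characters."""
    return NON_BMP.sub(
        lambda m: TRANSLATIONS.get(m.group(0), "\N{REPLACEMENT CHARACTER}"), text
    )


if __name__ == "__main__":
    sample = "\N{MATHEMATICAL ITALIC SMALL EPSILON}C = -2.4\u2030"
    print(remove_non_bmp(sample))
    print(translate_non_bmp(sample))
```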
I'm looking for a way in Ruby or Rails to replace entity references (&nbsp;) in a file with their character reference equivalents (&#160;).
&nbsp; is the main offender, but I'd like to do the replacement systematically rather than just hand coding a bunch of gsubs.
You can use the HtmlEntities gem:
gem install htmlentities
require 'htmlentities'
decoded = HTMLEntities.new.decode '&nbsp; Hello'
decoded[0].ord #=> 160
As Stefan mentioned in the comment, if you want to encode it back using reference numbers, just decode the string and encode it with the :decimal flag:
require 'htmlentities'
text = '&nbsp; Hello'
coder = HTMLEntities.new
final_text = coder.encode coder.decode(text), :decimal
p final_text #=> "&#160; Hello"
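To illustrate what the decode-then-encode round trip does, here is a minimal Python sketch that rewrites named entity references as decimal character references directly. It is not the HtmlEntities gem's implementation; the name named_to_numeric is hypothetical, and it relies only on the standard library's html.entities.name2codepoint table.

```python
import re
from html.entities import name2codepoint

# A named entity reference such as &nbsp; or &eacute;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace known named entities with decimal character references."""

    def repl(match: re.Match) -> str:
        codepoint = name2codepoint.get(match.group(1))
        # Unknown names are left untouched rather than guessed at.
        return match.group(0) if codepoint is None else f"&#{codepoint};"

    return NAMED_ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(named_to_numeric("&nbsp; Hello"))
```

Unlike the decode-then-encode approach, this leaves plain characters and existing numeric references exactly as they were.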
"Max Williams".html_safe => "Max Williams"
This is functionality of Rails's Active Support.
I am using the h helper method in Rails to encode/escape a string that has an apostrophe ('). In my view I am using it like this:
<%=h "Mike's computer" %>
My understanding is that the HTML, when viewing the source, should be Mike%27s computer, but the HTML produced has an apostrophe in it: Mike's computer
Am I missing something obvious?
How do I get my desired result of Mike%27s computer?
Help is always appreciated.
An apostrophe is a valid character in HTML. It is not escaped because it does not need to be.
If you want to encode a URL, use the u helper:
>> fer#:~/$ script/console
Loading development environment (Rails 2.3.8)
>> include ERB::Util
=> Object
>> h "Mike's computer"
=> "Mike's computer"
>> u "Mike's computer"
=> "Mike%27s%20computer"
>>
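The distinction between HTML escaping and URL encoding is not specific to Rails. As a small illustration, here is a Python sketch using only the standard library; note that Python's html.escape does escape the apostrophe (as a character reference), which differs from the Rails 2.3.8 h helper shown above.

```python
import html
from urllib.parse import quote

text = "Mike's computer"

# HTML escaping produces character references, never percent-encoding.
html_escaped = html.escape(text, quote=True)

# URL encoding produces percent-encoded bytes such as %27 and %20.
url_encoded = quote(text)

if __name__ == "__main__":
    print(html_escaped)
    print(url_encoded)
```

A value such as Mike%27s computer is therefore a sign that URL encoding, not HTML escaping, is what is wanted.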
If we look at the source code of the h method (it is an alias for html_escape), it is not hard to open the file and add the single quote (') to the HTML_ESCAPE constant.
Below is the source code of the method, with its location in the file. Find the constant and add the quote to it. You can add more characters as needed.
HTML_ESCAPE = { '&' => '&amp;', '>' => '&gt;', '<' => '&lt;', '"' => '&quot;' }
File actionpack/lib/action_view/template_handlers/erb.rb, line 17
17: def html_escape(s)
18: s.to_s.gsub(/[&"><]/) { |special| HTML_ESCAPE[special] }
19: end
CAVEAT: This modification will affect all projects that use the library.
An alternative is to create a view helper method, say in ApplicationHelper:
def h_with_quote(s)
  # A constant cannot be assigned inside a method body, so substitute directly.
  h(s).gsub("'", "%27")
end
That approach should be safer.
I'm using MySQL API's function
mysql_real_escape_string()
Based on the documentation, it escapes the following characters:
\0
\n
\r
\
'
"
\Z
Now, I looked into OWASP.org's ESAPI security library and in the Python port it had the following code (http://code.google.com/p/owasp-esapi-python/source/browse/esapi/codecs/mysql.py):
"""
Encodes a character for MySQL.
"""
lookup = {
0x00 : "\\0",
0x08 : "\\b",
0x09 : "\\t",
0x0a : "\\n",
0x0d : "\\r",
0x1a : "\\Z",
0x22 : '\\"',
0x25 : "\\%",
0x27 : "\\'",
0x5c : "\\\\",
0x5f : "\\_",
}
Now, I'm wondering whether all those characters really need to be escaped. I understand why % and _ are there (they are metacharacters in the LIKE operator), but I simply can't understand why they added the backspace and tab characters (\b and \t). Is there a security issue if you run a query such as:
SELECT a FROM b WHERE c = '...user input ...';
Where user input contains tabulators or backspace characters?
My question is: why did they include \b and \t in the ESAPI security library? Are there any situations where you might need to escape those characters?
A guess concerning the backspace character: Imagine I send you an email "Hi, here's the query to update your DB as you wanted" and an attached textfile with
INSERT INTO students VALUES ("Bobby Tables",12,"abc",3.6);
You cat the file, see it's okay, and just pipe the file to MySQL. What you didn't know, however, was that I put
DROP TABLE students;\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b
before the INSERT statement, which you didn't see because on console output the backspaces overwrote it. Bamm!
Just a guess, though.
Edit (couldn't resist):
The MySQL manual page for strings says:
\0 An ASCII NUL (0x00) character.
\' A single quote (“'”) character.
\" A double quote (“"”) character.
\b A backspace character.
\n A newline (linefeed) character.
\r A carriage return character.
\t A tab character.
\Z ASCII 26 (Control-Z). See note following the table.
\\ A backslash (“\”) character.
\% A “%” character. See note following the table.
\_ A “_” character. See note following the table.
Blacklisting (identifying bad characters) is never the way to go if you have any other option.
You need to use a combination of whitelisting and, more importantly, bound parameters.
Whilst this particular answer has a PHP focus, it still helps plenty and will help explain that just running a string through a char filter doesn't work in many cases. Please, please see Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?
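To make the bound-parameter approach concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a stand-in for a MySQL driver, which is an assumption made so the example is self-contained; the table and column names follow the query in the question, and find_rows is a hypothetical helper.

```python
import sqlite3


def find_rows(conn: sqlite3.Connection, user_input: str) -> list:
    """Look up rows using a bound parameter instead of string concatenation."""
    # The driver sends the value separately from the SQL text, so quotes,
    # tabs, and backspaces in user_input are treated as plain data.
    cursor = conn.execute("SELECT a FROM b WHERE c = ?", (user_input,))
    return [row[0] for row in cursor.fetchall()]


# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (a INTEGER, c TEXT)")
tricky = "O'Brien\t\b'; DROP TABLE b; --"
conn.execute("INSERT INTO b (a, c) VALUES (?, ?)", (1, tricky))

if __name__ == "__main__":
    print(find_rows(conn, tricky))
    print(find_rows(conn, "x' OR '1'='1"))
```

With this style, no escaping table is needed for values at all; escaping of % and _ only comes up when the value is used as a LIKE pattern.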
Where user input contains tabulators or backspace characters?
It is quite remarkable that, to this day, most users believe that it is user input that has to be escaped, and that such escaping "prevents injections".
Java solution:
public static String filter( String s ) {
    StringBuilder buffer = new StringBuilder();
    for( char c : s.toCharArray() ) {
        switch( c ) {
            case 9  : buffer.append( " " ); break;     // tab becomes a space
            case 10 : buffer.append( "\\n" ); break;   // newline
            case 13 : buffer.append( "\\r" ); break;   // carriage return
            case 34 : buffer.append( "\\\"" ); break;  // double quote
            case 39 : buffer.append( "\\'" ); break;   // single quote
            case 92 : buffer.append( "\\\\" ); break;  // backslash
            default :
                // Keep printable ASCII only; every other character is dropped.
                if( c > 31 && c < 127 ) buffer.append( c );
        }
    }
    return buffer.toString();
}
Couldn't one just delete the single and double quotes from user input?
e.g.: $input =~ s/\'|\"//g;