How to amend sub strings? - mysql

Using collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München highlighted as follows?
Example input: "München can also be written as Muenchen."
Example output: "<b>München</b> can also be written as <b>Muenchen</b>."
Note: It is OK to use some non-SQL programming in addition. The only requirement is that the knowledge about which character sequences are identical is taken from the MySQL collation.

I have found these tables: http://developer.mimer.com/collations/charts/index.tml. They are, of course, language dependent. A collation is just a comparison algorithm. For general utf8 I am not sure how it treats special characters.
You can use them to find the desired symbols and replace them in the output to get the same result as in the example. For that, though, you will need some programming language (PHP or anything else).
Other resources:
http://collation-charts.org/
http://mysql.rjweb.org/doc.php/charcoll (down on the page)
Basically, try to google "collation algorithm mysql utf8_general_ci" or something like this.
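The "find the symbols and replace them in the output" idea can be sketched in PHP roughly as follows. The equivalence table here is an assumption: a small hand-copied excerpt standing in for what a german2 collation chart lists, not something read from MySQL itself. The function names collationPattern and highlight are made up for this sketch, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical excerpt of equivalences; in practice this table would be
// filled from the collation chart of the collation in use.
const EQUIVALENTS = [
    'ä' => 'ae',
    'ö' => 'oe',
    'ü' => 'ue',
    'ß' => 'ss',
];

// Build a regex that matches the term in either spelling.
function collationPattern(string $term): string
{
    $map = [];
    foreach (EQUIVALENTS as $char => $expanded) {
        $alternation = '(?:' . $char . '|' . $expanded . ')';
        $map[$char] = $alternation;
        $map[$expanded] = $alternation;
    }
    // strtr() with an array replaces in a single pass, so the inserted
    // alternations are not scanned again.
    return '/' . strtr(preg_quote($term, '/'), $map) . '/iu';
}

// Wrap every match in <b> tags. No HTML escaping is done here.
function highlight(string $text, string $term): string
{
    return preg_replace(collationPattern($term), '<b>$0</b>', $text);
}

echo highlight('München can also be written as Muenchen.', 'München');
// prints: <b>München</b> can also be written as <b>Muenchen</b>.
```

Note that this treats every "ue" in the search term as a possible "ü", which is what the phone-book style collation does as well.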

In the end I decided to do it all in PHP, therefore my question about
which characters are equal with utf8_general_ci.
Below is what I came up with, by example: A label is constructed from a text
$description, with sub strings $term highlighted, and special characters
converted. Substitution is not complete, but probably sufficient for the actual
use case.
mb_internal_encoding("UTF-8");

function withoutAccents($s) {
    return strtr(utf8_decode($s),
        utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿß'),
        'aaaaaceeeeiiiinooooouuuuyys');
}

function simplified($s) {
    return withoutAccents(strtolower($s));
}

function encodedSubstr($s, $start, $length) {
    return htmlspecialchars(mb_substr($s, $start, $length));
}

function labelFromDescription($description, $term) {
    $simpleTerm = simplified($term);
    $simpleDescription = simplified($description);
    $lastEndPos = $pos = 0;
    $termLen = strlen($simpleTerm);
    $label = ''; // HTML
    while (($pos = strpos($simpleDescription,
            $simpleTerm, $lastEndPos)) !== false) {
        $label .=
            encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
            '<strong>'.
            encodedSubstr($description, $pos, $termLen).
            '</strong>';
        $lastEndPos = $pos + $termLen;
    }
    $label .= encodedSubstr($description, $lastEndPos,
        strlen($description) - $lastEndPos);
    return $label;
}

echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
echo labelFromDescription('München <MUC>', 'ünc');
echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
echo labelFromDescription('München <MUC>', 'ünc');
Output:
<strong>São</strong> Paulo &lt;<strong>SAO</strong>&gt;
M<strong>ünc</strong>hen &lt;MUC&gt;
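Since utf8_decode() is deprecated as of PHP 8.2, the same approach can be sketched with an array-based strtr() and the mb_* functions instead. This is only a sketch, not the original author's code: the names foldAccents and highlightTerm are made up here, the accent table is the same incomplete one as above, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper; the accent table is deliberately incomplete.
function foldAccents(string $s): string
{
    static $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
        'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ñ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'ý' => 'y', 'ÿ' => 'y', 'ß' => 's',
    ];
    // Every mapping is one character to one character, so character
    // offsets in the folded string match those in the original.
    return strtr(mb_strtolower($s, 'UTF-8'), $map);
}

// Hypothetical counterpart of labelFromDescription() using mb_* offsets.
function highlightTerm(string $description, string $term): string
{
    $haystack = foldAccents($description);
    $needle = foldAccents($term);
    $len = mb_strlen($needle, 'UTF-8');
    if ($len === 0) {
        return htmlspecialchars($description);
    }
    $label = '';
    $last = 0;
    while (($pos = mb_strpos($haystack, $needle, $last, 'UTF-8')) !== false) {
        $label .= htmlspecialchars(mb_substr($description, $last, $pos - $last, 'UTF-8'))
            . '<strong>'
            . htmlspecialchars(mb_substr($description, $pos, $len, 'UTF-8'))
            . '</strong>';
        $last = $pos + $len;
    }
    // Append whatever follows the last match.
    return $label . htmlspecialchars(mb_substr($description, $last, null, 'UTF-8'));
}

echo highlightTerm('München <MUC>', 'ünc');
```

Unlike strtolower(), mb_strtolower() also lowercases accented capitals such as Ã, so a term like 'SÃO' should be folded the same way as 'são'.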

Related

Powershell - How to build a function to replace certain special characters with others

I'm trying to build a function in Powershell that I can use every time I have to replace a list of special characters with other characters. For example, this is the list of the special chars:
$Specialchars = '["ü","ä","ö","ß","æ","Œ","œ","°","~","Ø"]'
And I want that function to replace for example the "ü" with "ue" or the "Ø" with "o" and so on. The idea is to run the function against any string that I want to have those special characters replaced the way I want.
For now, I tried this:
function ReplaceSpecialChars($chars)
{
return $chars -replace "Ø","o"
return $chars -replace "~",""
return $chars -replace "œ","oe"
}
It works only for the first special character, the "Ø", and not, for example, when I run it against a string that contains "~". I'm not an expert in PowerShell functions at all, so I was wondering if somebody has some hints to help me.
Thanks a lot !
You must assign the result of the replacement before doing the next.
Here is an updated version of the function, including a way to define your replacements:
function ReplaceSpecialChars([string]$string) {
    # define replacements:
    @{
        "ä" = "ae"
        "ß" = "ss"
        # ...
    }.GetEnumerator() | foreach {
        $string = $string.Replace($_.Key, $_.Value)
    }
    return $string
}
Note that you might want to define lowercase + uppercase replacements.

Undo mysql_real_escape_string

I have the following code at the top of every one of my PHP pages:
<?php
function name_format($str)
{
return trim(mysql_real_escape_string(htmlspecialchars($str, ENT_QUOTES)));
}
?>
foreach ($_POST as $key => $value) {
if (!is_array($value))
{
$_POST[$key] = name_format($value);
}
}
This was pretty useful until now. I noticed that if I want to display text from a <textarea> before writing it to the database, it shows "\r\n" instead of normal line breaks.
Even if I try to do the following, it doesn't work:
$str = str_replace("\r\n", "<br>", $str);
The mistake you're making here is over-writing $_POST with a version of the string which you are hoping will be appropriate for all contexts (using mysqli_real_escape_string and htmlspecialchars at the same time).
You should leave the original value untouched, and escape it where it is used, using the appropriate function for that context. (This is one reason why the "magic quotes" feature of early versions of PHP is universally acknowledged to have been a bad idea.)
So in your database code, you would prepare a variable for use with SQL (specifically, MySQL):
// $link is the open mysqli connection (variable name assumed)
$comment = mysqli_real_escape_string($link, trim($_POST['comment']));
And in your template, you would prepare a variable for use with HTML:
$comment = htmlspecialchars(trim($_POST['comment']));
Possibly adding a call to nl2br() in the HTML context, as desired.
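For the HTML side, a minimal sketch of "escape where it is used" could look like this. The helper name commentAsHtml is hypothetical, and the literal string stands in for the untouched $_POST['comment'] value.

```php
<?php
// Hypothetical helper: prepares a raw form value for HTML output only.
// The raw value itself is never overwritten.
function commentAsHtml(string $raw): string
{
    // Escape first, then insert <br /> in front of each line break.
    return nl2br(htmlspecialchars(trim($raw), ENT_QUOTES, 'UTF-8'));
}

$raw = "Hello <world>\r\nsecond line"; // stands in for $_POST['comment']
echo commentAsHtml($raw);
```

Because the raw value stays as submitted, the database code can still escape or bind it separately without double escaping.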

Sanitizing Form Data for MySQL table

Will this do the trick if I am sanitizing data that goes from a form into a MySQL table? The data that should/will be entered are school subjects and professors' first/last names. Any other suggestions on how to do this?
/*
Sanitize() function removes any potential threat from the
data submitted. Prevents email injections or any other hacker attempts.
If $remove_nl is true, newline characters are removed from the input.
*/
function Sanitize($str,$remove_nl=true)
{
$str = $this->StripSlashes($str);
if($remove_nl)
{
$injections = array('/(\n+)/i',
'/(\r+)/i',
'/(\t+)/i',
'/(%0A+)/i',
'/(%0D+)/i',
'/(%08+)/i',
'/(%09+)/i'
);
$str = preg_replace($injections,'',$str);
}
return $str;
}
function StripSlashes($str)
{
if(get_magic_quotes_gpc())
{
$str = stripslashes($str);
}
return $str;
}
I recommend PHP's PDO class. You would do something like:
try
{
    $sql = 'INSERT INTO whatever(a,b,c) VALUES(:a,:b,:c)';
    //or if you prefer...
    $sql = 'INSERT INTO whatever(a,b,c) VALUES(?,?,?)';
    $stmt = db::db()->prepare($sql);
    $stmt->execute(array(123,234,345));
}
catch(PDOException $e){library::sql_error($e,$sql);}
Thanks to everyone for taking the time to help. I went with the preg_replace function, which limits characters to only what I want people to use: preg_replace("~" . "[^a-zA-Z0-9\-\_\.\ ]" . "~iU", "", $string). I also used mysql_real_escape_string, so I'm doing two levels of filtering before sending the data on to the database.
Why don't you use mysql_real_escape_string(), which escapes all characters that can potentially cause issues? Besides being built in, it calls MySQL's own mysql_real_escape_string, so you know you'll always be up to date on what needs to be escaped for your installed database.
The best option is to use PDO's bindValue method:
http://www.php.net/manual/en/pdostatement.bindvalue.php
This sorts out all your escaping.
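A minimal sketch of bindValue() with named placeholders might look like this. Everything specific in it is an assumption for illustration: the courses table, its columns, the addCourse function, and the in-memory SQLite database that stands in for MySQL so the example runs without a server (it needs the pdo_sqlite extension). For MySQL, only the DSN and credentials would differ.

```php
<?php
// Assumption: SQLite in memory instead of MySQL, purely to keep this runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE courses (subject TEXT, first_name TEXT, last_name TEXT)');

// Hypothetical function: inserts one row using bound values only.
function addCourse(PDO $pdo, string $subject, string $first, string $last): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO courses (subject, first_name, last_name)
         VALUES (:subject, :first, :last)'
    );
    // The values travel separately from the SQL text, so quotes in them
    // cannot change the statement.
    $stmt->bindValue(':subject', $subject, PDO::PARAM_STR);
    $stmt->bindValue(':first', $first, PDO::PARAM_STR);
    $stmt->bindValue(':last', $last, PDO::PARAM_STR);
    $stmt->execute();
}

addCourse($pdo, "Databases'); DROP TABLE courses; --", 'Anne', "O'Neil");

$row = $pdo->query('SELECT * FROM courses')->fetch(PDO::FETCH_ASSOC);
$count = (int) $pdo->query('SELECT COUNT(*) FROM courses')->fetchColumn();
echo $row['last_name'], ' / rows: ', $count, "\n";
```

The hostile-looking subject is stored as plain text, which is the point: no manual escaping or character stripping is involved.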
For forms, you can also look at this:
http://semlabs.co.uk/docs/xfl/xfl-elements/sanitise
It's a set of PHP classes to handle forms with less hassle, though it will take a while to get your head round.
Try this:
function sanatize($value) {
$value = preg_replace("~" . "[^a-zA-Z0-9\-\_\.]" . "~iU", "", $value);
return $value;
}

PHP force a var to have a certain type

This may be a stupid question, but I might as well ask it :)
Is there a way to force a variable to keep a certain type?
$tel1 = '05';// string
settype($tel1,'string');
$tel1 = 06;//either throw error, or convert it to string automatically.
var_dump($tel1);//(string [2]) 05
The above code is off the top of my head, so it might not be accurate, but I need to keep a variable as a string rather than a number. Because of a silly thing I have done, my phone numbers now lose their leading 0s :-(
And I can't rewrite it, because that would mess up other numeric types. Before you ask: it was an automated service for the DB that checks whether a value is numeric or not.
UPDATE
This is the problem
function escape($str){
if(is_numeric($str)){
return $str;
}else{
return "'".mysql_real_escape_string($str).'\'';
}
}
$tel1 = "06";
$sql = 'SELECT * FROM blabla WHERE id = '.escape($tel1).'';
//above is same as below
$sql = 'SELECT * FROM blabla WHERE id = 06 ';
I can't change anything inside the escape function, because other inputs throughout the website use it and I don't want to mess up their validation.
Your use of is_numeric tests for numeric content, not an integer type. But then you take a variable called $str which implies you want it to be a string.
Perhaps use:
function escape($val) {
if (is_numeric($val) && !is_string($val)) {
return $val;
}
else{
return "'" . mysql_real_escape_string($val) . '\'';
}
}
Now strings are always escaped and quoted, even when they contain only digits; only values that are numeric and not strings (such as integers and floats) are returned unquoted.
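The distinction this condition relies on can be checked in isolation. The helper name isPlainNumber is hypothetical and only wraps the same test.

```php
<?php
// Hypothetical helper: true only for numeric values that are not strings.
function isPlainNumber($val): bool
{
    return is_numeric($val) && !is_string($val);
}

var_dump(isPlainNumber(6));     // bool(true)  : integer, would stay unquoted
var_dump(isPlainNumber(6.5));   // bool(true)  : float, would stay unquoted
var_dump(isPlainNumber('06'));  // bool(false) : numeric string, would be quoted
var_dump(isPlainNumber('abc')); // bool(false) : ordinary string, would be quoted
```

So a phone number kept as the string '06' ends up quoted in the query, and its leading zero survives.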
You can do something like:
$string = (string) $int;
or use a function:
$string = strval($int);
You can't force a variable to a specific type in the global scope.
You can force Arrays and Objects in a function.
function getElementsByClassName(DOMNode $parentElement, Array $classNames) {
...
}
If you pass an object that is not an instantiation of DOMNode (or a subclass), or if you don't pass an Array as the second argument, you'll get an error.
You can of course cast any variable, e.g. (string) $tel1.
You shouldn't be treating phone numbers as Ints anyway, because of leading zeroes and possible parentheses and dashes. Also, once your telephone number is an Int, it no longer knows its 0 padding, because that has been discarded, so casting it back won't give you the original String.
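The lost padding is easy to demonstrate. In this small sketch the variable names are made up, and the width of 2 used for re-padding is an assumption that only holds if the original length is known.

```php
<?php
$original = '06';

$asInt = (int) $original;      // the leading zero is gone: 6
$roundTrip = (string) $asInt;  // casting back gives '6', not '06'

var_dump($roundTrip === $original); // bool(false)

// Re-padding only works when the original width is known (assumed 2 here).
$padded = str_pad($roundTrip, 2, '0', STR_PAD_LEFT);
var_dump($padded); // string(2) "06"
```

For real phone numbers the width varies, which is why keeping them as strings from the start is the safer route.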
To cast a variable you can use something like:
$i = 1;
$s = (string) $i;
Depending on the db adaptor you might not be able to detect the type being returned from the database. I believe it's PDO that returns everything (even INT values) as strings.
The number_format() function may be of use to you too.
If you declare a variable as:
$var = 06;
it immediately becomes 6 without the leading zero, because a leading zero adds nothing to an integer's value and is dropped. (In PHP a leading zero actually marks the literal as octal, so 08 would not even parse.)
In other words, your variable has to be created as a string, which is what you probably deduced yourself.
Quick fix would be the following: you can add another parameter to your escape() function.
For example:
function escape($str, $force_str = false)
{
if($force_str)
{
// do your conversion, the rest of the site will by default pass false so nothing will be broken
}
}
As alex said, start by making sure the phone number is never converted from string to int in your own code. Then, you need to make sure it will not be converted when sent to your SQL DB.
It ought to work if you do it this way:
$sql = "SELECT * FROM blabla WHERE id = '" . mysql_real_escape_string($tel1) . "'";
This is the same as
$sql = "SELECT * FROM blabla WHERE id = '06'";

special characters strange behavior

I have this string in my UTF-8 MySQL DB:
"Pruebá de eñes"
When I print it as plain text, everything works OK, but if I load that same field inside an input, textarea, etc., it becomes: "Pruebá de eñes"
How can I solve this problem? =(
First, I recommend reading this: http://www.joelonsoftware.com/articles/Unicode.html (The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky). It is really useful.
Then, if it's MySQL, you can set the character set of the connection with "SET NAMES 'utf8'". For example, in Zend_Framework: $db->query("SET NAMES 'utf8'");
Another option is to filter the data before printing it on the screen. I made two handy functions ...
function utf8encode ($subject, $action = '//TRANSLIT')
{
    if ( @iconv( 'UTF-8', "UTF-8{$action}", $subject) == $subject ) {
        #-- the value is already UTF-8
    } else {
        $subject = utf8_encode ($subject);
    }
    return $subject;
}
function utf8decode ($subject, $action = '//TRANSLIT')
{
    if ( @iconv( 'UTF-8', "UTF-8{$action}", $subject) == $subject ){
        $subject = utf8_decode ($subject);
    } else {
        #-- the value is probably ISO-8859-1
    }
    return $subject;
}
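The same "is it already UTF-8?" check can also be sketched with the mbstring functions, which avoids utf8_encode() (deprecated as of PHP 8.2). The function name ensureUtf8 is made up here, and treating every non-UTF-8 input as ISO-8859-1 is an assumption carried over from the functions above.

```php
<?php
// Hypothetical helper: returns the string as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function ensureUtf8(string $subject): string
{
    if (mb_check_encoding($subject, 'UTF-8')) {
        return $subject; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($subject, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Prueb\xE1 de e\xF1es"; // "Pruebá de eñes" encoded as ISO-8859-1
echo ensureUtf8($latin1), "\n";
```

A filter like this only hides the symptom; making the connection, the tables and the page all use UTF-8 addresses the cause.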