MySQL - Confusing RegEx Variable Issue

I need some help with a RegEx. The concept is simple, but the actual solution is well beyond anything I know how to figure out. If anyone could explain how I could achieve my desired effect (and provide an explanation with any example code) it'd be much appreciated!
Basically, imagine a database table that stores the following string:
'My name is $1. I wonder who $2 is.'
First, bear in mind that the dollar sign-number format IS set in stone. That's not just for this example--that's how these wildcards will actually be stored. I would like an input like the following to be able to return the above string.
'My name is John. I wonder who Sarah is.'
How would I create a query that searches with wildcards in this format, and then returns the applicable rows? I imagine a regular expression would be the best way. Bear in mind that, theoretically, any number of wildcards should be acceptable.
Right now, this is the part of my existing query that drags the content out of the database. The concatenation, et cetera, is there because in a single database cell, there are multiple strings concatenated by a vertical bar.
AND CONCAT('|', content, '|')
LIKE CONCAT('%|', '" . mysql_real_escape_string($in) . "', '|%')
I need to modify ^this line to work with the variables that are a part of the query, while keeping the current effect (vertical bars, etc) in place. If the RegEx also takes into account the bars, then the CONCAT() functions can be removed.
Here is an example string with concatenation as it might appear in the database:
Hello, my name is $1.|Hello, I'm $1.|$1 is my name!
The query should be able to match with any of those chunks in the string, and then return that row if there is a match. The variables $1 should be treated as wildcards. Vertical bars will always delimit chunks.

For MySQL, this article is a nice guide which should help you. The regexp would be "\$[0-9]+" (older MySQL regex engines do not understand \d, so [0-9] is the safer spelling). Here's a query adapted from the article:
SELECT * FROM posts WHERE content REGEXP '\\$[0-9]+';
After retrieving data, use this handy function:
function ParseData($query,$data) {
    $matches=array();
    // Keep replacing until no $N placeholder is left in the string
    while(preg_match("/(\\$)(\\d+)/",$query,$matches)) {
        if (array_key_exists(substr($matches[0],1),$data))
            $query=str_replace($matches[0],"'".mysql_real_escape_string($data[substr($matches[0],1)])."'",$query);
        else
            $query=str_replace($matches[0],"''",$query);
    }
    return $query;
}
Usage:
$query="$1 went to $2's house";
$data=array(
    '1' => 'Bob',
    '2' => 'Joe'
);
echo ParseData($query,$data); // Prints: 'Bob' went to 'Joe''s house (each value is escaped and wrapped in single quotes)

If you aren't sticky about using the $1 and $2 and could change them around a bit, you could take a look at this:
http://php.net/manual/en/function.sprintf.php
E.G.
<?php
$num = 5;
$location = 'tree';
$format = 'There are %d monkeys in the %s';
printf($format, $num, $location);
?>

If you want to find entries in the database, then you can use a LIKE statement:
SELECT statement FROM myTable WHERE statement LIKE '%$1%'
Which will find all statements that include $1. I'm assuming that the first number to replace will always be $1 - it doesn't matter, in that case, that the total number of wildcards is arbitrary, as we're just looking for the first one.
The PHP replacement is a little trickier. You could probably do something like:
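$count = 1;
// Compare against false explicitly: strpos() returns 0 for a match at the start
while (strpos($statement, '$' . $count) !== false) {
    $statement = str_replace('$' . $count, $array[$count], $statement);
    $count++; // move on to $2, $3, ...
}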
(I've not tested that, so there might be typos in there, but it should be enough to give the general idea.)
The one downside is that it will fail once you have ten or more parameters in your string to replace: the first pass will replace the first two characters of $10, as it's looking for $1.
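One way around the $10 problem is to let a regex consume the whole number before looking up the value. This is only a sketch, written in Python rather than PHP; the function name fill_placeholders and the choice to leave unknown placeholders untouched are my own assumptions, not something from the question.

```python
import re


def fill_placeholders(template, values):
    """Replace $1, $2, ... in template with values[1], values[2], ...

    Placeholders without a value are left as they are (an assumption;
    the thread does not say what should happen to them).
    """
    def substitute(match):
        index = int(match.group(1))
        return str(values.get(index, match.group(0)))

    # [0-9]+ consumes the whole number, so "$10" is never read as "$1" + "0".
    return re.sub(r"\$([0-9]+)", substitute, template)


values = {1: "Bob", 2: "Joe", 10: "Zed"}
print(fill_placeholders("$1 went to $2's house with $10", values))
```

In PHP the same idea would probably go through preg_replace_callback, but I have not tested that.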

I asked a different, but similar, question, and I think the solution applies to this question just as well.
https://stackoverflow.com/a/10763476/1382779
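For the matching direction the question actually asks about (input text against stored chunks where $N is a wildcard), one option is to do the comparison in application code after fetching candidate rows. The following is a rough Python sketch, not a MySQL query; template_to_regex and cell_matches are hypothetical helper names, and it assumes each wildcard must match at least one character.

```python
import re


def template_to_regex(template):
    """Turn 'My name is $1.' into a regex where each $N matches any text."""
    literal_parts = re.split(r"\$[0-9]+", template)
    # Escape the literal pieces so '.', '?' etc. in the stored text stay literal.
    return re.compile("(.+)".join(re.escape(part) for part in literal_parts))


def cell_matches(cell, user_input):
    """True if any '|'-delimited chunk of the cell matches the whole input."""
    return any(
        template_to_regex(chunk).fullmatch(user_input) is not None
        for chunk in cell.split("|")
    )


cell = "Hello, my name is $1.|Hello, I'm $1.|$1 is my name!"
print(cell_matches(cell, "Hello, I'm John."))
print(cell_matches(cell, "Goodbye, John."))
```

Because every chunk is anchored with fullmatch, an input has to fit one chunk completely; a partial overlap does not count.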

Related

issue with duplicating record in mysql query

Searching for records by chapter number works.
The problem appears when I select chapter number 1 or 2 from the drop-down and then search for all records in that chapter.
It displays the records from chapters 1, 11, 21, 31, ... or 2, 12, 21, ... and so on.
I know this happens because I wrote LIKE there. But when I use the = operator instead (the line I commented out in my code), that doesn't work for me either.
What would be the right query to solve this problem?
My Code:
<?php
include("conn.php");
$name=$_POST['fname'];
$name2=$_POST['chapter'];
$sql="SELECT distinct * FROM $user WHERE question like '%".$name."%' and Chapter like '%".$name2."%'";
// $sql="SELECT * FROM $user WHERE question='$name' and Chapter='$name2'";
$result=mysql_query($sql,$connection) or die(mysql_error());
while($row=mysql_fetch_array($result)) {
?>
I would be interested to see what the type of 'Chapter' is in the returned query, and try to see why it is that the equality comparison doesn't work.
If the typing is straightforward (i.e. it really is just plain old strings), then I'd be looking for whitespace characters or something like that which is foiling the equality comparison.
Similarly, I'm wondering whether it's the equality on the 'Question' that is messing up your alternate query.
At a guess, try one of the following:
$sql="SELECT distinct * FROM $user WHERE question like '%".$name."%' and Chapter like '$name2'";
$sql="SELECT distinct * FROM $user WHERE question like '%".$name."%' and Chapter='$name2'";
Oh, and you should really do something about escaping those parameters properly to avoid any nasty SQL injection attacks.
The problem is this part of the first query:
Chapter like '%".$name2."%'
If = doesn't work, then I can think of two things. The first is that Chapter is really a list, probably a comma delimited list. The second is that there are extraneous characters in the database.
If Chapter is really a list, use find_in_set() instead:
find_in_set($name2, Chapter) > 0
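To make the difference between the three comparisons concrete, here is a small Python sketch that imitates them on plain chapter numbers. The helper names are made up, and the find_in_set imitation only covers the simple comma-list case, not every edge case of the MySQL function.

```python
def like_contains(value, needle):
    """Imitates `value LIKE '%needle%'` for plain digits (no wildcards in needle)."""
    return needle in value


def find_in_set(needle, csv_list):
    """Imitates FIND_IN_SET for simple lists: 1-based position, or 0 if absent."""
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


chapters = ["1", "11", "21", "2", "12"]
print([c for c in chapters if like_contains(c, "1")])  # substring match
print([c for c in chapters if c == "1"])               # exact match
print(find_in_set("1", "11,1,21"))                     # list membership
```

The substring version is what pulls in chapters 11, 21 and 12 when you ask for chapter 1.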
You directly use $_POST variables in your SQL query which makes you vulnerable to SQL injection attacks. Please take a look at this page for ways around that: http://bobby-tables.com/php.html. You should use either mysql_real_escape_string or prepared statements (better). The best solution would probably be to use PDO.
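As an illustration of what a parameterized query looks like, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL with PDO. The table name, columns and sample rows are invented; the point is only that the user input is passed separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (question TEXT, chapter TEXT)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?)",
    [("What is a join?", "1"), ("What is an index?", "11"), ("What is a view?", "2")],
)


def search(conn, text, chapter):
    """Substring match on the question, exact match on the chapter."""
    sql = "SELECT question FROM questions WHERE question LIKE ? AND chapter = ?"
    # The values are bound as parameters and never spliced into the SQL string.
    rows = conn.execute(sql, ("%" + text + "%", chapter)).fetchall()
    return [row[0] for row in rows]


print(search(conn, "What", "1"))
print(search(conn, "x' OR '1'='1", "1"))
```

The second call shows that a typical injection attempt is simply treated as an odd search string.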
Also if you want a better answer, please format your question so it can be easily read, include example inputs, outputs and database contents and make sure your code is properly indented.
All I can assume for now is that your database field probably contains more than the data you want to match. Leading/trailing spaces or something?
The main problem in your query is in this section
like '%".$name."%'
Just remove the % signs from the query wherever you have written them, and check whether the query then works properly.

mysql_query syntax variation: at, quotes and curly braces

I am learning MySQL/php through online tutorials and have found the techniques and syntax different from different sources.
In one tutorial, I enter data (from an HTML form) like this:
$table = "ENTRIES";
$sql = "INSERT INTO $table SET
TITLE = '$_POST[title]',
SUMMARY = '$_POST[summary]',
CONTENT = '$_POST[content]'";
$query = @mysql_query($sql);
And in another, like this:
mysql_query("
INSERT INTO `posts` SET
`title` = '{$_POST['title']}',
`contents` = '{$_POST['post']}'
");}
They both work, and I understand the different variable arrangements. BUT I have the following questions, probably all related. (I gather that @mysql_query suppresses error messages, SO if that is what is going on here, can you please explain how it is functioning and what is actually proper syntax?)
1) In the first example, in @mysql_query(), it doesn't matter if I use ("") or ('') ... but in the second example, in mysql_query(), it breaks if I use (''). In fact it tells me that there is an unexpected {, which leads to my next question:
2) What is the deal with the {} in the second example? They don't seem to be doing anything, but it breaks without them.
3) In the first example, it breaks if I enclose title, summary, and content in single quotes ''. In the second, with 'title' and 'post', it breaks if I don't!
Any explanations or references/links comprehensible to a beginner would be much appreciated!
Run far away from this tutorial and find one that uses PDO / mysqli and explains how to properly parameterize queries.
Anyway, your questions are PHP specific and have to do with variable interpolation in strings. In double-quoted strings (") variables are interpolated, and arrays can be accessed via:
"{$var['value']}"
"$var[value]"
Either one is valid ... they function identically and it's up to personal preference which one you should use.
mysql_query takes a string as an argument, so it actually makes no difference how you build it. Both of the above are valid. Using @ makes no difference to the query itself; it only suppresses error messages. In fact, you shouldn't use it, and you should properly handle possible errors and check mysql_error().

Wildcard searches w/MySQL

I've a situation where my wildcard search is not really working how I thought it should...
This is the MySQL query I am running and it will return only whole words...
eg. If I enter "phone" as a search word it will return all rows with "phone" in it but not rows with "phones" (note the added 's')
mysql_query(" SELECT * FROM posts WHERE permalink = '$permalink' AND LOWER(raw_text) LIKE '%$str%' " );
How can I get it to return all variations of the search word? Obviously I know this could run into problems as it could return all sorts of matches if the user enters a common run of letters, I thought I could make this part of the advanced search option.
ADDENDUM
I have narrowed it down to not a problem with the wildcard but with what I'm doing with the returned data... It is in my Regex that I am throwing at the data.
$pattern= "/(?:[\w\"',.-]+\s*){0,5}[\"',.-]?\S*".$str."(\s|[,.!?])(\s*[A-Za-z0-9.,-]+){0,5}/";
preg_match_all($pattern, $row["raw_text"], $matches);
The regex is not finding the string in the raw data that I am returning, so it is giving me a null. A new problem; I'm not that familiar with regex, so I will have to figure this one out as well... Maybe I'll be posting a new question soon!
Thanks,
M
I think something else is going on. The % part of the query seems correct. And this seems to be confirmation:
select 'phones' LIKE '%phone%';
+-------------------------+
| 'phones' LIKE '%phone%' |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
By the way, I really hope you are doing rigorous sanitization on $str (and $permalink too if it is from user input). You should only allow alphanumerics and a small number of other safe characters (spaces, probably) and you should be running it through mysql_real_escape_string() before using it in mysql_query(). Better yet, have a look at PDO.
Anyway, back to troubleshooting this: One thing to try might be to have the program log the string you're sending to mysql_query(). Basically, change the call to mysql_query() to a call to error_log() or echo or something like that. Then copy and paste the resulting query into the MySQL command-line tool or phpMyAdmin or whatever. When that query doesn't work the way you expect, at least you can look at it and tweak it to figure out what's up. And who knows, maybe it will be super obvious once you see the query spelled out.
I suspect you have a trailing space in your $str variable, which would explain what you're seeing: Your LIKE criterion would be '%phone %', which matches "... phone ...", but not "... phones ...".
Try trimming your value first.
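Coming back to the regex in the addendum: as far as I can tell, the pattern requires whitespace or punctuation directly after the search term, so "phone" can never match inside "phones" even when the SQL returns the row. The sketch below reproduces the pattern in Python (the PHP original uses preg_match_all) and shows a relaxed variant; the allow_suffix switch and the function name are my own additions.

```python
import re


def context_pattern(term, allow_suffix=False):
    """Build the addendum's pattern; optionally allow word characters after the term."""
    suffix = r"\w*" if allow_suffix else ""
    return re.compile(
        r"(?:[\w\"',.-]+\s*){0,5}[\"',.-]?\S*"
        + re.escape(term) + suffix
        + r"(\s|[,.!?])(\s*[A-Za-z0-9.,-]+){0,5}"
    )


text = "I bought two phones yesterday."
print(context_pattern("phone").search(text))                     # no match
print(context_pattern("phone", allow_suffix=True).search(text))  # a match object
```

Whether allowing any trailing word characters is what you want for an advanced search is a separate decision; it will also match "phonetics", for instance.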

How to write MySQL query where A contains ( "a" or "b" )

I must use this format where A operand B. A is the field; I want B to be either "text 1" or "text 2", so if A has data like "text 1 texttext" or "texttext 2" , the query will have result.
But how do I write this? Does MySQL support something like
where A contains ('text 1' OR 'text 2')?
Two options:
Use the LIKE keyword, along with percent signs in the string
select * from table where field like '%a%' or field like '%b%'
(note: If your search string contains percent signs, you'll need to escape them)
If you're looking for a more complex combination of strings than you've specified in your example, you could use regular expressions (regex):
See the MySQL manual for more on how to use them: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Of these, using LIKE is the most usual solution -- it's standard SQL, and in common use. Regex is less commonly used but much more powerful.
Note that whichever option you go with, you need to be aware of possible performance implications. Searching for sub-strings like this will mean that the query will have to scan the entire table. If you have a large table, this could make for a very slow query, and no amount of indexing is going to help.
If this is an issue for you, and you're going to need to search for the same things over and over, you may prefer to do something like adding a flag field to the table which specifies that the string field contains the relevant sub-strings. If you keep this flag field up-to-date when you insert or update a record, you could simply query the flag when you want to search. This can be indexed, and would make your query much quicker. Whether it's worth the effort to do that is up to you; it'll depend on how bad the performance is using LIKE.
You can write your query like so:
SELECT * FROM MyTable WHERE (A LIKE '%text1%' OR A LIKE '%text2%')
The % is a wildcard, meaning that it searches for all rows where column A contains either text1 or text2
I've used the LIKE option most of the time and it works just fine.
I'd just like to share one of my latest experiences, where I used the INSTR function. Regardless of the reasons that made me consider this option, what's important here is that the usage is similar:
instr(A, 'text 1') > 0 or instr(A, 'text 2') > 0
Another option could be:
(instr(A, 'text 1') + instr(A, 'text 2')) > 0
I'd go with the A LIKE '%text1%' OR A LIKE '%text2%' option... if not, I hope this other option helps.
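A quick way to convince yourself that the LIKE and INSTR forms select the same rows is to run both on a toy table. This sketch uses Python's sqlite3 module instead of MySQL (an assumption on my part; SQLite also provides LIKE and instr), with made-up table and row values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("text 1 texttext",), ("texttext 2",), ("other",), ("xx text 2 yy",)],
)

like_rows = conn.execute(
    "SELECT a FROM t WHERE a LIKE '%text 1%' OR a LIKE '%text 2%' ORDER BY a"
).fetchall()
instr_rows = conn.execute(
    "SELECT a FROM t WHERE instr(a, 'text 1') > 0 OR instr(a, 'text 2') > 0 ORDER BY a"
).fetchall()

print(like_rows == instr_rows)
print([row[0] for row in like_rows])
```

Case handling can differ between the two forms depending on the database and collation, so the equivalence here is only shown for lowercase data.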
I use this to search for motorcycle tire sizes.
For example:
Data = "Tire cycle size 70 / 90 - 16"
I can search with "70 90 16"
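$searchTermBits = array(); // collect one LIKE condition per search term
$searchTerms = preg_split("/[\s,-\/?!]+/", $itemName);
foreach ($searchTerms as $term) {
    $term = trim($term);
    if (!empty($term)) {
        // NOTE: escape $term (or use a prepared statement) before building SQL
        $searchTermBits[] = "name LIKE '%$term%'";
    }
}
$query = "SELECT * FROM item WHERE " . implode(' AND ', $searchTermBits);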
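The same term-splitting idea can be combined with bound parameters so the search terms never end up inside the SQL text. This is a Python sketch with sqlite3 standing in for MySQL; build_search is a hypothetical helper, and the fallback of returning every row when no terms are given is my own choice.

```python
import re
import sqlite3


def build_search(item_name):
    """Split the input on separators and AND together one LIKE per term."""
    terms = [t for t in re.split(r"[\s,\-/?!]+", item_name) if t]
    if not terms:
        return "SELECT name FROM item", []
    where = " AND ".join("name LIKE ?" for _ in terms)
    return "SELECT name FROM item WHERE " + where, ["%" + t + "%" for t in terms]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?)",
    [("Tire cycle size 70 / 90 - 16",), ("Tire cycle size 80 / 90 - 14",)],
)

sql, params = build_search("70 90 16")
print(sql)
print(conn.execute(sql, params).fetchall())
```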

Searching for phone numbers in mysql

I have a table which is full of arbitrarily formatted phone numbers, like this
027 123 5644
021 393-5593
(07) 123 456
042123456
I need to search for a phone number in a similarly arbitrary format (e.g. 07123456 should find the entry (07) 123 456).
The way I'd do this in a normal programming language is to strip all the non-digit characters out of the 'needle', then go through each number in the haystack, strip all non-digit characters out of it, then compare against the needle, eg (in ruby)
digits_only = lambda{ |n| n.gsub /[^\d]/, '' }
needle = digits_only[input_phone_number]
haystack.map(&digits_only).include?(needle)
The catch is, I need to do this in MySQL. It has a host of string functions, none of which really seem to do what I want.
Currently I can think of 2 'solutions'
Hack together a franken-query of CONCAT and SUBSTR
Insert a % between every character of the needle ( so it's like this: %0%7%1%2%3%4%5%6% )
However, neither of these seem like particularly elegant solutions.
Hopefully someone can help or I might be forced to use the %%%%%% solution
Update: This is operating over a relatively fixed set of data, with maybe a few hundred rows. I just didn't want to do something ridiculously bad that future programmers would cry over.
If the dataset grows I'll take the 'phoneStripped' approach. Thanks for all the feedback!
could you use a "replace" function to strip out any instances of "(", "-" and " ",
I'm not concerned about the result being numeric.
The main characters I need to consider are +, -, (, ) and space
So would that solution look like this?
SELECT * FROM people
WHERE
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phonenumber, '(', ''), ')', ''), '-', ''), ' ', ''), '+', '')
LIKE '123456'
Wouldn't that be terribly slow?
This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that's bad.
How about adding a column with a hash of the current phone numbers after stripping out all formatting characters. Then you can at least index the hash values and avoid a full blown table scan.
Or is the amount of data small and not expected to grow much?
Then maybe just sucking all the numbers into the client and running a search there.
I know this is ancient history, but I found it while looking for a similar solution.
A simple REGEXP may work:
select * from phone_table where phone1 REGEXP "07[^0-9]*123[^0-9]*456"
This would match the phonenumber column with or without any separating characters.
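The pattern above can be generated from any search input by stripping it to digits and joining them with [^0-9]*. Here is a small Python sketch of that step, checked against the sample numbers from the question; phone_regex is a made-up name, and Python's re module is used only to try the pattern out, not as a claim about how MySQL evaluates it.

```python
import re


def phone_regex(number):
    """Digits of the input, each pair separated by 'any run of non-digits'."""
    digits = re.sub(r"[^0-9]", "", number)
    return "[^0-9]*".join(digits)


haystack = ["027 123 5644", "021 393-5593", "(07) 123 456", "042123456"]
pattern = phone_regex("07-123 456")
print(pattern)
print([n for n in haystack if re.search(pattern, n)])
```

Note that an input without any digits produces an empty pattern, which matches everything, so that case should be rejected before querying.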
As John Dyer said, you should consider fixing the data in the DB and storing only numbers. However, if you are facing the same situation as me (I cannot run an update query), the workaround I found was combining 2 queries.
The "inside" query will retrieve all the phone numbers and format them removing the non-numeric characters.
SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name
The result will be all phone numbers without any special characters. After that, the "outside" query just needs to get the entry you are looking for.
The 2 queries will be:
SELECT phone_formatted FROM (
SELECT REGEXP_REPLACE(column_name, '[^0-9]', '') phone_formatted FROM table_name
) AS result WHERE phone_formatted = 9999999999
Important: the AS result alias is not referenced anywhere, but it must be there to avoid errors (MySQL requires every derived table to have an alias).
An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?
Then you could do the same to the phone number string you're searching for and compare them as integers.
Of course, this won't work for numbers like 1800-MATT-ROCKS. :)
Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.
My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).
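A rough sketch of the stripped-column approach, using Python with sqlite3 in place of MySQL. The column name phone_stripped, the index name and the helper functions are assumptions for illustration; the stripping happens in client code on insert, as suggested above.

```python
import re
import sqlite3


def digits_only(number):
    """Drop everything except the digits 0-9 (so '+', '(', '-' and spaces go away)."""
    return re.sub(r"[^0-9]", "", number)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (phone TEXT, phone_stripped TEXT)")
conn.execute("CREATE INDEX idx_people_phone_stripped ON people (phone_stripped)")
for raw in ["027 123 5644", "021 393-5593", "(07) 123 456", "042123456"]:
    # Store the original formatting and the searchable form side by side.
    conn.execute("INSERT INTO people VALUES (?, ?)", (raw, digits_only(raw)))


def find_phone(conn, needle):
    """Exact lookup on the stripped column, which can use the index."""
    rows = conn.execute(
        "SELECT phone FROM people WHERE phone_stripped = ?", (digits_only(needle),)
    ).fetchall()
    return [row[0] for row in rows]


print(find_phone(conn, "07123456"))
print(find_phone(conn, "021-393 5593"))
```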
You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
(I know this is late, but I just started looking around here :)
I suggest using PHP functions rather than MySQL patterns, so you will have some code like this:
$tmp_phone = '';
for ($i = 0; $i < strlen($phone); $i++) {
    // keep only the digits, each one preceded by a % wildcard
    if (is_numeric($phone[$i])) {
        $tmp_phone .= '%' . $phone[$i];
    }
}
$tmp_phone .= '%';
$search_condition .= " and phone LIKE '" . $tmp_phone . "' ";
This is a problem with MySQL - the regex function can match, but it can't replace. See this post for a possible solution.
See
http://www.mfs-erp.org/community/blog/find-phone-number-in-database-format-independent
It is not really an issue that the regular expression would become visually appalling, since only mysql "sees" it. Note that instead of '+' (cfr. post with [\D] from the OP) you should use '*' in the regular expression.
Some users are concerned about performance (non-indexed search), but in a table with 100000 customers, this query, when issued from a user interface returns immediately, without noticeable delay.
Here is a working Solution for PHP users.
This uses a loop in PHP to build the Regular Expression. Then searches the database in MySQL with the RLIKE operator.
$phone = '(456) 584-5874'; // can be any format
$phone = preg_replace('/[^0-9]/', '', $phone); // strip non-numeric characters
$len = strlen($phone); // get length of phone number
$regex = ''; // start with an empty pattern
for ($i = 0; $i < $len - 1; $i++) {
    $regex .= $phone[$i] . "[^[:digit:]]*";
}
$regex .= $phone[$len - 1];
This creates a Regular Expression that looks like this: 4[^[:digit:]]*5[^[:digit:]]*6[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*4[^[:digit:]]*5[^[:digit:]]*8[^[:digit:]]*7[^[:digit:]]*4
Now formulate your MySQL something like this:
$sql = "SELECT Client FROM tb_clients WHERE Phone RLIKE '$regex'";
NOTE: I tried several of the other posted answers but found performance issues. For example, on our large database, it took 16 seconds to run the IsNumeric example. But this solution ran instantly. And this solution is compatible with older MySQL versions.
MySQL can search based on regular expressions.
Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of spaces can change, it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?
"^[\D]+0[\D]+2[\D]+7[\D]+1[\D]+2[\D]+3[\D]+4[\D]+5[\D]+6$"
(actually it'd be worse as the mysql manual doesn't seem to indicate it supports \D)
If that is the case, isn't it more or less the same as my %%%%% idea?
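The two ideas are close but not quite the same: a % in LIKE can also swallow digits, while [^0-9]* cannot. The Python sketch below imitates both with regular expressions to show a false positive; the function names are invented, LIKE is approximated with ".*", and the needle is assumed to contain digits only.

```python
import re


def percent_style(digits):
    """Imitates LIKE '%0%7%...%': every % may match anything, digits included."""
    return re.compile(".*" + ".*".join(digits) + ".*", re.DOTALL)


def non_digit_style(digits):
    """Only non-digit characters are allowed before, between and after the digits."""
    return re.compile("[^0-9]*" + "[^0-9]*".join(digits) + "[^0-9]*")


needle = "07123456"
for stored in ["(07) 123 456", "027 123 4556"]:
    print(
        stored,
        percent_style(needle).fullmatch(stored) is not None,
        non_digit_style(needle).fullmatch(stored) is not None,
    )
```

The second stored number is a different phone number, yet the %-style pattern accepts it because the extra digits hide inside the wildcards.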
Just an idea, but couldn't you use Regex to quickly strip out the characters and then compare against that like @Matt Hamilton suggested?
Maybe even set up a view (not sure of mysql on views) that would hold all phone numbers stripped by regex to a plain phone number?
Woe is me. I ended up doing this:
mre = mobile_number && ('%' + mobile_number.gsub(/\D/, '').scan(/./m).join('%'))
find(:first, :conditions => ['trim(mobile_phone) like ?', mre])
If this is something that is going to happen on a regular basis, perhaps modifying the data to be all one format and then setting up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL) would be a good idea. Having data in an easily searched format is half the battle.
A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/
An additional package needs to be installed, then you can play with REGEXP_REPLACE.
Create a user-defined function that dynamically creates the regex.
DELIMITER //
CREATE FUNCTION udfn_GetPhoneRegex
(
    var_Input VARCHAR(25)
)
RETURNS VARCHAR(200)
BEGIN
    DECLARE iterator INT DEFAULT 1;
    DECLARE phoneregex VARCHAR(200) DEFAULT '';
    DECLARE output VARCHAR(25) DEFAULT '';
    WHILE iterator < (LENGTH(var_Input) + 1) DO
        IF SUBSTRING(var_Input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
            SET output = CONCAT(output, SUBSTRING(var_Input, iterator, 1));
        END IF;
        SET iterator = iterator + 1;
    END WHILE;
    SET output = RIGHT(output,10);
    SET iterator = 1;
    WHILE iterator < (LENGTH(output) + 1) DO
        SET phoneregex = CONCAT(phoneregex,'[^0-9]*',SUBSTRING(output, iterator, 1));
        SET iterator = iterator + 1;
    END WHILE;
    SET phoneregex = CONCAT(phoneregex,'$');
    RETURN phoneregex;
END//
DELIMITER ;
Call that User Defined Function in your stored procedure.
DECLARE var_PhoneNumberRegex VARCHAR(200);
SET var_PhoneNumberRegex = udfn_GetPhoneRegex('+ 123 555 7890');
SELECT * FROM Customer WHERE phonenumber REGEXP var_PhoneNumberRegex;
I would use Google's libPhoneNumber to format a number to E164 format. I would add a second column called "e164_number" to store the e164 formatted number and add an index on it.
In my case, I needed to identify Swiss (CH) mobile phone numbers in the phone column and move them to the mobile column.
As all mobile phone numbers start with 07x or +417x, here is the regex to use:
/^(\+[0-9][0-9]\s*|0|)7.*/mgix
It finds all numbers like the following:
+41 79 123 456 78
+417612345678
076 123 456 78
07812345678
7712345678
and ignores all others like these:
+41 47 123 456 78
+413212345678
021 123 456 78
02212345678
3412345678
In MySQL it gives the following code :
UPDATE `contact`
SET `mobile` = `phone`,
`phone` = ''
WHERE `phone` REGEXP '^(\\+[0-9][0-9][[:space:]]*|0|)7.*$'
You'll need to clean special characters like -/.() out of your numbers first.
https://regex101.com/r/AiWFX8/1
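To double-check the pattern against the two lists above without touching the database, it can be run in Python. This is just a verification sketch; the variable names are mine and the regex is the PCRE-style one from the post, minus the flags.

```python
import re

# Optional "+NN" country code (with optional spaces) or a leading 0, then a 7.
SWISS_MOBILE = re.compile(r"^(\+[0-9][0-9]\s*|0|)7.*")

mobiles = ["+41 79 123 456 78", "+417612345678", "076 123 456 78",
           "07812345678", "7712345678"]
others = ["+41 47 123 456 78", "+413212345678", "021 123 456 78",
          "02212345678", "3412345678"]

print(all(SWISS_MOBILE.match(n) for n in mobiles))
print(any(SWISS_MOBILE.match(n) for n in others))
```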