how to speed up mysql regex query - mysql

I want to develop a site for announcing jobs, but because I have a lot of conditions (title, category, tags, city, ...) I use a MySQL regex statement. However, it's very slow and sometimes results in a 500 Internal Server Error.
Here is one example:
select * from job
where
( LOWER(title) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo '
or
LOWER(description) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo '
or
LOWER(tags) REGEXP 'dév|freelance|free lance| 3eme grade|inform|design|site|java|vb.net|poo')
and
LOWER(ville) REGEXP LOWER('Agadir')
and
`date`<'2016-01-11'
order by `date` desc
Any advice?

You can't optimize a query based exclusively on regexes. Use full text indexing (or a dedicated search engine such as Mnogo) for text search and geospatial indexing for locations.
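A rough sketch of what that could look like for the query above, assuming the job table uses an engine that supports FULLTEXT indexes (MyISAM, or InnoDB on 5.6+) and that plain word matching is acceptable in place of the regex alternation:
ALTER TABLE job ADD FULLTEXT ft_job (title, description, tags);

SELECT * FROM job
WHERE MATCH(title, description, tags)
      AGAINST('freelance design site java' IN BOOLEAN MODE)
  AND ville = 'Agadir'
  AND `date` < '2016-01-11'
ORDER BY `date` DESC;
With the usual _ci collations the MATCH is already case-insensitive, so the LOWER() calls are no longer needed.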

The big part of the WHERE, namely the OR of 3 REGEXPs, cannot be optimized.
LOWER(ville) REGEXP LOWER('Agadir') can be turned into simply ville REGEXP 'Agadir' if your collation is ..._ci. Please provide SHOW CREATE TABLE job.
Then that can be optimized to ville = 'Agadir'.
But maybe this query is "generated" by your UI? And the users are allowed to use regexp thingies? (SECURITY WARNING: SQL injection is possible here!)
If it is "generated", the generate the "=" version if there are no regexp codes.
Provide these:
INDEX(ville, date) -- for cases when you can do `ville = '...'`
INDEX(date) -- for cases when you must have `ville REGEXP '...'`
The first will be used (and reasonably optimal) when appropriate. The second is better than nothing. (It depends on how many rows have that date range.)
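For example, a minimal sketch of adding both (the index names are arbitrary, and the column definitions are assumed to allow it):
ALTER TABLE job
  ADD INDEX idx_ville_date (ville, `date`),
  ADD INDEX idx_date (`date`);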
It smells like there may be other SELECTs. Let's see some other variants. What I have provided here may or may not help with them.
See my indexing cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Related

Select MySQL vs find MongoDB

I have 2 DBs: one in MySQL and one in MongoDB with the same data inside...
I do the following in MySQL:
Select tweet.testo From tweet Where tweet.testo like '%pizza%'
and this is the result:
1627 rows in set (2.79 sec)
but if I exec in Mongo:
db.tweets.find({text: /pizza/ }).explain()
this is the result:
nscannedObjects" : 1606334,
"n" : 1169,
or if I exec:
db.tweets.find({text: /pizza/i }).explain()
this is the result:
"nscannedObjects" : 1606334,
"n" : 1641,
Why is the number of rows/documents returned by the MySQL and Mongo finds different?
Why is the number of rows/documents returned by the MySQL and Mongo finds different??
There could be 1000000000000000 reasons including the temperature of the sun on that particular day.
MongoDB and MySQL are two completely separate techs; as such, if you expect to keep both in sync you will need some kind of replicator between the two. You have not made us aware whether this is the case.
Also, we have no idea of your coding, server setup, network setup and everything else, so really we cannot even begin to answer this.
A good answer would be to say that the reason you are seeing this is that the data between the two is different...
As for the difference between:
db.tweets.find({text: /pizza/ }).explain()
and
db.tweets.find({text: /pizza/i }).explain()
This is because MySQL, by default, compares strings case-insensitively (I believe), whereas MongoDB (I know) does not: its regex matching is case-sensitive unless you add the i flag, which makes it case-insensitive.
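One way to see how much of the gap is down to case sensitivity, assuming testo uses a case-insensitive (_ci) collation, is to compare these two counts on the MySQL side:
SELECT COUNT(*) FROM tweet WHERE tweet.testo LIKE '%pizza%';         -- case-insensitive, comparable to /pizza/i
SELECT COUNT(*) FROM tweet WHERE tweet.testo LIKE BINARY '%pizza%';  -- case-sensitive, comparable to /pizza/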
As for replicators, here is a good one: https://docs.continuent.com/wiki/display/TEDOC/Replicating+from+MySQL+to+MongoDB
The MySQL command
Select tweet.testo From tweet Where tweet.testo like '%pizza%'
is equivalent to MongoDB's
db.tweets.find({text: /pizza/i })
I realized they both contain the same data, but in some cases the text in MySQL was cut off, so it resulted in fewer rows being returned.
To begin with, your SQL query LIKE '%pizza%' may not pick up entries that begin with the string 'pizza' because of the wildcard on the front. Try the following SQL query to rule out any syntactical differences between the matching logic in SQL and the regex used by MongoDB:
Select tweet.testo From tweet Where lower(tweet.testo) like '%pizza%' or lower(tweet.testo) like 'pizza%'
Disclaimer: I don't have MySQL in front of me just now so I can't verify the leading-wildcard behaviour described above; however, this is consistent with other RDBMSs, so it's worth checking.

mysql regexp for search using alias

I am not very good with regexp so I really would like some help to achieve my goal.
When searching in my db I use an alias for specific keywords.
Here is an example
keyword tets, alias test (someone has misspelled the word test)
keyword b.m.w, alias bmw (if someone writes b.m.w instead of bmw)
etc.
So far, if a user searches for "bmw 316", I use LIKE "%bmw%316%" to get the results.
Now if the user searches for "b.m.w 316" I must use
"%b.m.w%316%" OR
"%bmw%316%"
because b.m.w has alias bmw.
In the case of 6 words with 2-3 aliases there are too many combinations.
I am trying to achieve it with regexp.
In the scenario above it would be something like (bmw|b.m.w) 316.
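As a sketch, for that single alias pair the pattern could be built like this (backslashes are doubled because of SQL string escaping; the table and column names are only placeholders):
SELECT * FROM ads
WHERE title REGEXP '(bmw|b\\.m\\.w).*316'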
How do I solve this problem?
You are not looking for REGEXP; you are looking for a thing called Levenshtein distance.
MySQL does not (yet) have native support for this (wonderful) concept, but you can download a UDF here:
http://joshdrew.com/
And here's a list so you've got something to choose from:
http://blog.lolyco.com/sean/2008/08/27/damerau-levenshtein-algorithm-levenshtein-with-transpositions/
You can also write your own function in MySQL, so you don't have to install a UDF.
http://www.supermind.org/blog/927/working-mysql-5-1-levenshtein-stored-procedure
Finally this question might help you out as well:
Implementation of Levenshtein distance for mysql/fuzzy search?
A query for the closest match would look something like:
SELECT * FROM atable a ORDER BY levenshtein(a.field, '$search') ASC LIMIT 10
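A possible variant, assuming one of the levenshtein() implementations linked above is installed, is to also filter on a maximum edit distance instead of only ordering by it (the threshold of 2 is just an illustrative guess):
SELECT * FROM atable a
WHERE levenshtein(a.field, '$search') <= 2
ORDER BY levenshtein(a.field, '$search') ASC LIMIT 10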

MySQL fulltext with stems

I am building a little search function for my site. I am taking my user's query, stemming the keywords and then running a fulltext MySQL search against the stemmed keywords.
The problem is that MySQL is treating the stems as literal. Here is the process that is happening:
user searches for a word like "baseballs"
my stemming algorithm (Porter Stemmer) turns "baseballs" into "basebal"
fulltext does not find anything matching "basebal", even though there SHOULD be matches for "baseball" and "baseballs"
How do I do the equivalent of LIKE 'basebal%' with fulltext?
EDIT:
Here is my current query:
SELECT MATCH (`title`,`body`) AGAINST ('basebal') AS `relevance`,`id` FROM `blogs` WHERE MATCH (`title`,`body`) AGAINST ('basebal') ORDER BY `relevance` DESC
I think it will work with an asterisk at the end: basebal*. See the * operator on this page for more info.
See this link. Stemming is not installed by default in MySQL, but you can install it yourself:
http://oksoft.blogspot.com/2009/05/stemming-words-in-mysql.html
IN NATURAL LANGUAGE MODE is the default mode, and it does not support the * wildcard, so it cannot match stems. Try IN BOOLEAN MODE with wildcards...
SELECT MATCH (`title`, `body`) AGAINST ('basebal*' IN BOOLEAN MODE) AS `relevance`, `id` FROM `blogs` WHERE MATCH (`title`, `body`) AGAINST ('basebal*' IN BOOLEAN MODE) ORDER BY `relevance` DESC
The example above provides clarity for people stumbling onto this question 10 years after it was asked. The topic is still relevant and benefits from complete examples 😉

Mysql match...against vs. simple like "%term%"

What's wrong with:
$term = $_POST['search'];

// Recursively append one "AND column LIKE ..." clause per remaining search word
function buildQuery($exploded, $count, $query)
{
    if (count($exploded) > $count) {
        $query .= ' AND column LIKE "%' . $exploded[$count] . '%"';
        return buildQuery($exploded, $count + 1, $query);
    }
    return $query;
}

// Split the search term on spaces and seed the query with the first word
$exploded = explode(' ', $term);
$query = buildQuery($exploded, 1,
    'SELECT * FROM table WHERE column LIKE "%' . $exploded[0] . '%"');
and then query the DB to retrieve the results in a certain order, instead of using the MyISAM-only SQL MATCH ... AGAINST?
Would it degrade performance dramatically?
The difference is in the algorithms that MySQL uses behind the scenes to find your data. Fulltext searches also allow you to sort based on relevance. The LIKE search in most conditions is going to do a full table scan, so depending on the amount of data, you could see performance issues with it. The fulltext engine can also have performance issues when dealing with large row sets.
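To illustrate, assuming a hypothetical posts table with a FULLTEXT index on body, the two styles look roughly like this:
-- LIKE: the leading wildcard forces a full table scan, and there is no relevance score
SELECT * FROM posts WHERE body LIKE '%term1%' AND body LIKE '%term2%';

-- FULLTEXT: uses the index and can sort by relevance
SELECT *, MATCH(body) AGAINST('term1 term2') AS relevance
FROM posts
WHERE MATCH(body) AGAINST('term1 term2')
ORDER BY relevance DESC;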
On a different note, one thing I would add to this code is something to escape the exploded values, perhaps a call to mysql_real_escape_string().
You can check out my recent presentation I did for MySQL University:
http://forge.mysql.com/wiki/Practical_Full-Text_Search_in_MySQL
Slides are also here:
http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
In my test, using LIKE '%pattern%' was more than 300x slower than using a MySQL FULLTEXT index. My test data was 1.5 million posts from the StackOverflow October data dump.

Practical limit to length of SQL query (specifically MySQL)

Is it particularly bad to have a very, very large SQL query with lots of (potentially redundant) WHERE clauses?
For example, here's a query I've generated from my web application with everything turned off, which should be the largest possible query for this program to generate:
SELECT *
FROM 4e_magic_items
INNER JOIN 4e_magic_item_levels
ON 4e_magic_items.id = 4e_magic_item_levels.itemid
INNER JOIN 4e_monster_sources
ON 4e_magic_items.source = 4e_monster_sources.id
WHERE (itemlevel BETWEEN 1 AND 30)
AND source!=16 AND source!=2 AND source!=5
AND source!=13 AND source!=15 AND source!=3
AND source!=4 AND source!=12 AND source!=7
AND source!=14 AND source!=11 AND source!=10
AND source!=8 AND source!=1 AND source!=6
AND source!=9 AND type!='Arms' AND type!='Feet'
AND type!='Hands' AND type!='Head'
AND type!='Neck' AND type!='Orb'
AND type!='Potion' AND type!='Ring'
AND type!='Rod' AND type!='Staff'
AND type!='Symbol' AND type!='Waist'
AND type!='Wand' AND type!='Wondrous Item'
AND type!='Alchemical Item' AND type!='Elixir'
AND type!='Reagent' AND type!='Whetstone'
AND type!='Other Consumable' AND type!='Companion'
AND type!='Mount' AND (type!='Armor' OR (false ))
AND (type!='Weapon' OR (false ))
ORDER BY type ASC, itemlevel ASC, name ASC
It seems to work well enough, but it's also not particularly high traffic (a few hundred hits a day or so), and I wonder if it would be worth the effort to try and optimize the queries to remove redundancies and such.
Reading your query makes me want to play an RPG.
This is definitely not too long. As long as they are well formatted, I'd say a practical limit is about 100 lines. After that, you're better off breaking subqueries into views just to keep your eyes from crossing.
I've worked with some queries that are 1000+ lines, and that's hard to debug.
By the way, may I suggest a reformatted version? This is mostly to demonstrate the importance of formatting; I trust this will be easier to understand.
select *
from
4e_magic_items mi
,4e_magic_item_levels mil
,4e_monster_sources ms
where mi.id = mil.itemid
and mi.source = ms.id
and itemlevel between 1 and 30
and source not in(16,2,5,13,15,3,4,12,7,14,11,10,8,1,6,9)
and type not in(
'Arms' ,'Feet' ,'Hands' ,'Head' ,'Neck' ,'Orb' ,
'Potion' ,'Ring' ,'Rod' ,'Staff' ,'Symbol' ,'Waist' ,
'Wand' ,'Wondrous Item' ,'Alchemical Item' ,'Elixir' ,
'Reagent' ,'Whetstone' ,'Other Consumable' ,'Companion' ,
'Mount'
)
and ((type != 'Armor') or (false))
and ((type != 'Weapon') or (false))
order by
type asc
,itemlevel asc
,name asc
/*
Some thoughts:
==============
0 - Formatting really matters, in SQL even more than most languages.
1 - consider selecting only the columns you need, not "*"
2 - use of table aliases makes it short & clear ("MI", "MIL" in my example)
3 - joins in the WHERE clause will un-clutter your FROM clause
4 - use NOT IN for long lists
5 - logically, the last two lines can be added to the "type not in" section.
I'm not sure why you have the "or false", but I'll assume some good reason
and leave them here.
*/
The default MySQL 5.0 server limitation is 1MB, configurable up to 1GB.
This is configured via the max_allowed_packet setting on both client and server, and the effective limitation is the lesser of the two.
Caveats:
It's likely that this "packet" limitation does not map directly to characters in a SQL statement. (Surely you want to take into account character encoding within the client, some packet metadata, etc.)
SELECT @@global.max_allowed_packet;
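And, as a sketch, to raise it on the server side (it also needs to be raised on the client, and set in my.cnf to survive a restart):
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- 64MB, applies to new connections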
This is the only real limit; it's adjustable on a server, so there is no real straight answer.
From a practical perspective, I generally consider any SELECT that ends up taking more than 10 lines to write (putting each clause/condition on a separate line) to be too long to easily maintain. At this point, it should probably be done as a stored procedure of some sort, or I should try to find a better way to express the same concept--possibly by creating an intermediate table to capture some relationship I seem to be frequently querying.
Your mileage may vary, and there are some exceptionally long queries that have a good reason to be. But my rule of thumb is 10 lines.
Example (mildly improper SQL):
SELECT x, y, z
FROM a, b
WHERE fiz = 1
  AND foo = 2
  AND a.x = b.y
  AND b.z IN (SELECT q, r, s, t
              FROM c, d, e
              WHERE c.q = d.r
                AND d.s = e.t
                AND c.gar IS NOT NULL)
ORDER BY b.gonk
This is probably too large; optimizing, however, would depend largely on context.
Just remember, the longer and more complex the query, the harder it's going to be to maintain.
Most databases support stored procedures to avoid this issue. If your code is fast enough to execute and easy to read, you don't want to have to change it in order to get the compile time down.
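For instance, a minimal sketch of wrapping part of the question's query in a stored procedure (the procedure and parameter names are only illustrative):
DELIMITER //
CREATE PROCEDURE find_magic_items(IN min_level INT, IN max_level INT)
BEGIN
  SELECT *
  FROM 4e_magic_items
  INNER JOIN 4e_magic_item_levels
          ON 4e_magic_items.id = 4e_magic_item_levels.itemid
  WHERE itemlevel BETWEEN min_level AND max_level
  ORDER BY type ASC, itemlevel ASC, name ASC;
END //
DELIMITER ;

CALL find_magic_items(1, 30);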
An alternative is to use prepared statements, so you get the hit only once per client connection and then pass in only the parameters for each call.
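A trimmed-down sketch of the server-side prepared-statement route, using only the level bounds as parameters:
PREPARE item_search FROM
  'SELECT * FROM 4e_magic_items
   INNER JOIN 4e_magic_item_levels ON 4e_magic_items.id = 4e_magic_item_levels.itemid
   WHERE itemlevel BETWEEN ? AND ?';
SET @lo = 1, @hi = 30;
EXECUTE item_search USING @lo, @hi;
DEALLOCATE PREPARE item_search;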
I'm assuming you mean by 'turned off' that a field doesn't have a value?
Instead of checking if something is not this, and it's also not that, etc., can't you just check if the field is null? Or set the field to 'off' and check if type (or whatever) equals 'off'.