Mysql optimization for REGEXP

Mysql optimization for REGEXP - mysql

This query (with different name instead of "jack") happens many times in my slow query log. Why?
The Users table has many fields (more than these three I've selected) and about 40.000 rows.
select name,username,id from Users where ( name REGEXP
'[[:<:]]jack[[:>:]]' ) or ( username REGEXP '[[:<:]]jack[[:>:]]' )
order by name limit 0,5;
id is primary and autoincrement.
name has an index.
username has a unique index.
Sometimes it takes 3 seconds!
If I explain the select on MySQL I've got this:
select type: SIMPLE
table: Users
type: index
possible keys: NULL
key: name
key len: 452
ref: NULL
rows: 5
extra: Using where
Is this the best I can do? What can I fix?

If you must use regexp-style WHERE clauses, you definitely will be plagued by slow-query problems. For regexp-style search to work, MySQL has to compare every value in your name column with the regexp. And, your query has doubled the trouble by also looking at your username column.
This means MySQL can't take advantage of any indexes, which is how all DBMSs speed up queries of large tables.
There are a few things you can try. All of them involve saying goodbye to REGEXP.
One is this:
WHERE name LIKE CONCAT('jack', '%') OR username LIKE CONCAT('jack', '%')
If you create indexes on your name and username columns this should be decently fast. It will look for all names/usernames beginning with 'jack'. NOTICE that
WHERE name LIKE CONCAT('%','jack') /* SLOW!!! */
will look for names ending with 'jack' but will be slow like your regexp-style search.
Another thing you can do is figure out why your application needs to be able to search for part of a name or username. You can either eliminate this feature from your application, or figure out some better way to handle it.
Possible better ways:
Ask your users to break up their names into given-name and surname fields, and search separately.
Create a separate "search all users" feature that only gets used when a user needs it, thereby reducing the frequency of your slow regexp-style query.
Break up their names into a separate name-words table yourself using some sort of preprocessing progam. Search the name-words table without regexp.
Figure out how to use MySQL full text search for this feature.
All of these involve some programming work.

I reached 50% speedup just by adding fieldname != '' in where clause. It makes mysql to use indexes.
SELECT name, username, id
FROM users
WHERE name != ''
AND (name REGEXP '[[:<:]]jack[[:>:]]' or username REGEXP '[[:<:]]jack[[:>:]]')
ORDER BY name
LIMIT 0,5;
Not a perfect solution but helps.

Add "LIKE" in front
from
SELECT cat_ID, categoryName FROM category WHERE cat_ID REGEXP '^15-64-8$' ORDER BY categoryName
to
SELECT cat_ID, categoryName FROM category WHERE cat_ID LIKE '15-64-8%' and cat_ID REGEXP '^15-64-8$' ORDER BY categoryName
Of cos, that works only if U r search for phrases U know starting with what, else full text index is the solution.

Related

What will be the SQL Query for search function? [duplicate]

I am new to SQL programming.
I have a table job where the fields are id, position, category, location, salary range, description, refno.
I want to implement a keyword search from the front end. The keyword can reside in any of the fields of the above table.
This is the query I have tried but it consist of so many duplicate rows:
SELECT
a.*,
b.catname
FROM
job a,
category b
WHERE
a.catid = b.catid AND
a.jobsalrange = '15001-20000' AND
a.jobloc = 'Berkshire' AND
a.jobpos LIKE '%sales%' OR
a.jobloc LIKE '%sales%' OR
a.jobsal LIKE '%sales%' OR
a.jobref LIKE '%sales%' OR
a.jobemail LIKE '%sales%' OR
a.jobsalrange LIKE '%sales%' OR
b.catname LIKE '%sales%'

For a single keyword on VARCHAR fields you can use LIKE:
SELECT id, category, location
FROM table
WHERE
(
category LIKE '%keyword%'
OR location LIKE '%keyword%'
)
For a description you're usually better adding a full text index and doing a Full-Text Search (MyISAM only):
SELECT id, description
FROM table
WHERE MATCH (description) AGAINST('keyword1 keyword2')

SELECT
*
FROM
yourtable
WHERE
id LIKE '%keyword%'
OR position LIKE '%keyword%'
OR category LIKE '%keyword%'
OR location LIKE '%keyword%'
OR description LIKE '%keyword%'
OR refno LIKE '%keyword%';

Ideally, have a keyword table containing the fields:
Keyword
Id
Count (possibly)
with an index on Keyword. Create an insert/update/delete trigger on the other table so that, when a row is changed, every keyword is extracted and put into (or replaced in) this table.
You'll also need a table of words to not count as keywords (if, and, so, but, ...).
In this way, you'll get the best speed for queries wanting to look for the keywords and you can implement (relatively easily) more complex queries such as "contains Java and RCA1802".
"LIKE" queries will work but they won't scale as well.

Personally, I wouldn't use the LIKE string comparison on the ID field or any other numeric field. It doesn't make sense for a search for ID# "216" to return 16216, 21651, 3216087, 5321668..., and so on and so forth; likewise with salary.
Also, if you want to use prepared statements to prevent SQL injections, you would use a query string like:
SELECT * FROM job WHERE `position` LIKE CONCAT('%', ? ,'%') OR ...

I will explain the method i usally prefer:
First of all you need to take into consideration that for this method you will sacrifice memory with the aim of gaining computation speed.
Second you need to have a the right to edit the table structure.
1) Add a field (i usually call it "digest") where you store all the data from the table.
The field will look like:
"n-n1-n2-n3-n4-n5-n6-n7-n8-n9" etc.. where n is a single word
I achieve this using a regular expression thar replaces " " with "-".
This field is the result of all the table data "digested" in one sigle string.
2) Use the LIKE statement %keyword% on the digest field:
SELECT * FROM table WHERE digest LIKE %keyword%
you can even build a qUery with a little loop so you can search for multiple keywords at the same time looking like:
SELECT * FROM table WHERE
digest LIKE %keyword1% AND
digest LIKE %keyword2% AND
digest LIKE %keyword3% ...

You can find another simpler option in a thread here: Match Against.. with a more detail help in 11.9.2. Boolean Full-Text Searches
This is just in case someone need a more compact option. This will require to create an Index FULLTEXT in the table, which can be accomplish easily.
Information on how to create Indexes (MySQL): MySQL FULLTEXT Indexing and Searching
In the FULLTEXT Index you can have more than one column listed, the result would be an SQL Statement with an index named search:
SELECT *,MATCH (`column`) AGAINST('+keyword1* +keyword2* +keyword3*') as relevance FROM `documents`USE INDEX(search) WHERE MATCH (`column`) AGAINST('+keyword1* +keyword2* +keyword3*' IN BOOLEAN MODE) ORDER BY relevance;
I tried with multiple columns, with no luck. Even though multiple columns are allowed in indexes, you still need an index for each column to use with Match/Against Statement.
Depending in your criterias you can use either options.

I know this is a bit late but what I did to our application is this. Hope this will help someone tho. But it works for me:
SELECT * FROM `landmarks` WHERE `landmark_name` OR `landmark_description` OR `landmark_address` LIKE '%keyword'
OR `landmark_name` OR `landmark_description` OR `landmark_address` LIKE 'keyword%'
OR `landmark_name` OR `landmark_description` OR `landmark_address` LIKE '%keyword%'

SQL query running really slow

I am running this query to search the database:
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
order ON person.oID = order.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
It's doing a lot of processing, but how can I get these results faster? The page is taking about 6-7 seconds to load, If i run the query in PHPMyAdmin, the query takes 3-4 seconds to run. Its not a huge database, 3000 entries or so. I have added an index to the ref, email, firstname and lastname columns but that doesnt seem to have made any difference. Can anyone help?

The reason this query is slow is because you've combined two convenient but slow features of MySQL in the slowest possible way.
FUNCTION(column) LIKE %matchstring% requires a scan of the table; no ordered index can help satisfy this search because it's unanchored.
condition OR condition OR condition requires the table to be rescanned once per OR clause.
You also happen to be ignoring the fact that MySQL's searches are already case-insensitive if you have set up your column collations correctly.
Finally, it's not clear what you're doing with the RIGHT JOINed table data. Which columns of your result set come from that table? If you don't need data from that table get rid of it.
So, in summary, what you have is slow x many.
So, how can you fix this? The most important thing is for you to get rid of as many of these unanchored scans as possible. If you can change them to
email LIKE '{$search}%'
so the LOWER() functions and leading %s in the LIKE terms can be eliminated, you will have a big win.
If this sort of cast-a-wide-net search feature is critical to your application, you should consider using MySQL fulltext searching.
Or you could consider creating a new column in your table that's the concatenation of all the columns you presently search, so you can search it just once.
Edit to explain LIKE slowness
If the column haystack is indexed, the search haystack LIKE 'needle%' runs quite quickly. That's because the BTREE style index is inherently ordered. To search this way, MySQL can random-access the first possible match, and then scan sequentially to the last possible match.
But the search haystack LIKE '%needle%' can't use random access to find the first possible match in the index. The first possible match could be anywhere. So it has to scan all the values of the haystack one by one for the needle.

I would suggest that you change the right join to an inner join. The fields that you are looking for look like they are coming from the person table anyway, so the where clause is turning the query into an inner join.
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person INNER JOIN
order
ON person.oID = order.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
Second, create an index on order(ref). This should greatly reduce the search space for the where clause. The syntax is:
create index order_ref on `order`(ref);
By the way, order is a bad name for a table, because it is a SQL reserved word. I would suggest orders instead.

why dont you use Full text search instead of bunch of OR and LOWER ?
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
order ON person.oID = order.ref
WHERE
MATCH (LOWER(firstname), LOWER(lastname),LOWER(email),LOWER(ref))
AGAINST ('$search' IN BOOLEAN MODE)
to run this faster you need to add an index .
ALTER TABLE person ADD FULLTEXT(firstname, lastname,email,ref);

mysql regex() alternation match order?

With the following query:
SELECT * FROM people WHERE name REGEXP(bob|robert)
Am I right in assuming that mysql will scan each row in a fairly random order looking for either 'bob' or 'robert' (rather than bob first, then another scan for robert)?
If so, is there any way to get mysql to attempt to match the entire table against 'bob' first and then 'robert' without performing two seperate queries?

SELECT * FROM people WHERE name REGEXP(bob|robert) order by name desc
It is only one query, and do the job.
SGBD can scan the data as they are please to do, it is not specify in SQL, and it is not random.
Unspecified can be random but isn't random.

There is no logical way to match the entire table against bob first (why would you want to?)
You can order the results, though, but it can be slow if the table has high cardinality and/or name is not a key.
SELECT * FROM people WHERE name = 'bob' OR name = 'robert'
ORDER BY name = 'bob' DESC

MySql Explain ignoring the unique index in a particular query

I started looking into Index(es) in depth for the first time and started analyzing our db beginning from the users table for the first time. I searched SO to find a similar question but was not able to frame my search well, I guess.
I was going through a particular concept and this first observation left me wondering - The difference in these Explain(s) [Difference : First query is using 'a%' while the second query is using 'ab%']
[Total number of rows in users table = 9193]:
1) explain select * from users where email_address like 'a%';
(Actually matching columns = 1240)
2) explain select * from users where email_address like 'ab%';
(Actually matching columns = 109)
The index looks like this :
My question:
Why is the index totally ignored in the first query? Does mySql think that it is a better idea not to use the index in the case 1? If yes, why?

If the probability, based statistics mysql collects on distribution of the values, is above a certain ratio of the total rows (typically 1/11 of the total), mysql deems it more efficient to simply scan the whole table reading the disks pages in sequentially, rather than use the index jumping around the disk pages in random order.
You could try your luck with this query, which may use the index:
where email_address between 'a' and 'az'
Although doing the full scan may actually be faster.

This is not a direct answer to your question but I still want to point it out (in case you already don't know):
Try:
explain select email_address from users where email_address like 'a%';
explain select email_address from users where email_address like 'ab%';
MySQL would now use indexes in both the queries above since the columns of interest are directly available from the index.
Probably in the case where you do a "select *", index access is more costly since the optmizer has to go through the index records, find the row ids and then go back to the table to retrieve other column values.
But in the query above where you only do a "select email_address", the optmizer knows all the information desired is available right from the index and hence it would use the index irrespective of the 30% rule.
Experts, please correct me if I am wrong.

optimize SELECT query, knowing that we are dealing with a limited range

I am trying to include in a MYSQL SELECT query a limitation.
My database is structured in a way, that if a record is found in column one then only 5000 max records with the same name can be found after that one.
Example:
mark
..mark repeated 5000 times
john
anna
..other millions of names
So in this table it would be more efficent to find the first Mark, and continue to search maximum 5000 rows down from that one.
Is it possible to do something like this?

Just make a btree index on the name column:
CREATE INDEX name ON your_table(name) USING BTREE
and mysql will silently do exactly what you want each time it looks for a name.

Try with:
SELECT name
FROM table
ORDER BY (name = 'mark') DESC
LIMIT 5000
Basicly you sort mark 1st then the rest follow up and gets limited.

Its actually quite difficult to understand your desired output .... but i think this might be heading in the right direction ?
(SELECT name
FROM table
WHERE name = 'mark'
LIMIT 5000)
UNION
(SELECT name
FROM table
WHERE name != 'mark'
ORDER BY name)
This will first get upto 5000 records with the first name as mark then get the remainder - you can add a limit to the second query if required ... using UNION
For performance you should ensure that the columns used by ORDER BY and WHERE are indexed accordingly

If you make sure that the column is properly indexed, MySQL will take care off optimisation for you.
Edit:
Thinking about it, I figured that this answer is only useful if I specify how to do that. user nobody beat me to the punch: CREATE INDEX name ON your_table(name) USING BTREE
This is exactly what database indexes are designed to do; this is what they are for. MySQL will use the index itself to optimise the search.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008