Optimizing search query - mysql

This might seem like a redundant question, but I can't find the right answer to this issue.
I have a table, TableA, with more than 50 columns and more than a million rows. I am implementing search functionality that looks for a query in about 10 columns of this table.
For this I have created a composite index on these 10 columns:
index (col_1,col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10)
Now I split the user's query on spaces, i.e. $search_words = $search_query.split(' ');, and use the individual words in my search query. Example:
SELECT something FROM tableA
WHERE ( MATCH ( col_1, col_2,col_3,col_4,col_5,col_6,col_7,col_8,col_9,col_10 )
AGAINST ( ' +word1* +word2* +word3* +word4* ' IN BOOLEAN MODE ) )
This query works fine for general searches, but if a user searches for individual letters, such as A E I O Co., it takes too much time. What is the best way to optimise the query, or is there another way to perform the search in this situation?

If you feed a too-short string to InnoDB's FULLTEXT, it returns zero results. So filter out any strings that are shorter than innodb_ft_min_token_size.
If necessary, test for them separately, using REGEXP '[[:<:]]A[[:>:]]' to look for the 1-letter word A.
Or throw them together. This would check for the only 1-letter English words: REGEXP '[[:<:]][AI][[:>:]]'
(On MySQL 8.0.4 and later, which use ICU regular expressions, [[:<:]] and [[:>:]] are not supported; write the word boundary as '\\b' instead.)
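The filtering step described above could be done in application code before the query is built. Here is a minimal Python sketch of that idea. It is not the asker's code: the function name build_boolean_query and the min_token_size parameter are hypothetical, and the default of 3 assumes the server still uses the stock innodb_ft_min_token_size (check the real value with SHOW VARIABLES LIKE 'innodb_ft_min_token_size').

```python
import re


def build_boolean_query(search_query, min_token_size=3):
    """Split a user query into a FULLTEXT boolean string and leftover short words.

    min_token_size is assumed to mirror the server's
    innodb_ft_min_token_size setting (hypothetical parameter name).
    """
    # Keep only runs of letters and digits, so boolean-mode operators typed
    # by the user (+, -, *, quotes, parentheses) cannot alter the query.
    words = re.findall(r"[^\W_]+", search_query)
    long_words = [w for w in words if len(w) >= min_token_size]
    short_words = [w for w in words if len(w) < min_token_size]
    # "+word*" means: the row must contain a word starting with "word".
    against = " ".join("+" + w + "*" for w in long_words)
    return against, short_words


against, short_words = build_boolean_query("A E I O Co. weight loss")
print(against)      # +weight* +loss*
print(short_words)  # ['A', 'E', 'I', 'O', 'Co']
```

The against string would then be passed as a bound parameter to AGAINST (... IN BOOLEAN MODE), and the MATCH clause skipped altogether when it is empty. The short words can be dropped or checked separately with REGEXP.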

Related

LIKE % or AGAINST for FULLTEXT search?

I am trying to find a fast and efficient way to fetch records by keyword search.
Our MySQL table MASTER contains 30,000 rows and has 4 fields:
ID
title (FULLTEXT)
short_descr (FULLTEXT)
long_descr (FULLTEXT)
Can anyone suggest which one is more efficient?
LIKE %
MySQL's AGAINST
It would be nice if someone could write a SQL query for the keywords
Weight Loss Secrets
SELECT id FROM MASTER
WHERE (title LIKE '%Weight Loss Secrets%' OR
short_descr LIKE '%Weight Loss Secrets%' OR
long_descr LIKE '%Weight Loss Secrets%')
Thanks in advance
The FULLTEXT index should be faster. It may be a good idea to add all the columns to one FULLTEXT index:
ALTER TABLE MASTER
ADD FULLTEXT INDEX `FullTextSearch`
(`title` ASC, `short_descr` ASC, `long_descr` ASC);
Then execute using IN BOOLEAN MODE
SELECT id FROM MASTER WHERE
MATCH (title, short_descr, long_descr)
AGAINST ('+Weight +Loss +Secrets' IN BOOLEAN MODE);
This will find rows that contain all 3 keywords.
However, this won't give you an exact match; the keywords just need to be present in the same row.
If you also want an exact match you could do it like this, but it's a bit hacky and would only work if your table doesn't get too big.
SELECT id FROM
(
SELECT id, CONCAT(title,' ',short_descr,' ', long_descr) AS SearchField
FROM MASTER WHERE
MATCH (title, short_descr, long_descr)
AGAINST ('+Weight +Loss +Secrets' IN BOOLEAN MODE)
) result WHERE SearchField LIKE '%Weight Loss Secrets%'

How can I make a query faster

I have a complex query that fetches data from the database based on search keywords. I have written two queries to fetch data based on a keyword by joining two tables, and each table contains more than 5 million records. The problem is that this query takes 5-7 seconds to run, so the page takes a long time to load. The queries are:
SELECT DISTINCT( `general_info`.`company_name` ),
general_info.*
FROM general_info
INNER JOIN `financial_info`
ON `financial_info`.`reg_code` = `general_info`.`reg_code`
WHERE ( `financial_info`.`type_of_activity` LIKE '%siveco%'
OR `general_info`.`company_name` LIKE '%siveco%'
OR `general_info`.`reg_code` LIKE '%siveco%' )
The parentheses around distinct don't make a difference. distinct is not a function. So your query is equivalent to:
SELECT DISTINCT gi.`company_name`, gi.*
FROM general_info gi INNER JOIN
`financial_info` fi
ON fi.`reg_code` = gi.`reg_code`
WHERE fi.`type_of_activity` LIKE '%siveco%' OR
gi.`company_name` LIKE '%siveco%' OR
gi.`reg_code` LIKE '%siveco%';
For the join, you should have indexes on general_info(reg_code) and financial_info(reg_code). You may already have these indexes.
The real problem is probably the WHERE clause. Because you are using wildcards at the beginning of the pattern, you cannot optimize this with a regular index. You may be able to do what you want using full-text search, with a FULLTEXT index and the MATCH ... AGAINST clause. Such indexes are described in the MySQL documentation on full-text search. This will work particularly well if you are looking for complete words in the various names.

SQL query running really slow

I am running this query to search the database:
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
order ON person.oID = order.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
It's doing a lot of processing, but how can I get these results faster? The page takes about 6-7 seconds to load. If I run the query in phpMyAdmin, it takes 3-4 seconds to run. It's not a huge database, 3000 entries or so. I have added an index to the ref, email, firstname and lastname columns, but that doesn't seem to have made any difference. Can anyone help?
The reason this query is slow is because you've combined two convenient but slow features of MySQL in the slowest possible way.
FUNCTION(column) LIKE '%matchstring%' requires a scan of the table; no ordered index can help satisfy this search because it's unanchored.
condition OR condition OR condition multiplies that work: each row may have to be tested against every OR clause in turn.
You also happen to be ignoring the fact that MySQL's searches are already case-insensitive if you have set up your column collations correctly.
Finally, it's not clear what you're doing with the RIGHT JOINed table data. Which columns of your result set come from that table? If you don't need data from that table get rid of it.
So, in summary, what you have is slow x many.
So, how can you fix this? The most important thing is for you to get rid of as many of these unanchored scans as possible. If you can change them to
email LIKE '{$search}%'
so the LOWER() functions and leading %s in the LIKE terms can be eliminated, you will have a big win.
If this sort of cast-a-wide-net search feature is critical to your application, you should consider using MySQL fulltext searching.
Or you could consider creating a new column in your table that's the concatenation of all the columns you presently search, so you can search it just once.
Edit to explain LIKE slowness
If the column haystack is indexed, the search haystack LIKE 'needle%' runs quite quickly. That's because the BTREE style index is inherently ordered. To search this way, MySQL can random-access the first possible match, and then scan sequentially to the last possible match.
But the search haystack LIKE '%needle%' can't use random access to find the first possible match in the index. The first possible match could be anywhere. So it has to scan all the values of the haystack one by one for the needle.
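The difference between the two cases can be illustrated outside the database. The Python sketch below treats a sorted list as a stand-in for a BTREE index. It is only an analogy under that assumption, not how MySQL is implemented, and the function names and sample addresses are made up for the illustration.

```python
import bisect


def prefix_search(sorted_values, prefix):
    """Emulate haystack LIKE 'prefix%' on an ordered index."""
    # Jump straight to the first value >= prefix (the "random access").
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    # Scan forward only while the values still begin with the prefix.
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches


def substring_search(values, needle):
    """Emulate haystack LIKE '%needle%': every value must be inspected."""
    return [value for value in values if needle in value]


index = sorted(["alice@example.com", "bob@example.com",
                "bobby@example.org", "carol@example.net"])
print(prefix_search(index, "bob"))           # ['bob@example.com', 'bobby@example.org']
print(substring_search(index, "example.o"))  # ['bobby@example.org']
```

The prefix search touches only the matching entries plus one more, while the substring search has to look at every entry no matter how the list is ordered.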
I would suggest that you change the right join to an inner join. The fields that you are looking for look like they are coming from the person table anyway, so the where clause is turning the query into an inner join.
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person INNER JOIN
`order`
ON person.oID = `order`.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
Second, create an index on order(ref). This should greatly reduce the search space for the where clause. The syntax is:
create index order_ref on `order`(ref);
By the way, order is a bad name for a table, because it is a SQL reserved word. I would suggest orders instead.
Why don't you use full-text search instead of a bunch of ORs and LOWER() calls?
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
`order` ON person.oID = `order`.ref
WHERE
-- MATCH() accepts only plain column names, so the LOWER() calls are dropped
MATCH (firstname, lastname, email, ref)
AGAINST ('$search' IN BOOLEAN MODE)
For this to work, you need to add a FULLTEXT index on the same columns:
ALTER TABLE person ADD FULLTEXT(firstname, lastname,email,ref);

difference between these 2 queries

Well, I'm running 2 queries that should show me the same result.
First query:
SELECT count( id ) AS cv FROM table_name WHERE field_name LIKE '%êêê01, word02, word03%'
Second query:
SELECT count( id ) AS cv FROM table_name WHERE match(field_name) against('êêê01, word02, word03')
but the first returns more rows than the second. Could someone tell me why?
I'm using a fulltext index on this field.
Thanks.
I did some quick research, and the following quote should answer your question:
One problem with MATCH on MySQL is that it seems to only match against whole words so a search for 'bla' won't match a column with a value of 'blah'.
It's also described in the documentation for MATCH:
By default, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.
Meanwhile, LIKE is more "powerful" because it can look at individual characters:
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:
This explains why LIKE returns more results than MATCH.
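The whole-word behaviour from the 'bla' / 'blah' quote can be sketched in Python as follows. This is a simplified illustration, not MySQL's actual tokenizer: real FULLTEXT matching also applies a minimum token size and a stopword list, which are ignored here, and the function names and sample rows are invented.

```python
import re


def like_contains(text, needle):
    """Emulate LIKE '%needle%': any run of characters may match."""
    return needle.lower() in text.lower()


def whole_word_match(text, word):
    """Emulate word-based matching: the text is split into tokens first."""
    tokens = re.findall(r"\w+", text.lower())
    return word.lower() in tokens


rows = ["blah blah", "bla", "a bla here", "blackbird"]
print([r for r in rows if like_contains(r, "bla")])     # all four rows
print([r for r in rows if whole_word_match(r, "bla")])  # ['bla', 'a bla here']
```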

MySQL Mixing Damerau–Levenshtein Fuzzy with Like Wildcard

I recently implemented the UDFs of the Damerau–Levenshtein algorithms into MySQL, and was wondering if there is a way to combine the fuzzy matching of the Damerau–Levenshtein algorithm with the wildcard searching of the Like function? If I have the following data in a table:
ID | Text
---------------------------------------------
1 | let's find this document
2 | let's find this docment
3 | When the book is closed
4 | The dcument is locked
I want to run a query that would incorporate the Damerau–Levenshtein algorithm...
select text from table where damlev('Document',tablename.text) <= 5;
...with a wildcard match to return IDs 1, 2, and 4 in my query. I'm not sure of the syntax or if this is possible, or whether I would have to approach this differently. The above select statement works fine in isolation, but does not work on individual words. I would have to change the above SQL to...
select text from table where
damlev('let''s find this document',tablename.text) <= 5;
...which of course returns just ID 2. I'm hoping there is a way to combine the fuzzy and wildcard together if I want all records returned that have the word "document" or variations of it appearing anywhere within the Text field.
In working with person names, and doing fuzzy lookups on them, what worked for me was to create a second table of words. Also create a third table that is an intersect table for the many to many relationship between the table containing the text, and the word table. When a row is added to the text table, you split the text into words and populate the intersect table appropriately, adding new words to the word table when needed. Once this structure is in place, you can do lookups a bit faster, because you only need to perform your damlev function over the table of unique words. A simple join gets you the text containing the matching words.
A query for a single word match would look something like this:
SELECT T.* FROM Words AS W
JOIN Intersect AS I ON I.WordId = W.WordId
JOIN Text AS T ON T.TextId = I.TextId
WHERE damlev('document',W.Word) <= 5
and two words would look like this (off the top of my head, so may not be exactly correct):
SELECT T.* FROM Text AS T
JOIN (SELECT I.TextId, COUNT(I.WordId) AS MatchCount FROM Words AS W
JOIN Intersect AS I ON I.WordId = W.WordId
WHERE damlev('john',W.Word) <= 2
OR damlev('smith',W.Word) <=2
GROUP BY I.TextId) AS Matches ON Matches.TextId = T.TextId
AND Matches.MatchCount = 2
The advantage here, at the cost of some database space, is that you only have to apply the time-expensive damlev function to the unique words, which will probably only number in the tens of thousands regardless of the size of your table of text. This matters because the damlev UDF will not use indexes; it will scan the entire table to which it's applied to compute a value for every row. Scanning just the unique words should be much faster. The other advantage is that damlev is applied at the word level, which seems to be what you are asking for. Another advantage is that you can expand the query to support searching on multiple words, and can rank the results by grouping the matching intersect rows on TextId and ranking on the count of matches.
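The word-table idea can be tried out in memory with the sample rows from the question. The Python sketch below rests on several assumptions. A dictionary stands in for the word and intersect tables. osa_distance is the optimal string alignment variant of Damerau-Levenshtein, used as a stand-in for the damlev UDF, whose exact behaviour is not shown in the thread. The threshold of 2 is chosen for the illustration rather than the 5 used above.

```python
from collections import defaultdict


def osa_distance(a, b):
    """Optimal string alignment distance, a restricted Damerau-Levenshtein."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def build_word_index(texts):
    """Map each unique lowercase word to the ids of the texts containing it."""
    index = defaultdict(set)
    for text_id, text in texts.items():
        for word in text.lower().split():
            index[word].add(text_id)
    return index


def fuzzy_lookup(index, term, max_distance):
    """Run the expensive distance function once per unique word only."""
    hits = set()
    for word, text_ids in index.items():
        if osa_distance(term.lower(), word) <= max_distance:
            hits |= text_ids
    return sorted(hits)


texts = {
    1: "let's find this document",
    2: "let's find this docment",
    3: "When the book is closed",
    4: "The dcument is locked",
}
index = build_word_index(texts)
print(fuzzy_lookup(index, "Document", 2))  # [1, 2, 4]
```

The distance function runs once per unique word instead of once per row of text, which is the saving described above.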