Inefficient LIKE query - mysql

I have an online search box that needs to look across many MySQL columns for a match, and it needs to handle multi-keyword searches.
Use cases:
I search for DP/101/R/23 (rego no)
I search for Johnty Winebottom (owner)
I search for Le Mans 1969 (mixed, history related keywords)
I get a lot of special characters, so fulltext doesn't always work. So I'm splitting the keyword input on spaces, then looping through the keywords and doing LIKE queries.
Simplified query that gets the point across (I've removed many columns):
SELECT   `cars`.`id`,
         `cars`.`car_id`,
         `cars`.`date_of_build`,
…..
FROM     (`cars`)
WHERE    (
                  `chassis_no` LIKE "DP/101/R/23"
         OR       `chassis_no` LIKE "DP/101/R/23 %"
         OR       `chassis_no` LIKE "% DP/101/R/23"
         OR       `chassis_no` LIKE "% DP/101/R/23 %"
         OR       `history` LIKE "DP/101/R/23"
         OR       `history` LIKE "DP/101/R/23 %"
         OR       `history` LIKE "% DP/101/R/23"
         OR       `history` LIKE "% DP/101/R/23 %"
….
In this case (registration number) the value is exact, so it matches the LIKE pattern without spaces on either side.
This works, but it is slow and feels wrong. Is there a more efficient way to do this?
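Since the four patterns per column follow a fixed scheme, they can be generated instead of written out by hand. The following is a minimal Python sketch, not the asker's code: it assumes a DB-API driver that uses %s placeholders (such as PyMySQL or mysqlclient), assumes every keyword must appear as a space-delimited word in at least one column, and uses hypothetical helper names (escape_like, build_where). It only tidies the code; it will not make the query faster, because patterns that start with % cannot use an index.

```python
from __future__ import annotations


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally (backslash is MySQL's default LIKE escape)."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_where(keywords: str, columns: list[str]) -> tuple[str, list[str]]:
    """Hypothetical helper: every keyword must appear as a space-delimited word
    in at least one column. Column names are trusted; keywords become parameters."""
    groups: list[str] = []
    params: list[str] = []
    for word in keywords.split():
        w = escape_like(word)
        ors = []
        for col in columns:
            # exact value, at the start, at the end, in the middle
            for pattern in (w, w + " %", "% " + w, "% " + w + " %"):
                ors.append(f"`{col}` LIKE %s")
                params.append(pattern)
        groups.append("(" + " OR ".join(ors) + ")")
    # No keywords at all: a condition that matches every row.
    return " AND ".join(groups) or "1=1", params
```

The returned fragment would be appended after WHERE and executed together with params, e.g. cursor.execute("SELECT `cars`.`id` FROM `cars` WHERE " + sql, params). Column names are interpolated into the SQL, so they should come from a hard-coded list, never from user input.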
EDIT: Using REGEXP appears to work and is actually a little faster:
`chassis_no` REGEXP "([ ]*)DP/101/R/23([ ]*)"
I'm not sure of a better way since fulltext fails on many of the special characters in my data.
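One thing worth checking about the REGEXP version: because ([ ]*) can also match zero spaces, the pattern matches DP/101/R/23 anywhere in the column, including inside a longer token, so it is not equivalent to the four LIKE patterns. A pattern that really requires a space or the start/end of the value on each side is (^| )DP/101/R/23( |$). The sketch below uses Python's re module only to illustrate the difference; MySQL's regex dialect differs in some details, and special characters in a keyword would need escaping, so treat it as an illustration rather than a statement of MySQL behavior.

```python
import re

keyword = "DP/101/R/23"
# Pattern from the question: ([ ]*) may match zero spaces, so it matches anywhere.
loose = re.compile(r"([ ]*)" + re.escape(keyword) + r"([ ]*)")
# Requires a space or the start/end of the value on each side of the keyword.
bounded = re.compile(r"(^| )" + re.escape(keyword) + r"( |$)")

for value in ["DP/101/R/23", "car DP/101/R/23 raced", "XDP/101/R/234"]:
    print(f"{value!r}: loose={bool(loose.search(value))} bounded={bool(bounded.search(value))}")
# Only the last value differs: loose=True, bounded=False
```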

Related

How to do fulltext search in multiple columns in MySQL, quickly?

I know this question has been asked several times, but let me explain.
I have a table with 450k records of users (id, first name, last name, address, phone number, etc.).
I want to search users by their first name and/or their last name.
I used these queries:
SELECT * FROM correspondants WHERE nom LIKE 'Renault%' AND prénom LIKE 'r%';
and
SELECT * FROM correspondants WHERE CONCAT(nom, CHAR(32), prénom) LIKE 'Renault r%';
It works, but it takes too long (1.5 s). This is my problem.
To fix it, I tried MATCH and AGAINST with a full-text index on both columns, 'nom' and 'prénom':
SELECT * FROM correspondants WHERE MATCH(nom, prénom) AGAINST('Renault r');
It's very quick (0.000 s), but the results are wrong: I don't get what I should.
For example, with LIKE the results are:
88623 RENAULT Rémy
91736 RENAULT Robin
202269 RENAULT Régine
(3 results).
And with MATCH/AGAINST :
327380 RENAULT Luc
1559 RENAULT Marina
17280 RENAULT Anne
(...)
88623 RENAULT Rémy
91736 RENAULT Robin
202269 RENAULT Régine
(...)
436696 SEZNEC-RENAULT Helene
(...)
(115 results!)
What is the best way to do a quick and efficient text search on both columns with an "AND" condition? (And what about indexes?)
Fulltext search doesn't do pattern-matching as LIKE string comparisons do. Fulltext search only searches for full words, not fragments like r%.
There is also a minimum word length, controlled by the ft_min_word_len configuration variable (innodb_ft_min_token_size for InnoDB tables). To keep the fulltext index from growing too large, words shorter than that are not indexed, and therefore short words are ignored when you search, so r is ignored.
Fulltext indexing also offers no way to search for words at a specific position, such as the beginning of a string. So your search for renault may match in the middle of the string (as in SEZNEC-RENAULT).
To solve these issues, you could do the following:
SELECT * FROM correspondants WHERE MATCH(nom, prénom) AGAINST('Renault')
AND CONCAT(nom, CHAR(32), prénom) LIKE 'Renault r%';
This would use the fulltext index to find a small subset of your 450,000 rows that have the word renault somewhere in the string. Then the second term in the search would be done without help from an index, but only against the subset of rows that match the first term.
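When the input comes straight from a search box, the application has to decide which words go to MATCH and what goes to LIKE. A minimal Python sketch of the query above, assuming a DB-API driver with %s placeholders and a minimum indexed word length of 4 (the default ft_min_word_len for MyISAM; InnoDB uses innodb_ft_min_token_size, default 3), so adjust MIN_WORD_LEN to your server. The function names are hypothetical.

```python
from __future__ import annotations

# Assumption: set this to your server's ft_min_word_len (MyISAM)
# or innodb_ft_min_token_size (InnoDB).
MIN_WORD_LEN = 4


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_name_search(user_input: str) -> tuple[str, list[str]]:
    """Hypothetical helper: narrow with MATCH, then check the exact prefix with LIKE."""
    words = user_input.split()
    # Words shorter than the minimum are not in the FULLTEXT index, so leave them out of MATCH.
    long_words = [w for w in words if len(w) >= MIN_WORD_LEN]
    like_pattern = escape_like(" ".join(words)) + "%"
    concat = "CONCAT(nom, CHAR(32), prénom) LIKE %s"
    if long_words:
        sql = ("SELECT * FROM correspondants "
               "WHERE MATCH(nom, prénom) AGAINST(%s) AND " + concat)
        return sql, [" ".join(long_words), like_pattern]
    # Nothing indexable: LIKE alone, which means a full table scan.
    return "SELECT * FROM correspondants WHERE " + concat, [like_pattern]
```

For "Renault r" this yields the query above with AGAINST('Renault') and LIKE 'Renault r%'; for input without any indexable word it falls back to LIKE alone.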
That particular query is best done this way:
INDEX(nom, prénom)
WHERE nom = 'Renault' AND prénom LIKE 'R%'
I recommend that you add that index and add code to your application to handle different requests in different ways.
Do not hide an indexed column inside a function call such as CONCAT(nom, ...); MySQL will not be able to use the index and will instead check every row, performing the CONCAT for every row and then doing the LIKE. Very slow.
Except for cases of initials (as above), you should mostly avoid very short names. However, here is another case where you can make it work with extra code:
WHERE nom = 'Lu'
(with the same index). Note that using any flavor of MATCH is likely to be much less efficient.
So, if you are given a full last name, use WHERE nom =. If you are given a prefix, then WHERE nom LIKE 'Prefix%' might work, etc.
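The advice to handle different requests in different ways could look roughly like this Python sketch. It assumes INDEX(nom, prénom) exists, a DB-API driver with %s placeholders, and that the caller already knows whether the last name is complete or only a prefix (the hypothetical is_prefix flag); how to detect that is up to your UI. Note that nom is never wrapped in a function, so the index remains usable.

```python
from __future__ import annotations


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_query(nom: str, prenom: str = "", is_prefix: bool = False) -> tuple[str, list[str]]:
    """Hypothetical helper: pick '=' or a prefix LIKE for nom, never a function call on it."""
    conditions: list[str] = []
    params: list[str] = []
    if is_prefix:
        conditions.append("nom LIKE %s")      # prefix range on the leading index column
        params.append(escape_like(nom) + "%")
    else:
        conditions.append("nom = %s")         # equality on the leading index column
        params.append(nom)
    if prenom:
        conditions.append("prénom LIKE %s")   # full first name or just an initial
        params.append(escape_like(prenom) + "%")
    return "SELECT * FROM correspondants WHERE " + " AND ".join(conditions), params
```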
FULLTEXT is best used for cases where you have full words scattered in longer text, which is not your case since you have nom and prénom split out.
Perhaps you should not use MATCH for anything in this schema.

MySQL dictionary query optimization

I have a dictionary query which I would like to optimize. Apparently the query takes too long, as the result page is slow to load. The query is as follows:
$var = $_GET['q'] ?? '';
$varup1 = strtoupper($var);
$varup = addslashes($varup1); // addslashes() is not a proper SQL escape; prepared statements are safer
$query1 = "select distinct $lang from $dict WHERE
UPPER($lang) LIKE trim('$varup')
or UPPER($lang) LIKE replace('$varup',' ','')
or replace($lang,'ß','ss') LIKE trim('$varup')
or replace($lang,'ss','ß') LIKE trim('$varup')
or replace($lang,'ence','ance') LIKE trim('$varup')
or replace($lang,'ance','ence') LIKE trim('$varup')
or UPPER($lang) like trim(trailing 'LY' from '$varup')
or UPPER($lang) like trim(trailing 'Y' from '$varup')
or UPPER($lang) like trim(trailing 'MENTE' from '$varup')
or UPPER($lang) like trim(trailing 'EMENT' from '$varup')
or UPPER($lang) like trim(trailing 'IN' from '$varup')";
The purpose is that a search string should also find different spellings of the same word, or the adverb form of an adjective.
The table looks like this: (two sample excerpts of the table were shown here; not reproduced)
For instance, "flawlessly" should also display "flawless", and "Fully" should also find "full" and vice versa.
"Feliz" should also find the entries for "Felizmente".
There are around twenty substitutes like the above which I eliminated as they do not make the question easier to understand.
The whole code is quite long and I wonder if I can make it smaller without losing functionality. Any ideas?
Where is the FROM clause in the query?
The REPLACE calls could be chained: REPLACE(REPLACE(..., 'a', 'b'), 'c', 'd'). Ditto for the TRIM calls.
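Chained REPLACE calls can be generated from a list of substitutions. A minimal Python sketch; the function name and the substitution list are illustrative, the column name is interpolated and must come from a trusted whitelist, and the literals are passed as %s parameters. One caveat: chaining applies every substitution to the same value, so opposite pairs such as ß→ss and ss→ß cannot go into the same chain; the separate OR conditions in the question do not have that problem.

```python
from __future__ import annotations


def chained_replace(column: str, pairs: list[tuple[str, str]]) -> tuple[str, list[str]]:
    """Hypothetical helper: build REPLACE(REPLACE(col, %s, %s), %s, %s) plus its parameters."""
    expr, params = column, []
    for old, new in pairs:
        expr = f"REPLACE({expr}, %s, %s)"   # wrap the previous expression
        params += [old, new]
    return expr, params


expr, params = chained_replace("english", [("ß", "ss"), ("ence", "ance")])
print(expr)    # REPLACE(REPLACE(english, %s, %s), %s, %s)
print(params)  # ['ß', 'ss', 'ence', 'ance']
```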
As already mentioned, a suitable COLLATION eliminates all need for UPPER() and LOWER(). Avoid the "general" collations (such as utf8mb4_general_ci), and you will get ss=ß. Many, but not all, collations also treat ij=ĳ and/or oe=œ and/or Aa=Å (etc.); do you need those, too? Here is a rundown of most situations: http://mysql.rjweb.org/utf8_collations.html
Using a FULLTEXT index will take care of most of the endings you are testing for, thereby obviating most of your code.
You show multiple words in the second column. Is this simply for display? If you need to pick apart the words, then you have other nasty challenges.
This, alone, will speed up the query something like 10-fold:
WHERE english LIKE 'ha%'
AND ... (whatever else you have)
That is, filter on the first 2 letters with something that can use INDEX(english), specifically LIKE 'ha%' for the word hate. Since you seem to be using PHP, there should be no difficulty building this into the query.
Here's another thought on my substr($word, 0, 2)... Instead of specifically using 2, see if floor(strlen($word)/2) works well enough. Then 'flawlessly' would be tested with LIKE 'flawl%', which would run even more than 10-fold faster.
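The prefix pre-filter can be computed in the application. This Python sketch uses half the word's length, as with floor(strlen($word)/2), but never fewer than 2 characters; the minimum of 2, the column name english, and the function name are assumptions. The remaining OR conditions from the question would be ANDed after this condition.

```python
from __future__ import annotations


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def prefix_filter(word: str, min_len: int = 2) -> tuple[str, str]:
    """Hypothetical helper: an index-friendly condition on `english` plus its parameter."""
    n = max(min_len, len(word) // 2)   # like floor(strlen($word)/2), but at least min_len
    return "english LIKE %s", escape_like(word[:n]) + "%"


cond, pattern = prefix_filter("flawlessly")
print(cond, pattern)  # english LIKE %s flawl%
# The full query would then be roughly:
#   SELECT DISTINCT english FROM <dictionary table> WHERE english LIKE %s AND (<the OR conditions>)
```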
But there is another issue. Are you chopping both the word in the table and the word given? Try to avoid chopping the word in the table. To discuss this further, please provide the table entries for 'flaw', 'flaws', 'flawless', 'flawlessly', etc. I can't quite tell whether you need to get all the way down to 'flaw', or whether you have separate rows for the various forms.
Beware of some very short words with odd forms. Perhaps you need to add extra entries to avoid making the SQL query too messy. These change the second letter: "LIE" and "LYING". It seems there is even a common word that changes the first letter.

Regular Expression Error in MySQL Query

I'm trying to search through a database of software titles for those that have an interior capital letter (e.g. PowerPoint, inCase).
I tried
select * from table where field REGEXP '^([a-z][A-Z]+)+$'
This seemed to work, as it returned a subset of the table and most results were correct, but a fair number were not (e.g. Alias). Clearly it is doing something right, but I'm not sure what; could it be that the ASCII is somehow messed up?
Try this as your RegEx pattern:
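^[A-Za-z]+[A-Z][A-Za-z]+$
It will match all the examples above (PowerPoint, inCase) and not match 'Alias', one of the examples you are having trouble with, provided the comparison is case-sensitive. With a case-insensitive collation, MySQL's REGEXP ignores case, so [A-Z] also matches lowercase letters; that, rather than ASCII, is why 'Alias' matched your pattern. To force a case-sensitive match, compare against a binary string before MySQL 8.0 (field REGEXP BINARY '...'), or in MySQL 8.0 use REGEXP_LIKE(field, '...', 'c'). Also prefer [A-Za-z] to [A-z]: in ASCII the range A-z also includes [ \ ] ^ _ and `.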
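A quick way to see why case sensitivity matters is to try the pattern with Python's re module, used here only as a stand-in for MySQL's REGEXP (the two agree on these simple character classes). The pattern uses [A-Za-z] instead of [A-z]. re.IGNORECASE imitates a case-insensitive collation: [A-Z] then also matches lowercase letters, so 'Alias' (and even 'Excel') match. The title list is made up.

```python
import re

pattern = r"^[A-Za-z]+[A-Z][A-Za-z]+$"   # letters, an interior capital, more letters
titles = ["PowerPoint", "inCase", "Alias", "Excel"]

case_sensitive = [t for t in titles if re.search(pattern, t)]
# IGNORECASE plays the role of a case-insensitive collation.
case_insensitive = [t for t in titles if re.search(pattern, t, re.IGNORECASE)]

print(case_sensitive)    # ['PowerPoint', 'inCase']
print(case_insensitive)  # ['PowerPoint', 'inCase', 'Alias', 'Excel']
```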

How to use prefix wildcards like '*abc' with match-against

I have the following query:
SELECT * FROM `user`
WHERE MATCH (user_login) AGAINST ('supriya*' IN BOOLEAN MODE)
Which outputs all the records starting with 'supriya'.
Now I want something that will find all the records ending with, e.g., 'abc'.
I know that * cannot be prepended (it doesn't work), and I have searched a lot but couldn't find anything about this.
If I give the query the string priya, it should return all records ending with priya.
How do I do this?
MATCH doesn't work with leading wildcards, so matching with *abc won't work. You will have to use LIKE to achieve this:
SELECT * FROM user WHERE user_login LIKE '%abc';
This will be very slow, however, because a pattern that begins with % cannot use an index, so MySQL has to check every row.
If you really need to match the end of the string, and you have to do this often while the performance is killing you, one solution is to create a separate column in which you store the reversed strings, so you have:
user_login user_login_rev
xyzabc cbazyx
Then, instead of looking for '%abc', you can look for 'cba%' which is much faster if the column is indexed. And you can again use MATCH if you like to search for 'cba*'. You will just have to reverse the search string as well.
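A runnable sketch of the reversed-column idea, using Python's built-in sqlite3 module as a stand-in for MySQL. In MySQL the reversed value could be filled with the REVERSE() string function (or, from 5.7, kept in a generated column with its own index); here it is simply computed in Python on insert. The table contents and helper names are made up for the example.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally (used with ESCAPE '\\' below)."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_login TEXT, user_login_rev TEXT)")
conn.execute("CREATE INDEX idx_rev ON user (user_login_rev)")
for login in ["xyzabc", "supriya", "priya", "priyanka"]:
    # Store the reversed login next to the original.
    conn.execute("INSERT INTO user VALUES (?, ?)", (login, login[::-1]))


def logins_ending_with(suffix: str) -> list:
    """'%abc' on user_login becomes the prefix search 'cba%' on user_login_rev."""
    pattern = escape_like(suffix[::-1]) + "%"
    rows = conn.execute(
        "SELECT user_login FROM user WHERE user_login_rev LIKE ? ESCAPE '\\' "
        "ORDER BY user_login",
        (pattern,),
    )
    return [r[0] for r in rows]


print(logins_ending_with("priya"))  # ['priya', 'supriya']
```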
I believe full-text searching isn't relevant here. If you are interested in searching some fields based on wildcards like:
%word% ( word anywhere in the string)
word% ( starting with word)
%word ( ending with word)
The best option is to use a LIKE clause, as GolezTrol has mentioned.
However, if you are interested in advanced/text based searching, FULL-TEXT search is the option.
Limitations with LIKE:
There are some limitations with this clause. Suppose you use something like 'good%' (anything starting with good). It may return irrelevant results such as goods or goody.
So make sure you understand what you are doing and what is required.

Punctuation insensitive search in mySQL

I have a database of phrases that users will search for from their own input. I want them to find the phrase regardless of what punctuation they use. For example if the phrase, "Hey, how are you?" is in the row, I want all of the following searches to return it:
"Hey! How are you?!"
"Hey how are you?"
"Hey :) How are you?"
Right now, I have the columns 'phrase' and 'phrase_search'. phrase_search is a stripped-down version of phrase, so our example would be 'hey-how-are-you'.
Is there any way to achieve this without storing the phrase twice?
Thank you!
-Nicky
What you've done is probably the most time-efficient way of doing it. Yes, it requires double the space, but is that an issue?
If it is an issue, a possible solution would be to convert your search string to use wildcards (e.g. %Hey%how%are%you%) and then filter the SQL results in your code, applying the same stripping function to both the database value and the search string and comparing them. The rationale is that relatively few rows should have other non-punctuation characters between the words, so you're still getting MySQL to do the "heavy lifting" while your PHP/Perl/Python/whatever code does a more fine-grained check on a relatively small number of rows.
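A minimal sketch of this approach in Python, using the built-in sqlite3 module as a stand-in for MySQL (SQLite's LIKE is case-insensitive for ASCII, much like MySQL with a case-insensitive collation). The helper names and sample rows are made up, and "words" are assumed to be ASCII letters and digits; accented letters would need a broader pattern.

```python
import re
import sqlite3


def words(text: str) -> list:
    """Lowercase the text and keep only runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(text: str) -> str:
    """The same stripping as phrase_search: 'Hey, how are you?' -> 'hey-how-are-you'."""
    return "-".join(words(text))


def search(conn, user_input: str) -> list:
    ws = words(user_input)
    if not ws:
        return []
    pattern = "%" + "%".join(ws) + "%"   # e.g. %hey%how%are%you%
    # The database does the coarse filtering...
    candidates = [r[0] for r in conn.execute(
        "SELECT phrase FROM phrases WHERE phrase LIKE ?", (pattern,))]
    # ...and the application does the exact, punctuation-insensitive check.
    target = normalize(user_input)
    return [p for p in candidates if normalize(p) == target]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (phrase TEXT)")
conn.executemany("INSERT INTO phrases VALUES (?)",
                 [("Hey, how are you?",), ("Hey, how many are you?",)])
print(search(conn, "Hey :) How are you?"))  # ['Hey, how are you?']
```

The second sample row passes the LIKE filter but is rejected by the comparison in code, which is exactly the split of work described above.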
(This assumes that you have some code calling this, rather than a user typing the SQL query from the command line, of course.)