What field structure would be better for a definition table in MySQL? - mysql

I'm making a dictionary webapp. The user will search for words. Would it be faster to do this?
SELECT * from definition WHERE word LIKE "house";
or...
SELECT * from definition WHERE word_hash LIKE md5("house");
In the second example, I store the md5() value of words in the word_hash field. Of course, "word" and "word_hash" are indexes.
Update: sometimes, the word field could be more than 1 word. Example: Sacré Bleu

Skipping LIKE completely would be faster. Add a lowercase version of word as word_lc, index word_lc, and then do:
select * from definition where word_lc = lower(word_you_want)
Using LIKE without any % or _ wildcards is just a case-insensitive equality test, so you should go straight to a case-insensitive comparison that can and will take advantage of an index. Also, as usual, say what you mean so the computer can do what you want it to do.
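To make the word_lc idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; the SQL is the same shape). The meaning column, the index name, the helper names and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE definition (word TEXT, word_lc TEXT, meaning TEXT)")
# Index the lowercase column so the equality lookup can use it.
conn.execute("CREATE INDEX idx_definition_word_lc ON definition (word_lc)")


def add_definition(word, meaning):
    """Store the word as typed plus a lowercase copy for lookups."""
    conn.execute(
        "INSERT INTO definition (word, word_lc, meaning) VALUES (?, ?, ?)",
        (word, word.lower(), meaning),
    )


def find_definition(word_you_want):
    """Plain equality on word_lc; no LIKE and no hashing involved."""
    cur = conn.execute(
        "SELECT word, meaning FROM definition WHERE word_lc = ?",
        (word_you_want.lower(),),
    )
    return cur.fetchall()


# Sample rows (hypothetical data).
add_definition("House", "a building for people to live in")
add_definition("Sacré Bleu", "an exclamation of surprise")
print(find_definition("HOUSE"))
```

Multi-word entries such as Sacré Bleu need no special handling here, since the whole string is lowercased and compared as one value.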

Related

use full text search to search incomplete words in mysql

I am making a library management system.
I have a problem searching for a book in a MySQL database.
We use full-text search to search the data in MySQL.
But it only works if a full word is given. If the user enters an incomplete word instead of the full word, is there any function to search with it?
Example: if the book name is calculus,
and the user types calc, the book should still be returned.
You can try using fulltext search with boolean mode, which allows a few extra operators. You will be interested in the truncation operator (*):
The asterisk serves as the truncation (or wildcard) operator. Unlike
the other operators, it is appended to the word to be affected. Words
match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not
stripped from a boolean query, even if it is too short or a stopword.
Whether a word is too short is determined from the
innodb_ft_min_token_size setting for InnoDB tables, or ft_min_word_len
for MyISAM tables. These options are not applicable to FULLTEXT
indexes that use the ngram parser.
The wildcarded word is considered as a prefix that must be present at
the start of one or more words. If the minimum word length is 4, a
search for '+word +the*' could return fewer rows than a search for
'+word +the', because the second query ignores the too-short search
term the.
Please note that you cannot start an expression with the * operator, so the results can only include books whose title has a word starting with 'calc', not books where 'calc' appears in the middle of a word.
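As a rough sketch of how an application might build such a boolean-mode search string from what the user typed (the helper name, the books table and the title column are hypothetical, and the %s placeholders assume a typical Python MySQL driver):

```python
import re


def to_prefix_query(user_input):
    """Turn 'calc' into '+calc*': each word is required and prefix-matched.

    Only runs of word characters are kept, so boolean operators typed by
    the user (+, -, ", *, ...) do not end up in the search string.
    """
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + word + "*" for word in words)


# Hypothetical table and column names; pass the string as a bound parameter.
sql = "SELECT * FROM books WHERE MATCH(title) AGAINST (%s IN BOOLEAN MODE)"
params = (to_prefix_query("calc"),)
print(sql, params)
```

Whether very short prefixes behave as hoped still depends on the minimum token size settings quoted above.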
You can use the LIKE operator with the "%" wildcard.
With LIKE you can use the following two wildcard characters in the pattern:
% matches any number of characters, even zero characters.
_ matches exactly one character.
For example:
SELECT * FROM <Table> where book like "%calc%";
http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually work.
Does it simply start from first character of the string and try matching pattern, one character moving to the right? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the right most character and starts matching, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters at the front lets your DBMS use a more efficient searching/seeking algorithm. But even in the simplest form, the DBMS has to test characters. If it can tell early on that a value doesn't match, it can discard it and move on to the next one.
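The difference can be sketched outside the database. Below, a sorted Python list stands in for an index (a simplification, real indexes are B-trees): a constant prefix lets us binary-search straight to the matching range, while a leading wildcard forces a test of every value. Function names and sample values are made up.

```python
import bisect


def prefix_range(sorted_values, prefix):
    """LIKE 'prefix%': two binary searches bound the matching slice."""
    lo = bisect.bisect_left(sorted_values, prefix)
    # Upper bound: the prefix followed by the largest possible code point.
    hi = bisect.bisect_left(sorted_values, prefix + chr(0x10FFFF))
    return sorted_values[lo:hi]


def suffix_scan(values, suffix):
    """LIKE '%suffix': sort order does not help, so test every value."""
    return [value for value in values if value.endswith(suffix)]


names = sorted(["abc", "xyz", "xyzzy", "xylophone", "wxyz", "zebra"])
print(prefix_range(names, "xyz"))
print(suffix_scan(names, "xyz"))
```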
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey' optionally followed by any number of characters. Whether the match is case sensitive or accent sensitive depends on the collation, not on the % itself. You can use multiple % wildcards, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' in them (and, under a case-insensitive and accent-insensitive collation, 'Mouse' or 'mousé' as well).
The % is inclusive (it also matches zero characters), meaning that
WHERE name LIKE '%A%'
will return all values that start with 'A', contain 'A', or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
This finds any value beginning with a letter between c and f, inclusive, that is, any value that starts with c, d, e or f. The [] syntax is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with a '-' (dash) in the 3rd position, use two underscores:
WHERE name LIKE '__-%'
It will return for example: 'Lo-Res'
To find values whose names end in 'xyz', use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
To find a % sign in a name, use brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
To search for [, put brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation's sort order determines both case sensitivity and the sort order for the range of characters. You can optionally use COLLATE to specify the collation sort order used by the LIKE operator.
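To experiment with these patterns without a database, here is a small sketch that translates a T-SQL style LIKE pattern into a Python regular expression. It is an approximation for illustration only: it covers %, _, [ ] and [^ ], ignores the ESCAPE clause, and uses re.IGNORECASE to imitate a case-insensitive collation (accent insensitivity is not modelled). The function names are made up.

```python
import re


def like_to_regex(pattern):
    """Translate a T-SQL style LIKE pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            out.append(".*")          # any run of characters, even empty
        elif ch == "_":
            out.append(".")           # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            negate = body.startswith("^")
            members = body[1:] if negate else body
            if not members:
                out.append(re.escape(ch))   # unmatched or empty bracket: literal '['
            else:
                # Escape characters that are special inside a regex class.
                escaped = "".join("\\" + c if c in "\\[]^" else c for c in members)
                out.append("[" + ("^" if negate else "") + escaped + "]")
                i = end               # jump to the closing bracket
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)


def like_match(pattern, value):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_match("_at%", "Batman"), like_match("%[[]%", "New York [NY]"))
```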
Usually the main performance bottleneck is I/O. The efficiency of the LIKE operator only matters if your whole table fits in memory; otherwise I/O will take most of the time.
AFAIK Oracle can use indexes for prefix matching (LIKE 'abc%'), but these indexes cannot be used for more complex expressions.
Anyway, if you only have this kind of query, you should consider using a simple index on the related column. (This is probably true for other RDBMSs as well.)
Otherwise the LIKE operator is generally slow, but most RDBMSs have some kind of full-text search solution. I think the main reason for the slowness is that LIKE is too general. Full-text indexes usually have lots of options that tell the database what you really want to search for, and with this additional information the DB can do its task more efficiently.
As a rule of thumb, if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full-text search solution. If the real goal is not text searching but the search is some kind of "design side effect" (for example XML/JSON/statuses stored in a field as text), then you should probably consider a more efficient way of storing the data (if there is one).

How to use prefix wildcards like '*abc' with match-against

I have the following query :
SELECT * FROM `user`
WHERE MATCH (user_login) AGAINST ('supriya*' IN BOOLEAN MODE)
Which outputs all the records starting with 'supriya'.
Now I want something that will find all the records ending with e.g. 'abc'.
I know that * cannot be prepended (it doesn't work either), and I have searched a lot but couldn't find anything regarding this.
If I give the query the string priya, it should return all records ending with priya.
How do I do this?
Match doesn't work with starting wildcards, so matching with *abc* won't work. You will have to use LIKE to achieve this:
SELECT * FROM user WHERE user_login LIKE '%abc';
This will be very slow however.
If you really need to match the end of the string, and you have to do this often while the performance is killing you, a solution would be to create a separate column in which you store the reversed strings, so you get:
user_login    user_login_rev
xyzabc        cbazyx
Then, instead of looking for '%abc', you can look for 'cba%' which is much faster if the column is indexed. And you can again use MATCH if you like to search for 'cba*'. You will just have to reverse the search string as well.
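A minimal sketch of the reversed-column idea, again using Python's sqlite3 module as a stand-in for MySQL so it can be run as-is. The table is called users here (a stand-in for the user table in the question), and the index name, helper names and sample logins are made up. Note that % or _ inside the suffix are not escaped in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_login TEXT, user_login_rev TEXT)")
# The reversed column is the one that gets indexed and searched.
conn.execute("CREATE INDEX idx_users_login_rev ON users (user_login_rev)")


def add_user(login):
    """Store the login together with its reversed copy."""
    conn.execute("INSERT INTO users VALUES (?, ?)", (login, login[::-1]))


def find_by_suffix(suffix):
    """'%abc' on user_login becomes 'cba%' on user_login_rev."""
    pattern = suffix[::-1] + "%"
    cur = conn.execute(
        "SELECT user_login FROM users "
        "WHERE user_login_rev LIKE ? ORDER BY user_login",
        (pattern,),
    )
    return [row[0] for row in cur]


for login in ["xyzabc", "supriya", "priya", "priyanka"]:
    add_user(login)
print(find_by_suffix("priya"))
```

The reversed copy has to be kept in sync whenever user_login changes, for example in the application code or with a trigger.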
I believe the selection of FULL-TEXT Searching isn't relevant here. If you are interested in searching some fields based on wildcards like:
%word% ( word anywhere in the string)
word% ( starting with word)
%word ( ending with word)
best option is to use LIKE clause as GolezTrol has mentioned.
However, if you are interested in advanced/text based searching, FULL-TEXT search is the option.
Limitations with LIKE:
There are some limitations with this clause. Suppose you use something like 'good%' (anything starting with good). It may return irrelevant results like goods, goody.
So make sure you understand what you are doing and what is required.

mysql boolean mode fulltext search with wildcards and literals

I'm pretty new to MySQL full-text searches and I ran into this problem today:
My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.
When I execute this query the record is not found:
SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.
When I search for the whole term "e-magazine AG", the record is found.
Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.
This clause
MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);
will search for "e" AND NOT "magazi"; i.e. the - inside "e-magazi" will be interpreted as a NOT operator even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to apply an extra having clause with a LIKE.
I know this having is slow, but it will only be applied to the results of the match, so not too many rows should be involved.
I suggest something like:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('magazine' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';
MySQL fulltext treats the word e-magazine in a text as a phrase and not as a word. Because of that it results in the two words e and magazine. While it builds the search index, it does not add the e to the index because of ft_min_word_len (default is 4 chars).
The same length limitation is applied to the search query. That is why a search for e-magazine returns exactly the same results as a-magazine: the a and the - are fully ignored.
But now you want to find the exact phrase e-magazine. For that you use quotes, and that is the completely correct way to find phrases, but MySQL does not support operators for phrases, only for words:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
With this modifier, certain characters have special meaning at the beginning or end of words in the search string
Some people would suggest to use the following query:
SELECT id, name
FROM company
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';
As I said, MySQL ignores the e- and searches for the wildcard word magazi*. After those results are obtained, it uses HAVING to additionally filter the results for e-magazi*, including the e-. That way you will find the phrase e-magazine AG. Of course HAVING is only needed if the search phrase contains the wildcard operator, and you should never use quotes. This operator is used by your user and not you!
Note: As long as you do not surround the search phrase with % it will find only fields that start with that word. And you do not want to surround it, because it would find bee-magazine as well. So maybe you need an additional OR HAVING name LIKE ' %e-magazi%' OR HAVING NAME LIKE '\\n%e-magazi%' to make it usable inside of texts.
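For a search-as-you-type box, the two parameters of the query above could be built like this. This is only a sketch: the helper names are hypothetical, the %s placeholders assume a typical Python MySQL driver, and the escaping assumes MySQL's default LIKE escape character (backslash), so that a % or _ typed by the user is matched literally.

```python
def escape_like(term, escape_char="\\"):
    """Escape %, _ and the escape character itself for use in LIKE."""
    out = []
    for ch in term:
        if ch in (escape_char, "%", "_"):
            out.append(escape_char)
        out.append(ch)
    return "".join(out)


def search_as_you_type(term):
    """Return the SQL text and the two bound parameters for a typed prefix."""
    sql = (
        "SELECT id, name FROM company "
        "WHERE MATCH(name) AGAINST (%s IN BOOLEAN MODE) "
        "HAVING name LIKE %s"
    )
    # Full-text part narrows the rows; LIKE part re-checks the literal prefix.
    return sql, (term + "*", escape_like(term) + "%")


sql, params = search_as_you_type("e-magazi")
print(params)
```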
Trick
But finally I prefer a trick so HAVING isn't needed at all:
When you add texts to your database table, also add them to a separate fulltext-indexed column in which words like up-to-date are replaced with up-to-date uptodate.
If a user searches for up-to-date, replace it in the query with uptodate.
That way you can still find specific in user-specific, but up-to-date as well (and not only date).
Bonus
If a user searches for -well-known huge ports, MySQL treats that as: must not include well, could include known and huge. Of course you could solve that with another extra query variant as well, but with the trick above you remove the hyphen, so the search query simply looks like this:
SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)
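The trick can be sketched as two small helpers, one applied to text before it is stored in the extra fulltext column and one applied to the user's query. The function names and the regular expression are illustrative assumptions; here a "hyphenated word" simply means runs of word characters joined by single hyphens.

```python
import re

# Two or more runs of word characters joined by single hyphens.
HYPHENATED = re.compile(r"\w+(?:-\w+)+")


def expand_for_index(text):
    """'up-to-date' becomes 'up-to-date uptodate' in the indexed copy."""
    return HYPHENATED.sub(
        lambda m: m.group(0) + " " + m.group(0).replace("-", ""), text
    )


def normalize_query(query):
    """'up-to-date' becomes 'uptodate'; a leading '-' operator is kept."""
    return HYPHENATED.sub(lambda m: m.group(0).replace("-", ""), query)


print(expand_for_index("an up-to-date list"))
print(normalize_query("-well-known huge ports"))
```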

mysql fulltext MATCH,AGAINST returning 0 results

I am trying to follow: http://dev.mysql.com/doc/refman/4.1/en/fulltext-natural-language.html
in an attempt to improve search queries, both in speed and the ability to order by score.
However, when using this SQL ("skitt" is used as a search term just so I can try to match Skittles):
SELECT
id,name,description,price,image,
MATCH (name,description)
AGAINST ('skitt')
AS score
FROM
products
WHERE
MATCH (name,description)
AGAINST ('skitt')
it returns 0 results. I am trying to find out why. I think I might have set my indexes up wrong, but I'm not sure; this is the first time I've strayed away from LIKE!
Here is my table structure and data:
Thank you!
By default certain words are excluded from the search. These are called stopwords. "a" is an example of a stopword. You could test your query by using a word that is not a stopword, or you can disable stopwords:
How can I write full search index query which will not consider any stopwords?
If you want to also match prefixes use the truncation operator in boolean mode:
*
The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.