MySQL search query optimization: Match...Against vs %LIKE%

I wanted to build my first real search function. I've been Googling a while, but wasn't able to really make my mind up and understand everything.
My database consists of three InnoDB tables:
Products: Contains the product information. Columns: proID (primary, auto-increment), content (contains up to a few hundred words), title, author, year, and a bunch of others that are not related to the search query. Rows: 100 to 2000.
Categories: Contains category information: Columns: catID (primary, auto-increment), catName. Rows: 5-30
Productscategories: Link between the two above. Each product can be related to multiple categories. Columns: pcID (primary, auto-increment), catID, proID. Rows: 1-5 times amount of products.
My search function offers the following things. They do not have to be filled in. If more than one is filled in, the final query will connect them with the AND-query:
Terms: Searches the content and title fields. Multiple words can be entered, and each of them is searched for separately. Most likely a single match with the database should be enough for a hit (OR-query).
Year: Searches on the year-column of products.
Category: Selectable from a list of categories. Multiple possible. The form returns the catID's of the chosen categories. 1 match with the database should be enough for a hit (OR-query)
Author: Searches on the author-column of products
As you may have noticed, when a category is selected, the tables products and productscategories are joined together for the search query. There is also a foreign key set between the two.
To clarify the relations, here is an example of how it should be interpreted (no search on the year!):
Search WHERE (products.content = term 1 OR products.content = term 2 OR products.title = term 1 OR products.title = term 2 ......) AND (products.author = author) AND (productscategories.catID = catID1 OR productscategories.catID= catID2 ......)
Also note that I created a pagination system that only shows 10 results on each 'page'.
The question I am stuck with is the following: I wish to optimize this search query, but can't figure out which way is the best.
Most examples I found while Googling used a LIKE '%...%' query. However, some used MATCH...AGAINST. I really like the latter because I read it can sort on relevance and because it seems to make the query a lot easier to build (one MATCH against the term values instead of plenty of LIKE conditions combined with OR). It seems I would only use it for the Terms search field though. However, for MATCH...AGAINST I will need a MyISAM table (right?), in which I can't use the foreign key to prevent faults in the database.
MATCH...AGAINST example (without year field, category field and not joining products and productscategories):
SELECT *, MATCH (content,title) AGAINST ('search terms' IN BOOLEAN MODE) AS relevance
FROM products WHERE (MATCH (content,title) AGAINST ('search terms' IN BOOLEAN MODE)) AND
author='author' ORDER BY relevance DESC LIMIT 10
%LIKE% example(without year field, category field and not joining products and productscategories) and sadly no relevance sorting:
SELECT * FROM products WHERE
(content LIKE '%term1%' OR content LIKE '%term2%' OR title LIKE '%term1%' OR title LIKE '%term2%')
AND (author='author') ORDER BY title LIMIT 10
I could make a relevance sorting by using the CASE and add 'points' if a term comes in the title or the content? Or would that make the query too heavy for performance?
So what is the best way to make this kind of query? Go with InnoDB and LIKE, or switch to MyISAM and use MATCH...AGAINST for sorting?

You don't have to switch to MyISAM. InnoDB supports fulltext indexing in MySQL 5.6 and higher.
I usually recommend using fulltext indexes. Create a fulltext index on the text columns you search together, content and title in your case. A FULLTEXT index can only contain CHAR, VARCHAR or TEXT columns, so if year is a numeric column it needs a regular index instead.
Then you can run one fulltext query against both columns at the same time, and apply IN BOOLEAN MODE to really narrow your searches. This is of course something you have to decide for yourself, but fulltext gives you more options.
However, if you are running queries that span a range (a date, for instance) or compare against a simple string, then a standard index is better. For text searching across different columns, a fulltext index is the way to go!
Read this: http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html
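As a rough illustration of how the optional search fields from the question could be combined into one query, here is a minimal Python sketch. It only builds the SQL text and the parameter list and does not talk to a database. The function name build_search_query is made up for this example, the %s placeholder style is the one used by common Python MySQL drivers, and it assumes a FULLTEXT index exists on products(content, title).

```python
def build_search_query(terms=None, author=None, year=None, cat_ids=None,
                       page=1, per_page=10):
    """Return (sql, params) for the optional search fields.

    Hypothetical helper: assumes a FULLTEXT index on products(content, title).
    """
    select_cols = ["p.*"]
    select_params = []
    where = []
    where_params = []

    if terms:
        # In boolean mode, words without a +/- operator are optional,
        # so a row matching any one term is a hit (the OR behaviour).
        match = "MATCH (p.content, p.title) AGAINST (%s IN BOOLEAN MODE)"
        select_cols.append(match + " AS relevance")
        select_params.append(terms)
        where.append(match)
        where_params.append(terms)
    if author:
        where.append("p.author = %s")
        where_params.append(author)
    if year is not None:
        where.append("p.year = %s")
        where_params.append(year)
    if cat_ids:
        # EXISTS avoids duplicate product rows when a product belongs
        # to more than one of the selected categories.
        marks = ", ".join(["%s"] * len(cat_ids))
        where.append(
            "EXISTS (SELECT 1 FROM productscategories pc "
            "WHERE pc.proID = p.proID AND pc.catID IN (" + marks + "))"
        )
        where_params.extend(cat_ids)

    sql = "SELECT " + ", ".join(select_cols) + " FROM products p"
    if where:
        sql += " WHERE " + " AND ".join(where)
    # Sort by relevance only when a fulltext search was requested.
    sql += " ORDER BY " + ("relevance DESC" if terms else "p.title")
    sql += " LIMIT %s OFFSET %s"
    params = select_params + where_params + [per_page, (page - 1) * per_page]
    return sql, params


sql, params = build_search_query(terms="red wine", author="Smith",
                                 cat_ids=[3, 7], page=2)
print(sql)
print(params)
```

The category filter is written as an EXISTS subquery rather than a join so that the 10-rows-per-page pagination is not thrown off by duplicated products; a join with DISTINCT would be another option.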

Related

LIKE is faster than FULLTEXT search in MySQL

I have a table called documents that has around 30 columns and around 3.5 million rows, at a size of about 10GB. The most important columns are:
system_id, archive_id, content, barcodes, status and notes.
As you can see, this is a multi-tenant application where each tenant is a system and is referenced through system_id.
I have 2 indexes on this table. The first one is a BTREE index on the columns system_id, archive_id and status.
The other one is a FULLTEXT index containing the columns content, barcodes and notes.
I have two different tenants that I want to highlight. The first one (Customer A) has system_id = 1 and has, say, 1000 records in the documents table. The second one (Customer B) has system_id = 2 and, say, 400 000 records in this table.
The LIKE query for Customer A is:
SELECT *
FROM documents
WHERE system_id = 1 AND
CONCAT_WS(' ',content,barcodes,notes) LIKE '%office%' AND
status = 100
The above query will run in about 0.02 seconds. If I run a similar query but with the FULLTEXT search like
SELECT *
FROM documents
WHERE system_id = 1 AND
MATCH(content,barcodes,notes) AGAINST ('office' IN BOOLEAN MODE) AND
status = 100
This operation takes around 4 seconds?! I have read that a FULLTEXT search should be a lot quicker than LIKE.
If I run the same queries for Customer B (which has 400 000 records in the documents table), the LIKE search is a little bit slower than FULLTEXT, but not by much.
What can the reason for this be?
Should I go with LIKE or FULLTEXT search in the above situation (8GB RAM database server)?
I'm a little bit confused about why my queries with FULLTEXT search are taking so long. The text in content is probably not just words that a normal person would use, because it's OCR-read from the documents, so there will be a lot of different words that might blow up the index?
The EXPLAINs will show that the fast query is using your index on system_id and status, not the LIKE. It was fast, not because of LIKE, but because of that filtering.
And the slow query decided to use the FULLTEXT index because the Optimizer is too dumb to realize that lots of rows contain "office".
LIKE, especially in conjunction with CONCAT_WS is not faster than FULLTEXT.
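To see the effect described in this answer without a MySQL server at hand, here is a small sketch that uses SQLite (through Python's sqlite3 module) as a stand-in. It is an analogy only: MySQL's optimizer and its EXPLAIN output differ, SQLite has no FULLTEXT index of this kind, and the index name idx_tenant is invented. The point it illustrates is that the B-tree index on system_id narrows the rows first and the LIKE is just a filter applied to the rows that remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE documents (
    id INTEGER PRIMARY KEY, system_id INTEGER, archive_id INTEGER,
    status INTEGER, content TEXT, barcodes TEXT, notes TEXT)""")
# Same column order as the BTREE index in the question.
conn.execute(
    "CREATE INDEX idx_tenant ON documents (system_id, archive_id, status)")

# '||' concatenation stands in for CONCAT_WS(' ', ...).
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM documents
    WHERE system_id = 1
      AND content || ' ' || barcodes || ' ' || notes LIKE '%office%'
      AND status = 100""").fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The plan reports a search on idx_tenant keyed by system_id, with no mention of the LIKE: the pattern match is evaluated row by row on whatever the index lets through. With 1000 rows for Customer A that is cheap, with 400 000 rows for Customer B it is not.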

Is it a good practice to do full text search for all kinds of fields?

I am creating a search functionality for my website and I've learned that MySQL has full-text search capability (for InnoDB tables as of version 5.6).
I did some research and some people claim that full-text search is slower than LIKE search (which makes sense to me). I was doing LIKE search and the results are not awesome. However, with the full-text search I got much better results.
For columns which have a lot of text, let's say content with more than 5K words, it makes sense to do a full-text search. However, does it make sense to do a full-text search on columns such as first_name and last_name, which do not have much text?
When I do a LIKE search on those columns (first_name, last_name), I search for every single word in all columns. Let's say the search query is "ma meq". I search for "ma" in both first_name and last_name, and do the same for "meq". The thing is, relevance is much better with the full-text search.
I've created such a query:
select member_id FROM Members WHERE MATCH(Members.value) AGAINST ( 'ma meq*' IN BOOLEAN MODE) AND FIELD_ID IN(3,4)
This query gives me better results than like equivalent:
select member_id FROM Members WHERE (value LIKE '%ma%') AND field_id = 3
select member_id FROM Members WHERE (value LIKE '%meq%') AND field_id = 3
select member_id FROM Members WHERE (value LIKE '%ma%') AND field_id = 4
select member_id FROM Members WHERE (value LIKE '%meq%') AND field_id = 4
What I want to know is: is that a good practice? Does it make sense to use full-text search even if the column does not contain much text?
Everything has a use, but also a cost. A full-text search index is bigger. But I'm not sure where you heard that it is slower. It is designed to work with large volumes of data.
In the end you have to test and decide if the cost is worth the performance improvement.

FullText indexes in multiple variable columns

I am currently looking into using FULLTEXT indexes in MySQL for search functionality within a web site.
Basically, the user can go to an advanced search page and select one or more columns to search against, e.g. Title, Description and Comments, only one of them, or any mixture of the three. When they perform the search, the selected columns are searched for the keywords.
I had created 1 index for the title, 1 index for the description and 1 index for the comments and then tried to run the following query:
SELECT * FROM support_calls WHERE MATCH(Title, Description) AGAINST('+these, +are, +some, +keywords')
I got an error from MySQL saying that the MATCH didn't match any fulltext index, and I found that I need to create an index which includes Title and Description together instead of having them in separate indexes.
If this is the case it is going to add some complexity, as I will have to create an index for every variation of the columns the user can select. Am I going about this the right way, or is there a better solution?
First execute the query below, then run your MATCH() query.
ALTER TABLE support_calls ADD FULLTEXT (
Title, Description
)
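If creating one combined index per column combination is not attractive, another option is to keep the three single-column FULLTEXT indexes and OR together one MATCH() per selected column, since each MATCH(col) then lines up with its own index. The Python sketch below only builds the query text. The function name build_match_where and the ALLOWED_COLUMNS whitelist are made up for this example, and the %s placeholders follow the style of common Python MySQL drivers.

```python
# Columns that have their own single-column FULLTEXT index (assumed).
ALLOWED_COLUMNS = ("Title", "Description", "Comments")


def build_match_where(columns, keywords):
    """Return (sql, params) searching only the columns the user selected."""
    # Whitelist the column names: identifiers cannot be bound as parameters.
    chosen = [c for c in ALLOWED_COLUMNS if c in columns]
    if not chosen:
        raise ValueError("select at least one searchable column")
    clauses = ["MATCH({}) AGAINST (%s IN BOOLEAN MODE)".format(c)
               for c in chosen]
    sql = "SELECT * FROM support_calls WHERE " + " OR ".join(clauses)
    # The same keyword string is bound once per MATCH clause.
    return sql, [keywords] * len(chosen)


sql, params = build_match_where(["Description", "Title"], "+some +keywords")
print(sql)
print(params)
```

One difference to keep in mind: with required words such as '+some +keywords', a combined MATCH(Title, Description) can be satisfied by one word in Title and the other in Description, while the per-column form needs all required words inside a single column.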

Can i use 2 different indexes on a fulltext search?

I'm not very experienced with indexes, so that's why I'm asking this silly question. I searched pretty much everywhere but didn't get a clear answer.
I will have a table items with columns: id,name,category,price
Here will be 3 indexes:
id - Primary Index
name - FullText Index
category,price - Composite Index
I estimate my table will eventually hold around 700,000 to 1,000,000 rows.
I need to do a fulltext search for name and where category is a specified category and order by price.
So my query will be this:
SELECT * FROM items
WHERE MATCH(name) AGAINST('my search') and category='my category' order by price
My question is:
How many indexes will be used to perform this search?
Will it use 2 indexes?
[fulltext index] & [category,price] index - It will get the results for the words and then use the second index to match my category and order by price.
Or will it use 1 index?
[fulltext index] only - It will get the results for the words, but afterwards it will have to match my category and order by price manually.
I want my query to be fast, so what are your opinions? I know the fulltext search is fast, but what happens if I add clauses like the category filter and the price ordering? Will it be just as fast?
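MySQL generally uses only one index per table in a query, because using two indexes requires two searches whose results then have to be combined. The Index Merge optimization can do that in some cases, but it does not apply to FULLTEXT indexes. You can force MySQL to use a specific index in a query with an index hint, but this is rarely a good idea.
In summary: for this query MySQL will most likely use the fulltext index only, and the category filter and the price ordering are applied afterwards to the rows it returns.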

Multi-column database indexes and query speed

I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE %whatever% matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
"will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column?"
Nope. The order of the columns in the index is very important. Let's suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description, expiration_date)
In this case these queries will use the index
select * from table_name where headline = 1
select * from table_name where headline = 1 and coupon_code = 2
and these queries won't use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and coupon_code = 2
So the rule is something like this: when you have multiple fields indexed together, a query has to filter on the first k fields of the index (its leftmost prefix) to be able to use it.
So if you want to be able to search on any one of these fields, you should create an index on each of them separately (besides the combined unique index).
Also, be careful with the LIKE operator.
This can use an index: SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this cannot, because the pattern starts with a wildcard: SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
index usage http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
multiple column index http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
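Since the question also covers SQLite3, the leftmost-prefix rule can be checked there directly with EXPLAIN QUERY PLAN. The following Python sketch uses an invented coupons table and index name; the provider column is an assumption, added only so that the index does not cover every column of the table. MySQL follows the same rule, although its EXPLAIN output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE coupons (
    id INTEGER PRIMARY KEY, provider TEXT,
    headline TEXT, coupon_code TEXT, description TEXT,
    expiration_date TEXT)""")
conn.execute("""CREATE UNIQUE INDEX idx_coupon_unique
    ON coupons (headline, coupon_code, description, expiration_date)""")


def plan_for(where_clause):
    """Return SQLite's query plan text for a SELECT with this WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM coupons WHERE " + where_clause
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in rows)


# Filters that start at the first index column can use the index.
leading = plan_for("headline = 'x'")
leading_two = plan_for("headline = 'x' AND coupon_code = 'y'")
# Skipping the first column leaves only a full table scan.
skipped = plan_for("coupon_code = 'y'")

for label, plan in [("headline", leading),
                    ("headline + coupon_code", leading_two),
                    ("coupon_code only", skipped)]:
    print(label, "->", plan)
```

The first two plans name idx_coupon_unique, the third does not. An extra single-column index on coupon_code would change that for equality and prefix lookups, but not for LIKE '%whatever%' patterns.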