I know there are a lot of questions already on this subject, but I need more specific information. So here goes:
Ideally, what should be the maximum length (in characters) of the text on which a full-text search can be performed using minimal resources (CPU, memory)?
How should I decide between using LIKE '%$str%' and full-text search?
Is it worth having both versions, LIKE '%$str%' and full-text search, implemented and choosing the optimal one dynamically?
As far as I know, it depends on the number of words, not characters. The fewer, the faster MySQL will be. But don't let that get in your way.
Never use LIKE if you can use a full-text search. Except maybe for queries that you would manually run once in a while and you don't want to slow down the INSERTs on that table.
You know the speed of select vs speed of insert tradeoff in indexes, right?
Always use FT (full-text) search for queries that you don't run manually. LIKE is slow and gets much slower as the number of rows increases. This is because the MySQL engine has to look at EVERY row to answer your query, whereas FT keeps an index and knows exactly where to look.
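For illustration, a rough sketch of the two approaches side by side (the table and column names here are made up, not from the question):

-- Hypothetical table; before MySQL 5.6 FULLTEXT indexes require MyISAM
CREATE TABLE articles (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL,
    FULLTEXT INDEX ft_body (body)
) ENGINE=MyISAM;

-- LIKE with a leading wildcard: every row has to be examined
SELECT id FROM articles WHERE body LIKE '%database%';

-- Full-text search: the FULLTEXT index points straight at the matching rows
SELECT id FROM articles WHERE MATCH(body) AGAINST('database' IN BOOLEAN MODE);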
Is there any way to search on specific text in MySQL without using full-text search?
I know LIKE is a solution, but using a wildcard at the beginning prevents index use, so it is not the best choice performance-wise for large data sets.
Please specify more details of your use case.
Meanwhile, I have found this to be beneficial in some use cases. For example, suppose you wanted to search for a bracketed word:
WHERE MATCH(col) AGAINST('+word' IN BOOLEAN MODE)
AND col LIKE '%[word]%'
The MATCH would rapidly find the few rows with "word", then the LIKE would slowly check those few rows. It gives reasonably fast overall speed while checking for some types of non-words.
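Put together, the whole query might look something like this (a sketch; t and col are placeholders, and col needs a FULLTEXT index for the MATCH part):

SELECT id, col
FROM t
WHERE MATCH(col) AGAINST('+word' IN BOOLEAN MODE)  -- fast: narrows the search to rows containing "word"
  AND col LIKE '%[word]%';                         -- slow, but only checks those few rows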
I have a question:
Suppose I have one big table with a relationship to a smaller table of users.
The idea is to search that really big table for dates later than a given date, order by a score (a BIGINT, for example), and get the related user info at the same time.
The result of this query can change every 10 minutes or so.
So, there is no text search, but I have a really big table. Should I use sphinx (or other search engine) or should I just use some MySQL indexes?
If I use Sphinx, I'm sure I can get really fast results; but maybe keeping the index refreshed, even with delta indexing, doesn't make a big difference compared with plain MySQL indexing. At the same time, the changes to the table are not necessarily new inserts but updates, and I have read that real-time indexing and delta indexes can cause problems.
Maybe it would be better to use MySQL indexes and add some kind of caching to avoid unnecessary queries.
Just use MySQL, you definitely don't need Sphinx for what you are doing.
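For instance, a minimal sketch of the kind of index and query that would fit this case (all table names, column names and values here are made up):

-- A plain B-tree index on the date column turns the filter into a range scan
ALTER TABLE big_table ADD INDEX idx_created (created_at);

SELECT b.*, u.name
FROM big_table AS b
JOIN users AS u ON u.id = b.user_id
WHERE b.created_at > '2012-01-01'
ORDER BY b.score DESC   -- sorted over the matching rows only, not the whole table
LIMIT 50;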
I am working with a table of about 1,500 records, and I query the comments column using LIKE '%word%'. That is working well for now, but the comments are increasing day by day, so I am thinking of using full-text search or a tool like http://sphinxsearch.com.
At what number of records does full-text search or a search engine (Sphinx) start to do the job more effectively? I also think the size of the field matters: I am working with comments, so it's fine for now, but with long articles I might really see the power of full-text search or Sphinx.
Thanks.
The moment the data set (the whole table, in your case) no longer fits in memory, indexing (hence a full-text index, Sphinx, or Lucene) makes a big difference.
It also makes a big difference under a highly concurrent, mixed read/write load, because with your query the whole table has to be scanned, which involves locking, scanning of redo logs, etc. (depending on the transaction isolation level).
LIKE with a leading wildcard has to pattern-match every row, which can become very, very slow very fast. Switch to Sphinx; it's very good.
In my MySQL db I have a user table consisting of 37,000 (or thereabouts) users.
When a user searches for another user on the site, I perform a simple wildcard LIKE (i.e. LIKE '{name}%') to return the users found.
Would it be more efficient and quicker to use a search engine such as Solr to do my 'LIKE' searches? I believe in Solr I can use wildcard queries (http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
To be honest, it's not that slow at the moment using a LIKE query however as the number of users grows it'll become slower. Any tips or advice is greatly appreciated.
We had a similar situation about a month ago; our database is roughly around 33k rows, and because our engine was InnoDB we could not use MySQL's full-text search feature (that, and it being quite blunt).
We decided to implement sphinxsearch (http://www.sphinxsearch.com) and we're really impressed with the results (me becoming quite a 'fanboy' of it).
If we do a large index search with many columns (loads of left joins) of all our rows we actually halved the query response time against the MySQL 'LIKE' counterpart.
Although we haven't used it for long, if you're going to build for future scalability, I'd recommend Sphinx.
You can speed things up if the search term has to be at least 3 characters long before the search starts, and you index your search column with a prefix length of 3 characters.
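A sketch of that idea with assumed table and column names (a 3-character prefix index keeps the index small and still works here because the wildcard is only at the end):

ALTER TABLE users ADD INDEX idx_name_prefix (name(3));

SELECT id, name FROM users WHERE name LIKE 'joh%';  -- can use the prefix index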
It's actually already built into MySQL: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
We're using Solr for this purpose, since you can search in 1-2 ms even with millions of documents indexed. We mirror our MySQL instance with the Data Import Handler and then search on Solr.
As Neville pointed out, full-text search is built into MySQL, but Solr's performance is way better, since it was born as a full-text search engine.
I read on the MySQL Performance Blog that when tables are large, it is better to scan the full table instead of using indexes.
I have a table with tens of millions of rows. When I run queries without indexes, they are 24 times slower than with indexes. I know many things may cause this (e.g., whether rows are stored sequentially), but can you give me some hints about what might be happening, or how I should start examining this issue? I want to understand when using indexes is preferred and when it isn't.
Thanks
The article says that when dealing with very large data sets, where the number of rows you need to work with approaches the total number of rows in the table, using an index might hurt performance.
In this case, going through the index will indeed hurt performance, as long as you need more data than is present in the index.
To go through the index, the database engine first has to read large parts of the index (it is a type of table itself), then for each row (or set of rows) from that result, go to the real table and start cherry-picking pages to read.
If, on the other hand, you only need to retrieve columns that are already part of the index, then the database engine only has to read from the index and never has to touch the full table for more data.
If you end up reading most or close to most of the actual table in question, all the work required to deal with the index might be more overhead than just doing a full table-scan to begin with.
Now, this is all the article is saying. For most work dealing with a database, using indexes is the exact right thing to do.
For instance, if you need to extract a small set of rows, going through an index instead of a full table scan will be many orders of magnitude faster.
In any case, if you're in doubt, you should do some performance profiling to find out how your application behaves under different types of loads, and then start tweaking, don't take a single article as a silver bullet for anything.
For instance, one way to speed up the example queries in the article that do a count on the pad column would be to create a single index covering both val and pad. That way, the count would simply be an index scan, not an index scan plus a table lookup, and it would run faster than the full table scan.
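As a sketch of that suggestion (val and pad are the column names used in the article; the table name and cutoff value are made up):

-- With both columns in one index, the count can be answered from the index alone
ALTER TABLE tbl ADD INDEX idx_val_pad (val, pad);

SELECT COUNT(pad) FROM tbl WHERE val > 100000;  -- index scan only, no table lookups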
Your best option is to know your data, and to experiment, and to know how the tools you use work, so indeed, learn more about indexes, but in the end, it is you who decides what is best for your program.
As always, it depends. I've so far never run into a scenario like the one described in that blog post. Using indexes in my queries on large tables (50+ million rows) has been on the order of 100 to 10,000 times faster than doing a full table scan on those big tables.
There's probably no silver bullet here, you have to test for your particular data and your particular queries.
It is good practice to put an index on each column that you use in a WHERE clause.