How to improve performance in "like '%variable%'"? - mysql

I have this query in MySQL:
select *
from alias where
name like '%jandro%';
Which results are:
Jandro, Alejandro
The index on name cannot be used to improve performance, because the leading wildcard makes it a range filter. Is there any way of improving that query's performance?
I have tried with a full-text index, but it only works for complete words.
I also tried with a MEMORY ENGINE table, and it is faster, but I would like a better choice.
EDIT
I think I will just have to accept this for now:
select *
from alias where match(name) against ('jandro*' in boolean mode);
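For reference, the boolean-mode query needs a FULLTEXT index on name, e.g. (the index name here is an assumption):
ALTER TABLE alias ADD FULLTEXT INDEX ft_name (name);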

I've done this in the past (not on MySQL, and before full text searching was commonly available on database servers) by creating a lookup table, in which I created all left-chopping substrings to search on.
In my case, it was merited - a key user journey involved searching for names in much the way you suggest, and performance was key.
It worked with triggers on insert, update and delete.
Translated to your example:
Table alias:
ID | name
---+----------
 1 | Jandro
 2 | Alejandro
Table name_lookup:
alias_id | name_substring
---------+---------------
       1 | Jandro
       1 | andro
       1 | ndro
       1 | dro
       1 | ro
       2 | Alejandro
       2 | lejandro
       2 | ejandro
       2 | jandro
       2 | andro
       2 | ndro
       2 | dro
       2 | ro
Your query then becomes
select alias_id, name
from alias a,
name_lookup nl
where a.id = nl.alias_id
and nl.name_substring like 'jandro%'
That way, you hit the index on the name_substring table.
It's only worth doing for common queries on huge data sets - but it works, and it's quick.
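As a minimal sketch of the insert trigger (assuming the table layout above, and stopping at two-character suffixes as in the example; the update and delete triggers would be analogous):
DELIMITER //
CREATE TRIGGER alias_after_insert AFTER INSERT ON alias
FOR EACH ROW
BEGIN
  DECLARE pos INT DEFAULT 1;
  -- insert every left-chopped suffix of the new name
  WHILE pos <= CHAR_LENGTH(NEW.name) - 1 DO
    INSERT INTO name_lookup (alias_id, name_substring)
    VALUES (NEW.id, SUBSTRING(NEW.name, pos));
    SET pos = pos + 1;
  END WHILE;
END//
DELIMITER ;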

Related

LIKE is faster than FULLTEXT search in MySQL

I have a table called documents that has around 30 columns and around 3.5 million rows, at a size of about 10GB. The most important columns are:
system_id, archive_id, content, barcodes, status and notes.
As you can see, this is a multi-tenant application where each tenant is a system, referenced through system_id.
I have 2 indexes on this table: the first one is a BTREE index on the columns system_id, archive_id and status.
The other one is a FULLTEXT index on the columns content, barcodes and notes.
I have two different tenants that I want to highlight. The first one (Customer A) has system_id = 1 and has, say, 1,000 records in the documents table. The second one (Customer B) has system_id = 2 and, say, 400,000 records in this table.
The LIKE query for Customer A is:
SELECT *
FROM documents
WHERE system_id = 1 AND
CONCAT_WS(' ',content,barcodes,notes) LIKE '%office%' AND
status = 100
The above query will run in about 0.02 seconds. If I run a similar query but with the FULLTEXT search like
SELECT *
FROM documents
WHERE system_id = 1 AND
MATCH(content,barcodes,notes) AGAINST ('office' IN BOOLEAN MODE) AND
status = 100
This operation takes around 4 seconds?! I have read that a FULLTEXT search index should be a lot quicker than LIKE.
If I run the same queries for Customer B (who has 400,000 records in the documents table), the LIKE search is a little bit slower than FULLTEXT, but not by much.
What can the reason for this be?
Should I go with LIKE or FULLTEXT search in the above situation (8GB RAM database server)?
I'm a little bit confused about why my queries with FULLTEXT search are taking so long. The text in content is probably not just words that a normal person would use, because it is OCR-read from the documents, so there will be a lot of different words that might blow up the index.
The EXPLAINs will show that the fast query is using your index on system_id and status, not the LIKE. It was fast, not because of LIKE, but because of that filtering.
And the slow query decided to use the FULLTEXT index because the Optimizer is too dumb to realize that lots of rows contain "office".
LIKE, especially in conjunction with CONCAT_WS, is not faster than FULLTEXT.
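You can confirm this by comparing the two plans; the key column in the EXPLAIN output shows which index each query picked:
EXPLAIN SELECT * FROM documents
WHERE system_id = 1 AND status = 100
AND CONCAT_WS(' ', content, barcodes, notes) LIKE '%office%';
EXPLAIN SELECT * FROM documents
WHERE system_id = 1 AND status = 100
AND MATCH(content, barcodes, notes) AGAINST ('office' IN BOOLEAN MODE);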

If your table has more selects than inserts, are indexes always beneficial?

I have a MySQL InnoDB table where I'm performing a lot of selects using different columns. I thought that adding an index on each of those fields could help performance, but after reading a bit about indexes, I'm not sure if adding an index on a column you select on always helps.
I have far more selects than inserts/updates happening in my case.
My table 'students' looks like:
id | student_name | nickname | team | time_joined_school | honor_roll
and I have the following queries:
# The team column is varchar(32), and only has about 20 different values.
# The honor_roll field is a smallint and is only either 0 or 1.
1. select * from students where team = '?' and honor_roll = ?;
# The student_name field is varchar(32).
2. select * from students where student_name = '?';
# The nickname field is varchar(64).
3. select * from students where nickname like '%?%';
All the results are ordered by time_joined_school, which is a bigint(20).
So I was just going to add an index on each of the columns, does that make sense in this scenario?
Thanks
Indexes help the database more efficiently find the data you're looking for. Which is to say you don't need an index simply because you're selecting a given column, but instead you (generally) need an index for columns you're selecting based on - i.e. using a WHERE clause (even if you don't end up including the searched column in your result).
Broadly, this means you should have indexes on columns that segregate your data in logical ways, and not on extraneous, merely informative columns. Before looking at your specific queries, all of these columns seem like reasonable candidates for indexing, since you could reasonably construct queries around them. Examples of columns that would make less sense would be things like phone_number, address, or student_notes - you could index such columns, but generally you don't need or want to.
Specifically based on your queries, you'll want student_name, team, and honor_roll to be indexed, since you're defining WHERE conditions based on the values of these columns. You'll also benefit from indexing time_joined_school if, as you suggest, you're ORDER BYing your queries based on that column. Your LIKE query is not actually easy for most RDBs to handle, and indexing nickname won't help. Check out How to speed up SELECT .. LIKE queries in MySQL on multiple columns? for more.
Note also that the ratio of SELECT to INSERT is not terribly relevant for deciding whether to use an index or not. Even if you only populate the table once, and it's read-only from that point on, SELECTs will run faster if you index the correct columns.
Yes, indexes help accelerate your queries.
In your case you should have indexes on:
1) team and honor_roll, from query 1 (a single index covering both fields)
2) student_name
3) time_joined_school, for the ORDER BY
For query 3 you can't use an index, because of the leading-wildcard LIKE. Hope this helps.
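A sketch of the corresponding DDL (index names are assumptions):
ALTER TABLE students
ADD INDEX idx_team_honor_roll (team, honor_roll),
ADD INDEX idx_student_name (student_name),
ADD INDEX idx_time_joined_school (time_joined_school);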

Condition on joined table faster than condition on reference

I have a query involving two tables: table A has lots of rows, and contains a field called b_id, which references a record from table B, which has about 30 different rows. Table A has an index on b_id, and table B has an index on the column name.
My query looks something like this:
SELECT COUNT(A.id) FROM A INNER JOIN B ON B.id = A.b_id WHERE (B.name != 'dummy') AND <condition>;
With condition being some random condition on table A (I have lots of those, all exhibiting the same behavior).
This query is extremely slow (taking north of 2 seconds), and EXPLAIN shows that the query optimizer starts with table B, coming up with about 29 rows, and then scans table A. Using STRAIGHT_JOIN turned the order around and the query ran instantaneously.
I'm not a fan of black magic, so I decided to try something else: come up with the id for the record in B that has the name dummy, let's say 23, and then simplify the query to:
SELECT COUNT(A.id) FROM A WHERE (b_id != 23) AND <condition>;
To my surprise, this query was actually slower than the straight join, taking north of a second.
Any ideas on why the join would be faster than the simplified query?
UPDATE: following a request in the comments, the outputs from explain:
Straight join:
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
| 1 | SIMPLE | A | ALL | b_id | NULL | NULL | NULL | 200707 | Using where |
| 1 | SIMPLE | B | eq_ref | PRIMARY,id_name | PRIMARY | 4 | schema.A.b_id | 1 | Using where |
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
No join:
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | A | ALL | b_id | NULL | NULL | NULL | 200707 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
UPDATE 2:
Tried another variant:
SELECT COUNT(A.id) FROM A WHERE b_id IN (<all the ids except for 23>) AND <condition>;
This runs faster than the no join, but still slower than the join, so it seems that the inequality operation is responsible for part of the performance hit, but not all.
If you are using MySQL 5.6 or later, then you can ask the query optimizer what it is doing:
SET optimizer_trace="enabled=on";
## YOUR QUERY
SELECT COUNT(*) FROM transactions WHERE (id < 9000) and user != 11;
##END YOUR QUERY
SELECT trace FROM information_schema.optimizer_trace;
SET optimizer_trace="enabled=off";
You will almost certainly need to refer to the following sections in the MySQL reference: Tracing the Optimizer and The Optimizer.
Looking at the first explain, it appears that the query is quicker probably because the optimizer can use table B to filter down to the rows required, based on the join, and then use the foreign key to get the rows in table A.
In the explain it's this bit that is interesting: there is only one row matching, and it's using schema.A.b_id. Effectively this is pre-filtering the rows from A, which is where I think the performance difference comes from.
| ref | rows | Extra |
| schema.A.b_id | 1 | Using where |
So, as is usual with queries, it all comes down to indexes - or more accurately, missing indexes. Just because you have indexes on individual fields doesn't necessarily mean that they are suitable for the query you're running.
Basic rule: if the EXPLAIN doesn't say Using index, then you need to add a suitable index.
Looking at the explain output, the first interesting thing is, ironically, the last thing on each line: the Extra column.
In the first example we see
| 1 | SIMPLE | A | .... Using where |
| 1 | SIMPLE | B | ... Using where |
Neither of these Using where entries is good; ideally at least one, and preferably both, should say Using index.
When you do
SELECT COUNT(A.id) FROM A WHERE (b_id != 23) AND <condition>;
and see Using where then you need to add an index as it's doing a table scan.
for example if you did
EXPLAIN SELECT COUNT(A.id) FROM A WHERE (Id > 23)
You should see Using where; Using index (assuming here that Id is the primary key and has an index)
If you then added a condition onto the end
EXPLAIN SELECT COUNT(A.id) FROM A WHERE (Id > 23) and Field > 0
and see Using where, then you need to add an index covering the two fields. Just having an index on a field doesn't mean that MySQL will be able to use that index during a query across multiple fields - this is something the query optimizer decides internally. I'm not exactly certain of the internal rules, but generally adding an extra index to match the query helps immensely.
So adding an index (on the two fields in the query above):
ALTER TABLE `A` ADD INDEX `IndexIdField` (`Id`,`Field`)
should change it such that when querying based upon those two fields there is an index.
I've tried this on one of my databases that has Transactions and User tables.
I'll use this query
EXPLAIN SELECT COUNT(*) FROM transactions WHERE (id < 9000) and user != 11;
Running without index on the two fields:
+---------------+---------+---------+------+-------+-------------+
| possible_keys | key     | key_len | ref  | rows  | Extra       |
+---------------+---------+---------+------+-------+-------------+
| PRIMARY,user  | PRIMARY | 4       | NULL | 14334 | Using where |
+---------------+---------+---------+------+-------+-------------+
Then add an index:
ALTER TABLE `transactions` ADD INDEX `IndexIdUser` (`id`, `user`);
Then the same query again and this time
+--------------------------+-------------+---------+------+-------+--------------------------+
| possible_keys            | key         | key_len | ref  | rows  | Extra                    |
+--------------------------+-------------+---------+------+-------+--------------------------+
| PRIMARY,user,IndexIdUser | IndexIdUser | 4       | NULL | 12628 | Using where; Using index |
+--------------------------+-------------+---------+------+-------+--------------------------+
This time it's using the indexes - and as a result will be a lot quicker.
From comments by @Wrikken - and also bear in mind that I don't have the exact schema / data, so some of this investigation has required assumptions about the schema (which may be wrong):
SELECT COUNT(A.id) FROM A FORCE INDEX (b_id)
would perform at least as well as
SELECT COUNT(A.id) FROM A INNER JOIN B ON A.b_id = B.id.
If we look at the first EXPLAIN in the OP we see that there are two elements to the query. Referring to the EXPLAIN documentation for eq_ref, I can see that this is going to define the rows for consideration based on this relationship.
The order of the explain output doesn't necessarily mean it's doing one and then the other; it's simply what has been chosen to execute the query (at least as far as I can tell).
For some reason the query optimizer has decided not to use the index on b_id - I'm assuming here that because of the query the optimizer has decided that it will be more efficient to do a table scan.
The second explain concerns me a little because it's not considering the index on b_id; possibly because of the AND <condition> (which is omitted so I'm guessing as to what it could be). When I try this with an index on b_id it does use the index; but as soon as a condition is added it doesn't use the index.
So, when doing
SELECT COUNT(A.id) FROM A INNER JOIN B ON A.b_id = B.id
this all indicates to me that the PRIMARY index on B is where the speed difference is coming from. I'm assuming, because of the schema.A.b_id in the explain, that there is a foreign key on this table, which must be a better collection of related rows than the index on b_id - so the query optimizer can use this relationship to define which rows to pick, and because a primary index is better than secondary indexes, it's going to be much quicker to select rows out of B and then use the relationship link to match against the rows in A.
I do not see any strange behavior here. What you need is to understand the basics of how MySQL uses indexes. Here is an article I usually recommend: 3 ways MySQL uses indexes.
It is always funny to observe people writing things like WHERE (B.name != 'dummy') AND <condition>, because this AND <condition> might be the reason why the MySQL optimizer chose the specific index, and there is no valid reason to compare the performance of the query with that of another one with WHERE b_id != 23 AND <condition>, because the two queries usually need different indexes to perform well.
One thing you should understand, is that MySQL likes equality comparisons, and does not like range conditions and inequality comparisons. It is usually better to specify the correct values than to use a range condition or specify a != value.
So, let's compare the two queries.
With straight join
For each row in A.id order (which is the primary key and is clustered, that is, the data is stored on disk in that order), read the row from disk to check whether your <condition> is met and get b_id; then (I repeat, for each matching row) find the appropriate row for b_id, go to disk, read b.name, and compare it with 'dummy'. Even though this plan is not at all efficient, you have only 200,000 rows in your A table, so it seems rather performant.
Without straight join
For each row in table B, compare whether name matches; then look into the A.b_id index (which is obviously sorted by b_id, since it is an index, and hence contains A.ids in random order), and for each A.id for the given A.b_id, find the corresponding A row on disk to check the <condition>; if it matches, count the id, otherwise discard the row.
As you see, there is nothing strange in the fact that the second query takes so long, you basically force MySQL to randomly access almost each row in A table, where in the first query you read the A table in the order it is stored on disk.
The query with no join does not use any index at all. It actually should take about the same time as the query with the straight join. My guess is that the order of b_id != 23 and <condition> is significant.
UPD1: Could you still compare the performance of your query without join with the following:
SELECT COUNT(A.id)
FROM A
WHERE IF(b_id!=23, <condition>, 0);
UPD2: the fact that you do not see an index in EXPLAIN does not mean that no index is used at all. An index is at least used to define the reading order: when there is no other useful index, it is usually the primary key, but, as I said above, when there is an equality condition and a corresponding index, MySQL will use that index. So, basically, to understand which index is used, you can look at the order in which rows are output. If the order is the same as the primary key, then no other index was used (that is, the primary key index was used); if the order of rows is shuffled, then some other index was involved.
In your case, the second condition seems to be true for most of the rows, but the index is still used, that is to get b_id MySQL goes on disk in random order, that's why it is slow. No black magic here, and this second condition does affect the performance.
Probably this should be a comment rather than an answer, but it will be a bit long.
First of all, it is hard to believe that two queries that have (almost) exactly the same explain run at different speeds. Furthermore, this is less likely if the one with the extra line in the explain runs faster. And I guess the word faster is the key here.
You've compared speed (the time it takes for a query to finish), and that is an extremely empirical way of testing. For example, you could have improperly disabled the cache, which makes the comparison useless. Not to mention that your <insert your preferred software application here> could have triggered a page fault or some other operation at the time you ran the test, which could have decreased the query speed.
The right way of measuring query performance is based on the explain (that's why it is there)
So the closest thing I have to answer the question: Any ideas on why the join would be faster than the simplified query?... is, in short, a layer 8 error.
I do have some other comments, though, that should be taken into account in order to speed things up. If A.id is a primary key (the name smells like it is), then according to your explain, why does the count(A.id) have to scan all the rows? It should be able to get the data directly from the index, but I don't see the Using index in the extra flags. It seems you don't even have a unique index on it, and that it is a nullable field. That also smells odd. Make sure that the field is not null and that there is a unique index on it, run the explain again, confirm the extra flags contain Using index, and then (properly) time the query. It should run much faster.
Also note that an approach that would result in the same performance improvement as I mentioned above would be to replace count(A.id) with count(*).
Just my 2 cents.
Because MySQL will not use an index for col != val in a WHERE clause.
The optimizer decides whether to use an index by estimating. Since a "!=" will most likely fetch nearly everything, it skips the index to avoid the overhead. (Yes, MySQL is stupid here, and it does not keep statistics on the index column.)
You may get a faster SELECT by using col IN (everything other than val), which lets MySQL use the index.
There is an example here showing the query optimizer choosing not to use an index because of the value.
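For example, a sketch of that rewrite for the query in the question (deriving the list from B via a subquery is an assumption; the extra <condition> would be ANDed on as before):
SELECT COUNT(A.id)
FROM A
WHERE b_id IN (SELECT id FROM B WHERE name != 'dummy');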
The answer to this question is actually a very simple consequence of algorithm design:
The key difference between these two queries is the merge operation.
Before I give a lesson on algorithms, I will mention the reason why the merge operation improves the performance. The merge improves the performance because it reduces the overall load on the aggregation. This is an iteration vs recursion issue. In the iteration analogy, we are simply looping through the entire index and counting the matches. In the recursion analogy, we are dividing and conquering (so to speak); or in other words, we are filtering the results that we need to count, thus reducing the volume of numbers we actually need to count.
Here are the key questions:
Why is a merge sort faster than an insertion sort?
Is a merge sort always faster than an insertion sort?
Let's explain this with a parable:
Let's say we have a deck of playing cards, and we need to sum the numbers of playing cards that have the numbers 7, 8 and 9 (assuming we don't know the answer in advance).
Let's say that we decide upon two ways to solve this problem:
We can hold the deck in one hand and move the cards to the table, one by one, counting as we go.
We can separate the cards into two groups: black suits and red suits. Then we can perform step 1 upon one of the groups and reuse the results for the second group.
If we choose option 2, then we have divided our problem in half. As a consequence, we can count the matching black cards and multiply the number by 2. In other words, we are re-using the part of the query execution plan that required the counting. This reasoning especially works when we know in advance how the cards were sorted (aka "clustered index"). Counting half of the cards is obviously much less time consuming than counting the entire deck.
If we wanted to improve the performance yet again, depending on how large the size of our database is, we may even further consider sorting into four groups (instead of two groups): clubs, diamonds, hearts, and spades. Whether or not we want to perform this further step depends on whether or not the overhead of sorting the cards into the additional groups is justified by the performance gain. In small numbers of cards, the performance gain is likely not worth the extra overhead required to sort into the different groups. As the number of cards grows, the performance gain begins to outweigh the overhead cost.
Here is an excerpt from "Introduction to Algorithms, 3rd edition," (Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein):
(Note: If someone can tell me how to format the sub-notation, I will edit this to improve readability.)
(Also, keep in mind that "n" is the number of objects we are dealing with.)
"As an example, in Chapter 2, we will see two algorithms for sorting.
The first, known as insertion sort, takes time roughly equal to c1n2
to sort n items, where c1 is a constant that does not depend on n.
That is, it takes time roughly proportional to n2. The second, merge
sort, takes time roughly equal to c2n lg n, where lg n stands for
log2 n and c2 is another constant that also does not depend on n.
Insertion sort typically has a smaller constant factor than merge
sort, so that c1 < c2. We shall see that the constant factors can
have far less of an impact on the running time than the dependence on
the input size n. Let’s write insertion sort’s running time as c1n ·
n and merge sort’s running time as c2n · lg n. Then we see that where
insertion sort has a factor of n in its running time, merge sort has
a factor of lg n, which is much smaller. (For example, when n = 1000,
lg n is approximately 10, and when n equals one million, lg n is
approximately only 20.) Although insertion sort usually runs faster
than merge sort for small input sizes, once the input size n becomes
large enough, merge sort’s advantage of lg n vs. n will more than
compensate for the difference in constant factors. No matter how much
smaller c1 is than c2, there will always be a crossover point beyond
which merge sort is faster."
Why is this relevant? Let us look at the query execution plans for these two queries. We will see that there is a merge operation caused by the inner join.

Concatenating multiple MySQL tables for search. Is this a good use of a MySQL View?

I'm trying to make it quick and easy to perform a keyword search on a set of MySQL tables which are linked to each other.
There's a table of items with a unique "itemID" and associated data is spread out amongst other tables, all linked to via the itemID.
I've created a view which concatenates much of this information into one usable form. This makes searching really easy, but hasn't helped with performance. It's my first use of a view, and perhaps wasn't the right use. If anyone could give me some pointers I'd be very grateful.
A simplified example is:
ITEMS TABLE:
itemID | name
-------+--------
     1 | "James"
     2 | "Bob"
     3 | "Mary"
KEYWORDS TABLE:
keywordID | itemID | keyword
----------+--------+----------
        1 |      2 | "rabbit"
        2 |      2 | "dog"
        3 |      3 | "chicken"
plus many more relations...
MY VIEW: (created using CONCAT_WS, GROUP_CONCAT and a fair few JOINs)
itemID | important_search_terms
-------+------------------------
     1 | "James ..."
     2 | "Bob, rabbit, dog ..."
     3 | "Mary, chicken ..."
I can then search the view for "mary" and "chicken" and easily find that itemID=3 matches. Brilliant!
The problem is, it seems to be doing all the work of the CONCATs and JOINs for each and every search which is not efficient. With my current test data searches are taking approx 2 seconds, which is not practical.
I was hoping that the view would be cached in some way, but perhaps I'm not using it in the right way.
I could have an actual table with this search info which I update periodically, but it doesn't seem as neat as I had hoped.
If anyone has any suggestions I'd be very grateful. Many Thanks
Well, a view is nothing more than a way to make what you query for easier to read; underneath, it performs the underlying SQL statement every time.
So it's no wonder it is as slow as (or even slower than) running that statement itself.
Usually this kind of search is handled by indexing jobs (running at night, when they won't annoy anyone), or by indexed inserts (when new data is inserted, a check runs to decide whether to add its interesting words to the index).
Doing that at runtime is really hard and requires a well-designed database structure and, most of the time, powerful hardware for the SQL server (depending on the amount of data).
A MySQL view is not the same as a materialized view in other database systems. All it's really doing is caching the query itself, not the data needed for the query.
The primary use for a MySQL view is to eliminate repetitive queries that you have to write over and over again.
You've made it easy, but not made it quick. I think if you look at the EXPLAIN for your query you are going to see that MySQL is materializing that view (writing out a copy of the result set from the view query as a "derived table") each time you run the query, and then running a query from that "derived table".
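One quick check: EXPLAIN a search against the view; a row with select_type DERIVED indicates the view is being materialized on every search (the view name here is an assumption):
EXPLAIN SELECT itemID
FROM item_search_view
WHERE important_search_terms LIKE '%chicken%';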
You would get better performance if you can have the "search" predicate run against each table separately, something like this:
SELECT 'items' AS source, itemID, name AS found_term
FROM items WHERE name LIKE 'foo'
UNION ALL
SELECT 'keywords', itemID, keyword
FROM keywords WHERE keyword LIKE 'foo'
UNION ALL
SELECT 'others', itemID, other
FROM others WHERE other LIKE 'foo'
-or-
if you don't care what the matched term is, or which table it was found in, and you just want to return a distinct list of itemID that were matched
SELECT itemID
FROM items WHERE name LIKE 'foo'
UNION
SELECT itemID
FROM keywords WHERE keyword LIKE 'foo'
UNION
SELECT itemID
FROM others WHERE other LIKE 'foo'
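Alternatively, the periodically-updated search table mentioned in the question can work well; a minimal sketch (the view and table names are assumptions, and the FULLTEXT index requires MyISAM, or InnoDB on MySQL 5.6+):
CREATE TABLE item_search (
itemID INT PRIMARY KEY,
important_search_terms TEXT,
FULLTEXT INDEX ft_terms (important_search_terms)
);
-- refresh periodically, e.g. from a scheduled job
REPLACE INTO item_search (itemID, important_search_terms)
SELECT itemID, important_search_terms FROM item_search_view;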

How to speed up SELECT .. LIKE queries in MySQL on multiple columns?

I have a MySQL table for which I do very frequent SELECT x, y, z FROM table WHERE x LIKE '%text%' OR y LIKE '%text%' OR z LIKE '%text%' queries. Would any kind of index help speed things up?
There are a few million records in the table. If there is anything that would speed up the search, would it seriously impact disk usage by the database files and the speed of INSERT and DELETE statements? (no UPDATE is ever performed)
Update: Quickly after posting, I have seen a lot of information and discussion about the way LIKE is used in the query; I would like to point out that the solution must use LIKE '%text%' (that is, the text I am looking for is prepended and appended with a % wildcard). The database also has to be local, for many reasons, including security.
An index wouldn't speed up the query, because for textual columns indexes work by indexing N characters starting from the left. When you do LIKE '%text%' the index can't be used, because there can be a variable number of characters before text.
What you should be doing is not use a query like that at all. Instead you should use something like FTS (Full Text Search), which MySQL supports for MyISAM tables. It's also pretty easy to make such an indexing system yourself for non-MyISAM tables: you just need a separate index table where you store words and their relevant IDs in the actual table.
Update
Full-text search is available for InnoDB tables with MySQL 5.6+.
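A minimal sketch of that do-it-yourself word index (all names are assumptions; you would split each row's text into words when inserting):
CREATE TABLE word_index (
word VARCHAR(64) NOT NULL,
row_id INT NOT NULL,
PRIMARY KEY (word, row_id)
);
-- an exact-word lookup then resolves matching row IDs through the index
SELECT row_id FROM word_index WHERE word = 'text';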
An index won't help text matching with a leading wildcard; an index can be used for:
LIKE 'text%'
But I'm guessing that won't cut it. For this type of query you really should be looking at a full-text search provider if you want to scale the number of records you can search across. My preferred provider is Sphinx: very full-featured, fast, etc. Lucene might also be worth a look. A FULLTEXT index on a MyISAM table will also work, but ultimately pursuing MyISAM for any database that has a significant number of writes isn't a good idea.
An index can not be used to speed up queries where the search criteria starts with a wildcard:
LIKE '%text%'
An index can (and might be, depending on selectivity) used for search terms of the form:
LIKE 'text%'
Add a Full Text Index and Use MATCH() AGAINST().
Normal indexes will not help you with like queries, especially those that utilize wildcards on both sides of the search term.
What you can do is add a full text index on the columns that you're interested in searching and then use a MATCH() AGAINST() query to search those full text indexes.
Add a full text index on the columns that you need:
ALTER TABLE table ADD FULLTEXT INDEX index_table_on_x_y_z (x, y, z);
Then query those columns:
SELECT * FROM table WHERE MATCH(x,y,z) AGAINST("text")
From our trials, we found these queries to take around 1ms on a table with over 1 million records. Not bad, especially compared to the equivalent wildcard LIKE '%text%' query, which takes 16,400ms.
Benchmarks
MATCH(x,y,z) AGAINST("text") takes 1ms
LIKE '%text%' takes 16,400ms
16,400x faster!
I would add that in some cases you can speed up the query using an index together with LIKE/RLIKE, if the field you are looking at is often empty or contains something constant.
In that case you can limit the rows which are visited using the index, by adding an AND clause with the fixed value.
I tried this for searching 'tags' in a huge table which usually does not contain a lot of tags.
SELECT * FROM objects WHERE tags RLIKE '(^|,)tag(,|$)' AND tags != '';
If you have an index on tags, you will see that it is used to limit the rows which are being searched.
Maybe you can try to upgrade from MySQL 5.1 to MySQL 5.7.
I have about 70,000 records, and ran the following SQL:
select * from comics where name like '%test%';
It takes 2000ms in MySQL 5.1.
And it takes 200ms in MySQL 5.7 or MySQL 5.6.
Another way:
You can maintain calculated columns with those strings REVERSEd and use
SELECT x, y, z FROM table WHERE x LIKE 'text%' OR y LIKE 'text%' OR z LIKE 'text%' OR xRev LIKE 'txet%' OR yRev LIKE 'txet%' OR zRev LIKE 'txet%'
Example of how to ADD a stored (persisted) generated column:
ALTER TABLE `table` ADD COLUMN xRev VARCHAR(N) GENERATED ALWAYS AS (REVERSE(x)) STORED;
and then create indexes on xRev, yRev, etc.
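For example (index names are assumptions):
CREATE INDEX idx_x_rev ON `table` (xRev);
CREATE INDEX idx_y_rev ON `table` (yRev);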
Another alternative to avoid full table scans is selecting substrings and checking them in the HAVING clause:
SELECT
al3.article_number,
SUBSTR(al3.article_number, 2, 3) AS art_nr_substr,
SUBSTR(al3.article_number, 1, 3) AS art_nr_substr2,
al1.*
FROM
t1 al1
INNER JOIN t2 al2 ON al2.t1_id = al1.id
INNER JOIN t3 al3 ON al3.id = al2.t3_id
WHERE
al1.created_at > '2018-05-29'
HAVING
(art_nr_substr = "FLA" OR art_nr_substr = 'VKV' OR art_nr_subst2 = 'PBR');
When you optimize a SELECT foo FROM bar WHERE baz LIKE 'ZOT%' query, you want the index length to at least match the number of characters in the request.
Here is a real life example from just now:
Here is the query:
EXPLAIN SELECT COUNT(*) FROM client_detail cd
JOIN client_account ca ON cd.client_acct_id = ca.client_acct_id
WHERE cd.first_name LIKE 'XX%' AND cd.last_name_index LIKE 'YY%';
With no index:
+-------+
| rows |
+-------+
| 13994 |
| 1 |
+-------+
So first try a 4x index,
CREATE INDEX idx_last_first_4x4 on client_detail(last_name_index(4), first_name(4));
+------+
| rows |
+------+
| 7035 |
| 1 |
+------+
A bit better, but COUNT(*) shows there are only 102 results. So let's now add a 2x index:
CREATE INDEX idx_last_first_2x2 on client_detail(last_name_index(2), first_name(2));
yields:
+------+
| rows |
+------+
| 102 |
| 1 |
+------+
Both indexes are still in place at this point, and MySQL chose the latter index for this query; however, it will still choose the 4x4 index if it is more efficient.
Index ordering may be useful; try the 2x2 before the 4x4, or vice-versa, to see how it performs in your environment. To re-order an index you have to drop and re-create the earlier one.
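For instance, a sketch of that drop-and-recreate step with the indexes from above:
ALTER TABLE client_detail DROP INDEX idx_last_first_4x4;
CREATE INDEX idx_last_first_4x4 ON client_detail(last_name_index(4), first_name(4));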