Indexes in MySQL

I've only started using INDEXes in my MySQL database and I'm a little unsure if what I have in mind will work. I have a TEXT field that can store a large body of text and will need to be searched, along with another id INT field. If I have an INDEX on say my id_column field and a FULLTEXT index on my text_column, will MySQL use both in a query such as
SELECT * FROM notes WHERE id_column='123' AND MATCH(text_column) AGAINST(search_text)
?
Secondly, I have a group of columns that are frequently searched in combination. If I create a multi-column INDEX on these columns, the index will still work if the columns used are together, left to right, in the index. But what happens if the user leaves out a particular column, say C, and searches using A, B, D in an index like (A, B, C, D)?

For question 1:
Yes, the query will use both indices. FULLTEXT indices can be kind of tricky, however, so it's a good idea to read the MySQL documentation thoroughly on them and use EXPLAIN on your queries to make sure they are properly utilizing indices.
For question 2:
If you have a multiple column index, the index has to have the same columns in the same order as the query to be used. So in your example, the index wouldn't be utilized.
EXPLAIN is a very powerful tool for understanding how queries use indices, and it's a good idea to use it frequently (especially on queries which are programmatically generated). http://dev.mysql.com/doc/refman/5.0/en/explain.html

There is no guarantee that MySQL will use two indexes on the same table in one query; in general, it won't. But sometimes it performs an "index merge," searching both indexes and combining the results.
Not all queries can do this, however. You should read about this feature here: http://dev.mysql.com/doc/refman/5.6/en/index-merge-optimization.html
Regarding multi-column indexes, if you have an index on columns A, B, C, D, and you do a search on columns A, B, D, then the index may be used, but only so far as it narrows down the search based on your conditions for columns A and B.
You can see evidence of this if you use EXPLAIN and look at the "key_len" field. The key_len will be the total number of bytes in the columns that are used in that multi-column index. For example, if A, B, C, D are four 4-byte integers, the key_len could be as much as 16. But if only A and B are used, the key_len will be 8.
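The key_len arithmetic can be sketched in a few lines of Python. This is a simplified model, not MySQL's actual calculation: it assumes fixed-width NOT NULL columns (MySQL adds a byte for each nullable column, and variable-length columns carry extra length bytes). The INDEX_COLUMNS list and the key_len helper are hypothetical names used only for illustration.

```python
# Hypothetical composite index (A, B, C, D) of four 4-byte NOT NULL integers.
INDEX_COLUMNS = [("A", 4), ("B", 4), ("C", 4), ("D", 4)]


def key_len(index_columns, columns_used):
    """Sum the byte widths of the leading `columns_used` index columns.

    Simplified model: ignores the extra byte MySQL adds per nullable
    column and the length bytes of variable-length columns.
    """
    return sum(width for _name, width in index_columns[:columns_used])


if __name__ == "__main__":
    print(key_len(INDEX_COLUMNS, 4))  # all four columns used -> 16
    print(key_len(INDEX_COLUMNS, 2))  # only A and B used -> 8
```

Comparing the key_len that EXPLAIN reports against these sums is a quick way to tell how many leading columns of the index a query really used.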

Given this query:
SELECT * FROM notes
WHERE id_column='123'
AND MATCH(text_column) AGAINST(search_text)
the only way the optimizer will perform it (to my knowledge) is to
1. Use FULLTEXT(text_column) to do the second part of the search, then
2. Filter out those without id_column='123'; no index will be used for this step.
That's the general rule when mixing FULLTEXT and non-fulltext indexes -- FULLTEXT first; no other indexes used.
However... Here is a trick that sometimes speeds up complex queries:
SELECT b.*
FROM (
SELECT id -- assuming this is the PRIMARY KEY
FROM notes
WHERE MATCH(text_column) AGAINST(search_text)
) AS a
JOIN notes AS b -- "self join"
ON b.id = a.id -- just the PK
JOIN ((other tables)) ON ...
WHERE ((other messy or bulky stuff)) ...
The idea is to use the subquery to condense down to a short list of small values (the ids), then reach back in (or further JOIN) to get the bulky stuff.
For building optimal composite indexes for some simple queries, see my index cookbook.

Related

Should I create separate MySQL indexes for url_title vs url_title, url_description, url_keywords?

Using MySQL 5.7, I have a table of urls containing url_title, url_description, url_keywords
Sometimes I just need to look in url_title, but sometimes look for something in all columns.
Is it better to just create one index containing all 3 columns or create a separate index for url_title alone and another index containing all 3 columns ?
E.g., will a search on url_title be slower in the 3-column index than in a single-column index?
Or can MySQL only search/read in given column even if index would contain 3 columns ?
Later edit: this is a sample query but I do have other less important variations:
SELECT *
FROM urls
WHERE match(url_title, url_description,
url_keywords, url_paragraphs)
against('red boots' IN BOOLEAN MODE)
LIMIT 500
Update: You didn't mention in your original post that you were talking about fulltext indexes, not conventional B-tree indexes.
Fulltext indexes are a different type. You must specify ALL the columns of the fulltext index in your MATCH() clause. No fewer, and no more, and they must be in the same order as they appear in the index definition.
If you sometimes want to do a fulltext search on only a single column, then you will have to create another fulltext index on that single column.
Below is my original answer, that I wrote before you clarified that you were using a fulltext index. Perhaps it will help someone else.
MySQL can use the index if the column(s) you search are the leftmost column(s) of that index. It can use a subset of the columns of a multi-column index.
For example, given an index on (a, b, c), the following query uses all three columns:
SELECT ... WHERE a = ? AND b = ? AND c = ?
The following query uses the first column a of the index, because it's the leftmost column.
SELECT ... WHERE a = ?
The following query uses the first two columns of the index, because they're consecutive and the leftmost subset of columns.
SELECT ... WHERE a = ? AND b = ?
The following query uses only the first column a of the index, because the conditions don't match consecutive columns of the index. It will use the index to narrow down the search to rows matching the a condition, but then it will have to examine each of those rows to evaluate the c condition, even though c is part of the same index.
SELECT ... WHERE a = ? AND c = ?
MySQL has an optimization called index condition pushdown which takes a shortcut for this. It delegates to the storage engine to evaluate the c condition, knowing that c is part of the index. So it still counts as examining the row, but it makes the row read a little bit less costly.
The following query cannot use the index at all, because the conditions are not on leftmost columns of that index.
SELECT ... WHERE b = ? AND c = ?
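The leftmost-prefix rule walked through above can be modeled as a small Python function. This is a toy model under stated assumptions: it only considers equality conditions, and it ignores everything else the real optimizer weighs (cost estimates, range conditions, index condition pushdown). The name usable_prefix is hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use.

    Walks the index left to right and stops at the first column that
    has no condition, because later columns cannot narrow the lookup.
    """
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # first gap ends the usable prefix
        used.append(column)
    return used


if __name__ == "__main__":
    index = ["a", "b", "c"]
    print(usable_prefix(index, {"a", "b", "c"}))  # ['a', 'b', 'c']
    print(usable_prefix(index, {"a"}))            # ['a']
    print(usable_prefix(index, {"a", "c"}))       # ['a']
    print(usable_prefix(index, {"b", "c"}))       # []
```

An empty result corresponds to the last case above, where the index cannot be used for the lookup at all.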
The guidelines for FULLTEXT indexes and MATCH...AGAINST are different than for INDEX. For this:
SELECT *
FROM urls
WHERE match(url_title, url_description,
url_keywords, url_paragraphs)
against('red boots' IN BOOLEAN MODE)
LIMIT 500
(and assuming ENGINE=InnoDB), you need a FULLTEXT index with all 4 columns in it.
FULLTEXT(url_title, url_description,
url_keywords, url_paragraphs)
If you might also be searching, say, just url_title in another query, then you would also need FULLTEXT(url_title). (Etc)
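The matching rule for FULLTEXT indexes, as stated in the answers here, can be sketched in Python: a MATCH() column list is served only by a FULLTEXT index with exactly the same column list. The index names ft_all and ft_title and the helper find_fulltext_index are hypothetical, and the strict same-order comparison follows the rule as described above rather than anything verified against a particular MySQL version.

```python
def find_fulltext_index(fulltext_indexes, match_columns):
    """Return the name of the FULLTEXT index whose column list equals
    `match_columns` exactly (same columns, same order), or None."""
    for name, columns in fulltext_indexes.items():
        if list(columns) == list(match_columns):
            return name
    return None


# Hypothetical index definitions for the urls table.
FULLTEXT_INDEXES = {
    "ft_all": ["url_title", "url_description", "url_keywords", "url_paragraphs"],
    "ft_title": ["url_title"],
}

if __name__ == "__main__":
    all_columns = ["url_title", "url_description", "url_keywords", "url_paragraphs"]
    print(find_fulltext_index(FULLTEXT_INDEXES, all_columns))    # ft_all
    print(find_fulltext_index(FULLTEXT_INDEXES, ["url_title"]))  # ft_title
    # A two-column MATCH() has no exactly matching index in this setup.
    print(find_fulltext_index(FULLTEXT_INDEXES, ["url_title", "url_keywords"]))  # None
```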
See if either of these would be 'better' for your application:
against('+red +boots' IN BOOLEAN MODE)
against('red boots')

How do I create one MySQL index for 2 SQL queries?

SELECT * FROM messages_messages WHERE (from_user_id=? AND to_user_id=?) OR (from_user_id=? AND to_user_id=?) ORDER BY created_at DESC
I have another query, which is this:
SELECT COUNT(*) FROM messages_messages WHERE from_user_id=? AND to_user_id=? AND read_at IS NULL
I want to index both of these queries, but I don't want to create 2 separate indexes.
Right now, I'm using 2 indexes:
[from_user_id, to_user_id, created_at]
[from_user_id, to_user_id, read_at]
I was wondering if I could do this with one index instead of 2?
These are the only 2 queries I have for this table.
The docs explain fairly completely how MySQL uses indices. In particular, its optimizer can use any left prefix of a multi-column index. Therefore, you could drop either of your two existing indices, and the other would be eligible for use in both queries, though it would be more selective / useful for one than for the other.
In principle, it could be more beneficial to keep your first index, provided that the created_at column was indexed in descending order. In practice, MySQL before version 8.0 accepts a DESC keyword in the index definition but ignores it and stores the index in ascending order. Therefore, having created_at in your index probably doesn't help very much.
No, you need both indexes for these two queries if you want to optimize fully.
Once you reach the column used for either sorting or range comparison (IS [NOT] NULL counts as a range predicate for this purpose), you don't get any benefit from putting more columns in the index. In other words, your index can have:
Some columns that are used in equality predicates
One column that is used either in a range predicate, or to avoid a filesort -- but not both.
Extra columns used in neither searching nor sorting, but only for the sake of a covering index.
So you cannot make a four-column index that serves both queries.
The only way you can reduce this to one index, as @JohnBollinger says, is to make an index that optimizes for one query, and uses a subset of the index for the second query. But that won't work as well.
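The column-ordering rule in this answer (equality columns first, then at most one range-or-sort column, then any covering extras) can be sketched as a small Python helper. This is only an illustration of the rule as stated; build_index is a hypothetical name, and the first query's OR is treated as two equality lookups on the same pair of columns.

```python
def build_index(equality_columns, range_or_sort_column=None, covering_columns=()):
    """Order index columns: equality columns, then one range/sort column,
    then extra columns added only to make the index covering."""
    columns = list(equality_columns)
    if range_or_sort_column is not None:
        columns.append(range_or_sort_column)
    # Append covering columns that are not already part of the index.
    columns.extend(c for c in covering_columns if c not in columns)
    return columns


if __name__ == "__main__":
    equality = ["from_user_id", "to_user_id"]
    # Query 1 sorts by created_at; query 2 filters on read_at IS NULL.
    print(build_index(equality, "created_at"))
    print(build_index(equality, "read_at"))
```

The two calls return different column lists, which is the point of the answer: each query wants a different column in the single range-or-sort slot.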

MySQL multiple index optimization

I have a question about optimizing sql queries with multiple index.
Imagine I have a table "TEST" with fields "A, B, C, D, E, F".
In my code (php), I use the following "WHERE" query :
Select (..) from TEST WHERE a = 'x' and B = 'y'
Select (..) from TEST WHERE a = 'x' and B = 'y' and F = 'z'
Select (..) from TEST WHERE a = 'x' and B = 'y' and (D = 'w' or F = 'z')
what is the best approach to get the best speed when running queries?
3 multiple Index like (A, B), (A, B, F) and (A, B, D, F)?
Or A single multiple index (A, B, D, F)?
I would tend to say that the 3 indexes would be best, even if the indexes take up more space in the database.
In my problem, I search the best execution time not the space.
The database being of a reasonable size.
Multiple-column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
In other words, it is a waste of space and computing power to define an index that covers the same first N columns as another index and in the same order.
The best way to examine an index is to experiment. Use EXPLAIN in MySQL; it shows the query plan and tells you which index is used. It also gives an estimate of how many rows the query will have to examine. Here is an example:
explain select * from TEST WHERE a = 'x' and B = 'y'
It is hard to give definitive answers without experiments.
BUT: ordinarily an index like (A,B,D) is considered to be superfluous if you have an index on (A,B,D,F). So, in my opinion you only need the one multicolumn index.
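The "superfluous index" idea can be checked mechanically: an index is redundant for lookups when its column list is a strict left prefix of another index. A minimal Python sketch, where redundant_indexes is a hypothetical helper and indexes are plain tuples of column names:

```python
def redundant_indexes(indexes):
    """Return the indexes whose column list is a strict left prefix
    of some other index in the list."""
    redundant = []
    for candidate in indexes:
        for other in indexes:
            is_shorter = len(candidate) < len(other)
            if is_shorter and other[:len(candidate)] == candidate:
                redundant.append(candidate)
                break  # one longer index is enough to make it redundant
    return redundant


if __name__ == "__main__":
    proposed = [("A", "B"), ("A", "B", "F"), ("A", "B", "D", "F")]
    print(redundant_indexes(proposed))  # [('A', 'B')]
```

Note that (A, B, F) is not reported: it is not a left prefix of (A, B, D, F), so whether to keep it depends on how much the F condition matters for the second query.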
There is one other consideration. If your table has a lot of columns and a lot of rows and your SELECT list has a small subset of those columns, you might consider including those columns in your index. For example, if your query says SELECT D,F,G,H FROM ... you should try creating an index on
(A,B,D,F,G,H)
as it will allow the query to be satisfied from the index without having to refer back to the rows of the table. This can sometimes help performance a great deal.
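Whether an index covers a query comes down to a set check: every column the query reads must be in the index. A minimal Python sketch, assuming the query touches only the columns listed; is_covering is a hypothetical helper, and the model ignores the fact that InnoDB secondary indexes also carry the primary key columns.

```python
def is_covering(index_columns, select_columns, where_columns):
    """True if every column the query reads is present in the index."""
    needed = set(select_columns) | set(where_columns)
    return needed <= set(index_columns)


if __name__ == "__main__":
    select_list = ["D", "F", "G", "H"]
    where_list = ["A", "B"]
    print(is_covering(["A", "B", "D", "F", "G", "H"], select_list, where_list))  # True
    print(is_covering(["A", "B", "D", "F"], select_list, where_list))            # False
```

When the check passes, EXPLAIN typically reports "Using index" in the Extra column for that table.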
It's hard to explain well, but generally you should use as few indexes as you can get away with, using as many columns of the common queries as you can, with the most commonly queried columns first.
In your example WHERE clauses, A and B are always included. These should thus be part of an index. If A is more commonly used in a search then list that first, if B is more commonly used then list that first. MySQL can partially use the index as long as each column (seen from the left) in the index is used in the WHERE clause. So if you have an index ( A, B, C ) then WHERE ( A = .. AND B = .. AND Z = .. ) can still use that index to narrow down the search. If you have a WHERE ( B = .. AND Z = .. ) clause then A isn't part of the search condition and that index can't be used.
You want the single multiple column index A, B, D, F OR A, B, F, D (only one of these at a time can be used), but which depends mostly on the number of times D or F are queried for, and the distribution of data. Say if most of the values in D are 0 but one in a hundred values are 1 then that column would have a poor key distribution and thus putting the index on that column wouldn't be all that useful.
The optimiser can use a composite index for where conditions that follow the order of the index with no gaps:
An index on (A,B,F) will cover the first two queries.
The last query is a bit trickier, because of the OR. I think only the A and B conditions will be covered by (A,B,F) but using a separate index (D) or index (F) may speed up the query depending on the cardinality of the rows.
I think an index on (A,B,D,F) can only be used for the A and B conditions on all three queries. Not the F condition on query two, because the D value in the index can be anything and not the D and F conditions because of the OR.
You may have to add hints to the query to get the optimiser to use the best index and you can see which indexes are being used by running an EXPLAIN ... on the query.
Also, adding indexes slows down DML statements and can cause locking issues, so it's best to avoid over-indexing where possible.

How can I avoid a full table scan on this mysql query?

explain
select
*
from
zipcode_distances z
inner join
venues v
on z.zipcode_to=v.zipcode
inner join
events e
on v.id=e.venue_id
where
z.zipcode_from='92108' and
z.distance <= 5
I'm trying to find all "events at venues within 5 miles of zipcode 92108", however, I am having a hard time optimizing this query.
Here is what the explain looks like:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, e, ALL, idx_venue_id, , , , 60024,
1, SIMPLE, v, eq_ref, PRIMARY,idx_zipcode, PRIMARY, 4, comedyworld.e.venue_id, 1,
1, SIMPLE, z, ref, idx_zip_from_distance,idx_zip_to_distance,idx_zip_from_to, idx_zip_from_to, 30, const,comedyworld.v.zipcode, 1, Using where; Using index
I'm getting a full table scan on the "e" table, and I can't figure out what index I need to create to get it to be fast.
Any advice would be appreciated
Thank you
Based on the EXPLAIN output in your question, you already have all the indexes the query should be using, namely:
CREATE INDEX idx_zip_from_distance
ON zipcode_distances (zipcode_from, distance, zipcode_to);
CREATE INDEX idx_zipcode ON venues (zipcode, id);
CREATE INDEX idx_venue_id ON events (venue_id);
(I'm not sure from your index names whether idx_zip_from_distance really includes the zipcode_to column. If not, you should add it to make it a covering index. Also, I've included the venues.id column in idx_zipcode for completeness, but, assuming it's the primary key for the table and that you're using InnoDB, it will be included automatically anyway.)
However, it looks like MySQL is choosing a different, and possibly suboptimal, query plan, where it scans through all events, finds their venues and zip codes, and only then filters the results on distance. This could be the optimal query plan, if the cardinality of the events table was low enough, but from the fact that you're asking this question I assume it's not.
One reason for the suboptimal query plan could be the fact that you have too many indexes which are confusing the planner. For instance, do you really need all three of those indexes on the zipcode table, given that the data it stores is presumably symmetric? Personally, I'd suggest only the index I described above, plus a unique index (which can also be the primary key, if you don't have an artificial one) on (zipcode_to, zipcode_from) (preferably in that order, so that any occasional queries on zipcode_to=? can make use of it).
However, based on some testing I did, I suspect the main issue why MySQL is choosing the wrong query plan comes simply down to the relative cardinalities of your tables. Presumably, your actual zipcode_distances table is huge, and MySQL isn't smart enough to realize quite how much the conditions in the WHERE clause really narrow it down.
If so, the best and simplest fix may be to simply force MySQL to use the indexes you want:
select
*
from
zipcode_distances z
FORCE INDEX (idx_zip_from_distance)
inner join
venues v
FORCE INDEX (idx_zipcode)
on z.zipcode_to=v.zipcode
inner join
events e
FORCE INDEX (idx_venue_id)
on v.id=e.venue_id
where
z.zipcode_from='92108' and
z.distance <= 5
With that query, you should indeed get the desired query plan. (You do need FORCE INDEX here, since with just USE INDEX the query planner could still decide to use a table scan instead of the suggested index, defeating the purpose. I had this happen when I first tested this.)
Ps. Here's a demo on SQLize, both with and without FORCE INDEX, demonstrating the issue.
Have you indexed the join columns in both tables?
e.venue_id and v.id
If not, create indexes in both tables. If you already have, it could be that one or more tables have few records and the optimizer detects that it is more efficient to perform a full scan rather than an indexed read.
You could use a subquery:
select * from zipcode_distances z, venues v, events e
where
z.id in (select zd.id from zipcode_distances zd where zd.zipcode_from='92108' and zd.distance <= 5) -- assumes zipcode_distances has an id column
and z.zipcode_to=v.zipcode
and v.id=e.venue_id
You are selecting all columns from all tables (select *) so there is little point in the optimizer using an index when the query engine will then have to do a lookup from the index to the table on every single row.

Best possible indexing strategy for MySQL DB

I have a table with 5 columns,say - A(Primary key), B, C, D and E.
This table has almost 150k rows and there are no indices on this table. As expected the select queries are very slow.
These queries are generated by the user search requests so he can enter values in any of the fields (B, C, D and E) and these are 'IN' kind of queries. I am not sure what should be the good indexing strategy here - having indexes on each of these columns or have them in some combinations.
Selectivity of each of these columns is the same (around 50).
Any help would be appreciated.
Are you running the same query regardless of what the user gives you? In that case, that query should tell you what indexes to use.
For example, if your query might look like
SELECT * FROM mytable WHERE
B IN (...) AND
C IN (...) AND
D IN (...) AND
E IN (...)
In this case, where you restrict on all four searchable columns, a combined index on B, C, D and E would probably be ok.
Otherwise, create one index per column, or combine columns that you always restrict on together in separate indexes.
Remember that if you have a combined index on e.g. B and C, then a query that does not restrict on B will not use that combined index.
If you can group two columns in one index, that would be okay. Having an index on each column is not so bad as long as you don't query a Cartesian product like a cross join. But better not to.