MySQL: multiple indexes advice

I have a single table that I need to query on four columns: a, b, c, d.
The most common query will be a SELECT on all four columns at once, but I also need to search quickly on each column separately and on combinations of them (e.g. a&b, a&d, b&c&d and so on).
Should I create an index for every combination? Or is it better to have one index for a&b&c&d plus one each for a, b, c and d? In the latter case, will a query that matches only a&b, for example, be sped up because both a and b have an index?

If you want to satisfy all the combinations with an index, you need the following:
(a, b, c, d)
(a, b, d)
(a, c, d)
(a, d)
(b, c, d)
(b, d)
(c, d)
d
You don't need other combinations because any prefix of an index is also an index. The first index will be used for queries that test just a, a&b, a&b&c, so you don't need indexes for those combinations.
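The prefix rule above can be checked mechanically. The following is a minimal Python sketch, not anything MySQL itself runs: the helper names `covering_index` and `uncovered` are made up for illustration, and the model only considers equality tests. It confirms that the eight indexes listed serve every non-empty combination of a, b, c, d.

```python
from itertools import combinations

# The eight indexes from the list above, leftmost column first.
INDEXES = [
    ("a", "b", "c", "d"),
    ("a", "b", "d"),
    ("a", "c", "d"),
    ("a", "d"),
    ("b", "c", "d"),
    ("b", "d"),
    ("c", "d"),
    ("d",),
]

def covering_index(columns, indexes):
    """Return the first index whose leading columns are exactly `columns`."""
    wanted = set(columns)
    for index in indexes:
        # A prefix of an index behaves like an index on those columns.
        if set(index[:len(wanted)]) == wanted:
            return index
    return None

def uncovered(all_columns, indexes):
    """List every non-empty column combination that no index prefix serves."""
    missing = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            if covering_index(combo, indexes) is None:
                missing.append(combo)
    return missing

print(uncovered("abcd", INDEXES))             # []
print(covering_index(("a", "b"), INDEXES))    # ('a', 'b', 'c', 'd')
```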
Whether you really need all these indexes depends on how much data you have. It's possible that just having indexes on each column will narrow down the search sufficiently that you don't need indexes on the combinations. The only real way to tell is by benchmarking the performance of your applications. The indexes take up disk space and memory, so trying to create all possible indexes can cause problems of its own; you need to determine if the need is strong enough.

One thing to note is that a "range" is only useful as the last item in an index:
WHERE x=2 AND y>5 -- INDEX(x,y) is useful; INDEX(y,x) only uses `y`
WHERE x=2 AND y BETWEEN 11 AND 22 -- ditto
WHERE x=2 AND s LIKE 'foo%' -- ditto
Another thing: "flags" (true/false, etc) are useless to index by themselves. They can be somewhat useful in combination:
WHERE published=1 AND ...
Also, order matters in the INDEX, but not in the WHERE: Suppose you have INDEX(a,b):
WHERE a=1 AND b=2 -- good index
WHERE b=2 AND a=1 -- equally good
WHERE a=1 -- the index is good
WHERE b=2 -- the index is useless
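The rules so far (equality tests first, at most one trailing range, order of the WHERE irrelevant) can be written as a small Python model. This is a simplified sketch and not how the MySQL optimizer is implemented; `usable_key_parts` is a hypothetical helper name.

```python
def usable_key_parts(index, equality, ranges=()):
    """Return the leading index columns that a WHERE clause can make use of."""
    used = []
    for column in index:
        if column in equality:
            used.append(column)   # equality test: keep walking to the right
        elif column in ranges:
            used.append(column)   # a range is usable, but nothing after it is
            break
        else:
            break                 # column not tested: the rest is unreachable
    return used

# WHERE x=2 AND y>5
print(usable_key_parts(("x", "y"), equality={"x"}, ranges={"y"}))  # ['x', 'y']
print(usable_key_parts(("y", "x"), equality={"x"}, ranges={"y"}))  # ['y']
# WHERE b=2 with INDEX(a,b)
print(usable_key_parts(("a", "b"), equality={"b"}))                # []
```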
If some column is always a range (such as a date), it gets messier. For optimal indexing two indexes are needed here:
WHERE d BETWEEN ... -- needs INDEX(d)
WHERE a=1 AND d BETWEEN ... -- needs INDEX(a,d)
So, I might do these:
Make all 2-column combinations of a,b,c,d -- This would be 6 combinations if nothing is involved in "ranges". I would be sure to vary which col starts the indexes: ab, bc, cd, da, ac, db
Turn on the slowlog to see what is not being well indexed.
Log the actual combinations that people use. Some combinations will be very rarely used. Get rid of the indexes that are useless.
More on understanding index creation.
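As a quick sanity check of the six two-column indexes suggested above, this short Python sketch (illustrative only) verifies that every pair of columns appears once and counts how often each column leads an index.

```python
from itertools import combinations

columns = "abcd"
# The six two-column indexes suggested above, written leading column first.
suggested = ["ab", "bc", "cd", "da", "ac", "db"]

# Every unordered pair of columns should appear exactly once.
all_pairs = {frozenset(pair) for pair in combinations(columns, 2)}
suggested_pairs = [frozenset(index) for index in suggested]
covers_every_pair = (
    set(suggested_pairs) == all_pairs and len(suggested_pairs) == len(all_pairs)
)

# Count how often each column is the leading column of an index.
leading = {column: sum(index[0] == column for index in suggested) for column in columns}

print(covers_every_pair)  # True
print(leading)            # {'a': 2, 'b': 1, 'c': 1, 'd': 2}
```

Because every column leads at least one index, a query on any single column can also use one of these indexes through its prefix.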

Related

MySQL-select query is very slow when there are a few results

I have table1 with 1M rows in my db.
columns: {id, name, timestamp, tag, r, g, b}
indexes: {primary: id, index: timestamp, index: (tag,r,g,b)}
Each row has a tag (an integer) and a color, which is stored as its components (r, g, b) in separate columns. My queries are supposed to look like:
SELECT * from table1 WHERE tag=... AND (r>... AND r<... AND g>... AND g<... AND b>... AND b<...) ORDER BY timestamp DESC LIMIT 24;
The problem is that when there are only a few records in the db for the selected filters (tag and color), the query is very slow (15 seconds). Notably, when I remove ORDER BY timestamp DESC from the query it runs very fast, even if there are only a few results. How can I solve this and make the query fast?
I'm not sure what you mean by "few", but 15 seconds seems like a long time.
You want an index on this query, on (tag, r, g, b).
That said, this is not an optimal index; or more precisely, it is about as optimal as you can get in MySQL. The real type of index that you want is an R-Tree, which is optimized for ranges on different dimensions. The primary use-case is GIS (geographic information systems).
However, I don't think that MySQL supports R-Trees as a generic index type. Hopefully, tag is highly selective and the above index will work well.
INDEX(tag, timestamp)
may help somewhat.
The general problem is that the Optimizer sees two semi-useful indexes but has inadequate clues as to which one to pick. And then it picks the less beneficial one.
Adding these may help when you have a relatively narrow choice for g or b:
INDEX(tag, g)
INDEX(tag, b)
Unfortunately you have 4 "ranges" in the query (timestamp, r, g, b) and the Optimizer can use only one. I stuck tag in front of each (including your extant (tag, r, g, b), which won't get beyond r).
= tests should go first; the index can end with one range; any subsequent range test (g,b, in your case) will be ignored in the index.
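The pattern behind the suggested indexes (equality columns first, then exactly one range column) can be spelled out with a small Python sketch. The function name `candidate_indexes` is hypothetical, and the output is only a list of candidates to try; which one actually helps has to be checked with EXPLAIN.

```python
def candidate_indexes(equality_columns, range_columns):
    """Pair the equality columns with each range column in turn."""
    prefix = tuple(equality_columns)
    return [prefix + (column,) for column in range_columns]

# tag is tested with "="; timestamp, r, g and b are the four "ranges".
for index in candidate_indexes(["tag"], ["timestamp", "r", "g", "b"]):
    print("INDEX(" + ", ".join(index) + ")")
# INDEX(tag, timestamp)
# INDEX(tag, r)
# INDEX(tag, g)
# INDEX(tag, b)
```

The (tag, r) candidate is what the existing (tag, r, g, b) index already provides, since it cannot be used beyond r.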

MySQL multiple index optimization

I have a question about optimizing sql queries with multiple index.
Imagine I have a table "TEST" with fields "A, B, C, D, E, F".
In my code (php), I use the following "WHERE" query :
Select (..) from TEST WHERE a = 'x' and B = 'y'
Select (..) from TEST WHERE a = 'x' and B = 'y' and F = 'z'
Select (..) from TEST WHERE a = 'x' and B = 'y' and (D = 'w' or F = 'z')
what is the best approach to get the best speed when running queries?
Three composite indexes such as (A, B), (A, B, F) and (A, B, D, F)?
Or a single composite index (A, B, D, F)?
I would tend to say that the three indexes would be best, even though the indexes will take up more space in the database.
For my problem I am looking for the best execution time, not the smallest space.
The database being of a reasonable size.
Multiple-column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
In other words, it is a waste of space and computing power to define an index that covers the same first N columns as another index and in the same order.
The best way to examine an index is to try it out. Use EXPLAIN in MySQL: it gives you the query plan and tells you which index the query will use. It also gives an estimate of how many rows the query expects to examine. Here is an example:
explain select * from TEST WHERE a = 'x' and B = 'y'
It is hard to give definitive answers without experiments.
BUT: ordinarily an index like (A,B,D) is considered to be superfluous if you have an index on (A,B,D,F). So, in my opinion you only need the one multicolumn index.
There is one other consideration. If your table has a lot of columns and a lot of rows and your SELECT list has a small subset of those columns, you might consider including those columns in your index. For example, if your query says SELECT D,F,G,H FROM ... you should try creating an index on
(A,B,D,F,G,H)
as it will allow the query to be satisfied from the index without having to refer back to the rows of the table. This can sometimes help performance a great deal.
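Whether an index can satisfy a query on its own comes down to a set comparison, sketched below in Python. This is an illustration of the idea only; `is_covering` is a made-up helper, and the column G, H example follows the SELECT D,F,G,H case above with A and B assumed to be the WHERE columns.

```python
def is_covering(index, where_columns, select_columns):
    """True when every column the query touches is stored in the index."""
    needed = set(where_columns) | set(select_columns)
    return needed <= set(index)

where_columns = ["A", "B"]
select_columns = ["D", "F", "G", "H"]

print(is_covering(("A", "B", "D", "F", "G", "H"), where_columns, select_columns))  # True
print(is_covering(("A", "B", "D", "F"), where_columns, select_columns))            # False
```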
It's hard to explain well, but generally you should use as few indexes as you can get away with, using as many columns of the common queries as you can, with the most commonly queried columns first.
In your example WHERE clauses, A and B are always included. These should thus be part of an index. If A is more commonly used in a search then list that first, if B is more commonly used then list that first. MySQL can partially use the index as long as each column (seen from the left) in the index is used in the WHERE clause. So if you have an index ( A, B, C ) then WHERE ( A = .. AND B = .. AND Z = .. ) can still use that index to narrow down the search. If you have a WHERE ( B = .. AND Z = .. ) clause then A isn't part of the search condition, so that index can't be used.
You want the single multiple column index A, B, D, F OR A, B, F, D (only one of these at a time can be used), but which depends mostly on the number of times D or F are queried for, and the distribution of data. Say if most of the values in D are 0 but one in a hundred values are 1 then that column would have a poor key distribution and thus putting the index on that column wouldn't be all that useful.
The optimiser can use a composite index for where conditions that follow the order of the index with no gaps:
An index on (A,B,F) will cover the first two queries.
The last query is a bit trickier, because of the OR. I think only the A and B conditions will be covered by (A,B,F) but using a separate index (D) or index (F) may speed up the query depending on the cardinality of the rows.
I think an index on (A,B,D,F) can only be used for the A and B conditions on all three queries. Not the F condition on query two, because the D value in the index can be anything and not the D and F conditions because of the OR.
You may have to add hints to the query to get the optimiser to use the best index and you can see which indexes are being used by running an EXPLAIN ... on the query.
Also, adding indexes slows down DML statements and can cause locking issues, so it's best to avoid over-indexing where possible.

Indexes in MySQL

I've only started using INDEXes in my MySQL database and I'm a little unsure if what I have in mind will work. I have a TEXT field that can store a large body of text and will need to be searched, along with another id INT field. If I have an INDEX on say my id_column field and a FULLTEXT index on my text_column, will MySQL use both in a query such as
SELECT * FROM notes WHERE id_column='123' AND MATCH(text_column) AGAINST(search_text)
??
Secondly, I have a group of columns that can be used frequently for searching in combination together. If I create a multi-column INDEX on these columns, the index will still work if the columns used are together left-to-right in the index. But what happens if the user leaves out a particular column, say C, and searches using A, B, D in an index like (A, B, C, D)?
For question 1:
Yes, the query will use both indices. FULLTEXT indices can be kind of tricky, however, so it's a good idea to read the MySQL documentation thoroughly on them and use EXPLAIN on your queries to make sure they are properly utilizing indices.
For question 2:
If you have a multiple column index, the index has to have the same columns in the same order as the query to be used. So in your example, the index wouldn't be utilized.
EXPLAIN is a very powerful tool for understanding how queries use indices, and it's a good idea to use it frequently (especially on queries which are programmatically generated). http://dev.mysql.com/doc/refman/5.0/en/explain.html
There is no guarantee that MySQL will use two indexes for the same table in one query. In general, it won't. But sometimes it activates an "index merge," searching both indexes and combining the results.
Not all queries can do this, however. You should read about this feature here: http://dev.mysql.com/doc/refman/5.6/en/index-merge-optimization.html
Regarding multi-column indexes, if you have an index on columns A, B, C, D, and you do a search on columns A, B, D, then the index may be used, but only so far as it narrows down the search based on your conditions for columns A and B.
You can see evidence of this if you use EXPLAIN and look at the "key_len" field. The key_len will be the total number of bytes in the columns that are used in that multi-column index. For example, if A, B, C, D are four 4-byte integers, the key_len could be as much as 16. But if only A and B are used, the key_len will be 8.
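The key_len arithmetic can be reproduced with a few lines of Python. This is a sketch under the assumption that all four columns are NOT NULL 4-byte integers; MySQL reports one extra byte for each nullable column, which this toy `key_len` helper does not model.

```python
# Assumed storage size in bytes for each indexed column (4-byte INT, NOT NULL).
COLUMN_BYTES = {"A": 4, "B": 4, "C": 4, "D": 4}

def key_len(used_columns, column_bytes):
    """Sum the byte widths of the index columns the query actually uses."""
    return sum(column_bytes[column] for column in used_columns)

print(key_len(["A", "B", "C", "D"], COLUMN_BYTES))  # 16
print(key_len(["A", "B"], COLUMN_BYTES))            # 8
```

Comparing the key_len that EXPLAIN reports against numbers like these shows how many leading columns of the index were really used.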
Given this query:
SELECT * FROM notes
WHERE id_column='123'
AND MATCH(text_column) AGAINST(search_text)
the only way the optimizer will perform it (to my knowledge) is to
Use FULLTEXT(text_column) to do the second part of the search, then
Filter out those without id_column='123'; no index will be used for this step.
That's the general rule when mixing FULLTEXT and non-fulltext indexes -- FULLTEXT first; no other indexes used.
However... Here is a trick that sometimes speeds up complex queries:
SELECT b.*
FROM (
SELECT id -- assuming this is the PRIMARY KEY
FROM notes
WHERE MATCH(text_column) AGAINST(search_text)
) AS a
JOIN notes AS b -- "self join"
ON b.id = a.id -- just the PK
JOIN ((other tables)) ON ...
WHERE ((other messy or bulky stuff)) ...
The idea is to use the subquery to condense down to a short list of small values (the ids), then reach back in (or further JOIN) to get the bulky stuff.
For building optimal composite indexes for some simple queries, see my index cookbook.

Best possible indexing strategy for MySQL DB

I have a table with 5 columns,say - A(Primary key), B, C, D and E.
This table has almost 150k rows and there are no indices on this table. As expected the select queries are very slow.
These queries are generated by the user search requests so he can enter values in any of the fields (B, C, D and E) and these are 'IN' kind of queries. I am not sure what should be the good indexing strategy here - having indexes on each of these columns or have them in some combinations.
Selectivity of each of these columns is the same (around 50).
Any help would be appreciated.
Are you running the same query regardless of what the user gives you? In that case, that query should tell you what indexes to use.
For example, if your query might look like
SELECT * FROM mytable WHERE
B IN (...) AND
C IN (...) AND
D IN (...) AND
E IN (...)
In this case, where you restrict on all columns, a combined index with all four columns (B, C, D, E) would probably be ok.
Otherwise, create one index per column, or combine columns that you always restrict on together in separate indexes.
Remember that if you have a combined index on e.g. B and C, then a query that does not restrict on B will not use that combined index.
If you can group two columns in one index, that would be okay. Having an index on each column is not so bad as long as you don't query a Cartesian product such as a cross join, though it is better to avoid that too.

Is there a better way to index multiple columns than creating an index for each permutation?

Suppose I have a database table with columns a, b, and c. I plan on doing queries on all three columns, but I'm not sure which columns in particular I'm querying. There's enough rows in the table that an index immensely speeds up the search, but it feels wrong to make all the permutations of possible indexes (like this):
a
b
c
a, b
a, c
b, c
a, b, c
Is there a better way to handle this problem? (It's very possible that I'll be just fine indexing a, b, c alone, since this will cut down on the number of rows quickly, but I'm wondering if there's a better way.)
If you need more concrete examples, in the real-life data, the columns are city, state, and zip code. Also, I'm using a MySQL database.
In MS SQL the index "a, b, c" will cover you for scenarios "a"; "a, b"; and "a, b, c". So you would only need the following indexes:
a, b, c
b, c
c
Not sure if MySQL works the same way, but I would assume so.
To use indexes for all possible equality conditions on N columns, you will need C([N/2], N) indexes, that is N! / ([N/2]! * (N - [N/2])!)
See this article in my blog for detailed explanations:
Creating indexes
You can also read the strict mathematical proof by Russian mathematician Egor Timoshenko (update: now in English).
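The count given by the formula above grows quickly with the number of columns. A short Python sketch using the standard library's `math.comb` (available from Python 3.8) tabulates it; `minimum_indexes` is just an illustrative name.

```python
from math import comb

def minimum_indexes(n):
    """C(n, floor(n/2)): indexes needed to serve every equality combination."""
    return comb(n, n // 2)

for n in range(1, 7):
    print(n, minimum_indexes(n))
# 1 1
# 2 2
# 3 3
# 4 6
# 5 10
# 6 20
```

For the three columns in the question this gives 3 indexes rather than the 7 listed there.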
One can, however, get decent performance with fewer indexes using the following techniques:
Index merging
If the columns col1, col2 and col3 are selective, then this query
SELECT *
FROM mytable
WHERE col1 = :value1
AND col2 = :value2
AND col3 = :value3
can use three separate indexes on col1, col2 and col3, select the ROWIDs that match each condition separately and then find their intersection, like in:
SELECT *
FROM (
SELECT rowid
FROM mytable
WHERE col1 = :value1
INTERSECT
SELECT rowid
FROM mytable
WHERE col2 = :value2
INTERSECT
SELECT rowid
FROM mytable
WHERE col3 = :value3
) mo
JOIN mytable mi
ON mi.rowid = mo.rowid
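The same intersection idea can be mimicked in Python with one dictionary per column standing in for each single-column index. This is a toy model with made-up sample rows, not a description of how any particular database engine implements index merging.

```python
def build_index(rows, column):
    """Map each value of `column` to the set of rowids that hold it."""
    index = {}
    for rowid, row in rows.items():
        index.setdefault(row[column], set()).add(rowid)
    return index

# Hypothetical table contents, keyed by rowid.
rows = {
    1: {"col1": "x", "col2": "y", "col3": "z"},
    2: {"col1": "x", "col2": "y", "col3": "q"},
    3: {"col1": "x", "col2": "n", "col3": "z"},
    4: {"col1": "m", "col2": "y", "col3": "z"},
    5: {"col1": "x", "col2": "y", "col3": "z"},
}
indexes = {column: build_index(rows, column) for column in ("col1", "col2", "col3")}

def intersect_lookup(indexes, conditions):
    """Look up each condition in its own index, then intersect the rowid sets."""
    matches = [indexes[column].get(value, set()) for column, value in conditions.items()]
    return sorted(set.intersection(*matches))

print(intersect_lookup(indexes, {"col1": "x", "col2": "y", "col3": "z"}))  # [1, 5]
```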
Bitmap indexing
PostgreSQL can build temporary bitmap indexes in memory right during the query.
A bitmap index is quite a compact contiguous bit array.
Each bit set in the array indicates that the corresponding tid should be selected from the table.
Such an index can take but 128M of temporary storage for a table with 1G rows.
The following query:
SELECT *
FROM mytable
WHERE col1 = :value1
AND col2 = :value2
AND col3 = :value3
will first allocate a zero-filled bitmap large enough to cover all possible tid's in the table (that is large enough to take all tid's from (0, 0) to the last tid, not taking missing tid's into account).
Then it will seek the first index, setting the bits to 1 if they satisfy the first condition.
Then it will scan the second index, AND'ing the bits that satisfy the second condition with a 1. This will leave 1 only for those bits that satisfy both conditions.
Same for the third index.
Finally, it will just select rows with the tid's corresponding to the bits set.
The tid's will be fetched sequentially, so it's very efficient.
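The bitmap steps described above can be imitated with Python integers used as bit arrays. This is a simplified sketch: real tids are (page, offset) pairs rather than small integers, the sample tid lists are invented, and it builds one bitmap per index before AND'ing them, which yields the same result as updating a single bitmap in place.

```python
def bitmap_for(tids):
    """Build a bitmap with bit number `tid` set for every matching row."""
    bits = 0
    for tid in tids:
        bits |= 1 << tid
    return bits

def tids_in(bitmap):
    """Return the positions of the set bits in ascending order."""
    tids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            tids.append(position)
        bitmap >>= 1
        position += 1
    return tids

# Hypothetical tids returned by the three index scans.
first = bitmap_for([0, 2, 3, 5, 7])
second = bitmap_for([2, 3, 4, 7])
third = bitmap_for([1, 3, 7])

combined = first & second & third   # only rows matching all three conditions
print(tids_in(combined))            # [3, 7]

# Size of a bitmap for 2**30 rows, in MiB (one bit per row).
print((1 << 30) // 8 // 2**20)      # 128
```

The last line matches the 128M figure quoted above if "1G rows" is read as 2^30 rows.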
The more indexes you create, the more your performance will suffer during insert, update and delete operations, because each affected index has to be updated as well.
Yes, you can use multiple-column indexes. Something like
CREATE TABLE temp (
id INT NOT NULL,
a INT NULL,
b INT NULL,
c INT NULL,
PRIMARY KEY (id),
INDEX ind1 (a,b,c),
INDEX ind2 (a,b)
);
This type of index i.e. ind1 will surely help you in queries like
SELECT * FROM temp WHERE a=2 AND b=3 AND c=4;
Similarly, ind2 will help you in queries like
SELECT * FROM temp WHERE a=2 AND b=3;
But these indexes won't be used if the query is something like
SELECT * FROM temp WHERE a=2 OR b=3 OR c=4;
Here you will need separate indexes on a, b, and c.
So instead of having so many indexes, I would agree with what John said i.e. have indexes on a,b,c and if you feel that your workload covers more multi-column queries then you can switch to multi-column indexes.
cheers
Given that your columns are actually City, State and Zip Code, I would suggest just the following indexes:
INDEX(ZipCode)
If I am correct, Zip Codes are not duplicated across the USA, so it's pointless adding City or State information to the index as well, because they will have the same value for every record with a given Zip Code. E.g., 90210 is always Beverly Hills, CA.
INDEX(City(5)) or INDEX(City(5), State)
This is just an index on the first five letters of the city name. In many cases, this will be specific enough that having the State indexed wouldn't provide any useful filtering. E.g., 'Los A' will almost certainly be records from Los Angeles, CA. Maybe there is another small town in the USA starting with 'Los A', but there will be so few records it's not worth cluttering the index with State data as well. On the other hand, some city names appear in many states (Springfield comes to mind), so in those cases it is better to have the State indexed as well. You will need to figure out for yourself which index is most suited to your set of data. If in doubt, I would go with the second index (City and State).
INDEX(State, sort_field)
State is a pretty broad index (quite possibly NY and CA alone will have 30% of the records). If you plan on displaying this information to the user, say, 30 records at a time, then you would have a query ending in
... WHERE STATE = "NY"
ORDER BY <sort_field>
LIMIT <number>, 30
To make that query efficient, you need to include the sorting column in the State index. So if you're showing pages ordered by Last Name (presuming you have that column), then you would use INDEX(State, LastName(3)), otherwise MySQL has to sort all of the 'NY' records before it can give you the 30 you want.
It depends on your SQL query.
index (a, b, c) is different to index(b, c, a) or index(a, c, b)