I have a simple query that selects a specific time range of data and then groups it by a column.
The SQL looks like:
SELECT
    C,
    sum(X),
    sum(Y)
FROM
    table
WHERE
    A = ${id} AND
    B BETWEEN '2021-08-01' AND '2021-08-02'
GROUP BY
    C;
B's data type is date.
C's data type is varchar.
X, Y's data type is bigint.
The index is (A, B, C).
When I use EXPLAIN, in the extra column: Using where; Using index; Using temporary; Using filesort. Key column: (A, B, C).
I guess the index stops at B, since B is a range condition.
Is there any other way to optimize this query? Thanks a lot.
The columns, in optimal order for the INDEX, are
A -- Column(s) being tested in WHERE with =
B -- One column in WHERE as a range
No more columns are worth adding to the INDEX
Exception: If all the columns used anywhere in the SELECT are in the index, then the index is "covering". This gives a slight performance boost because it won't have to bounce between the INDEX's BTree and the data's BTree.
INDEX(A, B, -- first, and in this order (as above)
C, X, Y) -- in any order
I doubt if the "covering" index is worth using in that query.
Another possibility
A -- all the = columns
C -- the GROUP BY columns
This would avoid the sorting that may be necessary for GROUP BY. So...
INDEX(A, C), or
INDEX(A, C, B, X, Y) -- covering
I'm surprised that your EXPLAIN said "Using index". That means that the index is covering, yet your (A,B,C) is not.
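As a rough sketch, the two index options above could be tried side by side like this. The table name t and the index names are placeholders (the question only calls the table "table"), and A = 1 stands in for ${id}. Which index wins depends on the data, so compare the EXPLAIN output and actual timings rather than assuming either one is better.

```sql
-- Hypothetical names: table `t`, indexes `idx_a_b` and `idx_a_c_b_x_y`.
ALTER TABLE t
    ADD INDEX idx_a_b (A, B),                 -- "=" column first, then the range column
    ADD INDEX idx_a_c_b_x_y (A, C, B, X, Y);  -- "=" column, GROUP BY column, rest for covering

-- FORCE INDEX is used here only to compare the two plans.
EXPLAIN
SELECT C, SUM(X), SUM(Y)
FROM t FORCE INDEX (idx_a_b)
WHERE A = 1
  AND B BETWEEN '2021-08-01' AND '2021-08-02'
GROUP BY C;

EXPLAIN
SELECT C, SUM(X), SUM(Y)
FROM t FORCE INDEX (idx_a_c_b_x_y)
WHERE A = 1
  AND B BETWEEN '2021-08-01' AND '2021-08-02'
GROUP BY C;
```

Once the comparison is done, drop the index that loses and remove the FORCE INDEX hints so the optimizer can choose freely.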
More on building the optimal index: Index Cookbook
Another technique... Often SUMs (and COUNTs) like that come along with "Data Warehouse" schemas. A very good speedup is to build and maintain a Summary table of, say, hourly subtotals. Then, the SELECT SUMs the subtotals to get the total. More discussion in Summary Tables
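A minimal sketch of the summary-table idea, assuming daily rather than hourly subtotals because B is a DATE here. The names t and summary_daily, the INT type for A and the VARCHAR length for C are all assumptions for illustration; adjust them to the real schema.

```sql
-- Hypothetical summary table: one row per (A, B, C) with precomputed subtotals.
CREATE TABLE summary_daily (
    A     INT         NOT NULL,   -- assumed type; the question does not state it
    B     DATE        NOT NULL,
    C     VARCHAR(64) NOT NULL,   -- assumed length
    sum_x BIGINT      NOT NULL,
    sum_y BIGINT      NOT NULL,
    PRIMARY KEY (A, B, C)
);

-- Maintenance step (for example, run once per day for the day just finished).
-- VALUES() in this position is deprecated in recent MySQL 8.0 releases but still accepted.
INSERT INTO summary_daily (A, B, C, sum_x, sum_y)
SELECT A, B, C, SUM(X), SUM(Y)
FROM t
WHERE B = '2021-08-01'
GROUP BY A, B, C
ON DUPLICATE KEY UPDATE
    sum_x = VALUES(sum_x),
    sum_y = VALUES(sum_y);

-- The report then sums the subtotals instead of the raw rows.
SELECT C, SUM(sum_x), SUM(sum_y)
FROM summary_daily
WHERE A = 1
  AND B BETWEEN '2021-08-01' AND '2021-08-02'
GROUP BY C;
```

The benefit grows with the number of raw rows per (A, B, C) combination; if there is only about one row per combination, the summary table saves little.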
"Using filesort" may actually happen in RAM. (Don't be put off by "file".)
Related
I have table1 with 1M rows in my db.
columns: {id, name, timestamp, tag, r, g, b}
indexes: {primary: id, index: timestamp, index: (tag,r,g,b)}
Each row has a tag (an integer) and a color, which is stored as its components (r, g, b) in separate columns. My queries are supposed to look like:
SELECT * from table1 WHERE tag=... AND (r>... AND r<... AND g>... AND g<... AND b>... AND b<...) ORDER BY timestamp DESC LIMIT 24;
The problem is that when only a few records in the db match the selected filters (tag and color), the query is very slow (15 seconds). Notably, when I remove ORDER BY timestamp DESC from the query, it runs very fast, even if there are only a few results. How can I solve this and make the query fast?
I'm not sure what you mean by "few", but 15 seconds seems like a long time.
You want an index on this query, on (tag, r, g, b).
That said, this is not an optimal index; or more precisely, it is about as optimal as you can get in MySQL. The real type of index that you want is an R-Tree, which is optimized for ranges on different dimensions. The primary use-case is GIS (geographic information systems).
However, I don't think that MySQL supports R-Trees as a generic index type. Hopefully, tag is highly selective and the above index will work well.
INDEX(tag, timestamp)
may help somewhat.
The general problem is that the Optimizer sees two semi-useful indexes but has inadequate clues as to which one to pick. And then it picks the less beneficial one.
Adding these may help when you have a relatively narrow choice for g or b:
INDEX(tag, g)
INDEX(tag, b)
Unfortunately you have 4 "ranges" in the WHERE clause (timestamp, r, g, b) and the Optimizer can use only one. I stuck tag in front of each (including your extant (tag, r, g, b), which won't get beyond r).
= tests should go first; the index can end with one range; any subsequent range test (g,b, in your case) will be ignored in the index.
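The suggested indexes could be added and checked roughly as follows. The index names and the literal filter values are made up for illustration; only the table and column names come from the question.

```sql
-- Each index starts with the "=" column (tag) and ends with one range/sort column.
ALTER TABLE table1
    ADD INDEX idx_tag_ts (tag, `timestamp`),  -- lets ORDER BY timestamp use the index
    ADD INDEX idx_tag_g  (tag, g),
    ADD INDEX idx_tag_b  (tag, b);

-- See which index the Optimizer picks for a typical query (placeholder values).
EXPLAIN
SELECT *
FROM table1
WHERE tag = 5
  AND r > 10 AND r < 60
  AND g > 10 AND g < 60
  AND b > 10 AND b < 60
ORDER BY `timestamp` DESC
LIMIT 24;
```

Run the EXPLAIN with both a common and a rare tag/color combination, since the slow case in the question is the one with few matching rows.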
Suppose you have a table with the following columns:
id
date
col1
I would like to be able to query this table with a specific id and date, and also order by another column. For example,
SELECT * FROM TABLE WHERE id = ? AND date > ? ORDER BY col1 DESC
According to this range documentation, an index will stop being used after it hits the > operator. But according to this order by documentation, an index can only be used to optimize the order by clause if it is ordering by the last column in the index. Is it possible to get an indexed lookup on every part of this query, or can you only get 2 of the 3? Can I do any better than index (id, date)?
Plan A: INDEX(id, date) -- works best when it filters out a lot of rows, making the subsequent "filesort" not very costly.
Plan B: INDEX(col1), which may work best if very few rows are filtered by the WHERE clause. This avoids the filesort, but is not necessarily faster than the other choices here.
Plan C: INDEX(id, date, col1) -- This is a "covering" index if the query does not reference any other fields. The potential advantage here is to look only at the index, and not have to touch the data. If it applies, Plan C is better than Plan A.
You have not provided enough information to say which of these INDEXes will work best. Suggest you add C and B, if "covering" applies; else add A and B. Then see which index the Optimizer picks. (There is still a chance that the Optimizer will not pick 'right'.)
(These three indexes are what my Index blog recommends.)
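A sketch of that suggestion, assuming the table really has only the three listed columns, so that Plan C is covering for SELECT *. The table name tbl, the index names and the literal values are placeholders.

```sql
-- Hypothetical table name `tbl`; `date` is quoted because it is also a type name.
ALTER TABLE tbl
    ADD INDEX plan_b (col1),              -- may avoid the filesort
    ADD INDEX plan_c (id, `date`, col1);  -- "=" column, range column, then col1 for covering

-- Let the Optimizer choose, then inspect the `key` and `Extra` columns.
EXPLAIN
SELECT *
FROM tbl
WHERE id = 42
  AND `date` > '2020-01-01'
ORDER BY col1 DESC;
```

If the real table has more columns than the three shown, Plan C is no longer covering and Plan A (id, date) would take its place.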
I have a question about optimizing SQL queries with multi-column indexes.
Imagine I have a table "TEST" with fields "A, B, C, D, E, F".
In my code (php), I use the following "WHERE" query :
Select (..) from TEST WHERE a = 'x' and B = 'y'
Select (..) from TEST WHERE a = 'x' and B = 'y' and F = 'z'
Select (..) from TEST WHERE a = 'x' and B = 'y' and (D = 'w' or F = 'z')
What is the best approach to get the best speed when running these queries?
Three multi-column indexes like (A, B), (A, B, F) and (A, B, D, F)?
Or a single multi-column index (A, B, D, F)?
I would tend to say that the three indexes would be best, even though they take more space in the database.
In my case, I am optimizing for execution time, not space.
The database is of a reasonable size.
Multiple-column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
In other words, it is a waste of space and computing power to define an index that covers the same first N columns as another index and in the same order.
The best way to examine index usage is to experiment. Use EXPLAIN in MySQL: it shows the query plan, including which index is used and an estimate of how many rows will be examined. Here is an example:
explain select * from TEST WHERE a = 'x' and B = 'y'
It is hard to give definitive answers without experiments.
BUT: ordinarily an index like (A,B,D) is considered to be superfluous if you have an index on (A,B,D,F). So, in my opinion you only need the one multicolumn index.
There is one other consideration. If your table has a lot of columns and a lot of rows and your SELECT list has a small subset of those columns, you might consider including those columns in your index. For example, if your query says SELECT D,F,G,H FROM ... you should try creating an index on
(A,B,D,F,G,H)
as it will allow the query to be satisfied from the index without having to refer back to the rows of the table. This can sometimes help performance a great deal.
It's hard to explain well, but generally you should use as few indexes as you can get away with, using as many columns of the common queries as you can, with the most commonly queried columns first.
In your example WHERE clauses, A and B are always included. These should thus be part of an index. If A is more commonly used in a search then list that first, if B is more commonly used then list that first. MySQL can partially use the index as long as each column (seen from the left) in the index is used in the WHERE clause. So if you have an index ( A, B, C ) then WHERE ( A = .. AND B = .. AND Z = .. ) can still use that index to narrow down the search. If you have a WHERE ( B = .. AND Z = .. ) clause then A isn't part of the search condition and it can't be used for that index.
You want the single multiple column index A, B, D, F OR A, B, F, D (only one of these at a time can be used), but which depends mostly on the number of times D or F are queried for, and the distribution of data. Say if most of the values in D are 0 but one in a hundred values are 1 then that column would have a poor key distribution and thus putting the index on that column wouldn't be all that useful.
The optimiser can use a composite index for where conditions that follow the order of the index with no gaps:
An index on (A,B,F) will cover the first two queries.
The last query is a bit trickier, because of the OR. I think only the A and B conditions will be covered by (A,B,F) but using a separate index (D) or index (F) may speed up the query depending on the cardinality of the rows.
I think an index on (A,B,D,F) can only be used for the A and B conditions on all three queries. Not the F condition on query two, because the D value in the index can be anything and not the D and F conditions because of the OR.
You may have to add hints to the query to get the optimiser to use the best index and you can see which indexes are being used by running an EXPLAIN ... on the query.
Also, adding indexes slows down DML statements and can cause locking issues, so it's best to avoid over-indexing where possible.
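To check these claims on real data, one option is to create the single (A, B, F) index and run EXPLAIN on all three queries. The index name is a placeholder, and SELECT * stands in for the elided column list in the question.

```sql
-- One composite index instead of three overlapping ones.
ALTER TABLE TEST ADD INDEX idx_a_b_f (A, B, F);

-- Query 1: uses the (A, B) prefix.
EXPLAIN SELECT * FROM TEST WHERE A = 'x' AND B = 'y';

-- Query 2: can use all three columns.
EXPLAIN SELECT * FROM TEST WHERE A = 'x' AND B = 'y' AND F = 'z';

-- Query 3: the OR typically limits index use to the (A, B) prefix.
EXPLAIN SELECT * FROM TEST WHERE A = 'x' AND B = 'y' AND (D = 'w' OR F = 'z');
```

Comparing the key_len values across the three plans shows how many of the index columns each query actually uses.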
I've only started using INDEXes in my MySQL database and I'm a little unsure if what I have in mind will work. I have a TEXT field that can store a large body of text and will need to be searched, along with another id INT field. If I have an INDEX on say my id_column field and a FULLTEXT index on my text_column, will MySQL use both in a query such as
SELECT * FROM notes WHERE id_column='123' AND MATCH(text_column) AGAINST(search_text)
??
Secondly, I have a group of columns that can be used frequently for searching in combination together. If I create a multi-column INDEX on these columns, the index will still work if the columns used are together left-to-right in the index. But what happens if the user leaves out a particular column, say C, and searches using A, B, D in an index like (A, B, C, D)?
For question 1:
Yes, the query will use both indices. FULLTEXT indices can be kind of tricky, however, so it's a good idea to read the MySQL documentation thoroughly on them and use EXPLAIN on your queries to make sure they are properly utilizing indices.
For question 2:
If you have a multiple column index, the index has to have the same columns in the same order as the query to be used. So in your example, the index wouldn't be utilized.
EXPLAIN is a very powerful tool for understanding how queries use indices, and it's a good idea to use it frequently (especially on queries which are programmatically generated). http://dev.mysql.com/doc/refman/5.0/en/explain.html
There is no guarantee that MySQL will use two indexes for the same table in one query. In general, it won't. But sometimes it performs an "index merge," searching both indexes and combining the results.
Not all queries can do this, however. You should read about this feature here: http://dev.mysql.com/doc/refman/5.6/en/index-merge-optimization.html
Regarding multi-column indexes, if you have an index on columns A, B, C, D, and you do a search on columns A, B, D, then the index may be used, but only so far as it narrows down the search based on your conditions for columns A and B.
You can see evidence of this if you use EXPLAIN and look at the "key_len" field. The key_len will be the total number of bytes in the columns that are used in that multi-column index. For example, if A, B, C, D are four 4-byte integers, the key_len could be as much as 16. But if only A and B are used, the key_len will be 8.
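A small throwaway table makes this easy to try. The table and index names below are invented for the demonstration, and the columns are declared NOT NULL on purpose, since a nullable column adds one byte to its share of key_len.

```sql
-- Hypothetical demo table: four 4-byte integers and one composite index.
CREATE TABLE key_len_demo (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    D INT NOT NULL,
    INDEX idx_abcd (A, B, C, D)
);

INSERT INTO key_len_demo (A, B, C, D) VALUES
    (1, 1, 1, 1), (1, 1, 2, 1), (1, 2, 1, 1), (2, 1, 1, 1);

-- C is skipped, so only the (A, B) prefix can be used for the lookup.
EXPLAIN SELECT * FROM key_len_demo WHERE A = 1 AND B = 1 AND D = 1;

-- All four columns are tested with "=", so the whole index can be used.
EXPLAIN SELECT * FROM key_len_demo WHERE A = 1 AND B = 1 AND C = 1 AND D = 1;
```

Following the reasoning above, the first plan should report a key_len of 8 and the second 16, although on a table this small the optimizer is free to choose a different plan.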
Given this query:
SELECT * FROM notes
WHERE id_column='123'
AND MATCH(text_column) AGAINST(search_text)
the only way the optimizer will perform it (to my knowledge) is to
Use FULLTEXT(text_column) to do the second part of the search, then
Filter out those without id_column='123'; no index will be used for this step.
That's the general rule when mixing FULLTEXT and non-fulltext indexes -- FULLTEXT first; no other indexes used.
However... Here is a trick that sometimes speeds up complex queries:
SELECT b.*
    FROM (
        SELECT id   -- assuming this is the PRIMARY KEY
            FROM notes
            WHERE MATCH(text_column) AGAINST(search_text)
    ) AS a
    JOIN notes AS b     -- "self join"
        ON b.id = a.id  -- just the PK
    JOIN ((other tables)) ON ...
    WHERE ((other messy or bulky stuff)) ...
The idea is to use the subquery to condense down to a short list of small values (the ids), then reach back in (or further JOIN) to get the bulky stuff.
For building optimal composite indexes for some simple queries, see my index cookbook.
I have a MyISAM table with almost 1 billion records, with say, three fields: a, b and c.
The table has a btree multi-field index on columns a, b and c in that order. Analyzing the index shows that the cardinalities for the fields in that index are:
a: 112 (int)
b: 2694 (int)
c: 936426795 (datetime)
Which means that there are around 100 different values for a, around 20 different values for b, and for each combination of a and b, a whole lot of values of c.
I want to perform a query over a specific value of a, and a range over c. Something like
select a, b, c from mytable where a=4 and c >= "2011-01-01 00:00:00" and c < "2011-01-02 00:00:00"
Getting the query explained shows me that it will indeed use the index, but I don't know if it will use only the first field of the index and then scan over the rest of the table, or if it will be smart enough to apply the third field index, for each value of b, which would be the same as executing 20 different queries, one for each different value of b.
Can anybody who knows the internal workings of MySQL indexes answer this question?
Edit: I'm not asking whether or not I can have mysql to use the index over only a and c. I know how btrees work, and I know that you can only use it over a, a and b, or a and b and c. I would like to know if the mysql optimizer is smart enough to apply the index over all the values in b so it can use the a+b+c index, considering that the cardinality of b is extremely small.
Consider an even simpler example. A table with two columns: a and b, and the index has cardinality 1 over a and 10000000 over b. Mysql should be smart enough to know that there's only one value of a, therefore this index is equivalent to an index only over b, and should use this index when performing queries only over b.
MySQL Reference Manual :: How MySQL Uses Indexes
If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on (col1, col2, col3), you have indexed
search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use an index if the columns do not form a leftmost prefix of the index.
a,c is not a leftmost prefix of the index a,b,c so the index cannot be used to resolve the search on c.
The question makes sense from the point of view that some database engines are smart enough to scan the index rather than scanning the table. (And they allow "data" to be stored in the index for this exact reason.) Scanning the index will be faster than joining the index to the base data, then limiting (excluding) returned rows based on the where clause.
It would make sense that only the rows in the index that meet the where condition (on columns in the index) are joined. Particularly if you are running a large key cache...
It would appear this doesn't happen in MySQL which is disappointing.
Therefore no.
Below are some facts about how MySQL uses B-tree indexes, and one example to illustrate the logic.
a) If approximately 75% of a table's rows hold the same data, the index will not be used; MySQL will do a table scan instead.
b) Normally MySQL uses only a single index per table.
c) Index ordering: MySQL uses the columns of an index in the order in which they are defined.
For example, suppose there is a combined index on fields a, b and c: idx_a_b_c(a,b,c)
i. select a, b, c from mytable where a=4
This query will use the index, as column 'a' is first in the index order.
ii. select a, b, c from mytable where a=4 and b=5
This query will use the combined index on a and b, as these columns are contiguous in the index order.
iii. select a, b, c from mytable where a=4 and b=5 and c >= "2011-01-01 00:00:00"
This query will use the combined index on a, b and c, as these columns are contiguous in the index order.
iv. select a, b, c from mytable where c >= "2011-01-01 00:00:00"
This query will not use the index, as MySQL reads the index from its leftmost column and column c is not the leftmost column.
v. select a, b, c from mytable where a=4 and c >= "2011-01-01 00:00:00" and c < "2011-01-02 00:00:00"
This query will use only the 'a' part of the index, not the 'c' part, because the continuity from the left is broken. It therefore finds the rows with a=4 through the index and then filters them on column c.
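Given the leftmost-prefix rule, one way to make case v efficient is a second index in which c directly follows a. This is only a sketch: the index name is made up, and building an index on a table of about a billion rows takes considerable time and disk space, so it is worth testing on a copy first.

```sql
-- Hypothetical index: "=" column first, then the range column.
-- Appending b, i.e. (a, c, b), would keep the query covered by the index alone.
ALTER TABLE mytable ADD INDEX idx_a_c (a, c);

-- The lookup on a and the range on c can now both be resolved in the index.
EXPLAIN
SELECT a, b, c
FROM mytable
WHERE a = 4
  AND c >= '2011-01-01 00:00:00'
  AND c <  '2011-01-02 00:00:00';
```

In the EXPLAIN output, the key_len for the new index should account for both a and c, in contrast to the original (a, b, c) index, where only a contributes.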