MySQL index for constant values

What does an optimal MySQL index look like for this query:
select a,b,c from t where a=1 and b=2 order by c

The optimal index is: t(a, b, c) or t(b, a, c).
The first two columns of the index will be used for the where clause. The third column can then be used for the order by.
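To see why the third column comes "for free", here is a minimal Python sketch that models a composite index as a sorted list of tuples. It is only a toy model, not how MySQL stores anything; the sample rows and the helper name `lookup_ordered` are made up for illustration, and integer values are assumed for b.

```python
from bisect import bisect_left

# Toy model: a composite index on (a, b, c) is a list of tuples kept in sorted order.
rows = [(1, 2, 9), (2, 2, 1), (1, 2, 3), (1, 3, 5), (1, 2, 7), (1, 1, 4)]
index_abc = sorted(rows)

def lookup_ordered(index, a, b):
    """Return the entries with the given (a, b); they are already sorted by c."""
    lo = bisect_left(index, (a, b))        # first entry >= (a, b)
    hi = bisect_left(index, (a, b + 1))    # first entry past the (a, b) group
    return index[lo:hi]

result = lookup_ordered(index_abc, 1, 2)
print(result)
```

Within the slice for one (a, b) pair the entries are ordered by c, so no separate sort step is needed for the order by.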

Related

MySQL Slow Query - Range and Group By

I have a simple query which selects a specific time range of data and then groups it.
The SQL looks like:
SELECT
C,
sum(X),
sum(Y)
FROM
table
WHERE
A = ${id} AND
B BETWEEN '2021-08-01' AND '2021-08-02'
GROUP BY
C;
B's data type is date.
C's data type is varchar.
X, Y's data type is bigint.
The index is (A, B, C).
When I use EXPLAIN, the Extra column shows: Using where; Using index; Using temporary; Using filesort. The key column shows: (A, B, C).
I guess the index stops at B, since B is a range condition.
Is there any other way to optimize this query? Thanks a lot.
The columns, in optimal order for the INDEX, are
A -- Column(s) being tested in WHERE with =
B -- One column in WHERE as a range
No more columns are worth adding to the INDEX
Exception: If all the columns used anywhere in the SELECT are in the index, then the index is "covering". This gives a slight performance boost because it won't have to bounce between the INDEX's BTree and the data's BTree.
INDEX(A, B, -- first, and in this order (as above)
C, X, Y) -- in any order
I doubt if the "covering" index is worth using in that query.
Another possibility
A -- all the = columns
C -- the GROUP BY columns
This would avoid the sorting that may be necessary for GROUP BY. So...
INDEX(A, C), or
INDEX(A, C, B, X, Y) -- covering
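A rough way to picture the difference between the two index orders is the Python sketch below. It models an index as rows sorted by the index columns; the sample rows, the integer stand-in for the date column B, and the helper names are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical rows (A, B, C, X); B is a day number standing in for the date column.
rows = [(1, 1, "u", 10), (1, 1, "v", 20), (1, 2, "u", 30), (1, 2, "v", 40), (2, 1, "u", 50)]

def c_sequence(index_cols, a, b_lo, b_hi):
    """Walk a toy index in key order and return the C values of the matching rows."""
    pos = {"A": 0, "B": 1, "C": 2}
    ordered = sorted(rows, key=lambda r: tuple(r[pos[c]] for c in index_cols))
    return [r[2] for r in ordered if r[0] == a and b_lo <= r[1] <= b_hi]

def is_grouped(seq):
    """True if equal values are adjacent, i.e. no extra sort is needed to group them."""
    return len([k for k, _ in groupby(seq)]) == len(set(seq))

via_abc = c_sequence(["A", "B", "C"], 1, 1, 2)  # C values interleave across the B range
via_ac = c_sequence(["A", "C"], 1, 1, 2)        # C values arrive already grouped
print(via_abc, is_grouped(via_abc))
print(via_ac, is_grouped(via_ac))
```

With (A, B, C) the C values repeat for every B in the range, which is why a temporary table or sort is needed; with (A, C) each group is contiguous, at the cost of checking B on every row for that A.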
I'm surprised that your EXPLAIN said "Using index". That means that the index is covering, yet your (A,B,C) is not.
More on building the optimal index: Index Cookbook
Another technique... Often SUMs (and COUNTs) like that come along with "Data Warehouse" schemas. A very good speedup is to build and maintain a Summary table of, say, hourly subtotals. Then, the SELECT SUMs the subtotals to get the total. More discussion in Summary Tables
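The summary-table idea can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming a timestamp column fine enough to bucket by hour; the sample rows and the names `facts`, `summary` and `totals` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical fact rows: (A, timestamp, C, X, Y)
facts = [
    (1, datetime(2021, 8, 1, 9, 15), "u", 5, 1),
    (1, datetime(2021, 8, 1, 9, 45), "u", 7, 2),
    (1, datetime(2021, 8, 1, 10, 5), "v", 3, 4),
    (1, datetime(2021, 8, 2, 23, 0), "u", 1, 1),
]

# Summary "table": one subtotal per (A, hour, C), updated as rows arrive.
summary = defaultdict(lambda: [0, 0])
for a, ts, c, x, y in facts:
    hour = ts.replace(minute=0, second=0, microsecond=0)
    summary[(a, hour, c)][0] += x
    summary[(a, hour, c)][1] += y

def totals(a, start, end):
    """SUM the subtotals per C for hours in [start, end)."""
    out = defaultdict(lambda: [0, 0])
    for (sa, hour, c), (sx, sy) in summary.items():
        if sa == a and start <= hour < end:
            out[c][0] += sx
            out[c][1] += sy
    return {c: tuple(v) for c, v in out.items()}

result = totals(1, datetime(2021, 8, 1), datetime(2021, 8, 3))
print(result)
```

The report query then touches one row per (A, hour, C) instead of every raw row, which is where the speedup comes from.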
"Using filesort" may actually happen in RAM. (Don't be put off by "file".)

MariaDB subquery use whole row

Usually subqueries compare single or multiple fields, and delete statements usually delete values by an ID. Unfortunately I don't have an ID field and I have to use a generic approach for different kinds of tables.
That's why I am working with a subquery using limit and offset to resolve rows.
I know that approach is risky; however, is there any way to delete rows by subquerying and comparing the whole row?
DELETE FROM table WHERE * = ( SELECT * FROM table LIMIT 1 OFFSET 6 )
I am using the latest version of MariaDB
This sounds like a really strange need, but who am I to judge? :)
I would simply rely on the primary key:
DELETE FROM table WHERE id_table = (SELECT id_table FROM table LIMIT 1 OFFSET 6)
update: oh, so you don't have a primary key? You can join on the whole row this way (assuming it has five columns named a, b, c, d, e):
DELETE t
FROM table t
INNER JOIN (
SELECT a, b, c, d, e
FROM table
ORDER BY a, b, c, d, e
LIMIT 1 OFFSET 6
) ROW6 USING (a, b, c, d, e);
Any subset of columns (e.g. a, c, d) that uniquely identify a row will do the trick (and is probably what you need as a primary key anyway).
Edit: Added an ORDER BY clause as per The Impaler's excellent advice. That's what you get for knocking an example up quickly.
DELETE FROM t
ORDER BY ... -- fill in as needed
LIMIT 6
(Works on any version)

GROUP BY a, b VS GROUP BY b, a

I was wondering if there is any difference between order of grouping in GROUP BY a, b and GROUP BY b, a (I know the final result is the same). If so, would it affect the query's speed?
A group by clause just defines the unique combination of field(s) which would be considered a group. The order in which these fields are listed has no meaning.
It does matter if you have multiple-column indexes. You should define the GROUP BY columns in the order of the index.
So, if you have an index for (a,b) then you should use GROUP BY a, b and MySQL is able to take full advantage of the index.
See example

MySQL: Indexes on GROUP BY

I have a reasonably big table (>10.000 rows) which is going to grow much bigger fast. On this table I run the following query:
SELECT *, MAX(a) FROM table GROUP BY b, c, d
Currently EXPLAIN tells me that there are no keys, no possible keys and it's "Using temporary; Using filesort". What would the best key be for such a table?
What about composite key b+c+d+a?
Btw, SELECT * makes no sense when you have GROUP BY.
A primary index on field b,c,d would be nice if applicable.
In that case you just do a
SELECT * FROM table1
group by <insert PRIMARY KEY here>
If not put an index on b,c,d.
And maybe on a, depends on the performance.
If b,c,d are always used in unison, use a composite index on all three.
Very important! Always declare a primary key. Without it performance on InnoDB will suck.
To elaborate on @zerkms, you only need to put those columns in the group by clause that completely define the rows that you are selecting.
If you select * that may be OK, but then the max(a) is not needed and neither is the group by.
Also note that the max(a) may come from a different row than the rest of the fields.
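A small Python sketch of that pitfall, modelling a server that is free to pick any row's value for a non-aggregated column (here it simply keeps the first one it sees). The rows and column names are invented for illustration.

```python
# Hypothetical rows: (a, b, e), grouped by b.
rows = [(1, "x", "first"), (9, "x", "second"), (4, "y", "third")]

groups = {}
for a, b, e in rows:
    # e is taken from whichever row of the group is seen first
    g = groups.setdefault(b, {"e": e, "max_a": a})
    g["max_a"] = max(g["max_a"], a)

print(groups["x"])
```

For group "x" the result pairs max_a = 9 with e = "first", although the row holding a = 9 has e = "second".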
The only use case that does make sense is:
select t1.*, count(*) as occurrence from t1
inner join t2 on (t1.id = t2.manytoone_id)
group by t1.id
Where t1.id is the PK.
I think you need to rethink that query.
Ask a new question explaining what you want with the real code.
And make sure to ask how to make the outcome determinate, so that all values shown are functionally dependent on the group by clause.
In the end what worked was a modification to the query as follows:
SELECT b, c, d, e, f, MAX(a) FROM table GROUP BY b, c, d
And creating an index on (b, c, d, e, f).
Thanks a lot for your help: the tips here were very useful.

Partial multi-field index usage in MySQL

I have a MyISAM table with almost 1 billion records, with say, three fields: a, b and c.
The table has a btree multi-field index on columns a, b and c in that order. Analyzing the index shows that the cardinalities for the fields in that index are:
a: 112 (int)
b: 2694 (int)
c: 936426795 (datetime)
Which means that there are around 100 different values for a, around 20 different values of b for each value of a, and for each combination of a and b, a whole lot of values of c.
I want to perform a query over a specific value of a, and a range over c. Something like
select a, b, c from mytable where a=4 and c >= "2011-01-01 00:00:00" and c < "2011-01-02 00:00:00"
Getting the query explained shows me that it will indeed use the index, but I don't know if it will use only the first field of the index and then scan over the rest of the table, or if it will be smart enough to apply the third field index, for each value of b, which would be the same as executing 20 different queries, one for each different value of b.
Can anybody who knows the internal workings of MySQL indexes answer this question?
Edit: I'm not asking whether or not I can make MySQL use the index over only a and c. I know how btrees work, and I know that you can only use it over a, a and b, or a and b and c. I would like to know if the MySQL optimizer is smart enough to apply the index over all the values in b so it can use the a+b+c index, considering that the cardinality of b is extremely small.
Consider an even simpler example. A table with two columns: a and b, and the index has cardinality 1 over a and 10000000 over b. Mysql should be smart enough to know that there's only one value of a, therefore this index is equivalent to an index only over b, and should use this index when performing queries only over b.
MySQL Reference Manual :: How MySQL Uses Indexes
If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on (col1, col2, col3), you have indexed
search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use an index if the columns do not form a leftmost prefix of the index.
a,c is not a leftmost prefix of the index a,b,c so the index cannot be used to resolve the search on c.
The question makes sense from the point of view that some database engines are smart enough to scan the index rather than scanning the table. (And they allow "data" to be stored in the index for this exact reason.) Scanning the index will be faster than joining the index to the base data, then limiting (excluding) returned rows based on the where clause.
It would make sense that only the rows in the index that meet the where condition (on columns in the index) are joined. Particularly if you are running a large key cache...
It would appear this doesn't happen in MySQL which is disappointing.
Therefore no.
Below are some facts about how MySQL uses B-tree indexes, with an example to illustrate the logic.
a) If roughly 75% of a table's rows share the same value, the index will probably not be used; MySQL will do a table scan instead.
b) Normally MySQL uses only a single index per table.
c) Index ordering: MySQL uses the columns of an index in the order in which they are defined.
For example, take a combined index on the a, b and c fields: idx_a_b_c(a,b,c)
i. select a, b, c from mytable where a=4
This query will use the index, as column 'a' is first in the index order.
ii. select a, b, c from mytable where a=4 and b=5
This query will use the combined index on a and b, as these columns are contiguous in the index order.
iii. select a, b, c from mytable where a=4 and b=5 and c >= "2011-01-01 00:00:00"
This query will use the combined index on a, b and c, as these columns are contiguous in the index order.
iv. select a, b, c from mytable where c >= "2011-01-01 00:00:00"
This query will not use the index, as MySQL reads an index from its leftmost column and c is not the leftmost column of the index.
v. select a, b, c from mytable where a=4 and c >= "2011-01-01 00:00:00" and c < "2011-01-02 00:00:00"
This query will use the index only for column 'a', not for column 'c', because the continuity from the left is broken. So it will use the index to find the a=4 rows and then scan those rows, filtering on c.
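Case v, and the alternative the question asks about, can be compared with a toy Python model of the (a, b, c) index as a sorted list. Everything here is an assumption for illustration: the data, the integer c standing in for the datetime, and the two helper functions. As far as I know, MySQL 8.0.13 and later document a "skip scan" range access method that works roughly like the second function, but whether the optimizer picks it for a given query is a separate matter.

```python
from bisect import bisect_left

# Toy index on (a, b, c), with an integer c standing in for the datetime.
index = sorted((a, b, c) for a in (3, 4) for b in (1, 2, 3) for c in range(100))

def prefix_scan(a, c_lo, c_hi):
    """Use only the leading column: read every entry for this a, then filter on c."""
    lo, hi = bisect_left(index, (a,)), bisect_left(index, (a + 1,))
    touched = index[lo:hi]
    return [e for e in touched if c_lo <= e[2] < c_hi], len(touched)

def per_b_probes(a, b_values, c_lo, c_hi):
    """Probe the range (a, b, c_lo)..(a, b, c_hi) once per distinct b."""
    hits, touched = [], 0
    for b in b_values:
        lo, hi = bisect_left(index, (a, b, c_lo)), bisect_left(index, (a, b, c_hi))
        hits.extend(index[lo:hi])
        touched += hi - lo
    return hits, touched

scan_hits, scan_touched = prefix_scan(4, 10, 20)
probe_hits, probe_touched = per_b_probes(4, (1, 2, 3), 10, 20)
print(scan_touched, probe_touched)
```

Both approaches return the same entries; the prefix scan reads every entry with a=4, while the per-b probes read only the matching ones, at the cost of one index dive per distinct b.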