Issue with MySQL Compound Primary Key - mysql

I am having an issue with a table the uses a compound primary key.
The key consists of a date followed by an bigint.
Selects on the table look to be scanning even when only selecting fields from the PK and using a where clause that contains both columns. For Example
SELECT mydate, myid from foo WHERE mydate >='2014-08-26' AND my_id = 1234;
Explain select shows using where and the number of rows considered is in the millions.
One oddity is the key_len which is shown as 7 which seems far too small.
My instinct says the key is broken but I may be missing something obvious.
Any thoughts?
Thank you
Richard

For this query, the index you want is on id, date:
create index idx_foo_myid_mydate on foo(my_id, mydate);
This is because the conditions in the where clause have an equality and inequality. The equality conditions need to match the index from left to right, before the inequalities can be applied.
MySQL documentation actually does a good job (in my opinion) in explaining composite indexes.
Your existing index will be used for the inequality on mydate. However, all the index after the date in question will then be scanned to satisfy the condition on my_id. With the right index, MySQL can just go to the right rows directly.

Related

Order of columns in an index

I am working on a database with large number of rows (6 Mil+).
This table has a composite primary key on two columns.
It also has separate index on each of those fields as there are queries that require this. Obviously, one of those indexes (indices?) is redundant and slowing down performance for write operations.
How do I find out which one is redundant? I understand the first column of a primary key is already indexed and need not be indexed separately. Is that correct? If so, is there a query I can run to find out which is the first one in the list?
SHOW INDEXES FROM tablename will include a Seq_in_index column, which tells you which is first (aka, left most) column, second column, etc.
Therefore, whichever column is listed with a value of 1 for Seq_in_index is the column that does not need it's own single column index.
You can also use SHOW CREATE TABLE tablename to see the index listed from left to right, and that order displayed correctly represents the order of columns in the index.
SHOW CREATE TABLE tablename gives you all the indexes, in their established order.
You don't need INDEX(a) because the column(s) in it are the first column(s) in the INDEX(a,b),
That applies to INDEX / UNIQUE / PRIMARY KEY in (a,b).
I understand the first column of a primary key is already indexed
Erm, no. All the columns in the primary key are indexed.
An explanation of how indexes work is stretching the scope of a post here, and the question of which indexes to put on your table is way too broad.
Suppose you have a primary key defined on attributes a,b,c. This index can be used for queries with predicates
a
a and b
a and b and c
But (at least, the last time I checked) it would not be used for a query with predicates
b
b and c
The optimizer will only ever use one index for each table in a query.
The right indexes depend on the volume of data, the cardinality of the data and the frequency and combination of predicates in your queries. There are execution and storage overheads when you start adding indexes, even just for select operations badly designed indexes can make your query slower than it would run without indexes.

MySQL Index sometimes not being used

I have a table with 150k rows of data, and I have column with a UNIQUE INDEX, It has a type of VARCHAR(10) and stores 10 digit account numbers.
Now whenever I query, like a simple one:
SELECT * FROM table WHERE account_number LIKE '0103%'
It results 30,000+ ROWS, and when I run a EXPLAIN on my query It shows no INDEX is used.
But when I do:
SELECT * FROM table WHERE account_number LIKE '0104%'
It results 4,000+ ROWS, with the INDEX used.
Anyone can explain this?
I'm using MySQL 5.7 Percona XtraDB.
30k+/150k > 20% and I guess it is faster to do table scan. From 8.2.1.19 Avoiding Full Table Scans:
The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
If you don't need all values try to use:
SELECT account_number FROM table WHERE account_number LIKE '0103%'
instead of SELECT *. Then your index will become covering index and optimizer should always use it (as long as WHERE condition is SARGable).
The most database uses B tree for indexing. In this case the database optimizer don't use the index because its faster to scan without index. Like #lad2025 explained.
Your database column is unique and i think your cardinality of your index is high. But since your query using the like filter the database optimizer decides for you to choose not to use the index.
You can use try force index to see the result. Your using varchar with unique index. I would choose another data type or change your index type. If your table only contains numbers change it to numbers. This will help to optimize you query a lot.
In some cases when you have to use like you can use full text index.
If you need help with optimizing your query and table. Provide us more info and which info you want to fetch from your table.
lad2025 is correct. The database is attempting to make an intelligent optimization.
Benchmark with:
SELECT * FROM table FORCE INDEX(table_index) WHERE account_number LIKE '0103%'
and see who is smarter :-) You can always try your hand at questioning the optimizer. That's what index hints are for...
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

Index on mysql partitioned tables

I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.

Distinct (or group by) using filesort and temp table

I know there are similar questions on this but I've got a specific query / question around why this query
EXPLAIN SELECT DISTINCT RSubdomain FROM R_Subdomains WHERE EmploymentState IN (0,1) AND RPhone='7853932120'
gives me this output explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains index NULL RSubdomain 767 NULL 3278 Using where
with and index on RSubdomains
but if I add in a composite index on EmploymentState/RPhone
I get this output from explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains range EmploymentState EmploymentState 67 NULL 2 Using where; Using temporary
if I take away the distinct on RSubdomains it drops the Using temp from the explain output... but what I don't get is why, when I add in the composite key (and keeping the key on RSubdomain) does the distinct end up using a temp table and which index schema is better here? I see that the amount of rows scanned on the combined key is far less, but the query is of type range and it's also slower.
Q: why ... does the distinct end up using a temp table?
MySQL is doing a range scan on the index (i.e. reading index blocks) to locate the rows that satisfy the predicates (WHERE clause). Then MySQL has to lookup the value of the RSubdomain column from the underlying table (it's not available in the index.) To eliminate duplicates, MySQL needs to scan the values of RSubdomain that were retrieved. The "Using temp" indicates the MySQL is materializing a resultset, which is processed in a subsequent step. (Likely, that's the set of RSubdomain values that was retrieved; given the DISTINCT, it's likely that MySQL is actually creating a temporary table with RSubdomain as a primary or unique key, and only inserting non-duplicate values.
In the first case, it looks like the rows are being retreived in order by RSubdomain (likely, that's the first column in the cluster key). That means that MySQL needn't compare the values of all the RSubdomain values; it only needs to check if the last retrieved value matches the currently retrieved value to determine whether the value can be "skipped."
Q: which index schema is better here?
The optimum index for your query is likely a covering index:
... ON R_Subdomains (RPhone, EmploymentState, RSubdomain)
But with only 3278 rows, you aren't likely to see any performance difference.
FOLLOWUP
Unfortunately, MySQL does not provide the type of instrumentation provided in other RDBMS (like the Oracle event 10046 sql trace, which gives actual timings for resources and waits.)
Since MySQL is choosing to use the index when it is available, that is probably the most efficient plan. For the best efficiency, I'd perform an OPTIMIZE TABLE operation (for InnoDB tables and MyISAM tables with dynamic format, if there have been a significant number of DML changes, especially DELETEs and UPDATEs that modify the length of the row...) At the very least, it would ensure that the index statistics are up to date.
You might want to compare the plan of an equivalent statement that does a GROUP BY instead of a DISTINCT, i.e.
SELECT r.RSubdomain
FROM R_Subdomains r
WHERE r.EmploymentState IN (0,1)
AND r.RPhone='7853932120'
GROUP
BY r.Subdomain
For optimum performance, I'd go with a covering index with RPhone as the leading column; that's based on an assumption about the cardinality of the RPhone column (close to unique values), opposed to only a few different values in the EmploymentState column. That covering index will give the best performance... i.e. the quickest elimination of rows that need to be examined.
But again, with only a couple thousand rows, it's going to be hard to see any performance difference. If the query was examining millions of rows, that's when you'd likely see a difference, and the key to good performance will be limiting the number of rows that need to be inspected.

1 index for 2 kinds of queries below possible in MYSQL?

select xx from tablexx where type in (1,3) and last<current-interval 30 second;
select xx from tablexx where type=1;
If create index on (type,last),the first one won't use index.
If create index on (last,type),the second one won't use index.
As for data type,which is can be seen from the example,type: int unsigned,last: datetime
In the first query, MySQL is going to look for an index on 'last' because it is an inequality. I would then expect it to have to iterate over all records with 'last
I would expect you'd get just as good performance with two separate indexes, one on 'last' (for the first query) and one on 'type' (for the second query).
The 'EXPLAIN' command can be really helpful for analysing this stuff.
The second query, having only type = 1 in the where clause, only needs and index on type, not on (type, last).
MySQL should pick the most specific index for your query, so creating an index just covering type should be used for the second one, but not the first one.
You stated "If create index on (type,last),the first one won't use index." Are you sure about this? I was under the impression this is exactly the circumstance under which a covering index would execute.
EDIT: Unless of course there's a selectivity problem with the data - if most records have type 1 or 3 then the optimizer wouldn't use the index (regardless of whether it was a basic or composite index).