select xx from tablexx where type in (1,3) and last<current-interval 30 second;
select xx from tablexx where type=1;
If I create an index on (type, last), the first query won't use the index.
If I create an index on (last, type), the second query won't use the index.
As for the data types, which can be seen from the example: type is int unsigned, last is datetime.
In the first query, MySQL is going to look for an index on 'last' because it is an inequality. I would then expect it to have to iterate over all records with 'last
I would expect you'd get just as good performance with two separate indexes, one on 'last' (for the first query) and one on 'type' (for the second query).
The 'EXPLAIN' command can be really helpful for analysing this stuff.
The second query, having only type = 1 in the WHERE clause, only needs an index on type, not on (type, last).
MySQL should pick the most specific index for your query, so an index covering just type should be used for the second query, but not the first one.
You stated "If create index on (type,last),the first one won't use index." Are you sure about this? I was under the impression this is exactly the circumstance under which a covering index would execute.
EDIT: Unless of course there's a selectivity problem with the data - if most records have type 1 or 3 then the optimizer wouldn't use the index (regardless of whether it was a basic or composite index).
Related
I have a large table (about 3 million records) that includes primarily these fields: rowID (int), a deviceID (varchar(20)), a UnixTimestamp in a format like 1536169459 (int(10)), powerLevel which has integers that range between 30 and 90 (smallint(6)).
I'm looking to pull out records within a certain time range (using UnixTimestamp) for a particular deviceID and with a powerLevel above a certain number. With over 3 million records, it takes a while. Is there a way to create an index that will optimize for this?
Create an index over:
DeviceId,
PowerLevel,
UnixTimestamp
When selecting, you will first narrow in to the set of records for your given Device, then it will narrow in to only those records that are in the correct PowerLevel range. And lastly, it will narrow in, for each PowerLevel, to the correct records by UnixTimestamp.
If I understand you correctly, you hope to speed up this sort of query.
SELECT something
FROM tbl
WHERE deviceID = constant
AND start <= UnixTimestamp
AND UnixTimestamp < end
AND Power >= constant
You have one constant criterion (deviceID) and two range criteria (UnixTimestamp and Power). MySQL's indexes are BTREE (think sorted in order), and MySQL can only do one index range scan per SELECT.
So, you should probably choose an index on (deviceID, UnixTimestamp, Power). To satisfy the query, MySQL will random-access the index to the entries for deviceID, then further random access to the first row meeting the UnixTimestamp start criterion.
It will then scan the index sequentially, and use the Power information from each index entry to decide whether it should choose each row.
You could also use (deviceID, Power, UnixTimestamp). But in this case MySQL will find the first entry matching the device and power criteria, then scan the index to look at entries with all timestamps to see which rows it should choose.
Your performance objective is to get MySQL to scan the fewest possible index entries, so it seems very likely the (deviceID, UnixTimestamp, Power) choice is superior. The index column on UnixTimestamp is probably more selective than the one on Power. (That's my guess.)
ALTER TABLE tbl ADD INDEX tbl_dev_ts_pwr (deviceID, UnixTimestamp, Power);
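To make the difference between the two column orders concrete, here is a rough sketch in Python rather than MySQL. It models each index as a sorted list of tuples and counts how many entries fall inside the one contiguous range a scan would have to walk. The data, the sizes and the helper name scanned_entries are all made up for illustration; this is a toy model, not a measurement of what MySQL does.

```python
import bisect

def scanned_entries(index, low, high):
    """Count entries e in the sorted list `index` with low <= e < high."""
    return bisect.bisect_left(index, high) - bisect.bisect_left(index, low)

# Made-up readings: (deviceID, UnixTimestamp, Power), with Power in 30..90.
rows = [(d, ts, 30 + (7 * (ts // 10) + d) % 61)
        for d in range(10)
        for ts in range(0, 10_000, 10)]

device, start, end, min_power = 3, 2_000, 2_500, 60

# Index (deviceID, UnixTimestamp, Power): the scan covers only the time window.
idx_ts = sorted(rows)
n_ts = scanned_entries(idx_ts, (device, start), (device, end))

# Index (deviceID, Power, UnixTimestamp): the scan starts at the first entry
# with Power >= min_power and runs to the end of that device's entries.
idx_pw = sorted((d, p, ts) for d, ts, p in rows)
n_pw = scanned_entries(idx_pw, (device, min_power), (device + 1,))

# Rows that actually satisfy all three conditions.
matches = sum(1 for d, ts, p in rows
              if d == device and start <= ts < end and p >= min_power)

print("entries scanned, timestamp second:", n_ts)
print("entries scanned, power second:   ", n_pw)
print("rows actually matching:          ", matches)
```

With a narrow time window the timestamp-second index walks far fewer entries in this toy data. With a wide window and a very high power threshold the comparison could go the other way, which is why it is worth checking against your real data with EXPLAIN.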
Look at Bill Karwin's tutorials. Also look at Markus Winand's https://use-the-index-luke.com
The suggested 3-column indexes are only partially useful. The Optimizer will use the first 2 columns, but ignore the third.
Better:
INDEX(DeviceId, PowerLevel),
INDEX(DeviceId, UnixTimestamp)
Why?
The optimizer will pick between those two based on which seems to be more selective. If the time range is 'narrow', then the second index will be used; if there are not many rows with the desired PowerLevel, then the first index will be used.
Even better...
The PRIMARY KEY... You probably have Id as the PK? Perhaps (DeviceId, UnixTimestamp) is unique? (Or can you have two readings for a single device in a single second??) If the pair is unique, get rid of Id completely and have
PRIMARY KEY(DeviceId, UnixTimestamp),
INDEX(DeviceId, PowerLevel)
Notes:
Getting rid of Id saves space, thereby providing a little bit of speed.
When using a secondary index, the execution spends time bouncing between the index's BTree and the data BTree (ordered by the PK). By having PRIMARY KEY(Id), you are guaranteed to do the bouncing. By changing the PK to this, the bouncing is avoided. This may double the speed of the query.
(I am not sure the secondary index will ever be used.)
Another (minor) suggestion: Normalize the DeviceId so that it is (perhaps) a 2-byte SMALLINT UNSIGNED (range 0..64K) instead of VARCHAR(20). Even if this entails a JOIN, the query will run a little faster. And a bunch of space is saved.
I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
There are around 200000 records that will be retrieved by the above query, but with the mentioned index the query takes 30+ seconds. What is the reason for such a long time? Also, if I remove partitionflag from the mentioned index, the index is not even used.
And is my understanding correct that,
even with the partitions available, the optimizer needs the partition column in the index definition so that it only hits the required partition?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in an index are ordered by their cardinality (starting from the highest and going down to the lowest). Cardinality is the uniqueness of the data in the given column. So in your case I suppose there are many different values of companyid in the customformattributes table, while partitionflag will have a cardinality of 2 (if the only options for this column are 1 and 0).
With your current index, the query will first filter all the rows with partitionflag=0, then filter by companyid, and so on.
When you removed partitionflag from the index, the query did not use the index, maybe because the optimizer decided that a full table scan would be faster than using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index might be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use the index, the following rule must be met: the leftmost column in the index should be present in the WHERE clause ... and depending on the MySQL version you are using, additional query requirements may apply. For example, if you are using an old version of MySQL, you may need to order the columns in the WHERE clause in the same order they are listed in the index. In recent versions of MySQL the query optimizer is responsible for ordering the columns in the WHERE clause correctly.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is set up correctly you can take advantage of partition pruning: this is when a SELECT query searches only the partitions where the given data is stored (otherwise it looks in all partitions).
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
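The "covering" versus "look up the data" distinction can be sketched with a toy model in Python. This is not how MySQL is implemented; the sample rows and the run_query helper are hypothetical, and the only thing the sketch counts is how often the full row has to be fetched because the index alone is not enough.

```python
def run_query(table, index_cols, where, wanted_cols):
    """Equality-only query against a toy index; returns (results, table lookups)."""
    covering = (set(wanted_cols) | set(where)) <= set(index_cols)
    lookups = 0
    results = []
    for rid, row in table.items():
        entry = {c: row[c] for c in index_cols}  # what the index entry stores
        if any(c in entry and entry[c] != v for c, v in where.items()):
            continue  # rejected using the index entry alone
        if covering:
            results.append(tuple(entry[c] for c in wanted_cols))
        else:
            lookups += 1  # the "bounce" from the index to the data
            full = table[rid]
            if all(full[c] == v for c, v in where.items()):
                results.append(tuple(full[c] for c in wanted_cols))
    return results, lookups

# Made-up rows keyed by a row id.
table = {
    1: {"companyid": 47, "activityname": "Activity 1", "partitionflag": 0},
    2: {"companyid": 47, "activityname": "Activity 1", "partitionflag": 1},
    3: {"companyid": 47, "activityname": "Activity 2", "partitionflag": 0},
    4: {"companyid": 12, "activityname": "Activity 1", "partitionflag": 0},
    5: {"companyid": 47, "activityname": "Activity 1", "partitionflag": 0},
}
where = {"companyid": 47, "activityname": "Activity 1", "partitionflag": 0}
wanted = ["partitionflag", "companyid", "activityname"]

res_cov, lookups_cov = run_query(
    table, ["companyid", "activityname", "partitionflag"], where, wanted)
res_part, lookups_part = run_query(
    table, ["companyid", "activityname"], where, wanted)

print(lookups_cov, lookups_part)
```

Both calls return the same rows. The three-column index answers the query without touching the table, while the two-column one has to fetch every candidate row to check partitionflag.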
I am having an issue with a table that uses a compound primary key.
The key consists of a date followed by a bigint.
Selects on the table look to be scanning even when only selecting fields from the PK and using a where clause that contains both columns. For example:
SELECT mydate, my_id from foo WHERE mydate >='2014-08-26' AND my_id = 1234;
Explain select shows using where and the number of rows considered is in the millions.
One oddity is the key_len which is shown as 7 which seems far too small.
My instinct says the key is broken but I may be missing something obvious.
Any thoughts?
Thank you
Richard
For this query, the index you want is on id, date:
create index idx_foo_myid_mydate on foo(my_id, mydate);
This is because the conditions in the where clause have an equality and inequality. The equality conditions need to match the index from left to right, before the inequalities can be applied.
MySQL documentation actually does a good job (in my opinion) in explaining composite indexes.
Your existing index will be used for the inequality on mydate. However, all the index after the date in question will then be scanned to satisfy the condition on my_id. With the right index, MySQL can just go to the right rows directly.
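The "equalities first, then one inequality" rule can be written down as a small Python function. This is a deliberately simplified sketch of the rule as described above, not the optimizer's real logic, and the function name usable_key_parts is made up for illustration.

```python
def usable_key_parts(index_cols, equality_cols, range_cols):
    """Return the leading index columns that can narrow a B-tree scan.

    Equality columns are consumed left to right. The first range column
    is still usable, but no column after it narrows the scan any further.
    """
    used = []
    for col in index_cols:
        if col in equality_cols:
            used.append(col)
        elif col in range_cols:
            used.append(col)
            break
        else:
            break
    return used

equalities = {"my_id"}   # my_id = 1234
ranges = {"mydate"}      # mydate >= '2014-08-26'

existing_pk = usable_key_parts(["mydate", "my_id"], equalities, ranges)
suggested = usable_key_parts(["my_id", "mydate"], equalities, ranges)

print(existing_pk)  # only the date narrows the scan
print(suggested)    # both columns narrow the scan
```

Under this rule the existing (mydate, my_id) key only narrows by date, so every entry from that date onward still has to be checked for my_id, which fits the millions of rows reported by EXPLAIN.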
I have some questions about indexing an SQL database:
Is it better to index a boolean column, or not, because there are only 2 options? I know that if the table is small then indexing will not change anything, but I'm asking about a table with 1 million records.
If I have two dates, ValidFrom and ValidTo, is it better to create 1 index with 2 columns or 2 separate indexes? In 90% of queries I use where validfrom < date && validto > date, but there are also a few selects with only validfrom or only validto.
What's the difference between a clustered and a non-clustered index? I can't find any article, so a link would be great.
You both tagged MySQL and SQL-server. This answer is MySQL inspired.
It depends on many things, but more important than the size is the variation. If about 50% of the values are TRUE, that means the rest of the values (also about 50%) are FALSE and an index will not help much. If only 2% of the values are TRUE and your queries often only need TRUE records, this index will be useful!
If your queries often use both, put both in the index. If one is used more than the other, put that one FIRST in the index, so the composite index can be used for the one field as well.
A clustered index means that the data actually is inside the index. A non-clustered index just points to the data, which is actually stored elsewhere. The PRIMARY KEY in InnoDB is a clustered index.
If you want to use Indexes in MySQL, EXPLAIN is your friend!
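On the boolean column, the 50% versus 2% point is easy to put into numbers for the 1 million row table from the question. A quick back-of-the-envelope sketch in Python; the helper name rows_touched is just for illustration and the fractions are the ones used above.

```python
def rows_touched(total_rows, matching_fraction):
    """Rows an index lookup on the flag would still have to visit."""
    return round(total_rows * matching_fraction)

total = 1_000_000
half = rows_touched(total, 0.5)    # flag is TRUE for about half the rows
rare = rows_touched(total, 0.02)   # flag is TRUE for about 2% of the rows

print(half, rare)
```

Visiting half a million rows through an index is unlikely to beat simply scanning the table, while twenty thousand rows out of a million is the kind of reduction where an index starts to pay off.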
This is all for SQL Server, which is what I know about...
1 - Depends on cardinality, but as a rule an index on a single boolean field (BIT in SQL Server) won't be used since it's not very selective.
2 - Make 2 indexes, one with both, and the other with just the second field from the first index. Then you are covered in both cases.
3 - Clustered indexes contain the data for ALL fields at the leaf level (the entire table basically) ordered by your clustered index field. Non-clustered indexes contain only the key fields and any INCLUDEd fields at the leaf level, with a pointer to the clustered index row if you need any other data from other fields for that row.
If you use a filtered index, tables of up to 2 million records work with no problems.
Create one nonclustered index instead of two filtered indexes.
The two kinds of index serve different purposes and are unrelated to each other. Looking up a row by its primary key (PK) is different from searching for a range of values (a nonclustered index is often used for range searches); in fact, lookups by PK account for less than 1% of queries.
The explain command with the query:
explain SELECT * FROM leituras
WHERE categorias_id=75 AND
textos_id=190304 AND
cookie='3f203349ce5ad3c67770ebc882927646' AND
endereco_ip='127.0.0.1'
LIMIT 1
The result:
id  select_type  table     type  possible_keys  key     key_len  ref     rows     Extra
1   SIMPLE       leituras  ALL   (null)         (null)  (null)   (null)  1022597  Using where
Will it make any difference adding some keys on the table? Even that the query will always return only one row.
In answer to your question, yes. You should add indexes where necessary (thanks #Col.Shrapnel) on the columns that appear in your WHERE clause - in this case, categorias_id, textos_id, cookie, and endereco_ip.
If you always perform a query using the same columns in the WHERE clause, it may be beneficial to add one index that comprises all of those columns, rather than adding individual indexes.
It still has to do a linear search over the table until it finds that one row. So adding indexes could noticeably improve performance.
Yes, indexes are even more important when you want to return only one row.
If you are returning half of the rows and your database system has to scan the entire table, you're still at 50% efficiency.
However, if you want to return just one row, and your database system has to scan 1022597 rows to find your row, your efficiency is minuscule.
LIMIT 1 does offer some efficiency in that it stops as soon as it finds the first matching row, but it obviously has to scan an enormous number of records to find that first row.
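As a rough illustration in Python (not MySQL), here is what stopping at the first match looks like compared with a keyed lookup. The rows are made up, and the dict is only a stand-in for an index: a real MySQL index is a B-tree, not a hash table.

```python
def first_match_scan(rows, predicate):
    """Return (row, rows examined), stopping at the first match like LIMIT 1."""
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            return row, examined
    return None, examined

# Hypothetical stand-in for the table: (categorias_id, textos_id) pairs.
rows = [(i % 100, i) for i in range(200_000)]
target = (4, 190304)

# Without an index: walk the rows in order until the first match.
found, examined = first_match_scan(rows, lambda r: r == target)

# With an "index" keyed on the same columns: jump straight to the position.
index = {r: pos for pos, r in enumerate(rows)}
pos = index[target]

print(examined, pos)
```

The scan still stops early, but only after examining every row that comes before the match, while the keyed lookup goes straight to it.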
Adding an index for each of the columns in your WHERE clause allows your database system to avoid scanning rows that don't match your criteria. With adequate indexes, you'll see that the rows column in the explain will get closer to the actual number of returned rows.
Using a compound index that covers all four of the columns in your WHERE clause allows even better performance and less scanning, as the index will provide full coverage. Compound indexes do use a lot of memory and negatively affect insert performance, so you might only want to add a compound index if a large percentage of your queries repeatedly do a look up on the same columns, or if you rarely insert records, or it's just that important to you for that particular query to be fast.
Another way to improve performance is to return only the columns that you need rather than using SELECT *. If you had a compound index on those four columns, and you returned only those four columns, your database system wouldn't need to hit your records at all. The database system could get everything it needed right from the indexes.