Table structure:
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| total | int(11) | YES | | NULL | |
| thedatetime | datetime | YES | MUL | NULL | |
+-------------+----------+------+-----+---------+----------------+
Total rows: 137967
mysql> explain select * from out where thedatetime <= NOW();
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | out | ALL | thedatetime | NULL | NULL | NULL | 137967 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
The real query is much more longer with more table joins, the point is, I can't get the table to use the datetime index. This is going to be hard for me if I want to select all data until certain date. However, I noticed that I can get MySQL to use the index if I select a smaller subset of data.
mysql> explain select * from out where thedatetime <= '2008-01-01';
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| 1 | SIMPLE | out | range | thedatetime | thedatetime | 9 | NULL | 15826 | Using where |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
mysql> select count(*) from out where thedatetime <= '2008-01-01';
+----------+
| count(*) |
+----------+
| 15990 |
+----------+
So, what can I do to make sure MySQL will use the index no matter what date that I put?
There are two things in play here -
Index is not selective enough - if the index covers more than approx. 30% of the rows, MySQL will decide a full table scan is more efficient. When you contract the range the index kicks in.
One index per table in a join
The real query is much more longer
with more table joins, the point is ...
The point is exactly because it has joins that it probably can't use that index. MySQL can use one index per table in a join (unless it qualifies for an index-merge optimization). If the primary key is already used for the join, thedatetime won't be used. In order to use it, you need to create a multi-column index on the join key + thedatetime index, in the correct order.
Check the EXPLAIN of the actual query to see which key MySQL uses for the join. Modify that index to include the thedatetime column as well, or create a new multi-column index from both (depending on what you use the join key for).
Everything works as it is supposed to. :)
Indexes are there to speed up retrieval. They do it using index lookups.
In you first query the index is not used because you are retrieving ALL rows, and in this case using index is slower (lookup index, get row, lookup index, get row... x number of rows is slower then get all rows == table scan)
In the second query you are retrieving only a portion of the data and in this case table scan is much slower.
The job of the optimizer is to use statistics that RDBMS keeps on the index to determine the best plan. In first case index was considered, but planner (correctly) threw it away.
EDIT
You might want to read something like this to get some concepts and keywords regarding mysql query planner.
Related
In a MySQL db I have a table that only has 2 columns, for all intents and purposes: a key hash and a value. Both are INTEGER type. The hash column will have a large number of duplicates (worst case expect ~80k dup for each hash, not possible to make unique due to small hash preimage), and the table contains on the order of 100 billion rows.
Right now I have the hash column indexed (CREATE INDEX idx_hash ON table(hash)); however lookups are very slow. Something like SELECT value FROM table WHERE hash=123 LIMIT 50 will take minutes if not hours, while a similar select on a similar sized table on a primary key column will finish in a jiffy on the same machine.
So my question is how do I optimize for lookup in this case? Is sub-linear time SELECT possible on index columns? This table will be mostly read-only, rebuilding it is possible but will take a long time, so I'd like to gather some information and do it correctly.
EXPLAIN says:
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
| 1 | SIMPLE | partial_lookup | NULL | ALL | NULL | NULL | NULL | NULL | 100401571 | 10.00 | Using where |
+----+-------------+----------------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
ANALYZE:
+--------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------------------+---------+----------+----------+
| partial_lookup | analyze | status | OK |
+--------------------+---------+----------+----------+
1 row in set (1.47 sec)
What is the performance penalty for SELECT * FROM Table VS SELECT * FROM (SELECT * FROM Table AS A) AS B
My questions are: Firstly, does the SELECT * involve iteration over the rows in the table, or will it simply return all rows as a chunk without any iteration (because no WHERE clause was given), and if so does the nested query in example two involve iterating over the table twice, and will take 2x the time of the first query? thanks...
The answer to this question hinges on whether you are using mysql before 5.7, or 5.7 and after. I may be altering your question slightly, but hopefully the following captures what you are after.
Your SELECT * FROM Table does a table scan via the clustered index (the physical ordering). In the case of no primary key, one is implicitly available to the engine. There is no where clause as you say. No filtering or choice of another index is attempted.
The Explain output (see also) shows 1 row in its summary. It is relatively straight forward. The explain output and performance with your derived table B will differ depending on whether you are on a version before 5.7, or 5.7 and after.
The document Derived Tables in MySQL 5.7 describes it well for versions 5.6 and 5.7, where the latter will provide no penalty due to the change in materialized derived table output being incorporated into the outer query. In prior versions, substantial overhead was endured with temporary tables with the derived.
It is quite easy to test the performance penalty prior to 5.7. All it takes is a medium sized table to see the noticeable impact that your question's derived table has on impacting performance. The following example is on a small table in version 5.6:
explain
select qm1.title
from questions_mysql qm1
join questions_mysql qm2
on qm2.qid<qm1.qid
where qm1.qid>3333 and qm1.status='O';
+----+-------------+-------+-------+-----------------+---------+---------+------+-------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+---------+---------+------+-------+------------------------------------------------+
| 1 | SIMPLE | qm1 | range | PRIMARY,cactus1 | PRIMARY | 4 | NULL | 5441 | Using where |
| 1 | SIMPLE | qm2 | ALL | PRIMARY,cactus1 | NULL | NULL | NULL | 10882 | Range checked for each record (index map: 0x3) |
+----+-------------+-------+-------+-----------------+---------+---------+------+-------+------------------------------------------------+
explain
select b.title from
( select qid,title from questions_mysql where qid>3333 and status='O'
) b
join questions_mysql qm2
on qm2.qid<b.qid;
+----+-------------+-----------------+-------+-----------------+---------+---------+------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+-----------------+---------+---------+------+-------+----------------------------------------------------+
| 1 | PRIMARY | qm2 | index | PRIMARY,cactus1 | cactus1 | 10 | NULL | 10882 | Using index |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5441 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | questions_mysql | range | PRIMARY,cactus1 | PRIMARY | 4 | NULL | 5441 | Using where |
+----+-------------+-----------------+-------+-----------------+---------+---------+------+-------+----------------------------------------------------+
Note, I did change the question, but it illustrates the impact that derived tables and their lack of index use with the optimizer has in versions prior to 5.7. The derived table benefits from indexes as it is being materialized. But thereafter it endures overhead as a temporary table and is incorporated into the outer query without index use. This is not the case in version 5.7
I'm new to query optimizations so I accept I don't understand everything yet but I do not understand why even this simple query isn't optimized as expected.
My table:
+------------------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11) | NO | PRI | NULL | auto_increment |
| taskid | int(11) | NO | MUL | NULL | |
| transitiondate | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+------------------+-----------+------+-----+-------------------+----------------+
My indexes:
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions | 0 | PRIMARY | 1 | tasktransitionid | A | 952 | NULL | NULL | | BTREE | | |
| tasktransitions | 1 | transitiondate_ix | 1 | transitiondate | A | 952 | NULL | NULL | | BTREE | | |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
My query:
SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';
gives this:
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tasktransitions | ALL | transitiondate_ix | NULL | NULL | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
If I understand everything correctly Using where and ALL means that all rows are retrieved from the storage engine and filtered at server layer. This is sub-optimal. Why does it refuse to use the index and only retrieve the requested range from the storage engine (innoDB)?
Cheers
MySQL will not use the index if it estimates that it would select a significantly large portion of the table, and it thinks that a table-scan is actually more efficient in those cases.
By analogy, this is the reason the index of a book doesn't contain very common words like "the" -- because it would be a waste of time to look up the word in the index and find the list of page numbers is a very long list, even every page in the book. It would be more efficient to simply read the book cover to cover.
My experience is that this happens in MySQL if a query's search criteria would match greater than 20% of the table, and this is usually the right crossover point. There could be some variation based on the data types, size of table, etc.
You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:
SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
I once was trying to join two tables and MySQL was refusing to use an index, resulting in >500ms queries, sometimes a few seconds. Turns out the column I was joining on had different encodings on each table. Changing both to the same encoding sped up the query to consistently less than 100ms.
Just in case, it helps somebody.
I have a table with a varchar column _id (long int coded as string). I added an index for this column, but query was still slow. I was executing this query:
select * from table where (_id = 2221835089) limit 1
I realized that the _id column wasn't been generated as string (I'm Laravel as DB framework). Well, if query is executed with the right data type in the where clause everything worked like a charm:
select * from table where (_id = '2221835089') limit 1
I am new at my MySQL 8.0, have finished 2 simple tutorials completely, and there is only two subjects that has not worked for me, one of them is indexing. I read the section labeled "2 Answers" and found that using
the statement suggested at the end of said section, seems to defeat the
purpose of the original USE INDEX or FORCE INDEX statement below. The suggested statement is like getting a table sorted via a WHERE statement instead of MySQL using USE INDEX or FORCE INDEX. It works, but seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does any one knows why MySQL is ignoring my simple request to index a 10 row table on the Lname column?
Field
Type
Null
Key
Default
Extra
ID
int
NO
PRI
Null
auto_increment
Lname
varchar(20)
NO
MUL
Null
Fname
varchar(20)
NO
Mul
Null
City
varchar(15)
NO
Null
Birth_Date
date
NO
Null
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst);
I am executing most of the queries based on the time. So i created index for the created time. But , The index only works , If I select the indexed columns only. Is mysql index is dependant the selected columns?.
My Assumption On Index
I thought index is like a telephone dictionary index page. Ex: If i want to find "Mark" . Index page shows which page character "M" starts in the directory. I think as same as the mysql works.
Table
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | varchar(100) | YES | | NULL | |
| OPERATION | varchar(100) | YES | | NULL | |
| PID | int(11) | YES | | NULL | |
| CREATED_TIME | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Indexes On the table.
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IndexTest | 0 | PRIMARY | 1 | ID | A | 10261 | NULL | NULL | | BTREE | | |
| IndexTest | 1 | t_dx | 1 | CREATED_TIME | A | 410 | NULL | NULL | YES | BTREE | | |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Queries Using Indexes:
explain select * from IndexTest where ID < 5;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | IndexTest | range | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
explain select CREATED_TIME from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | IndexTest | range | t_dx | t_dx | 9 | NULL | 5248 | Using where; Using index |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
Queries Not using Indexes
explain select count(distinct(PID)) from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
explain select PID from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
Short answer: No.
Whether indexes are used depends on the expresion in your WHERE clause, JOINs etc, but not on the columns you select.
But no rule without an exception (or actually a long list of those):
Long answer: Usually not
There are a number of factors used by the MySQL Optimizer in order to determine whether it should use an index.
The optimizer may decide to ignore an index if...
another (otherwise non-optimal) saves it from accessing the table data at all
it fails to understand that an expression is a constant
its estimates suggest it will return the full table anyway
if its use will cause the creation of a temporary file
... and tons of other reasons, some of which seem not to be documented anywhere
Sometimes the choices made by said optimizer are... erm... lets call them sub-optimal. Now what do you do in those cases?
You can help the optimizer by doing an OPTIMIZE TABLE and/or ANALYZE TABLE. That is easy to do, and sometimes helps.
You can make it use a certain index with the USE INDEX(indexname) or FORCE INDEX(indexname) syntax
You can make it ignore a certain index with the IGNORE INDEX(indexname) syntax
More details on Index Hints, Optimize Table and Analyze Table on the MySQL documentation website.
Actually, it makes no difference wether you select the column or not. Indexes are used for lookups, meaning for reducing really fast the number of records you need retrieved. That makes it usually useful in situations where: you have joins, you have where conditions. Also indexes help alot in ordering.
Updating and deleting can be sped up quite alot using indexes on the where conditions as well.
As an example:
table: id int pk ai, col1 ... indexed, col2 ...
select * from table -> does not use a index
select id from table where col1 = something -> uses the col1 index although it is not selected.
Looking at the second query, mysql does a lookup in the index, locates the records, then in this case stops and delivers (both id and col1 have index and id happens to be pk, so no need for a secondary lookup).
Situation changes a little in this case:
select col2 from table where col1 = something
This will make internally 2 lookups: 1 for the condition, and 1 on the pk for delivering the col2 data. Please notice that again, you don't need to select the col1 column to use the index.
Getting back to your query, the problem lies with: UNIX_TIMESTAMP(CURRENT_DATE())*1000;
If you remove that, your index will be used for lookups.
Is mysql index is dependant the selected columns?.
Yes, absolutely.
For example:
MySQL cannot use the index to perform lookups if the columns do not form a leftmost
prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index.
The third and fourth queries do involve indexed columns, but (col2) and (col2, col3)
are not leftmost prefixes of (col1, col2, col3).
Have a read through the extensive documentation.
for mysql query , the answer is yes, but not all
the query:
explain select * from IndexTest where ID < 5;
use the table cluster index if you use innodb, its table's primary key, so it use primary for query
the second query:
select CREATED_TIME from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
this one is just fetch the index column that mysql does not need to fetch data from table but just index, so your explain result got "Using Index"
the query:
select count(distinct(PID)) from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
it look like this
select PID from IndexTest where
CREATE_TIME>UNIX_TIMESTAMP(CURRENT_DATE())*1000 group by PID
mysql can use index to fetch data from database also, but mysql thinks this query it no need to use index to fetch data, because of the where condition filter, mysql thinks that use index fetch data is more expensive than scan all table, you can use force index also
the same reason for your last query
hopp this answer can help you
indexing helps speed the search for that particular column and associated data rather than the table data. So you have to include the indexed column to speed up select.
I have a table holding numeric data points with timestamps, like so:
CREATE TABLE `value_table1` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
KEY `datetime` (`datetime`)
) ENGINE=InnoDB;
My table holds a data point for every 5 seconds, so timestamps in the table will be, e.g.:
"2013-01-01 10:23:35"
"2013-01-01 10:23:40"
"2013-01-01 10:23:45"
"2013-01-01 10:23:50"
I have a few such value tables, and it is sometimes necessary to look at the ratio between two value series.
I therefore attempted a join, but it seems to not work:
SELECT value_table1.datetime, value_table1.value / value_table2.rate
FROM value_table1
JOIN value_table2
ON value_table1.datetime = value_table2.datetime
ORDER BY value_table1.datetime ASC;
Running EXPLAIN on the query shows:
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | value_table1 | ALL | NULL | NULL | NULL | NULL | 83784 | Using temporary; Using filesort |
| 1 | SIMPLE | value_table2 | ALL | NULL | NULL | NULL | NULL | 83735 | |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
Edit
Problem solved, no idea where my index disappeared to. EXPLAIN showed it, thanks!
Thanks!
As your explain shows, the query is not using indexes on the join. Without indexes, it has to scan every row in both tables to process the join.
First of all, make sure the columns used in the join are both indexed.
If they are, then it might be the column type that is causing issues. You could create an integer representation of the time, and then use that to join the two tables.