MySQL join issue, query hangs

I have a table holding numeric data points with timestamps, like so:
CREATE TABLE `value_table1` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
KEY `datetime` (`datetime`)
) ENGINE=InnoDB;
My table holds a data point every 5 seconds, so timestamps in the table will be, e.g.:
"2013-01-01 10:23:35"
"2013-01-01 10:23:40"
"2013-01-01 10:23:45"
"2013-01-01 10:23:50"
I have a few such value tables, and it is sometimes necessary to look at the ratio between two value series.
I therefore attempted a join, but it does not seem to work:
SELECT value_table1.datetime, value_table1.value / value_table2.rate
FROM value_table1
JOIN value_table2
ON value_table1.datetime = value_table2.datetime
ORDER BY value_table1.datetime ASC;
Running EXPLAIN on the query shows:
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
|  1 | SIMPLE      | value_table1 | ALL  | NULL          | NULL | NULL    | NULL | 83784 | Using temporary; Using filesort |
|  1 | SIMPLE      | value_table2 | ALL  | NULL          | NULL | NULL    | NULL | 83735 |                                 |
+----+-------------+--------------+------+---------------+------+---------+------+-------+---------------------------------+
Edit
Problem solved; no idea where my index disappeared to, but EXPLAIN revealed it. Thanks!

As your EXPLAIN output shows, the query is not using indexes for the join. Without them, MySQL has to scan every row in both tables to process the join.
First of all, make sure the columns used in the join are both indexed.
If they are, then it may be the column type that is causing issues. You could create an integer representation of the time and use that to join the two tables, for example:
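A minimal sketch of both suggestions, assuming value_table2 mirrors value_table1's structure (the unix_ts column and key names here are hypothetical):

-- Make sure both join columns are indexed
ALTER TABLE value_table2 ADD KEY `datetime` (`datetime`);

-- Alternatively, join on an integer representation of the time
ALTER TABLE value_table1
  ADD COLUMN unix_ts INT UNSIGNED,
  ADD KEY `unix_ts` (`unix_ts`);
UPDATE value_table1 SET unix_ts = UNIX_TIMESTAMP(`datetime`);
-- repeat for value_table2, then join ON value_table1.unix_ts = value_table2.unix_ts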

Related

Need index for simple query please

Can anyone suggest a good index to make this query run quicker?
SELECT
s.*,
sl.session_id AS session_id,
sl.lesson_id AS lesson_id
FROM
cdu_sessions s
INNER JOIN cdu_sessions_lessons sl ON sl.session_id = s.id
WHERE
(s.sort = '1') AND
(s.enabled = '1') AND
(s.teacher_id IN ('193', '1', '168', '1797', '7622', '19951'))
Explain:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | s | NULL | ALL | PRIMARY | NULL | NULL | NULL | 2993 | 0.50 | Using where
1 | SIMPLE | sl | NULL | ref | session_id,ix2 | ix2 | 4 | ealteach_main.s.id | 5 | 100.00 | Using index
cdu_sessions looks like this:
------------------------------------------------
id | int(11)
name | varchar(255)
map_location | enum('classroom', 'school'...)
sort | tinyint(1)
sort_titles | tinyint(1)
friend_gender | enum('boy', 'girl'...)
friend_name | varchar(255)
friend_description | varchar(2048)
friend_description_format | varchar(128)
friend_description_audio | varchar(255)
friend_description_audio_fid | int(11)
enabled | tinyint(1)
created | int(11)
teacher_id | int(11)
custom | int(1)
------------------------------------------------
cdu_sessions_lessons contains 3 fields - id, session_id and lesson_id
Thanks!
Without looking at the query plan, row counts, and data distribution in each table, it is hard to predict a good index to make it run faster.
That said, I would say this might help:
create index sessions_teacher_idx on cdu_sessions(teacher_id);
Looking at the WHERE condition, you could use a composite index for table cdu_sessions:
create index idx1 on cdu_sessions(teacher_id, sort, enabled);
and, looking at the join and the SELECT list, one for table cdu_sessions_lessons:
create index idx2 on cdu_sessions_lessons(session_id, lesson_id);
First, write the query so no type conversions are necessary. All the comparisons in the where clause are to numbers, so use numeric constants:
SELECT s.*,
sl.session_id, -- unnecessary because s.id is in the result set
sl.lesson_id
FROM cdu_sessions s INNER JOIN
cdu_sessions_lessons sl
ON sl.session_id = s.id
WHERE s.sort = 1 AND
s.enabled = 1 AND
s.teacher_id IN (193, 1, 168, 1797, 7622, 19951);
Although it might not be happening in this specific case, mixing types can impede the use of indexes.
I removed the column aliases (AS session_id, for instance). These were redundant because each alias was identical to the column name, so the query wasn't renaming anything.
For this query, first look at the WHERE clause. All the column references are from one table. These should go in the index, with the equality comparisons first:
create index idx_cdu_sessions_4 on cdu_sessions(sort, enabled, teacher_id, id);
I added id because it is also used in the JOIN.
Formally, id is not needed in the index if it is the primary key, because InnoDB secondary indexes implicitly include the primary key columns. However, I like to be explicit when I want it there.
Next you want an index for the second table. Only two columns are referenced from there, so they can both go in the index. The first column should be the one used in the join:
create index idx_cdu_sessions_lessons_2 on cdu_sessions_lessons(session_id, lesson_id);

MySQL refuses to use index

I'm new to query optimization, so I accept that I don't understand everything yet, but I do not understand why even this simple query isn't optimized as expected.
My table:
+------------------+-----------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11) | NO | PRI | NULL | auto_increment |
| taskid | int(11) | NO | MUL | NULL | |
| transitiondate | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+------------------+-----------+------+-----+-------------------+----------------+
My indexes:
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions | 0 | PRIMARY | 1 | tasktransitionid | A | 952 | NULL | NULL | | BTREE | | |
| tasktransitions | 1 | transitiondate_ix | 1 | transitiondate | A | 952 | NULL | NULL | | BTREE | | |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
My query:
SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';
gives this:
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | tasktransitions | ALL | transitiondate_ix | NULL | NULL | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
If I understand everything correctly, Using where and ALL mean that all rows are retrieved from the storage engine and filtered at the server layer. This is sub-optimal. Why does MySQL refuse to use the index and retrieve only the requested range from the storage engine (InnoDB)?
Cheers
MySQL will not use the index if it estimates that the query would select a significantly large portion of the table; it thinks a table scan is actually more efficient in those cases.
By analogy, this is the reason the index of a book doesn't contain very common words like "the": it would be a waste of time to look the word up in the index only to find that the list of page numbers is very long, perhaps every page in the book. It would be more efficient to simply read the book cover to cover.
My experience is that this happens in MySQL when a query's search criteria would match more than about 20% of the table, and that is usually the right crossover point. There can be some variation based on the data types, size of the table, etc.
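A quick way to check which side of that crossover the predicate falls on, using the table from the question (note that the question's '2013-09-31' is not a valid date, so a valid literal is used here):

SELECT COUNT(*) FROM tasktransitions WHERE transitiondate > '2013-09-30 00:00:00';
SELECT COUNT(*) FROM tasktransitions;
-- If the first count exceeds roughly 20% of the second, a table scan is expected.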
You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:
SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
I was once trying to join two tables and MySQL was refusing to use an index, resulting in queries taking over 500 ms, sometimes a few seconds. It turned out the column I was joining on had a different encoding in each table. Changing both to the same encoding sped the query up to consistently under 100 ms.
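A sketch of that kind of fix; the table, column, type, and character set here are all hypothetical stand-ins, since the original schema wasn't shown:

-- Make the join column's encoding match on both sides
ALTER TABLE table_a
  MODIFY join_col VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- or convert a whole table at once
ALTER TABLE table_b CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;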
Just in case it helps somebody:
I have a table with a varchar column _id (a long integer encoded as a string). I added an index for this column, but the query was still slow. I was executing this query:
select * from table where (_id = 2221835089) limit 1
I realized that the _id value wasn't being passed as a string (I'm using Laravel as the DB framework). When the query is executed with the right data type in the WHERE clause, everything works like a charm:
select * from table where (_id = '2221835089') limit 1
I am new to MySQL 8.0, have completely finished two simple tutorials, and there are only two subjects that have not worked for me; one of them is indexing. I read the section labeled "2 Answers" and found that the statement suggested at the end of that section seems to defeat the purpose of the original USE INDEX or FORCE INDEX statements below. The suggested statement is like getting a table sorted via a WHERE clause instead of MySQL using USE INDEX or FORCE INDEX. It works, but it seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does anyone know why MySQL is ignoring my simple request to index a 10-row table on the Lname column?
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| ID         | int         | NO   | PRI | NULL    | auto_increment |
| Lname      | varchar(20) | NO   | MUL | NULL    |                |
| Fname      | varchar(20) | NO   | MUL | NULL    |                |
| City       | varchar(15) | NO   |     | NULL    |                |
| Birth_Date | date        | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst);

Why is COUNT() query from large table much faster than SUM()

I have a data warehouse with the following tables:
main
about 8 million records
CREATE TABLE `main` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`cid` mediumint(8) unsigned DEFAULT NULL, -- This is the customer id
`iid` mediumint(8) unsigned DEFAULT NULL, -- This is the item id
`pid` tinyint(3) unsigned DEFAULT NULL, -- This is the period id
`qty` double DEFAULT NULL,
`sales` double DEFAULT NULL,
`gm` double DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_pci` (`pid`,`cid`,`iid`) USING HASH,
KEY `idx_pic` (`pid`,`iid`,`cid`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=7978349 DEFAULT CHARSET=latin1
period
This table has about 50 records and has the following fields
id
month
year
customer
This has about 23,000 records and the following fields
id
number //This field is unique
name //This is simply a description field
The following query runs very fast (less than 1 second) and returns a count of about 2,000:
select count(*)
from mydb.main m
INNER JOIN mydb.period p ON p.id = m.pid
INNER JOIN mydb.customer c ON c.id = m.cid
WHERE p.year = 2013 AND c.number = 'ABC';
But this query is much slower (more than 45 seconds); it is the same as the previous one except that it sums instead of counting:
select sum(sales)
from mydb.main m
INNER JOIN mydb.period p ON p.id = m.pid
INNER JOIN mydb.customer c ON c.id = m.cid
WHERE p.year = 2013 AND c.number = 'ABC';
When I explain each query, the ONLY difference I see is that on the 'count()'
query the 'Extra' field says 'Using index', while for the 'sum()' query this field is NULL.
Explain count() query
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | c | const | PRIMARY,idx_customer | idx_customer | 11 | const | 1 | Using index |
| 1 | SIMPLE | p | ref | PRIMARY,idx_period | idx_period | 4 | const | 6 | Using index |
| 1 | SIMPLE | m | ref | idx_pci,idx_pic | idx_pci | 6 | mydb.p.id,const | 7 | Using index |
Explain sum() query
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | c | const | PRIMARY,idx_customer | idx_customer | 11 | const | 1 | Using index |
| 1 | SIMPLE | p | ref | PRIMARY,idx_period | idx_period | 4 | const | 6 | Using index |
| 1 | SIMPLE | m | ref | idx_pci,idx_pic | idx_pci | 6 | mydb.p.id,const | 7 | NULL |
Why is the count() so much faster than sum()? Shouldn't it be using the index for both?
What can I do to make the sum() go faster?
Thanks in advance!
EDIT
All the tables show that they are using the InnoDB engine.
Also, as a side note, if I just run a SELECT * query, it returns very quickly (less than 2 seconds). I would expect SUM() to take no longer than that, since SELECT * has to retrieve the rows anyway...
SOLVED
This is what I've learned:
Since the sales field is not part of the index, MySQL has to retrieve the records from the hard drive (which can be kind of slow).
I'm not too familiar with this, but it looks like I/O performance can be increased by switching to an SSD (solid-state drive). I'll have to research this more.
For now, I think I'm going to create another layer of summary in order to get the performance I'm looking for.
I redefined my index on the main table to be (pid,cid,iid,sales,gm,qty) and now the sum() queries are running VERY fast!
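A sketch of that redefinition (the new index name is made up; dropping the old idx_pci avoids keeping a redundant prefix index):

ALTER TABLE main
  DROP INDEX idx_pci,
  ADD INDEX idx_pci_covering (pid, cid, iid, sales, gm, qty);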
Thanks everybody!
An index is an ordered list of key values.
When you run the COUNT() query, the actual data in the table can be ignored and just the index used.
When you run the SUM(sales) query, each row has to be read from disk to get the sales figure, hence it is much slower.
Additionally, the index can be read in bulk and then processed in memory, while the row fetches will randomly thrash the drive trying to read rows from across the disk.
Finally, the index itself may have summaries of the counts (to help with plan generation).
Update
You actually have three indexes on your table:
PRIMARY KEY (`id`),
KEY `idx_pci` (`pid`,`cid`,`iid`) USING HASH,
KEY `idx_pic` (`pid`,`iid`,`cid`) USING HASH
So you only have indexes on the columns id, pid, cid, and iid. (As an aside, most databases are smart enough to combine indexes, so you could probably consolidate your indexes somewhat.)
If you added another key like KEY idx_sales (id, sales), that could improve read performance, but you would be paying extra cost on every update to the sales column, which is likely a bad thing.
The simple answer is that COUNT() is only counting rows, which can be satisfied by the index alone.
SUM() needs to identify each row and then fetch the page to get the sales column. This adds a lot of overhead: roughly one page load per row.
If you add sales to the index, it should also go very fast, because the query will no longer have to fetch the original data.

Optimizing MySQL indexes for query (trading tick data database)

My MySQL database has over 350 million rows and is growing. It's 32 GB in size right now. I am using SSDs and lots of RAM, but I would like advice to make sure I am using appropriate indexes.
CREATE TABLE `qcollector` (
`key` bigint(20) NOT NULL AUTO_INCREMENT,
`instrument` char(4) DEFAULT NULL,
`datetime` datetime DEFAULT NULL,
`last` double DEFAULT NULL,
`lastsize` int(10) DEFAULT NULL,
`totvol` int(10) DEFAULT NULL,
`bid` double DEFAULT NULL,
`ask` double DEFAULT NULL,
PRIMARY KEY (`key`),
KEY `datetime_index` (`datetime`)
) ENGINE=InnoDB;
show index from qcollector;
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| qcollector |          0 | PRIMARY        |            1 | key         | A         |   378866659 | NULL     | NULL   |      | BTREE      |         |               |
| qcollector |          1 | datetime_index |            1 | datetime    | A         |    63144443 | NULL     | NULL   | YES  | BTREE      |         |               |
+------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.03 sec)
select * from qcollector order by datetime desc limit 1;
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
| key | instrument | datetime | last | lastsize | totvol | bid | ask |
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
| 389054487 | ES | 2012-06-29 15:14:59 | 1358.25 | 2 | 2484771 | 1358.25 | 1358.5 |
+-----------+------------+---------------------+---------+----------+---------+---------+--------+
1 row in set (0.09 sec)
A typical query that is slow (full table scan, this query takes 3-4 minutes):
explain select date(datetime), count(lastsize) from qcollector where instrument = 'ES' and datetime > '2011-01-01' and time(datetime) between '15:16:00' and '15:29:00' group by date(datetime) order by date(datetime) desc;
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
| 1 | SIMPLE | qcollector | ALL | datetime_index | NULL | NULL | NULL | 378866659 | Using where; Using temporary; Using filesort |
+------+-------------+------------+------+----------------+------+---------+------+-----------+----------------------------------------------+
A couple of ideas for you to consider:
A covering index (that is, an index that includes ALL of the columns referenced in the query) may help some. Such an index will require more disk (SSD?) space, but it removes the need for MySQL to visit the data pages to look up the values of the columns that aren't in the index.
ON qcollector (datetime,instrument,lastsize)
or
ON qcollector (instrument,datetime,lastsize)
Do you really need to exclude rows that have a NULL value for lastsize from the count? Could you return a count of all rows instead? If you could instead return COUNT(1) or SUM(1), then the query wouldn't need to reference the lastsize column, so it wouldn't be needed in an index to make it a covering index.
The COUNT(lastsize) expression is equivalent to SUM(IF(lastsize IS NULL,0,1))
Do you need to return dates when there are only NULL lastsize values for the datetime range, or could all of the rows with a NULL lastsize be excluded? That is, could you include a predicate like
AND lastsize IS NOT NULL
in your query?
Those may help some.
I think the big problem is that the predicates on the TIME(datetime) expression are not sargable. That is, MySQL won't use an index range scan operation for those. The predicate on the bare datetime column is sargable... that's why the EXPLAIN is showing the datetime_index as a possible key.
And the other big problem is that the query is doing GROUP BY and ORDER BY operations on a derived expression, which is going to require MySQL to generate an intermediate result set (as a temporary MyISAM table), and then process that result set. And that can be a lot of heavy lifting when there are lots of rows to process.
As far as table changes go, I would consider using separate DATE and TIME columns, and using a TIMESTAMP datatype in place of DATETIME (if you need to store the date and time together). I would rewrite the query to reference the bare DATE and bare TIME columns, and consider adding a covering index that includes all columns referenced in the rewritten query, with the leading columns being those with the highest cardinality (and the most selective predicates in the query).
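A hedged sketch of that restructuring; the trade_date and trade_time column names and the index name are hypothetical:

-- Split the DATETIME into separately indexable columns
ALTER TABLE qcollector
  ADD COLUMN trade_date DATE,
  ADD COLUMN trade_time TIME;
UPDATE qcollector
  SET trade_date = DATE(`datetime`),
      trade_time = TIME(`datetime`);
CREATE INDEX qcollector_cover_ix
  ON qcollector (instrument, trade_date, trade_time, lastsize);

-- The rewritten query references the bare columns, so the range
-- predicates become sargable and the index covers every column used:
SELECT trade_date, COUNT(lastsize)
FROM qcollector
WHERE instrument = 'ES'
  AND trade_date > '2011-01-01'
  AND trade_time BETWEEN '15:16:00' AND '15:29:00'
GROUP BY trade_date
ORDER BY trade_date DESC;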
When you apply date and time functions to a column, indexes on that column cannot be used efficiently. You could also store the date and time in separate columns and index those, though this will take up more storage space.
You may also want to consider adding multi-column indexes. An index on (instrument, datetime) would probably help you here, for example:
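(The index name below is made up:)

CREATE INDEX instrument_datetime_ix ON qcollector (instrument, `datetime`);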

MySQL datetime index is not working

Table structure:
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| total | int(11) | YES | | NULL | |
| thedatetime | datetime | YES | MUL | NULL | |
+-------------+----------+------+-----+---------+----------------+
Total rows: 137967
mysql> explain select * from out where thedatetime <= NOW();
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | out | ALL | thedatetime | NULL | NULL | NULL | 137967 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
The real query is much longer, with more table joins; the point is, I can't get the table to use the datetime index. This is going to be hard for me if I want to select all data up to a certain date. However, I noticed that I can get MySQL to use the index if I select a smaller subset of data.
mysql> explain select * from out where thedatetime <= '2008-01-01';
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| 1 | SIMPLE | out | range | thedatetime | thedatetime | 9 | NULL | 15826 | Using where |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
mysql> select count(*) from out where thedatetime <= '2008-01-01';
+----------+
| count(*) |
+----------+
| 15990 |
+----------+
So, what can I do to make sure MySQL will use the index no matter what date that I put?
There are two things in play here:
Index is not selective enough: if the index covers more than approximately 30% of the rows, MySQL decides a full table scan is more efficient. When you contract the range, the index kicks in.
One index per table in a join.
The real query is much more longer
with more table joins, the point is ...
The point is exactly that: because it has joins, the query probably can't use that index. MySQL can use one index per table in a join (unless it qualifies for an index-merge optimization). If the primary key is already used for the join, thedatetime won't be used. In order to use it, you need to create a multi-column index on the join key plus the thedatetime column, in the correct order.
Check the EXPLAIN of the actual query to see which key MySQL uses for the join. Modify that index to include the thedatetime column as well, or create a new multi-column index from both (depending on what you use the join key for).
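A hypothetical sketch, assuming the real query joins on a column named user_id (the actual join key isn't shown in the question; out is backticked because OUT is a reserved word):

CREATE INDEX out_join_datetime_ix ON `out` (user_id, thedatetime);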
Everything works as it is supposed to. :)
Indexes are there to speed up retrieval. They do it using index lookups.
In your first query the index is not used because you are retrieving ALL rows, and in this case using the index is slower (look up index, get row, look up index, get row... repeated for every row is slower than just reading all rows, i.e. a table scan).
In the second query you are retrieving only a portion of the data, and in this case a table scan is much slower.
The job of the optimizer is to use the statistics that the RDBMS keeps on the index to determine the best plan. In the first case the index was considered, but the planner (correctly) threw it away.
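If those statistics are stale, the planner can misjudge selectivity; refreshing them is a reasonable first step (a standard MySQL statement, applied here to the question's table):

ANALYZE TABLE `out`;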
EDIT
You might want to read something like this to get some concepts and keywords regarding the MySQL query planner.