Performance disparities between two almost identical tables - mysql

I have two tables that are all the same, except one has a timestamp value column and the other has a datetime value column. Indexes are the same. Values are the same.
But when I run SELECT station, MAX(timestamp) AS max_timestamp FROM stations GROUP BY station; if stations is the one with timestamps, it executes really fast, and if I try it with the datetime one, well I haven't seen one query executes. In both cases the timestampcolumn is indexed, only the type changes.
Where should I start looking for? Or is datetime just not suitable for search and indexing ?
Here is what EXPLAIN gives :
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | stations | range | NULL | stamp | 33 | NULL | 1511 | Using index for group-by |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
| 1 | SIMPLE |stations2 | index | NULL | station | 2 | NULL | 3025467 | |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
And the SHOW:
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stations | CREATE TABLE `stations` (
`station` varchar(10) COLLATE utf8_bin DEFAULT NULL,
`available` smallint(6) DEFAULT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `stamp` (`station`,`timestamp`),
KEY `time` (`timestamp`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stations2 | CREATE TABLE `stations2` (
`station` smallint(5) unsigned NOT NULL,
`available` smallint(5) unsigned DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
KEY `station` (`station`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

You can see from the EXPLAIN that there is no key being used for selection (NULL for possible_keys). You don't have a WHERE clause, so this makes sense.
MySQL can utilize an index to determine MAX, and it can utilize an index to optimize GROUP BY. However, to be able to optimize both combined, you would need both the column in your MAX() function and the column in your GROUP BY clause to be in a compound index. In the first table, you have this compound index as a unique key called 'stamp'. The EXPLAIN result shows that MySQL is using that index.
On the second table, you don't have this compound index, so MySQL is having to perform a lot more work. It has to manually group the results and keep the MAX value for each station by manually scanning each row. If you add the same compound index on the second table, you will see similar performance between the two.
However, TIMESTAMP will still slightly outperform DATETIME because TIMESTAMP is treated as a single 4 byte integer value, which is processed faster than an 8 byte special DATETIME value. The larger the data set, the larger difference you will see.

Related

Mysql count performance on very big rows between two dates conditions

I have a table with more than 20 million rows in Innodb.
the columns are
id, viewable_id, visitor, viewed_at
where the viewable_id and viewed_at are indexes.
when I do the below query
SELECT COUNT(*)
FROM views_users
WHERE (viewable_id = 2)
and (viewed_at between '2021-04-19 01:38:37'
and '2021-06-30 01:38:37');
=> take (3 min 6.72 sec)
the explain is
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+-----------------------------------------------------------+-------------------------------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | views_users | NULL | ref | views_users_viewable_id_index,views_users_viewed_at_index | views_users_viewable_id_index | 8 | const | 9554594 | 50.00 | Using where
How can I increase the performance to reach less than 4 seconds?
CREATE TABLE views_users (
id int unsigned NOT NULL AUTO_INCREMENT,
viewable_type varchar(255) NOT NULL,
viewable_id bigint unsigned NOT NULL,
visitor text,
collection varchar(255) DEFAULT NULL,
viewed_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY user_id (viewable_id)
) ENGINE=InnoDB AUTO_INCREMENT=20995848
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
I increase the performance to take less than 2 seconds by applying MySQL partiotons.
I used partition by range using the viewed_at column. change viewed_at type from timestamp to datatime and made it as primary key with id.
make a cronjob runs on first day of each month that reorganize last partition into another partitions and so on.
For this query:
SELECT COUNT(*)
FROM views_users
WHERE viewable_id = 2 and
viewed_at between '2021-04-19 01:38:37' and '2021-06-30 01:38:37';
You can create an index:
CREATE INDEX idx_views_users_viewable_id_viewed_at ON views_users(viewable_id, viewed_at);

MySQL with JOIN not using index

Problem with MySQL version 5.7.18. Earlier versions of MySQL behaves as supposed to.
Here are two tables. Table 1:
CREATE TABLE `test_events` (
`id` int(11) NOT NULL,
`event` int(11) DEFAULT '0',
`manager` int(11) DEFAULT '0',
`base_id` int(11) DEFAULT '0',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`client` int(11) DEFAULT '0',
`event_time` datetime DEFAULT '0000-00-00 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_events`
ADD PRIMARY KEY (`id`),
ADD KEY `client` (`client`),
ADD KEY `event_time` (`event_time`),
ADD KEY `manager` (`manager`),
ADD KEY `base_id` (`base_id`),
ADD KEY `create_time` (`create_time`);
And the second table:
CREATE TABLE `test_event_types` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`base` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_event_types`
ADD PRIMARY KEY (`id`);
Let's try to select last event from base "314":
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
WHERE base = 314
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | test_events | NULL | ALL | NULL | NULL | NULL | NULL | 434928 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | test_event_types | NULL | ALL | PRIMARY | NULL | NULL | NULL | 44 | 2.27 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
MySQL is not using index and reads the whole table.
Without WHERE statement:
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| 1 | SIMPLE | test_events | NULL | index | NULL | create_time | 4 | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | test_event_types | NULL | eq_ref | PRIMARY | PRIMARY | 4 | m16.test_events.event | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
Now it uses index.
MySQL 5.5.55 uses index in both cases. Why is it so and what to do with it?
I don't know the difference you are seeing in your previous and current installations but the servers behaviour makes sense.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) ORDER BY test_events.create_time DESC LIMIT 1;
In this query you do not have a where clause but you are fetching one row only. And that's after sorting by create_time which happens to have an index. And that index can be used for sorting. But let's see the second query.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) WHERE base = 314 ORDER BY test_events.create_time DESC LIMIT 1
You don't have an index on the base column. So no index can be used on that. To find the relevent records mysql has to do a table scan. Having identified the relevent rows, they need to be sorted. But in this case the query planner has decided that it's just not worth it to use the index on create_time
I see several problems with your setup, the first being not having and index on base as already mentioned. But why is base varchar? You appear to be storing integers in it.
ALTER TABLE test_events
ADD PRIMARY KEY (id),
ADD KEY client (client),
ADD KEY event_time (event_time),
ADD KEY manager (manager),
ADD KEY base_id (base_id),
ADD KEY create_time (create_time);
And making multiple indexes like this doesn't make much sense in mysql. That's because mysql can use only one index per table for queries. You would be far better off with one or two indexes. Possibly multi column indexes.
I think your ideal index would contain both create_time and event fields
base = 314 with base VARCHAR... is a performance problem. Either put quotes around 314 or make base some integer type.
You appear not to need LEFT. If not, then do a plain JOIN so that the optimizer has the freedom to start with an INDEX(base), which is then missing and needed.
As for the differences between 5.5 and 5.6 and 5.7, there have been a number of Optimization changes; you may have encountered a regression. But I don't want to chase that until you have improved the query and indexes.
I stumbled upon same scenario where MySQL was using table scan, instead of INDEX search.
This could be because of one of the reasons, mentioned in MySQL docs:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
mysql docs link
And when I checked EXPLAIN of MySQL query in production server with large number of rows, it used INDEX search as expected.
Its one of the MySQL optimizations, under the hood :)

Why does mysql require filesort when the order by col has an index?

I create a table like this:
CREATE TABLE `testtable` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL,
PRIMARY KEY (`id`),
KEY `test_updated_at_index` (`updated_at`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And the explain on a select says that it is using filesort:
mysql> explain select * from testtable order by updated_at;
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | testtable | ALL | NULL | NULL | NULL | NULL | 1 | Using filesort |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
Why does it not use the index? Additionally, if I just remove the created_at column, then it DOES use the index!
Why does adding a single additional column cause mysql to forget how to use the index?
How can I change the table so that mysql DOES use the index to order by, even with an additional column?
There is no WHERE clause, so you read all records from the table. Most often it is way faster to read the table directly, even if you must sort the data afterwards, rather than looping over an index and picking each record separately.
Only if you were interested only in a small part of the table, say 2% of its data, would it make sense to use the index. Or if you were only interested in the indexed column alone, of course.

Why won't my query use the index on my datetime field?

I have the following table:
CREATE TABLE 'tableA'(
`col1` int(11) NOT NULL AUTO_INCREMENT,
`col2` varchar(20) NOT NULL,
`col3` int(11) NOT NULL,
`col4` varchar(200) NOT NULL,
`col5` varchar(15) NOT NULL,
`col6` datetime NOT NULL,
PRIMARY KEY (`col1`),
UNIQUE KEY `col2,col3` (`col2`,`col3`),
KEY `col6` (`col6`)
) ENGINE=InnoDB AUTO_INCREMENT=1881208 DEFAULT CHARSET=utf8
I have an index on col6, a datetime column. I have almost 2M rows in the table, and the dates range from 1/1/2007 to 11/27/2012.
When I run the following, it doesn't use my index:
EXPLAIN SELECT * FROM tableA ORDER BY col6 ASC
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | tableA | ALL | NULL | NULL | NULL | NULL | 1933765 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
I tried converting the datetime field to an integer and converting the datetime to a unix timestamp. However, it still won't use my index. What am I missing? Why does the optimizer insist on sorting through lots of rows (in this case 1,933,765 rows) rather than use the index?
Since you are not selecting on anything based on the index to narrow the result set, using it would only incur additional work to lookup via point-lookup every each row in the primary table.

Why does MySQL not use an index for a greater than comparison?

I am trying to optimize a bigger query and ran into this wall when I realized this part of the query was doing a full table scan, which in my mind does not make sense considering the field in question is a primary key. I would assume that the MySQL Optimizer would use the index.
Here is the table:
CREATE TABLE userapplication (
application_id int(11) NOT NULL auto_increment,
userid int(11) NOT NULL default '0',
accountid int(11) NOT NULL default '0',
resume_id int(11) NOT NULL default '0',
coverletter_id int(11) NOT NULL default '0',
user_email varchar(100) NOT NULL default '',
account_name varchar(200) NOT NULL default '',
resume_name varchar(255) NOT NULL default '',
resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
cover_name varchar(255) NOT NULL default '',
cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
application_status tinyint(4) NOT NULL default '0',
application_created datetime NOT NULL default '0000-00-00 00:00:00',
application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
publishid int(11) NOT NULL default '0',
application_visible int(11) default '1',
PRIMARY KEY (application_id),
KEY publishid (publishid),
KEY application_status (application_status),
KEY userid (userid),
KEY accountid (accountid),
KEY application_created (application_created),
KEY resume_id (resume_id),
KEY coverletter_id (coverletter_id),
) ENGINE=MyISAM ;
This simple query seems to do a full table scan:
SELECT * FROM userapplication WHERE application_id > 1025;
This is the output of the EXPLAIN:
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | userapplication | ALL | PRIMARY | NULL | NULL | NULL | 784422 | Using where |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+`
Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?
You'd probably be better off letting MySql decide on the query plan. There is a good
chance that doing an index scan would be less efficient than a full table scan.
There are two data structures on disk for this table
The table itself; and
The primary key B-Tree index.
When you run a query the optimizer has two options about how to access the data:
SELECT * FROM userapplication WHERE application_id > 1025;
Using The Index
Scan the B-Tree index to find the address of all the rows where application_id > 1025
Read the appropriate pages of the table to get the data for these rows.
Not using the Index
Scan the entire table, and pick the appropriate records.
Choosing the best stratergy
The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If there are a lot of rows with an application_id > 1025 then it can actually be less efficient to use the index. For example if 90% of the records have an application_id > 1025 then the query optimizer would have to scan around 90% of the leaf nodes of the b-tree index and then read at least 90% of the table as well to get the actual data; this would involve reading more data from disk than just scanning the table.
MyISAM tables are not clustered, a PRIMARY KEY index is a secondary index and requires an additional table lookup to get the other values.
It is several times more expensive to traverse the index and do the lookups. If you condition is not very selective (yields a large share of total records), MySQL will consider table scan cheaper.
To prevent it from doing a table scan, you could add a hint:
SELECT *
FROM userapplication FORCE INDEX (PRIMARY)
WHERE application_id > 1025
, though it would not necessarily be more efficient.
Mysql definitely considers a full table scan cheaper than using the index; you can however force to use your primary key as preferred index with:
mysql> EXPLAIN SELECT * FROM userapplication FORCE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | userapplication | range | PRIMARY | PRIMARY | 4 | NULL | 24 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
Note that using "USE INDEX" instead of "FORCE INDEX" to only hint mysql on the index to use, mysql still prefers a full table scan:
mysql> EXPLAIN SELECT * FROM userapplication USE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | userapplication | ALL | PRIMARY | NULL | NULL | NULL | 34 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
If your WHERE is a "greater than" comparison, it probably returns quite a few entries (and can realistically return all of them), therefore full table scans are usually preferred.
It should be the case of just typing:
SELECT * FROM userapplication WHERE application_id > 1025;
As detailed at this link. According to that guide, it should work where the application_id is a numeric value, for non-numeric values, you should type:
SELECT * FROM userapplication WHERE application_id > '1025';
I don't think there's anything wrong with your SELECT, maybe it's a table configuration problem?