Indexing for keyset pagination in MySQL

I am trying to build an index in mysql to support a keyset pagination query. My query looks like this:
SELECT * FROM invoice
WHERE company_id = 'someguid'
AND id > 'lastguidfromlastpage'
ORDER BY id
LIMIT 10
Common knowledge says that a secondary index on company_id implicitly contains the table's PRIMARY KEY (id). Because of this I would expect the query to read rows from the index already in order, without any need to sort the results first. However, my explain plan shows a filesort and an index merge:
mysql> explain SELECT *
-> FROM invoice
-> WHERE company_id = '37687714-2e9d-4daa-aee6-f7d56962f903'
-> AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40'
-> ORDER BY id
-> LIMIT 10;
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
| 1 | SIMPLE | invoice | NULL | index_merge | PRIMARY,invoice__company_id | invoice__company_id,PRIMARY | 76,38 | NULL | 48 | 100.00 | Using intersect(invoice__company_id,PRIMARY); Using where; Using filesort |
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
If I explicitly add the id to the index then I get the explain plan I would expect:
mysql> explain SELECT *
-> FROM invoice
-> WHERE company_id = '37687714-2e9d-4daa-aee6-f7d56962f903'
-> AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40'
-> ORDER BY id
-> LIMIT 10;
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | invoice | NULL | range | PRIMARY,invoice__company_id_id | invoice__company_id_id,PRIMARY | 76 | NULL | 98 | 100.00 | Using index condition |
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
SHOW CREATE TABLE:
CREATE TABLE `invoice` (
`id` varchar(36) NOT NULL,
`company_id` varchar(36) NOT NULL DEFAULT '0',
`invoice_number` varchar(36) NOT NULL DEFAULT '0',
`identifier` varchar(255) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`created_by` varchar(36) DEFAULT NULL,
`data_source` varchar(36) NOT NULL,
`type` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice__company_id_id` (`company_id`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT @@optimizer_switch;
use_index_extensions=on
MySQL version:
version: 5.7.26-29-57-log
innodb_version: 5.7.26-29
version_comment: Percona XtraDB Cluster (GPL), Release rel29, Revision 03540a3, WSREP version 31.37, wsrep_31.37
SHOW VARIABLES LIKE 'char%';
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
There are a few sources explaining that the company_id index on its own should be sufficient for this:
https://stackoverflow.com/a/30152513/64023
https://dba.stackexchange.com/a/136029/166838
I've been unable to find official documentation about exactly what to expect. Is this related to the data type of the id column? Is the common knowledge about MySQL + InnoDB behavior incorrect?

I have encountered this problem before. Here is my analysis of it.
It occurs in MySQL 5.7 and 8.0, but apparently not in older versions and not in MariaDB.
The "solution" I prefer is to change the indexes thus:
INDEX(company_id) -- DROP this
INDEX(company_id, id) -- ADD this
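In ALTER TABLE form that could look like this (a sketch, assuming the one-column index is named invoice__company_id as in the first EXPLAIN and the two-column one keeps the name already shown in SHOW CREATE TABLE):
ALTER TABLE invoice
DROP INDEX invoice__company_id,
ADD INDEX invoice__company_id_id (company_id, id);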
Although the 2-column index is theoretically identical to the one-column index for InnoDB (assuming id is the PK), the Optimizer seems to ignore this fact in some situations.
Also, I like to explicitly add the PK when I see a need. This signals future readers of the schema (including myself) that some query benefits from the PK being appended.
I have yet to find a case where "index merge intersect" is faster than an equivalent composite index.
I dislike ever using index "hints" for fear that the data distribution will change in the future and my "hint" will make things worse.
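Separately, if you just want to confirm that the index-merge-intersection plan is what triggers the filesort, a session-level experiment (a diagnostic sketch, not a permanent fix) is to disable that optimization and re-check the plan:
SET SESSION optimizer_switch = 'index_merge_intersection=off';
EXPLAIN SELECT * FROM invoice
WHERE company_id = 'someguid'
AND id > 'lastguidfromlastpage'
ORDER BY id
LIMIT 10;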

This won't work.
For keyset pagination to take effect, you need an auto-increment integer as your primary id/key. Right now you are using VARCHAR and storing UUIDs.
Your query won't select the "next" UUID "larger than" the last one (... AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40' ...).
When you change the primary ID to be a number, this will work.
If you still have issues with indexes, you can try forcing MySQL to use your index:
SELECT * FROM invoice USE INDEX (invoice__company_id_id)
WHERE company_id = 'someguid'
AND id > 12345
ORDER BY id
LIMIT 10

Related

MySQL - Created index isn't showing up as possible key

I have the following table (it has more data columns, removed them because it would be a long post):
CREATE TABLE `members` (
`memberid` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastname` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`memberid`),
KEY `members_lname_ix` (`lastname`)
) ENGINE=InnoDB AUTO_INCREMENT=1019 DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci;
By default, a user only ever accesses 10-20 rows from this table at a time, usually sorted by the lastname column; it's all paginated server side. So I decided to add an index on lastname to help with sorting, but the index does not seem to be working like I would expect it to. When I run EXPLAIN SELECT * FROM members ORDER BY lastname ASC I get:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
1 | simple | members | ALL | null | null | null | null | 711 | using filesort
I can at least confirm the index exists because if I run SHOW INDEX FROM members I get:
Table | Non_Unique | Key_name | Seq_in_ix | Col_name | Collation | Cardinality | Sub part | Packed | Null | Ix type
members | 0 | PRIMARY | 1 | memberid | A | 711 | null | null | (blank) | BTREE
members | 1 | members_lname_ix | 1 | lastname | A | 711 | null | null | YES | BTREE
If I add USE INDEX (members_lname_ix), both possible_keys and key remain null. However, if I add FORCE INDEX (members_lname_ix), possible_keys remains null and key shows members_lname_ix. This is my first time trying to apply indexing, but this doesn't seem very intuitive to me - it feels like MySQL should know that I created an index for lastname, no? I can't quite figure out what I'm doing wrong here unless I am misunderstanding something. Is the solution here to just keep using FORCE INDEX?
There are two ways to perform that query:
Plan A (as you were expecting):
Scan through the index sequentially, reading the entire (estimated) 711 rows.
Randomly look up each row in the data BTree. This involves reading the entire dataset.
Deliver the data in order.
Plan B (what it does):
Scan through the data, reading all 711 rows.
Sort the data
Deliver the sorted data.
Plan B does not touch the index at all; skipping those random lookups was deemed to be a bigger savings than skipping the sort would have been.
In a table as tiny as yours, it would be hard to see a difference in speed. (In my test case, it took under 10 milliseconds either way.) In huge tables, the difference could be significant.
For optimal pagination, see http://mysql.rjweb.org/doc.php/pagination
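As a rough sketch of what that link describes for this table: remember the last (lastname, memberid) pair the client saw and seek past it instead of using OFFSET (the values below are purely illustrative, and NULL last names would need separate handling):
SELECT memberid, firstname, lastname
FROM members
WHERE (lastname, memberid) > ('Smith', 417)
ORDER BY lastname, memberid
LIMIT 10;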

MySQL with JOIN not using index

Problem with MySQL version 5.7.18. Earlier versions of MySQL behave as expected.
Here are two tables. Table 1:
CREATE TABLE `test_events` (
`id` int(11) NOT NULL,
`event` int(11) DEFAULT '0',
`manager` int(11) DEFAULT '0',
`base_id` int(11) DEFAULT '0',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`client` int(11) DEFAULT '0',
`event_time` datetime DEFAULT '0000-00-00 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_events`
ADD PRIMARY KEY (`id`),
ADD KEY `client` (`client`),
ADD KEY `event_time` (`event_time`),
ADD KEY `manager` (`manager`),
ADD KEY `base_id` (`base_id`),
ADD KEY `create_time` (`create_time`);
And the second table:
CREATE TABLE `test_event_types` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`base` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_event_types`
ADD PRIMARY KEY (`id`);
Let's try to select the last event from base "314":
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
WHERE base = 314
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | test_events | NULL | ALL | NULL | NULL | NULL | NULL | 434928 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | test_event_types | NULL | ALL | PRIMARY | NULL | NULL | NULL | 44 | 2.27 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
MySQL is not using the index and reads the whole table.
Without the WHERE clause:
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| 1 | SIMPLE | test_events | NULL | index | NULL | create_time | 4 | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | test_event_types | NULL | eq_ref | PRIMARY | PRIMARY | 4 | m16.test_events.event | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
Now it uses index.
MySQL 5.5.55 uses an index in both cases. Why is that, and what can I do about it?
I don't know why you see a difference between your previous and current installations, but the server's behaviour makes sense.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) ORDER BY test_events.create_time DESC LIMIT 1;
In this query you do not have a WHERE clause, and you are fetching only one row after sorting by create_time, which happens to have an index, and that index can be used for the sort. But let's look at the second query.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) WHERE base = 314 ORDER BY test_events.create_time DESC LIMIT 1
You don't have an index on the base column, so no index can be used for that. To find the relevant records MySQL has to do a table scan. Having identified the relevant rows, they need to be sorted. But in this case the query planner has decided that it's just not worth it to use the index on create_time.
I see several problems with your setup, the first being the lack of an index on base, as already mentioned. But why is base varchar? You appear to be storing integers in it.
ALTER TABLE test_events
ADD PRIMARY KEY (id),
ADD KEY client (client),
ADD KEY event_time (event_time),
ADD KEY manager (manager),
ADD KEY base_id (base_id),
ADD KEY create_time (create_time);
And making this many single-column indexes doesn't make much sense in MySQL, because MySQL generally uses only one index per table for a query. You would be far better off with one or two indexes, possibly multi-column indexes.
I think your ideal index would contain both the create_time and event fields.
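A sketch of what that might look like (the index names are illustrative; verify the resulting plan with EXPLAIN on your own data):
ALTER TABLE test_event_types ADD INDEX idx_base (base);
ALTER TABLE test_events ADD INDEX idx_event_create_time (event, create_time);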
base = 314 with base VARCHAR... is a performance problem. Either put quotes around 314 or make base some integer type.
You appear not to need LEFT. If so, use a plain JOIN so that the optimizer has the freedom to start with an INDEX(base), which is currently missing and needed.
As for the differences between 5.5 and 5.6 and 5.7, there have been a number of Optimization changes; you may have encountered a regression. But I don't want to chase that until you have improved the query and indexes.
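Putting those suggestions together, a hedged sketch of the rewrite (quoted string literal, plain JOIN, and an index on base as discussed above) would be:
SELECT e.create_time
FROM test_event_types t
JOIN test_events e ON e.event = t.id
WHERE t.base = '314'
ORDER BY e.create_time DESC
LIMIT 1;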
I stumbled upon the same scenario, where MySQL was using a table scan instead of an index search.
This could be because of one of the reasons mentioned in the MySQL docs:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
mysql docs link
And when I checked the EXPLAIN of the query on a production server with a large number of rows, it used an index search as expected.
It's one of MySQL's optimizations under the hood :)

MySQL using different index depending on limit value with ORDER BY query

This is weird to me:
One table, ACTIVITIES, with one index on ACTIVITY_DATE. The exact same query with a different LIMIT value results in a different execution plan.
Here it is:
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 20
-> ;
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| 1 | SIMPLE | ACTIVITIES | index | NULL | ACTI_DATE_I | 4 | NULL | 20 | |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 150
-> ;
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | ACTIVITIES | ALL | NULL | NULL | NULL | NULL | 10629 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
1 row in set (0.00 sec)
How come with LIMIT 150 it is not using the index? I mean, scanning 150 rows seems faster than scanning 10629 rows, right?
EDIT
The query uses the index up to LIMIT 96 and switches to a filesort at LIMIT 97.
The table has nothing special, not even a foreign key; here is the complete CREATE TABLE:
mysql> show create table ACTIVITIES\G
*************************** 1. row ***************************
Table: ACTIVITIES
Create Table: CREATE TABLE `ACTIVITIES` (
`ACTIVITY_ID` int(11) NOT NULL AUTO_INCREMENT,
`ACTIVITY_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`USER_KEY` varchar(50) NOT NULL,
`ITEM_KEY` varchar(50) NOT NULL,
`ACTIVITY_TYPE` varchar(1) NOT NULL,
`EXTRA` varchar(500) DEFAULT NULL,
`IS_VISIBLE` varchar(1) NOT NULL DEFAULT 'Y',
PRIMARY KEY (`ACTIVITY_ID`),
KEY `ACTI_USER_I` (`USER_KEY`,`ACTIVITY_DATE`),
KEY `ACTIVITY_ITEM_I` (`ITEM_KEY`,`ACTIVITY_DATE`),
KEY `ACTI_ITEM_TYPE_I` (`ITEM_KEY`,`ACTIVITY_TYPE`,`ACTIVITY_DATE`),
KEY `ACTI_DATE_I` (`ACTIVITY_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=10091 DEFAULT CHARSET=utf8 COMMENT='Logs activity'
1 row in set (0.00 sec)
mysql>
I also tried to run "ANALYZE TABLE ACTIVITIES" but that did not change a thing.
That's the way things go. Bear with me a minute...
The Optimizer would like to use an INDEX, in this case ACTI_DATE_I. But it does not want to use it if that would be slower.
Plan A: Use the index.
Reach into the BTree-structured index at the end (because of DESC)
Scan backward
For each row in the index, look up the corresponding row in the data. Note: The index has (ACTIVITY_DATE, ACTIVITY_ID) because the PRIMARY KEY is implicitly appended to any secondary key. To reach into the "data" using the PK (ACTIVITY_ID) is another BTree lookup, potentially random. Hence, it is potentially slow. (But not very slow in your case.)
This stops after LIMIT rows.
Plan B: Ignore the index
Scan the table, building a tmp table. (Likely to be in-memory.)
Sort the tmp table
Peel off LIMIT rows.
In your case (96 rows -- about 1% of 10K) it is surprising that it picked the table scan. Normally, the cutoff is somewhere around 10%-30% of the number of rows in the table.
ANALYZE TABLE should have caused a recalculation of the statistics, which could have convinced it to go with the other Plan.
What version of MySQL are you using? (No, I don't know of any changes in this area.)
One thing you could try: OPTIMIZE TABLE ACTIVITIES; That will rebuild the table, thereby repacking the blocks and leading to potentially different statistics. If that helps, I would like to know it -- since I normally say "Optimize table is useless".
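If you want to measure the two plans against each other on your own data, a hedged experiment is to force the date index for the "slow" LIMIT and compare timings with and without the hint (run it without EXPLAIN to get actual timings):
EXPLAIN SELECT * FROM ACTIVITIES
FORCE INDEX FOR ORDER BY (ACTI_DATE_I)
ORDER BY ACTIVITY_DATE DESC
LIMIT 150;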

Even with seemingly correct indices and enough memory, query runs too long

I have a table with 3 million rows and 6 columns.
The table structure:
CREATE TABLE `sample` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`FileMD5` varchar(32) NOT NULL,
`NoCsumMD5` varchar(32) NOT NULL,
`SectMD5` varchar(32) NOT NULL,
`SectNoResMD5` varchar(32) NOT NULL,
`ImpMD5` varchar(32) NOT NULL,
`Overlay` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
KEY `FileMD5` (`FileMD5`),
KEY `NoCsumMD5` (`NoCsumMD5`)
) ENGINE=InnoDB AUTO_INCREMENT=3073630 DEFAULT CHARSET=latin1
The temporary table settings:
mysql> SHOW VARIABLES LIKE 'tmp_table_size';
+----------------+----------+
| Variable_name | Value |
+----------------+----------+
| tmp_table_size | 16777216 |
+----------------+----------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'max_heap_table_size';
+---------------------+----------+
| Variable_name | Value |
+---------------------+----------+
| max_heap_table_size | 16777216 |
+---------------------+----------+
1 row in set (0.00 sec)
My Query
mysql> explain SELECT NoCsumMD5,Count(FileMD5)
FROM Sample GROUP BY NoCsumMD5
HAVING Count(FileMD5) > 10 ORDER BY Count(FileMD5) Desc ;
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | Sample | index | NULL | NoCsumMD5 | 34 | NULL | 2928042 | Using temporary; Using filesort |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
How can I optimize this query? Even after 10 minutes, it generates no output.
I feel that I have indexed the right columns and given enough memory for temporary tables.
Since FileMD5 is NOT NULL in your table definition, the query can be simplified, and you will not need the composite index @brendan-long suggests (the NoCsumMD5 index is enough):
SELECT NoCsumMD5, Count(*) as cnt
FROM Sample
GROUP BY NoCsumMD5
HAVING cnt > 10
ORDER BY cnt DESC;
I'm not sure if this will help, but MySQL generally uses only one index per table at a time, so it may be helpful to create an index over both FileMD5 and NoCsumMD5:
KEY `someName` (`NoCsumMD5`, `FileMD5`),
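On the existing table, that could be added with something like the following (a sketch; someName is just a placeholder name):
ALTER TABLE Sample ADD INDEX someName (NoCsumMD5, FileMD5);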
Here's some information on multiple column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
The short version is that the order of the columns in the index matters, because MySQL can only use the index in that order (for example, in the index I gave above, it can test NoCsumMD5, then narrow the result down using FileMD5).
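A hedged illustration of that leftmost-prefix rule with the index above (the literal values are placeholders):
SELECT COUNT(*) FROM Sample WHERE NoCsumMD5 = 'abc'; -- can use someName (leftmost column)
SELECT COUNT(*) FROM Sample WHERE NoCsumMD5 = 'abc' AND FileMD5 = 'def'; -- can use both columns of someName
SELECT COUNT(*) FROM Sample WHERE FileMD5 = 'def'; -- cannot use someName, though the separate FileMD5 index still applies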
I'm not sure how much it will help in this query though, since all you care about is whether FileMD5 is NULL or not.

COLLATE in SQL statements on fields in utf8_bin slower than using the default collation?

Two scenarios:
Using the default collation:
CREATE TABLE IF NOT EXISTS `table` (
`name` varchar(255) collate utf8_general_ci NOT NULL,
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SELECT `name` FROM `table` ORDER BY `name`;
Using COLLATE:
CREATE TABLE IF NOT EXISTS `table` (
`name` varchar(255) collate utf8_bin NOT NULL,
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SELECT `name` FROM `table` ORDER BY `name` COLLATE utf8_general_ci;
I need to change from the first scenario to the second because the unique index is case insensitive; still, the ordering matters. There are experimental case-sensitive collations such as utf8_general_cs, but they require a special compilation.
Will this have an impact on performance?
In my opinion, if MySQL stores text fields internally as utf8 regardless of collation, this should not affect performance.
Edit:
The output of EXPLAIN when COLLATE is used is the same as without it.
mysql> EXPLAIN SELECT *
-> FROM `table`
-> ORDER BY `name`
-> COLLATE utf8_general_ci;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT *
-> FROM `table`
-> ORDER BY `name`;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
The collation determines how the column is indexed as well as how comparisons are made. utf8_bin compares strings by binary value, while utf8_general_ci compares them by alphabetical value. Both what counts as a match and the resulting order vary by collation. If the column is treated as binary (as with utf8_bin), one character equals another if and only if their byte values are identical.
When you specify a collation in a SELECT statement that differs from the field's default collation, you can't take advantage of the existing index (which was built with the default collation). Manually specifying a different collation on an indexed column should therefore cost about the same as sorting a non-indexed column: the index is simply ignored, and MySQL sorts the result itself (using a comparator based on the specified collation).
If you don't have an index on that column, I don't think it will be slower. With an indexed column, it would be slower.
With the first table I get Extra = "Using index", with the second table "Using index; Using filesort". So the second would be slower.