TokuDB sorting time different between ASC vs DESC - mysql

This is MariaDB + TokuDB 7.1 community, downloaded from Tokutek. Please excuse my ignorance if this is normal behavior, but I have a question about sorting results. I'm seeing a huge time difference between the two sort directions, ascending and descending:
SELECT sql_no_cache id, createts, deleted
FROM sort_test
WHERE createts > '2000098'
ORDER BY createts asc
+---------+----------+---------+
| id | createts | deleted |
+---------+----------+---------+
| 1999999 | 2000099 | NULL |
| 2000000 | 2000100 | NULL |
+---------+----------+---------+
2 rows in set (0.00 sec)
SELECT sql_no_cache id, createts, deleted
FROM sort_test
WHERE createts > '2000098'
ORDER BY createts desc
+---------+----------+---------+
| id | createts | deleted |
+---------+----------+---------+
| 2000000 | 2000100 | NULL |
| 1999999 | 2000099 | NULL |
+---------+----------+---------+
2 rows in set (0.55 sec)
Below I present my simplified test case. Here is the table:
CREATE TABLE `sort_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`createts` int(11) DEFAULT NULL,
`deleted` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_createts` (`createts`)
) ENGINE=TokuDB
Here I populate the table with 2 million rows using this procedure:
delimiter ;;
drop procedure if exists sort_test_populate;;
create procedure sort_test_populate()
begin
DECLARE int_val INT DEFAULT 1;
myloop : LOOP
if (int_val > 2000000) THEN
LEAVE myloop;
end if;
insert into sort_test (id, createts) values (int_val, int_val+100);
set int_val = int_val +1;
end loop;
end;;
call sort_test_populate();;
Query OK, 1 row affected (28 min 2.80 sec)
Here are my test queries again:
SELECT sql_no_cache id, createts, deleted
FROM sort_test
WHERE createts > '2000098'
ORDER BY createts asc
2 rows in set (0.00 sec)
SELECT sql_no_cache id, createts, deleted
FROM sort_test
WHERE createts > '2000098'
ORDER BY createts desc
2 rows in set (0.55 sec)
And here is the "explain extended" result, it's identical for both queries:
+------+-------------+-----------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-----------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | sort_test | range | idx_createts | idx_createts | 5 | NULL | 2 | 100.00 | Using where |
+------+-------------+-----------+-------+---------------+--------------+---------+------+------+----------+-------------+
Please note that this is not the exact data I'm working with; that would be too much to include here. I just wanted to create some test data to demonstrate the problem. My question is: why does it behave like this, and how can I make the descending-order query faster?

This is a known bug with Index Condition Pushdown (ICP). The workaround is to disable ICP by setting the optimizer_switch either globally or within the session executing this query.
mysql> SET optimizer_switch='index_condition_pushdown=off';
(full disclosure, I'm an employee at Tokutek, makers of TokuDB)
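A minimal sketch of applying the workaround to a single session, assuming the sort_test table from the question. The inspection step and the restore step are additions for illustration and were not part of the original answer.

```sql
-- Inspect the current flags; look for index_condition_pushdown=on or =off.
SELECT @@optimizer_switch;

-- Disable ICP for this session only; other connections keep their setting.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- Re-run the slow descending query and compare the timing.
SELECT SQL_NO_CACHE id, createts, deleted
FROM sort_test
WHERE createts > '2000098'
ORDER BY createts DESC;

-- Put the flag back to the server default when done.
SET SESSION optimizer_switch = 'index_condition_pushdown=default';
```

Scoping the change to the session keeps ICP available for other queries that may benefit from it.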

How to uniquely order a MySQL table with a multi column primary key

Good day all
I have a strange query.
Let's say I have a table with a composite primary key (2 columns).
CREATE TABLE `testtable` (
`ifk1` INT(10) NOT NULL,
`ifk2` INT(10) NOT NULL,
`data1` VARCHAR(10) DEFAULT NULL,
PRIMARY KEY (`ifk1`,`ifk2`),
UNIQUE KEY `keyName` (`data1`)
) ENGINE=INNODB DEFAULT CHARSET=utf8mb4
Let's add some basic data
INSERT INTO testtable(ifk1 , ifk2 , data1)
VALUES (1 , 2 , 'a') , (5 , 2 , 'b') , (2 , 4 , 'c') , (5 , 8 , 'd') , (2 , 2 , 'e') , (2 , 5 , 'f');
Let's do a simple SELECT to see what order the data comes out in:
ifk1 ifk2 data1
1 2 a
2 2 e
2 4 c
2 5 f
5 2 b
5 8 d
Now, what if I want to write some code to iterate through the table, grabbing X number of records at a time.
With a small set of data, this is simple:
SELECT * FROM testtable LIMIT 0 , 2;
SELECT * FROM testtable LIMIT 2 , 2;
SELECT * FROM testtable LIMIT 4 , 2;
This is going to run into some problems as the table gets bigger, as it's not using a WHERE clause and so not using an INDEX.
How do I use a WHERE clause to replicate the above SELECTS?
SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2; -- this will work
The first one is easy, but what about the others?
Is there a way to do that?
A LIMIT clause without an ORDER BY clause is arbitrary. All three queries you are showing:
SELECT * FROM testtable LIMIT 0 , 2;
SELECT * FROM testtable LIMIT 2 , 2;
SELECT * FROM testtable LIMIT 4 , 2;
could return the exact same two rows. So, you must add an ORDER BY clause to make this work reliably: ORDER BY ifk1, ifk2.
But, yes, having to sort the data again and again for every access can take a lot of time. This is why we try to avoid using offsets and work with a key instead, where @last_ifk1 and @last_ifk2 hold the key values of the last row already fetched:
SELECT *
FROM testtable
WHERE ifk1 > @last_ifk1 OR (ifk1 = @last_ifk1 AND ifk2 > @last_ifk2)
ORDER BY ifk1, ifk2
LIMIT 2;
Paging is almost always quite slow. But this access method can use the primary key's unique index on (ifk1, ifk2) and access the next two rows very quickly. How fast this is depends on the MySQL version and its implementation.
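As a worked sketch of this key-based paging against the six sample rows from the question: the user variables @last_ifk1 and @last_ifk2 are illustrative names, and in application code they would normally be bound parameters holding the key of the last row fetched.

```sql
-- First page: no lower bound yet, just the first two rows in key order.
SELECT ifk1, ifk2, data1
FROM testtable
ORDER BY ifk1, ifk2
LIMIT 2;                    -- with the sample data: (1,2,'a'), (2,2,'e')

-- Remember the key of the last row of that page.
SET @last_ifk1 = 2, @last_ifk2 = 2;

-- Next page: everything strictly after (2,2) in (ifk1, ifk2) order.
SELECT ifk1, ifk2, data1
FROM testtable
WHERE ifk1 > @last_ifk1
   OR (ifk1 = @last_ifk1 AND ifk2 > @last_ifk2)
ORDER BY ifk1, ifk2
LIMIT 2;                    -- with the sample data: (2,4,'c'), (2,5,'f')
```

Repeating the last two steps with the new last key (2,5) yields the final page, (5,2,'b') and (5,8,'d').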
I am not sure I understand the index part of your question.
But generally, if you want to iterate over a bigger result set you can use a cursor as described here:
https://www.mysqltutorial.org/mysql-cursor/
This would be for a stored procedure but db drivers for other languages will expose similar functionality.
If the goal is to make the query use an index, the logic is the same:
mysql> SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+------+------+-------+
| ifk1 | ifk2 | data1 |
+------+------+-------+
| 2 | 4 | c |
| 5 | 8 | d |
+------+------+-------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testtable | NULL | index | PRIMARY | keyName | 43 | NULL | 6 | 33.33 | Using where; Using index |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 4,2;
+------+------+-------+
| ifk1 | ifk2 | data1 |
+------+------+-------+
| 2 | 2 | e |
| 2 | 5 | f |
+------+------+-------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 4,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testtable | NULL | index | PRIMARY | keyName | 43 | NULL | 6 | 33.33 | Using where; Using index |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
The plans above use the keyName index. You could drop the keyName index so that the query uses the composite primary key instead. To do that, follow these steps.
Drop the keyName index first:
mysql> ALTER TABLE testtable
-> DROP INDEX keyName;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
Then run the query again to see that the composite primary key is now used, which I think makes the query faster:
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testtable | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 6 | 33.33 | Using where |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
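Note that dropping keyName also removes the uniqueness constraint on data1. A less invasive sketch, assuming the same testtable, is to keep the index and steer the optimizer with an index hint. The ORDER BY here is an addition so that the pages come back in a deterministic order; with it, the optimizer may well choose PRIMARY even without the hint.

```sql
-- Keep the unique index on data1, but ask the optimizer to use the
-- composite primary key for this particular query.
EXPLAIN
SELECT *
FROM testtable FORCE INDEX (PRIMARY)
WHERE ifk1 > 0 AND ifk2 > 0
ORDER BY ifk1, ifk2
LIMIT 2, 2;
```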

Select from table is slow

My problem is: simple select query takes a long time (3 minutes).
Structure:
mysql> show create table seventhcont_exceptionreport;
seventhcont_exceptionreport | CREATE TABLE `seventhcont_exceptionreport` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`body_html` longtext NOT NULL,
`datetime_created` datetime NOT NULL,
`subject` varchar(256) NOT NULL,
`host` varchar(128) NOT NULL,
`exc_value` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=74607 DEFAULT CHARSET=utf8 |
Rows count:
mysql> select count(*) from seventhcont_exceptionreport;
+----------+
| count(*) |
+----------+
| 7064 |
+----------+
1 row in set (0.00 sec)
Query 1 (normal):
mysql> select id, datetime_created from seventhcont_exceptionreport order by id LIMIT 100 OFFSET 6000;
...
100 rows in set (0.30 sec)
Query 2 (very slow):
mysql> select id, datetime_created from seventhcont_exceptionreport order by id LIMIT 100 OFFSET 7000;
...
63 rows in set (3 min 40.56 sec)
!!! 3 minutes and 40 sec.
Why?
UPDATE
Explain for query 1:
mysql> EXPLAIN select id, datetime_created from seventhcont_exceptionreport order by id LIMIT 100 OFFSET 6000;
+----+-------------+-----------------------------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | seventhcont_exceptionreport | index | NULL | PRIMARY | 4 | NULL | 6100 | |
+----+-------------+-----------------------------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
Explain for query 2:
mysql> EXPLAIN select id, datetime_created from seventhcont_exceptionreport order by id LIMIT 100 OFFSET 7000;
+----+-------------+-----------------------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | seventhcont_exceptionreport | ALL | NULL | NULL | NULL | NULL | 7067 | Using filesort |
+----+-------------+-----------------------------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
UPDATE
Analyze table:
mysql> ANALYZE TABLE seventhcont_exceptionreport;
+--------------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------------------------------+---------+----------+----------+
| 7k.seventhcont_exceptionreport | analyze | status | OK |
+--------------------------------+---------+----------+----------+
1 row in set (2.51 sec)
I am no MySQL specialist, but I might be able to point you in the right direction.
In the first query, the EXPLAIN output shows an index access (type index), whereas the second query does a full table scan (type ALL). The second plan also shows Using filesort in the Extra column.
This means MySQL is not reading the rows in index order and has to sort them in a separate pass. That pass could be slow because the sort buffer is too small (also see https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/).
Therefore, try increasing the size of your sort buffer (sort_buffer_size).
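A rough sketch of how that could be tried for one session. The server variable is sort_buffer_size; the 4 MiB value is an arbitrary illustration, not a recommendation. The FORCE INDEX variant at the end is a different technique from the one suggested above, included only as something to compare against, since the first query's plan already reads the rows in primary key order.

```sql
-- See what the sort buffer is currently set to (value is in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it for this session only; 4 MiB is an illustrative value.
SET SESSION sort_buffer_size = 4 * 1024 * 1024;

-- Check whether the plan or the timing of the slow query changes.
EXPLAIN
SELECT id, datetime_created
FROM seventhcont_exceptionreport
ORDER BY id
LIMIT 100 OFFSET 7000;

-- Alternative to compare: ask for the primary key scan explicitly,
-- so that no separate sort pass is needed.
SELECT id, datetime_created
FROM seventhcont_exceptionreport FORCE INDEX (PRIMARY)
ORDER BY id
LIMIT 100 OFFSET 7000;
```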

Find value within a range in database table

I need the SQL equivalent of this.
I have a table like this
ID MN MX
-- -- --
A 0 3
B 4 6
C 7 9
Given a number, say 5, I want to find the ID of the row where MN and MX contain that number, in this case that would be B.
Obviously,
SELECT ID FROM T WHERE ? BETWEEN MN AND MX
would do, but I have 9 million rows and I want this to run as fast as possible. In particular, I know that there can be only one matching row, I know that the MN-MX ranges cover the space completely, and so on. With all these constraints on the possible answers, there should be some optimizations I can make. Shouldn't there be?
All I have so far is indexing MN and using the following
SELECT ID FROM T WHERE ? BETWEEN MN AND MX ORDER BY MN LIMIT 1
but that is weak.
If you have an index spanning MN and MX it should be pretty fast, even with 9M rows.
alter table T add index mn_mx (mn, mx);
Edit
I just tried a test w/ a 1M row table
mysql> select count(*) from T;
+----------+
| count(*) |
+----------+
| 1000001 |
+----------+
1 row in set (0.17 sec)
mysql> show create table T\G
*************************** 1. row ***************************
Table: T
Create Table: CREATE TABLE `T` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`mn` int(10) DEFAULT NULL,
`mx` int(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `mn_mx` (`mn`,`mx`)
) ENGINE=InnoDB AUTO_INCREMENT=1048561 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> select * from T order by rand() limit 1;
+--------+-----------+-----------+
| id | mn | mx |
+--------+-----------+-----------+
| 112940 | 948004986 | 948004989 |
+--------+-----------+-----------+
1 row in set (0.65 sec)
mysql> explain select id from T where 948004987 between mn and mx;
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
| 1 | SIMPLE | T | range | mn_mx | mn_mx | 5 | NULL | 239000 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> select id from T where 948004987 between mn and mx;
+--------+
| id |
+--------+
| 112938 |
| 112939 |
| 112940 |
| 112941 |
+--------+
4 rows in set (0.03 sec)
In my example I just had an incrementing range of mn values and set mx to mn + 3, which is why I got more than one row back, but the same approach should apply to your data.
Edit 2
Reworking your query will definitely be better
mysql> explain select id from T where mn<=947892055 and mx>=947892055;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | T | range | mn_mx | mn_mx | 5 | NULL | 9 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
It's worth noting even though the first explain reported many more rows to be scanned I had enough innodb buffer pool set to keep the entire thing in RAM after creating it; so it was still pretty fast.
If there are no gaps in your set, a simple comparison against MN will work. The matching row is the one with the largest MN that does not exceed the value:
SELECT ID FROM T WHERE MN <= ? ORDER BY MN DESC LIMIT 1
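A small worked sketch of the gap-free lookup using the three sample rows from the question. The table name range_demo and the index name idx_mn are hypothetical. The matching row is the one with the largest MN that does not exceed the search value, so the sort has to be descending for LIMIT 1 to pick it.

```sql
-- Hypothetical demo table mirroring the ID / MN / MX sample.
CREATE TABLE range_demo (
  id CHAR(1) NOT NULL PRIMARY KEY,
  mn INT NOT NULL,
  mx INT NOT NULL,
  KEY idx_mn (mn)
);

INSERT INTO range_demo (id, mn, mx)
VALUES ('A', 0, 3), ('B', 4, 6), ('C', 7, 9);

-- Gap-free ranges: the largest mn <= 5 is 4, so this returns B.
SELECT id
FROM range_demo
WHERE mn <= 5
ORDER BY mn DESC
LIMIT 1;

-- If gaps are possible, take the same candidate row and then
-- verify its upper bound; no row comes back when the value falls in a gap.
SELECT id
FROM (
  SELECT id, mx
  FROM range_demo
  WHERE mn <= 5
  ORDER BY mn DESC
  LIMIT 1
) AS candidate
WHERE mx >= 5;
```

Both forms only need to read one index entry from idx_mn, rather than scanning every row whose mn is below the search value.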

Primary key for a table where DATETIME col is used for almost all SELECT operations

I'm wondering what kind of PK I should be choosing for this table in MySQL. Almost all the SELECT operations will involve the DATETIME (date ranges, a specific date, etc.).
Is there a best practice for this?
I wouldn't recommend that the DATETIME be your PK, but you should certainly create an index on that column.
It's perfectly acceptable to use dates as part of a composite primary key, especially if you're using InnoDB and want to take advantage of clustered primary key indexes for maximum read performance.
Have a look at the following:
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
MySQL script
Things to note:
InnoDB requires an AUTO_INCREMENT column to be the leading column of some index, so it cannot simply be declared as the trailing column of this composite primary key; hence the use of the sequence table.
The auto-increment portion of the primary key just helps guarantee uniqueness; the leading part of the key, i.e. the date, is the important part.
full script here: http://pastie.org/1475625 or continue reading...
drop table if exists foo_seq;
create table foo_seq
(
next_val int unsigned not null default 0
)
engine = innodb;
insert into foo_seq values (0);
drop table if exists foo;
create table foo
(
foo_date datetime not null,
foo_id int unsigned not null, -- auto inc field which just guarantees uniqueness
primary key (foo_date, foo_id) -- clustered composite PK (innodb only)
)
engine=innodb;
delimiter #
create trigger foo_before_ins_trig before insert on foo
for each row
begin
declare v_id int unsigned default 0;
select next_val+1 into v_id from foo_seq;
set new.foo_id = v_id;
update foo_seq set next_val = v_id;
end#
delimiter ;
Stats:
select count(*) as counter from foo; -- count(*) under innodb always slow
+---------+
| counter |
+---------+
| 2000000 |
+---------+
select min(foo_date) as min_foo_date from foo;
+---------------------+
| min_foo_date |
+---------------------+
| 1782-11-21 16:32:00 |
+---------------------+
1 row in set (0.00 sec)
select max(foo_date) as max_foo_date from foo;
+---------------------+
| max_foo_date |
+---------------------+
| 2011-01-18 23:06:04 |
+---------------------+
1 row in set (0.00 sec)
select count(*) as counter from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00';
+---------+
| counter |
+---------+
| 17520 |
+---------+
1 row in set (0.01 sec)
select * from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00' order by 1 desc limit 10;
+---------------------+--------+
| foo_date | foo_id |
+---------------------+--------+
| 2010-12-31 23:06:04 | 433 |
| 2010-12-31 22:06:04 | 434 |
| 2010-12-31 21:06:04 | 435 |
| 2010-12-31 20:06:04 | 436 |
| 2010-12-31 19:06:04 | 437 |
| 2010-12-31 18:06:04 | 438 |
| 2010-12-31 17:06:04 | 439 |
| 2010-12-31 16:06:04 | 440 |
| 2010-12-31 15:06:04 | 441 |
| 2010-12-31 14:06:04 | 442 |
+---------------------+--------+
10 rows in set (0.00 sec)
explain
select * from foo where foo_date between
'2009-01-01 00:00:00' and '2011-01-01 00:00:00' order by 1 desc limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 8 | NULL |35308 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
1 row in set (0.00 sec)
Pretty performant considering there are 2 million rows...
Hope this helps :)
Choose an autonumber in order to be on the safe side. You could have two or more rows with the same datetime.
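The options discussed in this thread can be sketched as two table definitions. The names foo_surrogate, foo_clustered, idx_foo_date and idx_foo_id are hypothetical. The second table is a variation on the sequence-table script above, not the original author's method: it relies on InnoDB accepting an AUTO_INCREMENT column in a composite primary key as long as that column also leads an index of its own.

```sql
-- Option 1: auto-increment surrogate key, with a secondary index
-- on the DATETIME column that the SELECTs filter on.
CREATE TABLE foo_surrogate (
  foo_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  foo_date DATETIME     NOT NULL,
  PRIMARY KEY (foo_id),
  KEY idx_foo_date (foo_date)
) ENGINE=InnoDB;

-- Option 2: date-leading clustered primary key. The extra index on
-- foo_id satisfies the AUTO_INCREMENT requirement without a trigger.
CREATE TABLE foo_clustered (
  foo_date DATETIME     NOT NULL,
  foo_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (foo_date, foo_id),
  KEY idx_foo_id (foo_id)
) ENGINE=InnoDB;
```

Option 1 keeps inserts appending to the end of the clustered index; option 2 stores rows physically in date order, which favors date range scans at the cost of an extra index.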

MySql function not using indexes

I have a simple function consisting of one SQL query:
CREATE FUNCTION `GetProductIDFunc`( in_title char (14) )
RETURNS bigint(20)
BEGIN
declare out_id bigint;
select id into out_id from products where title = in_title limit 1;
RETURN out_id;
END
Executing this function 500 times takes 5 seconds:
select Benchmark(500 ,GetProductIdFunc('sample_product'));
Executing the plain query 500 times takes 0.001 seconds:
select Benchmark(500,(select id from products where title = 'sample_product' limit 1));
The "title" field is indexed. Why does the function take so much longer to execute, and how can I optimize it?
edit:
Execution plan
mysql> EXPLAIN EXTENDED select id from products where title = 'sample_product' limit 1;
+----+-------------+----------+-------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------+---------------+------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | products | const | Index_title | Index_title | 14 | const | 1 | 100.00 | Using index |
+----+-------------+----------+-------+---------------+------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN select GetProductIdFunc('sample_product');
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
This could be a character set issue. If the function is using a different character set than the table column, it would lead to very slow performance despite the index.
Run show create table products\G to determine the character set for the column.
Run show variables like 'character_set%'; to see what the relevant default character sets are for your DB.
Try this:
CREATE FUNCTION `GetProductIDFunc`( in_title char (14) )
RETURNS bigint(20)
BEGIN
declare out_id bigint;
set out_id = (select id from products where title = in_title limit 1);
RETURN out_id;
END
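To test the character set theory, something along these lines could be tried. It assumes the products table and function name from the question; latin1 is only a placeholder and should be replaced with whatever the first query reports for the title column. Declaring the character set on the parameter is an addition to the function shown above.

```sql
-- Character set and collation of the indexed column.
SELECT CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'products'
  AND COLUMN_NAME = 'title';

-- Character set settings recorded when the function was created.
SELECT CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = DATABASE()
  AND ROUTINE_NAME = 'GetProductIDFunc';

-- Recreate the function with the parameter's character set stated
-- explicitly. latin1 is a placeholder: use the column's character set.
DROP FUNCTION IF EXISTS GetProductIDFunc;
DELIMITER //
CREATE FUNCTION GetProductIDFunc(in_title CHAR(14) CHARACTER SET latin1)
RETURNS BIGINT
READS SQL DATA
BEGIN
  DECLARE out_id BIGINT;
  SELECT id INTO out_id FROM products WHERE title = in_title LIMIT 1;
  RETURN out_id;
END//
DELIMITER ;
```

If the two character sets differ, the comparison inside the function may force a conversion of the column values, which would prevent the index on title from being used.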