Efficiently query on first two digits of indexed int column in MySQL - mysql

I have a table (MySQL 8.0.26, InnoDB) containing an indexed column of MEDIUMINTs that denote the date a record was created:
date_created MEDIUMINT NOT NULL
INDEX idx_created (date_created)
E.g., the entry "210516" denotes 2021-05-16.
Are the following queries roughly equally efficient in utilizing the index?
WHERE 210000<=date_created AND date_created<220000,
WHERE date_created DIV 10000 = 21,
WHERE date_created LIKE '21%', and
WHERE LEFT(date_created, 2) = '21'
I am currently using WHERE date_created DIV 10000 = 21 in my code but wonder if I should alter all queries to make them more efficient.
Thanks a lot in advance.

Look at the type column in the EXPLAIN output. If it says "ALL", MySQL must scan every row of the table and evaluate the condition expression for each one. That means the index is not being used.
mysql> explain select * from mytable where 21000<=date_created and date_created < 22000;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | mytable | NULL | range | date_created | date_created | 4 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
mysql> explain select * from mytable where date_created like '21%';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | date_created | NULL | NULL | NULL | 8192 | 11.11 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
mysql> explain select * from mytable where date_created div 10000 = 21;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | NULL | NULL | NULL | NULL | 8192 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
mysql> explain select * from mytable where left(date_created, 2) = '21';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | NULL | NULL | NULL | NULL | 8192 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
MySQL 8.0 (8.0.13 and later) supports expression indexes, which help in a couple of these cases:
mysql> alter table mytable add index expr1 ((left(date_created, 2)));
mysql> explain select * from mytable where left(date_created, 2) = '21';
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | expr1 | expr1 | 11 | const | 1402 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
mysql> alter table mytable add index expr2 ((date_created DIV 10000));
mysql> explain select * from mytable where date_created div 10000 = 21;
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | expr2 | expr2 | 5 | const | 1402 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
But expression indexes won't help the LIKE '21%' search, because you'd have to hard-code the value '21%' in the expression for the index definition. You could use that index to search for that value only, not for the value of a different year.
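Of the four forms in the question, only the range predicate used the existing index on date_created without any extra index. The DIV query can therefore be rewritten as a half-open range. This is a minimal sketch, assuming the YYMMDD encoding from the question and the mytable name from the EXPLAIN output above; the month example is an added illustration:

```sql
-- Selects the same rows as: WHERE date_created DIV 10000 = 21
SELECT *
FROM mytable
WHERE date_created >= 210000
  AND date_created <  220000;

-- The same pattern narrows to a single month (May 2021)
SELECT *
FROM mytable
WHERE date_created >= 210500
  AND date_created <  210600;
```

Running EXPLAIN on either statement should report type: range rather than ALL, provided the optimizer considers the index selective enough.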

Related

Does querying int column with string datatype have any performance impact in mysql queries?

Assuming I have a table as:
create table any_table (any_column_1 int, any_column_2 varchar(255));
create index any_table_any_column_1_IDX USING BTREE ON any_table (any_column_1);
(Note: Index type should not matter here)
I was wondering whether querying any_column_1 with an int or a string literal has any impact on performance, i.e. does
select * from any_table where any_column_1 = 12345;
perform any differently from this one?
select * from any_table where any_column_1 = '12345';
I have looked around the web and have not found this particular case covered.
It should be fine to do this either way for an indexed integer column. When you compare an integer column to a constant, the constant value is cast to an integer whether you format it as an integer or a string.
You can confirm this with EXPLAIN. In both cases, the EXPLAIN shows that it will use the index (type: ref indicates an index lookup), and the performance will be the same.
mysql> explain select * from any_table where any_column_1 = 12345;
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_1_IDX | any_table_any_column_1_IDX | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
mysql> explain select * from any_table where any_column_1 = '12345';
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_1_IDX | any_table_any_column_1_IDX | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
If you had indexed the string column in your example, any_column_2, it would make a difference because the collation of a string column must match the collation of the value you compare it to. A string literal will be cast to a compatible collation by default, so it uses the index:
create index any_table_any_column_2_IDX USING BTREE ON any_table (any_column_2);
mysql> explain select * from any_table where any_column_2 = '12345';
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_2_IDX | any_table_any_column_2_IDX | 768 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
But an integer literal has no collation, so you get warnings, and the index cannot be used. The EXPLAIN shows type: ALL so it will do a table-scan and that will have poor performance if you query a table with many rows.
mysql> explain select * from any_table where any_column_2 = 12345;
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | any_table | NULL | ALL | any_table_any_column_2_IDX | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'any_table_any_column_2_IDX' due to type or collation conversion on field 'any_column_2' |
| Warning | 1739 | Cannot use range access on index 'any_table_any_column_2_IDX' due to type or collation conversion on field 'any_column_2' |
| Note | 1003 | /* select#1 */ select `test2`.`any_table`.`any_column_1` AS `any_column_1`,`test2`.`any_table`.`any_column_2` AS `any_column_2` from `test2`.`any_table` where (`test2`.`any_table`.`any_column_2` = 12345) |
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
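If the value arrives as a number, one way to keep the index on any_column_2 usable is to make the comparison string-to-string. This is a sketch, assuming the same any_table as above. The two forms do not match exactly the same rows: a numeric comparison also accepts strings such as '012345' that convert to the same number, while a string comparison does not.

```sql
-- Quote the literal so both sides of the comparison are strings
SELECT * FROM any_table WHERE any_column_2 = '12345';

-- Or convert a numeric value to a string explicitly before comparing
SELECT * FROM any_table WHERE any_column_2 = CAST(12345 AS CHAR);
```

Prefixing either statement with EXPLAIN should show whether type is ref again and whether warning 1739 has gone away.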

DELETE statement is not using INDEX on table and executing for long time

We have one huge table with 25M records. When we delete records by passing the values manually, the query uses the index and runs fast.
Details are below.
MySQL [(none)]> explain DELETE FROM DATABASE1_1.BIG_TABLE WHERE contextid in ('1121','1245','5432','12412','1212','7856','2342','1345','5312','2342','3432','5321');
+----+-------------+-----------+------------+-------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+-------+------+----------+-------------+
| 1 | DELETE | BIG_TABLE | NULL | range | IDX_BIG_CID | IDX_BIG_CID | 109 | const | 12 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+-------------+---------+-------+------+----------+-------------+
But when we pass the values through a SELECT subquery, the index is not used and the query runs much longer.
Below is the EXPLAIN plan.
MySQL [(none)]> explain DELETE FROM DATABASE1_1.BIG_TABLE WHERE contextid in (SELECT contextid FROM DATABASE_2.TABLE_2);
+----+--------------------+------------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | DELETE | BIG_TABLE | NULL | ALL | NULL | NULL | NULL | NULL | 25730673 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | TABLE_2 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 10.00 | Using where |
+----+--------------------+------------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
Here DATABASE_2.TABLE_2 is a table whose values change every time; its row count is always below 100.
How can we make the query below use the index IDX_BIG_CID on table DATABASE1_1.BIG_TABLE?
DELETE FROM DATABASE1_1.BIG_TABLE WHERE contextid in (SELECT contextid FROM DATABASE_2.TABLE_2);
Don't use IN ( SELECT ... ). Use a multi-table DELETE. (See the ref manual.)
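A multi-table DELETE for the tables in the question could look like the sketch below. It assumes that contextid has the same type and collation in both tables; if they differ, the index on BIG_TABLE may still be skipped. The aliases b and t are illustrative.

```sql
-- Delete only from BIG_TABLE (alias b); TABLE_2 just drives the join
DELETE b
FROM DATABASE1_1.BIG_TABLE AS b
JOIN DATABASE_2.TABLE_2 AS t
  ON t.contextid = b.contextid;
```

Prefixing the statement with EXPLAIN before running it shows whether BIG_TABLE is now accessed through IDX_BIG_CID instead of a full scan.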

How to improve this sql query performance?

SELECT id, name, detail FROM student WHERE id NOT IN (1,788,103,100) ORDER BY id DESC LIMIT 1000,10
The table is tiny (10,000 rows). I have to consider two points: the NOT IN condition and the LIMIT clause.
Here are the DDLs and the EXPLAIN. I'm using MySQL 5.6.4.
CREATE TABLE student
( id int(11) NOT NULL AUTO_INCREMENT
, name varchar(45) NOT NULL
, detail varchar(255) NOT NULL
, PRIMARY KEY (id)
) ENGINE = MyISAM;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | student| ALL | Primary,id | NULL | NULL | NULL | 13 | |
The LIMIT and ORDER BY clauses mean that the query has to read the whole table, sort it, skip to record 1000, and then extract the next 10 records.
Why are you looking for 10 records starting at record 1000?
Removing the ORDER BY clause would make it faster, as the query would only need to read 1010 records.
I cannot replicate this finding...
SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.5.16 |
+-----------+
SELECT COUNT(*) FROM student;
+----------+
| COUNT(*) |
+----------+
| 131072 |
+----------+
SELECT id
FROM student
WHERE id
NOT IN (1,788,103,100)
ORDER
BY id DESC
LIMIT 1000,10;
+--------+
| id |
+--------+
| 195591 |
| 195590 |
| 195589 |
| 195588 |
| 195587 |
| 195586 |
| 195585 |
| 195584 |
| 195583 |
| 195582 |
+--------+
10 rows in set (0.00 sec)
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | student | range | PRIMARY | PRIMARY | 4 | NULL | 131069 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+
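If the offset exists only to page through results, an alternative not mentioned in the answers above is to continue from the last id already seen rather than skipping 1000 rows. This is a sketch under that assumption. The value 195592 is hypothetical and stands for the smallest id shown on the previous page.

```sql
-- Keyset-style paging: continue below the last id seen on the previous page
SELECT id, name, detail
FROM student
WHERE id < 195592                      -- hypothetical last-seen id
  AND id NOT IN (1, 788, 103, 100)
ORDER BY id DESC
LIMIT 10;
```

This only works when pages are visited in order, since each page needs the last id of the one before it.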

MySQL index not used

I have a table with an index on two columns (user_id, date)
and this SQL query:
select user_id, stat.in, stat.out, stat.time, date
from stat
where user_id in (select id from users force index (street_id) where street_id=30);
or
select user_id, stat.in, stat.out, stat.time, date
from stat where user_id in (select id from users force index (street_id) where street_id=30)
and date between STR_TO_DATE('2010-01-01 00:00:00', '%Y-%m-%d %H:%i:%s') and STR_TO_DATE('2014-05-22 23:59:59', '%Y-%m-%d %H:%i:%s')
In both cases the index should be used, but I think the problem is in the IN clause. If it's possible, how can I make it work?
Explain:
+----+--------------------+-------+------+---------------+-----------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+-----------+---------+-------+----------+--------------------------+
| 1 | PRIMARY | stat | ALL | NULL | NULL | NULL | NULL | 32028701 | Using where |
| 2 | DEPENDENT SUBQUERY | users | ref | street_id | street_id | 8 | const | 650 | Using where; Using index |
+----+--------------------+-------+------+---------------+-----------+---------+-------+----------+--------------------------+
If I search with a single user_id, the index is used:
explain select user_id, stat.in, stat.out, stat.time, date
from stat
where user_id=3991;
Explain:
+----+-------------+-------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+-------+
| 1 | SIMPLE | stat | ref | user_id_2 | user_id_2 | 8 | const | 2973 | |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+-------+
First, the IN clause in the query is creating havoc, and if I am not wrong the indexes are not set up properly.
Here is how it should be done. Let's say the tables are as follows:
create table users (id int, name varchar(100),street_id int);
insert into users values
(1,'a',20),(2,'b',30),(3,'c',10),(4,'d',20),(5,'e',10),(6,'f',40),(7,'g',20),
(8,'h',10),(9,'i',10),(10,'j',40);
create table stat (user_id int ,`in` int, `out` int, time int , date date);
insert into stat values
(1,1,1,20,'2014-01-01'),
(1,1,1,20,'2014-01-02'),
(3,1,1,20,'2014-01-01'),
(2,1,1,20,'2014-01-01'),
(4,1,1,20,'2014-01-02'),
(6,1,1,20,'2014-01-02'),
(7,1,1,20,'2014-01-02'),
(8,1,1,20,'2014-01-02'),
(1,1,1,20,'2014-01-02'),
(2,1,1,20,'2014-01-02'),
(3,1,1,20,'2014-01-03'),
(4,1,1,20,'2014-01-04'),
(5,1,1,20,'2014-01-04'),
(6,1,1,20,'2014-01-04'),
(7,1,1,20,'2014-01-04'),
(2,1,1,20,'2014-01-04'),
(3,1,1,20,'2014-01-04'),
(4,1,1,20,'2014-01-05'),
(5,1,1,20,'2014-01-05'),
(6,1,1,20,'2014-01-05'),
(7,1,1,20,'2014-01-05'),
(8,1,1,20,'2014-01-05'),
(9,1,1,20,'2014-01-05'),
(10,1,1,20,'2014-01-05'),
(1,1,1,20,'2014-01-06'),
(4,1,1,20,'2014-01-06');
Now add some indexes on the tables:
alter table users add index id_idx (id);
alter table users add index street_idx(street_id);
alter table stat add index user_id_idx(user_id);
Now, running EXPLAIN on the same query that you are trying to execute yields:
EXPLAIN
select user_id, stat.`in`, stat.`out`, stat.time, date
from stat
where user_id in (select id from users force index (street_idx) where street_id=30);
+----+--------------------+-------+------+---------------+------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------------+---------+-------+------+-------------+
| 1 | PRIMARY | stat | ALL | NULL | NULL | NULL | NULL | 26 | Using where |
| 2 | DEPENDENT SUBQUERY | users | ref | street_idx | street_idx | 5 | const | 1 | Using where |
+----+--------------------+-------+------+---------------+------------+---------+-------+------+-------------+
It still looks like it is scanning the entire table.
Now let's modify the query to use a JOIN and see what EXPLAIN has to say. Note that I have an index on the joining key in both tables, and the two keys have the same type and size.
EXPLAIN
select
s.user_id,
s.`in`,
s.`out`,
s.time,
s.date
from stat s
join users u on u.id = s.user_id
where u.street_id=30 ;
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
| 1 | SIMPLE | u | ref | id_idx,street_idx | street_idx | 5 | const | 1 | Using where |
| 1 | SIMPLE | s | ref | user_id_idx | user_id_idx | 5 | test.u.id | 3 | Using where |
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
Better, huh? Now let's try a range search:
EXPLAIN
select
s.user_id,
s.`in`,
s.`out`,
s.time,
s.date
from stat s
join users u on u.id = s.user_id
where
u.street_id=30
and s.date between '2014-01-01' AND '2014-01-06'
;
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
| 1 | SIMPLE | u | ref | id_idx,street_idx | street_idx | 5 | const | 1 | Using where |
| 1 | SIMPLE | s | ref | user_id_idx | user_id_idx | 5 | test.u.id | 3 | Using where |
+----+-------------+-------+------+-------------------+-------------+---------+-----------+------+-------------+
Still better, right?
So the takeaway is: try to avoid IN subqueries. Use a JOIN on indexed columns, and index the columns you search on properly.
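In the range-search plan above, stat is still read through user_id_idx alone (key_len 5), so the date condition is checked only after the rows are fetched. A composite index like the one the question mentions may let MySQL apply the date range inside the index as well. This is a sketch using the sample tables above; the index name user_date_idx is hypothetical.

```sql
-- Composite index: equality column first, range column second
ALTER TABLE stat ADD INDEX user_date_idx (user_id, date);

EXPLAIN
SELECT s.user_id, s.`in`, s.`out`, s.time, s.date
FROM stat s
JOIN users u ON u.id = s.user_id
WHERE u.street_id = 30
  AND s.date BETWEEN '2014-01-01' AND '2014-01-06';
```

Note also that the JOIN returns the same rows as the IN version only when users.id is unique; with duplicate ids in users, the JOIN would repeat stat rows.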

MySQL Indexing - In vs. Equals indexing issues

The following queries run almost instantaneously on the MySQL server:
SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
SELECT table_name.id
from table_name
where table_name.id = (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
But if I change the second query to the following, it takes more than 20 seconds:
SELECT table_name.id
from table_name
where table_name.id in (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
Running EXPLAIN gives the following output. It is clear that there is some issue with how MySQL indexes the data and handles the IN keyword.
For first query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For second query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
| 2 | SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For third query:
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| 1 | PRIMARY | table_name | index | NULL | sentTo | 5 | NULL | 6250751 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
I am using InnoDB and have tried changing the third query to forcibly use the index as indicated by the following category.
With the = comparison, the subquery runs only once, because equality is evaluated against a single value.
With IN, the subquery becomes a dependent subquery that is run for each row of the outer table, so every outer row is checked against the subquery result. That is not good for performance.
Try to use joins in these cases.
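A join-based form of the slow third query might look like the sketch below. The aliases t and s are illustrative, and the inner query is kept as a derived table so that it mirrors the original subquery.

```sql
-- Resolve the inner lookup once, then join the outer table to its result
SELECT t.id
FROM table_name AS t
JOIN (
    SELECT id
    FROM table_name
    WHERE id IN (10000)
) AS s ON s.id = t.id;
```

Running EXPLAIN on it should show whether the outer table is now reached through its PRIMARY key rather than a full index scan.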