[Partition Benifit on Indexed Column] - mysql

CREATE TABLE ofRoster (
`rosterID` bigint(20) NOT NULL,
`username` varchar(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
PRIMARY KEY (`rosterID`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB;
CREATE TABLE `ofRoster_par` (
`rosterID` bigint(20) NOT NULL AUTO_INCREMENT,
`username` int(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
UNIQUE KEY `rosterID` (`rosterID`,`username`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB AUTO_INCREMENT=412595 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (username)
PARTITIONS 10 */ ;
I created partition on username so that when i use select command it need to search on one partition only.
But i am not sure if this will be benifitial as there is already a index on username.
explain SELECT count(*) FROM ofRoster_par WHERE username='1';
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| 1 | SIMPLE | ofRoster_par | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 4 | const | 120 | Using index |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
explain SELECT count(*) FROM ofRoster WHERE username='1';
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | ofRoster | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 66 | const | 120 | Using where; Using index |
Right now there are just 400 000 records on the table but on the production records will be around 80 million.
Time taken by both query is also the same :-(

PARTITION BY HASH is, in my opinion, useless.
In your example, INDEX(username) on a non-partitioned table would probably be faster than using PARTITION BY HASH(username).
You already have such an index. How fast was it?
Here's what is happening:
With partitioning:
pick partition
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Without partitioning:
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Other comments:
If username is unique, consider making it the PRIMARY KEY and get rid of rosterID. (You may want to keep rosterID because it is smaller and used for JOINing to several other tables.)
Bug: You say INT(64) where you meant VARCHAR(64). This may have impacted your timing test.
"Prefix indexes" (jid(255)) are rarely useful. Let's see how you are using it.
80M rows does not warrant BIGINT (8 bytes); INT UNSIGNED (4 bytes) can handle 400 crore.
You understand that latin1 limits you to western European languages?
When using EXPLAIN with partitioned tables, use EXPLAIN PARTITIONS SELECT .... You may get some surprises.

Related

mysql unique index on char column not getting used with IN clause

Mysql version - 5.7.22
Table definition
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uuid` char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
`title` varchar(254) COLLATE utf8mb4_unicode_ci NOT NULL,
`created` datetime(6) NOT NULL,
`modified` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `book_uuid` (`uuid`),
) ENGINE=InnoDB AUTO_INCREMENT=115 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Query
SELECT DISTINCT `books`.`id`,
`books`.`uuid`,
`books`.`title`
FROM `books`
WHERE (`books`.`uuid` IN ("334222a0e99b4a3e97f577665055208e",
"979c059840964934816280ba85c67221",
"4e2978c765dd435998666ea3083666e5",
"535aa78ba80e4215bbf75fb1e20cc5f3",
"f969fb10c72b4875aabdf75c1b493524",
"1daa0015055444a4b1c0821618a7a4d9",
"04f34ede284a4b86b0adddb405d30a75",
"513cad12c88c44c6ab248d43643459b9",
"de2bde6d016f4381ad0ba714234386fa",
"f645c2c9f1594a199a960b97b7015986",
"3ce02c072f24447a8a7b269a19ec554f",
"75450daf9d024d9d9c0df038437ae2c2",
"0e822042b50b4f79bb38304e0acde6f0",
"38d808fb3f9a4f57b4f7b30a141e7169",
"ecd424abd3a94a339383f6f8e668655e"))
ORDER BY `books`.`id` DESC
LIMIT 15;
when i do explain on this query it doesn't pick the index
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| 1 | SIMPLE | books | NULL | ALL | book_uuid | NULL | NULL | NULL | 107 | 12.15 | Using where; Using filesort |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
strangely the index is correctly used when there are 12 or less entries passed to the IN clause (without forcing index)
forcing the index works, but I cannot force index as this query is created by django ORM, cannot use django-mysql 's force_index as I am on Innodb
Note: I know 'distinct' and 'limit' can be avoided in this query, but its part of a bigger query, so i have kept it as is

Trying to understand EXPLAIN results of query in mySQL

Below is my EXPLAIN query and the output. I'm very much a beginner (please forgive my SQL syntax...unless that's my problem!) - can anyone explain the order of the tables here please? I've played around with the order (in the query itself) and yet the TABLE artists is always top in the EXPLAIN output? I gather the order relates to when the tables are accessed - if so, why artists first?
EXPLAIN
SELECT album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
| 1 | SIMPLE | artists | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | albums | ref | genre_pk,artist_pk | artist_pk | 2 | music.artists.artist_pk | 1 | NULL |
| 1 | SIMPLE | genres | eq_ref | PRIMARY | PRIMARY | 1 | music.albums.genre_pk | 1 | NULL |
SHOW CREATE TABLE info:
CREATE TABLE `artists` (
`artist_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`artist_name` varchar(57) NOT NULL,
`artist_origin` enum('UK','US','OTHER') DEFAULT NULL,
PRIMARY KEY (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `genres` (
`genre_pk` tinyint(2) unsigned NOT NULL AUTO_INCREMENT,
`genre_name` varchar(30) NOT NULL,
PRIMARY KEY (`genre_pk`),
UNIQUE KEY `genre_name` (`genre_name`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
CREATE TABLE `albums` (
`album_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`genre_pk` tinyint(2) unsigned NOT NULL,
`artist_pk` smallint(4) unsigned NOT NULL,
`album_name` varchar(57) NOT NULL,
`album_year` year(4) DEFAULT NULL,
`album_track_qty` tinyint(2) unsigned NOT NULL,
`album_disc_num` char(6) NOT NULL DEFAULT '1 of 1',
PRIMARY KEY (`album_pk`),
KEY `genre_pk` (`genre_pk`),
KEY `artist_pk` (`artist_pk`),
FULLTEXT KEY `album_name` (`album_name`),
CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`genre_pk`) REFERENCES `genres` (`genre_pk`),
CONSTRAINT `albums_ibfk_2` FOREIGN KEY (`artist_pk`) REFERENCES `artists` (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
The order of joining your tables is depending on the SQL optimizer. The optimizer internally modifies your query to deliver results in a fast and efficient way (read this page for more details). To avoid internal join optimization you can use SELECT STRAIGHT_JOIN.
In your special case, the order is depending on the number of rows in each table and the availability of indexes. Have a look at these slides starting with page 25 for a little example.
Here is the fiddle for you: http://sqlfiddle.com/#!2/a6224/2/0
As #Daniel already said, MySQL takes into account not only indices, but also the number of rows in each table. The number of rows is low both in my fiddle and in your database - so it is hard to blame MySQL.
Note that even though STRAIGHT_JOIN will make the order of joins seem logical to you, it will not however make the execution plan prettier (I mean Using temporary; Using filesort red flags)

Why won't my query use the index on my datetime field?

I have the following table:
CREATE TABLE 'tableA'(
`col1` int(11) NOT NULL AUTO_INCREMENT,
`col2` varchar(20) NOT NULL,
`col3` int(11) NOT NULL,
`col4` varchar(200) NOT NULL,
`col5` varchar(15) NOT NULL,
`col6` datetime NOT NULL,
PRIMARY KEY (`col1`),
UNIQUE KEY `col2,col3` (`col2`,`col3`),
KEY `col6` (`col6`)
) ENGINE=InnoDB AUTO_INCREMENT=1881208 DEFAULT CHARSET=utf8
I have an index on col6, a datetime column. I have almost 2M rows in the table, and the dates range from 1/1/2007 to 11/27/2012.
When I run the following, it doesn't use my index:
EXPLAIN SELECT * FROM tableA ORDER BY col6 ASC
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | tableA | ALL | NULL | NULL | NULL | NULL | 1933765 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
I tried converting the datetime field to an integer and converting the datetime to a unix timestamp. However, it still won't use my index. What am I missing? Why does the optimizer insist on sorting through lots of rows (in this case 1,933,765 rows) rather than use the index?
Since you are not selecting on anything based on the index to narrow the result set, using it would only incur additional work to lookup via point-lookup every each row in the primary table.

What is the best way to optimze this query with indexes?

I have a table with about 30 million records which I need to perform queries upon. From my reading, I thought that a composite index using leftmost prefixing with all the fields I need to select would be the correct way to do it, but when I run an explain on the query, it's not even using the index.
This is the query:
select distinct email FROM my_table
WHERE `customer_id` IN(278,428,186,40,208,247,59,79,376,73,38,52,68,227)
AND `company_id` = 4
AND `active` = 1
AND `date` > '2012-04-15';
The explain looks like this
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | emails | index | customer_id | email | 772 | NULL | 29296705 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
These are the fields
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`customer_id` int(10) unsigned DEFAULT NULL,
`company_id` int(10) unsigned NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date` date DEFAULT NULL
Indexes looks like this
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`,`customer_id`),
KEY `customer_id` (`customer_id`,`company_id`,`active`,`date`)
I'm not quite sure what the best way to optimize this is.
MySQL is often fussy about IN on the left side of the index. Try one query for each customer_id and see if that's using your index. You can use the UNION syntax to join them together The other possibility is that MySQL figures it's faster to sift through everything for 10% of rows than to try to use indexes for them.

MySQL won't use index for query?

I have this table:
CREATE TABLE `point` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`siteid` INT(11) NOT NULL,
`lft` INT(11) DEFAULT NULL,
`rgt` INT(11) DEFAULT NULL,
`level` SMALLINT(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `point_siteid_site_id` (`siteid`),
CONSTRAINT `point_siteid_site_id` FOREIGN KEY (`siteid`) REFERENCES `site` (`id`) ON DELETE CASCADE
) ENGINE=INNODB AUTO_INCREMENT=35 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And this query:
SELECT * FROM `point` WHERE siteid = 1;
Which results in this EXPLAIN information:
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | point | ALL | point_siteid_site_id | NULL | NULL | NULL | 6 | Using where |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
Question is, why isn't the query using the point_siteid_site_id index?
I haven't used MySQL much, but in PostgreSQL if the number of records is small in the table, it can decide not to use the index. This is not a problem, because it chooses the best query plan for the situation. When the number of records is bigger, it will use the index. Maybe this is the same case here with MySQL.
Edit: Are you sure the foreign key is indexed?
It's probably because you don't have enough sample data, or the cardinality (diversity) of the siteid values isn't very large. These are the most common reasons the optimizer will find it quicker just to read all the values.