A question about query performance in MySQL. I have a table (the largest I've ever dealt with) of 2.3 million records (and growing). The table is part of a database that keeps track of users logging in and scoring points in separate, quiz-like sessions. For the query at hand I need the 'highscore table' of all the sessions.
So, the points scored in a session are stored per question in order to analyse the progress of the user better. A session combines the total of a user's points, and a session is connected to a user.
At first the query execution time was around 12 seconds (unacceptable), with the tables and query shown under 'Original set'. Under 'Improved scores table' is the altered situation with some index optimization, which brings the query execution time down to about 2 seconds.
My question is: is there an additional way to optimize? Like I said, 2.3 million (and counting) is the largest table I've ever seen, so I'm not that experienced at this, and at this size an optimization tends to save seconds rather than tenths of a second.
Original set
CREATE TABLE `players` (
`id_players` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_organisations` int(10) unsigned NOT NULL,
`player_name` varchar(45) NOT NULL,
`player_comments` text NOT NULL,
PRIMARY KEY (`id_players`),
KEY `FK_players_organisation` (`id_organisations`),
CONSTRAINT `FK_players_organisation` FOREIGN KEY (`id_organisations`) REFERENCES `organisations` (`id_organisations`)
) ENGINE=InnoDB AUTO_INCREMENT=9139 DEFAULT CHARSET=latin1
SELECT COUNT(*) FROM players => 9126
CREATE TABLE `scores` (
`id_scores` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_sessions` int(10) unsigned NOT NULL,
`id_levels` int(10) unsigned NOT NULL,
`id_categories` int(10) unsigned NOT NULL,
`score_points` int(10) unsigned NOT NULL,
`score_correct` tinyint(4) NOT NULL,
`score_submitted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id_scores`),
KEY `FK_scores_sessions` (`id_sessions`),
KEY `FK_scores_levels` (`id_levels`),
KEY `FK_scores_categories` (`id_categories`),
KEY `Index_3_points` (`score_points`),
KEY `Index_4_submitted` (`score_submitted`)
) ENGINE=InnoDB AUTO_INCREMENT=2328510 DEFAULT CHARSET=latin1
SELECT COUNT(*) FROM scores => 2328469
CREATE TABLE `sessions` (
`id_sessions` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_players` int(10) unsigned NOT NULL,
`id_classes` int(11) DEFAULT NULL,
`session_start` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`session_grade` decimal(4,1) NOT NULL,
`session_ip` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id_sessions`),
KEY `FK_sessions_players` (`id_players`),
KEY `FK_sessions_classes` (`id_classes`)
) ENGINE=InnoDB AUTO_INCREMENT=40800 DEFAULT CHARSET=latin1
SELECT COUNT(*) FROM sessions => 40788
The 'offending' query:
SELECT sum( s.score_points ) AS score_points, p.player_name
FROM scores s
INNER JOIN sessions se ON s.id_sessions = se.id_sessions
INNER JOIN players p ON se.id_players = p.id_players
GROUP BY se.id_sessions
ORDER BY score_points DESC
LIMIT 50;
The above query took about 12 seconds with said scores table (EXPLAIN output below).
id select_type table type possible_keys key key_len ref rows Extra
'1' 'SIMPLE' 'p' 'ALL' 'PRIMARY' NULL NULL NULL '9326' 'Using temporary; Using filesort'
'1' 'SIMPLE' 'se' 'ref' 'PRIMARY,FK_sessions_players' 'FK_sessions_players' '4' 'earzsql.p.id_players' '2' 'Using index'
'1' 'SIMPLE' 's' 'ref' 'FK_scores_sessions' 'FK_scores_sessions' '4' 'earzsql.se.id_sessions' '72' ''
(the apparently infamous Using temporary and Using filesort)
After some 'research' I've changed indexes (Index_3_points) in scores table resulting in a table like:
Improved scores table
CREATE TABLE `scores` (
`id_scores` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_sessions` int(10) unsigned NOT NULL,
`id_levels` int(10) unsigned NOT NULL,
`id_categories` int(10) unsigned NOT NULL,
`score_points` int(10) unsigned NOT NULL,
`score_correct` tinyint(4) NOT NULL,
`score_submitted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id_scores`),
KEY `FK_scores_sessions` (`id_sessions`),
KEY `FK_scores_levels` (`id_levels`),
KEY `FK_scores_categories` (`id_categories`),
KEY `Index_4_submitted` (`score_submitted`),
KEY `Index_3_points` (`id_sessions`,`score_points`)
) ENGINE=InnoDB AUTO_INCREMENT=2328510 DEFAULT CHARSET=latin1
With the above scores table the query execution time drops to about 2 seconds. The EXPLAIN output (below) has not really changed a lot, though (at least, the infamous temporary table and filesort are still used).
id select_type table type possible_keys key key_len ref rows Extra
'1' 'SIMPLE' 'p' 'ALL' 'PRIMARY' NULL NULL NULL '9326' 'Using temporary; Using filesort'
'1' 'SIMPLE' 'se' 'ref' 'PRIMARY,FK_sessions_players' 'FK_sessions_players' '4' 'earzsql.p.id_players' '2' 'Using index'
'1' 'SIMPLE' 's' 'ref' 'FK_scores_sessions,Index_3_points' 'Index_3_points' '4' 'earzsql.se.id_sessions' '35' 'Using index'
I'd love to hear it if anyone knows further optimization tricks.
Presumably the top 50 scores don't change very often?
So run the query into a TopScore table, and index it. When a user's score changes, check it against the high scores table, and only update the TopScore table if the user's score is better than the 50th.
I would also suggest that adding a lot of indexes to a table that is frequently updated will probably have adverse performance effects on that table.
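A minimal sketch of that caching idea, assuming a hypothetical top_scores table (the table, column, and index names are illustrative and not part of the original schema). The refresh simply reuses the query from the question; how often to run it is left open.

```sql
-- Hypothetical cache table; names are illustrative.
CREATE TABLE top_scores (
  id_sessions  INT UNSIGNED    NOT NULL,
  player_name  VARCHAR(45)     NOT NULL,
  total_points BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (id_sessions),
  KEY idx_total_points (total_points)
) ENGINE=InnoDB;

-- Rebuild the cache from the expensive query.
START TRANSACTION;
DELETE FROM top_scores;
INSERT INTO top_scores (id_sessions, player_name, total_points)
SELECT se.id_sessions, p.player_name, SUM(s.score_points) AS total_points
FROM scores s
INNER JOIN sessions se ON s.id_sessions = se.id_sessions
INNER JOIN players p ON se.id_players = p.id_players
GROUP BY se.id_sessions, p.player_name
ORDER BY total_points DESC
LIMIT 50;
COMMIT;

-- The highscore page then reads at most 50 rows.
SELECT player_name, total_points
FROM top_scores
ORDER BY total_points DESC;
```

The incremental check described above (compare a session's new total against the 50th entry and update only when it is higher) could replace the full rebuild once the table is populated.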
I have posts and websites (and connecting post_websites). Each post can be on multiple websites, and some websites share the content, so I am trying to access the posts which are attached to particular website IDs.
In most cases WHERE IN works fine, but for some websites it is laggy, and I can't work out what the difference is.
SELECT *
FROM `posts`
WHERE `posts`.`id` IN (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
)
) AND
`status` = 1 AND
`posts`.`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
Explain
select_type | table | type | key | key_len | ref | rows | Extra
SIMPLE | post_websites | range | post_websites_website_id_index | 4 | NULL | 440 | Using index condition; Using temporary; Using filesort; Start temporary
SIMPLE | posts | eq_ref | PRIMARY | 4 | post_websites.post_id | 1 | Using where; End temporary
Other version with EXISTS
SELECT *
FROM `posts`
WHERE EXISTS (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
) AND
`posts`.`id` = `post_websites`.`post_id`
) AND
`status` = 1 AND
`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
EXPLAIN:
select_type | table | type | key | key_len | ref | rows | Extra
PRIMARY | posts | index | post_date_index | 5 | NULL | 12 | Using where
DEPENDENT SUBQUERY | post_websites | ref | post_id_website_id_unique | 4 | post.id | 1 | Using where; Using index
Long story short: depending on the number of posts on each site and the number of websites sharing content, the results vary from 20ms to 50s!
Based on the EXPLAIN, the EXISTS version works better, but in practice, when the amount of data in the subquery is lower, it can be very slow.
Is there a query I am missing that could work like a charm for all cases? Or should I check something before querying and choose the method of doing so dynamically?
migrations:
CREATE TABLE `posts` (
`id` int(10) UNSIGNED NOT NULL,
`title` varchar(225) COLLATE utf8_unicode_ci NOT NULL,
`description` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
`post_date` timestamp NULL DEFAULT NULL,
`status` tinyint(4) NOT NULL DEFAULT '1',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `posts`
ADD PRIMARY KEY (`id`),
ADD KEY `created_at_index` (`created_at`) USING BTREE,
ADD KEY `status_deleted_at_index` (`status`,`deleted_at`) USING BTREE,
ADD KEY `post_date_index` (`post_date`) USING BTREE,
ADD KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
CREATE TABLE `post_websites` (
`post_id` int(10) UNSIGNED NOT NULL,
`website_id` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `post_websites`
ADD PRIMARY KEY (`website_id`, `post_id`),
ADD UNIQUE KEY `post_id_website_id_unique` (`post_id`,`website_id`),
ADD KEY `website_id_index` (`website_id`),
ADD KEY `post_id_index` (`post_id`);
eloquent:
$news = Post::select(['title', 'description'])
->where('status', 1)
->whereExists(
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites)
->whereRaw('post_websites.post_id = posts.id');
})
->orderBy('post_date', 'desc')
->limit(6)
->get();
or
$q->whereIn('posts.id',
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites);
});
Thanks.
Many:many table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
That says to get rid of id (because it slows things down), promote that UNIQUE to be the PK, and add an INDEX in the opposite direction.
Don't use IN ( SELECT ... ). A simple JOIN is probably the best alternative here.
Did some 3rd party package provide those 3 TIMESTAMPs for each table? Are they ever used? Get rid of them.
KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
is mostly backward. Some rules:
Don't start an index with the PRIMARY KEY column(s).
Do start an index with = tests: status,deleted_at
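A sketch of what those suggestions could look like for the posts query. The index name is made up, DISTINCT is an addition of mine (a post attached to several of the listed websites would otherwise be returned more than once), and whether the optimizer picks a good plan for every website still needs checking with EXPLAIN.

```sql
-- Equality-tested columns first, then the ORDER BY column (index name is hypothetical).
ALTER TABLE posts
  ADD INDEX status_deleted_at_post_date (status, deleted_at, post_date);

-- Plain JOIN instead of IN ( SELECT ... ).
SELECT DISTINCT p.*
FROM posts AS p
JOIN post_websites AS pw ON pw.post_id = p.id
WHERE pw.website_id IN (12054, 19829, 2258, 253)
  AND p.status = 1
  AND p.deleted_at IS NULL
ORDER BY p.post_date DESC
LIMIT 6;
```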
I have a first table containing my ips stored as integer (500k rows), and a second one containing ranges of black listed ips and the reason of black listing (10M rows)
here is the table structure :
CREATE TABLE `black_lists` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`ip_start` INT(11) UNSIGNED NOT NULL,
`ip_end` INT(11) UNSIGNED NULL DEFAULT NULL,
`reason` VARCHAR(3) NOT NULL,
`excluded` TINYINT(1) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `ip_range` (`ip_end`, `ip_start`),
INDEX `ip_start` ( `ip_start`),
INDEX `ip_end` (`ip_end`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10747741
;
CREATE TABLE `ips` (
`id` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'Id ips',
`idhost` INT(11) NOT NULL COMMENT 'Id Host',
`ip` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Ip',
`ipint` INT(11) UNSIGNED NULL DEFAULT NULL COMMENT 'Int ip',
`type` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Type',
PRIMARY KEY (`id`),
INDEX `host` (`idhost`),
INDEX `index3` (`ip`),
INDEX `index4` (`idhost`, `ip`),
INDEX `ipsin` (`ipint`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=675651;
my problem is when I try to run this query no index is used and it takes an eternity to finish :
select i.ip,s1.reason
from ips i
left join black_lists s1 on i.ipint BETWEEN s1.ip_start and s1.ip_end;
I'm using MariaDB 10.0.16
True.
The optimizer has no knowledge that start..end values are non overlapping, nor anything else obvious about them. So, the best it can do is decide between
s1.ip_start <= i.ipint -- and use INDEX(ip_start), or
s1.ip_end >= i.ipint -- and use INDEX(ip_end)
Either of those could result in upwards of half the table being scanned.
In 2 steps you could achieve the desired goal for one ip; let's say @ip:
SELECT ip_start, reason
FROM black_lists
WHERE ip_start <= @ip
ORDER BY ip_start DESC
LIMIT 1
But after that, you need to see if the ip_end corresponding to that ip_start is >= @ip before deciding whether you have a black-listed item.
SELECT b.reason
FROM ( ... ) a -- fill in the above query
JOIN black_lists b USING(ip_start)
WHERE b.ip_end >= @ip
That will either return the reason or no rows.
In spite of the complexity, it will be very fast. But, you seem to have a set of IPs to check. That makes it more complex.
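For a whole set of IPs, one possible extension is to run the "largest ip_start at or below the address" lookup as a correlated subquery per row and then join back to check ip_end. This is only a sketch: it assumes the ranges do not overlap and that ip_start values are unique, the column alias candidate_start is invented for illustration, and it has not been timed against the 10M-row table.

```sql
SELECT t.ip, b.reason
FROM (
    SELECT i.ip,
           i.ipint,
           -- Largest range start at or below this address (relies on an index on ip_start).
           (SELECT b2.ip_start
            FROM black_lists b2
            WHERE b2.ip_start <= i.ipint
            ORDER BY b2.ip_start DESC
            LIMIT 1) AS candidate_start
    FROM ips i
) AS t
LEFT JOIN black_lists b
       ON b.ip_start = t.candidate_start
      AND b.ip_end >= t.ipint;  -- the address must also lie at or before the range end
```

As with the original LEFT JOIN, addresses that match no range come back with reason set to NULL.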
For black_lists, there seems to be no need for id. Suggest you replace the 4 indexes with only 2:
PRIMARY KEY(ip_start, ip_end),
INDEX(ip_end)
In ips, isn't ip unique? If so, get rid of id and change 5 indexes to 3:
PRIMARY KEY(ipint),
INDEX(idhost, ip),
INDEX(ip)
You have allowed more than enough in the VARCHAR for IPv6, but not in INT UNSIGNED.
More discussion.
I'm struggling to understand if I've indexed this query properly, it's somewhat slow and I feel it could use optimization. MySQL 5.1.70
select snaps.id, snaps.userid, snaps.ins_time, usr.gender
from usersnaps as snaps
join user as usr on usr.id = snaps.userid
left join user_convert as conv on snaps.userid = conv.userid
where (conv.level is null or conv.level = 4) and snaps.active = 'N'
and (usr.status = "unfilled" or usr.status = "unapproved") and usr.active = 1
order by snaps.ins_time asc
usersnaps table (irrelevant data removed, size about 250k records) :
CREATE TABLE IF NOT EXISTS `usersnaps` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` int(11) unsigned NOT NULL DEFAULT '0',
`picture` varchar(250) NOT NULL,
`active` enum('N','Y') NOT NULL DEFAULT 'N',
`ins_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`userid`),
KEY `userid` (`userid`,`active`),
KEY `ins_time` (`ins_time`),
KEY `active` (`active`)
) ENGINE=InnoDB;
user table (irrelevant data removed, size about 300k records) :
CREATE TABLE IF NOT EXISTS `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) NOT NULL DEFAULT '1',
`status` enum('15','active','approval','suspended','unapproved','unfilled','rejected','suspended_auto','incomplete') NOT NULL DEFAULT 'approval',
PRIMARY KEY (`id`),
KEY `status` (`status`,`active`)
) ENGINE=InnoDB;
user_convert table (size about : 60k records) :
CREATE TABLE IF NOT EXISTS `user_convert` (
`userid` int(10) unsigned NOT NULL,
`level` tinyint(4) NOT NULL,
UNIQUE KEY `userid` (`userid`),
KEY `level` (`level`)
) ENGINE=InnoDB;
Explain extended returns :
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE snaps ref userid,default_pic,active active 1 const 65248 100.00 Using where; Using filesort
1 SIMPLE usr eq_ref PRIMARY,active,status PRIMARY 4 snaps.userid 1 100.00 Using where
1 SIMPLE conv eq_ref userid userid 4 snaps.userid 1 100.00 Using where
Using filesort is probably your performance killer.
You need the records from usersnaps where active = 'N' and you need them sorted by ins_time.
ALTER TABLE usersnaps ADD KEY active_ins_time (active,ins_time);
Indexes are stored in sorted order, and read in sorted order... so if the optimizer chooses that index, it will go for the records with active = 'N' and -- hey, look at that -- they're already sorted by ins_time -- because of that index. So as it reads the rows referenced by the index, the result-set internally is already in the order you want it to ORDER BY, and the optimizer should realize this... no filesort required.
I would recommend changing the userid index (assuming you're not using it right now) to have active first and userid later.
That should make it more useful for this query.
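A sketch of that index change, followed by an EXPLAIN to check the effect. The new index name is hypothetical, and dropping the existing userid index assumes no other query or foreign key depends on an index that starts with userid.

```sql
-- Swap (userid, active) for (active, userid); the new name is illustrative.
ALTER TABLE usersnaps
  DROP KEY userid,
  ADD KEY active_userid (active, userid);

-- Re-check the plan: the aim is for "Using filesort" to disappear from the snaps row.
EXPLAIN
SELECT snaps.id, snaps.userid, snaps.ins_time, usr.gender
FROM usersnaps AS snaps
JOIN user AS usr ON usr.id = snaps.userid
LEFT JOIN user_convert AS conv ON snaps.userid = conv.userid
WHERE (conv.level IS NULL OR conv.level = 4)
  AND snaps.active = 'N'
  AND (usr.status = 'unfilled' OR usr.status = 'unapproved')
  AND usr.active = 1
ORDER BY snaps.ins_time ASC;
```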
Having some real issues with a few queries, this one in particular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query, takes 3 seconds ish
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review (doesn't really matter which, I guess highest rated would be ideal, r_score) for each game.
Can someone help me figure out why this is so horribly inefficient?
I don't understand the purpose of GROUP BY g_name in your query, but it makes MySQL aggregate over all the selected columns, which here means all columns from both tables. Please try removing it and check whether that helps.
Also, RIGHT JOIN makes the database read tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here, so please try changing the join type.
If none of the first options helps, you need to redesign your query. As you need to obtain 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such a construct will force the database to execute the first query first (sorry for the tautology) and will give you the maximum score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
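One caveat: the MySQL versions I know of reject LIMIT inside an IN subquery (error 1235), so the first query cannot be pasted into the IN list literally. A derived-table join is one way to combine the two steps; the sketch below assumes that, and the index name is hypothetical.

```sql
-- Optional helper index so the top-15 lookup can avoid a sort (name is illustrative).
ALTER TABLE tgmp_games
  ADD INDEX site_id_g_impressions (site_id, g_impressions);

SELECT g.g_id, g.g_name, g.g_impressions, MAX(r.r_score) AS best_score
FROM (
    -- Step 1: the 15 most viewed games for the site.
    SELECT g_id, g_name, g_impressions
    FROM tgmp_games
    WHERE site_id = 34
    ORDER BY g_impressions DESC
    LIMIT 15
) AS g
-- Step 2: only the reviews of those 15 games are touched.
LEFT JOIN tgmp_reviews r ON r.r_parent = g.g_id
GROUP BY g.g_id, g.g_name, g.g_impressions
ORDER BY g.g_impressions DESC;
```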
Your MyISAM table is small, you can try converting it to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.
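Both suggestions are one-liners; a sketch, assuming nothing in the application relies on MyISAM-specific behaviour of tgmp_games:

```sql
-- Convert the MyISAM table so both sides of the join use the same engine.
ALTER TABLE tgmp_games ENGINE=InnoDB;

-- Refresh index statistics, then re-run EXPLAIN on the slow query.
ANALYZE TABLE tgmp_games, tgmp_reviews;
```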
This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first EXPLAIN has "Using index" in the Extra column. This means MySQL was able to find the rows and return the result for the query by looking only at the index, without having to fetch or analyse any row data.
In the second query you add a column (npo_id) that is not part of the order_line_id index, so MySQL has to read the table data itself. I'm not sure why the optimiser chose a table scan, but when a table is this small it is likely cheaper to read everything than to look up individual rows through the index.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id);
This will allow order_line to find the rows using session_id and then return id, and also allow donations to join on order_line_id and then return the other two columns.
Looking at the auto_increment values, I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan, and it's good practice to put some sample data in there to test things out. For more detail have a look at this blog post I made some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/
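To check whether the table scan is just a small-table artifact, one option is to compare the plan with the index forced. This is a diagnostic sketch rather than something to leave in production code, and it sidesteps the ORM by running the statement by hand:

```sql
EXPLAIN
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
       `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation` FORCE INDEX (`order_line_id`)
  ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1';
```

If the forced plan shows type ref on Donation with a similar row estimate, the optimizer's original choice was most likely a cost decision driven by the tiny table, and it should change on its own as the table grows.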