Mysql Innodb "select count(*)" performance [duplicate] - mysql

I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):
DROP TABLE IF EXISTS `perf2`;
CREATE TABLE `perf2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;
While the statement is not run too often it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:
SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);
The explain plan seems fine:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE perf2 index NULL PRIMARY 4 NULL 8906459 Using index
Unfortunately the statement is as slow as before. According to "SELECT COUNT(*)" is slow, even with where clause I've also tried optimizing the table without success.
What/is the/re a way to optimize COUNT(*) performance on InnoDB?

As of MySQL 5.1.6 you can use the Event Scheduler and insert the count to a stats table regularly.
First create a table to hold the count:
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL);
Then create an event to update the table:
CREATE EVENT update_stats
ON SCHEDULE
EVERY 5 MINUTE
DO
INSERT INTO stats (`key`, `value`)
VALUES ('data_count', (select count(id) from data))
ON DUPLICATE KEY UPDATE value=VALUES(value);
It's not perfect but it offers a self contained solution (no cronjob or queue) that can be easily tailored to run as often as the required freshness of the count.

For the time being I've solved the problem by using this approximation:
EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)
The approximate number of rows can be read from the rows column of the explain plan when using InnoDB as shown above. When using MyISAM this will remain EMPTY as the table reference isbeing optimized away- so if empty fallback to traditional SELECT COUNT instead.

Based on #Che code, you can also use triggers on INSERT and on UPDATE to perf2 in order to keep the value in stats table up to date in realtime.
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL
);
Then:
CREATE TRIGGER `count_up` AFTER INSERT ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` + 1
WHERE `stats`.`key` = 'perf2_count';
CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` - 1
WHERE `stats`.`key` = 'perf2_count';
So the number of rows in the perf2 table can be read using this query, in realtime:
SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';
This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.

Related

optimize mysql count in 31M data [duplicate]

I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):
DROP TABLE IF EXISTS `perf2`;
CREATE TABLE `perf2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;
While the statement is not run too often it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:
SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);
The explain plan seems fine:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE perf2 index NULL PRIMARY 4 NULL 8906459 Using index
Unfortunately the statement is as slow as before. According to "SELECT COUNT(*)" is slow, even with where clause I've also tried optimizing the table without success.
What/is the/re a way to optimize COUNT(*) performance on InnoDB?
As of MySQL 5.1.6 you can use the Event Scheduler and insert the count to a stats table regularly.
First create a table to hold the count:
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL);
Then create an event to update the table:
CREATE EVENT update_stats
ON SCHEDULE
EVERY 5 MINUTE
DO
INSERT INTO stats (`key`, `value`)
VALUES ('data_count', (select count(id) from data))
ON DUPLICATE KEY UPDATE value=VALUES(value);
It's not perfect but it offers a self contained solution (no cronjob or queue) that can be easily tailored to run as often as the required freshness of the count.
For the time being I've solved the problem by using this approximation:
EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)
The approximate number of rows can be read from the rows column of the explain plan when using InnoDB as shown above. When using MyISAM this will remain EMPTY as the table reference isbeing optimized away- so if empty fallback to traditional SELECT COUNT instead.
Based on #Che code, you can also use triggers on INSERT and on UPDATE to perf2 in order to keep the value in stats table up to date in realtime.
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL
);
Then:
CREATE TRIGGER `count_up` AFTER INSERT ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` + 1
WHERE `stats`.`key` = 'perf2_count';
CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` - 1
WHERE `stats`.`key` = 'perf2_count';
So the number of rows in the perf2 table can be read using this query, in realtime:
SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';
This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.

MYSQL Update query using a temporary table when an equivalent select query does not

I have a MYSQL table Foo which has Primary key on id, and 2 other non primary keys on different columns.
Fiddle "select" example
My actual table contains many millions of rows so the behaviour of the explain is different, ie. it uses an Index_Merge on the 2 non primary indexes.
When I run the following Explain Update statement:
explain UPDATE Foo
SET readyState = 1
WHERE readyState = 0
AND productName = 'OpenAM'
LIMIT 30;
The Extra column contains "Using Temporary".
When I run the equivalent explain Select statement:
Explain Select id, productName, readyState
FROM Foo
WHERE readyState = 0
AND productName = 'OpenAM'
Limit 30;
The Extra column does not contain "Using Temporary".
The effect of this on my actual table is that when I update, there is a temporary table being created with several million rows as they are all matching the conditions of the update before the Limit 30 kicks in. The update takes 4-5 seconds whereas the select only takes ~0.001s as it does not create the temp table of the merged index. I understand that the Update will also need to update all 3 indexes (Primary + 2 non primary used in the query) but I would be shocked if it took 4 seconds to update 30 index rows in 3 indexes.
QUESTION: Is there a way to force the Update to not use the unneccessary Temporary table? I was under the impression that MYSQL treated an Update query plan the same way as a select.
If the temp table is required for Update and not for Select, why?
EDIT:
Show Create Table (removed a heap of columns since it is a very wide table):
CREATE TABLE Item (
ID int(11) NOT NULL AUTO_INCREMENT,
ImportId int(11) NOT NULL,
MerchantCategoryName varchar(200) NOT NULL,
HashId int(11) DEFAULT NULL,
Processing varchar(36) DEFAULT NULL,
Status int(11) NOT NULL,
AuditWho varchar(200) NOT NULL,
AuditWhen datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (ID),
KEY idx_Processing_Item (Processing),
KEY idx_Status_Item (Status),
KEY idx_MerchantCategoryName_Item (MerchantCategoryName),
KEY fk_Import_Item (ImportId),
KEY fk_Hash_Item (HashId),
CONSTRAINT fk_Hash_Item FOREIGN KEY (HashId) REFERENCES Hash (ID),
CONSTRAINT fk_Import_Item FOREIGN KEY (ImportId) REFERENCES Import (ID)
) ENGINE=InnoDB AUTO_INCREMENT=12004589 DEFAULT CHARSET=utf8
Update statement
explain UPDATE Item
SET Processing = 'd53dbc91-eef4-11e5-a3a6-06f88beef4f3',
Status = 2,
AuditWho = 'name',
AuditWhen = now()
WHERE EventId = 1
AND Processing is null
AND Status = 1
LIMIT 30;
Results:
'id','select_type','table','type','possible_keys','key','key_len','ref','rows','Extra',
'1','SIMPLE','Item','index_merge','idx_Processing_Item,idx_Status_Item,fk_Import_Item','idx_Processing_Item,idx_Status_Item,fk_Import_Item','111,4,4',\N,'1362610','Using intersect(idx_Processing_Item,idx_Status_Item,fk_Import_Item); Using where; Using temporary',
Select Query
explain select ID from Item where Status = 1 and Processing is null and ImportId = 1 limit 30;
Results:
'id','select_type','table','type','possible_keys','key','key_len','ref','rows','Extra',
'1','SIMPLE','Item','index_merge','idx_Processing_Item,idx_Status_Item,fk_ImportEvent_Item','idx_Processing_Item,idx_Status_Item,fk_Import_Item','111,4,4',\N,'1362610','Using intersect(idx_Processing_ItemPending,idx_Status_ItemPending,fk_ImportEvent_ItemPending); Using where; Using index',
A guess:
The UPDATE is changing an indexed value (readyState), correct? That means that the index in question is being changed as the UPDATE is using it? So, the UPDATE may be "protecting" itself by fetching the rows (in an inefficient way, apparently), tossing them into a tmp table, and only then performing the action.
"Index merge intersect" is almost always less efficient than a composite index: INDEX(readyState, productName) (in either order). Suggest you add that.
Since you have no ORDER BY, which "30" will be unpredictable. Suggest you add ORDER BY the-primary-key.

Why does adding an INNER JOIN make this query so slow?

I have a database with the following three tables:
matches table has 200,000 matches...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
heroes table has ~100 heroes...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
matches_heroes table has 2,000,000 relationships (10 random heroes per match)...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes over 1 second, which seems pretty slow to me for something so simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I take out the INNER JOIN also, like so:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
...it only takes 0.05 seconds. It seems that INNER JOIN is very costly. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Here's the EXPLAIN result.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it's working properly and this is simply as fast as it gets. Please let me know if you disagree. Thanks for all the help. I really appreciate it.
Use COUNT(matches.match_id) instead of count(*), as when using joins it's best to not use the * as it does extra computation. Using columns from the join are the best way ensure you are not requesting any other operations. (not a problem on MySql InnerJoin, my bad).
Also you should verify that you have all keys defragmented, and enough ram free for the index to load in memory
Update 1:
Try to add a composed index for match_id,hero_id as it should give better performance.
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
Update 2:
I wasn't satisfied with the accepted answer, that mysql is that slow for just 2 mill records and I runed benchmarks on my ubuntu PC (i7 processor, with standard HDD).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes
-- get random hero_id
SET #randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE #randHeroId,COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = #randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET #randName = (SELECT `name` FROM heroes WHERE hero_id = #randHeroId LIMIT 1);
SELECT SQL_NO_CACHE #randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = #randName
; -- Time: 0.037s
Conclusion: The test results are about 20x faster, and my server load was about 80% before testing as it's not a dedicated mysql server and had other cpu intensive tasks running, so if you run the whole script (from above) and get lower results it can be because:
you have a shared host, and the load is too big. In this case there isn't much you can do: you either complain to your current host, pay for a better host/vm or try another host
your configured key_buffer_size(for MyISAM) or innodb_buffer_pool_size(for innoDB) is too small, the optimum size would be over 150MB
your available ram is not enough, you would require about 100 - 150 mb of ram for the indexes to be loaded into memory. solution: free up some ram or buy more of it
Note that by using the test script, the generating of new data rules out the index fragmentation problem.
Hope this helps, and ask if you have issues in testing this.
obs:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5`
is the equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5`
So you wouldn't require a join, if that's the count you need, but I'm guessing that was just an example.
So you say reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired ones, then take them all to find matching records in the 200,000 record table?
And this surprises you? It's simply a lot of more work for the dbms. (It can even be, btw, that the dbms decides not to use the hero_id index when it considers a full table scan to be faster.)
So in my opinion there is nothing wrong with what is happening here.

How to optimize COUNT(*) performance on InnoDB by using index

I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):
DROP TABLE IF EXISTS `perf2`;
CREATE TABLE `perf2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;
While the statement is not run too often it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:
SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);
The explain plan seems fine:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE perf2 index NULL PRIMARY 4 NULL 8906459 Using index
Unfortunately the statement is as slow as before. According to "SELECT COUNT(*)" is slow, even with where clause I've also tried optimizing the table without success.
What/is the/re a way to optimize COUNT(*) performance on InnoDB?
As of MySQL 5.1.6 you can use the Event Scheduler and insert the count to a stats table regularly.
First create a table to hold the count:
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL);
Then create an event to update the table:
CREATE EVENT update_stats
ON SCHEDULE
EVERY 5 MINUTE
DO
INSERT INTO stats (`key`, `value`)
VALUES ('data_count', (select count(id) from data))
ON DUPLICATE KEY UPDATE value=VALUES(value);
It's not perfect but it offers a self contained solution (no cronjob or queue) that can be easily tailored to run as often as the required freshness of the count.
For the time being I've solved the problem by using this approximation:
EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)
The approximate number of rows can be read from the rows column of the explain plan when using InnoDB as shown above. When using MyISAM this will remain EMPTY as the table reference isbeing optimized away- so if empty fallback to traditional SELECT COUNT instead.
Based on #Che code, you can also use triggers on INSERT and on UPDATE to perf2 in order to keep the value in stats table up to date in realtime.
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL
);
Then:
CREATE TRIGGER `count_up` AFTER INSERT ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` + 1
WHERE `stats`.`key` = 'perf2_count';
CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` - 1
WHERE `stats`.`key` = 'perf2_count';
So the number of rows in the perf2 table can be read using this query, in realtime:
SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';
This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.

How to optimize this simple JOIN+ORDER BY query?

I have two mysql tables:
/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateRegistered` datetime NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
`UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
`Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`UserId`),
CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Both tables are populated with 10.000 random rows for testing by using the following procedure:
DELIMITER //
CREATE DEFINER=`root`#`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE dt DATE;
DECLARE Winks INT DEFAULT 1;
DECLARE Views INT DEFAULT 1;
WHILE (i<=cnt) DO
SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');
INSERT INTO users (Id, DateRegistered) VALUES(i, dt);
SET Winks = floor(1 + rand() * (30-1));
SET Views = floor(1 + rand() * (30-1));
INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);
The problem:
When I run the EXPLAIN for this query:
SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC
.. I get this explain:
Id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE t1 ALL PRIMARY (NULL) (NULL) (NULL) 10037 Using temporary; Using filesort
1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test2.t2.UserId 1
The above query gets very slow when both tables have more than 500K rows. I guess it's because of the 'Using temporary; Using filesort' in the explain of the query.
How can the above query be optimized so that it runs faster?
I'm faily sure that the ORDER BY is what's killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.
First, let's say you have a column named Score for storing a user's current score. Every time a user's Sent_Views or Sent_Winks changes, modify the Score column to match. This could probably be done with a trigger (my experience with triggers is limited), or definitely done in the same code that updates the Sent_Views and Sent_Winks fields. This change wouldn't need to know the DATEDIFF portion, because it could just divide by the old sum of Sent_Views + Sent_Winks and multiply by the new one.
Now you just need to change the Score column once per day (if you're not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.
Then, just index the Score column and SELECT away!
Note: edited to remove incorrect first attempt.
I'm offering my comment as answer:
Establish a future date, far enough to not interfere with your application, say the year 5000. Replace the current date with this future date in your score calculation. The score computation is now for all intents and purposes absolute, and can be computed when updating winks and views (through a stored rocedure or atrigger (does mysql have triggers?)).
Add a score column to your statistics_user table to store the computed score and define an index on it.
Your SQL can be rewritten as:
SELECT
UserId, score
FROM
statistics_user
ORDER BY score DESC
If you need the real score, it is easily computed with just a constant multiplication which could be done afterwards if it interferse with mysql index selection.
Shouldn't you have indexed DateRegistered in Users?
You should try an inner join, rather than a cartesian product, the next thing you can do is partitioning according to date_registered.