Eliminating Subquery in MySQL

I have a table in MySQL as below.
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`employeeNumber` int(11) DEFAULT NULL,
`approveDate` date DEFAULT NULL,
`documentNumber` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `INDEX_1` (`documentNumber`)
) ENGINE=InnoDB;
I want to write a query that tells me whether a documentNumber is approved by all employeeNumbers, approved by some of them, or not approved by any.
I made a query as below.
SELECT T1.documentNumber,
       (CASE WHEN T2.currentNum = '0' THEN '1'
             WHEN T2.currentNum < T2.totalNum THEN '2'
             ELSE '3'
        END) AS approveStatusNumber
FROM myTable AS T1
LEFT JOIN (SELECT documentNumber,
                  COUNT(*) AS totalNum,
                  SUM(CASE WHEN approveDate IS NOT NULL THEN '1' ELSE '0' END) AS currentNum
           FROM myTable
           GROUP BY documentNumber) AS T2
       ON T1.documentNumber = T2.documentNumber
GROUP BY T1.documentNumber;
This SQL works, but it is very slow.
I ran EXPLAIN on this SQL; the result is below.
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| 1 | PRIMARY | T1 | range | INDEX_1 | INDEX_1 | 153 | NULL | 27 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5517 | |
| 2 | DERIVED | myTable | index | NULL | INDEX_1 | 153 | NULL | 5948 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------------------+
I think I have to eliminate the subquery to improve my query.
How can I do the same thing without a subquery? Or is there another way to improve my query?

The expression to return currentNum could be more succinctly expressed (in MySQL) as
SUM(approveDate IS NOT NULL)
And there's no need for an inline view. This will return an equivalent result:
SELECT t.documentNumber
     , CASE
         WHEN SUM(t.approveDate IS NOT NULL) = 0
           THEN '1'
         WHEN SUM(t.approveDate IS NOT NULL) < COUNT(*)
           THEN '2'
         ELSE '3'
       END AS approveStatusNumber
  FROM myTable t
 GROUP BY t.documentNumber
 ORDER BY t.documentNumber
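If the aggregate over the whole table is still too slow, a composite index on the two referenced columns would let MySQL compute the grouped counts from the index alone. This is an extra suggestion beyond the original answer, and the index name is invented:
ALTER TABLE myTable
  ADD INDEX idx_docnum_approvedate (documentNumber, approveDate);
With that index in place, EXPLAIN should report "Using index" for the aggregation, and INDEX_1 on documentNumber alone becomes a redundant prefix that could be dropped.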

Related

MySQL 8 is not using INDEX when subquery has a group column

We have just moved from MariaDB 5.5 to MySQL 8, and some of the update queries have suddenly become slow. On further investigation, we found that MySQL 8 does not use an index when the subquery has a grouped column.
For example, below is a sample database. Table users maintains the current balance of the users per type, and table accounts maintains the total balance history per day.
CREATE DATABASE test;
CREATE TABLE `users` (
`uid` int(10) unsigned NOT NULL DEFAULT '0',
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`type` int(10) unsigned NOT NULL DEFAULT '0',
KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `accounts` (
`uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`day` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`uid`),
KEY `day` (`day`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Below is the EXPLAIN output for the query that updates accounts:
mysql> explain update accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users) r
on r.uid=a.uid and r.day=a.day set a.balance=r.balance;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | UPDATE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
As you can see, MySQL is not using the index.
On further investigation, I found that if I remove SUM() from the subquery, it starts using the index. However, that was not the case with MariaDB 5.5, which correctly used the index in all cases.
Below are the two SELECT queries, with and without SUM(). I used SELECT queries to cross-check against MariaDB 5.5, since 5.5 does not support EXPLAIN for UPDATE queries.
mysql> explain select * from accounts a inner join (
select uid, balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| 1 | SIMPLE | a | NULL | ref | PRIMARY,day | day | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.a.uid | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
and with SUM():
mysql> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
Below is the output from MariaDB 5.5:
MariaDB [test]> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| 1 | PRIMARY | a | ALL | PRIMARY,day | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 10 | test.a.uid,test.a.day | 2 | Using where |
| 2 | DERIVED | users | ALL | NULL | NULL | NULL | NULL | 1 | |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
3 rows in set (0.00 sec)
Any idea what we are doing wrong?
As others have commented, break your update query apart:
update accounts join
then your query
on the condition of the join.
Your inner select query of
select uid, sum(balance) balance, day(current_date()) day from users
is the only thing that is running; it returns some uid, the sum of ALL balances, and the current day. You never know which user is getting updated, let alone with the correct amount. Start by getting your query to return the expected results per user ID. The context is a little odd, though: your users table has a "uid" but no primary key, IMPLYING there are multiple records for the same "uid". To me, accounts implies something like: I am a bank representative and sign up multiple user accounts, so my active portfolio of client balances on a given day is the sum from the users table.
Having said that, let's look at getting that answer:
select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid
This will show you, per user, what their total balance is as of right now. The GROUP BY gives you the "ID" key to tie back to the accounts table. In MySQL, the syntax of a correlated update for this scenario would be as follows (I am using the above query and giving it the alias "PQ" for PreQuery in the join):
update accounts a
JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set Balance = PQ.allUserBalance,
Day = day( current_date())
Now, the above will not give a proper answer if you have accounts that no longer have user entries associated with them, such as when all the users leave. Whatever accounts have no users would keep a balance and day from some prior date. To fix this, you could do a LEFT JOIN, such as:
update accounts a
LEFT JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set Balance = coalesce( PQ.allUserBalance, 0 ),
Day = day( current_date())
With the LEFT JOIN and COALESCE(), if there is no summed record in the users table, the account balance will be set to zero.
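For completeness, here is a sketch of the original single-statement UPDATE with the missing GROUP BY restored and the day condition kept in the join. It combines the question's query with the grouping fix above and is not verbatim from either:
update accounts a
JOIN
( select
     uid,
     sum( balance ) balance,
     day( current_date() ) day
  from
     users
  group by
     uid  -- without this, the subquery collapses to a single row
) r
on r.uid = a.uid and r.day = a.day
set a.balance = r.balance;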

Can't improve query performance

I have the following 2 tables:
CREATE TABLE table1 (
ID INT(11) NOT NULL AUTO_INCREMENT,
AccountID INT NOT NULL,
Type VARCHAR(50) NOT NULL,
ValidForBilling BOOLEAN NULL DEFAULT false,
MerchantCreationTime TIMESTAMP NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY (OrderID, Type) -- note: OrderID is not defined in this table; it appears in table2 below
);
with the index:
INDEX accID_type_merchCreatTime_vfb (AccountID, Type, MerchantCreationTime, ValidForBilling);
CREATE TABLE table2 (
OrderID INT NOT NULL,
AccountID INT NOT NULL,
LineType VARCHAR(256) NOT NULL,
CreationDate TIMESTAMP NOT NULL,
CalculatedAmount NUMERIC(4,4) NULL,
table1ID INT(11) NOT NULL
);
I'm running the following query:
SELECT COALESCE(SUM(CalculatedAmount), 0.0) AS CalculatedAmount
FROM table2
INNER JOIN table1 ON table1.ID = table2.table1ID
WHERE table1.ValidForBilling is TRUE
AND table1.AccountID = 388
AND table1.Type = 'TPG_DISCOUNT'
AND table1.MerchantCreationTime >= '2018-11-01T05:00:00'
AND table1.MerchantCreationTime < '2018-12-01T05:00:00';
And it takes about 2 minutes to complete.
I did EXPLAIN in order to try and improve the query performance and got the following output:
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| 1 | SIMPLE | table1 | NULL | range | PRIMARY,i_fo_merchant_time_account,FO_AccountID_MerchantCreationTime,FO_AccountID_ExecutionTime,FO_AccountID_Type_ExecutionTime,FO_AccountID_Type_MerchantCreationTime,accID_type_merchCreatTime_vfb | accID_type_merchCreatTime_vfb | 61 | NULL | 71276 | 100.00 | Using where; Using index |
| 1 | SIMPLE | table2 | NULL | eq_ref | table1ID,i_oc_fo_id | table1ID | 4 | finance.table1.ID | 1 | 100.00 | NULL |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
I see that I scan 71276 rows in table1 and I can't seem to make this number lower.
Is there an index I can create to improve this query performance?
Move ValidForBilling before MerchantCreationTime in accID_type_merchCreatTime_vfb: equality (ref) lookups such as = TRUE need to come before range conditions in an index.
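For illustration, rebuilding the index in that order might look like the following (a sketch; the new index name is made up):
ALTER TABLE table1
  DROP INDEX accID_type_merchCreatTime_vfb,
  ADD INDEX accID_type_vfb_merchCreatTime (AccountID, Type, ValidForBilling, MerchantCreationTime);
With this column order, the equality conditions on AccountID, Type, and ValidForBilling narrow the index first, so the range scan on MerchantCreationTime covers only the matching slice.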
For table2, there seems to be a table1ID index already; appending CalculatedAmount to it lets the index also cover the value used in the result:
CREATE INDEX tbl1IDCalcAmount ON table2 (table1ID, CalculatedAmount);

How to improve MySQL "fill the gaps" query

I have a table with currency exchange rates that I fill with data published by the ECB. That data contains gaps in the date dimension, e.g. holidays.
CREATE TABLE `imp_exchangerate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rate_date` date NOT NULL,
`currency` char(3) NOT NULL,
`rate` decimal(14,6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `rate_date` (`rate_date`,`currency`),
KEY `imp_exchangerate_by_currency` (`currency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I also have a date dimension, as you'd expect in a data warehouse:
CREATE TABLE `d_date` (
`date_id` int(11) NOT NULL,
`full_date` date DEFAULT NULL,
---- etc.
PRIMARY KEY (`date_id`),
KEY `full_date` (`full_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Now I try to fill the gaps in the exchange rates like this:
SELECT
d.full_date,
currency,
(SELECT rate FROM imp_exchangerate
WHERE rate_date <= d.full_date AND currency = c.currency
ORDER BY rate_date DESC LIMIT 1) AS rate
FROM
d_date d,
(SELECT DISTINCT currency FROM imp_exchangerate) c
WHERE
d.full_date >=
(SELECT min(rate_date) FROM imp_exchangerate
WHERE currency = c.currency) AND
d.full_date <= curdate()
Explain says:
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 201 | |
| 1 | PRIMARY | d | range | full_date | full_date | 4 | NULL | 6047 | Using where; Using index; Using join buffer (flat, BNL join) |
| 4 | DEPENDENT SUBQUERY | imp_exchangerate | ref | imp_exchangerate_by_currency | imp_exchangerate_by_currency | 3 | c.currency | 664 | |
| 3 | DERIVED | imp_exchangerate | range | NULL | imp_exchangerate_by_currency | 3 | NULL | 201 | Using index for group-by |
| 2 | DEPENDENT SUBQUERY | imp_exchangerate | index | rate_date,imp_exchangerate_by_currency | rate_date | 6 | NULL | 1 | Using where |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
MySQL needs multiple hours to execute that query. Are there any ideas how to improve it? I have tried an index on rate without any noticeable impact.
I have had a solution for a while now: get rid of dependent subqueries. I had to think from different angles in multiple places, and here is the result:
SELECT
cd.date_id,
x.currency,
x.rate
FROM
imp_exchangerate x INNER JOIN
(SELECT
d.date_id,
max(rate_date) as rate_date,
currency
FROM
d_date d INNER JOIN
imp_exchangerate ON rate_date <= d.full_date
WHERE
d.full_date <= curdate()
GROUP BY
d.date_id,
currency) cd ON x.rate_date = cd.rate_date and x.currency = cd.currency
This query finishes in less than 10 minutes now, compared to multiple hours for the original query.
Lesson learned: avoid dependent subqueries in MySQL like the plague!
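A side note beyond the original answer: the dependent subquery in the question filters on currency equality plus a rate_date range, an access pattern that the existing (rate_date, currency) key serves poorly. An index with currency first matches it (a sketch; the index name is invented):
ALTER TABLE imp_exchangerate
  ADD INDEX currency_rate_date (currency, rate_date);
The same index also serves the final join above, which matches on both currency and rate_date.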

Can this MySQL Query be optimized?

I am currently trying to optimize a MySQL query which runs a little slowly on tables with 10,000+ rows.
CREATE TABLE IF NOT EXISTS `person` (
`_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`firstname` varchar(255) NOT NULL,
`lastname` varchar(255) NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `person_cars` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`idx` varchar(255) NOT NULL,
`val` blob NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`),
KEY `idx` (`idx`),
KEY `val` (`val`(64))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
# Insert some 10000+ rows…
INSERT INTO `person` (`_oid`,`firstname`,`lastname`)
VALUES
('1', 'John', 'Doe'),
('2', 'Jack', 'Black'),
('3', 'Jim', 'Kirk'),
('4', 'Forrest', 'Gump');
INSERT INTO `person_cars` (`_oid`,`idx`,`val`)
VALUES
('1', '0', 'BMW'),
('1', '1', 'PORSCHE'),
('2', '0', 'BMW'),
('3', '1', 'MERCEDES'),
('3', '0', 'TOYOTA'),
('3', '1', 'NISSAN'),
('4', '0', 'OLDMOBILE');
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`,
`_person_cars`.`cars[0]`,
`_person_cars`.`cars[1]`
FROM `person` `_person`
LEFT JOIN (
SELECT `_person`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person` `_person`
JOIN `person_cars` `_person_cars` ON `_person`.`_oid` = `_person_cars`.`_oid`
GROUP BY `_person`.`_oid`
) `_person_cars` ON `_person_cars`.`_oid` = `_person`.`_oid`
WHERE `cars[0]` = 'BMW' OR `cars[1]` = 'BMW';
The above SELECT query takes ~170 ms on my virtual machine running MySQL 5.1.53, with approx. 10,000 rows in each of the two tables.
When I EXPLAIN the above query, results differ depending on how many rows are in each table:
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 1 | PRIMARY | _person | ALL | _oid | NULL | NULL | NULL | 4 | Using where; Using join buffer |
| 2 | DERIVED | _person_cars | ALL | _oid | NULL | NULL | NULL | 7 | Using temporary; Using filesort |
| 2 | DERIVED | _person | index | _oid | _oid | 24 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
Some 10,000 rows give the following result:
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6613 | Using where |
| 1 | PRIMARY | _person | ref | _oid | _oid | 24 | _person_cars._oid | 10 | |
| 2 | DERIVED | _person_cars | ALL | _oid | NULL | NULL | NULL | 9913 | Using temporary; Using filesort |
| 2 | DERIVED | _person | ref | _oid | _oid | 24 | test._person_cars._oid | 10 | Using index |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
Things get worse when I leave out the WHERE clause or when I LEFT JOIN another table similar to person_cars.
Does anyone have an idea how to optimize the SELECT query to make things a little faster?
It's slow because this will force three full table scans on persons that then get joined together:
LEFT JOIN (
...
GROUP BY `_person`.`_oid` -- the group by here
) `_person_cars` ...
WHERE ... -- and the where clauses on _person_cars.
Considering the WHERE clauses, the LEFT JOIN is really an INNER JOIN, for one. And you could push the conditions down so they are applied before the join with person occurs. That join is also needlessly applied twice.
This will make it faster, but if you have an ORDER BY/LIMIT clause it will still lead to a full table scan on person (i.e. still not good) because of the GROUP BY in the subquery:
JOIN (
SELECT `_person_cars`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person_cars`
GROUP BY `_person_cars`.`_oid`
HAVING IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) = 'BMW' OR
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) = 'BMW'
) `_person_cars` ... -- smaller number of rows
If you apply an ORDER BY/LIMIT, you'll get better results with two queries, i.e.:
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`
FROM `_person`
JOIN `_person_cars`
ON `_person_cars`.`_oid` = `_person`.`_oid`
AND `_person_cars`.`val` = 'BMW'
GROUP BY -- pre-sort the result before grouping, so as to not do the work twice
`_person`.`lastname`,
`_person`.`firstname`,
-- eliminate users with multiple BMWs
`_person`.`_oid`
ORDER BY `_person`.`lastname`,
`_person`.`firstname`,
`_person`.`_oid`
LIMIT 10
And then select the cars with an IN () clause using the resulting IDs, as sketched below.
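A minimal sketch of that follow-up query; the literal _oid values here are purely illustrative stand-ins for whatever the first query returned:
SELECT _oid, idx, val
FROM person_cars
WHERE _oid IN ('1', '3')
ORDER BY _oid, idx;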
Oh, and your val column should probably be a VARCHAR.
Check This
SELECT
p._oid AS oid,
p.firstname AS firstname,
p.lastname AS lastname,
pc.val AS car1,
pc2.val AS car2
FROM person AS p
LEFT JOIN person_cars AS pc
ON pc._oid = p._oid
AND pc.idx = 0
LEFT JOIN person_cars AS pc2
ON pc2._oid = p._oid
AND pc2.idx = 1
WHERE pc.val = 'BMW'
OR pc2.val = 'BMW'
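A note beyond the original answer: each of the two joins probes person_cars by _oid and idx and then filters on val, so a composite index covering all three columns may help (a sketch; the index name and reusing the 64-byte prefix on the blob column are assumptions):
ALTER TABLE person_cars
  ADD INDEX oid_idx_val (_oid, idx, val(64));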

Query takes too long to complete

I have a table with 900k+ records.
It takes a minute or more to run this query:
SELECT
t.user_id,
SUM(t.direction = "i") AS 'num_in',
SUM(t.direction = "o") AS 'num_out'
FROM tbl_user_reports t
WHERE t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
GROUP BY t.user_id
HAVING t.user_id IS NOT NULL
ORDER BY num_in DESC
LIMIT 10;
Can you tell me how to make this query faster?
-- more info --
structure:
id int(11) unsigned NOT NULL
subscriber varchar(255) NULL
user_id int(11) unsigned NULL
carrier_id int(11) unsigned NOT NULL
pool_id int(11) unsigned NOT NULL
service_id int(11) unsigned NOT NULL
persona_id int(11) unsigned NULL
inbound_id int(11) unsigned NULL
outbound_id int(11) unsigned NULL
bound_time datetime NOT NULL
direction varchar(1) NOT NULL
indexes:
bound_time (bound_time)
FK_tbl_user_reports (persona_id)
FK_tbl_user_reports_message (inbound_id)
FK_tbl_user_reports_service (service_id)
FK_tbl_user_reports_pool (pool_id)
FK_tbl_user_reports_user (user_id)
FK_tbl_user_reports_carrier (carrier_id)
FK_tbl_user_reports_subscriber (subscriber)
FK_tbl_user_reports_outbound (outbound_id)
direction (direction)
You may want to try a compound index on
(bound_time, user_id, direction)
It contains all the fields you need and can be narrowed by the date range very efficiently.
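A sketch of how that index could be added (the index name is made up):
ALTER TABLE tbl_user_reports
  ADD INDEX bound_time_user_direction (bound_time, user_id, direction);
With it, the date range prunes the index first, and user_id and direction are read straight from the index entries without touching the base rows.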
If possible, redesign your report table to take more advantage of your InnoDB clustered primary key index.
Here's a simplified example of what I mean:
5 million rows
32K users
126K records in date range
cold runtime (after mysqld restart) = 0.13 seconds
create table user_reports
(
bound_time datetime not null,
user_id int unsigned not null,
id int unsigned not null,
direction tinyint unsigned not null default 0,
primary key (bound_time, user_id, id) -- clustered composite PK
)
engine=innodb;
select count(*) as counter from user_reports;
+---------+
| counter |
+---------+
| 5000000 |
+---------+
select count(distinct(user_id)) as counter from user_reports;
+---------+
| counter |
+---------+
| 32000 |
+---------+
select count(*) as counter from user_reports
where bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00';
+---------+
| counter |
+---------+
| 126721 |
+---------+
select
t.user_id,
sum(t.direction = 1) AS num_in,
sum(t.direction = 0) AS num_out
from
user_reports t
where
t.bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00' and
t.user_id is not null
group by
t.user_id
order by
direction desc
limit 10;
+---------+--------+---------+
| user_id | num_in | num_out |
+---------+--------+---------+
| 17397 | 1 | 1 |
| 14729 | 2 | 1 |
| 20094 | 4 | 1 |
| 19343 | 7 | 1 |
| 24804 | 1 | 2 |
| 14714 | 3 | 2 |
| 2662 | 4 | 3 |
| 16360 | 2 | 3 |
| 21288 | 2 | 3 |
| 12800 | 6 | 2 |
+---------+--------+---------+
10 rows in set (0.13 sec)
explain
select
t.user_id,
sum(t.direction = 1) AS num_in,
sum(t.direction = 0) AS num_out
from
user_reports t
where
t.bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00' and
t.user_id is not null
group by
t.user_id
order by
direction desc
limit 10;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref |rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY | PRIMARY | 8 | NULL |255270 | Using where; Using temporary; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
hope you find this helpful :)
As Thilo said, add indexes. Also, instead of tbl_user_reports t, write tbl_user_reports AS t. And I would move the HAVING condition into WHERE to reduce the amount of work:
WHERE t.user_id IS NOT NULL AND t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
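Putting it together, the rewritten query would read as follows (the same logic as the original, with the filter moved out of HAVING):
SELECT
    t.user_id,
    SUM(t.direction = 'i') AS num_in,
    SUM(t.direction = 'o') AS num_out
FROM tbl_user_reports AS t
WHERE t.user_id IS NOT NULL
  AND t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
GROUP BY t.user_id
ORDER BY num_in DESC
LIMIT 10;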
UPDATE
For experimental purposes, you can try LIKE instead of BETWEEN:
t.bound_time LIKE '2011-02%'