Using MySQL key/index for query performance improvement

I want all the promotions that were active between two specific dates. When a promotion is activated, createdOnDt is set to now() and updatedOnDt remains null. When a promotion is deactivated, updatedOnDt is set to now(). Here is the table structure:
CREATE TABLE promotions
(id bigint NOT NULL AUTO_INCREMENT,
promotionId bigint NOT NULL,
promotionName varchar(255) CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci DEFAULT NULL,
clientId int NOT NULL,
subClientId int DEFAULT NULL,
createdOnDt datetime DEFAULT NULL,
updatedOnDt datetime DEFAULT NULL,
PRIMARY KEY (id),
KEY clientId_subClientId (clientId,subClientId)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1
Now, active promotions between two dates can be found using:
select p.*
from promotions p
where p.createdOnDt >= (:fromDate)
and p.createdOnDt <= (:toDate)
and p.clientId = (:clientId)
For this query, the EXPLAIN output is as follows:
+----+-------------+-------+------------+------+-------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------------+-------------------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | p | NULL | ref | clientId_subClientId | clientId_subClientId | 4 | const | 2 | 50.00 | Using where |
+----+-------------+-------+------------+------+-------------------------------+-------------------------------+---------+-------+------+----------+-------------+
But when I also want to fetch currently running promotions, using the following query:
select p.*
from promotions p
where p.createdOnDt >= (:fromDate)
and p.createdOnDt <= (:toDate)
or (p.createdOnDt <= (:fromDate)
and p.updatedOnDt is null
)
and p.clientId = (1957)
For this query, the EXPLAIN output is:
+----+-------------+-------+------------+------+-------------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | p | NULL | ALL | clientId_subClientId | NULL | NULL | NULL | 2 | 50.00 | Using where |
+----+-------------+-------+------------+------+-------------------------------+------+---------+------+------+----------+-------------+
This is a full table scan.
I want to know:
1. Can we get the required data without the updatedOnDt column?
2. If updatedOnDt is necessary, how can the performance of the query be improved?
3. If, say, createdOnDt is 2nd Aug and updatedOnDt is 6th Aug for promotion X, then a search between 3rd Aug and 5th Aug should return promotion X, because it was active during the selected date range.
I tried adding a key on createdOnDt as well as on updatedOnDt, but there was still no performance improvement.

The first query needs INDEX(clientId, createdOnDt).
The second query does not test both sides of the OR against clientId. This smells like a flaw.
In the second query, OR is the performance killer...
select p.*
from promotions p
where p.createdOnDt >= (:fromDate)
and p.createdOnDt <= (:toDate)
and p.clientId = (1957)
UNION ALL
select p.*
from promotions p
where p.createdOnDt <= (:fromDate)
and p.updatedOnDt is null
and p.clientId = (1957)
The previous index, plus this, are needed:
INDEX(clientId, updatedOnDt, createdOnDt)
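In DDL form, the two recommended indexes might look like this (a sketch; the index names are placeholders):
ALTER TABLE promotions
ADD INDEX client_created (clientId, createdOnDt),
ADD INDEX client_updated_created (clientId, updatedOnDt, createdOnDt);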
Be aware that :toDate should end with 23:59:59, else the query will not include the last "day".
Also note that p.createdOnDt = (:fromDate) shows up in both sides of the UNION ALL. This may not be what you wanted.
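A row whose createdOnDt equals :fromDate (with updatedOnDt null) would be returned twice; changing the second SELECT to a strict p.createdOnDt < (:fromDate) would deduplicate it (my adjustment, not part of the original answer). As for the third requirement, returning promotions whose active interval merely overlaps the search range, the usual interval-overlap predicate is a sketch like:
select p.*
from promotions p
where p.clientId = (:clientId)
and p.createdOnDt <= (:toDate)
and (p.updatedOnDt is null or p.updatedOnDt >= (:fromDate))
With this, a promotion created on 2nd Aug and deactivated on 6th Aug is returned for a 3rd-5th Aug search, because it was created before the range ended and was not deactivated before it began. Since the OR on updatedOnDt defeats a single index range scan, splitting this into a UNION ALL along the same lines as above may still be needed for speed.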

Related

MySQL 8 is not using INDEX when subquery has a group column

We have just moved from MariaDB 5.5 to MySQL 8, and some of the update queries have suddenly become slow. On further investigation, we found that MySQL 8 does not use an index when the subquery has a grouped column.
For example, below is a sample database. Table users maintains the current balance of the users per type, and table accounts maintains the total balance history per day.
CREATE DATABASE test;
CREATE TABLE `users` (
`uid` int(10) unsigned NOT NULL DEFAULT '0',
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`type` int(10) unsigned NOT NULL DEFAULT '0',
KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `accounts` (
`uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`day` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`uid`),
KEY `day` (`day`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Below is the EXPLAIN for the query to update accounts:
mysql> explain update accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users) r
on r.uid=a.uid and r.day=a.day set a.balance=r.balance;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | UPDATE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
As you can see, MySQL is not using the index.
On further investigation, I found that if I remove sum() from the subquery, it starts using the index. That was not the case with MariaDB 5.5, which correctly used the index in all cases.
Below are two select queries, with and without sum(). I've used select queries to cross-check against MariaDB 5.5, since 5.5 does not support EXPLAIN for update queries.
mysql> explain select * from accounts a inner join (
select uid, balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| 1 | SIMPLE | a | NULL | ref | PRIMARY,day | day | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.a.uid | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
and with sum()
mysql> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
Below is the output from MariaDB 5.5:
MariaDB [test]> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| 1 | PRIMARY | a | ALL | PRIMARY,day | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 10 | test.a.uid,test.a.day | 2 | Using where |
| 2 | DERIVED | users | ALL | NULL | NULL | NULL | NULL | 1 | |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
3 rows in set (0.00 sec)
Any idea what are we doing wrong?
As others have commented, break your update query apart...
update accounts join
then your query
on condition of the join.
Your inner select query of
select uid, sum(balance) balance, day(current_date()) day from users
is the only thing that is running: it gets some uid, the sum of ALL balances, and the current day. You never know which user is getting updated, let alone the correct amount. Start by getting the query to return your expected results per user ID. The context is a little odd in that your users table has a "uid" but no primary key, implying there are multiple records for the same "uid". To me, accounts implies something like: I am a bank representative and sign up multiple user accounts, so my active portfolio of client balances on a given day is the sum from the users table.
Having said that, let's look at getting that answer:
select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid
This will show you, per user, what their total balance is as of right now. The GROUP BY now gives you the "ID" key to tie back to the accounts table. In MySQL, the syntax of a correlated update for this scenario would be as follows (I am using the above query and giving it the alias "PQ", for PreQuery, in the join):
update accounts a
JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set a.balance = PQ.allUserBalance,
a.day = day( current_date())
Now, the above will not give a proper answer if you have accounts that no longer have user entries associated with them, such as when all the users leave. For whatever accounts have no users, the balance and day record will be as of some prior day. To fix this, you could do a LEFT JOIN such as:
update accounts a
LEFT JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set a.balance = coalesce( PQ.allUserBalance, 0 ),
a.day = day( current_date())
With the LEFT JOIN and COALESCE(), if there is no summation record in the users table, it will set the account balance to zero.
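Putting this together with the original day condition, the asker's update could be restructured like the following sketch (my restructuring; it assumes the intent is to update only the current day's accounts row per uid):
update accounts a
JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
on a.uid = PQ.uid
and a.day = day( current_date() )
set a.balance = PQ.allUserBalance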

MySql Record Matching Criteria With Latest Date

I have a MySQL table where all status changes are recorded. I want to be able to query the status of all items on a specific date, or the last date for all items. The table I have now is:
CREATE TABLE `tra_rel_sta` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tra_id` int(11) DEFAULT NULL,
`sta_id` int(11) DEFAULT NULL,
`changed_on` datetime DEFAULT NULL,
`changed_by` int(11) DEFAULT NULL,
`comments` text,
PRIMARY KEY (`id`),
KEY `tra_id` (`tra_id`),
KEY `rel` (`tra_id`,`sta_id`,`changed_on`),
KEY `sta_id` (`sta_id`),
KEY `changed_on` (`changed_on`),
KEY `tra_changed` (`tra_id`,`changed_on`)
) ENGINE=InnoDB AUTO_INCREMENT=51734 DEFAULT CHARSET=utf8;
(I know I'm probably overdoing the indexes, but I haven't exactly figured out how to optimize indexes yet).
The query I'm using now, which works is:
SELECT rel.changed_on, rel.changed_by, rel.tra_id, sta.id AS sta_id, sta.status, sta.description, sta.onHold, sta.awaitingApproval, sta.approved, sta.complete, sta.locked
FROM (
SELECT tra_id, MAX(changed_on) AS lst
FROM tra_rel_sta
GROUP BY tra_id
) AS rec
LEFT JOIN tra_rel_sta AS rel ON rel.changed_on = rec.lst AND rel.tra_id = rec.tra_id
LEFT JOIN tra_status AS sta ON sta.id = rel.sta_id
If I want to use a specific date, I insert a WHERE statement in the sub-query.
This works, but it takes about 0.65 seconds to run in PHP with about 51,733 records in the table. This query is used as a subquery in several others when I need to know the last status of an object and, as a result, is slowing down many parts of the application.
I've tried using a subquery in the WHERE clause as described in MySQL: how to select record with latest date before a certain date, but it takes almost twice as long. I've tried using a JOIN as described in MySQL select of record with latest date, but I'm getting about the same or just slightly slower results.
How can I optimize this query or fix my indexes to make this more effective?
Thanks!!
As requested, EXPLAIN of query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---|-------------|-------------|--------|-----------------------------------|---------|---------|-------------------|-------|-------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 49931 | NULL
1 | PRIMARY | rel | ref | tra_id,rel,changed_on,tra_changed | tra_id | 5 | rec.tra_id | 1 | Using where
1 | PRIMARY | sta | eq_ref | PRIMARY | PRIMARY | 4 | csinfo.rel.sta_id | 1 | NULL
2 | DERIVED | tra_rel_sta | index | tra_id,rel,tra_changed | tra_id | 5 | NULL | 49931 | NULL
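For reference, the JOIN-based "latest row" approach mentioned in the question is usually written like the following sketch (my illustration; column names come from the table definition above; it keeps, per tra_id, the row with no later status change):
SELECT rel.*
FROM tra_rel_sta AS rel
LEFT JOIN tra_rel_sta AS later
ON later.tra_id = rel.tra_id
AND later.changed_on > rel.changed_on
WHERE later.id IS NULL;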

How to improve MySQL "fill the gaps" query

I have a table with currency exchange rates that I fill with data published by the ECB. That data contains gaps in the date dimension, e.g. holidays.
CREATE TABLE `imp_exchangerate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rate_date` date NOT NULL,
`currency` char(3) NOT NULL,
`rate` decimal(14,6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `rate_date` (`rate_date`,`currency`),
KEY `imp_exchangerate_by_currency` (`currency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I also have a date dimension, as you'd expect in a data warehouse:
CREATE TABLE `d_date` (
`date_id` int(11) NOT NULL,
`full_date` date DEFAULT NULL,
-- etc.
PRIMARY KEY (`date_id`),
KEY `full_date` (`full_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Now I try to fill the gaps in the exchange rates like this:
SELECT
d.full_date,
currency,
(SELECT rate FROM imp_exchangerate
WHERE rate_date <= d.full_date AND currency = c.currency
ORDER BY rate_date DESC LIMIT 1) AS rate
FROM
d_date d,
(SELECT DISTINCT currency FROM imp_exchangerate) c
WHERE
d.full_date >=
(SELECT min(rate_date) FROM imp_exchangerate
WHERE currency = c.currency) AND
d.full_date <= curdate()
Explain says:
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 201 | |
| 1 | PRIMARY | d | range | full_date | full_date | 4 | NULL | 6047 | Using where; Using index; Using join buffer (flat, BNL join) |
| 4 | DEPENDENT SUBQUERY | imp_exchangerate | ref | imp_exchangerate_by_currency | imp_exchangerate_by_currency | 3 | c.currency | 664 | |
| 3 | DERIVED | imp_exchangerate | range | NULL | imp_exchangerate_by_currency | 3 | NULL | 201 | Using index for group-by |
| 2 | DEPENDENT SUBQUERY | imp_exchangerate | index | rate_date,imp_exchangerate_by_currency | rate_date | 6 | NULL | 1 | Using where |
+------+--------------------+------------------+-------+----------------------------------------+------------------------------+---------+------------+------+--------------------------------------------------------------+
MySQL needs multiple hours to execute that query. Any ideas how to improve that? I have tried an index on rate without any noticeable impact.
I have had a solution for a while now: get rid of the dependent subqueries. I had to think from different angles in multiple places, and here is the result:
SELECT
cd.date_id,
x.currency,
x.rate
FROM
imp_exchangerate x INNER JOIN
(SELECT
d.date_id,
max(rate_date) as rate_date,
currency
FROM
d_date d INNER JOIN
imp_exchangerate ON rate_date <= d.full_date
WHERE
d.full_date <= curdate()
GROUP BY
d.date_id,
currency) cd ON x.rate_date = cd.rate_date and x.currency = cd.currency
This query now finishes in less than 10 minutes, compared to multiple hours for the original query.
Lesson learned: avoid dependent subqueries in MySQL like the plague!
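A further thought, my assumption rather than part of the solution above: both the final join back to imp_exchangerate (on currency and rate_date) and the original per-currency lookups (currency = ... AND rate_date <= ...) fit a composite index that leads with currency:
ALTER TABLE imp_exchangerate
ADD KEY currency_rate_date (currency, rate_date);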

Need help optimizing outer join SQL query

I am hoping to get some advice on how to optimize the performance of this query I have with an outer join. First I will explain what I am trying to do and then I'll show the code and results.
I have an Accounts table that has a list of all customer accounts. And I have a datausage table which keeps track of how much data each customer is using. A backend process running on multiple servers inserts records into the datausage table each day to keep track of how much usage occurred that day for each customer on that server.
The backend process works like this - if there is no activity on that server for an account on that day, no records are written for that account. If there is activity, one record is written with a "LogDate" of that day. This is happening on multiple servers. So collectively the datausage table winds up with no rows (no activity at all for that customer each day), one row (activity was only on one server for that day), or multiple rows (activity was on multiple servers for that day).
We need to run a report that lists ALL customers, along with their usage for a specific date range. Some customers may have no usage at all (nothing whatsoever in the datausage table). Some customers may have no usage at all for the current period (but usage in other periods).
Regardless of whether there is any usage or not (ever, or for the selected period), we need EVERY customer in the Accounts table to be listed in the report, even if they show no usage. Therefore it seems this requires an outer join.
Here is the query I am using:
SELECT
Accounts.accountID as AccountID,
IFNULL(Accounts.name,Accounts.accountID) as AccountName,
AccountPlans.plantype as AccountType,
Accounts.status as AccountStatus,
date(Accounts.created_at) as Created,
sum(IFNULL(datausage.Core,0) + (IFNULL(datausage.CoreDeluxe,0) * 3)) as 'CoreData'
FROM `Accounts`
LEFT JOIN `datausage` on `Accounts`.`accountID` = `datausage`.`accountID`
LEFT JOIN `AccountPlans` on `AccountPlans`.`PlanID` = `Accounts`.`PlanID`
WHERE
(
(`datausage`.`LogDate` >= '2014-06-01' and `datausage`.`LogDate` < '2014-07-01')
or `datausage`.`LogDate` is null
)
GROUP BY Accounts.accountID
ORDER BY `AccountName` asc
This query takes about 2 seconds to run. However, it only takes 0.3 seconds if the "or datausage.LogDate is null" clause is removed. It seems I must have that clause in there, because accounts with no usage are excluded from the result set without it.
Here is the EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------------------+---------+---------+----------------------+------- +----------------------------------------------------+
| 1 | SIMPLE | Accounts | ALL | PRIMARY,accounts_planid_foreign,accounts_cardid_foreign | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | SIMPLE | datausage | ALL | NULL | NULL | NULL | NULL | 96805 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | AccountPlans | eq_ref | PRIMARY | PRIMARY | 4 | mydb.Accounts.planID | 1 | NULL |
The indexes on Accounts table are as follows:
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Accounts | 0 | PRIMARY | 1 | accountID | A | 57 | NULL | NULL | | BTREE | | |
| Accounts | 1 | accounts_planid_foreign | 1 | planID | A | 5 | NULL | NULL | | BTREE | | |
| Accounts | 1 | accounts_cardid_foreign | 1 | cardID | A | 0 | NULL | NULL | YES | BTREE | | |
The index on the datausage table is as follows:
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| datausage | 0 | PRIMARY | 1 | UsageID | A | 96805 | NULL | NULL | | BTREE | | |
I tried creating different indexes on datausage to see if it would help, but nothing did. I tried an index on AccountID, an index on (AccountID, LogDate), an index on (LogDate, AccountID), and an index on LogDate. None of these made any difference.
I also tried using a UNION ALL, with one query using the LogDate range and the other query using just WHERE LogDate IS NULL, but the result was about the same (actually a bit worse).
Can someone please help me understand what may be going on and the ways in which I can optimize the query execution time? Thank you!!
UPDATE: At Philipxy's request, here are the table definitions. Note that I removed some columns and constraints that are not related to this query to help keep things as tight and clean as possible.
CREATE TABLE `Accounts` (
`accountID` varchar(25) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`status` int(11) NOT NULL,
`planID` int(10) unsigned NOT NULL DEFAULT '1',
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`accountID`),
KEY `accounts_planid_foreign` (`planID`),
KEY `acctname_id_ndx` (`name`,`accountID`),
CONSTRAINT `accounts_planid_foreign` FOREIGN KEY (`planID`) REFERENCES `AccountPlans` (`planID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `datausage` (
`UsageID` int(11) NOT NULL AUTO_INCREMENT,
`Core` int(11) DEFAULT NULL,
`CoreDeluxe` int(11) DEFAULT NULL,
`AccountID` varchar(25) DEFAULT NULL,
`LogDate` date DEFAULT NULL,
PRIMARY KEY (`UsageID`),
KEY `acctusage` (`AccountID`,`LogDate`)
) ENGINE=MyISAM AUTO_INCREMENT=104303 DEFAULT CHARSET=latin1
CREATE TABLE `AccountPlans` (
`planID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`params` text COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`plantype` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`planID`),
KEY `acctplans_id_type_ndx` (`planID`,`plantype`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
First, you can simplify the query by moving the where clause to the on clause:
SELECT a.accountID as AccountID, coalesce(a.name, a.accountID) as AccountName,
ap.plantype as AccountType, a.status as AccountStatus,
date(a.created_at) as Created,
sum(coalesce(du.Core, 0) + (coalesce(du.CoreDeluxe, 0) * 3)) as CoreData
FROM Accounts a LEFT JOIN
datausage du
on a.accountID = du.`accountID` AND
du.`LogDate` >= '2014-06-01' and du.`LogDate` < '2014-07-01'
LEFT JOIN
AccountPlans ap
on ap.`PlanID` = a.`PlanID`
GROUP BY a.accountID
ORDER BY AccountName asc ;
(I also introduced table aliases to make the query easier to read.)
This version should make better use of indexes because it eliminates the or in the where clause. However, it still won't use an index for the outer sort. The following might be better:
SELECT a.accountID as AccountID, coalesce(a.name, a.accountID) as AccountName,
ap.plantype as AccountType, a.status as AccountStatus,
date(a.created_at) as Created,
sum(coalesce(du.Core, 0) + (coalesce(du.CoreDeluxe, 0) * 3)) as CoreData
FROM Accounts a LEFT JOIN
datausage du
on a.accountID = du.`accountID` AND
du.LogDate >= '2014-06-01' and du.LogDate < '2014-07-01'
LEFT JOIN
AccountPlans ap
on ap.PlanID = a.PlanID
GROUP BY a.accountID
ORDER BY a.name, a.accountID ;
For this, I would recommend the following indexes:
Accounts(name, AccountId)
Datausage(AccountId, LogDate)
AccountPlans(PlanId, PlanType)
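In DDL form, these might look like the following sketch (key names are placeholders; the table definitions above already show equivalent keys such as acctusage and acctname_id_ndx):
ALTER TABLE Accounts ADD KEY name_acct_ndx (name, accountID);
ALTER TABLE datausage ADD KEY acct_logdate_ndx (AccountID, LogDate);
ALTER TABLE AccountPlans ADD KEY plan_type_ndx (planID, plantype);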
When you left join with datausage you should restrict the output as much as possible right there. (JOIN means AND means WHERE means ON. Put the conditions in essentially whatever order will be clear and/or optimize when necessary.) The result will be a null-extended row when there was no usage; you want to leave that row in.
When you join with AccountPlans you don't want to introduce null rows (which can't happen anyway) so that's just an inner join.
The version below has the AccountPlans join as an inner join, placed first. The (indexed) Accounts FK PlanID to AccountPlans means the DBMS knows the inner join will only ever generate one row per Accounts PK, so the output has key AccountID. Each such row can be immediately left joined to datausage. (An index on its AccountID should help, e.g. for a merge join.) The other way around, there is no PlanID key/index on the outer join result to join with AccountPlans.
SELECT
a.accountID as AccountID,
IFNULL(a.name,a.accountID) as AccountName,
ap.plantype as AccountType,
a.status as AccountStatus,
date(a.created_at) as Created,
sum(IFNULL(du.Core,0) + (IFNULL(du.CoreDeluxe,0) * 3)) as CoreData
FROM Accounts a
JOIN AccountPlans ap ON ap.PlanID = a.PlanID
LEFT JOIN datausage du ON a.accountID = du.accountID AND du.LogDate >= '2014-06-01' AND du.LogDate < '2014-07-01'
GROUP BY a.accountID

MySQL & nested set: slow JOIN (not using index)

I have two tables:
localities:
CREATE TABLE `localities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`type` varchar(30) NOT NULL,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_localities_on_parent_id_and_type` (`parent_id`,`type`),
KEY `index_localities_on_name` (`name`),
KEY `index_localities_on_lft_and_rgt` (`lft`,`rgt`)
) ENGINE=InnoDB;
locatings:
CREATE TABLE `locatings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`localizable_id` int(11) DEFAULT NULL,
`localizable_type` varchar(255) DEFAULT NULL,
`locality_id` int(11) NOT NULL,
`category` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_locatings_on_locality_id` (`locality_id`),
KEY `localizable_and_category_index` (`localizable_type`,`localizable_id`,`category`),
KEY `index_locatings_on_category` (`category`)
) ENGINE=InnoDB;
The localities table is implemented as a nested set.
Now, when a user belongs to some locality (through some locating), he also belongs to all of its ancestors (higher-level localities). I need a query that will select, into a view, all the localities that all the users belong to.
Here is my try:
select distinct lca.*, lt.localizable_type, lt.localizable_id
from locatings lt
join localities lc on lc.id = lt.locality_id
left join localities lca on (lca.lft <= lc.lft and lca.rgt >= lc.rgt)
The problem here is that it takes way too much time to execute.
I consulted EXPLAIN:
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| 1 | SIMPLE | lt | ALL | index_locatings_on_locality_id | NULL | NULL | NULL | 4926 | 100.00 | Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY | PRIMARY | 4 | bzzik_development.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11439 | 100.00 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
3 rows in set, 1 warning (0.00 sec)
The last join obviously doesn't use the (lft, rgt) index as I expect it to. I'm desperate.
UPDATE:
After adding a condition as @cairnz suggested, the query still takes too much time to process.
UPDATE 2: Column names instead of the asterisk
Updated query:
SELECT DISTINCT lca.id, lt.`localizable_id`, lt.`localizable_type`
FROM locatings lt FORCE INDEX(index_locatings_on_category)
JOIN localities lc
ON lc.id = lt.locality_id
INNER JOIN localities lca
ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt
WHERE lt.`category` != "Unknown";
Updated EXPLAIN:
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| 1 | SIMPLE | lt | range | index_locatings_on_category | index_locatings_on_category | 153 | NULL | 2545 | 100.00 | Using where; Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY,index_localities_on_lft_and_rgt | PRIMARY | 4 | bzzik_production.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11570 | 100.00 | Range checked for each record (index map: 0x10) |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
Any help appreciated.
Ah, it just occurred to me.
Since you are asking for everything in the table, MySQL decides to use a full table scan instead, as it deems that more efficient.
In order to get some key usage, add some filters so it does not have to look at every row in all the tables anyway.
Updating Answer:
Your second query does not make sense. You are left joining to lca, yet you have a filter on it; that by itself negates the left join. Also, you are looking for data in the last step of the query, meaning you will have to look through all of lt, lc, and lca in order to find your data. And you have no index with left-most column 'type' on locatings, so you still need a full table scan to find your data.
If you had some sample data and an example of what you are trying to achieve, it would perhaps be easier to help.
Try experimenting with forcing an index (http://dev.mysql.com/doc/refman/5.1/en/index-hints.html); maybe it's just an optimizer issue.
It looks like you want the parents of a single result.
According to the person credited with defining Nested Sets in SQL, Joe Celko, at http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html: "This model is a natural way to show a parts explosion, because a final assembly is made of physically nested assemblies that break down into separate parts."
In other words, Nested Sets are used to filter children efficiently to an arbitrary number of independent levels within a single collection. You have two tables, but I don't see why the properties of the set "locatings" couldn't be de-normalized into "localities".
If the localities table had a geometry column, could I not find the one locality from a "locating" and then select on the one table using a single filter: parent.lft <= row.lft AND parent.rgt >= row.rgt?
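In SQL, that single-table ancestor filter might look like the following sketch (the child id 42 is hypothetical; the column names follow the localities table above):
SELECT parent.*
FROM localities AS child
JOIN localities AS parent
ON parent.lft <= child.lft
AND parent.rgt >= child.rgt
WHERE child.id = 42;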
UPDATED
In this answer https://stackoverflow.com/a/1743952/3018894, there is an example from http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/ where the following query gets all the ancestors of a given node (id 1000000 here), to arbitrary depth:
SELECT hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM (
    SELECT @r AS _id,
           @level := @level + 1 AS level,
           (
           SELECT @r := NULLIF(parent, 0)
           FROM t_hierarchy hn
           WHERE id = _id
           )
    FROM (
           SELECT @r := 1000000,
                  @level := 0
         ) vars,
         t_hierarchy hc
    WHERE @r IS NOT NULL
) hc
JOIN t_hierarchy hp
ON hp.id = hc._id
ORDER BY
level DESC
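A side note, not from the original answer: this user-variable technique relies on an evaluation order that MySQL does not guarantee, and assigning user variables inside SELECT is deprecated as of MySQL 8.0. On MySQL 8.0+ or MariaDB 10.2+, the same ancestor walk can be written as a recursive CTE (a sketch against the same t_hierarchy table):
WITH RECURSIVE ancestors AS (
  SELECT id, parent, lft, rgt, data
  FROM t_hierarchy
  WHERE id = 1000000          -- the starting node
  UNION ALL
  SELECT h.id, h.parent, h.lft, h.rgt, h.data
  FROM t_hierarchy h
  JOIN ancestors a ON h.id = a.parent   -- step up to each parent
)
SELECT * FROM ancestors;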