I have a legacy query that is terribly slow. I'll show the query first, then the background.
The query takes ~10 s, which is ridiculously slow. The EXPLAIN output appears below, after the table structures.
Query:
select staff.id as Id,
staff.eid as AccountId,
staff.Surname
from staff
LEFT JOIN app_roles ON (app_roles.app_staff_id = staff.id )
where staff.eid = 7227
AND app_roles.application_id = '1'
and staff.last_modified > '2022-05-11 13:15:21Z'
Staff table contains 280k rows; app_roles contains 644k rows. Staff rows with eid 7227: 87 rows. app_roles rows for those matching staff ids: 75 rows.
Table structures:
CREATE TABLE `app_roles` (
`application_id` varchar(40) NOT NULL,
`app_staff_id` varchar(40) NOT NULL,
`role` varchar(40) NOT NULL,
PRIMARY KEY (`application_id`,`app_staff_id`),
KEY `application_id` (`application_id`),
KEY `app_staff_id` (`app_staff_id`)
) ENGINE=InnoDB
CREATE TABLE `staff` (
`eid` int NOT NULL,
`id` varchar(40) NOT NULL,
`forename` varchar(60) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`surname` varchar(150) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`last_modified` DATETIME NOT NULL,
... columns omitted for simplicity
PRIMARY KEY (`eid`,`id`),
KEY `email` (`email`),
KEY `app_login` (`app_login`),
KEY `app_passwd` (`app_password`),
KEY `id` (`id`),
KEY `eid` (`eid`)
) ENGINE=InnoDB
+----+-------------+-----------+------------+--------+-------------------------------------+----------------+---------+---------------------------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+--------+-------------------------------------+----------------+---------+---------------------------------------+--------+----------+--------------------------+
| 1 | SIMPLE | app_roles | NULL | ref | PRIMARY,application_id,app_staff_id | application_id | 42 | const | 330114 | 100.00 | Using where; Using index |
| 1 | SIMPLE | staff | NULL | eq_ref | PRIMARY,id,eid | PRIMARY | 126 | const,inventry.app_roles.app_staff_id | 1 | 33.33 | Using where |
+----+-------------+-----------+------------+--------+-------------------------------------+----------------+---------+---------------------------------------+--------+----------+--------------------------+
I don't understand why the left join and the where are not filtering rows out, and why the indexes are not helping.
All other things being equal, MySQL likes to do joins by primary key lookup. It has a strong preference for that, because primary key lookups are a bit more efficient than secondary key lookups.
It may even change the order of the join to satisfy this preference. Inner join is commutative, so the optimizer can access either table first and then join to the other.
But you used a LEFT [OUTER] JOIN, so how can this be optimized like an inner join? You wrote a condition app_roles.application_id = '1' in the WHERE clause. If you test for a non-NULL value on the right table of a left outer join, it eliminates any of the rows that would make that join an outer join. It's effectively an inner join. Therefore the optimizer is free to reorder the tables in the join.
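In other words, the query could be rewritten as an inner join without changing its results (a sketch, keeping the original predicates verbatim):

select staff.id as Id,
       staff.eid as AccountId,
       staff.Surname
from staff
INNER JOIN app_roles ON (app_roles.app_staff_id = staff.id)
where staff.eid = 7227
AND app_roles.application_id = '1'
and staff.last_modified > '2022-05-11 13:15:21Z'

This is exactly the freedom the optimizer exploits: once the WHERE clause makes the outer join equivalent to an inner join, either table may be visited first.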
Both orders of join result in the join using primary key lookups. In both cases, the first column of the lookup is based on a constant condition in your query. The second column of the lookup is a reference from the first table.
So the optimizer has a dilemma. It can choose either join order, and both satisfy the preference for a primary key lookup. So it chooses one arbitrarily.
The failure is that it apparently didn't take into account that the condition on application_id causes it to examine over 330k rows. Either the optimizer has a blindness to this cost, or else the table statistics are not up to date and are fooling the optimizer.
You can refresh the table statistics. This is easy to do and has very small impact on the running system, so you might as well do it to rule out the possibility that bad statistics are causing a bad query optimization.
ANALYZE TABLE app_roles;
ANALYZE TABLE staff;
Then try your query again.
If it's still choosing a bad optimization strategy, you can use a join hint to force it to use the join order matching what you wrote in your query.
select id as Id,
eid as AccountId,
Surname
from staff
STRAIGHT_JOIN app_roles ON (app_roles.app_staff_id = staff.id )
where staff.eid = 7227
AND app_roles.application_id = '1'
and last_modified > '2022-05-11 13:15:21Z'
There might also be a way to incorporate last_modified into an index, but I can't tell which table it belongs to.
I would assume you have an issue with the character set / collation. Make sure the fields you are joining match. To verify this, run:
SHOW FULL COLUMNS FROM staff;
SHOW FULL COLUMNS FROM app_roles;
More specifically, make sure app_roles.app_staff_id and staff.id are the same type.
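If they differ, aligning them lets the join use the index. This is only a sketch; it assumes staff.id uses utf8/utf8_general_ci (check the SHOW FULL COLUMNS output and substitute the actual charset and collation):

ALTER TABLE app_roles
MODIFY app_staff_id varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

A join between columns with different collations forces a conversion that prevents index use on one side.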
These 'composite' and 'covering' indexes should help:
staff: INDEX(eid, last_modified, id, Surname)
app_roles: INDEX(application_id, app_staff_id)
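As ALTER statements (the index names here are arbitrary):

ALTER TABLE staff ADD INDEX staff_eid_lm (eid, last_modified, id, Surname);
ALTER TABLE app_roles ADD INDEX app_id_staff (application_id, app_staff_id);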
Get rid of the Z on the DATETIME literal; MySQL does not understand such.
Related
I am new to SQL (using mySQL Community Workbench) and not sure where to begin with this problem.
Here is the overview: I have two tables in my food database: branded_food and food_nutrient
The important columns in branded_food are fdc_id and kcals.
The important columns in food_nutrient are fdc_id, nutrient_id, and value
branded_food's fdc_id column joins to food_nutrient's fdc_id column. However, joining on fdc_id alone returns every nutrient for the food, when I only want the value entry for nutrient id 208.
Here is an example:
branded_food looks like:
fdc_id | kcals
-----------------
123 | (Empty)
456 | (Empty)
... | (Empty)
food_nutrient looks like:
fdc_id | nutrient_id | value
----------------------------
123 | 203 | 23
123 | 204 | 25
123 | ... | ...
123 | 208 | 500
Essentially, I would like to write some sort of loop that goes through each fdc_id in branded_food, finds the row in food_nutrient whose fdc_id equals the looped value (and whose nutrient_id is 208), and then populates kcals in that branded_food row. Thus the first example row should populate like:
fdc_id | kcals
-----------------
123 | 500
As an update, I have looked at INNER JOIN and have created this:
SELECT food_nutrient.amount,food_branded_food.description, food_branded_food.fdc_id
FROM food_nutrient
INNER JOIN food_branded_food ON food_nutrient.fdc_id = food_branded_food.fdc_id
WHERE food_nutrient.nutrient_id = 208
LIMIT 1;
This will correctly display the kcals of the food_branded_food.description (the name of the food) that has fdc_id of food_branded_food.fdc_id. I limit to 1 because the query takes very long. Is there a better way?
Update #2: Here is something I recently tried, but just spins forever:
UPDATE backup_branded_food bf
INNER JOIN (
SELECT food_nutrient.fdc_id,food_nutrient.amount amt FROM food_nutrient WHERE food_nutrient.nutrient_id = 208
) mn ON bf.fdc_id = mn.fdc_id
SET bf.kcals = mn.amt
WHERE bf.kcals IS NULL;
Running explain:
And SHOW CREATE TABLE food_nutrient
| food_nutrient | CREATE TABLE `food_nutrient` (
`id` bigint DEFAULT NULL,
`fdc_id` bigint DEFAULT NULL,
`nutrient_id` bigint DEFAULT NULL,
`amount` bigint DEFAULT NULL,
`data_points` bigint DEFAULT NULL,
`derivation_id` bigint DEFAULT NULL,
`min` double DEFAULT NULL,
`max` double DEFAULT NULL,
`median` double DEFAULT NULL,
`loq` text,
`footnote` text,
`min_year_acquired` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
Running SHOW CREATE TABLE backup_branded_food (I use a backup of branded food instead of the actual table)
| backup_branded_food | CREATE TABLE `backup_branded_food` (
`fdc_id` bigint DEFAULT NULL,
`data_type` text,
`description` text,
`food_category_id` bigint DEFAULT NULL,
`publication_date` text,
`brand_owner` varchar(255) DEFAULT NULL,
`brand_name` varchar(255) DEFAULT NULL,
`serving_size` double DEFAULT NULL,
`serving_size_unit` varchar(50) DEFAULT NULL,
`kcals` double DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
Table Indexes:
The table structure info obtained from SHOW CREATE TABLE table_name shows that both tables lack indexes and a primary key. This is probably why your query runs very slowly. To quickly fix this, start by adding indexes on the columns that appear in WHERE and in ON (in the JOIN):
ALTER TABLE food_nutrient
ADD INDEX fdc_id(fdc_id),
ADD INDEX nutrient_id(nutrient_id);
ALTER TABLE branded_food
ADD INDEX fdc_id(fdc_id);
With these indexes added, the EXPLAIN shows the following:
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys      | key         | key_len | ref                   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
|  1 | SIMPLE      | fn    | NULL       | ref  | fdc_id,nutrient_id | nutrient_id | 9       | const                 |    1 |   100.00 | Using where |
|  1 | SIMPLE      | bf    | NULL       | ref  | fdc_id             | fdc_id      | 9       | db_40606077.fn.fdc_id |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+------+--------------------+-------------+---------+-----------------------+------+----------+-------------+
Since I don't know the size of the tables, I can't really test how quick the query will be after adding these indexes, but I expect this will improve the query speed significantly.
P.S.: Normally you would have at least one column assigned as PRIMARY KEY, which never has duplicates. In your table food_nutrient there's an id column that might be the PRIMARY KEY, and there's also a possible unique combination of fdc_id and nutrient_id. Therefore, you might consider adding a UNIQUE KEY on those two columns apart from adding a PRIMARY KEY on id. See 24.6.1 Partitioning Keys, Primary Keys, and Unique Keys.
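A sketch of those key changes, assuming id really is unique and no (fdc_id, nutrient_id) pair repeats (the unique key name is arbitrary):

ALTER TABLE food_nutrient
ADD PRIMARY KEY (id),
ADD UNIQUE KEY fdc_nutrient (fdc_id, nutrient_id);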
Usage of aliases:
This is to help make your query more readable. You didn't use any in your current query, so you end up prefixing the full table name on every column you use in your operations:
....
FROM food_nutrient AS fn
INNER JOIN food_branded_food fbf /*can simply be written without "..AS.."*/
ON fn.fdc_id = fbf.fdc_id /*the operation afterwards didn't require you to append full table name*/
...
Similarly, once you've added the table alias, you can use it in SELECT too:
SELECT fn.amount, fbf.description,
fbf.fdc_id AS 'FBF_id'
/*you can also assign a custom/desired alias to your column - as your output column name*/
...
Couldn't find official documentation on MySQL website but here's a further explanation from a different site.
Alternative UPDATE syntax:
Your current UPDATE query should be able to perform what you need but you probably don't need the subquery at all. This UPDATE query should work as well:
UPDATE branded_food bf
JOIN food_nutrient fn ON bf.fdc_id = fn.fdc_id
SET bf.kcals = fn.amount
WHERE fn.nutrient_id = 208
AND bf.kcals IS NULL;
Here's a demo fiddle for reference
An UPDATE with an INNER JOIN gets you your wanted result:
UPDATE branded_food bf
INNER JOIN (SELECT fdc_id, SUM(amount) svalue FROM food_nutrient GROUP BY fdc_id) mn ON bf.fdc_id = mn.fdc_id
SET bf.kcals = mn.svalue
WHERE bf.kcals IS NULL;
I have a query which purpose is to generate statistics for how many musical work (track) has been downloaded from a site at different periods (by month, by quarter, by year etc). The query operates on the tables entityusage, entityusage_file and track.
To get the number of downloads for tracks belonging to an specific album I would do the following query :
select
date_format(eu.updated, '%Y-%m-%d') as p, count(eu.id) as c
from entityusage as eu
inner join entityusage_file as euf
ON euf.entityusage_id = eu.id
inner join track as t
ON t.id = euf.track_id
where
t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
and entitytype = 't'
and action = 1
group by date_format(eu.updated, '%Y%m%d')
I need to set entitytype = 't' as the entityusage can hold downloads of other entities as well (if entitytype = 'a' then an entire album would have been downloaded, and entityusage_file would then hold all tracks which the album "translated" into at the point of download).
This query takes 40 - 50 seconds. I've been trying to optimize this query for a while, but I have the feeling that I'm approaching this the wrong way.
This is one out of 4 similar queries which must run to generate a report. The report should preferable be able to finish while a user waits for it. Right now, I'm looking at 3 - 4 minutes. That's a long time to wait.
Can this query be optimised further with indexes, or do I need to take another approach to get this job done?
CREATE TABLE `entityusage` (
`id` char(36) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`entitytype` varchar(5) NOT NULL,
`entityid` char(36) NOT NULL,
`externaluser` int(10) NOT NULL,
`action` tinyint(1) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `e` (`entityid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `entityusage_file` (
`id` char(36) NOT NULL,
`entityusage_id` char(36) NOT NULL,
`track_id` char(36) NOT NULL,
`file_id` char(36) NOT NULL,
`type` varchar(3) NOT NULL,
`quality` int(1) NOT NULL,
`size` int(20) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `file_id` (`file_id`),
KEY `entityusage_id` (`entityusage_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `track` (
`id` char(36) NOT NULL,
`album_id` char(36) NOT NULL,
`number` int(3) NOT NULL DEFAULT '0',
`title` varchar(255) DEFAULT NULL,
`updated` datetime NOT NULL DEFAULT '2000-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `album` (`album_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC;
An EXPLAIN on the query gives me the following :
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | eu | ALL | NULL | NULL | NULL | NULL | 7832817 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | euf | ref | entityusage_id | entityusage_id | 108 | func | 1 | Using index condition |
| 1 | SIMPLE | t | eq_ref | PRIMARY,album | PRIMARY | 108 | trackerdatabase.euf.track_id | 1 | Using where |
+------+-------------+-------+--------+----------------+----------------+---------+------------------------------+---------+----------------------------------------------+
This is your query:
select date_format(eu.updated, '%Y-%m-%d') as p, count(eu.id) as c
from entityusage eu join
entityusage_file euf
on euf.entityusage_id = eu.id join
track t
on t.id = euf.track_id
where t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7' and
eu.entitytype = 't' and
eu.action = 1
group by date_format(eu.updated, '%Y%m%d');
I would suggest indexes on track(album_id, id), entityusage_file(track_id, entityusage_id), and entityusage(id, entitytype, action).
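A sketch of those three indexes (the index names are arbitrary):

CREATE INDEX track_album_id ON track (album_id, id);
CREATE INDEX euf_track_eu ON entityusage_file (track_id, entityusage_id);
CREATE INDEX eu_id_type_action ON entityusage (id, entitytype, action);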
Assuming that entityusage_file is mostly a many:many mapping table, see this for tips on improving it. Note that it calls for getting rid of the id and making a pair of 2-column indexes, one of which is the PRIMARY KEY(track_id, entityusage_id). Since your table has a few extra columns, that link does not cover everything.
The UUIDs could be shrunk from 108 bytes to 36, then to 16 by going to BINARY(16) and using a compression function. Many exist (including a builtin pair in version 8.0); here's mine.
To explain one thing... The query execution should have started with track (on the assumption that '0054a47e-b594-407b-86df-3be078b4e7b7' is very selective). The hangup was that there was no index to get from there to the next table. Gordon's suggested indexes include such.
date_format(eu.updated, '%Y-%m-%d') and date_format(eu.updated, '%Y%m%d') can be simplified to DATE(eu.updated). (No significant performance change.)
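With that simplification the query becomes (a sketch; otherwise identical to the original):

select DATE(eu.updated) as p, count(eu.id) as c
from entityusage eu
join entityusage_file euf on euf.entityusage_id = eu.id
join track t on t.id = euf.track_id
where t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
  and eu.entitytype = 't'
  and eu.action = 1
group by DATE(eu.updated);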
(The other Answers and Comments cover a number of issues; I won't repeat them here.)
Because the GROUP BY operation is on an expression involving a function, MySQL can't use an index to optimize that operation. It's going to require a "Using filesort" operation.
I believe the indexes that Gordon suggested are the best bets, given the current table definitions. But even with those indexes, the "tall post" is the eu table, chunking through and sorting all those rows.
To get more reasonable performance, you may need to introduce a "precomputed results" table. It's going to be expensive to generate the counts for everything... but we can pay that price ahead of time...
CREATE TABLE usage_track_by_day
( updated_dt DATE NOT NULL
, PRIMARY KEY (track_id, updated_dt)
)
AS
SELECT eu.track_id
, DATE(eu.updated) AS updated_dt
, SUM(IF(eu.action = 1, 1, 0)) AS cnt
FROM entityusage eu
WHERE eu.track_id IS NOT NULL
AND eu.updated IS NOT NULL
GROUP
BY eu.track_id
, DATE(eu.updated);
An index ON entityusage (track_id,updated,action) may benefit performance.
Then, we could write a query against the new "precomputed results" table, with a better shot at reasonable performance.
The "precomputed results" table would get stale, and would need to be periodically refreshed.
This isn't necessarily the best solution to the issue, but it's a technique we can use in datawarehouse/datamart applications. This lets us churn through lots of detail rows to get counts one time, and then save those counts for fast access.
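For example, the report query against the precomputed table might look like this (a hypothetical sketch, assuming the summary table carries the track_id column as in the CREATE above):

SELECT ut.updated_dt AS p, SUM(ut.cnt) AS c
FROM usage_track_by_day ut
JOIN track t ON t.id = ut.track_id
WHERE t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
GROUP BY ut.updated_dt;

Because the per-day counts are already materialized, this touches a handful of summary rows per track instead of millions of detail rows.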
Can you try this? I can't really test it without some sample data from you.
In this version the query looks first in the track table and then joins the other tables.
SELECT
date_format(eu.updated, '%Y-%m-%d') AS p
, count(eu.id) AS c
FROM track AS t
INNER JOIN entityusage_file AS euf ON t.id = euf.track_id
INNER JOIN entityusage AS eu ON euf.entityusage_id = eu.id
WHERE
t.album_id = '0054a47e-b594-407b-86df-3be078b4e7b7'
AND eu.entitytype = 't'
AND eu.action = 1
GROUP BY date_format(eu.updated, '%Y%m%d');
Below are the 4 tables' table structure:
Calendar:
CREATE TABLE `calender` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`HospitalID` int(11) NOT NULL,
`ColorCode` int(11) DEFAULT NULL,
`RecurrID` int(11) NOT NULL,
`IsActive` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `idxHospital` (`ID`,`StaffID`,`HospitalID`,`ColorCode`,`RecurrID`,`IsActive`)
) ENGINE=InnoDB AUTO_INCREMENT=4638 DEFAULT CHARSET=latin1;
CalendarAttendee:
CREATE TABLE `calenderattendee` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`CalenderID` int(11) NOT NULL,
`StaffID` int(11) NOT NULL,
`IsActive` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
KEY `idxCalStaffID` (`StaffID`,`CalenderID`)
) ENGINE=InnoDB AUTO_INCREMENT=20436 DEFAULT CHARSET=latin1;
CallPlanStaff:
CREATE TABLE `callplanstaff` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Staffname` varchar(45) NOT NULL,
`IsActive` tinyint(4) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `idx_IsActive` (`Staffname`,`IsActive`),
KEY `idx_staffName` (`Staffname`,`ID`) USING BTREE KEY_BLOCK_SIZE=100
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
Users:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_users_on_email` (`email`),
UNIQUE KEY `index_users_on_name` (`name`),
KEY `idx_email` (`email`) USING BTREE KEY_BLOCK_SIZE=100
) ENGINE=InnoDB AUTO_INCREMENT=33 DEFAULT CHARSET=utf8;
What I'm trying to do is to fetch the calender.ID and Users.name using below query:
SELECT a.ID, h.name
FROM `stjude`.`calender` a
left join calenderattendee e on a.ID = e.calenderID
left join callplanstaff f on e.StaffID = f.ID
left join users h on f.Staffname = h.email
The relation between those tables are:
It took about 4 seconds to fetch 13,000 records, which I bet could be faster.
When I look at the tabular explain of the query, here's the result:
Why isn't MySQL using an index on the callplanstaff and users tables?
Also, in my case, should I use multi index instead of multi column index?
And is there any indexes I'm missing so my query is slow?
=======================================================================
Updated:
As zedfoxus and spencer7593 recommended to change the idxCalStaffID's ordering and idx_staffname's ordering, below is the execution plan:
It took 0.063 seconds to fetch, which is much less time. How does the ordering of the index columns affect the fetch time?
You're misinterpreting the EXPLAIN report.
type: index is not such a good thing. It means it's doing an "index-scan" which examines every element of an index. It's almost as bad as a table-scan. Notice the column rows: 4562 and rows: 13451. This is the estimated number of index elements it will examine for each of those tables.
Having two tables each doing an index-scan is even worse. The total number of rows examined for this join is 4562 x 13451 = 61,363,462.
Using join buffer is not a good thing. It's a thing the optimizer does as a consolation when it can't use an index for the join.
type: eq_ref is a good thing. It means it's using a PRIMARY KEY index or UNIQUE KEY index, to look up exactly one row. Notice the column rows: 1. So at least for each of the rows from the previous join, it only does one index lookup.
You should create an index on calenderattendee for columns (CalenderID, StaffID) in that order (@spencer7593 posted this suggestion while I was writing my post).
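A sketch of that index (the name is arbitrary; it's chosen here to match the key name in the EXPLAIN shown later):

CREATE INDEX CalenderID ON calenderattendee (CalenderID, StaffID);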
By using LEFT [OUTER] JOIN in this query, you're preventing MySQL from optimizing the order of table joins. And since your query fetches h.name, I infer that you really just want results where the calendar event has an attendee and the attendee has a corresponding user record. It makes no sense that you're not using an INNER JOIN.
Here's the EXPLAIN with the new index and the joins changed to INNER JOIN (though my row counts are meaningless because I didn't create test data):
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
| 1 | SIMPLE | a | NULL | index | PRIMARY,ID_UNIQUE,idxHospital | ID_UNIQUE | 4 | NULL | 1 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | ref | idxCalStaffID,CalenderID | CalenderID | 4 | test.a.ID | 1 | 100.00 | Using index |
| 1 | SIMPLE | f | NULL | eq_ref | PRIMARY,ID_UNIQUE | PRIMARY | 4 | test.e.StaffID | 1 | 100.00 | NULL |
| 1 | SIMPLE | h | NULL | eq_ref | index_users_on_email,idx_email | index_users_on_email | 767 | func | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
The type: index for the calenderattendee table has been changed to type: ref which means an index lookup against a non-unique index. And the note about Using join buffer is gone.
That should run better.
how does the ordering of the indexing affects the fetch time..?
Think of a telephone book, which is ordered by last name first, then by first name. This helps you look up people by last name very quickly. But it does not help you look up people by first name.
The position of columns in an index matters!
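In SQL terms, using a hypothetical people table as the telephone book:

CREATE INDEX name_idx ON people (last_name, first_name);

-- the index can satisfy these lookups:
SELECT phone FROM people WHERE last_name = 'Karwin';
SELECT phone FROM people WHERE last_name = 'Karwin' AND first_name = 'Bill';

-- but not this one, which must scan:
SELECT phone FROM people WHERE first_name = 'Bill';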
You might like my presentation How to Design Indexes, Really.
Slides: http://www.slideshare.net/billkarwin/how-to-design-indexes-really
Video of me presenting this talk: https://www.youtube.com/watch?v=ELR7-RdU9XU
Q: Is there any indexes I'm missing so my query is slow?
A: Yes. A suitable index on calenderattendee is missing.
We probably want an index on calenderattendee with CalenderID as the leading column, for example:
... ON calenderattendee (CalenderID, StaffID)
This seems like a situation where inner join might be a better option than a left join.
SELECT a.ID, h.name
FROM `stjude`.`calender` a
INNER JOIN calenderattendee e on a.ID = e.calenderID
INNER JOIN callplanstaff f on e.StaffID = f.ID
INNER JOIN users h on f.Staffname = h.email
Then let's get onto the indexes. The Calendar table has
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
The second one, ID_UNIQUE, is redundant: a primary key is already a unique index. Having too many indexes slows down insert/update/delete operations.
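Dropping the redundant index is a one-liner:

ALTER TABLE calender DROP INDEX ID_UNIQUE;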
Then the users table has
UNIQUE KEY `index_users_on_email` (`email`),
UNIQUE KEY `index_users_on_name` (`name`),
KEY `idx_email` (`email`) USING BTREE KEY_BLOCK_SIZE=100
The idx_email index is redundant here, since email already has a unique index. Other than that there isn't much to do by way of tweaking the indexes. Your EXPLAIN shows that an index is being used on each table.
Why MYSQL isn't using index on callplanstaff table and users table?
Your explain shows that it does: it's using the primary key and the index_users_on_email indexes on these tables.
Also, in my case, should I use multiple single-column indexes instead of a multi-column index?
As a rule of thumb, mysql uses only one index per table. So a multi column index is the way to go rather than having multiple indexes.
And is there any indexes I'm missing so my query is slow?
As I mentioned in the comments, you are fetching (and probably displaying) 13,000 records. That's where your bottleneck may be.
I have a table with 25 million rows, indexed appropriately.
But adding the clause AND status IS NULL turns a super fast query into a crazy slow query.
Please help me speed it up.
Query:
SELECT
student_id,
grade,
status
FROM
grades
WHERE
class_id = 1
AND status IS NULL -- This line delays results from <200ms to 40-70s!
AND grade BETWEEN 0 AND 0.7
LIMIT 25;
Table:
CREATE TABLE IF NOT EXISTS `grades` (
`student_id` BIGINT(20) NOT NULL,
`class_id` INT(11) NOT NULL,
`grade` FLOAT(10,6) DEFAULT NULL,
`status` INT(11) DEFAULT NULL,
UNIQUE KEY `unique_key` (`student_id`,`class_id`),
KEY `class_id` (`class_id`),
KEY `status` (`status`),
KEY `grade` (`grade`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Local development shows results instantly (<200ms). Production server is huge slowdown (40-70 seconds!).
Can you point me in the right direction to debug?
Explain:
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| 1 | SIMPLE | grades | index_merge | class_id,status,grade | status,class_id | 5,4 | NULL | 26811 | Using intersect(status,class_id); Using where |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
A SELECT statement normally uses only one index per table (the index_merge in your EXPLAIN is the exception).
Presumably the query before just did a scan using the sole index class_id for your condition class_id = 1, which probably filters your result set nicely before checking the other conditions.
The optimiser is 'incorrectly' choosing an index merge on class_id and status for the second query and examining 26811 rows, which is probably not optimal. You can steer it toward the class_id index by adding USE INDEX (class_id) after the table name in the FROM clause.
You may get some joy with a composite index on (class_id, status, grade), which may run the query faster since it can match the first two columns and then range-scan on grade. InnoDB stores NULLs in secondary indexes, so the status IS NULL predicate can still use such an index.
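Sketches of both options (the composite index name is arbitrary):

-- Option 1: hint the single-column index
SELECT student_id, grade, status
FROM grades USE INDEX (class_id)
WHERE class_id = 1
  AND status IS NULL
  AND grade BETWEEN 0 AND 0.7
LIMIT 25;

-- Option 2: a composite index covering all three predicates
ALTER TABLE grades ADD INDEX class_status_grade (class_id, status, grade);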
I'm guessing the ORDER BY pushed the optimiser to choose the class_id index again and returned your query to its original speed.
I've got a fairly simple query that seeks to display the number of email addresses that are subscribed along with the number unsubscribed, grouped by client.
The query:
SELECT
client_id,
COUNT(CASE WHEN subscribed = 1 THEN subscribed END) AS subs,
COUNT(CASE WHEN subscribed = 0 THEN subscribed END) AS unsubs
FROM
contacts_emailAddresses
LEFT JOIN contacts ON contacts.id = contacts_emailAddresses.contact_id
GROUP BY
client_id
Schema of relevant tables follows. contacts_emailAddresses is a junction table between contacts (which has the client_id) and emailAddresses (which is not actually used in this query).
CREATE TABLE `contacts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`firstname` varchar(255) NOT NULL DEFAULT '',
`middlename` varchar(255) NOT NULL DEFAULT '',
`lastname` varchar(255) NOT NULL DEFAULT '',
`gender` varchar(5) DEFAULT NULL,
`client_id` mediumint(10) unsigned DEFAULT NULL,
`datasource` varchar(10) DEFAULT NULL,
`external_id` int(10) unsigned DEFAULT NULL,
`created` timestamp NULL DEFAULT NULL,
`trash` tinyint(1) NOT NULL DEFAULT '0',
`updated` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `client_id` (`client_id`),
KEY `external_id combo` (`client_id`,`datasource`,`external_id`),
KEY `trash` (`trash`),
KEY `lastname` (`lastname`),
KEY `firstname` (`firstname`),
CONSTRAINT `contacts_ibfk_1` FOREIGN KEY (`client_id`) REFERENCES `clients` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14742974 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
CREATE TABLE `contacts_emailAddresses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`contact_id` int(10) unsigned NOT NULL,
`emailAddress_id` int(11) unsigned DEFAULT NULL,
`primary` tinyint(1) unsigned NOT NULL DEFAULT '0',
`subscribed` tinyint(1) unsigned NOT NULL DEFAULT '1',
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `contact_id` (`contact_id`),
KEY `subscribed` (`subscribed`),
KEY `combo` (`contact_id`,`emailAddress_id`) USING BTREE,
KEY `emailAddress_id` (`emailAddress_id`) USING BTREE,
CONSTRAINT `contacts_emailAddresses_ibfk_1` FOREIGN KEY (`contact_id`) REFERENCES `contacts` (`id`),
CONSTRAINT `contacts_emailAddresses_ibfk_2` FOREIGN KEY (`emailAddress_id`) REFERENCES `emailAddresses` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24700918 DEFAULT CHARSET=utf8
Here's the EXPLAIN:
+----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+----------+---------------------------------+
| 1 | SIMPLE | contacts_emailAddresses | ALL | NULL | NULL | NULL | NULL | 10176639 | Using temporary; Using filesort |
| 1 | SIMPLE | contacts | eq_ref | PRIMARY | PRIMARY | 4 | icarus.contacts_emailAddresses.contact_id | 1 | |
+----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+----------+---------------------------------+
2 rows in set (0.08 sec)
The problem here clearly is the GROUP BY clause, as I can remove the JOIN (and the items that depend on it) and the performance still is terrible (40+ seconds). There are 10m records in contacts_emailAddresses, 12m-some records in contacts, and 10–15 client records for the grouping.
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I'm obviously not combining the GROUP BY with an ORDER BY, and I have tried multiple things to ensure that the GROUP BY is on a column that should be properly placed in the join queue (including rewriting the query to put contacts in the FROM and instead join to contacts_emailAddresses), all to no avail.
Any suggestions for performance tuning would be much appreciated!
I think the only real shot you have of getting away from a "Using temporary; Using filesort" operation (given the current schema, the current query, and the specified resultset) would be to use correlated subqueries in the SELECT list.
SELECT c.client_id
, (SELECT IFNULL(SUM(es.subscribed=1),0)
FROM contacts_emailAddresses es
JOIN contacts cs
ON cs.id = es.contact_id
WHERE cs.client_id = c.client_id
) AS subs
, (SELECT IFNULL(SUM(eu.subscribed=0),0)
FROM contacts_emailAddresses eu
JOIN contacts cu
ON cu.id = eu.contact_id
WHERE cu.client_id = c.client_id
) AS unsubs
FROM contacts c
GROUP BY c.client_id
This may run quicker than the original query, or it may not. Those correlated subqueries are going to get run once for each row returned by the outer query. If that outer query is returning a boatload of rows, that's a whole boatload of subquery executions.
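To sanity-check that the correlated-subquery form produces the same resultset as a plain JOIN + GROUP BY, here's a minimal sketch using sqlite3 with made-up toy data (sqlite3 stands in for MySQL, so the plans differ, but the resultsets should match; column names follow the question's schema):

```python
import sqlite3

# Toy stand-in for the schema in the question.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL);
CREATE TABLE contacts_emailAddresses (
  id INTEGER PRIMARY KEY,
  contact_id INTEGER NOT NULL REFERENCES contacts(id),
  subscribed INTEGER NOT NULL
);
INSERT INTO contacts VALUES (1,10),(2,10),(3,20);
INSERT INTO contacts_emailAddresses (contact_id, subscribed)
VALUES (1,1),(1,0),(2,1),(3,0),(3,0);
""")

# The correlated-subquery form from the answer.
correlated = """
SELECT c.client_id
     , (SELECT IFNULL(SUM(es.subscribed=1),0)
          FROM contacts_emailAddresses es
          JOIN contacts cs ON cs.id = es.contact_id
         WHERE cs.client_id = c.client_id) AS subs
     , (SELECT IFNULL(SUM(eu.subscribed=0),0)
          FROM contacts_emailAddresses eu
          JOIN contacts cu ON cu.id = eu.contact_id
         WHERE cu.client_id = c.client_id) AS unsubs
  FROM contacts c
 GROUP BY c.client_id
"""

# A plain JOIN + GROUP BY for comparison.
join_group = """
SELECT c.client_id
     , SUM(e.subscribed=1) AS subs
     , SUM(e.subscribed=0) AS unsubs
  FROM contacts c
  JOIN contacts_emailAddresses e ON e.contact_id = c.id
 GROUP BY c.client_id
"""

print(db.execute(correlated).fetchall())
print(db.execute(join_group).fetchall())
```

Both queries should return one row per client_id with matching subs/unsubs counts.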
Here's the output from an EXPLAIN:
id select_type table type possible_keys key key_len ref Extra
-- ------------------ ----- ----- ----------------------------------- ---------- ------- ------ ------------------------
1 PRIMARY c index (NULL) client_id 5 (NULL) Using index
3 DEPENDENT SUBQUERY cu ref PRIMARY,client_id,external_id combo client_id 5 func Using where; Using index
3 DEPENDENT SUBQUERY eu ref contact_id,combo contact_id 4 cu.id Using where
2 DEPENDENT SUBQUERY cs ref PRIMARY,client_id,external_id combo client_id 5 func Using where; Using index
2 DEPENDENT SUBQUERY es ref contact_id,combo contact_id 4 cs.id Using where
For optimum performance of this query, we'd really like to see "Using index" in the Extra column of the explain for the eu and es tables. But to get that, we'd need a suitable index, one with a leading column of contact_id and including the subscribed column. For example:
CREATE INDEX cemail_IX2 ON contacts_emailAddresses (contact_id, subscribed);
With the new index available, EXPLAIN output shows MySQL will use the new index:
id select_type table type possible_keys key key_len ref Extra
-- ------------------ ----- ----- ----------------------------------- ---------- ------- ------ ------------------------
1 PRIMARY c index (NULL) client_id 5 (NULL) Using index
3 DEPENDENT SUBQUERY cu ref PRIMARY,client_id,external_id combo client_id 5 func Using where; Using index
3 DEPENDENT SUBQUERY eu ref contact_id,combo,cemail_IX2 cemail_IX2 4 cu.id Using where; Using index
2 DEPENDENT SUBQUERY cs ref PRIMARY,client_id,external_id combo client_id 5 func Using where; Using index
2 DEPENDENT SUBQUERY es ref contact_id,combo,cemail_IX2 cemail_IX2 4 cs.id Using where; Using index
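The covering-index effect is easy to reproduce in miniature. A sketch using sqlite3 (not MySQL, but the covering-index concept carries over) with the same hypothetical index name:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts_emailAddresses (
  id INTEGER PRIMARY KEY,
  contact_id INTEGER NOT NULL,
  subscribed INTEGER NOT NULL
);
CREATE INDEX cemail_IX2 ON contacts_emailAddresses (contact_id, subscribed);
""")

# The aggregation needs only contact_id and subscribed, both of which
# are in the index, so the plan can be satisfied from the index alone.
plan = db.execute("""
EXPLAIN QUERY PLAN
SELECT contact_id, SUM(subscribed=1), SUM(subscribed=0)
  FROM contacts_emailAddresses
 GROUP BY contact_id
""").fetchall()
for row in plan:
    print(row)
```

The plan detail should name cemail_IX2 as a covering index, the sqlite3 analogue of MySQL's "Using index".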
NOTES
This is the kind of problem where introducing a little redundancy can improve performance. (Just like we do in a traditional data warehouse.)
For optimum performance, what we'd really like is to have the client_id column available on the contacts_emailAddresses table, without needing to JOIN to the contacts table.
In the current schema, the foreign key relationship to the contacts table is what gets us the client_id (rather, the JOIN operation in the original query is what gets it for us.) If we could avoid that JOIN operation, we could satisfy the query entirely from a single index, using the index to do the aggregation, and avoiding the overhead of the "Using temporary; Using filesort" and JOIN operations...
With the client_id column available, we'd create a covering index like...
... ON contacts_emailAddresses (client_id, subscribed)
Then, we'd have a blazingly fast query...
SELECT e.client_id
, SUM(e.subscribed=1) AS subs
, SUM(e.subscribed=0) AS unsubs
FROM contacts_emailAddresses e
GROUP BY e.client_id
That would get us a "Using index" in the query plan, and the query plan for this resultset just doesn't get any better than that.
But that would require a change to your schema, so it doesn't really answer your question.
Without the client_id column, then the best we're likely to do is a query like the one Gordon posted in his answer (though you still need to add the GROUP BY c.client_id to get the specified result.) The index Gordon recommended will be of benefit...
... ON contacts_emailAddresses(contact_id, subscribed)
With that index defined, the standalone index on contact_id is redundant. The new index will be a suitable replacement to support the existing foreign key constraint. (The index on just contact_id could be dropped.)
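The redundancy claim follows from the leftmost-prefix rule: a composite index on (contact_id, subscribed) also serves plain lookups on contact_id alone. A small sqlite3 sketch of that point (MySQL applies the same leftmost-prefix rule to its B-tree indexes):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts_emailAddresses (
  id INTEGER PRIMARY KEY,
  contact_id INTEGER NOT NULL,
  subscribed INTEGER NOT NULL
);
-- Only the composite index exists; no standalone contact_id index.
CREATE INDEX cemail_IX2 ON contacts_emailAddresses (contact_id, subscribed);
""")

# A lookup on contact_id alone can still use the composite index,
# because contact_id is its leading column.
plan = db.execute("""
EXPLAIN QUERY PLAN
SELECT * FROM contacts_emailAddresses WHERE contact_id = 42
""").fetchall()
for row in plan:
    print(row)
```

The plan should show a SEARCH using cemail_IX2, which is why the standalone contact_id index (and the foreign-key support it provides in MySQL) can be taken over by the composite index.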
Another approach would be to do the aggregation on the "big" table first, before doing the JOIN, since it's the driving table for the outer join. Actually, since that foreign key column is defined as NOT NULL, and there's a foreign key, it's not really an "outer" join at all.
SELECT c.client_id
, SUM(s.subs) AS subs
, SUM(s.unsubs) AS unsubs
FROM ( SELECT e.contact_id
, SUM(e.subscribed=1) AS subs
       , SUM(e.subscribed=0) AS unsubs
FROM contacts_emailAddresses e
GROUP BY e.contact_id
) s
JOIN contacts c
ON c.id = s.contact_id
GROUP BY c.client_id
Again, we need an index with contact_id as the leading column and including the subscribed column, for best performance. (The plan for s should show "Using index".) Unfortunately, that's still going to materialize a fairly sizable resultset (derived table s) as a temporary MyISAM table, and the MyISAM table isn't going to be indexed.
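As a check that the aggregate-first rewrite is equivalent to the direct JOIN + GROUP BY, here's a sketch on sqlite3 with toy data (sqlite3 may flatten or materialize the derived table differently than MySQL's temporary-table approach, but the resultsets should agree):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL);
CREATE TABLE contacts_emailAddresses (
  id INTEGER PRIMARY KEY,
  contact_id INTEGER NOT NULL REFERENCES contacts(id),
  subscribed INTEGER NOT NULL
);
INSERT INTO contacts VALUES (1,10),(2,10),(3,20);
INSERT INTO contacts_emailAddresses (contact_id, subscribed)
VALUES (1,1),(1,0),(2,1),(3,0),(3,0);
""")

# Aggregate the big table first, then join and re-aggregate.
aggregate_first = """
SELECT c.client_id
     , SUM(s.subs)   AS subs
     , SUM(s.unsubs) AS unsubs
  FROM ( SELECT e.contact_id
              , SUM(e.subscribed=1) AS subs
              , SUM(e.subscribed=0) AS unsubs
           FROM contacts_emailAddresses e
          GROUP BY e.contact_id
       ) s
  JOIN contacts c ON c.id = s.contact_id
 GROUP BY c.client_id
"""

# Direct join and group for comparison.
direct = """
SELECT c.client_id
     , SUM(e.subscribed=1) AS subs
     , SUM(e.subscribed=0) AS unsubs
  FROM contacts c
  JOIN contacts_emailAddresses e ON e.contact_id = c.id
 GROUP BY c.client_id
"""

print(db.execute(aggregate_first).fetchall())
print(db.execute(direct).fetchall())
```

The two-level aggregation works because each contact_id maps to exactly one client_id, so per-contact sums roll up cleanly into per-client sums.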