This query is very slow. It is pretty simple and the 3 tables used are indexed on all columns in JOIN and WHERE clauses. How can I optimize my query, or my tables for this query?
This is the slow query. It takes 15-20 seconds to run.
SELECT
user.id,
user.name,
user.user_key,
user.secret,
account.id,
account.name,
account.admin,
setting.attribute,
setting.value
FROM user
INNER JOIN account ON account.id = user.account_id
INNER JOIN setting ON setting.user_id = user.id
AND setting.deleted = 0
WHERE user.deleted = 0
The join on the setting table is the likely cause, because the two queries below take about 5 seconds in total. Although 5 seconds still seems a little long?
SELECT
user.id,
user.name,
user.user_key,
user.secret,
account.id,
account.name,
account.admin
FROM user
INNER JOIN account ON account.id = user.account_id
WHERE user.deleted = 0
SELECT
setting.user_id,
setting.attribute,
setting.value
FROM setting
WHERE setting.deleted = 0
The explain for the slow query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | user | ALL | PRIMARY,idx_id,idx_deleted | null | null | null | 600 | Using where
1 | SIMPLE | account | eq_ref | PRIMARY | PRIMARY | 8 | user.account_id | 1 | null
1 | SIMPLE | setting | ref | attribute_version_unique,idx_user_id,indx_deleted | attribute_version_unique | 8 | user.id | 35 | Using where
The schema:
CREATE TABLE user
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
user_key VARCHAR(45) NOT NULL,
secret VARCHAR(16),
account_id BIGINT(20) unsigned NOT NULL,
name VARCHAR(40) NOT NULL,
demo TINYINT(1) DEFAULT '0' NOT NULL,
details VARCHAR(4000),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0' NOT NULL
);
CREATE INDEX idx_date_modified ON user (date_modified);
CREATE INDEX idx_deleted ON user (deleted);
CREATE INDEX idx_id ON user (id);
CREATE UNIQUE INDEX idx_name_unique ON user (user_key);
CREATE TABLE account
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
display_name VARCHAR(100),
admin TINYINT(1) DEFAULT '0' NOT NULL,
visibility VARCHAR(15) DEFAULT 'public',
cost DOUBLE,
monthly_fee VARCHAR(300),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0'
);
CREATE INDEX idx_date_modified ON account (date_modified);
CREATE TABLE setting
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
user_id BIGINT(20) unsigned NOT NULL,
attribute VARCHAR(45) NOT NULL,
value VARCHAR(4000),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0' NOT NULL
);
CREATE UNIQUE INDEX attribute_version_unique ON setting (user_id, attribute);
CREATE INDEX idx_user_id ON setting (user_id);
CREATE INDEX idx_date_modified ON setting (date_modified);
CREATE INDEX indx_deleted ON setting (deleted);
With respect, you've stumbled across a common antipattern. Indexing every column separately is usually wasted effort. MySQL (as of late 2016) generally uses only one index per table reference when satisfying a query (index merge is the occasional exception). So the extra single-column indexes are unlikely to help any query, and they definitely add overhead to INSERT and UPDATE operations.
This query might be improved by some purpose-designed compound covering indexes.
Try this index on your user table. It's a covering index: intended to contain all the columns necessary to satisfy the query. It's organized in an order that matches your WHERE clause.
CREATE INDEX idx_user_account_setting
ON user (deleted, account_id, id, name, user_key, secret);
This covering index might help on your setting table
CREATE INDEX idx_setting_user
ON setting (user_id, deleted , attribute, value);
Also try this one, switching the order of the first two columns, if the first one doesn't help.
CREATE INDEX idx_setting_user_alt
ON setting (deleted, user_id, attribute, value);
Finally try this one on account.
CREATE INDEX idx_account_user
ON account (id, name, admin);
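To make the idea of a "covering" index concrete, here is a small sketch in Python using the standard-library sqlite3 module as a stand-in. This is an assumption-laden illustration: SQLite's planner is not MySQL's, the table is a trimmed-down version of setting, and the sample rows are invented. It only shows that when every column a query needs is in the index, the engine can answer from the index alone.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE setting (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        attribute TEXT    NOT NULL,
        value     TEXT,
        deleted   INTEGER NOT NULL DEFAULT 0
    );
    -- Same column order as the suggested idx_setting_user index.
    CREATE INDEX idx_setting_user
        ON setting (user_id, deleted, attribute, value);
    -- Invented sample rows.
    INSERT INTO setting (user_id, attribute, value, deleted) VALUES
        (1, 'theme', 'dark', 0),
        (1, 'lang', 'en', 1),
        (2, 'theme', 'light', 0);
""")

query = "SELECT attribute, value FROM setting WHERE user_id = ? AND deleted = 0"

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
plan_text = " ".join(row[3] for row in plan)
rows = conn.execute(query, (1,)).fetchall()

print(plan_text)  # the plan should mention a covering index
print(rows)
```

In MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN, so that is the thing to look for after adding the indexes above.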
Please, if these suggestions help, leave a brief comment saying how much they helped.
Read this. http://use-the-index-luke.com/
I have a table to store data from csv files. It is a large table (over 40 million rows). This is its structure:
CREATE TABLE `imported_lines` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`day` date NOT NULL,
`name` varchar(256) NOT NULL,
`origin_id` int(11) NOT NULL,
`time` time(3) NOT NULL,
`main_index` tinyint(4) NOT NULL DEFAULT 0,
`transaction_index` tinyint(4) NOT NULL DEFAULT 0,
`data` varchar(4096) NOT NULL,
`error` bit(1) NOT NULL,
`expressions_applied` bit(1) NOT NULL,
`count_records` smallint(6) NOT NULL DEFAULT 0,
`client_id` tinyint(4) NOT NULL DEFAULT 0,
`receive_date` datetime(3) NOT NULL,
PRIMARY KEY (`id`,`client_id`),
UNIQUE KEY `uq` (`client_id`,`name`,`origin_id`,`receive_date`),
KEY `dh` (`day`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (`client_id`) PARTITIONS 15 */
When I perform a SELECT filtered on a single day, it returns data very quickly (0.4 s). But as I widen the date range it becomes slower, until it hits a timeout error.
This is the query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(id) AS total, SUM(count_records) AS sum_records
FROM imported_lines FORCE INDEX (dh)
WHERE client_id = 1
AND day >= '2017-07-02' AND day <= '2017-07-03'
AND name IN ('name1', 'name2', 'name3', ...)
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
I think the IN clause may be hurting performance. I also tried adding the uq index to this query, which gave a small gain (FORCE INDEX (dh, uq)).
I also tried INNER JOIN (SELECT name FROM providers WHERE id = 2) prov ON prov.name = il.name, but that doesn't make the query any quicker either.
EDIT
EXPLAINing the query
id - 1
select_type - SIMPLE
table - imported_lines
type - range
possible_keys - uq, dh
key - dh
key_len - 261
ref - NULL
rows - 297988
extra - Using where; Using temporary; Using filesort
Any suggestions on what I should do?
I have made a few changes: I added a new multi-column index (as suggested by @Uueerdo) and rewrote the query as another user suggested (but he deleted his answer).
I ran EXPLAIN PARTITIONS on a few queries and tested with SQL_NO_CACHE to make sure the query cache wasn't used. Searching data for a whole month now takes 1.8s.
It's so much faster!
This is what I did:
ALTER TABLE `imported_lines` DROP INDEX dh;
ALTER TABLE `imported_lines` ADD INDEX dhc (`day`, `name`, `client_id`);
Query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(il.id) AS total, SUM(count_records) AS sum_records
FROM imported_lines il
INNER JOIN (
SELECT id FROM imported_lines
WHERE client_id = 1
AND day >= '2017-07-01' AND day <= '2017-07-31'
AND name IN ('name1', 'name2', 'name3', ...)
) AS il_filter
ON il_filter.id = il.id
WHERE il.client_id = 1
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
I realized that with the INNER JOIN, EXPLAIN PARTITIONS shows the query now uses the index. Also, with WHERE il.client_id = 1, the query reduces the number of partitions it has to look up.
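A rough sketch of why the client_id filter narrows the partitions: with PARTITION BY HASH, MySQL picks the partition from the modulus of the partitioning expression. The Python below models that for this table's 15 partitions. The helper names are made up for illustration, and only non-negative keys are considered.

```python
NUM_PARTITIONS = 15  # from "PARTITIONS 15" in the table definition


def hash_partition(client_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Partition number for a non-negative integer key under MOD-based hashing."""
    if client_id < 0:
        raise ValueError("this sketch only models non-negative keys")
    return client_id % num_partitions


def partitions_to_scan(client_ids=None) -> set:
    """Partitions a query must touch.

    With no filter on the partitioning column, every partition is a candidate.
    With an equality (or IN) filter, only the partitions the keys hash to remain.
    """
    if client_ids is None:
        return set(range(NUM_PARTITIONS))
    return {hash_partition(c) for c in client_ids}


print(len(partitions_to_scan()))  # no client_id filter: all partitions
print(partitions_to_scan([1]))    # WHERE client_id = 1: a single partition
```

The partitions column of EXPLAIN PARTITIONS is where to confirm that pruning actually happens for a given query.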
Thanks for your help!
I have two tables, main_part (3k records) and part_details (25k records)
I tried the following indexes, but EXPLAIN always reports a full table scan of 25k records (as opposed to about 2k matched records), along with Using where; Using temporary; Using filesort.
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`unit`);
ALTER TABLE `part_details` ADD INDEX `part_details_index_1` (`approved`, `display`, `country`, `id`, `price`);
Here is my query:
SELECT a.part_id, b.my_title,
b.price, a.type,
a.unit, a.certification,
b.my_image,
b.price/a.unit AS priceW
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
WHERE b.approved = 'Yes'
AND b.display = 'On'
AND b.country = 'US'
AND a.unit >= 300
ORDER BY b.price ASC LIMIT 50
One thing that I am aware of is that a.part_id is not a Primary Key in main_part table. Could this be a culprit?
Create tables SQL:
CREATE TABLE `main_part` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`part_id` mediumint(9) NOT NULL DEFAULT '0',
`type` varchar(50) NOT NULL DEFAULT '',
`unit` varchar(50) NOT NULL DEFAULT '',
`certification` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `part_details` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`asn` varchar(50) NOT NULL DEFAULT '',
`country` varchar(5) NOT NULL DEFAULT '',
`my_title` varchar(200) NOT NULL DEFAULT '',
`display` varchar(5) NOT NULL DEFAULT 'On',
`approved` varchar(5) NOT NULL DEFAULT 'No',
`price` decimal(7,3) NOT NULL DEFAULT '0.000',
`my_image` varchar(250) NOT NULL DEFAULT '',
`update_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `countryasn` (`country`,`asn`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
For your query the most important index is the one for the JOIN condition. As you are already aware, a.part_id isn't the primary key, so it has no index by default, and your first try should be:
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`part_id`,`unit`);
Because we are interested in the JOIN condition first, you should also change the second index to:
ALTER TABLE `part_details` ADD INDEX `part_details_index_1`
(`id`, `approved`, `display`, `country`, `price`);
Column order matters in an index.
Another tip is to start with the basic query:
SELECT *
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
Add indexes for part_id and id, check the explain plan, and then start adding conditions and updating the indexes if required.
It seems that most columns used for filtering in part_details aren't very selective (display is probably an On/off switch, country is probably very similar in many products, etc.).
In some cases, when the WHERE clause is not very selective, MySQL may choose to use an index that better suits the ORDER BY clause.
I would try creating this index as well and check the explain plan for any changes:
ALTER TABLE `part_details` ADD INDEX `part_details_price_index` (`price`);
For this query:
SELECT mp.part_id, pd.my_title, pd.price, mp.type,
mp.unit, mp.certification, pd.my_image,
pd.price/mp.unit AS priceW
FROM main_part mp INNER JOIN
part_details pd
ON mp.part_id = pd.id
WHERE pd.approved = 'Yes' AND
pd.display = 'On' AND
pd.country = 'US' AND
mp.unit >= 300
ORDER BY pd.price ASC
LIMIT 50;
For this query, I would start with indexes on part_details(country, display, approved, id, price) and main_part(part_id, unit).
The index on part_details can be used for filtering before the join. It is not easy to get rid of the sort for the order by.
My app needs to run this query pretty often; it gets a list of user data for the app to display. The problem is that the subquery on user_quiz is resource heavy, and calculating the rankings is very CPU intensive too.
Benchmark: ~0.5 seconds per run
When it will be run:
When the user wants to see their ranking
When the user wants to see other people's rankings
When getting a list of the user's friends
0.5 seconds is a really long time considering this query will be run pretty often. Is there anything I could do to optimize this query?
Table for user:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(100) DEFAULT NULL,
`lastname` varchar(100) DEFAULT NULL,
`password` varchar(20) NOT NULL,
`email` varchar(300) NOT NULL,
`verified` tinyint(10) DEFAULT NULL,
`avatar` varchar(300) DEFAULT NULL,
`points_total` int(11) unsigned NOT NULL DEFAULT '0',
`points_today` int(11) unsigned NOT NULL DEFAULT '0',
`number_correctanswer` int(11) unsigned NOT NULL DEFAULT '0',
`number_watchedvideo` int(11) unsigned NOT NULL DEFAULT '0',
`create_time` datetime NOT NULL,
`type` tinyint(1) unsigned NOT NULL DEFAULT '1',
`number_win` int(11) unsigned NOT NULL DEFAULT '0',
`number_lost` int(11) unsigned NOT NULL DEFAULT '0',
`number_tie` int(11) unsigned NOT NULL DEFAULT '0',
`level` int(1) unsigned NOT NULL DEFAULT '0',
`islogined` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=230 DEFAULT CHARSET=utf8;
Table for user_quiz:
CREATE TABLE `user_quiz` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`question_id` int(11) NOT NULL,
`is_answercorrect` int(11) unsigned NOT NULL DEFAULT '0',
`question_answer_datetime` datetime NOT NULL,
`score` int(1) DEFAULT NULL,
`quarter` int(1) DEFAULT NULL,
`game_type` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9816 DEFAULT CHARSET=utf8;
Table for user_starter:
CREATE TABLE `user_starter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`result` int(1) DEFAULT NULL,
`created_date` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=456 DEFAULT CHARSET=utf8mb4;
My indexes:
Table: user
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user 0 PRIMARY 1 id A 32 BTREE
Table: user_quiz
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_quiz 0 PRIMARY 1 id A 9462 BTREE
user_quiz 1 user_id 1 user_id A 270 BTREE
Table: user_starter
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_starter 0 PRIMARY 1 id A 454 BTREE
user_starter 1 user_id 1 user_id A 227 YES BTREE
Query:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,u.firstname,u.lastname,u.email,u.avatar,u.type,u.points_total,u.number_win,u.number_lost,u.number_tie,u.verified,
COALESCE(SUM(uq.score),0) as points_week,
COALESCE(us.number_lost,0) as number_week_lost,
COALESCE(us.number_win,0) as number_week_win,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 1) as lastFrdFight,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 2) as lastBotFight
FROM `user` u
LEFT JOIN (SELECT user_id,
count(case when result=1 then 1 else null end) as number_win,
count(case when result=-1 then 1 else null end) as number_lost
from user_starter where created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27' ) us ON u.id = us.user_id
LEFT JOIN (SELECT * FROM user_quiz WHERE question_answer_datetime BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00') uq on u.id = uq.user_id
GROUP BY u.id ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
EXPLAIN:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3027 100
2 DERIVED u ALL PRIMARY 32 100 Using temporary; Using filesort
2 DERIVED <derived5> ALL 1 100 Using where; Using join buffer (Block Nested Loop)
2 DERIVED <derived6> ref <auto_key0> <auto_key0> 4 fancard.u.id 94 100
6 DERIVED user_quiz ALL 9461 100 Using where
5 DERIVED user_starter ALL 454 100 Using where
4 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
3 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
Bench mark: around .5 second
The following index should make the subquery to user_quiz ultra fast.
ALTER TABLE user_quiz
ADD INDEX (`user_id`,`game_type`,`question_answer_datetime`);
Please provide SHOW CREATE TABLE tablename statements for all tables, as that will help with additional optimizations.
Update #1
Alright, I've had some time to look things over, and fortunately there appears to be a lot of relatively low-hanging fruit in terms of optimization.
Here are all the indexes to add:
ALTER TABLE user_quiz
ADD INDEX `userGametypeAnswerDatetimes` (`user_id`,`game_type`,`question_answer_datetime`);
ALTER TABLE user_quiz
ADD INDEX `userAnswerScores` (`user_id`,`question_answer_datetime`,`score`);
ALTER TABLE user_starter
ADD INDEX `userResultDates` (`user_id`,`result`,`created_date`);
Note that the names (such as userGametypeAnswerDatetimes) are optional, and you can name them to whatever makes the most sense to you. But, in general, it's good to put specific names on your custom indexes (simply for organization purposes.)
Now, here is your query, which should work well with those new indexes:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,
u.firstname,
u.lastname,
u.email,
u.avatar,
u.type,
u.points_total,
u.number_win,
u.number_lost,
u.number_tie,
u.verified,
COALESCE(user_scores.score,0) as points_week,
COALESCE(user_losses.number_lost,0) as number_week_lost,
COALESCE(user_wins.number_win,0) as number_week_win,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id
and game_type = 2
) as lastBotFight
FROM `user` u
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_win
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = 1
GROUP BY user_id
) user_wins
ON user_wins.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_lost
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = -1
GROUP BY user_id
) user_losses
ON user_losses.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
SUM(score) AS score
FROM user_quiz
WHERE question_answer_datetime
BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00'
GROUP BY user_id
) user_scores
ON u.id = user_scores.user_id
ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
Note: This is not necessarily the best result. It depends a LOT on your data set, as to whether this is necessarily the best, and sometimes you need to do a bit of trial and error.
A hint as to where trial and error can help is the difference between how we query lastFrdFight and lastBotFight versus how we query points_week, number_week_lost, and number_week_win. All of these could either be computed in the SELECT list (like the first two in my query) or by joining to a subquery result (like the last three in my query).
Mix and match to see what works best. In general, I've found joining to a subquery to be fastest when the outer query has a large number of rows (in this case, rows from the user table). That is because the subquery result only needs to be computed once and can then be matched up user by user. Other times it is better to keep the query in the SELECT clause: each execution runs MUCH faster, since there are more constants (the user_id is already known), but it has to run once for each row. So it's a trade-off, which is why you sometimes need trial and error.
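To show that the two shapes are interchangeable in terms of results, here is a small sketch using Python's built-in sqlite3 module with invented data and cut-down versions of the user and user_quiz tables. It is only a correctness check under those assumptions. It says nothing about which form is faster in MySQL on your data, which is exactly what you would measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user` (id INTEGER PRIMARY KEY, lastname TEXT);
    CREATE TABLE user_quiz (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        score   INTEGER
    );
    -- Invented sample data; user 3 has no quiz rows at all.
    INSERT INTO `user` (id, lastname) VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO user_quiz (user_id, score) VALUES (1, 5), (1, 7), (2, 3);
""")

# Shape 1: correlated subquery in the SELECT list, executed once per user row.
correlated = conn.execute("""
    SELECT u.id,
           COALESCE((SELECT SUM(uq.score)
                     FROM user_quiz uq
                     WHERE uq.user_id = u.id), 0) AS points
    FROM `user` u
    ORDER BY u.id
""").fetchall()

# Shape 2: aggregate once in a derived table, then join the result to user.
joined = conn.execute("""
    SELECT u.id, COALESCE(s.points, 0) AS points
    FROM `user` u
    LEFT JOIN (SELECT user_id, SUM(score) AS points
               FROM user_quiz
               GROUP BY user_id) s
           ON s.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(correlated)
print(joined)
```

Because both shapes return the same rows, each aggregate in the query can be swapped from one form to the other independently while you compare timings.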
Why do the indexes work?
So, you may be wondering why I made the indexes as I did. If you are familiar with phone books (in this age of smartphones, that's no longer a valid assumption I can make) then we can use that as an analogy:
If you had a composite index phonebookIndex (lastname,firstname,email) on your user table (just an example! you don't actually need to add that index!) you would have a result similar to what a phone book provides. (Using email instead of phone number.)
Each index is an internal copy of the data in the overall table. With this phonebookIndex there would internally be stored a list of all users with their lastname, then their first name, and then their email, and each of these would be ordered, just like a phone book.
Why is that useful? Consider when you know someone's first and last name. You can quickly flip to where their last name is, then quickly go through that list of everyone with their last name, finding the first name you want, so obtaining the email.
Indexes work in exactly the same way, in terms of how the database looks at them.
Consider the userGametypeAnswerDatetimes index I defined above, and how we query that index in the lastFrdFight SELECT subquery.
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight
Notice how we have both the user_id (from the outer query) and the game_type as constants. That is exactly like our example earlier, with having the first and last name, and wanting to look up an email/phone number. In this case, we are looking for the MAX of the 3rd value in the index. Still easy: All the values are ordered, so if this index was sitting in front of us, we could just flip to the specific user_id, then look at the section with all game_type=1 and then just pick the last value to find the maximum. Very very fast. Same for the database. It can find this value extremely fast, which is why you saw an 80%+ reduction in your overall query time.
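The "flip to the right section and read the last entry" step can be sketched in plain Python, treating the index as a sorted list of (user_id, game_type, question_answer_datetime) tuples. The rows and the function name are invented for illustration, and a real B-tree is more elaborate than a sorted list, but the lookup logic is the same idea.

```python
from bisect import bisect_left

# Hypothetical index entries, kept sorted the way the composite index would be.
index_entries = sorted([
    (7, 1, "2016-03-01 10:00:00"),
    (7, 1, "2016-04-15 18:30:00"),
    (7, 2, "2016-02-20 09:00:00"),
    (8, 1, "2016-01-05 12:00:00"),
    (7, 1, "2016-01-12 08:45:00"),
])


def max_answer_datetime(entries, user_id, game_type):
    """Return the latest datetime for (user_id, game_type), or None if absent.

    Two binary searches locate the start and end of the matching section;
    the maximum is simply the last entry in that section.
    """
    start = bisect_left(entries, (user_id, game_type))
    end = bisect_left(entries, (user_id, game_type + 1))
    if start == end:
        return None  # no entries for this user and game type
    return entries[end - 1][2]


print(max_answer_datetime(index_entries, 7, 1))
print(max_answer_datetime(index_entries, 8, 2))
```

Note that no entries outside the (user_id, game_type) section are ever inspected, which is why the lookup stays fast as the table grows.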
So, that's how indexes work, and why I chose these indexes as I did.
Be aware, that the more indexes you have, the more you'll see slowdowns when doing inserts and updates. But, if you are reading a lot more from your tables than you are writing, this is usually a more than acceptable trade off.
So, give these changes a shot, and let me know how it performs. Please provide the new EXPLAIN plan if you want further optimization help. Also, this should give you quite a few tools to use, with trial and error, to see what works and what doesn't. All my changes are fairly independent of each other, so you can swap them in and out with your original query pieces to see how each one works.
I have two tables with structure that can be described as follows:
CREATE TABLE `sub_schedule` (
`ScheduleID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`ServiceID` bigint(20) unsigned NOT NULL,
`RunTime` time NOT NULL,
`Status` char(1) NOT NULL DEFAULT 'A',
`Telco` text,
PRIMARY KEY (`ScheduleID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `mt` (
`MtID` int(15) unsigned NOT NULL AUTO_INCREMENT,
`ServiceID` bigint(20) unsigned NOT NULL,
`Moperator` varchar(10) NOT NULL,
`Cmd` varchar(20) NOT NULL,
`CreateDate` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`MtID`),
KEY `CreateDate` (`CreateDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The mt table is a big one, and the sub_schedule table holds fewer than 500 records.
When I try to run this query:
EXPLAIN
SELECT m.serviceid
, m.createdate
, m.moperator
FROM mt m
JOIN sub_schedule ss
ON m.serviceid = ss.serviceid
AND ss.status = "A"
AND ss.telco LIKE CONCAT('%', m.moperator, '%')
WHERE m.createdate >= addtime((subdate(curdate(), 1)),ss.runtime)
AND m.createdate <= addtime((subdate(curdate(), 0)),ss.runtime)
AND m.cmd LIKE "SUBS%";
It produces this output:
id, select_type, table, type, possible_keys, key , key_len, ref , rows , Extra
1, SIMPLE , ss , ALL , NULL , NULL , NULL , NULL , 470 , Using where
1, SIMPLE , m , ALL , CreateDate , NULL , NULL , NULL , 57610462 , Range checked for each record (index map: 0x10)
It seems the query doesn't use the index on createdate, which results in a very long execution time. I have already tried FORCE INDEX, several different ways of writing the query, moving the createdate conditions into the ON clause, and FORCE INDEX FOR JOIN.
So my question is: is there any way to make MySQL actually use the index on the createdate field?
There is no index on mt.serviceid, therefore all rows from mt must be scanned to satisfy the condition m.serviceid = ss.serviceid. The other conditions on this table can and will be checked during the full table scan.
Add an index on mt(serviceid, createdate) or mt(createdate, serviceid) (not sure which one will yield best results).
Note: if you decide to go with an index on mt(createdate, serviceid), then the index on mt(createdate) becomes superfluous.
Here's the query:
SELECT COUNT(*) AS c, MAX(`followers_count`) AS max_fc,
MIN(`followers_count`) AS min_fc, MAX(`following_count`) AS max_fgc,
MIN(`following_count`) AS min_fgc, SUM(`followers_count`) AS fc,
SUM(`following_count`) AS fgc, MAX(`updates_count`) AS max_uc,
MIN(`updates_count`) AS min_uc, SUM(`updates_count`) AS uc
FROM `profiles`
WHERE `twitter_id` IN (SELECT `followed_by`
FROM `relations`
WHERE `twitter_id` = 123);
The two tables are profiles and relations. Both have over 1,000,000 rows, InnoDB engine. Both have indexes on twitter_id, relations has an extra index on (twitter_id, followed_by). The query is taking over 6 seconds to execute, this really frustrates me. I know that I can JOIN this somehow, but my MySQL knowledge is not so cool, that's why I'm asking for your help.
Thanks in advance everyone =)
Cheers,
K ~
Updated
Okay, I managed to get it down to 2.5 seconds. I used INNER JOIN and added the three index pairs. Here are the EXPLAIN results:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'SIMPLE', 'r', 'ref', 'relation', 'relation', '4', 'const', 252310, 'Using index'
1, 'SIMPLE', 'p', 'ref', 'PRIMARY,twiter_id,id_fc,id_fgc,id_uc', 'id_uc', '4', 'follerme.r.followed_by', 1, ''
Hope this helps.
Another update
Here are the SHOW CREATE TABLE statements for both tables:
CREATE TABLE `profiles` (
`twitter_id` int(10) unsigned NOT NULL,
`screen_name` varchar(45) NOT NULL default '',
`followers_count` int(10) unsigned default NULL,
`following_count` int(10) unsigned default NULL,
`updates_count` int(10) unsigned default NULL,
`location` varchar(45) default NULL,
`bio` varchar(160) default NULL,
`url` varchar(255) default NULL,
`image` varchar(255) default NULL,
`registered` int(10) unsigned default NULL,
`timestamp` int(10) unsigned default NULL,
`relations_timestamp` int(10) unsigned default NULL,
PRIMARY KEY USING BTREE (`twitter_id`,`screen_name`),
KEY `twiter_id` (`twitter_id`),
KEY `screen_name` USING BTREE (`screen_name`,`twitter_id`),
KEY `id_fc` (`twitter_id`,`followers_count`),
KEY `id_fgc` (`twitter_id`,`following_count`),
KEY `id_uc` (`twitter_id`,`updates_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `relations` (
`id` int(10) unsigned NOT NULL auto_increment,
`twitter_id` int(10) unsigned NOT NULL default '0',
`followed_by` int(10) unsigned default NULL,
`timestamp` int(10) unsigned default NULL,
PRIMARY KEY USING BTREE (`id`,`twitter_id`),
UNIQUE KEY `relation` (`twitter_id`,`followed_by`)
) ENGINE=InnoDB AUTO_INCREMENT=1209557 DEFAULT CHARSET=utf8
Wow, what a mess =) Sorry!
A join would look something like this:
SELECT COUNT(*) AS c,
MAX(p.`followers_count`) AS max_fc,
MIN(p.`followers_count`) AS min_fc,
MAX(p.`following_count`) AS max_fgc,
MIN(p.`following_count`) AS min_fgc,
SUM(p.`followers_count`) AS fc,
SUM(p.`following_count`) AS fgc,
MAX(p.`updates_count`) AS max_uc,
MIN(p.`updates_count`) AS min_uc,
SUM(p.`updates_count`) AS uc
FROM `profiles` AS p
INNER JOIN `relations` AS r ON p.`twitter_id` = r.`followed_by`
WHERE r.`twitter_id` = 123;
To help optimize it you should run EXPLAIN SELECT ... on both queries.
Create the following composite indexes:
profiles (twitter_id, followers_count)
profiles (twitter_id, following_count)
profiles (twitter_id, updates_count)
and post the query plan, for God's sake.
By the way, how many rows does this COUNT(*) return?
Update:
Your table rows are quite long. Create a composite index on all the fields you select:
profiles (twitter_id, followers_count, following_count, updates_count)
so that the JOIN query can retrieve all the values it needs from that index.
SELECT COUNT(*) AS c,
MAX(`followers_count`) AS max_fc, MIN(`followers_count`) AS min_fc,
MAX(`following_count`) AS max_fgc, MIN(`following_count`) AS min_fgc,
SUM(`followers_count`) AS fc, SUM(`following_count`) AS fgc,
MAX(`updates_count`) AS max_uc, MIN(`updates_count`) AS min_uc, SUM(`updates_count`) AS uc
FROM `profiles`
JOIN `relations`
ON (profiles.twitter_id = relations.followed_by)
WHERE relations.twitter_id = 123;
might be a bit faster, but you'll need to measure and check if that is indeed so.
COUNT(*) can be an expensive operation under the InnoDB engine, which keeps no exact row count and has to count the matching rows each time. Have you tried this query without that piece? If it accounts for most of the processing time, you could keep a running value instead of querying for it each time.
I'd approach this problem from a programmer's angle: I'd have a separate table (or some other storage area) holding the max, min, and sum values for each field in your original query, and update those values every time a record is added or updated (although deletes may be problematic if not handled correctly).
After the original query to populate these values is complete (which is almost the same as the query you posted), you're essentially reducing your final query to fetching one row from a table, rather than computing everything all at once.
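A minimal sketch of the running-values idea in Python, assuming one such record per (twitter_id, field) pair. The class and method names are hypothetical, and in practice this state would live in a summary table maintained by the application or by triggers. It also shows why deletes are the awkward case: count and sum adjust directly, but a deleted minimum or maximum forces a rescan of what remains.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunningStats:
    """Incrementally maintained COUNT / SUM / MIN / MAX for one column."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, value: int) -> None:
        """Fold a newly inserted value into the aggregates in O(1)."""
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def remove(self, value: int, remaining: Iterable[int]) -> None:
        """Handle a delete; `remaining` is the values left after the delete.

        Count and sum can be adjusted directly. Min and max only need
        recomputing when the deleted value was one of the extremes.
        """
        self.count -= 1
        self.total -= value
        if value == self.minimum or value == self.maximum:
            left = list(remaining)
            self.minimum = min(left) if left else None
            self.maximum = max(left) if left else None


stats = RunningStats()
for followers in (10, 3, 25):  # invented followers_count values
    stats.add(followers)
print(stats)
```

An update to an existing row can be treated as a remove of the old value followed by an add of the new one.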