I want to reduce the time taken by a query in MySQL.
There are three tables, say:
A ~600K rows,
B ~2K rows,
C ~100K rows,
each having 2 columns.
A has one column that is used in aggregation and another to join with table B.
B has one column to join with A and another to join with C.
C has one column to join with B and another column to group by.
What should the indexing plan be to reduce the run time? As of now the query uses a temporary table and then a filesort. Is there any way to avoid the temporary table?
Sample query :
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
`category_groups` AS `category_groups`,
`revenue_facts` AS `revenue_facts`,
`dim_products` AS `dim_products`
WHERE
`dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk` AND
`revenue_facts`.`product_sk` = `dim_products`.`product_sk`
GROUP BY `category_groups`.`category_name`;
I already have indexes on the GROUP BY column and the join columns.
My query is currently taking *6 minutes*. I want to reduce the time taken. The table structures are as follows:
table A :
CREATE TABLE `revenue_facts` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_sk` bigint(20) unsigned NOT NULL,
`total_price` decimal(12,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `product_sk` (`product_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table B :
CREATE TABLE `dim_products` (
`product_sk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_category_group_sk` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`product_sk`),
KEY `product_category_group_index` (`product_category_group_sk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
table C :
CREATE TABLE `category_groups` (
`product_category_group_sk` bigint(20) unsigned NOT NULL,
`category_sk` bigint(20) unsigned NOT NULL,
`category_name` varchar(255) NOT NULL,
PRIMARY KEY (`product_category_group_sk`,`category_sk`),
KEY `category_sk` (`category_sk`),
KEY `product_category_group_sk` (`product_category_group_sk`),
KEY `category_name` (`category_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The execution plan is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE dim_products index PRIMARY,product_category_group_index product_category_group_index 8 NULL 651264 Using index; Using temporary; Using filesort
1 SIMPLE category_groups ref PRIMARY,category_sk,product_category_group_sk,category_name product_category_group_sk 8 etl_testing.dim_products.product_category_group_sk 4 Using index
1 SIMPLE revenue_facts ref product_sk product_sk 8 etl_testing.dim_products.product_sk 5 NULL
Try this:
SELECT
sum(`revenue_facts`.`total_price`) AS `m0`
FROM
(`dim_products` LEFT JOIN `category_groups` ON `dim_products`.`product_category_group_sk` = `category_groups`.`product_category_group_sk`)
LEFT JOIN `revenue_facts` ON `dim_products`.`product_sk` = `revenue_facts`.`product_sk`
GROUP BY `category_groups`.`category_name`;
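On the indexing side, a plan that might let each table be read from an index alone is sketched below; it assumes the schema above, and the idx_* names are mine, not from the original post.
-- Covering index: SUM(total_price) can be computed from the index
-- without touching the clustered rows of the ~600K-row fact table.
ALTER TABLE revenue_facts ADD INDEX idx_product_price (product_sk, total_price);
-- Lets the optimizer go from category group to product using only the index.
ALTER TABLE dim_products ADD INDEX idx_group_product (product_category_group_sk, product_sk);
-- Keeps the GROUP BY column next to the join key.
ALTER TABLE category_groups ADD INDEX idx_group_name (product_category_group_sk, category_name);
Whether the temporary table disappears still depends on the join order the optimizer picks: the GROUP BY column lives in category_groups, so grouping can follow index order only if category_groups is read first through an index on category_name.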
Also, as Abdul said:
"Post your table structures and explain plan"
Related
Sorry for the long post, but this is really strange and I am close to giving up. Two tables:
CREATE TABLE `endu_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
`base_yob` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_results_206a6355` (`base_name`),
KEY `endu_results_63df4402` (`base_nr`),
KEY `base_yob` (`base_yob`)
) ENGINE=InnoDB AUTO_INCREMENT=3424028 DEFAULT CHARSET=utf8;
and 2nd:
CREATE TABLE `endu_resultinterest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `endu_resultinterest_3b529087` (`result_id`),
CONSTRAINT `result_id_refs_id_19e24435` FOREIGN KEY (`result_id`) REFERENCES `endu_results` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=48590 DEFAULT CHARSET=utf8;
There are about 2 million records in endu_results and fewer than 100K in endu_resultinterest. I have a slow query:
explain select base_yob from endu_resultinterest
inner join endu_results
on (endu_results.id = endu_resultinterest.result_id)
order by endu_results.base_yob;
1 SIMPLE endu_resultinterest index endu_resultinterest_3b529087 endu_resultinterest_3b529087 4 NULL 47559 Using index; Using temporary; Using filesort
The question is: why is MySQL using this index, endu_resultinterest_3b529087, when it should use base_yob, the column the sort is requested on?
To test it further I manually created 2 additional identical tables, endu_testresults and endu_testresultinterest, and filled them with some records:
CREATE TABLE `endu_testresults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_yob` int(11) DEFAULT NULL,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_testresults_a65b2616` (`base_yob`),
KEY `endu_testresults_ba0ab39c` (`base_name`),
KEY `endu_testresults_d75ba04d` (`base_nr`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8;
So I run EXPLAIN again:
explain select base_yob from endu_testresultinterest
inner join endu_testresults
on (endu_testresults.id = endu_testresultinterest.result_id)
order by endu_testresults.base_yob;
and surprise, surprise:
1 SIMPLE endu_testresults index PRIMARY endu_testresults_a65b2616 5 NULL 19 Using index
The index on the sort column base_yob (endu_testresults_a65b2616) is now used.
Why is the index used in one case, while in the other I get 'Using temporary; Using filesort'? Does size matter? I will try copying records from one table to the other, but I do not get it with these indexes. MySQL is 5.6.16.
Short answer: Because it is faster.
Long answer...
Your EXPLAINs seem to be incomplete -- I would expect 2 lines in each.
The first table is 20 (70?) times as big as the second. The optimizer picked the smaller table to start with. Hence it is initially doing 1/20th the amount of work. The sort that comes later (ORDER BY ...) is much less work than if it had to do 20 times as much work to start with.
The output is only 48K rows, correct? And that is how many rows in the 2nd table, correct?
Your test tables did not have the same bigger/smaller ratio, did they? Hence the different EXPLAIN.
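If you want to see the alternative plan for comparison, you can pin the join order and the index; this is just a sketch (the r/ri aliases are mine), and whether it is actually faster depends on your data.
-- STRAIGHT_JOIN makes endu_results the driving table and FORCE INDEX makes it
-- read rows through the base_yob index, so they come out pre-sorted and no
-- filesort is needed. The trade-off is scanning ~2M index entries instead of
-- ~48K, which is exactly why the optimizer preferred the other plan.
EXPLAIN
SELECT STRAIGHT_JOIN r.base_yob
FROM endu_results r FORCE INDEX (base_yob)
INNER JOIN endu_resultinterest ri ON ri.result_id = r.id
ORDER BY r.base_yob;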
I have a query that takes 50 seconds:
SELECT `security_tasks`.`itemid` AS `itemid`
FROM `security_tasks`
INNER JOIN `relations` ON (`relations`.`user_id` = `security_tasks`.`user_id` AND `relations`.`relation_type_id` = `security_tasks`.`relation_type_id` AND `relations`.`relation_with` = 3001 )
Records in security_tasks = 841321 || Records in relations = 234254
CREATE TABLE `security_tasks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`itemid` int(11) DEFAULT NULL,
`relation_type_id` int(11) DEFAULT NULL,
`Task_id` int(2) DEFAULT '0',
`job_id` int(2) DEFAULT '0',
`task_type_id` int(2) DEFAULT '0',
`name` int(2) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `itemid` (`itemid`),
KEY `relation_type_id` (`relation_type_id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822995 DEFAULT CHARSET=utf8;
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`relation_with` int(11) DEFAULT NULL,
`relation_type_id` int(11) DEFAULT NULL,
`manager_level` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `relation_with` (`relation_with`),
KEY `relation_type_id` (`relation_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1082882 DEFAULT CHARSET=utf8;
What can I do to make it fast, like 1 or 2 seconds fast?
EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE relations ref user_id,relation_with,relation_type_id relation_with 5 const 169 Using where
1 SIMPLE security_tasks ref relation_type_id,user_id user_id 5 transparent.relations.user_id 569 Using where
UPDATE:
Adding a composite key reduced the time to 20 seconds:
ALTER TABLE security_tasks ADD INDEX (user_id, relation_type_id);
ALTER TABLE relations ADD INDEX (user_id, relation_type_id);
ALTER TABLE relations ADD INDEX (relation_with);
The problem is when the relations table has a lot of rows for the selected user (`relations`.`relation_with` = 3001).
Any ideas?
Adjust your compound index slightly: don't include just two parts, but all three:
ALTER TABLE relations ADD INDEX (user_id, relation_type_id, relation_with)
The index does not have to be only on the joined columns; it SHOULD be based on the joined columns PLUS anything else that makes sense as query criteria (within reason; it takes time to learn where the extra efficiencies are). So, in the case suggested, you know the join is on the user and type, but the query is also specific about the relation_with value... so that is added to the same index.
Additionally, on your security_tasks table, you could add itemid to the index to make it a covering index (i.e. one that covers the join conditions AND the data element(s) you want to retrieve). This too is a technique, and should NOT pull in every other element of a query, but since it is a single column it might make sense for your scenario. So, look into "covering indexes": in essence, a covering index qualifies the join, but since it also holds itemid, the engine does not have to go back to the raw data pages of the entire security_tasks table to get that one column. It is part of the index, so it comes along for the ride and you are done.
ALTER TABLE security_tasks ADD INDEX (user_id, relation_type_id, itemid) ;
And for readability purposes, especially with long table names, it's good to use aliases
SELECT
st.itemid
FROM
security_tasks st
INNER JOIN relations r
ON st.user_id = r.user_id
AND st.relation_type_id = r.relation_type_id
AND r.relation_with = 3001
I'm struggling to understand whether I've indexed this query properly; it's somewhat slow and I feel it could use optimization. MySQL 5.1.70.
select snaps.id, snaps.userid, snaps.ins_time, usr.gender
from usersnaps as snaps
join user as usr on usr.id = snaps.userid
left join user_convert as conv on snaps.userid = conv.userid
where (conv.level is null or conv.level = 4) and snaps.active = 'N'
and (usr.status = "unfilled" or usr.status = "unapproved") and usr.active = 1
order by snaps.ins_time asc
usersnaps table (irrelevant data removed, size about 250K records):
CREATE TABLE IF NOT EXISTS `usersnaps` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` int(11) unsigned NOT NULL DEFAULT '0',
`picture` varchar(250) NOT NULL,
`active` enum('N','Y') NOT NULL DEFAULT 'N',
`ins_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`userid`),
KEY `userid` (`userid`,`active`),
KEY `ins_time` (`ins_time`),
KEY `active` (`active`)
) ENGINE=InnoDB;
user table (irrelevant data removed, size about 300K records):
CREATE TABLE IF NOT EXISTS `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) NOT NULL DEFAULT '1',
`status` enum('15','active','approval','suspended','unapproved','unfilled','rejected','suspended_auto','incomplete') NOT NULL DEFAULT 'approval',
PRIMARY KEY (`id`),
KEY `status` (`status`,`active`)
) ENGINE=InnoDB;
user_convert table (size about 60K records):
CREATE TABLE IF NOT EXISTS `user_convert` (
`userid` int(10) unsigned NOT NULL,
`level` tinyint(4) NOT NULL,
UNIQUE KEY `userid` (`userid`),
KEY `level` (`level`)
) ENGINE=InnoDB;
EXPLAIN EXTENDED returns:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE snaps ref userid,default_pic,active active 1 const 65248 100.00 Using where; Using filesort
1 SIMPLE usr eq_ref PRIMARY,active,status PRIMARY 4 snaps.userid 1 100.00 Using where
1 SIMPLE conv eq_ref userid userid 4 snaps.userid 1 100.00 Using where
Using filesort is probably your performance killer.
You need the records from usersnaps where active = 'N' and you need them sorted by ins_time.
ALTER TABLE usersnaps ADD KEY active_ins_time (active,ins_time);
Indexes are stored in sorted order, and read in sorted order... so if the optimizer chooses that index, it will go for the records with active = 'N' and -- hey, look at that -- they're already sorted by ins_time -- because of that index. So as it reads the rows referenced by the index, the result-set internally is already in the order you want it to ORDER BY, and the optimizer should realize this... no filesort required.
I would recommend changing the userid index (assuming you're not using it right now) to have active first and userid later.
That should make it more useful for this query.
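A sketch of that change, assuming the current userid index is not needed elsewhere in its existing column order (the key name active_userid is mine):
-- Reorder the composite key so the equality filter on active leads and the
-- join column userid follows; drop the old (userid, active) key it replaces.
ALTER TABLE usersnaps
DROP KEY userid,
ADD KEY active_userid (active, userid);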
I have two MYSQL tables, A and B. Table A has 44,902 rows and table B has 109,583 rows.
I would like to compare two columns from the two tables and return the rows from table A where a match is found. My unsuccessful queries are:
SELECT pool.domain_name FROM `pool`, `en_dict` WHERE pool.domain_string = en_dict.word
and another variant:
SELECT a.domain_name FROM `pool` as a inner join en_dict as b on a.domain_string = b.word
Both queries failed to return any values in under 300 seconds.
What should I do to reduce the time needed to find the matches?
P.S. I have tried adding a LIMIT at the end of the queries and managed to display 10 results in 245 seconds.
Edit: My table structures are as follows:
--
-- Table structure for table `en_dict`
--
CREATE TABLE `en_dict` (
`word_id` bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
`word` varchar(35) NOT NULL,
PRIMARY KEY (`word_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=109584 ;
-- --------------------------------------------------------
--
-- Table structure for table `pool`
--
CREATE TABLE `pool` (
`domain_id` bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
`domain_name` varchar(100) NOT NULL,
`domain_tld` varchar(10) NOT NULL,
`domain_string` varchar(90) NOT NULL,
`domain_lenght` int(2) NOT NULL,
`domain_expiretime` date NOT NULL,
PRIMARY KEY (`domain_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=44903 ;
Try adding an index on the relevant columns of your tables:
ALTER TABLE `pool` ADD INDEX `domain_string_idx` (`domain_string`);
ALTER TABLE `en_dict` ADD INDEX `word_idx` (`word`);
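With those indexes in place, the join should become an index lookup per row of pool rather than a full scan of en_dict. A quick check (just a sketch) is to EXPLAIN the join form of the query:
-- The en_dict row of the output should now show type=ref with key=word_idx;
-- if it still shows ALL, confirm both columns use the same character set and collation.
EXPLAIN
SELECT a.domain_name
FROM `pool` AS a
INNER JOIN `en_dict` AS b ON a.domain_string = b.word;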
To start out, here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, and tbl_2 contains approx 5,000,000 rows. I know this data shouldn't be considered that large, given that Google, Yahoo!, etc. use much larger datasets, so I'm just assuming that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The query of interest is below; also, instead of ORDER BY RAND(), I ORDER BY t.id DESC. The query takes as much as 5-10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just liked to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.
SUGGESTION : Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);
As per Rolando, yes, you definitely need an index on the map table, but I would expand it to ALSO include tbl_2_id, which is what your ORDER BY clause sorts on (Table 2's ID, which is also present in the map table, so just use that index). Also, since the index now holds all 3 fields, and is based on the ID being searched for and the null-or-not criterion on rating, the 3rd element already has the rows in order for your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200