SELECT with WHERE IN and subquery extremely slow - mysql

I want to exectute the following query:
SELECT *
FROM `bm_tracking`
WHERE `oid` IN
(SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid` HAVING COUNT(*) >1)
The subquery:
SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid`
HAVING COUNT( * ) >1
executes in 0.0525 secs
The whole query "stucks" (still processing after 3 minutes...). Column oid is indexed.
Table bm_tracking contains around 64k rows.
What could be the reason for this "stuck"?
[Edit: Upon request]
CREATE TABLE `bm_tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`oid` varchar(10) NOT NULL,
`trk_main` varchar(50) NOT NULL,
`tracking` varchar(50) NOT NULL,
`label` text NOT NULL,
`void` int(11) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `oid` (`oid`),
KEY `trk_main` (`trk_main`),
KEY `tracking` (`tracking`),
KEY `created` (`created`)
) ENGINE=MyISAM AUTO_INCREMENT=63331 DEFAULT CHARSET=latin1
[Execution Plan]

Generally exists EXISTS faster than IN so you can try this and see if it executes better for you
SELECT *
FROM `bm_tracking` bt
WHERE EXISTS
( SELECT 1
FROM `bm_tracking` bt1
WHERE bt.oid = bt1.oid
GROUP BY `oid`
HAVING COUNT(*) >1
)
EDIT:
if you notice from the EXPLAIN you posted... the IN() is considered as a DEPENDENT SUBQUERY which is a correlated subquery... meaning that for every row in the table all rows in the table are pulled and compared... so for example 1000 rows in the table would mean 1000 * 1000 = 1 million comparisons -- thats why its taking such a long time

Related

performance issue when joining two large tables

I have a multilingual CMS that uses a translation table (70k rows) that contains all of the texts
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `key` (`key`,`lang`)
) ENGINE=MyISAM
and products table (4k rows) containing products with translation keys
CREATE TABLE IF NOT EXISTS `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name_trans_id` int(11) NOT NULL,
`desc_trans_id` int(11) DEFAULT NULL,
`text_trans_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name_trans_id`),
KEY `desc_index` (`desc_trans_id`),
KEY `text_index` (`text_trans_id`)
) ENGINE=MyISAM
now i need to get top 20 products in alphabetical order, to do that i use this query :
SELECT
SQL_CALC_FOUND_ROWS
dt_table.* ,
t_name.value as 'name'
FROM
products as dt_table
LEFT JOIN
`translations` as t_name on dt_table.name_trans_id = t_name.key
WHERE
(t_name.lang = 1 OR t_name.lang is null)
ORDER BY
name ASC LIMIT 0, 20
It takes forever.
Any help optimizing this query/tables will be appreciated.
Thank you.
Try to change your structure of translations table to:
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL DEFAULT 0,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `lang` (`lang`),
KEY `key` (`key`,`lang`),
FULLTEXT idx (`value`)
) ENGINE=InnoDB;
because you really need lang to be indexed as soon as you use it in WHERE clause.
And try to change your query a little bit:
SELECT
dt_table.* ,
t_name.value as 'name',
SUBSTR(t_name.value,0,100) as text_order
FROM
products as dt_table
LEFT JOIN (
SELECT key, value FROM `translations`
WHERE lang = 1 OR lang is null
) as t_name
ON dt_table.name_trans_id = t_name.key
ORDER BY
text_order ASC LIMIT 0, 20
and if you really need SQL_CALC_FOUND_ROWS (I don't understand why do you need counter for translations items)
you can run another query just right after the first one:
SELECT COUNT(*) FROM products;
I am pretty sure you will be surprised with performance :-)

select count, group by and having optimization

I have this query
SELECT
t2.counter_id,
t2.hash_counter,
count(1) AS cnt
FROM
table1 t1
RIGHT JOIN
table2 t2 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t1.counter_id
HAVING
cnt < 8000
Here are the tables.
CREATE TABLE `table1` (
`id` varchar(255) NOT NULL,
`platform` varchar(32) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`edition` varchar(2) NOT NULL DEFAULT 'us',
`counter_id` int(11) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `counter_id` (`counter_id`)
) ENGINE=InnoDB
CREATE TABLE `table2` (
`counter_id` int(11) NOT NULL AUTO_INCREMENT,
`hash_id` int(11) DEFAULT NULL,
`hash_counter` int(11) DEFAULT NULL,
PRIMARY KEY (`counter_id`),
UNIQUE KEY `counter_key` (`hash_id`,`hash_counter`)
) ENGINE=InnoDB
The "EXPLAIN" shows "Using index; Using temporary; Using filesort" for table t2. Is there any way to get rid off temporary/filesort ? or any other ideas about optimizing this guy.
Your comment above gives more insight into what you want. It is always better to explain more about what you are trying to achieve - just looking at the non-working SQL leads people down the wrong path.
So, you want to know which table2 rows have < 8000 table1 rows?
Why not this:
select *
from table2 as t2
where hash_id = 973
and 8000 < (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id)
;

Mysql optimize slow query with explain

I'm working on MySQL 5.5.29-0ubuntu0.12.04.1.
I have the need to create a query that can sort results by date and by a score.
I read the documentation and the posts here on stackoverflow (specifically this) about how to optimize a query but I'm still struggling to do it well.
The key findings is that to avoid the use of a temporary table the ORDER BY or GROUP BY must contains only columns from the first table in the join queue, so that's why the use of the STRAIGHT_JOIN clause and the two slightly different queries.
To avoid confusion, I'm going to assign a number to various query configuration:
order by date with STRAIGHT_JOIN clause
order by score with STRAIGHT_JOIN clause
order by date without STRAIGHT_JOIN clause
order by score without STRAIGHT_JOIN clause
Following is query 1, takes about 2.5 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM item
INNER JOIN score ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY zen_time DESC
LIMIT 0, 10
Following is query 2 (first join tables are inverted and the ordering column is different), takes only about 0.01 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM score
INNER JOIN item ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY score DESC
LIMIT 0, 10
Following are the EXPLAIN results for the queries.
Explain for query 1:
Explain for query 2:
Explain for query 3:
Explain for query 4:
Profiler result for query 1:
Profiler result for query 2:
Profiler result for query 3:
Profiler result for query 4:
Following are tables definitions:
CREATE TABLE `doc` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`md5` char(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`title` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `Title_url_index` (`title`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` bigint(20) unsigned NOT NULL,
`url_id` bigint(20) unsigned DEFAULT NULL,
`md5` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`),
KEY `Zen_time_index` (`zen_time`),
KEY `Feed_index` (`feed_id`),
KEY `Url_index` (`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `score` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
`score` float DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`),
KEY Score_index (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `star` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `unseen` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `url` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`doc_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY Doc_index (`doc_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_Email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`feed_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `User_feed_index` (`user_id`,`feed_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here are the row counts for the tables involved in the query:
Score: 68657
Item: 197602
Url: 198354
Doc: 186113
Feed: 754
User_feed: 721
Star: 0
Unseen: 150762
Which approach should I take since my program needs to be able to order results both by zen_time and score in the fastest way possible?
Due to the different query speeds I decided to make an even more accurate analysis based on the various results I want to achieve.
The result sets I need are four:
Select all the items from a specific feed, order them by SCORE.score (intelligent order)
Select all the items from a specific feed, order them by ITEM.zen_time (time order)
Select all the items, order them by SCORE.score (intelligent order)
Select all the items, order them by ITEM.zen_time (time order)
The query so has to be adapted to those conditions, and its variable parts are:
STRAIGHT_JOIN yes/no
First JOIN table score/item
WHERE condition on specific feed yes/no
ORDER BY score/zen_time
All of the tests have been executed with the SELECT SQL_NO_CACHE instruction.
Following are the results:
Now it's clear what I have to do:
No STRAIGHT_JOIN, first JOIN table SCORE
No STRAIGHT_JOIN, first JOIN table SCORE
STRAIGHT_JOIN (I did beat MySQL engine here :D ), first JOIN table SCORE
STRAIGHT_JOIN (I did beat MySQL engine here :D ), first JOIN table ITEM

simple select count query is taking too long to execute

Below is the table structure. MySql table contains 11 million data. All the data may be inserted into the table in one day only.
I tried executing the below query and it took 5-7 mins to finish. I hav created the index based on the query and even it is taking 3-5 mins to finish.Index is being used.
What should i do to increase the query performance ? any suggestions ?
CREATE TABLE `service_collection_data` (
`service_name` varchar(512) NOT NULL,
`device_id` bigint(20) NOT NULL,
`dataset_name` varchar(1024) NOT NULL,
`dataset_title` varchar(1024) NOT NULL,
`dataset_type` varchar(32) NOT NULL,
`dataset_command` varchar(4000) DEFAULT NULL,
`tag_name` varchar(4000) DEFAULT NULL,
`command_status` varchar(256) NOT NULL,
`command_result` longtext,
`command_error` longtext,
`command_context` longtext,
`collection_time` timestamp NOT NULL
DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY `device_id` (`device_id`),
KEY `service_tag_command_index` (`service_name`(300),`tag_name`(200),
`command_status`(200),`device_id`),
CONSTRAINT `service_collection_data_ibfk_1`
FOREIGN KEY (`device_id`)
REFERENCES `nodes` (`id`) ON DELETE CASCADE
)
ENGINE=InnoDB DEFAULT CHARSET=latin1
Below is query is executed
SELECT COUNT(DISTINCT device_id)
FROM service_collection_data
WHERE service_name='NOS'
AND tag_name='Config'
AND command_status = 'Successful'
As debugging scenarios, what is the speed of:
1)
SELECT COUNT(device_id)
FROM service_collection_data
2)
SELECT COUNT(DISTINCT device_id)
FROM service_collection_data
3)
SELECT COUNT(*)
FROM service_collection_data
WHERE service_name='NOS'
AND tag_name='Config'
AND command_status = 'Successful'
4)
SELECT COUNT(DISTINCT device_id)
FROM
(SELECT service_name, tag_name, command_status, device_id
FROM service_collection_data
GROUP BY service_name, tag_name, command_status, device_id
WHERE service_name='NOS'
AND tag_name='Config'
AND command_status = 'Successful')
Other:
Can device_id be specified as primary key (thus guaranteeing a distinct value)?
Can service_name, tag_name, and command_status be changed to integer foreign keys?

MySQL optimize count query

I've got a question about MySQL performance.
These are my tables:
(about 140.000 records)
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`label` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`intro` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`date` int(11) NOT NULL,
`active` int(1) NOT NULL,
`language_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`indexed` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=132911 ;
(about 400.000 records)
CREATE TABLE IF NOT EXISTS `article_category` (
`article_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
RUNNING THIS COUNT QUERY:
SELECT SQL_NO_CACHE COUNT(id) as total
FROM (`article`)
LEFT JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
WHERE `article`.`language_id` = 1
AND `article_category`.`category_id` = '<catid>'
This query takes a lot of resources, so I am wondering how to optimize this query.
After executing it's beeing cached, so after the first run I am fine.
RUNNING THE EXPLAIN FUNCTION:
AFTER CREATING AN INDEX:
ALTER TABLE `article_category` ADD INDEX ( `article_id` , `category_id` ) ;
After adding indexes and changing LEFT JOIN to JOIN the query runs alot faster!
Thanks for these fast replys :)
QUERY I USE NOW (I removed the language_id because it was not that neccesary):
SELECT COUNT(id) as total
FROM (`article`)
JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
AND `article_category`.`category_id` = '<catid>'
I've read something about forcing an index, but I think thats not neccesary anymore because the tables are already indexed, right?
Thanks alot!
Martijn
You haven't created necessary index on the table
Table article_category - Create a compound index on (article_id, category_id)
Table article -Create a compound index on (id, language_id)
If this doesn't help post the explain statement.
The columns used in a JOIN condition should have an index, so you need to index article_id.