select count, group by and having optimization - mysql

I have this query
SELECT
t2.counter_id,
t2.hash_counter,
count(1) AS cnt
FROM
table1 t1
RIGHT JOIN
table2 t2 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t1.counter_id
HAVING
cnt < 8000
Here are the tables.
CREATE TABLE `table1` (
`id` varchar(255) NOT NULL,
`platform` varchar(32) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`edition` varchar(2) NOT NULL DEFAULT 'us',
`counter_id` int(11) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `counter_id` (`counter_id`)
) ENGINE=InnoDB
CREATE TABLE `table2` (
`counter_id` int(11) NOT NULL AUTO_INCREMENT,
`hash_id` int(11) DEFAULT NULL,
`hash_counter` int(11) DEFAULT NULL,
PRIMARY KEY (`counter_id`),
UNIQUE KEY `counter_key` (`hash_id`,`hash_counter`)
) ENGINE=InnoDB
The EXPLAIN shows "Using index; Using temporary; Using filesort" for table t2. Is there any way to get rid of the temporary/filesort? Or any other ideas for optimizing this query?

Your comment above gives more insight into what you want. It is always better to explain more about what you are trying to achieve - just looking at the non-working SQL leads people down the wrong path.
So, you want to know which table2 rows have < 8000 table1 rows?
Why not this:
select *
from table2 as t2
where hash_id = 973
and (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id) < 8000
;
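If you also need the count in the output (as in your original SELECT list), the same correlated count can be moved into the SELECT list and filtered in an outer query; a sketch along those lines:
select counter_id, hash_counter, cnt
from (
    select t2.counter_id,
           t2.hash_counter,
           (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id) as cnt
    from table2 as t2
    where t2.hash_id = 973
) as x
where cnt < 8000;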

Related

MySQL query optimization for extremely slow query

I'm a web developer and I'm posting for the first time on SO.
Today I'm asking for your help because I have already tried all the possibilities with no luck.
I created a SaaS web application that is used by salesmen in the field; it includes an offline version so users don't need to be connected to use it.
As the database is getting bigger, the queries are taking more and more time to execute.
Today I'm facing a big issue where the query leads to a timeout when the user tries to display its results.
So, here's the dump:
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
START TRANSACTION;
SET time_zone = "+00:00";
-- 86 rows
CREATE TABLE t2 (
id_t2 int(11) NOT NULL,
quantite_t2 int(11) NOT NULL,
ca_t2 decimal(10,2) NOT NULL,
date_t2 date NOT NULL,
import_t2 datetime NOT NULL,
id_enseigne int(11) NOT NULL,
id_t3 int(11) NOT NULL,
annee_t2 int(11) NOT NULL,
mois_t2 int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 2012065 rows
CREATE TABLE t1 (
id_t2 int(11) NOT NULL,
id_t0 bigint(20) NOT NULL,
id_t4 bigint(20) NOT NULL,
quantite_t1 int(11) NOT NULL,
ca_t1 decimal(10,2) NOT NULL,
pvc_moyen_t1 float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 388 rows
CREATE TABLE t4 (
id_t4 int(11) NOT NULL,
lib_t4 text NOT NULL,
libcourt_t4 varchar(255) NOT NULL,
ean_t4 varchar(255) NOT NULL,
pcb_t4 int(11) NOT NULL,
pcb2_t4 int(11) NOT NULL,
fam_t4 varchar(255) NOT NULL,
gam_t4 varchar(255) NOT NULL DEFAULT '0',
stat_t4 int(11) NOT NULL DEFAULT 1,
vmh_t4 decimal(10,2) NOT NULL,
detail_t4 text NOT NULL,
ingr_t4 text NOT NULL,
weight_t4 float NOT NULL,
lifetime_t4 varchar(255) NOT NULL,
pmc1_t4 float NOT NULL,
pmc2_t4 float NOT NULL,
dim_t4 decimal(10,2) NOT NULL,
ordre_t4 int(11) NOT NULL,
created_t4 datetime NOT NULL,
updated_t4 datetime NOT NULL,
updated_img_t4 datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- 1 row
CREATE TABLE t3 (
id_t3 int(11) NOT NULL,
nom_t3 text NOT NULL,
stat_t3 int(11) NOT NULL,
created_t3 datetime NOT NULL,
deleted_t3 datetime NOT NULL,
updated_t3 datetime NOT NULL,
ip_create_t3 text NOT NULL,
ip_delete_t3 text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE t2
ADD PRIMARY KEY (id_t2),
ADD KEY annee_t2 (annee_t2,mois_t2,date_t2);
ALTER TABLE t1
ADD PRIMARY KEY (id_t2,id_t0,id_t4);
ALTER TABLE t4
ADD PRIMARY KEY (id_t4,ean_t4),
ADD KEY ean_t4 (ean_t4),
ADD KEY id_t4 (id_t4);
ALTER TABLE t3
ADD PRIMARY KEY (id_t3);
ALTER TABLE t2
MODIFY id_t2 int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE t4
MODIFY id_t4 int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE t3
MODIFY id_t3 int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
The query below takes about 4 minutes to execute:
-- execution time 228 seconds
SELECT SUM(t1.ca_t1) AS ca_t4, SUM(t1.quantite_t1) AS qte_t4,
t4.fam_t4, t4.gam_t4, t4.lib_t4, t4.ean_t4, t4.id_t4,
t2.annee_t2, t2.mois_t2, COUNT(t1.id_t0) AS count_mag,
t3.id_t3, t3.nom_t3
FROM t1 t1
INNER JOIN t2 t2 ON t2.id_t2 = t1.id_t2
LEFT JOIN t3 t3 ON t2.id_t3 = t3.id_t3
INNER JOIN t4 t4 ON t1.id_t4 = t4.ean_t4
WHERE t2.date_t2 BETWEEN "2017-05-01" AND "2019-05-01"
GROUP BY t2.annee_t2, t2.mois_t2, t4.id_t4
ORDER BY ca_t4 DESC;
I have tried every optimization I know to reduce the execution time, but with no success...
The EXPLAIN shows this :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 86 Using where; Using temporary; Using filesort
1 SIMPLE t3 eq_ref PRIMARY PRIMARY 4 db.t2.id_t3 1
1 SIMPLE t1 ref PRIMARY,id_t2 PRIMARY 4 db.t2.id_t2 11266
1 SIMPLE t4 ALL ean_t4 NULL NULL NULL 388 Using where; Using join buffer (flat, BNL join)
Thank you for your help guys.
This looks like a fairly complex query, with multiple places that will slow it down. However, the first thing I notice is that a double lookup would have to occur in order to use the annee_t2 index, which is probably why it is not being used.
Try adding id_t3 to the end of that index on table t2:
(annee_t2,mois_t2,date_t2,id_t3)
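In SQL, that change would look something like this (a sketch, reusing the existing key name annee_t2 from the dump above):
ALTER TABLE t2
    DROP KEY annee_t2,
    ADD KEY annee_t2 (annee_t2, mois_t2, date_t2, id_t3);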
This should permit the optimizer to use that index.
Run the query again (twice, to populate the buffer cache; report only the second timing), and if it doesn't improve sufficiently, post the new EXPLAIN plan.
The GROUP BY is probably improper, since it does not include the t3 columns that are not aggregated.
Do you really want 2 years plus 1 day? Perhaps use this:
t2.date_t2 >= "2017-05-01"
AND t2.date_t2 < "2017-05-01" + INTERVAL 2 YEAR
Do not mix datatypes when JOINing -- ON t1.id_t4 = t4.ean_t4:
ean_t4 varchar(255) NOT NULL,
id_t4 bigint(20) NOT NULL,
(There may be other issues, but these should help.)
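Putting those suggestions together, the query might look something like this (just a sketch; it assumes t1.id_t4 really holds t4's numeric id, which only you can confirm):
SELECT SUM(t1.ca_t1) AS ca_t4, SUM(t1.quantite_t1) AS qte_t4,
       t4.fam_t4, t4.gam_t4, t4.lib_t4, t4.ean_t4, t4.id_t4,
       t2.annee_t2, t2.mois_t2, COUNT(t1.id_t0) AS count_mag,
       t3.id_t3, t3.nom_t3
FROM t1
INNER JOIN t2 ON t2.id_t2 = t1.id_t2
LEFT JOIN t3 ON t2.id_t3 = t3.id_t3
INNER JOIN t4 ON t1.id_t4 = t4.id_t4          -- number = number, no datatype mix (assumes id_t4 refers to t4's id)
WHERE t2.date_t2 >= "2017-05-01"
  AND t2.date_t2 < "2017-05-01" + INTERVAL 2 YEAR
GROUP BY t2.annee_t2, t2.mois_t2, t4.id_t4,
         t3.id_t3, t3.nom_t3                  -- include the non-aggregated t3 columns
ORDER BY ca_t4 DESC;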

MySQL Query with Subqueries taking longer than it should

I've been trying to find the cause of the slowdown in this query. The query is originally a DELETE, but I've been testing it as a SELECT * instead.
This is the query in question
SELECT * FROM table1
where table1.id IN (
#Per a friend's suggestion I wrapped the subquery in another subquery (yo dawg) to "cache" it; it works on other queries, but not this time.
SELECT id FROM (
(
SELECT id FROM (
SELECT table1.id FROM table1
LEFT JOIN table2 ON table2.id = table1.salesperson_id
LEFT JOIN table3 ON table3.id = table2.user_id
LEFT JOIN table4 ON table3.office_id = table4.id
WHERE table1.type = "Snapshot"
AND table4.id = 25 OR table4.parent_id =25
LIMIT 500
) AS ids )
) AS moreIds
)
The table in question is 16 GB.
The server it's being run against is beefy enough not to be a bottleneck.
Fields id, salesperson_id, and type are all indexed. Checked it 5 times.
The subquery itself runs extremely fast. Subquery:
SELECT id FROM (
SELECT table1.id FROM table1
LEFT JOIN table2 ON table2.id = table1.salesperson_id
LEFT JOIN table3 ON table3.id = table2.user_id
LEFT JOIN table4 ON table3.office_id = table4.id
WHERE table1.type = "Snapshot"
AND table4.id = 25 OR table4.parent_id =25
LIMIT 500
) AS ids
In the processlist the query is stuck in the "Sending data" state, but Workbench indicates that the query is still running.
Here's an EXPLAIN SELECT of the query
'1', 'PRIMARY', 'table1', 'index', NULL, 'SALES_FK_ON_SALES_STATE', '5', NULL, '36688459', 'Using where; Using index'
'2', 'DEPENDENT SUBQUERY', '<derived3>', 'ALL', NULL, NULL, NULL, NULL, '500', 'Using where'
'3', 'DERIVED', '<derived4>', 'ALL', NULL, NULL, NULL, NULL, '500', ''
'4', 'DERIVED', 'table4', 'index_merge', 'PRIMARY,IDX_9F61CEFC727ACA70', 'PRIMARY,IDX_9F61CEFC727ACA70', '4,5', NULL, '67', 'Using union(PRIMARY,IDX_9F61CEFC727ACA70); Using where; Using index'
'4', 'DERIVED', 'table3', 'ref', 'PRIMARY,IDX_C077730FFFA0C224', 'IDX_C077730FFFA0C224', '5', 'hugeDb.table4.id', '381', 'Using where; Using index'
'4', 'DERIVED', 'table2', 'ref', 'PRIMARY,UNIQ_36E3BDB1A76ED395', 'UNIQ_36E3BDB1A76ED395', '5', 'hugeDb.table3.id', '1', 'Using where; Using index'
'4', 'DERIVED', 'table1', 'ref', 'SALESPERSON,SALES_FK_ON_SALES_STATE', 'SALES_FK_ON_SALES_STATE', '5', 'hugeDb.table2.id', '115', 'Using where'
Here are the SHOW CREATE TABLES
CREATE TABLE `table4` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`logo_file_id` int(11) DEFAULT NULL,
`contact_address_id` int(11) DEFAULT NULL,
`billing_address_id` int(11) DEFAULT NULL,
`parent_id` int(11) DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`fax` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`contact_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`active` tinyint(1) NOT NULL,
`date_modified` datetime DEFAULT NULL,
`date_created` datetime NOT NULL,
`license_number` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`list_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`routing_address_id` int(11) DEFAULT NULL,
`billed_separately` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_9F61CEFCA7E1931C` (`logo_file_id`),
KEY `IDX_9F61CEFC320EF6E2` (`contact_address_id`),
KEY `IDX_9F61CEFC79D0C0E4` (`billing_address_id`),
KEY `IDX_9F61CEFC727ACA70` (`parent_id`),
KEY `IDX_9F61CEFC40F0487C` (`routing_address_id`),
-- CONSTRAINT `FK_9F61CEFC320EF6E2` FOREIGN KEY (`contact_address_id`) REFERENCES `other_irrelevant_table` (`id`),
-- CONSTRAINT `FK_9F61CEFC79D0C0E4` FOREIGN KEY (`billing_address_id`) REFERENCES `other_irrelevant_table` (`id`),
-- CONSTRAINT `FK_9F61CEFCA7E1931C` FOREIGN KEY (`logo_file_id`) REFERENCES `other_irrelevant_table` (`id`),
-- CONSTRAINT `FK_9F61CEFCE346079F` FOREIGN KEY (`routing_address_id`) REFERENCES `other_irrelevant_table` (`id`),
CONSTRAINT `FK_9F61CEFC727ACA70` FOREIGN KEY (`parent_id`) REFERENCES `table4` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=750 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `table3` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`office_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`active` tinyint(1) NOT NULL,
`date_modified` datetime DEFAULT NULL,
`date_created` datetime NOT NULL,
`profile_id` int(11) DEFAULT NULL,
`deleted` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_C077730FFFA0C224` (`office_id`),
KEY `IDX_C077730FA76ED395` (`user_id`),
KEY `IDX_C077730FCCFA12B8` (`profile_id`),
-- CONSTRAINT `FK_C077730FA76ED395` FOREIGN KEY (`user_id`) REFERENCES `other_irrelevant_table` (`id`),
-- CONSTRAINT `FK_C077730FCCFA12B8` FOREIGN KEY (`profile_id`) REFERENCES `other_irrelevant_table` (`id`),
CONSTRAINT `FK_C077730FFFA0C224` FOREIGN KEY (`office_id`) REFERENCES `table4` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=382425 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `table2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`active` tinyint(1) NOT NULL,
`date_modified` datetime DEFAULT NULL,
`date_created` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_36E3BDB1A76ED395` (`user_id`),
CONSTRAINT `FK_36E3BDB1A76ED395` FOREIGN KEY (`user_id`) REFERENCES `table3` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=174049 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `table1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`salesperson_id` int(11) DEFAULT NULL,
`count_active_contracts` int(11) NOT NULL,
`average_initial_price` decimal(12,2) NOT NULL,
`average_contract_value` decimal(12,2) NOT NULL,
`total_sold` int(11) NOT NULL,
`total_active` int(11) NOT NULL,
`active` tinyint(1) NOT NULL,
`date_modified` datetime DEFAULT NULL,
`date_created` datetime NOT NULL,
`type` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`services_scheduled_today` int(11) NOT NULL,
`services_scheduled_week` int(11) NOT NULL,
`services_scheduled_month` int(11) NOT NULL,
`services_scheduled_summer` int(11) NOT NULL,
`serviced_today` int(11) NOT NULL,
`serviced_this_week` int(11) NOT NULL,
`serviced_this_month` int(11) NOT NULL,
`serviced_this_summer` int(11) NOT NULL,
`autopay_account_percentage` decimal(3,2) NOT NULL,
`value_per_door` decimal(12,2) NOT NULL,
`total_paid` int(11) NOT NULL,
`sales_status_summary` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`total_serviced` int(11) NOT NULL,
`services_scheduled_year` int(11) NOT NULL,
`serviced_this_year` int(11) NOT NULL,
`services_scheduled_yesterday` int(11) NOT NULL,
`serviced_yesterday` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `SALESPERSON` (`type`),
KEY `SALES_FK_ON_SALES_STATE` (`salesperson_id`),
CONSTRAINT `SALES_FK_ON_SALES_STATE` FOREIGN KEY (`salesperson_id`) REFERENCES `table2` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=181662521 DEFAULT CHARSET=utf8;
When you see "DEPENDENT SUBQUERY" in the explain, it isn't caching the result of the subquery. It's re-executing the subquery many times (once for each distinct value in the outermost query). I see in the explain that your outermost query is examining 36 million rows. So this is probably running the subquery many, many times.
This is documented here: https://dev.mysql.com/doc/refman/5.7/en/explain-output.html
For DEPENDENT SUBQUERY, the subquery is re-evaluated only once for each set of different values of the variables from its outer context. For UNCACHEABLE SUBQUERY, the subquery is re-evaluated for each row of the outer context.
One way to avoid this is to use a subquery as a derived table instead of as the argument to an IN() predicate. This is a better way to do a semi-join like you're doing.
SELECT ... FROM TableA
WHERE TableA.id IN (SELECT id FROM ...)
Should be equivalent to:
SELECT ... FROM TableA
JOIN (SELECT DISTINCT id FROM ...) AS TableB
ON TableA.id = TableB.id
The use of DISTINCT in the subquery means there's only one row per id returned by the subquery, so the join won't multiply the number of rows from TableA if there are multiple matches. This makes it a semi-join.
The following should do better:
SELECT table1.*
FROM table1
JOIN (
SELECT table1.id FROM table1
LEFT JOIN table2 ON table2.id = table1.salesperson_id
LEFT JOIN table3 ON table3.id = table2.user_id
LEFT JOIN table4 ON table3.office_id = table4.id
WHERE table1.type = 'Snapshot'
AND table4.id = 25 OR table4.parent_id =25
LIMIT 500
) AS ids ON table1.id = ids.id;
You might also try to get rid of the index_merge. You're getting that because you're using OR for two different indexed columns in table4. It uses both indexes, and then unions them. Sometimes† it's better to use a UNION of two subqueries explicitly, instead of relying on the index_merge.
SELECT table1.*
FROM table1
JOIN (
SELECT table1.id FROM table1
JOIN table2 ON table2.id = table1.salesperson_id
JOIN table3 ON table3.id = table2.user_id
JOIN (
SELECT id FROM table4 WHERE id=25
UNION
SELECT id FROM table4 WHERE parent_id=25
) AS t4 ON table3.office_id = t4.id
WHERE table1.type = 'Snapshot'
LIMIT 500
) AS ids ON table1.id = ids.id;
You're also using LEFT JOIN unnecessarily, so I replaced it with JOIN. The MySQL optimizer will silently convert it to an inner join, but I think you should study what LEFT JOIN means, and use it when it's called for.
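An illustration of why the conversion is safe here (my own example, using the tables above):
-- Both of these return the same count, because the WHERE clause on table4
-- columns removes the NULL-extended rows that LEFT JOIN would otherwise keep
-- (NULL = 25 is never true).
SELECT COUNT(*)
FROM table3
LEFT JOIN table4 ON table3.office_id = table4.id
WHERE table4.id = 25 OR table4.parent_id = 25;

SELECT COUNT(*)
FROM table3
JOIN table4 ON table3.office_id = table4.id
WHERE table4.id = 25 OR table4.parent_id = 25;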
† I say "sometimes" because which method is best might depend on your data, so you should test it both ways.
Since I needed to limit a DELETE query that uses joins (which isn't possible in MySQL), there is another option. It is in no way the better one (can't beat Bill's answer).
But it works, and the query is extremely fast, albeit not very flexible, because there is a minimum number of rows it can pull, which for a 38M-row table is 575k (no idea why).
But here it is:
SELECT COUNT(*) FROM table1
JOIN table2 ON table2.id = table1.salesperson_id
JOIN table3 ON table3.id = table2.user_id
JOIN table4 ON table3.office_id = table4.id
WHERE table1.type = "Snapshot"
AND table4.id = 113 OR table4.parent_id =113
AND RAND()<=0.001;
But Bill's answer should be more than enough for everyone.
P.S. I'll ask a separate question about RAND() in a WHERE clause and will post the link here. Maybe it will help some desperate dev in 2025.
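As a footnote on the original goal of limiting a DELETE that involves joins: MySQL won't accept LIMIT on a multi-table DELETE, but the limit can live inside a derived table, which gets materialized (the LIMIT prevents it from being merged) and so sidesteps the usual "can't specify target table" restriction. A sketch only; the parentheses around the OR are my assumption about the intended logic:
DELETE table1
FROM table1
JOIN (
    SELECT table1.id
    FROM table1
    JOIN table2 ON table2.id = table1.salesperson_id
    JOIN table3 ON table3.id = table2.user_id
    JOIN table4 ON table3.office_id = table4.id
    WHERE table1.type = 'Snapshot'
      AND (table4.id = 25 OR table4.parent_id = 25)   -- parentheses added deliberately
    LIMIT 500
) AS ids ON table1.id = ids.id;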
You got carried away with nesting, etc.
SELECT table1.*
FROM
(
SELECT table1.id
FROM table1
JOIN table2 ON table2.id = table1.salesperson_id
JOIN table3 ON table3.id = table2.user_id
JOIN table4 ON table3.office_id = table4.id
WHERE table1.type = "Snapshot"
AND table4.id = 25
OR table4.parent_id =25
LIMIT 500
) AS ids
JOIN table1 USING(id)
Some discussion:
It is better to find the 500 ids and throw them into a tmp table than to haul around all the columns of table1.*. Hence the subquery with LIMIT 500.
Bill's UNION seems to be unnecessary since the Optimizer decided to use "index merge union". This may be only the second time I have seen that feature in use!
IN ( SELECT ... ) is probably never faster than an equivalent JOIN or EXISTS, whichever is appropriate. (JOIN is appropriate for your case.)
For table4, you have a perfectly good 'natural' PK in logo_file_id; why not get rid of id and promote that to the PK? (Similarly in table2.)
Aarrgghh... By doing my previous suggestion, you can bypass table2!
table1 has 181M rows? INT is always 4 bytes. You have a lot of columns that sound like small counters; consider using TINYINT UNSIGNED (1 byte; range: 0..255) or SMALLINT UNSIGNED. That should shrink the size of the table significantly, thereby speeding up cacheability and use of the table somewhat.
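For example (illustrative only; check the real value ranges before shrinking anything):
-- SMALLINT UNSIGNED holds 0..65535, TINYINT UNSIGNED holds 0..255
ALTER TABLE table1
    MODIFY services_scheduled_today SMALLINT UNSIGNED NOT NULL,
    MODIFY serviced_today           SMALLINT UNSIGNED NOT NULL,
    MODIFY services_scheduled_week  SMALLINT UNSIGNED NOT NULL,
    MODIFY serviced_this_week       SMALLINT UNSIGNED NOT NULL;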

Select a record from millions of records slowness

I have a standalone table; we insert its data through a weekly job and retrieve the data in our search module.
The table has around 4 million records (and will get bigger). When I execute a straightforward SELECT query it takes a long time (around 15 seconds). I am using a MySQL DB.
Here is my table structure
CREATE TABLE `myTable` (
`myTableId` int(11) NOT NULL AUTO_INCREMENT,
`date` varchar(255) DEFAULT NULL,
`startTime` int(11) DEFAULT NULL,
`endTime` int(11) DEFAULT NULL,
`price` decimal(19,4) DEFAULT NULL,
`total` decimal(19,4) DEFAULT NULL,
`taxes` decimal(19,4) DEFAULT NULL,
`persons` int(11) NOT NULL DEFAULT '0',
`length` int(11) DEFAULT NULL,
`totalPerPerson` decimal(19,4) DEFAULT NULL,
`dayId` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`myTableId`)
);
When I run the following statement it takes around 15 seconds to retrieve the results.
How can I optimize it to be faster?
SELECT
tt.testTableId,
(SELECT
totalPerPerson
FROM
myTable mt
WHERE
mt.venueId = tt.venueId
ORDER BY totalPerPerson ASC
LIMIT 1) AS minValue
FROM
testTable tt
WHERE
status is NULL;
Please note that the testTable table has only around 15 records.
This is the query:
SELECT tt.testTableId,
(SELECT mt.totalPerPerson
FROM myTable mt
WHERE mt.venueId = tt.venueId
ORDER BY mt.totalPerPerson ASC
LIMIT 1
) as minValue
FROM testTable tt
WHERE status is NULL;
For the subquery, you want an index on mytable(venueId, totalPerPerson). For the outer query, an index is unnecessary. However, if the table were larger, you would want an index on testTable(status, venueId, testTableId).
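In index-creation form (a sketch; the index names are arbitrary, and note that venueId is referenced in the query but does not appear in the CREATE TABLE you posted, so adjust to the real column):
ALTER TABLE myTable   ADD INDEX idx_venue_total  (venueId, totalPerPerson);
ALTER TABLE testTable ADD INDEX idx_status_venue (status, venueId, testTableId);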
Using MIN and GROUP BY may be faster. Note that the INNER JOIN drops testTable rows that have no matching myTable row, whereas the correlated subquery returns NULL for them.
SELECT tt.testTableId, MIN(totalPerPerson)
FROM testTable tt
INNER JOIN mytable mt ON tt.venueId = mt.venueId
WHERE tt.status is NULL
GROUP BY tt.testTableId
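If those unmatched rows must still appear (as they do with the correlated subquery), a LEFT JOIN variant keeps them with a NULL minValue; a sketch:
SELECT tt.testTableId, MIN(mt.totalPerPerson) AS minValue
FROM testTable tt
LEFT JOIN myTable mt ON tt.venueId = mt.venueId
WHERE tt.status IS NULL
GROUP BY tt.testTableId;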

SELECT with WHERE IN and subquery extremely slow

I want to execute the following query:
SELECT *
FROM `bm_tracking`
WHERE `oid` IN
(SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid` HAVING COUNT(*) >1)
The subquery:
SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid`
HAVING COUNT( * ) >1
executes in 0.0525 secs
The whole query gets stuck (still processing after 3 minutes...). Column oid is indexed.
Table bm_tracking contains around 64k rows.
What could be the reason for it getting stuck?
[Edit: Upon request]
CREATE TABLE `bm_tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`oid` varchar(10) NOT NULL,
`trk_main` varchar(50) NOT NULL,
`tracking` varchar(50) NOT NULL,
`label` text NOT NULL,
`void` int(11) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `oid` (`oid`),
KEY `trk_main` (`trk_main`),
KEY `tracking` (`tracking`),
KEY `created` (`created`)
) ENGINE=MyISAM AUTO_INCREMENT=63331 DEFAULT CHARSET=latin1
[Execution Plan]
Generally, EXISTS is faster than IN, so you can try this and see if it executes better for you:
SELECT *
FROM `bm_tracking` bt
WHERE EXISTS
( SELECT 1
FROM `bm_tracking` bt1
WHERE bt.oid = bt1.oid
GROUP BY `oid`
HAVING COUNT(*) >1
)
EDIT:
If you look at the EXPLAIN you posted, the IN() is treated as a DEPENDENT SUBQUERY, which is a correlated subquery: for every row in the table, all rows in the table are pulled and compared. So, for example, 1000 rows in the table would mean 1000 * 1000 = 1 million comparisons -- that's why it's taking such a long time.
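Another way to avoid the correlated subquery is to join against a derived table, which MySQL materializes once instead of re-evaluating per row; a sketch:
SELECT bt.*
FROM bm_tracking bt
JOIN (
    SELECT oid
    FROM bm_tracking
    GROUP BY oid
    HAVING COUNT(*) > 1
) AS dups ON bt.oid = dups.oid;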

Improving the MySQL Query

I have the following query, which filters the rows with replyAutoId=0 and then fetches the most recent record for each propertyId. The query currently takes 0.23225 sec to fetch just 5,435 of 21,369 rows, and I want to improve this. All I am asking is: is there a better way of writing this query? Any suggestions?
SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN propertyComment as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
The SHOW CREATE TABLE propertyComment Output:
CREATE TABLE `propertyComment` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`propertyId` int(11) NOT NULL,
`agentId` int(11) NOT NULL,
`comment` longtext COLLATE utf8_unicode_ci NOT NULL,
`replyAutoId` int(11) NOT NULL,
`updatedDate` datetime NOT NULL,
`contactDate` date NOT NULL,
`status` enum('Y','N') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'N',
`clientStatusId` int(11) NOT NULL,
`adminsId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `propertyId` (`propertyId`),
KEY `agentId` (`agentId`),
KEY `status` (`status`),
KEY `adminsId` (`adminsId`),
KEY `replyAutoId` (`replyAutoId`)
) ENGINE=MyISAM AUTO_INCREMENT=21404 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Try to get rid of the nested query.
The following query should give the same result as your original query:
SELECT pc1.*
FROM propertyComment AS pc1
LEFT JOIN propertyComment AS pc2
ON pc1.propertyID = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc1.replyAutoId = 0 AND pc2.propertyID IS NULL
SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN (SELECT propertyID, updatedDate from propertyComment order by 1,2) as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
You also don't have any indexes? If you do have one on the primary key, you're not joining on it, so why rely on it?
Why not select only the columns you're interested in from the B table? That limits the number of columns you're pulling from table B. Since you're pulling everything from table A where replyAutoId = 0, it wouldn't make much sense to limit the columns there. This should speed it up a little.
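A composite index covering the self-join could also be worth testing (a sketch; the index name is arbitrary):
-- covers pc1.propertyId = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
ALTER TABLE propertyComment
    ADD INDEX idx_property_updated (propertyId, updatedDate);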