GROUP BY + ORDER BY make my query very slow - mysql

I am trying to figure out what I should change in my query and/or my table structure to speed up a best-sellers query that currently takes over 1 second to run.
Here is the query I'm talking about:
SELECT pr.id_prod, MAX(pr.stock) AS stock, MAX(pr.dt_add) AS dt_add, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
If I use GROUP BY p.id_prod instead of GROUP BY pr.id_prod and remove the ORDER BY, the query runs in 0.07 sec.
Is this EXPLAIN output okay?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o range PRIMARY,id_order_status id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order,id_prod_ref id_order 4 dbname.o.id_order 1
1 SIMPLE pr eq_ref PRIMARY,id_prod PRIMARY 4 dbname.od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY,name_url,id_brand,name PRIMARY 4 dbname.pr.id_prod 1 Using index
And this is the EXPLAIN without the ORDER BY
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index PRIMARY,name_url,id_brand,name PRIMARY 4 1 Using index
1 SIMPLE pr ref PRIMARY,id_prod id_prod 4 dbname.p.id_prod 2 Using where
1 SIMPLE od ref id_order,id_prod_ref id_prod_ref 4 dbname.pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY,id_order_status PRIMARY 4 dbname.od.id_order 1 Using where
And here are the table structures:
CREATE TABLE `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_dir` int(10) unsigned DEFAULT NULL,
`id_status` tinyint(3) unsigned NOT NULL DEFAULT '11',
PRIMARY KEY (`id_order`),
KEY `id_dir` (`id_dir`),
KEY `id_status` (`id_status`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `orders_details` (
`id_order_det` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order_det`),
UNIQUE KEY `id_order` (`id_order`,`id_prod_ref`) USING BTREE,
KEY `id_prod_ref` (`id_prod_ref`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products` (
`id_prod` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id_prod`),
FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products_references` (
`id_prod_ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_prod` int(10) unsigned NOT NULL,
`stock` smallint(6) NOT NULL DEFAULT '0',
`dt_add` datetime DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id_prod_ref`),
KEY `id_prod` (`id_prod`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I also tried to give you the table relations (ON UPDATE, ON DELETE CASCADE, ...) but didn't manage to export them. But I don't think that's crucial for now!

Try using the alias name in ORDER BY rather than the column from the table,
and group by the column you actually select. (Grouping by p.id_prod or pr.id_prod gives the same result, because the inner join matches them on equal values and no other pr column is retrieved for the select result.)
SELECT p.id_prod, p.name, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
Do not forget to use appropriate indexes on the join columns.

(Rewritten after OP added more info.)
SELECT pr.id_prod,
MAX(pr.stock) AS max_stock,
MAX(pr.dt_add) AS max_dt_add,
SUM(od.quantity) AS sum_quantity
FROM orders AS o
INNER JOIN orders_details AS od
ON od.id_order = o.id_order
INNER JOIN products_references AS pr
ON pr.id_prod_ref = od.id_prod_ref
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY pr.id_prod
ORDER BY sum_quantity
LIMIT 10
Note that p was removed as being irrelevant.
Beware of SUM() when using JOIN with GROUP BY -- you might get an incorrect, inflated, value.
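To make the SUM() warning concrete, here is a small sketch using Python's built-in sqlite3 module (SQLite rather than MySQL, purely so it runs anywhere). The tables and the shipping column are made up for illustration; they are not the OP's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (id_order INTEGER PRIMARY KEY, shipping INTEGER);
    CREATE TABLE details (id_order INTEGER, id_prod_ref INTEGER, quantity INTEGER);
    INSERT INTO orders  VALUES (1, 5);
    INSERT INTO details VALUES (1, 10, 2), (1, 11, 3), (1, 12, 1);
""")

# Joining 1 order row to 3 detail rows repeats the order row three times,
# so SUM over an orders column counts the same value three times.
inflated = conn.execute("""
    SELECT SUM(o.shipping)
    FROM orders AS o
    JOIN details AS d ON d.id_order = o.id_order
""").fetchone()[0]

# Summing a column from the "many" side is safe: each detail row appears once.
quantity = conn.execute("""
    SELECT SUM(d.quantity)
    FROM orders AS o
    JOIN details AS d ON d.id_order = o.id_order
""").fetchone()[0]

# Testing for detail rows with EXISTS instead of joining keeps one row per order.
correct = conn.execute("""
    SELECT SUM(o.shipping)
    FROM orders AS o
    WHERE EXISTS (SELECT 1 FROM details AS d WHERE d.id_order = o.id_order)
""").fetchone()[0]

print(inflated, quantity, correct)
```

In the query above, SUM(od.quantity) sums a column from the "many" side and the EXPLAIN shows pr as eq_ref, so it looks safe there; the risk appears once the rows holding the summed column get repeated by another one-to-many join.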
Improvement on one table:
CREATE TABLE `orders_details` (
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order`,`id_prod_ref`),
INDEX (id_prod_ref, id_order)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Here's why: od sounds like a many:many mapping table. See here for tips on improving performance in it.
GROUP BY usually involves a sort. ORDER BY, when it is not identical to the GROUP BY, definitely requires another sort.
Removing the ORDER BY allows the query to return any 10 rows without the sort. (This may explain the timing difference.)
Note the alias sum_quantity to avoid ambiguity between the column quantity and your alias quantity.
Explaining EXPLAIN
1 SIMPLE o range id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order 4 o.id_order 1
1 SIMPLE pr eq_ref PRIMARY 4 od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY 4 pr.id_prod 1 Using index
The tables will be accessed in the order given (o,od,pr,p).
o won't use the data ("Using index") but will scan the id_order_status index which includes (id_status, id_order). Note: The PRIMARY KEY columns are implicitly added to any secondary key.
It estimates 76K will need to be scanned (for > 11).
Somewhere in the processing, there will be a temp table and a sort of it. This may or may not involve disk I/O.
The reach into od might find 1 row, might find 0 or more than 1 ("ref").
The reaching into pr and p are known to get at most 1 row.
pr does a small amount of filtering (active=1), but not until the third line of EXPLAIN. And no index is useful for this filtering. This could be improved, but only slightly, by a composite index (active, id_prod_ref). With only 5-10% being filtered out, this won't help much.
After all the JOINing and filtering, there will be two temp tables and sorts, one for GROUP BY, one for ORDER BY.
Only after that, will 10 rows be peeled off from the 70K (or so) rows collected up to this point.
Without the ORDER BY, the EXPLAIN shows that a different order seems to be better. And the tmp & sort went away.
1 SIMPLE p index PRIMARY 4 1 Using index
1 SIMPLE pr ref id_prod 4 p.id_prod 2 Using where
1 SIMPLE od ref id_prod_ref 4 pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY 4 od.id_order 1 Using where
There seems to be only 1 row in p, correct? So, in a way, it does not matter when this table is accessed. When you have multiple "products" all this analysis may change!
"key=PRIMARY", "Using index" is sort of a misnomer. It is really using the data, but being able to efficiently access it because the PRIMARY KEY is "clustered" with the data.
There is only one pr row?? Perhaps the optimizer realized that GROUP BY was not needed?
When it got to od, it estimated that "67" rows would be needed per p+pr combo.
You removed the ORDER BY, so there is no need to sort, and any 10 rows can be delivered.

Related

MySQL - how to optimize query with order by

I am trying to generate a list of the 5 most recent history items for a collection of user tasks. If I remove the ORDER BY, the execution time drops from ~2 seconds to < 20 msec.
Indexes are on
h.task_id
h.mod_date
i.task_id
i.user_id
This is the query
SELECT h.*
, i.task_id
, i.user_id
, i.name
, i.completed
FROM h
, i
WHERE i.task_id = h.task_id
AND i.user_id = 42
ORDER
BY h.mod_date DESC
LIMIT 5
Here is the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE i ref PRIMARY,UserID UserID 4 const 3091 Using temporary; Using filesort
1 SIMPLE h ref TaskID TaskID 4 myDB.i.task_id 7
Here are the show create tables:
CREATE TABLE `h` (
`history_id` int(6) NOT NULL AUTO_INCREMENT,
`history_code` tinyint(4) NOT NULL DEFAULT '0',
`task_id` int(6) NOT NULL,
`mod_date` datetime NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`history_id`),
KEY `TaskID` (`task_id`),
KEY `historyCode` (`history_code`),
KEY `modDate` (`mod_date`)
) ENGINE=InnoDB AUTO_INCREMENT=185647 DEFAULT CHARSET=latin1
and
CREATE TABLE `i` (
`task_id` int(6) NOT NULL AUTO_INCREMENT,
`user_id` int(6) NOT NULL,
`name` varchar(60) NOT NULL,
`due_date` date DEFAULT NULL,
`create_date` date NOT NULL,
`completed` tinyint(1) NOT NULL DEFAULT '0',
`task_description` blob,
PRIMARY KEY (`task_id`),
KEY `name_2` (`name`),
KEY `UserID` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12085 DEFAULT CHARSET=latin1
INDEX(task_id, mod_date, history_id) -- in this order
Will be "covering" and the columns will be in the optimal order
Also, DROP
KEY `TaskID` (`task_id`)
So that the Optimizer won't be tempted to use it.
Try changing the index on h.task_id so it's this compound index.
ALTER TABLE h DROP INDEX TaskID, ADD INDEX TaskID (task_id, mod_date DESC);
This may (or may not) allow MySQL to shortcut some or all of the extra work in your ORDER BY ... LIMIT ... request. ORDER BY ... LIMIT is a notorious performance antipattern, by the way, but sometimes necessary.
Edit: the index didn't help. So let's try a so-called deferred join so we don't have to ORDER and then LIMIT all the data from your h table.
Start with this subquery. It retrieves only the primary key values for the rows involved in your results, and will generate just five rows.
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Why this subquery? It handles the work-intensive ORDER BY ... LIMIT operation while manipulating only the primary keys and the date. It still must sort tons of rows only to discard all but five, but the rows it has to handle are much shorter. Because this subquery does the heavy work, you can focus on optimizing it rather than the whole query.
Keep the index I suggested above, because it covers the subquery for h.
Then, join it to the rest of your query like this. That way you'll only have to retrieve the expensive h.description column for the five rows you care about.
SELECT h.* , i.task_id, i.user_id , i.name, i.completed
FROM h
JOIN i ON i.task_id = h.task_id
JOIN (
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
) selected ON h.history_id = selected.history_id
AND i.task_id = selected.task_id
ORDER BY h.mod_date DESC
LIMIT 5
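A minimal sketch to sanity-check the deferred join on toy data, using Python's built-in sqlite3 (SQLite, not MySQL, so it says nothing about timings). The sample rows, the index name h_task_date, and the trimmed column lists are assumptions for illustration. The inner query sorts DESC so that it picks the five most recent rows, matching the outer ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i (task_id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
    CREATE TABLE h (history_id INTEGER PRIMARY KEY, task_id INTEGER,
                    mod_date TEXT, description TEXT);
    CREATE INDEX h_task_date ON h (task_id, mod_date);
""")
# 20 tasks; odd task ids belong to user 42, even ones to user 7.
conn.executemany("INSERT INTO i VALUES (?, ?, ?)",
                 [(t, 42 if t % 2 else 7, f"task {t}") for t in range(1, 21)])
# 400 history rows with unique timestamps and a deliberately wide description.
conn.executemany("INSERT INTO h VALUES (?, ?, ?, ?)",
                 [(n, n % 20 + 1, f"2020-01-01 00:{n // 60:02d}:{n % 60:02d}", "x" * 500)
                  for n in range(1, 401)])

# Step 1: find only the keys of the five most recent rows for the user.
top5_keys = """
    SELECT h.history_id
    FROM h
    JOIN i ON i.task_id = h.task_id
    WHERE i.user_id = 42
    ORDER BY h.mod_date DESC
    LIMIT 5
"""
direct_ids = [row[0] for row in conn.execute(top5_keys)]

# Step 2: join those five keys back to fetch the wide columns.
deferred = f"""
    SELECT h.history_id, h.description, i.name
    FROM ({top5_keys}) AS selected
    JOIN h ON h.history_id = selected.history_id
    JOIN i ON i.task_id = h.task_id
    ORDER BY h.mod_date DESC
"""
deferred_ids = [row[0] for row in conn.execute(deferred)]

print(direct_ids == deferred_ids)
```

The point of the comparison is only that the two-step form returns the same rows in the same order; whether it is faster on your data is something to measure in MySQL itself.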

Mysql query not optimized and very slow, but why?

In the software that I develop, a car dealer application, there's an agenda section with all the users' appointments.
This section loads quickly under normal daily use of the agenda (thousands of rows), but becomes really slow once the agenda tables reach 1 million rows.
The structure:
1) Main table
CREATE TABLE IF NOT EXISTS `agenda` (
`id_agenda` int(11) NOT NULL AUTO_INCREMENT,
`id_user` int(11) NOT NULL DEFAULT '0',
`id_agency` int(11) NOT NULL DEFAULT '0',
`id_customer` int(11) DEFAULT NULL,
`id_car` int(11) DEFAULT NULL,
`id_owner` int(11) DEFAULT NULL,
`type` int(11) NOT NULL DEFAULT '8',
`title` varchar(255) NOT NULL DEFAULT '',
`text` text NOT NULL,
`start_day` date NOT NULL DEFAULT '0000-00-00',
`end_day` date NOT NULL DEFAULT '0000-00-00',
`start_hour` time NOT NULL DEFAULT '00:00:00',
`end_hour` time NOT NULL DEFAULT '00:00:00',
PRIMARY KEY (`id_agenda`),
KEY `start_day` (`start_day`),
KEY `id_customer` (`id_customer`),
KEY `id_car` (`id_car`),
KEY `id_user` (`id_user`),
KEY `id_owner` (`id_owner`),
KEY `type` (`type`),
KEY `id_agency` (`id_agency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
2) Secondary table
CREATE TABLE IF NOT EXISTS `agenda_cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_agenda` int(11) NOT NULL,
`id_car` int(11) NOT NULL,
`id_owner` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_agenda` (`id_agenda`),
KEY `id_car` (`id_car`),
KEY `id_owner` (`id_owner`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Query:
SELECT a.id_agenda
FROM agenda as a
LEFT JOIN agenda_cars as agc on agc.id_agenda = a.id_agenda
WHERE
(a.id_customer = '22' OR (a.id_owner = '22' OR agc.id_owner = '22' ))
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a index PRIMARY PRIMARY 4 NULL 1051987 Using temporary; Using filesort
1 SIMPLE agc ref id_agenda id_agenda 4 db.a.id_agenda 1 Using where
The query takes 10 seconds to finish with id 22, and with other ids it can take up to 20 seconds. That is just the query; loading everything into the web page of course takes more time.
I don't understand why it takes so long to get the data. I think the indexes are configured correctly and the query is pretty simple, so why?
Too much data?
I've solved in this way:
SELECT a.id_agenda
FROM
(
SELECT id_agenda
FROM agenda
WHERE (id_customer = '22' OR id_owner = '22' )
UNION
SELECT id_agenda
FROM agenda_cars
WHERE id_owner = '22'
) as at
INNER JOIN agenda as a on a.id_agenda = at.id_agenda
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
This version of the query is ten times faster than the previous one... but why?
Thanks to everyone who wants to help clear up my doubts!
UPDATE AFTER Rick James solution:
Query suggested
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Result: 279 total, 0.0111 sec
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 366 Using temporary; Using filesort
1 PRIMARY a eq_ref PRIMARY PRIMARY 4 at.id_agenda 1 NULL
2 DERIVED agenda ref id_customer id_customer 5 const 1 Using index
3 UNION agenda ref id_owner id_owner 5 const 114 Using index
4 UNION agenda_cars ref id_owner id_owner 4 const 250 NULL
NULL UNION RESULT <union2,3,4> ALL NULL NULL NULL NULL NULL Using temporary
Before I dig into what can be done, let me list several red flags I see.
OR is hard to optimize
Filtering (WHERE) on multiple tables JOINed together is hard to optimize.
GROUP BY x ORDER BY z means two passes over the data, usually 2 temp tables and filesorts.
Did you really mean LEFT? It says "the right table (agc) might be missing, in which case provide NULLs".
(You may not be able to get rid of all of the red flags.)
Red flags in the Schema:
Indexing every column -- usually not useful
Only single-column indexes -- "composite" indexes often help.
DATE and TIME as separate columns -- usually makes for clumsy queries.
OK, that's off my chest, now to study the query... (Oh, and thanks for providing the CREATEs and EXPLAIN!)
The ON implies a 1:many relationship between agenda:agenda_cars. Is that correct?
id_owner and id_car are in both tables, yet are not included in the ON; what's up?
(Here's the meat of the answer to your final question.) Why have GROUP BY? I see no aggregates. I will guess that the 1:many relationship led to multiple rows, and you needed to de-dup? For de-dupping, please use DISTINCT. But the real solution is to avoid the "inflate (JOIN) - deflate (GROUP BY)" syndrome. Your subquery is a good start on that.
Rolling some of the above comments in, plus more:
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Notes:
Got rid of the other OR
Explicit UNION DISTINCT to be clear that dups are expected.
Toss GROUP BY and not using SELECT DISTINCT; UNION DISTINCT deals with the need.
You have the 4 necessary indexes (one per subquery): (id_customer), (id_owner) (on both tables) and PRIMARY KEY(id_agenda).
The indexes are "covering" indexes for all the subqueries -- an extra bonus.
There will be one unavoidable tmp table and file sort -- for the ORDER BY, but it won't be on a million rows.
(No need for composite indexes -- this time.)
I changed to a DATETIME; change back if you have a good reason for splitting them.
Did I get you another 10x? Did I explain it sufficiently?
Oh, one more thing...
This query returns a list of ids ordered by something that it does not return (date+time). What will you do with the ids? If you are using this as a subquery in another query, then the Optimizer has a right to throw away the ORDER BY. Just warning you.
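For anyone who wants to convince themselves that the UNION rewrite returns the same ids as the LEFT JOIN + OR + GROUP BY form, here is a small sketch on toy data with Python's built-in sqlite3. It is SQLite, not MySQL: SQLite spells it plain UNION (which already removes duplicates), and the sample rows, the single start_day column, and the alias hits are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agenda (id_agenda INTEGER PRIMARY KEY, id_customer INTEGER,
                         id_owner INTEGER, start_day TEXT);
    CREATE TABLE agenda_cars (id INTEGER PRIMARY KEY, id_agenda INTEGER,
                              id_owner INTEGER);
    INSERT INTO agenda VALUES
        (1, 22, NULL, '2020-01-03'),   -- matches on id_customer
        (2, 5,  22,   '2020-01-01'),   -- matches on id_owner
        (3, 5,  6,    '2020-01-02'),   -- matches only through agenda_cars
        (4, 5,  6,    '2020-01-04');   -- no match
    INSERT INTO agenda_cars (id_agenda, id_owner) VALUES
        (3, 22), (3, 22),              -- two cars: the JOIN repeats agenda 3
        (1, 22), (4, 9);
""")

join_where = """
    FROM agenda AS a
    LEFT JOIN agenda_cars AS agc ON agc.id_agenda = a.id_agenda
    WHERE a.id_customer = 22 OR a.id_owner = 22 OR agc.id_owner = 22
"""
# "Inflate": the join alone yields one row per matching car.
joined_rows = conn.execute("SELECT COUNT(*) " + join_where).fetchone()[0]
# "Deflate": GROUP BY collapses the repeats again.
or_ids = [r[0] for r in conn.execute(
    "SELECT a.id_agenda " + join_where + " GROUP BY a.id_agenda ORDER BY a.start_day")]

# UNION form: collect the ids first, then join once to pick up the sort column.
union_ids = [r[0] for r in conn.execute("""
    SELECT a.id_agenda
    FROM (
        SELECT id_agenda FROM agenda WHERE id_customer = 22
        UNION
        SELECT id_agenda FROM agenda WHERE id_owner = 22
        UNION
        SELECT id_agenda FROM agenda_cars WHERE id_owner = 22
    ) AS hits
    JOIN agenda AS a ON a.id_agenda = hits.id_agenda
    ORDER BY a.start_day
""")]

print(joined_rows, or_ids, union_ids)
```

Each UNION branch filters a single table on a single column, which is what lets each branch use its own index in the EXPLAIN shown earlier.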

Where am I going wrong in using a Join in the mysql query - Explain result posted too

I have this query which takes about 3.5 seconds just to fetch 2 records. However there are over 100k rows in testimonials, 13k in users, 850 in courses, 2 in exams.
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
INNER JOIN courses co ON co.id = t.courseid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
ORDER BY t.submissiondate DESC
Explain result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE co ALL PRIMARY NULL NULL NULL 850 Using temporary; Using filesort
1 SIMPLE t ref CID,nuk_tran_user CID 4 kms.co.id 8 Using where
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.eval_id 1
If I remove the courses table join then the query returns the result pretty quick. I can't figure out why this query has to select all the courses rows i.e. 850?
Any ideas what I am doing wrong?
Edit:
I have an index on courseid, userid in testimonials table and these are primary keys of their respective tables.
EDIT 2
I have just removed the courseid index from the testimonials table (just to test) and interestingly the query returned result in 0.22 seconds!!!?? Everything else the same as above just removed only this index.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL nuk_tran_user NULL NULL NULL 130696 Using where; Using filesort
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE co eq_ref PRIMARY PRIMARY 4 kms.t.courseid 1
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.exam_id 1
EDIT 3
CREATE TABLE IF NOT EXISTS `courses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
`duration` varchar(100) NOT NULL DEFAULT '',
`objectives` text NOT NULL,
`updated_at` datetime DEFAULT NULL,
`updated_by` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=851 ;
Testimonials
CREATE TABLE IF NOT EXISTS `testimonials` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`feedback` text NOT NULL,
`userid` int(10) unsigned NOT NULL DEFAULT '0',
`courseid` int(10) unsigned NOT NULL DEFAULT '0',
`eventid` int(10) unsigned NOT NULL DEFAULT '0',
`emr_date` datetime DEFAULT NULL,
`exam_required` enum('Y','N') NOT NULL DEFAULT 'N',
`exam_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`emr_completed` enum('Y','N') NOT NULL DEFAULT 'N',
PRIMARY KEY (`id`),
KEY `event` (`eventid`),
KEY `nuk_tran_user` (`userid`),
KEY `emr_date` (`emr_date`),
KEY `courseid` (`courseid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=134691 ;
.. this is the latest Explain query result now ...
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL nuk_tran_user,courseid NULL NULL NULL 130696 Using where; Using filesort
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 kms.t.userid 1 Using where
1 SIMPLE co eq_ref PRIMARY PRIMARY 4 kms.t.courseid 1
1 SIMPLE ex eq_ref PRIMARY PRIMARY 3 kms.t.exam_id 1
An ORDER BY that has no corresponding index it can use is known to cause delays, although this does not specifically address your issue with the courses table.
Your original query looks MOSTLY ok, but you reference "f.feedback" and there is no "f" alias in the query. You also refer to "verification_required" and "verification_completed", but I don't see those in the table structures; I DO find "exam_required" and "emr_completed".
I would however change one thing. In the testimonials table, instead of individual column indexes, I would add one more with multiple columns to both take advantage of your multiple criteria query AND the order by
create table ...
KEY StatVerifySubmit ( status, verification_required, verification_completed, submissionDate )
but it appears your query refers to columns not listed in your table structure, so the index might instead be
KEY StatVerifySubmit ( status, exam_required, emr_completed, emr_Date)
Could you give a try to the following query instead of the original:
SELECT t.*, u.name, f.feedback
FROM testmonials t
INNER JOIN user u ON u.id = t.userid
LEFT JOIN exam ex ON ex.id = t.exam_id
WHERE t.status = 4
AND t.verfication_required = 'Y'
AND t.verfication_completed = 'N'
AND t.courseid in ( SELECT co.id FROM courses co)
ORDER BY t.submissiondate DESC
Do you need to select columns from the courses table?

Help me optimize this MySql query

I have a MySql query that take a very long time to run (about 7 seconds). The problem seems to be with the OR in this part of the query: "(tblprivateitem.userid=?userid OR tblprivateitem.userid=1)". If I skip the "OR tblprivateitem.userid=1" part it takes only 0.01 seconds. As I need that part I need to find a way to optimize this query. Any ideas?
QUERY:
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)
WHERE tbladdeditem.userid=?userid
EXPLAIN:
id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE tbladdeditem ref userid userid 4 const 293 Using where
1 SIMPLE tblprivateitem ref userid,itemid itemid 4 tbladdeditem.itemid 2 Using where
TABLES:
tbladdeditem contains 1 100 000 rows:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL auto_increment,
`itemid` int(11) default NULL,
`userid` mediumint(9) default NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
tblprivateitem contains 2 700 000 rows:
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL auto_increment,
`userid` mediumint(9) default '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) default NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`) -- Changed this index to only use itemid instead
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE
I made my queries and schema match your original question exactly, multi-column key and all. The only possible difference is that I populated each table with two million entries. My query (your query) runs in 0.15 seconds.
delimiter $$
set @userid = 6
$$
SELECT
tbladdeditem.addeditemid, tblprivateitem.iitemid, tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=@userid or tblprivateitem.userid = 1)
WHERE tbladdeditem.userid=@userid
I have the same explain that you do, and with my data, my query return over a thousand matches without any issue at all. Being completely at a loss, as you really shouldn't be having these issues -- is it possible you are running a very limiting version of MySQL? Are you running 64-bit? Plenty of memory?
I had made the assumption that your query wasn't performing well, and when mine was, assumed I had fixed your problem. So now I eat crow. I'll post some of the avenues I went down. But I'm telling you, your query the way you posted it originally works just fine. I can only imagine your MySQL thrashing to the hard drive or something. Sorry I couldn't be more help.
PREVIOUS RESPONSE (Which is also an update)
I broke down and recreated your problem in my own database. After trying independent indexes on userid and on itemid I was unable to get the query below a few seconds, so I set up very specific multi-column keys as directed by the query. Notice that on tbladdeditem the multi-column key begins with itemid, while on tblprivateitem the columns are reversed:
Here is the schema I used:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL AUTO_INCREMENT,
`itemid` int(11) NOT NULL,
`userid` mediumint(9) NOT NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`),
KEY `i_and_u` (`itemid`,`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL AUTO_INCREMENT,
`userid` mediumint(9) NOT NULL DEFAULT '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) NOT NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`),
KEY `u_and_i` (`userid`,`itemid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I filled each table with 2 million entries of random data. I made some assumptions:
userid varies from 1 to 2000
itemid varies between 1 and 10000
This gives each user about a thousand entries in each table.
Here are two versions of the query (I'm using workbench for my editor):
Version 1 - do all the filtering on the join.
Result: 0.016 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
FROM tblprivateitem as p
INNER JOIN tbladdeditem as a
ON (p.userid in (1, @userid))
AND p.itemid = a.itemid
AND a.userid = @userid
$$
Here's the explain:
EXPLAIN:
id select_type table type key ref rows extra
1 SIMPLE p range u_and_i 2150 Using where; Using index
1 SIMPLE a ref i_and_u 1 Using where; Using index
Version 2 - filter up front
Result: 0.015 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
from
(select userid, itemid, iitemid from tblprivateitem
where userid in (1, @userid)) as p
join tbladdeditem as a on p.userid = a.userid and a.itemid = p.itemid
where a.userid = @userid
$$
Here's the explain:
id select_type table type key ref rows extra
1 PRIMARY <derived2> ALL null null 2152
1 PRIMARY a ref i_and_u p.itemid,const 1 Using where; Using index
2 DERIVED p1 range u_and_i 2150 Using where
Since you already have the predicate tbladdeditem.userid=?userid in the WHERE clause, I don't think you need the userid test in the join condition. Try moving it out of the join condition and into the WHERE clause. If you are using the OR to handle the case where the parameter is null, use Coalesce instead of OR; if not, leave it as an OR.
-- If the Or is there to provide a default for when ?userid is null...
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND p.userid=Coalesce(?userid, 1)
-- if not then
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND (p.userid=?userid Or p.userid = 1)
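The two forms above are not interchangeable, so it is worth checking which one you mean. A small sketch with Python's built-in sqlite3 (SQLite rather than MySQL; the three sample rows and the helper name owners are made up for illustration, and :userid stands in for ?userid):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblprivateitem (privateitemid INTEGER PRIMARY KEY,
                                 userid INTEGER, itemid INTEGER);
    INSERT INTO tblprivateitem (userid, itemid) VALUES
        (1, 100),   -- row owned by the shared user 1
        (6, 100),   -- row owned by user 6
        (7, 100);   -- row owned by someone else
""")

def owners(where, userid):
    """Return the userid of every row matched by the given WHERE fragment."""
    sql = f"SELECT userid FROM tblprivateitem WHERE {where} ORDER BY userid"
    return [row[0] for row in conn.execute(sql, {"userid": userid})]

coalesce_form = "userid = COALESCE(:userid, 1)"
or_form = "(userid = :userid OR userid = 1)"

print(owners(coalesce_form, 6))     # only user 6's row
print(owners(or_form, 6))           # user 6's row and the shared row
print(owners(coalesce_form, None))  # parameter is NULL: falls back to user 1
print(owners(or_form, None))        # userid = NULL is never true: only user 1
```

So Coalesce only matches the OP's intent if user 1 is meant purely as a fallback for a missing parameter; if rows for both the user and user 1 are wanted, the OR (or IN) form is the one to keep.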
Second, if there is not an index on the userId column in these two tables, consider adding one.
Finally, if these all fail, try converting to two separate queries and unioning them together:
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
And p.userId = a.Userid
Where p.userid=?userid
Union
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
Where a.userid=?userid
And p.userid = 1
I would try this instead: in your original JOIN you have an OR associated with a parameter; move that to your WHERE clause.
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
WHERE tbladdeditem.userid=?userid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)

MySQL Optimization Problem

I'm having problems with a query optimization. The following query takes more than 30 seconds to get the expected result.
SELECT tbl_history.buffet_q_rating, tbl_history.cod_stock, tbl_history.bqqq_change_month, stocks.ticker, countries.country, stocks.company
FROM tbl_history
INNER JOIN stocks ON tbl_history.cod_stock = stocks.cod_stock
INNER JOIN exchange ON stocks.cod_exchange = exchange.cod_exchange
INNER JOIN countries ON exchange.cod_country = countries.cod_country
WHERE exchange.cod_country =125
AND DATE = '2011-07-25'
AND bqqq_change_month IS NOT NULL
AND buffet_q_rating IS NOT NULL
ORDER BY bqqq_change_month DESC
LIMIT 10
The tables are:
CREATE TABLE IF NOT EXISTS `tbl_history` (
`cod_stock` int(11) NOT NULL DEFAULT '0',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`price` decimal(11,3) DEFAULT NULL,
`buffet_q_rating` decimal(11,4) DEFAULT NULL,
`bqqq_change_day` decimal(11,2) DEFAULT NULL,
`bqqq_change_month` decimal(11,2) DEFAULT NULL,
(...)
PRIMARY KEY (`cod_stock`,`date`),
KEY `cod_stock` (`cod_stock`),
KEY `buf_rating` (`buffet_q_rating`),
KEY `data` (`date`),
KEY `bqqq_change_month` (`bqqq_change_month`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `stocks` (
`cod_stock` int(11) NOT NULL AUTO_INCREMENT,
`cod_exchange` int(11) DEFAULT NULL,
PRIMARY KEY (`cod_stock`),
KEY `exchangestocks` (`cod_exchange`),
KEY `codstock` (`cod_stock`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `exchange` (
`cod_exchange` int(11) NOT NULL AUTO_INCREMENT,
`exchange` varchar(100) DEFAULT NULL,
`cod_country` int(11) DEFAULT NULL,
PRIMARY KEY (`cod_exchange`),
KEY `countriesexchange` (`cod_country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
CREATE TABLE IF NOT EXISTS `countries` (
`cod_country` int(11) NOT NULL AUTO_INCREMENT,
`country` varchar(100) DEFAULT NULL,
`initial_amount` double DEFAULT NULL,
PRIMARY KEY (`cod_country`),
KEY `codcountry` (`cod_country`),
KEY `country` (`country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
The first table has more than 20 million rows, the second has 40k and the others have just a few rows (maybe 100).
The problem seems to be the "order by" but I have no idea how to optimize it.
I already tried some things I found searching on google/stackoverflow, but I was unable to get good results.
Can someone give me some advice?
EDIT:
Forgot the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE countries const PRIMARY,codcountry PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE exchange ref PRIMARY,countriesexchange countriesexchange 5 const 15 Using where
1 SIMPLE stocks ref PRIMARY,exchangestocks,codstock exchangestocks 5 databaseName.exchange.cod_exchange 661 Using where
1 SIMPLE tbl_history eq_ref PRIMARY,cod_stock,buf_rating,data,bqqq_change_mont... PRIMARY 12 v.stocks.cod_stock,const 1 Using where
UPDATE
this is the new EXPLAIN I got:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tbl_history range monthstats monthstats 14 NULL 80053 Using where; Using index
1 SIMPLE countries ref country country 4 const 1 Using index
1 SIMPLE exchange ref PRIMARY,cod_country,countryexchange countryexchange 5 const 5 Using where; Using index
1 SIMPLE stocks ref info4stats info4stats 9 databaseName.exchange.cod_exchange,databaseName.stock_... 1 Using where; Using index
I would try to preemptively start with the Country records for 125 and work in reverse. Using STRAIGHT_JOIN forces the tables to be joined in the order you entered them...
I would also have an index on your Tbl_History table by the COD_Stock and DATE( date ). So the query will properly and efficiently match the join condition on the pre-qualified date portion of the date/time field.
SELECT STRAIGHT_JOIN
th.buffet_q_rating,
th.cod_stock,
th.bqqq_change_month,
s.ticker,
c.country,
s.company
FROM
Exchange e
join Countries c
on e.Cod_Country = c.Cod_Country
join Stocks s
on e.cod_exchange = s.cod_exchange
join tbl_history th
on s.cod_stock = th.cod_stock
AND th.`Date` = '2011-07-25'
AND th.bqqq_change_month IS NOT NULL
AND th.buffet_q_rating IS NOT NULL
WHERE
e.Cod_Country = 125
ORDER BY
th.bqqq_change_month DESC
LIMIT 10
If you want to limit the result, why do you do it only after joining all the tables?
Try to reduce the size of the big tables first (with LIMIT or WHERE) before joining them with the other tables.
But you have to be sure that your original query and your modified query mean the same thing.
Update (Sample) :
select
tbl_user.user_id,
tbl_group.group_name
from
tbl_grp_user
inner join
(
select
tbl_user.user_id,
tbl_user.user_name
from
tbl_user
limit
5
) as tbl_user
on
tbl_user.user_id = tbl_grp_user.user_id
inner join
(
select
group_id,
group_name
from
tbl_group
where
tbl_group.group_id > 5
) as tbl_group
on
tbl_group.group_id = tbl_grp_user.group_id
Hopefully the query above will give you a hint.
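The caveat about the two queries meaning the same thing is easy to trip over with LIMIT. A small sketch with Python's built-in sqlite3 (SQLite, not MySQL; the four users and two membership rows are invented for illustration) shows a case where limiting before the join changes the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_user (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE tbl_grp_user (user_id INTEGER, group_id INTEGER);
    INSERT INTO tbl_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cho'), (4, 'dee');
    -- Users 1 and 2 belong to no group at all.
    INSERT INTO tbl_grp_user VALUES (3, 10), (4, 10);
""")

# LIMIT applied after the join: the first two users that have a group row.
limit_after = conn.execute("""
    SELECT u.user_id
    FROM tbl_user AS u
    JOIN tbl_grp_user AS gu ON gu.user_id = u.user_id
    ORDER BY u.user_id
    LIMIT 2
""").fetchall()

# LIMIT applied before the join: the first two users, which the join then drops.
limit_before = conn.execute("""
    SELECT u.user_id
    FROM (SELECT user_id FROM tbl_user ORDER BY user_id LIMIT 2) AS u
    JOIN tbl_grp_user AS gu ON gu.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()

print(limit_after, limit_before)
```

Pushing a WHERE filter into the derived table is generally safe when it only references that table; pushing a LIMIT down is only safe when the later joins and filters cannot remove any of the rows that were kept.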