MySQL query not optimized and very slow, but why?

In the software I develop, a car dealer application, there is a section with the agenda of all the users' appointments.
This section is quite fast to load under daily, normal use of the agenda (thousands of rows), but it starts to become really slow once the agenda tables reach about a million rows.
The structure:
1) Main table
CREATE TABLE IF NOT EXISTS `agenda` (
`id_agenda` int(11) NOT NULL AUTO_INCREMENT,
`id_user` int(11) NOT NULL DEFAULT '0',
`id_agency` int(11) NOT NULL DEFAULT '0',
`id_customer` int(11) DEFAULT NULL,
`id_car` int(11) DEFAULT NULL,
`id_owner` int(11) DEFAULT NULL,
`type` int(11) NOT NULL DEFAULT '8',
`title` varchar(255) NOT NULL DEFAULT '',
`text` text NOT NULL,
`start_day` date NOT NULL DEFAULT '0000-00-00',
`end_day` date NOT NULL DEFAULT '0000-00-00',
`start_hour` time NOT NULL DEFAULT '00:00:00',
`end_hour` time NOT NULL DEFAULT '00:00:00',
PRIMARY KEY (`id_agenda`),
KEY `start_day` (`start_day`),
KEY `id_customer` (`id_customer`),
KEY `id_car` (`id_car`),
KEY `id_user` (`id_user`),
KEY `id_owner` (`id_owner`),
KEY `type` (`type`),
KEY `id_agency` (`id_agency`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
2) Secondary table
CREATE TABLE IF NOT EXISTS `agenda_cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_agenda` int(11) NOT NULL,
`id_car` int(11) NOT NULL,
`id_owner` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_agenda` (`id_agenda`),
KEY `id_car` (`id_car`),
KEY `id_owner` (`id_owner`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Query:
SELECT a.id_agenda
FROM agenda as a
LEFT JOIN agenda_cars as agc on agc.id_agenda = a.id_agenda
WHERE
(a.id_customer = '22' OR (a.id_owner = '22' OR agc.id_owner = '22' ))
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE a index PRIMARY PRIMARY 4 NULL 1051987 Using temporary; Using filesort
1 SIMPLE agc ref id_agenda id_agenda 4 db.a.id_agenda 1 Using where
The query takes up to 10 seconds to finish with id 22; with other ids it can reach 20 seconds. That is just the query itself; loading everything in the web page of course takes more time.
I don't see why it takes so long to fetch the data. I think the indexes are configured correctly and the query is pretty simple, so why?
Too much data?
I've solved in this way:
SELECT a.id_agenda
FROM
(
SELECT id_agenda
FROM agenda
WHERE (id_customer = '22' OR id_owner = '22' )
UNION
SELECT id_agenda
FROM agenda_cars
WHERE id_owner = '22'
) as at
INNER JOIN agenda as a on a.id_agenda = at.id_agenda
GROUP BY a.id_agenda
ORDER BY a.start_day, a.start_hour
This version of the query is ten times faster than the previous one... but why?
Thanks to everyone willing to help resolve my doubts!
UPDATE AFTER Rick James solution:
Query suggested
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Result: 279 total, 0.0111 sec
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 366 Using temporary; Using filesort
1 PRIMARY a eq_ref PRIMARY PRIMARY 4 at.id_agenda 1 NULL
2 DERIVED agenda ref id_customer id_customer 5 const 1 Using index
3 UNION agenda ref id_owner id_owner 5 const 114 Using index
4 UNION agenda_cars ref id_owner id_owner 4 const 250 NULL
NULL UNION RESULT <union2,3,4> ALL NULL NULL NULL NULL NULL Using temporary

Before I dig into what can be done, let me list several red flags I see.
OR is hard to optimize
Filtering (WHERE) on multiple tables JOINed together is hard to optimize.
GROUP BY x ORDER BY z means two passes over the data, usually 2 temp tables and filesorts.
Did you really mean LEFT? It says "the right table (agc) might be missing, in which case provide NULLs".
(You may not be able to get rid of all of the red flags.)
Red flags in the Schema:
Indexing every column -- usually not useful
Only single-column indexes -- "composite" indexes often help.
DATE and TIME as separate columns -- usually makes for clumsy queries.
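As one concrete illustration of the composite-index point (a sketch only; the index name is made up, and as noted further down this particular query ends up not needing a composite index):
-- Sketch: a composite index covering a "customer within a date range" lookup
ALTER TABLE agenda ADD INDEX idx_customer_start (id_customer, start_day, start_hour);
With an index like that, a filter such as id_customer = 22 AND start_day >= '2016-01-01' can be resolved as a range scan within the index instead of examining every row.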
OK, those are off my chest, now to study the query... (Oh, and thanks for providing the CREATEs and EXPLAIN!)
The ON implies a 1:many relationship between agenda:agenda_cars. Is that correct?
id_owner and id_car are in both tables, yet are not included in the ON; what's up?
(Here's the meat of the answer to your final question.) Why have GROUP BY? I see no aggregates. I will guess that the 1:many relationship led to multiple rows, and you needed to de-dup? For de-duplicating, please use DISTINCT. But the real solution is to avoid the "inflate (JOIN) - deflate (GROUP BY)" syndrome. Your subquery is a good start on that.
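For reference, the DISTINCT form of the original query would look like the sketch below; it de-dups correctly, but it still pays the inflate-deflate cost, so the UNION rewrite that follows is the better fix:
SELECT DISTINCT a.id_agenda, a.start_day, a.start_hour
FROM agenda AS a
LEFT JOIN agenda_cars AS agc ON agc.id_agenda = a.id_agenda
WHERE a.id_customer = '22' OR a.id_owner = '22' OR agc.id_owner = '22'
ORDER BY a.start_day, a.start_hour;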
Rolling some of the above comments in, plus more:
SELECT a.id_agenda
FROM
(
SELECT id_agenda FROM agenda WHERE id_customer = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda WHERE id_owner = '22'
UNION DISTINCT
SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
) as at
INNER JOIN agenda as a ON a.id_agenda = at.id_agenda
ORDER BY a.start_datetime;
Notes:
Got rid of the other OR
Explicit UNION DISTINCT to be clear that dups are expected.
Toss GROUP BY and not using SELECT DISTINCT; UNION DISTINCT deals with the need.
You have the 4 necessary indexes (one per subquery): (id_customer), (id_owner) (on both tables) and PRIMARY KEY(id_agenda).
The indexes are "covering" indexes for all the subqueries -- an extra bonus.
There will be one unavoidable tmp table and file sort -- for the ORDER BY, but it won't be on a million rows.
(No need for composite indexes -- this time.)
I changed to a DATETIME; change back if you have a good reason for splitting them.
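If you do merge them, a minimal migration sketch (column and index names assumed; keep or drop the old DATE/TIME columns as you prefer):
ALTER TABLE agenda ADD COLUMN start_datetime DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00';
UPDATE agenda SET start_datetime = TIMESTAMP(start_day, start_hour);
ALTER TABLE agenda ADD INDEX idx_start_datetime (start_datetime);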
Did I get you another 10x? Did I explain it sufficiently?
Oh, one more thing...
This query returns a list of ids ordered by something it does not return (date+time). What will you do with the ids? If you use this as a subquery in another query, the Optimizer has the right to throw away the ORDER BY. Just warning you.
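If the ids do feed an outer query, one hedged workaround is to do the ordering at the outermost level, for example (a sketch, using the start_datetime column suggested above):
SELECT t.id_agenda
FROM (
    SELECT a.id_agenda, a.start_datetime
    FROM (
        SELECT id_agenda FROM agenda WHERE id_customer = '22'
        UNION DISTINCT
        SELECT id_agenda FROM agenda WHERE id_owner = '22'
        UNION DISTINCT
        SELECT id_agenda FROM agenda_cars WHERE id_owner = '22'
    ) AS at
    INNER JOIN agenda AS a ON a.id_agenda = at.id_agenda
) AS t
ORDER BY t.start_datetime;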

Related

MySQL - how to optimize query with order by

I am trying to generate a list of the 5 most recent history items for a collection of user tasks. If I remove the ORDER BY, the execution time drops from ~2 seconds to < 20 msec.
Indexes are on
h.task_id
h.mod_date
i.task_id
i.user_id
This is the query
SELECT h.*
, i.task_id
, i.user_id
, i.name
, i.completed
FROM h
, i
WHERE i.task_id = h.task_id
AND i.user_id = 42
ORDER
BY h.mod_date DESC
LIMIT 5
Here is the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE i ref PRIMARY,UserID UserID 4 const 3091 Using temporary; Using filesort
1 SIMPLE h ref TaskID TaskID 4 myDB.i.task_id 7
Here are the show create tables:
CREATE TABLE `h` (
`history_id` int(6) NOT NULL AUTO_INCREMENT,
`history_code` tinyint(4) NOT NULL DEFAULT '0',
`task_id` int(6) NOT NULL,
`mod_date` datetime NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`history_id`),
KEY `TaskID` (`task_id`),
KEY `historyCode` (`history_code`),
KEY `modDate` (`mod_date`)
) ENGINE=InnoDB AUTO_INCREMENT=185647 DEFAULT CHARSET=latin1
and
CREATE TABLE `i` (
`task_id` int(6) NOT NULL AUTO_INCREMENT,
`user_id` int(6) NOT NULL,
`name` varchar(60) NOT NULL,
`due_date` date DEFAULT NULL,
`create_date` date NOT NULL,
`completed` tinyint(1) NOT NULL DEFAULT '0',
`task_description` blob,
PRIMARY KEY (`task_id`),
KEY `name_2` (`name`),
KEY `UserID` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12085 DEFAULT CHARSET=latin1
INDEX(task_id, mod_date, history_id) -- in this order
Will be "covering" and the columns will be in the optimal order
Also, DROP
KEY `TaskID` (`task_id`)
So that the Optimizer won't be tempted to use it.
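Combined into one statement, a hedged sketch of both changes (the new index name is assumed):
ALTER TABLE h
  DROP INDEX TaskID,
  ADD INDEX task_mod_hist (task_id, mod_date, history_id);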
Try changing the index on h.task_id so it's this compound index (note that CREATE OR REPLACE INDEX is MariaDB syntax; on stock MySQL, drop the old index and add the new one instead, and descending index parts are only honored from MySQL 8.0):
CREATE OR REPLACE INDEX TaskID ON h(task_id, mod_date DESC);
This may (or may not) allow MySQL to shortcut some or all of the extra work in your ORDER BY ... LIMIT ... request. It's a notorious performance anti-pattern, by the way, but sometimes necessary.
Edit: the index didn't help. So let's try a so-called deferred join, so we don't have to ORDER and then LIMIT all the data from your h table.
Start with this subquery. It retrieves only the primary key values for the rows involved in your results, and will generate just five rows.
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Why this subquery? It handles the work-intensive ORDER BY ... LIMIT operation while manipulating only the primary keys and the date. It still must sort tons of rows only to discard all but five, but the rows it has to handle are much shorter. Because this subquery does the heavy work, you focus on optimizing it, rather than the whole query.
Keep the index I suggested above, because it covers the subquery for h.
Then, join it to the rest of your query like this. That way you'll only have to retrieve the expensive h.description column for the five rows you care about.
SELECT h.* , i.task_id, i.user_id , i.name, i.completed
FROM h
JOIN i ON i.task_id = h.task_id
JOIN (
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
) selected ON h.history_id = selected.history_id
AND i.task_id = selected.task_id
ORDER BY h.mod_date DESC
LIMIT 5

Query speed of insert/ update SMA (simple moving average)

I would like to include a column in my table with the simple moving average of stock data. I have been able to create several queries which successfully do so, however the query speed is slow. My goal is to improve the query speed.
I have the following table:
CREATE TABLE `timeseries_test` (
`timeseries_id` int(11) NOT NULL AUTO_INCREMENT,
`stock_id` int(10) NOT NULL,
`date` date NOT NULL,
`open` decimal(16,8) NOT NULL,
`high` decimal(16,8) NOT NULL,
`low` decimal(16,8) NOT NULL,
`close` decimal(16,8) NOT NULL,
`adjusted_close` double(16,8) NOT NULL,
`volume` int(16) NOT NULL,
`dividend` double(16,8) NOT NULL,
`split_coefficient` double(16,15) NOT NULL,
`100sma` decimal(16,8) NOT NULL,
PRIMARY KEY (`timeseries_id`),
KEY `stock` (`stock_id`),
KEY `date` (`date`),
KEY `date_stock` (`stock_id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=5444325 DEFAULT CHARSET=latin1
I have tried many different query formats, but they all take about 25 seconds per 5000 rows. The SELECT on its own takes less than a second. Below is an example query:
UPDATE stock.timeseries_test t1 INNER JOIN (
SELECT a.timeseries_id,
Round( ( SELECT SUM(b.close) / COUNT(b.close)
FROM timeseries_test AS b
WHERE DATEDIFF(a.date, b.date) BETWEEN 0 AND 99 AND a.stock_id = b.stock_id
), 2 ) AS '100sma'
FROM timeseries_test AS a) t2
ON t1.`timeseries_id` = t2.`timeseries_id`
SET t1.100sma = t2.100SMA
WHERE t2.100sma = null
Below the explain query:
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 10385 10.00 Using where
1 UPDATE t1 NULL eq_ref PRIMARY PRIMARY 4 t2.timeseries_id 1 100.00 NULL
2 DERIVED a NULL index NULL date_stock 7 NULL 10385 100.00 Using index
3 DEPENDENT SUBQUERY b NULL ref stock,date_stock stock 4 stock.a.stock_id 5192 100.00 Using where
Any help is appreciated.
If you are running MySQL 8.0, I recommend window functions with a range specification; this avoids the need for a correlated subquery.
update stock.timeseries_test t1
inner join (
select timeseries_id,
avg(close) over(
partition by stock_id
order by date
range between interval 99 day preceding and current row
) `100sma`
from timeseries_test
) t2 on t1.timeseries_id = t2.timeseries_id
set t1.`100sma` = t2.`100sma`
It is quite unclear what the purpose of the original, outer where clause is, so I removed it:
WHERE t2.`100sma` = null
If you do want to check for nullness, then you need IS NULL; but doing so would pretty much defeat the whole logic of the UPDATE statement. Maybe you meant:
WHERE t1.`100sma` is null
Functions are not sargable. Instead of
DATEDIFF(a.date, b.date) BETWEEN 0 AND 99
use
a.date BETWEEN b.date AND b.date + INTERVAL 99 DAY
(or maybe a and b should be swapped)
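Applied to the original UPDATE, a hedged sketch with the sargable form (same 99-day lookback window as DATEDIFF(a.date, b.date) BETWEEN 0 AND 99; the sma100 alias is just to avoid quoting '100sma'):
UPDATE timeseries_test t1
INNER JOIN (
    SELECT a.timeseries_id,
           ROUND( ( SELECT SUM(b.close) / COUNT(b.close)
                      FROM timeseries_test AS b
                     WHERE b.stock_id = a.stock_id
                       AND b.date BETWEEN a.date - INTERVAL 99 DAY AND a.date
                  ), 2 ) AS sma100
    FROM timeseries_test AS a
) t2 ON t1.timeseries_id = t2.timeseries_id
SET t1.`100sma` = t2.sma100;
This still computes the average for every row, so combine it with the window-function or primary-key advice in this answer for the real speedup.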
I suspect (from the column names) that the pair (stock_id,date) is unique and that timeseries_id is never really used. If those are correct, then
PRIMARY KEY (`timeseries_id`),
KEY `date_stock` (`stock_id`,`date`)
-->
PRIMARY KEY(`stock_id`,`date`)
The ON (timeseries_id) condition would need to be changed to test both of those columns.
Also, toss this since there is another index that starts with the same column(s):
KEY `stock` (`stock_id`),
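A hedged sketch of that schema change, in safe steps (only do this if (stock_id, date) really is unique and nothing else references timeseries_id):
ALTER TABLE timeseries_test MODIFY timeseries_id INT NOT NULL;  -- remove AUTO_INCREMENT first
ALTER TABLE timeseries_test DROP PRIMARY KEY, ADD PRIMARY KEY (stock_id, `date`);
ALTER TABLE timeseries_test DROP COLUMN timeseries_id, DROP INDEX `stock`, DROP INDEX `date_stock`;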

GROUP BY + ORDER BY make my query very slow

I am trying to figure out what I should do to my query and/or my table structure to improve this query, which fetches the best sellers and runs in over 1 second.
Here is the query I'm talking about:
SELECT pr.id_prod, MAX(pr.stock) AS stock, MAX(pr.dt_add) AS dt_add, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
If I use GROUP BY p.id_prod instead of GROUP BY pr.id_prod and remove the ORDER BY, the query runs in 0.07 sec.
Is that EXPLAIN table okay?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o range PRIMARY,id_order_status id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order,id_prod_ref id_order 4 dbname.o.id_order 1
1 SIMPLE pr eq_ref PRIMARY,id_prod PRIMARY 4 dbname.od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY,name_url,id_brand,name PRIMARY 4 dbname.pr.id_prod 1 Using index
And this is the EXPLAIN without the ORDER BY
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index PRIMARY,name_url,id_brand,name PRIMARY 4 1 Using index
1 SIMPLE pr ref PRIMARY,id_prod id_prod 4 dbname.p.id_prod 2 Using where
1 SIMPLE od ref id_order,id_prod_ref id_prod_ref 4 dbname.pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY,id_order_status PRIMARY 4 dbname.od.id_order 1 Using where
And here is the table structures
CREATE TABLE `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_dir` int(10) unsigned DEFAULT NULL,
`id_status` tinyint(3) unsigned NOT NULL DEFAULT '11',
PRIMARY KEY (`id_order`),
KEY `id_dir` (`id_dir`),
KEY `id_status` (`id_status`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `orders_details` (
`id_order_det` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order_det`),
UNIQUE KEY `id_order` (`id_order`,`id_prod_ref`) USING BTREE,
KEY `id_prod_ref` (`id_prod_ref`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products` (
`id_prod` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id_prod`),
FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `products_references` (
`id_prod_ref` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_prod` int(10) unsigned NOT NULL,
`stock` smallint(6) NOT NULL DEFAULT '0',
`dt_add` datetime DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id_prod_ref`),
KEY `id_prod` (`id_prod`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I also tried to give you the table relations (ON UPDATE, ON DELETE CASCADE, ...) but didn't manage to export them. But I don't think that's crucial for now!
Try using the alias name in the ORDER BY rather than the table column,
and GROUP BY the value that appears in the SELECT (it is equivalent here because the INNER JOIN is on equal values and the pr value is not retrieved in the select result):
SELECT p.id_prod, p.name, SUM(od.quantity) AS quantity
FROM orders AS o
INNER JOIN orders_details AS od ON od.id_order = o.id_order
INNER JOIN products_references AS pr ON pr.id_prod_ref = od.id_prod_ref
INNER JOIN products AS p ON p.id_prod = pr.id_prod
WHERE pr.active = 1
GROUP BY p.id_prod
ORDER BY quantity
LIMIT 10
Do not forget to use appropriate indexes on the join columns.
(Rewritten after OP added more info.)
SELECT pr.id_prod,
MAX(pr.stock) AS max_stock,
MAX(pr.dt_add) AS max_dt_add,
SUM(od.quantity) AS sum_quantity
FROM orders AS o
INNER JOIN orders_details AS od
ON od.id_order = o.id_order
INNER JOIN products_references AS pr
ON pr.id_prod_ref = od.id_prod_ref
WHERE o.id_order_status > 11
AND pr.active = 1
GROUP BY pr.id_prod
ORDER BY sum_quantity
LIMIT 10
Note that p was removed as being irrelevant.
Beware of SUM() when using JOIN with GROUP BY -- you might get an incorrect, inflated, value.
Improvement on one table:
CREATE TABLE `orders_details` (
`id_order` int(10) unsigned NOT NULL,
`id_prod_ref` int(10) unsigned NOT NULL,
`quantity` smallint(5) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id_order`,`id_prod_ref`),
INDEX (id_prod_ref, id_order)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Here's why: od sounds like a many:many mapping table. See here for tips on improving performance in it.
GROUP BY usually involves a sort. ORDER BY, when it is not identical to the GROUP BY, definitely requires another sort.
Removing the ORDER BY allows the query to return any 10 rows without the sort. (This may explain the timing difference.)
Note the alias sum_quantity to avoid ambiguity between the column quantity and your alias quantity.
Explaining EXPLAIN
1 SIMPLE o range id_order_status 1 75940 Using where; Using index; Using temporary; Using filesort
1 SIMPLE od ref id_order 4 o.id_order 1
1 SIMPLE pr eq_ref PRIMARY 4 od.id_prod_ref 1 Using where
1 SIMPLE p eq_ref PRIMARY 4 pr.id_prod 1 Using index
The tables will be accessed in the order given (o,od,pr,p).
o won't use the data ("Using index") but will scan the id_order_status index, which includes (id_status, id_order). Note: the PRIMARY KEY columns are implicitly added to any secondary key.
It estimates 76K rows will need to be scanned (for > 11).
Somewhere in the processing, there will be a temp table and a sort of it. This may or may not involve disk I/O.
The reach into od might find 1 row, or it might find 0 or more than 1 ("ref").
The reaching into pr and p are known to get at most 1 row.
pr does a small amount of filtering (active=1), but not until the third line of the EXPLAIN. And no index is useful for this filtering. This could be improved, but only slightly, by a composite index (active, id_prod_ref), sketched after this walkthrough. With only 5-10% being filtered out, this won't help much.
After all the JOINing and filtering, there will be two temp tables and sorts, one for GROUP BY, one for ORDER BY.
Only after that, will 10 rows be peeled off from the 70K (or so) rows collected up to this point.
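That composite index, as a sketch (name assumed; expect only a marginal gain, per the note above):
ALTER TABLE products_references ADD INDEX idx_active_ref (active, id_prod_ref);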
Without the ORDER BY, the EXPLAIN shows that a different order seems to be better. And the tmp & sort went away.
1 SIMPLE p index PRIMARY 4 1 Using index
1 SIMPLE pr ref id_prod 4 p.id_prod 2 Using where
1 SIMPLE od ref id_prod_ref 4 pr.id_prod_ref 67
1 SIMPLE o eq_ref PRIMARY 4 dbname.od.id_order 1 Using where
There seems to be only 1 row in p, correct? So, in a way, it does not matter when this table is accessed. When you have multiple "products", all this analysis may change!
"key=PRIMARY", "Using index" is sort of a misnomer. It is really using the data, but being able to efficiently access it because the PRIMARY KEY is "clustered" with the data.
There is only one pr row?? Perhaps the optimizer realized that GROUP BY was not needed?
When it got to od, it estimated that "67" rows would be needed per p+pr combo.
You removed the ORDER BY, so there is no need to sort, and any 10 rows can be delivered.

Any way to optimize this MySQL query? (Resource intense)

My app needs to run this query pretty often; it gets a list of user data for the app to display. The problem is that the subquery on user_quiz is resource-heavy, and calculating the rankings is also very CPU-intensive.
Benchmark: ~.5 second each run
When it will be run:
When the user want to see their ranking
When the user want to see other people's ranking
Getting a list of user's friends
0.5 seconds is a really long time considering this query will be run pretty often. Is there anything I could do to optimize this query?
Table for user:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(100) DEFAULT NULL,
`lastname` varchar(100) DEFAULT NULL,
`password` varchar(20) NOT NULL,
`email` varchar(300) NOT NULL,
`verified` tinyint(10) DEFAULT NULL,
`avatar` varchar(300) DEFAULT NULL,
`points_total` int(11) unsigned NOT NULL DEFAULT '0',
`points_today` int(11) unsigned NOT NULL DEFAULT '0',
`number_correctanswer` int(11) unsigned NOT NULL DEFAULT '0',
`number_watchedvideo` int(11) unsigned NOT NULL DEFAULT '0',
`create_time` datetime NOT NULL,
`type` tinyint(1) unsigned NOT NULL DEFAULT '1',
`number_win` int(11) unsigned NOT NULL DEFAULT '0',
`number_lost` int(11) unsigned NOT NULL DEFAULT '0',
`number_tie` int(11) unsigned NOT NULL DEFAULT '0',
`level` int(1) unsigned NOT NULL DEFAULT '0',
`islogined` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=230 DEFAULT CHARSET=utf8;
Table for user_quiz:
CREATE TABLE `user_quiz` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`question_id` int(11) NOT NULL,
`is_answercorrect` int(11) unsigned NOT NULL DEFAULT '0',
`question_answer_datetime` datetime NOT NULL,
`score` int(1) DEFAULT NULL,
`quarter` int(1) DEFAULT NULL,
`game_type` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9816 DEFAULT CHARSET=utf8;
Table for user_starter:
CREATE TABLE `user_starter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`result` int(1) DEFAULT NULL,
`created_date` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=456 DEFAULT CHARSET=utf8mb4;
My indexes:
Table: user
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user 0 PRIMARY 1 id A 32 BTREE
Table: user_quiz
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_quiz 0 PRIMARY 1 id A 9462 BTREE
user_quiz 1 user_id 1 user_id A 270 BTREE
Table: user_starter
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_starter 0 PRIMARY 1 id A 454 BTREE
user_starter 1 user_id 1 user_id A 227 YES BTREE
Query:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints != points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,u.firstname,u.lastname,u.email,u.avatar,u.type,u.points_total,u.number_win,u.number_lost,u.number_tie,u.verified,
COALESCE(SUM(uq.score),0) as points_week,
COALESCE(us.number_lost,0) as number_week_lost,
COALESCE(us.number_win,0) as number_week_win,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 1) as lastFrdFight,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 2) as lastBotFight
FROM `user` u
LEFT JOIN (SELECT user_id,
count(case when result=1 then 1 else null end) as number_win,
count(case when result=-1 then 1 else null end) as number_lost
from user_starter where created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27' ) us ON u.id = us.user_id
LEFT JOIN (SELECT * FROM user_quiz WHERE question_answer_datetime BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00') uq on u.id = uq.user_id
GROUP BY u.id ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
EXPLAIN:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3027 100
2 DERIVED u ALL PRIMARY 32 100 Using temporary; Using filesort
2 DERIVED <derived5> ALL 1 100 Using where; Using join buffer (Block Nested Loop)
2 DERIVED <derived6> ref <auto_key0> <auto_key0> 4 fancard.u.id 94 100
6 DERIVED user_quiz ALL 9461 100 Using where
5 DERIVED user_starter ALL 454 100 Using where
4 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
3 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
Example output and expected output:
Benchmark: around 0.5 seconds
The following index should make the subqueries against user_quiz ultra fast.
ALTER TABLE user_quiz
ADD INDEX (`user_id`,`game_type`,`question_answer_datetime`)
Please provide SHOW CREATE TABLE tablename statements for all tables, as that will help with additional optimizations.
Update #1
Alright, I've had some time to look things over, and fortunately there appears to be a lot of relatively low-hanging fruit in terms of optimization.
Here are all the indexes to add:
ALTER TABLE user_quiz
ADD INDEX `userGametypeAnswerDatetimes` (`user_id`,`game_type`,`question_answer_datetime`)
ALTER TABLE user_quiz
ADD INDEX `userAnswerScores` (`user_id`,`question_answer_datetime`,`score`)
ALTER TABLE user_starter
ADD INDEX `userResultDates` (`user_id`,`result`,`created_date`)
Note that the names (such as userGametypeAnswerDatetimes) are optional, and you can name them to whatever makes the most sense to you. But, in general, it's good to put specific names on your custom indexes (simply for organization purposes.)
Now, here is your query, which should work well with those new indexes:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints != points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,
u.firstname,
u.lastname,
u.email,
u.avatar,
u.type,
u.points_total,
u.number_win,
u.number_lost,
u.number_tie,
u.verified,
COALESCE(user_scores.score,0) as points_week,
COALESCE(user_losses.number_lost,0) as number_week_lost,
COALESCE(user_wins.number_win,0) as number_week_win,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id
and game_type = 2
) as lastBotFight
FROM `user` u
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_win
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = 1
GROUP BY user_id
) user_wins
ON user_wins.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_lost
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = -1
GROUP BY user_id
) user_losses
ON user_losses.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id, SUM(score) AS score
FROM user_quiz
WHERE question_answer_datetime
BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00'
GROUP BY user_id
) user_scores
ON u.id = user_scores.user_id
ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
Note: This is not necessarily the best result. It depends a LOT on your data set, as to whether this is necessarily the best, and sometimes you need to do a bit of trial and error.
A hint as to what you can use trial and error on is the structure of how we query lastFrdFight and lastBotFight versus how we query points_week, number_week_lost, and number_week_win. All of these could either be done in the SELECT statement (like the first two are in my query) or by joining to a subquery result (like the last three do in my query).
Mix and match to see what works best. In general, I've found the joining to a subquery to be fastest when you have a large number of rows in the outer query (in this case, querying the user table.) This is because it only needs to get the results once, and then can just match them up on a user by user basis. Other times, it can be better to have the query just in the SELECT clause - this will run MUCH faster, since there are more constants (the user_id is already known), but has to run for each row. So it's a trade off, and why you sometimes need to use trial and error.
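As a hedged side-by-side illustration of those two shapes (using lastFrdFight; both return the same values against the schema above, they just trade per-row lookups against one up-front aggregation):
-- Form A: correlated subquery in the SELECT list (runs once per user row)
SELECT u.id,
       (SELECT MAX(question_answer_datetime)
          FROM user_quiz
         WHERE user_id = u.id AND game_type = 1) AS lastFrdFight
FROM `user` u;
-- Form B: join to a derived table aggregated once, then matched per user
SELECT u.id, q.lastFrdFight
FROM `user` u
LEFT JOIN (SELECT user_id, MAX(question_answer_datetime) AS lastFrdFight
             FROM user_quiz
            WHERE game_type = 1
            GROUP BY user_id) q ON q.user_id = u.id;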
Why do the indexes work?
So, you may be wondering why I made the indexes as I did. If you are familiar with phone books (in this age of smartphones, that's no longer a valid assumption I can make) then we can use that as an analogy:
If you had a composite index of phonebookIndex (lastname, firstname, email) on your user table (an example only! you don't actually need to add that index!) you would have a result similar to what a phone book provides. (Using email instead of phone number.)
Each index is an internal copy of the data in the overall table. With this phonebookIndex there would internally be stored a list of all users with their lastname, then their first name, and then their email, and each of these would be ordered, just like a phone book.
Why is that useful? Consider when you know someone's first and last name. You can quickly flip to where their last name is, then quickly go through that list of everyone with their last name, finding the first name you want, so obtaining the email.
Indexes work in exactly the same way, in terms of how the database looks at them.
Consider the userGametypeAnswerDatetimes index I defined above, and how we query that index in the lastFrdFight SELECT subquery.
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight
Notice how we have both the user_id (from the outer query) and the game_type as constants. That is exactly like our example earlier, with having the first and last name, and wanting to look up an email/phone number. In this case, we are looking for the MAX of the 3rd value in the index. Still easy: All the values are ordered, so if this index was sitting in front of us, we could just flip to the specific user_id, then look at the section with all game_type=1 and then just pick the last value to find the maximum. Very very fast. Same for the database. It can find this value extremely fast, which is why you saw an 80%+ reduction in your overall query time.
So, that's how indexes work, and why I choose these indexes as I did.
Be aware, that the more indexes you have, the more you'll see slowdowns when doing inserts and updates. But, if you are reading a lot more from your tables than you are writing, this is usually a more than acceptable trade off.
So, give these changes a shot, and let me know how it performs. Please provide the new EXPLAIN plan if you want further optimization help. Also, this should give you quite a few tools for trial and error, to see what works and what doesn't. All my changes are fairly independent of each other, so you can swap them in and out with your original query pieces to see how each one performs.

Help me optimize this MySql query

I have a MySQL query that takes a very long time to run (about 7 seconds). The problem seems to be with the OR in this part of the query: "(tblprivateitem.userid=?userid OR tblprivateitem.userid=1)". If I skip the "OR tblprivateitem.userid=1" part, it takes only 0.01 seconds. As I need that part, I need to find a way to optimize this query. Any ideas?
QUERY:
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)
WHERE tbladdeditem.userid=?userid
EXPLAIN:
id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE tbladdeditem ref userid userid 4 const 293 Using where
1 SIMPLE tblprivateitem ref userid,itemid itemid 4 tbladdeditem.itemid 2 Using where
TABLES:
tbladdeditem contains 1 100 000 rows:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL auto_increment,
`itemid` int(11) default NULL,
`userid` mediumint(9) default NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
tblprivateitem contains 2 700 000 rows:
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL auto_increment,
`userid` mediumint(9) default '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) default NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`) -- Changed this index to only use itemid instead
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE
I made my queries and schema match your original question exactly, multi-column key and all. The only possible difference is that I populated each table with two million entries. My query (your query) runs in 0.15 seconds.
delimiter $$
set @userid = 6
$$
SELECT
tbladdeditem.addeditemid, tblprivateitem.iitemid, tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
AND (tblprivateitem.userid=@userid or tblprivateitem.userid = 1)
WHERE tbladdeditem.userid=@userid
I have the same EXPLAIN that you do, and with my data, my query returns over a thousand matches without any issue at all. I am completely at a loss, as you really shouldn't be having these issues -- is it possible you are running a very limited version of MySQL? Are you running 64-bit? Plenty of memory?
I had made the assumption that your query wasn't performing well, and when mine was, assumed I had fixed your problem. So now I eat crow. I'll post some of the avenues I went down. But I'm telling you, your query the way you posted it originally works just fine. I can only imagine your MySQL thrashing to the hard drive or something. Sorry I couldn't be more help.
PREVIOUS RESPONSE (Which is also an update)
I broke down and recreated your problem in my own database. After trying independent indexes on userid and on itemid, I was unable to get the query below a few seconds, so I set up very specific multi-column keys as directed by the query. Notice that on tbladdeditem the multi-column key begins with itemid, while on tblprivateitem the columns are reversed:
Here is the schema I used:
CREATE TABLE `tbladdeditem` (
`addeditemid` int(11) NOT NULL AUTO_INCREMENT,
`itemid` int(11) NOT NULL,
`userid` mediumint(9) NOT NULL,
PRIMARY KEY (`addeditemid`),
KEY `userid` (`userid`),
KEY `i_and_u` (`itemid`,`userid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `tblprivateitem` (
`privateitemid` int(11) NOT NULL AUTO_INCREMENT,
`userid` mediumint(9) NOT NULL DEFAULT '1',
`itemid` int(10) NOT NULL,
`iitemid` mediumint(9) NOT NULL,
PRIMARY KEY (`privateitemid`),
KEY `userid` (`userid`),
KEY `itemid` (`itemid`),
KEY `u_and_i` (`userid`,`itemid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I filled each table with 2 million entries of random data. I made some assumptions:
userid varies from 1 to 2000
itemid varies between 1 and 10000
This gives each user about a thousand entries in each table.
Here are two versions of the query (I'm using workbench for my editor):
Version 1 - do all the filtering on the join.
Result: 0.016 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
FROM tblprivateitem as p
INNER JOIN tbladdeditem as a
ON (p.userid in (1, @userid))
AND p.itemid = a.itemid
AND a.userid = @userid
$$
Here's the explain:
EXPLAIN:
id select_type table type key ref rows extra
1 SIMPLE p range u_and_i 2150 Using where; Using index
1 SIMPLE a ref i_and_u 1 Using where; Using index
Version 2 - filter up front
Result: 0.015 seconds to return 1297 rows
delimiter $$
set @userid = 3
$$
SELECT
a.addeditemid,
p.iitemid,
p.itemid
from
(select userid, itemid, iitemid from tblprivateitem
where userid in (1, @userid)) as p
join tbladdeditem as a on p.userid = a.userid and a.itemid = p.itemid
where a.userid = @userid
$$
Here's the explain:
id select_type table type key ref rows extra
1 PRIMARY <derived2> ALL null null 2152
1 PRIMARY a ref i_and_u p.itemid,const 1 Using where; Using index
2 DERIVED p1 range u_and_i 2150 Using where
Since you have the predicate tbladdeditem.userid=?userid in the WHERE clause, I don't think you need it in the join condition. Try removing it from the join condition. If you are using the OR to handle the case where the parameter is null, then use COALESCE instead of OR; if not, leave it as an OR.
-- If the OR is to provide a default for when ?userid is null...
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND p.userid=Coalesce(?userid, 1)
-- if not then
SELECT a.addeditemid, p.iitemid, p.itemid
FROM tbladdeditem a
JOIN tblprivateitem p
ON p.itemid=a.itemid
WHERE a.userid=?userid
AND (p.userid=?userid Or p.userid = 1)
Second, if there is not an index on the userId column in these two tables, consider adding one.
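For completeness, a sketch of adding them (both CREATE TABLE statements above already declare KEY `userid`, so this is probably already covered):
ALTER TABLE tbladdeditem ADD INDEX idx_userid (userid);
ALTER TABLE tblprivateitem ADD INDEX idx_userid (userid);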
Finally, if these all fail, try converting to two separate queries and unioning them together:
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
And p.userId = a.Userid
Where p.userid=?userid
Union
Select a.addeditemid, p.iitemid, p.itemid
From tbladdeditem a
Join tblprivateitem p
On p.itemid=a.itemid
And p.userId = a.Userid
Where p.userid = 1
I would try this instead: on your original JOIN you have an OR associated with a parameter; move that to your WHERE clause.
SELECT
tbladdeditem.addeditemid,
tblprivateitem.iitemid,
tblprivateitem.itemid
FROM tbladdeditem
INNER JOIN tblprivateitem
ON tblprivateitem.itemid=tbladdeditem.itemid
WHERE tbladdeditem.userid=?userid
AND (tblprivateitem.userid=?userid OR tblprivateitem.userid=1)