JOIN query taking long time and creating issue "converting HEAP to MyISAM" - mysql

My query is below; it uses joins to fetch the data. Can you suggest how I can solve the "converting HEAP to MyISAM" issue?
Could I use a subquery here instead? If so, how?
I join the users table only to check that the user exists. Can I rewrite the query without that join so that the "converting HEAP to MyISAM" state goes away?
One more thing: sometimes I will not filter on a specific user_id. In this example I have added user_id = 16082.
SELECT `user_point_logs`.`id`,
`user_point_logs`.`user_id`,
`user_point_logs`.`point_get_id`,
`user_point_logs`.`point`,
`user_point_logs`.`expire_date`,
`user_point_logs`.`point` AS `sum_point`,
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0, sum(`user_point_used_logs`.`point`)) AS `minus`
FROM `user_point_logs`
JOIN `users` ON ( `users`.`id` = `user_point_logs`.`user_id` )
LEFT JOIN (SELECT *
FROM user_point_used_logs
WHERE user_point_log_id NOT IN (
SELECT DISTINCT return_id
FROM user_point_logs
WHERE return_id IS NOT NULL
AND user_id = 16082
)
)
AS user_point_used_logs
ON ( `user_point_logs`.`id` = `user_point_used_logs`.`user_point_log_used_id` )
WHERE expire_date >= 1563980400
AND `user_point_logs`.`point` >= 0
AND `users`.`id` IS NOT NULL
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
AND `user_point_logs`.`user_id` = '16082'
GROUP BY `user_point_logs`.`id`
ORDER BY `user_point_logs`.`expire_date` ASC
DB FIDDLE HERE WITH STRUCTURE

Try this. If it works, it can be optimized further by adding a composite index.
SELECT
upl.id,
upl.user_id,
upl.point_get_id,
upl.point,
upl.expire_date,
upl.point AS sum_point,
COALESCE(SUM(upul.point), 0) AS minus -- sums the used points; replaces the IF(... IS NULL ...) form
FROM user_point_logs upl
JOIN users u ON upl.user_id = u.id
LEFT JOIN (SELECT supul.user_point_log_used_id, supul.point
FROM user_point_used_logs supul
LEFT JOIN user_point_logs supl ON supl.return_id = supul.user_point_log_id AND supl.user_id = 16082
WHERE supl.id IS NULL) AS upul -- anti-join: keeps used-log rows whose user_point_log_id is not a return_id of this user
ON upl.id = upul.user_point_log_used_id
WHERE
upl.user_id = 16082 AND COALESCE(upl.return_id, 0) = 0
AND upl.expire_date >= 1563980400 -- tip: if this is a Unix timestamp, consider a DATETIME/TIMESTAMP column and a date range
AND upl.point >= 0 -- keep this unless negative points cannot occur; NOT NULL alone does not imply >= 0
#AND u.id IS NOT NULL -- removed: the inner join already guarantees a matching users row
GROUP BY upl.id
ORDER BY upl.expire_date ASC;
Edit:
Try adding an index on the return_id column of user_point_logs, since that column is used in the join inside the derived table.
Alternatively, use a composite index on (user_id, return_id).

Indexes:
user_point_logs: (user_id, expire_date)
user_point_logs: (user_id, return_id)
OR is hard to optimize. Decide on only one way to say whatever is being said here, then get rid of the OR:
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
DISTINCT is redundant:
NOT IN ( SELECT DISTINCT ... )
Change
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0,
sum(`user_point_used_logs`.`point`)) AS `minus`
to
COALESCE( ( SELECT SUM(point) FROM user_point_used_logs ... ), 0) AS minus
and toss LEFT JOIN (SELECT * FROM user_point_used_logs ... )
Since a PRIMARY KEY is a key, the second of these is redundant and can be DROPped:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`) USING BTREE;
After all that, we may need another pass to further simplify and optimize it.
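Putting the suggestions above together, the rewrite could look roughly like the sketch below. This is only one possible reading of the advice, not a tested replacement: the index names are made up, the index on user_point_used_logs is an extra assumption that is not part of the answer, and it assumes users.id is unique so that GROUP BY is no longer needed once the LEFT JOIN is gone. The OR on return_id is left in place because only the table owner can decide whether 0 or NULL is the one true "not returned" value.

```sql
-- Indexes suggested above (index names are hypothetical).
ALTER TABLE user_point_logs
    ADD INDEX idx_user_expire (user_id, expire_date),
    ADD INDEX idx_user_return (user_id, return_id);

-- Assumption, not from the answer: lets the correlated subquery
-- find the used-log rows for one user_point_logs.id quickly.
ALTER TABLE user_point_used_logs
    ADD INDEX idx_used_id (user_point_log_used_id);

SELECT upl.id,
       upl.user_id,
       upl.point_get_id,
       upl.point,
       upl.expire_date,
       upl.point AS sum_point,
       COALESCE(
           (SELECT SUM(used.point)
              FROM user_point_used_logs AS used
             WHERE used.user_point_log_used_id = upl.id
               AND used.user_point_log_id NOT IN (
                       SELECT r.return_id          -- DISTINCT dropped: redundant inside NOT IN
                         FROM user_point_logs AS r
                        WHERE r.return_id IS NOT NULL
                          AND r.user_id = 16082)
           ), 0) AS minus
  FROM user_point_logs AS upl
  JOIN users AS u ON u.id = upl.user_id
 WHERE upl.user_id = 16082
   AND upl.expire_date >= 1563980400
   AND upl.point >= 0
   AND (upl.return_id = 0 OR upl.return_id IS NULL)  -- ideally replaced by a single test
 ORDER BY upl.expire_date ASC;
```

Compare the result set with the original query before trusting it, and check EXPLAIN to see whether the temporary table is gone.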


Group by and group concat , optimization mysql query without using main pk

My example is on MySQL version 5.6.34-log.
Problem summary: the query below takes 40 seconds. Total row counts per table:
ORDER_ITEM: 758423 records
PAYMENT: 177272 records
submission_entry: 2165698 records
Details:
The query is in [1].
I added SQL_NO_CACHE so that repeated test runs are not served from the query cache.
I have added the indexes in [2], but saw no significant improvement.
The table structures are in [3].
The EXPLAIN plan is in [4].
[1]
SELECT SQL_NO_CACHE
`payment`.`id` AS id,
`order_item`.`order_id` AS order_id,
GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
THEN ' '
ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer,
event.name AS event,
COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
`order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
payment.make_order_free = 1))
THEN `order_item`.id
ELSE NULL END) AS qty,
payment.currency AS `currency`,
(SELECT SUM(order_item.sub_total)
FROM order_item
WHERE payment_id =
payment.id) AS sub_total,
CASE WHEN payment.make_order_free = 1
THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2) END AS 'total',
`payment_type`.`name` AS payment_type,
payment_status.name AS status,
`payment_status`.`id` AS status_id,
DATE_FORMAT(CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS 'created',
`user`.`name` AS 'agent',
event.id AS event_id,
payment.checked,
DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS checked_date,
DATE_FORMAT(CONVERT_TZ(`payment`.`complete_date`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i') AS `complete date`,
`payment`.`delivery_status` AS `delivered`
FROM `order_item`
INNER JOIN `payment`
ON payment.id = `order_item`.`payment_id` AND (payment.status > 0.0 OR payment.status = -3.0)
LEFT JOIN (SELECT
sum(`payment_refund`.total) AS `refunds_total`,
payment_refunds.payment_id AS `payment_id`
FROM payment
INNER JOIN `payment_refunds` ON payment_refunds.payment_id = payment.id
INNER JOIN `payment` AS `payment_refund`
ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id) AS `refunds` ON `refunds`.payment_id = payment.id
# INNER JOIN event_date_product ON event_date_product.id = order_item.event_date_product_id
# INNER JOIN event_date ON event_date.id = event_date_product.event_date_id
INNER JOIN event ON event.id = order_item.event_id
INNER JOIN payment_status ON payment_status.id = payment.status
INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
LEFT JOIN user ON user.id = payment.completed_by
LEFT JOIN submission_entry ON submission_entry.form_submission_id = `payment`.`form_submission_id`
LEFT JOIN question ON question.id = submission_entry.question_id AND question.var IN ('name', 'email')
WHERE 1 = '1' AND (order_item.status > 0.0 OR order_item.status = -2.0)
GROUP BY `order_item`.`order_id`
HAVING 1 = '1'
ORDER BY `order_item`.`order_id` DESC
LIMIT 10
[2]
CREATE INDEX order_id
ON order_item (order_id);
CREATE INDEX payment_id
ON order_item (payment_id);
CREATE INDEX status
ON order_item (status);
Second Table
CREATE INDEX payment_type_id
ON payment (payment_type_id);
CREATE INDEX status
ON payment (status);
[3]
CREATE TABLE order_item
(
id INT AUTO_INCREMENT
PRIMARY KEY,
order_id INT NOT NULL,
form_submission_id INT NULL,
status DOUBLE DEFAULT '0' NULL,
payment_id INT DEFAULT '0' NULL
);
SECOND TABLE
CREATE TABLE payment
(
id INT AUTO_INCREMENT,
payment_type_id INT NOT NULL,
status DOUBLE NOT NULL,
form_submission_id INT NOT NULL,
PRIMARY KEY (id, payment_type_id)
);
[4] Run the snippet to see the table of EXPLAIN in HTML format
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<table border="1" style="border-collapse:collapse">
<tr><th>id</th><th>select_type</th><th>table</th><th>type</th><th>possible_keys</th><th>key</th><th>key_len</th><th>ref</th><th>rows</th><th>Extra</th></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_status</td><td>range</td><td>PRIMARY</td><td>PRIMARY</td><td>8</td><td>NULL</td><td>4</td><td>Using where; Using temporary; Using filesort</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment</td><td>ref</td><td>PRIMARY,payment_type_id,status</td><td>status</td><td>8</td><td>exp_live_18092017.payment_status.id</td><td>17357</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_type</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.payment_type_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>user</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.completed_by</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>submission_entry</td><td>ref</td><td>form_submission_id,idx_submission_entry_1</td><td>form_submission_id</td><td>4</td><td>exp_live_18092017.payment.form_submission_id</td><td>2</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>question</td><td>eq_ref</td><td>PRIMARY,var</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.submission_entry.question_id</td><td>1</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>order_item</td><td>ref</td><td>status,payment_id</td><td>payment_id</td><td>5</td><td>exp_live_18092017.payment.id</td><td>3</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>event</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.order_item.event_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>&lt;derived3&gt;</td><td>ref</td><td>key0</td><td>key0</td><td>5</td><td>exp_live_18092017.payment.id</td><td>10</td><td>Using where</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refunds</td><td>index</td><td>payment_id,payment_id_refund</td><td>payment_id</td><td>4</td><td>NULL</td><td>1110</td><td></td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id</td><td>1</td><td>Using index</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refund</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id_refund</td><td>1</td><td></td></tr>
<tr><td>2</td><td>DEPENDENT SUBQUERY</td><td>order_item</td><td>ref</td><td>payment_id</td><td>payment_id</td><td>5</td><td>func</td><td>3</td><td></td></tr></table>
</body>
</html>
Expected result
The query should take less than 5 seconds instead of 40.
IMPORTANT
Updates
1) Reply to comment 1: there is no foreign key at all on those two tables.
UPDATE-1:
Locally, the original query takes 40 seconds.
If I remove only the following, it drops to 25 seconds (saves 15 seconds):
GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
THEN ' '
ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer
If I remove only the following, the time stays around 40 seconds (no saving):
COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
`order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
payment.make_order_free = 1))
THEN `order_item`.id
ELSE NULL END) AS qty,
If I remove only the following, it takes around 36 seconds (saves 4 seconds):
(SELECT SUM(order_item.sub_total)
FROM order_item
WHERE payment_id =
payment.id) AS sub_total,
CASE WHEN payment.make_order_free = 1
THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2) END AS 'total',
Remove HAVING 1=1; the Optimizer may not be smart enough to ignore it. Please provide EXPLAIN SELECT (not in html) to see what the Optimizer is doing.
It seems wrong to have a composite PK in this case: PRIMARY KEY (id, payment_type_id). Please justify it.
Please explain the meaning of status or the need for DOUBLE: status DOUBLE
It will take some effort to figure out why the query is so slow. Let's start by tossing the normalization parts, such as dates and event name and currency. That is, whittle the query down to just enough to find the desired rows, without the details on each row. If it is still slow, let's debug that. If it is 'fast', then add the other stuff back, one piece at a time, to find out what is causing the performance issue.
Is just id the PRIMARY KEY of each table? Or are there more exceptions (like payment)?
It seems 'wrong' to specify a value for question.var, but then use LEFT to imply that it is optional. Please change all LEFT JOINs to INNER JOINs unless I am mistaken on this issue.
Are any of the tables (perhaps submission_entry and event_date_product) "many-to-many" mapping tables? If so, then follow the tips here to get some performance gains.
When you come back please provide SHOW CREATE TABLE for each table.
Guided by the strategies below:
pre-evaluating aggregations into temporary tables
placing payment at the top, since it seems to be the most deterministic
grouping joins, to make the table relationships explicit to the query optimizer
I present a revised version of your query:
-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------
drop temporary table if exists _ord_itm_sub_tot;
create temporary table _ord_itm_sub_tot(
primary key (payment_id)
)
SELECT
payment_id,
--
COUNT(
DISTINCT
CASE
WHEN(
`order_item`.status > 0 OR
(
`order_item`.status != -1 AND
`order_item`.status >= -2 AND
`payment`.payment_type_id != 8 AND
payment.make_order_free = 1
)
) THEN `order_item`.id
ELSE NULL
END
) AS qty,
--
SUM(order_item.sub_total) sub_total
FROM
order_item
inner join payment
on payment.id = order_item.payment_id
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;
-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------
drop temporary table if exists _pay_ref_tot;
create temporary table _pay_ref_tot(
primary key(payment_id)
)
SELECT
payment_refunds.payment_id AS `payment_id`,
sum(`payment_refund`.total) AS `refunds_total`
FROM
`payment_refunds`
INNER JOIN `payment` AS `payment_refund`
ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;
-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------
drop temporary table if exists _sub_ent;
create temporary table _sub_ent(
primary key(form_submission_id)
)
select
submission_entry.form_submission_id,
GROUP_CONCAT(
DISTINCT (
CASE WHEN COALESCE(submission_entry.text, '') = '' THEN ' '
ELSE submission_entry.text
END
)
ORDER BY question.var
DESC SEPARATOR 0x1D
) AS buyer
from
submission_entry
LEFT JOIN question
ON(
question.id = submission_entry.question_id
AND question.var IN ('name', 'email')
)
group by submission_entry.form_submission_id;
-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------
SELECT SQL_NO_CACHE
`payment`.`id` AS id,
`order_item`.`order_id` AS order_id,
--
_sub_ent.buyer,
--
event.name AS event,
--
_ord_itm_sub_tot.qty,
--
payment.currency AS `currency`,
--
_ord_itm_sub_tot.sub_total,
--
CASE
WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
ELSE ROUND(payment.total, 2)
END AS 'total',
--
`payment_type`.`name` AS payment_type,
`payment_status`.`name` AS status,
`payment_status`.`id` AS status_id,
--
DATE_FORMAT(
CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
'%Y-%m-%d %H:%i'
) AS 'created',
--
`user`.`name` AS 'agent',
event.id AS event_id,
payment.checked,
--
DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
--
`payment`.`delivery_status` AS `delivered`
FROM
`payment`
INNER JOIN(
`order_item`
INNER JOIN event
ON event.id = order_item.event_id
)
ON `order_item`.`payment_id` = payment.id
--
inner join _ord_itm_sub_tot
on _ord_itm_sub_tot.payment_id = payment.id
--
LEFT JOIN _pay_ref_tot
on _pay_ref_tot.payment_id = `payment`.id
--
INNER JOIN payment_status ON payment_status.id = payment.status
INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
LEFT JOIN user ON user.id = payment.completed_by
--
LEFT JOIN _sub_ent
on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10
The query in your question mixes aggregate functions with columns that are not covered by its grouping, which is pretty awkward; in my solution I try to devise aggregations that 'make sense'.
Please run this version and tell us your findings.
Please check not only the running time but also the summarization results.
(The tables and query are too complex for me to do the transformation for you. But here are the steps.)
1. Reformulate the query without any mention of refunds. That is, remove the derived table and the mention of it in the complex CASE.
2. Debug and time the resulting query. Keep the GROUP BY order_id ORDER BY order_id DESC LIMIT 10 and apply any other optimizations already suggested. In particular, get rid of HAVING 1=1, since it is in the way of a likely optimization.
3. Make the query from step 2 a 'derived table'...
Something like:
SELECT lots of stuff
FROM ( query from step 2 ) AS step2
LEFT JOIN ( ... ) AS refunds ON step2... = refunds...
ORDER BY step2.order_id DESC
The ORDER BY is repeated, but neither the GROUP BY, nor the LIMIT need be repeated.
Why? The principle here is...
Currently, it is going into the refunds correlated subquery thousands of times, only to toss it all but 10 times. The reformulation cuts that back to only 10 times.
(Caveat: I may have missed a subtlety preventing this reformulation from working as I presented it. If it does not work, see if you can make the 'principle' help you anyway.)
Here is the minimum you should do each time you see a query with a lot of joins and pagination: select the 10 (LIMIT 10) ids that you group by from the first table (order_item) with as few joins as possible, then join those ids back to the first table and make all the other joins. That way you do not drag through temporary tables all the thousands of columns and rows that you never display.
You look at the inner joins and WHERE conditions, GROUP BYs and ORDER BYs to see whether you need any other tables to filter out rows, group or order ids from the first table. In your case, it doesn't seem you need any joins, except for payment.
Now you write the query to select those ids:
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
If there might be several payments for a single order, you should use GROUP BY order_id DESC instead of ORDER BY. To make the query work quicker you need a BTREE index on status column for order_item table, or even a composite index on (status, payment_id).
Now, when you are sure that the ids are those that you expected, you make all other joins:
SELECT order_item.order_id,
`payment`.`id`,
GROUP_CONCAT ... -- and so on from the original query
FROM (
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
) as ids
JOIN order_item ON ids.order_id = order_item.order_id
JOIN payment ON ids.payment_id = payment.id
LEFT JOIN ( ... -- and so on
The idea is that you significantly lower the temporary tables you need to process. Now every row selected by the joins will be used in the result set.
UPD1: Another thing is that you should simplify the aggregation in LEFT JOIN:
SELECT
sum(payment.total) AS `refunds_total`,
refs.payment_id AS `payment_id`
FROM payment_refunds refs
JOIN payment ON payment.id = refs.payment_id_refund
GROUP BY refs.payment_id
or even replace the LEFT JOIN with a correlated subquery, since the correlation will be executed only for those 10 rows (make sure, you use this whole query with three columns as the subquery, otherwise, the correlation will be computed for each row in the resulting join before the GROUP BY):
SELECT
ids.order_id,
ids.payment_id,
(SELECT SUM(p.total)
FROM payment_refunds refs
JOIN payment p
ON refs.payment_id_refund = p.id
WHERE refs.payment_id = ids.payment_id
) as refunds_total
FROM (
SELECT o.order_id, o.payment_id
FROM order_item o
JOIN payment p
ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
WHERE o.status > 0.0 OR o.status = -2.0
ORDER BY order_id DESC
LIMIT 10
) as ids
You will also need an index (payment_id, payment_id_refund) on payment_refunds, and you can even try a covering index (id, total) on payment as well.
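For reference, the indexes mentioned in this answer could be created along these lines. The index names are invented for illustration, and the last statement assumes the covering index on payment is meant to start with payment.id, since that is the column joined to payment_id_refund.

```sql
-- Helps the inner "pick 10 ids" query filter on status and join to payment.
CREATE INDEX idx_order_item_status_payment
    ON order_item (status, payment_id);

-- Lets the refunds lookup go from payment_id straight to payment_id_refund.
CREATE INDEX idx_payment_refunds_pair
    ON payment_refunds (payment_id, payment_id_refund);

-- Covering index so SUM(total) for refund payments can be read from the index alone.
CREATE INDEX idx_payment_id_total
    ON payment (id, total);
```

Check with EXPLAIN whether the optimizer actually picks them; an index that is never used only slows down writes.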

Optimize table to avoid using temporary and using filesort

I have a messages table
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`author` int(11) DEFAULT NULL,
`time` int(10) unsigned DEFAULT NULL,
`text` text CHARACTER SET latin1,
`dest` int(11) unsigned DEFAULT NULL,
`type` tinyint(4) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `author` (`author`),
KEY `dest` (`dest`)
) ENGINE=InnoDB AUTO_INCREMENT=2758 DEFAULT CHARSET=utf8;
I need to get messages between two users
SELECT
...
FROM
`messages` m
LEFT JOIN `people` p ON m.author = p.id
WHERE
(author = 1 AND dest = 2)
OR (author = 2 AND dest = 1)
ORDER BY
m.id DESC
LIMIT 0, 25
When I EXPLAIN this query I get
Please excuse any ignorance, but is there a way to optimize this table so that the query avoids a temporary table and filesort? It is not causing a problem yet, but I am fairly sure it will become troublesome in the future.
First, I'm guessing the left join is not necessary. Second, consider using union all instead. Then one approach is:
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE m.author = 1 AND m.dest = 2
ORDER BY m.id DESC
LIMIT 25
)
UNION ALL
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE m.author = 2 AND m.dest = 1
ORDER BY m.id DESC
LIMIT 25
)
ORDER BY id DESC
LIMIT 0, 25
With this query, an index on messages(author, dest, id) should make it fast. (Note: you might need to include m.id in the SELECT list.)
To build on Gordon's answer:
SELECT m2..., p...
FROM
(
( SELECT id
FROM messages
WHERE author = 1
AND dest = 2
ORDER BY id DESC
LIMIT 75
)
UNION ALL
(
SELECT id
FROM messages
WHERE author = 2
AND dest = 1
ORDER BY id DESC
LIMIT 75
)
ORDER BY id DESC
LIMIT 50, 25 ) AS m1
JOIN messages AS m2 ON m2.id = m1.id
JOIN people p ON p.id = m2.author
ORDER BY m1.id DESC
Notes:
Gordon's index is now "covering". (This adds efficiency, thereby masking some of the other stuff I added.)
Lazy evaluation means that it does not need to shovel all the bulky fields of more than 25 rows around. Instead, only 25 need to be handled. Also, I avoid touching people to start with.
The code shows what "page 3" should look like. Note LIMIT 75 versus LIMIT 50,25.
"Pagination via OFFSET" has several problems. See my blog.
This formulation still will not avoid "filesort" and "using temp". But speed is the real goal, correct? ("Filesort" is a misnomer -- if you don't include that TEXT column, the sort will be done in RAM.)
When you add INDEX(author, dest, id), INDEX(author) becomes redundant; drop it.
The ALL after UNION is not the default for UNION, but it avoids an extra pass (and temp table) to de-duplicate the data.
There will still be 2 or 3 temp tables involved. See EXPLAIN FORMAT=JSON SELECT ... for details.
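The index change described in the notes could be applied in one statement, roughly as below. The new index name is made up; the dropped index is the KEY named author from the CREATE TABLE in the question.

```sql
ALTER TABLE messages
    ADD INDEX idx_author_dest_id (author, dest, id),  -- Gordon's index; covering for the id-only subqueries
    DROP INDEX author;                                -- redundant: (author) is a prefix of the new index
```

Since the table is InnoDB, every secondary index already carries the primary key, so an index on just (author, dest) would probably behave the same; listing id explicitly simply makes the intent visible.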

Using an OR clause in my INNER JOIN

I have the following two tables,
CREATE TABLE logins (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
user_id_1 INT NOT NULL,
user_id_2 INT DEFAULT 0,
user_id_3 INT DEFAULT 0,
PRIMARY KEY (id)
) ENGINE=MyISAM;
CREATE TABLE user_data (
user_id INT NOT NULL,
day DATE NOT NULL,
PRIMARY KEY (`user_id`, `day`)
) ENGINE=MyISAM;
This schema could use a refactor, but I've inherited it and have to write a query now that does a JOIN with both logins and user_data. I need to select all the rows in user_data that have a > 0 value for one of the three user_id_? keys.
I'm not entirely sure how to compile this query, was thinking something along the lines of:
SELECT logins.user_id_1, logins.user_id_2, logins.user_id_3, user_data.day
FROM logins
INNER JOIN user_data
ON (logins.user_id_1 = user_data.user_id OR ??)
What's the best way to query for this where I will retrieve up to 3 rows, one for each user_id_?
OR is allowed. Another option is to left join to the user_data table three times, and check to see if any of them came back. You might want to try both and see which performs better. My guess is they will be about the same, but I'm not deeply familiar with the MySQL plan generator.
SELECT
l.user_id_1
,l.user_id_2
,l.user_id_3
-- consider: what if there is a match in more
-- than one table? what do you want to happen?
,case when ud1.day is not null then ud1.day
when ud2.day is not null then ud2.day
when ud3.day is not null then ud3.day
else null
end as day
FROM
logins l
left JOIN user_data ud1 on ud1.user_id = l.user_id_1
left join user_data ud2 on ud2.user_id = l.user_id_2
left join user_data ud3 on ud3.user_id = l.user_id_3
where ud1.user_id is not null
or ud2.user_id is not null
or ud3.user_id is not null
You can use OR.
SELECT logins.user_id_1, logins.user_id_2, logins.user_id_3, user_data.day
FROM logins
INNER JOIN user_data
ON logins.user_id_1 = user_data.user_id OR
logins.user_id_2 = user_data.user_id OR
logins.user_id_3 = user_data.user_id
This syntax selects all the records that match any one of the conditions.
You could try this:
SELECT logins.user_id_1, logins.user_id_2, logins.user_id_3, user_data.day
FROM logins,user_data
WHERE user_data.user_id in (logins.user_id_1,logins.user_id_2,logins.user_id_3)
You can use any combination of conditions in an ON clause. There are no limits on OR, AND and other boolean operators; it works the same as a WHERE clause.
SELECT logins.user_id_1, logins.user_id_2, logins.user_id_3, user_data.day
FROM logins
INNER JOIN user_data
ON logins.user_id_1 = user_data.user_id OR
logins.user_id_2 = user_data.user_id OR
logins.user_id_3 = user_data.user_id
Or perhaps you don't need logins.user_id_1, logins.user_id_2 and logins.user_id_3 at all, just user_data.user_id:
SELECT user_data.user_id, user_data.day
FROM logins
INNER JOIN user_data
ON logins.user_id_1 = user_data.user_id OR
logins.user_id_2 = user_data.user_id OR
logins.user_id_3 = user_data.user_id
You might use a UNION. This returns multiple rows per logins row; I don't know whether that is what you need:
SELECT l.user_id, u.*
FROM user_data as u
INNER JOIN
(
select user_id_1 as user_id -- , could add other columns...
from logins
union
select user_id_2 -- , could add other columns...
from logins
where user_id_2 > 0
union
select user_id_3 -- , could add other columns...
from logins
where user_id_3 > 0
) as l
ON l.user_id = u.user_id

MYSQL order large database by decimal 10,10

I have about 25 million rows containing 0.183463545, 0.183423545, 0.183443545, 0.183443445, 0.183447545 and so on.
I need to order these, but it currently takes around 20 seconds. Is there any way to speed it up? AFAIK, I have my indexes in place correctly.
Thank you!
SELECT `a`.`float_val`,
`a`.`num_id`,
`b`.`userID`,
`c`.`img`,
`c`.`username`,
`b`.`img`,
`d`.`exterior`
FROM `a`
INNER JOIN `b` ON `b`.`num_id` = `a`.`num_id`
INNER JOIN `d` ON `d`.`id` = `b`.`item`
INNER JOIN `c` ON `c`.`userID` = `b`.`userID`
WHERE `float_val` IS NOT NULL
AND `float_val` BETWEEN 0 AND 1
AND `username` = 'ABC'
ORDER BY `float_val` LIMIT 100
Index is on float_val, num_id, userID
CREATE TABLE `float` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`num_id` bigint(11) DEFAULT NULL,
`float_val` decimal(10,10) DEFAULT NULL,
`userID` char(17) DEFAULT NULL,
`last_checked` datetime DEFAULT NULL,
`index10` smallint(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `floatID` (`num_id`),
UNIQUE KEY `num_id` (`num_id`,`userID`),
KEY `userID` (`userID`),
KEY `float_val` (`float_val`),
KEY `last_checked` (`last_checked`),
KEY `index10` (`index10`)
) ENGINE=InnoDB AUTO_INCREMENT=25750916 DEFAULT CHARSET=latin1;
Edited to reflect table definitions shown.
Start by doing a query to obtain the lowest float_val rows from table a.
SELECT a.id
FROM a
INNER JOIN b ON b.num_id = a.num_id
INNER JOIN d ON d.id = b.item
INNER JOIN c ON c.userID = b.userID
WHERE c.username = 'ABC'
AND a.float_val BETWEEN 0.0 AND 1.0
ORDER BY a.float_val
LIMIT 100
If you have an index on a(float_val, num_id), and another on c.username, this will be reasonably fast. It will spit out the id values for the rows of a that are candidates for your query. (If you're using MyISAM, you need an index on a(float_val, num_id, id).) By the way, the BETWEEN clause also implies IS NOT NULL.
Then use that as a subquery to complete your query, as follows.
SELECT a.float_val,
a.num_id,
b.userID,
c.img,
c.username,
b.img,
d.exterior
FROM a
INNER JOIN (
SELECT a.id
FROM a
INNER JOIN b ON b.num_id = a.num_id
INNER JOIN d ON d.id = b.item
INNER JOIN c ON c.userID = b.userID
WHERE c.username = 'ABC'
AND a.float_val BETWEEN 0.0 AND 1.0
ORDER BY a.float_val
LIMIT 100
) q ON a.ID = q.id
INNER JOIN b ON b.num_id = a.num_id
INNER JOIN d ON d.id = b.item
INNER JOIN c ON c.userID = b.userID
WHERE c.username = 'ABC'
AND a.float_val BETWEEN 0.0 AND 1.0
ORDER BY a.float_val LIMIT 100
This kind of query contains a deferred join. That dramatically reduces the number of rows of the full query that need to be subjected to ORDER BY ... LIMIT. Without the deferred join, your original query sorts an enormous mess of quite long rows just to discard all but the first hundred of them. That's why it takes so long.
This should help. The next optimization step is to look at the EXPLAIN output from this query and the exact definitions of your tables.
Pro tip: In queries of this complexity, always qualify column names with table names or aliases. That is, use a.float_val throughout, rather than just float_val. This is a kindness to the next person to look at the query.
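The two indexes this answer relies on could be added roughly as follows. The index names are hypothetical, and a and c stand for the placeholder table names used in the question's query (a appears to be the table shown as `float` in the CREATE TABLE).

```sql
-- Compound index for the ORDER BY float_val ... LIMIT scan and the join on num_id.
ALTER TABLE a ADD INDEX idx_float_val_num_id (float_val, num_id);

-- Lets the username = 'ABC' filter find its rows without a table scan.
ALTER TABLE c ADD INDEX idx_username (username);
```

Once the compound index exists, the single-column KEY float_val (float_val) from the table definition is a prefix of it and is likely redundant; confirm with EXPLAIN before dropping it.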

How to speed up left join queries by indexing?

At the moment I am experiencing some slower MySQL queries in my application which I want to speed up. Unfortunately I’m not quite sure which is the correct way to do it.
I have the following (fictitious) tables: Book, Page and Word.
Word is child of Page by word_page_id
Page is child of Book by page_book_id
I already have individual indexes on page_book_id, word_page_id, book_user_id and book_flag_delete.
SELECT `book`.*, COUNT(word_id) AS `word_amount` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = paragraph_id
WHERE (book_user_id = 1) AND (book_flag_delete IS NULL)
GROUP BY `book_id`
ORDER BY `book_id` ASC LIMIT 100
SELECT COUNT(DISTINCT `book_id`) AS `book_row_count` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = page_id
WHERE (book_user_id = 59) AND (book_flag_delete IS NULL)
Any ideas how to speed up such queries?
Is there extra indexing involved?
Set indexes on the fields you use for joining.
Also make sure that both sides of each join have the same datatype, encoding, and collation; otherwise the index will not be used.
mysql> EXPLAIN <query> shows the index actually used (the key column in the output) and the candidate indexes (the possible_keys column).
For this query:
SELECT b.*, COUNT(w.word_id) AS `word_amount`
FROM `book` b LEFT JOIN
`page` p
ON p.page_book_id = b.book_id LEFT JOIN
`word` w
ON w.word_page_id = p.paragraph_id
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
GROUP BY b.`book_id`
ORDER BY b.`book_id` ASC
LIMIT 100;
The best indexes are: book(book_user_id, book_flag_delete, book_id), page(page_book_id, paragraph_id), and word(word_page_id, word_id).
However, the overall group by might be expensive. You might try writing the query as:
SELECT b.*,
(SELECT COUNT(w.word_id)
FROM `page` p JOIN
`word` w
ON w.word_page_id = p.paragraph_id
WHERE p.page_book_id = b.book_id
) AS `word_amount`
FROM `book` b
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
ORDER BY b.`book_id` ASC
LIMIT 100;
The same indexes work here. But this query should avoid a group by on all the data at once (instead, it uses the indexes for the aggregation).
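As a sketch, the three indexes named above could be created like this. The index names are invented, and the column names follow this answer's query (the question uses paragraph_id in one query and page_id in the other, so adjust to whichever column really exists).

```sql
-- Filter columns first, then book_id for the ORDER BY ... LIMIT.
CREATE INDEX idx_book_user_delete_id
    ON book (book_user_id, book_flag_delete, book_id);

-- Find a book's pages and hand the page key to the next join from the index alone.
CREATE INDEX idx_page_book_paragraph
    ON page (page_book_id, paragraph_id);

-- Find a page's words; word_id is included so COUNT(word_id) needs no row lookup.
CREATE INDEX idx_word_page_word
    ON word (word_page_id, word_id);
```

The existing single-column indexes on page_book_id and word_page_id become prefixes of the new ones and can probably be dropped afterwards.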
The optimal schema for a many-to-many mapping table is
CREATE TABLE XtoY (
# No surrogate id for this table
x_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to one table
y_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to the other table
# Include other fields specific to the 'relation'
PRIMARY KEY(x_id, y_id), -- When starting with X
INDEX (y_id, x_id) -- When starting with Y
) ENGINE=InnoDB;
The details on 'why' are in my index cookbook
In your SELECT, avoid using the wildcard "*" to grab columns; list only the columns you need. Also use table aliases consistently, so that every column reference is short and unambiguous.
select book1.column1, book1.column2, page1.column1
from book book1
left join page page1
on page1.page_book_id = book1.book_id
..... blah