Please teach me how to optimize this MySQL query - mysql

I have two tables, main_part (3k records) and part_details (25k records)
I tried the following indexes but explain always returns full table scan of 25k records as opposed to about 2k of matched records and Using where; Using temporary; Using filesort
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`unit`);
ALTER TABLE `part_details` ADD INDEX `part_details_index_1` (`approved`, `display`, `country`, `id`, `price`);
Here is my query:
SELECT a.part_id, b.my_title,
b.price, a.type,
a.unit, a.certification,
b.my_image,
b.price/a.unit AS priceW
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
WHERE b.approved = 'Yes'
AND b.display = 'On'
AND b.country = 'US'
AND a.unit >= 300
ORDER BY b.price ASC LIMIT 50
One thing that I am aware of is that a.part_id is not a Primary Key in main_part table. Could this be a culprit?
Create tables SQL:
CREATE TABLE `main_part` (
`id` smallint(6) NOT NULL AUTO_INCREMENT,
`part_id` mediumint(9) NOT NULL DEFAULT '0',
`type` varchar(50) NOT NULL DEFAULT '',
`unit` varchar(50) NOT NULL DEFAULT '',
`certification` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `part_details` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`asn` varchar(50) NOT NULL DEFAULT '',
`country` varchar(5) NOT NULL DEFAULT '',
`my_title` varchar(200) NOT NULL DEFAULT '',
`display` varchar(5) NOT NULL DEFAULT 'On',
`approved` varchar(5) NOT NULL DEFAULT 'No',
`price` decimal(7,3) NOT NULL DEFAULT '0.000',
`my_image` varchar(250) NOT NULL DEFAULT '',
`update_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `countryasn` (`country`,`asn`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

For your query the more important index is the JOIN condition and as you are already aware a.part_id isn't primary key, so doesn't have a default index and your first try should be:
ALTER TABLE `main_part` ADD INDEX `main_part_index_1` (`part_id`,`unit`);
Because we are interested on the JOIN condition first you should also change the second index to
ALTER TABLE `part_details` ADD INDEX `part_details_index_1`
(`id`, `approved`, `display`, `country`, `price`);
order matters in the index
Another tip is you start with the basic query:
SELECT *
FROM main_part AS a
INNER JOIN part_details AS b ON a.part_id=b.id
Add index for part_id and id check the explain plan and then start adding condition and updating the index if required.

It seems that most columns used for filtering in part_details aren't very selective (display is probably an On/off switch, country is probably very similar in many products, etc.).
In some cases, when the WHERE clause is not very selective, MySQL may choose to use an index that better suits the ORDER BY clause.
I would try to create this index as well and check in the explain plan if there is any changes:
ALTER TABLE `part_details` ADD INDEX `part_details_price_index` (`price`);

For this query:
SELECT mp.part_id, pd.my_title, pd.price, mp.type,
mp.unit, mp.certification, pd.my_image,
pd.price/mp.unit AS priceW
FROM main_part mp INNER JOIN
part_details pd
ON mp.part_id = pd.id
WHERE pd.approved = 'Yes' AND
pd.display = 'On' AND
pd.country = 'US' AND
mp.unit >= 300
ORDER BY pd.price ASC
LIMIT 50;
For this query, I would start with indexes on part_details(country, display, approved, id, price) and main_part(part_id, unit).
The index on part_details can be used for filtering before the join. It is not easy to get rid of the sort for the order by.

Related

Any way to do this query faster with big data

This query takes around 2.23seconds and feels a bit slow ... is there anyway to make it faster.
our member.id, member_id, membership_id, valid_to, valid_from has index as well.
select *
from member
where (member.id in ( select member_id from member_membership mm
INNER JOIN membership m ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
EXPLAIN FOR WHAT QUERY DOING: every member has many a member_memberships and and member_memberships connect with another table called membership there we have the membership details. so query will get all members that has valid memberships and where the organization id 2513 exist on member_membership.
Tables as following:
CREATE TABLE `member` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `member_membership` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`membership_id` int(11) DEFAULT NULL,
`member_id` int(11) DEFAULT NULL,
`organization_chain` text DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `member_membership_to_membership` (`membership_id`),
KEY `member_membership_to_member` (`member_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `membership` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`valid_to` datetime DEFAULT NULL,
`valid_from` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `valid_to` (`valid_to`),
KEY `valid_from` (`valid_from`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
ALTER TABLE `member_membership` ADD CONSTRAINT `member_membership_to_membership` FOREIGN KEY (`membership_id`) REFERENCES `membership` (`id`);
ALTER TABLE `member_membership` ADD CONSTRAINT `member_membership_to_member` FOREIGN KEY (`member_id`) REFERENCES `member` (`id`);
Here with EXPLAIN statement => https://i.ibb.co/xjrcYWR/EXPLAIN.png
Relations
member has many member_membership
membership has manymember_membership
So member_membership is like join for tables member and membership.
Well I found a way to make it less to 800ms ... like this. Is this good way or maybe there is more we can do?
select *
from member
where (member.id in ( select member_id from member_membership mm FORCE INDEX (PRIMARY)
INNER JOIN membership m ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
NEW UPDATE.. and I think this solve the issue.. 15ms :)
I added FORCE INDEX..
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
select *
from member
where (member.id in ( select member_id from member_membership mm FORCE INDEX (member_membership_to_member)
INNER JOIN membership m FORCE INDEX (organization_to_membership) ON mm.membership_id = m.id
where instr(organization_chain, 2513) and m.valid_to > NOW() and m.valid_from < NOW() ) )
order by id desc
limit 10 offset 0
How big is organization_chain? If you don't need TEXT, use a reasonably sized VARCHAR so that it could be in an index. Better yet, is there some way to get 2513 in a column by itself?
Don't use id int(11) NOT NULL AUTO_INCREMENT, in a many-to-many table; rather have the two columns in PRIMARY KEY.
Put the ORDER BY and LIMIT in the subquery.
Don't use IN ( SELECT ...), use a JOIN.

Mysql select query with subqueries with index taking long time to excute about(28 sec)

I am trying to run below query on MYsql, The query is taking too long to run.
I need to extract for each supplier:
total_purchases
total_sales
If supplier has an opening balance different from null, use it in where clause in subqueries, else use '2020-01-01'
Here is the query I am using:
SELECT
sup.name AS supplier_name,
sup.id AS supplier_id,
sup.opening_balance_date,
sup.opening_balance AS opening_balance,
(
SELECT
IFNULL(SUM(pop.quantity * pop.cost),0) AS total_purchases
FROM purchase_order_products pop
JOIN purchase_order po ON
pop.purchase_order_id = po.id
WHERE
DATE(po.created_at) >= IFNULL(sup.opening_balance_date, '2020-01-01')
AND DATE(po.created_at) < '2020-03-01'
AND po.status = 'approved'
AND po.supplier_id = sup.id
) as total_purchases,
(
SELECT
IFNULL(sum(soi.total_cost),0) AS total_sales
FROM
sales_order_item soi
JOIN sales_order so use index (date_status_completed) ON
so.id = soi.sales_order_id
WHERE
soi.total_cost > 0
AND soi.supplier_id = sup.id
AND so.order_status = 'complete'
AND so.completed_returned = 0
AND so.desired_delivery_date >= IFNULL(sup.opening_balance_date, '2020-01-01')
) AS total_sales
FROM supplier sup
WHERE sup.is_active = 1
group by sup.id
ORDER BY sup.name;
If I run the query without the below subquery, it tooks 0.5 sec which is accepted.
(
SELECT
IFNULL(sum(soi.total_cost),0) AS total_sales
FROM
sales_order_item soi
JOIN sales_order so use index (date_status_completed) ON
so.id = soi.sales_order_id
WHERE
soi.total_cost > 0
AND soi.supplier_id = sup.id
AND so.order_status = 'complete'
AND so.completed_returned = 0
AND so.desired_delivery_date >= IFNULL(sup.opening_balance_date, '2020-01-01')
) AS total_sales
I think the problem is here: soi.supplier_id = sup.id
here is the explain
and here the database
Create Query:
--
-- Table structure for table `purchase_order`
--
CREATE TABLE `purchase_order` (
`id` int(11) NOT NULL,
`supplier_id` int(11) NOT NULL,
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`status` varchar(150) DEFAULT 'Pending',
`total` float NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `purchase_order_products`
--
CREATE TABLE `purchase_order_products` (
`id` int(11) NOT NULL,
`purchase_order_id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`quantity` int(50) NOT NULL,
`cost` decimal(15,3) NOT NULL,
`reporting_quantity` int(11) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `sales_order`
--
CREATE TABLE `sales_order` (
`id` varchar(20) NOT NULL,
`order_status` varchar(50) DEFAULT NULL,
`desired_delivery_date` date DEFAULT NULL,
`completed_returned` int(1) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `sales_order_item`
--
CREATE TABLE `sales_order_item` (
`id` varchar(20) NOT NULL,
`sales_order_id` varchar(20) NOT NULL,
`total_cost` float NOT NULL DEFAULT '0',
`supplier_id` int(11) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `supplier`
--
CREATE TABLE `supplier` (
`id` int(11) NOT NULL,
`name` varchar(400) NOT NULL,
`opening_balance` decimal(10,2) NOT NULL DEFAULT '0.00',
`opening_balance_date` date DEFAULT NULL,
`is_active` tinyint(4) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Indexes for dumped tables
--
--
-- Indexes for table `credit_note`
--
-- Indexes for table `purchase_order`
--
ALTER TABLE `purchase_order`
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`),
ADD KEY `supplier_id` (`supplier_id`),
ADD KEY `created_at` (`created_at`,`status`),
ADD KEY `supplier_id_2` (`supplier_id`,`created_at`),
ADD KEY `id_date_status` (`supplier_id`,`created_at`,`status`) USING BTREE;
--
-- Indexes for table `purchase_order_products`
--
ALTER TABLE `purchase_order_products`
ADD PRIMARY KEY (`id`),
ADD KEY `purchase_order_id` (`purchase_order_id`),
ADD KEY `product_id` (`product_id`),
ADD KEY `cost` (`cost`,`reporting_quantity`);
--
-- Indexes for table `sales_order`
--
ALTER TABLE `sales_order`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `idx_id` (`id`),
ADD KEY `idx_desired_delivery_date` (`desired_delivery_date`),
ADD KEY `date_status_completed` (`order_status`,`desired_delivery_date`,`completed_returned`) USING BTREE,
ADD KEY `completed_returned` (`completed_returned`),
ADD KEY `order_status` (`order_status`),
ADD KEY `order_status_2` (`order_status`,`completed_returned`);
--
-- Indexes for table `sales_order_item`
--
ALTER TABLE `sales_order_item`
ADD PRIMARY KEY (`id`),
ADD KEY `sales_order_id` (`sales_order_id`),
ADD KEY `reporting_supplier` (`supplier_id`) USING BTREE,
ADD KEY `total_cost` (`total_cost`),
ADD KEY `supplier_cost` (`supplier_id`,`total_cost`) USING BTREE,
ADD KEY `supplier_id` (`supplier_id`);
--
-- Indexes for table `supplier`
--
ALTER TABLE `supplier`
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`),
ADD KEY `is_active` (`is_active`);
--
-- AUTO_INCREMENT for table `purchase_order`
--
ALTER TABLE `purchase_order`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=26763;
--
-- AUTO_INCREMENT for table `purchase_order_products`
--
ALTER TABLE `purchase_order_products`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=41884;
--
-- AUTO_INCREMENT for table `supplier`
--
ALTER TABLE `supplier`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=182;
A PRIMARY KEY is a unique key:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`), -- Toss this; it is redundant
Do not mix INT and VARCHAR:
so.id = soi.sales_order_id
Don't use DATE() when you don't need to:
DATE(po.created_at) < '2020-03-01'
-->
po.created_at < '2020-03-01'
When you have INDEX(a,b), you don't also need INDEX(a).
Some of the above may be severely impacting preformance.
Here are some composite indexes that may help:
sup: (is_active, id)
so: (completed_returned, order_status, desired_delivery_date, id)
soi: (supplier_id, total_cost, sales_order_id)
po: (supplier_id, status, created_at, id)
There's not enough information to answer this, but it looks like the USE INDEX clause you're using in the second subquery may be forcing MySQL to scan more rows than necessary. Try removing use index (date_status_completed).
A revision to the query tested on mysql version 8.0.19, assuming the purchase_order and sales_order_item tables to be sparse against each other but both (should have a FOREIGN KEY) reference the supplier table (but not necessarily any of the same records on supplier).
You can trim down the number of necessary scans by pushing down either one of purchase_order or sales_order_item into a single EXISTS subquery and then converting the other into LEFT OUTER JOINs.
The functionality of SUM() obviates the need for a null-coalesce operation for purchase_order_product.
Because sales_order.completed_returned is a BOOLEAN INT, we can combine it into the (1-so.completed_returned) to convert the total_cost of 'returned' items into harmless zeroes in the aggregation; GREATEST() likewise obviates the need for the total_cost >0 filter criteria.
Additionally, if every purchase_order supplier is also a sales_order_supplier then you can replace LEFT OUTER JOIN with INNER JOINs for a proper output.
Finally, as #RickJames noted, don't bother using DATE() here either, since the conditions already imply it.
SELECT
sup.name AS supplier_name,
sup.id AS supplier_id,
MIN(sup.opening_balance_date) AS opening_balance_date,
MIN(sup.opening_balance) AS opening_balance,
SUM( pop.quantity * pop.cost ) as total_purchases,
SUM( COALESCE( GREATEST( soi.total_cost, 0) * (1-so.completed_returned),0) ) AS total_sales
FROM (( supplier sup, purchase_order_products pop )
LEFT OUTER JOIN sales_order_item soi ON soi.supplier_id = sup.id )
LEFT OUTER JOIN sales_order so ON
so.id = soi.sales_order_id AND
so.order_status = 'complete' AND
so.desired_delivery_date >= COALESCE(sup.opening_balance_date, '2020-01-01')
WHERE sup.is_active = 1
AND EXISTS (
SELECT 1
FROM purchase_order po
WHERE
po.created_at >= COALESCE(sup.opening_balance_date, '2020-01-01')
AND po.created_at < '2020-03-01'
AND po.status = 'approved'
AND po.supplier_id = sup.id
AND pop.purchase_order_id <=> po.id
)
group by sup.id, sup.name
ORDER BY sup.name;
EXPLAIN FORMAT=TREE
-> Sort: <temporary>.name
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Nested loop inner join (cost=1.75 rows=1)
-> Nested loop inner join (cost=1.40 rows=1)
-> Nested loop left join (cost=1.05 rows=1)
-> Nested loop left join (cost=0.70 rows=1)
-> Index lookup on sup using is_active (is_active=1) (cost=0.35 rows=1)
-> Index lookup on soi using reporting_supplier (supplier_id=sup.id) (cost=0.35 rows=1)
-> Filter: ((so.order_status = \'complete\') and (so.desired_delivery_date >= coalesce(sup.opening_balance_date,\'2020-01-01\'))) (cost=0.35 rows=1)
-> Single-row index lookup on so using PRIMARY (id=soi.sales_order_id) (cost=0.35 rows=1)
-> Filter: ((po.created_at >= coalesce(sup.opening_balance_date,\'2020-01-01\')) and (po.created_at < TIMESTAMP\'2020-03-01 00:00:00\') and (po. = \'approved\')) (cost=0.35 rows=1)
-> Index lookup on po using supplier_id (supplier_id=sup.id) (cost=0.35 rows=1)
-> Index lookup on pop using purchase_order_id (purchase_order_id=po.id), with index condition: (pop.purchase_order_id <=> po.id) (cost=0.35 rows=1)
See if this might work better in your use case - it's hard to know exactly how a re-work will play out without knowing the contents of all the tables involved.
A cursory look at the EXPLAIN plan indicates that further optimization might be possible by employing hints, i.e. SEMIJOIN and BNL/NO_NBL optimizations for this query in particular.

MySQL query - taking ORDER BY off making query 100 times faster

I have found very long query in my system. The MySQL Slow Log says the following:
# Time: 2018-07-08T18:47:02.273314Z
# User#Host: server[server] # localhost [] Id: 1467
# Query_time: 97.251247 Lock_time: 0.000210 Rows_sent: 50 Rows_examined: 41646378
SET timestamp=1531075622;
SELECT n1.full_name AS sender_full_name, s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name, s2.email AS receiver_email,
r.basket,
FROM email_routing r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
WHERE r.sender_email_id = 21897 ORDER BY e.date desc LIMIT 0, 50;
The EXPLAIN query shows no full table scan and the query using indexes:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE s1 NULL const PRIMARY PRIMARY 4 const 1 100.00 Using temporary; Using filesort
1 SIMPLE n1 NULL const PRIMARY,ppl PRIMARY 4 const 1 100.00 NULL
1 SIMPLE n2 NULL index PRIMARY,ppl ppl 771 NULL 1 100.00 Using index
1 SIMPLE s2 NULL index PRIMARY s2 771 NULL 3178 10.00 Using where; Using index; Using join buffer (Block Nested Loop)
1 SIMPLE r NULL ref bk1,bk2,msgid bk1 4 server.s2.id 440 6.60 Using where; Using index
1 SIMPLE e NULL eq_ref PRIMARY PRIMARY 4 server.r.message_id 1 100.00 NULL
Here is my SHOW CREATE TABLE queries for the used tables:
CREATE TABLE `email_routing` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message_id` int(11) NOT NULL,
`sender_email_id` int(11) NOT NULL,
`receiver_email_id` int(11) NOT NULL,
`basket` int(11) NOT NULL,
`status` int(11) NOT NULL,
`popup` int(11) NOT NULL DEFAULT '0',
`tm` int(11) NOT NULL DEFAULT '0',
KEY `id` (`id`),
KEY `bk1` (`receiver_email_id`,`status`,`sender_email_id`,`message_id`,`basket`),
KEY `bk2` (`sender_email_id`,`tm`),
KEY `msgid` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1055796 DEFAULT CHARSET=utf8
-
CREATE TABLE `email` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` text NOT NULL,
`body` text NOT NULL,
`date` datetime NOT NULL,
`attach` text NOT NULL,
`attach_dir` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`attach_subject` varchar(255) DEFAULT NULL,
`attach_content` longtext,
`sphinx_synced` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Index_2` (`attach_dir`),
KEY `dt` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=898001 DEFAULT CHARSET=utf8
-
CREATE TABLE `people_emails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nick` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`key_name` varchar(255) NOT NULL,
`people_id` int(11) NOT NULL,
`status` int(11) NOT NULL DEFAULT '0',
`activity` int(11) NOT NULL,
`internal_user_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `s2` (`email`,`people_id`)
) ENGINE=InnoDB AUTO_INCREMENT=22146 DEFAULT CHARSET=utf8
-
CREATE TABLE `people` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`lname` varchar(255) CHARACTER SET cp1251 NOT NULL,
`patronymic` varchar(255) CHARACTER SET cp1251 NOT NULL,
`gender` tinyint(1) NOT NULL,
`full_name` varchar(255) NOT NULL DEFAULT ' ',
`category` int(11) NOT NULL,
`people_type_id` int(255) DEFAULT NULL,
`tags` varchar(255) CHARACTER SET cp1251 NOT NULL,
`job` varchar(255) CHARACTER SET cp1251 NOT NULL,
`post` varchar(255) CHARACTER SET cp1251 NOT NULL,
`profession` varchar(255) CHARACTER SET cp1251 DEFAULT NULL,
`zip` varchar(16) CHARACTER SET cp1251 NOT NULL,
`country` int(11) DEFAULT NULL,
`region` varchar(10) NOT NULL,
`city` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address` varchar(255) CHARACTER SET cp1251 NOT NULL,
`address_date` date DEFAULT NULL,
`last_update_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ppl` (`id`,`full_name`)
) ENGINE=InnoDB AUTO_INCREMENT=415040 DEFAULT CHARSET=utf8
Here is the SHOW TABLE STATUS output for those 4 tables:
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment
email InnoDB 10 Dynamic 753748 12079 9104785408 0 61112320 4194304 898167
email_routing InnoDB 10 Dynamic 900152 61 55132160 0 69419008 6291456 1056033
people InnoDB 10 Dynamic 9538 386 3686400 0 2785280 4194304 415040
people_emails InnoDB 10 Dynamic 3178 752 2392064 0 98304 4194304 22146
MySQL Version 5.7.22 Ubuntu 16.04
However I have noticed one thing - if I take ORDER BY out of the query, but leaving the LIMIT, then query runs almost instantly taking not more than 0.2 seconds. So I have started to think to run query with no ORDER BY and do sorting by PHP means like that but eventually that seems to complicated as using the LIMIT with no ORDER BY I get wronge range to sort.
Is there anything else I could do to speed up or optimize that query?
AS AN ALTERNATIVE I could do sorting and paging by my PHP code. I add addtional columnt into the SELECT ..., UNIX_TIMESTAMP(e.date) as ts and then do:
<?php
...
$main_query = $server->query($query);
$emails_list = $main_query->fetch_all(MYSQLI_ASSOC);
function cmp($a, $b) {
return strcmp($a['ts'], $b['ts']);
}
$emails_sorted = usort($emails_list, "cmp");
for ($i=$start;$i<$lenght;$i++)
{
$singe_email = $emails_sorted[$i]
// Format the output
}
But when I do that I get
Fatal error: Allowed memory size of 134217728 bytes exhausted
at line of $emails_sorted = usort($emails_list, "cmp");
Warning, I'm not very familiar with MySQL, in fact I'm mostly projecting MSSQL experience on top of things I (mostly) read about MySQL.
1) Potential workaround: is it safe to assume that email.id and email.date are always in the same order? From a functional point of view this seems logical as emails get added to the table over time and thus have an ever increasing auto-number... But maybe the initial load of the data was in a different/random order? Anyway, if it is, what happens if you ORDER BY e.id instead of ORDER BY e.date ?
2) Does adding a composite index on email (id, date) (in that order!) help?
3) If all of that does not help, splitting the query into 2 parts might help out the optimizer. (You may need to fix the syntax for MySQL)
-- Locate what we want first
CREATE TEMPORARY TABLE results (
SELECT e.id
r.basket
FROM email_routing r
JOIN email e ON e.id = r.message_id
WHERE r.sender_email_id = 21897
ORDER BY e.date desc LIMIT 0, 50 );
-- Again, having an index on email (id, date) seems like a good idea to me
-- (As a test you may want to add an index on results (id) here, shouldn't take long and
-- in MSSQl it would help build a better query plan, can't tell with MySQL)
-- return actual results
SELECT n1.full_name AS sender_full_name,
s1.email AS sender_email,
e.subject, e.body, e.attach, e.date, e.id, r.status,
n2.full_name AS receiver_full_name,
s2.email AS receiver_email,
r.basket,
FROM results r
JOIN email e ON e.id = r.message_id
JOIN people_emails s1 ON s1.id = r.sender_email_id
JOIN people n1 ON n1.id = s1.people_id
JOIN people_emails s2 ON s2.id = r.receiver_email_id
JOIN people n2 ON n2.id = s2.people_id
ORDER BY e.date desc
If your data comes back that quickly, how about wrapping it... but how many rows are actually GOING to be returned WITHOUT the LIMIT. Maybe you would still get better performance AFTER such as...
select PQ.*
from ( YourQueryWithoutOrderByAndLimt ) PQ
order by PQ.date desc
LIMIT 0, 50;
I suspect this is a case where the MySQL Join Optimizer overestimates the benefits of Block Nested Loop (BNL) join. You can try to turn off BNL by doing:
set optimizer_switch='block_nested_loop=off';
Hopefully this will provide a better join order. You could also try:
set optimizer_prune_level = 0;
to force the join optimizer to explore all possible join orders.
Another option is to use STRAIGHT_JOIN to force a particular join order. In this case, it seems the order as specified in the query text would be good. Hence, to force this particular join order you could write
SELECT STRAIGHT_JOIN ...
Note that whatever you do, you can not expect the query to be as fast as without ORDER BY. As long as you need to find the latest emails from a particular sender, and there is no information about sender in the email table, it is not possible to use an index to avoid sorting without going through all emails from all senders. Things would be different if you had information about date in the email_routing table. Then an index on that table could have been used to avoid sorting.
MySQL cannot use index for order by in your query because
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
MySQL Order By Optimization

Sorting result of mysql join by avg of third table?

I have three tables.
One table contains submissions which has about 75,000 rows
One table contains submission ratings and only has < 10 rows
One table contains submission => competition mappings and for my test data also has about 75,000 rows.
What I want to do is
Get the top 50 submissions in a round of a competition.
Top is classified as highest average rating, followed by highest amount of votes
Here is the query I am using which works, but the problem is that it takes over 45 seconds to complete! I profiled the query (results at bottom) and the bottlenecks are copying the data to a tmp table and then sorting it so how can I speed this up?
SELECT `submission_submissions`.*
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
submission_submissions
CREATE TABLE `submission_submissions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) DEFAULT NULL,
`genre` int(11) NOT NULL,
`goals` text,
`submission` text NOT NULL,
`date_created` datetime DEFAULT NULL,
`date_modified` datetime DEFAULT NULL,
`date_deleted` datetime DEFAULT NULL,
`cover_image` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `genre` (`genre`),
KEY `account_id` (`account_id`),
KEY `date_created` (`date_created`)
) ENGINE=InnoDB AUTO_INCREMENT=115037 DEFAULT CHARSET=latin1;
submission_ratings
CREATE TABLE `submission_ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`stars` tinyint(1) NOT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `submission_id` (`submission_id`),
KEY `account_id` (`account_id`),
KEY `stars` (`stars`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
competition_submissions
CREATE TABLE `competition_submissions` (
`competition_id` int(11) NOT NULL,
`submission_id` int(11) NOT NULL,
`top_round` int(11) DEFAULT '1',
PRIMARY KEY (`submission_id`),
KEY `competition_id` (`competition_id`),
KEY `top_round` (`top_round`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SHOW PROFILE Result (ordered by duration)
state duration (summed) in sec percentage
Copying to tmp table 33.15621 68.46924
Sorting result 11.83148 24.43260
removing tmp table 3.06054 6.32017
Sending data 0.37560 0.77563
... insignificant amounts removed ...
Total 48.42497 100.00000
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE competition_submissions index_merge PRIMARY,competition_id,top_round competition_id,top_round 4,5 18596 Using intersect(competition_id,top_round); Using where; Using index; Using temporary; Using filesort
1 SIMPLE submission_submissions eq_ref PRIMARY PRIMARY 4 inkstakes.competition_submissions.submission_id 1 Using where
1 SIMPLE submission_ratings ALL submission_id 5 Using where; Using join buffer (flat, BNL join)
Assuming that in reality you won't be interested in unrated submissions, and that a given submission only has a single competition_submissions entry for a given match and top_round, I suggest:
SELECT s.*
FROM (SELECT `submission_id`,
AVG(`stars`) AvgStars,
COUNT(`id`) CountId
FROM `submission_ratings`
GROUP BY `submission_id`
ORDER BY AVG(`stars`) DESC, COUNT(`id`) DESC
LIMIT 50) r
JOIN `submission_submissions` s
ON r.`submission_id` = s.`id` AND
s.`date_deleted` IS NULL
JOIN `competition_submissions` c
ON c.`submission_id` = s.`id` AND
c.`top_round` = 1 AND
c.`competition_id` = '2'
ORDER BY r.AvgStars DESC,
r.CountId DESC
(If there is more than one competition_submissions entry per submission for a given match and top_round, then you can add the GROUP BY clause back in to the main query.)
If you do want to see unrated submissions, you can union the results of this query to a LEFT JOIN ... WHERE NULL query.
There is a simple trick that works on MySql and helps to avoid copying/sorting huge temp tables in queries like this (with LIMIT X).
Just avoid SELECT *, this copies all columns to the temporary table, then this huge table is sorted, and in the end, the query takes only 50 records from this huge table ( 50 / 70000 = 0,07 % ).
Select only columns that are really necessary to perform sort and limit, and then join missing columns only for selected 50 records by id.
select ss.*
from submission_submissions ss
join (
SELECT `submission_submissions`.id,
AVG(submission_ratings.`stars`) stars,
COUNT(submission_ratings.`id`) cnt
FROM `submission_submissions`
JOIN `competition_submissions`
ON `competition_submissions`.`submission_id` = `submission_submissions`.`id`
LEFT JOIN `submission_ratings`
ON `submission_submissions`.`id` = `submission_ratings`.`submission_id`
WHERE `top_round` = 1
AND `competition_id` = '2'
AND `submission_submissions`.`date_deleted` IS NULL
GROUP BY submission_submissions.id
ORDER BY AVG(submission_ratings.`stars`) DESC,
COUNT(submission_ratings.`id`) DESC
LIMIT 50
) xx
ON ss.id = xx.id
ORDER BY xx.stars DESC,
xx.cnt DESC;

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
without the tables/columns this is my best guess from reverse engineering the given queries:
UPDATE m
SET product_count =COUNT(t.product_id)
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.name
The given code loops over each make, and then runs a query the counts for each. My answer just does them all in one query and should be a lot faster.
have an index for each of these:
vehicle_make.id cover on name
vehicle_catalog.id cover make_id
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0.00
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE taxonomy_master,CountsOf
SET taxonomy_master.product_count=CountsOf.CountOf
WHERE taxonomy_master.id=CountsOf.id;
instead of using nested query ,
you can separated this query to 2 or 3 queries,
and in php insert the result of the inner query to the out query ,
its faster !
#haim-evgi Separating the queries will not increase the speed significantly, it will just shift the load from the DB server to the Web server and create overhead of moving data between the two servers.
I am not sure with the appropriate indexes you run such query 7 minutes. Could you please show the table structure of the tables involved in these queries.
Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master