How can I improve this MySQL query?

I am trying to improve the performance of this query as it is taking 3-4 seconds to execute.
Here is the query:
SELECT SQL_NO_CACHE
    ac.account_id,
    ac.account_name,
    cl.name AS client_name,
    IFNULL(cn.contact_number, "") AS Phone
FROM accounts AS ac
STRAIGHT_JOIN clients AS cl ON cl.client_id = ac.client_id
LEFT JOIN (
    SELECT bc.contact_number, bc.account_id
    FROM contact_numbers AS bc
    INNER JOIN (
        SELECT account_id, MAX(number_id) AS number_id
        FROM contact_numbers
        WHERE status = 1 AND contact_type != "Fax" AND contact_link = "Account"
        GROUP BY account_id
    ) AS bb ON bb.number_id = bc.number_id
) AS cn ON ac.account_id = cn.account_id
WHERE ac.status = 1
ORDER BY ac.account_name
LIMIT 0, 100
The clients table contains about 10 rows, which is why I use STRAIGHT_JOIN. The accounts table contains 350K records, and contact_numbers contains about 500K records.
I believe the problem here is the LEFT JOIN and also the ORDER BY, but I am not sure how to work around them. I am also using SQL_NO_CACHE because the accounts and contact_numbers tables are being updated at a fast rate.
What else can I do to improve performance of this query?
Here is a screenshot of the EXPLAIN for this query.
I am using MySQL 5.6.13
I set sort_buffer_size = 1M.
My server has 32GB of RAM

The below should make the outer query run without requiring a filesort.
CREATE INDEX ac_status_acctname ON accounts (status, account_name);
The below should make the inner query run as Using index, and help it do the GROUP BY without using a temp table.
CREATE INDEX cn_max ON contact_numbers (account_id, status, contact_link,
contact_type, number_id);
You need to join on both account_id and number_id to get the greatest entry per account. The way you have it now, you get any account that happens to have the same number_id, which is probably not what you intended, and it may be what's generating too many rows in the subquery result set.
bc INNER JOIN ... bb ON bb.account_id = bc.account_id AND bb.number_id = bc.number_id
You can also write the same join condition as:
bc INNER JOIN ... bb USING (account_id, number_id)
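Putting the pieces together, the corrected derived table would look something like this (a sketch assembled from the query in the question):
SELECT bc.contact_number, bc.account_id
FROM contact_numbers AS bc
INNER JOIN (
    SELECT account_id, MAX(number_id) AS number_id
    FROM contact_numbers
    WHERE status = 1 AND contact_type != "Fax" AND contact_link = "Account"
    GROUP BY account_id
) AS bb USING (account_id, number_id)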

I would actually rewrite the query. You currently fetch a lot of data you do not need and then discard it. I would minimize the amount of fetched data.
It seems you basically select something for each account with a certain status and take only 100 of them. So I would put this in a subquery:
SELECT
    account_id,
    account_name,
    c.name AS client_name,
    IFNULL(contact_number, '') AS Phone
FROM (
    SELECT
        a.account_id,
        MAX(number_id) AS number_id
    FROM (
        SELECT account_id
        FROM accounts
        WHERE status = 1 -- other conditions on accounts go here
        ORDER BY account_name
        LIMIT 0, 100
    ) AS a
    LEFT JOIN contact_numbers n
        ON a.account_id = n.account_id
        AND n.status = 1
        AND contact_type != "Fax"
        AND contact_link = "Account"
    GROUP BY a.account_id
) AS an
LEFT JOIN contact_numbers USING (account_id, number_id)
JOIN accounts a USING (account_id)
JOIN clients c USING (client_id);
You will need a (status, account_name) index on the accounts table (for the query with client_id = 4, a (status, client_id, account_name) index as well), and an index on account_id in contact_numbers. This should suffice.
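For reference, those indexes could be created like this (the index names are mine):
CREATE INDEX acc_status_name ON accounts (status, account_name);
CREATE INDEX acc_status_client_name ON accounts (status, client_id, account_name); -- for the client_id variant
CREATE INDEX cn_account ON contact_numbers (account_id);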

Related

optimizing SQL counts

I have to select a list of Catalogs from one table, and perform counts in two other tables: Stores and Categories. The counters should show how many Stores and Categories are linked to each Catalog.
I have managed to get the functionality I need using this SQL query:
SELECT `catalog`.`id` AS `id`,
`catalog`.`name` AS `name`,
(
SELECT COUNT(*)
FROM `category`
WHERE `category`.`catalog_id` = `catalog`.`id`
AND `category`.`is_archive` = 0
AND `category`.`company_id` = 2
) AS `category_count`,
(
SELECT COUNT(*)
FROM `store`
WHERE `store`.`catalog_id` = `catalog`.`id`
AND `store`.`is_archive` = 0
AND `store`.`company_id` = 2
) AS `store_count`
FROM `catalog`
WHERE `catalog`.`company_id` = 2
AND `catalog`.`is_archive` = 0
ORDER BY `catalog`.`id` ASC;
This works as expected. But I'd rather not use subqueries, as they are slow and this query may perform badly on large lists. Is there any method of optimizing this SQL using JOINs?
Thanks in advance.
You can make this a lot faster by refactoring the dependent subqueries in your SELECT clause into, as you mention, JOINed aggregate subqueries.
You can write the first subquery this way:
SELECT COUNT(*) num, catalog_id, company_id
FROM category
WHERE is_archive = 0
GROUP BY catalog_id, company_id
The second one like this:
SELECT COUNT(*) num, catalog_id, company_id
FROM store
WHERE is_archive = 0
GROUP BY catalog_id, company_id
Then, use those in your main query as if they were tables containing the counts you want:
SELECT catalog.id,
catalog.name,
category.num category_count,
store.num store_count
FROM catalog
LEFT JOIN (
SELECT COUNT(*) num, catalog_id, company_id
FROM category
WHERE is_archive = 0
GROUP BY catalog_id, company_id
) category ON catalog.id = category.catalog_id
AND catalog.company_id = category.company_id
LEFT JOIN (
SELECT COUNT(*) num, catalog_id, company_id
FROM store
WHERE is_archive = 0
GROUP BY catalog_id, company_id
) store ON catalog.id = store.catalog_id
AND catalog.company_id = store.company_id
WHERE catalog.is_archive = 0
AND catalog.company_id = 2
ORDER BY catalog.id ASC;
This is faster than your example because each subquery need only run once, rather than once per catalog entry. It also has the nice feature that you only need say WHERE catalog.company_id = 2 once. The MySQL optimizer knows what to do with that.
I suggest LEFT JOIN operations so you'll still see catalog entries even if they're not mentioned in your category or store tables.
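One detail: with LEFT JOIN, a catalog with no matching categories or stores will show NULL rather than 0 for the count. If you want zeros, wrap the counts in COALESCE; here is a sketch of just the SELECT list, with the rest of the query unchanged:
SELECT catalog.id,
       catalog.name,
       COALESCE(category.num, 0) AS category_count,
       COALESCE(store.num, 0) AS store_count
-- FROM / LEFT JOIN / WHERE / ORDER BY as in the query above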
Subqueries are fine, but you can simplify your query:
SELECT c.id, c.name,
COUNT(*) OVER (PARTITION BY c.catalog_id) as category_count,
(SELECT COUNT(*)
FROM store s
WHERE s.catalog_id = c.id AND
s.is_archive = 0 AND
s.company_id = c.company_id
) AS store_count
FROM catalog c
WHERE c.company_id = 2 AND c.is_archive = 0
ORDER BY c.id ASC;
For performance, you want indexes on:
catalog(company_id, is_archive, id)
store(catalog_id, company_id, is_archive)
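In CREATE INDEX form (the index names are illustrative):
CREATE INDEX idx_catalog_company ON catalog (company_id, is_archive, id);
CREATE INDEX idx_store_catalog ON store (catalog_id, company_id, is_archive);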
Because of the filtering in the outer query, a correlated subquery is probably the best performing way to get the results from store.
Also note some changes to the query:
I removed the backticks. They are unnecessary and just clutter the query.
An expression like c.id as id is redundant. The expression is given id as the alias anyway.
I changed the s.company_id = 2 to s.company_id = c.company_id. It seems like a correlation clause.

Very slow sql query for count

I need to get the report count for each user role, but my SQL query is very slow (40 seconds on a good server). My SQL query:
SELECT `auth_assignment`.`item_name`, COUNT(*) as count
FROM `report`
LEFT JOIN `company` ON company.id = report.company_id
LEFT JOIN `auth_assignment`
ON auth_assignment.user_id = company.user_id
GROUP BY `auth_assignment`.`item_name`
ORDER BY `count`
auth_assignment.item_name is the role type.
auth_assignment has ~23k rows.
company has ~11k rows.
report has ~12k rows (one company can have many reports).
report.company_id is bound to company.id.
First, you are aggregating on a column from the third table in a left join. I'm guessing you don't want NULL for the value, so use inner join or change the order of the tables.
Table aliases make the query easier to write and to read:
SELECT aa.item_name, COUNT(*) as cnt
FROM report r JOIN
company c
ON c.id = r.company_id JOIN
auth_assignment aa
ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt;
Assuming the joins are correct for the tables, you just want to be sure that you have indexes. These should go on the columns used for the joins: company(id, user_id) and auth_assignment(user_id, item_name).
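In CREATE INDEX form (the names are mine):
CREATE INDEX idx_company_id_user ON company (id, user_id);
CREATE INDEX idx_auth_user_item ON auth_assignment (user_id, item_name);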

Optimize MySQL query nested not exists possible?

I have a list of submissions of exercises done by students who are part of a group (classroom). It contains:
submission table: userId, groupId, exercise_id (and more irrelevant data)
users table: userId, groupId
I want to select all the exercises done by all the students in a specific group. For this I currently have:
SELECT DISTINCT(exercise_id) FROM submissions as c1 WHERE c1.groupId = 1
AND NOT EXISTS(
SELECT DISTINCT(UserId) FROM users as u WHERE u.GroupId = 1
AND NOT EXISTS (
SELECT exercise_id FROM submissions as c2 WHERE u.UserId = c2.UserId
AND c2.exercise_id = c1.exercise_id
)
)
i.e. I select all the exercises for which there are no users in the group that have not done the exercise.
However, this query takes 5 seconds on a submission table with 1.5 million rows. Which steps could I take to further optimise this query? I have considered inner joins, but won't this result in the same query execution plan?
The groupid really shouldn't be in both tables. Assuming the values are consistent, try the following:
select s.exercise_id
from submissions s
where s.groupid = 1
group by s.exercise_id
having count(distinct userid) = (select count(distinct userid) from users where groupid = 1);
For performance, you want an index on submissions(groupid, exercise_id). Also, if you know there are no duplicate submissions or users, then remove the distinct, because that has an adverse effect on performance.
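A sketch of those indexes (the names are mine; the second one is my assumption for the subquery on the right-hand side, not something stated in the answer above):
CREATE INDEX idx_sub_group_exercise ON submissions (groupId, exercise_id);
CREATE INDEX idx_users_group_user ON users (groupId, userId); -- assumption: helps the COUNT(DISTINCT userid) subquery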

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data customers records corresponds to author_id in the exp_channel_titles table, so I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field, and joining on that text field slowed things down. The query now takes 3 seconds.
However, the inner join solution posted by @DRapp runs in 1.5 seconds. Here is his SQL with a minor edit:
SELECT
    PQ.author_id CustomerID,
    c.field_id_122 CompanyName,
    PQ.totalOrders
FROM
    ( SELECT
          t.author_id,
          SUM( o.field_id_22 ) as totalOrders
      FROM
          exp_channel_data o
      JOIN
          exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
      GROUP BY
          t.author_id ) PQ
JOIN
    exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If this is the same table, then the same columns apply across the board for all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117) if possible, and another index on (author_id) for the prequery of order totals.
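As CREATE INDEX statements, that suggestion would look roughly like this (the names are mine; per the question's update, field_id_117 is a TEXT column, so MySQL requires a prefix length on it):
CREATE INDEX idx_ecd_channel_entry ON exp_channel_data (channel_id, entry_id, field_id_117(32)); -- prefix length is a guess
CREATE INDEX idx_ect_author ON exp_channel_titles (author_id);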
Then, start with what will become the inner query, doing nothing but a per-customer sum of order amounts. Since the join key is author_id as the customer ID, just query/sum on that first. Without completely understanding the (what I would consider poor) design of this structure, or what channel_id really indicates, you don't want to duplicate summation values because of the other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct per customer (via the author_id column), then it can be wrapped as follows:
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
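For example (the index name is mine; since field_id_117 is a TEXT column per the question's update, a prefix length is required):
CREATE INDEX idx_ecd_field_117 ON exp_channel_data (field_id_117(32));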
Try something like this. Possibly you have an error in your joins; check whether the join columns in your database are correct. An accidental cross join can take a long time to fetch large data if your join conditions are not on the proper columns.
select
    link.author_id as customer_id,
    customers.field_id_122 as company,
    sum(orders.field_id_22) as total_orders
from exp_channel_data customers
join exp_channel_titles link
    on (link.author_id = customers.field_id_117 and customers.channel_id = 7)
join exp_channel_data orders
    on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id

how can I make this query more efficient?

edit: here is a simplified version of the original query (runs in 3.6 secs on a products table of 475K rows)
SELECT p.*, shop FROM products p JOIN
users u ON p.date >= u.prior_login and u.user_id = 22 JOIN
shops s ON p.shop_id = s.shop_id
ORDER BY shop, date, product_id;
This is the EXPLAIN plan:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u const PRIMARY,prior_login,user_id PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE s ALL PRIMARY NULL NULL NULL 90
1 SIMPLE p ref shop_id,date,shop_id_2,shop_id_3 shop_id 4 bitt3n_minxa.s.shop_id 5338 Using where
The bottleneck seems to be the ORDER BY on date, product_id. Removing these two orderings, the query runs in 0.06 seconds. (Removing either one of the two, but not both, has virtually no effect; the query still takes over 3 seconds.) I have indexes on both product_id and date in the products table. I have also added an index on (product_id, date) with no improvement.
newtover suggests the problem is that the INNER JOIN users u1 ON products.date >= u1.prior_login requirement prevents use of the index on products.date.
Two variations of the query that execute in ~0.006 secs (as opposed to 3.6 secs for the original) have been suggested to me (not from this thread).
This one uses a subquery, which appears to force the order of the joins:
SELECT p.*, shop
FROM
(
SELECT p.*
FROM products p
WHERE p.date >= (select prior_login FROM users where user_id = 22)
) as p
JOIN shops s
ON p.shop_id = s.shop_id
ORDER BY shop, date, product_id;
This one uses the WHERE clause to do the same thing (the presence of SQL_SMALL_RESULT doesn't change the execution time; it is 0.006 secs without it as well):
SELECT SQL_SMALL_RESULT p . * , shop
FROM products p
INNER JOIN shops s ON p.shop_id = s.shop_id
WHERE p.date >= (
SELECT prior_login
FROM users
WHERE user_id =22 )
ORDER BY shop, DATE, product_id;
My understanding is that these queries work much faster because they reduce the relevant number of rows of the products table before joining it to the shops table. I am wondering if this is correct.
Use the EXPLAIN statement to see the execution plan. Also you can try adding an index to products.date and u1.prior_login.
Also please just make sure you have defined your foreign keys and they are indexed.
Good luck.
We do need an explain plan... but
Be very careful of select * from table where id in (select id from another_table). This is a notorious performance trap. Generally these can be replaced by a join. The following query might run, although I haven't tested it.
SELECT shop,
shops.shop_id AS shop_id,
products.product_id AS product_id,
brand,
title,
price,
image AS image,
image_width,
image_height,
0 AS sex,
products.date AS date,
fav1.favorited AS circle_favorited,
fav2.favorited AS session_user_favorited,
u2.username AS circle_username
FROM products
LEFT JOIN favorites fav2
ON fav2.product_id = products.product_id
AND fav2.user_id = 22
AND fav2.current = 1
INNER JOIN shops
ON shops.shop_id = products.shop_id
INNER JOIN users u1
ON products.date >= u1.prior_login AND u1.user_id = 22
LEFT JOIN favorites fav1
ON products.product_id = fav1.product_id
LEFT JOIN friends f1
ON f1.star_id = fav1.user_id
LEFT JOIN users u2
ON fav1.user_id = u2.user_id
WHERE f1.fan_id = 22 OR fav1.user_id = 22
ORDER BY shop,
DATE,
product_id,
circle_favorited
The fact that the query is slow because of the ordering is rather obvious, since it is hard to find an index that could be used to apply the ORDER BY in this case. The main problem is the products.date >= comparison, which breaks the use of any index for the ORDER BY. And since you have a lot of data to output, MySQL starts using temporary tables for sorting.
What I would do is try to force MySQL to output data in the order of an index which already has the required order, and remove the ORDER BY clause.
I am not at a computer to test, but here is how I would do it:
I would do all the inner joins first.
Then I would LEFT JOIN to a subquery which makes all the computations on favorites ordered by product_id, circle_favorited (which would provide the last ordering criterion).
So the question is how to make the data come out sorted on shop, date, product_id.
I am going to write about it a bit later =)
UPD1:
You should probably read something on how BTREE indexes work in MySQL. There is a good article about it on mysqlperformanceblog.com (I am currently writing from a mobile and don't have the link at hand). In short, you seem to be talking about single-column indexes, which arrange pointers to rows based on the values of a single column. Compound indexes store an order based on several columns. Indexes are mostly used to operate on clearly defined ranges, to obtain most of the information before retrieving data from the rows they point at. Indexes usually do not know about other indexes on the same table; as a result, they are rarely merged. When there is no more info to take from the index, MySQL starts to operate directly on the data.
That is, an index on date cannot make use of the index on product_id, but an index on (date, product_id) can get some more info on product_id after a condition on date (a sort on product_id for a specific date match).
Nevertheless, a range condition on date (>=) breaks this. That is what I was talking about.
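To illustrate with the tables from this question (the index name is mine):
-- compound index: rows ordered by date first, then by product_id within each date
CREATE INDEX idx_date_product ON products (`date`, product_id);

-- equality on date: the matching rows are already ordered by product_id,
-- so the ORDER BY can be read straight off the index
SELECT product_id FROM products WHERE `date` = '2013-01-01' ORDER BY product_id;

-- range on date: rows come back ordered by date first, so the product_id
-- order is broken across dates and MySQL must filesort
SELECT product_id FROM products WHERE `date` >= '2013-01-01' ORDER BY product_id;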
UPD2:
As I understand it, the problem can be reduced to the following (this is where most of the time is spent):
SELECT p.*, shop
FROM products p
JOIN users u ON p.`date` >= u.prior_login and u.user_id = 22
JOIN shops s ON p.shop_id = s.shop_id
ORDER BY shop, `date`, product_id;
Now add an index (user_id, prior_login) on users and (date) on products, and try the following query:
SELECT STRAIGHT_JOIN p.*, shop
FROM (
    SELECT product_id, shop
    FROM users u
    JOIN products p
        ON user_id = 22 AND p.`date` >= prior_login
    JOIN shops s
        ON p.shop_id = s.shop_id
    ORDER BY shop, p.`date`, product_id
) as s
JOIN products p USING (product_id);
If I am correct, the query should return the same result but quicker. It would be nice if you posted the result of EXPLAIN for the query.
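For reference, the two suggested indexes could be created like this (the names are mine):
CREATE INDEX idx_users_user_login ON users (user_id, prior_login);
CREATE INDEX idx_products_date ON products (`date`);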