optimizing SQL counts - mysql

I have to select a list of Catalogs from one table, and perform counts in two other tables: Stores and Categories. The counters should show how many Stores and Categories are linked to each Catalog.
I have managed to get the functionality I need using this SQL query:
SELECT `catalog`.`id` AS `id`,
`catalog`.`name` AS `name`,
(
SELECT COUNT(*)
FROM `category`
WHERE `category`.`catalog_id` = `catalog`.`id`
AND `category`.`is_archive` = 0
AND `category`.`company_id` = 2
) AS `category_count`,
(
SELECT COUNT(*)
FROM `store`
WHERE `store`.`catalog_id` = `catalog`.`id`
AND `store`.`is_archive` = 0
AND `store`.`company_id` = 2
) AS `store_count`
FROM `catalog`
WHERE `catalog`.`company_id` = 2
AND `catalog`.`is_archive` = 0
ORDER BY `catalog`.`id` ASC;
This works as expected, but I don't like using subqueries because they are slow, and this query may perform badly on large lists. Is there a way to optimize this SQL using JOINs?
Thanks in advance.

You can make this a lot faster by refactoring the dependent subqueries in your SELECT clause into, as you mention, JOINed aggregate subqueries.
The first subquery you can write this way.
SELECT COUNT(*) num, catalog_id, company_id
FROM category
WHERE is_archive = 0
GROUP BY catalog_id, company_id
The second one like this.
SELECT COUNT(*) num, catalog_id, company_id
FROM store
WHERE is_archive = 0
GROUP BY catalog_id, company_id
Then, use those in your main query as if they were tables containing the counts you want.
SELECT catalog.id,
catalog.name,
category.num category_count,
store.num store_count
FROM catalog
LEFT JOIN (
SELECT COUNT(*) num, catalog_id, company_id
FROM category
WHERE is_archive = 0
GROUP BY catalog_id, company_id
) category ON catalog.id = category.catalog_id
AND catalog.company_id = category.company_id
LEFT JOIN (
SELECT COUNT(*) num, catalog_id, company_id
FROM store
WHERE is_archive = 0
GROUP BY catalog_id, company_id
) store ON catalog.id = store.catalog_id
AND catalog.company_id = store.company_id
WHERE catalog.is_archive = 0
AND catalog.company_id = 2
ORDER BY catalog.id ASC;
This is faster than your example because each subquery need only run once, rather than once per catalog entry. It also has the nice feature that you only need to say WHERE catalog.company_id = 2 once. The MySQL optimizer knows what to do with that.
I suggest LEFT JOIN operations so you'll still see catalog entries even if they're not mentioned in your category or store tables.
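One detail worth checking with the LEFT JOIN approach: a catalog with no matching rows gets NULL rather than 0 for its count, so you may want to wrap the counts in COALESCE. Here is a minimal sketch of that, using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the sample rows and the aliases cat_counts and store_counts are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog  (id INTEGER PRIMARY KEY, name TEXT, company_id INT, is_archive INT);
CREATE TABLE category (id INTEGER PRIMARY KEY, catalog_id INT, company_id INT, is_archive INT);
CREATE TABLE store    (id INTEGER PRIMARY KEY, catalog_id INT, company_id INT, is_archive INT);
INSERT INTO catalog  VALUES (1, 'Spring', 2, 0), (2, 'Summer', 2, 0), (3, 'Other', 9, 0);
INSERT INTO category VALUES (1, 1, 2, 0), (2, 1, 2, 0), (3, 1, 2, 1);
INSERT INTO store    VALUES (1, 1, 2, 0), (2, 2, 2, 0), (3, 2, 2, 0);
""")

joined_counts = conn.execute("""
SELECT catalog.id,
       catalog.name,
       COALESCE(cat_counts.num, 0)   AS category_count,  -- NULL when nothing matched
       COALESCE(store_counts.num, 0) AS store_count
FROM catalog
LEFT JOIN (
    SELECT COUNT(*) AS num, catalog_id, company_id
    FROM category
    WHERE is_archive = 0
    GROUP BY catalog_id, company_id
) AS cat_counts ON catalog.id = cat_counts.catalog_id
               AND catalog.company_id = cat_counts.company_id
LEFT JOIN (
    SELECT COUNT(*) AS num, catalog_id, company_id
    FROM store
    WHERE is_archive = 0
    GROUP BY catalog_id, company_id
) AS store_counts ON catalog.id = store_counts.catalog_id
                 AND catalog.company_id = store_counts.company_id
WHERE catalog.is_archive = 0
  AND catalog.company_id = 2
ORDER BY catalog.id
""").fetchall()

for row in joined_counts:
    print(row)
```

Catalog 2 has no unarchived categories in this sample data, so without COALESCE its category_count would come back as NULL.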

Subqueries are fine, but you can simplify your query:
SELECT c.id, c.name,
(SELECT COUNT(*)
FROM category cg
WHERE cg.catalog_id = c.id AND
cg.is_archive = 0 AND
cg.company_id = c.company_id
) AS category_count,
(SELECT COUNT(*)
FROM store s
WHERE s.catalog_id = c.id AND
s.is_archive = 0 AND
s.company_id = c.company_id
) AS store_count
FROM catalog c
WHERE c.company_id = 2 AND c.is_archive = 0
ORDER BY c.id ASC;
For performance, you want indexes on:
catalog(company_id, is_archive, id)
store(catalog_id, company_id, is_archive)
Because of the filtering in the outer query, a correlated subquery is probably the best performing way to get the results from store.
Also note some changes to the query:
I removed the backticks. They are unnecessary and just clutter the query.
An expression like c.id as id is redundant. The expression is given id as the alias anyway.
I changed the s.company_id = 2 to s.company_id = c.company_id. It seems like a correlation clause.
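To try the index suggestion on a small scale, here is a rough sketch using Python's sqlite3 module in place of MySQL. The index names and sample rows are invented; only the indexed column lists come from the answer. SQLite's planner is not MySQL's, so treat the printed plan as an illustration of how to inspect one, not as a prediction of what MySQL's EXPLAIN will show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (id INTEGER PRIMARY KEY, name TEXT, company_id INT, is_archive INT);
CREATE TABLE store   (id INTEGER PRIMARY KEY, catalog_id INT, company_id INT, is_archive INT);
-- Index names are made up; the column lists are the ones suggested above.
CREATE INDEX idx_catalog_company ON catalog (company_id, is_archive, id);
CREATE INDEX idx_store_catalog   ON store (catalog_id, company_id, is_archive);
INSERT INTO catalog VALUES (1, 'Spring', 2, 0), (2, 'Summer', 2, 0), (3, 'Old', 2, 1);
INSERT INTO store   VALUES (1, 1, 2, 0), (2, 1, 2, 0), (3, 1, 2, 1), (4, 2, 5, 0);
""")

query = """
SELECT c.id, c.name,
       (SELECT COUNT(*)
        FROM store s
        WHERE s.catalog_id = c.id
          AND s.is_archive = 0
          AND s.company_id = c.company_id) AS store_count
FROM catalog c
WHERE c.company_id = 2 AND c.is_archive = 0
ORDER BY c.id
"""

store_counts = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(store_counts)
for step in plan:
    print(step)
```

In MySQL the equivalent check would be running EXPLAIN on the query after creating the indexes.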

Related

Very slow sql query for count

I need to get a report count for each user role, but my SQL query is very slow (40 seconds on a good server). My SQL query:
SELECT `auth_assignment`.`item_name`, COUNT(*) as count
FROM `report`
LEFT JOIN `company` ON company.id = report.company_id
LEFT JOIN `auth_assignment`
ON auth_assignment.user_id = company.user_id
GROUP BY `auth_assignment`.`item_name`
ORDER BY `count`
auth_assignment.item_name is role type.
auth_assignment has ~23k rows.
company ~11k rows.
reports ~12k rows (one company can have many reports).
report.id and company.id, have binding
First, you are aggregating on a column from the third table in a left join. I'm guessing you don't want NULL for the value, so use inner join or change the order of the tables.
Table aliases make the query easier to write and to read:
SELECT aa.item_name, COUNT(*) as cnt
FROM report r JOIN
company c
ON c.id = r.company_id JOIN
auth_assignment aa
ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt;
Assuming the joins are correct for the tables, then you just want to be sure that you have indexes. These should go on the columns used for the joins: company(id, user_id), auth_assignment(user_id, item_name).
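To see why the LEFT JOIN version produces a NULL group, here is a small sketch with Python's sqlite3 module standing in for MySQL. The table names follow the question; the sample rows are invented, and one company's user deliberately has no role assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, user_id INT);
CREATE TABLE report (id INTEGER PRIMARY KEY, company_id INT);
CREATE TABLE auth_assignment (user_id INT, item_name TEXT);
INSERT INTO company VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO report VALUES (1, 1), (2, 1), (3, 2), (4, 3);
-- User 30 has no role assignment.
INSERT INTO auth_assignment VALUES (10, 'admin'), (20, 'manager');
""")

# LEFT JOIN keeps the report of the user without a role, grouped under NULL.
left_rows = conn.execute("""
SELECT aa.item_name, COUNT(*) AS cnt
FROM report r
LEFT JOIN company c ON c.id = r.company_id
LEFT JOIN auth_assignment aa ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt, aa.item_name
""").fetchall()

# Inner joins drop that report, so every group has a real role name.
inner_rows = conn.execute("""
SELECT aa.item_name, COUNT(*) AS cnt
FROM report r
JOIN company c ON c.id = r.company_id
JOIN auth_assignment aa ON aa.user_id = c.user_id
GROUP BY aa.item_name
ORDER BY cnt, aa.item_name
""").fetchall()

print(left_rows)
print(inner_rows)
```

Which of the two you want depends on whether reports from users without a role should be counted at all.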

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table. So I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field. Joining on that text field slowed things down. The query now takes 3 seconds.
However, the inner join solution posted by @DRapp takes 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
Since this is the same table, the same columns apply across the board to every alias of it.
I would ensure an index on (channel_id, entry_id, field_id_117 ) if possible. Another index on (author_id) for the prequery of order totals
Then, start with what will become an inner query that does nothing but a per-customer sum of order amounts. Since the join uses "author_id" as the customer ID, just query and sum on that first. I don't completely understand the (what I would consider poor) design of the structure or what "channel_id" really indicates, but you don't want to duplicate summation values because of those other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. You may have an error in your joins, so check that the join columns are correct for your database. If the join conditions are wrong, the query can degenerate into a cross join, which takes a long time on large data.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id

How Can I improve this MySQL query?

I am trying to improve the performance of this query as it is taking 3-4 seconds to execute.
Here is the query
SELECT SQL_NO_CACHE
ac.account_id,
ac.account_name,
cl.name AS client_name,
IFNULL(cn.contact_number, "") AS Phone
FROM accounts AS ac
STRAIGHT_JOIN clients AS cl ON cl.client_id = ac.client_id
LEFT JOIN (
SELECT bc.contact_number, bc.account_id
FROM contact_numbers AS bc
INNER JOIN (
SELECT account_id, MAX(number_id) AS number_id
FROM contact_numbers
WHERE status = 1 AND contact_type != "Fax" AND contact_link = "Account"
GROUP BY account_id
) AS bb ON bb.number_id = bc.number_id
) AS cn ON ac.account_id = cn.account_id
WHERE ac.status = 1
ORDER BY ac.account_name
LIMIT 0, 100
The clients table contains about 10 rows, which is why I use a straight join. The accounts table contains 350K records, and contact_numbers contains about 500K records.
I believe the problem here is the LEFT JOIN and also the ORDER BY, but I am not sure how to work around it. Also, I am using SQL_NO_CACHE because the accounts and contact_numbers tables are being updated at a fast rate.
What else can I do to improve performance of this query?
this is a screenshot of the explain on this query
I am using MySQL 5.6.13
I set sort_buffer_size=1M
My server has 32GB of RAM
The below should make the outer query run without requiring a filesort.
CREATE INDEX ac_status_acctname ON accounts (status, account_name);
The below should let the inner query be satisfied from the index alone (Using index in EXPLAIN), and help it do the GROUP BY without using a temp table.
CREATE INDEX cn_max ON contact_numbers (account_id, status, contact_link,
contact_type, number_id);
You need to join on both account_id and number_id to get the greatest entry per account. The way you have it now, you just get any account that happens to have the same number_id, which is probably not what you intended, and it could be what's generating too many rows for the subquery result set.
bc INNER JOIN ... bb ON bb.account_id = bc.account_id AND bb.number_id = bc.number_id
You can also write the same join condition as:
bc INNER JOIN ... bb USING (account_id, number_id)
I would actually rewrite the query. You currently select a lot of data that you do not need and then discard it. I would minimize the amount of fetched data.
It seems you basically select something for each account with a certain status and take only 100 of them. So I would put this in a subquery:
SELECT
account_id,
account_name,
c.name AS client_name,
IFNULL(contact_number, '') as Phone
FROM (
SELECT
a.account_id,
MAX(n.number_id) as number_id
FROM (
SELECT account_id
FROM accounts
WHERE status = 1 -- other conditions on accounts go here
ORDER BY account_name
LIMIT 0, 100) as a
LEFT JOIN contact_numbers n
ON a.account_id = n.account_id
AND n.status = 1
AND n.contact_type != "Fax"
AND n.contact_link = "Account"
GROUP BY a.account_id) an
LEFT JOIN contact_numbers USING (account_id, number_id)
JOIN accounts a USING (account_id)
JOIN clients c USING (client_id);
You will need (status, account_name) index for accounts table (for the query with client_id = 4 (status, client_id, account_name) as well) and an index on account_id in contact_numbers. This should suffice.
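The "pick the page of accounts first, then join" idea can be tried on a toy dataset. This is only a sketch under some assumptions: Python's sqlite3 module stands in for MySQL, the rows are invented, the clients join is left out for brevity, the page size is 2 instead of 100, and number_id is assumed to be unique across the whole contact_numbers table (it is the primary key here), so the final join uses number_id alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_name TEXT, status INT);
CREATE TABLE contact_numbers (number_id INTEGER PRIMARY KEY, account_id INT,
                              contact_number TEXT, status INT,
                              contact_type TEXT, contact_link TEXT);
CREATE INDEX ac_status_acctname ON accounts (status, account_name);
CREATE INDEX cn_account ON contact_numbers (account_id);
INSERT INTO accounts VALUES (1, 'Zeta', 1), (2, 'Alpha', 1), (3, 'Mid', 1), (4, 'Aaa', 0);
INSERT INTO contact_numbers VALUES
  (1, 2, '555-0001', 1, 'Phone', 'Account'),
  (2, 2, '555-0002', 1, 'Phone', 'Account'),
  (3, 2, '555-0003', 1, 'Fax',   'Account'),
  (4, 1, '555-0004', 1, 'Phone', 'Account');
""")

page = conn.execute("""
SELECT a.account_id, a.account_name, COALESCE(n.contact_number, '') AS phone
FROM (
    -- Step 2: newest qualifying number per account, only for the page rows.
    SELECT p.account_id, p.account_name, MAX(cn.number_id) AS number_id
    FROM (
        -- Step 1: pick the page of accounts before touching contact_numbers.
        SELECT account_id, account_name
        FROM accounts
        WHERE status = 1
        ORDER BY account_name
        LIMIT 2
    ) AS p
    LEFT JOIN contact_numbers cn
           ON cn.account_id = p.account_id
          AND cn.status = 1
          AND cn.contact_type != 'Fax'
          AND cn.contact_link = 'Account'
    GROUP BY p.account_id, p.account_name
) AS a
-- Step 3: fetch the phone text for the chosen number, if any.
LEFT JOIN contact_numbers n ON n.number_id = a.number_id
ORDER BY a.account_name
""").fetchall()

for row in page:
    print(row)
```

The point of the shape is that contact_numbers is only probed for the accounts on the current page, instead of being aggregated for all accounts up front.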

SQL Join returns only one record

I'm writing a query in which I'm trying to count the total number of records in the report and assignment tables, while at the same time retrieving information from the main table, group. Group has a primary key id which is saved in the other tables as gid. This is the query:
SELECT `group`.`id` AS `gid`
, `group`.`name` AS `g_name`
, COUNT(`report`.`id`) AS `reports`
FROM `group`
LEFT OUTER JOIN `report` ON `report`.`gid` = `group`.`id`
LEFT OUTER JOIN `assignment` ON `assignment`.`gid` = `group`.`id`
WHERE `group`.`active` = 0
ORDER BY
`group`.`name`;
My problem is that whenever I execute this, only one record is returned, even if there are multiple groups.
Thanks in advance.
Well, your query is far from correct :) First of all, you should not have aggregated functions (in this case count) without a group by clause. Now, even if you have that clause the query will summarize information and you want both: the detail and a summary in the same query. I'd recommend 2 separate queries to retrieve this information, but if you want information mixed in only one query (the detail and also the "total number of records in report and assignment table") try the following query:
SELECT
`group`.id AS gid,
`group`.name AS g_name,
(SELECT COUNT(*) from report) as ReportTotalCount,
(SELECT COUNT(*) from assignment) as AssignmentTotalCount
FROM `group`
LEFT OUTER JOIN report ON report.gid = `group`.id
LEFT OUTER JOIN assignment ON assignment.gid = `group`.id
WHERE `group`.`active` = 0
ORDER BY `group`.name;
I wish I could understand exactly what you're looking for, but this might give you an idea of how to get the result you expect.
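If what you actually want is one row per group with its own counts, here is a hedged sketch of that variant. It uses Python's sqlite3 module in place of MySQL, and the sample rows are invented. Two things to note: an aggregate without GROUP BY collapses the result to one row (MySQL with ONLY_FULL_GROUP_BY enabled rejects such a query instead), and joining two child tables at once multiplies rows, so the counts use COUNT(DISTINCT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "group" (id INTEGER PRIMARY KEY, name TEXT, active INT);
CREATE TABLE report (id INTEGER PRIMARY KEY, gid INT);
CREATE TABLE assignment (id INTEGER PRIMARY KEY, gid INT);
INSERT INTO "group" VALUES (1, 'Alpha', 0), (2, 'Beta', 0), (3, 'Gamma', 0);
INSERT INTO report VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO assignment VALUES (1, 1), (2, 1), (3, 1), (4, 3);
""")

# No GROUP BY: the aggregate collapses all groups into a single row.
collapsed = conn.execute("""
SELECT g.id, g.name, COUNT(r.id) AS reports
FROM "group" g
LEFT JOIN report r ON r.gid = g.id
WHERE g.active = 0
""").fetchall()

# GROUP BY gives one row per group; DISTINCT undoes the row multiplication
# caused by joining report and assignment in the same query.
per_group = conn.execute("""
SELECT g.id, g.name,
       COUNT(DISTINCT r.id) AS reports,
       COUNT(DISTINCT a.id) AS assignments
FROM "group" g
LEFT JOIN report r ON r.gid = g.id
LEFT JOIN assignment a ON a.gid = g.id
WHERE g.active = 0
GROUP BY g.id, g.name
ORDER BY g.name
""").fetchall()

print(len(collapsed))
for row in per_group:
    print(row)
```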
Can't see anything obvious in your query that would limit it to returning one record.
You are going to have to break it up to see where the problem is against your existing data.
So: how many groups have active = 0, how many have a corresponding assignment record, etc.
Maybe this will help:
SELECT
ReportForGroup.groupid,
ReportForGroup.groupname,
reports,
assignments
FROM
(SELECT `group`.id AS groupid, `group`.name AS groupname, COUNT(*) AS reports from `group`
INNER JOIN report ON (report.gid = `group`.id)
WHERE `group`.active = 0
GROUP BY `group`.id ) AS ReportForGroup
CROSS JOIN
(SELECT `group`.id AS groupid, `group`.name AS groupname, COUNT(*) AS assignments from `group`
INNER JOIN assignment ON (assignment.gid = `group`.id)
WHERE `group`.active = 0
GROUP BY `group`.id ) AS AssignmentForGroup
ON (ReportForGroup.groupid = AssignmentForGroup.groupid)
ORDER BY ReportForGroup.groupname;
I can't check right now whether COUNT(*) returns 0 or 1 with a LEFT JOIN when a group has no matching rows. If it returns 0, just change the INNERs to LEFTs and use INNER JOIN between the two queries.

MySQL evaluate case with subquery

I am trying to create a custom sort that involves the count of some records in another table. For example, if one record has no records associated with it in the other table, it should appear higher in the sort than if it had one or more records. Here's what I have so far:
SELECT People.*, Organizations.Name AS Organization_Name,
(CASE
WHEN Sent IS NULL AND COUNT(SELECT * FROM Graphics WHERE People.Organization_ID = Graphics.Organization_ID) = 0 THEN 0
ELSE 1
END) AS Status
FROM People
LEFT JOIN Organizations ON Organizations.ID = People.Organization_ID
ORDER BY Status ASC
The subquery within the COUNT is not working. What is the correct way to do something like this?
Update: I moved the case statement into the order by clause and added a join:
SELECT People.*, Organizations.Name AS Organization_Name
FROM People
LEFT JOIN Organizations ON Organizations.ID = People.Organization_ID
LEFT JOIN Graphics ON Graphics.Organization_ID = People.Organization_ID
GROUP BY People.ID
ORDER BY
CASE
WHEN Sent IS NULL AND Graphics.ID IS NULL THEN 0
ELSE 1
END ASC
So if the People record does not have any graphics, Graphics.ID will be null. This achieves the immediate need.
If what you tried does not work, it can be done by joining against a subquery, and placing the CASE expression into ORDER BY as well:
SELECT
People.*,
COALESCE(orgcount.num, 0) AS num
FROM People LEFT JOIN (
SELECT Organization_ID, COUNT(*) AS num FROM Graphics GROUP BY Organization_ID
) orgcount ON People.Organization_ID = orgcount.Organization_ID
ORDER BY
CASE WHEN Sent IS NULL AND orgcount.num IS NULL THEN 0 ELSE 1 END,
orgcount.num DESC
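Here is a small runnable sketch of sorting on a joined count, using Python's sqlite3 module as a stand-in for MySQL. The People and Graphics columns beyond those in the question (ID, Name) and all sample rows are assumptions for illustration. The counted subquery is LEFT JOINed so that people whose organization has no graphics still appear, with a NULL count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT, Organization_ID INT, Sent TEXT);
CREATE TABLE Graphics (ID INTEGER PRIMARY KEY, Organization_ID INT);
INSERT INTO People VALUES
  (1, 'Ann', 100, NULL),
  (2, 'Bob', 200, NULL),
  (3, 'Cy',  300, '2013-01-01');
-- Only organization 100 has graphics.
INSERT INTO Graphics VALUES (1, 100), (2, 100);
""")

rows = conn.execute("""
SELECT p.ID, p.Name,
       COALESCE(gc.num, 0) AS graphics,
       CASE WHEN p.Sent IS NULL AND gc.num IS NULL THEN 0 ELSE 1 END AS Status
FROM People p
LEFT JOIN (
    SELECT Organization_ID, COUNT(*) AS num
    FROM Graphics
    GROUP BY Organization_ID
) gc ON gc.Organization_ID = p.Organization_ID
ORDER BY Status, p.ID
""").fetchall()

for row in rows:
    print(row)
```

Bob sorts first because he has neither a Sent value nor any graphics; Ann and Cy each fail one of the two conditions.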
You could use an outer join to the Graphics table to get the data needed for your sort.
Since I don't know your schema, I made an assumption that the People table has a primary key column called ID. If the PK column has a different name, you should substitute that in the GROUP BY clause.
Something like this should work for you:
SELECT People.*, (count(Distinct Graphics.Organization_ID) > 0) as Status
FROM People
LEFT OUTER JOIN Graphics ON People.Organization_ID = Graphics.Organization_ID
GROUP BY People.ID
ORDER BY Status ASC
Fairly straightforward with a LEFT JOIN, provided you have some kind of primary key in the People table to GROUP on:
SELECT p.*, sent IS NOT NULL or COUNT(g.Organization_ID) Status
FROM People p LEFT JOIN Graphics g ON g.Organization_ID = p.Organization_ID
GROUP BY p.primary_key
ORDER BY Status