MySQL Aggregate Function with group by and join

I have the following table schemas, and I want to get the sum of the amount column for each category and the count of employees in the corresponding categories.
employees
id | name | category
1 | SC | G 1.2
2 | BK | G 2.2
3 | LM | G 2.2
payroll_histories
id | employee_id | amount
1 | 1 | 1000
2 | 1 | 500
3 | 2 | 200
4 | 2 | 100
5 | 3 | 300
Output table should look like this:
category | total | count
G 1.2 | 1500 | 1
G 2.2 | 600 | 2
I have tried the query below; it sums and groups correctly, but I cannot get the count to work.
SELECT
employee_id,
category,
SUM(amount) from payroll_histories,employees
WHERE employees.id=payroll_histories.employee_id
GROUP BY category;
I have also tried COUNT(category), but that does not work either.

You are, I believe, seeking two different summaries of your data. One is a sum of salaries by category, and the other is a count of employees, also by category.
You need to use, and then join, separate aggregate queries to get this.
SELECT a.category, a.amount, b.cnt
FROM (
SELECT e.category, SUM(p.amount) amount
FROM employees e
JOIN payroll_histories p ON e.id = p.employee_id
GROUP BY e.category
) a
JOIN (
SELECT category, COUNT(*) cnt
FROM employees
GROUP BY category
) b ON a.category = b.category
The general principle here is to avoid trying to use just one aggregate query to aggregate more than one kind of detail entity. Your amount aggregates payroll totals, whereas your count aggregates employees.
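As a minimal sketch of the "aggregate each kind of detail separately, then join" idea, the snippet below runs the query above against the question's sample data. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (the query uses only portable SQL); the function name `category_summary` is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE payroll_histories (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
INSERT INTO employees VALUES (1, 'SC', 'G 1.2'), (2, 'BK', 'G 2.2'), (3, 'LM', 'G 2.2');
INSERT INTO payroll_histories VALUES
    (1, 1, 1000), (2, 1, 500), (3, 2, 200), (4, 2, 100), (5, 3, 300);
"""

QUERY = """
SELECT a.category, a.amount, b.cnt
FROM (
    -- aggregate 1: payroll rows summed per category
    SELECT e.category, SUM(p.amount) AS amount
    FROM employees e
    JOIN payroll_histories p ON e.id = p.employee_id
    GROUP BY e.category
) a
JOIN (
    -- aggregate 2: employees counted per category, with no payroll join
    SELECT category, COUNT(*) AS cnt
    FROM employees
    GROUP BY category
) b ON a.category = b.category
ORDER BY a.category
"""


def category_summary():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(category_summary())  # [('G 1.2', 1500, 1), ('G 2.2', 600, 2)]
```

Because the employee count never touches payroll_histories, it cannot be inflated by the number of payroll rows per employee.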
Alternatively, for your specific case, this query will also work, but it doesn't generalize well or necessarily perform well.
SELECT e.category, SUM(p.amount) amount, COUNT(DISTINCT e.id) cnt
FROM employees e
JOIN payroll_histories p ON e.id = p.employee_id
GROUP BY e.category
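COUNT(DISTINCT e.id) fixes the combinatorial explosion that comes from the join: each employee appears once per payroll row, so a plain COUNT(*) would count payroll rows rather than employees.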
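To make the row multiplication visible, this sketch (again assuming Python's sqlite3 as a stand-in for MySQL, with a hypothetical function name) puts a plain COUNT(*) next to COUNT(DISTINCT e.id) on the question's data. After the join, employees 1 and 2 each appear twice.

```python
import sqlite3

SETUP = """
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE payroll_histories (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
INSERT INTO employees VALUES (1, 'SC', 'G 1.2'), (2, 'BK', 'G 2.2'), (3, 'LM', 'G 2.2');
INSERT INTO payroll_histories VALUES
    (1, 1, 1000), (2, 1, 500), (3, 2, 200), (4, 2, 100), (5, 3, 300);
"""

QUERY = """
SELECT e.category,
       SUM(p.amount)        AS amount,
       COUNT(*)             AS joined_rows,  -- one per payroll row
       COUNT(DISTINCT e.id) AS cnt           -- one per employee
FROM employees e
JOIN payroll_histories p ON e.id = p.employee_id
GROUP BY e.category
ORDER BY e.category
"""


def join_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(join_counts())  # [('G 1.2', 1500, 2, 1), ('G 2.2', 600, 3, 2)]
```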
(Pro tip: use the explicit JOIN ... ON syntax rather than the outmoded comma-separated table list with the join condition in WHERE. It's easier to read.)

Related

MySQL: Return Count of 0 for Items Not Found in "WHERE IN" List

I have a page listing 100 auctions and I would like to display the number of bids per itemid. Here is what I have so far (using just 4 auctions in this example):
SELECT itemid, COUNT(*) as count FROM bids WHERE itemid IN(487359,487342,487339,487338) GROUP BY itemid
This displays the number of bids, but only for items that actually have rows in the bids table. Items that have no bid will not have an itemid in the bids table. How do I return a count of "0" for these?
I tried the following, but these don't work either:
IFNULL(COUNT(*),0) and COALESCE(COUNT(*),0)
UPDATE
Here is some example data:
Bids Table
itemid | bid |
487359 | 1.00 |
487359 | 2.00 |
487359 | 2.50 |
487342 | 8.20 |
487338 | 1.00 |
What actually happens:
itemid | #of bids |
487359 | 3 |
487342 | 1 |
487338 | 1 |
What I would like returned:
itemid | #of bids |
487359 | 3 |
487342 | 1 |
487339 | 0 |
487338 | 1 |
Note that item "487339" isn't found in the bids table because there are no bids recorded for that item. I would like MySQL to return 0 for these.
This can be done with an OUTER JOIN:
SELECT i.id as itemid, COUNT(b.itemid) as count
FROM items i LEFT OUTER JOIN bids b ON i.id = b.itemid
WHERE i.id IN (487359, 487342, 487339, 487338)
GROUP BY i.id
This will always return the items specified (as long as they exist in the items table), with a count of 0 when no bids are found. Because an OUTER JOIN is being used, if you need to add filter criteria on bids, add them to the OUTER JOIN's ON clause with AND. Do not put bids criteria in the WHERE clause: the NULL-extended rows for items without bids would fail those criteria, so the OUTER JOIN would effectively become an INNER JOIN and every item with 0 bids would be removed.
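The difference between filtering in ON and in WHERE can be checked with a small sketch. It assumes Python's sqlite3 as a stand-in for MySQL, a hypothetical items table with an id column, and a hypothetical filter (bids of at least 2.00). Because the sample bids table has no id column, the sketch counts b.itemid, which is NULL for items without a matching bid.

```python
import sqlite3

SETUP = """
CREATE TABLE items (id INTEGER PRIMARY KEY);   -- hypothetical items table
CREATE TABLE bids (itemid INTEGER, bid REAL);
INSERT INTO items VALUES (487359), (487342), (487339), (487338);
INSERT INTO bids VALUES
    (487359, 1.00), (487359, 2.00), (487359, 2.50),
    (487342, 8.20), (487338, 1.00);
"""

# {on_extra} and {where_extra} are only ever filled with fixed SQL fragments below.
TEMPLATE = """
SELECT i.id AS itemid, COUNT(b.itemid) AS count
FROM items i
LEFT OUTER JOIN bids b ON i.id = b.itemid {on_extra}
WHERE i.id IN (487359, 487342, 487339, 487338) {where_extra}
GROUP BY i.id
ORDER BY i.id DESC
"""


def bid_counts(on_extra="", where_extra=""):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    sql = TEMPLATE.format(on_extra=on_extra, where_extra=where_extra)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(bid_counts())
# [(487359, 3), (487342, 1), (487339, 0), (487338, 1)]

# Bid filter in the ON clause: items without a qualifying bid still appear with 0.
print(bid_counts(on_extra="AND b.bid >= 2.00"))
# [(487359, 2), (487342, 1), (487339, 0), (487338, 0)]

# Same filter in WHERE: NULL-extended rows fail it, so zero-bid items disappear.
print(bid_counts(where_extra="AND b.bid >= 2.00"))
# [(487359, 2), (487342, 1)]
```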
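You could use conditional summation here, although this only returns item IDs that appear in bids at least once, so items with no bids (such as 487339) are still missing: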
SELECT
itemid,
SUM(itemid IN (487359, 487342, 487339, 487338, ...)) AS count
FROM bids
GROUP BY
itemid;
However, a better approach, especially if the list of item IDs is long, would be to place these values in a separate table items and then do a left join from it to bids:
SELECT
i.itemid,
COUNT(b.itemid) AS count
FROM items i -- contains 487359, 487342, 487339, 487338, ...
LEFT JOIN bids b
ON b.itemid = i.itemid
GROUP BY
i.itemid;

SQL Query is not returning NULL values despite using IFNULL (MySQL)

I have a test table set up as such: [image: table rows setup]
My objective is to get, for each department, the count of departments that were established before it. My SQL is:
SELECT A.Department, IFNULL(COUNT(*), 0)
FROM Departments A
INNER JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
GROUP BY Department
ORDER BY COUNT(*);
However, whether I use LEFT JOIN or INNER JOIN, the department that was established first is never returned; I assume that is because its count is NULL. Despite the IFNULL, that department is not shown.
What am I doing wrong here?
I think this is the query you need:
SELECT A.Department, COUNT(B.Department) AS cnt
FROM Departments A
LEFT JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
GROUP BY A.Department
ORDER BY 2;
| Department | cnt |
| ----------------- | --- |
| Office Management | 0 |
| Business | 1 |
| Sales Management | 2 |
| ComputerScience | 3 |
| Liberal Arts | 4 |
| Farming | 4 |
| Communications | 6 |
| Digital Science | 7 |
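NB: as commented by @fifonik, IFNULL is not needed, since COUNT never returns NULL; it returns 0 when there are no values to count. Note that the query counts B.Department rather than using COUNT(*): for the oldest department the LEFT JOIN yields one row with NULL B columns, which COUNT(*) would count as 1 but COUNT(B.Department) skips.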
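A small sketch shows the COUNT(*) versus COUNT(B.Department) difference. It assumes Python's sqlite3 as a stand-in for MySQL and uses hypothetical departments and years, since the question's rows were only shown in a screenshot.

```python
import sqlite3

SETUP = """
CREATE TABLE Departments (id INTEGER PRIMARY KEY, Department TEXT,
                          YearOfEstablishment INTEGER);
-- hypothetical sample rows
INSERT INTO Departments (Department, YearOfEstablishment) VALUES
    ('Office Management', 1950), ('Business', 1960), ('Sales Management', 1970);
"""

QUERY = """
SELECT A.Department,
       COUNT(B.Department) AS cnt,       -- skips the NULL-extended row
       COUNT(*)            AS all_rows   -- counts it, so the oldest gets 1
FROM Departments A
LEFT JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
GROUP BY A.Department
ORDER BY cnt
"""


def earlier_department_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(earlier_department_counts())
# [('Office Management', 0, 1), ('Business', 1, 1), ('Sales Management', 2, 2)]
```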
In MySQL 8+, you can do this more simply with RANK() or ROW_NUMBER():
SELECT d.Department,
ROW_NUMBER() OVER (ORDER BY YearOfEstablishment) - 1 as seqnum
FROM Departments d
ORDER BY seqnum;
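With no ties, this gives the same result as your query. It might be better to use RANK(), which gives tied departments the same value, so each result equals the number of departments established strictly earlier: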
SELECT d.Department,
RANK() OVER (ORDER BY YearOfEstablishment) - 1 as seqnum
FROM Departments d
ORDER BY seqnum;
This should be the count you are looking for.
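As a check that RANK() - 1 matches the self-join count even with ties, while ROW_NUMBER() - 1 does not, here is a minimal sketch. It assumes Python's sqlite3 as a stand-in for MySQL (window functions need SQLite 3.25 or later) and hypothetical rows that include a tie in 1970.

```python
import sqlite3

SETUP = """
CREATE TABLE Departments (id INTEGER PRIMARY KEY, Department TEXT,
                          YearOfEstablishment INTEGER);
-- hypothetical rows with a tie in 1970
INSERT INTO Departments (Department, YearOfEstablishment) VALUES
    ('Office Management', 1950), ('Business', 1960),
    ('Liberal Arts', 1970), ('Farming', 1970), ('Communications', 1980);
"""


def compare_methods():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Self-join: count departments established strictly earlier.
    self_join = dict(conn.execute("""
        SELECT A.Department, COUNT(B.Department)
        FROM Departments A
        LEFT JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
        GROUP BY A.Department
    """).fetchall())
    # RANK() - 1: tied rows share a rank, so they get the same count.
    ranked = dict(conn.execute("""
        SELECT Department, RANK() OVER (ORDER BY YearOfEstablishment) - 1
        FROM Departments
    """).fetchall())
    # ROW_NUMBER() - 1: ties are broken arbitrarily, so every value is distinct.
    numbered = sorted(n for _, n in conn.execute("""
        SELECT Department, ROW_NUMBER() OVER (ORDER BY YearOfEstablishment) - 1
        FROM Departments
    """))
    conn.close()
    return self_join, ranked, numbered


self_join, ranked, numbered = compare_methods()
print(self_join == ranked)  # True
print(numbered)             # [0, 1, 2, 3, 4]
```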
You can also do something like this:
SELECT
A.Department
, A.YearOfEstablishment
, COUNT(*) - 1
FROM
Departments A
INNER JOIN Departments B ON (
A.id = B.id
OR B.YearOfEstablishment < A.YearOfEstablishment
)
GROUP BY
A.id
, A.Department
, A.YearOfEstablishment
ORDER BY
COUNT(*);

MySQL SUM of one column, DISTINCT of ID column

I'm trying to create a summary report of our orders but having trouble extracting all my required data in a single query.
The data I'd like to extract:
subtotal - SUM of all sale prices
delivery total - SUM of all orders deliveryTotal
orders - COUNT of DISTINCT orderIds
quantity - SUM of all quantity ordered
Orders table (simplified for this example)
| orderId | deliveryTotal | total |
|---------|---------------|-------|
| 1 | 5 | 15 |
| 2 | 5 | 15 |
| 3 | 7.50 | 27.50 |
Order items table
| orderItemId | orderId | productId | salePrice | quantity |
|-------------|---------|-----------|-----------|----------|
| 1 | 1 | 1 | 10 | 1 |
| 2 | 2 | 1 | 10 | 1 |
| 3 | 3 | 1 | 10 | 1 |
| 4 | 3 | 2 | 10 | 1 |
My current query for extracting this data is:
SELECT
SUM(i.salePrice * i.quantity) as subtotal,
SUM(DISTINCT o.deliveryTotal) as deliveryTotal,
COUNT(DISTINCT o.orderId) as orders,
SUM(i.quantity) as quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId
This gives the correct subtotal, order count, and quantity sum, but the delivery total is returned as 12.50 when I'm after 17.50. If I use SUM(o.deliveryTotal), it returns 25.
EDIT: Desired results
| subtotal | deliveryTotal | orders | quantity |
|----------|---------------|--------|----------|
| 40.00 | 17.50 | 3 | 4 |
https://tiaashish.wordpress.com/2014/01/31/mysql-sum-for-distinct-rows-with-left-join/
Here is a blog post that shows exactly what I was looking for. Maybe this can help others too.
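The formula is something like this:
SUM(o.deliveryTotal) * COUNT(DISTINCT o.orderId) / COUNT(*)
This only gives the exact total when every order has the same number of items. With the sample data above it returns 25 * 3 / 4 = 18.75 rather than 17.50, because order 3 has two items.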
Because of the join, the SUM(DISTINCT deliveryTotal) aggregate is being applied to a rowset including the values 5, 5, 7.5, 7.5 (distinct 5 + 7.5 = 12.5).
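To see that rowset directly, this sketch loads the question's sample orders into an in-memory database (Python's sqlite3 standing in for MySQL, an assumption) and compares SUM(DISTINCT ...) with a plain SUM over the joined rows.

```python
import sqlite3

SETUP = """
CREATE TABLE orders (orderId INTEGER PRIMARY KEY, deliveryTotal REAL, total REAL);
CREATE TABLE orderItems (orderItemId INTEGER PRIMARY KEY, orderId INTEGER,
                         productId INTEGER, salePrice REAL, quantity INTEGER);
INSERT INTO orders VALUES (1, 5, 15), (2, 5, 15), (3, 7.50, 27.50);
INSERT INTO orderItems VALUES
    (1, 1, 1, 10, 1), (2, 2, 1, 10, 1), (3, 3, 1, 10, 1), (4, 3, 2, 10, 1);
"""

FROM_JOIN = "FROM orderItems i INNER JOIN orders o ON o.orderId = i.orderId"


def delivery_sums():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # The rowset the aggregates see: one deliveryTotal per order item.
    rowset = [r[0] for r in conn.execute(
        f"SELECT o.deliveryTotal {FROM_JOIN} ORDER BY i.orderItemId")]
    sums = conn.execute(
        f"SELECT SUM(DISTINCT o.deliveryTotal), SUM(o.deliveryTotal) {FROM_JOIN}"
    ).fetchone()
    conn.close()
    return rowset, sums


rowset, (distinct_sum, plain_sum) = delivery_sums()
print(rowset)        # [5.0, 5.0, 7.5, 7.5]
print(distinct_sum)  # 12.5 (order 2's 5 is discarded as a duplicate value)
print(plain_sum)     # 25.0 (order 3's 7.5 is counted once per item)
```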
The rows your SUM() acted on become more apparent if you simply run:
SELECT o.*
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId
Instead, you want the SUM() of all the values in deliveryTotal, counting each order once no matter how many rows the join with orderItems produces for it. That means you need to apply the aggregate at a different level.
Since you are not intending to add a GROUP BY later, the easiest way to do that is to use a subselect whose purpose is only to get the SUM() across the whole table.
SELECT
SUM(i.salePrice * i.quantity) as subtotal,
-- deliveryTotal sum as a subselect
(SELECT SUM(deliveryTotal) FROM orders) as deliveryTotal,
COUNT(DISTINCT o.orderId) as orders,
SUM(i.quantity) as quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId
Subselects are often discouraged, but this one carries no significant performance penalty compared with the alternatives that use a join: the total has to be computed as a separate aggregate from the existing join no matter what. Another method would place the same aggregate in a subquery in the FROM clause and CROSS JOIN it, which does the same work as the subselect here, so performance would be the same.
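A minimal sketch of the scalar-subselect query on the question's sample data, assuming Python's sqlite3 as a stand-in for MySQL. It reproduces the desired result row.

```python
import sqlite3

SETUP = """
CREATE TABLE orders (orderId INTEGER PRIMARY KEY, deliveryTotal REAL, total REAL);
CREATE TABLE orderItems (orderItemId INTEGER PRIMARY KEY, orderId INTEGER,
                         productId INTEGER, salePrice REAL, quantity INTEGER);
INSERT INTO orders VALUES (1, 5, 15), (2, 5, 15), (3, 7.50, 27.50);
INSERT INTO orderItems VALUES
    (1, 1, 1, 10, 1), (2, 2, 1, 10, 1), (3, 3, 1, 10, 1), (4, 3, 2, 10, 1);
"""

QUERY = """
SELECT
    SUM(i.salePrice * i.quantity) AS subtotal,
    -- aggregated once over the orders table, independently of the join
    (SELECT SUM(deliveryTotal) FROM orders) AS deliveryTotal,
    COUNT(DISTINCT o.orderId) AS orders,
    SUM(i.quantity) AS quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId
"""


def order_summary():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row


print(order_summary())  # (40.0, 17.5, 3, 4)
```

One design caveat: the subselect always sums the whole orders table, so if you later add a WHERE filter to the outer query, the same filter has to be repeated inside the subselect.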
Aggregate per order in an inner SELECT, then sum the results:
Select
SUM(subtotal) as subtotal,
sum(deliveryTotal) as deliveryTotal,
count(1) as orders,
sum(quantity) as quantity
from (
SELECT
SUM(i.salePrice * i.quantity) as subtotal,
o.deliveryTotal as deliveryTotal,
SUM(i.quantity) as quantity
FROM orders o
INNER JOIN orderItems i ON o.orderId = i.orderId
group by o.orderId) as sub
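The per-order derived table can be checked the same way. This sketch assumes Python's sqlite3 as a stand-in for MySQL; in MySQL with ONLY_FULL_GROUP_BY, selecting o.deliveryTotal without aggregating it should be accepted only if orderId is the primary key of orders (so deliveryTotal is functionally dependent on it), which is assumed here.

```python
import sqlite3

SETUP = """
CREATE TABLE orders (orderId INTEGER PRIMARY KEY, deliveryTotal REAL, total REAL);
CREATE TABLE orderItems (orderItemId INTEGER PRIMARY KEY, orderId INTEGER,
                         productId INTEGER, salePrice REAL, quantity INTEGER);
INSERT INTO orders VALUES (1, 5, 15), (2, 5, 15), (3, 7.50, 27.50);
INSERT INTO orderItems VALUES
    (1, 1, 1, 10, 1), (2, 2, 1, 10, 1), (3, 3, 1, 10, 1), (4, 3, 2, 10, 1);
"""

QUERY = """
SELECT
    SUM(subtotal)      AS subtotal,
    SUM(deliveryTotal) AS deliveryTotal,
    COUNT(1)           AS orders,       -- one inner row per order
    SUM(quantity)      AS quantity
FROM (
    SELECT
        SUM(i.salePrice * i.quantity) AS subtotal,
        o.deliveryTotal AS deliveryTotal,  -- single value per orderId group
        SUM(i.quantity) AS quantity
    FROM orders o
    INNER JOIN orderItems i ON o.orderId = i.orderId
    GROUP BY o.orderId
) AS sub
"""


def order_summary_per_order():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row


print(order_summary_per_order())  # (40.0, 17.5, 3, 4)
```

Grouping by order first collapses the item rows, so each deliveryTotal enters the outer SUM exactly once.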
The query below returns exactly what you need:
SELECT SUM(conctable.subtotal),
SUM(conctable.deliveryTotal),
SUM(conctable.orders),
SUM(conctable.quantity) from
(SELECT SUM(i.salePrice * i.quantity) as subtotal,
o.deliveryTotal as deliveryTotal,
COUNT(DISTINCT o.orderId) as orders,
SUM(i.quantity) as quantity
FROM orderItems i
JOIN orders o ON o.orderId = i.orderId group by i.orderid) as conctable;

Making large SQL query efficient

I'm stuck on a rather complex query.
I'm looking to write a query that shows the "top five customers" as well as some key metrics (counts with conditions) about each of those customers. Each of the different metrics uses a totally different join structure.
customer
id | name
1 | Customer1
2 | Customer2
metricn
id | customer_id
1 | 1
2 | 2
metricn_lineitem
id | metricn_id
1 | 1
2 | 1
The issue is that I always want to group by the customer table.
I first tried putting all of my joins into the original query, but performance was abysmal. I then tried using subqueries, but I couldn't get them to group by the original customer id.
Here's a sample query
SELECT
customer.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = customer_id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = customer_id
) as metric_2
FROM customer
GROUP BY customer.name
SORT BY COUNT(metric1.id) DESC
LIMIT 5
Any advice? Thanks!
SELECT name, metric_1, metric_2
FROM customer AS c
LEFT JOIN (SELECT customer_id, COUNT(*) AS metric_1
FROM metric1 AS m
INNER JOIN metric1_lineitem AS l ON m.id = l.metric1_id
GROUP BY customer_id) m1
ON m1.customer_id = c.id
LEFT JOIN (SELECT customer_id, COUNT(*) AS metric_2
FROM metric2 AS m
INNER JOIN metric2_lineitem AS l ON m.id = l.metric2_id
GROUP BY customer_id) m2
ON m2.customer_id = c.id
ORDER BY metric_1 DESC
LIMIT 5
You should also avoid using COUNT(columnname) when you can use COUNT(*) instead. The former has to test every value to see if it's null.
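Here is a minimal sketch of the pre-aggregated LEFT JOIN approach on the sample rows from the question. It assumes Python's sqlite3 as a stand-in for MySQL, joins on c.id (the key shown in the customer table), and leaves the metric2 tables empty as a hypothetical simplification. A customer with no matching line items gets NULL from the LEFT JOIN, so the sketch wraps the counts in COALESCE to report 0.

```python
import sqlite3

SETUP = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metric1 (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE metric1_lineitem (id INTEGER PRIMARY KEY, metric1_id INTEGER);
CREATE TABLE metric2 (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE metric2_lineitem (id INTEGER PRIMARY KEY, metric2_id INTEGER);
INSERT INTO customer VALUES (1, 'Customer1'), (2, 'Customer2');
INSERT INTO metric1 VALUES (1, 1), (2, 2);
INSERT INTO metric1_lineitem VALUES (1, 1), (2, 1);
-- metric2 tables left empty in this hypothetical sample
"""

QUERY = """
SELECT c.name,
       COALESCE(m1.metric_1, 0) AS metric_1,  -- NULL when no line items matched
       COALESCE(m2.metric_2, 0) AS metric_2
FROM customer AS c
LEFT JOIN (SELECT m.customer_id, COUNT(*) AS metric_1
           FROM metric1 AS m
           INNER JOIN metric1_lineitem AS l ON m.id = l.metric1_id
           GROUP BY m.customer_id) m1 ON m1.customer_id = c.id
LEFT JOIN (SELECT m.customer_id, COUNT(*) AS metric_2
           FROM metric2 AS m
           INNER JOIN metric2_lineitem AS l ON m.id = l.metric2_id
           GROUP BY m.customer_id) m2 ON m2.customer_id = c.id
ORDER BY metric_1 DESC
LIMIT 5
"""


def top_customers():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(top_customers())  # [('Customer1', 2, 0), ('Customer2', 0, 0)]
```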
Although your data structure may be lousy, your query may not be so bad, with two exceptions. First, I don't think you need the aggregation at the outer level. Second, the "correlations" in the WHERE clauses (such as metric1.customer_id = customer_id) are not doing anything: the unqualified customer_id resolves to the subquery's own table, so the condition compares metric1.customer_id with itself. You need to reference the outer customer row instead, as in metric1.customer_id = c.id:
SELECT c.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN
metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = c.id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN
metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = c.id
) as metric_2
FROM customer c
ORDER BY metric_1 DESC
LIMIT 5;
How can you make this run faster? One way is to introduce indexes. I would recommend metric1(customer_id), metric2(customer_id), metric1_lineitem(metric1_id) and metric2_lineitem(metric2_id).
This may be faster than the aggregation approach proposed by Barmar, because MySQL is often inefficient with aggregations over derived tables. With these indexes, the counts can be computed using only the indexes instead of the base tables.
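A sketch of the correlated-subquery version with the four recommended indexes, under the same assumptions as before: Python's sqlite3 standing in for MySQL, the question's sample rows, empty metric2 tables, and hypothetical index names. It only checks the result, not the speed. Unlike a LEFT JOIN to a derived table, a correlated COUNT returns 0 rather than NULL for a customer with no line items.

```python
import sqlite3

SETUP = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE metric1 (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE metric1_lineitem (id INTEGER PRIMARY KEY, metric1_id INTEGER);
CREATE TABLE metric2 (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE metric2_lineitem (id INTEGER PRIMARY KEY, metric2_id INTEGER);
-- the recommended indexes (names are hypothetical)
CREATE INDEX idx_metric1_customer ON metric1 (customer_id);
CREATE INDEX idx_metric2_customer ON metric2 (customer_id);
CREATE INDEX idx_metric1_lineitem_parent ON metric1_lineitem (metric1_id);
CREATE INDEX idx_metric2_lineitem_parent ON metric2_lineitem (metric2_id);
INSERT INTO customer VALUES (1, 'Customer1'), (2, 'Customer2');
INSERT INTO metric1 VALUES (1, 1), (2, 2);
INSERT INTO metric1_lineitem VALUES (1, 1), (2, 1);
"""

QUERY = """
SELECT c.name,
       (SELECT COUNT(l.id)
        FROM metric1 m INNER JOIN metric1_lineitem l ON l.metric1_id = m.id
        WHERE m.customer_id = c.id) AS metric_1,   -- correlated on the outer row
       (SELECT COUNT(l.id)
        FROM metric2 m INNER JOIN metric2_lineitem l ON l.metric2_id = m.id
        WHERE m.customer_id = c.id) AS metric_2
FROM customer c
ORDER BY metric_1 DESC
LIMIT 5
"""


def top_customers_correlated():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(top_customers_correlated())  # [('Customer1', 2, 0), ('Customer2', 0, 0)]
```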

How to combine MySQL number of rows of the joined table, including 0?

I have two tables: 'company' and 'order'. The first one contains company info and the second one holds all orders made with a company. (order.company = company.ID).
I am making a query on the first table, for example all companies in the city of New York. I would like to join it with the order table so that it immediately shows how many orders were made with each company. I could do this with a simple JOIN query; however, that does not include 0. For example, if a company has no orders yet, it does not show up at all, while it should be in the list with 0 orders.
Desired end result:
----------------------------------------
| ID | Name | ... | Orders |
----------------------------------------
| 105 | Company A | ... | 14 |
| 115 | Company B | ... | 5 |
| 120 | Company C | ... | 0 |
| 121 | Company D | ... | 0 |
----------------------------------------
Thanks in advance!
This is a left join with aggregation:
SELECT c.ID, c.Name, count(o.company) as total
FROM company c left outer join
`order` o
on c.id = o.company
WHERE c.city = 'New York'
GROUP BY c.ID;
In MySQL, it is best to avoid subqueries in the FROM clause where possible, because the derived table may actually be materialized.
COUNT(o.company) counts the matches by counting the non-NULL values of o.company, the column used for the join. For a company with no orders, the LEFT JOIN fills o.company with NULL, so its count is 0.
Try this:
SELECT com.id, com.name, COUNT(od.company) AS orders FROM company AS com
LEFT JOIN `order` AS od ON od.company = com.id
GROUP BY com.id, com.name;
SELECT companies.ID, companies.Name, COALESCE(orders.total, 0) AS total FROM
(SELECT ID, Name FROM company WHERE city = 'New York') companies
LEFT JOIN (SELECT company, COUNT(*) AS total FROM `order` GROUP BY company) orders
ON orders.company = companies.ID