COUNT all values in a column with JOIN - mysql

I am joining three tables and need to return two separate counts, one showing the total number of unique users who have purchased an item, and the other showing the total number of unique users who haven't purchased an item. These are cropped for brevity, but here are the relevant tables:
user table
+----------+------+------+-----+
| username | colb | colc | etc |
+----------+------+------+-----+
| user1 | * | * | * |
| user2 | * | * | * |
| user3 | * | * | * |
+----------+------+------+-----+
purchase table
+------------+---------+----------+------+
| purchaseID | storeID | username | cost |
+------------+---------+----------+------+
| 1 | 1 | user1 | * |
| 2 | 1 | user2 | * |
| 3 | 5 | user2 | * |
| 4 | 3 | user1 | * |
+------------+---------+----------+------+
store table
+---------+-----------+-----+
| storeID | storeName | etc |
+---------+-----------+-----+
| 1 | store1 | * |
| 2 | store2 | * |
| 3 | store3 | * |
+---------+-----------+-----+
I am currently using this query to get the unique users who have purchased an item from a store:
SELECT
store.storeID storeID,
store.storeName storeName,
COUNT(DISTINCT CASE WHEN purchase.username IS NOT NULL
THEN purchase.purchaseID END) AS purchases
[Query to retrieve total unique users who have not purchased an item]
FROM store
LEFT JOIN purchase
ON store.storeID = purchase.storeID
LEFT JOIN user
ON purchase.username = user.username
GROUP BY 1, 2
I have tried a few different ways, none of which have worked. The issue I've identified is that the LEFT JOIN only returns rows for usernames that appear in purchase, so the COUNT won't include the other users in the user table. I have not had any luck finding a way to fix this, so I'm hoping someone on here can lend me a hand. The results I'm hoping to see should be something like this:
+---------+-----------+-----------+--------------+
| storeID | storeName | purchases | nonPurchases |
+---------+-----------+-----------+--------------+
| 1 | store1 | 2 | 1 |
| 2 | store2 | 0 | 3 |
| 3 | store3 | 1 | 2 |
+---------+-----------+-----------+--------------+

That is actually quite simple: count all users, then subtract the count of distinct purchasers for each store. COUNT(DISTINCT ...) already ignores NULLs, so no CASE is needed.
SELECT
store.storeID storeID,
store.storeName storeName,
COUNT(DISTINCT purchase.username) AS purchases,
(SELECT COUNT(*) FROM user) - COUNT(DISTINCT purchase.username) AS nonPurchases
FROM store
LEFT JOIN purchase
ON store.storeID = purchase.storeID
LEFT JOIN user
ON purchase.username = user.username
GROUP BY 1, 2
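To sanity-check the subtraction approach, here is a minimal sketch that runs it against the question's sample rows. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption: the query only uses features both engines share). It counts DISTINCT purchase.username, as the prose describes, and drops the LEFT JOIN to user because that join doesn't change the counts. The helper name store_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite database
# (a stand-in for MySQL; only the columns the query needs are created).
SETUP = """
CREATE TABLE user (username TEXT PRIMARY KEY);
CREATE TABLE store (storeID INTEGER PRIMARY KEY, storeName TEXT);
CREATE TABLE purchase (purchaseID INTEGER PRIMARY KEY, storeID INTEGER, username TEXT);
INSERT INTO user VALUES ('user1'), ('user2'), ('user3');
INSERT INTO store VALUES (1, 'store1'), (2, 'store2'), (3, 'store3');
INSERT INTO purchase VALUES (1, 1, 'user1'), (2, 1, 'user2'), (3, 5, 'user2'), (4, 3, 'user1');
"""

# Total users minus the distinct purchasers of each store.
# COUNT(DISTINCT ...) skips the NULL row the LEFT JOIN produces for store2.
QUERY = """
SELECT store.storeID,
       store.storeName,
       COUNT(DISTINCT purchase.username) AS purchases,
       (SELECT COUNT(*) FROM user) - COUNT(DISTINCT purchase.username) AS nonPurchases
FROM store
LEFT JOIN purchase ON store.storeID = purchase.storeID
GROUP BY store.storeID, store.storeName
ORDER BY store.storeID
"""


def store_counts():
    """Return (storeID, storeName, purchases, nonPurchases) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in store_counts():
        print(row)
```

On this data, counting DISTINCT purchaseID happens to give the same numbers; the two only diverge once a user buys more than once from the same store.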

Here is a clean solution.
Please note that the aggregation is done before the join. (WITH clauses require MySQL 8.0 or later.)
with
purchases as
(
select storeID
,count(distinct username) as purchase
from purchase
group by storeID
),
users as
(
select count(*) as total_users
from user
)
select storeID
,storeName
,coalesce(purchase, 0) as purchase
,total_users - coalesce(purchase, 0) as nonPurchases
from store
left join purchases using (storeID)
cross join users
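Result:
+---------+-----------+----------+--------------+
| storeID | storeName | purchase | nonPurchases |
+---------+-----------+----------+--------------+
| 1 | store1 | 2 | 1 |
| 2 | store2 | 0 | 3 |
| 3 | store3 | 1 | 2 |
+---------+-----------+----------+--------------+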
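A quick way to reproduce that result locally is sketched below, again with Python's sqlite3 module as a stand-in for MySQL 8 (SQLite also supports WITH, USING and CROSS JOIN). The ORDER BY storeID is added only to make the row order deterministic, and cte_counts is a hypothetical helper name.

```python
import sqlite3

# The question's sample rows (only the columns the query needs).
SETUP = """
CREATE TABLE user (username TEXT PRIMARY KEY);
CREATE TABLE store (storeID INTEGER PRIMARY KEY, storeName TEXT);
CREATE TABLE purchase (purchaseID INTEGER PRIMARY KEY, storeID INTEGER, username TEXT);
INSERT INTO user VALUES ('user1'), ('user2'), ('user3');
INSERT INTO store VALUES (1, 'store1'), (2, 'store2'), (3, 'store3');
INSERT INTO purchase VALUES (1, 1, 'user1'), (2, 1, 'user2'), (3, 5, 'user2'), (4, 3, 'user1');
"""

# Aggregate purchasers per store first, then attach the single-row user total.
QUERY = """
WITH purchases AS (
    SELECT storeID, COUNT(DISTINCT username) AS purchase
    FROM purchase
    GROUP BY storeID
),
users AS (
    SELECT COUNT(*) AS total_users FROM user
)
SELECT storeID,
       storeName,
       COALESCE(purchase, 0) AS purchase,
       total_users - COALESCE(purchase, 0) AS nonPurchases
FROM store
LEFT JOIN purchases USING (storeID)
CROSS JOIN users
ORDER BY storeID
"""


def cte_counts():
    """Return (storeID, storeName, purchase, nonPurchases) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(cte_counts())
```

Because purchases are aggregated before the join, store2 simply gets no match from the CTE, and COALESCE turns that into 0.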

I'll go with a slightly different approach.
Generate every store/user combination with a CROSS JOIN, turn it into a subquery, then LEFT JOIN that to the purchase table. In the SELECT, change COUNT(DISTINCT ..) to SUM(..). Something like this:
SELECT us.storeID,
us.storeName,
SUM(CASE WHEN p.username IS NOT NULL
THEN 1 ELSE 0 END) AS purchases,
SUM(CASE WHEN p.username IS NULL
THEN 1 ELSE 0 END) AS nonPurchases
FROM (SELECT storeID, storeName, username FROM user u CROSS JOIN store s) us
LEFT JOIN (SELECT DISTINCT storeid, username FROM purchase) p
ON us.storeID = p.storeID
AND us.username=p.username
GROUP BY 1, 2;
Thanks to David for pointing out in the comments that my previous suggestion wasn't exactly counting unique users. I made a quick modification so that it does what the OP wanted in the first place: I now run SELECT DISTINCT ... on the purchase table and use that as the subquery for the LEFT JOIN. The other parts of the original suggestion remain the same.
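Here is a small sketch of the CROSS JOIN version, run with Python's sqlite3 as a stand-in for MySQL. To illustrate why the SELECT DISTINCT matters, it can also insert extra purchase rows; the row (5, 1, 'user1') used below is hypothetical, not from the question, and models user1 buying from store1 a second time. The helper name cross_join_counts is likewise hypothetical.

```python
import sqlite3

# The question's sample rows (only the columns the query needs).
SETUP = """
CREATE TABLE user (username TEXT PRIMARY KEY);
CREATE TABLE store (storeID INTEGER PRIMARY KEY, storeName TEXT);
CREATE TABLE purchase (purchaseID INTEGER PRIMARY KEY, storeID INTEGER, username TEXT);
INSERT INTO user VALUES ('user1'), ('user2'), ('user3');
INSERT INTO store VALUES (1, 'store1'), (2, 'store2'), (3, 'store3');
INSERT INTO purchase VALUES (1, 1, 'user1'), (2, 1, 'user2'), (3, 5, 'user2'), (4, 3, 'user1');
"""

# Every (store, user) pair LEFT JOINed to the distinct (storeID, username)
# purchase pairs: matched pairs are purchasers, unmatched pairs are not.
QUERY = """
SELECT us.storeID,
       us.storeName,
       SUM(CASE WHEN p.username IS NOT NULL THEN 1 ELSE 0 END) AS purchases,
       SUM(CASE WHEN p.username IS NULL THEN 1 ELSE 0 END) AS nonPurchases
FROM (SELECT storeID, storeName, username FROM user u CROSS JOIN store s) us
LEFT JOIN (SELECT DISTINCT storeID, username FROM purchase) p
  ON us.storeID = p.storeID
 AND us.username = p.username
GROUP BY 1, 2
ORDER BY 1
"""


def cross_join_counts(extra_purchases=()):
    """Run QUERY, optionally after adding (purchaseID, storeID, username) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", extra_purchases)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(cross_join_counts())
    # A hypothetical repeat purchase does not change the per-store counts.
    print(cross_join_counts([(5, 1, "user1")]))
```

Without the DISTINCT, that repeat purchase would match the (store1, user1) pair twice and store1 would report 3 purchases for only 3 users in total.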


Calculate the amount payable to each user

I want to calculate the amount payable to each user.
This may be negative. Briefly:
MustPay = AmountTaken - AmountPaid
I couldn't get the SQL query right; this is what I have so far:
SELECT users.Name, users.Surname,
SUM(takenfrom.AmountTaken) - SUM(paid.AmountPaid) AS MustPay
FROM users
LEFT JOIN takenfrom ON takenfrom.UserId = users.UserId
LEFT JOIN paid ON paid.UserId = users.UserId
GROUP BY users.UserId
Tables:
FIRST TABLE USERS
| UserId | Name | Surname |
| 1 | foo | boo |
| 2 | f | b |
SECOND TABLE TAKENFROM
| TakenFromId | UserId | AmountTaken|
| 1 | 1 | 100 |
| 2 | 2 | 200 |
THIRD TABLE PAID
| PaidId | UserId | AmountPaid|
| 1 | 2 | 50 |
| 2 | 2 | 50 |
RESULT TABLE
| Name | Surname| MustPay |
| foo | boo | 100 |
| f | b | 100 |
You don't need a LEFT JOIN on TakenFrom, since every user is going to have an amount billed to them; whether or not they've paid is what Paid is for. You only need a LEFT JOIN on Paid, because a user may or may not have paid already.
Since not every user is going to have an AmountPaid, you need IFNULL() to handle that: SUM() returns NULL when every value it aggregates is NULL, which is exactly what the LEFT JOIN produces for a user with no payments.
Also, unless there are multiple rows for each user in takenfrom, you don't need a SUM() for AmountTaken.
SELECT users.Name, users.Surname,
SUM(takenfrom.AmountTaken) - IFNULL(SUM(paid.AmountPaid), 0) AS MustPay
FROM users
JOIN takenfrom ON takenfrom.UserId = users.UserId
LEFT JOIN paid ON paid.UserId = users.UserId
GROUP BY users.UserId
DEMO: http://sqlfiddle.com/#!9/e69b3b/1
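UPDATE: If paid (or takenfrom) has more than one row for a UserId, the JOINs multiply the rows, so the amounts on the other side get summed more than once. With the sample data, user 2's single AmountTaken of 200 is repeated once per payment, giving 400 - 100 = 300 instead of 100. To fix this, you can use subqueries instead of JOINs: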
SELECT Name, Surname,IFNULL((
SELECT SUM(AmountTaken) FROM takenfrom WHERE UserID = users.UserID
), 0) - IFNULL((
SELECT SUM(AmountPaid) FROM paid WHERE UserID = users.UserID
), 0) AS MustPay
FROM users
DEMO: http://sqlfiddle.com/#!9/3960b5/26
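The row-multiplication problem is easy to see by running both queries side by side. This is a sketch using Python's sqlite3 (a stand-in for MySQL; SQLite also has IFNULL) on the question's sample rows; the helper name must_pay and the ORDER BY clauses are additions for the demo.

```python
import sqlite3

# The question's three tables and sample rows.
SETUP = """
CREATE TABLE users (UserId INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
CREATE TABLE takenfrom (TakenFromId INTEGER PRIMARY KEY, UserId INTEGER, AmountTaken INTEGER);
CREATE TABLE paid (PaidId INTEGER PRIMARY KEY, UserId INTEGER, AmountPaid INTEGER);
INSERT INTO users VALUES (1, 'foo', 'boo'), (2, 'f', 'b');
INSERT INTO takenfrom VALUES (1, 1, 100), (2, 2, 200);
INSERT INTO paid VALUES (1, 2, 50), (2, 2, 50);
"""

# JOIN version: user 2's takenfrom row is repeated for each of the two payments.
JOINED = """
SELECT users.Name, users.Surname,
       SUM(takenfrom.AmountTaken) - IFNULL(SUM(paid.AmountPaid), 0) AS MustPay
FROM users
JOIN takenfrom ON takenfrom.UserId = users.UserId
LEFT JOIN paid ON paid.UserId = users.UserId
GROUP BY users.UserId
ORDER BY users.UserId
"""

# Correlated subqueries: each total is computed independently, so no multiplication.
SUBQUERIES = """
SELECT Name, Surname,
       IFNULL((SELECT SUM(AmountTaken) FROM takenfrom WHERE UserId = users.UserId), 0)
     - IFNULL((SELECT SUM(AmountPaid) FROM paid WHERE UserId = users.UserId), 0) AS MustPay
FROM users
ORDER BY UserId
"""


def must_pay(query):
    """Return (Name, Surname, MustPay) rows for the given query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(must_pay(JOINED))      # user 2 comes out as 300
    print(must_pay(SUBQUERIES))  # user 2 comes out as 100
```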

MySQL - Selecting Duplicates across 3 columns and joining with another table to filter

I have a Purchases table, where I'm trying to select all rows where first name, surname and email are duplicates (for all 3).
Purchases table:
| purchase_id | product_id | user_id | firstname | surname | email |
| ------------- | -----------| ------------- | ----------- | --------- | ----------- |
| 1 | 1 | 777 | Sally | Smith | s#gmail.com |
| 2 | 2 | 777 | Sally | Smith | s#gmail.com |
| 3 | 3 | 777 | Sally | Smith | s#gmail.com |
| 4 | 1 | 888 | Bob | Smith | b#gmail.com |
Further to this, each product ID corresponds to a product type in a 'Products' table, and I'm trying to filter by 'lawnmower' purchases (so only product ID 1 & 2)
Products table:
| product_type | product_id |
| ------------- | -----------|
| lawnmower | 1 |
| lawnmower | 2 |
| leafblower | 3 |
I'm hoping to write a query that will return all purchases of the 'lawnmower' type where first name, last name, and email are duplicates (so would return the first two rows of the Purchases table).
This is my query so far; however, it isn't returning accurate data (e.g. I know I have around 350 duplicates and it's returning 10,000 rows):
SELECT t.*
FROM database_name.purchases t
JOIN (
SELECT firstname, surname, email, count( * ) AS NumDuplicates
FROM database_name.purchases
GROUP BY firstname, surname, email
HAVING NumDuplicates >1
)tsum ON t.firstname = tsum.firstname
AND t.surname = tsum.surname
AND t.email = tsum.email
INNER JOIN database_name.products p2 ON t.product_id = p2.product_id
WHERE p2.product_type = 'lawnmower'
Just wanting to know what I need to tweak in my query syntax.
You know that you should be returning Sally Smith. Create a table from the results of your query above, then run SELECT * from that table WHERE firstname = 'Sally' AND surname = 'Smith'. See if you can figure out where you are going wrong based on that. This will help you debug this type of issue yourself in the future.
Your inner SELECT does not filter on the product type. It gets all customers who have made at least two purchases of any product type. You then join it to purchases, so you also get the lawnmower purchases of customers who bought two or more items but possibly only one lawnmower. Add a filter on the product type in the subquery too:
SELECT t.*
FROM database_name.purchases t
INNER JOIN (SELECT purchases.user_id
FROM database_name.purchases
INNER JOIN database_name.products
ON products.product_id = purchases.product_id
WHERE products.product_type = 'lawnmower'
GROUP BY purchases.user_id
HAVING count(*) > 1) s
ON t.user_id = s.user_id
INNER JOIN database_name.products p
ON t.product_id = p.product_id
WHERE p.product_type = 'lawnmower';
Your schema is also problematic: it's denormalised. firstname, surname and email depend on user_id (note that I only grouped and joined on user_id; that's enough), so they shouldn't be in purchases; only user_id should. product_type would also be better as an ID referencing a product type table.
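A minimal sketch that runs the corrected query on the question's sample rows, using Python's sqlite3 as a stand-in for MySQL. The database_name. prefix is dropped because SQLite has no such schema here, and lawnmower_duplicates is a hypothetical helper name.

```python
import sqlite3

# The question's Purchases and Products sample rows.
SETUP = """
CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, product_id INTEGER,
                        user_id INTEGER, firstname TEXT, surname TEXT, email TEXT);
CREATE TABLE products (product_type TEXT, product_id INTEGER PRIMARY KEY);
INSERT INTO purchases VALUES
  (1, 1, 777, 'Sally', 'Smith', 's#gmail.com'),
  (2, 2, 777, 'Sally', 'Smith', 's#gmail.com'),
  (3, 3, 777, 'Sally', 'Smith', 's#gmail.com'),
  (4, 1, 888, 'Bob', 'Smith', 'b#gmail.com');
INSERT INTO products VALUES ('lawnmower', 1), ('lawnmower', 2), ('leafblower', 3);
"""

# The subquery keeps only users with more than one *lawnmower* purchase;
# the outer filter then keeps only their lawnmower rows.
QUERY = """
SELECT t.*
FROM purchases t
INNER JOIN (SELECT purchases.user_id
            FROM purchases
            INNER JOIN products ON products.product_id = purchases.product_id
            WHERE products.product_type = 'lawnmower'
            GROUP BY purchases.user_id
            HAVING COUNT(*) > 1) s
  ON t.user_id = s.user_id
INNER JOIN products p ON t.product_id = p.product_id
WHERE p.product_type = 'lawnmower'
ORDER BY t.purchase_id
"""


def lawnmower_duplicates():
    """Return the purchase_id of each matching row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(lawnmower_duplicates())
```

This returns Sally's two lawnmower purchases (the first two rows), matching what the question expects.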

MySQL GroupBy with null/zero results

I'm currently writing a ticket system that has three tables
one for users:
users
+----+-----------+----------+
| ID | FirstName | LastName |
+----+-----------+----------+
| 1 | First | User |
| 2 | Second | User |
| 3 | Third | User |
| 4 | Fourth | User |
| 5 | Fifth | User |
+----+-----------+----------+
one for tickets:
ticket
+----+---------------+
| ID | TicketSubject |
+----+---------------+
| 1 | Ticket #1 |
| 2 | Ticket #2 |
| 3 | Ticket #3 |
| 4 | Ticket #4 |
+----+---------------+
and one to assign users to tickets to action (can be more than one user per ticket):
ticket_assigned
+----+----------+--------+
| ID | TicketID | UserID |
+----+----------+--------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 5 |
| 5 | 3 | 3 |
+----+----------+--------+
I'm trying to create a summary to show each user, and how many tickets they have assigned to them, example:
+------------+-------+
| Name | Count |
+------------+-------+
| First | 2 |
| Second | 1 |
| Third | 1 |
| Fourth | 0 |
| Fifth | 1 |
| Unassigned | 2 |
+------------+-------+
Note that the last entry, "Unassigned", is the number of records in the ticket table that DON'T appear in the ticket_assigned table (i.e. unassigned tickets). Also note that user "Fourth" has zero, because that user has no records in the ticket_assigned table.
Here is the current MySQL query I am using:
SELECT
CASE
WHEN users.FirstName IS NULL
THEN 'Unassigned'
ELSE users.FirstName
END as 'UserName',
COUNT(*) as 'TicketCount'
FROM tickets
LEFT OUTER JOIN ticket_assigned ON tickets.ticket_id = ticket_assigned.ticket_id
LEFT OUTER JOIN users ON ticket_assigned.user_id = users.user_id
GROUP BY ticket_assigned.user_id
ORDER BY UserName;
Problem with this is that it's not showing any of the users that don't feature in the ticket_assigned table, I'm essentially getting this:
+------------+-------+
| Name | Count |
+------------+-------+
| First | 2 |
| Second | 1 |
| Third | 1 |
| Fifth | 1 |
| Unassigned | 2 |
+------------+-------+
Is anyone able to assist and tell me how I can modify my query to include users that have no records in the ticket_assigned table? Thanks in advance!
Use a LEFT JOIN with a subquery to aggregate tickets:
SELECT t1.FirstName,
COALESCE(t2.ticket_count, 0) AS num_tickets
FROM users t1
LEFT JOIN
(
SELECT UserID, COUNT(*) AS ticket_count
FROM ticket_assigned
GROUP BY UserID
) t2
ON t1.ID = t2.UserID
UNION ALL
SELECT 'Unassigned', COUNT(*)
FROM ticket t
WHERE NOT EXISTS (SELECT 1 FROM ticket_assigned ta
WHERE ta.TicketID = t.ID)
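Here is a sketch of the LEFT JOIN plus UNION ALL approach, run with Python's sqlite3 as a stand-in for MySQL and using the table and column names from the question's schema (ticket, ticket_assigned, TicketID, UserID). With the sample rows as given, only ticket 4 is missing from ticket_assigned, so the Unassigned count comes out as 1 rather than the 2 shown in the question's example. The helper name ticket_summary is hypothetical, and the rows are collected into a dict because UNION ALL without ORDER BY guarantees no row order.

```python
import sqlite3

# The question's sample rows.
SETUP = """
CREATE TABLE users (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE ticket (ID INTEGER PRIMARY KEY, TicketSubject TEXT);
CREATE TABLE ticket_assigned (ID INTEGER PRIMARY KEY, TicketID INTEGER, UserID INTEGER);
INSERT INTO users VALUES (1, 'First', 'User'), (2, 'Second', 'User'), (3, 'Third', 'User'),
                         (4, 'Fourth', 'User'), (5, 'Fifth', 'User');
INSERT INTO ticket VALUES (1, 'Ticket #1'), (2, 'Ticket #2'), (3, 'Ticket #3'), (4, 'Ticket #4');
INSERT INTO ticket_assigned VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 5), (5, 3, 3);
"""

# Per-user counts (0 for users with no assignments), plus one extra row
# counting tickets that have no entry in ticket_assigned at all.
QUERY = """
SELECT t1.FirstName, COALESCE(t2.ticket_count, 0) AS num_tickets
FROM users t1
LEFT JOIN (SELECT UserID, COUNT(*) AS ticket_count
           FROM ticket_assigned
           GROUP BY UserID) t2
  ON t1.ID = t2.UserID
UNION ALL
SELECT 'Unassigned', COUNT(*)
FROM ticket t
WHERE NOT EXISTS (SELECT 1 FROM ticket_assigned ta WHERE ta.TicketID = t.ID)
"""


def ticket_summary():
    """Return {name: ticket count}, including the 'Unassigned' row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(ticket_summary())
```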
In MySQL, I think you need a left join and union all:
select u.id, u.firstname, count(ta.userId) as num_tickets
from users u left join
ticket_assigned ta
on ta.userId = u.id
group by u.id, u.firstname
union all
select NULL, 'Unassigned', count(*)
from ticket t
where not exists (select 1
from ticket_assigned ta
where ta.ticketId = t.id
);
I included the u.id in the aggregations. I'm uncomfortable just aggregating (and reporting) by first name, because different people frequently have the same first name, even in a relatively small group.
SELECT
u2.Firstname, IFNULL(tmp.count, 0) AS count
FROM users u2
LEFT JOIN (
SELECT u.id, u.Firstname, COUNT(1) as count
FROM ticket_assigned ta
LEFT JOIN ticket t ON t.id = ta.ticketID
LEFT JOIN users u ON u.id = ta.userID
GROUP BY u.id
) tmp ON tmp.id = u2.id
UNION
SELECT
'Unassigned', count(1) AS count
FROM ticket
WHERE id NOT IN (SELECT ticketid FROM ticket_assigned)

Select users that have more received photos than sent photos

I'm struggling with this SQL query. Say I have these two tables
**USERS**
+----+-------+
| id | name |
+----+-------+
| 1 | james |
| 2 | tom |
| 3 | kate |
+----+-------+
**PHOTOS**
+-----------+-----------+---------+
| name | sent_from | sent_to |
+-----------+-----------+---------+
| beach.jpg | 1 | 2 |
| trees.jpg | 3 | 1 |
| earth.jpg | 2 | 1 |
+-----------+-----------+---------+
How could I get, using one SQL query, all the users whose id appears more often in sent_to than in sent_from (i.e. who received more photos than they sent)?
I think of this as aggregating the data twice and then doing the comparison:
select st.sent_to
from (select sent_to, count(*) as numrecv
from photos
group by sent_to
) st left outer join
(select sent_from, count(*) as numsent
from photos
group by sent_from
) sf
on st.sent_to = sf.sent_from
where st.numrecv > coalesce(sf.numsent, 0);
If you want user information, then join that in.
An alternative way restructures the data first and then does the aggregation:
select who
from (select sent_from as who, 1 as sent_from, 0 as sent_to
from photos
union all
select sent_to as who, 0, 1
from photos
) p
group by who
having sum(sent_to) > sum(sent_from);
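Both approaches can be checked on the question's three photos with a short sketch using Python's sqlite3 as a stand-in for MySQL. Note the comparison direction: the question asks for users who received (sent_to) more than they sent (sent_from). The users table isn't needed for the ids, and more_received is a hypothetical helper name.

```python
import sqlite3

# The question's PHOTOS sample rows.
SETUP = """
CREATE TABLE photos (name TEXT, sent_from INTEGER, sent_to INTEGER);
INSERT INTO photos VALUES ('beach.jpg', 1, 2), ('trees.jpg', 3, 1), ('earth.jpg', 2, 1);
"""

# Aggregate twice, starting from the receivers so that users who never
# sent anything still qualify (COALESCE turns their missing count into 0).
JOIN_VERSION = """
SELECT st.sent_to
FROM (SELECT sent_to, COUNT(*) AS numrecv FROM photos GROUP BY sent_to) st
LEFT OUTER JOIN (SELECT sent_from, COUNT(*) AS numsent FROM photos GROUP BY sent_from) sf
  ON st.sent_to = sf.sent_from
WHERE st.numrecv > COALESCE(sf.numsent, 0)
ORDER BY st.sent_to
"""

# Restructure first: one row per (user, direction), then aggregate once.
UNION_VERSION = """
SELECT who
FROM (SELECT sent_from AS who, 1 AS sent_from, 0 AS sent_to FROM photos
      UNION ALL
      SELECT sent_to AS who, 0, 1 FROM photos) p
GROUP BY who
HAVING SUM(sent_to) > SUM(sent_from)
ORDER BY who
"""


def more_received(query):
    """Return the ids of users who received more photos than they sent."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(more_received(JOIN_VERSION), more_received(UNION_VERSION))
```

Both return only user 1 (james), who received two photos and sent one.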
Something like this might help:
SELECT * FROM (
SELECT `id`, `name`,
IFNULL((SELECT count(*) FROM `photos` WHERE `sent_from` = `users`.`id`),0) AS `sent_from_count`,
IFNULL((SELECT count(*) FROM `photos` WHERE `sent_to` = `users`.`id`),0) AS `sent_to_count`
FROM `users`) AS `t1`
WHERE `t1`.`sent_to_count` > `t1`.`sent_from_count`

Add column from another table, but not affect count()

I already have a query with multiple JOINs that returns a simple list of reservations:
SELECT reservations.reservation_id, customers.customer_id, customers.name, count(ordered_services.reservation_id) AS num_of_ordered_services
FROM reservations
JOIN customers ON reservations.customer_id = customers.customer_id
LEFT JOIN ordered_services ON reservations.reservation_id = ordered_services.reservation_id
GROUP BY reservations.reservation_id, customers.customer_id, customers.name
ORDER BY reservations.reservation_id
which outputs something like
reservation_id | customer_id | name | num_of_ordered_services
1 | 1909091202 | John | 2
2 | 2512541508 | Jane | 3
I would like to add another column with information about payments, but a simple JOIN or LEFT JOIN on payments interferes with the existing count() column, because each payment row multiplies the ordered_services rows. On its own, the payment part looks like this:
SELECT reservations.reservation_id, count(payments.reservation_id) AS num_of_payments
FROM reservations
LEFT JOIN payments ON reservations.reservation_id = payments.reservation_id
GROUP BY reservations.reservation_id
ORDER BY reservations.reservation_id
reservation_id | num_of_payments
1 | 0
2 | 2
but I want both in a single result. How can I achieve this?
PS: num_of_payments isn't strictly necessary; I only need to know whether a payment for a given reservation exists (1 or 0).
Thank you
Table structure (nothing special):
reservations
reservation_id | customer_id | added
1 | 1909091202 | 2011-11-04 02:37:28
2 | 2512541508 | 2011-11-04 14:27:01
customers
customer_id | name | personal information columns ...
1909091202 | John | | |
2512541508 | Jane | | |
... | ... | | |
payments
payment_id | reservation_id | customer_id | total | added
1 | 2 | 1909091202 | 199 | 2011-11-04 02:37:28
2 | 2 | 2512541508 | 50 | 2011-11-04 14:27:01
You could use a subselect for the additional field.
SELECT reservations.reservation_id, customers.customer_id, customers.name,
count(ordered_services.reservation_id) AS num_of_ordered_services,
(SELECT count(*) FROM payments WHERE reservations.reservation_id = payments.reservation_id) AS num_of_payments
FROM reservations
JOIN customers ON reservations.customer_id = customers.customer_id
LEFT JOIN ordered_services ON reservations.reservation_id = ordered_services.reservation_id
GROUP BY reservations.reservation_id, customers.customer_id, customers.name
ORDER BY reservations.reservation_id
Something like the following should work:
select
reservation.reservation_id,
(case when exists (select * from payments p1 where p1.reservation_id = reservation.reservation_id) then 1 else 0 end) as one_or_many_payments_made
from reservations reservation
GROUP BY reservation.reservation_id
ORDER BY reservation.reservation_id
But without your data, there is some guesswork here.
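To make the difference concrete, here is a sketch using Python's sqlite3 as a stand-in for MySQL. The reservations, customers and payments rows come from the question. The question doesn't show ordered_services, so a hypothetical ordered_services (service_id, reservation_id) table is assumed, with two rows for reservation 1 and three for reservation 2, chosen only to reproduce the counts in the question's sample output. It compares a plain double LEFT JOIN against the subselect plus EXISTS approach from the answers above; run is a hypothetical helper name.

```python
import sqlite3

SETUP = """
CREATE TABLE reservations (reservation_id INTEGER PRIMARY KEY, customer_id INTEGER, added TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, reservation_id INTEGER,
                       customer_id INTEGER, total INTEGER, added TEXT);
-- Hypothetical table and rows: the question does not show ordered_services.
CREATE TABLE ordered_services (service_id INTEGER PRIMARY KEY, reservation_id INTEGER);
INSERT INTO reservations VALUES (1, 1909091202, '2011-11-04 02:37:28'),
                                (2, 2512541508, '2011-11-04 14:27:01');
INSERT INTO customers VALUES (1909091202, 'John'), (2512541508, 'Jane');
INSERT INTO payments VALUES (1, 2, 1909091202, 199, '2011-11-04 02:37:28'),
                            (2, 2, 2512541508, 50, '2011-11-04 14:27:01');
INSERT INTO ordered_services (reservation_id) VALUES (1), (1), (2), (2), (2);
"""

# Joining both child tables multiplies rows: reservation 2 gets 3 x 2 = 6.
NAIVE = """
SELECT reservations.reservation_id,
       COUNT(ordered_services.reservation_id) AS num_of_ordered_services,
       COUNT(payments.reservation_id) AS num_of_payments
FROM reservations
LEFT JOIN ordered_services ON reservations.reservation_id = ordered_services.reservation_id
LEFT JOIN payments ON reservations.reservation_id = payments.reservation_id
GROUP BY reservations.reservation_id
ORDER BY reservations.reservation_id
"""

# Payments are looked up in correlated subqueries, so the join rows stay untouched.
COMBINED = """
SELECT reservations.reservation_id, customers.customer_id, customers.name,
       COUNT(ordered_services.reservation_id) AS num_of_ordered_services,
       (SELECT COUNT(*) FROM payments
         WHERE payments.reservation_id = reservations.reservation_id) AS num_of_payments,
       CASE WHEN EXISTS (SELECT 1 FROM payments
                          WHERE payments.reservation_id = reservations.reservation_id)
            THEN 1 ELSE 0 END AS has_payment
FROM reservations
JOIN customers ON reservations.customer_id = customers.customer_id
LEFT JOIN ordered_services ON reservations.reservation_id = ordered_services.reservation_id
GROUP BY reservations.reservation_id, customers.customer_id, customers.name
ORDER BY reservations.reservation_id
"""


def run(query):
    """Execute query against the sample data and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(NAIVE))
    print(run(COMBINED))
```

With the naive version, reservation 2 reports 6 ordered services and 6 payments; the subselect version keeps 3 services and reports 2 payments, with has_payment giving the 1/0 flag the question asks for.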