MySQL: WHERE COUNT(*) = 0 - mysql

I am trying to get all the customer_ids for which no rows have been found, for example:
SELECT customer_id FROM transaction WHERE count(*) = '0'
I have tried this as well:
SELECT customer_id, count(*) as total_rows FROM transaction WHERE total_rows='0'
But I get the error that total_rows is not a column.

The easiest way to do this is to think about it in a bit of a different way: "how do I get a list of all customers who have no transaction history?"
Simple! You get a list of all of the customers, join it against their transactions and filter out any customers who have a non-empty list of transactions. Or, in SQL:
SELECT
customer.customer_id
FROM customer
LEFT JOIN transaction
ON transaction.customer_id = customer.customer_id
WHERE
transaction.transaction_id IS NULL
Note that you cannot simply use the transaction table as you're attempting to. It is not a complete list of customer_ids; it contains only the IDs of customers who have at least one transaction.
Instead of operating on transaction and finding customers with no transactions (which you literally cannot do), you must find all customers and then filter by those who have no transactions. Similar concept, just opposite order.
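To see the idea in action, here is a minimal sketch that runs the LEFT JOIN ... IS NULL query against a throwaway database. It uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the table name is quoted only because SQLite treats transaction as a keyword. A NOT EXISTS variant is included for comparison; it should return the same customers.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE "transaction" (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER
    );
    INSERT INTO customer (customer_id) VALUES (1), (2), (3);
    -- Customers 1 and 3 have transactions; customer 2 has none.
    INSERT INTO "transaction" (transaction_id, customer_id)
    VALUES (10, 1), (11, 1), (12, 3);
""")

# Anti-join: keep only customers for whom the LEFT JOIN found no matching row.
left_join_sql = """
    SELECT c.customer_id
    FROM customer AS c
    LEFT JOIN "transaction" AS t ON t.customer_id = c.customer_id
    WHERE t.transaction_id IS NULL
    ORDER BY c.customer_id
"""

# The same question phrased with NOT EXISTS.
not_exists_sql = """
    SELECT c.customer_id
    FROM customer AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM "transaction" AS t WHERE t.customer_id = c.customer_id
    )
    ORDER BY c.customer_id
"""

via_left_join = [row[0] for row in conn.execute(left_join_sql)]
via_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
print(via_left_join, via_not_exists)
```

Both lists contain only customer 2, the one customer with no rows in the transaction table.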

Related

MySQL GROUP BY and INNER JOIN - grouping by the joined fields

Considering the following query:
SELECT COUNT(table1.someField), COUNT(table2.someField)
FROM table1
INNER JOIN table2 ON table2.id = table1.id
GROUP BY table1.id
I am trying to understand what the difference is (if any) between grouping by table1.id and grouping by table2.id. In short, when inner joining two tables on X=Y, what is the difference between grouping by X and grouping by Y? That's it.
The real world example - pretty straightforward: a table transaction holds transactions information (paid amount, dates etc), and a table transaction_product holds information regarding which products were included in which transaction.
So for example, transaction number 1 could have included products number 1, 2 and 3, and so forth (so the table relation is obviously one-to-many).
The problem: I need to know for each transaction, how much was paid for how many products. This is the query, including both GROUP BY alternatives:
SELECT
`transaction`.id,
SUM(`transaction`.transaction_amount) AS total_amount,
COUNT(`transaction_product`.product_id) AS number_of_products
FROM `transaction`
INNER JOIN `transaction_product` ON `transaction_product`.transaction_id = `transaction`.id
GROUP BY [`transaction`.id [OR] `transaction_product`.transaction_id]
I need to know if there is a difference between the two GROUP BY alternatives. I couldn't find relevant information regarding the GROUP BY behavior in this case in the documentation, therefore any help on clarifying the matter would be much appreciated.
The result of the inner join will be a set of rows with matching transaction IDs, so the set of values that column can have will be the same on both transaction and transaction_product tables.
The group by will return a single row for each available value of the grouped column(s), and all the rows that share the same value will be aggregated with the aggregation function you use.
Result: there won't be any difference between the two options you have, because the same rows will be grouped with the exact same criteria, being the set of values the same on both sides.
TL;DR
There is no difference at all.
There is no difference whatsoever which id you choose to include in your GROUP BY clause. The total number of rows for each transaction id will be the number of products for that transaction. This query should get what you need:
SELECT
`transaction`.id,
SUM(`transaction`.transaction_amount) AS total_amount,
COUNT(1) AS number_of_products
FROM `transaction`
INNER JOIN `transaction_product` ON `transaction_product`.transaction_id = `transaction`.id
GROUP BY `transaction`.id
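A quick way to convince yourself is to run both GROUP BY variants on a small data set. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL; the sample rows and the per_transaction helper are made up for illustration. It also shows one thing to watch for: assuming transaction_amount is stored once per transaction, SUM adds it once per joined product row, so MAX (or a pre-aggregated subquery) may be closer to what is wanted.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, transaction_amount INTEGER);
    CREATE TABLE transaction_product (transaction_id INTEGER, product_id INTEGER);
    INSERT INTO "transaction" (id, transaction_amount) VALUES (1, 30), (2, 10);
    -- Transaction 1 has three products, transaction 2 has one.
    INSERT INTO transaction_product (transaction_id, product_id)
    VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

def per_transaction(group_col):
    """Hypothetical helper: run the query grouped by the given join column."""
    sql = f"""
        SELECT {group_col}, SUM(t.transaction_amount), COUNT(tp.product_id)
        FROM "transaction" AS t
        INNER JOIN transaction_product AS tp ON tp.transaction_id = t.id
        GROUP BY {group_col}
        ORDER BY {group_col}
    """
    return conn.execute(sql).fetchall()

by_transaction_id = per_transaction("t.id")
by_product_side = per_transaction("tp.transaction_id")

# Same query with MAX instead of SUM: the amount is not multiplied by the
# number of joined product rows.
with_max = conn.execute("""
    SELECT t.id, MAX(t.transaction_amount), COUNT(tp.product_id)
    FROM "transaction" AS t
    INNER JOIN transaction_product AS tp ON tp.transaction_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(by_transaction_id)
print(by_product_side)
print(with_max)
```

The two grouped results are identical. For transaction 1 the SUM comes out as 90 (30 counted three times), while the MAX variant reports 30.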

Using SQL to calculate average number of customers

Please see above for the data structure. I am trying to write an SQL query to get the average number of customers for each session
My attempt:
select avg(A.NumberCustomer)
from(
select SessionName, count(distinct customers.Idcustomer) as NumberCustomer,
from customers, enrollments, sessions
where customers.Idcustomer=enrollments.Idcustomer and enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
But I seem to get an error on the from customers, enrollments, sessions line
Not sure about this, any help appreciated.
Thanks
You have an extra comma that you should delete:
select avg(A.NumberCustomer)
from(
select SessionName,
count(distinct customers.Idcustomer) as NumberCustomer, #<--- here
from customers, enrollments, sessions
where customers.Idcustomer=enrollments.Idcustomer
and enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
By the way, I suggest moving to the explicit JOIN syntax (introduced in SQL-92) for readability:
SELECT
avg(A.NumberCustomer)
FROM (
select
SessionName,
count(distinct customers.Idcustomer) as NumberCustomer
from customers
inner join enrollments
on customers.Idcustomer=enrollments.Idcustomer
inner join sessions
on enrollments.Idsession=sessions.Idsession
group by sessions.SessionName
) A
Also, nice diagram in the question, and remember to include your error message next time.
For the average number of customers in each session, you should be able to use just the enrollments table. The average would be the number of enrollments divided by the number of sessions:
select count(*) / count(distinct idSession)
from enrollments e;
This makes the following assumptions:
All sessions have at least one customer (your original query had this assumption as well).
No customer signs up multiple times for the same session.
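As a rough check that the shortcut matches the nested query under those two assumptions, here is a small sketch using Python's sqlite3 module in place of MySQL, with made-up enrollment rows. One difference to be aware of: SQLite performs integer division on integers, so the sketch multiplies by 1.0, whereas MySQL's / operator already returns a decimal result.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (Idcustomer INTEGER, Idsession INTEGER);
    -- Session 1 has three customers, session 2 has one.
    INSERT INTO enrollments (Idcustomer, Idsession)
    VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

# Shortcut: enrollments divided by sessions (1.0 * forces float division in SQLite).
ratio = conn.execute(
    "SELECT 1.0 * COUNT(*) / COUNT(DISTINCT Idsession) FROM enrollments"
).fetchone()[0]

# Nested form: count customers per session, then average those counts.
nested = conn.execute("""
    SELECT AVG(A.NumberCustomer)
    FROM (
        SELECT Idsession, COUNT(DISTINCT Idcustomer) AS NumberCustomer
        FROM enrollments
        GROUP BY Idsession
    ) AS A
""").fetchone()[0]

print(ratio, nested)
```

With four enrollments over two sessions, both queries give an average of 2.0 customers per session.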

Selecting from two tables with a condition that affects only one table

I have two tables, clients and transactions, and I need a query that selects all the clients together with the total of their transactions.
My problem is that when I query both tables with the condition that a transaction's client ID must match the client, the result contains only the clients that have a record in the transactions table. I want it to display all the clients, even those without any transactions (showing zero instead of the sum of transactions).
I understand that, because the condition involves the transactions table, the query drops the rows of the clients table that have no match. How can I select all the clients with the sum of their transactions, or zero if they don't have any?
This is a short view of the tables (only the columns I used in the query):
ID Name Company Phone //clients table
ID Client_id Incoming ... //transaction table
Thank you in advance, and sorry for my bad English.
In addition, you can also do this with a correlated subquery:
SELECT c.*,
(select sum(t.incoming) - sum(t.outgoing)
from transactions t
where t.client_id = c.id
) as total
from clients c;
Under some circumstances, this could have better performance.
SELECT c.Name, count(t.ID)
FROM clients c
left join transactions t on c.ID = t.Client_id
group by c.ID, c.Name
You could use a left join, something like:
SELECT *
FROM clients
LEFT JOIN transactions ON clients.ID = transactions.Client_id
You would get all clients. For clients without transactions, the columns from transactions would be NULL, so you'll have to change that to 0 (for example with COALESCE or IFNULL).
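Putting the left join, the grouping and the NULL-to-zero step together could look like the sketch below. It runs on Python's sqlite3 module as a stand-in for MySQL, the sample rows are invented, and it sums only the Incoming column because that is the only amount column shown in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE transactions (ID INTEGER PRIMARY KEY, Client_id INTEGER, Incoming INTEGER);
    INSERT INTO clients (ID, Name) VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann has two transactions; Bob has none.
    INSERT INTO transactions (ID, Client_id, Incoming) VALUES (1, 1, 100), (2, 1, 50);
""")

# LEFT JOIN keeps every client; SUM over no rows is NULL, which COALESCE turns into 0.
totals = conn.execute("""
    SELECT c.Name, COALESCE(SUM(t.Incoming), 0) AS total
    FROM clients AS c
    LEFT JOIN transactions AS t ON t.Client_id = c.ID
    GROUP BY c.ID, c.Name
    ORDER BY c.ID
""").fetchall()

print(totals)
```

Grouping by the clients columns rather than by t.Client_id matters here: clients without transactions all have NULL in t.Client_id and would otherwise be lumped into a single group.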

Joining two tables, including count, and sorting by count in MySQL

I need to run a somewhat more complex MySQL query. I have two tables that I need to join, where one contains the primary key of the other. That's easy enough, but then I need to find the number of occurrences of each ID returned as well, and ultimately sort all the results by this number.
Normally this would just be a group by, but I also need to see ALL of the results (so if it were a group by containing 10 records, I'd need to see all 10, as well as that count returned as well).
So for instance, two tables could be:
Customers table:
CustomerID name address phone etc..
Orders table:
OrderID CustomerID product info etc..
The idea is to output, and sort the orders table to find the customer with the most orders in a given time period. The resultant report would have a few hundred customers, along with their order info below.
I couldn't figure out a way to have it return the rows containing ALL the info from both tables, plus the number of occurrences of each in one row (customer info, individual orders info, and count).
I considered separating it into multiple queries (get the list of top customers), then a bunch of sub-queries for each order programmatically. But that was going to end up with many hundreds of sub-queries every time this is submitted.
So I was hoping someone might know of an easier way to do this. My thought was to have a return result with repeated information, but get it only in one query.
Thanks in advance!
SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, ORDR.OrderDate, ORDR.ProductInfo, COUNTS.cnt
FROM Customers CUST
INNER JOIN Orders ORDR
ON ORDR.CustomerID = CUST.CustomerID
INNER JOIN
(
SELECT C.CustomerID, COUNT(DISTINCT O.OrderID) AS cnt
FROM Customers C
INNER JOIN Orders O
ON O.CustomerID = C.CustomerID
GROUP BY C.CustomerID
) COUNTS
ON COUNTS.CustomerID = CUST.CustomerID
ORDER BY COUNTS.cnt DESC, CUST.CustomerID
This will return one row per order, displayed by customer, ordered by the number of orders for that customer.
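Here is a small sketch of that pattern, run with Python's sqlite3 module in place of MySQL on made-up data. The tables are trimmed to the columns needed, the counting subquery reads only from Orders, and OrderID is added to the ORDER BY purely to make the output order deterministic; those are choices of this sketch, not part of the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers (CustomerID, Name) VALUES (1, 'A'), (2, 'B');
    -- Customer 1 has one order, customer 2 has two.
    INSERT INTO Orders (OrderID, CustomerID) VALUES (100, 1), (101, 2), (102, 2);
""")

# One row per order, with the customer's total order count repeated on each row.
rows = conn.execute("""
    SELECT CUST.CustomerID, CUST.Name, ORDR.OrderID, COUNTS.cnt
    FROM Customers AS CUST
    INNER JOIN Orders AS ORDR ON ORDR.CustomerID = CUST.CustomerID
    INNER JOIN (
        SELECT O.CustomerID, COUNT(*) AS cnt
        FROM Orders AS O
        GROUP BY O.CustomerID
    ) AS COUNTS ON COUNTS.CustomerID = CUST.CustomerID
    ORDER BY COUNTS.cnt DESC, CUST.CustomerID, ORDR.OrderID
""").fetchall()

for row in rows:
    print(row)
```

Customer 2's two orders come first, each carrying a count of 2, followed by customer 1's single order.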

Best way to structure SQL queries with many inner joins?

I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: How can I optimize this query? I was under the impression that, even though the way I currently query the database creates small temporary tables (inner SELECT statements), it would still be advantageous to performing an inner join on the unaltered tables as they could have about 10,000 - 100,000 entries (not millions). However, I was told that this is not the best way to go about it but did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
Using derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not recommended. Let the DBMS work out by itself which values it needs from
INNER JOIN placements AS p
instead of telling it (again) by effectively forcing it to create a view on the table with only those two columns. (And a plain table name in the join is also much more readable.)
With SQL you mainly say what you want to see, not how this is going to be achieved. (Well, of course this is just a rule of thumb.) So if no other columns except Ad_id and List_id are used from table placements, the dbms will find its best way to handle this. Don't try to make it use your way.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). This does exactly the same, but with DISTINCT you tell the dbms "make your subquery's rows distinct before looking for col". But why would you want to force it to do so? Why not have it use just the method the dbms finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you wouldn't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
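To make the difference concrete, the sketch below runs the three formulations on a tiny made-up data set, using Python's sqlite3 module as a stand-in for MySQL. The tables are reduced to the columns the queries touch, and the join is shown without DISTINCT so the duplicate it produces is visible.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (clientid INTEGER PRIMARY KEY);
    CREATE TABLE purchases (clientid INTEGER, product TEXT);
    INSERT INTO clients (clientid) VALUES (1), (2), (3);
    -- Client 1 bought X twice, client 2 bought only Y, client 3 bought X once.
    INSERT INTO purchases (clientid, product)
    VALUES (1, 'X'), (1, 'X'), (2, 'Y'), (3, 'X');
""")

def ids(sql):
    """Hypothetical helper: return the first column of each result row."""
    return [row[0] for row in conn.execute(sql)]

with_exists = ids("""
    SELECT c.clientid FROM clients AS c
    WHERE EXISTS (SELECT * FROM purchases AS p
                  WHERE p.clientid = c.clientid AND p.product = 'X')
    ORDER BY c.clientid
""")
with_in = ids("""
    SELECT clientid FROM clients
    WHERE clientid IN (SELECT clientid FROM purchases WHERE product = 'X')
    ORDER BY clientid
""")
# Plain join: one row per matching purchase, so client 1 appears twice.
with_join = ids("""
    SELECT c.clientid FROM clients AS c
    JOIN purchases AS p ON p.clientid = c.clientid AND p.product = 'X'
    ORDER BY c.clientid
""")

print(with_exists, with_in, with_join)
```

EXISTS and IN each return clients 1 and 3 once. The join returns client 1 twice, which is the duplication that DISTINCT would then have to clean up.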
Yes, it is very easy to just join all tables needed and then just list the columns to select and then just put DISTINCT in front. But it makes the query kind of blurry, because you don't write the query as you would word the task. And it can make things difficult when it comes to aggregations. The following query is wrong, because you multiply money earned with the number of money-spent records and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again: You would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in
select
(select sum(value) from money_spent where money_spent.userid = user.userid),
(select sum(value) from money_earned where money_earned.userid = user.userid)
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
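The effect is easy to reproduce on a few rows. The following sketch uses Python's sqlite3 module as a stand-in for MySQL, with invented data and a table named users (a name chosen for this sketch) in place of user. It compares the plain double join, the SUM(DISTINCT ...) variant and the version with pre-aggregated derived tables.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO users (userid) VALUES (1);
    -- True totals for user 1: spent 20 (two equal amounts), earned 21.
    INSERT INTO money_spent (userid, value) VALUES (1, 10), (1, 10);
    INSERT INTO money_earned (userid, value) VALUES (1, 5), (1, 7), (1, 9);
""")

# Joining both detail tables pairs every spent row with every earned row (2 x 3 rows).
joined = """
    FROM users AS u
    JOIN money_spent AS s ON s.userid = u.userid
    JOIN money_earned AS e ON e.userid = u.userid
"""
naive = conn.execute("SELECT SUM(s.value), SUM(e.value)" + joined).fetchone()
distinct = conn.execute(
    "SELECT SUM(DISTINCT s.value), SUM(DISTINCT e.value)" + joined
).fetchone()

# Aggregate each table on its own first, then join the one-row-per-user results.
pre_aggregated = conn.execute("""
    SELECT spent.total, earned.total
    FROM users AS u
    JOIN (SELECT userid, SUM(value) AS total FROM money_spent GROUP BY userid) AS spent
        ON spent.userid = u.userid
    JOIN (SELECT userid, SUM(value) AS total FROM money_earned GROUP BY userid) AS earned
        ON earned.userid = u.userid
""").fetchone()

print(naive, distinct, pre_aggregated)
```

The plain join inflates both sums to (60, 42), the DISTINCT variant collapses the two equal spendings into one and yields (10, 21), and only the pre-aggregated version returns the true totals (20, 21).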