SQL is it possible to combine group, count and distinct? - mysql

I manage a registration system, where people can register for a course, and I have the following query to calculate some statistics:
SELECT p.id_country AS id, c.name, COUNT(p.id_country) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3,4,13,14)
GROUP BY p.id_country
ORDER BY total DESC
This query works fine; it shows me exactly the number of participants per country.
It is now possible in our system to register for multiple courses, and for every registration a new row is inserted in the participants table. I know it's not an ideal situation, but unfortunately it's too late to change this right now. If a participant registers for a second (or a third, fourth, etc.) course, they use the same email address. So the same email address can appear multiple times in the participants table.
What I would like to do is change this query so that every email address is counted only once. The field is just p.email, and I think I should do something with DISTINCT to make this happen. But whatever I try, it gives me either very weird results or an error.
Is it possible to do this?

Try not to mix distinct and group by in queries. You get the same result doing:
select distinct p.id_country from participants
as doing
select p.id_country from participants group by p.id_country
What you need is to filter out duplicates:
SELECT p.id_country AS id, c.name, COUNT(p.id_country) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3,4,13,14)
AND NOT EXISTS
(SELECT p2.email FROM participants p2 WHERE p.email = p2.email AND p.id > p2.id)
GROUP BY p.id_country
ORDER BY total DESC
This counts each email only once, by skipping the rows with newer ids for accounts that share an email address.

How about adding a UNIQUE constraint on the table?
ALTER TABLE participants ADD CONSTRAINT part_uq UNIQUE (email)

SELECT
p.id_country AS id,
c.name,
COUNT(p.id_country) AS total
FROM
(SELECT email, MAX(id_country) AS id_country FROM participants WHERE id_status NOT IN (3,4,13,14) GROUP BY email) p
LEFT JOIN countries c ON p.id_country = c.id
GROUP BY
p.id_country
ORDER BY
total DESC
I am using MAX(id_country) for the case where one email address is associated with more than one country. If this cannot happen by design, you can move id_country to the GROUP BY clause.
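Another option, not given in the answers above, is to put DISTINCT inside the aggregate itself: COUNT(DISTINCT p.email). The following is only a minimal sketch. It uses Python's sqlite3 module so that it can be run without a MySQL server, and the sample rows and the example.com addresses are invented for illustration. It assumes that one email address always belongs to a single country.

```python
import sqlite3

# Hypothetical miniature of the participants/countries tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE participants (
    id INTEGER PRIMARY KEY, email TEXT, id_country INTEGER, id_status INTEGER
);
INSERT INTO countries VALUES (1, 'Belgium'), (2, 'France');
INSERT INTO participants (email, id_country, id_status) VALUES
    ('ann@example.com', 1, 1),
    ('ann@example.com', 1, 1),   -- second course, same person
    ('bob@example.com', 1, 2),
    ('eve@example.com', 2, 1),
    ('eve@example.com', 2, 3);   -- excluded status
""")

# COUNT(DISTINCT p.email) counts each email address once per country group.
rows = conn.execute("""
SELECT p.id_country AS id, c.name, COUNT(DISTINCT p.email) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3, 4, 13, 14)
GROUP BY p.id_country, c.name
ORDER BY total DESC
""").fetchall()
print(rows)
```

If one email address could appear under several countries, it would be counted once in each of them, which is the case the MAX(id_country) approach above is meant to handle.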

Related

How can I get customer data based on the number of users they have?

I want to get customer data from all the businesses with more than 1 user.
For this I think I need a subquery to count more than 1 user and then the outer query to give me their emails.
I have tried subqueries in the WHERE and HAVING clause
SELECT u.mail
FROM users u
WHERE count IN (
SELECT count (u.id_business)
FROM businesses b
INNER JOIN users u ON b.id = u.id_business
GROUP BY b.id, u.id_business
HAVING COUNT (u.id_business) >= 2
)
I believe that you do not need a subquery; everything can be achieved in a joined aggregate query with a HAVING clause. Joining users to itself on id_business yields one row per user in the same business, so a count of 2 or more means the business has more than one user:
SELECT u.mail
FROM users u
INNER JOIN users u2 ON u2.id_business = u.id_business
GROUP BY u.id, u.mail
HAVING COUNT(*) >= 2
NB: in case several users may have the same email, I have added the primary key of users to the GROUP BY clause (I assumed that the pk is called id); you may remove this if mail is a unique field in users.
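The subquery idea from the question can also be made to work if the subquery returns the business ids that have at least two users, rather than the counts. This is a hedged sketch only: it runs on SQLite through Python's sqlite3 module, the table layout is guessed from the question, and the sample rows are invented.

```python
import sqlite3

# Hypothetical sample data: business 1 has two users, business 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE businesses (id INTEGER PRIMARY KEY);
CREATE TABLE users (id INTEGER PRIMARY KEY, mail TEXT, id_business INTEGER);
INSERT INTO businesses VALUES (1), (2);
INSERT INTO users (mail, id_business) VALUES
    ('a@example.com', 1), ('b@example.com', 1), ('c@example.com', 2);
""")

# The subquery lists businesses with two or more users;
# the outer query returns the mail of every user in those businesses.
mails = [row[0] for row in conn.execute("""
SELECT u.mail
FROM users u
WHERE u.id_business IN (
    SELECT id_business
    FROM users
    GROUP BY id_business
    HAVING COUNT(*) >= 2
)
ORDER BY u.mail
""")]
print(mails)
```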

Selecting count(column) from different table

I have three tables: area, vehicle and employee.
ward_no is the foreign key in vehicle and employee.
I want to select the number of vehicles and the number of employees and display them along with the other details of the area.
The query I used is:
select a.* ,count(v.vid) as vehicles,count(e.eid) as employees from area a,vehicle v,employee e where v.ward_no=a.ward_no and e.ward_no=a.ward_no group by a.name;
But the output is not what I want. I get the same value in both count columns instead of the total number of vehicles and employees in that particular area.
I'm new to MySQL.
By default, COUNT counts the non-null values of its argument.
In your case, joining both vehicle and employee to area pairs every vehicle with every employee in the same ward, so each value is counted repeatedly.
Try adding DISTINCT inside the count:
select a.* ,count(DISTINCT v.vid) as vehicles,count(DISTINCT e.eid) as employees
from area a,vehicle v,employee e
where v.ward_no=a.ward_no and e.ward_no=a.ward_no group by a.name;
Also, it's better to use explicit JOIN rather than implicit, like this:
select a.* ,count(DISTINCT v.vid) as vehicles,count(DISTINCT e.eid) as employees
from area a JOIN vehicle v ON v.ward_no=a.ward_no
JOIN employee e ON e.ward_no=a.ward_no
group by a.name;
You are probably getting the same vehicle and employee multiple times because of the joins. Use DISTINCT inside COUNT() to get the number of unique vehicles and employees:
SELECT
a.*,
COUNT(DISTINCT v.vid) AS vehicles,
COUNT(DISTINCT e.eid) AS employees
FROM
`area` a
JOIN vehicle v
ON v.ward_no = a.ward_no
JOIN employee e
ON e.ward_no = a.ward_no
GROUP BY a.name
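To see why both columns show the same number without DISTINCT, here is a small sketch of the row multiplication. It runs on SQLite through Python's sqlite3 module rather than MySQL, and the sample data (one ward with two vehicles and three employees) is made up for illustration.

```python
import sqlite3

# Hypothetical data: one area with 2 vehicles and 3 employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE area (ward_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vehicle (vid INTEGER PRIMARY KEY, ward_no INTEGER);
CREATE TABLE employee (eid INTEGER PRIMARY KEY, ward_no INTEGER);
INSERT INTO area VALUES (1, 'North');
INSERT INTO vehicle (ward_no) VALUES (1), (1);
INSERT INTO employee (ward_no) VALUES (1), (1), (1);
""")

joins = """
FROM area a
JOIN vehicle v ON v.ward_no = a.ward_no
JOIN employee e ON e.ward_no = a.ward_no
GROUP BY a.name
"""

# The two joins produce 2 x 3 = 6 rows, so both plain counts come out as 6.
plain = conn.execute("SELECT COUNT(v.vid), COUNT(e.eid)" + joins).fetchone()

# DISTINCT collapses the repeated ids back to the real totals.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT v.vid), COUNT(DISTINCT e.eid)" + joins
).fetchone()

print(plain, distinct)
```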

Wrong use of inner join function / group function?

I have the following problem with my query.
I have two tables:
Customer
Subscriber
They are linked together by customer.id=subscriber.customer_id.
In the subscriber table, I have records with customer_id=0 (these are email records that do not have a full customer account).
Now I want to show how many customers I have per day, how many subscribers with a customer_id, and how many subscribers with customer_id=0 (I call them emailonlies).
Somehow, I cannot manage to get those emailonlies.
Perhaps it has something to do with not using the right join type.
When I use a left join, I get the right number of customers, but not the right number of emailonlies. When I use an inner join, I get the wrong number of customers. Am I using the group function correctly? I think it has something to do with that.
This is my query:
SELECT DATE(c.date_register),
COUNT(DISTINCT c.id) AS newcustomers,
COUNT(DISTINCT s.customer_id) AS newsubscribedcustomers,
COUNT(DISTINCT s.subscriber_id AND s.customer_id=0) AS emailonlies
FROM customer c
LEFT JOIN subscriber s ON s.customer_id=c.id
GROUP BY DATE(c.date_register)
ORDER BY DATE(c.date_register) DESC
LIMIT 10
;
I'm not entirely sure, but I think in DISTINCT s.subscriber_id AND s.customer_id=0, it runs the AND before the DISTINCT, so the DISTINCT only ever sees true and false.
Why don't you just take
COUNT(DISTINCT s.subscriber_id) - (COUNT(DISTINCT s.customer_id) - 1)?
(The -1 is there because DISTINCT s.customer_id will count 0.)
Got it. The only risk is that I get no emailonlies for a day on which there are no customers, because of the left join. But this one works:
SELECT customers.regdatum,customers.customersqty,subscribers.emailonlies
FROM (
(SELECT DATE(c.date_register) AS regdatum,COUNT(DISTINCT c.id) AS customersqty
FROM customer c
GROUP BY DATE(c.date_register)
) AS customers
LEFT JOIN
(SELECT DATE(s.added) AS voegdatum,COUNT(DISTINCT s.subscriber_id) AS emailonlies
FROM subscriber s
WHERE s.customer_id=0
GROUP BY DATE(s.added)
) AS subscribers
ON customers.regdatum=subscribers.voegdatum
)
ORDER BY customers.regdatum DESC
;
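The earlier remark that DISTINCT only ever sees true and false in DISTINCT s.subscriber_id AND s.customer_id=0 can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module, on the assumption that MySQL evaluates the boolean expression the same way; the subscriber rows are invented. The CASE form shown next to it is one common way to count distinct ids that satisfy a condition, not something taken from the thread.

```python
import sqlite3

# Hypothetical subscriber rows: three email-only records and two real customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriber (subscriber_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO subscriber VALUES (1, 0), (2, 0), (3, 0), (4, 7), (5, 8);
""")

# The AND is evaluated first, so DISTINCT only sees the values 1 and 0.
boolean_count = conn.execute("""
SELECT COUNT(DISTINCT subscriber_id AND customer_id = 0) FROM subscriber
""").fetchone()[0]

# CASE yields the id for email-only rows and NULL otherwise; COUNT skips NULLs.
case_count = conn.execute("""
SELECT COUNT(DISTINCT CASE WHEN customer_id = 0 THEN subscriber_id END)
FROM subscriber
""").fetchone()[0]

print(boolean_count, case_count)
```

Note that in the original query the email-only rows never reach the aggregate at all, because the LEFT JOIN condition s.customer_id=c.id does not match customer_id=0; that is why the working query above reads subscriber separately.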

Table join issue with MySQL

I have a table for referred users (containing email address and date columns) and a table for users.
I run this query to get the top referrers:
SELECT count(r.Email) as count, r.Email
FROM refs r
WHERE r.referredOn > '2011-12-13'
GROUP BY email
ORDER BY count DESC
But I want to join this with the users table so that it displays other data from the users table as well. I thought a join would work. I use a left join because emails may be entered incorrectly; some people put a first name etc. under refs.Email.
SELECT count(r.Email) as count, r.Email, u.*
FROM refs r LEFT JOIN users u ON u.email_primary = r.Email
WHERE r.referredOn > '2011-12-13'
GROUP BY email
ORDER BY count DESC
With the above query the count is incorrect, but I don't know why.
Try this one:
SELECT count(r.Email) as count, r.Email
FROM refs r
INNER JOIN users u ON u.email_primary = r.Email
WHERE r.referredOn > '2011-12-13'
GROUP BY email
ORDER BY count DESC
If you're adding a new column from users u to the SELECT list, you also need to add it to your GROUP BY clause.
Unfortunately, a LEFT JOIN won't help you here. A LEFT JOIN returns every row from refs, filled in with the matching users row where the email matches and with NULLs where it does not. If the email doesn't match, the user data won't come back the way you want.
So you can't use the left join condition here the way you want.
If you enforced that they had to enter an email every time, and that it was a valid email, then you could use an INNER JOIN.
JOINs are usually used to follow referential integrity. For example, if I have a user with an id in one table, and another table with the column userid, there is a strong relationship between the two tables that I can join on.
Jeff Atwood has a good explanation of how joins work.
See if this will help you:
SELECT e.count, e.email, u.col1, u.col2 -- etc
FROM (
SELECT count(r.Email) as count, r.Email
FROM refs r
WHERE r.referredOn > '2011-12-13'
GROUP BY email
) e
INNER JOIN
users u ON u.email_primary = e.Email
Instead of a direct join, you could try using your counting query as a derived table (a subquery in the FROM clause).
I wrote this query
SELECT *, count(r.Email) as count FROM refs r
LEFT OUTER JOIN users u ON r.email = u.email_primary
WHERE u.uid IS NOT NULL
GROUP BY u.uid
ORDER BY count DESC
This showed me that the count was wrong because some of the email addresses are used twice in the users table (users sharing a 'family' email address), which doubled my count. The above query shows each separate user account.
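The doubling described above can be reproduced with a small sketch, again on SQLite through Python's sqlite3 module and with invented rows: two referrals from one address, and two user accounts sharing that address. Counting in a derived table first, as suggested earlier in the thread, keeps the count at 2 for each account.

```python
import sqlite3

# Hypothetical data: one 'family' address used by two user accounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE refs (Email TEXT, referredOn TEXT);
CREATE TABLE users (uid INTEGER PRIMARY KEY, email_primary TEXT);
INSERT INTO refs VALUES
    ('fam@example.com', '2011-12-14'), ('fam@example.com', '2011-12-15');
INSERT INTO users (email_primary) VALUES
    ('fam@example.com'), ('fam@example.com');
""")

# Direct join: each referral matches both accounts, so 2 referrals count as 4.
joined_count = conn.execute("""
SELECT COUNT(r.Email)
FROM refs r
LEFT JOIN users u ON u.email_primary = r.Email
WHERE r.referredOn > '2011-12-13'
GROUP BY r.Email
""").fetchone()[0]

# Count first, then join: every account sees the true count of 2.
per_user = conn.execute("""
SELECT u.uid, e.cnt
FROM (
    SELECT r.Email, COUNT(r.Email) AS cnt
    FROM refs r
    WHERE r.referredOn > '2011-12-13'
    GROUP BY r.Email
) e
INNER JOIN users u ON u.email_primary = e.Email
ORDER BY u.uid
""").fetchall()

print(joined_count, per_user)
```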

mysql inner join giving bad results (?)

The following sql call works fine, returns the correct total retail for customers:
SELECT customer.id,
customer.first_name,
customer.last_name,
SUM(sales_line_item_detail.retail) AS total_retail
FROM sales_line_item_detail
INNER JOIN sales_header
ON sales_header.id = sales_line_item_detail.sales_header_id
INNER JOIN customer
ON customer.id = sales_header.customer_id
GROUP BY sales_header.customer_Id
ORDER BY total_Retail DESC
LIMIT 10
However, I need it to return the customers' telephone numbers and email addresses as well. Please keep in mind that not all customers have an email address and telephone number. Whenever I left join the email and numbers tables, it throws the total_retail amount off by thousands, and I am not sure why.
The following query gives completely wrong results for the total_retail field:
SELECT customer.id,
customer.first_name,
customer.last_name,
IF(
ISNULL( gemstore.customer_phone_numbers.Number),
'No Number..',
gemstore.customer_phone_numbers.Number
) AS Number,
IF(
ISNULL(gemstore.customer_emails.Email),
'No Email...',
gemstore.customer_emails.Email
) AS Email,
SUM(sales_line_item_detail.retail) AS total_retail
FROM sales_line_item_detail
INNER JOIN sales_header
ON sales_header.id = sales_line_item_detail.sales_header_id
INNER JOIN customer
ON customer.id = sales_header.customer_id
LEFT JOIN gemstore.customer_emails
ON gemstore.customer_emails.Customer_ID = gemstore.customer.ID
LEFT JOIN gemstore.customer_phone_numbers
ON gemstore.customer_phone_numbers.Customer_ID = gemstore.customer.ID
GROUP BY sales_header.customer_Id
ORDER BY total_Retail DESC
LIMIT 10
Any help figuring out why it is throwing off my results is greatly appreciated.
Thanks!
Is it possible that there are multiple records for a Customer_ID in either the customer_emails or customer_phone_numbers tables?
You'll be matching too many records. Try the query without the GROUP BY clause and you'll see which ones and how. Most likely the LEFT JOINs duplicate the order rows for every customer email/phone match.
I am not totally sure, as I can't test this, but the following might be happening.
If there is more than one email or phone number per customer, the final result might get multiplied because of the new joins.
Imagine the query without the GROUP BY and the join to sales:
CustomerId Email phoneNumber
1 test#gmx.com 0122233
1 mail#yahoo.com 0122233
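The user in this example has 2 email addresses.
If you now added the join to sales and the GROUP BY, you would get a doubled total_retail.
If this is the case, note that LEFT OUTER JOIN is just a synonym for LEFT JOIN, so switching between them changes nothing. What should do the trick is joining to a derived table that returns a single row per customer (for example, MIN(Email) grouped by Customer_ID). In that case you will, however, only see one email/phone number per customer.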
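The multiplication described above, and one way around it, can be sketched as follows. This is an illustration under assumptions, not the asker's schema: it runs on SQLite through Python's sqlite3 module, the tables are reduced to a few columns, and the rows are invented. The emails are collapsed to one row per customer with MIN(email) before joining, so the sales rows are no longer duplicated.

```python
import sqlite3

# Hypothetical data: one customer, two sales (100 + 50) and two email addresses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE sales (customer_id INTEGER, retail INTEGER);
CREATE TABLE customer_emails (customer_id INTEGER, email TEXT);
INSERT INTO customer VALUES (1);
INSERT INTO sales VALUES (1, 100), (1, 50);
INSERT INTO customer_emails VALUES (1, 'a@example.com'), (1, 'b@example.com');
""")

# Direct join: every sale is repeated once per email, so the sum doubles.
doubled = conn.execute("""
SELECT c.id, SUM(s.retail)
FROM sales s
INNER JOIN customer c ON c.id = s.customer_id
LEFT JOIN customer_emails e ON e.customer_id = c.id
GROUP BY c.id
""").fetchone()

# Reduce the emails to one row per customer before joining.
fixed = conn.execute("""
SELECT c.id, e.email, SUM(s.retail)
FROM sales s
INNER JOIN customer c ON c.id = s.customer_id
LEFT JOIN (
    SELECT customer_id, MIN(email) AS email
    FROM customer_emails
    GROUP BY customer_id
) e ON e.customer_id = c.id
GROUP BY c.id, e.email
""").fetchone()

print(doubled, fixed)
```

The same derived-table pattern would apply to the phone numbers table.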