Wrong use of inner join function / group function? - mysql

I have the following problem with my query:
I have two tables:
Customer
Subscriber
linked together by customer.id = subscriber.customer_id.
In the subscriber table, I have records with customer_id=0 (these are email records that do not have a full customer account).
Now I want to show, per day, how many customers I have, how many subscribers have a real customer_id, and how many subscribers have customer_id=0 (I call them emailonlies).
Somehow, I cannot manage to get those emailonlies.
Perhaps it has something to do with not using the right join type.
When I use a LEFT JOIN, I get the right number of customers but not the right number of emailonlies. When I use an INNER JOIN, I get the wrong number of customers. Am I using the GROUP BY function correctly? I think the problem has something to do with that.
This is my query:
SELECT DATE(c.date_register),
COUNT(DISTINCT c.id) AS newcustomers,
COUNT(DISTINCT s.customer_id) AS newsubscribedcustomers,
COUNT(DISTINCT s.subscriber_id AND s.customer_id=0) AS emailonlies
FROM customer c
LEFT JOIN subscriber s ON s.customer_id=c.id
GROUP BY DATE(c.date_register)
ORDER BY DATE(c.date_register) DESC
LIMIT 10
;

I'm not entirely sure, but I think that in DISTINCT s.subscriber_id AND s.customer_id=0, the AND is evaluated before the DISTINCT, so DISTINCT only ever sees true and false (1 and 0).
Why don't you just take
COUNT(DISTINCT s.subscriber_id) - (COUNT(DISTINCT s.customer_id) - 1)?
(The -1 is there because DISTINCT s.customer_id will count 0.)
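The precedence point can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (both treat `x AND y` as a 0/1 value here, although the engines differ in other ways), with made-up rows in a `subscriber` table modeled on the question. It compares the question's expression with a conditional count built on CASE, which only passes subscriber_id through when customer_id = 0. Keep in mind that in the question's query the rows with customer_id = 0 never match a customer in the LEFT JOIN, so a conditional count like this has to run against the subscriber table directly.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber (subscriber_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO subscriber (subscriber_id, customer_id) VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0), (4, 10), (5, 11)],  # three email-only rows, two linked rows
)

# The question's expression: AND is evaluated first, so DISTINCT only sees 0 and 1.
(buggy,) = conn.execute(
    "SELECT COUNT(DISTINCT subscriber_id AND customer_id = 0) FROM subscriber"
).fetchone()

# Conditional count: CASE yields subscriber_id for email-only rows and NULL otherwise,
# and COUNT ignores the NULLs.
(emailonlies,) = conn.execute(
    "SELECT COUNT(DISTINCT CASE WHEN customer_id = 0 THEN subscriber_id END) FROM subscriber"
).fetchone()

print(buggy, emailonlies)  # 2 3
```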

Got it. The only risk is that I get no emailonlies on days without new customers, because of the LEFT JOIN. But this one works:
SELECT customers.regdatum,customers.customersqty,subscribers.emailonlies
FROM (
(SELECT DATE(c.date_register) AS regdatum,COUNT(DISTINCT c.id) AS customersqty
FROM customer c
GROUP BY DATE(c.date_register)
) AS customers
LEFT JOIN
(SELECT DATE(s.added) AS voegdatum,COUNT(DISTINCT s.subscriber_id) AS emailonlies
FROM subscriber s
WHERE s.customer_id=0
GROUP BY DATE(s.added)
) AS subscribers
ON customers.regdatum=subscribers.voegdatum
)
ORDER BY customers.regdatum DESC
;
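The derived-table idea can be tried on toy data. This sketch uses Python's sqlite3 as a stand-in for MySQL (SQLite's DATE() accepts the ISO timestamps used here), and the rows are made up. Driving the result from a UNION of the dates found in both tables is a variation on the query above, not part of it: it addresses the concern about days that have email-only sign-ups but no new customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, date_register TEXT);
CREATE TABLE subscriber (subscriber_id INTEGER PRIMARY KEY, customer_id INTEGER, added TEXT);
INSERT INTO customer VALUES (1, '2013-05-01 09:00'), (2, '2013-05-01 12:00'), (3, '2013-05-02 08:30');
-- '2013-05-03' has email-only sign-ups but no new customers
INSERT INTO subscriber VALUES
  (100, 0, '2013-05-01 10:00'),
  (101, 1, '2013-05-01 10:05'),
  (102, 0, '2013-05-03 11:00'),
  (103, 0, '2013-05-03 15:00');
""")

query = """
SELECT d.day,
       COALESCE(c.customersqty, 0) AS customersqty,
       COALESCE(s.emailonlies, 0)  AS emailonlies
FROM (  -- every day that appears in either table drives the result
        SELECT DATE(date_register) AS day FROM customer
        UNION
        SELECT DATE(added) FROM subscriber WHERE customer_id = 0
     ) AS d
LEFT JOIN (SELECT DATE(date_register) AS day, COUNT(DISTINCT id) AS customersqty
           FROM customer GROUP BY DATE(date_register)) AS c ON c.day = d.day
LEFT JOIN (SELECT DATE(added) AS day, COUNT(DISTINCT subscriber_id) AS emailonlies
           FROM subscriber WHERE customer_id = 0 GROUP BY DATE(added)) AS s ON s.day = d.day
ORDER BY d.day DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2013-05-03', 0, 2)
# ('2013-05-02', 1, 0)
# ('2013-05-01', 2, 1)
```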

Related

How do I keep my WHERE clause from throwing a syntax error in SQL?

The question asks:
"Write a query to display the customer name and the number of payments they have made where the amount on the check is greater than their average payment amount. Order the results by the descending number of payments."
So far I have:
SELECT customerName,
(SELECT COUNT(checkNumber) FROM Payments WHERE
Customers.customerNumber = Payments.customerNumber) AS
NumberOfPayments
FROM Customers
WHERE amount > SELECT AVG(amount)
ORDER BY NumberOfPayments DESC;
But I am getting a syntax error every time I run it. What am I doing wrong here?
The syntax error comes from your second subquery being incomplete: amount > SELECT AVG(amount) doesn't work.
You could use amount > (SELECT AVG(amount) FROM Payments).
That is: complete the subquery and put it between ( ).
However this won't do what you want (plus it is inefficient).
Now, since this is not a forum for doing your homework for you, I will leave it at this and only help you with the actual question: why you get the syntax error. Keep on looking, you will find it. There is no better way to learn than to search and find it yourself.
I would phrase this as an inner join between the two tables, with a correlated subquery to find the average payment amount per customer:
SELECT
c.customerName,
COUNT(CASE WHEN p.amount > (SELECT AVG(p2.amount) FROM Payments p2
WHERE p2.customerNumber = c.customerNumber)
THEN 1 END) AS NumberOfPayments
FROM Customers c
INNER JOIN Payments p
ON c.customerNumber = p.customerNumber
GROUP BY
c.customerNumber, c.customerName
ORDER BY
NumberOfPayments DESC;
Your current query is on the right track, but you need something called conditional aggregation to obtain the count. In this case, we aggregate by customer and only include a payment in the count when its amount is greater than that customer's average.
I would approach this just using JOINs:
SELECT c.customerName,
COALESCE(SUM( p.amount > p2.avg_amount ), 0) as Num_Payments_Larger_Than_Average
FROM Customers c LEFT JOIN
Payments p
ON c.customerNumber = p.customerNumber LEFT JOIN
(SELECT p2.customerNumber, AVG(amount) as avg_amount
FROM payments p2
GROUP BY p2.customerNumber
) p2
ON p2.customerNumber = p.customerNumber
GROUP BY c.customerNumber, c.customerName
ORDER BY Num_Payments_Larger_Than_Average DESC;
Some notes about this answer. First, it uses LEFT JOIN and conditional aggregation. This allows the query to return customers with zero payments larger than their average: that is, customers with no payments, or whose payments are all the same.
Second, it includes customerNumber in the GROUP BY. I think this is important, because it may be possible for two customers to have the same name.
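A runnable sketch of the derived-table approach, using Python's sqlite3 as a stand-in for MySQL with made-up customers and payments. Like MySQL, SQLite evaluates `p.amount > p2.avg_amount` to 0 or 1, so SUM counts the matching payments; COALESCE turns the NULL sum of a customer without payments into 0, and the order is descending as the exercise asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE Payments (customerNumber INTEGER, checkNumber TEXT, amount REAL);
INSERT INTO Customers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO Payments VALUES
  (1, 'A1', 100), (1, 'A2', 200), (1, 'A3', 600),   -- average 300: one payment above it
  (2, 'B1', 50),  (2, 'B2', 50);                    -- all equal: none above the average
-- customer 3 has no payments at all
""")

query = """
SELECT c.customerName,
       COALESCE(SUM(p.amount > p2.avg_amount), 0) AS num_above_avg
FROM Customers c
LEFT JOIN Payments p ON c.customerNumber = p.customerNumber
LEFT JOIN (SELECT customerNumber, AVG(amount) AS avg_amount  -- per-customer average
           FROM Payments
           GROUP BY customerNumber) p2 ON p2.customerNumber = p.customerNumber
GROUP BY c.customerNumber, c.customerName
ORDER BY num_above_avg DESC, c.customerName
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alpha', 1), ('Beta', 0), ('Gamma', 0)]
```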

Using SELECT across three tables

I tried to write a query, but unfortunately I didn't succeed.
I want to know how many packages a person delivered over a given period.
For example, I want to know how many packages John (user_id = 1) delivered between 01-02-18 and 28-02-18. John drives a different car (a different plate_id) every day.
The columns involved are orders_drivers.user_id, plates.plate_name, orders.delivery_date and orders.package_amount.
I have 3 tables:
orders: plate_id, delivery_date, package_amount
plates: plate_id, plate_name
orders_drivers: plate_id, plate_date, user_id
I tried some solutions but didn't get the expected result. Thanks!
Try using JOINs as shown below:
SELECT SUM(o.package_amount)
FROM orders o INNER JOIN orders_drivers od
ON o.plate_id=od.plate_id
WHERE od.user_id=<the_user_id>;
See MySQL Join Made Easy for insight.
You can also use a subquery:
SELECT SUM(o.package_amount)
FROM orders o
WHERE EXISTS (SELECT 1
FROM orders_drivers od
WHERE user_id=<user_id> AND o.plate_id=od.plate_id);
SELECT SUM(orders.package_amount) AS amount
FROM orders
LEFT JOIN plates ON plates.plate_id = orders.plate_id
LEFT JOIN orders_drivers ON orders_drivers.plate_id = orders.plate_id
WHERE orders.delivery_date > date1 AND orders.delivery_date < date2 AND orders_drivers.user_id = userid
GROUP BY orders_drivers.user_id
But seriously, you need to ask questions that make more sense.
SUM adds up all the values in each group formed by GROUP BY.
LEFT JOIN connects the tables on matching plate_id values. Any other join would work here too, as all ids are unique (at least I hope so).
WHERE is where you give the dates and the user.
And GROUP BY user_id means that if there are several records for the same user, they are returned as one (with their package amounts summed).
With AS, your result is returned under the name 'amount'.
If you want the total of packageamount by user in a period, you can use this query:
UPDATE: added a WHERE clause on user_id to retrieve John's data.
SELECT od.user_id
, p.plate_name
, SUM(o.package_amount) AS TotalPackageAmount
FROM orders_drivers od
JOIN plates p
ON p.plate_id = od.plate_id
JOIN orders o
ON o.plate_id = od.plate_id
WHERE o.delivery_date BETWEEN '2018-02-01' AND '2018-02-28'
AND od.user_id = 1
GROUP BY od.user_id
, p.plate_name
It filters rows to a period of delivery_date values, groups them by user_id and plate_name, and then calculates the sum of package_amount for each group.
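Since John drives a different car each day, matching orders to drivers on plate_id alone can also pick up packages delivered with the same plate on a day when someone else drove it. The sketch below additionally matches orders.delivery_date to orders_drivers.plate_date; that this is what plate_date means, and that both columns hold plain dates in the same format, are assumptions. It uses Python's sqlite3 as a stand-in for MySQL with made-up rows, and the helper `packages_for` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plates (plate_id INTEGER PRIMARY KEY, plate_name TEXT);
CREATE TABLE orders_drivers (plate_id INTEGER, plate_date TEXT, user_id INTEGER);
CREATE TABLE orders (plate_id INTEGER, delivery_date TEXT, package_amount INTEGER);
INSERT INTO plates VALUES (10, 'AB-123'), (20, 'CD-456');
-- John (user 1) drives plate 10 on Feb 1 and plate 20 on Feb 2; user 2 drives plate 10 on Feb 2
INSERT INTO orders_drivers VALUES (10, '2018-02-01', 1), (20, '2018-02-02', 1), (10, '2018-02-02', 2);
INSERT INTO orders VALUES
  (10, '2018-02-01', 5), (20, '2018-02-02', 7), (10, '2018-02-02', 4), (10, '2018-03-01', 9);
""")

def packages_for(user_id, start, end):
    """Hypothetical helper: packages per plate for one driver, matching plate AND day."""
    sql = """
    SELECT od.user_id, p.plate_name, SUM(o.package_amount) AS total
    FROM orders_drivers od
    JOIN plates p ON p.plate_id = od.plate_id
    JOIN orders o ON o.plate_id = od.plate_id
                 AND o.delivery_date = od.plate_date  -- the assumed per-day assignment
    WHERE od.user_id = ? AND o.delivery_date BETWEEN ? AND ?
    GROUP BY od.user_id, p.plate_name
    ORDER BY p.plate_name
    """
    return conn.execute(sql, (user_id, start, end)).fetchall()

rows = packages_for(1, '2018-02-01', '2018-02-28')
print(rows)  # [(1, 'AB-123', 5), (1, 'CD-456', 7)]
```

Without the date condition, plate AB-123 would also be credited with the 4 packages that user 2 delivered on Feb 2.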

SQL: is it possible to combine GROUP BY, COUNT and DISTINCT?

I manage a registration system, where people can register for a course, and I have the following query to calculate some statistics:
SELECT p.id_country AS id, c.name, COUNT(p.id_country) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3,4,13,14)
GROUP BY p.id_country
ORDER BY total DESC
This query works fine; it shows me exactly the number of participants per country.
However, our system lets people register for multiple courses, and every registration inserts a new row into the participants table. I know it's not an ideal situation, but unfortunately it's too late to change it now. If a participant registers for a second (or third, fourth, etc.) course, they use the same email address, so the same email address can appear in the participants table multiple times.
What I would like to do is change this query so that each email address is counted only once. The field is just p.email, and I think I should use DISTINCT somehow to make this happen, but whatever I try gives me either very weird results or an error.
Is it possible to do this?
Try not to mix distinct and group by in queries. You get the same result doing:
select distinct p.id_country from participants p
as doing
select p.id_country from participants p group by p.id_country
What you need is to filter out duplicates:
SELECT p.id_country AS id, c.name, COUNT(p.id_country) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3,4,13,14)
and not exists
(select email from participants p2 where p.email=p2.email and p.id>p2.id)
GROUP BY p.id_country
ORDER BY total DESC
This counts each email only once, by skipping the newer IDs of accounts with duplicated emails.
How about adding a UNIQUE constraint on the table?
ALTER TABLE participants ADD CONSTRAINT part_uq UNIQUE (email)
SELECT
p.id_country AS id,
c.name,
COUNT(p.id_country) AS total
FROM
(select email, max(id_country) as id_country from participants where id_status not in (3,4,13,14) group by email) p
LEFT JOIN countries c ON p.id_country = c.id
GROUP BY
p.id_country
ORDER BY
total DESC
I am using max(id_country) for the case where one email address is associated with more than one country. If this cannot happen by design, you can move id_country to the GROUP BY clause.
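COUNT also accepts DISTINCT directly, so another option is to replace COUNT(p.id_country) with COUNT(DISTINCT p.email) in the original query. A sketch using Python's sqlite3 as a stand-in for MySQL, with made-up rows. One difference from the max(id_country) answer: a person registered under two different countries would be counted once in each of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE participants (id INTEGER PRIMARY KEY, email TEXT, id_country INTEGER, id_status INTEGER);
INSERT INTO countries VALUES (1, 'Netherlands'), (2, 'Belgium');
INSERT INTO participants (email, id_country, id_status) VALUES
  ('a@example.com', 1, 1),
  ('a@example.com', 1, 1),   -- same person, second course
  ('b@example.com', 1, 1),
  ('c@example.com', 2, 1),
  ('d@example.com', 2, 3);   -- excluded status
""")

query = """
SELECT p.id_country AS id, c.name, COUNT(DISTINCT p.email) AS total
FROM participants p
LEFT JOIN countries c ON p.id_country = c.id
WHERE p.id_status NOT IN (3, 4, 13, 14)
GROUP BY p.id_country, c.name
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Netherlands', 2), (2, 'Belgium', 1)]
```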

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
My first problem was grouping people together with the LEFT JOINed visits row that has the newest year. I solved that using a subquery.
Now my problem is that the subquery does not use the INDEX defined on the visits table, which makes my query run nearly forever on tables with approximately 15000 rows each.
Here's the query. The goal is to list every person once, with his newest (by year) record in the visits table.
Unfortunately, on large tables it gets really sloooow because it's not using the INDEX in the subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use the INDEX already defined on the visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid those indeterminate results, you have added an ORDER BY inside a subquery, and (non-standard or not) nothing in the MySQL documentation says this will work as expected. So it may work now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer is clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
A compound index on (id_people, year) would help efficiency.
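A sketch of this groupwise-maximum query on toy data, using Python's sqlite3 as a stand-in for MySQL. The `note` column and all rows are made up, and the nested `LEFT JOIN ... JOIN ... ON ... ON ...` is rewritten as one derived table, which should be equivalent but is easier to read. A person with no visits still appears, with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER, year INTEGER, note TEXT);
CREATE INDEX visits_people_year ON visits (id_people, year);  -- the suggested compound index
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO visits (id_people, year, note) VALUES
  (1, 2010, 'old'), (1, 2012, 'newest'),
  (2, 2011, 'only');
-- Cid has no visits
""")

query = """
SELECT p.id, p.name, lv.year, lv.note
FROM people AS p
LEFT JOIN (
    SELECT v.id_people, v.year, v.note
    FROM (SELECT id_people, MAX(year) AS year     -- newest year per person
          FROM visits GROUP BY id_people) AS vm
    JOIN visits AS v ON v.id_people = vm.id_people AND v.year = vm.year
) AS lv ON lv.id_people = p.id
ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ann', 2012, 'newest'), (2, 'Bob', 2011, 'only'), (3, 'Cid', None, None)]
```

If a person had two visits in the same newest year, both rows would come back; that case needs an extra tie-breaker.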
A different approach, which works fine if you first limit the people to a sensible number (say 30) and then join them to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people who have never had a visit, try replacing the JOINs in my statement with LEFT JOINs.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not how your example is set up. So you need another way to disambiguate the latest visit, so that you get just one row per person. The only trick at our disposal is still the largest id number, applied only to each person's visits in their latest year.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
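The two-level query can be checked on data where one person has two visits in the same latest year. A sketch using Python's sqlite3 as a stand-in for MySQL, with made-up rows; the inner aliases are renamed (y, v2) only for readability. It should return one row per person: the visit with the highest id within that person's latest year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER, year INTEGER, note TEXT);
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO visits VALUES
  (1, 1, 2012, 'first in 2012'),
  (2, 1, 2012, 'second in 2012'),  -- tie on year: the highest id wins
  (3, 1, 2010, 'old'),
  (4, 2, 2011, 'only');
""")

query = """
SELECT p.id, p.name, v.year, v.note
FROM (
    SELECT v2.id_people, MAX(v2.id) AS id       -- highest visit id within the latest year
    FROM (SELECT id_people, MAX(year) AS year   -- latest year per person
          FROM visits GROUP BY id_people) AS y
    JOIN visits v2 ON y.id_people = v2.id_people AND y.year = v2.year
    GROUP BY v2.id_people
) AS m
JOIN people p ON m.id_people = p.id
JOIN visits v ON m.id = v.id
ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ann', 2012, 'second in 2012'), (2, 'Bob', 2011, 'only')]
```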

mysql inner join giving bad results (?)

The following SQL query works fine and returns the correct total retail for each customer:
SELECT customer.id,
customer.first_name,
customer.last_name,
SUM(sales_line_item_detail.retail) AS total_retail
FROM sales_line_item_detail
INNER JOIN sales_header
ON sales_header.id = sales_line_item_detail.sales_header_id
INNER JOIN customer
ON customer.id = sales_header.customer_id
GROUP BY sales_header.customer_Id
ORDER BY total_Retail DESC
LIMIT 10
However, I need it to return each customer's telephone number and email address as well. Please keep in mind that not all customers have an email address and a telephone number. Whenever I LEFT JOIN the email and phone number tables, the total_retail amount is thrown off by thousands, and I am not sure why.
The following query gives completely wrong results for the total_retail field:
SELECT customer.id,
customer.first_name,
customer.last_name,
IF(
ISNULL( gemstore.customer_phone_numbers.Number),
'No Number..',
gemstore.customer_phone_numbers.Number
) AS Number,
IF(
ISNULL(gemstore.customer_emails.Email),
'No Email...',
gemstore.customer_emails.Email
) AS Email,
SUM(sales_line_item_detail.retail) AS total_retail
FROM sales_line_item_detail
INNER JOIN sales_header
ON sales_header.id = sales_line_item_detail.sales_header_id
INNER JOIN customer
ON customer.id = sales_header.customer_id
LEFT JOIN gemstore.customer_emails
ON gemstore.customer_emails.Customer_ID = gemstore.customer.ID
LEFT JOIN gemstore.customer_phone_numbers
ON gemstore.customer_phone_numbers.Customer_ID = gemstore.customer.ID
GROUP BY sales_header.customer_Id
ORDER BY total_Retail DESC
LIMIT 10
Any help figuring out why it is throwing off my results is greatly appreciated.
Thanks!
Is it possible that there are multiple records for a Customer_ID in either the customer_emails or customer_phone_numbers tables?
You'll be matching too many records. Try the query without the GROUP BY clause and you'll see which ones and how. Most likely the LEFT JOINs duplicate the order rows for every matching customer email/phone number.
I am not totally sure, as I can't test this, but the following might be happening.
If there is more than one email or phone number per customer, the final result gets multiplied because of the new joins.
Imagine the query without the GROUP BY and the join to sales:
CustomerId   Email            phoneNumber
1            test#gmx.com     0122233
1            mail#yahoo.com   0122233
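The customer in this example has 2 email addresses.
If you now add the join to sales and the GROUP BY, total_retail would be doubled.
If this is the case, note that LEFT OUTER JOIN is simply a synonym for LEFT JOIN in MySQL, so changing the keyword will not help. Instead, join to derived tables that return one email and one phone number per customer (for example, SELECT Customer_ID, MIN(Email) AS Email FROM customer_emails GROUP BY Customer_ID). In that case, however, you will only see one email/phone number per customer.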
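A small sketch that reproduces the doubled total and shows one common remedy: total the sales per customer in a derived table before joining, and reduce the emails to one row per customer (MIN(Email) here, an arbitrary choice). It uses Python's sqlite3 as a stand-in for MySQL, a trimmed-down version of the question's tables (no gemstore schema prefix, no phone table) and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE sales_header (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE sales_line_item_detail (sales_header_id INTEGER, retail REAL);
CREATE TABLE customer_emails (Customer_ID INTEGER, Email TEXT);
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO sales_header VALUES (1, 1);
INSERT INTO sales_line_item_detail VALUES (1, 100), (1, 50);
INSERT INTO customer_emails VALUES (1, 'a@example.com'), (1, 'b@example.com');  -- two emails
""")

# Joining the emails directly repeats every line item once per email, so the sum doubles.
inflated = conn.execute("""
SELECT c.id, SUM(d.retail)
FROM sales_line_item_detail d
JOIN sales_header h ON h.id = d.sales_header_id
JOIN customer c ON c.id = h.customer_id
LEFT JOIN customer_emails e ON e.Customer_ID = c.id
GROUP BY c.id
""").fetchall()

# Remedy: total the sales per customer first, and keep one email row per customer.
fixed = conn.execute("""
SELECT c.id, t.total_retail, COALESCE(e.Email, 'No Email...') AS Email
FROM customer c
JOIN (SELECT h.customer_id, SUM(d.retail) AS total_retail
      FROM sales_line_item_detail d
      JOIN sales_header h ON h.id = d.sales_header_id
      GROUP BY h.customer_id) t ON t.customer_id = c.id
LEFT JOIN (SELECT Customer_ID, MIN(Email) AS Email
           FROM customer_emails GROUP BY Customer_ID) e ON e.Customer_ID = c.id
""").fetchall()

print(inflated)  # [(1, 300.0)]
print(fixed)     # [(1, 150.0, 'a@example.com')]
```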