GROUP BY aggregate by column - mysql

I've run into an issue where every time I attempt to use GROUP BY, H2 informs me that I need to add certain column names into the GROUP BY clause because, based on my research, it's unclear to H2 how to sort columns with non-repeating data.
Here's an example to elaborate:
Person table
+------------+------------+
| ID | Name |
+============+============+
| 1 | John |
+------------+------------+
| 2 | Jane |
+------------+------------+
Pet table
+------------+------------+------------+------------+
| ID | PERSON_ID | NAME | BIRTHDATE |
+============+============+============+============+
| 1 | 1 | Rufus | 2012 |
+------------+------------+------------+------------+
| 2 | 1 | Ben | 2014 |
+------------+------------+------------+------------+
Let's say I want all the oldest pets belonging to John.
SELECT PERSON.NAME, PET.NAME, PET.BIRTHDATE FROM PERSON
INNER JOIN PET ON PET.PERSON_ID = PERSON.ID
GROUP BY PERSON.NAME
ORDER BY PET.BIRTHDATE ASC
This would work perfectly in MySQL because it will simply group by PERSON.NAME and, by default, select the first record in the set. However, in H2 it needs to have aggregation such as MAX, MIN, etc.
The problem, as you can see in this example, is that you could use MIN to get the BIRTHDATE ordered correctly but there does not appear to be any aggregation function available for sorting NAME based on the oldest BIRTHDATE?

If you want the oldest pets, I would recommend:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID
WHERE pt.BIRTHDATE = (SELECT MIN(pt2.BIRTHDATE)
FROM pet pt2
WHERE pt2.PERSON_ID = PT.PERSON_ID
);
This explicitly selects the pet or pets (for each person) that have the earliest birth year. No aggregation is necessary.
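As a quick check, the correlated-subquery form can be run against the sample data from the question. This is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for H2/MySQL (an assumption; the lower-case table and column names are only for illustration):

```python
import sqlite3

# In-memory SQLite database used purely for illustration (the question targets H2/MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (id INTEGER PRIMARY KEY, person_id INTEGER, name TEXT, birthdate INTEGER);
    INSERT INTO person VALUES (1, 'John'), (2, 'Jane');
    INSERT INTO pet VALUES (1, 1, 'Rufus', 2012), (2, 1, 'Ben', 2014);
""")

# Keep only the pets whose birth year equals the minimum birth year of their owner.
oldest_pets = conn.execute("""
    SELECT p.name, pt.name, pt.birthdate
    FROM person p
    INNER JOIN pet pt ON pt.person_id = p.id
    WHERE pt.birthdate = (SELECT MIN(pt2.birthdate)
                          FROM pet pt2
                          WHERE pt2.person_id = pt.person_id)
""").fetchall()

print(oldest_pets)  # [('John', 'Rufus', 2012)]
```

Jane does not appear because she has no pets and the join is an inner join.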
You can also phrase this with JOINs only in the FROM:
SELECT p.NAME, pt.NAME, pt.BIRTHDATE
FROM PERSON p INNER JOIN
PET pt
ON pt.PERSON_ID = p.ID JOIN
(SELECT pt2.PERSON_ID, MIN(pt2.BIRTHDATE) as MINBT
FROM pet pt2
GROUP BY pt2.PERSON_ID
) pt2
ON pt2.PERSON_ID = pt.PERSON_ID AND pt2.MINBT = pt.BIRTHDATE;

You can always resort to NOT EXISTS in such cases: if the person has no pet with an earlier birthdate, then the pet is the oldest. If two pets have the same birthdate and both are the oldest for that person, both are selected:
SELECT p.NAME, q.NAME, q.BIRTHDATE
FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID AND NOT EXISTS (
SELECT * FROM PET WHERE PERSON_ID = p.ID AND BIRTHDATE < q.BIRTHDATE
)
ORDER BY q.BIRTHDATE ASC
If you insist on GROUP BY you can do it like this:
SELECT a.name, b.name, b.BIRTHDATE FROM (
SELECT p.id, MIN(q.BIRTHDATE) birthdate FROM PERSON p
INNER JOIN PET q ON q.PERSON_ID = p.ID
GROUP BY p.ID
) o INNER JOIN PERSON a ON a.ID = o.ID
INNER JOIN PET b ON b.PERSON_ID = a.ID AND b.BIRTHDATE = o.BIRTHDATE
ORDER BY b.BIRTHDATE
If you can use WITH, the query can be written more simply.

Related

MySQL where joined value is multiple ANDs

Running into a seemingly simple JOIN problem here.
I have two tables, users and courses
| users.id | users.name |
| 1 | Joe |
| 2 | Mary |
| 3 | Mark |
| courses.id | courses.name |
| 1 | History |
| 2 | Math |
| 3 | Science |
| 4 | English |
and another table that joins the two:
| users_id | courses_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
I'm trying to find distinct user names who are in course 1 and course 2
It's possible a user is in other courses, too, but I only care that they're in 1 and 2 at a minimum
SELECT DISTINCT(users.name)
FROM users_courses
LEFT JOIN users ON users_courses.users_id = users.id
LEFT JOIN courses ON users_courses.courses_id = courses.id
WHERE courses.name = "History" AND courses.name = "Math"
AND courses.name NOT IN ("English")
I understand why this is returning an empty set (no single joined row has both History and Math; it only has one value per row).
How can I structure the query so that it returns "Joe" because he is in both courses?
Update - I'm hoping to avoid hard-coding the expected total count of courses for a given user, since they might be in other courses my search does not care about.
Join users to a query that returns the user ids that are in both courses:
select u.name
from users u
inner join (
select users_id
from users_courses
where courses_id in (1, 2)
group by users_id
having count(distinct courses_id) = 2
) c on c.users_id = u.id
You can omit distinct from the condition:
count(distinct courses_id) = 2
if there are no duplicates in users_courses.
If you want to search by course names and not ids:
select u.name
from users u
inner join (
select uc.users_id
from users_courses uc inner join courses c
on c.id = uc.courses_id
where c.name in ('History', 'Math')
group by uc.users_id
having count(distinct c.id) = 2
) c on c.users_id = u.id
Results:
| name |
| ---- |
| Joe |
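To see the HAVING COUNT(DISTINCT ...) approach return that result, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database through Python's sqlite3 module (SQLite is an assumption made for illustration; the question is about MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
    INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
    INSERT INTO courses VALUES (1, 'History'), (2, 'Math'), (3, 'Science'), (4, 'English');
    INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Keep only the rows for the two wanted courses, then require that
# each user has matched both of them (two distinct course ids).
in_both = conn.execute("""
    SELECT u.name
    FROM users u
    INNER JOIN (
        SELECT uc.users_id
        FROM users_courses uc
        INNER JOIN courses c ON c.id = uc.courses_id
        WHERE c.name IN ('History', 'Math')
        GROUP BY uc.users_id
        HAVING COUNT(DISTINCT c.id) = 2
    ) m ON m.users_id = u.id
    ORDER BY u.name
""").fetchall()

print(in_both)  # [('Joe',)]
```

Joe's extra enrollment in Science does not matter, because the WHERE clause drops that row before the count is taken.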
You can use the IN operator with a subquery that lists the users_id values attending the second course, then match them against the first course. Depending on the optimizer, this may be faster than using joins.
select distinct u.users_id, users.name
from users_courses u, users
where u.users_id in (select distinct users_id from users_courses where courses_id = 2)
and u.courses_id = 1
and users.id = u.users_id
Almost the same as @Nae's solution.
select u.name from users u
where exists
(select 1
from users_courses uc
where uc.courses_id in (1, 2)
and uc.users_id = u.id
group by uc.users_id
having count(0) = 2);
Your code is close. Just use GROUP BY and a HAVING clause:
SELECT u.name
FROM users_courses uc JOIN
users u
ON uc.users_id = u.id JOIN
courses c
ON uc.courses_id = c.id
WHERE c.name IN ('History', 'Math')
GROUP BY u.name
HAVING COUNT(DISTINCT c.name) = 2;
Notes:
This assumes that users cannot have the same name. You might want to use GROUP BY u.id, u.name to ensure that you are counting individual users.
If users cannot take the same course multiple times, then use COUNT(*) = 2 rather than COUNT(DISTINCT).
I'd write:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
WHERE uc.courses_id IN (1, 2)
GROUP BY uc.users_id
HAVING COUNT(0) = 2
;
For more complex conditions (for example requiring the user to be in certain classes but also not in certain classes such as "Science") this should also work:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
GROUP BY uc.users_id
HAVING (
-- user enrolled exactly once in course 1
SUM(uc.courses_id = 1) = 1
-- user enrolled exactly once in course 2
AND SUM(uc.courses_id = 2) = 1
-- user enrolled in course 3, 0 times
AND SUM(uc.courses_id = 3) = 0
)
;
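The conditional-aggregation idea can be tried on the sample data as follows. This is a sketch under a few assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, it spells the condition as SUM(CASE WHEN ... END) (a more portable form of the boolean SUM used above), and it passes the excluded course as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
    INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
    INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Each SUM counts how many of the user's enrollment rows match one course id.
sql = """
    SELECT u.name
    FROM users_courses uc
    JOIN users u ON u.id = uc.users_id
    GROUP BY u.id, u.name
    HAVING SUM(CASE WHEN uc.courses_id = 1 THEN 1 ELSE 0 END) = 1
       AND SUM(CASE WHEN uc.courses_id = 2 THEN 1 ELSE 0 END) = 1
       AND SUM(CASE WHEN uc.courses_id = ? THEN 1 ELSE 0 END) = 0
    ORDER BY u.name
"""

not_in_english = conn.execute(sql, (4,)).fetchall()  # in 1 and 2, not in 4
not_in_science = conn.execute(sql, (3,)).fetchall()  # in 1 and 2, not in 3

print(not_in_english)  # [('Joe',)]
print(not_in_science)  # []
```

Excluding Science returns nothing because Joe, the only user in both History and Math, is also enrolled in Science.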

SQL - How to find the person with the highest grade

I am trying to find the name of the person who received the highest grade in the "Big Data" course.
I have 3 different tables:
People (id, name, age, address)
---------------------------------------------------
p1 | Tom Martin| 24 | 11, Integer Avenue, Fractions, MA
p2 | Al Smith | 33 | 26, Main Street, Noman's Land, PA
p3 | Kim Burton| 40 | 45, Elm Street, Blacksburg, VA
---------------------------------------------------
Courses (cid, name, department)
---------------------------------------------------------
c1 | Systematic Torture | MATH
c2 | Pretty Painful | CS
c3 | Not so Bad | MATH
c4 | Big Data | CS
---------------------------------------------------------
Grades (pid, cid, grade)
---------------------------------------------------
p1 | c1 | 3.5
p2 | c3 | 2.5
p3 | c2 | 4.0
p3 | c4 | 3.85
---------------------------------------------------
I can't figure out how to find the person with the highest grade without using any fancy SQL feature. That is, I just want to use SELECT, FROM, WHERE, UNION, INTERSECT, EXCEPT, CREATE VIEW and arithmetic comparison operators like =, <, >.
My outcome is showing something other than what I try to achieve.
This is what I have tried so far:
CREATE VIEW TEMPFIVE AS
SELECT G1.pid FROM Grades AS G1, Grades AS G2 WHERE G1.pid = G2.pid AND G1.cid = G2.cid
SELECT People.name, Courses.name FROM TEMPFIVE, People, Courses WHERE TEMPFIVE.pid = People.pid AND Courses.name = "Big Data";
+------------+----------+
| name | name |
+------------+----------+
| Tom Martin | Big Data |
| Al Smith | Big Data |
|Kim Burton | Big Data |
|Kim Burton | Big Data |
+------------+----------+
The easiest way is to use LIMIT 1 with an ORDER BY ... DESC clause:
SELECT p.name, c.name, g.grade
FROM People AS p
JOIN Grades AS g ON p.id = g.pid
JOIN Courses AS c ON c.cid = g.cid
WHERE c.name = "Big Data"
ORDER BY g.grade DESC LIMIT 1
I don't know the exact MySQL query structure, so here is the approach in steps. I hope you can build the query from them:
1. Join the three tables according to their relationships.
2. Filter on the course name 'Big Data' in the WHERE clause.
3. Order by grade in descending order.
4. Set the limit to fetch only the first row.
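The steps above can be sketched as a single query. The example below loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module (SQLite is an assumption for illustration; the same SQL shape applies in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Courses (cid TEXT PRIMARY KEY, name TEXT, department TEXT);
    CREATE TABLE Grades (pid TEXT, cid TEXT, grade REAL);
    INSERT INTO People VALUES ('p1', 'Tom Martin'), ('p2', 'Al Smith'), ('p3', 'Kim Burton');
    INSERT INTO Courses VALUES ('c1', 'Systematic Torture', 'MATH'), ('c2', 'Pretty Painful', 'CS'),
                               ('c3', 'Not so Bad', 'MATH'), ('c4', 'Big Data', 'CS');
    INSERT INTO Grades VALUES ('p1', 'c1', 3.5), ('p2', 'c3', 2.5), ('p3', 'c2', 4.0), ('p3', 'c4', 3.85);
""")

top = conn.execute("""
    SELECT p.name, c.name, g.grade
    FROM People p
    JOIN Grades g ON g.pid = p.id        -- step 1: join the three tables
    JOIN Courses c ON c.cid = g.cid
    WHERE c.name = 'Big Data'            -- step 2: filter on the course name
    ORDER BY g.grade DESC                -- step 3: highest grade first
    LIMIT 1                              -- step 4: keep only the first row
""").fetchone()

print(top)  # ('Kim Burton', 'Big Data', 3.85)
```

Note that LIMIT 1 returns a single row even when several people share the top grade.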
Try this
select * from(
select p.id pid,p.name name, p.age age,p.address address,
c.cid cid, c.name coursname, c.department department,g.grade grade
from Grades G
left join
Courses C on g.cid = c.cid
left join
People p on g.pid = p.id
)a where coursname= 'Big Data' order by grade desc
You can apply the operators in the WHERE clause.
GiorgosBestos shows the correct way if you only want 1 record. If you want ties, meaning more than one student has the same MAX grade, then you can do a subselect as follows:
SELECT p.name, c.name, g.grade
FROM
(
SELECT c.cid, MAX(g.grade) MaxGrade
FROM
Grades g
INNER JOIN Courses c
ON c.cid = g.cid
AND c.name = 'Big Data'
GROUP BY
c.cid
) m
INNER JOIN Grades g
ON g.cid = m.cid
AND g.grade = m.MaxGrade
INNER JOIN People p
ON g.pid = p.id
The following SQL covers the case when two or more students have the same maximum grade:
SELECT P.NAME,
C.NAME,
G.GRADE
FROM PEOPLE P
JOIN GRADES G ON G.PID = P.ID
JOIN COURSES C ON C.CID = G.CID
WHERE C.NAME = 'Big Data'
AND G.GRADE = (SELECT MAX(G2.GRADE)
FROM PEOPLE P2
JOIN GRADES G2 ON G2.PID = P2.ID
JOIN COURSES C2 ON C2.CID = G2.CID
WHERE C2.NAME = 'Big Data');
It is similar but not identical to the SQL proposed by Matt.
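To illustrate the tie handling, the sketch below adds one hypothetical grade row (Tom Martin also scoring 3.85 in "Big Data", which is not in the original data) and runs the MAX-subquery form on an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Courses (cid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Grades (pid TEXT, cid TEXT, grade REAL);
    INSERT INTO People VALUES ('p1', 'Tom Martin'), ('p2', 'Al Smith'), ('p3', 'Kim Burton');
    INSERT INTO Courses VALUES ('c2', 'Pretty Painful'), ('c4', 'Big Data');
    INSERT INTO Grades VALUES ('p3', 'c2', 4.0), ('p3', 'c4', 3.85);
    -- Hypothetical extra rows, added only to create a tie and a lower grade.
    INSERT INTO Grades VALUES ('p1', 'c4', 3.85), ('p2', 'c4', 3.1);
""")

# Everyone whose grade in the course equals the course maximum is returned.
top_scorers = conn.execute("""
    SELECT p.name, g.grade
    FROM People p
    JOIN Grades g ON g.pid = p.id
    JOIN Courses c ON c.cid = g.cid
    WHERE c.name = 'Big Data'
      AND g.grade = (SELECT MAX(g2.grade)
                     FROM Grades g2
                     JOIN Courses c2 ON c2.cid = g2.cid
                     WHERE c2.name = 'Big Data')
    ORDER BY p.name
""").fetchall()

print(top_scorers)  # [('Kim Burton', 3.85), ('Tom Martin', 3.85)]
```

The subquery here joins only Grades and Courses, since the People table is not needed to compute the maximum.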

MySQL join table return only one match

I'm struggling with the JOIN in a MySQL query. Somehow I can't find out why my result is not what I want.
I have two tables, orders and products. The products table holds the id of the order it belongs to. An order can have more than one product, so for example the products table may hold two records for one order.
The result I need is all orders where a product holds a VAT of 21.
Table example.
orders
id | customer
---------------
1 | John Doe
2 | Hello World
order_products
id | order_id | product | vat
1 | 1 | Porsche 911 GT4 | 21
2 | 1 | Audi R8 LMS | 21
3 | 1 | Ferrari Enzo | 19
3 | 2 | Bugatti Veyron | 19
Now I want all orders where the products have a VAT of 21. So I will do a LEFT JOIN on the table order_products:
SELECT orders.id, orders.customer, order_products.product FROM orders LEFT JOIN order_products ON orders.id = order_products.order_id WHERE order_products.vat = '21'
This returns the following:
1 John Doe Porsche
1 John Doe Audi R8 LMS
But I only need one result because the orders.id is important for me, not all products in the order. I only join on the order_products to get the orders with only VAT 21. At the moment I ran out of options on how to fix this. Even after reading several topics about joins on this site and other sites.
First, you aren't going to return any orders that don't have products, so there is no need for a left join...an inner join is fine.
If the order_products columns are not important to you, you can use a subquery and not select any columns from order_products. Your current query, however, selects one of its columns.
Something like...
SELECT id, customer
FROM orders
WHERE id IN (SELECT order_id FROM order_products
WHERE order_products.vat = '21');
If you prefer not to use a subquery, you can use a group by or distinct
SELECT orders.id, orders.customer
FROM orders
INNER JOIN order_products ON orders.id = order_products.order_id
WHERE order_products.vat = '21'
GROUP BY orders.id, orders.customer;
or...
SELECT DISTINCT orders.id, orders.customer
FROM orders
INNER JOIN order_products ON orders.id = order_products.order_id
WHERE order_products.vat = '21';
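Both shapes can be compared on the sample data. The following sketch uses Python's sqlite3 module with an in-memory SQLite database (an assumption; the question is about MySQL), stores vat as an integer, and gives the fourth product row id 4 so that ids stay unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_products (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT, vat INTEGER);
    INSERT INTO orders VALUES (1, 'John Doe'), (2, 'Hello World');
    INSERT INTO order_products VALUES
        (1, 1, 'Porsche 911 GT4', 21),
        (2, 1, 'Audi R8 LMS', 21),
        (3, 1, 'Ferrari Enzo', 19),
        (4, 2, 'Bugatti Veyron', 19);
""")

# Variant 1: join, then collapse the duplicate order rows with DISTINCT.
via_join = conn.execute("""
    SELECT DISTINCT o.id, o.customer
    FROM orders o
    INNER JOIN order_products op ON op.order_id = o.id
    WHERE op.vat = 21
    ORDER BY o.id
""").fetchall()

# Variant 2: no join at all, just test membership in a subquery.
via_in = conn.execute("""
    SELECT id, customer
    FROM orders
    WHERE id IN (SELECT order_id FROM order_products WHERE vat = 21)
    ORDER BY id
""").fetchall()

print(via_join)  # [(1, 'John Doe')]
print(via_in)    # [(1, 'John Doe')]
```

Order 1 is listed once even though two of its products carry a VAT of 21.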
If you only care about the orders then you can group by the order id:
SELECT orders.id, orders.customer, order_products.product
FROM orders
LEFT JOIN order_products ON orders.id = order_products.order_id
WHERE order_products.vat = '21'
GROUP BY orders.id;
However, note that order_products.product will only be one of the products. If you wish to display all of the products in one column, you can use GROUP_CONCAT:
SELECT orders.id, orders.customer, GROUP_CONCAT(order_products.product)
FROM orders
LEFT JOIN order_products ON orders.id = order_products.order_id
WHERE order_products.vat = '21'
GROUP BY orders.id;
This will return:
1 John Doe Porsche, Audi R8 LMS
SELECT DISTINCT o.id
, o.customer
FROM orders o
JOIN order_products op
ON op.order_id = o.id
WHERE op.vat = 21;

Pass value from query 1 to query 2

I have a query that joins 2 tables as below:
SELECT * FROM Staff s INNER JOIN Account a on s.AccNo = a.AccNo WHERE a.Status = 'Active'
The result as shown:
AccNo | Name | ID
------------------
1 | Alex | S01
2 | John | S02
After I get the staff ID, I write a second query to find the max sale, as below:
SELECT s.ProductID,Max(s.Amount) from Sales s WHERE StaffID = 'S01' GROUP BY s.ProductID
The max sale for Staff 'S01' as below:
ProductID | Amount
------------------
Cloth | 2000
How do I combine these 2 queries to get the result below? Thanks
AccNo | Name | ID | Amount
--------------------------
1 | Alex | S01 | 2000
2 | John | S02 | 5000
You can create a subquery and join it:
SELECT a.AccNo, b.Name, b.ID, c.maximum
FROM transaction as a
INNER JOIN Account as b
ON a.AccNo = b.AccNo
LEFT JOIN (SELECT StaffID, Max(Amount) as maximum FROM Sales GROUP BY StaffID) as c
ON c.StaffID = b.ID
WHERE b.Status = 'Active'
See the SQLFiddle example (I've tried to guess the schema)
So what you want to do is join to sales on the staffId then group.
SELECT a.AccNo,a.Name,a.ID,Max(s.Amount)
FROM Transaction t
INNER JOIN Account a on t.AccNo = a.AccNo
INNER JOIN Sales s on s.staffId = a.ID
WHERE a.Status = 'Active'
GROUP BY a.AccNo,a.Name,a.ID
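A small runnable sketch of the join-then-group idea follows. The schema is guessed, as in the answers above: it assumes Account holds AccNo, Name, ID and Status, that Sales holds StaffID, ProductID and Amount, and it leaves the Transaction table out. The "Shoes" and "Hat" sales and the inactive account are hypothetical rows added for illustration. It runs on SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccNo INTEGER PRIMARY KEY, Name TEXT, ID TEXT, Status TEXT);
    CREATE TABLE Sales (StaffID TEXT, ProductID TEXT, Amount INTEGER);
    INSERT INTO Account VALUES (1, 'Alex', 'S01', 'Active'),
                               (2, 'John', 'S02', 'Active'),
                               (3, 'Zoe', 'S03', 'Inactive');   -- hypothetical inactive account
    INSERT INTO Sales VALUES ('S01', 'Cloth', 2000), ('S01', 'Shoes', 1500),
                             ('S02', 'Cloth', 5000), ('S02', 'Hat', 100),
                             ('S03', 'Cloth', 9000);
""")

# One row per active account, carrying that staff member's largest sale.
max_sales = conn.execute("""
    SELECT a.AccNo, a.Name, a.ID, MAX(s.Amount) AS Amount
    FROM Account a
    INNER JOIN Sales s ON s.StaffID = a.ID
    WHERE a.Status = 'Active'
    GROUP BY a.AccNo, a.Name, a.ID
    ORDER BY a.AccNo
""").fetchall()

print(max_sales)  # [(1, 'Alex', 'S01', 2000), (2, 'John', 'S02', 5000)]
```

Grouping by the account columns only (and not by ProductID) is what keeps the result at one row per staff member.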
You could try something like this:
Select Account.*, Max(Sales.amount) from Sales
JOIN Account ON Sales.StaffID = Account.ID
where Account.status = 'Active'
group by Account.AccNo, Account.Name, Account.ID
Honestly, I don't understand why you include the Transaction table in your queries, because you don't use it.
I think this should work.
Just do a join and retrieve the max amount associated with each staff member:
SELECT a.AccNo, a.Name, a.ID, Max(s.Amount) FROM Transaction t
INNER JOIN Account a ON t.AccNo = a.AccNo
INNER JOIN Sales s ON s.StaffID = a.ID
WHERE a.Status = 'Active'
GROUP BY a.AccNo, a.Name, a.ID;

SQL how to select unique one-to-one pairings

edited to make clearer - many apologies for the confusion of the original example
I have the following table structure representing married couples:
id | Person | Spouse
______________________
1 | Mary | John
2 | John | Mary
3 | Katy | Bob
4 | Bob | Katy
5 | Mary | John
6 | John | Mary
In this example Mary is married to John, Katy to Bob and a different Mary is married to a different John.
How can I retrieve these pairs of married couples?
I have got close with this:
SELECT
p.id id1,
q.id id2
FROM
people p
INNER JOIN people q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
ORDER BY p.id
However this returns:
1 | 2 (1st Mary & 1st John)
1 | 6 (1st Mary & 2nd John) *problem*
2 | 5 (1st John & 2nd Mary) *problem*
3 | 4 (Katy & Bob)
5 | 6 (2nd Mary & 2nd John)
How can I make sure the 1st Mary and 1st John are only married once (i.e. remove the problem rows above)?
Many thanks
Here's the SQL to create the example:
CREATE TABLE people
(`id` int, `person` varchar(7), `spouse` varchar(7))
;
INSERT INTO people
(`id`, `person`, `spouse`)
VALUES
(1, 'Mary', 'John'),
(2, 'John', 'Mary'),
(3, 'Katy', 'Bob'),
(4, 'Bob', 'Katy'),
(5, 'Mary', 'John'),
(6, 'John', 'Mary')
;
SELECT
p.id id1,
q.id id2
FROM
people p
INNER JOIN people q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
ORDER BY p.id
;
I'll give it a try:
SELECT
p.id AS id1,
q.id AS id2
FROM
people AS p
JOIN people AS q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.id < q.id
JOIN (SELECT
p.id, COUNT(*) AS rank
FROM
people AS p
INNER JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
GROUP BY p.id
) AS x ON
x.id = p.id
JOIN (SELECT
p.id, COUNT(*) AS rank
FROM
people AS p
INNER JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
GROUP BY p.id
) AS y ON
y.id = q.id AND
y.rank = x.rank ;
And another one:
SELECT
p.id AS id1,
q.id AS id2
FROM
people AS p
JOIN people AS q ON
p.person = q.spouse AND
q.person = p.spouse
JOIN people AS p2 ON
p.person = p2.person AND
p.spouse = p2.spouse AND
p.id >= p2.id
JOIN people AS q2 ON
q.person = q2.person AND
q.spouse = q2.spouse AND
q.id >= q2.id
WHERE
p.id < q.id
GROUP BY
p.id, q.id
HAVING
COUNT(DISTINCT p2.id) = COUNT(DISTINCT q2.id) ;
Both tested at SQL-Fiddle
It would be much simpler with window functions, which MySQL lacked before version 8.0 (almost all other DBMSs have them). Tested at Postgres fiddle:
WITH cte AS
( SELECT
id, person, spouse,
ROW_NUMBER() OVER( PARTITION BY person, spouse
ORDER BY id )
AS rn
FROM
people
)
SELECT
p.id AS id1,
q.id AS id2
FROM
cte AS p
JOIN cte AS q ON
p.person = q.spouse AND
q.person = p.spouse AND
p.rn = q.rn AND
p.id < q.id ;
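The window-function version can also be tried locally. This sketch runs it on the sample rows using Python's sqlite3 module; it assumes the bundled SQLite library is version 3.25 or newer, since older versions do not support ROW_NUMBER():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, person TEXT, spouse TEXT);
    INSERT INTO people VALUES
        (1, 'Mary', 'John'), (2, 'John', 'Mary'),
        (3, 'Katy', 'Bob'),  (4, 'Bob', 'Katy'),
        (5, 'Mary', 'John'), (6, 'John', 'Mary');
""")

# rn numbers the duplicates of each (person, spouse) combination by id,
# so the n-th 'Mary/John' row is paired only with the n-th 'John/Mary' row.
pairs = conn.execute("""
    WITH cte AS (
        SELECT id, person, spouse,
               ROW_NUMBER() OVER (PARTITION BY person, spouse ORDER BY id) AS rn
        FROM people
    )
    SELECT p.id AS id1, q.id AS id2
    FROM cte AS p
    JOIN cte AS q
      ON p.person = q.spouse
     AND q.person = p.spouse
     AND p.rn = q.rn
     AND p.id < q.id
    ORDER BY p.id
""").fetchall()

print(pairs)  # [(1, 2), (3, 4), (5, 6)]
```

The pairing still relies on row order by id, so it is a heuristic rather than a true identity for each person.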
In this example Mary is married to John, Katy to Bob and a different Mary is married to Richard.
Nothing in the data structure you show allows differentiating between those two “Marys”, because there is no difference between them.
Both are just the text literal Mary. If you want to differentiate between different people that might have the same name, then you need another criterion, and a unique one at that (for example, the id of the database record for each individual person).
Your database structure is wrong.
People like Mary, John, etc. do not have identity.
Some heuristic query might help, but it is not a reliable solution.
So, please, improve your data structure.
Not very elegant, but works:
SELECT p.id, q.id
FROM people p
INNER JOIN people q ON
p.person1 = q.person2 and
q.person1 = p.person2
which in fact uses the existence of an inverted row as a selector
There's lots of ways of doing it, however one of the most important reasons for using a database is that it holds lots of data - and there should rarely be times when you ever write a query which retrieves lots of data. Except in very unusual circumstances, and for homework assignments, the results should be filtered according to some criteria. Hence the most appropriate solution depends on what other stuff you add to the query later.
But here's a couple of examples of how to get the unique pairs:
SELECT a, b, GROUP_CONCAT(id)
FROM
(SELECT id
, IF (person>=spouse, person, spouse) as a
, IF (person>=spouse, spouse, person) as b
FROM yourtable ) AS pairs
GROUP BY a,b;
SELECT id, person, spouse
FROM yourtable s1
WHERE NOT EXISTS ( SELECT 1
FROM yourtable s2
WHERE s2.id>s1.id
AND s1.person=s2.spouse
AND s1.spouse=S2.person);
(there are several other solutions).