How to avoid groups but require a minimum count? - mysql

I have answered and read many questions on getting the greatest-n-per-group, but now I find myself needing the opposite.
I have a result set of students, dates, and projects that shows which students worked on which project on a given day.
I would like to see rows where multiple students worked on a project for that day. So if my result set looks like this:
+---------+------------+---------+
| student | date       | project |
+---------+------------+---------+
| 1       | 2014-12-04 | 1       |
| 2       | 2014-12-04 | 1       |
| 3       | 2014-12-04 | 1       |
| 1       | 2014-12-03 | 1       |
+---------+------------+---------+
I would only like to see the first three rows, so I can see that students 1,2,3 worked together on the same project on the same day. I could filter like this:
GROUP BY date, project
HAVING COUNT(*) > 1
But then only one row per (date, project) group is returned, rather than one row per student.

You can use your existing query as a subquery and join it back to the table to get the individual rows:
SELECT t1.* FROM table1 t1
JOIN
(
SELECT date, project
FROM table1
GROUP BY date, project
HAVING COUNT(*) > 1
) t
ON t1.date = t.date
AND t1.project = t.project
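A minimal sketch of the subquery join, using Python's built-in sqlite3 module as an in-memory stand-in for MySQL (an assumption; the table name `table1` and its columns come from the question's sample). The derived table finds the crowded (date, project) pairs, and the join returns every student row inside them:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (student INTEGER, date TEXT, project INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "2014-12-04", 1),
        (2, "2014-12-04", 1),
        (3, "2014-12-04", 1),
        (1, "2014-12-03", 1),
    ],
)

# The derived table t lists (date, project) pairs with more than one row;
# joining it back to table1 returns each individual row in those groups.
rows = conn.execute(
    """
    SELECT t1.student, t1.date, t1.project
    FROM table1 t1
    JOIN (
        SELECT date, project
        FROM table1
        GROUP BY date, project
        HAVING COUNT(*) > 1
    ) t ON t1.date = t.date AND t1.project = t.project
    ORDER BY t1.student
    """
).fetchall()

for row in rows:
    print(row)  # one line per student who shared the project that day
```

The lone 2014-12-03 row is excluded because its group has only one member.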

This should work.
Think of the table as two copies of the same data: join them on date and project, but only where the students differ.
If any rows survive the join, we know that another student worked on the same project on the same date. Each matching partner produces its own row, so group the results to collapse the duplicates, and you have what you're after.
SELECT a.student, a.date, a.project
FROM table1 a
INNER JOIN table1 b
ON a.date = b.date
AND a.project = b.project
AND a.student <> b.student
GROUP BY a.student, a.date, a.project
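A small sketch of the self-join, again assuming Python's sqlite3 module as a stand-in for MySQL and the hypothetical table name `table1`. It also counts the raw joined rows to show why the GROUP BY is needed: each student matches once per partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (student INTEGER, date TEXT, project INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "2014-12-04", 1),
        (2, "2014-12-04", 1),
        (3, "2014-12-04", 1),
        (1, "2014-12-03", 1),
    ],
)

JOIN_SQL = """
    FROM table1 a
    INNER JOIN table1 b
        ON a.date = b.date
       AND a.project = b.project
       AND a.student <> b.student
"""

# Without grouping, each of the 3 students matches 2 partners: 6 rows.
pair_count = conn.execute("SELECT COUNT(*) " + JOIN_SQL).fetchone()[0]

# Grouping collapses the matches back to one row per student/date/project.
rows = conn.execute(
    "SELECT a.student, a.date, a.project " + JOIN_SQL
    + " GROUP BY a.student, a.date, a.project ORDER BY a.student"
).fetchall()

print(pair_count)
print(rows)
```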

Related

MySQL Query for dates that have not been picked by a customer

I am using MySQL to build an application where a customer can pick available dates.
I want two queries: one that shows which slots each customer picked, and one that shows which dates have not yet been picked.
The setup
I have a list of time slots in the form of dates
TABLE: timeslots
slot_id | date
1 | 2020-10-01
2 | 2020-10-02
3 | 2020-10-03
I also have a customer table
TABLE: customers
customer_id | name
1 | Anders
2 | Joe
3 | Karen
Each customer can pick any dates they like; the picks are stored in the customer_timeslot table, which has two foreign keys.
TABLE: customer_timeslot
customer_id | slot_id
1 | 1
1 | 2
2 | 1
3 | 1
First query all good
The first query is easy enough and gives me the dates Anders has picked.
The query for the dates Anders (cust. 1) picked
SELECT timeslots.date AS Date, customers.name AS Customer FROM timeslots
JOIN customer_timeslot
USING (slot_id)
JOIN customers
USING (customer_id)
WHERE customers.customer_id = 1
Result query 1
Date | Customer
2020-10-01 | Anders
2020-10-02 | Anders
The result I want for the second query
I want the date Anders has not picked yet which would look like this
Date | Customer
2020-10-03 | Anders
What I've tried
I've tried using LEFT JOIN instead of JOIN:
SELECT timeslots.date AS Date, customers.name AS Customer FROM timeslots
LEFT JOIN customer_timeslot
USING (slot_id)
JOIN customers
USING (customer_id)
WHERE customers.customer_id = 1
I expected this to give me the following result, but instead it returns exactly the same rows as the INNER JOIN (no NULL to work with):
Date | Customer
2020-10-01 | Anders
2020-10-02 | Anders
2020-10-03 | NULL
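How can I write the desired query? I guess it shouldn't be that complicated, but I'm completely stuck and looking for some help.
Your LEFT JOIN attempt produces no NULL rows because the following inner JOIN to customers and the WHERE filter on customers.customer_id discard every unmatched timeslot. You could use NOT EXISTS instead: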
select t.*
from timeslots t
where not exists (
select 1
from customer_timeslot ct
where ct.customer_id = 1 and ct.slot_id = t.slot_id
)
This returns the timeslots that customer_id 1 did not pick. You can get the information for all customers at once with a cross join, then not exists:
select t.date, c.name
from timeslots t
cross join customers c
where not exists (
select 1
from customer_timeslot ct
where ct.customer_id = c.customer_id and ct.slot_id = t.slot_id
)
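A runnable sketch of both NOT EXISTS queries, assuming Python's sqlite3 module as an in-memory stand-in for MySQL. The schema and rows mirror the question; the foreign-key declarations are illustrative only (SQLite does not enforce them unless enabled).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE timeslots (slot_id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_timeslot (
        customer_id INTEGER REFERENCES customers(customer_id),
        slot_id INTEGER REFERENCES timeslots(slot_id)
    );
    INSERT INTO timeslots VALUES (1, '2020-10-01'), (2, '2020-10-02'), (3, '2020-10-03');
    INSERT INTO customers VALUES (1, 'Anders'), (2, 'Joe'), (3, 'Karen');
    INSERT INTO customer_timeslot VALUES (1, 1), (1, 2), (2, 1), (3, 1);
    """
)

# Slots with no matching pick for customer 1 (Anders).
unpicked_by_anders = [
    r[0]
    for r in conn.execute(
        """
        SELECT t.date FROM timeslots t
        WHERE NOT EXISTS (
            SELECT 1 FROM customer_timeslot ct
            WHERE ct.customer_id = 1 AND ct.slot_id = t.slot_id
        )
        ORDER BY t.slot_id
        """
    )
]

# Every (slot, customer) combination, minus the ones actually picked.
unpicked_all = conn.execute(
    """
    SELECT t.date, c.name
    FROM timeslots t CROSS JOIN customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM customer_timeslot ct
        WHERE ct.customer_id = c.customer_id AND ct.slot_id = t.slot_id
    )
    ORDER BY c.customer_id, t.slot_id
    """
).fetchall()

print(unpicked_by_anders)
print(unpicked_all)
```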

SQL left join: how to return the newest from tableB and grouped by another field

I've been trying for two days, without luck.
I have the following simplified tables in my database:
customers:
| id | name |
| 1 | andrea |
| 2 | marco |
| 3 | giovanni |
access:
| id | name_id | date |
| 1 | 1 | 5000 |
| 2 | 1 | 4000 |
| 3 | 2 | 1500 |
| 4 | 2 | 3000 |
| 5 | 2 | 1000 |
| 6 | 3 | 6000 |
| 7 | 3 | 2000 |
I want to return all the names with their last access date.
At first I tried simply with
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id
But I got 7 rows instead of the 3 I expected. So I figured I need to use a GROUP BY statement, as follows:
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id GROUP BY customers.id
As far as I know, GROUP BY takes the non-aggregated column values from an arbitrary row of each group. In fact, I got unpredictable access dates across several tests.
Instead, I need to group every customer id with its corresponding latest access! How can this be done?
You have to get the latest date from the access table with a GROUP BY on name_id, then join this result with the customers table. Here is the query:
select c.id, c.name, a.last_access_date from customers c left join
(select name_id, max(date) last_access_date from access group by name_id) a
on c.id=a.name_id;
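A minimal sketch of the aggregate-then-join approach, assuming Python's sqlite3 module as an in-memory stand-in for MySQL and the sample rows from the question (the `date` column holds plain integers there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access (id INTEGER PRIMARY KEY, name_id INTEGER, date INTEGER);
    INSERT INTO customers VALUES (1, 'andrea'), (2, 'marco'), (3, 'giovanni');
    INSERT INTO access VALUES
        (1, 1, 5000), (2, 1, 4000),
        (3, 2, 1500), (4, 2, 3000), (5, 2, 1000),
        (6, 3, 6000), (7, 3, 2000);
    """
)

# Aggregate access first (one row per name_id), then LEFT JOIN so every
# customer appears even without any access rows.
rows = conn.execute(
    """
    SELECT c.id, c.name, a.last_access_date
    FROM customers c
    LEFT JOIN (
        SELECT name_id, MAX(date) AS last_access_date
        FROM access
        GROUP BY name_id
    ) a ON c.id = a.name_id
    ORDER BY c.id
    """
).fetchall()

print(rows)
```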
I think this is what you'd like to achieve:
SELECT c.id, c.name, max(a.date) last_access
FROM customers c
LEFT JOIN access a ON c.id = a.name_id
GROUP BY c.id, c.name
The LEFT JOIN returns every row in customers regardless of whether the join condition (c.id = a.name_id) is satisfied, so customers with no access rows get NULL for last_access.
Example:
Add a new row to the customers table (id: 4, name: manuela). The output will then have 4 rows, and the new one will be (id: 4, name: manuela, last_access: NULL).
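The example above can be checked with a short sketch, assuming Python's sqlite3 module as an in-memory stand-in for MySQL. It adds the hypothetical customer manuela with no access rows and runs the GROUP BY query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access (id INTEGER PRIMARY KEY, name_id INTEGER, date INTEGER);
    INSERT INTO customers VALUES
        (1, 'andrea'), (2, 'marco'), (3, 'giovanni'), (4, 'manuela');
    INSERT INTO access VALUES
        (1, 1, 5000), (2, 1, 4000),
        (3, 2, 1500), (4, 2, 3000), (5, 2, 1000),
        (6, 3, 6000), (7, 3, 2000);
    """
)

# MAX() over the LEFT JOIN; manuela has no access rows, so MAX() is NULL.
rows = conn.execute(
    """
    SELECT c.id, c.name, MAX(a.date) AS last_access
    FROM customers c
    LEFT JOIN access a ON c.id = a.name_id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()

print(rows)  # SQL NULL comes back as Python None
```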
I would do this using a correlated subquery in the ON clause:
SELECT a.*, c.*
FROM customers c LEFT JOIN
access a
ON c.id = a.name_id AND
a.DATE = (SELECT MAX(a2.date) FROM access a2 WHERE a2.name_id = a.name_id);
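A sketch of the correlated-subquery join, assuming Python's sqlite3 module as an in-memory stand-in for MySQL. Unlike the GROUP BY versions, it returns the whole matching access row, so the access id of the latest visit is available too (if two rows tie on the maximum date, both would be returned):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access (id INTEGER PRIMARY KEY, name_id INTEGER, date INTEGER);
    INSERT INTO customers VALUES (1, 'andrea'), (2, 'marco'), (3, 'giovanni');
    INSERT INTO access VALUES
        (1, 1, 5000), (2, 1, 4000),
        (3, 2, 1500), (4, 2, 3000), (5, 2, 1000),
        (6, 3, 6000), (7, 3, 2000);
    """
)

# Only the access row whose date equals that customer's maximum joins.
rows = conn.execute(
    """
    SELECT c.id, c.name, a.id, a.date
    FROM customers c
    LEFT JOIN access a
        ON c.id = a.name_id
       AND a.date = (SELECT MAX(a2.date) FROM access a2
                     WHERE a2.name_id = a.name_id)
    ORDER BY c.id
    """
).fetchall()

print(rows)
```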
If this statement is true:
I need to group every customer id with its corresponding latest access! How can this be done?
Then you can simply do:
select a.name_id, max(a.date)
from access a
group by a.name_id;
You do not need the customers table because:
All customers are in access, so the left join is not necessary.
You need no columns from customers.

How to format and print a result of sql query

I recently got this question in an interview and failed to answer it. The task was to list the number of duplicates in the employer column of a table like this:
id | employer | employee
1 | isro | dude1
2 | isro | dude 2
3 | cnd | dude 3
4 | df | dsfdef
5 | dfdf | dfdfd
...
So the result should look like this:
isro = 2
df = 4
dfsf = 6
How do I achieve this?
I know I could use COUNT(*) in a SELECT statement with a WHERE clause, but how do I get the result above?
The HAVING clause can be used to filter on aggregated values:
SELECT employer, COUNT(*)
FROM yourTable
GROUP BY employer
HAVING COUNT(*) > 1
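Since the question also asks how to print the result, here is a small sketch that runs the HAVING query and formats each row as `employer = count`. It assumes Python's sqlite3 module as an in-memory stand-in and uses only the five rows shown in the question (the rest are elided), so only isro appears:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, employer TEXT, employee TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "isro", "dude1"),
        (2, "isro", "dude 2"),
        (3, "cnd", "dude 3"),
        (4, "df", "dsfdef"),
        (5, "dfdf", "dfdfd"),
    ],
)

# HAVING filters on the aggregated count, after grouping.
counts = conn.execute(
    """
    SELECT employer, COUNT(*)
    FROM yourTable
    GROUP BY employer
    HAVING COUNT(*) > 1
    ORDER BY employer
    """
).fetchall()

# Format each (employer, count) pair the way the question asks for.
lines = [f"{employer} = {count}" for employer, count in counts]
print("\n".join(lines))
```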
Assuming TableName is the name of the table you want to select from, this would be your answer:
SELECT employer, count(employer)
FROM TableName
GROUP BY employer
HAVING COUNT(*) > 1
Here is an answer to a very similar question with more information:
How to count occurrences of a column value efficiently in SQL?

MySQL Aggregate Function with group by and join

I have the following table schemas, and I want to get the sum of the amount column for each category and the count of employees in each category.
employees
id | name | category
1 | SC | G 1.2
2 | BK | G 2.2
3 | LM | G 2.2
payroll_histories
id | employee_id | amount
1 | 1 | 1000
2 | 1 | 500
3 | 2 | 200
4 | 2 | 100
5 | 3 | 300
Output table should look like this:
category | total | count
G 1.2 | 1500 | 1
G 2.2 | 600 | 2
I have tried the query below; it sums and groups correctly, but I cannot get the count to work.
SELECT
employee_id,
category,
SUM(amount) from payroll_histories,employees
WHERE employees.id=payroll_histories.employee_id
GROUP BY category;
I have also tried COUNT(category), but that doesn't work either.
You are, I believe, seeking two different summaries of your data. One is a sum of salaries by category, and the other is a count of employees, also by category.
You need to use, and then join, separate aggregate queries to get this.
SELECT a.category, a.amount, b.cnt
FROM (
SELECT e.category, SUM(p.amount) amount
FROM employees e
JOIN payroll_histories p ON e.id = p.employee_id
GROUP BY e.category
) a
JOIN (
SELECT category, COUNT(*) cnt
FROM employees
GROUP BY category
) b ON a.category = b.category
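A runnable sketch of the two-aggregates-then-join approach, assuming Python's sqlite3 module as an in-memory stand-in for MySQL and the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE payroll_histories (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
    INSERT INTO employees VALUES (1, 'SC', 'G 1.2'), (2, 'BK', 'G 2.2'), (3, 'LM', 'G 2.2');
    INSERT INTO payroll_histories VALUES
        (1, 1, 1000), (2, 1, 500), (3, 2, 200), (4, 2, 100), (5, 3, 300);
    """
)

# Aggregate payroll and employees separately, then join the two summaries.
rows = conn.execute(
    """
    SELECT a.category, a.amount, b.cnt
    FROM (
        SELECT e.category, SUM(p.amount) AS amount
        FROM employees e
        JOIN payroll_histories p ON e.id = p.employee_id
        GROUP BY e.category
    ) a
    JOIN (
        SELECT category, COUNT(*) AS cnt
        FROM employees
        GROUP BY category
    ) b ON a.category = b.category
    ORDER BY a.category
    """
).fetchall()

print(rows)
```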
The general principle here is to avoid trying to use just one aggregate query to aggregate more than one kind of detail entity. Your amount aggregates payroll totals, whereas your count aggregates employees.
Alternatively, for your specific case, this query will also work. But it doesn't generalize well, nor does it necessarily perform well.
SELECT e.category, SUM(p.amount) amount, COUNT(DISTINCT e.id) cnt
FROM employees e
JOIN payroll_histories p ON e.id = p.employee_id
GROUP BY e.category
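The COUNT(DISTINCT e.id) compensates for the row multiplication that the join introduces: each employee appears once per payroll row, so a plain COUNT(*) would count payroll rows instead of employees.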
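To see that row multiplication directly, this sketch (assuming Python's sqlite3 module as an in-memory stand-in) puts COUNT(*) next to COUNT(DISTINCT e.id) over the same join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE payroll_histories (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
    INSERT INTO employees VALUES (1, 'SC', 'G 1.2'), (2, 'BK', 'G 2.2'), (3, 'LM', 'G 2.2');
    INSERT INTO payroll_histories VALUES
        (1, 1, 1000), (2, 1, 500), (3, 2, 200), (4, 2, 100), (5, 3, 300);
    """
)

# COUNT(*) counts joined (payroll) rows; COUNT(DISTINCT e.id) counts employees.
rows = conn.execute(
    """
    SELECT e.category, SUM(p.amount), COUNT(*), COUNT(DISTINCT e.id)
    FROM employees e
    JOIN payroll_histories p ON e.id = p.employee_id
    GROUP BY e.category
    ORDER BY e.category
    """
).fetchall()

for category, total, joined_rows, employees in rows:
    print(category, total, joined_rows, employees)
```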
(Pro tip: use the explicit join rather than the outmoded table,table WHERE form of the join. It's easier to read.)

MySQL conditionally populate column 3 based on DISTINCT involving 2 other columns in one table

I've had a good read through similar topics, but I can't quite (a) find one that matches my scenario, or (b) understand the others well enough to tailor them to my situation.
I have a table; the important fields are:
+------+------+--------+--------+
| ID   | Name | Price  | Status |
+------+------+--------+--------+
|    1 | Fred |   4.50 |        |
|    2 | Fred |   4.50 |        |
|    3 | Fred |   5.00 |        |
|    4 | John |   7.20 |        |
|    5 | John |   7.20 |        |
|    6 | John |   7.20 |        |
|    7 | Max  |   2.38 |        |
|    8 | Max  |   2.38 |        |
|    9 | Sam  |  21.00 |        |
+------+------+--------+--------+
ID is an auto-incrementing value as records get added throughout the day.
NAME is a key field that can repeat 1 to 3 times in the whole table.
Each row has a PRICE value, which may or may not be the same across rows with the same NAME.
There is also a STATUS field that needs to be populated based on the following rules, which is the part I am stuck on:
Status = 'Y' if the row's name has only one distinct price attached to it.
Status = 'N' if the row's name has multiple distinct prices attached to it.
Using the table above, IDs 1, 2 and 3 should be 'N', whilst 4, 5, 6, 7, 8 and 9 should be 'Y'.
I think this may well involve some combination of JOINs, GROUP BYs, and DISTINCTs, but I am at a loss on how to put them in the right order in SQL.
To count the distinct Price values per name, we need a GROUP BY on the Name field. But since you also want to display every row ungrouped, with an additional Status field, we first create a subselect in the FROM clause that groups by Name and determines whether each name has multiple price values.
When we GROUP BY Name in the subselect, COUNT(DISTINCT price) will count the number of distinct price values for each particular name. Without the DISTINCT keyword, it would simply count the number of rows where price is not null.
In conjunction with that, we use a CASE expression to insert N into the Status column if there is more than one distinct Price value for the particular name, otherwise, it will insert Y.
The subselect only returns one row per Name, so to get all names ungrouped, we join that subselect to the main table on the condition that the subselect's Name = the main table's Name:
SELECT
b.ID,
b.Name,
b.Price,
a.Status
FROM
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) a
INNER JOIN
tbl b ON a.Name = b.Name
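A minimal sketch of the grouped subselect joined back to the table, assuming Python's sqlite3 module as an in-memory stand-in for MySQL and the sample rows from the question (prices stored as REAL here, an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL, Status TEXT)")
conn.executemany(
    "INSERT INTO tbl (ID, Name, Price) VALUES (?, ?, ?)",
    [
        (1, "Fred", 4.50), (2, "Fred", 4.50), (3, "Fred", 5.00),
        (4, "John", 7.20), (5, "John", 7.20), (6, "John", 7.20),
        (7, "Max", 2.38), (8, "Max", 2.38), (9, "Sam", 21.00),
    ],
)

# Subselect a: one row per Name with its computed Status;
# joining back to tbl restores one row per ID.
rows = conn.execute(
    """
    SELECT b.ID, b.Name, b.Price, a.Status
    FROM (
        SELECT Name,
               CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
        FROM tbl
        GROUP BY Name
    ) a
    INNER JOIN tbl b ON a.Name = b.Name
    ORDER BY b.ID
    """
).fetchall()

status_by_id = {row[0]: row[3] for row in rows}
print(status_by_id)
```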
Edit: To perform the update, you can use the same subselect in a multi-table UPDATE with a JOIN, like so:
UPDATE
tbl a
INNER JOIN
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) b ON a.Name = b.Name
SET
a.Status = b.Status
This assumes your table already has an unfilled Status column.
If you want to update the status column, you could do:
UPDATE mytable s
SET status = (
SELECT IF(COUNT(DISTINCT price)=1, 'Y', 'N') c
FROM (
SELECT *
FROM mytable
) s1
WHERE s1.name = s.name
GROUP BY name
);
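A sketch of the correlated-subquery UPDATE, assuming Python's sqlite3 module as an in-memory stand-in for MySQL. SQLite has no MySQL-style IF(), so CASE is used instead, and SQLite does not impose MySQL's restriction on selecting from the table being updated; the derived-table wrapper is kept only to mirror the MySQL version. The GROUP BY is dropped because the WHERE already limits the subquery to a single name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, price REAL, status TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, name, price) VALUES (?, ?, ?)",
    [
        (1, "Fred", 4.50), (2, "Fred", 4.50), (3, "Fred", 5.00),
        (4, "John", 7.20), (5, "John", 7.20), (6, "John", 7.20),
        (7, "Max", 2.38), (8, "Max", 2.38), (9, "Sam", 21.00),
    ],
)

# For each row, count the distinct prices among rows sharing its name.
# mytable.name in the WHERE refers to the outer row being updated.
conn.execute(
    """
    UPDATE mytable
    SET status = (
        SELECT CASE WHEN COUNT(DISTINCT price) = 1 THEN 'Y' ELSE 'N' END
        FROM (SELECT * FROM mytable) s1
        WHERE s1.name = mytable.name
    )
    """
)

statuses = dict(conn.execute("SELECT id, status FROM mytable ORDER BY id").fetchall())
print(statuses)
```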
Technically, it should not be necessary to have this:
FROM (
SELECT *
FROM mytable
) s1
but MySQL does not let a subquery select from the table you're updating. Wrapping it in a derived table (the parenthesized SELECT) forces MySQL to materialize it as a temporary table, and then the query becomes possible.