I need to retrieve the employees' presences for the day. A presence has one of two states: In or Out.
If an employee has no presence for the day, the employee should still be returned, with a status of null.
I have two tables, Employees and Presences, and I want to join them. The employees table has the following data:
ID | name
1 John
2 Julie
3 Anthony
4 Joseph
Now the presences table has the following data:
ID | employee_id | presence_date | presence_hour | Movement
1 1 2016-08-30 08:55 In
2 2 2016-08-30 08:56 In
3 3 2016-08-30 08:57 In
4 1 2016-08-30 12:33 Out
5 2 2016-08-30 12:34 Out
As you can see in the presences data, the employee Anthony has not yet left the office and the employee Joseph has no entries in the table.
The result I'm expecting:
Employee | Movement
John Out
Julie Out
Anthony In
Joseph null
The query I'm using:
SELECT employee.name, presence.movement
FROM employees AS employee
LEFT JOIN presences AS presence ON presence.employee_id = employee.id
WHERE presence.presence_date = '2016-08-30' AND
employee.id IN (1, 2, 3, 4)
GROUP BY employee.id
ORDER BY employee.name, presence.id DESC
The problems I'm facing:
Joseph never appears in the data
presence.id DESC doesn't work
For Joseph presence.presence_date is null, so it is not matched by presence.presence_date = '2016-08-30'.
The ORDER BY presence.id makes no sense to me. You are grouping by employee, so all matching rows in presences for that employee are merged together. Do you want to sort those rows by presence.id and select the most recent row's movement value? That does not work the way you wrote it, because ORDER BY is applied after the grouping. One solution is to use MAX(presence.id) to get the id of the most recent presence row for each employee, and then join the presences table again to get the data you want.
SELECT a.name, b.movement
FROM (
SELECT employee.id, employee.name, MAX(presence.id) AS max_id
FROM employees AS employee
LEFT JOIN presences AS presence
ON presence.employee_id = employee.id AND presence.presence_date = '2016-08-30'
WHERE employee.id IN (1, 2, 3, 4)
GROUP BY employee.id, employee.name
) a
LEFT JOIN presences b ON a.max_id = b.id
ORDER BY a.name
Note that it may not be a good idea to assume that the most recent row is the one with the biggest id. One could instead select the row with the most recent date and hour, but that is a separate problem.
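To check the MAX(id) idea end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL itself is plain enough to carry over). The data is the sample from the question. The date condition sits in the ON clause so that Joseph is kept, and the ordering by id is only there to reproduce the order of the expected result.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE presences (
    id INTEGER PRIMARY KEY,
    employee_id INTEGER,
    presence_date TEXT,
    presence_hour TEXT,
    movement TEXT
);
INSERT INTO employees VALUES (1,'John'),(2,'Julie'),(3,'Anthony'),(4,'Joseph');
INSERT INTO presences VALUES
    (1,1,'2016-08-30','08:55','In'),
    (2,2,'2016-08-30','08:56','In'),
    (3,3,'2016-08-30','08:57','In'),
    (4,1,'2016-08-30','12:33','Out'),
    (5,2,'2016-08-30','12:34','Out');
""")

# Inner query: highest presence id per employee for the day (NULL if none).
# Outer query: join back to presences to read that row's movement.
latest_movement = conn.execute("""
SELECT a.name, b.movement
FROM (
    SELECT e.id, e.name, MAX(p.id) AS max_id
    FROM employees AS e
    LEFT JOIN presences AS p
           ON p.employee_id = e.id
          AND p.presence_date = '2016-08-30'
    GROUP BY e.id, e.name
) AS a
LEFT JOIN presences AS b ON b.id = a.max_id
ORDER BY a.id
""").fetchall()

for name, movement in latest_movement:
    print(name, movement)
```

Joseph comes back with a movement of None (SQL NULL), because his max_id is NULL and the second LEFT JOIN finds no matching row.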
Joseph is missing because the date filter is applied in the WHERE clause. The WHERE clause is evaluated after the join, so it eliminates the NULL-extended row that the LEFT JOIN produced for Joseph, who was not present that day. Move the date condition into the join condition instead.
The GROUP BY is also wrong. Your query violates the SQL standard because the select list contains columns that are neither in the GROUP BY list nor arguments of an aggregate function such as MAX(). MySQL allows such queries only under certain SQL mode settings (when ONLY_FULL_GROUP_BY is disabled). Use MAX() on the movement and group by the employee name and date.
Sample query, assuming at most one In and one Out per employee per day (MAX() works here because 'Out' sorts after 'In'):
SELECT employee.name, max(presence.movement) as movement
FROM employees AS employee
LEFT JOIN presences AS presence ON presence.employee_id = employee.id and date(presence.presence_date) = '2016-08-30'
WHERE employee.id IN (1, 2, 3, 4)
GROUP BY employee.name, date(presence.presence_date)
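The difference between filtering in WHERE and filtering in ON is easy to see side by side. The sketch below is an illustration only: it uses Python's sqlite3 module instead of MySQL (an assumption) and a trimmed-down presences table without the hour column. Both queries are identical except for where the date condition lives.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE presences (id INTEGER PRIMARY KEY, employee_id INTEGER,
                        presence_date TEXT, movement TEXT);
INSERT INTO employees VALUES (1,'John'),(2,'Julie'),(3,'Anthony'),(4,'Joseph');
INSERT INTO presences VALUES
    (1,1,'2016-08-30','In'), (2,2,'2016-08-30','In'), (3,3,'2016-08-30','In'),
    (4,1,'2016-08-30','Out'), (5,2,'2016-08-30','Out');
""")

# Date filter in WHERE: applied after the join, so the NULL-extended
# row for Joseph is thrown away.
filter_in_where = conn.execute("""
SELECT e.name, MAX(p.movement) AS movement
FROM employees AS e
LEFT JOIN presences AS p ON p.employee_id = e.id
WHERE p.presence_date = '2016-08-30'
GROUP BY e.id, e.name
ORDER BY e.id
""").fetchall()

# Date filter in ON: only restricts which presences rows can match,
# so every employee survives the LEFT JOIN.
filter_in_on = conn.execute("""
SELECT e.name, MAX(p.movement) AS movement
FROM employees AS e
LEFT JOIN presences AS p
       ON p.employee_id = e.id
      AND p.presence_date = '2016-08-30'
GROUP BY e.id, e.name
ORDER BY e.id
""").fetchall()

print("WHERE:", filter_in_where)
print("ON:   ", filter_in_on)
```

The first result has three rows and no Joseph; the second has four rows, with None as Joseph's movement.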
Related
I need to create a query from 2 tables, where my company stores e-shop information.
Example of data from the first table:
currentDate: 5.5.2022 | eshopId: 1 | eshopName: test | active: true |
Table 2:
currentDate: 5.5.2022 | eshopId: 1 | orderId: 123 | attribution: direct |
From the first table, I want to get how many days in a given period the e-shop was active. From the second table, I would like to count all the orders that were attributed directly to our company in the same time period as in the first table.
SELECT i.id, count(*)
from table1 as i
FULL JOIN table1 as e ON e.id= i.id
WHERE i.active = TRUE
GROUP BY i.id
I tried merging the same table twice, because after I used count to get amount of inactive dates, I could not use another variable as it was not aggregated. This still does not work. I cannot imagine how I would do this for 3 tables. Can someone please point me in the right direction on how to do this? Thanks.
If there is one row per day per eshopId and you want the number of active days along with the number of orders per eshopId:
SELECT i.eshopId, count(*) AS active_days, j.order_count
from table1 as i
left join (select eshopId, count(distinct orderId) as order_count from table2 group by eshopId) j on i.eshopId=j.eshopId
WHERE i.active = TRUE
GROUP BY i.eshopId, j.order_count
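One way to keep the two counts from interfering with each other is to aggregate each table on its own and join the two small results. The sketch below is an illustration under assumptions: it runs on Python's sqlite3 module rather than the asker's database, the rows are made up, dates are stored as ISO strings, the period is May 2022, and "attributed directly" is taken to mean attribution = 'direct'.

```python
import sqlite3

# In-memory SQLite database; table layout follows the question,
# the rows themselves are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (currentDate TEXT, eshopId INTEGER, eshopName TEXT, active INTEGER);
CREATE TABLE table2 (currentDate TEXT, eshopId INTEGER, orderId INTEGER, attribution TEXT);
INSERT INTO table1 VALUES
    ('2022-05-05', 1, 'test', 1),
    ('2022-05-06', 1, 'test', 1),
    ('2022-05-07', 1, 'test', 0),
    ('2022-05-05', 2, 'other', 1),
    ('2022-05-06', 2, 'other', 1);
INSERT INTO table2 VALUES
    ('2022-05-05', 1, 123, 'direct'),
    ('2022-05-06', 1, 124, 'direct'),
    ('2022-05-06', 1, 125, 'affiliate');
""")

period = ("2022-05-01", "2022-05-31")

# Each derived table is aggregated separately, so joining them cannot
# multiply rows and inflate either count.
per_shop = conn.execute("""
SELECT d.eshopId, d.active_days, COALESCE(o.direct_orders, 0) AS direct_orders
FROM (
    SELECT eshopId, COUNT(*) AS active_days
    FROM table1
    WHERE active = 1 AND currentDate BETWEEN ? AND ?
    GROUP BY eshopId
) AS d
LEFT JOIN (
    SELECT eshopId, COUNT(DISTINCT orderId) AS direct_orders
    FROM table2
    WHERE attribution = 'direct' AND currentDate BETWEEN ? AND ?
    GROUP BY eshopId
) AS o ON o.eshopId = d.eshopId
ORDER BY d.eshopId
""", period + period).fetchall()

print(per_shop)
```

A third table would follow the same pattern: one more pre-aggregated derived table, joined on eshopId.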
I have two tables, EMPLOYEE and DEPARTMENT.
EMPLOYEE's fields are ID, NAME, SALARY, DEPT_ID (foreign key to the DEPARTMENT table).
DEPARTMENT's fields are ID, NAME, LOCATION.
The output should list each DEPARTMENT_Name with the count of employees in that department, ordered by count descending; departments with the same count should appear alphabetically. The table values and the expected output are given below.
EMPLOYEE TABLE Values
id name salary dept_id
1 Candice 4685 1
2 Julia 2559 2
3 Bob 4405 4
4 Scarlet 2305 1
5 Ileana 1151 4
Department TABLE Values
id name location
1 Executive Sydney
2 Production Sydney
3 Resources Cape Town
4 Technical Texas
5 Management Paris
OUTPUT DATA SHOULD BE
DEPARTMENT_Name Count_OF_EMPLOYEE_SAME_DEPARTMENT
Executive 2,
Technical 2,
PRODUCTION 1,
MANAGEMENT 0,
RESOURCES 0
To show all departments even when they have no employees, you need a LEFT JOIN. Start with the department table (alias "d" in the query) and LEFT JOIN to the employee table (alias "e"). Short alias names that make sense in context improve readability.
Next, the count. COUNT(*) counts every row in the group, including the single NULL-extended row that the LEFT JOIN produces for a department with no employees, so such a department would be reported as 1. Counting e.id instead skips NULLs and gives 0. In addition to the count, I also summed the employee salaries, just to show that you can get multiple aggregate values in the same query. Use it or don't; it is only presented as an option.
Now the order. You want the highest count first, so COUNT(e.id) DESC (descending order) is the first sort key. The department name is the secondary key, to keep departments alphabetized within the same count.
select
d.`name` Department_Name,
d.Location,
count(e.id) NumberOfEmployees,
sum( coalesce( e.salary, 0 )) as DeptTotalSalary
from
Department d
left join employee e
on d.id = e.dept_id
group by
d.`name`,
d.Location
order by
count(e.id) desc,
d.`name`
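For anyone who wants to try this against the sample data from the question, here is a small sketch using Python's sqlite3 module in place of MySQL (an assumption). It counts e.id so that departments without employees show up with 0 rather than 1.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, dept_id INTEGER);
INSERT INTO department VALUES
    (1,'Executive','Sydney'), (2,'Production','Sydney'), (3,'Resources','Cape Town'),
    (4,'Technical','Texas'), (5,'Management','Paris');
INSERT INTO employee VALUES
    (1,'Candice',4685,1), (2,'Julia',2559,2), (3,'Bob',4405,4),
    (4,'Scarlet',2305,1), (5,'Ileana',1151,4);
""")

# COUNT(e.id) ignores the NULL produced for departments with no match.
department_counts = conn.execute("""
SELECT d.name, COUNT(e.id) AS employee_count
FROM department AS d
LEFT JOIN employee AS e ON e.dept_id = d.id
GROUP BY d.id, d.name
ORDER BY employee_count DESC, d.name
""").fetchall()

for name, employee_count in department_counts:
    print(name, employee_count)
```

Swapping COUNT(e.id) for COUNT(*) in this sketch would report Management and Resources with a count of 1, which is the pitfall to watch for.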
Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, or it might be possible to use a subquery (if every month is represented in the source data).
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
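A runnable sketch of the same idea, with one addition that is not in the query above: the month list is joined to each employee only within that employee's start_month..end_month range, so months outside the contract are never reported as missing. It uses Python's sqlite3 module instead of MySQL, and the months calendar table is a hypothetical helper filled from Python; both are assumptions. One out-of-range payment is added for Jane to mimic the "dirty" data mentioned in the question.

```python
import sqlite3

# In-memory SQLite database; employees/payments follow the question,
# months is a hypothetical calendar table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT);
CREATE TABLE payments (employee TEXT, paid_month TEXT);
CREATE TABLE months (month TEXT PRIMARY KEY);
INSERT INTO employees VALUES ('Jane','2017-05','2017-07'), ('Bob','2017-10','2017-12');
INSERT INTO payments VALUES
    ('Jane','2017-05'), ('Jane','2017-07'), ('Jane','2017-08'),
    ('Bob','2017-11'), ('Bob','2017-12');
""")
conn.executemany(
    "INSERT INTO months VALUES (?)",
    [(f"2017-{m:02d}",) for m in range(1, 13)],
)

# Expected (employee, month) pairs come from employees x months, limited
# to each employee's own range; 'YYYY-MM' strings compare correctly as text.
# Pairs with no matching payment are the gaps.
missing_payments = conn.execute("""
SELECT e.employee, m.month
FROM employees AS e
JOIN months AS m ON m.month BETWEEN e.start_month AND e.end_month
LEFT JOIN payments AS p
       ON p.employee = e.employee
      AND p.paid_month = m.month
WHERE p.employee IS NULL
ORDER BY e.employee, m.month
""").fetchall()

print(missing_payments)
```

Everything happens in a single query, so there are no round trips per employee.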
First, generate all the months between the start date and end date for each employee in a derived table. Then left outer join the payments table on the paid month and keep only the non-matching months (where the payment's employee name is null). This example uses Oracle syntax (CONNECT BY):
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35
I'm working on a project where I have some attendance data. I want to be able to print the top # attendees.
My query as is is set to order the list by # of events attended for each individual. I allow the user to set a limit (so, say top 50 attendees). The problem is that this doesn't do anything to account for ties, so I want to generate a rank in the query that I can then use to limit by.
My relevant schema is as follows:
Members Table:
Member Name | Member ID | # Events Attended
Events Table:
Event Name | Event ID | Other Stuff
This table is then used as a foreign key for an attendance table, which links members to events by using a foreign key that combines a Member and Event ID.
Attendance Table:
Attendance Log ID | Member FK | Event FK
So, my query as is is this:
SELECT `Member Name`, `Member ID` , COUNT( `Member ID` ) AS Attendances
FROM `Members` m
INNER JOIN
(SELECT *
FROM `Events` e
INNER JOIN `Attendance` r ON `Event ID` = `Event FK`
) er
ON `Member ID` = `Member FK`
GROUP BY `Member ID`
ORDER BY `Attendances` DESC
So, to summarize, how can I create a "rank" that I can use to limit results? So top 50 attendees is top 50 ranked attendees (so #entries >= 50), rather than 50 individuals (# entries always 50, cuts off ties).
Thanks all!
Edit1:
Sample output from query with no limit (show all results):
Member Name | Member ID | Attendances
Bob Saget 1 5
John Doe 2 4
Jane Doe 3 3
Stack Overflow 4 3
So, when users request "Show top 3 attendees" with my current query,
they would get the following:
Member Name | Member ID | Attendances
Bob Saget 1 5
John Doe 2 4
Jane Doe 3 3
when in reality, I'd like it to display the ties and show something like
Rank | Member Name | Member ID | Attendances
1 Bob Saget 1 5
2 John Doe 2 4
3 Jane Doe 3 3
3 Stack Overflow 4 3
You can try this:-
SELECT IF(Attendances = @_last_Attendances, @curRank:=@curRank, @curRank:=@_sequence) AS rank,
@_sequence:=@_sequence+1, @_last_Attendances:=Attendances, `Member Name`, `Member ID`,
COUNT( `Member ID` ) AS Attendances
FROM `Members` m
INNER JOIN (SELECT * FROM `Events` e
INNER JOIN `Attendance` r
ON `Event ID` = `Event FK`) er
ON `Member ID` = `Member FK`,
(SELECT @curRank := 1, @_sequence:=1, @_last_Attendances:=0) r
GROUP BY `Member Name`, `Member ID`, Rank
HAVING COUNT( `Member ID`) >= (SELECT MAX(`Member ID`)
FROM `Members`
WHERE `Member ID` < (SELECT MAX(`Member ID`)
FROM `Members`
WHERE `Member ID` < (SELECT MAX(`Member ID`)
FROM `Members`)))
ORDER BY COUNT(`Member ID`) DESC;
I think this approach will help you.
Doing this in two queries is going to be your best bet, otherwise the query gets really convoluted.
Here is a SQLFiddle showing your table schema, example data, and the queries we're talking about.
The first problem we need to break down is how to determine what the correct rank is. We can do this by doing the select but only returning a single value of the rank that is our new limit. Assuming we want the top 3 ranks we'll return only the third row (offset 2, limit 1).
# Pre-select the lowest rank allowed.
SELECT COUNT(a.attendanceId) INTO @lowestRank
FROM Member AS m
JOIN Event AS e
JOIN Attendance AS a USING (memberId, eventId)
GROUP BY m.memberId
ORDER BY COUNT(a.attendanceId) DESC
LIMIT 1 OFFSET 2;
Once we have @lowestRank we can run the query again, this time with a HAVING clause to restrict the GROUP BY results. By keeping only the groups whose attendance count is equal to or greater than @lowestRank, we have essentially applied a LIMIT on that field.
# Return all rows of the lowest rank or above.
SELECT m.name, m.memberId, COUNT(a.attendanceId) AS Attendances
FROM Member AS m
JOIN Event AS e
JOIN Attendance AS a USING (memberId, eventId)
GROUP BY m.memberId
HAVING COUNT(a.attendanceId) >= @lowestRank
ORDER BY Attendances DESC;
We could have done this in one query by making the first one a JOIN of the second one, but I don't recommend that because it complicates the queries, has potential performance impact, and makes it harder to change them independently.
For example, the first query only preserves ties at the cutoff point. If you wanted to treat all members with the same count as a single rank, you could change that query to consider only DISTINCT counts. In this particular data set the results would be the same, but if two members had four attendances each, we would then get three distinct ranks (5, 4, 4, 3, 3), whereas the query above returns only two (5, 4, 4).
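The two-query approach can be sketched as follows. This is an illustration under assumptions: it uses Python's sqlite3 module rather than MySQL, a simplified Member/Attendance schema without the Event table, the attendance counts from the question's sample output, and one extra hypothetical member ("Late Comer") so that the cutoff actually removes someone. A Python variable plays the role of the session variable.

```python
import sqlite3

# In-memory SQLite database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (memberId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Attendance (attendanceId INTEGER PRIMARY KEY, memberId INTEGER, eventId INTEGER);
INSERT INTO Member VALUES
    (1,'Bob Saget'), (2,'John Doe'), (3,'Jane Doe'), (4,'Stack Overflow'), (5,'Late Comer');
""")

# memberId -> number of events attended
attended = {1: 5, 2: 4, 3: 3, 4: 3, 5: 1}
conn.executemany(
    "INSERT INTO Attendance (memberId, eventId) VALUES (?, ?)",
    [(member_id, event_id)
     for member_id, n in attended.items()
     for event_id in range(1, n + 1)],
)

top_n = 3

# Step 1: attendance count of the member sitting in position top_n.
cutoff = conn.execute("""
SELECT COUNT(*) AS attendances
FROM Attendance
GROUP BY memberId
ORDER BY attendances DESC
LIMIT 1 OFFSET ?
""", (top_n - 1,)).fetchone()[0]

# Step 2: every member at or above that count, so ties at the cutoff stay in.
top_attendees = conn.execute("""
SELECT m.name, m.memberId, COUNT(a.attendanceId) AS attendances
FROM Member AS m
JOIN Attendance AS a ON a.memberId = m.memberId
GROUP BY m.memberId, m.name
HAVING COUNT(a.attendanceId) >= ?
ORDER BY attendances DESC, m.memberId
""", (cutoff,)).fetchall()

for row in top_attendees:
    print(row)
```

Note that if there are fewer than top_n members with any attendance, the first query returns no row and fetchone() gives None, so real code would need to handle that case.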
I can't seem to find a suitable solution for the following (probably an age old) problem so hoping someone can shed some light. I need to return 1 distinct column along with other non distinct columns in mySQL.
I have the following table in mySQL:
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
3 James Barbados 3 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
6 Declan Trinidad 3 WI
I would like SQL statement to return the DISTINCT name along with the destination, rating based on country.
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
As you can see, James and Declan have different ratings, but the same name, so they are returned only once.
The following query returns all rows because the ratings are different. Is there anyway I can return the above result set?
SELECT (distinct name), destination, rating
FROM table
WHERE country = 'WI'
ORDER BY id
Using a subquery, you can get the highest id for each name, then select the rest of the rows based on that:
SELECT * FROM table
WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name
)
If you'd prefer, use MIN(id) to get the first record for each name instead of the last.
It can also be done with an INNER JOIN against the subquery. For this purpose the performance should be similar, and sometimes you need to join on two columns from the subquery.
SELECT
table.*
FROM
table
INNER JOIN (
SELECT MAX(id) AS id FROM table GROUP BY name
) maxid ON table.id = maxid.id
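Here is a quick way to try the join variant on the question's data, using MIN(id) so that the first row per name is kept, as in the expected output. The sketch runs on Python's sqlite3 module instead of MySQL, and the table is called trips because table is a reserved word; both choices are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER PRIMARY KEY, name TEXT, destination TEXT,
                    rating INTEGER, country TEXT);
INSERT INTO trips VALUES
    (1,'James','Barbados',5,'WI'),
    (2,'Andrew','Antigua',6,'WI'),
    (3,'James','Barbados',3,'WI'),
    (4,'Declan','Trinidad',2,'WI'),
    (5,'Steve','Barbados',4,'WI'),
    (6,'Declan','Trinidad',3,'WI');
""")

# The derived table picks one id per name; joining back returns the
# complete row for each of those ids.
first_row_per_name = conn.execute("""
SELECT t.id, t.name, t.destination, t.rating, t.country
FROM trips AS t
JOIN (
    SELECT MIN(id) AS id
    FROM trips
    WHERE country = 'WI'
    GROUP BY name
) AS first_row ON first_row.id = t.id
ORDER BY t.id
""").fetchall()

for row in first_row_per_name:
    print(row)
```

Because the whole row comes from a single id, the destination and rating always belong together, which is not guaranteed when columns are aggregated independently.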
The problem is that distinct works across the entire return set and not just the first field. Otherwise MySQL wouldn't know what record to return. So, you want to have some sort of group function on rating, whether MAX, MIN, GROUP_CONCAT, AVG, or several other functions.
Michael has already posted a good answer, so I'm not going to re-write the query.
I agree with @rcdmk. Using a DEPENDENT subquery can kill performance; GROUP BY seems more suitable, provided that you have already indexed the country field and only a few rows will reach the server. Rewriting the query given by @rcdmk, I added the ORDER BY NULL clause to suppress the implicit ordering by GROUP BY, which makes it a little faster:
SELECT MIN(id) as id, name, destination, AVG(rating) as rating, country
FROM table WHERE country = 'WI'
GROUP BY name, destination ORDER BY NULL
You can do a GROUP BY clause:
SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
FROM TABLE_NAME
GROUP BY name, destination, country
This query would perform better in large datasets than the subquery alternatives and it can be easier to read as well.