Creating a rank for "Order By" integer data - MySQL - mysql

I'm working on a project where I have some attendance data. I want to be able to print the top # attendees.
My query as is is set to order the list by # of events attended for each individual. I allow the user to set a limit (so, say top 50 attendees). The problem is that this doesn't do anything to account for ties, so I want to generate a rank in the query that I can then use to limit by.
My relevant schema is as follows:
Members Table:
Member Name | Member ID | # Events Attended
Events Table:
Event Name | Event ID | Other Stuff
This table is then used as a foreign key for an attendance table, which links members to events by using a foreign key that combines a Member and Event ID.
Attendance Table:
Attendance Log ID | Member FK | Event FK
So, my query as is is this:
SELECT `Member Name`, `Member ID` , COUNT( `Member ID` ) AS Attendances
FROM `Members` m
INNER JOIN
(SELECT *
FROM `Events` e
INNER JOIN `Attendance` r ON `Event ID` = `Event FK`
) er
ON `Member ID` = `Member FK`
GROUP BY `Member ID`
ORDER BY `Attendances` DESC
So, to summarize, how can I create a "rank" that I can use to limit results? So top 50 attendees is top 50 ranked attendees (so #entries >= 50), rather than 50 individuals (# entries always 50, cuts off ties).
Thanks all!
Edit1:
Sample output from query with no limit (show all results):
Member Name | Member ID | Attendances
Bob Saget 1 5
John Doe 2 4
Jane Doe 3 3
Stack Overflow 4 3
So, when users request "Show top 3 attendees" with my current query,
they would get the following:
Member Name | Member ID | Attendances
Bob Saget 1 5
John Doe 2 4
Jane Doe 3 3
when in reality, I'd like it to display the ties and show something like
Rank | Member Name | Member ID | Attendances
1 Bob Saget 1 5
2 John Doe 2 4
3 Jane Doe 3 3
3 Stack Overflow 4 3

You can try this:-
SELECT IF(Attendances = @_last_Attendances, @curRank := @curRank, @curRank := @_sequence) AS rank,
@_sequence := @_sequence + 1, @_last_Attendances := Attendances, `Member Name`, `Member ID`,
COUNT( `Member ID` ) AS Attendances
FROM `Members` m
INNER JOIN (SELECT * FROM `Events` e
INNER JOIN `Attendance` r
ON `Event ID` = `Event FK`) er
ON `Member ID` = `Member FK`,
(SELECT @curRank := 1, @_sequence := 1, @_last_Attendances := 0) r
GROUP BY `Member Name`, `Member ID`, Rank
HAVING COUNT( `Member ID`) >= (SELECT MAX(`Member ID`)
FROM `Members`
WHERE `Member ID` < (SELECT MAX(`Member ID`)
FROM `Members`
WHERE `Member ID` < (SELECT MAX(`Member ID`)
FROM `Members`)))
ORDER BY COUNT(`Member ID`) DESC;
I think this approach will help you.

Doing this in two queries is going to be your best bet, otherwise the query gets really convoluted.
Here is a SQLFiddle showing your table schema, example data, and the queries we're talking about.
The first problem we need to break down is how to determine what the correct rank is. We can do this by doing the select but only returning a single value of the rank that is our new limit. Assuming we want the top 3 ranks we'll return only the third row (offset 2, limit 1).
# Pre-select the lowest rank allowed.
SELECT COUNT(a.attendanceId) INTO @lowestRank
FROM Member AS m
JOIN Event AS e
JOIN Attendance AS a USING (memberId, eventId)
GROUP BY m.memberId
ORDER BY COUNT(a.attendanceId) DESC
LIMIT 1 OFFSET 2;
Once we have @lowestRank we can run the query again, this time with a HAVING clause to restrict the GROUP BY results. By keeping only the groups whose attendance count is equal to or greater than @lowestRank, we've essentially added a LIMIT on that field.
# Return all rows of the lowest rank or above.
SELECT m.name, m.memberId, COUNT(a.attendanceId) AS Attendances
FROM Member AS m
JOIN Event AS e
JOIN Attendance AS a USING (memberId, eventId)
GROUP BY m.memberId
HAVING COUNT(a.attendanceId) >= @lowestRank
ORDER BY Attendances DESC;
We could have done this in one query by making the first one a JOIN of the second one, but I don't recommend that because it complicates the queries, has potential performance impact, and makes it harder to change them independently.
For example, the first query only keeps ties together at the cutoff point, but if you wanted to treat all tied members as a single rank, we could change that query to consider only DISTINCT attendance counts. In this particular data set the results would be the same, but if we had two members with four attendances then the DISTINCT version would still return three distinct ranks (5, 4, 4, 3, 3), whereas the query above returns only two (5, 4, 4).
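On a database with window functions (MySQL 8.0 and later has them), the "top N ranks including ties" requirement can also be written as a single query. The following is a minimal sketch, not the approach of either answer above: it runs the SQL through Python's built-in sqlite3 module so it can be tried without a MySQL server, and the table and column names (members, attendance, rnk) are simplified stand-ins for the schema in the question. The fifth member is made up so that the cutoff has something to exclude.

```python
import sqlite3

# Hypothetical miniature of the question's data: one attendance row per event attended.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attendance (member_id INTEGER, event_id INTEGER);
INSERT INTO members VALUES
  (1, 'Bob Saget'), (2, 'John Doe'), (3, 'Jane Doe'),
  (4, 'Stack Overflow'), (5, 'Extra Member');
""")
attended = {1: 5, 2: 4, 3: 3, 4: 3, 5: 1}  # member_id -> number of events attended
for member_id, n in attended.items():
    conn.executemany(
        "INSERT INTO attendance VALUES (?, ?)",
        [(member_id, event_id) for event_id in range(1, n + 1)],
    )

# Count first, rank the counts second, filter on the rank last.
# DENSE_RANK gives tied members the same rank and leaves no gaps after a tie.
query = """
SELECT rnk, name, member_id, attendances
FROM (
    SELECT name, member_id, attendances,
           DENSE_RANK() OVER (ORDER BY attendances DESC) AS rnk
    FROM (
        SELECT m.name, m.member_id, COUNT(*) AS attendances
        FROM members m
        JOIN attendance a ON a.member_id = m.member_id
        GROUP BY m.member_id, m.name
    ) counts
) ranked
WHERE rnk <= ?
ORDER BY rnk, member_id
"""
top_ranked = conn.execute(query, (3,)).fetchall()
for row in top_ranked:
    print(row)
```

Asking for the top 3 ranks returns four rows here, because Jane Doe and Stack Overflow share rank 3. The alias is rnk rather than rank because RANK is a reserved word in MySQL 8.0.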

Related

Finding missing data in a sequence in MySQL

Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, or it might be possible to build one with a subquery (if every month is represented in the source data), e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
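To make the idea concrete, here is a small sketch that puts the pieces together on the sample data from the question. It is an illustration under assumptions rather than a drop-in MySQL query: it uses Python's built-in sqlite3 module, it builds the month list with a recursive CTE (MySQL 8.0 and later also supports WITH RECURSIVE, but its date functions differ from SQLite's strftime), and it restricts the employee/month product to each employee's start_month..end_month range, which the question implies but the query above leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT);
CREATE TABLE payments (employee TEXT, paid_month TEXT);
INSERT INTO employees VALUES ('Jane', '2017-05', '2017-07'), ('Bob', '2017-10', '2017-12');
INSERT INTO payments VALUES
  ('Jane', '2017-05'), ('Jane', '2017-07'), ('Bob', '2017-11'), ('Bob', '2017-12');
""")

query = """
WITH RECURSIVE calendar(month) AS (
    -- Generated list of months; a real calendar table could replace this.
    SELECT '2017-01'
    UNION ALL
    SELECT strftime('%Y-%m', month || '-01', '+1 month')
    FROM calendar
    WHERE month < '2017-12'
)
SELECT e.employee, c.month
FROM employees e
-- Employee x month product, limited to the months each employee should be paid for.
JOIN calendar c ON c.month BETWEEN e.start_month AND e.end_month
-- An expected month with no matching payment row is a gap.
LEFT JOIN payments p ON p.employee = e.employee AND p.paid_month = c.month
WHERE p.employee IS NULL
ORDER BY e.employee, c.month
"""
missing = conn.execute(query).fetchall()
print(missing)
```

Because the expected months drive the query, payments dated before the start month or after the end month are simply ignored, which matches the "dirty data" caveat in the question.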
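First we need to get all the months between the start date and end date of each employee as an intermediate result, then do a left outer join with the payments table on the paid month and keep only the non-matching months (where the payment's employee name is null). Note that this query uses Oracle syntax (connect by, add_months, to_char), not MySQL: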
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35

MySQL retrieve employees presences

I need to retrieve the employees presences for the day. There are two states in the presences: In & Out.
If the employee does not have a presence it should retrieve with the status of null.
I have two tables, Employees and Presences and I want to join them.
ID | name
1 John
2 Julie
3 Anthony
4 Joseph
Now the presences table has the following data:
ID | employee_id | presence_date | presence_hour | Movement
1 1 2016-08-30 08:55 In
2 2 2016-08-30 08:56 In
3 3 2016-08-30 08:57 In
4 1 2016-08-30 12:33 Out
5 2 2016-08-30 12:34 Out
As you can see in the presences data, the employee Anthony has not yet left the office and the employee Joseph has no entries in the table.
The result I'm expecting:
Employee | Movement
John Out
Julie Out
Anthony In
Joseph null
The query I'm using:
SELECT employee.name, presence.movement
FROM employees AS employee
LEFT JOIN presences AS presence ON presence.employee_id = employee.id
WHERE presence.presence_date = '2016-08-30' AND
employee.id IN (1, 2, 3, 4)
GROUP BY employee.id
ORDER BY employee.name, presence.id DESC
The problems I'm facing:
Joseph never appears in the data
presence.id DESC doesn't work
For Joseph presence.presence_date is null, so it is not matched by presence.presence_date = '2016-08-30'.
The order by presence.id makes no sense to me. You are grouping by employee, so all matching rows in presence for that employee are merged together. Do you want to sort all those according to presence.id and select the most recent row's movement value? This does not work the way you wrote it. One solution would be to use MAX(presence.id) in your query to get the id of the most recent row of presence for the current employee, and then join the presences table again to get the data you want.
SELECT a.name, b.movement
FROM (
SELECT employee.name, MAX(presence.id) max_id
FROM employees AS employee
LEFT JOIN presences AS presence
ON presence.employee_id = employee.id AND presence.presence_date = '2016-08-30'
WHERE employee.id IN (1, 2, 3, 4)
GROUP BY employee.id
) a
LEFT JOIN presences b ON a.max_id = b.id
ORDER BY a.name
Although it might be not a good idea to assume that most recent is equivalent to biggest id, so one might select the row with the most recent date, but this is another "problem".
This is caused by applying the date filter in the where criteria. The where criteria is applied after the join, thus eliminating any records for Joseph, since he was not present that day. Move the date criteria to the join condition instead.
You got the whole group by wrong, your query is against the sql standards because you have columns in the select list that are not in the group by list and are not subject of an aggregate function, such as max(). MySQL allows such queries under certain sql mode settings only. Use max() on the movement and group by on employee name and date fields.
Sample query, assuming you can only have 1 in and one out per employee per day:
SELECT employee.name, max(presence.movement) as movement
FROM employees AS employee
LEFT JOIN presences AS presence ON presence.employee_id = employee.id and date(presence.presence_date) = '2016-08-30'
WHERE employee.id IN (1, 2, 3, 4)
GROUP BY employee.name, date(presence.presence_date)
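The two points made above (filter on the date inside the join condition, and pick the latest movement explicitly instead of relying on GROUP BY) can be combined in one query. The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module rather than on MySQL, and it picks the latest row by presence_hour instead of by id or MAX(movement), which is an assumption about what "most recent" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE presences (
    id INTEGER PRIMARY KEY, employee_id INTEGER,
    presence_date TEXT, presence_hour TEXT, movement TEXT
);
INSERT INTO employees VALUES (1, 'John'), (2, 'Julie'), (3, 'Anthony'), (4, 'Joseph');
INSERT INTO presences VALUES
  (1, 1, '2016-08-30', '08:55', 'In'),
  (2, 2, '2016-08-30', '08:56', 'In'),
  (3, 3, '2016-08-30', '08:57', 'In'),
  (4, 1, '2016-08-30', '12:33', 'Out'),
  (5, 2, '2016-08-30', '12:34', 'Out');
""")

query = """
SELECT e.name, p.movement
FROM employees e
LEFT JOIN presences p
       ON p.employee_id = e.id
      AND p.presence_date = :day          -- date filter lives in the join, not in WHERE
      AND p.presence_hour = (             -- keep only the latest movement of that day
            SELECT MAX(p2.presence_hour)
            FROM presences p2
            WHERE p2.employee_id = e.id AND p2.presence_date = :day)
ORDER BY e.id
"""
latest = conn.execute(query, {"day": "2016-08-30"}).fetchall()
print(latest)
```

Joseph has no rows for that day, so the join finds nothing for him and his movement comes back as NULL (None in Python) instead of the row disappearing.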

Group mysql results by cumulative column value

I have a database table events and a table bets. All bets placed for a particular event are located in the bets table while information about the event is stored in the events table.
Let's say I have these tables:
events table:
id event_title
1 Call of Duty Finals
2 DOTA 2 Semi-Finals
3 GTA V Air Race
bets table:
id event_id amount
1 1 $10
1 2 $50
1 2 $100
1 3 $25
1 3 $25
1 3 $25
I want to be able to sort by popularity aka how many bets have been placed for that event and by prize aka the total amount of money for that event.
SORTING BY PRIZE
Obviously this query doesn't work but I want to do something like this:
SELECT * FROM bets GROUP BY event_id SORT BY amount
amount from the query above should be a cumulative value of all the bet amounts for that event_id added together, so this query would return
Array (
[0]=>Array(
'event_id'=>2
'amount'=>$150
)
[1]=>Array(
'event_id'=>3
'amount'=>$75
)
[2]=>Array(
'event_id'=>1
'amount'=>$10
)
)
SORTING BY POPULARITY
Obviously this query doesn't work either but I want to do something like this:
SELECT * FROM bets GROUP BY event_id SORT BY total_rows
total_rows from the query above should be the number of rows that exist in the bets table added together, so this query would return
Array (
[0]=>Array(
'event_id'=>3
'total_rows'=>3
)
[1]=>Array(
'event_id'=>2
'total_rows'=>2
)
[2]=>Array(
'event_id'=>1
'total_rows'=>1
)
)
I wouldn't necessarily need it to return the total_rows value as I could calculate that, but it does need to be sorted by the number of occurrences for that particular event_id in the bets table.
I think count and sum are your friends here:
SELECT event_id,
COUNT(event_id) AS NumberBets,
SUM(amount) AS TotalPrize
FROM bets
GROUP BY event_id
Should do the trick.
Then you can ORDER BY either the NumberBets(popularity) or TotalPrize as you need. JOIN only needed if you want event titles.
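As a quick check of both orderings against the sample data from the question, here is a small sketch using Python's sqlite3 module. It assumes the amounts are stored as plain numbers (the question shows them as $10, $50 and so on), and the aliases total_prize and number_bets are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bets (event_id INTEGER, amount INTEGER);
INSERT INTO bets VALUES (1, 10), (2, 50), (2, 100), (3, 25), (3, 25), (3, 25);
""")

# Sorting by prize: total amount bet per event, largest first.
by_prize = conn.execute("""
    SELECT event_id, SUM(amount) AS total_prize
    FROM bets
    GROUP BY event_id
    ORDER BY total_prize DESC
""").fetchall()

# Sorting by popularity: number of bets per event, largest first.
by_popularity = conn.execute("""
    SELECT event_id, COUNT(*) AS number_bets
    FROM bets
    GROUP BY event_id
    ORDER BY number_bets DESC
""").fetchall()

print(by_prize)
print(by_popularity)
```

Both results come out in the same order as the arrays the question asks for.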
You can use SUM and COUNT aggregate functions:
SELECT
e.id AS event_id, SUM(b.amount) AS sum_amount
FROM events e
LEFT JOIN bets b
ON b.event_id = e.id
GROUP BY
e.id
ORDER BY
sum_amount DESC;
SELECT
e.id AS event_id, COUNT(b.event_id) AS no_of_bets
FROM events e
LEFT JOIN bets b
ON b.event_id = e.id
GROUP BY
e.id
ORDER BY
no_of_bets DESC;

SQL for last x events of type

I'm trying to find which members have had the best attendance over the last x events, where the event type matters.
Example structure here
http://sqlfiddle.com/#!2/bde53/1
attended_id is the member's id. Given many events with many event types, I would like something like this if possible:
attended_id | last 6 event 1 | last 12 event 2 | 2013 event 3 |
1 6 10 6
2 5 9 12
3 2 8 7
"2013 event 3" means all events of type 3 which occurred in 2013.
Is this possible, or is it best to export to Excel to get this information?
I'm also open to new structures if they make this query easier. The numbers should be easily changeable, e.g. getting the last 8 event 1's instead of the last 6.
I have SQL for each but can't combine them.
Events in the last year by member id
SELECT attended_id, year(x.date), count(event_id) FROM events e INNER JOIN events_types x USING (event_id)
INNER JOIN events_type t USING (event_type)
WHERE t.event_type = 1
group by attended_id, year(x.`date`);
last x events of type
SELECT attended_id, count(event_id) FROM events e INNER JOIN events_types x USING (event_id)
INNER JOIN events_type t USING (event_type)
WHERE t.event_type = 1 and
e.event_id >= (
select event_id from events_types where event_type = 1 order by event_id desc
limit 1,1
)
group by attended_id
I just can't combine these to show both in the same query.
The query can be achieved by:
using user defined variables to calculate the age rank of each event type in one pass
joining the attendance records to the ranked results, grouping on attended_id
Here's the query:
select
a.attended_id,
sum(event_type = 1 and rank <= 6) `last 6 event 1`,
sum(event_type = 2 and rank <= 12) `last 12 event 2`,
sum(event_type = 3 and year(date) = 2013) `2013 event 3`
from events a
left join (
select event_id, event_type, date,
(@row := if(@prev is null or @row is null or @prev != event_type, 1, @row + 1)) rank,
(@prev := event_type) prev
from (select event_id, event_type, date
from events_types
order by event_type, date desc) e) r on r.event_id = a.event_id
group by 1
Here's the SQLFiddle
As you can see, it would be a simple matter to change the variables to have different event types and "last n" values, or even to add more columns for different breakdowns.
Notes:
the inner-most query orders the rows by event type, then by date newest-to-oldest, so that rank 1 is the most recent event of each type. This is required for the rank logic to work properly
the tests for the variables being null are only needed for the first row, but are used to avoid having to define the variables in a separate statement, making the query a single stand-alone query, which is often required because most database libraries don't support multiple queries executed in a single call
lengthy case statements are avoided by summing a condition; in mysql booleans are 1 for true and 0 for false, so summing a condition counts how many times it was true
A final note, your table names are a bit confusing. They could be changed to make it more clear what their meaning is. I suggest these changes:
events --> attendances
events_types --> events
events_type --> event_types
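The "rank each event within its type, then sum conditions" idea can be tried on a tiny data set. The sketch below is an illustration under assumptions, not the query from this answer: it uses ROW_NUMBER() as the per-type rank instead of user variables (window functions are available in SQLite, used here through Python's sqlite3 module, and in MySQL 8.0 and later), and the sample rows, the recency alias and the smaller "last 2" cutoff are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table names follow the question: events_types holds the events, events holds attendance.
CREATE TABLE events_types (event_id INTEGER PRIMARY KEY, event_type INTEGER, date TEXT);
CREATE TABLE events (event_id INTEGER, attended_id INTEGER);
INSERT INTO events_types VALUES
  (1, 1, '2013-01-10'), (2, 1, '2013-02-10'), (3, 1, '2013-03-10'),
  (4, 2, '2012-12-01'), (5, 2, '2013-06-01');
INSERT INTO events VALUES
  (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
  (1, 2), (3, 2), (4, 2);
""")

query = """
SELECT a.attended_id,
       -- A comparison evaluates to 1 or 0, so SUM counts the rows where it holds.
       SUM(r.event_type = 1 AND r.recency <= 2) AS last_2_type_1,
       SUM(r.event_type = 2 AND strftime('%Y', r.date) = '2013') AS type_2_in_2013
FROM events a
JOIN (
    SELECT event_id, event_type, date,
           -- recency 1 is the newest event of each type
           ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY date DESC) AS recency
    FROM events_types
) r ON r.event_id = a.event_id
GROUP BY a.attended_id
ORDER BY a.attended_id
"""
summary = conn.execute(query).fetchall()
print(summary)
```

Changing "last 2" to "last n" only means changing the constant in the first SUM, which is the flexibility the question asks for.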
please try this sqlFiddle
SELECT T1.attended_id,
T2.`last 6 event 1`,
T3.`last 12 event 2`,
T4.`2013 event 3`
FROM
(SELECT DISTINCT attended_id
FROM events)T1
LEFT JOIN
(SELECT e.attended_id,COUNT(*) as `last 6 event 1`
FROM events e
INNER JOIN (SELECT event_id,event_type,date
FROM events_types
WHERE event_type = 1
ORDER BY date DESC
LIMIT 6
) et
USING (event_id)
GROUP BY e.attended_id
)T2
ON (T1.attended_id = T2.attended_id)
LEFT JOIN
(SELECT e.attended_id,COUNT(*) as `last 12 event 2`
FROM events e
INNER JOIN (SELECT event_id,event_type,date
FROM events_types
WHERE event_type = 2
ORDER BY date DESC
LIMIT 12
) et
USING (event_id)
GROUP BY e.attended_id
)T3
ON (T1.attended_id = T3.attended_id)
LEFT JOIN
(SELECT e.attended_id,COUNT(*) as `2013 event 3`
FROM events e
INNER JOIN (SELECT event_id,event_type,date
FROM events_types
WHERE event_type = 3
AND year(date) = 2013
) et
USING (event_id)
GROUP BY e.attended_id
)T4
ON (T1.attended_id = T4.attended_id)
Your sqlFiddle doesn't have enough data, so I added some random events and some event_types. I also noticed that you don't need to join with event_type, since you're not grabbing any information from it and only want a count.
If you wanted last 8 event 1, just change the LIMIT 6 to LIMIT 8 inside et of T2
OR you can try this sqlFiddle with code below
SELECT e.attended_id,
SUM(IF(event_type = 1 AND typeRank BETWEEN 1 AND 6,1,0)) as `last 6 event 1`,
SUM(IF(event_type = 2 AND typeRank BETWEEN 1 AND 12,1,0)) as `last 12 event 2`,
SUM(IF(event_type = 3 AND YEAR(date) = 2013,1,0)) as `2013 event 3`
FROM events e
INNER JOIN (SELECT event_id,event_type,date,
IF (@prevType != event_type,@typeRank:=1,@typeRank:=@typeRank+1) as typeRank,
@prevType := event_type
FROM events_types,(SELECT @prevType:=0,@typeRank:=0)R
ORDER BY event_type,date DESC
) et
USING (event_id)
GROUP BY e.attended_id
if you wanted last 8 event 1 just change the BETWEEN 1 AND 6 to BETWEEN 1 AND 8

SQL query that reports N or more consecutive absents from attendance table

I have a table that looks like this:
studentID | subjectID | attendanceStatus | classDate | classTime | lecturerID |
12345678 1234 1 2012-06-05 15:30:00
87654321
12345678 1234 0 2012-06-08 02:30:00
I want a query that reports if a student has been absent for 3 or more consecutive classes. based on studentID and a specific subject between 2 specific dates as well. Each class can have a different time. The schema for that table is:
PK(`studentID`, `classDate`, `classTime`, `subjectID`, `lecturerID`)
Attendance Status: 1 = Present, 0 = Absent
Edit: Worded question so that it is more accurate and really describes what was my intention.
I wasn't able to create an SQL query for this. So instead, I tried a PHP solution:
Select all rows from table, ordered by student, subject and date
Create a running counter for absents, initialized to 0
Iterate over each record:
If student and/or subject is different from previous row
Reset the counter to 0 (present) or 1 (absent)
Else, that is when student and subject are same
Set the counter to 0 (present) or plus 1 (absent)
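The loop described in these steps could look roughly like the following. The original attempt was in PHP; Python is used here purely for illustration, and the function name, the tuple layout and the sample rows are all hypothetical.

```python
def longest_absent_runs(rows):
    """Return {(student_id, subject_id): longest run of consecutive absences}.

    rows must already be sorted by student, subject and class date.
    Each row is (student_id, subject_id, class_date, status), status 1 = present, 0 = absent.
    """
    longest = {}
    prev_key = None
    run = 0
    for student_id, subject_id, _class_date, status in rows:
        key = (student_id, subject_id)
        if key != prev_key:
            # New student/subject pair: start counting from scratch.
            run = 0
            prev_key = key
        # An absence extends the current run, a presence resets it.
        run = run + 1 if status == 0 else 0
        longest[key] = max(longest.get(key, 0), run)
    return longest


# Made-up sample: student 1 misses three classes in a row, student 2 never more than two.
rows = sorted([
    (1, 10, "2012-06-01", 1), (1, 10, "2012-06-04", 0), (1, 10, "2012-06-05", 0),
    (1, 10, "2012-06-08", 0), (1, 10, "2012-06-11", 1),
    (2, 10, "2012-06-01", 0), (2, 10, "2012-06-04", 1),
    (2, 10, "2012-06-05", 0), (2, 10, "2012-06-08", 0),
])
flagged = [key for key, n in longest_absent_runs(rows).items() if n >= 3]
print(flagged)
```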
I then realized that this logic can easily be implemented using MySQL variables, so:
SET @studentID = 0;
SET @subjectID = 0;
SET @absentRun = 0;
SELECT *,
CASE
WHEN (@studentID = studentID) AND (@subjectID = subjectID) THEN @absentRun := IF(attendanceStatus = 1, 0, @absentRun + 1)
WHEN (@studentID := studentID) AND (@subjectID := subjectID) THEN @absentRun := IF(attendanceStatus = 1, 0, 1)
END AS absentRun
FROM table4
ORDER BY studentID, subjectID, classDate
You can probably nest this query inside another query that selects records where absentRun >= 3.
SQL Fiddle
This query works for intended result:
SELECT DISTINCT first_day.studentID
FROM student_visits first_day
LEFT JOIN student_visits second_day
ON first_day.studentID = second_day.studentID
AND DATE(second_day.classDate) - INTERVAL 1 DAY = date(first_day.classDate)
LEFT JOIN student_visits third_day
ON first_day.studentID = third_day.studentID
AND DATE(third_day.classDate) - INTERVAL 2 DAY = date(first_day.classDate)
WHERE first_day.attendanceStatus = 0 AND second_day.attendanceStatus = 0 AND third_day.attendanceStatus = 0
It joins the table 'student_visits' (let's call your original table that) to itself step by step on 3 consecutive dates for each student, and finally checks for absence on these days. DISTINCT makes sure that the result won't contain duplicate rows for more than 3 consecutive days of absence.
This query doesn't consider absence in a specific subject, just consecutive absence for each student for 3 or more days. To consider the subject, simply add .subjectID in each ON clause:
ON first_day.subjectID = second_day.subjectID
P.S.: not sure that it's the fastest way (at least it's not the only).
Unfortunately, mysql did not support window functions before version 8.0. This would be much easier with row_number() or, better yet, cumulative sums (as supported in Oracle).
I will describe the solution. Imagine that you have two additional columns in your table:
ClassSeqNum -- a sequence starting at 1 and incrementing by 1 for each class date.
AbsentSeqNum -- a sequence starting at 1 each time a student misses a class and then incrementing by 1 on each subsequent absence.
The key observation is that the difference between these two values is constant for consecutive absences. Because you are using mysql, you might consider adding these columns to the table. They are a bit challenging to add in the query, which is why this answer is so long.
Given the key observation, the answer to your question is provided by the following query:
select studentid, subjectid, absenceid, count(*) as cnt
from (select a.*, (ClassSeqNum - AbsentSeqNum) as absenceid
from Attendance a
) a
group by studentid, subjectid, absenceid
having count(*) > 2
(Okay, this gives every sequence of absences for a student for each subject, but I think you can figure out how to whittle this down just to a list of students.)
How do you assign the sequence numbers? In mysql, you need to do a self join. So, the following adds the ClassSeqNum:
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as ClassSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
group by a.StudentId, a.SubjectId, a.ClassDate
And the following adds the absence sequence number:
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as AbsentSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
where a.AttendanceStatus = 0 and a1.AttendanceStatus = 0
group by a.StudentId, a.SubjectId, a.ClassDate
So the final query looks like:
with cs as (
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as ClassSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
group by a.StudentId, a.SubjectId, a.ClassDate
),
a as (
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as AbsentSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
where a.AttendanceStatus = 0 and a1.AttendanceStatus = 0
group by a.StudentId, a.SubjectId, a.ClassDate
)
select studentid, subjectid, absenceid, count(*) as cnt
from (select cs.studentid, cs.subjectid,
(cs.ClassSeqNum - a.AbsentSeqNum) as absenceid
from cs join
a
on cs.studentid = a.studentid and cs.subjectid = a.subjectid and cs.ClassDate = a.ClassDate
) a
group by studentid, subjectid, absenceid
having count(*) > 2
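Where window functions are available, the same "difference of two sequences" observation needs no self join. The sketch below is a variation under assumptions rather than this answer's query: it runs on SQLite through Python's sqlite3 module (MySQL 8.0 and later accepts the same ROW_NUMBER() syntax), numbers the rows per student and subject with ROW_NUMBER(), and uses made-up sample rows and aliases (class_seq, status_seq).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (
    studentID INTEGER, subjectID INTEGER, classDate TEXT, attendanceStatus INTEGER
);
INSERT INTO attendance VALUES
  (1, 10, '2012-06-01', 1), (1, 10, '2012-06-04', 0), (1, 10, '2012-06-05', 0),
  (1, 10, '2012-06-08', 0), (1, 10, '2012-06-11', 1),
  (2, 10, '2012-06-01', 0), (2, 10, '2012-06-04', 1),
  (2, 10, '2012-06-05', 0), (2, 10, '2012-06-08', 0);
""")

query = """
WITH numbered AS (
    SELECT studentID, subjectID, classDate, attendanceStatus,
           -- position of the class among all classes of this student and subject
           ROW_NUMBER() OVER (PARTITION BY studentID, subjectID
                              ORDER BY classDate) AS class_seq,
           -- position among the rows with the same status (for absences: the n-th absence)
           ROW_NUMBER() OVER (PARTITION BY studentID, subjectID, attendanceStatus
                              ORDER BY classDate) AS status_seq
    FROM attendance
)
SELECT studentID, subjectID, MIN(classDate) AS first_absence, COUNT(*) AS run_length
FROM numbered
WHERE attendanceStatus = 0
-- class_seq - status_seq stays constant inside one unbroken run of absences
GROUP BY studentID, subjectID, class_seq - status_seq
HAVING COUNT(*) >= 3
ORDER BY studentID, subjectID, first_absence
"""
runs = conn.execute(query).fetchall()
print(runs)
```

Student 2 is absent three times in total but never three times in a row, so only student 1 is reported. Restricting to a date range or a single subject would be an extra WHERE condition inside the numbered CTE.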