MySQL: trying a SELECT with multiple conditions

I have 2 tables with values like below:
tbl_users
user_ID name
1 somename1
2 somename2
3 somename3
tbl_interviews
int_ID user_ID answer date
1 1 sometextaba 2012-11-04
2 2 sometextxcec 2012-10-05
3 1 sometextabs 2011-06-04
4 3 sometextxcfc 2012-11-04
5 3 sometextxcdn 2012-11-04
How can I ask MySQL to tell me which user in the table above was interviewed this year but also had another interview in a previous year? The only one is the user with id = 1, since he had an interview this year (int_ID 1) but his first interview was in 2011 (int_ID 3).
Unfortunately, I'm not even able to select them.

By joining the table against itself, where one side of the join only includes interviews from this year and the other side only includes previous years, the result of the INNER JOIN will be users having both.
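Because it doesn't rely on any aggregates or subqueries, this method should be efficient, especially if user_ID is indexed. Note that wrapping the date column in YEAR() generally prevents MySQL from using an index on date; comparing against a date range instead (for example date >= '2012-01-01' AND date < '2013-01-01') lets such an index be used.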
SELECT
DISTINCT
thisyear.user_ID,
name
FROM
/* Left side of join retrieves only this year (year=2012) */
tbl_interviews thisyear
/* Right side retrieves year < 2012 */
/* The combined result will eliminate any users who don't exist on both sides of the join */
INNER JOIN tbl_interviews previous_years ON thisyear.user_ID = previous_years.user_ID
/* and JOIN in the user table to get a name */
INNER JOIN tbl_users ON tbl_users.user_ID = thisyear.user_ID
WHERE
YEAR(thisyear.date) = 2012
AND YEAR(previous_years.date) < 2012
Here is a demonstration on SQLFiddle
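To check the self-join against the sample rows from the question, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL. SQLite has no YEAR() function, so the sketch assumes strftime('%Y', ...) as a substitute; the variable name returning_users is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_users (user_ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tbl_interviews (int_ID INTEGER PRIMARY KEY, user_ID INTEGER, answer TEXT, date TEXT);
INSERT INTO tbl_users VALUES (1, 'somename1'), (2, 'somename2'), (3, 'somename3');
INSERT INTO tbl_interviews VALUES
  (1, 1, 'sometextaba',  '2012-11-04'),
  (2, 2, 'sometextxcec', '2012-10-05'),
  (3, 1, 'sometextabs',  '2011-06-04'),
  (4, 3, 'sometextxcfc', '2012-11-04'),
  (5, 3, 'sometextxcdn', '2012-11-04');
""")

# Same self-join as above; strftime('%Y', ...) replaces MySQL's YEAR().
returning_users = conn.execute("""
SELECT DISTINCT thisyear.user_ID, tbl_users.name
FROM tbl_interviews AS thisyear
INNER JOIN tbl_interviews AS previous_years
        ON thisyear.user_ID = previous_years.user_ID
INNER JOIN tbl_users ON tbl_users.user_ID = thisyear.user_ID
WHERE strftime('%Y', thisyear.date) = '2012'
  AND strftime('%Y', previous_years.date) < '2012'
""").fetchall()

print(returning_users)
```

Only user 1 has rows on both sides of the join, so only that user survives the INNER JOIN.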

A simple approach, perhaps less efficient than JOINs:
SELECT DISTINCT user_ID
FROM tbl_interviews
WHERE user_ID IN (
SELECT user_ID
FROM tbl_interviews
WHERE date < '2012-01-01'
)
AND user_ID IN (
SELECT user_ID
FROM tbl_interviews
WHERE date >= '2012-01-01'
)

The following returns the users interviewed in the current year who also had an interview in some previous year:
SELECT Distinct tc.user_ID FROM tbl_interviews tc
INNER JOIN tbl_interviews tp ON tc.user_ID = tp.user_ID
WHERE YEAR(tc.date) = Year(curDate()) AND YEAR(tp.date) < Year(curDate());
SqlFiddle Demo

Here is a version with no joins, and only one subselect.
SELECT user_id
FROM (
SELECT user_id,
MAX(date) AS last_interview,
COUNT(int_id) AS interviews
FROM tbl_interviews
GROUP BY user_id) AS t
WHERE YEAR(last_interview) = 2012 AND interviews > 1
You can group tbl_interviews by user_id to count the number of interviews per user, and then filter for users who have more than one interview (in addition to having an interview this year). Note that this counts any two interviews, so a user with two interviews in 2012 and none earlier (user 3 in the sample data) also matches. There are a number of variations on this theme, according to your specific needs, so let me know if it needs a tweak.
For example, this should work as well.
SELECT user_id
FROM (
SELECT user_id,
BIT_OR(YEAR(date) = 2012) AS this_year,
BIT_OR(YEAR(date) < 2012) AS other_year
FROM tbl_interviews
GROUP BY user_id) AS t
WHERE this_year AND other_year
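As a rough comparison of the two grouped variants above, here is a small Python sketch that runs both against the sample rows from the question. It assumes SQLite (via the standard-library sqlite3 module) in place of MySQL, so strftime('%Y', ...) stands in for YEAR() and MAX() over a 0/1 comparison stands in for BIT_OR(); the names count_based and flag_based are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_interviews (int_ID INTEGER, user_ID INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO tbl_interviews VALUES (?, ?, ?)",
    [(1, 1, "2012-11-04"), (2, 2, "2012-10-05"), (3, 1, "2011-06-04"),
     (4, 3, "2012-11-04"), (5, 3, "2012-11-04")],
)

# Count-based variant: latest interview in 2012 and at least two interviews overall.
count_based = [row[0] for row in conn.execute("""
SELECT user_ID FROM tbl_interviews
GROUP BY user_ID
HAVING strftime('%Y', MAX(date)) = '2012' AND COUNT(int_ID) > 1
ORDER BY user_ID
""")]

# Flag-based variant: one flag for "has a 2012 row", one for "has an earlier row".
flag_based = [row[0] for row in conn.execute("""
SELECT user_ID FROM tbl_interviews
GROUP BY user_ID
HAVING MAX(strftime('%Y', date) = '2012') = 1
   AND MAX(strftime('%Y', date) < '2012') = 1
ORDER BY user_ID
""")]

print(count_based)
print(flag_based)
```

On this data the count-based query also picks up user 3, who has two interviews in 2012 and none before, while the flag-based query returns only user 1.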

Related

Finding missing data in a sequence in MySQL

Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, or it might be possible to use a subquery (if every month is represented in the source data).
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
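The idea above (pair every employee with every expected month, then left join the payments and keep the rows with no match) can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's standard-library sqlite3 module rather than MySQL, builds the month list with a recursive CTE instead of user variables, and narrows the employee/month pairs to each employee's start_month..end_month range. The names months, m and missing are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT);
CREATE TABLE payments (employee TEXT, paid_month TEXT);
INSERT INTO employees VALUES ('Jane', '2017-05', '2017-07'), ('Bob', '2017-10', '2017-12');
INSERT INTO payments VALUES ('Jane', '2017-05'), ('Jane', '2017-07'),
                            ('Bob', '2017-11'), ('Bob', '2017-12');
""")

missing = conn.execute("""
WITH RECURSIVE months(m) AS (
    -- calendar of month start dates covering 2017
    SELECT '2017-01-01'
    UNION ALL
    SELECT date(m, '+1 month') FROM months WHERE m < '2017-12-01'
)
SELECT e.employee, strftime('%Y-%m', months.m) AS missing_month
FROM employees e
JOIN months
  ON strftime('%Y-%m', months.m) BETWEEN e.start_month AND e.end_month
LEFT JOIN payments p
  ON p.employee = e.employee AND p.paid_month = strftime('%Y-%m', months.m)
WHERE p.employee IS NULL
ORDER BY e.employee, missing_month
""").fetchall()

print(missing)
```

Payments dated outside an employee's official range are simply ignored, because the left join only looks for payments in the expected months.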
First we need to generate all the months between the start date and the end date for each employee, then left outer join the result to the payments table on the paid month, keeping only the non-matching months (those where the payment's employee name is null). This example uses Oracle syntax (CONNECT BY, ADD_MONTHS, TO_CHAR):
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35

Join to table according to date

I have two tables, one is a list of firms, the other is a list of jobs the firms have advertised with deadlines for application and start dates.
Some of the firms will have advertised no jobs, some will only have jobs that are past their deadline dates, some will only have live jobs and others will have past and live applications.
What I want to be able to show as a result of a query is a list of all the firms, with the nearest deadline they have, sorted by that deadline. So the result might look something like this (if today was 2015-01-01).
Sorry, I misstated that. What I want to be able to do is find the next future deadline, and if there is no future deadline, show the last past deadline. So in the first table below the BillCo deadline has passed, but the next BuffyCo deadline is shown. In the BillCo case there are only earlier deadlines, but in the BuffyCo case there are both earlier and later deadlines.
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Admin 2015-01-31
So, BobCo has no jobs listed at all, BillCo has a deadline that has passed and BuffyCo has a deadline in the future.
The problematic part is that BillCo may have a set of jobs like this:
id title date desired hit
== ===== ==== ===========
1 Coder 2013-12-01
2 Manager 2014-06-30
3 Designer 2014-12-01 <--
And BuffyCo might have:
id title date desired hit
== ===== ==== ===========
1 Magician 2013-10-01
2 Teaboy 2014-05-19
3 Admin 2015-01-31 <--
4 Writer 2015-02-28
So, I can do something like:
select * from (
select * from firms
left join jobs on firms.id = jobs.firmid
order by date desc)
as t1 group by firmid;
Or, limit the jobs joined or returned by a date criterion, but I don't seem to be able to get the records I want returned. ie the above query would return:
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Writer 2015-02-28
For BuffyCo it's returning the Writer job rather than the Admin job.
Is it impossible with an SQL query? Any advice appreciated, thanks in advance.
I think this may be what you need:
1) Calculate the delta between each job's date and the current date, and find the minimum delta for each firm.
2) Join firms to jobs only where the firm ids match and where the firm's minimum delta equals the delta for the row in jobs.
SELECT f.id, f.name, j.title, j.date
FROM firms f LEFT JOIN
(SELECT firmid, MIN(abs(datediff(date, curdate()))) AS delta
FROM jobs
GROUP BY firmid) d
ON f.id = d.firmid
LEFT JOIN jobs j ON f.id = j.firmid AND d.delta = abs(datediff(j.date, curdate()));
You want to make an outer join with something akin to the group-wise maximum of (next upcoming, last expired):
SELECT * FROM firms LEFT JOIN (
-- fetch the "groupwise" record
jobs NATURAL JOIN (
-- using the relevant date for each firm
SELECT firmid, MAX(closest_date) date
FROM (
-- next upcoming deadline
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid
) closest_dates
GROUP BY firmid
) selected_dates
) ON jobs.firmid = firms.id
This will actually give you all jobs that have the best deadline date for each firm. If you want to restrict the results to an indeterminate record from each such group, you can add GROUP BY firms.id to the very end.
The revision to your question makes it rather trickier, but it can still be done. Try this:
select
closest_job.*, firms.name
from
firms
left join (
select future_job.*
from
(
select firmid, min(date) as mindate
from jobs
where date >= curdate()
group by firmid
) future
inner join jobs future_job
on future_job.firmid = future.firmid and future_job.date = future.mindate
union all
select past_job.*
from
(
select firmid, max(date) as maxdate
from jobs
group by firmid
having max(date) < curdate()
) past
inner join jobs past_job
on past_job.firmid = past.firmid and past_job.date = past.maxdate
) closest_job
on firms.id = closest_job.firmid
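For a quick check of the "next future deadline, else last past deadline" rule, here is a hedged sketch in Python with the standard-library sqlite3 module. It is not the query above: to stay short it uses COALESCE over two correlated subqueries instead of the UNION ALL of derived tables, and it assumes SQLite rather than MySQL. The sample rows are adapted from the question (BillCo's Designer deadline is taken as 2014-12-01, as in the expected result table), and today is fixed at 2015-01-01 so the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE jobs (id INTEGER, firmid INTEGER, title TEXT, date TEXT);
INSERT INTO firms VALUES (1, 'BobCo'), (2, 'BillCo'), (3, 'BuffyCo');
INSERT INTO jobs VALUES
  (1, 2, 'Coder',    '2013-12-01'),
  (2, 2, 'Manager',  '2014-06-30'),
  (3, 2, 'Designer', '2014-12-01'),
  (1, 3, 'Magician', '2013-10-01'),
  (2, 3, 'Teaboy',   '2014-05-19'),
  (3, 3, 'Admin',    '2015-01-31'),
  (4, 3, 'Writer',   '2015-02-28');
""")

today = "2015-01-01"  # assumed "current date" for the example

closest = conn.execute("""
SELECT firms.name, j.title, j.date
FROM firms
LEFT JOIN jobs j
  ON j.firmid = firms.id
 AND j.date = COALESCE(
       -- next upcoming deadline, if any
       (SELECT MIN(date) FROM jobs WHERE firmid = firms.id AND date >= :today),
       -- otherwise the most recent expired deadline
       (SELECT MAX(date) FROM jobs WHERE firmid = firms.id AND date < :today))
ORDER BY firms.id
""", {"today": today}).fetchall()

print(closest)
```

As with the queries above, a firm with two jobs sharing the selected deadline would appear twice.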
I think this does what I need:
select * from (
select firms.name, t2.closest_date from firms
left join
(
select * from (
-- get first date in the future
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid) as t1
-- order so latest date is first
order by closest_date desc) as t2
on firms.id = t2.firmid
-- group by eliminates all but latest date
group by firms.id) as t3
order by closest_date asc;
Thanks for all the help on this

Multiple column counts by week

I have the following two tables:
Posts
post_id
post_title
post_timestamp
Comments
comment_id
posts_post_id
comment_content
comment_timestamp
I want to create a report that shows the weekly post count and comment count. Something like this:
Week StartDate Posts Comments
1 1/1/2012 100 305
2 1/8/2012 115 412
I have this query, but it only pulls from the Posts table.
select makedate( left(yearweek(p.post_timestamp),1),week(p.post_timestamp, 2 ) * 7 ) as Week, COUNT(p.post_id) as Posts
FROM cl_posts p
GROUP BY Week
ORDER BY WEEK(p.post_timestamp)
How do I add the Comment count too?
I think you need something like this:
select
week(post_timestamp) as Week,
adddate(date(post_timestamp), INTERVAL 1-DAYOFWEEK(post_timestamp) DAY) as StartDate,
count(distinct post_id),
count(comment_id)
from
posts left join comments
on comments.posts_post_id = posts.post_id
group by Week, StartDate
Here is one way, using join:
select coalesce(p.week, c.week) as week, p.Posts, c.Comments
from (select makedate( left(yearweek(p.post_timestamp),1),week(p.post_timestamp, 2 ) * 7 ) as Week,
COUNT(*) as Posts
FROM cl_posts p
GROUP BY Week
) p full outer join
(select makedate( left(yearweek(c.comment_timestamp),1),week(c.comment_timestamp, 2 ) * 7 ) as Week,
COUNT(*) as Comments
FROM cl_comments c
GROUP BY Week
) c
on p.week = c.week
order by 1
The reason that I'm using a full outer join instead of another join type is to keep weeks even when one of the counts is 0. Note that MySQL itself does not support FULL OUTER JOIN, so there it has to be emulated, for example with a UNION of a LEFT JOIN and a RIGHT JOIN. The reason I'm not joining the tables together directly is that, presumably, you want the report by the comment date, not by the post date of the post associated with the comment.
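One way to get the effect of a full outer join without relying on FULL OUTER JOIN support is to aggregate each table per week, collect every week that appears on either side, and left join both aggregates to that list. The sketch below shows this in Python with the standard-library sqlite3 module; it assumes SQLite's strftime('%Y-%W', ...) as the week key instead of MySQL's yearweek(), and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER, post_timestamp TEXT);
CREATE TABLE comments (comment_id INTEGER, posts_post_id INTEGER, comment_timestamp TEXT);
INSERT INTO posts VALUES (1, '2012-01-02 10:00:00'), (2, '2012-01-03 11:00:00');
INSERT INTO comments VALUES
  (1, 1, '2012-01-04 09:00:00'),
  (2, 1, '2012-01-10 09:00:00'),
  (3, 2, '2012-01-11 09:00:00');
""")

weekly = conn.execute("""
WITH p AS (
    SELECT strftime('%Y-%W', post_timestamp) AS week, COUNT(*) AS posts
    FROM posts GROUP BY week
), c AS (
    SELECT strftime('%Y-%W', comment_timestamp) AS week, COUNT(*) AS comments
    FROM comments GROUP BY week
), weeks AS (
    -- every week that appears on either side
    SELECT week FROM p UNION SELECT week FROM c
)
SELECT weeks.week, COALESCE(p.posts, 0), COALESCE(c.comments, 0)
FROM weeks
LEFT JOIN p ON p.week = weeks.week
LEFT JOIN c ON c.week = weeks.week
ORDER BY weeks.week
""").fetchall()

print(weekly)
```

The second week has comments but no posts, and it still shows up with a post count of 0, which is the behavior the full outer join was meant to provide.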

MySQL query problems with combined SUM

I have three tables here, that I'm trying to do a tricky combined query on.
Table 1(teams) has Teams in it:
id name
------------
150 LA Lakers
151 Boston Celtics
152 NY Knicks
Table 2(scores) has scores in it:
id teamid week score
---------------------------
1 150 5 75
2 151 5 95
3 152 5 112
Table 3(tickets) has tickets in it
id teamids week
---------------------
1 150,152,154 5
2 151,154,155 5
I have two queries that I'm trying to write
Rather than trying to sum these each time I query the tickets, I've added a weekly_score field to the ticket. The idea is that any time a new score is entered for a team, I could take that team's id, get all tickets that have that team/week combination, and update them all based on the sum of their team scores.
I've tried the following to get the results I'm looking for (before I try to update them):
SELECT t.id, t.teamids, (
SELECT SUM( s1.score )
FROM scores s1
WHERE s1.teamid
IN (
t.teamids
)
AND s1.week =11
) AS score
FROM tickets t
WHERE t.week =11
AND (t.teamids LIKE "150,%" OR t.teamids LIKE "%,150")
Not only is the query slow, but it also seems to not return the sum of the scores, it just returns the first score in the list.
Any help is greatly appreciated.
If you are going to match, you'll need to accommodate the column holding only one team id. You'll also need to use LIKE in your SELECT subquery, with the pattern on the right-hand side of LIKE.
SELECT t.id, t.teamids, (
SELECT SUM( s1.score )
FROM scores s1
WHERE
(t.teamids LIKE s1.teamid
OR t.teamids LIKE CONCAT(s1.teamid, ",%")
OR t.teamids LIKE CONCAT("%,", s1.teamid, ",%")
OR t.teamids LIKE CONCAT("%,", s1.teamid)
)
AND s1.week =11
) AS score
FROM tickets t
WHERE t.week =11
AND (t.teamids LIKE "150,%" OR t.teamids LIKE "%,150,%" OR t.teamids LIKE "%,150" OR t.teamids LIKE "150")
Do you need the SUM function here at all? The scores table already has the score. And by the way, avoid subqueries; try a left join (or left outer join, depending on your needs).
SELECT t.id, t.name, t1.score, t2.teamids
FROM teams t
LEFT JOIN scores t1 ON t.id = t1.teamid AND t1.week = 11
LEFT JOIN tickets t2 ON t2.week = 11
WHERE t2.week = 11 AND t2.teamids LIKE "%150%"
Not tested.
Well, not the most elegant query ever, but it should work:
SELECT
tickets.id,
tickets.teamids,
sum(score)
FROM
tickets left join scores
on concat(',', tickets.teamids, ',') like concat('%,', scores.teamid, ',%')
WHERE tickets.week = 11 and concat(',', tickets.teamids, ',') like '%,150,%'
GROUP BY tickets.id, tickets.teamids
or also this:
SELECT
tickets.id,
tickets.teamids,
sum(score)
FROM
tickets left join scores
on FIND_IN_SET(scores.teamid, tickets.teamids)>0
WHERE tickets.week = 11 and FIND_IN_SET('150', tickets.teamids)>0
GROUP BY tickets.id, tickets.teamids
(see this question and the answers for more information).
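The delimiter-wrapping trick from the first query can be tried out on the sample rows from the question with the following Python sketch. It assumes SQLite (standard-library sqlite3) instead of MySQL, so string concatenation is written with || rather than concat(), and FIND_IN_SET is not available. It also uses week 5, the week present in the sample data, and additionally matches scores to tickets on week, which is an assumption about the intended result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (id INTEGER, teamid INTEGER, week INTEGER, score INTEGER);
CREATE TABLE tickets (id INTEGER, teamids TEXT, week INTEGER);
INSERT INTO scores VALUES (1, 150, 5, 75), (2, 151, 5, 95), (3, 152, 5, 112);
INSERT INTO tickets VALUES (1, '150,152,154', 5), (2, '151,154,155', 5);
""")

# Wrapping both sides in commas makes ',150,' match only the whole id 150,
# never a longer id such as 1500 or 2150.
ticket_totals = conn.execute("""
SELECT tickets.id, tickets.teamids, SUM(scores.score) AS weekly_score
FROM tickets
LEFT JOIN scores
  ON scores.week = tickets.week
 AND ',' || tickets.teamids || ',' LIKE '%,' || scores.teamid || ',%'
WHERE tickets.week = 5
  AND ',' || tickets.teamids || ',' LIKE '%,150,%'
GROUP BY tickets.id, tickets.teamids
""").fetchall()

print(ticket_totals)
```

Ticket 1 contains teams 150 and 152 with scores, so its total is 75 + 112; team 154 has no score row and contributes nothing.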

SQL query that reports N or more consecutive absents from attendance table

I have a table that looks like this:
studentID | subjectID | attendanceStatus | classDate | classTime | lecturerID
12345678 | 1234 | 1 | 2012-06-05 | 15:30:00 | 87654321
12345678 | 1234 | 0 | 2012-06-08 | 02:30:00 |
I want a query that reports whether a student has been absent for 3 or more consecutive classes, based on studentID and a specific subject, between 2 specific dates. Each class can have a different time. The schema for that table is:
PK(`studentID`, `classDate`, `classTime`, `subjectID`, `lecturerID`)
Attendance Status: 1 = Present, 0 = Absent
Edit: Worded question so that it is more accurate and really describes what was my intention.
I wasn't able to create an SQL query for this. So instead, I tried a PHP solution:
Select all rows from table, ordered by student, subject and date
Create a running counter for absents, initialized to 0
Iterate over each record:
If student and/or subject is different from previous row
Reset the counter to 0 (present) or 1 (absent)
Else, that is when student and subject are same
Set the counter to 0 (present) or plus 1 (absent)
I then realized that this logic can easily be implemented using MySQL variables, so:
SET @studentID = 0;
SET @subjectID = 0;
SET @absentRun = 0;
SELECT *,
CASE
WHEN (@studentID = studentID) AND (@subjectID = subjectID) THEN @absentRun := IF(attendanceStatus = 1, 0, @absentRun + 1)
WHEN (@studentID := studentID) AND (@subjectID := subjectID) THEN @absentRun := IF(attendanceStatus = 1, 0, 1)
END AS absentRun
FROM table4
ORDER BY studentID, subjectID, classDate
You can probably nest this query inside another query that selects records where absentRun >= 3.
SQL Fiddle
This query works for intended result:
SELECT DISTINCT first_day.studentID
FROM student_visits first_day
LEFT JOIN student_visits second_day
ON first_day.studentID = second_day.studentID
AND DATE(second_day.classDate) - INTERVAL 1 DAY = date(first_day.classDate)
LEFT JOIN student_visits third_day
ON first_day.studentID = third_day.studentID
AND DATE(third_day.classDate) - INTERVAL 2 DAY = date(first_day.classDate)
WHERE first_day.attendanceStatus = 0 AND second_day.attendanceStatus = 0 AND third_day.attendanceStatus = 0
It joins the table 'student_visits' (let's call your original table that) to itself step by step on 3 consecutive dates for each student and finally checks for absence on those days. DISTINCT makes sure the result won't contain duplicates when a student has more than 3 consecutive days of absence.
This query doesn't consider absence in a specific subject, just consecutive absence for each student for 3 or more days. To take the subject into account, simply add a subjectID comparison to each ON clause:
ON first_day.subjectID = second_day.subjectID
P.S.: not sure that it's the fastest way (at least it's not the only).
Unfortunately, MySQL versions before 8.0 do not support window functions. This would be much easier with row_number() or, better yet, cumulative sums (as supported in Oracle).
I will describe the solution. Imagine that you have two additional columns in your table:
ClassSeqNum -- a sequence starting at 1 and incrementing by 1 for each class date.
AbsentSeqNum -- a sequence starting at 1 each time a student misses a class and then increments by 1 on each subsequent absence.
The key observation is that the difference between these two values is constant for consecutive absences. Because you are using MySQL, you might consider adding these columns to the table. They are a bit challenging to add in the query, which is why this answer is so long.
Given the key observation, the answer to your question is provided by the following query:
select studentid, subjectid, absenceid, count(*) as cnt
from (select a.*, (ClassSeqNum - AbsentSeqNum) as absenceid
from Attendance a
) a
group by studentid, subjectid, absenceid
having count(*) > 2
(Okay, this gives every sequence of absences for a student for each subject, but I think you can figure out how to whittle this down just to a list of students.)
How do you assign the sequence numbers? In mysql, you need to do a self join. So, the following adds the ClassSeqNum:
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as ClassSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
group by a.StudentId, a.SubjectId, a.ClassDate
And the following adds the absence sequence number:
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as AbsenceSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
where a.AttendanceStatus = 0 and a1.AttendanceStatus = 0
group by a.StudentId, a.SubjectId, a.ClassDate
So the final query looks like:
with cs as (
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as ClassSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
group by a.StudentId, a.SubjectId, a.ClassDate
),
ab as (
select a.StudentId, a.SubjectId, a.ClassDate, count(*) as AbsenceSeqNum
from Attendance a join
Attendance a1
on a.studentid = a1.studentid and a.SubjectId = a1.Subjectid and
a.ClassDate >= a1.classDate
where a.AttendanceStatus = 0 and a1.AttendanceStatus = 0
group by a.StudentId, a.SubjectId, a.ClassDate
)
select studentid, subjectid, absenceid, count(*) as cnt
from (select cs.studentid, cs.subjectid,
(cs.ClassSeqNum - ab.AbsenceSeqNum) as absenceid
from cs join
ab
on cs.studentid = ab.studentid and cs.subjectid = ab.subjectid and
cs.ClassDate = ab.ClassDate
) d
group by studentid, subjectid, absenceid
having count(*) > 2
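To see the "difference of two sequence numbers" observation at work on a small data set, here is a hedged Python sketch using the standard-library sqlite3 module. It assumes SQLite instead of MySQL and computes both sequence numbers with correlated COUNT(*) subqueries rather than self-joins; the attendance rows are invented for the example, with one class per student, subject and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Attendance (
    StudentId INTEGER, SubjectId INTEGER, ClassDate TEXT, AttendanceStatus INTEGER)""")
conn.executemany("INSERT INTO Attendance VALUES (?, ?, ?, ?)", [
    (1, 10, "2012-06-01", 0), (1, 10, "2012-06-04", 0), (1, 10, "2012-06-06", 0),
    (1, 10, "2012-06-08", 1), (1, 10, "2012-06-11", 0),
    (2, 10, "2012-06-01", 0), (2, 10, "2012-06-04", 1), (2, 10, "2012-06-06", 0),
])

runs = conn.execute("""
SELECT StudentId, SubjectId, absenceid, COUNT(*) AS cnt
FROM (
    SELECT a.StudentId, a.SubjectId,
           -- ClassSeqNum: classes held up to and including this one
           (SELECT COUNT(*) FROM Attendance c
             WHERE c.StudentId = a.StudentId AND c.SubjectId = a.SubjectId
               AND c.ClassDate <= a.ClassDate)
           -- AbsenceSeqNum: absences up to and including this one
         - (SELECT COUNT(*) FROM Attendance b
             WHERE b.StudentId = a.StudentId AND b.SubjectId = a.SubjectId
               AND b.ClassDate <= a.ClassDate AND b.AttendanceStatus = 0)
           AS absenceid
    FROM Attendance a
    WHERE a.AttendanceStatus = 0
) d
GROUP BY StudentId, SubjectId, absenceid
HAVING COUNT(*) > 2
""").fetchall()

print(runs)
```

Student 1's first three absences all get the difference 0 and form one group of size 3; the later absence after an attended class gets the difference 1 and falls into a separate group.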