get result where one column value is identical in two following rows - mysql

I want to select failed employees from employeeExam table where status column equals 0 for two following rows.
Result should be like this:
ID COURSE_ID EMPLOYEE_ID DEGREE DATE STATUS NUMOFTAKINGEXAMS
4 2 4 17 January, 15 2013 00:00:00+0000 0 2
Here is what I did:
SQL Fiddle
To clarify more: when ordered by id, result should contains only exams' data which have the same course_id and employee_id and status = 0 under each other directly.

Please try this out and comment:
SQLFIDDLE DEMO
set #sum:=0;
set #id:=0;
select distinct x.empid, x.degree, x.date, x.status
from (
select #sum:= (case when status=0
and #id = employee_id then #sum+1
else 1 end)
as sm, #id:=employee_id as empid, degree, date, status
from employeeexam
order by employee_id)x
where x.sm >= 2
;
| EMPID | DEGREE | DATE | STATUS |
------------------------------------------------------------
| 2 | 5 | January, 16 2013 00:00:00+0000 | 0 |
| 3 | 6 | January, 16 2013 00:00:00+0000 | 0 |
| 4 | 15 | January, 14 2013 00:00:00+0000 | 0 |

Just a portion of the SQL-code, to show the principle, although I doubt it fully extends your requirements. For example, this code doesn't check for consecutive failed attempts, only all attempts in general. It doesn't show any any rows of which the applicant has already passed an exam.
So for example: should one take an exam and get status 0, 0, 1, 0, 0; it would not show up, because applicant has passed at least one exam.
SELECT course_id, employee_id, MAX(degree), status, COUNT(id) NumExamsTaken
FROM employeeExam
GROUP BY course_id, employee_id
HAVING COUNT(id) >= 2 AND SUM(status) = 0;
SQL Fiddle

SELECT a.*
FROM
( SELECT x.*
, COUNT(*) rank
FROM employeeExam x
JOIN employeeExam y
ON y.course_id = x.course_id
AND y.employee_id = x.employee_id
AND y.date <= x.date
GROUP
BY id
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employeeExam x
JOIN employeeExam y
ON y.course_id = x.course_id
AND y.employee_id = x.employee_id
AND y.date <= x.date
GROUP
BY id
) b
ON b.course_id = a.course_id
AND b.employee_id = a.employee_id
AND b.status = a.status
AND b.rank = a.rank - 1
WHERE a.status = 0;

Related

MySQL query GROUP BY two columns

I need help with MySQL query
I have this cenary:
table A
- page_id
- rev_id
- date
one page_id can have multiples rev_id
table B
- rev_id
- words
I have what words have in each revision
I need return for each date the quantity of words that I have in the
last rev_id in each page_id
Example:
table A
page_id | rev_id | date
---------------------------------
1 | 231 | 2002-01-01
2 | 345 | 2002-10-12
1 | 324 | 2002-10-13
3 | 348 | 2003-01-01
--
table B
rev_id | words
---------------
231 | 'ask'
231 | 'the'
231 | 'if'
345 | 'ask'
324 | 'ask'
324 | 'if'
348 | 'who'
magical sql here edited to show how its calculated {page_id : [words]}
date | count(words)
--------------------------
2002-01-01 | 3 { 1:[ask, the, if] }
2002-10-12 | 4 { 1:[ask, the, if], 2:[ask] }
2002-10-13 | 3 { 1:[ask, if], 2:[ask] }
2003-01-01 | 4 { 1:[ask, if], 2:[ask], 3:[who] }
I did this query, but my date are fixed and I need for all dates contained in table revision:
SELECT SUM(q)
FROM (
SELECT COUNT(equation) q
FROM revision r, equation e
WHERE r.rev_id in (
SELECT max(rev_id)
FROM revision
WHERE date < '2006-01-01'
GROUP BY page_id
)
AND r.rev_id = e.rev_id
GROUP BY date
) q;
Solved
My friend help-me to create query to solve my problem!
select s.date, count(words) from
(select d.date, r.page_id, max(r.rev_id) as rev_id
from revision r, (select distinct(date) from revision) d
where d.date >= r.date group by d.date, r.page_id) s
join words e on e.rev_id = s.rev_id
group by s.date;
I think this is a basic join and group by:
select a.date, count(*)
from a join
b
on a.rev_id = b.rev_id
group by a.date;
EDIT:
Oh, I think I get it. This is a cumulative thing. That makes it more complicated.
select d.date,
(select count(*)
from a join
b
on a.rev_id = b.rev_id
where a.date <= d.date and
a.rev_id = (select max(a2.rev_id) from a a2 where a2.date = a.date and a2.date <= d.date)
) as cnt
from (select date from a) d;
But that won't work in MySQL because of the nesting of the correlation clause. So, we can restructure the logic as:
select a.date, count(*)
from (select a.*,
(select max(a2.rev_id)
from a a2
where a2.date <= a.date and a2.page_id = a.page_id
) as last_rev_id
from a
) a join
b
on a.last_rev_id = b.rev_id
group by a.date;

MySQL: How to get the Active members by month in year

I have got the previous year working members and subtracted previous year relieving employees, then got the previous month relieving list and subtracted it from the result set. Then added the newly added members in a current month.
SQL Fiddle Link
I am sensing that there lot of improvements we can do to the current query. But right now I am out of ideas, Can someone kindly help on this?
IF I have interpreted your existing query correctly, I suggest the following:
select
mnth.num, count(*)
from (
select 1 AS num union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9 union all select 10 union all select 11 union all select 12
) mnth
left join (
select
e.emp_id
, case
when e.hired_date < date_format(current_date(), '%Y-01-01') then 1
else month(e.hired_date)
end AS start_month
, case
when es.relieving_date < date_format(current_date(), '%Y-01-01') then 0
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
) emp on mnth.num between emp.start_month and emp.end_month
where mnth.num <= month(current_date())
group by
mnth.num
;
This produced the following result (current_date() on Nov 21 2017
| num | count(*) |
|-----|----------|
| 1 | 6 |
| 2 | 7 |
| 3 | 8 |
| 4 | 9 |
| 5 | 10 |
| 6 | 9 |
| 7 | 10 |
| 8 | 11 |
| 9 | 12 |
| 10 | 13 |
| 11 | 14 |
DEMO
Depending on data volumes adding a where clause in the emp subquery may help, this also affect a case expression:
, case
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
where es.relieving_date >= date_format(current_date(), '%Y-01-01')
I think what you need to do is to get all the employees who are already working from the employee table with:
SELECT * FROM employee WHERE hired_date<= CURRENT_DATE;
Then get the list of employees whose relieving date is still in the future using:
SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE;
Then join the two results and group by the month and year of the reliving date as shown below:
SELECT DATE_FORMAT(B.relieving_date, "%Y-%M") RELIEVING_DATE, COUNT(*)
NUMBER_OF_ACTIVE_MEMBERS FROM
(SELECT * FROM employee WHERE hired_date <= CURRENT_DATE) A INNER JOIN
(SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE) B
ON A.emp_id=B.emp_id
GROUP BY DATE_FORMAT(B.relieving_date , "%Y-%M");
Here is a Demo on sql fiddle.

Finding count for a Period in sql

I have a table with :
user_id | order_date
---------+------------
12 | 2014-03-23
12 | 2014-01-24
14 | 2014-01-26
16 | 2014-01-23
15 | 2014-03-21
20 | 2013-10-23
13 | 2014-01-25
16 | 2014-03-23
13 | 2014-01-25
14 | 2014-03-22
A Active user is someone who has logged in last 12 months.
Need output as
Period | count of Active user
----------------------------
Oct-2013 - 1
Jan-2014 - 5
Mar-2014 - 10
The Jan 2014 value - includes Oct -2013 1 record and 4 non duplicate record for Jan 2014)
You can use a variable to calculate the running total of active users:
SELECT Period,
#total:=#total+cnt AS `Count of Active Users`
FROM (
SELECT CONCAT(MONTHNAME(order_date), '-', YEAR(order_date)) AS Period,
COUNT(DISTINCT user_id) AS cnt
FROM mytable
GROUP BY Period
ORDER BY YEAR(order_date), MONTH(order_date) ) t,
(SELECT #total:=0) AS var
The subquery returns the number of distinct active users per Month/Year. The outer query uses #total variable in order to calculate the running total of active users' count.
Fiddle Demo here
I've got two queries that do the thing. I am not sure which one's the fastest. Check them aginst your database:
SQL Fiddle
Query 1:
select per.yyyymm,
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(per.yyyymm - INTERVAL 1 YEAR) and o.order_date < per.yyyymm + INTERVAL 1 MONTH) as `count`
from
(select DISTINCT LAST_DAY(order_date) + INTERVAL 1 DAY - INTERVAL 1 MONTH as yyyymm
from orders) per
order by per.yyyymm
Results:
| yyyymm | count |
|---------------------------|-------|
| October, 01 2013 00:00:00 | 1 |
| January, 01 2014 00:00:00 | 5 |
| March, 01 2014 00:00:00 | 6 |
Query 2:
select DATE_FORMAT(order_date, '%Y-%m'),
(select count(DISTINCT o.user_id) from orders o where o.order_date >=
(LAST_DAY(o1.order_date) + INTERVAL 1 DAY - INTERVAL 13 MONTH) and
o.order_date <= LAST_DAY(o1.order_date)) as `count`
from orders o1
group by DATE_FORMAT(order_date, '%Y-%m')
Results:
| DATE_FORMAT(order_date, '%Y-%m') | count |
|----------------------------------|-------|
| 2013-10 | 1 |
| 2014-01 | 5 |
| 2014-03 | 6 |
The best thing I could do is this:
SELECT Date, COUNT(*) as ActiveUsers
FROM
(
SELECT DISTINCT userId, CONCAT(YEAR(order_date), "-", MONTH(order_date)) as Date
FROM `a`
ORDER BY Date
)
AS `b`
GROUP BY Date
The output is the following:
| Date | ActiveUsers |
|---------|-------------|
| 2013-10 | 1 |
| 2014-1 | 4 |
| 2014-3 | 4 |
Now, for every row you need to sum up the number of active users in previous rows.
For example, here is the code in C#.
int total = 0;
while (reader.Read())
{
total += (int)reader['ActiveUsers'];
Console.WriteLine("{0} - {1} active users", reader['Date'].ToString(), reader['ActiveUsers'].ToString());
}
By the way, for the March of 2014 the answer is 9 because one row is duplicated.
Try this, but thise doesn't handle the last part: The Jan 2014 value - includes Oct -2013
select TO_CHAR(order_dt,'MON-YYYY'), count(distinct User_ID ) cnt from [orders]
where User_ID in
(select User_ID from
(select a.User_ID from [orders] a,
(select a.User_ID,count (a.order_dt) from [orders] a
where a.order_dt > (select max(b.order_dt)-365 from [orders] b where a.User_ID=b.User_ID)
group by a.User_ID
having count(order_dt)>1) b
where a.User_ID=b.User_ID) a
)
group by TO_CHAR(order_dt,'MON-YYYY');
This is what I think you are looking for
SET #cnt = 0;
SELECT Period, #cnt := #cnt + total_active_users AS total_active_users
FROM (
SELECT DATE_FORMAT(order_date, '%b-%Y') AS Period , COUNT( id) AS total_active_users
FROM t
GROUP BY DATE_FORMAT(order_date, '%b-%Y')
ORDER BY order_date
) AS t
This is the output that I get
Period total_active_users
Oct-2013 1
Jan-2014 6
Mar-2014 10
You can also do COUNT(DISTINCT id) to get the unique Ids only
Here is a SQL Fiddle

MySQL count daily new users VS returned users (cohort analysis)

The table structure is: user_id, Date (I'm used to work with timestamp)
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need to know how to perform a single query to find the number of users which are a "returned" user from the very beginning of their activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/11)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/11)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After some long search on Stackoverflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here : Weekly Active Users for each day from log but I didn't succeed to change his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using t-sql syntax because I know it better...) the key is to calculate the mindate for each user separately, calculate the total number of users on that day, and then just declaring a returning user to be a user who wasn't new:
SELECT DateTable.Date, NewUsers, NumUsers - NewUsers AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM Table
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM Table
GROUP CAST(date AS Date)
) C ON DateTable.Date = C.Date
Thanks to Stephen, I made a short fix on his query, which works well even it's a bit time consuming on large database :
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)

MySQL SELECT Query to Turn History into Weekly Summary Over Time

I have a history table ('property_histories') that logs events in our property management system. These events can be used to determine whether a given property was available to rent and I am trying to build a (weekly) summary of 'live' properties.
The 4 events in question are 'published', 'unpublished', 'hidden_from_search' and 'unhidden_from_search.
For a property to be live it must have been:
Published.
If it has ever been unpublished a subsequent published event mush be the most recent.
If it has ever been hidden_from_search a subsequent 'unhidden_from_search' event must have taken place more recently.
Most properties will have a simple history that most likely consists of a single 'Published' event but some are more complicated an example is here:
property_histories
----------------------------
id | property_id | City | status | date
1 | 325407 | Paris | published | 2014-01-01
2 | 325407 | Paris | hidden_from_search | 2014-01-24
3 | 325407 | Paris | unhidden_from_search | 2014-02-05
4 | 325407 | Paris | unpublished | 2014-02-15
5 | 410008 | London | published | 2014-01-01
6 | 410008 | London | unpublished | 2014-01-10
7 | 410008 | London | published | 2014-01-18
My aim is to be able to count 'live' properties by week:
weekly_count
----------------------------
Year | Week | City | Live_Count
2014 | 1 | Paris | 0
2014 | 1 | London | 0
2014 | 2 | Paris | 1
2014 | 2 | London | 1
2014 | 3 | Paris | 1
2014 | 3 | London | 0
2014 | 4 | Paris | 1
2014 | 4 | London | 1
2014 | 5 | Paris | 0
2014 | 5 | London | 1
2014 | 6 | Paris | 0
2014 | 6 | London | 1
2014 | 7 | Paris | 1
2014 | 7 | London | 0
2014 | 8 | Paris | 0
2014 | 8 | London | 1
2014 | 9 | Paris | 0
2014 | 9 | London | 1
----------------------------
Help appreciated!!
Your own test results don't match what you're asking for. You state the live count is by week, which means London should be live in week #1 as it was published in week #1 and then unpublished in week #2.
Assuming week starts on a Sunday (sql default) then this will work. Just put in your own date range, and replace my numbers table with yours.
If you need Monday to be your start date, use this at the top of your query
SET DATEFIRST 1
Emulating your test:
-- Create dummy data
CREATE TABLE #property_histories
(
id int, property_id int, City varchar(50), status varchar(50), date date
)
INSERT INTO #property_histories
SELECT 1 , 325407 , 'Paris' , 'published' , '2014-01-01' UNION ALL
SELECT 2 , 325407 , 'Paris' , 'hidden_from_search' , '2014-01-24' UNION ALL
SELECT 3 , 325407 , 'Paris' , 'unhidden_from_search' , '2014-02-05' UNION ALL
SELECT 4 , 325407 , 'Paris' , 'unpublished' , '2014-02-15' UNION ALL
SELECT 5 , 410008 , 'London' , 'published' , '2014-01-01' UNION ALL
SELECT 6 , 410008 , 'London' , 'unpublished' , '2014-01-10' UNION ALL
SELECT 7 , 410008 , 'London' , 'published' , '2014-01-18'
Now the code:
-- TODO: Set your date range
DECLARE #SD Datetime = '2014-01-01'
DECLARE #ED Datetime = '2014-12-31'
DECLARE #Wks INT = Datediff(week,#SD,#ED) -- Don't change this
-- Generate dates table
SELECT NumberID as 'Week',
DATEADD(DAY, 1-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,#SD)), DateAdd(week,NumberID-1,#SD)) as 'WeekStart',
DATEADD(DAY, 7-DATEPART(WEEKDAY, DateAdd(week,NumberID-1,#SD)), DateAdd(week,NumberID-1,#SD)) as 'WeekEnd'
INTO #Dates
FROM Generic.tblNumbers -- TODO: use your own Numbers table here
WHERE NumberID BETWEEN 1 AND #Wks
-- Now generate report
SELECT T.Year, T.Week, T.City,
SUM(CASE WHEN PH1.status = 'published' THEN 1
WHEN PH1.status = 'unhidden_from_search' THEN 1
ELSE 0 END) as 'Live_Count'
FROM #Dates D1
LEFT JOIN
-- Get latest date per week
(SELECT YEAR(D.WeekStart) as 'Year',
D.Week,
PH.City,
PH.property_ID,
MAX(PH.date) as MaxDate
FROM #Dates D
LEFT JOIN #property_histories PH
ON PH.date BETWEEN #SD AND D.WeekEnd
GROUP BY D.WeekStart, D.Week, D.WeekStart, D.WeekEnd, PH.City, PH.property_id
) T
ON T.Week = D1.Week
LEFT JOIN #property_histories PH1
ON PH1.City = T.City AND PH1.property_id = T.property_id AND PH1.date = T.MaxDate
GROUP BY T.Year, T.Week, T.City
To break down the logic: Firstly I'm creating a helper table with week number, week start and week end dates. Week start is largely redundant but might come in handy for reporting.
I then subquery to get the latest date relevant for each week / city / property. For this "max" date, city and property I get the status, and if it's live, I sum it. So in layman terms ; get the latest status per city per property per week and SUM(if live).
Unlike the other answers posted, this solution caters for gaps in data. If the latest status recorded for a city and property was actually all the way back to week 1, it still works in any subsequent week.
I have a feeling I have missed a simpler way to do this.
However the following query uses 2 sub queries. The first gets all the published / unpublished ranges for a property (ie, the smallest unpublished date following a published date), while the 2nd does the same for properties being hidden from search.
These are then joined to properties on the property id, where the current date is within the range returned by the sub queries. The WHERE clause then checks that a record is matched for published and not found for the hidden sub queries
Had to use DISTINCT as otherwise the multiple published dates for a single unpublish would trigger duplicate property rows being returned.
SELECT DISTINCT properties.*
FROM properties
INNER JOIN
(
SELECT a.property_id, a.created_at AS start_date, IFNULL(MIN(b.created_at), NOW()) AS end_date
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.propert_id
AND a.created_at < b.created_at
WHERE a.status = 'published'
AND b.status = 'unpublished'
GROUP BY a.property_id, a.created_at
) published
ON properties.property_id = published.property_id
AND NOW() BETWEEN published.start_date AND published.end_date
LEFT OUTER JOIN
(
SELECT a.property_id, a.created_at AS start_date, MIN(b.created_at) AS end_date
FROM property_histories a
LEFT OUTER JOIN property_histories b
ON a.property_id = b.propert_id
AND a.created_at < b.created_at
WHERE a.status = 'hidden_from_search'
AND b.status = 'unhidden_from_search'
GROUP BY a.property_id, a.created_at
) hidden
ON properties.property_id = hidden.property_id
AND NOW() BETWEEN hidden.start_date AND hidden.end_date
WHERE published.property_id IS NOT NULL
AND hidden.property_id IS NULL
I used a numbers table as a handy shortcut. Essentially, your question revolved around wanting to know a running sum of published or unhidden versus unpublished or hidden. At this point, the paper IDs become a moot point in the view (provided their uniqueness is properly constrained elsewhere), and all we need is a custom sum. I have the example on SQLFiddle. Here's the query:
select years.n + 2013 as year, weeks.n as week
, c.City
,
(select
sum(case
when status in ('published','unhidden_from_research') then 1
when status in ('unpublished','hidden_from_research') then -1
else 0
end)
from property_histories p2
where weekofyear(p2.date) <= weeks.n
and p2.city=c.city
) AS Live_Count
from numbers weeks
inner join numbers years on weeks.n <= 52
cross join (select City from property_histories group by city) c
where years.n + 2013 <= (select max(year(date)) from property_histories)
group by years.n + 2013, weeks.n
, c.City
;