SELECT users that appear daily - mysql

I have a question that appears easy on the surface but I'm finding challenging, hence the request for help. I have a table with two columns:
table: USERS
USER_ID | LOGGED_IN_DATE
001 | 2015-05-01
002 | 2015-05-01
003 | 2015-05-01
001 | 2015-05-02
...
What I need is a query that will return all of the IDs that were present every day for a given week, say 2015-05-01 through 2015-05-07. Not just anytime during the week, but there must be a record for that user every day. I need the fastest and most concise query possible. Any ideas?
What I tried already:
Sub-queries
Union Queries
self-join
With no success.
Thanks!

Aggregation is probably the easiest way:
select u.user_id
from users u
where u.LOGGED_IN_DATE >= '2015-05-01' and u.LOGGED_IN_DATE < '2015-05-08'
group by u.user_id
having count(distinct date(u.LOGGED_IN_DATE)) = 7;
If the field is really a date with no time, then you don't need the date() function in the having clause.
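To check the aggregation approach on a small data set, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the query text itself is the same). The sample rows are made up for illustration: user 001 logs in every day (twice on one day), 002 misses the last day, and 003 logs in seven times on a single day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, logged_in_date TEXT)")

days = [f"2015-05-{d:02d}" for d in range(1, 8)]
rows = [("001", d) for d in days]        # present on all seven days
rows += [("001", "2015-05-03")]          # extra login on a day already covered
rows += [("002", d) for d in days[:6]]   # missing 2015-05-07
rows += [("003", "2015-05-01")] * 7      # seven logins, all on the same day
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

query = """
    SELECT user_id
    FROM users
    WHERE logged_in_date >= '2015-05-01' AND logged_in_date < '2015-05-08'
    GROUP BY user_id
    HAVING COUNT(DISTINCT date(logged_in_date)) = 7
"""
daily_users = [r[0] for r in conn.execute(query)]
print(daily_users)
```

Only 001 should come back: counting distinct dates ignores the duplicate login and rejects 003, which a plain row count would accept.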

After thinking about it, instead of trying to do some complicated SQL query, I asked myself what does it mean to be online daily. It means that the number of unique dates in that given time period should equal 7. So this query I think works well:
select sub.user_id, sub.count
FROM (select user_id, count(1) as count from users where logged_in_date >= '2015-05-01' AND logged_in_date < '2015-05-08' group by user_id) sub
where sub.count = 7;
Any thoughts/comments?
UPDATE:
This should handle any number of logins at the day level:
SELECT DISTINCT user_id, count(1) AS total
FROM (SELECT DISTINCT user_id, logged_in_date
FROM users
WHERE logged_in_date >= '2015-05-01'
AND logged_in_date < '2015-05-08'
ORDER BY logged_in_date) sub
GROUP BY user_id
HAVING total = 7;
As does @Gordon's answer:
SELECT u.user_id
FROM users u
WHERE u.LOGGED_IN_DATE >= '2015-05-01'
AND u.LOGGED_IN_DATE < '2015-05-08'
GROUP BY u.user_id
HAVING COUNT(DISTINCT DATE(u.LOGGED_IN_DATE)) = 7;
I like his better though. Good job.

Related

User Churn - Final outer statement in a cte

I have a table below as
timestamp | user_id | activity
2021-02-01 03:21:11 mike12 read
2021-02-02 03:45:22 bob55 like
2021-02-03 04:21:33 sarah22 post
2021-02-01 04:11:33 cindy11 sign-in
I want to calculate the number of users churned in the last 7 days as:
number of all users - active users (where active users are those who like, read, comment, or post)
with active_users as
(
select count(distinct user_id)
from table
where activity IN ('comment','post','read','like')
and date_diff(timestamp, current_date()) <= 7
)
, inactive_users as
(select count(distinct user_id)
from table
where activity IN ('sign-in')
and date_diff(timestamp, current_date()) <= 7)
What would be the correct way to subtract the two above? I am unsure of how to join the two ctes in the final query, thanks for helping!
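One possible way to combine two single-row CTEs is to cross join them and subtract in the final SELECT. The sketch below is only an illustration under assumptions: it runs on SQLite via Python's sqlite3 rather than the asker's database, the table is called events, the column is called ts, and the 7-day window is measured from a fixed :as_of parameter instead of the current date. It follows the stated definition (all users minus active users), so the first CTE counts every user in the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events" and "ts" are hypothetical names standing in for the asker's table
conn.execute("CREATE TABLE events (ts TEXT, user_id TEXT, activity TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("2021-02-01 03:21:11", "mike12", "read"),
    ("2021-02-02 03:45:22", "bob55", "like"),
    ("2021-02-03 04:21:33", "sarah22", "post"),
    ("2021-02-01 04:11:33", "cindy11", "sign-in"),
])

query = """
    WITH all_users AS (
        SELECT COUNT(DISTINCT user_id) AS n
        FROM events
        WHERE ts >= datetime(:as_of, '-7 days')
    ),
    active_users AS (
        SELECT COUNT(DISTINCT user_id) AS n
        FROM events
        WHERE activity IN ('comment', 'post', 'read', 'like')
          AND ts >= datetime(:as_of, '-7 days')
    )
    -- each CTE returns exactly one row, so the cross join yields one row
    SELECT all_users.n - active_users.n AS churned
    FROM all_users CROSS JOIN active_users
"""
churned = conn.execute(query, {"as_of": "2021-02-05 00:00:00"}).fetchone()[0]
print(churned)
```

With the four sample rows, only cindy11 has no qualifying activity in the window.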

Get values of first record and last record by date in mysql subquery

I know that for some MySQL pro, this is reasonably straightforward. I further realize that the answer could likely be figured out from other answers, however I've spent some real time trying to build this query, and I can't seem to figure out how to apply those solutions to my situation.
Mine seems different than others who want the "min and max" of a field - but I need the value from another field based on the "min and max" of the date field.
Given the following structure - a "user" table, and an "entries" table:
Data Sample (for "entries" table):
id | user_id | date       | value
---+---------+------------+------
1  | 1       | 2018-02-01 | 125
2  | 5       | 2018-01-15 | 220
3  | 1       | 2017-12-31 | 131
4  | 4       | 2018-01-01 | 77
3  | 1       | 2017-12-15 | 133
I'd like to know value of the first entry (by date), the value of the last entry (by date), and the user_id.
The results should be:
user_id | first_date | first_value | last_date  | last_value
--------+------------+-------------+------------+-----------
1       | 2017-12-15 | 133         | 2018-02-01 | 125
4       | 2018-01-01 | 77          | 2018-01-01 | 77
5       | 2018-01-15 | 220         | 2018-01-15 | 220
While I want the best solution, what I've been working on revolves around combining some queries like so:
SELECT user_id, l.date AS last_date, l.value AS last_value, f.date AS first_date, f.value AS first_value
FROM user AS u
LEFT JOIN (SELECT user_id, date, value FROM entries ORDER BY date ASC LIMIT 1) AS f ON f.user_id = u.user_id
LEFT JOIN (SELECT user_id, date, value FROM entries ORDER BY date DESC LIMIT 1) AS l ON l.user_id = u.user_id
NOTE: This doesn't work. If I wanted the "first entry" for someone, I would write a query that was SELECT user_id, date, value FROM entries ORDER BY date ASC LIMIT 1 - however, using it in the subqueries doesn't have the desired effect.
I've also tried some GROUP BY queries, with no success as well.
SQL Fiddle: http://sqlfiddle.com/#!9/71599
The following query gives you the expected result, but it's done without using a LEFT JOIN. So the NULL values are excluded.
SELECT
u.id AS user_id,
e1.date AS first_date,
e1.value AS first_value,
e2.date AS last_date,
e2.value AS last_value
FROM
users u,
(SELECT * FROM entries e ORDER BY date ASC) e1,
(SELECT * FROM entries e ORDER BY date DESC) e2
WHERE
e1.user_id = u.id
AND
e2.user_id = u.id
GROUP BY
u.id
And here's a working fiddle - http://sqlfiddle.com/#!9/71599/8
Also, it's worth noting that the LIMIT in your attempt would limit the results to 1 for all joined results, not each joined result. Either way, the LEFT JOIN didn't work. If anyone knows why, I'd be interested to understand.
Edit: Here's another attempt, this time utilising MIN() and MAX(), rather than ORDER BY. Unfortunately, you need to join the entries table multiple times for this to work though.
SELECT
u.id AS user_id,
e1.date AS first_date,
e1.value AS first_value,
e2.date AS last_date,
e2.value AS last_value
FROM users u
INNER JOIN entries e1 ON (u.id = e1.user_id)
INNER JOIN entries e2 ON (u.id = e2.user_id)
INNER JOIN (
SELECT user_id, MIN(date) AS date
FROM entries
GROUP BY user_id
) e3 ON (e1.user_id = e3.user_id AND e1.date = e3.date)
INNER JOIN (
SELECT user_id, MAX(date) AS date
FROM entries
GROUP BY user_id
) e4 ON (e2.user_id = e4.user_id AND e2.date = e4.date)
GROUP BY u.id
Another working fiddle: http://sqlfiddle.com/#!9/71599/18
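The MIN()/MAX() idea can be tried out on the sample rows from the question. This is a minimal sketch, assuming SQLite (through Python's sqlite3) in place of MySQL and querying only the entries table, so users without entries are simply absent. It computes both boundary dates in one derived table and joins entries back twice to pick up the matching values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER, user_id INTEGER, date TEXT, value INTEGER)"
)
# Sample rows copied from the question (including the repeated id 3)
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", [
    (1, 1, "2018-02-01", 125),
    (2, 5, "2018-01-15", 220),
    (3, 1, "2017-12-31", 131),
    (4, 4, "2018-01-01", 77),
    (3, 1, "2017-12-15", 133),
])

query = """
    SELECT b.user_id,
           f.date AS first_date, f.value AS first_value,
           l.date AS last_date,  l.value AS last_value
    FROM (SELECT user_id, MIN(date) AS first_date, MAX(date) AS last_date
          FROM entries
          GROUP BY user_id) AS b
    JOIN entries AS f ON f.user_id = b.user_id AND f.date = b.first_date
    JOIN entries AS l ON l.user_id = b.user_id AND l.date = b.last_date
    ORDER BY b.user_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If a user has two entries on the same boundary date, this returns more than one row for that user, so a tie-breaker (for example on id) would be needed in that case.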

How to add or subtract minutes from a timediff result in mysql

I'm trying to write a query that shows all hours worked, days worked and persons.
I have that running.
Assume I have a table called uren:
rec_id | user_id | start (datetime) | eind (datetime)
and I have a table called users:
user_id | name |
With the query below I nearly have all the info I want.
select users.name, sec_to_time(SUM(TIME_TO_SEC(TIMEDIFF(uren.eind, uren.start)))),count(distinct(date(start))) as dagen
from uren, users
where date(uren.start) between CAST('2017-10-04 00:00:00' as Date) and CAST('2017-11-04 00:00:00' as DATE) and
uren.user_id = users.user_id
group by uren.user_id
ORDER BY name
Which shows me this
Piet (name) 230 (hours total) 24(days worked)
Now comes the real question:
I want to subtract 30 minutes for each day on which fewer than 5 hours were worked.
I'm clueless at the moment.
Can someone please help?
Assuming one row per user per day:
select users.name,
(sec_to_time(sum(time_to_sec(timediff(uren.eind, uren.start))) -
30 * 60 * sum(time_to_sec(timediff(uren.eind, uren.start)) < 5*60*60)
)
),
count(distinct(date(start))) as dagen
from uren join users
on uren.user_id = users.user_id
where date(uren.start) between '2017-10-04' and '2017-11-04'
group by uren.user_id
order by name;
If one day has multiple shifts, you need to aggregate by day first:
select u.name,
(sec_to_time(sum(day_secs) -
30 * 60 * sum(day_secs < 5*60*60)
)
),
count(*) as dagen
from (select uren.user_id, users.name, date(uren.start),
sum(time_to_sec(timediff(uren.eind, uren.start))) as day_secs
from uren join
users
on uren.user_id = users.user_id
where uren.start >= '2017-10-04' and uren.start < '2017-11-05'
group by uren.user_id, date(uren.start)
) u
group by name
order by name
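The two-level aggregation (per day first, then per user) can be checked with a small sketch. It assumes SQLite through Python's sqlite3 instead of MySQL, so strftime('%s', ...) replaces time_to_sec(timediff(...)) and the result is left in seconds because SQLite has no sec_to_time. The sample shifts are invented: one day with two shifts totalling 6 hours and one day with a single 3-hour shift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE uren (rec_id INTEGER, user_id INTEGER, start TEXT, eind TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Piet')")
conn.executemany("INSERT INTO uren VALUES (?, ?, ?, ?)", [
    (1, 1, "2017-10-04 08:00:00", "2017-10-04 10:00:00"),  # 2 h
    (2, 1, "2017-10-04 13:00:00", "2017-10-04 17:00:00"),  # 4 h, same day
    (3, 1, "2017-10-05 09:00:00", "2017-10-05 12:00:00"),  # 3 h, short day
])

query = """
    SELECT d.name,
           -- total seconds minus 30 minutes per day shorter than 5 hours
           SUM(d.day_secs) - 30 * 60 * SUM(d.day_secs < 5 * 60 * 60) AS worked_secs,
           COUNT(*) AS dagen
    FROM (SELECT users.user_id, users.name, date(uren.start) AS dag,
                 SUM(CAST(strftime('%s', uren.eind) AS INTEGER)
                   - CAST(strftime('%s', uren.start) AS INTEGER)) AS day_secs
          FROM uren
          JOIN users ON uren.user_id = users.user_id
          WHERE uren.start >= '2017-10-04' AND uren.start < '2017-11-05'
          GROUP BY users.user_id, users.name, date(uren.start)) AS d
    GROUP BY d.user_id, d.name
    ORDER BY d.name
"""
name, worked_secs, dagen = conn.execute(query).fetchone()
print(name, worked_secs, dagen)
```

Here the 6-hour day is left alone and the 3-hour day loses 30 minutes, giving 9 h minus 0.5 h, or 30600 seconds, over 2 days.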
There is a good reason you don't have a clue: what you are asking is fairly complex.
The first thing you need to accept is that your calculation depends on both days and users, so you need to group by both first, and then only by users in the final result. To do this, you need a subquery that groups by day and user before the outer query can group by user.
Here is what I came up with...
SELECT
users.name,
sec_to_time(SUM(ur.timeWorked)) as tWorked,
SUM(ur.dagen) as Dagen
FROM users
INNER JOIN (
SELECT
user_id,
SUM(TIME_TO_SEC(TIMEDIFF(`eind`, `start`))) - (1800 *
IF(SUM(TIME_TO_SEC(TIMEDIFF(`eind`, `start`))) < 18000,1,0))
as `timeWorked`,
count(distinct(date(`start`))) as `dagen`
FROM uren
WHERE date(`start`)
BETWEEN CAST('2017-10-04 00:00:00' as Date)
AND CAST('2017-11-04 00:00:00' as DATE)
GROUP BY user_id, DAYOFYEAR(`start`)
) as ur ON ur.user_id = users.user_id
GROUP BY ur.user_id
ORDER BY name

Joining two tables by date MySQL

I have this:
SELECT * FROM history JOIN value WHERE history.the_date >= value.the_date
Is it possible to phrase this so that history.the_date must be greater than or equal to the largest possible value of value.the_date?
HISTORY
the_date amount
2014-02-27 200
2015-02-26 2000
VALUE
the_date interest
2010-02-10 2
2015-01-01 3
I need to pair the correct interest with the amount!
So value.the_date is the date since when the interest is valid. Interest 2 was valid from 2010-02-10 till 2014-12-31, because since 2015-01-01 the new interest 3 applies.
To get the current interest for a date you'd use a subquery where you select all interest records with a valid-from date up to then and only keep the latest:
select
the_date,
amount,
(
select v.interest
from value v
where v.the_date <= h.the_date
order by v.the_date desc
limit 1
) as interest
from history h;
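The correlated subquery can be run against the sample rows from the question. This is a small sketch assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; the table name value is double-quoted here only as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (the_date TEXT, amount INTEGER)")
conn.execute('CREATE TABLE "value" (the_date TEXT, interest INTEGER)')
conn.executemany("INSERT INTO history VALUES (?, ?)", [
    ("2014-02-27", 200),
    ("2015-02-26", 2000),
])
conn.executemany('INSERT INTO "value" VALUES (?, ?)', [
    ("2010-02-10", 2),
    ("2015-01-01", 3),
])

query = """
    SELECT h.the_date,
           h.amount,
           (SELECT v.interest
            FROM "value" v
            WHERE v.the_date <= h.the_date   -- rates already valid on that date
            ORDER BY v.the_date DESC         -- newest of those first
            LIMIT 1) AS interest
    FROM history h
    ORDER BY h.the_date
"""
paired = conn.execute(query).fetchall()
for row in paired:
    print(row)
```

The 2014 amount is paired with interest 2 and the 2015 amount with interest 3, matching the validity periods described in the answer.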
Put the join condition after ON, not in the WHERE clause:
SELECT * FROM history JOIN (select max(value.the_date) as d from value) as x on history.the_date >= x.d
WHERE 1=1
Presumably, you want this:
select h.*
from history h
where h.the_date >= (select max(v.the_date) from value v);

How can I write a query that aggregate a single row with latest date among multiple set of rows?

I have a MySQL table where there are many rows for each person, and I want to write a query which aggregates rows with a special constraint (one row per person).
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
Most of the time, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT PERSON, MAX(DATE) AS MAX_DATE
FROM TABLE-NAME
GROUP BY PERSON
) A, TABLE-NAME T
WHERE T.PERSON = A.PERSON AND T.DATE = A.MAX_DATE
GROUP BY T.REASON
Try this
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
In MySQL, this kind of query is not very efficient, since you don't have access to tools like the partitioning (window function) queries in SQL Server or Oracle.
You can still emulate it by doing a subquery and retrieve the rows based on the condition you need, here the maximum date :
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM #aTable
GROUP BY name
) maxDateRows
INNER JOIN #aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
You can see a sample here.
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
If you are really stuck with MySQL and the first query is too slow, then you can split the problem.
Do a first query creating a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM #aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results :
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN #aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be solved by this query :
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
It is quite fast and simple. You can view the SQL Fiddle
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
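The join-on-latest-date pattern used in several of the answers above can be verified against the sample data from the question. This is a minimal sketch, assuming SQLite through Python's sqlite3 rather than MySQL, with a hypothetical table name absences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "absences" is a made-up table name; the rows are the question's sample data
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

query = """
    SELECT t.reason, COUNT(*) AS n
    FROM absences AS t
    JOIN (SELECT name, MAX(date) AS max_date
          FROM absences
          GROUP BY name) AS m
      ON m.name = t.name AND m.max_date = t.date   -- keep each person's latest row
    GROUP BY t.reason
    ORDER BY t.reason
"""
distribution = conn.execute(query).fetchall()
print(distribution)
```

This gives Sick 2 and Vacation 1, the expected result from the question. Matching on both name and date matters: matching on the date alone could pick up another person's row that happens to share a timestamp.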