Get the count per date of first logged in - mysql

I have a table containing the logging of a web app which tracks when people log in. An example of my table is:
| user_id | date_time |
+---------+------------------+
| 0033 | 2012-11-22 10:33 | <- first login of 0033 on 2012-11-22
| 0034 | 2012-11-22 10:38 | <- first login of 0034 on 2012-11-22
| 0052 | 2012-11-22 10:43 | <- first login of 0052 on 2012-11-22
| 0052 | 2012-11-23 09:23 |
| 0066 | 2012-11-23 15:58 | <- first login of 0066 on 2012-11-23
| 0033 | 2012-11-23 16:14 |
What I want is a table with the number of people who logged in for the first time on each date, i.e.:
| count | date |
+-------+------------+
| 3 | 2012-11-22 | <- there were 3 users that logged in for the first time on 2012-11-22
| 1 | 2012-11-23 |
I know I can get the date only, by doing
SELECT DATE(`date_time`) AS `date`
FROM `logging`
GROUP BY `date`
ORDER BY `date` ASC
I would like to get the second table in one query, I know it's possible, I just don't know how. Thanks in advance

You can use an uncorrelated subquery to get the first login date for every user and then group those dates together to get the number of first logins per day.
SELECT dd, COUNT(*)
FROM (SELECT MIN(DATE(`date_time`)) AS dd
FROM `logging`
GROUP BY `user_id`) a
GROUP BY dd
ORDER BY dd;
Demo
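To see the two grouping steps at work, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, so the backticks are dropped; the table and column names are the ones from the question.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logging (user_id TEXT, date_time TEXT)")
conn.executemany(
    "INSERT INTO logging VALUES (?, ?)",
    [
        ("0033", "2012-11-22 10:33"),
        ("0034", "2012-11-22 10:38"),
        ("0052", "2012-11-22 10:43"),
        ("0052", "2012-11-23 09:23"),
        ("0066", "2012-11-23 15:58"),
        ("0033", "2012-11-23 16:14"),
    ],
)

first_logins = conn.execute(
    """
    SELECT dd, COUNT(*)
    FROM (SELECT MIN(DATE(date_time)) AS dd   -- first login day of each user
          FROM logging
          GROUP BY user_id) a
    GROUP BY dd                               -- number of users per first-login day
    ORDER BY dd
    """
).fetchall()

print(first_logins)  # [('2012-11-22', 3), ('2012-11-23', 1)]
```

The inner query collapses every user to a single row, so the outer COUNT(*) can only count each user once.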

Count the number of logins per day for user_ids that do not have a previous login record:
select DATE(`date_time`) as `date`,
count(user_id)
from `logging` l1
where user_id not in (
select user_id from `logging` l2 where l1.user_id = l2.user_id and l2.date_time < l1.date_time)
group by DATE(`date_time`)

I think you need this:
SELECT count(1) ,
DATE(`date_time`)
from my_table
group by DATE(`date_time`)
If you need the users who logged in on each day:
SELECT
user_id, DATE(`date_time`) FROM my_table GROUP BY DATE(`date_time`), user_id

Related

Count the first survey sent after a specific incident case resolve date

I'm trying to solve a problem in my analytics project. A representative gets a customer call or email and, if needed, asks an advisor for help (an "advise ring"). After that, the representative resolves the email or call, which sends a survey to the customer.
I want to track only the first survey sent after an advise ring has been resolved. If the customer reopens the incident and the representative resolves it without an advise ring, that survey should not be counted.
Use case: A representative gets an email > Pulls an advise ring > The advisor resolves it > Representative solves the email > Survey is sent (gets counted) > Customer opens the incident again > Rep solves the incident without an advise ring > Survey is sent (does not get counted).
The problem is that I can't track the subsequent dates for some reason.
Survey Table example
| Incident_ID | Survey_sent_date      | survey_id | response |
+-------------+-----------------------+-----------+----------+
| 3324324     | 2022-03-03 16:23:02.1 | 7         |          |
| 3324324     | 2022-03-03 18:32:0.1  | 14        | N        |
| 3324324     | 2022-03-04 11:32:0.1  | 9         | Y        |
| 3324324     | 2022-03-05 16:23:02.1 | 17        | Y        |
| 3324324     | 2022-03-11 18:31:12.1 | 134       | N        |
| 3324324     | 2022-03-11 20:35:12.1 | 139       | N        |
| 3324324     | 2022-03-15 19:45:26.0 | 29        | Y        |
| 3324324     | 2022-03-17 19:45:26.0 | 229       | Y        |
Advise ring table
| Incident_ID | advise_ring_resolve_date | advise_ring_id |
+-------------+--------------------------+----------------+
| 3324324     | 2022-03-11 18:30:52.1    | 247            |
| 3324324     | 2022-03-15 19:30:37.0    | 143            |
Expected Results:
| Incident_ID | Survey_sent_date      | survey_id | response |
+-------------+-----------------------+-----------+----------+
| 3324324     | 2022-03-11 18:31:12.1 | 134       | N        |
| 3324324     | 2022-03-15 19:45:26.0 | 29        | Y        |
I have tried to get the first rank of each date between each survey_sent_date and get the subsequent survey_id between the advise_ring_resolve_date and the survey_sent_date.
What I have in mind is the following:
On the advise_ring_id 247, get the survey sent on 2022-03-11 18:31:12.1 ONLY as it was immediately sent after the advise ring was resolved.
Similarly, the advise_ring_id 143 would get the survey sent on 2022-03-15 19:45:26.0 as it was immediately after the advise ring was resolved. and so on.
WITH surveys_sent AS (
SELECT DISTINCT
ss.incident_id,
ss.survey_sent_date,
ss.survey_id,
ss.response,
ROW_NUMBER() OVER (PARTITION BY ss.survey_id ORDER BY ss.survey_sent_date) AS req
FROM SURVEY_TABLE ss
WHERE ss.incident_id = '3324324'),
advise_ring AS (
SELECT DISTINCT
ar.incident_id,
ar.advise_ring_resolve_date,
ar.advise_ring_id
FROM ADVISE_TABLE ar
WHERE ar.incident_id = '3324324')
SELECT DISTINCT
finalar.incident_id,
finalss.survey_id,
finalss.survey_sent_date
FROM advise_ring finalar
INNER JOIN surveys_sent finalss
ON finalar.incident_id = finalss.incident_id AND finalar.advise_ring_resolve_date BETWEEN
finalar.advise_ring_resolve_date AND finalss.survey_sent_date
WHERE finalss.req = 1;
I tried to full outer and left join the table, also tried to remove the ranking bit which did not help at all.
The full outer join provided me with a union like join where both of the columns are empty at their respective timestamp.
The desired result would have the count OR the details of the surveys sent after the first advise ring has been sent on that specific date.
Do you have any advice on the above?
Thanks!
EDIT:
I have tried to get the rank of the lowest difference between the advise ring date and the survey sent date, however it gives me duplicate values for some reason.
WITH base AS
(
SELECT DISTINCT *
FROM ADVISE_TABLE
WHERE INCIDENT_ID = 3324324
)
SELECT *
FROM
(
SELECT
base.advise_ring_id,
ss.survey_sent_date,
ss.survey_id,
base.advise_ring_resolve_date,
DATEDIFF(sec,ss.survey_sent_date,base.advise_ring_resolve_date)
as time_diff
,ROW_NUMBER() over (PARTITION BY base.advise_ring_id ORDER BY
DATEDIFF(sec,ss.survey_sent_date,base.advise_ring_resolve_date)
DESC) as rk
FROM base
LEFT JOIN SURVEY_TABLE ss
on ss.incident_id = base.incident_id
WHERE
DATEDIFF(sec,ss.survey_sent_date,base.advise_ring_resolve_date)
<=0
)
WHERE rk = 1
Result is from real table but different data:
| advise_ring_id | Survey_sent_date      | survey_id | advise_ring_resolve_date |
+----------------+-----------------------+-----------+--------------------------+
| 123            | 2022-12-22 16:23:02.1 | 7         | 2022-12-15 13:23:02.1    |
| 333            | 2022-01-03 06:55:25.1 | 14        | 2022-12-23 18:23:02.1    |
| 333            | 2022-01-03 06:55:25.1 | 14        | 2022-12-29 11:23:02.1    |
Basically, only the first two rows of this result should be included, not the third one, i.e. one row per date.
I think something like this should work. Use LAG to get the previous survey date based on the current row. Then search for any Advise_Ring records between the previous and current survey date range. Finally use ROW_NUMBER() to rank the Advise_Ring records and return the first one.
WITH surveys AS (
SELECT *
, LAG(Survey_Sent_Date, 1) OVER(
PARTITION BY Incident_Id ORDER BY Survey_Sent_Date
) AS Prev_Survey_Sent_Date
FROM Survey_Table
)
SELECT t.Incident_ID
, t.Survey_sent_date
, t.survey_id
, t.response
, t.advise_ring_id
FROM (
SELECT s.Incident_ID
, s.Survey_sent_date
, s.survey_id
, s.response
, ar.advise_ring_id
, ar.advise_ring_resolve_date
, ROW_NUMBER() OVER(
PARTITION BY s.survey_id
ORDER BY ar.advise_ring_resolve_date, ar.advise_ring_id
) AS advise_row_num
FROM surveys s INNER JOIN Advise_Ring ar
ON s.Incident_ID = ar.Incident_ID
AND s.Prev_Survey_Sent_Date < ar.advise_ring_resolve_date
AND s.Survey_Sent_Date > ar.advise_ring_resolve_date
) t
WHERE t.advise_row_num = 1
ORDER BY t.Survey_Sent_Date
;
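Results:
| Incident_ID | Survey_sent_date    | survey_id | response | advise_ring_id |
+-------------+---------------------+-----------+----------+----------------+
| 7496137492  | 2021-12-22 15:25:48 | 5245      | Y        | 1              |
| 7496137492  | 2022-01-03 06:55:25 | 4613      | Y        | 12             |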
db<>fiddle here
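As a cross-check on the expected results in the question, here is a minimal sketch of a different, simpler technique than the LAG approach above: for each advise ring, a correlated subquery picks the earliest survey sent after the ring's resolve date. It assumes SQLite (via Python's sqlite3) as a stand-in for the real database, the table names survey_table and advise_table, and a subset of the question's sample rows with consistently formatted timestamps so that text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey_table "
    "(incident_id INTEGER, survey_sent_date TEXT, survey_id INTEGER, response TEXT)"
)
conn.execute(
    "CREATE TABLE advise_table "
    "(incident_id INTEGER, advise_ring_resolve_date TEXT, advise_ring_id INTEGER)"
)
conn.executemany(
    "INSERT INTO survey_table VALUES (?, ?, ?, ?)",
    [
        (3324324, "2022-03-05 16:23:02.1", 17, "Y"),
        (3324324, "2022-03-11 18:31:12.1", 134, "N"),
        (3324324, "2022-03-11 20:35:12.1", 139, "N"),
        (3324324, "2022-03-15 19:45:26.0", 29, "Y"),
        (3324324, "2022-03-17 19:45:26.0", 229, "Y"),
    ],
)
conn.executemany(
    "INSERT INTO advise_table VALUES (?, ?, ?)",
    [
        (3324324, "2022-03-11 18:30:52.1", 247),
        (3324324, "2022-03-15 19:30:37.0", 143),
    ],
)

first_surveys = conn.execute(
    """
    SELECT ar.advise_ring_id, s.survey_id, s.survey_sent_date
    FROM advise_table ar
    JOIN survey_table s
      ON s.incident_id = ar.incident_id
     AND s.survey_sent_date = (
           -- earliest survey sent after this advise ring was resolved
           SELECT MIN(s2.survey_sent_date)
           FROM survey_table s2
           WHERE s2.incident_id = ar.incident_id
             AND s2.survey_sent_date > ar.advise_ring_resolve_date)
    ORDER BY s.survey_sent_date
    """
).fetchall()

print(first_surveys)
# [(247, 134, '2022-03-11 18:31:12.1'), (143, 29, '2022-03-15 19:45:26.0')]
```

One caveat of this sketch: if two advise rings are resolved with no survey in between, both map to the same survey. Ranking per survey, as the ROW_NUMBER() step in the answer does, is one way to keep only a single ring in that case.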

How get conditional count while using group by in mysql?

Mysql newbie here.
I have a table( name:'audit_webservice_aua' ) like this:
+---------+------------------------------------+-------------------+------------------------+
| auditId | device_code | response_status | request_date
+---------+------------------------------------+-------------------+------------------------+
| 10001 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-03-02 00:00:08.785
| 10002 | 0007756-gyy66-4c6e-a59d-xxxccyyyt2 | F | 2020-04-06 00:00:08.785
| 10003 | 0007756-gyy66-4c6e-a59d-xxxccyyyt3 | F | 2020-04-01 00:01:08.785
| 10004 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-05-02 00:02:08.785
| 10005 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-05-09 00:03:08.785
| 10006 | 0007756-gyy66-4c6e-a59d-xxxccyyyt2 | P | 2020-05-09 01:00:08.785
| 10007 | 0007756-gyy66-4c6e-a59d-xxxccyyyt7 | F | 2020-06-06 02:00:08.785
+---------+------------------------------------+-------------------+------------------------+
Every time a new request is made, the table above stores the requesting device_code, the response_status and the request time.
I need a result set that contains the device_code, total_trans, total_successful, total_failure and date for each day between two given dates.
The query I have written is as follows:
SELECT DATE_FORMAT(aua.request_date,'%b') as month ,
YEAR(aua.request_date) as year,
DATE_FORMAT(aua.request_date,'%Y-%m-%d') as date,
(select count(aua.audit_id) )as total_trans ,
(select count(aua.audit_id) where aua.response_status <> 'P') as total_failure ,
(select count(aua.audit_id) where aua.response_status = 'P') as total_successful ,
aua.device_code as deviceCode
FROM audit_webservice_aua aua where DATE_FORMAT(aua.request_date,'%Y-%m-%d') between '2020-04-16' and '2020-07-17'
group by dates,deviceCode ;
In the above code I'm trying to get results between '2020-03-02' and '2020-06-06', but the count I'm getting is not correct.
Any help would be appreciated.
Thank you in advance.
I think you just want conditional aggregation:
SELECT DATE_FORMAT(aua.request_date,'%b') as month ,
YEAR(aua.request_date) as year,
DATE_FORMAT(aua.request_date, '%Y-%m-%d') as date,
COUNT(aua.audit_id) as total_trans ,
SUM(aua.response_status <> 'P') as total_failure,
SUM(aua.response_status = 'P') as total_successful,
aua.device_code as deviceCode
FROM audit_webservice_aua aua
WHERE DATE_FORMAT(aua.request_date, '%Y-%m-%d') between '2020-04-16' and '2020-07-17'
GROUP BY month, year, date, deviceCode ;
I would also advise you to change the WHERE clause so that it compares the column directly, which lets MySQL use an index on request_date:
WHERE aua.request_date >= '2020-04-16' AND
aua.request_date < '2020-07-18'
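Here is a minimal sketch of conditional aggregation combined with a half-open date range (on or after the start date, strictly before the day after the end date). It assumes SQLite (via Python's sqlite3) instead of MySQL, and the short device codes and rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_webservice_aua "
    "(device_code TEXT, response_status TEXT, request_date TEXT)"
)
# Hypothetical rows: 'P' means success, anything else counts as a failure.
conn.executemany(
    "INSERT INTO audit_webservice_aua VALUES (?, ?, ?)",
    [
        ("A", "P", "2020-05-09 00:03:08"),
        ("A", "F", "2020-05-09 10:00:00"),
        ("A", "P", "2020-05-09 23:59:59"),
        ("B", "F", "2020-05-09 01:00:00"),
        ("A", "P", "2020-07-17 12:00:00"),  # last day of the range, included
        ("A", "P", "2020-07-18 00:00:00"),  # first instant after it, excluded
    ],
)

daily_counts = conn.execute(
    """
    SELECT DATE(request_date)          AS day,
           device_code,
           COUNT(*)                    AS total_trans,
           SUM(response_status = 'P')  AS total_successful,
           SUM(response_status <> 'P') AS total_failure
    FROM audit_webservice_aua
    WHERE request_date >= '2020-04-16'
      AND request_date <  '2020-07-18'
    GROUP BY day, device_code
    ORDER BY day, device_code
    """
).fetchall()

print(daily_counts)
# [('2020-05-09', 'A', 3, 2, 1), ('2020-05-09', 'B', 1, 0, 1), ('2020-07-17', 'A', 1, 1, 0)]
```

Each comparison evaluates to 1 or 0 per row, so SUM over it counts the matching rows inside every group without needing a subquery.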

mysql counts the number of new visitors by day

MySQL version: 5.7
Here is users table:
+------------+------+
| date | uid |
+------------+------+
| 2020-06-29 05:00:00 | 352 |
| 2020-06-29 08:00:00 | 354 |
| 2020-06-29 09:25:53 | 354 |
| 2020-06-30 08:00:00 | 863 |
| 2020-06-30 09:00:01 | 352 |
| 2020-06-30 09:59:59 | 352 |
| 2020-07-01 07:00:00 | 358 |
| 2020-07-01 09:00:00 | 358 |
+------------+------+
I want to count the number of new visitors per day, with one important condition: a user counts as a new visitor on a given day only if they have never visited before.
I want the result:
Result:
+------------+------------------+
| date | new_user_count |
+------------+------------------+
| 2020-06-29 | 2 |
| 2020-06-30 | 1 |
| 2020-07-01 | 1 |
+------------+------------------+
The above result is equivalent to these three sql:
2020-06-29 (352,354) : select count( distinct uid ) as new_user_count from users where DATE(date) = '2020-06-29' and uid not in ( select distinct uid from users where date < '2020-06-29 05:00:00'); #2
2020-06-30 (863): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-06-30' and uid not in ( select distinct uid from users where date < '2020-06-30 08:00:00'); # 1
2020-07-01 (358): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-07-01' and uid not in ( select distinct uid from users where date < '2020-07-01 07:00:00'); # 1
I haven't been able to work out a single query for this so far, thanks.
Here is Online users table
You could try using a correlated subquery to check whether each user visit is the first or not:
SELECT
date,
SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM users u2
WHERE u2.date < u1.date AND u2.uid = u1.uid)
THEN 1 ELSE 0 END) AS new_user_count
FROM
(SELECT DISTINCT date, uid FROM users) u1
GROUP BY
date;
Demo
The above logic reads straightforwardly: count a user record only if we cannot find that same user appearing in the table at some earlier date. Note that I use distinct selects, because it appears that in your data a given user might appear more than once on the same date. This data would spoof the above correlated subquery, so we ensure that a given user appears only once on a given date (and besides, one user can only be counted once per day anyway).
SELECT
date,
(
SELECT COUNT(DISTINCT u1.uid)
FROM users u1
WHERE NOT EXISTS(
SELECT * FROM users u2
WHERE u2.uid = u1.uid AND u2.date < u0.date
) AND u1.date = u0.date
)
FROM
users u0
GROUP BY
date
;
-- get date and the amount of distinct users
SELECT date, COUNT(DISTINCT uid)
-- from users table
FROM users
-- only when there not exists a row
WHERE NOT EXISTS ( SELECT NULL -- may use any literal value instead of NULL
-- in the table
FROM users u
-- with this user id
WHERE users.uid = u.uid
-- but earlier (less) date
AND users.date > u.date )
GROUP BY date;
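To check the NOT EXISTS idea against the sample table from the question, here is a minimal sketch. It assumes SQLite (via Python's sqlite3) rather than MySQL 5.7, and, unlike the queries above, it truncates the timestamps to days with DATE() so that the output has one row per calendar day; that truncation is my assumption about the intended grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (date TEXT, uid INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("2020-06-29 05:00:00", 352),
        ("2020-06-29 08:00:00", 354),
        ("2020-06-29 09:25:53", 354),
        ("2020-06-30 08:00:00", 863),
        ("2020-06-30 09:00:01", 352),
        ("2020-06-30 09:59:59", 352),
        ("2020-07-01 07:00:00", 358),
        ("2020-07-01 09:00:00", 358),
    ],
)

new_users_per_day = conn.execute(
    """
    SELECT DATE(u1.date) AS day, COUNT(DISTINCT u1.uid) AS new_user_count
    FROM users u1
    WHERE NOT EXISTS (            -- keep a visit only if the same user
        SELECT 1                  -- has no visit on an earlier day
        FROM users u2
        WHERE u2.uid = u1.uid
          AND DATE(u2.date) < DATE(u1.date))
    GROUP BY day
    ORDER BY day
    """
).fetchall()

print(new_users_per_day)
# [('2020-06-29', 2), ('2020-06-30', 1), ('2020-07-01', 1)]
```

COUNT(DISTINCT uid) takes care of users who appear several times on their first day, such as uid 354 on 2020-06-29.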

Query to get the count of logins by a user within a set time interval from previous login

I want to get a count of how many times a user logs in within, let's say, 5 hours from the previous login.
So something like new_login - old_login < 5 hours.
The login table would have user_id and time_accessed.
This query is to get the count of user logins within a day. I can't figure out how to compare the different times within the same column within the same statement:
SELECT user_id, date(time_accessed), count(user_id) AS login_within_5_hour_period
FROM login
GROUP BY user_id, date(time_accessed)
ORDER BY time_accessed;
Sample input
+---------+---------------------+
| user_id | time_accessed |
+---------+---------------------+
| 1 | 2020-02-19 09:00:00 |
| 1 | 2020-02-19 12:00:00 |
| 1 | 2020-02-19 13:00:00 |
| 1 | 2020-02-19 19:00:00 |
+---------+---------------------+
Sample output
+---------+---------------------+----------------------------+
| user_id | date(time_accessed) | login_within_5_hour_period |
+---------+---------------------+----------------------------+
| 1 | 2020-02-19 | 3 |
| 1 | 2020-02-19 | 1 |
+---------+---------------------+----------------------------+
In order to compare different times, you need to join the table with itself.
The following query will find the number of logins by the user within 5 hours, excluding the current login. If you want to include the current login in the count, change this l1.time_accessed > l2.time_accessed to l1.time_accessed >= l2.time_accessed.
SELECT l1.user_id, l1.time_accessed, COUNT(l2.user_id) AS login_within_5_hour_period
FROM logins l1
LEFT JOIN logins l2
ON l1.user_id = l2.user_id
AND l1.time_accessed > l2.time_accessed
AND TIME_TO_SEC(TIMEDIFF(l1.time_accessed, l2.time_accessed)) / 3600 <= 5
GROUP BY l1.user_id, l1.time_accessed;
This second query will return a single result, showing the number of logins by the user within 5 hours of the time specified.
SELECT l1.user_id, l1.time_accessed, COUNT(l2.user_id) AS login_within_5_hour_period
FROM logins l1
LEFT JOIN logins l2
ON l1.user_id = l2.user_id
AND l1.time_accessed > l2.time_accessed
AND TIME_TO_SEC(TIMEDIFF(l1.time_accessed, l2.time_accessed)) / 3600 <= 5
WHERE l1.time_accessed = '2020-02-19 19:00:00'
GROUP BY l1.user_id, l1.time_accessed;
Working example: https://www.db-fiddle.com/f/g7jDYqoKn38iQTFuPjej9m/1
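For a quick local check, here is a minimal sketch of the first self-join query run against the sample input from the question. It assumes SQLite (via Python's sqlite3) in place of MySQL, so TIME_TO_SEC(TIMEDIFF(...)) is swapped for a difference of strftime('%s', ...) values in whole seconds; the logins table name follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, time_accessed TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2020-02-19 09:00:00"),
        (1, "2020-02-19 12:00:00"),
        (1, "2020-02-19 13:00:00"),
        (1, "2020-02-19 19:00:00"),
    ],
)

logins_within_5h = conn.execute(
    """
    SELECT l1.user_id, l1.time_accessed, COUNT(l2.user_id)
    FROM logins l1
    LEFT JOIN logins l2
      ON l1.user_id = l2.user_id
     AND l1.time_accessed > l2.time_accessed        -- earlier logins only
     AND CAST(strftime('%s', l1.time_accessed) AS INTEGER)
       - CAST(strftime('%s', l2.time_accessed) AS INTEGER) <= 5 * 3600
    GROUP BY l1.user_id, l1.time_accessed
    ORDER BY l1.time_accessed
    """
).fetchall()

for row in logins_within_5h:
    print(row)
# (1, '2020-02-19 09:00:00', 0)
# (1, '2020-02-19 12:00:00', 1)
# (1, '2020-02-19 13:00:00', 2)
# (1, '2020-02-19 19:00:00', 0)
```

The LEFT JOIN keeps logins that have no earlier login in the window, and COUNT(l2.user_id) reports 0 for them because it skips the NULLs.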

calculate the Ranking of questionnaire

I am trying to develop a ranking table for a sort of questionnaire.
Each day a question is asked at 16h (4:00 pm), which can be answered by 17:59:59 the following day. The table has to show the position of the participants, taking into account both the correct answers and the time.
My table will be of the sort:
+-------+---------+---------------------+
|userid | correct | timestamp |
+-------+---------+---------------------+
| 2 | 1 | 2018-02-07 16:00:01 |
| 1 | 1 | 2018-02-07 16:02:00 |
| 3 | 1 | 2018-02-07 17:00:00 |
| 1 | 0 | 2018-02-08 16:00:02 |
| 3 | 1 | 2018-02-08 16:00:05 |
| 2 | 0 | 2018-02-08 16:01:00 |
+-------+---------+---------------------+
For now I started with this query:
SELECT `userid`, `correct`, `timestamp`,
count(correct) as count
FROM `results`
WHERE correct = 1
GROUP BY `userid`
ORDER BY count DESC, timestamp DESC
But I have already realized that this is not what I intend because the ranking has to be cumulative but taking into account the several days.
Does anyone have an idea how I can do this?
A user from Stackoverflow Portugal advised this code but it is not working either.
SELECT userid, SUM(correct),
SUM(TIMESTAMPDIFF(HOUR,(timestamp,CAST(CONCAT_WS(' ',date(timestamp), '17:59:59') as DATETIME)))) time
FROM results
GROUP BY userid
ORDER BY correct DESC, time
Don't tie the query to this time of day (16h); it may change later and the query would break.
Instead, you should count by userid and questionnaire id. To do so:
- add a new table questionnaire [id, title] (you can add extra columns later: created_time, end_time, …)
- edit your results table by adding the questionnaire id as a FK: [userid, questionnaireid, correct, timestamp]
- then count normally: correct answers by user, by questionnaire
SELECT userid, questionnaireid ,
sum(correct) as total
FROM results r
INNER JOIN questionnaire q
ON r.questionnaireid = q.id
WHERE correct = 1
GROUP BY userid, questionnaireid
ORDER BY total DESC, id ASC
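As a rough illustration of the suggested schema, here is a minimal sketch that counts correct answers per user and questionnaire and also builds a cumulative ranking per user. It assumes SQLite (via Python's sqlite3) instead of MySQL; the questionnaire titles and the assignment of the question's sample rows to questionnaires 1 and 2 are hypothetical, and the tie-breaking ORDER BY columns are my choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questionnaire (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE results "
    "(userid INTEGER, questionnaireid INTEGER, correct INTEGER, timestamp TEXT)"
)
# Hypothetical titles; one questionnaire per day of the sample data.
conn.executemany(
    "INSERT INTO questionnaire VALUES (?, ?)",
    [(1, "Question of 2018-02-07"), (2, "Question of 2018-02-08")],
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        (2, 1, 1, "2018-02-07 16:00:01"),
        (1, 1, 1, "2018-02-07 16:02:00"),
        (3, 1, 1, "2018-02-07 17:00:00"),
        (1, 2, 0, "2018-02-08 16:00:02"),
        (3, 2, 1, "2018-02-08 16:00:05"),
        (2, 2, 0, "2018-02-08 16:01:00"),
    ],
)

# Correct answers per user and questionnaire, as in the query above.
per_questionnaire = conn.execute(
    """
    SELECT r.userid, r.questionnaireid, SUM(r.correct) AS total
    FROM results r
    INNER JOIN questionnaire q ON r.questionnaireid = q.id
    WHERE r.correct = 1
    GROUP BY r.userid, r.questionnaireid
    ORDER BY total DESC, r.questionnaireid, r.userid
    """
).fetchall()

# Cumulative ranking over all questionnaires.
overall = conn.execute(
    """
    SELECT userid, SUM(correct) AS total
    FROM results
    GROUP BY userid
    ORDER BY total DESC, userid
    """
).fetchall()

print(per_questionnaire)  # [(1, 1, 1), (2, 1, 1), (3, 1, 1), (3, 2, 1)]
print(overall)            # [(3, 2), (1, 1), (2, 1)]
```

Grouping on questionnaireid rather than on a time window means the ranking keeps working if the daily start time ever changes.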