MySql Select last row with 30 minutes difference in date

This is a follow-up to the question "MySql Select rows with 30 minutes difference in date". The concept is similar, but the solution needed might be different.
I have a MySql-8.0/MariaDb-10.4 table that contains a list of site visits of different visitors:
I want to create a query that returns the last visit of each visit session, where a new session starts whenever a visit's CreatedAt is 30 minutes or more after the same visitor's previous visit.
So in my case, I should be returning row 7 (Id column), row 12 and row 13. Note also that a session can be more than 30 minutes, as long as each visit succeeds a previous visit with less than 30min.
The neat solution suggested by @EugenRieck was as follows:
SELECT
    late.*
FROM activities AS late
LEFT JOIN activities AS early
    ON late.VisitorId=early.VisitorId
    AND late.CreatedAt>early.CreatedAt
    AND late.CreatedAt<=DATE_ADD(early.CreatedAt, INTERVAL +30 MINUTE)
WHERE early.Id IS NULL
-- Maybe: AND late.VisitorId='26924c19-3cd1-411e-a771-5ebd6806fb27'
-- Maybe: ORDER BY late.CreatedAt
It works great, but it returns the first visit in each visit session, not the last visit. I tried to modify it to work as I wanted, but with no luck. Please help.

This is a variant of the gaps-and-islands problem, but you can handle it using lead(). Just check whether the next createdAt is more than 30 minutes after the value in a given row; if it is, that row is the last row of a session:
select a.*
from (select a.*,
             lead(createdAt) over (partition by visitorid order by createdat) as next_ca
      from activities a
     ) a
where next_ca > createdAt + interval 30 minute;
Usually, in this situation you would want the final row as well, which has no following visit (row 13 in the question). You would get that by adding or next_ca is null to the where clause.
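To make the lead() logic concrete, here is a minimal Python sketch of the same check. The sample rows, the tuple layout (id, visitor_id, created_at) and the function name last_visits are assumptions made for illustration; they are not taken from the question's table.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)

# Hypothetical sample data: (id, visitor_id, created_at)
SAMPLE = [
    (1, "v1", datetime(2020, 4, 4, 10, 0)),
    (2, "v1", datetime(2020, 4, 4, 10, 20)),
    (3, "v1", datetime(2020, 4, 4, 10, 45)),
    (4, "v1", datetime(2020, 4, 4, 11, 30)),
    (5, "v1", datetime(2020, 4, 4, 11, 40)),
    (6, "v2", datetime(2020, 4, 4, 10, 5)),
]


def last_visits(rows, include_final=True):
    """Return the last visit of each session, ordered by id."""
    result = []
    # Equivalent of: partition by visitorid order by createdat
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        visits = list(group)
        # Pair every visit with the next one, like lead(createdAt)
        for current, nxt in zip(visits, visits[1:]):
            # Same test as: next_ca > createdAt + interval 30 minute
            if nxt[2] > current[2] + SESSION_GAP:
                result.append(current)
        if include_final:
            # Same as: or next_ca is null (no following visit)
            result.append(visits[-1])
    return sorted(result, key=lambda r: r[0])


last_ids = [row[0] for row in last_visits(SAMPLE)]
```

With this sample, visits 1 to 3 form one session longer than 30 minutes, because each visit follows the previous one by less than 30 minutes.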

Related

What's the difference between the two SQL statements?

This is a question from LeetCode. Using the second query I got the question wrong, but I could not identify why:
SELECT
user_id,
max(time_stamp) as "last_stamp"
from
logins
where
year(time_stamp) = '2020'
group by
user_id
and
select
user_id,
max(time_stamp) as "last_stamp"
from
logins
where
time_stamp between '2020-01-01' and '2020-12-31'
group by
user_id
The first query uses a function on every row to extract the year (an integer) and compares that to a string. (It would be preferable to use an integer instead.) Whilst this may be sub-optimal, this query would accurately locate all rows that fall into the year 2020.
The second query could fail to locate all rows that fall into 2020. Here it is important to remember that days have a 24-hour duration, and that each day starts at midnight and concludes at midnight 24 hours later. That is, a day has a start-point (midnight) and an end-point (midnight + 24 hours).
However a single date used in SQL code cannot be both the start-point and the end-point of the same day, so every date in SQL represents only the start-point. Also note here, that between does NOT magically change the second given date into "the end of that day" - it simply cannot (and does not) do that.
So, when you use time_stamp between '2020-01-01' and '2020-12-31' you need to think of it as meaning "from the start of 2020-01-01 up to and including the start of 2020-12-31". Hence, this excludes every row on 2020-12-31 with a time later than 00:00:00, which is almost the whole of that day.
The safest way to deal with this is to NOT use between at all, instead write just a few characters more code which will be accurate regardless of the time precision used by any date/datetime/timestamp column:
where
time_stamp >= '2020-01-01' and time_stamp <'2021-01-01'
with the second date being "the start-point of the next day"
See answer to SQL "between" not inclusive
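The difference between the two filters can be sketched in Python with datetime comparisons. The login values below are made up for illustration, and the helper names in_between and in_half_open are hypothetical; the comparisons mirror how a date-only literal is treated as midnight at the start of that day.

```python
from datetime import datetime

# Hypothetical login timestamps, including two on the last day of 2020
logins = [
    datetime(2020, 1, 1, 0, 0, 0),
    datetime(2020, 6, 15, 12, 30, 0),
    datetime(2020, 12, 31, 0, 0, 0),
    datetime(2020, 12, 31, 18, 45, 0),
    datetime(2021, 1, 1, 0, 0, 0),
]


def in_between(ts):
    # Mirrors: time_stamp between '2020-01-01' and '2020-12-31'
    # Both bounds are inclusive and both mean midnight at the start of the day.
    return datetime(2020, 1, 1) <= ts <= datetime(2020, 12, 31)


def in_half_open(ts):
    # Mirrors: time_stamp >= '2020-01-01' and time_stamp < '2021-01-01'
    return datetime(2020, 1, 1) <= ts < datetime(2021, 1, 1)


between_rows = [ts for ts in logins if in_between(ts)]
half_open_rows = [ts for ts in logins if in_half_open(ts)]

# Rows that belong to 2020 but are dropped by the between filter
missed = [ts for ts in half_open_rows if ts not in between_rows]
```

Here missed contains only the 18:45 login on 2020-12-31, which is exactly the kind of row the between version silently drops.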

MySql Select rows with 30 minutes difference in date

I have a MySql-8.0/MariaDb-10.4 table that contains a list of site visits of different visitors:
I want to create a query that returns the first visit of each visit session, where the session definition is where the CreatedAt date is 30 min or more from the previous visits.
So in my case, I should be returning row 2 (Id column), row 8 and row 13. Note also that a session can be more than 30 minutes, as long as each visit succeeds a previous visit with less than 30min.
My solution was as follows:
SELECT DISTINCT a.`CreatedAt`
FROM activities AS a
LEFT JOIN activities AS b
ON (
(UNIX_TIMESTAMP(b.`CreatedAt`) >= (UNIX_TIMESTAMP(a.`CreatedAt`) - (30 * 60)) ) AND
(b.`CreatedAt` < a.`CreatedAt`)
)
WHERE (b.`CreatedAt` IS NULL) AND (a.`VisitorId` = '26924c19-3cd1-411e-a771-5ebd6806fb27' /* or others for example */ )
It works alright, but it does not return the last row (13), and I'm not sure it's the best solution. Thanks in advance.
The easiest way to approach this is to relate all visits to their earlier siblings and then choose only those that have none. The (more intuitive) other approach of taking the first of each group that has a later sibling will fail if no later visit exists (as in your example with ID 13).
SELECT
    late.*
FROM activities AS late
LEFT JOIN activities AS early
    ON late.VisitorId=early.VisitorId
    AND late.CreatedAt>early.CreatedAt
    AND late.CreatedAt<=DATE_ADD(early.CreatedAt, INTERVAL +30 MINUTE)
WHERE early.Id IS NULL
-- Maybe: AND late.VisitorId='26924c19-3cd1-411e-a771-5ebd6806fb27'
-- Maybe: ORDER BY late.CreatedAt
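The anti-join above can be read as "keep a visit only if no earlier visit by the same visitor lies within the preceding 30 minutes". A rough Python sketch of that reading follows; the sample rows, the tuple layout (id, visitor_id, created_at) and the name first_visits are assumptions for illustration, and the nested scan is written for clarity rather than speed.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

# Hypothetical sample data: (id, visitor_id, created_at)
VISITS = [
    (1, "v1", datetime(2020, 4, 4, 10, 0)),
    (2, "v1", datetime(2020, 4, 4, 10, 20)),
    (3, "v1", datetime(2020, 4, 4, 10, 45)),
    (4, "v1", datetime(2020, 4, 4, 11, 30)),
    (5, "v1", datetime(2020, 4, 4, 11, 40)),
    (6, "v2", datetime(2020, 4, 4, 10, 5)),
]


def first_visits(rows):
    """Keep rows for which the LEFT JOIN would find no 'early' partner."""
    firsts = []
    for late in rows:
        has_early = any(
            early[1] == late[1]                  # late.VisitorId = early.VisitorId
            and late[2] > early[2]               # late.CreatedAt > early.CreatedAt
            and late[2] <= early[2] + WINDOW     # late.CreatedAt <= early.CreatedAt + 30 min
            for early in rows
        )
        if not has_early:                        # WHERE early.Id IS NULL
            firsts.append(late)
    return firsts


first_ids = [row[0] for row in first_visits(VISITS)]
```

Visit 6 is returned even though its visitor has no later visit, which is the case the "later sibling" approach would miss.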
I've got a similar answer to @Eugen Rieck's (https://stackoverflow.com/a/61027502/625144), but using the MySQL TIMESTAMPDIFF function:
SELECT a.*
FROM activities a
LEFT JOIN activities b
    ON b.VisitorId = a.VisitorId
    AND a.Id > b.Id
    AND TIMESTAMPDIFF(MINUTE, b.CreatedAt, a.CreatedAt) <= 30
WHERE
    b.Id IS NULL
;

MySQL: Select count of returning vs. new rows in MySQL with period

I have a database table in MySQL, which consists of the following fields:
id
user_id
timestamp
The table is a simple log of visitors. I am trying to get the following numbers in one query:
Distinct user_id's for a specific time period (30 days)
Amount of these user_id's, which already exist in the table, regardless of time period
I have been able to do it within the period with this simple query:
SELECT
COUNT(DISTINCT user_id) AS 'count_distinct',
COUNT(user_id) AS 'count_all'
FROM
table
WHERE
timestamp BETWEEN CURDATE() - INTERVAL 30 DAY AND CURDATE();
Running this query gives me the count of distinct user_id's and the count of all user_id's within the time period. I can then do the math myself to get the count of new vs. returning visitors for that period. What I am trying to figure out is how many of the distinct user_id's that visited within the last 30 days have also visited at any earlier point in time.
I hope you can help me solve this.

How to get a rolling data set by week with sql

I had a sql query I would run that would get a rolling sum (or moving window) data set. I would run this query for every 7 days, increase the interval number by 7 (28 in example below) until I reached the start of the data. It would give me the data split by week so I can loop through it on the view to create a weekly graph.
SELECT *
FROM `table`
WHERE `row_date` >= DATE_SUB(NOW(), INTERVAL 28 DAY)
AND `row_date` <= DATE_SUB(NOW(), INTERVAL 21 DAY)
This is of course very slow once you have several weeks worth of data. I wanted to replace it with a single query. I came up with this.
SELECT *,
CONCAT(YEAR(row_date), '/', WEEK(row_date)) as week_date
FROM `table`
GROUP BY week_date
ORDER BY row_date DESC
It appeared mostly accurate, except I noticed that the current week and the last week of 2015 were much lower than usual. That's because this query groups by calendar week starting on Sunday (or Monday?), meaning that the count resets at the start of each week.
Here's a data set of employees that you can use to demonstrate the behavior.
CREATE TABLE employees (
id INT NOT NULL,
first_name VARCHAR(14) NOT NULL,
last_name VARCHAR(16) NOT NULL,
row_date DATE NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO `employees` VALUES
(1,'Bezalel','Simmel','2016-12-25'),
(2,'Bezalel','Simmel','2016-12-31'),
(3,'Bezalel','Simmel','2017-01-01'),
(4,'Bezalel','Simmel','2017-01-05');
This data will return the last 3 rows on the same data point on the old query (last 7 days) assuming you run it today 2017-01-06, but only the last 2 rows on the same data point on the new query (Sunday to Saturday).
For more information on what I mean by rolling or moving window, see this English stack exchange link.
https://english.stackexchange.com/questions/362791/word-for-graph-that-counts-backwards-vs-graph-that-counts-forwards
How can I write a query in MySQL that will bring me rolling data, where the last data point is the last 7 days of data, the previous point is the previous 7 days, and so on?
I've had to interpret your question a lot so this answer might be unsuitable. It sounds like you are trying to get a graph showing data historically grouped into 7-day periods. Your current attempt does this by grouping on calendar week instead of by 7-day period leading to inconsistent size of periods.
So using a modification of your dataset on sql fiddle ( http://sqlfiddle.com/#!9/90f1f2 ) I have come up with this
SELECT
-- Figure out how many periods of 7 days ago this record applies to
FLOOR( DATEDIFF( CURRENT_DATE , row_date ) / 7 ) AS weeks_ago,
-- Count the number of ids in this group
COUNT( DISTINCT id ) AS number_in_week,
-- Because this is grouped, make sure to have some consistency on what we select instead of leaving it to chance
MIN( row_date ) AS min_date_in_week_in_dataset
FROM `sample_data`
-- Groups by weeks ago because that's what you are interested in
GROUP BY weeks_ago
ORDER BY
min_date_in_week_in_dataset DESC;
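The bucketing expression FLOOR(DATEDIFF(CURRENT_DATE, row_date) / 7) can be checked by hand with a short Python sketch. It uses the four row_date values from the question's employees data and fixes "today" at 2017-01-06, as the question does; the function name weekly_counts is hypothetical.

```python
from collections import Counter
from datetime import date

# (id, row_date) pairs taken from the question's sample INSERT
rows = [
    (1, date(2016, 12, 25)),
    (2, date(2016, 12, 31)),
    (3, date(2017, 1, 1)),
    (4, date(2017, 1, 5)),
]


def weekly_counts(rows, today):
    """Count rows per rolling 7-day period, where 0 is the most recent period."""
    counts = Counter()
    for _row_id, row_date in rows:
        # Same as FLOOR(DATEDIFF(CURRENT_DATE, row_date) / 7)
        weeks_ago = (today - row_date).days // 7
        counts[weeks_ago] += 1
    return dict(sorted(counts.items()))


counts = weekly_counts(rows, date(2017, 1, 6))
```

With these inputs the last three rows land in period 0 and the first row in period 1, which matches the "last 7 days" behavior the question asks for rather than the calendar-week grouping.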

Select Top Viewed From Last 7 Days

I have a table with a date stamp column that holds a Unix timestamp, e.g. 1241037505. There's also a column with the number of views.
The date stamp represents when the row was created.
So I want to select the top viewed threads from the past week.
How do I do this?
Try this:
SELECT * FROM table WHERE
DATEDIFF(NOW(), FROM_UNIXTIME(created_date)) < 7
SELECT * FROM table WHERE createdon > UNIX_TIMESTAMP(SUBDATE(NOW(), INTERVAL 7 DAY)) ORDER BY hits DESC;
See: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_subdate
The data you're currently tracking isn't going to allow you to select the top viewed in the last week. It will show you the top viewed over all time, or the most viewed items created in the last week. If something was created two weeks ago, but was viewed more than anything else during the last week you cannot determine that from the data you're tracking. One way I can see to do it would be to track the number of hits each content item gets each day of the week.
create table daily_hits (
    cid integer, -- content id points to the table you already have
    dotw smallint, -- 0-6 or similar
    hits integer,
    PRIMARY KEY (cid, dotw)
);
Whenever you increase the hit count on the content item, you would also update the daily_hits table for the given content id and day of the week. You would need a function that converts the current date/time to a day of the week. MySQL provides DAYOFWEEK for this purpose (it returns 1 for Sunday through 7 for Saturday); WEEKDAY returns 0 for Monday through 6 for Sunday.
To get the most viewed in the last week, you could query like this:
SELECT cid, SUM(hits) FROM daily_hits GROUP BY cid ORDER BY SUM(hits) DESC
You will need some type of scheduled job that deletes the current day of the week at midnight so you aren't accumulating forever and essentially performing the same accumulation happening on the hits column of the current table.
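The per-day bucket idea can be sketched in Python as an in-memory stand-in for the daily_hits table. The class name DailyHits and its methods are hypothetical, and the sketch assumes the scheduled job calls reset_day at midnight with the day of the week that is just starting.

```python
from collections import defaultdict


class DailyHits:
    """In-memory stand-in for the daily_hits table keyed by (cid, dotw)."""

    def __init__(self):
        self.hits = defaultdict(int)

    def record_hit(self, cid, dotw):
        # Done alongside incrementing the all-time hit counter
        self.hits[(cid, dotw)] += 1

    def reset_day(self, dotw):
        # Scheduled job: clear the bucket that still holds last week's counts
        for key in [k for k in self.hits if k[1] == dotw]:
            del self.hits[key]

    def top_viewed(self):
        # Same as: SELECT cid, SUM(hits) ... GROUP BY cid ORDER BY SUM(hits) DESC
        totals = defaultdict(int)
        for (cid, _dotw), count in self.hits.items():
            totals[cid] += count
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)


tracker = DailyHits()
for _ in range(3):
    tracker.record_hit(cid=1, dotw=0)
tracker.record_hit(cid=2, dotw=0)
for _ in range(4):
    tracker.record_hit(cid=2, dotw=1)

ranking_before = tracker.top_viewed()
tracker.reset_day(0)
ranking_after = tracker.top_viewed()
```

Because each content item has at most seven buckets, the totals never cover more than the most recent week once the reset job is running.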
SELECT * FROM table WHERE Date_Created > UNIX_TIMESTAMP() - 7 * 24 * 60 * 60 ORDER BY Hits DESC LIMIT 0,100
or you could use this (per WishCow's Answer)
SELECT * FROM table WHERE Date_Created > UNIX_TIMESTAMP(SUBDATE(NOW(), INTERVAL 7 DAY)) ORDER BY Hits DESC LIMIT 0,100