MySql Select rows with 30 minutes difference in date - mysql

I have a MySql-8.0/MariaDb-10.4 table that contains a list of site visits of different visitors:
I want to create a query that returns the first visit of each visit session, where a new session starts whenever a visit's CreatedAt is 30 minutes or more after the visitor's previous visit.
So in my case, I should be returning row 2 (Id column), row 8 and row 13. Note also that a session can be more than 30 minutes, as long as each visit succeeds a previous visit with less than 30min.
My solution was as follows:
SELECT DISTINCT a.`CreatedAt`
FROM activities AS a
LEFT JOIN activities AS b
ON (
(UNIX_TIMESTAMP(b.`CreatedAt`) >= (UNIX_TIMESTAMP(a.`CreatedAt`) - (30 * 60)) ) AND
(b.`CreatedAt` < a.`CreatedAt`)
)
WHERE (b.`CreatedAt` IS NULL) AND (a.`VisitorId` = '26924c19-3cd1-411e-a771-5ebd6806fb27' /* or others for example */ )
It works, but it does not return the last row (13), and I'm not sure it's the best solution. Thanks in advance.

The easiest way to approach this is to relate all visits to their earlier siblings and then choose only those that have none. The other (more intuitive) approach, taking the first of each group that has a later sibling, will fail if no later visit exists (as in your example with ID 13).
SELECT
late.*
FROM activities AS late
LEFT JOIN activities AS early
ON late.VisitorId=early.VisitorId
AND late.CreatedAt>early.CreatedAt
AND late.CreatedAt<=DATE_ADD(early.CreatedAt, INTERVAL +30 MINUTE)
WHERE early.Id IS NULL
-- Maybe: AND late.VisitorId='26924c19-3cd1-411e-a771-5ebd6806fb27'
-- Maybe: ORDER BY late.CreatedAt
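To check the anti-join above against known data, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. The sample rows are made up (the original table is not shown), and SQLite's datetime(..., '+30 minutes') stands in for MySQL's DATE_ADD, so treat it as an illustration rather than a MySQL-tested query.

```python
import sqlite3

# Hypothetical sample data: (Id, VisitorId, CreatedAt).
rows = [
    (1, "v1", "2020-04-01 10:00:00"),  # first visit of v1
    (2, "v1", "2020-04-01 10:10:00"),  # 10 min later: same session
    (3, "v1", "2020-04-01 10:35:00"),  # 25 min after Id 2: same session
    (4, "v1", "2020-04-01 12:00:00"),  # more than 30 min later: new session
    (5, "v2", "2020-04-01 10:05:00"),  # a different visitor
    (6, "v1", "2020-04-01 15:00:00"),  # new session with no later visit
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (Id INTEGER PRIMARY KEY, VisitorId TEXT, CreatedAt TEXT)"
)
conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)

# Keep only visits that have no earlier visit by the same visitor
# within the preceding 30 minutes.
first_visits = [r[0] for r in conn.execute("""
    SELECT late.Id
    FROM activities AS late
    LEFT JOIN activities AS early
      ON late.VisitorId = early.VisitorId
     AND late.CreatedAt > early.CreatedAt
     AND late.CreatedAt <= datetime(early.CreatedAt, '+30 minutes')
    WHERE early.Id IS NULL
    ORDER BY late.VisitorId, late.CreatedAt
""")]

print(first_visits)
```

Id 6 is returned even though nothing follows it, which is the case the "later sibling" approach would miss.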

I've got an answer similar to @Eugen Rieck's (https://stackoverflow.com/a/61027502/625144), but using the MySQL TIMESTAMPDIFF function. Note that it compares Id values, so it assumes Ids increase with CreatedAt.
SELECT a.*
FROM activities a
LEFT JOIN activities b
ON b.VisitorId = a.VisitorId
AND a.Id > b.Id
AND TIMESTAMPDIFF(MINUTE, b.CreatedAt, a.CreatedAt) <= 30
WHERE
b.Id IS NULL
;

Related

MySql Select last row with 30 minutes difference in date

This is a follow-up to the question "MySql Select rows with 30 minutes difference in date". Although the concept is similar, the solution needed might be different.
I have a MySql-8.0/MariaDb-10.4 table that contains a list of site visits of different visitors:
I want to create a query that returns the last visit of each visit session, where a new session starts whenever a visit's CreatedAt is 30 minutes or more after the visitor's previous visit.
So in my case, I should be returning row 7 (Id column), row 12 and row 13. Note also that a session can be more than 30 minutes, as long as each visit succeeds a previous visit with less than 30min.
The neat solution suggested by @EugenRieck was as follows:
SELECT
late.*
FROM activities AS late
LEFT JOIN activities AS early
ON late.VisitorId=early.VisitorId
AND late.CreatedAt>early.CreatedAt
AND late.CreatedAt<=DATE_ADD(early.CreatedAt, INTERVAL +30 MINUTE)
WHERE early.Id IS NULL
-- Maybe: AND late.VisitorId='26924c19-3cd1-411e-a771-5ebd6806fb27'
-- Maybe: ORDER BY late.CreatedAt
It works great, but it returns the first visit in each visit session, not the last. I tried to modify it to work as I wanted, but with no luck. Please help.

This is a variant of the gaps-and-islands problem, but you can handle it using lead(). Just check whether the next createdAt is more than 30 minutes after the value in a given row. If so, that row is the last one of its session:
select a.*
from (select a.*,
lead(createdAt) over (partition by visitorid order by createdat) as next_ca
from activities a
) a
where next_ca > createdAt + interval 30 minute;
Usually, in this situation you would also want the very last row for each visitor, which has no next visit. You would get that by adding or next_ca is null to the where clause.
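A minimal sketch of the lead() approach with the extra "no next visit" condition, run through Python's sqlite3 module so it can be tried without a server. The rows are invented, SQLite's datetime(..., '+30 minutes') replaces the MySQL interval arithmetic, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (Id, VisitorId, CreatedAt).
rows = [
    (1, "v1", "2020-04-01 10:00:00"),
    (2, "v1", "2020-04-01 10:10:00"),
    (3, "v1", "2020-04-01 10:35:00"),  # next visit is 85 min later: last of session
    (4, "v1", "2020-04-01 12:00:00"),  # single-visit session
    (5, "v2", "2020-04-01 10:05:00"),  # only visit of another visitor
    (6, "v1", "2020-04-01 15:00:00"),  # no next visit at all
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (Id INTEGER PRIMARY KEY, VisitorId TEXT, CreatedAt TEXT)"
)
conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)

# A row ends a session when the visitor's next visit is more than
# 30 minutes away, or when there is no next visit.
last_visits = [r[0] for r in conn.execute("""
    SELECT Id
    FROM (SELECT a.*,
                 LEAD(CreatedAt) OVER (PARTITION BY VisitorId
                                       ORDER BY CreatedAt) AS next_ca
          FROM activities a) AS t
    WHERE next_ca IS NULL
       OR next_ca > datetime(CreatedAt, '+30 minutes')
    ORDER BY VisitorId, CreatedAt
""")]

print(last_visits)
```

Without the next_ca IS NULL branch, Ids 6 and 5 would be dropped, since a comparison with NULL is never true.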

COUNT number distinct when a row hasn't existed before the time period

I have kind of an interesting situation that I will try my best to explain.
I have a table called appointments that holds the many appointments a salesperson can have with potential customers. The relationship between appointments and salespeople is many-to-one, and it is the same for potential customers.
I need to count how many appointments a salesperson has set with a lead when that salesperson has never set an appointment with that lead before.
Here is how far I have gotten in the code (I'm trying to see how many appointments a salesperson set yesterday, hence the date scrub):
SELECT COUNT(DISTINCT lead)
FROM appointments
WHERE status = 3
and DATE(appointment_created_at) = CURDATE() - interval 1 day
AND creator = 'xxx';
(the column creator represents the individual sales person and the column lead represents the individual potential customer)
The problem with this SQL query is that if a salesperson is resetting an appointment with a lead they have already set an appointment with, it still counts it as a "set appointment".
How can I count the number of rows in my appointments table without counting leads who have already been set before?
You can utilize NOT EXISTS() to check if an appointment already exists earlier or not.
SELECT COUNT(DISTINCT a1.lead)
FROM appointments a1
WHERE a1.status = 3
and a1.appointment_created_at >= CURRENT_DATE() - INTERVAL 1 DAY
AND a1.appointment_created_at < CURRENT_DATE()
AND a1.creator = 'xxx'
AND NOT EXISTS (SELECT 1
FROM appointments a2
WHERE a2.creator = 'xxx'
AND a2.lead = a1.lead
AND a2.appointment_created_at < a1.appointment_created_at)
For good performance of the correlated subquery in the NOT EXISTS() portion, you can use the following composite index: (creator, lead, appointment_created_at)
And, for the main select query, you can add the following composite index: (creator, status, appointment_created_at)
If you want the number of "first-time" appointments, you can use row_number() or a correlated subquery:
SELECT COUNT(*)
FROM appointments a
WHERE a.status = 3 AND
a.appointment_created_at >= CURDATE() - interval 1 day AND
a.appointment_created_at < CURDATE() AND
a.creator = 'xxx' AND
a.appointment_created_at = (SELECT MIN(a2.appointment_created_at)
FROM appointments a2
WHERE a2.creator = a.creator AND
a2.lead = a.lead
);
Notice that I changed the date comparisons so an index can be used for the WHERE clause. If you care about performance, you want indexes on:
appointments(creator, status, appointment_created_at, lead)
appointments(creator, lead, appointment_created_at).
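The row_number() variant is mentioned above but not shown, so here is a rough sketch of what it could look like, run through Python's sqlite3 module with invented rows. The helper name count_first_time and the fixed date window are assumptions for the demo (the original uses CURDATE()), and "lead" is quoted because LEAD is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE appointments (
    creator TEXT, "lead" TEXT, status INTEGER, appointment_created_at TEXT)""")
conn.executemany("INSERT INTO appointments VALUES (?, ?, ?, ?)", [
    ("xxx", "L1", 3, "2020-01-05 09:00:00"),  # older first appointment with L1
    ("xxx", "L1", 3, "2020-01-10 09:00:00"),  # reset inside the window: not counted
    ("xxx", "L2", 3, "2020-01-10 11:00:00"),  # first ever with L2: counted
    ("xxx", "L2", 3, "2020-01-10 15:00:00"),  # second with L2 that day: not counted
    ("yyy", "L3", 3, "2020-01-10 12:00:00"),  # a different salesperson
])

def count_first_time(conn, creator, day_start, day_end):
    """Count appointments in [day_start, day_end) that are the creator's
    first ever appointment with that lead (hypothetical helper)."""
    sql = """
        SELECT COUNT(*)
        FROM (SELECT a.*,
                     ROW_NUMBER() OVER (PARTITION BY creator, "lead"
                                        ORDER BY appointment_created_at) AS seqnum
              FROM appointments a) AS t
        WHERE seqnum = 1
          AND status = 3
          AND creator = ?
          AND appointment_created_at >= ?
          AND appointment_created_at < ?
    """
    return conn.execute(sql, (creator, day_start, day_end)).fetchone()[0]

count = count_first_time(conn, "xxx", "2020-01-10 00:00:00", "2020-01-11 00:00:00")
print(count)
```

The numbering has to happen before the date filter, otherwise the first row inside the window would always get seqnum 1.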
If the sales people can reschedule appointments then you are going to need an additional field to store original appointment date, at least. There are other more complex solutions, but this is probably the easiest approach.

MYSQL - find and show all duplicates within date difference criteria

This query below selects all rows that have a row with the same father registering 335 days or less since earlier registration. Is there a way to edit this query so that it does not eliminate the duplicate row in the output? I need to see all instances of the registration for that father within 335 days of each other.
SELECT * FROM ymca_reg AS later
WHERE NOT EXISTS (
SELECT 1 FROM ymca_reg AS earlier
WHERE
earlier.Father_First_Name = later.Father_First_Name
AND earlier.Father_Last_Name = later.Father_Last_Name
AND (later.Date - earlier.Date < 335) AND (later.Date > earlier.Date)
)
My current query is:
SELECT ymca_reg.* FROM ymca_reg WHERE (((ymca_reg.Year) In (SELECT Year FROM ymca_reg As Tmp
GROUP BY Year, Father_Last_Name, Father_First_Name
HAVING Count(*)>1
And Father_Last_Name = ymca_reg.Father_Last_Name
And Father_First_Name = ymca_reg.Father_First_Name)))
ORDER BY ymca_reg.Year, ymca_reg.Father_Last_Name, ymca_reg.Father_First_Name
This query does return all the duplicates for review correctly, but it's terribly slow because it doesn't use a join, and as soon as I add the date criteria it only returns the later row. Thanks.
I think you want something like this:
SELECT *
FROM ymca_reg later
WHERE EXISTS (SELECT 1
FROM ymca_reg earlier
WHERE earlier.Father_First_Name = later.Father_First_Name AND
earlier.Father_Last_Name = later.Father_Last_Name AND
abs(later.Date - earlier.Date) < 335 and
later.Date <> earlier.Date
);
This should return all records that have such duplicates. Note that "later" and "earlier" are no longer really apt descriptions, but I left the names so you can see the similarity to your query.
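Here is a small sketch of the symmetric EXISTS check, run through Python's sqlite3 module. The rows and the Id column are invented, and julianday() is SQLite's way of getting a difference in days. If Date is a real DATE column in MySQL, subtracting two dates works on their numeric form rather than giving days, so DATEDIFF(later.Date, earlier.Date) is probably what you want there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ymca_reg (
    Id INTEGER PRIMARY KEY,
    Father_First_Name TEXT, Father_Last_Name TEXT, Date TEXT)""")
conn.executemany("INSERT INTO ymca_reg VALUES (?, ?, ?, ?)", [
    (1, "John", "Smith", "2012-01-10"),
    (2, "John", "Smith", "2012-06-01"),  # within 335 days of Id 1: both returned
    (3, "John", "Smith", "2014-01-01"),  # more than 335 days from the others
    (4, "Alan", "Jones", "2012-03-01"),  # only one registration for this father
])

# A row qualifies if the same father has ANY other registration
# (earlier or later) less than 335 days away.
dupes = [r[0] for r in conn.execute("""
    SELECT later.Id
    FROM ymca_reg AS later
    WHERE EXISTS (SELECT 1
                  FROM ymca_reg AS earlier
                  WHERE earlier.Father_First_Name = later.Father_First_Name
                    AND earlier.Father_Last_Name = later.Father_Last_Name
                    AND later.Date <> earlier.Date
                    AND ABS(julianday(later.Date) - julianday(earlier.Date)) < 335)
    ORDER BY later.Id
""")]

print(dupes)
```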

SQL Server 2008 date exclusion

I'm trying to figure out a way to perform a query which will obtain all data greater than six months old, without any data that is newer. I will see if I can appropriately summarize:
select u.USER_FirstName, u.USER_LastName,
u.USER_LastSession, c.Login_Name
FROM [USER] u
JOIN Customer c
ON u.USER_Customer_Identity=c.Customer_Identity
Where u.USER_LastSession < getdate()-180
Order by USER_LastSession
This is what I've found on SO so far, but the issue lies in that the USER.USER_LastSession records values for each log in (so some Customer.Login_Name values are unnecessary to return). I only want the ones which are greater than six months, with no result returned if they are also recorded at time less than six months. Example data:
USER_LastSession Login_Name
2012-08-29 21:33:30.000 TEST/TEST
2012-12-25 13:12:23.346 EXAMPLE/EXAMPLE
2013-10-30 17:13:45.000 TEST/TEST
I would not want to return TEST/TEST, since there is data in the past six months. I would, however, like to return EXAMPLE/EXAMPLE, since it only has data that is older than six months. I imagine there is probably something that I have overlooked - please forgive me if there is already an answer up for this (I was only able to find a "get older than six months" reply). Any and all help is greatly appreciated!
SELECT ...
FROM [User] u
JOIN Customer c ON u.USER_Customer_Identity=c.Customer_Identity
WHERE u.USER_Customer_Identity NOT IN
(SELECT USER_Customer_Identity
FROM [User]
WHERE USER_LastSession >= getdate() - 180)
ORDER BY USER_LastSession

with cte as (
select Login_Name, max(USER_LastSession) LastSession
FROM [USER] u
JOIN Customer c
ON u.USER_Customer_Identity = c.Customer_Identity
group by Login_Name
)
select *
from cte
where LastSession < getdate()-180
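The "latest session per login" idea can be tried on the example data from the question with a short sketch using Python's sqlite3 module. The single flattened sessions table and the fixed NOW value are assumptions made for the demo, and datetime(?, '-180 days') stands in for getdate()-180.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per recorded session.
conn.execute("CREATE TABLE sessions (Login_Name TEXT, USER_LastSession TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)", [
    ("TEST/TEST", "2012-08-29 21:33:30.000"),
    ("EXAMPLE/EXAMPLE", "2012-12-25 13:12:23.346"),
    ("TEST/TEST", "2013-10-30 17:13:45.000"),
])

NOW = "2013-11-15 00:00:00"  # assumed "current" time, fixed for repeatability

# Keep a login only if its most recent session is older than the cutoff.
stale = [r[0] for r in conn.execute("""
    SELECT Login_Name
    FROM sessions
    GROUP BY Login_Name
    HAVING MAX(USER_LastSession) < datetime(?, '-180 days')
    ORDER BY Login_Name
""", (NOW,))]

print(stale)
```

TEST/TEST drops out because its newest session falls inside the last 180 days, even though it also has an old one.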

Help needed optimizing MySQL SELECT query

I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365; assume hour is a timestamp and amount is just a simple integer. What I want to do is select the value of the amount field for a certain group of days (for example from 0 to 10), but I only need the last value of amount available for each day, which practically is the row where the hour field has its max value (inside that day). This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and that is just for a span of 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Any help finding another solution or optimizing this one is really appreciated.
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using the GROUP BY day just to get a single amount value per day, but it's not reliable because in MySQL, the values of columns not in the GROUP BY are arbitrary -- the value could change. Sadly, MySQL before 8.0 doesn't support analytic functions (ROW_NUMBER, etc.), which is what you'd typically use for cases like these.
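For reference, on an engine that does have ROW_NUMBER (MySQL 8.0+, or SQLite 3.25+ as used here), the "last amount per day" query could be sketched as follows. The rows and the index name idx_day_hour are made up for the demo, and this is run through Python's sqlite3 module rather than tested on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amt_table (day INTEGER, hour INTEGER, amount INTEGER)")
# Composite index so each day's rows can be read in hour order (hypothetical name).
conn.execute("CREATE INDEX idx_day_hour ON amt_table (day, hour)")
conn.executemany("INSERT INTO amt_table VALUES (?, ?, ?)", [
    (0, 100, 5),
    (0, 200, 7),  # latest hour for day 0
    (1, 50, 3),
    (1, 60, 9),   # latest hour for day 1
    (1, 55, 4),
    (9, 10, 1),   # outside the requested day range
])

# Number each day's rows from the latest hour backwards and keep row 1.
latest = conn.execute("""
    SELECT day, amount
    FROM (SELECT day, amount,
                 ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour DESC) AS rn
          FROM amt_table
          WHERE day BETWEEN 0 AND 4) AS t
    WHERE rn = 1
    ORDER BY day
""").fetchall()

print(latest)
```

Unlike the GROUP BY trick, this always returns the amount that belongs to the row with the highest hour.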
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
I think the problem is the subquery in the WHERE clause. MySQL will first calculate "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" for the whole table and only afterwards select the days. Not very efficient :-)