Time difference between adjacent rows in one column of one MySQL table

I have a table with some 100.000 rows having this structure:
+------+---------------------+-----------+
| id | timestamp | eventType |
+------+---------------------+-----------+
| 12 | 2015-07-01 16:45:47 | 3001 |
| 103 | 2015-07-10 19:30:14 | 3001 |
| 1174 | 2015-09-03 12:57:08 | 3001 |
+------+---------------------+-----------+
For each row, I would like to calculate the number of days between its timestamp and the timestamp of the previous row.
As you can see, the id is not continuous, since the table contains different events and I would like to compare only the timestamps of one specific event over time.
I know that DATEDIFF can be used to compare two dates, and I could pick the two rows with a query that selects each row by its specific id.
But as I have many thousands of rows, I am searching for a way to somehow loop through the whole table.
Unfortunately my SQL knowledge is limited, and searching did not reveal an example close enough to my question that I could continue from there.
I would be very thankful for any hint.

If you are running MySQL 8.0, you can just use lag(). Say you want the difference in seconds:
select t.*,
timestampdiff(
second,
lag(timestamp) over(partition by eventtype order by id),
timestamp
) diff
from mytable t
In earlier versions, one alternative is a correlated subquery:
select t.*,
timestampdiff(
second,
(select timestamp from mytable t1 where t1.eventtype = t.eventtype and t1.id < t.id order by t1.id desc limit 1),
timestamp
) diff
from mytable t
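Since the question asks for days rather than seconds, here is a minimal sketch of the same LAG() idea that can be run without a MySQL server. It uses Python's built-in sqlite3 module as a stand-in (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added), and the helper name day_gaps is made up for illustration. SQLite has no DATEDIFF, so the sketch subtracts julianday() values of the date parts; in MySQL 8.0 the equivalent expression would be DATEDIFF(timestamp, LAG(timestamp) OVER (PARTITION BY eventtype ORDER BY id)).

```python
import sqlite3


def day_gaps(rows):
    """Return (id, whole days since the previous row of the same eventType)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, timestamp TEXT, eventType INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # LAG() fetches the timestamp of the previous row within the same eventType;
    # the first row of each eventType has no predecessor, so its gap is NULL.
    query = """
        SELECT id,
               CAST(julianday(date(timestamp))
                    - julianday(date(LAG(timestamp) OVER (
                          PARTITION BY eventType ORDER BY id)))
                    AS INTEGER) AS days_since_previous
        FROM mytable
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# The three rows shown in the question.
sample = [
    (12, "2015-07-01 16:45:47", 3001),
    (103, "2015-07-10 19:30:14", 3001),
    (1174, "2015-09-03 12:57:08", 3001),
]
print(day_gaps(sample))  # [(12, None), (103, 9), (1174, 55)]
```

Like DATEDIFF, this compares only the date parts, so the time of day does not affect the result.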

Related

I don't get why these two SQL queries with WHERE IN give different results

What I wanted to do: DELETE a row only if data exists that meets the WHERE condition.
The table looks like below, and I'm using MySQL.
table A
+-----------+-------------+---------------------+
| id | user_id | created_date |
+-----------+-------------+---------------------+
| 17 | Amy | 2021-04-19 17:00:00 |
| 19 | Amy | 2021-04-20 17:00:00 |
| 20 | Amy | 2021-04-22 17:00:00 |
| 21 | Bob | 2021-04-22 17:00:00 |
+-----------+-------------+---------------------+
1st try
I tried the query below, but it failed.
I wanted to delete only Amy's 2021-04-20 row, but it deleted all three of Amy's rows.
DELETE FROM A
WHERE user_id IN (
SELECT tmp.user_id from (SELECT user_id FROM A WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp )
AND user_id="Amy";
2nd try
This succeeded.
The query below deletes only the one row that meets the condition.
DELETE FROM A
WHERE created_date IN (
SELECT tmp.created_date from (SELECT created_date FROM A WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp )
AND user_id="Amy";
question
I don't get why these two SQL queries give different results.
All I changed was just using another column.
Maybe I'm not fully understanding IN or subquery :(
Please give some advice.
Perhaps what can help you understand this situation is knowing the logical order in which a MySQL statement is evaluated:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
So in this case the interpretation of your query starts FROM your original table. However, the WHERE condition of the main query contains another query, which is evaluated in that same order. In the first case, the subquery is:
SELECT tmp.user_id from (SELECT user_id FROM example WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp;
The result of that query is 'Amy', because even though you filter on created_date, you are requesting user_id in the SELECT clause.
In the second case:
SELECT tmp.created_date from (SELECT created_date FROM example WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp
The result is '2021-04-20 17:00:00', so the main query's search effectively removes a single record.
Continuing with the order of execution, we can see that the set of matched rows differs between the two cases, because you change the search condition of the main query from WHERE user_id IN to WHERE created_date IN.
The first looks for every row with the user_id 'Amy', and the second looks for every row with the date '2021-04-20 17:00:00'.
Purely as an illustration, to make the first case work you would also have to include in the main query a condition on the date field, which is the field that can tell the three 'Amy' rows apart. It would be as follows:
SELECT * FROM example
WHERE user_id IN (
SELECT tmp.user_id from (SELECT user_id FROM example WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp )
and created_date IN (SELECT tmp.created_date from (SELECT created_date FROM example WHERE date(created_date)=date("2021-04-20") AND user_id="Amy") tmp)
Regards.
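To see the difference concretely, here is a small sketch that counts which rows each WHERE ... IN condition matches, using SELECT instead of DELETE so nothing is destroyed. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name matched_ids is only for illustration; the table and data are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, user_id TEXT, created_date TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [
        (17, "Amy", "2021-04-19 17:00:00"),
        (19, "Amy", "2021-04-20 17:00:00"),
        (20, "Amy", "2021-04-22 17:00:00"),
        (21, "Bob", "2021-04-22 17:00:00"),
    ],
)


def matched_ids(column):
    """Return the ids that 'WHERE <column> IN (subquery)' would match."""
    if column not in ("user_id", "created_date"):
        raise ValueError("only the two columns from the question are supported")
    # The subquery filters on the date, but returns only the chosen column.
    query = f"""
        SELECT id FROM A
        WHERE {column} IN (
            SELECT {column} FROM A
            WHERE date(created_date) = date('2021-04-20') AND user_id = 'Amy')
          AND user_id = 'Amy'
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]


print(matched_ids("user_id"))       # [17, 19, 20]: the subquery yields 'Amy'
print(matched_ids("created_date"))  # [19]: the subquery yields one timestamp
```

The subquery's date filter is lost as soon as only user_id is returned, which is why the first DELETE removes all of Amy's rows.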

Remove continuous duplicated values with different IDs in MySQL

I know there are a ton of similar questions about finding and removing duplicate values in MySQL, but my question is a bit different:
I have a table with columns as ID, Timestamp and price. A script scrapes data from another webpage and saves it in the database every 10 seconds. Sometimes data ends up like this:
| id | timestamp | price |
|----|-----------|-------|
| 1 | 12:13 | 100 |
| 2 | 12:14 | 120 |
| 3 | 12:15 | 100 |
| 4 | 12:16 | 100 |
| 5 | 12:17 | 110 |
As you see there are 3 duplicated values and removing the price with ID = 4 will shrink the table without damaging data integrity. I need to remove continuous duplicated records except the first one (which has the lowest ID or Timestamp).
Is there an efficient way to do it? (There are about a million records.)
I edited my scraping script so it checks for a duplicated price before adding it, but I need to shrink and maintain my old data.
Since MySQL 8.0 you can use the window function LAG() as follows:
delete tbl.* from tbl
join (
-- use lag(price) to get the value from the previous row
select id, lag(price) over (order by id) price from tbl
) l
-- join rows whose price equals the previous row's price; these will be deleted
on tbl.id = l.id and tbl.price = l.price;
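As a quick check of the LAG() approach on the sample data, here is a hedged sketch using Python's built-in sqlite3 module instead of MySQL (assuming SQLite 3.25 or newer for window functions). SQLite has no multi-table DELETE ... JOIN, so the sketch swaps in DELETE ... WHERE id IN (...) around the same LAG() subquery; the function name drop_consecutive_duplicates is made up for illustration.

```python
import sqlite3


def drop_consecutive_duplicates(rows):
    """Delete every row whose price equals the price of the previous row (by id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, timestamp TEXT, price INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # The inner query pairs each row with the previous row's price;
    # rows where both prices match are consecutive duplicates.
    conn.execute("""
        DELETE FROM tbl
        WHERE id IN (
            SELECT id FROM (
                SELECT id, price, LAG(price) OVER (ORDER BY id) AS previous_price
                FROM tbl)
            WHERE price = previous_price)
    """)
    remaining = conn.execute("SELECT id, price FROM tbl ORDER BY id").fetchall()
    conn.close()
    return remaining


# The five rows from the question.
sample = [
    (1, "12:13", 100),
    (2, "12:14", 120),
    (3, "12:15", 100),
    (4, "12:16", 100),
    (5, "12:17", 110),
]
print(drop_consecutive_duplicates(sample))
# [(1, 100), (2, 120), (3, 100), (5, 110)]
```

Only id 4 is removed; id 3 stays even though its price repeats id 1, because the rows are not adjacent.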
I am just grouping based on price and keeping only one record per group. The lowest id gets displayed. Hope the below helps.
select id,timestamp,price from yourTable group by price having count(price)>0;
My query is based on @Tim Biegeleisen's.
-- delete records
DELETE
FROM yourTable t1
-- where exists an older one with the same price
WHERE EXISTS (SELECT 1
FROM yourTable t2
WHERE t2.price = t1.price
AND t2.id < t1.id
-- but no row with a different price exists between this one and the older one
AND NOT EXISTS (SELECT 1
FROM yourTable t3
WHERE t1.price <> t3.price
AND t3.id > t2.id
AND t3.id < t1.id));
It deletes records for which an older one with the same price exists and no record with a different price lies between them.
If the id column is not numeric and ascending, the timestamp column could be used for the comparison instead.

MySQL COUNT(DISTINCT) giving wrong values with GROUP BY

I have a table that contains custom user analytics data. I was able to pull the number of unique users with a query:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users'
FROM `events`
WHERE client_id = 123
And this will return 16728
This table also has a column of type DATETIME that I would like to group the counts by. However, if I add a GROUP BY to the end of it, everything groups properly it seems except the totals don't match. My new query is this:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users', DATE(server_stamp) AS 'date'
FROM `events`
WHERE client_id = 123
GROUP BY DATE(server_stamp)
Now I get the following values:
|-----------------------------|
| unique_users | date |
|---------------|-------------|
| 2650 | 2019-08-26 |
| 3486 | 2019-08-27 |
| 3475 | 2019-08-28 |
| 3631 | 2019-08-29 |
| 3492 | 2019-08-30 |
|-----------------------------|
Totaling to 16734. I tried using a sub query to get the distinct users then count and group in the main query but no luck there. Any help in this would be greatly appreciated. Let me know if there is further information to help diagnosis.
A user who is connected with events on multiple days (e.g. a session that starts before midnight and ends afterwards) will be counted once for each of those days in the new query. This is because the first query performs the DISTINCT over all rows at once, while the second only removes duplicates inside each group. Identical values in different groups stay untouched.
So if you combine a DISTINCT in the select clause with a GROUP BY, the GROUP BY will be executed before the DISTINCT. Thus, without any further restrictions, you cannot assume that the COUNT(DISTINCT user_id) of the first query and the sum of the COUNT(DISTINCT user_id) over all groups are the same.
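A tiny sketch of the effect, with three made-up event rows (they are not from the question's data) in which user 1 has a session spanning midnight. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the function name distinct_counts is only for illustration.

```python
import sqlite3


def distinct_counts(rows, client_id):
    """Return (overall distinct users, [(day, distinct users that day), ...])."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, client_id INTEGER, server_stamp TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # DISTINCT applied once over all rows of the client.
    overall = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM events WHERE client_id = ?",
        (client_id,),
    ).fetchone()[0]
    # DISTINCT applied separately inside each per-day group.
    per_day = conn.execute(
        """
        SELECT date(server_stamp) AS day, COUNT(DISTINCT user_id)
        FROM events
        WHERE client_id = ?
        GROUP BY date(server_stamp)
        ORDER BY day
        """,
        (client_id,),
    ).fetchall()
    conn.close()
    return overall, per_day


# Hypothetical rows: user 1 is active just before and just after midnight.
events = [
    (1, 123, "2019-08-26 23:50:00"),
    (1, 123, "2019-08-27 00:10:00"),
    (2, 123, "2019-08-27 09:00:00"),
]
overall, per_day = distinct_counts(events, 123)
print(overall)                             # 2
print(per_day)                             # [('2019-08-26', 1), ('2019-08-27', 2)]
print(sum(count for _, count in per_day))  # 3
```

User 1 is distinct within each day, so the per-day counts add up to 3 even though only 2 users exist.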
Xandor is absolutely correct. If a user logged on 2 different days, there is no way your 2nd query can remove them. If you need the data grouped by date, you can try the query below:
SELECT COUNT(user_id) AS 'unique_users', MIN_DATE AS 'date'
FROM (SELECT user_id, MIN(DATE(server_stamp)) MIN_DATE -- Might be MAX
FROM `events`
WHERE client_id = 123
GROUP BY user_id) X
GROUP BY MIN_DATE;
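Here is a minimal sketch of that first-day-per-user idea, again on Python's built-in sqlite3 module rather than MySQL and with made-up rows. The function name first_day_counts is hypothetical. Each user is assigned to the earliest day they appear on, so the daily counts add up to the overall number of distinct users.

```python
import sqlite3


def first_day_counts(rows, client_id):
    """Count each user exactly once, on the first day the user appears."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, client_id INTEGER, server_stamp TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Inner query: one row per user with that user's earliest day.
    # Outer query: count users per earliest day.
    query = """
        SELECT min_date, COUNT(user_id) AS unique_users
        FROM (SELECT user_id, MIN(date(server_stamp)) AS min_date
              FROM events
              WHERE client_id = ?
              GROUP BY user_id) AS x
        GROUP BY min_date
        ORDER BY min_date
    """
    result = conn.execute(query, (client_id,)).fetchall()
    conn.close()
    return result


# Hypothetical rows: user 1 appears on two days, user 2 on one.
events = [
    (1, 123, "2019-08-26 23:50:00"),
    (1, 123, "2019-08-27 00:10:00"),
    (2, 123, "2019-08-27 09:00:00"),
]
print(first_day_counts(events, 123))
# [('2019-08-26', 1), ('2019-08-27', 1)]
```

Note that this changes the meaning of the daily figure from "users active that day" to "users first seen that day".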

Get the MAX date between a range

I was searching for queries, but I can't find an answer that helps me, or whether a similar question exists.
I need to get the info of the customers that made their last purchase between two dates
+--------+------------+------------+
| client | amt | date |
+--------+------------+------------+
| 1 | 2440.9100 | 2014-02-05 |
| 1 | 21640.4600 | 2014-03-11 |
| 2 | 6782.5000 | 2014-03-12 |
| 2 | 1324.6600 | 2014-05-28 |
+--------+------------+------------+
For example, if I want to know all the customers who made their last purchase between
2014-02-11 and 2014-03-16, the result must be
+--------+------------+------------+
| client | amt | date |
+--------+------------+------------+
| 1 | 21640.4600 | 2014-03-11 |
+--------+------------+------------+
It can't be client number 2, because that client has a purchase on 2014-05-28.
I tried
SELECT MAX(date)
FROM table
GROUP BY client
but that only gets the max of all dates.
I don't know if there is a function or something that can help. Thanks.
Well, I don't know how to mark this question as resolved, but this works for me
to complete the original query:
SELECT client, MAX(date)
FROM table
GROUP BY client
HAVING MAX(date) BETWEEN date1 AND date2
thanks to all that took a minute to help me with my problem,
special thanks to Ollie Jones and Peter Pei Guo
Something in this format; replace date1 and date2 with the real values.
SELECT client, max(date)
from table
group by client
having max(date) between date1 AND date2
There is more than one way to do this. Here is one of them.
select * from
(
select client, max(date) maxdate
from table
group by client ) temp
where maxdate between '2014-02-11' and '2014-03-16'
This will allow you to grab the amount column of the applicable rows as well:
select t.*
from tbl t
join (select client, max(date) as last_date
from tbl
group by client
having max(date) between date1 and date2) v
on t.client = v.client
and t.date = v.last_date
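To check the join-back version against the sample data from the question, here is a small sketch on Python's built-in sqlite3 module (a stand-in for MySQL; the function name last_purchases_between is only illustrative). It relies on the dates being stored as ISO yyyy-mm-dd strings, so that MAX and BETWEEN compare them correctly as text.

```python
import sqlite3


def last_purchases_between(rows, start, end):
    """Return the full row of each client's last purchase if it falls in [start, end]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (client INTEGER, amt REAL, date TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # The derived table keeps only clients whose latest purchase is in range;
    # joining back on (client, date) recovers the amount of that purchase.
    query = """
        SELECT t.client, t.amt, t.date
        FROM tbl t
        JOIN (SELECT client, MAX(date) AS last_date
              FROM tbl
              GROUP BY client
              HAVING MAX(date) BETWEEN ? AND ?) v
          ON t.client = v.client AND t.date = v.last_date
        ORDER BY t.client
    """
    result = conn.execute(query, (start, end)).fetchall()
    conn.close()
    return result


# The four rows from the question.
purchases = [
    (1, 2440.91, "2014-02-05"),
    (1, 21640.46, "2014-03-11"),
    (2, 6782.50, "2014-03-12"),
    (2, 1324.66, "2014-05-28"),
]
print(last_purchases_between(purchases, "2014-02-11", "2014-03-16"))
# [(1, 21640.46, '2014-03-11')]
```

Client 2 is excluded because its latest purchase (2014-05-28) lies outside the range, even though it has an earlier purchase inside it.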
I had to change the field "Date" to "TheDate" since date is a reserved word. I assume you are using SQL? My table name is Table1. You need to group records:
SELECT Table1.Client, Sum(Table1.Amt) AS SumOfAmt, Table1.TheDate
FROM Table1
GROUP BY Table1.Client, Table1.TheDate
HAVING (((Table1.TheDate) Between #2/11/2014# And #3/16/2014#));
Query Results:
Client SumOfAmt TheDate
1 21640 03/11/14
2 6792 03/12/14
You may want to get yourself a copy of MS Access. You can generate SQL statements using their query builder which I used to generate this SQL. When I make a post here I will always test it first to make sure it works! I have never written even 1 line of SQL code, but have executed thousands of them from within MS Access.
Good luck,
Dan

MySQL - Exclude rows from Select based on duplication of two columns

I am attempting to narrow results of an existing complex query based on conditional matches on multiple columns within the returned data set. I'll attempt to simplify the data as much as possible here.
Assume that the following table structure represents the data that my existing complex query has already selected (here ordered by date):
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 1 | 1 | A | 2011-01-01 |
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
| 4 | 1 | A | 2011-05-01 |
+----+-----------+------+------------+
I need to select from that data set based on the following criteria:
If the pairing of remote_id and type is unique to the set, return the row always
If the pairing of remote_id and type is not unique to the set, take the following action:
Of the sets of rows for which the pairing of remote_id and type are not unique, return only the single row for which date is greatest and still less than or equal to now.
So, if today is 2011-01-10, I'd like the data set returned to be:
+----+-----------+------+------------+
| id | remote_id | type | date |
+----+-----------+------+------------+
| 3 | 1 | A | 2011-01-07 |
| 5 | 1 | B | 2011-01-07 |
+----+-----------+------+------------+
For some reason I'm having no luck wrapping my head around this one. I suspect the answer lies in good application of group by, but I just can't grasp it. Any help is greatly appreciated!
/* Rows with exactly one date - always return regardless of when date occurs */
SELECT id, remote_id, type, date
FROM YourTable
GROUP BY remote_id, type
HAVING COUNT(*) = 1
UNION
/* Rows with more than one date - Return Max date <= NOW */
SELECT yt.id, yt.remote_id, yt.type, yt.date
FROM YourTable yt
INNER JOIN (SELECT remote_id, type, max(date) as maxdate
FROM YourTable
WHERE date <= DATE(NOW())
GROUP BY remote_id, type
HAVING COUNT(*) > 1) sq
ON yt.remote_id = sq.remote_id
AND yt.type = sq.type
AND yt.date = sq.maxdate
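The UNION approach above can be tried on the question's four rows with the sketch below. It runs on Python's built-in sqlite3 module rather than MySQL, passes "today" in as a parameter instead of calling NOW() so the result is reproducible, and wraps id and date in MIN() in the first branch so the query does not depend on selecting non-aggregated columns. The function name select_rows is made up for illustration.

```python
import sqlite3


def select_rows(rows, today):
    """Unique (remote_id, type) pairs always; otherwise the latest row dated <= today."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (id INTEGER, remote_id INTEGER, type TEXT, date TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
    query = """
        -- pairs that occur exactly once: MIN() just picks that single row's values
        SELECT MIN(id) AS id, remote_id, type, MIN(date) AS date
        FROM YourTable
        GROUP BY remote_id, type
        HAVING COUNT(*) = 1
        UNION
        -- pairs with more than one row dated <= today: keep the latest such row
        SELECT yt.id, yt.remote_id, yt.type, yt.date
        FROM YourTable yt
        JOIN (SELECT remote_id, type, MAX(date) AS maxdate
              FROM YourTable
              WHERE date <= ?
              GROUP BY remote_id, type
              HAVING COUNT(*) > 1) sq
          ON yt.remote_id = sq.remote_id
         AND yt.type = sq.type
         AND yt.date = sq.maxdate
        ORDER BY 1
    """
    result = conn.execute(query, (today,)).fetchall()
    conn.close()
    return result


# The four rows from the question.
data = [
    (1, 1, "A", "2011-01-01"),
    (3, 1, "A", "2011-01-07"),
    (5, 1, "B", "2011-01-07"),
    (4, 1, "A", "2011-05-01"),
]
print(select_rows(data, "2011-01-10"))
# [(3, 1, 'A', '2011-01-07'), (5, 1, 'B', '2011-01-07')]
```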
The group by clause groups all rows that have identical values of one or more columns together and returns one row in the result set for them. If you use aggregate functions (min, max, sum, avg etc.), they will be applied to each "group".
SELECT id, remote_id, type, max(date)
FROM blah
GROUP BY remote_id, type;
I'm not sure where today's date comes in, but I assumed that was part of the complex query that you didn't describe, and I assume it isn't directly relevant to your question here.
Try this:
SELECT a.*
FROM table a INNER JOIN
(
select remote_id, type, MAX(date) date, COUNT(1) cnt from table
group by remote_id, type
) b
ON a.remote_id = b.remote_id
AND a.type = b.type
AND a.date = b.date
AND ( (b.cnt = 1) OR (b.cnt>1 AND b.date <= DATE(NOW())))
Try this
select id, remote_id, type, MAX(date) from table
group by remote_id, type
Hey Carson! You could try using the "distinct" keyword on those two fields, and in a union you can use Count() along with group by and some operators to pull non-unique (greatest and less-than today) records!