Select difference between row dates in MySQL - mysql

I want to calculate the difference in unique date fields between different rows in the same table.
For instance, given the following data:
id | date
---+------------
1 | 2011-01-01
2 | 2011-01-02
3 | 2011-01-15
4 | 2011-01-20
5 | 2011-01-10
6 | 2011-01-30
7 | 2011-01-03
I would like to generate a query that produces the following:
id | date | days_since_last
---+------------+-----------------
1 | 2011-01-01 |
2 | 2011-01-02 | 1
7 | 2011-01-03 | 1
5 | 2011-01-10 | 7
3 | 2011-01-15 | 5
4 | 2011-01-20 | 5
6 | 2011-01-30 | 10
Any suggestions for what date functions I would use in MySQL, or is there a subselect that would do this?
(Of course, I don't mind putting WHERE date > '2011-01-01' to ignore the first row.)

A correlated subquery could be of help:
SELECT
  id,
  date,
  DATEDIFF(
    date,
    (SELECT MAX(date) FROM atable WHERE date < t.date)
  ) AS days_since_last
FROM atable AS t
ORDER BY date
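On MySQL 8.0 or later, the same calculation can also be sketched with the LAG() window function instead of a correlated subquery. This assumes the table name atable from the query above and that date is a DATE column; the first row gets NULL because it has no predecessor.

```sql
-- Sketch only: requires MySQL 8.0+ for window functions.
-- LAG(date) returns the date of the previous row in date order.
SELECT
  id,
  date,
  DATEDIFF(date, LAG(date) OVER (ORDER BY date)) AS days_since_last
FROM atable
ORDER BY date;
```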

Something like this should work:
SELECT mytable.id, mytable.date, DATEDIFF(mytable.date, t2.date) AS days_since_last
FROM mytable
LEFT JOIN mytable AS t2 ON t2.id = mytable.id - 1
However, this assumes that the ids in your table are contiguous and follow date order (which the sample data in the question does not); otherwise it won't work at all. For the first row, t2.date is NULL, so DATEDIFF returns NULL rather than raising an error.
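If the ids are not contiguous or not in date order, one possible variant joins on the date itself rather than on id - 1. This is only a sketch using the same mytable name; the alias earlier is made up for illustration.

```sql
-- For each row, find the latest earlier date via a self-join on date.
-- MAX(earlier.date) is NULL for the earliest row, so DATEDIFF returns NULL there.
SELECT
  t.id,
  t.date,
  DATEDIFF(t.date, MAX(earlier.date)) AS days_since_last
FROM mytable AS t
LEFT JOIN mytable AS earlier ON earlier.date < t.date
GROUP BY t.id, t.date
ORDER BY t.date;
```

The join compares every row with all earlier rows, so on a large table a correlated subquery or a window function is likely to be cheaper.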

Related

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates to that table: the dates should be sequential, but in some cases a row has a date much older than the previous one.
How can I get all the rows that are out of sequence for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there any function to compare against the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
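To try either query above, a minimal setup script with the sample rows from the question might look like this. The column types are assumptions (the question does not give them); the table name mytable matches the queries.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
CREATE TABLE mytable (
  id INT PRIMARY KEY,
  entity_id INT NOT NULL,
  time_stamp DATE NOT NULL
);

INSERT INTO mytable (id, entity_id, time_stamp) VALUES
  (1, 7, '2019-01-22'),
  (2, 9, '2019-01-05'),
  (3, 6, '2019-03-14'),
  (4, 9, '2019-04-20'),
  (5, 6, '2015-10-04'),
  (6, 9, '2019-07-15'),
  (7, 3, '2019-07-04'),
  (8, 7, '2019-06-01'),
  (9, 6, '2019-11-04'),
  (10, 7, '2019-03-04');
```

With these rows, both queries should return the rows with id 5 and id 10, since each has a time_stamp earlier than the previous row of the same entity_id.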
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
               FROM sourcetable s2
               WHERE s2.id < s1.id
                 AND s2.entity_id = s1.entity_id
                 AND s2.time_stamp > s1.time_stamp )
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.
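As a sketch, the first of those indexes could be created as follows. The index name idx_entity_id_ts is made up; sourcetable is the table name used in the query above.

```sql
-- idx_entity_id_ts is a hypothetical name; pick whatever fits your naming scheme.
CREATE INDEX idx_entity_id_ts
  ON sourcetable (entity_id, id, time_stamp);

-- EXPLAIN shows whether the optimizer actually uses the index for the lookup.
EXPLAIN
SELECT s1.*
FROM sourcetable s1
WHERE s1.entity_id = 6;
```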

MySQL: How to select rows from one table between each interval taken from other table

There are tables:
1.current_table:
date value
02.10.2019 1
03.10.2019 2
04.10.2019 2
05.10.2019 -1
06.10.2019 1
07.10.2019 1
08.10.2019 2
09.10.2019 2
10.10.2019 -1
11.10.2019 2
12.10.2019 1
2.intervals
date_start date_end length
02.10.2019 04.10.2019 3
06.10.2019 09.10.2019 4
11.10.2019 12.10.2019 2
The "intervals" table contains each maximal uninterrupted run of positive values, together with its length.
How can I select the rows from "current_table" that fall within any of the intervals in the "intervals" table (there are many such intervals)?
So result should be:
date value
02.10.2019 1
03.10.2019 2
04.10.2019 2
06.10.2019 1
07.10.2019 1
08.10.2019 2
09.10.2019 2
11.10.2019 2
12.10.2019 1
My first inclination is simply:
select t1.*
from table1 t1
where t1.value > 0;
Perhaps your intervals might overlap. Or you might want to filter only for intervals in the second table. If so, then exists is handy:
select t1.*
from table1 t1
where t1.value > 0 and
exists (select 1
from table2 t2
where t1.date between t2.date_start and t2.date_end
);
This is overkill for your sample data, though.
Join the tables.
Only the rows that belong to an interval in table intervals will be returned:
select t.*
from current_table t inner join intervals i
on t.date between i.date_start and i.date_end
Or with EXISTS:
select t.*
from current_table t
where exists (
select 1 from intervals i
where t.date between i.date_start and i.date_end
)
Results:
| date | value |
| ---------- | ----- |
| 2019-02-10 | 1 |
| 2019-03-10 | 2 |
| 2019-04-10 | 2 |
| 2019-06-10 | 1 |
| 2019-07-10 | 1 |
| 2019-08-10 | 2 |
| 2019-09-10 | 2 |
| 2019-11-10 | 2 |
| 2019-12-10 | 1 |
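The dates in the question are written as dd.mm.yyyy. If the columns are stored as text in that form (an assumption; the question does not give the column types), BETWEEN compares them as strings rather than as dates. A hedged sketch that converts them with STR_TO_DATE() first:

```sql
-- Assumes date, date_start and date_end are text columns holding 'dd.mm.yyyy'.
-- STR_TO_DATE parses them into real DATE values before comparing.
SELECT t.*
FROM current_table AS t
WHERE EXISTS (
  SELECT 1
  FROM intervals AS i
  WHERE STR_TO_DATE(t.date, '%d.%m.%Y')
        BETWEEN STR_TO_DATE(i.date_start, '%d.%m.%Y')
            AND STR_TO_DATE(i.date_end, '%d.%m.%Y')
);
```

Storing the values in DATE columns instead would avoid the conversion and allow an index on the date to be used.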

MySQL query based on time range, group users, and sum values over a sliding window

I want to create a new Table B based on the information in an existing Table A. I'm wondering whether MySQL can take a time range into account, group rows by the values in one column, and sum the values in another column within those groups.
Table A stores logs of events like a journal for users. There can be multiple events from a single user in a single day. Say hypothetically I'm keeping track of when my users eat fruit and I want to know how many fruit they eat in a week (7days) and also how many apples they eat.
So in Table B I want to count for each entry in Table A, the previous 7 day total # of fruit and apples.
EDIT:
I'm sorry, I oversimplified the information I gave and didn't think my example through thoroughly.
I initially have only Table A. I'm trying to create Table B from a query.
Assume:
User/id can log an entry multiple times in a day.
sum counts should be for id between date and date - 7 days
fruit column stands for the total # of fruit during the 7 day interval ( apples and bananas are both fruit)
The data doesn't only start at 2013-9-5. It can date back to 2000, and I want to use the 7-day sliding window over all the dates from 2000 to 2013.
The sum count is over a sliding window of 7 days
Here's an example:
Table A:
| id | date-time | apples | banana |
---------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 1 |
| 2 | 2013-9-5 09:00:00 | 1 | 0 |
| 1 | 2013-9-5 16:00:00 | 1 | 0 |
| 1 | 2013-9-6 08:00:00 | 0 | 1 |
| 2 | 2013-9-9 08:00:00 | 1 | 1 |
| 1 | 2013-9-11 08:00:00 | 0 | 1 |
| 1 | 2013-9-12 08:00:00 | 0 | 1 |
| 2 | 2013-9-13 08:00:00 | 1 | 1 |
note: user 1 logged 2 entries on 2013-9-5
The result after the query should be Table B.
Table B
| id | date-time | apples | fruit |
--------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 2 |
| 2 | 2013-9-5 09:00:00 | 1 | 1 |
| 1 | 2013-9-5 16:00:00 | 2 | 3 |
| 1 | 2013-9-6 08:00:00 | 2 | 4 |
| 2 | 2013-9-9 08:00:00 | 2 | 3 |
| 1 | 2013-9-11 08:00:00 | 2 | 5 |
| 1 | 2013-9-12 08:00:00 | 0 | 3 |
| 2 | 2013-9-13 08:00:00 | 2 | 4 |
At 2013-9-12 the sliding window moves and only includes 9-6 to 9-12. That's why id 1 goes from a sum of 2 apples to 0 apples.
You need years in your data to be able to use date arithmetic correctly. I added them.
There's an odd thing in your data. You seem to have multiple log entries for each person for each day. You're assuming an implicit order setting the later log entries somehow "after" the earlier ones. If SQL and MySQL do that, it's only by accident: there's no implicit ordering of rows in a table. Plus if we duplicate date/id combinations, the self join (read on) has lots of duplicate rows and ruins the sums.
So we need to start by creating a daily summary table of your data, like so:
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
This summary will contain at most one row per id per day.
Next we need to do a limited cross product self-join, so we get seven days' worth of fruit eating.
select --whatever--
from (
-- summary query --
) as a
join (
-- same summary query once again
) as b
on ( a.id = b.id
and b.`date` between a.`date` - interval 6 day AND a.`date` )
The between clause in the on gives us the seven days (today, and the six days prior). Notice that the table in the join with the alias b is the seven day stuff, and the a table is the today stuff.
Finally, we have to summarize that result according to your specification. The resulting query is this.
select a.id, a.`date`,
sum(b.apples) + sum(b.banana) as fruit_last_week,
a.apples as apple_today
from (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as a
join (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as b on (a.id = b.id and
b.`date` between a.`date` - interval 6 day AND a.`date` )
group by a.id, a.`date`, a.apples
order by a.`date`, a.id
Here's a fiddle: http://sqlfiddle.com/#!2/670b2/15/0
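On MySQL 8.0 or later, a sliding window per log entry can also be sketched with a window function and a RANGE frame, without the daily summary or the self-join. This assumes the fruit table from the query above and that `date` is a DATETIME column holding the full timestamp; apples_7d and fruit_7d are made-up column names.

```sql
-- Sketch only: requires MySQL 8.0+.
-- For each row, the frame covers the same id's rows from 6 days before
-- this row's timestamp up to and including this row.
SELECT
  id,
  `date`,
  SUM(apples) OVER w          AS apples_7d,
  SUM(apples + banana) OVER w AS fruit_7d
FROM fruit
WINDOW w AS (
  PARTITION BY id
  ORDER BY `date`
  RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
)
ORDER BY `date`, id;
```

Because the frame is evaluated per row, several entries by the same user on one day each get their own running total, which is what Table B in the question shows.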
Assumptions:
one row per id/date
the counts should be for id between date and date - 7 days
"fruit" = "banana"
the "date" column is actually a date (including year) and not just month/day
then this SQL should do the trick:
INSERT INTO B
SELECT a1.id, a1.date, SUM( a2.apples ), SUM( a2.banana )
FROM (SELECT DISTINCT id, date
FROM A
WHERE date > NOW() - INTERVAL 7 DAY
) a1
JOIN A a2
ON a2.id = a1.id
AND a2.date <= a1.date
AND a2.date >= a1.date - INTERVAL 7 DAY
GROUP BY a1.id, a1.date
Some questions:
Are the above assumptions correct?
Does table A contain more fruits than just Bananas and Apples? If so, what does the real structure look like?

SQL Group By Date Conflicts

I have a table with columns start_date and end_date. What we need to do is Select everything and group them by date conflicts for each Object_ID.
A date conflict is when a row's start date and/or end date overlaps another row's date range. For instance, here are some examples of conflicts:
Row 1 has dates 1st through the 5th, Row 2 has dates 2nd through the 3rd.
Row 1 has dates 2nd through the 5th, Row 2 has dates 1st through the 3rd.
Row 1 has dates 2nd through the 5th, Row 2 has dates 3rd through the 6th.
Row 1 has dates 2nd through the 5th, Row 2 has dates 1st through the 7th.
So for example, if we have some sample data (assume the numbers are just days of the month for simplicity):
id | object_id | start_date | end_date
1 | 1 | 1 | 5
2 | 1 | 2 | 4
3 | 1 | 6 | 8
4 | 2 | 2 | 3
What I would expect to see is this:
object_id | start_date | end_date | numconflicts
1 | <na> | <na> | 2
1 | 6 | 8 | 0 or null
2 | 2 | 3 | 0 or null
And for a Second Test Case, Here is some sample data:
id | object_id | start_date | end_date
1 | 1 | 1 | 5
2 | 1 | 2 | 4
3 | 1 | 6 | 8
4 | 2 | 2 | 3
5 | 2 | 4 | 5
6 | 1 | 2 | 3
7 | 1 | 10 | 12
8 | 1 | 11 | 13
And for the second Test Case, what I would expect to see as output:
object_id | start_date | end_date | numconflicts
1 | <na> | <na> | 3
1 | 6 | 8 | 0 or null
2 | 2 | 3 | 0 or null
2 | 4 | 5 | 0 or null
1 | <na> | <na> | 2
Yes, I will need some way of differentiating the first and the second grouping (the first and last rows) but I haven't quite figured that out. The goal is to view this list, and then when you click on a group of conflicts you can view all of the conflicts in that group.
My first thought was to attempt some GROUP BY CASE ... clause, but I just wrapped my head around itself.
The language I am using to call MySQL is PHP. So if someone knows of a PHP-loop solution rather than a large MySQL query, I am all ears.
Thanks in advance.
Edit: Added in primary Keys to provide a little less confusion.
Edit: Added in a Test case 2 to provide some more reasoning.
This query finds the number of duplicates:
select od1.object_id, od1.start_date, od1.end_date, sum(od2.id is not null) as dups
from object_date od1
left join object_date od2
on od2.object_id = od1.object_id
and od2.end_date >= od1.start_date
and od2.start_date <= od1.end_date
and od2.id != od1.id
group by 1,2,3;
You can use this query as the basis of a query that gives you exactly what you asked for (see below for output).
select
object_id,
case dups when 0 then start_date else '<na>' end as start_date,
case dups when 0 then end_date else '<na>' end as end_date,
sum(dups) as dups
from (
select od1.object_id, od1.start_date, od1.end_date, sum(od2.id is not null) as dups
from object_date od1
left join object_date od2
on od2.object_id = od1.object_id
and od2.end_date >= od1.start_date
and od2.start_date <= od1.end_date
and od2.id != od1.id
group by 1,2,3) x
group by 1,2,3;
Note that I have used an id column to distinguish the rows. You could instead replace the od2.id != od1.id test with a test that at least one of the other columns differs, but that would require a unique index on all the other columns to make sense, and having an id column is a good idea anyway.
Here's a test using your data:
create table object_date (
id int primary key auto_increment,
object_id int,
start_date int,
end_date int
);
insert into object_date (object_id, start_date, end_date)
values (1,1,5),(1,2,4),(1,6,8),(2,2,3);
Output of first query when run against this sample data:
+-----------+------------+----------+------+
| object_id | start_date | end_date | dups |
+-----------+------------+----------+------+
| 1 | 1 | 5 | 1 |
| 1 | 2 | 4 | 1 |
| 1 | 6 | 8 | 0 |
| 2 | 2 | 3 | 0 |
+-----------+------------+----------+------+
Output of second query when run against this sample data:
+-----------+------------+----------+------+
| object_id | start_date | end_date | dups |
+-----------+------------+----------+------+
| 1 | 6 | 8 | 0 |
| 1 | <na> | <na> | 2 |
| 2 | 2 | 3 | 0 |
+-----------+------------+----------+------+
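The question also asks how to tell separate conflict groups apart. On MySQL 8.0 or later, one way to sketch this is a gaps-and-islands query: order each object's rows by start_date, start a new group whenever a row begins after every earlier row has ended, and number the groups with a running sum. It reuses the object_date table from the test above; is_new_group, grp and num_rows are made-up names.

```sql
-- Sketch only: requires MySQL 8.0+ for window functions.
SELECT
  object_id,
  MIN(start_date) AS start_date,
  MAX(end_date)   AS end_date,
  COUNT(*)        AS num_rows
FROM (
  SELECT
    s.*,
    -- Running count of group starts gives each group its own number.
    SUM(is_new_group) OVER (PARTITION BY object_id
                            ORDER BY start_date, id) AS grp
  FROM (
    SELECT
      od.*,
      -- 0 if this row starts on or before the latest end_date seen so far,
      -- 1 otherwise (including the first row, where that maximum is NULL).
      CASE WHEN start_date <= MAX(end_date) OVER (
                  PARTITION BY object_id
                  ORDER BY start_date, id
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
           THEN 0 ELSE 1 END AS is_new_group
    FROM object_date AS od
  ) AS s
) AS g
GROUP BY object_id, grp
ORDER BY object_id, start_date;
```

If the rows of the second test case are loaded, this should give object 1 a group of three rows (days 1 to 5), the single row 6 to 8, and a group of two rows (days 10 to 13), plus two single-row groups for object 2. Groups with num_rows = 1 are the rows without conflicts.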
Oracle: This could be done with a subquery in a GROUP BY CASE statement.
https://forums.oracle.com/forums/thread.jspa?threadID=2131172
MySQL: You could have a view that contains all the conflicts:
select distinct a1.appt, a2.appt
from appointment a1, appointment a2
where a1.start < a2.end and a1.end > a2.start
and then simply do a count(*) on that view.
Something like the following should work:
select T1.object_id, T1.start_date, T1.end_date, count(T1.object_id) as numconflicts
from T1
inner join T2 on T1.start_date between T2.start_date and T2.end_date
inner join T3 on T1.end_date between T2.start_date and T2.end_date
group by T1.object_id
I might be off a little bit, but it should help you get started.
Edit: Indented it properly

mysql select ordernumber by group

I'm trying to do something like 'select groupwise maximum', but I'm looking for groupwise order number.
so with a table like this
briefs
----------
id_brief | id_case | date
1 | 1 | 06/07/2010
2 | 1 | 04/07/2010
3 | 1 | 03/07/2010
4 | 2 | 18/05/2010
5 | 2 | 17/05/2010
6 | 2 | 19/05/2010
I want a result like this
briefs result
----------
id_brief | id_case | dateOrder
1 | 1 | 3
2 | 1 | 2
3 | 1 | 1
4 | 2 | 2
5 | 2 | 1
6 | 2 | 3
I think I want to do something like what is described in "MySQL - Get row number on select", but I don't know how I would reset the variable for each id_case.
This will give you how many records are there with this id_case value and a date less than or equal to this date value.
SELECT t1.id_brief,
       t1.id_case,
       COUNT(t2.id_brief) AS dateOrder
FROM yourtable AS t1
LEFT JOIN yourtable AS t2 ON t2.id_case = t1.id_case AND t2.date <= t1.date
GROUP BY t1.id_brief
MySQL is permissive about which columns can be selected in a query using GROUP BY. With a stricter DBMS (or with MySQL's ONLY_FULL_GROUP_BY mode enabled) you may need GROUP BY t1.id_brief, t1.id_case.
I strongly advise you to have the right indexes on the table:
CREATE INDEX filter1 ON yourtable (id_case, date)
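On MySQL 8.0 or later, the per-case order number can also be sketched with ROW_NUMBER(), which restarts the numbering for each id_case without any user variable. This assumes the same yourtable name and that date is a real DATE column, not a dd/mm/yyyy string.

```sql
-- Sketch only: requires MySQL 8.0+ for window functions.
-- PARTITION BY restarts the counter for every id_case.
SELECT
  id_brief,
  id_case,
  ROW_NUMBER() OVER (PARTITION BY id_case ORDER BY date) AS dateOrder
FROM yourtable
ORDER BY id_brief;
```

If two briefs in the same case can share a date, ROW_NUMBER() still gives them different numbers in an arbitrary order, whereas the counting join above gives both the higher count; RANK() or DENSE_RANK() are alternatives if ties should share a number.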