How can I create a self-incrementing ID per day in MySQL?

I have a table:
CREATE TABLE bills
( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, createdAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
, idDay INT NULL
);
I want the idDay of the first record of each day to be 1 and to increment from there for the rest of that day. For example:
| id | createdAt | idDay |
|----------|----------------|-------|
| 1 | 2021-01-10 | 1 |
| 2 | 2021-01-10 | 2 |
| 3 | 2021-01-11 | 1 |
| 4 | 2021-01-11 | 2 |
| 5 | 2021-01-11 | 3 |
| 6 | 2021-01-12 | 1 |
| 7 | 2021-01-13 | 1 |
| 8 | 2021-01-13 | 2 |
Is the idDay field necessary, or can I do this in the SELECT?
I think I can do this with a procedure, but how?
Thanks for the help. 😁

You can use the row_number() window function available since MySQL 8.
SELECT id,
createdat,
row_number() OVER (PARTITION BY date(createdat)
ORDER BY id) idday
FROM bill;
(Or ORDER BY createdat, if that defines the order, not the id.)
But since window functions are calculated after a WHERE clause is applied, the number might be different for a record if previous records for a day are filtered. It's not clear from your question if this is a problem or not. If it is a problem, you can use the query in a derived table or create a view with it and work on that.
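The difference between filtering before and after numbering can be sketched as follows. This is only an illustration under stated assumptions: SQLite (3.25 or newer, via Python's sqlite3 module) stands in for MySQL 8, and the times of day in the sample rows are made up, since the question only shows dates.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8; the row_number() syntax is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (id INTEGER PRIMARY KEY, createdat TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO bill (id, createdat) VALUES (?, ?)",
    [
        (1, "2021-01-10 09:00:00"),  # times of day are hypothetical
        (2, "2021-01-10 17:30:00"),
        (3, "2021-01-11 08:15:00"),
        (4, "2021-01-11 12:00:00"),
        (5, "2021-01-11 19:45:00"),
    ],
)

NUMBERED = """
    SELECT id,
           row_number() OVER (PARTITION BY date(createdat) ORDER BY id) AS idday
    FROM bill
"""

# WHERE at the same query level: rows are filtered out before they are numbered.
filtered_first = conn.execute(NUMBERED + " WHERE id >= 4 ORDER BY id").fetchall()

# Numbering inside a derived table, WHERE outside: the numbers are preserved.
numbered_first = conn.execute(
    "SELECT id, idday FROM (" + NUMBERED + ") AS n WHERE id >= 4 ORDER BY id"
).fetchall()

print(filtered_first)  # [(4, 1), (5, 2)]
print(numbered_first)  # [(4, 2), (5, 3)]
```

In the first result, bill 4 is numbered 1 because bill 3 was filtered out before the window function ran; in the second, it keeps its per-day number 2.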
Yet another option is a correlated subquery counting the "older" records.
SELECT b1.id,
b1.createdat,
(SELECT count(*) + 1
FROM bill b2
WHERE b2.createdat >= date(b1.createdat)
AND b2.createdat < date_add(date(b1.createdat), INTERVAL 1 DAY)
AND b2.id < b1.id) idday
FROM bill b1;
(If createdat defines the order, change b2.createdat < date_add(date(b1.createdat), INTERVAL 1 DAY) to b2.createdat <= b1.createdat.)
That would also work in lower MySQL versions and you can add a WHERE clause (to the outer query) without changing the numbers.

You can just calculate the number in a select (requires an index on createdAt to work well):
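select b.id, b.createdAt, count(b2.id)+1 as idDay
from bill b
-- createdAt is a TIMESTAMP, so match on the calendar day rather than the exact value
left join bill b2 on b2.createdAt >= date(b.createdAt)
and b2.createdAt < date(b.createdAt) + interval 1 day
and b2.id < b.id
where ...
group by b.id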

Related

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates to that table. Within each entity the dates should be in chronological order, but in some cases a row has a date that is much older than the previous one.
How can I get all the rows that are out of order for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there any function to compare each row with the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
Another option is EXISTS, which returns every row that has an earlier row (lower id) with a later timestamp for the same entity:
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
FROM sourcetable s2
WHERE s2.id < s1.id
AND s2.entity_id = s1.entity_id
AND s2.time_stamp > s1.time_stamp )
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.
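The lag() query from earlier in this thread can be checked against the sample data from the question. A minimal sketch, assuming SQLite (3.25+) as a stand-in for MySQL 8.0 and dates stored as ISO strings, which compare correctly as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 7, "2019-01-22"), (2, 9, "2019-01-05"), (3, 6, "2019-03-14"),
        (4, 9, "2019-04-20"), (5, 6, "2015-10-04"), (6, 9, "2019-07-15"),
        (7, 3, "2019-07-04"), (8, 7, "2019-06-01"), (9, 6, "2019-11-04"),
        (10, 7, "2019-03-04"),
    ],
)

# Compare each row with the previous row (by id) of the same entity.
out_of_order = conn.execute("""
    SELECT id
    FROM (
        SELECT id, time_stamp,
               lag(time_stamp) OVER (PARTITION BY entity_id ORDER BY id) AS prev_time_stamp
        FROM mytable
    ) AS t
    WHERE time_stamp < prev_time_stamp
    ORDER BY id
""").fetchall()

print(out_of_order)  # [(5,), (10,)]
```

The first row of each entity has no previous row, so its lag value is NULL and the comparison excludes it.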

How to get the average time between multiple dates

What I'm trying to do is bucket my customers based on their transaction frequency. I have the date recorded for every time they transact, but I can't work out how to get the average delta between consecutive dates. What I effectively want is a table showing me:
| User | Average Frequency
| 1 | 15
| 2 | 15
| 3 | 35
...
The data I currently have is formatted like this:
| User | Transaction Date
| 1 | 2018-01-01
| 1 | 2018-01-15
| 1 | 2018-02-01
| 2 | 2018-06-01
| 2 | 2018-06-18
| 2 | 2018-07-01
| 3 | 2019-01-01
| 3 | 2019-02-05
...
So basically, each customer will have multiple transactions and I want to understand how to get the delta between each date and then average of the deltas.
I know the datediff function and how it works, but I can't work out how to split the transactions up. I also know that the offset function is available in tools like Looker, but I don't know the syntax behind it.
Thanks
In MySQL 8+ you can use LAG to get a delayed Transaction Date and then use DATEDIFF to get the difference between two consecutive dates. You can then take the average of those values:
SELECT User, AVG(delta) AS `Average Frequency`
FROM (SELECT User,
DATEDIFF(`Transaction Date`, LAG(`Transaction Date`) OVER (PARTITION BY User ORDER BY `Transaction Date`)) AS delta
FROM transactions) t
GROUP BY User
Output:
User Average Frequency
1 15.5
2 15
3 35
Demo on dbfiddle.com
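The LAG-based average can be reproduced locally with a small script. This is a sketch under assumptions: SQLite (3.25+) replaces MySQL 8, and because SQLite has no DATEDIFF, the day difference is computed with julianday() instead; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user INTEGER, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1, "2018-01-01"), (1, "2018-01-15"), (1, "2018-02-01"),
        (2, "2018-06-01"), (2, "2018-06-18"), (2, "2018-07-01"),
        (3, "2019-01-01"), (3, "2019-02-05"),
    ],
)

# julianday(a) - julianday(b) plays the role of MySQL's DATEDIFF(a, b).
# The first transaction of each user has no predecessor, so its delta is NULL
# and AVG() skips it.
averages = conn.execute("""
    SELECT user, AVG(delta) AS avg_delta
    FROM (
        SELECT user,
               julianday(transaction_date)
               - julianday(lag(transaction_date)
                           OVER (PARTITION BY user ORDER BY transaction_date)) AS delta
        FROM transactions
    ) AS t
    GROUP BY user
    ORDER BY user
""").fetchall()

print(averages)  # [(1, 15.5), (2, 15.0), (3, 35.0)]
```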
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(user INT NOT NULL
,transaction_date DATE
,PRIMARY KEY(user,transaction_date)
);
INSERT INTO my_table VALUES
(1,'2018-01-01'),
(1,'2018-01-15'),
(1,'2018-02-01'),
(2,'2018-06-01'),
(2,'2018-06-18'),
(2,'2018-07-01'),
(3,'2019-01-01'),
(3,'2019-02-05');
SELECT user
, AVG(delta) avg_delta
FROM
( SELECT x.*
, DATEDIFF(x.transaction_date,MAX(y.transaction_date)) delta
FROM my_table x
JOIN my_table y
ON y.user = x.user
AND y.transaction_date < x.transaction_date
GROUP
BY x.user
, x.transaction_date
) a
GROUP
BY user;
+------+-----------+
| user | avg_delta |
+------+-----------+
| 1 | 15.5000 |
| 2 | 15.0000 |
| 3 | 35.0000 |
+------+-----------+
I don't know what to say other than use a GROUP BY.
SELECT User, AVG(DATEDIFF(...))
FROM ...
GROUP BY User

To find the last value in the dataset of 15 minutes interval

ID Timestamp Value
1 11:59:54 10
1 12:04:00 20
1 12:12:00 31
1 12:16:00 10
1 12:48:00 05
I want the result set as
ID Timestamp Value
1 11:59:54 10
1 12:00:00 10
1 12:04:00 20
1 12:12:00 31
1 12:15:00 31
1 12:16:00 10
1 12:30:00 10
1 12:45:00 10
1 12:48:00 05
More coffee will probably lead to a simpler solution, but consider the following...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,timestamp TIME
,value INT NOT NULL
);
INSERT INTO my_table VALUES
(1 ,'11:59:54',10),
(2 ,'12:04:00',20),
(3 ,'12:12:00',31),
(4 ,'12:16:00',10),
(5 ,'12:48:00',05);
... in addition, I have a table of integers, that looks like this:
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
So...
SELECT a.timestamp
, b.value
FROM
( SELECT x.*
, MAX(y.timestamp) max_timestamp
FROM
( SELECT timestamp
FROM my_table
UNION
SELECT SEC_TO_TIME((i2.i*10+i1.i)*900)
FROM ints i1
, ints i2
WHERE SEC_TO_TIME((i2.i*10+i1.i)*900)
BETWEEN (SELECT MIN(timestamp) FROM my_table)
AND (SELECT MAX(timestamp) FROM my_table)
ORDER
BY timestamp
) x
LEFT
JOIN my_table y
ON y.timestamp <= x.timestamp
GROUP
BY x.timestamp
) a
JOIN my_table b
ON b.timestamp = max_timestamp
ORDER
BY a.timestamp;
+-----------+-------+
| timestamp | value |
+-----------+-------+
| 11:59:54 | 10 |
| 12:00:00 | 10 |
| 12:04:00 | 20 |
| 12:12:00 | 31 |
| 12:15:00 | 31 |
| 12:16:00 | 10 |
| 12:30:00 | 10 |
| 12:45:00 | 10 |
| 12:48:00 | 5 |
+-----------+-------+
The idea is as follows. Use SERIES_GENERATE_TIME() to generate the missing timestamps at 15-minute intervals and union them with the existing data in your table T. Ideally you would then use LAST_VALUE with IGNORE NULLS, but IGNORE NULLS is not implemented in HANA, so a workaround is needed. I use COUNT() as a window function to count the non-null values. I do the same on the original data and then join both on the count. This way the last non-null value is repeated.
select X.ID, X.TIME, Y.VALUE from (
select ID, TIME, value,
count(VALUE) over (order by TIME rows between unbounded preceding and current row) as CNT
from (
--add the missing 15 minute interval timestamps
select 1 as ID, GENERATED_PERIOD_START as TIME, NULL as VALUE
from SERIES_GENERATE_TIME('INTERVAL 15 MINUTE', '12:00:00', '13:00:00')
union all
select ID, TIME, VALUE from T
)
) as X join (
select ID, TIME, value,
count(value) over (order by TIME rows between unbounded preceding and current row) as CNT
from T
) as Y on X.CNT = Y.CNT

MySQL query based on time range, group users, and sum values over a sliding window

I want to create a new Table B based on the information from another existing Table A. I'm wondering if MySQL has the functionality to take into account a range of time and group column A values then only sum up the values in a column B based on those groups in column A.
Table A stores logs of events like a journal for users. There can be multiple events from a single user in a single day. Say hypothetically I'm keeping track of when my users eat fruit and I want to know how many fruit they eat in a week (7days) and also how many apples they eat.
So in Table B I want to count for each entry in Table A, the previous 7 day total # of fruit and apples.
EDIT:
I'm sorry, I oversimplified the information I gave and didn't think my example through.
I initially have only Table A. I'm trying to create Table B from a query.
Assume:
User/id can log an entry multiple times in a day.
sum counts should be for id between date and date - 7 days
fruit column stands for the total # of fruit during the 7 day interval ( apples and bananas are both fruit)
The data doesn't only start at 2013-9-5. It can date back to 2000, and I want to use the 7-day sliding window over all the dates from 2000 to 2013.
The sum count is over a sliding window of 7 days
Here's an example:
Table A:
| id | date-time | apples | banana |
---------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 1 |
| 2 | 2013-9-5 09:00:00 | 1 | 0 |
| 1 | 2013-9-5 16:00:00 | 1 | 0 |
| 1 | 2013-9-6 08:00:00 | 0 | 1 |
| 2 | 2013-9-9 08:00:00 | 1 | 1 |
| 1 | 2013-9-11 08:00:00 | 0 | 1 |
| 1 | 2013-9-12 08:00:00 | 0 | 1 |
| 2 | 2013-9-13 08:00:00 | 1 | 1 |
note: user 1 logged 2 entries on 2013-9-5
The result after the query should be Table B.
Table B
| id | date-time | apples | fruit |
--------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 2 |
| 2 | 2013-9-5 09:00:00 | 1 | 1 |
| 1 | 2013-9-5 16:00:00 | 2 | 3 |
| 1 | 2013-9-6 08:00:00 | 2 | 4 |
| 2 | 2013-9-9 08:00:00 | 2 | 3 |
| 1 | 2013-9-11 08:00:00 | 2 | 5 |
| 1 | 2013-9-12 08:00:00 | 0 | 3 |
| 2 | 2013-9-13 08:00:00 | 2 | 4 |
At 2013-9-12 the sliding window moves and only includes 9-6 to 9-12. That's why id 1 goes from a sum of 2 apples to 0 apples.
You need years in your data to be able to use date arithmetic correctly. I added them.
There's an odd thing in your data. You seem to have multiple log entries for each person for each day. You're assuming an implicit order setting the later log entries somehow "after" the earlier ones. If SQL and MySQL do that, it's only by accident: there's no implicit ordering of rows in a table. Plus if we duplicate date/id combinations, the self join (read on) has lots of duplicate rows and ruins the sums.
So we need to start by creating a daily summary table of your data, like so:
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
This summary will contain at most one row per id per day.
Next we need to do a limited cross product self-join, so we get seven days' worth of fruit eating.
select --whatever--
from (
-- summary query --
) as a
join (
-- same summary query once again
) as b
on ( a.id = b.id
and b.`date` between a.`date` - interval 6 day AND a.`date` )
The between clause in the on gives us the seven days (today, and the six days prior). Notice that the table in the join with the alias b is the seven day stuff, and the a table is the today stuff.
Finally, we have to summarize that result according to your specification. The resulting query is this.
select a.id, a.`date`,
sum(b.apples) + sum(b.banana) as fruit_last_week,
a.apples as apple_today
from (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as a
join (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as b on (a.id = b.id and
b.`date` between a.`date` - interval 6 day AND a.`date` )
group by a.id, a.`date`, a.apples
order by a.`date`, a.id
Here's a fiddle: http://sqlfiddle.com/#!2/670b2/15/0
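The daily-summary self-join can be tried on the question's Table A with a short script. A minimal sketch under assumptions: SQLite stands in for MySQL, the date-time column is named logged_at (a hypothetical name), the summary is written once as a CTE instead of being repeated, and date(day, '-6 days') replaces MySQL's `date` - interval 6 day. It reports the 7-day apple total rather than the same-day apple count, since that is what Table B in the question shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruit (id INTEGER, logged_at TEXT, apples INTEGER, banana INTEGER)"
)
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?, ?)",
    [
        (1, "2013-09-05 08:00:00", 1, 1), (2, "2013-09-05 09:00:00", 1, 0),
        (1, "2013-09-05 16:00:00", 1, 0), (1, "2013-09-06 08:00:00", 0, 1),
        (2, "2013-09-09 08:00:00", 1, 1), (1, "2013-09-11 08:00:00", 0, 1),
        (1, "2013-09-12 08:00:00", 0, 1), (2, "2013-09-13 08:00:00", 1, 1),
    ],
)

weekly = conn.execute("""
    WITH daily AS (              -- at most one row per id per day
        SELECT id, date(logged_at) AS day,
               SUM(apples) AS apples, SUM(banana) AS banana
        FROM fruit
        GROUP BY id, date(logged_at)
    )
    SELECT a.id, a.day,
           SUM(b.apples)                 AS apples_7d,
           SUM(b.apples) + SUM(b.banana) AS fruit_7d
    FROM daily AS a
    JOIN daily AS b
      ON b.id = a.id
     AND b.day BETWEEN date(a.day, '-6 days') AND a.day  -- today plus six days prior
    GROUP BY a.id, a.day
    ORDER BY a.day, a.id
""").fetchall()

for row in weekly:
    print(row)
# (1, '2013-09-05', 2, 3)
# (2, '2013-09-05', 1, 1)
# (1, '2013-09-06', 2, 4)
# (2, '2013-09-09', 2, 3)
# (1, '2013-09-11', 2, 5)
# (1, '2013-09-12', 0, 3)
# (2, '2013-09-13', 2, 4)
```

Because the summary has daily grain, the two entries user 1 logged on 2013-09-05 collapse into a single row; apart from that, the totals agree with Table B in the question.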
Assumptions:
one row per id/date
the counts should be for id between date and date - 7 days
"fruit" = "banana"
the "date" column is actually a date (including year) and not just month/day
then this SQL should do the trick:
INSERT INTO B
SELECT a1.id, a1.date, SUM( a2.apples ), SUM( a2.banana )
FROM (SELECT DISTINCT id, date
FROM A
WHERE date > NOW() - INTERVAL 7 DAY
) a1
JOIN A a2
ON a2.id = a1.id
AND a2.date <= a1.date
AND a2.date >= a1.date - INTERVAL 7 DAY
GROUP BY a1.id, a1.date
Some questions:
Are the above assumptions correct?
Does table A contain more fruits than just Bananas and Apples? If so, what does the real structure look like?

Select difference between row dates in MySQL

I want to calculate the difference in unique date fields between different rows in the same table.
For instance, given the following data:
id | date
---+------------
1 | 2011-01-01
2 | 2011-01-02
3 | 2011-01-15
4 | 2011-01-20
5 | 2011-01-10
6 | 2011-01-30
7 | 2011-01-03
I would like to generate a query that produces the following:
id | date | days_since_last
---+------------+-----------------
1 | 2011-01-01 |
2 | 2011-01-02 | 1
7 | 2011-01-03 | 1
5 | 2011-01-10 | 7
3 | 2011-01-15 | 5
4 | 2011-01-20 | 5
6 | 2011-01-30 | 10
Any suggestions for what date functions I would use in MySQL, or is there a subselect that would do this?
(Of course, I don't mind putting WHERE date > '2011-01-01' to ignore the first row.)
A correlated subquery could be of help:
SELECT
id,
date,
DATEDIFF(
date,
(SELECT MAX(date) FROM atable WHERE date < t.date)
) AS days_since_last
FROM atable AS t
ORDER BY date
Something like this should work:
SELECT mytable.id, mytable.date, DATEDIFF(mytable.date, t2.date)
FROM mytable
LEFT JOIN mytable AS t2 ON t2.id = mytable.id - 1
However, this implies that the ids in your table are continuous; otherwise this won't work at all. And maybe MySQL will complain about the first row, since t2.date will be NULL, but I don't have the time to check now.
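The correlated-subquery approach does not depend on the ids being continuous, which can be seen on the question's sample data. A sketch under assumptions: SQLite stands in for MySQL, and julianday() differences replace DATEDIFF, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?)",
    [
        (1, "2011-01-01"), (2, "2011-01-02"), (3, "2011-01-15"),
        (4, "2011-01-20"), (5, "2011-01-10"), (6, "2011-01-30"),
        (7, "2011-01-03"),
    ],
)

# For each row, look up the latest earlier date and subtract it.
# The earliest row has no predecessor, so its result is NULL (None in Python).
gaps = conn.execute("""
    SELECT t.id, t.date,
           CAST(julianday(t.date)
                - julianday((SELECT MAX(p.date) FROM atable AS p WHERE p.date < t.date))
                AS INTEGER) AS days_since_last
    FROM atable AS t
    ORDER BY t.date
""").fetchall()

for row in gaps:
    print(row)
# (1, '2011-01-01', None)
# (2, '2011-01-02', 1)
# (7, '2011-01-03', 1)
# (5, '2011-01-10', 7)
# (3, '2011-01-15', 5)
# (4, '2011-01-20', 5)
# (6, '2011-01-30', 10)
```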