MySQL: How to store and split time series

I have a table where I store historical data and add a record for items I'm tracking every 5 mins.
This is an example using just 2 items:
+----+-------------+
| id | timestamp |
+----+-------------+
| 1 | 1533209426 |
| 2 | 1533209426 |
| 1 | 1533209726 |
| 2 | 1533209726 |
| 1 | 1533210026 |
| 2 | 1533210026 |
+----+-------------+
The problem is that I'm actually tracking 4k items, so the table keeps getting bigger. Also, I don't need 5-minute granularity when I query the last month. What I'm trying to understand is whether there's a way to keep 5-minute records for the last 24 hours, 1-hour records for the last 7 days, and so on. Maybe every hour I could take the previous hour's 12 records from the 5-minute table and store their average in the 1-hour table? But what if some records are missing because of errors? Is this the correct way to solve the problem, or are there better alternatives?

You are on the right track.
There are multiple issues to decide how to handle -- missing entries, timestamps skewed by a second (or whatever), etc.
By providing a count (which should always be 12), you can discover some hiccups:
SELECT  FLOOR(timestamp / 3600) AS hr,  -- MEDIUMINT UNSIGNED
        COUNT(*),                       -- TINYINT UNSIGNED
        AVG(metric)                     -- FLOAT
    FROM tbl
    GROUP BY 1;
Yes, every hour, process the previous hour's worth of data. Add WHERE timestamp BETWEEN ... AND ... + 3599 to constrain the range in question, then purge that same set of rows from the 5-minute table.
The hourly summary table would have PRIMARY KEY(hr).
Unless you are talking about millions of rows in a table, I would not recommend any use of PARTITION.
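A minimal sketch of that hourly job, assuming a source table tbl(id, timestamp, metric), a hypothetical summary table tbl_1h, and @hr_start set to the top of the previous hour:

INSERT INTO tbl_1h (id, hr, cnt, avg_metric)
    SELECT  id,
            FLOOR(timestamp / 3600),    -- hour bucket
            COUNT(*),                   -- should be 12; fewer means missing samples
            AVG(metric)
        FROM tbl
        WHERE timestamp BETWEEN @hr_start AND @hr_start + 3599
        GROUP BY id, FLOOR(timestamp / 3600);

-- then purge the rows that were just summarized
DELETE FROM tbl
    WHERE timestamp BETWEEN @hr_start AND @hr_start + 3599;

Note that rolling up each of the 4k items separately, as here, implies the summary key covers both columns, e.g. PRIMARY KEY(id, hr).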

Related

Is there a way to group by one minute periods in MySQL using unix timestamps?

I have a MariaDB database holding some data for a game, and I wanted to use Grafana to graph how many rows have been inserted every minute. I'm using a Node.js script to fetch the data every minute from the game's Public API and insert it into my DB; however, I can't figure out how to get MySQL to output something that works with Grafana. What I need is a result set with a column for the number of new inserts and a column for the time at the start of each one-minute window. My DB looks something like this:
--------------------------------------------------------------------
| auction_id | seller | buyer | timestamp | price | bin | item_bytes |
--------------------------------------------------------------------
I have a working query for returning the number of new inserts for the current data supplied by the API; however, I can't figure out how to get the historical data for past minutes. The output I'm looking for would look something like this:
-----------------------------
| new inserts | timestamp |
-----------------------------
| 234 | 1625373706053 |
-----------------------------
| 684 | 1625373666053 |
-----------------------------
| 720 | 1625373626053 |
-----------------------------
| 403 | 1625373586053 |
-----------------------------
Notice how the timestamp goes down by 60,000 every row, which is equivalent to 60 seconds, or one minute. I have tried GROUP BY and almost every other solution I could find on Stack Overflow and other sites, but it still doesn't work.
Please don't hesitate to comment if I wasn't clear enough.
You may use FROM_UNIXTIME to convert your milliseconds-since-epoch values into a bona fide datetime. Then, aggregate by minute to get the counts:
SELECT
    COUNT(*) AS `new inserts`,
    FROM_UNIXTIME(timestamp / 1000, '%Y-%m-%d %H:%i') AS ts_minute
FROM yourTable
GROUP BY ts_minute
ORDER BY ts_minute;
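If Grafana needs the bucket back as raw milliseconds since the epoch rather than a formatted datetime, a hedged variation on the same idea is to truncate the timestamp arithmetically (same yourTable placeholder):

SELECT
    COUNT(*) AS `new inserts`,
    FLOOR(timestamp / 60000) * 60000 AS ts_minute  -- start of the minute, in ms
FROM yourTable
GROUP BY ts_minute
ORDER BY ts_minute DESC;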

Advanced Average Date Difference with unique ids

I'm back on Stack Overflow with another headache that I have been trying to get to the bottom of with no success at all, no matter how many AVG(DATEDIFF(...)) variations I try.
I have an SQL table like the below:
ID | PersonID | Start | End | Status
1 | 1 | 2006-03-21 00:00:00 | 2007-05-19 00:00:00 | Active
2 | 1 | 2007-05-19 00:00:00 | 2007-05-20 00:00:00 | Active
3 | 2 | 2016-08-24 00:00:00 | 2016-08-25 00:00:00 | Active
4 | 2 | 2016-08-25 00:00:00 | 2016-08-28 00:00:00 | Active
5 | 2 | 2016-08-28 00:00:00 | 2017-10-05 00:00:00 | Active
I'm trying to find the average active stay (in days) across all unique people.
That is, the average number of days based on each person's EARLIEST start date and LATEST end date (a single person ID can have multiple active rows).
For example, person ID 1's earliest start date is 2006-03-21 and their latest end date is 2007-05-20, so their stay has been 425 days.
Repeating this for ID 2, their stay is 407 days.
After doing this for everyone in the table, I want the average length of stay: for the above 5 rows, with 2 unique people, it is 416. A simple DATEDIFF average across all rows gives me a very inaccurate average of 102.
Hope this makes sense. As always, any help you could give is very much appreciated.
So why not try this:
SELECT
    AVG(DATEDIFF(PersonEnd, PersonStart))
FROM
    (SELECT
         MIN(`Start`) AS PersonStart,
         MAX(`End`) AS PersonEnd
     FROM
         `table`
     GROUP BY
         PersonID) PeriodsPerPerson;
Of course, you should have proper indexes so that MySQL can compute MAX and MIN quickly and can group quickly as well, which means indexes at least on PersonID, Start and End.
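For instance, composite indexes like these (the index names are made up) would let MySQL resolve each person's MIN/MAX from the index alone:

ALTER TABLE `table`
    ADD INDEX idx_person_start (PersonID, `Start`),
    ADD INDEX idx_person_end (PersonID, `End`);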
Please note that you really need the alias for the inner query even though I don't use it anywhere. If you leave it out, you'll run into an error, at least with MySQL 5.5 (I don't know about later versions).
If you have millions or even billions of rows, you might be better off moving the calculation into a stored procedure or a back-end application instead of doing it as shown above.

How to write a script to count employees' working time?

Help me, please!
The table has columns date, time, person, and source. It gets a new row each time an employee passes through the checkpoint; an employee can leave/come back several times per day.
+---------------+----------+--------+-------------+
| date | time |person |source |
+---------------+----------+--------+-------------+
| 01.08.2014 | 08:42:08 | Name1 | enter1 |
+---------------+----------+--------+-------------+
| 01.08.2014 | 09:42:12 | Name1 | exit1 |
+---------------+----------+--------+-------------+
| 01.08.2014 | 10:22:45 | Name1 | enter2 |
+---------------+----------+--------+-------------+
| 01.08.2014 | 18:09:11 | Name1 | exit2 |
+---------------+----------+--------+-------------+
I need to count, for each employee, the actual time he spent at work each day. The table is never edited; it is built from a CSV file, and the script runs once.
I think I need to do something like this:
TIMESTAMPDIFF(MINUTE, enterTime, exitTime)
for each employee for one day. But my knowledge of SQL is very poor.
The date/time formats should be stored in a datetime/timestamp column. It is possible to convert them, although ugly (there's probably a better way...):
> SELECT CONCAT(STR_TO_DATE('01.08.2014', '%m.%d.%Y'), ' ', '08:42:08');
2014-01-08 08:42:08
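For what it's worth, one arguably cleaner alternative is to let STR_TO_DATE parse the date and the time in a single call:

> SELECT STR_TO_DATE(CONCAT('01.08.2014', ' ', '08:42:08'), '%m.%d.%Y %H:%i:%s');
2014-01-08 08:42:08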
Now suppose the times are unix timestamps. An employee arrives at t0 and leaves at t1. The time he was at work is (t1-t0) seconds. Now suppose he arrives at t0, leaves for a break at t1, returns at t2, and leaves for the day at t3. His total time at work is (t1-t0) + (t3-t2) = (t1+t3) - (t0+t2). In general: his time at work for a given day is the sum of the arrival times subtracted from the sum of the departure times.
Using your times:
1389188528 enter1
1389192132 exit1
1389194565 enter2
1389222551 exit2
We see that total time at work is: 1389222551 + 1389192132 - (1389188528 + 1389194565) = 31590, or about 8 hours and 47 minutes. Now what remains is converting to unix timestamps (UNIX_TIMESTAMP()) and applying this reasoning via SQL. In the following example, I have added your data to a table named work_log and assumed that when source begins with exit or enter, we are respectively referring to a departure or arrival.
SELECT person, DATE(dt) AS day,
       SUM(IF(`source` LIKE 'enter%', -1, 1) * UNIX_TIMESTAMP(dt)) / 3600 AS hours
FROM (SELECT CONCAT(STR_TO_DATE(`date`, '%m.%d.%Y'), ' ', `time`) AS `dt`,
             `person`, `source`
      FROM work_log) AS wl
GROUP BY person, day;
+--------+------------+--------------+
| person | day | hours |
+--------+------------+--------------+
| Name1 | 2014-01-08 | 8.7750000000 |
+--------+------------+--------------+
1 row in set (0.00 sec)
There are probably cleaner ways of doing that.
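One such variant, sketched under the same assumptions: because each day groups equal numbers of arrivals and departures, the date component cancels out of the sum, so TIME_TO_SEC on the raw time column is enough and the subquery disappears:

SELECT person, STR_TO_DATE(`date`, '%m.%d.%Y') AS day,
       SUM(IF(`source` LIKE 'enter%', -1, 1) * TIME_TO_SEC(`time`)) / 3600 AS hours
FROM work_log
GROUP BY person, day;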

Average a difference in timestamps across all rows

I have a table that looks like so:
id | username | jointime | parttime
------------------------------------------
1 | foo | 1391806818 | 1391814383
2 | bar | 1391406218 | 1392714270
3 | baz | 1391327818 | 1393197383
4 | qux | 1391815603 | 1391818320
I would like to find the overall average time that's being spent on the site (parttime - jointime).
I tried a query like the one below, but it just returned the average time spent by one single user.
SELECT AVG(parttime - jointime) as time FROM foo_table
Any ideas as to how I can get the overall average difference?
Thanks
You are already getting what you want.
Using your example data, the result is:
796974.75
http://sqlfiddle.com/#!2/d94977/1/0
That is the average of the 4 differences in your example data:
7565
1308052
1869565
2717
http://sqlfiddle.com/#!2/d94977/2/0
It is not the average of one particular user, but indeed the average of all sessions (stored on the table), which I believe is what you want.
Wrapping the average in a SUM, as in SELECT SUM(AVG(parttime - jointime)) AS time FROM foo_table, does not work: MySQL rejects nested aggregate functions (ERROR 1111, invalid use of group function). The plain AVG query above already returns the overall average.
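If a human-readable duration is preferred over raw seconds, one small variation is to wrap the average in SEC_TO_TIME (same foo_table):

SELECT SEC_TO_TIME(AVG(parttime - jointime)) AS avg_time FROM foo_table

which returns roughly '221:22:54.75' for the sample data above (796974.75 seconds).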

combine data - keep unique key

I have several large tables (~100 million rows in total) which all have a similar schema: they log certain settings of an object (u_id) at a point in time.
u_id | x | y | time
---------------------------
1 | 2 | 3 | [timestamp]
1 | 1 | 3 | [timestamp]
2 | 1 | 2 | [timestamp]
2 | 2 | 5 | [timestamp]
3 | 3 | 2 | [timestamp]
I now want to combine these tables into one large table holding ALL the data, but I want to keep the u_ids unique; obviously each source table has, e.g., a u_id 1. When the data is combined in the result table, the entries should still be distinguishable (though I do not need to associate them back to their original values). This only has to be done once, so performance does not matter.
My first idea was to add a prefix (like a_, b_, etc.) to each u_id before writing it to the destination, but this would obviously introduce overhead. I'd prefer the destination table to use an AUTO_INCREMENT value for minimum overhead, but I don't know how to achieve that, since each source u_id can have multiple (several thousand) entries.
I think you should add a Type column to your destination table. Type represents which source table a row came from; u_id combined with Type then identifies the original object, so entries from different sources stay distinguishable. That should solve your problem.
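A minimal sketch of that idea, with hypothetical names (combined as the destination, src_a/src_b as two of the sources). Since each u_id has many log rows, (type, u_id) alone is not unique, so the timestamp is pulled into the key here:

CREATE TABLE combined (
    type   CHAR(1)   NOT NULL,  -- which source table the row came from
    u_id   INT       NOT NULL,
    x      INT,
    y      INT,
    `time` TIMESTAMP NOT NULL,
    PRIMARY KEY (type, u_id, `time`)
);

INSERT INTO combined SELECT 'a', u_id, x, y, `time` FROM src_a;
INSERT INTO combined SELECT 'b', u_id, x, y, `time` FROM src_b;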