I'm working with a logfile that records data every 10 minutes. I'm trying to come up with a query that can verify that data was in fact saved every 10 minutes.
Here are some sample timestamps:
2008-01-01 00:00:00
2008-01-01 00:10:00
2008-01-01 00:20:00
2008-01-01 00:30:00
Any ideas on this? I'd give some SQL if I thought it could be improved to be correct but I don't have anything worth posting.
One trick is to build a virtual table against which you can join the data from your logfile. In my example I've used the Postgres generate_series function to generate a series of second offsets to add to an initial timestamp (I assume there is a similar function in MySQL?).
The idea is to left join the virtual table to the actual data and look for rows where the logging table has no match (i.e., where logger.timestamp is NULL).
Something along these lines will show you where there is a missing timestamp, if any. Note that the example steps by 10 seconds; for data logged every 10 minutes you would use a step of 600.
SELECT
y.c
, logger.timestamp
FROM
(SELECT
a + cast(b || ' sec' as interval) as c
FROM
(SELECT
cast('2011-10-31 10:00:00' as timestamp) as a
,t.b from generate_series(0,100,10) as t(b)
)x
) y
LEFT JOIN (
SELECT timestamp from log
) logger ON y.c = logger.timestamp
WHERE
logger.timestamp IS NULL;
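The same left-join idea can be sketched without generate_series. The following is a minimal illustration, assuming SQLite (through Python's sqlite3 module) rather than Postgres or MySQL; the table name log, the column ts, the helper find_missing and the sample rows are all hypothetical. A recursive CTE plays the role of the virtual table.

```python
import sqlite3

# Hypothetical sample: the 00:20:00 entry is missing from the log.
LOGGED = [
    "2008-01-01 00:00:00",
    "2008-01-01 00:10:00",
    "2008-01-01 00:30:00",
]


def find_missing(logged, start, end, step_minutes=10):
    """Return expected timestamps in [start, end] that are absent from the log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO log (ts) VALUES (?)", [(t,) for t in logged])
    rows = conn.execute(
        """
        WITH RECURSIVE expected(ts) AS (
            SELECT ?                      -- first expected timestamp
            UNION ALL
            SELECT datetime(ts, ?)        -- add one step per iteration
            FROM expected
            WHERE ts < ?                  -- stop once the end is reached
        )
        SELECT expected.ts
        FROM expected
        LEFT JOIN log ON log.ts = expected.ts
        WHERE log.ts IS NULL              -- no matching log row: a gap
        ORDER BY expected.ts
        """,
        (start, f"+{step_minutes} minutes", end),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


missing = find_missing(LOGGED, "2008-01-01 00:00:00", "2008-01-01 00:30:00")
print(missing)
```

In MySQL 8 a recursive CTE of the same shape should be usable in place of generate_series, with the date arithmetic adapted to that dialect.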
Related
I have three columns User_ID, New_Status and DATETIME.
New_Status contains 0(inactive) and 1(active) for users.
Every user starts from active status - ie. 1.
Subsequently table stores their status and datetime at which they got activated/inactivated.
How can I calculate the number of active users at the end of each date, including dates when no records were written to the table?
Sample data:
| ID | New_Status | DATETIME |
+----+------------+---------------------+
| 1 | 1 | 2019-01-01 21:00:00 |
| 1 | 0 | 2019-02-05 17:00:00 |
| 1 | 1 | 2019-03-06 18:00:00 |
| 2 | 1 | 2019-01-02 01:00:00 |
| 2 | 0 | 2019-02-03 13:00:00 |
Format the datetime value as a date-only string and group by it:
SELECT DATE_FORMAT(DATETIME, '%Y-%m-%d') as day, COUNT(*) as active
FROM test
WHERE New_Status = 1
GROUP BY day
ORDER BY day
In MySQL 8 you can use the row_number() window function to get the last status of a user per day. Then filter for the rows that indicate the user was active, GROUP BY the day and count them.
SELECT date(x.datetime),
count(*)
FROM (SELECT date(t.datetime) datetime,
t.new_status,
row_number() OVER (PARTITION BY t.user_id, date(t.datetime)
ORDER BY t.datetime DESC) rn
FROM elbat t) x
WHERE x.rn = 1
AND x.new_status = 1
GROUP BY x.datetime;
If not all days are in the table you need to create a (possibly derived) table with all days and cross join it.
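As a rough sketch of the "table with all days" idea, the example below builds the calendar with a recursive CTE, cross joins it with the distinct users, and looks up each user's most recent status on or before each day. It assumes SQLite via Python's sqlite3 module rather than MySQL, and the names status_log, user_id, new_status and changed_at are placeholders for the columns in the question.

```python
import sqlite3

# Sample rows from the question: (user_id, new_status, changed_at).
ROWS = [
    (1, 1, "2019-01-01 21:00:00"),
    (1, 0, "2019-02-05 17:00:00"),
    (1, 1, "2019-03-06 18:00:00"),
    (2, 1, "2019-01-02 01:00:00"),
    (2, 0, "2019-02-03 13:00:00"),
]

QUERY = """
WITH RECURSIVE days(day) AS (
    -- calendar: every date between the first and the last record
    SELECT MIN(date(changed_at)) FROM status_log
    UNION ALL
    SELECT date(day, '+1 day') FROM days
    WHERE day < (SELECT MAX(date(changed_at)) FROM status_log)
),
users AS (SELECT DISTINCT user_id FROM status_log),
last_status AS (
    -- for each (day, user): the latest status recorded up to that day
    SELECT days.day AS day,
           (SELECT s.new_status
            FROM status_log s
            WHERE s.user_id = users.user_id
              AND date(s.changed_at) <= days.day
            ORDER BY s.changed_at DESC
            LIMIT 1) AS status
    FROM days CROSS JOIN users
)
SELECT day, COALESCE(SUM(status), 0) AS active_users
FROM last_status
GROUP BY day
ORDER BY day
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_log (user_id INTEGER, new_status INTEGER, changed_at TEXT)"
)
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?)", ROWS)
active_by_day = dict(conn.execute(QUERY).fetchall())
conn.close()

for day in ("2019-01-01", "2019-01-02", "2019-02-03", "2019-02-05", "2019-03-06"):
    print(day, active_by_day[day])
```

Users with no record yet on a given day contribute NULL, which SUM ignores, so they are not counted as active before their first record.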
Find out the last activity status of users whose activity was changed for each day
select User_ID, New_Status, DATE_FORMAT(DATETIME, '%Y-%m-%d')
from activity_table
where not exists
(
select 1
from activity_table at
where at.User_ID = activity_table.User_ID and
DATE_FORMAT(at.DATETIME, '%Y-%m-%d') = DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d') and
at.DATETIME > activity_table.DATETIME
)
order by DATE_FORMAT(activity_table.DATETIME, '%Y-%m-%d');
This is not the solution yet, but it is very useful information to have before building the solution. Note that not all dates are covered yet, and the values are individual records (more precisely, each user's last value on each day), ordered by date.
Let's get aggregate numbers
Using the query above as an aliased subselect, you can group by the date and select sum(New_Status) as activity, count(*) as total, and the date. Then activity - (total - activity) is the difference compared with the previous day.
Knowing the delta for each day present in the result
In the previous section we saw how the delta can be calculated. If the whole query from the previous section is aliased, you can self join it using a left join on pairs of (previous date, current date). The gaps between dates are still there, but we will not worry about them just yet. For the first date, its activity is the delta. For subsequent records, adding the previous day's result to their delta yields the result you need. To achieve this you can use a recursive query, supported by MySQL 8, or, alternatively, a subquery that sums the deltas of the previous days (with special attention to the first date, as described earlier); adding the current date's delta to that sum yields the result we need.
Fill the gaps
The previous section would already work perfectly (barring integrity problems) if there were activity changes on every day, but we will not rely on that assumption. We know that the figures are correct for each date where a figure is present, so we only need to add the missing dates to the result. If the results are properly ordered, as they should be, you can use a cursor and loop over them. At each record after the first one, we can determine which dates are missing. There may be zero or more such dates between two consecutive records. What we do know about the gaps is that their values are exactly the same as those of the previous record that does have data: if there were no activity changes on a given date, the number of active users is exactly the same as on the previous day. Using some structure, such as a table, you can generate the final results with the knowledge described here.
Solving possible integrity problems
There are several possibilities for such problems:
First, a data item might have existed before records started being written to this table.
Second, bugs or other causes might have paused the creation of records in this activity table.
Third, adding a user does not, or did not, necessarily generate an activity change: since the user pops into existence, its previous state of activity is undefined and subject to human conventions, which might change over time.
Fourth, removing a user does not, or did not, necessarily generate an activity change: since the user pops out of existence, its current state of activity is undefined and subject to human conventions, which might change over time.
Fifth, there is an infinity of other issues which might cause data integrity issues.
To cope with these you will need to comprehensively analyze whatever you can from the source-code and the history of the project, including database records, logs and humanly available information to detect such anomalies, the time they were effective and figure out what their solution is if they exist.
EDIT
In the meantime I was thinking about the possibility of a user, who was active at the start of the day being deactivated and then activated again by the end of the day. Similarly, an inactive user during a day might be activated and then finally deactivated by the end of the day. For users that have more than an activation at the start of the day, we need to compare their activity status at the start and the end of the day to find out what the difference was.
SELECT
DATE(DATETIME),
COUNT(*)
FROM your_table
WHERE New_Status = 1
GROUP BY User_ID,
DATE(DATETIME)
For MySQL
WITH RECURSIVE
cte AS (
SELECT MIN(DATE(DT)) dt
FROM src
UNION ALL
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt < ( SELECT MAX(DATE(DT)) dt
FROM src )
),
cte2 AS
(
SELECT users.id,
cte.dt,
SUM( CASE src.New_Status WHEN 1 THEN 1
WHEN 0 THEN -1
ELSE 0
END ) OVER ( PARTITION BY users.id
ORDER BY cte.dt ) status
FROM cte
CROSS JOIN ( SELECT DISTINCT id
FROM src ) users
LEFT JOIN src ON src.id = users.id
AND DATE(src.dt) = cte.dt
)
SELECT dt, SUM(status)
FROM cte2
GROUP BY dt;
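Do not forget to adjust the maximum recursion depth (the cte_max_recursion_depth system variable, 1000 by default) if the date range spans more days than that.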
Here is what I believe is a good solution for this problem of yours:
SELECT SUM(New_Status) "Number of active users"
, DATE_FORMAT(DATEC, '%Y-%m-%d') "Date"
FROM TEST T1
WHERE DATE_FORMAT(DATEC,'%H:%i:%s') =
(SELECT MAX(DATE_FORMAT(T2.DATEC,'%H:%i:%s'))
FROM TEST T2
WHERE T2.ID = T1.ID
AND DATE_FORMAT(T1.DATEC, '%Y-%m-%d') = DATE_FORMAT(T2.DATEC, '%Y-%m-%d')
GROUP BY ID
, DATE_FORMAT(DATEC, '%Y-%m-%d'))
GROUP BY DATE_FORMAT(DATEC, '%Y-%m-%d');
Here is the DEMO
I am using MySQL 8 and need to create a stored procedure.
I have a single table that has a DATE field and a value field which can be 0 or any other number. This value field represents the daily amount of rain for that day.
The table stores data from today up to 10 years from now.
I need to find out how many periods of rain there will be in the next 10 years.
So, for example, if my table contains the following data:
Date - Value
2018-06-09 - 0
2018-06-10 - 50
2018-06-11 - 0
2018-06-12 - 15
2018-06-13 - 17
2018-06-14 - 0
2018-06-15 - 0
2018-06-16 - 12
2018-06-17 - 123
2018-06-18 - 17
Then the SP should return 3, because there were 3 periods of rain.
Any help in getting me closer to the answer will be appreciated!
You don't need to have a stored procedure for this.
Here is a solution using MySQL 8.0's LEAD function; it also supports dates with gaps.
The complete table needs to be scanned, but I don't think that is a huge problem with ~3650 records.
Query
SELECT
SUM(filter_match = 1) AS number
FROM (
SELECT
((t.value = 0) AND (LEAD(t.value) OVER (ORDER BY t.date ASC) != 0)) AS filter_match
FROM
t
) t
see demo https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/2
By the way, would you mind expanding your answer to explain how LEAD and SUM work together?
LEAD(t.value) OVER (ORDER BY t.date ASC) simply means get the next value from the next record ordered by date.
this demo shows it nicely https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/6
SUM(filter_match = 1) is a conditional sum. In this case the alias filter_match needs to be true.
This demo shows what filter_match is: https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/8
In MySQL, an aggregate function can take a SQL expression as its argument, such as 1 = 1 (which is always true, i.e. 1) or 1 = 0 (which is always false, i.e. 0).
The conditional sum only sums up when the condition is true.
see demo https://www.db-fiddle.com/f/sev4NqgLsFPgtNgwzruwy/7
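To make the interplay of LEAD and the conditional sum concrete, here is a small sketch that runs the same logic on the sample data from the question. It assumes SQLite (3.25 or newer, via Python's sqlite3 module) instead of MySQL; like MySQL, SQLite evaluates a comparison to 1 or 0, so the SUM behaves the same way. The table name rain and its columns are hypothetical.

```python
import sqlite3

# Sample data from the question: (day, rainfall).
RAIN = [
    ("2018-06-09", 0), ("2018-06-10", 50), ("2018-06-11", 0),
    ("2018-06-12", 15), ("2018-06-13", 17), ("2018-06-14", 0),
    ("2018-06-15", 0), ("2018-06-16", 12), ("2018-06-17", 123),
    ("2018-06-18", 17),
]

# Inner query: flag every dry day that is immediately followed by a wet day.
FLAGGED = """
SELECT day,
       (value = 0 AND LEAD(value) OVER (ORDER BY day) != 0) AS filter_match
FROM rain
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rain (day TEXT PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO rain VALUES (?, ?)", RAIN)

# One flag per row, in date order, so the effect of LEAD is visible.
flags = [row[1] for row in conn.execute(FLAGGED + " ORDER BY day")]

# Outer query: the conditional sum simply adds up the 1s.
periods = conn.execute(
    f"SELECT SUM(filter_match = 1) FROM ({FLAGGED}) AS flagged"
).fetchone()[0]
conn.close()

print(flags)
print(periods)
```

Each flagged row marks the dry day right before a rain period starts, so the sum counts period starts. A rain period that begins on the very first row of the table has no preceding dry day and would not be counted by this approach.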
Use MySQL join:
SELECT COUNT(*) Number_of_Periods
FROM yourTable A JOIN yourTable B
ON DATE(A.`DATE`)=DATE(B.`DATE` - INTERVAL 1 DAY)
AND A.`VALUE`=0 AND B.`VALUE`>0;
See Demo on DB Fiddle.
I am using Graph Reports with the select below. The MySQL database only contains records for periods with activity, so if there are no records between hour X and hour Y the select returns nothing for that interval. In my case, I need the select to return zero values for PayPal as well, even when there was no activity in the database. I do not understand how to use UNION, or how to rewrite the select, to get zero values when nothing was recorded in a time interval. Could you please help?
select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(*) as `Active Paid Accounts`
from radacct_history where `paymentmethod` = 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H')
When I run the select the output is:
Current Output
But if there are no values between 2016-07-27 07:00:00 and 2016-07-28 11:00:00, I need it to show zero active accounts for every hour, like this:
Needed output with no values every hour
I have created the select below, but it does not add a zero value for every hour as I need. It still shows a big gap between 12 Sep and 13 Sep, where there should be a zero value for every hour.
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(paymentmethod) as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H'))
union ALL
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', 0 as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H')) ;
I guess you want to return 0 if there are no matching rows in MySQL. Here is an example:
(SELECT Col1,Col2,Col3 FROM ExampleTable WHERE ID='1234')
UNION (SELECT 'Def Val' AS Col1,'none' AS Col2,'' AS Col3) LIMIT 1;
Updated the post: judging by the expected output, you are trying to retrieve dates that aren't present in the table. In this case you have to maintain a date table in order to show the dates that aren't in the table. It's a little bit tricky; please refer to: SQL query that returns all dates not used in a table
You need an artificial table with all necessary time intervals. E.g. if you need daily data, create a table and add all day dates, e.g. starting from 1970 until 2100.
Then you can LEFT JOIN your radacct_history to that table, so that each desired interval has a group item (the GROUP BY should be based on the intervals table).
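A minimal sketch of that approach for hourly intervals follows. It assumes SQLite via Python's sqlite3 module rather than MySQL, generates the intervals with a recursive CTE instead of a permanent table, and uses made-up sample rows; only the names radacct_history, acctstarttime and paymentmethod come from the question.

```python
import sqlite3

# Hypothetical sample rows: (acctstarttime, paymentmethod).
SESSIONS = [
    ("2016-07-27 07:15:00", "PayPal"),
    ("2016-07-27 07:40:00", "PayPal"),
    ("2016-07-27 08:05:00", "Card"),
    ("2016-07-27 10:30:00", "PayPal"),
]

QUERY = """
WITH RECURSIVE hours(hour_start) AS (
    SELECT ?                                   -- first hour of the report
    UNION ALL
    SELECT datetime(hour_start, '+1 hour')
    FROM hours
    WHERE hour_start < ?                       -- last hour of the report
)
SELECT hours.hour_start,
       COUNT(r.acctstarttime) AS active_paid_accounts  -- counts matches only
FROM hours
LEFT JOIN radacct_history r
       ON strftime('%Y-%m-%d %H:00:00', r.acctstarttime) = hours.hour_start
      AND r.paymentmethod = 'PayPal'           -- filter in ON, not WHERE
GROUP BY hours.hour_start
ORDER BY hours.hour_start
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radacct_history (acctstarttime TEXT, paymentmethod TEXT)")
conn.executemany("INSERT INTO radacct_history VALUES (?, ?)", SESSIONS)
counts = conn.execute(
    QUERY, ("2016-07-27 07:00:00", "2016-07-27 10:00:00")
).fetchall()
conn.close()

for hour_start, n in counts:
    print(hour_start, n)
```

Two details matter here: the payment-method filter sits in the ON clause so that hours without PayPal rows survive the join, and COUNT is applied to a column of the joined table so that unmatched hours yield 0 rather than 1.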
I've been trying to work this one out for a while now, maybe my problem is coming up with the correct search query. I'm not sure.
Anyway, the problem I'm having is that I have a table of data that has a new row added every second (imagine the structure {id, timestamp(datetime), value}). I would like to do a single query for MySQL to go through the table and output only the first value of each minute.
I thought about doing this with multiple queries with LIMIT and datetime >= (beginning of minute) but with the volume of data I'm collecting that is a lot of queries so it would be nicer to produce the data in a single query.
Sample data:
id datetime value
1 2015-01-01 00:00:00 128
2 2015-01-01 00:00:01 127
3 2015-01-01 00:00:04 129
4 2015-01-01 00:00:05 127
...
67 2015-01-01 00:00:59 112
68 2015-01-01 00:01:12 108
69 2015-01-01 00:01:13 109
Where I would want the result to select the rows:
1 2015-01-01 00:00:00 128
68 2015-01-01 00:01:12 108
Any ideas?
Thanks!
EDIT: Forgot to add, the data, whilst every second, is not reliably on the first second of every minute - it may be :30 or :01 rather than :00 seconds past the minute
EDIT 2: A nice-to-have (definitely not required for answer) would be a query that is flexible to also take an arbitrary number of minutes (rather than one row each minute)
SELECT t2.* FROM
( SELECT MIN(`datetime`) AS dt
FROM tbl
GROUP BY DATE_FORMAT(`datetime`,'%Y-%m-%d %H:%i')
) t1
JOIN tbl t2 ON t1.dt = t2.`datetime`
SQLFiddle
Or
SELECT *
FROM tbl
WHERE dt IN ( SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i'))
SQLFiddle
SELECT t1.*
FROM tbl t1
LEFT JOIN (
SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i')
) t2 ON t1.dt = t2.dt
WHERE t2.dt IS NOT NULL
SQLFiddle
In MS SQL Server I would use CROSS APPLY, but as far as I know MySQL doesn't have it, so we can emulate it.
Make sure that you have an index on your datetime column.
Create a table of numbers, or in your case a table of minutes. If you have a table of numbers starting from 1 it is trivial to turn it into minutes in the necessary range.
SELECT
tbl.ID
,tbl.`dt`
,tbl.value
FROM
(
SELECT
MinuteValue
, (
SELECT tbl.id
FROM tbl
WHERE tbl.`dt` >= Minutes.MinuteValue
ORDER BY tbl.`dt`
LIMIT 1
) AS ID
FROM Minutes
) AS IDs
INNER JOIN tbl ON tbl.ID = IDs.ID
For each minute, find the first row whose timestamp is greater than or equal to the start of that minute. I don't know how to return the full row, rather than a single column, from the nested SELECT in MySQL, so I first build a derived table with two columns (the minute and the id from the original table) and then explicitly look up the rows in the original table by their IDs.
SQL Fiddle
I've created a table of Minutes in the SQL Fiddle with the necessary values to make example simple. In real life you would have a more generic table.
Here is SQL Fiddle that uses a table of numbers, just for illustration.
In any case, you do need to know in advance somehow the range of dates/numbers you are interested in.
It is trivial to make it work for any interval of minutes. If you need results every 5 minutes, just generate a table of minutes that has values not every 1 minute, but every 5 minutes. The main query would remain the same.
It may be more efficient, because here you don't join the big table to itself and you don't make calculations on the datetime column, so the server should be able to use the index on it.
The example that I made assumes that for each minute there is at least one row in the big table. If it is possible that some minutes don't have any data at all, you'd need to add an extra check in the WHERE clause to make sure that the found row is still within that minute.
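The extra check can be sketched as an upper bound on the correlated lookup, which also makes the bucket width configurable. This is an illustration under assumptions: it uses SQLite via Python's sqlite3 module rather than MySQL, the helper first_row_per_bucket is hypothetical, and the sample rows are invented so that one minute (00:02) has no data.

```python
import sqlite3

# Hypothetical sample rows: (id, dt, value); minute 00:02 has no data.
ROWS = [
    (1, "2015-01-01 00:00:00", 128),
    (2, "2015-01-01 00:00:01", 127),
    (3, "2015-01-01 00:00:59", 112),
    (4, "2015-01-01 00:01:12", 108),
    (5, "2015-01-01 00:01:13", 109),
    (6, "2015-01-01 00:03:30", 99),
]


def first_row_per_bucket(rows, bucket_starts, bucket_minutes=1):
    """Return the id of the first row inside each bucket that has any rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT, value INTEGER)")
    conn.execute("CREATE INDEX idx_tbl_dt ON tbl (dt)")
    conn.execute("CREATE TABLE minutes (minute_start TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO minutes VALUES (?)", [(s,) for s in bucket_starts])
    result = conn.execute(
        """
        SELECT t.id
        FROM minutes m
        JOIN tbl t ON t.id = (
            SELECT id
            FROM tbl
            WHERE dt >= m.minute_start
              AND dt < datetime(m.minute_start, ?)  -- stay inside the bucket
            ORDER BY dt
            LIMIT 1
        )
        ORDER BY m.minute_start
        """,
        (f"+{bucket_minutes} minutes",),
    ).fetchall()
    conn.close()
    return [r[0] for r in result]


one_minute = [f"2015-01-01 00:0{m}:00" for m in range(4)]
first_ids = first_row_per_bucket(ROWS, one_minute)
print(first_ids)
```

Without the upper bound, the empty minute 00:02 would pick up the 00:03:30 row and that row would appear twice in the result.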
select * from table where timestamp LIKE "%-%-% %:%:00" could work.
This is similar to this question: Stack Overflow Date SQL Query Question
Edit: This probably would work better:
select *, date_format(timestamp, '%Y-%m-%d %H:%i') as the_minute, count(*)
from table
group by the_minute
order by the_minute
Similar to this question here: mysql select date format
I'm not really sure, but you could try this:
SELECT MIN(timestamp) FROM table WHERE YEAR(timestamp)=2015 GROUP BY DATE(timestamp), HOUR(timestamp), MINUTE(timestamp)
I'm reasonably new to Access and having trouble solving what should be (I hope) a simple problem - think I may be looking at it through Excel goggles.
I have a table named importedData into which I (not so surprisingly) import a log file each day. This log file is from a simple data-logging application on some mining equipment, and essentially it saves a timestamp and status for the point at which the current activity changes to a new activity.
A sample of the data looks like this:
This information is then filtered using a query to define the range I want to see information for, say from 29/11/2013 06:00:00 AM until 29/11/2013 06:00:00 PM
Now the object of this is to take a status entry's timestamp and get the time difference between it and the record on the subsequent row of the query results. As the equipment works for a 12hr shift, I should then be able to build a picture of how much time the equipment spent doing each activity during that shift.
In the above example, the equipment was in status "START_SHIFT" for 00:01:00, in status "DELAY_WAIT_PIT" for 06:08:26 and so-on. I would then build a unique list of the status entries for the period selected, and sum the total time for each status to get my shift summary.
You can use a correlated subquery to fetch the next timestamp for each row.
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#;
Then you can use that query as a subquery in another query where you compute the duration between timestamp and next_timestamp. And then use that entire new query as a subquery in a third where you GROUP BY status and compute the total duration for each status.
Here's my version which I tested in Access 2007 ...
SELECT
sub2.status,
Format(Sum(Nz(sub2.duration,0)), 'hh:nn:ss') AS SumOfduration
FROM
(
SELECT
sub1.status,
(sub1.next_timestamp - sub1.timestamp) AS duration
FROM
(
SELECT
i.status,
i.timestamp,
(
SELECT Min([timestamp])
FROM importedData
WHERE [timestamp] > i.timestamp
) AS next_timestamp
FROM importedData AS i
WHERE i.timestamp BETWEEN #2013-11-29 06:00:00#
AND #2013-11-29 18:00:00#
) AS sub1
) AS sub2
GROUP BY sub2.status;
If you run into trouble or need to modify it, break out the innermost subquery, sub1, and test that by itself. Then do the same for sub2. I suspect you will want to change the WHERE clause to use parameters instead of hard-coded times.
Note the query Format expression would not be appropriate if your durations exceed 24 hours. Here is an Immediate window session which illustrates the problem ...
' duration greater than one day:
? #2013-11-30 02:00# - #2013-11-29 01:00#
1.04166666667152
' this Format() makes the 25 hr. duration appear as 1 hr.:
? Format(#2013-11-30 02:00# - #2013-11-29 01:00#, "hh:nn:ss")
01:00:00
However, if you're dealing exclusively with data from 12 hr. shifts, this should not be a problem. Keep it in mind in case you ever need to analyze data which spans more than 24 hrs.
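If durations longer than a day do come up, one way around the limitation is to do the arithmetic by hand instead of relying on a time-of-day format. The sketch below illustrates that arithmetic in Python rather than in Access; format_duration is a hypothetical helper, and it assumes the duration is expressed as a fraction of days, which is what subtracting two Access Date/Time values yields.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def format_duration(days_fraction):
    """Format a duration given in days as hh:mm:ss without wrapping at 24 hours."""
    total_seconds = round(days_fraction * SECONDS_PER_DAY)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# The 25-hour difference from the Immediate window session above.
print(format_duration(1.04166666667152))
```

The same steps (convert days to seconds, then split into hours, minutes and seconds) could be written as an Access expression or a small VBA function.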
If subqueries are unfamiliar, see Allen Browne's page: Subquery basics. He discusses correlated subqueries in the section titled Get the value in another record.