MySQL: Determine periods of missing data with a query

I have a database that's set up like this:
Table: Historical
-CID INT, UNIQUE, AUTO_INCREMENT, NOT NULL
-ID INT, PRIMARY KEY
-Location VARCHAR(255)
-Status VARCHAR(255)
-Time DATETIME
So an entry might look like this (CID | ID | Location | Status | Time):
433275 | 97 | MyLocation | OK | 2013-08-20 13:05:54
My question is: if I'm expecting 5-minute interval data from each of my sites, how can I determine how long a site has been down?
For example, if MyLocation didn't send its 5-minute interval data from 13:05:54 until 14:05:54, it would have missed 60 minutes' worth of intervals. How could I find this downtime and report on it easily?

*Disclaimer: I'm assuming that your time column determines the order of the entries in your table, and that you can't easily (and without a heavy performance loss) self-join the table on the auto_increment column, since it can contain gaps.*
Either create a table containing only the expected datetime values and do an anti-join like this:
SELECT d.datetimevalue
FROM datetime_table d
LEFT JOIN your_table y ON DATE_FORMAT(d.datetimevalue, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(y.`time`, '%Y-%m-%d %H:%i:00')
WHERE y.CID IS NULL
(The DATE_FORMAT() function is used here to strip the seconds from the datetime values, so both sides are compared to the minute.)
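The calendar-table idea can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, generates the expected 5-minute slots with a recursive CTE instead of a permanent datetime table, and range-joins each slot (rather than matching to the minute) so any reading inside the slot counts. The table layout, the missing_slots() and build_demo() helpers, and the sample times are hypothetical.

```python
import sqlite3

# Hypothetical demo schema modeled on the question's Historical table;
# SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE historical (
    cid INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER NOT NULL,
    location TEXT,
    status TEXT,
    time TEXT NOT NULL  -- 'YYYY-MM-DD HH:MM:SS' strings compare correctly as text
)
"""

# One row per expected 5-minute slot, LEFT JOINed to the readings;
# slots with no reading inside [slot, slot + 5 minutes) come back with NULLs.
MISSING_SLOTS_SQL = """
WITH RECURSIVE slots(slot) AS (
    SELECT datetime(:start)
    UNION ALL
    SELECT datetime(slot, '+5 minutes') FROM slots
    WHERE datetime(slot, '+5 minutes') < datetime(:end)
)
SELECT s.slot
FROM slots AS s
LEFT JOIN historical AS h
  ON h.id = :site
 AND h.time >= s.slot
 AND h.time < datetime(s.slot, '+5 minutes')
WHERE h.cid IS NULL
ORDER BY s.slot
"""


def missing_slots(conn, site_id, start, end):
    """Hypothetical helper: start of every 5-minute slot in [start, end) with no reading."""
    params = {"site": site_id, "start": start, "end": end}
    return [row[0] for row in conn.execute(MISSING_SLOTS_SQL, params)]


def build_demo():
    """Hypothetical sample data: site 97 goes silent between 13:11 and 13:41."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    times = ["13:01:10", "13:06:10", "13:11:10",
             "13:41:10", "13:46:10", "13:51:10", "13:56:10"]
    conn.executemany(
        "INSERT INTO historical (id, location, status, time) VALUES (97, 'MyLocation', 'OK', ?)",
        [("2013-08-20 " + t,) for t in times],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(missing_slots(demo, 97, "2013-08-20 13:00:00", "2013-08-20 14:00:00"))
```

Each missing slot is one 5-minute interval of downtime, so multiplying the count by 5 gives the minutes missed.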
Or use user-defined variables:
SELECT * FROM (
SELECT
y.*,
TIMESTAMPDIFF(MINUTE, @prevDT, `Time`) AS timedifference,
@prevDT := `Time`
FROM your_table y,
(SELECT @prevDT := (SELECT MIN(`Time`) FROM your_table)) vars
ORDER BY `Time`
) sq
WHERE timedifference > 5
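For comparison, MySQL 8.0 and later support window functions, which avoid relying on the evaluation order of user variables (which MySQL documents as undefined). The sketch below swaps in LAG() and runs it through Python's sqlite3 module as a stand-in for MySQL, using strftime('%s', ...) where MySQL would use TIMESTAMPDIFF(). Unlike the user-variable query, it partitions by ID, so each site is compared only with its own previous reading. The find_gaps() and build_demo() helpers and the sample rows are hypothetical.

```python
import sqlite3

# LAG() fetches the previous reading for the same site, ordered by time.
# Needs SQLite 3.25+; MySQL 8.0+ has the same window function.
GAPS_SQL = """
SELECT id, prev_time, time,
       (strftime('%s', time) - strftime('%s', prev_time)) / 60 AS gap_minutes
FROM (
    SELECT id, time,
           LAG(time) OVER (PARTITION BY id ORDER BY time) AS prev_time
    FROM historical
) AS t
WHERE prev_time IS NOT NULL
  AND strftime('%s', time) - strftime('%s', prev_time) > :max_gap_seconds
ORDER BY id, time
"""


def find_gaps(conn, max_gap_seconds=300):
    """Hypothetical helper: (id, previous time, time, whole minutes) per gap."""
    return conn.execute(GAPS_SQL, {"max_gap_seconds": max_gap_seconds}).fetchall()


def build_demo():
    """Hypothetical sample data based on the question's example outage."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE historical (cid INTEGER PRIMARY KEY, id INTEGER, time TEXT)")
    rows = [
        (97, "2013-08-20 13:00:54"),
        (97, "2013-08-20 13:05:54"),
        (97, "2013-08-20 14:05:54"),  # 60 minutes after the previous reading
        (97, "2013-08-20 14:10:54"),
        (98, "2013-08-20 13:00:00"),
        (98, "2013-08-20 13:05:00"),
    ]
    conn.executemany("INSERT INTO historical (id, time) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    for gap in find_gaps(build_demo()):
        print(gap)
```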
EDIT: I thought you wanted to scan the whole table (or parts of it) for rows where the time difference to the previous row is greater than 5 minutes. To check a specific ID (still under the same assumptions as in the disclaimer), you'd have to take a different approach:
SELECT
TIMESTAMPDIFF(MINUTE, (SELECT `Time` FROM your_table sy WHERE sy.ID < y.ID ORDER BY ID DESC LIMIT 1), `Time`) AS timedifference
FROM your_table y
WHERE ID = whatever
EDIT 2:
When you say "if the ID is currently down", is there already an entry in your table or not? If not, you can simply check how many minutes have passed since the last entry via
SELECT TIMESTAMPDIFF(MINUTE, (SELECT MAX(`Time`) FROM your_table WHERE ID = whatever), NOW());

So I assume you are going to have some sort of cron job running to check this table. If that is the case, you can simply find the highest time value for each ID/location and compare it against the current time to flag any IDs whose most recent time is older than the specified threshold. You can do that like this:
SELECT id, location, MAX(time) AS most_recent_time
FROM Historical
GROUP BY id, location
HAVING most_recent_time < DATE_SUB(NOW(), INTERVAL 5 MINUTE)
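A cron-style check is easiest to test when "now" is a parameter. The sketch below runs the same GROUP BY ... HAVING idea through Python's sqlite3 module as a stand-in for MySQL, with datetime(:now, '-5 minutes') in place of DATE_SUB(NOW(), INTERVAL 5 MINUTE), and also reports how many whole minutes each flagged site has been silent. The stale_sites() and build_demo() helpers, the second location, and the sample rows are hypothetical.

```python
import sqlite3

# One row per site whose newest reading is older than the threshold.
# :now is passed in explicitly so the check is repeatable (MySQL would use NOW()).
STALE_SQL = """
SELECT id, location, MAX(time) AS most_recent_time,
       (strftime('%s', :now) - strftime('%s', MAX(time))) / 60 AS minutes_silent
FROM historical
GROUP BY id, location
HAVING MAX(time) < datetime(:now, '-' || :threshold_minutes || ' minutes')
ORDER BY id
"""


def stale_sites(conn, now, threshold_minutes=5):
    """Hypothetical helper: (id, location, last reading, minutes since it)."""
    params = {"now": now, "threshold_minutes": threshold_minutes}
    return conn.execute(STALE_SQL, params).fetchall()


def build_demo():
    """Hypothetical sample data for two sites."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE historical (cid INTEGER PRIMARY KEY, id INTEGER, location TEXT, time TEXT)"
    )
    conn.executemany(
        "INSERT INTO historical (id, location, time) VALUES (?, ?, ?)",
        [
            (97, "MyLocation", "2013-08-20 14:05:54"),
            (97, "MyLocation", "2013-08-20 14:10:54"),
            (98, "OtherLocation", "2013-08-20 14:18:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(stale_sites(build_demo(), "2013-08-20 14:20:00"))
```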

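Something like this, which pairs each row with the earliest later row (by CID) for the same ID and reports every row that is more than 300 seconds old and is followed either by no later row or by one that arrived more than 300 seconds later: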
SELECT h1.ID, h1.location, h1.time, min(h2.time)
FROM Historical h1 LEFT JOIN Historical h2
ON (h1.ID = h2.ID AND h2.CID > h1.CID)
WHERE now() > h1.time + INTERVAL 301 SECOND
GROUP BY h1.ID, h1.location, h1.time
HAVING min(h2.time) IS NULL
OR min(h2.time) > h1.time + INTERVAL 301 SECOND
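Here is the same self-join run against an in-memory SQLite database from Python, as a stand-in for MySQL: h1.time + INTERVAL 301 SECOND becomes datetime(h1.time, '+301 seconds'), and NOW() becomes a :now parameter. A NULL next_seen marks a site that is still down. The outages() and build_demo() helpers and the sample rows are hypothetical; like the original, this assumes CID increases in the same order as Time.

```python
import sqlite3

# Port of the h1/h2 self-join: MIN(h2.time) over rows with a higher CID for the
# same ID is the next reading; keep pairs more than 300 s apart, plus readings
# older than 300 s that have no successor yet (site still down).
OUTAGES_SQL = """
SELECT h1.id, h1.location, h1.time AS last_seen, MIN(h2.time) AS next_seen
FROM historical AS h1
LEFT JOIN historical AS h2
  ON h1.id = h2.id AND h2.cid > h1.cid
WHERE datetime(:now) > datetime(h1.time, '+301 seconds')
GROUP BY h1.id, h1.location, h1.time
HAVING MIN(h2.time) IS NULL
    OR MIN(h2.time) > datetime(h1.time, '+301 seconds')
ORDER BY h1.id, h1.time
"""


def outages(conn, now):
    """Hypothetical helper: (id, location, last_seen, next_seen or None)."""
    return conn.execute(OUTAGES_SQL, {"now": now}).fetchall()


def build_demo():
    """Hypothetical sample data: site 97 misses an hour, then stops reporting."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE historical (cid INTEGER PRIMARY KEY, id INTEGER, location TEXT, time TEXT)"
    )
    conn.executemany(
        "INSERT INTO historical VALUES (?, ?, ?, ?)",
        [
            (1, 97, "MyLocation", "2013-08-20 13:00:54"),
            (2, 97, "MyLocation", "2013-08-20 13:05:54"),
            (3, 97, "MyLocation", "2013-08-20 14:05:54"),
            (4, 97, "MyLocation", "2013-08-20 14:10:54"),
            (5, 98, "OtherLocation", "2013-08-20 14:10:00"),
            (6, 98, "OtherLocation", "2013-08-20 14:15:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in outages(build_demo(), "2013-08-20 14:20:00"):
        print(row)
```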

Related

MySQL SELECT all rows between date time with interval

I have a column in my SQL table called loggedTime, which is a DATETIME field. I want to select rows between two dates, startDate and endDate, at an interval that may be 5 minutes, 10 minutes, 1 hour, etc. I tried to write the query, but it reports a syntax error near INTERVAL, and I'm not sure what's wrong. If I remove INTERVAL 5 MINUTE, the query works fine, but I want to pass the interval along with the dates so that it selects all rows between the two dates at that interval.
Here is the SQL:
SELECT * FROM mytable WHERE loggedTime BETWEEN '2021-06-01' and '2021-06-03' INTERVAL 5 MINUTE
If you have a unique, consecutively increasing column such as ID, you can use an INNER JOIN like this, where b is the row immediately before a:
SELECT a.*
FROM mytable a
INNER JOIN mytable b
ON a.ID = b.ID + 1
WHERE a.loggedTime BETWEEN '2021-06-01' AND '2021-06-03'
AND TIMESTAMPDIFF(MINUTE, b.loggedTime, a.loggedTime) = 5;
If you don't have such a column in your table, use user-defined variables instead:
SELECT *
FROM (SELECT mt.*,
TIMESTAMPDIFF(MINUTE, @prevTS, `loggedTime`) AS timeinterval,
@prevTS := mt.`loggedTime`
FROM mytable mt,
(SELECT @prevTS := (SELECT MIN(`loggedTime`)
FROM mytable)) vars
ORDER BY `loggedTime`) subquery_alias
WHERE loggedTime BETWEEN '2021-06-01' AND '2021-06-03'
AND timeinterval = 5
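TIMESTAMPDIFF(unit, x, y) returns y minus x, so the earlier row must come first. The sketch below checks the consecutive-ID join from Python against SQLite as a stand-in for MySQL; integer division of the seconds difference by 60 truncates the same way TIMESTAMPDIFF(MINUTE, ...) does for positive gaps. The text comparison against '2021-06-03' excludes anything after midnight on June 3, just as the MySQL BETWEEN on a DATETIME column does. The rows_after_interval() and build_demo() helpers and the sample rows are hypothetical.

```python
import sqlite3

# Self-join each row (a) to the row whose id is one lower (b), i.e. the
# previous row, assuming ids are consecutive with no gaps.
STEP_SQL = """
SELECT a.id, a.loggedTime
FROM mytable AS a
JOIN mytable AS b ON a.id = b.id + 1
WHERE a.loggedTime BETWEEN :start AND :end
  AND (strftime('%s', a.loggedTime) - strftime('%s', b.loggedTime)) / 60 = :minutes
ORDER BY a.id
"""


def rows_after_interval(conn, start, end, minutes=5):
    """Hypothetical helper: rows logged exactly `minutes` after the previous row."""
    params = {"start": start, "end": end, "minutes": minutes}
    return conn.execute(STEP_SQL, params).fetchall()


def build_demo():
    """Hypothetical sample log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, loggedTime TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [
            (1, "2021-06-01 00:00:00"),
            (2, "2021-06-01 00:05:00"),  # 5 minutes after id 1
            (3, "2021-06-01 00:15:00"),  # 10 minutes after id 2
            (4, "2021-06-01 00:20:00"),  # 5 minutes after id 3
            (5, "2021-06-03 00:25:00"),  # after midnight on June 3: out of range
        ],
    )
    return conn


if __name__ == "__main__":
    print(rows_after_interval(build_demo(), "2021-06-01", "2021-06-03"))
```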

Rewrite SQL query to pad empty month rows

I have this query that I use to get statistics for blogs in our own tracking system.
I use a UNION over 2 tables, because we aggregate data daily into one table and keep today's data in another table.
I want to show the last 10 months of traffic. This query does that, but if there is no traffic in a specific month, that row is not in the result.
I have previously joined against a calendar table in MySQL to avoid that, but I'm simply not skilled enough to rewrite this query to join against that calendar table.
The calendar table has 1 field called "datefield", which is a date in YYYY-MM-DD format.
This is the current query I use:
SELECT FORMAT(SUM(`count`),0) as `count`, DATE(`date`) as `date`
FROM
(
SELECT count(distinct(uniq_id)) as `count`, `timestamp` as `date`
FROM tracking
WHERE `timestamp` > now() - INTERVAL 1 DAY AND target_bid = 92
group by `datestamp`
UNION ALL
select sum(`count`),`datestamp` as `date`
from aggregate_visits
where `datestamp` > now() - interval 10 month
and target_bid = 92
group by `datestamp`
) a
GROUP BY MONTH(date)
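Something like this? Select from the calendar table and LEFT JOIN the visit data, so that dates with no traffic still appear with a sum of 0:
select sum(COALESCE(t.`count`,0)), s.datefield as `date`
from DateTable s
LEFT JOIN (SELECT * FROM aggregate_visits
where `datestamp` > now() - interval 10 month
and target_bid = 92) t
ON (s.datefield = t.datestamp)
where s.datefield > now() - interval 10 month
and s.datefield <= CURDATE()
group by s.datefield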
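The LEFT JOIN from the calendar to the data is what produces the zero rows. The sketch below applies it at month granularity, using Python's sqlite3 as a stand-in for MySQL and a recursive CTE in place of the asker's calendar table (MySQL 8.0+ also supports WITH RECURSIVE). The monthly_visits() and build_demo() helpers, the fixed "today", and the sample rows are hypothetical, and only aggregate_visits is included, not today's tracking rows.

```python
import sqlite3

# The recursive CTE stands in for the calendar table: one row per month,
# from the current month back 9 more (10 months in total).
PADDED_MONTHS_SQL = """
WITH RECURSIVE months(month_start, n) AS (
    SELECT date(:today, 'start of month'), 0
    UNION ALL
    SELECT date(month_start, '-1 month'), n + 1 FROM months WHERE n < 9
)
SELECT strftime('%Y-%m', m.month_start) AS month,
       COALESCE(SUM(v."count"), 0) AS visits
FROM months AS m
LEFT JOIN aggregate_visits AS v
  ON v.target_bid = :bid
 AND v.datestamp >= m.month_start
 AND v.datestamp < date(m.month_start, '+1 month')
GROUP BY m.month_start
ORDER BY m.month_start
"""


def monthly_visits(conn, today, bid):
    """Hypothetical helper: (YYYY-MM, visits) for 10 months, zeros included."""
    return conn.execute(PADDED_MONTHS_SQL, {"today": today, "bid": bid}).fetchall()


def build_demo():
    """Hypothetical sample aggregates."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE aggregate_visits (datestamp TEXT, target_bid INTEGER, "count" INTEGER)')
    conn.executemany(
        "INSERT INTO aggregate_visits VALUES (?, ?, ?)",
        [
            ("2013-08-01", 92, 5),
            ("2013-08-15", 92, 3),
            ("2013-06-10", 92, 7),
            ("2013-07-01", 93, 100),  # different blog, must not be counted
            ("2012-10-31", 92, 50),   # older than the 10-month window
        ],
    )
    return conn


if __name__ == "__main__":
    for month, visits in monthly_visits(build_demo(), "2013-08-20", 92):
        print(month, visits)
```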

MySQL combine 2 different counts in one query

I have a table, that pretty much looks like this:
users (id INT, masterId INT, date DATETIME)
Every user has exactly one master. But masters can have n users.
Now I want to find out how many users each master has. I'm doing that this way:
SELECT `masterId`, COUNT(`id`) AS `total` FROM `users` GROUP BY `masterId` ORDER BY `total` DESC
But now I also want to know how many new users each master has gained in the last 14 days. I could do it with this query:
SELECT `masterId`, COUNT(`id`) AS `last14days` FROM `users` WHERE `date` > DATE_SUB(NOW(), INTERVAL 14 DAY) GROUP BY `masterId` ORDER BY `last14days` DESC
Now the question: Could I somehow get this information with one query, instead of using 2 queries?
You can use conditional aggregation, counting only the rows for which the condition is true. In standard SQL, this is done with a CASE expression inside the aggregate function:
SELECT
masterId,
COUNT(id) AS total,
SUM(CASE WHEN date > DATE_SUB(NOW(), INTERVAL 14 DAY) THEN 1 ELSE 0 END) AS last14days
FROM users
GROUP BY masterId
ORDER BY total DESC
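A quick way to confirm that one pass produces both counts is to run the query on a few rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, with datetime(:now, '-14 days') replacing DATE_SUB(NOW(), INTERVAL 14 DAY) and a fixed :now so the result is repeatable; master_counts(), build_demo(), and the sample users are hypothetical.

```python
import sqlite3

# COUNT() sees every row; the CASE inside SUM() adds 1 only for recent rows.
COUNTS_SQL = """
SELECT masterId,
       COUNT(id) AS total,
       SUM(CASE WHEN "date" > datetime(:now, '-14 days') THEN 1 ELSE 0 END) AS last14days
FROM users
GROUP BY masterId
ORDER BY total DESC
"""


def master_counts(conn, now):
    """Hypothetical helper: (masterId, total, last14days) rows."""
    return conn.execute(COUNTS_SQL, {"now": now}).fetchall()


def build_demo():
    """Hypothetical sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE users (id INTEGER, masterId INTEGER, "date" TEXT)')
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, 1, "2024-02-20 09:00:00"),
            (2, 1, "2024-01-01 09:00:00"),
            (3, 1, "2024-02-28 09:00:00"),
            (4, 2, "2023-12-01 09:00:00"),
            (5, 3, "2024-02-17 09:00:00"),
            (6, 3, "2024-02-18 09:00:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(master_counts(build_demo(), "2024-03-01 00:00:00"))
```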

MySQL query to count items by week for the current 52-weeks?

I have a query that I'd like to change so that it gives me the counts for the current 52 weeks. This query makes use of a calendar table I've made which contains a list of dates in a fixed range. The query as it stands is selecting max and min dates and not necessarily the last 52 weeks.
I'm wondering how to keep my calendar table current such that I can get the last 52 weeks (i.e., from right now to one year ago). Or is there another way to make the query independent of a calendar table?
Here's the query:
SELECT calendar.datefield AS date, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(purchase_date))
FROM items_purchased) AND (SELECT MAX(DATE(purchase_date)) FROM items_purchased))
GROUP BY week(date)
Thoughts?
Some people dislike this approach, but I tend to use a dummy table that contains the values 0 to 1000 and then use a derived table to produce the ranges that are needed -
CREATE TABLE dummy (`num` INT NOT NULL);
INSERT INTO dummy VALUES (0), (1), (2), (3), (4), (5), .... (999), (1000);
If you have a table with an auto-incrementing id and plenty of rows you could generate it from that -
CREATE TABLE `dummy`
SELECT id AS `num` FROM `some_table` WHERE `id` <= 1000;
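Just remember to insert the 0 value, since auto-increment ids normally start at 1. You can then generate, for example, the last 365 dates -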
SELECT CURRENT_DATE - INTERVAL num DAY
FROM dummy
WHERE num < 365
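The numbers-table trick is easy to try in isolation. The sketch below builds the 0..1000 table from Python (rather than typing the INSERT by hand) and generates a run of dates from it, using Python's sqlite3 module as a stand-in for MySQL; SQLite's date(:today, '-N days') plays the role of CURRENT_DATE - INTERVAL num DAY. build_dummy(), last_n_days(), and the fixed :today value are hypothetical.

```python
import sqlite3


def build_dummy(conn, upper=1000):
    """Hypothetical helper: create the numbers table with values 0..upper (0 included)."""
    conn.execute("CREATE TABLE dummy (num INTEGER NOT NULL)")
    conn.executemany("INSERT INTO dummy VALUES (?)", [(n,) for n in range(upper + 1)])


# One date per number: today, yesterday, ... (days - 1) days ago.
LAST_N_DAYS_SQL = """
SELECT date(:today, '-' || num || ' days') AS day
FROM dummy
WHERE num < :days
ORDER BY num
"""


def last_n_days(conn, today, days):
    """Hypothetical helper: the `days` most recent dates, newest first."""
    return [row[0] for row in conn.execute(LAST_N_DAYS_SQL, {"today": today, "days": days})]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    build_dummy(demo)
    print(last_n_days(demo, "2024-03-01", 7))
```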
So, applying this approach to your query you could do something like this -
SELECT WEEK(calendar.datefield) AS `week`, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN (
SELECT (CURRENT_DATE - INTERVAL num DAY) AS datefield
FROM dummy
WHERE num < 365
) AS calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE calendar.datefield >= (CURRENT_DATE - INTERVAL 1 YEAR)
GROUP BY WEEK(calendar.datefield) -- group on datefield, not on the `date` alias used in the original query
I too typically "simulate" a table on the fly by using MySQL @ user variables, and just join to ANY table in your system that has at least as many rows as the number of weeks you want. Note: when dealing with dates, I typically use only the date part, which implies 12:00:00 AM. Also, by advancing the start date by 7 days for the "EndOfWeek", you can then apply a BETWEEN clause for records within a given time period, such as your weekly needs.
I've applied this in the sample below, which joins the purchases to the calendar on a per-week basis.
select
DynamicCalendar.StartOfWeek,
COALESCE( SUM( IP.PurchaseYesNo ), 0 ) as Item_Sales
from
( select
@weekNum := @weekNum + 1 as WeekNum,
@startDate as StartOfWeek,
@startDate := date_add( @startDate, interval 1 week ) EndOfWeek
from
( select @weekNum := 0,
@startDate := date(date_sub(now(), interval 1 year ))) sqlv,
AnyTableThatHasAtLeast52Records
limit
52 ) DynamicCalendar
LEFT JOIN items_purchased IP
on IP.Purchase_Date between DynamicCalendar.StartOfWeek
AND DynamicCalendar.EndOfWeek
group by
DynamicCalendar.StartOfWeek
This assumes that your "PurchaseYesNo" value is directly in your purchased table. If so, there is no need to join to the ITEMS table. If the field IS in the items table, I would just tack on a LEFT JOIN to the items table and get the value from there.
That said, you could use the DynamicCalendar derived table in MANY other situations.
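One caveat with BETWEEN StartOfWeek AND EndOfWeek is that both ends are inclusive, so a purchase at exactly midnight on a boundary date is counted in two adjacent weeks. The sketch below swaps in a recursive CTE for the user variables (MySQL 8.0+ also supports WITH RECURSIVE) and a half-open range (>= start AND < start + 7 days) for BETWEEN, running through Python's sqlite3 as a stand-in for MySQL. weekly_sales(), build_demo(), and the sample purchases are hypothetical, and purchaseyesno is assumed to live on items_purchased, as the answer assumes.

```python
import sqlite3

# 52 consecutive weeks starting one year before :today.
WEEKLY_SALES_SQL = """
WITH RECURSIVE weeks(week_num, start_of_week) AS (
    SELECT 1, date(:today, '-1 year')
    UNION ALL
    SELECT week_num + 1, date(start_of_week, '+7 days')
    FROM weeks WHERE week_num < 52
)
SELECT w.start_of_week, COALESCE(SUM(p.purchaseyesno), 0) AS item_sales
FROM weeks AS w
LEFT JOIN items_purchased AS p
  -- half-open range: a purchase at midnight on a boundary lands in one week only
  ON p.purchase_date >= w.start_of_week
 AND p.purchase_date < date(w.start_of_week, '+7 days')
GROUP BY w.start_of_week
ORDER BY w.start_of_week
"""


def weekly_sales(conn, today):
    """Hypothetical helper: (week start, item_sales) for 52 weeks, zeros included."""
    return conn.execute(WEEKLY_SALES_SQL, {"today": today}).fetchall()


def build_demo():
    """Hypothetical sample purchases."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items_purchased (purchase_date TEXT, purchaseyesno INTEGER)")
    conn.executemany(
        "INSERT INTO items_purchased VALUES (?, ?)",
        [
            ("2023-03-01 10:00:00", 1),
            ("2023-03-03 09:00:00", 0),
            ("2023-03-08 00:00:00", 1),  # exactly on the week-2 boundary
        ],
    )
    return conn


if __name__ == "__main__":
    for week_start, sales in weekly_sales(build_demo(), "2024-03-01")[:3]:
        print(week_start, sales)
```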

Need to find number of new unique ID numbers in a MySQL table

I have an iPhone app out there that "calls home" to my server every time a user uses it. On my server, I create a row in a MySQL table each time, containing the device's unique ID (UDID, similar to a serial number), its IP address, and other data.
Table ClientLog columns:
Time, UDID, etc.
What I'd like to know is the number of new devices (new unique UDIDs) on a given date, i.e., how many UDIDs were added to the table on a given date that don't appear before that date? Put plainly, this is the number of new users I gained that day.
This is close, I think, but I'm not 100% there and not sure it's what I want...
SELECT distinct UDID
FROM ClientLog a
WHERE NOT EXISTS (
SELECT *
FROM ClientLog b
WHERE a.UDID = b.UDID AND b.Time <= '2010-04-05 00:00:00'
)
I think the number of rows returned is the number of new unique users after the given date, but I'm not sure. I also want to limit it to a date range (specify an upper bound as well).
Your query seems correct, and you can add bounds like this:
SELECT DISTINCT UDID FROM ClientLog a WHERE a.Time >= '2010-04-05 00:00:00'
AND a.Time < '2010-04-06 00:00:00'
AND NOT EXISTS(SELECT * FROM ClientLog b WHERE a.UDID = b.UDID
AND b.Time < '2010-04-05 00:00:00');
UPDATE: another method that comes to mind is below, but I believe it's slower:
SELECT DISTINCT UDID FROM ClientLog a WHERE a.Time >= '2010-04-05 00:00:00'
AND a.Time < '2010-04-06 00:00:00'
AND a.UDID <> ALL
(SELECT DISTINCT udid FROM ClientLog b where b.Time < '2010-04-05 00:00:00');
UPDATE 2: Of course, if you're only interested in the number of new UDIDs, then this would be the best solution:
SELECT COUNT(DISTINCT UDID) FROM ClientLog WHERE Time < '2010-04-05 00:00:00';
SELECT COUNT(DISTINCT UDID) FROM ClientLog WHERE Time < '2010-04-06 00:00:00';
Then take the difference in your code (there might be a way to do it in MySQL, but I'm not a MySQL expert).
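Two ways to get these numbers without post-processing in application code, sketched with Python's sqlite3 as a stand-in for MySQL: grouping each UDID by the date of its first appearance gives new devices per day across a whole range (a swapped-in technique, not the answer's NOT EXISTS query), and subtracting two scalar subqueries does UPDATE 2's difference inside a single statement. The new_devices_per_day(), new_devices_on(), and build_demo() helpers and the sample rows are hypothetical.

```python
import sqlite3

# Each device's first appearance decides the day it counts as "new".
NEW_PER_DAY_SQL = """
SELECT first_day, COUNT(*) AS new_devices
FROM (
    SELECT UDID, date(MIN(Time)) AS first_day
    FROM ClientLog
    GROUP BY UDID
) AS firsts
WHERE first_day >= :start AND first_day < :end
GROUP BY first_day
ORDER BY first_day
"""

# The "difference of two counts" from UPDATE 2, done in one statement.
NEW_ON_DAY_SQL = """
SELECT (SELECT COUNT(DISTINCT UDID) FROM ClientLog WHERE Time < :day_end)
     - (SELECT COUNT(DISTINCT UDID) FROM ClientLog WHERE Time < :day_start)
"""


def new_devices_per_day(conn, start, end):
    """Hypothetical helper: (day, new devices) for days in [start, end)."""
    return conn.execute(NEW_PER_DAY_SQL, {"start": start, "end": end}).fetchall()


def new_devices_on(conn, day_start, day_end):
    """Hypothetical helper: new devices first seen in [day_start, day_end)."""
    params = {"day_start": day_start, "day_end": day_end}
    return conn.execute(NEW_ON_DAY_SQL, params).fetchone()[0]


def build_demo():
    """Hypothetical sample log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ClientLog (Time TEXT, UDID TEXT)")
    conn.executemany(
        "INSERT INTO ClientLog VALUES (?, ?)",
        [
            ("2010-04-04 10:00:00", "a"),
            ("2010-04-05 11:00:00", "a"),  # returning device, not new
            ("2010-04-05 12:00:00", "b"),
            ("2010-04-05 13:00:00", "c"),
            ("2010-04-06 09:00:00", "c"),
            ("2010-04-06 10:00:00", "d"),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print(new_devices_per_day(demo, "2010-04-04", "2010-04-07"))
    print(new_devices_on(demo, "2010-04-05 00:00:00", "2010-04-06 00:00:00"))
```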