Getting the newest record from a database Multiple queries - mysql

I have a mysql database with vehicles records. I need a fast query that will return the newest records of those records that were updated within the last 4 minutes. For example vehicle "A" may be updated several times a minute so it will appear many times within the last 4min. Same with vehicle B C etc. I need only the most recent entries for each vehicle within a 4 min window. I have tried like this
SELECT *
FROM yourtable AS a
WHERE a.ts =
(SELECT MAX(ts)
FROM yourtable AS b
WHERE b.ts > NOW() - INTERVAL 5 MINUTE
AND b.name = a.name)
but it takes too long to produce results >10seconds.

Does this work?
select distinct a.id from yourtable AS a
where a.ts > NOW() - INTERVAL 5 MINUTE;
It gives all vehicles that were updated in last 4/5 minutes.
Thanks,
Rakesh

Related

How to transform a NOT IN into a LEFT JOIN (or a NOT EXISTS)

I have the following query that is quite complex and even though I tried to understand how to do using various sources online, all the examples uses simple queries where mine is more complex, and for that, I don't find the solution.
Here's my current query :
SELECT id, category_id, name
FROM orders AS u1
WHERE added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (executed IS NULL OR executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND category_id NOT IN (SELECT category_id
FROM orders AS u2
WHERE executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
GROUP BY category_id)
GROUP BY category_id
ORDER BY added ASC
LIMIT 10;
The table orders is like this:
id
category_id
name
added
executed
The purpose of the query is to list n orders (here, 10) that belong in different categories (I have hundreds of categories), so 10 category_id different. The orders showed here must be older than a minute ago (INTERVAL 60 SECOND) and never executed (IS NULL) or executed more than a month ago.
The NOT IN query is to avoid treating a category_id that has already been treated less than 5 seconds ago. So in the result, I remove all the categories that have been treated less than 5 seconds ago.
I've tried to change the NOT IN in a LEFT JOIN clause or a NOT EXISTS but the switch results in a different set of entries so I believe it's not correct.
Here's what I have so far :
SELECT u1.id, u1.category_id, u1.name, u1.added
FROM orders AS u1
LEFT JOIN orders AS u2
ON u1.category_id = u2.category_id
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
WHERE u1.added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (u1.executed IS NULL OR u1.executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND u2.category_id IS NULL
GROUP BY u1.category_id
LIMIT 10
Thank you for your help.
Here's a sample data to try. In that case, there is no "older than 5 seconds" since it's near impossible to get a correct value, but it gives you some data to help out :)
Your query is using a column which doesn't exist in the table as a join condition.
ON u1.domain = u2.category_id
There is no column in your example data called "domain"
Your query is also using the incorrect operator for your 2nd join condition.
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
should be
AND u2.executed < (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
as is used in your first query

MySQL join date columns with 1-month lag and performance issues

Note: I found this similar question but it does not address my issue, so I do not believe this is a duplicate.
I have two simple MySQL tables (created with the MyISAM engine), Table1 and Table2.
Both of the tables have 3 columns, a date-type column, an integer ID column, and a float value column. Both tables have about 3 million records and are very straightforward.
The contents of the tables looks like this (with Date and Id as primary keys):
Date Id Var1
2012-1-27 1 0.1
2012-1-27 2 0.5
2012-2-28 1 0.6
2012-2-28 2 0.7
(assume Var1 becomes Var2 for the second table).
Note that for each (year, month, ID) triplet, there will only be a single entry. But the actual day of the month that appears is not necessarily the final day, nor is it the final weekday, nor is it the final business day, etc... It's just some day of the month. This day is important as an observation day in other tables, but the day-of-month itself doesn't matter between Table1 and Table2.
Because of this, I cannot rely on Date + INTERVAL 1 MONTH to produce the matching day-of-month for the date it should match to that is one month ahead.
I'm looking to join the two tables on Date and Id but where the values from the second table (Var2) come from 1-month ahead than Var1.
This sort of code will accomplish it, but I am noticing a significant performance degradation with this, explained below.
-- This is exceptionally slow for me
SELECT b.Date,
b.Id,
a.Var1,
b.Var2
FROM Table1 a
JOIN Table2 b
ON a.Id = b.Id
AND YEAR(a.Date + INTERVAL 1 MONTH) = YEAR(b.Date)
AND MONTH(a.Date + INTERVAL 1 MONTH) = MONTH(b.Date)
-- This returns quickly, but if I use it as a sub-query
-- then the parent query is very slow.
SELECT Date + INTERVAL 1 MONTH as FutureDate,
Id,
Var1
FROM Table1
-- That is, the above is fast, but this is super slow:
select b.Date,
b.Id,
a.Var1,
b.Var2
FROM (SELECT Date + INTERVAL 1 MONTH as FutureDate
Id,
Var1
FROM Table1) a
JOIN Table2 b
ON YEAR(a.FutureDate) = YEAR(b.Date)
AND MONTH(a.FutureDate) = MONTH(b.Date)
AND a.Id = b.Id
I've tried re-ordering the JOIN criteria, thinking maybe that matching on Id first in the code would change the query execution plan, but it seems to make no difference.
When I say "super slow", I mean that option #1 from the code above doesn't return the results for all 3 million records even if I wait for over an hour. Option #2 returns in less than 10 minutes, but then option number three takes longer than 1 hour again.
I don't understand why the introduction of the date lag makes it take so long.
How can I
profile the queries to understand why it takes a long time?
write a better query for joining tables based on a 1-month date lag (where day-of-month that results from the 1-month lag may cause mismatches).
Here is an alternative approach:
SELECT b.Date, b.Id, b.Var2
(select a.var1
from Table1 a
where a.id = b.id and a.date < b.date
order by a.date
limit 1
) as var1
b.Var2
FROM Table2 b;
Be sure the primary index is set up with id first and then date on Table1. Otherwise, create another index Table1(id, date).
Note that this assumes that the preceding date is for the preceding month.
Here's another alternative way to go about this:
SELECT thismonth.Date,
thismonth.Id,
thismonth.Var1 AS Var1_thismonth,
lastmonth.Var1 AS Var1_lastmonth
FROM Table2 AS thismonth
JOIN
(SELECT id, Var1,
DATE(DATE_FORMAT(Date,'%Y-%m-01')) as MonthStart
FROM Table2
) AS lastmonth
ON ( thismonth.id = lastmonth.id
AND thismonth.Date >= lastmonth.MonthStart + INTERVAL 1 MONTH
AND thismonth.Date < lastmonth.MonthStart + INTERVAL 2 MONTH
)
To get this to perform ideally, I think you're going to need a compound covering index on (id, Date, Var1).
It works by generating a derived table containing Id,MonthStart,Var1 and then joining the original table to it by a sequence of range scans. Hence the compound covering index.
The other answers gave very useful tips, but ultimately, without making significant modifications to the index structure of my data (which is not feasible at the moment), those methods would not work faster (in any meaningful sense) than what I had already tried in the question.
Ollie Jones gave me the idea to use date formatting, and coupling that with the TIMESTAMPDIFF function seems to make it passably fast, though I still welcome any comments explaining why the use of YEAR, MONTH, DATE_FORMAT, and TIMESTAMPDIFF have such wildly different performance properties.
SELECT b.Date,
b.Id,
b.Var2,
a.Date,
a.Id,
a.Var1
FROM Table1 a
JOIN Table2 b
ON a.Id = b.Id
AND (TIMESTAMPDIFF(MONTH,
DATE_FORMAT(a.Date, '%Y-%m-01'),
DATE_FORMAT(b.Date, '%Y-%m-01')) = 1)

Show values even if empty

I am using the following to show a count of products added over the last 7 days...Can i somehow tailor the query to show all the last 7 days even if COUNT=0?
query as it stands:
SELECT DAYNAME(dateadded) DAY, COUNT(*) COUNT
FROM `products`
WHERE (`dateadded` BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND CURDATE() && site_url = 'mysite.com')
GROUP BY DAY(dateadded)
Add a table with dates in it (a dates lookup table), then:
SELECT DAYNAME(d.FullDate) DAY, COUNT(*) COUNT
FROM dates d
LEFT OUTER JOIN products p ON d.FullDate = DATE(p.dateadded)
AND p.site_url = 'mysite.com'
WHERE d.FullDate BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND CURDATE()
GROUP BY d.FullDate
It takes a little bit of storage, yes, but it will make queries like this a lot easier.
Alternatively, you can make a stored procedure that loops through dates between 7 days ago and today and returns one row for each.

Getting the newest record from a database Multiple queries

I have a mysql database with vehicles records. I need a fast query that will return the newest records of those records that were updated within the last 4 minutes. For example vehicle "A" may be updated several times a minute so it will appear many times within the last 4min. Same with vehicle B C etc. I need only the most recent entries for each vehicle within a 4 min window. I have tried like this
SELECT *
FROM yourtable AS a
WHERE a.ts =
(SELECT MAX(ts)
FROM yourtable AS b
WHERE b.ts > NOW() - INTERVAL 5 MINUTE
AND b.name = a.name)
but it takes too long to produce results >10seconds.
You don't need the self-join.
select max(ts), name from Table1
where ts > NOW() - INTERVAL 5 MINUTE
group by name
To get all the rows for the latest updates and not only the name and timestamp:
SELECT t.*
FROM
TableX AS t
JOIN
( SELECT name
, MAX(ts) AS maxts
FROM TableX
WHERE ts > NOW() - INTERVAL 4 MINUTE
GROUP BY name
) AS grp
ON grp.name = t.name
AND grp.maxts = t.ts
You'll need at least an index on the timestamp column for this query.

sql query to get duplicate records with different dates

I need to get records with different date field ,
table Sites:
field id
reference
created
Every day we add lot of records, so I need to do a function that extract all records existing with duplicates of rows just was added, to do some notifications.
the conditions that i can't get is the difference between records of the current day and the old data in the table should be (one day to 4 days) .
If is there any simple query to do that without using transaction .
I'm not sure I totally understand what you mean by duplicate records, but here's a basic date query:
SELECT fieldId, reference, created, DATE(created) as the_date
FROM Sites
WHERE the_date
BETWEEN DATE( DATE_SUB( NOW() , INTERVAL 3 DAY ) )
AND DATE ( NOW() )
I'm making several assumptions such as:
You don't want the "first" row returned
Duplicates don't carry the
date forward (The next after initial 4 days is not a duplicate)
The 4 days means +4 days so Day 5 is included
So, my code is :
with originals as (
select s1.*
from sites as s1
where 0 = (
select count(*)
from sites as s2
where s1.field_id = s2.field_id
and s1.reference = s2.reference
and s1.created <> s2.created
and DATEDIFF(DAY,s2.created, s1.created) between 1 and 4
)
)
select s1.*
from sites as s1
inner join originals as o
on s1.field_id = o.field_id
and s1.reference = o.reference
and s1.created <> o.created
where DATEDIFF(DAY,o.created, s1.created) between 1 and 4
order by 1,2,3;
Here it is in a fiddle: http://sqlfiddle.com/#!3/9b407/20
This could be simpler if some conditions are relaxed.
thanks a lot for every one who tried to help me ,
i have found this solution after lot of test
SELECT `id`,`reference`,count(`config_id`) as c,`created` FROM `sites`
where datediff(date(current_date()),date(`created`)) < 4
group by `reference`
having c > 1
thanks a lot for your help