In my MySQL database, I insert 100 rows into a table every minute. The column named 'time' is of type DATETIME and contains the date and time of the insertion (excluding seconds).
I'm looking for an efficient way to fetch rows from that table in a specified timeframe.
For example, if my timeframe is 15 minutes, then the following rows would be fetched:
3-11-18 13:00:00
3-11-18 13:15:00
3-11-18 13:30:00
3-11-18 13:45:00
3-11-18 14:00:00
etc...
For a timeframe of one hour it would fetch
3-11-18 13:00:00
3-11-18 14:00:00
Currently, I'm using the LIKE operator to do that. For example, the 15-minute timeframe query looks like this:
SELECT * FROM my_table WHERE time LIKE '%:15:00' OR time LIKE '%:30:00' OR time LIKE '%:45:00' OR time LIKE '%:00:00';
But these queries run very slowly.
What can I do to improve performance?
You can try to add a generated column for minutes and set an index on it:
alter table my_table add column minute tinyint unsigned as (minute(time));
alter table my_table add index (minute);
A query like
select * from my_table where minute(time) = 0;
would use that index.
But I'm not sure if it helps with a query like
select * from my_table where minute(time) in (0, 15, 30, 45);
The reason is that the selectivity of the WHERE condition is not very good here: it keeps only 1 row in 15. A full table scan that skips 14 of 15 rows can still be faster than an index search plus a second round trip to the clustered index for each match. In that case there isn't much you can do, except creating a covering index. But an index that covers SELECT * would probably be harmful for 100 inserts per minute.
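Either way, you can check with EXPLAIN whether the optimizer actually picks the index; a minimal sketch, assuming the generated column and index from above are in place:

EXPLAIN SELECT * FROM my_table WHERE minute IN (0, 15, 30, 45);

If the type column of the output shows ALL (a full table scan) rather than range or ref, the optimizer has decided the scan is cheaper than the index lookups.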
Use MOD to filter by the timeframe requirement.
For a timeframe of 15 minutes:
SELECT *
from my_table
where mod(minute(time),15) = 0
For a 30-minute timeframe:
mod(minute(time),30) = 0
For a one-hour (60-minute) timeframe:
mod(minute(time),60) = 0
Or
minute(time) = 0
Generalized WHERE clause for a 15-, 30- or 60-minute timeframe:
mod( minute(time), <timeframeInMinutes>) = 0
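Note that mod(minute(time), ...) on its own cannot use an index on time. If the table is large, it may help to bound the scan with a range on time as well; a sketch, assuming an index on time exists (the date range below is only an example):

SELECT *
FROM my_table
-- illustrative one-day window; adjust to the range you actually need
WHERE time >= '2018-11-03 00:00:00'
  AND time <  '2018-11-04 00:00:00'
  AND mod(minute(time), 15) = 0;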
Related
I have 29,938,766 rows inside the VISITS table and the table looks like this
USER_ID (INT)   VISITED_IN (DATETIME)
65              2020-08-26 07:57:43
1182            2019-03-15 02:46:48
1564            2015-07-04 10:59:44
73              2021-03-18 00:25:08
3791            2017-10-17 12:22:45
51              2022-05-02 19:11:09
917             2017-11-20 15:32:06
3               2019-12-29 15:15:51
51              2015-02-08 17:48:30
1531            2020-08-05 08:44:55
Etc...          Etc...
When running this query, it takes 17-20 seconds and returns 63,514 (the user has 63,514 visits):
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 917
When running this query, it takes 17-20 seconds and returns 193 (the user has 193 visits):
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 716
The problem is the query always takes between 17-20 seconds for 29,938,766 rows even if the user has only 3, 50, 70, or 1,000,000 visits.
I think the problem is that it is looping over all rows?
The second query should be faster than the first, since the time should depend on the number of matching rows, but both queries take the same time!
What do you suggest to me to avoid this problem?
Table structure
Update: Here is a new suggested scenario:
When a user opens his own or someone else's profile, he can see the number of profile visits and can filter the visits in this way:
Last 24 hours
|
---> SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 1 DAY);
Last 7 days
|
---> SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY);
Last 30 days
|
---> SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY);
All time
|
---> SELECT VISITS FROM USERS WHERE USER_ID = 5;
Also, I'll create a recurring event that executes this command every day.
DELETE FROM VISITS WHERE VISITED_IN <= DATE_SUB(NOW(), INTERVAL 30 DAY);
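A sketch of wrapping that command in such a recurring event, assuming MySQL's event scheduler is enabled (the event name here is made up):

SET GLOBAL event_scheduler = ON;

CREATE EVENT purge_old_visits
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM VISITS WHERE VISITED_IN <= DATE_SUB(NOW(), INTERVAL 30 DAY);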
Also, I'll make sure to increment the VISITS column in the USERS table when adding a new row to the VISITS table.
UPDATE USERS SET VISITS = VISITS + 1 WHERE ID = 5
INDEX(user_id, visited_in)
will speed up all the SELECTs you mentioned. They will have to scan a chunk of the index; they will not have to "scan the whole table".
The DELETE needs INDEX(visited_in). But it is problematic if you don't run it frequently enough, because deleting thousands of rows at one time is potentially a problem. Consider running that delete at least once an hour.
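A sketch of that index plus a chunked version of the purge (the batch size is arbitrary; repeat the DELETE until it affects zero rows, so each pass only removes a small batch):

ALTER TABLE VISITS ADD INDEX (visited_in);

DELETE FROM VISITS
WHERE VISITED_IN <= DATE_SUB(NOW(), INTERVAL 30 DAY)
LIMIT 10000;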
If the table will get really big, consider partitioning it as a "time series". With that, DROP PARTITION is much faster than a big DELETE.
Any caching service will provide stale counts, but it will be faster some of the time.
It is "ok to hit the database every time someone opens a page", but only if the queries are efficient enough. Do Index.
In my Answer to your other Question, I explain how a Summary table can speed things up even more. However, it assumes "last N days" is measured from midnight to midnight. Your current queries use NOW() - INTERVAL N DAY, which is messier to implement than midnight boundaries. Are you willing to change the meaning of "last N days"?
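A rough sketch of what such a summary table could look like, assuming midnight-to-midnight days; the table and column names here are made up purely for illustration:

-- illustrative summary table, one row per user per day
CREATE TABLE visits_daily (
  user_id     INT NOT NULL,
  visit_day   DATE NOT NULL,
  visit_count INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, visit_day)
);

-- fill in yesterday's counts once per day, shortly after midnight
INSERT INTO visits_daily (user_id, visit_day, visit_count)
SELECT USER_ID, DATE(VISITED_IN), COUNT(*)
FROM VISITS
WHERE VISITED_IN >= CURDATE() - INTERVAL 1 DAY
  AND VISITED_IN <  CURDATE()
GROUP BY USER_ID, DATE(VISITED_IN);

-- "last 7 days" then becomes a sum over at most 7 small rows per user
SELECT SUM(visit_count)
FROM visits_daily
WHERE user_id = 5
  AND visit_day >= CURDATE() - INTERVAL 7 DAY;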
(Some INDEX basics...)
An important reason for any index is its ability to rapidly find the row(s) based on some column(s).
An INDEX is a list of keys mapping to rows.
A UNIQUE INDEX is an INDEX, plus a uniqueness constraint -- implying that no two rows have the same value in the index.
The one and only PRIMARY KEY is a unique index designated to uniquely identify every row in the table.
"key" and "index" are synonyms.
Indexes (in MySQL's InnoDB engine) are implemented as a BTree (actually a B+Tree; see Wikipedia). In the case of the PK, the rest of the columns are sitting there with the PK value. In the case of "secondary" keys, the 'value' part of the BTree is the PK column(s).
Any index can contain one column or multiple columns (a multi-column index is called "composite").
INDEX(lastname) is not likely to be UNIQUE
INDEX(lastname, firstname) is still not likely to be UNIQUE, but it is "composite".
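A small concrete example of these terms (the table and columns are made up purely for illustration):

CREATE TABLE people (
  id        INT NOT NULL AUTO_INCREMENT,
  lastname  VARCHAR(50) NOT NULL,
  firstname VARCHAR(50) NOT NULL,
  PRIMARY KEY (id),                        -- unique, identifies every row
  INDEX idx_lastname (lastname),           -- single-column secondary index
  INDEX idx_name (lastname, firstname)     -- composite secondary index
);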
I want to query a table which looks like this:
Table structure for table archive:
|Column|Type|Null|Default
|------|----|----|-------
|id|int(11)|No|
|datetime|timestamp|No|CURRENT_TIMESTAMP
|gatewayid|int(11)|No|
|RSSI|float|No|
|distance|float|No|
|beaconid|int(11)|No|

Dumping data for table archive (columns: id, datetime, gatewayid, RSSI, distance, beaconid):
|1|2017-08-22 12:14:19|1|-65|36|1
|2|2017-08-22 12:14:19|2|-60|30|1
|3|2017-08-22 12:14:19|3|-60|30|1
|4|2017-08-22 12:14:19|1|-52|63|2
|5|2017-08-22 12:14:19|2|-36|33|2
|6|2017-08-22 12:14:19|3|-65|33|2
|7|2017-08-22 12:14:19|1|-69|66|3
|8|2017-08-22 12:14:19|2|-65|33|3
|9|2017-08-22 12:14:19|3|-66|33|3
|10|2017-08-22 12:16:09|1|-65|36|1
|11|2017-08-22 12:16:09|2|-60|30|1
|12|2017-08-22 12:16:09|3|-60|30|1
|13|2017-08-22 12:16:09|1|-52|63|2
|14|2017-08-22 12:16:09|2|-36|33|2
|15|2017-08-22 12:16:09|3|-65|33|2
|16|2017-08-22 12:16:09|1|-69|66|3
|17|2017-08-22 12:16:09|2|-65|33|3
|18|2017-08-22 12:16:09|3|-66|33|3
|19|2017-08-22 12:32:05|1|-65|36|1
|20|2017-08-22 12:32:05|2|-60|30|1
|21|2017-08-22 12:32:05|3|-60|30|1
|22|2017-08-22 12:32:05|1|-52|63|2
|23|2017-08-22 12:32:05|2|-36|33|2
|24|2017-08-22 12:32:05|3|-65|33|2
|25|2017-08-22 12:32:05|1|-69|66|3
I want to average the RSSI values based on these rules:
- group based on gatewayid, beaconid and datetime
- the grouping by datetime should be in 5-minute intervals, for example
In fact, I want to average the RSSI values of rows whose beaconid and gatewayid are the same and which were added within the same 5-minute interval.
I have written this query
select DATE_ADD( '1900-01-01T00:00:00',INTERVAL 15+TIMESTAMPDIFF(minute, '1900-01-01T00:00:00', datetime) minute),
(sum(RSSI)/count(*)) as mean_rssi,
(sum(distance)/count(*)) as mean_distance,
beaconid,
gatewayid
from archive
GROUP by DATE_ADD( '1900-01-01T00:00:00',INTERVAL 15+TIMESTAMPDIFF(minute, '1900-01-01T00:00:00', datetime) minute),
beaconid,
gatewayid
Here is sqlfiddle for create statement
This query returns all rows without any changes; where am I going wrong?
thanks
Your query appears to be correct; it's just that your data is unique on interval/beaconid/gatewayid, so it happens to return 'all' rows...
If I understand your data, you probably want to group by 5-minute intervals, and to get a value that is the same for every 5-minute interval you can simply integer-divide the timestamp by 300 seconds - something like:
select min(`datetime`) as `start`,
(sum(RSSI)/count(*)) as mean_rssi,
(sum(distance)/count(*)) as mean_distance,
beaconid,
gatewayid
from archive
group by floor(UNIX_TIMESTAMP(`datetime`) / 300),
beaconid,
gatewayid
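If you also want each group labelled with the start of its 5-minute bucket instead of min(datetime), a variation on the same idea (a sketch using MySQL's FROM_UNIXTIME and FLOOR):

select FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 300) * 300) as bucket_start,
       (sum(RSSI)/count(*)) as mean_rssi,
       (sum(distance)/count(*)) as mean_distance,
       beaconid,
       gatewayid
from archive
group by FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 300) * 300),
         beaconid,
         gatewayid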
Also, use SQLFiddle or Rextester; it is much easier to help you if you use those...
Hi, I am trying to fetch the last 5 minutes of data from an Oracle table. The query is written below and it's not working somehow.
select * from mytable where (time_to_sec(timediff(now(),mytable.time_stamp)) <= 300)
It's showing this error: ORA-00904.
I tried one more query.
select * from mytable where TIME_STAMP > (sysdate - numtodsinterval(5,'minute'))
Now, can you tell me the query which fetches the data of the last 5 minutes, and the one which deletes data that has been in the table for more than 12 hours? Thanks.
I need queries in both Oracle and MySQL. The MySQL query I tried is here:
delete from mytable where (time_to_sec(timediff(now(),time_stamp))/3600 >12);
In Oracle, subtracting 1 from a date means subtracting one day, and you can subtract a fraction of one. So,
current_timestamp - (5/(24*60))
gives you the date from 5 minutes ago. Using that, we can query:
select * from mytable where TIME_STAMP > current_timestamp - (5/(24*60))
This should give you the needed result. I find this method more straightforward and simpler to remember than using special functions.
If you want to filter out the data from the last 12 hours, then you can query it like this:
select * from mytable where TIME_STAMP <= current_timestamp - 0.5
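For the delete part of the question (removing rows older than 12 hours), a sketch in both dialects using the same arithmetic:

-- Oracle
DELETE FROM mytable WHERE TIME_STAMP <= current_timestamp - 0.5;

-- MySQL
DELETE FROM mytable WHERE time_stamp <= NOW() - INTERVAL 12 HOUR;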
I've been trying to work this one out for a while now, maybe my problem is coming up with the correct search query. I'm not sure.
Anyway, the problem I'm having is that I have a table of data that has a new row added every second (imagine the structure {id, timestamp(datetime), value}). I would like to do a single query for MySQL to go through the table and output only the first value of each minute.
I thought about doing this with multiple queries using LIMIT and datetime >= (beginning of minute), but with the volume of data I'm collecting that is a lot of queries, so it would be nicer to produce the data in a single query.
Sample data:
id datetime value
1 2015-01-01 00:00:00 128
2 2015-01-01 00:00:01 127
3 2015-01-01 00:00:04 129
4 2015-01-01 00:00:05 127
...
67 2015-01-01 00:00:59 112
68 2015-01-01 00:01:12 108
69 2015-01-01 00:01:13 109
Where I would want the result to select the rows:
1 2015-01-01 00:00:00 128
68 2015-01-01 00:01:12 108
Any ideas?
Thanks!
EDIT: Forgot to add, the data, whilst every second, is not reliably on the first second of every minute - it may be :30 or :01 rather than :00 seconds past the minute
EDIT 2: A nice-to-have (definitely not required for answer) would be a query that is flexible to also take an arbitrary number of minutes (rather than one row each minute)
SELECT t2.* FROM
( SELECT MIN(`datetime`) AS dt
FROM tbl
GROUP BY DATE_FORMAT(`datetime`,'%Y-%m-%d %H:%i')
) t1
JOIN tbl t2 ON t1.dt = t2.`datetime`
SQLFiddle
Or
SELECT *
FROM tbl
WHERE dt IN ( SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i'))
SQLFiddle
SELECT t1.*
FROM tbl t1
LEFT JOIN (
SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i')
) t2 ON t1.dt = t2.dt
WHERE t2.dt IS NOT NULL
SQLFiddle
In MS SQL Server I would use CROSS APPLY, but as far as I know MySQL doesn't have it, so we can emulate it.
Make sure that you have an index on your datetime column.
Create a table of numbers, or in your case a table of minutes. If you have a table of numbers starting from 1 it is trivial to turn it into minutes in the necessary range.
SELECT
tbl.ID
,tbl.`dt`
,tbl.value
FROM
(
SELECT
MinuteValue
, (
SELECT tbl.id
FROM tbl
WHERE tbl.`dt` >= Minutes.MinuteValue
ORDER BY tbl.`dt`
LIMIT 1
) AS ID
FROM Minutes
) AS IDs
INNER JOIN tbl ON tbl.ID = IDs.ID
For each minute, find one row that has a timestamp greater than or equal to that minute. I don't know how to return the full row (rather than a single column) from the nested SELECT in MySQL, so at first I build a derived table with two columns, the minute and the id from the original table, and then explicitly look up the rows from the original table by their IDs.
SQL Fiddle
I've created a table of Minutes in the SQL Fiddle with the necessary values to keep the example simple. In real life you would have a more generic table.
Here is SQL Fiddle that uses a table of numbers, just for illustration.
In any case, you do need to know in advance somehow the range of dates/numbers you are interested in.
It is trivial to make it work for any interval of minutes. If you need results every 5 minutes, just generate a table of minutes that has values not every 1 minute, but every 5 minutes. The main query would remain the same.
It may be more efficient, because here you don't join the big table to itself and you don't make calculations on the datetime column, so the server should be able to use the index on it.
The example that I made assumes that for each minute there is at least one row in the big table. If it is possible that there are some minutes that don't have any data at all you'd need to add extra check in the WHERE clause to make sure that the found row is still within that minute.
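For reference, a minimal sketch of what the Minutes table could look like (the actual fiddle builds it differently; the names and values here are assumptions, just enough to make the main query runnable):

CREATE TABLE Minutes (
  MinuteValue DATETIME NOT NULL PRIMARY KEY
);

INSERT INTO Minutes (MinuteValue) VALUES
  ('2015-01-01 00:00:00'),
  ('2015-01-01 00:01:00'),
  ('2015-01-01 00:02:00');
-- ...and so on, one row per minute (or per 5 minutes) of the range you care about.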
select * from table where timestamp LIKE "%-%-% %:%:00" could work.
This is similar to this question: Stack Overflow Date SQL Query Question
Edit: This probably would work better:
select *, date_format(timestamp, '%Y-%m-%d %H:%i') as the_minute, count(*)
from table
group by the_minute
order by the_minute
Similar to this question here: mysql select date format
I'm not really sure, but you could try this:
SELECT MIN(timestamp) FROM table WHERE YEAR(timestamp)=2015 GROUP BY DATE(timestamp), HOUR(timestamp), MINUTE(timestamp)
The following query should return 180 rows (the total number of minutes between 2012-03-13 00:00 and 2012-03-13 03:00), but it always returns 171 rows. I found out that when I use INTERVAL in a SELECT query, I always get at most 171 rows. Here is the query:
SET @num = -1;
SELECT @num:=@num+1 AS AddInterval, DATE_ADD('2012-03-13 00:00', interval @num minute) AS MyDate FROM MyTable HAVING MyDate <= '2012-03-13 03:00' LIMIT 0, 180
Also, the following very simple query returns 171 rows too:
SELECT COUNT(DATE_ADD('2012-03-13 00:00', interval 1 minute)) FROM MyTable;
Is there any MySQL configuration that affects this, some limit, or am I doing something wrong?
Thanks.
You're selecting from a table - if that table only has 171 rows, then you'll only get 171 increments. SQL will not create the missing rows for you, even if you specify a limit higher than what's available.
This is true even though you're not actually selecting any fields from that table. You're selecting only generated values, but you're still limited to the number of rows in the underlying table.
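On MySQL 8.0 or later you can sidestep the problem entirely by generating the rows with a recursive CTE instead of reading them from a table; a sketch:

-- generates exactly 180 rows (n = 0..179), independent of any table's size
WITH RECURSIVE minutes (n) AS (
  SELECT 0
  UNION ALL
  SELECT n + 1 FROM minutes WHERE n < 179
)
SELECT n AS AddInterval,
       DATE_ADD('2012-03-13 00:00', INTERVAL n MINUTE) AS MyDate
FROM minutes;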