So my colleague created this query which will run every hour on a table with 500K+ records.
DELETE FROM table WHERE timestamp > NOW() - INTERVAL 24 HOUR
I have a feeling that this would be slow because it computes the time for each row. Am I right? How can I optimize it?
Update
With 2.8 million records it took around 12 seconds to delete the matched rows.
I have a feeling that this would be slow because it computes the time for each row. Am I right?
No, the time calculation is done once at the start of the query. It is a constant value for the duration of the query.
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_now says:
NOW() returns a constant time that indicates the time at which the statement began to execute.
https://dev.mysql.com/doc/refman/8.0/en/where-optimization.html says:
Constant expressions used by indexes are evaluated only once.
You also asked:
How can I optimize it?
The easiest thing to do is make sure there is an index on the timestamp column.
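For example, a minimal sketch (the table name my_table and the index name are placeholders; the column name timestamp comes from the query above):

-- Index the column used in the range condition so the delete
-- becomes an index range scan instead of a full table scan.
ALTER TABLE my_table ADD INDEX idx_timestamp (`timestamp`);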
A different solution is to use partitioning by the timestamp column, and drop 1 partition per day. This blog has a description of this solution: http://mysql.rjweb.org/doc.php/partitionmaint
run the query more frequently (say, hourly)
have an index on that column
PARTITION BY RANGE and use DROP PARTITION; hourly partitions are suggested; see Partition (a rough sketch follows below)
More tips: http://mysql.rjweb.org/doc.php/deletebig
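A rough sketch of the hourly-partition idea (the table layout, names, and partition boundaries are made up for illustration; note that every unique key must include the partitioning column):

CREATE TABLE events (
  id      BIGINT NOT NULL AUTO_INCREMENT,
  ts      TIMESTAMP NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, ts)                 -- the PK must include ts for partitioning
)
PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) (
  PARTITION p2023103100 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-31 01:00:00')),
  PARTITION p2023103101 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-31 02:00:00')),
  PARTITION pmax        VALUES LESS THAN MAXVALUE
);

-- Hourly maintenance: dropping a whole partition is far cheaper than deleting the same rows.
ALTER TABLE events DROP PARTITION p2023103100;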
Related
I have a MERGE_MYISAM table (3 tables merged). The tables have entries for the last 5 days, the last 5-10 days, and the last 10-15 days respectively. Select queries are on the id column and deletion takes place on the basis of date. If I create a table with 3 range partitions on date, select queries will be slow because selects happen on id, and if partitions are created on the basis of id, delete queries will take time. Is there any other beneficial way to create partitions?
Are you deleting a whole day at a time? Or a whole 5 days?
I would seriously consider daily partitions split by PARTITION BY RANGE(TO_DAYS(...)) and use a daily DROP PARTITION (which is much faster than DELETE) and REORGANIZE PARTITION.
I discuss the details here.
And, I would move to InnoDB. See tips.
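Something along these lines, assuming a table named stats with a dt DATETIME column (names and dates are placeholders; any PRIMARY or UNIQUE key on the table must include dt):

-- Daily partitions on TO_DAYS(dt); 'future' catches rows past the last defined day.
ALTER TABLE stats
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p20230601 VALUES LESS THAN (TO_DAYS('2023-06-02')),
  PARTITION p20230602 VALUES LESS THAN (TO_DAYS('2023-06-03')),
  PARTITION future    VALUES LESS THAN MAXVALUE
);

-- Daily maintenance: drop the oldest day, then carve the next day out of 'future'.
ALTER TABLE stats DROP PARTITION p20230601;
ALTER TABLE stats REORGANIZE PARTITION future INTO (
  PARTITION p20230603 VALUES LESS THAN (TO_DAYS('2023-06-04')),
  PARTITION future    VALUES LESS THAN MAXVALUE
);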
I am getting stuck on the execution time of a query. I have a table (not written by me) with a lot of rows (4 million) and a column representing the timestamp.
I want to run a query that returns only the data between two given timestamps.
I am currently using :
SELECT * FROM myTable WHERE timestamp BETWEEN "x" AND "y"
This query takes approximately 11 seconds to return 4000 rows, while the query without the WHERE clause but with a limit of 50,000 rows executes in less than 0.1 seconds. I am aware of the fact that with the WHERE clause, more rows are tested.
Because the timestamp is always increasing, is there any way to stop the query once the upper bound of the timestamp is reached? Or is there another way to run the same query much faster?
WHERE NumberModule=24 AND Timestamp BETWEEN 40764 AND 40772
Needs a different index:
INDEX(NumberModule, Timestamp)
My Index Cookbook discusses why.
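For example (the index name is arbitrary):

-- Put the equality column first, then the range column.
ALTER TABLE myTable ADD INDEX idx_module_time (NumberModule, Timestamp);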
Please provide SHOW CREATE TABLE so we can see all the indexes, plus the ENGINE.
Add a BTREE index (the best way to deal with BETWEEN; see here):
ALTER TABLE myTable ADD INDEX myIdx USING BTREE (`timestamp`)
Please post the result of
EXPLAIN SELECT * FROM myTable WHERE `timestamp` BETWEEN "x" AND "y"
after that.
I am using MySQL server v5.1.73 on a CentOS 6.4 64-bit operating system. I have a table with about 17 million records whose size is about 10 GB. The MySQL engine for this table is InnoDB.
This table has 10 columns, and one of them is 'date', whose type is DATETIME. I want to delete records of a specific date with the MySQL DELETE command.
DELETE
FROM table
WHERE date(date) = '2015-06-01'
LIMIT 1000
But when I run this command, I get the error 'The total number of locks exceeds the lock table size'. I had this problem before, and changing innodb_buffer_pool_size would fix it, but this time the problem persists even after increasing that amount.
I tried many tricks like changing the LIMIT value to 100 or even 1 record, but it doesn't work. I even increased innodb_buffer_pool_size to 20 GB, but nothing changed.
I also read these links: "The total number of locks exceeds the lock table size" Deleting 267 Records and The total number of locks exceeds the lock table size,
but they didn't solve my problem. My server has 64 GB of RAM.
On the other hand, I can delete records when not using a filter on a specific date:
DELETE
FROM table
LIMIT 1000
I can also select records of that day without any problem. Can anyone help me with this?
I would appreciate any help to fix the problem.
Don't use date(date), it cannot use INDEX(date). Instead simply use date and have an index beginning with date.
More ways to do a chunking delete.
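As a rough sketch of one chunking pattern (this assumes an index starting with date; the 1000-row batch size is arbitrary):

-- Delete one small batch at a time; repeat (from the application or a stored
-- procedure) until the statement affects 0 rows. Small batches keep the number
-- of row locks well under the lock-table limit.
DELETE FROM `table`
WHERE `date` >= '2015-06-01' AND `date` < '2015-06-02'
LIMIT 1000;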
date(date) in the WHERE clause requires the database to calculate the value of that expression for all 17m rows at run time - the DB will compute date(date) row by row for 17m rows, and then table scan (there is no index it can use) those 17m rows to work out your result set. This is where you are running out of resources.
You need to remove the usage of the calculated column, which can be solved a couple of different ways.
Rather than doing date(date), change your comparison to be:
WHERE date >= '2015-06-01 00:00:00' AND date <= '2015-06-01 23:59:59'
This will now hit the index on the date column directly (I'm assuming you have an index on this column, it just won't be used by your original query)
The other solution would be to add a column to the table of type DATE, and permanently store the DATE of each DATETIME in that column (and obviously add an index for the new DATE column). That would allow you to run any query you like that just wants to examine the DATE portion only, without having to specify the time range. If you've got other queries currently using date(date), having a column with just the date specifically in it might be a preferred solution than adding the time range to the query (adding the time range is fine for a straight index comparison in a SELECT or DELETE like here, but might not be a usable solution for other queries involving JOIN, GROUP BY, etc).
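A sketch of that second option (the column and index names are examples; the backfill UPDATE touches every row, so you may want to run it in chunks too):

-- Add a plain DATE column, backfill it from the DATETIME, and index it.
ALTER TABLE `table` ADD COLUMN date_only DATE NULL;
UPDATE `table` SET date_only = DATE(`date`);
ALTER TABLE `table` ADD INDEX idx_date_only (date_only);

-- The delete can then filter on the date alone and still use an index:
DELETE FROM `table` WHERE date_only = '2015-06-01' LIMIT 1000;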
I am working on designing a large database. In my application I will have many rows; for example, I currently have one table with 4 million records. Most of my queries use a datetime clause to select data. Is it a good idea to index datetime fields in a MySQL database?
SELECT field1, field2, ....., field15
FROM table WHERE field20 BETWEEN NOW() AND NOW() + INTERVAL 30 DAY
I am trying to keep my database working well and my queries running smoothly.
Also, what approach do you think I should take to create a high-efficiency database?
MySQL recommends using indexes for a variety of reasons including elimination of rows between conditions: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
This makes your datetime column an excellent candidate for an index if you are going to use it in query conditions frequently. If your only condition is BETWEEN NOW() AND DATE_ADD(NOW(), INTERVAL 30 DAY) and no other column in the condition is indexed, MySQL will have to do a full table scan on every query. I'm not sure how many rows are generated in 30 days, but as long as it's less than about 1/3 of the total rows, it will be more efficient to use an index on the column.
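For example (the table and column names come from the pseudo-query above; the index name is arbitrary):

-- Index the datetime column used in the range condition.
ALTER TABLE `table` ADD INDEX idx_field20 (field20);

-- The 30-day range can then be satisfied with an index range scan:
SELECT field1, field2, field15
FROM `table`
WHERE field20 BETWEEN NOW() AND DATE_ADD(NOW(), INTERVAL 30 DAY);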
Your question about creating an efficient database is very broad. I'd say to just make sure that it's normalized and all appropriate columns are indexed (i.e. ones used in joins and where clauses).
Here the author performed tests showing that an integer Unix timestamp is better than DATETIME. Note that he used MySQL, but I feel that no matter what DB engine you use, comparing integers is slightly faster than comparing dates, so an INT index is better than a DATETIME index. Take T1 as the time to compare 2 dates and T2 as the time to compare 2 integers. A search on an indexed field takes approximately O(log(rows)) time because the index is based on some balanced tree - it may differ between DB engines, but Log(rows) is a common estimate (unless you use a bitmask or an R-tree based index). So the difference is (T2-T1)*Log(rows) - it may play a role if you run your query often.
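If you do want to try the integer route, a minimal sketch looks like this (table and column names are made up):

-- Store the moment as seconds since the Unix epoch and index that integer.
CREATE TABLE events (
  id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at INT UNSIGNED NOT NULL,          -- e.g. UNIX_TIMESTAMP(NOW())
  KEY idx_created_at (created_at)
);

-- Range conditions then compare plain integers instead of DATETIME values.
SELECT * FROM events
WHERE created_at BETWEEN UNIX_TIMESTAMP('2015-06-01 00:00:00')
                     AND UNIX_TIMESTAMP('2015-07-01 00:00:00');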
I know this question was asked years ago, but I just found my solution.
I added an index to a datetime column.
Fetching the last 600 records sorted by datetime took 1.6 seconds; after adding the index, it came down to 0.0028 seconds. I'd say that's a win.
ALTER TABLE `database`.`table`
ADD INDEX `name_of_index` (`datetime_field_from_table`);
I have a large table containing hourly statistical data broken down across a number of dimensions. It's now large enough that I need to start aggregating the data to make queries faster. The table looks something like:
customer INT
campaign INT
start_time TIMESTAMP
end_time TIMESTAMP
time_period ENUM('hour', 'day', 'week')
clicks INT
I was thinking that I could, for example, insert a row into the table where campaign is null, and the clicks value would be the sum of all clicks for that customer and time period. Similarly, I could set the time period to "day" and this would be the sum of all of the hours in that day.
I'm sure this is a fairly common thing to do, so I'm wondering what the best way to achieve this in MySql? I'm assuming an INSERT INTO combined with a SELECT statement (like with a materialized view) - however since new data is constantly being added to this table, how do I avoid re-calculating aggregate data that I've previously calculated?
I did something similar and here are the problems I had to deal with:
You can use ROUND(start_time/86400)*86400 in the GROUP BY part to get a summary of all entries from the same day (this treats start_time as a Unix epoch in seconds; 86400 is the number of seconds in a day). For a week it is almost the same.
The SQL will look like:
INSERT INTO the_table
  SELECT
    customer,
    NULL,
    ROUND(start_time/86400)*86400,
    ROUND(start_time/86400)*86400 + 86400,
    'day',
    SUM(clicks)
  FROM the_table
  WHERE time_period = 'hour' AND start_time BETWEEN <A> AND <B>
  GROUP BY customer, ROUND(start_time/86400)*86400;
DELETE FROM the_table
WHERE time_period = 'hour' AND start_time BETWEEN <A> AND <B>;
If you are going to insert a summary from the same table into itself, MySQL will use a temporary table (which means part of the table's data is copied aside and then dropped, for each transaction). So you must be very careful with the indexes and the size of the data returned by the inner select.
When you are constantly inserting and deleting rows, you will get fragmentation issues sooner or later, and they will slow you down dramatically. The solution is to use partitioning and drop old partitions from time to time. Alternatively you can run an OPTIMIZE TABLE statement, but it will block your work for a relatively long time (possibly minutes).
To avoid a mess with duplicate data, you may want to use a separate table for each aggregation period (hour_table, day_table, ...).
If you're trying to make the table smaller, you'll be deleting the detailed rows after you make the summary row, right? Transactions are your friend. Start one, compute the rollup, insert the rollup, delete the detailed rows, end the transaction.
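A minimal sketch of that sequence, assuming <A> and <B> bound exactly one day (<period_start> and <period_end> are placeholders for that day's boundaries):

START TRANSACTION;

-- 1) Compute and insert the rollup row(s) for the period.
INSERT INTO the_table
  SELECT customer, NULL, <period_start>, <period_end>, 'day', SUM(clicks)
  FROM the_table
  WHERE time_period = 'hour' AND start_time BETWEEN <A> AND <B>
  GROUP BY customer;

-- 2) Remove the detailed rows that are now summarized.
DELETE FROM the_table
WHERE time_period = 'hour' AND start_time BETWEEN <A> AND <B>;

COMMIT;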
If you happen to add more rows for an older time period (who does that??), you can run the rollup again - it will combine your previous rollup entry with your extra data into a new, more powerful, rollup entry.