Here is what im trying to do explained in a query
DELETE FROM table ORDER BY dateRegistered DESC LIMIT 1000 *
I want to run such query in a script which i have already designed. Every time it finds older records that are 1001th record or above it deletes
So kinda of setting Max Row size but deleting all the older records.
Actually is there a way to set that up in the CREATE statement.
Therefore: If i have 9023 rows in the database, when i run that query it should delete 8023 rows and leave me with 1000
If you have a unique ID for rows here is the theoretically correct way, but it is not very efficient (not even if you have an index on the dateRegistered column):
DELETE FROM table
WHERE id NOT IN (
SELECT id FROM table
ORDER BY dateRegistered DESC
LIMIT 1000
)
I think you would be better off by limiting the DELETE directly by date instead of number of rows.
I don't think there is a way to set that up in the CREATE TABLE statement, at least not a portable one.
The only way that immediately occurs to me for this exact job is to do it manually.
First, get a lock on the table. You don't want the row count changing while you're doing this. (If a lock is not practical for your app, you'll have to work out a more clever queuing system rather than using this method.)
Next, get current row count:
SELECT count(*) FROM table
Once you have that, you should with simple maths be able to figure out how many rows need deleting. Let's say it said 1005 - you need to delete 5 rows.
DELETE FROM table ORDER BY dateRegistered ASC LIMIT 5
Now, unlock the table.
If a lock isn't practical for your scenario, you'll have to be a bit more clever - for example, select the unique ID of all the rows that need deleting, and queue them for gradual deletion. I'll let you work that out yourself :)
Related
hello im new to mysql,
how to make a trigger on insert to check row count if it was 500 to delete the first 100 rows.
so the row count get back to 400 then add new 100 untill it reach 500 and when try to add new one the trigger delete the first 100
Please don't give me other options i just want to know this answer
Why bother? You can just have an auto-incrementing id and keep track of the most recent records. For real performance, you can have an index on (id desc).
Then, a query such as:
select t.*
from t
order by id desc
limit 500;
should be quite fast.
At your leisure, you can schedule a job -- say a weekly job -- that truncates the table to 400 or 500 rows if you really need to reduce the size of the table. However, such a table is so small that it is hard to believe that size considerations are involved.
i have a small question. Is it possible if you have a database, that you can keep only the last 10 records inserted in the table? Or that your table can only hold 10 records, so when a new record is inserted, it adds it but gets rid of the last inserted record?
I'm using mySQL and PhpMyAdmin.
You can do this, using a trigger.
I don't think I recommend actually doing this. There is overhead to deleting records. Instead, you can just create a view such as:
create view v_table as
select t.*
from t
order by insertionDateTime desc
limit 10;
This might seem expensive, but with an index on (insertionDateTime desc), the performance should be reasonable.
Here's one way:
delete from my_table where id not in (select id from my_table order by id desc limit 10);
Assuming id is a number field that is incremented on each insert. Timestamps would also work.
I have a table with more than 40 million records.i want to delete about 150000 records with a sql query:
DELETE
FROM t
WHERE date="2013-11-24"
but I get error 1206(The total number of locks exceeds the lock table size).
I searched a lot and change the buffer pool size:
innodb_buffer_pool_size=3GB
but it didn't work.
I also tried to lock tables but didn't work too:
Lock Tables t write;
DELETE
FROM t
WHERE date="2013-11-24";
unlock tables;
I know one solution is to split the process of deleting but i want this be my last option.
I am using mysql server, server OS is centos and server Ram is 4GB.
I'll appreciate any help.
You can use Limit on your delete and try deleting data in batches of say 10,000 records at a time as:
DELETE
FROM t
WHERE date="2013-11-24"
LIMIT 10000
You can also include an ORDER BY clause so that rows are deleted in the order specified by the clause:
DELETE
FROM t
WHERE date="2013-11-24"
ORDER BY primary_key_column
LIMIT 10000
There are a lot of quirky ways this error can occur. I will try to list one or two and perhaps the analogy holds true for someone reading this at some point.
On larger datasets even when changing innodb_buffer_pool_size to a larger value, you can hit this error when an adequate index is not in place to isolate the rows in the where clause. Or in some cases with the primary index (see this) and the comment from Roger Gammans:
From the (5.0 documentation for innodb):-
If you have no indexes suitable for your statement and MySQL must scan
the entire table to process the statement, every row of the table
becomes locked, which in turn blocks all inserts by other users to the
table. It is important to create good indexes so that your queries do
not unnecessarily scan many rows.
A visual of how this error can occur and difficult to solve is with this simple schema:
CREATE TABLE `students` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
`campusId` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_stu_cam` (`camId`)
) ENGINE=InnoDB;
A table with 50 Million rows. FK's not shown, not the issue. This table was originally for showing query performance also not important. Yet, in initializing thing=id in blocks of 1M rows, I had to perform a limit during the block update to prevent other problems, by using:
update students
set thing=id
where thing!=id
order by id desc
limit 1000000 ; -- 1 Million
This was all well until it got down to say 600000 left to update as seen by
select count(*) from students where thing!=id;
Why I was doing that count(*) stemmed from repeated
Error 1206: The total number of locks exceeds the lock table size
I could keep lowering my LIMIT shown in the above update, but in the end I would be left, say, with 1200 != in the count, and the problem just continued.
Why did it continue? Because the system filled the lock table as it scanned this large table. Sure, it might "intra implicit transaction" have changed those last 1200 row to equal, in my mind, but due to the lock table filling up, in reality would abort the transaction with nothing set. And the process would stalemate.
Illustration 2:
In this example, let's say I have 288 rows of the 50 Million row table that could be updated shown above. Due to the end-game problem described, I would often find a problem running this query twice:
update students set thing=id where thing!=id order by id desc limit 200 ;
But I would not have a problem with these:
update students set thing=id where thing!=id order by id desc limit 200;
update students set thing=id where thing!=id order by id desc limit 88 ;
Solutions
There are many ways to solve this, including but not limited to:
A. The creation of another index on a column suggesting the data had been updated, perhaps a boolean. And incorporating that into the where clause. Yet on huge tables, the creation of somewhat temporary indexes may be out of the question.
B. Populating a 2nd table with yet to be cleaned id's could be another solution. Coupled with and update with a join pattern.
C. Dynamically changing the LIMIT value so as to not cause an overrun of the lock table. The overrun can occur when there are simply no more rows to UPDATE or DELETE (your operation), the LIMIT has not been reached, and the lock table fills up in a fruitless scan for more that simply don't exist (seen above in Illustration2).
The main point of this answer is to offer an understanding of why it is happening. And for any reader to craft an end-game solution that fits their needs (versus, at times, fruitless changes to system variables, reboots, and prayers).
The simplest way is to create an index on the date column. I had 170 million rows and was deleting 6.5 million rows. I ran into the same problem and solved it by creating non-clustered index on the column which I was using in the WHERE clause then I executed the delete query and it worked.
Delete the index if you don't need it for future.
Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?
UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;
Note:
Assume the primary key is id and there is also a index on field1.
We are updating a single row.
We are not updating strictly the oldest row, we are updating the oldest row that matches a condition.
We want to update the oldest matching row, i.e the lowest id, i.e. the head of the FIFO queue.
Questions:
Is the ORDER BY id necessary? How does MySQL order by default?
Real world example
We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.
We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question on how we can efficiently mark the oldest item of the queue for processing, a.k.a. The row with the lowest ID. This query does the job:
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");
However, it is appearing in the mysql slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes has half a million rows and a overhead of 133.9 MiB. For example we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.
To stop the query appearing in the slow log we removed the ORDER BY id to stop a massive sort of the whole table. i.e.
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");
... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?
For reference, this is the structure of the email queue table:
CREATE TABLE IF NOT EXISTS `email_queue` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
`mem_id` int(10) NOT NULL,
`email` varchar(150) NOT NULL,
`processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
PRIMARY KEY (`id`),
KEY `processingID` (`processingID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Give this a read:
ORDER BY … LIMIT Performance Optimization
sounds like you have other processes locking the table preventing your update completing in a timely manner - have you considered using innodb ?
I think the 'slow part' comes from
WHERE processingID = 0
It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too.
The idea is to change above query to something like :
WHERE id = 0
Which theoretically will be faster since it uses index.
How about creating another table which contains ids of rows which hasn't been processed? Hence the insertion works twice. First to insert to the real table and the second is to insert id into 'table of hasn't processed'. The processing part too, needs to double its duty. First to retrieve an id from 'table of hasn't been processed' then delete it. The second job of processing part is to process of course.
Of course, the id column in 'table of hasn't been processed' needs to index its content. Just to ensure that selecting and deleting will be faster.
This question is old, but for reference for anyone ending up here:
You have a condition on processingID (WHERE processingID = 0), and within that constraint you want to order by ID.
What's happening with your current query is that it scans the table from the lowest ID to the greatest, stopping when it finds 1 record matching the condition. Presumably, it will first find a ton of old records, scanning almost the entire table until it finds an unprocessed one near the end.
How do we improve this?
Consider that you have an index on processingID. Technically, the primary key is always appended (which is how the index can "point" to anything in the first place). So you really have an index on processingID, id. That means ordering on that will be fast.
Change your ordering to: ORDER BY processingID, id
Since you have fixed processingID to a single value with you WHERE clause, this does not change the resulting order. However, it does make it easy for the database to apply both your condition and your ordering, without scanning any records that do not match.
One funny thing is that MySQL, by default, returns rows orderd by ID, instead in a casual way as stated in the relational theory (I am not sure if this behaviour is changed in the latest versions). So, the last row you get from a select should be the last inserted row. I would not use this way, of course.
As you said, the best solution is to use something like Resque, or RabbitMQ & co.
You could use an in-memory table, that is volatile, but much faster, than store, there the latest ID, or just use a my_isam table to add persistency. It is simple and fast in performance and it takes a little bit to implement.
I have a table which i would like only to have the 20 most recent entries, because the table is adding a row every .50 seconds.
How would I accomplish this in querying to the database(mysql).
Should i add a new row and delete the oldest whenever i want to push a new value in?
You could create a view. To guarantee ordering by recentness, introduce a column with a timestamp to order by (or use the primary key when using sequential numbering, such as AUTO_INCREMENT).
CREATE VIEW latest_entries AS
SELECT ... FROM TABLE foo ORDER BY created_time LIMIT 20;
Also, don't forget to 'clean' the underlying table from time to time.
If you really want to purge the 20th row of the table when you insert a new row, you will have to delete the row on insert. The best way is to create a Trigger to do this work for you.
CREATE TRIGGER Deleter AFTER INSERT on YourTable FOR EACH ROW
BEGIN
Delete from yourTable where ID = (Select max(id) from yourTable);
END;
Yes, you need to delete any rows older than the 20th oldest row every time you insert if you always must have only the latest 20 rows in the table.
You can have a view that returns the latest 20 rows, but your table will still grow if all you do is insert.
The best solution might be to use the view for querying, don't delete every time you insert, but once a day at off time run a delete that leaves only the latest 20 rows.
Just make sure the timestamp column is indexed.