I have set up a simple event that runs every hour and flags expired records like this:
ON SCHEDULE EVERY 1 HOUR STARTS '2015-01-01 00:00:00'
DO
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE a INT;
  DECLARE cursor_1 CURSOR FOR
    SELECT item_id FROM item WHERE NOW() > expiration_date AND has_expired = 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cursor_1;
  read_loop: LOOP
    FETCH cursor_1 INTO a;
    IF done THEN
      LEAVE read_loop;
    END IF;
    UPDATE item SET has_expired = 1 WHERE item_id = a;
    INSERT INTO item_log (item_id, message) VALUES (a, 'Item is now expired');
  END LOOP;
  CLOSE cursor_1;
END
This runs 24 times a day and works as expected. However, there is another idea: create events dynamically and attach each one to a given record, e.g.
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 3 WEEK
DO
BEGIN
  UPDATE item SET has_expired = 1 WHERE item_id = 232;
  INSERT INTO item_log (item_id, message) VALUES (232, 'Item is now expired');
END
Of course the above would have different interval values and IDs, but it would mean there could be thousands or tens of thousands of events.
Now, would that be a problem in terms of limitations and performance?
I can imagine that if there are no records, or only a few created per month, then the first approach will be constantly running for nothing. However, if a few items are added every hour, the database could accumulate thousands of one-time events. Would that not cause problems of its own?
Would you like to make that 100% faster? Get rid of it. Instead, have your SELECTs include the clause AND (expiration_date < NOW()).
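For example (a sketch only; the table and column names come from the question, and the exact surrounding queries are assumed), the flag test simply becomes a date test:

-- Wherever a query currently asks "is this item expired?":
SELECT item_id FROM item WHERE expiration_date < NOW();

-- ...and wherever it asks "is this item still live?":
SELECT item_id FROM item WHERE expiration_date >= NOW();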
OK, so you asked about the code. Here are some comments:
The UPDATE and INSERT need to be in a transaction.
The SELECT needs FOR UPDATE and should be in the transaction, too. But this is less important because, unless you ever change expiration_date, it never matters.
Cursors suck, performance-wise. Select the rows to purge, then run one UPDATE and one INSERT (see the sketch after these comments).
Scanning the table for this flag will be a slow "table scan" unless you have an index starting with expiration_date.
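If the hourly event is kept, its body can drop the cursor entirely. A minimal sketch, assuming the table and columns from the question and an index starting with expiration_date:

-- ALTER TABLE item ADD INDEX (expiration_date);   -- lets the WHERE clause avoid a table scan

SET @cutoff = NOW();        -- fix one cut-off so both statements see the same rows

START TRANSACTION;

INSERT INTO item_log (item_id, message)
SELECT item_id, 'Item is now expired'
FROM item
WHERE expiration_date < @cutoff
  AND has_expired = 0;

UPDATE item
SET has_expired = 1
WHERE expiration_date < @cutoff
  AND has_expired = 0;

COMMIT;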
DELIMITER $$
CREATE PROCEDURE sp_delete_data()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE loop_counter INT DEFAULT 0;
  DECLARE retain_days DATETIME;
  DECLARE delete_days DATETIME;
  SET loop_counter = (SELECT ROUND(COUNT(*)/100, 0) FROM data2
                      WHERE datetime < (SELECT DATE_ADD(MIN(datetime), INTERVAL 1 DAY) FROM data2));
  SET retain_days = (SELECT DATE_SUB(NOW(), INTERVAL 5 DAY));                      -- keep 5 days of data
  SET delete_days = (SELECT DATE_ADD(MIN(datetime), INTERVAL 1 DAY) FROM data2);   -- oldest data in data2
  WHILE i <= loop_counter DO
    IF retain_days > delete_days THEN
      DELETE FROM data2 WHERE datetime < delete_days LIMIT 1000;
    END IF;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
I want to keep only the last 5 days of data and delete the rest if it is older than the retention date. Each day generates almost 2,000,000 rows, so it is difficult to delete everything in one shot; that is why I want to delete 100,000 rows in each loop iteration.
Here:
The loop_counter variable determines how many loop iterations are needed for that day's data.
The retain_days variable holds the retention cut-off date.
The delete_days variable holds the date up to which old data should be deleted.
Data is retained or deleted based on the retain_days and delete_days variables.
Finally, this procedure is called by an event every day.
My loop is not working as expected, and I need an expert solution.
If there is any performance issue with deleting data like this, please let me know. Thanks in advance.
Just create an event that runs once a day:
CREATE EVENT purge_old_data
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP()
ON COMPLETION PRESERVE
COMMENT 'Delete rows older than 5 days'
DO
BEGIN
  DELETE FROM data2
  WHERE `datetime` < DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 DAY);
END;
You should not use a loop to delete smaller chunks; in SQL databases, loops actually make performance worse. Even if you have to delete hundreds of millions of rows in the first run, it is really not a problem for MariaDB.
Use PARTITION BY RANGE, with each partition being, say, 2 hours' worth of data. Then DROP PARTITION will very rapidly drop data -- much better than DELETE. (A sketch follows below the links.)
More on using partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
Alternatives: http://mysql.rjweb.org/doc.php/deletebig
In particular, the second link shows how to 'continually' run through the data via the PRIMARY KEY, deleting 1000 rows at a time, and starting over when finished (sketched further below).
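A sketch of the partition approach for this table (daily partitions shown for brevity; the partition names and dates are placeholders, and RANGE partitioning requires `datetime` to be part of every unique key, including the primary key):

ALTER TABLE data2
PARTITION BY RANGE (TO_DAYS(`datetime`)) (
  PARTITION p20230101 VALUES LESS THAN (TO_DAYS('2023-01-02')),
  PARTITION p20230102 VALUES LESS THAN (TO_DAYS('2023-01-03')),
  PARTITION p20230103 VALUES LESS THAN (TO_DAYS('2023-01-04')),
  PARTITION pfuture   VALUES LESS THAN MAXVALUE
);

-- Purging a whole day is then a near-instant metadata operation, far faster than DELETE:
ALTER TABLE data2 DROP PARTITION p20230101;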
Note: The following is problematic:
DELETE FROM data2
WHERE `datetime` < delete_days
LIMIT 1000;
Without INDEX(datetime), it will spend much of its time looking for any rows to delete. With such an index, there is still the overhead of bouncing between the index and the data 1000 times. In either case, the 1000 rows must be put into the redo log for the off chance of a crash.
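For comparison, the 'walk the PRIMARY KEY' approach from the second link looks roughly like this (a sketch only; it assumes data2 has an auto-increment primary key named id, which the question does not show):

SET @last_id = (SELECT MIN(id) FROM data2);

-- One chunk: delete only the old rows inside a small, contiguous PK range.
DELETE FROM data2
WHERE id >= @last_id
  AND id <  @last_id + 1000
  AND `datetime` < DATE_SUB(NOW(), INTERVAL 5 DAY);

SET @last_id = @last_id + 1000;
-- Repeat (from an event or a small procedure) until @last_id passes MAX(id), then start
-- over the next day. Each chunk is short, so no statement holds locks for long.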
I have a procedure in a MySQL database that collects some data from multiple tables and loops through it.
I have created a table to insert this data into; the table has a primary key and a non-unique index.
The inserted data is about 200,000 rows. The insert is done in a few seconds, but the while loop takes a very long time (about 30 minutes) to complete!
The while loop code is something like this:
SET @I = 0;
WhileLoop: WHILE (1 = 1) DO
  SELECT KeyRW
  INTO @I
  FROM MyTable
  WHERE KeyRW > @I
  ORDER BY KeyRW
  LIMIT 1;
  IF @I IS NULL THEN
    LEAVE WhileLoop;
  END IF;
  -- (some simple calculation...)
END WHILE WhileLoop;
We moved the loop code into another procedure and executed it manually after the insert was done (there was a delay of a few minutes), and it ran much faster!
Then we moved the loop back into the previous procedure and added a delay before it, and now it works. It seems that MySQL indexes the data asynchronously and we should wait for it.
Did I understand correctly?
If yes, how long should we wait, based on the size of the data?
If not, what is the problem, and why does a delay solve it?
These are the statements
INSERT INTO toolate (name,type,date)
SELECT name, type, date
FROM homework
WHERE date < CURRENT_DATE()
and
DELETE FROM homework WHERE date < CURRENT_DATE()
I need to combine these two so that my event runs them in the proper order: first the INSERT statement, then the DELETE.
That way I can still see homework that is past its date while keeping the homework table clean. It needs to happen automatically, which is why I'm using events. Of course, I would welcome a different solution.
You can't combine these two in a single query. However, an alternative would be to use a STORED PROCEDURE and execute both inside a transaction with an EXIT HANDLER, e.g.:
BEGIN
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    -- Any error rolls the whole transaction back and leaves the procedure.
    ROLLBACK;
  END;

  START TRANSACTION;

  INSERT INTO toolate (name, type, date)
  SELECT name, type, date
  FROM homework
  WHERE date < CURRENT_DATE();

  DELETE FROM homework
  WHERE date < CURRENT_DATE();

  COMMIT;
END
This will make sure both queries are executed sequentially, and if the DELETE query fails, the INSERT will be rolled back.
Here's MySQL's documentation for stored procedures.
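To make it automatic, the stored procedure can then be called from the event. A sketch, assuming the procedure above was created under the (made-up) name archive_homework:

-- The event scheduler must be enabled: SET GLOBAL event_scheduler = ON;
CREATE EVENT archive_old_homework
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP
DO
  CALL archive_homework();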
I’ve got a table of bookings with start and end times, and no two bookings can overlap.
I need to check that a new booking won't overlap with any existing bookings. However, we've got very high load, so there's a race condition: two overlapping bookings can both be successfully inserted, because the first booking was inserted after the second booking had checked for overlaps.
I’m trying to solve this by taking a lock on a related resource using a BEFORE INSERT database trigger.
DELIMITER //
CREATE TRIGGER booking_resource_double_booking_guard BEFORE INSERT ON booking_resource
FOR EACH ROW BEGIN
  DECLARE overlapping_booking_resource_id INT DEFAULT NULL;
  DECLARE msg VARCHAR(255);
  DECLARE ignored INT DEFAULT NULL;

  -- Take an exclusive lock on the resource in question for the duration of the current
  -- transaction. This will prevent double bookings.
  SELECT resource_id INTO ignored
  FROM resource
  WHERE resource_id = NEW.resource_id
  FOR UPDATE;

  -- Now we have the lock, check for optimistic locking conflicts:
  SELECT booking_resource_id INTO overlapping_booking_resource_id
  FROM booking_resource other
  WHERE other.booking_from < NEW.booking_to
    AND other.booking_to > NEW.booking_from
    AND other.resource_id = NEW.resource_id
  LIMIT 1;

  IF overlapping_booking_resource_id IS NOT NULL THEN
    SET msg = CONCAT('The inserted times overlap with booking_resource_id: ', overlapping_booking_resource_id);
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = msg;
  END IF;
END
//
If I put this trigger in the database and insert two bookings asynchronously from the command line, the trigger successfully blocks the overlapping booking. I've also tried putting a SLEEP before the last IF statement in the trigger, to make sure that the lock really has been taken out.
However, I have a load testing environment in Jenkins which runs a lot of bookings concurrently using jMeter. When I put this trigger there and run the load tests, no overlapping bookings are caught, i.e. double bookings are made.
Some checks I’ve done:
I've logged the SQL queries that the load test script generates when creating a booking, and they are the same as the SQL I use on the command line.
The trigger is definitely being triggered in the load test environment, and it is definitely not catching any overlapping bookings. I ascertained this by inserting the “overlapping_booking_resource_id” variable from the trigger into another table. All the values were null.
The trigger works in the load test environment when inserting bookings from the command line, i.e. it prevents the overlapping booking from being inserted.
If I make the constraint for what counts as a "double booking" slightly too strict, i.e. adjacent bookings count as double bookings, then I do see things being caught by the trigger – that is, the Apache log records several errors with the message 'The inserted times overlap with booking_resource_id:'.
I'm wondering if maybe the lock is only held until the end of the trigger, and there is still a race condition between the end of the trigger and the actual insert into the table. However, this doesn't explain why none of the overlapping bookings are ever caught.
I’m really stuck now. Does anyone have any ideas as to what I have done wrong?
A less elegant but more robust method would be to use a table made for locking records across the system.
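For this to work, the locking table needs a uniqueness constraint on the resource id, so that a second concurrent insert for the same resource cannot succeed while the first transaction still holds the row. A sketch of such a table (its exact definition is assumed; only the name lockresource comes from the trigger below):

CREATE TABLE lockresource (
  lockid INT NOT NULL,
  PRIMARY KEY (lockid)
) ENGINE=InnoDB;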
DELIMITER //
CREATE TRIGGER booking_resource_double_booking_guard BEFORE INSERT ON booking_resource
FOR EACH ROW BEGIN
  DECLARE overlapping_booking_resource_id INT DEFAULT NULL;
  DECLARE msg VARCHAR(255);
  DECLARE locked BOOLEAN DEFAULT FALSE;
  -- Take an exclusive lock on the resource in question for the duration of the current
  -- transaction. This will prevent double bookings.
  -- CHANGED HERE
  REPEAT
    BEGIN
      -- If the lock row already exists (someone else holds it), the insert fails and the
      -- handler clears the flag so we try again.
      DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
      BEGIN
        SET locked = FALSE;
      END;
      SET locked = TRUE;
      INSERT INTO lockresource VALUES (NEW.resource_id);
    END;
  UNTIL locked END REPEAT;
  -- TIL HERE
  -- Now we have the lock, check for optimistic locking conflicts:
  SELECT booking_resource_id INTO overlapping_booking_resource_id
  FROM booking_resource other
  WHERE other.booking_from < NEW.booking_to
    AND other.booking_to > NEW.booking_from
    AND other.resource_id = NEW.resource_id
  LIMIT 1;
  IF overlapping_booking_resource_id IS NOT NULL THEN
    SET msg = CONCAT('The inserted times overlap with booking_resource_id: ', overlapping_booking_resource_id);
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = msg;
  END IF;
END
//
-- ADDED FROM HERE
DELIMITER //
CREATE TRIGGER booking_resource_double_booking_guard_after AFTER INSERT ON booking_resource
FOR EACH ROW BEGIN
  -- Release the lock row; ignore any problem if it is already gone.
  DECLARE CONTINUE HANDLER FOR SQLWARNING BEGIN END;
  DELETE FROM lockresource WHERE lockid = NEW.resource_id;
END
//
Anyway, that's the idea, and it would certainly prevent losing the lock before your validation completes.
Hey guys, here is one I am not able to figure out. We have a table in the database where PHP inserts records. I created a trigger to compute a value to be inserted as well. The computed value should be unique, but from time to time I get exactly the same number for a few rows in the table. The number is a combination of year, month and day, plus the order's sequence number for that day. I thought that a single insert operation is atomic and that the table is locked while the transaction is in progress. I need the computed value to be unique. The server is version 5.0.88, running on Linux CentOS 5 with a dual-core processor.
Here is the trigger:
CREATE TRIGGER bi_order_data BEFORE INSERT ON order_data
FOR EACH ROW BEGIN
SET NEW.auth_code = get_auth_code();
END;
The corresponding routine looks like this:
CREATE FUNCTION `get_auth_code`() RETURNS BIGINT(20)
BEGIN
  DECLARE my_auth_code, acode BIGINT;
  SELECT MAX(d.auth_code) INTO my_auth_code
  FROM orders_data d
  JOIN orders o ON (o.order_id = d.order_id)
  WHERE DATE(NOW()) = DATE(o.date);
  IF my_auth_code IS NULL THEN
    SET acode = ((DATE_FORMAT(NOW(), '%y%m%d')) + 100000) * 10000 + 1;
  ELSE
    SET acode = my_auth_code + 1;
  END IF;
  RETURN acode;
END
"I thought that single operation of insert is atomic and table is locked while transaction is in progress"
Either the table is locked (if MyISAM is used) or individual records may be locked (if InnoDB is used), not both.
Since you mentioned "transaction", I assume that InnoDB is in use.
One of InnoDB's advantages is the absence of table locks, so nothing prevents many trigger bodies from executing simultaneously and producing the same result.
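A schematic interleaving of two concurrent inserts shows how the duplicates appear (not literal output, just the sequence of events):

-- t1: session A fires the BEFORE INSERT trigger; get_auth_code() reads MAX(auth_code) = N
-- t2: session B fires its trigger; its read also sees MAX(auth_code) = N,
--     because A's row is not committed yet and a plain SELECT takes no locks
-- t3: session A's row is inserted with auth_code = N + 1
-- t4: session B's row is inserted with auth_code = N + 1   <-- duplicate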