MySQL stored procedure on big table eats server disk space - mysql

I have inherited a MySQL InnoDB table with around 500 million rows. The table has IP numbers and the name of the ISP to which that number belongs, both as strings.
Sometimes, I need to update the name of an ISP to a new value, after company changes such as mergers or rebranding. But, because the table is so big, a simple UPDATE...WHERE statement doesn't work: the query usually times out, or the box runs out of memory.
So, I have written a stored procedure which uses a cursor to try and make the change one record at a time. When I run the procedure on a small sample table, it works perfectly. But, when I try to run it against the whole 500 million row table in production, I can see a temporary table gets created (because a /tmp/xxx.MYI and /tmp/xxx.MYD file appear). The temporary table file keeps growing in size until it uses all available disk space on the box (around 40 GB).
I'm not sure why this temporary table is necessary. Is the server trying to maintain some kind of rollback state? My real question is, can I change the stored procedure such that the temporary table is not created? I don't really care if some, but not all of the records get updated - I can easily add some reporting and just keep running the proc until no records are altered.
At this time, architecture changes are not really an option – I can't change the structure of the table, for example.
Thanks in advance for any help.
David
This is my stored proc;
DELIMITER $$
DROP PROCEDURE IF EXISTS update_isp;
CREATE PROCEDURE update_isp()
BEGIN
    DECLARE v_finished INT DEFAULT 0;
    DECLARE v_num VARCHAR(255) DEFAULT "";
    DECLARE v_isp VARCHAR(255) DEFAULT "";
    DECLARE ip_cursor CURSOR FOR
        SELECT ip_number, isp FROM ips;
    DECLARE CONTINUE HANDLER
        FOR NOT FOUND SET v_finished = 1;
    OPEN ip_cursor;
    get_ip: LOOP
        IF v_finished = 1 THEN
            LEAVE get_ip;
        END IF;
        FETCH ip_cursor INTO v_num, v_isp;
        IF v_isp = 'old name' THEN
            UPDATE ips SET isp = 'new name' WHERE ip_number = v_num;
        END IF;
    END LOOP get_ip;
    CLOSE ip_cursor;
END$$
DELIMITER ;
CALL update_isp();
I have also tried wrapping the update statement in a transaction. It didn't make any difference.
[EDIT] My assumption below, that a simple counting procedure does not create a temporary table, was wrong. The temporary table is still created, but it grows more slowly and the box does not run out of disk space before the procedure completes.
So the problem seems to be that any use of a cursor in a stored procedure results in a temporary table being created. I have no idea why, or if there is any way to prevent this.

If your update is essentially:
UPDATE ips
SET isp = 'new name'
WHERE isp = OLDNAME;
I am guessing that this update -- without the cursor -- will work better if you have an index on ips(isp):
create index idx_ips_isp on ips(isp);
Your original query should be fine once this index is created. There should be no performance issue updating a single row even in a very large table. The issue is in all likelihood finding the row, not updating it.
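If a single UPDATE is still too heavy even with the index, one option is to apply it in bounded batches. The following is only a sketch under assumptions: it assumes the index on ips(isp) exists, that autocommit is on so each batch commits on its own, and the procedure name update_isp_batched and the batch size of 10000 are invented for illustration. Since it uses no cursor, it should not need the cursor's temporary table.

```sql
DELIMITER $$
DROP PROCEDURE IF EXISTS update_isp_batched$$
CREATE PROCEDURE update_isp_batched()   -- hypothetical name
BEGIN
    -- Rows changed by the most recent batch; start non-zero to enter the loop.
    DECLARE v_rows INT DEFAULT 1;
    WHILE v_rows > 0 DO
        -- Each pass changes at most 10000 matching rows (illustrative size).
        UPDATE ips
        SET isp = 'new name'
        WHERE isp = 'old name'
        LIMIT 10000;
        -- ROW_COUNT() reports how many rows the UPDATE just changed.
        SET v_rows = ROW_COUNT();
    END WHILE;
END$$
DELIMITER ;

CALL update_isp_batched();
```

The loop ends once a batch changes no rows, so it is safe to re-run after an interruption: rows already renamed no longer match the WHERE clause.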

I don't think there is a solution to this problem.
From this page; http://spec-zone.ru/mysql/5.7/restrictions_cursor-restrictions.html
In MySQL, a server-side cursor is materialized into an internal
temporary table. Initially, this is a MEMORY table, but is converted
to a MyISAM table when its size exceeds the minimum value of the
max_heap_table_size and tmp_table_size system variables.
I misunderstood how cursors work. I assumed that my cursor functioned as a pointer to the underlying table. But, it seems MySQL must build the full result set first, and then give you a pointer to that. So, I don't really understand the benefits of cursors in MySQL. Thanks to everyone who tried to help.
David

If the table also has a numeric index, you can specify a range such as
WHERE myindex > 123 AND myindex < 456
in your update query and repeat that for a series of intervals (with a loop, for example) until the whole table is covered.
(sorry, my rep is too low to ask in the comment section, so I'll just post my guess-answer here to be able to comment on)
You could try to fake a numeric index with
SELECT ROW_NUMBER() OVER (ORDER BY oneofyourcolumns) AS n, thetable.* FROM thetable;
(ROW_NUMBER() is a window function, so it needs an OVER clause and MySQL 8.0 or later) and then try what I suggested above.
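The interval idea could look roughly like the sketch below. It rests on an assumption the question does not confirm: that ips has an indexed integer column, here called id. The procedure name update_isp_by_range and the step size are likewise made up. With autocommit on, each range is committed separately.

```sql
DELIMITER $$
DROP PROCEDURE IF EXISTS update_isp_by_range$$
CREATE PROCEDURE update_isp_by_range(IN p_step BIGINT)  -- p_step must be > 0
BEGIN
    DECLARE v_lo BIGINT;
    DECLARE v_max BIGINT;
    -- `id` is a hypothetical indexed integer column on ips.
    SELECT MIN(id), MAX(id) INTO v_lo, v_max FROM ips;
    -- On an empty table both values are NULL and the loop body never runs.
    WHILE v_lo <= v_max DO
        -- Half-open range [v_lo, v_lo + p_step) so no id is skipped or repeated.
        UPDATE ips
        SET isp = 'new name'
        WHERE id >= v_lo AND id < v_lo + p_step
          AND isp = 'old name';
        SET v_lo = v_lo + p_step;
    END WHILE;
END$$
DELIMITER ;

CALL update_isp_by_range(100000);
```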

Related

RAND() used in SQL trigger to get random number. What happens if it generates already used number?

I'm working on a system where I need to generate a six-digit code for every user that signs up. I'm using the expression SELECT LEFT(CAST(RAND()*1000000000+999999 AS INT),6) to generate it, and I have made that particular column UNIQUE. All of this happens in a trigger. My question is: what happens if the number generated by RAND() is already in use? Will the trigger be executed again, since that column is UNIQUE? Or do I need to write a condition in the trigger itself? If so, please help me with it.
If the randomizer generates a value that has already been used, and stores it in a column that has a UNIQUE constraint, then the row will violate the constraint, and the INSERT and any other data changed by the trigger will be cancelled.
The trigger will not retry. A retry would need to be executed by your application code, after catching the error.
It would be far simpler to use a table's auto-increment mechanism to guarantee that values are not reused.
An example. Use with caution!!!
CREATE TRIGGER tr_bi_generate_pin
BEFORE INSERT
ON test
FOR EACH ROW
BEGIN
    REPEAT
        SET NEW.pin = CEIL(255 * RAND()); -- 255 is MAXVALUE for TINYINT UNSIGNED
        SET NEW.iterations = NEW.iterations + 1;
    UNTIL NOT EXISTS ( SELECT NULL
                       FROM test
                       WHERE pin = NEW.pin ) END REPEAT;
END
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=11c263a2eb07b8db133ae13a3d22e549
This code is relatively safe: it counts the number of iterations, and if the count reaches 256 the insertion fails. But on a real system, without such counting and with a wider datatype, the code may hang the server because of a very long, effectively infinite loop. So add a check on the maximum number of iterations: a failed query is better than a hung server.
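An explicit iteration cap might be sketched as follows. This is an illustration, not the code from the fiddle: the table users, its pin column (assumed to be an integer type wide enough for six digits, with a UNIQUE constraint), the trigger name, and the limit of 100 attempts are all assumptions.

```sql
DELIMITER $$
CREATE TRIGGER tr_bi_users_pin          -- hypothetical trigger name
BEFORE INSERT ON users                  -- hypothetical table
FOR EACH ROW
BEGIN
    DECLARE v_attempts INT DEFAULT 0;
    REPEAT
        -- Give up with an error instead of looping indefinitely.
        IF v_attempts >= 100 THEN
            SIGNAL SQLSTATE '45000'
                SET MESSAGE_TEXT = 'Could not generate an unused pin';
        END IF;
        -- Random integer in the range 100000..999999 (always six digits).
        SET NEW.pin = FLOOR(100000 + RAND() * 900000);
        SET v_attempts = v_attempts + 1;
    UNTIL NOT EXISTS (SELECT 1 FROM users WHERE pin = NEW.pin)
    END REPEAT;
END$$
DELIMITER ;
```

Two concurrent inserts can still pick the same value between the check and the write, so the UNIQUE constraint should stay in place as the final safeguard.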

Mysql/adventureworks database/trigger question

Good afternoon. I have been having some trouble trying to understand this question in class. The purpose of this assignment is to add parameters around data entry through triggers that are launched when specific conditions are met.
Due to warehousing limitations, inventory over 800 units needs to be sent to an external storage site and tracked separately. You have been asked to monitor when an update will exceed this boundary so it can be addressed in production meetings.
Write a trigger titled "tgrExcessInventory" for the Production.ProductInventory table to ensure the quantity can never exceed 800 units. This is step one.
Modify the trigger created in step 1 to execute its check code only if the Quantity column is updated. I successfully created the trigger, but I am having trouble understanding how to modify it. This is what I have so far. I have seen a few other posts on here similar to this question, but I haven't seen any with the modification done. I feel it's something small I'm missing. From my understanding, I need to write an ALTER statement?
CREATE TRIGGER tgrExcessInventory
ON Production.ProductInventory
FOR UPDATE
AS
IF EXISTS
    (SELECT 'True'
     FROM Inserted i
     JOIN Deleted d
         ON i.productID = d.ProductID
         AND i.locationID = d.LocationID
     WHERE (d.quantity + i.quantity) >= 800 OR
           i.quantity >= 800
    )
BEGIN
    RAISERROR('Cant increase supply where units would be over 800 units',16,1)
    ROLLBACK TRAN
END
Then I did the alter function
ALTER TRIGGER [Production].[tgrExcessInventory]
ON [Production].[ProductInventory]
FOR UPDATE
AS
IF EXISTS
    (SELECT 'True'
     FROM Inserted I
     JOIN Deleted D
         ON i.Quantity = d.quantity
         AND i.Quantity = d.Quantity
     WHERE (d.quantity + i.quantity) >= 800 OR
           i.quantity >= 800
    )
BEGIN
    RAISERROR('Cant increase supply where units would be over 800 units',16,1)
    ROLLBACK TRAN
END
It seems to work. I believe I did this right; any tips would be appreciated. Thanks for your time.
If you read here: https://dba.stackexchange.com/questions/193219/alter-procedure-in-mysql
regarding ALTER PROCEDURE
This statement can be used to change the characteristics of a stored procedure. More than one change may be specified in an ALTER PROCEDURE statement. However, you cannot change the parameters or body of a stored procedure using this statement; to make such changes, you must drop and re-create the procedure using DROP PROCEDURE and CREATE PROCEDURE.
You can easily alter a procedure using an ALTER statement if it is a minor change. If you want to make bigger changes, use SHOW CREATE PROCEDURE tgrExcessInventory, make your changes, drop the existing procedure with DROP PROCEDURE IF EXISTS tgrExcessInventory, and then run the CREATE PROCEDURE statement with your changes.
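Note that the trigger in the question uses SQL Server syntax (Inserted/Deleted, RAISERROR, ROLLBACK TRAN) rather than MySQL. There, ALTER TRIGGER redefines the whole body, and the "only if the Quantity column is updated" requirement is commonly expressed with IF UPDATE(column). A minimal sketch, assuming SQL Server and assuming the rule is simply that the new Quantity may not exceed 800 (the question's own predicate is different):

```sql
ALTER TRIGGER Production.tgrExcessInventory
ON Production.ProductInventory
FOR UPDATE
AS
BEGIN
    -- UPDATE(Quantity) is true when the statement assigns the Quantity column,
    -- even if the assigned value equals the old one.
    IF UPDATE(Quantity)
    BEGIN
        -- Assumed rule: reject any updated row whose new Quantity is above 800.
        IF EXISTS (SELECT 1 FROM inserted AS i WHERE i.Quantity > 800)
        BEGIN
            RAISERROR('Quantity cannot exceed 800 units.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END
END
```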

Parameter sniffing on table valued parameters

I'm fairly certain that adding parameter sniffing to table valued parameters is of little or no value however I was wondering if someone could confirm this?
(INT_LIST is a user defined table type which is a single column of type INT)
CREATE PROCEDURE [dbo].[TVPSniffTest](
    @param1 varchar(50),
    @idList INT_LIST readonly
)
AS
BEGIN
    DECLARE @param1_sniff VARCHAR(50) = @param1 --this is worth doing
    DECLARE @idList_sniff INT_LIST
    INSERT INTO @idList_sniff SELECT value FROM @idList --will this help?
    --query code here
END
As Jeroen already mentioned, there is no parameter sniffing issue with TVPs, and one option to mitigate the lack of statistics is to copy the TVP to a local temp table (which does maintain statistics).
But, another option that is sometimes more efficient is to do a statement-level recompile on any queries using the table variable (i.e. the TVP). The statistics won't be maintained across queries so it needs to be done on any query that involves the table variable that is not something like a simple SELECT.
The following illustrates this behavior:
DECLARE @TableVariable TABLE (Col1 INT NOT NULL);
INSERT INTO @TableVariable (Col1)
    SELECT so.[object_id]
    FROM [master].[sys].[objects] so;
-- Control-M to turn on "Include Actual Execution Plan".
-- For each of the 3 following queries, hover over the "Table Scan"
-- operator to see the "Estimated Number of Rows".
SELECT * FROM @TableVariable; -- Estimated Number of Rows = 1 (incorrect)
SELECT * FROM @TableVariable
OPTION (RECOMPILE); -- Estimated Number of Rows = 91 (correct)
SELECT * FROM @TableVariable; -- Estimated Number of Rows = 1 (back to incorrect)
This has no effect whatsoever -- in fact, it's detrimental to performance because you're copying the whole table first.
The optimizer maintains no statistics for either table-valued parameters or table variables. This can easily lead to bad query plans with cardinality mismatches; the solution for that is usually an intermediate temp table. In any case, parameter sniffing won't be an issue -- the table contents are never used to optimize the query plan.
Incidentally, while you can assign the parameter to a local variable to circumvent sniffing, a more flexible option is to use the OPTIMIZE FOR or RECOMPILE hints in queries that are particularly affected (or WITH RECOMPILE on the whole stored procedure, but that's a little more drastic). This prevents cluttering the procedure with copies of everything.
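As a rough illustration of those hints, the sketch below applies them per statement instead of copying parameters into local variables. The table dbo.Items and its id and category columns are hypothetical, as is the procedure name; INT_LIST is the table type from the question, with its single column named value.

```sql
CREATE PROCEDURE dbo.TVPHintSketch      -- hypothetical name
    @param1 VARCHAR(50),
    @idList INT_LIST READONLY
AS
BEGIN
    -- Statement-level recompile: the plan is compiled on each execution using
    -- the current @param1 value and the actual row count of @idList.
    SELECT i.*
    FROM dbo.Items AS i                 -- hypothetical table
    JOIN @idList AS l ON l.value = i.id
    WHERE i.category = @param1
    OPTION (RECOMPILE);

    -- Alternative: keep a cached plan, but have the optimizer ignore the
    -- sniffed value and estimate from average density instead.
    SELECT i.*
    FROM dbo.Items AS i
    WHERE i.category = @param1
    OPTION (OPTIMIZE FOR (@param1 UNKNOWN));
END
```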

How to sync values in a table column in mysql trigger

I need to sync values in a table column in mysql trigger while having the same value in another column. Here is an example of my table:
id   MP   sweek
1    2    1
2    2    1
3    1    2
4    1    2
5    3    3
6    3    3
If a user changes, for example, MP in the first row (id=1) from 2 to 4, then the value of MP with the same sweek has to be changed (e.g., id=2, MP becomes also 4).
I wrote a BEFORE UPDATE trigger that does not work:
USE moodle;
DELIMITER $$
CREATE TRIGGER trigger_course_minpostUPD BEFORE UPDATE ON moodle.mdl_course_sections FOR EACH ROW
BEGIN
    IF NEW.MP <> OLD.MP THEN
        BEGIN
            SET @A=NEW.MP;
            SET NEW.MP = @A
            WHERE OLD.sweek=NEW.sweek;
        END;
    END IF;
END$$
DELIMITER ;
From within a MySQL trigger you are not able to affect other rows on the same table.
You would want to say something like:
UPDATE my_table SET MP=NEW.MP WHERE sweek = NEW.sweek
But - sorry - no go.
There are hacks around this -- ugly ones, too.
If your table is MyISAM, you can wrap it up with a MERGE table, and act on the MERGE table instead (MySQL doesn't realize at that point you're actually hacking around it).
However, using MyISAM as a storage engine may not be a good thing -- today's focus is on InnoDB, a much more sophisticated engine.
Another trick is to try and use the FEDERATED engine. See relevant post by Roland Bouman. Again, this is a dirty hack.
I would probably let the application do the thing within the same transaction.
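What the application would send might look like the sketch below. It assumes the table uses a transactional engine such as InnoDB, and it uses the example values from the question (row id = 1, new MP = 4); the user variable name @target_week is made up.

```sql
START TRANSACTION;

-- Find the week of the row the user edited and lock that row.
SELECT sweek INTO @target_week
FROM mdl_course_sections
WHERE id = 1
FOR UPDATE;

-- Apply the new MP to every row of that week, including the edited one.
UPDATE mdl_course_sections
SET MP = 4
WHERE sweek = @target_week;

COMMIT;
```

Because the update is keyed on sweek rather than on a single id, no trigger is needed to keep the rows of one week in step.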

MySQL trigger : is it possible to delete rows if table become too large?

When inserting a new row in a table T, I would like to check if the table is larger than a certain threshold, and if it is the case delete the oldest record (creating some kind of FIFO in the end).
I thought I could simply make a trigger, but apparently MySQL doesn't allow the modification of the table on which we are actually inserting :
Code: 1442 Msg: Can't update table 'amoreAgentTST01' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
Here is the trigger I tried :
DELIMITER $$
CREATE TRIGGER test
AFTER INSERT ON amoreAgentTST01
FOR EACH ROW
BEGIN
    DECLARE table_size INTEGER;
    DECLARE new_row_size INTEGER;
    DECLARE threshold INTEGER;
    DECLARE max_update_time TIMESTAMP;
    SELECT SUM(OCTET_LENGTH(data)) INTO table_size FROM amoreAgentTST01;
    SELECT OCTET_LENGTH(NEW.data) INTO new_row_size;
    SELECT 500000 INTO threshold;
    SELECT MAX(updatetime) INTO max_update_time FROM amoreAgentTST01;
    IF (table_size+new_row_size) > threshold THEN
        DELETE FROM amoreAgentTST01 WHERE max_update_time = updatetime; -- and check if not current
    END IF;
END$$
DELIMITER ;
Do you have any idea how to do this within the database?
Or is it clearly something to be done in my program?
Ideally you should have a dedicated archive strategy in a separate process that runs at off-peak times.
You could implement this either as a scheduled stored procedure (yuck) or an additional background worker thread within your application server, or a totally separate application service. This would be a good place to put other regular housekeeping jobs.
This has a few benefits. Apart from avoiding the trigger issue you're seeing, you should consider the performance implications of anything happening in a trigger. If you do many inserts, the trigger will do that work on every one of them and effectively halve the performance, not to mention the lock contention that will arise as other processes try to access the same table.
A separate process that does housekeeping work minimises lock contention, and allows the work to be carried out as a high-performance bulk operation, in a transaction.
One last thing - you should possibly consider archiving records to another table or database, rather than deleting them.
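If the housekeeping has to live inside the database, the scheduled-procedure variant could be sketched as below. This is an assumption-laden outline: the procedure and event names, the hourly schedule, and the batch size of 100 are invented, the 500000-byte threshold is taken from the question's trigger, and the MySQL event scheduler must be enabled (event_scheduler=ON) for the event to run.

```sql
DELIMITER $$
CREATE PROCEDURE trim_amore_agent(IN p_threshold BIGINT)   -- hypothetical name
BEGIN
    DECLARE v_size BIGINT;
    -- Current payload size; COALESCE covers the empty-table case (SUM is NULL).
    SELECT COALESCE(SUM(OCTET_LENGTH(data)), 0) INTO v_size
    FROM amoreAgentTST01;
    WHILE v_size > p_threshold DO
        -- Remove the oldest rows first, a small batch at a time.
        DELETE FROM amoreAgentTST01
        ORDER BY updatetime ASC
        LIMIT 100;
        SELECT COALESCE(SUM(OCTET_LENGTH(data)), 0) INTO v_size
        FROM amoreAgentTST01;
    END WHILE;
END$$

-- Run the cleanup once an hour (illustrative schedule).
CREATE EVENT ev_trim_amore_agent
ON SCHEDULE EVERY 1 HOUR
DO CALL trim_amore_agent(500000)$$
DELIMITER ;
```

To archive instead of delete, the same loop could first copy the oldest batch into an archive table inside one transaction; that part is left out here.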