I have a table named tbl_tmp_trans.
It contains every transaction users have ever made (and it's up to 6 million rows right now!).
We have decided to keep only the last 100 transactions per user in our database so we can keep the db clean.
Here is the query I came up with:
delete from tbl_tmp_trans
where trans_id in
(
select trans_id
from
(
select trans_id
from tbl_faucets_transactions
order by date
group by user_id
limit 100
) foo
)
What am I doing wrong? After running this, my CPU reached 100% and MySQL crashed.
Thanks in advance.
P.S.: Our db is MySQL and the table engine is InnoDB.
P.S.2: We have about 120k users, and the transaction table has nearly 6 million records.
I have a proposal... Hopefully, it might help you.
Alter your table:
alter table tbl_tmp_trans add column todel tinyint(1);
Implement a stored procedure to iterate through the table with a cursor and mark (set todel to 1) records that should be deleted. Example procedure to do that:
delimiter //
drop procedure if exists mark_old_transactions //
create procedure mark_old_transactions()
begin
declare done int default false;
declare tid int;
declare uid int;
declare last_uid int default 0;
declare count int default 0;
declare cur cursor for select trans_id, user_id from tbl_tmp_trans order by user_id, date desc;
declare continue handler for not found set done = true;
open cur;
repeat
fetch cur into tid, uid;
if (!done) then
# reset the counter when we move on to a new user
if (uid != last_uid) then
set count = 0;
end if;
set last_uid = uid;
set count = count + 1;
# the cursor is ordered by date desc, so rows beyond the 100th are the oldest
if (count > 100) then
update tbl_tmp_trans set todel = 1 where trans_id = tid;
end if;
end if;
until done end repeat;
close cur;
end //
Invoke the procedure, maybe do some simple checks (how many transactions you delete from the table, etc.), and delete the marked records.
call mark_old_transactions;
-- select count(*) from tbl_tmp_trans where todel=1;
-- select count(*) from tbl_tmp_trans;
delete from tbl_tmp_trans where todel=1;
Finally, remove the column that we just added.
alter table tbl_tmp_trans drop column todel;
Some notes:
You probably have to iterate through all the records of the table anyway, so you don't lose performance with the cursor.
If you have ~120K users and ~6M transactions, that is ~50 transactions per user on average, which means you probably don't have many users with more than 100 transactions, so the number of updates (hopefully) won't be large and the procedure should run relatively fast.
The delete should also be fast, thanks to the new column.
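One more hedged suggestion: if the marking pass or the final delete turns out to be slow, supporting indexes may help (the index names below are illustrative, and the todel index is only worth adding for this one-off cleanup):

alter table tbl_tmp_trans add index idx_user_date (user_id, date); -- supports the cursor's order by user_id, date desc
alter table tbl_tmp_trans add index idx_todel (todel); -- supports delete ... where todel=1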
Related
In MySQL I have two concurrent processes that need to read some rows and update a flag based on a condition.
I have to write a stored procedure with a transaction, but the problem is that sometimes the two processes update the same rows.
I have a table Status, and I want to read 15 rows where the flag Reserved is false, then update those rows, setting Reserved to true.
The updated rows must be returned to the client.
My stored procedure is:
CREATE DEFINER=`user`@`%` PROCEDURE `get_reserved`()
BEGIN
DECLARE tmpProfilePageId bigint;
DECLARE finished INTEGER DEFAULT 0;
DECLARE curProfilePage CURSOR FOR
SELECT ProfilePageId
FROM Status
WHERE Reserved is false and ((timestampdiff(HOUR, UpdatedTime, NOW()) >= 23) or UpdatedTime is NULL)
ORDER BY UpdatedTime ASC
LIMIT 15;
DECLARE CONTINUE HANDLER
FOR NOT FOUND SET finished = 1;
DECLARE EXIT HANDLER FOR SQLEXCEPTION ROLLBACK;
DECLARE EXIT HANDLER FOR SQLWARNING ROLLBACK;
START TRANSACTION;
DROP TEMPORARY TABLE IF EXISTS TmpAdsProfile;
CREATE TEMPORARY TABLE TmpAdsProfile(Id INT PRIMARY KEY AUTO_INCREMENT, ProfilePageId BIGINT);
OPEN curProfilePage;
getProfilePage: LOOP
FETCH curProfilePage INTO tmpProfilePageId;
IF finished = 1 THEN LEAVE getProfilePage;
END IF;
UPDATE Status SET Reserved = true WHERE ProfilePageId = tmpProfilePageId;
INSERT INTO TmpAdsProfile (ProfilePageId) VALUES (tmpProfilePageId);
END LOOP getProfilePage;
CLOSE curProfilePage;
SELECT ProfilePageId FROM TmpAdsProfile;
COMMIT;
END
Anyway, if I execute two concurrent processes that call this stored procedure, sometimes they update the same rows.
How can I execute the stored procedure in an atomic way?
Simplify this a bit and use FOR UPDATE. That will lock the rows you want to change until you commit the transaction. You can get rid of the cursor entirely. Something like this, not debugged!
START TRANSACTION;
DROP TEMPORARY TABLE IF EXISTS TmpAdsProfile;
CREATE TEMPORARY TABLE TmpAdsProfile AS
SELECT ProfilePageId
FROM Status
WHERE Reserved IS false
AND ((timestampdiff(HOUR, UpdatedTime, NOW()) >= 23) OR UpdatedTime IS NULL)
ORDER BY UpdatedTime ASC
LIMIT 15
FOR UPDATE;
UPDATE Status SET Reserved = true
WHERE ProfilePageId IN (SELECT ProfilePageId FROM TmpAdsProfile);
COMMIT;
SELECT ProfilePageId FROM TmpAdsProfile;
That temporary table will only ever have fifteen rows in it. So indexes and PKs and all that are not necessary. Therefore you can use CREATE ... AS SELECT ... to create and populate the table in one go.
And, consider recasting your UpdatedTime filter so it can use an index.
AND (UpdatedTime <= NOW() - INTERVAL 23 HOUR OR UpdatedTime IS NULL)
The appropriate index for the SELECT query is
CREATE INDEX status_update ON Status (Reserved, UpdatedTime, ProfilePageId);
The faster your SELECT operation can be, the less time your transaction will take, so the better your overall performance will be.
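To check whether the index is actually picked up, a quick look with EXPLAIN (a sketch, to be run against your real table) might look like this:

EXPLAIN SELECT ProfilePageId
FROM Status
WHERE Reserved IS FALSE
AND (UpdatedTime <= NOW() - INTERVAL 23 HOUR OR UpdatedTime IS NULL)
ORDER BY UpdatedTime ASC
LIMIT 15;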
I'm having an issue where an update statement should (to my knowledge) update 5 rows: I have selected 5 rows into a temp table and used an INNER JOIN in the update statement.
However, when the update statement runs, it updates everything that could have been selected into the temp table, not just the joined contents of the temp table itself.
I'm using FOR UPDATE in the selection statement to lock the rows, as I'm expecting multiple queries to be aimed at this table at one time (note: removing it does not change the effect).
I've generalised the entire code base and it still behaves the same way. I've been at this for the past few days, and I'm sure it's just something silly I must be doing.
Code description
Table `data`.`data_table`: stores the data and records when it has been picked up by the external program.
Stored procedure `admin`.`func_fill_table`: debug code to populate the above table.
Stored procedure `data`.`func_get_data`: the actual code, designed to retrieve a batch of records, mark them as picked up, and return them to an external application.
Basic Setup Code
DROP TABLE IF EXISTS `data`.`data_table`;
DROP PROCEDURE IF EXISTS `admin`.`func_fill_table`;
DROP PROCEDURE IF EXISTS `data`.`func_get_data`;
DROP SCHEMA IF EXISTS `data`;
DROP SCHEMA IF EXISTS `admin`;
CREATE SCHEMA `admin`;
CREATE SCHEMA `data`;
CREATE TABLE `data`.`data_table` (
`identification_field_1` char(36) NOT NULL,
`identification_field_2` char(36) NOT NULL,
`identification_field_3` int(11) NOT NULL,
`information_field_1` int(11) NOT NULL,
`utc_action_time` datetime NOT NULL,
`utc_actioned_time` datetime DEFAULT NULL,
PRIMARY KEY (`identification_field_1`,`identification_field_2`,`identification_field_3`),
KEY `NC_IDX_data_table_action_time` (`utc_action_time`)
);
Procedure Creation
DELIMITER //
CREATE PROCEDURE `admin`.`func_fill_table`(
IN records int
)
BEGIN
IF records < 1
THEN SET records = 50;
END IF;
SET @processed = 0;
SET @action_time = NULL;
WHILE @processed < records
DO
SET @action_time = DATE_ADD(now(), INTERVAL FLOOR(RAND()*(45)-10) MINUTE); # time shorter for temp testing
SET @if_1 = UUID();
SET @if_2 = UUID();
INSERT INTO data.data_table(
identification_field_1
,identification_field_2
,identification_field_3
,information_field_1
,utc_action_time
,utc_actioned_time)
VALUES (
@if_1
,@if_2
,FLOOR(RAND()*5000+1)
,FLOOR(RAND()*5000+1)
,@action_time
,NULL);
SET @processed = @processed + 1;
END WHILE;
END
//
CREATE PROCEDURE `data`.`func_get_data`(
IN batch int
)
BEGIN
IF batch < 1
THEN SET batch = 1; /*Minimum Batch Size of 1 */
END IF;
DROP TABLE IF EXISTS `data_set`;
CREATE TEMPORARY TABLE `data_set`
SELECT
`identification_field_1` as `identification_field_1_local`
,`identification_field_2` as `identification_field_2_local`
,`identification_field_3` as `identification_field_3_local`
FROM `data`.`data_table`
LIMIT 0; /* Create a temp table using the same data format as the table but insert no data*/
SET SESSION sql_select_limit = batch;
INSERT INTO `data_set` (
`identification_field_1_local`
,`identification_field_2_local`
,`identification_field_3_local`)
SELECT
`identification_field_1`
,`identification_field_2`
,`identification_field_3`
FROM `data`.`data_table`
WHERE
`utc_actioned_time` IS NULL
AND `utc_action_time` < NOW()
FOR UPDATE; #Select out the rows to process (up to batch size (eg 5)) and lock those rows
UPDATE
`data`.`data_table` `dt`
INNER JOIN
`data_set` `ds`
ON (`ds`.`identification_field_1_local` = `dt`.`identification_field_1`
AND `ds`.`identification_field_2_local` = `dt`.`identification_field_2`
AND `ds`.`identification_field_3_local` = `dt`.`identification_field_3`)
SET `dt`.`utc_actioned_time` = NOW();
# Update the table to say these rows are being processed
select ROW_COUNT(),batch;
#Debug output for rows altered (should be maxed by batch number)
SELECT * FROM
`data`.`data_table` `dt`
INNER JOIN
`data_set` `ds`
ON (`ds`.`identification_field_1_local` = `dt`.`identification_field_1`
AND `ds`.`identification_field_2_local` = `dt`.`identification_field_2`
AND `ds`.`identification_field_3_local` = `dt`.`identification_field_3`);
# Debug output of the rows that should have been modified
SELECT
`identification_field_1_local`
,`identification_field_2_local`
,`identification_field_3_local`
FROM
`data_set`; /* Output data to external system*/
/* Commit the in-process flag and allow other processes to access those rows again */
END;
//
Run Code
call `admin`.`func_fill_table`(5000);
call `data`.`func_get_data`(5);
You are misusing the sql_select_limit setting:
The maximum number of rows to return from SELECT statements.
It only applies to plain select statements (limiting the results sent to the client), not to insert ... select .... It is intended as a safeguard to prevent users from being accidentally flooded with millions of results, not as another LIMIT mechanism.
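A quick illustration of the difference (a hedged sketch with placeholder table names t and t2):

SET SESSION sql_select_limit = 5;
SELECT * FROM t;                -- the client receives at most 5 rows
INSERT INTO t2 SELECT * FROM t; -- copies ALL matching rows; the setting is ignored here
SET SESSION sql_select_limit = DEFAULT;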
While in general you cannot use a variable for LIMIT, you can do so in a stored procedure (MySQL 5.5+):
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions: [...]
Within stored programs, LIMIT parameters can be specified using integer-valued routine parameters or local variables.
So in your case, you can simply use
...
FROM `data`.`data_table`
WHERE `utc_actioned_time` IS NULL AND `utc_action_time` < NOW()
LIMIT batch
FOR UPDATE;
This stored procedure that I'm working on errors out sometimes. I am getting a "Result consisted of more than one row" error, but only for certain JOB_ID_INPUT values. I understand what causes this error, so I have tried to be really careful to make sure that my return values are scalar when they should be. It's tough to see into the stored procedure, so I'm not sure where the error is generated. Since the error is thrown conditionally, it has me thinking memory could be an issue, or cursor reuse. I don't work with cursors that often, so I'm not sure. Thank you to anyone who helps.
DROP PROCEDURE IF EXISTS export_job_candidates;
DELIMITER $$
CREATE PROCEDURE export_job_candidates (IN JOB_ID_INPUT INT(11))
BEGIN
DECLARE candidate_count INT(11) DEFAULT 0;
DECLARE candidate_id INT(11) DEFAULT 0;
# these are the ib variables
DECLARE _overall_score DECIMAL(5, 2) DEFAULT 0.0;
# declare the cursor that will be needed for this SP
DECLARE curs CURSOR FOR SELECT user_id FROM job_application WHERE job_id = JOB_ID_INPUT;
# this table stores all of the data that will be returned from the various tables that will be joined together to build the final export
CREATE TEMPORARY TABLE IF NOT EXISTS candidate_stats_temp_table (
overall_score_ib DECIMAL(5, 2) DEFAULT 0.0
) engine = memory;
SELECT COUNT(job_application.id) INTO candidate_count FROM job_application WHERE job_id = JOB_ID_INPUT;
OPEN curs;
# loop controlling the insert of data into the temp table that is returned by this function
insert_loop: LOOP
# end the loop if there is no more computation that needs to be done
IF candidate_count = 0 THEN
LEAVE insert_loop;
END IF;
FETCH curs INTO candidate_id;
# get the ib data that may exist for this user
SELECT
tests.overall_score
INTO
_overall_score
FROM
tests
WHERE
user_id = candidate_id;
#build the insert for the table that is being constructed via this loop
INSERT INTO candidate_stats_temp_table (
overall_score_ib
) VALUES (
_overall_score
);
SET candidate_count = candidate_count - 1;
END LOOP;
CLOSE curs;
SELECT * FROM candidate_stats_temp_table WHERE 1;
END $$
DELIMITER ;
The WHERE 1 (as pointed out by @cdonner) definitely doesn't look right, but I'm pretty sure this error is happening because one of your SELECT ... INTO commands is returning more than one row.
This one should be OK because it's an aggregate without a GROUP BY, which always returns one row:
SELECT COUNT(job_application.id) INTO candidate_count
FROM job_application WHERE job_id = JOB_ID_INPUT;
So it's probably this one:
# get the ib data that may exist for this user
SELECT
tests.overall_score
INTO
_overall_score
FROM
tests
WHERE
user_id = candidate_id;
Try to figure out if it's possible for this query to return more than one row, and if so, how do you work around it. One way might be to MAX the overall score:
SELECT MAX(tests.overall_score) INTO _overall_score
FROM tests
WHERE user_id = candidate_id;
I think you want to use
LIMIT 1
in your select, not
WHERE 1
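Applied to the procedure above, the guarded lookup might look like this (a sketch; the ORDER BY column is an assumption, pick whichever row should be canonical):

SELECT tests.overall_score
INTO _overall_score
FROM tests
WHERE user_id = candidate_id
ORDER BY tests.id DESC # hypothetical tie-breaker: prefer the newest test row
LIMIT 1;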
Aside from using this safety net, you should understand your data to figure out why you are getting multiple results. Without seeing the data, it is difficult for me to take a guess.
I created a MySQL procedure using a cursor, but it runs too slowly: it processes only 40 to 60 rows per second. See:
DELIMITER $$
CREATE PROCEDURE sp_create(IN v_idsorteio INT,OUT afetados INT)
BEGIN
DECLARE done INT default 0;
DECLARE vc_idsocio INT;
DECLARE z INT;
DECLARE cur1 CURSOR FOR select IdSocio from socios where Sorteio=1 and Finalizado='S' and CodClientes IS NOT NULL;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
SET z=1;
OPEN cur1;
FETCH cur1 INTO vc_idsocio;
WHILE done=0 DO
-- SELECT register as t;
insert INTO socios_numeros_sorteio (IdSocio,IdSorteio,NumerodeSorteio) VALUES (vc_idsocio,v_idsorteio,z);
FETCH cur1 INTO vc_idsocio;
SET z = z+1;
END WHILE;
CLOSE cur1;
Select z-1 as total INTO afetados;
END$$
DELIMITER ;
How can I improve this?
This is slow because you are looping through a result set row by row and performing an individual insert statement for each row returned.
Let's briefly summarize what you are doing. First, you are running a query:
select IdSocio
from socios
where Sorteio=1
and Finalizado='S'
and CodClientes IS NOT NULL;
(Apparently the order these rows are returned in is not important.)
Then for each row returned from that query, you want to insert a row into another table.
insert INTO socios_numeros_sorteio
(IdSocio
,IdSorteio
,NumerodeSorteio
) VALUES
(vc_idsocio
,v_idsorteio
,z);
The value for the first column is coming from a value returned by the query.
The value for the second column is being assigned a value passed as an argument to the procedure.
And the value for the third column is from a counter that starts at 1 and is being incremented by 1 for each row.
MySQL is well optimized to perform an operation like this as a single set-based statement, but it's NOT optimized to do it with a stored procedure that loops through a cursor row by row.
If you are looking to get reasonable performance, you need to SIGNIFICANTLY REDUCE the number of individual INSERT statements you run, and instead think in terms of processing data in "sets" rather than individual rows. One approach is to batch the rows up into "extended insert" statements, which can insert multiple rows at a time, as sketched below. (The number of rows you can insert in one statement is effectively limited by max_allowed_packet.)
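For illustration, an extended insert might look like this (the values are made up):

INSERT INTO socios_numeros_sorteio (IdSocio, IdSorteio, NumerodeSorteio)
VALUES (101, 7, 1), (102, 7, 2), (103, 7, 3);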
That approach will significantly improve performance, but it doesn't avoid the overhead of the cursor, fetching each row into procedure variables.
Something like this (in the body of your procedure) is likely to perform much, much better, because it takes the result set from your select and inserts all of the rows into the destination table in one fell swoop, without bothering to mess with updating the values of variables in the procedure.
BEGIN
SET @idsorteio = v_idsorteio;
INSERT INTO socios_numeros_sorteio
( IdSocio
, IdSorteio
, NumerodeSorteio
)
SELECT s.IdSocio AS IdSocio
, @idsorteio AS IdSorteio
, @z := @z+1 AS NumerodeSorteio
FROM socios s
JOIN (SELECT @z := 0) z
WHERE s.Sorteio=1
AND s.Finalizado='S'
AND s.CodClientes IS NOT NULL;
SET afetados = ROW_COUNT();
END$$
Another simple option is to change the engine of the table to MyISAM by running the query below,
ALTER TABLE `socios_numeros_sorteio`
ENGINE=MyISAM;
Then CALL the procedure again.
Note: MyISAM makes the insertion process very fast.
I have table data like this:
id,time,otherdata
a,1,fsdfas
a,2,fasdfag
a,3,fasdfas
a,7,asfdsaf
b,8,fasdf
a,8,asdfasd
a,9,afsadfa
b,10,fasdf
...
So essentially, I can select all the data in the order I want with something like:
select * from mytable order by id, time;
That gives me all the records sorted by id first and then by time. But instead of all the records, I need the latest 3 times for each id.
Answer:
Well, I figured out how to do it. I'm surprised at how quick it was: I'm operating on a couple million rows of data, and it took about 11 seconds. I wrote a procedure in a SQL script to do it; here's what it looks like. Note that instead of getting the last 3, it gets the last "n" rows of data.
use my_database;
drop procedure if exists getLastN;
drop table if exists lastN;
-- Create a procedure that gets the last three records for each id
delimiter //
create procedure getLastN(n int)
begin
# Declare cursor for data iterations, and variables for storage
declare idData varchar(32);
declare done int default 0;
declare curs cursor for select distinct id from my_table;
declare continue handler for not found set done = 1;
open curs;
# Create a temporary table to contain our results
create temporary table lastN like my_table;
# Iterate through each id
DATA_LOOP: loop
fetch curs into idData;
if done then leave DATA_LOOP; end if;
insert into lastN select * from my_table where id = idData order by time desc limit n;
end loop;
end//
delimiter ;
call getLastN(3);
select * from lastN;
Sorry if this doesn't exactly work; I've had to change variable names and such to obfuscate my employer's work, but I ran this exact piece of code and got what I needed!
I think it's as simple as:
SELECT * FROM `mytable`
GROUP BY `id`
ORDER BY `time` DESC
LIMIT 3
Two approaches that I'm aware of are (1) to use a set of unions, each one containing a "limit 3", or (2) to use a temporary variable. These approaches, along with other useful links and discussion, can be found here. A sketch of the variable approach follows.
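For example, the user-variable approach might look like this (a sketch, assuming the id/time layout shown in the question; note the evaluation order of user variables within a SELECT is technically undefined, so treat this as the common idiom rather than guaranteed behavior):

select id, time, otherdata
from (
select m.*,
@rn := if(@prev = id, @rn + 1, 1) as rn, -- per-id row counter
@prev := id as prev_id                   -- remember the id of this row for the next one
from mytable m
cross join (select @rn := 0, @prev := null) vars
order by id, time desc
) ranked
where rn <= 3;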
Try this:
select *
from mytable as m1
where (
select count(*) from mytable as m2
where m1.id = m2.id
and m2.time >= m1.time
) <= 3
order by id, time;
The m2.time >= m1.time condition keeps a row only when it is among the three most recent for its id; without it, the count covers all of the id's rows, and the query would just return ids with three or fewer rows in total.